AndroTest

Automated Test Input Generation for Android: Are We There Yet?

Shauvik Roy Choudhary, Alessandra Gorla, Alessandro (Alex) Orso

Paper accepted at the 30th IEEE/ACM International Conference on Automated Software Engineering (ASE 2015)

An early version of the publication is available at arxiv.org/abs/1503.07217

Project Overview

The goal of this project is to compare state-of-the-art test input generation techniques for Android. In recent years, a lot of research has gone into developing such techniques, which differ in the way they generate inputs, the strategy they use to explore the behavior of the app under test, and the specific heuristics they use. To better understand the strengths and weaknesses of these existing approaches, and to gain general insight into ways they could be made more effective, in this project we perform a thorough comparison of the main existing test input generation tools for Android. In our comparison, we evaluate the effectiveness of these tools, and of their corresponding techniques, according to four metrics: code coverage, ability to detect faults, ability to work on multiple platforms, and ease of use.

List of Tools

The following table lists the tools used in our study.

# Tool Name Publication
1. Monkey N/A, Part of the Android SDK (Google Inc.)
2. Acteve Automated Concolic Testing of Smartphone Apps.
Saswat Anand, Mayur Naik, Hongseok Yang, and Mary Jean Harrold.
FSE'12: ACM Symposium on Foundations of Software Engineering.
3. Dynodroid Dynodroid: An Input Generation System for Android Apps.
Aravind Machiry, Rohan Tahiliani, and Mayur Naik.
FSE'13: ACM Symposium on Foundations of Software Engineering.
4. A3E Targeted and Depth-first Exploration for Systematic Testing of Android Apps.
Tanzirul Azim and Iulian Neamtiu.
OOPSLA'13: Object-Oriented Programming, Systems, Languages, and Applications.
5. SwiftHand Guided GUI Testing of Android Apps with Minimal Restart and Approximate Learning.
Wontae Choi, George Necula, and Koushik Sen.
OOPSLA'13: Object-Oriented Programming, Systems, Languages, and Applications.
6. GuiRipper (a.k.a. MobiGuitar) MobiGUITAR -- A Tool for Automated Model-Based Testing of Mobile Apps.
Domenico Amalfitano, Anna Rita Fasolino, Porfirio Tramontana, Bryan Dzung Ta and Atif M. Memon.
IEEE-S/W'14: IEEE Software Volume: PP, Issue: 99, April 2014
7. PUMA PUMA: Programmable UI-Automation for Large-Scale Dynamic Analysis of Mobile Apps.
Shuai Hao, Bin Liu, Suman Nath, William G.J. Halfond, and Ramesh Govindan.
MobiSys'14: Mobile systems, applications, and services

Experiment Details

For our experiments, we set up each of the tools, along with the benchmark apps, on a common virtualized Linux infrastructure. Each tool is run on each benchmark 10 times to account for non-deterministic factors, and we report both the best and the mean coverage in our results. Our results also report the failures (unhandled exceptions) triggered by each tool during the 10 executions, which we attribute to the tool's fault detection capability.
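To make the procedure concrete, the sketch below shows, in simplified form, what one such experiment loop looks like. It is only an illustration: in our infrastructure each of the 10 runs executes in its own VM instance (run1-run10), and the actual driver scripts live in ~/scripts on the VM. The TOOL variable, the per-benchmark invocation, and the results layout are assumptions made for the sketch.
    # Simplified sketch of the experimental procedure (not the actual driver):
    # run one tool on every benchmark 10 times and keep the per-run coverage.
    TOOL=monkey                              # placeholder: any of the studied tools
    for run in $(seq 1 10); do
      while read -r app; do                  # one benchmark app name per line
        bash "run_${TOOL}.sh" "$app"         # launch the tool on this benchmark
        mkdir -p "results/run${run}/${TOOL}/${app}"
        cp coverage.ec "results/run${run}/${TOOL}/${app}/"   # save this run's coverage
      done < projects.txt
    done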

Our experimental benchmarks are the union of all open-source benchmarks used in the original evaluations of the tools. The following chart shows the distribution of the categories of these apps.

Evaluation Results


1. Pairwise comparison of coverage achieved and failures triggered.

Explanation:

This chart shows a pairwise comparison of the tools in terms of coverage and failures. The pairwise statement coverage information is shown above the diagonal (top right; white background), and the percentage of statements covered by both tools is highlighted in grey. Similarly, pairwise failure information is shown below the diagonal (bottom left; yellow background). Failures are unhandled exceptions that originate from the mobile application under test (i.e., the stack trace contains the application's package name).

NOTE: We could not obtain statement coverage information from SwiftHand due to technical limitations of its underlying framework. Hence, for SwiftHand, we only compare the failures it triggers in the applications.
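To illustrate this criterion, the snippet below sketches one way to check a captured logcat for crashes that belong to the app under test; the package name is a placeholder, and the exact filtering in our analysis scripts differs in the details.
    # Sketch: take the lines following each "FATAL EXCEPTION" entry in the
    # logcat capture and keep only crashes whose stack trace mentions the
    # app's package, so failures of the tool or the platform are ignored.
    APP_PKG="com.example.app"     # placeholder: package of the app under test
    grep -A 30 "FATAL EXCEPTION" tool.logcat | grep "$APP_PKG"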

2. Variance of statement coverage achieved by tools on benchmark apps.

Explanation:

This chart shows the cross-benchmark variance in statement coverage obtained by the different tools. For each application, we used the mean statement coverage across the 10 runs.

3. Progress of coverage for each tool on benchmark apps.

Explanation:

This chart shows the progress of the statement coverage achieved by each tool over 5-minute intervals. For each application, we used the mean statement coverage across the 10 runs.
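As an illustration of how such snapshots can be taken, the sketch below pulls a coverage file from the emulator every 5 minutes; the on-device path, and the assumption that the instrumented app keeps its Emma coverage data there, are placeholders rather than the exact mechanism used by our scripts.
    # Sketch only: pull an intermediate coverage snapshot every 5 minutes.
    for i in $(seq 1 11); do
      sleep 300                                          # 5-minute interval
      adb pull /mnt/sdcard/coverage.ec "coverage${i}.ec"
    done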

4. Unique failures triggered by tools on benchmark apps across 10 runs

Explanation:

This chart reports the cumulative failures, across 10 runs, that the tools triggered in the benchmark applications. The chart reports unique failures, where uniqueness is determined by the stack trace associated with the failure.
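As a sketch of this uniqueness criterion, one could hash each extracted stack trace and count the distinct hashes across runs; the failures/*.trace layout below is hypothetical and used only for illustration.
    # Sketch: count unique failures of one tool across all runs by hashing
    # each extracted stack trace (one trace per file; hypothetical layout).
    md5sum results/run*/monkey/*/failures/*.trace \
      | awk '{print $1}' | sort -u | wc -l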

Result Details

Experimental Infrastructure

Our experimental infrastructure contains all the tools, benchmark applications and scripts used in our empirical evaluation.

Download Virtual Machine (8.2GB)

Setup Instructions

To use our virtual machine, you will need to download and install the VirtualBox and Vagrant tools. If you would like to see the GUI of the VM, you also need to install the VirtualBox extension pack. Once both tools are installed, follow the steps below to set up our VM.

  1. In a terminal, add the Androtest box to Vagrant
    $ vagrant box add androtest http://bear.cc.gatech.edu/~shauvik/androtest/boxes/androtest_v2.box
    If you have already downloaded the VM, use its file path instead of the URL.
  2. Create a directory, say ~/vagrant/androtest, to host the vagrant machine and inside this directory download this Vagrantfile. This file contains the VirtualBox VM configuration that vagrant uses. Note that the configuration defines 10 VM instances labelled run1-run10.
    $ mkdir -p ~/vagrant/androtest
    $ cd ~/vagrant/androtest
    $ wget http://bear.cc.gatech.edu/~shauvik/androtest/boxes/Vagrantfile
  3. You can start the VM using vagrant up. Vagrant will create all the virtual machines (run1-run10) on your computer and start them. To start only one (or a few) VMs, pass the VM name(s) as parameters. Once the VM has booted, log in to it using SSH.
    $ vagrant up run1 
    $ vagrant ssh run1
    
    vagrant@run1:~$ ls -1
    android-ndk-r10     #--> Android NDK
    android-sdk-linux   #--> Android SDK
    lib                 #--> Libraries needed by tools
    scripts             #--> Scripts for experiments (invokes tools)
    subjects            #--> Open source android app benchmarks
    tools               #--> Android test input generation tools
    
    vagrant@run1:~$ ls /vagrant
    Vagrantfile	    #--> Host machine directory (i.e., ~/vagrant/androtest) is mounted as /vagrant on the Vagrant box
    
  4. To start the experiments, run ~/scripts/run_[tool].sh to launch the input generation tool on all benchmarks. An example for Monkey is shown below.
    vagrant@run1:~$ cd scripts
    vagrant@run1:~/scripts$ bash -x run_monkey.sh
    This command runs the Monkey tool on all benchmarks. Results are saved in the /vagrant/results directory in the VM, which corresponds to the ~/vagrant/androtest/results directory on the host machine.

Choosing benchmarks

  1. To run a tool on different/selected benchmarks, edit ~/scripts/projects.txt to list the names of those benchmarks (an example is sketched below). Also, edit ~/scripts/run_[tool].sh, commenting/uncommenting the following lines so that the subjects are picked from this file.
    < for p in `ls -d */`; do
    < #for p in `cat $DIR/projects.txt`; do
    ---
    > #for p in `ls -d */`; do
    > for p in `cat $DIR/projects.txt`; do
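    For reference, projects.txt simply lists one benchmark directory name per line. The names below are placeholders; replace them with the actual directory names found under ~/subjects on the VM.
    $ cat ~/scripts/projects.txt
    subjectApp1
    subjectApp2
    subjectApp3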
    

Understanding the result files

  • Results and logs are stored in the results folder under results/run[id]/[tool]/[benchmark]/. Here is a description of the files.
    • tool.log -- log generated by the tool
    • tool.logcat -- logcat from the emulator, while tool was running (contains failure stack traces)
    • install.log -- installation log of the app on the device
    • icoverage -- log of intermediate coverage collected
    • coverage.em -- Emma coverage metadata
    • coverage.ec or coverage.es -- complete coverage file (a readable report can be generated from this; see the sketch after this list)
    • coverage[1-11].ec -- snapshots of progressive coverage collected every 5 minutes.
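  To get a readable summary out of a run's coverage data, something along the following lines should work; the emma.jar path is an assumption, so adjust it to wherever emma.jar is located on the VM (e.g., under the Android SDK's tools/lib directory).
    $ cd /vagrant/results/run1/monkey/[benchmark]
    $ java -cp ~/android-sdk-linux/tools/lib/emma.jar emma report -r txt -in coverage.em,coverage.ec
  This writes a coverage.txt file in the current directory with per-package and per-class coverage figures.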

Troubleshooting

  • To connect graphically to the VM, find the VRDE port by running this script (or see the VBoxManage sketch below) and connect to that port on the host machine using a remote desktop client (e.g., Microsoft Remote Desktop Connection).
  • If the /vagrant directory is not mounted in the VM, the guest additions in the VM probably need to be updated to match the host. The vagrant-vbguest plugin keeps the guest additions up to date automatically.
  • For general help with the virtual machine infrastructure, consult the official VirtualBox and Vagrant documentation, or search on Stack Overflow.
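  If that script is not at hand, the VRDE port can also be looked up from the host with VBoxManage; the VM name below is a placeholder, so list the running VMs first to find the actual name.
    $ VBoxManage list runningvms
    $ VBoxManage showvminfo "androtest_run1" | grep -i vrde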