An Empirical Study of Flaky Tests in Android Apps

2018 IEEE International Conference on Software Maintenance and Evolution (ICSME), pp. 534-538. Pub Date: 2018-09-01. DOI: 10.1109/ICSME.2018.00062

S. Thorve, Chandani Sreshtha, Na Meng


Citations: 56

Abstract

A flaky test is a test that may pass or fail for the same code under test (CUT). Flaky tests can be harmful to developers because the non-deterministic outcome is unreliable and makes the code hard to debug. A prior study characterized the root causes and fixing strategies of flaky tests by analyzing commits of 51 Apache open source projects, but did not analyze any Android app. Given the popularity of Android devices and the many interactions of Android apps with third-party software libraries, hardware, network, and users, we wanted to find out whether Android apps manifest unique flakiness patterns and call for special resolutions for flaky tests compared with the existing literature. In this paper, we conducted an empirical study to characterize flaky tests in Android apps. By classifying the root causes and fixing strategies of flakiness, we aimed to investigate how our characterization of flakiness in Android apps differs from prior findings, and whether there are domain-specific flakiness patterns. After mining GitHub, we found 29 Android projects containing 77 commits relevant to flakiness. We identified five root causes of flakiness in Android apps, three of which are novel: Dependency, Program Logic, and UI. We observed five types of resolution strategies for addressing flaky behavior. Many of the examined commits show developers' attempts to fix flakiness by changing the software implementation in various ways. However, 13% of the commits simply skipped or removed the flaky tests. Our observations provide useful insights for future research on flaky tests in Android apps.
