Perception and Practices of Differential Testing

Muhammad Ali Gulzar, Yongkang Zhu, Xiaofeng Han
{"title":"Perception and Practices of Differential Testing","authors":"Muhammad Ali Gulzar, Yongkang Zhu, Xiaofeng Han","doi":"10.1109/ICSE-SEIP.2019.00016","DOIUrl":null,"url":null,"abstract":"Tens of thousands engineers are contributing to Google's codebase that spans billions of lines of code. To ensure high code quality, tremendous amount of effort has been made with new testing techniques and frameworks. However, with increasingly complex data structures and software systems, traditional test case based testing strategies cannot scale well to achieve the desired level of test adequacy. Differential (Diff) is one of the new testing techniques adapted to fill this gap. It uses the same input to run two versions of a software system, namely base and test, where base is the verified/tested version of the system while test is the modified version. The output of two runs are then thoroughly compared to find abnormalities that may lead to possible bugs. Over the past few years, differential testing has been quickly adopted by hundreds of teams across all major product areas at Google. Meanwhile, many new differential testing frameworks were developed to simplify the creation, maintenance, and analysis of diff tests. Curious by this emerging popularity, we conducted the first empirical study on differential testing in practice at large scale. In this study, we investigated common practices and usage of diff tests. We further explore the features of diff tests that users value the most and the pain points of using diff tests. Through this user study, we discovered that differential testing does not replace fine-grained testing techniques such as unit tests. Instead it supplements existing testing suites. It helps users verify the impact on unmodified and unfamiliar components in the absence of a test oracle. In terms of limitations, diff tests often take long time to run and appear to generate noisy and flaky outcomes. Finally, we highlight problems (including smart data differencing, sampling, and traceability) to guide future research in differential testing.","PeriodicalId":378237,"journal":{"name":"2019 IEEE/ACM 41st International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP)","volume":"99 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"21","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE/ACM 41st International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICSE-SEIP.2019.00016","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 21

Abstract

Tens of thousands of engineers contribute to Google's codebase, which spans billions of lines of code. To ensure high code quality, a tremendous amount of effort has been invested in new testing techniques and frameworks. However, with increasingly complex data structures and software systems, traditional test-case-based testing strategies cannot scale well enough to achieve the desired level of test adequacy. Differential testing (diff testing) is one of the new techniques adopted to fill this gap. It runs two versions of a software system, namely base and test, on the same input, where base is the verified/tested version of the system and test is the modified version. The outputs of the two runs are then thoroughly compared to find abnormalities that may point to possible bugs. Over the past few years, differential testing has been quickly adopted by hundreds of teams across all major product areas at Google. Meanwhile, many new differential testing frameworks have been developed to simplify the creation, maintenance, and analysis of diff tests. Curious about this emerging popularity, we conducted the first large-scale empirical study of differential testing in practice. In this study, we investigated common practices and usage of diff tests. We further explored the features of diff tests that users value most and the pain points of using them. Through this user study, we discovered that differential testing does not replace fine-grained testing techniques such as unit tests; instead, it supplements existing test suites. It helps users verify the impact on unmodified and unfamiliar components in the absence of a test oracle. In terms of limitations, diff tests often take a long time to run and appear to generate noisy and flaky outcomes. Finally, we highlight open problems (including smart data differencing, sampling, and traceability) to guide future research in differential testing.
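To make the base-versus-test comparison described in the abstract concrete, the sketch below shows the core loop of differential testing: feed the same inputs to a trusted base implementation and a modified test implementation, then report any outputs that differ. This is a minimal illustration only; the function names, the toy string normalizer, and the exact-equality comparison are assumptions for this example and are not the paper's framework or any Google-internal tooling, which the authors note uses far more sophisticated (noise-tolerant) differencing.

```python
# Minimal, hypothetical sketch of differential (diff) testing:
# run the same inputs through a "base" (verified) and a "test" (modified)
# implementation and collect every input whose outputs disagree.
from typing import Any, Callable, Iterable, List, Tuple


def diff_test(base: Callable[[Any], Any],
              test: Callable[[Any], Any],
              inputs: Iterable[Any]) -> List[Tuple[Any, Any, Any]]:
    """Return (input, base_output, test_output) for every mismatch."""
    mismatches = []
    for x in inputs:
        base_out = base(x)
        test_out = test(x)
        # Real frameworks use smarter, noise-tolerant output differencing;
        # plain equality is the simplest possible oracle-free comparison.
        if base_out != test_out:
            mismatches.append((x, base_out, test_out))
    return mismatches


if __name__ == "__main__":
    # Toy example: a refactored string normalizer that silently changes behavior.
    def base_normalize(s: str) -> str:
        return " ".join(s.split()).lower()   # collapses all internal whitespace

    def test_normalize(s: str) -> str:
        return s.strip().lower()             # modified version: keeps internal whitespace

    samples = ["Hello  World", " hello world ", "HELLO WORLD"]
    for inp, b, t in diff_test(base_normalize, test_normalize, samples):
        print(f"input={inp!r}: base={b!r} test={t!r}")
```

Run on the sample inputs, the first string is flagged as a mismatch because only the base version collapses the double space, which mirrors how diff tests surface unintended behavioral changes without requiring a hand-written expected output.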