Peeler: Learning to Effectively Predict Flakiness without Running Tests

Yihao Qin, Shangwen Wang, Kui Liu, Bo Lin, Hongjun Wu, Li Li, Xiaoguang Mao, Tegawendé F. Bissyandé
{"title":"削皮器:学习在不运行测试的情况下有效地预测片状","authors":"Yihao Qin, Shangwen Wang, Kui Liu, Bo Lin, Hongjun Wu, Li Li, Xiaoguang Mao, Tegawendé F. Bissyandé","doi":"10.1109/ICSME55016.2022.00031","DOIUrl":null,"url":null,"abstract":"Regression testing is a widely adopted approach to expose change-induced bugs as well as to verify the correctness/robustness of code in modern software development settings. Unfortunately, the occurrence of flaky tests leads to a significant increase in the cost of regression testing and eventually reduces the productivity of developers (i.e., their ability to find and fix real problems). State-of-the-art approaches leverage dynamic test information obtained through expensive re-execution of test cases to effectively identify flaky tests. Towards accounting for scalability constraints, some recent approaches have built on static test case features, but fall short on effectiveness. In this paper, we introduce Peeler, a new fully static approach for predicting flaky tests through exploring a representation of test cases based on the data dependency relations. The predictor is then trained as a neural network based model, which achieves at the same time scalability (because it does not require any test execution), effectiveness (because it exploits relevant test dependency features), and practicality (because it can be applied in the wild to find new flaky tests). Experimental validation on 17,532 test cases from 21 Java projects shows that Peeler outperforms the state-of-the-art FlakeFlagger by around 20 percentage points: we catch 22% more flaky tests while yielding 51% less false positives. Finally, in a live study with projects in-the-wild, we reported to developers 21 flakiness cases, among which 12 have already been confirmed by developers as being indeed flaky.","PeriodicalId":300084,"journal":{"name":"2022 IEEE International Conference on Software Maintenance and Evolution (ICSME)","volume":"54 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":"{\"title\":\"Peeler: Learning to Effectively Predict Flakiness without Running Tests\",\"authors\":\"Yihao Qin, Shangwen Wang, Kui Liu, Bo Lin, Hongjun Wu, Li Li, Xiaoguang Mao, Tegawendé F. Bissyandé\",\"doi\":\"10.1109/ICSME55016.2022.00031\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Regression testing is a widely adopted approach to expose change-induced bugs as well as to verify the correctness/robustness of code in modern software development settings. Unfortunately, the occurrence of flaky tests leads to a significant increase in the cost of regression testing and eventually reduces the productivity of developers (i.e., their ability to find and fix real problems). State-of-the-art approaches leverage dynamic test information obtained through expensive re-execution of test cases to effectively identify flaky tests. Towards accounting for scalability constraints, some recent approaches have built on static test case features, but fall short on effectiveness. In this paper, we introduce Peeler, a new fully static approach for predicting flaky tests through exploring a representation of test cases based on the data dependency relations. 
The predictor is then trained as a neural network based model, which achieves at the same time scalability (because it does not require any test execution), effectiveness (because it exploits relevant test dependency features), and practicality (because it can be applied in the wild to find new flaky tests). Experimental validation on 17,532 test cases from 21 Java projects shows that Peeler outperforms the state-of-the-art FlakeFlagger by around 20 percentage points: we catch 22% more flaky tests while yielding 51% less false positives. Finally, in a live study with projects in-the-wild, we reported to developers 21 flakiness cases, among which 12 have already been confirmed by developers as being indeed flaky.\",\"PeriodicalId\":300084,\"journal\":{\"name\":\"2022 IEEE International Conference on Software Maintenance and Evolution (ICSME)\",\"volume\":\"54 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"6\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 IEEE International Conference on Software Maintenance and Evolution (ICSME)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICSME55016.2022.00031\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE International Conference on Software Maintenance and Evolution (ICSME)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICSME55016.2022.00031","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 6

Abstract

Regression testing is a widely adopted approach to expose change-induced bugs and to verify the correctness and robustness of code in modern software development settings. Unfortunately, flaky tests significantly increase the cost of regression testing and eventually reduce the productivity of developers (i.e., their ability to find and fix real problems). State-of-the-art approaches leverage dynamic test information, obtained through expensive re-execution of test cases, to effectively identify flaky tests. To account for scalability constraints, some recent approaches have built on static test case features, but they fall short on effectiveness. In this paper, we introduce Peeler, a new fully static approach for predicting flaky tests that explores a representation of test cases based on data dependency relations. The predictor is trained as a neural-network-based model, which simultaneously achieves scalability (because it does not require any test execution), effectiveness (because it exploits relevant test dependency features), and practicality (because it can be applied in the wild to find new flaky tests). Experimental validation on 17,532 test cases from 21 Java projects shows that Peeler outperforms the state-of-the-art FlakeFlagger by around 20 percentage points: we catch 22% more flaky tests while yielding 51% fewer false positives. Finally, in a live study on in-the-wild projects, we reported 21 flakiness cases to developers, 12 of which have already been confirmed as indeed flaky.
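To make the core idea concrete, below is a minimal, self-contained Python sketch of the kind of fully static signal the abstract describes: extracting def-use (data dependency) pairs between the statements of a test body and scoring them against flakiness-prone patterns. This is an illustration only, not the authors' implementation: the regex-based extraction, the SUSPICIOUS token list, and the scoring function are simplifying assumptions standing in for Peeler's actual Java parsing and trained neural model.

```python
# Toy illustration of the idea behind Peeler: represent a test case by
# the data dependency (def-use) relations between its statements,
# entirely statically. NOTE: this is NOT the paper's implementation.
# Peeler parses real Java code and feeds its representation to a
# trained neural model; the regex pass and hand-picked token list
# below are hypothetical stand-ins used purely for illustration.
import re

# Matches "Type var = expr;" or "var = expr;" (skips "==" comparisons).
ASSIGN = re.compile(r"\b(?:\w+(?:<[\w,\s<>]*>)?\s+)?(\w+)\s*=(?!=)\s*(.+?);")
IDENT = re.compile(r"\b[A-Za-z_]\w*\b")

def def_use_pairs(test_source: str):
    """Collect (defining statement, using statement) pairs: for every
    variable assigned in the body, link its definition to each later
    statement that reads it, giving a crude static def-use chain."""
    lines = [ln.strip() for ln in test_source.splitlines() if ln.strip()]
    defs = {}    # variable name -> index of the line that last assigned it
    pairs = []
    for i, line in enumerate(lines):
        m = ASSIGN.search(line)
        used = set(IDENT.findall(line))
        for var, j in defs.items():
            if var in used and (m is None or m.group(1) != var):
                pairs.append((lines[j], line))
        if m:
            defs[m.group(1)] = i
    return pairs

# Hypothetical flakiness-prone APIs (timing, randomness, I/O); a real
# predictor would learn such feature weights rather than hard-code them.
SUSPICIOUS = ("Thread.sleep", "System.currentTimeMillis", "Random", "File")

def flakiness_score(test_source: str) -> float:
    """Fraction of dependency pairs that touch a suspicious API."""
    pairs = def_use_pairs(test_source)
    hits = sum(any(tok in d or tok in u for tok in SUSPICIOUS)
               for d, u in pairs)
    return hits / max(len(pairs), 1)

if __name__ == "__main__":
    test = """
        long start = System.currentTimeMillis();
        Worker w = new Worker();
        w.run();
        long elapsed = System.currentTimeMillis() - start;
        assertTrue(elapsed < 1000);
    """
    for d, u in def_use_pairs(test):
        print(f"{d}  ->  {u}")
    print(f"score = {flakiness_score(test):.2f}")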