Azeem Ahmad, F. D. O. Neto, Zhixiang Shi, K. Sandahl, O. Leifler
Published in: 2021 28th Asia-Pacific Software Engineering Conference (APSEC), 2021-12-01
DOI: 10.1109/APSEC53868.2021.00041 (https://doi.org/10.1109/APSEC53868.2021.00041)
A Multi-factor Approach for Flaky Test Detection and Automated Root Cause Analysis
Developers often spend time determining whether test case failures are real failures or flaky ones. Flaky tests, also known as non-deterministic tests, switch their outcomes without any modification to the codebase, which reduces developers' confidence during maintenance as well as in the quality of the product. Re-running test cases to reveal flakiness is resource-consuming and unreliable, and it does not reveal the root causes of test flakiness. This paper evaluates a multi-factor approach to identifying flaky test executions, implemented in a tool named MDFlaker. The four factors are trace-back coverage, flaky frequency, number of test smells, and test size. Based on the extracted factors, MDFlaker uses k-Nearest Neighbor (KNN) to determine whether failed test executions are flaky. We investigate MDFlaker in a case study with 2166 test executions from different open-source repositories. We evaluate the effectiveness of our flaky detection tool, illustrate how the multi-factor approach can be used to reveal root causes of flakiness, and conduct a qualitative comparison between MDFlaker and other tools proposed in the literature. Our results show that a combination of different factors can be used to identify flaky tests. Each factor has its own trade-off: for example, trace-back coverage leads to many true positives, while flaky frequency yields more true negatives. Therefore, specific combinations of factors enable classification for testers with limited information (e.g., not enough test history information).
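To make the classification step concrete, the following is a minimal sketch of k-Nearest Neighbor voting over the four factors named in the abstract. It is not MDFlaker's implementation: the feature values, the labels, the min-max scaling, the Euclidean distance, the choice of k = 3, and the names `TRAINING`, `min_max_scale`, and `knn_classify` are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt

# Hypothetical labelled executions, invented for illustration (not data from the paper).
# Feature order: (trace_back_coverage, flaky_frequency, test_smells, test_size)
TRAINING = [
    ((0.05, 0.60, 4, 120), "flaky"),
    ((0.10, 0.45, 3, 95), "flaky"),
    ((0.00, 0.70, 5, 150), "flaky"),
    ((0.80, 0.00, 0, 20), "real"),
    ((0.65, 0.05, 1, 35), "real"),
    ((0.90, 0.00, 0, 15), "real"),
]


def min_max_scale(rows):
    """Build a function that rescales each feature using the min and max seen in rows."""
    columns = list(zip(*rows))
    lows = [min(column) for column in columns]
    highs = [max(column) for column in columns]

    def scale(row):
        # Constant columns map to 0.0 to avoid division by zero.
        return tuple(
            (value - low) / (high - low) if high > low else 0.0
            for value, low, high in zip(row, lows, highs)
        )

    return scale


def knn_classify(query, training, k=3):
    """Label a failed execution by majority vote among its k nearest neighbours."""
    scale = min_max_scale([features for features, _ in training])
    scaled_query = scale(query)
    # Euclidean distance in the scaled feature space, nearest first.
    neighbours = sorted(
        (
            sqrt(sum((a - b) ** 2 for a, b in zip(scale(features), scaled_query))),
            label,
        )
        for features, label in training
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    print(knn_classify((0.08, 0.50, 4, 110), TRAINING))
    print(knn_classify((0.85, 0.00, 0, 18), TRAINING))
```

Scaling matters in a sketch like this because test size is measured on a much larger numeric range than the other factors and would otherwise dominate the distance. Dropping a column from the feature tuples is one simple way to experiment with the reduced factor combinations that the abstract mentions for testers who lack test history.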