An empirical study of test case prioritization on the Linux Kernel

IF 2.0 | Zone 2, Computer Science | Q3, COMPUTER SCIENCE, SOFTWARE ENGINEERING

Automated Software Engineering, Volume 32, Issue 2 | Pub Date: 2025-05-13 | DOI: 10.1007/s10515-025-00522-8

Haichi Wang, Ruiguo Yu, Dong Wang, Yiheng Du, Yingquan Zhao, Junjie Chen, Zan Wang


Citations: 0

Abstract



An empirical study of test case prioritization on the Linux Kernel

The Linux kernel is a complex and constantly evolving system, where each code change can impact different components of the system. Regression testing ensures that new changes do not affect existing functionality or introduce new defects. However, due to the complexity of the Linux kernel, maintenance remains challenging. While practices like Continuous Integration (CI) facilitate rapid commits through automated regression testing, each CI process still incurs substantial costs due to the large number of test cases. Traditional software testing employs test case prioritization (TCP) techniques to reorder test cases so that defects are detected earlier. Due to the unique characteristics of the Linux kernel, it remains unclear whether existing TCP techniques are suitable for its regression testing. In this paper, we present the first empirical study comparing various TCP techniques in the Linux kernel context. Specifically, we examined a total of 17 TCP techniques, including similarity-based, information-retrieval-based, and coverage-based techniques. The experimental results demonstrate that: (1) similarity-based TCP techniques perform best on the Linux kernel, achieving a mean APFD (Average Percentage of Faults Detected) value of 0.7583 and requiring significantly less time; (2) the majority of TCP techniques show relatively stable performance across multiple commits, with similarity-based TCP techniques being more stable, showing maximum decreases of 3.03% and 3.92% in mean and median APFD values, respectively; (3) more than half of the studied techniques are significantly affected by flaky tests, with changes in both mean and median APFD values ranging from -29.9% to -63.5%. This work takes the first look at the adoption of TCP techniques in the Linux kernel, confirming their potential for effective and efficient prioritization.
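For readers unfamiliar with the metric reported above, APFD is commonly defined as APFD = 1 - (TF_1 + ... + TF_m) / (n * m) + 1 / (2n), where n is the number of test cases, m is the number of faults, and TF_i is the 1-based position of the first test in the ordering that reveals fault i. The following is a minimal sketch of that standard formula, not the authors' evaluation code; the function name `apfd`, the test names, and the fault mapping are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set


def apfd(order: List[str], detects: Dict[str, Set[str]]) -> float:
    """Compute APFD for one prioritized test ordering.

    order:   test names in execution order (hypothetical names).
    detects: maps a test name to the set of fault ids it reveals.
    Assumes every fault is revealed by at least one test in `order`.
    """
    n = len(order)
    faults = set().union(*detects.values())
    m = len(faults)
    if n == 0 or m == 0:
        raise ValueError("need at least one test and one detected fault")

    # TF_i: 1-based position of the first test that reveals fault i.
    first_pos: Dict[str, int] = {}
    for pos, test in enumerate(order, start=1):
        for fault in detects.get(test, set()):
            first_pos.setdefault(fault, pos)

    if len(first_pos) != m:
        raise ValueError("some fault is never revealed by the given order")

    return 1 - sum(first_pos.values()) / (n * m) + 1 / (2 * n)


# Toy data (illustrative only): five tests, two faults.
detects = {
    "t1": set(),
    "t2": {"f1"},
    "t3": set(),
    "t4": {"f1", "f2"},
    "t5": set(),
}
early = ["t4", "t2", "t1", "t3", "t5"]  # fault-revealing tests first
late = ["t1", "t3", "t5", "t2", "t4"]   # fault-revealing tests last

print(f"early ordering APFD: {apfd(early, detects):.4f}")
print(f"late ordering APFD:  {apfd(late, detects):.4f}")
```

On this toy input the ordering that runs the fault-revealing tests first scores close to 0.9, while the ordering that runs them last scores close to 0.2, which is the sense in which a higher APFD indicates earlier fault detection.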


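Source journal

Automated Software Engineering (Engineering & Technology; Computer Science: Software Engineering)

CiteScore: 4.80

Self-citation rate: 11.80%

Articles published:

Review time: >12 weeks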

About the journal: This journal publishes research, tutorial papers, surveys, and accounts of significant industrial experience in the foundations, techniques, tools, and applications of automated software engineering technology. This includes the study of techniques for constructing, understanding, adapting, and modeling software artifacts and processes. Coverage in Automated Software Engineering examines both automatic systems and collaborative systems as well as computational models of human software engineering activities. In addition, it presents knowledge representations and artificial intelligence techniques applicable to automated software engineering, and formal techniques that support or provide theoretical foundations. The journal also includes reviews of books, software, conferences, and workshops.