{"title":"Unveiling the impact of unchanged modules across versions on the evaluation of within-project defect prediction models","authors":"Xutong Liu, Yufei Zhou, Zeyu Lu, Yuanqing Mei, Yibiao Yang, Junyan Qian, Yuming Zhou","doi":"10.1002/smr.2715","DOIUrl":null,"url":null,"abstract":"<div>\n \n \n <section>\n \n <h3> Background</h3>\n \n <p>Software defect prediction (SDP) is a topic actively researched in the software engineering community. Within-project defect prediction (WPDP) involves using labeled modules from previous versions of the same project to train classifiers. Over time, many defect prediction models have been evaluated under the WPDP scenario.</p>\n </section>\n \n <section>\n \n <h3> Problem</h3>\n \n <p>Data duplication poses a significant challenge in current WPDP evaluation procedures. Unchanged modules, characterized by identical executable source code, are frequently present in both target and source versions during experimentation. However, it is still unclear how and to what extent the presence of unchanged modules affects the performance assessment of WPDP models and the comparison of multiple WPDP models.</p>\n </section>\n \n <section>\n \n <h3> Method</h3>\n \n <p>In this paper, we provide a method to detect and remove unchanged modules from defect datasets and unveil the impact of data duplication in WPDP on model evaluation.</p>\n </section>\n \n <section>\n \n <h3> Results</h3>\n \n <p>The experiments conducted on 481 target versions from 62 projects provide evidence that data duplication significantly affects the reported performance values of individual learners in WPDP. However, when ranking multiple WPDP models based on prediction performance, the impact of removing unchanged instances is not substantial. Nevertheless, it is important to note that removing unchanged instances does have a slight influence on the selection of models with better generalization.</p>\n </section>\n \n <section>\n \n <h3> Conclusion</h3>\n \n <p>We recommend that future WPDP studies take into consideration the removal of unchanged modules from target versions when evaluating the performance of their models. This practice will enhance the reliability and validity of the results obtained in WPDP research, leading to improved understanding and advancements in defect prediction models.</p>\n </section>\n </div>","PeriodicalId":48898,"journal":{"name":"Journal of Software-Evolution and Process","volume":"36 12","pages":""},"PeriodicalIF":1.7000,"publicationDate":"2024-08-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Software-Evolution and Process","FirstCategoryId":"94","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/smr.2715","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}
Abstract
Background
Software defect prediction (SDP) is an actively researched topic in the software engineering community. Within-project defect prediction (WPDP) trains classifiers on labeled modules from previous versions of the same project. Over the years, many defect prediction models have been evaluated under the WPDP scenario.
Problem
Data duplication poses a significant challenge to current WPDP evaluation procedures. Unchanged modules, that is, modules whose executable source code is identical across versions, frequently appear in both the source (training) and target (test) versions used in experiments. However, it remains unclear how, and to what extent, the presence of unchanged modules affects the performance assessment of individual WPDP models and the comparison of multiple WPDP models.
Method
In this paper, we present a method to detect and remove unchanged modules from defect datasets and use it to unveil the impact of data duplication on model evaluation in WPDP.
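The abstract does not spell out the detection procedure. As a rough illustration, the following minimal Python sketch treats a target-version module as unchanged when its comment- and whitespace-stripped source code hashes to the same value as a module in the source version; the function names, the SHA-256 hashing, and the normalization rules are assumptions for illustration, not the paper's actual implementation.

```python
import hashlib
import re


def normalize(source: str) -> str:
    """Reduce a module to its 'executable' source code by stripping
    Java-style comments and collapsing whitespace (a hypothetical
    normalization; the paper's exact rules may differ)."""
    source = re.sub(r"//.*?$|/\*.*?\*/", "", source,
                    flags=re.DOTALL | re.MULTILINE)
    return re.sub(r"\s+", " ", source).strip()


def remove_unchanged_modules(source_version, target_version):
    """Drop target-version modules whose normalized source code also
    occurs in the source (training) version.

    Both arguments map module paths to raw source code strings.
    """
    seen = {hashlib.sha256(normalize(code).encode()).hexdigest()
            for code in source_version.values()}
    return {path: code for path, code in target_version.items()
            if hashlib.sha256(normalize(code).encode()).hexdigest() not in seen}
```

With a filter of this kind, classifiers would still be trained on the full source version, while performance would be measured only on the deduplicated target version, matching the recommendation in the Conclusion.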
Results
Experiments on 481 target versions from 62 projects provide evidence that data duplication significantly affects the reported performance values of individual learners in WPDP. In contrast, when multiple WPDP models are ranked by prediction performance, the impact of removing unchanged instances is not substantial, although the removal does slightly influence the selection of models with better generalization.
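The abstract does not state how ranking stability was assessed. One plausible way to check whether deduplication changes a model ranking is to correlate the rankings obtained with and without unchanged modules, for example with Kendall's tau; the sketch below is illustrative only, and the model names and scores are hypothetical.

```python
from scipy.stats import kendalltau

# Hypothetical mean performance values (e.g., AUC) of four WPDP learners,
# computed with and without unchanged modules in the target versions.
models = ["LR", "NB", "RF", "SVM"]
auc_with_unchanged = [0.78, 0.74, 0.81, 0.76]
auc_deduplicated = [0.71, 0.66, 0.73, 0.69]

# Kendall's tau is rank-based, so the raw scores can be compared directly;
# tau close to 1 means the model ranking is stable under deduplication.
tau, p_value = kendalltau(auc_with_unchanged, auc_deduplicated)
print(f"Kendall's tau between rankings: {tau:.2f} (p = {p_value:.3f})")
```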
Conclusion
We recommend that future WPDP studies remove unchanged modules from target versions when evaluating the performance of their models. This practice will improve the reliability and validity of the results obtained in WPDP research, leading to a better understanding of, and further advances in, defect prediction models.