{"title":"A study on cross-project fault prediction through resampling and feature reduction along with source projects selection","authors":"Pravali Manchala, Manjubala Bisi","doi":"10.1007/s10515-024-00465-6","DOIUrl":null,"url":null,"abstract":"<div><p>Software Fault Prediction is an efficient strategy to improve the quality of software systems. In reality, there won’t be adequate software fault data for a recently established project where the Cross-Project Fault Prediction (CPFP) model plays an important role. CPFP model utilizes other finished projects data to predict faults in ongoing projects. Existing CPFP methods concentrate on discrepancies in distribution between projects without exploring relevant source projects selection combined with distribution gap minimizing methods. Additionally, performing imbalance learning and feature extraction in software projects only balances the data and reduces features by eliminating redundant and unrelated features. This paper proposes a novel SRES method called Similarity and applicability based source projects selection, REsampling, and Stacked autoencoder (SRES) model. To analyze the performance of relevant source projects over CPFP, we proposed a new similarity and applicability based source projects selection method to automatically select sources for the target project. In addition, we introduced a new resampling method that balances source project data by generating data related to the target project, eliminating unrelated data, and reducing the distribution gap. Then, SRES uses the stacked autoencoder to extract informative intermediate feature data to further improve the prediction accuracy of the CPFP. SRES performs comparable to or superior to the conventional CPFP model on six different performance indicators over 24 projects by effectively addressing the issues of CPFP. In conclusion, we can ensure that resampling and feature reduction techniques, along with source projects selection can improve cross-project prediction performance.</p></div>","PeriodicalId":55414,"journal":{"name":"Automated Software Engineering","volume":"31 2","pages":""},"PeriodicalIF":2.0000,"publicationDate":"2024-08-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Automated Software Engineering","FirstCategoryId":"94","ListUrlMain":"https://link.springer.com/article/10.1007/s10515-024-00465-6","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}
引用次数: 0
Abstract
Software Fault Prediction is an efficient strategy to improve the quality of software systems. In reality, there won’t be adequate software fault data for a recently established project where the Cross-Project Fault Prediction (CPFP) model plays an important role. CPFP model utilizes other finished projects data to predict faults in ongoing projects. Existing CPFP methods concentrate on discrepancies in distribution between projects without exploring relevant source projects selection combined with distribution gap minimizing methods. Additionally, performing imbalance learning and feature extraction in software projects only balances the data and reduces features by eliminating redundant and unrelated features. This paper proposes a novel SRES method called Similarity and applicability based source projects selection, REsampling, and Stacked autoencoder (SRES) model. To analyze the performance of relevant source projects over CPFP, we proposed a new similarity and applicability based source projects selection method to automatically select sources for the target project. In addition, we introduced a new resampling method that balances source project data by generating data related to the target project, eliminating unrelated data, and reducing the distribution gap. Then, SRES uses the stacked autoencoder to extract informative intermediate feature data to further improve the prediction accuracy of the CPFP. SRES performs comparable to or superior to the conventional CPFP model on six different performance indicators over 24 projects by effectively addressing the issues of CPFP. In conclusion, we can ensure that resampling and feature reduction techniques, along with source projects selection can improve cross-project prediction performance.
期刊介绍:
This journal details research, tutorial papers, survey and accounts of significant industrial experience in the foundations, techniques, tools and applications of automated software engineering technology. This includes the study of techniques for constructing, understanding, adapting, and modeling software artifacts and processes.
Coverage in Automated Software Engineering examines both automatic systems and collaborative systems as well as computational models of human software engineering activities. In addition, it presents knowledge representations and artificial intelligence techniques applicable to automated software engineering, and formal techniques that support or provide theoretical foundations. The journal also includes reviews of books, software, conferences and workshops.