A study on cross-project fault prediction through resampling and feature reduction along with source projects selection

IF 3.1 Zone 2 Computer Science Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Automated Software Engineering Pub Date : 2024-08-16 DOI:10.1007/s10515-024-00465-6

Pravali Manchala, Manjubala Bisi

Automated Software Engineering, vol. 31, no. 2. Journal Article. https://link.springer.com/article/10.1007/s10515-024-00465-6

Citations: 0

Abstract

Software fault prediction is an effective strategy for improving the quality of software systems. In practice, a newly started project rarely has enough fault data of its own, which is where Cross-Project Fault Prediction (CPFP) plays an important role: a CPFP model uses data from other, finished projects to predict faults in an ongoing project. Existing CPFP methods concentrate on distribution discrepancies between projects, without exploring the selection of relevant source projects in combination with methods that minimize the distribution gap. Additionally, applying imbalance learning and feature extraction to software projects only balances the data and reduces the feature set by eliminating redundant and unrelated features. This paper proposes SRES, a model that combines Similarity and applicability based source project selection, REsampling, and a Stacked autoencoder. To analyze how relevant source projects affect CPFP performance, we propose a new similarity and applicability based selection method that automatically chooses source projects for a target project. In addition, we introduce a new resampling method that balances source project data by generating data related to the target project, eliminating unrelated data, and reducing the distribution gap. SRES then uses a stacked autoencoder to extract informative intermediate features and further improve CPFP prediction accuracy. Across 24 projects and six performance indicators, SRES performs comparably to or better than the conventional CPFP model by effectively addressing these issues. We conclude that resampling and feature reduction techniques, together with source project selection, can improve cross-project prediction performance.
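The abstract names three stages (source project selection, resampling toward the target, and stacked-autoencoder feature extraction) but does not give the similarity measure, the resampling rule, or the network architecture. The following is therefore only a minimal sketch under loudly stated assumptions, not the authors' SRES implementation: cosine similarity of per-feature means stands in for the similarity and applicability score, a SMOTE-like interpolation toward the nearest target row stands in for the resampling step, and a small greedy layer-wise tanh autoencoder stands in for the stacked autoencoder. All function names and parameters are hypothetical.

```python
import numpy as np

def select_sources(target_X, sources, k=2):
    """Return the names of the k candidate projects most similar to the target.
    ASSUMPTION: similarity = cosine between per-feature mean vectors."""
    t = target_X.mean(axis=0)
    scores = {}
    for name, (X, _y) in sources.items():
        s = X.mean(axis=0)
        denom = np.linalg.norm(s) * np.linalg.norm(t) + 1e-12
        scores[name] = float(s @ t / denom)
    return sorted(scores, key=scores.get, reverse=True)[:k]

def resample_toward_target(X, y, target_X, seed=0):
    """Balance classes by adding synthetic faulty rows (label 1).
    ASSUMPTION: each synthetic row lies on the segment between a random
    faulty source row and its nearest (Euclidean) target row."""
    rng = np.random.default_rng(seed)
    minority = X[y == 1]
    deficit = int((y == 0).sum() - (y == 1).sum())
    if deficit <= 0 or len(minority) == 0:
        return X, y
    picks = minority[rng.integers(0, len(minority), size=deficit)]
    dists = np.linalg.norm(picks[:, None, :] - target_X[None, :, :], axis=2)
    nearest = target_X[dists.argmin(axis=1)]
    step = rng.random((deficit, 1))
    synthetic = picks + step * (nearest - picks)
    new_y = np.concatenate([y, np.ones(deficit, dtype=y.dtype)])
    return np.vstack([X, synthetic]), new_y

def train_autoencoder_layer(X, hidden, epochs=200, lr=0.01, seed=0):
    """Fit one tanh-encoder / linear-decoder layer by gradient descent on
    mean squared reconstruction error; return the encoder weights."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.1, (hidden, d)); b2 = np.zeros(d)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)          # encode
        err = (H @ W2 + b2 - X) / n       # gradient of the loss w.r.t. the reconstruction
        dH = (err @ W2.T) * (1.0 - H ** 2)
        W2 -= lr * (H.T @ err); b2 -= lr * err.sum(axis=0)
        W1 -= lr * (X.T @ dH);  b1 -= lr * dH.sum(axis=0)
    return W1, b1

def stacked_encode(X, layer_sizes):
    """Greedy layer-wise stacking: each layer is trained on the codes
    produced by the previous one. Inputs are assumed standardized."""
    codes = X
    for hidden in layer_sizes:
        W, b = train_autoencoder_layer(codes, hidden)
        codes = np.tanh(codes @ W + b)
    return codes
```

In such a sketch, the encoded and rebalanced source data would then be passed to any ordinary classifier, with the target project encoded by the same layers; the choice of classifier and of the six evaluation indicators is left open here because the abstract does not name them.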




Source journal

Automated Software Engineering (Engineering & Technology, Computer Science: Software Engineering)

CiteScore

4.80

Self-citation rate

11.80%

Articles published

Review time

>12 weeks

About the journal: This journal publishes research, tutorial papers, surveys, and accounts of significant industrial experience in the foundations, techniques, tools, and applications of automated software engineering technology. This includes the study of techniques for constructing, understanding, adapting, and modeling software artifacts and processes. Coverage in Automated Software Engineering examines both automatic and collaborative systems, as well as computational models of human software engineering activities. In addition, it presents knowledge representations and artificial intelligence techniques applicable to automated software engineering, and formal techniques that support or provide theoretical foundations. The journal also includes reviews of books, software, conferences, and workshops.