Industrial applications of artificial intelligence in software defects prediction: Systematic review, challenges, and future works

Impact factor: 4.0 · Zone 3 (Computer Science) · JCR Q1: COMPUTER SCIENCE, HARDWARE & ARCHITECTURE
Alfredo Daza, Gonzalo Apaza-Perez, Katherine Samanez-Torres, Juan Benites-Noriega, Orlando Llanos Gonzales, Pablo Cesar Condori-Cutipa
Computers & Electrical Engineering, vol. 124, Article 110411. Published 2025-05-01. DOI: 10.1016/j.compeleceng.2025.110411
https://www.sciencedirect.com/science/article/pii/S0045790625003544
Citations: 0

Abstract

Software defect prediction is a persistent challenge in industrial software engineering and a significant source of quality and cost problems in software development worldwide. This study examines the quartiles, countries, keywords, techniques, metrics, tools, platforms or languages, variables, data sources, and datasets used in software defect prediction. A comprehensive review of 45 articles published from 2019 to 2023, drawn from 5 databases (Scopus, ProQuest, ScienceDirect, EBSCOhost, and Web of Science), was conducted following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) methodology. Results show that 60.00 % of the studies were carried out in 2023, and 68.89 % of the journals were in the Q1 and Q2 quartiles. The most common techniques were Support Vector Machine (42.22 %) and Random Forest (35.56 %). The most commonly used evaluation metrics were Accuracy and F1-Score (68.89 %). Python was the main programming language (35.56 %), with thousands of lines of code (KLOC, 31.11 %) and cyclomatic complexity (26.67 %) as the key variables. Finally, NASA's Metrics Data Program Data Repository was the most used data source (31.11 %), with datasets ranging from 759 instances and 37 attributes to 3579 instances and 38 attributes across 5 projects: CM1, MW1, PC1, PC3, and PC4. This systematic review provides scientific evidence on how machine learning algorithms aid in predicting software defects and improving development processes. It also discusses trends, limitations, successful approaches, and areas for improvement, and offers recommendations for future research.
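
The percentages in the abstract can be read back as article counts. The sketch below is only an illustration: it assumes that every reported percentage is a share of the 45 reviewed articles, which the abstract implies but does not state outright, and the labels and the `implied_count` helper are hypothetical names chosen for this example.

```python
# Assumption: each percentage in the abstract is a share of the 45 reviewed articles.
TOTAL_ARTICLES = 45

# Percentages as reported in the abstract; labels are informal shorthand.
reported_shares = {
    "published in 2023": 60.00,
    "Q1 or Q2 journal": 68.89,
    "Support Vector Machine": 42.22,
    "Random Forest": 35.56,
    "Accuracy and F1-Score": 68.89,
    "Python": 35.56,
    "thousands of lines of code (KLOC)": 31.11,
    "cyclomatic complexity": 26.67,
    "NASA Metrics Data Program repository": 31.11,
}


def implied_count(percent, total=TOTAL_ARTICLES):
    """Return the whole number of articles whose share of `total` rounds to `percent`."""
    count = round(percent * total / 100)
    # Reject the count if it does not reproduce the two-decimal percentage.
    if abs(100 * count / total - percent) > 0.005:
        raise ValueError(f"{percent} % is not a whole-article share of {total}")
    return count


counts = {label: implied_count(p) for label, p in reported_shares.items()}

for label, n in counts.items():
    print(f"{label}: {n} of {TOTAL_ARTICLES}")
```

Under this assumption every reported figure corresponds to a whole number of articles, for example 27 of 45 for studies carried out in 2023 and 19 of 45 for Support Vector Machine.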

Source journal
Computers & Electrical Engineering (Engineering Technology: Engineering, Electrical & Electronic)
CiteScore: 9.20
Self-citation rate: 7.00%
Articles published: 661
Review time: 47 days
About the journal: The impact of computers has nowhere been more revolutionary than in electrical engineering. The design, analysis, and operation of electrical and electronic systems are now dominated by computers, a transformation that has been motivated by the natural ease of interface between computers and electrical systems, and the promise of spectacular improvements in speed and efficiency. Published since 1973, Computers & Electrical Engineering provides rapid publication of topical research into the integration of computer technology and computational techniques with electrical and electronic systems. The journal publishes papers featuring novel implementations of computers and computational techniques in areas like signal and image processing, high-performance computing, parallel processing, and communications. Special attention will be paid to papers describing innovative architectures, algorithms, and software tools.