Alfredo Daza, Gonzalo Apaza-Perez, Katherine Samanez-Torres, Juan Benites-Noriega, Orlando Llanos Gonzales, Pablo Cesar Condori-Cutipa
Industrial applications of artificial intelligence in software defects prediction: Systematic review, challenges, and future works
Computers & Electrical Engineering, vol. 124, Article 110411, published 2025-05-01. DOI: 10.1016/j.compeleceng.2025.110411. URL: https://www.sciencedirect.com/science/article/pii/S0045790625003544
Citation count: 0
Abstract
Software defect prediction is a persistent challenge in industrial software engineering and a significant problem for quality and cost in software development worldwide. This study examines the journal quartiles, countries, keywords, techniques, metrics, tools, platforms or languages, variables, data sources, and datasets used in software defect prediction. A review of 45 articles published from 2019 to 2023, retrieved from 5 databases (Scopus, ProQuest, ScienceDirect, EBSCOhost, and Web of Science), was conducted following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) methodology. Results show that 60.00% of the studies were carried out in 2023, and 68.89% of journals were ranked in the Q1 or Q2 quartiles. The most common techniques were Support Vector Machine (42.22%) and Random Forest (35.56%). The most commonly used evaluation metrics were Accuracy and F1-Score (68.89%). Python was the main programming language (35.56%), with thousands of lines of code (KLOC, 31.11%) and cyclomatic complexity (26.67%) as key variables. Finally, NASA's Metrics Data Program Data Repository was the most used data source (31.11%), with datasets ranging from a minimum of 759 instances and 37 attributes to a maximum of 3579 instances and 38 attributes across 5 projects: CM1, MW1, PC1, PC3, and PC4. This systematic review provides scientific evidence on how machine learning algorithms aid in predicting software defects and improving development processes. In addition, it offers a detailed discussion that identifies trends, limitations, successful approaches, and areas for improvement, and provides recommendations for future research.
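The abstract names Accuracy and F1-Score as the most frequently reported evaluation metrics. As a minimal illustration of how the two can diverge on imbalanced defect data, the sketch below computes both from confusion-matrix counts. The function name `accuracy_and_f1` and the counts are hypothetical and chosen only for illustration; they are not taken from the reviewed studies or from the NASA datasets.

```python
def accuracy_and_f1(tp, fp, fn, tn):
    """Return (accuracy, F1) for the defective class from confusion-matrix counts.

    tp: defective modules predicted defective
    fp: clean modules predicted defective
    fn: defective modules predicted clean
    tn: clean modules predicted clean
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return accuracy, f1


# Hypothetical example: 100 modules, 10 of them defective.
acc, f1 = accuracy_and_f1(tp=4, fp=2, fn=6, tn=88)
print(f"accuracy={acc:.2f}, f1={f1:.2f}")  # accuracy=0.92, f1=0.50
```

With these made-up counts the classifier misses 6 of the 10 defective modules yet still reaches 92% accuracy, while the F1-Score of 0.50 exposes the weak recall. This is one plausible reason for reporting the two metrics together.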
About the journal:
The impact of computers has nowhere been more revolutionary than in electrical engineering. The design, analysis, and operation of electrical and electronic systems are now dominated by computers, a transformation that has been motivated by the natural ease of interface between computers and electrical systems, and the promise of spectacular improvements in speed and efficiency.
Published since 1973, Computers & Electrical Engineering provides rapid publication of topical research into the integration of computer technology and computational techniques with electrical and electronic systems. The journal publishes papers featuring novel implementations of computers and computational techniques in areas like signal and image processing, high-performance computing, parallel processing, and communications. Special attention will be paid to papers describing innovative architectures, algorithms, and software tools.