Unsupervised Software Defect Prediction Through Multiview Clustering
Zhiqiang Li, Hongyu Zhang, Xiao-Yuan Jing, Wangyang Yu, Yueyue Liu
IEEE Transactions on Reliability, vol. 74, no. 3, pp. 3356-3370. Published 2025-03-26.
DOI: 10.1109/TR.2025.3548107
URL: https://ieeexplore.ieee.org/document/10938692/
Impact factor: 5.7. JCR: Q1 (Computer Science, Hardware & Architecture).
Citations: 0
Abstract
The core goal of software defect prediction (SDP) is to identify modules with a high likelihood of defects, so that quality assurance activities can be prioritized with low inspection effort. Supervised defect prediction models have been studied extensively. However, these methods need labeled data to obtain enough training modules, and labeling consumes considerable human effort. Cross-project defect prediction reuses models trained on other projects that have enough historical data, but this strategy is often hindered by large distribution differences across projects and by data privacy concerns. Unsupervised learning is an alternative for unlabeled data, but existing work mainly focuses on single-view prediction that concatenates all software metrics, which ignores the diversity and complementarity of different types of metrics. This study proposes a novel approach, multiview unsupervised software defect prediction (MUSDP), which collaboratively learns the diversity and complementarity of different views to build a robust and reliable defect prediction model. Extensive experiments on 28 releases from eight software projects indicate that MUSDP achieves superior or comparable results in terms of G-mean, AUC, $P_{\text{opt}}$, and Recall@20% compared with competing supervised and unsupervised methods. Regarding the interpretation of MUSDP, the numbers of added and deleted lines significantly influence its predictions.
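Two of the evaluation measures named in the abstract can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the paper's evaluation code. G-mean is taken as the usual geometric mean of recall and specificity. Recall@20% is assumed to be the effort-aware form (the share of defective modules found when inspecting modules in descending predicted-risk order until 20% of the total lines of code is spent), because the abstract does not give its exact definition. The function names `g_mean` and `recall_at_effort` and the `loc` argument are hypothetical.

```python
import math


def g_mean(y_true, y_pred):
    """Geometric mean of recall and specificity for binary labels (1 = defective)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(recall * specificity)


def recall_at_effort(y_true, scores, loc, effort=0.2):
    """Share of defective modules found within an inspection budget.

    Assumption: effort is measured in lines of code (loc), and modules are
    inspected in descending order of predicted risk (scores) until the next
    module would exceed `effort` of the total LOC.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    budget = effort * sum(loc)
    spent = 0
    found = 0
    for i in order:
        if spent + loc[i] > budget:
            break  # next module does not fit in the remaining budget
        spent += loc[i]
        found += y_true[i]
    total_defective = sum(y_true)
    return found / total_defective if total_defective else 0.0
```

For example, `g_mean([1, 0, 1, 0], [1, 0, 0, 0])` combines a recall of 0.5 with a specificity of 1.0. Other budget conventions exist (for instance, partially inspecting the module that crosses the budget), so results from this sketch are not directly comparable with the paper's reported numbers.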
About the journal:
IEEE Transactions on Reliability is a refereed journal for the reliability and allied disciplines including, but not limited to, maintainability, physics of failure, life testing, prognostics, design and manufacture for reliability, reliability for systems of systems, network availability, mission success, warranty, safety, and various measures of effectiveness. Topics eligible for publication range from hardware to software, from materials to systems, from consumer and industrial devices to manufacturing plants, from individual items to networks, from techniques for making things better to ways of predicting and measuring behavior in the field. As an engineering subject that supports new and existing technologies, we constantly expand into new areas of the assurance sciences.