Unsupervised Software Defect Prediction Through Multiview Clustering

IF 5.7, CAS Zone 2 (Computer Science), JCR Q1: COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

IEEE Transactions on Reliability, Pub Date: 2025-03-26, DOI: 10.1109/TR.2025.3548107

Zhiqiang Li;Hongyu Zhang;Xiao-Yuan Jing;Wangyang Yu;Yueyue Liu

Volume 74, issue 3, pages 3356-3370. https://ieeexplore.ieee.org/document/10938692/


Abstract

The core goal of software defect prediction (SDP) is to identify modules with a high likelihood of defects, so that quality assurance activities can be prioritized with low inspection effort. Supervised defect prediction models have been studied extensively. However, these methods require labeled data to obtain enough training modules, and labeling consumes substantial human effort. Cross-project defect prediction primarily reuses models trained on other projects that have enough historical data. However, this strategy is often hindered by large distribution differences across projects and by data privacy concerns. Unsupervised learning is an alternative when data are unlabeled, but existing work mainly performs single-view prediction by concatenating all the software metrics, which ignores the diversity and complementarity of different types of metrics. This study proposes a novel approach, multiview unsupervised software defect prediction (MUSDP), which aims to collaboratively learn the diversity and complementarity of different views to build a robust and reliable defect prediction model. Extensive experiments on 28 releases from eight software projects indicate that MUSDP achieves superior or comparable results in terms of G-mean, AUC, $P_{\text{opt}}$, and Recall@20% compared with competing supervised and unsupervised methods. In interpreting MUSDP, the numbers of added and deleted lines significantly influence its predictions.
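Two of the measures named in the abstract can be illustrated with a minimal sketch. The abstract does not define them, so the definitions below are assumptions drawn from common usage in defect prediction: G-mean is taken as the geometric mean of recall on defective modules and recall on clean modules, and Recall@20% as the share of defective modules found when modules are inspected in descending score order until 20% of the total lines of code have been spent. The function names `g_mean` and `recall_at_effort`, and the use of lines of code as the effort proxy, are hypothetical and are not taken from the paper.

```python
import math


def g_mean(y_true, y_pred):
    """Geometric mean of recall on defective (1) and clean (0) modules.

    Assumed definition; the abstract names the measure but does not define it.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(recall * specificity)


def recall_at_effort(y_true, scores, loc, effort=0.20):
    """Share of defective modules found within an inspection budget.

    Modules are visited in descending score order; inspection stops at the
    first module that would push cumulative LOC past `effort` of total LOC.
    Using LOC as the effort proxy is an assumption of this sketch.
    """
    total_defects = sum(y_true)
    if total_defects == 0:
        return 0.0
    budget = effort * sum(loc)
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    spent, found = 0.0, 0
    for i in order:
        if spent + loc[i] > budget:
            break
        spent += loc[i]
        found += y_true[i]
    return found / total_defects


# Toy data, invented for illustration only (not results from the paper).
labels = [1, 0, 1, 0, 0, 1]
predicted = [1, 0, 0, 0, 1, 1]
scores = [0.9, 0.1, 0.2, 0.8, 0.3, 0.7]
lines_of_code = [10, 50, 10, 20, 100, 10]

print(g_mean(labels, predicted))
print(recall_at_effort(labels, scores, lines_of_code))
```

For an unsupervised method, `scores` would come from the clustering output (for example, a per-module defect-proneness ranking) rather than from a trained classifier; how MUSDP produces that ranking is not described in the abstract.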


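Source journal

IEEE Transactions on Reliability (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 12.20

Self-citation rate: 8.50%

Articles published: 153

Review time: 7.5 months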

About the journal: IEEE Transactions on Reliability is a refereed journal for the reliability and allied disciplines including, but not limited to, maintainability, physics of failure, life testing, prognostics, design and manufacture for reliability, reliability for systems of systems, network availability, mission success, warranty, safety, and various measures of effectiveness. Topics eligible for publication range from hardware to software, from materials to systems, from consumer and industrial devices to manufacturing plants, from individual items to networks, from techniques for making things better to ways of predicting and measuring behavior in the field. As an engineering subject that supports new and existing technologies, we constantly expand into new areas of the assurance sciences.