A novel instance-based method for cross-project just-in-time defect prediction

Xiaoyan Zhu, Tian Qiu, Jiayin Wang, Xin Lai
Software: Practice and Experience, published 2024-01-24 (journal article)
DOI: 10.1002/spe.3316 (https://doi.org/10.1002/spe.3316)
Citations: 0

Abstract

Cross-project (CP) just-in-time software defect prediction (JIT-SDP) uses data from other projects to overcome the scarcity of training data in the early stages of a software project, so that a high-performing JIT-SDP classifier can be trained before the project has accumulated enough history of its own. The primary challenge for JIT-SDP in a cross-project setting is that the training (source) data and the test (target) data follow different distributions. To tackle this issue, we select the source instances that most closely resemble the target data and build classifiers from them. Software defect datasets also commonly suffer from class imbalance: defective instances are far fewer than clean ones, which typically degrades classifier performance. In this study, we propose ISKMM, an instance selection method based on kernel mean matching (KMM) that addresses both knowledge transfer and class imbalance in cross-project defect prediction (CPDP). The method uses KMM to assess the similarity between the training data and the target data, retains the source instances with high similarity, and resamples the data according to the similarity weights to mitigate class imbalance. Experiments on 10 open-source projects show that ISKMM outperforms existing CP single-source software defect prediction (SDP) algorithms. Moreover, with the proposed algorithm, defect predictors built from cross-project data achieve overall performance comparable to that of predictors learned from within-project data.
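
The KMM-based selection and resampling summarized in the abstract can be illustrated with a minimal Python/NumPy sketch. It is not the authors' implementation. The following are all assumptions: the RBF kernel, the projected-gradient solver (KMM is usually solved with a quadratic-programming solver), the defaults for `gamma`, `upper` and `eps`, and the post-processing rule in `select_and_resample`, which keeps the top `keep_ratio` fraction of source instances by weight and then oversamples the defective class with probability proportional to its weights. Every function name is hypothetical. The weights solve the standard KMM problem: minimize 0.5 * b'Kb - kappa'b subject to 0 <= b_i <= B and |sum(b) - n_s| <= n_s * eps. Here K is the source kernel matrix and kappa_i = (n_s / n_t) * sum_j k(x_i, x'_j) sums kernel values against the target instances.

```python
import numpy as np


def rbf_kernel(a, b, gamma):
    """Gaussian (RBF) kernel matrix with entries exp(-gamma * ||a_i - b_j||^2)."""
    sq = (a ** 2).sum(axis=1)[:, None] + (b ** 2).sum(axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def project(v, upper, lo_sum, hi_sum, iters=60):
    """Euclidean projection of v onto {0 <= beta_i <= upper, lo_sum <= sum(beta) <= hi_sum}."""
    clipped = np.clip(v, 0.0, upper)
    s = clipped.sum()
    if lo_sum <= s <= hi_sum:
        return clipped
    # The projection is clip(v - tau, 0, upper) for a scalar tau that makes the
    # violated sum bound tight; the clipped sum is non-increasing in tau, so bisect.
    target = hi_sum if s > hi_sum else lo_sum
    lo, hi = (0.0, v.max()) if s > hi_sum else (v.min() - upper, 0.0)
    for _ in range(iters):
        tau = 0.5 * (lo + hi)
        if np.clip(v - tau, 0.0, upper).sum() > target:
            lo = tau
        else:
            hi = tau
    return np.clip(v - 0.5 * (lo + hi), 0.0, upper)


def kmm_weights(xs, xt, gamma=1.0, upper=1000.0, eps=0.1, steps=500):
    """Hypothetical KMM helper: weights for source rows xs given target rows xt.

    Minimizes 0.5 * b'Kb - kappa'b subject to 0 <= b_i <= upper and
    |sum(b) - ns| <= ns * eps, using projected gradient descent (assumed solver).
    """
    ns, nt = len(xs), len(xt)
    K = rbf_kernel(xs, xs, gamma)
    kappa = (ns / nt) * rbf_kernel(xs, xt, gamma).sum(axis=1)
    step = 1.0 / np.linalg.eigvalsh(K).max()  # 1 / Lipschitz constant of the gradient
    beta = np.ones(ns)  # uniform weights are feasible
    for _ in range(steps):
        grad = K @ beta - kappa
        beta = project(beta - step * grad, upper, ns * (1 - eps), ns * (1 + eps))
    return beta


def select_and_resample(xs, ys, beta, keep_ratio=0.5, seed=0):
    """Hypothetical ISKMM-style post-processing (an assumption, not the paper's rule):
    keep the most target-like source rows, then oversample the minority class
    with probability proportional to the KMM weights until classes are balanced."""
    rng = np.random.default_rng(seed)
    n_keep = max(1, int(round(keep_ratio * len(beta))))
    keep = np.argsort(beta)[::-1][:n_keep]  # highest KMM weights first
    x, y, w = xs[keep], ys[keep], beta[keep]
    classes, counts = np.unique(y, return_counts=True)
    if len(classes) < 2:
        return x, y  # nothing to rebalance
    minority = classes[counts.argmin()]
    idx = np.flatnonzero(y == minority)
    deficit = counts.max() - counts.min()
    p = w[idx] + 1e-12  # avoid an all-zero sampling distribution
    extra = rng.choice(idx, size=deficit, replace=True, p=p / p.sum())
    return np.vstack([x, x[extra]]), np.concatenate([y, y[extra]])


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    xs = rng.normal(0.0, 1.0, size=(200, 3))   # synthetic source project
    ys = (rng.random(200) < 0.15).astype(int)  # roughly 15% "defective" labels
    xt = rng.normal(0.8, 1.0, size=(50, 3))    # shifted synthetic target project
    beta = kmm_weights(xs, xt, gamma=0.5)
    x_bal, y_bal = select_and_resample(xs, ys, beta)
    print(len(x_bal), np.bincount(y_bal))
```

On shifted synthetic data like this, source rows closer to the target distribution should tend to receive larger weights, so they dominate both the retained subset and the oversampled defective instances. A real pipeline would then train a JIT-SDP classifier on the rebalanced data and tune `gamma` and `keep_ratio` for each source/target project pair.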