Multi-source Cross Project Defect Prediction with Joint Wasserstein Distance and Ensemble Learning

Quanyi Zou, Lu Lu, Zhanyu Yang, Hao Xu
Published in: 2021 IEEE 32nd International Symposium on Software Reliability Engineering (ISSRE), 2021-10-01
DOI: 10.1109/ISSRE52982.2021.00019
Citations: 4

Abstract

Cross-Project Defect Prediction (CPDP) transfers knowledge from source software projects to a target software project. Previous research has shown that knowledge transferred from different source projects affects the target task to different degrees. One of the fundamental challenges in CPDP is therefore how to measure the amount of knowledge transferred from each source project to the target task. This article proposes a novel CPDP method, Multi-source defect prediction with Joint Wasserstein Distance and Ensemble Learning (MJWDEL), which learns transferred weights that evaluate the importance of each source project to the target task. First, the method applies the TCA technique and Logistic Regression (LR) to train a sub-model for each source project and the target project. Next, it uses a joint Wasserstein distance to characterize the source-target relationship and computes the transferred weights of the sub-models on that basis. Finally, the transferred weights are used to reweight the sub-models according to their importance in knowledge transfer to the target task. We conducted experiments on 19 software projects from the PROMISE, NASA and AEEEM datasets. Compared with several state-of-the-art CPDP methods, the proposed method substantially improves CPDP performance on four evaluation indicators (F-measure, Balance, G-measure and MCC).
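The weighting idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define the joint Wasserstein distance, so the code substitutes an average of per-feature 1-D Wasserstein distances, and it assumes a softmax over negative distances to turn distances into weights. The TCA and LR steps are omitted; the sub-model probabilities are made-up inputs, and all function names are hypothetical.

```python
import math

def wasserstein_1d(xs, ys):
    """1-D Wasserstein-1 distance between two empirical samples,
    computed as the area between their empirical CDFs."""
    xs, ys = sorted(xs), sorted(ys)
    points = sorted(set(xs) | set(ys))
    dist, i, j = 0.0, 0, 0
    for left, right in zip(points, points[1:]):
        # Advance counters so i/len(xs) and j/len(ys) are the CDFs at `left`.
        while i < len(xs) and xs[i] <= left:
            i += 1
        while j < len(ys) and ys[j] <= left:
            j += 1
        dist += abs(i / len(xs) - j / len(ys)) * (right - left)
    return dist

def mean_feature_distance(source_cols, target_cols):
    """Assumed stand-in for the paper's joint distance:
    the mean of per-feature 1-D Wasserstein distances."""
    dists = [wasserstein_1d(s, t) for s, t in zip(source_cols, target_cols)]
    return sum(dists) / len(dists)

def transfer_weights(distances, temperature=1.0):
    """Assumed mapping: softmax over negative distances, so that
    sources closer to the target receive larger weights."""
    scores = [math.exp(-d / temperature) for d in distances]
    total = sum(scores)
    return [s / total for s in scores]

def ensemble_score(sub_model_probs, weights):
    """Weighted average of the sub-models' defect probabilities."""
    return sum(w * p for w, p in zip(weights, sub_model_probs))

# Toy data: each project is a list of feature columns (two features here).
target = [[0.0, 1.0, 2.0], [5.0, 6.0, 7.0]]
source_a = [[0.0, 1.0, 3.0], [5.0, 6.0, 8.0]]  # distribution close to target
source_b = [[4.0, 5.0, 6.0], [0.0, 1.0, 2.0]]  # distribution far from target

distances = [mean_feature_distance(s, target) for s in (source_a, source_b)]
weights = transfer_weights(distances)

# Hypothetical defect probabilities from the two sub-models for one module.
score = ensemble_score([0.8, 0.3], weights)
print(weights, score)
```

In this toy setup the source whose feature distributions lie closer to the target dominates the ensemble, which is the behaviour the abstract attributes to the transferred weights. The `temperature` parameter is an illustrative knob controlling how sharply the weights concentrate on the closest source.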