Joint distribution matching model for distribution-adaptation-based cross-project defect prediction
Shaojian Qiu, Lu Lu, Siyu Jiang
IET Softw., 48 1, pp. 393-402. Published 2019-04-09.
DOI: 10.1049/IET-SEN.2018.5131 (https://doi.org/10.1049/IET-SEN.2018.5131)
Citations: 6
Abstract
Using classification methods to predict software defects is receiving a great deal of attention, and most existing studies conduct prediction under the within-project setting. However, at an early phase of the software lifecycle there is usually no or very limited labelled data with which to train an effective prediction model. Cross-project defect prediction (CPDP) has therefore been proposed as an alternative: it learns a defect predictor for a target project from the labelled data of a source project. Unlike previous CPDP methods, which mainly apply instance selection and classifier adjustment to improve performance, in this study the authors put forward a novel distribution-adaptation-based CPDP approach, joint distribution matching (JDM). Specifically, JDM aims to minimise the joint distribution divergence between the source and target projects to improve CPDP performance. By constructing an adaptive weight vector for the instances of the source project, JDM can be effective and robust at reducing marginal distribution discrepancy and conditional distribution discrepancy simultaneously. Extensive experiments verify that JDM can outperform related distribution-adaptation-based methods on 15 open-source projects derived from two types of repositories.
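The abstract does not give the JDM formulation, so the following is only a minimal sketch of the general idea it describes: a weight vector over source-project instances that is judged by how far it pulls the weighted source data towards the target data, both overall (marginal) and per class (conditional). It is not the authors' algorithm. The function names are hypothetical, a simple squared difference of means is swapped in for whatever divergence measure the paper uses, and the target labels are assumed to be pseudo-labels, since the target project is unlabelled in CPDP.

```python
import numpy as np


def weighted_mean(X, w):
    """Weighted mean of the rows of X (weights are normalised by their sum)."""
    return (w[:, None] * X).sum(axis=0) / w.sum()


def joint_discrepancy(Xs, ys, Xt, yt_pseudo, w):
    """Illustrative stand-in for a joint distribution divergence.

    marginal term:    squared distance between the weighted source mean
                      and the target mean
    conditional term: the same distance computed per class, using
                      pseudo-labels for the target project (an assumption)
    """
    Xs = np.asarray(Xs, dtype=float)
    Xt = np.asarray(Xt, dtype=float)
    ys = np.asarray(ys)
    yt_pseudo = np.asarray(yt_pseudo)
    w = np.asarray(w, dtype=float)

    marginal = np.sum((weighted_mean(Xs, w) - Xt.mean(axis=0)) ** 2)

    conditional = 0.0
    for c in np.unique(ys):
        s_mask = ys == c
        t_mask = yt_pseudo == c
        # Skip classes with no weighted source mass or no target instances.
        if w[s_mask].sum() == 0 or not t_mask.any():
            continue
        diff = weighted_mean(Xs[s_mask], w[s_mask]) - Xt[t_mask].mean(axis=0)
        conditional += np.sum(diff ** 2)

    return marginal + conditional


# Toy data: two source instances resemble the target, two do not.
Xs = np.array([[0.0, 0.0], [1.0, 1.0], [10.0, 10.0], [11.0, 11.0]])
ys = np.array([0, 1, 0, 1])
Xt = np.array([[0.0, 0.0], [1.0, 1.0]])
yt_pseudo = np.array([0, 1])

uniform = np.ones(4)
adapted = np.array([1.0, 1.0, 0.0, 0.0])  # down-weights the dissimilar instances

print(joint_discrepancy(Xs, ys, Xt, yt_pseudo, uniform))
print(joint_discrepancy(Xs, ys, Xt, yt_pseudo, adapted))
```

In this toy setting the adapted weights give a lower discrepancy than uniform weights, which is the property an adaptive weight vector is meant to achieve. A full method would also need a procedure for learning the weights and refreshing the pseudo-labels, which the abstract does not specify.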