Title: An Empirical Study of Heterogeneous Cross-Project Defect Prediction Using Various Statistical Techniques
Authors: Rohit Vashisht, S. Rizvi
Journal: International Journal of e-Collaboration (impact factor 0.2; JCR Q4, Computer Science, Information Systems)
Publication type: Journal Article
Published: 2021-04-01
DOI: 10.4018/ijec.2021040104 (https://doi.org/10.4018/ijec.2021040104)
Citations: 2
Abstract
Cross-project defect prediction (CPDP) forecasts defects in a target project using defect prediction models (DPMs) trained on defect data from another project. However, CPDP has a common limitation: the source and target projects must be described by identical feature sets. This article focuses on heterogeneous CPDP (HCPDP) modeling, which does not require the same metric set for the two applications and instead builds a DPM from metrics whose values show comparable distributions across a given pair of datasets. The paper evaluates HCPDP modeling empirically and theoretically. The approach comprises three main phases: feature ranking and feature selection, metric matching, and, finally, defect prediction in the target application. Experiments were conducted on 13 benchmark datasets from three open-source projects. The results show that HCPDP performance is comparable to that of the baseline within-project defect prediction (WPDP), and that, among the classifiers compared, the XGBoost classification model gives the best results when used with Kendall's correlation method.
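The metric-matching phase named in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say how metric values are paired or thresholded, so the helper names (`kendall_tau`, `match_metrics`), the truncation of both columns to a common length, the greedy one-to-one pairing, the `cutoff` of 0.5, and the toy data are all assumptions made here for illustration. The correlation computed is Kendall's tau-a, written in pure Python so the block has no dependencies.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    # Assumption: columns of unequal length are truncated to the shorter one.
    n = min(len(x), len(y))
    if n < 2:
        raise ValueError("need at least two paired values")
    score = 0
    for i, j in combinations(range(n), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        if prod > 0:
            score += 1      # pair ordered the same way in both columns
        elif prod < 0:
            score -= 1      # pair ordered in opposite ways
        # ties (prod == 0) contribute nothing in tau-a
    return score / (n * (n - 1) / 2)


def match_metrics(source, target, cutoff=0.5):
    """Greedily pair source and target metrics by descending tau.

    `source` and `target` map metric names to lists of values.
    Each metric is used in at most one pair; pairs scoring below
    the (hypothetical) cutoff are discarded.
    """
    scored = sorted(
        ((kendall_tau(s_vals, t_vals), s_name, t_name)
         for s_name, s_vals in source.items()
         for t_name, t_vals in target.items()),
        reverse=True,
    )
    matches, used_s, used_t = {}, set(), set()
    for tau, s_name, t_name in scored:
        if tau < cutoff:
            break  # remaining pairs score even lower
        if s_name in used_s or t_name in used_t:
            continue
        matches[s_name] = (t_name, tau)
        used_s.add(s_name)
        used_t.add(t_name)
    return matches


# Toy data: two projects described by differently named metrics.
source = {"loc": [10, 20, 30, 40, 50], "complexity": [5, 3, 4, 1, 2]}
target = {"lines": [12, 25, 31, 47, 60], "branches": [9, 7, 8, 5, 6]}
matches = match_metrics(source, target)
```

On this toy data, `loc` pairs with `lines` and `complexity` pairs with `branches`. In a full pipeline, the matched columns would then feed a classifier trained on the source project and applied to the target project.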
About the journal:
The International Journal of e-Collaboration (IJeC) addresses the design and implementation of e-collaboration technologies, assesses their behavioral impact on individuals and groups, and presents theoretical considerations on links between the use of e-collaboration technologies and behavioral patterns. An innovative collection of the latest research findings, this journal covers significant topics such as Web-based chat tools, Web-based asynchronous conferencing tools, e-mail, listservs, collaborative writing tools, group decision support systems, teleconferencing suites, workflow automation systems, and document management technologies.