Concept Drift in Software Defect Prediction: A Method for Detecting and Handling the Drift

Impact factor 3.9; Zone 3 (Computer Science); JCR Q2, COMPUTER SCIENCE, INFORMATION SYSTEMS
Arvind Kumar Gangwar, Sandeep Kumar
Journal: ACM Transactions on Internet Technology. Published: 2023-05-19. DOI: 10.1145/3589342 (https://dl.acm.org/doi/10.1145/3589342)
Citations: 0

Abstract

Software Defect Prediction (SDP) is crucial to software quality assurance in software engineering. SDP analyzes software metrics data to predict defect-prone software modules in a timely manner. The prediction process is automated by building defect prediction classification models with machine learning techniques. These models are trained on metrics data from historical projects of a similar type and, based on what they have learned, are used to predict defect-prone modules in the software currently under test. Such models perform well when the underlying concept is stationary, but in a dynamic software development environment their performance can degrade unexpectedly when the concept changes (concept drift). Concept drift (CD) detection is therefore an important activity for improving the overall accuracy of a prediction model. Previous SDP studies have shown that CD may occur in software defect data and that the defect prediction model in use may need to be updated to deal with it; this process of handling CD is known as CD adaptation. Further work is still needed in this direction in the SDP domain. In this article, we propose a pair of paired learners (PoPL) approach for handling CD in SDP. We combine the drift detection capabilities of two independent paired learners and use the paired learner (PL) with the best recent performance for the next prediction. We experimented on various publicly available software defect datasets collected from public data repositories. The experimental results showed that our proposed approach outperformed existing similar works and the base PL model on various performance measures.
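The abstract describes PoPL only at a high level. The Python code below is a minimal sketch, assuming the classic paired-learner scheme of Bach and Maloof: a stable learner trained on everything since the last detected drift, a reactive learner trained on only the most recent w examples, and a drift signal raised when the reactive learner is right and the stable learner wrong on more than a threshold share of the last w examples (the stable learner is then replaced by the reactive one). Two such PLs are combined, and each prediction comes from whichever has the better recent accuracy. The nearest-centroid base learner, the window sizes (10 and 20), the threshold of 0.2, the "accuracy over the last 20 predictions" selection rule, and the toy drifting stream are all hypothetical choices for illustration, not the authors' configuration.

```python
from collections import deque
from typing import Deque, Dict, Iterable, List, Sequence, Tuple

Example = Tuple[Sequence[float], int]  # (metric vector, label: 1 = defective)


class NearestCentroid:
    """Hypothetical stand-in base learner (incremental nearest centroid)."""

    def __init__(self) -> None:
        self.sums: Dict[int, List[float]] = {}
        self.counts: Dict[int, int] = {}

    def learn(self, x: Sequence[float], y: int) -> None:
        if y not in self.sums:
            self.sums[y], self.counts[y] = [0.0] * len(x), 0
        self.sums[y] = [s + v for s, v in zip(self.sums[y], x)]
        self.counts[y] += 1

    def predict(self, x: Sequence[float]) -> int:
        if not self.counts:
            return 0  # untrained: default to "non-defective"

        def dist(label: int) -> float:  # squared distance to the class centroid
            n = self.counts[label]
            return sum((v - s / n) ** 2 for v, s in zip(x, self.sums[label]))

        return min(self.counts, key=dist)

    @classmethod
    def from_examples(cls, examples: Iterable[Example]) -> "NearestCentroid":
        model = cls()
        for x, y in examples:
            model.learn(x, y)
        return model


class PairedLearner:
    """Stable learner (all data since the last drift) plus a reactive learner
    (last `window` examples), in the style of Bach and Maloof's paired learners."""

    def __init__(self, window: int = 10, threshold: float = 0.2) -> None:
        self.window, self.threshold = window, threshold  # threshold: fraction of window (assumed)
        self.recent: Deque[Example] = deque(maxlen=window)
        self.bits: Deque[int] = deque(maxlen=window)  # 1 = reactive right, stable wrong
        self.stable = NearestCentroid()

    def predict(self, x: Sequence[float]) -> int:
        return self.stable.predict(x)  # predictions come from the stable learner

    def update(self, x: Sequence[float], y: int) -> bool:
        """Test-then-train on one labelled example; return True if drift is flagged."""
        reactive = NearestCentroid.from_examples(self.recent)
        stable_ok, reactive_ok = self.stable.predict(x) == y, reactive.predict(x) == y
        self.bits.append(int(reactive_ok and not stable_ok))
        drift = sum(self.bits) > self.threshold * self.window
        if drift:
            self.stable = reactive  # drift: replace the stable learner with the reactive one
            self.bits.clear()
        self.stable.learn(x, y)
        self.recent.append((x, y))
        return drift


class PoPL:
    """Predict with whichever paired learner has the best accuracy over its
    last `perf_window` predictions (the recency measure is an assumption)."""

    def __init__(self, learners: List[PairedLearner], perf_window: int = 20) -> None:
        self.learners = learners
        self.hits: List[Deque[int]] = [deque(maxlen=perf_window) for _ in learners]

    def predict(self, x: Sequence[float]) -> int:
        scores = [sum(h) / len(h) if h else 0.0 for h in self.hits]
        best = max(range(len(scores)), key=scores.__getitem__)  # ties -> first PL
        return self.learners[best].predict(x)

    def update(self, x: Sequence[float], y: int) -> None:
        for pl, hits in zip(self.learners, self.hits):
            hits.append(int(pl.predict(x) == y))  # score each PL before it trains
            pl.update(x, y)


def make_stream(n: int = 200, flip_at: int = 100) -> List[Example]:
    """Toy one-metric stream whose concept flips (labels swap) at `flip_at`."""
    stream: List[Example] = []
    for i in range(n):
        c = i % 2  # low- or high-valued metric cluster
        x = [0.05 + 0.01 * (i % 10) + 0.8 * c]
        stream.append((x, c if i < flip_at else 1 - c))
    return stream


popl = PoPL([PairedLearner(window=10), PairedLearner(window=20)])
correct = []
for x, y in make_stream():
    correct.append(popl.predict(x) == y)
    popl.update(x, y)
print("PoPL accuracy after the drift settles:", sum(correct[150:]) / 50)
```

In the toy stream, the labels of the two metric clusters swap at step 100. The window-10 PL flags the drift within about a dozen examples, and once it has adapted, its recent accuracy matches or beats that of the window-20 PL, so PoPL routes predictions to it. Giving the two PLs different window sizes is only one plausible way to make them "independent"; the paper may configure them differently.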

Source journal
ACM Transactions on Internet Technology (Engineering & Technology, Computer Science: Software Engineering)
CiteScore: 10.30
Self-citation rate: 1.90%
Articles published: 137
Review time: >12 weeks
Journal description: ACM Transactions on Internet Technology (TOIT) brings together many computing disciplines, including computer software engineering, computer programming languages, middleware, database management, security, knowledge discovery and data mining, networking and distributed systems, communications, and performance and scalability. TOIT covers the results and roles of the individual disciplines and the relationships among them.