Combining CNN with DS3 for Detecting Bug-prone Modules in Cross-version Projects

Andrea Fiore, Alfonso Russo, C. Gravino, M. Risi
Published in: 2021 47th Euromicro Conference on Software Engineering and Advanced Applications (SEAA)
Publication date: 2021-09-01
DOI: 10.1109/SEAA53835.2021.00021 (https://doi.org/10.1109/SEAA53835.2021.00021)
Citations: 3

Abstract

This paper focuses on Cross-Version Defect Prediction (CVDP), in which a classification model is trained on data from a prior version of a project and then used to predict defects in the components of the latest release. To mitigate distribution differences between versions, which can degrade the performance of machine-learning-based models, we use the Dissimilarity-based Sparse Subset Selection (DS3) technique to select meaningful representatives for inclusion in the training set. Furthermore, we employ a Convolutional Neural Network (CNN) to generate structural and semantic features, which are merged with traditional software measures to obtain a more comprehensive set of predictors. To evaluate the usefulness of our proposal in the CVDP scenario, we perform an empirical study on a total of 20 cross-version pairs from 10 different software projects. We build prediction models with Logistic Regression (LR) and Random Forest (RF) and adopt three evaluation criteria (F-measure, G-mean, and Balance) to assess prediction accuracy. Our results show that using the CNN with both LR and RF models has a significant impact, with an improvement of approximately 20% on each evaluation criterion. In contrast, DS3 does not significantly improve prediction accuracy.
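
The abstract names the three evaluation criteria but does not define them. As a rough illustration, the sketch below computes them from confusion-matrix counts using definitions that are common in the defect prediction literature: F-measure as the harmonic mean of precision and recall, G-mean as the geometric mean of recall and specificity, and Balance as one minus the normalized distance from the ideal point (false-positive rate 0, recall 1). These definitions and the function name `defect_metrics` are assumptions for illustration; the paper may use different variants.

```python
import math


def defect_metrics(tp, fp, tn, fn):
    """Compute F-measure, G-mean, and Balance from confusion-matrix counts.

    Assumed definitions (common in defect prediction, not taken from the paper):
      pd (recall)    = tp / (tp + fn)
      pf (false-positive rate) = fp / (fp + tn)
      F-measure      = 2 * precision * pd / (precision + pd)
      G-mean         = sqrt(pd * (1 - pf))
      Balance        = 1 - sqrt((0 - pf)^2 + (1 - pd)^2) / sqrt(2)
    """
    # Guard every ratio against an empty denominator.
    pd = tp / (tp + fn) if (tp + fn) else 0.0
    pf = fp / (fp + tn) if (fp + tn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0

    if precision + pd:
        f_measure = 2 * precision * pd / (precision + pd)
    else:
        f_measure = 0.0

    g_mean = math.sqrt(pd * (1 - pf))
    balance = 1 - math.sqrt((0 - pf) ** 2 + (1 - pd) ** 2) / math.sqrt(2)

    return {"f_measure": f_measure, "g_mean": g_mean, "balance": balance}


if __name__ == "__main__":
    # Hypothetical counts for one cross-version pair: the model is trained on
    # the prior version and these counts come from predictions on the latest one.
    scores = defect_metrics(tp=30, fp=10, tn=50, fn=10)
    for name, value in scores.items():
        print(f"{name}: {value:.3f}")
```

All three criteria lie in [0, 1] and reward models that find defective modules without raising many false alarms, which is why they are often preferred over plain accuracy on imbalanced defect data.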