Parameter tuning in KNN for software defect prediction: an empirical analysis

M. Mabayoje, A. O. Balogun, Hajara Jibril, Jelili Olaniyi Atoyebi, H. A. Mojeed, V. E. Adeyemo
Jurnal Teknologi dan Sistem Komputer, published 2019-10-31. DOI: 10.14710/jtsiskom.7.4.2019.121-126 (https://doi.org/10.14710/jtsiskom.7.4.2019.121-126)
Citations: 15

Abstract

Software Defect Prediction (SDP) provides insights that can help software teams allocate their limited resources when developing software systems. It predicts which modules are likely to be defective and helps teams avoid the pitfalls associated with such modules. However, these insights may be inaccurate and unreliable if the parameters of SDP models are not taken into consideration. This study investigated the effect of parameter tuning on the k-nearest neighbor (k-NN) classifier in SDP. More specifically, it examined the impact of varying and selecting an optimal k value, the influence of distance weighting, and the impact of the distance function on k-NN. An experiment was designed to investigate this problem over six software defect datasets. The experimental results revealed that the k value should be greater than 1 (the default), as the average RMSE of k-NN when k > 1 (0.2727) is lower than when k = 1 (0.3296). In addition, distance weighting improved the predictive performance of k-NN by 8.82% in AUC and 1.7% in accuracy. In terms of the distance function, k-NN models based on the Dilca distance function performed better than those based on the Euclidean distance function (the default). Hence, we conclude that parameter tuning has a positive effect on the predictive performance of k-NN in SDP.
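The abstract names three tunable parts of k-NN (the number of neighbours k, distance weighting, and the distance function) and evaluates them with RMSE. The Python sketch below illustrates those knobs under stated assumptions; it is not the authors' implementation. The function names are hypothetical, labels are assumed to be 0/1 with 1 meaning "defective", distance weighting is assumed to be inverse-distance (1/d) because the abstract does not say which scheme was used, and RMSE is computed on predicted defect probabilities against 0/1 labels, which is one common convention. The Dilca distance function is not reproduced here; the `distance` parameter accepts any callable so that another metric could be plugged in.

```python
import math
from collections import defaultdict
from typing import Callable, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean (L2) distance: the default distance function named in the abstract."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict_proba(
    train_X: Sequence[Vector],
    train_y: Sequence[int],
    query: Vector,
    k: int = 1,
    weighted: bool = False,
    distance: Callable[[Vector, Vector], float] = euclidean,
) -> float:
    """Estimate the probability that `query` is defective (label 1, an assumption).

    k: number of neighbours (k = 1 is the default the study argues against).
    weighted: if True, each neighbour's vote is scaled by 1/distance
              (inverse-distance weighting, assumed here).
    distance: any callable, so a different metric could be swapped in.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    # Rank every training module by its distance to the query; keep the k closest.
    neighbours = sorted(
        (distance(x, query), label) for x, label in zip(train_X, train_y)
    )[:k]
    votes = defaultdict(float)
    for d, label in neighbours:
        # Unweighted: one vote each. Weighted: closer neighbours count more;
        # the small epsilon avoids division by zero for an exact match.
        w = 1.0 / (d + 1e-9) if weighted else 1.0
        votes[label] += w
    # Share of the (possibly weighted) vote that went to the defective class.
    return votes[1] / sum(votes.values())


def rmse(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Root mean squared error between predicted defect probabilities and 0/1 labels."""
    return math.sqrt(sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels))


if __name__ == "__main__":
    # Hypothetical one-feature toy data: one defective module lies close to the query.
    X, y = [[0.0], [1.0], [3.0]], [1, 0, 0]
    print(round(knn_predict_proba(X, y, [0.2], k=3), 3))                 # 0.333
    print(round(knn_predict_proba(X, y, [0.2], k=3, weighted=True), 3))  # 0.757
```

In this toy case the unweighted 3-NN vote is dominated by the two farther non-defective modules, while inverse-distance weighting lets the much closer defective module outweigh them. This is the kind of behaviour change that distance weighting introduces; the toy numbers are illustrative only and are not the study's results.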