Analysis of Software Bug Prediction and Tracing Models from a Statistical Perspective Using Machine Learning

Darshana N. Tambe, L. Ragha
Published in: 2022 2nd International Conference on Intelligent Technologies (CONIT)
Publication date: 2022-06-24
DOI: 10.1109/CONIT55038.2022.9848385
URL: https://doi.org/10.1109/CONIT55038.2022.9848385
Citations: 1

Abstract

Software is at the heart of over 99% of modern devices, including smartphones, personal computers, and Internet of Things (IoT) networks. It is typically built by a team of engineers who divide the final product into multiple smaller components and then integrate them, a process that introduces interfacing vulnerabilities and bugs. Further bugs are introduced through the inexperience or mistakes of software engineers and programmers. To identify these defects, researchers have proposed a wide variety of bug prediction and tracing models that help programmers predict and track bugs. These models vary widely in accuracy, precision, recall, delay, computational complexity, deployment cost, and other performance metrics, which makes it difficult for software designers to identify the best bug tracing method(s) for their application deployments. To reduce this ambiguity, this paper discusses the design of different bug tracing and prediction models and compares them statistically. The comparison evaluates accuracy, precision, recall, computational complexity, and scalability under different scenarios. Based on this comparison, experiments were performed on five publicly available datasets from the NASA MDP repository using the DRF, LSVM, LR, RF, and kNN algorithms. The results show that kNN outperformed the other algorithms, with an average accuracy of 98.8% across the five datasets, and kNN with its selected features was therefore considered the most significant model. In the future, this performance could be improved by using CNN- and LSTM-based models, which can build on the base kNN layer and estimate highly dense features for efficient classification.
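
As a rough illustration of the evaluation the abstract describes, the sketch below classifies software modules as buggy or clean with a k-nearest-neighbour vote and then computes accuracy, precision, and recall. It is not the authors' implementation and does not use the NASA MDP data: the two features (lines of code, cyclomatic complexity), the toy values, and the function names `knn_predict` and `binary_metrics` are all assumptions made for this example.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Predict the label of x by majority vote among its k nearest training points."""
    # Rank training points by Euclidean distance to x and keep the k closest.
    nearest = sorted(zip(train_X, train_y), key=lambda p: math.dist(p[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, and recall for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical training modules: (lines of code, cyclomatic complexity).
# Label 1 = buggy, label 0 = clean. These values are invented for illustration.
train_X = [(10, 1), (15, 2), (20, 2), (12, 1),
           (200, 15), (180, 12), (220, 20), (150, 10)]
train_y = [0, 0, 0, 0, 1, 1, 1, 1]

# Hypothetical held-out modules with their true labels.
test_X = [(18, 2), (190, 14), (14, 1), (90, 6)]
y_true = [0, 1, 0, 1]

y_pred = [knn_predict(train_X, train_y, x, k=3) for x in test_X]
metrics = binary_metrics(y_true, y_pred)

print("predictions:", y_pred)
print("metrics:", metrics)
```

The last test module sits between the two groups and is misclassified as clean, so recall falls below precision here. This shows why the three metrics are reported separately when comparing bug prediction models.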