Analysis of Software Bug Prediction and Tracing Models from a Statistical Perspective Using Machine Learning

2022 2nd International Conference on Intelligent Technologies (CONIT), Pub Date: 2022-06-24, DOI: 10.1109/CONIT55038.2022.9848385

Darshana N. Tambe, L. Ragha

Citations: 1

Abstract

Analysis of Software Bug Prediction and Tracing Models from a Statistical Perspective Using Machine Learning

Software is at the heart of over 99% of modern devices, including smartphones, personal computers, and Internet of Things (IoT) networks. Such software is typically built by a team of engineers who divide the final product into multiple smaller components and then integrate them, a process that introduces interfacing vulnerabilities and bugs. Further bugs are introduced through the inexperience or mistakes of software engineers and programmers. To identify these mistakes, researchers have proposed a wide variety of bug prediction and tracing models that help programmers predict and track bugs. These models vary widely in accuracy, precision, recall, delay, computational complexity, cost of deployment, and other performance metrics, which makes it difficult for software designers to identify the best bug tracing method(s) for their deployments. To reduce this ambiguity, this paper discusses the design of different bug tracing and prediction models and compares them statistically. The comparison evaluates accuracy, precision, recall, computational complexity, and scalability under different scenarios. Based on this comparison, experiments were performed on five publicly available datasets from the NASA MDP repository using five algorithms: DRF, LSVM, LR, RF, and kNN. In the results, kNN outperformed the other algorithms with an average accuracy of 98.8% across the five datasets, and was therefore considered the most significant with its selected features. In future work, this performance could be improved with CNN- and LSTM-based models, which can build on the base kNN layer and estimate highly dense features for efficient classification.
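The abstract names kNN as a classifier and accuracy, precision, and recall as evaluation metrics, but gives no implementation details. The following is a minimal illustrative sketch of how such an evaluation might look, not the authors' code. The function names `knn_predict` and `binary_metrics`, the choice of k = 3, the two features, and all data values are assumptions invented for illustration. No NASA MDP data is used.

```python
from collections import Counter
from math import dist


def knn_predict(train_X, train_y, query, k=3):
    """Predict a 0/1 label for `query` by majority vote among its k nearest training rows."""
    # Sort training rows by Euclidean distance to the query and keep the k closest.
    nearest = sorted(zip(train_X, train_y), key=lambda row: dist(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def binary_metrics(y_true, y_pred):
    """Return accuracy, precision, and recall, treating label 1 (defective) as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Guard against division by zero when no positives are predicted or present.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical toy data: (lines of code, cyclomatic complexity) per module; 1 = defective.
train_X = [(10, 1), (12, 2), (15, 2), (200, 25), (220, 30), (180, 22)]
train_y = [0, 0, 0, 1, 1, 1]
test_X = [(11, 1), (210, 28), (14, 3), (190, 20)]
test_y = [0, 1, 1, 1]

predictions = [knn_predict(train_X, train_y, x, k=3) for x in test_X]
metrics = binary_metrics(test_y, predictions)
print(predictions, metrics)
```

On this toy data the small defective module `(14, 3)` is misclassified as clean, so accuracy is 0.75, precision is 1.0, and recall is about 0.67. The three metrics can diverge in this way, which is one reason to report all of them rather than accuracy alone.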
