Analysis of Software Bug Prediction and Tracing Models from a Statistical Perspective Using Machine Learning
Darshana N. Tambe, L. Ragha
2022 2nd International Conference on Intelligent Technologies (CONIT), published 2022-06-24
DOI: https://doi.org/10.1109/CONIT55038.2022.9848385
Citation count: 1
Abstract
Software is at the heart of over 99% of modern devices, including smartphones, personal computers, and Internet of Things (IoT) networks. Such software is typically built by a team of engineers who divide the final product into smaller components and then integrate them, a process that introduces interfacing vulnerabilities and bugs. Further bugs are introduced through the inexperience or mistakes of software engineers and programmers. To identify these mistakes, researchers have proposed a wide variety of bug prediction and tracing models that help programmers predict and track bugs. These models vary widely in accuracy, precision, recall, delay, computational complexity, cost of deployment, and other performance metrics, which makes it difficult for software designers to identify the best bug tracing method(s) for their deployments. To reduce this ambiguity, this paper discusses the design of different bug tracing and prediction models and compares them statistically. The comparison evaluates accuracy, precision, recall, computational complexity, and scalability under different scenarios. Based on this comparison, experiments were performed on five publicly available datasets from the NASA MDP repository using five algorithms: DRF, LSVM, LR, RF, and kNN. The results show that kNN outperformed the other algorithms, achieving an average accuracy of 98.8% across the five datasets, and kNN with its selected features was therefore considered the most effective of the five. In future work, this performance may be improved by CNN- and LSTM-based models that build on the base kNN layer and estimate highly dense features for efficient classification.
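As a rough illustration of the kind of evaluation the abstract describes, the following minimal sketch classifies modules as defective or clean with a hand-written k-nearest-neighbour vote and then computes accuracy, precision, and recall. It is not the authors' pipeline: the function names, the two features (lines of code and cyclomatic complexity), the toy data, and the choice of k = 3 are all assumptions made for this example, and the numbers have no connection to the NASA MDP results reported above.

```python
from collections import Counter
from math import dist


def knn_predict(train_X, train_y, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest training points."""
    # Rank training indices by Euclidean distance to the query and keep the k closest.
    nearest = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, and recall for a binary labelling."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    return {
        "accuracy": correct / len(pairs),
        # Guard against division by zero when no positives are predicted or present.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical toy data: (lines of code, cyclomatic complexity) per module; 1 = defective.
train_X = [(10, 1), (12, 2), (15, 2), (200, 30), (220, 35), (250, 40)]
train_y = [0, 0, 0, 1, 1, 1]
test_X = [(11, 1), (230, 33), (14, 3), (20, 4)]
test_y = [0, 1, 0, 1]

predictions = [knn_predict(train_X, train_y, x, k=3) for x in test_X]
metrics = binary_metrics(test_y, predictions)
print(predictions, metrics)
```

In this toy run the last test module is a small but defective one, so the classifier misses it: precision stays high while recall drops. That gap is why a comparison of bug prediction models usually reports precision and recall alongside accuracy, since defective modules are typically the minority class.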