Prioritization of Software Bugs Using Entropy-Based Measures

IF 1.8 Zone 4 Computer Science Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Journal of Software-Evolution and Process Pub Date : 2024-11-26 DOI:10.1002/smr.2742

Madhu Kumari, Rashmi Singh, V. B. Singh


Citations: 0

Abstract

Open-source software evolves through the active participation of users. In general, users request bug fixes, new features, and feature enhancements. As a result, software repositories are growing day by day at an enormous rate. Additionally, the distinct requests of users add uncertainty and irregularity to the reported bug data. The performance of machine learning algorithms is drastically affected by inappropriate handling of this uncertainty and irregularity. Researchers have used machine learning techniques to assign priority to bugs without considering the uncertainty and irregularity in the reported bug data. To capture that uncertainty and irregularity, this study considers a summary entropy-based measure, in combination with severity and summary weight, to predict the priority of bugs in open-source projects. Accordingly, classifiers are built using these measures for different machine learning techniques, namely, k-nearest neighbor (KNN), naïve Bayes (NB), J48, random forest (RF), condensed nearest neighbor (CNN), multinomial logistic regression (MLR), decision tree (DT), deep learning (DL), and neural network (NNet), for bug priority prediction. This research aims to systematically analyze the summary entropy-based machine learning classifiers from three aspects: the type of machine learning technique considered; the estimation of various performance measures (Accuracy, Precision, Recall, and F-measure); and comparison with existing models. The experimental analysis is carried out using three open-source projects, namely, Eclipse, Mozilla, and OpenOffice. Out of 145 cases (29 products × 5 priority levels), the J48, RF, DT, CNN, NNet, DL, MLR, and KNN techniques give the maximum F-measure for 46, 35, 28, 11, 15, 4, 3, and 1 cases, respectively. The results show that the proposed summary entropy-based approach, using different machine learning techniques, performs better than the same techniques without the entropy-based measure, and that it improves Accuracy and F-measure compared with existing approaches. It can be concluded that classifiers built using the summary entropy measure significantly improve the performance of machine learning algorithms by appropriately handling uncertainty and irregularity. Moreover, the proposed summary entropy-based classifiers outperform the existing models available in the literature for predicting bug priority.
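The abstract does not give the formula behind the summary entropy measure, so the following is only an illustrative sketch. It assumes the measure resembles Shannon entropy computed over the term frequencies of a single bug summary; the function name `summary_entropy` and the whitespace tokenization are hypothetical choices, not the authors' method.

```python
import math
from collections import Counter


def summary_entropy(summary: str) -> float:
    """Shannon entropy (in bits) of the term distribution in one bug summary.

    Assumption: terms are lowercased, whitespace-separated tokens. The paper
    may use a different tokenization, weighting, or normalization.
    """
    terms = summary.lower().split()
    if not terms:
        return 0.0  # an empty summary carries no term uncertainty
    counts = Counter(terms)
    total = len(terms)
    # H = -sum(p_i * log2(p_i)) over the distinct terms in the summary
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    # A repetitive summary has low entropy; a varied one has higher entropy.
    print(summary_entropy("crash crash crash crash"))
    print(summary_entropy("editor crashes when opening large file"))
```

Under these assumptions, a summary that repeats one term scores zero, while a summary of four distinct terms scores 2 bits. A value like this could be fed to a classifier as one feature alongside severity and summary weight, which is the combination the abstract describes.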


Source journal

Journal of Software-Evolution and Process (COMPUTER SCIENCE, SOFTWARE ENGINEERING)

Self-citation rate

10.00%

Articles published

109