Predicting Software Defect Severity Level using Sentence Embedding and Ensemble Learning

L. Kumar, Prakhar Gupta, Lalita Bhanu Murthy Neti, S. K. Rath, Shashank Mouli Satapathy, Vipul Kocher, S. Padmanabhuni
Published in: 2021 47th Euromicro Conference on Software Engineering and Advanced Applications (SEAA), 2021-09-01
DOI: 10.1109/SEAA53835.2021.00056 (https://doi.org/10.1109/SEAA53835.2021.00056)
Citations: 2

Abstract

Bug tracking is one of the prominent activities in the maintenance phase of software development. The severity of a bug is a key indicator of its criticality and of its impact on planning the evolution and maintenance of various types of software products. It measures how negatively the bug may affect system functionality, which in turn helps determine how quickly the development team needs to address the bug for the software system to run successfully. Because a large number of bugs are reported every day, developers find it difficult to assign severity levels accurately, and an incorrectly assigned severity level delays the bug resolution process. Automated systems have therefore been developed that assign a severity level using various machine learning techniques. In this work, five different sentence embedding techniques are applied to bug descriptions to convert the description comments into n-dimensional vectors. These vectors are used as input to the software defect severity level prediction models, which are trained using ensemble techniques such as Bagging, Random Forest classifier, Extra Trees classifier, AdaBoost, and Gradient Boosting. Because the considered datasets are not evenly distributed, different variants of the Synthetic Minority Oversampling Technique (SMOTE) are also considered to handle the class imbalance problem. The experimental results on six projects show that sentence embedding, ensemble techniques, and different SMOTE variants help improve the predictive ability of defect severity level prediction models.
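
To make the oversampling step concrete, the following is a minimal sketch of basic SMOTE-style interpolation in plain Python. It is not the authors' implementation: the abstract does not say which SMOTE variants or parameters were used, and the function names `smote_oversample` and `balance`, the neighbour count `k`, and the toy vectors standing in for sentence embeddings are all assumptions made for illustration. Each synthetic sample is placed at a random position on the line segment between a minority-class vector and one of its nearest minority-class neighbours.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic vectors by interpolating between minority samples.

    minority: list of equal-length feature vectors (lists of floats), for
    example embeddings of bug reports from a rare severity class.
    """
    if n_new <= 0:
        return []
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        # The new point lies on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance(X, y, k=5, seed=0):
    """Oversample every class up to the size of the largest class."""
    by_class = {}
    for vec, label in zip(X, y):
        by_class.setdefault(label, []).append(vec)
    target = max(len(vecs) for vecs in by_class.values())
    X_out, y_out = list(X), list(y)
    for label, vecs in by_class.items():
        extra = smote_oversample(vecs, target - len(vecs), k=k, seed=seed)
        X_out.extend(extra)
        y_out.extend([label] * len(extra))
    return X_out, y_out


if __name__ == "__main__":
    # Toy 2-D "embeddings": six reports of one class, three of another.
    X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5],
         [0.2, 0.8], [5.0, 5.0], [6.0, 5.0], [5.0, 6.0]]
    y = ["minor"] * 6 + ["critical"] * 3
    X_bal, y_bal = balance(X, y, k=2)
    print({label: y_bal.count(label) for label in set(y_bal)})
```

In an actual experiment, the balanced vectors would be passed only to the training split of the ensemble classifiers, so that synthetic samples do not leak into evaluation. Maintained implementations of SMOTE and its variants exist in libraries such as imbalanced-learn, and would normally be preferred over a hand-written version like this one.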