Predicting Software Defect Severity Level using Sentence Embedding and Ensemble Learning

2021 47th Euromicro Conference on Software Engineering and Advanced Applications (SEAA). Pub Date: 2021-09-01. DOI: 10.1109/SEAA53835.2021.00056

L. Kumar, Prakhar Gupta, Lalita Bhanu Murthy Neti, S. K. Rath, Shashank Mouli Satapathy, Vipul Kocher, S. Padmanabhuni


Citations: 2

Abstract

Bug tracking is one of the prominent activities during the maintenance phase of software development. The severity of a bug is a key indicator of its criticality and of its impact on planning the evolution and maintenance of software products. It measures how negatively the bug may affect system functionality, and so helps determine how quickly the development team needs to address the bug for the software system to run successfully. Because a large number of bugs are reported every day, developers find it difficult to assign severity levels accurately, and an incorrect severity level delays the bug resolution process. Automated systems have therefore been developed that assign a severity level using machine learning techniques. In this work, five sentence embedding techniques are applied to bug descriptions to convert each description into an n-dimensional vector. These vectors are used as input to software defect severity level prediction models, which are trained with ensemble techniques such as Bagging, Random Forest classifier, Extra Trees classifier, AdaBoost, and Gradient Boosting. Different variants of the Synthetic Minority Oversampling Technique (SMOTE) are also considered to handle the class imbalance problem, since the classes in the considered datasets are not evenly distributed. Experimental results on six projects show that the use of sentence embedding, ensemble techniques, and SMOTE variants helps improve the predictive ability of defect severity level prediction models.
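The oversampling step mentioned in the abstract can be illustrated with a minimal sketch of the basic SMOTE idea: each synthetic minority sample is placed on the line segment between a minority sample and one of its nearest minority neighbours. This is not the authors' implementation and not one of the specific SMOTE variants they evaluated. The function name `smote`, its parameters, and the toy 2-dimensional vectors (standing in for sentence embeddings of bug descriptions) are all assumptions made for illustration.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    Hypothetical helper for illustration; basic SMOTE only, no variants.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours of the base point.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy 2-dimensional vectors standing in for sentence embeddings (assumed data).
majority = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2], [0.0, 0.4], [0.4, 0.1]]
minority = [[2.0, 2.0], [2.2, 1.9], [1.8, 2.1]]

# Generate enough synthetic samples to match the majority class size.
new_points = smote(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_points
```

After oversampling, the balanced vectors would be passed to an ensemble classifier of the kind the abstract lists. In practice a maintained library implementation of SMOTE would normally be preferred over a hand-written one.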

