Predicting Software Defect Severity Level using Sentence Embedding and Ensemble Learning
L. Kumar, Prakhar Gupta, Lalita Bhanu Murthy Neti, S. K. Rath, Shashank Mouli Satapathy, Vipul Kocher, S. Padmanabhuni
2021 47th Euromicro Conference on Software Engineering and Advanced Applications (SEAA), published 2021-09-01
DOI: 10.1109/SEAA53835.2021.00056 (https://doi.org/10.1109/SEAA53835.2021.00056)
Citation count: 2
Abstract
Bug tracking is a prominent activity during the maintenance phase of software development. The severity of a bug is a key indicator of its criticality and of its impact on planning the evolution and maintenance of software products. It measures how negatively the bug may affect system functionality, and so helps determine how quickly the development team needs to address the bug for the software system to run successfully. Because a large number of bugs are reported every day, developers find it difficult to assign severity levels accurately, and an incorrectly assigned severity level delays bug resolution. Automated systems have therefore been developed that assign a severity level using machine learning techniques. In this work, five sentence embedding techniques are applied to bug descriptions to convert each description into an n-dimensional vector. These vectors serve as input to software defect severity level prediction models, which are trained with ensemble techniques such as Bagging, Random Forest, Extra Trees, AdaBoost, and Gradient Boosting. Because the datasets considered are not evenly distributed across severity classes, we also apply different variants of the Synthetic Minority Oversampling Technique (SMOTE) to handle the class imbalance problem. Experimental results on six projects show that sentence embedding, ensemble techniques, and SMOTE variants help improve the predictive ability of defect severity level prediction models.
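The oversampling step mentioned in the abstract can be illustrated with a minimal sketch of basic SMOTE, which creates synthetic minority samples by interpolating between a sample and one of its nearest same-class neighbours. This is not the authors' implementation, and the abstract does not say which SMOTE variants were used. The function names `smote` and `balance`, the severity labels, and the two-dimensional toy vectors (standing in for sentence embeddings) are all assumptions made for illustration.

```python
import random
from math import dist


def smote(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points built by basic SMOTE interpolation.

    Hypothetical helper: each synthetic point lies on the line segment
    between a random minority sample and one of its k nearest neighbours.
    """
    if n_new == 0:
        return []
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` among the other minority samples
        neighbours = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance(X, y, seed=0):
    """Oversample every class up to the size of the largest class."""
    by_class = {}
    for vec, label in zip(X, y):
        by_class.setdefault(label, []).append(vec)
    target = max(len(vecs) for vecs in by_class.values())
    X_out, y_out = list(X), list(y)
    for label, vecs in by_class.items():
        new_points = smote(vecs, target - len(vecs), seed=seed)
        X_out.extend(new_points)
        y_out.extend([label] * len(new_points))
    return X_out, y_out


# Toy 2-D vectors standing in for sentence embeddings (assumed data).
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2], [0.2, 0.2],
     [0.9, 1.0], [1.0, 0.8]]
y = ["minor"] * 5 + ["critical"] * 2

X_bal, y_bal = balance(X, y)
print({label: y_bal.count(label) for label in set(y_bal)})
```

After balancing, both classes have the same number of samples, and the balanced vectors could then be passed to any of the ensemble classifiers named in the abstract. In practice a maintained library implementation would normally be preferred over a hand-written one.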