An Empirical Investigation to Overcome Class-Imbalance in Inspection Reviews
Maninder Singh, G. Walia, Anurag Goswami
2017 International Conference on Machine Learning and Data Science (MLDS), published 2017-12-01
DOI: 10.1109/MLDS.2017.15 (https://doi.org/10.1109/MLDS.2017.15)
Citations: 7
Abstract
Background: Software inspections produce reviews that report the presence of faults. The requirements author must manually read through these reviews and differentiate between true faults and false positives. Problem: Post-inspection decisions (fault or non-fault) are difficult and time-consuming. It is difficult to apply machine learning (ML) techniques directly to raw (unstructured) review data because of the class-imbalance problem and the risk of fault slippage when a true fault is misclassified. Aim: This research aims to solve this problem with an ensemble approach and priority analysis, achieving significant accuracy in distinguishing true-fault reviews from false-positive reviews without losing any listed fault. Method: We conducted an empirical experiment using two trained models (one trained on reviews from the inspection domain, the other on reviews from the movies domain) to address the class-imbalance problem. Our approach uses ensemble methods to develop a classification confidence for each inspection review and assigns the review to an appropriate priority class. Results: The model trained on movie reviews performed better than the model trained on inspection reviews and restricted any possible fault slippage.
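The Method step (ensemble confidence followed by priority assignment) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the classifiers, the confidence measure, or the priority thresholds. The sketch assumes that confidence is the fraction of ensemble members voting "fault", and the function names, the thresholds (0.6 and 0.3), and the example votes are all hypothetical. Every review is kept and ranked rather than discarded, which mirrors the stated goal of not losing any listed fault.

```python
from collections import Counter


def fault_confidence(votes):
    """Fraction of ensemble members that labeled the review 'fault'.

    Assumption: a simple vote share stands in for the paper's
    (unspecified) classification confidence.
    """
    if not votes:
        raise ValueError("need at least one vote")
    return Counter(votes)["fault"] / len(votes)


def priority_class(confidence, high=0.6, low=0.3):
    """Map a confidence to a priority class (hypothetical thresholds)."""
    if confidence >= high:
        return "high"
    if confidence >= low:
        return "medium"
    return "low"


def prioritize(reviews):
    """Rank all reviews by confidence; none are dropped."""
    ranked = []
    for review_id, votes in reviews.items():
        conf = fault_confidence(votes)
        ranked.append((review_id, conf, priority_class(conf)))
    # Highest-confidence faults first, so they are inspected earliest.
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


# Hypothetical votes from a three-member ensemble on three reviews.
example_votes = {
    "R1": ["fault", "fault", "fault"],
    "R2": ["fault", "non-fault", "non-fault"],
    "R3": ["non-fault", "non-fault", "non-fault"],
}

ranked = prioritize(example_votes)
for review_id, conf, priority in ranked:
    print(f"{review_id}: confidence={conf:.2f}, priority={priority}")
```

Because low-confidence reviews are assigned a low priority instead of being filtered out, a misclassified true fault remains in the list for the requirements author to check.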