An Empirical Investigation to Overcome Class-Imbalance in Inspection Reviews

2017 International Conference on Machine Learning and Data Science (MLDS). Pub Date: 2017-12-01. DOI: 10.1109/MLDS.2017.15

Maninder Singh, G. Walia, Anurag Goswami


Citations: 7

Abstract

Background: Software inspection produces reviews that report the presence of faults. The requirements author must manually read through the reviews and differentiate between true faults and false positives. Problem: Post-inspection decisions (fault or non-fault) are difficult and time-consuming. It is difficult to apply machine learning (ML) techniques directly to raw (unstructured) data because of the class-imbalance problem and possible fault slippage through misclassification of faults. Aim: This research aims to solve this problem with an ensemble approach and priority analysis, achieving significant accuracy in determining true-fault and false-positive reviews without losing any listed fault. Method: We conducted an empirical experiment using two trained models (one trained on reviews from the inspection domain, the other on reviews from the movie domain) to address the class-imbalance problem. Our approach uses ensemble methods to develop classification confidence for inspection reviews and assigns them to an appropriate priority class. Results: The results showed that the movie-trained model performed better than the inspection-trained model and restricted any possible fault slippage.
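The Method step (ensemble confidence followed by priority assignment) can be illustrated with a minimal sketch. The abstract does not specify the classifiers, features, or thresholds used, so everything below is an assumption made for illustration: the three keyword-based voters, the names `fault_confidence`, `priority_class`, and `triage`, and the 0.6 / 0.3 cut-offs are all hypothetical. The one property taken from the abstract is that reviews are ranked into priority classes and none is discarded, so no listed fault is lost.

```python
from typing import Callable, List, Sequence, Tuple

# A voter is any function that labels a review as fault (True) or non-fault (False).
# In practice these would be trained classifiers; here they are toy stand-ins.
Voter = Callable[[str], bool]


def fault_confidence(review: str, voters: Sequence[Voter]) -> float:
    """Fraction of ensemble members that label the review a true fault."""
    votes = [voter(review) for voter in voters]
    return sum(votes) / len(votes)


def priority_class(confidence: float, high: float = 0.6, low: float = 0.3) -> str:
    """Map a confidence score to a priority bucket (thresholds are assumptions)."""
    if confidence >= high:
        return "high"
    if confidence >= low:
        return "medium"
    return "low"


def triage(reviews: Sequence[str], voters: Sequence[Voter]) -> List[Tuple[str, float, str]]:
    """Rank every review by confidence; nothing is dropped, only prioritized."""
    scored = [(review, fault_confidence(review, voters)) for review in reviews]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [(review, conf, priority_class(conf)) for review, conf in scored]


# Hypothetical toy voters standing in for trained models.
toy_voters: List[Voter] = [
    lambda text: "missing" in text.lower(),
    lambda text: "ambiguous" in text.lower() or "unclear" in text.lower(),
    lambda text: len(text.split()) >= 5,
]

sample_reviews = [
    "Looks fine",
    "The unit is missing",
    "Requirement 3 is missing the timeout value and is ambiguous",
]

triaged = triage(sample_reviews, toy_voters)
for review, conf, bucket in triaged:
    print(f"{bucket:6s} {conf:.2f} {review}")
```

Because low-confidence reviews land in a low-priority class instead of being filtered out, a misclassified fault is delayed but still reaches the requirements author.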

