An Empirical Study on Improving Severity Prediction of Defect Reports Using Feature Selection
Cheng-Zen Yang, Chun-Chi Hou, Wei-Chen Kao, Ing-Xiang Chen
2012 19th Asia-Pacific Software Engineering Conference, 2012-12-04. DOI: 10.1109/APSEC.2012.144 (https://doi.org/10.1109/APSEC.2012.144)
In software maintenance, severity prediction for defect reports is an emerging issue that has attracted research attention because of the considerable cost of triaging. Previous work has proposed several text mining approaches that predict severity using advanced learning models. Although these approaches predict severity effectively, they do not address how to identify high-quality indicators. In this paper, we examine whether feature selection can benefit the severity prediction task, using three commonly used feature selection schemes (Information Gain, Chi-Square, and Correlation Coefficient) with the Multinomial Naive Bayes classification approach. We conducted empirical experiments on four open-source components from Eclipse and Mozilla. The experimental results show that these three feature selection schemes further improve prediction performance in over half of the cases.
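The general idea of scoring terms, keeping the top-ranked ones, and training Multinomial Naive Bayes on the reduced vocabulary can be illustrated with a minimal sketch. This is not the authors' implementation: it shows only the Chi-Square scheme, assumes a binary severe/non-severe labeling, uses Laplace smoothing, and runs on a handful of made-up report summaries. The function names, the toy data, and the choice of k = 4 are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy data (not from the paper): 1 = severe, 0 = non-severe.
reports = [
    ("crash at startup with null pointer", 1),
    ("application crash when saving file", 1),
    ("data loss after crash during sync", 1),
    ("typo in preferences dialog label", 0),
    ("button label misaligned inside dialog", 0),
    ("minor typo within help text", 0),
]
docs = [text.lower().split() for text, _ in reports]
labels = [label for _, label in reports]


def chi_square_scores(docs, labels):
    """Chi-square of each term against the binary label, using document presence."""
    n = len(docs)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = n - n_pos
    df_pos, df_neg = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        for term in set(tokens):
            (df_pos if y == 1 else df_neg)[term] += 1
    scores = {}
    for term in set(df_pos) | set(df_neg):
        a, b = df_pos[term], df_neg[term]   # severe / non-severe docs with the term
        c, d = n_pos - a, n_neg - b         # severe / non-severe docs without it
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(scores, k):
    """Keep the k highest-scoring terms; ties are broken alphabetically."""
    return sorted(scores, key=lambda term: (-scores[term], term))[:k]


def train_mnb(docs, labels, vocab):
    """Multinomial Naive Bayes with Laplace smoothing, restricted to vocab."""
    prior, cond = {}, {}
    for c in sorted(set(labels)):
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        prior[c] = math.log(len(class_docs) / len(docs))
        counts = Counter(t for d in class_docs for t in d if t in vocab)
        total = sum(counts.values()) + len(vocab)
        cond[c] = {t: math.log((counts[t] + 1) / total) for t in vocab}
    return prior, cond


def predict(model, tokens):
    """Return the class with the highest log-posterior; unselected terms are ignored."""
    prior, cond = model

    def log_posterior(c):
        return prior[c] + sum(cond[c][t] for t in tokens if t in cond[c])

    return max(prior, key=log_posterior)


scores = chi_square_scores(docs, labels)
selected = select_top_k(scores, k=4)
model = train_mnb(docs, labels, set(selected))

print(selected)
print(predict(model, "browser crash at exit".split()))
print(predict(model, "typo in dialog title".split()))
```

Information Gain or Correlation Coefficient could be swapped in by replacing `chi_square_scores` with another function that returns a term-to-score mapping; the selection and classification steps would stay the same.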