Authors: J. Polpinij, M. Kaenampornpan, B. Luaphol
Published in: 2022 Research, Invention, and Innovation Congress: Innovative Electricals and Electronics (RI2C), 2022-08-04
DOI: https://doi.org/10.1109/RI2C56397.2022.9910299
A Comparative Study of Short Text Classification Methods for Bug Report Type Identification
Previous related studies often used the ‘summary’ field of bug reports because this part contains less noise. However, bug report summaries are often short, which raises short text classification issues that may have been overlooked. This study compares short text classification methods by categorizing bug reports into two classes, real-bug and non-bug, based on three major factors: bug report features, term weighting schemes, and machine learning algorithms. Four bug report feature sets (unigram, unigram + bigram, unigram + CamelCase, and all features), three term weighting schemes (tf, tf-idf, and tf-igm), and three machine learning algorithms (random forest, support vector machine, and k-means clustering) are compared using bug reports from the Mozilla Firefox open-source project. Overall, unigram + CamelCase features combined with tf-igm and a support vector machine provide the best bug report classification performance.
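To make the "unigram + CamelCase" feature set and the tf-igm weighting more concrete, the following is a minimal Python sketch, not the authors' implementation. Several points are assumptions, since the abstract does not specify them: a CamelCase token is taken to be any word containing a lowercase letter directly followed by an uppercase letter; the class-specific frequency used by igm is taken to be the number of documents in a class that contain the term; the weight follows the commonly cited form tf · (1 + λ · igm) with λ = 7; and the three bug report summaries are invented toy data, not taken from the Mozilla Firefox dataset. All function names are hypothetical.

```python
import re
from collections import Counter, defaultdict

TOKEN = re.compile(r"[A-Za-z][A-Za-z0-9]*")
CAMEL_HUMP = re.compile(r"[a-z][A-Z]")  # assumed CamelCase test: lowercase then uppercase


def extract_features(summary):
    """Return lowercased unigrams plus case-preserved CamelCase tokens."""
    tokens = TOKEN.findall(summary)
    unigrams = [t.lower() for t in tokens]
    camel = [t for t in tokens if CAMEL_HUMP.search(t)]
    return unigrams + camel


def class_document_frequencies(labeled_docs):
    """Count, per class, how many documents contain each feature."""
    df = defaultdict(Counter)
    for summary, label in labeled_docs:
        df[label].update(set(extract_features(summary)))
    return df


def igm(term, class_df):
    """Inverse gravity moment: f_1 / sum(f_r * r), frequencies sorted descending."""
    freqs = sorted((counts[term] for counts in class_df.values()), reverse=True)
    if not freqs or freqs[0] == 0:
        return 0.0
    return freqs[0] / sum(f * rank for rank, f in enumerate(freqs, start=1))


def tf_igm(summary, class_df, lam=7.0):
    """Weight each feature of one summary as tf * (1 + lam * igm)."""
    tf = Counter(extract_features(summary))
    return {term: count * (1.0 + lam * igm(term, class_df))
            for term, count in tf.items()}


# Invented toy summaries (assumption, for illustration only).
TOY_REPORTS = [
    ("Crash in nsDocShell when opening new tab", "real-bug"),
    ("Crash on startup after update", "real-bug"),
    ("Add option to mute new tab", "non-bug"),
]

CLASS_DF = class_document_frequencies(TOY_REPORTS)
WEIGHTS = tf_igm(TOY_REPORTS[0][0], CLASS_DF)

if __name__ == "__main__":
    for term, weight in sorted(WEIGHTS.items(), key=lambda kv: -kv[1]):
        print(f"{term:12s} {weight:.3f}")
```

In this toy example, a term that occurs in only one class (such as "crash") receives the maximum igm of 1, while a term spread evenly over both classes (such as "new") receives 1/3, so class-concentrated terms are weighted more heavily. The resulting weight dictionaries could then be converted to vectors and passed to a classifier such as a support vector machine.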