Cheng-Zen Yang, Hung Du, Sin-Sian Wu, Ing-Xiang Chen
{"title":"Duplication Detection for Software Bug Reports Based on BM25 Term Weighting","authors":"Cheng-Zen Yang, Hung Du, Sin-Sian Wu, Ing-Xiang Chen","doi":"10.1109/TAAI.2012.20","DOIUrl":null,"url":null,"abstract":"Handling bug reports is an important issue in software maintenance. Recently, detection on duplicate bug reports has received much attention. There are two main reasons. First, duplicate bug reports may waste human resource to process these redundant reports. Second, duplicate bug reports may provide abundant information for further software maintenance. In the past studies, many schemes have been proposed using the information retrieval and natural language processing techniques. In this thesis, we propose a novel detection scheme based on a BM25 term weighting scheme. We have conducted empirical experiments on three open source projects, Apache, ArgoUML, and SVN. The experimental results show that the BM25-based scheme can effectively improve the detection performance in nearly all cases.","PeriodicalId":385063,"journal":{"name":"2012 Conference on Technologies and Applications of Artificial Intelligence","volume":"46 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"25","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2012 Conference on Technologies and Applications of Artificial Intelligence","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/TAAI.2012.20","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 25
Abstract
Handling bug reports is an important issue in software maintenance. Recently, detection on duplicate bug reports has received much attention. There are two main reasons. First, duplicate bug reports may waste human resource to process these redundant reports. Second, duplicate bug reports may provide abundant information for further software maintenance. In the past studies, many schemes have been proposed using the information retrieval and natural language processing techniques. In this thesis, we propose a novel detection scheme based on a BM25 term weighting scheme. We have conducted empirical experiments on three open source projects, Apache, ArgoUML, and SVN. The experimental results show that the BM25-based scheme can effectively improve the detection performance in nearly all cases.