基于BM25词权的软件Bug报告重复检测

Cheng-Zen Yang, Hung Du, Sin-Sian Wu, Ing-Xiang Chen
{"title":"基于BM25词权的软件Bug报告重复检测","authors":"Cheng-Zen Yang, Hung Du, Sin-Sian Wu, Ing-Xiang Chen","doi":"10.1109/TAAI.2012.20","DOIUrl":null,"url":null,"abstract":"Handling bug reports is an important issue in software maintenance. Recently, detection on duplicate bug reports has received much attention. There are two main reasons. First, duplicate bug reports may waste human resource to process these redundant reports. Second, duplicate bug reports may provide abundant information for further software maintenance. In the past studies, many schemes have been proposed using the information retrieval and natural language processing techniques. In this thesis, we propose a novel detection scheme based on a BM25 term weighting scheme. We have conducted empirical experiments on three open source projects, Apache, ArgoUML, and SVN. The experimental results show that the BM25-based scheme can effectively improve the detection performance in nearly all cases.","PeriodicalId":385063,"journal":{"name":"2012 Conference on Technologies and Applications of Artificial Intelligence","volume":"46 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"25","resultStr":"{\"title\":\"Duplication Detection for Software Bug Reports Based on BM25 Term Weighting\",\"authors\":\"Cheng-Zen Yang, Hung Du, Sin-Sian Wu, Ing-Xiang Chen\",\"doi\":\"10.1109/TAAI.2012.20\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Handling bug reports is an important issue in software maintenance. Recently, detection on duplicate bug reports has received much attention. There are two main reasons. First, duplicate bug reports may waste human resource to process these redundant reports. Second, duplicate bug reports may provide abundant information for further software maintenance. In the past studies, many schemes have been proposed using the information retrieval and natural language processing techniques. In this thesis, we propose a novel detection scheme based on a BM25 term weighting scheme. We have conducted empirical experiments on three open source projects, Apache, ArgoUML, and SVN. The experimental results show that the BM25-based scheme can effectively improve the detection performance in nearly all cases.\",\"PeriodicalId\":385063,\"journal\":{\"name\":\"2012 Conference on Technologies and Applications of Artificial Intelligence\",\"volume\":\"46 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2012-11-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"25\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2012 Conference on Technologies and Applications of Artificial Intelligence\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/TAAI.2012.20\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2012 Conference on Technologies and Applications of Artificial Intelligence","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/TAAI.2012.20","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 25

摘要

处理bug报告是软件维护中的一个重要问题。最近,对重复错误报告的检测受到了广泛关注。主要有两个原因。首先,重复的bug报告可能会浪费人力资源来处理这些冗余的报告。第二,重复的错误报告可以为进一步的软件维护提供丰富的信息。在过去的研究中,利用信息检索和自然语言处理技术提出了许多方案。在本文中,我们提出了一种新的基于BM25术语加权的检测方案。我们已经对Apache、ArgoUML和SVN这三个开源项目进行了实证实验。实验结果表明,基于bm25的方案在几乎所有情况下都能有效提高检测性能。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Duplication Detection for Software Bug Reports Based on BM25 Term Weighting
Handling bug reports is an important issue in software maintenance. Recently, detection on duplicate bug reports has received much attention. There are two main reasons. First, duplicate bug reports may waste human resource to process these redundant reports. Second, duplicate bug reports may provide abundant information for further software maintenance. In the past studies, many schemes have been proposed using the information retrieval and natural language processing techniques. In this thesis, we propose a novel detection scheme based on a BM25 term weighting scheme. We have conducted empirical experiments on three open source projects, Apache, ArgoUML, and SVN. The experimental results show that the BM25-based scheme can effectively improve the detection performance in nearly all cases.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信