层次狄利克雷过程与潜在狄利克雷分配在bug报告多类分类中的比较

Nachai Limsettho, Hideaki Hata, Ken-ichi Matsumoto
{"title":"层次狄利克雷过程与潜在狄利克雷分配在bug报告多类分类中的比较","authors":"Nachai Limsettho, Hideaki Hata, Ken-ichi Matsumoto","doi":"10.1109/SNPD.2014.6888695","DOIUrl":null,"url":null,"abstract":"Bug reports play essential roles in many software engineering tasks. Since validity and performance of these tasks definitely rely on the quality of bug reports, accurate information from bug reports is very important. However, as found in previous study, significant numbers of reports classified as bug are not really a bug. Recent studies proposed techniques to automatically classify bug reports into binary classes, yet there is still more to desire. These bug reports can be classified into multiple classes, which could help to identify what these reports are actually about. Moreover, previous study only looks into one possibility of topic modeling, that is, Latent Dirichlet Allocation (LDA). While LDA has its advantage, parameter tuning is required. In this paper, we propose a nonparametric approach to automatically classify bug reports with, another topic modeling method, Hierarchical Dirichlet Process (HDP). The result indicates that our nonparametric approach performance is comparable to the parametric one. We also examine various aspects of LDA to provide more thoroughly understanding of this process.","PeriodicalId":272932,"journal":{"name":"15th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD)","volume":"50 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"16","resultStr":"{\"title\":\"Comparing hierarchical dirichlet process with latent dirichlet allocation in bug report multiclass classification\",\"authors\":\"Nachai Limsettho, Hideaki Hata, Ken-ichi Matsumoto\",\"doi\":\"10.1109/SNPD.2014.6888695\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Bug reports play essential roles in many software engineering tasks. Since validity and performance of these tasks definitely rely on the quality of bug reports, accurate information from bug reports is very important. However, as found in previous study, significant numbers of reports classified as bug are not really a bug. Recent studies proposed techniques to automatically classify bug reports into binary classes, yet there is still more to desire. These bug reports can be classified into multiple classes, which could help to identify what these reports are actually about. Moreover, previous study only looks into one possibility of topic modeling, that is, Latent Dirichlet Allocation (LDA). While LDA has its advantage, parameter tuning is required. In this paper, we propose a nonparametric approach to automatically classify bug reports with, another topic modeling method, Hierarchical Dirichlet Process (HDP). The result indicates that our nonparametric approach performance is comparable to the parametric one. We also examine various aspects of LDA to provide more thoroughly understanding of this process.\",\"PeriodicalId\":272932,\"journal\":{\"name\":\"15th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD)\",\"volume\":\"50 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-06-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"16\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"15th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SNPD.2014.6888695\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"15th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SNPD.2014.6888695","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 16

摘要

Bug报告在许多软件工程任务中起着至关重要的作用。由于这些任务的有效性和性能完全依赖于bug报告的质量,因此来自bug报告的准确信息非常重要。然而,正如在之前的研究中发现的那样,大量被归类为bug的报告并不是真正的bug。最近的研究提出了将bug报告自动分类为二进制类的技术,但仍有更多的需求。这些bug报告可以分为多个类,这有助于确定这些报告实际上是关于什么的。而且,以往的研究只探讨了主题建模的一种可能性,即潜狄利克雷分配(Latent Dirichlet Allocation, LDA)。虽然LDA有它的优点,但参数调优是必需的。在本文中,我们提出了一种非参数的方法来自动分类bug报告,另一种主题建模方法,层次狄利克雷过程(HDP)。结果表明,非参数方法的性能与参数方法相当。我们还研究了LDA的各个方面,以便更全面地理解这个过程。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Comparing hierarchical dirichlet process with latent dirichlet allocation in bug report multiclass classification
Bug reports play essential roles in many software engineering tasks. Since validity and performance of these tasks definitely rely on the quality of bug reports, accurate information from bug reports is very important. However, as found in previous study, significant numbers of reports classified as bug are not really a bug. Recent studies proposed techniques to automatically classify bug reports into binary classes, yet there is still more to desire. These bug reports can be classified into multiple classes, which could help to identify what these reports are actually about. Moreover, previous study only looks into one possibility of topic modeling, that is, Latent Dirichlet Allocation (LDA). While LDA has its advantage, parameter tuning is required. In this paper, we propose a nonparametric approach to automatically classify bug reports with, another topic modeling method, Hierarchical Dirichlet Process (HDP). The result indicates that our nonparametric approach performance is comparable to the parametric one. We also examine various aspects of LDA to provide more thoroughly understanding of this process.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信