识别软件Bug库中特定于软件Bug的命名实体

Cheng Zhou, Bin Li, Xiaobing Sun, Hongjing Guo
{"title":"识别软件Bug库中特定于软件Bug的命名实体","authors":"Cheng Zhou, Bin Li, Xiaobing Sun, Hongjing Guo","doi":"10.1145/3196321.3196335","DOIUrl":null,"url":null,"abstract":"Software bug issues are unavoidable in software development and maintenance. In order to manage bugs effectively, bug tracking systems are developed to help to record, manage and track the bugs of each project. The rich information in the bug repository provides the possibility of establishment of entity-centric knowledge bases to help understand and fix the bugs. However, existing named entity recognition (NER) systems deal with text that is structured, formal, well written, with a good grammatical structure and few spelling errors, which cannot be directly used for bug-specific named entity recognition. For bug data, they are free-form texts, which include a mixed language studded with code, abbreviations and software-specific vocabularies. In this paper, we summarize the characteristics of bug entities, propose a classification method for bug entities, and build a baseline corpus on two open source projects (Mozilla and Eclipse). On this basis, we propose an approach for bug-specific entity recognition called BNER with the Conditional Random Fields (CRF) model and word embedding technique. An empirical study is conducted to evaluate the accuracy of our BNER technique, and the results show that the two designed baseline corpus are suitable for bug-specific named entity recognition, and our BNER approach is e?ective on cross-projects NER.","PeriodicalId":441573,"journal":{"name":"2018 IEEE/ACM 26th International Conference on Program Comprehension (ICPC)","volume":"120 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"20","resultStr":"{\"title\":\"Recognizing Software Bug-Specific Named Entity in Software Bug Repository\",\"authors\":\"Cheng Zhou, Bin Li, Xiaobing Sun, Hongjing Guo\",\"doi\":\"10.1145/3196321.3196335\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Software bug issues are unavoidable in software development and maintenance. In order to manage bugs effectively, bug tracking systems are developed to help to record, manage and track the bugs of each project. The rich information in the bug repository provides the possibility of establishment of entity-centric knowledge bases to help understand and fix the bugs. However, existing named entity recognition (NER) systems deal with text that is structured, formal, well written, with a good grammatical structure and few spelling errors, which cannot be directly used for bug-specific named entity recognition. For bug data, they are free-form texts, which include a mixed language studded with code, abbreviations and software-specific vocabularies. In this paper, we summarize the characteristics of bug entities, propose a classification method for bug entities, and build a baseline corpus on two open source projects (Mozilla and Eclipse). On this basis, we propose an approach for bug-specific entity recognition called BNER with the Conditional Random Fields (CRF) model and word embedding technique. An empirical study is conducted to evaluate the accuracy of our BNER technique, and the results show that the two designed baseline corpus are suitable for bug-specific named entity recognition, and our BNER approach is e?ective on cross-projects NER.\",\"PeriodicalId\":441573,\"journal\":{\"name\":\"2018 IEEE/ACM 26th International Conference on Program Comprehension (ICPC)\",\"volume\":\"120 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-05-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"20\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2018 IEEE/ACM 26th International Conference on Program Comprehension (ICPC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3196321.3196335\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE/ACM 26th International Conference on Program Comprehension (ICPC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3196321.3196335","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 20

摘要

软件bug问题在软件开发和维护中是不可避免的。为了有效地管理bug,开发了bug跟踪系统来帮助记录、管理和跟踪每个项目的bug。错误存储库中的丰富信息提供了建立以实体为中心的知识库的可能性,以帮助理解和修复错误。然而,现有的命名实体识别(NER)系统处理的文本是结构化的、正式的、写得好的、语法结构良好、拼写错误很少的文本,这些文本不能直接用于特定于bug的命名实体识别。对于bug数据,它们是自由格式的文本,其中包括一种混合语言,其中包含代码、缩写和特定于软件的词汇表。本文总结了bug实体的特点,提出了bug实体的分类方法,并在两个开源项目(Mozilla和Eclipse)上构建了一个基线语料库。在此基础上,我们提出了一种基于条件随机场(CRF)模型和词嵌入技术的针对bug的实体识别方法BNER。通过实证研究对BNER技术的准确性进行了评估,结果表明所设计的两个基线语料库都适合于针对特定bug的命名实体识别,并且我们的BNER方法是e?有效的跨项目NER。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Recognizing Software Bug-Specific Named Entity in Software Bug Repository
Software bug issues are unavoidable in software development and maintenance. In order to manage bugs effectively, bug tracking systems are developed to help to record, manage and track the bugs of each project. The rich information in the bug repository provides the possibility of establishment of entity-centric knowledge bases to help understand and fix the bugs. However, existing named entity recognition (NER) systems deal with text that is structured, formal, well written, with a good grammatical structure and few spelling errors, which cannot be directly used for bug-specific named entity recognition. For bug data, they are free-form texts, which include a mixed language studded with code, abbreviations and software-specific vocabularies. In this paper, we summarize the characteristics of bug entities, propose a classification method for bug entities, and build a baseline corpus on two open source projects (Mozilla and Eclipse). On this basis, we propose an approach for bug-specific entity recognition called BNER with the Conditional Random Fields (CRF) model and word embedding technique. An empirical study is conducted to evaluate the accuracy of our BNER technique, and the results show that the two designed baseline corpus are suitable for bug-specific named entity recognition, and our BNER approach is e?ective on cross-projects NER.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信