{"title":"识别软件Bug库中特定于软件Bug的命名实体","authors":"Cheng Zhou, Bin Li, Xiaobing Sun, Hongjing Guo","doi":"10.1145/3196321.3196335","DOIUrl":null,"url":null,"abstract":"Software bug issues are unavoidable in software development and maintenance. In order to manage bugs effectively, bug tracking systems are developed to help to record, manage and track the bugs of each project. The rich information in the bug repository provides the possibility of establishment of entity-centric knowledge bases to help understand and fix the bugs. However, existing named entity recognition (NER) systems deal with text that is structured, formal, well written, with a good grammatical structure and few spelling errors, which cannot be directly used for bug-specific named entity recognition. For bug data, they are free-form texts, which include a mixed language studded with code, abbreviations and software-specific vocabularies. In this paper, we summarize the characteristics of bug entities, propose a classification method for bug entities, and build a baseline corpus on two open source projects (Mozilla and Eclipse). On this basis, we propose an approach for bug-specific entity recognition called BNER with the Conditional Random Fields (CRF) model and word embedding technique. An empirical study is conducted to evaluate the accuracy of our BNER technique, and the results show that the two designed baseline corpus are suitable for bug-specific named entity recognition, and our BNER approach is e?ective on cross-projects NER.","PeriodicalId":441573,"journal":{"name":"2018 IEEE/ACM 26th International Conference on Program Comprehension (ICPC)","volume":"120 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"20","resultStr":"{\"title\":\"Recognizing Software Bug-Specific Named Entity in Software Bug Repository\",\"authors\":\"Cheng Zhou, Bin Li, Xiaobing Sun, Hongjing Guo\",\"doi\":\"10.1145/3196321.3196335\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Software bug issues are unavoidable in software development and maintenance. In order to manage bugs effectively, bug tracking systems are developed to help to record, manage and track the bugs of each project. The rich information in the bug repository provides the possibility of establishment of entity-centric knowledge bases to help understand and fix the bugs. However, existing named entity recognition (NER) systems deal with text that is structured, formal, well written, with a good grammatical structure and few spelling errors, which cannot be directly used for bug-specific named entity recognition. For bug data, they are free-form texts, which include a mixed language studded with code, abbreviations and software-specific vocabularies. In this paper, we summarize the characteristics of bug entities, propose a classification method for bug entities, and build a baseline corpus on two open source projects (Mozilla and Eclipse). On this basis, we propose an approach for bug-specific entity recognition called BNER with the Conditional Random Fields (CRF) model and word embedding technique. An empirical study is conducted to evaluate the accuracy of our BNER technique, and the results show that the two designed baseline corpus are suitable for bug-specific named entity recognition, and our BNER approach is e?ective on cross-projects NER.\",\"PeriodicalId\":441573,\"journal\":{\"name\":\"2018 IEEE/ACM 26th International Conference on Program Comprehension (ICPC)\",\"volume\":\"120 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-05-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"20\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2018 IEEE/ACM 26th International Conference on Program Comprehension (ICPC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3196321.3196335\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE/ACM 26th International Conference on Program Comprehension (ICPC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3196321.3196335","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Recognizing Software Bug-Specific Named Entity in Software Bug Repository
Software bug issues are unavoidable in software development and maintenance. In order to manage bugs effectively, bug tracking systems are developed to help to record, manage and track the bugs of each project. The rich information in the bug repository provides the possibility of establishment of entity-centric knowledge bases to help understand and fix the bugs. However, existing named entity recognition (NER) systems deal with text that is structured, formal, well written, with a good grammatical structure and few spelling errors, which cannot be directly used for bug-specific named entity recognition. For bug data, they are free-form texts, which include a mixed language studded with code, abbreviations and software-specific vocabularies. In this paper, we summarize the characteristics of bug entities, propose a classification method for bug entities, and build a baseline corpus on two open source projects (Mozilla and Eclipse). On this basis, we propose an approach for bug-specific entity recognition called BNER with the Conditional Random Fields (CRF) model and word embedding technique. An empirical study is conducted to evaluate the accuracy of our BNER technique, and the results show that the two designed baseline corpus are suitable for bug-specific named entity recognition, and our BNER approach is e?ective on cross-projects NER.