Investigation of IR based topic models on issue tracking systems to infer software-specific semantic related term pairs
D. Correa, A. Sureka, Sangeeta Lal
2017 Tenth International Conference on Contemporary Computing (IC3), published 2017-08-01
DOI: 10.1109/IC3.2017.8284329 (https://doi.org/10.1109/IC3.2017.8284329)
Software maintenance is a core component of any software development life cycle. Contemporary software systems contain voluminous and complex information stored in software repositories. Software maintenance professionals spend a significant amount of time searching and exploring these repositories for common maintenance tasks such as bug fixing, feature enhancement, code refactoring, and reengineering. Tools and methods that facilitate search in software repositories can therefore give maintenance professionals faster access to the information they need and increase productivity. A domain-specific lexical resource is an important tool for bridging the semantic gap between the information need and the search query. In this work, we investigate the use of information retrieval (IR) based topic models (such as LSI and LDA) to infer semantically related terms for a software-context-specific lexical resource. We perform our experiments on the issue tracker of Google Chromium, a widely popular open-source browser, which contains 134,000+ bug reports. We divide our study into two parts. (1) In the first part, we apply our IR models to the free-form natural language text present in defect tracking systems. We perform a qualitative analysis of the output and uncover semantically related terms in the Google Chromium software context. We observe that we are able to infer semantically similar term pairs in four different contexts: English language, Software, Google Chromium, and Code details. (2) In the second part, we use the semantically related terms inferred from the output of the IR models to facilitate the software maintenance task of duplicate bug report detection. Our results demonstrate that using IR based topic models on defect tracking systems to automatically infer semantically related terms can help build a software domain-specific lexical resource and reduce the vocabulary gap.
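The abstract does not describe the authors' pipeline in detail, so the following is only a minimal sketch of the general idea behind LSI-style term inference, not their implementation. It builds a term-document matrix from a handful of hypothetical, hand-written bug-report summaries (not Chromium data), takes a rank-2 approximation of its singular value decomposition (computed here by power iteration on the term-by-term Gram matrix, an illustrative choice that keeps the example dependency-free), and compares terms by cosine similarity in the reduced space. The corpus, the function names, and the choice of k = 2 are all assumptions made for illustration.

```python
import math
import random

# Hypothetical, hand-written bug-report summaries (stop words already removed).
# They are NOT taken from the Chromium issue tracker.
REPORTS = [
    "browser crash tab close",
    "browser freeze tab close",
    "browser crash startup",
    "browser freeze startup",
    "font blurry page zoom",
    "glyph blurry page zoom",
]


def term_document_matrix(reports):
    """Return (sorted vocabulary, raw term-count matrix of shape terms x documents)."""
    vocab = sorted({tok for doc in reports for tok in doc.split()})
    index = {term: i for i, term in enumerate(vocab)}
    matrix = [[0.0] * len(reports) for _ in vocab]
    for j, doc in enumerate(reports):
        for tok in doc.split():
            matrix[index[tok]][j] += 1.0
    return vocab, matrix


def top_eigenpairs(sym, k, iters=500, seed=0):
    """Top-k eigenpairs of a symmetric PSD matrix via power iteration with deflation.

    Assumes k does not exceed the rank of the matrix.
    """
    n = len(sym)
    work = [row[:] for row in sym]
    rng = random.Random(seed)
    pairs = []
    for _ in range(k):
        vec = [rng.random() for _ in range(n)]
        for _ in range(iters):
            nxt = [sum(work[i][j] * vec[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in nxt))
            vec = [x / norm for x in nxt]
        # The Rayleigh quotient of the converged unit vector is its eigenvalue.
        value = sum(vec[i] * sum(work[i][j] * vec[j] for j in range(n)) for i in range(n))
        pairs.append((value, vec))
        # Deflate so that the next pass converges to the next eigenvector.
        for i in range(n):
            for j in range(n):
                work[i][j] -= value * vec[i] * vec[j]
    return pairs


def lsi_term_vectors(reports, k=2):
    """Map each term to its row of U_k * S_k from a rank-k SVD of the term-document matrix."""
    vocab, a = term_document_matrix(reports)
    n_docs = len(reports)
    # A * A^T is symmetric: its eigenvectors are the left singular vectors of A
    # and its eigenvalues are the squared singular values.
    gram = [[sum(a[i][d] * a[j][d] for d in range(n_docs)) for j in range(len(vocab))]
            for i in range(len(vocab))]
    pairs = top_eigenpairs(gram, k)
    return {
        term: [vec[i] * math.sqrt(max(value, 0.0)) for value, vec in pairs]
        for i, term in enumerate(vocab)
    }


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


vectors = lsi_term_vectors(REPORTS, k=2)
print("crash ~ freeze:", round(cosine(vectors["crash"], vectors["freeze"]), 3))
print("crash ~ font:  ", round(cosine(vectors["crash"], vectors["font"]), 3))
```

In this toy corpus, "crash" and "freeze" never occur in the same report, yet they share context terms such as "browser", "tab", and "startup", so their reduced vectors point in nearly the same direction, while "crash" and "font" come out nearly orthogonal. This is the kind of signal a term-pair lexicon could be built from; a real study would use a much larger corpus, term weighting, and an established library rather than this hand-rolled decomposition.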