Automatically mining software-based, semantically-similar words from comment-code mappings

M. Howard, Samir Gupta, L. Pollock, K. Vijay-Shanker
{"title":"Automatically mining software-based, semantically-similar words from comment-code mappings","authors":"M. Howard, Samir Gupta, L. Pollock, K. Vijay-Shanker","doi":"10.1109/MSR.2013.6624052","DOIUrl":null,"url":null,"abstract":"Many software development and maintenance tools involve matching between natural language words in different software artifacts (e.g., traceability) or between queries submitted by a user and software artifacts (e.g., code search). Because different people likely created the queries and various artifacts, the effectiveness of these tools is often improved by expanding queries and adding related words to textual artifact representations. Synonyms are particularly useful to overcome the mismatch in vocabularies, as well as other word relations that indicate semantic similarity. However, experience shows that many words are semantically similar in computer science situations, but not in typical natural language documents. In this paper, we present an automatic technique to mine semantically similar words, particularly in the software context. We leverage the role of leading comments for methods and programmer conventions in writing them. Our evaluation of our mined related comment-code word mappings that do not already occur in WordNet are indeed viewed as computer science, semantically-similar word pairs in high proportions.","PeriodicalId":325271,"journal":{"name":"2013 10th Working Conference on Mining Software Repositories (MSR)","volume":"61 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"92","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 10th Working Conference on Mining Software Repositories (MSR)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/MSR.2013.6624052","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 92

Abstract

Many software development and maintenance tools involve matching between natural language words in different software artifacts (e.g., traceability) or between queries submitted by a user and software artifacts (e.g., code search). Because different people likely created the queries and various artifacts, the effectiveness of these tools is often improved by expanding queries and adding related words to textual artifact representations. Synonyms are particularly useful to overcome the mismatch in vocabularies, as well as other word relations that indicate semantic similarity. However, experience shows that many words are semantically similar in computer science situations, but not in typical natural language documents. In this paper, we present an automatic technique to mine semantically similar words, particularly in the software context. We leverage the role of leading comments for methods and programmer conventions in writing them. Our evaluation of our mined related comment-code word mappings that do not already occur in WordNet are indeed viewed as computer science, semantically-similar word pairs in high proportions.
从注释代码映射中自动挖掘基于软件的语义相似的单词
许多软件开发和维护工具涉及不同软件工件(例如,可追溯性)中的自然语言单词之间的匹配,或者用户和软件工件(例如,代码搜索)提交的查询之间的匹配。由于不同的人可能创建查询和各种工件,因此这些工具的有效性通常通过扩展查询和向文本工件表示添加相关单词来提高。同义词对于克服词汇表中的不匹配以及其他表示语义相似性的单词关系特别有用。然而,经验表明,在计算机科学的情况下,许多词在语义上是相似的,而在典型的自然语言文档中则不然。在本文中,我们提出了一种自动挖掘语义相似词的技术,特别是在软件环境中。在编写方法和程序员约定时,我们利用了方法的主要注释的作用。我们对未在WordNet中出现的挖掘相关的注释代码词映射的评估确实被视为计算机科学,语义相似的词对的比例很高。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信