STRICT: Information retrieval based search term identification for concept location

2017 IEEE 24th International Conference on Software Analysis, Evolution and Reengineering (SANER) Pub Date : 2017-02-01 DOI:10.1109/SANER.2017.7884611

M. M. Rahman, C. Roy

Pages: 79-90

Citations: 30

Abstract

During maintenance, software developers deal with numerous change requests that are written in an unstructured fashion using natural language. Such natural language texts describe the change requirement in terms of various domain-related concepts. Software developers need to find appropriate search terms from those concepts so that they can locate the relevant places in the source code using a search technique. Once such locations are identified, they can implement the requested changes there. Studies suggest that developers often perform poorly in coming up with good search terms for a change task. In this paper, we propose a novel technique, STRICT, that automatically identifies suitable search terms for a software change task by analyzing its task description using two information retrieval (IR) techniques, TextRank and POSRank. These IR techniques determine a term's importance based not only on its co-occurrences with other important terms but also on its syntactic relationships with them. Experiments using 1,939 change requests from eight subject systems report that STRICT can identify better quality search terms than baseline terms from 52%–62% of the requests with 30%–57% Top-10 retrieval accuracy, which is promising. Comparison with two state-of-the-art techniques not only validates our empirical findings but also demonstrates the superiority of our technique.
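To make the co-occurrence idea concrete, the following is a minimal sketch of TextRank-style term scoring over a change request, not the authors' implementation. The stop-word list, the adjacent-word window, the damping factor of 0.85, and all function names are assumptions made for illustration. POSRank is not shown because it needs a part-of-speech tagger.

```python
import re
from collections import defaultdict

# Tiny illustrative stop-word list (an assumption, not the one used by STRICT).
STOP_WORDS = {"the", "and", "when", "should", "after", "that", "with", "for", "are", "this"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if len(w) > 2 and w not in STOP_WORDS]


def build_cooccurrence_graph(tokens, window=2):
    """Link each term to the distinct terms that follow it within the window."""
    graph = defaultdict(set)
    for i, term in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:
            if other != term:
                graph[term].add(other)
                graph[other].add(term)
    return graph


def text_rank(graph, damping=0.85, iterations=50):
    """Iteratively score terms: a term is important if its neighbours are."""
    scores = {term: 1.0 for term in graph}
    for _ in range(iterations):
        updated = {}
        for term in graph:
            # Each neighbour shares its score equally among its own neighbours.
            incoming = sum(scores[n] / len(graph[n]) for n in graph[term])
            updated[term] = (1 - damping) + damping * incoming
        scores = updated
    return scores


def suggest_search_terms(description, top_k=5):
    """Return the top_k highest-scoring terms of a change request description."""
    graph = build_cooccurrence_graph(tokenize(description))
    scores = text_rank(graph)
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]


if __name__ == "__main__":
    request = (
        "Search dialog crashes when the search index is rebuilt. "
        "The search index should refresh after the dialog closes."
    )
    print(suggest_search_terms(request, top_k=3))
```

Terms that co-occur with many other well-connected terms accumulate higher scores, so the highest-ranked terms can serve as a candidate query for a code search engine. This sketch ignores sentence boundaries and identifier splitting, both of which a practical tool would likely need to handle.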
