{"title":"SWIM: Synthesizing What I Mean - Code Search and Idiomatic Snippet Synthesis","authors":"Mukund Raghothaman, Yi Wei, Y. Hamadi","doi":"10.1145/2884781.2884808","DOIUrl":null,"url":null,"abstract":"Modern programming frameworks come with large libraries, with diverse applications such as for matching regular expressions, parsing XML files and sending email. Programmers often use search engines such as Google and Bing to learn about existing APIs. In this paper, we describe SWIM, a tool which suggests code snippets given API-related natural language queries such as \"generate md5 hash code\". We translate user queries into the APIs of interest using clickthrough data from the Bing search engine. Then, based on patterns learned from open-source code repositories, we synthesize idiomatic code describing the use of these APIs. We introduce \\emph{structured call sequences} to capture API-usage patterns. Structured call sequences are a generalized form of method call sequences, with if-branches and while-loops to represent conditional and repeated API usage patterns, and are simple to extract and amenable to synthesis. We evaluated SWIM with 30 common C# API-related queries received by Bing. For 70% of the queries, the first suggested snippet was a relevant solution, and a relevant solution was present in the top 10 results for all benchmarked queries. The online portion of the workflow is also very responsive, at an average of 1.5 seconds per snippet.","PeriodicalId":6485,"journal":{"name":"2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE)","volume":"23 1","pages":"357-367"},"PeriodicalIF":0.0000,"publicationDate":"2015-11-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"152","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2884781.2884808","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 152
Abstract
Modern programming frameworks come with large libraries, with diverse applications such as for matching regular expressions, parsing XML files and sending email. Programmers often use search engines such as Google and Bing to learn about existing APIs. In this paper, we describe SWIM, a tool which suggests code snippets given API-related natural language queries such as "generate md5 hash code". We translate user queries into the APIs of interest using clickthrough data from the Bing search engine. Then, based on patterns learned from open-source code repositories, we synthesize idiomatic code describing the use of these APIs. We introduce \emph{structured call sequences} to capture API-usage patterns. Structured call sequences are a generalized form of method call sequences, with if-branches and while-loops to represent conditional and repeated API usage patterns, and are simple to extract and amenable to synthesis. We evaluated SWIM with 30 common C# API-related queries received by Bing. For 70% of the queries, the first suggested snippet was a relevant solution, and a relevant solution was present in the top 10 results for all benchmarked queries. The online portion of the workflow is also very responsive, at an average of 1.5 seconds per snippet.
现代编程框架都带有大型库,以及各种各样的应用程序,比如匹配正则表达式、解析XML文件和发送电子邮件。程序员经常使用谷歌和必应等搜索引擎来了解现有的api。在本文中,我们描述了SWIM,这是一个工具,它给出了与api相关的自然语言查询(如“生成md5哈希码”)的代码片段。我们使用必应搜索引擎的点击数据将用户查询转换为感兴趣的api。然后,基于从开源代码存储库学习到的模式,我们合成描述这些api使用的惯用代码。我们引入\emph{结构化调用序列}来捕获api使用模式。结构化调用序列是方法调用序列的一种通用形式,使用if分支和while循环来表示有条件的和重复的API使用模式,并且易于提取并且易于合成。我们用Bing收到的30个常见的c# api相关查询对SWIM进行了评估。70岁% of the queries, the first suggested snippet was a relevant solution, and a relevant solution was present in the top 10 results for all benchmarked queries. The online portion of the workflow is also very responsive, at an average of 1.5 seconds per snippet.