Statistical Translation of English Texts to API Code Templates

A. Nguyen, Peter C. Rigby, THANH VAN NGUYEN, Dharani Palani, Mark Karanfil, T. Nguyen
{"title":"Statistical Translation of English Texts to API Code Templates","authors":"A. Nguyen, Peter C. Rigby, THANH VAN NGUYEN, Dharani Palani, Mark Karanfil, T. Nguyen","doi":"10.1109/ICSE-C.2017.81","DOIUrl":null,"url":null,"abstract":"We develop T2API, a context-sensitive, graph-based statistical translation approach that takes as input an English description of a programming task and synthesizes the corresponding API code template for the task. We train T2API to statistically learn the alignments between English and API elements and determine the relevant API elements. The training is done on StackOverflow, a bilingual corpus on which developers discuss programming problems in two types of language: English and programming language. T2API considers both the context of the words in the input query and the context of API elements that often go together in the corpus. The derived API elements with their relevance scores are assembled into an API usage by GraSyn, a novel graph-based API synthesis algorithm that generates a graph representing an API usage from a large code corpus. Importantly, it is capable of generating new API usages from previously seen sub-usages. We curate a test benchmark of 250 real-world StackOverflow posts. Across the benchmark, T2API's synthesized snippets have the correct API elements with a median top-1 precision and recall of 67% and 100%, respectively. Four professional developers and five graduate students judged that 77% of our top synthesized API code templates are useful to solve the problem presented in the StackOverflow posts.","PeriodicalId":6572,"journal":{"name":"2018 IEEE International Conference on Software Maintenance and Evolution (ICSME)","volume":"1 1","pages":"194-205"},"PeriodicalIF":0.0000,"publicationDate":"2017-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE International Conference on Software Maintenance and Evolution (ICSME)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICSE-C.2017.81","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

Abstract

We develop T2API, a context-sensitive, graph-based statistical translation approach that takes as input an English description of a programming task and synthesizes the corresponding API code template for the task. We train T2API to statistically learn the alignments between English and API elements and determine the relevant API elements. The training is done on StackOverflow, a bilingual corpus on which developers discuss programming problems in two types of language: English and programming language. T2API considers both the context of the words in the input query and the context of API elements that often go together in the corpus. The derived API elements with their relevance scores are assembled into an API usage by GraSyn, a novel graph-based API synthesis algorithm that generates a graph representing an API usage from a large code corpus. Importantly, it is capable of generating new API usages from previously seen sub-usages. We curate a test benchmark of 250 real-world StackOverflow posts. Across the benchmark, T2API's synthesized snippets have the correct API elements with a median top-1 precision and recall of 67% and 100%, respectively. Four professional developers and five graduate students judged that 77% of our top synthesized API code templates are useful to solve the problem presented in the StackOverflow posts.
英文文本到API代码模板的统计翻译
我们开发了T2API,这是一种上下文敏感的、基于图形的统计翻译方法,它将编程任务的英文描述作为输入,并为该任务合成相应的API代码模板。我们训练T2API来统计学习英语和API元素之间的对齐,并确定相关的API元素。培训是在StackOverflow上完成的,这是一个双语语料库,开发人员在上面用两种语言讨论编程问题:英语和编程语言。T2API既考虑输入查询中单词的上下文,也考虑语料库中经常一起出现的API元素的上下文。通过GraSyn(一种新颖的基于图的API合成算法),将派生的API元素及其相关分数组合成API使用情况,该算法可以从大型代码语料库中生成表示API使用情况的图。重要的是,它能够从以前看到的子用法生成新的API用法。我们策划了250个真实世界StackOverflow帖子的测试基准。在整个基准测试中,T2API的合成片段具有正确的API元素,其中位数top-1精度和召回率分别为67%和100%。四名专业开发人员和五名研究生认为,77%的顶级合成API代码模板对解决StackOverflow帖子中提出的问题是有用的。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信