When Code Completion Fails: A Case Study on Real-World Completions

Vincent J. Hellendoorn, Sebastian Proksch, Harald C. Gall, Alberto Bacchelli
{"title":"When Code Completion Fails: A Case Study on Real-World Completions","authors":"V. Hellendoorn, Sebastian Proksch, H. Gall, Alberto Bacchelli","doi":"10.1109/ICSE.2019.00101","DOIUrl":null,"url":null,"abstract":"Code completion is commonly used by software developers and is integrated into all major IDE's. Good completion tools can not only save time and effort but may also help avoid incorrect API usage. Many proposed completion tools have shown promising results on synthetic benchmarks, but these benchmarks make no claims about the realism of the completions they test. This lack of grounding in real-world data could hinder our scientific understanding of developer needs and of the efficacy of completion models. This paper presents a case study on 15,000 code completions that were applied by 66 real developers, which we study and contrast with artificial completions to inform future research and tools in this area. We find that synthetic benchmarks misrepresent many aspects of real-world completions; tested completion tools were far less accurate on real-world data. Worse, on the few completions that consumed most of the developers' time, prediction accuracy was less than 20% -- an effect that is invisible in synthetic benchmarks. Our findings have ramifications for future benchmarks, tool design and real-world efficacy: Benchmarks must account for completions that developers use most, such as intra-project APIs; models should be designed to be amenable to intra-project data; and real-world developer trials are essential to quantifying performance on the least predictable completions, which are both most time-consuming and far more typical than artificial data suggests. We publicly release our preprint [https://doi.org/10.5281/zenodo.2565673] and replication data and materials [https://doi.org/10.5281/zenodo.2562249].","PeriodicalId":6736,"journal":{"name":"2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)","volume":"8 1","pages":"960-970"},"PeriodicalIF":0.0000,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"55","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICSE.2019.00101","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 55

Abstract

Code completion is commonly used by software developers and is integrated into all major IDEs. Good completion tools can not only save time and effort but may also help avoid incorrect API usage. Many proposed completion tools have shown promising results on synthetic benchmarks, but these benchmarks make no claims about the realism of the completions they test. This lack of grounding in real-world data could hinder our scientific understanding of developer needs and of the efficacy of completion models. This paper presents a case study on 15,000 code completions that were applied by 66 real developers, which we study and contrast with artificial completions to inform future research and tools in this area. We find that synthetic benchmarks misrepresent many aspects of real-world completions; tested completion tools were far less accurate on real-world data. Worse, on the few completions that consumed most of the developers' time, prediction accuracy was less than 20%, an effect that is invisible in synthetic benchmarks. Our findings have ramifications for future benchmarks, tool design and real-world efficacy: Benchmarks must account for completions that developers use most, such as intra-project APIs; models should be designed to be amenable to intra-project data; and real-world developer trials are essential to quantifying performance on the least predictable completions, which are both most time-consuming and far more typical than artificial data suggests. We publicly release our preprint [https://doi.org/10.5281/zenodo.2565673] and replication data and materials [https://doi.org/10.5281/zenodo.2562249].
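
The core methodological contrast in the abstract is between benchmarks that synthesize completion tasks from committed code and evaluations replayed from completions developers actually triggered in the IDE. The sketch below is illustrative only, not the paper's actual tooling; all identifiers (synthetic_completion_tasks, CompletionEvent, top1_accuracy) are hypothetical, and the regex-based masking stands in for the AST-based extraction a real benchmark would use.

# Illustrative sketch, not the paper's tooling: contrasts synthetic
# benchmark-task construction with logged real-world completion events.
# All identifiers below are hypothetical.
from __future__ import annotations

import re
from dataclasses import dataclass


def synthetic_completion_tasks(source: str) -> list[tuple[str, str]]:
    """Typical synthetic setup: every `receiver.method(` call site in
    committed code becomes a task (code prefix, ground-truth method name),
    whether or not a developer would ever trigger completion there.
    A real benchmark would parse the AST; a regex keeps the sketch short."""
    tasks = []
    for m in re.finditer(r"\w+\.(\w+)\(", source):
        tasks.append((source[: m.start(1)], m.group(1)))
    return tasks


@dataclass
class CompletionEvent:
    """One logged real-world completion, as IDE instrumentation would
    capture it: the context at trigger time, the tool's ranked proposals,
    and what the developer applied (None if the popup was abandoned)."""
    context: str
    proposals: list[str]
    selected: str | None


def top1_accuracy(events: list[CompletionEvent]) -> float:
    """Fraction of *applied* completions where the tool's top proposal
    matched the developer's selection."""
    applied = [e for e in events if e.selected is not None]
    hits = sum(1 for e in applied if e.proposals and e.proposals[0] == e.selected)
    return hits / len(applied) if applied else 0.0


if __name__ == "__main__":
    print(synthetic_completion_tasks("x = db.query(q); db.close()"))
    events = [
        CompletionEvent("resp = client.", ["get", "post"], "get"),
        CompletionEvent("cfg.", ["load", "loadDefaults"], None),  # abandoned
    ]
    print(top1_accuracy(events))  # 1.0 on this toy data

The design point this makes concrete: the synthetic pipeline turns every call site into a task, while logged events cover only the locations where a developer actually invoked the tool, and it is on those real triggers, particularly intra-project APIs, that the paper reports accuracy dropping sharply.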