Are deep neural networks the best choice for modeling source code?

V. Hellendoorn, Premkumar T. Devanbu
{"title":"Are deep neural networks the best choice for modeling source code?","authors":"V. Hellendoorn, Premkumar T. Devanbu","doi":"10.1145/3106237.3106290","DOIUrl":null,"url":null,"abstract":"Current statistical language modeling techniques, including deep-learning based models, have proven to be quite effective for source code. We argue here that the special properties of source code can be exploited for further improvements. In this work, we enhance established language modeling approaches to handle the special challenges of modeling source code, such as: frequent changes, larger, changing vocabularies, deeply nested scopes, etc. We present a fast, nested language modeling toolkit specifically designed for software, with the ability to add & remove text, and mix & swap out many models. Specifically, we improve upon prior cache-modeling work and present a model with a much more expansive, multi-level notion of locality that we show to be well-suited for modeling software. We present results on varying corpora in comparison with traditional N-gram, as well as RNN, and LSTM deep-learning language models, and release all our source code for public use. Our evaluations suggest that carefully adapting N-gram models for source code can yield performance that surpasses even RNN and LSTM based deep-learning models.","PeriodicalId":313494,"journal":{"name":"Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"270","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3106237.3106290","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 270

Abstract

Current statistical language modeling techniques, including deep-learning based models, have proven to be quite effective for source code. We argue here that the special properties of source code can be exploited for further improvements. In this work, we enhance established language modeling approaches to handle the special challenges of modeling source code, such as frequent changes; larger, changing vocabularies; deeply nested scopes; etc. We present a fast, nested language modeling toolkit specifically designed for software, with the ability to add & remove text, and mix & swap out many models. Specifically, we improve upon prior cache-modeling work and present a model with a much more expansive, multi-level notion of locality that we show to be well-suited for modeling software. We present results on varying corpora in comparison with traditional N-gram, as well as RNN, and LSTM deep-learning language models, and release all our source code for public use. Our evaluations suggest that carefully adapting N-gram models for source code can yield performance that surpasses even RNN and LSTM based deep-learning models.
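The core idea the abstract describes, mixing a corpus-wide N-gram model with a cache of recently seen N-grams to exploit the strong locality of source code, can be sketched roughly as below. This is a minimal illustrative sketch in Python under stated assumptions: the class name, parameters, and fixed interpolation weight are ours, not the authors' released toolkit, which supports nested, multi-level caches rather than the single flat cache shown here.

```python
from collections import defaultdict, deque

class NgramCacheModel:
    """Sketch of an N-gram model interpolated with a local cache.

    Illustrative only: the paper's toolkit uses multiple nested levels
    of locality; this sketch has just one global and one cache level.
    """

    def __init__(self, n=3, cache_size=5000, cache_weight=0.5):
        self.n = n
        self.cache_weight = cache_weight  # assumed fixed mixing weight
        self.global_counts = defaultdict(lambda: defaultdict(int))
        self.cache = deque(maxlen=cache_size)  # recent (context, token) pairs

    def train(self, tokens):
        # Accumulate corpus-wide context -> token counts.
        for i in range(len(tokens)):
            ctx = tuple(tokens[max(0, i - self.n + 1):i])
            self.global_counts[ctx][tokens[i]] += 1

    @staticmethod
    def _prob(counts, ctx, token):
        dist = counts.get(ctx)
        if not dist:
            return 0.0
        return dist.get(token, 0) / sum(dist.values())

    def predict(self, ctx, token):
        # Linearly interpolate the global estimate with the cache estimate.
        ctx = tuple(ctx[-(self.n - 1):])
        cache_counts = defaultdict(lambda: defaultdict(int))
        for c, t in self.cache:
            cache_counts[c][t] += 1
        p_global = self._prob(self.global_counts, ctx, token)
        p_cache = self._prob(cache_counts, ctx, token)
        return (1 - self.cache_weight) * p_global + self.cache_weight * p_cache

    def observe(self, ctx, token):
        # Record tokens as the current file is traversed, so recently
        # seen local N-grams boost subsequent predictions (locality).
        self.cache.append((tuple(ctx[-(self.n - 1):]), token))


# Hypothetical usage: train globally, then predict and update per file.
model = NgramCacheModel(n=3)
model.train("public static void main ( String [ ] args )".split())
ctx, tok = ["static", "void"], "main"
print(model.predict(ctx, tok))
model.observe(ctx, tok)
```

Because the cache updates as a file is read, the model adapts to local vocabulary (identifiers, nested scopes) without retraining, which is one reason the paper reports that well-adapted N-gram models remain competitive with RNN and LSTM models.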