Blended, precise semantic program embeddings

Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation Pub Date : 2020-06-06 DOI:10.1145/3385412.3385999

Ke Wang, Z. Su

{"title":"Blended, precise semantic program embeddings","authors":"Ke Wang, Z. Su","doi":"10.1145/3385412.3385999","DOIUrl":null,"url":null,"abstract":"Learning neural program embeddings is key to utilizing deep neural networks in program languages research --- precise and efficient program representations enable the application of deep models to a wide range of program analysis tasks. Existing approaches predominately learn to embed programs from their source code, and, as a result, they do not capture deep, precise program semantics. On the other hand, models learned from runtime information critically depend on the quality of program executions, thus leading to trained models with highly variant quality. This paper tackles these inherent weaknesses of prior approaches by introducing a new deep neural network, Liger, which learns program representations from a mixture of symbolic and concrete execution traces. We have evaluated Liger on two tasks: method name prediction and semantics classification. Results show that Liger is significantly more accurate than the state-of-the-art static model code2seq in predicting method names, and requires on average around 10x fewer executions covering nearly 4x fewer paths than the state-of-the-art dynamic model DYPRO in both tasks. Liger offers a new, interesting design point in the space of neural program embeddings and opens up this new direction for exploration.","PeriodicalId":20580,"journal":{"name":"Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation","volume":"14 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2020-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"44","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3385412.3385999","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 44

Abstract

Learning neural program embeddings is key to utilizing deep neural networks in program languages research --- precise and efficient program representations enable the application of deep models to a wide range of program analysis tasks. Existing approaches predominately learn to embed programs from their source code, and, as a result, they do not capture deep, precise program semantics. On the other hand, models learned from runtime information critically depend on the quality of program executions, thus leading to trained models with highly variant quality. This paper tackles these inherent weaknesses of prior approaches by introducing a new deep neural network, Liger, which learns program representations from a mixture of symbolic and concrete execution traces. We have evaluated Liger on two tasks: method name prediction and semantics classification. Results show that Liger is significantly more accurate than the state-of-the-art static model code2seq in predicting method names, and requires on average around 10x fewer executions covering nearly 4x fewer paths than the state-of-the-art dynamic model DYPRO in both tasks. Liger offers a new, interesting design point in the space of neural program embeddings and opens up this new direction for exploration.

查看原文本刊更多论文

混合，精确的语义程序嵌入

学习神经程序嵌入是在程序语言研究中利用深度神经网络的关键——精确和有效的程序表示使深度模型能够应用于广泛的程序分析任务。现有的方法主要是从源代码学习嵌入程序，因此，它们不能捕获深入、精确的程序语义。另一方面，从运行时信息中学习的模型严重依赖于程序执行的质量，从而导致训练的模型具有高度变化的质量。本文通过引入一种新的深度神经网络Liger来解决先前方法的这些固有弱点，Liger从符号和具体执行痕迹的混合中学习程序表示。我们在两个任务上对Liger进行了评估:方法名称预测和语义分类。结果表明，在预测方法名方面，Liger比最先进的静态模型code2seq要准确得多，并且在这两个任务中，与最先进的动态模型DYPRO相比，平均需要的执行次数减少了10倍左右，路径减少了近4倍。Liger在神经程序嵌入领域提供了一个新的、有趣的设计点，并开辟了这一探索的新方向。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation

自引率

0.00%

发文量