Deep Learning Based Code Completion Models for Programming Codes
Shuai Wang, Jinyang Liu, Ye Qiu, Zhiyi Ma, Junfei Liu, Zhonghai Wu
DOI: 10.1145/3386164.3389083
Proceedings of the 2019 3rd International Symposium on Computer Science and Intelligent Control, published 2019-09-25
Citations: 4
Abstract
With the rapid development of information technology, software and mobile applications are widely used around the world and play important roles in daily life. Writing program code is therefore important work in many fields; however, it is a difficult and time-consuming task that places a heavy workload on programmers. To ease programmers' work, intelligent code completion models have become a popular research topic in recent years. This paper designs deep-learning-based models that automatically complete program code: LSTM-based neural networks combined with techniques such as word-embedding models from NLP (natural language processing) and a multi-head attention mechanism. Moreover, the paper proposes a new algorithm, RZT (Reverse Zig-zag Traverse), which generates input sequences from the nodes of a partial AST (abstract syntax tree) that are most relevant to the node to be predicted, and it is the first work to apply a multi-head attention block to this task. The paper examines code in several different programming languages, and the presented models show good accuracy compared with state-of-the-art models.
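The abstract names an LSTM backbone combined with word embeddings and a multi-head attention block, but gives no implementation details; the sketch below illustrates only the generic multi-head self-attention component over an embedded token context, not the authors' model. All dimensions, the random weight initialization, and the NumPy framing are assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, num_heads, rng):
    """One multi-head self-attention pass over an embedded token sequence.

    x: (seq_len, d_model) array of code-token embeddings.
    The projection weights are randomly initialized here purely for
    illustration; a trained completion model would learn them.
    """
    seq_len, d_model = x.shape
    assert d_model % num_heads == 0
    d_head = d_model // num_heads
    w_q, w_k, w_v, w_o = (
        rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
        for _ in range(4)
    )

    def split_heads(w):
        # Project, then split into heads: (num_heads, seq_len, d_head).
        return (x @ w).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split_heads(w_q), split_heads(w_k), split_heads(w_v)
    # Scaled dot-product attention, computed per head.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)   # each row sums to 1
    heads = weights @ v                  # (num_heads, seq_len, d_head)
    # Re-merge the heads and apply the output projection.
    merged = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return merged @ w_o
```

In the paper's setting, the input sequence would presumably be the token context produced by the RZT traversal of the partial AST and would also pass through the LSTM layers; those parts are not specified in the abstract and are omitted here.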