Code Generation Method based on Structured Tree Input and AST Decoder Attention Augmentation

2022 IEEE 22nd International Conference on Software Quality, Reliability, and Security Companion (QRS-C). Pub Date: 2022-12-01. DOI: 10.1109/QRS-C57518.2022.00077

Wenjun Wei, Junhua Wu


Citations: 0

Abstract

Automatic code generation from natural language input is an important research topic in software engineering. Earlier approaches mostly used a seq2seq architecture built on RNN models. They treat input and output as simple sequences and often ignore the syntactic structure of the source. This paper proposes a code generation method, Tx(Tree-Tree). It replaces plain word sequences with structured trees so that the model can better learn the syntactic and semantic information in the source, which alleviates the long-range dependency problem caused by overly long source input. In addition, the decoder adopts an enhanced attention mechanism that distinguishes the influence of different historical actions on the currently predicted action. The model is validated on three datasets: DJANGO, CONALA, and ATIS. Compared with some typical models, Tx(Tree-Tree) improves both accuracy and BLEU.
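The abstract does not spell out the form of the enhanced decoder attention. As a rough illustration only, the sketch below assumes plain scaled dot-product attention over the embeddings of earlier actions; the function name `attend_over_history`, the list-of-floats representation, and the toy vectors are hypothetical and are not taken from the paper.

```python
import math


def attend_over_history(query, history):
    """Weight earlier action vectors by their relevance to the current state.

    query   -- current decoder state as a list of floats (assumed representation)
    history -- list of earlier action embeddings, each the same length as query
    Returns (weights, context): one weight per earlier action, and the
    weighted sum of the earlier action vectors.
    """
    if not history:
        return [], [0.0] * len(query)

    # Scaled dot-product score between the current state and each past action.
    scale = math.sqrt(len(query))
    scores = [sum(q * h for q, h in zip(query, vec)) / scale for vec in history]

    # Numerically stable softmax over the scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Context vector: past actions combined according to their weights.
    context = [
        sum(w * vec[i] for w, vec in zip(weights, history))
        for i in range(len(query))
    ]
    return weights, context


# Toy example: three earlier actions with 2-dimensional embeddings.
history = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = attend_over_history([1.0, 0.0], history)
```

In this toy run the first and third actions receive equal weight and the second receives less, which is the sense in which attention can "distinguish the influence of different historical actions" on the current prediction.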
