Fine-grained Pseudo-code Generation Method via Code Feature Extraction and Transformer

2021 28th Asia-Pacific Software Engineering Conference (APSEC) | Pub Date: 2021-02-12 | DOI: 10.1109/APSEC53868.2021.00029

Guang Yang, Yanlin Zhou, Xiang Chen, Chi Yu


Citations: 16

Abstract

Pseudo-code written in natural language helps novice developers comprehend programs. However, writing such pseudo-code is time-consuming and laborious. Motivated by research advances in sequence-to-sequence learning and code semantic learning, we propose DeepPseudo, a novel deep pseudo-code generation method based on code feature extraction and the Transformer. In particular, DeepPseudo uses a Transformer encoder to encode the source code and then uses a code feature extractor to learn local features. Finally, it uses a pseudo-code generator to perform decoding, which produces the corresponding pseudo-code. We choose two corpora (i.e., Django and SPoC) from real-world large-scale projects as our empirical subjects. We first compare DeepPseudo with seven state-of-the-art baselines from the pseudo-code generation and neural machine translation domains in terms of four performance measures. The results show that DeepPseudo is competitive. Moreover, we analyze whether the component settings in DeepPseudo are justified.
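The abstract names three stages: a Transformer encoder, a code feature extractor for local features, and a pseudo-code generator that decodes. The pure-Python sketch below only illustrates how data could flow through such a pipeline. It is untrained and uses tiny hand-set vectors. Several choices are our own assumptions and are not confirmed by the abstract: the extractor is modelled as a 1-D convolution, only its output is passed to the decoder, and decoding is greedy with a single attention step. All function names (`encode`, `extract_local_features`, `generate`, `deep_pseudo_sketch`) are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention: softmax(q . k / sqrt(d))-weighted sum of values."""
    d = len(keys[0])
    out: Matrix = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, values))
                    for c in range(len(values[0]))])
    return out


def encode(code_emb: Matrix) -> Matrix:
    """Stand-in for a Transformer encoder layer: self-attention plus a residual
    connection (no learned projections, layer norm or feed-forward block)."""
    ctx = attention(code_emb, code_emb, code_emb)
    return [[x + y for x, y in zip(row, c)] for row, c in zip(code_emb, ctx)]


def extract_local_features(states: Matrix, kernel: Vector) -> Matrix:
    """Hypothetical local feature extractor: a 1-D convolution with one shared
    kernel applied to every channel, zero padding at the edges, then ReLU."""
    half = len(kernel) // 2
    n, d = len(states), len(states[0])
    out: Matrix = []
    for i in range(n):
        row = [0.0] * d
        for k, w in enumerate(kernel):
            j = i + k - half  # neighbour position covered by kernel tap k
            if 0 <= j < n:
                for c in range(d):
                    row[c] += w * states[j][c]
        out.append([max(0.0, x) for x in row])
    return out


def generate(memory: Matrix, vocab: List[str], tok_emb: Matrix,
             max_len: int = 8) -> List[str]:
    """Greedy decoding stand-in: the previous token's embedding attends over the
    memory; the next token is the vocab entry whose embedding best matches the
    resulting context. Stops at "</s>" or after max_len tokens."""
    out: List[str] = []
    start = vocab.index("<s>")
    prev = start
    for _ in range(max_len):
        ctx = attention([tok_emb[prev]], memory, memory)[0]
        scores = [sum(a * b for a, b in zip(ctx, e)) for e in tok_emb]
        scores[start] = -math.inf  # never emit the start token
        prev = max(range(len(vocab)), key=scores.__getitem__)
        if vocab[prev] == "</s>":
            break
        out.append(vocab[prev])
    return out


def deep_pseudo_sketch(code_emb: Matrix, vocab: List[str], tok_emb: Matrix) -> List[str]:
    """Encoder -> local feature extractor -> pseudo-code generator."""
    states = encode(code_emb)
    local = extract_local_features(states, kernel=[0.25, 0.5, 0.25])
    return generate(local, vocab, tok_emb)
```

A trained model would use learned projections, multiple layers and real token embeddings. The sketch only shows the division of labour. Self-attention in the encoder gives every code token a view of the whole sequence. The convolution window, three tokens wide with the assumed kernel, restricts the extractor to neighbouring positions, which is one plausible reading of "local features".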
