{"title":"Fine-grained Pseudo-code Generation Method via Code Feature Extraction and Transformer","authors":"Guang Yang, Yanlin Zhou, Xiang Chen, Chi Yu","doi":"10.1109/APSEC53868.2021.00029","DOIUrl":null,"url":null,"abstract":"Pseudo-code written by natural language is helpful for novice developers' program comprehension. However, writing such pseudo-code is time-consuming and laborious. Motivated by the research advancements of sequence-to-sequence learning and code semantic learning, we propose a novel deep pseudo-code generation method DeepPseudo via code feature extraction and Transformer. In particular, DeepPseudo utilizes a Transformer encoder to perform encoding for source code and then use a code feature extractor to learn the knowledge of local features. Finally, it uses a pseudo-code generator to perform decoding, which can generate the corresponding pseudo-code. We choose two corpora (i.e., Django and SPoC) from real-world large-scale projects as our empirical subjects. We first compare DeepPseudo with seven state-of-the-art baselines from pseudo-code generation and neural machine translation domains in terms of four performance measures. Results show the competitiveness of DeepPseudo. Moreover, we also analyze the rationality of the component settings in DeepPseudo.","PeriodicalId":143800,"journal":{"name":"2021 28th Asia-Pacific Software Engineering Conference (APSEC)","volume":"128 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-02-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"16","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 28th Asia-Pacific Software Engineering Conference (APSEC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/APSEC53868.2021.00029","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 16
Abstract
Pseudo-code written by natural language is helpful for novice developers' program comprehension. However, writing such pseudo-code is time-consuming and laborious. Motivated by the research advancements of sequence-to-sequence learning and code semantic learning, we propose a novel deep pseudo-code generation method DeepPseudo via code feature extraction and Transformer. In particular, DeepPseudo utilizes a Transformer encoder to perform encoding for source code and then use a code feature extractor to learn the knowledge of local features. Finally, it uses a pseudo-code generator to perform decoding, which can generate the corresponding pseudo-code. We choose two corpora (i.e., Django and SPoC) from real-world large-scale projects as our empirical subjects. We first compare DeepPseudo with seven state-of-the-art baselines from pseudo-code generation and neural machine translation domains in terms of four performance measures. Results show the competitiveness of DeepPseudo. Moreover, we also analyze the rationality of the component settings in DeepPseudo.