An Extensive Study of the Structure Features in Transformer-based Code Semantic Summarization
Kang Yang, Xinjun Mao, Shangwen Wang, Yihao Qin, Tanghaoran Zhang, Yao Lu, Kamal Al-Sabahi
2023 IEEE/ACM 31st International Conference on Program Comprehension (ICPC), May 2023
DOI: 10.1109/ICPC58990.2023.00024
Transformers are now widely utilized in code intelligence tasks. To better fit highly structured source code, various kinds of structural information are fed into the Transformer, such as positional encodings and abstract syntax tree (AST) based structures. However, it remains unclear how these structural features affect code intelligence tasks such as code summarization. Addressing this question is vital for designing Transformer-based code models. Existing work eagerly introduces various structural information into Transformers but lacks persuasive analysis revealing its contributions and interaction effects. In this paper, we conduct an empirical study of frequently used code structure features for code representation, including two types of position encoding features and AST-based structure features. We propose a set of probing tasks to detect how these structure features behave inside the Transformer, and we conduct comprehensive ablation studies to investigate how they affect code semantic summarization. To further validate the effectiveness of code structure features in code summarization, we assess Transformer models equipped with these features on a structure-dependent summarization dataset. Our experimental results reveal several findings that may inspire future study: (1) the influences of absolute positional embeddings and relative positional embeddings conflict with each other in the Transformer; (2) AST-based code structure features and relative position encoding features are strongly correlated, and their contributions to code semantic summarization overlap substantially; (3) Transformer models still have room for improvement in explicitly understanding code structure information.
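
For readers unfamiliar with the two position encoding families the study compares, the following is a minimal PyTorch sketch, not the paper's implementation: absolute positional embeddings are added to the token embeddings before attention, while relative position information enters as a learned bias on the attention logits (a simplified Shaw-et-al.-style scheme). All names and dimensions (ToyStructuredAttention, d_model, max_rel_dist) are illustrative assumptions.

    # Minimal single-head attention with both positional mechanisms.
    import math
    import torch
    import torch.nn as nn

    class ToyStructuredAttention(nn.Module):
        def __init__(self, d_model=64, max_len=512, max_rel_dist=32):
            super().__init__()
            # Absolute positions: one learned embedding per token index,
            # added to the token embeddings before attention.
            self.abs_pos = nn.Embedding(max_len, d_model)
            # Relative positions: one learned scalar bias per clipped
            # pairwise distance, added directly to the attention logits.
            self.rel_bias = nn.Embedding(2 * max_rel_dist + 1, 1)
            self.max_rel_dist = max_rel_dist
            self.qkv = nn.Linear(d_model, 3 * d_model)

        def forward(self, x):  # x: (batch, seq_len, d_model)
            b, n, d = x.shape
            pos = torch.arange(n, device=x.device)
            x = x + self.abs_pos(pos)  # absolute position signal
            q, k, v = self.qkv(x).chunk(3, dim=-1)
            logits = q @ k.transpose(-2, -1) / math.sqrt(d)
            # Clipped pairwise distances -> per-pair bias on the logits.
            dist = (pos[None, :] - pos[:, None]).clamp(
                -self.max_rel_dist, self.max_rel_dist)
            logits = logits + self.rel_bias(dist + self.max_rel_dist).squeeze(-1)
            return logits.softmax(dim=-1) @ v

Because both signals ultimately shape the same attention distribution, it is at least plausible for their influences to interfere, which is consistent with finding (1) above.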
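
Similarly, here is a hedged sketch of one common family of AST-based structure features: a pairwise adjacency matrix over AST nodes, which Transformer variants in this line of work often inject as an attention mask or bias. The abstract does not specify the exact features used, so this construction, including the helper name ast_adjacency, is an illustrative assumption.

    # Build an undirected parent-child adjacency matrix over the AST
    # of a Python snippet using the standard-library ast module.
    import ast

    def ast_adjacency(source: str):
        tree = ast.parse(source)
        nodes = list(ast.walk(tree))
        index = {id(n): i for i, n in enumerate(nodes)}
        adj = [[0] * len(nodes) for _ in nodes]
        for node in nodes:
            for child in ast.iter_child_nodes(node):
                i, j = index[id(node)], index[id(child)]
                adj[i][j] = adj[j][i] = 1  # undirected parent-child edge
        return nodes, adj

    nodes, adj = ast_adjacency("def add(a, b):\n    return a + b")
    print(len(nodes), sum(map(sum, adj)) // 2)  # node count, edge count

Since AST adjacency largely tracks token proximity in the source text, it offers some intuition for the contribution overlap with relative position features reported in finding (2).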