An Extensive Study of the Structure Features in Transformer-based Code Semantic Summarization

Kang Yang, Xinjun Mao, Shangwen Wang, Yihao Qin, Tanghaoran Zhang, Yao Lu, Kamal Al-Sabahi
Published in: 2023 IEEE/ACM 31st International Conference on Program Comprehension (ICPC)
Publication date: 2023-05-01
DOI: 10.1109/ICPC58990.2023.00024 (https://doi.org/10.1109/ICPC58990.2023.00024)
Citations: 0

Abstract

An Extensive Study of the Structure Features in Transformer-based Code Semantic Summarization
Transformers are now widely used in code intelligence tasks. To better fit highly structured source code, various kinds of structural information are fed into the Transformer, such as positional encodings and abstract syntax tree (AST) based structures. However, it is still unclear how these structural features affect code intelligence tasks such as code summarization. Addressing this problem is vital for designing Transformer-based code models. Existing work is keen to introduce structural information into Transformers but lacks persuasive analysis of the contributions and interaction effects of that information. In this paper, we conduct an empirical study of frequently used code structure features for code representation, including two types of position encoding features and AST-based structure features. We propose a couple of probing tasks to detect how these structure features perform in the Transformer, and we conduct comprehensive ablation studies to investigate how they affect code semantic summarization. To further validate the effectiveness of code structure features in code summarization, we assess Transformer models equipped with these features on a structure-dependent summarization dataset. Our experimental results reveal several findings that may inspire future work: (1) the influences of absolute positional embeddings and relative positional embeddings conflict in the Transformer; (2) AST-based code structure features and relative position encoding features are strongly correlated, and their contributions to code semantic summarization overlap substantially; (3) Transformer models still have room for improvement in explicitly understanding code structure information.
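
To make the three feature families named in the abstract concrete, here is a minimal illustrative sketch in Python. It is not the authors' implementation: the abstract does not say which variants were studied, so the sinusoidal absolute encoding, the clipped relative offsets, the function names, and the `max_distance` parameter are all assumptions chosen for illustration. The AST part uses Python's standard `ast` module on a toy snippet.

```python
import ast
import math
from typing import List, Tuple


def sinusoidal_absolute_encoding(seq_len: int, dim: int) -> List[List[float]]:
    """Absolute positions: row i depends only on the token's own index i.
    (Assumed variant: the sinusoidal scheme of the original Transformer.)"""
    table = []
    for pos in range(seq_len):
        row = []
        for k in range(dim):
            angle = pos / (10000 ** (2 * (k // 2) / dim))
            row.append(math.sin(angle) if k % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


def clipped_relative_positions(seq_len: int, max_distance: int) -> List[List[int]]:
    """Relative positions: entry (i, j) depends only on the offset j - i,
    clipped to [-max_distance, max_distance]. (Assumed variant.)"""
    return [
        [max(-max_distance, min(max_distance, j - i)) for j in range(seq_len)]
        for i in range(seq_len)
    ]


def ast_parent_child_pairs(source: str) -> List[Tuple[str, str]]:
    """AST-based structure: (parent node type, child node type) edges."""
    tree = ast.parse(source)
    return [
        (type(parent).__name__, type(child).__name__)
        for parent in ast.walk(tree)
        for child in ast.iter_child_nodes(parent)
    ]


# Toy token sequence and source snippet, both hypothetical.
tokens = ["def", "add", "(", "a", ",", "b", ")", ":"]
absolute = sinusoidal_absolute_encoding(len(tokens), dim=4)
relative = clipped_relative_positions(len(tokens), max_distance=2)
pairs = ast_parent_child_pairs("def add(a, b):\n    return a + b\n")

print(absolute[0])   # encoding of the first token
print(relative[3])   # offsets from token 3 to every other token
print(pairs[:3])     # first few AST edges
```

The absolute table assigns each token a vector tied to its index, the relative matrix describes only pairwise offsets, and the AST edges describe syntactic containment regardless of token order. Seen side by side, it is plausible that relative offsets and AST edges carry partly overlapping information, which is the kind of interaction the abstract reports studying.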