Deep Learning Representations of Programs: A Systematic Literature Review

Impact Factor: 28 | CAS Zone 1 (Computer Science) | JCR Q1 (Computer Science, Theory & Methods)
Deepika Shanmugasundaram, Pallavi Arivukkarasu, Huaming Chen, Haipeng Cai
DOI: 10.1145/3769008 (https://doi.org/10.1145/3769008)
Journal: ACM Computing Surveys
Publication date: 2025-10-08
Publication type: Journal Article
Citations: 0

Abstract

In the contemporary era, deep learning (DL) is increasingly recognized as a promising approach for enabling and optimizing various techniques, notably in the domain of DL for code (software programs). In essence, deep learning is mainly representation learning, which naturally holds for this domain. Thus, at the core of DL for code is deep representation learning for programs. The learned program representations can then be applied to various coding-related tasks, such as detecting vulnerabilities, providing recommendations for API usage, and extracting semantic and syntactic insights from extensive code lines. This is achieved by harnessing deep neural network architectures and deep-learning algorithms that take programs as inputs, serving various software engineering applications. In this paper, we conduct a systematic literature search to review studies pertaining to the representation of programs using deep learning approaches and their corresponding applications. Our search yielded 178 primary studies published between 2017 and 2023. Through these studies in the latest literature, we provide a systematization of knowledge in deep learning representation of programs, concerning the raw inputs to the learning pipeline, the neural network architecture employed, the learning algorithm utilized, and the downstream tasks (i.e., applications) of the learned representations. While examining the current landscape, we also identify limitations and challenges faced in the state of the art, as well as promising future research directions in deep program representation learning.
Source journal
ACM Computing Surveys
Category: Engineering & Technology, Computer Science: Theory & Methods
CiteScore: 33.20
Self-citation rate: 0.60%
Articles published: 372
Review time: 12 months
Journal description: ACM Computing Surveys (CSUR) is an academic journal that publishes surveys and tutorials on various areas of computing research and practice. The journal aims to provide comprehensive and easily understandable articles that guide readers through the literature and help them understand topics outside their specialties. In terms of impact, CSUR has a 2022 Impact Factor of 16.6 and is ranked 3rd out of 111 journals in the field of Computer Science, Theory & Methods. ACM Computing Surveys is indexed and abstracted in various services, including AI2 Semantic Scholar, Baidu, Clarivate/ISI: JCR, CNKI, DeepDyve, DTU, EBSCO: EDS/HOST, and IET Inspec, among others.