Deep Learning Representations of Programs: A Systematic Literature Review

IF 28 | Zone 1: Computer Science | Q1: COMPUTER SCIENCE, THEORY & METHODS

ACM Computing Surveys | Pub Date: 2025-10-08 | DOI: 10.1145/3769008

Deepika Shanmugasundaram, Pallavi Arivukkarasu, Huaming Chen, Haipeng Cai


Citations: 0

Abstract


Deep Learning Representations of Programs: A Systematic Literature Review

In the contemporary era, deep learning (DL) is increasingly recognized as a promising approach for enabling and optimizing various techniques, notably in the domain of DL for code (software programs). In essence, deep learning is mainly representation learning, which naturally holds for this domain. Thus, at the core of DL for code is deep representation learning for programs. The learned program representations can then be applied to various coding-related tasks, such as detecting vulnerabilities, providing recommendations for API usage, and extracting semantic and syntactic insights from extensive code lines. This is achieved by harnessing deep neural network architectures and deep-learning algorithms that take programs as inputs, serving various software engineering applications. In this paper, we conduct a systematic literature search to review studies pertaining to the representation of programs using deep learning approaches and their corresponding applications. Our search yielded 178 primary studies published between 2017 and 2023. Through these studies in the latest literature, we provide a systematization of knowledge in deep learning representation of programs, concerning the raw inputs to the learning pipeline, neural network architecture employed, learning algorithm utilized, and downstream tasks (i.e., applications) of the learned representations. While examining the current landscape, we also identify limitations and challenges faced in the state of the art, as well as promising future research directions in deep program representation learning.
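The abstract describes a pipeline that runs from raw program inputs, through a representation, to a downstream task. The sketch below only illustrates that shape and is not taken from the paper: it assumes Python source as the raw input, uses the abstract syntax tree as the intermediate form, and substitutes a hand-built bag of AST node types for the learned neural representation. The function names (`ast_node_types`, `represent`, `cosine`) and the similarity-ranking task are hypothetical choices made for illustration.

```python
import ast
import math
from collections import Counter


def ast_node_types(source: str) -> list:
    """Raw-input stage: parse source text and list the AST node type names."""
    tree = ast.parse(source)
    return [type(node).__name__ for node in ast.walk(tree)]


def represent(source: str) -> Counter:
    """Representation stage (stand-in, NOT learned): a bag of AST node types."""
    return Counter(ast_node_types(source))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Downstream task (hypothetical): compare snippets by structural similarity.
snippet_a = "def add(x, y):\n    return x + y\n"
snippet_b = "def plus(a, b):\n    return a + b\n"  # same structure, renamed identifiers
snippet_c = "for i in range(3):\n    print(i)\n"   # different structure

sim_ab = cosine(represent(snippet_a), represent(snippet_b))
sim_ac = cosine(represent(snippet_a), represent(snippet_c))
print(f"a vs b: {sim_ab:.3f}, a vs c: {sim_ac:.3f}")
```

Because the stand-in representation ignores identifier names, the two renamed functions score as more similar to each other than either does to the loop. In a deep learning setting, `represent` would presumably be replaced by a trained neural encoder, with the rest of the pipeline keeping the same shape.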


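Source journal: ACM Computing Surveys (Engineering & Technology, Computer Science: Theory & Methods)

CiteScore: 33.20

Self-citation rate: 0.60%

Articles published: 372

Review time: 12 months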

About the journal: ACM Computing Surveys is an academic journal that focuses on publishing surveys and tutorials on various areas of computing research and practice. The journal aims to provide comprehensive and easily understandable articles that guide readers through the literature and help them understand topics outside their specialties. In terms of impact, CSUR has a high reputation with a 2022 Impact Factor of 16.6. It is ranked 3rd out of 111 journals in the field of Computer Science Theory & Methods. ACM Computing Surveys is indexed and abstracted in various services, including AI2 Semantic Scholar, Baidu, Clarivate/ISI: JCR, CNKI, DeepDyve, DTU, EBSCO: EDS/HOST, and IET Inspec, among others.