Deepika Shanmugasundaram, Pallavi Arivukkarasu, Huaming Chen, Haipeng Cai
ACM Computing Surveys. DOI: 10.1145/3769008 (https://doi.org/10.1145/3769008). Publication date: 2025-10-08.
Deep Learning Representations of Programs: A Systematic Literature Review
Deep learning (DL) is increasingly recognized as a promising approach for enabling and optimizing various techniques, notably in the domain of DL for code (software programs). In essence, deep learning is mainly representation learning, and this holds naturally for code as well. Thus, at the core of DL for code is deep representation learning for programs. The learned program representations can then be applied to various coding-related tasks, such as detecting vulnerabilities, recommending API usage, and extracting semantic and syntactic insights from large bodies of code. This is achieved by harnessing deep neural network architectures and deep learning algorithms that take programs as inputs and serve various software engineering applications. In this paper, we conduct a systematic literature search to review studies on the representation of programs using deep learning approaches and on their corresponding applications. Our search yielded 178 primary studies published between 2017 and 2023. Through these studies, we provide a systematization of knowledge in deep learning representation of programs, covering the raw inputs to the learning pipeline, the neural network architecture employed, the learning algorithm used, and the downstream tasks (i.e., applications) of the learned representations. While examining the current landscape, we also identify limitations and challenges in the state of the art, as well as promising future research directions in deep program representation learning.
About the journal:
ACM Computing Surveys is an academic journal that focuses on publishing surveys and tutorials on various areas of computing research and practice. The journal aims to provide comprehensive and easily understandable articles that guide readers through the literature and help them understand topics outside their specialties. In terms of impact, CSUR has a high reputation with a 2022 Impact Factor of 16.6. It is ranked 3rd out of 111 journals in the field of Computer Science Theory & Methods.
ACM Computing Surveys is indexed and abstracted in various services, including AI2 Semantic Scholar, Baidu, Clarivate/ISI: JCR, CNKI, DeepDyve, DTU, EBSCO: EDS/HOST, and IET Inspec, among others.