More than a framework: Sketching out technical enablers for natural language-based source code generation

IF 13.3; Zone 1, Computer Science; Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

Computer Science Review Pub Date : 2024-05-25 DOI:10.1016/j.cosrev.2024.100637

Chen Yang, Yan Liu, Changqing Yin

Computer Science Review, volume 53, Article 100637. Journal Article. https://www.sciencedirect.com/science/article/pii/S1574013724000212

Citations: 0

Abstract

Natural Language-based Source Code Generation (NLSCG) promises to revolutionize how software is developed by bringing together a collection of intelligent technical enablers, building on sustained improvements to natural-language-to-source-code pipelines and the continuous adoption of new coding paradigms. In recent years, a wide variety of NLSCG technical solutions have been proposed, and exciting experimental results have been reported. Meanwhile, current research and early-stage application projects in this area reflect a great diversity of NLSCG contexts and major technical enablers. This heterogeneity, fragmentation, and vagueness of the NLSCG technical landscape currently hinders the full realization of the NLSCG research and application vision. Practitioners in this field cannot find systematic guidelines on how to effectively address the "known unknowns" or how to readily spot the "unknown unknowns", which ultimately hinders the turning of NLSCG solutions into further research enhancements or production applications. Understanding the context, boundaries, capabilities, and integration of NLSCG enablers is considered one of the key drivers for more practical application of NLSCG models. In this paper, we analyze in detail the natural-language-to-source-code pipelines and the evolution of source code generation tasks, considering both the problem context and technological aspects. A forward-looking reference framework for NLSCG is proposed to help handle source code generation tasks with appropriate intelligent models. We review the present-day NLSCG technical landscape, as well as the core technical enablers along the source code generation pipelines. Relevant experiments are conducted to validate the role of representative models across different technical enablers on typical datasets, and we finally highlight the contribution of different enablers to code generation capabilities.


Source journal

Computer Science Review Computer Science-General Computer Science

CiteScore

32.70

Self-citation rate

0.00%

Articles published

Review time

51 days

About the journal: Computer Science Review, a publication dedicated to research surveys and expository overviews of open problems in computer science, targets a broad audience within the field seeking comprehensive insights into the latest developments. The journal welcomes articles from various fields as long as their content impacts the advancement of computer science. In particular, articles that review the application of well-known computer science methods to other areas are in scope only if they advance the fundamental understanding of those methods.