Dianshu Liao;Shidong Pan;Xiaoyu Sun;Xiaoxue Ren;Qing Huang;Zhenchang Xing;Huan Jin;Qinying Li
Title: $A^{3}$-CodGen: A Repository-Level Code Generation Framework for Code Reuse With Local-Aware, Global-Aware, and Third-Party-Library-Aware
Journal: IEEE Transactions on Software Engineering, vol. 50, no. 12, pp. 3369-3384 (Impact Factor 6.5; JCR Q1, Computer Science, Software Engineering)
Published: 2024-10-24
DOI: 10.1109/TSE.2024.3486195
URL: https://ieeexplore.ieee.org/document/10734067/
Citation count: 0
Abstract
LLM-based code generation tools are essential for helping developers during software development. Existing tools are often disconnected from the working context, i.e., the code repository, so the code they generate differs from what human developers would write. In this paper, we propose a novel code generation framework, dubbed $A^{3}$-CodGen, that harnesses information within the code repository to generate code with fewer potential logical errors, less code redundancy, and fewer library-induced compatibility issues. We identify three types of representative information for the code repository: local-aware information from the current code file, global-aware information from other code files, and third-party-library information. Results demonstrate that the $A^{3}$-CodGen framework successfully extracts, fuses, and feeds code repository information into the LLM, generating more accurate, efficient, and highly reusable code. The effectiveness of our framework is further underscored by its generated code achieving a higher reuse rate than code written by human developers. This research contributes significantly to the field of code generation, providing developers with a more powerful tool to address the evolving demands of software development in practice.
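The abstract names three kinds of repository information but does not say how they are gathered or fused. As a rough illustration only, and not the authors' implementation, the sketch below uses Python's standard `ast` module to collect function signatures from the current file (local), function signatures from other files (global), and the non-standard-library imports of the current file (a stand-in for third-party-library information), then joins them into a prompt. The helper names `function_signatures`, `imported_modules`, `third_party_modules`, and `build_prompt`, as well as the prompt layout, are hypothetical and chosen for this example.

```python
import ast
import sys


def function_signatures(source: str) -> list[str]:
    """Return sorted 'name(param, ...)' strings for every function in the source."""
    signatures = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            params = ", ".join(arg.arg for arg in node.args.args)
            signatures.append(f"{node.name}({params})")
    return sorted(signatures)


def imported_modules(source: str) -> list[str]:
    """Return the sorted top-level module names imported by the source."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            modules.add(node.module.split(".")[0])
    return sorted(modules)


def third_party_modules(source: str, repo_modules: set[str]) -> list[str]:
    """Keep imports that are neither standard library nor repository modules."""
    # sys.stdlib_module_names exists on Python 3.10+; older versions fall back
    # to an empty set, so standard-library imports are not filtered out there.
    stdlib = getattr(sys, "stdlib_module_names", frozenset())
    return [m for m in imported_modules(source)
            if m not in stdlib and m not in repo_modules]


def build_prompt(requirement: str, current_file: str,
                 other_files: dict[str, str]) -> str:
    """Fuse local, global, and third-party information into one prompt string."""
    # Assumption: a repository module is named after its file (utils.py -> utils).
    repo_modules = {path.rsplit("/", 1)[-1].removesuffix(".py")
                    for path in other_files}
    lines = ["# Local functions (current file):"]
    lines += [f"- {sig}" for sig in function_signatures(current_file)]
    lines.append("# Global functions (other files):")
    for path, source in sorted(other_files.items()):
        lines += [f"- {path}: {sig}" for sig in function_signatures(source)]
    lines.append("# Third-party libraries in use:")
    lines += [f"- {name}"
              for name in third_party_modules(current_file, repo_modules)]
    lines.append("# Task:")
    lines.append(requirement)
    return "\n".join(lines)


if __name__ == "__main__":
    current = (
        "import os\n"
        "import numpy as np\n"
        "from utils import slugify\n\n"
        "def load(path):\n"
        "    return np.load(path)\n"
    )
    others = {"utils.py": "def slugify(text):\n    return text.lower()\n"}
    print(build_prompt("Add a save function", current, others))
```

Listing reusable functions and libraries already in use ahead of the task is one simple way to nudge a model toward reuse. A real system would also need retrieval or filtering to keep the prompt within the model's context window, which this sketch leaves out.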
About the journal:
IEEE Transactions on Software Engineering seeks contributions comprising well-defined theoretical results and empirical studies with potential impacts on software construction, analysis, or management. The scope of this Transactions extends from fundamental mechanisms to the development of principles and their application in specific environments. Specific topic areas include:
a) Development and maintenance methods and models: Techniques and principles for specifying, designing, and implementing software systems, encompassing notations and process models.
b) Assessment methods: Software tests, validation, reliability models, test and diagnosis procedures, software redundancy, design for error control, and measurements and evaluation of process and product aspects.
c) Software project management: Productivity factors, cost models, schedule and organizational issues, and standards.
d) Tools and environments: Specific tools, integrated tool environments, associated architectures, databases, and parallel and distributed processing issues.
e) System issues: Hardware-software trade-offs.
f) State-of-the-art surveys: Syntheses and comprehensive reviews of the historical development within specific areas of interest.