$\mathbf{A^{3}}$A3-CodGen: A Repository-Level Code Generation Framework for Code Reuse With Local-Aware, Global-Aware, and Third-Party-Library-Aware

IF 6.5 1区 计算机科学 Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING
Dianshu Liao;Shidong Pan;Xiaoyu Sun;Xiaoxue Ren;Qing Huang;Zhenchang Xing;Huan Jin;Qinying Li
{"title":"$\\mathbf{A^{3}}$A3-CodGen: A Repository-Level Code Generation Framework for Code Reuse With Local-Aware, Global-Aware, and Third-Party-Library-Aware","authors":"Dianshu Liao;Shidong Pan;Xiaoyu Sun;Xiaoxue Ren;Qing Huang;Zhenchang Xing;Huan Jin;Qinying Li","doi":"10.1109/TSE.2024.3486195","DOIUrl":null,"url":null,"abstract":"LLM-based code generation tools are essential to help developers in the software development process. Existing tools often disconnect with the working context, i.e., the code repository, causing the generated code to be not similar to human developers. In this paper, we propose a novel code generation framework, dubbed \n<inline-formula><tex-math>$A^{3}$</tex-math></inline-formula>\n-CodGen, to harness information within the code repository to generate code with fewer potential logical errors, code redundancy, and library-induced compatibility issues. We identify three types of representative information for the code repository: local-aware information from the current code file, global-aware information from other code files, and third-party-library information. Results demonstrate that by adopting the \n<inline-formula><tex-math>$A^{3}$</tex-math></inline-formula>\n-CodGen framework, we successfully extract, fuse, and feed code repository information into the LLM, generating more accurate, efficient, and highly reusable code. The effectiveness of our framework is further underscored by generating code with a higher reuse rate, compared to human developers. This research contributes significantly to the field of code generation, providing developers with a more powerful tool to address the evolving demands in software development in practice.","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"50 12","pages":"3369-3384"},"PeriodicalIF":6.5000,"publicationDate":"2024-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Software Engineering","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10734067/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}
引用次数: 0

Abstract

LLM-based code generation tools are essential to help developers in the software development process. Existing tools often disconnect with the working context, i.e., the code repository, causing the generated code to be not similar to human developers. In this paper, we propose a novel code generation framework, dubbed $A^{3}$ -CodGen, to harness information within the code repository to generate code with fewer potential logical errors, code redundancy, and library-induced compatibility issues. We identify three types of representative information for the code repository: local-aware information from the current code file, global-aware information from other code files, and third-party-library information. Results demonstrate that by adopting the $A^{3}$ -CodGen framework, we successfully extract, fuse, and feed code repository information into the LLM, generating more accurate, efficient, and highly reusable code. The effectiveness of our framework is further underscored by generating code with a higher reuse rate, compared to human developers. This research contributes significantly to the field of code generation, providing developers with a more powerful tool to address the evolving demands in software development in practice.
A3-CodGen : 面向代码重用的存储库级代码生成框架,具有本地感知、全局感知和第三方库感知功能
基于llm的代码生成工具对于在软件开发过程中帮助开发人员是必不可少的。现有的工具经常与工作环境脱节,例如,代码存储库,导致生成的代码与人类开发人员不相似。在本文中,我们提出了一个新的代码生成框架,称为$ a ^{3}$-CodGen,以利用代码存储库中的信息来生成具有更少潜在逻辑错误、代码冗余和库引起的兼容性问题的代码。我们为代码存储库确定了三种类型的代表性信息:来自当前代码文件的本地感知信息,来自其他代码文件的全局感知信息,以及第三方库信息。结果表明,通过采用$A^{3}$-CodGen框架,我们成功地提取、融合并将代码库信息提供给LLM,生成了更准确、高效和高可重用性的代码。与人类开发人员相比,通过生成具有更高重用率的代码,进一步强调了我们框架的有效性。该研究对代码生成领域做出了重大贡献,为开发人员提供了一个更强大的工具来解决软件开发实践中不断变化的需求。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
IEEE Transactions on Software Engineering
IEEE Transactions on Software Engineering 工程技术-工程:电子与电气
CiteScore
9.70
自引率
10.80%
发文量
724
审稿时长
6 months
期刊介绍: IEEE Transactions on Software Engineering seeks contributions comprising well-defined theoretical results and empirical studies with potential impacts on software construction, analysis, or management. The scope of this Transactions extends from fundamental mechanisms to the development of principles and their application in specific environments. Specific topic areas include: a) Development and maintenance methods and models: Techniques and principles for specifying, designing, and implementing software systems, encompassing notations and process models. b) Assessment methods: Software tests, validation, reliability models, test and diagnosis procedures, software redundancy, design for error control, and measurements and evaluation of process and product aspects. c) Software project management: Productivity factors, cost models, schedule and organizational issues, and standards. d) Tools and environments: Specific tools, integrated tool environments, associated architectures, databases, and parallel and distributed processing issues. e) System issues: Hardware-software trade-offs. f) State-of-the-art surveys: Syntheses and comprehensive reviews of the historical development within specific areas of interest.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信