Enhancing Project-Specific Code Completion by Inferring Internal API Information
Authors: Le Deng; Xiaoxia Ren; Chao Ni; Ming Liang; David Lo; Zhongxin Liu
Journal: IEEE Transactions on Software Engineering, vol. 51, no. 9, pp. 2566-2582 (Impact Factor 5.6; JCR Q1, Computer Science, Software Engineering)
Publication date: 2025-07-25
DOI: 10.1109/TSE.2025.3592823
URL: https://ieeexplore.ieee.org/document/11096713/
Citations: 0
Abstract
Project-specific code completion, which completes code based on the context of the surrounding project, is an important and practical software engineering task. State-of-the-art approaches follow the retrieval-augmented generation (RAG) paradigm: they prompt large language models (LLMs) with information retrieved from the target project. In practice, developers routinely define and use custom functionality, namely internal APIs, to implement project-specific requirements, so accurate project-specific code completion must take internal API information into account. However, existing approaches either retrieve similar code snippets, which do not necessarily contain the relevant internal API information, or retrieve internal API information based on import statements, which usually do not exist when the relevant internal APIs have not yet been used in the file. These approaches therefore face challenges in effectiveness or practicability. To this end, this paper aims to enhance project-specific code completion by locating internal API information without relying on import statements. We first propose a method to infer internal API information. The method extends the representation of each internal API with usage examples and functional semantic information (i.e., a natural language description of the function’s purpose) and stores these representations in a knowledge base. Given the knowledge base, the method uses an initial completion solution generated by LLMs to infer the API information needed for completion. Building on this method, we propose a code completion approach that integrates similar code snippets with internal API information. Furthermore, we developed a benchmark named ProjBench, which consists of recent, large-scale real-world projects and is free of leaked import statements.
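The inference step described above can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say how knowledge-base entries are matched against the initial completion, so the token-overlap (Jaccard) scoring, the `ApiEntry` fields, the prompt layout, and every example API name below are assumptions made only for illustration.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class ApiEntry:
    """One internal API with an extended representation (hypothetical fields)."""
    signature: str      # the API's declaration
    description: str    # natural-language summary of the function's purpose
    usage_example: str  # a short call-site snippet


def tokens(text: str) -> set[str]:
    """Lowercase word pieces; snake_case and camelCase names are split apart."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    return {t.lower() for t in re.findall(r"[A-Za-z]+", spaced)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Token-set overlap; an assumed stand-in for the paper's matching step."""
    return len(a & b) / len(a | b) if a and b else 0.0


def infer_apis(draft: str, kb: list[ApiEntry], top_k: int = 2) -> list[ApiEntry]:
    """Rank knowledge-base entries against an initial (possibly wrong) completion."""
    draft_tokens = tokens(draft)
    scored = []
    for entry in kb:
        entry_tokens = tokens(
            f"{entry.signature} {entry.description} {entry.usage_example}"
        )
        scored.append((jaccard(draft_tokens, entry_tokens), entry))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Entries with no overlap at all are dropped rather than padded in.
    return [entry for score, entry in scored[:top_k] if score > 0]


def build_prompt(unfinished_code: str, similar_snippets: list[str],
                 apis: list[ApiEntry]) -> str:
    """Place inferred API information and similar snippets before the code."""
    api_lines = [
        f"# {a.signature}: {a.description} (e.g. {a.usage_example})" for a in apis
    ]
    snippet_lines = [f"# {line}" for s in similar_snippets for line in s.splitlines()]
    header = ["# Relevant internal APIs:", *api_lines,
              "# Similar code from this project:", *snippet_lines]
    return "\n".join(header) + "\n" + unfinished_code


# Hypothetical project APIs, invented for this example.
knowledge_base = [
    ApiEntry("def load_config(path: str) -> dict",
             "Read the project settings file and return it as a dictionary",
             "cfg = load_config('settings.yaml')"),
    ApiEntry("def send_metrics(name: str, value: float) -> None",
             "Push a metric sample to the monitoring service",
             "send_metrics('latency_ms', 12.5)"),
    ApiEntry("def open_session(user_id: int) -> Session",
             "Create an authenticated database session for a user",
             "session = open_session(42)"),
]

# An initial completion that guessed a non-existent name; no import is needed.
draft = "cfg = read_settings(config_path)"
needed = infer_apis(draft, knowledge_base)
prompt = build_prompt("cfg = ", ["settings = load_config(default_path)"], needed)
```

In this toy run the draft calls a function that does not exist in the project, yet its wording overlaps with the description and usage example of `load_config`, which is why the extended representation helps when no import statement is available. A real system would likely replace the overlap score with a stronger retriever.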
We evaluated our approach on ProjBench and an existing benchmark, CrossCodeEval. Experimental results show that our approach outperforms the best-performing baseline by an average of +5.91 in code exact match and +6.26 in identifier exact match, corresponding to relative improvements of 22.72% and 18.31%, respectively. We also show that our method complements existing ones: integrating it into various baselines boosts code match by +7.77 (47.80%) and identifier match by +8.50 (35.55%) on average.
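To make the two reported metrics and the absolute-versus-relative figures concrete, here is a minimal sketch. The metric definitions are one common reading (string equality after trimming, and equality of the ordered identifier lists) and are an assumption, since the abstract does not define them; the regex-based identifier extraction is a rough approximation for Python-like code. The implied baseline scores are back-of-the-envelope values that hold only if the relative improvement was computed from the averaged scores; they are not reported in the abstract.

```python
import keyword
import re


def exact_match(prediction: str, reference: str) -> bool:
    """Code exact match: identical after trimming surrounding whitespace."""
    return prediction.strip() == reference.strip()


def identifiers(code: str) -> list[str]:
    """Ordered identifier-like tokens, skipping Python keywords.

    Regex approximation: it also picks up words inside strings and comments.
    """
    return [t for t in re.findall(r"[A-Za-z_]\w*", code) if not keyword.iskeyword(t)]


def identifier_exact_match(prediction: str, reference: str) -> bool:
    """Identifier exact match: the same identifiers in the same order."""
    return identifiers(prediction) == identifiers(reference)


def implied_baseline(absolute_gain: float, relative_gain_pct: float) -> float:
    """Baseline score b for which absolute_gain / b equals the relative gain."""
    return absolute_gain / (relative_gain_pct / 100.0)


# Formatting differences break code exact match but not identifier exact match.
pred, ref = "x=foo( a )", "x = foo(a)"
code_em = exact_match(pred, ref)
ident_em = identifier_exact_match(pred, ref)

# Implied (not reported) baseline averages under the stated assumption.
baseline_code_em = implied_baseline(5.91, 22.72)    # roughly 26.0
baseline_ident_em = implied_baseline(6.26, 18.31)   # roughly 34.2
```

Read this way, a gain of +5.91 points at 22.72% relative improvement suggests the strongest baseline averaged about 26 in code exact match, which gives a sense of how difficult the task remains.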
About the journal:
IEEE Transactions on Software Engineering seeks contributions comprising well-defined theoretical results and empirical studies with potential impacts on software construction, analysis, or management. The scope of this Transactions extends from fundamental mechanisms to the development of principles and their application in specific environments. Specific topic areas include:
a) Development and maintenance methods and models: Techniques and principles for specifying, designing, and implementing software systems, encompassing notations and process models.
b) Assessment methods: Software tests, validation, reliability models, test and diagnosis procedures, software redundancy, design for error control, and measurements and evaluation of process and product aspects.
c) Software project management: Productivity factors, cost models, schedule and organizational issues, and standards.
d) Tools and environments: Specific tools, integrated tool environments, associated architectures, databases, and parallel and distributed processing issues.
e) System issues: Hardware-software trade-offs.
f) State-of-the-art surveys: Syntheses and comprehensive reviews of the historical development within specific areas of interest.