DPEfficR: a data and parameter efficient approach for training neural API recommendation model
Haibo Yu, Xiaohong Han, Simin Chen, Xiaoning Feng, Guangzhao Sun, Wei Yang
Automated Software Engineering, volume 32, issue 2. Journal article, published 2025-07-16.
DOI: 10.1007/s10515-025-00530-8
https://link.springer.com/article/10.1007/s10515-025-00530-8
Citations: 0
Abstract
Recommending application programming interfaces (APIs) is practical and essential in today's programming landscape, and an accurate API recommendation system can significantly improve developers' coding efficiency. State-of-the-art (SOTA) API recommendation systems typically use a deep learning model as the backend. Training that backend model is challenging, however, because it requires substantial data-labeling effort and extensive computation. These costs weigh heavily on updating an existing API recommendation system whenever the API evolves. To address these issues, this paper proposes DPEfficR, a data- and parameter-efficient method for building API recommendation systems. DPEfficR comprises (1) a data selection module, (2) a task-specific parameter tuning module, and (3) a runtime API selection module. The data selection module selects representative data, and the task-specific parameter tuning module tunes pre-trained large language models (LLMs) through a small number of tunable parameters. Once the LLM is tuned, the runtime API selection module searches for a more accurate API sequence through consistency checking. We compare our approach against seven baseline methods of three different types. The evaluation shows that our approach recommends more accurate API sequences, improving BLEU-4 by 40% and ROUGE-2 by 25% over the baseline methods with only \(3.61 \times 10^{4}\) tunable parameters, just 0.049% of the parameters used in the baseline methods. Moreover, an ablation study demonstrates the effectiveness of the proposed modules.
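The abstract does not say how the runtime API selection module performs consistency checking. As a rough illustration only, the sketch below assumes one plausible reading: the tuned LLM produces several candidate API sequences, and the candidate that agrees most with the others is kept. The function names (`bigram_overlap`, `select_most_consistent`) and the bigram-overlap agreement measure are assumptions made for this example, not the paper's method.

```python
from collections import Counter


def bigram_overlap(a, b):
    """Dice-style overlap of adjacent API-call pairs, in [0, 1].

    Hypothetical agreement measure chosen for illustration.
    """
    pairs_a = Counter(zip(a, a[1:]))
    pairs_b = Counter(zip(b, b[1:]))
    total = sum(pairs_a.values()) + sum(pairs_b.values())
    if total == 0:
        # Sequences too short to contain a pair: fall back to equality.
        return 1.0 if a == b else 0.0
    shared = sum((pairs_a & pairs_b).values())
    return 2 * shared / total


def select_most_consistent(candidates):
    """Return the candidate with the highest mean overlap with the rest."""
    if not candidates:
        raise ValueError("need at least one candidate sequence")
    if len(candidates) == 1:
        return candidates[0]

    def mean_agreement(i):
        others = [c for j, c in enumerate(candidates) if j != i]
        return sum(bigram_overlap(candidates[i], o) for o in others) / len(others)

    # Ties are broken in favour of the earliest candidate.
    best = max(range(len(candidates)), key=mean_agreement)
    return candidates[best]


# Made-up candidates standing in for sampled model outputs.
candidates = [
    ["File.open", "File.read", "File.close"],
    ["File.open", "File.read", "File.close"],
    ["Socket.connect", "Socket.send"],
]
chosen = select_most_consistent(candidates)
```

Here the two identical file-handling sequences support each other, so one of them is chosen over the unrelated socket sequence. A real system would likely use a task-appropriate agreement measure and draw the candidates from the tuned model.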
About the journal:
This journal publishes research, tutorial papers, surveys, and accounts of significant industrial experience in the foundations, techniques, tools, and applications of automated software engineering technology. This includes the study of techniques for constructing, understanding, adapting, and modeling software artifacts and processes.
Coverage in Automated Software Engineering examines both automatic systems and collaborative systems as well as computational models of human software engineering activities. In addition, it presents knowledge representations and artificial intelligence techniques applicable to automated software engineering, and formal techniques that support or provide theoretical foundations. The journal also includes reviews of books, software, conferences and workshops.