Specialized model initialization and architecture optimization for few-shot code search

IF 3.8 2区 计算机科学 Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS
Fan Zhang , Qiang Wu , Manman Peng , Yuanyuan Shen
{"title":"Specialized model initialization and architecture optimization for few-shot code search","authors":"Fan Zhang ,&nbsp;Qiang Wu ,&nbsp;Manman Peng ,&nbsp;Yuanyuan Shen","doi":"10.1016/j.infsof.2024.107571","DOIUrl":null,"url":null,"abstract":"<div><h3>Context:</h3><p>Code search aims to find relevant code snippets from a codebase given a natural language query. It not only boosts developer efficiency but also improves the performance of tasks such as code generation and program repair, thus becoming one of the crucial tasks in software engineering.</p></div><div><h3>Objective:</h3><p>However, recent works are mainly designed for mainstream programming languages with abundant training data. We aim to address the challenges of code search for domain-specific programming languages with limited training data by proposing a novel two-stage, few-shot code search framework named SMIAO.</p></div><div><h3>Method:</h3><p>SMIAO includes a specialized model initialization and an architecture optimization stage. In the first stage, we first quantitatively identify a mainstream programming language’s dataset that is semantically closest to a target few-shot programming language. Then, we enrich the dataset with hard samples and train an Adapter-GraphCodeBERT model to obtain well-initialized parameters. In the second stage, we first design a search space for the initialized Adapter-GraphCodeBERT model. Then, we employ neural architecture search to optimize the Adapter modules’ positions and quantities in the GraphCodeBERT layers, tailoring for real-world few-shot code search tasks.</p></div><div><h3>Results:</h3><p>We conduct experiments on a publicly available dataset to demonstrate the effectiveness and rationality of SMIAO. The experimental results show that SMIAO outperforms other state-of-the-art baselines.</p></div><div><h3>Conclusion:</h3><p>Using mainstream languages’ datasets to initialize Adapter-GraphCodeBERT models, followed by adjusting the quantities and positions of Adapter modules within the GraphCodeBERT layers by neural architecture search, can effectively improve the performance of few-shot code search tasks.</p></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"177 ","pages":"Article 107571"},"PeriodicalIF":3.8000,"publicationDate":"2024-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0950584924001769/pdfft?md5=42c9abafebc31bfce0fe9d0923669722&pid=1-s2.0-S0950584924001769-main.pdf","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information and Software Technology","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0950584924001769","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0

Abstract

Context:

Code search aims to find relevant code snippets from a codebase given a natural language query. It not only boosts developer efficiency but also improves the performance of tasks such as code generation and program repair, thus becoming one of the crucial tasks in software engineering.

Objective:

However, recent works are mainly designed for mainstream programming languages with abundant training data. We aim to address the challenges of code search for domain-specific programming languages with limited training data by proposing a novel two-stage, few-shot code search framework named SMIAO.

Method:

SMIAO includes a specialized model initialization and an architecture optimization stage. In the first stage, we first quantitatively identify a mainstream programming language’s dataset that is semantically closest to a target few-shot programming language. Then, we enrich the dataset with hard samples and train an Adapter-GraphCodeBERT model to obtain well-initialized parameters. In the second stage, we first design a search space for the initialized Adapter-GraphCodeBERT model. Then, we employ neural architecture search to optimize the Adapter modules’ positions and quantities in the GraphCodeBERT layers, tailoring for real-world few-shot code search tasks.

Results:

We conduct experiments on a publicly available dataset to demonstrate the effectiveness and rationality of SMIAO. The experimental results show that SMIAO outperforms other state-of-the-art baselines.

Conclusion:

Using mainstream languages’ datasets to initialize Adapter-GraphCodeBERT models, followed by adjusting the quantities and positions of Adapter modules within the GraphCodeBERT layers by neural architecture search, can effectively improve the performance of few-shot code search tasks.

针对少量代码搜索的专用模型初始化和架构优化
背景:代码搜索旨在根据自然语言查询从代码库中找到相关的代码片段。它不仅能提高开发人员的效率,还能改善代码生成和程序修复等任务的性能,因此成为软件工程中的重要任务之一。目标:然而,最近的研究主要是针对训练数据丰富的主流编程语言而设计的。方法:SMIAO 包括专门的模型初始化和架构优化阶段。在第一阶段,我们首先定量地确定一个主流编程语言的数据集,该数据集在语义上最接近目标 few-shot 编程语言。然后,我们用硬样本丰富数据集,并训练 Adapter-GraphCodeBERT 模型,以获得良好的初始化参数。在第二阶段,我们首先为初始化的 Adapter-GraphCodeBERT 模型设计一个搜索空间。然后,我们采用神经架构搜索来优化适配器模块在GraphCodeBERT层中的位置和数量,为现实世界的少量代码搜索任务量身定制。结果:我们在一个公开可用的数据集上进行了实验,以证明SMIAO的有效性和合理性。实验结果表明,SMIAO优于其他最先进的基线。结论:利用主流语言的数据集初始化Adapter-GraphCodeBERT模型,然后通过神经架构搜索调整Adapter模块在GraphCodeBERT层中的数量和位置,可以有效提高少量代码搜索任务的性能。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Information and Software Technology
Information and Software Technology 工程技术-计算机:软件工程
CiteScore
9.10
自引率
7.70%
发文量
164
审稿时长
9.6 weeks
期刊介绍: Information and Software Technology is the international archival journal focusing on research and experience that contributes to the improvement of software development practices. The journal''s scope includes methods and techniques to better engineer software and manage its development. Articles submitted for review should have a clear component of software engineering or address ways to improve the engineering and management of software development. Areas covered by the journal include: • Software management, quality and metrics, • Software processes, • Software architecture, modelling, specification, design and programming • Functional and non-functional software requirements • Software testing and verification & validation • Empirical studies of all aspects of engineering and managing software development Short Communications is a new section dedicated to short papers addressing new ideas, controversial opinions, "Negative" results and much more. Read the Guide for authors for more information. The journal encourages and welcomes submissions of systematic literature studies (reviews and maps) within the scope of the journal. Information and Software Technology is the premiere outlet for systematic literature studies in software engineering.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信