A search-and-fill strategy to code generation for complex software requirements

IF 3.8 · CAS Zone 2, Computer Science · JCR Q2, COMPUTER SCIENCE, INFORMATION SYSTEMS
Yukun Dong, Lingjie Kong, Lulu Zhang, Shuqi Wang, Xiaoshan Liu, Shuai Liu, Mingcheng Chen
Information and Software Technology, Volume 177, Article 107584 (published 2024-09-20)
DOI: 10.1016/j.infsof.2024.107584
https://www.sciencedirect.com/science/article/pii/S0950584924001897
Citations: 0

Abstract

Context:

The realm of software development has seen significant transformations with the rise of Low-Code Development (LCD) and the integration of Artificial Intelligence (AI), particularly large language models, into coding practices. The proliferation of open-source software also offers vast resources for developers.

Objective:

We aim to combine the benefits of modifying retrieved code with the use of an extensive code repository to tackle the challenges of complex control structures and multifunctional requirements in software development.

Method:

Our study introduces a Search-and-Fill strategy that uses natural language processing (NLP) to decompose complex software requirements: it extracts the control structures and identifies the atomic function points. The control-structure descriptions are automatically transformed into program structures, and the strategy uses large-scale pre-trained models to search for code corresponding to the extracted elements, which is then filled into those structures. The result is a code snippet containing both the program control structures and the implementations of the individual function points, facilitating code reuse and efficient development.
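
A minimal sketch of the search-and-fill idea follows, under strong simplifying assumptions. The abstract does not specify the NLP rules or the retrieval model, so here a hypothetical regex rule recognises only clauses joined by " then " and a single "for each VAR in SEQ: BODY" loop pattern, the repository is a hypothetical four-entry list, and plain token-overlap (Jaccard) similarity is swapped in for the pre-trained-model search. The names REPOSITORY, build_skeleton, search and search_and_fill are illustrative, not the authors' implementation.

```python
import re

# Hypothetical toy repository of (description, code) pairs. The paper searches a
# large corpus with pre-trained models; this four-entry list only stands in for it.
REPOSITORY = [
    ("read lines from a text file", "lines = open(path).read().splitlines()"),
    ("remove duplicate items from a list", "items = list(dict.fromkeys(items))"),
    ("sort the list in ascending order", "items.sort()"),
    ("print each item", "print(item)"),
]


def tokens(text: str) -> set[str]:
    """Lower-cased word tokens of a description."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Token-overlap similarity, swapped in for pre-trained-model retrieval."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def search(function_point: str) -> str:
    """Return the repository snippet whose description best matches the function point."""
    query = tokens(function_point)
    _, code = max(REPOSITORY, key=lambda pair: jaccard(query, tokens(pair[0])))
    return code


def build_skeleton(requirement: str) -> tuple[list[str], list[str]]:
    """Split a requirement into skeleton lines with '{hole}' markers and the atomic
    function points that fill them. Hypothetical rules: clauses are joined by
    ' then ', and 'for each VAR in SEQ: BODY' becomes a loop around one hole."""
    skeleton, points = [], []
    for clause in requirement.split(" then "):
        loop = re.match(r"for each (\w+) in (\w+): (.+)", clause)
        if loop:
            var, seq, body = loop.groups()
            skeleton += [f"for {var} in {seq}:", "    {hole}"]
            points.append(body)
        else:
            skeleton.append("{hole}")
            points.append(clause)
    return skeleton, points


def search_and_fill(requirement: str) -> str:
    """Search a snippet for every function point and fill the holes in order."""
    skeleton, points = build_skeleton(requirement)
    snippets = iter([search(p) for p in points])
    return "\n".join(
        line.replace("{hole}", next(snippets)) if "{hole}" in line else line
        for line in skeleton
    )


requirement = (
    "remove duplicate items from a list then sort the list in ascending order "
    "then for each item in items: print each item"
)
print(search_and_fill(requirement))
# items = list(dict.fromkeys(items))
# items.sort()
# for item in items:
#     print(item)
```

The separation mirrors the description above: the control structure (here, the loop) is produced from the requirement text alone, while each atomic function point is resolved independently by retrieval, so a better retriever can be dropped into search without touching the skeleton logic.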

Results:

We validated the effectiveness of our strategy in generating code snippets. For natural language requirements involving multifunctional complex structures, we constructed two datasets, the Basic Complex Requirements Dataset (BCRD) and the Advanced Complex Requirements Dataset (ACRD), built from randomly extracted and combined natural language descriptions and Python code. The generated code snippets achieved the best results on ACRD, with BLEU-4 scores of up to 0.6326 and TEDS scores of up to 0.7807.
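
The two reported metrics can be sketched as follows. BLEU-4 is shown in its standard sentence-level form (clipped 1- to 4-gram precisions, uniform weights, brevity penalty, no smoothing, single reference) over whitespace tokens. TEDS is computed with the common normalisation TEDS = 1 - TED(Ta, Tb) / max(|Ta|, |Tb|), using unit-cost tree edit distance on Python ASTs whose node labels are ast node type names. The tokenisation, tree representation and unit costs are assumptions; the abstract does not state the paper's exact evaluation settings. The edit distance uses the plain memoised rightmost-root recurrence for ordered forests rather than the optimised Zhang-Shasha algorithm, which is adequate only for small snippets.

```python
import ast
import math
from collections import Counter
from functools import lru_cache


def ngrams(tokens: list[str], n: int) -> Counter:
    """Multiset of the n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(candidate: list[str], reference: list[str]) -> float:
    """Sentence-level BLEU-4 against one reference, without smoothing."""
    log_precision = 0.0
    for n in range(1, 5):
        cand = ngrams(candidate, n)
        clipped = sum((cand & ngrams(reference, n)).values())  # min(count_cand, count_ref)
        total = sum(cand.values())
        if clipped == 0 or total == 0:
            return 0.0
        log_precision += 0.25 * math.log(clipped / total)  # uniform weights 1/4
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_precision)


def to_tree(node: ast.AST) -> tuple:
    """Convert a Python AST into a hashable (label, children) tuple."""
    return (type(node).__name__, tuple(to_tree(c) for c in ast.iter_child_nodes(node)))


def size(forest: tuple) -> int:
    """Number of nodes in a forest of (label, children) trees."""
    return sum(1 + size(children) for _, children in forest)


@lru_cache(maxsize=None)
def forest_dist(f: tuple, g: tuple) -> int:
    """Unit-cost edit distance between ordered forests (rightmost-root recurrence)."""
    if not f:
        return size(g)
    if not g:
        return size(f)
    (lv, cv), (lw, cw) = f[-1], g[-1]
    return min(
        forest_dist(f[:-1] + cv, g) + 1,  # delete rightmost root of f
        forest_dist(f, g[:-1] + cw) + 1,  # insert rightmost root of g
        forest_dist(cv, cw) + forest_dist(f[:-1], g[:-1]) + (lv != lw),  # match / relabel
    )


def teds(code_a: str, code_b: str) -> float:
    """Tree-edit-distance similarity between the ASTs of two Python snippets."""
    ta, tb = to_tree(ast.parse(code_a)), to_tree(ast.parse(code_b))
    return 1 - forest_dist((ta,), (tb,)) / max(size((ta,)), size((tb,)))


ref = "for item in items : print ( item )".split()
print(bleu4(ref, ref))                 # 1.0
print(teds("x = a + b", "x = a - b"))  # 0.9: one relabel (Add -> Sub) over 10 AST nodes
```

BLEU-4 rewards surface n-gram overlap, while TEDS compares program structure, which is why the two scores can diverge for snippets that differ only in identifiers or only in nesting.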

Conclusion:

The Search-and-Fill strategy generates comprehensive code snippets that integrate the essential control structures and functions, streamlining the development process. Experimental results show that, by integrating preprocessing with a selection optimization approach, the strategy is effective at optimizing code reuse. Future research will focus on improving the recognition of complex software requirements and further refining the generated code snippets.
Source journal
Information and Software Technology (Engineering & Technology, Computer Science: Software Engineering)
CiteScore: 9.10
Self-citation rate: 7.70%
Publications: 164
Review time: 9.6 weeks
About the journal: Information and Software Technology is the international archival journal focusing on research and experience that contributes to the improvement of software development practices. The journal's scope includes methods and techniques to better engineer software and manage its development. Articles submitted for review should have a clear component of software engineering or address ways to improve the engineering and management of software development. Areas covered by the journal include:
• Software management, quality and metrics
• Software processes
• Software architecture, modelling, specification, design and programming
• Functional and non-functional software requirements
• Software testing and verification & validation
• Empirical studies of all aspects of engineering and managing software development
Short Communications is a new section dedicated to short papers addressing new ideas, controversial opinions, "Negative" results and much more. Read the Guide for authors for more information. The journal encourages and welcomes submissions of systematic literature studies (reviews and maps) within the scope of the journal. Information and Software Technology is the premier outlet for systematic literature studies in software engineering.