{"title":"A search-and-fill strategy to code generation for complex software requirements","authors":"Yukun Dong, Lingjie Kong, Lulu Zhang, Shuqi Wang, Xiaoshan Liu, Shuai Liu, Mingcheng Chen","doi":"10.1016/j.infsof.2024.107584","DOIUrl":null,"url":null,"abstract":"<div><h3>Context:</h3><div>The realm of software development has seen significant transformations with the rise of Low-Code Development (LCD) and the integration of Artificial Intelligence (AI), particularly large language models, into coding practices. The proliferation of open-source software also offers vast resources for developers.</div></div><div><h3>Objective:</h3><div>We aim to combine the benefits of modifying retrieved code with the use of an extensive code repository to tackle the challenges of complex control structures and multifunctional requirements in software development.</div></div><div><h3>Method:</h3><div>Our study introduces a Search-and-Fill strategy that utilizes natural language processing (NLP) to dissect complex software requirements. It extracts control structures and identifies atomic function points. By leveraging large-scale pre-trained models, the strategy searches for these elements to fill in the automatically transformed program structures derived from descriptions of control structures. This process generates a code snippet that includes program control structures and the implementations of various function points, thereby facilitating both code reuse and efficient development.</div></div><div><h3>Results:</h3><div>We have validated the effectiveness of our strategy in generating code snippets. For natural language requirements involving multifunctional complex structures, we constructed two datasets: the Basic Complex Requirements Dataset (BCRD) and the Advanced Complex Requirements Dataset (ACRD). These datasets are based on natural language descriptions and Python code that were randomly extracted and combined. For the code snippets to be generated, we achieved the best results with the ACRD dataset, with BLEU-4 scores reaching up to 0.6326 and TEDS scores peaking at 0.7807.</div></div><div><h3>Conclusion:</h3><div>The Search-and-Fill strategy successfully generates a comprehensive code snippets, integrating essential control structures and functions to streamline the development process. Experimental results substantiate our strategy’s efficacy in optimizing code reuse by effectively integrating preprocessing and selection optimization approach. Future research will focus on enhancing the recognition of complex software requirements and further refining the code snippets.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"177 ","pages":"Article 107584"},"PeriodicalIF":3.8000,"publicationDate":"2024-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information and Software Technology","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0950584924001897","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0
Abstract
Context:
The realm of software development has seen significant transformations with the rise of Low-Code Development (LCD) and the integration of Artificial Intelligence (AI), particularly large language models, into coding practices. The proliferation of open-source software also offers vast resources for developers.
Objective:
We aim to combine the benefits of modifying retrieved code with the use of an extensive code repository to tackle the challenges of complex control structures and multifunctional requirements in software development.
Method:
Our study introduces a Search-and-Fill strategy that utilizes natural language processing (NLP) to dissect complex software requirements. It extracts control structures and identifies atomic function points. By leveraging large-scale pre-trained models, the strategy searches for these elements to fill in the automatically transformed program structures derived from descriptions of control structures. This process generates a code snippet that includes program control structures and the implementations of various function points, thereby facilitating both code reuse and efficient development.
Results:
We have validated the effectiveness of our strategy in generating code snippets. For natural language requirements involving multifunctional complex structures, we constructed two datasets: the Basic Complex Requirements Dataset (BCRD) and the Advanced Complex Requirements Dataset (ACRD). These datasets are based on natural language descriptions and Python code that were randomly extracted and combined. For the code snippets to be generated, we achieved the best results with the ACRD dataset, with BLEU-4 scores reaching up to 0.6326 and TEDS scores peaking at 0.7807.
Conclusion:
The Search-and-Fill strategy successfully generates a comprehensive code snippets, integrating essential control structures and functions to streamline the development process. Experimental results substantiate our strategy’s efficacy in optimizing code reuse by effectively integrating preprocessing and selection optimization approach. Future research will focus on enhancing the recognition of complex software requirements and further refining the code snippets.
期刊介绍:
Information and Software Technology is the international archival journal focusing on research and experience that contributes to the improvement of software development practices. The journal''s scope includes methods and techniques to better engineer software and manage its development. Articles submitted for review should have a clear component of software engineering or address ways to improve the engineering and management of software development. Areas covered by the journal include:
• Software management, quality and metrics,
• Software processes,
• Software architecture, modelling, specification, design and programming
• Functional and non-functional software requirements
• Software testing and verification & validation
• Empirical studies of all aspects of engineering and managing software development
Short Communications is a new section dedicated to short papers addressing new ideas, controversial opinions, "Negative" results and much more. Read the Guide for authors for more information.
The journal encourages and welcomes submissions of systematic literature studies (reviews and maps) within the scope of the journal. Information and Software Technology is the premiere outlet for systematic literature studies in software engineering.