Generating Java code pairing with ChatGPT
Authors: Zelong Zhao, Nan Zhang, Bin Yu, Zhenhua Duan
Journal: Theoretical Computer Science, volume 1021, Article 114879
Publication date: 2024-09-17
Publication type: Journal Article
DOI: 10.1016/j.tcs.2024.114879
URL: https://www.sciencedirect.com/science/article/pii/S0304397524004961
Impact factor: 0.9; JCR: Q3 (Computer Science, Theory & Methods)
Open access: no
Citations: 0
Abstract
Large Language Models (LLMs) such as ChatGPT 3.5 have opened a new era of automatic code generation. However, existing research primarily focuses on generating simple code for benchmark datasets such as HumanEval, and most approaches pay little attention to complex, practical code generation. Therefore, in this paper, we propose a new approach called “Xd-CodeGen” that can be used to generate large-scale Java code. The approach is composed of four phases: requirement analysis, modeling, code generation, and code verification. In the requirement analysis phase, ChatGPT 3.5 is used to decompose and restate user requirements. To do so, a knowledge graph is developed to describe entities and their relationships in detail. Further, Propositional Projection Temporal Logic (PPTL) formulas are employed to define the properties of the requirements. In the modeling phase, we use knowledge graphs to enhance prompts and generate UML class and activity diagrams for each sub-requirement using ChatGPT 3.5. In the code generation phase, based on the established UML models, we use prompt engineering and the knowledge graph to generate Java code. In the code verification phase, a code-level runtime verification approach is employed to verify the generated Java code. Finally, we apply the proposed approach to develop a practical Java web project.
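To make the idea of code-level runtime verification concrete, the following is a minimal sketch, not the verifier used in the paper. It assumes a far simpler property than a PPTL formula: on a finite event trace, every occurrence of a trigger event must eventually be followed by a response event. The class name `ResponseMonitor`, its methods, and the event names are hypothetical and chosen only for illustration.

```java
import java.util.List;

/**
 * Hypothetical runtime monitor for a simple response property:
 * "every trigger event is eventually followed by a response event".
 * This is an illustrative stand-in, not PPTL and not the paper's tool.
 */
final class ResponseMonitor {
    private final String trigger;
    private final String response;
    // True while a trigger has been seen and no response has followed it yet.
    private boolean pending = false;

    ResponseMonitor(String trigger, String response) {
        this.trigger = trigger;
        this.response = response;
    }

    /** Feed one observed event from the running program into the monitor. */
    void observe(String event) {
        if (event.equals(trigger)) {
            pending = true;
        } else if (event.equals(response)) {
            pending = false;
        }
    }

    /** On a finite trace, the property holds if no trigger is left unanswered. */
    boolean satisfiedSoFar() {
        return !pending;
    }

    /** Convenience check of a whole recorded trace. */
    static boolean holds(List<String> trace, String trigger, String response) {
        ResponseMonitor monitor = new ResponseMonitor(trigger, response);
        for (String event : trace) {
            monitor.observe(event);
        }
        return monitor.satisfiedSoFar();
    }

    public static void main(String[] args) {
        // A session that logs out satisfies the property.
        System.out.println(holds(List.of("login", "query", "logout"), "login", "logout")); // true
        // A session that never logs out violates it.
        System.out.println(holds(List.of("login", "query"), "login", "logout")); // false
    }
}
```

In a setting like the one the abstract describes, the generated Java code would emit events to such a monitor while it runs, and a violation would point to the sub-requirement whose property failed.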
About the journal:
Theoretical Computer Science is mathematical and abstract in spirit, but it derives its motivation from practical and everyday computation. Its aim is to understand the nature of computation and, as a consequence of this understanding, provide more efficient methodologies. All papers introducing or studying mathematical, logical and formal concepts and methods are welcome, provided that their motivation is clearly drawn from the field of computing.