Gabriele Dario De Siano, Anna Rita Fasolino, Giancarlo Sperlí, Andrea Vignali
{"title":"使用大型语言模型和人在循环反馈翻译代码","authors":"Gabriele Dario De Siano, Anna Rita Fasolino, Giancarlo Sperlí, Andrea Vignali","doi":"10.1016/j.infsof.2025.107785","DOIUrl":null,"url":null,"abstract":"<div><h3>Context:</h3><div>In recent years, the code translation task has arisen as one of the major software issues in maintaining software quality during migration over complex infrastructure. This task involves human subjects with different background knowledge and could introduce errors due to the semantic gap between the programming languages and the complexity of the task. Generative Artificial Intelligence (AI) showed good capabilities in code generation, albeit this is highly dependent on the human factor.</div></div><div><h3>Objective:</h3><div>This paper investigates, from the human perspective, the use of three Generative AI tools (ChatGPT, Google Bard, and GitHub Copilot) in the context of translation tasks from code written in query languages to code written in framework-specific code languages, specifically focused on SQL dialects and PySpark. This translation is especially crucial during the migration from centralized architectures to cloud-based architectures.</div></div><div><h3>Methods:</h3><div>We evaluate the usefulness of these tools, the quality of the generated code, and their impact on performance. The models are tested with queries of various type in three different SQL dialects considering three usage scenarios of increasing complexity. It involves 15 participants with diverse programming backgrounds, who aim to solve tasks by interacting multiple times with the tools and manually changing the code.</div></div><div><h3>Results:</h3><div>The findings show a positive performance, demonstrating their reliability in generating coherent translations, achieving 100% precision in most tasks with a slight decrease in more complex scenarios, and producing well-documented code, with a response time of under 2 min, with Google Bard responding 50% faster than the others.</div></div><div><h3>Conclusion:</h3><div>In conclusion, this paper establishes a methodology and both quantitative and qualitative metrics for evaluating how generative AI tools streamline code translation, shifting the emphasis from production to refinement. It underscores the importance of continuously improving these tools to integrate them into developers’ workflows and to provide guidelines for intelligent use.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"186 ","pages":"Article 107785"},"PeriodicalIF":4.3000,"publicationDate":"2025-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Translating code with Large Language Models and human-in-the-loop feedback\",\"authors\":\"Gabriele Dario De Siano, Anna Rita Fasolino, Giancarlo Sperlí, Andrea Vignali\",\"doi\":\"10.1016/j.infsof.2025.107785\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><h3>Context:</h3><div>In recent years, the code translation task has arisen as one of the major software issues in maintaining software quality during migration over complex infrastructure. This task involves human subjects with different background knowledge and could introduce errors due to the semantic gap between the programming languages and the complexity of the task. Generative Artificial Intelligence (AI) showed good capabilities in code generation, albeit this is highly dependent on the human factor.</div></div><div><h3>Objective:</h3><div>This paper investigates, from the human perspective, the use of three Generative AI tools (ChatGPT, Google Bard, and GitHub Copilot) in the context of translation tasks from code written in query languages to code written in framework-specific code languages, specifically focused on SQL dialects and PySpark. This translation is especially crucial during the migration from centralized architectures to cloud-based architectures.</div></div><div><h3>Methods:</h3><div>We evaluate the usefulness of these tools, the quality of the generated code, and their impact on performance. The models are tested with queries of various type in three different SQL dialects considering three usage scenarios of increasing complexity. It involves 15 participants with diverse programming backgrounds, who aim to solve tasks by interacting multiple times with the tools and manually changing the code.</div></div><div><h3>Results:</h3><div>The findings show a positive performance, demonstrating their reliability in generating coherent translations, achieving 100% precision in most tasks with a slight decrease in more complex scenarios, and producing well-documented code, with a response time of under 2 min, with Google Bard responding 50% faster than the others.</div></div><div><h3>Conclusion:</h3><div>In conclusion, this paper establishes a methodology and both quantitative and qualitative metrics for evaluating how generative AI tools streamline code translation, shifting the emphasis from production to refinement. It underscores the importance of continuously improving these tools to integrate them into developers’ workflows and to provide guidelines for intelligent use.</div></div>\",\"PeriodicalId\":54983,\"journal\":{\"name\":\"Information and Software Technology\",\"volume\":\"186 \",\"pages\":\"Article 107785\"},\"PeriodicalIF\":4.3000,\"publicationDate\":\"2025-05-31\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Information and Software Technology\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0950584925001247\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information and Software Technology","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0950584925001247","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Translating code with Large Language Models and human-in-the-loop feedback
Context:
In recent years, the code translation task has arisen as one of the major software issues in maintaining software quality during migration over complex infrastructure. This task involves human subjects with different background knowledge and could introduce errors due to the semantic gap between the programming languages and the complexity of the task. Generative Artificial Intelligence (AI) showed good capabilities in code generation, albeit this is highly dependent on the human factor.
Objective:
This paper investigates, from the human perspective, the use of three Generative AI tools (ChatGPT, Google Bard, and GitHub Copilot) in the context of translation tasks from code written in query languages to code written in framework-specific code languages, specifically focused on SQL dialects and PySpark. This translation is especially crucial during the migration from centralized architectures to cloud-based architectures.
Methods:
We evaluate the usefulness of these tools, the quality of the generated code, and their impact on performance. The models are tested with queries of various type in three different SQL dialects considering three usage scenarios of increasing complexity. It involves 15 participants with diverse programming backgrounds, who aim to solve tasks by interacting multiple times with the tools and manually changing the code.
Results:
The findings show a positive performance, demonstrating their reliability in generating coherent translations, achieving 100% precision in most tasks with a slight decrease in more complex scenarios, and producing well-documented code, with a response time of under 2 min, with Google Bard responding 50% faster than the others.
Conclusion:
In conclusion, this paper establishes a methodology and both quantitative and qualitative metrics for evaluating how generative AI tools streamline code translation, shifting the emphasis from production to refinement. It underscores the importance of continuously improving these tools to integrate them into developers’ workflows and to provide guidelines for intelligent use.
期刊介绍:
Information and Software Technology is the international archival journal focusing on research and experience that contributes to the improvement of software development practices. The journal''s scope includes methods and techniques to better engineer software and manage its development. Articles submitted for review should have a clear component of software engineering or address ways to improve the engineering and management of software development. Areas covered by the journal include:
• Software management, quality and metrics,
• Software processes,
• Software architecture, modelling, specification, design and programming
• Functional and non-functional software requirements
• Software testing and verification & validation
• Empirical studies of all aspects of engineering and managing software development
Short Communications is a new section dedicated to short papers addressing new ideas, controversial opinions, "Negative" results and much more. Read the Guide for authors for more information.
The journal encourages and welcomes submissions of systematic literature studies (reviews and maps) within the scope of the journal. Information and Software Technology is the premiere outlet for systematic literature studies in software engineering.