Translating code with Large Language Models and human-in-the-loop feedback

IF 4.3 · Zone 2 (Computer Science) · JCR Q2 (COMPUTER SCIENCE, INFORMATION SYSTEMS)

Information and Software Technology, Vol. 186, Article 107785 · Pub Date: 2025-05-31 · DOI: 10.1016/j.infsof.2025.107785

Gabriele Dario De Siano, Anna Rita Fasolino, Giancarlo Sperlí, Andrea Vignali

Citations: 0

Abstract

Context:

In recent years, code translation has emerged as one of the major software engineering challenges for maintaining software quality during migrations across complex infrastructures. The task involves human subjects with different background knowledge and can introduce errors, owing both to the semantic gap between programming languages and to the complexity of the task itself. Generative Artificial Intelligence (AI) has shown good code-generation capabilities, although its results depend heavily on the human factor.

Objective:

This paper investigates, from the human perspective, the use of three Generative AI tools (ChatGPT, Google Bard, and GitHub Copilot) for translating code written in query languages into framework-specific code, focusing specifically on SQL dialects and PySpark. This kind of translation is especially crucial when migrating from centralized to cloud-based architectures.
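To make the task concrete, the sketch below pairs a small aggregate query with a plausible PySpark translation (shown only in comments, since running it requires a Spark installation) and checks semantic equivalence by comparing result sets on the same input. It is a hypothetical illustration, not the study's evaluation procedure: SQLite stands in for the source dialect because it ships with Python, the `sales` table and its data are invented, and `run_translation` is a plain-Python stand-in for the DataFrame pipeline. With Spark available, its rows would instead come from `result.collect()`.

```python
import sqlite3
from collections import Counter

# Source query in one SQL dialect (SQLite here, chosen only because it ships
# with Python; the abstract does not name the three dialects used).
SOURCE_SQL = """
SELECT region, SUM(amount) AS total
FROM sales
GROUP BY region
HAVING SUM(amount) > 100
"""

# A PySpark translation of SOURCE_SQL might read (not executed here):
#   from pyspark.sql import functions as F
#   result = (sales_df.groupBy("region")
#                     .agg(F.sum("amount").alias("total"))
#                     .filter(F.col("total") > 100))
#   rows = [tuple(r) for r in result.collect()]


def run_source(rows):
    """Execute SOURCE_SQL on an in-memory SQLite table built from rows."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    con.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    result = con.execute(SOURCE_SQL).fetchall()
    con.close()
    return result


def run_translation(rows):
    """Plain-Python stand-in for the translated pipeline: group, sum, filter."""
    totals = {}
    for region, amount in rows:
        totals[region] = totals.get(region, 0) + amount
    return [(region, total) for region, total in totals.items() if total > 100]


def same_result(a, b):
    """Compare result sets as multisets, ignoring row order."""
    return Counter(a) == Counter(b)


SALES = [("north", 80), ("north", 40), ("south", 50), ("east", 200)]
print(same_result(run_source(SALES), run_translation(SALES)))  # True
```

Comparing multisets rather than ordered lists avoids false mismatches, since neither a SQL query without `ORDER BY` nor a Spark `groupBy` guarantees row order.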

Methods:

We evaluate the usefulness of these tools, the quality of the generated code, and their impact on performance. The models are tested with various types of queries in three different SQL dialects, across three usage scenarios of increasing complexity. The study involves 15 participants with diverse programming backgrounds, who solve the tasks by interacting with the tools multiple times and manually editing the code.
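The participants described above query the tools several times and then edit the output by hand. The loop below is a minimal sketch of that human-in-the-loop pattern under stated assumptions: `ask_model` and `review` are hypothetical callables standing in for a chat tool and a human reviewer, the prompt wording is illustrative rather than the one used in the study, and the `max_rounds` default of 3 is arbitrary.

```python
from typing import Callable, List, Optional, Tuple


def build_prompt(sql: str, dialect: str, feedback: Optional[str] = None) -> str:
    """Compose a translation request; reviewer feedback from the last round is appended."""
    prompt = f"Translate this {dialect} query into PySpark DataFrame code:\n{sql}"
    if feedback:
        prompt += f"\nThe previous translation had this problem: {feedback}"
    return prompt


def refine(sql: str, dialect: str,
           ask_model: Callable[[str], str],
           review: Callable[[str], Optional[str]],
           max_rounds: int = 3) -> Tuple[str, List[str]]:
    """Ask the model, let a reviewer accept or reject, and retry with feedback.

    Returns the last candidate and every prompt sent. If no candidate is
    accepted within max_rounds, the caller is expected to edit it manually.
    """
    feedback: Optional[str] = None
    prompts: List[str] = []
    candidate = ""
    for _ in range(max_rounds):
        prompt = build_prompt(sql, dialect, feedback)
        prompts.append(prompt)
        candidate = ask_model(prompt)
        feedback = review(candidate)  # None signals that the reviewer accepts
        if feedback is None:
            break
    return candidate, prompts


# Hypothetical stubs standing in for a chat tool and a human reviewer.
answers = iter([
    'df.groupBy("region").sum("amount")',
    'df.groupBy("region").agg(F.sum("amount").alias("total"))',
])


def fake_model(prompt: str) -> str:
    return next(answers)


def fake_reviewer(code: str) -> Optional[str]:
    return None if "alias" in code else "the 'total' alias is missing"


code, sent = refine(
    "SELECT region, SUM(amount) AS total FROM sales GROUP BY region",
    "ANSI SQL", fake_model, fake_reviewer,
)
print(len(sent))  # 2
```

Keeping the model and the reviewer as injected callables makes the number of interaction rounds observable, which is the kind of quantity a study of repeated tool use can measure.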

Results:

The findings show positive performance. The tools reliably generate coherent translations, achieving 100% precision in most tasks with a slight decrease in the more complex scenarios, and produce well-documented code. Response times are under 2 min, with Google Bard responding 50% faster than the other tools.

Conclusion:

This paper establishes a methodology, together with quantitative and qualitative metrics, for evaluating how generative AI tools streamline code translation by shifting the emphasis from code production to refinement. It underscores the importance of continuously improving these tools so that they can be integrated into developers' workflows, and of providing guidelines for their intelligent use.

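Source journal: Information and Software Technology (Engineering & Technology / Computer Science: Software Engineering)

CiteScore: 9.10

Self-citation rate: 7.70%

Articles published: 164

Review time: 9.6 weeks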

Journal description: Information and Software Technology is the international archival journal focusing on research and experience that contributes to the improvement of software development practices. The journal's scope includes methods and techniques to better engineer software and manage its development. Articles submitted for review should have a clear software engineering component or address ways to improve the engineering and management of software development. Areas covered by the journal include:
• Software management, quality and metrics
• Software processes
• Software architecture, modelling, specification, design and programming
• Functional and non-functional software requirements
• Software testing and verification & validation
• Empirical studies of all aspects of engineering and managing software development

Short Communications is a new section dedicated to short papers addressing new ideas, controversial opinions, "negative" results and much more. Read the Guide for authors for more information. The journal encourages and welcomes submissions of systematic literature studies (reviews and maps) within the scope of the journal. Information and Software Technology is the premier outlet for systematic literature studies in software engineering.