A systematic exploration of C-to-rust code translation based on large language models: prompt strategies and automated repair

IF 3.1 · Zone 2, Computer Science · JCR Q3, COMPUTER SCIENCE, SOFTWARE ENGINEERING

Automated Software Engineering · Pub Date: 2025-10-18 · DOI: 10.1007/s10515-025-00570-0

Ruxin Zhang, Shanxin Zhang, Linbo Xie

Automated Software Engineering, Vol. 33, No. 1 · Journal Article · https://link.springer.com/article/10.1007/s10515-025-00570-0

Citations: 0

Abstract

C is widely used in system programming due to its low-level flexibility. However, as demands for memory safety and code reliability grow, Rust has become a more favorable alternative owing to its modern design principles. Migrating existing C code to Rust has therefore emerged as a key approach for enhancing the security and maintainability of software systems. Nevertheless, automating such migrations remains challenging due to fundamental differences between the two languages in terms of language design philosophy, type systems, and levels of abstraction. Most current code transformation tools focus on mappings of basic data types and syntactic replacements, such as handling pointers or conversion of lock mechanisms. These approaches often fail to deeply model the semantic features and programming paradigms of the target language. To address this limitation, this paper proposes RustFlow, a C-to-Rust code translation framework based on large language models (LLMs), designed to generate idiomatic and semantically accurate Rust code. This framework employs a multi-stage progressive architecture, which decomposes the overall translation task into several sequential stages, namely translation, validation, and repair. During the translation phase, a collaborative prompting strategy is employed to guide the LLM in achieving cross-language semantic alignment, thereby improving the accuracy of the generated code. Subsequently, a validation mechanism is introduced to perform syntactic and semantic checks on the generated output, and a conversational iterative repair strategy is employed to further enhance the quality of the final result. Experimental results show that RustFlow outperforms most of the latest baseline approaches, achieving an average improvement of 50.67% in translation performance compared to the base LLM. This work offers a novel technical approach and practical support for efficient and reliable cross-language code migration.
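The abstract describes a multi-stage pipeline in which a translation is validated and then repaired through a conversation with the LLM. The Rust sketch below illustrates only that generic translate, validate, repair control flow under stated assumptions; it is not RustFlow's implementation, and the paper's actual prompts and checks are not reproduced here. The `Llm` and `Validator` traits, the `Turn` and `Outcome` types, the `ScriptedLlm` mock, the `NoUnsafe` checker, the prompt wording, and the `max_repairs` budget are all hypothetical. A real validator would presumably compile the candidate and run tests rather than match strings.

```rust
// Hypothetical sketch of a translate -> validate -> repair loop.
// Not RustFlow's code: every type, trait, and prompt string here is an assumption.

/// One message in the repair conversation (hypothetical message type).
#[derive(Debug, Clone)]
enum Turn {
    User(String),
    Assistant(String),
}

/// Hypothetical LLM interface: produce a reply to the conversation so far.
trait Llm {
    fn complete(&mut self, history: &[Turn]) -> String;
}

/// Hypothetical validator: Ok(()) if the candidate passes, Err(diagnostics) otherwise.
trait Validator {
    fn check(&self, rust_src: &str) -> Result<(), String>;
}

#[derive(Debug, PartialEq)]
enum Outcome {
    Accepted { code: String, repair_rounds: usize },
    GaveUp { last_code: String, last_error: String },
}

fn translate_validate_repair<L: Llm, V: Validator>(
    llm: &mut L,
    validator: &V,
    c_src: &str,
    max_repairs: usize,
) -> Outcome {
    // Stage 1: translation. The conversation starts with one translation prompt.
    let mut history = vec![Turn::User(format!(
        "Translate this C code into idiomatic, safe Rust:\n{c_src}"
    ))];
    let mut code = llm.complete(&history);
    history.push(Turn::Assistant(code.clone()));

    for round in 0..=max_repairs {
        // Stage 2: validate the current candidate.
        match validator.check(&code) {
            Ok(()) => return Outcome::Accepted { code, repair_rounds: round },
            // Repair budget exhausted: report the last candidate and its diagnostics.
            Err(diag) if round == max_repairs => {
                return Outcome::GaveUp { last_code: code, last_error: diag };
            }
            // Stage 3: conversational repair. Diagnostics go into the same history,
            // so the model sees its earlier attempts and why they failed.
            Err(diag) => {
                history.push(Turn::User(format!(
                    "The code failed validation:\n{diag}\nReturn a corrected version."
                )));
                code = llm.complete(&history);
                history.push(Turn::Assistant(code.clone()));
            }
        }
    }
    unreachable!("the iteration with round == max_repairs always returns")
}

/// Mock LLM for illustration: replays scripted replies in order (the last one repeats).
/// `replies` must be non-empty.
struct ScriptedLlm {
    replies: Vec<String>,
    calls: usize,
}

impl Llm for ScriptedLlm {
    fn complete(&mut self, history: &[Turn]) -> String {
        // A real client would send `history` to a model; the mock only checks
        // that the conversation currently ends with a user prompt.
        assert!(matches!(history.last(), Some(Turn::User(_))));
        let reply = self.replies[self.calls.min(self.replies.len() - 1)].clone();
        self.calls += 1;
        reply
    }
}

/// Toy validator standing in for compiler and test checks: rejects any `unsafe`.
struct NoUnsafe;

impl Validator for NoUnsafe {
    fn check(&self, rust_src: &str) -> Result<(), String> {
        if rust_src.contains("unsafe") {
            Err("candidate uses `unsafe`; expected a safe translation".to_string())
        } else {
            Ok(())
        }
    }
}
```

In this sketch, keeping the whole exchange in `history` rather than issuing a fresh prompt each round is what makes the repair conversational: each retry is conditioned on the previous attempts and their diagnostics, and `max_repairs` bounds the number of extra LLM calls.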


Source journal

Automated Software Engineering (Engineering & Technology: Computer Science, Software Engineering)

CiteScore: 4.80

Self-citation rate: 11.80%

Review time: >12 weeks

About the journal: This journal details research, tutorial papers, surveys, and accounts of significant industrial experience in the foundations, techniques, tools, and applications of automated software engineering technology. This includes the study of techniques for constructing, understanding, adapting, and modeling software artifacts and processes. Coverage in Automated Software Engineering examines both automatic and collaborative systems, as well as computational models of human software engineering activities. In addition, it presents knowledge representations and artificial intelligence techniques applicable to automated software engineering, and formal techniques that support it or provide its theoretical foundations. The journal also includes reviews of books, software, conferences, and workshops.