Project-Level C-to-Rust Translation via Synergistic Integration of Knowledge Graphs and Large Language Models
2510.10956v1
cs.SE, cs.AI
2025-10-16
Authors:
Zhiqiang Yuan, Wenjun Mao, Zhuo Chen, Xiyue Shang, Chong Wang, Yiling Lou, Xin Peng
Abstract
Translating C code into safe Rust is an effective way to ensure its memory
safety. Compared to rule-based translation, which produces Rust code that
remains largely unsafe, LLM-based methods can generate more idiomatic and safer
Rust code because LLMs have been trained on vast amounts of human-written
idiomatic code. Although promising, existing LLM-based methods still struggle
with project-level C-to-Rust translation. They typically partition a C project
into smaller units (e.g., functions) based on call graphs and translate them
bottom-up to resolve program dependencies. However, this bottom-up,
unit-by-unit paradigm often fails to translate pointers due to the lack of a
global perspective on their usage. To address this problem, we propose a novel
C-Rust Pointer Knowledge Graph (KG) that enriches a code-dependency graph with
two types of pointer semantics: (i) pointer-usage information, which records
global behaviors such as points-to flows and maps lower-level struct usage to
higher-level units; and (ii) Rust-oriented annotations, which encode ownership,
mutability, nullability, and lifetime. Synthesizing the KG with LLMs, we
further propose \ourtool{}, which implements a project-level C-to-Rust
translation technique. In \ourtool{}, the KG provides LLMs with
comprehensive pointer semantics from a global perspective, thus guiding LLMs
towards generating safe and idiomatic Rust code from a given C project. Our
experiments show that \ourtool{} reduces unsafe usages in translated Rust by
99.9% compared to both rule-based translation and traditional LLM-based
rewriting, while achieving an average 29.3% higher functional correctness than
fuzzing-enhanced LLM methods.