Retrieval-Augmented Code Generation: A Survey with Focus on Repository-Level Approaches
2510.04905v1
cs.SE, cs.CL
2025-10-08
Авторы:
Yicheng Tao, Yao Qin, Yepang Liu
Abstract
Recent advancements in large language models (LLMs) have substantially
improved automated code generation. While function-level and file-level
generation have achieved promising results, real-world software development
typically requires reasoning across entire repositories. This gives rise to the
challenging task of Repository-Level Code Generation (RLCG), where models must
capture long-range dependencies, ensure global semantic consistency, and
generate coherent code spanning multiple files or modules. To address these
challenges, Retrieval-Augmented Generation (RAG) has emerged as a powerful
paradigm that integrates external retrieval mechanisms with LLMs, enhancing
context-awareness and scalability. In this survey, we provide a comprehensive
review of research on Retrieval-Augmented Code Generation (RACG), with an
emphasis on repository-level approaches. We categorize existing work along
several dimensions, including generation strategies, retrieval modalities,
model architectures, training paradigms, and evaluation protocols. Furthermore,
we summarize widely used datasets and benchmarks, analyze current limitations,
and outline key challenges and opportunities for future research. Our goal is
to establish a unified analytical framework for understanding this rapidly
evolving field and to inspire continued progress in AI-powered software
engineering.
Ссылки и действия
Дополнительные ресурсы: