Retrieval-Augmented Code Generation: A Survey with Focus on Repository-Level Approaches

arXiv:2510.04905v1 | cs.SE, cs.CL | 2025-10-08

Authors:

Yicheng Tao, Yao Qin, Yepang Liu

Abstract

Recent advancements in large language models (LLMs) have substantially improved automated code generation. While function-level and file-level generation have achieved promising results, real-world software development typically requires reasoning across entire repositories. This gives rise to the challenging task of Repository-Level Code Generation (RLCG), where models must capture long-range dependencies, ensure global semantic consistency, and generate coherent code spanning multiple files or modules. To address these challenges, Retrieval-Augmented Generation (RAG) has emerged as a powerful paradigm that integrates external retrieval mechanisms with LLMs, enhancing context-awareness and scalability. In this survey, we provide a comprehensive review of research on Retrieval-Augmented Code Generation (RACG), with an emphasis on repository-level approaches. We categorize existing work along several dimensions, including generation strategies, retrieval modalities, model architectures, training paradigms, and evaluation protocols. Furthermore, we summarize widely used datasets and benchmarks, analyze current limitations, and outline key challenges and opportunities for future research. Our goal is to establish a unified analytical framework for understanding this rapidly evolving field and to inspire continued progress in AI-powered software engineering.
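The abstract describes RAG as coupling an external retrieval step with an LLM so that repository context reaches the model. The sketch below illustrates that retrieve-then-generate loop in a minimal form. It is not the survey's method or the method of any system the survey covers.

It rests on several assumptions:
- The repository is given as a dict mapping file path to source text.
- Files are split into fixed windows of lines.
- Jaccard overlap of identifier tokens stands in for the sparse or dense retrievers used in practice.
- The helper names (`chunk_repository`, `retrieve`, `build_prompt`) are hypothetical.
- The actual LLM call is omitted.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Chunk:
    path: str  # file the chunk came from
    text: str  # raw source lines of the chunk


def tokenize(text: str) -> set[str]:
    # Identifier-like tokens only; a crude stand-in for a real code tokenizer.
    return set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text))


def chunk_repository(files: dict[str, str], window: int = 4) -> list[Chunk]:
    # Hypothetical chunking policy: fixed windows of `window` lines per file.
    chunks = []
    for path, source in files.items():
        lines = source.splitlines()
        for start in range(0, len(lines), window):
            body = "\n".join(lines[start:start + window])
            if body.strip():
                chunks.append(Chunk(path, body))
    return chunks


def jaccard(a: set[str], b: set[str]) -> float:
    # Token-set overlap in [0, 1]; 0 when either side is empty.
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def retrieve(query: str, chunks: list[Chunk], k: int = 2) -> list[Chunk]:
    # Rank every chunk by lexical similarity to the unfinished code; keep the top k.
    q = tokenize(query)
    ranked = sorted(chunks, key=lambda c: jaccard(q, tokenize(c.text)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, retrieved: list[Chunk]) -> str:
    # Prepend retrieved cross-file context, labelled by path, to the code to complete.
    context = "\n\n".join(f"# File: {c.path}\n{c.text}" for c in retrieved)
    return f"{context}\n\n# Complete the following code:\n{query}"


# Usage on a toy two-file repository.
repo = {
    "utils/math_ops.py": "def add(a, b):\n    return a + b\n",
    "models/user.py": "class User:\n    def __init__(self, name):\n        self.name = name\n",
}
query = "u = User('ada')\nprint(u."
top = retrieve(query, chunk_repository(repo), k=1)
print(top[0].path)  # models/user.py
print(build_prompt(query, top))
```

Replacing `jaccard` with BM25 or an embedding similarity would change only the scoring step. The chunk, rank and assemble structure would stay the same.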

Related articles

Is Vibe Coding Safe? Benchmarking Vulnerability of Agent-Generated Code in Real-...

Bias Testing and Mitigation in Black Box LLMs using Metamorphic Relations

From Code Foundation Models to Agents and Applications: A Practical Guide to Cod...

M, Toolchain and Language for Reusable Model Compilation

Show and Tell: Prompt Strategies for Style Control in Multi-Turn LLM Code Genera...