RLHF: A comprehensive Survey for Cultural, Multimodal and Low Latency Alignment Methods

2511.03939v1 cs.LG, cs.AI, cs.CL 2025-11-08

Authors:

Raghav Sharma, Manan Mehta, Sai Tiger Raina

Abstract

Reinforcement Learning from Human Feedback (RLHF) is the standard for aligning Large Language Models (LLMs), yet recent progress has moved beyond canonical text-based methods. This survey synthesizes the new frontier of alignment research by addressing critical gaps in multi-modal alignment, cultural fairness, and low-latency optimization. To systematically explore these domains, we first review foundational algorithms, including PPO, DPO, and GRPO, before presenting a detailed analysis of the latest innovations. By providing a comparative synthesis of these techniques and outlining open challenges, this work serves as an essential roadmap for researchers building more robust, efficient, and equitable AI systems.

Related articles

CARL: Critical Action Focused Reinforcement Learning for Multi-Step Agent

Multi-LLM Collaboration for Medication Recommendation

Network of Theseus (like the ship)

SPARK: Stepwise Process-Aware Rewards for Reference-Free Reinforcement Learning

Mode-Conditioning Unlocks Superior Test-Time Scaling
