RLHF: A comprehensive Survey for Cultural, Multimodal and Low Latency Alignment Methods
2511.03939v1
cs.LG, cs.AI, cs.CL
2025-11-08
Authors:
Raghav Sharma, Manan Mehta, Sai Tiger Raina
Abstract
Reinforcement Learning from Human Feedback (RLHF) is the standard for
aligning Large Language Models (LLMs), yet recent progress has moved beyond
canonical text-based methods. This survey synthesizes the new frontier of
alignment research by addressing critical gaps in multimodal alignment,
cultural fairness, and low-latency optimization. To systematically explore
these domains, we first review foundational algorithms, including PPO, DPO,
and GRPO, before presenting a detailed analysis of the latest innovations. By
providing a comparative synthesis of these techniques and outlining open
challenges, this work serves as an essential roadmap for researchers building
more robust, efficient, and equitable AI systems.