Diverse Human Value Alignment for Large Language Models via Ethical Reasoning
2511.00379v1
cs.AI, cs.CL
2025-11-06
Авторы:
Jiahao Wang, Songkai Xue, Jinghui Li, Xiaozhen Wang
Abstract
Ensuring that Large Language Models (LLMs) align with the diverse and
evolving human values across different regions and cultures remains a critical
challenge in AI ethics. Current alignment approaches often yield superficial
conformity rather than genuine ethical understanding, failing to address the
complex, context-dependent nature of human values. In this paper, we propose a
novel ethical reasoning paradigm for LLMs inspired by well-established ethical
decision-making models, aiming at enhancing diverse human value alignment
through deliberative ethical reasoning. Our framework consists of a structured
five-step process, including contextual fact gathering, hierarchical social
norm identification, option generation, multiple-lens ethical impact analysis,
and reflection. This theory-grounded approach guides LLMs through an
interpretable reasoning process that enhances their ability to understand
regional specificities and perform nuanced ethical analysis, which can be
implemented with either prompt engineering or supervised fine-tuning methods.
We perform evaluations on the SafeWorld benchmark that specially designed for
regional value alignment. Experimental results demonstrate our framework
significantly improves LLM alignment with diverse human values compared to
baseline methods, enabling more accurate social norm identification and more
culturally appropriate reasoning. Our work provides a concrete pathway toward
developing LLMs that align more effectively with the multifaceted values of
global societies through interdisciplinary research.
Ссылки и действия
Дополнительные ресурсы: