TwinVoice: A Multi-dimensional Benchmark Towards Digital Twins via LLM Persona Simulation
2510.25536v2
cs.CL, I.2.7; I.2.6; I.2.0
2025-10-31
Авторы:
Bangde Du, Minghao Guo, Songming He, Ziyi Ye, Xi Zhu, Weihang Su, Shuqi Zhu, Yujia Zhou, Yongfeng Zhang, Qingyao Ai, Yiqun Liu
Abstract
Large Language Models (LLMs) are exhibiting emergent human-like abilities and
are increasingly envisioned as the foundation for simulating an individual's
communication style, behavioral tendencies, and personality traits. However,
current evaluations of LLM-based persona simulation remain limited: most rely
on synthetic dialogues, lack systematic frameworks, and lack analysis of the
capability requirement. To address these limitations, we introduce TwinVoice, a
comprehensive benchmark for assessing persona simulation across diverse
real-world contexts. TwinVoice encompasses three dimensions: Social Persona
(public social interactions), Interpersonal Persona (private dialogues), and
Narrative Persona (role-based expression). It further decomposes the evaluation
of LLM performance into six fundamental capabilities, including opinion
consistency, memory recall, logical reasoning, lexical fidelity, persona tone,
and syntactic style. Experimental results reveal that while advanced models
achieve moderate accuracy in persona simulation, they still fall short of
capabilities such as syntactic style and memory recall. Consequently, the
average performance achieved by LLMs remains considerably below the human
baseline.