DeceptionBench: A Comprehensive Benchmark for AI Deception Behaviors in Real-world Scenarios
2510.15501v1
cs.CL, cs.AI, cs.LG
2025-10-21
Авторы:
Yao Huang, Yitong Sun, Yichi Zhang, Ruochen Zhang, Yinpeng Dong, Xingxing Wei
Abstract
Despite the remarkable advances of Large Language Models (LLMs) across
diverse cognitive tasks, the rapid enhancement of these capabilities also
introduces emergent deceptive behaviors that may induce severe risks in
high-stakes deployments. More critically, the characterization of deception
across realistic real-world scenarios remains underexplored. To bridge this
gap, we establish DeceptionBench, the first benchmark that systematically
evaluates how deceptive tendencies manifest across different societal domains,
what their intrinsic behavioral patterns are, and how extrinsic factors affect
them. Specifically, on the static count, the benchmark encompasses 150
meticulously designed scenarios in five domains, i.e., Economy, Healthcare,
Education, Social Interaction, and Entertainment, with over 1,000 samples,
providing sufficient empirical foundations for deception analysis. On the
intrinsic dimension, we explore whether models exhibit self-interested egoistic
tendencies or sycophantic behaviors that prioritize user appeasement. On the
extrinsic dimension, we investigate how contextual factors modulate deceptive
outputs under neutral conditions, reward-based incentivization, and coercive
pressures. Moreover, we incorporate sustained multi-turn interaction loops to
construct a more realistic simulation of real-world feedback dynamics.
Extensive experiments across LLMs and Large Reasoning Models (LRMs) reveal
critical vulnerabilities, particularly amplified deception under reinforcement
dynamics, demonstrating that current models lack robust resistance to
manipulative contextual cues and the urgent need for advanced safeguards
against various deception behaviors. Code and resources are publicly available
at https://github.com/Aries-iai/DeceptionBench.
Ссылки и действия
Дополнительные ресурсы: