Time-To-Inconsistency: A Survival Analysis of Large Language Model Robustness to Adversarial Attacks
2510.02712v1
cs.CL, cs.AI, cs.LG
2025-10-07
Авторы:
Yubo Li, Ramayya Krishnan, Rema Padman
Abstract
Large Language Models (LLMs) have revolutionized conversational AI, yet their
robustness in extended multi-turn dialogues remains poorly understood. Existing
evaluation frameworks focus on static benchmarks and single-turn assessments,
failing to capture the temporal dynamics of conversational degradation that
characterize real-world interactions. In this work, we present the first
comprehensive survival analysis of conversational AI robustness, analyzing
36,951 conversation turns across 9 state-of-the-art LLMs to model failure as a
time-to-event process. Our survival modeling framework-employing Cox
proportional hazards, Accelerated Failure Time, and Random Survival Forest
approaches-reveals extraordinary temporal dynamics. We find that abrupt,
prompt-to-prompt(P2P) semantic drift is catastrophic, dramatically increasing
the hazard of conversational failure. In stark contrast, gradual, cumulative
drift is highly protective, vastly reducing the failure hazard and enabling
significantly longer dialogues. AFT models with interactions demonstrate
superior performance, achieving excellent discrimination and exceptional
calibration. These findings establish survival analysis as a powerful paradigm
for evaluating LLM robustness, offer concrete insights for designing resilient
conversational agents, and challenge prevailing assumptions about the necessity
of semantic consistency in conversational AI Systems.
Ссылки и действия
Дополнительные ресурсы: