Unvalidated Trust: Cross-Stage Vulnerabilities in Large Language Model Architectures
2510.27190v1
cs.CR, cs.AI
2025-11-04
Авторы:
Dominik Schwarz
Abstract
As Large Language Models (LLMs) are increasingly integrated into automated,
multi-stage pipelines, risk patterns that arise from unvalidated trust between
processing stages become a practical concern. This paper presents a
mechanism-centered taxonomy of 41 recurring risk patterns in commercial LLMs.
The analysis shows that inputs are often interpreted non-neutrally and can
trigger implementation-shaped responses or unintended state changes even
without explicit commands. We argue that these behaviors constitute
architectural failure modes and that string-level filtering alone is
insufficient. To mitigate such cross-stage vulnerabilities, we recommend
zero-trust architectural principles, including provenance enforcement, context
sealing, and plan revalidation, and we introduce "Countermind" as a conceptual
blueprint for implementing these defenses.
Ссылки и действия
Дополнительные ресурсы: