ArtPerception: ASCII Art-based Jailbreak on LLMs with Recognition Pre-test
2510.10281v1
cs.CR, cs.AI, cs.CL, cs.CV, cs.LG
2025-10-15
Authors:
Guan-Yan Yang, Tzu-Yu Cheng, Ya-Wen Teng, Farn Wang, Kuo-Hui Yeh
Abstract
The integration of Large Language Models (LLMs) into computer applications
has introduced transformative capabilities but also significant security
challenges. Existing safety alignments, which primarily focus on semantic
interpretation, leave LLMs vulnerable to attacks that use non-standard data
representations. This paper introduces ArtPerception, a novel black-box
jailbreak framework that strategically leverages ASCII art to bypass the
security measures of state-of-the-art (SOTA) LLMs. Unlike prior methods that
rely on iterative, brute-force attacks, ArtPerception follows a systematic,
two-phase methodology. Phase 1 conducts a one-time, model-specific pre-test to
empirically determine the optimal parameters for ASCII art recognition. Phase 2
leverages these insights to launch a highly efficient, one-shot malicious
jailbreak attack. We propose a Modified Levenshtein Distance (MLD) metric for a
more nuanced evaluation of an LLM's recognition capability. Through
comprehensive experiments on four SOTA open-source LLMs, we demonstrate
superior jailbreak performance. We further validate our framework's real-world
relevance by showing its successful transferability to leading commercial
models, including GPT-4o, Claude Sonnet 3.7, and DeepSeek-V3, and by conducting
a rigorous effectiveness analysis against potential defenses such as LLaMA
Guard and Azure's content filters. Our findings underscore that true LLM
security requires defending against a multi-modal space of interpretations,
even within text-only inputs, and highlight the effectiveness of strategic,
reconnaissance-based attacks. Content Warning: This paper includes potentially
harmful and offensive model outputs.