ASCIIBench: Evaluating Language-Model-Based Understanding of Visually-Oriented Text

2512.04125v1 cs.LG 2025-12-05

Authors:

Kerry Luo, Michael Fu, Joshua Peguero, Husnain Malik, Anvay Patil, Joyce Lin, Megan Van Overborg, Ryan Sarmiento, Kevin Zhu

Abstract

Large language models (LLMs) have demonstrated several emergent behaviors with scale, including reasoning and fluency in long-form text generation. However, they continue to struggle with tasks requiring precise spatial and positional reasoning. ASCII art, a symbolic medium where characters encode structure and form, provides a unique probe of this limitation. We introduce ASCIIBench, a novel benchmark for evaluating both the generation and classification of ASCII-text images. ASCIIBench consists of a filtered dataset of 5,315 class-labeled ASCII images and is, to our knowledge, the first publicly available benchmark of its kind. Alongside the dataset, we release weights for a fine-tuned CLIP model adapted to capture ASCII structure, enabling the evaluation of LLM-generated ASCII art. Our analysis shows that cosine similarity over CLIP embeddings fails to separate most ASCII categories, yielding chance-level performance even for low-variance classes. In contrast, classes with high internal mean similarity exhibit clear discriminability, revealing that the bottleneck lies in representation rather than generational variance. These findings position ASCII art as a stress test for multimodal representations and motivate the development of new embedding methods or evaluation metrics tailored to symbolic visual modalities. All resources are available at https://github.com/ASCIIBench/ASCIIBench.
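The similarity analysis described in the abstract can be illustrated with a small sketch. This is not the authors' evaluation code: the vectors below are hypothetical 3-dimensional stand-ins for CLIP embeddings, and the function names (`cosine_similarity`, `mean_intra_class_similarity`, `mean_inter_class_similarity`) are chosen here for illustration only. It shows how mean similarity within a class can be compared against similarity across classes.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_intra_class_similarity(embeddings):
    """Average cosine similarity over all unordered pairs within one class."""
    pairs = list(combinations(embeddings, 2))
    return sum(cosine_similarity(u, v) for u, v in pairs) / len(pairs)


def mean_inter_class_similarity(class_a, class_b):
    """Average cosine similarity over all cross-class pairs."""
    total = sum(cosine_similarity(u, v) for u in class_a for v in class_b)
    return total / (len(class_a) * len(class_b))


# Toy vectors standing in for image embeddings (hypothetical values,
# not produced by CLIP or taken from the ASCIIBench dataset).
cats = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.0], [1.0, 0.0, 0.1]]
trees = [[0.0, 1.0, 0.1], [0.1, 0.9, 0.0], [0.0, 1.0, 0.0]]

intra_cats = mean_intra_class_similarity(cats)
inter = mean_inter_class_similarity(cats, trees)

print(f"mean intra-class similarity (cats): {intra_cats:.3f}")
print(f"mean inter-class similarity (cats vs trees): {inter:.3f}")
```

In this toy setup the two classes are separable because intra-class similarity is much higher than inter-class similarity. The abstract reports that for most ASCII categories this gap does not appear in CLIP embedding space, which is why the authors locate the bottleneck in the representation.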

