Causal Masking on Spatial Data: An Information-Theoretic Case for Learning Spatial Datasets with Unimodal Language Models

2510.27009v1 cs.AI, cs.LG, stat.ML 2025-11-04

Authors:

Jared Junkin, Samuel Nathanson

Abstract

Language models are traditionally designed around causal masking. In domains with spatial or relational structure, causal masking is often viewed as inappropriate, and sequential linearizations are used instead. Yet whether it is viable to accept the information loss that causal masking introduces on nonsequential data has received little direct study, in part because few domains offer both spatial and sequential representations of the same dataset. In this work, we investigate this question in the domain of chess, which naturally supports both representations. We train language models with bidirectional and causal self-attention on both spatial (board-based) and sequential (move-based) data. Our results show that models trained on spatial board states, even with causal masking, consistently achieve greater playing strength than models trained on sequential data. While our experiments are conducted on chess, our results are methodological and may have broader implications: applying causal masking to spatial data is a viable procedure for training unimodal LLMs on spatial data, and in some domains it is even preferable to sequentialization.
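The abstract contrasts bidirectional and causal self-attention over a board state serialized as a token sequence. The sketch below is illustrative only and is not the authors' tokenizer or model. It assumes a board is flattened into 64 square tokens taken from the piece-placement field of a FEN string, and the helper names `board_tokens`, `attention_mask`, and `visible_pairs` are hypothetical. It builds both masks and counts how many query-key pairs each one permits.

```python
from typing import List

# Hypothetical serialization: one token per square, rank 8 down to rank 1, file a to h.
def board_tokens(placement: str) -> List[str]:
    """Expand the piece-placement field of a FEN string into 64 square tokens.
    Digits denote runs of empty squares, written here as '.'."""
    tokens: List[str] = []
    for rank in placement.split("/"):
        for ch in rank:
            if ch.isdigit():
                tokens.extend("." * int(ch))  # one '.' per empty square
            else:
                tokens.append(ch)
    if len(tokens) != 64:
        raise ValueError("expected 64 squares")
    return tokens

def attention_mask(n: int, causal: bool) -> List[List[bool]]:
    """mask[i][j] is True when query position i may attend to key position j.
    Causal: only keys at or before i. Bidirectional: every key."""
    return [[(j <= i) if causal else True for j in range(n)] for i in range(n)]

def visible_pairs(mask: List[List[bool]]) -> int:
    """Count the allowed (query, key) pairs in a mask."""
    return sum(sum(row) for row in mask)

START = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR"
tokens = board_tokens(START)
n = len(tokens)
causal = attention_mask(n, causal=True)
bidir = attention_mask(n, causal=False)

print(visible_pairs(causal), visible_pairs(bidir))  # 2080 4096
print(sum(causal[0]), sum(causal[63]))              # 1 64
```

Under the causal mask, the token for a8 (position 0) attends only to itself, while the token for h1 (position 63) attends to all 64 squares, so roughly half of the pairwise relations available to a bidirectional model are hidden at any given position. This is one crude way to picture the information loss the abstract refers to; how the paper itself quantifies that loss is not shown here.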
