How Patterns Dictate Learnability in Sequential Data

2510.10744v1 stat.ML, cs.IT, cs.LG, math.IT 2025-10-16

Authors:

Mario Morawski, Anais Despres, Rémi Rehm

Abstract

Sequential data, ranging from financial time series to natural language, has driven the growing adoption of autoregressive models. However, these models rely on the presence of underlying patterns in the data, and identifying those patterns often depends heavily on human expertise. Misinterpreting them can lead to model misspecification, resulting in increased generalization error and degraded performance. The recently proposed evolving pattern (EvoRate) metric addresses this by using the mutual information between the next data point and its past to guide regression order estimation and feature selection. Building on this idea, we introduce a general framework based on predictive information, defined as the mutual information between the past and the future, $I(X_{past}; X_{future})$. This quantity naturally defines an information-theoretic learning curve, which quantifies the amount of predictive information available as the observation window grows. Using this formalism, we show that the presence or absence of temporal patterns fundamentally constrains the learnability of sequential models: even an optimal predictor cannot outperform the intrinsic information limit imposed by the data. We validate our framework through experiments on synthetic data, demonstrating its ability to assess model adequacy, quantify the inherent complexity of a dataset, and reveal interpretable structure in sequential data.
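To make the idea of an information-theoretic learning curve concrete, the following is a minimal sketch, not the authors' estimator. It assumes the series is stationary and jointly Gaussian, in which case the mutual information between a window of $k$ past values and the next value can be computed from log-determinants of the sample covariance matrix. The function names (`simulate_ar`, `gaussian_mi_past_next`, `learning_curve`) and the AR test process are hypothetical choices made for illustration only.

```python
import numpy as np


def simulate_ar(coeffs, n, seed=0, burn_in=500):
    """Simulate a Gaussian AR(p) process with unit-variance noise (illustrative helper)."""
    coeffs = np.asarray(coeffs, dtype=float)
    p = len(coeffs)
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(n + burn_in)
    x = np.zeros(n + burn_in)
    for t in range(p, n + burn_in):
        # x[t-p:t][::-1] is (x[t-1], ..., x[t-p]), matching coeffs order.
        x[t] = np.dot(coeffs, x[t - p:t][::-1]) + noise[t]
    return x[burn_in:]  # drop the transient so the sample is close to stationary


def gaussian_mi_past_next(x, k):
    """Estimate I(X_{t-k..t-1}; X_t) in nats, ASSUMING joint Gaussianity."""
    # Each row is one window (x[t-k], ..., x[t-1], x[t]).
    windows = np.lib.stride_tricks.sliding_window_view(x, k + 1)
    cov = np.cov(windows, rowvar=False)
    _, logdet_joint = np.linalg.slogdet(cov)
    _, logdet_past = np.linalg.slogdet(cov[:k, :k])
    var_next = cov[k, k]
    # Gaussian identity: I(A; B) = 0.5 * (log|S_A| + log|S_B| - log|S_AB|).
    return 0.5 * (logdet_past + np.log(var_next) - logdet_joint)


def learning_curve(x, max_window):
    """Predictive information about the next point as the past window grows."""
    return [gaussian_mi_past_next(x, k) for k in range(1, max_window + 1)]


if __name__ == "__main__":
    series = simulate_ar([0.8], n=50_000)
    for k, mi in enumerate(learning_curve(series, 4), start=1):
        print(f"window={k}: {mi:.3f} nats")
```

For a Gaussian AR(1) process with coefficient 0.8, the theoretical value is $-\tfrac{1}{2}\ln(1 - 0.8^2) \approx 0.51$ nats, and because the process is Markov the curve should be essentially flat: extending the window beyond one step adds no predictive information. A curve that keeps rising with the window would instead point to longer-range structure that a low-order model cannot capture.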
