How Does the Pretraining Distribution Shape In-Context Learning? Task Selection, Generalization, and Robustness
2510.01163v1
cs.LG, stat.ML, G.3; I.2.6
2025-10-04
Authors:
Waïss Azizian, Ali Hasan
Abstract
In-context learning (ICL) enables large language models (LLMs) to adapt to new
tasks from only a handful of examples, yet its emergence remains poorly
understood despite its consistent effectiveness. To clarify and improve
these capabilities, we characterize how the statistical properties of the
pretraining distribution (e.g., tail behavior, coverage) shape ICL on numerical
tasks. We develop a theoretical framework that unifies task selection and
generalization, extending and sharpening earlier results, and show how
distributional properties govern sample efficiency, task retrieval, and
robustness. To this end, we generalize Bayesian posterior consistency and
concentration results to heavy-tailed priors and dependent sequences, better
reflecting the structure of LLM pretraining data. We then empirically study how
ICL performance varies with the pretraining distribution on challenging tasks
such as stochastic differential equations and stochastic processes with memory.
Together, these findings suggest that controlling key statistical properties of
the pretraining distribution is essential for building ICL-capable and reliable
LLMs.