Pareto-optimal Non-uniform Language Generation
2510.02795v1
cs.DS, cs.AI, cs.CL, cs.LG
2025-10-07
Authors:
Moses Charikar, Chirag Pabbaraju
Abstract
Kleinberg and Mullainathan (2024) recently proposed an interesting model for
language generation in the limit: Given a countable collection of languages,
and an adversary enumerating the strings of some language $L$ from the
collection, the objective is to generate new strings from the target language,
such that all strings generated beyond some finite time are valid. Li, Raman
and Tewari (2024) and Charikar and Pabbaraju (2024) showed strong non-uniform
generation guarantees in this model, giving algorithms that generate new valid
strings from $L$ after seeing a number of distinct input strings $t(L)$ that
depends only on $L$ (and the collection), but not the enumeration order.
However, in both of these works, the language-wise generation times $t(L)$ of
the algorithms can be strictly sub-optimal.
In this work, we study Pareto-optimality of non-uniform language generation
in the limit. We propose an algorithm whose generation times $t^\star(L)$ are
(almost) Pareto-optimal: any other algorithm whose generation time for some
language $L$ is strictly smaller than $t^\star(L)$ must have a generation
time for some other language $L'$ that is strictly larger than
$t^\star(L')$. Pareto-optimality is essentially the best that one can achieve
for non-uniform generation. Our algorithmic framework also adapts readily to
give Pareto-optimal non-uniform generation algorithms in the practically
motivated settings of noisy and representative generation.
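
The Pareto-optimality condition stated in the abstract can be written as a single formula. This is a sketch in notation assumed here rather than taken from the abstract: $\mathcal{C}$ denotes the countable collection of languages, $\mathcal{A}$ ranges over non-uniform generation algorithms for $\mathcal{C}$, and $t_{\mathcal{A}}(L)$ is the generation time of $\mathcal{A}$ on $L$. It sets aside the "(almost)" qualifier, whose precise meaning the abstract does not spell out.

$$
\forall \mathcal{A}\ \ \forall L \in \mathcal{C}:\quad
t_{\mathcal{A}}(L) < t^\star(L)
\;\implies\;
\exists\, L' \in \mathcal{C} \setminus \{L\}:\ t_{\mathcal{A}}(L') > t^\star(L')
$$

Read contrapositively, no algorithm can have $t_{\mathcal{A}}(L') \le t^\star(L')$ for every $L' \in \mathcal{C}$ while also having $t_{\mathcal{A}}(L) < t^\star(L)$ for at least one $L$.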