📊 Статистика дайджестов
Всего дайджестов: 34022 Добавлено сегодня: 82
Последнее обновление: сегодня
📄 Deterministic Inference across Tensor Parallel Sizes That Eliminates Training-Inference Mismatch
2025-11-25Авторы:
Ziyang Zhang, Xinheng Ding, Jiayi Yuan, Rixin Liu, Huizi Mao, Jiarong Xing, Zirui Liu
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
Deterministic inference is increasingly critical for large language model (LLM) applications such as LLM-as-a-judge evaluation, multi-agent systems, and Reinforcement Learning (RL). However, existing LLM serving frameworks exhibit non-deterministic behavior: identical inputs can yield different outputs when system configurations (e.g., tensor parallel (TP) size, batch size) vary, even under greedy decoding. This arises from the non-associativity of floating-point arithmetic and inconsistent redu...
Авторы:
Xuchen Gong, Tian Li
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
Classic zeroth-order optimization approaches typically optimize for a
smoothed version of the original function, i.e., the expected objective under
randomly perturbed model parameters. This can be interpreted as encouraging the
loss values in the perturbation set to be small on average. Popular
sharpness-aware minimization (SAM) objectives, however, typically focus on the
largest loss within the neighborhood to arrive at flat minima more effectively.
In this work, we connect zeroth-order optimiz...
Авторы:
Jairo Diaz-Rodriguez, Mumin Jia
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
Kernel change-point detection (KCPD) has become a widely used tool for
identifying structural changes in complex data. While existing theory
establishes consistency under independence assumptions, real-world sequential
data such as text exhibits strong dependencies. We establish new guarantees for
KCPD under $m$-dependent data: specifically, we prove consistency in the number
of detected change points and weak consistency in their locations under mild
additional assumptions. We perform an LLM-ba...
Авторы:
Nicholas Lourie, He He, Kyunghyun Cho
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
Hyperparameters greatly impact models' capabilities; however, modern models
are too large for extensive search. Instead, researchers design recipes that
train well across scales based on their understanding of the hyperparameters.
Despite this importance, few tools exist for understanding the hyperparameter
loss surface. We discover novel structure in it and propose a new theory
yielding such tools. The loss surface is complex, but as you approach the
optimum simple structure emerges. It becomes...