📊 Статистика дайджестов
Всего дайджестов: 34022 Добавлено сегодня: 82
Последнее обновление: сегодня
📄 BeLLMan: Controlling LLM Congestion
2025-10-21Авторы:
Tella Rajashekhar Reddy, Atharva Deshmukh, Karan Tandon, Rohan Gandhi, Anjaly Parayil, Debopam Bhattacherjee
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
Large language model (LLM) applications are blindfolded to the infrastructure
underneath and generate tokens autoregressively, indifferent to the system
load, thus risking inferencing latency inflation and poor user experience. Our
first-cut controller, named beLLMan, enables the LLM infrastructure to actively
and progressively signal the first-party LLM application to adjust the output
length in response to changing system load. On a real testbed with H100 GPUs,
beLLMan helps keep inferencing l...