📊 Статистика дайджестов
Всего дайджестов: 35039 Добавлено сегодня: 432
Последнее обновление: сегодня
📄 Continuum: Efficient and Robust Multi-Turn LLM Agent Scheduling with KV Cache Time-to-Live
2025-11-06Авторы:
Hanchen Li, Qiuyang Mang, Runyuan He, Qizheng Zhang, Huanzhi Mao, Xiaokun Chen, Alvin Cheung, Joseph Gonzalez, Ion Stoica
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
Agentic LLM applications interleave LLM generation requests with tool calls.
These tool calls break the continuity of the workflow by creating pauses
between LLM requests, bringing many challenges for the serving system,
especially under multi-turn scenarios. Each pause potentially causes KV cache
eviction and extra waiting time before entering the continuous batch for the
following LLM request. Since these pauses happen for each call, this problem
becomes increasingly severe as turn number grow...