📊 Статистика дайджестов
Всего дайджестов: 34022 Добавлено сегодня: 0
Последнее обновление: сегодня
Авторы:
Manik Rana, Calissa Man, Anotida Expected Msiiwa, Jeffrey Paine, Kevin Zhu, Sunishchal Dev, Vasu Sharma, Ahan M R
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
Goal changes are a defining feature of real world multi-turn interactions,
yet current agent benchmarks primarily evaluate static objectives or one-shot
tool use. We introduce AgentChangeBench, a benchmark explicitly designed to
measure how tool augmented language model agents adapt to mid dialogue goal
shifts across three enterprise domains. Our framework formalizes evaluation
through four complementary metrics: Task Success Rate (TSR) for effectiveness,
Tool Use Efficiency (TUE) for reliabilit...