Authors:
Elliot L. Epstein, John Winnicki, Thanawat Sornwanee, Rajat Dwaraknath
Abstract:
Large language models (LLMs) excel at numerical estimation but struggle to
correctly quantify uncertainty. We study how well LLMs construct confidence
intervals around their own answers and find that they are systematically
overconfident. To evaluate this behavior, we introduce FermiEval, a benchmark
of Fermi-style estimation questions with a rigorous scoring rule for confidence
interval coverage and sharpness. Across several modern models, nominal 99%
intervals cover the true answer only 65% ...
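
To make the two quantities named in the abstract concrete, here is a minimal Python sketch of how interval coverage and sharpness could be measured. It is not the FermiEval scoring rule, which the abstract does not spell out. As a stand-in it uses the standard Winkler interval score, which adds the interval width (sharpness) to a penalty for missing the true value. The function names, the toy data, and the choice to work in log10 units are all assumptions made for illustration.

```python
def empirical_coverage(intervals, truths):
    """Fraction of true values that fall inside their [lo, hi] interval."""
    hits = sum(lo <= y <= hi for (lo, hi), y in zip(intervals, truths))
    return hits / len(truths)


def interval_score(lo, hi, y, alpha):
    """Winkler interval score for a nominal (1 - alpha) interval.

    Lower is better: the width rewards sharp intervals, and a miss is
    penalized by its distance from the interval, scaled by 2 / alpha.
    """
    width = hi - lo
    below = (2.0 / alpha) * max(lo - y, 0.0)  # truth lies below the interval
    above = (2.0 / alpha) * max(y - hi, 0.0)  # truth lies above the interval
    return width + below + above


# Hypothetical toy data in log10 units (assumed, not taken from the paper).
intervals = [(2.0, 4.0), (5.0, 6.0), (0.5, 1.5), (8.0, 9.0)]
truths = [3.0, 6.5, 1.0, 9.0]
alpha = 0.01  # nominal 99% intervals

coverage = empirical_coverage(intervals, truths)
mean_score = sum(
    interval_score(lo, hi, y, alpha) for (lo, hi), y in zip(intervals, truths)
) / len(truths)

print(f"nominal coverage:   {1 - alpha:.2f}")
print(f"empirical coverage: {coverage:.2f}")
print(f"mean interval score: {mean_score:.2f}")
```

In this toy set, three of the four intervals contain the true value, so empirical coverage falls well short of the nominal 99%, which is the kind of gap the abstract describes as overconfidence. The single miss dominates the mean score because the penalty grows as alpha shrinks.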