📄 LLMs are Overconfident: Evaluating Confidence Interval Calibration with FermiEval

2025-11-04

Authors:

Elliot L. Epstein, John Winnicki, Thanawat Sornwanee, Rajat Dwaraknath

Abstract:

Large language models (LLMs) excel at numerical estimation but struggle to correctly quantify uncertainty. We study how well LLMs construct confidence intervals around their own answers and find that they are systematically overconfident. To evaluate this behavior, we introduce FermiEval, a benchmark of Fermi-style estimation questions with a rigorous scoring rule for confidence interval coverage and sharpness. Across several modern models, nominal 99% intervals cover the true answer only 65% ...
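
To make "coverage" and "sharpness" concrete, here is a minimal Python sketch. It is not the FermiEval scoring rule, which the truncated abstract does not specify. It assumes the standard Winkler interval score as a stand-in, and the function names and sample numbers are hypothetical, chosen only for illustration.

```python
from typing import Sequence


def empirical_coverage(
    lowers: Sequence[float], uppers: Sequence[float], truths: Sequence[float]
) -> float:
    """Fraction of intervals [lower, upper] that contain the true value."""
    hits = sum(lo <= t <= hi for lo, hi, t in zip(lowers, uppers, truths))
    return hits / len(truths)


def interval_score(lower: float, upper: float, truth: float, alpha: float) -> float:
    """Winkler interval score for a nominal (1 - alpha) interval.

    Assumption: used here as a generic stand-in, not the paper's rule.
    The score is the interval width (sharpness) plus a penalty of
    (2 / alpha) times the distance by which the truth falls outside
    the interval. Lower is better.
    """
    width = upper - lower
    if truth < lower:
        return width + (2.0 / alpha) * (lower - truth)
    if truth > upper:
        return width + (2.0 / alpha) * (truth - upper)
    return width


if __name__ == "__main__":
    # Hypothetical intervals and answers (e.g. in log10 units), not real data.
    lowers = [1.0, 2.0, 3.0, 4.0]
    uppers = [2.0, 3.0, 4.0, 5.0]
    truths = [1.5, 3.5, 3.2, 6.0]

    print("coverage:", empirical_coverage(lowers, uppers, truths))
    for lo, hi, t in zip(lowers, uppers, truths):
        print(lo, hi, t, interval_score(lo, hi, t, alpha=0.01))
```

A model is overconfident in the abstract's sense when the empirical coverage of its nominal 99% intervals falls well below 0.99. A score of this kind rewards narrow intervals only while they still contain the truth, because each miss is penalized in proportion to 2 / alpha.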

ID: 2510.26995v1

Categories: stat.ME, cs.AI, cs.LG, I.2.7
