📊 Статистика дайджестов
Всего дайджестов: 34022 Добавлено сегодня: 0
Последнее обновление: сегодня
Авторы:
Dario Loi, Elena Maria Muià, Federico Siciliano, Giovanni Trappolini, Vincenzo Crisà, Peter Kruger, Fabrizio Silvestri
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
We present AutoBench, a fully automated and self-sustaining framework for
evaluating Large Language Models (LLMs) through reciprocal peer assessment.
This paper provides a rigorous scientific validation of the AutoBench
methodology, originally developed as an open-source project by eZecute S.R.L..
Unlike static benchmarks that suffer from test-set contamination and limited
adaptability, AutoBench dynamically generates novel evaluation tasks while
models alternately serve as question generators, ...