📊 Digest statistics
Total digests: 34022
Added today: 82
Last updated: today
📄 The AI Productivity Index (APEX)
2025-10-03
Authors:
Bertie Vidgen, Abby Fennelly, Evan Pinnix, Chirag Mahapatra, Zach Richards, Austin Bridges, Calix Huang, Ben Hunsberger, Fez Zafar, Brendan Foody, Dominic Barton, Cass R. Sunstein, Eric Topol, Osvald Nitski
Abstract:
We introduce the first version of the AI Productivity Index (APEX), a
benchmark for assessing whether frontier AI models can perform knowledge work
with high economic value. APEX addresses one of the largest inefficiencies in
AI research: outside of coding, benchmarks often fail to test economically
relevant capabilities. APEX-v1.0 contains 200 test cases and covers four
domains: investment banking, management consulting, law, and primary medical
care. It was built in three steps. First, we sour...
📄 The AI Productivity Index (APEX)
2025-10-02
Authors:
Bertie Vidgen, Abby Fennelly, Evan Pinnix, Chirag Mahapatra, Zach Richards, Austin Bridges, Calix Huang, Ben Hunsberger, Fez Zafar, Brendan Foody, Dominic Barton, Cass R. Sunstein, Eric Topol, Osvald Nitski
Abstract:
We introduce the first version of the AI Productivity Index (APEX), a
benchmark for assessing whether frontier AI models can perform knowledge work
with high economic value. APEX addresses one of the largest inefficiencies in
AI research: outside of coding, benchmarks often fail to test economically
relevant capabilities. APEX-v1.0 contains 200 test cases and covers four
domains: investment banking, management consulting, law, and primary medical
care. It was built in three steps. First, we sour...