📊 Статистика дайджестов
Всего дайджестов: 34022 Добавлено сегодня: 0
Последнее обновление: сегодня
Авторы:
Blazej Manczak, Eric Lin, Francisco Eiras, James O' Neill, Vaikkunth Mugunthan
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
Large language models (LLMs) are rapidly transitioning into medical clinical
use, yet their reliability under realistic, multi-turn interactions remains
poorly understood. Existing evaluation frameworks typically assess single-turn
question answering under idealized conditions, overlooking the complexities of
medical consultations where conflicting input, misleading context, and
authority influence are common. We introduce MedQA-Followup, a framework for
systematically evaluating multi-turn robu...
Авторы:
Shinya Uryu
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
Large Language Models (LLMs) are rapidly being adopted in conservation to
address the biodiversity crisis, yet their reliability for species evaluation
is uncertain. This study systematically validates five leading models on 21,955
species across four core IUCN Red List assessment components: taxonomy,
conservation status, distribution, and threats. A critical paradox was
revealed: models excelled at taxonomic classification (94.9%) but consistently
failed at conservation reasoning (27.2% for st...