📊 Статистика дайджестов
Всего дайджестов: 34022 Добавлено сегодня: 82
Последнее обновление: сегодня
📄 Evaluating LLMs for Historical Document OCR: A Methodological Framework for Digital Humanities
2025-10-10Авторы:
Maria Levchenko
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
Digital humanities scholars increasingly use Large Language Models for
historical document digitization, yet lack appropriate evaluation frameworks
for LLM-based OCR. Traditional metrics fail to capture temporal biases and
period-specific errors crucial for historical corpus creation. We present an
evaluation methodology for LLM-based historical OCR, addressing contamination
risks and systematic biases in diplomatic transcription. Using 18th-century
Russian Civil font texts, we introduce novel m...