ClapperText: A Benchmark for Text Recognition in Low-Resource Archival Documents
2510.15557v1
cs.CV, cs.AI, eess.IV
2025-10-21
Авторы:
Tingyu Lin, Marco Peer, Florian Kleber, Robert Sablatnig
Abstract
This paper presents ClapperText, a benchmark dataset for handwritten and
printed text recognition in visually degraded and low-resource settings. The
dataset is derived from 127 World War II-era archival video segments containing
clapperboards that record structured production metadata such as date,
location, and camera-operator identity. ClapperText includes 9,813 annotated
frames and 94,573 word-level text instances, 67% of which are handwritten and
1,566 are partially occluded. Each instance includes transcription, semantic
category, text type, and occlusion status, with annotations available as
rotated bounding boxes represented as 4-point polygons to support spatially
precise OCR applications. Recognizing clapperboard text poses significant
challenges, including motion blur, handwriting variation, exposure
fluctuations, and cluttered backgrounds, mirroring broader challenges in
historical document analysis where structured content appears in degraded,
non-standard forms. We provide both full-frame annotations and cropped word
images to support downstream tasks. Using a consistent per-video evaluation
protocol, we benchmark six representative recognition and seven detection
models under zero-shot and fine-tuned conditions. Despite the small training
set (18 videos), fine-tuning leads to substantial performance gains,
highlighting ClapperText's suitability for few-shot learning scenarios. The
dataset offers a realistic and culturally grounded resource for advancing
robust OCR and document understanding in low-resource archival contexts. The
dataset and evaluation code are available at
https://github.com/linty5/ClapperText.
Ссылки и действия
Дополнительные ресурсы: