ClapperText: A Benchmark for Text Recognition in Low-Resource Archival Documents

2510.15557v1 cs.CV, cs.AI, eess.IV 2025-10-21

Authors:

Tingyu Lin, Marco Peer, Florian Kleber, Robert Sablatnig

Abstract

This paper presents ClapperText, a benchmark dataset for handwritten and printed text recognition in visually degraded and low-resource settings. The dataset is derived from 127 World War II-era archival video segments containing clapperboards that record structured production metadata such as date, location, and camera-operator identity. ClapperText includes 9,813 annotated frames and 94,573 word-level text instances, of which 67% are handwritten and 1,566 are partially occluded. Each instance includes transcription, semantic category, text type, and occlusion status, with annotations available as rotated bounding boxes represented as 4-point polygons to support spatially precise OCR applications. Recognizing clapperboard text poses significant challenges, including motion blur, handwriting variation, exposure fluctuations, and cluttered backgrounds, mirroring broader challenges in historical document analysis where structured content appears in degraded, non-standard forms. We provide both full-frame annotations and cropped word images to support downstream tasks. Using a consistent per-video evaluation protocol, we benchmark six representative recognition and seven detection models under zero-shot and fine-tuned conditions. Despite the small training set (18 videos), fine-tuning leads to substantial performance gains, highlighting ClapperText's suitability for few-shot learning scenarios. The dataset offers a realistic and culturally grounded resource for advancing robust OCR and document understanding in low-resource archival contexts. The dataset and evaluation code are available at https://github.com/linty5/ClapperText.
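As an illustration of the annotation structure and per-video evaluation described in the abstract, the following is a minimal sketch rather than the authors' released code. The field names (`video_id`, `polygon`, `text_type`, and so on), the choice of exact-match word accuracy as the metric, and the unweighted mean over videos are all assumptions made for this example; the actual annotation schema and protocol are defined in the ClapperText repository.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

Point = Tuple[float, float]


@dataclass
class WordInstance:
    """One word-level annotation (hypothetical field names)."""
    video_id: str
    polygon: List[Point]       # rotated box stored as 4 corner points
    transcription: str
    semantic_category: str     # e.g. "date" or "location" (illustrative values)
    text_type: str             # e.g. "handwritten" or "printed"
    occluded: bool


def polygon_to_bbox(polygon: List[Point]) -> Tuple[float, float, float, float]:
    """Axis-aligned (x_min, y_min, x_max, y_max) box enclosing a 4-point polygon."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return (min(xs), min(ys), max(xs), max(ys))


def per_video_word_accuracy(
    instances: List[WordInstance], predictions: List[str]
) -> Dict[str, float]:
    """Exact-match word accuracy computed separately for each video (assumed metric)."""
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for inst, pred in zip(instances, predictions):
        totals[inst.video_id] += 1
        if pred == inst.transcription:
            hits[inst.video_id] += 1
    return {vid: hits[vid] / totals[vid] for vid in totals}


def mean_over_videos(scores: Dict[str, float]) -> float:
    """Unweighted mean, so every video counts equally regardless of its word count."""
    return sum(scores.values()) / len(scores)
```

Averaging per video rather than over all words keeps long videos with many instances from dominating the score, which is one plausible motivation for a per-video protocol.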
