Cataract-LMM: Large-Scale, Multi-Source, Multi-Task Benchmark for Deep Learning in Surgical Video Analysis
2510.16371v1
cs.CV, cs.AI, cs.LG
2025-10-22
Авторы:
Mohammad Javad Ahmadi, Iman Gandomi, Parisa Abdi, Seyed-Farzad Mohammadi, Amirhossein Taslimi, Mehdi Khodaparast, Hassan Hashemi, Mahdi Tavakoli, Hamid D. Taghirad
Abstract
The development of computer-assisted surgery systems depends on large-scale,
annotated datasets. Current resources for cataract surgery often lack the
diversity and annotation depth needed to train generalizable deep-learning
models. To address this gap, we present a dataset of 3,000 phacoemulsification
cataract surgery videos from two surgical centers, performed by surgeons with a
range of experience levels. This resource is enriched with four annotation
layers: temporal surgical phases, instance segmentation of instruments and
anatomical structures, instrument-tissue interaction tracking, and quantitative
skill scores based on the established competency rubrics like the ICO-OSCAR.
The technical quality of the dataset is supported by a series of benchmarking
experiments for key surgical AI tasks, including workflow recognition, scene
segmentation, and automated skill assessment. Furthermore, we establish a
domain adaptation baseline for the phase recognition task by training a model
on a subset of surgical centers and evaluating its performance on a held-out
center. The dataset and annotations are available in Google Form
(https://docs.google.com/forms/d/e/1FAIpQLSfmyMAPSTGrIy2sTnz0-TMw08ZagTimRulbAQcWdaPwDy187A/viewform?usp=dialog).
Ссылки и действия
Дополнительные ресурсы: