Lightweight Facial Landmark Detection in Thermal Images via Multi-Level Cross-Modal Knowledge Transfer
2510.11128v1
cs.LG, cs.CV
2025-10-15
Authors:
Qiyi Tong, Olivia Nocentini, Marta Lagomarsino, Kuanqi Cai, Marta Lorenzini, Arash Ajoudani
Abstract
Facial Landmark Detection (FLD) in thermal imagery is critical for
applications in challenging lighting conditions, but it is hampered by the lack
of rich visual cues. Conventional cross-modal solutions, like feature fusion or
image translation from RGB data, are often computationally expensive or
introduce structural artifacts, limiting their practical deployment. To address
this, we propose Multi-Level Cross-Modal Knowledge Distillation (MLCM-KD), a
novel framework that decouples high-fidelity RGB-to-thermal knowledge transfer
from model compression to create both accurate and efficient thermal FLD
models. A central challenge during knowledge transfer is the profound modality
gap between RGB and thermal data, where traditional unidirectional distillation
fails to enforce semantic consistency across disparate feature spaces. To
overcome this, we introduce Dual-Injected Knowledge Distillation (DIKD), a
bidirectional mechanism designed specifically for this task. DIKD couples the
two modalities in both directions: it not only guides the thermal student with rich
RGB features but also validates the student's learned representations by
feeding them back into the frozen teacher's prediction head. This closed-loop
supervision forces the student to learn modality-invariant features that are
semantically aligned with the teacher, which makes the knowledge transfer more
robust. Experiments show that our approach sets a new state of the art on
public thermal FLD benchmarks, outperforming previous methods while
substantially reducing computational overhead.
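
The closed-loop idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no loss definitions, so the function names (linear, mse, closed_loop_kd_loss), the use of mean squared error, the weights w_feat and w_loop, and the choice to compare the fed-back prediction with ground-truth landmarks are all assumptions made for illustration. Features are plain Python lists and the teacher head is a single linear layer.

```python
# Hypothetical sketch of a closed-loop distillation loss.
# All names, loss choices, and weights are illustrative assumptions,
# not taken from the paper.

def linear(weights, features):
    """Apply a linear prediction head: one output per weight row."""
    return [sum(w * f for w, f in zip(row, features)) for row in weights]

def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def closed_loop_kd_loss(student_feat, teacher_feat, teacher_head, target,
                        w_feat=1.0, w_loop=1.0):
    """Return (total, feature_loss, loop_loss) for one sample."""
    # Forward guidance: pull thermal student features toward RGB teacher features.
    feat_loss = mse(student_feat, teacher_feat)
    # Feedback check: decode the student's features with the teacher head,
    # which is treated as frozen (never updated), and compare the decoded
    # landmarks with the target landmarks.
    loop_pred = linear(teacher_head, student_feat)
    loop_loss = mse(loop_pred, target)
    return w_feat * feat_loss + w_loop * loop_loss, feat_loss, loop_loss

# Toy example: a 2-D feature vector decoded into 2 landmark coordinates.
teacher_head = [[1.0, 0.0], [0.0, 2.0]]
teacher_feat = [0.5, 0.25]
target = linear(teacher_head, teacher_feat)

aligned = closed_loop_kd_loss([0.5, 0.25], teacher_feat, teacher_head, target)
misaligned = closed_loop_kd_loss([0.0, 0.0], teacher_feat, teacher_head, target)
print(aligned[0], misaligned[0])
```

In this toy setting both terms vanish when the student's features match the teacher's, and both grow when the student's features drift. In a real training setup the teacher head's parameters would be excluded from gradient updates, so only the student changes in response to the feedback term.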