CLIPPan: Adapting CLIP as A Supervisor for Unsupervised Pansharpening

2511.10896v1 eess.IV, cs.AI, cs.CV 2025-11-17

Authors:

Lihua Jian, Jiabo Liu, Shaowu Wu, Lihui Chen

Abstract

Despite remarkable advances in supervised pansharpening neural networks, these methods face resolution-related domain adaptation challenges because of the intrinsic disparity between simulated reduced-resolution training data and real-world full-resolution scenarios. To bridge this gap, we propose an unsupervised pansharpening framework, CLIPPan, that enables model training directly at full resolution by using CLIP, a vision-language model, as a supervisor. However, directly applying CLIP to supervise pansharpening remains challenging because of its inherent bias toward natural images and its limited understanding of pansharpening tasks. Therefore, we first introduce a lightweight fine-tuning pipeline that adapts CLIP to recognize low-resolution multispectral, panchromatic, and high-resolution multispectral images, as well as to understand the pansharpening process. Then, building on the adapted CLIP, we formulate a novel loss integrating semantic language constraints, which aligns image-level fusion transitions with protocol-aligned textual prompts (e.g., Wald's or Khan's descriptions), thus enabling CLIPPan to use language as a powerful supervisory signal and guide fusion learning without ground truth. Extensive experiments demonstrate that CLIPPan consistently improves spectral and spatial fidelity across various pansharpening backbones on real-world datasets, setting a new state of the art for unsupervised full-resolution pansharpening.
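The abstract does not give the form of the language-constrained loss. As one possible reading of "aligning image-level fusion transitions with textual prompts", the sketch below uses a directional loss: the change in image embedding from the input to the fused output is compared, by cosine similarity, with the change in text embedding from a source prompt to a target prompt. This is an illustrative assumption, not the authors' formulation. The function names (`cosine`, `directional_loss`) and the plain-list embeddings are hypothetical stand-ins for vectors that an adapted CLIP encoder would produce.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def directional_loss(img_src, img_fused, txt_src, txt_tgt):
    """Hypothetical directional loss (an assumption, not the paper's loss).

    img_src   -- embedding of the input image (e.g. the low-resolution MS image)
    img_fused -- embedding of the pansharpened output
    txt_src   -- embedding of a prompt describing the input
    txt_tgt   -- embedding of a prompt describing the desired output
    Returns 1 - cos(image shift, text shift): 0 when the image moved in the
    direction the prompts describe, 2 when it moved the opposite way.
    """
    d_img = [f - s for f, s in zip(img_fused, img_src)]
    d_txt = [t - s for t, s in zip(txt_tgt, txt_src)]
    return 1.0 - cosine(d_img, d_txt)


if __name__ == "__main__":
    # Toy 2-D embeddings: the image shift is parallel to the text shift.
    print(directional_loss([0.0, 0.0], [1.0, 0.0], [0.0, 0.0], [2.0, 0.0]))
```

In a real training loop the embeddings would come from the fine-tuned image and text encoders and the loss would be computed on differentiable tensors. The plain-Python version here only illustrates the geometry of the comparison.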
