MSCloudCAM: Cross-Attention with Multi-Scale Context for Multispectral Cloud Segmentation
2510.10802v1
cs.CV, cs.AI, cs.LG, F.2.2, I.2.7
2025-10-15
Authors:
Md Abdullah Al Mazid, Liangdong Deng, Naphtali Rishe
Abstract
Clouds remain a critical challenge in optical satellite imagery, hindering
reliable analysis for environmental monitoring, land cover mapping, and climate
research. To overcome this, we propose MSCloudCAM, a Cross-Attention with
Multi-Scale Context Network tailored for multispectral and multi-sensor cloud
segmentation. Our framework exploits the spectral richness of Sentinel-2
(CloudSEN12) and Landsat-8 (L8Biome) data to classify four semantic categories:
clear sky, thin cloud, thick cloud, and cloud shadow. MSCloudCAM combines a
Swin Transformer backbone for hierarchical feature extraction with multi-scale
context modules (ASPP and PSP) for enhanced scale-aware learning. A
Cross-Attention block enables effective multi-sensor and multispectral feature
fusion, while an Efficient Channel Attention Block (ECAB) and a Spatial
Attention Module adaptively refine the feature representations.
Comprehensive experiments on CloudSEN12 and L8Biome demonstrate that MSCloudCAM
delivers state-of-the-art segmentation accuracy, surpassing leading baseline
architectures while maintaining a competitive parameter count and FLOPs.
These results underscore the model's effectiveness and practicality, making it
well-suited for large-scale Earth observation tasks and real-world
applications.
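
To make the channel-refinement idea concrete, the following is a minimal, dependency-free sketch of efficient-channel-attention-style gating: global average pooling per channel, a small 1D convolution across neighbouring channels, a sigmoid gate, and channel-wise rescaling. It is an illustration under stated assumptions, not the ECAB used in MSCloudCAM: the function name `eca_gate`, the fixed kernel weights, and the nested-list tensor layout are all hypothetical, and a real model would learn the kernel and operate on framework tensors.

```python
import math


def eca_gate(feature_map, kernel=(0.25, 0.5, 0.25)):
    """Apply ECA-style gating to a C x H x W nested list of floats.

    `kernel` holds hypothetical fixed weights for illustration only;
    in a trained network these weights would be learned.
    Returns (refined_feature_map, per_channel_gates).
    """
    channels = len(feature_map)

    # 1. Global average pooling: one scalar descriptor per channel.
    pooled = [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        for ch in feature_map
    ]

    # 2. 1D convolution over the channel axis (zero padding at the ends),
    #    so each channel's gate depends on its nearest neighbours.
    half = len(kernel) // 2
    mixed = []
    for c in range(channels):
        acc = 0.0
        for k, weight in enumerate(kernel):
            idx = c + k - half
            if 0 <= idx < channels:
                acc += weight * pooled[idx]
        mixed.append(acc)

    # 3. Sigmoid squashes each mixed descriptor into a gate in (0, 1).
    gates = [1.0 / (1.0 + math.exp(-m)) for m in mixed]

    # 4. Rescale every value in a channel by that channel's gate.
    refined = [
        [[value * gate for value in row] for row in ch]
        for ch, gate in zip(feature_map, gates)
    ]
    return refined, gates


# Toy input: two 2 x 2 channels, one all zeros and one all twos.
toy = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[2.0, 2.0], [2.0, 2.0]],
]
refined, gates = eca_gate(toy)
```

Because the gate is computed from a handful of weights shared across all channels, this style of attention adds very few parameters, which is consistent with the abstract's emphasis on parameter efficiency.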