AG-Fusion: adaptive gated multimodal fusion for 3D object detection in complex scenes
2510.23151v1
cs.CV, cs.LG
2025-10-29
Authors:
Sixian Liu, Chen Xu, Qiang Wang, Donghai Shi, Yiwen Li
Abstract
Multimodal camera-LiDAR fusion technology has found extensive application in
3D object detection, demonstrating encouraging performance. However, existing
methods exhibit significant performance degradation in challenging scenarios
characterized by sensor degradation or environmental disturbances. We propose a
novel Adaptive Gated Fusion (AG-Fusion) approach that selectively integrates
cross-modal knowledge by identifying reliable patterns for robust detection in
complex scenes. Specifically, we first project features from each modality into
a unified BEV space and enhance them using a window-based attention mechanism.
Subsequently, an adaptive gated fusion module based on cross-modal attention is
designed to integrate these features into reliable BEV representations robust
to challenging environments. Furthermore, we construct a new dataset named
Excavator3D (E3D) focusing on challenging excavator operation scenarios to
benchmark performance in complex conditions. Our method not only achieves
competitive performance on the standard KITTI dataset with 93.92% accuracy, but
also significantly outperforms the baseline by 24.88% on the challenging E3D
dataset, demonstrating superior robustness to unreliable modal information in
complex industrial scenes.
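
The gated fusion step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' architecture: the single-head attention, the sigmoid gate over concatenated features, and all names (`cross_attention`, `gated_fusion`, `w_gate`, `b_gate`) are assumptions chosen for illustration, and the window-based enhancement stage is omitted. It only shows how a learned gate could weight camera and LiDAR BEV features per cell.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query, context):
    """Single-head scaled dot-product attention (no learned projections).

    query:   (N, d) tokens from one modality
    context: (N, d) tokens from the other modality
    """
    d = query.shape[-1]
    scores = query @ context.T / np.sqrt(d)    # (N, N) similarity
    return softmax(scores, axis=-1) @ context  # (N, d) attended features

def gated_fusion(cam_bev, lidar_bev, w_gate, b_gate):
    """Hypothetical gated fusion of two BEV maps of shape (H, W, d).

    w_gate: (2d, d) and b_gate: (d,) are illustrative gate parameters.
    Returns the fused map and the gate, both of shape (H, W, d).
    """
    h, w, d = cam_bev.shape
    cam = cam_bev.reshape(-1, d)    # flatten BEV cells into tokens
    lid = lidar_bev.reshape(-1, d)

    cam_ctx = cross_attention(cam, lid)  # camera queries attend to LiDAR
    lid_ctx = cross_attention(lid, cam)  # LiDAR queries attend to camera

    # Gate in (0, 1): values near 1 favor the camera branch, near 0 the LiDAR branch.
    gate = sigmoid(np.concatenate([cam_ctx, lid_ctx], axis=-1) @ w_gate + b_gate)
    fused = gate * cam_ctx + (1.0 - gate) * lid_ctx
    return fused.reshape(h, w, d), gate.reshape(h, w, d)

# Usage with random stand-in features (illustrative sizes only).
rng = np.random.default_rng(0)
cam_bev = rng.normal(size=(4, 4, 8))
lidar_bev = rng.normal(size=(4, 4, 8))
w_gate = rng.normal(size=(16, 8)) * 0.1
fused, gate = gated_fusion(cam_bev, lidar_bev, w_gate, np.zeros(8))
print(fused.shape, gate.shape)
```

In a trained model the gate parameters would be learned, so that a degraded modality (for example, a camera obscured by dust) receives a lower weight in the affected BEV cells; here they are random placeholders.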