📊 Статистика дайджестов
Всего дайджестов: 34022 Добавлено сегодня: 0
Последнее обновление: сегодня
Авторы:
Chashi Mahiul Islam, Oteo Mamo, Samuel Jacob Chacko, Xiuwen Liu, Weikuan Yu
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
Vision-language models (VLMs) have advanced multimodal reasoning but still
face challenges in spatial reasoning for 3D scenes and complex object
configurations. To address this, we introduce SpatialViLT, an enhanced VLM that
integrates spatial features like depth maps, 3D coordinates, and edge maps
through a multi-task learning framework. This approach enriches multimodal
embeddings with spatial understanding. We propose two variants: SpatialViLT and
MaskedSpatialViLT, focusing on full and maske...