📊 Статистика дайджестов

Всего дайджестов: 34022 Добавлено сегодня: 0

Последнее обновление: сегодня

📄 While recognizing actions, LMMs struggle to detect core interaction events

2025-11-26

Авторы:

Daniel Harari, Michael Sidorov, Liel David, Chen Shterental, Abrham Kahsay Gebreselasie, Muhammad Haris Khan

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Large multi-modal models (LMMs) show increasing performance in realistic visual tasks for images and, more recently, for videos. For example, given a video sequence, such models are able to describe in detail objects, the surroundings and dynamic actions. In this study, we explored the extent to which these models ground their semantic understanding in the actual visual input. Specifically, given sequences of hands interacting with objects, we asked models when and where the interaction begins o...

ID: 2511.20162v1 cs.CV, cs.AI, q-bio.NC

arXiv PDF

📄 Brain-IT: Image Reconstruction from fMRI via Brain-Interaction Transformer

2025-11-01

Авторы:

Roman Beliy, Amit Zalcher, Jonathan Kogman, Navve Wasserman, Michal Irani

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Reconstructing images seen by people from their fMRI brain recordings provides a non-invasive window into the human brain. Despite recent progress enabled by diffusion models, current methods often lack faithfulness to the actual seen images. We present "Brain-IT", a brain-inspired approach that addresses this challenge through a Brain Interaction Transformer (BIT), allowing effective interactions between clusters of functionally-similar brain-voxels. These functional-clusters are shared by all ...

ID: 2510.25976v1 cs.CV, cs.AI, q-bio.NC

arXiv PDF

📄 Scaling Vision Transformers for Functional MRI with Flat Maps

2025-10-17

Авторы:

Connor Lane, Daniel Z. Kaplan, Tanishq Mathew Abraham, Paul S. Scotti

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

A key question for adapting modern deep learning architectures to functional MRI (fMRI) is how to represent the data for model input. To bridge the modality gap between fMRI and natural images, we transform the 4D volumetric fMRI data into videos of 2D fMRI activity flat maps. We train Vision Transformers on 2.3K hours of fMRI flat map videos from the Human Connectome Project using the spatiotemporal masked autoencoder (MAE) framework. We observe that masked fMRI modeling performance improves wi...

ID: 2510.13768v1 cs.CV, cs.AI, q-bio.NC

arXiv PDF