📊 Digest statistics
Total digests: 34022 Added today: 0
Last updated: today
Authors:
Thomas W. Mitchel, Hyunwoo Ryu, Vincent Sitzmann
Annotation:
In this paper, we identify that the key criterion for determining whether a
model is truly capable of novel view synthesis (NVS) is transferability:
Whether any pose representation extracted from one video sequence can be used
to re-render the same camera trajectory in another. We analyze prior work on
self-supervised NVS and find that their predicted poses do not transfer: The
same set of poses leads to different camera trajectories in different 3D scenes.
Here, we present XFactor, the first geo...
Authors:
Zian Li, Muhan Zhang
Annotation:
Masked autoregressive models (MAR) have recently emerged as a powerful
paradigm for image and video generation, combining the flexibility of masked
modeling with the potential of continuous tokenizers. However, video MAR models
suffer from two major limitations: the slow-start problem, caused by the lack
of a structured global prior at early sampling stages, and error accumulation
across the autoregression in both spatial and temporal dimensions. In this
work, we propose CanvasMAR, a novel video ...
Authors:
Mustafa Munir, Alex Zhang, Radu Marculescu
Annotation:
Vision graph neural networks (ViG) have demonstrated promise in vision tasks
as a competitive alternative to conventional convolutional neural nets (CNN)
and transformers (ViTs); however, common graph construction methods, such as
k-nearest neighbor (KNN), can be expensive on larger images. While methods such
as Sparse Vision Graph Attention (SVGA) have shown promise, SVGA's fixed step
scale can lead to over-squashing and missing multiple connections to gain the
same information that could be ga...
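The KNN cost mentioned in this abstract can be illustrated with a minimal sketch, assuming patch features are given as an N x D NumPy array. `knn_graph` is a hypothetical helper written for illustration: a generic dense KNN graph construction of the kind ViG-style models use, not the authors' method. It materialises the full N x N distance matrix, so time and memory grow quadratically with the number of patches.

```python
import numpy as np


def knn_graph(features: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical helper: for each of the N rows of `features` (N x D),
    return the indices of its k nearest other rows (squared Euclidean).

    The full N x N distance matrix is built explicitly, so cost is O(N^2)
    in the number of patches N.
    """
    n = features.shape[0]
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of patches")
    # Squared norm of every feature vector, shape (N,)
    sq = np.sum(features ** 2, axis=1)
    # Pairwise squared distances via ||a||^2 + ||b||^2 - 2 a.b, shape (N, N)
    dist = sq[:, None] + sq[None, :] - 2.0 * features @ features.T
    # Exclude self-loops by pushing each node's distance to itself to infinity
    np.fill_diagonal(dist, np.inf)
    # argpartition gathers the k smallest distances per row without a full sort
    # (the k neighbours are returned in no particular order)
    return np.argpartition(dist, k - 1, axis=1)[:, :k]


if __name__ == "__main__":
    # Four 1-D "patch features" on a line: 0, 1, 3, 10
    feats = np.array([[0.0], [1.0], [3.0], [10.0]])
    print(knn_graph(feats, k=1)[:, 0])
```

As a rough sense of scale, a 224 x 224 image cut into 16 x 16 patches gives 196 patches (38,416 pairwise distances), while a 56 x 56 grid gives 3,136 patches (about 9.8 million), which illustrates why dense KNN becomes expensive on larger images.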
Authors:
Junhong Shen, Mu Cai, Bo Hu, Ameet Talwalkar, David A Ross, Cordelia Schmid, Alireza Fathi
Annotation:
Multimodal Large Language Models (MLLMs) struggle with precise reasoning for
structured visuals like charts and diagrams, as pixel-based perception lacks a
mechanism for verification. To address this, we propose to leverage derendering
-- the process of reverse-engineering visuals into executable code -- as a new
modality for verifiable visual reasoning. Specifically, we propose RECODE, an
agentic framework that first generates multiple candidate programs to reproduce
the input image. It then us...
📄 Deep Attention-guided Adaptive Subsampling
2025-10-16
Authors:
Sharath M Shankaranarayana, Soumava Kumar Roy, Prasad Sudhakar, Chandan Aladahalli
Annotation:
Although deep neural networks have provided impressive gains in performance,
these improvements often come at the cost of increased computational complexity
and expense. In many cases, such as 3D volume or video classification tasks,
not all slices or frames are necessary due to inherent redundancies. To address
this issue, we propose a novel learnable subsampling framework that can be
integrated into any neural network architecture. Subsampling, being a
nondifferentiable operation, poses signif...
Authors:
A. Alfarano, L. Venturoli, D. Negueruela del Castillo
Annotation:
Multimodal Large Language Models (MLLMs) have demonstrated significant
capabilities in joint visual and linguistic tasks. However, existing Visual
Question Answering (VQA) benchmarks often fail to evaluate deep semantic
understanding, particularly in complex domains like visual art analysis.
Confined to simple syntactic structures and surface-level attributes, these
questions fail to capture the diversity and depth of human visual inquiry. This
limitation incentivizes models to exploit statistic...
Authors:
Kevin Li, Manuel Brack, Sudeep Katakol, Hareesh Ravi, Ajinkya Kale
Annotation:
Although recent advances in visual generation have been remarkable, most
existing architectures still depend on distinct encoders for images and text.
This separation constrains diffusion models' ability to perform cross-modal
reasoning and knowledge transfer. Prior attempts to bridge this gap often use
last-layer information from a VLM, employ multiple visual encoders, or train
large unified models jointly for text and image generation, which demands
substantial computational resources and la...
Authors:
Caner Korkmaz, Brighton Nuwagira, Barış Coşkunuzer, Tolga Birdal
Annotation:
We present CuMPerLay, a novel differentiable vectorization layer that enables
the integration of Cubical Multiparameter Persistence (CMP) into deep learning
pipelines. While CMP presents a natural and powerful way to topologically work
with images, its use is hindered by the complexity of multifiltration
structures as well as the vectorization of CMP. In the face of these challenges, we
introduce a new algorithm for vectorizing MP homologies of cubical complexes.
Our CuMPerLay decomposes the CMP int...
Authors:
Jack Krolik, Jake Lynn, John Henry Rudden, Dmytro Vremenko
Annotation:
This study explores the application of deep learning techniques in the
automated detection and segmentation of brain tumors from MRI scans. We employ
several machine learning models, including basic logistic regression,
Convolutional Neural Networks (CNNs), and Residual Networks (ResNet) to
classify brain tumors effectively. Additionally, we investigate the use of
U-Net for semantic segmentation and EfficientDet for anchor-based object
detection to enhance the localization and identification of ...
Authors:
Zhao-Yang Wang, Jieneng Chen, Jiang Liu, Yuxiang Guo, Rama Chellappa
Annotation:
Gait recognition, a fundamental biometric technology, leverages unique
walking patterns for individual identification, typically using 2D
representations such as silhouettes or skeletons. However, these methods often
struggle with viewpoint variations, occlusions, and noise. Multi-modal
approaches that incorporate 3D body shape information offer improved robustness
but are computationally expensive, limiting their feasibility for real-time
applications. To address these challenges, we introduce ...
Showing 151-160 of 358 records