📊 Статистика дайджестов

Всего дайджестов: 34022 Добавлено сегодня: 82

Последнее обновление: сегодня

📄 Not All Birds Look The Same: Identity-Preserving Generation For Birds

2025-12-05

Авторы:

Aaron Sun, Oindrila Saha, Subhransu Maji

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Since the advent of controllable image generation, increasingly rich modes of control have enabled greater customization and accessibility for everyday users. Zero-shot, identity-preserving models such as Insert Anything and OminiControl now support applications like virtual try-on without requiring additional fine-tuning. While these models may be fitting for humans and rigid everyday objects, they still have limitations for non-rigid or fine-grained categories. These domains often lack accessi...

ID: 2512.04485v1 cs.CV

arXiv PDF

📄 DeRA: Decoupled Representation Alignment for Video Tokenization

2025-12-05

Авторы:

Pengbo Guo, Junke Wang, Zhen Xing, Chengxu Liu, Daoguo Dong, Xueming Qian, Zuxuan Wu

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

This paper presents DeRA, a novel 1D video tokenizer that decouples the spatial-temporal representation learning in video tokenization to achieve better training efficiency and performance. Specifically, DeRA maintains a compact 1D latent space while factorizing video encoding into appearance and motion streams, which are aligned with pretrained vision foundation models to capture the spatial semantics and temporal dynamics in videos separately. To address the gradient conflicts introduced by th...

ID: 2512.04483v1 cs.CV

arXiv PDF

📄 Shift-Window Meets Dual Attention: A Multi-Model Architecture for Specular Highlight Removal

2025-12-05

Авторы:

Tianci Huo, Lingfeng Qi, Yuhan Chen, Qihong Xue, Jinyuan Shao, Hai Yu, Jie Li, Zhanhua Zhang, Guofa Li

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Inevitable specular highlights in practical environments severely impair the visual performance, thus degrading the task effectiveness and efficiency. Although there exist considerable methods that focus on local information from convolutional neural network models or global information from transformer models, the single-type model falls into a modeling dilemma between local fine-grained details and global long-range dependencies, thus deteriorating for specular highlights with different scales...

ID: 2512.04496v1 cs.CV

arXiv PDF

📄 Controllable Long-term Motion Generation with Extended Joint Targets

2025-12-05

Авторы:

Eunjong Lee, Eunhee Kim, Sanghoon Hong, Eunho Jung, Jihoon Kim

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Generating stable and controllable character motion in real-time is a key challenge in computer animation. Existing methods often fail to provide fine-grained control or suffer from motion degradation over long sequences, limiting their use in interactive applications. We propose COMET, an autoregressive framework that runs in real time, enabling versatile character control and robust long-horizon synthesis. Our efficient Transformer-based conditional VAE allows for precise, interactive control ...

ID: 2512.04487v1 cs.CV

arXiv PDF

📄 UltraImage: Rethinking Resolution Extrapolation in Image Diffusion Transformers

2025-12-05

Авторы:

Min Zhao, Bokai Yan, Xue Yang, Hongzhou Zhu, Jintao Zhang, Shilong Liu, Chongxuan Li, Jun Zhu

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Recent image diffusion transformers achieve high-fidelity generation, but struggle to generate images beyond these scales, suffering from content repetition and quality degradation. In this work, we present UltraImage, a principled framework that addresses both issues. Through frequency-wise analysis of positional embeddings, we identify that repetition arises from the periodicity of the dominant frequency, whose period aligns with the training resolution. We introduce a recursive dominant frequ...

ID: 2512.04504v1 cs.CV

arXiv PDF

📄 Back to Basics: Motion Representation Matters for Human Motion Generation Using Diffusion Model

2025-12-05

Авторы:

Yuduo Jin, Brandon Haworth

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Diffusion models have emerged as a widely utilized and successful methodology in human motion synthesis. Task-oriented diffusion models have significantly advanced action-to-motion, text-to-motion, and audio-to-motion applications. In this paper, we investigate fundamental questions regarding motion representations and loss functions in a controlled study, and we enumerate the impacts of various decisions in the workflow of the generative motion diffusion model. To answer these questions, we con...

ID: 2512.04499v1 cs.CV, cs.GR

arXiv PDF

📄 EgoLCD: Egocentric Video Generation with Long Context Diffusion

2025-12-05

Авторы:

Liuzhou Zhang, Jiarui Ye, Yuanlei Wang, Ming Zhong, Mingju Cao, Wanke Xia, Bowen Zeng, Zeyu Zhang, Hao Tang

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Generating long, coherent egocentric videos is difficult, as hand-object interactions and procedural tasks require reliable long-term memory. Existing autoregressive models suffer from content drift, where object identity and scene semantics degrade over time. To address this challenge, we introduce EgoLCD, an end-to-end framework for egocentric long-context video generation that treats long video synthesis as a problem of efficient and stable memory management. EgoLCD combines a Long-Term Spars...

ID: 2512.04515v1 cs.CV

arXiv PDF

📄 DuGI-MAE: Improving Infrared Mask Autoencoders via Dual-Domain Guidance

2025-12-05

Авторы:

Yinghui Xing, Xiaoting Su, Shizhou Zhang, Donghao Chu, Di Xu

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Infrared imaging plays a critical role in low-light and adverse weather conditions. However, due to the distinct characteristics of infrared images, existing foundation models such as Masked Autoencoder (MAE) trained on visible data perform suboptimal in infrared image interpretation tasks. To bridge this gap, an infrared foundation model known as InfMAE was developed and pre-trained on large-scale infrared datasets. Despite its effectiveness, InfMAE still faces several limitations, including th...

ID: 2512.04511v1 cs.CV

arXiv PDF

📄 Boundary-Aware Test-Time Adaptation for Zero-Shot Medical Image Segmentation

2025-12-05

Авторы:

Chenlin Xu, Lei Zhang, Lituan Wang, Xinyu Pu, Pengfei Ma, Guangwu Qian, Zizhou Wang, Yan Wang

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Due to the scarcity of annotated data and the substantial computational costs of model, conventional tuning methods in medical image segmentation face critical challenges. Current approaches to adapting pretrained models, including full-parameter and parameter-efficient fine-tuning, still rely heavily on task-specific training on downstream tasks. Therefore, zero-shot segmentation has gained increasing attention, especially with foundation models such as SAM demonstrating promising generalizatio...

ID: 2512.04520v1 cs.CV

arXiv PDF

📄 VideoSSM: Autoregressive Long Video Generation with Hybrid State-Space Memory

2025-12-05

Авторы:

Yifei Yu, Xiaoshan Wu, Xinting Hu, Tao Hu, Yangtian Sun, Xiaoyang Lyu, Bo Wang, Lin Ma, Yuewen Ma, Zhongrui Wang, Xiaojuan Qi

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Autoregressive (AR) diffusion enables streaming, interactive long-video generation by producing frames causally, yet maintaining coherence over minute-scale horizons remains challenging due to accumulated errors, motion drift, and content repetition. We approach this problem from a memory perspective, treating video synthesis as a recurrent dynamical process that requires coordinated short- and long-term context. We propose VideoSSM, a Long Video Model that unifies AR diffusion with a hybrid sta...

ID: 2512.04519v1 cs.CV

arXiv PDF

1
2
3
4
5
6
7
1161
1162

Показано 41 - 50 из 11614 записей