📊 Digest statistics

Total digests: 34022

Added today: 82

Last updated: today

📄 AdaptVision: Efficient Vision-Language Models via Adaptive Visual Acquisition

2025-12-04

Authors:

Zichuan Lin, Yicheng Liu, Yang Yang, Lvfang Tao, Deheng Ye

Abstract:

Vision-Language Models (VLMs) have achieved remarkable success in visual question answering tasks, but their reliance on large numbers of visual tokens introduces significant computational overhead. While existing efficient VLM approaches reduce visual tokens through fixed-ratio compression, they operate passively and lack the ability to adapt to varying task requirements. This motivates a fundamental question: Can VLMs autonomously determine the minimum number of visual tokens required for each...

ID: 2512.03794v1

Categories: cs.CV, cs.AI, cs.CL, cs.LG

📄 Research on Brain Tumor Classification Method Based on Improved ResNet34 Network

2025-12-04

Authors:

Yufeng Li, Wenchao Zhao, Bo Dang, Weimin Wang

Abstract:

Previously, image interpretation in radiology relied heavily on manual methods. However, manual classification of brain tumor medical images is time-consuming and labor-intensive. Even with shallow convolutional neural network models, the accuracy is not ideal. To improve the efficiency and accuracy of brain tumor image classification, this paper proposes a brain tumor classification model based on an improved ResNet34 network. This model uses the ResNet34 residual network as the backbone networ...

ID: 2512.03751v1

Categories: cs.CV, cs.AI

📄 PULSE: A Unified Multi-Task Architecture for Cardiac Segmentation, Diagnosis, and Few-Shot Cross-Modality Clinical Adaptation

2025-12-04

Authors:

Hania Ghouse, Maryam Alsharqi, Farhad R. Nezami, Muzammil Behzad

Abstract:

Cardiac image analysis remains fragmented across tasks: anatomical segmentation, disease classification, and grounded clinical report generation are typically handled by separate networks trained under different data regimes. No existing framework unifies these objectives within a single architecture while retaining generalization across imaging modalities and datasets. We introduce PULSE, a multi-task vision-language framework built on self-supervised representations and optimized through a com...

ID: 2512.03848v1

Categories: cs.CV, cs.AI

📄 BlurDM: A Blur Diffusion Model for Image Deblurring

2025-12-04

Authors:

Jin-Ting He, Fu-Jen Tsai, Yan-Tsung Peng, Min-Hung Chen, Chia-Wen Lin, Yen-Yu Lin

Abstract:

Diffusion models show promise for dynamic scene deblurring; however, existing studies often fail to leverage the intrinsic nature of the blurring process within diffusion models, limiting their full potential. To address this, we present a Blur Diffusion Model (BlurDM), which seamlessly integrates the blur formation process into diffusion for image deblurring. Observing that motion blur stems from continuous exposure, BlurDM implicitly models the blur formation process through a dual-diffusion for...

ID: 2512.03979v1

Categories: cs.CV, cs.AI

📄 DIQ-H: Evaluating Hallucination Persistence in VLMs Under Temporal Visual Degradation

2025-12-04

Authors:

Zexin Lin, Hawen Wan, Yebin Zhong, Xiaoqiang

Abstract:

Vision-Language Models (VLMs) deployed in safety-critical applications such as autonomous driving must handle continuous visual streams under imperfect conditions. However, existing benchmarks focus on static, high-quality images and ignore temporal degradation and error propagation, which are critical failure modes where transient visual corruption induces hallucinations that persist across subsequent frames. We introduce DIQ-H, the first benchmark for evaluating VLM robustness under dynamic vi...

ID: 2512.03992v1

Categories: cs.CV, cs.AI

📄 Divide, then Ground: Adapting Frame Selection to Query Types for Long-Form Video Understanding

2025-12-04

Authors:

Jialuo Li, Bin Li, Jiahao Li, Yan Lu

Abstract:

The application of Large Multimodal Models (LMMs) to long-form video understanding is constrained by limited context lengths and the computationally prohibitive cost of processing dense video tokens. Consequently, recent research has focused on query-aware frame selection, methods that often incur significant computational overhead. This paper challenges the assumption that such complex search mechanisms are universally necessary. We first identify and validate a query typology distinguishing be...

ID: 2512.04000v1

Categories: cs.CV, cs.AI, cs.LG

📄 Highly Efficient Test-Time Scaling for T2I Diffusion Models with Text Embedding Perturbation

2025-12-04

Authors:

Hang Xu, Linjiang Huang, Feng Zhao

Abstract:

Test-time scaling (TTS) aims to achieve better results by increasing random sampling and evaluating samples based on rules and metrics. However, in text-to-image (T2I) diffusion models, most related works focus on search strategies and reward models, yet the impact of the stochastic characteristic of noise in T2I diffusion models on the method's performance remains unexplored. In this work, we analyze the effects of randomness in T2I diffusion models and explore a new format of randomness for TTS...

ID: 2512.03996v1

Categories: cs.CV, cs.AI

📄 On the Temporality for Sketch Representation Learning

2025-12-04

Authors:

Marcelo Isaias de Moraes Junior, Moacir Antonelli Ponti

Abstract:

Sketches are simple human hand-drawn abstractions of complex scenes and real-world objects. Although the field of sketch representation learning has advanced significantly, there is still a gap in understanding the true relevance of the temporal aspect to the quality of these representations. This work investigates whether it is indeed justifiable to treat sketches as sequences, as well as which internal orders play a more relevant role. The results indicate that, although the use of traditional...

ID: 2512.04007v1

Categories: cs.CV, cs.AI

📄 PSA: Pyramid Sparse Attention for Efficient Video Understanding and Generation

2025-12-04

Authors:

Xiaolong Li, Youping Gu, Xi Lin, Weijie Wang, Bohan Zhuang

Abstract:

Attention mechanisms are the core of foundation models, but their quadratic complexity remains a critical bottleneck for scaling. This challenge has driven the development of efficient attention mechanisms, with sparsity emerging as the dominant paradigm. Current methods typically retain or discard entire key-value blocks with binary masks, resulting in substantial information loss under high sparsity. To mitigate this gap, we present Pyramid Sparse Attention (PSA), a versatile module applicable...

ID: 2512.04025v1

Categories: cs.CV, cs.AI, cs.LG

📄 Fast & Efficient Normalizing Flows and Applications of Image Generative Models

2025-12-04

Authors:

Sandeep Nagar

Abstract:

This thesis presents novel contributions in two primary areas: advancing the efficiency of generative models, particularly normalizing flows, and applying generative models to solve real-world computer vision challenges. The first part introduces significant improvements to normalizing flow architectures through six key innovations: (1) development of invertible 3x3 convolution layers with mathematically proven necessary and sufficient conditions for invertibility, (2) introduction of a more effic...

ID: 2512.04039v1

Categories: cs.CV, cs.AI, cs.LG

Showing 41-50 of 2274 entries