📊 Статистика дайджестов

Всего дайджестов: 34022 Добавлено сегодня: 82

Последнее обновление: сегодня

📄 DEJIMA: A Novel Large-scale Japanese Dataset for Image Captioning and Visual Question Answering

2025-12-04

Авторы:

Toshiki Katsube, Taiga Fukuhara, Kenichiro Ando, Yusuke Mukuta, Kohei Uehara, Tatsuya Harada

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

This work addresses the scarcity of high-quality, large-scale resources for Japanese Vision-and-Language (V&L) modeling. We present a scalable and reproducible pipeline that integrates large-scale web collection with rigorous filtering/deduplication, object-detection-driven evidence extraction, and Large Language Model (LLM)-based refinement under grounding constraints. Using this pipeline, we build two resources: an image-caption dataset (DEJIMA-Cap) and a VQA dataset (DEJIMA-VQA), each contain...

ID: 2512.00773v1 cs.CV

arXiv PDF

📄 CircleFlow: Flow-Guided Camera Blur Estimation using a Circle Grid Target

2025-12-04

Авторы:

Jiajian He, Enjie Hu, Shiqi Chen, Tianchen Qiu, Huajun Feng, Zhihai Xu, Yueting Chen

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

The point spread function (PSF) serves as a fundamental descriptor linking the real-world scene to the captured signal, manifesting as camera blur. Accurate PSF estimation is crucial for both optical characterization and computational vision, yet remains challenging due to the inherent ambiguity and the ill-posed nature of intensity-based deconvolution. We introduce CircleFlow, a high-fidelity PSF estimation framework that employs flow-guided edge localization for precise blur characterization. ...

ID: 2512.00796v1 cs.CV

arXiv PDF

📄 PolarGS: Polarimetric Cues for Ambiguity-Free Gaussian Splatting with Accurate Geometry Recovery

2025-12-04

Авторы:

Bo Guo, Sijia Wen, Yifan Zhao, Jia Li, Zhiming Zheng

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Recent advances in surface reconstruction for 3D Gaussian Splatting (3DGS) have enabled remarkable geometric accuracy. However, their performance degrades in photometrically ambiguous regions such as reflective and textureless surfaces, where unreliable cues disrupt photometric consistency and hinder accurate geometry estimation. Reflected light is often partially polarized in a manner that reveals surface orientation, making polarization an optic complement to photometric cues in resolving such...

ID: 2512.00794v1 cs.CV

arXiv PDF

📄 Thinking with Drafts: Speculative Temporal Reasoning for Efficient Long Video Understanding

2025-12-04

Авторы:

Pengfei Hu, Meng Cao, Yingyao Wang, Yi Wang, Jiahua Dong, Jun Song, Yu Cheng, Bo Zheng, Xiaodan Liang

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Long video understanding is essential for human-like intelligence, enabling coherent perception and reasoning over extended temporal contexts. While the emerging thinking-with-frames paradigm, which alternates between global temporal reasoning and local frame examination, has advanced the reasoning capabilities of video multi-modal large language models (MLLMs), it suffers from a significant efficiency bottleneck due to the progressively growing and redundant multi-modal context. To address this...

ID: 2512.00805v1 cs.CV

arXiv PDF

📄 PanFlow: Decoupled Motion Control for Panoramic Video Generation

2025-12-04

Авторы:

Cheng Zhang, Hanwen Liang, Donny Y. Chen, Qianyi Wu, Konstantinos N. Plataniotis, Camilo Cruz Gambardella, Jianfei Cai

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Panoramic video generation has attracted growing attention due to its applications in virtual reality and immersive media. However, existing methods lack explicit motion control and struggle to generate scenes with large and complex motions. We propose PanFlow, a novel approach that exploits the spherical nature of panoramas to decouple the highly dynamic camera rotation from the input optical flow condition, enabling more precise control over large and dynamic motions. We further introduce a sp...

ID: 2512.00832v1 cs.CV

arXiv PDF

📄 IRPO: Boosting Image Restoration via Post-training GRPO

2025-12-04

Авторы:

Haoxuan Xu. Yi Liu, Boyuan Jiang, Jinlong Peng, Donghao Luo, Xiaobin Hu, Shuicheng Yan, Haoang Li

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Recent advances in post-training paradigms have achieved remarkable success in high-level generation tasks, yet their potential for low-level vision remains rarely explored. Existing image restoration (IR) methods rely on pixel-level hard-fitting to ground-truth images, struggling with over-smoothing and poor generalization. To address these limitations, we propose IRPO, a low-level GRPO-based post-training paradigm that systematically explores both data formulation and reward modeling. We first...

ID: 2512.00814v1 cs.CV

arXiv PDF

📄 Neural Discrete Representation Learning for Sparse-View CBCT Reconstruction: From Algorithm Design to Prospective Multicenter Clinical Evaluation

2025-12-04

Авторы:

Haoshen Wang, Lei Chen, Wei-Hua Zhang, Linxia Wu, Yong Luo, Zengmao Wang, Yuan Xiong, Chengcheng Zhu, Wenjuan Tang, Xueyi Zhang, Wei Zhou, Xuhua Duan, Lefei Zhang, Gao-Jun Teng, Bo Du, Huangxuan Zhao

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Cone beam computed tomography (CBCT)-guided puncture has become an established approach for diagnosing and treating early- to mid-stage thoracic tumours, yet the associated radiation exposure substantially elevates the risk of secondary malignancies. Although multiple low-dose CBCT strategies have been introduced, none have undergone validation using large-scale multicenter retrospective datasets, and prospective clinical evaluation remains lacking. Here, we propose DeepPriorCBCT - a three-stage...

ID: 2512.00873v1 cs.CV

arXiv PDF

📄 Smol-GS: Compact Representations for Abstract 3D Gaussian Splatting

2025-12-04

Авторы:

Haishan Wang, Mohammad Hassan Vali, Arno Solin

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

We present Smol-GS, a novel method for learning compact representations for 3D Gaussian Splatting (3DGS). Our approach learns highly efficient encodings in 3D space that integrate both spatial and semantic information. The model captures the coordinates of the splats through a recursive voxel hierarchy, while splat-wise features store abstracted cues, including color, opacity, transformation, and material properties. This design allows the model to compress 3D scenes by orders of magnitude witho...

ID: 2512.00850v1 cs.CV

arXiv PDF

📄 AFRAgent : An Adaptive Feature Renormalization Based High Resolution Aware GUI agent

2025-12-04

Авторы:

Neeraj Anand, Rishabh Jain, Sohan Patnaik, Balaji Krishnamurthy, Mausoom Sarkar

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

There is a growing demand for mobile user interface (UI) automation, driven by its broad applications across industries. With the advent of visual language models (VLMs), GUI automation has progressed from generating text-based instructions for humans to autonomously executing tasks, thus optimizing automation workflows. Recent approaches leverage VLMs for this problem due to their ability to 1) process on-screen content directly, 2) remain independent of device-specific APIs by utilizing human ...

ID: 2512.00846v1 cs.CV

arXiv PDF

📄 Quantum-Inspired Spectral Geometry for Neural Operator Equivalence and Structured Pruning

2025-12-04

Авторы:

Haijian Shao, Wei Liu, Xing Deng

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

The rapid growth of multimodal intelligence on resource-constrained and heterogeneous domestic hardware exposes critical bottlenecks: multimodal feature heterogeneity, real-time requirements in dynamic scenarios, and hardware-specific operator redundancy. This work introduces a quantum-inspired geometric framework for neural operators that represents each operator by its normalized singular value spectrum on the Bloch hypersphere. We prove a tight spectral-to-functional equivalence theorem showi...

ID: 2512.00880v1 cs.CV

arXiv PDF

1
2
17
18
19
20
21
1161
1162

Показано 181 - 190 из 11614 записей