📊 Статистика дайджестов

Всего дайджестов: 34022 Добавлено сегодня: 82

Последнее обновление: сегодня

📄 XAI-Driven Skin Disease Classification: Leveraging GANs to Augment ResNet-50 Performance

2025-12-02

Авторы:

Kim Gerard A. Villanueva, Priyanka Kumar

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Accurate and timely diagnosis of multi-class skin lesions is hampered by subjective methods, inherent data imbalance in datasets like HAM10000, and the "black box" nature of Deep Learning (DL) models. This study proposes a trustworthy and highly accurate Computer-Aided Diagnosis (CAD) system to overcome these limitations. The approach utilizes Deep Convolutional Generative Adversarial Networks (DCGANs) for per class data augmentation to resolve the critical class imbalance problem. A fine-tuned ...

ID: 2512.00626v1 cs.CV, cs.AI

arXiv PDF

📄 MambaScope: Coarse-to-Fine Scoping for Efficient Vision Mamba

2025-12-02

Авторы:

Shanhui Liu, Rui Xu, Yunke Wang

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Vision Mamba has emerged as a promising and efficient alternative to Vision Transformers, yet its efficiency remains fundamentally constrained by the number of input tokens. Existing token reduction approaches typically adopt token pruning or merging to reduce computation. However, they inherently lead to information loss, as they discard or compress token representations. This problem is exacerbated when applied uniformly to fine-grained token representations across all images, regardless of vi...

ID: 2512.00647v1 cs.CV, cs.AI

arXiv PDF

📄 Graph-Attention Network with Adversarial Domain Alignment for Robust Cross-Domain Facial Expression Recognition

2025-12-02

Авторы:

Razieh Ghaedi, AmirReza BabaAhmadi, Reyer Zwiggelaar, Xinqi Fan, Nashid Alam

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Cross-domain facial expression recognition (CD-FER) remains difficult due to severe domain shift between training and deployment data. We propose Graph-Attention Network with Adversarial Domain Alignment (GAT-ADA), a hybrid framework that couples a ResNet-50 as backbone with a batch-level Graph Attention Network (GAT) to model inter-sample relations under shift. Each mini-batch is cast as a sparse ring graph so that attention aggregates cross-sample cues that are informative for adaptation. To a...

ID: 2512.00641v1 cs.CV, cs.AI

arXiv PDF

📄 Dynamic-eDiTor: Training-Free Text-Driven 4D Scene Editing with Multimodal Diffusion Transformer

2025-12-02

Авторы:

Dong In Lee, Hyungjun Doh, Seunggeun Chi, Runlin Duan, Sangpil Kim, Karthik Ramani

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Recent progress in 4D representations, such as Dynamic NeRF and 4D Gaussian Splatting (4DGS), has enabled dynamic 4D scene reconstruction. However, text-driven 4D scene editing remains under-explored due to the challenge of ensuring both multi-view and temporal consistency across space and time during editing. Existing studies rely on 2D diffusion models that edit frames independently, often causing motion distortion, geometric drift, and incomplete editing. We introduce Dynamic-eDiTor, a traini...

ID: 2512.00677v1 cs.CV, cs.AI

arXiv PDF

📄 Optimizing LVLMs with On-Policy Data for Effective Hallucination Mitigation

2025-12-02

Авторы:

Chengzhi Yu, Yifan Xu, Yifan Chen, Wenyi Zhang

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Recently, large vision-language models (LVLMs) have risen to be a promising approach for multimodal tasks. However, principled hallucination mitigation remains a critical challenge.In this work, we first analyze the data generation process in LVLM hallucination mitigation and affirm that on-policy data significantly outperforms off-policy data, which thus calls for efficient and reliable preference annotation of on-policy data. We then point out that, existing annotation methods introduce additi...

ID: 2512.00706v1 cs.CV, cs.AI

arXiv PDF

📄 Deep Learning-Based Computer Vision Models for Early Cancer Detection Using Multimodal Medical Imaging and Radiogenomic Integration Frameworks

2025-12-02

Авторы:

Emmanuella Avwerosuoghene Oghenekaro

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Early cancer detection remains one of the most critical challenges in modern healthcare, where delayed diagnosis significantly reduces survival outcomes. Recent advancements in artificial intelligence, particularly deep learning, have enabled transformative progress in medical imaging analysis. Deep learning-based computer vision models, such as convolutional neural networks (CNNs), transformers, and hybrid attention architectures, can automatically extract complex spatial, morphological, and te...

ID: 2512.00714v1 cs.CV, cs.AI

arXiv PDF

📄 Probabilistic Modeling of Multi-rater Medical Image Segmentation for Diversity and Personalization

2025-12-02

Авторы:

Ke Liu, Shangde Gao, Yichao Fu, Shangqi Gao, Chunhua Shen

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Medical image segmentation is inherently influenced by data uncertainty, arising from ambiguous boundaries in medical scans and inter-observer variability in diagnosis. To address this challenge, previous works formulated the multi-rater medical image segmentation task, where multiple experts provide separate annotations for each image. However, existing models are typically constrained to either generate diverse segmentation that lacks expert specificity or to produce personalized outputs that ...

ID: 2512.00748v1 cs.CV, cs.AI

arXiv PDF

📄 EAG3R: Event-Augmented 3D Geometry Estimation for Dynamic and Extreme-Lighting Scenes

2025-12-02

Авторы:

Xiaoshan Wu, Yifei Yu, Xiaoyang Lyu, Yihua Huang, Bo Wang, Baoheng Zhang, Zhongrui Wang, Xiaojuan Qi

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Robust 3D geometry estimation from videos is critical for applications such as autonomous navigation, SLAM, and 3D scene reconstruction. Recent methods like DUSt3R demonstrate that regressing dense pointmaps from image pairs enables accurate and efficient pose-free reconstruction. However, existing RGB-only approaches struggle under real-world conditions involving dynamic objects and extreme illumination, due to the inherent limitations of conventional cameras. In this paper, we propose EAG3R, a...

ID: 2512.00771v1 cs.CV, cs.AI

arXiv PDF

📄 TAP-CT: 3D Task-Agnostic Pretraining of Computed Tomography Foundation Models

2025-12-02

Авторы:

Tim Veenboer, George Yiasemis, Eric Marcus, Vivien Van Veldhuizen, Cees G. M. Snoek, Jonas Teuwen, Kevin B. W. Groot Lipman

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Existing foundation models (FMs) in the medical domain often require extensive fine-tuning or rely on training resource-intensive decoders, while many existing encoders are pretrained with objectives biased toward specific tasks. This illustrates a need for a strong, task-agnostic foundation model that requires minimal fine-tuning beyond feature extraction. In this work, we introduce a suite of task-agnostic pretraining of CT foundation models (TAP-CT): a simple yet effective adaptation of Visio...

ID: 2512.00872v1 cs.CV, cs.AI

arXiv PDF

📄 Look, Recite, Then Answer: Enhancing VLM Performance via Self-Generated Knowledge Hints

2025-12-02

Авторы:

Xisheng Feng

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Vision-Language Models (VLMs) exhibit significant performance plateaus in specialized domains like precision agriculture, primarily due to "Reasoning-Driven Hallucination" where linguistic priors override visual perception. A key bottleneck is the "Modality Gap": visual embeddings fail to reliably activate the fine-grained expert knowledge already encoded in model parameters. We propose "Look, Recite, Then Answer," a parameter-efficient framework that enhances VLMs via self-generated knowledge h...

ID: 2512.00882v1 cs.CV, cs.AI

arXiv PDF

Показано 151 - 160 из 2274 записей