📊 Статистика дайджестов

Всего дайджестов: 34022 Добавлено сегодня: 82

Последнее обновление: сегодня

📄 Real-time Cricket Sorting By Sex

2025-12-05

Авторы:

Juan Manuel Cantarero Angulo, Matthew Smith

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

The global demand for sustainable protein sources is driving increasing interest in edible insects, with Acheta domesticus (house cricket) identified as one of the most suitable species for industrial production. Current farming practices typically rear crickets in mixed-sex populations without automated sex sorting, despite potential benefits such as selective breeding, optimized reproduction ratios, and nutritional differentiation. This work presents a low-cost, real-time system for automated ...

ID: 2512.04311v1 cs.CV, q-bio.QM

arXiv PDF

📄 Text-Only Training for Image Captioning with Retrieval Augmentation and Modality Gap Correction

2025-12-05

Авторы:

Rui Fonseca, Bruno Martins, Gil Rocha

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Image captioning has drawn considerable attention from the natural language processing and computer vision fields. Aiming to reduce the reliance on curated data, several studies have explored image captioning without any humanly-annotated image-text pairs for training, although existing methods are still outperformed by fully supervised approaches. This paper proposes TOMCap, i.e., an improved text-only training method that performs captioning without the need for aligned image-caption pairs. Th...

ID: 2512.04309v1 cs.CV, cs.CL

arXiv PDF

📄 DisentangleFormer: Spatial-Channel Decoupling for Multi-Channel Vision

2025-12-05

Авторы:

Jiashu Liao, Pietro Liò, Marc de Kamps, Duygu Sarikaya

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Vision Transformers face a fundamental limitation: standard self-attention jointly processes spatial and channel dimensions, leading to entangled representations that prevent independent modeling of structural and semantic dependencies. This problem is especially pronounced in hyperspectral imaging, from satellite hyperspectral remote sensing to infrared pathology imaging, where channels capture distinct biophysical or biochemical cues. We propose DisentangleFormer, an architecture that achieves...

ID: 2512.04314v1 cs.CV

arXiv PDF

📄 Mind-to-Face: Neural-Driven Photorealistic Avatar Synthesis via EEG Decoding

2025-12-05

Авторы:

Haolin Xiong, Tianwen Fu, Pratusha Bhuvana Prasad, Yunxuan Cai, Haiwei Chen, Wenbin Teng, Hanyuan Xiao, Yajie Zhao

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Current expressive avatar systems rely heavily on visual cues, failing when faces are occluded or when emotions remain internal. We present Mind-to-Face, the first framework that decodes non-invasive electroencephalogram (EEG) signals directly into high-fidelity facial expressions. We build a dual-modality recording setup to obtain synchronized EEG and multi-view facial video during emotion-eliciting stimuli, enabling precise supervision for neural-to-visual learning. Our model uses a CNN-Transf...

ID: 2512.04313v1 cs.CV

arXiv PDF

📄 A Retrieval-Augmented Generation Approach to Extracting Algorithmic Logic from Neural Networks

2025-12-05

Авторы:

Waleed Khalid, Dmitry Ignatov, Radu Timofte

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Reusing existing neural-network components is central to research efficiency, yet discovering, extracting, and validating such modules across thousands of open-source repositories remains difficult. We introduce NN-RAG, a retrieval-augmented generation system that converts large, heterogeneous PyTorch codebases into a searchable and executable library of validated neural modules. Unlike conventional code search or clone-detection tools, NN-RAG performs scope-aware dependency resolution, import-p...

ID: 2512.04329v1 cs.CV, cs.SE

arXiv PDF

📄 SyncTrack4D: Cross-Video Motion Alignment and Video Synchronization for Multi-Video 4D Gaussian Splatting

2025-12-05

Авторы:

Yonghan Lee, Tsung-Wei Huang, Shiv Gehlot, Jaehoon Choi, Guan-Ming Su, Dinesh Manocha

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Modeling dynamic 3D scenes is challenging due to their high-dimensional nature, which requires aggregating information from multiple views to reconstruct time-evolving 3D geometry and motion. We present a novel multi-video 4D Gaussian Splatting (4DGS) approach designed to handle real-world, unsynchronized video sets. Our approach, SyncTrack4D, directly leverages dense 4D track representation of dynamic scene parts as cues for simultaneous cross-video synchronization and 4DGS reconstruction. We f...

ID: 2512.04315v1 cs.CV

arXiv PDF

📄 MAFNet:Multi-frequency Adaptive Fusion Network for Real-time Stereo Matching

2025-12-05

Авторы:

Ao Xu, Rujin Zhao, Xiong Xu, Boceng Huang, Yujia Jia, Hongfeng Long, Fuxuan Chen, Zilong Cao, Fangyuan Chen

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Existing stereo matching networks typically rely on either cost-volume construction based on 3D convolutions or deformation methods based on iterative optimization. The former incurs significant computational overhead during cost aggregation, whereas the latter often lacks the ability to model non-local contextual information. These methods exhibit poor compatibility on resource-constrained mobile devices, limiting their deployment in real-time applications. To address this, we propose a Multi-f...

ID: 2512.04358v1 cs.CV

arXiv PDF

📄 Open Set Face Forgery Detection via Dual-Level Evidence Collection

2025-12-05

Авторы:

Zhongyi Cai, Bryce Gernon, Wentao Bao, Yifan Li, Matthew Wright, Yu Kong

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

The proliferation of face forgeries has increasingly undermined confidence in the authenticity of online content. Given the rapid development of face forgery generation algorithms, new fake categories are likely to keep appearing, posing a major challenge to existing face forgery detection methods. Despite recent advances in face forgery detection, existing methods are typically limited to binary Real-vs-Fake classification or the identification of known fake categories, and are incapable of det...

ID: 2512.04331v1 cs.CV

arXiv PDF

📄 Performance Evaluation of Transfer Learning Based Medical Image Classification Techniques for Disease Detection

2025-12-05

Авторы:

Zeeshan Ahmad, Shudi Bao, Meng Chen

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Medical image classification plays an increasingly vital role in identifying various diseases by classifying medical images, such as X-rays, MRIs and CT scans, into different categories based on their features. In recent years, deep learning techniques have attracted significant attention in medical image classification. However, it is usually infeasible to train an entire large deep learning model from scratch. To address this issue, one of the solutions is the transfer learning (TL) technique,...

ID: 2512.04397v1 cs.CV

arXiv PDF

📄 Fourier-Attentive Representation Learning: A Fourier-Guided Framework for Few-Shot Generalization in Vision-Language Models

2025-12-05

Авторы:

Hieu Dinh Trung Pham, Huy Minh Nhat Nguyen, Cuong Tuan Nguyen

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Large-scale pre-trained Vision-Language Models (VLMs) have demonstrated strong few-shot learning capabilities. However, these methods typically learn holistic representations where an image's domain-invariant structure is implicitly entangled with its domain-specific style. This presents an opportunity to further enhance generalization by disentangling these visual cues. In this paper, we propose Fourier-Attentive Representation Learning (FARL), a novel framework that addresses this by explicitl...

ID: 2512.04395v1 cs.CV

arXiv PDF

1
2
3
4
5
1161
1162

Показано 21 - 30 из 11614 записей