📊 Статистика дайджестов

Всего дайджестов: 34022 Добавлено сегодня: 82

Последнее обновление: сегодня

📄 Explore More, Learn Better: Parallel MLLM Embeddings under Mutual Information Minimization

2025-11-06

Авторы:

Zhicheng Wang, Chen Ju, Xu Chen, Shuai Xiao, Jinsong Lan, Xiaoyong Zhu, Ying Chen, Zhiguo Cao

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Embedding models are a cornerstone of modern AI. Driven by Multimodal Large Language Models (MLLMs), they have made great progress in architecture and data curation, while the holistic paradigm is still limited to SSC, i.e., single input, singular embedding, contrastive supervision, which collapses rich, multifaceted inputs into monolithic embeddings and fails to fully exploit MLLM capabilities. In this paper, we tailor one Parallel Decoupling Framework (PDF) for multimodal embedding learning, b...

ID: 2511.01588v1 cs.LG, cs.CV

arXiv PDF

📄 OmniField: Conditioned Neural Fields for Robust Multimodal Spatiotemporal Learning

2025-11-06

Авторы:

Kevin Valencia, Thilina Balasooriya, Xihaier Luo, Shinjae Yoo, David Keetae Park

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Multimodal spatiotemporal learning on real-world experimental data is constrained by two challenges: within-modality measurements are sparse, irregular, and noisy (QA/QC artifacts) but cross-modally correlated; the set of available modalities varies across space and time, shrinking the usable record unless models can adapt to arbitrary subsets at train and test time. We propose OmniField, a continuity-aware framework that learns a continuous neural field conditioned on available modalities and i...

ID: 2511.02205v1 cs.LG, cs.CV

arXiv PDF

📄 Imbalanced Classification through the Lens of Spurious Correlations

2025-11-04

Авторы:

Jakob Hackstein, Sidney Bender

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Class imbalance poses a fundamental challenge in machine learning, frequently leading to unreliable classification performance. While prior methods focus on data- or loss-reweighting schemes, we view imbalance as a data condition that amplifies Clever Hans (CH) effects by underspecification of minority classes. In a counterfactual explanations-based approach, we propose to leverage Explainable AI to jointly identify and eliminate CH effects emerging under imbalance. Our method achieves competiti...

ID: 2510.27650v1 cs.LG, cs.CV

arXiv PDF

📄 CorVS: Person Identification via Video Trajectory-Sensor Correspondence in a Real-World Warehouse

2025-11-01

Авторы:

Kazuma Kano, Yuki Mori, Shin Katayama, Kenta Urano, Takuro Yonezawa, Nobuo Kawaguchi

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Worker location data is key to higher productivity in industrial sites. Cameras are a promising tool for localization in logistics warehouses since they also offer valuable environmental contexts such as package status. However, identifying individuals with only visual data is often impractical. Accordingly, several prior studies identified people in videos by comparing their trajectories and wearable sensor measurements. While this approach has advantages such as independence from appearance, t...

ID: 2510.26369v1 cs.LG, cs.CV, cs.RO

arXiv PDF

📄 Feedback Alignment Meets Low-Rank Manifolds: A Structured Recipe for Local Learning

2025-10-31

Авторы:

Arani Roy, Marco P. Apolinario, Shristi Das Biswas, Kaushik Roy

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Training deep neural networks (DNNs) with backpropagation (BP) achieves state-of-the-art accuracy but requires global error propagation and full parameterization, leading to substantial memory and computational overhead. Direct Feedback Alignment (DFA) enables local, parallelizable updates with lower memory requirements but is limited by unstructured feedback and poor scalability in deeper architectures, specially convolutional neural networks. To address these limitations, we propose a structur...

ID: 2510.25594v1 cs.LG, cs.CV

arXiv PDF

📄 Synergistic Neural Forecasting of Air Pollution with Stochastic Sampling

2025-10-30

Авторы:

Yohan Abeysinghe, Muhammad Akhtar Munir, Sanoojan Baliah, Ron Sarafian, Fahad Shahbaz Khan, Yinon Rudich, Salman Khan

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Air pollution remains a leading global health and environmental risk, particularly in regions vulnerable to episodic air pollution spikes due to wildfires, urban haze and dust storms. Accurate forecasting of particulate matter (PM) concentrations is essential to enable timely public health warnings and interventions, yet existing models often underestimate rare but hazardous pollution events. Here, we present SynCast, a high-resolution neural forecasting model that integrates meteorological and ...

ID: 2510.23977v1 cs.LG, cs.CV

arXiv PDF

📄 What do vision-language models see in the context? Investigating multimodal in-context learning

2025-10-30

Авторы:

Gabriel O. dos Santos, Esther Colombini, Sandra Avila

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

In-context learning (ICL) enables Large Language Models (LLMs) to learn tasks from demonstration examples without parameter updates. Although it has been extensively studied in LLMs, its effectiveness in Vision-Language Models (VLMs) remains underexplored. In this work, we present a systematic study of ICL in VLMs, evaluating seven models spanning four architectures on three image captioning benchmarks. We analyze how prompt design, architectural choices, and training strategies influence multim...

ID: 2510.24331v1 cs.LG, cs.CV

arXiv PDF

📄 A supervised discriminant data representation: application to pattern classification

2025-10-29

Авторы:

Fadi Dornaika, Ahmad Khoder, Abdelmalik Moujahid, Wassim Khoder

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

The performance of machine learning and pattern recognition algorithms generally depends on data representation. That is why, much of the current effort in performing machine learning algorithms goes into the design of preprocessing frameworks and data transformations able to support effective machine learning. The method proposed in this work consists of a hybrid linear feature extraction scheme to be used in supervised multi-class classification problems. Inspired by two recent linear discrimi...

ID: 2510.21898v1 cs.LG, cs.CV

arXiv PDF

📄 MAGIC-Flow: Multiscale Adaptive Conditional Flows for Generation and Interpretable Classification

2025-10-29

Авторы:

Luca Caldera, Giacomo Bottacini, Lara Cavinato

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Generative modeling has emerged as a powerful paradigm for representation learning, but its direct applicability to challenging fields like medical imaging remains limited: mere generation, without task alignment, fails to provide a robust foundation for clinical use. We propose MAGIC-Flow, a conditional multiscale normalizing flow architecture that performs generation and classification within a single modular framework. The model is built as a hierarchy of invertible and differentiable bijecti...

ID: 2510.22070v1 cs.LG, cs.CV, eess.IV, stat.ML

arXiv PDF

📄 Simplifying Knowledge Transfer in Pretrained Models

2025-10-29

Авторы:

Siddharth Jain, Shyamgopal Karthik, Vineet Gandhi

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Pretrained models are ubiquitous in the current deep learning landscape, offering strong results on a broad range of tasks. Recent works have shown that models differing in various design choices exhibit categorically diverse generalization behavior, resulting in one model grasping distinct data-specific insights unavailable to the other. In this paper, we propose to leverage large publicly available model repositories as an auxiliary source of model improvements. We introduce a data partitionin...

ID: 2510.22208v1 cs.LG, cs.CV

arXiv PDF

Показано 81 - 90 из 277 записей