📊 Digest statistics

Total digests: 35039; added today: 432

Last updated: today

📄 DAGLFNet: Deep Attention-Guided Global-Local Feature Fusion for Pseudo-Image Point Cloud Segmentation

2025-10-15

Authors:

Chuang Chen, Wenyi Ge

Annotation:

Environmental perception systems play a critical role in high-precision mapping and autonomous navigation, with LiDAR serving as a core sensor that provides accurate 3D point cloud data. How to efficiently process unstructured point clouds while extracting structured semantic information remains a significant challenge, and in recent years, numerous pseudo-image-based representation methods have emerged to achieve a balance between efficiency and performance. However, they often overlook the str...

ID: 2510.10471v1 cs.CV, cs.LG

📄 MCE: Towards a General Framework for Handling Missing Modalities under Imbalanced Missing Rates

2025-10-15

Authors:

Binyu Zhao, Wei Zhang, Zhaonian Zou

Annotation:

Multi-modal learning has made significant advances across diverse pattern recognition applications. However, handling missing modalities, especially under imbalanced missing rates, remains a major challenge. This imbalance triggers a vicious cycle: modalities with higher missing rates receive fewer updates, leading to inconsistent learning progress and representational degradation that further diminishes their contribution. Existing methods typically focus on global dataset-level balancing, ofte...

ID: 2510.10534v1 cs.CV, cs.LG, cs.MM

📄 Deep semi-supervised approach based on consistency regularization and similarity learning for weeds classification

2025-10-15

Authors:

Farouq Benchallal, Adel Hafiane, Nicolas Ragot, Raphael Canals

Annotation:

Weed species classification is an important step in developing automated targeting systems that enable precision agriculture practices and reduce the costs and yield losses caused by weeds. Identifying weeds is challenging because they resemble crop plants and because of the variability across weed types and across changing field conditions. Moreover, to fully benefit f...

ID: 2510.10573v1 cs.CV, cs.LG

📄 GraphTARIF: Linear Graph Transformer with Augmented Rank and Improved Focus

2025-10-15

Authors:

Zhaolin Hu, Kun Li, Hehe Fan, Yi Yang

Annotation:

Linear attention mechanisms have emerged as efficient alternatives to full self-attention in Graph Transformers, offering linear time complexity. However, existing linear attention models often suffer from a significant drop in expressiveness due to low-rank projection structures and overly uniform attention distributions. We theoretically prove that these properties reduce the class separability of node representations, limiting the model's classification ability. To address this, we propose a ...

ID: 2510.10631v1 cs.CV, cs.LG

📄 Seeing My Future: Predicting Situated Interaction Behavior in Virtual Reality

2025-10-15

Authors:

Yuan Xu, Zimu Zhang, Xiaoxuan Ma, Wentao Zhu, Yu Qiao, Yizhou Wang

Annotation:

Virtual and augmented reality systems increasingly demand intelligent adaptation to user behaviors for enhanced interaction experiences. Achieving this requires accurately understanding human intentions and predicting future situated behaviors - such as gaze direction and object interactions - which is vital for creating responsive VR/AR environments and applications like personalized assistants. However, accurate behavioral prediction demands modeling the underlying cognitive processes that dri...

ID: 2510.10742v1 cs.CV, cs.LG

📄 Comparative Evaluation of Neural Network Architectures for Generalizable Human Spatial Preference Prediction in Unseen Built Environments

2025-10-15

Authors:

Maral Doctorarastoo, Katherine A. Flanigan, Mario Bergés, Christopher McComb

Annotation:

The capacity to predict human spatial preferences within built environments is instrumental for developing Cyber-Physical-Social Infrastructure Systems (CPSIS). A significant challenge in this domain is the generalizability of preference models, particularly their efficacy in predicting preferences within environmental configurations not encountered during training. While deep learning models have shown promise in learning complex spatial and contextual dependencies, it remains unclear which neu...

ID: 2510.10954v1 cs.CE, cs.CV, cs.LG, cs.MA

📄 Chart-RVR: Reinforcement Learning with Verifiable Rewards for Explainable Chart Reasoning

2025-10-15

Authors:

Sanchit Sinha, Oana Frunza, Kashif Rasul, Yuriy Nevmyvaka, Aidong Zhang

Annotation:

The capabilities of Large Vision-Language Models (LVLMs) have reached state-of-the-art on many visual reasoning tasks, including chart reasoning, yet they still falter on out-of-distribution (OOD) data, and degrade further when asked to produce their chain-of-thought (CoT) rationales, limiting explainability. We present Chart-RVR, a general framework that fine-tunes LVLMs to be more robust and explainable for chart reasoning by coupling Group Relative Policy Optimization (GRPO) with automaticall...

ID: 2510.10973v1 cs.CV, cs.LG

📄 $\Delta\mathrm{Energy}$: Optimizing Energy Change During Vision-Language Alignment Improves both OOD Detection and OOD Generalization

2025-10-15

Authors:

Lin Zhu, Yifeng Yang, Xinbing Wang, Qinying Gu, Nanyang Ye

Annotation:

Recent approaches for vision-language models (VLMs) have shown remarkable success in achieving fast downstream adaptation. When applied to real-world downstream tasks, VLMs inevitably encounter both the in-distribution (ID) data and out-of-distribution (OOD) data. The OOD datasets often include both covariate shifts (e.g., known classes with changes in image styles) and semantic shifts (e.g., test-time unseen classes). This highlights the importance of improving VLMs' generalization ability to c...

ID: 2510.11296v1 cs.CV, cs.LG

📄 MS-Mix: Unveiling the Power of Mixup for Multimodal Sentiment Analysis

2025-10-15

Authors:

Hongyu Zhu, Lin Chen, Mounim A. El-Yacoubi, Mingsheng Shang

Annotation:

Multimodal Sentiment Analysis (MSA) aims to identify and interpret human emotions by integrating information from heterogeneous data sources such as text, video, and audio. While deep learning models have advanced in network architecture design, they remain heavily limited by scarce multimodal annotated data. Although Mixup-based augmentation improves generalization in unimodal tasks, its direct application to MSA introduces critical challenges: random mixing often amplifies label ambiguity and ...

ID: 2510.11579v1 cs.CV, cs.LG

📄 Diffusion Transformers with Representation Autoencoders

2025-10-15

Authors:

Boyang Zheng, Nanye Ma, Shengbang Tong, Saining Xie

Annotation:

Latent generative modeling, where a pretrained autoencoder maps pixels into a latent space for the diffusion process, has become the standard strategy for Diffusion Transformers (DiT); however, the autoencoder component has barely evolved. Most DiTs continue to rely on the original VAE encoder, which introduces several limitations: outdated backbones that compromise architectural simplicity, low-dimensional latent spaces that restrict information capacity, and weak representations that result fr...

ID: 2510.11690v1 cs.CV, cs.LG

Showing 371 - 380 of 863 entries