📄 On-Demand Multi-Task Sparsity for Efficient Large-Model Deployment on Edge Devices

2025-11-26

Authors:

Lianming Huang, Haibo Hu, Qiao Li, Nan Guan, Chun Jason Xue

Abstract:

Sparsity is essential for deploying large models on resource-constrained edge platforms. However, optimizing sparsity patterns for individual tasks in isolation ignores the significant I/O overhead incurred during frequent task switching. We introduce an on-demand multi-task sparsity framework specifically designed to minimize switching costs by maximizing parameter reuse. Unlike monolithic approaches, we decompose weights into reusable block-granular units and align sparse structures across tas...

ID: 2511.19986v1 cs.LG, cs.AI, cs.CV

📄 Zero-Shot Transfer Capabilities of the Sundial Foundation Model for Leaf Area Index Forecasting

2025-11-26

Authors:

Peining Zhang, Hongchen Qin, Haochen Zhang, Ziqi Guo, Guiling Wang, Jinbo Bi

Abstract:

This work investigates the zero-shot forecasting capability of time-series foundation models for Leaf Area Index (LAI) forecasting in agricultural monitoring. Using the HiQ dataset (U.S., 2000-2022), we systematically compare statistical baselines, a fully supervised LSTM, and the Sundial foundation model under multiple evaluation protocols. We find that Sundial, in the zero-shot setting, can outperform a fully trained LSTM provided that the input context window is sufficiently long, specifically...

ID: 2511.20004v1 cs.LG, cs.AI, cs.CV

📄 The Devil in the Details: Emergent Misalignment, Format and Coherence in Open-Weights LLMs

2025-11-26

Authors:

Craig Dickson

Abstract:

Prior work has shown that fine-tuning models on a narrow domain with misaligned data can lead to broad misalignment, a phenomenon termed "emergent misalignment" (Betley et al. 2025). While all tested models were susceptible to emergent misalignment, some models showed more resistance than others. Specifically, the Qwen-2.5 family proved to be relatively resistant, while GPT-4o exhibited the strongest misalignment. In this paper we evaluate whether current-generation open-weights models exhibit simila...

ID: 2511.20104v1 cs.LG, cs.AI, cs.CL

📄 IDAP++: Advancing Divergence-Based Pruning via Filter-Level and Layer-Level Optimization

2025-11-26

Authors:

Aleksei Samarin, Artem Nazarenko, Egor Kotenko, Valentin Malykh, Alexander Savelev, Aleksei Toropov

Abstract:

This paper presents a novel approach to neural network compression that addresses redundancy at both the filter and architectural levels through a unified framework grounded in information flow analysis. Building on the concept of tensor flow divergence, which quantifies how information is transformed across network layers, we develop a two-stage optimization process. The first stage employs iterative divergence-aware pruning to identify and remove redundant filters while preserving critical inf...

ID: 2511.20141v1 cs.LG, cs.AI

📄 On the Limits of Momentum in Decentralized and Federated Optimization

2025-11-26

Authors:

Riccardo Zaccone, Sai Praneeth Karimireddy, Carlo Masone

Abstract:

Recent works have explored the use of momentum in local methods to enhance distributed SGD. This is particularly appealing in Federated Learning (FL), where momentum intuitively appears as a solution to mitigate the effects of statistical heterogeneity. Despite recent progress in this direction, it is still unclear if momentum can guarantee convergence under unbounded heterogeneity in decentralized scenarios, where only some workers participate at each round. In this work we analyze momentum und...

ID: 2511.20168v1 cs.LG, cs.AI

📄 Leveraging weights signals - Predicting and improving generalizability in reinforcement learning

2025-11-26

Authors:

Olivier Moulin, Vincent Francois-lavet, Paul Elbers, Mark Hoogendoorn

Abstract:

Generalizability of Reinforcement Learning (RL) agents (ability to perform on environments different from the ones they have been trained on) is a key problem as agents have the tendency to overfit to their training environments. In order to address this problem and offer a solution to increase the generalizability of RL agents, we introduce a new methodology to predict the generalizability score of RL agents based on the internal weights of the agent's neural networks. Using this prediction cap...

ID: 2511.20234v1 cs.LG, cs.AI

📄 Interpretable Air Pollution Forecasting by Physics-Guided Spatiotemporal Decoupling

2025-11-26

Authors:

Zhiguo Zhang, Xiaoliang Ma, Daniel Schlesinger

Abstract:

Accurate and interpretable air pollution forecasting is crucial for public health, but most models face a trade-off between performance and interpretability. This study proposes a physics-guided, interpretable-by-design spatiotemporal learning framework. The model decomposes the spatiotemporal behavior of air pollutant concentrations into two transparent, additive modules. The first is a physics-guided transport kernel with directed weights conditioned on wind and geography (advection). The seco...

ID: 2511.20257v1 cs.LG, cs.AI

📄 HVAdam: A Full-Dimension Adaptive Optimizer

2025-11-26

Authors:

Yiheng Zhang, Shaowu Wu, Yuanzhuo Xu, Jiajun Wu, Shang Xu, Steve Drew, Xiaoguang Niu

Abstract:

Adaptive optimizers such as Adam have achieved great success in training large-scale models like large language models and diffusion models. However, they often generalize worse than non-adaptive methods, such as SGD, on classical architectures like CNNs. We identify a key cause of this performance gap: adaptivity in pre-conditioners, which limits the optimizer's ability to adapt to diverse optimization landscapes. To address this, we propose Anon (Adaptivity Non-restricted Optimizer with Novel c...

ID: 2511.20277v1 cs.LG, cs.AI

📄 Beyond Components: Singular Vector-Based Interpretability of Transformer Circuits

2025-11-26

Authors:

Areeb Ahmad, Abhinav Joshi, Ashutosh Modi

Abstract:

Transformer-based language models exhibit complex and distributed behavior, yet their internal computations remain poorly understood. Existing mechanistic interpretability methods typically treat attention heads and multilayer perceptron layers (MLPs) (the building blocks of a transformer architecture) as indivisible units, overlooking possibilities of functional substructure learned within them. In this work, we introduce a more fine-grained perspective that decomposes these components into ort...

ID: 2511.20273v1 cs.LG, cs.AI, cs.CL

📄 Geometry of Decision Making in Language Models

2025-11-26

Authors:

Abhinav Joshi, Divyanshu Bhatt, Ashutosh Modi

Abstract:

Large Language Models (LLMs) show strong generalization across diverse tasks, yet the internal decision-making processes behind their predictions remain opaque. In this work, we study the geometry of hidden representations in LLMs through the lens of intrinsic dimension (ID), focusing specifically on decision-making dynamics in a multiple-choice question answering (MCQA) setting. We perform a large-scale study with 28 open-weight transformer models and estimate ID across layers using m...

ID: 2511.20315v1 cs.LG, cs.AI, cs.CL
