📊 Digest statistics

Total digests: 34022 Added today: 82

Last updated: today

📄 The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain

2025-10-02

Authors:

Adrian Kosowski, Przemysław Uznański, Jan Chorowski, Zuzanna Stamirowska, Michał Bartoszkiewicz

Annotation:

The relationship between computing systems and the brain has served as motivation for pioneering theoreticians since John von Neumann and Alan Turing. Uniform, scale-free biological networks, such as the brain, have powerful properties, including generalizing over time, which is the main barrier for Machine Learning on the path to Universal Reasoning Models. We introduce 'Dragon Hatchling' (BDH), a new Large Language Model architecture based on a scale-free biologically inspired network of \$n...

ID: 2509.26507v1 cs.NE, cs.AI, cs.LG, stat.ML

📄 FraudTransformer: Time-Aware GPT for Transaction Fraud Detection

2025-10-01

Authors:

Gholamali Aminian, Andrew Elliott, Tiger Li, Timothy Cheuk Hin Wong, Victor Claude Dehon, Lukasz Szpruch, Carsten Maple, Christopher Read, Martin Brown, Gesine Reinert, Mo Mamouei

Annotation:

Detecting payment fraud in real-world banking streams requires models that can exploit both the order of events and the irregular time gaps between them. We introduce FraudTransformer, a sequence model that augments a vanilla GPT-style architecture with (i) a dedicated time encoder that embeds either absolute timestamps or inter-event values, and (ii) a learned positional encoder that preserves relative order. Experiments on a large industrial dataset -- tens of millions of transactions and auxi...

ID: 2509.23712v1 cs.LG, stat.ML

📄 Differentiable Sparsity via $D$-Gating: Simple and Versatile Structured Penalization

2025-10-01

Authors:

Chris Kolb, Laetitia Frost, Bernd Bischl, David Rügamer

Annotation:

Structured sparsity regularization offers a principled way to compact neural networks, but its non-differentiability breaks compatibility with conventional stochastic gradient descent and requires either specialized optimizers or additional post-hoc pruning without formal guarantees. In this work, we propose $D$-Gating, a fully differentiable structured overparameterization that splits each group of weights into a primary weight vector and multiple scalar gating factors. We prove that any local ...

ID: 2509.23898v2 cs.LG, stat.ML

📄 Does Weak-to-strong Generalization Happen under Spurious Correlations?

2025-10-01

Authors:

Chenruo Liu, Yijun Dong, Qi Lei

Annotation:

We initiate a unified theoretical and algorithmic study of a key problem in weak-to-strong (W2S) generalization: when fine-tuning a strong pre-trained student with pseudolabels from a weaker teacher on a downstream task with spurious correlations, does W2S happen, and how to improve it upon failures? We consider two sources of spurious correlations caused by group imbalance: (i) a weak teacher fine-tuned on group-imbalanced labeled data with a minority group of fraction $\eta_\ell$, and (ii) a g...

ID: 2509.24005v1 cs.LG, stat.ML

📄 On The Variability of Concept Activation Vectors

2025-10-01

Authors:

Julia Wenkmann, Damien Garreau

Annotation:

One of the most pressing challenges in artificial intelligence is to make models more transparent to their users. Recently, explainable artificial intelligence has come up with numerous methods to tackle this challenge. A promising avenue is to use concept-based explanations, that is, high-level concepts instead of plain feature importance scores. Among this class of methods, Concept Activation Vectors (CAVs) (Kim et al., 2018) stand out as one of the main protagonists. One interesting aspect of ...

ID: 2509.24058v1 cs.LG, stat.ML

📄 Demographic-Agnostic Fairness without Harm

2025-10-01

Authors:

Zhongteng Cai, Mohammad Mahdi Khalili, Xueru Zhang

Annotation:

As machine learning (ML) algorithms are increasingly used in social domains to make predictions about humans, there is a growing concern that these algorithms may exhibit biases against certain social groups. Numerous notions of fairness have been proposed in the literature to measure the unfairness of ML. Among them, one class that receives the most attention is parity-based, i.e., achieving fairness by equalizing treatment or outcomes for different social groups. However, achieving pa...

ID: 2509.24077v1 cs.LG, stat.ML

📄 A Family of Kernelized Matrix Costs for Multiple-Output Mixture Neural Networks

2025-10-01

Authors:

Bo Hu, José C. Príncipe

Annotation:

Pairwise distance-based costs are crucial for self-supervised and contrastive feature learning. Mixture Density Networks (MDNs) are a widely used approach for generative models and density approximation, using neural networks to produce multiple centers that define a Gaussian mixture. By combining MDNs with contrastive costs, this paper proposes data density approximation using four types of kernelized matrix costs: the scalar cost, the vector-matrix cost, the matrix-matrix cost (the trace of Sc...

ID: 2509.24076v2 cs.LG, stat.ML

📄 A signal separation view of classification

2025-10-01

Authors:

H. N. Mhaskar, Ryan O'Dowd

Annotation:

The problem of classification in machine learning has often been approached in terms of function approximation. In this paper, we propose an alternative approach for classification in arbitrary compact metric spaces which, in theory, yields both the number of classes, and a perfect classification using a minimal number of queried labels. Our approach uses localized trigonometric polynomial kernels initially developed for the point source signal separation problem in signal processing. Rather tha...

ID: 2509.24140v1 cs.LG, stat.ML

📄 AuON: A Linear-time Alternative to Semi-Orthogonal Momentum Updates

2025-10-01

Authors:

Dipan Maity

Annotation:

Orthogonal gradient updates have emerged as a promising direction in optimization for machine learning. However, traditional approaches such as SVD/QR decomposition incur prohibitive computational costs of O(n^3) and underperform compared to well-tuned SGD with momentum, since momentum is applied only after strict orthogonalization. Recent advances, such as Muon, improve efficiency by applying momentum before orthogonalization and producing semi-orthogonal matrices via Newton-Schulz iterations, ...

ID: 2509.24320v2 cs.LG, stat.ML

📄 Interpretable Kernel Representation Learning at Scale: A Unified Framework Utilizing Nyström Approximation

2025-10-01

Authors:

Maedeh Zarvandi, Michael Timothy, Theresa Wasserer, Debarghya Ghoshdastidar

Annotation:

Kernel methods provide a theoretically grounded framework for non-linear and non-parametric learning, with strong analytic foundations and statistical guarantees. Yet, their scalability has long been limited by prohibitive time and memory costs. While progress has been made in scaling kernel regression, no framework exists for scalable kernel-based representation learning, restricting their use in the era of foundation models where representations are learned from massive unlabeled data. We intr...

ID: 2509.24467v2 cs.LG, stat.ML
