📊 Digest statistics

Total digests: 34022. Added today: 0.

Last updated: today

📄 Privacy Preserving Diffusion Models for Mixed-Type Tabular Data Generation

2025-12-04

Authors:

Timur Sattarov, Marco Schreyer, Damian Borth

Abstract:

We introduce DP-FinDiff, a differentially private diffusion framework for synthesizing mixed-type tabular data. DP-FinDiff employs embedding-based representations for categorical features, reducing encoding overhead and scaling to high-dimensional datasets. To adapt DP-training to the diffusion process, we propose two privacy-aware training strategies: an adaptive timestep sampler that aligns updates with diffusion dynamics, and a feature-aggregated loss that mitigates clipping-induced bias. Tog...

ID: 2512.00638v1 cs.LG

📄 Restricted Block Permutation for Two-Sample Testing

2025-12-04

Authors:

Jungwoo Ho

Abstract:

We study a structured permutation scheme for two-sample testing that restricts permutations to single cross-swaps between block-selected representatives. Our analysis yields three main results. First, we provide an exact validity construction that applies to any fixed restricted permutation set. Second, for both the difference of sample means and the unbiased $\widehat{\mathrm{MMD}}^{2}$ estimator, we derive closed-form one-swap increment identities whose conditional variances scale as $O(h^{2})...

ID: 2512.00668v1 stat.ML, cs.LG

📄 Self-sufficient Independent Component Analysis via KL Minimizing Flows

2025-12-04

Authors:

Song Liu

Abstract:

We study the problem of learning disentangled signals from data using non-linear Independent Component Analysis (ICA). Motivated by advances in self-supervised learning, we propose to learn self-sufficient signals: A recovered signal should be able to reconstruct a missing value of its own from all remaining components without relying on any other signals. We formulate this problem as the minimization of a conditional KL divergence. Compared to traditional maximum likelihood estimation, our algo...

ID: 2512.00665v1 stat.ML, cs.LG

📄 Non-Negative Matrix Factorization Using Non-Von Neumann Computers

2025-12-04

Authors:

Ajinkya Borle, Charles Nicholas, Uchenna Chukwu, Mohammad-Ali Miri, Nicholas Chancellor

Abstract:

Non-negative matrix factorization (NMF) is a matrix decomposition problem with applications in unsupervised learning. The general form of this problem (along with many of its variants) is NP-hard in nature. In our work, we explore how this problem could be solved with an energy-based optimization method suitable for certain machines with non-von Neumann architectures. We used the Dirac-3, a device based on the entropy computing paradigm and made by Quantum Computing Inc., to evaluate our approac...

ID: 2512.00675v1 quant-ph, cs.ET, cs.LG

📄 Using physics-inspired Singular Learning Theory to understand grokking & other phase transitions in modern neural networks

2025-12-04

Authors:

Anish Lakkapragada

Abstract:

Classical statistical inference and learning theory often fail to explain the success of modern neural networks. A key reason is that these models are non-identifiable (singular), violating core assumptions behind PAC bounds and asymptotic normality. Singular learning theory (SLT), a physics-inspired framework grounded in algebraic geometry, has gained popularity for its ability to close this theory-practice gap. In this paper, we empirically study SLT in toy settings relevant to interpretabilit...

ID: 2512.00686v2 cs.LG, stat.ML

📄 Exploiting Function-Family Structure in Analog Circuit Optimization

2025-12-04

Authors:

Zhuohua Liu, Kaiqi Huang, Qinxin Mei, Yuanqi Hu, Wei W. Xing

Abstract:

Analog circuit optimization is typically framed as black-box search over arbitrary smooth functions, yet device physics constrains performance mappings to structured families: exponential device laws, rational transfer functions, and regime-dependent dynamics. Off-the-shelf Gaussian-process surrogates impose globally smooth, stationary priors that are misaligned with these regime-switching primitives and can severely misfit highly nonlinear circuits at realistic sample sizes (50--100 evaluations...

ID: 2512.00712v1 cs.LG

📄 Towards Precision Protein-Ligand Affinity Prediction Benchmark: A Complete and Modification-Aware DAVIS Dataset

2025-12-04

Authors:

Ming-Hsiu Wu, Ziqian Xie, Shuiwang Ji, Degui Zhi

Abstract:

Advancements in AI for science unlock capabilities for critical drug discovery tasks such as protein-ligand binding affinity prediction. However, current models overfit to existing oversimplified datasets that do not represent naturally occurring and biologically relevant proteins with modifications. In this work, we curate a complete and modification-aware version of the widely used DAVIS dataset by incorporating 4,032 kinase-ligand pairs involving substitutions, insertions, deletions, and p...

ID: 2512.00708v1 cs.LG, q-bio.BM

📄 Flow Matching for Tabular Data Synthesis

2025-12-04

Authors:

Bahrul Ilmi Nasution, Floor Eijkelboom, Mark Elliot, Richard Allmendinger, Christian A. Naesseth

Abstract:

Synthetic data generation is an important tool for privacy-preserving data sharing. While diffusion models have set recent benchmarks, flow matching (FM) offers a promising alternative. This paper presents different ways to implement flow matching for tabular data synthesis. We provide a comprehensive empirical study that compares flow matching (FM and variational FM) with state-of-the-art diffusion methods (TabDDPM and TabSyn) in tabular data synthesis. We evaluate both the standard Optimal Tr...

ID: 2512.00698v1 cs.LG, stat.ML

📄 ESMC: MLLM-Based Embedding Selection for Explainable Multiple Clustering

2025-12-04

Authors:

Xinyue Wang, Yuheng Jia, Hui Liu, Junhui Hou

Abstract:

Typical deep clustering methods, while achieving notable progress, can only provide one clustering result per dataset. This limitation arises from their assumption of a fixed underlying data distribution, which may fail to meet user needs and provide unsatisfactory clustering outcomes. Our work investigates how multi-modal large language models (MLLMs) can be leveraged to achieve user-driven clustering, emphasizing their adaptability to user-specified semantic requirements. However, directly usi...

ID: 2512.00725v1 cs.LG

📄 Upcycled and Merged MoE Reward Model for Mitigating Reward Hacking

2025-12-04

Authors:

Lingling Fu

Abstract:

Reward models play a critical role in Reinforcement Learning from Human Feedback (RLHF) by assessing the consistency between generated outputs and human preferences. However, conventional reward models are prone to reward hacking or over-optimization, where the policy exploits shortcut patterns to obtain high reward scores that do not reflect true human preference. Although Mixture-of-Experts (MoE)-based reward models can enhance discriminative capability, they typically introduce substantial co...

ID: 2512.00724v1 cs.LG, cs.IR

Showing 211-220 of 13936 records