📊 Статистика дайджестов

Всего дайджестов: 34123 Добавлено сегодня: 101

Последнее обновление: сегодня

📄 Large Language Models for Software Engineering: A Reproducibility Crisis

2025-12-04

Авторы:

Mohammed Latif Siddiq, Arvin Islam-Gomes, Natalie Sekerak, Joanna C. S. Santos

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Reproducibility is a cornerstone of scientific progress, yet its state in large language model (LLM)-based software engineering (SE) research remains poorly understood. This paper presents the first large-scale, empirical study of reproducibility practices in LLM-for-SE research. We systematically mined and analyzed 640 papers published between 2017 and 2025 across premier software engineering, machine learning, and natural language processing venues, extracting structured metadata from publicat...

ID: 2512.00651v1 cs.SE, cs.LG

arXiv PDF

📄 Privacy Preserving Diffusion Models for Mixed-Type Tabular Data Generation

2025-12-04

Авторы:

Timur Sattarov, Marco Schreyer, Damian Borth

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

We introduce DP-FinDiff, a differentially private diffusion framework for synthesizing mixed-type tabular data. DP-FinDiff employs embedding-based representations for categorical features, reducing encoding overhead and scaling to high-dimensional datasets. To adapt DP-training to the diffusion process, we propose two privacy-aware training strategies: an adaptive timestep sampler that aligns updates with diffusion dynamics, and a feature-aggregated loss that mitigates clipping-induced bias. Tog...

ID: 2512.00638v1 cs.LG

arXiv PDF

📄 Restricted Block Permutation for Two-Sample Testing

2025-12-04

Авторы:

Jungwoo Ho

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

We study a structured permutation scheme for two-sample testing that restricts permutations to single cross-swaps between block-selected representatives. Our analysis yields three main results. First, we provide an exact validity construction that applies to any fixed restricted permutation set. Second, for both the difference of sample means and the unbiased $\widehat{\mathrm{MMD}}^{2}$ estimator, we derive closed-form one-swap increment identities whose conditional variances scale as $O(h^{2})...

ID: 2512.00668v1 stat.ML, cs.LG

arXiv PDF

📄 Self-sufficient Independent Component Analysis via KL Minimizing Flows

2025-12-04

Авторы:

Song Liu

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

We study the problem of learning disentangled signals from data using non-linear Independent Component Analysis (ICA). Motivated by advances in self-supervised learning, we propose to learn self-sufficient signals: A recovered signal should be able to reconstruct a missing value of its own from all remaining components without relying on any other signals. We formulate this problem as the minimization of a conditional KL divergence. Compared to traditional maximum likelihood estimation, our algo...

ID: 2512.00665v1 stat.ML, cs.LG

arXiv PDF

📄 Non-Negative Matrix Factorization Using Non-Von Neumann Computers

2025-12-04

Авторы:

Ajinkya Borle, Charles Nicholas, Uchenna Chukwu, Mohammad-Ali Miri, Nicholas Chancellor

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Non-negative matrix factorization (NMF) is a matrix decomposition problem with applications in unsupervised learning. The general form of this problem (along with many of its variants) is NP-hard in nature. In our work, we explore how this problem could be solved with an energy-based optimization method suitable for certain machines with non-von Neumann architectures. We used the Dirac-3, a device based on the entropy computing paradigm and made by Quantum Computing Inc., to evaluate our approac...

ID: 2512.00675v1 quant-ph, cs.ET, cs.LG

arXiv PDF

📄 Using physics-inspired Singular Learning Theory to understand grokking & other phase transitions in modern neural networks

2025-12-04

Авторы:

Anish Lakkapragada

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Classical statistical inference and learning theory often fail to explain the success of modern neural networks. A key reason is that these models are non-identifiable (singular), violating core assumptions behind PAC bounds and asymptotic normality. Singular learning theory (SLT), a physics-inspired framework grounded in algebraic geometry, has gained popularity for its ability to close this theory-practice gap. In this paper, we empirically study SLT in toy settings relevant to interpretabilit...

ID: 2512.00686v2 cs.LG, stat.ML

arXiv PDF

📄 Exploiting Function-Family Structure in Analog Circuit Optimization

2025-12-04

Авторы:

Zhuohua Liu, Kaiqi Huang, Qinxin Mei, Yuanqi Hu, Wei W. Xing

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Analog circuit optimization is typically framed as black-box search over arbitrary smooth functions, yet device physics constrains performance mappings to structured families: exponential device laws, rational transfer functions, and regime-dependent dynamics. Off-the-shelf Gaussian-process surrogates impose globally smooth, stationary priors that are misaligned with these regime-switching primitives and can severely misfit highly nonlinear circuits at realistic sample sizes (50--100 evaluations...

ID: 2512.00712v1 cs.LG

arXiv PDF

📄 Towards Precision Protein-Ligand Affinity Prediction Benchmark: A Complete and Modification-Aware DAVIS Dataset

2025-12-04

Авторы:

Ming-Hsiu Wu, Ziqian Xie, Shuiwang Ji, Degui Zhi

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Advancements in AI for science unlocks capabilities for critical drug discovery tasks such as protein-ligand binding affinity prediction. However, current models overfit to existing oversimplified datasets that does not represent naturally occurring and biologically relevant proteins with modifications. In this work, we curate a complete and modification-aware version of the widely used DAVIS dataset by incorporating 4,032 kinase-ligand pairs involving substitutions, insertions, deletions, and p...

ID: 2512.00708v1 cs.LG, q-bio.BM

arXiv PDF

📄 Flow Matching for Tabular Data Synthesis

2025-12-04

Авторы:

Bahrul Ilmi Nasution, Floor Eijkelboom, Mark Elliot, Richard Allmendinger, Christian A. Naesseth

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Synthetic data generation is an important tool for privacy-preserving data sharing. While diffusion models have set recent benchmarks, flow matching (FM) offers a promising alternative. This paper presents different ways to implement flow matching for tabular data synthesis. We provide a comprehensive empirical study that compares flow matching (FM and variational FM) with a state-of-the-art diffusion method (TabDDPM and TabSyn) in tabular data synthesis. We evaluate both the standard Optimal Tr...

ID: 2512.00698v1 cs.LG, stat.ML

arXiv PDF

📄 ESMC: MLLM-Based Embedding Selection for Explainable Multiple Clustering

2025-12-04

Авторы:

Xinyue Wang, Yuheng Jia, Hui Liu, Junhui Hou

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Typical deep clustering methods, while achieving notable progress, can only provide one clustering result per dataset. This limitation arises from their assumption of a fixed underlying data distribution, which may fail to meet user needs and provide unsatisfactory clustering outcomes. Our work investigates how multi-modal large language models (MLLMs) can be leveraged to achieve user-driven clustering, emphasizing their adaptability to user-specified semantic requirements. However, directly usi...

ID: 2512.00725v1 cs.LG

arXiv PDF

1
2
25
26
27
28
29
1398
1399

Показано 261 - 270 из 13987 записей