Testing Most Influential Sets

2510.20372v2 stat.ML, cs.LG, econ.EM, math.ST, stat.ME, stat.TH 2025-10-27

Authors:

Lucas Darius Konrad, Nikolas Kuschnig

Abstract

Small subsets of data with disproportionate influence on model outcomes can have dramatic impacts on conclusions, with a few data points sometimes overturning key findings. While recent work has developed methods to identify these most influential sets, no formal theory exists to determine when their influence reflects genuine problems rather than natural sampling variation. We address this gap by developing a principled framework for assessing the statistical significance of most influential sets. Our theoretical results characterize the extreme value distributions of maximal influence and enable rigorous hypothesis tests for excessive influence, replacing current ad-hoc sensitivity checks. We demonstrate the practical value of our approach through applications across economics, biology, and machine learning benchmarks.

Related articles

Riesz Regression As Direct Density Ratio Estimation

Bridging the Gap between Empirical Welfare Maximization and Conditional Average ...

A Unified Theory for Causal Inference: Direct Debiased Machine Learning via Breg...
