Testing Most Influential Sets
2510.20372v2
stat.ML, cs.LG, econ.EM, math.ST, stat.ME, stat.TH
2025-10-27
Authors:
Lucas Darius Konrad, Nikolas Kuschnig
Abstract
Small subsets of data with disproportionate influence on model outcomes can
have dramatic impacts on conclusions, with a few data points sometimes
overturning key findings. While recent work has developed methods to identify
these most influential sets, no formal theory exists to determine when their
influence reflects genuine problems rather than natural sampling variation. We
address this gap by developing a principled framework for assessing the
statistical significance of most influential sets. Our theoretical results
characterize the extreme value distributions of maximal influence and enable
rigorous hypothesis tests for excessive influence, replacing current ad hoc
sensitivity checks. We demonstrate the practical value of our approach through
applications across economics, biology, and machine learning benchmarks.