An Analysis of Causal Effect Estimation using Outcome Invariant Data Augmentation
2510.25128v1
cs.LG, stat.ML
2025-10-31
Авторы:
Uzair Akbar, Niki Kilbertus, Hao Shen, Krikamol Muandet, Bo Dai
Abstract
The technique of data augmentation (DA) is often used in machine learning for
regularization purposes to better generalize under i.i.d. settings. In this
work, we present a unifying framework with topics in causal inference to make a
case for the use of DA beyond just the i.i.d. setting, but for generalization
across interventions as well. Specifically, we argue that when the outcome
generating mechanism is invariant to our choice of DA, then such augmentations
can effectively be thought of as interventions on the treatment generating
mechanism itself. This can potentially help to reduce bias in causal effect
estimation arising from hidden confounders. In the presence of such unobserved
confounding we typically make use of instrumental variables (IVs) -- sources of
treatment randomization that are conditionally independent of the outcome.
However, IVs may not be as readily available as DA for many applications, which
is the main motivation behind this work. By appropriately regularizing IV based
estimators, we introduce the concept of IV-like (IVL) regression for mitigating
confounding bias and improving predictive performance across interventions even
when certain IV properties are relaxed. Finally, we cast parameterized DA as an
IVL regression problem and show that when used in composition can simulate a
worst-case application of such DA, further improving performance on causal
estimation and generalization tasks beyond what simple DA may offer. This is
shown both theoretically for the population case and via simulation experiments
for the finite sample case using a simple linear example. We also present real
data experiments to support our case.
Ссылки и действия
Дополнительные ресурсы: