Partial Identification Approach to Counterfactual Fairness Assessment
2510.00163v1
cs.LG, cs.AI, cs.CY, stat.ME
2025-10-05
Authors:
Saeyoung Rho, Junzhe Zhang, Elias Bareinboim
Abstract
The wide adoption of AI decision-making systems in critical domains such as
criminal justice, loan approval, and hiring processes has heightened concerns
about algorithmic fairness. Because we often have access only to the outputs of
algorithms, without insight into their internal mechanisms, it is natural to
examine how decisions would change if auxiliary sensitive attributes (such as
race) were different. This has led the research community to propose counterfactual
fairness measures, but evaluating these measures from available data remains
a challenging task. In many practical applications, the target counterfactual
measure is not identifiable, i.e., it cannot be uniquely determined from the
combination of quantitative data and qualitative knowledge. This paper
addresses this challenge using partial identification, which derives
informative bounds over counterfactual fairness measures from observational
data. We introduce a Bayesian approach to bound unknown counterfactual fairness
measures with high confidence. We demonstrate our algorithm on the COMPAS
dataset, examining fairness in recidivism risk scores with respect to race,
age, and sex. Our results reveal a positive (spurious) effect on the COMPAS
score when changing race to African-American (from all others) and a negative
(direct causal) effect when transitioning from young to old age.
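
To make the idea of partial identification concrete, the sketch below computes the classical "natural" (no-assumption) bounds on an interventional probability P(Y_x = 1) for binary X and Y from an observational joint distribution. This is a textbook illustration only, not the Bayesian procedure introduced in the paper; the function name `natural_bounds` and the example distribution are hypothetical and are not taken from the paper or from the COMPAS data.

```python
def natural_bounds(p_joint, x, y=1):
    """No-assumption bounds on P(Y_x = y) for binary X and Y.

    p_joint maps (x, y) to the observational probability P(X=x, Y=y).
    Units observed with X=x reveal Y_x directly; for all other units
    Y_x is unobserved and may be anything, which widens the interval.
    """
    p_xy = p_joint[(x, y)]                   # P(X=x, Y=y)
    p_x = p_joint[(x, 0)] + p_joint[(x, 1)]  # P(X=x)
    lower = p_xy                             # unobserved units all have Y_x != y
    upper = p_xy + (1.0 - p_x)               # unobserved units all have Y_x == y
    return lower, upper


# Hypothetical observational distribution over (X, Y); values are made up.
p = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}

lo1, hi1 = natural_bounds(p, x=1)
lo0, hi0 = natural_bounds(p, x=0)

# Bounds on the contrast P(Y_1 = 1) - P(Y_0 = 1): combine opposite extremes.
effect_lower = lo1 - hi0
effect_upper = hi1 - lo0
print(f"P(Y_1=1) in [{lo1:.2f}, {hi1:.2f}]")
print(f"P(Y_0=1) in [{lo0:.2f}, {hi0:.2f}]")
print(f"effect in [{effect_lower:.2f}, {effect_upper:.2f}]")
```

Without further assumptions the interval for the contrast always has width 1, so it cannot determine the sign of the effect. Informative bounds of the kind described in the abstract require additional structure, such as qualitative causal knowledge.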