Provenance Networks: End-to-End Exemplar-Based Explainability
2510.03361v1
cs.CV, cs.AI, cs.LG
2025-10-08
Авторы:
Ali Kayyam, Anusha Madan Gopal, M. Anthony Lewis
Abstract
We introduce provenance networks, a novel class of neural models designed to
provide end-to-end, training-data-driven explainability. Unlike conventional
post-hoc methods, provenance networks learn to link each prediction directly to
its supporting training examples as part of the model's normal operation,
embedding interpretability into the architecture itself. Conceptually, the
model operates similarly to a learned KNN, where each output is justified by
concrete exemplars weighted by relevance in the feature space. This approach
facilitates systematic investigations of the trade-off between memorization and
generalization, enables verification of whether a given input was included in
the training set, aids in the detection of mislabeled or anomalous data points,
enhances resilience to input perturbations, and supports the identification of
similar inputs contributing to the generation of a new data point. By jointly
optimizing the primary task and the explainability objective, provenance
networks offer insights into model behavior that traditional deep networks
cannot provide. While the model introduces additional computational cost and
currently scales to moderately sized datasets, it provides a complementary
approach to existing explainability techniques. In particular, it addresses
critical challenges in modern deep learning, including model opaqueness,
hallucination, and the assignment of credit to data contributors, thereby
improving transparency, robustness, and trustworthiness in neural models.
Ссылки и действия
Дополнительные ресурсы: