SPATA: Systematic Pattern Analysis for Detailed and Transparent Data Cards
2509.26640v1
cs.LG, cs.CR
2025-10-02
Authors:
João Vitorino, Eva Maia, Isabel Praça, Carlos Soares
Abstract
Due to the susceptibility of Artificial Intelligence (AI) to data
perturbations and adversarial examples, it is crucial to perform a thorough
robustness evaluation before any Machine Learning (ML) model is deployed.
However, examining a model's decision boundaries and identifying potential
vulnerabilities typically requires access to the training and testing datasets,
which may pose risks to data privacy and confidentiality. To improve
transparency in organizations that handle confidential data or manage critical
infrastructure, it is essential to allow external verification and validation
of AI without the disclosure of private datasets. This paper presents
Systematic Pattern Analysis (SPATA), a deterministic method that converts any
tabular dataset to a domain-independent representation of its statistical
patterns, to provide more detailed and transparent data cards. SPATA projects
each data instance into a discrete space where instances can be analyzed and
compared without risking data leakage. These projected datasets
can be reliably used for the evaluation of how different features affect ML
model robustness and for the generation of interpretable explanations of their
behavior, contributing to more trustworthy AI.
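
The abstract does not specify how the projection into a discrete space is computed. As a rough illustration of the general idea only, the sketch below maps each feature of a tabular dataset to a quantile-bin index, so raw values are replaced by discrete codes. Per-feature quantile binning, the function name `project_to_discrete`, and the parameter `n_bins` are all assumptions made for this example; they are not taken from the paper and should not be read as the SPATA algorithm.

```python
import numpy as np

def project_to_discrete(X, n_bins=4):
    """Hypothetical stand-in (NOT the SPATA algorithm): replace every
    feature value with the index of the quantile bin it falls into."""
    X = np.asarray(X, dtype=float)
    codes = np.empty(X.shape, dtype=int)
    for j in range(X.shape[1]):
        # Interior quantile edges for feature j (n_bins - 1 cut points).
        edges = np.quantile(X[:, j], np.linspace(0, 1, n_bins + 1)[1:-1])
        # Deterministic: the same input always yields the same bin index.
        codes[:, j] = np.searchsorted(edges, X[:, j], side="right")
    return codes

# Usage: four instances, two features, two bins per feature.
X = [[1, 10], [2, 20], [3, 30], [4, 40]]
print(project_to_discrete(X, n_bins=2))
```

The resulting codes depend only on each value's rank within its feature, which is one simple way to obtain a deterministic representation that is independent of the original domain and units.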