Data Reliability Scoring

2510.17085v1 cs.LG, cs.GT, stat.ML 2025-10-22

Authors:

Yiling Chen, Shi Feng, Paul Kattuman, Fang-Yi Yu

Abstract

How can we assess the reliability of a dataset without access to ground truth? We introduce the problem of reliability scoring for datasets collected from potentially strategic sources. The true data are unobserved, but we see outcomes of an unknown statistical experiment that depends on them. To benchmark reliability, we define ground-truth-based orderings that capture how much reported data deviate from the truth. We then propose the Gram determinant score, which measures the volume spanned by vectors describing the empirical distribution of the observed data and experiment outcomes. We show that this score preserves several ground-truth-based reliability orderings and, uniquely up to scaling, yields the same reliability ranking of datasets regardless of the experiment, a property we term experiment agnosticism. Experiments on synthetic noise models, CIFAR-10 embeddings, and real employment data demonstrate that the Gram determinant score effectively captures data quality across diverse observation processes.
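
The abstract does not spell out the formula, so the following is only a minimal sketch of one plausible reading of "the volume spanned by vectors describing the empirical distribution": build the empirical joint distribution matrix of (reported value, experiment outcome) and take the determinant of the Gram matrix of its rows. The function names `empirical_joint` and `gram_determinant_score`, the categorical encoding, and the synthetic noise model are all assumptions made for illustration, not the paper's definitions.

```python
import numpy as np

def empirical_joint(reports, outcomes, n_reports, n_outcomes):
    """Empirical joint distribution of (report, outcome) pairs.

    Hypothetical helper: rows index reported categories, columns index
    experiment outcomes; entries are relative frequencies.
    """
    counts = np.zeros((n_reports, n_outcomes))
    np.add.at(counts, (reports, outcomes), 1.0)  # accumulate pair counts
    return counts / counts.sum()

def gram_determinant_score(joint):
    """Determinant of the Gram matrix of the rows of `joint`.

    Assumed form of the score: det(J J^T), the squared volume of the
    parallelepiped spanned by the rows of the joint distribution matrix.
    """
    gram = joint @ joint.T
    return float(np.linalg.det(gram))

# Synthetic illustration (assumed noise model, not taken from the paper).
rng = np.random.default_rng(0)
n, k = 20000, 3
truth = rng.integers(0, k, size=n)

# Experiment outcome: equals the truth with prob. 0.7, else uniform noise.
outcomes = np.where(rng.random(n) < 0.7, truth, rng.integers(0, k, size=n))

# Clean reports equal the truth; noisy reports are resampled with prob. 0.4.
clean_reports = truth
noisy_reports = np.where(rng.random(n) < 0.4, rng.integers(0, k, size=n), truth)

clean_score = gram_determinant_score(empirical_joint(clean_reports, outcomes, k, k))
noisy_score = gram_determinant_score(empirical_joint(noisy_reports, outcomes, k, k))
print(f"clean: {clean_score:.3e}, noisy: {noisy_score:.3e}")
```

Under this reading, garbling the reports through a stochastic matrix N multiplies the score by det(N)^2, which is at most 1 in the square case, so noisier reports cannot score higher. That is consistent with the ordering-preservation claim in the abstract, though the paper's exact statement may differ.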
