PrivaDE: Privacy-preserving Data Evaluation for Blockchain-based Data Marketplaces
2510.18109v1
cs.CR, cs.LG
2025-10-23
Авторы:
Wan Ki Wong, Sahel Torkamani, Michele Ciampi, Rik Sarkar
Abstract
Evaluating the relevance of data is a critical task for model builders
seeking to acquire datasets that enhance model performance. Ideally, such
evaluation should allow the model builder to assess the utility of candidate
data without exposing proprietary details of the model. At the same time, data
providers must be assured that no information about their data - beyond the
computed utility score - is disclosed to the model builder.
In this paper, we present PrivaDE, a cryptographic protocol for
privacy-preserving utility scoring and selection of data for machine learning.
While prior works have proposed data evaluation protocols, our approach
advances the state of the art through a practical, blockchain-centric design.
Leveraging the trustless nature of blockchains, PrivaDE enforces
malicious-security guarantees and ensures strong privacy protection for both
models and datasets. To achieve efficiency, we integrate several techniques -
including model distillation, model splitting, and cut-and-choose
zero-knowledge proofs - bringing the runtime to a practical level. Furthermore,
we propose a unified utility scoring function that combines empirical loss,
predictive entropy, and feature-space diversity, and that can be seamlessly
integrated into active-learning workflows. Evaluation shows that PrivaDE
performs data evaluation effectively, achieving online runtimes within 15
minutes even for models with millions of parameters.
Our work lays the foundation for fair and automated data marketplaces in
decentralized machine learning ecosystems.
Ссылки и действия
Дополнительные ресурсы: