Scalable Oversight via Partitioned Human Supervision
2510.22500v1
cs.LG, cs.AI, cs.CL
2025-10-29
Авторы:
Ren Yin, Takashi Ishida, Masashi Sugiyama
Abstract
As artificial intelligence (AI) systems approach and surpass expert human
performance across a broad range of tasks, obtaining high-quality human
supervision for evaluation and training becomes increasingly challenging. Our
focus is on tasks that require deep knowledge and skills of multiple domains.
Unfortunately, even the best human experts are knowledgeable only in a single
narrow area, and will not be able to evaluate the correctness of advanced AI
systems on such superhuman tasks. However, based on their narrow expertise,
humans may provide a weak signal, i.e., a complementary label indicating an
option that is incorrect. For example, a cardiologist could state that "this is
not related to cardiology,'' even if they cannot identify the true disease.
Based on this weak signal, we propose a scalable oversight framework that
enables us to evaluate frontier AI systems without the need to prepare the
ground truth. We derive an unbiased estimator of top-1 accuracy from
complementary labels and quantify how many complementary labels are needed to
match the variance of ordinary labels. We further introduce two estimators to
combine scarce ordinary labels with abundant complementary labels. We provide
finite-sample deviation guarantees for both complementary-only and the mixed
estimators. Empirically, we show that we can evaluate the output of large
language models without the ground truth, if we have complementary labels. We
further show that we can train an AI system with such weak signals: we show how
we can design an agentic AI system automatically that can perform better with
this partitioned human supervision. Our code is available at
https://github.com/R-Yin-217/Scalable-Oversight-via-Human-Partitioned-Supervision.
Ссылки и действия
Дополнительные ресурсы: