Ask What Your Country Can Do For You: Towards a Public Red Teaming Model
2510.20061v1
cs.CY, cs.AI, cs.CR
2025-10-25
Authors:
Wm. Matthew Kennedy, Cigdem Patlak, Jayraj Dave, Blake Chambers, Aayush Dhanotiya, Darshini Ramiah, Reva Schwartz, Jack Hagen, Akash Kundu, Mouni Pendharkar, Liam Baisley, Theodora Skeadas, Rumman Chowdhury
Abstract
AI systems have the potential to produce both benefits and harms, but without
rigorous and ongoing adversarial evaluation, AI actors will struggle to assess
the breadth and magnitude of the AI risk surface. Researchers from the field of
systems design have developed several effective sociotechnical AI evaluation
and red teaming techniques targeting bias, hate speech, mis/disinformation, and
other documented harm classes. However, as increasingly sophisticated AI
systems are released into high-stakes sectors (such as education, healthcare,
and intelligence-gathering), our current evaluation and monitoring methods are
proving less and less capable of delivering effective oversight.
To deliver responsible AI and to ensure that AI's harms are
fully understood and its security vulnerabilities mitigated, new
approaches to closing this "responsibility gap" are now more urgent than ever. In
this paper, we propose one such approach, the cooperative public AI red-teaming
exercise, and discuss early results of its prior pilot implementations. This
approach is intertwined with CAMLIS itself: the first in-person public
demonstrator exercise was held in conjunction with CAMLIS 2024. We review the
operational design and results of this exercise, of the earlier Assessing the
Risks and Impacts of AI (ARIA) pilot exercise run by the National Institute of
Standards and Technology (NIST), and of a similar exercise conducted with the
Singapore Infocomm Media Development Authority (IMDA). Ultimately, we argue
that this approach is both capable of delivering meaningful results and
scalable to many AI-developing jurisdictions.