Monte Carlo Expected Threat (MOCET) Scoring

2511.16823v1 cs.LG, cs.AI, cs.HC 2025-11-25

Authors:

Joseph Kim, Saahith Potluri

Abstract

Evaluating and measuring AI Safety Level (ASL) threats are crucial for guiding stakeholders to implement safeguards that keep risks within acceptable limits. ASL-3+ models present a unique risk in their ability to uplift novice non-state actors, especially in the realm of biosecurity. Existing evaluation metrics, such as LAB-Bench, BioLP-bench, and WMDP, can reliably assess model uplift and domain knowledge. However, metrics that better contextualize "real-world risks" are needed to inform the safety case for LLMs, along with scalable, open-ended metrics to keep pace with their rapid advancements. To address both gaps, we introduce MOCET, an interpretable and doubly-scalable metric (automatable and open-ended) that can quantify real-world risks.