Best Practices for Biorisk Evaluations on Open-Weight Bio-Foundation Models
2510.27629v1
cs.CR, cs.AI
2025-11-04
Авторы:
Boyi Wei, Zora Che, Nathaniel Li, Udari Madhushani Sehwag, Jasper Götting, Samira Nedungadi, Julian Michael, Summer Yue, Dan Hendrycks, Peter Henderson, Zifan Wang, Seth Donoughe, Mantas Mazeika
Abstract
Open-weight bio-foundation models present a dual-use dilemma. While holding
great promise for accelerating scientific research and drug development, they
could also enable bad actors to develop more deadly bioweapons. To mitigate the
risk posed by these models, current approaches focus on filtering biohazardous
data during pre-training. However, the effectiveness of such an approach
remains unclear, particularly against determined actors who might fine-tune
these models for malicious use. To address this gap, we propose \eval, a
framework to evaluate the robustness of procedures that are intended to reduce
the dual-use capabilities of bio-foundation models. \eval assesses models'
virus understanding through three lenses, including sequence modeling,
mutational effects prediction, and virulence prediction. Our results show that
current filtering practices may not be particularly effective: Excluded
knowledge can be rapidly recovered in some cases via fine-tuning, and exhibits
broader generalizability in sequence modeling. Furthermore, dual-use signals
may already reside in the pretrained representations, and can be elicited via
simple linear probing. These findings highlight the challenges of data
filtering as a standalone procedure, underscoring the need for further research
into robust safety and security strategies for open-weight bio-foundation
models.
Ссылки и действия
Дополнительные ресурсы: