Universal and Transferable Attacks on Pathology Foundation Models
2510.16660v1
cs.CV, cs.LG, physics.med-ph
2025-10-22
Авторы:
Yuntian Wang, Xilin Yang, Che-Yung Shen, Nir Pillar, Aydogan Ozcan
Abstract
We introduce Universal and Transferable Adversarial Perturbations (UTAP) for
pathology foundation models that reveal critical vulnerabilities in their
capabilities. Optimized using deep learning, UTAP comprises a fixed and weak
noise pattern that, when added to a pathology image, systematically disrupts
the feature representation capabilities of multiple pathology foundation
models. Therefore, UTAP induces performance drops in downstream tasks that
utilize foundation models, including misclassification across a wide range of
unseen data distributions. In addition to compromising the model performance,
we demonstrate two key features of UTAP: (1) universality: its perturbation can
be applied across diverse field-of-views independent of the dataset that UTAP
was developed on, and (2) transferability: its perturbation can successfully
degrade the performance of various external, black-box pathology foundation
models - never seen before. These two features indicate that UTAP is not a
dedicated attack associated with a specific foundation model or image dataset,
but rather constitutes a broad threat to various emerging pathology foundation
models and their applications. We systematically evaluated UTAP across various
state-of-the-art pathology foundation models on multiple datasets, causing a
significant drop in their performance with visually imperceptible modifications
to the input images using a fixed noise pattern. The development of these
potent attacks establishes a critical, high-standard benchmark for model
robustness evaluation, highlighting a need for advancing defense mechanisms and
potentially providing the necessary assets for adversarial training to ensure
the safe and reliable deployment of AI in pathology.