SPARTA: Evaluating Reasoning Segmentation Robustness through Black-Box Adversarial Paraphrasing in Text Autoencoder Latent Space
2510.24446v1
cs.CL, cs.CV
2025-10-30
Авторы:
Viktoriia Zinkovich, Anton Antonov, Andrei Spiridonov, Denis Shepelev, Andrey Moskalenko, Daria Pugacheva, Elena Tutubalina, Andrey Kuznetsov, Vlad Shakhuro
Abstract
Multimodal large language models (MLLMs) have shown impressive capabilities
in vision-language tasks such as reasoning segmentation, where models generate
segmentation masks based on textual queries. While prior work has primarily
focused on perturbing image inputs, semantically equivalent textual
paraphrases-crucial in real-world applications where users express the same
intent in varied ways-remain underexplored. To address this gap, we introduce a
novel adversarial paraphrasing task: generating grammatically correct
paraphrases that preserve the original query meaning while degrading
segmentation performance. To evaluate the quality of adversarial paraphrases,
we develop a comprehensive automatic evaluation protocol validated with human
studies. Furthermore, we introduce SPARTA-a black-box, sentence-level
optimization method that operates in the low-dimensional semantic latent space
of a text autoencoder, guided by reinforcement learning. SPARTA achieves
significantly higher success rates, outperforming prior methods by up to 2x on
both the ReasonSeg and LLMSeg-40k datasets. We use SPARTA and competitive
baselines to assess the robustness of advanced reasoning segmentation models.
We reveal that they remain vulnerable to adversarial paraphrasing-even under
strict semantic and grammatical constraints. All code and data will be released
publicly upon acceptance.
Ссылки и действия
Дополнительные ресурсы: