From Insight to Exploit: Leveraging LLM Collaboration for Adaptive Adversarial Text Generation
2511.03128v1
cs.LG, cs.CL
2025-11-07
Авторы:
Najrin Sultana, Md Rafi Ur Rashid, Kang Gu, Shagufta Mehnaz
Abstract
LLMs can provide substantial zero-shot performance on diverse tasks using a
simple task prompt, eliminating the need for training or fine-tuning. However,
when applying these models to sensitive tasks, it is crucial to thoroughly
assess their robustness against adversarial inputs. In this work, we introduce
Static Deceptor (StaDec) and Dynamic Deceptor (DyDec), two innovative attack
frameworks designed to systematically generate dynamic and adaptive adversarial
examples by leveraging the understanding of the LLMs. We produce subtle and
natural-looking adversarial inputs that preserve semantic similarity to the
original text while effectively deceiving the target LLM. By utilizing an
automated, LLM-driven pipeline, we eliminate the dependence on external
heuristics. Our attacks evolve with the advancements in LLMs and demonstrate
strong transferability across models unknown to the attacker. Overall, this
work provides a systematic approach for the self-assessment of an LLM's
robustness. We release our code and data at
https://github.com/Shukti042/AdversarialExample.
Ссылки и действия
Дополнительные ресурсы: