On The Dangers of Poisoned LLMs In Security Automation
2511.02600v1
cs.CR, cs.AI
2025-11-06
Authors:
Patrick Karlsen, Even Eilertsen
Abstract
This paper investigates some of the risks introduced by "LLM poisoning," the
intentional or unintentional introduction of malicious or biased data during
model training. We demonstrate how a seemingly improved LLM, fine-tuned on a
limited dataset, can introduce significant bias, to the extent that a simple
LLM-based alert investigator is completely bypassed when the prompt utilizes
the introduced bias. Using fine-tuned Llama3.1 8B and Qwen3 4B models, we
demonstrate how a targeted poisoning attack can bias the model to consistently
dismiss true positive alerts originating from a specific user. Additionally, we
propose mitigations and best practices to increase the trustworthiness and
robustness of LLMs applied in security applications and to reduce their risk.