PromptLocate: Localizing Prompt Injection Attacks

2510.12252v1 cs.CR, cs.AI 2025-10-16

Авторы:

Yuqi Jia, Yupei Liu, Zedian Shao, Jinyuan Jia, Neil Gong

Abstract

Prompt injection attacks deceive a large language model into completing an attacker-specified task instead of its intended task by contaminating its input data with an injected prompt, which consists of injected instruction(s) and data. Localizing the injected prompt within contaminated data is crucial for post-attack forensic analysis and data recovery. Despite its growing importance, prompt injection localization remains largely unexplored. In this work, we bridge this gap by proposing PromptLocate, the first method for localizing injected prompts. PromptLocate comprises three steps: (1) splitting the contaminated data into semantically coherent segments, (2) identifying segments contaminated by injected instructions, and (3) pinpointing segments contaminated by injected data. We show PromptLocate accurately localizes injected prompts across eight existing and eight adaptive attacks.

Ссылки и действия

Читать на arXiv Скачать PDF

Дополнительные ресурсы:

PromptLocate: Localizing Prompt Injection Attacks

Авторы:

Abstract

Ссылки и действия

Связанные статьи

A Light-Weight Large Language Model File Format for Highly-Secure Model Distribu...

SoK: a Comprehensive Causality Analysis Framework for Large Language Model Secur...

Hey GPT-OSS, Looks Like You Got It - Now Walk Me Through It! An Assessment of th...

Context-Aware Hierarchical Learning: A Two-Step Paradigm towards Safer LLMs

Large Language Model based Smart Contract Auditing with LLMBugScanner

Навигация