Latent Adversarial Detection: Adaptive Probing of LLM Activations for Multi-Turn Attack Detection
Problem: This paper addresses a gap in detecting multi-turn prompt injection attacks on large language models (LLMs), focusing on covert attacks in which each individual turn appears benign. Existing...