Major alignment safety

Widening the Gap: Exploiting LLM Quantization via Outlier Injection

Xiaohua Zhan, Kazuki Egashira, Robin Staab, Mark Vero, Martin Vechev

Published: May 14, 2026 — 17:50 UTC

Problem
This paper addresses a gap in the security analysis of large language model (LLM) quantization: whether advanced quantization techniques are vulnerable to adversarial attacks. Prior research has concentrated on simpler quantization methods, leaving open how more sophisticated schemes, such as Activation-aware Weight Quantization (AWQ), GPTQ, and GGUF I-quants, can be compromised. The authors present a quantization-conditioned attack that consistently induces malicious behavior across these advanced methods, which suggests that quantization security risks are broader than previously understood. This work is a preprint and has not yet undergone peer review.

Method
The core technical contribution is an outlier injection attack that exploits properties of modern quantization techniques. The attack injects large outliers into specific weight blocks of a model, which can lead to a targeted weight collapse: other weights are rounded to zero during quantization. The method is designed to work across various quantization schemes, which lets it get past the limitation of previous attacks that were restricted to simpler methods. The authors evaluate the attack across three distinct attack scenarios and multiple LLM architectures, and report that it is effective against a range of quantization techniques.
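The weight-collapse effect described above can be illustrated with a toy example. The sketch below is not the authors' attack and does not implement AWQ, GPTQ, or GGUF I-quants. It assumes a plain symmetric absmax round-to-nearest quantizer applied to a single block, and the function names, the block values, and the outlier magnitude are all hypothetical choices made for illustration.

```python
def quantize_block(weights, bits=4):
    """Symmetric absmax round-to-nearest quantization of one weight block.

    Assumption: one shared scale per block, set by the largest magnitude.
    Returns the dequantized values so they can be compared with the input.
    """
    qmax = 2 ** (bits - 1) - 1                     # 7 for 4-bit signed levels
    scale = max(abs(w) for w in weights) / qmax    # shared step size
    if scale == 0.0:
        return list(weights)                       # all-zero block, nothing to do
    return [round(w / scale) * scale for w in weights]


def inject_outlier(weights, index, magnitude):
    """Return a copy of the block with one weight replaced by a large value."""
    attacked = list(weights)
    attacked[index] = magnitude
    return attacked


# Hypothetical block of small weights (illustrative values only).
block = [0.02, -0.03, 0.01, 0.04, -0.02, 0.03, -0.01, 0.02]

clean_q = quantize_block(block)

# A single large outlier inflates the shared scale for the whole block.
attacked = inject_outlier(block, index=0, magnitude=8.0)
attacked_q = quantize_block(attacked)

print("clean, quantized:   ", clean_q)
print("attacked, quantized:", attacked_q)
print("zeroed weights:     ", sum(1 for q in attacked_q[1:] if q == 0))
```

In this toy block, every weight survives quantization before the injection, while after it the seven remaining weights all round to zero because the step size is now set by the outlier. Production schemes choose scales and groupings differently, so the sketch shows only the general rounding mechanism, not how the paper targets any specific method.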

Results
The authors report high success rates for their attack across multiple advanced quantization methods, with significant performance degradation in the targeted models. For instance, the attack induced malicious behavior in models quantized with AWQ, GPTQ, and GGUF I-quants, where prior attacks had failed. The abstract does not disclose specific numerical results, but the authors emphasize that their approach reveals a broader vulnerability landscape than previously recognized: the security risks of quantization are not confined to simpler schemes.

Limitations
The authors acknowledge that their attack relies on outliers, an approach that may not apply to all model architectures or datasets. The paper also does not explore potential defenses against such attacks, so how to mitigate these vulnerabilities remains open. Another limitation is the focus on specific quantization methods, which may not cover all existing techniques in the field. The generalizability of the attack to other model types or tasks remains to be fully validated.

Why it matters
This work has implications for deploying LLMs in security-sensitive applications. By demonstrating that advanced quantization methods are susceptible to adversarial manipulation, the authors show that a quantized model's behavior cannot be assumed to match that of the full-precision model it was derived from. This research could prompt further investigation into robust quantization techniques and the development of countermeasures. The findings underscore the need to understand the security of model quantization, which becomes more relevant as LLMs are more widely adopted.

Source: arXiv:2605.15152
URL: https://arxiv.org/abs/2605.15152v1

By Callan Zhang · May 14, 2026

Summarised from the primary source with AI assistance under human editorial oversight. Turing Wire is not a primary source — read the original for the authoritative account.

Source: arXiv cs.AI