Notable alignment safety

S-GBT: Smooth Growth Bound Tensor for Certified Robustness Against Word Substitution Attacks in NLP

Mohammed Bouri, Mohammed Erradi, Adnane Saoud

Published
Jun 11, 2026 — 15:01 UTC

Problem — Despite advancements in NLP, models are still susceptible to word substitution attacks, which exploit vulnerabilities in input perturbations. Existing defenses primarily focus on first-order sensitivity, neglecting the evolution of this sensitivity characterized by curvature. This paper addresses this gap by introducing a second-order method, S-GBT, to provide a more comprehensive robustness framework. The work is presented as a preprint and has not undergone peer review.

Method — The core contribution is the Smooth Growth Bound Tensor (S-GBT), which bounds the Hessian matrix element-wise. The authors derive S-GBT for two architectures: Long Short-Term Memory (LSTM) networks and Convolutional Neural Networks (CNNs). The method integrates a regularization term into the training objective to minimize the bounds on the output changes due to word substitutions. The output change is bounded by both linear and quadratic terms, allowing for tighter certified robustness guarantees. The training process involves adjusting the model parameters to optimize these bounds, leveraging second-order information to enhance robustness.

Results — The effectiveness of S-GBT is evaluated across multiple benchmark datasets. The results indicate that the proposed method achieves an improvement in certified robust accuracy by up to 23.4% compared to existing first-order methods. Notably, the clean accuracy of the models remains competitive, suggesting that the introduction of second-order regularization does not compromise performance on unperturbed data. Specific baseline comparisons are not detailed in the abstract, but the significant improvement in robustness metrics is emphasized.

Limitations — The authors acknowledge that while S-GBT provides tighter bounds, the computational overhead associated with second-order methods may limit scalability in large-scale applications. Additionally, the paper does not explore the generalization of S-GBT beyond the tested architectures (LSTM and CNN), which may restrict its applicability to other model types. The reliance on specific datasets for evaluation may also raise questions about the robustness of the findings across diverse NLP tasks.

Why it matters — The introduction of S-GBT represents a significant advancement in the pursuit of certified robustness in NLP models, particularly against adversarial word substitution attacks. By incorporating second-order information, the method opens new avenues for enhancing model resilience, which is critical for deploying NLP systems in security-sensitive applications. The findings suggest that controlling both gradient and curvature can lead to more robust architectures, a direction that could influence future research in adversarial training and model robustness, as published in arXiv cs.CL.

Turing Wire

By Turing Wire editorial staff · Jun 11, 2026 · Editorial standards →

Source: arXiv cs.CL