
Edit-level Majority Voting Mitigates Over-Correction in LLM-based Grammatical Error Correction

Takumi Goto, Yusuke Sakai, Taro Watanabe

Published: May 13, 2026 — 14:52 UTC

Problem
This paper addresses over-correction in grammatical error correction (GEC) with large language models (LLMs). Over-correction occurs when a model goes beyond fixing the errors in a sentence and makes unnecessary edits, which can introduce new errors and degrade output quality. The authors propose an approach that requires neither modifications to existing models nor additional training, targeting a gap in the literature on inference-time methods that improve GEC performance without the cost of retraining.

Method
The core technical contribution is an edit-level majority voting mechanism applied at inference time. The method aggregates multiple candidate corrections generated by a single LLM and, as the name indicates, votes on individual edits rather than on whole sentences, so an edit that appears in only a minority of candidates is less likely to reach the final output. The authors do not disclose specific architectural details of the LLM used, but they emphasize that the approach is training-free and relies solely on the model’s outputs during inference. The method is evaluated on nine benchmarks covering languages including English, Czech, German, Ukrainian, Korean, Hindi, and Romanian. The authors also provide two repositories for GEC dataset loading and LLM inference to support reproducibility and further research.
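
To make the idea of voting on edits concrete, here is a minimal Python sketch. It is not the authors' implementation: the summary does not describe how edits are extracted or what agreement threshold is used, so the whitespace tokenization, the `difflib` alignment, the function names `extract_edits` and `majority_vote_correct`, and the `threshold=0.5` default are all assumptions made for illustration.

```python
from collections import Counter
from difflib import SequenceMatcher


def extract_edits(source_tokens, candidate_tokens):
    """Return the edits that turn the source into one candidate.

    Each edit is (start, end, replacement): source tokens [start:end] are
    replaced by `replacement` (empty span = insertion, empty tuple = deletion).
    Assumption: difflib alignment stands in for a real GEC edit extractor.
    """
    matcher = SequenceMatcher(a=source_tokens, b=candidate_tokens, autojunk=False)
    edits = set()
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            edits.add((i1, i2, tuple(candidate_tokens[j1:j2])))
    return edits


def majority_vote_correct(source, candidates, threshold=0.5):
    """Keep only edits proposed by more than `threshold` of the candidates."""
    source_tokens = source.split()
    votes = Counter()
    for candidate in candidates:
        votes.update(extract_edits(source_tokens, candidate.split()))

    kept = [e for e, n in votes.items() if n / len(candidates) > threshold]

    # Apply from right to left so earlier token indices stay valid.
    # With threshold >= 0.5, any two kept edits co-occur in at least one
    # candidate, so they cannot overlap; lower thresholds would need an
    # explicit conflict check.
    output = list(source_tokens)
    for start, end, replacement in sorted(kept, reverse=True):
        output[start:end] = replacement
    return " ".join(output)


# Hypothetical example: two candidates make only the needed fix,
# one candidate also rewrites the sentence (over-correction).
source = "She go to school yesterday ."
candidates = [
    "She went to school yesterday .",
    "She went to school yesterday .",
    "Yesterday , she went to the school .",
]
print(majority_vote_correct(source, candidates))
# → She went to school yesterday .
```

In this toy case the "go" to "went" edit is proposed by two of three candidates and survives, while the reordering and the inserted "the" come from a single candidate and are dropped. Because the output is assembled from individual edits, it can in principle differ from every sampled candidate.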

Results
The proposed edit-level majority voting method outperforms both greedy decoding and minimum Bayes risk (MBR) decoding in most cases across the evaluated benchmarks. Specific numerical results are not detailed in the abstract, so the size of the improvement cannot be stated here. The method also performs stably regardless of the instruction prompt used, which suggests robustness to prompt wording.
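
For contrast with the MBR baseline, the sketch below shows the general shape of sample-based MBR decoding: it selects the one candidate with the highest average utility against the other candidates. The utility used in the paper is not stated in this summary, so the token-level `difflib` similarity and the names `similarity` and `mbr_select` are stand-in assumptions.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Token-level similarity in [0, 1]; an assumed stand-in utility."""
    return SequenceMatcher(a=a.split(), b=b.split(), autojunk=False).ratio()


def mbr_select(candidates):
    """Return the candidate with the highest mean utility against the others."""
    def expected_utility(i):
        others = [c for j, c in enumerate(candidates) if j != i]
        total = sum(similarity(candidates[i], other) for other in others)
        return total / max(len(others), 1)

    best_index = max(range(len(candidates)), key=expected_utility)
    return candidates[best_index]


# Hypothetical candidates sampled for "She go to school yesterday ."
samples = [
    "She went to school yesterday .",
    "She went to school yesterday .",
    "Yesterday , she went to the school .",
]
print(mbr_select(samples))
# → She went to school yesterday .
```

The structural difference is that MBR always returns one of the sampled sentences whole, whereas voting at the edit level decides on each edit separately.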

Limitations
The authors acknowledge that their approach, while effective, is limited to the context of inference and does not address potential underlying issues in model training that may contribute to over-correction. They do not explore the computational efficiency of their method in terms of time complexity or resource usage during inference, which could be relevant for real-time applications. Furthermore, the reliance on a single model for generating candidates may limit the diversity of corrections compared to ensemble methods that utilize multiple models.

Why it matters
This work bears on the reliability of LLMs for grammatical error correction in practical applications. By improving correction quality without retraining, the method adds to ongoing work on efficient model deployment in NLP. Edit-level majority voting could serve as a foundation for future research on inference strategies in GEC and potentially in other NLP tasks where over-correction is a concern. The released repositories for dataset loading and LLM inference also support further exploration and validation of the method in diverse linguistic contexts.

Authors: Takumi Goto, Yusuke Sakai, Taro Watanabe
Source: arXiv:2605.13624
URL: https://arxiv.org/abs/2605.13624v1

By Callan Zhang · May 13, 2026

Summarised from the primary source with AI assistance under human editorial oversight. Turing Wire is not a primary source — read the original for the authoritative account.

Source: arXiv cs.CL