
Metaphors in Literary Post-Editing: Opening Pandora's Box?

Aletta G. Dorst, Mayra O. Nas, Katinka Zeven

Published: May 20, 2026 — 13:45 UTC

Problem
This preprint addresses a gap in understanding how post-editors handle metaphor translations produced by Neural Machine Translation (NMT) systems and Large Language Models (LLMs) in literary texts. Previous research has explored the challenges of translating figurative language, but there is little empirical evidence on how post-editors react to and adapt metaphorical content in machine-generated translations. The study examines what post-editing literary translations involves in practice, with a focus on how metaphors are handled.

Method
The authors analyzed post-editing sessions in which a cohort of post-editors revised literary texts that had been machine-translated by NMT and LLMs. The analysis focused on how often, and in what way, metaphorical expressions were changed, categorizing the changes by type (e.g., literal vs. creative). The post-editors were also surveyed about their perceptions of translation quality and about the effort post-editing required compared with translating from scratch. The study therefore combines a quantitative measure (the share of metaphors changed) with qualitative feedback from the post-editors.
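The quantitative side of such an analysis can be pictured with a minimal Python sketch. It is not the authors' tooling: the `MetaphorEdit` record, the `change_type` labels, and the three sample rows are hypothetical and exist only to show how a share of changed metaphors, and a breakdown by change type, might be tallied.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class MetaphorEdit:
    """One metaphorical expression before and after post-editing (hypothetical schema)."""
    mt_rendering: str           # the metaphor as the MT system rendered it
    post_edited: str            # the same span after the post-editor's revision
    change_type: Optional[str]  # assumed annotation label; None when left unchanged

    @property
    def changed(self) -> bool:
        return self.mt_rendering != self.post_edited


def change_rate(edits: List[MetaphorEdit]) -> float:
    """Share of metaphorical expressions that the post-editor altered."""
    if not edits:
        return 0.0
    return sum(e.changed for e in edits) / len(edits)


def changes_by_type(edits: List[MetaphorEdit]) -> Counter:
    """Count altered metaphors per annotated change type."""
    return Counter(e.change_type for e in edits if e.changed)


# Toy rows invented for illustration; they are not data from the study.
sample = [
    MetaphorEdit("time is a thief", "time is a thief", None),
    MetaphorEdit("her heart was stone", "her heart was stone", None),
    MetaphorEdit("he swallowed his pride", "he bit back his pride", "creative"),
]

rate = change_rate(sample)
by_type = changes_by_type(sample)
print(f"{rate:.0%} of metaphors changed; by type: {dict(by_type)}")
```

Comparing the two strings directly is the simplest possible definition of "changed". A real annotation scheme would more likely rely on human judgement about whether the metaphor itself, rather than the surrounding wording, was modified.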

Results
Post-editors altered about one in three (roughly 33%) of the metaphorical expressions in the NMT output, which points to problems with how figurative language is translated in literary contexts. The post-editors rated the overall quality of the machine-generated translations as poor, consistent with the increased workload and effort they reported during post-editing. They also found it particularly difficult to judge whether certain metaphor translations were acceptable, especially multiword expressions. This suggests a gap between machine-generated output and the nuanced understanding that literary translation requires.

Limitations
The authors acknowledge several limitations, including the small sample of post-editors and the subjective nature of the quality assessments, so the findings may not generalize across literary genres or languages. The study also does not explore the long-term implications of its findings for literary translation more broadly, or how NMT systems might be improved specifically for metaphor translation. One limitation the authors do not discuss is the lack of a control group using human translations, which would provide a clearer benchmark for evaluating the quality of the machine-generated output.

Why it matters
This research has implications for the development of NMT systems and LLMs, particularly their ability to translate figurative language in literary contexts. The findings underscore the need for systems that handle metaphors better, since metaphors are critical to preserving the artistic and emotional nuances of literary texts. The study also raises questions about the role of post-editors in the translation process and about the impact of machine translation on translator creativity and ownership. As machine translation continues to evolve, understanding these dynamics will be important for developing more effective and user-friendly translation tools.

Source: arXiv:2605.21178
URL: https://arxiv.org/abs/2605.21178v1

By Callan Zhang · May 20, 2026

Summarised from the primary source with AI assistance under human editorial oversight. Turing Wire is not a primary source — read the original for the authoritative account.

Source: arXiv cs.CL