Notable interpretability

Scene Abstraction for Lexical Semantics: Structured Representations of Situated Meaning

Yejin Cho, Katrin Erk

Published: May 21, 2026 — 14:26 UTC

Problem
The paper targets a gap in computational lexical semantics: existing models do not capture the situated dimensions of word meaning, which matter for context-dependent interpretation. The authors propose a framework, Scene Abstraction, that builds structured representations of the interpretive scenes words evoke in different usage contexts. The work is a preprint and has not yet been peer reviewed.

Method
The core technical contribution is the Scene Abstraction framework, which has two main components: the Contextual Scene and the Expression Profile. The Contextual Scene covers Events, Entities, and Setting; the Expression Profile covers Engaged Events, Generalizable Properties, and Evoked Emotions. The authors operationalize the framework by few-shot prompting a large language model, so that structured representations can be generated from minimal input. They also introduce COCA-Scenes, a dataset of 520 usage instances across 26 keywords (an average of 20 per keyword), designed for identifying distinct scenes. Two experiments validate the framework: one measures the accuracy of scene identification, the other measures how well the generated scene profiles align with human interpretations.
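The two-component structure and the few-shot setup described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the class and field names, the JSON encoding, the instruction wording, and the worked example are all assumptions made here, and the example sentence is hand-written rather than taken from COCA-Scenes.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ContextualScene:
    """First component: Events, Entities, Setting."""
    events: list[str] = field(default_factory=list)
    entities: list[str] = field(default_factory=list)
    setting: str = ""


@dataclass
class ExpressionProfile:
    """Second component: Engaged Events, Generalizable Properties, Evoked Emotions."""
    engaged_events: list[str] = field(default_factory=list)
    generalizable_properties: list[str] = field(default_factory=list)
    evoked_emotions: list[str] = field(default_factory=list)


@dataclass
class SceneAbstraction:
    keyword: str
    sentence: str
    contextual_scene: ContextualScene
    expression_profile: ExpressionProfile

    def to_json(self) -> str:
        # Serialise only the two components; keyword and sentence
        # are printed separately in the prompt.
        return json.dumps({
            "contextual_scene": asdict(self.contextual_scene),
            "expression_profile": asdict(self.expression_profile),
        })


# Hypothetical instruction text; the paper's actual prompt is not given here.
INSTRUCTION = (
    "Describe the scene evoked by the keyword in the sentence. "
    "Answer as JSON with the keys contextual_scene and expression_profile."
)


def build_few_shot_prompt(examples, keyword, sentence):
    """Concatenate worked examples, then leave the final answer slot open."""
    blocks = [INSTRUCTION]
    for ex in examples:
        blocks.append(
            f"Keyword: {ex.keyword}\nSentence: {ex.sentence}\nScene: {ex.to_json()}"
        )
    blocks.append(f"Keyword: {keyword}\nSentence: {sentence}\nScene:")
    return "\n\n".join(blocks)


# Hand-written demonstration (not from COCA-Scenes).
demo = SceneAbstraction(
    keyword="fire",
    sentence="We sat around the fire and told stories.",
    contextual_scene=ContextualScene(
        events=["sitting together", "telling stories"],
        entities=["group of people", "campfire"],
        setting="outdoors at night",
    ),
    expression_profile=ExpressionProfile(
        engaged_events=["burning", "giving warmth"],
        generalizable_properties=["warm", "bright"],
        evoked_emotions=["comfort"],
    ),
)

prompt = build_few_shot_prompt(
    [demo], "fire", "The fire spread through the warehouse within minutes."
)
print(prompt)
```

The resulting string would be sent to whichever language model is in use, and the JSON it returns could be parsed back into the same dataclasses. Keeping the schema explicit makes it easy to check that a model response contains all six fields.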

Results
The reported results indicate that the Scene Abstraction framework improves the identification of lexical scenes. Scene identification accuracy reached 82.4%, 11.8 percentage points above traditional text-only embeddings. In a comparison of the framework's scene profiles with ATOMIC-based alternatives, 86.4% of participants preferred the framework's profiles across three semantic dimensions, suggesting closer alignment with human interpretations of word meaning in context.
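As a quick check on the accuracy figures, and assuming the 11.8-point gain is an absolute difference against a single text-only baseline, the implied baseline and relative gain work out as below. Neither derived number is stated in this summary.

```latex
\begin{align*}
\text{baseline accuracy} &= 82.4\% - 11.8\% = 70.6\% \\
\text{relative gain} &= \frac{11.8}{70.6} \approx 16.7\%
\end{align*}
```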

Limitations
The authors acknowledge several limitations. The evaluation relies on a single dataset (COCA-Scenes), so the findings may not generalize to all lexical items or contexts. Few-shot prompting may introduce variability that depends on the model’s pre-training and on the specific prompts used. Preference judgments come from human evaluators and may carry subjective bias. Notably, the authors do not address potential scalability issues when applying the framework to larger datasets or more diverse linguistic contexts.

Why it matters
The work is relevant to downstream applications in natural language processing (NLP) and computational linguistics. A structured representation of situated lexical meaning gives models an explicit account of how a word is used in context, which could help with tasks such as semantic similarity, sentiment analysis, and contextualized language generation. The framework could also inform models that better capture the nuances of human language.

Authors: Yejin Cho, Katrin Erk
Source: arXiv:2605.22542
URL: https://arxiv.org/abs/2605.22542v1

By Callan Zhang · May 21, 2026

Summarised from the primary source with AI assistance under human editorial oversight. Turing Wire is not a primary source — read the original for the authoritative account.

Source: arXiv cs.CL