Major alignment safety

Grounding Text Embeddings in Stakeholder Associations

Jonathan Rystrøm, Sofie Burgos-Thorsen, Zihao Fu, Johan Irving Søltoft, Kenneth C. Enevoldsen, Chris Russell

Published: May 26, 2026 — 15:24 UTC

Problem
This preprint addresses a significant gap in the literature regarding the alignment of text embeddings with human semantic understanding. While text embeddings are extensively utilized for analyzing complex texts, there is a lack of empirical evidence demonstrating that these embeddings accurately reflect the semantic distances perceived by human experts. The authors highlight the necessity of ensuring that embedding representations align with human intentions to facilitate valid analyses, particularly in the context of policy issues.

Method
The authors introduce the Stakeholder Grounding Exercise, a novel methodology designed to make expert associations explicit and to ground the results of embedding models in human understanding. This method involves a structured process where domain experts evaluate and rank the relevance of various text embeddings concerning specific stakeholder associations. The primary case study focuses on Danish policy issues, where the authors compare the reliability of neural text embeddings against human expert assessments. They quantify the misalignment between embeddings and expert evaluations, reporting a gap of 19-26 percentage points (pp). Additionally, they assess the impact of this misalignment on downstream clustering performance, finding a high Spearman correlation coefficient (ρ=0.9) between the exercise rankings and cluster quality. A secondary study on US Federal AI use cases replicates these findings in English, confirming a consistent gap of 16 pp across different expert communities and methodologies.

Results
The results indicate that neural text embeddings exhibit substantial misalignment with human expert evaluations, with gaps of 19-26 pp in the primary study and 16 pp in the secondary study. The high Spearman correlation (ρ=0.9) between expert rankings and clustering quality underscores the practical implications of this misalignment, suggesting that poor alignment can adversely affect the performance of clustering algorithms that rely on these embeddings. The findings are robust across different domains and expert communities, indicating that the observed gaps are not artifacts of specific instruments or contexts.

Limitations
The authors acknowledge several limitations, including the potential for bias in expert evaluations and the specific focus on policy issues, which may not generalize to other domains. They also note that the Stakeholder Grounding Exercise may require significant time and resources to implement effectively. Additionally, the study does not explore the underlying reasons for the observed misalignment, leaving open questions regarding the specific characteristics of embeddings that contribute to this gap. The reliance on expert judgment also introduces variability that may affect reproducibility.

Why it matters
This work has significant implications for the development and application of text embeddings in various domains, particularly in policy analysis and decision-making contexts. By highlighting the misalignment between embeddings and human understanding, the authors advocate for the integration of expert evaluations into the embedding assessment process. This approach could enhance the validity of analyses derived from text embeddings and improve the performance of downstream tasks such as clustering and classification. The Stakeholder Grounding Exercise serves as a practical framework for researchers and practitioners to ensure that their embedding models capture the semantic distinctions that are most relevant to domain experts, ultimately leading to more reliable and interpretable AI systems.

Authors: Jonathan Rystrøm, Sofie Burgos-Thorsen, Zihao Fu, Johan Irving Søltoft, Kenneth C. Enevoldsen, Chris Russell
Source: arXiv:2605.27168
URL: https://arxiv.org/abs/2605.27168v1

By Callan Zhang · May 26, 2026 · Editorial standards →

Summarised from the primary source with AI assistance under human editorial oversight. Turing Wire is not a primary source — read the original for the authoritative account.

Source: arXiv cs.CL