Notable efficiency inference

Dense Contexts Are Hard Contexts: Lexical Density Limits Effective Context in LLMs

Giovanni Dettori, Matteo Boffa, Danilo Giordano, Idilio Drago, Marco Mellia

Published
Jun 4, 2026 — 14:08 UTC

Problem
This work addresses a gap in the understanding of factors influencing the effective context window of large language models (LLMs), specifically focusing on lexical density, which has been largely overlooked in existing literature. While input length and the positioning of relevant information are well-documented causes of performance degradation in long-context scenarios, this study posits that the rate at which distinct information is introduced—lexical density—also significantly impacts LLM performance. The authors present their findings in a preprint format, indicating that the research is unreviewed.

Method
The authors conducted experiments using open-weight LLMs ranging from 9 billion to 685 billion parameters. They designed three “find-the-needle” benchmarks, each with a consistent length of approximately 12,000 tokens but varying in lexical density. The benchmarks were structured to control for needle position while systematically increasing the density of information. The performance of the models was evaluated based on their ability to retrieve relevant information from these dense contexts. To isolate the effects of lexical density, the authors varied and controlled the density within each benchmark while maintaining other properties constant. This methodological rigor allowed them to demonstrate the direct correlation between lexical density and effective context capacity.

Results
The findings reveal a pronounced performance collapse in high-density benchmarks, with models that achieved near-perfect retrieval scores in sparse contexts dropping below 60% in denser scenarios. This stark contrast underscores the detrimental impact of lexical density on LLM performance. Specifically, the results indicate that reducing lexical density can restore performance levels, particularly in high-density regimes where degradation is most pronounced. The study quantitatively illustrates that effective context capacity is not solely a function of input length but is significantly influenced by the density of information presented.

Limitations
The authors acknowledge that their study is limited to the specific benchmarks and models tested, which may not generalize across all LLM architectures or tasks. Additionally, while they control for various factors, the potential influence of other contextual elements remains unexplored. The focus on “find-the-needle” tasks may also limit the applicability of the findings to more complex, real-world scenarios where context and information density vary dynamically.

Why it matters
This research has significant implications for the design and deployment of LLMs in real-world applications, particularly those involving compact, information-rich inputs. Understanding the role of lexical density can inform strategies for optimizing model performance in scenarios where context is dense and complex. The insights gained from this study could lead to improved architectures or training methodologies that account for lexical density, enhancing the effectiveness of LLMs in practical applications. This work contributes to the ongoing discourse in the field, as highlighted in related studies on context management in LLMs, and is available on arXiv.

Turing Wire

By Turing Wire editorial staff · Jun 4, 2026 · Editorial standards →

Source: arXiv cs.CL