Major interpretability

Beyond Probabilistic Similarity: Structural, Temporal, and Causal Limitations of Retrieval-Augmented Generation in the Legal Domain

Hudson de Martim

Published: Jun 8, 2026 — 16:46 UTC

Problem
This paper addresses the limitations of Retrieval-Augmented Generation (RAG) in the legal domain, highlighting its failures such as fabricated citations and outdated legal content. The authors argue that these issues stem from an architectural mismatch between probabilistic retrieval methods and the complex, hierarchical, and temporal nature of legal knowledge. This work is a preprint and has not undergone peer review, indicating that the findings should be interpreted with caution.

Method
The authors propose a framework based on classical legal theory, articulating three core properties of legal knowledge: hierarchical structure, diachronic dynamism, and causal traceability. They identify three pathologies of retrieval that arise from these properties: mereological blindness (failure to recognize part-whole relationships), diachronic blindness (inability to account for temporal changes), and causal opacity (lack of clarity regarding the provenance of legal norms). Each pathology is defined operationally, with failure mechanisms and diagnostic criteria. The authors then outline four architectural commitments for a deterministic approach to legal retrieval: ontological primacy (prioritizing the structure of legal knowledge), event reification (treating legal events as first-class entities), bitemporal correctness (ensuring accuracy across different time frames), and deterministic interaction protocols (establishing clear rules for system interactions).

Results
The paper does not present quantitative results or benchmark comparisons against existing RAG models, focusing instead on qualitative analysis and theoretical contributions. The authors critique current state-of-the-art approaches, indicating that they inadequately address the identified pathologies and fail to integrate the proposed architectural commitments into a cohesive framework. The lack of empirical validation is a notable gap in the findings.

Limitations
The authors acknowledge that their framework is primarily conceptual and lacks empirical testing against real-world legal datasets. They do not provide specific metrics or performance comparisons with existing models, which limits the practical applicability of their claims. Additionally, the focus on legislative and constitutional retrieval may not generalize to other areas of legal AI, such as case law analysis or predictive legal analytics.

Why it matters
This work has significant implications for the development of more reliable legal AI systems. By addressing the structural and temporal limitations of current retrieval methods, the proposed framework could lead to more accurate and contextually aware legal information retrieval. This is crucial for applications that require high levels of precision and accountability, such as legal research and court submissions. The insights presented in this paper could inform future research directions and architectural designs in legal AI, as discussed in the context of the limitations of RAG systems, as published in arXiv.

By Callan Zhang · Jun 8, 2026 · Editorial standards →

Summarised from the primary source with AI assistance under human editorial oversight. Turing Wire is not a primary source — read the original for the authoritative account.

Source: arXiv cs.AI