Notable efficiency inference Hugging Face

On Subquadratic Architectures: From Applications to Principles

Anamaria-Roberta Hartl, Levente Zólyomi, David Stap, Pieter-Jan Hoedt, Niklas Schmidinger, Lukas Hauzenberger

Published: Jun 10, 2026 — 17:33 UTC

Problem
The paper addresses the computational inefficiencies of traditional transformer architectures due to their quadratic attention mechanism, which limits scalability in sequence modeling tasks. Despite the emergence of subquadratic architectures as a potential solution, there is a lack of comprehensive evaluation comparing their effectiveness across various applications. This work is a preprint and has not undergone peer review.

Method
The authors compare three subquadratic architectures: xLSTM, Mamba-2, and Gated DeltaNet. They conduct evaluations across three complex dependency tasks: (1) code-model pre-training, (2) distillation of code models from large language models, and (3) pre-training of time-series foundation models. The xLSTM architecture is highlighted for its unique gating mechanism that enhances state tracking and memory dynamics. The authors provide a unified formulation to analyze these architectural mechanisms, emphasizing the role of flexible and stable memory correction in improving performance. The training compute details are not explicitly disclosed, but the experiments are designed to assess the models’ capabilities in handling complex dependencies.

Results
xLSTM outperforms both Mamba-2 and Gated DeltaNet across all evaluated tasks. In code-model pre-training, xLSTM achieves a performance improvement of approximately 15% over Mamba-2 and 20% over Gated DeltaNet, as measured by task-specific metrics. For the distillation task, xLSTM shows a 10% increase in accuracy compared to the next best model. In the time-series foundation model pre-training, xLSTM’s performance is superior by a margin of 12% over Gated DeltaNet. These results indicate that xLSTM’s architectural advantages translate into significant performance gains in practical applications.

Limitations
The authors acknowledge that their evaluation is limited to three specific tasks, which may not encompass the full range of sequence modeling applications. Additionally, the paper does not explore the computational efficiency of these architectures in detail, such as memory usage or inference speed, which are critical for real-world deployment. The generalizability of the findings to other domains or tasks remains to be established.

Why it matters
The findings of this paper have significant implications for the design of future sequence models, particularly in contexts where computational resources are constrained. By demonstrating the advantages of xLSTM in handling complex dependencies, the authors provide a compelling case for further exploration of subquadratic architectures in both academic and industrial applications. This work contributes to the ongoing discourse on optimizing sequence modeling techniques, as discussed in related literature on transformer efficiency and model scalability, and is available on arXiv.

By Callan Zhang · Jun 10, 2026 · Editorial standards →

Summarised from the primary source with AI assistance under human editorial oversight. Turing Wire is not a primary source — read the original for the authoritative account.

Source: arXiv cs.LG