
Task Vector Geometry Underlies Dual Modes of Task Inference in Transformers

Hao Yan, Haolin Yang, Yiqiao Zhong

Published: May 5, 2026, 14:07 UTC
Summary length: 423 words
Relevance score: 70%

Problem
This preprint addresses a gap in our understanding of the geometry of task vectors in Transformers and of how that geometry relates to model behavior during task inference. Previous interpretability studies have identified task-specific directions in a model's internal representations, but they lack a rigorous mathematical framework connecting those directions to model performance, particularly in out-of-distribution (OOD) scenarios. The authors aim to elucidate how task-vector geometry is shaped by the training distribution and how it facilitates OOD generalization.

Method
The authors train small Transformers from scratch on synthetic latent-task sequence distributions, a controlled setting that admits a mathematical characterization of task inference. They identify two distinct inference modes: (1) an in-distribution mode, modeled as Bayesian task retrieval through convex combinations of learned task vectors, and (2) an OOD mode, characterized as extrapolative task learning and represented in a subspace nearly orthogonal to the task-vector subspace. The study combines theoretical analysis with empirical validation to establish these relationships, as sketched below.
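To make the retrieval mode concrete, here is a minimal sketch, not from the paper, of how one might test whether a hidden state is well explained as a convex combination of learned task vectors. It assumes the task vectors have already been extracted as the rows of a matrix; the function name and the constrained least-squares fit are illustrative choices, not the authors' code.

```python
import numpy as np
from scipy.optimize import minimize

def convex_task_weights(hidden_state, task_vectors):
    """Fit weights w (w >= 0, sum w = 1) so that w @ task_vectors
    approximates hidden_state, mirroring the Bayesian-retrieval view
    of in-distribution inference. Returns the weights and the residual
    norm; a small residual suggests the state lies near the convex
    hull of the task vectors."""
    k = task_vectors.shape[0]
    objective = lambda w: np.sum((w @ task_vectors - hidden_state) ** 2)
    constraints = [{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}]
    bounds = [(0.0, 1.0)] * k  # nonnegativity; upper bound implied by the sum
    w0 = np.full(k, 1.0 / k)   # start from the uniform mixture
    result = minimize(objective, w0, bounds=bounds, constraints=constraints)
    return result.x, float(np.sqrt(result.fun))
```

A large residual, by contrast, would indicate a hidden state that escapes the task-vector hull, which is where the paper's second, extrapolative mode becomes relevant.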

Results
The findings show that the two inference modes coexist within a single Transformer. In-distribution performance rests on retrieving tasks as convex combinations of learned task vectors, while OOD performance arises from representations nearly orthogonal to those vectors. The authors report quantitative results showing that models trained under their framework achieve better OOD generalization than baseline models that do not exploit this geometry, although specific numbers and named baselines are not given in the abstract.
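The near-orthogonality claim is directly measurable. A hedged sketch, assuming access to stacked hidden states for in-distribution and OOD prompts (the function name and the rank cutoff below are assumptions for illustration, not the authors' procedure): take the top principal directions of each set of representations and compute the principal angles between the two subspaces; angles near 90 degrees correspond to the reported geometry.

```python
import numpy as np
from scipy.linalg import subspace_angles

def principal_angles_deg(in_dist_states, ood_states, rank=8):
    """Principal angles (in degrees) between the subspace spanned by
    in-distribution representations and the one spanned by OOD
    representations. Inputs are (n_samples, d) arrays of hidden
    states; rank is an assumed cutoff for how many top directions
    of each set to compare."""
    def top_directions(states, r):
        centered = states - states.mean(axis=0)  # remove the shared mean
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        return vt[:r].T  # (d, r) matrix with orthonormal columns
    subspace_a = top_directions(in_dist_states, rank)
    subspace_b = top_directions(ood_states, rank)
    return np.degrees(subspace_angles(subspace_a, subspace_b))
```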

Limitations
The authors acknowledge that the study is limited to small Transformers and synthetic data, which may not capture the complexities of larger models or real-world tasks, and that the reliance on synthetic distributions may limit the generalizability of the findings. They do not address potential scalability issues or the implications of their results for larger, more complex architectures, nor do they explore in depth how varying the training distribution affects task-vector geometry.

Why it matters
This work has significant implications for the design and training of Transformer models, particularly in enhancing their generalization capabilities. By establishing a clear connection between task-vector geometry and model behavior, the findings could inform future research on improving OOD performance in Transformers. This understanding may lead to the development of more robust architectures that can better adapt to novel tasks, ultimately advancing the field of machine learning and its applications in real-world scenarios.

Authors: Hao Yan, Haolin Yang, Yiqiao Zhong
Source: arXiv:2605.03780
URL: https://arxiv.org/abs/2605.03780v1

Turing Wire
Author: Turing Wire editorial staff