Notable theory

Expressivity of congruence-based architectures for DNNs on positive-definite matrices

Antonin Oswald, Estelle Massart

Published: Jun 1, 2026 — 17:01 UTC

Problem
This work addresses a gap in understanding the expressivity of neural architectures designed for classifying symmetric positive-definite (SPD) matrices. The authors argue that existing architectures built on congruence-like layers may suffer from expressivity limitations when a (semi-)orthogonality constraint is imposed on the weight matrix W. The work is a preprint and has not yet undergone peer review.

Method
The core technical contribution is an analysis of congruence-like layers, in which the input SPD matrix is multiplied on both sides by a weight matrix W and its transpose. The authors show that when W is constrained to be semi-orthogonal, the resulting architecture can collapse to a one-hidden-layer equivalent for certain activation functions, which limits its expressivity. They use Poincaré’s separation theorem to explain the loss of spectral diversity in these layers. They also evaluate various Riemannian classifiers to determine their compatibility with the feature maps generated by congruence-like layers, providing a comparative analysis of these classifiers.
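To make the congruence map and the spectral effect of a semi-orthogonal W concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' code: the convention X -> W^T X W with W of size n x k and orthonormal columns, the helper names, and the matrix sizes are all chosen here for the example. It checks two standard linear-algebra facts: the eigenvalue interlacing given by Poincaré's separation theorem, and the fact that two stacked congruence layers with no activation in between reduce to a single one. The second check covers only the linear case and is not the paper's collapse argument for specific activation functions.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_spd(n):
    """Random SPD matrix: A A^T plus a small ridge (hypothetical helper)."""
    a = rng.standard_normal((n, n))
    return a @ a.T + 1e-3 * np.eye(n)

def random_semi_orthogonal(n, k):
    """n x k matrix with orthonormal columns, taken from a reduced QR."""
    q, _ = np.linalg.qr(rng.standard_normal((n, k)))
    return q

def congruence(x, w):
    """Congruence-like map X -> W^T X W (convention assumed for this sketch)."""
    return w.T @ x @ w

n, k, m = 8, 5, 3  # illustrative sizes, not taken from the paper
X = random_spd(n)
W1 = random_semi_orthogonal(n, k)
W2 = random_semi_orthogonal(k, m)

Y = congruence(X, W1)
lam = np.linalg.eigvalsh(X)  # eigenvalues of the input, ascending
mu = np.linalg.eigvalsh(Y)   # eigenvalues of the output, ascending

# Poincaré separation theorem: lam[i] <= mu[i] <= lam[n - k + i] for i < k,
# so the output spectrum is squeezed inside the input spectrum.
tol = 1e-9
interlaced = bool(np.all(lam[:k] - tol <= mu) and np.all(mu <= lam[n - k:] + tol))

# Two congruence layers without an activation equal one layer with W1 @ W2,
# and W1 @ W2 again has orthonormal columns.
W12 = W1 @ W2
collapsed = bool(np.allclose(congruence(Y, W2), congruence(X, W12)))
still_semi_orthogonal = bool(np.allclose(W12.T @ W12, np.eye(m)))

print(interlaced, collapsed, still_semi_orthogonal)
```

The interlacing bound shows why the output eigenvalues can never leave the range spanned by the input eigenvalues, which is one way to read the "loss of spectral diversity" described above.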

Results
The paper does not report specific quantitative results or performance metrics against established baselines; it focuses on theoretical implications and the expressivity limitations of the architectures under consideration. The authors emphasize that the semi-orthogonality constraint significantly reduces the model’s capacity to learn complex representations, an effect that could be assessed quantitatively in future work.

Limitations
The authors acknowledge that imposing semi-orthogonality on W is a primary limitation on the expressivity of congruence-like layers. They do not explore the potential benefits of relaxing this constraint, or whether alternative activation functions might mitigate the expressivity loss. The lack of empirical validation on benchmark datasets is also a notable gap: the theoretical findings remain untested in practical scenarios.

Why it matters
This research bears on the design of neural architectures for processing SPD matrices, which appear in applications such as computer vision and medical imaging. By clarifying the limitations of congruence-based architectures, the findings encourage exploration of alternative architectures that do not impose strict orthogonality constraints and that may be more expressive. The work lays the groundwork for future studies that could empirically validate the theoretical claims and explore novel architectures for SPD matrix classification. The preprint is available on arXiv.

By Callan Zhang · Jun 1, 2026

Summarised from the primary source with AI assistance under human editorial oversight. Turing Wire is not a primary source — read the original for the authoritative account.

Source: arXiv cs.LG