Notable theory

Limitations of Learning Tanh Neural Networks with Finite Precision

Philipp Grohs, Matěj Trödler

Published
Jun 9, 2026 — 17:02 UTC

Problem
This work addresses the gap in understanding the limitations of learning $\tanh$ neural networks when subjected to finite-precision computations. Previous research has primarily focused on ReLU networks, leaving a lack of insights into the behavior of $\tanh$ networks under similar constraints. The authors build on the findings of Berner, Grohs, and Voigtländer (2023) to investigate how finite precision affects the convergence rates of adaptive randomized algorithms in the context of $L^p$ accuracy guarantees. This paper is a preprint and has not undergone peer review.

Method
The authors introduce a novel construction of sharply localized bump functions through iterated $\tanh$ activations. This construction serves as the foundation for their theoretical analysis. They demonstrate that in a finite-precision setting, any adaptive randomized algorithm utilizing $m$ samples cannot achieve a convergence rate exceeding the Monte Carlo rate of $O(m^{-1/p})$ in the $L^p$ norm. The critical insight is that to surpass this rate, the sampling budget must increase exponentially with respect to the network’s parameters and architecture. The theoretical framework is grounded in the analysis of the function space spanned by these bump functions, which are essential for learning tasks.

Results
The paper presents a significant finding: the convergence rate for learning $\tanh$ neural networks is fundamentally limited by finite precision, mirroring the constraints observed in ReLU networks. Specifically, the authors show that the best achievable convergence rate is $O(m^{-1/p})$, which is a direct consequence of the finite-precision environment. This result is particularly impactful as it quantifies the relationship between sample size and convergence in the context of $L^p$ norms, providing a clear benchmark against which other methods can be compared.

Limitations
The authors acknowledge that their results are contingent upon the assumptions of finite precision and the specific architecture of $\tanh$ networks. They do not explore the implications of other activation functions or the potential for alternative learning paradigms that might mitigate these limitations. Additionally, the focus on localized bump functions may restrict the generalizability of the findings to broader classes of neural networks. The paper does not address the practical implications of these theoretical constraints in real-world applications, which could be a significant area for future research.

Why it matters
This research has profound implications for the design and training of neural networks, particularly in environments where computational precision is limited. Understanding the constraints imposed by finite precision on the learnability of $\tanh$ networks can inform the development of more robust algorithms and architectures. The findings suggest that practitioners must consider the exponential growth of sampling budgets when working with complex network architectures. This work contributes to the broader discourse on neural network optimization and precision, as discussed in related literature, and is available on arXiv.

Turing Wire

By Turing Wire editorial staff · Jun 9, 2026 · Editorial standards →

Source: arXiv cs.LG