Neural Weight Norm = Kolmogorov Complexity
Tiberiu Musat
- Published: May 11, 2026 — 17:27 UTC
- Summary length: 487 words
- Relevance score: 80%
Problem
This preprint addresses the theoretical underpinnings of weight decay in neural networks, asking why penalizing the weight norm regularizes models effectively. The authors connect the weight norm of a neural network to the Kolmogorov complexity of the binary strings it outputs, filling a gap in the literature on interpreting weight decay as a form of prior regularization. The work is particularly relevant for fixed-precision settings, where the meaning of weight norms had not been thoroughly explored in prior research.
Method
The core technical contribution is a proof that, in a fixed-precision regime, the smallest weight norm of a looped neural network that outputs a given binary string equals the Kolmogorov complexity of that string, up to a logarithmic factor (a schematic form of the bound appears below). The authors further show that weight decay induces a prior approximating Solomonoff's universal prior, which is optimal for computable functions, up to a polynomial factor. The proof rests on two reductions: (1) any program for a universal Turing machine can be encoded into neural weights at unit cost per program bit, and (2) any fixed-precision network can be described by enumerating its non-zero parameters with logarithmic addressing overhead (see the sketch after this paragraph). The results are norm-agnostic: in the fixed-precision regime, every weight norm collapses to the non-zero parameter count, so the same bounds hold for any choice of norm. The authors emphasize that the fixed-precision assumption is crucial, since infinite precision would allow the weights to encode non-computable functions, rendering weight norms irrelevant.
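Read this way, the two reductions pin the minimal norm between two expressions in K(x). The following is a schematic rendering of the bound under our reading of the summary; the constants and the placement of the logarithmic slack are assumptions, not the preprint's exact statement.

```latex
% Schematic bound, assuming reduction (1) yields the upper bound
% (unit cost per program bit) and reduction (2) yields the lower bound
% (enumeration with logarithmic addressing). c_1, c_2 are unspecified
% constants; the exact norm convention is also an assumption.
\[
  c_1 \, \frac{K(x)}{\log K(x)}
  \;\le\;
  \min_{\theta \,:\, f_\theta \text{ outputs } x} \lVert \theta \rVert
  \;\le\;
  c_2 \, K(x),
\]
% where K(x) is the Kolmogorov complexity of the binary string x and
% the minimum ranges over fixed-precision looped networks f_theta.
```

Reduction (2) is essentially a counting argument. Here is a minimal Python sketch of it, with the function name, the 8-bit default precision, and the flat weight-vector representation all being illustrative assumptions rather than details from the preprint:

```python
import math

def sparse_description_bits(weights, precision_bits=8):
    """Upper-bound the bit length of a description that lists the
    non-zero entries of a fixed-precision weight vector: each entry
    costs one address of logarithmic size plus its value bits."""
    n_total = len(weights)
    nonzero = sum(1 for w in weights if w != 0)
    # Logarithmic addressing overhead: an index into the full vector.
    address_bits = math.ceil(math.log2(max(n_total, 2)))
    return nonzero * (address_bits + precision_bits)

# 10,000 parameters with 100 non-zero entries at 8-bit precision:
# 100 * (14 + 8) = 2,200 bits, i.e. O(nnz * log N) rather than O(N).
print(sparse_description_bits([0] * 9_900 + [1] * 100))  # -> 2200
```

Since such a description can be handed to a fixed decoder, the Kolmogorov complexity of anything the network outputs is bounded by this length plus a constant, which is the direction of the argument that makes every weight norm collapse to the non-zero parameter count.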
Results
The paper is primarily theoretical and reports no numerical experiments or comparisons against established baselines. The authors do assert that the derived bounds are tight up to constants: in particular, networks whose parameters encode permutations produce strings whose Kolmogorov complexity scales as the non-zero parameter count times its logarithm (a quick numerical check of this scaling follows below). This framework suggests that weight decay can be understood as a mechanism that aligns model complexity with the inherent complexity of the data being modeled.
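The permutation intuition can be sanity-checked with standard arithmetic. Describing a permutation of n items takes about log2(n!) bits, which Stirling's approximation puts near n·log2(n); the construction tying this to network parameters is the preprint's, but the estimate below is just this textbook fact.

```python
import math

# Compare the exact bit cost of a permutation of n items, log2(n!),
# with the n*log2(n) form in the tightness claim; by Stirling's
# approximation the ratio of the two approaches 1 as n grows.
for n in (10, 100, 1_000, 10_000):
    exact = math.log2(math.factorial(n))
    approx = n * math.log2(n)
    print(f"n={n:>6}: log2(n!)={exact:>12.1f}  "
          f"n*log2(n)={approx:>12.1f}  ratio={exact / approx:.3f}")
```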
Limitations
The authors acknowledge that the fixed-precision assumption is a significant limitation: the results apply only to networks with fixed-precision weights and say nothing about idealized infinite-precision models. They also do not address the practical implications for neural network training, such as the influence of different optimization algorithms or the interaction with other regularization techniques. Finally, the purely theoretical nature of the work limits its immediate applicability to empirical settings without further experimental validation.
Why it matters
This work has important implications for understanding regularization in neural networks, particularly weight decay. By linking weight norms to Kolmogorov complexity, it gives a theoretical account of why norm-based regularization controls model complexity. This insight could inform future research on model design and regularization methods, and may inspire new approaches to understanding the trade-off between model capacity and generalization in machine learning.
Authors: Tiberiu Musat
Source: arXiv:2605.10878
URL: https://arxiv.org/abs/2605.10878v1