Symbolic Regression via Latent Iterative Refinement
Xieting Chu, Sriram Vishwanath, Vijay Ganesh
Published: May 26, 2026, 16:25 UTC
Problem
This paper addresses a limitation of existing neural symbolic regression (SR) methods, which typically rely on single-pass amortized inference. Predicting an expression in one forward pass often leaves a residual amortization gap between the predicted expression and the true posterior distribution. The authors propose Latent Equation Embedding (LEE), a framework that aims to narrow this gap through iterative amortized inference in a functionally grounded latent space. The work is a preprint and has not yet undergone peer review.
Method
LEE introduces a shared latent space \( Z \) with three components: an encoder \( f_\theta \) that maps symbolic tokens and numerical observations to a latent vector \( z \); an expression decoder \( g_{\text{expr}} \) that reconstructs a mathematical formula from \( z \); and an evaluation decoder \( g_{\text{eval}} \) that predicts function values from \( z \). At inference time the framework refines its estimate iteratively: each decoded expression is re-encoded together with the observations to progressively improve the latent estimate. The encoder thus acts as a learned inference optimizer that implicitly computes the mismatch between a candidate expression and the observed data. Because \( g_{\text{eval}} \) is differentiable in \( z \), continuous gradient descent steps can be interleaved with the discrete re-encoding, giving a hybrid refinement process.
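To make the control flow of the hybrid refinement concrete, here is a minimal toy sketch in Python. It is not the authors' implementation. Every function is a hand-written stand-in for a learned network, and the setup is an assumption made for illustration: the latent vector holds the two coefficients of a line, the hypothetical `g_expr` snaps them to a 0.5 grid to mimic discrete decoding, and the hypothetical `encode` computes the data mismatch explicitly, whereas the paper's encoder is described as doing so implicitly.

```python
import numpy as np

def g_eval(z, x):
    """Evaluation decoder stand-in: predicts y from latent z (differentiable in z)."""
    return z[0] * x + z[1]

def g_expr(z):
    """Expression decoder stand-in: snaps z to a discrete formula (0.5 coefficient grid)."""
    return tuple(float(c) for c in np.round(z * 2) / 2)

def mse_grad(z, x, y):
    """Analytic gradient of the mean squared error of g_eval with respect to z."""
    resid = g_eval(z, x) - y
    return 2.0 * np.array([np.mean(resid * x), np.mean(resid)])

def encode(expr, x, y):
    """Encoder stand-in: re-embeds the decoded expression together with the data."""
    z = np.array(expr, dtype=float)
    # Assumption: one explicit residual correction replaces the learned update.
    return z - 0.25 * mse_grad(z, x, y)

def refine(x, y, n_rounds=5, n_grad=10, lr=0.5):
    """Interleave continuous gradient steps on z with discrete decode/re-encode."""
    z = np.zeros(2)
    for _ in range(n_rounds):
        for _ in range(n_grad):          # continuous phase: descend through g_eval
            z = z - lr * mse_grad(z, x, y)
        expr = g_expr(z)                 # discrete phase: decode to a formula
        z = encode(expr, x, y)           # re-encode formula plus observations
    return g_expr(z), z

x = np.linspace(-1.0, 1.0, 50)
y = 2.0 * x + 0.5                        # noiseless toy target
expr, z = refine(x, y)
print(f"{expr[0]}*x + {expr[1]}")
```

On this noiseless toy data the loop recovers the generating coefficients. The point of the sketch is only the alternation between a differentiable phase and a decode-then-re-encode phase, not the quality of the stand-in components.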
Results
LEE was evaluated on the SRBench benchmark at three noise levels and compared with 19 baseline methods, including genetic programming techniques and symbolic-neural hybrids. The results indicate that LEE produces expressions that are 2 to 10 times simpler than those of the best accuracy-oriented baselines, such as Operon, GP-GOMEA, TPSR, RAG-SR, and GenSR: the expressions generated by LEE had complexity scores of 8 to 11, compared with 20 to 90 for the strongest baselines. LEE's performance also degraded gracefully as noise increased, and the method advances the low-complexity region of the accuracy-complexity Pareto frontier.
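For readers unfamiliar with the term, an accuracy-complexity Pareto frontier is the set of models that no other model beats on both criteria at once. The short Python sketch below shows one way to compute it. The model names and scores are invented for illustration and are not results from the paper, and `pareto_front` is a hypothetical helper, not part of SRBench.

```python
def pareto_front(models):
    """Return names of models not dominated on (lower complexity, higher accuracy)."""
    front = []
    for name, complexity, accuracy in models:
        dominated = any(
            other_c <= complexity and other_a >= accuracy
            and (other_c < complexity or other_a > accuracy)
            for _, other_c, other_a in models
        )
        if not dominated:
            front.append(name)
    return front

# Hypothetical (name, complexity, accuracy) triples, for illustration only.
models = [
    ("A", 9, 0.90),
    ("B", 25, 0.95),
    ("C", 60, 0.97),
    ("D", 40, 0.93),
]
front = pareto_front(models)
print(front)
```

Here model D is dominated by B, which is both simpler and more accurate, so D drops off the frontier. A method that is much simpler at comparable accuracy extends the frontier at its low-complexity end, which is the sense of the claim above.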
Limitations
The authors acknowledge that while LEE improves upon existing methods, it may still struggle with highly complex functions or extreme noise levels that could overwhelm the latent space’s representational capacity. Additionally, the computational overhead associated with iterative refinement may limit scalability for very large datasets or real-time applications. The paper does not address the potential impact of hyperparameter tuning on performance, which could be a significant factor in practical implementations.
Why it matters
LEE's main contribution is to the trade-off between expression complexity and accuracy in symbolic regression. By narrowing the amortization gap through iterative refinement, the framework is relevant to research in interpretable machine learning, where simpler models are often preferred for their explainability. The work bears on domains that require model interpretability, such as scientific modeling, engineering, and finance, where understanding the underlying mathematical relationships is crucial.
Source: arXiv:2605.27245
URL: https://arxiv.org/abs/2605.27245v1
By Turing Wire editorial staff · May 26, 2026
Source: arXiv cs.LG