Eradicating Negative Transfer in Multi-Physics Foundation Models via Sparse Mixture-of-Experts Routing

Ellwil Sharma, Arastu Sharma

Published
May 14, 2026 — 17:58 UTC

Problem
This preprint addresses the challenge of negative transfer in multi-physics foundation models, particularly in the context of scientific machine learning (SciML). The simultaneous co-training of disparate partial differential equation (PDE) regimes, such as open-channel fluid dynamics and boundary-dominated porous media flows, leads to gradient conflicts, unstable optimization, and a loss of plasticity in dense neural operators. The authors highlight the need for a solution that can effectively manage the incompatible spectral and geometric demands of these diverse physical phenomena.
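
As a toy illustration of what a gradient conflict looks like (a sketch under stated assumptions, not taken from the preprint), consider two hypothetical quadratic losses that share one parameter vector but pull it toward different targets. The targets `a` and `b`, the starting point `w`, and every function name below are invented for the example.

```python
import math

def grad_quadratic(w, target):
    """Gradient of 0.5 * ||w - target||^2 with respect to w."""
    return [wi - ti for wi, ti in zip(w, target)]

def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))

def cosine(u, v):
    """Cosine similarity between two gradient vectors."""
    return sum(p * q for p, q in zip(u, v)) / (norm(u) * norm(v))

# Hypothetical shared parameters and two task-specific targets.
w = [0.0, 0.0]
a = [1.0, 0.2]    # stand-in for the optimum preferred by task A
b = [-1.0, 0.3]   # stand-in for the optimum preferred by task B

g_a = grad_quadratic(w, a)
g_b = grad_quadratic(w, b)

# A negative cosine means the two tasks push the shared weights
# in opposing directions.
conflict = cosine(g_a, g_b)

# Averaging the gradients, as dense co-training effectively does,
# cancels most of the signal along the contested axis.
g_mean = [(x + y) / 2.0 for x, y in zip(g_a, g_b)]

print("cosine similarity:", conflict)
print("norms (A, B, mean):", norm(g_a), norm(g_b), norm(g_mean))
```

In this toy setting the averaged update is much shorter than either task's own gradient, which is one simple way to picture why co-training on shared dense weights can stall.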

Method
The authors propose Shodh-MoE, a sparsely activated latent transformer architecture for multi-physics transport. The model operates on compressed 16^3 physical latents produced by a physics-informed autoencoder. The autoencoder's tokenizer uses a Helmholtz-style velocity parameterization so that decoded states lie on a divergence-free velocity manifold. This enforces mass conservation by construction, up to floating-point precision: the reported velocity divergence is approximately 2.8 x 10^-10 when evaluated post hoc in FP64 on 128^3 grids. A Top-1 soft-semantic router dynamically assigns localized latent patches to specialized expert subnetworks, giving distinct parameter paths to distinct physical mechanisms, while shared experts handle universal symmetries. The model was pretrained in a 20,000-step distributed run over mixed three-dimensional physical tensors, and the routing telemetry shows autonomous domain bifurcation.
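
One common way to obtain a divergence-free velocity by construction is to take the curl of a vector potential. The sketch below is a generic illustration of that idea on a small periodic grid, assuming central differences. The article does not specify the preprint's actual parameterization or discretization, so the grid size `N`, the spacing `H`, and all function names are assumptions.

```python
import random

N = 8          # toy grid size per axis (the article's evaluation grids are 128^3)
H = 1.0 / N    # grid spacing on a periodic unit cube

def random_field(rng):
    """Scalar field f[i][j][k] filled with uniform random values."""
    return [[[rng.uniform(-1.0, 1.0) for _ in range(N)]
             for _ in range(N)] for _ in range(N)]

def diff(f, axis):
    """Periodic central difference of f along the given axis."""
    out = [[[0.0] * N for _ in range(N)] for _ in range(N)]
    for i in range(N):
        for j in range(N):
            for k in range(N):
                p, m = [i, j, k], [i, j, k]
                p[axis] = (p[axis] + 1) % N
                m[axis] = (m[axis] - 1) % N
                out[i][j][k] = (f[p[0]][p[1]][p[2]]
                                - f[m[0]][m[1]][m[2]]) / (2.0 * H)
    return out

def combine(f, g, sign):
    """Elementwise f + sign * g."""
    return [[[f[i][j][k] + sign * g[i][j][k] for k in range(N)]
             for j in range(N)] for i in range(N)]

def curl(a):
    """Discrete curl of a vector potential a = (ax, ay, az)."""
    ax, ay, az = a
    return (combine(diff(az, 1), diff(ay, 2), -1.0),
            combine(diff(ax, 2), diff(az, 0), -1.0),
            combine(diff(ay, 0), diff(ax, 1), -1.0))

def divergence(u):
    """Discrete divergence of a velocity field u = (ux, uy, uz)."""
    total = combine(diff(u[0], 0), diff(u[1], 1), 1.0)
    return combine(total, diff(u[2], 2), 1.0)

def max_abs(f):
    return max(abs(v) for plane in f for row in plane for v in row)

rng = random.Random(0)
potential = tuple(random_field(rng) for _ in range(3))
u_free = curl(potential)                                # divergence-free by construction
u_raw = tuple(random_field(rng) for _ in range(3))      # unconstrained velocity

div_free = max_abs(divergence(u_free))
div_raw = max_abs(divergence(u_raw))
print("max |div u|, curl-based:   ", div_free)
print("max |div u|, unconstrained:", div_raw)
```

Because the periodic difference operators commute, the mixed derivatives cancel and the curl-based field has a discrete divergence at round-off level in double precision, whereas the unconstrained field does not. This is the sense in which a constraint built into the parameterization differs from one that is only encouraged by a loss term.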

Results
Shodh-MoE reached latent validation mean squared errors (MSEs) of 2.46 x 10^-5 on the open-channel domain and 9.76 x 10^-6 on the porous-media domain. The corresponding decoded physical MSEs were 2.48 x 10^-6 and 1.76 x 10^-6. The authors report that the model converges in both regimes and outperforms dense architectures without expert routing. They conclude that sparse expert routing substantially reduces multi-physics interference and thereby addresses the negative transfer problem.
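
The routing idea behind these results can be sketched generically: each token goes to the single expert with the highest gate probability, and an always-on shared expert is added on top. This is plain top-1 softmax gating, not the preprint's soft-semantic router, whose details are not given here. The router weights, the toy experts, and the 2-D tokens are all hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(weights, x):
    """Matrix-vector product with the matrix given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def top1_moe(tokens, router_w, experts, shared_expert):
    """Route each token to its single most probable expert.

    Output per token: shared_expert(x) + gate * experts[k](x),
    where k is the top-1 expert and gate is its softmax probability.
    """
    outputs, assignments = [], []
    for x in tokens:
        probs = softmax(matvec(router_w, x))
        k = max(range(len(probs)), key=probs.__getitem__)
        routed = experts[k](x)
        shared = shared_expert(x)
        outputs.append([s + probs[k] * r for s, r in zip(shared, routed)])
        assignments.append(k)
    return outputs, assignments

# Hypothetical toy setup: 2-D tokens whose dominant feature marks the domain.
router_w = [[4.0, 0.0],
            [0.0, 4.0]]
experts = [lambda x: [2.0 * v for v in x],    # stand-in for one specialist
           lambda x: [-1.0 * v for v in x]]   # stand-in for another specialist
shared_expert = lambda x: list(x)             # stand-in for the shared path

tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.2, 0.8]]
outputs, assignments = top1_moe(tokens, router_w, experts, shared_expert)
print("expert per token:", assignments)
```

Only one specialist runs per token, so tokens from different domains update different expert parameters, while the shared path still sees every token.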

Limitations
The authors acknowledge that while Shodh-MoE effectively addresses negative transfer, the reliance on expert subnetworks may introduce additional complexity in model training and deployment. They do not discuss potential scalability issues when extending the model to more complex or higher-dimensional physical systems. Furthermore, the performance on other PDE regimes beyond those tested remains unverified, which could limit the generalizability of the findings.

Why it matters
The work bears directly on the development of universal neural operators in SciML. If sparse expert routing can keep disparate PDE regimes from interfering during co-training, a single model could serve a wider range of scientific domains without the gradient conflicts and unstable optimization seen in dense neural operators. The architecture may also motivate further research on adaptive routing mechanisms in other areas of machine learning, particularly where diverse data modalities are involved.

Source: arXiv:2605.15179
URL: https://arxiv.org/abs/2605.15179v1

Turing Wire

By Turing Wire editorial staff · May 14, 2026

Source: arXiv cs.AI