Deformable Gaussian Occupancy: Decoupling Rigid and Nonrigid Motion with Factorized Distillation
Yang Gao, Wuyang Li, Po-Chien Luan, Alexandre Alahi
- Published
- May 27, 2026 — 15:09 UTC
Problem
This paper addresses the limitations of existing weakly supervised occupancy prediction frameworks that primarily focus on rigid-body motion, which hampers their ability to accurately model dynamic environments with nonrigid agents, such as humans. The authors highlight that current methods rely on simplistic frame-to-frame offsets, which fail to capture fine-grained deformations and maintain temporal coherence. This work is presented as a preprint and has not yet undergone peer review.
Method
The authors introduce Deformable Gaussian Occupancy (DeGO), a novel framework that integrates decoupled Gaussian deformation with a factorized 4D foundation-model distillation approach. DeGO allows for the independent evolution of Gaussian primitives through both deformation and offset-based updates, effectively disentangling rigid and nonrigid motion. The factorized 4D distillation strategy leverages knowledge transfer from the VGGT foundation model, facilitating the extraction of foundation-aligned features that enhance temporal consistency across frames and camera views. The architecture is designed to operate under weak supervision, making it suitable for real-world applications where labeled data is scarce.
Results
DeGO was evaluated on the Occ3D-NuScenes benchmark, where it achieved state-of-the-art performance in weakly supervised settings. Specifically, the method demonstrated a 13.5% improvement in performance on human-centric instances and an overall enhancement of 10.9% compared to existing baselines. These results underscore the effectiveness of the proposed deformation-aware and foundation-guided occupancy modeling in improving dynamic scene understanding.
Limitations
The authors acknowledge that the reliance on weak supervision may limit the generalizability of the model in highly complex scenarios where labeled data could significantly enhance performance. Additionally, the paper does not address the computational overhead introduced by the factorized distillation process, which may impact real-time applications. The scalability of the method to larger datasets and more diverse environments is also not thoroughly explored.
Why it matters
The implications of this work are significant for the field of autonomous driving and dynamic scene understanding. By effectively decoupling rigid and nonrigid motion, DeGO provides a more nuanced approach to occupancy prediction, which is crucial for safe navigation in environments populated by dynamic agents. The integration of foundation-model distillation enhances the model’s robustness and temporal coherence, paving the way for future research that can build upon these advancements. This framework could lead to improved algorithms for real-time scene understanding, ultimately contributing to the development of safer autonomous systems.
Authors: Yang Gao, Wuyang Li, Po-Chien Luan, Alexandre Alahi
Source: arXiv:2605.28587
URL: https://arxiv.org/abs/2605.28587v1
By Turing Wire editorial staff · May 27, 2026 · Editorial standards →
Source: arXiv cs.CV