
Latent Dynamics for Full Body Avatar Animation

Shichong Peng, Chengxiang Yin, Fei Jiang, Zhongshi Jiang, Lingchen Yang, Qingyang Tan

Published
May 20, 2026 — 17:58 UTC

Problem
This paper addresses the limits of existing methods for animating full-body avatars, particularly loose clothing and other dynamic elements whose state cannot be determined from pose alone. Current approaches rely on either explicit garment templates or physics simulation, which are computationally expensive and impractical for real-time applications. Existing data-driven methods that add auxiliary latents to capture variation do not model how those latents evolve over time. This work is a preprint and has not yet undergone peer review.

Method
The authors propose an architecture that augments a pose-conditioned 3D Gaussian avatar with a transformer-based decoder and a dynamics residual latent. This residual latent captures temporal variation in appearance and geometry that the driving pose signal does not explain. The key contribution is a learned latent dynamics model that evolves the residual latent from a short history of poses and the previous latent state. The model decomposes each update into driving, restoring, and dissipative forces, which yields temporally coherent, history-dependent animation. The architecture is trained on nine sequences of everyday motion featuring diverse loose garments. Training compute is not disclosed.
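To make the force decomposition concrete, here is a minimal sketch of a latent update with driving, restoring, and dissipative terms, written as a hand-coded damped oscillator. It is not the authors' model, which is learned. The class name `LatentDynamics`, the function `rollout`, the gain values, the 30 fps time step, the explicit latent velocity, and the choice to give the pose and the latent the same dimension are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Vector = List[float]


@dataclass
class LatentDynamics:
    """Hypothetical hand-written stand-in for a learned latent dynamics model."""

    stiffness: float = 4.0   # restoring force strength (assumed value)
    damping: float = 1.5     # dissipative force strength (assumed value)
    drive_gain: float = 2.0  # driving force strength (assumed value)
    dt: float = 1.0 / 30.0   # time step, assuming 30 frames per second

    def driving_force(self, pose_history: Sequence[Vector]) -> Vector:
        # Drive the latent with the most recent pose change (finite difference).
        # A learned network over the pose history would replace this.
        prev, curr = pose_history[-2], pose_history[-1]
        return [self.drive_gain * (c - p) / self.dt for p, c in zip(prev, curr)]

    def step(
        self, z: Vector, v: Vector, pose_history: Sequence[Vector]
    ) -> Tuple[Vector, Vector]:
        drive = self.driving_force(pose_history)
        restore = [-self.stiffness * zi for zi in z]   # pulls the latent back to rest
        dissipate = [-self.damping * vi for vi in v]   # removes energy over time
        accel = [d + r + c for d, r, c in zip(drive, restore, dissipate)]
        # Semi-implicit Euler: update the velocity first, then the latent.
        v_next = [vi + self.dt * ai for vi, ai in zip(v, accel)]
        z_next = [zi + self.dt * vi for zi, vi in zip(z, v_next)]
        return z_next, v_next


def rollout(model: LatentDynamics, poses: Sequence[Vector], z0: Vector) -> List[Vector]:
    """Evolve the latent over a pose sequence, starting at rest from z0."""
    z, v = list(z0), [0.0] * len(z0)
    trajectory = [z]
    for t in range(1, len(poses)):
        # Two-frame pose history: the previous and the current pose.
        z, v = model.step(z, v, poses[t - 1 : t + 1])
        trajectory.append(z)
    return trajectory


if __name__ == "__main__":
    model = LatentDynamics()
    # Both sequences end in the same pose, but only one moved recently.
    still = [[1.0]] * 5
    moving = [[0.0]] * 4 + [[1.0]]
    print(rollout(model, still, [0.0])[-1])
    print(rollout(model, moving, [0.0])[-1])
```

In this toy version, two pose sequences that end in the same pose produce different latents when their histories differ, which is the behavior a pose-only model cannot express. Scaling `damping` or `stiffness` changes how quickly the latent settles, a rough analogue of adjusting the individual force terms.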

Results
The authors report improved animation quality over recent data-driven baselines. Quantitative metrics reportedly indicate better capture of fine-grained detail and fewer temporal artifacts, and a perceptual user study found that participants preferred the proposed method's animations over those of existing techniques. The abstract provides no exact numbers, so the size of the improvement cannot be assessed from it.

Limitations
The authors acknowledge that their approach may still struggle with extreme poses or highly complex garment interactions absent from the training data. Reliance on a learned latent dynamics model may also limit generalization to unseen clothing types or motion patterns. The paper does not discuss inference-time computational cost, which could be a concern for real-time applications.

Why it matters
This work is relevant to computer graphics and animation, particularly to applications that need realistic avatars in virtual environments, gaming, and augmented reality. Modeling how clothing responds to motion over time, rather than from the current pose alone, could make such avatars more convincing. The decomposition into driving, restoring, and dissipative forces may also give users control over animation parameters. Future research could build on this framework to handle more complex interactions and further improve realism.

Authors: Shichong Peng, Chengxiang Yin, Fei Jiang, Zhongshi Jiang, Lingchen Yang, Qingyang Tan, Amin Jourabloo, Jason Saragih et al.
Source: arXiv:2605.21478
URL: https://arxiv.org/abs/2605.21478v1

Turing Wire

By Turing Wire editorial staff · May 20, 2026

Source: arXiv cs.CV