Driver-WM: A Driver-Centric Traffic-Conditioned Latent World Model for In-Cabin Dynamics Rollout
Haozhuang Chi, Daosheng Qiu, Hao Su, Haochen Liu, Zirui Li, Haoruo Zhang, Chen Lv
- Published: May 6, 2026, 16:30 UTC
- Summary length: 447 words
Problem
This paper addresses a gap in the modeling of driver behavior for automated driving systems, particularly at Level 2 and Level 3 automation. Existing driving world models forecast the external environment but do not model in-cabin dynamics, that is, how the human driver reacts during shared-control transitions. The authors propose Driver-WM, a driver-centric latent world model that conditions internal driver states on external traffic context and supports multi-step rollouts of driver dynamics. The work is a preprint and has not yet undergone peer review.
Method
Driver-WM employs a dual-stream architecture that separately encodes external traffic conditions and internal driver states. The model operates in a compact latent space derived from frozen vision-language features, which keeps the representation small and the rollout computation lightweight. A gated causal injection mechanism couples the two streams directionally: external contextual perturbations modulate the driver stream, while strict temporal causality is preserved. Training optimizes a loss that balances kinematic trajectory forecasting against recognition of the driver's behavioral and emotional states. The authors do not disclose training compute details, but the architecture is designed for robust long-horizon forecasting.
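The abstract does not include code, so the following is a minimal numerical sketch of the two ideas described above: a gated injection of traffic context into a driver latent that only ever uses past traffic states (causal rollout), and a composite loss mixing trajectory regression with state classification. All names, dimensions, and parameter shapes here (`W_gate`, `W_inj`, `W_dyn`, `lam`, latent size 8) are illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # illustrative latent dimension

# Illustrative parameters (assumed names, not from the paper).
W_gate = rng.normal(scale=0.1, size=(2 * D, D))  # gate from [driver; traffic]
W_inj = rng.normal(scale=0.1, size=(D, D))       # traffic -> driver injection
W_dyn = rng.normal(scale=0.1, size=(D, D))       # driver self-dynamics

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rollout(driver0, traffic_seq):
    """Multi-step rollout: the driver latent at step t+1 depends only on
    traffic latents observed up to step t (strict temporal causality)."""
    z = driver0
    states = []
    for c_t in traffic_seq:
        # Gate in (0, 1) decides how strongly traffic context perturbs the driver.
        g = sigmoid(np.concatenate([z, c_t]) @ W_gate)
        # Gated, directional injection: traffic modulates driver, never the reverse.
        z = np.tanh(z @ W_dyn + g * (c_t @ W_inj))
        states.append(z)
    return np.stack(states)

def composite_loss(pred_xy, true_xy, state_logits, state_label, lam=0.5):
    """Balance kinematic forecasting (MSE) against driver-state
    recognition (cross-entropy); the weighting `lam` is an assumption."""
    mse = np.mean((pred_xy - true_xy) ** 2)
    log_probs = state_logits - np.log(np.sum(np.exp(state_logits)))
    ce = -log_probs[state_label]
    return mse + lam * ce

driver0 = rng.normal(size=D)
traffic = rng.normal(size=(5, D))  # 5 steps of traffic latents
traj = rollout(driver0, traffic)
print(traj.shape)  # (5, 8)
```

Because the update for step t+1 reads only `c_t` and earlier, swapping in a different future traffic segment cannot change already-emitted driver states, which is the causality property the paper's injection mechanism is said to enforce.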
Results
Driver-WM is evaluated on a multi-task assistive driving benchmark, where it outperforms baselines in long-horizon geometric forecasting for reactive high-motion maneuvers and improves semantic alignment for both driver and traffic states. Specific numbers are not given in the abstract, but the authors emphasize the approach's effectiveness at predicting driver behavior in complex driving scenarios.
Limitations
The authors acknowledge that the model's reliance on frozen vision-language features may limit its adaptability in dynamic environments where real-time feature extraction is critical. The paper also leaves open how the approach scales to diverse driving conditions and whether additional sensory modalities can be integrated. As the work has not been peer reviewed, the robustness and generalizability of the findings remain to be confirmed.
Why it matters
The development of Driver-WM has significant implications for the future of automated driving systems, particularly in enhancing the safety and reliability of human-machine interactions during shared control. By effectively modeling driver dynamics in response to external traffic conditions, this work paves the way for more sophisticated in-cabin intelligence systems that can anticipate and react to driver behavior. This could lead to improved user experience and safety in semi-autonomous vehicles, as well as inform future research on human-centered AI in transportation.
Authors: Haozhuang Chi, Daosheng Qiu, Hao Su, Haochen Liu, Zirui Li, Haoruo Zhang, Chen Lv
Source: arXiv:2605.05092
URL: https://arxiv.org/abs/2605.05092v1