Major agents robotics

SimuScene: Simulation-Ready Compositional 3D Scene Reconstruction from a Single Image

Inhee Lee, Sangwon Baik, Sungjoo Kim, Hyeonwoo Kim, Hyunsoo Cha, Hanbyul Joo

Published: Jun 2, 2026 — 17:59 UTC

Problem
The paper addresses the challenge of reconstructing interactive, simulation-ready 3D scenes from a single image, a critical bottleneck for robotic manipulation. Existing methods, while capable of generating plausible per-object shapes, often result in scenes that fail under physical simulation due to geometric inaccuracies, such as interpenetration and improper object support. Current physics-aware approaches treat these issues as post-hoc corrections, failing to resolve the underlying geometric errors. This work is a preprint and has not undergone peer review.

Method
SimuScene introduces a compositional 3D reconstruction pipeline that integrates physics into the shape and layout estimation process. The core innovation lies in using a physics engine not just for layout correction but as a diagnostic tool during the generative phase. The pipeline employs a feedback loop where reconstructed objects are simulated under gravity, allowing for the identification of penetration and support failures. These failures are quantified and used to inform corrections, specifically through gravity-axis stretching and amodal shape resampling. The architecture leverages a combination of neural networks for shape generation and a physics engine for real-time feedback, although specific details on the network architecture and training compute are not disclosed.

Results
SimuScene demonstrates state-of-the-art performance on benchmarks for physical stability and geometric alignment. The authors report significant improvements over baseline methods, achieving a 20% reduction in penetration errors and a 15% increase in support stability metrics compared to leading approaches. Additionally, the reconstructed environments were successfully deployed in humanoid control and robot-arm manipulation tasks, showcasing practical utility in real-world applications.

Limitations
The authors acknowledge that while SimuScene improves geometric accuracy and physical stability, it may still struggle with highly complex scenes or occlusions that are not well-represented in the training data. Furthermore, the reliance on a physics engine may introduce computational overhead, which could limit real-time applications. The paper does not address the scalability of the method to larger or more diverse datasets, nor does it explore the impact of varying object types and materials on reconstruction quality.

Why it matters
The integration of physics into the generative process of 3D scene reconstruction represents a significant advancement in the field, potentially enabling more robust robotic manipulation and interaction with complex environments. By addressing the limitations of existing methods, SimuScene paves the way for future research in physics-informed generative models, enhancing the realism and utility of reconstructed scenes in various applications. This work is particularly relevant for researchers focusing on robotics and computer vision, as published in arXiv cs.CV.

By Callan Zhang · Jun 2, 2026 · Editorial standards →

Summarised from the primary source with AI assistance under human editorial oversight. Turing Wire is not a primary source — read the original for the authoritative account.

Source: arXiv cs.CV