Notable agents robotics

Anchor3R: Streaming 3D Reconstruction with Transient Anchors for Long-Horizon Visual Mapping

Peilin Tao, Chong Cheng, Yuansen Du, Caiwei Song, Zhengqing Chen, Xiaoyang Guo

Published
Jun 3, 2026 — 16:00 UTC

Problem
The paper addresses the limitations of existing streaming 3D reconstruction models, which typically rely on a fixed coordinate system tied to the initial frame or a persistent scene memory. This approach results in train-test mismatches, attention biases towards early anchors, and accumulated drift in long sequences, particularly in scenarios requiring continuous camera motion and scene geometry estimation. The authors highlight that current methods struggle with long-horizon visual mapping under constraints of memory and computation, which is critical for effective robot perception. This work is presented as a preprint and has not undergone peer review.

Method
Anchor3R proposes a novel framework that redefines the streaming 3D reconstruction task as a current-centric local measurement prediction problem. Instead of relying on a global-gauge regression, the model predicts window-relative poses and a local pointmap in the coordinate system of the current frame. This approach allows for real-time pose updates and the generation of relative-pose measurements. The architecture integrates loop-closure reinsertion and motion averaging techniques to align trajectories and transform local pointmaps into a coherent global reconstruction. The authors do not disclose specific details regarding the training compute or the exact architecture, but they emphasize the framework’s ability to operate under bounded memory constraints.

Results
Anchor3R demonstrates significant improvements over existing streaming baselines across various benchmarks, including indoor, outdoor, driving, and RGB-D datasets. The paper reports that Anchor3R achieves a 15% increase in long-horizon pose accuracy compared to the best-performing baseline, while also enhancing dense reconstruction quality by 20%. These results indicate a substantial reduction in drift and improved coherence in the reconstructed 3D maps, showcasing the effectiveness of the transient anchor approach in practical applications.

Limitations
The authors acknowledge that while Anchor3R improves long-horizon mapping, it may still face challenges in highly dynamic environments where rapid scene changes could affect pose estimation accuracy. Additionally, the reliance on local measurements may introduce limitations in scenarios requiring extensive global context. The paper does not address potential computational overhead introduced by the loop-closure and motion averaging processes, which could impact real-time performance in resource-constrained settings.

Why it matters
The implications of this work are significant for advancing the field of robot perception, particularly in applications requiring robust long-horizon visual mapping. By shifting the paradigm from fixed global references to transient local measurements, Anchor3R opens new avenues for improving the accuracy and reliability of 3D reconstruction in real-time scenarios. This approach could enhance autonomous navigation, augmented reality, and other applications reliant on precise spatial understanding, as published in arXiv cs.CV.

Turing Wire

By Turing Wire editorial staff · Jun 3, 2026 · Editorial standards →

Source: arXiv cs.CV