Global Structure-from-Motion Meets Feedforward Reconstruction
Linfei Pan, Johannes Schönberge, Marc Pollefeys
- Published
- May 25, 2026 — 17:58 UTC
Problem
This paper addresses the limitations of existing Structure-from-Motion (SfM) techniques, particularly in challenging scenarios such as low texture, limited overlap, and symmetrical scenes. While recent feedforward 3D reconstruction methods have shown promise in these conditions, they often lack scalability, accuracy, and robustness compared to classical SfM methods. The authors identify a gap in the literature where a hybrid approach that leverages the strengths of both classical and feedforward methods has not been thoroughly explored. This work is presented as a preprint and has not yet undergone peer review.
Method
The authors propose a novel SfM pipeline that integrates classical SfM techniques with feedforward reconstruction methods. The architecture combines the geometric robustness of classical methods with the efficiency of feedforward approaches. The proposed system employs a two-stage process: first, it utilizes classical SfM to estimate camera poses and initial 3D points, followed by a feedforward neural network that refines the 3D reconstruction. The loss function is designed to minimize both reprojection error and 3D point cloud density, ensuring a balance between accuracy and computational efficiency. The authors disclose that the training of the feedforward model was conducted on a diverse set of datasets, although specific compute resources are not detailed.
Results
The proposed method achieves state-of-the-art performance on several benchmark datasets, including the ETH3D and DTU datasets. Notably, the hybrid approach outperforms classical SfM methods by a significant margin, achieving a 15% improvement in reconstruction accuracy on the ETH3D dataset and a 20% reduction in error on the DTU dataset compared to the best-performing classical methods. The authors also report that their method maintains competitive performance in low-texture and symmetrical scenarios, where traditional methods typically struggle.
Limitations
The authors acknowledge several limitations in their work. First, the hybrid approach may require careful tuning of hyperparameters to achieve optimal performance across different datasets. Additionally, while the method shows improvements in challenging scenarios, it may still not match the efficiency of purely feedforward methods in standard conditions. The authors also note that the computational overhead introduced by the two-stage process could be a concern for real-time applications. An obvious limitation not discussed is the potential for overfitting in the feedforward model, particularly if the training datasets do not adequately represent the diversity of real-world scenes.
Why it matters
This work has significant implications for the field of computer vision, particularly in applications requiring robust 3D reconstruction under challenging conditions. By successfully merging classical and feedforward methods, the authors provide a framework that could enhance the reliability of SfM in practical scenarios, such as robotics, augmented reality, and cultural heritage documentation. The open-source implementation of their system allows for further exploration and adaptation by the research community, potentially leading to new advancements in 3D reconstruction techniques.
Authors: Linfei Pan, Johannes Schönberge, Marc Pollefeys
Source: arXiv:2605.26103
URL: https://arxiv.org/abs/2605.26103v1
By Turing Wire editorial staff · May 25, 2026 · Editorial standards →
Source: arXiv cs.CV