Notable multimodal

GeM-NR: Geometry-Aware Multi-View Editing for Nonrigid Scene Changes

Josef Bengtson, Yaroslava Lochman, Fredrik Kahl

Published
Jun 3, 2026 — 17:49 UTC

Problem
Existing multi-view image editing techniques primarily focus on rigid transformations or appearance-only modifications, which limits their applicability to nonrigid scene changes. While some methods target specific tasks like object removal or addition, they do not generalize well to substantial geometric alterations. This paper addresses the gap in the literature regarding flexible, general-purpose editing methods capable of handling nonrigid scene changes. Notably, this work is presented as a preprint and has not undergone peer review.

Method
GeM-NR employs a training-free approach to achieve multi-view consistent image editing. The method consists of three main stages:

  1. Depth Map Estimation: A novel strategy is introduced to maximize the alignment between the 3D point clouds of the edited and unedited scenes, ensuring accurate depth representation.
  2. Projection: The method projects the edited scene onto the viewpoint of the query image, facilitating the integration of the edit into the new perspective.
  3. Refinement: The final stage involves conditioning the output image on the unedited query, enhancing the visual coherence of the edit.

GeM-NR is designed to scale effectively from two views to multiple views, making it versatile for various editing tasks. The backbone editors used for the anchor image can include FLUX, Qwen, or BrushNet, allowing for flexibility in the choice of initial edits.

Results
The authors demonstrate that GeM-NR significantly improves consistency across a range of editing tasks. Quantitative evaluations indicate state-of-the-art performance in edit quality, geometric accuracy, and photometric consistency when compared to existing baselines. Specific metrics and benchmarks are not disclosed in the abstract, but the results suggest a marked improvement over traditional methods, particularly in handling edits that involve drastic changes in both geometry and appearance.

Limitations
The authors acknowledge that while GeM-NR excels in many scenarios, it may still face challenges in extremely complex scenes where depth estimation could be ambiguous. Additionally, the reliance on an anchor image edited by a backbone editor may introduce biases based on the quality of the initial edit. The paper does not address potential computational costs associated with depth estimation and projection, which could impact real-time applications.

Why it matters
GeM-NR represents a significant advancement in the field of multi-view image editing, particularly for applications requiring nonrigid scene modifications. Its training-free nature and ability to maintain consistency across multiple views make it a valuable tool for 3D content generation and customization. This work opens avenues for further research into flexible editing frameworks and could inspire new methodologies in generative modeling and scene reconstruction, as discussed in related literature. For more details, refer to the full paper available on arXiv.

Turing Wire

By Turing Wire editorial staff · Jun 3, 2026 · Editorial standards →

Source: arXiv cs.AI