Notable other

SeeGroup: Multi-Layer Depth Estimation of Transparent Surfaces via Self-Determined Grouping

Hongyu Wen, Jia Deng

Published
May 27, 2026 — 16:55 UTC

Problem
This paper addresses the limitations of existing multilayer depth estimation methods for transparent surfaces, which typically rely on predefined grouping strategies that impose a sequential ordering of 3D points. Such methods are restrictive as they do not account for the inherent ambiguity in the layering of transparent objects. The authors propose SeeGroup, a novel approach that allows for adaptive grouping of surfaces without predefined constraints. This work is presented as a preprint and has not yet undergone peer review.

Method
SeeGroup formulates multilayer depth estimation as a point process, treating depth layers as unordered events along each camera ray. This approach leverages a permutation-invariant likelihood function, which enables the model to learn arbitrary groupings of depth layers directly from the data. The architecture is not explicitly detailed in the abstract, but the method’s core contribution lies in its loss function, which supports flexible layer assignments. The training process involves optimizing this loss over a dataset of transparent surfaces, although specific details regarding the training compute and dataset size are not disclosed.

Results
The authors report a significant improvement in quadruplet relative depth accuracy on the LayeredDepth benchmark, achieving an accuracy of 70.09%, up from 61.34% with previous state-of-the-art methods. This represents a notable effect size, indicating that SeeGroup outperforms existing techniques by a substantial margin. The results suggest that the adaptive grouping mechanism effectively captures the complexities of multilayer depth estimation for transparent surfaces.

Limitations
The authors acknowledge that their method may struggle with extreme occlusions or highly complex scenes where depth layering is particularly ambiguous. Additionally, they do not discuss the computational efficiency of their approach, which could be a concern for real-time applications. The lack of detailed architectural specifications and training parameters also limits reproducibility and understanding of the model’s performance in various contexts.

Why it matters
The implications of SeeGroup extend to various applications in computer vision, particularly in augmented reality, robotics, and scene understanding, where accurate depth perception of transparent objects is crucial. By allowing for adaptive grouping of depth layers, this method could enhance the performance of systems that rely on multilayer depth information, paving the way for more robust and flexible depth estimation techniques in complex environments.

Authors: Hongyu Wen, Jia Deng
Source: arXiv:2605.28735
URL: https://arxiv.org/abs/2605.28735v1

Turing Wire

By Turing Wire editorial staff · May 27, 2026 · Editorial standards →

Source: arXiv cs.CV