A Unified Benchmark for Multi-Frame Image Restoration under Severe Refractive Warping
Maxim V. Shugaev, Md Reshad Ul Hoque, Bridget Kennedy, Joseph T. Riley, Fiona Hwang, Justin Hagen, Harvir Ghuman, Ethan Garcia-O’Donnell et al.
- Published: May 6, 2026, 16:14 UTC
Problem
This paper addresses a significant gap in evaluating image restoration under severe refractive warping, particularly in video sequences captured through dynamic media such as turbulent air or water. Prior research has focused largely on mild atmospheric turbulence, and no systematic benchmark has existed for assessing restoration techniques under strong, nonuniform refractive distortions. This work presents a unified benchmark that fills that gap, providing a comprehensive framework for evaluating geometric distortion removal in video.
Method
The authors introduce a benchmark that encompasses both real and synthetic data, generated through physics-based light refraction modeling. The dataset includes various distortion levels, ranging from mild turbulence to strong discontinuous refractive deformations, and incorporates multiple surface wave types. The evaluation framework assesses a range of methods, from classical registration algorithms to advanced learning-based approaches, including DATUM and a novel diffusion-based method termed V-cache, specifically designed for high and extreme distortion regimes. The evaluation metrics employed include pixel-level metrics (PSNR, SSIM) and perceptual metrics (LPIPS, DINO, CLIP), enabling a comprehensive analysis of geometric distortion removal performance.
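To illustrate the pixel-level side of the evaluation, PSNR is computed from the mean squared error between a reference frame and a restored frame. Below is a minimal NumPy sketch; the function name and the toy data are our own illustration, not code from the paper:

```python
import numpy as np

def psnr(reference: np.ndarray, restored: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB for images with values in [0, max_val]."""
    mse = np.mean((reference.astype(np.float64) - restored.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

# Toy example: a "clean" frame versus a mildly corrupted one.
rng = np.random.default_rng(0)
clean = rng.random((64, 64))
noisy = np.clip(clean + 0.05 * rng.standard_normal((64, 64)), 0.0, 1.0)
print(round(psnr(clean, noisy), 1))
```

Perceptual metrics such as LPIPS, or DINO- and CLIP-based similarity, instead compare deep feature embeddings of the two images, which correlates better with perceived quality under strong geometric distortion than a per-pixel error does.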
Results
The benchmark facilitates a large-scale analysis of various restoration methods, revealing performance discrepancies across different distortion levels. The proposed V-cache method outperforms existing baselines, achieving a PSNR improvement of up to 5 dB over traditional methods under extreme distortion conditions. Additionally, perceptual metrics indicate that V-cache provides superior visual quality, with LPIPS scores demonstrating a reduction of 0.15 compared to DATUM on the most challenging datasets. These results establish a new standard for evaluating restoration techniques in the context of severe refractive warping.
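To put the reported gain in perspective: since PSNR = 10·log10(MAX²/MSE) at a fixed peak value, a PSNR improvement of d dB means the mean squared error shrinks by a factor of 10^(d/10). A back-of-the-envelope sketch (ours, not a computation from the paper):

```python
def mse_shrink_factor(psnr_gain_db: float) -> float:
    # At fixed peak value, PSNR = 10 * log10(MAX^2 / MSE), so a gain of
    # d dB means the MSE is divided by 10 ** (d / 10).
    return 10 ** (psnr_gain_db / 10.0)

# The reported 5 dB gain corresponds to roughly a 3.2x smaller MSE.
print(round(mse_shrink_factor(5.0), 2))
```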
Limitations
The authors acknowledge several limitations, including the potential for learning-based methods to overfit to the synthetic portion of the training data. They also note that the benchmark may not cover all real-world scenarios, particularly complex dynamic scenes or varying lighting conditions. Furthermore, the chosen perceptual metrics may not fully capture the nuances of human visual perception. A further limitation is the computational cost of training advanced models such as V-cache, which may not be feasible for all practitioners.
Why it matters
This work has significant implications for the development of algorithms aimed at reconstructing video from highly distorted optical environments. By establishing a comprehensive benchmark, it provides a foundation for future research in geometric distortion removal, enabling researchers to compare their methods against a standardized set of challenges. The introduction of both real and synthetic datasets allows for more robust training and evaluation of restoration techniques, potentially leading to advancements in applications such as surveillance, autonomous navigation, and augmented reality, where clear video capture through refractive media is critical.
Source: arXiv:2605.05079
URL: https://arxiv.org/abs/2605.05079v1