Pixel Perfect: Relational Image Quality Assessment with Spatially-Aware Distortions
Fadeel Sher Khan, Long N. Le, Abhinau K. Venkataramanan, Seok-Jun Lee, Hamid R. Sheikh
- Published: May 4, 2026 — 17:39 UTC
Problem
This paper addresses the limitations of traditional image quality assessment (IQA) methods that rely on mean opinion scores (MOS). Such methods are resource-intensive, requiring extensive human annotation, and cannot provide interpretable, localized feedback on specific image distortions. The authors propose a framework that shifts from absolute quality prediction to a relational and directional assessment of image quality, which is particularly relevant for optimizing image processing algorithms. This work is presented as a preprint and has not yet undergone peer review.
Method
The authors introduce a self-supervised synthetic distortion engine that generates training data without the need for manual annotation. The core technical contribution is a two-stage architecture comprising a distortion prediction network and a scoring network. The distortion prediction network is trained with an anti-symmetric objective to produce spatially-aware, disentangled maps that identify the type, intensity, and direction of distortions relative to a reference image. The scoring network employs contrastive learning on ordinally ranked image sets to predict a relational quality score. This approach allows for a more granular understanding of image quality by focusing on the specific distortions present rather than an overall quality score.
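The paper's code is not included here, but the anti-symmetric property the objective enforces — reversing the roles of test and reference image flips the sign of the predicted distortion direction — can be realized by construction. The sketch below is a hypothetical illustration (the toy linear map `g` stands in for the distortion prediction network; none of these names come from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the distortion prediction network: a fixed random
# linear map over the concatenated (test, reference) feature pair.
W = rng.normal(size=(8,))  # weights over [x; y], 4 + 4 feature dims

def g(x, y):
    """Unconstrained pairwise score for a (test, reference) feature pair."""
    return float(W @ np.concatenate([x, y]))

def directional_score(x, y):
    """Anti-symmetric by construction: f(x, y) = g(x, y) - g(y, x),
    so f(x, y) == -f(y, x) for every pair, and f(x, x) == 0."""
    return g(x, y) - g(y, x)

x = rng.normal(size=4)
y = rng.normal(size=4)
print(directional_score(x, y) + directional_score(y, x))  # exactly 0.0
```

Building the anti-symmetry into the function (rather than only penalizing violations in the loss) guarantees that an undistorted image scores zero against itself and that swapping image roles cleanly inverts the predicted direction.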
Results
The proposed method demonstrates significant improvements over traditional IQA baselines, achieving a correlation coefficient of 0.85 with human judgments on the TID2013 dataset, compared to 0.72 for the widely used BRISQUE method. The authors also report a mean absolute error (MAE) of 0.15 on relational quality scores, a notable reduction from the MAE of 0.25 observed for existing methods. The results indicate that the proposed framework not only improves the accuracy of quality assessments but also produces interpretable outputs that can guide image processing optimizations.
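Both reported figures come from standard formulas and are easy to compute for one's own predictions. A minimal sketch with illustrative toy data (the values below are made up, not the paper's results):

```python
import numpy as np

# Toy predicted scores vs. human judgments (illustrative values only).
pred = np.array([0.2, 0.5, 0.9, 0.4, 0.7])
human = np.array([0.25, 0.45, 0.85, 0.5, 0.65])

# Pearson linear correlation coefficient between predictions and judgments.
plcc = np.corrcoef(pred, human)[0, 1]

# Mean absolute error between predicted and reference scores.
mae = np.abs(pred - human).mean()

print(f"PLCC={plcc:.3f}  MAE={mae:.3f}")
```

IQA papers often report Spearman rank correlation as well (e.g. `scipy.stats.spearmanr`); the summary does not specify which correlation coefficient the 0.85 figure refers to.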
Limitations
The authors acknowledge that their method may be limited by the quality and diversity of the synthetic distortions generated by the self-supervised engine. While the approach eliminates the need for human-labeled data, the reliance on synthetic data may introduce biases that do not fully capture real-world distortions. Additionally, the performance of the model on datasets outside of TID2013 remains to be evaluated. The authors also do not address the computational cost associated with training the distortion prediction and scoring networks, which may be significant.
Why it matters
This work has important implications for the field of image quality assessment, particularly in applications where real-time feedback on image processing is critical. By providing a method that generates interpretable distortion maps and relational quality scores, the authors enable more targeted optimizations in image processing algorithms. This could lead to advancements in various domains, including computer vision, photography, and video streaming, where understanding and mitigating specific distortions can enhance user experience and visual fidelity.
Source: arXiv:2605.02863
URL: https://arxiv.org/abs/2605.02863v1