Notable other

SemDINO: A DINOv3-Driven Network for Cross-Temporal Semantic Alignment in Change Detection

Xinyu Tong, Meihua Zhou, Jinxiao Sun, Yingjie Tang, Lei Wang

Published
Jun 8, 2026 — 17:32 UTC

Problem
The paper addresses the limitations of existing semantic change detection (SCD) methods, which struggle with cross-temporal alignment, multi-scale representation, and robustness to pseudo-changes induced by factors such as illumination, seasonal variations, and registration noise. The authors highlight that current approaches often fail to effectively identify and localize land-cover changes while maintaining semantic accuracy. This work is presented as a preprint and has not undergone peer review.

Method
The proposed SemDINO architecture is an end-to-end semantic change detection network that integrates several innovative components. It features a dual-branch encoder that combines a convolutional neural network (CNN) backbone with frozen DINOv3 features through gated pyramid fusion, facilitating rich multi-scale semantic representation. The architecture includes a multi-scale temporal bidirectional transformer interaction (M-TBTT) module designed for global cross-temporal feature alignment and information interaction. To enhance the detection of genuine changes while mitigating pseudo-variations, the authors introduce three key modules: semantic purification (SCP), bidirectional change enhancement (BiChangeEnhance), and multi-scale change enhancement (MCE). The final output is generated through a multi-branch change detection (CD) prediction head that produces a binary change mask, bi-temporal semantic maps, and edge constraints.

Results
SemDINO was evaluated on several public remote sensing change detection datasets, demonstrating superior performance compared to state-of-the-art methods. The results indicate that SemDINO achieves significant improvements in accuracy, particularly in complex scenarios with interference factors. Specific performance metrics include an increase in F1 scores by up to 5% over leading baselines, with notable enhancements in precision and recall rates. The authors provide detailed comparisons against established benchmarks, showcasing the robustness and generalization capabilities of SemDINO across diverse datasets.

Limitations
The authors acknowledge that while SemDINO shows promising results, it may still be sensitive to extreme variations in environmental conditions that were not extensively covered in the training data. Additionally, the reliance on a frozen DINOv3 backbone may limit adaptability to rapidly evolving semantic categories. The paper does not address the computational efficiency of the model during inference, which could be a concern for real-time applications.

Why it matters
The advancements presented in SemDINO have significant implications for the field of remote sensing and environmental monitoring, particularly in applications requiring accurate land-cover change detection. By improving cross-temporal alignment and robustness to pseudo-changes, this work paves the way for more reliable and effective monitoring of environmental changes, which is crucial for urban planning, disaster management, and ecological studies. The methodologies and findings can inform future research in semantic segmentation and change detection, as published in arXiv cs.CV.

Turing Wire

By Turing Wire editorial staff · Jun 8, 2026 · Editorial standards →

Source: arXiv cs.CV