Notable other

Bridging Day and Night: Unsupervised Cross-Domain Re-Identification with Synergistic Prompt and Prototype Learning

Jiyang Xu, Rui Liu, Hang Dai

Published: Jun 10, 2026 — 16:01 UTC

Problem
The paper addresses the challenge of cross-domain day-night re-identification (ReID), which suffers from significant visual discrepancies between daytime and nighttime images. Existing fully supervised methods require extensive manual annotations, which are costly and limit generalization across domains. This work proposes an unsupervised approach to ReID, filling a gap in the literature regarding effective identity association without labeled data. The authors note that this is a preprint and has not undergone peer review.

Method
The proposed framework employs a two-stage training strategy. In the first stage, it utilizes a vision-language model to generate instance-specific textual prompts in an annotation-free manner. This is achieved through an instance-level alignment mechanism that embeds visual features and textual prompts into a unified semantic space. The authors introduce instance-aware dynamic-bias adaptation to align unlabeled day and night images with learnable prompts. In the second stage, the framework constructs domain-specific prototype memory banks and incorporates two modules: (i) an intra-domain identity association module that enhances feature discriminability within each domain, and (ii) a cross-domain prototype matching module that identifies positive and negative prototype pairs, establishing robust identity correspondences across day and night.

Results
The proposed method demonstrates strong performance on public benchmarks, achieving Rank-1 accuracy that is competitive with state-of-the-art fully supervised methods. Specific performance metrics are not disclosed in the abstract, but the authors emphasize that their unsupervised approach matches or exceeds the accuracy of existing supervised techniques, indicating a significant advancement in the field.

Limitations
The authors acknowledge that their method may still struggle with extreme variations in lighting conditions and scene context that are not adequately captured by the learned prototypes. Additionally, the reliance on a vision-language model for prompt generation may introduce biases based on the training data of the language model. The paper does not discuss the computational efficiency of the proposed method or the scalability of the prototype memory banks in real-world applications.

Why it matters
This work has significant implications for the development of unsupervised learning techniques in computer vision, particularly in scenarios where labeled data is scarce or expensive to obtain. The synergistic approach of combining prompt learning with prototype-based representation could inspire further research into cross-domain learning and identity association tasks. The findings contribute to the ongoing discourse on reducing reliance on supervised methods in ReID, as highlighted in related literature on unsupervised learning strategies, as published in arXiv cs.CV.

By Callan Zhang · Jun 10, 2026 · Editorial standards →

Summarised from the primary source with AI assistance under human editorial oversight. Turing Wire is not a primary source — read the original for the authoritative account.

Source: arXiv cs.CV