Major foundation models

RAID: Semantic Graph Diffusion for True Cold-Start and Cross-Lingual Forecasting

Arunkumar V, Manoranjan Gandhudi, Gangadharan G. R., Arun Prakash, S. Senthilkumar

Published: Jun 15, 2026 — 16:27 UTC

Problem
The paper addresses a limitation of existing time-series forecasting models in true cold-start scenarios, where new items have no historical data. Traditional models learn from historical correlations, which are unavailable when no prior observations exist. The work is a preprint and has not yet undergone peer review.

Method
RAID (Retrieval-Augmented Iterative Diffusion) replaces history-based correlation learning with metadata-driven semantic retrieval and graph-conditioned diffusion. A frozen multilingual embedding model maps textual metadata into a shared semantic space, from which an inductive retrieval graph is built. Because the graph is built from metadata rather than history, it lets the model extend to unseen items. Forecasting starts with a base forecast that aggregates information from semantically related neighbors. A gated diffusion module then refines this forecast by modeling residual uncertainty. Decoding is non-autoregressive, which speeds up inference.
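The retrieval and base-forecast steps can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the toy two-dimensional vectors standing in for the frozen multilingual embeddings, the cosine top-k retrieval, and the softmax weighting with a temperature are all assumptions made for illustration. The gated diffusion refinement is omitted.

```python
import numpy as np

def cosine_top_k(query, item_embs, k):
    """Return indices and cosine similarities of the k items closest to query."""
    q = query / np.linalg.norm(query)
    m = item_embs / np.linalg.norm(item_embs, axis=1, keepdims=True)
    sims = m @ q
    idx = np.argsort(-sims)[:k]  # highest similarity first
    return idx, sims[idx]

def base_forecast(query, item_embs, item_series, k=2, temperature=0.1):
    """Hypothetical base forecast: softmax-weighted mean of neighbour series.

    The weighting scheme is an assumption; the source only says the base
    forecast aggregates information from semantically related neighbours.
    """
    idx, sims = cosine_top_k(query, item_embs, k)
    w = np.exp((sims - sims.max()) / temperature)  # numerically stable softmax
    w /= w.sum()
    return w @ item_series[idx], idx

# Toy stand-ins for metadata embeddings of three known items.
item_embs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
# Known future values for those items (3 items, horizon of 3 steps).
item_series = np.array([
    [10.0, 12.0, 14.0],
    [20.0, 22.0, 24.0],
    [100.0, 90.0, 80.0],
])

# A new item with no history: only its metadata embedding is available.
new_item = np.array([1.0, 0.05])
forecast, neighbours = base_forecast(new_item, item_embs, item_series)
print("neighbours:", neighbours)
print("base forecast:", forecast)
```

In this toy setup the new item is retrieved against items 0 and 1, whose embeddings point in nearly the same direction, so the dissimilar item 2 does not influence the forecast. A refinement stage, such as the gated diffusion module described above, would then operate on the residual around this base forecast.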

Results
RAID is reported to outperform strong foundation models and competitive baselines under a strict true cold-start protocol. Specifically, it improves forecasting accuracy by 15% over the best baseline on the M4 benchmark and improves prediction interval coverage by 20%. It also reduces inference latency by an order of magnitude compared to autoregressive models, which makes it better suited to real-time applications. The framework further enables zero-shot cross-lingual transfer: models trained on English data generalize effectively to other languages.

Limitations
The authors acknowledge that RAID’s reliance on a frozen multilingual embedding model may limit its adaptability to rapidly changing metadata. The performance gains are also demonstrated mainly in controlled environments, so real-world results may vary. The paper does not address how the model scales to extremely large datasets or how embedding quality affects overall performance.

Why it matters
RAID is relevant to both time-series forecasting and cross-lingual applications. By addressing true cold-start scenarios, it opens the way to deploying forecasting models in domains where historical data is scarce, such as new product launches or emerging markets. Generalizing across languages without direct supervision also makes the model more useful in global applications. The work points toward further research on integrating semantic understanding with time-series analysis. The preprint is available on arXiv.

By Callan Zhang · Jun 15, 2026

Summarised from the primary source with AI assistance under human editorial oversight. Turing Wire is not a primary source — read the original for the authoritative account.

Source: arXiv cs.AI