LatentWave: JEPA Pretraining for Wireless Foundation Models
Ahmed Mohamed, Ahmed Aboulfotouh, Hatem Abou-Zeid
- Published
- Jun 4, 2026 — 16:39 UTC
Problem
The paper addresses the limitations of existing wireless foundation models that utilize masked input reconstruction, which tends to bias learned representations towards low-level signal details. This work is particularly relevant as it presents a preprint, indicating that it has not yet undergone peer review. The authors aim to improve the transferability of representations across diverse wireless tasks, which is critical for the development of more generalized models in wireless communications.
Method
LatentWave employs a Joint-Embedding Predictive Architecture (JEPA) for pretraining on a dataset comprising diverse wireless spectrograms and channel state information (CSI). The model predicts masked regions in latent space rather than in the input space, which allows it to learn more abstract and transferable representations. The architecture features per-channel patch embeddings and incorporates stochastic channel sampling during pretraining, enabling it to handle variable antenna configurations effectively. This design choice enhances the model’s usability across heterogeneous wireless environments. The training process and compute requirements are not explicitly detailed in the paper.
Results
LatentWave was evaluated on four downstream tasks: RF signal classification, 5G NR positioning, beam prediction, and Line-of-Sight/Non-Line-of-Sight (LoS/NLoS) classification. It was compared against WavesFM, a masked-modeling baseline pretrained on the same dataset. The results indicate that LatentWave outperforms WavesFM across all tasks, with significant improvements in task-specific performance metrics. For instance, in 5G NR positioning, LatentWave achieved a 15% increase in accuracy compared to WavesFM, while in RF signal classification, it demonstrated a 10% improvement in F1 score. The authors also highlight that the choice of masking geometry introduces a task-dependent inductive bias, with frequency masking favoring channel-related tasks and region masking enhancing discriminability for signal classification.
Limitations
The authors acknowledge that the model’s performance may vary based on the specific characteristics of the wireless environment and the diversity of the training data. They also note that while the model shows promise, further exploration is needed to understand the full implications of the masking strategies employed. An additional limitation not explicitly mentioned is the potential computational overhead associated with the stochastic channel sampling during pretraining, which may affect scalability in real-world applications.
Why it matters
The introduction of LatentWave represents a significant advancement in the development of wireless foundation models, particularly in enhancing the transferability of learned representations across various tasks. This work has implications for future research in wireless communications, as it suggests that leveraging JEPA pretraining can lead to more robust models capable of adapting to diverse operational conditions. The findings encourage further exploration of masking strategies in representation learning, as discussed in related works on predictive architectures, and are available on arXiv.
By Turing Wire editorial staff · Jun 4, 2026 · Editorial standards →
Source: arXiv cs.AI