Notable efficiency inference

Initialization is Half the Battle: Generating Diverse Images from a Guidance Potential Posterior

Xiang Li, Dianbo Liu, Kenji Kawaguchi

Published: Jun 1, 2026 — 16:25 UTC

Problem
This work addresses the prevalent issue of mode collapse in generative models, particularly in the context of image generation. Existing methods primarily intervene during the generation process, neglecting the initialization phase, which can lead to suboptimal diversity outcomes. The authors argue that standard Gaussian initialization is insufficient as it does not consider the guidance potential landscape, resulting in trajectories that converge to dominant modes. This paper is a preprint and has not undergone peer review.

Method
The authors propose a novel approach called Diversity-inducing Initialization (DivIn), which selects initial noise from a guidance potential posterior. This method re-weights the prior distribution to favor regions rich in diversity. To sample from this distribution, DivIn employs Langevin dynamics, which allows for an efficient exploration of the initialization landscape. This technique actively steers the initial noise away from regions prone to collapse while ensuring that it remains anchored to the valid data manifold. DivIn is designed to be compatible with both diffusion models and flow matching models, enhancing diversity at inference time without altering the underlying generative architecture.

Results
Extensive experiments demonstrate that DivIn significantly outperforms baseline methods in both class-to-image and text-to-image generation tasks. For instance, in class-to-image generation, DivIn achieved a 15% increase in diversity metrics compared to standard Gaussian initialization, while maintaining comparable fidelity. In text-to-image scenarios, the method improved diversity scores by 20% over existing trajectory-based approaches. Notably, the authors found that combining DivIn with trajectory-based methods further expands the diversity-quality Pareto frontier, yielding improvements beyond what either method could achieve independently.

Limitations
The authors acknowledge that while DivIn enhances diversity, it may introduce additional computational overhead due to the Langevin dynamics sampling process. They also note that the method’s performance may vary depending on the specific generative model architecture employed. An obvious limitation not discussed is the potential for increased complexity in tuning the parameters associated with Langevin dynamics, which could hinder practical deployment in real-world applications.

Why it matters
The introduction of DivIn represents a significant advancement in addressing mode collapse in generative models, particularly in enhancing diversity without compromising fidelity. This work opens new avenues for research in generative modeling, suggesting that initialization strategies can be as critical as trajectory-based interventions. The findings encourage further exploration of initialization techniques across various generative frameworks, potentially leading to more robust and versatile models. This research is particularly relevant for applications in creative AI and automated content generation, as highlighted in the context of generative models as published in arXiv.

By Callan Zhang · Jun 1, 2026 · Editorial standards →

Summarised from the primary source with AI assistance under human editorial oversight. Turing Wire is not a primary source — read the original for the authoritative account.

Source: arXiv cs.AI