Notable training methods

A2D2: Fine-Tuning Any-Length Discrete Diffusion for Adaptive Decoding

Sophia Tang, Yuchen Zhu, Molei Tao, Pranam Chatterjee

Published: Jun 11, 2026 — 16:50 UTC

Problem
The paper addresses the gap in reward-guided fine-tuning for any-length discrete diffusion models, a topic that remains largely unexplored in the literature. Existing methods primarily focus on fixed-length sequences, limiting the adaptability and performance of sequence generation tasks. The authors propose a novel framework, A2D2, to facilitate principled fine-tuning in adaptive decoding scenarios, which is particularly relevant given the increasing demand for flexible sequence generation in various applications.

Method
A2D2 employs a unified framework for reward-guided fine-tuning of discrete diffusion models by optimizing both insertion and unmasking policies concurrently. The authors derive the Radon-Nikodym derivative for joint insertion-unmasking path measures, which allows for theoretically guaranteed convergence to the reward-tilted sequence distribution without the need for target samples. The framework introduces the Adaptive Joint Decoding (AJD) loss, which minimizes decoding error by optimizing the quality of both unmasking and insertion processes. The training process leverages a quality-based inference schedule, enhancing the model’s ability to generate sequences of varying lengths effectively.

Results
Empirical evaluations demonstrate that A2D2 significantly outperforms prior methods on several benchmarks. Specifically, the framework achieves a 15% improvement in reward optimization metrics compared to fixed-length fine-tuning approaches. Additionally, A2D2 shows a 20% increase in generation accuracy over existing inference-time guidance methods. These results highlight the effectiveness of the AJD loss in producing high-quality sequences while maintaining flexibility in length.

Limitations
The authors acknowledge that the proposed method may require substantial computational resources for training, particularly when optimizing both insertion and unmasking policies simultaneously. Additionally, the reliance on the Radon-Nikodym derivative may introduce complexity in implementation, which could be a barrier for practitioners. The paper does not address potential scalability issues when applied to very large datasets or extremely long sequences, which could affect the generalizability of the results.

Why it matters
The A2D2 framework represents a significant advancement in the field of sequence generation, particularly for applications requiring adaptive decoding capabilities. By enabling effective reward-guided fine-tuning of any-length discrete diffusion models, this work opens new avenues for research in flexible sequence generation, potentially impacting areas such as natural language processing and generative modeling. The implications of this research are profound, as it lays the groundwork for future studies to explore more complex adaptive decoding strategies, as published in arXiv.

By Callan Zhang · Jun 11, 2026 · Editorial standards →

Summarised from the primary source with AI assistance under human editorial oversight. Turing Wire is not a primary source — read the original for the authoritative account.

Source: arXiv cs.LG