Notable training methods

A Unifying Lens on Supervised Fine-Tuning Through Target Distribution Design

Tong Xie, Yuanhao Ban, Yunqi Hong, Sohyun An, Yihang Chen, Cho-Jui Hsieh

Published
Jun 9, 2026 — 17:59 UTC

Problem
The paper addresses the limitations of traditional supervised fine-tuning (SFT) methods, which typically maximize the likelihood of observed tokens in a trajectory. This approach can be suboptimal due to issues such as non-unique, noisy, or misaligned tokens with the model’s prior knowledge. The authors argue that existing literature does not adequately explore the implications of target distribution design in SFT, presenting a gap in understanding how to effectively leverage the rich knowledge encoded in pretrained models. This work is a preprint and has not undergone peer review.

Method
The authors propose the Q-target framework, which reinterprets SFT as a problem of target distribution design. This framework decomposes the SFT supervision into two explicit components: (1) the degree of reliance on the observed token and (2) the allocation of remaining probability mass over alternative tokens. By framing SFT in this manner, the authors unify various existing SFT variants as implicit choices of the target distribution Q. They introduce a new method called Target-SFT, which constructs the training objective directly from the desired target distribution. The paper does not disclose specific training compute requirements but emphasizes the flexibility of the target distribution design in enhancing SFT performance.

Results
Target-SFT was evaluated across ten reasoning dataset-model settings, consistently outperforming baseline methods. While specific performance metrics are not detailed in the abstract, the authors claim significant improvements over traditional SFT approaches, indicating that the target-based methodology leads to better alignment with the underlying knowledge encoded in pretrained models. The results suggest that the proposed framework can effectively enhance model performance in various reasoning tasks.

Limitations
The authors acknowledge that their approach may require careful tuning of the target distribution parameters, which could introduce additional complexity in the training process. They do not discuss potential scalability issues or the computational overhead associated with implementing the Q-target framework. Furthermore, the generalizability of the findings across different model architectures and tasks remains to be fully explored.

Why it matters
This work has significant implications for the design of supervised fine-tuning strategies, suggesting that a more nuanced approach to target distribution can lead to improved model performance. By framing SFT as a target distribution design problem, the authors open up new avenues for research into SFT objectives, potentially leading to more robust and effective fine-tuning methods. This perspective aligns with ongoing efforts in the field to leverage pretrained models more effectively, as discussed in related literature (see arXiv). The findings encourage further exploration of target distribution strategies in SFT, which could enhance the capabilities of AI systems across various applications.

Turing Wire

By Turing Wire editorial staff · Jun 9, 2026 · Editorial standards →

Source: arXiv cs.AI