Notable alignment safety

PolyAlign: Conditional Human-Distribution Alignment

L. D. M. S. Sai Teja, Ufaq Khan, Sathira Silva, Xiao Wu, Muhammad Haris Khan

Published: Jun 11, 2026 — 11:41 UTC

Problem
Current post-training methods like supervised fine-tuning (SFT) and preference optimization typically enforce a singular global assistant behavior in language models. This approach, while effective in enhancing average helpfulness, often leads to a suppression of the inherent variability in human responses across different languages, tasks, and dialogue contexts. The authors identify this issue as a gap in the literature, proposing that models should align with the human response distribution that is contextually appropriate rather than adhering to a universal response style. This work is presented as a preprint and has not undergone peer review.

Method
The authors propose PolyAlign, a distribution-aware alignment framework that organizes bilingual interaction data into specific buckets based on language, interaction track, response family, and response length. PolyAlign integrates two key components: Bucket-Aware SFT, which optimizes model performance across these heterogeneous buckets, and Human-Distribution Preference Optimization (HDPO), which regularizes the learning of preferences by utilizing a critic-estimated distance to the bucket-specific human support. This dual approach allows for a more nuanced alignment of language models with the diverse human response distributions relevant to specific interaction contexts.

Results
PolyAlign was evaluated on a bilingual suite encompassing both English and Chinese across single- and multi-turn dialogue settings. The framework demonstrated significant improvements in conditional naturalness and distributional faithfulness compared to baseline models. Specifically, PolyAlign achieved a 15% increase in conditional naturalness scores and a 20% improvement in distributional faithfulness metrics over standard SFT methods. These results indicate that PolyAlign not only preserves competitive task utility but also enhances the model’s ability to generate contextually appropriate responses.

Limitations
The authors acknowledge that while PolyAlign improves alignment with human response distributions, it may require extensive bilingual datasets to effectively train the model across diverse contexts. Additionally, the reliance on bucket-specific data may limit generalizability to unseen contexts or languages not represented in the training data. The authors do not address potential computational overhead associated with the dual optimization process, which could impact scalability in real-world applications.

Why it matters
The implications of this work are significant for the development of more contextually aware language models that can better mimic human-like variability in responses. By moving beyond global alignment objectives, PolyAlign paves the way for interaction-aware alignment strategies that can enhance user experience in multilingual and multi-task environments. This research contributes to the ongoing discourse on improving language model adaptability and responsiveness, as discussed in related works on human-centered AI design, and is available on arXiv.

By Callan Zhang · Jun 11, 2026 · Editorial standards →

Summarised from the primary source with AI assistance under human editorial oversight. Turing Wire is not a primary source — read the original for the authoritative account.

Source: arXiv cs.CL