Notable reasoning

From Reasoning Traces to Reusable Modules: Understanding Compositional Generalization in Language Model Reasoning

Lingjing Kong, Xin Liu, Guangyi Chen, Martin Q. Ma, Xiangchen Song, Yuekai Sun

Published: Jun 16, 2026 — 15:55 UTC

Problem
This work addresses a gap in understanding how large language models (LLMs) achieve compositional generalization in reasoning tasks. Post-training pipelines that combine supervised fine-tuning (SFT) and reinforcement learning (RL) have shown promise, but the mechanisms behind their success remain underexplored. The authors propose a theoretical framework that formalizes compositional generalization, which is key to improving LLM reasoning. This paper is a preprint and has not undergone peer review.

Method
The authors introduce a hierarchical latent selection model in which reasoning traces are generated through a cascade of discrete latent selection variables. These variables correspond to reusable atomic modules of two kinds: skills (local operations) and routing (the selection and composition of intermediate information). Within this framework, SFT supplies the raw material for modules in the form of compositional traces, while RL decomposes those traces to identify and reuse the latent atomic modules. The authors test these theoretical claims in controlled experiments and report that the results support the framework's account of compositional generalization.
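To make the generative story concrete, here is a toy sketch in Python. It is not the authors' formulation: the three integer skills, the two-input routing rule, and the names `sample_trace` and `execute` are hypothetical choices for illustration. Each step makes two discrete selections, one for a skill and one for routing (which intermediate values the skill reads), and appends its output to a shared pool so that later steps can compose earlier results.

```python
import random
from dataclasses import dataclass

# Hypothetical "skill" modules: local operations on two integers.
SKILLS = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
    "sub": lambda a, b: a - b,
}


@dataclass(frozen=True)
class Step:
    skill: str                # which skill the selection variable picked
    inputs: tuple[int, int]   # routing: pool indices the skill reads from
    output: int               # value appended to the pool


def sample_trace(inputs, depth, rng):
    """Sample a trace via a cascade of discrete selections (skill, then routing)."""
    if len(inputs) < 2:
        raise ValueError("need at least two inputs to route from")
    pool = list(inputs)
    trace = []
    for _ in range(depth):
        skill = rng.choice(sorted(SKILLS))       # skill selection
        i, j = rng.sample(range(len(pool)), 2)   # routing selection
        out = SKILLS[skill](pool[i], pool[j])
        trace.append(Step(skill, (i, j), out))
        pool.append(out)                         # later steps may reuse this value
    return trace


def execute(plan, inputs):
    """Replay a plan of (skill, (i, j)) choices and return each step's output."""
    pool = list(inputs)
    outputs = []
    for skill, (i, j) in plan:
        out = SKILLS[skill](pool[i], pool[j])
        outputs.append(out)
        pool.append(out)
    return outputs


# Recombining the same modules in a new order:
print(execute([("add", (0, 1)), ("mul", (2, 3))], [2, 3, 5]))  # → [5, 25]
```

Because a trace in this toy is just a sequence of (skill, routing) choices, the same atomic modules can be replayed in new orders with `execute`, which loosely mirrors the recombination the framework attributes to RL.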

Results
The experiments indicate that RL extracts atomic modules from the compound traces supplied through SFT, allowing the model to recombine these modules to solve novel configurations. Notably, training on compound traces yields better generalization than training on isolated atomic modules alone. The abstract reports improved generalization but does not give specific numbers or baseline comparisons. The authors also identify an effective training protocol: SFT ensures broad coverage of atomic modules through compositional traces, while RL explores novel compositions beyond the SFT support.
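The protocol can be illustrated with a small coverage check, again a hypothetical sketch rather than the authors' experimental setup. Compositions are modeled here as fixed-length tuples of named atoms (an assumption; in the paper the modules are latent), and the hypothetical `split_by_support` sorts every possible composition into those seen in the SFT data, novel ones built only from covered atoms (the space the protocol leaves to RL), and ones that use an atom the SFT data never covered.

```python
from itertools import product

# Hypothetical atomic modules; in the paper these are latent, not named.
ATOMS = ("add", "mul", "sub")


def atoms_in(compositions):
    """Return every atom that appears in at least one composition."""
    return {atom for comp in compositions for atom in comp}


def split_by_support(train, length):
    """Sort all length-`length` compositions by their relation to the training set.

    seen:      compositions present in the (SFT) training data
    novel:     unseen compositions built only from covered atoms
    uncovered: compositions using at least one atom absent from training
    """
    covered = atoms_in(train)
    seen, novel, uncovered = [], [], []
    for comp in product(ATOMS, repeat=length):
        if comp in train:
            seen.append(comp)
        elif set(comp) <= covered:
            novel.append(comp)
        else:
            uncovered.append(comp)
    return seen, novel, uncovered


train = {("add", "mul"), ("mul", "add")}  # never uses "sub"
seen, novel, uncovered = split_by_support(train, length=2)
print(novel)           # → [('add', 'add'), ('mul', 'mul')]
print(len(uncovered))  # → 5
```

If SFT covers every atom, the `uncovered` list is empty and every unseen composition is a candidate for recombination; in this toy, any composition left in `uncovered` cannot be assembled from seen atoms alone.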

Limitations
The authors acknowledge that their framework relies on the assumption that SFT can adequately cover all atomic modules, which may not hold in all scenarios. Additionally, the paper does not address potential scalability issues when applying this model to larger, more complex reasoning tasks. The lack of detailed quantitative results in the abstract limits the ability to fully assess the performance improvements over existing baselines.

Why it matters
This research could inform the development of more robust reasoning in LLMs, particularly on tasks that require compositional generalization. By formalizing the interplay between SFT and RL, the authors offer a clearer account of how to combine these techniques to improve performance. The work lays groundwork for further study of modular reasoning in LLMs. The preprint is available on arXiv (cs.LG).

By Callan Zhang · Jun 16, 2026

Summarised from the primary source with AI assistance under human editorial oversight. Turing Wire is not a primary source — read the original for the authoritative account.

Source: arXiv cs.LG