
Tracing the ongoing emergence of human-like reasoning in Large Language Models

Paolo Morosi, Nikoleta Pantelidou, Fritz Günther, Elena Pagliarini, Evelina Leivada

Published
May 20, 2026 — 15:28 UTC

Problem
This preprint examines how the reasoning of Large Language Models (LLMs) compares with human reasoning on conditional inferences. LLMs reach human-like performance on many tasks, but the authors ask whether they also draw the pragmatic inferences that humans make naturally. The study aims to establish how far LLMs reason beyond the literal interpretation of a statement, an aspect of human cognition that the existing literature has not thoroughly explored.

Method
The authors ran a population-matching experiment with 25 LLMs and an equal number of human participants across four languages. The core contribution is a comparative analysis of how the two groups compute conditional inferences: the evaluation contrasts adherence to truth-table logic with the ability to incorporate pragmatic inferences. The study does not disclose the architectures, loss functions, or training compute of the LLMs tested, but it emphasizes that behavior varies from model to model.
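To make the contrast between truth-table logic and pragmatic enrichment concrete, the following is a minimal sketch, not the authors' evaluation code. It assumes that "truth-table logic" means the material conditional and uses a biconditional reading ("if and only if", often called conditional perfection) as a stand-in for pragmatic enrichment; the summary does not say which enrichments the preprint tested. All function and variable names are illustrative.

```python
from itertools import product


def material(p, q):
    """Truth-table reading of 'if p then q': false only when p is true and q is false."""
    return (not p) or q


def biconditional(p, q):
    """Assumed enriched reading: 'if p then q' understood as 'q if and only if p'."""
    return p == q


# Four classic conditional inferences: name -> (minor premise, conclusion).
INFERENCES = {
    "modus ponens": (lambda p, q: p, lambda p, q: q),
    "modus tollens": (lambda p, q: not q, lambda p, q: not p),
    "affirming the consequent": (lambda p, q: q, lambda p, q: p),
    "denying the antecedent": (lambda p, q: not p, lambda p, q: not q),
}


def is_valid(reading, minor, conclusion):
    """An inference is valid if the conclusion holds in every row of the
    truth table where both the conditional and the minor premise are true."""
    return all(
        conclusion(p, q)
        for p, q in product([True, False], repeat=2)
        if reading(p, q) and minor(p, q)
    )


def endorsement_table():
    """For each inference, report whether each reading licenses it."""
    return {
        name: {
            "truth_table": is_valid(material, minor, conclusion),
            "enriched": is_valid(biconditional, minor, conclusion),
        }
        for name, (minor, conclusion) in INFERENCES.items()
    }


if __name__ == "__main__":
    for name, row in endorsement_table().items():
        print(f"{name:26s} truth_table={row['truth_table']!s:5s} enriched={row['enriched']}")
```

Under these assumptions, both readings license modus ponens and modus tollens, while only the enriched reading licenses affirming the consequent and denying the antecedent. A responder that rejects the latter two is following the truth table; one that accepts them is reading more into the conditional than its literal content.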

Results
Humans enrich logical reasoning with pragmatic inferences, whereas LLM performance is heterogeneous. Some models adhere strictly to truth-table logic and ignore pragmatic nuances; others apply a consistent interpretation that reflects rule-based processing but does not match human reasoning. The results indicate that LLMs are competent semantic operators that fail to capture the pragmatic enrichments typical of human reasoning. The authors note that LLM accuracy does not correlate with model characteristics such as open vs. closed status, training orientation, or architecture type, which suggests that pragmatic reasoning is still emerging in artificial systems.
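One way to picture the response patterns described here is to compare a responder's endorsements against reference profiles. The sketch below is purely illustrative and is not the authors' analysis: the two profiles, the inference labels (MP, MT, AC, DA for modus ponens, modus tollens, affirming the consequent, denying the antecedent), and the agreement score are all assumptions, and no data from the preprint is used.

```python
# Hypothetical reference profiles: which inferences each reading endorses.
PROFILES = {
    # Strict truth-table (material conditional) reading.
    "truth_table": {"MP": True, "MT": True, "AC": False, "DA": False},
    # Assumed enriched reading that treats the conditional as a biconditional.
    "enriched": {"MP": True, "MT": True, "AC": True, "DA": True},
}


def agreement(responses, profile):
    """Fraction of inference types on which the responder matches the profile."""
    return sum(responses[key] == profile[key] for key in profile) / len(profile)


def closest_profile(responses):
    """Return the best-matching profile label and all agreement scores.

    The label is 'mixed' when no single profile matches best.
    """
    scores = {name: agreement(responses, prof) for name, prof in PROFILES.items()}
    top = max(scores.values())
    winners = [name for name, score in scores.items() if score == top]
    label = winners[0] if len(winners) == 1 else "mixed"
    return label, scores


if __name__ == "__main__":
    # Made-up responder that accepts only the logically valid inferences.
    example = {"MP": True, "MT": True, "AC": False, "DA": False}
    print(closest_profile(example))
```

A real analysis would work with graded accuracy over many items and languages rather than four yes/no answers; the point of the sketch is only that "strict truth-table" and "enriched" responders are distinguishable by which invalid-but-tempting inferences they accept.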

Limitations
The authors acknowledge that the study is limited by the selection of LLMs and the specific languages tested, so the findings may not generalize to other models or linguistic contexts. The lack of detailed information about the models' architectures and training processes may also hinder reproducibility and deeper analysis. Finally, the study does not examine what these findings mean for practical applications of LLMs in real-world scenarios.

Why it matters
The research bears on the development of LLMs and their use in applications that require nuanced understanding and reasoning. By showing where current models fall short of human-like pragmatic reasoning, the study points to LLM architectures and training methodologies as areas for further work. Understanding the gaps between human and machine reasoning can guide future research toward systems whose reasoning on complex tasks aligns more closely with human cognition.

Source: arXiv:2605.21299
URL: https://arxiv.org/abs/2605.21299v1

Turing Wire

By Turing Wire editorial staff · May 20, 2026

Source: arXiv cs.CL