Chinese sensorimotor and embodiment norms for 3,000 lexicalized concepts
Jing Chen, Gábor Parti, Yin Zhong, Chu-Ren Huang, Marco Marelli
- Published May 21, 2026, 15:29 UTC
Problem
This preprint addresses a gap in normative resources for studying how conceptual knowledge is grounded in sensorimotor experience, particularly for non-Indo-European languages such as Mandarin Chinese. Existing resources are limited in scope and dimensionality, which constrains empirical work on the relationship between language and embodiment, both in cognitive science and in artificial intelligence systems. The authors present a normative database for 3,000 lexicalized Mandarin concepts, intended to support research in embodied AI and cognitive science.
Method
The core contribution is a normative database of 11-dimensional sensorimotor ratings and unidimensional embodiment ratings for 3,000 lexicalized concepts, collected from 378 native Mandarin speakers. The 11 dimensions span sensorimotor modalities such as visual, auditory, and tactile experience. The authors validate the reliability of the ratings and establish their cross-norm validity against existing Chinese resources. They also introduce a theoretically motivated metric, Perceptual Strength of Embodiment (PSE), and evaluate it alongside seven common composite variables as predictors of lexical decision performance. In an exploratory analysis, they use regression models to test how well sensorimotor ratings can be recovered from linguistic representations, obtaining a mean Spearman correlation of r = .62 across dimensions.
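Composite variables collapse a word's 11 sensorimotor ratings into a single score. The sketch below computes a few composites that are common in the sensorimotor-norms literature, including the Minkowski-3 distance from the origin. It is illustrative only: it assumes ratings on a 0 to 5 scale and uses hypothetical dimension labels, it does not reproduce the authors' PSE metric (whose definition is not given in this summary), and the four composites shown are not necessarily among the seven the authors tested.

```python
import numpy as np

# Hypothetical dimension labels; the norms' actual 11 dimensions may be named differently.
DIMENSIONS = [
    "auditory", "gustatory", "haptic", "interoceptive", "olfactory", "visual",
    "foot_leg", "hand_arm", "head", "mouth", "torso",
]


def minkowski_norm(ratings: np.ndarray, p: float) -> np.ndarray:
    """Minkowski distance of each rating vector from the origin: (sum |x_i|^p)^(1/p)."""
    return np.power(np.sum(np.abs(ratings) ** p, axis=-1), 1.0 / p)


def composites(ratings: np.ndarray) -> dict[str, np.ndarray]:
    """Collapse an (n_words, n_dims) rating matrix into per-word composite scores."""
    return {
        "max_strength": ratings.max(axis=-1),        # strongest single dimension
        "summed_strength": ratings.sum(axis=-1),     # total across all dimensions
        "euclidean": minkowski_norm(ratings, 2.0),   # vector length (p = 2)
        "minkowski3": minkowski_norm(ratings, 3.0),  # p = 3 weights strong dimensions more
    }


# Toy ratings for two hypothetical words on an assumed 0-5 scale:
# word 0 is strong on two dimensions, word 1 is weak but uniform on all 11.
toy = np.array([
    [3.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [1.0] * len(DIMENSIONS),
])
scores = composites(toy)
for name, values in scores.items():
    print(name, np.round(values, 3))
```

The two toy words trade places depending on the composite: the uniform word has the larger summed strength (11 vs. 7) but the smaller Minkowski-3 score (about 2.22 vs. 4.50), which is why the choice of composite can change which words count as strongly grounded.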
Results
The PSE-Sensorimotor and Minkowski-3 composite variables are the strongest predictors of lexical decision performance, indicating that sensorimotor information significantly facilitates lexical processing. An exploratory analysis shows that sensorimotor ratings can be substantially recovered from linguistic representations, with visual and auditory dimensions recovered better than chemosensory ones. A representational similarity analysis (RSA) further indicates that the relational geometry of the sensorimotor space is partially recoverable (r = .540), supporting the hypothesis that distributional patterns of language use encode aspects of embodied conceptual structure.
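The recoverability and RSA analyses summarized above can be illustrated on synthetic data. This sketch rests on assumptions the summary does not confirm: that word embeddings are mapped to the 11 ratings with cross-validated ridge regression (a swapped-in choice of regression model), that recovery is scored as the Spearman correlation between observed and out-of-fold predicted ratings for each dimension, and that the RSA compares Euclidean distance matrices of observed versus predicted ratings. The data, dimensionalities, and `alpha` value are hypothetical, and the printed numbers say nothing about the paper's r = .62 or r = .540.

```python
import numpy as np


def rank(x: np.ndarray) -> np.ndarray:
    """Ranks 0..n-1 (ties broken arbitrarily; adequate for continuous data)."""
    order = np.argsort(x)
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(len(x))
    return ranks


def spearman(a: np.ndarray, b: np.ndarray) -> float:
    """Spearman correlation computed as the Pearson correlation of ranks."""
    return float(np.corrcoef(rank(a), rank(b))[0, 1])


def ridge_fit(X: np.ndarray, Y: np.ndarray, alpha: float) -> np.ndarray:
    """Closed-form ridge, W = (X^T X + alpha I)^-1 X^T Y, for centered X and Y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ Y)


def cross_validated_predictions(X, Y, alpha=1.0, k=5, seed=0):
    """Out-of-fold predictions: each word is predicted by a model that never saw it."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), k)
    preds = np.empty_like(Y)
    for test_idx in folds:
        train = np.setdiff1d(np.arange(len(X)), test_idx)
        x_mean, y_mean = X[train].mean(axis=0), Y[train].mean(axis=0)
        W = ridge_fit(X[train] - x_mean, Y[train] - y_mean, alpha)
        preds[test_idx] = (X[test_idx] - x_mean) @ W + y_mean
    return preds


def pairwise_distances(M: np.ndarray) -> np.ndarray:
    """Euclidean distances between all row pairs i < j (the RDM's upper triangle)."""
    diffs = M[:, None, :] - M[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))
    i, j = np.triu_indices(len(M), k=1)
    return dist[i, j]


# Synthetic stand-ins: 300 "words", 50-d "embeddings", 11 rating dimensions.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 50))
true_W = rng.normal(size=(50, 11))
Y = X @ true_W + rng.normal(scale=5.0, size=(300, 11))

preds = cross_validated_predictions(X, Y, alpha=10.0)
per_dim = [spearman(Y[:, d], preds[:, d]) for d in range(Y.shape[1])]
rsa = spearman(pairwise_distances(Y), pairwise_distances(preds))
print("mean per-dimension Spearman:", round(float(np.mean(per_dim)), 3))
print("RSA (Spearman between distance matrices):", round(rsa, 3))
```

Using out-of-fold predictions matters here: scoring a regression on the same words it was fit to would overstate how much sensorimotor information the linguistic representations actually carry.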
Limitations
The authors acknowledge that, although extensive, the database covers 3,000 concepts, which may not capture the full breadth of the Mandarin lexicon. Recovery of sensorimotor ratings from linguistic representations also varies across dimensions, suggesting that some modalities, such as the chemosensory ones, are harder to model from text. The reliance on regression models may also oversimplify the complexity of sensorimotor knowledge representation. Finally, as a preprint, the work has not yet undergone peer review, so its findings have not been independently vetted.
Why it matters
This research has implications for both cognitive science and embodied AI. A large-scale normative database enables further empirical work on how sensorimotor experience shapes language processing and conceptual understanding. The findings could also inform AI systems that integrate embodied knowledge, potentially improving their performance on natural language understanding tasks. Finally, the work opens avenues for cross-linguistic studies of embodiment and conceptual structure, contributing to a broader understanding of language and cognition.
Source: arXiv:2605.22616
URL: https://arxiv.org/abs/2605.22616v1
By Turing Wire editorial staff · May 21, 2026
Source: arXiv cs.CL