
The Algorithmic Caricature: Auditing LLM-Generated Political Discourse Across Crisis Events

Gunjan, Sidahmed Benabderrahmane, Talal Rahwan

Published: May 12, 2026, 17:42 UTC
Summary length: 429 words
Relevance score: 80%

Problem
This preprint addresses the gap in understanding the behavioral characteristics of synthetic political discourse generated by Large Language Models (LLMs) during crisis events. Existing detection methods primarily focus on sentence-level features such as perplexity and token irregularities, which may become less effective as generative models improve. The authors propose a novel approach that evaluates the social realism of LLM-generated text by comparing it to actual discourse from social media during significant political crises.

Method
The authors construct a paired corpus comprising 1,789,406 posts across nine crisis events, including COVID-19, the January 6 Capitol attack, and the 2020 and 2024 U.S. elections. They analyze the generated discourse against observed social media posts using four dimensions: emotional intensity, structural regularity, lexical-ideological framing, and cross-event dependency. The analysis employs mean gaps and dispersion metrics to quantify differences. The study introduces the “Caricature Gap,” a simple event-level measure that encapsulates the discrepancies between synthetic and observed discourse. The methodology emphasizes a Computational Social Science perspective, focusing on population-level realism rather than traditional text-detection metrics.
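The paper's abstract does not give a formula for the Caricature Gap, so the following is only a minimal sketch of how an event-level measure built from per-dimension mean gaps and dispersion gaps might look. The function name, the array layout (one column per audit dimension, e.g. emotional intensity or structural regularity), and the aggregation into a single standardized score are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def caricature_gap(synthetic, observed):
    """Hypothetical event-level gap between synthetic and observed posts.

    `synthetic` and `observed` are arrays of shape (n_posts, n_dims),
    one column per audit dimension (e.g. emotional intensity,
    structural regularity). Returns per-dimension mean gaps,
    per-dimension dispersion gaps, and one aggregate score.
    """
    synthetic = np.asarray(synthetic, dtype=float)
    observed = np.asarray(observed, dtype=float)

    # Mean gap: how far the synthetic average drifts per dimension.
    mean_gap = synthetic.mean(axis=0) - observed.mean(axis=0)
    # Dispersion gap: negative values mean synthetic text varies less.
    dispersion_gap = synthetic.std(axis=0) - observed.std(axis=0)

    # Standardize by the observed spread so dimensions are comparable,
    # then average absolute gaps into one event-level number.
    scale = observed.std(axis=0) + 1e-12
    aggregate = np.abs(mean_gap / scale).mean()
    return mean_gap, dispersion_gap, aggregate
```

Under this sketch, a larger aggregate score would correspond to a stronger "caricature" effect for an event; comparing scores across the nine crisis events would then reproduce the kind of event-level ranking the paper describes.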

Results
The findings reveal that while LLM-generated discourse is fluent, it lacks population-level realism. Specifically, synthetic texts exhibit more negative sentiment, reduced emotional variation, and greater structural regularity than observed discourse, and their lexical framing is more abstract and less contextually specific. The Caricature Gap quantifies these differences, showing that the discrepancies are more pronounced in fast-moving, decentralized crises (e.g., the BLM protests) and smaller in formal, institutionally structured events (e.g., U.S. elections). The authors report quantitative evidence of these trends, although specific numerical results are not detailed in the abstract.

Limitations
The authors acknowledge that their approach may not capture all dimensions of discourse realism and that the Caricature Gap is a simplification. They do not address potential biases in the selection of crisis events or the representativeness of the social media posts used for comparison. Additionally, the study does not explore the implications of these findings on the broader societal impact of LLM-generated content, nor does it consider the potential for adversarial manipulation of the models.

Why it matters
This work has significant implications for the development of more robust auditing frameworks for LLM-generated content, particularly in politically sensitive contexts. By highlighting the limitations of synthetic discourse in mimicking real-world emotional and structural characteristics, the study encourages further research into enhancing the realism of generated texts. It also suggests that population-level auditing could serve as a complementary tool to existing detection methods, fostering a deeper understanding of the societal impacts of AI-generated political discourse.

Source: arXiv:2605.12452
URL: https://arxiv.org/abs/2605.12452v1

Turing Wire
Author: Turing Wire editorial staff