Data Journalist Agent: Transforming Data into Verifiable Multimodal Stories
Kevin Qinghong Lin, Batu EI, Yuhong Shi, Pan Lu, Philip Torr, James Zou
- Published
- Jun 9, 2026 — 17:51 UTC
Problem
The paper addresses the gap in automated journalism by proposing a system that can generate complete news articles from raw data, a task traditionally requiring extensive human effort and expertise. Existing agents excel in isolated tasks such as data analysis or visual design but lack the capability to function as an integrated data journalist. This work is a preprint and has not undergone peer review.
Method
The authors introduce the Data Journalist Agent (Data2Story), a multi-agent framework that orchestrates various specialized roles to simulate a virtual newsroom. The framework comprises two key innovations: (i) an Inspector module that ensures claims made in the articles are evidence-grounded, linking every assertion back to the original data, code, or external references; and (ii) a multimodal generative approach that allows the system to create articles that incorporate various media types, such as interactive maps and audio, rather than relying solely on text and static visuals. The training and architecture specifics are not disclosed, but the evaluation involves generating 18 articles, each compared to expert-written counterparts.
Results
Data2Story was evaluated across four axes: (a) human-agent angle coverage, (b) rubric evaluation by 53 participants across five dimensions, (c) performance of computer-use agents as judges to simulate reader navigation of interactive articles, and (d) verifiability through a coding verifier that checks claims against data and references. The results indicate that Data2Story produces competitive multimedia stories, particularly excelling in transparency and auditability. However, human-written articles still outperform in editorial angle, creative design, and overall presentation.
Limitations
The authors acknowledge that while Data2Story demonstrates strengths in generating evidence-based content, it still lags behind human journalists in creative aspects and nuanced editorial decisions. Additionally, the evaluation is limited to 18 articles, which may not fully represent the system’s capabilities across diverse topics or formats. The reliance on participant evaluations may introduce subjective biases, and the lack of peer review raises questions about the robustness of the findings.
Why it matters
The implications of this work are significant for the future of journalism, as it positions Data2Story as a potential collaborator for journalists, enhancing the efficiency and verifiability of news reporting. By automating the generation of evidence-based stories, it could facilitate more transparent and accountable journalism practices. This research contributes to the ongoing discourse on the role of AI in media, as discussed in related works on automated content generation and verification, as published in arXiv cs.CL.
By Turing Wire editorial staff · Jun 9, 2026 · Editorial standards →
Source: arXiv cs.CL