Major agents robotics

From Passive Generation to Investigation: A Proactive Scientific Peer Review Agent

Haishuo Fang, Yue Feng, Iryna Gurevych

Published: Jun 11, 2026 — 13:38 UTC

Problem
Existing large language models (LLMs) for automating scientific peer review often fail to provide in-depth evaluations supported by concrete evidence. A significant gap in the literature is the inability of these models to proactively investigate suspicious sections of a paper, akin to human reviewers. This paper addresses this limitation by proposing a novel approach that allows for proactive investigation during the review process. The work is presented as a preprint and has not undergone peer review.

Method
The authors propose ProReviewer, a scientific peer review agent formulated as a Markov Decision Process (MDP). This framework enables the agent to conduct proactive reviews by maintaining a structured review log, which serves as a workspace for tracking evidence and intermediate findings throughout the review process. ProReviewer is built on an 8 billion parameter backbone and is trained using supervised fine-tuning, followed by optimization through reinforcement learning. The structured review log is integral to the agent’s ability to synthesize information and make informed decisions during the review.

Results
ProReviewer demonstrates superior performance across five quality dimensions compared to various baselines. Specifically, it outperforms prompt-based methods utilizing larger frontier LLMs by up to 39% and achieves a 16% relative improvement over the strongest fine-tuned baseline. In human evaluations, ProReviewer also records the highest win rates against competing models, indicating its effectiveness in generating high-quality peer reviews.

Limitations
The authors acknowledge that while ProReviewer shows significant improvements, it is still constrained by the quality of the underlying LLM and the training data used. Additionally, the reliance on a structured review log may introduce biases based on the initial evidence collected. The paper does not address potential scalability issues when applied to a broader range of scientific disciplines or the integration of domain-specific knowledge.

Why it matters
The development of ProReviewer has important implications for the future of automated peer review processes, particularly in enhancing the depth and reliability of evaluations in scientific publishing. By enabling proactive investigation, this approach could lead to more rigorous and evidence-based reviews, ultimately improving the quality of published research. This work contributes to the ongoing discourse on the role of AI in academia and peer review, as discussed in related literature, such as that available on arXiv.

By Callan Zhang · Jun 11, 2026 · Editorial standards →

Summarised from the primary source with AI assistance under human editorial oversight. Turing Wire is not a primary source — read the original for the authoritative account.

Source: arXiv cs.CL