Notable agents robotics

Equivariant Reinforcement Learning for Clifford Quantum Circuit Synthesis

Richie Yeung, Aleks Kissinger, Rob Cornish

Published
May 11, 2026 — 17:49 UTC
Summary length
459 words
Relevance score
80%

Problem
This paper addresses the challenge of synthesizing Clifford quantum circuits, particularly for devices with all-to-all qubit connectivity. The authors identify a gap in existing methodologies that do not effectively leverage reinforcement learning (RL) for this task. The work is presented as a preprint and has not undergone peer review, indicating that the findings should be interpreted with caution.

Method
The authors propose a reinforcement learning framework where an agent learns to synthesize Clifford circuits by discovering sequences of elementary Clifford gates that transform a given symplectic matrix representation to the identity matrix. The learning process is facilitated by a curriculum based on random walks from the identity. A key innovation is the introduction of a novel neural network architecture that is equivariant to qubit relabelings, ensuring that the model’s performance is invariant to the order of qubits. This architecture is also size-agnostic, allowing the same learned policy to be applied across varying qubit counts without the need for circuit splicing or reparameterization of the network. The training process involves optimizing the agent’s policy using a combination of simulated environments and real-time feedback.

Results
The proposed method demonstrates impressive performance on six-qubit Clifford circuits, achieving circuits within one two-qubit gate of optimality in milliseconds per instance. The agent successfully finds optimal circuits in 99.2% of instances, with a processing time of seconds per instance. When extended to ten-qubit instances, the agent continues to perform well, achieving lower average two-qubit gate counts than established baselines, including Qiskit’s Aaronson-Gottesman and greedy Clifford synthesizers. Notably, the agent scales effectively to unseen Clifford tableaus with up to thirty qubits, including targets derived from circuits with over a thousand Clifford gates.

Limitations
The authors acknowledge that their approach is primarily validated on Clifford circuits with up to ten qubits, and while it scales to larger instances, the performance on circuits with significantly more qubits remains to be thoroughly evaluated. Additionally, the reliance on a specific architecture may limit generalizability to other types of quantum circuits beyond Clifford gates. The paper does not address potential issues related to the exploration-exploitation trade-off inherent in reinforcement learning, which could affect the agent’s performance in more complex scenarios.

Why it matters
This work has significant implications for the field of quantum computing, particularly in the synthesis of quantum circuits, which is a critical step in the implementation of quantum algorithms. By leveraging reinforcement learning and introducing an equivariant architecture, the authors provide a novel approach that could enhance the efficiency and scalability of quantum circuit synthesis. The ability to synthesize larger circuits with fewer gates could lead to more efficient quantum algorithms and better utilization of quantum hardware, ultimately advancing the practical application of quantum computing technologies.

Authors: Richie Yeung, Aleks Kissinger, Rob Cornish
Source: arXiv:2605.10910
URL: https://arxiv.org/abs/2605.10910v1

Turing Wire
Author Turing Wire editorial staff