
SupraSNN: Exploiting Synapse-Level Parallelism in Spiking Neural Network Accelerators through Co-Optimized Mapping and Scheduling

Seyed Sadra Ghavami, Mohammad Hossein Nikkhah, Mohammad Rasoul Roshanshah, Saeed Safari

Published: Jun 11, 2026 — 13:41 UTC

Problem
Spiking Neural Networks (SNNs) are inherently parallel, but exploiting that parallelism efficiently on physical hardware remains difficult. Existing SNN accelerators often suffer from serial execution and resource duplication. The authors propose an architecture, together with a mapping and scheduling framework, intended to address both limitations. This paper is a preprint and has not yet undergone peer review.

Method
The core contribution is the SupraSNN architecture, which is inspired by superscalar processor designs. It achieves high synapse-level parallelism by decoupling synaptic and neuronal computations. A Multi-Cast Tree routes spike data to multiple parallel Synapse Processing Units (SPUs), which act as computational pipelines. A Merge Tree then consolidates the SPU results for a single unified Neuron Unit, which centralizes the complex neuron state dynamics to reduce hardware overhead. The authors also introduce a partitioning and scheduling framework that maps the SNN onto the hardware while respecting memory constraints, then applies heuristic scheduling to optimize the synaptic execution order for throughput and resource utilization. The design is evaluated on a feedforward SNN trained on the MNIST dataset and on a recurrent SNN on the Spiking Heidelberg Dataset.
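The dataflow described above can be sketched in software. This is a minimal illustration, not the authors' implementation: the round-robin partition, the leaky integrate-and-fire update, and every function name and parameter (`partition_synapses`, `spu_accumulate`, `merge_tree`, `neuron_unit`, `leak`, `threshold`) are assumptions made for this example. The paper's partitioner also respects memory constraints, which the sketch ignores.

```python
from typing import List, Sequence, Tuple


def partition_synapses(num_pre: int, num_spus: int) -> List[List[int]]:
    """Assign presynaptic rows to SPUs round-robin (hypothetical policy)."""
    return [list(range(i, num_pre, num_spus)) for i in range(num_spus)]


def spu_accumulate(weights: Sequence[Sequence[float]],
                   pre_rows: Sequence[int],
                   spikes: Sequence[bool]) -> List[float]:
    """One SPU: sum the weights of its own synapses whose input fired."""
    num_post = len(weights[0])
    partial = [0.0] * num_post
    for pre in pre_rows:
        if spikes[pre]:
            for post in range(num_post):
                partial[post] += weights[pre][post]
    return partial


def merge_tree(partials: Sequence[Sequence[float]]) -> List[float]:
    """Pairwise reduction of SPU outputs, mimicking a binary adder tree."""
    level = [list(p) for p in partials]
    while len(level) > 1:
        nxt = [[a + b for a, b in zip(level[i], level[i + 1])]
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:  # an odd element passes through to the next level
            nxt.append(level[-1])
        level = nxt
    return level[0]


def neuron_unit(v: Sequence[float], current: Sequence[float],
                leak: float = 0.9,
                threshold: float = 1.0) -> Tuple[List[float], List[bool]]:
    """Single shared neuron unit: leak, integrate, fire, reset (assumed LIF)."""
    new_v, fired = [], []
    for vi, ci in zip(v, current):
        vi = leak * vi + ci
        spike = vi >= threshold
        fired.append(spike)
        new_v.append(0.0 if spike else vi)
    return new_v, fired


def step(weights, spikes, v, num_spus: int = 4):
    """One timestep: multicast spikes, accumulate per SPU, merge, update."""
    parts = partition_synapses(len(weights), num_spus)
    # Multicast: every SPU receives the same spike vector.
    partials = [spu_accumulate(weights, rows, spikes) for rows in parts]
    return neuron_unit(v, merge_tree(partials))
```

Calling `step(weights, spikes, v)` with a weight matrix indexed as `weights[pre][post]` returns the updated membrane potentials and the output spikes. The point of the structure is that only `spu_accumulate` is replicated, while the stateful neuron update exists once.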

Results
On the MNIST benchmark, SupraSNN achieves 93.44% accuracy with an inference latency of 149 μs and an energy consumption of 0.025 mJ per image (0.276 nJ per synapse) on the Xilinx Zynq XC7Z020 FPGA. The authors report this as a 47.6% reduction in latency and a 5.6× improvement in energy efficiency compared to previous FPGA-based SNN accelerators. A recurrent SNN evaluated on the Spiking Heidelberg Dataset achieves 71.82% accuracy, with a latency of 1.41 ms and an energy consumption of 0.77 mJ per sample on the XC7Z030 FPGA.
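A few quantities follow from the reported figures by simple arithmetic. The sketch below derives them; none of the derived values is stated in the source, and each rests on an assumption: that per-synapse energy is per-image energy divided by a synapse count, that average power is energy divided by latency, and that the 47.6% reduction is relative to a single baseline latency. The inputs are rounded, so the outputs are approximate.

```python
# Reported figures (from the summary above).
mnist_energy_j = 0.025e-3        # 0.025 mJ per image
energy_per_synapse_j = 0.276e-9  # 0.276 nJ per synapse
mnist_latency_s = 149e-6         # 149 microseconds
latency_reduction = 0.476        # 47.6% lower than prior work
shd_energy_j = 0.77e-3           # 0.77 mJ per sample
shd_latency_s = 1.41e-3          # 1.41 ms

# Derived values (assumptions, not reported by the authors).
implied_synapses = mnist_energy_j / energy_per_synapse_j
mnist_avg_power_w = mnist_energy_j / mnist_latency_s
shd_avg_power_w = shd_energy_j / shd_latency_s
implied_baseline_latency_s = mnist_latency_s / (1.0 - latency_reduction)

print(f"implied synapse count:     {implied_synapses:,.0f}")
print(f"MNIST average power:       {mnist_avg_power_w:.3f} W")
print(f"SHD average power:         {shd_avg_power_w:.3f} W")
print(f"implied baseline latency:  {implied_baseline_latency_s * 1e6:.0f} us")
```

Under these assumptions the MNIST figures imply roughly 90,000 synapses, an average power near 0.17 W, and a baseline latency of about 284 μs. The SHD figures imply about 0.55 W.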

Limitations
The authors acknowledge that the proposed architecture may face challenges in scaling to larger and more complex SNNs, particularly regarding the overhead of the Merge Tree and potential bottlenecks in the scheduling framework. They also note that the performance metrics are specific to the tested datasets and hardware and may not generalize across all SNN applications. Furthermore, because the scheduling is heuristic, it may fall short of the optimal execution order in more complex scenarios.
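To illustrate why a heuristic schedule can fall short of the optimum, the sketch below balances synaptic workloads across SPUs with a generic greedy rule (longest processing time first) and compares it with an exhaustive search. This is a textbook heuristic used here as a stand-in; the source does not say which heuristic SupraSNN uses, and the workload values and function names are hypothetical.

```python
from itertools import product
from typing import List, Sequence


def lpt_makespan(loads: Sequence[int], num_spus: int) -> int:
    """Greedy: give the next-largest load to the least-busy SPU."""
    finish: List[int] = [0] * num_spus
    for load in sorted(loads, reverse=True):
        idx = finish.index(min(finish))
        finish[idx] += load
    return max(finish)


def optimal_makespan(loads: Sequence[int], num_spus: int) -> int:
    """Exhaustive search over all assignments; feasible only for tiny inputs."""
    best = sum(loads)
    for assignment in product(range(num_spus), repeat=len(loads)):
        finish = [0] * num_spus
        for load, spu in zip(loads, assignment):
            finish[spu] += load
        best = min(best, max(finish))
    return best


loads = [3, 3, 2, 2, 2]  # hypothetical per-group synapse counts
print(lpt_makespan(loads, 2))      # 7
print(optimal_makespan(loads, 2))  # 6
```

The greedy rule finishes in 7 time units while the best assignment, {3, 3} and {2, 2, 2}, finishes in 6. Exhaustive search grows exponentially with the number of loads, which is the usual reason to accept a heuristic for realistic network sizes.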

Why it matters
SupraSNN targets the efficient deployment of SNNs in edge computing and other low-power settings. By raising synapse-level parallelism without duplicating the neuron logic, the design points toward more effective hardware accelerators for SNNs and could extend their use beyond vision tasks. The work adds to ongoing research on optimizing neural network hardware for energy efficiency and speed. The paper appears on arXiv under cs.NE.

By Callan Zhang · Jun 11, 2026

Summarised from the primary source with AI assistance under human editorial oversight. Turing Wire is not a primary source — read the original for the authoritative account.

Source: arXiv cs.NE