Adaptive Speech-to-Spike Encoding for Spiking Neural Networks
Taharim Rahman Anon, Jakaria Islam Emon
- Published
- Jun 17, 2026 — 13:07 UTC
Problem
The paper addresses the gap in neuromorphic speech processing caused by the mismatch between continuous acoustic signals and the discrete nature of Spiking Neural Networks (SNNs). Current systems utilize fixed spike encoders, which limit the adaptability of input representations and hinder performance. This work is a preprint and has not undergone peer review.
Method
The authors propose a learnable residual speech-to-spike encoder that is jointly trained with a Recurrent Leaky Integrate-and-Fire (R-LIF) network. The encoder is designed to optimize spike representations for specific tasks rather than focusing on faithful signal reconstruction. The architecture is parameter-efficient, with a compact variant containing only 35,000 parameters. The training process involves end-to-end optimization, allowing the encoder to adaptively learn from the input data. The authors also compare two learning paradigms for SNNs: Direct Feedback Alignment (DFA) and surrogate-gradient Backpropagation Through Time (BPTT), under identical architectures and training conditions.
Results
The proposed encoder achieves an accuracy of 94.97% on the Google Speech Commands v2 (GSC-v2) benchmark, significantly outperforming previous methods. The compact 35k-parameter variant reaches an accuracy of 89.8%, which is competitive with larger models that have an order of magnitude more parameters. The DFA approach yields an accuracy of 91.5%, demonstrating the effectiveness of bio-inspired learning rules in neuromorphic audio processing.
Limitations
The authors acknowledge that the encoder’s design does not prioritize signal reconstruction fidelity, which may limit its applicability in scenarios where accurate signal representation is critical. Additionally, the study is constrained to the GSC-v2 benchmark, and further validation on diverse datasets is necessary to generalize the findings. The reliance on specific architectures (R-LIF) may also restrict the applicability of the proposed method to other SNN frameworks.
Why it matters
This work has significant implications for the development of neuromorphic systems, particularly in speech recognition tasks where efficient processing is crucial. The introduction of a learnable encoder that enhances class separability while maintaining parameter efficiency could lead to advancements in real-time audio processing applications. The comparative analysis of learning paradigms also provides insights into the trade-offs between traditional and bio-inspired learning methods in SNNs, paving the way for future research in this area. As published in arXiv cs.NE, this research contributes to the ongoing exploration of adaptive mechanisms in neuromorphic computing.
By Callan Zhang · Jun 17, 2026 · Editorial standards →
Summarised from the primary source with AI assistance under human editorial oversight. Turing Wire is not a primary source — read the original for the authoritative account.
Source: arXiv cs.NE