I. Introduction
Spiking Neural Networks (SNNs) have attracted significant interest over the past decade as a low-power alternative to Artificial Neural Networks (ANNs) [1]. Unlike ANNs, SNNs process visual data in an event-driven manner, using sparse binary spikes across multiple timesteps. This spike-driven processing yields high energy efficiency on various computing platforms [2], [3]. To exploit the energy-efficiency advantages of SNNs, many training algorithms have been proposed, which fall into two categories: ANN-to-SNN conversion [4], [5] and backpropagation (BP) with surrogate gradients [6], [7]. Among them, BP-based training has become the mainstream method, as it not only achieves state-of-the-art performance but also requires only a small number of timesteps. However, because BP-based training computes backward gradients across multiple timesteps and layers, SNNs require substantial training memory to store the intermediate activations [8].
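To make the memory issue concrete, the following is a minimal sketch (not the paper's method) of a leaky integrate-and-fire (LIF) layer unrolled over T timesteps, with a rectangular surrogate gradient standing in for the derivative of the non-differentiable spike function. The function names, leak factor, and threshold are illustrative assumptions; the point is that the per-timestep membrane states must all be retained for the backward pass, so training memory grows linearly with T.

```python
import numpy as np

def lif_forward(inputs, leak=0.5, threshold=1.0):
    """Run an LIF layer over T timesteps; `inputs` has shape (T, n_neurons).

    Returns the binary spike trains and the stored membrane potentials.
    BP-based training must keep these per-timestep states for the backward
    pass, which is what drives up training memory as T grows.
    """
    T, n = inputs.shape
    mem = np.zeros(n)
    spikes = np.zeros((T, n))
    membranes = np.zeros((T, n))  # intermediate activations kept for BP
    for t in range(T):
        mem = leak * mem + inputs[t]                  # leaky integration
        spikes[t] = (mem >= threshold).astype(float)  # non-differentiable step
        mem = mem - spikes[t] * threshold             # soft reset after a spike
        membranes[t] = mem
    return spikes, membranes

def surrogate_grad(mem, threshold=1.0, alpha=1.0):
    """Rectangular surrogate for the step function's derivative.

    The true derivative is zero almost everywhere; BP with surrogate
    gradients replaces it with a window around the threshold.
    """
    return (np.abs(mem - threshold) < alpha).astype(float) * 0.5
```

Running `lif_forward` on a (T, n) input returns arrays of shape (T, n) each, so the stored state (and hence training memory) scales with the number of timesteps T.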