A Single Chip SPAD Based Vision Sensing System With Integrated Memristive Spiking Neuromorphic Processing

This paper presents a new scalable $8\times8$ single photon avalanche diode (SPAD) based vision sensor with integrated spiking neuromorphic system on a single chip. The proposed vision sensing system adopts the benefits of SPAD’s high quantum efficiency and energy efficiency of memristive spiking neuromorphic processing. The SPAD based vision sensor includes biologically inspired address event representation (AER) readout to generate asynchronous digital address events at the output reducing computation and making it suitable to process directly with integrated on-chip spiking neuromorphic system in a faster and more energy efficient way. A novel on-chip interface is designed to convert the output events of a SPAD-based event sensor into temporally coded spikes (TCS) that enable on-chip processing with integrated spiking neuromorphic system. We have tested the prototype vision sensing system for imaging characters by SPAD based vision sensor and recognizing them using the integrated memristive spiking neuromorphic system. To help with the evaluation, we have built a complete temporal pulses data set from simulating the SPAD vision sensor with AER readout in imaging characters and applied directly to integrated spiking neuromorphic system via designed novel on-chip interface. The achieved accuracy is 89.54% with a power consumption of $316~\mu \text{W}$ for the memristive neuromorphic processor. The SPAD based vision sensor exhibits array-level dynamic range of 148 dB with a power consumption of 2.8 mW. The designed SPAD-based vision sensing system with an integrated spiking neuromorphic system on a single chip shows great promise for robotics, autonomous vehicles, health, and security applications.


I. INTRODUCTION
A single photon avalanche diode (SPAD) imager can capture high-speed 3D images at very low light levels due to its single photon counting capability. With excellent timing response and the ability to produce a digital pulse from a single detected photon, SPADs find a wide range of applications including robotics, bioimaging, facial recognition, and automotive light-detection-and-ranging (LiDAR) for autonomous vehicles [1], [2], [3], [4], [5], [6], [7], [8].
The associate editor coordinating the review of this manuscript and approving it for publication was Zhipeng Cai.
Traditional SPAD imagers encode the time of flight information of incoming photons using a time to digital converter (TDC) in direct time of flight form (Fig. 1 (a)). The timing data is then transferred off-chip for processing. However, the large volume of redundant spatiotemporal data generated by a conventional SPAD imager, together with the off-chip transfer of that data for processing, limits the achievable frame rate. Further, conventional signal processing techniques used on traditional processors to process SPAD data are computationally intensive, requiring significant power and complex hardware.
VOLUME 11, 2023 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
FIGURE 1. (a) Traditional SPAD imager and off-chip processing of output data [9], (b) Proposed SPAD based vision sensor with integrated memristive spiking neuromorphic processing on a single chip.
SPADs are event-driven and asynchronous by nature, making them suitable for an asynchronous address event representation (AER)-based readout scheme. AER is an asynchronous multiplexing technique that mimics the behavior of biological vision [10]. In comparison to frame-based imagers, AER readouts are more efficient due to their reduced computational redundancy and delays. AER readout has been used previously in non-SPAD vision and image sensors because it is energy-efficient, delay-efficient, and event-based [11], [12], [13], [14], [15], [16]. AER readout has also been used for a SPAD imager [17]. Prior works, however, did not include on-chip processing of output events. Therefore, existing vision sensors and SPAD imagers still rely on off-chip processing of the digital address events generated from the AER readout, resulting in considerable computational overhead and latency. A frame-based SPAD sensor array with event-based processing in FPGA was reported in [8], but it requires additional steps to convert frames into events, adding complexity to the system. Additionally, all on-chip processing implementations and the event-based processing in [8] rely on FPGA boards that are power- and area-intensive. Hence, there exists a need for an efficient on-chip processing methodology for the data generated by the AER readout of the SPAD based vision sensor system.
The SPAD image sensor with AER readout provides a stream of digital pulses or spikes at the output, so event-based spiking neural networks (SNNs) can be used to process the SPAD's output more efficiently. Due to the asynchronous event-based information processing in an SNN, only the neuron addressed by an event is activated, resulting in energy efficient processing [18], [19]. Furthermore, a neuron can respond to an event directly: it does not have to wait until all the neurons in a layer are evaluated, nor for a discrete time step, before firing its response, so information is processed without delay. Additionally, computational redundancy is reduced because the events are processed using a relatively small number of spikes.
In this work, we present a novel single chip memristive spiking neuromorphic SPAD based vision sensing system which combines the high quantum efficiency of the SPAD with the energy efficiency of memristive spiking neuromorphic processing. The proposed system is illustrated in Fig. 1 (b), where the SPAD based sensor with biologically inspired AER readout captures each individual photon reflected back from the target and generates temporal pulses. A novel on-chip interface is proposed which converts the SPAD's temporal pulses to temporally coded spikes (TCS), enabling on-chip processing with the integrated spiking neuromorphic system on a single chip. Finally, the developed memristive spiking neuromorphic system takes the TCS and directly outputs the classification result in a more energy and area efficient way. To the best of our knowledge, this is the first full SPAD based vision sensing system with integrated spiking neuromorphic processing on a single chip. A mixed-signal approach is used to implement the memristive spiking neuromorphic system to blend together the benefits of both analog and digital design in addition to the merits of nano-scale size and low power operation of the memristor. Some of the distinctive characteristics of the developed memristive neuromorphic system over existing neuroscience-inspired systems are: a higher functionality synapse model, a simplified neuron model, configurability of the overall neuromorphic architecture (number of synapses, number of neurons and connections), and scalable system capacity. We have previously verified the design concepts with initial simulation results [9], [20]. Here, we detail the design and operation of the proposed SPAD sensor with AER readout, the novel on-chip interface, and the implemented neuromorphic system including the learning rule, as well as a detailed performance analysis of the SPAD vision sensing system integrated with the spiking neuromorphic system on a single chip.
In order to evaluate the performance of the proposed vision sensing system, we tested the developed system by imaging characters (digits 0-9) with the SPAD based vision sensor and recognizing them with the integrated memristive spiking neuromorphic system. To help with the evaluation, we have built a complete temporal pulses data set from simulating the SPAD vision sensor with AER readout in imaging the characters and applied it directly to the neuromorphic system via the proposed on-chip interface. We investigated the effect of different design parameters of the integrated spiking memristive neuromorphic system on the performance of the memristive spiking neuromorphic SPAD-based vision sensing system. Additionally, we analyzed the robustness of the proposed vision sensing system against non-idealities of memristor devices.
The rest of this paper is organized as follows: Section II presents the related works. Section III describes the architecture of the SPAD based vision sensor including the SPAD based pixel as well as the SPAD pixel array with AER readout, followed by the proposed on-chip SPAD sensor-processor interface in Section IV. The implementation of the memristive spiking neuromorphic system which is integrated to process the output events of the SPAD based vision sensor is presented in Section V. Testing and results are reported in Section VI to evaluate the performance of the vision sensing system. A performance comparison of the developed vision sensor with state of the art vision sensors is presented in Section VII, and finally Section VIII outlines the conclusion.

II. RELATED WORKS
Because of their high sensitivity, high spatial resolution, and picosecond temporal resolution, SPAD-based time of flight (TOF) imagers [21], [22], [23], [24], [25], have been attracting researchers' attention in a wide range of applications, including automotive, industrial, security, and medical. Most of the SPAD imagers encode the time of flight information of incoming photons using a TDC and transfer it off-chip for processing. There have been a number of different timing circuit architectures proposed in the literature. As an example, [22] presents a distributed digital silicon photomultiplier with a flexible detection strategy and auto-sensitivity with pixel-level TDCs, while [23] presents a per-SPAD TDC that records spatial cross correlation functions of entangled photon fluxes. In their direct TOF receiver, [24] divided the SPAD array into a number of predefined blocks, and allocated TDC resources to only one of the blocks implemented separately on-chip for each measurement. The timing technique in [25] utilizes both a TDC and a time-to-amplitude converter (TAC) with a counter measuring global clock cycles and the triple integration interpolator (TII) measuring between clock cycles. Despite the low power consumption and compact nature of TACs, high frame rates are dependent on a fast analog-to-digital converter (ADC). TDCs, on the other hand, are area-consuming and may consume a high amount of power, yet they can achieve high frame rates due to their intrinsic digital nature. Additionally, these SPAD imagers with TDC have the disadvantage of generating a large amount of redundant spatio-temporal data that must be transferred off-chip for processing, which limits their high frame rate.
Research has recently focused on integrating SPAD imagers with processing to address the low frame rate caused by off-chip transfers. In [26], an FPGA board was used for integrating the real-time control and signal processing circuits, including synchronized point-by-point scanning, the TDC, the ranging histogram, and peak detection. Reference [8] proposed several alternative methods to convert frame-based SPAD imager data into event-based data streams. The output of the proposed event generation methods is then processed by an event-based feature extraction and classification system implemented in FPGA hardware. Reference [27] incorporated a post-processor on FPGAs with a photon count equalizer (PCE) for removing pattern noise as well as a depth-mapping engine for constructing subrange synthesis and a photon imaging engine. All these integrated processors are based on FPGA boards, which are extremely power- and space-intensive.
In contrast to frame-based imagers, AER communication circuits reduce computation redundancy and delays, making them a suitable choice for event-based vision sensors including non-SPAD vision sensors [14], [16], [28], [29] and SPAD imagers [17]. Event-based sensors improve real-time machine vision by responding asynchronously to relative intensity changes as opposed to conventional cameras. Efforts have been made to optimize pixel architecture, reduce pixel size, increase resolution, improve read-out latency, raise read-out throughput, and minimize on-pixel read-out noise. Reference [14] presented a fully synthesized word-serial representation of group address events (G-AER) to acquire data (i.e., pixel events) at high speed even with high resolution (e.g., VGA). This representation handles massive events in parallel by tying together 8 neighboring pixels in a group. Reference [28] uses simple buffer arrays for implementing the column driver and row sampling circuit controlled by the digital timing and AER generator (DTAG). In addition to the sequential column selection signals, DTAG generates a global event-holding signal and provides only the addresses of the events grouped into 8-b groups that are sent out by FastFinder. Reference [16] integrated several vision functions into a single image sensor with AER and optimized pixel architecture for high-speed motion detection, temporal contrast detection, and full-frame output. Reference [29] demonstrated another dynamic vision sensor (DVS) employing synchronous AER (SAER) in order to control frame rate and suppress pixel-parallel noise and spatial redundancy. Since a SPAD is capable of detecting single photons, the SPAD-based event sensor combined with AER presented in [17] shows great potential for low-light imaging. In addition, SPADs have much higher light sensitivity than the photodiodes (PDs) used in conventional vision sensors [11], [12], [13], [14], [15], [16], [28], [29].
However, on-chip processing has not been included in these event based vision sensors. Due to their reliance on off-chip processing, these sensors are unable to process data in real-time, fast, energy-efficient, and area-efficient ways.

III. ARCHITECTURE OF PROPOSED SPAD BASED VISION SENSOR
A. SPAD BASED PIXEL
A SPAD is a Geiger mode avalanche photodiode that consists of a p-n junction diode reverse biased beyond the breakdown voltage [1]. The arrival of a photon generates a free carrier, which triggers an avalanche. In this region, the high electric field from the applied bias accelerates the charge carriers, which undergo impact ionization and create more free carriers. These free carriers also undergo impact ionization, and the number of carriers increases exponentially, causing a sudden avalanche of current. This sudden large avalanche current enables the detection of single photons. However, avalanche events are also generated in the absence of photons due to thermal generation and band-to-band tunneling; these are known as dark counts and constitute the effective noise of the SPAD device [1]. Fig. 2 illustrates the design of the SPAD based pixel. In addition to the SPAD, each pixel includes a quenching circuit, a Schmitt trigger, an analog counter, and an event generator. A MOS transistor operating in the triode region is used to implement a reconfigurable quenching resistor, offering an enhanced fill factor and immunity of the resistance to variation. When a photon arrives at the SPAD, a Schmitt trigger included in the pixel generates a clean digital pulse from the noisy pulse generated by the SPAD. An in-pixel analog counter integrates SPAD avalanche events over time with less power and area, resulting in a higher fill factor. The analog counter with a 9-bit resolution is designed based on the basic charge transfer principle. An event generator is integrated into the pixel to notify the top-level AER system of events. The event generator is triggered by comparing the output of the analog counter with a threshold voltage. Once it receives the acknowledgement from the top-level AER readout, it disables the event notification and becomes free to generate the next event notification to the top level.
The analog counter and event generator are implemented based on prior work [30].
The proposed SPAD based vision sensor has been implemented in a 65 nm standard CMOS process.
AER is an asynchronous digital multiplexing technique that was introduced to imitate the behavior of the neural system [10]. Since SPADs are event-based and inherently asynchronous, they are a natural partner of the AER readout technique. The output of the AER readout is in the form of address-events that are generated locally, only by the pixels that have data to send. This reduces the computational redundancy and delay associated with the synchronous readout techniques used in traditional frame-based SPAD imagers. Fig. 3 presents the block level view of the system architecture incorporating the SPAD based pixel array with fully digital asynchronous AER readout. An 8 × 8 SPAD based pixel array with AER readout is implemented for this prototype design. The AER readout includes row and column arbiters, a row address encoder (4 bits), and a column address encoder (4 bits). In order to resolve contentions that would cause errors or data loss, the system implements a fully arbitrated row-column architecture. Pixels trigger when a photon arrives and send a request to the arbiters. After the row arbiter selects the pixel's row, the row address encoder transmits that row address onto the data bus in parallel. The row arbiter also initiates a column request when that row is selected. The column address encoder, like the row address encoder, encodes and transmits the pixel's column address after the column arbiter selects it for transmission. A data valid bit is added to the column address encoder to provide sufficient time for the address line to stabilize. The design of the arbiter and address encoders has been detailed in previous work [30].
Fig. 4 (a) shows the I-V characteristics of the SPAD device obtained from the model simulation and experiment. The reverse properties and breakdown voltage of the device extracted from the model simulation are in good agreement with the experimental measurements. A quenching technique is required to simulate the dynamic behavior of avalanche, quench, and reset. For simplicity, a passive quenching technique with a 100 kΩ quenching resistor is used to validate the simulation against the experimental measurements. Fig. 4 (b) shows that the simulated dynamic behavior matches the experimental dynamic behavior well.
Finally, the proposed 8 × 8 SPAD based pixel array with AER has been simulated using the developed SPAD SPICE model. Fig. 5 shows the asynchronous address events generated by simulating the SPAD based vision sensor. When a photon reflected from the target image is incident on the SPAD array, pixels are triggered and their row and column addresses are placed on the output data bus by the AER. Here, as an example, pixel11 (P11), the pixel on the second row and sixth column, is triggered by the first incident photon. P11 sends the row request (Row3Req) to the corresponding row arbiter. Once the row arbiter selects the row request (Row3Sel), the row address encoder places the second row address (0010) on the output bus as shown in Fig. 5, and the column request is initiated for the sixth column (Col6Req). The column arbiter selects the particular column (Col6Sel) and the column address encoder places the sixth column address (0110) on the output bus as illustrated in Fig. 5. In addition, the column address encoder generates a "data valid bit" in parallel to ensure that the address line is stable throughout the whole process as shown in Fig. 5. Similarly, the second photon hits pixel20 (P20), and its corresponding row address (0011, third row) and column address (1000, eighth column) are generated by the row and column address encoders of the AER and placed on the output data bus as presented in Fig. 5. Observing the output, it can be seen that the AER generated the row and column addresses of the following triggered pixels (row address, column address), P28 (0100, 0100), P59 (1000, 0011), P56 (0111, 1000), P46 (0110, 0110), P59 (1000, 0111), and P63 (1000, 0111) with high data valid in a similar way. Thus, the proposed SPAD-based vision system with AER can effectively generate digital pulses or spikes as address-events, that is, the row and column addresses, of only the triggered pixels in the array, reducing the redundant spatio-temporal data generation associated with traditional SPAD imagers with TDCs. Additionally, the spikes generated by the designed SPAD imager with AER enable fast, energy-efficient, and area-efficient on-chip processing by the spiking neuromorphic system, thereby avoiding the limitations of conventional frame-based SPAD imagers. Finally, the output digital pulses or spikes are applied directly to the integrated memristive spiking neuromorphic system for processing via the proposed on-chip interface as described in the following sections.
FIGURE 5. Output address-events which provide the x (row), y (col) address of the triggered pixel in the array, generated by simulating the designed 8 × 8 SPAD based pixel array with AER readout.
Synchronous SPAD event to TCS conversion. Looking at the position of spikes at SPAD timeslots, the delay required for conversion can be calculated. For example, the green arrows show that SPAD events at the t1 and t8 slots should be delayed by 7 and 1 cycles, respectively. The delay required for each SPAD event is tabulated in Table 1.
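As a minimal illustration of the address-event format produced by the AER readout in Section III, the following Python sketch encodes a triggered pixel as a 4-bit row address, a 4-bit column address, and a data valid bit. It assumes 1-based row and column numbering, as implied by the second row being encoded as 0010 and the eighth column as 1000; the function name and the dictionary layout are hypothetical, and arbitration between simultaneous requests is not modeled.

```python
def encode_address_event(row, col, bits=4):
    """Encode a triggered pixel as an AER address event.

    Assumption: rows and columns are numbered 1..8, so row 2 -> '0010'
    and column 8 -> '1000', matching the examples in the text.
    """
    if not (1 <= row <= 8 and 1 <= col <= 8):
        raise ValueError("row and col must be in 1..8 for the 8 x 8 array")
    return {
        "row_addr": format(row, "0{}b".format(bits)),  # row address encoder output
        "col_addr": format(col, "0{}b".format(bits)),  # column address encoder output
        "data_valid": 1,                               # asserted with the column address
    }


# First two events from the worked example: (row 2, col 6) and (row 3, col 8)
events = [encode_address_event(r, c) for r, c in [(2, 6), (3, 8)]]
for ev in events:
    print(ev["row_addr"], ev["col_addr"], ev["data_valid"])
```

Only triggered pixels produce an entry in `events`, which reflects why the readout avoids transferring data for idle pixels.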

IV. SENSOR-PROCESSOR ON-CHIP INTERFACE
The SPAD based vision sensor generates streams of digital pulses, the timing of which depends on the intensity of the pixels of the target image. The higher the intensity of the pixel, the sooner the photon arrives and hence, the earlier the SPAD event is generated. Moreover, SPAD event generation is asynchronous, but the neuromorphic system used in this work is synchronous. Furthermore, the AER readout generates row and column addresses of the SPAD pixel, but the input neurons of the neuromorphic system are indexed in a single column. To overcome these challenges, an interface between the sensor and the neuromorphic system is needed. Therefore we propose a novel on-chip interface which enables the conversion of SPAD based event sensor outputs to temporally coded spike (TCS) enabling on-chip processing with integrated memristive spiking neuromorphic system.
The on-chip interface converts the output events of the SPAD sensor with AER readout, the temporal pulses, into TCS based on a temporal encoding scheme. We have built a complete data set of temporal pulses by simulating the SPAD vision sensor with AER readout for imaging characters (digits 0-9), to be used as input for the on-chip interface. Inputs are preprocessed to reduce the dynamic range of pulse arrival times into a predefined number of time slots and finally mapped to intensity values before encoding. Here, the original pulse arrival times ranging from earliest to latest arrival times were downsampled to eight time slots (t1-t8) and mapped into a range from −4 to +4. By assigning polarities to the pixels, the temporal encoding scheme is compatible with the spike-timing-dependent plasticity (STDP) rules. In this case, the input is an 8 × 8 image with arrival times of pulses from the SPAD sensor with a time range of t1 to t8, where t1 is the earliest time and t8 is the latest (Fig. 6(a)). Fig. 6(b) illustrates the TCS generated from the SPAD events, prior to applying the input encoding scheme. The SPAD event occurring earliest (t1) is mapped to +4 and later arriving SPAD events are mapped to gradually decreasing values, with the latest being mapped to −4. The mapping allows the SPAD sensor to be compatible with the spiking neuromorphic processor implemented in this work. The spiking neuromorphic system includes a spike generation scheme based on our prior work [31], wherein the pixel intensities range from −4 to +4, such that a higher pixel intensity magnitude generates a spike closer to the center of the time frame. Positive and negative intensity pixels are placed before and after the center of the time frame, respectively. The lower magnitude pixels are gradually placed further away from the center of the time frame, which is shown in Fig. 6(c).
Maximum potentiation occurs for the input pixels with the most positive polarity, which are placed closest to the output neuron spike (timeslot 4 in Fig. 6(c)). The weakest potentiation occurs when input pixels with the least positive intensities appear first (timeslot 1 in Fig. 6(c)). Similarly, negative pixels cause maximum depression when those with the most negative polarity are placed nearest the output spike (timeslot 5 in Fig. 6(c)). The depression is weakest for negative pixels that arrive last and have the least negative input (timeslot 8 in Fig. 6(c)). This encoding scheme enables the spiking neuromorphic processor to implement STDP [32], a biologically plausible learning rule, with minimal circuit overhead in the synapse design of the spiking neuromorphic system.
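The temporal encoding described in this section can be sketched in Python as two small mappings: SPAD timeslot to signed intensity, and signed intensity to TCS timeslot. The sketch assumes integer intensities with zero skipped (eight values from +4 down to −4) and that larger magnitudes sit closer to the centre of the time frame, as the spike generation scheme describes. The function names are hypothetical and the code models only the slot bookkeeping, not the circuit.

```python
def spad_slot_to_intensity(spad_slot):
    """Map SPAD timeslot t1..t8 to intensity +4, +3, +2, +1, -1, -2, -3, -4.

    Assumption: zero is skipped so that 8 slots cover the range -4..+4.
    """
    if not 1 <= spad_slot <= 8:
        raise ValueError("spad_slot must be in 1..8")
    return 5 - spad_slot if spad_slot <= 4 else 4 - spad_slot


def intensity_to_tcs_slot(intensity):
    """Place a signed intensity into TCS timeslot 1..8.

    Positive values fill slots 1..4 (before the centre), negative values
    fill slots 5..8 (after it); larger magnitudes land nearer the centre.
    """
    if intensity == 0 or abs(intensity) > 4:
        raise ValueError("intensity must be a non-zero integer in -4..+4")
    return intensity if intensity > 0 else 9 + intensity


for t in range(1, 9):
    i = spad_slot_to_intensity(t)
    print("t{}: intensity {:+d} -> TCS slot {}".format(t, i, intensity_to_tcs_slot(i)))
```

Under these assumptions the earliest event (t1, intensity +4) lands in TCS slot 4 and the latest (t8, intensity −4) lands in TCS slot 5, the two slots adjacent to the centre reference.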
The STDP rule stipulates how the synaptic weight change should depend on the temporal difference of the spikes. The strength of the connections between neurons is adjusted as a function of temporal differences between neuron spiking events. In general, if the pre-neuron spike occurs before the post-neuron spike, long term potentiation (LTP) takes place, and the synaptic connection between the neurons is strengthened. Conversely, if the pre-neuron spike occurs after the post-neuron spike, long term depression (LTD) takes place, and the synaptic connection between the neurons is weakened [33]. The amount of strengthening or weakening of the synaptic connection is dictated by the STDP rule. The largest change in synaptic weight occurs when the difference in time between the pre-neuron and post-neuron spikes is small, and as this difference gets larger, the synaptic weight change diminishes [32]. To conform to the STDP rule, the on-chip interface places the TCS generated from the SPAD events into timeslots corresponding to the intensity of the incident pixel, centering around a reference where the post-neuron spike is forced to occur by design (Fig. 6(c)). The resulting STDP behavior is illustrated in Fig. 8, showing the change in synaptic conductance between neurons with the cycle difference between the spike times of the neurons. The timing of the synchronized SPAD event and TCS [31] is illustrated in Fig. 7. Here, SPAD events for different pixel intensities are shown, assuming that the SPAD event collection time window is discretized to 8 timeslots to account for 8 discrete pixel intensity values. As mentioned before, the higher intensity pixels of the target image generate SPAD events earlier.
When the SPAD event occurs, the delay decoder converts the count value into the required delay value. The binary to thermometer decoder generates the select signals for the delay line using the delay value provided by the delay decoder to produce the TCS.
FIGURE 11. Simulation result of the proposed on-chip interfacing circuit. The SPAD based vision sensor generates the asynchronous event SPAD pulse, which is synchronized by the Clock to produce the synchronous SPAD sync signal at the next clock edge. Finally, the delay block delays it to generate the TCS, which is compatible with the neuromorphic processor. (a)-(h) TCS generated from the synchronized SPAD event occurring at timeslots t1 to t8, respectively.
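To make the qualitative STDP behavior described above concrete, the sketch below uses the commonly cited exponential STDP window. This functional form and all parameter values (`a_plus`, `a_minus`, `tau`) are assumptions for illustration only; they are not taken from Fig. 8 or from the synapse circuit of this work. The time difference is expressed in clock cycles, with the post-neuron spike assumed to sit midway between TCS slots 4 and 5.

```python
import math


def stdp_delta_w(dt_cycles, a_plus=1.0, a_minus=1.0, tau=2.0):
    """Weight change for dt = t_post - t_pre (in clock cycles).

    dt > 0: pre before post -> potentiation (LTP), decaying with |dt|.
    dt < 0: pre after post  -> depression (LTD), decaying with |dt|.
    Exponential form and parameters are illustrative assumptions.
    """
    if dt_cycles > 0:
        return a_plus * math.exp(-dt_cycles / tau)
    if dt_cycles < 0:
        return -a_minus * math.exp(dt_cycles / tau)
    return 0.0


# Hypothetical reference: post-neuron spike between TCS slots 4 and 5
T_POST = 4.5
for tcs_slot in range(1, 9):
    dt = T_POST - tcs_slot
    print("TCS slot {}: dt = {:+.1f}, dw = {:+.3f}".format(tcs_slot, dt, stdp_delta_w(dt)))
```

With this choice, slot 4 gives the strongest potentiation and slot 5 the strongest depression, while slots 1 and 8 give the weakest changes, which is the ordering the encoding scheme is designed to produce.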
As the intensity decreases from +4 to −4, the SPAD events occur at timeslots 1 to 8, respectively. Fig. 9 shows the block diagram of the proposed on-chip interface for the SPAD based event sensor and spiking neuromorphic processor. The DeMux routes the asynchronous SPAD spike to the appropriate row, where it is first synchronized by the Pulse Synchronizer and then converted to the TCS by the Delay Block. The operation of each block is briefly described here.
The SPAD sensor with an AER readout generates a stream of digital pulses along with the row and column addresses of the triggered pixels and the time of the generated pulse (Fig. 1). The de-multiplexer is used to de-multiplex the SPAD signal to the corresponding input neuron, according to the row and the column address generated by the AER readout of the SPAD sensor. The row and the column addresses are 3 bits each and the de-multiplexer select pin requires 6 bits to de-multiplex to 64 input neurons. As the input neurons are arranged in row-major order, the row addresses are tied to the 3 MSBs and the column addresses are tied to the 3 LSBs of the select pin.
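The row-major select logic of the de-multiplexer can be sketched in a few lines of Python. The sketch assumes 0-based 3-bit row and column values (0..7); how the address encoder outputs are reduced to these 3-bit values is not modeled here, and the function name is hypothetical.

```python
def demux_select(row, col):
    """Form the 6-bit de-multiplexer select value.

    Assumption: row and col are 0-based 3-bit values (0..7).
    The row occupies the 3 MSBs and the column the 3 LSBs, which is
    equivalent to the row-major index row * 8 + col.
    """
    if not (0 <= row <= 7 and 0 <= col <= 7):
        raise ValueError("row and col must be in 0..7")
    return (row << 3) | col


select = demux_select(1, 5)
print(format(select, "06b"), "-> input neuron", select)
```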
Once the correct input row has been identified and the asynchronous SPAD event is passed on to the corresponding row, the signal is synchronized to the clock. As the SPAD event pulse width may be narrower than the clock period, the asynchronous signal is first provided to a rising-edge detector. Then a Level-to-Pulse converter is used to generate the synchronous SPAD event that is required for TCS conversion.
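A behavioral view of the synchronization step is sketched below: an asynchronous event time is moved to the next rising clock edge. The sketch assumes rising edges at integer multiples of the clock period starting from zero and ignores any additional latency of the edge detector or synchronizer stages; the function name is hypothetical.

```python
import math


def synchronize(event_time, clock_period):
    """Return the time of the first rising clock edge strictly after the event.

    Assumptions: rising edges occur at k * clock_period for k = 0, 1, 2, ...,
    and the synchronizer adds no extra cycles of latency.
    """
    if clock_period <= 0:
        raise ValueError("clock_period must be positive")
    return (math.floor(event_time / clock_period) + 1) * clock_period


# Asynchronous SPAD events (arbitrary time units) aligned to a clock of period 1.0
for t in [0.3, 1.0, 2.7]:
    print(t, "->", synchronize(t, 1.0))
```

Because the output depends only on which clock period the event falls in, events of the same pixel intensity that arrive at slightly different times within one period map to the same synchronized edge.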
To convert to TCS, the SPAD events need to be delayed by some clock cycles, the number of which depends on the intensity of the pixel. The TCS timeslots begin 4 clock cycles after the SPAD timeslots for maintaining causality. The delay required is calculated by looking at the SPAD timeslots for both SPAD events and TCS events of the same pixel intensity (Fig. 7). For example, the pixel intensity of +3 generates a SPAD spike at SPAD timeslot 2, but for TCS scheme it should generate a spike at SPAD timeslot 7. Hence, the SPAD event is delayed by 5 clock cycles. Table 1 lists the delay required for each pixel intensity of the target image.
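The delay calculation can be reproduced with a short Python sketch. It assumes the slot mapping described in this section (SPAD slots t1..t8 carry intensities +4..+1, −1..−4; positive intensities occupy TCS slots 1..4 and negative intensities TCS slots 5..8, with larger magnitudes nearer the centre) and the stated 4-cycle offset between SPAD and TCS timeslots. The result matches the worked example in the text (+3 at SPAD timeslot 2 needs 5 cycles); the remaining values are inferred from these assumptions rather than copied from Table 1.

```python
TCS_OFFSET = 4  # TCS timeslots begin 4 clock cycles after the SPAD timeslots


def required_delay(spad_slot):
    """Clock cycles of delay needed to move a SPAD event to its TCS slot."""
    if not 1 <= spad_slot <= 8:
        raise ValueError("spad_slot must be in 1..8")
    # SPAD slot -> signed intensity (zero skipped; an assumption)
    intensity = 5 - spad_slot if spad_slot <= 4 else 4 - spad_slot
    # Signed intensity -> TCS slot (larger magnitude nearer the centre)
    tcs_slot = intensity if intensity > 0 else 9 + intensity
    # Express the TCS slot on the SPAD timeslot axis and subtract
    return (tcs_slot + TCS_OFFSET) - spad_slot


delays = {slot: required_delay(slot) for slot in range(1, 9)}
print(delays)
```

Every delay is positive, which is the causality requirement that motivates the 4-cycle offset.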
The re-configurable delay block shown in Fig. 10 houses a counter that keeps track of the timing of the SPAD event and stops counting when a SPAD event occurs. The delay decoder then generates the number of delay cycles required, and the SPAD event is delayed by that amount. A binary to thermometer decoder drives the programmable delay line to select the number of delay units required to realize a given number of clock cycles of delay. Thus, the TCS signal is generated from the SPAD event.
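The role of the binary to thermometer decoder in the delay block can be illustrated with the following Python sketch. It assumes a delay line of 7 unit delays, each contributing one clock cycle when selected; the width and both function names are hypothetical choices for illustration.

```python
def binary_to_thermometer(value, width=7):
    """Return `width` select bits with the lowest `value` bits set to 1.

    Example: value 3, width 7 -> [1, 1, 1, 0, 0, 0, 0].
    The width of 7 is an assumption, not a figure from the design.
    """
    if not 0 <= value <= width:
        raise ValueError("value must be in 0..width")
    return [1 if i < value else 0 for i in range(width)]


def apply_delay_line(event_cycle, delay):
    """Delay an event by enabling `delay` unit-delay stages (one cycle each)."""
    selects = binary_to_thermometer(delay)
    return event_cycle + sum(selects)


print(binary_to_thermometer(5))  # select signals for a 5-cycle delay
print(apply_delay_line(2, 5))    # event at cycle 2 delayed by 5 cycles
```

A thermometer code changes only one select bit between adjacent delay values, which is a common reason for using it to drive a chain of identical delay units.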
The proposed on-chip sensor-processor interfacing circuit was implemented in a 65 nm CMOS process and simulated using Cadence Spectre. Simulation results are plotted in Fig. 11. The SPAD pulse is the SPAD event that is generated by the vision sensor according to the pixel intensity of the incident image. The higher the pixel intensity, the earlier it is generated. However, due to the inherent randomness of the SPAD, the events are not generated at exactly the same time for the same pixel intensity. Hence, an event cannot be guaranteed to occur at a clock edge. Because of this asynchronous nature of SPAD pulse generation, an edge detector and a pulse synchronizer are built into the interfacing circuit to produce a synchronized SPAD event at the next positive edge of the clock, which is referred to as the SPAD sync signal in Fig. 11. This synchronized SPAD event is then delayed according to the delay values listed in Table 1 to generate the TCS, which can be directly applied to the spiking neuromorphic processor.
The simulation result presented in Fig. 12 shows SPAD events generated at different SPAD time slots from the triggered pixels of an 8 × 8 SPAD vision sensor array imaging the digit '0', together with the converted TCS events at the corresponding TCS time slots. We can see that the SPAD events are successfully converted into the corresponding TCS events by applying the required delays listed in Table 1. Additionally, the converted TCS events corresponding to the SPAD events at different time slots are placed at the TCS timeslots as expected, with positive and negative polarities of TCS intensity placed before and after the time reference (mid point of the TCS timeslots), respectively, and with higher intensities placed closer to the reference. Fig. 13 also verifies the successful conversion of SPAD events generated from the triggered SPAD pixels into corresponding TCS events for imaging the digit '1'.

V. MEMRISTIVE SPIKING NEUROMORPHIC SYSTEM
Spiking neuromorphic systems are bioinspired systems that emulate the function of a mammal's neural system. They are composed of neurons and synapses that process information in the form of spike events and are used for representative applications such as character recognition. This section provides a brief description of the designed spiking neuromorphic system architecture, which includes input neurons, memristive crossbar synapse, and homeostasis enabled output neurons to build a single-layer spiking neural network for processing the SPAD-based vision sensor's output events.

A. ARCHITECTURE OF MEMRISTIVE SPIKING NEUROMORPHIC SYSTEM
The memristive spiking neuromorphic system shown in Fig. 14 is implemented using input neurons, twin memristor synapses, and homeostasis enabled integrate and fire output neurons based on prior work [31]. A single-layer spiking neural network is then implemented using these building blocks and a twin memristive crossbar to process the temporally coded spikes generated by the proposed on-chip interface from the output events of the SPAD-based vision sensor.
FIGURE 14. Memristive spiking neuromorphic system using input neurons, twin memristor synapses, and homeostasis enabled integrate and fire output neurons.

1) SYNAPSE USING MEMRISTOR AND INPUT NEURON
Memristors are nanoscale, non-volatile, two-terminal devices that were first theorized in [34] and first demonstrated in [35]. Memristors are essentially resistors whose resistance can be altered by subjecting them to a certain amount of voltage or current flux. This is done by applying a net voltage bias across the device for a certain period of time. When the applied bias is above a certain threshold, known as the switching threshold voltage (Vth), the memristor's resistance changes and the device is said to have switched. The memristor's resistance can take any value between two extremes known as the low resistance state (LRS) and the high resistance state (HRS). These values depend on the type of device under consideration (for example, on the material used and the switching mechanism involved) and on its physical dimensions. The resistance level can be adjusted by changing the magnitude and duration of the applied voltage. Therefore, the memristor can store different resistance values, which makes it suitable as an artificial synapse in SNNs. In biological neural systems, synapses allow signals to pass between neurons and weigh the incoming signals [36]. Synaptic plasticity, or a change in the weighting factors of synapses, is what enables learning and storing information.
The use of memristors has become popular in neuromorphic circuits since a synaptic weight can be encoded as a memristance value. Additionally, two-terminal memristive synapses consume less power and area and cost less [37], [38].
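The threshold-and-bounds behavior described above can be captured in a toy behavioral sketch. It is not the Verilog-A device model used later in this work: the linear update law, the sign convention (positive bias lowers resistance), and all parameter values are hypothetical and serve only to illustrate switching above Vth within the LRS/HRS window.

```python
from dataclasses import dataclass


@dataclass
class ToyMemristor:
    resistance: float   # present resistance in ohms
    lrs: float = 1e4    # hypothetical low resistance state
    hrs: float = 1e6    # hypothetical high resistance state
    v_th: float = 1.0   # hypothetical switching threshold in volts
    rate: float = 1e9   # hypothetical change in ohms per volt-second of overdrive

    def apply(self, voltage: float, duration: float) -> float:
        """Apply a bias for `duration` seconds and return the new resistance."""
        overdrive = abs(voltage) - self.v_th
        if overdrive <= 0:
            return self.resistance  # below threshold: the stored state is retained
        step = self.rate * overdrive * duration
        # Assumed convention: positive bias moves toward LRS, negative toward HRS.
        self.resistance += -step if voltage > 0 else step
        # The state can never leave the [LRS, HRS] window.
        self.resistance = min(self.hrs, max(self.lrs, self.resistance))
        return self.resistance


m = ToyMemristor(resistance=5e5)
m.apply(0.5, 1e-3)    # sub-threshold pulse, no change
m.apply(1.5, 1e-4)    # larger magnitude or longer duration gives a larger change
m.apply(-2.0, 1.0)    # long pulse saturates at HRS
print(m.resistance)   # 1000000.0
```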
A twin-memristor synapse is designed to achieve both positive and negative weights without additional circuitry in this work, enabling easy integration into a crossbar architecture that exploits memristor crossbar density benefits. As shown in Fig. 14, one end of the twin memristor synapse is shorted and connected to the output neuron, while the other end is driven by the input neuron. The input neuron is designed to convert the TCS generated from the on-chip interface into appropriate waveforms, pulse width modulated signals, which are applied to the twin memristor synapse for accumulation and learning.

2) HOMEOSTASIS ENABLED OUTPUT NEURON
The output neuron illustrated in Fig. 14 is based on the integrate and fire (IAF) neurons described in [31], in which the integrator accumulates input current and the comparator compares the accumulated voltage against a threshold. The accumulation in the output neuron, $V_{accum}$, is defined as
$$V_{accum} = \frac{1}{C_{fb}} \int i_{in}(t)\, dt$$
where $C_{fb}$ represents the effective integration capacitance, and $i_{in}(t)$ represents the total current entering the output neuron.
Increasing the capacitance results in a lower accumulation voltage for the same input current, thereby reducing the possibility of spiking even if the threshold is unchanged. A set of switches connects capacitors in parallel, and the combined integration capacitance is the sum of all capacitors that are "activated." The switches are controlled by a control block, which also determines the synaptic phase and communicates with the winner-take-all (WTA) bus to implement lateral inhibition. The ability to reconfigure the integration capacitance ($C_{fb}$) on chip facilitates homeostasis: each time the output neuron generates a spike, capacitance is added.
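A minimal discrete-time sketch of such a neuron is given below, assuming the standard integrator relation in which the accumulated voltage grows by i·dt/C_fb per time step. The class name, the reset-to-zero behavior after a spike, and all component values are hypothetical; the control block and WTA bus are not modeled.

```python
class HomeostaticNeuron:
    """Integrate-and-fire neuron with a switchable bank of parallel capacitors."""

    def __init__(self, unit_caps, threshold):
        self.unit_caps = list(unit_caps)  # capacitor bank in farads
        self.active = 1                   # number of capacitors switched in
        self.threshold = threshold        # comparator threshold in volts
        self.v_accum = 0.0

    @property
    def c_fb(self):
        # Parallel capacitors add, so C_fb is the sum of the activated ones.
        return sum(self.unit_caps[:self.active])

    def integrate(self, current, dt):
        """Accumulate one time step; return True if the neuron spikes."""
        self.v_accum += current * dt / self.c_fb
        if self.v_accum >= self.threshold:
            self.v_accum = 0.0  # assumed reset after a spike
            # Homeostasis: switch in one more capacitor after each spike.
            self.active = min(self.active + 1, len(self.unit_caps))
            return True
        return False


def steps_to_spike(neuron, current, dt):
    """Count time steps of constant input current until the neuron spikes."""
    steps = 1
    while not neuron.integrate(current, dt):
        steps += 1
    return steps


n = HomeostaticNeuron(unit_caps=[1e-12] * 4, threshold=0.5)
first = steps_to_spike(n, current=1e-6, dt=1e-9)
second = steps_to_spike(n, current=1e-6, dt=1e-9)
print(first, second)  # the second spike takes roughly twice as long
```

With the same input current, doubling C_fb roughly doubles the time to reach the threshold, which is how added capacitance lowers the spiking probability.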

B. LEARNING RULE AND OVERALL OPERATION OF IMPLEMENTED NEUROMORPHIC SYSTEM
A modified STDP-based unsupervised learning algorithm [39] is utilized to train the network; its flowchart is illustrated in Fig. 15. For each input image, 64 TCS are generated for the 64 pixels of the image, each incident on one input neuron. The 64 input neurons are connected to the twin memristive crossbar array, in which the memristors are randomly initialized. Each column of the network is connected to all 64 input neurons via twin memristor synapses, each of which acts as a cross-point between a row (input neuron) and a column, and each column is connected to an output neuron. With each input image, the column currents are accumulated at the corresponding output neurons, and the winner-take-all logic looks for the output neuron that spikes first. As soon as a single output neuron spikes, accumulation is stopped immediately, and this neuron is selected as the winner. The learning step then commences following the STDP rule. To encourage spiking for subsequent similar (or identical) patterns, synapses connected to the winning neuron (i.e., column) that had positive inputs (encoded by TCS) are potentiated. Conversely, to discourage spiking for subsequent dissimilar patterns, synapses connected to the winning neuron (i.e., column) that had negative inputs (encoded by TCS) are depressed. However, since the net conductance of the winning column has now increased compared to the other columns, the output neuron is more likely to spike for any pattern. To reduce the neuron's overall spiking probability, homeostasis is applied: the integration rate of the neuron is decreased by increasing the integration capacitance. This makes the neuron less likely to spike for a random pattern, but since the synapses corresponding to the incident image pixels are potentiated, the neuron is still more likely to spike if another variation of the same image occurs subsequently. This algorithm is then applied serially to all the images in the dataset.
However, some output neurons may be connected to columns that have very low net conductance and therefore never spike. To encourage these neurons to produce spikes, the integration capacitance of an output neuron is decreased if it has not spiked after a certain number of input images (for example, 20), which is referred to as 'Reverse Homeostasis' in prior work [31].
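One training step of the procedure described above can be sketched in plain Python. This is a strongly simplified abstraction and not the circuit or the simulator used in this work: each twin memristor synapse is reduced to a single signed weight, inputs are signed numbers standing in for TCS polarity, column current is taken as constant during accumulation, and every parameter name and value (`dw`, `dc`, `patience`, and so on) is hypothetical.

```python
def first_to_spike(weights, inputs, caps, threshold=1.0):
    """Index of the column whose neuron reaches threshold first, or None.

    weights[j][i] is the signed weight from input i to column j. With a
    constant column current I_j, the time to threshold is C_j * V_th / I_j.
    """
    best, best_time = None, float("inf")
    for j, column in enumerate(weights):
        current = sum(w * x for w, x in zip(column, inputs))
        if current <= 0:
            continue  # this neuron never reaches the threshold
        t = caps[j] * threshold / current
        if t < best_time:
            best, best_time = j, t
    return best


def train_step(weights, inputs, caps, idle, *, dw=0.1, w_min=-1.0, w_max=1.0,
               dc=1.0, c_min=1.0, patience=20):
    """Apply winner-take-all, STDP-style update, homeostasis, reverse homeostasis."""
    winner = first_to_spike(weights, inputs, caps)
    for j in range(len(weights)):
        idle[j] = 0 if j == winner else idle[j] + 1
    if winner is not None:
        column = weights[winner]
        for i, x in enumerate(inputs):
            if x > 0:    # positive input: potentiate
                column[i] = min(w_max, column[i] + dw)
            elif x < 0:  # negative input: depress
                column[i] = max(w_min, column[i] - dw)
        caps[winner] += dc  # homeostasis: slow down the winner
    for j in range(len(weights)):
        if idle[j] >= patience:  # reverse homeostasis for silent neurons
            caps[j] = max(c_min, caps[j] - dc)
            idle[j] = 0
    return winner


weights = [[0.2, 0.1], [0.5, -0.3]]
caps, idle = [3.0, 1.0], [0, 0]
print(train_step(weights, [1, -1], caps, idle, patience=2))  # 1
```

In this toy example the second column wins because it has the larger net current and the smaller capacitance; its capacitance is then increased, while the silent first column has its capacitance reduced once `patience` images pass without a spike.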
One iteration over the entire training dataset is defined as an epoch. After a predefined number of epochs, the training is considered complete. Since this is an unsupervised learning method, a labelling step is required to benchmark the performance of the network. In the labelling step, all learning mechanisms such as STDP, homeostasis, and reverse homeostasis are turned off, and the output neurons are only allowed to accumulate and spike according to the winner-take-all logic. For each output neuron, the labels of the input data for which it was the winner are tallied. Each output neuron is labelled with the digit for which it spiked the highest number of times in one labelling epoch. The testing step is similar to the labelling step, as all forms of learning are turned off. However, one epoch of the testing dataset is used to find the testing accuracy of the network. If an output neuron spikes for an image of the digit it has been labelled with, the input is considered correctly classified; otherwise it is considered misclassified. The SPAD images used for training are shown in Fig. 16.
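The labelling and testing steps amount to a majority vote per output neuron followed by an accuracy count. A minimal sketch is shown below; the function names are hypothetical, and `winners` is assumed to hold, for each image, the index of the winning output neuron (or `None` if no neuron spiked, which is counted as a misclassification here).

```python
from collections import Counter, defaultdict


def label_neurons(winners, labels):
    """Assign each output neuron the label for which it won most often."""
    tallies = defaultdict(Counter)
    for neuron, label in zip(winners, labels):
        if neuron is not None:
            tallies[neuron][label] += 1
    return {neuron: counts.most_common(1)[0][0] for neuron, counts in tallies.items()}


def accuracy(winners, labels, neuron_label):
    """Fraction of images whose winning neuron carries the correct label."""
    correct = sum(
        1 for neuron, label in zip(winners, labels)
        if neuron is not None and neuron_label.get(neuron) == label
    )
    return correct / len(labels)


# Labelling epoch: neuron 0 won twice for digit 3; neuron 1 won twice for 7, once for 3.
neuron_label = label_neurons([0, 0, 1, 1, 1, None], [3, 3, 7, 7, 3, 7])
print(neuron_label)                                               # {0: 3, 1: 7}
print(accuracy([0, 1, 1, None], [3, 7, 3, 7], neuron_label))      # 0.5
```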

VI. RESULTS
The proposed SPAD based vision sensor is implemented in a 65 nm CMOS process and uses the SPAD SPICE model developed earlier in Section III-B. The CMOS portion of the memristive spiking neuromorphic circuit is implemented in a 65 nm CMOS process, while the memristor is modeled in Verilog-A [40]. The prototype memristive spiking neuromorphic SPAD based vision sensing system was tested to image and recognize characters (digits 0-9). We have built a complete temporal pulse data set by simulating the SPAD vision sensor with AER readout (8 × 8 SPAD based pixel array) imaging those 10 digits. The output events of the SPAD vision sensor are applied directly to the neuromorphic system via the proposed on-chip interface, which converts the SPAD based event sensor outputs to temporally coded spikes (TCS) and thereby enables on-chip processing with the integrated neuromorphic system on a single chip. The SNN is modeled in Python to evaluate the performance. It is assumed that the system operates at a 100 MHz clock frequency, and the memristor parameters are adopted from [40]. The performance of the designed system was evaluated by varying the parameters of the integrated spiking memristive neuromorphic system. Design parameters such as the number of neurons, the number of epochs, the number of capacitors, and the maximum capacitance value are tuned to optimize the performance of the spiking neuromorphic system. The robustness of the proposed system against memristor device imperfections was also analyzed.
The number of input neurons was fixed at 64, assigning one to each pixel. The number of output neurons was varied from 10 to 1000, in steps of 10 neurons, keeping the number of epochs constant at 10. Due to the inherent randomness of memristor initialization, each simulation was run 25 times, and the median accuracy of the runs is presented in Fig. 17. The network starts out at an accuracy of 59% with only 10 output neurons and quickly improves to 80% with 80 output neurons. As the number of neurons increases further, the accuracy increases very slowly with diminishing returns and eventually saturates with minor noise. At 550 neurons, the network reaches 87% for the first time. Among all the runs, the absolute maximum accuracy reached by a network was 89.65% with 530 neurons. With a low number of neurons, the accuracy is lower because there are not enough output neurons (columns) to learn all the patterns of each digit. With the addition of more output neurons, a digit can be learned in different forms, leading to a fast improvement in accuracy, as is observed from 10 to 50 output neurons. The accuracy then begins to plateau: additional output neurons no longer improve accuracy significantly, since there are only a few different forms of the same digit. To optimize the network for the number of epochs, the number of output neurons was kept constant at 200, and the number of epochs was varied from 0.1 to 20, collecting accuracy data more frequently at the lower number of epochs to show the learning trend. The result is plotted in Fig. 18, showing the accuracy reaching 78.85% after just one epoch of learning. The trend is similar to that observed when increasing the number of neurons. At first the rate of learning is very high, but after the network has learned most of the data patterns, the accuracy settles around a certain value and shows minor deviations. With only 2 epochs, the accuracy exceeds 80% (81.52%).
At later stages, with more epochs, the accuracy does tend to rise, but with a very small slope. Most patterns are already learned, and providing the same set of data does not significantly improve performance after a certain point. The accuracy settles around 85%, with the maximum median accuracy reaching 85.2% at 16 epochs. At low epoch counts the learning rate is high, with input images contributing to rapid adjustments of the synaptic weights and accumulation rates. Learning flattens out and the synaptic weights converge on their expected points when the neurons see the same set of images over and over again. Therefore, the same neurons with similar input patterns produce similar spiking patterns, resulting in little to no improvement in accuracy.
For the sake of completeness, the number of neurons and the number of epochs were varied at the same time, and the results are shown in Fig. 19. The number of epochs was changed from 1 to 10 in increments of 1, and the number of neurons was changed from 50 to 1000 in increments of 50. This result also shows the improvement in accuracy with both the number of neurons and the number of epochs, as well as the diminishing returns. The maximum median accuracy was 87.31% with 950 neurons and 9 epochs, whereas the absolute maximum accuracy of all the results collected was 89.54% with 850 neurons and 10 epochs. This pattern shows that the network's accuracy increases along the diagonal, that is, as both parameters grow together, though at a slower pace than was initially expected.
The circuit level parameters available to fine-tune the performance of the neuromorphic system are the number of capacitors used in the output neuron and their capacitance. The number of capacitors, as well as the largest capacitance value, was varied to optimize performance; the result is presented in Fig. 20. The accuracy increases with both parameters, but most of the gain is brought about by the number of capacitors used, as substantiated by the contours growing darker red faster along the X-axis. As discussed before, having more capacitors gives the neuron more resolution in homeostatic plasticity, since a larger number of capacitors allows greater variation in the accumulation rate. A higher variation in accumulation rate allows neurons to find an optimal accumulation rate, resulting in an increase in accuracy. Limiting the capacitance prevents the spiking competition from being dominated by a single neuron. Since the memristor imposes a limit on synaptic weight, boosting the maximum capacitance past this point does not have a significant impact. Now that the data is degraded by 5.6%, having more granularity provides a significant gain. This is also beneficial from a circuit level perspective, as multiple small capacitors consume less area than one huge capacitor.
The proposed spiking neuromorphic system is built using a crossbar array of memristors, which suffer from various device level non-idealities such as switching time mismatch, aging, and device failures [43], [44]. The switching time in the HRS to LRS direction has been reported to be over two orders of magnitude shorter than the switching time from LRS to HRS [45], [46]. The system performance is heavily degraded by the switching time mismatch, as is evident from Fig. 21a. A duty cycle modulation technique proposed in [31] is applied to compensate for the switching time mismatch, and the performance of the system is restored. Another common issue with memristors is aging, which refers to the gradual narrowing of the switching window between HRS and LRS due to cycling [47]. Over a significant period of time, the HRS decreases and the LRS increases, and the memristor device loses its endurance. The proposed neuromorphic system is found to be resilient against aging, retaining peak performance up to 30% resistance deviation from the nominal HRS and LRS values (Fig. 21b). This can be attributed to the homeostasis mechanism of the output neurons, which regulates the rate of accumulation, enabling the neurons to adapt to the shrinking switching window of the memristors. At 100% resistance drift, the device is deemed to have failed, which is another problem pertaining to the memristor. Memristor device failure can occur due to either aging [48] or fabrication issues [49]. A memristor device that is stuck at a certain resistance level and does not switch when a voltage above the threshold is applied is considered to have failed. Fig. 21c illustrates the effect of the failure rate on the performance. The neuromorphic system shows considerable resilience, retaining over 80% accuracy even when one out of every four memristors is stuck. The resilience of the neuromorphic system is due to the architecture of the network.
Since the number of output neurons is much higher than the number of output labels, there is a built-in redundancy in the system. When a few columns fail due to memristor device failure, the network can leverage the other columns and neurons to retain the accuracy.
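A stuck-device experiment of this kind can be emulated in a behavioral model by freezing a random subset of crossbar cells. The sketch below is only one possible way to do this and is not the procedure used to produce Fig. 21c: the function names, the uniform random choice of failed cells, and the fixed seed are all assumptions.

```python
import random


def inject_stuck_faults(n_rows, n_cols, failure_rate, seed=0):
    """Return the set of (row, col) crossbar cells that are stuck.

    Assumption: failures are spread uniformly at random over the array.
    """
    rng = random.Random(seed)
    cells = [(r, c) for r in range(n_rows) for c in range(n_cols)]
    n_stuck = round(failure_rate * len(cells))
    return set(rng.sample(cells, n_stuck))


def update_weight(weights, stuck, row, col, delta):
    """Apply a learning update unless the device is stuck at its present value."""
    if (row, col) not in stuck:
        weights[row][col] += delta
    return weights[row][col]


# One out of every four devices stuck in a 64-input, 200-column array.
stuck = inject_stuck_faults(64, 200, 0.25)
print(len(stuck))  # 3200
```

Because a stuck cell simply ignores updates, columns with many failed cells learn poorly, and the remaining columns must cover the same digits, which is the redundancy argument made above.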

VII. PERFORMANCE ANALYSIS AND DISCUSSION
The developed vision sensing system is the first SPAD based vision sensing system with integrated memristive spiking neuromorphic system on a single chip adopting the benefits of SPAD's high quantum efficiency and energy efficiency of memristive spiking neuromorphic processing. A biologically inspired AER readout is integrated into the SPAD vision sensor to generate asynchronous digital address events at the output, which reduces computation and enables the integrated neuromorphic system on-chip to process the output directly in a more energy efficient manner.
The array-level dynamic range for the developed SPAD based vision sensor is given by, DR = 20log 10 2 where b represents the counter's resolution used in the pixel, f max is the maximum speed of readout circuit, DCR is the average dark count rate of SPAD device, and N is the total number of pixels in the array. The estimated array-level dynamic range for this prototype 8 × 8 SPAD based pixel array is 148 dB with a 9-bit counter at the pixel, a maximum readout speed of 80 MHz, and an average DCR of 100 Hz. The proposed SPAD vision sensor is compared with other existing event based vision sensors in Table 2. The proposed SPAD-based vision sensor offers a higher dynamic range and consumes less power compared to the other event based vision sensors. Moreover, the SPAD has much higher light sensitivity than photodiodes (PDs), which are used in conventional vision sensors [11], [12], [13], [14], [15], [16]. Thus, low-light scenarios can also be captured well by the proposed SPAD-based vision sensor, thanks to its single-photon level sensitivity. Furthermore, the proposed SPAD-based vision sensing system includes a novel on-chip interface based on a temporal encoding scheme to enable the SPAD's temporal pulses to be processed by an integrated spiking neuromorphic system. In contrast, the existing event-based vision sensors [11], [12], [13], [14], [15], [16] are not equipped with on-chip processing. The off-chip processing prevents them from processing data in real-time, fast, energy-efficient, and area-efficient ways. A unique feature of the vision sensor presented in this work is that it incorporates on-chip processing while leveraging the SPAD's high quantum efficiency.
We have tested the proposed SPAD based vision sensor with integrated memristive spiking neuromorphic processing to image and recognize characters (digits 0-9). A performance comparison of the developed memristive spiking neuromorphic system (integrated with the proposed SPAD based vision sensor) with other existing event based processing systems is presented in Table 3. A similar character recognition task on data obtained from an event based sensor was previously tackled in [41], and a similar task was chosen here to provide a direct comparison with previously published works, as shown in Table 3. In [41], Orchard et al. reported an accuracy of 84.9% ± 1.9% for the recognition of digits (0-9) and characters (A to Z) using a four layer hierarchical SNN model. Although the achieved accuracy (84.47%) of our proposed system with 200 neurons and 10 epochs is on par with that result, the developed memristive spiking neuromorphic system provides more energy and area efficient processing due to the use of nano-scale memristor devices. Furthermore, there are multiple parameters in the proposed design that can be optimized, and the highest accuracy attained was 89.54%. Moreover, in [41] a separate Dynamic Vision Sensor (DVS) [12] is used to image the digits, which are then processed by the hierarchical SNN model. In comparison, the proposed work provides a complete system that consists of both the SPAD imager to image the digits and the integrated memristive spiking neuromorphic system on a single chip to enable on-chip processing. Moreover, in [8] a frame based SPAD imager is used and its output is converted into an event based data stream via several alternative methods, increasing the complexity of the design. In addition, the processing was built in an FPGA using all digital circuits, which are area and power intensive.
In comparison, the SPAD imager developed in this work includes AER readout to generate output events directly, reducing the complexity associated with a conventional frame based imager and making it suitable for direct processing with the integrated SNN. Furthermore, by integrating the developed mixed-signal memristive neuromorphic system with the SPAD imager on a single chip, more area and power efficient real-time processing was achieved while retaining the memristor's merits such as its nano-scale size and low power consumption. Moreover, the applied neuromorphic system is highly reconfigurable, which gives the designer additional means to increase the network performance. The number of neurons, the number of epochs, the number of capacitors, the use of reverse homeostasis, and duty-cycle modulation can be thought of as analogous to the hyperparameters of Deep Neural Networks/Convolutional Neural Networks, which enable the designer to tune the performance of the network according to the needs of the application. Additionally, the implemented neuromorphic system uses a digital training and testing approach, which is robust to noise at the circuit and system levels. The circuits of the unsupervised neuromorphic system were implemented with a memristive crossbar and evaluated with memristor non-idealities taken into account. The unsupervised learning mechanism proved to be robust against non-idealities such as mismatch, aging, and failure.
Furthermore, the proposed SPAD imager consumes less power and provides a higher dynamic range than the DVS [12] (compared in Table 2) while benefiting from the high quantum efficiency of the SPAD device. For context, the power consumption of the 8 × 8 SPAD array with AER readout is 2.8 mW and that of the neuromorphic processor is 316 µW. The average power consumption of the on-chip interfacing circuit was found to be only 22.6 µW. The proposed interfacing circuit is therefore power efficient, consuming less than 1% of the power of the whole system including sensor, interface, and neuromorphic processor, enabling a compact sensing system with integrated processing on a single chip. In addition, the proposed on-chip interface shows great promise in bridging the gap between event-based sensors and real-time processing with spiking neuromorphic processors.
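The interface's share of the power budget follows directly from the three figures quoted above. The short calculation below uses only those reported values; the variable names are illustrative.

```python
# Reported power figures, all converted to microwatts.
sensor_uw = 2800.0     # 8 x 8 SPAD array with AER readout (2.8 mW)
processor_uw = 316.0   # memristive neuromorphic processor
interface_uw = 22.6    # on-chip interfacing circuit

total_uw = sensor_uw + processor_uw + interface_uw
share = interface_uw / total_uw

print(f"{total_uw:.1f} uW total, interface share {100 * share:.2f} %")
# 3138.6 uW total, interface share 0.72 %
```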

VIII. CONCLUSION
A new scalable 8 × 8 SPAD based vision sensor with an integrated spiking neuromorphic system on a single chip has been presented. We have presented a novel on-chip interface based on a temporal encoding scheme to enable processing of the SPAD's temporal pulses by the integrated spiking neuromorphic system on a single chip. The designed on-chip interface consumes less than 1% of the power of the entire sensing system and occupies an area of 43 µm × 30 µm, enabling a compact sensing system on a single chip that reduces the gap between event based sensing and real-time high-speed processing. As far as we know, this is the first SPAD based vision sensor with integrated spiking neuromorphic processing on a single chip. The prototype SPAD vision sensing system has been tested to image characters (digits 0-9) and recognize them with the integrated memristive spiking neuromorphic system. We achieved an accuracy of 89.54% with a power consumption of 316 µW for the memristive neuromorphic processor. The achieved array-level dynamic range of the SPAD sensor is 148 dB with a power consumption of 2.8 mW. It was found that various design parameters of the developed neuromorphic system, such as the number of training epochs, the number of output neurons, and the number of capacitors, can be tuned to optimize performance. Moreover, the proposed system showed robustness to non-idealities of memristor devices, including switching time mismatch, aging, and device failure. The results of the proposed SPAD based vision sensing system with integrated memristive spiking processing on a single chip demonstrate great potential for robotics, autonomous vehicles, health, and security applications.