Verification of a Rapidly Multiplexed Circuit for Scalable Action Potential Recording

This report presents characterizations of in vivo neural recordings performed with a CMOS multichannel neural recording chip that uses rapid multiplexing directly at the electrodes, without any pre-amplification or buffering. Neural recordings were taken from a 16-channel microwire array implanted in rodent cortex, with comparison to a gold-standard commercial bench-top recording system. We were able to record well-isolated threshold crossings from 10 multiplexed electrodes and typical local field potential waveforms from 16, with strong agreement with the standard system (average SNR = 2.59 for the test chip and 3.07 for the standard system). For 10 electrodes, the circuit achieves an effective area per channel of 0.0077 mm², which is >5x smaller than typical multichannel chips. Extensive characterizations of noise and signal quality are presented and compared to fundamental theory, as well as results from in vivo and in vitro experiments. By validating rapid multiplexing directly at the electrodes, this report confirms it as a promising approach for reducing circuit area in massively-multichannel neural recording systems, which is crucial for scaling recording site density and achieving large-scale sensing of brain activity with high spatiotemporal resolution.


I. Introduction
ELECTROPHYSIOLOGICAL recording is the gold standard for measuring neural activity due to its potential for high spatiotemporal resolution [1]. Extracellular electrical recordings are typically made with multichannel electrode arrays implanted in brain tissue. The number of recorded neurons has steadily increased as electrode technology has improved over time [2]. However, the number of available neurons is still many orders of magnitude smaller than the total number of neurons in the brain. Therefore, large efforts are underway to increase the number of implanted electrodes, in order to provide a more complete picture of neural activity. Through large-scale recording, researchers wish to decode circuit activity in "cell assemblies," and also understand large functional networks in the brain [1].
Extracellular recordings consist of low frequency local field potentials (LFPs) with signal content up to 300 Hz and amplitudes up to 3 mV [3], and action potentials (APs) with useful signal bandwidth from 300 Hz to 10 kHz and amplitudes ranging from 10 μV to 1 mV [4]. Neural recording integrated circuits (ICs) are usually designed for 500-1000 V/V gain, 2-7 μV rms total integrated noise, and 5-10 kHz bandwidth, typically resulting in ~10 μW/channel power dissipation and ~0.04 mm²/channel area [5].
These CMOS front-end ICs traditionally consist of a dedicated amplifier for each electrode, a lowpass anti-aliasing filter up to 10 kHz, and analog to digital converters (ADCs) that may or may not be multiplexed across channels. However, it is challenging to scale this architecture to higher than a few hundred electrodes, due to large area [5]. Flicker noise in the signal band and low frequency filtering typically result in high area utilization from transistors and capacitors, respectively. To address circuit area, we have investigated sharing a single front-end circuit among multiple channels via rapid time domain multiplexing, directly at the electrodes, without any preamplification or buffering [6]. This technique divides the area per channel by the number of multiplexed electrodes (the multiplexing factor). However, the technique also presents a number of challenges related to noise aliasing and electrode offsets that are not easily solved using traditional approaches.
After summarizing the design, which was described in detail in [6], this report presents verification of experimental results for a rapidly multiplexed CMOS front-end chip implemented in 180 nm CMOS. This verification is important because the 2-electrode multiplexing demonstration in [6] does not establish the feasibility of multiplexing many electrodes in general. To demonstrate the efficacy of this new technique, neural recordings from a microwire array implanted in rodent cortex were characterized for signal quality and noise, and compared to a commercial recording system. Thus, successful recordings from 10 multiplexed electrodes are demonstrated, and the noise theory presented in [6] is quantitatively validated.
The paper is organized as follows. Section II gives a brief description of the circuit architecture and operation. Section III describes the experimental methods. Section IV presents characterizations of recordings from in vivo experiments with comparison to the standard bench-top system. Section V presents additional experiments to confirm the noise performance and theory. Section VI discusses the results, application scenarios, and offers avenues for increasing the multiplexing factor. Conclusions are presented in Section VII.

II. Circuit Architecture
The rapidly multiplexed circuit architecture is shown in Fig. 1, and was described in detail in [6]. Multiple electrodes are sequentially switched to the input at a high rate to provide nearly simultaneous sampling at traditional recording rates (~30 kS/s/channel). This architecture departs from traditional neural recording systems due to the higher bandwidth that is required for signal settling. The higher bandwidth also implies that wide-band noise from the electronics and the bio-electrochemical system formed by the electrodes and tissue cannot simply be lowpass filtered, which threatens aliasing into the signal band. Additionally, the DC offsets of the multiplexed electrodes appear as voltage steps at the input. These fast transients cannot simply be highpass filtered, because of the short acquisition time per channel (T_int = 1.5 μs in this report). Appropriate design choices were made to mitigate these challenges. Windowed integration sampling (WIS) was employed to reduce aliased noise, resulting in smaller noise-equivalent bandwidth (NEB) than traditional voltage sampling (3.5x smaller compared to 0.1% voltage settling) [6]-[8]. Two stages of DACs are used to remove the electrode offset voltages from the signal path, to allow high overall gain before the ADC.
The fully differential signal chain consists of a capacitive feedback preamplifier, followed by a WIS circuit composed of a transconductance amplifier (G_m) that drives current onto the input capacitance of a successive approximation register (SAR) ADC. One input of the preamplifier is connected to a 32-channel multiplexer constructed from NMOS switches, and the other input is connected to a typically low impedance reference electrode (a cranial bone screw in these experiments).
The number of multiplexed electrodes is the multiplexing factor, M. The maximum number of available time slices that can be assigned to different electrodes is given by N_tot = f_mux/f_s, where f_mux is the multiplexer base clock frequency and f_s is the desired sampling rate per channel. These slices can be uniformly distributed across the electrodes, or alternatively, multiple samples (N_ave) can be averaged to reduce noise from the circuits and the electrode-tissue interface. In the experiments described in this report, f_mux was fixed at 600 kHz and N_tot = 20, allowing f_s = 30 kS/s/channel for a maximum M of 20 (at N_ave = 1). Each time slice is therefore 1/f_mux = 1.67 μs long. Of this, the initial 10% was set aside for settling of the preamplifier and DACs, leaving T_int = 1.5 μs for WIS, which results in a NEB of 333 kHz (see [6] for theory).
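The timing budget above can be reproduced with a few lines of arithmetic. The following is a minimal sketch, not the authors' code: the function names are hypothetical, and the NEB expression 1/(2·T_int) is an assumption (an ideal boxcar integrator) chosen because it matches the 333 kHz value quoted in the text; see [6] for the actual theory.

```python
def mux_timing(f_mux_hz, f_s_hz, settle_fraction=0.10):
    """Return (N_tot, slice duration, T_int, NEB) for a rapidly multiplexed front end.

    Assumption: NEB = 1 / (2 * T_int), i.e. an ideal windowed (boxcar) integrator.
    """
    n_tot = round(f_mux_hz / f_s_hz)            # time slices available per sample period
    t_slice = 1.0 / f_mux_hz                    # duration of one time slice, in seconds
    t_int = (1.0 - settle_fraction) * t_slice   # integration window left after settling
    neb_hz = 1.0 / (2.0 * t_int)                # noise-equivalent bandwidth (assumed model)
    return n_tot, t_slice, t_int, neb_hz


def max_mux_factor(n_tot, n_ave):
    """Largest M when every electrode receives n_ave slices per sample period."""
    return n_tot // n_ave


n_tot, t_slice, t_int, neb = mux_timing(600e3, 30e3)
print(n_tot, t_slice, t_int, neb)
print(max_mux_factor(n_tot, 2))   # configuration used for the spike recordings
```

With f_mux = 600 kHz and f_s = 30 kS/s this gives 20 slices of about 1.67 μs, T_int = 1.5 μs, and a NEB of roughly 333 kHz; two slices per electrode then allow M = 10.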
As each electrode is selected, the corresponding DAC codes are updated to cancel the electrode offset. The DAC codes are computed by a binary search algorithm, which tries to center the ADC output codes. The electrode offsets drift with time [9], and hence the offset correction DACs are updated periodically. We observed drift rates of approximately 300 μV/s across channels. Therefore, the offset DACs were recalibrated every 3.5 seconds. The binary search was implemented off-chip for flexibility during testing, and takes ~10 ms in the Windows/Python environment (see Figs. 1 and 2). The offset correction range is ±65 mV, split into 4-bit coarse correction in the preamplifier and 5-bit fine correction in the G_m amplifier, leading to an input referred resolution of ±250 μV.
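The binary search for an offset DAC code can be sketched as a successive-approximation loop. This is only an illustration under stated assumptions: `read_residual` is a hypothetical callback standing in for "apply a DAC code and read how far the ADC output sits from mid-scale", the simulated front end is an idealized linear model, and the toy example handles a non-negative offset with a single unipolar DAC rather than the chip's bipolar coarse/fine pair.

```python
def binary_search_dac(read_residual, n_bits):
    """Find the largest DAC code that does not over-cancel the offset.

    read_residual(code) is a hypothetical callback returning the signed
    distance of the ADC output from mid-scale with `code` applied.
    """
    code = 0
    for bit in reversed(range(n_bits)):      # test bits from MSB down to LSB
        trial = code | (1 << bit)
        if read_residual(trial) >= 0:        # still at or above mid-scale: keep the bit
            code = trial
    return code


def make_simulated_front_end(offset_uv, lsb_uv):
    """Idealized stand-in for the chip: residual = offset - DAC correction (in uV)."""
    def read_residual(code):
        return offset_uv - code * lsb_uv
    return read_residual


# Hypothetical numbers chosen only for illustration.
front_end = make_simulated_front_end(offset_uv=12300, lsb_uv=1000)
code = binary_search_dac(front_end, n_bits=7)
residual_uv = front_end(code)
print(code, residual_uv)
```

The search needs only n_bits measurements per electrode, which is why a periodic recalibration can stay short compared with the recording interval. A two-stage scheme would run the same loop for the coarse DAC first and then for the fine DAC on the remaining residual.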
The 180 nm CMOS test chip achieves a noise-efficiency factor of 4.74 (regardless of M or N_ave) and an area of 0.0077 mm²/channel for M = 10, which is >5x smaller than typical multichannel chips [5], and reaches the state of the art [10].

III. Experimental Methods
A. Surgical Procedures
One 16-channel tungsten microwire array (Tucker-Davis Technologies, TDT) was implanted in the cortex of a 400 g, male Sprague Dawley rat. The 50 μm diameter electrodes were arranged in two rows of eight with 250 μm and 500 μm spacing between row and column sites, respectively. In each row, the electrodes ranged from 1 to 3.8 mm in length to follow the curvature of the brain, so that each electrode tip was at approximately the same cortical depth. Tips were cut at a 60° angle to prevent depression of brain tissue during insertion. All studies were conducted with the approval of the Institutional Animal Care and Use Committee at the University of Utah. For implantation, the animal was anesthetized using 3-5% isoflurane for induction, and maintained at 1-2%. The animal was fixed in a stereotactic frame and a rectangular incision was made to expose the skull. A medial to lateral line was drawn 3.8 mm posterior to the bregma, and a hand drill was used to create an opening approximately 2 mm × 4 mm, as close to the temporal ridge as possible. Four small pilot holes were drilled along both temporal ridges and filled with bone screws. Two of these screws were used as ground and reference; the other two were for mechanical support. The array was then inserted into the cortex using a micromanipulator. During insertion, neural activity was monitored until a depth was reached that provided frequent spiking activity across multiple channels. The exposed cortex was then covered with Kwik-Cast silicone elastomer (World Precision Instruments) and UV cure epoxy.

B. Data Acquisition
The reported recording sessions occurred 5 days after surgery. All neural activity was recorded using either a commercial Grapevine Neural Interface System (Ripple Neuro) referred to as "Grapevine," or the custom test chip (specifications are shown in Table I). All side-by-side comparison data were taken from the same recording session; however, these recordings were not made simultaneously. This is important to note, since exact waveform matching should not be expected. For Grapevine recordings, signals were first passed through a front end (Micro2+Stim, Ripple) with 1 Hz highpass and 4.9 kHz lowpass filters. The front-end achieves <2.1 μV rms input referred noise, and can be considered a "gold standard" reference (custom neural recording chips generally have significantly higher noise [5]). Signals were digitized at 30 kS/s by an onboard 16-bit ADC, then passed to a multichannel processor (Grapevine Neural Interface Processor, Ripple) before being transmitted to a computer via ethernet for storage and analysis. Recordings were monitored and saved using the Trellis Software Suite (Ripple).
The setup for recordings with the multiplexed chip consisted of a custom test PCB for reference voltage generation and power management, and a National Instruments PXIe 6368 for digital control, clocking and I/O. The implanted array was connected to the test PCB with an Omnetics Nano Strip connector. The 6368 was controlled using custom Python code, and provided the 600 kHz multiplexer clock, addresses, and DAC codes. For most recordings, a repeated sequence of 20 addresses was used to distribute time slices across the available channels, using f_mux = 600 kHz and f_s = 30 kS/s. For instance, for M = 4 and when using a uniform distribution, each channel would be sampled 5 times (N_ave = 5 slices) in one sequence. The N_ave slices are averaged to reduce noise. Other variants included 16 slice repetitions and f_s = 37.5 kS/s. All experiments were performed under anesthesia, using a ketamine/xylazine cocktail (50 and 7 mg/kg, respectively), which was injected into the intraperitoneal space. Besides recording experiments, electrode characterization measurements were made with the animal inside of a custom copper mesh Faraday cage, and consisted of electrical impedance spectroscopy (EIS) in a 3-electrode configuration using a Gamry Interface 1000E Potentiostat, similarly to [11].
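The slice bookkeeping described above (a repeated address sequence, followed by averaging of the N_ave slices that belong to each channel) can be sketched as follows. This is a hypothetical illustration rather than the control code used in the experiments: the function names are invented, slices for one channel are assumed to be contiguous, and unused slices are marked `None` and discarded.

```python
def build_sequence(slices_per_channel, n_tot):
    """Build one multiplexer address sequence of length n_tot.

    slices_per_channel maps channel id -> number of successive slices.
    Slices left over are marked None (sampled but discarded).
    """
    seq = []
    for channel, n_slices in slices_per_channel.items():
        seq.extend([channel] * n_slices)
    if len(seq) > n_tot:
        raise ValueError("allocation exceeds the available time slices")
    seq.extend([None] * (n_tot - len(seq)))
    return seq


def demultiplex(samples, seq):
    """Split a multiplexed stream into per-channel streams, averaging each
    channel's slices within every repetition of the sequence."""
    n_tot = len(seq)
    out = {ch: [] for ch in seq if ch is not None}
    for k in range(len(samples) // n_tot):
        frame = samples[k * n_tot:(k + 1) * n_tot]
        sums = {ch: 0.0 for ch in out}
        counts = {ch: 0 for ch in out}
        for ch, x in zip(seq, frame):
            if ch is not None:
                sums[ch] += x
                counts[ch] += 1
        for ch in out:
            out[ch].append(sums[ch] / counts[ch])
    return out


# M = 4 with a uniform allocation of N_ave = 5 slices (N_tot = 20).
seq = build_sequence({0: 5, 1: 5, 2: 5, 3: 5}, n_tot=20)
# Synthetic stream: channel ch reads 10*ch + frame index.
samples = [10 * ch + k for k in range(2) for ch in seq]
streams = demultiplex(samples, seq)
print(streams[2])
```

Each repetition of the 20-slice sequence yields one output sample per channel, so the per-channel rate stays at f_mux/N_tot regardless of how the slices are divided.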

C. Signal Processing
The recorded data were demultiplexed to separate 30 kS/s streams, which involved averaging samples from multiple time slices together for electrodes that were allocated N_ave > 1 (to reduce noise). The data were then processed by digital filters in MATLAB that approximate Grapevine's analog filters, which is important for comparing waveform shapes since the custom chip's frequency response is nearly flat [6]. These "raw" data were passed through a digital bandpass "spike filter" consisting of a 4th order highpass (750 Hz) and 4th order lowpass (4000 Hz) zero-phase digital Butterworth filter, in MATLAB. Putative action potentials were isolated using custom MATLAB detection software. Briefly, a threshold was placed at -3.0x the standard deviation of the spike filtered recordings. Around each threshold crossing, a 1.6 ms waveform snippet was extracted. Isolation of threshold crossing events was performed with time-amplitude window thresholds similar to the "hoops" described in [12]. Upper and lower "hoop" bounds were ±1.7x the standard deviation of all the threshold crossing events for the electrode, from a template waveform (Grapevine: mean of threshold crossing events; test chip: mean of sorted events of the same electrode from the Grapevine recordings). Fig. 3 shows 3.5 sec input-referred recordings, where the raw data were either lowpass filtered (top) or spike filtered (bottom). Spike waveforms from the test chip were obtained using M = 10 (2 slices per channel). Grapevine recordings were performed approximately 10 minutes before the multiplexed recordings in an effort to minimize differences in activity due to brain state, although some difference is inevitable.
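The threshold-crossing step can be illustrated with a short sketch. It is written in Python rather than the MATLAB used in the study, operates on data that are assumed to be already spike filtered, and is not the authors' detection software. The -3.0x standard deviation threshold and the 1.6 ms snippet length come from the text; the snippet alignment (`pre_fraction`) and the skip-ahead after each event are assumptions.

```python
import statistics


def detect_crossings(x, fs_hz, k_sigma=3.0, snippet_ms=1.6, pre_fraction=0.25):
    """Detect negative-going threshold crossings and extract snippets.

    Returns (threshold, events), where each event is (crossing index, snippet).
    pre_fraction (share of the snippet placed before the crossing) is an assumption.
    """
    threshold = -k_sigma * statistics.pstdev(x)
    n_snip = round(snippet_ms * 1e-3 * fs_hz)    # 48 samples at 30 kS/s
    n_pre = int(pre_fraction * n_snip)
    events = []
    i = 1
    while i < len(x):
        if x[i] < threshold and x[i - 1] >= threshold:
            start = i - n_pre
            stop = start + n_snip
            if start >= 0 and stop <= len(x):    # drop events too close to the edges
                events.append((i, x[start:stop]))
            i = stop                             # skip past this snippet window
        else:
            i += 1
    return threshold, events


# Synthetic trace: flat baseline with two large negative deflections.
trace = [0.0] * 300
trace[100] = -50.0
trace[200] = -50.0
threshold, events = detect_crossings(trace, fs_hz=30e3)
print(threshold, [index for index, _ in events])
```

The "hoop" isolation stage would then accept or reject each snippet by comparing it against a template waveform within time-amplitude windows; that stage is omitted here.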

IV. In Vivo Recordings
A. Recorded Waveforms Over Long Timescales
There is strong agreement in the lowpass filtered data. Both show 600-800 μV pp periodic oscillations and smaller oscillations in the 20-100 Hz range, which is characteristic of LFP activity under ketamine [13]. For spike filtered data, both recording systems show periods of low activity and high activity (bursts of threshold crossing events) at approximately 5-7 Hz, which is also common under ketamine [14]. While the peak-to-peak spike filtered amplitudes from each system are similar, the noise floor of the test chip is visibly higher, which is expected given the additional circuit and aliased electrode noise (~5 μV rms for the test chip and ~2.1 μV rms for Grapevine in the spike band). LFP waveforms show good agreement between the two systems across all electrodes up to M = 16 (1 slice per channel with 4 discarded). However, for M > 10, the noise of the spike filtered data was substantially higher.

B. Action Potential Waveform Analysis
To assess the ability for action potential recording, threshold crossing events were isolated and compared to the Grapevine system. Average multiunit waveforms were compared, since there were no clearly discriminable groups of waveforms (putative single units) at any electrode sites when recording with either system. Fig. 4 shows example overlay plots from 4 electrodes. To calculate the average multiunit waveform on each electrode, threshold crossings were averaged across 30 seconds of Grapevine data and 31.5 seconds of test chip data (9 separate 3.5 sec recordings made with M = 10, i.e., 2 slices per channel). Fig. 4 shows close agreement between the mean waveforms of what is likely multiunit activity for Grapevine (black) and the test chip (blue) for the same 10 channels in Fig. 3, at their locations on the array. Similarities in waveform shape between electrodes are likely a result of high neuronal densities in the implanted region and moderate SNR caused by large electrode areas (~4000 μm²). Table II shows quantitative comparisons between the two recording systems. SNR values were calculated using the peak-to-peak amplitude of the average multiunit waveform divided by twice the standard deviation of noise [15]:
$$\mathrm{SNR} = \frac{\max(W) - \min(W)}{2\,\mathrm{SD}(\varepsilon)}$$
where W is the average multiunit waveform and ε is a matrix containing the difference of each sample compared to the average. Noise floors were calculated as μV rms of the spike-filtered waveforms with threshold crossing time segments removed. As expected, the noise was slightly larger in the test chip recordings but the average SNR and amplitudes were similar (4.23 μV rms, 2.59 and 15.62 μV pp, respectively, compared to 3.17 μV rms, 3.07 and 16.37 μV pp for the Grapevine system). Electrode offsets at the test chip input (V_os,in) are listed as well, extracted from the offset correction DAC codes. Fig. 5 shows inter-spike interval (ISI) plots for the same 4 selected channels in Fig. 4. Electrodes 3 and 6 show particularly good agreement.
The fact that intervals from all four channels tend toward <20 ms suggests that a majority of them are taking place within the short bursts seen in Fig. 3, which is caused by synchronized firing of nearby neurons. Fig. 6 shows the average multiunit waveforms recorded from two channels across a range of N_ave values, with each mean waveform derived from 31.5 seconds of recordings taken separately for each value of N_ave. While there does appear to be a small trend toward increased amplitude and narrowing of the width with decreasing N_ave, the relative shape is maintained.
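The SNR definition used for Table II (peak-to-peak amplitude of the average multiunit waveform divided by twice the standard deviation of the residuals) can be sketched as below. The function name is hypothetical, and the use of the population standard deviation over all residual samples is an assumption; the source does not state which estimator was used.

```python
import statistics


def multiunit_snr(snippets):
    """SNR = (max(W) - min(W)) / (2 * SD(eps)).

    W is the sample-wise mean of all snippets; eps collects the difference
    between every snippet sample and W. Population SD is an assumption.
    """
    n_samples = len(snippets[0])
    mean_waveform = [
        sum(s[j] for s in snippets) / len(snippets) for j in range(n_samples)
    ]
    residuals = [s[j] - mean_waveform[j] for s in snippets for j in range(n_samples)]
    noise_sd = statistics.pstdev(residuals)
    return (max(mean_waveform) - min(mean_waveform)) / (2.0 * noise_sd)


# Two tiny synthetic snippets (arbitrary units), for illustration only.
snr = multiunit_snr([[0.0, -9.0, 1.0], [0.0, -11.0, -1.0]])
print(snr)
```

Because the residuals include both recording noise and genuine variability between multiunit events, this SNR is generally lower than the ratio of the mean amplitude to the background noise floor alone.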

C. Scramble Tests and Non-Uniform Slice Allocation
One potential concern is that a variety of circuit non-idealities and electrode behaviors may result in the multiplexing order having some influence on the shape of the waveforms (e.g., crosstalk or settling). Hence, we compared the average waveforms of four channels with M = 4 for two different orders. Fig. 7 shows the average multiunit waveforms for two example channels. These recordings were taken within two minutes of each other. As Fig. 7 demonstrates, the shapes of the average multiunit waveforms were maintained across the two orders, giving basic confidence that order-dependent artifacts have, at most, relatively minor effects.
The architecture allows the user to easily assign different slice allocations to individual electrodes to optimize signal quality, e.g., assigning fewer slices to channels with large amplitude action potentials or higher levels of biological noise. Fig. 8(a) demonstrates two example schemes for M = 4 (a 16-slice sequence was repeated and down-sampled to 37.5 kS/s). Electrodes were assigned either 2, 3, 5 or 6 successive slices, which were averaged to reduce noise. Fig. 8(b) shows the background noise floor for all 4 electrodes for four different slice budgets. Table III shows quantitative data for all 4 channels. All recordings show a reduction in noise for a budget of 6 slices versus 2. Thus, this approach enables the ability to digitally optimize noise performance across electrodes, similarly to [16], [17] but using time as a resource instead of large, paralleled circuits that cost high overhead in area.
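The trade-off behind slice budgeting can be illustrated with a simple noise model. This is a sketch under an explicit assumption that is not taken from the source: the wideband (circuit plus aliased electrode) noise is treated as uncorrelated between slices, so averaging N_ave slices reduces its power by N_ave, while the low-frequency biological background is common to all slices and is not reduced. All numbers are hypothetical.

```python
import math


def noise_after_averaging(wideband_uvrms, biological_uvrms, n_ave):
    """Total noise floor (uV rms) after averaging n_ave slices.

    Assumed model: wideband noise power scales as 1/n_ave,
    biological background is unaffected by averaging.
    """
    return math.sqrt(biological_uvrms ** 2 + wideband_uvrms ** 2 / n_ave)


# Hypothetical per-slice noise levels, for illustration only.
for n_ave in (2, 3, 5, 6):
    print(n_ave, round(noise_after_averaging(4.0, 3.0, n_ave), 2))
```

Under this model the benefit of extra slices saturates once the biological background dominates, which is one reason to give fewer slices to channels with high biological noise or large action potentials.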

V. Noise Analysis
In general, noise in neural recordings primarily comes from three fundamental sources: electronic noise from the circuits, thermal noise from the electrode-tissue interface, and background biological activity [18]. To validate the noise performance of the rapidly multiplexed test chip, before the in vivo experiments we characterized and recorded from the same microwire electrode array in vitro (immersed in phosphate buffered saline solution, PBS). Electrode impedance was measured with EIS to estimate thermal noise using the methods described in [11]. The average electrode impedance at 1 kHz was ~5 kΩ in PBS and ~54 kΩ when implanted, and at 30 kHz was ~2 kΩ in PBS and ~24 kΩ when implanted (see [6], [11] for representative spectra). We de-embedded the circuit noise from each channel by subtracting a measurement of the test chip's grounded-input power spectral density (PSD) from the demultiplexed PSD calculated from recordings, similarly to [11]. Here, we compare the de-embedded noise against the impedance-predicted noise, for in vitro and in vivo experiments. To calculate the predicted thermal noise, the EIS data were converted to a PSD, which was processed with the theoretical WIS transfer function (Fig. 1(d)) and folded around the Nyquist frequency.
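The prediction procedure (impedance to thermal PSD, shaping by the WIS response, folding around Nyquist) can be sketched as follows. The sketch rests on assumptions that are not spelled out in this report: thermal noise density is taken as 4kT·Re{Z(f)}, the WIS response is modeled as an ideal boxcar integrator (a sinc² power response), the temperature is set to 310 K, and the example uses a flat 54 kΩ resistance instead of a measured EIS spectrum. All function names are hypothetical.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K


def thermal_psd(re_z_ohm, temp_k=310.0):
    """One-sided thermal noise PSD (V^2/Hz) of the real part of an impedance."""
    return 4.0 * K_B * temp_k * re_z_ohm


def wis_gain_sq(f_hz, t_int_s):
    """Squared magnitude response of an ideal boxcar integrator (assumed WIS model)."""
    x = math.pi * f_hz * t_int_s
    return 1.0 if x == 0.0 else (math.sin(x) / x) ** 2


def folded_psd(f_hz, re_z_of_f, f_s_hz, t_int_s, n_images=2000):
    """Baseband PSD at f_hz after WIS shaping and aliasing of all images.

    re_z_of_f is a callable returning Re{Z} in ohms at a given frequency.
    The image sum is truncated at n_images on each side.
    """
    def shaped(f):
        return thermal_psd(re_z_of_f(f)) * wis_gain_sq(f, t_int_s)

    total = shaped(f_hz)
    for k in range(1, n_images + 1):
        total += shaped(k * f_s_hz - f_hz) + shaped(k * f_s_hz + f_hz)
    return total


# Flat 54 kOhm resistance as a stand-in for an implanted electrode (hypothetical).
flat = lambda f: 54e3
ratio = folded_psd(1e3, flat, f_s_hz=30e3, t_int_s=1.5e-6) / thermal_psd(54e3)
print(ratio)
```

For a flat spectrum the folded density approaches 1/(f_s·T_int) = 2·NEB/f_s times the unfolded density, about 22 for the settings in this report, which shows how strongly aliasing raises the baseband noise floor when N_ave = 1.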
Background biological activity that is considered noise in neural recordings is a low frequency phenomenon (up to ~5 kHz [11]). However, circuit noise and the electrode-tissue noise are wideband [11]. As the rapidly multiplexed chip samples the electrodes, high-frequency noise from these sources aliases into the baseband. The PSD of the baseband noise depends on the WIS integration time (set by the multiplexing rate), which determines the front-end's NEB. As discussed in Section II, for the measurements in this report the multiplexing clock runs at 600 kHz, which translates into 30 kS/s/channel for N_tot = 20 and a NEB of 333 kHz (T_int = 1.5 μs).
We multiplexed all 16 electrodes in vitro, using N_ave = 1 slice per channel. Fig. 9(a) shows the comparison of the de-embedded PSD for a typical electrode along with the impedance-predicted PSD. Since there is no biological activity, the de-embedded PSD should closely match the predicted thermal noise. We see some error in the matching that is attributed to digital interference in the overall test setup, which we reduced as much as possible through improved shielding and grounding. Fig. 9(b) shows the same comparison for a typical electrode from the in vivo experiments described in Section IV. These measurements were taken under ketamine anesthesia with M = 10 electrodes (N_ave = 2 slices per channel), which was the configuration that showed well-isolated threshold crossing activity. We see that the PSD contains low frequency biological activity that causes a deviation with respect to the expected thermal noise, and at higher frequencies the measured PSD is close to the impedance-predicted PSD. Tables IV and V show data for all electrodes tested in vitro and in vivo, respectively, where the noise was integrated in the 500 Hz-5 kHz band to estimate the noise expected after spike filtering. Somewhat higher errors are seen in the in vivo data, which is expected due to the presence of biological activity. The average integrated content in the spike band is 4.77 μV rms for in vivo measurements, which correlates well with the background noise seen in the Section IV recordings.
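The de-embedding and band-integration steps of this section can be sketched in a few lines. This is an illustrative outline, not the analysis code behind Tables IV and V: the function names are hypothetical, negative differences are clipped to zero as a pragmatic assumption, and the example uses a synthetic flat spectrum.

```python
import math


def deembed(psd_recorded, psd_grounded):
    """Subtract the grounded-input PSD from the recorded PSD, bin by bin.

    Negative results (from estimation error) are clipped to zero; this is an assumption.
    """
    return [max(r - g, 0.0) for r, g in zip(psd_recorded, psd_grounded)]


def band_rms(freqs_hz, psd_v2_per_hz, f_lo_hz, f_hi_hz):
    """Integrate a one-sided PSD over [f_lo, f_hi] (trapezoidal rule), return V rms."""
    total = 0.0
    for i in range(len(freqs_hz) - 1):
        f0, f1 = freqs_hz[i], freqs_hz[i + 1]
        if f0 >= f_lo_hz and f1 <= f_hi_hz:
            total += 0.5 * (psd_v2_per_hz[i] + psd_v2_per_hz[i + 1]) * (f1 - f0)
    return math.sqrt(total)


# Synthetic example: 60 nV/sqrt(Hz) recorded, of which part is circuit noise.
freqs = [100.0 * i for i in range(101)]          # 0 Hz to 10 kHz in 100 Hz bins
recorded = [3.6e-15] * len(freqs)                # V^2/Hz
grounded = [1.1e-15] * len(freqs)                # V^2/Hz, grounded-input measurement
electrode_only = deembed(recorded, grounded)     # 2.5e-15 V^2/Hz, i.e. 50 nV/sqrt(Hz)
rms_v = band_rms(freqs, electrode_only, 500.0, 5000.0)
print(rms_v * 1e6, "uV rms")
```

A flat 50 nV/√Hz density integrated over the 4.5 kHz spike band gives about 3.35 μV rms, the same order as the values discussed in this section.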

VI. Discussion
The test chip recordings were strikingly similar to the commercial bench-top system. In general, this is difficult to achieve with custom circuitry due to differences in filter characteristics, linearity, noise, and channel-to-channel variability. In this study, we were able to get agreement by replicating the bench-top system's analog filters with digital post processing of the test chip data. This match indicates that differences in linearity did not cause significant differences in the shape of the acquired signals (the Grapevine's linearity is not reported, while the test chip's THD is 2% [6]). The measured crosstalk depended on the clock duty cycle, since the settling on C_S and the reset on C_IN,adc happen during the narrow settling phase of the clock. For a 10% duty cycle the measured crosstalk was −20 dB. This is higher than simulated (−60 dB), and is suspected to be due to either parasitic capacitance at the input or higher reset switch resistance compared to simulations. For a 20% duty cycle, the measured crosstalk was < −60 dB. For channel-to-channel variability, we suspect that the Grapevine analog front-end uses G_m-C filtering, which typically results in high levels of mismatch. The test chip, however, does not have channel-to-channel variability to first order since the same circuit is used to condition each channel.
A noise-related aspect that contributed to the close agreement was the reuse of thresholds and "hoops" [12] determined from the bench-top recordings, when processing the test chip data. This step was primarily important for close agreement in the ISI plots, and made a relatively minor impact on the average waveform shape. We did observe some loss of events during strong bursts of activity, when manually isolating threshold crossing events in the test chip data without using knowledge of the bench-top recordings. This loss of events is expected from custom circuitry in general, which is usually designed for higher levels of input-referred noise (5-10 μV rms) compared to bulky and power-hungry bench-top systems.
This issue is difficult to assess from the literature, due to a lack of studies that perform detailed comparisons of recordings between custom and commercial systems. The isolation method used was deemed appropriate since the main purpose of this study was to verify that rapid multiplexing directly at the electrodes can record the same events as traditional architectures (multiunit action potentials). The experimental results contribute strong evidence that this is the case, although it should be noted that the experimental conditions did not produce recordings of well-isolated single units with either system. In a practical application scenario, the test chip could in fact be used with M = 1 to produce very low noise recordings for initialization of spike sorting, followed by time slice reduction for multichannel acquisition, although this technique is likely not needed in most applications.
In the presented experiments, the multiplexing factor was constrained to 10, primarily due to the small threshold crossing amplitudes (~20 μV pp). These amplitudes are on the low side compared to other studies, as are the measured background noise levels (~4 μV rms). For example, [19] reported 50-100 μV pp amplitudes and an average background noise level of ~11 μV. A similar longitudinal study [20] reported ~120 μV pp average amplitudes and background noise ranging from 5-13 μV rms. Given these amplitudes and noise levels, far higher multiplexing factors are possible, although it should be mentioned that high-frequency noise was not reported and may be somewhat higher given the small electrode areas used. However, [21] reported amplitudes of 60-400 μV pp across multiple types of arrays (including TDT microwires) with different electrode materials and area, with an average SNR of ~6 across and within all types of arrays, with background noise of ~10 μV rms. Hence, a multiplexing factor of 10 can be viewed as a lower bound that applies to experiments with very low amplitudes and low intrinsic SNR (~3 in this study). As demonstrated in this report, a system can be designed for higher multiplexing ratios than those that are allowed in one particular experiment, without sacrificing performance, since multiple time slices can be used to reduce noise. The non-uniform slice allocation experiments presented highlight the utility of the rapidly multiplexed approach for designing an adaptive system without large area overhead [16], [17] or the need to perturb the circuit's bias point.
There are still many fundamental limits and practical issues to be explored, and considerable room for refining circuit implementations and performance. While this approach has been recently explored for low impedance surface electrodes for electrocorticography [22], to the authors' best knowledge this report and our work in [6] are the only examples in the literature of acquiring putative action potentials using rapid multiplexing, directly at high impedance electrodes, without any preamplification or buffering. Although questions remain related to electrode behavior and fundamental limits (focuses of our ongoing work), this study hopefully contributes confidence in a dramatically different approach to neural recording with tremendous potential for scalability. As discussed in [6], this technique is most beneficial for fully integrated arrays with active circuitry (e.g., [10]), since it does not address the wiring bottleneck that occurs if the circuitry is on a separate chip.

VII. Conclusion
This report described verification experiments comparing a rapidly multiplexed CMOS test chip to a commercial, bench-top recording system. Strong agreement was observed in recordings between the two systems, including threshold crossing shapes and inter-spike intervals. Measurements of noise characteristics were analyzed to verify the fundamental theory of rapid multiplexing using windowed integration sampling, as well as the test chip's performance. The results indicate successful recording of putative multiunit action potentials using a 10:1 multiplexing factor. This factor divides the circuit's effective area, resulting in 0.0077 mm²/channel, which is >5x smaller than typical multichannel neural recording chips, and reaches the state of the art [10]. Higher multiplexing factors are expected to be achievable, and there is new open space for further research into similar techniques and improved electronics.

Fig. 2. Test setup used for multiplexed neural recordings from an anesthetized rat. The implanted electrode array connects to the multiplexing test chip through a wired Omnetics connector. The digital outputs from the chip are read by a NI PXIe-6368, which also controls the multiplexer addresses, the offset DAC codes, and the master clock. The PXIe-6368 is controlled by a Python script, which implements the binary search algorithm for finding the DAC codes for offset compensation.

Fig. 4. Right: waveforms of threshold crossing events from ~30 seconds of recordings on four example channels. Blue waveforms were recorded from the multiplexed chip; black waveforms from the Grapevine system. Solid traces represent the mean of the shown waveforms; lighter dashed traces show individual threshold crossings. Left: Mean waveforms of threshold crossing events for 10 channels and their approximate location on the electrode array.

Fig. 5. Inter-spike interval (ISI) plots for the four selected channels in Fig. 4 (blue) show close agreement to the Grapevine system (gray).