A Low-Complexity Sensing Scheme for Approximate Matching Content-Addressable Memory

The need for approximate rather than exact search arises in numerous compare-intensive applications, from networking to computational genomics. This brief presents a novel sensing approach for approximate matching content-addressable memory (CAM) designed to handle large Hamming distances (HDs) between the query pattern and stored data. The proposed matchline sensing scheme (MLSS) employs a replica mechanism and a 12-transistor positive feedback sense amplifier to effectively resolve the approximate match operation. The MLSS was integrated into a 4 kB approximate CAM array and fabricated in a 65 nm CMOS technology. With an overall area footprint of 0.0048 mm², which includes 512 sense amplifiers and the replica mechanism, the MLSS allows a flexible and dynamic adjustment of the HD tolerance threshold via several design variables. Experimental measurements demonstrate the efficiency of our sensing scheme in tolerating very large HDs with the highest sensitivity.

Several techniques have been proposed in the literature to enable approximate matching in CAMs by leveraging customized sensing schemes and other solutions (i.e., involving redundancy). For instance, error-correction codes have been suggested for Ternary CAMs (TCAMs) and NAND-type CAMs, which employ parity bits and a dedicated matchline scheme [11], [12]. However, these methods can only handle a small Hamming distance (HD) of 1 to 4 bits between the input query pattern and stored data, and their implementations require large area overhead and increased design complexity. Tunable sampling time techniques have also been explored [13]; however, their implementation is challenging due to the strong dependency on precise device and circuit sizing, susceptibility to jitter, and higher probability of generating false results (matches instead of mismatches and vice versa). A recent solution, proposed in [14], presents a large HD-tolerant approximate CAM (HD-CAM) based on matchline charge redistribution. Unfortunately, the sensing scheme presented in that work also suffers from a high degree of design complexity and large area overhead.
This brief proposes a low-complexity, scalable, and area-efficient sensing scheme for approximate CAMs with a tunable matchline discharge rate [14], [15]. Our sensing scheme consists of a 12-transistor positive feedback sense amplifier along with a replica mechanism that provides control of the sampling time during approximate match operations. Specifically, the replica line enables the sense amplifier, which then resolves the compare result. Additional design variables allow adjusting the HD tolerance threshold and the sensitivity of the proposed sensing scheme. A 4 kB HD-CAM design [14], integrating the proposed matchline sensing scheme, was fabricated in a commercial 65 nm CMOS technology. The sensing scheme of the HD-CAM design has a silicon footprint of 0.0048 mm². The effectiveness of the suggested approximate match sensing scheme (i.e., its sensitivity as a function of HD and its susceptibility to variations) is evaluated through experimental measurements.
This brief provides the following main contributions:
• [...] approximate match sensing capabilities, that has been fabricated and evaluated in silicon.
• Our sensing scheme supports a very wide range of HD tolerance through user-configurable design variables.
• The proposed sensing scheme presents low susceptibility to sampling time, temperature, and process variations.
• Unlike state-of-the-art matchline sensing schemes, the proposed design utilizes the charge redistribution of a replica line to control the sense amplifier sampling time.

II. BACKGROUND: HAMMING DISTANCE TOLERANT CAM
The HD tolerant CAM (HD-CAM), proposed in [14], is capable of both exact and approximate matching, the latter tolerating HDs of up to 60% of the length of the query pattern. The HD-CAM design is based on the observation that the matchline voltage drop is proportional to the HD between the query pattern and a data word. To evaluate the efficiency of HD-CAM approximate matching, it was tested as a real-time DNA classifier programmed to detect Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-2) DNA in a metagenomic sample (i.e., containing the DNA of multiple organisms). Noteworthy attributes of HD-CAM include its ability to tolerate large HDs with high sensitivity and precision, its resilience to DNA sequencing errors and sampling time variation, and its reduced area overhead and design complexity.
Fig. 1(a) shows the top-level schematic view of an m × n HD-CAM [14]. Each row in the CAM has its own matchline (ML), which is connected to an ML sense amplifier (MLSA). A pair of searchlines (SLs) is connected to all the bitcells in a column, and the bitcells of a row form an n-bit HD-CAM word, as shown in Fig. 1(b). The precharge (PC) transistor (MPC) is used to precharge the ML. The MLSA senses the state of the matchline against a reference voltage (V_ref). Fig. 1(c) shows the NOR-type HD-CAM bitcell, which is built upon the conventional NOR-type CAM bitcell [1]. Similar to a standard six-transistor static random access memory (6T-SRAM) cell, it is based on a pair of cross-coupled inverters for storing data. It is accessed for write and read by enabling row access through the word line (WL) and either driving SL and its complement to opposite logic values (write) or precharging them (read). The associative search operation involves two steps: precharge and evaluation. During the precharge step, the ML is precharged to V_DD by enabling the MPC transistor (PC = '0'). This is followed by the evaluation step, where the MPC transistor is cut off (PC = '1'), and the query data is loaded onto the SLs. The comparison between the query pattern and the data word is performed by the M1-M3 transistors. An evaluation transistor (M4) is integrated into the HD-CAM cell to regulate the discharge rate of the ML according to the evaluation voltage level (V_eval). By controlling the voltage level on M4, HD-CAM can perform approximate matching when V_eval < V_DD, while a conventional exact match CAM operation is executed when M4 is driven by a full voltage level, V_eval = V_DD.
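The exact and approximate match behavior described above can be illustrated at a purely logical level. The following is a minimal behavioral sketch, not a circuit model: the function names and the integer `hd_threshold` parameter are hypothetical stand-ins for the tolerance that the hardware sets through its analog design variables, and a threshold of 0 is assumed to correspond to the exact match mode (V_eval = V_DD).

```python
def hamming_distance(query: int, stored: int) -> int:
    """Count the bit positions in which the query and the stored word differ."""
    return bin(query ^ stored).count("1")


def search(query: int, words: list[int], hd_threshold: int = 0) -> list[int]:
    """Return the row indices whose HD to the query is within hd_threshold.

    hd_threshold = 0 models a conventional exact match; a larger value
    models approximate matching (hypothetical abstraction of V_eval < V_DD).
    """
    return [i for i, w in enumerate(words)
            if hamming_distance(query, w) <= hd_threshold]


# Three 8-bit stored words (illustrative values only).
words = [0b10110010, 0b10110011, 0b01001101]
exact = search(0b10110010, words)       # exact match mode
approx = search(0b10110010, words, 1)   # tolerate up to 1 mismatching bit
```

In this sketch every row is compared independently, mirroring the fact that each row has its own matchline and sense amplifier.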

III. PROPOSED MATCHLINE SENSING SCHEME (MLSS)
A. Design and Operating Principle
The proposed matchline sensing scheme (MLSS) is based on a positive feedback sense amplifier (SA) that is controlled by a replica line, as illustrated in Fig. 2(a). The ML replica line (MLRL) is composed of n replica transistors (M_(n−1) to M_0) that are connected in parallel. The gate terminals of these devices are grounded, their drain terminals are connected to V_DD, and their source terminals are connected to the MLRL. The MLSS includes three additional transistors (MN1, MN2, and MP), along with an inverter (I1). MN2 and MP are controlled by the PC signal. The gate of MN1 is connected to the replica voltage (V_rep), which in the final design is the same as V_eval (or V_DD), i.e., it does not require a separate voltage source (or MN1). However, to evaluate the MLSS susceptibility to sampling time variations, we allow V_rep to be biased separately to adjust the MLRL discharge, as presented hereafter.
The MLRL emulates the capacitance of an ML. Its output is the Sen signal, which enables the positive feedback SA at the appropriate time. Fig. 2(b) details the schematic of the positive feedback SA. It comprises a pair of cross-coupled inverters with four enable transistors (MEN1-MEN4), whose gates are driven by Sen. MEN1 acts as a header that connects the latch to V_DD, while MEN2 acts as a footer that connects it to ground. The last two enable transistors, MEN3 and MEN4, are connected to the output [...] Fig. 3. This figure also shows two particular cases: when V_rep is close to V_DD and when V_rep is less than half V_DD. These two examples are labeled as sampling time at t_1 and t_2, and correspond to approximate match and mismatch responses, respectively. Note that lowering V_rep slows down the MLRL discharge, which in turn delays the Sen signal assertion. Therefore, the design variable V_rep may provide an additional level of flexibility, enabling the fine-tuning of the sensitivity response of the approximate match.
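The effect of the sampling time on the compare result can be sketched with a toy model. It assumes, for illustration only, that the ML voltage drops linearly with both the HD and the elapsed time; the slope `k`, the time units, and the function names are hypothetical and are not taken from the measured design. The dependence on V_eval is deliberately left out.

```python
def ml_voltage(hd: int, t_sample: float, v_dd: float = 1.2, k: float = 0.05) -> float:
    """Toy ML voltage at the sampling instant.

    Assumes a drop of k volts per mismatching bit per unit time (hypothetical),
    clipped at ground.
    """
    return max(0.0, v_dd - k * hd * t_sample)


def is_match(hd: int, t_sample: float, v_ref: float,
             v_dd: float = 1.2, k: float = 0.05) -> bool:
    """The sense amplifier reports a match if the ML is still above V_ref."""
    return ml_voltage(hd, t_sample, v_dd, k) > v_ref


# Same word (HD = 4) and same V_ref, sampled at two different instants.
early = is_match(hd=4, t_sample=1.0, v_ref=0.8)  # Sen asserted early (t_1)
late = is_match(hd=4, t_sample=3.0, v_ref=0.8)   # Sen asserted late (t_2)
```

Under these assumptions, an early sampling instant resolves the row as an approximate match, while a later one gives the ML enough time to fall below V_ref and resolves it as a mismatch.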

B. MLSS in Approximate CAM Memory Array
The top-level architecture of the HD-CAM memory array, including the MLSS and peripherals, is illustrated in Fig. 4(a). A 32-kbit HD-CAM array, organized as 512 64-bit words, comprises two 256×64 memory blocks. Two replica circuits are built in the HD-CAM array: the replica row and the replica column. The replica row has 64 transistors connected in parallel. Note that to balance the IR drop between the MLRL and the 512 SAs, the MLRL is placed at the middle of the memory array. The replica row generates the self-timed Sen signal to control the positive feedback SAs. Every evaluation of the SA is preceded by a precharge of the MLRL to initialize the sensing of the 512 HD-CAM words. This ensures that the evaluation phase starts only after the precharge, so that the search operation is correct across Process-Voltage-Temperature (PVT) variations. The replica column serves to synchronize the delays of the Sen signal and the searchlines (SL and its complement). This replica column is connected to the Sen line, and comprises 512 delay cells connected in parallel, as shown in Fig. 4. The delay cells represent the capacitance of the searchlines that are connected along a column of memory cells.
The layout of the HD-CAM memory array is shown in Fig. 4(b). The height of three rows matches the height of the SA. Therefore, three SAs are placed next to each other in a single row, as shown in the sub-block view at the top of the memory array. The inset shows the layout of the positive feedback SA, exhibiting an area footprint of 13.4 µm². The total area of the MLSS including all 512 SAs, as well as the replica row and column, is 0.0048 mm².
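As a quick consistency check, the array organization stated above (512 words of 64 bits, split into two blocks) can be related to the 32-kbit and 4 kB capacity figures. The constant names are illustrative only.

```python
WORDS = 512          # HD-CAM words, one matchline and one SA each
BITS_PER_WORD = 64   # also the number of replica-row transistors
BLOCKS = 2           # memory blocks in the array

rows_per_block = WORDS // BLOCKS          # rows in each 256x64 block
total_bits = WORDS * BITS_PER_WORD        # total storage in bits
total_kbit = total_bits // 1024           # capacity in kbit
total_kbytes = total_bits // 8 // 1024    # capacity in kB
```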

B. Methodology and Measurement Results
Offline setup: First, we create a random dataset and store it in the HD-CAM array. Second, we build a query dataset, which is the same dataset as the one stored in HD-CAM, but overlaid with random errors at a certain predefined rate (i.e., a certain number of bit errors in random positions per memory row). The error rate defines the Hamming distance (HD) between the queries and the data stored in HD-CAM. Third, the MLSS is configured by setting its HD tolerance threshold using the design variables V_eval and V_ref.
Online test: The query data words are searched one by one in the HD-CAM, and the number of matches is recorded. Since the MLSS HD threshold is configured to tolerate said HD, we expect each query to match in HD-CAM. Therefore, all matches are true positive (TP) results, while all mismatches are false negative (FN) ones. Using these results, we can calculate the sensitivity of the MLSS as TP/(TP + FN).
Silicon measurement results of the MLSS sensitivity for different values of V_ref, V_eval, and V_rep, different temperatures, and different silicon samples are provided in Fig. 6. Sensitivity as a function of V_ref is shown in Fig. 6(a) for an HD of 4 (i.e., 4 bit errors in random positions in each HD-CAM row). Here, V_rep = V_eval, and V_eval is varied between 0.6 V and 1.0 V. For lower values of V_ref, the MLSS sensitivity is 100%, meaning that the MLSS tolerates the HD of 4 and correctly resolves the compare results. With increasing V_ref, the MLSS sensitivity diminishes, meaning that some matches are falsely registered as mismatches. We also analyze the MLSS susceptibility to sampling time variation, which we model by varying V_rep. Fig. 6(c) shows the MLSS sensitivity as a function of V_ref for a V_DD of 1.2 V and different V_rep values. Two sets of measurement results are shown: for V_eval of 1 V and 0.6 V. For V_eval of 1 V, the sampling time variation has very little effect on the MLSS sensitivity. For V_eval of 0.6 V, the effect is larger, mainly at V_rep of 0.6 V. Overall, the susceptibility to sampling time variation is limited over a wide range of V_eval and V_ref.
Finally, we also analyze the MLSS temperature and process variability (shown in Fig. 6(d) and (e), respectively), where about 100% sensitivity is maintained over a wide range of temperatures and across 5 different chips. Note that when significant PVT variations adversely affect the target HD tolerance, the issue can be resolved by dynamically adjusting the MLSS design variables.
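The offline setup and online test can be mimicked in software to make the sensitivity metric concrete. The sketch below is a behavioral simulation under stated assumptions: the MLSS is replaced by an ideal integer `hd_threshold` (hypothetical), and each query is checked only against the row it was derived from. It does not model V_eval, V_ref, or any circuit non-ideality.

```python
import random


def inject_errors(word: int, n_bits: int, hd: int, rng: random.Random) -> int:
    """Flip exactly `hd` distinct, randomly chosen bit positions of `word`."""
    for pos in rng.sample(range(n_bits), hd):
        word ^= 1 << pos
    return word


def sensitivity(tp: int, fn: int) -> float:
    """Sensitivity as defined in the text: TP / (TP + FN)."""
    return tp / (tp + fn)


def run_test(n_words: int = 512, n_bits: int = 64, hd: int = 4,
             hd_threshold: int = 4, seed: int = 0) -> float:
    rng = random.Random(seed)
    # Offline setup: random stored dataset and error-overlaid queries.
    stored = [rng.getrandbits(n_bits) for _ in range(n_words)]
    queries = [inject_errors(w, n_bits, hd, rng) for w in stored]
    # Online test: a query is a true positive if its source row matches
    # within the (idealized) tolerance threshold, otherwise a false negative.
    tp = sum(1 for q, w in zip(queries, stored)
             if bin(q ^ w).count("1") <= hd_threshold)
    fn = n_words - tp
    return sensitivity(tp, fn)


tolerated = run_test(hd=4, hd_threshold=4)      # threshold covers the injected HD
not_tolerated = run_test(hd=6, hd_threshold=4)  # threshold set too low
```

With an ideal threshold the result is all-or-nothing; the intermediate sensitivities seen in silicon arise from analog effects that this sketch intentionally omits.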

C. Related Work and Comparison With State-of-the-Art
Table I qualitatively compares the proposed matchline sensing scheme with other sensing approaches compatible with approximate search CAMs. These approximate matching MLSSs have a large area footprint, can only tolerate a limited HD, and present a high degree of design complexity. Garzón et al. [14] require a complex sizing process and extra peripherals. Krishnan et al. [11] use an analog comparator and additional circuitry. Efthymiou [12] requires an encoder for parity bits, a dedicated ML scheme, and an embedded comparator in each cell. Imani et al. [13] use delay lines at the clock inputs of four SAs per matchline, as well as precise device and circuit sizing. The StrongARM comparator, used in [14], as well as the proposed MLSS, assure better PVT stability, mainly due to the flexibility to adjust the tolerance threshold to a wide range of HDs [14]. To our knowledge, none of these state-of-the-art designs have been silicon-proven. In contrast, the proposed MLSS has been demonstrated, by means of silicon prototyping and measurements, to provide an efficient, low-complexity, and low-cost solution for the tunable matchline discharge rate-based approximate search CAM.
V. CONCLUSION
This brief introduced a low-complexity, scalable, and area-efficient matchline sensing scheme for approximate search CAM based on tunable matchline discharge. Our circuit was fabricated as part of a 65 nm test chip and evaluated through post-silicon testing and measurements. The proposed sensing scheme exhibits high sensitivity over a wide range of HDs between the queries and stored data. Testing results show that the proposed design can flexibly adjust the tolerance threshold, while exhibiting very limited susceptibility to sampling time, temperature, and process variations. The proposed design offers an efficient, low-complexity, and robust alternative to state-of-the-art approximate search CAM sensing approaches.

Fig. 1. Overview of the Hamming Distance (HD) tolerant CAM (HD-CAM) based on tunable matchline discharge rate. (a) HD-CAM array. (b) n-bit HD-CAM word. (c) HD-CAM cell highlighting the storage, comparison, and approximate match evaluation blocks. For the sake of simplicity, the wordline (WL) is not shown in the HD-CAM cell block of (b) and (c).

Fig. 2. Proposed matchline sensing scheme (MLSS) for HD tolerant CAM based on tunable matchline discharge rate. (a) Schematic of the MLSS. (b) Schematic of the positive feedback sense amplifier.

Fig. 5(a) shows the test board with the fabricated test chip, nicknamed "LEO-II". The layout of the fabricated chip is provided on the right side of Fig. 5(a), with the approximate search CAM arrays highlighted among the various SoC components and other research projects integrated within the chip. Fig. 5(b) provides the main features of the 4 mm² chip, fabricated in 65 nm CMOS technology. The SoC operates at a frequency of 300 MHz at a supply voltage of 1.2 V. Fig. 5(c) shows our experimental setup, with an Intel Cyclone-V FPGA used for control and testing support during measurements.

Fig. 4. (a) Top-level architecture of the HD-CAM memory array along with the matchline sensing scheme and peripherals.(b) Layout of the HD-CAM array highlighting the replica row and sense amplifiers.In the inset: the positive feedback sense amplifier layout.

Fig. 5. (a) LEO-II SoC board along with a top-level view of the SoC layout highlighting HD-CAM, the approximate search CAM equipped with the proposed sensing scheme.(b) Main features of the test chip.(c) Photo of the experimental setup.For the purposes of testing and control, an Intel Cyclone-V FPGA is connected to the prototyping board.

Fig. 6. Measurement results of the MLSS sensitivity for different voltages and temperature variations. Sensitivity as a function of: (a) V_ref and V_eval, (b) HD for V_eval = 0.6 V and V_ref = 0.8 V, (c) V_ref for different V_rep, (d) temperature. (e) Die-to-die variability box plot of sensitivity.

TABLE I. COMPARISON BETWEEN THE PROPOSED DESIGN AND OTHER POSSIBLE SENSING SCHEMES COMPATIBLE WITH HAMMING DISTANCE CAM

The higher (lower) the V_eval or V_ref, the lower (higher) the HD tolerance threshold.