Time Complexity of In-Memory Solution of Linear Systems

In-memory computing (IMC) with cross-point resistive memory arrays has been shown to accelerate data-centric computations, such as the training and inference of deep neural networks, due to the high parallelism endowed by the physical rules of electrical circuits. By connecting cross-point arrays with negative-feedback amplifiers, it is possible to solve linear algebraic problems, such as linear systems and matrix eigenvectors, in just one step. Based on the theory of feedback circuits, we study the dynamics of the solution of linear systems within a memory array, showing that the time complexity of the solution is free of any direct dependence on the problem size N; rather, it is governed by the minimal eigenvalue of an associated matrix of the coefficient matrix. We show that when the linear system is modeled by a covariance matrix, the time complexity is O(log N) or O(1). In the case of sparse positive-definite linear systems, the time complexity is solely determined by the minimal eigenvalue of the coefficient matrix. These results demonstrate the high speed of the circuit for solving linear systems in a wide range of applications, thus supporting IMC as a strong candidate for future big data and machine learning accelerators.


I. INTRODUCTION
The system of linear equations is among the most common problems in scientific and engineering fields, such as quantum mechanics, statistical analysis, network theory and machine learning [1], [2]. Improving the time and energy efficiency of solving linear systems is constantly sought in modern scientific computing [3] and data-centric applications [4]. Conventional digital computers solve linear systems by using classical algorithms such as Gaussian elimination, LU factorization and the conjugate gradient (CG) method [5]. In these algorithms, the time complexity is always a polynomial function of the matrix size N, namely O(poly(N)). In the era of big data and the internet of things, however, such performance may not be sufficient, given the exponential increase of data size and the approaching physical limits of Moore's law [6]. In the quest for an acceleration of data-intensive tasks, quantum computing has also been demonstrated to solve systems of linear equations with an O(log N) time complexity [7], [8].
Although quantum computing appears promising for an exponential speedup of the solution, cryogenic temperatures and the maintenance of quantum coherence in quantum computers remain strong obstacles toward practical implementation, especially for portable computing [9]. Here we show that in-memory computing, which relies on physical computing with crosspoint analog resistive memory arrays and negative feedback in the circuit connections, solves a linear system in a time that is dictated by the minimal eigenvalue of an associated matrix. As a result, the corresponding time complexity is demonstrated to be extremely low, e.g., O(log N) or O(1) for solving linear systems of a model covariance matrix. For sparse positive-definite linear systems, the time complexity depends solely on the minimal eigenvalue of the coefficient matrix, thus outperforming the conventional digital and quantum computing counterparts.

II. MULTILEVEL RRAM DEVICE
Resistive memories (also known as memristors) are two-terminal devices whose resistance (conductance) can be changed by a voltage stimulus [10], [11]. The class of resistive memory devices includes various concepts, such as the resistive switching memory (RRAM) [12], [13], the phase change memory (PCM) [14] and the magneto-resistive memory (MRAM) [15]. Thanks to their small size and nonvolatile behavior, resistive memories have been widely considered as promising devices for memory technology [12], [13]. Most importantly, resistive memories enable stateful logic [16], [17] and in-memory analog computing [18], [19], thus circumventing the communication bottleneck between the memory and the processor, which represents the main limitation of von Neumann machines. Fig. 1 shows the multilevel current-voltage (I-V) characteristics of an RRAM device, supporting the ability to store an arbitrary analog number mapped in the device conductance [19]-[21]. The RRAM conductance is controlled by the compliance current, namely the maximum current supplied by the select transistor during the set transition from high resistance to low resistance [17]. The device is fully reconfigurable, in that the application of a negative voltage can restore a high resistance in the device, thus preparing it for another analog set operation. The 8 conductance levels in Fig. 1 will be employed in the following as discrete values to construct matrices and simulate the solution of linear systems within the circuit.

III. SOLVING LINEAR SYSTEMS WITH THE CROSSPOINT CIRCUIT

A. Time Complexity Analysis of the Crosspoint Circuit
Crosspoint resistive memory arrays can be conveniently used to accelerate the matrix-vector multiplication (MVM), which is a core operation in many computing tasks, such as sparse coding [18], signal processing [19] and neural network training [22]. Recently, a crosspoint circuit of resistive memory arrays has been demonstrated to solve linear systems or eigenvector equations in one step [23]. Fig. 2(a) shows the circuit to solve a system of linear equations, which reads

Ax = b,   (1)

where A is an N × N matrix of coefficients, b is a known vector, and x is the unknown vector to be solved. In the crosspoint circuit, each coefficient A_ij of matrix A is coded as the analog conductance G_ij of a resistive memory, the input voltages represent −b, and the output voltages of the operational amplifiers (OAs) provide the solution x = A⁻¹b. The reconfigurable resistive memory enables the crosspoint circuit of Fig. 2(a) to map an arbitrary matrix A with positive coefficients.
To address the time complexity of the crosspoint circuit, namely the time it takes to yield the correct answer to the problem, we first note that the closed feedback loop plays a leading role in ensuring a physical iteration between input and output. In other words, instead of completing a certain number of open-loop iterations with a gradually diminishing error, we let the signal physically circulate within the closed loop to minimize the error in the feedback network, thus enabling a virtually instantaneous solution. In reality, the non-idealities of the circuit, such as the limited response time of the OAs, result in a finite time complexity of the solution.
Fig. 2(b) shows a block diagram of the crosspoint circuit, where the crosspoint array plays the role of the feedback network, conveying the weighted output to be compared with the input and thus establishing a stable output. To study the time response of the circuit, we write the input-output relationship of the ith OA in Fig. 2(a) in terms of the Laplace transform, according to Kirchhoff's laws and OA theory, namely

V_i(s) = −A_OA(s)·[u_i(s) + Σ_j G_ij·V_j(s)]/(Σ_j G_ij),   (2)

where G_ij is the conductance of the jth device in the ith row. For the whole system, all equations can be combined in the form of a matrix equation, namely

V(s) = −A_OA(s)·[A′·V(s) + D·u(s)],   (4)

where D is a diagonal matrix defined as D_ii = 1/(Σ_j G_ij) and A′ = DA is the matrix associated with matrix A. As all OAs in the circuit are assumed identical, A_OA(s) is a scalar linking the inverting input and the output of each OA. Assuming a single-pole transfer function [24] for the employed OAs, namely A_OA(s) = A₀/(1 + s/ω₀), where A₀ is the DC open-loop gain and ω₀ is the 3-dB bandwidth, Eq. (4) becomes

(s/(A₀ω₀))·V(s) = −[A′ + (1/A₀)·I]·V(s) − D·u(s),   (5)

where I is the N × N identity matrix. As A₀ is usually much larger than 1, the second term in A′ + (1/A₀)·I can be omitted. As a result, the inverse Laplace transform of Eq. (5) into the time domain gives the differential equation

(1/(A₀ω₀))·dv(t)/dt = −A′·v(t) − D·u(t),   (6)

which describes the dynamics of the crosspoint circuit for solving Eq. (1). Though an analytical solution can be obtained for Eq. (6), we developed an iterative algorithm to analyze the transient behavior and evaluate the time complexity of the crosspoint circuit. Eq. (6) can be approximated by a finite difference (FD) equation, namely

v(t + Δt) = v(t) − ε·[A′·v(t) + D·u(t)],   (7)

where ε is a small dimensionless number given by ε = A₀ω₀Δt, with Δt being the incremental time. To verify the FD algorithm, we ran a transient simulation for solving a linear system, comparing the iterative solution according to Eq. (7) with the SPICE (simulation program with integrated circuit emphasis) transient simulation result. A 3×3 matrix was randomly constructed with the 8 discrete values in Fig. 1, and the corresponding linear system was solved. Fig. 3 shows the time evolution of the output v(t) for the linear system. The trajectories of the FD algorithm appear highly consistent with those of the circuit simulation, and both asymptotic results are in line with the steady-state solution. A concern about the OAs is the slew rate, which limits the response time of the output in case of large signals. The adopted OA (AD823 from Analog Devices) has a slew rate of 22 V/µs, which guarantees that the circuit operates in the small-signal regime in our simulations. From the simulation results, the steady-state output amplitude is reached in a computing time below 1 µs, with the computing time defined as the time for the norm of the error to drop below 10⁻³.
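The FD iteration of Eq. (7) can be sketched in a few lines of NumPy. The following is a minimal illustration, not the code used for the simulations above; the 3×3 matrix, the value of ε and the tolerance are arbitrary example choices. The iteration v ← v − ε(A′v + Du), with A′ = DA, D_ii = 1/Σ_j G_ij and u = −b, converges to the solution x = A⁻¹b.

```python
import numpy as np

def solve_fd(A, b, eps=0.01, tol=1e-3, max_iter=1_000_000):
    """Finite-difference emulation of the circuit dynamics, Eq. (7):
    v <- v - eps*(A'v + D u), with u = -b, A' = D A, D_ii = 1/sum_j G_ij."""
    row_sum = A.sum(axis=1)
    Ap = A / row_sum[:, None]            # associated matrix A' = D A
    Du = -b / row_sum                    # D u, with u = -b
    v = np.zeros_like(b, dtype=float)
    x_ref = np.linalg.solve(A, b)        # reference for the error norm
    for k in range(max_iter):
        v = v - eps * (Ap @ v + Du)
        if np.linalg.norm(v - x_ref) < tol:
            return v, k + 1
    return v, max_iter

# Example: a symmetric, diagonally dominant (hence PD) 3x3 matrix
A = np.array([[1.2, 0.3, 0.2],
              [0.3, 0.8, 0.15],
              [0.2, 0.15, 0.6]])
b = np.array([1.0, 0.5, -0.2])
v, iters = solve_fd(A, b)
```

Because D cancels at steady state (A′v = −Du implies Av = b), the row normalization affects only the dynamics, not the final solution.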
The convergence of the iterative algorithm in Eq. (7) requires that the spectral radius of the matrix I − εA′ be less than 1, which implies that the minimal eigenvalue (or minimal real part of the eigenvalues) λ_A′,min of the associated matrix A′ must be positive. The λ_A′,min > 0 condition can also be understood from the viewpoint of the transfer function of the circuit, which follows from Eq. (5). The poles of the system are determined by requiring that sI + A₀ω₀A′ be a singular matrix, which implies that the poles are located at s = −A₀ω₀λ_i, where λ_i is an eigenvalue of matrix A′. For the system to be stable, the N λ_i's (or their real parts) must all be positive [25]. As D is a positive diagonal matrix, the λ_A′,min > 0 condition is conveniently satisfied by a positive-definite (PD) matrix A, which is widely encountered in various fields and applications, such as statistical analysis [26], quantum chemistry simulation [27] and network theory [28]. For this reason, we shall focus our attention on PD matrices in the following.
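The stability condition can be checked numerically. Below is a small sketch, with a randomly generated diagonally dominant PD matrix as an assumed test case: it verifies that all eigenvalues of A′ = DA (D_ii = 1/Σ_j A_ij) have positive real parts, so the poles s = −A₀ω₀λ_i lie in the left half-plane, and that the FD iteration matrix I − εA′ has spectral radius below 1.

```python
import numpy as np

def associated_matrix(A):
    """A' = D A, where D is diagonal with D_ii = 1/sum_j A_ij."""
    return A / A.sum(axis=1, keepdims=True)

def is_stable(A, eps):
    """True if all eigenvalues of A' have positive real part (stable poles)
    and the FD iteration matrix I - eps*A' has spectral radius < 1."""
    lam = np.linalg.eigvals(associated_matrix(A))
    rho = np.max(np.abs(1.0 - eps * lam))    # eigenvalues of I - eps*A'
    return bool(np.all(lam.real > 0) and rho < 1)

# Random symmetric, diagonally dominant matrix: positive entries and PD
rng = np.random.default_rng(0)
A = rng.uniform(0.1, 1.0, size=(5, 5))
A = (A + A.T) / 2 + 4 * np.eye(5)
```

An indefinite matrix, e.g., [[1, 2], [2, 1]], fails the check, since one eigenvalue of its associated matrix has a negative real part.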
To provide an analytical model for the computing time as a function of λ_A′,min, we analyzed the convergence behavior of the iterative algorithm. The convergence is measured in the A-norm, which is defined as ‖e‖_A = √(eᵀAe) [5]. For a positive-definite matrix A, the A-norm of the error decays exponentially at a rate set by λ_A′,min, so that the computing time scales as the product of 1/λ_A′,min and a logarithmic term involving the inner product of input and output. Note that this inner product is always positive by the definition of a PD matrix. Also, compared with the reciprocal impact of λ_A′,min on the computing time, the logarithmic role of the inner product is suppressed. Therefore, the time complexity for solving linear systems with the crosspoint circuit is O(1/λ_A′,min), which shows no direct dependence on the matrix size N. The time complexity of conventional iterative algorithms is usually a polynomial function of N, with matrix properties such as eigenvalues also involved [2]. We write the time complexity of the crosspoint circuit in a similar form by linking λ_A′,min to the minimal eigenvalue λ_A,min of matrix A, namely λ_A′,min = c·λ_A,min. Therefore, the time complexity is

O(1/(c·λ_A,min)),   (12)

where the critical role of λ_A,min is recognized, and the factor c may contribute an N-dependence. To support the λ_A,min-controlled time complexity of the crosspoint circuit, we considered two 3×3 PD matrices containing the discrete conductance levels in Fig. 1, with λ_A,min = 0.49 and 0.053, respectively. Fig. 4 shows the SPICE transient responses for the two linear systems. The simulation results indicate a faster response for the larger λ_A,min, thus supporting its dominant role. Fig. 5(a) summarizes the computing times for various 3×3 PD matrices, spanning two decades of λ_A,min and assuming 15 random vectors b for each matrix A. The results show that the computing time is inversely proportional to λ_A,min, as also supported by the plot of computing time as a function of 1/λ_A,min in the inset. As can be observed, there is a rough upper bound for the computing time, which scales linearly with 1/λ_A,min and defines the time complexity of solving linear systems. In Fig. 5(b), the computing time shows a precise linearity of the upper bound with increasing 1/λ_A′,min, demonstrating the precise description of the time complexity by Eq. (12).
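The 1/λ_A′,min scaling can be reproduced with the FD iteration itself. The sketch below uses illustrative matrices and tolerances, not those of Fig. 5: it counts the iterations needed to reach a fixed error norm for a well-conditioned and a nearly singular 2×2 PD matrix, and the iteration count grows roughly as 1/λ_A′,min.

```python
import numpy as np

def iterations_to_converge(A, b, eps=0.01, tol=1e-3, max_iter=1_000_000):
    """Number of FD iterations of Eq. (7) until ||v - x|| < tol."""
    row_sum = A.sum(axis=1)
    Ap = A / row_sum[:, None]            # associated matrix A' = D A
    Du = -b / row_sum                    # D u, with u = -b
    v = np.zeros_like(b, dtype=float)
    x = np.linalg.solve(A, b)
    for k in range(max_iter):
        v = v - eps * (Ap @ v + Du)
        if np.linalg.norm(v - x) < tol:
            return k + 1
    return max_iter

b = np.array([1.0, -0.5])
A_large = np.array([[1.0, 0.1], [0.1, 1.0]])   # lambda_A',min ~ 0.82
A_small = np.array([[1.0, 0.9], [0.9, 1.0]])   # lambda_A',min ~ 0.05
k_large = iterations_to_converge(A_large, b)
k_small = iterations_to_converge(A_small, b)
```

Here k_small exceeds k_large by roughly the ratio of the two minimal eigenvalues, consistent with the O(1/λ_A′,min) scaling.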

B. Time Complexity of Solving Model Linear Systems
To show the dependence of the time complexity on the matrix size N, we considered a model covariance matrix to represent a real-world problem [4], [29], [30], namely the matrix of Eq. (13), where α > 0 is a decay factor. The covariance matrix plays an important role in statistical inference and financial economics, such as in portfolio theory. The decay of the off-diagonal elements of the matrix was chosen to simulate the decreasing correlation of high-dimensional data samples in a realistic covariance matrix. In the following, we consider model covariance matrices of the first order and of the second order. Note that λ_A,min is asymptotically constant as the size of the model covariance matrix increases, thus the N-dependence of the time complexity is related solely to the factor c.
In simulating the solution of a linear system of a model covariance matrix, the coefficients in Eq. (13) were mapped onto the crosspoint array with 64 discrete and uniform conductance levels, which is feasible for previously reported resistive memories [19], [31]-[33]. The conductance ratio, defined as the ratio between the maximum conductance G_max and the minimum conductance G_min, was assumed equal to 10³, which is also achievable for various RRAM devices [34], [35]. Each level was randomized according to a normal distribution with a standard deviation σ = Δ/6, where Δ = G_max/64 is the nominal difference between two adjacent conductance levels.
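As an illustration of this mapping, the sketch below quantizes matrix coefficients to 64 uniform conductance levels within a 10³ conductance window and perturbs each programmed level with Gaussian noise of σ = Δ/6. The function name and the example matrix are illustrative, not taken from the reported simulations.

```python
import numpy as np

def map_to_conductance(A, levels=64, g_ratio=1e3, seed=0):
    """Quantize coefficients to uniform conductance levels and add
    programming noise with sigma = Delta/6, Delta = G_max/levels."""
    rng = np.random.default_rng(seed)
    g_max = A.max()
    g_min = g_max / g_ratio                  # conductance window G_max/G_min
    delta = g_max / levels                   # nominal level spacing
    clipped = np.clip(A, g_min, g_max)       # restrict to the window
    quantized = np.round(clipped / delta) * delta
    return quantized + rng.normal(0.0, delta / 6.0, size=A.shape)

A = np.array([[1.0, 0.5], [0.5, 1.0]])
G = map_to_conductance(A)
```

Each mapped coefficient then deviates from its nominal value by at most about Δ/2 from quantization, plus the programming noise.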
Fig. 6(a) shows the crosspoint conductances for a 10×10 first-order covariance matrix implemented with RRAM devices. We simulated matrix inversion with the crosspoint circuit, which is equivalent to solving N linear systems in which the input vector b is set equal to each column of the identity matrix [23]. Fig. 6(b) shows the 100 computed elements of A⁻¹ as a function of the analytical results, which indicates good accuracy.
To study the scaling behavior of the computing time, linear systems were solved for matrix sizes ranging from N = 3 to 300. For each matrix A, 100 linear systems were solved with random input vectors b. Fig. 7(a) shows the simulated computing time for the first-order covariance matrix. The results reveal that the computing time scales logarithmically with the matrix size N, i.e., the time complexity is O(log N).
The O(log N) time complexity indicates that the coefficient c linking λ_A′,min to λ_A,min scales as c ∝ 1/ln N. The figure also shows the analytical minimal eigenvalues and those calculated for the conductance matrices implemented in the crosspoint circuit. The difference between the analytical and calculated eigenvalues, due to conductance discretization and randomization, is responsible for the discrepancy between the computing times obtained with the ideal and conductance matrices. This interpretation is supported by the fact that, for instance, the computing time for the crosspoint resistive-memory simulation with N = 10 is smaller than the ideal value, while the minimal eigenvalue is larger; the opposite holds for N = 150. To guarantee that the minimal eigenvalue stays in the vicinity of the ideal one, and thus that the computing time is predictable, it is important to reduce device variations by using devices with a large conductance window accommodating sufficient analog levels, and by using program-and-verify algorithms for device programming [20].

C. Comparison with Other Computing Paradigms
The quantum algorithm for solving linear systems addresses sparse Hermitian matrices, and its time complexity is O(log(N)s²κ²), where λ_max is the maximum eigenvalue of the matrix, κ = λ_max/λ_min is the condition number, and s is the sparsity, meaning that the matrix has at most s nonzero entries per row [7]. To make a direct comparison with quantum computing, we also consider sparse PD matrices, which are a subset of real-valued Hermitian matrices. By defining d_min as the minimal eigenvalue (also the minimal element) of the diagonal matrix D, there is the relation λ_A′,min ≥ d_min·λ_A,min, due to the eigenvalue inequality for a matrix product [36]. As d_min is determined by the inverse of the largest row sum of matrix A, it scales as d_min ~ 1/s. As a result, the time complexity of the crosspoint circuit in Eq. (12) becomes O(s/λ_A,min), to be compared with the O(log(N)s²κ²) time complexity of the quantum algorithm. We tested a set of sparse PD linear systems to verify the time complexity of the crosspoint circuit, with the sparsity assumed as s = 10. We generated 1,000 linear systems, i.e., 1,000 sparse PD matrices with one random input vector each, with sizes from 20×20 to 200×200. Fig. 8(a) shows the computing time for solving the 1,000 linear systems, which is independent of N and is solely determined by λ_A,min, thus supporting the O(s/λ_A,min) time complexity.
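The eigenvalue inequality invoked above can be spot-checked numerically. The following sketch generates random sparse, diagonally dominant PD matrices (an illustrative construction with roughly s nonzeros per row, not the generation scheme of Fig. 8) and verifies λ_A′,min ≥ d_min·λ_A,min, with A′ = DA and d_min the minimal element of D.

```python
import numpy as np

def sparse_pd_matrix(n, s, rng):
    """Random symmetric PD matrix with positive entries and roughly at
    most s nonzeros per row; PD is enforced by diagonal dominance."""
    A = np.zeros((n, n))
    for i in range(n):
        cols = rng.choice(n, size=s - 1, replace=False)
        A[i, cols] = rng.uniform(0.1, 1.0, size=s - 1)
    A = (A + A.T) / 2.0
    np.fill_diagonal(A, 0.0)
    np.fill_diagonal(A, A.sum(axis=1) + 1.0)  # strict diagonal dominance
    return A

def bound_holds(A):
    """Check lambda_{A',min} >= d_min * lambda_{A,min} for A' = D A."""
    d = 1.0 / A.sum(axis=1)
    lam_ap = np.linalg.eigvals(d[:, None] * A).real.min()
    lam_a = np.linalg.eigvalsh(A).min()
    return bool(lam_ap >= d.min() * lam_a - 1e-9)

rng = np.random.default_rng(1)
ok = all(bound_holds(sparse_pd_matrix(50, 10, rng)) for _ in range(20))
```

For such matrices d_min scales as ~1/s, which turns the O(1/λ_A′,min) complexity into the O(s/λ_A,min) form stated in the text.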

IV. DISCUSSION
The analysis of circuit dynamics and time complexity is based on the assumption of an ideal crosspoint resistive array, as the RC delay in crosspoint MVM is extremely low, e.g., 0.5 ns in a 1024×1024 array [38]. To evaluate the impact of wire resistance, parasitic capacitance and device capacitance on the time complexity of the circuit, we simulated the solution of linear systems of the model covariance matrix in SPICE with these parasitic components included. Specifically, each crosspoint resistive device was replaced by a sub-circuit module (Fig. 9), where the wire resistance and the parasitic capacitance are assumed according to the interconnect parameters at the 65 nm node in the ITRS table [39], and the device capacitance is calculated with the dielectric constant of HfO2 [40]. The simulation results of the solution of linear systems of increasing size are reported in Fig. 10. The results indicate the same time complexity as the ideal circuit, namely O(log N) and O(1) for solving linear systems of the first-order and the second-order model covariance matrix, respectively. Such a comparison supports the robustness of the crosspoint computing circuit against parasitics. The wire resistance imposes a relatively small error on the steady-state solution, which is alleviated by the intermediate interconnect technology for crosspoint arrays, in contrast with the aggressive downscaling of conventional high-density memory [38]. Also, increasing the crosspoint device resistance and adopting 3D integration help improve the solution accuracy [23].
In conventional computers, linear systems of a dense matrix can be solved with standard algorithms such as Gaussian elimination and LU factorization, which have O(poly(N)) time complexities. The solution can be accelerated with parallel algorithms; for instance, Gaussian elimination can be carried out in parallel with a time complexity of O(N) by using N² processors [41]. Csanky's algorithm reports a better time complexity of O(log²N), but N⁴ processors are required [42]. In the crosspoint computing circuit, the O(1/λ_A′,min) time complexity, which may translate into O(log N) or even O(1), is achieved with only the N² memory devices that are necessary to implement the matrix, thus representing a much more efficient method for solving linear systems. Note that the N×N crosspoint array is imperative to store the matrix, and the data are processed directly in memory, whereas in conventional computers the memory cost also scales as N², and additional processors are required. Compared with in-memory computing, digital computing incurs an additional data-access complexity due to the communication between the separate memory and processor [43]. Therefore, there is a clear efficiency advantage in solving linear systems with the crosspoint resistive memory circuit, thanks to the concept of in-memory computing and to the unique time complexity of the computation. In the linear system problem, different matrices may need to be stored in the crosspoint array, thus requiring device reprogramming. In this sense, fast and reliable writing schemes [21] are favored to retain the advantage of in-memory computing. The high efficiency of our method is attributed to the parallelism in the circuit, where Kirchhoff's laws and the concurrent feedback play major roles. According to the output update algorithm in Eq. (7), the whole system resembles the Hopfield network [44], [45], which is well known for its physics-inspired high parallelism. In contrast to a Hopfield network, however, there is no discrete iteration in the crosspoint circuit; instead, the output evolves in a self-sustained fashion, thus contributing to an even higher speed in addition to the architectural parallelism.

V. CONCLUSION
In conclusion, we have studied the time complexity of solving linear systems with an in-memory computing circuit. Based on feedback theory, we show that the linear system can be solved by the circuit only if the minimal eigenvalue (or minimal real part of the eigenvalues) λ_A′,min of the associated matrix is positive. According to a finite difference algorithm developed for the circuit dynamics, we show that the time complexity is free of direct N-dependence, being determined solely by λ_A′,min. For solving linear systems where λ_A′,min possesses a weak (or no) N-dependence, the speed of the circuit is expected to be unprecedentedly high, e.g., the time complexity is O(log N) or O(1) for solving linear systems of the model covariance matrices. When addressing sparse PD linear systems, the time complexity is O(s/λ_A,min), thus outperforming its counterparts in conventional digital computing and quantum computing. We project that, when analog non-volatile memory technology becomes industrially mature, in-memory computing can play a leading role in boosting computing performance for big data in a wide range of real-world applications.

Fig. 2. (a) Crosspoint resistive memory circuit for solving linear systems, illustrated with N = 3 as the problem size. The conductance matrix G maps A, the input voltages [u1; u2; u3] represent −b, and the output voltages [v1; v2; v3] give the solution x. (b) Block diagram of the crosspoint circuit as a control system. The crosspoint array conveys the output v to interact with the input u.

Fig. 1. I-V characteristics of multilevel operation of the RRAM device. 8 conductance levels are shown; the values read at a small voltage are 120, 80, 60, 50, 30, 20, 15 and 10 µS, respectively. The inset shows the RRAM device structure.

Fig. 3. Time evolution of the output v(t) for a linear system of a 3×3 matrix, comparing the FD iterative solution of Eq. (7) with the SPICE transient simulation.
Fig. 4. (a) Transient behavior of the solution of a linear system of a 3×3 PD matrix with a relatively large λ_A,min, which is labeled on top. (b) Same as (a), but for a matrix with a λ_A,min one order of magnitude smaller.

Fig. 5. (a) Summary of computing times for solving linear systems with different λ_A,min's. The inset shows the computing time as a function of 1/λ_A,min. (b) Computing time as a function of λ_A′,min. The inset shows the computing time as a function of 1/λ_A′,min, indicating a precise linear upper bound (green line).

Fig. 6. (a) The 10×10 first-order model covariance matrix mapped by discretized and randomized RRAM devices. The conductance unit is 100 µS. (b) The inverse matrix solved with the crosspoint RRAM circuit, as a function of the precise analytical solution. The relative errors (right y axis) are generally small, except for the entries close to zero.

Fig. 7. (a) Summary of computing times for solving linear systems of the first-order model covariance matrix with different sizes, ranging from 3×3 to 300×300. Results from both the ideal matrix and the RRAM conductance matrix are shown. The λ_A,min's of both the ideal matrix and the RRAM matrix are shown on the right y axis. (b) Same as (a), but for the second-order model covariance matrix.

Fig. 7(b) shows the scaling behavior of the computing time for the second-order covariance matrix, indicating a constant computing time, i.e., the time complexity is O(1). Due to the strong decay, the elements far from the diagonal are close to zero, thus requiring a larger conductance ratio (G_max/G_min = 10⁴) of the resistive memories to map the entire matrix. The O(1) time complexity in Fig. 7(b) reveals that the coefficient c is asymptotically constant for the second-order covariance matrix. Therefore, depending on the matrix structure, extremely low time complexities such as O(log N) or O(1) can be achieved, which hugely reduces the computing time for large-scale problems.

Fig. 8(b) shows the computing time for a subset of the 1,000 linear systems, with λ_A,min limited to a narrow range to exclude its contribution. The relative computing time of the quantum algorithm for solving the same linear systems is also shown, according to its time complexity formula. Fig. 8(b) also reports the relative computing time of the CG method [37], which is the most efficient algorithm for solving PD linear systems on conventional digital computers, thanks to a time complexity of O(Ns√κ). The results indicate that in-memory computing, quantum computing and digital computing display O(1), O(log N) and O(N) time complexities, respectively, for solving sparse PD linear systems.

Fig. 9. Sub-circuit module of a single crosspoint resistive memory device. Rr is the device resistance, storing an element value of the matrix; Cr is the device capacitance; Rw is the wire resistance; Cw is the parasitic capacitance.

Fig. 8. (a) Summary of computing times for solving 1,000 sparse PD linear systems, plotted as a function of N and λ_A,min. (b) A subset of simulation results for λ_A,min limited within [0.9, 1], showing a constant computing time for in-memory computing (IMC). The relative computing times of the conventional CG method and quantum computing (QC) for solving these linear systems are also calculated, according to their time complexity formulas.

Fig. 10. Time complexity of the crosspoint circuit without or with parasitics for (a) the first-order and (b) the second-order model covariance matrix. The solution precision is limited to 10⁻² due to the more discretized outputs when approaching the steady state in SPICE; thus the computing times are smaller than those in Fig. 7. Also, the matrix size was limited to 100×100 in consideration of the circuit complexity with parasitic components.
where G_ij is the conductance of the jth device in the ith row of the crosspoint array.