A Generalized Block-Matrix Circuit for Closed-Loop Analog In-Memory Computing

Matrix-based computing is ubiquitous in an increasing number of present-day machine learning applications such as neural networks, regression, and 5G communications. Conventional systems based on von-Neumann architecture are limited by the energy and latency bottleneck induced by the physical separation of the processing and memory units. In-memory computing (IMC) is a novel paradigm where computation is performed directly within the memory, thus eliminating the need for constant data transfer. IMC has shown exceptional throughput and energy efficiency when coupled with crosspoint arrays of resistive memory devices in open-loop matrix-vector-multiplication and closed-loop inverse-matrix-vector multiplication (IMVM) accelerators. However, each application results in a different circuit topology, thus complicating the development of reconfigurable, general-purpose IMC systems. In this article, we present a generalized closed-loop IMVM circuit capable of performing any linear matrix operation by proper memory remapping. We derive closed-form equations for the ideal input-output transfer functions, static error, and dynamic behavior, introducing a novel continuous-time analytical model allowing for orders-of-magnitude simulation speedup with respect to SPICE-based solvers. The proposed circuit represents an ideal candidate for general-purpose accelerators of machine learning.


I. INTRODUCTION
I N-MEMORY computing (IMC) has gained traction as a promising candidate to overcome the von-Neumann bottleneck by eliminating the separation between the memory and processing units [1]. IMC executes algebraic operations by physical laws in crosspoint memory arrays, thus allowing for low-power, high-density, and high-throughput computation [2]. Machine learning [3], [4], image processing [5], combinatorial optimization [6], and baseband processing [7] are some of the representative examples demonstrating the IMC potential as next-generation computing architecture [8]. Experimental demonstrations of IMC accelerators for matrix-vector multiplication (MVM) have been reported in the latest years [9] improving the computational complexity toward the attractive O(1) limit [10]. MVM alone is, however, insufficient to build a comprehensive IMC algebraic accelerator. In a growing number of applications, inverse-matrix-vector multiplication (IMVM) is needed alongside MVM to carry out tasks of increasing complexity. To relieve the dependence of iterative solvers [11] exploiting open-loop MVM on external coprocessors, several memory-agnostic IMC-IMVM accelerators have been proposed [7], [12], [13], [14], [15], [16] exploiting closedloop, feedback-based topologies to implement the inverse operation. Closed-loop IMVM allows to vastly reduce the computation complexity of inverse computation from O(n 3 ) to O(1) [17]. Nonetheless, the different requirements of inverse problems, ranging from simple matrix inversion [12], to linear regression [4], [13] and regularized regressions [7], required the design and implementation of ad hoc topologies, limiting the generality of the proposed solutions.
In this article, we introduce a general-purpose, reconfigurable closed-loop IMVM universal circuit, namely the block circuit, capable of solving any linear matrix operation. In Section II, we introduce the block circuit and demonstrate its capability to implement all the previously reported open-loop MVM and closed-loop IMVM circuit topologies. In Section III, we provide a model for the static operation of the circuit, deriving ideal input-output transfer functions and assessing the impact of common error sources. In Section IV, we study the dynamic behavior of the circuit, providing stability criteria and deriving closed-form equations for the time evolution of voltages and currents. The proposed dynamic model has the same accuracy as a SPICE simulation while allowing orders-of-magnitude improvement in wallclock simulation time.
In the following, we adopt the Householder notation [18], where bold capital letters A, B denote matrices, bold lowercase letters a, b denote vectors and lowercase letters a, b denote scalars. × denotes matrix-vector multiplication, A T is the transpose of A, ∥ · ∥ p is the vector p-norm and |||·||| p the induced operator p-norm. A Hermitian positive (negative) semidefinite matrix satisfies A ⪰ 0 (A ⪯ 0) and its singular values are σ 1 (A) ≥ . . . ≥ σ n (A). The condition number of A is κ A = σ 1 (A)/σ n (A). I n is the identity matrix of size n × n, 0 n is the matrix of all zeros of size n × n. Size subscripts are omitted whenever the size is deducible from context. Fig. 1 shows the universal circuit for general-purpose solutions of linear algebra problems, such as matrix inversion, matrix-vector multiplication, and regression. The circuit includes two crosspoint arrays, namely the input crosspoint array Y and the feedback crosspoint array X, sharing connections on rows. Operational amplifiers (OAs) are used to realize a closed-loop feedback configuration around X, by connecting their inputs to the shared rows, and their outputs to columns of matrix X. Input signals are applied to the circuit either through current generators i in connected to the shared rows, or voltage generators v in connected to columns of matrix Y. Matrix Y thus acts as an MVM pre-processing with respect to v in , whereas X provides IMVM capability as in [12] and [15].

II. BLOCK-MATRIX CIRCUIT
Matrices X and Y are further arbitrarily partitioned in blocks in Fig. 1 where, for the sake of simplicity, we have considered a 2 × 2 partitioning. The OA array is similarly split into blocks, with each block collecting as many OAs as the rows of the corresponding feedback block. In this sense, block X ij represents the matrix connection between the outputs of OA array A j and inputs of OA array A i . A larger number of blocks are possible, provided splitting is performed consistently across all constituent matrices and vectors. Similarly, row connection to the OAs may either be realized to the inverting node, as shown in Fig. 1, or to the non-inverting node, provided all OAs belonging to the same block share the same input connection.
The proposed circuit can perform all previously demonstrated operations using analog in-memory matrix computing by suitably mapping either the input or feedback matrices. The mapping operation can be seen as a circuit rearrangement, thus preserving the same nodal equations. In the following, we report the block matrices corresponding to previously presented circuits of the IMC framework, namely MVM [10], positive-definite linear system solver [12], and linear [13], generalized [15], and ridge regression [7].

A. MATRIX-VECTOR MULTIPLICATION
Typical MVM circuits [10] perform read-out of the currents induced on a target matrix Y by an array of voltage inputs v in through an array of transimpedance amplifiers (TIAs) with transconductance k as shown in Fig. 2(a). The equivalent block-matrix circuit is shown in Fig. 2(d), where the input array is used to map Y and the feedback array is used to map the transimpedance configuration of each amplifier, resulting in a diagonal feedback matrix kI.
An alternative mapping, relying on the feedback array only and current inputs, is shown in Fig. 2(e), corresponding to the circuit in Fig. 2(b). Here, current inputs are converted to a voltage by a first TIA array with gain k 1 . The outputs of the first TIA array are applied to matrix Y, inducing currents that are converted to a voltage by a second TIA array with gain k 2 . The equivalent block-matrix mapping of the circuit consists of a 2 × 2 partitioning, where local feedback blocks k 1 I, k 2 I describe the TIA connection of the first and second OA array, respectively. As Y is connected between the outputs of A 1 and the inputs of A 2 , its corresponding block in the feedback matrix is (2, 1). On the other hand, since there is no direct connection between outputs of A 2 and inputs of A 1 , block (1, 2) is set to 0. Finally, outputs are probed on amplifier set A 2 as in the original circuit. Additional inputs i 2 and outputs v out,1 of the block-matrix circuit are unused in this configuration. Fig. 2(c) shows the linear system solver [12], which is composed of two amplifier sets in the inverting configuration, A 1 and A 2 both of n OAs, and two n × n feedback matrices

B. LINEAR SYSTEM SOLVER
The circuit can operate on positive-definite matrices only [17]. Analog inverting couplers, realized by means of additional OAs with trimmable transconductances k 1 , k 2 , provide intermediate voltage inversion and scaling for matrix M − [19]. The main circuit equation is The corresponding 2 × 2 block-matrix circuit is shown in Fig. 2(f). Block (1, 1) maps connection from A 1 output to its own input, corresponding to matrix M + . Block (1, 2) maps connection from A 2 output to A 1 input, corresponding to matrix M − . Block (2, 1) maps connection from A 1 output to A 2 input, corresponding to k 1 I. Finally, block (2, 2) maps connection from A 2 output to its own input, described by k 2 I. All blocks have equal size n × n, such that the overall feedback matrix has size 2n × 2n. Since no voltage generators are used, the input array is set entirely to 0 and thus neglected. Similarly, additional current inputs i 2 and outputs v out,2 are unused in this configuration. Fig. 3(a) shows the linear regression circuit [13], which is composed of two amplifier sets, A 1 of m OAs in inverting configuration and A 2 of n OAs in non-inverting configuration. Two feedback matrices, both mapping m × n matrix M, are connected between the outputs of A 1 and inputs of A 2 and vice versa. Local feedback on A 1 is provided by conductances k f , whereas no local feedback connection is set on A 2 . The main circuit equations are [15]   M, and A 2 inputs being connected on columns, the equivalent matrix to be mapped in block (2, 1) is M T with a block size n × m. The overall feedback matrix has thus size (m + n) × (m + n). Finally, A 2 is set in a noninverting configuration, whereas additional inputs i 2 are unused in this configuration. Fig. 3(b) shows the generalized regression circuit [15]. The main difference with respect to the linear regression circuit is represented by matrix F, placed in feedback configuration on amplifiers A 1 . Note that, when matrix F is diagonal, a weighted linear regression is obtained, which becomes equal to the linear regression in Fig. 3(a) for equal feedback conductance values. The input-output circuit equations for the circuit of Fig. 3

D. GENERALIZED REGRESSION CIRCUIT
The corresponding block-matrix circuit is shown in Fig. 3(e), which is similar to the linear regression block-matrix circuit except for block (1,1), which now maps matrix F, retaining the same m × m block size. Consequently, the overall size of the feedback matrix is still (m + n) × (m + n). Fig. 3(c) shows the ridge regression circuit [7]. Starting from a linear regression circuit, a local feedback branch is added on A 2 by means of an inverting analog buffer and an array of conductances k d . The main circuit equations are thus

E. RIDGE REGRESSION CIRCUIT
The corresponding block-matrix circuit is shown in Fig. 3(f). The block matrix is organized in a 3 × 3 configuration, where blocks (1, 1), (1, 2), (2, 1), and (2, 2) are the same as the linear regression circuit since the corresponding connections are unchanged. Similar to the linear system solver case, analog inverting buffers are realized by inverting stages with −1 gain requiring additional amplifiers A 3 . Block (3, 1) thus maps direct connections from A 1 output to A 3 input, corresponding to a zero matrix 0. Block (3, 2) corresponds to the connection from A 2 output to A 3 input. For the sake of simplicity, we consider unitary connections, thus resulting in a n × n block I. To preserve the unitary gain, the same matrix describes the local feedback connection around A 3 . The n outputs of A 3 are connected to the n A 2 inputs through conductances k d . Consequently, the n × n block (2, 3) is k d I.
Finally, as no direct connection is present between the n buffer outputs and the m A 1 inputs, the m × n block (1, 3) is 0. The overall feedback matrix size is thus (m + 2n) × (m + 2n). Additional inputs i 2 , i 3 are unused in this configuration.

A. CIRCUIT MATRICES DEFINITION
To study the operation of the circuit, we define block-matrices Y and X for the input and feedback crosspoint array conductances, and block-vectors i, v in and v out corresponding to input currents, input voltages, and output voltages, respectively. For instance, in the case of the 2 × 2 partitioning of Fig. 1, the block matrices and block vectors read OAs are modeled assuming the single-pole amplifier structure of Fig. 4, consisting of a first amplifying block with gain sα 0 , an RC network whose time constant is τ 0 = RC, and a unity gain buffer. The transfer between the input and output voltagesv i , v out,i of the i-th amplifier is thus described by with where s = jω is the Laplace frequency, andŝ i , α 0,i , and τ 0,i are the sign, open-loop DC gain, and intrinsic time-constant of the i-th OA, respectively. Correspondingly, the i-th OA dynamics in the time domain are described by the differential equation The entire set of amplifiers may thus be described in the frequency domain by a diagonal matrix A where S is a diagonal matrix mapping the sign of the corresponding row amplifier, i.e., S ii = −1 if the i-th amplifier is in the inverting configuration, or S ii = +1 if the i-th amplifier is in noninverting configuration, A 0 is the diagonal matrix of the absolute dc gain of the OAs, i.e., A 0,ii = α 0,i , and similarly T 0 is the diagonal matrix of OAs time constants, i.e., T 0,ii = τ 0,i , such that A ii = α i (s). The transfer between the OA input voltage vectorv and OA output voltage vector v out is thus given by Similarly, the entire set of amplifiers is described in the time domain by the corresponding system of differential equations (15) wherev and v out are the OA input and output voltage vectors, respectively.

B. IDEAL STEADY-STATE TRANSFER FUNCTION
To compute the transfer functions in ideal conditions, i.e., assuming infinite DC gain α 0 of the OA, we consider Kirchhoff's law at the input nodes of the OAs in Fig. 1, from which By applying the superposition principle, we first consider the case v in = 0. In this case, the ideal output voltage is given by where blocks of the inverse matrix X −1 may be written in terms of blocks of matrix X by using the block-inversion lemma [18]. We then consider the case with i = 0, for which we write The overall output voltage is thus given by the summation of both contributions, namely

C. OUTPUT STATIC ERROR
The block-matrix model can be used to evaluate the output error arising from various sources. Perturbations of the feedback matrix δX, input matrix δY, input currents δi, and input voltages δv in inevitably introduce deviations δv out of the output voltages from ideal state equations. To quantify the impact of these perturbations, we define the relative error ε Perturbations may arise from many different sources, such as quantization, finite amplifier gain, and device variability. For simplicity, we study each perturbation individually. For the i-th perturbation, we derive a maximum relative errorε i and define upper (ε ↑ i ) and lower bounds (ε ↓ i ). The overall maximum relative error is then bounded by   Appendix A. Upper bounds for all sources depend on the feedback matrix condition number κ X , which therefore serves as the primary sensitivity index of the system. Fig. 5(a) shows simulation results for matrices with increasing condition number κ X and fixed size n = 100. Each line traces the maximum relative error obtained while simulating the corresponding perturbation. Aside from a multiplicative factor dictated by the perturbation nature, all errors linearly increase with κ X . On the other hand, the size dependence of upper bounds might be an excessive overestimation. Fig. 5(b) reports maximum relative errors for each perturbation as a function of the matrix size, for a fixed condition number κ X = 10. As the matrix size increases, maximum errors tend to lose any dependence on the matrix size n, suggesting that lower bounds of Table 1 may prove more helpful in analyzing the system from a scaling standpoint.

IV. DYNAMIC OPERATION MODEL A. FREQUENCY-DOMAIN MODEL
We begin the analysis of the circuit dynamics by studying its operation in the frequency domain to evaluate the circuit poles. To this aim, we perform the loop gain analysis in Fig. 6 by removing all voltage/current generators and cutting the loop at the output of the OAs in Fig. 1. The loop-gain transfer is the one between the test voltage vectorṽ and the output voltage vector v.
We first consider the transfer betweenṽ and the intermediate voltagev, which corresponds to the input voltage of the OAs. Considering for instancev 1 , the application of Kirchhoff's law yields Equation (22) can be rewritten in matrix/vector form as where U Y and U X are diagonal matrices whose i-th diagonal element contains the sum of the i-th row of Y and X, respectively. Equation (23) can be rewritten to obtain the closedform relation betweenv andṽ whereX is the voltage divider matrix betweenṽ andv.  Consequently, the overall transfer between the test vector v and the OA outputs v out is given by where G loop (s) is the frequency-dependent loop-gain matrix.
Poles of the closed-loop system are then found at the frequencies p for which the test vectorṽ is identically mapped onto itself, namely G loop (p)ṽ = Iṽ.
By expanding A, the previous equation may be rewritten as so that circuit poles can be found by solving Poles p are thus the eigenvalues of T 0 −1 (SA 0X − I), where matrix SA 0X = G loop (0) represents the DC loop gain matrix. For stability, it must hold Re(p) < 0, i.e., matrix T 0 −1 (SA 0X − I) must be Hurwitz-stable [20], in accordance with stability criteria for linear time-invariant systems.
In the particular case of all OAs having the same time constant τ 0 , (inverting) sign, and DC open-loop gain α 0 , VOLUME 9, NO. 1, JUNE 2023 Eq. (28) reduces to consistently with [17]. Inferring the stability of SX from the spectral characteristics of the feedback matrix X, which are generally known, is not trivial [21]. As a general criterion, negative-definiteness of matrix SX is sufficient to determine the stability of SX. Additional details are provided in Appendix B.

B. CONTINUOUS-TIME MODEL
The last step in dynamic modeling is to study the circuit in the continuous-time domain. Once again, we assume that all OAs share the same structure of Fig. 4, although each OA may have a distinct time constant τ 0,i , gain α 0,i , and signŝ i .
The time-continuous Kirchhoff equation at the OAs inputs for the closed-loop system of Fig. 1 reads wherev(t) is the time-continuous voltage vector at the OAs inputs. A complete analytical derivation for arbitrary initial conditions and inputs is provided in Appendix C. For current and voltage step inputs, and considering outputs to be initially at rest, then the output voltage reads  Fig. 3(f), and (c) the ridge regression circuit in Fig. 3(f), with colored and dashed lines representing the SPICE-computed and modelcomputed voltages, respectively, highlighting the accuracy of Eq. (31). In particular, Fig. 8(a) shows absolute voltage errors between the model-based and SPICE-based output voltage transients for the circuit shown in Fig. 1, for increasingly conservative SPICE tolerance settings. As SPICE is forced to be more accurate, the solution grows closer to the one computed by the model, once again demonstrating the accuracy of Eq. (31). Finally, Fig. 8(b) reports a comparison of simulation times for increasing matrix size for SPICE (red lines) and model in MATLAB (blue lines), showing up to three orders of magnitude reduction. The MAT-LAB implementation also scales more favorably with size as O(n 2 ) with respect to the O(n 4 ) dependence of SPICE-based solvers.

V. CONCLUSION
We present a universal core primitive for analog crossbarbased IMC. The proposed block circuit is capable of implementing any linear matrix operation, including but not limited to MVM, IMVM, and linear and regularized regression. We derive closed-form equations for the ideal inputoutput transfer functions and static error, together with error bounds that can serve as guidelines for practical implementations. We provide stability criteria and an analytical model for the voltage transient in the presence of step inputs, outperforming SPICE-based solvers by several orders of magnitude. The proposed model can be retroactively applied to previous feedback-based circuit implementations, both open-loop and closed-loop, and can serve as a generalized framework for the study of analog-IMC topologies. Owing to its highly scalable structure, the circuit can represent a complete macro for matrix-based operations in novel analog processing units under the IMC paradigm.