Noise Driven Coupled Nonlinear Systems: A “Dynamical Multilayer Perceptron” Demonstrating XOR Functionality

Coupled nonlinear dynamical systems are known to display remarkable behavioral similarities to some aspects of neural activity in biology. When considering the computational properties of such a neuromorphic network (or even an electronic biomimetic computation device underpinned by such a network), the question of whether (and how) such a network realizes fundamental logic operations must be examined. In this work, a dynamical system of coupled noisy overdamped nonlinear bistable elements is considered. The background noise floor is used to drive information flow by helping each element switch randomly between its stable states, with the system response quantified via a long-time probability density function. A theoretical representation of such a system is developed and a simple five-element realization is used to demonstrate a continuous version of an XOR gate, through the proper choice of coupling coefficients and controllable external biases. This is a first step towards synthesizing more general functions, a prerequisite for advanced computing and learning applications. Finally, a silicon implementation is proposed and simulated using a verified process model that could be fabricated to generate a working analog computing system.


I. INTRODUCTION
Artificial Neural Networks (ANNs) [9] are ubiquitous in increasingly sophisticated scenarios in the general areas of computer vision, natural language processing, and neuromorphic computing.Such systems afford capabilities beyond traditional methods, and they hold out the tantalizing promise of more elegant and efficient applications in areas such as robotics and artificial intelligence.There also exist a number of dynamical mathematical models that mimic certain aspects (e.g.spiking, long term potentiation) of biological neural networks (see e.g.[4]).Such models can The associate editor coordinating the review of this manuscript and approving it for publication was Derek Abbott .
facilitate the modeling of the interaction between the cell body and synapses, allow one to explore spatiotemporal complexity, and provide a framework to examine the effects of background noise.This dovetails with recent interest in collective behavior in large scale brain models, as a possible driver of brain functions, e.g., cognition and perception [1], [2], [3].However, these models can be limited in terms of algorithmic implementation, as they evolve towards larger and more complex networks.In recent years, many-core neuromorphic processors [5] as well as Very Large-Scale Integration (VLSI) emulations of large neural networks with as many as 10 6 silicon neurons [6] have been developed.In this work, we present a potential method for driving neural-like dynamics by taking advantage of the inherent (thermal or other) noise in networks of coupled nonlinear dynamic elements, each with a noise floor; this results in sufficiently complex behavior to perform more advanced tasks such as function approximation.
As a necessary prerequisite for universal function approximation, artificial neural networks should be able to reproduce the basic logic gate functions.The AND and OR gates can each be implemented by a single layer network consisting of two input elements and one output element.In contrast, implementing the XOR gate requires a two layer network, with an additional two elements in a ''hidden'' layer.We focus on implementing an XOR function, operating on analog variables, due to its ubiquity in the machine learning literature.This can be viewed as a first step towards synthesizing more complicated functions.
Here, we examine how noise-driven coupled nonlinear dynamical systems can yield XOR functionality.We consider a network of bistable nonlinear dynamic elements with nonlinear coupling and a noise-floor attributed to each element.These elements qualitatively mimic the dynamic behavior of neurons, with the important caveat that they do not produce spiking; the bistability (underpinned by a nonlinear potential energy function) occurs between two fixed points, rather than between a fixed point and a limit cycle.This system provides a reasonable model for describing, in the mean, the interaction between the noise floor and the stimulus in the network; the deterministic solution is supplanted by a statistical solution wherein the averaged response can be calculated, and connected to observables in the real system.Our proposed system is inspired by conventional ANNs, which may be viewed as consisting of static coupled nonlinear elements.Our choice of nonlinearity closely mimics the usual sigmoidal transfer characteristic commonly used in ANNs, with the coupling also being a sigmoidal (hence bounded) function with suitably chosen coefficients or ''strengths''.
Our dynamical network also resembles a ''liquid state machine'' (LSM), which has been proposed for computing with time series [10].Like LSMs, our network can handle time-dependent inputs, and offers the possibility of quadrature via techniques involving time scale separation.Furthermore, although we take the coupling coefficients to be constants in this study, they can readily be described statistically for more complex scenarios.Nonlinear dynamic elements comprising LSMs can also exhibit cooperative dynamical behavior arising from the interplay between nonlinearity, coupling, and noise, most notably when cast in the framework of driven dynamical systems.One example of such behavior is the oft-studied Stochastic Resonance (SR) effect [11], which is known to be amplified in a nonlinear sensor network via the interplay of coupling, applied signal (often taken to be time-periodic), and noise floor [12], [13].Of particular interest in the context of this paper, we note the ''Logical Stochastic Resonance'' (LSR) [14], [15], [16] scenario, wherein a noisy bistable element accepting two random data streams can reproduce the fundamental logic operations (AND/OR, NAND/NOR) and morph from one to the other using a controllable dc bias to perturb the underlying potential energy function; the gate probability is unity over a definite regime of noise intensity.Notably, a possible biological realization of LSR was proposed by Dari et al. [17], [18] in a theoretical description of an engineered gene network.
In this work, all the computational elements are dynamic in nature, including the output node; each nonlinear element is characterized via a stochastic differential equation with nonlinear inter-element coupling terms, and subject to a deterministic (taken to be constant in this work) applied bias signal.The network elements switch between their stable steady states in the presence of noise with an appropriate selection of coupling strengths.The output of the system will thus not be deterministic, and any result stemming from the network has a probabilistic structure from which statistical measures, such as mean values can be computed, via a long-time probability density function.In Section II, we construct an array consisting of a single overdamped bistable ''reference'' element coupled to an ensemble of ''bath'' elements which evolve on significantly faster time scales than the reference element.Hence, the long-time system behavior is well approximated by the dynamics of the reference (or ''slow'') element.It is assumed that the bath is never far from its long-term steady state.Under these assumptions, the dynamics of the reference element can be obtained in closed form using an adiabatic elimination technique or ''slaving principle'' first propounded by Haken [19].The response (for assorted constant input signals and noise intensities) of the network is quantified by the steady-state (i.e. at long time) probability density function of the reference element.As an aside, we note the similarity of our system to ''liquid networks'' [20].Analogous to liquid networks, our noisy system yields a random 2-state time series solution.However, the relevant information for realizing the XOR is contained in the long-time probability density function P(y 1 ) of the readout element, rather than the actual time-series solution.
We consider, in this work, the particular case of a small network of five elements, based on the architecture of a particular ANN which implements the XOR.The network consists of two input elements (each of which accepts an input dc signal) coupled to a single output or readout element while passing through an intermediate or ''hidden'' layer comprising two elements.Each element has its own noise floor (assumed to be uncorrelated from element to element).Hence, our system implements an analog transform that approximates the XOR function in probability space.The transform has analog inputs, whereas the output consists of probability distributions that can be partitioned into two equivalence classes representing a ''TRUE'' or ''FALSE'' result based upon whether the dc inputs have opposite signs, or the same signs, respectively.We note that the theory derived in Section II is valid for networks of arbitrarily sized inputs and hidden layers and involves forward and backward coupling, subject to certain constraints on the parameters.
In Section II, we introduce the network via a system of coupled stochastic differential equations (SDEs).A Fokker Planck Equation (FPE) for the network is set up, then decoupled using the above-mentioned adiabatic approximation.The FPE is a diffusion type equation that describes the evolution of the (N − body) probability density function; it involves averaged quantities such as the mean and variance of the noise in each element, rather than the noise itself.The FPE-based approach is particularly advantageous for the network under consideration; we will touch on this point later in greater detail.Eventually, we are led to a single effective SDE for the readout element (this is the ''slow'' element in the framework of the adiabatic approximation), which is the goal of the theory.From this SDE, we readily derive a closed form probability density function for the readout element in the long time limit; we will see that the FPE description coupled with the ''slaving'' argument accomplishes this goal quite neatly.The effects of the remaining ''fast'' elements in the network are manifested in a skewing of the slow element probability density function; effectively the fast elements constitute a ''bath'' to which the slow (i.e.readout) element is coupled.It will, additionally, become clear that our general procedure based on a decoupling of the many-body FPE can be readily applied to larger networks and even more complex coupling schemes.
In Section III-A, the results obtained via the analytic model of our network are compared with numerical simulations and shown to match within the constraints of our approximations; these constraints will be defined as we proceed with the calculations.Finally, in Section IV, we consider the implementation of the coupled dynamical system in hardware by carrying out circuit simulations using a validated Process Design Kit (PDK).We demonstrate that the circuit implementation yields nonlinear dynamic behavior (in this case hysteresis) in the output (or readout) element, which matches the predictions of the theoretical model remarkably well.While we do not explicitly use the nonlinear circuit to realize the XOR (this is left to an upcoming paper), the agreement of the theory with the circuit realization gives us the confidence to use it as a roadmap to adjust the circuit parameters to obtain the desired dynamical behavior.This is especially important as the number of elements and/or complexity of the system grows, and direct simulation becomes time prohibitive.

II. THE COUPLED DYNAMICAL NETWORK
We consider a network of coupled overdamped bistable elements, each subject to noise and a constant deterministic signal.The choice of nonlinearity in our dynamics is important.We use a hyperbolic tangent function that is bounded above and below and, hence, reproduces the sigmoidal transfer function that historically has been used to mimic the on-off dynamics of biological neurons.
In its most general form, the network can be described by the coupled system of stochastic differential equations for a generic state variable y(t) (e.g. in a nonlinear circuit, as considered later in this work, y i (t) represents the voltage in the i th element) where the overdot denotes the time-derivative.The noise N i (t), in the i th element, is assumed to be Gaussian, and delta-correlated with zero mean and variance σ 2 i ; further the noise sources in different elements are assumed uncorrelated.We note that this so-called ''white'' noise is an idealization.Rigorously, N i (t) should be correlated (or ''colored'') noise obtained via an Ornstein-Uhlenbeck process [23], [24] that leads to exponentially correlated noise with correlation time τ c which is, roughly, the inverse of the noise bandwidth: which corresponds to a power spectral density In the limit τ c → 0, the rhs of (2) becomes a δ function i.e. we obtain Gaussian δ− correlated noise, having (by definition) zero mean: The power spectral density, in this ''white'' limit, becomes a constant equal to the noise variance σ 2 i .We point out that having the noise autocorrelation function (in the colored case and the corresponding white limit) that is a function only of the time difference, is a hallmark of a Markov process [23].
As noted above, the (theoretical) concept of infinite bandwidth or ''white'' noise is unrealistic and unrealizable in experiments; the noise can usually be adjusted to have very large (however, not infinite) bandwidth.In practice, it is sufficient that the noise have a bandwidth much greater than other system frequencies of interest (e.g. the inverse of the time constants a i and any externally applied frequencies), in order to use the white noise representation (4).We note that the noise arising in real systems can have a wide range of bandwidth, arising from the underlying physics.As an example, Superconducting Quantum Interference Devices (SQUIDS) [28] typically have a noise floor with gigahertz bandwidth, whereas noise in some biological systems might have a bandwidth of only a few hertz.In both these cases the noise may be taken to be broadband; what matters is for the bandwidth to be larger than any other frequency (or inverse time) in the system.In Section III we will detail how the noise bandwidth and the simulation time-step are intertwined.
We observe that, in our notation employed in (1), the noise N i (t) has unit variance and is multiplied by the standard deviation σ i .The rhs of (1) is where each element is subject to a constant bias term ε i .The nonlinear terms include the ''self'' coupling with strength J ii , as well as nonlinear coupling (via a coupling coefficient J ij , j ̸ = i) to the other elements.The coefficients a i have units of inverse time and must be positive to maintain dynamic stability.

A. BACKGROUND: SINGLE ISOLATED ELEMENT
We begin with a concise dynamical description of a single uncoupled (i.e.isolated) element, which is the basic building block of our network.Its dynamics are (we drop the subscripts for convenience) with the potential energy function given by and the ''switching curve'' corresponding to the deterministic portion of the rhs of ( 6) Clearly, in the presence of additive (Langevin) noise, the extrema of the potential energy function are obtained using the deterministic dynamics.The noise manifests itself when we consider the probability density function of element y.This probability density function can be obtained by setting up the Fokker Planck Equation (FPE) that governs its temporal evolution: [23]: with the drift (or streaming) term having the kernel A(y) = −∂U (y)/∂y.The important thing to note is that the FPE contains the noise dependence in a second order diffusion term involving the variance σ 2 .In general, solving the FPE can be quite daunting, except when the kernel A(y) is linear or possessed of some other convenient functional form [23].However, in the absence of an explicit time dependence on the rhs of (9), one may write down the long time or steady-state solution to (9) by setting its lhs to zero, and integrating the resulting differential equation [23].The result is: where N is a normalization constant.This indicates that the spread of the probability density function is directly dependent on the noise variance.We note that, for a more complicated case wherein there is an explicit time-dependence on the rhs of ( 9), one cannot simply write down a solution of the form (10); it would be necessary to obtain the full time dependent solution to (9), and then find its limit as t → ∞.In simulations, one would assemble a time-series of the stochastic variable y(t).Depending on the choice of parameters this series displays a single state or bistable state behavior, corresponding to a potential energy function that is monostable or bistable.All the important system characteristics are manifested in the potential energy function (7) wherein the parameters (a, J ) are obtained from the physics of the particular system under consideration.Knowing this potential energy function is the critical stepping stone to understanding the system behavior.
The locations of the extrema of the potential energy function, for the bistable case, can be determined using simple calculus.First, we consider the symmetric case (ε = 0).The extrema of the potential energy are given by the roots of y − η tanh y = 0 (where η ≡ J /a), the rhs of the (previously defined) switching curve.The trivial solution is y = 0, which can be shown to correspond to the unstable fixed point (maximum) of the potential function.We readily observe that η > 1 is the condition for bistability.For η ≤ 1 the potential function becomes monostable, however it becomes perfectly parabolic only in the J → 0 limit.The remaining extrema are found by approximately obtaining the (non-zero) roots of the switching curve.We find for the (stable) minima y ± (ε = 0): where the approximation becomes exact for large η, and the unstable fixed point is, as noted above, at y = 0.For non-zero ε the minima are shifted and a simple expansion yields which can be simply written as: For this case (ε ̸ = 0), the unstable fixed point is, of course, located at y = δ.Implicit in the derivation of ( 13) is the assumption that ε/J ≪ 1.The last statement requires qualification.If ε is too large, it will render the potential monostable.In this case the three roots of the switching curve become a single real root and two complex conjugate roots.Hence, if we require the potential to remain bistable (as throughout this paper), this constrains the dc signal ε.
We also add the requirement f (x, y) ≤ 0 if xy ≤ 0 and f (x, y) ≥ 0 if xy ≥ 0. One particular ANN that approximates these requirements is the two layer network with functional form (15) wherein the error-defined as the discrepancy between f β and f in ( 14)-can be made arbitrarily small by increasing β.The decision boundary of this network (the zero-locus of f β ) can be made arbitrarily close to the pair of parallel lines x + y = ±1.The implementation of the XOR in an ANN is described in [21] and [22].
In the next section, we create a network of coupled dynamical oscillators with the same architecture as this ANN, and with parameters guided by (15).This network will be the underpinning of a ''dynamic XOR''.We also demonstrate that N = 5 elements is the minimum configuration required to implement the XOR, similar to the case with ANNs.

2) DYNAMIC XOR
Based on the discussion above, we set up a coupled dynamical system of five elements in a two-layer configuration (see Figure 1).For simplicity, we do not include backcoupling, making this a feed-forward network.We start from our general network equation (1) and write out the q i : q 1 = −a 1 y 1 + J 11 tanh y 1 + J 12 tanh y 2 + J 13 tanh y 3 + ε 1 q 2 = −a 2 y 2 + J 22 tanh y 2 + J 24 tanh y 4 + J 25 tanh y 5 + ε 2 q 3 = −a 3 y 3 + J 33 tanh y 3 + J 34 tanh y 4 + J 35 tanh y 5 + ε 3 q 4 = −a 4 y 4 + J 44 tanh y 4 + ε 4 q 5 = −a 5 y 5 + J 55 tanh y 5 + ε 5 . ( The structure of these terms describes an input layer comprised of 2 elements y 4,5 , with dc inputs ε 4,5 and (uncorrelated) noise terms N 4,5 (t).The intermediate layer, comprising elements y 2,3 with analogous dc and noise driving terms, was coupled to the input layer.Finally, the (single) readout element y 1 draws its input from the intermediate layer.
System (1) with definitions ( 16) cannot be solved analytically in the presence of noise sources unless appropriate approximations are made.We follow our previous work [12] and assume that the ''readout'' element y 1 evolves far slower than the remaining elements, thereby allowing us to look at the problem as a system (the ''slow'' element y 1 ) coupled to a ''bath'' of (in this case) 4 elements, with the latter reaching their steady states far more rapidly than the reference or readout element y 1 .The conditions FIGURE 1.A schematic representation of a 5-element, 2-layer system.J ik are the coupling coefficients between the individual elements, whereas ε i =1−5 are the (controllable) constant (i.e.dc) biases applied to each element.ε 4,5 are, therefore, dc inputs to the system.y 1 is a ''slow'' element, whereas y i =2−5 are ''fast'' elements (see text).The steady state probability distribution of the y 1 element is taken as the relevant output of the system.y 2,3 constitute an ''intermediate layer'' which is subject to dc bias (ε 2,3 ) and noise (N 2,3 ).For simplicity, this paper focuses only on a feed-forward network with all back-couplings (J 21 , J 31 , J 42 , J 52 , J 43 , J 53 ) set to zero.
for this to occur are detailed in [12] and [19].For our purposes the condition can be boiled down to demanding that a 1 ≪ a i , i = 2 . . . 5. We demand that each individual element is bistable when isolated, meaning that J ii /a i > 1 ∀i.This sets us up to consider the most interesting case while realizing that the coupling and dc forcing terms can render one or more elements monostable in the (coupled) network.An analytic solution of the coupled stochastic differential equations (SDEs) (1) would be impossible under any other scenario; however, using the above-mentioned adiabatic approximation [19], we aim to obtain the long-time probability density function P(y 1 , t → ∞) which we denote as P(y 1 ) for convenience.The structure of P(y 1 ) under assorted coupling, noise and forcing scenarios will inform the realization of the XOR gate.
The starting point is the N -body Fokker-Planck equation (FPE) for the probability density function P({y i }, t) corresponding to the system of variables y i (corresponding to the dynamically evolving nodes in the network of Figure 1) that satisfy the coupled SDEs (1).This may be written as [23] with the drift terms defined in (16) and the noise terms manifesting themselves in the second order diffusion terms.The ''slaving'' argument [19] then leads us to factor the probability P({y i }, t) as follows (suppressing t to simplify our notation) where h is to be interpreted as a conditional probability density for finding y j>1 given y 1 (both h and P are normalized to unity).The slaving constraints (i.e. the separation of time-scales embodied in the relative values of the coefficients a 1 and a i>1 ) then imply that we may substitute the factored probability density function (18) into the original FPE ( 17) and separate the slow variable y 1 from the fast one(s) [12], [19].After some algebra, we obtain with the kernel A(y 1 ) defined by × q 1 (y 1 , y 2 , . . ., y N )dy 2 dy 3 . . .dy N .
The above equation represents the formal (N -element) decoupling of the Fokker-Plank equation with the probability density function P(y 1 , t) representing the slow element dynamics.We have assumed, in deriving the above (decoupled) FPEs that the derivative of h with respect to y 1 may be neglected compared to derivatives with respect to y i>1 , an approximation stemming from the slaving assumption [19].We will specialize the theoretical calculations to our case of N = 5 using the feed-forward coupling scheme embodied in ( 16).
We will be concerned with steady state solutions for the fast and slow element probability density functions.By this we mean that, in the absence of explicit time-dependent terms on the rhs of (19), the solution to the FPE at long times provides a very good representation of the system behavior.In other words, we are concerned with the solution h(y 2 . . .y N , t) in the t → ∞ limit.In a practical system, this limit is reached after allowing the transient behavior of the system to die down; the rate at which this occurs depends on the time constants a i , with the fast elements y 2 . . .y N settling down to their steady states very rapidly, compared to the slow element y 1 .Theoretically, the long time limit corresponds to setting the lhs of ( 19) equal to zero and integrating the resultant equation; this approach is valid only when the rhs of (19) does not contain explicitly time-dependent terms [23].
The utility of the FPE is now clear.As long as one can decouple fast and slow variables leading to the separate FPEs ( 19) and ( 20), the steady state solution becomes a matter of integrating an ordinary differential equation subject to the appropriate boundary conditions.This steady state solution which can be (in our system) written down analytically, using a procedure analogous to our derivation of (10), is precisely what we will need in our characterization of the XOR function.Conveniently, the FPEs to be integrated do not involve the noise terms directly, only their moments (in this case the mean and variance); this is because the FPE is a diffusion type equation, for a probability density function, in contrast to the original SDEs (1) which involve the original random variables y i and, explicitly, the noise terms.
We note that the theoretical approach (underpinned by a steady state solution to the FPE) is equivalent to (numerically) integrating the coupled system (1) and fitting a probability density function to the ensuing time-series solution.As long as we have allowed transients to die out (in practice this can be accomplished by discarding a tranche of data at the front-end of the time series), this numerically generated density function corresponds to the long time solutions to (19) and (20).Clearly, an experimental manifestation of the coupled system, as carried out in Section IV, wherein data is collected as a time series, would yield the same probability density function as the above-described procedure involving numerical integration of (1).Finally, when making comparisons between numerical integration results and nonlinear circuit simulations, a rigorous conversion of abstract dimensionless quantities (e.g.J ij , a i , σ 2 i ) to corresponding quantities (and their correct units) defined via the circuit, is of great importance; we do this in Section IV.
Our choice of coupling scheme allows us to further simplify the system.We note that the dynamics of the input elements y 4,5 are independent of the remaining elements, due to the absence of back-coupling.Similarly, the elements y 2,3 can be regarded as independent, having no direct coupling to one another, no back-coupling to the input elements,and only forward coupling to the readout element y 1 .For this special case, we can factor the entire density function for the fast variables according to which we substitute into (21) and work systematically through the integrals, remembering that each density function is normalized to unity.As an example, we can write the steady state solutions for the input elements y 4,5 as follows with a similar expression for h(y 5 ).Here, N 4 is the normalization constant; note that each elemental probability density function is normalized to unity.These solutions are obtained by solving the FPEs for the steady state density functions h(y 4 ) and h(y 5 ); in the absence of back-coupling these FPEs do not depend on the elements y 2 , y 3 ) to which (y 4 , y 5 ) are forward coupled only.Knowing the steady state density functions for the input elements, we can write the dynamics of y 2,3 in the form: Here we have defined The effect of the input elements on the intermediate layer (comprising the elements y 2,3 ) is to offset the dynamics by constant terms C 2,3 which adjust the asymmetry in the y 2,3 dynamics.As noted in the preceding subsection, this results in a skewing of the potential energy functions with a concomitant effect on the corresponding probability density functions (see (7) and (10).
We now address the integrals in (26) using the y 4 integration as an illustration.We assume that the density function h(y 4 ) sharply peaks at approximately its two stable steady states y 4± .Then one can replace the integral by the sum of two separate integrals in the left and right quadrants, and expand the exponents to second order about each stable state point y 4± (the first-order terms vanish because the stable states are extrema; in this case, the maxima of the probability density function) and break up each integral into two summed integrals over [−∞, 0] and [0, ∞].We note the necessity of choosing parameters such that the integrands are sharply peaked about the stable minima of the potential energy function (or the maxima of the corresponding probability density function).The normalization is carried out in the same manner and is included in the calculation.After some work we obtain where where the double prime denotes the second derivative, and the steady states are given by (see the preceding subsection) We can do some additional simplification to finally obtain with the definitions In ( 27) and ( 30), the denominators arise via the normalizations.An analogous (to (30)) expression can be written for tanh y 5 .Thus, we have evaluated the constants C 2,3 on the rhs of (25); the input signals ε 4,5 are contained in these constant terms, together with all other parameters (including the noise variances) contained in the input elements.Equations ( 30) and ( 31) can be simplified further.We can easily set f 4 ≈ 1 because of the behavior of sech x at large x; this assumes that the parameters (a 4 , J 44 ) are selected such that U (y 4 ) is bistable and the (stable) fixed points are reasonably well separated from the unstable fixed point.Further, we can calculate the quantity U 4 Using the definition in (29), we are readily led to with a similar result for U 5 .To arrive at (33), we perform a Taylor expansion of (32) to O( 4 ).We now consider (30), and use the definitions in (29) With these evaluations, the equations for y 2,3 are closed and we can write the steady state density functions with an analogous expression for h(y 3 ).Let us, now, recall that the y 1 dynamics are given by with the definitions 144312 VOLUME 11, 2023 Authorized licensed use limited to the terms of the applicable license agreement with IEEE.Restrictions apply.With the (now) closed form expression (36) and the analogous expression for h(y 3 ), we can compute the integrals above exactly as done previously.Finally, we are led to the steady state probability density function (10) for the reference (or readout) element y 1 in closed form, with the potential energy function which has its extrema located at where we have dropped the second terms on the rhs of ( 34) and (35), as well as in the corresponding (not shown) calculated quantities from (39), because (for all reasonable parameters) these terms are quite small.Importantly, the expansion in e.g. ( 29) and (41) require the second (and higher order) terms to be ≪ 1.It is possible that these terms are ≈ 1, depending on the choice of parameters.However, multiplying these terms by sech 2 y i0 , i = 2 . . . 5 (and higher powers thereof) renders the second (and higher) terms on the rhs of e.g. ( 34), (35) ≪ 1 so that they can be dropped when compared to the leading terms.Equation (40), in which the deterministic constant C 1 contains the net effects of coupling to the rest of the network, is used to realize the XOR gate based on the structure of U (y 1 ) or, equivalently, P(y 1 ).We note that the effect of the fast elements is to contribute to the asymmetry (corresponding to C 1 ̸ = 0) of the probability density function P(y 1 ) of the reference (or readout) element.The noise variances in the input elements and the intermediate layer are also incorporated in C 1 , together with all the other system parameters appearing in (16).Finally we observe that, for the special case of all ε i = 0, the integrals in ( 26) and ( 39) vanish such that the potential energy function U (y 1 ) and the associated probability density function P(y 1 ) are bistable and symmetric.
Later (Section IV), we describe circuit elements that can replicate this system in a CMOS environment.For now, however, we close this section with an important note.Our system (Figure 1) does not have back-coupling, meaning that terms in J 21 , J 31 etc., are taken to be zero.In turn this allows a decomposition of the ''bath'' density function into the product of individual density functions, as given in (22).For our particular system (1) without back-coupling, this means that the fundamental tenet of the adiabatic (or slaving) assumption (a 1 ≪ a i>1 ) can be relaxed.However, in the presence of back-coupling, this assumption becomes necessary to achieve a smooth decoupling of the system into slow and fast elements; this is exemplified in our earlier work [12].With non-zero back-coupling, the calculation is more complicated, however, it remains tractable.Regardless, we chose to display, formally in ( 17)-( 21), the decoupling via the ''slaving principle'', to demonstrate how it could be carried out for the more general case.It should be clear that our procedure (even with back-coupling included) can be applied to larger arrays even though, in this work, we focus on a small feed-forward network.In our earlier work [12] we did, in fact, treat a simplified network with back coupling, using the slaving principle to achieve the decoupling of fast and slow elements.
Before concluding this subsection, we return to the consideration of the minimum number of elements required to realize the XOR in our dynamical network.Focusing on the sub-network defined by the elements 3, 4 and 5, and viewing it as an isolated single-layer network with two inputs and one output (or readout) element, the expression for C 3 in (42) provides a mapping from the input variables to the effective offset or skewing constant which completely characterizes the probability density of the output element: Here, we have subsumed the constants appearing in (42) into α i and β i .After suitable re-indexing and re-labeling, this functional form should be compared with (15) and is readily seen to be manifestly inadequate to realize the XOR.This stems from the monotonicity of the tanh function and the separation of the input variables in the arguments.Therefore, in order to realize the XOR with our dynamical network, we need a hidden layer containing at least two elements, so that the total number of dynamical elements is N = 5; this is similar to the case with conventional ANNs.We note that for the case of ANNs, the requirement that N = 5 at minimum to realize the XOR is described in [21] and [22].

C. STRUCTURE OF THE PROBABILITY DENSITY FUNCTION P(Y 1 )
The preceding analysis has shown that the effects of the background noise terms in the fast elements enter into the slow element dynamics via the constant term C 1 defined in (38).The effect of this constant term is to skew the density function P(y 1 ).If C 1 = 0 (which can be achieved by judicious adjustment of the magnitude and sign of the dc bias term ε 1 , or by adjusting other parameters in ( 16)), the density function is bimodal and symmetric, with the peaks centered about the deterministic fixed points y 1± .The transition between monoand bi-modality is governed by the streaming term A(y 1 ) (38), whose zeros yield the locations of the extrema of the potential energy function (this point was also discussed in our analysis of a single bistable element, above).The extrema of the probability density function P(y 1 ) are, exactly, the extrema of the potential energy function U (y 1 ) with the minima of the potential energy function corresponding to the maxima in the probability density, and the unstable fixed point between the potential minima corresponding to the minimum (located between the 2 maxima) of the probability density.
The transition from bistability to monostability corresponds to the three real roots of the transfer function A(y 1 ) collapsing into a single real root and two complex conjugate roots.It is worth quantifying the threshold at which this occurs.Figure 2 shows a plot of the function A(y 1 ) for an arbitrary (for purposes of illustration) choice of parameters a 1 , J 11 selected so that, in the absence of a constant term on the rhs, the system is bistable, meaning that A(y 1 ) has three real roots; this means taking |J 11 | > |a 1 |.In the middle curve we set C 1 = 0 yielding a transfer characteristic that is symmetric about the vertical axis and has three real roots.The potential energy function for this case is bistable with wells of equal depth with the minima occurring at the intersections of A(y 1 ) with the horizontal axis.These minima are also, as mentioned above, the locations of the peaks in the probability density function obtained by using (40).The unstable point of the potential is at y 1 = 0.
As C 1 steadily decreases from zero, the area between the curve and the positive axis in the right half plain steadily decreases.This corresponds to one of the potential wells becoming deeper than the other until, at the critical value (shown in the bottom curve), a minimum in the potential energy function (in this case, on the right side) has been replaced by an inflection point (occurring at y = y p ). Past this point the transfer function admits one real root and two complex conjugate roots; the potential is monostable (but not parabolic).Completely analogous changes occur in the steady state probability density function; it transitions from bimodal (and symmetric) to unimodal (in this illustration, on the left side, with the peak located at the real root of A(y 1 ).
Simple calculus allows us to quantify this transition.We begin with the streaming term in Equation (38).At y 1 = y p (the bottom curve in Figure 2), one has an extremum so that A ′ (y p ) = 0.This leads to y p = tanh −1 1 − a 1 J 11 .Very close to the critical case (for the bottom curve in Figure 2), we can expand A(y 1 ) about y p A(y 1 ) = A(y p ) − a 1 (y 1 − y p ) 2 tanh y p . (44) Rearranging terms, we obtain a quadratic in y 1 which can be solved for the roots where tanh y p is given above.Referring to Figure 2 it is clear that, for the transition occurring at y 1 = y p we must have A(y p ) > 0 for bistability (3 real roots), A(y p ) = 0 at the critical point (where the bottom curve just touches the y 1 axis), and A(y p ) < 0 for one real and two complex conjugate roots; in the last case, the single real (and negative) intersection of the transfer curve with the y 1 axis marks the location of the (single) peak of the monomodal density function.Putting all this together, we obtain as the condition for monostability in the Left Half Plane (LHP).The critical case (shown in the bottom curve of Figure 2) occurs for A(y p ) = 0.In an entirely analogous manner (in this case, by increasing C 1 ) we can obtain a monomodal density function for the Right Half Plane (RHP).The condition for this is found to be with the critical case now occurring for A(y m ) = 0.The rhs of the above equalities determine the critical values of the constant term C 1 as the density function transitions from being monostable (in the LHP) and peaked at the positive root of the transfer characteristic (bottom curve in (2)), to being monostable in the RHP (top curve).Between these extreme cases, various ''degrees'' of bistability are obtained.We note that the mean value ⟨y 1 ⟩ = ∞ −∞ y 1 P(y 1 )dy 1 traverses from right to left as we progress through the situations (top-tobottom) depicted in Figure 2. Concomitantly, the mean value ⟨y 1 ⟩ = ∞ −∞ y 1 P(y 1 )dy 1 traverses from left to right as we progress through the situations (top-to-bottom) depicted in Figure 2. When P(y 1 ) is perfectly monomodal the mean value corresponds to the location (given by Equation ( 41)) of the peak.For the middle case in Figure 2, the mean value is zero, corresponding to C 1 = 0 in (38), that is, a symmetric bimodal probability density function P(y 1 ).
Clearly, with all other parameters fixed, the constant forcing term ε 1 may be adjusted to realize the critical (from (46) or (47)), or any other desired value (from (42)) of C 1 .If ε 1 is to be kept fixed (or zero), the other system parameters in ( 16) ca be adjusted to realize the required value of C 1 .This is a fairly complicated proposition because of the sheer number of available parameters.We reiterate that, in general, the crossing points of the switching curve A(y 1 ) = 0, as shown in Figure 2, are functions of every element in the network i.e. they depend on all the parameters in the array (16) as well as the noise variances σ 2 i>1 of the ''fast'' or ''bath'' elements.
The above analysis, aimed at determining the characteristics of the probability density function P(y 1 ), e.g. the locations and relative heights of the peaks, are essentially deterministic, containing the noise variances σ 2 i>1 as parameters that change the peak locations and heights through their effect on the quantities ( 26) and (39) and, ultimately, on A(y 1 ), the deterministic contribution to the y 1 dynamics (37).The slow element noise variance σ 2 1 appears only in the slow dynamics (20) leading to the long time behavior quantified in (40); its sole effect is to change the width of the lobes of the probability density function P(y 1 ).It does not affect the locations of the peaks of P(y 1 ).
We can investigate this behavior further by examining the regimes where the constant C 1 forces the probability density P(y 1 ) to transition from bi-to mono-modality; this is central to our characterization of the XOR and is done in the following sections.

III. APPLICATION OF THE THEORY TO THE XOR TRANSFORM
In what follows, we apply the theoretical results of Section II to the realization of the analog equivalent of the XOR function in probability space.We show that this network provides a realization of an XOR (predicated on the structure of the probability density function P(y 1 ) of the readout element) except in well-defined regions of parameter space.In a subsequent section, we validate our theoretical results with numerical simulations and propose a nonlinear circuit implementation of the 5-element system.We begin by recapping the observation that the probability density P(y 1 ) can transition between mono-and bi-modality as shown in Figure 2. The transition between these configurations is controlled by the parameter C 1 as calculated in (38) and (42).For C 1 = 0 the probability density function is bimodal with symmetrically placed peaks of equal heights.As C 1 starts to deviate from zero, the density function transitions towards monostability in the left or right half, depending on the sign of C 1 .The critical point at which purely monostable behavior is obtained can be calculated using ( 46) or (47).Given the large number of parameters in our 5-element system, we keep the matrix of couplings J ij fixed in our analysis going forward, while ensuring that all elements are bistable, that is J ii /a i > 1.We then examine the XOR in probability space by varying only the noise variances σ 2 i and the constant bias terms ε i .
Our approach consists of calculating the parameter C 1 as a function of ε 4,5 , the constant biases applied to the input layer.As a representative case, Figure 3 shows the contours of C 1 in the 2D space of ε 4,5 .The long time probability density function P(y 1 ) is symmetrically bimodal along the contour C 1 = 0 shown in the figure.On either side of the C 1 = 0 contours, we have a gradual transition to monomodality with the critical C 1 calculated using ( 46) or (47).For this specific set of parameters, the C 1 = 0 contours serve as delineations between the two logic output states.Different bias bias and system parameters can create different (but still bistable) probability distributions than those in Figure 3, but the requirement is the same: two different distinguishable distributions, one resulting from (ε 4 , ε 5 ) in quadrants I and III, the other from (ε 4 , ε 5 ) in quadrants II and IV.
Note that, due to the immense number of adjustable parameters in the theoretical model, we have selected a particular set of parameters, for the simulations.In particular, we keep the matrix J ik , i, k = 1 . . .5, the column vector of time constants a i , i = 1 . . .5, the dc biases ε 1,2,3 , and the noise variances σ 2 i i = 1 . . . 5 fixed.This allows us to examine the XOR in the space of the two input dc bias signals ε 4,5 .These parameters are kept fixed for the numerical simulations, as well.It is clear, that changing the magnitudes and signs of some of the parameters, e.g. the coupling coefficients, could lead to much different behavior, including changing where the XOR is realized in the space of ε 4,5 .For this paper, we deliberately chose parameters such that every element in the system (1) remains bistable, assuming that the bias signals are chosen to be small enough to retain the bistability in each element.
The ratios of the peak heights in the density function P(y 1 ) are important for characterizing the XOR; this is evident from Figure 3.These can be calculated in terms of the density function P(y 1 ) or its associated potential energy function U (y 1 ).We begin with the potential energy function.The difference (underpinned by the constant C 1 ) in well-depths can be expressed as where we have used ( 40) and ( 41) and expanded to leading order in 1 .Given that y 10 = (J 11 /a 1 ) tanh(J 11 /a 1 ) is positive by definition, the sign of C 1 determines which well is deeper.A similar calculation can be carried out for the ratio of the peak heights in the probability density function P(y 1 ) which is also, approximately, the ratio of the areas enclosed by each peak when using the Gaussian approximation (as was performed in Section II).This ratio is the ratio of the probabilities of the system being in one state (negative quadrant) or the other (positive quadrant).In the event of equal probability of being in the two states (that is, perfectly symmetric bistability), C 1 = 0 and the ratio is unity, as expected.Analogous to Figure 3, we can examine the structure of the probability density function P(y 1 ) as a function of the inputs ε 4 , ε 5 for different selections of the input noise.This is illustrated in Figure 4.In each sub-figure the case for C 1 = 0 (corresponding to symmetric bistability) is shown.We recall that the noise variance in the elements y k , where k = 2, 3, 4, 5 can affect the structure (including the fixed points) of the individual density functions h(y k ) and, hence, the structure of the readout element density function P(y 1 ); this is evident when we compare the figures.As an example, in the bottom center figure where noise is increased in both the hidden and input layers, the transition to complete monostability in the RHP occurs for C 1 = 76, within the red regime depicted in Figure 3.The critical points for transition to monostability can be calculated using Equations ( 46) and (47).We reiterate that the noise variance σ 2 1 in the readout element only affects the spread of its density function, with no effects on the fixed points.Accordingly, in Figure 4 we hold σ 2 1 fixed.In addition, for simplicity, the matrix of coupling coefficients J ik and the constants a i are also kept fixed.

A. COUPLED ELEMENT SIMULATIONS
To validate the theoretical model developed in the preceding section, we solved system (16) numerically.The general stochastic differential equation is solved by Euler-Maruyama integration [25].Using the definition of white noise ξ (t) = dW /dt as the derivative of a Wiener process, we rewrite (50) as Let t be the integration step, and let t i = i t.At t i+1 , y is incremented by approximately where W i is drawn from a normal distribution N (0, √ t) with mean zero and variance t.
The characteristic time scale of the system (1) is defined by the fastest element: τ = 1/ max{a i }.In order for the Euler method to be stable, we must have t ≪ 2τ .Balancing the need for accurate simulations with the execution time, we chose a time step t = τ/B, and solved (52) for t ∈ [0, M τ ], thereby generating 5 sequences of B • M points {y i (t j )}.We discard the first few points to let transients die down.Then, the steady state probability densities for y i are calculated by applying a kernel density estimator with bandwidth 0.2 to the resulting sequences, with smoother appearing densities for larger values of M .For Figure 5, we used B = 10 and M = 10 6 .
In Figure 5 we plot the simulated steady-state probability density functions for all elements y i in response to four pairs of Boolean inputs defined by pairs (ε 4 , ε 5 ) where ε 4,5 = ±7 (all other ε i are fixed).When the two inputs have the same signs (i.e.either both at -7, or both at +7, corresponding to (''FALSE'', ''FALSE'') or (''TRUE'', ''TRUE'')) the distribution P(y 1 ) is left-skewed and an output of ''FALSE'' is expected.However, when the system is presented with inputs having opposite signs (i.e. one at −7, the other at +7, corresponding to (''FALSE'', ''TRUE'')), the distribution P(y 1 ) is right-skewed and the output ''TRUE'' is expected.In addition, the corresponding theoretical steady-state (i.e. in the t → ∞ limit) probability density functions ( 23), (36), and (40), obtained in Section II, demonstrate good agreement (in particular the peak locations and relative heights) with the numerical simulations.While this result is for J ij and (fixed) ε i that were not selected via any definitive criteria, the critical point is that J and ε are chosen such that the four input pairs reduce to two output probability distributions which are sufficiently distinct that they can be differentiated and equated to representations of logical ''TRUE'' or ''FALSE'', thus comprising an XOR transform.The characteristics of the distributions generated from a given choices of J and ε (peak locations, height, etc.) are immaterial as long as a correspondence can be made.

IV. NONLINEAR COUPLED CIRCUIT IMPLEMENTATION
Finally, we implemented the computational method presented here in physically realizable hardware.As fabrication of components is extremely expensive, we chose a technology, CMOS, that has well established simulation suites such as Cadence.Although it is difficult to verify noise driven switching in the Cadence iterative simulation environment, CMOS represents a reasonable first exploration into circuit realizations.It provides initial evidence that our underlying assumptions are valid while offering an environment in which implementation can be validated.Established platforms for integrated circuit design and simulation, such as Cadence, are not designed for validating transient solutions in systems wherein the noise to signal environments approach unity.As a result Cadence is slow and difficult if not impossible for system validation in the time domain.We used Cadence to simulate the coupled N = 5 system without added FIGURE 5. XOR demonstration via simulated steady state probability density function P(y 1 ) of readout element.In this example, the self-coupling terms are J 11 = 5, J ii = 50 for i > 1.The cross-coupling terms are J 12 = J 13 = 1.5, J 24 = J 25 = 15, J 34 = J 35 = −15.ε 1 = −1.5Each box shows probability density function derived from theory (Section II) in blue and simulation in green.The theory, despite invoking an expansion of the form (29) to calculate the fixed points, predicts the extrema remarkably well.The theoretical probability density function is calculated in the t → ∞ limit and would yield better overall agreement with the simulations if the latter were run for longer times.Nonetheless, the ratios of peak heights (or equivalently, the ratios of areas enclosed by each peak) show good agreement between theory, i.e. (49), and simulations.
noise (meaning there may still be component noise present), and showed good agreement with the theory of Section II.We demonstrate this comparison of theoretical results to those obtained via the circuit simulation, by observing the hysteresis behavior and comparing it to the (deterministic) switching curves depicted in Figure 2.

A. THE TANH FUNCTION GENERATING CIRCUIT
To construct a physical circuit implementing the dynamical system (16), each non-zero coupling coefficient J ij is embodied by a tanh generating circuit.It is well known that CMOS devices operating in subthreshold are readily configured to achieve a five-device tanh function [26] (see Figure 8 for a simplified depiction of three such circuits implementing row 1 of the J matrix).
A symbolic representation of a prototypical tanh generating circuit C is shown in red in Figure 6.This circuit produces a differential output current proportional to the tanh of the differential input voltage where V id = V ip −V im is the differential input voltage, κ is the subthreshold slope (extracted as 1.07 from Global Foundries 12LP), and U T = k B T /q is the thermal voltage (where k B is Boltzmann's constant, T is the absolute temperature, and When i = j , the dotted lines are connections and V od = V id , representing the self-coupling coefficient J ii .

FIGURE 7.
Circuit for 5 element realization.Negative weights J ij are implemented by crossing the output wires.Red elements are as displayed in Figure 6.
q is the electronic charge).The differential output voltage is defined to be V od = V op − V om .
When C is connected in parallel with a load resistor R, capacitor C, and external current source I E as shown in Figure 6, Kirchoff's current law dictates that the state of the circuit evolves according to where b = 1/(2κU T ).The noise current I noise is explicitly defined later in this paper.Here, C includes both the intrinsic and extrinsic parasitics of the CMOS cell.
For each non-zero coupling coefficient J ij , we create a fully differential circuit C ij with tail current I ij (explicitly defined later), input terminals at voltages V ipj , V imj and output terminals at voltages V opi , V omi .All C i * outputs in row i are at the same voltage as the input terminals of C ii .All C * j inputs in column j are at the same voltage as the output terminals of C jj .Figure 7 shows a block level circuit topology for the network of five elements implementing the XOR function.
To complete the mapping of this network to a system of stochastic differential equations, we define the state of the i-th element as the differential voltage of Inserting indices into (54) as appropriate, v i obeys the state equation The noise current I noisei is modeled as an additive Gaussian white noise signal N i (t) = s i ξ (t) of average power spectral density [26] where I trani = 2 j I ij is the current through the transistors in row i which contribute to the noise, and q is the charge of an electron.
By substituting y i = bv i and dividing by C i /b, we obtain the equations as described in Section II: The conversions between circuit parameters and abstract dynamical system are: Figure 7 introduces a block level circuit topology that implements the set of equations in (16).It uses the following parameters: Noise computations for the circuit in Figure 7 can be derived from the methods presented in [26].The effective bandwidth of the idealized white noise signal N i (t) is approximately i = 1/(4R i C i ), resulting in a noise current with 1-sigma amplitude which is significantly less than the typical noise amplitudes used in the simulations of (58).Based on these estimated results we expect to have to inject noise into each diagonal element of Figure 7.The signal-to-noise ratio desired is a trade off between acquiring a satisfactory sample size and the bandwidth of the inputs.A simplified schematic of the implementation of element v 1 is shown in Figure 8.Note that Figure 8 is the basis for all five rows in Figure 7.Each term I ij tanh(bv ij (t)) of Equation ( 56) is implemented by MOSFET devices M1-M5, with the tail currents of M5 setting the weights I ij .In this example, the current through M5A sets the diagonal weight I 11 , whereas the currents through M5B and M5C set the off-diagonal weights I 12 , I 13 .Cascode transistors M3 and M4  The general nature of each row lends extendability to the hardware implementation of large computational arrays as demonstrated by Figure 7 and Figure 8.The approach presented here can also be easily extended to arrays with backcoupling connections.As a direct result of this extendability, a simple cell library consisting of 30 diagonal and off-diagonal cells and their common mode cells will result in a powerful toolbox exploiting cell reuse in the fabrication of large weight arrays.
Figure 9 is a combination of a theoretical (from direct integration of the deterministic system) hysteresis loop (blue/green) and the loop (red/brown) obtained via the circuit.The red curve was generated directly from a Cadence simulation by incrementally adjusting the input current over the range [−150, 150] nA while holding ϵ 4 = ϵ 5 = 50 nA; this yields the symmetric case where two ''TRUE'' inputs generate a ''FALSE'' output with the hysteresis curves almost symmetric about the vertical axis.The blue curves were generated by numerically integrating a deterministic version of equations ( 56), where an additional inhomogeneous forcing term was added to the equation for v 1 .The input current is swept from −150 nA to 150 nA and back.The second hysteresis loop (on right-brown) is generated in a similar manner, with ϵ 4 = 50 nA and ϵ 5 = −50 nA; this is the case where one ''TRUE'' input and one ''FALSE'' input generate a ''TRUE'' output with its corresponding numerical integration case (green).
We note very good agreement between the theoretical and circuit results.These hysteresis loops should be viewed in the context of Figure 2. The (blue) switching curve, derived directly from the theory, is shown in the symmetric case (left set of loops in Figure 9) with its corresponding curve from the simulated circuit results (red).In this case, the blue and red switching curves cross the vertical axis in three places (three real roots).The hysteresis loops are nearly symmetrical about the vertical axis with the potential energy function (Equation 40) being nearly symmetric-bistable.The corresponding probability density function consists of two peaks of nearly equal area, centered at the points of intersection of the switching curves with the vertical axis.This is a case where the XOR function evaluates to ''FALSE''.The other set of hysteresis loops on the right portion of the figure (green/brown) where one input is ''TRUE'' and the other is ''FALSE'' display switching curves which will have only one intersection with the vertical axis (one real root and two complex conjugate roots); these hysteresis loops are shifted to the right.For this case, the probability density function is monomodal, the switching curve (not shown) intersects the vertical axis at a single point, and the XOR function evaluates to ''TRUE''.In both cases, the roots of the switching curves (that is, the locations of the extrema of the probability density function P(y 1 )), as well as the y-axis crossings of the hysteresis loops, are nearly identical via the theory and its circuit realization.It is important to note that the dotted segment of the blue switching curve (shown in the symmetric case) is dynamically unstable.It can be derived from theory, however, 144320 VOLUME 11, 2023 Authorized licensed use limited to the terms of the applicable license agreement with IEEE.Restrictions apply.
it does not appear in the experimental data.For the red curves we obtain vertical jumps between the two steady states, as explained above, and the dotted red curve in the symmetric case has been synthetically added for comparison with the theoretical switching (blue) curve, to illustrate agreement between theory and experiment.
This agreement suggests that, given a set of theoretical parameters which point to a regime (e,g, in Figure 3 or Figure 4) wherein the XOR can be realized, a nonlinear circuit manifestation of the coupled system would generate the XOR for the parameters obtained from the theory (and carefully transformed to the corresponding circuit parameters).

A. UNIVERSAL FUNCTION APPROXIMATION
We have focused on the use of coupled nonlinear dynamic elements to produce a network that implements an XOR as a concrete example.Our networks have, in principle, the same representation power as standard ANNs with a sigmoid activation function.Focusing on networks with a layer of n input nodes, an internal layer of m nodes, and a single output node, an ANN implements a function f : R n −→ R, which can be written as Here, σ is the usual sigmoid activation function, and θ i ∈ R, y i ∈ R n are the network biases and weights, respectively.It was shown in [27] that sums of the form (60) are dense in the space C(I n ) of continuous functions on the unit hypercube I n = [0, 1] n ⊂ R n with the uniform (supremum) norm.This means any continuous function f : I n −→ R can be approximated arbitrarily closely by an ANN with a single hidden layer (albeit with m possibly very large).In this context, we reiterate that our theoretical approach of Section II can be applied to an arbitrary sized network and also incorporate back coupling when needed.This was, in fact, demonstrated by us in a simplified network (N = 2) in our previous work [12].
As described in II, a coupled network of dynamic elements with n input elements, m internal elements, and a single output element is essentially performing the computation While the summands of (61) are somewhat different from those in (60), it can be shown using the same technique as in [27] that sums of the form (61) are also dense in C(I n ).Therefore, in principle, one can set up a dynamic network to approximate any desired computation.What is missing, however, is a systematic procedure for finding the right parameters, akin to training an ANN with backpropagation.
Once such a procedure is established, one could foresee the ability to perform complex tasks, for example, pattern recognition on data sets such as images, where pixel RGB values are encoded as constant bias signals into the input layer.

B. POWER CONSUMPTION
At this stage, some remarks on power consumption in the nonlinear circuit in Section V are warranted.A classical figure of merit for low power analog circuits is the dynamic range (DR).Dynamic range is limited by the signal amplitude which in turn, is limited by the power supply (V DD ) and the system noise floor.For the system under consideration, the dynamic range (DR) can be written as [31] DR where κ is a constant taken from the noise term in (51) and the 20U T ensures all devices generating weight elements (see Figure 8) remain in their active analog regimes.The power consumption is proportional to DR [31], [32] with equations ( 62) and (63) being strictly valid at low noise where the system gain is approximately linear.Note that thermal noise, in the circuit presented here, is the driving force behind the sampling or changes of state.From the hysteresis loops in Figure 9, it is apparent that a change greater than 60 nA (from +30 nA to −30 nA or vice versa) is required to generate a change in state (i.e. a switch).This change of state represents a ±30 nA change in the output current I , which at 5σ , suggests that the noise standard deviation will be on the order of 12 nA.As a result, the DR can be defined as the state change divided by the noise standard deviation; in this specific example, one obtains a DR = 20 log(60/12) = 13.98 dB.Therefore, the desirable DR should be approximately 10 − 16 dB.Note that too large a noise signal can distort the system, resulting in unreliable results, whereas a noise signal that is too small will not produce a result at all.Hence, an optimal DR likely exists; however, determining its exact value for this system is beyond the scope of this study.This behavior is in contrast to classical analog systems, such as analog-to-digital converters, where the desired DR's are 60 dB to 150 dB or higher.
From a hardware perspective, there are many other hysteretic, nonlinear elements beyond CMOS circuits that are more susceptible to noise, and thus might be suitable for this method of computation.These include a range of nonlinear devices such as superconducting elements [28], memristors [29] and ferromagnetic nanowires [30].Once the circuit is constructed using one of these devices as the nonlinearity, considerations such as the appropriate amount of noise necessary for the computation, determination of the coupling coefficients for a desired transform, etc., can be explored within this framework, using the theory as a guide.
All in all, the creation of an analog computing environment, due to a lower DR, is likely to produce computations that can accomplished with less power than traditional analog approaches, not to mention being able to function in noise environments unsuitable for classical digital computing.The application of this method is more suited to systems where the system switching is driven by the combination of the ''hidden'' digital signal and noise contained within the signal and power constrained hardware, e.g.quantum mechanical systems including spintronics, or systems based on SQUID devices.

C. NOISE
Since the noise floor has been an important component of the dynamics addressed in this paper, some general (noiserelated) closing remarks are in order.The noise terms N i (t) in ( 1) are assumed to arise internally (e.g.thermal noise).However, in real-world applications there could also be external noise attached to an input signal (e.g., the dc inputs ε i ), and in some sensor applications of an array of the form (1) there would be other sources of noise.As an example of the latter, we can envision an array of coupled magnetic field sensors which would be susceptible to atmospheric magnetic noise arising from variations in the terrestrial magnetism, lightning strikes, seismic activity, etc. Environmental monitoring sensors are also susceptible to external noise, because signals to be monitored are, usually, aperiodic and could easily be overlapped by external broadband noise.The (external) noise effects usually have to be dealt with on a case-by-case basis, for example during design time, for characterization purposes.In the system studied in this paper, one could have an additional (additive) noise term in one or all of the equations; this leads to additional diffusion terms in the FPE (17).As long as the additional noise terms are additive, the ensuing FPE and its solution will have a similar structure to what we obtained in Section II.Their effects on the switching dynamics of the array and the realization of the XOR can be studied as a function of the different noise standard deviations; these effects are, of course, intimately connected to the nonlinearity and coupling strengths (i.e. the matrix J ik ).In this paper the noise terms in each equation are taken to represent a combination of internal noise, as well as fluctuations superimposed on the applied dc signals.For the sake of the phenomenology discussed in this paper, this is a sufficient treatment.
In a nonlinear device with readout dependent on switching between stable states, a reasonably large switch rate is desired to achieve reasonable sampling of the output in a short observation time.This is especially true when these devices are used in computation/logic circuits.In general, the switch rate is governed by the energy barrier height, applied external power, and noise floor intensity.In a coupled system, as described in this work, the effective energy barrier of the readout element is a function of the input signals, the coupling coefficients, and the noise floor standard deviation in each element, evident in our N = 5 system, from equations ( 40)-(42).Subsequently, the net power required to switch the readout element can be adjusted by adjusting these (deterministic) parameters.Adjusting them effectively adjusts the noise floor through their effects on the energy barrier; in turn, this changes the switching rate.This is analogous to the stochastic resonance scenario [11], wherein such an adjustment has been referred to as ''tuning'' to the noise.In our coupled system, the readout element can be made to operate efficiently at a lower dynamic range.Thus we can speculate that in systems/devices with larger noise floors, for example Resistive Random Access Memory (RRAM) [33] arrays, the background noise might significantly enhance the array performance, with careful parameter adjustments; however, too much noise could degrade performance.

VI. CONCLUSION
We have demonstrated the potential viability of a novel method of computation using a set of coupled noisy nonlinear elements, configured to yield XOR logic.This method represents a new way of generating neural-like dynamics, in the sense of both computational neuroscience and machine learning.Theoretical calculations and basic CMOS modeling of the system are in good agreement for the parameter regimes wherein the XOR can be realized.While scaling this technology up to practical use would require significant additional investment in fabrication technologies or simulation capacity, our results validate a powerful method of attacking a large set of problems.Optimization of the coupling parameters for any required scenario is virtually impossible through numerical simulations alone, due to the sheer number of parameter combinations available.However, with a theory that agrees with simulations and the circuit replication of our coupled system, such an optimization can be more efficiently achieved from a physically-informed starting point.
The overall good agreement of the theoretical results with the CADENCE simulations comes with an important caveat, however.CADENCE, while convenient, is slow, and is sometimes unable to faithfully reproduce all the desired circuit functionalities and characteristics, particularly when the circuit models contain complex dynamics.Other errors can arise from the circuit hardware not reproducing complex functions precisely.For instance, we know that the tanh(y) function is not properly reproduced in the circuit (the deviations occur mainly for large ±y); this likely accounts for the small differences in the theoretical and experimental hysteresis loops in Figure 9.
The above remarks must be tempered by the observation that real-world circuit parameters are never exact; this is well-known in the circuit repertoire.This discrepancy can be addressed, theoretically, by allowing the system parameters (e.g. the dc inputs ε i ) (16) to have small deviations (with an appropriate distribution) about a mean value.Then, for example we might have ε i = εi + δε i with the first term representing the deterministic value as used throughout this paper, and the second term representing small parameter deviations occurring in the circuit.At the end of the 144322 VOLUME 11, 2023 Authorized licensed use limited to the terms of the applicable license agreement with IEEE.Restrictions apply.calculation, one would carry out an additional average over this distribution of the deviations.This process can get quite complicated/tedious if many adjustable parameters deviate in this manner from their expected values.A far more complicated situation can occur if the system parameters (e.g. the coupling coefficients) that are attached to functions of the dependent variable are actually affected by circuit noise leading to them being time-dependent and having their own correlation functions similar to (4).In this case one has multiplicative or ''state-dependent'' noise terms due to products of the fluctuating coupling coefficients and functions of the state variables y i .The analytic treatment of Section II becomes far more complicated due to the FPE (17) having state-dependent diffusion terms (these are the second order terms on the rhs of ( 17) [23]).The above scenarios are, however, beyond the scope of this paper.
Finally, we be note that a neuromorphic system such as the one discussed in this work is inherently less sensitive to noise (or might actually take advantage of cooperative behavior mediated by the noise-floor), and can, in fact, be designed to operate in noisier environments.It is, therefore, reasonable to conjecture that our methodology may fit applications wherein the noise-floor might otherwise be detrimental to performance, better than traditional approaches.As just one example, we cite the case of wireless communication with drones where we could anticipate receiving noise contaminated data [34].
Further work will help to determine whether the highly distinct energy cost profiles associated with this method will enable truly low-power computation by taking advantage of its atypically reduced DR requirements, especially in environments with noisy signals or in technologies where the inherent noise is higher.

FIGURE 2 .
FIGURE 2. Transition to monostability.The dashed curves depict the transfer characteristic A(y 1 ) corresponding to the critical cases (inequality replaced by equality) of equations (46) (bottom) and (47) (top).Middle (solid curve) corresponds to C 1 = 0 (symmetric bistability).The critical points (switching curve touching the horizontal axis) are y p (bottom curve) and y m (top curve).

FIGURE 4 .
FIGURE 4. Four contour plots of C 1 (as in Figure3) on the same scale showing the effect of changing noise intensity in the first two layers.The ''baseline'' case (top panel) has noise variances σ 2 i = 100 for all elements i .Contours labeled ''0'' are the locus of symmetric bistability, while the dashed contours mark the transition to ''complete'' monostable density functions.Other parameters as in Figure3.See text for further details.

FIGURE 6 .
FIGURE 6. Prototypical circuit C ij with tanh generating block in red.When i = j , the dotted lines are connections and V od = V id , representing the self-coupling coefficient J ii .

FIGURE 8 .
FIGURE 8. Simplified circuit representing a general element of layer 1 and 2. The left most cell is the diagonal tanh element providing self-coupling, and the right two are the off-diagonal elements providing coupling from the previous layer.Pin and current labels within the figure are specific to element v 1 .Green arrows at the top represent current flow when all biases I E i = 0.

FIGURE 9 .
FIGURE 9. Hysteresis loops obtained from Cadence circuit simulation (red/brown) and numerical integration of (56) (blue/green).The XOR=''TRUE'' case corresponds to the loops on the right (green/brown) which have only one crossing of vertical axis (monostable probability density function).The loops on the left (blue/red) have two crossings of the vertical axis (bistable probability density function), corresponding to XOR=''FALSE''.The unstable segments (dotted) can be obtained via the theory, however they are not realized in the experiments.See text for additional details.
If we encode TRUE as +1 and FALSE as −1, we can define an analog version of an XOR gate acting on real-valued variables as any continuous function f : R 2 a 1 − J 11 sech 2 y 10 )