Master Memory Function for Delay-Based Reservoir Computers With Single-Variable Dynamics

We show that many delay-based reservoir computers considered in the literature can be characterized by a universal master memory function (MMF). Once computed for two independent parameters, this function provides the linear memory capacity for any delay-based single-variable reservoir with small inputs. Moreover, we propose an analytical description of the MMF that enables its efficient and fast computation. Our approach can be applied not only to single-variable delay-based reservoirs governed by known dynamical rules, such as the Mackey–Glass or Stuart–Landau-like systems, but also to reservoirs whose dynamical model is not available.


Introduction
Reservoir computing is a neuromorphic-inspired machine learning paradigm that enables high-speed training of recurrent neural networks and is capable of solving highly complex time-dependent tasks. First proposed by Jaeger [1] and inspired by the human brain [2], it utilizes the inherent computational capabilities of dynamical systems. Very recently, the universal approximation property has also been shown for a wide range of reservoir computers, which solidifies the concept as a broadly applicable scheme [3]. Bollt pointed out a connection between reservoir computers and VAR (vector autoregressive) and nonlinear VAR machines, which may be one of the reasons behind the surprising efficiency of reservoir computers for time-dependent tasks [4,5]. Many different realizations [6-18] have shown the relevance of reservoir computing to practical applications, while analytical and numerical analyses [19-22] help in building an understanding of its working principles and in improving its performance. Motivated by fast inference and low energy consumption, optoelectronic and optical hardware implementations of reservoir computers are often realized [23-32], indicating a high future potential of such processing units.
Originally, reservoir computing is performed with a network of nonlinear nodes, which projects the input information into a high-dimensional phase space, allowing a linear regression to linearly separate features [1]. In time-delayed reservoir computing, a single dynamical node with delayed feedback is employed as a reservoir instead of the network [33]. The time-multiplexing procedure allows such a single-element system to implement a recurrent ring network [33-35], see Fig. 1. Since no large number of nonlinear elements is needed, the complexity of the reservoir hardware implementation is significantly reduced. Existing experimental and numerical realizations show promising results in solving time-dependent tasks, such as speech recognition, time-series prediction [31,36-43], or equalization tasks on nonlinearly distorted signals [44]. For a general overview, we refer to [45-47].
Often, reservoirs are optimized for a specific task by hyperparameter tuning, which defeats the purpose of reservoir computing as a fast-trainable machine learning scheme. Dambre et al. [48] introduced a task-independent quantification of a reservoir computer, building on the memory capacity notion already introduced in [1], whereby a high memory capacity pinpoints generally well-performing reservoirs.
In this paper, we provide an analytical tool for finding promising reservoir setups by introducing a master memory function (MMF) for delay-based reservoir computing with small inputs. The MMF allows for fast predictions of the linear memory capacity, and it indicates that the linear memory capacity is similar for reservoirs with similar linearizations.
The main idea behind our method can be outlined as follows. Consider a delay-based reservoir described by a general nonlinear system ṡ(t) = f(s(t), s(t − τ), I(t)), where I(t) is an input signal, which is "small" in a certain sense, and s(t) determines the state of the reservoir. The response s(t) of the reservoir must be independent (at least to some extent) of its initial state, a property known as the echo state property. Such a situation occurs when the reservoir operates near an equilibrium s* that is stable in the absence of the input signal. All reservoir dynamics then takes place in a neighborhood of this equilibrium, and, as a result, the reservoir linearization δṡ(t) = a δs(t) + b δs(t − τ) + I(t) approximates these dynamics. Here δs is the deviation from the equilibrium. In the considered case of the single-variable reservoir, the scalar parameters a and b are the only determining quantities. The relatively simple form of the linearized system allows us to obtain an analytical expression for the linear memory capacity, which depends on the parameters a and b and thus parametrically determines the linear memory capacity of any reservoir with the above properties. We call the obtained function the MMF due to its universal features, i.e., different reservoir computing setups that possess the same linearization yield the same linear memory capacity given by the MMF.
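As a toy illustration of this linearized picture, the following sketch integrates the scalar delay equation δṡ(t) = a δs(t) + b δs(t − τ) + I(t) with a simple Euler scheme. The integrator, step size, and parameter values are choices of this sketch, not taken from the paper:

```python
# Minimal Euler-integration sketch (not the paper's code) of the
# linearized delay reservoir  d(ds)/dt = a*ds(t) + b*ds(t - tau) + I(t).
def simulate_linearized(a, b, tau, inputs, dt=0.01):
    """Trajectory of the deviation ds from the equilibrium s*.

    `inputs` lists I(t) at each Euler step; tau is assumed to be an
    integer multiple of dt, and the history is the equilibrium itself.
    """
    n_tau = round(tau / dt)
    ds = [0.0] * (n_tau + 1)            # history on [-tau, 0]: stay at s*
    for I_t in inputs:
        # delayed state is n_tau steps before the current last element
        ds.append(ds[-1] + dt * (a * ds[-1] + b * ds[-(n_tau + 1)] + I_t))
    return ds[n_tau + 1:]

# A short input pulse: the response decays back to the equilibrium
# because the equilibrium is stable for these parameters (a + b < 0).
traj = simulate_linearized(a=-1.0, b=0.1, tau=1.0, inputs=[1.0] + [0.0] * 999)
```

The echo state property appears here as the decay of the homogeneous solution: after the input pulse, the trajectory forgets its initial state.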
The paper is structured as follows. First, we briefly revise the concept of time-delay-based reservoir computing and the notion of linear memory capacity. We then present our main analytical result, together with example code for an efficient evaluation of the obtained expression; the derivation is given in the appendix. Finally, comparisons of numerically simulated reservoir computer performance with the semianalytic approach are provided. We also show in Sec. 4.6 the application of our results to reservoirs with an unknown dynamical model, where the parameters a and b are evaluated using the system response to an external stimulus.

Time-Delay based Reservoir Computing
Reservoir computing utilizes the intrinsic ability of dynamical systems to project the input information into a high-dimensional phase space [1]. By linearly combining the responses of the dynamical reservoir to inputs, a specific task is approximated. In the classical reservoir computing scheme, often a so-called echo state network is used, feeding the input into a spatially extended network of nonlinear nodes. Linear regression is then applied to minimize the Euclidean distance between the output and a target. This approach is particularly powerful for time-dependent tasks because the dynamical system that forms the reservoir acts as a memory kernel.
In the time-delay-based reservoir computing scheme [33], the spatially extended network is replaced by a single nonlinear node with a time-delayed feedback loop. A time-multiplexing procedure with a periodic mask function g is applied to translate the input data into a temporal input signal. Similarly, the time-multiplexing procedure translates the temporal high-dimensional response of the single element into the spatio-temporal responses of virtual nodes. The virtual nodes play the same role as the spatial nodes in echo state networks.
A sketch of the delay-based reservoir computing setup is shown in Fig. 1. In the following, we give a short overview of the quantities and notations used in this paper. We also refer to our previous works [49-51] for a detailed explanation of how the reservoir setup is operated and how task-independent memory capacities are computed.
Let us briefly recall the main ingredients of the time-multiplexed reservoir computing scheme [33,49-51]. We apply an input vector u ∈ R^K componentwise at times t ∈ [t_{k−1}, t_k), t_k = kT, k = 1, ..., K, with K being the number of sample points. The administration time for different inputs, t_{k+1} − t_k = T, is the same for all inputs and is called the clock cycle T. To achieve a high-dimensional response to the same input, a T-periodic mask function g multiplies the input, and the resulting signal J(t) = g(t) u_k enters the system (see Fig. 1 and Fig. 2). The mask g is a piecewise-constant function on N_V intervals, each of length θ = T/N_V, corresponding to N_V virtual nodes. The values of the mask function g play the same role as the input weights in spatially extended reservoirs, with the difference that time-multiplexing distributes the weights over time. The responses of the reservoir are collected in the state matrix S ∈ R^{K×N_V}, see Fig. 3. The elements of the state matrix are [S]_{kn} = ŝ(kT + nθ) with n = 1, ..., N_V and k = 1, ..., K, where ŝ(kT + nθ) ∈ R is the state of the dynamical element of the reservoir at time kT + nθ, shifted by the mean over all clock cycles: ŝ(kT + nθ) = s(kT + nθ) − ⟨s(·T + nθ)⟩, see [48]. The average ⟨s(·T + nθ)⟩ is taken over all clock cycles k. For example, for an experimental or numerical realization of the reservoir with a semiconductor laser, s(t) could be the laser intensity.
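The bookkeeping above can be sketched as follows. This is a hedged illustration, not the paper's C++ code; the Euler integrator, the parameter values, and the choice τ = T are assumptions of this sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
N_V, K, theta = 10, 200, 0.5             # virtual nodes, inputs, node distance
a, b, eta = -1.0, 0.5, 1e-3              # linearization parameters, input strength
g = rng.uniform(0.0, 1.0, N_V)           # piecewise-constant mask weights
u = rng.uniform(-1.0, 1.0, K)            # input sequence

dt = 0.01
spn = round(theta / dt)                  # Euler steps per virtual node
n_tau = N_V * spn                        # delay in steps; here tau = T = N_V*theta
s = np.zeros(n_tau + 1 + K * N_V * spn)  # deviation from the equilibrium s*
S = np.zeros((K, N_V))
idx = n_tau                              # current time index (history is zero)
for k in range(K):
    for n in range(N_V):
        for _ in range(spn):             # masked input J(t) = g(t) * u_k
            idx += 1
            s[idx] = s[idx-1] + dt * (a*s[idx-1] + b*s[idx-1-n_tau]
                                      + eta * g[n] * u[k])
        S[k, n] = s[idx]                 # read out at the virtual-node time
S -= S.mean(axis=0)                      # subtract the mean over all clock cycles
```

The three nested loops make the time-multiplexing explicit: the same scalar trajectory s(t) is sliced into K rows of N_V virtual-node readouts.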
A linear combination of the reservoir responses is given by Sw, where w ∈ R^{N_V} is a vector of weights. Such a combination is trained by ridge regression, i.e., the least-squares approximation to a target vector ŷ,
w = argmin_w ( ‖Sw − ŷ‖₂² + λ‖w‖₂² ),
where ‖·‖₂ is the Euclidean norm and λ is a Tikhonov regularization parameter. The solution to this problem is
w = (SᵀS + λI)⁻¹ Sᵀ ŷ. (1)
In the case of an invertible SᵀS and λ = 0, the matrix (SᵀS)⁻¹Sᵀ is the Moore–Penrose pseudoinverse. We set λ = 10⁻⁶ · max(S), where max(S) is the largest state response in the state matrix S. To quantify the system's performance, we use the capacity (see [48,49])
C_ŷ = 1 − NRMSE² (2)
to approximate a specific task, where NRMSE is the normalized root-mean-square error between the approximation y = Sw and the target ŷ,
NRMSE² = ‖y − ŷ‖₂² / (K var(ŷ)), (3)
where var(ŷ) is the variance of the target values ŷ = (ŷ₁, ..., ŷ_K).
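The training step amounts to a few lines of linear algebra. The following is a hedged illustration of the ridge-regression formulas above; the random state matrix is a stand-in for an actual reservoir response:

```python
import numpy as np

def train_and_capacity(S, y_hat, lam):
    """Ridge weights w = (S^T S + lam*I)^(-1) S^T y_hat and capacity C = 1 - NRMSE^2."""
    w = np.linalg.solve(S.T @ S + lam * np.eye(S.shape[1]), S.T @ y_hat)
    nrmse2 = np.mean((S @ w - y_hat) ** 2) / np.var(y_hat)
    return 1.0 - nrmse2

rng = np.random.default_rng(0)
S = rng.normal(size=(1000, 5))                       # stand-in state matrix
# A target in the span of the reservoir responses is learned almost perfectly,
C_lin = train_and_capacity(S, S @ np.array([1.0, -0.5, 0.2, 0.0, 0.3]), 1e-6)
# while an uncorrelated random target yields a capacity near zero.
C_rand = train_and_capacity(S, rng.uniform(-1.0, 1.0, 1000), 1e-6)
```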

Fig. 3: State matrix S corresponding to the timeline shown in Fig. 2 with N_V = 5.

Reservoir Computation Quantification
Here we introduce the linear memory capacity as a quantitative measure for the memory kernel of a dynamical system.

Memory Capacity
The central task-independent quantification was introduced by Jaeger in [1] and refined by Dambre et al. in [48]; it states that the computational capability of a reservoir system can be quantified via an orthonormal set of basis functions on a sequence of inputs. Here we give a recap of the quantities introduced in [49,50] and focus on the linear memory capacity.
In particular, the capacity to fulfill a specific task is given by
C_ŷ = ŷᵀ S (SᵀS + λI)⁻¹ Sᵀ ŷ / ‖ŷ‖₂², (5)
which can be derived from Eq. (2) (see [48,49]). The capacity equals 1 if the reservoir computer computes the task perfectly, i.e., y = ŷ; it equals 0 if the prediction is not correlated with the target; and it takes values between 0 and 1 if the reservoir is partially capable of fulfilling the task. To quantify the system's capability of approximating linear recalls of inputs, an input sequence {u} = {..., u₋₃, u₋₂, u₋₁} is applied, where the u_k are uniformly distributed random numbers, independent and identically drawn from [−1, 1]. With the input sequence {u}, the reservoir response is collected in the state matrix S.
To describe a linear recall task of l steps into the past, the target vector ŷ_l is defined as
[ŷ_l]_k = u_{k−l}, (6)
which is the linear recall l steps into the past. Formally, one considers an infinitely long input sequence K → ∞; to approximate it numerically, we use K = 75000. The linear memory capacity MC is defined as the sum over the capacities of all possible linear recall tasks,
MC = Σ_{l=0}^{∞} C_l, (7)
where C_l = C_{ŷ_l} is the capacity of the l-th recall into the past. This quantification is task-independent, and thus direct implications for specific applications cannot be given: different tasks may need different specific capacities. The measure MC therefore only hints at well-performing reservoirs in the sense of using the full scope of the given reservoir, rather than giving a direct task-specific estimate. We have to point out that the linear-nonlinear trade-off is a well-known effect [48]; a system with high linear memory capacity can yield a low nonlinear transformation capability. Nevertheless, we believe that predicting a reservoir that performs well as a linear memory kernel is beneficial for a general reservoir computer setup, as higher nonlinear memory transformation can be utilized by adding additional reservoir systems with increased perturbations.
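The summation over recall tasks can be illustrated with a deliberately simple toy reservoir. The 5-tap delay line below, which stores the last five inputs exactly, is an assumption of this sketch and not one of the paper's systems; its linear memory capacity should come out close to 5:

```python
import numpy as np

rng = np.random.default_rng(2)
K, taps, lam = 20000, 5, 1e-9
u = rng.uniform(-1.0, 1.0, K + 50)                   # i.i.d. inputs in [-1, 1]
# toy "reservoir": node n holds the input delayed by n steps
S = np.column_stack([u[50 - n:50 - n + K] for n in range(taps)])
S = S - S.mean(axis=0)                               # centre the state matrix

MC = 0.0
for l in range(20):                                  # recall l steps into the past
    y = u[50 - l:50 - l + K]                         # target u_{k-l}
    w = np.linalg.solve(S.T @ S + lam * np.eye(taps), S.T @ y)
    C_l = 1.0 - np.mean((S @ w - y) ** 2) / np.var(y)
    MC += max(0.0, C_l)                              # sum of recall capacities
```

The capacities C_l are close to 1 for l = 0, …, 4 and near zero afterwards, so MC sums to roughly the number of stored inputs.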

Reservoir Systems
As one example system, we use a Stuart–Landau oscillator with delayed feedback,
ṙ(t) = [λ + η J(t) + γ r²(t)] r(t) + κ r(t − τ). (8)
Here, r(t) describes the real-valued amplitude of the system, κ is the feedback strength, τ the delay time, λ a parameter determining the dynamics, γ scales the nonlinearity, and η is the strength of the input J(t) fed into the system. For (λ + κ)/γ > 0 and η = 0, system (8) has only the trivial equilibrium r = 0; for (λ + κ)/γ < 0, additionally, the nontrivial equilibria (r*)² = −(λ + κ)/γ exist, which appear in a pitchfork bifurcation at λ + κ = 0. The linearization at the nontrivial equilibria (taking into account the input term) reads
δṙ(t) = a δr(t) + b δr(t − τ) + η̃ J(t), (9)
where a = −2λ − 3κ = λ + 3γ(r*)², b = κ, and η̃ = η r*. As the second example, we use the Mackey–Glass system
ẋ(t) = −γ_MG x(t) + β x(t − τ)/(1 + x(t − τ)^p) + η J(t) x(t), (10)
where x(t) is the dynamical variable, x(t − τ) is the delayed variable, and γ_MG, β, and p are control parameters. The reservoir input is fed into the system via the term η J(t) x(t). We set p = 1, for which the system possesses a stable nontrivial equilibrium x* = −(γ_MG − β)/γ_MG (for J = 0). The corresponding linearization at this equilibrium is of the form of Eq. (9) with a = −γ_MG, b = β/(1 + x*)², and η̃ = η x*.
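The linearization bookkeeping for the two examples can be collected in a short sketch. The parametrizations and sign conventions follow the reconstruction above and should be treated as assumptions; the parameter values are illustrative:

```python
import math

def stuart_landau_lin(lam, kappa, gamma, eta):
    """Linearization (a, b, eta_tilde) of Eq. (8) at a nontrivial equilibrium."""
    r2 = -(lam + kappa) / gamma            # (r*)^2, requires (lam+kappa)/gamma < 0
    a = lam + 3.0 * gamma * r2             # equals -2*lam - 3*kappa
    return a, kappa, eta * math.sqrt(r2)

def mackey_glass_lin(gamma, beta, eta):
    """Linearization (a, b, eta_tilde) of Eq. (10) for p = 1."""
    x_star = -(gamma - beta) / gamma       # nontrivial equilibrium
    b = beta / (1.0 + x_star) ** 2
    return -gamma, b, eta * x_star

a_sl, b_sl, _ = stuart_landau_lin(lam=0.05, kappa=0.1, gamma=-0.1, eta=1e-3)
a_mg, b_mg, _ = mackey_glass_lin(gamma=1.0, beta=1.8, eta=1e-3)
```

For both parameter sets the stability indicator a + b is negative, consistent with operating the reservoir at a stable equilibrium.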

Analytic description of memory capacity
From Eq. (5), we see that the capacity to approximate a specific target is determined by the inverse of the covariance matrix (SᵀS + λI)⁻¹, also called the (regularized) concentration matrix, and by the product Sᵀŷ of the state matrix and the target. Thus, it is necessary to derive the state matrix S from the responses to small perturbations of the system. This has already been done for 1-dimensional reservoirs with τ = T by an Euler step scheme [22], and for 1-dimensional reservoirs with τ ≠ T for specific differential equations [51]. We extend this knowledge by analyzing arbitrary single-variable systems with τ ≠ T. We assume the virtual node distance θ to be small and τ = n̄θ, with n̄ ∈ N⁺. We also assume the operating point of the reservoir to be a stable equilibrium. We exemplarily validate our analysis on the two 1-dimensional nonlinear reservoirs given by Eqs. (8) and (10).
Our main result is the modified state matrix S̃, which can be calculated solely from the linearized system and from which the MC can be determined. Its entries are
s̃_{l,n} = (η̃φ/√3) Σ_{(i,j) ∈ σ_{l,n}} binom(i+j, i) α^i β^j g_{(n−i−n̄j) mod N_V}, (11)
with α = e^{aθ}, φ = (e^{aθ} − 1)/a, β = bφ, n̄ = τ/θ, and the index set
σ_{l,n} = {(i, j) ∈ N₀² : ⌈(i + n̄j − n)/N_V⌉ = l}, n = 0, ..., N_V − 1, (12)
which collects all terms that reach back exactly l input intervals. Here a, b, and η̃ are given by the linearization (9), and g_n are the mask weights of the time-multiplexing. The index l corresponds to the l-th recall (shifted input interval), and n to the n-th virtual node. The rows of the modified state matrix contain entries in the statistical direction of the l-th shifted input. As we show in App. A, the covariance matrix of the modified state matrix approximates that of the original state matrix, SᵀS ≈ K S̃ᵀS̃.
Moreover, we also show that the full linear memory capacity can be calculated using solely the modified state matrix; the capacity of the l-th recall is given by
C_l ≈ S̃_l (S̃ᵀS̃ + λ̃I)⁻¹ S̃_lᵀ, (13)
where S̃_l is the l-th row of S̃ and λ̃ = λ/K is the correspondingly rescaled regularization parameter. The full linear memory capacity is the sum over all recalls,
MC ≈ tr( S̃ (S̃ᵀS̃ + λ̃I)⁻¹ S̃ᵀ ). (14)
Details of the derivations can be found in App. A. We call the memory capacity given by Eqs. (13) and (14) the master memory function (MMF). For given parameters a, b, and η̃ of the linearization, as well as the mask coefficients g_n, this function can be evaluated much more efficiently than the direct evaluation of the linear memory capacity via stepwise integration of the differential equation. A speed comparison is given in App. B. The new approach does not require simulating the reservoir, and it does not involve the input sequence u_k.
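Given a modified state matrix, the evaluation of the recall capacities and their sum amounts to a few linear-algebra lines. In the hedged sketch below the matrix entries are toy stand-ins, not values produced by the analytical expression:

```python
import numpy as np

def mmf_capacities(S_tilde, lam):
    """Recall capacities C_l = S_l (S^T S + lam*I)^(-1) S_l^T for each row l."""
    G = np.linalg.inv(S_tilde.T @ S_tilde + lam * np.eye(S_tilde.shape[1]))
    return np.array([row @ G @ row for row in S_tilde])

S_tilde = np.array([[0.80, 0.10],        # row l = 0 (most recent recall)
                    [0.10, 0.40],        # row l = 1
                    [0.01, 0.02]])       # row l = 2: almost forgotten
C = mmf_capacities(S_tilde, lam=1e-6)
MC = C.sum()                             # trace form of the total capacity
```

Each C_l lies between 0 and 1, and their sum is bounded by the number of virtual nodes (here 2), reflecting the usual memory-capacity bound.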

Efficient numerical evaluation of the memory capacity and the modified state matrix
The obtained approximations of the modified state matrix (11) and the memory capacity (13) allow for an efficient numerical evaluation. For this, we propose the following scheme, which we also show as pseudocode in Alg. 1. First, we iterate over all entries of the modified Pascal's triangle given in Fig. 6, which can be done by two nested loops over i and j. We do this until all entries in a row i + j = r, with r ∈ N, are below a given threshold ε (see Fig. 6). The threshold ε ensures that we cut unnecessary terms smaller than the regularization parameter. A third loop goes over all virtual nodes n, adding the term binom(i+j, i) α^i β^j, multiplied with the corresponding mask weight g_n, to the entry s̃_{⌊(n+i+n̄j)/N_V⌋, (n+i+n̄j) mod N_V}, which lies in the corresponding shifted input interval. See App. A for more information. In the algorithm to compute the modified state matrix S̃ given below, ⌊x⌋ is the floor function rounding x down to the greatest integer less than or equal to x, getBinomialTerm(i, j) returns binom(i+j, i) α^i β^j, and g is the mask weight vector of length N_V. The implemented C++ code can be found in the supplementary material.
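A hedged Python transcription of this scheme follows (the supplementary implementation is in C++; the function name, the row-wise stopping rule, and the omission of the common prefactor η̃φ/√3 are choices of this sketch):

```python
import math

def modified_state_matrix(a, b, theta, n_bar, g, L_max, eps=1e-12):
    """Rows l = 0..L_max-1 of the modified state matrix, up to a common prefactor.

    alpha = exp(a*theta) is the per-node factor, beta = b*phi the delayed
    coupling with phi = (exp(a*theta) - 1)/a, and n_bar = tau/theta is the
    delay measured in virtual nodes.
    """
    N_V = len(g)
    phi = (math.exp(a * theta) - 1.0) / a
    alpha, beta = math.exp(a * theta), b * phi
    S = [[0.0] * N_V for _ in range(L_max)]
    r = 0
    while True:
        row_max = 0.0
        for i in range(r + 1):                  # walk row r of Pascal's triangle
            j = r - i
            term = math.comb(i + j, i) * alpha**i * beta**j
            row_max = max(row_max, abs(term))
            for n0 in range(N_V):               # source virtual node of the input
                shift = n0 + i + j * n_bar
                l, col = divmod(shift, N_V)     # recall depth and target node
                if l < L_max:
                    S[l][col] += term * g[n0]
        if row_max < eps:                       # whole row below threshold: stop
            return S
        r += 1
```

For a stable linearization (a < 0, |b| small enough) the rows decay geometrically, so the loop terminates after a modest number of triangle rows.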

Direct simulation of the reservoir and memory capacity
Simulations have been performed in standard C++. For linear algebra calculations, the linear algebra library Armadillo [52] was used. To numerically integrate the delay-differential equations, a fourth-order Runge–Kutta method was applied with integration step Δt = 0.01 in dimensionless time units. First, the system is simulated without reservoir inputs, letting transients decay. After that, a buffer time of 10000 inputs was applied (this is excluded from the training process). In the training process, K = 75000 inputs were used to have sufficient statistics. Afterward, the memory capacities C_l of linear recalls were calculated with Eq. (5), whereby a testing phase is not necessary. The linear memory capacity MC was calculated by summing the obtained capacities C_l. For the piecewise-constant T-periodic mask function g(t), independent and identically distributed random numbers in [0, 1] were used.
For all simulations, the input strength η was fixed to 10⁻³. The small input strength guarantees linear answers of the reservoir and, hence, the validity of the approximation.
A program written in C++ to perform the semianalytic calculations is given in the supplementary material.

Comparison of MMF and direct numeric calculations of the memory capacity
In this section, we illustrate the effectiveness of the MMF. First, we show that the MMF provides a very good approximation of the MC for the reservoir given by Eq. (8). The approximation works well as long as θ is relatively small, which is fulfilled for typical reservoir computing setups, as one would otherwise lose computation speed. In the second part, we show how the MMF provides a universal, system-independent characteristic. For this, we compare the MMF with the memory capacities of different reservoirs.
Each particular reservoir realization is described by one parameter combination of the MMF. In the last part, we describe how the MMF can be computed for reservoirs with an unknown dynamical rule. For this, the parameters a and b of the linearization are measured from the system's response to a small periodic input. Figure 4 shows the memory recall capacity C_l obtained from direct simulations and compares it with the MMF for 4 different cases of the Stuart–Landau system, given by Eq. (8). The exact parameters are given in the caption of Fig. 4. The directly simulated results are shown by blue solid lines and blue markers, whereas green dashed lines and green markers show the MMF. For a small virtual node distance θ = 0.5 in Fig. 4(a,c), the MMF predicts the linear memory capacity very accurately. For a higher value of θ = 1.6 (Fig. 4(b,d)), the accuracy drops, though the results are still accurate enough for qualitative predictions and describe the general trend of the system's memory capacity.
The scans in Fig. 4(c) and 4(d) were done with a higher delay time τ = 3.06 T, which induces memory gaps [50]. Even though the memory capacity has a complex dependence on l at these parameter values, the prediction for the two different virtual node distances θ = 0.5 and θ = 1.6 is still accurate.
A 2-D parameter plane is scanned in App. E to show that the predictions of the MMF work for arbitrary parameter setups; the general predictability of the new scheme is thus very promising.
Comparing the computation speed of the classical numerical integration with the newly proposed scheme shows a gain of 2 to 3 orders of magnitude, depending on the operating point, the number of training steps K, and the value of the clock cycle T. A higher clock cycle T and more training steps K increase the simulation time of the direct numerical integration, whereas the new scheme is independent of both. If the operating point is close to a bifurcation, the convergence of the new scheme is slower, increasing the computation time needed. Still, even very close to the bifurcation line, the computation speed is significantly higher (by a factor of about 100), making the MMF a valuable tool. See App. B.

Universality
An exciting consequence of the MMF concept is the possibility to generalize to arbitrary time-delay-based reservoirs: every reservoir with a similar linearization should yield a similar linear memory capacity. To illustrate this, we compare the Stuart–Landau reservoir system given by Eq. (8) and the Mackey–Glass reservoir system given by Eq. (10).
The inset of Fig. 5 illustrates this fact. It shows the capacity C_l to recall the l-th step into the past as a function of l for the Stuart–Landau system (blue), the Mackey–Glass system (red), and the MMF given by Eq. (14) (green). Both systems are tuned such that their respective linearizations yield the same parameters a and b.
From this it follows that it is enough to compute the linearization parameters a and b to predict the MC of any delay-based reservoir computer. The color plot in Fig. 5 shows the MMF given by Eq. (13) for different parameter values a and b. A well-performing operating point appears to be the edge of instability, agreeing with the known rule of thumb from the literature. Any reservoir yielding the same linearization parameters a and b in (9) must possess the corresponding memory capacity as given by Fig. 5 for these parameter values, as soon as the input is sufficiently small.
It thus follows that analyzing the Jacobian (the linearization given by Eq. (9)) for fixed delay τ, virtual node distance θ, and number of virtual nodes N_V is sufficient to predict the linear memory capacity of any time-delay-based reservoir computer, and this memory capacity is given by the MMF via Eqs. (13) and (14).

Systems with unknown dynamics; small signal response approach
In this section, we show an experimentally accessible approach for measuring the parameters a and b for a delay system whose dynamical equations of motion are not known and which can be described by a single variable. The corresponding linearized dynamical system is given by
δṡ(t) = a δs(t) + b δs(t − τ) + I(t). (15)
The goal is to measure a and b. This can be achieved by perturbing the system with a harmonic signal I(t) = I₀ sin(ωt). When this signal is small, we can consider the perturbed linearized system
δṡ(t) = a δs(t) + b δs(t − τ) + I₀ e^{iωt}, (16)
where the complex form is chosen for simplicity; due to linearity, the real solution is obtained simply by taking the real part. We consider the case of real a and b, which always holds when the reservoir variable is real. Since the homogeneous solution decays to the stable equilibrium (we assume its exponential stability), the solution of Eq. (16) converges to the particular solution, given by
δs_p(t) = I₀ K(ω) e^{iωt} (17)
with K⁻¹(ω) = iω − a − b e^{−iωτ}. The ratio of the output to the input amplitude equals the modulus of the transfer function,
|K(ω)| = |iω − a − b e^{−iωτ}|⁻¹, (18)
which can be measured.
To determine the parameters a and b, it is sufficient to measure the transfer function at two frequencies, for example, at ω_r = 2π/τ and ω_a = π/τ. The first frequency is resonant with the delay, while the second is in 'anti-phase' with the delay τ. It holds that
|K(ω_r)|⁻² = ω_r² + (a + b)², |K(ω_a)|⁻² = ω_a² + (a − b)². (19)
From the above we can obtain the values of a and b via
a + b = −√(|K(ω_r)|⁻² − ω_r²), a − b = ±√(|K(ω_a)|⁻² − ω_a²), (20)
where the negative sign of a + b follows from the stability of the equilibrium, the remaining sign must be fixed from additional information (e.g., a further probe frequency), and the values of |K(ω_r)| and |K(ω_a)| can be obtained experimentally or numerically by perturbing and measuring the response of the reservoir. We remark that the choices of the resonant and anti-phase perturbation frequencies are convenient, but not unique; clearly, one can perturb at other frequencies to obtain a and b. Moreover, the above idea can be generalized to the case of complex-valued parameters a and b, whereby more frequencies must be probed.
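The two-frequency probe can be sketched as follows. This is a minimal illustration; the sign choices assume a + b < 0 (implied by stability) and a − b < 0, the latter being an assumption of this sketch that must be checked in practice:

```python
import cmath
import math

def transfer_mod(a, b, tau, omega):
    """|K(omega)| of the linearized delay system."""
    return abs(1.0 / (1j * omega - a - b * cmath.exp(-1j * omega * tau)))

def estimate_ab(K_res, K_anti, tau):
    """Invert the resonant and anti-phase measurements for (a, b)."""
    om_r, om_a = 2.0 * math.pi / tau, math.pi / tau
    apb = -math.sqrt(1.0 / K_res**2 - om_r**2)    # a + b  (stability: negative)
    amb = -math.sqrt(1.0 / K_anti**2 - om_a**2)   # a - b  (sign by assumption)
    return (apb + amb) / 2.0, (apb - amb) / 2.0

# Synthetic "measurement" of a system with a = -1.0, b = 0.3, tau = 2.0:
a0, b0, tau = -1.0, 0.3, 2.0
K_r = transfer_mod(a0, b0, tau, 2.0 * math.pi / tau)
K_a = transfer_mod(a0, b0, tau, math.pi / tau)
a_est, b_est = estimate_ab(K_r, K_a, tau)
```

Feeding the synthetic transfer-function moduli back through the inversion recovers the original parameters, which is the consistency check an experiment would rely on.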
The measured values of the parameters a and b for a reservoir with unknown dynamics can then be inserted directly into the MMF to estimate the linear memory capacities.

Conclusions and discussion
We have developed a simple and fast method for calculating the linear memory capacity for time-delay-based reservoirs with single-variable dynamics.The method allows the construction of a modified state matrix whose columns point in the direction of the linear recall steps.
Our results can be used to predict reservoir computing setups with high linear memory capacity. The nonlinear memory capacity, on the other hand, remains an open question. Here, combined setups could be used, where a delay-based reservoir computer includes multiple uncoupled subsystems. The decoupling ensures that no highly complex dynamical responses destroy the computational performance of the reservoir. One time-delay-based reservoir computing subsystem can be tuned to low perturbations at the edge of instability to act as a high linear memory kernel. Increasing the perturbation strength for the other subsystems will ultimately increase the nonlinear responses and thus the nonlinear memory capacity, so that the subsystems with high input strengths take on the role of highly nonlinear transformation kernels.
A teamwork setup is thus recommended, where one or a few subsystems perturbed by small inputs and operated close to instability act as a linear memory kernel, while other subsystems are perturbed more strongly and thus act as highly nonlinear transformation units. Such a setup should be capable of tackling a wide range of different tasks. It would be interesting to investigate this in future work.
One of the advantages of the delay-based reservoir, which allows the introduction of the MMF, is that it contains a small number of system parameters while the dynamics remains infinite-dimensional. In the case of a small input signal and single-variable dynamics, these are only the linearization parameters a and b. Thus, if the linear memory capacity is computed for all possible values of these two parameters, it covers the case of all possible reservoirs. Such a procedure could be difficult, if not impossible, for network-based reservoirs, where the system parameters may include, e.g., multiple coupling weights.
Denoting s_n(ξ) = s(t_{n−1} + ξ) to be the solution on the interval [t_{n−1}, t_{n−1} + θ], with ξ ∈ (0, θ], we rewrite equation (23) by the variation-of-constants formula as
s_n(ξ) = e^{aξ} s_n(0) + ∫₀^ξ e^{a(ξ−ζ)} [ b s_{n−n̄}(ζ) + J_n ] dζ, (24)
where we additionally used the relations s_n(0) = s_{n−1}(θ) and s(t − τ) = s_{n−n̄}(ξ), n̄ = τ/θ, and J_n denotes the masked input acting on the n-th virtual node. By evaluating Eq. (24) at ξ = θ, we obtain
s_n(θ) = e^{aθ} s_{n−1}(θ) + ∫₀^θ e^{a(θ−ζ)} [ b s_{n−n̄}(ζ) + J_n ] dζ. (25)
Denote x_n := s_n(θ) = s* + δs_n(θ), which is the approximation for the state of the reservoir (21) at the virtual nodes. Further, we approximate the integral from Eq. (25) by assuming s_{n−n̄}(ζ) ≈ s_{n−n̄}(θ) = x_{n−n̄}; the approximation holds, in particular, when θ is small. With φ = (e^{aθ} − 1)/a, the obtained expression represents a discrete map (coupled map lattice) for approximating the state matrix S:
x_n = α x_{n−1} + β x_{n−n̄} + φ J_n, (27)
with α = e^{aθ} and β = bφ; the equilibrium offset enters as an additive constant that later drops out when the mean is subtracted. If considering this map as a corresponding network with nodes x_n, see, e.g., [34,35,53], the node x_n is coupled with the two nodes x_{n−1} and x_{n−n̄} in a feed-forward manner with the coupling weights α and β, respectively. The schematic representation of such a coupling structure leads to the Pascal's triangle shown in Fig. 6. The first row of the Pascal's triangle from Fig. 6 shows the direct dependence on the input J_n, which is simply the multiplication by φ. In the second row, the contributions of x_{n−1} and x_{n−n̄} are shown. To obtain these dependencies explicitly, we insert x_{n−1} and x_{n−n̄} recursively into (27):
x_n = φ J_n + αφ J_{n−1} + βφ J_{n−n̄} + α² x_{n−2} + 2αβ x_{n−1−n̄} + β² x_{n−2n̄}, (28)
that is, we obtain the terms x_{n−2}, x_{n−1−n̄}, and x_{n−2n̄}. To build up further intuition about the dependence of the state matrix on the input, the third level follows by substituting recursively x_{n−2}, x_{n−1−n̄}, and x_{n−2n̄} into Eq. (28), producing the terms α³ x_{n−3}, 3α²β x_{n−2−n̄}, 3αβ² x_{n−1−2n̄}, and β³ x_{n−3n̄}. To obtain a general formula, we split the index of the appearing terms as n − i − n̄j, where j corresponds to the delayed ('right', β) and i to the 'left' (α) connections in the coupling network in Fig. 6:
x_n = c₁ + φ Σ_{i,j ≥ 0} binom(i+j, i) α^i β^j J_{n−i−n̄j}, (30)
where c₁ is a constant depending only on α, β, and the equilibrium s*. For an infinitely long input sequence, the sums in (30) run over all i, j from 0 to ∞; practically, the sum is considered for the available data. As a result, the reservoir states x_n are composed of a linear combination of the inputs with the coefficients given in Eq. (30).

The elements of the state matrix S used in the reservoir computing setup are
[S]_{k,n} = x_{kN_V+n} − ⟨x_{·N_V+n}⟩, (31)
where ⟨x_{·N_V+n}⟩ is the average over the input intervals,
⟨x_{·N_V+n}⟩ = (1/K) Σ_{k=1}^{K} x_{kN_V+n}. (32)
Here and later, the dot denotes the index over which the averaging is performed. The input J_m of the reservoir computer is given by the discrete input sequence u multiplied by the input weights, J_m = η̃ g_{m mod N_V} u_{⌈m/N_V⌉}. Therefore, we obtain ⟨x_{·N_V+n}⟩ ≈ c₁, since u has zero mean. Hence, taking into account Eq. (30), we have for the elements of the state matrix
[S]_{k,n} = η̃φ Σ_{i,j ≥ 0} binom(i+j, i) α^i β^j g_{(n−i−n̄j) mod N_V} u_{⌈(kN_V+n−i−n̄j)/N_V⌉}. (33)
Correspondingly, the elements of the covariance matrix SᵀS from (5) are
[SᵀS]_{n,n'} = Σ_k [S]_{k,n} [S]_{k,n'}, (34)
and they describe the covariance of the virtual node n with the virtual node n' over all clock cycles k. By substituting (33) into (34) and using that the inputs are i.i.d. with ⟨u_k u_{k'}⟩ = δ_{kk'}/3, we obtain
[SᵀS]_{n,n'} ≈ (K/3)(η̃φ)² Σ_l Σ_{(*)} binom(i+j, i) binom(i'+j', i') α^{i+i'} β^{j+j'} g_{(n−i−n̄j) mod N_V} g_{(n'−i'−n̄j') mod N_V}, (37)
where the second summation range (*) is taken over all values of i, j, i', j' such that the source indices n − i − n̄j and n' − i' − n̄j' fall into the same l-th shifted input interval, i.e., ⌈(i + n̄j − n)/N_V⌉ = ⌈(i' + n̄j' − n')/N_V⌉ = l.

The obtained expression (37) does not depend on the input sequence u_k and hence provides a significant simplification for calculating the covariance matrix. We may further notice that the same covariance (37) can be obtained by defining the modified state matrix S̃ = (s̃_{l,n}), where l is the index of the shifted input interval (the l-th recall) and n the n-th virtual node. The entry s̃_{l,n} is given by the sum over all combinations (i, j) that fall into the same shifted input interval l, i.e.,
s̃_{l,n} = (η̃φ/√3) Σ_{(i,j): ⌈(i+n̄j−n)/N_V⌉ = l} binom(i+j, i) α^i β^j g_{(n−i−n̄j) mod N_V}. (38)
This is our main result, because Eq. (38) defines the modified state matrix S̃ from which all capacities C_l are derivable. More specifically, we have shown
SᵀS ≈ K S̃ᵀS̃. (39)
Further, for the l-th recall, where the target is the shifted input sequence ŷ_l = {u_{k−l}}_{k=1}^{K}, only the terms of (33) that reach back exactly l intervals correlate with the target; therefore, it holds that
Sᵀŷ_l ≈ (K/√3) S̃_lᵀ, (41)
where S̃_l is the l-th row of S̃. Further, we notice that ‖ŷ_l‖₂² ≈ K var(u) = K/3. As a result, taking into account the definition of the capacity (5), we obtain the approximation for the capacity of the l-th recall C_l by
C_l ≈ S̃_l (S̃ᵀS̃ + λ̃I)⁻¹ S̃_lᵀ, λ̃ = λ/K. (43)
The results can be understood in such a way that we constructed the modified state matrix S̃ such that every row has entries in the statistical direction of the l-th shifted input recall. The full linear memory capacity is then given by the trace
MC = Σ_l C_l ≈ tr( S̃ (S̃ᵀS̃ + λ̃I)⁻¹ S̃ᵀ ). (44)

B Computation Time
To compare the computation speed of the fully numerically simulated differential equation and our new semianalytic approach, we simulated both. The full system was simulated with a time step Δt = 0.01, a buffer of 10000 clock cycles (simulated and discarded), and 50000 training samples to obtain high accuracy of the memory capacity. The semianalytic program was iterated until all values in a row of Pascal's triangle were below ε = 10⁻⁶ · max(S̃). We compared the simulation speeds of both approaches on a line scan of one linearization parameter, scanning from values close to the bifurcation at which the linearized system destabilizes up to values about 0.1 away from the bifurcation value. We show the computation time t_MMF of the semianalytic approach as a percentage of the computation time t_sim of the direct simulation, i.e., t_MMF/t_sim in %, in Fig. 7. We see that close to the bifurcation, the computation time of the semianalytic approach increases. This comes from the fact that the convergence of the Pascal's triangle sums is slower close to the bifurcation. Still, the computation time is at most 4% of that of the fully simulated system, showing at least a 25-fold increase in computation speed.

C Range of Approximation
We illustrate the range of validity of the new semianalytic approach by computing the memory capacity of the fully simulated system, MC_sim, and of the semianalytic approach, MC_MMF, and showing their ratio MC_MMF/MC_sim. The results are shown in Fig. 8, plotted over the input strength η for six orders of magnitude. A value close to 1 indicates good agreement between the simulation and the semianalytic approach. For high input strengths, starting at around η = 10⁻², the semianalytic approach overestimates the real memory capacity, because high values of η induce nonlinear answers of the system and thus increase the nonlinear transformations of the reservoir at the expense of linear memory. See [48-50] for more information on this effect.

D 𝜒 2 𝑘 Estimation
We give a short insight into the χ²_k estimation introduced in [48]. When calculating the capacities C_l, all values below a fixed threshold c* were excluded because of finite statistics. The threshold c* is chosen such that 1 − CDF(χ²_k(c*)) yields a probability p_χ² = 10^-6, where CDF(χ²_k(c*)) is the cumulative distribution function of the χ²_k distribution; i.e., p_χ² is the probability of a capacity exceeding c* even though, with infinite statistics (number of samples → ∞), it would lie below c*. χ²_k is the distribution of the sum of k squared independent, standard normal random variables. See [48] for more information.
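As an illustration, the cutoff c* can be computed numerically from the survival function of the χ²_k distribution. The sketch below (function names are ours) uses the closed form of 1 − CDF for an even number of degrees of freedom and bisects for the point where it equals p_χ² = 10^-6.

```python
import math

def chi2_sf_even(x, k):
    """Survival function 1 - CDF of the chi-squared distribution with an
    even number k of degrees of freedom (closed-form upper incomplete
    gamma series)."""
    assert k % 2 == 0 and k > 0
    term, total = 1.0, 1.0
    for i in range(1, k // 2):
        term *= (x / 2) / i
        total += term
    return math.exp(-x / 2) * total

def capacity_cutoff(k, p=1e-6, hi=1e4):
    """Find c* with P(chi2_k > c*) = p by bisection on the
    monotonically decreasing survival function."""
    lo = 0.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if chi2_sf_even(mid, k) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

For k = 2 the survival function reduces to exp(−c*/2), so c* = −2 ln(10^-6) ≈ 27.6; capacities below this threshold would be discarded as statistical noise.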

E Broader Parameter Range Check
A 2-parameter characterisation of the memory capacity of the Stuart–Landau system (8) is shown in Fig. 9. The parameter space is spanned by the pump p and the feedback rate κ.
Small relative differences of up to 0.08 are seen for the simulations presented here for θ = 1. One has to keep in mind that reservoir computing is usually done with much smaller θ.
The work [1] introduced a rough estimate of the optimal value of θ as θ ≈ 0.2 T_ans, where T_ans is the linear answer timescale of the system; for the Stuart–Landau system this is given by T_ans = −2p − 3κ. For the parameter space in Fig. 5, θ = 1 is larger than the value proposed in [1] for the optimal virtual node distance. In our approximation, we assume a constant state value on one θ-interval, thus θ = 1 is a very high value, which is one of the reasons for the deviations.
To underline that the MMF (13) gives a reasonable estimate of the memory capacity, we also calculated the 2-dimensional correlation coefficient RV(X, Y) between the directly simulated total linear memory capacity MC_direct and the linear memory capacity MC_MMF given by the MMF, in the 2-dimensional plane of the pump p and the feedback rate κ. RV(X, Y) is the generalization of the squared Pearson coefficient to two dimensions and is calculated via
RV(X, Y) = COVV(X, Y) / √(VAV(X) VAV(Y)) (45)
with
COVV(X, Y) = tr(Σ_X Σ_Y). (47)
Here, Σ_X denotes the centered covariance matrix of the matrix X, COVV(X, Y) denotes the trace of the matrix product Σ_X Σ_Y, and VAV(X) the trace of the matrix product Σ_X². Calculating RV over the parameter range shown in Fig. 9 yields RV(MC_direct, MC_MMF) ≈ 0.99925. This correlation is close to the maximum of 1, allowing us to make accurate predictions of high-performing reservoirs with the MMF.
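The RV coefficient defined above can be implemented in a few lines; the sketch below follows the trace formulas given in the text (variable names are ours).

```python
import numpy as np

def rv_coefficient(X, Y):
    """RV coefficient: a two-dimensional generalization of the squared
    Pearson correlation between two data matrices of equal shape."""
    Xc = X - X.mean(axis=0)      # column-centered data
    Yc = Y - Y.mean(axis=0)
    Sx = Xc.T @ Xc               # centered covariance (scatter) matrices
    Sy = Yc.T @ Yc
    covv = np.trace(Sx @ Sy)     # COVV(X, Y)
    vav_x = np.trace(Sx @ Sx)    # VAV(X)
    vav_y = np.trace(Sy @ Sy)    # VAV(Y)
    return covv / np.sqrt(vav_x * vav_y)
```

By construction RV(X, X) = 1, and values near 1, such as the 0.99925 reported above, indicate that the two capacity maps are nearly proportional to each other.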

Fig. 1 :
Fig. 1: Scheme of delay-based reservoir computing. Important timescales are marked in green: the input cycle T, the delay τ, and the virtual node separation time θ.

Fig. 2 :
Fig. 2: Exemplary timeline sketch for time-delay based reservoir computing. Three input intervals of length T are shown for the inputs u_{k−1}, u_k, u_{k+1} in red, blue and green, respectively. The delay time τ in multiples of the θ-interval is τ = 7θ. The number of virtual nodes in this example is N_V = 5, thus τ > T. Four system states are indicated in grey (s_{k−1,3}, s_{k−1,4}, s_{k,2} and s_{k+1,4}), where s_{k,2} influences s_{k+1,4} directly via the delay time τ. The pink line indicates an example trajectory, with black dots showing the measured system states, i.e., the virtual nodes.
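The delayed coupling described in the caption can be verified with simple index arithmetic. The helper below (our notation, matching the example's τ = 7θ and N_V = 5) maps a virtual node to the node it drives through the delay line.

```python
def delayed_target(k, n, delay_steps=7, n_virtual=5):
    """Map state s_{k,n} to the state it influences directly via the
    delay: shift by delay_steps node slots and carry the overflow into
    the following input intervals."""
    slot = n + delay_steps               # absolute node slot after the delay
    return k + slot // n_virtual, slot % n_virtual

# s_{k,2} -> s_{k+1,4}, as drawn in the figure
interval, node = delayed_target(0, 2)
```

With τ = 7θ and N_V = 5 (so T = 5θ and τ > T), node 2 of interval k indeed lands on node 4 of interval k + 1, matching the grey states in Fig. 2.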

Fig. 5 :
Fig. 5: MC_MMF computed by the MMF in the 2-dimensional parameter plane of the linearization parameters, with N_V = 100, τ = 72, θ = 0.5. At the edge of instability, the performance is highest. The inset shows the MC over the recall steps for the MMF, the Stuart–Landau and the Mackey–Glass system, at the parameter point indicated by the red cross.

Fig. 6 :
Fig. 6: Pascal's triangle showing the series contributions for the reservoir state, given by all participating equilibria. Blue boxes show the participating timesteps, green boxes show multiplications by time-propagation factors, and red boxes give the equilibrium contributions for the specific timesteps. Out of convenience, shorthand symbols are used for the time-propagation and equilibrium prefactors, as defined in the figure; the fixpoint factor contribution for a specific timestep is given by a binomial-coefficient-weighted product of these factors.

Figure 9(a) shows the linear memory capacity, while Fig. 9(b) shows the relative difference ΔMC between the MMF and the direct numerics.

Fig. 9:
Fig. 9: Two-parameter characterisation of the memory capacity of system (8) with respect to the pump p and the feedback rate κ. (a) Total linear memory capacity MC_direct of the direct simulation. (b) Relative difference ΔMC between the MMF result MC_MMF and the directly simulated value MC_direct. The black dashed line shows the threshold of stabilization of the non-trivial equilibrium. The RV coefficient is RV(MC_direct, MC_MMF) = 0.99925. The parameters are N_V = 100, T = 100 (corresponding to θ = 1), τ = 1.41 T, an input strength of 10^-3, and a nonlinearity of 0.1.

Algorithm 1 :
Calculate the modified state matrix S′ ∈ R^{N_V × K}; mixed terms of the form s_i s_j, i ≠ j, can be approximated by zero, since the input random variable u_k is independently distributed with zero mean.
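The zero-mean argument behind this approximation can be checked numerically. The sketch below (the uniform input distribution is our illustrative choice) shows that products of two independent zero-mean input sequences average out, while squared terms retain the variance.

```python
import numpy as np

rng = np.random.default_rng(1)
K = 200_000
# two independent, uniformly distributed, zero-mean input sequences
u = rng.uniform(-0.5, 0.5, size=(2, K))

mixed = np.mean(u[0] * u[1])   # mixed term u_i u_j, i != j: averages to ~0
diag = np.mean(u[0] ** 2)      # squared term: stays at the variance 1/12
```

As the sample size K grows, the mixed-term average shrinks like 1/√K, which justifies dropping these cross terms in the modified state matrix.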