Hamiltonian and Q-Inspired Neural Network-Based Machine Learning

The goal of this study is to present a universal large-scale machine learning model based on spectral processing. By machine learning, we mean input-output mapping approximation generated by training sets. We treat tasks such as pattern recognition and classification as special problems in mapping approximation. The structures of the approximators are implemented using Hamiltonian neural network-based biorthogonal and orthogonal transformations. From a mathematical point of view, these structures can be seen as an implementation of non-expansive mappings. An interesting property of the approximators is the reconstruction and recognition of incomplete or distorted patterns. The reconstruction property gives rise to a proposition of a superposition processor and reversible computations. Finally, the models of machine learning described here can process both real- and complex-valued data through the definition of Q-inspired neural networks.


I. INTRODUCTION
The problem of learning represents a key to understanding intelligence in both brains and machines [1], [2]. By machine learning, we mean here input-output mapping approximation, where the nodes of approximation are given by the set of training pairs {(x_i, y_i)}_{i=1}^{N}, x_i ∈ X ⊂ R^n, y_i ∈ Y ⊂ R^m. Hence, one aims to realize a mapping F : X → Y (or a multivariable function f(·) for y_i ∈ Y ⊂ R) whose value is known at the training points, i.e.,

F(x_i) = y_i, i = 1, ..., N. (1)

Classification and pattern recognition can be seen as important special cases of mapping approximation. It should be noted, however, that deep learning is currently driving a renaissance of interest in neural network research and applications (e.g., AI, big data, deep convolutional neural networks) [3], [4]. Such neural networks, used for the realization of F(x) in (1), take the form of multilayer nonlinear kernel machines; from a mathematical point of view, they set up a structure of algebraic mappings. Currently, most learning algorithms are based on optimization procedures (often, however, without any constraints), and different forms of stochastic gradient descent (SGD) dominate the optimization algorithms used today [5]-[7]. Thus, deep learning, a technology widely used in commercial applications, can be seen as a special topic in optimization theory. Moreover, due to the massive amount of training data, these optimization methods are adapting to the evolving features of the processed information [8]. It is worth noting that geometric deep learning methods have been proclaimed as a potential direction for future deep learning research [9]. Nevertheless, we claim that artificial neural networks (ANNs) should constitute both universal algorithmic and physical models used in computational intelligence. However, an optimal architecture and implementation technology have not yet been developed. The main directions of research seem to be focused on three subjects:
1) Research on classical (non-quantum) computational models with real-valued parameters (RVNNs);
2) Research on classical computational models with complex-valued parameters (CVNNs);
3) Research on quantum neural networks (QNNs), which are an alternative to quantum computers (QCs).

(The associate editor coordinating the review of this manuscript and approving it for publication was Ikramullah Lali.)
Note that so-called quantum-inspired (Q-inspired) neural networks are a non-quantum version of CVNNs [10]-[12]. Among known real-valued neural networks, Hopfield-type neural networks fulfill an essential role [13], [14]; they are both physical and algorithmic models of neural computations. In this study, we consider an extended model of Hopfield-type neural networks defined as follows:

ẋ = (ηW + εW_s − w_0·1)θ(x) + I_d (2)

where:
W − skew-symmetric, orthogonal matrix
W_s − real symmetric matrix
1 − identity matrix
θ(x) − vector of activation functions
I_d − input vector
ε, w_0, η − parameters

The model defined by (2) gives rise to the following types of neural networks:
a) Hamiltonian neural networks (HNNs) for ε = 0, w_0 = 0, η = 1;
b) Classical Hopfield neural networks for w_0 = 0, η = 0;
c) Q-inspired Hopfield-type neural networks, given by the equilibrium equation of (2) with complex-valued variables and parameters:

(ηW + εW_s − w_0·1)θ(x) + I_d = 0. (3)

The main purpose of this study is to illustrate that the mapping (1) at the points x_i can be implemented as a composition of extended Hopfield-type neural network-based biorthogonal and orthogonal spectral transformations. An important feature of this model is its universality, enabling the realization of the basic functions of large-scale learning systems, such as pattern association, pattern recognition, classification, and inverse modeling. The pattern recognition feature is illustrated in this study by an example that reconstructs a distorted signal (e.g., using images). Moreover, Q-inspired neural networks, i.e., complex-valued neural networks as defined by (3), increase the computational efficiency of machine learning models. Thus, the input-output mapping approximations (1) can be extended to complex vector spaces: x_i ∈ C^n, y_i ∈ C^m. Recently formulated models, defined as quantum machine learning (QML), are quantum algorithms [15]; their execution requires universal QCs that are not yet available. To our knowledge, Q-inspired neural networks are currently not available as physical objects, either.
Hence, they should be seen as algorithmic solutions. To summarize, we propose in this paper a machine learning model that makes use of biorthogonal and orthogonal transformations based on spectral processing, as an alternative to deep learning based on optimization procedures.

II. HNN-BASED ORTHOGONAL TRANSFORMATIONS
A general description of a Hamiltonian system is given by the following state-space equation [16], [17]:

ẋ = v(x) = J·∇H(x) (4)

where:
x − state vector, x ∈ R^{2n}
v(x) − nonlinear vector field
J − skew-symmetric, orthogonal matrix, e.g., a Poisson matrix
∇H(x) − gradient of energy

The Hamiltonian function H(x) = E_k + E_p is the total energy (i.e., the sum of kinetic energy, E_k, and potential energy, E_p) absorbed into the system. Because Hamiltonian systems are lossless (dissipationless), their trajectories in the state space can be quite complex and oscillatory for t ∈ (−∞, ∞). Equation (4) gives rise to the model of HNNs [18], [19] as follows:

ẋ = W·∇H(x) + d = Wθ(x) + d (5)

where:
W − skew-symmetric, orthogonal weight matrix (W² = −1), dim W = 2n
∇H(x) = θ(x) − vector of activation functions (output y = θ(x))
d − input vector

and the Hamiltonian function:

H(x) = Σ_i ∫_0^{x_i} θ_i(ζ) dζ (6)

One assumes here that the activation functions are passive, i.e.,

ζ·θ_i(ζ) ≥ 0. (7)

One can easily see that the HNN comprises compatible connections of n elementary building elements: lossless neurons. The state-space description of a lossless, autonomous neuron is the 2-dimensional case of (5) with d = 0, i.e., ẋ_1 = wθ_2(x_2), ẋ_2 = −wθ_1(x_1). The HNN described by (5) cannot be realized exactly as a macroscopic-scale physical, lossless object. Nevertheless, by introducing negative-feedback loops, equation (5) can be reformulated as follows:

ẋ = (W − w_0·1)θ(x) + d (8)

where:
w_0 > 0
1 − identity matrix

Due to the assumed negative-feedback loops in (8), the neural networks considered here are not oscillatory. The stable (i.e., |x| < ∞) equilibrium point of network (8) sets up an orthogonal transformation:

y = (1/(1 + w_0²))(w_0·1 + W)d (9)

where W² = −1 and the output vector y is a Haar spectrum of d.

Note 1: A Haar spectrum is the result of a Haar transformation, whose transformation matrix, with entries in {−1, 0, 1}, is orthogonal but not skew-symmetric. On the other hand, the main challenge in HNN-based orthogonal transformations is to create weight matrices W that are both skew-symmetric and orthogonal. The most adequate mathematical framework for this task seems to be the algebraic theory of Hurwitz-Radon matrices [20].
Hence, we show how Hurwitz-Radon matrices can be used in the construction of orthogonal transformations (filters) by defining the matrices W as superpositions of Hurwitz-Radon matrices. Moreover, only for the matrix W_8 does one have available eight free design parameters, w_0, w_1, ..., w_7, to synthesize any eight-dimensional orthogonal filter and to solve the inverse (design) problem below. Thus, an eight-dimensional orthogonal transformation, referred to as an octonionic module, can be synthesized by the formula:

y = (1/a)(w_0·1 + W_8)d (10)

where a = Σ_{i=0}^{7} w_i² is a scaling parameter and the weight matrix W_8 of the octonionic module is the skew-symmetric superposition of Hurwitz-Radon matrices with the parameters w_1, ..., w_7 (11). The inverse problem of finding w_0, ..., w_7 for a given input-output pair (d, y) leads to the matrix built from the components of y:

[  y_1   y_2   y_3   y_4   y_5   y_6   y_7   y_8 ]
[ −y_2   y_1  −y_4   y_3  −y_6   y_5   y_8  −y_7 ]
[ −y_3   y_4   y_1  −y_2  −y_7  −y_8   y_5   y_6 ]
[ −y_4  −y_3   y_2   y_1  −y_8   y_7  −y_6   y_5 ]
[ −y_5   y_6   y_7   y_8   y_1  −y_2  −y_3  −y_4 ]
[ −y_6  −y_5   y_8  −y_7   y_2   y_1   y_4  −y_3 ]
[ −y_7  −y_8  −y_5   y_6   y_3  −y_4   y_1   y_2 ]
[ −y_8   y_7  −y_6  −y_5   y_4   y_3  −y_2   y_1 ]  (12)

It can be seen that the transformation (10), determined by (11) and (12), can be seen as a best-adapted orthogonal basis. The output y in (10) is a Haar spectrum of the input vector d. It is worth noting that an octonionic module sets up an elementary memory module as well. Designing, for example, an orthogonal filter using (11) and (12) that performs the transformation

m_i → y_[1], i = 1, ..., N (13)

where y_[1] = [1, ..., 1]^T — i.e., synthesizing by (10) a flat Haar spectrum for the given input vectors m_i — yields an implementation of a linear perceptron, as shown in Fig. 1.
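The orthogonality of the octonionic module can be checked numerically. The following sketch (our own illustration, not the paper's code) encodes the sign/index pattern of the matrix (12), builds M(w) = w_0·1 + W_8 from eight parameters, and verifies that M·M^T = a·1 with a = Σ w_i², so that T_8 = M/√a is orthogonal:

```python
# Sketch (not from the paper's code): the 8x8 sign/index pattern of the
# matrix in (12), applied to a parameter vector w = (w0, ..., w7).  The
# resulting M(w) = w0*1 + W8 satisfies M * M^T = a * 1 with
# a = sum(wi^2), so T8 = M / sqrt(a) is orthogonal.

# Pattern: entry (r, c) uses component PAT[r][c]; its sign is the sign of
# the entry, and abs(value) - 1 is the index into w.
PAT = [
    [ 1,  2,  3,  4,  5,  6,  7,  8],
    [-2,  1, -4,  3, -6,  5,  8, -7],
    [-3,  4,  1, -2, -7, -8,  5,  6],
    [-4, -3,  2,  1, -8,  7, -6,  5],
    [-5,  6,  7,  8,  1, -2, -3, -4],
    [-6, -5,  8, -7,  2,  1,  4, -3],
    [-7, -8, -5,  6,  3, -4,  1,  2],
    [-8,  7, -6, -5,  4,  3, -2,  1],
]

def octonionic_matrix(w):
    """Build M(w) = w0*1 + W8 from eight parameters w[0..7]."""
    assert len(w) == 8
    return [[(1 if p > 0 else -1) * w[abs(p) - 1] for p in row] for row in PAT]

def matmul_T(a):
    """Return a * a^T for a square matrix given as a list of rows."""
    n = len(a)
    return [[sum(a[i][k] * a[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]

w = [0.5, -1.0, 2.0, 0.25, 1.5, -0.75, 3.0, 1.0]   # arbitrary parameters
a = sum(wi * wi for wi in w)          # scaling parameter in (10)
M = octonionic_matrix(w)
G = matmul_T(M)                        # should equal a * identity
```

Any nonzero parameter vector w yields an orthogonal transformation (up to the scale √a), which is what makes the eight free design parameters available for filter synthesis.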
To summarize the considerations above, one can state that the octonionic module is a universal building block for realizing very-large-scale orthogonal filters and, in particular, memory blocks. Multidimensional, octonionic module-based orthogonal filters can be realized by using the family of Hurwitz-Radon matrices. Thus, a 16-dimensional orthogonal filter can be determined, for example, by the following matrix:

W_16 = [  W_8      w_8·1 ]
       [ −w_8·1   −W_8   ]  (15)

where w_8 ∈ R and W_8 is the weight matrix of an octonionic module. Similarly, for the dimensions q = 2^k, k = 5, 6, 7, ..., all Hurwitz-Radon matrices can be found as:

W_{2^k} = [  W_{2^{k−1}}     w_K·1 ]
          [ −w_K·1    −W_{2^{k−1}} ]  (16)

where w_K ∈ R and 1 is the identity matrix. To conclude, one can formulate the following statements:
• A q-dimensional HNN or a q-dimensional orthogonal basis can be created by a compatible connection of octonionic modules.
• The basic function of orthogonal filters is the Haar spectrum analysis of the input data d. In particular, an orthogonal filter performs the function of memory, as given by (13). The matrix W_{2^k} can be designed as a best-adapted basis by using (11) and (12) (i.e., d — data, y — demanded spectrum).
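The doubling rule behind (15) and (16) can also be checked numerically. The sketch below (our reconstruction of the construction, with toy parameter values) starts from the 2-dimensional skew-symmetric seed and doubles twice; at every step the result stays skew-symmetric with W·W^T proportional to the identity:

```python
# Sketch of the Hurwitz-Radon doubling as we read (15)-(16): given a
# skew-symmetric W_q with W_q * W_q^T = c * 1, the block matrix
# [[W_q, w*1], [-w*1, -W_q]] is skew-symmetric with
# W * W^T = (c + w^2) * 1, so the family extends to dimensions 2^k.

def double(Wq, w):
    """One doubling step of the Hurwitz-Radon family."""
    q = len(Wq)
    top = [Wq[i][:] + [w if j == i else 0.0 for j in range(q)]
           for i in range(q)]
    bot = [[-w if j == i else 0.0 for j in range(q)]
           + [-Wq[i][j] for j in range(q)] for i in range(q)]
    return top + bot

def gram(W):
    """Return W * W^T."""
    n = len(W)
    return [[sum(W[i][k] * W[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Start from the 2-dimensional skew-symmetric seed and double twice.
W2 = [[0.0, 1.0], [-1.0, 0.0]]
W4 = double(W2, 2.0)    # expected scale: 1 + 4 = 5
W8 = double(W4, 3.0)    # expected scale: 5 + 9 = 14
```

The same step applied repeatedly produces the matrices W_{2^k} of (16) for any k, each with one additional free parameter.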

III. HOPFIELD NEURAL NETWORK-BASED BIORTHOGONAL TRANSFORMATION
It is well known that in data mining, signal processing, and machine learning, two transforms known as principal component analysis (PCA) [21]-[23] and independent component analysis (ICA) [24] are commonly used. PCA can be categorized as an orthogonal transform, and ICA as a biorthogonal transform. Both transformations can be used as lossless or lossy techniques; for example, lossless PCA and ICA are widely used in blind signal separation (BSS). The Hopfield neural network described by (2) can perform a biorthogonal transformation and can be used for the implementation of a mapping given by training points. Thus, such a biorthogonal transformation can be formulated by a differential equation as follows:

ẋ = (W_{2^k} + εW_s − w_0·1)θ(x) + d (17)

where W_s is a symmetric matrix. The equilibrium points of the network (17) set up a Hopfield neural network-based biorthogonal filter. A special feature of such a filter is a vector field consisting of antisymmetric (W_{2^k}) and symmetric (εW_s − w_0·1) components. This form of vector field can be referred to as a biological-like mechanism consisting of recombination (antisymmetric) and selection (symmetric) components. The neural network (17) can be seen as one possible extension of Hopfield-type neural networks. Moreover, by using such extended structures, some optimization problems (e.g., the traveling salesman problem, TSP) can be solved more effectively [25].
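The settling of such a filter at its equilibrium can be illustrated with a minimal numerical sketch (our own toy 2-dimensional matrices, linear activations θ(x) = x): forward-Euler integration of (17) converges to the equilibrium point obtained by solving the corresponding linear system.

```python
# Minimal sketch (linear activations, toy 2-D matrices of our own
# choosing): integrate x' = (W + eps*Ws - w0*1) x + d by forward Euler
# and check that the trajectory settles at the equilibrium x*, i.e. the
# solution of (w0*1 - W - eps*Ws) x* = d.

W  = [[0.0, 1.0], [-1.0, 0.0]]    # skew-symmetric (recombination) part
Ws = [[1.0, 0.0], [0.0, -1.0]]    # symmetric (selection) part
eps, w0 = 0.5, 2.0
d = [1.0, -2.0]

# system matrix A = W + eps*Ws - w0*1 (stable for w0 large enough)
A = [[W[i][j] + eps * Ws[i][j] - (w0 if i == j else 0.0)
      for j in range(2)] for i in range(2)]

x = [0.0, 0.0]
dt = 0.01
for _ in range(5000):             # forward Euler integration
    x = [x[i] + dt * (sum(A[i][j] * x[j] for j in range(2)) + d[i])
         for i in range(2)]

# equilibrium: solve A x* = -d with the explicit 2x2 inverse
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
x_star = [(-A[1][1] * d[0] + A[0][1] * d[1]) / det,
          ( A[1][0] * d[0] - A[0][0] * d[1]) / det]
```

The negative-feedback term −w_0·1 is what makes the trajectory non-oscillatory and the equilibrium stable here.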

IV. BIORTHOGONAL TRANSFORMATION-BASED APPROXIMATION
As mentioned above, the equilibrium points of a biorthogonal filter can set up a nonlinear mapping d = F(x), as follows. Given a set of training points {(x_i, d_i)}_{i=1}^{N}, one concatenates the input vectors x_i ∈ R^n and the output vectors d_i ∈ R^m in the form:

u_i = [x_i; d_i] (18)

where dim u_i = n + m, n + m = 2^k, k = 3, 4, .... Using the orthogonal transformation (9), one obtains a Haar spectrum m_i of u_i, i = 1, ..., N, as:

m_i = (1/(1 + w_0²))(w_0·1 + W_{2^k})u_i (20)

where W_{2^k} is the Hurwitz-Radon matrix (16). The stable equilibria of the biorthogonal filter (17) constitute a transformation T_s(·) (21), where u is the input vector. For w_0 = 2, ε = 1, one obtains the explicit form (22) of T_s(·).

It should be noted that the transformation (22) can be seen as Tikhonov regularization [1]. It is clear that the transformation T_s(·) projects the training points u_i onto m_i, as given by (21). Hence, one obtains an inverse transformation T_s^{-1}(·). The transformations T_s(·) and T_s^{-1}(·), arranged as a realization of a mapping F(x), have the block structure shown in Fig. 2.
The block structure with ''distributed memory,'' presented in Fig. 2a, can be reconfigured to the form with ''lumped memory,'' as shown in Fig. 2b.
Note 2: It is worth noting that, according to the structure in Fig. 2, such an approximator performs the function of spectrum estimation of m_i. Hence, due to the feedback-loop action, a recurrence u_{k+1} = φ(u_k) is implemented at the output of this approximator. It is easy to note that the structure from Fig. 2 thus implements an input-output mapping φ(·). The vectors u_i, i = 1, ..., N, are invariant points of φ(·), and the vectors d_i are asymptotic centers of the attractors, i = 1, ..., N. Moreover, the mapping φ(·) is given by a matrix transformation (26), and its Lipschitz constant k fulfills:

k ≤ 1. (27)

Hence, φ(·) is a non-expansive mapping; note the block T_c in Fig. 2 implementing this mapping. The recurrence is convergent under the linear independence of the patterns, provided that the number of patterns N fulfills:

N ≤ (n + m)/2 (28)

where n + m = dim u_i (18).
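The two key properties — training patterns as invariant points and non-expansiveness — can be illustrated with a toy stand-in for φ(·) (our own simplification, not the paper's T_c): an orthogonal projector onto the span of the stored patterns.

```python
# Illustration (our toy stand-in for the mapping phi): an orthogonal
# projector P onto span{u_1, u_2} keeps the training patterns fixed
# (phi(u_i) = u_i) and is non-expansive: |phi(a) - phi(b)| <= |a - b|.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def gram_schmidt(vectors):
    """Orthonormalize a list of linearly independent vectors."""
    basis = []
    for v in vectors:
        w = v[:]
        for q in basis:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = dot(w, w) ** 0.5
        basis.append([wi / norm for wi in w])
    return basis

def project(basis, v):
    """Orthogonal projection of v onto span(basis)."""
    out = [0.0] * len(v)
    for q in basis:
        c = dot(v, q)
        out = [oi + c * qi for oi, qi in zip(out, q)]
    return out

u1 = [1.0, 0.0, 2.0, -1.0]        # training patterns (invariant points)
u2 = [0.0, 1.0, -1.0, 3.0]
Q = gram_schmidt([u1, u2])

phi = lambda v: project(Q, v)
```

A projector has eigenvalues 0 and 1 only, which is the simplest way to satisfy the Lipschitz bound (27).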

V. FEEDFORWARD MODEL OF APPROXIMATOR
One of the general frameworks unifying the different learning algorithms has been formulated by considering a functional of the form [1], [2]:

H[f] = Σ_{i=1}^{N} V(y_i, f(x_i)) + λ‖f‖² (29)

The approximated function f(·) corresponds to the minimum of the functional H for a chosen loss function V. Thus, (29) represents the classical optimization problem solved in Tikhonov's regularization theory [1]. Based on the general framework (29), another model of an approximator has been published [18]. Indeed, given a training set {(x_i, y_i)}_{i=1}^{N}, the approximation can be implemented as:

f(x) = Σ_{i=1}^{N} c_i K_i(x_i, x) (30)

where the kernels K_i(x_i, x) are defined by the function:

K_i(x_i, x) = Θ(⟨x_i, W_n x⟩) (31)

and:
x_i ∈ R^n − the i-th training vector
W_n − skew-symmetric matrix
Θ(·) − an odd function (e.g., sigmoidal)
⟨x_i, W_n x⟩ − a scalar product

Thus, the (N × N) Gram matrix

G = [Θ(⟨x_i, W_n x_k⟩)] (33)

is skew-symmetric, and the key design equation (34) determines the coefficients c_i from the training outputs using the regularized activation θ_{R_i} = θ(·) + R_i·δ(·), where δ(·) is the Kronecker function.
A block structure of a feedforward approximator is shown in Fig. 3.
It is worth noting that due to the skew symmetry of the Gram matrix (33), the regularization parameter R_i can be used as a smoothness regularizer. Moreover, for W_n = W_{2^k} (16), the kernel K_i(x_i, x) in (31) consists of the scalar product of the training vectors with their Haar spectra.
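The skew symmetry of the Gram matrix follows directly from the skew symmetry of W_n and the oddness of Θ(·). A short numerical sketch (tanh as the odd function and a toy skew-symmetric W of our own choosing) confirms it:

```python
# Sketch of the kernel (31) with tanh as the odd function and a toy
# skew-symmetric W of our own choosing: the Gram matrix
# G[i][k] = tanh(<x_i, W x_k>) is skew-symmetric with a zero diagonal,
# as stated for (33).
import math

W = [[0.0,  1.0, -2.0,  0.5],
     [-1.0, 0.0,  1.5, -1.0],
     [2.0, -1.5,  0.0,  2.5],
     [-0.5, 1.0, -2.5,  0.0]]      # skew-symmetric: W^T = -W

def kernel(xi, xk):
    """K(x_i, x_k) = tanh(<x_i, W x_k>)."""
    Wxk = [sum(W[r][c] * xk[c] for c in range(4)) for r in range(4)]
    return math.tanh(sum(a * b for a, b in zip(xi, Wxk)))

xs = [[1.0, 0.0, 2.0, 1.0],
      [0.0, 1.0, 1.0, -1.0],
      [2.0, -1.0, 0.0, 0.5]]       # three toy training vectors

G = [[kernel(xi, xk) for xk in xs] for xi in xs]
```

Since ⟨x_k, W x_i⟩ = −⟨x_i, W x_k⟩ and tanh is odd, G_{ki} = −G_{ik} for any choice of training vectors.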

VI. ON FEATURES OF APPROXIMATORS
The essential function of the approximators described in this study is the implementation of a mapping defined by a training set. However, some properties of the approximator shown in Fig. 2 are worth noting, as they can be used to solve tasks typical of big-data processing and machine learning.

A. RECONSTRUCTION OF INCOMPLETE OR DISTORTED PATTERNS
Let us assume that the input patterns x_p are a distorted or compressed form of x_i. Then, changing the input vectors and the structure of the feedback loop accordingly (35), where the preserved fraction p > 0.1 (according to numerical tests) and x_p is the preserved part of x_i, one achieves full pattern reconstruction, i.e., d_i = F(x_p). This property is illustrated below by the reconstruction of Lena's photo.
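The mechanism behind the reconstruction can be sketched in a simplified form (our own toy version, not the paper's filter): the feedback loop alternately projects the state onto the span of the stored patterns and re-clamps the preserved components, which converges to the full stored pattern when the preserved parts of the patterns are linearly independent.

```python
# Sketch of the reconstruction property (our toy version): a distorted
# input keeps only some components of a stored pattern; iterating
# "project onto span of stored patterns, then re-clamp the preserved
# components" recovers the full pattern, provided the preserved parts
# of the patterns are linearly independent.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def gram_schmidt(vectors):
    basis = []
    for v in vectors:
        w = v[:]
        for q in basis:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        n = dot(w, w) ** 0.5
        basis.append([wi / n for wi in w])
    return basis

def project(basis, v):
    out = [0.0] * len(v)
    for q in basis:
        c = dot(v, q)
        out = [oi + c * qi for oi, qi in zip(out, q)]
    return out

u1 = [1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0]   # stored pattern 1
u2 = [0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0]   # stored pattern 2
Q = gram_schmidt([u1, u2])

known = [0, 1, 2, 3]                 # indices of preserved components of u1
v = [0.0] * 8
for i in known:
    v[i] = u1[i]                     # distorted input: the rest is zeroed

for _ in range(200):                 # feedback-loop recurrence
    v = project(Q, v)
    for i in known:
        v[i] = u1[i]                 # re-clamp the preserved components
```

Here half of the components are missing, yet the recurrence restores u1 exactly; with more stored patterns, a correspondingly larger preserved fraction is needed.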

B. IMPLEMENTATION OF ASSOCIATIVE MEMORY
Implementation of associative memory by the approximator, where x_i are key vectors and z_i are memory vectors, is realizable by defining a training set that pairs the key vectors with the memory vectors. It is easy to see that due to the reconstruction property mentioned above, the memorized vectors z_i can be retrieved by distorted or incomplete key vectors x_p (36). However, note that the stored vectors u_i are not attractors. Hence, an input vector v_in ≠ x_i is not associated with any z_i, i = 1, ..., N.
This means that such an input vector is a key vector only if v_in = v_out, in which case the vector m_out belongs to the memory. Moreover, any superposition x_s of the key vectors retrieves the associated superposition of the memorized vectors z_i. This property allows one to use the approximator as a processor, as pointed out below.

C. MODEL OF ANALOG PROCESSOR
The model from Fig. 2 can be referred to as a superposition processor, performing the addition and multiplication of real numbers. Indeed, for simplicity of presentation, let us consider the component-wise addition of two vectors d_1 and d_2. Defining two system vectors x_1 and x_2 (x_1 ≠ x_2) with an a priori known sum x_s = x_1 + x_2, one implements (d_1 + d_2), as presented in Fig. 4.
It is clear that the functioning of the processor is based on the data-reconstruction property, and this schema can be extended to the addition of N vectors. Multiplication by such a superposition processor can be realized as the inverse operation F^{-1}(·), using the structure from Fig. 2 with a modified feedback loop, as presented in Fig. 5. Thus, the component-wise multiplication of a given vector x by a number A ∈ R, i.e., A·x, can be realized by the same block T_c, changing only the feedback loop.

D. PATTERN RECOGNITION
Pattern recognition can be implemented by the approximator by defining a training set S = {x_i, d_i} as a set of patterns/vectors assigned to one of a prescribed number of classes. For simplicity of presentation, let us consider recognition with two classes, i.e.,
with training pairs (x_i, d_i), d_i ∈ {c_1, c_2}, where c_1, c_2 are the signatures of the prescribed classes (c_1 ≠ c_2) and x_i are the input patterns.
The structure of such a pattern recognizer is shown in Fig. 6.
Note that the structure of the approximator can be augmented, as in Fig. 6, by the above-described feedforward model [18]. Thus, the vectors c can be interpreted as features of the input patterns. Moreover, this pattern recognizer can classify incomplete vectors by deploying the reconstruction property. It is worth noting that the dimension of the Gram matrix (33) is defined by the number of prescribed classes.

E. SUPERPOSITION-BASED PARALLELISM
It is well known that quantum computations, and hence QNNs, are based on quantum parallelism, which is the result of quantum state superposition. The machine learning structure presented in Fig. 2 can be categorized as a computation model featuring ''parallelism'' as well, because it features superpositions of the processed vectors. Indeed, let us assume that a given set of training vectors is generated by a linear mapping F(·). Under the assumption that the set S contains m linearly independent vectors x_1, ..., x_m, any input x can be expressed as the superposition:

x = Σ_{i=1}^{m} α_i x_i (43)

and hence:

F(x) = Σ_{i=1}^{m} α_i d_i (44)

Thus, the structure of the mapping F(·) is supported by the m vectors x_1, ..., x_m and can, according to (44), be implemented by the structure shown in Fig. 7.
It is worth noting that a structure similar to that of the mapping F(·) is used for the solution of various linear equations (see the examples below) and linear transformations.
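The superposition property above can be sketched directly for a linear F (with a toy matrix of our own choosing): the response to a superposed input is the same superposition of the training outputs, so one evaluation serves many inputs at once.

```python
# Sketch of superposition-based "parallelism" for a linear F (toy matrix
# of our own choosing): if x_s is a superposition of training inputs,
# the output is the same superposition of the training outputs.

A = [[1.0, 2.0, 0.0],
     [0.0, -1.0, 3.0]]             # hypothetical linear mapping F(x) = A x

def F(x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

x1, x2 = [1.0, 0.0, 2.0], [0.0, 1.0, -1.0]
d1, d2 = F(x1), F(x2)              # training pairs (x_i, d_i)

alpha1, alpha2 = 2.0, -3.0         # superposition coefficients
xs = [alpha1 * a + alpha2 * b for a, b in zip(x1, x2)]
ds = [alpha1 * a + alpha2 * b for a, b in zip(d1, d2)]   # superposed outputs
```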

VII. EXAMPLES

Example 1:
The model of machine learning described above was used for the reconstruction of incomplete and distorted image patterns. As a test image, a grayscale photo of Lena with a resolution of 512 × 512 pixels was used; it is commonly employed to investigate algorithms for image compression and processing (Fig. 8). The photo was loaded into MATLAB as a matrix X_lena of size 512 × 512. The matrix X_lena served as a source of different numbers of patterns (columns or rows). The following sets of columns were analyzed: N = 4, 16, 32, 64. The analysis of the full image was therefore a sequence of 128, 32, 16, and 8 partial analyses, respectively. The input patterns x_p, the preserved parts of the photo columns, were distorted by randomly removing about 90% of the pixels. The convergence of the iteration process is illustrated in Figs. 9, 10, and 11, which show the result of photo reconstruction after 5, 20, and 100 iterations, respectively. The best reconstruction results were achieved for N = 4.
In the next experiment, some information was removed from the analyzed image, leaving only a narrow band of row patterns; formally, only the rows from 200 to 280 of the matrix X_lena were kept. The original image and the reconstruction results after 5, 20, and 100 iterations are presented in Fig. 12. It is worth noting that in the case of a simultaneous analysis of the whole image, the dimension of the approximator should be appropriately increased to allow the processing of the image as a single column vector. Moreover, in these numerical experiments, a matrix W_{2^k} with entries in {1, −1, 0} was deployed.
Example 2: Many machine learning algorithms and applications are based on solutions of linear equations [15]. One considers here the following problem:

Ax = b (47)

where:
A − (m × n) real matrix, m ≠ n
b − (m × 1) real vector
x − (n × 1) real vector
m + n = 2^k, k = 3, 4, ..., m < (m + n)/2

One considers two cases.
Case 1: m < n. To solve this equation using the structure from Fig. 2, one first generates a training set of pairs (x_i, b_i) with b_i = Ax_i, i = 1, ..., m. Under the assumption that b_i ≠ b_k for i ≠ k, the synthesis of a mapping F(x) yields the structure shown in Fig. 13. Formulating b as a superposition of the patterns b_i, i.e.,

b = Σ_{i=1}^{m} α_i b_i (50)

one obtains a solution to (47) as follows:

x = Σ_{i=1}^{m} α_i x_i

Hence, the structure shown in Fig. 13 is the model of a mapping φ, and it sets up the solutions to (47). It is worth noting that, once designed, the structure of the mapping model presented in Fig. 14 can deliver any number of exact solutions to the equation Ax = b by generating different training sets.
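Case 1 can be sketched numerically with a toy matrix (our own illustration): a new right-hand side b is expanded in the training outputs b_i, and the same coefficients applied to the training inputs x_i yield an exact solution of Ax = b.

```python
# Sketch of Case 1 (m < n) with a toy matrix: from m training pairs
# x_i -> b_i = A x_i with linearly independent b_i, any b is expanded as
# b = sum(alpha_i * b_i), and x = sum(alpha_i * x_i) solves A x = b.

A = [[1.0, 0.0, 2.0, -1.0],
     [0.0, 1.0, -1.0, 3.0]]        # m = 2, n = 4

def apply(Amat, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in Amat]

x1 = [1.0, 0.0, 0.0, 0.0]          # training inputs
x2 = [0.0, 1.0, 0.0, 0.0]
b1, b2 = apply(A, x1), apply(A, x2)    # training outputs (independent)

b = [3.0, -2.0]                        # right-hand side to solve for
# solve b = alpha1*b1 + alpha2*b2 (2x2 system, Cramer's rule)
det = b1[0] * b2[1] - b2[0] * b1[1]
alpha1 = (b[0] * b2[1] - b2[0] * b[1]) / det
alpha2 = (b1[0] * b[1] - b[0] * b1[1]) / det

x = [alpha1 * a + alpha2 * c for a, c in zip(x1, x2)]
```

Because the mapping is linear, the expansion of b transfers exactly to x; different training sets give different (all valid) solutions.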

It is well known [26] that the least-squares (LS) solution to (47) for m < n (the minimum-norm solution) is x* = A^T(AA^T)^{-1}b.

Case 2: m > n. Linear equations with dimensions fulfilling m > n can be solved only approximately. The training set constitutes a matrix B (50) that is singular. This means that, due to the singularity of the (m × m) matrix B, a mapping F^{-1}(b_i) = x_i cannot be determined.
However, the solutions to (47) for m > n can be obtained by using an LS estimator of the superposition parameters α_i (50). Thus, for any given vector b (m × 1), the n superposition parameters α_i can be set up by the least-squares approximation formula:

α = (B^T B + λ·1)^{-1} B^T b (53)

where 1 is the identity matrix and λ ≥ 0. Hence:

x = Σ_{i=1}^{n} α_i x_i

It is worth noting that for b_i ≥ 0 (component-wise), a positive parameter λ (i.e., λ > 0) can be set up to obtain α_i > 0, i = 1, ..., n, and hence x ≥ 0. The machine learning model for the solution of the linear equations Ax = b is presented in Fig. 15. It is well known [26] that the LS solution of equation (47) for m > n is given by x* = (A^T A)^{-1} A^T b. Thus, one obtains x − x* = 0.
Note that for the ''regularized'' least squares in (53), i.e., λ > 0, the exact solution x is also set up by the structure from Fig. 15. It is well known that nonnegative matrix factorization (NMF) has recently been used for modeling many real-life applications in the field of signal and data processing [22]. NMF is formally defined as:

B_d = AX, A ≥ 0, X ≥ 0 (entrywise)

where:
B_d − data matrix
A − basis matrix, A ∈ R^{m×r}, r ≤ min{m, n}
X − coefficient matrix

A nice review paper on NMF can be found in [27]. It should be clear that (47) for Case 2 (i.e., m > n) can be interpreted as an NMF subproblem under the assumptions x ≥ 0 (a column of X), A ≥ 0, A ∈ R^{m×r}, and b a column of B_d. Thus, one generates the training set accordingly. Due to the above-described regularization, the machine learning model for NMF is as shown in Fig. 15; the model is adequate for all columns of the data matrix B_d.
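The regularized estimate (53) of the superposition parameters can be sketched with a toy overdetermined system (our reading of the formula, with our own small matrix): for a small λ and a right-hand side lying in the span of the columns, the known coefficients are recovered.

```python
# Sketch of the regularized LS estimate of the superposition parameters
# (our reading of (53)): alpha = (B^T B + lambda*1)^(-1) B^T b, with a
# toy 4x2 matrix B; for small lambda and b in the span of the columns,
# the known coefficients are recovered.

B = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0],
     [2.0, -1.0]]                  # columns b_1, b_2 (m = 4, n = 2)

alpha_true = [0.3, 0.7]
b = [sum(B[r][c] * alpha_true[c] for c in range(2)) for r in range(4)]

lam = 1e-9
# normal matrix M = B^T B + lam*1 (2x2) and right-hand side B^T b
M = [[sum(B[r][i] * B[r][j] for r in range(4)) + (lam if i == j else 0.0)
      for j in range(2)] for i in range(2)]
rhs = [sum(B[r][i] * b[r] for r in range(4)) for i in range(2)]

# solve the 2x2 system by Cramer's rule
det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
alpha = [(rhs[0] * M[1][1] - M[0][1] * rhs[1]) / det,
         (M[0][0] * rhs[1] - M[1][0] * rhs[0]) / det]
```

For b outside the column span, the same formula returns the least-squares estimate; increasing λ biases the coefficients toward zero, which is how nonnegativity can be enforced in the NMF subproblem.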
Example 3: The machine learning structure presented in Fig. 2 can be categorized as a computation model featuring ''parallelism.'' Hence, by using this model, ''oracle'' or ''black box'' problems [28] can be solved by performing only one query to the oracle. Indeed, consider a black box that computes a function f(x_i), where x_i ∈ R^n, n + 1 = 2^k, k = 4, 5, ..., and 2N < (n + 1)/2. Moreover, one assumes that f is either constant (f(x) = c for all x) or balanced (f(x) = 0 for half of the possible input vectors). It is clear that the decision problem of whether f is constant or balanced can be solved by accessing the black box from Fig. 2 only once, using an input s in the form of the superposition s = Σ_{i=1}^{N} x_i. Hence, F(s) = Σ_{i=1}^{N} f(x_i). This means that one query suffices to distinguish a constant function from a balanced one.
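The one-query decision can be sketched directly (our own toy functions, simulating the linearity of the superposed query): the sum Σ f(x_i) equals N·c for a constant f and N·c/2 for a balanced one, so the two cases never collide.

```python
# Sketch of the one-query oracle test: the superposition s = sum(x_i)
# yields F(s) = sum(f(x_i)) by linearity, which equals N*c for a
# constant f and N*c/2 for a balanced f (values in {0, c}).  The
# functions below are toy examples of our own.

N, c = 8, 1.0

def constant_f(i):
    return c                        # f(x) = c for all inputs

def balanced_f(i):
    return c if i % 2 == 0 else 0.0 # f = c on exactly half the inputs

def one_query(f):
    """Simulates the single superposed query F(s) = sum_i f(x_i)."""
    return sum(f(i) for i in range(N))

def classify(total):
    return "constant" if total in (0.0, N * c) else "balanced"
```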

Example 4
The model from Fig. 2 was used for the recognition of patterns generated by four deterministic chaotic systems: the Mackey-Glass [29], Lorenz [30], Chua [31], and Rössler [32] structures. Thus, sixteen 512-dimensional patterns/vectors were used as learning vectors to determine four classes: class (1) (Mackey-Glass), class (2) (Lorenz), class (3) (Chua), and class (4) (Rössler). Hence, four learning vectors were determined for every class, their components x(n) being time samples from the solutions of the respective equations (e.g., the Rössler equations for class (4)). As mentioned above, the main feature of the model from Fig. 2 is that it can classify incomplete patterns. Indeed, with only about 150 components of the 512-dimensional vectors, recognition and classification were correct. Moreover, using the data in this example, the oracle-like problem was solved as an illustration of parallelism: the membership of a group of vectors in a given class can be established by performing only one query.

VIII. IMPLEMENTATION OF AN APPROXIMATOR BY A DYNAMIC NEURAL NETWORK MODEL
As mentioned above, the basic structure of the approximator shown in Fig. 2 resolves the equilibria of Hopfield neural network-based biorthogonal filters. Hence, the structure of the approximator from Fig. 2, as well as the analog processor, can be implemented as a ring of neural networks, as shown in Fig. 20.
It is easy to see that the structure from Fig. 20 is described by a pair of coupled differential equations of the form (17), and the steady-state solutions of these equations realize the transformations T_s(·) and T_s^{-1}(·). In order to realize an unlimited number range of the processor, one can assume the linearity of the activation functions: θ(ζ) = ζ, θ(ξ) = ξ. It is worth noting again that in the computational verifications, the orthogonal matrix W has entries in {−1, 0, 1}.

IX. Q-INSPIRED MODEL OF MACHINE LEARNING
As mentioned in the Introduction, Q-inspired neural networks are a non-quantum version of CVNNs. In this study, we show that the extended model of Hopfield-type neural networks can be a source of different types of computational models. Thus, the Q-inspired Hopfield-type neural network is given by (3). The machine learning model implemented by such a Q-inspired neural network is determined by the following statements:
Statement 1: The real-valued octonionic module defined by (10), (11), and (12) is, for complex input vectors d ∈ C^8, transformed into a complex-valued module, because its transformation matrix T_8 satisfies T_8·T_8^T = 1 and is therefore unitary.

Algorithm 1
1. Declaration: input the set of training points {(x_i, d_i)}.

2. System design:
Create the system vectors u_i and calculate the spectra m_i of the system vectors u_i. Thus, the approximator shown in Fig. 23 computes the DFT spectrum d_i for any vector x_i ∈ S and, moreover, it computes the spectrum d for any bounded vector x (dim x = m). It is clear that through inverse modeling, the same approximator computes the IDFT. Finally, it is worth noting that the Q-inspired model of machine learning described in Statement 2, which was designed as a complex-valued approximator, can be used as a model of an analog processor (Section VI-C) for the addition and multiplication of complex numbers.
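The DFT property used here can be sketched independently (the transform matrix below is built directly from its definition, not taken from the paper): the N-point DFT matrix F satisfies F·F^H = N·1, so it is unitary up to the scale √N, the spectrum preserves energy up to the factor N (Parseval), and the inverse transform recovers the signal.

```python
# Sketch relating the complex-valued approximator to the DFT: the
# N-point DFT matrix F (built here directly from its definition)
# satisfies F * F^H = N * 1, the spectrum preserves energy up to the
# factor N (Parseval), and the inverse transform recovers the signal.
import cmath

N = 8
F = [[cmath.exp(-2j * cmath.pi * r * c / N) for c in range(N)]
     for r in range(N)]

def dft(x):
    """Forward transform: X[r] = sum_c F[r][c] * x[c]."""
    return [sum(F[r][c] * x[c] for c in range(N)) for r in range(N)]

def idft(X):
    """Inverse transform: x[r] = (1/N) sum_c conj(F[r][c]) * X[c]."""
    return [sum(F[r][c].conjugate() * X[c] for c in range(N)) / N
            for r in range(N)]

x = [1.0, 2.0, 0.0, -1.0, 0.5, 0.0, 3.0, -2.0]
X = dft(x)
x_back = idft(X)
```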

X. CONCLUSION
This study proposed a general model of a machine learning system. The model has the following universal features: it makes it possible to realize the typical basic functions of learning systems, such as association, pattern recognition, and classification, as well as inverse modeling. This study also focused on one aspect of the application of the presented model: its usefulness for the signal reconstruction of incomplete patterns. This was illustrated with the results of analyses showing that image reconstruction is possible even when 90% of the information has been randomly removed from the signal. This means that the model can be used as a device for data compression as well. Moreover, as a side result of the research, the structure of a superposition processor was proposed; however, research on its VLSI realizability and the practical meaning of such a processor is beyond the scope of this study. Nevertheless, the structures proposed in this study can be seen either as mathematical algorithms or as models of physically realizable VLSI networks. Thus, unlike deep learning algorithms, which are basically multilayer nonlinear kernel machines designed by very-large-scale optimization tools (e.g., SGD), the approximators proposed in this study could, as physical objects, implement reversible computations and ''parallelism.'' Finally, it should be noted that the Q-inspired networks considered in this study, implemented as complex-valued structures, are more powerful than the real-valued realizations. However, the technological realizability of CVNNs is currently unknown, and thus they can be viewed only as mathematical algorithms.

APPENDIX SUMMARY OF ALGORITHM
WIESLAW CITKO received the M.S. degree in solid state physics from the Faculty of Applied Physics and Mathematics, Gdansk University of Technology, and the Ph.D. degree in electronics from the Lodz University of Technology. He is currently an Assistant Professor with the Department of Electrical Engineering, Gdynia Maritime University, Gdynia, Poland. His current research interests include artificial intelligence, machine learning, neural networks, and PLL networks.
WIESLAW SIENKO (Life Member, IEEE) received the M.S. and Ph.D. degrees in electronics from the Gdansk University of Technology. He is currently a Professor with the Department of Electrical Engineering, Gdynia Maritime University, Gdynia, Poland. His current research interests include artificial intelligence, machine learning, quantum signal processing, and digital signal processing. VOLUME 8, 2020