Activation Function Modulation in Generative Triangular Recurrent Neural Networks

Autonomous generation of time series is challenging because the network must capture short-term features while tracking long-term time dependencies. This paper introduces the modulation of the activation function slopes of upper-lower triangular recurrent neural networks (ULTRNNs) for dynamic variation of memory through a secondary recurrent network with its own independent states. A zigzag propagation algorithm for weight updates is proposed that accounts for the dynamic interaction of the states between the ULTRNN and the secondary network. A novel training method is proposed that distributes the eigenvalues of the closed-loop system around the unit circle in the complex z-plane to ensure that the network behaves as a nonlinear oscillator with an output that neither collapses nor saturates but continues to emulate the target. Examples encompassing the Lorenz series, Santa Fe laser data, Kolam patterns, electrocardiogram (ECG) signals, stock pricing data, and smart grid data are presented to demonstrate that the proposed approach is highly effective in the generative modeling of complex periodic, chaotic, and nonstationary time series. The qualitative and quantitative performance of the ULTRNN obtained with the proposed activation-function modulation technique is comparable to that of state-of-the-art techniques including feedforward networks and generative adversarial networks, but with far fewer trainable parameters and shorter computation times.


I. INTRODUCTION
This paper introduces a novel sparse recurrent neural network (RNN) architecture for the generative modeling of time series. Generative modeling is challenging because the network must capture short-term features while tracking long-term time-dependencies. RNN-based networks are uniquely suited to such problems owing to their autoregression ability. Typically, the activation function slope used in such networks is fixed and determines their ability to model long-term time-dependencies at the cost of short-term features or vice versa. We propose dynamically varying the activation function slope to control the memory retention and forgetting. In addition, unconstrained RNN weight placements may result in unstable networks whose outputs saturate or highly stable networks whose outputs collapse. Therefore, we propose a novel training method to constrain the eigenstructure of the closed-loop system for marginal stability to ensure a stable emulation of the target data with an output that neither collapses nor saturates. We apply activation function modulation, together with constraining the closed-loop eigenvalue distribution, to synthetically generate chaotic, complex periodic and quasiperiodic waveforms, and model non-stationary processes such as stock data.
Sparse RNNs have the advantage of reduced computation and storage compared with fully recurrent networks. In addition, special recurrent structures such as block-diagonal (BDRNN) [1] and upper-lower triangular (ULTRNN) [2] inherently facilitate monitoring stability and ensure a robust learning ability. The eigenvalues of the 2 × 2 diagonal blocks of the BDRNN recurrent weight matrix were constrained during training to lie on the unit circle of the complex z-plane. Such placement inherently mitigates the vanishing and exploding gradient issues normally associated with conventional RNNs [3]. Reported applications of BDRNNs include speech recognition [1], lung-sound processing [4], and telecom call volume prediction [5]. The ULTRNN [2] uses upper and lower triangular weight matrices, each with 2 × 2 diagonal blocks with the same constraints as the BDRNN. As a second constraint, the corresponding diagonal blocks in the two triangular matrices are set equal to reduce the chances of overfitting as the network learns to model the hidden oscillatory modes of the target trajectory. The ULTRNN is effective for modeling dynamic time series that exhibit chaotic, nonsymmetric, and long-term dependency behaviors [2].
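The unit-circle eigenvalue placement described above can be checked numerically. The sketch below uses a 2 × 2 rotation-form block parameterized by a single angular variable; this particular parameterization is an illustrative assumption consistent with the constraint described in the text, not a reproduction of the paper's exact matrices.

```python
import numpy as np

# A BDRNN/ULTRNN-style 2x2 diagonal block parameterized by one angular
# variable theta. The rotation form used here is an assumption; what
# matters is that its eigenvalue pair lies on the unit circle.
def rotation_block(theta: float) -> np.ndarray:
    return np.array([[np.cos(theta),  np.sin(theta)],
                     [-np.sin(theta), np.cos(theta)]])

W = rotation_block(0.7)
eigvals = np.linalg.eigvals(W)

# Both eigenvalues are exp(+/- j*theta): magnitude exactly 1, so the
# linear recurrence neither explodes nor vanishes over time.
print(np.abs(eigvals))  # -> [1. 1.]
```

Because the eigenvalue magnitudes are fixed at unity by construction, no online stability monitoring is needed during training.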
Adaptation of the activation function slope has previously been considered a tool for improving learning performance. Reference [6] used this technique with a limited scope to improve the intrinsic stability of generic recurrent networks. Reference [7] used trainable activation functions in a deep feedforward neural network for an MNIST handwritten digit recognition task. In a previous study, the learning capability of a ULTRNN was enhanced by modulating the slope of the activation function associated with each state variable [8]. Modulation of the activation function incorporates variable memory with an objective similar to that of an LSTM network that employs multiplicative gates [9].
The slope of the activation function directly affects the time duration contribution of the corresponding state variable. A sharp activation function slope lengthens the time contribution and helps model and control long-term dependencies. Conversely, a slack slope shortens the time contribution and helps to model controlled "forgetting". A fixed activation slope favors the equal contribution of all states over time and the network learns long-term dependencies at the expense of its short-term performance and vice versa [8]. Varying the slope of the activation function with time can be used to model the dynamics of the system time constants that produce the target waveform. Reference [8] dynamically computed the slope of the (tanh) squashing function for each state variable using a secondary network. For this secondary network, a feedforward structure that uses the main ULTRNN states was chosen to simplify the training with a direct extension of the conventional backpropagation algorithm. However, such sharing of states may result in suboptimal mapping of the activation function slope space. To address this shortcoming, in this study, we propose using a recurrent architecture for a secondary network with its own independent and dimensionally unconstrained state variables. We call this secondary network an independent state activation function network (ISAFN).
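The memory-versus-forgetting effect of the activation slope can be illustrated with a scalar toy recurrence; this stand-in for a single ULTRNN state (a bare `tanh` iteration with slope `beta`) is an illustrative assumption, not the paper's network equations.

```python
import numpy as np

def iterate(beta: float, x0: float = 0.8, steps: int = 30) -> float:
    # Scalar toy recurrence x <- tanh(beta * x): a stand-in for one
    # state variable with activation-function slope beta.
    x = x0
    for _ in range(steps):
        x = np.tanh(beta * x)
    return x

# Sharp slope (beta > 1): the state settles at a nonzero fixed point,
# i.e. its contribution persists (long-term memory).
print(iterate(beta=2.0))

# Slack slope (beta < 1): the state decays toward zero, i.e. the
# contribution is "forgotten" after a short time.
print(iterate(beta=0.5))
```

Varying `beta` over time, as the secondary network does, therefore moves each state between retention and controlled forgetting.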
The interaction between the state variables of the ULTRNN and ISAFN imposes additional constraints on the gradient computation process during training. Considering this interaction, we developed a novel zigzag propagation (z-prop) algorithm that uses a combination of forward and backward steps for exact gradient computation. A truncation technique that increases computational speed by trading off the desired gradient accuracy level is presented. In addition, the activation function slope is constrained to a range that ensures the marginal stability of the ULTRNN.
The activation function modulation of the ULTRNN can be applied to a wide range of time-series modeling and prediction tasks. This study focuses on the unique objective of autonomous generation or replication of time series with behavioral characteristics close to those of the parent system as envisioned in [10], [16]. From a practical perspective, such generative networks can be used to clone new data when collecting real data is difficult or when its availability is sparse. We consider one-step prediction as the primary mechanism for training, such that the trained network autonomously generates outputs by feeding back only its own output from the previous time step. Because the stable performance of the trained network depends on the eigenstructure of the closed-loop system, we introduce an eigenvalue-based heuristic cost function. Using this approach, the closed-loop network behaves as a nonlinear oscillator emulating the target with an output that neither collapses nor saturates. We demonstrate the performance of ISAFN-ULTRNN with the autonomous generation of synthetic examples of Lorenz limit cycles [11], Santa Fe laser data [12], one-stroke kolam patterns [13], ECG signals [14], [15], Google stock data [16], and smart grid data [17], [18].
The main contributions of this work include: (a) the controlled modulation of the activation function slopes of the ULTRNN, which facilitates the dynamic variation of memory retention. A secondary recurrent network whose states are independent of the ULTRNN states is employed for the dynamic computation of the activation function slopes. (b) A zigzag propagation algorithm that uses a combination of forward and backward steps for weight updates to account for the dynamic interaction of the states between the ULTRNN and secondary network. (c) A novel training method that distributes the eigenvalues of the linearized closed-loop system around the unit circle in the complex z-plane to ensure that the trained network behaves as a nonlinear oscillator with an output that neither collapses nor saturates but continues to emulate the target.
The remainder of this paper is organized as follows: In Section II, the structure of the ULTRNN with ISAFN is developed, and its key features are compared with the common-state AFN (CSAFN) presented in [8]. In Section III, we develop a learning algorithm for the ISAFN with a specific focus on the z-prop algorithm for gradient computation, which considers the interaction between the state variables of the ULTRNN and ISAFN. In this section, we also discuss the computational burden of the z-prop algorithm and suggest techniques for increasing computational speed through truncation. Section IV describes the development of an eigenvalue-based cost function that ensures the marginal stability of the linearized closed-loop system. Section V presents illustrative examples to demonstrate and contrast the generative performance of the ISAFN-ULTRNN with state-of-the-art networks. Section VI summarizes the key contributions of this study.

II. NETWORK ARCHITECTURE

A. ACTIVATION FUNCTION MODULATED ULTRNN
The ULTRNN architecture uses twin triangular state-feedback weight matrices [2], as shown in Figure 1. The system equations at each sampling instant are given by (1)-(2). The state activation function is given by (3); note that its slope varies dynamically. With a fixed slope α > 0, the output activation function is given by (4). The corresponding 2 × 2 block-diagonal submatrices of the upper and lower triangular matrices are constrained to be equal and are formulated using angular variables as in (5). A key motivation for using (5) is that it is an effective mechanism for modeling the underlying oscillatory modes [1], [2]. A sufficient condition for network and training stability is given by (6) [1]. Eqn. (5) with (6) constrains each eigenvalue pair to lie on the unit circle of the complex z-plane. This ensures both network and learning stability without any need for online monitoring. Such eigenvalue placement eliminates the exploding and vanishing gradient issues associated with RNNs. Constraining the respective 2 × 2 diagonal blocks of the two triangular matrices to be equal minimizes the possibility of overfitting during learning [2], [19].
Typically, in neural networks, the slope of the activation function is chosen a priori and remains fixed. In this study, we dynamically vary the activation function slopes so that the contribution of each state to the others varies with time. With each state variable of the main ULTRNN having a unique activation function, the variation in the activation function slopes over time enhances the dynamic performance of the ULTRNN.
We now describe two alternative architectures for a secondary network that outputs the variable-slope activation functions required by the main network. The first, the CSAFN, is a feedforward network that uses the state variables of the main network to compute the activation function slopes [8]. The second, the ISAFN, is a novel architecture that employs a recurrent network with its own independent state variables to compute the activation function slopes.

B. COMMON STATES ACTIVATION FUNCTION NETWORK (CSAFN)
The state variables are common to the ULTRNN and the CSAFN, as shown in Figure 2. The output of the CSAFN is the activation function slope, computed as in (7)-(9). The four scaling-factor parameters determine the vertical shift, span, slope, and horizontal shift, respectively. During training, the CSAFN simultaneously learns to output the slope as the ULTRNN learns to model dynamic processes. Note that in [8], the CSAFN weight matrix is chosen as a feedforward triangular matrix with diagonal blocks of the form (5). However, such a constraint is not necessary because the CSAFN does not use recurrence.
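One plausible form of the slope-output mapping is sketched below. The sigmoid expression and the exact placement of the parameters are assumptions (the paper's Eq. (8) is not reproduced here); the names `c0`-`c3` are hypothetical labels for the four scaling factors (vertical shift, span, slope, horizontal shift), using the empirical values reported later in the examples.

```python
import numpy as np

# Hypothetical sigmoid form for the AFN slope output. c0..c3 are the
# vertical-shift, span, slope, and horizontal-shift factors; the values
# below are the empirical choices quoted in the examples section.
c0, c1, c2, c3 = 0.1, 2.1, 2.1, 1.5

def slope(s):
    # beta(s) = c0 + c1 * sigmoid(c2 * (s - c3))   (assumed form)
    return c0 + c1 / (1.0 + np.exp(-c2 * (s - c3)))

s = np.linspace(-3, 3, 601)
beta = slope(s)

# The slope is bounded to (c0, c0 + c1) = (0.1, 2.2): a guaranteed
# minimum "forgetting" level and a capped long-term-memory gain.
print(beta.min(), beta.max())
```

Whatever the exact expression, the bounded output range is the essential property: it keeps the modulated slope above a minimum threshold and below a stability-preserving maximum.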

C. INDEPENDENT STATES ACTIVATION FUNCTION NETWORK (ISAFN)
The CSAFN architecture may limit learning performance because it maps the main-network state directly to a positive activation function slope. Such suboptimal mapping compromises network memory. This may impact the generalizability of the CSAFN and, thereby, the network's robust learning ability. With the primary motivation of decoupling the AFN states from the main network and improving memory-retention control, we propose a novel AFN architecture that employs recurrent states completely independent of the main ULTRNN states, as shown in Figure 3. Owing to the recurrence used in the ISAFN, its state matrices are constrained to triangular matrices with block-diagonal submatrices of the form (5). The output of the ISAFN is given by (10), where the output matrix maps the ISAFN states through an activation function of the same form as (8). Note that the ISAFN state dimension does not need to equal that of the ULTRNN and can be selected as higher or lower depending on the task objectives. The parameters of the CSAFN and ISAFN output mappings, and how they impact the activation function of the ULTRNN, are shown in Figure 4. The vertical-shift factor determines the minimum limit of the activation function slope; the span factor expands or contracts the span and determines the granularity of the forgetting levels; the slope factor impacts both the range and sensitivity of the forgetting levels; and the horizontal-shift factor sets the value of the activation function slope in the absence of external inputs. These parameters are empirically chosen such that (a) the marginal stability of the ULTRNN and its training is retained at all instants, with or without external inputs, and (b) the slope assumes a range of values above a minimum threshold to represent various levels of forgetting.

III. TRAINING ALGORITHM
The combined CSAFN and ULTRNN are referred to as the CULTN, and the combined ISAFN and ULTRNN are referred to as the IULTN. The learning algorithms for the CULTN and IULTN are described in this section. The algorithms were derived using small-perturbation theory to compute exact gradients. The cost function for minimizing the output error during training is given by (11), where the error-vector elements are defined in (12) and the summation runs over the number of training samples in an epoch.

A. TRAINING CULTN
The feedforward architecture of the CSAFN allows training to be performed with a simple extension of the conventional backpropagation algorithm. The following steps are performed at each sample step of the epoch.
Weight differentials for the CSAFN are accumulated at each iteration step as shown in (14) and (15). The symbol −= in (14) and (15) denotes the negative accumulation of the weight differentials at the current iteration step onto the values accumulated over the previous steps. Before the next iteration step, the update in (16) is performed. After the differentials have been accumulated over all sample steps, the differential of the angular variable of each diagonal block is computed as in (17). Eqn. (17) constrains the block-diagonal elements to lie on the unit circle, which helps maintain network stability and improves the robust learning ability: the possibility of exploding and vanishing gradients is minimized while retaining the sensitivity required to model the oscillatory modes of the target.
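The angular-variable update can be sketched as follows. The rotation form of the diagonal block and the gradient value are assumptions for illustration; the point is that gradient descent is applied to the angle, after which the block is re-formed, so its eigenvalue pair stays on the unit circle by construction.

```python
import numpy as np

def block(theta: float) -> np.ndarray:
    # Rotation-form 2x2 diagonal block (assumed form of Eq. (5)).
    return np.array([[np.cos(theta),  np.sin(theta)],
                     [-np.sin(theta), np.cos(theta)]])

theta = 1.1
grad_theta = 0.03   # hypothetical accumulated gradient d(cost)/d(theta)
eta = 0.1           # learning-rate parameter

# Gradient step on the angular variable, then re-form the block: the
# eigenvalues remain exactly on the unit circle, with no projection or
# online stability monitoring needed.
theta -= eta * grad_theta
W = block(theta)
print(np.abs(np.linalg.eigvals(W)))  # magnitudes remain 1
```

This is why training the angle rather than the four block entries directly preserves the stability constraint at every update.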

B. TRAINING IULTN
The interaction of the recurrent states of the ISAFN with the recurrent states of the ULTRNN necessitates two parallel training processes, one for the ULTRNN and the other for the ISAFN.
1) GRADIENT COMPUTATION FOR THE ULTRNN
The ULTRNN error terms are computed as in (18), where an intermediate error-function variable captures the coupling with the ISAFN. Weight differentials are accumulated at each iteration step as shown in (14), using the error terms of (18). Before the next iteration step, the corresponding state update is performed.

2) ZIGZAG PROPAGATION ALGORITHM FOR ISAFN
The error at each step, being a function of both networks' states, is impacted by the ISAFN weights through two propagation paths: one directly through the ISAFN states, and the other indirectly through the ULTRNN states, which are in turn impacted by the previous ULTRNN state and the activation slope. Hence, the ISAFN gradient terms are computed through a combination of forward and backward propagation steps, as summarized below.
where the intermediate error-function variables and the derivative of the ISAFN activation function are defined in (20)-(24). Note that the interaction of the ISAFN and ULTRNN states is reflected in the cross-coupling error term.
For each iteration step k above, the following steps are performed backward for j = k, k−1, ..., 2, 1. Weight differentials are accumulated at each backward step, and before the next forward step, the corresponding updates are performed. Given the backward steps nested within each forward iteration step, the algorithm is termed zigzag propagation (z-prop). The algorithm is illustrated in Figure 5 for iterations k−1 and k. In the figure, the solid downward arrow indicates a conventional backward-propagation step, and the dashed upward arrow indicates forward propagation. Each box shows an error function from which the gradient components are obtained by pre-multiplying with the appropriate derivative term and post-multiplying with the ISAFN state. The process begins with the initial error computed for each iteration step k (shown in the top-left box). The error function at each lower box is obtained through post-multiplication of the term in the box above with the ISAFN backpropagation term. The process continues in the next column, with the error in the top box obtained by pre-multiplying the error of the previous step k−1 with the ULTRNN forward-propagation term. This process is repeated recursively for all columns in iteration step k, and for all k = 1, 2, ...
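The zigzag visiting order can be sketched schematically. The error-function algebra of (20)-(24) is deliberately omitted here; the snippet only shows the loop structure (a full backward sweep inside each forward step), which is what gives the algorithm its quadratic cost in the epoch size.

```python
# Schematic of the z-prop visiting order only: each forward step k
# triggers a backward sweep j = k, k-1, ..., 1, giving the "zigzag"
# of Figure 5. The gradient algebra itself is not reproduced.
def zprop_schedule(T: int):
    order = []
    for k in range(1, T + 1):        # forward sweep over the epoch
        for j in range(k, 0, -1):    # backward sweep inside step k
            order.append((k, j))     # one gradient-accumulation step
    return order

print(zprop_schedule(3))
# -> [(1, 1), (2, 2), (2, 1), (3, 3), (3, 2), (3, 1)]

# Total count is T*(T+1)/2, i.e. on the order of half the squared
# epoch size, versus a linear count for the feedforward CSAFN.
print(len(zprop_schedule(100)))  # -> 5050
```

Reading the schedule column-by-column reproduces the alternating down (backward) and up (forward) arrows of Figure 5.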

3) WEIGHT UPDATES
The CULTN or IULTN weights are updated at each training cycle, consisting of an epoch of time steps, using the gradient-descent rule of (25) with a positive weight-update parameter, where the argument denotes the training cycle. Note that the updates of the triangular-matrix diagonal blocks are subject to the angular form of (17).

4) COMPUTATIONAL REQUIREMENTS
The number of trainable weights for the CULTN is given by (26), and that for the IULTN by (27). For a given network size, the z-prop algorithm used by the ISAFN requires computations that scale with half the square of the epoch size, whereas the CSAFN scales only linearly with the epoch size. Hence, the ISAFN can be computationally expensive for large epochs. Truncating the number of recursions to a fraction of the epoch size trades gradient accuracy for a reduction in computation time; for example, the inner iterations can be limited to an empirically selected threshold.
To empirically assess the impact of recursion truncation on gradient accuracy and computation speed, a representative 2-output, 5/5-mode IULTN was considered. The inputs and outputs were chosen as random values in the range (−1:1). The gradients were computed first with the z-prop algorithm and then by direct perturbation of the individual weight elements. Errors between the two sets of gradients were compiled for several truncation thresholds, expressed as fractions of the epoch size in the range (0.1:1), and the comparison was repeated for several epoch sizes in the range (50:200). The normalized gradient accuracy, based on the root-mean-squared (RMS) error, versus the computation-time profiles is summarized in Figure 6. The gradient error is close to zero with no truncation, which confirms the correctness of the z-prop equations (20)-(24). The gradient errors exhibit a hyperbolic profile, with a knee point at thresholds between 0.2 and 0.35 of the epoch size. The lower the epoch size, the higher the increase in error with increased truncation. The computation time increased exponentially as the truncation decreased to zero, with a greater impact for the larger epoch sizes. A threshold chosen close to the knee point provides a reasonable trade-off between gradient accuracy and computation time.
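The computational saving from truncation can be counted directly. The exact truncation rule used in the paper is an assumption here; the sketch simply caps the backward sweep at a threshold equal to a fraction of the epoch size and counts gradient-accumulation steps.

```python
# Gradient-accumulation count with the backward sweep truncated to a
# threshold k_th = frac * T. The capping rule is an assumption made
# for illustration of the accuracy/speed trade-off.
def accum_count(T: int, frac: float = 1.0) -> int:
    k_th = max(1, int(frac * T))
    return sum(min(k, k_th) for k in range(1, T + 1))

T = 100
print(accum_count(T))          # untruncated: T*(T+1)/2 = 5050
print(accum_count(T, 0.35))    # knee-point thresholds cut the work
print(accum_count(T, 0.20))    # substantially (2905 and 1810 here)
```

Thresholds near the knee point thus remove roughly half or more of the accumulation steps, consistent with the trade-off profile of Figure 6.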

IV. CLOSED-LOOP GENERATIVE TRAINING
For autonomous generation using only the network's predicted outputs, the closed-loop stability of the network must be ensured. A tractable metric of closed-loop performance would ideally steer the training such that the trained network behaves as a nonlinear oscillator with an output that neither collapses nor saturates but continues to emulate the target. However, owing to the nonlinearity of the activation functions and the variation in their slopes, such an ideal metric for guaranteed closed-loop performance is not computationally feasible. Instead, we seek a simple metric based on linear control-system theory that can potentially improve closed-loop performance: a heuristic metric derived from a sufficiency condition for the local marginal stability of the linearized closed-loop system. The closed-loop state-transition equation of the ULTRNN with one-step prediction feedback for small perturbations is given by (28). In (4), with slope α = 2 for the output activation function, its derivative is a vector of positive values with a maximum of unity. Setting this derivative to unity in (29) yields the linearized closed-loop matrix (LCLM) of (30). Ensuring the marginal stability of the closed-loop system for small perturbations requires constraining the eigenvalues of the LCLM within an annular ring approximately at unity in the complex z-plane. Because the elements of the state-activation derivative vary from zero to unity, the LCLM transitions between two limiting matrices. Given that one limit is marginally stable owing to constraint (5) imposed on its diagonal blocks, it is heuristically evident that imposing marginal stability on the other limit, in turn, ensures the marginal stability of the LCLM for all values of the derivative. Note that in (28), the state-activation derivative, modulated by the AFN, shapes the closed-loop performance by dynamically adjusting the annular ring that constrains the eigenvalues.
The cost function (11) used for training is modified as in (32) to include an additional cost term that represents the effective distance of the eigenvalues of the limiting LCLM matrix from unity.
The eigenvalue cost term is defined as the summed distance of the eigenvalues of the limiting matrix from unity, under any scalar norm. The gradients of this term with respect to the individual elements of the limiting matrix are computed through direct perturbation at each weight update. Note that echo-state networks [20] also employ eigenvalues in the training process to eliminate exploding gradients. The gradients of the eigenvalue cost require computations that scale with the fourth power of the network size and can be expensive for large networks. The computational speed can be increased, at the cost of accuracy, by using an alternative cost function based on a rough estimate of the spectral radius (the largest eigenvalue magnitude) of the limiting matrix [19]. Note that (29) applies only to equal input and output dimensions; any additional inputs or outputs should be separated from the fed-back channels.
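A minimal numeric sketch of the eigenvalue-based cost term is given below. The summed-absolute-distance norm and the stand-in closed-loop matrix are assumptions; the paper permits any scalar norm and builds the LCLM from the trained weight matrices.

```python
import numpy as np

# Distance of a matrix's eigenvalue magnitudes from unity: the spirit
# of the eigenvalue cost term added to the training objective. The
# absolute-value norm is one admissible choice.
def eig_cost(A_cl: np.ndarray) -> float:
    lam = np.linalg.eigvals(A_cl)
    return float(np.sum(np.abs(np.abs(lam) - 1.0)))

theta = 0.9
rot = np.array([[np.cos(theta),  np.sin(theta)],
                [-np.sin(theta), np.cos(theta)]])

print(eig_cost(rot))        # marginally stable block: cost ~ 0
print(eig_cost(0.5 * rot))  # over-damped block: output would collapse
```

Driving this cost toward zero pushes the closed-loop eigenvalues into the annular ring around the unit circle, which is what keeps the generated output from either collapsing or saturating.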

V. ILLUSTRATIVE EXAMPLES
Several illustrative examples are presented to demonstrate and compare the effectiveness of the IULTN architecture with state-of-the-art networks for generative tasks. The first two examples pertain to the autonomous generation of chaotic waveforms with characteristics close to those of Lorenz limit cycles [11], and Santa Fe laser data [12]. The third example deals with the autonomous generation of a complex periodic waveform based on the South Indian Kolam patterns [13]. The fourth example autonomously generates synthetic electrocardiogram (ECG) data with characteristics similar to real patient data [14], [15]. The fifth example models a nonstationary process using the Google stock data studied in [16]. The final example generates hourly solar generation and energy consumption data studied in [18] as part of smart grid data synthesis. In each example, the network was trained for one-step prediction using the cost function (32) and tested in a closed loop with the output from the current time step used to predict the next output. Techniques adopted to improve the learning robustness include [8]: (a) input noise injection to improve generalizability [21], [7], (b) use of additional pilot inputs to embed periodicity, (c) use of a composite input consisting of the target and the network outputs for guided training, and (d) combined use of n-step prediction outputs for improved noise rejection during generation.
For the chaotic time series considered in the first two examples, the Kullback-Leibler (KL) divergence [22] was used for performance comparison. The KL divergence is not a measure of the global closeness of the network output to the target; however, at the local level, it is useful for quantitatively comparing the probability distributions of outputs from different networks. We use the symmetric formulation of (33) to facilitate a performance comparison of the CULTN and IULTN. The parameters for the comparative analysis were the covariance time step and the quantization range. For all examples, minimizing the cost function (32) was used to ensure closed-loop stability and robust learning. Training is typically conducted in two phases to improve learning robustness. The first phase uses only the target waveform as the input, and the second combines the target waveform and the predicted network output as in (35), with the blending parameter chosen empirically. The training transition from the first to the second phase occurs when the eigenvalue metric decreases below a threshold. During the second training phase, the weight-update parameter was progressively reduced to ensure that the eigenvalue metric remained at or below the threshold.
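The symmetric KL comparison can be sketched as follows. The paper's exact formulation involves covariance time steps and a quantization range; the one-dimensional histogram version below is a simplified assumption that keeps only the quantize-then-compare idea.

```python
import numpy as np

# Simplified symmetric KL divergence between two series: quantize each
# into Q bins over a common range and compare the distributions. The
# 1-D histogram form is an assumption; the paper's formulation also
# uses a covariance time-step parameter.
def sym_kl(x, y, Q=32, lo=-1.0, hi=1.0, eps=1e-9):
    bins = np.linspace(lo, hi, Q + 1)
    p, _ = np.histogram(x, bins=bins)
    q, _ = np.histogram(y, bins=bins)
    p = p / p.sum() + eps
    q = q / q.sum() + eps
    return float(np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))

rng = np.random.default_rng(0)
target = np.sin(np.linspace(0, 40 * np.pi, 4000))
close = target + 0.01 * rng.standard_normal(4000)   # faithful emulation
far = rng.uniform(-1, 1, 4000)                      # unrelated output

print(sym_kl(target, close) < sym_kl(target, far))  # -> True
```

A lower symmetric KL indicates that the generated output visits the amplitude range with statistics closer to those of the target, which is the sense in which the networks are compared in Figures 10 and 15.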
To generate complex periodic waveforms, the waveform period was set to the epoch length. To reduce the initial transient errors, the initial values of the network state variables in each training cycle are updated as a blend of the initial and final states of the previous cycle, as given by (36), with the blending parameter chosen empirically.
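The two blending rules of (35) and (36) can be sketched together. The variable names `gamma` and `mu` are hypothetical labels for the two empirically chosen blending parameters.

```python
import numpy as np

def blended_input(d_k, y_k, gamma):
    # Eq. (35)-style mix of target d(k) and fed-back output y(k):
    # u = gamma*d + (1-gamma)*y, with 0 <= gamma <= 1. Reducing gamma
    # during phase-2 training forces reliance on the network's own
    # predictions.
    return gamma * d_k + (1.0 - gamma) * y_k

def next_cycle_init(x_first, x_last, mu=0.95):
    # Eq. (36)-style carry-over of state initial conditions between
    # training cycles, blending the initial and final states to damp
    # initial transients.
    return mu * x_first + (1.0 - mu) * x_last

print(blended_input(1.0, 0.0, 0.5))                        # equal reliance
print(next_cycle_init(np.array([0.2]), np.array([0.8])))   # -> [0.23]
```

With `gamma = 1` the network trains purely on the target (open loop); `gamma = 0` is fully autonomous generation, matching the staged schedule used in the examples.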
For all networks used in the examples, the four scaling-factor parameters of the CSAFN and ISAFN output mappings per (8) are empirically chosen as 0.1, 2.1, 2.1, and 1.5, respectively. This selection limits the activation function slope to a maximum of 2.2, aimed at sustained long-term memory, and a minimum of 0.1, heuristically chosen to regulate the forgetting level. The initial angular variables of the diagonal blocks of the ULTRNN and ISAFN were assigned random values uniformly distributed in the range (0:2π). The elements of all other matrices were randomly distributed in the range (−1:1). For input-output compatibility, the inputs and targets were amplitude-normalized before training by scaling to span the range (−0.85:0.85). The initial value of the weight-update parameter was set to 0.1. Tables I and II summarize the network parameters for the CULTN and IULTN, respectively, and Table III lists the key training and testing parameters used in the examples.
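The amplitude normalization can be sketched as a simple affine rescaling; whether the paper's scaling preserves the signal mean or zero crossing is not specified, so the min/max mapping below is an assumption.

```python
import numpy as np

# Affine amplitude normalization: map a series so its minimum and
# maximum land on (-0.85, 0.85), keeping the squashing activations out
# of deep saturation. The min/max mapping is an assumed choice.
def normalize(x, lo=-0.85, hi=0.85):
    xmin, xmax = np.min(x), np.max(x)
    return lo + (hi - lo) * (x - xmin) / (xmax - xmin)

x = np.array([-3.0, -1.0, 0.0, 2.0])
y = normalize(x)
print(y.min(), y.max())  # -> -0.85 0.85
```

The 0.85 margin leaves headroom inside the (−1, 1) range of the tanh-type activations, so the network is never asked to reproduce saturated values exactly.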

A. LORENZ LIMIT CYCLE
We consider the autonomous generation of time series with characteristics close to the Lorenz limit cycles [11]. The target outputs are produced by solving the Lorenz system of ordinary differential equations (ODEs). The networks were first trained in an open loop and then in a closed loop, with the blending parameter in (35) reduced from 1 to 0.5 to 0.25, such that the network was forced to rely more on its own predictions to improve the orbit-switching accuracy. The generative performance was assessed by replacing the three target signals with the corresponding network outputs. The performance of the 6-mode CULTN closely matched that of the 4/4-mode IULTN but was inferior to that of the 6/6-mode IULTN. Hence, an 8-mode CULTN was considered, and its performance was comparable to that of the 6/6-mode IULTN. From Figure 8, it can be observed that the LCLM eigenvalues of the two networks are distributed around the unit circle. Figure 9 shows the autonomously generated outputs of the 8-mode CULTN and the 6/6-mode IULTN. Qualitatively, both networks can generate and sustain chaotic waveforms with attractor contours similar to those of the target: the two basins of attraction are distinguishable, and orbit switching between them is retained. Quantitative performance was assessed using symmetric KL divergences (33). Figure 10 shows the ensemble average and standard deviation of the KL divergences for the 8-mode CULTN and the 4/4-mode and 6/6-mode IULTNs. The average KL divergence of the 6/6-mode IULTN is consistently lower than that of the 8-mode CULTN, with a lower standard deviation. Even the 4/4-mode IULTN exhibited marginally better performance than the 8-mode CULTN.
In summary, the IULTN, with fewer weights (792), outperformed the CULTN (1056 weights) both qualitatively and quantitatively in the generative modeling of the Lorenz limit cycles. A similar task considered in [23] used a reservoir-computing network with an estimated 1800 trainable weights.

B. SANTA FE LASER DATA
We considered autonomously generating synthetic data similar to the Santa Fe Laser dataset [12]. The original dataset consists of periodic to chaotic intensity pulsations of a far-infrared laser. The laser output exhibited significant dissymmetry around the mean value and mimicked a chaotic low-frequency waveform modulating a high-frequency carrier as shown in Figure 11.
First, a 4-mode CULTN and a 4/4-mode IULTN were considered. With the laser data as the target input, a 1-input, 2-output network was used, with the outputs being the 1-step and 4-step prediction values. The second output was included to improve learning robustness by enhancing the network's ability to capture the structure of the laser data, for example, its rise and collapse. In addition, white noise with a variance of 0.01 was added to the input. The networks were trained on the first 1500 data points of the original data series "A.Cont," with an epoch size of 100. The networks were first trained in an open loop and then in a closed loop, with the blending parameter in (35) reduced from 1 to 0.5, so that the network relied more on its own predictions. Figure 12 shows that the LCLM eigenvalues of the two networks were distributed within an annular ring around the unit circle. The generative performance of the trained networks was assessed by replacing the target with the mean of the 1-step and 4-step predicted network outputs. The performance of the 4-mode CULTN was significantly inferior to that of the 4/4-mode IULTN; hence, an 8-mode CULTN was considered, whose performance was comparable to that of the 4/4-mode IULTN. Figure 13 shows the CULTN and IULTN output time plots for 2000 sampling instants; the first 200 points show the network output with the target input, and the remainder is autonomously generated. It is qualitatively evident that both networks can sustain high-frequency oscillations and chaotic low-frequency variations similar to the original laser data. However, the CULTN waveform has 'softer' valleys, with higher values at which the envelope collapses, compared with the IULTN. The IULTN output was more similar to the target (Figure 11), with a sharper rise and lower envelope valleys. Quantitative performance was assessed using symmetric KL divergences (33) for multiple ensembles of autonomous generation.
The ensembles were computed for quantization levels in the range (4:44), with the covariance parameters set in the range (20:200). Figure 15 shows the ensemble average and standard deviation of the KL divergences as a function of the quantization level. The average KL divergence of the 4/4-mode IULTN was consistently lower, with a smaller standard deviation, than that of the 4-mode CULTN, and was closer to that of the 8-mode CULTN. In summary, the IULTN, with far fewer weights (320), outperformed the CULTN (896 weights) in the generative modeling of the Santa Fe laser data. A similar task considered in [10] with a multilayer-perceptron network used an estimated 1300 trainable weights.

C. KOLAM PATTERN
Kolam is a floor art of complex and intricate patterns commonly practiced in South India. A Kolam is drawn as a continuous line, looping or joining straight and curved segments, typically around symmetric dot patterns [13]. The mathematical properties implicit in Kolam design have been studied in [13], [24], and [25]. A single-stroke Kolam variant can be represented as a trajectory plot with x and y coordinates. The modeling complexity of a single-stroke Kolam lies in the rich harmonic content of the x- and y-variables and their spatiotemporal synchronization. Case 1 is shown in Figure 16. Both the CULTN and IULTN can stably and continuously generate a Kolam pattern. However, the 5/5-mode IULTN, with fewer weights (520), outperforms the 6-mode CULTN (576 weights) in terms of accuracy and symmetry around the dot pattern, as confirmed by the average and standard deviation of the errors. Errors were obtained by comparing the nearest-dot distances of the generated pattern with those of the target.
Case 2, shown in Figure 17, is the autonomously generated plot from a set of nine 6/6-mode (dimensions 12/12) IULTNs, each trained to reproduce one of the symmetric one-stroke Kolams drawn around the 1-3-5-3-1 dot placement studied in [13]. The low average errors and standard deviations confirm the quantitative accuracy achieved by the IULTN.
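The nearest-dot error statistic can be sketched as follows; the dot coordinates for the 1-3-5-3-1 placement and the toy stroke are hypothetical, chosen only to make the sketch self-contained.

```python
import numpy as np

def nearest_dot_errors(trajectory, dots):
    """Distance from each point on an (N, 2) trajectory to the nearest
    dot in an (M, 2) array of dot positions; comparing this statistic
    for the generated and target strokes gives the error used here."""
    d = np.linalg.norm(trajectory[:, None, :] - dots[None, :, :], axis=-1)
    return d.min(axis=1)

# hypothetical coordinates for a 1-3-5-3-1 diamond dot placement:
# each row is (y-coordinate, number of dots), centered about x = 0
rows = [(-2.0, 1), (-1.0, 3), (0.0, 5), (1.0, 3), (2.0, 1)]
dots = np.array([(x, y) for y, n in rows for x in np.arange(n) - (n - 1) / 2])

t = np.linspace(0.0, 2.0 * np.pi, 200)
loop = np.column_stack([1.5 * np.cos(t), 1.5 * np.sin(t)])  # toy closed stroke
err = nearest_dot_errors(loop, dots)
print(err.mean(), err.std())
```

Running the same computation on the target stroke and differencing the two statistics gives the accuracy/symmetry comparison reported for Figures 16 and 17.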

D. ELECTROCARDIOGRAM (ECG) WAVEFORM
Case 1 considers several 2-input, 2-output CULTNs and IULTNs to autonomously generate the ECG waveform synthesized using the model presented in [14]. As shown in Figure 18, the first input is the target and the second is a pilot sine wave whose frequency matches that of the fundamental ECG waveform. Because segments of the ECG waveform are nearly flat for a substantial amount of time, the pilot signal plays a key role in the distinct mapping of each ECG segment and thereby improves learning robustness. The ECG waveform was period normalized to 100 samples. The networks were first trained in an open loop and then in a closed loop, with γ in (35) reduced to 0.5 and then to 0.25, such that the network relied more on its own predictions as training progressed. To reduce the initial transient errors, the initial values were updated at each cycle with the parameter in (36) set to 0.95.
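The open-loop-to-closed-loop schedule can be sketched as a convex blend of the target and the fed-back prediction at the network input; the blend form and the stage values below are illustrative assumptions about (35), in the spirit of scheduled sampling.

```python
def blended_input(target, prediction, gamma):
    """gamma = 1 gives a fully open-loop (teacher-forced) input;
    lowering gamma makes the network rely increasingly on its own
    fed-back prediction (illustrative form of the blend in (35))."""
    return gamma * target + (1.0 - gamma) * prediction

# staged schedule as in the ECG example: open loop first, then closed
# loop with gamma reduced to 0.5 and then to 0.25
schedule = [1.0, 0.5, 0.25]
target, prediction = 1.0, 0.0
inputs = [blended_input(target, prediction, g) for g in schedule]
print(inputs)  # [1.0, 0.5, 0.25]
```

In training, one pass over the data would be run per stage, with `prediction` taken from the network's previous output at each step.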
The autonomously generated outputs and LCLM eigenvalue distribution of a 4-mode CULTN (dimension 8) are shown in Figure 19. Note that the pilot sine wave was also generated by the network; hence, the process was fully autonomous. The CULTN continuously reproduces the subperiod, shape, and peak values of the P, R, and T waves, and the heartbeat, with reasonable accuracy. However, it cannot accurately reproduce the flat segments. The autonomously generated outputs and the associated LCLM eigenvalue distributions of 2/2-mode (dimensions 4/4), 3/3-mode (dimensions 6/6), and 3/4-mode (dimensions 6/8) IULTNs are shown in Figure 20. The 2/2-mode IULTN, with fewer weights (112), outperformed the 4-mode CULTN (288 weights), reproducing the P and T waves better. The reproduction of the flat segment between the T and P waves improved with the 3/3-mode IULTN. The ECG waveform reproduction was significantly improved with the 3/4-mode IULTN, which, with approximately the same number of weights (276), significantly outperforms the 4-mode CULTN (288 weights). Note that the additional dimensional freedom of the 3/4-mode IULTN, with a 3-mode ULTRNN (dimension 6) and a 4-mode ISAFN (dimension 8), results in a better reproduction of the valleys in the Q and S waves. Relaxing the dimensional constraint may have resulted in a more optimal activation function slope dynamic, thus improving the network memory for modeling.
Case 2 considered the generation of clones with characteristics close to the ECGs of a general population using a single network. Similar tasks using generative adversarial networks (GANs) were studied in [26] and [27]. The synthetic generation of clone waveforms is useful for data augmentation in biosignal classification, clinical training, and data analysis tasks where real field data are scarce and difficult to collect. In addition, synthetic data are useful for anonymization, alleviating patient privacy concerns [26]. In the dataset [15], ECGs with elevated T waves are normal variants and form a small fraction (<5%) of over 4000 patients with no heart ailments. Such cases may not be well represented in data-modeling tasks, and their synthetic generation may be useful for increasing data diversity. We consider a 3-input, 2-output, 12/12-mode (dimensions 24/24) IULTN for the autonomous generation of ECG clones with elevated T waves. We used 50 samples drawn from [15] as the training set, each period normalized to 100 samples. While the first and second inputs are the target ECG and the pilot sine wave, the third input is a random seed drawn from a Gaussian noise generator and associated with each target ECG. The outputs were the one-step predicted values of the target and the pilot sine wave. The network was first trained in the open loop and then in the closed loop, with γ gradually reduced to 0.5. During closed-loop training, only the first two inputs use output feedback; the noise seed is retained as the external third input. Figure 21 shows representative target ECGs along with the corresponding autonomously generated outputs when fed with the training seeds. In addition, test outputs were autonomously generated using new Gaussian noise inputs different from the training seeds.
The generative capability of the IULTN is evident from the cloned outputs, which have structural similarity to the training set while exhibiting natural diversity. This example demonstrates the ability of the IULTN to autonomously generate complex biosignals with characteristics similar to those of a parent population. Reference [26] considered an LSTM-based deep GAN with 200 hidden units and three layers, for an estimated 800000 trainable weights in each of the generator and discriminator networks. Comparable results were obtained with the IULTN, which uses only 2880 trainable weights, a minuscule fraction (0.2%) of those of the LSTM-GAN [26].

E. STOCK DATA MODELING
We consider synthetically generating nonstationary time-series data with a probability distribution that best approximates that of the parent. We specifically modeled the distribution of Google's stock data, previously studied in [16] using TimeGAN. TimeGAN uses an embedding subnetwork and a recovery subnetwork to learn the underlying temporal dynamics, in addition to the discriminator and generator subnetworks common to a generic GAN. The training process has three stages: the first for the embedding subnetwork, the second for the full network but with only the supervised loss, and the third a joint process considering all losses. TimeGAN exhibits the best performance among competing techniques for a range of time-series generative tasks [16].
The stock dataset consists of six variables: the open, low, high, closing, and adjusted closing stock prices, and the stock trading volume for each trading day over the years 2004-2019. The objective of this exercise is to synthetically generate stock data over a 48-day period that mimics the Google stock data distribution. Stock data exhibit a high feature correlation among the five pricing variables. A 48-day period is considered here because the temporal correlation is high for the 24-day periods studied in [16] but is significantly reduced for 48-day periods. We trained a 7-input, 6-output, 10/10-mode IULTN to perform generation through one-step prediction. The first six inputs are the stock data variables. The seventh is a Gaussian noise vector that improves learning robustness by better mapping the temporal variability within any period. The network was first trained in the open loop and then in the closed loop, with γ in (35) progressively reduced to 0.25. The network was tested for autonomous generation of stock data using Gaussian noise as the external input. TimeGANs with four, six, and eight gated recurrent units (GRUs) for each of the six stock variables, similar to the one studied in [16], were trained, and the one with eight GRUs, which had the best performance, was chosen for comparison with the IULTN. The key comparative items of the network architecture, its dimensions, and the computational burden are listed in Table IV. As shown in the table, the number of trainable parameters for the IULTN (2400) is a small fraction (~1.4%) of that of TimeGAN (174727). This is reflected in the substantially lower computation time per epoch iteration for IULTN training (~15.3% of that of TimeGAN). The number of generative parameters required for the trained IULTN remains 2400, still a small fraction (~6.2%) of that of the trained TimeGAN (38400). Figure 22 shows a qualitative comparison of the IULTN-generated data with that of TimeGAN.
The visualization plots compare the principal component analysis (PCA) [28] and t-distributed stochastic neighbor embedding (t-SNE) [29] maps of the synthetically generated data with those of the original data. It is apparent that the distribution of the IULTN-generated data closely follows that of the original data and is comparable to that of the TimeGAN-generated data.
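The PCA portion of this comparison can be sketched as below: the components are fitted on the original data and both datasets are projected into the same 2-D map, so that overlapping scatter plots indicate matching distributions (t-SNE maps would typically be computed with an external library such as scikit-learn). The 3-D toy data are an assumption for illustration.

```python
import numpy as np

def fit_pca(X, k=2):
    """Top-k principal components of X via SVD of the centered data."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]

def project(X, mean, comps):
    """Project rows of X onto previously fitted components."""
    return (X - mean) @ comps.T

rng = np.random.default_rng(0)
original = rng.multivariate_normal(np.zeros(3), np.diag([3.0, 1.0, 0.1]), 500)
generated = original + 0.1 * rng.standard_normal(original.shape)  # toy stand-in

mean, comps = fit_pca(original)          # fit on the original data only
P_orig = project(original, mean, comps)  # 2-D map of the original data
P_gen = project(generated, mean, comps)  # generated data in the same map
```

Fitting on the original data alone keeps the map fixed, so any distribution mismatch of the generated data shows up directly in the scatter.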

FIGURE 22. Comparative PCA (left) and t-SNE (right) plots of TimeGAN (top) and IULTN (bottom) generated outputs.
FIGURE 23. Comparative MAE density maps of predictions with TimeGAN (left) and IULTN (right) generated outputs.
A quantitative comparison was performed using the train-on-synthetic, test-on-real (TSTR) technique with generic GRU-based assessment networks [16]. The first assessment network was trained to predict the closing stock price using training data extracted from the synthetic generation, with the trading volume data discarded. Next, a second assessment network with the same structure and dimensions as the first was trained using training data extracted from the original data. The prediction performance of both networks was tested using fresh test data extracted from the original data. The mean absolute errors (MAEs) of the predictions obtained with the two assessment networks, collected over 200 train-and-test runs, were used as the comparison metric. The prediction MAE densities obtained for the TimeGAN- and IULTN-generated data are shown in Figure 23. It is clear from the figure that the prediction MAE with the IULTN-generated data is closer to that obtained with the original data than that of TimeGAN. Specifically, the peak MAE of the IULTN case (0.0135) occurred in close proximity to that corresponding to the original data (0.0125). However, an accurate objective comparison with TimeGAN is not possible because the network dimensions and training were not fully optimized in either case.
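The TSTR protocol can be sketched as follows; a linear autoregressive predictor stands in for the GRU assessment network, and the lag, series lengths, and random-walk stand-in data are illustrative assumptions.

```python
import numpy as np

def tstr_mae(train_series, test_series, lag=3):
    """Train-on-Synthetic, Test-on-Real (TSTR): fit a one-step-ahead
    predictor on train_series and report its MAE on test_series.
    A linear autoregressive model stands in for the GRU assessment
    network used in the paper."""
    def windows(s):
        X = np.column_stack([s[i:len(s) - lag + i] for i in range(lag)])
        return X, s[lag:]
    Xtr, ytr = windows(np.asarray(train_series, dtype=float))
    w, *_ = np.linalg.lstsq(np.c_[Xtr, np.ones(len(Xtr))], ytr, rcond=None)
    Xte, yte = windows(np.asarray(test_series, dtype=float))
    pred = np.c_[Xte, np.ones(len(Xte))] @ w
    return float(np.mean(np.abs(pred - yte)))

rng = np.random.default_rng(0)
real = np.cumsum(rng.standard_normal(500))       # stand-in for real prices
synthetic = np.cumsum(rng.standard_normal(500))  # stand-in for generated data
mae_synth = tstr_mae(synthetic, real)        # train on synthetic, test on real
mae_real = tstr_mae(real[:250], real[250:])  # baseline: train on real
```

Collecting `mae_synth` and `mae_real` over many runs and comparing their densities mirrors the assessment in Figure 23: the closer the synthetic-trained MAE density is to the real-trained one, the better the generator.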
In summary, the generative performance of the IULTN compares well with that of TimeGAN, both qualitatively and quantitatively, with far fewer weights, a simpler training process, and lower computation times.

F. SMART GRID DATA MODELING
The objective of this exercise, similar to the task addressed in [18], is to synthetically generate energy consumption and solar generation data with the same probabilistic distribution as that of the Pecan Street dataset [17]. We observed that the solar generation data of multiple users with installed PV panels exhibit repetitive patterns with variations owing to the onset of clouds and rain. The energy consumption data also exhibit similar trends but are accompanied by irregular spikes caused by random consumption. The training data comprised hourly data of nine users in Austin collected over a year. We trained a 23-input, 22-output, 15/15-mode IULTN to perform autonomous generation through one-step prediction. The first 18 inputs and outputs correspond to the solar generation and energy consumption data of the nine users. The next four inputs/outputs are pilot sine waves representing the periodicity of the daily, weekly, and annual variations. A Gaussian noise seed is used as the final input. The autonomous generation capability of the network was tested using the pilot waves and a Gaussian noise seed as the only external inputs. Figure 24 shows the IULTN-generated solar generation and energy consumption data over a one-year period for a representative user, compared with the corresponding original data. Figure 25 shows a qualitative assessment of the IULTN-generated data using visualization plots based on PCA [28] and t-SNE [29] maps. It is apparent that the distribution of the IULTN-generated data closely follows that of the original data.
A quantitative comparison was performed using the TSTR technique with GRU-based assessment networks [16], similar to the one considered in Section E for stock data modeling. The two assessment networks were trained for one-step prediction, the first with the original data and the second with the IULTN-generated data. The MAEs of the solar and energy data predictions for the nine users, collected over 200 train-and-test runs, are shown in Figure 26. It is clear from Figure 26 that the MAE density of the predictions with the IULTN-generated data closely matches that of the original data. An objective comparison with the results of [18] is not possible because of the sparsity of network information. In summary, the autonomously generated IULTN data compare well with the original solar generation and energy consumption data, both qualitatively and quantitatively.

V. CONCLUSION
The main contributions of this study are as follows: (1) Activation function slope modulation with a recurrent ISAFN architecture, whose states and dimensions are decoupled from the ULTRNN for better control of memory retention. (2) A zigzag propagation algorithm for the weight updates that accounts for the state interactions in the IULTN. A truncation technique is presented that significantly reduces the cost of gradient computation without significantly affecting accuracy. (3) Closed-loop training with an eigenvalue metric of the linearized closed-loop system to ensure sustained output generation emulating the target. The effectiveness of the proposed approach was demonstrated with applications to synthetic data generation encompassing chaotic systems, complex periodic patterns, biosignals, and nonstationary processes.
We developed a novel method to dynamically vary the activation function slopes of a ULTRNN using a secondary network. Varying the memory retention over time allows the network to capture short-term features while continuing to remember long-term dependencies, a key requirement for the generative modeling of time series. We compared two architectures for the secondary network: the CSAFN and the ISAFN. The CSAFN is a feedforward architecture that uses the main ULTRNN states to compute the activation function slopes; this simplifies training by directly extending the backpropagation algorithm. The ISAFN uses a recurrent architecture with independent states to compute the activation function slopes; the interaction between the ISAFN and ULTRNN states makes the zigzag propagation training process more complex. The sharing of states in the CULTN architecture results in suboptimal performance, whereas the independence of states in the IULTN architecture allows for better optimization of the activation function slopes. The marginal stability of the triangular state-feedback submatrices of the ULTRNN ensures stable and robust training of the open-loop system, because the gradients are not allowed to either explode or vanish. In generative tasks, with the predicted output fed back as the input to generate the next prediction, it is essential to also ensure the marginal stability of the closed-loop system. An eigenvalue metric of the linearized closed-loop system was employed to ensure that the network behaved as a nonlinear oscillator whose output neither saturated nor collapsed. Maintaining the marginal stability of both the open- and closed-loop systems inherently allows robust and stable learning of the target features. Robust learning was further enhanced by noise injection, n-step prediction, and pilot waves.
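The closed-loop eigenvalue check described above can be sketched as below. The linearization form (output fed back through an input matrix) and the annulus radii are illustrative assumptions; the paper's LCLM includes the modulated activation slopes.

```python
import numpy as np

def closed_loop_eigs(A, B, C):
    """Eigenvalues of a linearized closed-loop map x+ = (A + B C) x,
    where the output C x is fed back through the input matrix B.
    Illustrative of the LCLM eigenvalue metric; the paper's exact
    linearization differs."""
    return np.linalg.eigvals(A + B @ C)

def in_annulus(eigs, r_in=0.9, r_out=1.05):
    """Marginal-stability check: all eigenvalues lie in an annular
    ring around the unit circle (the radii here are illustrative)."""
    m = np.abs(eigs)
    return bool(np.all((m >= r_in) & (m <= r_out)))

# toy system: a rotation block has both eigenvalues exactly on the unit
# circle, so the linearized map oscillates without collapsing or saturating
theta = 0.3
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
B = np.zeros((2, 1))
C = np.zeros((1, 2))
eigs = closed_loop_eigs(A, B, C)
print(in_annulus(eigs))  # True
```

Eigenvalues well inside the ring indicate an output that will collapse; eigenvalues outside it indicate saturation, which is why training constrains them to the annulus.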
The examples demonstrate that the autonomous waveforms generated by the IULTN have qualitative and quantitative features closely matching those of the target, and that the IULTN outperforms state-of-the-art neural network architectures with a smaller number of trainable weights and lower training times, as shown in Table V. Specifically, the IULTN uses only a very small fraction of the weights of deep GAN networks.