Numerical Simulations and FPGA Implementations of Fractional-Order Systems Based on Product Integration Rules

Product integration (PI) rules are well known numerical techniques that are used to solve differential equations of integer and, recently, fractional orders. Due to the high memory dependency of the PI rules used in solving fractional-order systems (FOS), their hardware implementation is very difficult and resources-demanding. In this paper, modified versions of the PI rules are introduced to facilitate their digital implementations. The studied rules are PI rectangular, PI trapezoidal, and predict-evaluate-correct-evaluate (PECE) rules. The three modified versions of the PI rules are validated using a benchmark system of differential equations for different sizes of the memory window to show the effect of the window size on the solution accuracy. Additionally, the three modified versions of the PI rules are used to simulate a novel fractional-order chaotic system (FOCS) where its bifurcation diagrams are discussed with the window length parameter. The chaotic system is then implemented on a Field Programmable Gate Array (FPGA). There are only a few trials in literature to implement the fractional PECE algorithm on FPGA, nevertheless, the proposed FPGA realization is compared with the most recent of these trials. The FPGA implementations of the three PI rules are made using Xilinx ISE 14.7 on Artix 7 kit. The achieved throughput are 1280 Mbits/sec for PI rectangular, 128.8 Mbits/sec for PI trapezoidal, and 129.12 Mbits/sec for fractional-order PECE.


I. INTRODUCTION
The origins of fractional calculus date back to the beginnings of calculus itself. The idea started in a correspondence in 1695 between Leibniz and L'Hopital, where they discussed the possibility of raising the differential operator to the power of 1/2 [1], [2]. Fractional calculus is the study of noninteger order derivatives and integrals where the order can be rational, real, or even complex [1]. Over the last few decades, researchers showed a huge interest in the study of FOS due to their flexibility and their ability to model systems with memory dependency [3]. Also, another advantage of fractional-order modeling is that it regenerates closer The associate editor coordinating the review of this manuscript and approving it for publication was Jun Shen . responses to the real system while maintaining more compact mathematical formulations than the integer-order counterparts [1]. This advantage is a direct result of the additional tunability attained by introducing fractional orders as new model parameters. Engineering application of FOS includes: bioengineering [4]- [6], control [7]- [10], filters [11], [12], oscillators [13], [14], energy [15], [16], encryption [17], and chaos [18]- [20].
Chaos is an interesting phenomenon that has a quasirandom behavior [21]. It presents nonlinear dynamical systems that are highly sensitive to initial conditions [22], where a minor change in input leads to a significant difference in the output [18]. Chaos theory focuses on systems such as the stock market, weather, neural networks, biology, security, and encryption [23], [24]. A chaotic system can be either VOLUME 8, 2020 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ continuous (usually called chaotic attractors) or discrete (usually called chaotic maps) [18], [25]. Lorenz first described the chaos theory in the 1960s by presenting a butterfly attractor while working on weather prediction [26]. In 1975, Li et al. have stated simple chaos models for describing similar behavior [27]. Lately, new categories of chaotic systems, self-excited, and hidden attractors have been proposed in [28], [29]. Chaotic systems can be implemented using analog integrated circuits or even at the transistor level [30], [31]. However, due to the non-linearity and to avoid deviation of the chaotic systems parameters, it is preferred to digitally implement the chaotic systems. Mainly, FPGA chips have been utilized in the realization of chaotic generators. FPGAs have a vital position in digital communications and encryption systems due to attributes like reprogrammability and high speed [32]. FPGAs intuitively grant rapid prototyping for Application-Specific Integrated Circuit (ASIC) in low to medium production [33]. The time for design-implement-test-debug cycle may take only hours not months as in ASIC, therefore, the modification in the design will be easier [34]. Also, low power consumption, low cost, easy modification, real-time computing, and high capacity are inherent merits of FPGAs [33]. Additionally, FPGA is characterized by its flexibility due to the predesigned Configurable Logic Blocks (CLB) that is used in industrial applications that match the aimed requirements [35], and the parallel structure which makes it appropriate for high-speed applications, furthermore it outruns microprocessors. Besides, FPGAs clock rates work in the hundreds of MegaHertz (MHz). According to the characteristics of FPGA, it is the most suitable technology for hardware realization of complex systems, that can surpass the microprocessor by a ratio of 100 to 1 based on the system implementation [36].
Fractional-order models increase the complexity of the dynamic behavior of systems. Accordingly, the alliance of chaos theory and fractional calculus to have FOCS, makes the behavior of the systems more complex. Therefore, these systems could be used in various applications such as synchronization and control [37], encryption [38], economics [39], and signal generators [40]. Evaluating solutions of fractional differential equations (FDEs) accurately and efficiently is much more complicated than the traditional integer-order case. Also, popular software packages don't provide this option as built-in functions [41]. So researchers in this field have to write their codes to solve fractional-order differential equations numerically [41]. The persistent memory of the fractional-order differ-integral operators is one of the most difficult problems faced when coding a numerical technique that solves FDEs due to being very computationally exhaustive [42]. Based on this, the numerical methods for solving FDEs are not ready, in its original form, to be implemented on hardware platforms like micro-controllers or FPGAs. So, some manipulations and trade-offs must be made to achieve this goal, like the use of the short memory principle [43], [44].
There are different implementations for fractional-order operators on FPGA. Two different approximations for fractional-order operators to realize FOS on FPGA were presented in [36]. Another approximation to control DC motor using a fractional-order PI controller was implemented on FPGA in [45]. Hardware implementation of the Grünwald-Letnikov (GL) differ-integral was introduced in [44] for different memory sizes on FPGA. Also, based on the GL operator, FPGA implementation of integrator/differentiator was presented in [7]. Furthermore, Infinite Impulse Response (IIR) was used to implement fractionalorder operators in [46]. Moreover, the LABVIEW environment was used to implement fractional-order operators using GL technique on FPGA, then connecting FPGA with an external circuit to realize fractional-order Proportional Integral Differential (FOPID) controller. Direct Torque Control (DTC) induction motor was controlled through the implemented controller on FPGA using MATLAB/Simulink tool in [47]. Also, FPGA implementation of digital controllers was realized for power electronics as Space Vector Pulse Width Modulation (SVPWM) using MATLAB/Simulink and HDL coder in [48]. In [49], chaos control of fractionalorder motor using sliding mode controllers was implemented on FPGA. The generation of double-scroll chaotic attractor based on fractional-order unstable dissipative systems was introduced in [50]. Various fractional-order chaotic oscillators were implemented on FPGA using the GL method [51]. Based on the auto-regressive filter, a simple design of frequency hopping chaotic generator was implemented on FPGA in [52]. Fractional-order Liu system was realized on FPGA with topological horseshoe analysis in [53].
Most of the time domain based FPGA implementations of chaotic systems that did not use MATLAB/ Simulink code generation were based on the GL fractional differ-integral and its associated numerical method [44]. The other time domain based numerical methods, to the best of the authors' knowledge, were not discussed for FPGA implementation. Hence, this paper discusses the modification of the product integration (PI) rules to prepare them for hardware implementation by employing the short memory principle. This paper presents three different FPGA implementations of a fractional-order multi-scroll attractor based on three different product integration rules. Modified versions of the product integration rules are proposed to allow the usage of the short memory principle, which facilitates the hardware implementation of the FOS on FPGA. Three different algorithms are used to realize the multi-scroll chaotic attractor on FPGA, which are: PI rectangular, PI trapezoidal, and the fractionalorder predictor-corrector scheme. The same approaches can be applied to implement other FOS on hardware.
The paper is organized as follows: section II presents the theoretical background of the numerical techniques used in solving FDEs, the proposed modified numerical techniques which are ready for FPGA implementation, and numerical validation of the proposed methods. The multi-scroll chaotic attractor used for hardware realization and FPGA implementation with its results of the three algorithms is discussed in Section III. Finally, the concluding remarks of this work are summarized in Section IV.

II. THEORETICAL BACKGROUND
The definitions of the fractional-order differ-integrals are numerous and ever-increasing. They have recently been categorized into four classes: the classical definitions (F1), the modified definitions (F2), the local definitions (F3), and the definitions with non-singular kernel (F4) [54]. The Caputo definition belongs to the F1 class and is given as [55]: It can be seen as the m − α order fractional integral of the m-th integer order derivative of the function f (t).
In practice, the Caputo definition is preferred over other definitions of the F1 class as it uses integer order initial condition in its Laplace transform while the others use fractionalorder ones [55]. This is due to the fact that researchers are accustomed to measuring integer order initial conditions rather than fractional-order ones.
The initial value problem for an FDE that is described by the Caputo derivative is given by [56]: where f (t, y(t)) is a continuous function and y 0 , y 0 , . . . , y (m−1) 0 are the set of integer order initial conditions. Applying the Riemann-Liouville fractional integral to both sides of Eqn.(2) yields [41]: where T m−1 [y; t 0 ](t) = y 0 for 0 < α < 1. Eqn. (2) is known as weakly singular Volterra integral equation (VIE). VIEs are a well studied topic and there are several theories and numerical techniques associated with them [41].
Step-by-step numerical techniques used for solving differential equations are either one-step or multi-step. In the former, the approximation result of the solution at only the previous step is used to evaluate the current step. Whereas, in the latter, more than one previous step is used to evaluate the current step. Fractional differential equations are known for their memory dependency, which makes choosing multistep methods for their solutions a must. These multi-step methods can be considered a convolution quadrature given in general as [41]: where φ n and c n are known coefficients that depend on the employed technique and t n = t 0 + nh is the equally spaced grid with h > 0. There are only two classes of multi-step methods that were studied in the literature: product integration rule and fractional linear multi-step methods [41]. This work is concerned with the PI rules only.

A. ORIGINAL PI RULES
Based on the PI rules, the solution of the Eqn. (3) can be written in a piece-wise manner as [56]: and approximated using polynomials to evaluate the integral. Different choices of the polynomials leads to different PI rules. Implicit or explicit rules are implemented based on the way the integral approximation is carried out. Implicit methods involves solving nonlinear equations as y n depends on f (t n , y n ). However, in explicit methods y n doesn't depend on f (t n , y n ) but on f (t n−1 , y n−1 ) and past approximates of f and y. Hence, explicit methods are better suited for hardware implementations.
If f (τ, y(τ )) in Eqn. (5) is approximated using a constant value in each sub-interval, two methods will result. The explicit PI rectangular rule is given by [41]: The implicit version of the PI rectangular rule is given by [41]: If f (τ, y(τ )) in Eqn. (5) is approximated using a first order polynomial in each sub-interval [56]: the implicit type trapezoidal rule will result and is given by: wherẽ Formulating an explicit variant of the trapezoidal PI rule is not a straight-forward procedure and is rarely encountered in literature. One of these variants is given as [57]: The predictor corrector scheme is the result of combining the rectangular PI explicit rule with the trapezoidal PI implicit rule as: This is done to avoid solving nonlinear equations in the implicit PI rule.

B. PROPOSED FPGA READY PI RULES
The coefficients in the original version of the PI rules are heavy-tailed, which leads to an indispensable dependency on almost all the previous calculated samples of the solution. This is a huge drawback when it comes to hardware realization. So, new expressions are proposed to reduce this long memory dependency and allow the hardware to store only a relatively small window of the past estimates. The relative magnitudes of the originally used coefficients and their suggested first-order differences are shown in Fig. 1. It depicts that the magnitude of the coefficient or its first-order difference for any value of α decreases as the iteration number increases except at α = 1 where they remain constant. Also, at the same iteration number, the magnitude of the coefficient or its first-order difference decreases as α increases. The main idea is to evaluate the expression y n − y n−1 for each of the explicit versions of the original PI rules. The difference between the coefficients at iteration n and at n − 1 is much less in magnitude than the coefficient itself. This reduces the dependency on the oldest values of y.
The proposed version of the PI rectangular rule is given by: where the first sample, at n = 1, is calculated using Eqn. (6).
A windowed version of the modified PI rectangular rule is given by:  where TR is the window size and the first TR + 1 samples are calculated using Eqn. (13). The modified version of the PI trapezoidal rule: where the first two samples, at n = 1 and n = 2, are calculated using Eqn. (11). A windowed version of the modified PI trapezoidal rule is given by: where the first TR + 2 samples are calculated using Eqn. (15). Due to hardware window limitations, the expressioñ a (α) n−1 is approximated by a rational function of the form: The rational function coefficients are extracted using MATLAB curve fitting toolbox. As mentioned before, the predictor-corrector scheme is a combination between the explicit PI rectangular rule and the implicit PI trapezoidal rule. The modified version of the PECE is given by: where the first two samples, at n = 1 and n = 2, are calculated using Eqn. (12). A windowed version of the modified predictor corrector scheme is given by: +h α b 0 f (t n−1 , y n−1 ), y n = y n−1 + h α (ã (α) n −ã where the first TR + 2 samples are calculated using Eqn. (18).

C. NUMERICAL VALIDATION OF THE PROPOSED METHODS
A set of five benchmark Caputo FDEs were proposed to compare performances of different numerical algorithms [58]. From these five problems, the following is chosen to benchmark the modified PI rules: where the initial conditions are x(0) = 1.0, y(0) = 0.5, and z(0) = 0.3. The analytical solutions are given by [58]: The benchmark results of Eqn. (20) using the modified numerical methods in Eqn. (14), Eqn. (16), and Eqn (19) and their relative errors are shown in Fig. 2, Fig. 3, and Fig. 4, respectively. The simulations are made between t = 0 and t = 5 at a step size h = 0.01 and different window sizes namely: TR = 50, 100, 200, 350, and 500(the total number of samples). The relative error levels vary significantly over the time interval of interest and between different implementations. The relative error levels in the modified PI rectangular implementation is higher than the modified PI trapezoidal rule, which in turn is higher than the modified PECE. In case of the modified PI rectangular and trapezoidal rules, the relative error drops down significantly around the position where the window effect takes place, at t = h * TR, and then rises up till it reaches its maximum value at the end of the interval (t = 5, n = 500), except for the case when the window is the same size as the studied time interval (window effect is removed, TR = n). However, in the case of the modified PECE rule, the relative error rises immediately at t = h * TR without going down as in the previous two cases.
The GL based approach is not included in the comparison because it gives complex values results when it is used to simulate the benchmark system in Eqn. 20. The GL approach is a generalization of the classical Euler method used in the simulation of the integer order system [41], [56]. This can be considered one of the main drawbacks of the GL approach due to its lower accuracy levels when compared with other approaches like the FO-PECE method for example [59].

III. FPGA IMPLEMENTATION OF MULTI-SCROLL CHAOTIC OSCILLATOR A. THEORETICAL ANALYSIS
In this section, the chaotic attractor used for hardware realization is discussed. Some of the chaotic attractors generate multi-scrolls, where they are related to a modified version of double-scroll attractors like Chua [60]. The employed chaotic attractor is defined as follows [61]: where q i ∈ (0, 1),  form as follows: where U = 24 and L = −24 are integers multiples of eight, represent the lower and upper bound of the step function. Also, q 1 , q 2 , and q 3 are the non-integer orders, and a = 1.25 and b = 0.75 are the system parameters.
The output of the floor function returns a maximum integer not greater than x/8. The equilibrium points of this system can be found by solving [61]: (24) VOLUME 8, 2020  The solution is found to be [61]: x * = y * = z * = 8i + 4, i ∈ {±3, ±2, ±1, 0}. (25) This means that the system under investigation has 7 3 equilibrium points. These points appear in the simulations as the center points of the scrolls where the attractor rotates about but never crosses them.
The proposed system can be solved using any of the three proposed methods; Product Integration Rectangular (PI-RECT) method, Product Integration Trapezoidal (PI-TRAP) method, and Product Integration Rectangular-Trapezoidal (PI-RECT-TRAP) method as previously discussed in Section II There are different types of data representation, fixedpoint or floating-point. Fixed-point is described as follows one bit for sign, bits for integer part and bits for the fractional part, while floating-point is divided as one bit for sign, bits for exponent and bits for mantissa [62]. Subsequently, the floating-point seems to be similar to scientific notation and more accurate, but fixed-point is commonly used for hardware implementations to save cost and increase the speed. Therefore, fixed-point representation is chosen to implement the proposed systems to save the hardware resources by maintaining the quality of the implementation. Figure 5 shows the overall block diagram describing the proposed hardware realization of the FOCS by PI-RECT or PI-TRAP methods. For implementing the proposed system on FPGA, three registers are needed to save the primary states of the system x, y, and z. Each state is computed by a combinational circuit using 32-bit fixed-point for each register, divided into 7-bit and 25-bit for the integer and fractional parts, respectively.
The nonlinear function of the system g(w) is implemented in the PWL form with three pieces as described by Eqn. (23);  therefore, a multiplexer is required to select the operation, where the selection lines depend on the value of w. The hardware realization of 8floor(w/8) in Eqn. (23) is executed by taking the four Most Significant Bits (MSBs) of the input of the function g(w) as shown in Fig.6.

B. RECTANGULAR ALGORITHM
In Fig.7, an illustration of the PI-RECT module is presented. The first values of the three main states of the system are initialized as (x = 0.01, y = 0.01, and z = 0.01). Then the second iteration is calculated as in Eqn. 6 For implementing the proposed algorithm, LUT1 (see Fig. 7) is used to store the difference of (b (α) as stated by Eqn.14 labeled by bn_diff in Fig.7.
Tx i is the combinational circuit required to compute the numerical solution of the system in Eqn.22. In Fig.7, Tx i is multiplied by the coefficients from bn_diff 0 to bn_diff n−1 . datax i in LUT2 is used to save the values of the adder from Tx i bn_diff 0 + datax i to Tx i bn_diff n−1 + datax n−1 . Then, the result from the adder is added to Tx i bn 0 , and multiplied by h q 1 . The output of every clock cycle is taken as Tx i bn_diff 0 + datax 0 .

C. TRAPEZOIDAL ALGORITHM
The first values of the three main states of the system are initialized as (x_c = 0.01, y_c = 0.01, and z_c = 0.01). Then, the second and the third iterations are implemented as in Eqn.11. Figure 8 shows the implementation of the proposed algorithm where three LUTs are employed. LUT1 is used to store the difference of (ã (α) n−j−1 ) as stated by Eqn. 16. LUT3 is used to store the values from Tx i an_diff 0 + datax i to Tx i an_diff n−1 + datax n−1 . Also, three registers are used to save Tx i−1 ,Tx i−2 , and Tx 1 .
After preparing the required LUTs (LUT1, LUT2, and LUT3) and registers (Tx i , Tx i−1 , and Tx i−2 ), the combinational circuit is ready to calculate the main states of the FOCS using the trapezoidal method.
The implementation of PI trapezoidal algorithm depends on the computed Tx i , where i represents the real-time, which makes PI-TRAP unrealizable on FPGA after the window size. Therefore, Eqn.17 is a curve fitted equation used after the   window size to make the PI-TRAP algorithm applicable on FPGA.

D. RECTANGULAR-TRAPEZOIDAL ALGORITHM
The Predict Evaluate Correct Evaluate (PECE) method is the combination between the two previous algorithms, PI-Rectangular and PI-Trapezoidal. For implementing the proposed algorithm, three LUTs are needed to store the coefficients of both rectangular and trapezoidal methods. 26-bit registers are used, divided into 2-bit and 24-bit for the integer and the fractional parts, respectively.
The first values of the three main states of the system are initialized as (x_c = 0.1, y_c = 0.1, and z_c = 0.2). The second iteration of this algorithm is calculated as in Eqn. 12.
After that the algorithm is as follows: Tx i is computed then transferred to the PI-RECT section. Then, it is added to the previously calculated value from the PI-TRAP algorithm. The calculated value then passes by the f (y) block to compute the Tx i+1 . Last, Tx i+1 passes by PI-TRAP and is added to the value of the previously calculated value from the PI-TRAP algorithm (x i−1 ).
For the PI-RECT-TRAP algorithm, g(w) is implemented as a function in the same module not as a separate module as in PI-RECT and PI-TRAP algorithms. Also, in Fig. 5, the PI-RECT/PI-TRAP part, and in Fig. 9, the PI-RECT and the PI-TRAP parts, are sections from the system module not separated ones.

E. RESULTS AND DISCUSSION
The proposed algorithms (PI-RECT, PI-TRAP, and PI-RECT-TRAP) are realized using Verilog hardware description language, Xilinx ISE 14.7, Pmod DAC2, digital Oscilloscope DPO 4104, and Xilinx FPGA Artix-7 XC7A100T. Each algorithm is verified using a testbench, by importing the data from RTL simulation and plotting it using MATLAB. An Oscilloscope is used to plot the fractional-order multiscroll chaotic attractor, where a digital to analog converter must be used to be connected to the JTAG interface of FPGA. This interface produces two serial data outputs, and each is 12-bit sent serially to the Pmod DAC2 that is connected to the oscilloscope for showing the chaotic attractor. Figure 10 shows the hardware setup of one of the proposed algorithms while Fig. 11 presents the x − y plane of the chaotic attractor generated using the proposed algorithms (PI-RECT, PI-TRAP, and PI-RECT-TRAP) for window size equal to 20. Hardware resources comparison between the three methods is presented in Table 1. Regarding the LUTs, the utilization percentages out of 63400 LUTs are 3.4%, 30%, and 33.06% for PI-RECT, PI-TRAP, and PI-RECT-TRAP, respectively. For the slice registers, the utilization percentages out of 126800 are 1.029%, 1.78%, and 1.99% for PI-RECT, PI-TRAP, and PI-RECT-TRAP, respectively. As Expected, the PI-RECT-TRAP implementation uses the most resources due to its complexity as it can be seen as a modified combination of the PI-RECT and PI-TRAP implementations.
The performance can also be measured using the throughput parameter, which represents the output data per second. It is calculated by multiplying the clock speed with the number of output bits (32-bit). Consequently, the achieved throughput are 1280 Mbits/sec for PI rectangular, 128.8 Mbits/sec for PI trapezoidal, and 129.12 Mbits/sec for fractional-order PECE. Tables 2, 3 and 4 present the chaotic system behavior at different values of the system parameters and window sizes. The color of each graph changes from dark blue to cyan as time progresses. All simulations in these tables are made at h = 0.0625 and simulation duration equals to t f = 10000 seconds. It can be seen that the newly introduced window size parameter can have a significant effect on the number and the size of scrolls generated during fixed simulation time and the stability of the system. The system response can change from chaotic to stable or from chaotic to unstable by only changing the window size.
Bifurcation diagrams in Table 5 are used to validate the effect of window size on the chaotic system behavior. From the window size perspective, three different cases are chosen to be discussed. For (α = 1, β = 0.8 γ = 1, a = 1.1 and b = 0.7), the system is chaotic for all window sizes (TR, TR_P, or TR_C) from 1 to 200, which is also seen from the bifurcation diagram in the first column of Table 5. Regarding the horizontal spaces in the bifurcation diagrams, in Table 5, they represent the equilibrium points of the system that are mentioned in Eqn. 25. For (α = 0.9, β = 0.8 γ = 1, a = 1.5 and b = 0.7), the system is chaotic when it is solved by PI-RECT method. However, in the case of PI-TRAP and PI-RECT-TRAP, the system is only chaotic for a limited range of TR_C and TR. Bifurcation diagrams, in Table 5, make it more obvious that after 20, when window size equals 23, the system starts to be unstable. Finally, For (α = 0.9, β = 0.8 γ = 1, a = 2.5 and b = 1.7), the system becomes stable as the window size increases.
When comparing the proposed design in this work with its recent counterpart introduced in [62], two important distinctions are found. The first one is the usage of memory. The authors in [62] used a memory of length 256 to represent the window size of the fractional operator while the proposed FPGA design in the current manuscript doesn't use memory at all. The second difference the latency. The total latency in [62] is 272 clock cycles which is very big compared to the latency of only 1 clock cycle observed in the proposed FPGA design.

IV. CONCLUSION
This work presented an FPGA realization of the fractionalorder operator based on the product integration rules, with a modification in the PI rules to make it suitable for hardware realization. A multi-scroll chaotic attractor was realized on FPGA using the proposed methods (PI rectangular, PI trapezoidal, and predict-evaluate-correct-evaluate). The results of the systems based on the three algorithms were presented on a digital Oscilloscope. A comparison between the three algorithms from the hardware resources perspective was presented. The proposed algorithms are generic methods that can be used to realize any FOS on FPGA. Future directions might include using the proposed FPGAready techniques in other applications like encryption. Also, the proposed modifications can be applied to the multicorrector numerical techniques.

ACKNOWLEDGMENT
Authors would like to thank Nile University for facilitating all procedures required to complete this study.