An Efficient and Reliable Chaos-Based IoT Security Core for UDP/IP Wireless Communication

The ultimate focus of this paper is to provide a hyperchaos-based reconfigurable platform for securing, in real time, communicating embedded systems interconnected in networks according to Internet of Things (IoT) standards. The platform's Register Transfer Level (RTL) architecture is developed entirely from scratch in the VHSIC Hardware Description Language (VHDL). The original idea is to exploit the nonlinearity of a discretized and optimized 4D Lorenz hyperchaotic system as an encryption keystream generator in a symmetric cryptosystem that secures wireless communicating embedded systems and is adapted to the UDP/IP protocol. Three essential steps were necessary to achieve this goal. First, a lightweight and energy-efficient hyperchaos-based encryption IP core dedicated to IoT device security, denoted the Hyperchaotic-based IoT Device Security Core (HC-IoT-DSC), is designed and implemented on an FPGA circuit. The encryption IP core combines three subsystems: a multiple-key-size hyperchaotic key generator (HC-KG), a hyperchaotic synchronization by dynamic feedback modulation technique (HCS-DFM), and an online FIPS 140-2-based built-in self-security test (BISST) module. Second, a secure UDP/IP stack is implemented entirely in VHDL. Third, the proposed architecture is integrated into real-world, real-time secure wireless communication over a distance of 2 km between two remote network nodes, employing the Xilinx ML605 FPGA platform and the ZigBee E800-DTU module. Extensive online and offline investigations and experiments were carried out to analyze, evaluate, and validate the robustness and security of the proposed scheme with respect to embedded system security.
Notably, the evaluations were conducted in two phases for all the platform components, before and after integrating the proposed security core into real-time wireless communication. The investigation and implementation findings show that the proposed architecture attains good performance and confirm the feasibility of the adopted approach for IoT applications. Furthermore, the timing and power efficiency results present an excellent trade-off between design performance and a high security level.

exchange of information on large public communication networks such as the Internet of Things [2].
The IoT links various objects with different models, capabilities, and qualities around the globe. The IoT has simplified daily life by merging the digital and physical worlds. The IoT network's rapid expansion and the widespread use of IoT devices blur the line between these worlds, exposing vast regions to potentially innovative attacks that traditional cybersecurity measures have not foreseen [3], [4]. In this setting, the major problem is to manage billions of things connected to diverse networks [5], [6]. This variety results in another significant problem, heterogeneity, which makes it challenging to deploy efficient cryptographic algorithms and protocols on all IoT ecosystem components [5], [6].
The resource constraints of the IoT and the vast number of deployed and linked devices, which increase the impact of heterogeneity and decrease scalability, complicate, if not make impossible, the direct implementation of advanced security procedures in many circumstances. The absence of authentication and authorization standards for IoT devices fosters malicious attacks on confidentiality and on network availability, such as denial-of-service (DoS) attacks [7]. Additionally, security problems significantly influence the safety of IoT devices, and several security concerns must be addressed. Therefore, the integration and interoperability of the IoT with various technologies provide an opportunity to rethink security principles around data collection, storage, and sharing to establish an inclusive, human-centered safe environment [8], [9]. Usually, not all security threats are apparent, and connectivity might have unexpected implications. Developing dependable and secure real-time systems that make the IoT worthwhile requires a robust security approach that ensures data privacy, confidentiality, integrity, and authentication, as well as the identification and trust of both digital and physical data sources.
Creating a complete set of IoT standards may cover networking, communication, and data management and contribute to general interoperability. Developing designs and prototypes contributes to reducing fragmentation in early IoT systems. More precisely, caution must be used in selecting and developing appropriate solutions. Today's technological choices will bind the IoT for a long time, and a lack of standardization may restrict security and usage alternatives. However, this constraint is projected to be eased soon [10]. As actions are taken in the following years, it will be critical to understand the origins of IoT devices and their security. Security by design is a method of developing software and hardware that incorporates security from inception, even if it results in additional expenses, rather than adding it after a cyber incident. The need for security by design has grown critical as technology firms continue to produce a flood of IoT devices for consumers and businesses. Most of these objects were created with no security features, making them ideal candidates for security vulnerabilities. Securing linked devices with a tiered security strategy, which we refer to as security by policy, is critical. This strategy aims to mitigate security risks using a diversified collection of independent security techniques applied at different levels of the IoT architecture.
The rapid IoT revolution requires that security mechanisms be continuously inspected and that new security paradigms be proposed [11], [12]. That said, we can safely say that the IoT will be a revolutionary technology if we can overcome its weaknesses concerning architecture, standardization, and security. Unfortunately, most current IoT solutions that rely on conventional cryptography architectures will soon reach their limits and cannot keep up with those complex technical challenges [13]. Therefore, we need new security techniques to address the IoT challenges appropriately. Alternative solutions, entirely different from standard cryptographic techniques, are currently of clear research interest in this context, and chaos-based cryptography is one of them.
Chaos-based cryptosystems and the IoT have become progressively prevalent over the past few years [14]-[19]. Many researchers worldwide are now trying to develop new ways of integrating chaos and the IoT to create highly secure but robust ecosystems and address technical and other issues. Chaos-based cryptosystems and the IoT as standalone architectures have already proved highly disruptive. However, it is easy to fall into the trap of modifying the architectures without effectively guaranteeing their operation, or of applying them to scenarios where the cost outweighs the improvement. More specifically, the union of the IoT and chaos-based cryptosystems seems convenient for both but raises various potential security and architectural challenges. Integrating chaos-based cryptosystems and the IoT should be analyzed carefully and approached with caution so that the two work together successfully, considering the challenges identified above. Beyond the security aspect, which affects both systems, research efforts should also ensure high performance regarding all factors of the embedded system, such as energy efficiency and high speed. Therefore, we argue that there is a pressing need for more extensive research into IoT security by applying chaos cryptography.
A dynamical system is chaotic if a significant portion of its phase space simultaneously presents the following two characteristics: the phenomenon of sensitivity to initial conditions and a strong recurrence. These two properties lead to a highly disordered behavior rightly qualified as chaotic. Chaotic systems' unpredictability and sensitivity to initial conditions have attracted the attention of academics involved in information security. Indeed, the chaotic system's unpredictability arises from the fact that a minor change in the initial conditions results in drastically different behavior. This feature may conceal data inside a chaotic signal. The deterministic nature of chaotic systems enables the generation of similar chaotic signals from identical initial values and control parameters. Using two identical chaotic systems as transmitter and receiver makes it conceivable to envision a communication that uses chaotic events to hide the communicated information. However, synchronization between two chaotic systems is challenging because of the extreme sensitivity to initial values.
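As a minimal illustration of these two properties, the following Python sketch iterates the one-dimensional logistic map. The map, the parameter r = 4, and the perturbation of 1e-10 are assumptions chosen for brevity; they are not the hyperchaotic system used in this work. Identical initial values reproduce the same signal, whereas a tiny change in the initial value quickly produces a completely different one.

```python
def logistic_orbit(x0, steps, r=4.0):
    """Iterate the logistic map x -> r*x*(1-x) and return the whole orbit."""
    orbit = [x0]
    for _ in range(steps):
        orbit.append(r * orbit[-1] * (1.0 - orbit[-1]))
    return orbit


# Transmitter and receiver started from identical initial values.
reference = logistic_orbit(0.4, 60)
replica = logistic_orbit(0.4, 60)

# Same system, initial value perturbed by 1e-10.
perturbed = logistic_orbit(0.4 + 1e-10, 60)

identical = reference == replica
largest_gap = max(abs(a - b) for a, b in zip(reference, perturbed))
print("identical orbits:", identical)
print("largest gap after perturbation:", largest_gap)
```

The determinism shown by the first comparison is what makes a matched transmitter and receiver conceivable, and the second comparison shows why synchronization between them is delicate.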
In addition to the concepts of chaotic systems, hyperchaotic systems have more degrees of freedom than chaotic systems and are therefore closer to the natural systems they model. Hyperchaotic systems are modeled by no fewer than four differential equations, which induce additional constraints (i.e., other mismatch parameters and other initial conditions) and give two or more positive Lyapunov exponents. Hyperchaotic systems present more complexity, a larger keyspace, and higher unpredictability than chaotic systems. Therefore, hyperchaotic systems are more suitable for chaos-based cryptosystem applications [20]-[22]. Several hyperchaotic systems have been introduced in the literature, such as the Rössler [23], Lorenz [24], Chen [25], and Liu [26] systems. The first hyperchaotic system, which the German Otto Rössler proposed, is related to the study of fluid flow; it follows from the Navier-Stokes equations. The mathematical model of this system was discovered as a result of work in chemical kinetics.
Recent studies categorize chaotic or hyperchaotic systems, either integer-order (IO) or fractional-order (FO), into two main classes: systems with self-excited attractors (CSSA) and systems with hidden attractors (CSHA) [27]. An attractor is self-excited if its attraction basin includes at least one equilibrium point. Otherwise, the attractor is defined as hidden. According to [27], self-excited attractors can be identified by applying a simple calculation, making them incapable of resisting attractor reconstruction attacks. Therefore, the CSSA can be easily attacked in secure applications. In contrast, the evaluation of equilibrium points of CSHA is arduous, which complicates the identification and localization of the hidden attractors. Despite this difficulty, the flexibility of the system behavior can be exploited, without changing parameters, by using appropriate control techniques to switch between distinct coexisting states [27]. Recently, great attention has been given to modeling, studying, designing, identifying, and controlling CSHA [27].
Chaotic systems in continuous time are represented by a set of ordinary differential equations (ODEs) that have a unique solution. This solution is defined as a trajectory in phase space. It has the property of never going through the same point in this space, i.e., chaotic signals are bounded without periodicity, valid for a physical phenomenon of exorbitant chaos such as the atmosphere. A chaotic system cannot repeat its dynamic evolution since it can be in an infinity of states, and it is impossible to reproduce the same state with exactness. This phenomenon is true in continuous space, but when chaotic systems are discretized with finite precision and applying numerical resolution methods, the probability of revisiting a point in the phase space becomes nonzero. In addition, the results obtained numerically no longer represent the same system of ODEs since they are not defined in the same space even though the global behavior remains identical [27], [28]. Therefore, this property transforms the infinite trajectory of the chaotic system into a set of closed and finite trajectories with different lengths.
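The effect of finite precision can be illustrated with a short Python sketch. It assumes a logistic map quantized onto a grid of 2^bits values (the map, the word length, and the function names are hypothetical choices for this example, not part of the proposed design). Because the state space is finite, every orbit must eventually revisit a state and then repeats a closed cycle whose length depends on the starting point.

```python
def quantized_logistic(x, bits):
    """Logistic map x -> 4x(1-x) on the integer grid {0, ..., 2**bits - 1}."""
    scale = 1 << bits
    nxt = (4 * x * (scale - x)) >> bits
    return min(nxt, scale - 1)  # clamp the single overflow case x = scale / 2


def tail_and_cycle(x0, bits):
    """Return (tail length, cycle length) of the orbit starting at x0."""
    seen = {}
    x, step = x0, 0
    while x not in seen:
        seen[x] = step          # remember when each state was first visited
        x = quantized_logistic(x, bits)
        step += 1
    return seen[x], step - seen[x]


for start in (1234, 777, 3001):
    tail, cycle = tail_and_cycle(start, bits=12)
    print(f"start={start}: tail={tail}, cycle={cycle}")
```

By the pigeonhole principle, the tail plus the cycle can never exceed the number of representable states, which is why longer word lengths are attractive for digital chaotic generators.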
Solving chaotic systems is feasible by applying various numerical techniques such as Forward and Backward Euler, Trapezoidal, fourth-order Runge-Kutta (RK4), Adams-Bashforth, and Adams-Moulton. Hardware rounding and truncation, as well as other sources of error (e.g., multistep, variable order, and variable step-size), are all factors that might affect the accuracy and convergence of numerical resolution algorithms [27], [28]. It remains challenging to choose the appropriate numerical approach for addressing a specific ODE problem. A similar difficulty is the estimation of the time-step, which aims to achieve the lowest possible error with respect to the exact solution while keeping the numerical method stable. Indeed, if the numerical technique and the time-step are not chosen appropriately, the solution might diverge, be incorrect, or have other undesirable computing consequences, leading to algorithm instability [27], [28].
In [28], Martín Alejandro Valencia-Ponce et al. showed that chaotic system behavior can be optimized by estimating the highest integration step in either one-step or multistep numerical methods. In particular, the authors perform a stability analysis of several numerical methods by applying three different metaheuristic algorithms. The results confirm that the Kaplan-Yorke dimension can be maximized while preserving the highest integration step. In the same context, Esteban Tlelo-Cuautle et al. in [27] perform a numerical simulation of IO and FO chaotic systems using one-step and multistep methods. The authors confirm that RK4 has the lowest error compared to other methods, and the Forward-Euler method generates the highest error. Therefore, many researchers apply RK4, which has a low probability of developing undesired effects such as computational chaos or superstability. Additionally, the authors confirm that the numerical method and the integration step are directly related to the hardware resource consumption and design performance.
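The difference in accuracy between the two methods can be checked on a problem with a known solution. The sketch below is an illustrative assumption rather than an experiment from [27] or [28]: it integrates the scalar test equation y' = -y, y(0) = 1 up to t = 1 with a deliberately coarse step h = 0.1 and compares both results with the exact value exp(-1).

```python
import math


def euler_step(f, t, y, h):
    """One Forward-Euler step."""
    return y + h * f(t, y)


def rk4_step(f, t, y, h):
    """One classical fourth-order Runge-Kutta step."""
    k0 = f(t, y)
    k1 = f(t + h / 2, y + h / 2 * k0)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h, y + h * k2)
    return y + h / 6 * (k0 + 2 * k1 + 2 * k2 + k3)


def integrate(step, f, y0, h, n):
    """Apply n steps of the given one-step method starting from t = 0."""
    t, y = 0.0, y0
    for _ in range(n):
        y = step(f, t, y, h)
        t += h
    return y


def decay(t, y):
    return -y


exact = math.exp(-1.0)
euler_error = abs(integrate(euler_step, decay, 1.0, 0.1, 10) - exact)
rk4_error = abs(integrate(rk4_step, decay, 1.0, 0.1, 10) - exact)
print("Forward Euler error:", euler_error)
print("RK4 error:", rk4_error)
```

RK4 needs four function evaluations per step instead of one, which is the accuracy versus hardware cost trade-off mentioned above.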
Chaos-based encryption suggests a new and efficient way of dealing with the problem of fast and highly secure data encryption. Many methods based on analog circuits are used to implement the chaotic behavior generators and the chaotic attractors associated with specific practical applications, such as switched capacitors or analog complementary metal-oxide-semiconductor (CMOS) technology. However, these methods exhibit some practical difficulties since the component values vary with age, temperature, etc. To overcome this problem, analog implementations can be enhanced using FPAAs to reduce mismatches using commercially available amplifiers. Additionally, one can infer that the design of integrated circuits is a challenge to develop lightweight cryptography applications suitable for hardware security for IoT [27]. Another approach performs digital implementation of chaotic generators since the problem of VOLUME 4, 2016 parameter mismatch does not exist. It provides accuracy and a significant possibility of integration into the embedded system, allowing for many embedded applications. The originality of this approach is that it will enable low-cost data encryption for embedded systems while still providing a good trade-off between performance and hardware resources. However, digital implementations suffer from the problem of degradation due to the use of finite precision to perform computer arithmetic operations. Esteban Tlelo-Cuautle et al. in [27] presented excellent detailed guidelines starting from numerical simulation to FPAAs or FPGAs implementations of IO or FO chaotic systems.
Chaos-based encryption provides a novel and efficient approach to encrypting data securely. Numerous analog technologies, such as switched capacitors or analog complementary metal-oxide-semiconductor (CMOS) technology, are used to build chaotic behavior generators and attractors associated with specific practical applications. These approaches, however, raise significant practical issues since component values fluctuate with age, temperature, and so on. To address this issue, analog implementations may be enhanced with FPAAs to reduce mismatches when commercially available amplifiers are used [27]. Another technique is to build chaotic generators digitally, which eliminates the issue of parameter mismatch. It delivers precision and a high degree of integration with the embedded system, enabling a wide variety of embedded applications. This solution is attractive because it allows for low-cost data encryption for embedded devices while maintaining an acceptable trade-off between performance and hardware resources. However, digital systems suffer from degradation because computer arithmetic operations are performed with finite precision. Esteban Tlelo-Cuautle et al. offered extensive guidance in [27] for implementing IO and FO chaotic systems using FPAAs or FPGAs. The authors acknowledge that implementing FO chaotic systems is much less straightforward than implementing IO chaotic systems, the difficulty lying in the approximation of the FO derivatives. Notably, the analog implementation of FO systems was realized by means of Laplace transfer functions that approximate the FO derivatives. However, it is necessary to scale the ODEs to obtain system parameters of the same order as the values of electronic components. In contrast, with FPGA technology, FO systems are implemented in the time domain using numerical methods.
In addition to the numerical resolution method and step-size selection, the authors emphasized another crucial factor to consider during the implementation: the arithmetic representation of numbers. In fact, contrary to floating-point arithmetic, fixed-point arithmetic consumes less memory bandwidth, provides faster speed, and attains higher power efficiency.
As explained above, a considerable body of pertinent research on chaos-based cryptosystems has recently been proposed to solve security issues in different applications, such as image encryption [21], [22], [29]-[44], video encryption [45]-[47], watermarking [48], speech encryption [49], [50], PRNGs [51]-[60], secure communications [61]-[67], wireless communication [68], conventional cryptography algorithms [69], [70], and FO analog and digital implementation [20]. Adopting different approaches, the established cryptosystem prototypes were realized on various hardware and software platforms such as microcontrollers, ARM processors, FPGAs, GPUs, and analog circuits [27]. Table 1 presents the related-literature review and analysis. We can reaffirm the vital role and effectiveness of chaos-based cryptosystems in securing information systems. Despite the variety of the offered solutions, they do not comply with the required security level for several reasons. Most proposed cryptosystems presented poor analysis, except for some [20], [27]. In other words, the other works offer only limited studies of the software, hardware, and security levels, even though they are intended for use in cryptosystems for embedded system security. Moreover, some related works are limited to the simulation phase and present only a randomness analysis with a statistical test suite applied to non-real-world data. Additionally, because of the high complexity at the architectural level, most of the proposed cryptosystems lack flexibility, reconfigurability, portability, and standardization. Therefore, updating or adapting these cryptosystems to other platforms or applications is highly complex. Moreover, most proposed cryptosystems have been developed for academic purposes and are unused in practice or real-world applications.

II. CONTRIBUTION
The present paper proposes a hyperchaos-based reconfigurable platform for securing, in real time, communicating embedded systems interconnected according to IoT standards. The designed platform is a modular RTL architecture developed entirely from scratch in the VHSIC Hardware Description Language (VHDL). The originality of the adopted encryption approach lies in using an optimized 4D hyperchaotic Lorenz system to construct a complex hyperchaotic pseudorandom number generator (HC-PRNG) that generates random data as encryption key matrices. In terms of design, the architectural study focuses on an adaptive layout using architectural optimization techniques to achieve a system embedded in an FPGA chip, which offers portability, easy adaptation, and reconfigurability with no technology constraints. Additionally, to consume less memory bandwidth, provide high throughput, and attain power efficiency, the hardware implementation uses a 32-bit fixed-point arithmetic model. To establish robust trust in the design, our strategy is to adopt multiple layers of security where risks are managed using diverse security mechanisms. On the one hand, this strategy allows sharing security responsibilities between all platform components. On the other hand, it completely isolates key generation and secrets from any software exposure at any point in time (i.e., hardware-based security to obtain strong device protection).

TABLE 1. Related-literature review and analysis.
[39] Real-time FPGA implementation of blur detection, compression, and chaotic encryption for image applications. The main aim is to perform blur detection, compression, and image encryption in parallel. The Advanced Encryption Standard (AES) algorithm and a modified Lorenz chaotic PRNG are merged to realize an efficient CBC mode.
[47] A real-time video encryption algorithm based on a customized AES that integrates Henon's chaotic map. A mix row function, the chaotic Henon function, and logical XOR are employed in place of the conventional shift row, the subbyte operations, and the multiple rounds of the original AES algorithm, which significantly speeds up the encryption-decryption processes.
[52] A hardware implementation of six chaotic pseudorandom number generators. This research aims to identify efficient strategies for eliminating or minimizing the multipliers and dividers in the hardware design of different pseudorandom number generators.
[58] A VHDL-based design model of a new method for constructing multiwing chaotic systems. The authors suggest that the 3D continuous Lorenz chaotic system can be amended by incorporating saw-tooth and sine functions. The proposed chaotic technique is used to build a novel chaos-based true random number generator (TRNG).
[20] A detailed FPGA implementation of six fractional-order chaotic systems. The implementation approach is based on the combination of the Grünwald-Letnikov numerical method and the short-memory technique under 32-bit fixed-point arithmetic. Mainly, the authors employ particular RAM and ROM blocks to design a reconfigurable architecture able to control the number of state variables and the length of memory.
[45] Hardware implementation of a multiwing chaos-based real-time secure video communication system. Two saw-tooth wave functions are integrated into the well-known chaotic Lorenz system to construct a new multiwing chaotic system. Euler's method is employed for the system discretization.
[50] A novel encryption solution for secure audio transmission that uses the fast Fourier transform (FFT) and a new 3D Lorenz-logistic chaotic system. The novel chaotic system generates a new dynamic behavior by combining the conventional 3D chaotic Lorenz and the 1D logistic map systems.
[59] A novel generalized pseudorandom number chaotic map based on the Newton complex map. The significant achievements of this research include the proposed model's capacity to create both integer and complex random numbers and its large and dynamic key size, which considerably improves security features.
[41] A novel chaotic system used to design an image encryption system on a mobile Raspberry Pi 3 Model B microcomputer. A modified chaotic system is designed to build an RNG. The RK4 method is implemented in Python with the Spyder IDE to solve the modified chaotic system.
[61] A method for secure communication based on a novel five-dimensional hyperchaotic system and its hardware implementation via a microcontroller unit (MCU). The proposed hyperchaotic system is discretized using the Euler resolution approach. Drive-response is the adopted synchronization mechanism.
[62] An improved chaos-based text cryptosystem for real-time embedded system applications implemented on a 32-bit microcontroller. The proposed cryptosystem is implemented with double-precision floating-point arithmetic on a 32-bit Freescale ColdFire microcontroller using the CodeWarrior software and the C language.
[64] A lightweight chaos-based image encryption method realized on the 32-bit Keil MCB2140 ARM development board. The authors use half of the key bits as initial values and the other half to randomly change the system's initial values.
[65] An innovative, lightweight, and efficient chaos-based cryptosystem for securing the network communication of low-resource nodes. The proposed cryptosystem is implemented on an 8-bit ATmega1281 microprocessor.

Our final solution is nothing more than achieving chaotic, secure wireless communication between two network nodes, with the following contributions:
• FPGA design of a lightweight and energy-efficient hyperchaos-based encryption IP core dedicated to IoT device security, termed the Hyperchaotic-based IoT Device Security Core (HC-IoT-DSC). The encryption IP core uses an HC-KG to generate pseudorandom key matrices of different sizes. Additionally, the designed IP core incorporates hyperchaotic synchronization by the dynamic feedback modulation technique (HCS-DFM). To guarantee online and continuous control of randomness quality, system availability, and security reliability, the proposed IP core integrates a FIPS 140-2-based built-in self-security test module (BISST). The BISST runs four online statistical tests while realizing an environment failure protection and testing mechanism (EFPTM). The EFPTM is implemented in such a way as to guarantee system functioning under three different security levels;
• FPGA design of a secure UDP/IP stack. The proposed stack affords a low design latency and multiple speeds of 10, 100, and 1000 Mb/s. In addition, the proposed UDP/IP interface is strengthened by several security measures, such as port number control, MAC address random configuration, private internal static routing table hardware configuration, and IP packet fragmentation deactivation. On the one hand, static routing uses less bandwidth and minimizes memory and computation resource consumption. On the other hand, this technique enables good security because the routes are always known, and any change in the network topology requires the intervention of a trusted authority. The UDP/IP medium can resist MAC address spoofing and fragmentation attacks;
• Realization of real-world and real-time secure wireless communication over a distance of 2 km between two remote network nodes employing the Xilinx ML605 FPGA platform and the ZigBee E800-DTU (Z2530-ETH-27) module;
• Extensive online and offline investigations and experiments carried out to analyze, evaluate, and validate the robustness and security of the proposed scheme with respect to embedded system security. Mainly, the evaluations were conducted for all the platform components in two phases, before and after integrating the proposed security core into real-time wireless communication.
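The UDP/IP hardening measures listed among the contributions (port number control, a static routing table, and fragmentation deactivation) can be sketched in software as a simple receive filter. The following Python model is only an illustration of the idea, not the VHDL stack itself: the port number, the peer address, the MAC value, and the function names are hypothetical, and the header offsets follow the standard IPv4 and UDP layouts.

```python
import struct

UDP_PROTOCOL = 17
MORE_FRAGMENTS = 0x2000        # MF flag in the IPv4 flags/fragment-offset field
FRAGMENT_OFFSET_MASK = 0x1FFF  # lower 13 bits hold the fragment offset

# Hypothetical configuration values, chosen only for this example.
ALLOWED_PORTS = {5000}
STATIC_ROUTES = {"192.168.1.10": "02:00:00:00:00:01"}  # trusted peer IP -> MAC


def build_ipv4_udp(src, dst, dst_port, payload, flags_frag=0):
    """Build a minimal IPv4/UDP packet (checksums left at zero) for the demo."""
    udp = struct.pack("!HHHH", 40000, dst_port, 8 + len(payload), 0) + payload
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + len(udp), 0, flags_frag,
                     64, UDP_PROTOCOL, 0,
                     bytes(int(part) for part in src.split(".")),
                     bytes(int(part) for part in dst.split(".")))
    return ip + udp


def accept_datagram(packet):
    """Return True only for unfragmented UDP from a known peer to an open port."""
    if len(packet) < 28:
        return False
    header_len = (packet[0] & 0x0F) * 4
    if header_len < 20 or len(packet) < header_len + 8:
        return False
    (flags_frag,) = struct.unpack_from("!H", packet, 6)
    if flags_frag & (MORE_FRAGMENTS | FRAGMENT_OFFSET_MASK):
        return False                      # fragmentation is deactivated
    if packet[9] != UDP_PROTOCOL:
        return False
    source = ".".join(str(byte) for byte in packet[12:16])
    if source not in STATIC_ROUTES:
        return False                      # peer absent from the static table
    (dst_port,) = struct.unpack_from("!H", packet, header_len + 2)
    return dst_port in ALLOWED_PORTS      # port number control


good = build_ipv4_udp("192.168.1.10", "192.168.1.20", 5000, b"data")
print(accept_datagram(good))
```

Dropping every fragment outright is the simplest way to model fragmentation deactivation; a real stack would also verify checksums and the MAC address bound to each route.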
The remainder of the article is structured as follows: Section III details the main steps to implement the proposed security core and the corresponding results and analysis. Then, the design of a secure UDP/IP stack is presented in Section IV. Moreover, performance analysis and connectivity tests of the UDP/IP interface are given. Section V is devoted to realizing secure wireless communication. Furthermore, the realization results, performance details, security analysis, and comparison are discussed. Finally, this research is concluded in Section VI.

III. SECURITY CORE DESIGN AND IMPLEMENTATION
This section presents the developed cryptosystem, a lightweight and energy-efficient hyperchaos-based encryption IP core implemented in an FPGA circuit and dedicated to IoT device security. The proposed security scheme combines three subsystems, an HC-KG (hyperchaotic key generator), a synchronization mechanism, and an online statistical test battery, as shown in Figure 1. First, we present the hardware architectures of the random number generator, the FIPS 140-2 statistical test battery [71], and the adopted synchronization technique. Second, to validate the security aspects of the proposed scheme, several experiments and tests were established to realize an online and offline evaluation of the proposed architecture. Finally, we conclude this section by presenting the obtained results and related interpretations.
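To give an idea of what such an online test battery checks, the following Python sketch models two of the statistical tests that FIPS 140-2 defines on a block of 20,000 bits, the monobit test and the long-run test. It is a software illustration under stated assumptions, not the BISST hardware: the function names are hypothetical, and the thresholds (9,725 < ones < 10,275, and no run of 26 or more identical bits) should be checked against the standard before any reuse.

```python
BLOCK_BITS = 20000  # block length on which the FIPS 140-2 tests operate


def _check_block(bits):
    if len(bits) != BLOCK_BITS:
        raise ValueError("the tests are defined on blocks of 20,000 bits")


def monobit_ok(bits):
    """Pass when the number of ones lies strictly between 9,725 and 10,275."""
    _check_block(bits)
    ones = sum(bits)
    return 9725 < ones < 10275


def long_run_ok(bits):
    """Pass when no run of identical bits reaches a length of 26."""
    _check_block(bits)
    longest = current = 0
    previous = None
    for bit in bits:
        current = current + 1 if bit == previous else 1
        previous = bit
        longest = max(longest, current)
    return longest < 26


alternating = [i % 2 for i in range(BLOCK_BITS)]
stuck_at_zero = [0] * BLOCK_BITS
print(monobit_ok(alternating), long_run_ok(alternating))
print(monobit_ok(stuck_at_zero), long_run_ok(stuck_at_zero))
```

An alternating pattern passes both checks even though it is obviously not random, which is why the battery combines several tests rather than relying on one.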

A. HYPERCHAOTIC ENCRYPTION KEY GENERATOR (HC-KG)
A cost-effective and optimum technique for constructing a hyperchaotic embedded system is to develop a customized hardware architecture that is compatible with a digital numerical resolution method. We may mention both one-step and multistep numerical resolution methods (i.e., Euler, Runge-Kutta, Adams-Bashforth, Adams-Moulton, etc.). In contrast to the Euler method, which is a first-order numerical procedure for solving differential equations with given starting conditions, the RK4 method yields more accurate solutions [20], [27], [28]. Indeed, in numerical analysis, the RK4 technique is an iterative numerical approach in which the initial estimate of the solution is used to produce a somewhat more exact second estimate, and so on [27], [72]. Additionally, as explained earlier in the introduction section, the hardware performance strongly depends on the numerical method, the step-size, and the selected arithmetic representation. In this regard, and considering our embedded ciphering application, the hardware implementation is written in VHDL using a structural description. This low-level design approach aims to solve an optimized 4D Lorenz hyperchaotic system (1) using the RK4 numerical resolution method to provide a more accurate approximation of the solution and fulfill the needs of onboard applications in terms of physical resource usage, power efficiency, and speed. The hyperchaotic Lorenz system is described by the following system of nonlinear equations:

\[
\begin{cases}
\dot{x} = F(x, y, z, w) \\
\dot{y} = G(x, y, z, w) \\
\dot{z} = Q(x, y, z, w) \\
\dot{w} = U(x, y, z, w)
\end{cases}
\tag{2}
\]

where $x(t_0) = x_0$, $y(t_0) = y_0$, $z(t_0) = z_0$, and $w(t_0) = w_0$, and $F$, $G$, $Q$, and $U$ are nonlinear functions. To solve the nonlinear equation system (2), the RK4 technique uses the following equation system:

\[
\begin{cases}
x_{n+1} = x_n + \frac{h}{6}\,(k_0 + 2k_1 + 2k_2 + k_3) \\
y_{n+1} = y_n + \frac{h}{6}\,(m_0 + 2m_1 + 2m_2 + m_3) \\
z_{n+1} = z_n + \frac{h}{6}\,(l_0 + 2l_1 + 2l_2 + l_3) \\
w_{n+1} = w_n + \frac{h}{6}\,(p_0 + 2p_1 + 2p_2 + p_3)
\end{cases}
\tag{3}
\]

where $h = 0.001$ is the discretization step and $k_i$, $m_i$, $l_i$, and $p_i$, with $i = 0, 1, 2, 3$, are the derivatives.
The $k_i$ derivatives are defined as follows:

\[
\begin{cases}
k_0 = F(x_n,\; y_n,\; z_n,\; w_n) \\
k_1 = F\!\left(x_n + \frac{h}{2}k_0,\; y_n + \frac{h}{2}m_0,\; z_n + \frac{h}{2}l_0,\; w_n + \frac{h}{2}p_0\right) \\
k_2 = F\!\left(x_n + \frac{h}{2}k_1,\; y_n + \frac{h}{2}m_1,\; z_n + \frac{h}{2}l_1,\; w_n + \frac{h}{2}p_1\right) \\
k_3 = F(x_n + h k_2,\; y_n + h m_2,\; z_n + h l_2,\; w_n + h p_2)
\end{cases}
\tag{4}
\]

where $x_n$ is an arbitrary starting point chosen at an arbitrary $t_n$, $k_0$ is the derivative at the start of the integration interval, $k_1$ and $k_2$ are the derivatives at the middle of the integration interval, and $k_3$ is the derivative at the endpoint of the integration interval. By analogy, the derivatives $m_i$, $l_i$, and $p_i$ are computed with $G$, $Q$, and $U$, respectively. Figure 2 depicts the RTL design that implements the RK4 numerical method. The proposed architecture depends on the control parameters a, b, c, and d, the integration step h = 0.001, and the functions F, G, Q, and U, as shown in Figure 3.
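The RK4 procedure described above can be modeled in a few lines of floating-point Python, which may help when checking an RTL implementation against a software reference. This is a sketch under stated assumptions, not the proposed VHDL design: the function names are hypothetical, and the example right-hand sides are one commonly cited 4D hyperchaotic Lorenz form with a = 10, b = 8/3, c = 28, and d = -1, which is not necessarily the optimized system used in this work.

```python
def rk4_step(F, G, Q, U, state, h=0.001):
    """Advance the state (x, y, z, w) by one RK4 step of size h."""

    def derivatives(s):
        return (F(*s), G(*s), Q(*s), U(*s))

    def advance(s, d, factor):
        return tuple(si + factor * di for si, di in zip(s, d))

    d0 = derivatives(state)                      # (k0, m0, l0, p0): interval start
    d1 = derivatives(advance(state, d0, h / 2))  # (k1, m1, l1, p1): midpoint
    d2 = derivatives(advance(state, d1, h / 2))  # (k2, m2, l2, p2): midpoint
    d3 = derivatives(advance(state, d2, h))      # (k3, m3, l3, p3): interval end
    return tuple(
        s + h / 6 * (a0 + 2 * a1 + 2 * a2 + a3)
        for s, a0, a1, a2, a3 in zip(state, d0, d1, d2, d3)
    )


# Assumed example system and parameters (illustration only).
A, B, C, D = 10.0, 8.0 / 3.0, 28.0, -1.0


def F(x, y, z, w):
    return A * (y - x) + w


def G(x, y, z, w):
    return C * x - y - x * z


def Q(x, y, z, w):
    return x * y - B * z


def U(x, y, z, w):
    return -y * z + D * w


state = rk4_step(F, G, Q, U, (1.0, 1.0, 1.0, 1.0))
print(state)
```

Each step evaluates the four right-hand sides four times, which is the workload that the hardware architecture has to schedule.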
It should be noted that the continuous Lorenz hyperchaotic variables are real numbers. To balance performance and cost, fixed-point arithmetic is used in a 32-bit format (10Q22), in which 10 bits encode the integer part and 22 bits encode the fractional part. Wherever possible, we have replaced the multiplication and division operations (the operations most used by the RK4 method) with left and right shift operations to reduce FPGA resource utilization. Consequently, the proposed design uses only six additions and seven multiplications per clock cycle. We therefore minimize the number of DSP modules, and only basic arithmetic operations are used in our implementation. This architectural optimization minimizes the consumed logic slices, leading to a high throughput rate and a low design latency.
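The 10Q22 representation and the shift-based scaling can be illustrated with the short sketch below. It assumes a 32-bit two's-complement word whose sign bit is counted among the 10 integer bits; the text does not state the sign convention, and the helper names are hypothetical.

```python
FRAC_BITS = 22                 # fractional bits of the 10Q22 format
WORD_BITS = 32                 # total word length
MASK = (1 << WORD_BITS) - 1


def to_fixed(value: float) -> int:
    """Encode a real number as a 32-bit two's-complement 10Q22 word (assumed layout)."""
    return int(round(value * (1 << FRAC_BITS))) & MASK


def to_signed(word: int) -> int:
    """Interpret a 32-bit word as a signed integer."""
    return word - (1 << WORD_BITS) if word & (1 << (WORD_BITS - 1)) else word


def from_fixed(word: int) -> float:
    """Decode a 10Q22 word back to a real number."""
    return to_signed(word) / (1 << FRAC_BITS)


def half(word: int) -> int:
    """Divide by two with an arithmetic right shift instead of a divider."""
    return (to_signed(word) >> 1) & MASK


def times_eight(word: int) -> int:
    """Multiply by eight with a left shift instead of a multiplier."""
    return (to_signed(word) << 3) & MASK


x0 = to_fixed(-10.0)           # initial value used in the simulations
print(from_fixed(half(x0)))    # -5.0
```

Only multiplications and divisions by powers of two map onto pure shifts; other constants still need a multiplier or a sum of shifted terms.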

2) Hardware implementation of the HC-KG
Implementing chaotic generators on FPGA suffers from the finite precision problem, which degrades the natural chaotic dynamics of the implemented generator. We propose an original solution to counter this problem (see Figure 1, HC Encryption Keys Generator). The basic idea is to increase the length of the fractional parts of the data while reducing that of the integer parts, and to use only the fractional parts to construct the encryption keys, hence the designation HC-RNG (Hyperchaotic Random Number Generator) for the proposed hyperchaos-based cryptographic key generator. Equation system (5) describes our approach for generating random keys, where $\mathrm{fract}(u) = |u| - \mathrm{En}(|u|)$ and $\mathrm{En}(|u|)$ represents the integer part of $|u|$.
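The fractional-part extraction can be sketched as follows. The function `fract_bits` follows the definition fract(u) = |u| − En(|u|) at 22-bit resolution. The packing in `key_word` is an assumption made for illustration: four 22-bit fractions happen to total 88 bits, which matches the key width quoted below, but the text does not specify how the key is assembled or in which order.

```python
FRAC_BITS = 22
FRAC_MASK = (1 << FRAC_BITS) - 1


def fract_bits(u: float) -> int:
    """Return the 22 fractional bits of |u|, i.e. fract(u) = |u| - En(|u|)."""
    scaled = int(abs(u) * (1 << FRAC_BITS))   # truncate to 22 fractional bits
    return scaled & FRAC_MASK                 # drop the integer part


def key_word(x: float, y: float, z: float, w: float) -> int:
    """Concatenate four 22-bit fractions into one 88-bit word (assumed packing order)."""
    word = 0
    for value in (x, y, z, w):
        word = (word << FRAC_BITS) | fract_bits(value)
    return word


print(hex(fract_bits(-3.5)))   # 0x200000, the fraction 0.5 on 22 bits
```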
The real-time implementation is realized using an FSM comprising four states (see Figure 4). A random key of 88 bits (11 bytes) is generated. Then, starting from this key, the KMP (Key Management Process) is launched in parallel to construct the other keys in different sizes (8, 16, 32, 64, 88, and 128 bits). The operations performed in each state are as follows:
• Initialization: this operation is performed at start-up and is not part of the FSM. The hyperchaotic system outputs are set to zero, and the initial conditions are assigned. During this phase, we use an asynchronous reset signal, active in the low logic state (reset = '0'), to put the whole generation process in an idle state by forcing all system parameters to their initial values. This prepares the first integration step of the RK4 method. The triggering of the key generation is conditioned by the external signal Sys_En.
• State 1: Calculation of the initial derivatives $k_0$, $m_0$, $l_0$ and $p_0$ (Equation (1), system (4)) as well as the intermediate points. The machine unconditionally switches to State 2 at the next clock edge.
• State 2: Calculation of the derivatives $k_1$, $m_1$, $l_1$ and $p_1$ at the midpoint of the integration interval (Equation (2), system (4)) and the intermediate points. The machine unconditionally switches to State 3 at the next clock edge.
• State 3: Calculation of the derivatives $k_2$, $m_2$, $l_2$ and $p_2$ at the midpoint of the integration interval (Equation (3), system (4)) and the intermediate points. The machine unconditionally switches to State 4 at the next clock edge.
• State 4: Calculation of the final derivatives $k_3$, $m_3$, $l_3$ and $p_3$ (Equation (4), system (4)) and the final intermediate points. In this state, if the requested key has been obtained, we stop the generation and return to the initial state; otherwise, we switch to State 1. All computed derivatives and the final intermediate points are delivered to the KMP, on the one hand to calculate the final solutions (Equation (3)) and on the other hand to extract the fractional parts (Equation (5)) and construct the cryptographic keys.
It is worth mentioning that another FSM state could have been used to realize all the tasks performed by the KMP. However, for optimization reasons and to give more architectural flexibility to the proposed key generator, it was preferable to add this process, which runs in parallel with the RK4 FSM. The advantage of this choice is the possibility of generating a multisize key (8, 16, 32, 64, 88, and 128 bits) in only four clock cycles. By comparison, [73]- [75] implement the same system and use the same resolution method, but the keys are delivered in a single size after 10, 8, and 6 clock cycles.
In addition, to ensure continuous verification and control of the randomness quality, the random number generator is attached to an onboard FIPS 140-2 test battery. Based on the test results, three operating modes are envisaged. The first mode is standard system operation under security level 1 (the generated keys pass all the tests). The second mode is operation under security level 2, in which the generated keys fail the FIPS 140-2 tests fewer than five times. In this situation, the generated keys are systematically rejected, and an internal zeroization process returns the generation and encryption processes to the initial state (back to initialization, reassigning the initial conditions' values, and waiting for Sys_En to restart the generation and encryption/decryption operations).
The third mode is the system dysfunction mode under security level 3, in which the generated keys fail the tests more than five times. In this mode, to protect the secret parameters of the key generator (initial conditions, values, and mismatch parameters) and the plaintext, a security alarm is generated, and an external interruption is necessary. The latter activates a finite loop that randomly changes the secret parameters (physical formatting), defeating any attempt to recover them. Furthermore, the only way to leave this mode and return to the normal operating mode is to reconfigure the FPGA board (reloading the FPGA configuration file). The next subsection details the development and implementation of the proposed FIPS 140-2-based BISST module.

B. HARDWARE IMPLEMENTATION OF THE BISST MODULE
Several sets of statistical tests exist in the literature; the best known are NIST SP800-22 [76], [77], DIEHARD [78], AIS20/31 [79]- [81], TestU01 [82], NIST SP800-90B [83], ENT [84], and FIPS 140-2. TestU01 is the most complete and demanding RNG test suite, including eight subbatteries with over 282 statistical tests. Except for FIPS 140-2, the hardware implementation of these test suites is very complicated and consumes substantial physical resources because they rely on several complex mathematical functions. Thus, in our work, four standard statistical tests are selected: the frequency test (FT), Poker test (PT), Runs test (RT), and Long-Run test (LRT). These tests are used in the FIPS 140-2 battery, and they are also included in the AIS20/31, NIST SP800-22, and TestU01 tests. Although FIPS 140-2 is less stringent than the other statistical test suites, its hardware implementation is very efficient, making it the best candidate for onboard and resource-constrained applications. These tests are applied to a binary sequence of 20,000 bits and can be summarized in two steps: the calculation of a statistical quantity and the comparison of this quantity to a predefined decision interval. We could not find in the literature a clear and detailed account of how the decision intervals are obtained. The following subsection details each of the four tests.

1) The proposed architecture of the BISST Module
FIPS 140-2 is used as a statistical test battery and, at the same time, as an environment failure protection and testing mechanism (EFPTM). In other words, FIPS 140-2 is used as a BISST (built-in self-security test) that guarantees the reliability and availability of our cryptosystem. The hardware implementation is organized in three processes: the frequency test, the Poker test, and the runs test combined with the long-run test. The FIPS 140-2 module takes the generated keys, constructs a binary sequence of 20,000 bits, and outputs the STR and S_ALRM flags. STR is the statistical test result indicator; it equals '1' if the tested sequences pass all tests (security level 1). STR is also used to launch an internal zeroization process that reinitializes the key generator and ends the data encryption if the tested sequences fail the statistical tests fewer than five times (security level 2). S_ALARM is a security alarm generated when the sequences fail the tests more than five times, which causes system dysfunction, and the key generator stops operating. Here, a zeroization process is activated using an external interruption signal (security level 3).
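The mapping from test failures to security levels can be summarized with the sketch below. It is a behavioral illustration only. Two points are assumptions: the failure count is treated as a simple running total, and exactly five failures is handled like "more than five", since the text does not say how that boundary case is treated. The type and field names are hypothetical.

```python
from dataclasses import dataclass

FAIL_LIMIT = 5  # threshold quoted in the text


@dataclass
class BisstFlags:
    level: int        # security level 1, 2 or 3
    str_flag: bool    # STR: True when the sequence passes all tests
    s_alarm: bool     # security alarm
    zeroize: str      # "none", "internal" or "external"


def bisst_flags(failures: int) -> BisstFlags:
    """Map a number of failed test rounds to the flags described above."""
    if failures < 0:
        raise ValueError("failures must be non-negative")
    if failures == 0:
        return BisstFlags(1, True, False, "none")
    if failures < FAIL_LIMIT:
        # Level 2: keys rejected, internal zeroization, wait for Sys_En.
        return BisstFlags(2, False, False, "internal")
    # Assumption: exactly FAIL_LIMIT failures is treated as level 3.
    return BisstFlags(3, False, True, "external")


print(bisst_flags(3).level)   # 2
```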

a: Frequency test
The FSM (see Figure 5) comprises four states. In the IDLE state, all outputs and internal signals are set to zero while we wait for data; we switch to the next state (Read) only if the reset signal equals '0' and data are available at the input. The Read state uses two counters: D_in_CT counts the number of bits in the binary sequence, and ones_CT counts the number of ones in the same sequence. Each time a data bit is read, D_in_CT is incremented and compared to 20,000. If D_in_CT equals 20,000 and the reset is '0', we pass to the Result state; otherwise, we keep incrementing until the value 20,000 is reached. In parallel, ones_CT is incremented only if the data bit is '1'. In the Result state, ones_CT is compared to the decision interval. If the relation 9,725 < ones_CT < 10,275 is verified, the test is passed, and the result output is set to '1'. Otherwise, the sequence fails the test, and the result is set to '0'. If reset is '0', we pass to the next state, Halt. Finally, in the Halt state, if the reset is '1', we set both D_in_CT and ones_CT to zero, and we pass to the IDLE state.
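A software equivalent of the frequency test is sketched below, using the decision interval quoted above. It models only the counting and the comparison, not the FSM handshake with the reset signal; the function name is hypothetical.

```python
SEQUENCE_BITS = 20000


def frequency_test(bits) -> bool:
    """Return True when the number of ones lies strictly inside (9725, 10275)."""
    bits = list(bits)
    if len(bits) != SEQUENCE_BITS:
        raise ValueError("the test is defined on exactly 20,000 bits")
    ones = sum(1 for b in bits if b)   # role of the ones_CT counter
    return 9725 < ones < 10275


print(frequency_test([1, 0] * 10000))  # True: exactly 10,000 ones
```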

b: Runs and Long Run tests
The same algorithm implements both the runs and long-run tests. Figure 6 shows the FSM diagram of these two tests. In total, there are 16 states: an initialization state INIT, 14 states for counting each type of run (ones1, ones2, ..., ones6, onesL and zeros1, zeros2, ..., zeros6, zerosL), and a final state FINAL. All system outputs and internal signals are zero during the INIT state. The value of the data bit conditions the transition to the next state: if it is '1', the next state is ones1; otherwise, we transition to zeros1. The INIT state is reached in two cases: either the asynchronous reset is set to '1' at any step of the test, or on return from the FINAL state. In state ones1, two counters are incremented, D_in_CT and ones1_CT, and D_in_CT is then compared to 20,000. If D_in_CT equals 20,000, we transition to the FINAL state. Otherwise, we switch to either ones2 (data bit is '1') or zeros1 (data bit is '0'), and so forth. As long as the data bit keeps the value '1', the machine moves from ones2 towards onesL until a '0' appears, at which point we pass to zeros1. Similarly, the machine moves from zeros1 towards zerosL as long as the data bit keeps the value '0' until a '1' appears, at which point the next state is ones1, and the same process is repeated.
We realize the runs test during states ones1 to ones6 and zeros1 to zeros6. In parallel with the incrementation of D_in_CT, the counters onesi_CT and zerosi_CT, with i ∈ {1, 2, ..., 5}, are incremented to count the runs of ones and zeros of each length in the tested binary sequence, respectively. The transition from one of these states to the FINAL state is realized only if D_in_CT = 20,000. In states ones6 and zeros6, both the long-run and runs tests are implemented. If the data bit holds the same value ('1' or '0'), a counter L_CT is incremented until it reaches 26; we then switch either to the onesL state (the data bit has held '1') or to the zerosL state (the data bit has held '0'). Otherwise, the counters ones6_CT and zeros6_CT are incremented to continue the execution of the runs test. In both situations, if D_in_CT = 20,000, we pass to the FINAL state. The decision is made in the FINAL state by comparing all the obtained counter values with predefined intervals.
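The counting performed by this FSM can be sketched in software as follows. Runs of length 6 or more share one bucket, and a run of 26 or more identical bits fails the long-run test, as described above. The decision intervals are passed in as a parameter because the text does not list them; treating them as inclusive bounds is an assumption, and the function names are hypothetical.

```python
from itertools import groupby

LONG_RUN = 26  # run length that triggers the long-run failure


def count_runs(bits):
    """Count runs of ones and zeros by length (6 means '6 or more') and the longest run."""
    ones = {length: 0 for length in range(1, 7)}
    zeros = {length: 0 for length in range(1, 7)}
    longest = 0
    for value, group in groupby(bits):
        length = sum(1 for _ in group)
        longest = max(longest, length)
        target = ones if value else zeros
        target[min(length, 6)] += 1
    return ones, zeros, longest


def runs_tests(bits, intervals):
    """Return (runs_ok, long_run_ok) for intervals given as {length: (low, high)}."""
    ones, zeros, longest = count_runs(bits)
    runs_ok = all(
        low <= counts[length] <= high
        for counts in (ones, zeros)
        for length, (low, high) in intervals.items()
    )
    return runs_ok, longest < LONG_RUN


ones, zeros, longest = count_runs([1, 1, 0, 1, 0, 0, 0])
print(ones[2], zeros[3], longest)   # 1 1 3
```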

c: Poker test
The FSM diagram of the Poker test is similar to that of the frequency test, with four states. In addition, we use 17 counters: D_in_CT, to count the number of bits in the binary test sequence, and 16 counters to count the number of occurrences of each 4-bit chunk combination (CT0, CT1, ..., CT15). In the IDLE state, all outputs, internal signals, and counters are set to zero while we wait for data; we switch to the next state (Read) only if the reset signal equals '0' and data are available at the input. In Read, we read the input in chunks of 4 bits. To this end, a 4-bit register is used to save the input bits. Once the register is full, the counter among CT0, CT1, ..., CT15 that corresponds to the stored combination is incremented. At the same time, D_in_CT is compared to 20,000; when it reaches this value, we transition to the Result state. In Result, we compute the quantity given by formula (6), which requires the sum of the squares of the counter values CTi (i ∈ {0, 1, ..., 15}). The obtained value is then compared with the decision interval of formula (7). If it lies within the interval, the test is passed, and the result output is set to '1'. Otherwise, the sequence fails the test, and the result is set to '0'. If reset is '0', we pass to the next state, Halt. In Halt, if the reset is '1', we reset all counters to zero and pass to the IDLE state.
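The Poker test computation can be sketched as below. Formulas (6) and (7) are not reproduced in the text, so the sketch assumes the usual FIPS 140-2 form: the statistic X = (16/5000)·Σ f(i)² − 5000 over the 5,000 four-bit chunks, with the test passed when 2.16 < X < 46.17. Both the formula and the bounds should be checked against the standard; the function names are hypothetical.

```python
def poker_statistic(bits) -> float:
    """Compute X = (16/k) * sum(f_i^2) - k over the k complete 4-bit chunks."""
    bits = [1 if b else 0 for b in bits]
    counts = [0] * 16                       # role of the counters CT0 ... CT15
    usable = len(bits) - len(bits) % 4
    for i in range(0, usable, 4):
        nibble = (bits[i] << 3) | (bits[i + 1] << 2) | (bits[i + 2] << 1) | bits[i + 3]
        counts[nibble] += 1
    k = sum(counts)
    return 16 / k * sum(c * c for c in counts) - k


def poker_test(bits) -> bool:
    """Assumed FIPS 140-2 decision interval for a 20,000-bit sequence."""
    return 2.16 < poker_statistic(bits) < 46.17


print(poker_test([0] * 20000))   # False: every chunk is 0000
```

A sequence that is too regular also fails: cycling through the 16 chunk values in order gives a statistic close to zero, below the lower bound.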

C. THE HCS-DFM SYNCHRONIZATION METHOD
In our implementation, only synchronization techniques that take the addition of user data into account can be used. CS-DFM (Chaotic Synchronization by Dynamic Feedback Modulation), developed by V. Milanovic and M.E. Zaghloul in [85], belongs to this class. In this subsection, we present our approach based on CS-DFM. We apply the same principle as CS-DFM but extend it to hyperchaotic systems; it is therefore called HCS-DFM (the H for Hyper). The difference is that HCS-DFM synchronization is specifically dedicated to hyperchaos-based secure communication. The synchronization uses a coupling between two identical hyperchaotic systems in a Master-Slave configuration. The signal transmitted to the slave system is a mixture of two signals: the information (plaintext) signal and the chaotic keys generated by the master system. This mixture is transmitted to the slave system and reinjected (feedback signal) into the master system. The objective is to create the same chaotic dynamics in both the Master and Slave systems, so that the Slave system generates the same key as the one used for encryption. To achieve this configuration, the master and slave systems are each decomposed into two subsystems (1 and 2), as shown in Figure 7. Thus, to recover the information, it suffices to perform the inverse "mixing" operation between the coupling signal and the decryption key generated by the slave system after synchronization. Note that this configuration can use any hyperchaotic signals or combinations of them to create the encryption/decryption keys. In all cases, synchronization is ensured. To verify this method, in what follows, we detail the mathematical modeling and the dynamic error stability proof of the proposed synchronization technique. Note that we assume the absence of noise in the transmission channel, and the mixing operation is a logic XOR.
We use the Lorenz hyperchaotic system defined by system (1); systems (8) and (9) describe the transmitter and receiver dynamics, respectively. Theorem. When the information i(t) is introduced, the dynamic error is stable if and only if systems (8) and (9) are globally asymptotically stable in the vicinity of the origin.
Proof. The dynamic error between the two systems is defined by e = T − R, where T = (x, y, z, w) and R = ($x_r$, $y_r$, $z_r$, $w_r$).
As announced, it is assumed that there is no noise, and that the signal m (t) arrives at the receiver without distortion. Accordingly, the parameters of the two systems are equivalent. The dynamics of the error between the two systems are then given by: Let E(e, t) be the Lyapunov function, such that the term m (t) disappears.
The derivative $\dot{E}(e, t)$ of the Lyapunov function is then computed. Parameters a and c are defined to be positive. When $t \to \infty$, the committed errors are zero or close to zero; then, the product $e_2(\tfrac{1}{4}e_2 - e_4)$ is close to zero. Therefore, the derivative $\dot{E}(e, t)$ is negative definite. This allows us to say that $e_3 \to 0$, $e_2 \to 0$, and $e_1 - \tfrac{1}{2}e_2 \to 0$, and hence $e_1 \to 0$. According to Barbalat's lemma (Lyapunov extension) [86], when $t \to \infty$ in system (8), $\dot{E}(e, t)$ is uniformly continuous once the variables of the system converge. We find that $e_1 = \dot{e}_1 = 0$, $e_2 = \dot{e}_2 = 0$, and $e_3 = \dot{e}_3 = 0$, so $e_4 = \dot{e}_4 = 0$. Finally, because the Lyapunov function is positive and its derivative is negative definite, the error e(t) converges to zero ($e(t) \to 0$ when $t \to \infty$), and therefore, the transmitter and receiver systems are globally asymptotically stable and synchronized for any data type.

IV. DESIGN OF A SECURE UDP/IP STACK
Our communication medium must conform to international standards in terms of Quality of Service (QoS), performance (i.e., throughput, low area, flexibility, reliability, and simplicity), and security design. To this end, an RTL design of a secure UDP/IP stack is fully implemented in VHDL. VHDL was chosen to achieve, on the one hand, a flexible and reconfigurable architecture (i.e., the concept of modular design), which allows easy adaptation in the future without dependence on the target hardware technology, and on the other hand, minimal risk in terms of security issues.
Our approach was mainly inspired by the design supplied by Xilinx in Virtex-6 FPGA Embedded Tri-Mode Ethernet MAC UG800 [87]. The Xilinx design only provides functional verification, without experimental performance evaluation or analysis. Therefore, our contributions are as follows:
• Give a detailed architecture with all points that could be modified or subject to possible optimization.
• Provide experimental tests, performance evaluation, and metrics analysis concerning occupied physical area or hardware resource consumption, throughput, and protocol efficiency.
• Adapt and integrate the UDP/IP stack into the proposed security core for IoT devices.
• Achieve simplicity and clarity of the design in compliance with international standards.
In our work, the implementations are carried out based on the criteria defined in the Open Systems Interconnection (OSI) model [88]. The User Datagram Protocol (UDP) is implemented according to IETF RFC 768 [89]- [92]. Additionally, the Internet Protocol version 4 (IPv4) is implemented as explained in IETF RFC 791 [93]. Furthermore, the Address Resolution Protocol (ARP), IETF RFC 826 [94], is used for the ARP core, while the Ethernet protocol IEEE 802.3 [95] is used to implement the Ethernet MAC. Figure 8 illustrates the relationship between the OSI model and our UDP/IP architecture. Figure 9 represents the block diagram of the implemented UDP/IP stack from the physical layer to the user interface. It mainly consists of a user interface (UI) and a UDP module composed of the UDP_TX, UDP_RX, and IPv4 modules. The latter comprises four subblocks: IPv4_TX, IPv4_RX, ARP, and Tx_arbitrator. The IPv4 core encapsulates the UDP datagrams in IP packets on transmission and extracts them on reception (multiplexing and demultiplexing) so that they can pass through the MAC layer of the FPGA platform. The Tx_arbitrator module controls and manages access to the MAC_TX channel when the ARP and IPv4 modules request a transmission simultaneously. The ARP module reads the MAC_RX data in parallel with the IPv4_RX path. It also manages the communication of ARP requests and the timeout if no response is received. The IPv4 module is developed to ignore every packet except the following: IPv4 packets, broadcast packets, and packets intended for our IP address. Once all these verifications are satisfied, the received header data are valid, and the module asserts the start of the reception.
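To make the header handling concrete, the sketch below shows three small pieces in software: the 16-bit one's-complement checksum used by the IPv4 header, an RFC 768 UDP header, and a receive filter. It is an illustration, not the VHDL design. The filter encodes one possible reading of the acceptance rule above (an IPv4 frame addressed either to our IP address or to the limited broadcast address), and the UDP checksum is left at zero, which RFC 768 permits over IPv4. All function names are hypothetical.

```python
import struct

ETHERTYPE_IPV4 = 0x0800
BROADCAST_IP = b"\xff\xff\xff\xff"


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum over data (checksum field set to zero)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:                       # fold the carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def udp_header(src_port: int, dst_port: int, payload: bytes) -> bytes:
    """RFC 768 header: ports, length (header + payload), checksum left at zero."""
    return struct.pack("!HHHH", src_port, dst_port, 8 + len(payload), 0)


def accept_packet(ethertype: int, dst_ip: bytes, our_ip: bytes) -> bool:
    """Assumed receive filter: IPv4 frames sent to our address or to broadcast."""
    if ethertype != ETHERTYPE_IPV4:
        return False
    return dst_ip == our_ip or dst_ip == BROADCAST_IP


header = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
print(hex(internet_checksum(header)))        # 0xb861
```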
The MAC interface is relatively straightforward, with separate clocks for the receiver and the transmitter. Each interface (RX and TX) has an 8-bit data bus. On the one hand, this interface is used to communicate with the ARP module over the AXI bus, and on the other hand, it is used to interface with the transmit engine and receive engine modules. In addition, the Xilinx MAC connects to the external Ethernet PHY (copper) via a Gigabit Media Independent Interface (GMII). Note that the MAC wrapper is originally provided with more subblocks, but to adapt it to our requirements and for optimization reasons, we have removed all the modules that are not needed in our application.
The UI shown in Figure 9 is used as a driver module for the UDP/IP core to control the transmission and reception processes, configure the IP and MAC addresses, and ensure traffic routing by means of an internal private routing table. In other words, this interface plays the role of an API in the application layer that allows communication with, and control of, the UDP stack. The first process is used for the static configuration of the FPGA's IP address. As a security measure, the MAC address of the platform changes randomly; the objective is to avoid MAC spoofing attacks. In parallel with this process, an FSM manages the transmission and reception of the data. First, the UDP header capture starts, during which data cannot be sent. Second, if the reception finishes with validation, we pass to user data transmission; otherwise, we must wait until the reception is completed. Finally, the transmitted data are routed according to a predefined routing table. Static routing is used, on the one hand, to consume less bandwidth and fewer resources (memory and computation). On the other hand, this technique offers good security because the routes are always known, and any change in the network topology requires the intervention of a trusted authority.

V. APPLICATION TO WIRELESS COMMUNICATION ENCRYPTION
This section presents our final solution: chaotic secure wireless communication between two network nodes, with the following main requirements:
• The adopted encryption solution is based on the use of the HC-KG for the generation of pseudorandom data as key encryption matrices;
• HCS-DFM is the chosen synchronization method;
• The FIPS 140-2-based BISST guarantees online and continuous control of the randomness quality of the generated keystream and of the security core reliability;
• To establish robust trust in the design (i.e., security by design), the main philosophy is to completely isolate key generation and secrets from any software exposure at any point in time (hardware-based security to obtain strong device protection);
• The UDP/IP stack is the selected communication interface;
• The encryption IP core can be interfaced with any IP (Internet Protocol) network.
These requirements could only be achieved with several tools and methods. In particular, the hardware-software codesign approach makes the overall architecture flexible and easily reconfigurable. Figure 10 represents the experimental setup of the wireless secure communication system, which is composed of:
1. A development ASUS workstation and a TOSHIBA laptop, containing the following tools:
• ISE-DS 14.7 software environment for VHDL programming and generation of configuration bitstreams;
• iMPACT tool for FPGA configuration;
• Eclipse IDE 2020-09 for Java development of the graphical interfaces to communicate between the PC and the FPGA board;
• Wireshark software for network packet analysis. This tool uses GTK+ for its user interface implementation and pcap for packet capture [96];
2. Two XILINX ML605 development and prototyping platforms based on Virtex6-XC6VLX240T;
3. Two ZigBee-to-Ethernet modules, Ebyte E800-DTU (Z2530-ETH-27);
4. Network router RG-EG210G-P;
5. Network switch RG-ES224GC.

A. CRYPTOSYSTEM FUNCTIONAL ARCHITECTURE
The developed cryptosystem architecture is shown in Fig [...] on the architecture presented in the previous section, except that its integration into the overall cryptosystem requires some modifications. These modifications consist of the addition of two states to its implemented FSM, as shown in Figure 4. The first state is added to avoid starting the generation of the encryption keys before synchronization is achieved and the message is fully received. The second state is added to manage the different security issues when internal or external zeroization is needed;
2. Key management module (KMM): this module handles the key matrix formatting and key size selection;
3. Data management module (DMM): this module interfaces the cryptosystem with the TRCM module and transfers the received text to the encryption or decryption operations;
4. Encrypt/Decrypt module (EDM): this module implements two cryptographic modes, encryption and decryption. It mainly uses an exclusive OR (XOR) to combine the key with the text or to separate them;
5. TX/RX Ctrl Module (TRCM) and UDP/IP stack: these modules schedule all data transmission and reception operations. They are realized using the architecture detailed in section II. Note that we have kept the same architecture, except for some modifications introduced to interface with the proposed hyperchaotic cryptosystem. The TRCM module is developed using two processes. The first is a combinatorial process that implements a Moore-type finite state machine, which manages the transmission/reception of data and the generation of control signals for the KMM and DMM modules. The second is a sequential process that serves as a control unit for the first process and for the routing of the UDP packets presented in Table 2 and Table 3.
Figure 12 shows the encryption/decryption flowchart of the developed cryptosystem. The whole process consists of five steps as follows:

B. ENCRYPTION AND DECRYPTION PROCESS
• Step 1: The cryptosystem is initialized by fixing the HC-KG initial values and control parameters;
• Step 2: Before any message exchange, a synchronization test is performed. If the two FPGA gateways are synchronized, the process goes to the encryption key generation step, and a synchronization flag is activated to start the message exchange; otherwise, we wait until synchronization is achieved;
• Step 3: In parallel with message reception, encryption/decryption key generation is launched;
• Step 4: During the encryption/decryption phase and according to the selected operating mode, the key generated in step 3 is used to either encrypt or decrypt the received message;
• Step 5: Another test is performed to compare the lengths of the generated key and the received message. If they are equal, we stop key generation, and the processed message is transmitted.
Note that if FIPS 140-2 (BISST) detects a system anomaly at any step of the encryption/decryption process, we should return to the initialization step, and all operations are interrupted.
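Steps 3 to 5 can be sketched in software as follows: keys are collected until the keystream is as long as the message, and the message is then combined with it by XOR, so applying the same operation twice recovers the plaintext. The key source `toy_keys` is a deliberately trivial, hypothetical placeholder that emits 11-byte blocks; it is not chaotic, not secure, and not the HC-KG. Synchronization and the BISST checks are outside the scope of this sketch.

```python
from itertools import count


def toy_keys():
    """Hypothetical placeholder key source yielding 11-byte (88-bit) blocks."""
    for i in count():
        yield bytes((i * 17 + j * 31) % 256 for j in range(11))


def collect_keystream(key_source, length: int) -> bytes:
    """Accumulate key blocks until the keystream matches the message length (step 5)."""
    stream = bytearray()
    for key in key_source:
        stream.extend(key)
        if len(stream) >= length:
            break
    return bytes(stream[:length])


def xor_bytes(message: bytes, keystream: bytes) -> bytes:
    """Encrypt or decrypt: one key byte per 8-bit message character (step 4)."""
    if len(keystream) < len(message):
        raise ValueError("keystream shorter than message")
    return bytes(m ^ k for m, k in zip(message, keystream))


plaintext = b"hello IoT node"
keystream = collect_keystream(toy_keys(), len(plaintext))
ciphertext = xor_bytes(plaintext, keystream)
recovered = xor_bytes(ciphertext, keystream)
```

Decryption only succeeds if the receiver regenerates exactly the same keystream, which is the role of the synchronization test in step 2.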

C. HC-KG SIMULATION RESULTS
Before the real-time measurement of our solution, we carried out two types of simulations. The first is a MATLAB simulation, which aims to investigate and validate the chaotic behavior of the proposed HC-KG. To this end, the HC-KG is discretized using the RK4 resolution method. Then, a bifurcation analysis is conducted to verify the behavior of the HC-KG and compare it to the original 4D Lorenz chaotic system. Moreover, the dynamical degradation effect is discussed through this simulation. The second is a functional simulation of the hardware architecture with ModelSim-SE 10.4. In all cases, the simulations and real-time measurements are applied to the 4D Lorenz chaotic system described by the nonlinear dynamic equations (1) with an integration step h = 0.001 and initial values $x_0 = y_0 = z_0 = w_0 = -10$. The hyperchaotic signals (x, y, z, w) are represented in 32-bit fixed-point arithmetic (10Q22), with 10 integer bits and 22 fractional bits. The random signals ($x_r$, $y_r$, $z_r$, $w_r$) are the fractional parts (22 bits) of the hyperchaotic signals. The RK4 simulation results are given in Figure 13 and Figure 14, while those of the functional simulations are depicted in Figure 15. The MATLAB simulation results are used as a reference for both the functional simulations and the real-time measurements. The results obtained by simulating the hardware are very similar to those obtained with MATLAB.

1) Bifurcation diagram
The bifurcation diagram is employed to study the different transitions to chaos of a nonlinear dynamic system. This type of diagram highlights the route leading to chaotic dynamics, namely, the period-doubling cascade. Figure 16 illustrates the behavior of the original 4D Lorenz chaotic system and its random variant, defined by equations (1) and (5), as a function of the parameter a. In particular, for a = 1/2, we observe a doubling of the period, called here a bifurcation. Before falling into chaos, there is a cascade of period doublings. After each period doubling, the previous periodic orbit is still present but unstable. A chaotic system, therefore, has an infinity of periodic orbits. From the bifurcation diagram of the original system, we can deduce the parameters that lead to the chaotic regime of the nonlinear system. In contrast, the bifurcation diagram of the random system does not present any period doubling, which confirms, on the one hand, the nonperiodicity of the system and, on the other hand, its random behavior.

2) Dynamical degradation effect
In this subsection, the dynamical degradation effect is discussed. In our study, two types of error are considered: discretization and computing precision errors. The discretization error is related to the approximation of the classical derivative of the continuous chaotic system, the numerical resolution method, and the choice of the sampling step. Furthermore, nonlinear systems are particularly sensitive to the type and the selected computing precision. Whatever the finite precision, the system's dynamics are always limited compared to its real behavior, and it is impossible to avoid truncation or rounding operations in the intermediate calculations.
In this simulation, we choose three configurations to solve the system. The first is to use the same resolution method and precision and change the step. The second is to use the same resolution method and step but change the accuracy. The last one is to use different resolution methods with the same precision and sampling step. The values in Table 4 are obtained by simulation, and these values clearly show that the error is inversely proportional to the sampling steps and the fraction part in the selected precision. Additionally, the RK4 error is smaller than the Euler error, confirming that the RK4 method is more accurate than the Euler method.
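The third configuration (different resolution methods with the same precision and sampling step) can be tried on a small scale with the sketch below. It compares the Euler and RK4 global errors on the scalar test equation dx/dt = -x, whose exact solution is known. This test equation is a hypothetical stand-in chosen for illustration, not the 4D Lorenz system, and the numbers it produces are not those of Table 4.

```python
import math


def euler_step(f, x: float, h: float) -> float:
    """One explicit Euler step."""
    return x + h * f(x)


def rk4_step(f, x: float, h: float) -> float:
    """One classical RK4 step for a scalar autonomous equation."""
    k0 = f(x)
    k1 = f(x + h / 2 * k0)
    k2 = f(x + h / 2 * k1)
    k3 = f(x + h * k2)
    return x + h / 6 * (k0 + 2 * k1 + 2 * k2 + k3)


def global_error(step, h: float, t_end: float = 1.0) -> float:
    """Absolute error at t_end for dx/dt = -x with x(0) = 1."""
    x = 1.0
    for _ in range(round(t_end / h)):
        x = step(lambda v: -v, x, h)
    return abs(x - math.exp(-t_end))


for h in (0.01, 0.001):
    print(h, global_error(euler_step, h), global_error(rk4_step, h))
```

On this test equation, the RK4 error is several orders of magnitude smaller than the Euler error at the same step, in line with the comparison reported above.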

D. REAL-TIME MEASUREMENTS
Comparing the real-time implementation findings, illustrated in Figure 17 and Figure 18, with those obtained by simulation, we find that they are very similar. The obtained results are satisfactory and confirm the feasibility of embedding the proposed security architecture in a real-time and optimized way by targeting a specific hardware system, such as FPGA technology. This validates our implementation method and the adopted approach to developing a novel 4D hyperchaotic-based security core.

E. EXPERIMENTAL SETUP
After developing our cryptosystem and presenting the simulation results, this section presents its FPGA implementation integrated into a real-world application that establishes a real-time and secure wireless message exchange between two network nodes. We begin with the presentation of the experiments, and we conclude with the presentation of the obtained hardware synthesis results and the related performance and security analysis. The basic idea is to carry out a message exchange between two nodes according to the network topology and IP address configuration of Figure 19. In this configuration, the FPGA platforms embedding the proposed cryptosystem architecture of Figure 20 operate as secure gateways. In other words, any information exchange can only be done through the FPGA gateways. In this experiment, all communications are realized in full-duplex mode as follows:
• Using GJI, client 1 sends a message to FPGA gateway (1);
• The FPGA gateway (1) encrypts client 1's message and sends it to ZigBee E800-DTU (1). The latter acts as a UDP client of FPGA gateway (1) and, at the same time, as a network coordinator for the ZigBee E800-DTU (2). Additionally, ZigBee E800-DTU (2) is configured to be a UDP client of FPGA gateway (2) and a terminal node of ZigBee E800-DTU (1). Therefore, all messages from FPGA gateway (1) are automatically transferred to ZigBee E800-DTU (2) through ZigBee E800-DTU (1) and then routed to FPGA gateway (2), and vice versa;
• The message received by FPGA gateway (2) is decrypted and sent to client 2 to be displayed on the GJI.
The second type of result relates to the functioning of our chaotic cryptosystem in a real-world application. The goal is to hide the plaintext in the hyperchaotic keys. Each plaintext character is encoded in 8 bits and encrypted by an 8-bit portion of the key. In this way, no information is lost, and more diffusion is obtained in the ciphertext.
By using the developed GJI, as shown in Figure 21, the communication functions properly and without losing information. Therefore, on the one hand, we can realize a messaging exchange between two network nodes and, on the other hand, validate the correct functioning of secure communication.
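The byte-wise principle described above can be illustrated with a minimal sketch. It assumes that each 8-bit plaintext character is combined with one 8-bit keystream portion by XOR, which this section does not state explicitly, and it replaces the hyperchaotic generator with Python's `random.Random` as a stand-in. The names `keystream`, `encrypt`, and `decrypt` are hypothetical.

```python
import random

def keystream(seed: int, n: int) -> bytes:
    # Stand-in 8-bit keystream: NOT the HC-KG and not cryptographically secure.
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(n))

def encrypt(plaintext: bytes, seed: int) -> bytes:
    # Assumption: one 8-bit key portion is XORed with each 8-bit character.
    ks = keystream(seed, len(plaintext))
    return bytes(p ^ k for p, k in zip(plaintext, ks))

def decrypt(ciphertext: bytes, seed: int) -> bytes:
    # XOR is its own inverse, so decryption reuses the same keystream.
    return encrypt(ciphertext, seed)

message = "Hello from client 1".encode("ascii")
cipher = encrypt(message, seed=2024)
recovered = decrypt(cipher, seed=2024)
print(cipher.hex(), recovered.decode("ascii"))
```

Because the ciphertext has the same length as the plaintext and the operation is invertible, no information is lost as long as both sides produce the same keystream.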
To better analyze the obtained results, in what follows, we present the captures of UDP packets provided by Wireshark 3.4.6 software [96]. For reasons of organization of the paper and to avoid repetition, we only give Wireshark captures for a single transmission from Client 2 to Client 1. In this experiment, another station is used to intercept the UDP traffic between Client 1 and Client 2. To do this, one of the ports of the switch RG-ES224GC is configured as a mirroring port and connected to the interception station. On the one hand, this configuration provides a duplicate of all the network traffic between Client 1 and Client 2, and on the other hand, the same traffic can be captured and analyzed on the interception station by using Wireshark. Figure 22 illustrates the packets captured during the communication between Client 2 and FPGA gateway 2 or between FPGA gateway 1 and Client 1. Note that those packets carry the message before encryption or after decryption, that is, plaintext in both cases. Consequently, the exchange between the clients and the FPGA platforms is always in clear mode. In contrast, the communication between the two FPGA platforms through the ZigBee modules is always in cipher mode, as shown in Figure 23. In other words, only the FPGA gateways handle text encryption and decryption. Consequently, the hardware-based security criterion is ensured by the proposed cryptosystem. This criterion guarantees strong system protection by isolating all secrets (i.e., key generation, encryption, and decryption) from any software exposure at any point in time.

F. CRYPTOSYSTEM SYNTHESIS RESULTS AND PERFORMANCE ANALYSIS
This part mainly concerns the implementation results and performance analysis to demonstrate the robustness of the proposed security core. Several tools were used to obtain these results: MATLAB 2020b software, ISE-DS 14.7 (Integrated Synthesis Environment-Design Suite), Eclipse IDE 2020-09 for Java developers, and ModelSim-SE 10.4 of Mentor Graphics. A panoply of investigations has been performed on the proposed design, according to the following:
• The occupied FPGA area is estimated based on the used Flip Flops (FFs), Lookup Tables (LUTs), Block Random Access Memory (BRAM), and Digital Signal Processing units (DSPs). More precisely, every slice in the Virtex-6 XC6VLX240T contains four (04) LUTs and eight (08) FFs; accordingly, the AS (Area Size) in terms of LS (Logic Slices) is $AS = (LUTs + FFs/2)/4$;
• Maximum Post Place and Route operating Frequency (MPRF) and throughput, which is defined as the number of bits per unit of time and can be formulated as $TP = O\_Size / O\_latency = O\_Size \times GF$ (Gb/s), where $O\_Size$ is the output size, $O\_latency = N\_cycles / MPRF$ (ns) is the delay to obtain a new output (output latency), $N\_cycles$ is the number of clock cycles needed to obtain one output (design latency; in our case, $O\_Size = 8$ bits and $N\_cycles = 1$), and the generation frequency $GF$ is the inverse of $O\_latency$;
• The power is evaluated at the register transfer abstraction level (RTAL) at the post place and route stage, considering all the hardware implementation details (physical constraints, placement and routing delays, device settings, ambient temperature, MPRF). Therefore, the power value is more accurate and closer to that measured when the FPGA circuit is configured. Using the Xilinx XPower Analyzer tool [97], both static and dynamic power can be estimated. Accordingly, in our work, only the dynamic power consumption is presented, with the following environmental conditions: industrial temperature grade (−65 to +125 °C), junction temperature 53.2 °C, small size board (4" × 4"), number of board layers (12 to 15), and power supply ($V_{ccint}$ = 1 V, $V_{ccaux}$ = 2.5 V);
• Design efficiency is based on two crucial criteria, timing efficiency and power efficiency, which can be expressed by $T_{eff} = MPRF / AS$ (MHz/LS) and $P_{eff} = Power / AS$ (mW/LS), respectively.
• Security analysis and randomness characteristics tests include keyspace and key sensitivity analysis, histogram, information entropy analysis, and statistical tests. Statistical tests comprise three test suites: NIST SP 800-22, TESTU01, and AIS20/31. Moreover, to ensure the correct application of these tests, each of the seven test suites has been developed and implemented under the MATLAB 2020b environment. Figure 24 presents the implemented architecture and experimental setup for real-time data collection for statistical tests. Note that the data used for randomness quality evaluation and security analyses are not issued from simulation but represent the raw random sequences (without postprocessing) generated physically (real-world data) after embedding the key generator on the Xilinx ML605 FPGA target. This data acquisition is carried out by using the hardware debugging tool ChipScope analyzer [98]. The data are captured using a trigger and stored in an internal buffer. A PC interface can collect the data stored in this buffer through a JTAG link. This method offers a good acquisition speed and allows an accurate evaluation of the random statistical characteristics of the generated keys and a clear understanding of the internal functioning of the designed security core.

1) HC-IoT-DSC synthesis results
Table 5 depicts, on the one hand, the synthesis results obtained after place and route and, on the other hand, the implementation performance analysis summary of the proposed HC-IoT-DSC. The implementation targeted the Xilinx ML605 FPGA platform (Virtex-6 XC6VLX240T). Note that in our implementation, the selected step size is h = 0.001. First, we discuss the results obtained for each submodule constituting the HC-IoT-DSC. Next, we present the total resource consumption and the maximum frequency of the entire system.
Regarding physical resources, the RK4 and PT submodules occupy about 84% of the total physical area of the proposed security core (737 out of 877 LS). In the same context, comparing the implementations of all the submodules, we find that only these two submodules require multiplication operations. Thus, the DSP blocks consumed in the security core are only those used by these two submodules. Consequently, they consume more than 65% of the total power (15.25 out of 23.35 mW). Another critical measure is the maximum operating frequency. The submodules occupying the smallest physical area operate at high frequency and consume less than 35% of the total power (8.1 out of 23.35 mW). In particular, the proposed FIPS 140-2-based BISST module can achieve high speed, allowing real-time analysis of the hyperchaotic keystream generator to detect any deviation from normal functioning. These synthesis results demonstrate that the proposed hyperchaotic-based security core can be easily and efficiently implemented on an FPGA target by using only 877 LS (2%) of the logic slices with 108 DSP blocks (15%) and no block RAM, at a maximum frequency of MPRF = 29.465 MHz and a total power of 23.35 mW.
To effectively evaluate the hardware implementation of the proposed security core, we use evaluation metrics directly related to the maximum operating frequency, namely the throughput rate and the time latency. Different throughput values are presented, corresponding to 8-, 16-, 16-, 22-, 32-, 64-, and 128-bit key lengths. However, the longest critical path (design latency) to generate one key, independent of the size, is four cycles at the maximum operating frequency (MPRF). From Table 5, we have reached the highest throughput of 0.2490 Gb/s for the 4D Lorenz hyperchaotic system, whereas the throughput of the keystream generator varies between a minimum of 0.0589 Gb/s (for 8-bit keys) and a maximum of 0.9424 Gb/s (for 128-bit keys). Additionally, the latency of the entire security system is 135.754 ns. As a result, it can be stated that the proposed security solution offers a good trade-off between high speed and low logic resources, which is very attractive for securing IoT communication systems.
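As a rough cross-check of these figures, the following sketch evaluates the throughput, latency, and efficiency definitions given earlier using the reported MPRF of 29.465 MHz, the four-cycle design latency, 877 LS, and 23.35 mW. The function and constant names are hypothetical, and the sketch only reflects the formulas as stated in the text.

```python
MPRF_HZ = 29.465e6   # reported maximum post place and route frequency
N_CYCLES = 4         # reported design latency in clock cycles
AREA_LS = 877        # reported area in logic slices
POWER_MW = 23.35     # reported total power in mW

def latency_ns(n_cycles=N_CYCLES, mprf_hz=MPRF_HZ):
    # O_latency = N_cycles / MPRF, expressed in nanoseconds.
    return n_cycles / mprf_hz * 1e9

def throughput_gbps(key_bits, n_cycles=N_CYCLES, mprf_hz=MPRF_HZ):
    # TP = O_Size / O_latency, expressed in Gb/s.
    return key_bits * mprf_hz / n_cycles / 1e9

timing_eff = MPRF_HZ / 1e6 / AREA_LS   # T_eff in MHz per logic slice
power_eff = POWER_MW / AREA_LS         # P_eff in mW per logic slice

for bits in (8, 128):
    print(f"{bits:3d}-bit keys: {throughput_gbps(bits):.4f} Gb/s")
print(f"latency: {latency_ns():.3f} ns")
print(f"T_eff: {timing_eff:.4f} MHz/LS, P_eff: {power_eff:.4f} mW/LS")
```

With these inputs the sketch yields about 0.0589 Gb/s for 8-bit keys and a latency of about 135.754 ns, in line with the reported values; small last-digit differences for the larger key sizes may come from rounding.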

2) UDP/IP stack synthesis results
This section reports the hardware realization results and performance analysis investigating the performance of the proposed UDP/IP stack. The design has been evaluated according to the following factors: the occupied FPGA area, MPRF, throughput TP, power consumption, timing efficiency, and power efficiency. Additionally, the different metrics related to the QoS aspect (i.e., transfer rate (TR) and transfer efficiency ratio (TER)) are specified.
In preparation for throughput and TER measurements, the Xilinx ML605 FPGA board is configured to transmit and receive a 1472-byte UDP datagram payload (DPL) with an overhead (DOH) of 28 bytes (8-byte UDP header and 20-byte IP header), which leads to a maximum packet length of 1500 bytes. Furthermore, a Java app was built to manage communication between the FPGA board and a personal computer (PC) equipped with a Realtek PCIe GbE family controller and an Intel Core i7-10700 CPU at 2.90 GHz running Windows 10 Pro. In addition to the UDP socket configuration, the Java app executes a sequence of operations, the last of which (step 6) is to compare the computed TRVS to the predicted transfer rate.
Implementing the UDP/IP architecture on the ML605 FPGA platforms has produced several results. The first result relates to the consumption of resources. As indicated in Table 6, the proposed architecture consumes few physical resources, with only 653 logic slices (1%), two BRAMs (1%), and no DSP blocks. This result also satisfies the constraint of an open, flexible, and simple hardware architecture, allowing the addition of possible future hardware modules (ICMP, TCP, DHCP). The low number of BRAMs and the absence of DSP blocks give the design high independence from the FPGA target. In other words, the developed solution offers good portability with no technology constraint.
The second result corresponds to the critical operating frequency and power consumption of the design. The UDP/IP stack can operate at a maximum frequency of 123.229 MHz (0.986 Gb/s), very close to the theoretical frequency of 125 MHz (1 Gb/s). This timing performance is more than sufficient for Ethernet-based communication at a maximum rate of 1 Gb/s. Additionally, it fulfills the constraint set at the start of the design. The low power consumption (40.62 mW) can be explained by the low physical cost and is confirmed by the obtained timing efficiency and power efficiency values.
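The correspondence between clock frequency and line rate quoted above can be checked with a short sketch. It assumes that one 8-bit word is transferred per clock cycle, which is consistent with the pairs 123.229 MHz / 0.986 Gb/s and 125 MHz / 1 Gb/s but is not stated explicitly in this section. The payload share is a simple byte ratio and is not the TER defined by the authors.

```python
def line_rate_gbps(clock_mhz, data_width_bits=8):
    # Assumption: one data word of `data_width_bits` bits per clock cycle.
    return clock_mhz * 1e6 * data_width_bits / 1e9

achieved = line_rate_gbps(123.229)   # reported maximum frequency
nominal = line_rate_gbps(125.0)      # theoretical Gigabit Ethernet clock
ratio = achieved / nominal

# Share of the 1500-byte packet that is UDP payload (1472 + 8 + 20 bytes).
payload_share = 1472 / (1472 + 8 + 20)

print(f"achieved: {achieved:.3f} Gb/s, nominal: {nominal:.3f} Gb/s")
print(f"ratio: {ratio:.4f}, payload share: {payload_share:.4f}")
```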
The third result is the QoS metrics benchmarking. This test aims to study the behavior of the UDP/IP stack for data packets of variable sizes. The obtained TR is 116.475 MB/s for transmission and 114.65 MB/s for reception, with a TER of 99.12% and 98.67%, respectively. Figure 25 illustrates a comparison between the predicted and measured values of the transfer rate TRVS. The obtained measurements indicate good agreement between those two values. These results show that our UDP/IP stack can be implemented on FPGA technology to provide high-speed communication.
We present the following comparative study to demonstrate that our architecture meets the requirements. This comparative study focuses on the consumed physical resources, timing and power analysis, communication features, and design properties. Table 7 reports a performance comparison between our implementation and other similar works ([99], [100], [101], and [102]), from which we can deduce the following. Our architecture ranks fourth among the compared architectures regarding the achieved throughput, with 0.986 Gb/s. The proposed UDP/IP stack consumes fewer FPGA resources in slices, BRAM, and DSP, with high flexibility and simplicity. Moreover, our UDP/IP stack is the only architecture that ensures multientry ARP functionality with a minimum of BRAM. Therefore, the designed stack offers good portability with minor modifications and high technology independence. These remarks show that our architecture and that proposed in [100] are the best candidates for onboard applications. Indeed, those two architectures present an almost complete system compared to the other designs with respect to the different comparison criteria and requirements. Furthermore, our stack is superior in terms of security. This property can be explained by the fact that, during our design, many security measures have been taken into consideration, such as MAC address and port number control, static routing configuration, and IP packet fragmentation deactivation.

3) The global cryptosystem synthesis results
The consumption of material resources in terms of slices and the critical operating frequency of the developed architecture are summarized in Table 8.

G. SECURITY ANALYSIS
Mainly, the effectiveness of a cryptosystem is reflected directly by its resistance level to different security attacks. Many MATLAB scripts have been implemented to process the experimental data using security analysis and performance metrics to evaluate this level. Such tests include key sensitivity, histogram, chi-square, differential attacks, correlation, floating frequency, and information entropy analyses. Moreover, all the investigations are discussed and compared to related works.

1) HC-KG key sensitivity analysis
The main characteristics of hyperchaotic systems are unpredictability and high sensitivity to slight variations in the initial conditions (IC) ($x_0$, $y_0$, $z_0$, and $w_0$) and in the mismatch or control parameters (MP) (a, b, c, and d). Indeed, the unpredictability property comes from the fact that a minimal variation in the IC and/or in the MP induces a radically different evolution in the dynamics of the hyperchaotic system. Accordingly, the proposed random generator should provide different keys even if it uses very close IC or MP as secret keys or seed values. To this end, an experiment was conducted to measure the influence of a slight change at the least significant bit positions on the resemblance of the generated keys.
In Figure 26, we illustrate the first thirty (30) 8-bit keys in the random trajectory of the proposed HC-KG obtained with three similar secret keys (see Table 9) that differ by a one-bit change (equivalent to $10^{-16}$). The three trajectories behave differently. Thus, the proposed HC-KG is very sensitive even to a bit-level change in the secret keys. This is not a disadvantage but an advantage that makes hyperchaotic systems good candidates for multiuser communications. Indeed, we can generate an infinity of encryption keys from a given hyperchaotic generator; it suffices to slightly modify the values of its parameters.

2) HC-KG keyspace analysis
In this subsection, the key size is analyzed from two different angles. The first consists of determining the key size from the coding of the initial conditions and control parameters. The second is to deduce the key size from the key sensitivity analysis of the previous subsection. Indeed, the dynamics of the proposed HC-KG depend on eight different 32-bit values, namely four initial conditions ($x_0$, $y_0$, $z_0$, and $w_0$) and four control parameters (a, b, c, and d). Each of these values can take any of $2^{32}$ values, and the secret key is formed by all eight of them. Hence, this gives a key size of $(2^{32})^8 = 2^{256}$. The key sensitivity analysis determined that the proposed HC-KG is very sensitive to any change equal to $10^{-16}$. Thus, the keyspace per value is larger than $10^{16}$. Similarly, the secret key is formed by eight such values. Therefore, the key size is $(10^{16})^8 = 10^{128} \approx 2^{425}$.
According to the Advanced Encryption Standard (AES), a random number generator is resilient against brute force attacks if it has a keyspace of secret keys larger than $2^{128}$. Comparing the obtained keyspaces to this criterion, the keyspace of the proposed HC-KG is large enough to resist exhaustive attacks. A comparison of the keyspace is given in Table 10.
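The two keyspace estimates can be verified with a few lines of arithmetic. The sketch below only restates the numbers given in the text (eight values, 32 bits each, sensitivity of $10^{-16}$); the variable names are hypothetical.

```python
import math

BITS_PER_VALUE = 32
N_VALUES = 8   # x0, y0, z0, w0, a, b, c, d

# First angle: coding of the initial conditions and control parameters.
keyspace_coding = (2 ** BITS_PER_VALUE) ** N_VALUES
bits_coding = math.log2(keyspace_coding)

# Second angle: sensitivity of 1e-16 per value, i.e. 10**16 values each.
keyspace_sensitivity = (10 ** 16) ** N_VALUES
bits_sensitivity = N_VALUES * 16 * math.log2(10)

REFERENCE_BITS = 128   # brute-force threshold quoted in the text

print(f"coding: 2^{bits_coding:.0f}")
print(f"sensitivity: 10^128, about 2^{bits_sensitivity:.1f}")
print("above threshold:", min(bits_coding, bits_sensitivity) > REFERENCE_BITS)
```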

3) HC-KG histogram analysis
The uniform distribution of the generated random sequences is an essential factor regarding security. In other words, each element of a random sequence should occur with the same frequency. One of the most commonly used methods for verifying this property is the histogram. Figure 27 illustrates the histograms of three random sequences of 1 Mbit each (131,072 integers of 8 bits), generated by using the secret keys of Table 9. It can be observed that the histograms are uniform and notably different. Therefore, the proposed HC-KG can generate uniform random sequences and is resilient against statistical analysis attacks.

4) HC-KG Information Entropy analysis
In this test, the entropy is computed using Maurer's universal statistical algorithm, a compression-type test detailed in [86], [107]. Furthermore, the same algorithm is adopted by the statistical test batteries NIST SP 800-22 and AIS20/31. This algorithm requires a sequence of n bits, which is divided into two chunks: $Q$ ($\geq 10 \cdot 2^L$) initialization blocks and $K$ ($\approx 10 \cdot 2^L$) test blocks, with $6 \leq L \leq 16$ and $K = \lfloor n/L \rfloor - Q$. Each chunk is a set of L-bit blocks or templates (in our case, L = 8). Next, we sweep the whole sequence block by block, looking for the closest preceding exact match of each L-bit template and recording the distance as a number of blocks. Then, we compute the $\log_2$ of all distances for all the L-bit templates within the test blocks. More precisely, this effectively gives the number of digits in the binary expansion of each distance [108]-[110]. Finally, we average all the expansion lengths over the number of test blocks. The information entropy is given by this average, where k denotes the number of indices since the previous occurrence of the ith template. In this experiment, to realize an in-depth analysis of the information entropy H, the latter is measured at different operating frequencies varying from 1 to 29 MHz. To this end, the secret key 1 of Table 9 is used to generate 23 random sequences of $4 \cdot 10^6$ bits each (524,288 integers of 8 bits) at 23 different frequency values. Figure 28 shows the obtained entropy measurements. We remark that the entropy level is nearly uniform and stable for all the studied frequencies, with an average of H = 7.9873. Therefore, the randomness aspect is validated, and the proposed HC-KG is secure against information entropy attacks.
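The block-matching procedure described above can be sketched as follows. This is a minimal version of the raw Maurer statistic as formulated in NIST SP 800-22, operating directly on L-bit block values; it is not the authors' MATLAB implementation, and the function name is hypothetical. For an ideal random source and L = 8, the expected value of this raw statistic is about 7.1837, so its output should not be compared directly with the reported H, whose exact normalization is not given in this section.

```python
import math
import random

def maurer_statistic(blocks, L=8, Q=None):
    # blocks: sequence of integers in [0, 2**L), one per L-bit block.
    if Q is None:
        Q = 10 * 2 ** L            # number of initialization blocks
    K = len(blocks) - Q            # number of test blocks
    if K <= 0:
        raise ValueError("sequence too short for the chosen Q")
    last_seen = [0] * (2 ** L)     # last index at which each template occurred
    for i in range(1, Q + 1):      # initialization segment
        last_seen[blocks[i - 1]] = i
    total = 0.0
    for i in range(Q + 1, Q + K + 1):   # test segment
        b = blocks[i - 1]
        total += math.log2(i - last_seen[b])   # distance to previous match
        last_seen[b] = i
    return total / K

# Stand-in data from Python's PRNG (assumption: not keys from the HC-KG).
rng = random.Random(12345)
sample = [rng.getrandbits(8) for _ in range(2560 + 20000)]
print(f"f_n = {maurer_statistic(sample):.4f}")
```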

5) Randomness characteristics analysis of the HC-KG
In this section, to evaluate the randomness aspect of the proposed HC-KG, a variety of random sequences is generated and tested using NIST SP800-22, TESTU01, and AIS20/31.

a: NIST SP800-22 tests
A statistical test consists of stating a hypothesis concerning a set of data and then checking whether the obtained observations are plausible within the framework of this hypothesis. The hypothesis to be tested is called the null hypothesis $H_0$. It is necessarily accompanied by its alternative hypothesis, called $H_a$. Hypothesis $H_0$ is the one we are trying to refute, the one that is "true" until we prove the contrary. The hypothesis $H_a$, contrary to $H_0$, is the one we seek to demonstrate. For each test, the result leads to a decision: accept or reject $H_0$. To reach an objective decision, first, the null hypothesis $H_0$ is established together with its alternative hypothesis $H_a$. Then, an appropriate statistical test with a significance level α is specified. Next, the sampling distribution of the test statistic under $H_0$ is determined. Based on the previous steps, the rejection region is defined. Finally, the value of the test statistic is computed from the sample data. In our case, the hypothesis $H_0$ is that the tested sequence is random, which induces $H_a$: the tested sequence is not random. The appropriate choice of statistical tests for testing $H_0$ is the NIST SP800-22 test series. The significance level α of the test represents the probability that hypothesis $H_0$ is rejected when it should have been accepted (type I error). In practice, an upper limit is set for α, most often 5% (significant), 1% (very significant), or 0.1% (highly significant). More precisely, our approach evaluates the distribution of the P_Values for each test in NIST SP800-22 by calculating the P_ValueT. If this value is less than 0.1% (highly significant decision threshold), then the conclusion is that the sequence does not satisfy the corresponding randomness criterion. Otherwise, we can consider the sequence to be random and uniformly distributed.
As the proposed HC-KG can generate encryption keys of different sizes, for each key size, our analysis was carried out on 1073 sequences of one million bits (1,000,000 bits) for all tests. Table 11 summarizes the P_ValueT used to examine the distribution of the P_Values of each test. As a reminder, this value must be less than 0.1% to conclude that the sequence does not meet the uniform distribution criterion. As can be observed from Table 11, all the generated keys pass the statistical tests. Therefore, we confirm the randomness aspect of the generated keys, and the proposed HC-KG has an excellent statistical performance with respect to the NIST SP800-22 standard.
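The P_ValueT decision can be sketched as follows, using the procedure described in NIST SP 800-22: the P_Values of one test are counted in ten equal bins over [0, 1], a chi-square statistic is computed, and P_ValueT is the regularized upper incomplete gamma function Q(9/2, χ²/2). The sketch is not the authors' MATLAB code; the helper names are hypothetical, and the closed form used for Q at half-integer arguments avoids any external library.

```python
import math

def igamc_half_integer(m, x):
    # Regularized upper incomplete gamma Q(m + 1/2, x) for integer m >= 0,
    # built from Q(1/2, x) = erfc(sqrt(x)) and the standard recurrence
    # Q(a + 1, x) = Q(a, x) + x**a * exp(-x) / Gamma(a + 1).
    q = math.erfc(math.sqrt(x))
    for k in range(m):
        q += math.exp(-x) * x ** (k + 0.5) / math.gamma(k + 1.5)
    return q

def p_value_t(p_values):
    # Uniformity of the P_Values over ten bins (9 degrees of freedom).
    bins = 10
    counts = [0] * bins
    for p in p_values:
        counts[min(int(p * bins), bins - 1)] += 1
    expected = len(p_values) / bins
    chi2 = sum((c - expected) ** 2 / expected for c in counts)
    return igamc_half_integer(4, chi2 / 2)   # Q(9/2, chi2 / 2)

def is_uniform(p_values, threshold=0.001):
    # Decision rule from the text: reject when P_ValueT < 0.1%.
    return p_value_t(p_values) >= threshold

# Illustrative inputs only (not measured P_Values).
evenly_spread = [(i + 0.5) / 1000 for i in range(1000)]
clustered = [0.5] * 1073
print(p_value_t(evenly_spread), is_uniform(clustered))
```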

b: TESTU01 tests
Currently, TestU01 is the most complete and challenging test suite for RNGs, including eight (08) subbatteries with over 282 statistical tests. These subbatteries are SmallCrush, Crush, BigCrush, Alphabit, Rabbit, PseudoDIEHARD, FIPS 140-2, and NIST SP800-22. SmallCrush is applied first. If the tested sequence passes, the Crush battery is applied, followed by the more complex BigCrush. Generally, if the tested sequence passes the previous battery, there is a high probability of succeeding in the remaining tests. Moreover, in these subbatteries, the tested sequences and the parameters of the tests are not fixed, making TestU01 more flexible. In these tests, TestU01 is used to verify whether the generated sequences behave randomly and follow a uniform probability distribution over the interval [0, 1]. P_Values within [0.001, 0.9995] are considered accepted. Table 12 presents the obtained results of the TestU01 batteries. The proposed HC-KG passes all tests; therefore, it has good randomness and statistical quality.

c: AIS20/31 tests
In this subsection, the randomness quality of the proposed HC-KG is evaluated by using the AIS20/31 test suite that we developed under MATLAB according to the requirements and specifications of BSI. This suite includes nine statistical tests (T0 to T8) (see Annex B). AIS20/31 is organized into two procedures, A and B, conducted in seven steps (A-1 to A-7) and five steps (B-1 to B-5), respectively. In step A-1, test T0 is applied to a sequence of at least $2^{16} \cdot 48$ bits. In steps A-2 to A-7, tests T1-T5 are applied to a sequence of 20000 bits and repeated 257 times. Procedure A is passed if and only if all 1285 basic tests (1 × T0 + (T1 to T5) × 257) have been passed. If more than one basic test fails, procedure A fails. If exactly one basic test fails, a second run of procedure A is allowed. If a basic test fails in this second run, procedure A fails.
Procedure B applies the uniform distribution test (T6a and T6b) and the homogeneity test (T7a and T7b) for widths 1, 2, 4, 8 on a 100000-bit sequence, followed by Coron's test (T8) on a 25600 + 2560 bit sequence. Note that each of the tests above represents one step in procedure B, which results in five stages with five basic tests in total. Procedure B is passed if all the basic tests have been passed. Similar to procedure A, procedure B fails if more than one basic test fails, but a second run of procedure B is allowed if exactly one basic test has failed. If a basic test fails in the second run, procedure B fails, and a third run is not allowed. Table 13 illustrates the obtained results of the AIS20/31 tests; the proposed HC-KG successfully passes all the basic tests of procedures A and B. Therefore, according to the AIS20/31 standard, the proposed solution can produce an excellent random sequence with high randomness and sufficient entropy density.

6) Cryptosystem key sensitivity analysis
In this subsection, we determine the sensitivity of the proposed cryptosystem to the encryption and decryption keys. This sensitivity can be evaluated following two methods. The first is to quantify the influence of a slight bit-level change in the encryption key on the similarity of the generated ciphertexts. Indeed, changing one bit in the encryption key must produce a dissimilar ciphertext. The other is to validate the impossibility of recovering the plaintext when a slight change in the decryption key value is made. Accordingly, five similar keys were used to encrypt a plaintext of 1472 bytes. The obtained ciphertexts and decrypted texts for the first 30 bytes of the plaintext are represented in Figure 29. The obtained ciphers are different, demonstrating that the proposed encryption scheme is sensitive to tiny changes in the encryption keys. Furthermore, decrypting the cipher encrypted by key 0 using the other keys (key 1, key 2, key 3, key 4) produced incorrect plaintexts, as shown in Figure 29. In contrast, the decryption process generates the correct plaintext when the valid encryption key is used. Therefore, the proposed decryption scheme is also sensitive to bit-level changes in the decryption keys.

7) Cryptosystem avalanche effect analysis
To measure the sensitivity of the proposed cryptosystem to a slight change in the plaintext, the AE (Avalanche Effect) is used. Generally, a small change in the plaintext should cause no less than a 50% change in the generated ciphertext. Similarly, the same sensitivity can be estimated using the MSE (mean square error). The MSE measures the accumulated squared error between two ciphertexts resulting from similar plaintexts. In the MSE formulation, $c_1$ and $c_2$ are two ciphers, N = 1472 bytes, and B denotes the largest supported byte value of the generated ciphertext. In our case, B = 255. Usually, if MSE ≥ 30 dB, the difference is considered very clear. In this test, five similar plaintexts of 1472 bytes are encrypted with the same key to generate five ciphers. Then, the AE and MSE are estimated, as illustrated in Table 14. The obtained results are within the required thresholds. Therefore, the proposed cryptosystem passes the plaintext sensitivity test.
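A minimal sketch of the avalanche effect computation is given below. It assumes the usual definition, namely the percentage of bit positions that differ between two equal-length ciphertexts; the exact formula used for Table 14 is not reproduced in this section, and the function name is hypothetical.

```python
def avalanche_effect(c1: bytes, c2: bytes) -> float:
    # Percentage of bit positions that differ between two ciphertexts.
    if len(c1) != len(c2):
        raise ValueError("ciphertexts must have the same length")
    flipped = sum(bin(a ^ b).count("1") for a, b in zip(c1, c2))
    return 100.0 * flipped / (8 * len(c1))

# Illustrative byte strings only (not ciphertexts from the cryptosystem).
print(avalanche_effect(b"\x00\x00", b"\xff\x00"))   # half of the 16 bits differ
```

A value close to 50% means that, on average, every ciphertext bit has an even chance of flipping when the plaintext changes slightly.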

8) Cryptosystem differential attack analysis
Differential attack analysis is another approach to measure the diffusion performance of an encryption scheme. More precisely, we quantify the capacity of an encryption scheme to resist differential attacks when a minor change in the plaintext is made (i.e., a one-bit change in the plaintext should produce a different ciphertext). NPCR, UACI, and correlation coefficients are employed to evaluate the dependency level between the ciphertext and the plaintext. NPCR, UACI, and the correlation can be formulated as follows:
$$NPCR = \frac{1}{N}\sum_{i=1}^{N} D(i) \times 100\%$$
$$UACI = \frac{1}{N}\sum_{i=1}^{N} \frac{|ct_1(i) - ct_2(i)|}{B} \times 100\%$$
$$r_{ct_1,ct_2} = \frac{cov(ct_1, ct_2)}{\sigma_{ct_1}\,\sigma_{ct_2}} \qquad (17)$$
where $ct_1$ and $ct_2$ are two ciphers of the same size N = 1472 bytes, B denotes the largest supported byte value of the generated ciphertext (in our case, B = 255), and $\sigma_{ct_1}$ and $\sigma_{ct_2}$ are the standard deviations of the two ciphers. If $ct_1(i) \neq ct_2(i)$, then D(i) = 1; otherwise, D(i) = 0. Additionally, $cov(ct_1, ct_2) = \frac{1}{N}\sum_{i=1}^{N}\left(ct_1(i) - E(ct_1)\right)\left(ct_2(i) - E(ct_2)\right)$, where $E(\cdot)$ denotes the mean value.
In this experiment, a plaintext is modified to generate four similar plaintexts. Then, the cipher of the original plaintext and the other ciphers are used to calculate the NPCR, the UACI, and the correlation coefficients. Table 15 illustrates the obtained results. We observe that the obtained values for all the tests are within the required standard. Therefore, the proposed cryptosystem can efficiently resist differential attacks.
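The three metrics can be sketched in a few lines, assuming their commonly used definitions: NPCR as the percentage of differing byte positions, UACI as the mean absolute byte difference normalized by B = 255, and the Pearson correlation coefficient. The function names are hypothetical, and the inputs below are illustrative byte strings rather than ciphertexts from the proposed cryptosystem.

```python
import math

def npcr(ct1: bytes, ct2: bytes) -> float:
    # Percentage of positions where the two ciphertexts differ.
    return 100.0 * sum(a != b for a, b in zip(ct1, ct2)) / len(ct1)

def uaci(ct1: bytes, ct2: bytes, B: int = 255) -> float:
    # Mean absolute difference, normalized by the largest byte value B.
    return 100.0 * sum(abs(a - b) for a, b in zip(ct1, ct2)) / (B * len(ct1))

def correlation(ct1: bytes, ct2: bytes) -> float:
    # Pearson correlation: covariance divided by both standard deviations.
    n = len(ct1)
    m1, m2 = sum(ct1) / n, sum(ct2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(ct1, ct2)) / n
    var1 = sum((a - m1) ** 2 for a in ct1) / n
    var2 = sum((b - m2) ** 2 for b in ct2) / n
    return cov / math.sqrt(var1 * var2)

a = bytes(range(10))
print(npcr(bytes([0, 0, 0, 0]), bytes([0, 1, 2, 0])))
print(uaci(bytes([0]), bytes([255])))
print(correlation(a, a[::-1]))
```

The same `correlation` function can be applied to adjacent characters of a single ciphertext by passing the sequence and its one-position shift.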

9) Cryptosystem chosen-plaintext analysis
Still in the context of differential attacks, this analysis examines the vulnerability of the encryption scheme to a CPA (chosen-plaintext attack). According to [113], if a cryptosystem can resist CPAs, it can resist other conventional attacks. To demonstrate this, two plaintexts are chosen: the first is a text of 1472 "A" characters, and the other is a text of 1472 "B" characters. Table 16 lists the obtained entropy values and correlation coefficients of the plaintexts and ciphers. We can observe that the entropies of the ciphers are close to 8, and the correlation values are very close to 0. Therefore, the proposed scheme is secure against chosen-plaintext attacks and, consequently, against other conventional attacks.

10) Cryptosystem histogram analysis
Histogram analysis is commonly used to validate that the generated ciphers are uniformly distributed and to verify the effectiveness of a cryptosystem against statistical attacks. Figure 30 shows the histogram of the plaintext, the encryption key, and the ciphertext. The plaintext histogram is nonuniform and irregularly distributed. Nevertheless, the histograms of the encryption key and the produced cipher are uniformly distributed. Thus, the proposed encryption scheme can resist statistical attacks.

11) Cryptosystem chi-square analysis
This analysis complements the histogram-based test to confirm the randomness of the produced ciphers. To this end, the independence and goodness-of-fit tests used in the NIST SP800-90B recommendations are exploited to verify this uniformity through the chi-square metric. Five ciphers of 1472 bytes each are generated and tested. Table 17 shows the obtained chi-square values compared to the required theoretical thresholds. All the tested texts passed the test; therefore, the histogram uniformity is validated, and the proposed encryption scheme has good randomness.
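As an illustration of the chi-square metric on byte histograms, the sketch below computes the goodness-of-fit statistic against a uniform distribution over the 256 byte values. It is a simplified stand-in and not the full NIST SP800-90B procedure; the function name is hypothetical, and the critical value shown is the commonly tabulated one for 255 degrees of freedom at a 5% significance level, which may differ from the thresholds used in Table 17.

```python
from collections import Counter

CHI2_CRITICAL_255_DOF_5PCT = 293.2478   # assumed reference threshold

def chi_square_bytes(data: bytes) -> float:
    # Goodness of fit of the byte histogram to the uniform distribution.
    expected = len(data) / 256
    counts = Counter(data)
    return sum((counts.get(v, 0) - expected) ** 2 / expected
               for v in range(256))

def looks_uniform(data: bytes) -> bool:
    return chi_square_bytes(data) < CHI2_CRITICAL_255_DOF_5PCT

# Illustrative inputs only (not ciphertexts from the cryptosystem).
flat = bytes(range(256)) * 4     # every value occurs exactly four times
constant = b"\x41" * 256         # a single repeated value
print(chi_square_bytes(flat), chi_square_bytes(constant))
```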

12) Cryptosystem correlation coefficient analysis
In this analysis, correlation coefficients are used to verify the independence between adjacent ciphertext characters. Indeed, a good encryption scheme should eliminate the high character correlation present in the plaintext and produce a random cipher that can resist statistical attacks. Formula (17) is used to compute the correlation, where $ct_1$ and $ct_2$ represent adjacent characters of the same ciphertext. Five plaintexts are encrypted, and their ciphers are tested. Table 18 lists the obtained results. All the values are close to zero, demonstrating that the produced cipher has good randomness and that its characters are highly uncorrelated. Thus, the proposed encryption scheme can resist statistical attacks.
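A minimal Python sketch of the adjacent-character correlation is given below. It assumes that "adjacent characters" means the pairs (byte i, byte i+1) of one ciphertext and applies the usual Pearson coefficient; the function name `adjacent_correlation` is hypothetical.

```python
import math


def adjacent_correlation(ct: bytes) -> float:
    """Pearson correlation between each byte and its right-hand neighbour."""
    x, y = ct[:-1], ct[1:]  # pairs (ct[i], ct[i + 1])
    n = len(x)
    if n < 2:
        raise ValueError("need at least three bytes")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

A strongly structured input such as a byte ramp gives a coefficient near 1, whereas a well-diffused cipher is expected to give a value near 0.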

13) Cryptosystem floating frequency analysis
Similar to the histogram and chi-square analyses, the floating frequency analysis evaluates the occurrence of different elements in all test windows of a ciphertext. Likewise, this occurrence should be uniformly distributed. To this end, a plaintext of 1472 bytes is encrypted using a random key. To measure the floating frequency, we consider a window of 256 elements and count the number of different elements within that window. Then, the window is shifted by one element to the right, and the floating frequency is computed again. Figure 31 shows that the calculated floating frequency is nearly uniform. Thus, the proposed method satisfies the randomness property. This analysis is fundamental, as it indicates whether an attacker can reconstruct the whole plaintext or understand the encryption process from a subsequence of the cipher. The obtained results indicate that such an attack would be unsuccessful.
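The sliding-window count described above can be sketched in Python as follows. This is an illustrative reading of the procedure (count the distinct byte values in a 256-element window, then shift by one); the function name `floating_frequency` is hypothetical.

```python
def floating_frequency(data: bytes, window: int = 256) -> list[int]:
    """Number of distinct byte values in every window of `window` elements."""
    if window <= 0 or len(data) < window:
        raise ValueError("data must contain at least one full window")
    counts = [0] * 256  # occurrences of each byte value in the current window
    distinct = 0
    result = []
    for i, byte in enumerate(data):
        if counts[byte] == 0:
            distinct += 1
        counts[byte] += 1
        if i >= window:  # drop the element that just left the window
            old = data[i - window]
            counts[old] -= 1
            if counts[old] == 0:
                distinct -= 1
        if i >= window - 1:  # a full window is available
            result.append(distinct)
    return result
```

For a 1472-byte ciphertext and a 256-element window, this yields 1217 values, one per window position.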

14) Cryptosystem information entropy analysis
Information entropy, a concept from information theory, measures the amount of information in a source. The more ordered a system is, the lower its information entropy; conversely, the more disordered it is, the higher its entropy. The information entropy can be calculated as
$$H(S) = -\sum_{i=0}^{255} p(s_i)\log_2 p(s_i)$$
where $p(s_i)$ denotes the probability of symbol $s_i$. The closer the information entropy is to 8, the more random the ciphertext is. In the test, five plaintexts and their ciphers are tested. Table 19 lists the obtained entropy values. The entropies of the plaintexts are lower and differ from one another; however, once encrypted, the entropies of the ciphers are close to 8. Therefore, the proposed encryption scheme produces ciphers with high randomness, and our cryptosystem can provide defense against entropy attacks.
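The entropy measure can be illustrated with a short Python sketch that estimates the symbol probabilities from byte frequencies. The function name `shannon_entropy` is hypothetical and the code is only a sketch of the standard definition, not the authors' test bench.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte, estimated from byte frequencies."""
    if not data:
        raise ValueError("data must be non-empty")
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())
```

A constant text, such as the 1472 "A" characters used in the chosen-plaintext analysis, has an entropy of 0, whereas a sequence in which all 256 byte values are equally frequent reaches the maximum of 8.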

15) Cryptosystem computational complexity and timing analysis
The execution time and computational complexity are essential criteria for evaluating an encryption scheme's performance. This analysis focuses on estimating three metrics: the synchronization time, the key generation time, and the encryption/decryption time. These metrics are expressed in terms of the following quantities:
• T_Gen, T_Sync, and T_ED are the key generation time, the synchronization time, and the encryption/decryption time, respectively;
• D_L is the design latency, which in our case is 6 clock cycles;
• CLK is the system clock period; in our experiment, the clock frequency is 25 MHz (i.e., a period of 40 ns);
• N_C is the number of iterations required to achieve synchronization, and N is the size of the plaintext.
The experiments showed that, over a multitude of tests, synchronization is always achieved after N_C = 120 iterations. Each iteration is equivalent to D_L = 6 clock cycles. Moreover, the proposed encryption and decryption methods are based on diffusion and inverse diffusion principles. These principles are realized by applying an XOR (exclusive OR) between the plaintext and a one-time pad (OTP) key. Accordingly, the encryption/decryption time is proportional to the plaintext size and to the key generation time, as shown by equation (21). In addition, regardless of the key size, the generation process consumes six clock cycles and requires only six additions and seven multiplications. Hence, the proposed scheme has low complexity and is time-efficient, with a linear computation complexity of 14N. Table 20 lists the obtained encryption/decryption time and the computation complexity and compares them to related works. The proposed scheme presents the best performance in terms of encryption/decryption time and computation complexity.
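The timing figures quoted above can be tied together in a rough Python sketch. The relations used here are assumptions inferred from the surrounding text rather than the paper's equations: one iteration costs D_L clock cycles, synchronization takes N_C iterations, and one keystream element is generated per plaintext element. The names `timing_estimates` and `xor_stream` are hypothetical.

```python
def timing_estimates(n_elements: int, f_clk_hz: float = 25e6,
                     design_latency: int = 6, sync_iterations: int = 120) -> dict:
    """Estimate T_Gen, T_Sync and T_ED in seconds under the stated assumptions."""
    t_clk = 1.0 / f_clk_hz                      # clock period (40 ns at 25 MHz)
    t_gen = design_latency * t_clk              # one key generation: D_L cycles
    t_sync = sync_iterations * design_latency * t_clk
    t_ed = n_elements * t_gen                   # assumption: one key per element
    return {"t_gen": t_gen, "t_sync": t_sync, "t_ed": t_ed}


def xor_stream(data: bytes, keystream: bytes) -> bytes:
    """XOR diffusion; applying it twice with the same keystream restores data."""
    if len(keystream) < len(data):
        raise ValueError("keystream must be at least as long as the data")
    return bytes(d ^ k for d, k in zip(data, keystream))
```

Under these assumptions, a 25 MHz clock gives 240 ns per key generation and 28.8 µs for synchronization; the encryption/decryption figure depends on how many plaintext elements each generated key covers, which this sketch fixes at one.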

16) Cryptosystem performance comparison
This subsection compares the proposed encryption scheme with various recent works that provide chaos-based key generation and their related information security applications (i.e., image encryption, speech encryption, and secure wireless communication). In this comparison, all aspects are considered, especially those related to hardware implementation and security analysis. As shown in Table 21, our scheme is the most extensively evaluated and analyzed among all the presented works. More precisely, the randomness of the proposed system has been verified using three different test suites, including the most complex and demanding one (i.e., TestU01). In contrast, the other systems use only some of these test suites or none at all. In addition, the proposed scheme presents the second-largest keyspace, after the approach suggested in [48], which uses four chaotic systems for key generation, in contrast to only one in our scheme. Moreover, in terms of power consumption and occupied physical space, our design, with 63.97 mW, is a good candidate for constrained objects such as IoT devices.
It is worth mentioning that the occupied FPGA area is presented in logic slices (LS), except for the designs implemented on Altera Cyclone IV ([21], [22]), which are presented in logic array blocks (LABs). Specifically, the authors of [21], [22] report the consumed physical area in terms of logic elements (LEs). According to the Altera Cyclone IV handbook [119], each LAB contains 16 LEs. Therefore, to enable a reasonable comparison, we divide the number of LEs by 16 to obtain the consumed FPGA area in terms of LABs, which is comparable to LS. Additionally, the ability of the proposed scheme to provide encryption keys of different sizes simplifies its implementation on a variety of platforms. It can also be used with varying data formats and supports any data size. Furthermore, all the related works are limited to the simulation phase or laboratory demonstration, except those proposed in [21], [22], [39]. In contrast, our work has simulated, implemented, and integrated the proposed scheme into a real-time secure wireless text transmission.

VI. CONCLUSION
The present work presented the FPGA design and realization of an embedded hyperchaotic communication system for real-time and secure wireless data transmission. The developed architectures are modular and comprise a UDP/IP stack and a chaotic security core. It was necessary to go through several stages to arrive at the final solution.
The first step comprises the study, modeling, and implementation on FPGA technology of a UDP/IP interface, intended to control the network communications at the physical level. To do this, we constructed an RTL architecture that relies on a VHDL description of the UDP and IP network protocols. This stack has been developed to achieve, on the one hand, a flexible and reconfigurable architecture that can be easily adapted in the future without dependence on the target hardware technology and, on the other hand, a minimal risk regarding security issues. Indeed, the developed UDP/IP interface has low latency and a high speed close to the 1 Gb/s theoretical speed. Moreover, this stack is designed to support static routing, which uses less bandwidth, and to resist many attacks, particularly fragmentation and MAC address spoofing attacks.
The second step is fundamental. It consists of the study and implementation of the chaos-based security core. This core was implemented through a VHDL description of the RK4 method to generate random data as encryption key matrices. The original idea exploits the fundamental properties of chaos, namely nonlinearity, unpredictability, and extreme sensitivity to initial conditions, to develop encryption keys. In addition, our security core provides the synchronization mechanism, called HCS-DFM. It is based on the regeneration of chaotic data by dynamic feedback using an observer. Its advantage is the synchronization of a hyperchaotic signal mixed with useful information. Consequently, the reliability of HCS-DFM synchronization highly depends on how the data and the chaotic signal are combined.
The proposed security core has been thoroughly evaluated. The proposed solution presents a complete analysis in terms of chaos validation, security aspects, statistical characterization, and architectural design. Moreover, to the best of our knowledge, the proposed security core is the only one in the literature whose randomness quality and security aspects have been tested with more than 18 security analyses, including the hardest and most complex ones, and applied to real-world data.
Additionally, power consumption was considered early in the design process. Furthermore, to demonstrate the design efficiency of the proposed security core, for the first time in the literature, we introduce a crucial criterion, power efficiency (P_eff), which is the ratio between the power consumption and the occupied area. The obtained result (P_eff = 0.024 mW/LS) confirms that our architecture has been effectively designed and implemented in compliance with embedded system requirements.
The last step is integrating the UDP/IP stack and the chaotic security core to develop a secure platform for real-time communicating systems and interconnected devices according to the IoT standards. The obtained real-time results are very satisfactory and validate the adopted approach. Our conclusions can be summarized as follows:
• The advent of digital programmable circuits, such as reconfigurable FPGA-type circuits, makes it possible to design hyperchaotic systems while avoiding the drawbacks of an analog design. Indeed, this type of circuit allows the prototyping and hardware implementation of digital electronic architectures that generate hyperchaotic signals;
• The adopted hardware/software codesign methodology offers excellent flexibility to the developed systems. It facilitates future updates through IP design reuse;
• Our architecture was tested for secure wireless communication, but because of the high modularity of the proposed hardware design, our cryptosystem is easily configurable for other applications, such as image encryption and speech encryption;
• The proposed security core was thoroughly analyzed and evaluated with respect to all the aspects related to embedded system security applications;
• The obtained results present a good trade-off between design efficiency and high security;
• Using the fixed-point arithmetic model in the hardware implementation of the chaos-based security core leads to consuming less memory bandwidth, providing faster speed, and attaining higher power efficiency. Fixed-point arithmetic permits a beneficial trade-off between high speed and minimal resource cost;
• The randomness of our solution has been tested by three statistical test suites. Thus, our security core offers superior randomness without any postprocessing;
• The proposed security core is equipped with BISST to guarantee online and continuous control of its reliability and availability.
According to the above, we can confidently affirm the superiority of the proposed security core over related works for embedded system security. Additionally, given the extent of the evaluation and the obtained results, this research could serve as a roadmap for similar future works. Finally, the rich results of this work open up several horizons for further development and future research. They can be classified into three crucial topics:
• First, work on the digital hyperchaotic synchronization component to minimize errors and explore other techniques;
• Second, work on the data encoding component so that chaos can be used to secure bit-level data, that is, to generate cryptographically safe chaotic sequences;
• Finally, work on the cryptanalysis component to properly assess the degree of confidentiality offered by the designed cryptosystem. In other words, add other security analyses, specifically those related to the hardware, such as power analysis attacks, side-channel attacks, and environmental tests (e.g., temperature, voltage, and electromagnetic tests).