A Modified Algorithm for the Logistic Sequence Based on PCA

To counter the dynamical degradation of the logistic chaotic sequence caused by its structure and by finite-precision computing, we propose a modified algorithm based on phase space reconstruction and PCA. During phase space reconstruction, the delay time is first determined by the mutual information function and the embedding dimension is then determined by Cao's method; the first principal component of the reconstructed phase space is taken as the result. Experimental results demonstrate that the algorithm removes the correlation among data points and enhances the randomness and complexity of the sequence.


I. INTRODUCTION
The massive exchange of information through networks, especially the internet and mobile-phone networks, has created a great demand for real-time secure transmission. Consequently, many encryption systems have been proposed. Among them, the chaos-based approach is a promising direction and has shown exceptionally good properties in many aspects such as security, complexity, speed and computing power [1]-[4].
In general, chaotic systems possess several properties that make them essential components in constructing cryptosystems: 1) Randomness: Chaotic systems generate a random-like chaotic sequence in a deterministic way. 2) Sensitivity: They are unpredictable, strongly nonlinear and highly sensitive to initial conditions, so a very slight change in the starting point can lead to radically different behavior. 3) Ergodicity: Each state variable uniformly takes all states in phase space. These properties correspond to Shannon's requirements of confusion and diffusion and can be exploited in constructing effective pseudo-random number generators and various cryptosystems. Matthews [5] was the first to propose a chaos-based encryption algorithm. In [6] and [7], algorithms combining AES and chaos encryption are studied, and many chaos-based encryption systems for communication have been proposed in [8] and [9]. The control and synchronization of chaos are described in [10] and [11]; since then, various image encryption schemes based on chaotic maps have been proposed in [12]-[15].
The associate editor coordinating the review of this manuscript and approving it for publication was Zheng Yan.
However, due to the finite computation precision of software and hardware, the chaotic complexity of real-valued chaos is often affected by dynamical degradation (the collapsing effect) and differs considerably from theoretical expectations, especially for low-dimensional maps such as the logistic and tent maps. Some encryption systems based on chaos theory have been attacked and broken, especially image encryption systems [16]-[18]. To be precise, some low-dimensional chaotic maps have been extensively analyzed and their weaknesses even confirmed [19]-[21]. At the same time, encryption systems based on low-dimensional chaotic maps have security defects such as small key spaces, short periods and insensitivity to initial values [22]-[25]. Finally, security problems appear because the orbits of a discrete chaotic system exhibit degenerate behavior near a certain cycle, especially in encryption systems based on the logistic and tent maps [26]-[31].
To address the degradation of digital chaotic properties caused by finite precision, researchers have proposed several countermeasures. The error compensation method proposed by Hu et al. [45] is an effective way to deal with the degradation of digital chaotic systems, since it can drive most properties of the compensated digital chaotic map close to those of the original chaotic map. However, it seems difficult to extend this method to higher-dimensional cases. Perturbation-based schemes can greatly mitigate the dynamical degradation of digital chaotic systems and have been shown to perform better than other methods [44], [47]. However, the properties of the perturbed system are dominated by the perturbing source (such as a linear shift register). Another approach is a hybrid model based on the complementarity of continuous chaos and digital chaos [46]. However, due to the uncertain parameters, it is difficult to maintain long-term stable synchronization, which limits the dynamical properties.
Generally, these methods cannot drive the system properties to the levels that some applications require. Therefore, one should resort to other solutions to ensure the expected properties, such as uniform distribution, higher complexity and ergodicity.
In view of the above problems, we introduce a modified logistic chaotic scheme based on phase space reconstruction theory and principal component analysis. The sequences generated by the algorithm have better randomness, high complexity and limited predictable long periodicity. The security of the sequence was evaluated using statistical test suites such as NIST SP 800-22, as well as the approximate probability density, entropy and autocorrelation analysis to determine unpredictability.

II. THEORY OF PHASE SPACE RECONSTRUCTION
In order to study the complexity of a chaotic sequence, it is necessary to reconstruct its phase space. The dynamic behavior of the system evolution can then be expressed without singularity by the trajectory of a point in this space. The space formed by the observed values and their delayed values is called the reconstructed phase space. In 1980, Packard et al. proposed two methods for reconstructing the original dynamical system in phase space from a time series [32]: derivative reconstruction and coordinate delay reconstruction. In principle, both can perform phase space reconstruction. In practice, no prior information about the chaotic sequence is available and numerical differentiation is sensitive to errors, so phase space reconstruction of a chaotic time sequence generally uses the coordinate delay method [33]. The essence of the coordinate delay method is to construct an m-dimensional phase space vector from a one-dimensional time series {x(n)} through the delay time τ, that is

$$X(i) = \{x(i),\ x(i+\tau),\ \ldots,\ x(i+(m-1)\tau)\}$$

In 1981, Takens proposed the embedding theorem: for an infinite, noiseless scalar time series {x(n)} generated by a d-dimensional chaotic attractor, an m-dimensional embedding space can always be found in the sense of topological invariance as long as the dimension m ≥ 2d + 1 [34]. According to Takens' theorem, a phase space that is topologically equivalent to the original dynamical system can be reconstructed from a one-dimensional chaotic sequence, and the determination, analysis and prediction of the chaotic time sequence are all performed in it. Therefore, phase space reconstruction is the crucial step in the study of chaotic time series. In the process of reconstruction, the first consideration is the selection of the optimal embedding dimension m and the delay time τ.
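The coordinate delay construction can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function name `delay_embed` is hypothetical, and indices are zero-based, so row i holds x(i), x(i + τ), ..., x(i + (m − 1)τ).

```python
import numpy as np

def delay_embed(x, m, tau):
    """Stack delayed copies of x into a matrix with m columns.

    Row i is [x[i], x[i + tau], ..., x[i + (m - 1) * tau]], so the
    result has len(x) - (m - 1) * tau rows.
    """
    x = np.asarray(x, dtype=float)
    rows = len(x) - (m - 1) * tau
    if rows <= 0:
        raise ValueError("series too short for this m and tau")
    # Column j is the series shifted by j * tau samples.
    return np.column_stack([x[j * tau : j * tau + rows] for j in range(m)])

x = np.arange(10.0)
C = delay_embed(x, m=3, tau=2)
print(C.shape)  # (6, 3)
print(C[0])     # [0. 2. 4.]
```

For a series of length 10000 with m = 8 and τ = 15, the same function returns a 9895 × 8 matrix.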
The delay time τ is an important embedding parameter. If τ is too small, any two components x(i + jτ) and x(i + (j + 1)τ) in phase space are very close to each other and cannot be distinguished; if τ is too large, the two coordinates are statistically independent, and the projections of the chaotic attractor's trajectories onto the two directions are unrelated. The delay time can be determined by the autocorrelation function method, the mutual information method or the empirical method. The autocorrelation function method only describes the degree of linear correlation among variables, so it is not suitable for nonlinear dynamic systems. In the mutual information method, the time corresponding to the first minimum point is taken as the time delay, which reflects the general correlation among data points. In this paper, the mutual information method is adopted to obtain the delay time, since it is generally accepted as the major method for accurately determining the delay time [33].
The optimal embedding dimension of the reconstruction is determined by Cao's method, an improved version of the false nearest neighbor (FNN) method. It needs only one parameter, the delay time τ, and it can effectively distinguish random signals from deterministic signals. The embedding dimension m can be obtained from a small amount of data. Therefore, this paper adopts the mutual information function method and Cao's method to reconstruct the phase space.

A. THE SELECTION OF THE DELAY TIME
The time delay is a crucial parameter for phase space reconstruction. Its function is to divide a one-dimensional logistic time series into multiple subsequences by τ. The goal is to reconstruct the spatial structure of the original sequence while preserving its chaotic characteristics to the maximum extent. In the process of phase space reconstruction, if there is a time delay, the multiple subsequences are projected onto multiple coordinate axes; the best decorrelation result, which is the first principal component, is retained, and the other parts with poor chaotic properties are discarded. If there is no time delay, the original sequence can only be projected onto a single coordinate axis, and the part with poor chaotic characteristics is also retained.
Fraser and Swinney proposed the mutual information method to judge the nonlinear correlation of a system [35]: the time of the first local minimum of the mutual information function is selected as the delay time. Consider a nonlinear joint system constructed from two discrete information systems, S = {s_1, s_2, ..., s_n} and Q = {q_1, q_2, ..., q_n}. According to information theory, the average information of each system is its information entropy:

$$H(S) = -\sum_{i=1}^{n} P_s(s_i)\log P_s(s_i)$$

$$H(Q) = -\sum_{j=1}^{n} P_q(q_j)\log P_q(q_j)$$

where P_s(s_i) and P_q(q_j) are the probabilities of the event s_i and the event q_j in S and Q respectively. Given S, the information obtained about system Q is called the mutual information of S and Q:

$$I(Q,S) = H(Q) - H(Q \mid S)$$

where

$$H(Q \mid S) = -\sum_{i=1}^{n}\sum_{j=1}^{n} P_{sq}(s_i,q_j)\log\frac{P_{sq}(s_i,q_j)}{P_s(s_i)}$$

So

$$I(Q,S) = \sum_{i=1}^{n}\sum_{j=1}^{n} P_{sq}(s_i,q_j)\log\frac{P_{sq}(s_i,q_j)}{P_s(s_i)\,P_q(q_j)}$$

where P_sq(s_i, q_j) is the joint probability of the events s_i and q_j.
Here s stands for the time series x(t) and q for the time series x(t + τ), where τ is the delay time. Thus I(Q, S) is a function of the delay time, denoted I(τ). It measures how well the known value of system S (i.e. x(t)) determines the value of system Q (i.e. x(t + τ)). When I(τ) = 0, x(t + τ) is completely unpredictable from x(t), that is, x(t) and x(t + τ) are uncorrelated. When I(τ) reaches a local minimum, x(t) and x(t + τ) are most likely to be unrelated. Thus, the first local minimum is taken as the optimal delay time.
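The delay selection can be sketched with a histogram estimate of I(τ). This is one possible illustration under stated assumptions: the probabilities are estimated with an equal-width 2-D histogram (16 bins is an arbitrary choice), the logarithm is taken in base 2, and the function names `mutual_information` and `first_local_minimum` are hypothetical.

```python
import numpy as np

def mutual_information(x, tau, bins=16):
    """Histogram estimate of I(tau) between x(t) and x(t + tau), in bits.

    tau must be a positive integer.
    """
    x = np.asarray(x, dtype=float)
    s, q = x[:-tau], x[tau:]
    joint, _, _ = np.histogram2d(s, q, bins=bins)
    p_sq = joint / joint.sum()                 # joint probabilities
    p_s = p_sq.sum(axis=1, keepdims=True)      # marginal of s (column vector)
    p_q = p_sq.sum(axis=0, keepdims=True)      # marginal of q (row vector)
    nz = p_sq > 0                              # skip empty cells (0 * log 0 = 0)
    return float(np.sum(p_sq[nz] * np.log2(p_sq[nz] / (p_s @ p_q)[nz])))

def first_local_minimum(values):
    """Index of the first interior local minimum, or None if there is none."""
    for k in range(1, len(values) - 1):
        if values[k] < values[k - 1] and values[k] <= values[k + 1]:
            return k
    return None

# Usage: scan candidate delays and keep the first local minimum of I(tau).
t = np.arange(5000)
x = np.sin(2 * np.pi * t / 100)
taus = range(1, 51)
curve = [mutual_information(x, tau) for tau in taus]
k = first_local_minimum(curve)
tau_opt = taus[k] if k is not None else None
```

The estimate depends on the number of bins and on the series length, so the location of the minimum should be checked for stability before it is used.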

B. THE OPTIMAL EMBEDDING DIMENSION
The improved FNN method (Cao method) was proposed by Cao [36]. The principal advantages of this method are as follows:
1) The delay time τ is the only parameter needed for the calculation.
2) It can effectively distinguish random signals from deterministic signals.
3) The embedding dimension can be obtained from a small amount of data.
The details of the algorithm are outlined as follows. In d-dimensional phase space, each phase vector X(i) = {x(i), x(i + τ), ..., x(i + (d − 1)τ)} has a nearest point X^NN(i), and the distance between them is

$$R_d(i) = \left\| X(i) - X^{NN}(i) \right\|$$

As the dimension of phase space increases from d to d + 1, the distance between the two points changes to R_{d+1}(i), with

$$R_{d+1}^2(i) = R_d^2(i) + \left| x(i+d\tau) - x^{NN}(i+d\tau) \right|^2$$

If R_{d+1}(i) is much larger than R_d(i), the two points are non-adjacent points of the higher-dimensional chaotic attractor that become adjacent only when projected onto the lower-dimensional orbit, so they are false neighbors. Suppose that

$$a_1(i,d) = \sqrt{\frac{R_{d+1}^2(i) - R_d^2(i)}{R_d^2(i)}} = \frac{\left| x(i+d\tau) - x^{NN}(i+d\tau) \right|}{R_d(i)} \tag{9}$$

If a_1(i, d) > R_τ, X^NN(i) is a false neighbor of X(i). The threshold R_τ can be selected in [10, 50]. Substituting R_d(i) into (9) gives

$$a_1(i,d) = \frac{\left| x(i+d\tau) - x^{NN}(i+d\tau) \right|}{\left\| X(i) - X^{NN}(i) \right\|}$$

Cao rewrote the above formula as

$$a_2(i,d) = \frac{\left\| X_{d+1}(i) - X_{d+1}^{NN}(i) \right\|}{\left\| X_d(i) - X_d^{NN}(i) \right\|}$$

where X_d(i) is the ith vector of the d-dimensional phase space and X_d^NN(i) is its nearest neighbor; X_{d+1}(i) and X_{d+1}^NN(i) are the ith vector of the (d + 1)-dimensional phase space and its nearest neighbor respectively. Define

$$E(d) = \frac{1}{N - d\tau}\sum_{i=1}^{N-d\tau} a_2(i,d), \qquad E1(d) = \frac{E(d+1)}{E(d)}$$

where N is the length of the time series. If the time series is deterministic, the embedding dimension exists, that is, E1(d) stops changing once d is greater than a certain value d_0. If the time series is random, E1(d) increases gradually. In practice, however, it is difficult to judge whether E1(d) is gradually increasing or stable for finite-length sequences, so the supplementary criterion is

$$E^{*}(d) = \frac{1}{N - d\tau}\sum_{i=1}^{N-d\tau} \left| x(i+d\tau) - x^{NN}(i+d\tau) \right|, \qquad E2(d) = \frac{E^{*}(d+1)}{E^{*}(d)}$$

For a random sequence, there is no correlation between the data and E2(d) is always 1. For a non-random sequence, the correlation depends on the embedding dimension d, so there are always some values of d for which E2(d) is not equal to 1.
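A brute-force sketch of Cao's two quantities follows. It assumes the maximum norm for distances, skips neighbors at exactly zero distance, and uses an O(N²) neighbor search that is only practical for short series; the function name `cao_e1_e2` is hypothetical.

```python
import numpy as np

def cao_e1_e2(x, tau, d_max):
    """Return arrays E1(d) and E2(d) for d = 1, ..., d_max."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    E, E_star = [], []
    for d in range(1, d_max + 2):
        rows = n - d * tau                      # vectors that also exist in dimension d + 1
        Xd = np.column_stack([x[j * tau : j * tau + rows] for j in range(d)])
        extra = x[d * tau : d * tau + rows]     # the added (d + 1)-th coordinate
        a = np.empty(rows)
        gap = np.empty(rows)
        for i in range(rows):
            dist = np.max(np.abs(Xd - Xd[i]), axis=1)   # maximum norm in dimension d
            dist[i] = np.inf                            # exclude the point itself
            dist[dist == 0] = np.inf                    # exclude exact duplicates
            nn = int(np.argmin(dist))
            gap[i] = abs(extra[i] - extra[nn])
            # In the maximum norm, the (d + 1)-dimensional distance is
            # the larger of the d-dimensional distance and the new gap.
            a[i] = max(dist[nn], gap[i]) / dist[nn]
        E.append(a.mean())
        E_star.append(gap.mean())
    E, E_star = np.array(E), np.array(E_star)
    return E[1:] / E[:-1], E_star[1:] / E_star[:-1]

# Usage on a short logistic series (mu = 4 chosen only for illustration).
series = [0.7]
for _ in range(499):
    series.append(4.0 * series[-1] * (1.0 - series[-1]))
e1, e2 = cao_e1_e2(series, tau=1, d_max=4)
```

The embedding dimension is then read off as the smallest d beyond which E1(d) stops changing, while values of E2(d) away from 1 indicate a deterministic signal.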

III. THE IMPROVED ALGORITHM FOR THE LOGISTIC SEQUENCES BASED ON THE PRINCIPAL COMPONENT ANALYSIS
In statistics, principal component analysis (PCA) is a technique that simplifies a data set by exploiting the regularity of random data to extract and compress its information. It projects data from the original N-dimensional space onto an m-dimensional space, where in general m << N. In this way it reduces the dimension and removes the correlation of the data while keeping most of the internal information of the input. In other words, PCA transforms a large number of correlated data into a set of uncorrelated characteristic components. It is therefore a typical method in statistical data analysis, feature extraction and data compression, with successful applications in image processing, face recognition and time series prediction [37], [38].
Because of the limited calculation accuracy, the correlation among the elements of a sequence generated by a low-dimensional chaotic system is enhanced and the characteristics of the original chaos are weakened, resulting in a short cycle. PCA is suitable for removing this precision-induced correlation in chaotic systems.
The improved algorithm for the logistic sequence is arranged as follows:
1) Phase space reconstruction: For the sequence X of length N generated by the chaotic system, the mutual information method is used to estimate the delay time τ and Cao's method is used to determine the optimal embedding dimension m. The one-dimensional data is then reconstructed into the m-dimensional phase space C, that is

$$C = \begin{bmatrix} x(1) & x(1+\tau) & \cdots & x(1+(m-1)\tau) \\ x(2) & x(2+\tau) & \cdots & x(2+(m-1)\tau) \\ \vdots & \vdots & & \vdots \\ x(M) & x(M+\tau) & \cdots & x(M+(m-1)\tau) \end{bmatrix}, \qquad M = N - (m-1)\tau$$

The components (columns) of C are X_1 = x(n), X_2 = x(n + τ), ..., X_m = x(n + (m − 1)τ).
2) Data standardization: Normalize the reconstructed multidimensional phase space, that is, subtract the mean column by column (centralization):

$$\tilde{X}_j = X_j - \bar{X}_j, \qquad j = 1, 2, \ldots, m$$

where \bar{X}_j is the mean of the jth column.
3) Calculate the covariance matrix R_x, that is

$$R_x = (r_{jk})_{m \times m}$$

where

$$r_{jk} = \frac{1}{M-1}\sum_{i=1}^{M} \tilde{x}_{ij}\,\tilde{x}_{ik}$$

and \tilde{x}_{ij} is the ith element of \tilde{X}_j.
4) Solve the eigen equation |R_x − λ I_p| = 0 of the covariance matrix R_x, where p = m. The eigenvalues are λ_1, λ_2, ..., λ_p, with λ_1 > λ_2 > ... > λ_p. Let the orthonormal eigenvectors be u_1, u_2, ..., u_p. By the properties of a covariance matrix, R_x is symmetric, so its eigenvalues are real and its eigenvectors can be chosen orthonormal. Therefore the orthogonal matrix is U = (u_1, u_2, ..., u_p), where u_i = (u_{1i}, u_{2i}, ..., u_{pi}).
5) Transform the components of phase space into principal components:

$$Y_i = u_{1i}\tilde{X}_1 + u_{2i}\tilde{X}_2 + \cdots + u_{pi}\tilde{X}_p, \qquad i = 1, 2, \ldots, p$$

where Y_1 is the first principal component, Y_2 is the second, and so on. In order to obtain the optimal decorrelation result, the first principal component is taken as the final result of PCA.
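Steps 2) to 5) can be sketched with NumPy as follows. This is an illustration rather than the authors' implementation: the function name `first_principal_component` is hypothetical, the sample covariance uses the 1/(M − 1) normalization of `np.cov`, and the sign of the returned component is arbitrary because an eigenvector is only defined up to sign. The input needs at least two columns.

```python
import numpy as np

def first_principal_component(C):
    """Project the centered columns of C onto the leading eigenvector.

    Returns (y1, eigenvalues), with the eigenvalues sorted in descending order.
    """
    C = np.asarray(C, dtype=float)
    Xc = C - C.mean(axis=0)                  # step 2: subtract column means
    R = np.cov(Xc, rowvar=False)             # step 3: covariance matrix
    eigvals, eigvecs = np.linalg.eigh(R)     # step 4: eigh returns ascending eigenvalues
    u1 = eigvecs[:, -1]                      # eigenvector of the largest eigenvalue
    y1 = Xc @ u1                             # step 5: first principal component
    return y1, eigvals[::-1]

# Usage: two perfectly correlated columns collapse onto one component.
t = np.linspace(0.0, 1.0, 50)
C = np.column_stack([t, 2.0 * t])
y1, eigenvalues = first_principal_component(C)
```

The variance of `y1` equals the largest eigenvalue, which is why keeping only the first component retains the direction of maximum variance.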

IV. THE ANALYSES AND TESTS OF THE IMPROVED SCHEME
The logistic equation of the chaotic system is given by

$$x_{n+1} = \mu x_n (1 - x_n)$$

where µ is the control parameter and x_n ∈ [0, 1]. Firstly, the delay time τ and the embedding dimension m were obtained by the mutual information method and Cao's method respectively. Fig. 1 shows the relationship between the mutual information function I(τ) and τ. The first local minimum occurs at τ = 15, so the delay time is τ = 15. In Fig. 2, when m is greater than 8, both E1 and E2 are almost unchanged, so the optimal embedding dimension is m = 8. Secondly, we reconstructed the phase space as a 9895 × 8 matrix. Finally, PCA was carried out; its result is the first principal component y_1. Because of the projection onto the principal axis, y_1 falls outside the original range [0, 1]. To keep the modified sequence in [0, 1], normalization is applied in practice.
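The whole procedure can be sketched end to end as below. Several choices are assumptions made for illustration: x_0 = 0.7 and µ = 3.99 are borrowed from the power spectrum experiment, min-max scaling stands in for the unspecified normalization, and the function names are hypothetical.

```python
import numpy as np

def logistic_sequence(x0, mu, n):
    """Iterate x_{k+1} = mu * x_k * (1 - x_k) and return n values."""
    out = np.empty(n)
    x = x0
    for k in range(n):
        x = mu * x * (1.0 - x)
        out[k] = x
    return out

def modified_sequence(x, m, tau):
    """Delay-embed x, take the first principal component, rescale to [0, 1]."""
    rows = len(x) - (m - 1) * tau
    C = np.column_stack([x[j * tau : j * tau + rows] for j in range(m)])
    Xc = C - C.mean(axis=0)
    _, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    y1 = Xc @ eigvecs[:, -1]                  # first principal component
    # Min-max scaling is an assumed normalization, not taken from the paper.
    return (y1 - y1.min()) / (y1.max() - y1.min())

x = logistic_sequence(x0=0.7, mu=3.99, n=10000)
y = modified_sequence(x, m=8, tau=15)
print(y.shape)  # (9895,)
```

With N = 10000, m = 8 and τ = 15 the embedding has 10000 − 7 × 15 = 9895 rows, matching the 9895 × 8 phase space reported above.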

A. THE DISTRIBUTION OF CHAOTIC SEQUENCE AND THE LARGEST LYAPUNOV EXPONENT
The bifurcation diagram of the modified chaotic system (µ ∈ [3, 4]), displayed in Fig. 3(a), shows that the improved algorithm not only removes the correlation between data points but also retains the characteristics of the original chaotic system. Fig. 3(b), an enlargement of Fig. 3(a), shows that the system is already chaotic for µ ∈ [3, 3.57), where the Lyapunov exponents are positive as shown in Fig. 4, although the complexity is low. The maximum Lyapunov exponents are therefore all greater than zero in Fig. 4. After normalization, Fig. 5 shows that the sequence generated by the modified system with initial value 0.7 and µ = 3.95 is uniformly distributed in phase space, without any concentration of points in a particular region, implying strong chaoticity and good ergodicity.
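For comparison, the largest Lyapunov exponent of the original (unmodified) logistic map can be estimated as the orbit average of ln|µ(1 − 2x_n)|, the logarithm of the absolute derivative of the map. The sketch below covers only the original map; how the exponents of the modified system in Fig. 4 were computed is not stated here, and the function name, burn-in length and orbit length are illustrative assumptions.

```python
import math

def logistic_lyapunov(mu, x0=0.7, n=10000, burn_in=1000):
    """Estimate the Lyapunov exponent of x -> mu * x * (1 - x)."""
    x = x0
    for _ in range(burn_in):                  # discard the transient
        x = mu * x * (1.0 - x)
    total = 0.0
    for _ in range(n):
        total += math.log(abs(mu * (1.0 - 2.0 * x)))   # ln |f'(x)|
        x = mu * x * (1.0 - x)
    return total / n

lam_chaotic = logistic_lyapunov(4.0)    # close to ln 2 for mu = 4
lam_periodic = logistic_lyapunov(3.2)   # negative on the stable period-2 orbit
```

A positive estimate indicates sensitive dependence on initial conditions, while a negative one indicates a stable periodic orbit.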

B. AUTOCORRELATION FUNCTION
The waveform of a well-behaved autocorrelation function should be a sharp needle without protruding side lobes, which is beneficial to the accurate detection and recognition of signals. The random variables of a Gaussian white noise sequence at any two different times are not only uncorrelated but also statistically independent. Fig. 6 shows the unbiased estimate of the autocorrelation function of Gaussian white noise. The more closely the autocorrelation waveform of a sequence resembles that of white Gaussian noise, the more random the sequence is. In this paper, both the original sequence and the improved sequence are tested for autocorrelation. Fig. 7 is the autocorrelation test result of the original sequence, and Fig. 8 is the autocorrelation test result of the improved sequence before normalization. In Fig. 8, the autocorrelation function of the improved sequence is superior to that of the original one and is more similar to that of Gaussian white noise, indicating that this method removes the correlation caused by the calculation accuracy.
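An unbiased autocorrelation estimate of the kind plotted in these figures can be sketched as follows. Removing the mean first and scaling so that lag 0 equals 1 are assumptions of this sketch, as is the function name `autocorr_unbiased`.

```python
import numpy as np

def autocorr_unbiased(x, max_lag):
    """Unbiased autocorrelation estimate for lags 0..max_lag, scaled so r[0] = 1."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    # Dividing by (n - k) rather than n removes the bias at large lags.
    r = np.array([np.dot(x[: n - k], x[k:]) / (n - k) for k in range(max_lag + 1)])
    return r / r[0]

# Usage: white Gaussian noise gives a single spike at lag 0.
rng = np.random.default_rng(1)
noise = rng.standard_normal(5000)
r = autocorr_unbiased(noise, max_lag=20)
```

For a sequence of length n with no correlation, the values at nonzero lags fluctuate around zero with a spread of roughly 1/sqrt(n).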

C. THE POWER SPECTRAL DENSITY
Spectral analysis is an important method to study vibration and chaos. For the sample function of a random signal, the power spectral density function of x(t) is defined as

$$S_x(f) = \int_{-\infty}^{\infty} R_x(\tau)\, e^{-j 2\pi f \tau}\, d\tau \tag{21}$$

where R_x(τ) denotes the autocorrelation function of x(t), that is

$$R_x(\tau) = \lim_{T \to \infty} \frac{1}{T}\int_{0}^{T} x(t)\, x(t+\tau)\, dt$$

where τ denotes the sample interval. For a periodic signal, the power spectrum peaks arise only at the fundamental frequency and its multiples. For quasi-periodic signals, the power spectrum peaks arise at several irreducible fundamental frequencies and their superposed frequencies. Both chaotic motion and white noise are aperiodic, and their power spectra are continuous. Because of its irregularity, Gaussian white noise has the same energy at all frequencies, so it has a flat power spectrum over the entire frequency domain, as shown in Fig. 9. In Fig. 10, the power spectrum of the sequence was obtained from (21) with x_0 = 0.7, µ = 3.99 and length N = 10000, after 10,000 iterations. The energy of the power spectrum is unevenly distributed and reverse peaks arise, so the motion is quasi-periodic. The power spectrum of the modified sequence is displayed in Fig. 11. It is similar to the spectrum of white Gaussian noise, with energy uniformly distributed over the frequency domain. The histogram of the approximate probability density function, displayed in Fig. 12, shows a perfectly normalized distribution.
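For a finite sampled sequence, the power spectral density is commonly estimated with the periodogram, which is the discrete Fourier transform of the biased autocorrelation estimate. The sketch below is one such estimator and is not necessarily the one used for the figures; the function name and the unit sampling interval are assumptions.

```python
import numpy as np

def periodogram(x):
    """Return (frequencies in cycles per sample, power estimate) for a real sequence."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                       # remove the DC component
    n = len(x)
    spectrum = np.fft.rfft(x)              # one-sided FFT of a real signal
    power = np.abs(spectrum) ** 2 / n
    freqs = np.fft.rfftfreq(n)             # unit sampling interval assumed
    return freqs, power

# Usage: a pure sinusoid concentrates its power at a single frequency.
t = np.arange(1000)
freqs, power = periodogram(np.sin(2 * np.pi * 0.1 * t))
peak_frequency = freqs[np.argmax(power)]
```

A flat periodogram, up to random fluctuation, is what a white-noise-like sequence is expected to produce, whereas isolated peaks indicate periodic components.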

D. THE NIST STATISTICAL TEST
For all 16 tests in the NIST suite, the significance level was set to 1%. If the P-value is greater than 0.01, the binary sequence is accepted as random with a confidence of 99%; otherwise, it is considered nonrandom [39], [40]. To perform this battery of tests, we generated up to 10^6 points with the modified scheme and converted the floating-point numbers obtained from the modified system to binary form. The sequence passed the assessment, indicating strong randomness and implying robustness against many statistical attacks. The modified sequence is random with respect to all 16 tests of the NIST suite (see Table 1).
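As an illustration of how a single test in the suite produces a P-value, the frequency (monobit) test can be sketched as follows. The thresholding at 0.5 used to turn floating-point values into bits is an assumption, since the conversion rule is not given here, and both function names are hypothetical.

```python
import math

def to_bits(values, threshold=0.5):
    """Assumed binarization: 1 if the value is at or above the threshold, else 0."""
    return [1 if v >= threshold else 0 for v in values]

def monobit_p_value(bits):
    """Frequency (monobit) test: P-value = erfc(|S_n| / sqrt(2 n))."""
    n = len(bits)
    s_n = sum(1 if b else -1 for b in bits)    # map 0 -> -1, 1 -> +1 and sum
    return math.erfc(abs(s_n) / math.sqrt(2.0 * n))

# Usage on a short 10-bit sequence.
p = monobit_p_value([1, 0, 1, 1, 0, 1, 0, 1, 0, 1])
is_random = p > 0.01   # accept at the 1% significance level
```

The test checks only the balance of ones and zeros; the remaining tests in the suite target other departures from randomness.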

E. THE ENTROPY TEST
Entropy (such as K-S entropy, information entropy (EN) and approximate entropy (APEN)) is one of the most important characteristics of randomness. Information entropy comes from the mathematical theory of data communication and storage and was proposed by Shannon in 1949 [41]. To calculate the entropy of the modified sequence, we have

$$H(s) = -\sum_{i} P(s_i)\log_2 P(s_i)$$

where P(s_i) represents the probability of the symbol s_i [14], [42]. The information entropy of the original sequence is 7.7350, and that of the modified sequence is 7.9734. Approximate entropy (APEN) was proposed in [43]. We performed the APEN test on the same sequences. The APEN value of the modified sequence is 2.5557, whereas that of the original is 0.6434. It can be seen that our improved algorithm increases the complexity of the sequence.
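The information entropy can be computed as sketched below. The quantization of values in [0, 1] into 256 symbols is an assumption, suggested by the reported values being close to 8 bits; the function name is hypothetical.

```python
import math
from collections import Counter

def shannon_entropy_bits(values, levels=256):
    """Entropy in bits of values in [0, 1] after quantization to `levels` symbols."""
    # Assumed quantization: equal-width bins, with 1.0 mapped to the last symbol.
    symbols = [min(int(v * levels), levels - 1) for v in values]
    n = len(symbols)
    counts = Counter(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Usage: one value in every bin gives the maximum of log2(256) = 8 bits.
uniform = [(k + 0.5) / 256 for k in range(256)]
h_max = shannon_entropy_bits(uniform)
h_min = shannon_entropy_bits([0.3] * 100)   # a constant sequence carries no information
```

Values closer to the maximum of 8 bits indicate a more uniform symbol distribution.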

V. CONCLUSION
In this paper, we establish a scheme to deal with the influence of finite precision on the dynamics of digital chaotic systems. The improved scheme is based on PCA and on phase space reconstruction using the mutual information method and Cao's method. The simulation results show that the improved scheme is simple and easy to operate and removes the correlation among the data of the original sequence, which increases the complexity of the digital chaotic system without destroying the attractor structure of the original system. The scheme can be applied to any given chaotic system, of low or high dimension, and can be used in pseudo-random coding, secure communication and channel transmission, image encryption and other fields.
CHUNYUAN LIU received the B.S. degree from the Computer and Information Engineering College, Heilongjiang University of Science and Technology, in 2003. She is currently pursuing the Ph.D. degree in information and communication engineering with Heilongjiang University, Harbin. Her research interests include nonlinear dynamics, image encryption, and image compression.
QUN DING received the Ph.D. degree in instrument science and technology from the Harbin Institute of Technology, China. She is currently a Professor with the College of Electronic Engineering, Heilongjiang University, China. She has published two books and over 100 scientific articles in refereed journals and proceedings. Her research interests include nonlinear dynamics and control, chaos pseudo-random sequence generator, and chaotic secure communication.