Is the PPG signal chaotic?

This paper shows how the dynamics of the PhotoPlethysmoGraphic (PPG) signal, an easily accessible biological signal from which valuable diagnostic information can be extracted, of young and healthy individuals performs at different timescales. On a small timescale, the dynamic behavior of the PPG signal is predominantly quasi-periodic. On a large timescale, a more complex dynamic diversity emerges, but never a chaotic behavior as earlier studies had reported. The procedure by which the dynamics of the PPG signal is determined consists of contrasting the dynamics of a PPG signal with well-known dynamics---named reference signals in this study---, mostly present in physical systems, such as periodic, quasi-periodic, aperiodic, chaotic or random dynamics. For this purpose, this paper provides two methods of analysis based on Deep Neural Network (DNN) architectures. The former uses a Convolutional Neural Network (CNN) architecture model. Upon training with reference signals, the CNN model identifies the dynamics present in the PPG signal at different timescales, assigning, according to a classification process, an occurrence probability to each of them. The latter uses a Recurrent Neural Network (RNN) based on a Long Short-Term Memory (LSTM) architecture. With each of the signals, whether reference signals or PPG signals, the RNN model infers an evolution function (nonlinear regression model) based on training data, and considers its predictive capability over a relatively short time horizon.


Introduction
Even in ancient times, medical specialists paid particular attention to human physiology, since understanding the mechanisms that make it possible for the human body to function correctly opened up a new avenue in development diagnostic procedures to possible pathologies or more and less severe somatic disorders (Nguyen, 2016). Physiological systems are dissipative systems that, in their energy exchange with the surrounding environment, ensure the stability of the homeostatic process, an internal self-regulatory mechanism that guarantees optimal vital conditions (Haddad and Chellaboina, 2005). This interrelation is best reflected by dynamic variables that can be measured directly or indirectly using the appropriate technical equipment. Dynamic variables are known in medical jargon as biological signals and represent the dynamic response (output variables) of the physiological system at hand. Roughly speaking, it is not always possible to measure all the dynamic variables involved in physiological functioning; at best, only a few, the so-called physical observables, are available. Even then, it is practically possible to gain information on the state of the system only using one dynamic variable (Packard et al., 1980).
The self-regulatory gearing of physiological systems operates intricately at different timescales, even though, in many cases, the time response of some of the system's dynamic variables shows an apparent regularity at small timescales (Dana et al., 2009). However, this apparent regularity conceals dynamic subtleties that mainly expect materialize in the longer term and which often mask under the guise of dynamical noise. These minor variations provide the necessary complexity to enable regulation at a very precise and effective level. Therefore, biological signals have both deterministic and stochastic components (Costa et al., 2005), both contributing to the underlying dynamics of the physiological system.
In a previous paper, we study the dynamic behavior of the PPG signal from a stochastic perspective. Through a slight modification of the 0-1 test, we contrast the dynamic behavior of the PPG signal with a diffusive process at different timescales. The results compare with those obtained for reference signals whose dynamic behavior is known, aside from being typical of the possible dynamics that physical systems may exhibit (de Pedro-Carracedo et al., 2019). This paper assumes no a priori analytical approach. This study intends to inquire, using neural network-based methods, which dynamics are present in PPG signals of young healthy individuals with the known dynamics of the reference signals. By learning these known dynamics, even in the presence of noise, together with their predictive capability, the neural networks in this work have made it possible to examine the dynamic composition imbedded in the time evolution of any PPG signal at different timescales.
The rest of the paper is organized as follows. Section 2 describes the datasets used in the experimental tests, as well as the two deep neural network-based architectures that serve as an experimental framework. In addition to the adapted internal structure of these architectures, this section refers to their training to fulfill analytical requirements pursued in this study. Section 3 shows the obtained results, both graphically and numerically, for various experimental settings. In section 4, we analyze and interpret the obtained results. Finally, in section 5, we briefly outline the conclusions drawn from this study, which in turn serve as the basis for future work.
[ Figure 1 about here.] 2 Materials and Methods 2.1 Datasets 2.1.1 Reference signals (synthetic dataset) Although the complexity of signals admits different definitions-a rigorous definition has not been agreed by the scientific community (Grassberger, 1986;Badii et al., 2003)-, a consistent alternative evaluates the degree of complexity regarding regularity in the patterns of repetition of the data; to some extent, this alternative quantifies the difficulty in describing or understanding a signal (Kantz and Schreiber, 2004). These patterns lead to an ordering of the signals between two opposite ends, the most regular or periodic and the most irregular or random, with a whole range of intermediate options; as they approach a random regime, as in the case of chaotic signals, these acquire greater degrees of freedom or versatility, without losing determinism in its dynamic behavior.
To train our CNNs, we have used 150,000 samples (points) of each one of the different signals that represent the typical behavior of time series concerning regularity in the repeating patterns of the data (reference signals). Classification of the signals is according to the order of less to greater complexity, in accordance with the aforementioned approach, from the most regular or periodic evolution to the most random one. Figure 1 shows the amplitude variation, normalized to the interval [0, 1], of all reference signals in the time domain.
Periodic signal Here, we analyze a repetitive pattern with two frequencies whose values are rationally related, i.e., we use a saw's wave at f = 10 Hz: Quasi-periodic signal A certain recurrence in time evolution may lead to wrongly consider this type of dynamics as an "irregular periodicity" or the same "repetitive structure". The fact is that the pattern never repeats if the data have infinite precision, shaping a torus or tori depending on the degrees of freedom (Landau and Lifshitz, 1987). In our case, we apply the most simple one, two cosine waves whose frequencies are irrationally related: with ω 1 being the reciprocal of the golden mean.
Aperiodic signal An aperiodic signal has non self-similar repetition even with infinite precision data, although mathematically it can be considered like a periodic function with an infinite period. The selected function is the one that generates samples by a linear frequency sweeping, the chirp function: between f 0 = 0 Hz and f 1 = 10 Hz, with k = (f 1 − f 0 )/T , being T the sweep time; and the sampling frequency f s = 250 Hz.
Chaotic signal The sensitivity of certain deterministic functions to small changes in the initial state is the characteristic footprint of chaotic behavior. The initial uncertainty increases with time, and it is not possible to predict the final state of the system (N. S. Krylov 1944; M. Born 1952) (Landau and Lifshitz, 1987). We use the Hénon map dynamic variable x n , with a = 1.4 y b = 0.3. The initial conditions were x 0 = y 0 = 0.03. One million samples were generated, using the last 150,000 points.
x n+1 = 1 − ax 2 n + y n , y n+1 = bx n . (4) Random signal It is not deterministic and requires a probabilistic characterization. 150,000 points were generated by a uniform distribution in the interval [0, 1].

Real-world PPG signals (biological dataset)
This paper focuses only on a single biological signal, the PhotoPlethysmoGraphic (PPG) signal; forward publications will describe the results for more biological signals. We have chosen the PPG signal because it is easily accessible and the information provided allows us to monitor vital physiological signs. A pulse oximeter consists of a light emitter and a photodetector that collects and records (pulse or PPG signal) the loss-scattering and absorption-that a beam of light undergoes when it passes or reflects from human tissue. It allows detecting blood volume changes in the microvascular bed of tissue-in our case, the middle finger of the left hand-, obtaining valuable information about the cardiovascular system and, on the whole, about the cardiorespiratory system. Given the simplicity of its non-invasive accommodation, in addition to its low cost, a pulse oximeter is very useful in biomedical applications for clinical and sports environments. For a useful review refer to (Allen, 2007).
In this work, we use the PPG signals from a total of 40 students, between 18 and 30 years old and a non-regular consumers of psychotropic substances, alcohol or tobacco, selected to participate in a national research study (Aguiló et al., 2015;Arza et al., 2018). All signals were captured from the middle finger of the left hand and sampled at a frequency of 250 Hz (Aguiló et al., 2015), say, sampling time ∆t = 4 ms. Since the PPG signal operates at a very low frequency, where its dynamic diversity is manifested, it is necessary to work with many points; hence we have used up to 600,000 points (40 minutes) of every PPG signal, those corresponding to the total data recorded in the indicated research project.

Dynamic behavior classification with a CNN architecture
In this paper, we propose a CNN-based approach to classify PPG signals according to their dynamic behavior. The CNN receives time segments of the PPG signal as input, and it produces a normalized matching-up vector (target pattern) as output that contemplates the success rate of each possible dynamic behavior assignable to the input signal. The available dynamic behaviors are periodic, quasi-periodic, aperiodic, random and chaotic.

CNN architecture
Here we propose an encoder-fully connected residual architecture whose residual blocks are based in the ones from (He et al., 2016). The encoder-fully connected architectures are traditionally used in classification tasks like (Simonyan and Zisserman, 2015;Szegedy et al., 2015), actually it is used in a range of applications like Wi-Fi people detection (Huang and Lin, 2020) or shape regression (Güler et al., 2017). We define the proposed CNN as ours for three main reasons: first, that is a CNN explicitly created for 1D signal processing; at next, we prepare an structure capable of process high and low frequencies with the use of long kernels; finally, we base our residual blocks in ResNet (He et al., 2016), but all of them have been adapted to 1D processing.
[ Figure 2 about here.] The reason to use residual block configurations is that it allows preserving all the processed information, a very known problem in CNN architectures. With these residual blocks, we can process and add new information to the old one, enriching the information to be processed by the next layer, in contrast to non-residual structures like the ones used in (Simonyan and Zisserman, 2015), that loss information across each one of the layers used. In our case, both low and high-frequency information is essential (specially low frequency), for which reason is very important to preserve or reuse all the available information. We cannot afford to lose it, because all this information can hide invaluable dynamics behaviors in the PPG signals.
The base structure is shaped like a classic 1D-CNN that has two stages. The first one is an encoder feature extractor that is composed of 1D residual blocks whose basis is the one used in (He et al., 2016). This stage includes max poolings layers that obtain the most important features and erase the spurious of the previous outputs. The second one is a classification stage composed by fully connected layers, where the features obtained by the encoder are vectorized and used as inputs to the fully connected layers, that are the ones responsible for classifying the dynamics behavior of the input signal segment using the encoder output features that characterize the input signals.
The input time segment I is processed by the encoder feature extractor, which generates the latent filtered representation L. The process from the input signal to obtaining the latent space follows these steps. Firstly, the point segment of the input signal passes through the first residual block used; this first residual block has a kernel of 14 so that its receptive field will be quite wide, being able to capture both high and low frequencies. This first residual block is in charge of obtaining the simplest filters characterized by simple features such as fast transitions between samples or simple relationships between them. In order to filter the outputs of this residual block, a max pooling layer is used, which allows eliminating parasitic activations and highlighting the most outstanding ones. Once the first residual block + max pooling has been passed, the second residual block is used. This second block, unlike the first one, uses a lower kernel since it does not require such a wide receptive field when using as input the outputs of the filters of the first residual block. The mission of this second block is to obtain more complex features and of greater dimensionality; once again, a max pooling layer is used at the output to highlight which of these more complex features are best suited to the input signal segment. This last output is the one that constitutes the latent space of our neural network. Once we generate the latent space, we proceed to the classification stage or fully connected. In this stage, the first thing we do is to vectorize the input so that the dense layers can easily process it. After vectorizing the latent space, it passes through a first dense layer composed of 12 neurons and a ReLU activation; this is the first one to process the information before its final classification. The second and final stage is a 5-neuron dense layer, which corresponds to the number of output dynamic behaviors. This last layer has a softmax activation that allows obtaining some output dynamics with a success rate and normalized format to one; besides, it is the one in charge to relate finally all the information used and to obtain the final criterion of classification of the dynamics. So this final stage produces the 5-dimensional output vector C, as we can see in Figure 2.
In Table 1, we can see the architecture in detail, with all its layers and parameters. Finally, to provide a better understanding of the used residual blocks insides, we graphically display all its structure in In short, the reason to propose and use this architecture is given by its very favorable characteristics to face the problem here posed. Such characteristics, in summary, are composed of: • Wide Kernels: These kernels allow a receptive field captures both high and lower frequencies, being able to unravel all the dynamic behaviors present in the signal at different timescales. • Residual Blocks: These blocks allow, on the one hand, improve the convergence of training and, on the other hand, allow reuse all the information that is lost through the convolutional filters, reducing the information lost to a minimum.
• Max Pooling Layers: These layers allow filtering the most negligible and noisy components generated through the neural network.
To test the effectiveness of our proposed architecture, a comparison is made in 3.1.4.

Training
In the training step, we use the reference signals explained previously in 2.1.1 to train our CNN end to end and allowing this to learn the most important features of the dynamic behavior matching task, allowing us to avoid the use of a real-world signal labeled database. In this step, we train the network with the synthetic dataset, which is divided into training, validation and testing sets, which are composed of 80%, 10% and 10% of each dataset. These division ranges have been chosen since usually the training set constitutes 60 to 80 per cent of the dataset while the validation and test set comprises 20 to 40 per cent. At next, we explain the training parameters.
1. Optimizer. The used optimizer is the Adaptative Moment Estimation, better known as Adam (Kingma and Ba, 2014). This optimizer is an alternative to the traditional stochastic gradient descent (SGD) algorithm optimizer, and it combines the advantages of two previous alternatives (Duchi et al., 2011;Dauphin et al., 2015), creating a more reliable preference that uses the averages of the first and second moments of the gradient to adapt the learning rate dynamically. The learning rate is the parameter that defines how much and how fast our system learns in each period; a very high learning rate may result in training divergence, while a very low learning rate may not advance or take a long time to train the system. Adam starts with a user-defined learning rate and, after that, it modifies the learning rate along with the training without supervision, providing an adaptable learning rate to the train, which is a big advantage in terms of learning adaptation. The initial learning rate is 10 −4 ; to complement this train, we use an early stopping callback. The early stopping callback is a tool that helps us to save the best model of all our training. This callback records the metrics and losses achieved in each epoch, saving only the best of the achieved models. The network is trained for 10 epochs with a batch size of 50 samples, but the before mentioned callback saves the best-obtained model in these 10 epochs. The total training time is about 6 hours.
2. Loss Function. Our CNN uses to train a training set formed by input signal segment I and output matching-up vector C. The proposed multiclass classification loss function is the categorical crossentropy loss, that evaluates the differences between ground truth and predictions. The categorical cross-entropy loss function is applied between the ground truth and the per class calculated success rate after softmax activation by the CNN S j . In Eq. (5), S p appears when using one hot representation for ground truth vectors, so that there's only one non-zero element t p = 1, which belongs to the true label.
Finally, we explain in-depth and step by step how the training takes place. The first thing that needs to be clarified is that it is necessary to train one CNN for the small timescale, whose inputs are represented by signal segments of 5,000 samples and another one that is in charge of the large timescale whose inputs are composed of signal segments of 60,000 samples. The training of these CNNs is done through the synthesized reference signals, which are composed of periodic, quasi-periodic, aperiodic, random and chaotic dynamics. Since these CNNs have been designed from a classification point of view, the problem to be solved is a multi-class classification, with five possible classes or dynamic behaviors to be found. Each of the segments of the input signals has associated output labels defined, which directly link these input segments to a specific dynamic, learning the system that with 100% success rate, these segments are associated with a dynamic behavior. In the training process, a random batch generator is used that takes 50 random segments from the 5 reference signals with 5,000 or 60,000 points and generates 50 output labels assigned. This process has been randomized to provide more generalized and learning-rich training. Once the batches are loaded, during 10 periods and through the use of the Adam optimizer, the proposed system is trained to recognize the input signals correctly. For the training to be carried out, an 80% (600,000 points) of data is used for training, a 10% (75,000 points) of data for validation, and a 10% (75,000 points) of data for test. After training the proposed system with the reference signals, it is evaluated with the real-worlds PPG signals, which, unlike the reference signals, have a dynamic with multiple dynamic behaviors at different timescales. So, its output vector does not have a single dynamic with a 100% success rate, but this rate is redistributed among the five dynamics trained through the reference signals, obtaining outputs characterized by a composite dynamics and not unique.

Horizon of prediction with an RNN architecture
We now propose an RNN architecture to predict the following h p points of the time series conformed by the PPG signals beyond the current point considered in a current time t c . In addition to this, we predict the following h p points of periodic, quasi-periodic, aperiodic, random and chaotic time series, to offer a perspective of how predictable are the PPG signals and with which type of dynamic horizon or horizons are most associated. The RNN receives time segments of the PPG signal as input and it outputs a regression of the t c + h p(n) point, being p(n) the offset regarding to the current time (starting point).
The horizon of prediction h p is 50 points to cover a wide range of samples to stabilize the prediction error. Thus, we train 50 different RNNs for each time series specialized in predict the next 50 points with regards to the starting point.

Architecture
We suggest the use of an RNN. The best-known types of RNN are, firstly, the classic RNN blocks, which are simply made up of a tanh activation and a concatenation of the current input with the previous outputs. These blocks are in disuse due to their low efficiency and high instability, as they diverge significantly. Secondly, the LSTM or Long-Short Term Memories units introduced the concept of doors in recurrent networks, using different structures within the same block to carry out different functions (door of oblivion, update door, exit door), each one vital for the correct operation of the same. Finally, the Gate Recurrent Units (GRU) (Cho et al., 2014), which are a simplification of the LSTM, reducing the number of structures or gates to two, the reset gate and the update gate, would theoretically obtain similar performances.
[ Figure 4 about here.] Here we propose a Recurrent Neural Network based in LSTM structures (Hochreiter and Schmidhuber, 1997), since the RNN are highly unstable and problematic, while the GRU are recurrent units that try to simplify the LSTM units and that theoretically obtain the same performance as these, even though this adjustment may not have been observed in practice. The LSTM-based architectures are traditionally used in time series regression and classification tasks, as we can see in (Zhang et al., 2018;Ningsih et al., 2019;Khotimah et al., 2019).
Two LSTM layers with sequence return connections make up the base structure and one final dense layer to provide the point prediction. The first layer LSTM is in charge of obtaining the time characteristics of the input signals directly. This first layer returns as output the final state and all those intermediate states of the LSTM, which allow enriching the output of this layer widely. This enrichment is used by the second layer LSTM that uses all these time characteristics and obtains much more complex relations between them and their intermediate states, to finally obtain the output state of this second layer LSTM. All these LSTM layers use hyperbolic tangent functions to provide high non-linearities to the prediction model. Finally, the outputs of the second LSTM layer are processed by the dense output layer that is in charge of interrelating all these and performs a regression through which it predicts the point of interest. This dense output layer employs a linear activation, which is a widely used activation in regression tasks. The input time segment I with size 1, 200 (around 6 PPG signal cycles) is processed by the first LSTM layer that returns all the intermediate states to enrich the used information by the second LSTM layer; this second LSTM layer processes more complex and recurrent information to finally decide, along with the fully connected layer, what is the prediction of the point P(t c + h p(n) ), as we can see in Figure 4. In Table 2, we can see the architecture in detail, with all its layers and parameters. Finally, to provide a better understanding of the used LSTM blocks inside, we can observe all its internal structure in Figure  5.

Training
In the training step, we use the reference signals explained in 2.1.1 to estimate the different trends horizon of prediction and characterize them. Besides that, we will train with five real-world PPG signals from the dataset, following 2.1.2, to provide an average PPG horizon of prediction. Each one of the processing of each one of the signals, will be composed of N different trainings that will create N different RNNs specialized in the prediction of each one of the defined points of interest to predict. So for each signal, we train N RNNs. One hundred fifty thousand points form each one of the training signals. In Figures 1  and 6, we see the training signals conformed by the five most important dynamic behaviors and five realworld PPG signals selected randomly, respectively. In this step, we divide and train with the signals in a training set and a validation set composed by the 80% and the 20% per cent of the dataset, accordingly. These division ranges have been chosen since usually the training set constitutes 60 to 80 per cent of the dataset while the validation and test sets comprise 20 to 40 per cent. Let us know look in greater detail the training parameters.
[ Figure 6 about here.] 1. Optimizer. The used optimizer is the same model applied to our CNN architecture, with the exception that the initial learning rate is 10 −3 and, in this case, the used batch is 1000, because big batch sizes improve greatly the convergence, which is an already known problem in the recurrent neural networks structures. The total training time of the N networks for each dynamical horizon is about 32 hours.
2. Loss Function. The RNN uses to train a training set formed by the input time segment and an output set conformed by the points to predict. The proposed regression loss function to use is the logarithm of the hyperbolic cosine or log-cosh loss. This loss tries to combine the advantages of the L 1 -norm and L 2 -norm, being robust to outliers at the same time that encourages the correct points to learn. The proposed log-cosh loss evaluates the differences between the predicted points P i and the ground truth ones Q i , as we can see in Eq. (6).
Just as we have done for our CNN architecture, we explain in depth and step by step how the training takes place for our RNN architecture. The first thing that needs to be commented is that it is necessary to train one RNN for each point of the prediction horizon to be predicted, that is to say, in order to find the prediction horizon at 50 points of view of a signal that we want to analyze, it is necessary to train 50 RNNs in charge of predicting each of the 50 points of interest. The fact of using a network for each point allows these networks to specialize in calculating the specific point. The inputs of the proposed RNN are composed of segments of 1200 points of the input signal, while the outputs of the same are composed of each of the 50 points following the input segment analyzed. The training of this network can be done both with real-world PPG signals and with previously generated synthetic signals. However, to build the prediction horizon solidly and to associate real PPG signals correctly to a trend, it is necessary to make the prediction horizon both from reference signals and from real-world PPG signals, thus being able to build a more robust and intuitive diagram. In the training process, random batches containing random segments of the signal to be analyzed is processed, with a length of 1,200 points. This process, randomize, provide a more generalized and learning-rich training. Once the batches are loaded, during 10 periods and through the use of the Adam optimizer, the proposed system is trained to predict the output point correctly. For the training to be carried out, 80% of each signal sample trained is used; 10% of each signal for validation and 10% of each signal for test. Once the prediction horizon and predicted points of the real-world PPG signals are obtained, the average prediction horizon measured by these PPG signals of interest is calculated.

Results
Different experimental tests based on neural networks allow us for a qualitative and quantitative assessment of the dynamic behavior of a PPG signal at different timescales. The results are presented graphically and numerically. The numerical results shown in the tables concern all the measurements of the PPG signals available for each individual, i.e., around 600,000 points (40 minutes

Dynamic behavior classification with a CNN model
Here we show the evaluation of our implemented CNN-based dynamic behavior classification system (cf. § 2.2 for more details) using a real-world PPG signals dataset, consisting of 40 PPG signals from young and healthy individuals between the ages of 18 and 30, according to a national research project, and reference signals of well-known dynamics (cf. § 2.1). Also, we look at the effect of noise and filtering on PPG signals to prove the classification capability of the system. The dynamic characterization process consists of two trained systems with a different number of input samples chosen empirically, the first with I = 5, 000 and the second with I = 60, 000. In this way, we provide two different perspectives: a very locally approach (I = 5, 000), or small timescale, and a less locally approach (I = 60, 000), or a large timescale, of the system, as we can see in Figure 7. The rationale for defining different timescales is that the dynamic behavior differs considerably on the PPG signal as we introduce more and more signal cycles, allowing us to discover the hidden dynamic richness of the PPG signal that we cannot appreciate at small timescales.

Preliminary analysis of the PPG signal
This first experimental test allows us to analyze the dynamic behavior of PPG signals at different time scales. Unless stated otherwise, to avoid high-frequency noise and to some extent, motion artifacts, all PPG signals are filtered with a simple Butterworth bandpass filter with cutoff frequencies at 0.01 and 8 Hz. The notion of small timescale means using as input to our CNN model I = 5, 000 samples, and large timescale means I = 60, 000 samples. In Figure 8, we show an input sample PPG signal superposed with the dynamic behaviors matching-up rate allocated by our proposed CNN model at different timescales. In Table 3, we refer to the corresponding numerical values, also providing the average values of each dynamic behavior for all PPG signals.
[ Figure 8 about here.] As can be seen in Table 3, at small timescales, the predominant dynamic behaviour of PPG signals is quasi-periodic, with matching-up rate of around 99%, and residual rates for the rest of the dynamics. However, at larger timescales, the dynamics get more complex and intriguing. It is now the aperiodic evolution that manages the dynamic behavior of the PPG signal, with a by no means negligible quasiperiodic component and a weakest random component.

Raw versus filtered PPG signals
The next experimental test aims us at assessing the discriminant power of the dynamic behavior classifier implemented by our CNN model for filtered and raw (unfiltered) PPG signals. Raw PPG signals include  those supplied direct by the research project itself. Although we are aware that PPG signals have some preprocessing, it is our understanding that PPG signals have quite a few motion artifacts, among other limiting factors. For filtered PPG signals, we simply kept that stated in 3.1.1, according to which all PPG signals are filtered with a Butterworth bandpass filter with cutoff frequencies at 0.01 and 8 Hz, in order to remove artifacts while retaining as far as possible all dynamic information available. As in the previous experimental test, we explore the dynamic differences between raw and filtered PPG signals at different timescales. In Figure 9, we show an input sample raw and filtered PPG signal superposed with the dynamic behaviors matching-up rate allocated by our proposed CNN model at different timescales.
In Table 4, we refer to the corresponding numerical values, also providing the average values of each dynamic behavior for all PPG signals, in conjunction with the percentage change |∆|, in absolute terms, between filtered and raw signals.
In accordance with Table 4, we can see how filtering enhances the discriminating power of our CNN architecture. Raw PPG signals enter much noise into the outputs provided by our CNN model, especially on a large timescale, where even the predominant dynamics, quasi-periodic and aperiodic behaviors, exchange their role with respect to what happens with filtered signals. At small timescales, the effect is not as marked, perhaps due to possible noise in the input signal or spurious artifacts, and the outputs provided by our CNN model are more stable. However, the percentage change is still significant in the case of the predominant dynamic behaviors, more than 15%, and the classification process implemented may make mistakes.
[ Figure 9 about here.] 3.1.3 PPG signals with added white noise Within our CNN architecture, the latest experimental test probes the robustness of the dynamic behavior classifier when examined PPG signals are noisy. That is why we add different additive white noise levels to real-world PPG signals, and roughly estimate the tolerance threshold of our CNN network to discriminate dynamic behaviors in noisy PPG signals at different timescales. In Figure 10, we show an input sample noisy PPG signal, at 1%, 5% and 10% noise level, superposed with the dynamic behaviors matching-up rate allocated by our proposed CNN model at different timescales. In Table 5, we refer to the corresponding numerical values, also providing the average values of each dynamic behavior for all PPG signals, according to the added white noise amplitude.
[ Figure 10 about here.] Within Table 5, we can see that with 5% white noise added to PPG signals, our CNN architecture already detects a strong random component at the expense of the alleged winning dynamic choice in each timescale. An interesting point is that with 1% added white noise, the random component is hidden, and at the same time the chaotic component resurfaces, but not above 4.5% on a large timescale.

Comparison with other architectures (state of the art)
Finally, in an additional experimental test, we compare our proposed CNN architecture with others from the state-of-the-art to ensure the reliability of the obtained results. The architectures being employed by comparative purposes include Alexnet (Deng et al., 2009) and VGG16 (Simonyan and Zisserman, 2015), which we have implemented and adapted to solve one-dimensional problems, as the one raises here. The rationale for using these specific state-of-the-art architectures lies in the dissimilarity of their internal structures, each with innovative strategies in information processing.
Starting with the Alexnet architecture, it represents one of the most basic CNN structures. Its conceptualization is appealing because it uses convolutional layers with large kernels and reduces fast these kernels to the small ones, producing abrupt transitions and obtaining more general features of the inputs due to this initial large receptive field of the convolutional layers. As opposed to the Alexnet architecture, we find that the VGG16 architecture introduces another interesting structure in a CNN: the filtering banks. These consist of 3 × 3 convolutional layers followed by Max Poolings to obtain the greater activations of the feature maps produced. In this case, the transitions between layers are smoother but have a less receptive field, which implies the use of more local features. Lastly, we propose our architecture, which is based on (He et al., 2016) and introduces the concept of the residual layers, that allow to recover and use the previous information along the Residual Blocks to avoid the leak of information along the convolutional layers and ensure the use of all the possible information in the CNN to predict the underlying dynamic behavior. As you can see, each of the architectures explained develops important concepts and different types of classification backends, to ensure the reliability of the obtained results.
In Figure 11, we show a sample input PPG signal superposed with the dynamic behaviors matchingup rate allocated by the proposed CNN-based models at different timescales. In Table 6, we refer the corresponding numerical values for four different PPG signals randomly chosen. At first sight, according to Table 6, the three CNN-based architectures reveal that, at small timescales, the predominant dynamic behavior is the quasi-periodic, but at large timescales, the 1D-VGG16 diverges from the norm and puts forward differentiated dynamic behaviors. This discrepancy is linked to the small size of the filters used by the 1D-VGG16 architecture, which improves a local timescale analysis, at the expense of a more global analysis (large timescale). A closer look at the numerical values shows how our CNN model performs a much finer dynamic discrimination than the other two proposed architectures, allowing us to a better identification of a component that we feel could be key in the dynamics of PPG signal, the random component.

Horizon of prediction with an RNN model
From another perspective, a supplemental experimental test allows us to predict up to h p future samples of the PPG signal from the current time t c . The principal aim of this experimental test is to shed light on how predictable are PPG signal and what type of dynamic behavior fits better in terms of predictability based on the reference signals. With this in mind, we have implemented and adapted a RNN architecture (cf. § 2.3 for more details) to infer a nonlinear regression model that best fits both PPG and reference signals. All the results are shown in Figure 12.

Discussion
We tackle the dynamical analysis of the PPG signal of young and healthy individuals from two complementary angles. On one side, we identify the dynamic patterns of behavior present in the PPG signal at different timescales with the aid of a custom CNN architecture. On the other, we consider the level of predictability of the PPG signal on the grounds of a nonlinear regression model implemented in an RNN architecture.

Dynamic behavior classification
Within physiological systems, the seeming regularity at a glance of the time evolution of many biological signals, as is the case with the PPG signal, covers up fairly more complex dynamic patterns, slowonset processes, that contribute to efficiently regulating homeostatic mechanisms to the body to function properly. The proposed CNN model has allowed us to classify the dynamics of the PPG signal of young and healthy individuals at different timescales, in connection with the most typical dynamics found on a wide range of physical systems, i.e., periodic, quasi-periodic, aperiodic, chaotic or random dynamic behaviors.
In the first instance, we analyze the dynamics of all PPG signals available in an experimental project conducted in 2015 by using the dynamic classification process implemented in the CNN model. The number of participants in this research project amounted to 40 students between 18 and 30 years old and non-regular consumers of psychotropic substances, alcohol or tobacco. According to Table 3 (cf. § 3.1.1), the dynamic behavior of the PPG signal is clear and predominantly quasi-periodic at small timescales, in line with the physiological response of the cardiorespiratory system, which manifests in the form of a pulsatile component, due to the heartbeat, modulated quasi-periodically by breathing pattern, among other factors. However, on a large timescale, the PPG signal starts experimenting with a strong tendency to aperiodic behavior, despite the presence of some latent quasi-periodicity, which sometimes fades or becomes confused with a less remarkable periodical bias. The cardiorespiratory system as a whole is affected by physiological processes that develop slowly, to better aligning to body's needs in all time, which justifies a more competition of factors affecting cardiac modulation. In either case, in the analysed PPG signals, we have found no significant trace of chaotic or random behavior at all different timescales.
To guarantee obtained results, we also evaluate the influence of filtering and noise on the dynamics of PPG signals. So, Table 4 (cf. § 3.1.2) shows the average contribution that the CNN model assigns to the dynamics present in all PPG signals at different timescales, before and after filtering. As we can see, the raw PPG signals available to us in the study have quite a lot of motion artifacts or artifacts that arose out of the conditions in which the measurements were conducted. As a result, our CNN model has difficulties in clearly distinguishing between the two predominant dynamic behaviors at different timescales, quasiperiodic and aperiodic dynamics, emerging sudden fluctuations that confuse and deceive the matching-up process of the CNN model, as exemplified in Figures 9a-9b. However, once PPG signals are filtered with a simple Butterworth bandpass filter with cutoff frequencies at 0.01 and 8 Hz-in order to avoid highfrequency noise and to some extent, motion artifacts, without losing as much dynamic information as possible-, the matching-up process of the CNN model does not show fluctuations and provides a clear dynamic response. So, on a small timescale a quasi-periodic dynamic behavior and on a large timescale an aperiodic behavior, as shown as examples in Figures 9c-9d, respectively. Nevertheless, despite the predominant aperiodic behavior, our CNN model recognises two additional underlying components: a quasi-periodic component and a random component. One of them, the first, reveals an about sustained heart rate, quasi-periodically modulated by breathing rhythm under normal operating conditions. The second of these, we think that provide the stochastic component that breaks up the apparent dynamic inflexibility and is responsible for introducing multiple frequency variations much needed in a system as adaptive as the cardiorespiratory system (see Figure 8b for an example).
In the case of the effect of noise on the dynamics of PPG signals, we introduce additional levels of white noise amplitude (1%, 5% and 10%) to the PPG signals to verify the dynamic tolerance of PPG signals to external noise. As can be seen on Table 5 (cf. § 3.1.3), up to about 5% our CNN model is being able to retain the prevailing deterministic dynamic response at different timescales. Simply put, on a small timescale a quasi-periodic dynamic behavior, and on a large timescale an aperiodic behavior, although now with 5%, the presence of a strong random component in the detected dynamic behavior is noticeable; with 10% additive noise, the predominant dynamic behavior is random (see Figure 10 for an example).
Finally, we compare the performance of the classification process of our CNN model with that of other implementations of remarkable CNN architectures. In light of the results of Table 6, we can reaffirm that our proposed architecture performs a dynamical analysis very consistent with other typical configurations based on convolutional neural networks. Our proposed model and the Alexnet model use wider-band filters, enabling the detection of a very low frequency dynamic spectrum (large timescale), unlike the 1D-VGG16 model, where the small size of the filters make correct detection of the dynamics on a large timescale virtually impossible. A more thorough examination of the numerical data show how our CNN architecture becomes significantly more selective, in terms of dynamic discrimination, than the other two proposed architectures, allowing us to a better identification of, among others, the random component, which we see as a key factor in the dynamics of PPG signal at large timescales.

Horizon of prediction
From another angle, since we seek to show if the PPG signal of young and healthy individuals is chaotic, we proceed to evaluate the predictive capacity of an RNN model, which implements a nonlinear regressive model, both on PPG signals and reference signals. It is well known that random systems are unpredictable and, therefore, their prediction error is maximal for any prediction horizon. Chaotic systems become unpredictable in a very short space of time, because initially, very close trajectories soon diverge exponentially. Periodic and quasi-periodic signals are completely predictable and, hence, their prediction horizon is virtually limitless, or, in other words, their prediction error is almost zero. Aperiodic signals are somewhat predictable, as initially very close trajectories either evolve in parallel or diverge linearly over time. In any case, the prediction error is always bounded. Figure 12 (cf. § 3.2) shows the prediction horizon of all signals studied, with an error normalized between 0 and 1. The predictable behavior of our RNN architecture for all reference signals meets the above criteria. In the case of PPG signals, the predictable behavior of our RNN architecture shows a bounded prediction error, while lower than aperiodic signal and always away from chaotic and random dynamic regimes, as we can see in Figure 12. We think that the PPG signal is an intricate combination of two different main dynamics, a regular rhythm imposed by the heart rate and an underlying aperiodic dynamic that develops very slowly. Accordingly, there is an excellent balance between all of them, depending on the physiological requirements of each individual. Hence, in the short term, at the beginning of the prediction horizon, a predictive behavior closer to the typical quasi-periodicity prevails, but as the length of the prediction horizon increases, the aperiodic behavior comes in more precisely. So, an increasing dynamic complexity leads to predictive behavior with a growing but bounded prediction error. It is within the realms of possibility that with an infinite predictive horizon the prediction error will increase very gradually, as more slowly developing physiological factors take part in the overall dynamics (stronger aperiodicity), but under no circumstances, would it become unpredictable, as it would in chaotic or random dynamic behaviors.

Conclusions
In this paper, we show, through different experimental tests based on neural networks, that the PPG signal dynamics of young and healthy individuals are not chaotic or random on any timescale. Nevertheless, behind the apparent regularity of a PPG signal, there is a hidden increasing dynamic complexity with timescale. On a small timescale, the dominant dynamics of a PPG signal is mainly attributable to a quasi-periodic behavior, but on a large timescale, the dynamics of a PPG signal become aperiodic, although it does not become chaotic or random. It has been shown both with a classification tool of the present dynamics in a PPG signal, at different timescales, and with the predictive capability of a nonlinear regression model.
We think, pending confirmation by ongoing research work-while preliminary results are steps in the right direction-, that the aperiodic dynamics of a PPG is consistent with the characteristic behavior of a strange nonchaotic attractor (SNA). This category of attractors acts as a dynamic interface between quasi-periodic and chaotic behavior. As an active agent that a SNA is in the transition towards chaos, a highly promising line of research involves finding out how the stochastic (random) component present on a large timescale in a PPG signal facilitates the transition, with all of the consequences that that entails, as well as the psychosomatic causes, or even chronic physiological conditions, that could affect its development.   (He et al., 2016). These blocks have two parallel branches, in which the first one has a single convolution. In contrast, the second uses three convolutions followed by ReLU activations that first reduce the dimensionality of the input through the number of filters and then increase it in the second convolution creating more complex filters and finally in the last convolution project it through a unitary convolution in a space of lower dimensionality. This structure allows the use of convolutions at different scales, combining their outputs without losing much information and acting in terms of size as a traditional convolutional layer.