A Benchmark Study of Machine Learning for Analysis of Signal Feature Extraction Techniques for Blood Pressure Estimation Using Photoplethysmography (PPG)

Cardiovascular diseases are among the most significant health concerns worldwide. Blood pressure is the most crucial health indicator because it gives essential information about the health of a patient's heart. Cardiovascular diseases can be detected early and prevented if blood pressure is monitored continuously and regularly. Blood pressure cuffs, which are widely used to restrict blood flow in the arm or wrist when measuring blood pressure, are not practical for continuous blood pressure measurement. Biosignals can instead be used for blood pressure estimation, but this remains a critical and challenging task. In this paper, we conducted a comprehensive analysis of feature extraction techniques for blood pressure estimation using PPG signals. The feature extraction techniques were divided into three subgroups to analyse the significance of each group: group A includes time-based features, group B statistical features, and group C frequency domain-based features. The analysis employed several machine learning algorithms and compared their performance from many perspectives. The experimental results on two publicly available datasets demonstrated that the set of features belonging to group A was more reliable than the other techniques for blood pressure estimation. We found that deep learning models achieved better performance than all traditional machine learning methods, and that the GRU and Bi-LSTM models achieved the best performance on time-domain features. We believe the findings of this benchmark study will help researchers choose the most appropriate feature extraction method and machine learning algorithm.


I. INTRODUCTION
According to the World Health Organization (WHO), cardiovascular disease (CVD) is the leading cause of mortality, outnumbering all other diseases [1]. The predominant causes of these diseases are high blood pressure, atrial fibrillation, and high cholesterol. A number of devices are available to measure blood pressure (BP) manually or digitally, invasively or non-invasively [2]. Invasive BP estimation is the most accurate method; it entails inserting a catheter into the lumen of a peripheral artery [3]. This method can only be performed during high-risk surgeries or under intensive care [4]. It is time-consuming and needs a trained operator to perform it [5]. For this reason, the standard approach to measuring blood pressure is non-invasive, allowing intermittent or continuous measurement [4].

The associate editor coordinating the review of this manuscript and approving it for publication was Tony Thomas.
Continuous non-invasive BP estimation is based on volume clamping (or vascular unloading) [6]. In this method, a finger cuff is applied to the patient's finger to record arterial waveforms [7]. An integrated photoplethysmograph assesses the diameter of the finger artery, ensuring that the flow of blood in the finger artery is kept constant [8]. The method continuously monitors blood pressure in real time like an invasive arterial catheter, yet is non-invasive like a regular upper-arm sphygmomanometer. The accuracy of all continuous non-invasive instruments is susceptible to patient movement. In addition, continuous BP tracking is relatively expensive compared to traditional intermittent BP measurement systems [4].

FIGURE 1. A hierarchical view of non-invasive blood pressure measurement depicts two types of non-invasive blood pressure estimation: intermittent and continuous, which are further divided.

The photoplethysmogram (PPG) is a simple, low-cost technique that can measure the blood volume and heartbeat of a patient. It uses a semiconductor component to illuminate the skin tissue and measure light-intensity and volume variations. Recent developments in this method have demonstrated promising results concerning accuracy, convenience, and clinical acceptance. With technological advancement, it is imperative to have intelligent devices that provide health checks, even from the comfort of one's home. Many researchers worldwide are working towards a non-invasive, cuffless blood pressure estimation device to achieve this. Continuous blood pressure estimation has attracted many researchers because it can detect blood pressure problems early. The electrocardiogram (ECG) and PPG are the two most commonly studied signals for non-invasive BP estimation and have shown promising results.
The primary motivation of this study is that PPG is a non-invasive, simple, and low-cost technique that can be used to measure blood volume and other significant parameters related to CVD. Hypertension is an indicator of heart disease risk, and it can be measured by screening blood pressure. There are conventional ways to monitor blood pressure, but a professional such as a doctor or a nurse is needed to measure it. Moreover, these methods cannot perform continuous measurements and are unsuitable for integration with the latest technology. Furthermore, although many researchers and companies have explored different methods for continuous BP measurement, development is still required to realise the clinical benefits of PPG for mobile and wearable health technologies [9]. These devices can measure different physiological parameters such as body temperature, physical activity, and heart rate, including estimated systolic blood pressure (SBP) and diastolic blood pressure (DBP) [10]. The significance of PPG biosignals inspired us to perform a detailed analysis of existing feature extraction techniques to assess their respective performance. To the best of our knowledge, this is the first detailed study to perform a comparative analysis of feature extraction techniques used on PPG biosignals for blood pressure estimation. This paper proposes continuous cuffless blood pressure estimation using PPG signals and compares feature extraction techniques across statistical and deep machine learning models. For this study, we are particularly interested in the following research questions: RQ1: Which traditional feature extraction techniques achieve the highest performance? RQ2: How accurate are conventional machine learning models compared with deep learning models for estimating blood pressure?

A. CONTRIBUTION
The key contributions of this paper can be summarised as follows:
• This paper aims to demonstrate how to use PPG signals to facilitate continuous monitoring of the blood pressure of older or vulnerable patients. More specifically, we analyse feature extraction techniques to estimate blood pressure more accurately.
• We analyse the performance of different types of feature extraction techniques. We also investigate the importance of these features individually to provide the best feature performance in this domain.
• The experimental results on two datasets show that time-domain features achieved the best results with deep learning algorithms. To the best of our knowledge, this paper provides the first extensive comparison of different machine learning methods with different feature extraction techniques for blood pressure estimation using PPG signals.
The paper is arranged as follows: Section 2 assesses the proposed research taxonomy in the light of existing literature. Section 3 discusses publicly available databases used in this study, feature extraction techniques and machine learning algorithms. Section 4 details the experiment setup and results with discussion. Section 5 presents the conclusion and future work.

II. BACKGROUND AND RELATED WORK
This section sheds light on the background of the PPG signal and provides an overview of current research on estimating blood pressure using PPG. Numerous methods for cuffless BP estimation have been presented as alternatives to conventional blood pressure measurement techniques over the last decade. Pulse transit time (PTT) and pulse arrival time (PAT) have been widely used for this purpose [11]. PTT is defined as the amount of time a pressure wave takes to travel between two body sites. It is determined as the time difference between the proximal and distal locations of the arterial pulse [12]-[14]. Another cuffless approach is the pulse arrival time (PAT) method, which is the sum of PTT and the pre-ejection period [15]. PTT can be extracted by simultaneously using PPG and ECG signals and is a valuable element in assessing BP values [13]. In several studies, researchers used vascular transit time (VTT) to estimate BP values; it is defined as the time interval between the PPG and the heart sounds [16]. Furthermore, heart rate (HR) has been used for blood pressure estimation [17]. However, despite their advantages, PTT and PAT approaches are confronted with several problems, which have limited their usage to research. Measurements are required at two separate locations on the body, so two sensors (ECG and PPG) are needed to evaluate these parameters. Some patients may find this uncomfortable and challenging. More sensors also demand more work on signal pre-processing, which takes time and increases the computational complexity. Furthermore, these methods depend on complex models of arterial wave propagation and need individual calibration because of their dependence on patient physiology. These are all reasons why PAT and PTT may not be trustworthy approaches or a substitute for regular blood pressure measurements [18], [19].
The reasons mentioned above turned researchers' attention towards PPG-based devices for estimating BP, as they require one pulse sensor instead of two and offer a non-invasive, cuffless approach. A photoplethysmogram (PPG) is a non-invasive, low-cost, optically obtained plethysmograph, first explored in the 1930s. It can detect changes in blood volume in the microvascular bed of tissue. A pulse oximeter can obtain the signals by illuminating the skin and measuring the changes in light absorption [20]. Pulse pressure is created by the heart pumping blood to the periphery in each cardiac cycle, and this can be measured by illuminating the human skin surface with an LED (Light Emitting Diode) and recording the light reaching a photodiode. PPG can be used to monitor the cardiac cycle, heart rate, respiration, depth of anaesthesia, and blood pressure, and, in conventional imaging, as remote photoplethysmography. These signals are composed of non-pulsatile (DC) and pulsatile (AC) components. The DC component reflects the baseline light absorption in the skin tissue, while the AC component is synchronised with arterial pulsation and the heartbeat [21]. According to the Beer-Lambert law, blood volume changes alter the optical path length (d), generating the AC component as the heart beats [22]. PPG has traditionally been used in the healthcare domain to measure blood oxygen saturation [23] and heart rate [24] with a pulse oximeter. However, PPG signals can also be analysed to benefit many other areas, such as the automotive field, to obtain information about driver behaviour [25], and for monitoring the blood circulatory condition, breathing, and subjective state [26].
Over the last decade, machine learning combined with data analysis has been an attractive area for research. Many machine learning techniques and algorithms have been applied to calculate blood pressure from the analysis of PPG signals. In 2011, Monte-Moreno [27] measured SBP with a random forest model using a set of characteristics of the PPG signal. A similar methodology was then applied to intensive-care patients' data [28] for continuous blood pressure measurement using a deep belief network on PPG signal analysis [29]. Teng and Zhang [30] applied a linear regression (LR) algorithm to investigate the relationship between features extracted from the PPG and blood pressure. They observed that the diastolic time showed a stronger relationship with systolic blood pressure than with diastolic blood pressure. The outcomes were consistent with the requirements of the American national standard of the Association for the Advancement of Medical Instrumentation (AAMI) [31] for non-invasive blood pressure estimation. Nevertheless, they used a small test dataset consisting of data from individuals with normal blood pressure. The AdaBoost machine learning algorithm has also been employed to estimate blood pressure [32]: a threshold was used to classify SBP values, and a machine learning model then estimated SBP using a nonlinear classification process. This method was only capable of assessing SBP and did not meet the criteria set by the AAMI standard.
In 2013, Kurylyak, et al. [33] extracted distinguishing features from PPG waveforms and used a feed-forward neural network to estimate SBP and DBP. The results were promising for developing an accurate cuffless blood pressure monitoring system. Liu, et al. [34] extracted the 21 time-scale PPG features proposed by Kurylyak, et al. [33] and 14 further features from the PPG's second derivative (SDPPG). SDPPG has previously been shown to detect aortic stiffness and compliance, which are closely linked to blood pressure [35], [36]. They applied support vector regression (SVR) to estimate blood pressure and found that combining the feature sets improved accuracy. In another study, Khalid, et al. [37] applied three machine learning techniques to predict blood pressure categories and compared their accuracy. They extracted three PPG features: pulse area, 25% width, and pulse rising time. However, the results for blood pressure categories were unsatisfactory. Dey, et al. [38] presented a set of PPG features based on the frequency and time domains. This new BP model used physiological and demographic partitioning, and they proposed a smartphone app called InstaBP. However, this mobile application is no longer available, and the findings were not AAMI-compliant. Yi, et al. [39] extracted nine morphological features from each PPG signal and applied linear regression, Elastic Net, LASSO, KNN, and CART. The results showed that KNN outperformed all other algorithms, and they also stated that fewer PPG features could achieve the same accuracy as more PPG features.
Liang, et al. [40] used the GoogLeNet convolutional neural network for hypertension stratification with traditional feature extraction from PPG signals. Recently, cuffless blood pressure measurement studies using LSTM have received considerable attention [41]-[44]. Radha, et al. [45] applied several feature extraction techniques, including time-domain, entropy-based, and frequency-domain methods, to PPG signals to retrieve more information, and used neural networks for learning. Further, numerous machine learning models were applied to predict relative SBP and DBP in a free-living context with 103 participants; the LSTM model was found to work best for SBP prediction [46]. The literature indicates that these models can measure blood pressure with a single PPG sensor, which is the most straightforward and cost-effective technique. However, the link between PPG features and blood pressure is not necessarily linear.

VOLUME 9, 2021

FIGURE 2. Structure of proposed methodology for a detailed comparison of feature extraction techniques and machine learning algorithms for systolic and diastolic blood pressure estimation.

When evaluating a large dataset from diverse groups, linear models may have difficulty accurately modelling the link between BP and PPG. Other machine learning models, such as random forests and SVMs, are more precise. Since these models require one model per target, the two measures, SBP and DBP, are computed separately. However, DBP is strongly correlated with SBP and can be used to improve its estimation; therefore, they should be modelled simultaneously using a single model architecture.

III. METHODS
The proposed approach has four stages.

A. DATASETS
In this study, the PPG-BP dataset [47] and the Multi-parameter Intelligent Monitoring in Intensive Care (MIMIC-II) dataset [48] are used for the analysis of feature extraction techniques and machine learning models.
The MIMIC-II database [48] incorporates health-related information for a large number of patients who stayed in the Beth Israel Deaconess Medical Center critical care units from 2001 to 2008. This dataset is open-source and used extensively for blood pressure estimation analysis because it offers an array of signals containing PPG and reference BP signals from a wide range of demographics. It comprises physiological signals, including BP, PPG, ECG and respiration, all sampled simultaneously at 125 Hz. The BP and PPG signals have been divided into 30-second windows, which display stable BP values.
The second dataset is PPG-BP, published recently and recorded at Guilin People's Hospital in Guilin, China. It consists of many patient parameters such as age, height, weight, heart rate, BMI, blood pressure readings, and PPG signals. These records were collected from 219 patients aged between 21 and 86 years, and also contain records of conditions such as hypertension and diabetes [47]. The MIMIC dataset has been used extensively in previous studies and comprises much more information, such as age, heart rate, ECG signals, PPG signals, and arterial blood pressure. Table 1 presents a statistical summary of these two datasets.

B. PRE-PROCESSING
A critical component of signal processing is accurately identifying and extracting PPG characteristics from the original signal data, as this is required to construct a reliable and widely applicable model. The signal databases contain a significant number of BP and PPG patterns with distorted and irregular segments. We performed several steps to extract PPG features effectively. Firstly, we removed irregular signal segments and aligned the PPG and BP signals. Patient movement during the test can create a shift in the baseline of these signal acquisitions, and such movement artefacts and noise can affect the signals depending on the sensor's properties, such as the measurement point, movement, and ambient light. To denoise the signals and remove these movement artefacts, we applied the Butterworth filter, a signal processing filter designed to provide a frequency response in the passband that is as flat as mathematically possible; it is also known as a maximally flat magnitude filter [49]. Moreover, the min-max normalisation method was used to scale the PPG signals to the range [0, 1]. The signals were further segmented into equal time frames, and the resulting datasets were later used as input for feature extraction and the machine learning algorithms.
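As a concrete illustration, this pre-processing chain can be sketched in Python. The filter order and band-pass cut-offs below are illustrative assumptions, since the study does not specify them; the 125 Hz sampling rate follows MIMIC-II.

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS = 125  # MIMIC-II sampling rate (Hz)

def denoise_ppg(ppg, low=0.5, high=8.0, order=4, fs=FS):
    """Band-pass Butterworth filter to suppress baseline wander and
    high-frequency noise (order and cut-offs are illustrative)."""
    b, a = butter(order, [low / (fs / 2), high / (fs / 2)], btype="band")
    return filtfilt(b, a, ppg)  # zero-phase filtering preserves pulse timing

def minmax_normalise(x):
    """Scale a signal to the [0, 1] range."""
    return (x - x.min()) / (x.max() - x.min())

def segment(x, win_s=30, fs=FS):
    """Split a signal into non-overlapping fixed-length windows."""
    n = win_s * fs
    return [x[i:i + n] for i in range(0, len(x) - n + 1, n)]
```

`filtfilt` applies the filter forwards and backwards so that the systolic peak locations used later for feature extraction are not shifted by filter delay.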

1) FEATURE EXTRACTION
In order to establish a trustworthy BP estimation model, it is necessary to identify integrated features pertinent to the task, especially the characteristics that best describe how blood pressure fluctuates. Because PPG waveforms vary from person to person, there is no precise set of features that establishes a direct relationship with blood pressure. Some features cannot be extracted from all PPG signals; for instance, in the arterial pulse there is a tiny downward deflection known as the ''aortic notch'' or ''dicrotic notch'' [50] that can be calculated or approximated from some signals only [51]. This notch can be found in PPG waveforms recorded from healthy and young individuals [52], but signals without a clear dicrotic notch may deteriorate the accuracy of BP estimation. Thus, building a dataset for an automatic blood pressure prediction model entails accurately extracting features from PPG signals and choosing the most influential parameters for blood pressure estimation, which enhances generalisation and reduces model overfitting. Despite these difficulties, the PPG-based strategy has gained popularity due to its simplicity and feasibility over the last decade, and numerous features that characterise the PPG waveform have been explored in the literature.

2) TIME DOMAIN-BASED FEATURES (GROUP A)
The PPG waveform is defined by the amplitude and duration of specific cardiac cycle components. Because of the movement artefacts present during the recording of PPG signals, the height of the pulse might change, so it is not suitable for blood pressure calculation. By contrast, the number of samples collected by oximeters and other sensor devices varies according to the duration of each individual heartbeat [53], [54]. This means that each heartbeat is represented by a different number of sample values, which cannot be used directly as the ANN input vector, since the number of input neurons is fixed once the ANN architecture is defined. As a result, a different representation must be sought. Numerous parameters can be used to define the pulsatile component of PPG. Apart from the Systolic upstroke Time (ST), Diastolic Time (DT), and the pulse widths at 2/3 and 1/2 of the pulse amplitude [30], the pulse height, cardiac period, and peak width at 10% of the pulse height are also employed [55].
Kurylyak, et al. [33] investigated additional data to extract more features and scrutinise the optimal parameter combination afterwards. In particular, they proposed that the width be calculated at 25%, 33%, and 75% of the pulse height and that different values be extracted for the systolic part (i.e., the interval between the minimal and maximal points) and the diastolic part (the interval between the maximal and the next minimal point). In this study, 21 parameters were extracted and used, including the timings and ratios of systole and diastole. These parameters can be seen in Figure 3.
Using the MIMIC-II dataset, the ABP signal provided the blood pressure values; what was required were the single systolic and diastolic blood pressure values that matched the waveforms throughout the given period. Systolic blood pressure corresponds to the peak of the ABP pulse waveform at systole, while DBP corresponds to its value at end diastole. As a result, the ABP signal's peak and end-diastole points serve as the SBP and DBP ground-truth values, respectively. Time domain-based features were extracted from the PPG waveform for each cycle; these features were calculated from the waveform width at different signal levels, such as at 10% of the pulse height for the systolic and diastolic parts.
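A minimal sketch of how such cycle-level time-domain features can be computed is shown below. The foot-to-foot segmentation of each cycle and the exact set of width fractions are assumptions for illustration, not the paper's exact implementation.

```python
import numpy as np

def cycle_features(cycle, fs=125, fracs=(0.10, 0.25, 0.33, 0.50, 0.66, 0.75)):
    """Time-domain features for one PPG cycle (foot-to-foot samples).
    Returns the systolic upstroke time, diastolic time, and the pulse
    width at each given fraction of the pulse height (all in seconds)."""
    cycle = np.asarray(cycle, dtype=float)
    peak = int(np.argmax(cycle))                      # systolic peak index
    feats = {
        "systolic_time": peak / fs,                   # foot -> systolic peak
        "diastolic_time": (len(cycle) - 1 - peak) / fs,  # peak -> next foot
    }
    base, height = cycle.min(), cycle.max() - cycle.min()
    for f in fracs:
        level = base + f * height
        above = np.where(cycle >= level)[0]           # samples above the level
        feats[f"width_{int(f * 100)}"] = (above[-1] - above[0]) / fs
    return feats
```

Splitting each width into its systolic part (before the peak) and diastolic part (after the peak), as in [33], follows the same pattern with the index of the peak as the dividing point.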

3) FEATURE IMPORTANCE
Dimensionality reduction is a pre-processing step in machine learning that effectively removes irrelevant and redundant information, increases learning accuracy, and improves the understandability of results [56]. In this study, we used several machine learning algorithms to reduce dataset dimensions, lowering complexity and speeding up calculations. This can help to exclude dataset redundancies and keep only the PPG features that are effective for estimating BP. Redundant features may negatively affect the output target, increasing the estimation error. Therefore, we applied feature importance, which refers to a group of techniques for assigning scores to the input features of a predictive model, reflecting the relative importance of each feature when making a prediction. Feature importance scores can be used to gain insight into a dataset or model and to improve a predictive model. The relative scores can indicate which features are most important to the target and which are least important. Furthermore, they can help simplify the modelled problem, accelerate the modelling process (removing features is referred to as dimensionality reduction), and in some cases improve the model's performance. The findings of each of the following tools were studied:

• CART REGRESSION FEATURE IMPORTANCE:
Binary recursive partitioning is used in the Classification and Regression Trees (CART) methodology [57]. The process is binary because each parent node is split into exactly two child nodes, and recursive because each child node is in turn treated as a parent. CART analysis includes a number of rules to split each node in a tree, decide when the tree is complete, and assign each terminal node a class outcome. The Gini rule is used for splitting, which is essentially a measure of how well the splitting rule separates the classes in the parent node [57]. The impurity decrease attributable to each feature across these splits yields a natural importance score, which is why tree-based methods such as random forests are well suited for feature selection.
• XGBOOST FEATURE IMPORTANCE: This is an improved gradient boosting algorithm based on decision trees, which can efficiently build and evaluate boosted trees in parallel [59]. The core of the algorithm is the optimisation of an objective function. Rather than using feature vectors to calculate the similarity between forecast and history, gradient boosting provides the boosted trees with an indication of the importance of each feature in the trained model [60]. The algorithm considers ''gain'', ''frequency'', and ''cover'' as importance measures [61].
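Impurity-based importance scores of this kind can be obtained as follows. The data here are synthetic stand-ins for a PPG feature matrix, and scikit-learn's `DecisionTreeRegressor` is used as the CART implementation; an `xgboost.XGBRegressor` exposes the same `feature_importances_` attribute (based on "gain" by default).

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))                        # 4 candidate PPG features
# Synthetic target: feature 0 matters most, feature 1 weakly, 2-3 not at all
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=500)

tree = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, y)
scores = tree.feature_importances_                   # impurity-based, sums to 1
ranking = np.argsort(scores)[::-1]                   # most important first
```

Features at the bottom of `ranking` are candidates for removal before training the BP estimators.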

4) STATISTICAL FEATURE EXTRACTION (GROUP B)
In this section, statistical features were extracted. These features are further discussed below to provide more detail.

a: SKEWNESS
Skewness can be calculated as given below, where σ and μ_x denote the empirical estimates of the standard deviation and mean of x_i, respectively, and N is the number of samples in the PPG signal:

$$S_{PPG} = \frac{1}{N}\sum_{i=1}^{N}\left[\frac{x_i - \mu_x}{\sigma}\right]^{3}$$
b: KURTOSIS
This is a statistical measure that describes the distribution of the observed data around the mean. It indicates a heavy tail and peakedness, or a light tail and flatness, relative to the normal distribution. Selvaraj, et al. [62] suggested that kurtosis is a reliable indicator of the quality of PPG signals. It can be calculated as below, where N is the number of samples in the PPG signal, and σ and μ_x are the empirical estimates of the standard deviation and mean of x_i, respectively:

$$K_{PPG} = \frac{1}{N}\sum_{i=1}^{N}\left[\frac{x_i - \mu_x}{\sigma}\right]^{4}$$
c: PERFUSION
This is the gold-standard benchmark for determining the quality of PPG signals. It is the ratio of pulsatile to non-pulsatile (static) blood flow in the tissue. In other words, it reflects the difference between the amount of light absorbed by the pulse and the amount of light passing through the skin tissue. It can be expressed as given below, where x is the raw PPG signal with statistical mean $\bar{x}$ and y is the filtered PPG signal:

$$P_{PPG} = \frac{y_{max} - y_{min}}{|\bar{x}|} \times 100$$

d: MEAN ABSOLUTE DEVIATION, MAXIMUM AND MINIMUM
We included additional statistical features: the mean absolute deviation of the PPG signal x, and its maximum and minimum, as given below:

$$MAD = \frac{1}{N}\sum_{i=1}^{N}\left|x_i - \mu_x\right|, \qquad x_{max} = \max_i x_i, \qquad x_{min} = \min_i x_i$$
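The Group-B features above can be computed directly with NumPy. This is a sketch: the perfusion formula follows the common definition (peak-to-peak amplitude of the filtered signal over the mean of the raw signal), which may differ in detail from the paper's implementation.

```python
import numpy as np

def statistical_features(x, y):
    """Group-B features of a raw PPG segment x and its filtered version y."""
    mu, sigma = x.mean(), x.std()
    z = (x - mu) / sigma                     # standardised samples
    return {
        "skewness": np.mean(z ** 3),
        "kurtosis": np.mean(z ** 4),
        # perfusion index: pulsatile (AC) over non-pulsatile (DC) amplitude
        "perfusion": (y.max() - y.min()) / abs(mu) * 100.0,
        "mad": np.mean(np.abs(x - mu)),      # mean absolute deviation
        "max": x.max(),
        "min": x.min(),
    }
```

Each 30-second PPG window then contributes one row of these six values to the Group-B feature matrix.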

5) FREQUENCY-BASED FEATURES (GROUP C)
In this section, frequency-based features are extracted from the PPG signals. The multitaper method (MTM), an extended version of spectral representation, is used to overcome the limitations of classical spectral estimation. By combining high frequency resolution with low variance, it gives a more robust spectral estimate than the classical and Welch's periodograms, as shown in equation 7. Further, for the underlying stationary processes, x_t may include several recurring components, as given in equation 8. Such processes are called conditionally or centrally stationary processes with mixed spectra [63]. For these processes, the expected value of the discrete orthogonal-increment process dZ(f) is non-zero and can be calculated by equation 9, where δ is the Dirac delta function. The second central moment of dZ(f) can be calculated by equation 10.
$$x_t = \sum_j C_j \cos\left(\omega_j t + \phi_j\right) + \xi_t = \sum_j \left(\mu_j e^{i\omega_j t} + \mu_j^{*} e^{-i\omega_j t}\right) + \xi_t \quad (8)$$

The multiwindow technique provides a practical yet straightforward likelihood-ratio test for the relevance of periodic components in MTM spectral estimation. This method employs multiple data windows known as ''discrete prolate spheroidal sequences'' or ''Slepian sequences'', where W represents the spectral bandwidth, N the number of sample points of the PPG signal, and λ_k the eigenvalues associated with the Slepian sequences ν_k. The energy concentration of these Slepian functions is greatest in the interval (f − W, f + W). Moreover, the bias from frequencies far from the window is small, so using these sequences to eliminate window leakage is quite successful [65]. As an initial stage, the MTM computes the expansion (eigen) coefficients y_k(f) of the input x_t. The expected value E[y_k(f)] can be calculated by combining the previous formulae, and is obtained by minimising the local squared residual error at f = f_0. The squared error is defined by equation (16), and the result is given by equation (17). Equation (18) computes the continuous part of the spectrum; however, a large line component at frequency F0 necessitates reconstruction of the spectrum in accordance with equation (10).
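A minimal multitaper estimate along these lines, using SciPy's Slepian (DPSS) windows, might look as follows. The simple unweighted average over tapers is a simplification of the full adaptive MTM estimator described above.

```python
import numpy as np
from scipy.signal.windows import dpss

def multitaper_psd(x, fs, NW=4):
    """Multitaper PSD sketch: average the periodograms of the signal
    tapered by the first K = 2*NW - 1 Slepian (DPSS) sequences."""
    N = len(x)
    tapers = dpss(N, NW, Kmax=2 * NW - 1)          # shape (K, N)
    # One periodogram per taper, then average to reduce variance
    spectra = np.abs(np.fft.rfft(tapers * x, axis=1)) ** 2
    freqs = np.fft.rfftfreq(N, d=1.0 / fs)
    return freqs, spectra.mean(axis=0) / fs
```

The resulting spectral values (e.g., band powers around the cardiac fundamental and its harmonics) form the Group-C features for each PPG window.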

6) MACHINE LEARNING ALGORITHMS
One of the fastest-growing technology areas today, machine learning lies at the foundation of artificial intelligence and data science, connecting computer science and statistics. Data-intensive learning methods can be applied in science, technology, and commerce, resulting in more evidence-based decision-making in many areas such as health, production, education, financial modelling, law enforcement, and marketing [66]- [68]. Machine learning and biosensor technology have piqued many researchers' interest in this area of research [69]. In this work, we tested a variety of classical and deep learning models. We detail all the machine learning models that we examined in this section.

C. TRADITIONAL MACHINE LEARNING MODELS
We constructed our first traditional machine learning models using Linear Regression (LR), Random Forest (RF), AdaBoost, and Support Vector Regression (SVR), each with a unique set of characteristics. We employed the linear SVM kernel, one of its four significant variations, and provide a results comparison table for all the above-listed algorithms. Algorithm 1 presents the statistical machine learning process for blood pressure estimation.

1) LINEAR REGRESSION
Linear regression models are trained to assess the problem's linearity. These models are appropriately regularised through the use of a K-fold cross-validation procedure. It is well known that when there is a substantial nonlinear relationship between the feature vector and the target, the final trained models are inapplicable. These models are straightforward, simple to train, and less prone to over-fitting than the alternatives. They also require fewer training samples, making them more efficient to implement [70].
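A sketch of the regularised linear baseline with K-fold cross-validation is given below. The feature matrix and targets are synthetic stand-ins for the PPG features and SBP values, and ridge regularisation with MAE scoring is an illustrative choice, not necessarily the paper's configuration.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 10))                  # stand-in PPG feature matrix
y = X @ rng.normal(size=10) + rng.normal(scale=0.1, size=300)  # stand-in SBP

model = Ridge(alpha=1.0)                        # regularised linear model
cv = KFold(n_splits=5, shuffle=True, random_state=1)
scores = cross_val_score(model, X, y, cv=cv, scoring="neg_mean_absolute_error")
mae = -scores.mean()                            # mean absolute error over folds
```

The same `cross_val_score` harness can be reused unchanged for the RF, AdaBoost, and SVR models, which keeps the comparison between algorithms fair.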

2) RANDOM FOREST
Random forests are ensemble learning methods in which the final prediction is created by merging predictions from a number of weak learners (e.g., decision trees) [70]. Each tree is trained on a randomly selected subset of the training data to achieve a low bias and a reasonable prediction variance. In a regression problem, the ultimate prediction of a random forest model is the average of the predictions provided by each regression tree. The maximum depth of the trees is unrestricted here. Cross-validation was employed to decide the number of trees to include in the final regression model [70].
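The tree-count selection by cross-validation described above can be sketched as follows, again on synthetic stand-in data; the candidate grid of tree counts is an illustrative assumption.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 8))                   # stand-in PPG feature matrix
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=300)

# max_depth is left unrestricted; cross-validation picks the number of trees
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={"n_estimators": [50, 100, 200]},
    cv=3,
    scoring="neg_mean_absolute_error",
).fit(X, y)
best_forest = search.best_estimator_            # refit on all training data
```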

3) ADAPTIVE BOOSTING ALGORITHM (ADABOOST)
The AdaBoost algorithm is the most common and extensively used ensemble learning algorithm, more precisely, of the boosting family of ensemble learning algorithms [71]. AdaBoost's unique feature is that it uses the initial training data to build a weak learner and then adjusts the training data distribution, based on the weak learner's prediction performance, for the subsequent round of weak learner training. Note that training samples with low predictive accuracy in the previous stage receive additional attention in the subsequent step. Finally, the weak learners are combined into a strong learner using varied weights.

4) SUPPORT VECTOR MACHINE (SVM)
In its most basic form, the SVM proposed by Vapnik [72] entails the construction of an optimal hyperplane that maximises the margin between two distinct classes. Models with excellent generalisability are typically constructed using this approach, enhancing their capabilities for a wide range of applications [73]. We used SVM with a linear kernel in our implementation.
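A sketch of the regression form used here (support vector regression with the linear kernel); the C and epsilon values and the synthetic data are illustrative assumptions:

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(2)
X = rng.normal(size=(150, 11))
y = X @ rng.normal(size=11) + rng.normal(scale=0.3, size=150)

# Support vector regression with the linear kernel used in the paper.
svr = SVR(kernel="linear", C=1.0, epsilon=0.1)
svr.fit(X, y)
pred = svr.predict(X)
```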

a: NEURAL MODELS
We compared three deep learning models for BP estimation using PPG biosignals: LSTM, Bi-LSTM and GRU. The models, as well as their experimental setups and algorithmic process, are described below.

5) LONG SHORT-TERM MEMORY (LSTM)
LSTM, a special RNN structure, has proven stable and robust for modelling long-range dependencies [74]-[76]. The LSTM's primary innovation is its memory cell c_t, which functions essentially as an accumulator of state information. Several self-parameterised controlling gates access, write and clear the cell [77]. If the input gate i_t is activated, each new input's information accumulates in the cell. Additionally, if the forget gate f_t is enabled, the previous cell state c_{t-1} may be ''forgotten'' during this process. The output gate o_t controls whether the latest cell output is propagated to the final state h_t. A benefit of using a memory cell and gates to control information flow is that the gradient is trapped within the cell and kept from vanishing too rapidly, which is a crucial issue for the vanilla RNN model [75], [78]. The LSTM hidden states can be calculated from the equations below, further illustrated in Figure 4.
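The referenced gate equations appear to have been lost in extraction; the standard LSTM formulation, consistent with the gate names i_t, f_t, o_t, c_t and h_t used above (\(\sigma\) denotes the logistic sigmoid and \(\odot\) the element-wise product; this is the textbook form, not necessarily the paper's exact parameterisation), is:

```latex
\begin{aligned}
i_t &= \sigma(W_{xi} x_t + W_{hi} h_{t-1} + b_i) \\
f_t &= \sigma(W_{xf} x_t + W_{hf} h_{t-1} + b_f) \\
o_t &= \sigma(W_{xo} x_t + W_{ho} h_{t-1} + b_o) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c) \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```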

6) BIDIRECTIONAL LSTM
Bidirectional RNN, developed by Schuster and Paliwal [79], is another RNN variant that trains the network on both the past and the future of the input data sequence. Two connected layers are used to process the input data: one operates in the forward time-step direction and the other in the reversed direction. The results can be combined with various merging techniques. Likewise, a Bidirectional LSTM has two layers, one operating in the original data sequence direction and the second in the reverse direction [80].
In some applications, such as phoneme classification, Bi-LSTM has proven more effective than unidirectional LSTM [81]. The Bi-LSTM computation combines the forward and backward hidden layers, as shown in the figure. This connection achieves a temporal flow of information in both directions, enabling better network learning [82].
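The bidirectional wiring described above can be sketched in NumPy; for brevity a plain tanh recurrent cell stands in for the full LSTM gates, and the helper name `rnn_pass`, the shapes, and the concatenation merge are illustrative assumptions:

```python
import numpy as np

def rnn_pass(x_seq, W, U, b, h0):
    """Run a simple tanh recurrent cell over a sequence; return the final hidden state."""
    h = h0
    for x in x_seq:
        h = np.tanh(W @ x + U @ h + b)
    return h

rng = np.random.default_rng(3)
T, d_in, d_h = 10, 21, 8                               # time steps, input features, hidden size
x_seq = rng.normal(size=(T, d_in))
W = rng.normal(size=(d_h, d_in))
U = rng.normal(size=(d_h, d_h))
b = np.zeros(d_h)

h_fwd = rnn_pass(x_seq, W, U, b, np.zeros(d_h))        # layer running forward in time
h_bwd = rnn_pass(x_seq[::-1], W, U, b, np.zeros(d_h))  # layer running in reverse
h_bi = np.concatenate([h_fwd, h_bwd])                  # one common merging technique
```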

7) GATED RECURRENT UNITS (GRU)
GRU is a variation of the RNN developed to deal with the long-term dependency problem [83], shown in the figure. GRU is similar to LSTM: it controls the information flow using several gates integrated into its cell and outperforms standard RNNs. However, the GRU offers a few benefits over the LSTM. First, it has only two gates (a reset gate and an update gate). Second, the GRU does not have a separate memory cell, which makes it computationally faster than LSTM and simple to implement and compute. Like LSTM, the GRU at each time step has two inputs: the output of the previous hidden state and the current input. The update gate z, which combines the roles of the LSTM's input and forget gates, and the reset gate r can be computed as follows, where σ denotes the logistic sigmoid function. The update gate chooses how much information is passed to the GRU's current state, while the reset gate determines when the previous hidden state should be ignored, as computed below:
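The gate equations referenced here did not survive extraction; the standard GRU formulation (bias terms omitted; this is the common textbook form rather than a verified reproduction of the paper's notation) is:

```latex
\begin{aligned}
z_t &= \sigma(W_z x_t + U_z h_{t-1}) \\
r_t &= \sigma(W_r x_t + U_r h_{t-1}) \\
\tilde{h}_t &= \tanh\!\left(W x_t + U (r_t \odot h_{t-1})\right) \\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t
\end{aligned}
```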

IV. EXPERIMENTAL SETUP
This section discusses the findings of numerous machine learning algorithms tested on two different datasets, with multiple feature extraction strategies applied to each PPG signal. The mean absolute error (MAE), root mean square error (RMSE) and standard deviation of the error (SD) are the metrics used to evaluate estimation accuracy and to compare the BP models across feature extraction approaches and machine learning algorithms. The University of Tasmania (UTAS) cluster and Google Colaboratory (Colab Pro) were used to generate the experimental results on the MIMIC II and PPG-BP datasets, respectively. Python 3.7 was the programming language used for experimentation and testing. The results tables are presented in this section to aid understanding of performance.

A. EVALUATION METRICS
In this study, blood pressure estimation was conducted by extracting features from PPG signals without a cuff. Estimation performance was evaluated using the MAE, RMSE and SD criteria. The MAE is the average of the absolute differences between the actual and predicted values:

MAE = (1/n) Σ_{i=1..n} |y_i − ŷ_i|

where ŷ_i is the predicted value of y_i. The RMSE is the square root of the average squared residual over the dataset:

RMSE = sqrt( (1/n) Σ_{i=1..n} (y_i − ŷ_i)² )

The standard deviation (SD) measures how much a group of values varies or disperses around its mean (also known as the expected value); a high standard deviation implies that the values are spread over a broader range. The SD of the estimation error e_i = y_i − ŷ_i aids in understanding the dispersion of the predictions across models:

SD = sqrt( (1/n) Σ_{i=1..n} (e_i − ē)² )

where ē is the mean error.
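The three metrics can be computed directly; a minimal sketch with toy blood pressure values (the helper names and sample numbers are illustrative, not the paper's data):

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error: average of |actual - predicted|."""
    return np.mean(np.abs(y_true - y_pred))

def rmse(y_true, y_pred):
    """Root mean square error: sqrt of the average squared residual."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def error_sd(y_true, y_pred):
    """Standard deviation of the estimation error around its mean."""
    return np.std(y_true - y_pred)

y_true = np.array([120.0, 118.0, 125.0, 130.0])   # toy SBP references (mmHg)
y_pred = np.array([118.0, 121.0, 124.0, 133.0])   # toy model estimates (mmHg)
print(mae(y_true, y_pred))                         # 2.25
```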

B. PARAMETERS SETTING
In order to achieve high accuracy, we tuned the parameters and list the best parameters in the tables below for the traditional machine learning and deep learning models. For the traditional machine learning algorithms, we used a grid search and selected the best parameters for model tuning. Similarly, for deep learning, each dataset was divided into train, validation, and test sets to prevent overfitting and to optimise the parameters. The accuracy on the test dataset was used to determine the best-optimised model parameters, which were then compared across the machine learning algorithms. The table below depicts the best parameters found for each feature extraction technique.
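A sketch of the grid search step with scikit-learn's GridSearchCV; the parameter grid and synthetic data here are illustrative assumptions (the paper's actual grids are in its parameter tables):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 11))
y = X[:, 0] ** 2 + rng.normal(scale=0.1, size=200)

# Exhaustively evaluate each parameter combination with 3-fold CV
# and keep the combination with the best (least-negative) MAE score.
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 10]}
search = GridSearchCV(RandomForestRegressor(random_state=4), param_grid,
                      scoring="neg_mean_absolute_error", cv=3)
search.fit(X, y)
print(search.best_params_)
```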

C. RESULTS AND DISCUSSION
This section focuses on the performance analyses of traditional and deep learning models. We divided each dataset into three parts: training, validation, and testing to carry out these experiments. We used parameter tuning for better performance for each machine learning model. This section is further divided into feature extraction groups to analyse the performance on two public datasets. We used mean absolute error (MAE), root mean square error (RMSE) and standard deviation (SD) to evaluate the models and compare the performance differences.

1) TIME DOMAIN-BASED FEATURES (GROUP A) PERFORMANCE
In this group, we extracted 21 time domain-based features from PPG signals and applied machine learning algorithms. In addition, we applied feature importance algorithms and, following a thorough analysis of each technique's results, selected only 11 features. These datasets were then given as input to the machine learning algorithms to estimate blood pressure. We therefore constructed two datasets with the reference SBP and DBP values for group A: one with the set of 21 features and one with only the set of 11 features. Tables 3 and 4 and Figure 7 present the results for each algorithm.
The results showed that LSTM, Bi-LSTM and GRU achieved better accuracy than the traditional machine learning algorithms. The GRU achieved a mean absolute error ± standard deviation of 3.68±4.28 mmHg and 5.34±5.25 mmHg for systolic and diastolic BP measurement, respectively, using 21 features. However, after feature importance, the eleven selected features applied to all these algorithms produced a higher average error than the 21-feature results. The best MAE and SD for the eleven-feature set were achieved by Bi-LSTM on the PPG-BP dataset. These findings indicate that the 21-feature dataset performs better than the eleven-feature dataset.

2) STATISTICAL FEATURE EXTRACTION (GROUP B)
These features were extracted by statistical calculations for signal analysis. Figure 8 and Table 5 visualise the results for linear regression, random forest, AdaBoost, SVM, LSTM, Bi-LSTM and GRU.
As can be seen from Table 5, the statistical feature datasets achieved good accuracy with the deep learning models on both datasets. For the PPG-BP dataset, vanilla LSTM performed best among the algorithms, with a mean absolute error ± standard deviation of 4.70±5.21 mmHg and 4.68±5.34 mmHg for SBP and DBP, respectively. However, for the MIMIC dataset, random forest performed better than the other machine learning algorithms for DBP estimation, and Bi-LSTM for SBP estimation.

3) FREQUENCY-DOMAIN BASED FEATURES (GROUP C)
In this group, frequency-based features have been discussed. We used the multitaper method to extract these features. Further, we calculated band power and relative band power.
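A basic multitaper band-power sketch with SciPy, averaging periodograms over DPSS (Slepian) tapers without eigenvalue weighting; the sampling rate, taper parameters, band edges, and function names are illustrative assumptions, and the absolute power scale is approximate (relative band power is unaffected):

```python
import numpy as np
from scipy.signal import windows, periodogram

def multitaper_psd(x, fs, nw=4, k=7):
    """Unweighted multitaper PSD: average periodograms over k DPSS tapers."""
    tapers = windows.dpss(len(x), NW=nw, Kmax=k)
    psds = [periodogram(x * taper, fs=fs, window="boxcar", detrend=False)[1]
            for taper in tapers]
    f = periodogram(x, fs=fs)[0]
    return f, np.mean(psds, axis=0)

def band_power(f, psd, lo, hi):
    """Integrate the PSD over [lo, hi) Hz via a Riemann sum."""
    mask = (f >= lo) & (f < hi)
    return psd[mask].sum() * (f[1] - f[0])

fs = 125.0                                     # hypothetical PPG sampling rate
t = np.arange(0, 8, 1 / fs)
x = np.sin(2 * np.pi * 1.2 * t)                # 1.2 Hz tone, roughly a 72 bpm pulse
f, psd = multitaper_psd(x, fs)
bp = band_power(f, psd, 0.5, 4.0)              # absolute band power
rel = bp / band_power(f, psd, 0.0, fs / 2)     # relative band power
```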
The resultant feature set formed a complex input dataset, which was provided to the machine learning algorithms for systole and diastole estimation. The mean absolute error and standard deviation were calculated to analyse the results; Figure 9 and Table 6 present these results for further analysis. Frequency-domain features achieve better accuracy than the statistical features extracted from PPG signals, but not better than the time domain-based features. For the PPG-BP dataset, vanilla LSTM performed best, with average errors of 4.60±5.17 mmHg and 4.98±5.52 mmHg for systolic and diastolic blood pressure predictions. For the MIMIC dataset, Bi-LSTM gave errors of 5.42±5.21 mmHg and 6.17±5.89 mmHg for SBP and DBP, respectively.

D. COMPARISON WITH RELATED WORK
Over the last decade, numerous methods for cuffless blood pressure determination have been proposed as alternatives to conventional blood pressure measurement techniques. Most literature studies adopted PPG and ECG signals for BP estimation using methods such as pulse transit time (PTT) and pulse arrival time (PAT). However, these techniques cannot be used in clinical practice because they depend on individual physiological parameters that require calibration. Furthermore, these methods require two sensors for measurement, which can increase the computational complexity. This has inclined researchers towards PPG-based blood pressure estimation, which looks promising. Table 7 presents a performance assessment against the literature in this research area. A number of critical factors, such as calibration, dataset size, different evaluation metrics, or distinct techniques, make it challenging to compare our work directly to other relevant studies. Furthermore, the MIMIC II dataset has been used extensively in the literature; however, PPG-BP is a new dataset, and very few studies have used it so far. We therefore compare our work with some calibration-free studies of BP estimation that used the MIMIC public dataset and mean error (MAE or ME) as the evaluation metric. Kachuee et al. [70] used the PAT method to estimate calibration-free SBP, DBP and MAP using several machine learning algorithms. Their study complied with AAMI standards for MAP and DBP estimation; nevertheless, the SBP estimation did not meet AAMI standards. Moreover, this method requires two signals from two different sensors. Liu et al. [34] extracted second-derivative features from PPG signals and also used time-domain features for BP estimation. The recorded errors were 8.54±10.9 mmHg for SBP and 4.34±5.8 mmHg for DBP using 35 features.
Obtaining the correct SDPPG feature values, which depend solely on the visibility of the five peak points, is the most challenging aspect of this technique.
Slapničar et al. [84] suggested a ResNet-based approach for BP estimation using the first and second derivatives of PPG signals as input. They tested the model on data from 510 individuals in the MIMIC III database. The input signals were analysed by a spectro-temporal block, five ResNet blocks, and a GRU layer in sequence. The complexity of the proposed technique was high, and it achieved lower performance than state-of-the-art methods: the MAE was 15.41 mmHg and 12.38 mmHg for SBP and DBP, respectively, and no standard deviation values were provided. We can infer from the literature that traditional feature extraction has been a significant factor in this research area, affecting the performance of BP estimation models. Inspired by this, we presented a comparison of feature extraction techniques for PPG signals for BP estimation using two benchmark datasets. Overall, given the large number of patients and the use of only one PPG signal, our comparison study suggests that the LSTM and GRU models performed well with time-domain features. Moreover, this demonstrates that a single PPG signal can estimate BP with reasonable accuracy.

V. CONCLUSION AND FUTURE WORK
This paper investigates the relationship between features extracted from PPG signals and their capacity to detect systolic and diastolic blood pressure levels. Studies have demonstrated that combining ECG and PPG can help determine blood pressure directly, although synchronising both signals at the same time can be difficult. In addition, this computation of blood pressure requires the placement of several sensors on the body, which is a complex process to undertake: the more sensors we have, the more difficult it is to compute blood pressure. Our goal in this research is therefore to detect blood pressure using PPG signals alone. To achieve this, we conducted a comprehensive study to find the best feature extraction technique and machine learning algorithm for blood pressure estimation.
The findings of this study demonstrated that time domain-based features achieved better accuracy than the other feature extraction techniques. More importantly, the deep learning algorithms achieved higher performance with lower error for SBP and DBP estimation. Specifically, GRU and Bi-LSTM with time-domain features achieved the best performance for SBP and DBP estimation, respectively, across all models and datasets. The results and conclusions drawn from our comparative analysis can aid future research in this area and assist researchers in selecting the most appropriate model for blood pressure estimation. Our future work will focus on designing models that minimise the error for more robust results. Further, we will investigate the attention mechanism and its weights to determine which parts of the signal's waveform sequence significantly impact the desired result.
SUMBAL MAQSOOD received the B.S. degree (Hons.) in computer science from Punjab University College of Information Technology (PUCIT) and the M.S. degree in computer science from GC University, Lahore, Pakistan. She is currently pursuing the Ph.D. degree with the University of Tasmania, Hobart, TAS, Australia. Her research interests include machine learning, natural language cybernetics, biotechnologies, data science, and software engineering. She is also working on biosignals analysis using deep learning.
SHUXIANG XU received the Bachelor of Applied Mathematics degree from the University of Electronic Science and Technology of China, China, the Master of Applied Mathematics degree from Sichuan Normal University, China, and the Ph.D. degree in computing from the University of Western Sydney, Australia. He is currently a Lecturer and a Ph.D. Student Supervisor with the School of Information and Communication Technology, University of Tasmania, Hobart, TAS, Australia. His research interests include artificial intelligence, machine learning, and data mining. Much of his work is focused on developing new machine learning algorithms and using them to solve problems in various application fields.
MATTHEW SPRINGER received the Ph.D. degree in information systems from the University of Tasmania, in 2010. He is currently a Lecturer with the School of Technology, Environments and Design, University of Tasmania. His major focus has been on improving teaching with the Discipline of Information and Communication Technology, but is also an Active Member of the Industry Transformation, and Games and Creative Technologies research groups.
RAMI MOHAWESH received the B.E. and M.E. degrees in computer science. He is currently pursuing the Ph.D. degree with the University of Tasmania, Hobart, TAS, Australia.
In his Ph.D. research, he was the first researcher to investigate concept drift in fake review detection. His research interests include software engineering, cloud computing, natural language processing, cybersecurity, and machine learning. His current work is on fake review detection. He is a reviewer for high-impact-factor journals, such as Information Processing and Management, Artificial Intelligence Review, and Secure Computing. VOLUME 9, 2021