Evaluating Quality of Photoplethysmographic Signal on Wearable Forehead Pulse Oximeter With Supervised Classification Approaches

The pulse oximeter is a common and important clinical instrument that uses photoplethysmography to measure the oxygen saturation ratio (SpO2). However, the photoplethysmographic (PPG) signal is easily corrupted by motion artifacts when SpO2 is measured in a dynamic scenario. Moreover, most pulse oximeters on the market use a finger-type clip probe, which is only suitable for static scenarios. This study aims to develop a wearable forehead pulse oximeter that can be used in a dynamic scenario, and uses two classification approaches, support vector machine (SVM) and convolutional neural network (CNN), to evaluate the signal quality index (SQI) of its PPG signals. The higher the quality of the PPG signals, the higher the accuracy of the SpO2 values. The SVM classified the SQI of the PPG signal using twelve statistical features calculated from each 7-second PPG segment. In the CNN approach, each PPG segment was converted to an image and used as the input, and the VGG-19 model was then used to evaluate the SQI of the PPG segments. Twenty subjects were recruited to perform the static and dynamic experiments. In the dynamic experiment, subjects ran on a treadmill at three different speeds: 3 km/hour, 5 km/hour and 7 km/hour. Experimental results indicated that the accuracies of the SVM and CNN for SQI classification were 89.9 ± 1.18% (mean ± standard deviation) and 88.7 ± 1.54%, respectively. Then, the dynamic data were used to test the two classification models trained on the static data. The accuracies of the SVM and CNN for SQI classification were 89.5 ± 3.87% and 86.2 ± 4.28%, respectively. In the static condition, the SpO2 error ratios before and after SQI classification with the SVM were 5.6 ± 6.6% and 1.9 ± 1.1%, respectively. The results suggested that the performance of the SVM was better than that of the CNN.


I. INTRODUCTION
Oxygen saturation ratio (SpO2) is an important parameter in clinical monitoring, which can be measured by a finger-type pulse oximeter. Various devices for SpO2 measurement have been developed in the past decades. With the advancement of technology, the development of an accurate and easy-to-use device for SpO2 measurement has attracted much attention in recent years [1]. In SpO2 measurement, light at two specific wavelengths, red and infrared (IR), is employed. A photodiode receives the light signals that are transmitted through or reflected from the tissue. This noninvasive optical technique is photoplethysmography (PPG). PPG sensors are of two main types: transmission mode and reflection mode [2]. The finger-type pulse oximeters on the market use a transmission sensor, whose probe is usually placed on peripheral parts of the body, such as a fingertip or toe. A reflective sensor can be applied to other parts of the body, such as the forehead or wrist. Therefore, the reflective PPG sensor is more suitable for long-term, wearable measurement of physiological signals [3].
The associate editor coordinating the review of this manuscript and approving it for publication was Wenbing Zhao.
The PPG signal consists of a tiny pulsating signal, the alternating current (AC) signal, and a large baseline signal, the direct current (DC) signal. The AC signal comes from the change of arterial blood flow following heartbeats. The DC signal is mainly due to the light absorption of body tissues, such as skin, muscles, bones and venous blood. The SpO2 is calculated from the ratio of the AC and DC components of the PPG signals measured with the red and IR lights, respectively [4]. Pulse oximeters have been used in a variety of conditions, such as anesthesia, labor and delivery, emergency treatment, and cardiovascular or respiratory dysfunction. During exercise of patients with cardiopulmonary limitations, the pulse oximeter is useful in assessing the severity of disease, monitoring the effectiveness of therapeutic interventions, and determining the need for supplemental oxygen [5]. For the training of athletes, a pulse oximeter can be used to assess the degree of hypoxemia [6]. Recently, comparisons between finger and forehead SpO2 measurements have been presented [7], [8]. According to Sugino's study [7], the forehead pulse oximeter could detect changes in hypoxia faster and more accurately than the finger-type pulse oximeter. The reason is that a finger shakes considerably, so the PPG signal of a finger-type probe is severely corrupted during dynamic measurement. The forehead-type probe can be tightly tied to the forehead, and its PPG signal is not easily corrupted by motion artifacts. The study of Blaylock also showed that the pulse signal measured from the finger was susceptible to low temperature and low perfusion, which led to more lost pulses than in the signal from the forehead [8]. Thus, the accuracy of the SpO2 measurement would be reduced.
VOLUME 8, 2020 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
However, commercial forehead pulse oximeters are suited to static measurement, and require medical personnel to place the sensor probe at the proper position. Therefore, a wearable forehead pulse oximeter that can be easily used in exercise training or home care would benefit athletes and users with limited cardiopulmonary function.
Motion artifacts and ambient light are two factors that distort the PPG signal and thus decrease the accuracy of SpO2 measurement [9]. The problem of ambient light can be overcome by the structural design of the sensor probe. Motion artifacts, however, are handled by assessing the signal quality of PPG segments or pulses, detecting the corrupted ones, and deleting them. The waveform of a PPG pulse has regular morphological characteristics [10], [11]. Based on these characteristics, such as the main peak, dicrotic notch, pulse width, and amplitude, many studies have used rule-based methods to determine the quality of each PPG pulse. The signal quality index (SQI) then represents the degree of corruption of the PPG pulse. Liu et al. employed fuzzy rules to determine the SQI of PPG pulses [12], and Fischer et al. applied the characteristics of the PPG waveform and a decision tree to classify its SQI [13]. Li et al. used a Bayesian hypothesis testing method to analyze the SQI [14]. Other studies used machine learning approaches to classify the SQI. Pereira et al. [15] and Chong et al. [16] used the support vector machine (SVM) to determine the SQI of PPG segments.
Liu et al. used a fuzzy neural network to evaluate the SQI of PPG pulses [17]. Liu et al. also used deep learning approaches to classify the SQI of PPG pulses with the same data as in the previous study [18]. They found that the latest deep learning methods performed better than traditional machine learning.
In this study, the goal was to develop a wearable forehead pulse oximeter to measure SpO2 in static and dynamic scenarios. To improve the accuracy of SpO2, we evaluated the SQI of the PPG signals and used only the PPG signals with high SQI to calculate SpO2. To make the pulse oximeter easy to wear, we designed an array-type probe worn on the forehead. After signing the informed consent agreement, twenty subjects participated in the experiment, which included static and dynamic scenarios. The SQI of each 7-second PPG segment was annotated as high or low according to the morphological features of the PPG pulses in the segment. Our SQI evaluation algorithms used two supervised learning approaches, SVM and CNN. The former used statistical parameters of the PPG segments as input variables to train the model, and the trained SVM was then tested on unseen PPG data. The latter used a 2-D CNN model to determine the SQI by converting each 7-second PPG segment of the red and IR lights into a corresponding image. We used 11-fold cross validation to test the performance of the two supervised learning approaches. Moreover, because static data from the forehead pulse oximeter are easy to measure, we explored whether static data could be used to classify the SQI of dynamic data. Thus, we also trained the two models on the static data and then used the trained models to determine the SQI of the dynamic data. The results showed that the SVM approach performed better than the CNN approach.

II. HARDWARE IMPLEMENTATION AND SIGNAL PRE-PROCESSING
A forehead pulse oximeter with an array-type probe was designed to measure SpO2 in static and dynamic scenarios. The red and IR PPG signals from this pulse oximeter were processed and used to calculate the beat-to-beat SpO2.

A. HARDWARE IMPLEMENTATION
The array-type probe consists of three MAX30102 PPG sensors (Maxim Integrated™, San Jose, CA, USA). Each MAX30102 contains two light-emitting diodes with wavelengths of 660 nm (red light) and 880 nm (IR light), respectively. A photodiode, an 18-bit analog-to-digital converter, and low-noise electronics with ambient light rejection are also included in this PPG sensor. Communication between the sensors and the microcontroller is accomplished via the standard I2C-compatible interface. The MCU (MSP430F5438A) is a 16-bit microcontroller (Texas Instruments™, Dallas, TX, USA), which coordinates the workflow of the MAX30102 sensors, acquires the PPG signals, performs digital signal processing to determine the pulse amplitude, and transmits the data to a notebook PC through a Bluetooth v3.0 module using the UART protocol. The sampling rate was 100 Hz. Figure 1(a) shows a photograph of our forehead pulse oximeter, and Fig. 1(b) shows it worn on the forehead. According to the Maxim Integrated Products technical report [19], the average DC components (digital code) of the two light sources within one second must be in the range from 1.8 × 10^6 to 1.9 × 10^6. Thus, if the average DC component for the red light or IR light was below or above this range, the system automatically adjusted the LED power so that SpO2 was measured under the same condition for each subject.

B. SIGNAL PRE-PROCESSING
The PPG signal comprises a very large DC component (PPG baseline signal) and a very small AC component (PPG pulse signal), and is always coupled with noise due to motion artifacts, ambient light and 60 Hz power line interference. Because the power spectrum of the PPG pulse signal is usually below 5 Hz [20], [21], the proposed system applied a fourth-order Butterworth low-pass filter with a cut-off frequency of 5 Hz to remove high-frequency noise and smooth the PPG pulse signal. A second-order Butterworth high-pass filter with a cut-off frequency of 0.3 Hz was used to remove the PPG baseline signal. The PPG pulses were detected from the filtered PPG pulse signal by the differential method [22]. The differential PPG (dPPG) pulse has a main peak during the systolic duration that is significantly larger than the other peaks. The systolic phase of the PPG pulse can therefore be found by seeking the maximum value of the dPPG pulse. Figure 2 shows the PPG and dPPG pulse signals. According to references [23], [24], the systolic duration is about 0.49 seconds. The maximum point (circle symbol) of each dPPG pulse is marked on the synchronous PPG pulse as the center position of the systolic phase. The valley (star symbol) of the PPG pulse is found by searching backward for the minimum value within 25 points of the center point. The peak (cross symbol) of the PPG pulse is found by searching forward for the maximum value within 25 points of the center point.
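The valley and peak search described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the filtering step is omitted, a synthetic sine wave stands in for the filtered PPG pulse signal, and the helper `systolic_centers` (with its `rel_threshold` and `min_gap` parameters) is a hypothetical way of picking one dPPG maximum per pulse, since the text only cites the differential method [22].

```python
import math

FS = 100      # sampling rate in Hz, as stated in the text
SEARCH = 25   # search span in samples on each side of the dPPG maximum

def differential(ppg):
    """First difference of the filtered PPG pulse signal (dPPG)."""
    return [ppg[i + 1] - ppg[i] for i in range(len(ppg) - 1)]

def systolic_centers(dppg, rel_threshold=0.5, min_gap=30):
    """Hypothetical picker (assumption): local maxima of dPPG that exceed a
    fraction of the global maximum and are at least min_gap samples apart."""
    limit = rel_threshold * max(dppg)
    centers = []
    for i in range(1, len(dppg) - 1):
        is_local_max = dppg[i] > dppg[i - 1] and dppg[i] >= dppg[i + 1]
        if is_local_max and dppg[i] > limit:
            if not centers or i - centers[-1] >= min_gap:
                centers.append(i)
    return centers

def pulse_landmarks(ppg, center):
    """Valley: minimum within SEARCH samples before the center.
    Peak: maximum within SEARCH samples after the center."""
    lo = max(0, center - SEARCH)
    hi = min(len(ppg) - 1, center + SEARCH)
    valley = min(range(lo, center + 1), key=lambda i: ppg[i])
    peak = max(range(center, hi + 1), key=lambda i: ppg[i])
    return valley, peak

# Synthetic 1.25 Hz (75 BPM) wave standing in for a filtered PPG pulse signal.
ppg = [math.sin(2 * math.pi * 1.25 * i / FS) for i in range(3 * FS)]
landmarks = [pulse_landmarks(ppg, c) for c in systolic_centers(differential(ppg))]
```

With a 100 Hz sampling rate, the 25-point span on each side covers 0.25 seconds, so the two searches together span roughly the 0.49-second systolic duration quoted in the text.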

C. ESTIMATION OF OXYGEN SATURATION
The SpO2 value is calculated based on the Maxim Integrated Products technical report [19]. The AC component is the amplitude of the PPG pulse signal. The DC component is the average value of the PPG baseline signal within one second. The ratio R is obtained from the AC and DC components of the red and IR lights by Eq. (1). Then, the SpO2 value is calculated using Eq. (2), and is updated every second.
The PPG pulse signals, red light (red) and IR light (blue), are divided into seven-second segments, and the window is shifted by one second, as shown in Fig. 3. The SpO2 of a segment is the average value over all PPG pulses in the segment. Figure 4 shows the flowchart of SQI annotation for the PPG segments. Four features are extracted from each PPG pulse. Then, we annotate the SQI of the PPG segment as high or low according to the SpO2 value and its error ratio, and the ranges and variations of the four characteristics.
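The ratio computation and the sliding-window segmentation can be sketched as follows. Eq. (1) and Eq. (2) are not reproduced in this text, so the sketch makes two assumptions: R is taken as the usual ratio of ratios, (AC_red/DC_red)/(AC_IR/DC_IR), and the calibration curve is a quadratic whose default coefficients are those commonly quoted from Maxim's MAX30102 reference firmware. Whether the paper's Eq. (2) uses exactly these values is not confirmed here.

```python
FS = 100        # sampling rate in Hz, from the text
WINDOW_S = 7    # segment length in seconds
STEP_S = 1      # window shift in seconds

def ratio_of_ratios(ac_red, dc_red, ac_ir, dc_ir):
    """Assumed form of Eq. (1): normalized red amplitude over normalized IR."""
    return (ac_red / dc_red) / (ac_ir / dc_ir)

def spo2_from_ratio(r, a=-45.060, b=30.354, c=94.845):
    """Quadratic calibration curve a*R^2 + b*R + c.
    The default coefficients are an ASSUMPTION (values commonly quoted from
    Maxim reference code), not taken from this text."""
    return a * r * r + b * r + c

def segment_bounds(n_samples):
    """Start/end sample indices of 7-second windows shifted by 1 second."""
    win, step = WINDOW_S * FS, STEP_S * FS
    return [(s, s + win) for s in range(0, n_samples - win + 1, step)]
```

Under this windowing, a 3-minute recording at 100 Hz yields 174 segments, which is consistent with the 3480 static samples reported for 20 subjects.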

A. CHARACTERISTICS OF PPG PULSE
There are four characteristics of the PPG pulse, as shown in Fig. 5: the pulse wave amplitude (PWA), pulse wave duration (PWD), rise time (RT), and systolic-to-diastolic duration ratio (SDR). They are defined in terms of the following quantities: V_peak1 is the voltage of peak1, V_valley1 is the voltage of valley1, T_peak1 is the time of peak1, T_valley1 is the time of valley1, and T_valley2 is the time of valley2.
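The four characteristics can be computed from the five landmark quantities named above. The defining equations are not reproduced in this text, so the formulas in this sketch are a plausible reconstruction from the names of the characteristics and should be read as assumptions; in particular, SDR is taken here as the rise time divided by the remaining (diastolic) part of the pulse. The `Pulse` container is hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Pulse:
    """Landmarks of one PPG pulse (hypothetical container)."""
    t_valley1: float  # time of valley1 (s)
    v_valley1: float  # voltage of valley1
    t_peak1: float    # time of peak1 (s)
    v_peak1: float    # voltage of peak1
    t_valley2: float  # time of valley2 (s)

def pulse_features(p):
    """Assumed definitions of the four characteristics."""
    rt = p.t_peak1 - p.t_valley1                  # rise time
    return {
        "PWA": p.v_peak1 - p.v_valley1,           # pulse wave amplitude
        "PWD": p.t_valley2 - p.t_valley1,         # pulse wave duration
        "RT": rt,
        "SDR": rt / (p.t_valley2 - p.t_peak1),    # systolic / diastolic duration
    }

features = pulse_features(Pulse(0.0, 0.0, 0.2, 1.0, 0.8))
```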

B. QUALITY OF PPG PULSE
There were three stages to determine the quality of the pulses within a PPG segment. In the first stage, two conditions were checked: whether the forehead SpO2 measured by our apparatus was larger than 100%, and whether the error percentage (Error %) between the forehead SpO2 and the finger SpO2 was larger than 5%. The finger SpO2 measured by the commercial Rossmax SA310 (Rossmax, Taipei, Taiwan) was used as the reference data, and the error percentage is computed from the forehead SpO2 and this reference value. If the pulse fits either of these two conditions, the quality of the pulse is annotated low.
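The first-stage check can be written in a few lines. The exact definition of Error % is not reproduced in this text; the sketch assumes the absolute difference between forehead and finger SpO2 divided by the finger (reference) value, which is an assumption rather than the authors' stated formula.

```python
def error_percent(forehead_spo2, finger_spo2):
    """Assumed Error %: absolute difference relative to the finger reference."""
    return abs(forehead_spo2 - finger_spo2) / finger_spo2 * 100.0

def stage1_is_low(forehead_spo2, finger_spo2):
    """Low quality if SpO2 exceeds 100% or Error % exceeds 5%."""
    return forehead_spo2 > 100.0 or error_percent(forehead_spo2, finger_spo2) > 5.0
```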
In the second stage, six conditions covering the contour and the four characteristics of the pulse wave were used to determine the quality of the pulse, following a previous study [13]. If the pulse fits any condition, the quality of the pulse is annotated low. The six conditions are as follows.
(1) The pulse wave has the chopping on the peak or valley.
(2) The absolute peak or valley value for red light is larger than the IR light.
(3) The RT is smaller than 0.08 seconds, or larger than 0.49 seconds.
The min{PWA_right / PWA_left, PWA_left / PWA_right} is smaller than 0.4.
(6) The PWD is larger than 2 seconds, or smaller than 0.3 seconds.
In the third stage, we considered the variation of three features between neighboring pulses. There were three conditions. If the pulse fits any condition, the quality of the pulse is annotated low. The three conditions are as follows, where n denotes the nth pulse.
(1) The ratio of RT(n-1) and RT(n) is out of the range between 33% and 300%.
(2) The ratio of PWD(n-1) and PWD(n) is out of the range between 33% and 300%.
(3) The ratio of PWA(n-1) and PWA(n) is out of the range between 50% and 200%.
Figure 6(a) shows the labeling results (green line) of all PPG pulses in one segment (red line for the red light, and blue line for the IR light). One pulse wave shows chopping at the valley for the red light. Moreover, the PPG pulses for the red light are more severely corrupted than those for the IR light. These pulses are all labeled as low quality (value -1); the other pulses are labeled as high quality (value 1). Figure 6(b) shows PPG signals in which the red and IR lights have a synchronous corrupted segment from 1.8 seconds to 5.7 seconds. This segment is annotated as low SQI.
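The numeric thresholds of the second and third stages can be expressed as simple range checks. This sketch covers only the conditions whose thresholds are stated in the text (RT, the left/right PWA ratio, PWD, and the neighbor ratios); the contour-based conditions are not modeled. The inputs `pwa_left` and `pwa_right` and the dictionary layout are hypothetical.

```python
def out_of_range(value, low, high):
    return value < low or value > high

def stage2_is_low(rt, pwd, pwa_left, pwa_right):
    """Second-stage numeric checks on a single pulse (times in seconds)."""
    if out_of_range(rt, 0.08, 0.49):
        return True
    if min(pwa_right / pwa_left, pwa_left / pwa_right) < 0.4:
        return True
    return out_of_range(pwd, 0.3, 2.0)

# Allowed range of feature(n-1) / feature(n) for neighboring pulses.
NEIGHBOR_LIMITS = {"RT": (0.33, 3.0), "PWD": (0.33, 3.0), "PWA": (0.5, 2.0)}

def stage3_is_low(prev, curr):
    """Third-stage check: prev and curr map 'RT', 'PWD', 'PWA' to values."""
    return any(
        out_of_range(prev[name] / curr[name], low, high)
        for name, (low, high) in NEIGHBOR_LIMITS.items()
    )
```

A pulse would be labeled low quality (-1) if any stage flags it, and high quality (1) otherwise.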

C. SQI ANNOTATION OF PPG SEGMENT
When more than one-fifth of the pulses in a segment were labeled as low quality, the segment was annotated as low SQI. Figure 7(a) shows a segment annotated as high SQI, and Fig. 7(b) shows a segment annotated as low SQI.
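The segment-level rule is a threshold on the share of low-quality pulses. A minimal sketch, using the pulse labels 1 (high quality) and -1 (low quality) from the text and assuming the same coding for the segment label:

```python
def annotate_segment(pulse_labels):
    """Return -1 (low SQI) if more than one-fifth of the pulses are low
    quality, otherwise 1 (high SQI)."""
    low_count = sum(1 for label in pulse_labels if label == -1)
    return -1 if low_count > len(pulse_labels) / 5 else 1
```

For example, a segment with 2 low-quality pulses out of 10 sits exactly at one-fifth and is still annotated as high SQI, while 3 out of 10 makes it low SQI.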

IV. FEATURES EXTRACTION AND CLASSIFICATION
Two supervised classification approaches, SVM and CNN, were used to classify the SQI of each PPG segment as high or low. In the SVM approach, twelve statistical features extracted from the red and IR PPG signals in one segment were used to classify the SQI. In the CNN approach, the red and IR PPG pulse signals in one segment were converted to a corresponding image of 300 × 300 pixels, as shown in Fig. 7. These images were fed into the 2-D CNN to classify the SQI.

A. FEATURES EXTRACTION
In one segment, there were two PPG signals, red light and IR light. We defined six statistical features for each PPG signal: the standard deviation of PWD (SD_PWD), the standard deviation of PWA (SD_PWA), the standard deviation of SDR (SD_SDR), the standard deviation of the peak-to-peak interval (SD_PPI), the standard deviation of the PPG pulse signal (SD_pulse), and the standard deviation of the PPG baseline signal (SD_baseline).
In the definitions of these features, N is the number of pulses in the segment, and the mean values of PWD, PWA, SDR and the peak-to-peak interval (PPI) are taken over all pulses in the segment. M is the number of samples in the segment (M = 700), and the means of the PPG pulse signal (AC) and the PPG baseline signal (DC) are taken over these M samples.
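The twelve-element feature vector can be assembled as six standard deviations per light source. The defining equations are not reproduced in this text, so the choice of estimator is an assumption: the sketch uses the sample standard deviation (`statistics.stdev`, divisor N-1), and `statistics.pstdev` would be the drop-in replacement if the population form was intended. The dictionary layout is hypothetical.

```python
import statistics

# Order of the six per-channel series whose standard deviation is taken.
FEATURE_ORDER = ["PWD", "PWA", "SDR", "PPI", "pulse", "baseline"]

def channel_features(series):
    """series maps each name in FEATURE_ORDER to a list of values:
    per-pulse values for PWD, PWA, SDR, PPI; per-sample values (M = 700)
    for the pulse and baseline signals. Estimator choice is an assumption."""
    return [statistics.stdev(series[name]) for name in FEATURE_ORDER]

def segment_features(red, ir):
    """Concatenate red and IR features into the 12-element SVM input."""
    return channel_features(red) + channel_features(ir)
```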

B. SUPPORT VECTOR MACHINE
SVM is a machine learning approach that has been widely used for solving supervised classification problems due to its generalization ability [25]. In essence, SVM finds the separating hyperplane with the maximum margin for the training data, which can be formulated as a quadratic optimization problem in feature space. The patterns closest to the decision boundary are called support vectors. Considering a linearly separable dataset {X_i, D_i}, where X_i is the input pattern for the ith example and D_i is the corresponding desired output (1 or -1), a hyperplane is found as the decision surface. It can be written as follows:
$$W^{T}X + b = 0$$
where W is the coefficient vector of the hyperplane function and b is the bias. Cortes et al. proposed the soft margin for SVM [26]. A slack variable (ζ_i) was added so that the margin could tolerate some misclassified data points. Thus, maximizing the margin between the hyperplane and the nearest point is a quadratic optimization problem:
$$\min_{W,b,\zeta}\ \frac{1}{2}\|W\|^{2} + C\sum_{i}\zeta_{i} \quad \text{subject to} \quad D_{i}\left(W^{T}X_{i} + b\right) \ge 1 - \zeta_{i},\ \ \zeta_{i} \ge 0$$
where C is the cost parameter. The larger C is, the larger the influence of misclassified data on the objective in Eq. (16). When W and b are rescaled, the point nearest to the hyperplane has a distance of 1/‖W‖ [27]. Using Lagrange multipliers and the Kuhn-Tucker theorem, the solution is given by:
$$W = \sum_{i}\alpha_{i}D_{i}X_{i}$$
Only a small fraction of the α_i coefficients are nonzero. The corresponding pairs {X_i, D_i} are known as support vectors, and they define the decision boundary. All other input patterns have α_i equal to zero and are therefore irrelevant. The hyperplane decision function for an input pattern vector X can be written as follows:
$$f(X) = \operatorname{sgn}\left(\sum_{i}\alpha_{i}D_{i}X_{i}^{T}X + b\right)$$
By replacing the inner product (X_i^T X_j) with the kernel function K(x_i, x_j), the input patterns are mapped to a higher-dimensional space [27]. In this higher-dimensional space, a separating hyperplane is constructed to maximize the margin.
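In this study, the linear function and the Gaussian radial basis function were used to construct the kernel function as follows:
$$K(x_{i}, x_{j}) = x_{i}^{T}x_{j}$$
$$K(x_{i}, x_{j}) = \exp\left(-\gamma\,\|x_{i} - x_{j}\|^{2}\right)$$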
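The two kernels and the kernelized decision function can be illustrated in a few lines of plain Python. This is a sketch of the standard textbook forms, not the authors' code: the Gaussian kernel is assumed to be parameterized as exp(-γ‖x − y‖²), which matches the γ parameter searched later in the text, and the support vectors, multipliers and bias in the example are made-up toy values.

```python
import math

def linear_kernel(x, y):
    """Inner product of two feature vectors."""
    return sum(a * b for a, b in zip(x, y))

def rbf_kernel(x, y, gamma):
    """Gaussian radial basis kernel, assumed form exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def decide(x, support_vectors, labels, alphas, bias, kernel):
    """Sign of sum_i alpha_i * D_i * K(X_i, x) + b, returned as 1 or -1."""
    score = bias + sum(
        alpha * label * kernel(sv, x)
        for sv, label, alpha in zip(support_vectors, labels, alphas)
    )
    return 1 if score >= 0 else -1

# Toy model (hypothetical values) with one support vector per class.
toy_svs = [[1.0, 1.0], [-1.0, -1.0]]
toy_labels = [1, -1]
toy_alphas = [0.5, 0.5]
high = decide([2.0, 2.0], toy_svs, toy_labels, toy_alphas, 0.0, linear_kernel)
low = decide([-1.0, -3.0], toy_svs, toy_labels, toy_alphas, 0.0, linear_kernel)
```

To use the Gaussian kernel with `decide`, the γ value can be bound first, for example with `lambda a, b: rbf_kernel(a, b, 0.125)`.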

C. CONVOLUTION NEURAL NETWORK
Because the number of samples is small and the patterns do not differ greatly, we chose a 2-D CNN with multiple layers. We built the 2-D CNN on the pretrained 19-layer CNN architecture (VGG-19) [28]. In the output layer, we replaced the 1000-unit fully connected layer with softmax activation by a 2-unit fully connected layer. The VGG-19 model was pretrained for the image classification task on the ImageNet dataset [29]. The detailed description of the VGG-19 model is shown in Table 1. All of the filters in VGG-19 are 3 × 3 in size. Down-sampling is performed directly by max pooling layers with a stride of 2. The batch size is 16 and the learning rate is 0.00001. Batch normalization is performed right after each convolution and before the ReLU activation. The two fully connected layers have a size of 1024.

V. EXPERIMENTS
A series of static and dynamic experiments was conducted with a total of 20 subjects (12 males and 8 females). All the subjects were healthy adults aged 20 to 25 years (21.7 ± 2.3 years, mean ± standard deviation); weight was between 45 and 87 kg (61.4 ± 11.5 kg), height was between 151 and 185 cm (169.8 ± 9.6 cm) and heart rate was between 58.5 and 95.9 beats/minute (BPM) (77.6 ± 11.0 BPM). They had no history of related major injury. This experiment was approved by the Research Ethics Committee of China Medical University & Hospital (No. CMUH107-REC1-167), Taichung, Taiwan. A commercial medical pulse oximeter (Rossmax SA310, Rossmax, Taiwan) with a finger clip was used to measure the beat-to-beat SpO2, which was taken as the reference value in the study. The Rossmax SA310 has a Bluetooth function to transmit the measured SpO2 values to a computer. Thus, the SpO2 measured by our forehead pulse oximeter could be synchronized with the SpO2 measured by the Rossmax SA310.
In the static experiment, subjects were asked to wear the forehead pulse oximeter on their heads and the finger clip of the Rossmax SA310 on the index finger of the right hand, as shown in Fig. 8. Each subject was measured for three minutes. In the first thirty seconds, subjects were requested to keep their body still, and in the next thirty seconds to slightly move their heads left and right. These activities were repeated three times. When the subjects slightly moved their heads, the PPG signals measured by the forehead probe were corrupted by motion artifacts, whereas the PPG signals measured by the finger-type probe remained stable. Thus, some PPG segments from the forehead probe were annotated as low SQI. Table 2 shows the number of high and low SQI samples for all subjects in the static experiment. There are 3480 samples in total, of which 1596 belong to high SQI and 1884 belong to low SQI.
In the dynamic experiment, it was difficult to keep the fingers still while the subject was running on the treadmill, so the Rossmax SA310 pulse oximeter could not measure the SpO2. Thus, subjects only wore the forehead pulse oximeter. Subjects started running on the treadmill once the PPG pulse signals were stable. To observe how different exercise intensities corrupt the PPG pulse signals, three speeds were used: 3 km/h (walking), 5 km/h (brisk walking) and 7 km/h (jogging). At each speed, the data were recorded for 3 minutes. Table 2 shows the number of high and low SQI samples for all subjects. There are 9396 samples in total, of which 5824 belong to high SQI and 3572 belong to low SQI. Subjects 7 and 11 did not perform the dynamic experiment.
According to our proposed method, a PPG segment was counted as a true positive (TP) when its quality level was correctly identified, a false positive (FP) when its quality level was incorrectly identified, a true negative (TN) when its quality level was correctly rejected, and a false negative (FN) when its quality level was incorrectly rejected. The performance of the proposed method was evaluated using accuracy, (TP+TN)/(TP+FP+FN+TN); precision, TP/(TP+FP); sensitivity, TP/(TP+FN); and specificity, TN/(FP+TN).
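The four performance measures can be computed directly from the confusion counts. In this sketch the positive class is assumed to be high SQI (label 1), which the text does not state explicitly, and specificity uses the standard form TN/(TN+FP).

```python
def confusion_counts(truth, predicted, positive=1):
    """Count TP, FP, TN, FN; treating high SQI (1) as positive is an assumption."""
    tp = fp = tn = fn = 0
    for t, p in zip(truth, predicted):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def performance(tp, fp, tn, fn):
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }
```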

VI. RESULTS
To evaluate the performance of the SVM and CNN approaches, we designed two analyses. First, we used 11-fold cross validation to train and test the two approaches. We organized the eleven data sets according to the subjects' serial numbers. In the first set, the data of subjects 1 to 10 were used as the training data, and the data of subjects 11 to 20 were used as the testing data. The data included both the static and dynamic data. Thus, the training set and the testing set each contained 6438 samples. In the second set, the data of subjects 2 to 11 were used as the training data, and the data of subjects 12 to 20 and subject 1 were used as the testing data. Following this scheme, in the eleventh set the training data were the data of subjects 11 to 20, and the testing data were the data of subjects 1 to 10. Second, we used the static data from all subjects as the training data and the dynamic data of all subjects as the testing data. Thus, the numbers of training and testing samples were 3480 and 9396, respectively.
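The subject-wise split described above can be generated with a sliding window over the subject numbers. A minimal sketch (the function name and parameters are illustrative):

```python
def subject_folds(n_subjects=20, train_size=10, n_folds=11):
    """Fold k trains on subjects k..k+9 (wrapping past 20) and tests on the rest."""
    folds = []
    for k in range(n_folds):
        train = [(k + j) % n_subjects + 1 for j in range(train_size)]
        test = [s for s in range(1, n_subjects + 1) if s not in train]
        folds.append((train, test))
    return folds

folds = subject_folds()
```

Because the training window slides by one subject per fold, the test sets of different folds overlap; this differs from a conventional k-fold split, in which the test sets partition the data.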
We used the first set to search for the optimal parameter values of the SVM kernel functions by the grid-search method. The C parameter ranged from 2^-8 to 2^8, and the γ parameter from 2^-15 to 2^15, with a multiplicative step of 2. Figure 9(a) shows the distribution of SVM performance with the linear kernel function. The highest accuracy is 91.2% when C is 64. Figure 9(b) shows the distribution of SVM performance with the Gaussian radial basis kernel function. The highest accuracy is 92.1% when C is 256 and γ is 0.125. Thus, we chose these values for the training and testing of the SVM. Table 3 shows the numbers of training and testing samples of high and low SQI for each fold. In general, there are more high SQI samples than low SQI samples. Table 4 shows the performance of the SVM with the linear kernel function for the 11-fold validation: the accuracy is 89.5 ± 0.988%, sensitivity is 91.8 ± 1.61%, specificity is 86.4 ± 1.24%, and precision is 90.2 ± 0.94%. Table 5 shows the performance of the SVM with the Gaussian radial basis kernel function for the 11-fold validation: the accuracy is 89.9 ± 1.18%, sensitivity is 91.8 ± 1.87%, specificity is 87.3 ± 1.64%, and precision is 90.8 ± 1.15%.
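The parameter grid described above can be enumerated as powers of two. In this sketch, `score_fn` is a hypothetical callback that would train an SVM with the given C and γ and return its accuracy; no particular SVM library is assumed.

```python
def power_of_two_grid(low_exp, high_exp):
    """Values 2^low_exp, ..., 2^high_exp with a multiplicative step of 2."""
    return [2.0 ** e for e in range(low_exp, high_exp + 1)]

C_GRID = power_of_two_grid(-8, 8)         # 17 candidate values of C
GAMMA_GRID = power_of_two_grid(-15, 15)   # 31 candidate values of gamma

def grid_search(score_fn):
    """Return (best_score, C, gamma) over the full grid.
    score_fn(C, gamma) is a hypothetical train-and-evaluate callback."""
    best = None
    for c in C_GRID:
        for gamma in GAMMA_GRID:
            score = score_fn(c, gamma)
            if best is None or score > best[0]:
                best = (score, c, gamma)
    return best
```

The reported optima, C = 64 and C = 256 with γ = 0.125, correspond to the grid points 2^6, 2^8 and 2^-3.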

A. 11-FOLD VALIDATION
For the VGG-19 model training, to reduce the imbalance between the numbers of high and low SQI samples, we expanded the smaller class to match the size of the larger class in each fold. In the first fold, there are 3803 high SQI samples and 2635 low SQI samples. Thus, the number of low SQI samples was extended to 3803. The model was trained for 50 epochs. Table 6 shows the performance of the VGG-19 model: the accuracy is 88.7 ± 1.54%, sensitivity is 88.1 ± 3.23%, specificity is 89.7 ± 3.16%, and precision is 92.2 ± 2.34%. Compared with Tables 4 and 5, the performance and stability of the SVM are better than those of the VGG-19 model.
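One simple way to equalize the class sizes is random oversampling with replacement. The text does not say how the smaller class was extended, so this sketch is an assumption about the mechanism rather than the authors' procedure; the function name and the `seed` parameter are illustrative.

```python
import random

def oversample_minority(samples, labels, seed=0):
    """Duplicate randomly chosen items of each smaller class until every
    class matches the largest one (assumed balancing method)."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = max(len(items) for items in by_class.values())
    out_samples, out_labels = [], []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        out_samples.extend(items + extra)
        out_labels.extend([label] * target)
    return out_samples, out_labels
```

Applied to the first fold (3803 high SQI and 2635 low SQI samples), this produces 3803 samples in each class, as in the text.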

B. CLASSIFICATION OF DYNAMIC DATA USING STATIC DATA-TRAINED MODELS
The second evaluation used the models trained on static data to classify the dynamic data. In Table 2, the total number of training samples is 3480, of which 1596 belong to high SQI and 1884 belong to low SQI. The total number of testing samples is 9396, of which 5824 belong to high SQI and 3572 belong to low SQI. Table 7 shows the testing performance of the SVM with the linear kernel function at three speeds, 3 km/h, 5 km/h, and 7 km/h. The average accuracy is 89.1 ± 3.53%, sensitivity is 90.8 ± 4.67%, specificity is 77.3 ± 18.87%, and precision is 90.9 ± 0.35%. Table 8 shows the testing performance of the SVM with the Gaussian radial basis kernel function at the same three speeds. The average accuracy is 89.5 ± 3.87%, sensitivity is 90.6 ± 4.67%, specificity is 79.3 ± 16.39%, and precision is 91.7 ± 0.85%. The data at 7 km/h have the highest accuracy, 93.2%, and the data at 5 km/h have the lowest accuracy, 85.5%. Table 9 shows the testing performance of the VGG-19 model at the three speeds. The data at 3 km/h have the highest accuracy, 90.2%, and the data at 7 km/h have the lowest accuracy, 81.7%. The average accuracy is 86.2 ± 4.28%, sensitivity is 95.9 ± 1.08%, specificity is 67.0 ± 6.95%, and precision is 81.4 ± 13.51%. Overall, the SVM performs better than the VGG-19 model.
Table 10 shows the change of the SpO2 measured by the forehead pulse oximeter in the static condition and at 3 km/h, 5 km/h, and 7 km/h, before and after SQI classification with the SVM using the Gaussian radial basis kernel function. After SQI classification, only the PPG segments with high SQI were used to calculate the SpO2. Because subjects were measured with both the forehead and finger-type pulse oximeters only in the static experiment, the Error % is available only for the static condition. The Error % reduces from 5.6 ± 6.6 to 1.9 ± 1.1 after SQI classification.
Figure 10(a) shows the Bland-Altman plot of the difference between the reference SpO2 and the forehead SpO2 before SQI classification. The PPG signals in some segments were seriously corrupted. Thus, the SpO2 values computed from these segments could be very large negative values. Figure 10(b) shows the Bland-Altman plot of the difference between the reference SpO2 and the forehead SpO2 after SQI classification.
The mean and standard deviation of the differences were −2.9 ± 15.8% and 1.6 ± 1.5% before and after SQI classification, respectively. Before and after the classification, 50 and 46 points, respectively, fell outside the 95% limits of agreement.
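The quantities reported for the Bland-Altman analysis (mean difference, its standard deviation, and the 95% limits of agreement) can be computed as follows. This sketch assumes the conventional limits of mean ± 1.96 standard deviations and takes the difference as measured minus reference; the sign convention used in the figures is not stated in the text.

```python
import statistics

def bland_altman(reference, measured):
    """Return (bias, lower_limit, upper_limit) with 95% limits of agreement
    taken as bias +/- 1.96 * SD of the differences (assumed convention)."""
    diffs = [m - r for r, m in zip(reference, measured)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

def count_outside(reference, measured):
    """Number of points falling outside the limits of agreement."""
    _, lower, upper = bland_altman(reference, measured)
    return sum(1 for r, m in zip(reference, measured) if not lower <= m - r <= upper)
```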
In the dynamic condition, the standard deviations of the SpO2 measured by the forehead pulse oximeter at the three speeds reduce from 0.7, 1.1 and 2.5 to 0.3, 0.3 and 0.4, respectively, after SQI classification. A box plot is used to display the distribution of SpO2 before and after SQI classification for the three speeds, as shown in Figure 11. We used the t-test to analyze whether the SpO2 values differ significantly before and after SQI classification at the different speeds. At 3 km/h and 7 km/h, the SpO2 values showed significant differences, but at 5 km/h they did not. Thus, the accuracy and stability of SpO2 could be improved by SQI classification.

VII. DISCUSSION
The PPG signal has been widely used to measure many physiological parameters, such as pulse rate [30], blood oxygen saturation [1], blood pressure [31], respiration rate [32], and left ventricular ejection time (LVET) [33]. However, these physiological parameters use different information from the PPG wave. Heart rate measurement uses the PPI of the PPG signal. Thus, the accuracy of the heart rate would not be reduced as long as the PPG wave has a clear main peak, even if the PPG signal is corrupted by motion artifacts. The SQI classification for the heart rate measurement would not contribute higher accuracy when the morphology of the PPG pulse is seriously distorted [34], [35]. Because the LVET is defined in the dPPG signal as the time interval between the first zero-crossing point and the minimum point, the accuracy of the LVET would be reduced when the PPG wave has even a slight distortion during the rise time. Thus, the SQI decision has to be made peak by peak. Liu [16]. Subjects were only requested to move their body, and walk or climb stairs. Thus, the density of motion artifacts could be controlled in their study. In our study, subjects ran on the treadmill at different speeds, and the intensity and density of the motion artifacts could not be controlled. The accuracies of the SVM and VGG-19 model were 89.9 ± 1.18% and 88.7 ± 1.54% when the classified data included the static and dynamic data. Even when we used the static data to estimate the SQI of the dynamic data, the accuracies of the SVM and VGG-19 model were still 89.5 ± 3.87% and 86.2 ± 4.28%. It appears that our results were not as competitive as those of Chong's study. However, our dataset had a large amount of data with low SQI. In Table 2, the amount of data with low SQI increases from 602 to 904 to 2066 as the running speed increases from 3 km/h to 5 km/h to 7 km/h. In Table 8, the accuracy for the 7 km/h data is 93.2%, the best among the three speeds, because the PPG signals were seriously corrupted by the motion artifact.
In that case, the SVM could easily separate the PPG segments with high and low SQI. At 5 km/h, by contrast, the accuracy is the worst because the PPG signals were only moderately corrupted by the motion artifact. Thus, the SpO2 values before and after SQI classification did not differ significantly at this speed. Figure 12(a) shows the PPG pulse signal during the static and 5 km/h conditions. Figure 12(b) shows the annotated result (green line) and the classified result (red line). In the static period, the accuracy is 100%. But in the 5 km/h period, the changes of both the PWA and the SD of the pulses are large.
Classifying the SQI of PPG segments is a generalized decision. The annotations of the PPG segments directly affect the accuracy of classification. In this study, we used the SpO2 value, the characteristics of the PPG pulse, and the changes in these characteristics to annotate the SQI. The features of a PPG segment were the standard deviations of these characteristics. Thus, a tiny error in the characteristics of the PPG pulses could cause a PPG segment to be annotated incorrectly. Figure 13(a) shows a PPG segment in the static experiment whose SQI is annotated as low but classified as high. The first row shows the SpO2 values measured by the Rossmax SA310, and the second row shows the SpO2 values measured by the forehead pulse oximeter. In this PPG segment, the first to fourth pulses are annotated as low quality because the RTs of these pulses are outside the desired range, although the PPG pulse signals are very stable. Figure 13(b) shows the PPG pulse signals in the dynamic experiment. Those SpO2 values were measured by the forehead pulse oximeter.
The fourth, sixth and eleventh pulses are annotated as low quality according to ''The absolute peak or valley value for red light is larger than the IR light''. Thus, this PPG segment is annotated as low SQI. However, the standard deviations of PWA and PWD are small, and this segment was classified as high SQI.
In this study, the VGG-19 model did not perform better than the SVM. This result is contrary to the previous studies of Liu et al. [17], [18]. A possible reason is that the SQI classification of a PPG segment is a generalized decision, whereas the VGG-19 model can detect tiny differences in the patterns. Figure 14(a) shows a PPG segment that is classified as high SQI by the SVM, but as low SQI by VGG-19. The PPG pulse signals have some baseline drift at the beginning of the PPG segment. Figure 14(b) shows the same problem: the PPG pulse signals have some baseline drift between 4 and 5 seconds. This PPG segment was classified by the VGG-19 model as low SQI.

VIII. CONCLUSION
In this study, a forehead pulse oximeter that can measure SpO2 in static and dynamic scenarios was implemented. The array-type sensor was able to find the best measurement position. Because the forehead pulse oximeter is intended for dynamic scenarios, the SQIs of the PPG signals have to be evaluated to increase the accuracy of SpO2. The SVM and the VGG-19 model were used to classify the SQI. Although the SVM performed slightly better than the VGG-19 model, the advantage of the VGG-19 model is that the characteristics of the PPG pulses do not need to be explored in a preprocessing step. Notably, the main limitation of the study was that the number of PPG segments was not large enough. If more PPG segments are used to train the VGG-19 model, better results can be expected. Moreover, we did not compare the performance of a wrist pulse oximeter with that of the forehead pulse oximeter in the dynamic scenario. This issue is worth exploring in the future.