Stress Detection With Single PPG Sensor by Orchestrating Multiple Denoising and Peak-Detecting Methods

Stress is one of the major causes of diseases in modern society. Therefore, measuring and managing the degree of stress is crucial to maintain a healthy life. The goal of this paper is to improve stress-detection performance using precise signal processing based on photoplethysmogram (PPG) data. PPG signals can be collected through wearable devices, but are affected by many internal and external noise sources. To solve this problem, we propose a two-step denoising method that filters the noise in terms of frequency and removes the remaining noise in terms of time. We also propose an ensemble-based multiple peak-detecting method to extract accurate features from the refined signals. We used a typical public dataset, namely, the wearable stress and affect detection dataset (WESAD), and measured the performance of the proposed PPG denoising and peak-detecting methods using multiple lightweight classifiers. By measuring the stress-detection performance using the proposed method, we demonstrate an improved result compared with the existing methods: an accuracy of 96.50% and an F1 score of 93.36%. Our code is available at https://github.com/seongsilheo/stress_classification_with_PPG.


I. INTRODUCTION
When faced with environmental changes, we tend to keep our internal state constant. This mechanism is called homeostasis. Stress can be defined as a state of threatened homeostasis [1], [2]. Stress is one of the common problems in modern society. Long-term stress can lead to chronic activation of the stress response. Chronic stress even threatens physical and mental health. For example, it breaks down the body's immune system and causes cardiovascular disease, diabetes, depression, and other illnesses [3], [4]. Thus, it is important for us to detect and manage stress to improve the quality of life and reduce the threat to physical and mental health.
Accurately measuring stress has become an important task. In the past, stress was assessed by directly answering questionnaires [5]. Recently, the demand for wearable devices that monitor our health condition in real time has increased, as shown in Figure 1, and detecting stress by measuring physiological signals via wearable devices has become possible. The typically used physiological signals are electrocardiogram (ECG) and photoplethysmogram (PPG). ECG is an electrical signal in fine muscles around the heart activated during a cardiac cycle, and the data are collected by attaching sensors to the chest near the heart [6].
The associate editor coordinating the review of this manuscript and approving it for publication was Jason Gu.
VOLUME 9, 2021. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
PPG is a signal that represents a changing arterial wave during each cardiac cycle, and the data are collected by attaching sensors to the wrist [7]. Therefore, setting up devices is inexpensive and convenient. The PPG signal contains more noise than the ECG signal. Most previous studies used relatively less noisy ECG signals or other types of signals [8], [9]. However, our current research focused on using only a single PPG signal, which is simpler and more convenient in everyday life.
Previous PPG studies focused mainly on denoising or peak-detecting methods. Denoising techniques are typically performed in terms of frequency [10]-[12] and time [13], [14]. However, the current methods have limitations in noise reduction since the denoising techniques are applied independently. The current peak-detecting methods mainly focus on attenuating the non-systolic peaks, especially the diastolic peak or peaks triggered by noise. Figure 3 shows a typical PPG signal form. The circle indicates the systolic peak, and the triangle indicates the diastolic peak. Existing methods detect the peaks by extracting only the peak with the largest amplitude value per block [15], [16], attenuating the non-systolic peak signals through signal transformation [16]-[19], and removing the non-systolic peaks through thresholding [17]-[20]. However, each method can still detect incorrect peaks since the PPG signal comes in diverse forms, and its distribution depends on individual characteristics.
The objective of this paper is to improve the performance of stress detection by extracting accurate features from PPG signals through orchestrating multiple denoising and peak-detecting methods (OMDP), thereby overcoming the limitations of previous PPG signal analysis. The contributions of this paper are as follows.
(1) We present an effective two-step denoising method in terms of frequency and time.
(2) We extract more accurate peak points by applying an ensemble-based multiple peak-detection method.
(3) We demonstrate the superiority of our proposed method through seven lightweight classifiers for stress detection.
Figure 2 shows the entire process of this work. It consists of denoising, peak detection, and feature extraction. It finally measures the performance of stress detection by training seven lightweight machine-learning classifiers that can run on low-power wearable devices.

II. METHODOLOGY
A. ADAPTIVE TWO-STEP DENOISING METHOD
Figure 4 shows that the PPG signal-denoising process consists of two steps: noise filtering and noise elimination.
In the noise filtering process, we use a band-pass filter to compensate the signals in terms of the frequency [10], as shown in Figure 4(a). After applying the band-pass filtering, we use a three-point moving average filter for smoothing.
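As an illustration of the smoothing step, the following is a minimal sketch of a three-point moving average in plain Python. The function name `moving_average` and the edge handling (shorter windows at both ends of the signal) are assumptions made here for illustration. The band-pass stage is not shown, because the text does not state its cut-off frequencies.

```python
def moving_average(signal, width=3):
    """Centered moving average of the given odd width.

    Assumption: at the signal edges, only the available samples are
    averaged, so the output has the same length as the input.
    """
    half = width // 2
    smoothed = []
    for i in range(len(signal)):
        # Python clamps the upper slice bound, so only the lower bound
        # needs an explicit max().
        window = signal[max(0, i - half): i + half + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


if __name__ == "__main__":
    print(moving_average([0, 3, 0, 3, 0]))
```

In a full pipeline this function would be applied to the output of the band-pass filter rather than to the raw signal.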
To deal with the noise that remains after the noise-filtering process, we remove noisy segments in terms of time using a statistical method [14]. Figure 4(b) shows the noise-elimination process. We first extract the valleys from the signal and divide it into segments per cycle. The previously proposed valley-detecting method, an improved moving-window method, detected valleys inaccurately because noise repeatedly caused incorrect valleys to be detected throughout the signal. To address this problem, we devise a method that calculates the mean of the entire signal amplitude and defines each segment as the span from the previous to the current point at which the signal contains this mean value, as suggested in [15]. We calculate the standard deviation, kurtosis, and skewness of every segment. In their standard moment form, these measures are expressed as follows:

$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}, \qquad k = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{\sigma}\right)^4, \qquad s = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{\sigma}\right)^3$$

where $x_i$ is the $i$-th of the $n$ samples in a segment, $\bar{x}$ is the mean value, and $\sigma$ is the standard deviation. When one of the three statistical parameters is beyond its threshold, the corresponding segment is eliminated. We extract a part of the signal with high quality as a reference signal and derive the thresholds from it. In this approach, we extracted the reference signals ourselves; better performance can be obtained when experts directly extract the high-quality signals.
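The mean-based segmentation described in this paragraph can be sketched as follows. This is only one possible reading of the text: it assumes that a segment boundary is an upward crossing of the overall signal mean, so that each segment spans roughly one cardiac cycle. The function name `segment_at_mean_crossings` is hypothetical.

```python
def segment_at_mean_crossings(signal):
    """Return (start, end) index pairs of per-cycle segments.

    Assumption: a boundary is any index where the signal rises from
    below the overall mean to at or above it. Each segment covers
    signal[start:end], i.e. the end index is exclusive.
    """
    mean = sum(signal) / len(signal)
    crossings = [
        i for i in range(1, len(signal))
        if signal[i - 1] < mean <= signal[i]
    ]
    # Pair each crossing with the next one to form a segment.
    return list(zip(crossings, crossings[1:]))


if __name__ == "__main__":
    toy_signal = [0, 2, 0, 2, 0, 2, 0]
    print(segment_at_mean_crossings(toy_signal))
```

Samples before the first crossing and after the last one do not belong to a complete cycle under this reading and are left out.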
The thresholds $T_\sigma$, $T_k$, and $T_{s1}$ are defined in terms of $\bar{\sigma}$, $\bar{k}$, and $\bar{s}$, which represent the means of the three statistics of the reference signal, i.e., standard deviation, kurtosis, and skewness, respectively. The optimal parameters obtained from our experiment are as follows: α = 1.0, β = 2.0, γ = 1.8, and δ = 1.5. We reconstruct the signal by removing all the segments beyond the thresholds. Finally, we apply a three-point moving average filter to smooth the reconstructed signal.
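The elimination step can be sketched as below. The threshold formulas themselves are not reproduced in this text, so the sketch takes the limits as plain arguments. The names `max_std`, `max_kurt`, and `skew_range`, and the choice of upper bounds for standard deviation and kurtosis with a two-sided range for skewness, are assumptions for illustration and not the paper's definitions.

```python
import math


def segment_stats(values):
    """Population standard deviation, kurtosis, and skewness of a segment."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        # A flat segment has no defined shape statistics; report zeros.
        return 0.0, 0.0, 0.0
    kurt = sum(((v - mean) / std) ** 4 for v in values) / n
    skew = sum(((v - mean) / std) ** 3 for v in values) / n
    return std, kurt, skew


def keep_clean_segments(segments, max_std, max_kurt, skew_range):
    """Drop every segment with at least one statistic beyond its limit.

    The limits are hypothetical inputs; in the paper they are derived
    from a hand-picked high-quality reference signal.
    """
    kept = []
    for segment in segments:
        std, kurt, skew = segment_stats(segment)
        within_limits = (
            std <= max_std
            and kurt <= max_kurt
            and skew_range[0] <= skew <= skew_range[1]
        )
        if within_limits:
            kept.append(segment)
    return kept


if __name__ == "__main__":
    segments = [[0, 1, 2, 1, 0], [0, 10, -10, 10, 0]]
    clean = keep_clean_segments(segments, 2.0, 5.0, (-1.0, 1.0))
    # Reconstruct the signal by concatenating the surviving segments.
    reconstructed = [value for segment in clean for value in segment]
    print(reconstructed)
```

The second toy segment is rejected because its amplitude spread is far larger than the first one, which mimics a motion artifact.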

B. ENSEMBLE-BASED PEAK-DETECTING METHOD
We apply the ensemble to five well-known peak-detecting methods and determine the final peak point by majority voting. Figure 5 shows each process of the five peak-detecting methods. The local maxima method (LMM) [20] extracts all local maximum points from the signal. Among the extracted points, we remove the points with a lesser value than the mean of the entire signal amplitude.
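The LMM step described above can be sketched in a few lines. The function name is hypothetical, and the handling of flat tops (a plateau counts as a peak at its first sample) is an assumption.

```python
def local_maxima_above_mean(signal):
    """Indices of local maxima whose value is not below the signal mean."""
    mean = sum(signal) / len(signal)
    peaks = []
    for i in range(1, len(signal) - 1):
        # Assumption: a plateau is counted once, at its first sample.
        is_local_max = signal[i - 1] < signal[i] >= signal[i + 1]
        if is_local_max and signal[i] >= mean:
            peaks.append(i)
    return peaks


if __name__ == "__main__":
    print(local_maxima_above_mean([0, 5, 1, 2, 1, 6, 0]))
```

In the toy input, the small bump at index 3 is a local maximum but lies below the mean, so it is discarded in the same way a diastolic peak would be.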
The block generation with the mean of the signal threshold method (BGM) [15] generates blocks with the mean value of amplitude of the entire signal. The points that contain the largest value in each block become the peak point.
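A minimal sketch of the BGM idea follows. It assumes that a block is a contiguous run of samples strictly above the mean of the entire signal; this block definition and the function name are assumptions, since the text does not spell them out.

```python
def block_maxima_above_mean(signal):
    """One peak per block, where a block is a run of samples above the mean."""
    mean = sum(signal) / len(signal)
    peaks, block = [], []
    for i, value in enumerate(signal):
        if value > mean:
            block.append(i)
        elif block:
            # The block just ended: keep its largest sample as the peak.
            peaks.append(max(block, key=lambda j: signal[j]))
            block = []
    if block:
        # Close a block that runs to the end of the signal.
        peaks.append(max(block, key=lambda j: signal[j]))
    return peaks


if __name__ == "__main__":
    print(block_maxima_above_mean([0, 3, 5, 4, 0, 0, 4, 6, 0]))
```

Because only one point per block is kept, smaller local maxima inside the same block are suppressed without a separate threshold.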
The first derivative with an adaptive threshold method (FDA) [17] divides the signal into 5 s blocks. The signal is differentiated, and the points with a derivative of zero are extracted as peak candidates. The threshold is set adaptively according to the mean amplitude in 2 s of the block. The peak candidates with amplitudes larger than the threshold become the peak points.
The slope sum function with an adaptive threshold method (SFA) [18], [19] applies the SSF, which keeps only the ascending points obtained through differentiation and sets the remaining points to zero. All the local maximum points are extracted from the transformed signal. The threshold is continuously updated to 70% of the median value of the last five peaks, and only points whose amplitude is larger than the threshold are extracted. Because an extracted peak point is a peak of the transformed signal, it is re-positioned onto the original signal.
The moving averages with the dynamic threshold method (MAD) [16] sets signal values below zero to zero and then squares the signal. The blocks are generated by two moving averages. The point with the largest amplitude value in each extracted block ultimately becomes the peak point.

C. FEATURE EXTRACTION
The PPG characteristics for general stress detection are based on heart rate variability (HRV) [21]. HRV is quantified as the changes in the interval between successive peaks [22]. The signals must be divided into appropriate window sizes to calculate HRV. Since a window length of at least 2 min is required for accurate feature extraction, a window length of 2 min and a sliding step of 0.25 s are applied for HRV extraction. Inaccurate intervals between successive peaks exist because not all the peak points of the signal can be perfectly extracted. Therefore, intervals that differ from the mean of all intervals by more than approximately 300 ms are eliminated, and the rest are used to extract HRV [15].
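The interval cleaning described above can be sketched as follows, assuming peak positions are given as sample indices at the 64 Hz sampling rate used in the experiments. The function names are hypothetical. The features computed at the end (mean interval, SDNN, RMSSD, mean heart rate) are common time-domain HRV measures chosen here as examples; they are not confirmed to be the exact contents of Table 1.

```python
import math


def rr_intervals_ms(peak_indices, sampling_rate_hz=64):
    """Peak-to-peak intervals in milliseconds from peak sample indices."""
    return [
        (later - earlier) * 1000.0 / sampling_rate_hz
        for earlier, later in zip(peak_indices, peak_indices[1:])
    ]


def drop_outlier_intervals(intervals_ms, tolerance_ms=300.0):
    """Keep intervals within the tolerance of the mean of all intervals."""
    mean = sum(intervals_ms) / len(intervals_ms)
    return [iv for iv in intervals_ms if abs(iv - mean) <= tolerance_ms]


def time_domain_hrv(intervals_ms):
    """Example time-domain features; needs at least two intervals."""
    n = len(intervals_ms)
    mean_rr = sum(intervals_ms) / n
    sdnn = math.sqrt(sum((iv - mean_rr) ** 2 for iv in intervals_ms) / (n - 1))
    diffs = [b - a for a, b in zip(intervals_ms, intervals_ms[1:])]
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return {
        "mean_rr_ms": mean_rr,
        "sdnn_ms": sdnn,
        "rmssd_ms": rmssd,
        "mean_hr_bpm": 60000.0 / mean_rr,
    }


if __name__ == "__main__":
    # One missed peak between samples 192 and 320 doubles that interval.
    peaks = [0, 64, 128, 192, 320, 384]
    cleaned = drop_outlier_intervals(rr_intervals_ms(peaks))
    print(time_domain_hrv(cleaned))
```

The toy input shows why the cleaning matters: a single missed peak produces a 2000 ms interval that would otherwise inflate every variability feature in the window.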
The most widely used measurements, such as HRV time domain, frequency domain and nonlinear domain, are applied to extract the HRV features. We employ the most commonly used features, and these features are listed in Table 1 [21].

III. EXPERIMENTAL RESULT
A. EXPERIMENTAL SETUP
We employed the wearable stress and affect detection (WESAD) public dataset to verify the proposed method. It provides various types of physiological signals and is labeled with four emotional states: baseline, stress, amusement, and meditation [23]. The baseline condition aims at inducing a neutral affective state. However, only the PPG signal, sampled at 64 Hz, was used in our experiments.
To set the cut-off and thresholds in the noise-elimination process of the adaptive two-step denoising method, 0.1% of the signal, of high quality and with high peak-detection performance, was extracted and used as a reference signal. Feature extraction generated 952 windows. Figure 6 shows the changes of the PPG signal when the denoising method is applied step by step: (a) the original, (b) after the noise-filtering step, and (c) after the proposed adaptive two-step denoising method. When the proposed denoising method was applied, the number of generated windows was reduced by approximately 8%, as shown in Figure 6(c).
To quantify the stress-detection performance of the proposed method (OMDP), we utilized seven typical learning-based classifiers [23], [24]. The area under the receiver operating characteristic curve (AUC) [25] and the F1 score were used to measure the performance of the proposed method. The accuracy was used for the performance comparison between the existing method and the proposed method. In addition, leave-one-subject-out cross-validation was used to verify the generalization performance [26], [27]. Our code is available at https://github.com/seongsilheo/stress_classification_with_PPG.
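Leave-one-subject-out cross-validation holds out all windows of one subject per fold, so a classifier is never tested on a person it was trained on. A minimal sketch of the split logic follows; the function name and the subject identifiers in the example are hypothetical.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) per fold.

    subject_ids holds one entry per feature window, naming the subject
    that the window was recorded from.
    """
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


if __name__ == "__main__":
    window_subjects = ["S2", "S2", "S3", "S4", "S4"]
    for subject, train_idx, test_idx in leave_one_subject_out(window_subjects):
        print(subject, train_idx, test_idx)
```

This grouping matters for the heavily overlapping 2 min windows used here: a random split would place near-duplicate windows of the same subject in both the training and test sets.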

B. INTEGRATED APPROACH PERFORMANCE
We investigated the effectiveness of the proposed integrated method (OMDP) by comparing it with the non-integrated methods: original, denoising without peak-detection, and peak-detection without denoising. OMDP is an integrated method that utilizes both two-step denoising and ensemble-based peak-detection. We performed a binary classification, stress versus non-stress, where non-stress is defined by combining the baseline and amusement states [23]. Figure 7 shows the result of the ablation study of the proposed method, obtained by averaging the AUC and F1 scores of the seven classifiers.
When the ensemble-based peak-detecting method was not applied, the results of the five individual peak-detection methods were averaged. The integrated OMDP method exhibited the best performance, with 91.65 for AUC and 88.38 for the F1 score. This result indicated a performance improvement of 4.43% for AUC and 5.01% for the F1 score compared with the second-best option, which used only the adaptive two-step denoising method. Thus, we confirmed that the integrated method achieved better performance than any of the other methods. Peak detection without denoising showed the lowest performance because the peak-detection method was applied to low-quality signals that contained a lot of noise. Table 2 presents the performance comparison between the existing method proposed by Schmidt et al. [23] and our OMDP method under the same conditions. We measured the accuracy and F1 score using the same approach adopted in the previous method. The five classifiers demonstrated that the proposed OMDP outperforms the previous method, with an average improvement of 7.62% in accuracy and 7.04% in F1 score.
The first part of Table 3 presents the results of each classifier when the proposed OMDP method was applied. It lists the AUC and F1 scores of each of the seven classifiers. The linear discriminant analysis (LDA) classifier achieved the best result, with 95.07 for AUC and 93.36 for the F1 score. The second part of Table 3 shows the overall performance results of the seven classifiers when the proposed method is applied step by step. The bold marks represent the performance of the proposed OMDP method, and the gray marks represent the best results of each classifier. The proposed OMDP method achieved the best results in all classifiers except the decision tree classifier. This result indicates that the proposed OMDP method generalizes well across classifiers.

C. ADAPTIVE TWO-STEP DENOISING PERFORMANCE
To investigate the effect of the adaptive two-step denoising method, we measured the performance by applying our denoising method step by step: original, one-step noise-filtering method, and two-step denoising method. The ensemble-based peak-detecting method was applied in every comparison. Figure 8 shows the performance of the adaptive two-step denoising method, obtained by averaging the AUC and F1 scores of the seven classifiers.
We confirmed that the two-step denoising method achieved the best result, with 91.65 for AUC and 88.38 for the F1 score. This result indicated an increase in performance of 2.09% for AUC and 2.60% for the F1 score compared with the second-best option, which used only the one-step noise-filtering method. It also represented an increase of 6.44% for AUC and 10.61% for the F1 score compared with the worst option, which did not use any of the denoising methods.
TABLE 3. Summary of the performance of our OMDP method with seven classifiers, and the overall performance results with seven classifiers by applying our method step-by-step on the binary classification task. "Ens." in the peak-detecting method section denotes that the ensemble-based peak-detecting method was applied. Bold marks represent the performance of our OMDP method, and the gray marks represent the best results for each classifier. (Abbreviations: 9-NN = 9-nearest neighbour, LDA = Linear discriminant analysis, NF = Noise filtering, NE = Noise elimination).

D. ENSEMBLE-BASED PEAK-DETECTING PERFORMANCE
We set a variable N that determines the minimum number of methods whose peak points must match for a point to be selected as a new peak point. We measured the performance of the ensemble-based peak-detecting method for each N using the seven classifiers. The adaptive two-step denoising method was applied in every case. Figure 9 shows the average AUC and F1 scores over the seven classifiers. We achieved the best result when N = 3, i.e., 91.65 for AUC and 88.38 for the F1 score.
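The voting rule with threshold N can be sketched as follows. The text does not state how close two detected peaks must be to count as a match, so the `tolerance` parameter (in samples) is an assumption, as are the function name and the choice of the median index as the final peak position.

```python
def ensemble_peaks(detections, min_votes=3, tolerance=2):
    """Merge peak lists from several detectors by voting.

    detections: one list of peak sample indices per detector.
    min_votes:  the N of the text, i.e. how many detectors must agree.
    tolerance:  hypothetical maximum gap (in samples) between
                neighbouring candidates that still count as one peak.
    """
    candidates = sorted(
        (index, detector_id)
        for detector_id, peaks in enumerate(detections)
        for index in peaks
    )
    # Group candidates whose gap to the previous candidate is small.
    clusters, current = [], []
    for index, detector_id in candidates:
        if current and index - current[-1][0] > tolerance:
            clusters.append(current)
            current = []
        current.append((index, detector_id))
    if current:
        clusters.append(current)

    voted = []
    for cluster in clusters:
        voters = {detector_id for _, detector_id in cluster}
        if len(voters) >= min_votes:
            indices = sorted(index for index, _ in cluster)
            voted.append(indices[len(indices) // 2])  # median position
    return voted


if __name__ == "__main__":
    five_detectors = [
        [10, 50, 90],
        [11, 50],
        [10, 51, 70],
        [9, 49, 90],
        [10, 30],
    ]
    print(ensemble_peaks(five_detectors, min_votes=3))
```

With N = 1 every candidate from any detector is accepted, which is why that setting passes through the errors of each individual method.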
When N = 4, the performance was comparable to that when N = 3. However, the number of extracted peaks was too small to extract features in some windows. Thus, we eliminated these windows, and the number of windows decreased by approximately 2%. When N = 5, most of the windows could not extract the features; thus, we were unable to measure the performance. When N = 1, we achieved the worst result because majority voting was not performed, and many inaccurate peak points were reflected.
Figure 10 shows the performance comparison between the single-based and ensemble-based peak-detecting methods. We measured the performance using the seven classifiers and averaged the results of all classifiers. The ensemble-based peak-detecting method achieved the best result, 91.8 for AUC and 88.6 for the F1 score, compared with the single-based peak-detecting methods. It demonstrated an improved performance of 1.2% for AUC and 2.4% for the F1 score compared with the second-best option, which applied the MAD peak-detection method.
TABLE 4. Tri-classification task performance comparison between previous and our method.
TABLE 5. Overall performance results of the seven classifiers by applying our proposed method (OMDP) step-by-step on the multi-classification task (tri-class and quad-class). "Ens." in the peak-detecting method section denotes that the ensemble of the five peak-detection methods is applied. The bold marks represent our OMDP method result, and the gray marks represent the best result for each classifier. (Abbreviations: 9-NN = 9-nearest neighbour, LDA = Linear discriminant analysis, NF = Noise filtering, NE = Noise elimination).

E. MULTI-CLASSIFICATION PERFORMANCE
Many studies have focused on the binary classification of stress detection [8], [28]-[30]. However, we need to develop a system that detects stress levels above a certain limit, since stress is harmless at a certain level [31]. Thus, we extended our experiment to evaluate the performance using multi-classification. Table 4 lists the performance comparison between the state-of-the-art method by Schmidt et al. [23] and the proposed OMDP method in the multi-classification task for tri-class: baseline, stress, and amusement. We verified that the proposed OMDP method demonstrates higher performance than the existing method. On average, the proposed method achieves a higher performance of about 9.05% for AUC and 8.90% for the F1 score than the existing method.
To further verify the generalized performance of the proposed OMDP method, we also performed the multi-classification task for quad-class, in the same manner as for tri-class. We used data labeled as stress, baseline, amusement, and meditation. Figure 11 shows the result of the multi-classification task when our suggested method is applied step by step: original, denoising without peak-detection, and the proposed OMDP method. We report the average result of the seven classifiers and, when the ensemble-based peak-detecting method is not applied, the average result of the five individual peak-detection methods. When the proposed OMDP method is applied, the multi-classification task for tri-class achieves the best performance, with 74.2 for AUC and 61.7 for the F1 score. The multi-classification task for quad-class also achieves the best performance, with 71.7 for AUC and 53.8 for the F1 score. Table 5 presents the AUC and F1 scores of the seven classifiers when the tri-class and quad-class tasks are performed using the proposed method step by step.
The bold marks indicate the performance of the proposed OMDP method, and the gray marks indicate the best results of each classifier. The proposed OMDP method achieves the best results in four out of the seven classifiers in tri-class and quad-class. On average, we found that the proposed OMDP method achieves the best result in the multi-classification task as well as in the binary classification task.

IV. CONCLUSION
In this paper, we proposed an orchestrating multiple denoising and peak-detecting (OMDP) method by integrating various well-known lightweight denoising and peak-detecting methods. We utilized the advantages of the PPG signal, which is non-invasive and can be widely used in wearable devices. By applying the OMDP method, we addressed the problem of inaccurate feature extraction caused by low-quality, noisy signals. The noise was filtered and efficiently eliminated by analyzing the original PPG signals using a two-step process in terms of frequency and time. In addition, we applied the ensemble method during peak detection to determine accurate peak points and extract accurate features. The proposed OMDP method demonstrated superior stress-detection performance over the existing methods with multiple classifiers. This result opens up the possibility of monitoring our health in real time using only a single PPG signal. The proposed OMDP method also provides the possibility of detecting finer-grained stress levels, as shown by measuring the stress-detection performance at up to four levels as well as at binary levels.