Reliability of a Low-Cost Chest Strap to Estimate Short-Term and Ultra-Short-Term Heart Rate Variability Measures in Response to Emotionally Valenced Stimuli

Heart rate variability (HRV), which represents the time variation between consecutive heartbeats, has been presented as an indicator of how the cardiovascular and nervous systems respond to changing emotional states. A wide variety of affordable wearable devices have become available that report HRV with presumed accuracy. A few exploratory studies have investigated the reliability of these devices under different conditions, but their reliability under emotional stimulation is unknown. This study investigated the reliability, described by the similarity and agreement with a five-lead cardiac monitor, of a widely used low-cost chest strap to detect HRV responses in 29 subjects to emotionally valenced stimuli in short-term and ultra-short-term conditions. HRV was recorded with both the chest strap and the cardiac monitor. The similarity between devices was evaluated based on the cross-correlation of the time series recorded by them and the statistical differences between the measures estimated from the recorded time series of both devices. The agreement between measures from both devices was investigated with the intraclass correlation coefficient (ICC) and Bland–Altman plots. Our results showed that the signals recorded by both devices were highly correlated and that there were no significant discrepancies between the measures derived from both devices under either short-term or ultra-short-term conditions. Additionally, all time-domain, frequency-domain, and nonlinear measures demonstrated strong to excellent agreement. This evidence suggests that the low-cost chest strap can be a reliable alternative for estimating short-term and ultra-short-term time-domain, frequency-domain, and nonlinear HRV measures.


I. INTRODUCTION
Measuring human emotional states has posed one of the major challenges of affective science for decades [1]. Although different types of measures have been presented over the years, there is no universally accepted gold standard for assessing emotional responses [1]. Furthermore, these methods are often scrutinized for their sensitivity to emotional states or their ability to evaluate the valence and arousal of the emotional stimuli. Advances in sensing technologies have allowed researchers to supplement traditional self-reported measures with multimodal data, which are derived from unconscious human responses [1]. Interestingly, enhancing emotion recognition through technological innovations has been a fundamental goal of affective computing since its inception [2]. Since then, an increasing body of research in this field has aimed at identifying and measuring human emotional states through behavioral, neural, and autonomic responses [3], [4], [5]. Particular interest has been given to the latter category, driven by the well-documented yet unclear influence of the autonomic nervous system on emotions [6], [7], [8]. Common indicators of autonomic nervous system activity are based on electrodermal responses such as skin conductance and cardiovascular activity metrics like heart rate (HR), blood pressure, and heart rate variability (HRV) [9]. HRV, which measures the time variation between consecutive heartbeats [10], is believed to provide an indirect gauge of vagal activity or tone. However, the physiological mechanisms behind HRV remain a topic of ongoing debate [11]. The main research interest in HRV stems from its potential to serve as an indicator of how the cardiovascular and nervous systems respond to changing external conditions and mental states. HRV is usually determined from a specific analysis of electrocardiography recordings, obtained from a variable number of pairs of electrodes or leads placed on prescribed areas of the body surface [12]. Technological progress has led to miniaturization in recording, visualization, and interfacing hardware, allowing a transition from large, wheeled instruments to handheld devices. In recent years, a wide variety of low-cost wearable electrocardiography systems and fitness trackers, such as wristbands and chest straps, has become available for health and sports monitoring [13], [14], [15], [16]. Importantly, the current trend toward increasing the portability of the sensing instruments while reducing their costs could potentially facilitate the integration of heart-related measures, such as HRV, into affective computing applications. However, the reliability of these instruments has been a source of criticism [17].
Many studies have investigated the accuracy of affordable chest and wrist sensors in estimating HRV measures. Most of these studies focus on short-term measurements within the time domain, revealing strong correlations between data registered by these cost-effective devices and data from high-quality, laboratory-grade systems, both at rest [18] and during various intensities of exercise [19], [20]. However, the ability of these devices to identify changes in HRV due to exposure to emotionally valenced stimuli, as well as the reliability of ultra-short-term measurements and frequency-domain and nonlinear measures, remains untested.
We hypothesized that low-cost devices can accurately record the variations in HRV in response to stimuli with emotional valence. Therefore, the goal of this study was to investigate the reliability, described by the similarity and agreement with a five-lead cardiac monitor, of one of the most commonly used low-cost chest straps to detect fluctuations in HRV in response to emotionally valenced stimuli in short-term and ultra-short-term conditions.

A. Participants
A convenience sample of healthy volunteers was recruited from the staff and acquaintances of the research institute conducting the study. Eligible participants were over 18 years old and had no medical history of cardiovascular disease, mental illness, or hearing loss. The study included a total of 29 individuals, comprising 16 women (55.2%) and 13 men (44.8%), with an average age of 30.8 ± 6.7 years.
The study was approved by Universitat Politècnica de València (Approval No: P0528112022). Written informed consent to participate in the study was obtained from all the participants before their enrolment.
2) Stimulus Presentation: The stimuli were presented using closed-back headphones with circumaural ear coupling, the Logitech G432 (Logitech International S.A., Lausanne, Switzerland). The headphones have a bandwidth of 20 Hz to 20 kHz and an impedance of 39 Ω.
The Shimmer3 ECG is a lightweight (31 g) and compact (65 × 32 × 12 mm) wireless ECG recording device. It features a four-lead, five-wire ECG recording system with a sampling rate up to 8 kHz. The device provides accurate and reliable ECG signals from five surface electrodes [23], and has been repeatedly used in research [15], [18], [24]. It streams data via Bluetooth and also provides local storage through a microSD card.
The Polar H10 is a chest strap that embeds a connector with very similar weight (21 g) and size (64 × 34 × 10 mm) to the Shimmer3 ECG. The device estimates heart rate and RR intervals with a sampling rate of 1 Hz, using a pair of electrodes attached on the inner side of the strap. The Polar H10 streams these data via Bluetooth.

C. Procedure
The experiment was conducted in a dedicated, quiet, temperature-controlled room free of distractors. An experimenter supervised the entire process. Participants were briefly introduced to the objective of the study and were equipped with the chest strap, the five-lead cardiac monitor, and the closed-back headphones. The electrodes of the five-lead cardiac monitor were placed according to the manufacturer guidelines on the right and left arms, right and left legs, and the precordial V6 position. Participants were then asked to sit comfortably in an armchair. The quality of the recorded signals was checked, and any technical issues were resolved as needed. After a 5-min rest period, the experiment started. Data from both recording instruments were recorded during an additional 120-s rest interval and throughout the presentation of the stimuli.

D. Data Analysis
1) Signal Processing: The ECG recordings of each subject were resampled to 256 Hz. The recordings were processed with a fourth-order Butterworth bandpass filter with a range of 1-40 Hz to eliminate high-frequency noise. Additionally, a fourth-order notch filter ranging from 45 to 55 Hz was used to mitigate power line interference.
Data points with z-scores exceeding 3 were marked as outliers and removed. One subject with outliers making up over 2% of their recorded data was excluded from the analysis. The Pan-Tompkins QRS detection algorithm [25] was used to detect R-peaks in the ECG signals. The resulting interbeat interval (IBI) series were resampled to 1 Hz. All ectopic or incorrectly detected beats were identified [26] and removed. Three subjects, with ectopic beats accounting for 10% or more of their data, were also excluded from analysis.
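The outlier rule described above can be illustrated with a minimal sketch. It assumes that the z-score is taken in absolute value and computed over the whole recording of a subject, which the text does not specify. The function name `remove_zscore_outliers` and the example series are hypothetical and are not part of the study pipeline.

```python
from statistics import mean, pstdev

def remove_zscore_outliers(samples, threshold=3.0):
    """Drop samples whose absolute z-score exceeds `threshold`.

    Returns the kept samples and the fraction of samples removed.
    Assumption: z-scores use the mean and standard deviation of the
    whole series passed in.
    """
    mu = mean(samples)
    sigma = pstdev(samples)
    if sigma == 0:
        # A constant series has no outliers by this definition.
        return list(samples), 0.0
    kept = [x for x in samples if abs((x - mu) / sigma) <= threshold]
    return kept, 1 - len(kept) / len(samples)

# Hypothetical series: 99 plausible values plus one artifact.
series = [800.0 + (i % 5) for i in range(99)] + [3000.0]
kept, outlier_fraction = remove_zscore_outliers(series)

# Exclusion criterion from the text: more than 2% of the data flagged.
exclude_subject = outlier_fraction > 0.02
```

In this example only the artifact is flagged, so the outlier fraction is 1% and the hypothetical subject would be retained.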
The IBI series of the remaining 25 participants were analyzed as follows. First, the entire IBI series of each subject, lasting 350 s, were considered to assess the reliability of the low-cost chest strap in determining short-term HRV characteristics. Second, the IBI series were divided into 70-s segments, corresponding to each block of valenced audio clips, to evaluate the reliability of the device in estimating ultra-short-term HRV characteristics [11]. Tables I and II present the measures investigated under both conditions.
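As an illustration of the two analysis windows, the following sketch computes a few common time-domain measures on a full 350-s series and on its five 70-s segments. It assumes a series sampled at 1 Hz and expressed in milliseconds, and it uses standard textbook definitions of AVNN, SDNN, RMSSD, and pNN50. The helper names and the synthetic series are hypothetical, and the measures actually reported in Tables I and II may be defined differently.

```python
from math import sqrt
from statistics import mean, stdev

def time_domain_measures(ibi_ms):
    """Common time-domain HRV measures from an IBI series in milliseconds."""
    diffs = [b - a for a, b in zip(ibi_ms, ibi_ms[1:])]
    return {
        "AVNN": mean(ibi_ms),                          # average interval
        "SDNN": stdev(ibi_ms),                         # sample standard deviation
        "RMSSD": sqrt(mean([d * d for d in diffs])),   # RMS of successive differences
        "pNN50": 100.0 * sum(abs(d) > 50 for d in diffs) / len(diffs),
    }

def split_into_blocks(series, block_len):
    """Split a series into consecutive, non-overlapping blocks of block_len samples."""
    return [series[i:i + block_len]
            for i in range(0, len(series) - block_len + 1, block_len)]

# Hypothetical 350-sample series (1 Hz), not data from the study.
ibi = [800.0 + 60.0 * (i % 4 == 0) for i in range(350)]

short_term = time_domain_measures(ibi)                  # whole 350-s series
ultra_short_term = [time_domain_measures(block)         # one dict per 70-s block
                    for block in split_into_blocks(ibi, 70)]
```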
2) Statistical Analysis: The similarity between devices was evaluated based on the cross-correlation of the time series recorded by them and the statistical differences between the measures estimated from the recorded time series of both devices. Initially, the cross-correlation between the time series was calculated for varying lag values, ranging from 0 to 10 s. The lag value that resulted in the highest cross-correlation was taken as the estimated delay of the time series. The time series were then synchronized based on this identified lag value. The normality of the data was examined and confirmed with the Shapiro-Wilk test. Consequently, differences between the measures derived from both devices were investigated using Student's t-tests, with a significance level α set at 0.05.
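The lag search described above can be sketched as follows. The sketch assumes that both series are sampled at 1 Hz, so that one sample corresponds to one second, that one series is delayed with respect to the other, and that the normalized cross-correlation at each lag is the Pearson correlation of the overlapping samples. None of these details is specified in the text, and the names `pearson` and `best_lag` are hypothetical.

```python
from math import sin, sqrt

def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def best_lag(reference, delayed, max_lag=10):
    """Return (lag, correlation) for the lag in 0..max_lag samples that
    maximizes the correlation when `delayed` is shifted back by `lag`."""
    best = (0, float("-inf"))
    for lag in range(max_lag + 1):
        a = reference[:len(reference) - lag]
        b = delayed[lag:]
        n = min(len(a), len(b))
        r = pearson(a[:n], b[:n])
        if r > best[1]:
            best = (lag, r)
    return best

# Hypothetical series: the second one is the first delayed by 4 samples.
reference = [800.0 + 50.0 * sin(0.07 * i) + 20.0 * sin(0.31 * i)
             for i in range(350)]
delayed = [reference[0]] * 4 + reference[:-4]

lag, r = best_lag(reference, delayed)

# Synchronize the two series using the estimated lag.
aligned_reference = reference[:len(reference) - lag]
aligned_delayed = delayed[lag:]
```

After alignment, the two hypothetical series overlap sample by sample, which is the precondition for comparing measures derived from each device.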
The agreement between measures from both devices was investigated with the intraclass correlation coefficient (ICC) and Bland-Altman plots, as recommended by previous research [27]. A two-way mixed-effects, consistency, single-measurement ICC, ICC(3,1), was used. ICC values below 0.50 were categorized as weak, those between 0.50 and 0.74 as moderate, between 0.75 and 0.89 as strong, and equal to or higher than 0.90 as very strong [28]. Bland-Altman plots were used to identify potential systematic differences between the measures (fixed bias) and to spot any outliers. The bias, limits of agreement (LOAs), and the percentage of paired measures that fell outside the LOAs were calculated. In addition, 95% confidence intervals (CIs) were determined for the bias and LOAs. Finally, to establish preset acceptable LOAs, a priori LOAs, set at a 50% variation of the mean value [29], were also estimated.
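For reference, a minimal sketch of the two agreement statistics is given below. It assumes the usual ANOVA-based formulation of ICC(3,1), (MSR − MSE) / (MSR + (k − 1) MSE), and LOAs defined as the bias ± 1.96 standard deviations of the paired differences. The confidence intervals and the a priori LOAs are omitted, and the function names and paired values are hypothetical.

```python
from statistics import mean, stdev

def icc_3_1(ratings):
    """ICC(3,1): two-way mixed effects, consistency, single measurement.

    `ratings` holds one row per subject and one column per device.
    """
    n, k = len(ratings), len(ratings[0])
    grand = mean([x for row in ratings for x in row])
    row_means = [mean(row) for row in ratings]
    col_means = [mean([row[j] for row in ratings]) for j in range(k)]
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between devices
    ms_rows = ss_rows / (n - 1)
    ms_error = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)

def bland_altman(a, b):
    """Return bias, lower LOA, upper LOA, and percentage of pairs outside the LOAs."""
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)
    sd = stdev(diffs)
    lower, upper = bias - 1.96 * sd, bias + 1.96 * sd
    outside = sum(d < lower or d > upper for d in diffs)
    return bias, lower, upper, 100.0 * outside / len(diffs)

# Hypothetical paired values of one measure (not data from the study).
strap = [40.0, 52.0, 35.0, 61.0, 47.0, 55.0]
monitor = [41.0, 51.0, 36.0, 60.0, 48.0, 54.0]

icc = icc_3_1([list(pair) for pair in zip(strap, monitor)])
bias, loa_lower, loa_upper, pct_outside = bland_altman(strap, monitor)
```

Because the consistency form removes the between-device mean difference before computing the error term, a constant offset between devices would not lower ICC(3,1); such a fixed bias is what the Bland-Altman analysis is meant to expose.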

A. Similarity of the Time Series
The average cross-correlation between the time series recorded by both devices was 0.95, ranging between 0.73 and 0.98. Fig. 1 presents a comparison of the time series for participants with the lowest and highest cross-correlation. A majority of the participants (87.0%) exhibited cross-correlation values above 0.8. In particular, excellent cross-correlation between time series was found in 65.2% of the participants, and good cross-correlation in 21.8% of the participants. Only 13.0% of the participants had cross-correlations below 0.8, but these were still moderate to high. In terms of time offset, most of the time series (73.9%) showed a 4-s offset. The remaining time series had offsets of 6 (4.4%), 5 (13.0%), and 3 s (8.7%).

B. Comparison of Measures
No statistically significant differences were found between the measures derived from the recordings of both devices (Tables I and II), which supports their comparability.
Tables I and II display the values of all the investigated short-term and ultra-short-term measures, respectively, obtained from both devices, along with the corresponding p-values resulting from the Student's t-test.

C. Agreement
All time-domain and frequency-domain measures obtained in short-term and ultra-short-term conditions exhibited ICCs exceeding 0.9, indicating excellent agreement. The lone exception was the power of the low-frequency band, which came close to this threshold (Tables III and IV). The ICCs of the nonlinear measures, estimated under the short-term condition, also exhibited excellent agreement, except for measures associated with the axes of the ellipse of the Poincaré plots, which ranged from moderate to excellent.
The Bland-Altman plots, which assess the discrepancies between measurements from both devices for short-term and ultra-short-term measures, are depicted in Figs. 2 and 3, respectively. They visually represent the level of potential biases and the overall reliability of the two devices.
Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.

TABLE I COMPARISON OF THE SHORT-TERM HRV MEASURES OBTAINED BY BOTH DEVICES
The results of the Bland-Altman analysis are also detailed in Tables III and IV, respectively. The number of samples falling outside the a priori LOAs was considered small for all short-term and ultra-short-term measures, except for those related to the successive RR intervals that differed by more than 50 ms, which slightly exceeded 10%.
IV. DISCUSSION
This study investigated the reliability of a commonly used, affordable chest strap to estimate fluctuations in short-term and ultra-short-term HRV measures in response to emotionally valenced stimuli. We compared the signals recorded by the chest strap and a five-lead cardiac monitor, revealing a high correlation for most participants. Furthermore, there were no significant discrepancies between the HRV measures derived from both devices, regardless of whether the records were short-term or ultra-short-term. Importantly, all time-domain, frequency-domain, and nonlinear measures estimated under both short-term and ultra-short-term conditions demonstrated strong to excellent agreement. This evidence suggests that the low-cost chest strap can be a reliable alternative for estimating short-term and ultra-short-term time-domain, frequency-domain, and nonlinear HRV measures.
Our findings align with previous studies examining the reliability of low-cost chest straps compared to laboratory-grade systems under various conditions, including incremental-intensity exercise [19], [30], [31] and rest [18], [19]. Although each of these studies has used a different approach to assess the agreement between the chest strap and laboratory-grade systems, they have found excellent signal quality from the chest strap [31], and strong to excellent agreement for specific measures, like average NN interval (AVNN) [19] and HR [18], [19]. Our findings not only support these results but also extend the reported reliability of the chest strap in estimating the most frequently used time-domain, frequency-domain, and nonlinear HRV measures.
It is important to highlight that, despite these measures being frequently used in emotion recognition research [32], [33], [34], [35], [36], the reliability of the low-cost chest strap in estimating these measures had not been investigated until the present study. This unexplored validity is somewhat contradictory, given its repeated use as a standard in validating other low-cost devices, such as smartphones and smartwatches, that estimate HRV measurements [14], [37], [38], [39], [40], [41]. It is important to consider that the market for wearable devices is growing so rapidly that it challenges the ability of scientific assessments to keep pace [18].
The high cost of long-term assessments and the complexity associated with ambulatory monitoring have prompted the exploration of HRV responses in significantly shorter time spans, more accessible for research studies [42]. An increasing number of studies investigate the impact of emotional states on HRV measures taken in roughly 5-min intervals, referred to as short-term HRV measures [43], [44], or in less than 5-min intervals, referred to as ultra-short-term HRV measures [45], [46]. Our findings indicate that the low-cost chest strap can provide reliable estimations of all the measures investigated under both short-term and ultra-short-term conditions. However, the validity of ultra-short-term measures is still a matter of debate [42], [46], [47].
Certain limitations of the study should be taken into account when interpreting the results. First, although the sample size is greater than in similar studies [18], [19], [30], [31], it can still be considered limited. Second, the participants were primarily young adults. Thus, extrapolating the results to other demographic groups should be done with caution. Finally, since the investigated low-cost chest strap did not provide the ECG signal, the investigation of the processing algorithms used to calculate the RR intervals was not possible. Despite these limitations, the systematic investigation of time-domain, frequency-domain, and nonlinear measures in this study, in conjunction with the analysis of the time series recorded by the device, the absence of which has been acknowledged as a limitation of previous studies [18], supports the validity of our findings.

V. CONCLUSION
Our study demonstrates the reliability of the low-cost chest strap in estimating short-term and ultra-short-term HRV measures, even in response to emotionally charged stimuli. The strong correlation of the chest strap with a five-lead cardiac monitor and consistent agreement across various conditions support its validity. In conclusion, the chest strap proves to be a valuable tool for researchers studying HRV, particularly in emotion-related contexts.

Fig. 1. Time series captured by the Polar and Shimmer devices for participants exhibiting the lowest cross-correlation (above) and the highest cross-correlation (below).

Fig. 2. Bland-Altman plots of the short-term measures extracted from the 350-s time series obtained by both devices.

Fig. 3. Bland-Altman plots of the ultra-short-term measures extracted from the 70-s time series of each stimulus block obtained by both devices.

TABLE II COMPARISON OF ULTRA-SHORT-TERM HRV MEASURES OBTAINED BY BOTH DEVICES BY STIMULI

TABLE III ICC AND OUTCOMES OF THE BLAND-ALTMAN ANALYSIS OF THE SHORT-TERM HRV MEASURES OBTAINED BY BOTH DEVICES

TABLE IV ICC AND OUTCOMES OF THE BLAND-ALTMAN ANALYSIS OF THE ULTRA-SHORT-TERM HRV MEASURES OBTAINED BY BOTH DEVICES