Non-Contact Wi-Fi Sensing of Respiration Rate for Older Adults in Care: A Validity and Repeatability Study

In recent years, considerable effort has been directed towards non-contact Wi-Fi sensing applications such as fall detection and vital sign monitoring. For emerging technologies in healthcare, it is essential to assess the validity and repeatability of new measurement instruments before real-world implementation. However, the existing literature has not addressed the clinical validity and repeatability of respiration rate measurements obtained from Wi-Fi Channel State Information (CSI). This study draws on medical instrumentation statistics to address this research gap by investigating the validity and repeatability of Wi-Fi sensing in measuring respiration rate. For this purpose, we first implemented a non-contact Wi-Fi CSI respiration rate sensing system using off-the-shelf ESP32 devices and signal processing methods. We then evaluated the validity of the Wi-Fi sensor’s respiration rate measurements against the NUL-236 respiration belt as ground truth. The Bland-Altman method provided homoscedastic results across the standard range of respiration rates of older adults, [12, 28] BPM, with limits of agreement of [−1.29, 1.06] BPM for a 60-second sample duration, allowing us to analyze measurement repeatability at a single point. Hence, we assessed the measurement repeatability at 14 BPM using the spread of the data and the implications of random error in the measurements. The Wi-Fi CSI measurement dataset and the corresponding belt data for the validity and repeatability experiments have been made available. Appropriate measurement validity and repeatability metrics allow care professionals to make informed decisions about the acceptability and generality of non-contact Wi-Fi sensing systems in measuring respiration rate.


I. INTRODUCTION
In most countries, including the United Kingdom, medical and public health advancements have contributed to an increase in life expectancy and the quality of life over the past few decades [1], [2]. However, the population aged 65 and over suffers the highest morbidity and mortality rates due to geriatric disorders, such as illness and functional decline, as well as injury-related conditions [3]. In response to the ageing population's needs, there is an increasing demand for health services and monitoring solutions.
The associate editor coordinating the review of this manuscript and approving it for publication was Li Yang.
With the use of emerging technologies, we are able to move away from traditional hospital settings and provide patient-centric care. The ability to remotely monitor patients would have a positive impact on older people's quality of life, as they prefer to age in place and remain independent [4]. Additionally, the resulting continuity in patient records can provide higher resolution data in terms of health status, thereby helping to detect and predict health disorders [5]. It has been demonstrated that continuous monitoring of patient health status in hospital wards is more effective than manual assessments during nursing rounds in identifying deteriorating patients [6]. Incorporating sensing technology into everyday objects and the environment in order to monitor the health status of older individuals can alleviate the strain placed on the health system. A sustainable alternative to some aspects of traditional care could exist as a result of connected systems and the Internet of Things.
An individual's physiological parameters include body temperature, blood pressure, heart rate (HR) and respiration rate (RR), which provide a general picture of their health status [7]. An investigation has found that RR and HR changes are more reliable indicators of cardiopulmonary arrest than any other vital sign [8]. In addition, the changes in RR and HR correlate with illnesses such as sleep disorders, cardiovascular disease, neurodegenerative disease, fall risk and mental stress.
Traditionally, vital signs have been monitored using wearable sensors, but these are not convenient for long-term monitoring. The current gold standard device for RR monitoring is the capnograph, which uses a nasal probe or a respiratory mask on the patient [9]. RR can also be determined by monitoring thoracic exhalations using a sensing belt [10]. These solutions may be considered obtrusive and restrictive for older persons; contact-free vital sign monitoring solutions are therefore preferred for continuous long-term care.
The validity and repeatability of non-wearable sensors play a critical role in their long-term usability. If an instrument contains significant errors, it is unlikely to serve its purpose or provide accurate data for making important decisions. It is therefore imperative that health care professionals determine how much error is acceptable when choosing between an intrusive but accurate device and a non-intrusive but less accurate one, such that the error does not interfere with their care decision-making for older patients. Previous studies have evaluated the validity of wearable sensors for RR measurements [11], [12]; however, no research has yet been conducted to ascertain the validity and repeatability of Wi-Fi sensing as a method of RR measurement. Previous studies have focused on system implementation rather than on measurement assessment for clinical and care use [13], [14], [15]. This study addresses this research gap by developing methods to investigate the validity and repeatability of Wi-Fi sensing measurements for respiration rate estimation in older adults.
This study aims to contribute to the growing field of Wi-Fi sensing research by introducing an analytical framework and experimental measurement methodology for assessing the viability of Wi-Fi-based sensing as an instrument for respiratory monitoring in care. The main contributions of this study are as follows:
• The experimental design and evaluation of the validity of non-contact Wi-Fi Channel State Information (CSI) sensing using a low-cost ESP32 Microcontroller Unit (MCU). This was done for the resting RR range for older adults against a ground-truth respiration belt logger (NUL-236) using the Bland-Altman method. The validity of respiration rate measurements has been previously evaluated for wearable devices, but no work has addressed this for non-contact Wi-Fi sensing, which is vital for assessing its measurement robustness for clinical adaptation [11], [12].
• The development of an experimental evaluation technique to assess the repeatability of Wi-Fi sensing-based RR measurements. Since the Bland-Altman method produced homoscedastic results, the repeatability of measurements can be examined at a single point in the respiration range, here 14 BPM.
Although previous studies have addressed accuracy metrics [13], [14], [16], [17], there has been no detailed examination of the repeatability of RR measurements using Wi-Fi CSI sensing to date.
For the aforementioned experiments, a comprehensive dataset of Wi-Fi CSI measurements paired with the corresponding belt data was collected and made available on IEEE Dataport for research reproducibility purposes [18]. The accompanying signal processing code will be made available in the repository upon the completion of the project.

II. RELATED WORKS
A. UNOBTRUSIVE SENSING
The purpose of unobtrusive vital sign monitoring is to collect long-term data without encumbering users with wearables, by integrating sensors into everyday environments and objects [19]. Hence, vital signs can be measured continuously or over time without interfering with the patients' daily lives, which enables the detection of physiological anomalies and data-informed prediction of disorders [5].
In terms of unobtrusive sensing, we only discuss Radio Frequency (RF)-based sensing methods for RR for brevity. RF-based techniques consist primarily of radar and Wi-Fi sensing implementations.
The types of radars that are used for vital sign measurements are Doppler Continuous Wave (CW), Frequency Modulated Continuous Wave (FMCW), and Impulse Radio Ultra Wide Band (UWB-IR). CW radar sensing methods are dependent on cardiorespiratory displacement. They are based on the Doppler frequency shift incurred due to target movement between the transmitted and received signal of a radar transceiver [20]. Additionally, a Doppler-based sleep monitoring system was proposed and evaluated in [21] for sleep stage classification based on vital signs and on-bed movements.
Unlike CW radars, which measure only the Doppler frequency at the target, FMCW also measures the range using chirp signals. In [22], a low-power FMCW radar sweeping from 5.46 GHz to 7.25 GHz every 2.5 milliseconds was used to extract vitals through walls and in multi-person scenarios. FMCW has a lower resolution for relative motion than CW-Doppler. Hence, since CW and FMCW utilise the same hardware, [23], [24], [25] use a hybrid approach to achieve an absolute distance accuracy of less than 4 cm and millimeter-scale accuracy for relative motion at the 5.8 GHz ISM band.
Alternatively, the UWB-IR measures the target range by transmitting short pulses, computing the time delays in the received pulse amplitudes, and extracting vital signs using distance information. It has the advantage of having a smaller size and lower power consumption than the CW Doppler radar [26]. In [27], a method based on autocorrelation was used to extract RR and HR periodic waveforms, as well as subject location. Vital sign extraction in the presence of random body movement was studied in [28], using active motion cancellation by direct signal fusion from two RF sensors.
Although radar-based methods are effective and precise for Line-of-Sight (LoS) detection and even through walls [22], they require expensive customized hardware which prevents wide-scale deployment [13], [29].

B. WI-FI SENSING
Wi-Fi is the most widely adopted wireless access technology globally in terms of devices and infrastructure. This proliferation has been enabled by the widespread use of Wi-Fi chipsets in laptops and smartphones, and by the ease of configuration and low maintenance of Wi-Fi [30]. This has led to the ubiquity of Wi-Fi in homes, offices, and public environments. Furthermore, Wi-Fi's use of unlicensed spectrum bands has allowed a wide range of IoT devices and solutions to emerge unhindered [30].
Earlier Wi-Fi sensing studies used the Received Signal Strength Indicator (RSSI), which measures the total received power at the receiver. It provides coarse-grained information and is bounded by the sum of the power of each element of the CSI matrix. RSSI has been used for coarse gesture recognition [41] and for RR estimation [42]; the latter work also presents a module for sleep apnoea detection.
However, RSSI measurements fluctuate because they are sensitive to environmental noise [39]. The patient must be close to the LoS of the transceivers to achieve a good estimate, which limits vital sign monitoring in practical applications. Meanwhile, CSI allows for the examination of each subcarrier's amplitude and phase information separately, allowing for a finer-grained and wider sensing area [39].

1) WI-FI SENSING IN CARE
An activity and fall recognition system using CSI amplitude was proposed in [33], which differentiates between sitting, standing, walking and falling, and could be utilized in ambient assisted living as an initial phase to analyze the behaviour of older people. For older adults in independent living, approximately 50% of their falls occur at home; hence, RT-fall [34] implements real-time activity segmentation using CSI phase difference to detect fall events starting from standing or walking positions. Since gait is an effective biomarker in assessing functional decline, GaitWay [35] was designed to unobtrusively capture gait speeds while walking using CSI, extract gait features, as well as recognize the gait of different users. Furthermore, the CSI phase difference between antennas was used to detect nocturnal seizures in [43] to support patients with epilepsy and caregivers.

2) WI-FI SENSING FOR VITAL SIGN MONITORING
A model of respiration detection using Wi-Fi CSI was introduced in [29] leveraging the Fresnel Zone model and Wi-Fi radio propagation, which has informed the RR extraction performed in our work. Micro-movements can be extracted from Wi-Fi CSI signals, including those induced by respiratory and cardiac activities. Together with macro-movements such as falls and rollovers during sleep, they help provide more information about an individual's health status. For instance, Liu et al. [39] used CSI amplitude information to extract RR during sleep, as well as sleeping posture and rollover events during sleep. Multi-person respiration monitoring during sleep has been implemented in [44] on three persons, suggesting that a respiration state analysis would be necessary to map measurements to each target subject, assuming that each subject follows a different respiratory pattern. This has then been achieved in [45] by modeling CSI-based multi-person respiration sensing as a blind-source separation problem using multiple antennas. A sleep-stage recognition program was implemented for in-home sleep monitoring using respiratory data in [15]. In addition to RR, body movements during sleep were used in [14] for sleep monitoring using deep learning and prior knowledge of sleep medicine. Indeed, combining vitals with movement information enables advanced health analyses previously unavailable for unobtrusive modalities.
Beyond the vital signal extraction mechanisms, the effect of practical conditions on the quality of the extracted signal is a crucial domain to examine. For example, in [46], RR and HR were extracted during sleep while evaluating the effect of transmitter-to-receiver distance, sleeping posture, obstacles, and packet transmission rate. Similarly, in [47], the CSI phase difference between two antennas was exploited to track RR and HR, and the effects of Non-LoS tracking, transmitter-to-receiver distance, and packet transmission rate were analyzed. Furthermore, signal processing techniques can be exploited to improve the quality of the extracted vital signs. For instance, the CSI phase difference was used in [32] with directional antennas, where the most informative subcarriers were fused to obtain HR estimates to improve the signal quality. Expanding further on the aspect of signal fusion, the complementarity of the CSI phase and amplitude was exploited in work by Zeng et al. [48] to achieve full area coverage without leveraging multiple subcarriers. Published studies, however, are yet to address the robustness of measurements across RR ranges and different breathing depths.
VOLUME 12, 2024
Diverse RRs naturally result in inversely proportional breathing depths when measured on the same subject. This is attributed to fixed physiology and individual respiratory mechanics. In the context of respiratory motion, the anteroposterior displacement of the chest exhibits a range of 4.2 to 5.4 mm, whilst the mediolateral dimension demonstrates variability between 0.6 and 1.1 mm during conventional inhalation and exhalation procedures [13], [49]. The motion due to the chest displacement gives rise to variations in the dynamic path of the CSI. The ability of a tool to accurately measure respiration regardless of the rate and the corresponding depth is a mark of its universality and is crucial for the clinical setting. There are undoubtedly various concerns to address within Wi-Fi-based vital signs extraction; however, none of the previous works tried to address the validation of the non-contact instrument as a medical device, which we aim to consider in this work.

3) WI-FI CSI SENSING MEASUREMENT DEVICES AND TOOLS
Even though CSI has been included since IEEE 802.11n [50], access to CSI directly from Wi-Fi chipsets is limited to specific hardware and software tools. For example, the first CSI collection tool is the Linux 802.11n CSI tool [51], which is based on an Intel 5300 Network Interface Card (NIC). However, it only collects up to 30 subcarriers and requires firmware modifications [50]. On the other hand, the Atheros CSI tool [52] works with Atheros 802.11n NICs and obtains all 56 subcarriers for the 20 MHz bandwidth without tampering with the firmware. Nonetheless, the aforementioned NIC-based solutions do not support standalone operation and remain impractical for large-scale deployment [53].
The Nexmon CSI extractor utilizes the Broadcom chipset in the Nexus 5 Android smartphone to obtain CSI data from all 56 subcarriers in the 20 MHz bandwidth as a standalone solution [54]. The Nexmon-based solution requires modification and may interfere with the warranty of the device. Alternatively, the ESP32 CSI toolkit [55] and the Wi-ESP tool [53] are based on the ESP32 MCUs and exhibit the least hardware-software dependency [50], [53]. They provide a flexible, low-cost Wi-Fi sensing solution that enables large-scale deployment [56].
The ESP32 CSI sensing capabilities have been previously explored for applications such as crowd-counting and occupancy monitoring [57], [58], human presence and fall detection [56], as well as human activity recognition [55]. However, to the best of our knowledge, no study to date has implemented an ESP32-based respiratory rate measurement instrument or evaluated the acceptability of its measurements for use as a medical device. Thus, we aim to address this gap in research by developing a Wi-Fi CSI-based RR sensing system using commercial off-the-shelf (COTS) ESP32 MCUs and investigating its measurement validity and repeatability in the context of the care of older people.

C. VALIDATION OF NEW MEDICAL INSTRUMENTS
Medical laboratories are often required to assess the degree of agreement between two measurement techniques [59]. In order to validate a new technology for application in clinical medicine, it needs to be compared with older and more established methods [60]. We may wish to determine whether a new inexpensive and unobtrusive technique produces results that are comparable to a well-established method with sufficient agreement for clinical purposes [61]. The Bland and Altman method is essential for method comparison studies with the aim of validating new medical devices [61]. Using this approach, measurement instruments that capture continuous variables measuring the same construct can be assessed [62].
For the Bland and Altman method, the statistical limits of agreement between the two measurement methods are constructed based on the mean and standard deviation of the difference in measurements. The limits of agreement are defined as $[\bar{d} - 1.96s,\ \bar{d} + 1.96s]$, where $\bar{d}$ is the bias or mean difference, and $s$ is the standard deviation of the differences. Given enough samples, if the error distribution can be determined to be normal, 95% of the differences will lie between those limits of agreement [61]. Normality of the distribution of differences is a prerequisite for this analysis [61].
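The limits of agreement defined above can be sketched in a few lines of Python. This is a minimal illustration, not the analysis code used in this study: the function name `bland_altman_limits` and the paired readings are hypothetical, and the sample standard deviation (`ddof=1`) is an assumption about how $s$ is estimated.

```python
import numpy as np

def bland_altman_limits(method_a, method_b):
    """Return (bias, lower, upper) limits of agreement for paired readings."""
    a = np.asarray(method_a, dtype=float)
    b = np.asarray(method_b, dtype=float)
    diff = a - b                      # per-pair difference between methods
    bias = diff.mean()                # mean difference (d-bar)
    s = diff.std(ddof=1)              # sample standard deviation (assumed)
    return bias, bias - 1.96 * s, bias + 1.96 * s

# Hypothetical paired readings in BPM, for illustration only (not study data).
wifi_rr = [12.1, 14.0, 15.8, 18.2, 20.1, 23.9, 28.0]
belt_rr = [12.0, 14.0, 16.0, 18.0, 20.0, 24.0, 28.0]

bias, lower, upper = bland_altman_limits(wifi_rr, belt_rr)
print(f"bias = {bias:.2f} BPM, limits of agreement = [{lower:.2f}, {upper:.2f}] BPM")
```

The interval is symmetric about the bias by construction, so a reported bias should sit at the midpoint of the reported limits.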
Previous studies have evaluated the validity of wearable sensors for RR measurements [11], [12], where several devices were assessed for their validity, and the most reliable device was adopted for extended investigations. Furthermore, medical staff evaluated a non-intrusive manual device, such as a stethoscope, for its measurement validity during assessment [63]. Nevertheless, no work has been conducted to date that addresses the clinical validity of non-contact Wi-Fi sensing as an RR measurement device, which is the gap we aim to target in this study.

D. REPEATABILITY OF MEASUREMENT INSTRUMENTS
The importance of investigating measurement errors from random and non-random sources lies in determining the appropriateness of the measurement method and instrument for different contexts [64]. A crucial aspect of the usability and the long-term implementation of non-wearable sensors is the measurement repeatability of the instrument. An instrument riddled with large random errors is most likely not fit for its purpose, let alone a suitable basis for making important decisions. For instance, in real patient scenarios, the risk of obtaining an erroneous estimate of RR is high because it is related to the patient's health condition, and an instrument with reliable measurements is required [64].
The repeatability of the Wi-Fi sensors can be determined by measuring the spread of the data around the sample mean, calculating the standard deviation [61], [64], and obtaining the confidence intervals for repeated measurements. The more consistent the repeated measurement results are, the higher the repeatability of the measurement process. In a repeatability study, variations in measurements taken on the same subject can be attributed only to errors in the measurement process [64]. To quantify the repeatability of the measurements, the experimental conditions of the study must remain constant using the same measurement method [64].
Previous studies on non-contact RR sensing have assessed the repeatability of acoustic-based sensors [65] and polymer humidity sensors [66]. However, to the best of our knowledge, no study has examined the repeatability of non-contact Wi-Fi sensing for RR monitoring, which we address in this work.

A. HARDWARE DESCRIPTION
We used two ESP32-DevKitC-VE embedded devices in our work as Wi-Fi sensors. One ESP32 was programmed to act as the Access Point or Transmitter (TX), and the other was set as the Receiver (RX). The development kit supports the 802.11n protocol and allows access to CSI data without hardware tampering [53].
The data are transmitted and captured with the built-in omnidirectional PCB antenna in the development kit, where the transmission power is 20 dBm (100 mW) at 2.4 GHz, abiding by the IEEE and ETSI standards. The data were sent from the RX to a PC through a universal serial bus (USB) cable to a USB to universal asynchronous receiver-transmitter (UART) bridge with a maximum transmit rate of 3 Mbps.
We used an additional measurement device as a ground-truth signal for respiration: a scientific-grade Neulog Respiration Monitor Belt logger sensor (NUL-236). The belt logger is wrapped around the chest and measures the air pressure in the belt, which varies with the subject's breathing.
To minimize human error while maintaining a constant RR throughout the experiment, a metronome application was used as a guide for respiratory movements. The metronome guided the participant to inhale and exhale with alternate beats, where the beat rate of the metronome was set to double the intended RR.

B. DATA ACQUISITION
1) SOFTWARE TOOLS AND SETTINGS
We used the esp32-CSI-tool to obtain CSI data using the IEEE 802.11n 2.4 GHz Wi-Fi communication standard [55]. The USB baud rate was set to 1843200 bits per second, and the wireless packets were transmitted at 120 packets/s (PPS). Subsequently, the Wi-Fi CSI data were collected by a macOS laptop, time-stamped with UNIX epoch time, and saved in a .CSV file format. We processed the saved complex CSI data once the data for the experiment were acquired.
A software application is provided as part of the NUL-236 respiration belt logger, which facilitates the visualization and data collection of the respiration waveform. There is no standard unit of measurement for the waveform data obtained from the sensor, and it can be rescaled. Samples captured by the NUL-236 were labeled against time and saved in a .CSV format, with a sampling rate of 100 samples/s.

2) EXPERIMENTAL DESIGN
Essentially, an experiment consists of a series of measurements aimed at testing the relationships between several variables. With respect to our particular study, we aimed to investigate the relationship quality between Wi-Fi CSI and the micro-motion of the chest and abdomen as a result of breathing. The validity of a measurement device is its ability to demonstrate that the experimental process successfully measures the quantity with little to no systematic error. Furthermore, a reliable instrument must minimize random error in its measurements by providing consistent results of repeated readings.
We designed an experimental procedure to measure the validity and repeatability of RR measurements using the Wi-Fi CSI amplitude. A test space of 3 m × 3 m was set up in a testing environment closely replicating a standard care living room setting for monitoring an individual older person, where such a device would be most beneficial. The TX and the RX were placed 3 m apart at a height of 0.85 m perpendicular to the ground, with the LoS of the TX-RX pair crossing the middle of the test space. The participant was seated approximately 0.9 m away from the middle of the LoS of the TX-RX pair. The labeled setup is shown in Fig. 1, resembling the setup illustrated in Fig. 2. The TX and RX were carefully placed in the test space based on the study's requirements. The test subject was advised to remain stationary during the testing period to control the variable of motion and isolate breathing chest movements from the effect of motion artifacts for the purpose of the datasets.
In this study, we collected two datasets, one for validity and one for repeatability [18]. To test the validity of the Wi-Fi CSI RR sensing system, we performed the experiment 17 times with RRs ranging from 12 to 28 breaths per minute (BPM). Although our system captures RRs in the range [9, 37] BPM expected from humans, [12, 28] BPM is considered the expected RR range during rest for older adults, including tachypnea, as described in [67]; hence the choice of RR range in this study. The duration of each data capture experiment was 120 seconds. Breathing slower than 12 BPM is indicative of bradypnea, while breathing faster than 24 BPM is indicative of tachypnea. Sample durations of 30, 60 and 120 seconds were assessed to evaluate the effect of window width. This was done similarly to the work in [68], where 30 seconds was considered common in clinical practice, 60 seconds the ideal counting duration, and 120 seconds a larger sample. For repeatability, we evaluated the consistency of our measurements through experiment repetition with the RR set to 14 BPM. The experiment was repeated 30 times with all factors controlled for, as n = 30 satisfies the Large Enough Sample Condition. The accuracy of the instrument was also assessed based on the repeatability experiment data, which is another form of validity.

C. SIGNAL PRE-PROCESSING AND RR EXTRACTION
Python 3 was used to implement the pre-processing and RR extraction from the raw CSI data in this study. The signal processing workflow is illustrated in Fig. 3. First, we obtained the CSI amplitude data from the complex CSI, as shown in Fig. 3 (a), after extracting them from the timestamped .CSV file. Time indexing is essential in RR tracking applications. Unfortunately, due to packet loss, transmission delays, and other processing delays, the received packets are not evenly distributed over time. Hence, we interpolate and downsample the signal from 120 PPS to a rate of 40 PPS, using the Fourier method, as shown in Fig. 3 (b). Resampling and interpolation help in outlier removal by reducing the spurious effects occurring from hardware-introduced errors. They also space the incoming signal evenly over time and reduce the computational complexity, preparing it for the Discrete Wavelet Transform (DWT).
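A minimal sketch of the amplitude extraction and resampling step described above is given below. It is illustrative rather than the released processing code: the helper names are hypothetical, linear interpolation via `np.interp` is an assumed choice for placing unevenly timed packets on a uniform 120 PPS grid, and `scipy.signal.resample` is used as one FFT-based ("Fourier method") resampler. The input is synthetic.

```python
import numpy as np
from scipy.signal import resample

def csi_amplitude(csi_complex):
    """Amplitude of complex CSI, shape (n_packets, n_subcarriers)."""
    return np.abs(csi_complex)

def regrid_and_downsample(t, amplitude, fs_in=120, fs_out=40):
    """Interpolate unevenly timed packets to fs_in, then FFT-resample to fs_out."""
    t = np.asarray(t, dtype=float)
    amplitude = np.asarray(amplitude, dtype=float)
    n_in = int(round((t[-1] - t[0]) * fs_in))
    t_uniform = t[0] + np.arange(n_in) / fs_in
    # Assumed: linear interpolation per subcarrier onto the uniform grid.
    uniform = np.column_stack(
        [np.interp(t_uniform, t, amplitude[:, k]) for k in range(amplitude.shape[1])]
    )
    n_out = int(round(n_in * fs_out / fs_in))
    return resample(uniform, n_out, axis=0)  # Fourier-domain resampling

# Synthetic 10 s capture with timing jitter and about 5% packet loss.
rng = np.random.default_rng(0)
t = np.arange(1201) / 120.0
t[1:-1] += rng.uniform(-1e-3, 1e-3, size=t.size - 2)
keep = rng.random(t.size) > 0.05
keep[[0, -1]] = True
t = t[keep]
envelope = 1.0 + 0.1 * np.sin(2 * np.pi * 0.25 * t)
csi = envelope[:, None] * np.exp(1j * rng.uniform(0, 2 * np.pi, size=(t.size, 4)))

out = regrid_and_downsample(t, csi_amplitude(csi))
print(out.shape)
```

Interpolating before the Fourier resampling matters because FFT-based resampling assumes uniformly spaced samples.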
We use a DWT-based filtering technique rather than Fourier-based finite impulse response filters, as the latter would require additional signal conditioning. Signal conditioning techniques such as the Hampel filter [69], Savitzky-Golay filter [44], and median or mean filters are used to remove the noise. To prevent hardware or environmental noise from interfering with the performance of the Fourier-based filter, it is necessary to complete this step before implementing the filter. However, this signal conditioning may distort the signal [70]. On the other hand, this conditioning is not required before applying the DWT, which therefore introduces fewer distortions to the signal [70]. Furthermore, wavelet analysis is suited to the time series of Wi-Fi sensor data because it handles data that are non-stationary in nature, and it preserves sharp transitions in the signal better than other types of filters [71].
The down-sampled CSI data are transformed to the wavelet domain using the DWT with a 'db4' wavelet, which has been reported as the most appropriate wavelet for extracting RR signals, further reducing the effect of outliers [72]. We apply a 7-level decomposition and maintain the sixth and seventh detail coefficients while nullifying the approximation coefficients and the lower-level detail coefficients. This wavelet filtering technique reconstructs only frequencies in [0.15625, 0.625] Hz, corresponding to [9.375, 37.5] BPM. This range includes the typical RR for older adults of [12, 28] BPM, which we used to evaluate the sensing system. The reconstructed signal containing the frequencies of interest is shown in Fig. 4 (c).
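The stated passband follows from the dyadic structure of the DWT: at a sampling rate f_s, detail level j nominally covers [f_s / 2^(j+1), f_s / 2^j]. The short sketch below works through that arithmetic for the 40 PPS signal; the function name `detail_band_hz` is hypothetical, and the bands are idealized (real wavelet filters such as 'db4' have some leakage across band edges).

```python
def detail_band_hz(fs_hz, level):
    """Nominal (low, high) passband in Hz of DWT detail level `level`."""
    return fs_hz / 2 ** (level + 1), fs_hz / 2 ** level

fs = 40.0  # sampling rate after downsampling, in samples per second

low7, high7 = detail_band_hz(fs, 7)   # seventh detail level
low6, high6 = detail_band_hz(fs, 6)   # sixth detail level

# Keeping levels 6 and 7 yields one contiguous band from low7 to high6.
band_hz = (low7, high6)
band_bpm = tuple(60.0 * f for f in band_hz)

print("level 7:", (low7, high7), "Hz")
print("level 6:", (low6, high6), "Hz")
print("kept band:", band_hz, "Hz =", band_bpm, "BPM")
```

The two retained levels are adjacent, so together they span 0.15625 to 0.625 Hz, which is 9.375 to 37.5 BPM.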
Principal Component Analysis (PCA) helps separate respiratory body movements from noise, as movement causes correlated effects across subcarriers. Subcarriers experiencing the most variance due to movement are considered the most sensitive to movement; hence, the variance is preserved using PCA [73]. Principal Components (PCs) capture the primary features of respiration movement data, suppress noise, and reduce dimensionality [69]. Furthermore, since they preserve only the correlated data due to variations in the dynamic path of the CSI, they ensure the generality of our system in measuring RR independent of the shape and size of the subject. Using this method ensures that we can recover CSI change patterns independent of phase offset potentially introduced by hardware and software errors. The first PC captures highly correlated noise due to hardware imperfections; therefore, we used the second PC because it contains more of the respiration waveform without the noise corresponding to the internal state changes in the hardware [74], [75].
The combination of the DWT filter and PCA ensures that everyday movements such as walking, tremors, and restless leg syndrome are not picked up by our system, as they are either not sufficiently periodic or lie outside of the frequency range. The extracted respiration signal, in comparison to the respiration belt, can be seen in Fig. 4 (d) and Fig. 4 (e). To extract the final RR estimate, we obtained the peak of the power spectral density of the second PC, as demonstrated by Fig. 4 (f). The data pre-processing and RR extraction are illustrated in Fig. 4 for 20 BPM and a 120-second analysis window. A zoomed-in comparison between the respiration obtained from Wi-Fi CSI and the belt data is displayed in Fig. 5, where we can see that the peaks from both modalities coincide.
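The PCA and spectral-peak steps can be sketched as follows. This is an assumption-laden illustration, not the study's code: PCA is computed with a plain SVD, the power spectral density is a simple periodogram, the helper names are hypothetical, and the input is synthetic, constructed so that a strong component shared by all subcarriers dominates the first PC while a 20 BPM breathing pattern dominates the second.

```python
import numpy as np

def second_pc(csi):
    """Project (n_samples, n_subcarriers) data onto its second principal axis."""
    x = csi - csi.mean(axis=0)
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    return x @ vt[1]

def rr_from_psd(signal, fs_hz, band_bpm=(9.375, 37.5)):
    """Return the frequency (in BPM) of the periodogram peak inside band_bpm."""
    n = signal.size
    freqs_bpm = np.fft.rfftfreq(n, d=1.0 / fs_hz) * 60.0
    psd = np.abs(np.fft.rfft(signal - signal.mean())) ** 2 / (fs_hz * n)
    in_band = (freqs_bpm >= band_bpm[0]) & (freqs_bpm <= band_bpm[1])
    return freqs_bpm[in_band][np.argmax(psd[in_band])]

# Synthetic 120 s recording at 40 samples/s over 8 subcarriers (illustrative).
fs = 40.0
t = np.arange(int(120 * fs)) / fs
rng = np.random.default_rng(1)
hardware = 3.0 * rng.standard_normal(t.size)       # strong common-mode noise
breath = np.sin(2 * np.pi * (20.0 / 60.0) * t)     # 20 BPM breathing
w1 = np.ones(8) / np.sqrt(8)                       # common-mode pattern
w2 = np.tile([1.0, -1.0], 4) / np.sqrt(8)          # orthogonal breathing pattern
csi = np.outer(hardware, w1) + np.outer(breath, w2)
csi += 0.05 * rng.standard_normal(csi.shape)

rr = rr_from_psd(second_pc(csi), fs)
print(f"estimated RR = {rr:.1f} BPM")
```

Restricting the peak search to the wavelet passband keeps out-of-band energy from being reported as a respiration rate.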

A. VALIDITY
1) AGREEMENT: A METHOD-COMPARISON STUDY
In Fig. 6(a), 95% of the differences in measurement between the Wi-Fi sensor and the respiration belt for a 30-second sample duration lie within limits of agreement of [−6.05, 4.66] BPM, with a bias of −0.70 BPM between the two instruments. In Fig. 6(b), 95% of the differences for a 60-second sample duration lie within limits of agreement of [−1.29, 1.06] BPM, with a bias of −0.11 BPM. Finally, for the 120-second sample duration in Fig. 6(c), 95% of the differences lie within limits of agreement of [−0.27, 0.21] BPM, with a bias of −0.03 BPM.
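As a consistency check on the reported figures, the sketch below recovers the midpoint and the implied standard deviation of the differences from each pair of limits. It assumes the limits were computed as bias ± 1.96 s, as in the Bland and Altman definition; the implied s values are derived from the rounded limits and are not separately reported results.

```python
# (bias, lower limit, upper limit) in BPM, as reported per sample duration.
reported = {
    30: (-0.70, -6.05, 4.66),
    60: (-0.11, -1.29, 1.06),
    120: (-0.03, -0.27, 0.21),
}

implied = {}
for window_s, (bias, lower, upper) in reported.items():
    midpoint = (lower + upper) / 2           # should match the bias
    s = (upper - lower) / (2 * 1.96)         # implied SD of the differences
    implied[window_s] = (midpoint, s)
    print(f"{window_s:>3} s: midpoint = {midpoint:+.3f} BPM, implied s = {s:.3f} BPM")
```

In each case the midpoint matches the reported bias to within rounding, and the implied spread shrinks as the sample duration grows.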
In addition, we find from the Bland-Altman plot that there is no proportional bias; therefore, the scatter of the plot is homoscedastic. Homoscedasticity was observed because the bias did not vary with increasing mean difference values, nor did the scatter change in variance with the mean difference values. Consequently, we can apply absolute statistics to obtain the instrument's repeatability from a single point along the expected respiratory scale for older adults of [12, 28] BPM. Furthermore, since the RR value obtained is consistent with the behaviour predicted by theory and that measured by the respiration belt, this agreement demonstrates construct validity [62]. The Bland-Altman plots and calculations in this study were obtained using the Pingouin package in Python 3, which is based on the Pandas and NumPy libraries, specifically the pingouin.plot_blandaltman() function [76].

2) ACCURACY
Accuracy metrics supporting the validity of the Wi-Fi sensor for RR measurement in older adults are presented in this section. Table 1 presents the accuracy and error metrics for each sample duration. The accuracy results were obtained from the data set of 30 repeated experiments at an RR of 14 BPM, since this data set is larger. A 120-second sample duration results in a smaller Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) than a 60-second or 30-second sample duration. With the inclusion of more data points in the analysis window, the accuracy of the measurements increased. The error cumulative distribution function (CDF) is calculated using the Wi-Fi data as the observation and the belt data as ground truth, while the error is smoothed with a Gaussian filter with σ = 1. In the graph of the error CDF in Fig. 7, we can see that approximately 80%, 72% and 68% of the errors are below 1 BPM for the 120-second, 60-second, and 30-second durations, respectively. This result is in line with our expectation, since a longer sample duration captures smaller magnitudes than shorter sample durations. By providing the accuracy and error metrics per sample duration for the Wi-Fi sensor, we can assess the validity of the measurements.
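The error metrics named above can be sketched as follows. This is an illustrative example with made-up readings, not the study's data or code; the helper names `accuracy_metrics` and `fraction_below` are assumptions, and the Gaussian smoothing of the error mentioned in the text is omitted for brevity.

```python
import numpy as np

def accuracy_metrics(estimate, truth):
    """Return (MAE, RMSE) of an estimate against ground truth, in BPM."""
    err = np.asarray(estimate, dtype=float) - np.asarray(truth, dtype=float)
    mae = float(np.abs(err).mean())
    rmse = float(np.sqrt((err ** 2).mean()))
    return mae, rmse

def fraction_below(estimate, truth, threshold_bpm=1.0):
    """Empirical error CDF evaluated at one threshold: share of |error| < threshold."""
    abs_err = np.abs(np.asarray(estimate, dtype=float) - np.asarray(truth, dtype=float))
    return float((abs_err < threshold_bpm).mean())

# Hypothetical repeated measurements at a 14 BPM target (not study data).
wifi = [14.0, 13.5, 14.5, 15.5, 14.0]
belt = [14.0, 14.0, 14.0, 14.0, 14.0]
mae, rmse = accuracy_metrics(wifi, belt)
share_under_1_bpm = fraction_below(wifi, belt, threshold_bpm=1.0)
```

Evaluating `fraction_below` over a grid of thresholds would trace out a full empirical error CDF of the kind shown in Fig. 7.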

B. REPEATABILITY
The repeatability of Wi-Fi as an RR measurement instrument was evaluated by making 30 measurements at an RR of 14 BPM, each with a duration of 120 seconds. The choice of 14 BPM was subjective, based on the participant's most comfortable breathing rate. Given the homoscedasticity of the previously obtained Bland-Altman plot, it is appropriate to select a single RR point for analysis. Hence, any point within the range of [12, 28] BPM is suitable for analyzing the repeatability statistics.
Table 2 displays the repeatability results for sample durations of 30, 60 and 120 seconds, calculated using the standard deviation of the Wi-Fi CSI RR measurements and the associated confidence interval at a 95% confidence level. We can note that the confidence interval width for the 60-second sample duration is double that of the 120-second sample duration. These results are expected because the resolution of the Fourier Transform becomes finer as the duration of the sample window increases, where $\frac{1}{T_s} \times 60$ is the resolution in BPM [22]. Although the 30-second sample duration obtains a smaller interval width than the 60-second duration, its confidence interval does not contain the expected RR value of 14 BPM; hence, the sample mean differs from 14 BPM at the 0.05 significance level. We conclude that a longer sample duration yields more repeatable results, as reflected in a smaller standard deviation and narrower confidence intervals.
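The two ideas in this paragraph, the window-dependent frequency resolution and the spread of repeated measurements, can be sketched as below. This is a hedged illustration: the function names are hypothetical, the readings are invented, and the confidence interval uses a normal approximation, which is an assumption because the text does not state which distribution was used.

```python
import math
from statistics import NormalDist, mean, stdev

def fft_resolution_bpm(duration_s):
    """Frequency-bin spacing of a Fourier transform over duration_s seconds, in BPM."""
    return 60.0 / duration_s

def repeatability(measurements, confidence=0.95):
    """Return (mean, sample SD, confidence interval of the mean).

    Uses a normal approximation for the interval (an assumption).
    """
    n = len(measurements)
    m = mean(measurements)
    sd = stdev(measurements)               # sample standard deviation (n - 1)
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    half_width = z * sd / math.sqrt(n)
    return m, sd, (m - half_width, m + half_width)

# Resolution for the three sample durations discussed in the text.
resolutions = {T: fft_resolution_bpm(T) for T in (30, 60, 120)}

# Hypothetical repeated RR readings at a 14 BPM target (not study data).
readings = [14.0, 14.5, 13.5, 14.0, 14.0]
m, sd, (ci_low, ci_high) = repeatability(readings)
contains_target = ci_low <= 14.0 <= ci_high
```

The resolution halves each time the window doubles (2, 1 and 0.5 BPM for 30, 60 and 120 seconds), and checking whether the interval contains the target rate mirrors the reasoning applied to the 30-second result.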

V. DISCUSSION
In general, an experiment is valid if it measures the quantities it intends to measure. In previous studies, quality evaluations of Wi-Fi respiration sensors have mostly focused on comparing them to a ground-truth device and evaluating the correlation [77], [78]. However, correlation is limited to investigating the strength of the linear relationship between two variables. Correlation is not regarded as a measure of agreement; it is a measure of association [61] and cannot be used to evaluate the interchangeability and validity of the device, which is necessary for clinical evaluation.
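A small numerical illustration of this distinction, using invented values: a hypothetical device that always reads 5 BPM higher than the reference is perfectly correlated with it, yet the two clearly do not agree.

```python
import numpy as np

# Hypothetical reference readings in BPM and a device with a constant +5 BPM offset.
reference = np.array([12.0, 16.0, 20.0, 24.0, 28.0])
device = reference + 5.0

# Pearson correlation measures linear association only.
r = float(np.corrcoef(reference, device)[0, 1])

# The mean difference (bias) exposes the disagreement that correlation hides.
bias = float((device - reference).mean())
```

Here `r` is 1 up to floating-point rounding, while `bias` is 5 BPM, which is why a method-comparison analysis is needed in addition to correlation.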
Our objective is to determine whether the RF sensor and the respiration belt can be used interchangeably if the readings of the two devices agree within acceptable limits. This type of comparison is frequently conducted for medical instruments when a new measurement method is less precise but less invasive or more affordable than the ground-truth or gold standard [61]. This is the first work of its kind to determine the limits of agreement between a Wi-Fi sensor and a respiration belt to assess how the two devices agree on measurements. To provide markers for evaluating the suitability and the generality of implementing Wi-Fi sensing in the context of health care, experiments were conducted for the normal RR range of older adults.
We apply the Bland-Altman method to sample durations of 30, 60 and 120 seconds to examine the effect of window width on the validity of the Wi-Fi sensor. It is evident from Fig. 6 that the limits of agreement, as well as the bias, decrease as the sample duration of the time window increases, indicating an improvement in validity and hence the reliability of the Wi-Fi sensor's measurements. An acceptable range for the limits of agreement must be determined a priori by the clinical or care staff before implementation, which could depend on patient risk and health conditions. Typically, inter-observer variability of respiration in a clinical setting may account for a difference of 2-6 BPM [68].
For the set of measurements taken in a lab setting, the performance of the Wi-Fi sensor with the 30-second time window was on par with the wearable and contact sensors discussed in [11] and [12], and it was better when 60- and 120-second windows were used. In [11], the narrowest limits of agreement obtained are [−5.6, 6.4] BPM with a bias of 0.4 BPM, using a mattress-embedded sensor against thoracic impedance pneumography. Meanwhile, in [12] the best agreement was obtained using a chest band sensor, with limits of [−9.99, 6.8] BPM and a bias of −1.60 BPM against a cardiac test face mask. In comparison, we obtain limits of agreement of [−1.29, 1.06] BPM with a bias of −0.11 BPM for the Wi-Fi sensing system against the NUL-236 respiratory belt using a 1-minute analysis window. A summary of the comparison of our results against some of the best-performing devices mentioned in [11] and [12] is listed in Table 3. While our study's findings are confined to controlled laboratory conditions, they exhibit significant potential.
The Bland-Altman method obtains the limits of agreement, but it cannot determine whether these limits are acceptable. The acceptability of the limits of agreement between the two devices must be defined a priori by a clinician or a care professional, with the health risk of older patients in mind. For instance, the limits of acceptability can be predefined as ±3 BPM, as in [11]. If the limits of agreement are found to be clinically insignificant, we may say that the two devices are interchangeable [59]. Interchangeability demonstrates the instrument's validity and acceptability according to predefined criteria. Although the results of this study cannot evaluate device interchangeability, the validity and agreement are assessed for the standard RR range of older adults, providing an appropriate analysis of the use of Wi-Fi as a medical device for RR measurement in older people.
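To make the comparison against a predefined criterion concrete, the sketch below checks whether a pair of limits of agreement lies inside a symmetric acceptability band. It is only an arithmetic illustration: the function name `within_acceptable_limits` is hypothetical, the ±3 BPM threshold is the example value cited from [11], and the outcome is not a clinical determination, which remains a decision for clinical or care staff.

```python
def within_acceptable_limits(lower_loa, upper_loa, acceptable_bpm=3.0):
    """True if both limits of agreement fall inside [-acceptable_bpm, +acceptable_bpm]."""
    return -acceptable_bpm <= lower_loa and upper_loa <= acceptable_bpm

# Limits of agreement reported in this study for each sample duration (seconds).
reported_limits = {
    30: (-6.05, 4.66),
    60: (-1.29, 1.06),
    120: (-0.27, 0.21),
}

# Example criterion of ±3 BPM; a different care setting may require another value.
checks = {
    duration: within_acceptable_limits(lo, hi, acceptable_bpm=3.0)
    for duration, (lo, hi) in reported_limits.items()
}
```

Under this example threshold the 30-second limits fall outside the band, while the 60- and 120-second limits fall inside it.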
The second form of validity concerns the accuracy of the device. We applied the accuracy and error metrics to the repeated RR values of the experiments. These metrics evaluate the closeness of the measured value to the ground-truth value and hence can be mostly attributed to systematic errors. Accuracy was also evaluated for varying window widths and showed improved metrics with increasing sample duration. Presenting accuracy metrics is essential, as systematic error tolerance must be determined before implementing the Wi-Fi sensor for RR estimation, and the sensor must be calibrated to an acceptable degree fit for use in the care of older persons per patient risk and health condition.
Since the Bland-Altman plot is homoscedastic, the repeatability of the measurements can be determined using absolute statistics. The standard deviation and confidence intervals characterize the spread of the measurements around the mean value and the uncertainty in the Wi-Fi sensing device. As expected, the uncertainty around the mean value decreases with increasing sample duration, as does the confidence interval width. The RR inversely influences the breathing depth, and due to the homoscedasticity of the plot, the model is generalizable across the RRs in the range and their corresponding breathing depths. The precision of the Wi-Fi sensor for RR estimation informs clinicians and care professionals about the degree of random error present in the sensor. Prior to implementing a monitoring system, a random error tolerance assessment must similarly be conducted for repeatability on a wider participant pool, because random error can affect important healthcare decisions.

A. CLINICAL IMPACT
Nurses usually assess vital signs manually during ward rounds, a situation in which the monitoring frequency is low and adverse events are often missed [79]. Manual counting methods suffer from high inter-observer variability. When two simultaneous observers measured the RR, they obtained considerably wide limits of agreement of [−4.2, 4.4] BPM [80]. By contrast, continuous or automated monitoring devices would help capture adverse events in patients more effectively. Rubio et al. [12] presented a comparison of four wearable devices worn simultaneously against a ground truth; however, ill patients found the sensors to be intrusive, which would affect patient adherence. An unobtrusive method such as Wi-Fi sensing provides a more acceptable alternative for older patients.
This method-comparison validation study, performed with a Wi-Fi sensor against an RR belt, offers an evaluation and interpretation of the instrument agreement. Furthermore, the use of correct statistical methods to evaluate the accuracy of a measurement device will provide the end-user with a better understanding of the implications of adopting a new measurement methodology. In this case, the Bland-Altman method is discussed in the medical statistics and instrumentation literature as a metric for validity and interchangeability. Additionally, this study was one of the first to assess the clinical acceptability of using Wi-Fi sensing as a non-contact tool to measure RR in the context of care of older adults.

VI. CONCLUSION
This study aimed to conduct the first investigation of the validity and repeatability of Wi-Fi Channel State Information (CSI) sensing as a medical device for respiratory rate measurement in the context of caring for older adults. As a first step, we validated the performance of the ESP32 Wi-Fi sensor against the NUL-236 respiration belt logger as a ground-truth device within the typical respiratory range for older individuals, from 12 to 28 breaths per minute, using the Bland-Altman method, thus confirming the generalizability of the model across the respiratory range. Furthermore, as the validity results are homoscedastic in nature, we can evaluate the repeatability of the measurements at a single point. These repeated measurements were also used to measure the precision and accuracy of the Wi-Fi sensor against the ground-truth respiration belt to determine the effects of random and systematic errors. The dataset of Wi-Fi CSI measurements, along with the corresponding belt data, was collected and made available for the validity and repeatability experiments.
The interchangeability of a medical device depends on its acceptance by clinical or care staff. Providing an appropriate appraisal of a measurement device would support professionals in adapting and deploying non-contact Wi-Fi sensing in older patients in care. This study addresses these points by providing validity and repeatability assessments to facilitate the interchangeability of Wi-Fi CSI sensing as a medical respiratory rate device for older adults. As this study was conducted in a controlled laboratory environment, data collection was limited to an independent living scenario with one quasi-stationary subject. Further investigations should be conducted to include a longitudinal multi-participant study informed by this work to better understand the interchangeability between Wi-Fi CSI respiration sensing and ground-truth devices. Future work will address different multi-sensor placements to explore optimal sensor locations for data fusion in the context of care of older adults, as well as abnormal respiration pattern detection during sleep to monitor health conditions and pathologies.

FIGURE 2. Independent living scenario: Seated in a living area.

FIGURE 4. Signal processing results for 20 BPM and 120 seconds sampling time.

FIGURE 5. Comparing the respiratory waveform obtained using Wi-Fi CSI and the respiratory belt for 20 BPM between [60, 80] seconds.

FIGURE 6. Validity: Bland and Altman plots for the Wi-Fi sensor and the respiration belt.
FIGURE 7. Error CDF for different sampling durations.