Real-World Gait Bout Detection Using a Wrist Sensor: An Unsupervised Real-Life Validation

Gait bouts (GB), as a prominent indication of physical activity, contain valuable fundamental information closely associated with human health status. Therefore, objective assessment of GB (e.g. detection, spatio-temporal analysis) during daily life is very important. A feasible and effective way of detecting GB in real-world situations is to use a wrist-mounted inertial measurement unit. However, the high degree of freedom of wrist movements in daily-life situations poses serious challenges for precise and robust automatic detection. In this study, we address these challenges and propose an accurate algorithm to detect GB using a wrist-mounted accelerometer. Features derived from biomechanical criteria (intensity, periodicity, posture, and other non-gait dynamicity), along with a Bayes estimator followed by two physically-meaningful post-classification procedures, are devised to optimize the performance. The proposed method has been validated against a shank-based reference algorithm on two datasets (29 young and 37 elderly healthy people). The method achieved a high median [interquartile range] of 90.2 [80.4, 94.6] (%), 97.2 [95.8, 98.4] (%), 96.6 [94.4, 97.8] (%), 80.0 [65.1, 85.9] (%) and 82.6 [72.6, 88.5] (%) for the sensitivity, specificity, accuracy, precision, and F1-score of the detection of GB, respectively. Moreover, a high correlation ( $R^{2}= 0.95$ ) was observed between the proposed method and the reference for the total duration of GB detected for each subject. The method has also been implemented in real time on a low-power prototype.


I. INTRODUCTION
Physical Activity (PA) is one of the fundamental aspects of daily life, closely associated with well-being and recognized as a Leading Health Indicator of populations [Healthy people 2020, https://www.healthypeople.gov/]. The World Health Organization (WHO) has reported a strong connection between PA and risk of falling, cognitive function, muscular fitness, and functional health level of elderly people [1]. PA becomes even more important when the increasing trend of aging populations (from 524 million people in 2010 to 1.5 billion in 2050 according to WHO [2]) is considered. PA is a crucial component in healthy aging [2] and is a major factor for the prevention of chronic non-communicable diseases such as diabetes, hypertension, cardiovascular diseases, depression, obesity and some types of cancer, which cause over 60 (%) of global deaths [3]-[5].

The associate editor coordinating the review of this manuscript and approving it for publication was Anubha Gupta.
Amongst different types of PA, gait (e.g. walking and running) is one of the most important and effective ones. Objective assessment of gait can provide useful and valuable information about the physical functioning of people. Advances in wearable technologies have led to the development of Inertial Measurement Unit (IMU)-based PA monitoring systems using various configurations, for example attached to the lower limbs [6]-[10], on the upper body [11]-[23] or a combination of on-body sensor locations [24]-[35]. Sometimes, the IMU in a smartphone has been employed, where the phone has been fixed to different parts of the body [36]-[41]. While these systems allow detection of gait outside the laboratory, they suffer from several drawbacks. Wearing multiple sensors (e.g. on foot, shank, thigh, hip, or chest) may be cumbersome, uncomfortable and awkward in daily-life situations, especially when long-term measurements are targeted. Fixation and alignment of the sensors on body segments, to guarantee fixed orientation/location during the whole measurement, may require the intervention of an expert as well as additional tests, which affect the usability of the system or can easily disturb the wearer and modify his/her normal daily activities. Moreover, the power consumption of such systems may be high due to the use of multiple sensors and/or modalities (e.g. gyroscope), which limits the duration of continuous measurements. As another issue, multiple-sensor-based algorithms are generally not devised for real-time data analysis, which is relevant for generating real-time feedback to the user, due to their high complexity. Therefore, for most of the existing systems, the recorded raw data must be later transferred to a computer for an offline analysis.

VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Considering the above limitations, an alternative for PA monitoring, particularly gait, is to use a single IMU mounted on the wrist [42]- [53]. It offers comfort, high usability and discreet monitoring (e.g. integrating inside a wristwatch) thus increasing user compliance. Therefore, wrist-worn PA trackers have experienced a significant growth in two main directions, i.e., as consumer-grade and as research-grade, each group featuring specific advantages and limitations [54]. Technological advances as well as increasing demand and interest for long-term monitoring of PA lead to the emergence of a high variety of consumer-oriented activity trackers that nowadays gain popularity due to their low cost and accessibility on the market. Although they appear useful to promote a more active lifestyle, there is also growing consensus about the limitation of these devices in healthcare research settings since their reliability and validity have seldom been assessed (only 5% are validated yet). Several studies showed a significant drop of the performance (up to 50% of error) of such commercial products under different conditions [55], [56]. On the other hand, the research-grade PA trackers are consistently reported to be more accurate than consumer-grade ones, where some of them provide the access to the raw IMU data (e.g. 3D acceleration). However, the analysis software of such PA trackers is generally expensive. An additional common limitation of the commercial devices (both consumer and research grade) is the lack of information about the methodology used to estimate the reported PA parameters (e.g. steps, moderate to vigorous activity intensity), complicating the interpretation and comparison of the results across studies.
Owing to the research-grade PA trackers, which allow access to raw IMU data, large cohort databases have been created worldwide that contain long-term real-life wrist IMU/acceleration data of thousands of people [57]-[59]. This has motivated scientists to develop and validate new algorithms [60]-[64], allowing a more comprehensive and transparent assessment of real-life PA behavior (e.g. in terms of GB and gait features such as cadence and speed) in clinical/research settings.
Most wrist-based PA assessment methods have used abstract modeling, where several features of the raw (accelerometer) data based on time, frequency and statistics are extracted and fed into various types of machine learning models (e.g. decision tree, SVM, Bayes). Such methods are independent of sensor orientation and therefore need no sensor calibration or alignment, which makes them suitable for long-term measurements of gait in real-world situations. However, the association of gait activity with wrist motion is more challenging than with the upper body or lower limbs. The wrist may move "independently" of the gait (e.g. carrying a bag, hand in pocket) and may also move during non-gait periods (e.g. moving the wrist when sitting or standing), which is problematic for accurate GB detection in everyday-life conditions.
Another issue regarding GB detection is the validity of the existing methods under unsupervised or real-world conditions. In fact, most methods were validated only in supervised (e.g. in controlled laboratory setting including measurements during short periods and limited space) or semi-supervised conditions (e.g. a series of activity lasting longer time, simulating real-life situations under the supervision of an observer). Such conditions may not match real-world situations where the gait activity is context dependent, self-initiated and purposeful. It has been shown that the performance of in-laboratory validated PA classification algorithms significantly drops when they are applied to data recorded in real-life circumstances [23], [24], [65]. Only few works have evaluated their methods under completely real-world conditions, without any supervisions or pre-defined sequences of PA [11], [25]. Those methods used a subject-borne camera as a reference to label PA in free-living conditions. In particular, [25] reported an accuracy of 96 (%) for daily life PA detection using two sensors, located at thigh and low back (L5). They showed that the accuracy dropped to 75 (%) by using only a wrist sensor. This paper describes an accurate and precise algorithm devised to detect GB in completely free-living conditions using a single 3-Dimensional accelerometer on a wrist. Biomechanically-derived features along with a naïve Bayes classifier, followed by two physically-meaningful post classification procedures have been used to optimize the algorithm performance. We target a low-power calibration-free algorithm, which needs low computation in order to be implemented inside a wristwatch proper for online feedback in every-day situations. The algorithm was validated in real-world conditions on healthy young and old adults against an accurate and pre-validated wearable system [26].

A. MEASUREMENT PROTOCOL
The measurement protocol consisted of two datasets: M1 including young and active people, and M2 including an older and less active population. In dataset M1, 29 healthy young volunteers (14 women, 15 men, age 37 ± 9 years old, height 172 ± 10 cm, and weight 68 ± 11 kg) wore two time-synchronized inertial sensors (Physilog® IV, GaitUp, CH, see Fig. 1 (D)) on both wrists using elastic straps (Fig. 1 (A)). Only single (left or right) wrist-recorded 3D acceleration (range ±16 g), sampled at 200 Hz, was used to devise the GB detection algorithm. Data recorded on both wrists enabled comparison of the algorithm performance between the two sides. In addition, another time-synchronized inertial sensor (Physilog® IV) was attached to the shank (using elastic straps), measuring 3D acceleration (range ±16 g) and angular velocity (range ±2000 degrees/s) sampled at 200 Hz. The shank-mounted sensor was used only by the reference system [26]. Participants carried the wrist sensors and the reference system for two days, one weekday and one weekend day (to keep enough diversity of physical activities in daily life), around 12 hours per day, during their real free-living conditions without any constraints or supervision.
For the dataset M2, 37 old participants (19 women, 18 men, age 64 ± 11 years old, height 167 ± 10 cm, and weight 77 ± 12 kg) were included. They wore a data logger (GENEActiv Original, ActivInsights Ltd, United Kingdom, see Fig. 1 (D)) measuring 3D acceleration (range ±8 g, sampled at 40 Hz) on one wrist ( Fig. 1 (B)). Moreover, as a reference, an inertial sensor (ActiGraph GT9X Link, United States, see Fig. 1 (D)) was attached to the shank to measure the 3D acceleration (range ±8 g) and angular velocity (range ±2000 degrees/s) at sampling rate of 50 Hz. These values of sampling frequencies are sufficiently high to avoid aliasing and capture the characteristics of the gait pattern [66]. Each subject carried the wrist device and the reference system around 12 hours (within one day) in real-world situations. Experimental protocols for both datasets were approved by local research ethics committees and all participants signed written informed consent prior to the measurements. It should be noted that the measurements have been collected at different time in different research sites which provided the opportunity to show the robustness of the proposed method by including 1) more subjects, 2) different populations (i.e. young and elderly), and 3) sensors from different companies.

B. LABELS FOR GB DETECTION
The inertial signals recorded with the sensor on shank were used by a validated noncommercial accurate algorithm [26] to obtain reference data. The algorithm provided labels for gait and non-gait periods in real-life situations with a resolution of one second. In [26], the algorithm was validated against visual observation and achieved a sensitivity and specificity of 97.1 (%) and 97.9 (%) for GB detection. This algorithm has been already used in several studies, as reference for technical validation of algorithms in free-living conditions as well as the clinical assessment in various healthy and diseased populations [67]- [69].
C. WRIST-BASED GB DETECTION
Fig. 2 represents the block diagram of the proposed wrist-based method, where the measured 3D accelerometer signals ($A_x$, $A_y$, $A_z$, see Fig. 1 (C)) were segmented and relevant features were extracted. Then, the probability of gait occurrence was estimated using a Bayes estimator trained by the extracted features and their corresponding labels from the reference system. In the next step, the temporal information of past-detected activities was used to update the gait occurrence probability based on the histogram of gait durations in real-life situations. Finally, "gait" or "non-gait" bouts were classified using a smart rule based on the probability resulting from the previous steps. Here, L[n] is a vector containing the predicted label of each window of extracted features. In the following, a more detailed description of each step is provided.

FIGURE 2. Block diagram of the proposed wrist-based method. First, wrist acceleration signals ($A_x$, $A_y$, $A_z$) were segmented using a 6-second moving rectangular window with 5-second overlap. Then, relevant features (H[n]) were extracted for each window. Next, a Bayes estimator was trained to estimate the probability of gait occurrence ($P_{Bayes}[n]$). The probability of the Bayes estimator was modified ($P_T[n]$) by a temporal classifier based on the histogram of GB duration. Eventually, a smart algorithm was proposed to decide if the window is gait or not based on the probability obtained from the previous steps. L[n] is the final estimated label, provided at the system output on a 1-second time base.
Segmentation: First, the acceleration signal of dataset M2 was up-sampled to 200 Hz to have the same sampling frequency as dataset M1. As mentioned previously, the original sampling frequency of dataset M2 (40 Hz) was already high enough to avoid aliasing. Then, we employed a 6-second moving rectangular window (i.e. 1200 samples per window) with a 5-second overlap to generate the segmented wrist acceleration, where n refers to the window number. The values of the window length and shift were found experimentally to optimize the algorithm performance, and they are also consistent with the literature [12], [60], [70]. The amount of data in each window is also optimal in the sense that it is short enough to give the required time resolution and long enough to provide sufficient data for consistent frequency analysis. It is worth mentioning that the output of the proposed system (i.e. the estimated label L indicating gait/non-gait bouts) is provided on a 1-second time base.
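As an illustration of the segmentation step, the following minimal sketch computes the sample indices of each 6-second window with a 1-second shift at 200 Hz. The function name `window_bounds` and its parameters are hypothetical and are not taken from the original implementation.

```python
def window_bounds(n_samples, fs=200, win_s=6, shift_s=1):
    """Return (start, stop) sample indices of each analysis window.

    Hypothetical helper: a 6 s window shifted by 1 s gives the
    5 s overlap described in the text.
    """
    win = int(win_s * fs)      # 6 s at 200 Hz -> 1200 samples
    step = int(shift_s * fs)   # 1 s shift -> one output label per second
    return [(s, s + win) for s in range(0, n_samples - win + 1, step)]


# Example: 10 s of data sampled at 200 Hz
bounds = window_bounds(n_samples=10 * 200)
```

With this choice, each new second of data produces exactly one new window, which matches the 1-second time base of the output labels.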
Feature extraction: We defined a set of features based on the biomechanics of wrist movements, such as intensity, periodicity, posture, and other non-gait dynamicity, to highlight intrinsic differences between gait and non-gait. The LASSO (least absolute shrinkage and selection operator) feature selection method according to [71] was used to specify the best possible feature set to optimize the performance on the training dataset. As we expected, LASSO selected a set of features which covers all biomechanical criteria (i.e. intensity, periodicity, posture, other non-gait dynamicity) used to define the features. In total, 13 features related to the four biomechanical criteria were chosen and categorized, as described in the following.
Intensity-based features: One key difference between gait and non-gait periods is the intensity of the wrist acceleration signal. In order to extract this information, we computed the following features:

NI[n]: The intensity of the acceleration norm, calculated according to (1), where $SA[f_i]$ is the amplitude of the power spectrum of the acceleration norm computed according to (2) and (3). In order to estimate the spectrum, we used the N-point Fast Fourier Transform (FFT) with Blackman windowing (experimentally selected), where N is the number of samples within a time window (i.e. N = 1200 in our case). Moreover, $f_i$ refers to the frequency resolution of the method, which is indicated in (4). We used the logarithm transform in order to shorten the range of this feature, as well as the heavy tail of its histogram, which is suitable for further Bayesian modeling.
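The spectral estimate described above can be sketched as follows. This is only an illustration under stated assumptions: the exact definitions are those of equations (1) to (4), while the mean removal, the power scaling, and the use of the logarithm of the summed power as a stand-in for NI[n] are assumptions made here. The function names are hypothetical.

```python
import numpy as np


def norm_spectrum(acc_xyz, fs=200.0):
    """Blackman-windowed power spectrum of the acceleration norm (one window).

    acc_xyz: array of shape (N, 3). Returns (freqs, power).
    """
    norm = np.linalg.norm(acc_xyz, axis=1)
    n = norm.size
    # Assumption: remove the mean so gravity does not dominate the spectrum
    windowed = (norm - norm.mean()) * np.blackman(n)
    spec = np.fft.rfft(windowed, n=n)          # N-point FFT
    power = (np.abs(spec) ** 2) / n            # assumption: simple 1/N scaling
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)     # bin spacing is fs / N
    return freqs, power


def log_intensity(power, eps=1e-12):
    """Assumed stand-in for NI[n]: logarithm of the total spectral power."""
    return float(np.log(power.sum() + eps))


# Example: 6 s window at 200 Hz with a 2 Hz arm-swing-like component
t = np.arange(1200) / 200.0
z = 1.0 + 0.5 * np.sin(2 * np.pi * 2.0 * t)
acc = np.column_stack([np.zeros_like(z), np.zeros_like(z), z])
freqs, power = norm_spectrum(acc)
ni = log_intensity(power)
```

For N = 1200 samples at 200 Hz, the spacing between frequency bins is 200/1200, i.e. about 0.167 Hz.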

MeanA[n]: The mean value of the acceleration norm within a time window.
In Fig. 3, θ is the angle between the wrist and the horizontal plane made by <x Global, y Global>; x, y, and z are the axes in the sensor frame, and g is the global gravity vector.

Periodicity-based features: Considering the cyclic nature of the gait, five features related to the periodicity of the acceleration signals were included as follows:

NACFmax[n]: The autocorrelation function of the acceleration norm was computed and normalized to the first sample (i.e. the sample of the zero lag). Then, its maximum peak, NACFmax, excluding the zero-lag sample, was reported for each window.

NACFp2p[n]: This is the peak-to-peak value between the maximum peak and the minimum valley of the normalized autocorrelation function, excluding the zero-lag sample.

SAmax[n]: The normalized spectrum of the acceleration norm (NSA) was estimated using $SA[f_i]$ according to (5), and the amplitude of its maximum peak was computed as SAmax[n].

DomSAmax[n]: We designed this score to quantify how sharp the maximum peak of NSA was, compared to its neighboring samples. This feature was computed according to (6), where $f_{max}$, $f_{max-1}$ and $f_{max+1}$ refer to the frequency of the maximum peak of NSA and the samples immediately before and after it, respectively.

Cad[n]: This is the cadence of the gait (number of steps per minute), which is generally bounded in a short range around 120 steps/min (≈40-300 steps/min). The rationale for including this feature is that, in addition to the periodicity of the signal, the period itself is important information for distinguishing between gait and non-gait periods. Cad[n] was computed using the algorithm presented in [60].
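The two autocorrelation-based features described above (NACFmax and NACFp2p) can be sketched as below. This is a hedged illustration: the mean removal before the autocorrelation and the simple three-point local peak/valley rule are assumptions, and the function names are hypothetical.

```python
import numpy as np


def normalized_acf(x):
    """Autocorrelation for lags >= 0, normalized to the zero-lag sample."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                        # assumption: mean removed first
    full = np.correlate(x, x, mode="full")
    acf = full[x.size - 1:]                 # keep non-negative lags only
    return acf / acf[0]                     # undefined for a constant signal


def acf_peak_features(x):
    """Return (NACFmax, NACFp2p) using local peaks/valleys beyond lag zero."""
    acf = normalized_acf(x)[1:]             # drop the zero-lag sample
    inner = acf[1:-1]
    peaks = inner[(inner > acf[:-2]) & (inner >= acf[2:])]
    valleys = inner[(inner < acf[:-2]) & (inner <= acf[2:])]
    nacf_max = float(peaks.max()) if peaks.size else 0.0
    nacf_min = float(valleys.min()) if valleys.size else 0.0
    return nacf_max, nacf_max - nacf_min


# Example: a periodic 2 Hz signal sampled at 50 Hz for 6 s
t = np.arange(300) / 50.0
nacf_max, nacf_p2p = acf_peak_features(1.0 + 0.3 * np.sin(2 * np.pi * 2.0 * t))
```

A strongly periodic signal gives a maximum peak close to 1 and a large peak-to-peak value, whereas an erratic signal gives values close to 0 for both.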
Posture-based features: During gait (i.e. running or walking), the wrist generally has a more specific and predictable posture than during non-gait periods. Consequently, extracting information about the posture of the wrist should be useful for GB detection. We defined θ as the angle between the y-axis of the accelerometer on the wrist and the global horizontal plane <x Global, y Global> (the plane made by the x- and y-axes of the global coordinate system, perpendicular to the gravity vector, Fig. 3). By assuming that the sensor can only rotate around the wrist, the y-axis of the sensor was almost aligned with the longitudinal axis of the wrist. According to Fig. 3, if the dynamic acceleration of the wrist movement remains low, the projections of the gravity vector on the y-axis of the sensor, and on the plane made by the x-axis and z-axis of the sensor (<x, z>), are given by (7) and (8), where $A_{<x,z>}[n]$ is the amplitude of the resultant acceleration vector on the plane <x, z> for window n. Here, g is the standard gravity. Consequently, the angle θ[n] can be estimated according to equation (9). Finally, the proposed posture-based feature was defined according to (10).

Non-gait dynamicity features: During a gait period, the wrist acceleration signal is pseudo-cyclic with energy mostly in low frequency bands. On the other hand, in non-gait periods the acceleration signal is rather random and erratic, with energy distributed over a larger frequency band. Therefore, like the Signal-to-Noise Ratio (SNR), the gait/non-gait power ratio is expected to be higher in the presence of gait than in non-gait periods. Consequently, several features were devised to separate gait from dynamic signals that may be observed during non-gait, by using the level of "noise" (i.e. non-gait period) in the desired signal (i.e. gait period). These features are as follows:

HLR[n]: The ratio between the intensity in high and low frequencies, as expressed in (11). The frequency threshold was experimentally set to 3.5 Hz to optimize the performance.
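The wrist angle θ used by the posture-based feature can be illustrated with a small sketch. It assumes that, under low dynamic acceleration, the component along the sensor y-axis is proportional to sin θ and the component in the <x, z> plane is proportional to cos θ; the sign convention and the use of window-averaged accelerations as inputs are assumptions, and `wrist_angle_deg` is a hypothetical name.

```python
import math


def wrist_angle_deg(ax, ay, az):
    """Estimate the angle (degrees) between the sensor y-axis and the horizontal plane.

    ax, ay, az: quasi-static accelerations (e.g. window averages), any common unit.
    Assumption: gravity dominates, so the ratio of the y component to the
    <x, z> resultant gives tan(theta).
    """
    a_xz = math.hypot(ax, az)          # resultant amplitude in the <x, z> plane
    return math.degrees(math.atan2(ay, a_xz))


# Examples (accelerations in units of g)
vertical = wrist_angle_deg(0.0, 1.0, 0.0)      # forearm aligned with gravity
horizontal = wrist_angle_deg(1.0, 0.0, 0.0)    # forearm held horizontally
tilted = wrist_angle_deg(math.cos(math.radians(30)), math.sin(math.radians(30)), 0.0)
```

Because only the ratio of the two components is used, the estimate does not depend on how the sensor is rotated around the wrist, which is consistent with the calibration-free goal stated in the text.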

ZCR[n]: The zero-crossing rate of the acceleration norm, expected to be higher for non-gait periods due to the noisy and erratic nature of the wrist movements. First, the mean value of the acceleration norm within a time window was removed. Then, any linear trend in the resulting signal was discarded using the "detrend" MATLAB function. Eventually, the number of zero crossings was counted as feature ZCR[n].
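The zero-crossing count can be sketched in Python as follows. The original work used the MATLAB "detrend" function; here a least-squares line fit with `numpy.polyfit` is used as an equivalent way of removing the mean and the linear trend. The handling of samples that are exactly zero is an assumption, and the function name is hypothetical.

```python
import numpy as np


def zero_crossing_count(norm):
    """Count sign changes of the acceleration norm after linear detrending."""
    x = np.asarray(norm, dtype=float)
    idx = np.arange(x.size)
    slope, intercept = np.polyfit(idx, x, 1)    # least-squares straight line
    resid = x - (slope * idx + intercept)       # removes mean and linear trend
    signs = np.sign(resid)
    signs = signs[signs != 0]                   # assumption: skip exact zeros
    return int(np.count_nonzero(np.diff(signs)))


# Example: 6 s at 200 Hz, a 2 Hz oscillation on top of a slow drift
t = np.arange(1200) / 200.0
periodic = 1.0 + 0.002 * np.arange(1200) + 0.3 * np.sin(2 * np.pi * 2.0 * t + 0.3)
zc_periodic = zero_crossing_count(periodic)
zc_noise = zero_crossing_count(np.random.default_rng(0).standard_normal(1200))
```

A 2 Hz oscillation over 6 seconds crosses zero about 24 times, while an erratic signal of the same length crosses zero far more often.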

SEF[n]: The Spectral Edge Frequency, computed according to equation (12), estimates the frequency below which α (%) of the energy of the signal is observed [72]. We found that α = 70 (%) provided the best performance in our application.
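The two spectral non-gait dynamicity features (the high-to-low frequency ratio with a 3.5 Hz threshold and the spectral edge frequency with α = 70 %) can be sketched as below. The exact expressions are those of equations (11) and (12); the discrete cumulative-sum formulation, the inclusion of the lowest bin in the low band, and the function names are assumptions made for illustration.

```python
import numpy as np


def spectral_edge_frequency(freqs, power, alpha=0.70):
    """Lowest frequency below which a fraction alpha of the total power lies."""
    cum = np.cumsum(power)
    idx = int(np.searchsorted(cum, alpha * cum[-1]))
    return float(freqs[min(idx, freqs.size - 1)])


def high_low_ratio(freqs, power, f_split=3.5):
    """Ratio of spectral power above f_split to power at or below it."""
    low = power[freqs <= f_split].sum()
    high = power[freqs > f_split].sum()
    return float(high / low) if low > 0 else float("inf")


# Example with a flat toy spectrum on bins 0, 1, ..., 9 Hz
toy_freqs = np.arange(0.0, 10.0, 1.0)
toy_power = np.ones(10)
sef = spectral_edge_frequency(toy_freqs, toy_power)
hlr = high_low_ratio(toy_freqs, toy_power)
```

For a gait-like signal with most energy at low frequencies, both values are expected to be lower than for erratic non-gait wrist movements.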

RandA[n]: By assuming that the wrist acceleration signal is less random during gait than during non-gait periods, the feature RandA[n] was defined according to an autocorrelation-based test presented in [73] to measure how random the signal is. According to this test, if a time series comes from a stationary random process (which is almost the case for the acceleration norm of non-gait periods within a short time window of 6 seconds), the samples of the autocorrelation of the time series will be mainly bounded between $\pm 1.96/\sqrt{N}$, where N is the number of samples within a time window (i.e. 1200). We defined RandA[n] as the percentage of autocorrelation samples outside the range of $\pm 1.96/\sqrt{N}$. High values of RandA[n] mean less randomness of the signal.
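A minimal sketch of this randomness measure is given below. It assumes that the mean is removed before the autocorrelation, that the autocorrelation is normalized to its zero-lag value, and that the zero-lag sample is excluded from the percentage; `rand_a` is a hypothetical name.

```python
import numpy as np


def rand_a(x):
    """Percentage of autocorrelation samples outside +/- 1.96 / sqrt(N)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                                  # assumption: mean removed
    n = x.size
    acf = np.correlate(x, x, mode="full")[n - 1:]     # lags 0 .. N-1
    acf = acf[1:] / acf[0]                            # normalize, drop zero lag
    bound = 1.96 / np.sqrt(n)
    return 100.0 * float(np.mean(np.abs(acf) > bound))


# Example: white noise versus a periodic signal (6 s at 200 Hz)
t = np.arange(1200) / 200.0
ra_noise = rand_a(np.random.default_rng(0).standard_normal(1200))
ra_periodic = rand_a(1.0 + 0.3 * np.sin(2 * np.pi * 2.0 * t))
```

White noise keeps almost all autocorrelation samples inside the bounds, so its value stays small, whereas a periodic signal pushes most samples outside the bounds.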

KurtosisA[n]: Kurtosis is a statistical measure quantifying how outlier-prone the distribution of data is [74]. We hypothesized that the acceleration norm of non-gait periods contains more outliers than that of gait due to the higher randomness of the signal. Therefore, the kurtosis of the acceleration norm within a time window was computed as another feature.
Eventually, for each time window n, H [n] was built as the feature vector including all selected features.
Bayes estimator: We evaluated the performance of several models (Bayes, decision tree, SVM, and neural network) for GB detection on the training data and experimentally chose the Bayes approach since it showed the best performance. In addition, the Bayes approach is simple and fast enough for a hardware implementation inside a wristwatch for real-time on-board computations. Consequently, the probability of gait occurrence for each window is estimated using the Bayes estimator according to (13), which uses the likelihoods of the feature vector H[n] within the gait (G) and non-gait (NG) classes. Furthermore, $P_G$ and $P_{NG}$ are respectively the prior probabilities of gait and non-gait occurrences. We considered multivariate multinomial distributions ("mvmn" in MATLAB) for the Bayes estimator. Moreover, in order to manage the intrinsic imbalance of samples between gait and non-gait periods (in real-world situations, non-gait samples are significantly more numerous than gait ones), we used a Laplace smoothing parameter [75] in the computation of the prior probabilities, as given in (14) and (15), where $N_{NG}$ and $N_G$ are the total numbers of samples observed for non-gait and gait periods, and l is a smoothing parameter fixed empirically to $(N_G + N_{NG})/10$.

Temporal-based probability modification: We took advantage of the information of past-detected activities to increase the certainty of the decision made for the current activity. As shown in Fig. 4, assume that q[n−1] and d[n−1] are the type and the duration of the last activity detected up to window n−1 (i.e. the last activity lasted from window n − d[n−1] to window n−1). $P_{q[n]=q[n-1] \mid d[n-1]}$ is then the probability of having the same activity in window n (i.e. q[n] = q[n−1]), knowing the type (q[n−1]) and duration (d[n−1]) of the last activity. To this end, two exponential functions (see (16) and (17)) were fit to the probability density functions of the durations of gait and non-gait bouts specific to daily life. Here, the parameters of the functions (i.e. $\beta_G$, $\gamma_G$, $\tau_G$, etc.) were obtained from the training session of the method. The exponential distributions were chosen due to their important properties [76], such as: (i) the ability to describe samples including more small values and fewer large values (as empirically observed for the duration of real-life GB [67], [77]), and (ii) great mathematical tractability. Then, since the probability given by the Bayes estimator ($P_{Bayes}$) was generally more reliable than $P_{q[n]=q[n-1] \mid d[n-1]}$ [...]. Eventually, the modified probability of gait occurrence ($P_T$) of time window n was computed through (19), where the "min-max" function was used to limit the probability to the range of [0, 1], and where ψ was defined according to (20).
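To illustrate the effect of the smoothing parameter on the class priors of the Bayes estimator described above, the sketch below uses one common form of additive (Laplace) smoothing. The exact expressions of the priors and of the estimator are not reproduced in this text, so the formulas below are assumptions; only the value l = (N_G + N_NG)/10 is taken from the text. Function names are hypothetical.

```python
def smoothed_priors(n_gait, n_nongait):
    """Class priors with additive smoothing (assumed form).

    Assumption: P_G = (N_G + l) / (N_G + N_NG + 2 l), and similarly for P_NG,
    with l = (N_G + N_NG) / 10 as stated in the text.
    """
    l = (n_gait + n_nongait) / 10.0
    total = n_gait + n_nongait + 2.0 * l
    return (n_gait + l) / total, (n_nongait + l) / total


def gait_posterior(lik_gait, lik_nongait, p_gait, p_nongait):
    """Two-class Bayes rule: posterior probability of gait for one window."""
    num = lik_gait * p_gait
    den = num + lik_nongait * p_nongait
    return num / den if den > 0 else 0.5   # assumption: 0.5 when undefined


# Example: 100 gait windows versus 900 non-gait windows
p_g, p_ng = smoothed_priors(100, 900)
post = gait_posterior(0.5, 0.5, p_g, p_ng)
```

In this example the raw gait prior of 0.10 is pulled toward 0.5 (to about 0.167), which reduces the bias toward the majority non-gait class.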
Smart decision making: When $P_T[n]$ is far enough from 0.5, it is easy to decide if window n is gait or not. However, making the decision is challenging when $P_T[n]$ is close to 0.5, which can happen in the proximity of transitions between activities, since part of the feature window is gait and the other part is non-gait. Consequently, we designed the following algorithm to make a smart decision based on $P_T[n]$. If $P_T[n] < 0.3$ or $P_T[n] > 0.7$, the decisions were NG or G, respectively. We called these reliable windows. On the other hand, if $0.3 \le P_T[n] \le 0.7$ (called ambiguous windows), then we analyzed the period between the last and next reliable windows. Suppose that, for ambiguous window n, windows m and k are respectively the last and next reliable windows (m < n < k, see Fig. 5). If k − m + 1 ≤ 10, then we changed the decision threshold from the conventional 0.5 to 1 − mean($P_T[m < n < k]$). Otherwise, decisions were G or NG if $P_T[n] > 0.6$ or $P_T[n] < 0.4$, respectively, and for $0.4 \le P_T[n] \le 0.6$, the last reliable decision was assigned to window n (i.e., L[n] = L[m]). Algorithm 1 summarizes the procedure of the proposed smart decision-making. In the worst-case scenario, the algorithm can impose a maximum delay of 10 seconds (10 shifts of 10 overlapped windows) on the whole system, which is acceptable for the real usage of the system (i.e. gait bout detection) in real-life situations.
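The decision rule described above can be sketched in Python as follows. The thresholds (0.3, 0.7, 0.4, 0.6) and the span limit of 10 windows are taken from the text; the handling of ambiguous runs at the very start or end of a recording (treated like long runs, with non-gait as the default previous label) is an assumption, and the function name is hypothetical.

```python
def smart_decision(p_t, max_span=10):
    """Label each window as gait (1) or non-gait (0) from probabilities p_t."""
    n_win = len(p_t)
    labels = [None] * n_win
    # Reliable windows: probability clearly below 0.3 or above 0.7
    for i, p in enumerate(p_t):
        if p < 0.3:
            labels[i] = 0
        elif p > 0.7:
            labels[i] = 1
    i = 0
    while i < n_win:
        if labels[i] is not None:
            i += 1
            continue
        start = i
        while i < n_win and labels[i] is None:
            i += 1
        m, k = start - 1, i              # last and next reliable windows
        run = p_t[start:k]               # ambiguous windows between m and k
        bounded = m >= 0 and k < n_win
        if bounded and (k - m + 1) <= max_span:
            # Short ambiguous period: adaptive threshold 1 - mean(P_T)
            thr = 1.0 - sum(run) / len(run)
            labels[start:k] = [1 if p > thr else 0 for p in run]
        else:
            # Long (or unbounded, by assumption) period: 0.4 / 0.6 thresholds,
            # otherwise keep the last reliable decision (default: non-gait)
            last = labels[m] if m >= 0 else 0
            labels[start:k] = [1 if p > 0.6 else 0 if p < 0.4 else last
                               for p in run]
    return labels


# Example: a short ambiguous period between a non-gait and a gait window
example = smart_decision([0.1, 0.6, 0.65, 0.55, 0.9])
```

In the example, the mean probability of the ambiguous run is 0.6, so the threshold drops to 0.4 and all three ambiguous windows are labeled as gait.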

D. CROSS-VALIDATION AND ERROR COMPUTATION
The proposed method was validated against a reference system [26]. The well-known leave-one-subject-out cross-validation was applied, where the model was trained using the data of all subjects except one, and tested on the subject absent from the training dataset. This procedure was repeated until every subject had been selected once for the test set. The algorithm was tested on both young (M1) and old (M2) healthy populations. Furthermore, to evaluate the generalization ability, the model was trained on M1 and tested on M2 (and vice versa). In all cases, we compared the results of the reference with the proposed wrist-based method at a resolution of 1 second. In dataset M2, due to clock drift of the sensors, the signals recorded by the shank sensor were not well synchronized with the wrist sensor, and up to a 10-second shift was observed. Consequently, to compare the results of the wrist and the reference in this dataset (i.e. M2), we considered a tolerance of one sample. This means that each sample of the wrist-based method was compared with three samples of the reference method (i.e. the current sample and one sample before and after it), and if at least one of those samples matched the wrist method, it was counted as a correct decision. Standard performance parameters, i.e. Sensitivity (also known as recall, detection rate in class G), Specificity (detection rate in class NG), Accuracy, Precision, and F1-score, were computed according to (21)-(25), where TP, TN, FP, and FN were true positive, true negative, false positive, and false negative, respectively. Taken together, these performance parameters provide a proper evaluation of the proposed method, especially on imbalanced real-life datasets [78], [79]. Moreover, since the time window necessary for robust feature extraction is 6 seconds, we ignored activities shorter than 6 seconds, for both the reference and the wrist algorithm, in the computation of the performance parameters.
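The performance parameters named above follow the standard definitions based on TP, TN, FP and FN, and can be sketched as below. The function names are hypothetical, the example labels are synthetic, and the tolerance-based matching used for dataset M2 is not included in this sketch.

```python
def confusion_counts(ref, est):
    """Count TP, TN, FP, FN for per-second labels (1 = gait, 0 = non-gait)."""
    tp = sum(1 for r, e in zip(ref, est) if r == 1 and e == 1)
    tn = sum(1 for r, e in zip(ref, est) if r == 0 and e == 0)
    fp = sum(1 for r, e in zip(ref, est) if r == 0 and e == 1)
    fn = sum(1 for r, e in zip(ref, est) if r == 1 and e == 0)
    return tp, tn, fp, fn


def performance(tp, tn, fp, fn):
    """Standard sensitivity, specificity, accuracy, precision and F1-score."""
    def ratio(num, den):
        return num / den if den > 0 else float("nan")

    sens = ratio(tp, tp + fn)
    spec = ratio(tn, tn + fp)
    acc = ratio(tp + tn, tp + tn + fp + fn)
    prec = ratio(tp, tp + fp)
    f1 = ratio(2 * prec * sens, prec + sens)
    return {"sensitivity": sens, "specificity": spec, "accuracy": acc,
            "precision": prec, "f1": f1}


# Synthetic example: 10 s of gait and 82 s of non-gait in the reference
ref = [1] * 10 + [0] * 82
est = [1] * 8 + [0] * 2 + [1] * 2 + [0] * 80
scores = performance(*confusion_counts(ref, est))
```

The example shows why precision and F1-score matter on imbalanced data: two false-positive seconds barely change the specificity but lower the precision to 0.8.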
The Spearman rank correlation method was used to compute correlations between parameters [80]. Additionally, the median and Inter-Quartile Range (IQR) of each parameter were computed. In order to show the importance of each extracted feature, the number of non-zero coefficients reported by LASSO during 100 iterations was computed as a score. A higher LASSO score means a higher importance of the feature for classification.

E. REAL PROTOTYPE IMPLEMENTATION
In order to show the feasibility of a hardware implementation of the proposed method and to evaluate the power consumption, we developed a real prototype of the proposed method using commercial electronic components. Due to implementation constraints, the prototype employed only 8 features (NI, MeanA, SAmax, DomSAmax, Cad, Wrist-Post, ZCR, RandA) and worked with a sampling frequency of 20 Hz. It should be noted that the real prototype was used only to evaluate the power consumption. For the rest of the analysis, the set of all 13 features was used.

III. RESULTS
The following subsections report the results obtained from applying the proposed method on 1283 hours of free-living physical activities recorded in 66 young and old participants (datasets M1 and M2).

A. PERFORMANCE OF GB DETECTION
For one representative subject (#1 from dataset M1), Fig. 6 gives insight into the decisions made in daily life by the proposed wrist-based method and the reference. Due to the limitations of the reference algorithm, it was not possible to know exactly which activities B1-B14 were, only whether they were gait or not. As illustrated, during typical gait with a swinging arm (e.g. bout B1 plus the first parts of B5, B7, B9, B11, and B13), the proposed method perfectly detected all the GBs. In addition, when the subject and the wrist were motionless (e.g. the first part of bout B2 and the middle part of bout B6), non-gait periods were completely detected. Interestingly, the proposed method was able to deal with challenging periods, when the subject did not engage in gait activity but the wrist was moving (e.g. the beginning and end of B6, B8, B10, and B14). Nevertheless, abnormal movements of the wrist caused a few misclassifications due to the wide freedom of motion of the wrist (e.g. bouts B3 and B12). In order to show the generalization ability, we evaluated the proposed method through different combinations of training and test sets using the M1 and M2 datasets (TABLE 1). When the training and testing sets were the same, we employed the leave-one-subject-out cross-validation strategy. The confusion matrix (TABLE 2) was also estimated by considering both datasets (M1 and M2). In order to test the sensitivity of the algorithm to the experimentally-adjusted parameters used in the classification, a change of 10 (%) was applied to the frequency threshold in HLR[n], the α in SEF[n], and the l in the Bayes estimator (each was tested individually). We observed a maximum absolute change of 0.1 (with 0.1 IQR) in the median sensitivity and no more than 0.2 (with 0.4 IQR) in precision and F1-score, while specificity and accuracy remained unchanged.
The probability density function of features within each class (gait: blue, non-gait: red) are displayed in Fig. 7. Here, the features are grouped based on the biomechanical criteria used to define them. Moreover, the models built by Bayes estimator (through training) on each feature for the detection of gait and non-gait classes are also shown where the green and black curves respectively correspond to gait and non-gait classes. The LASSO scores are also reported which determine the importance of each feature.
As the proposed method is based on a wrist-worn sensor, it is important to evaluate and compare the performance between left and right wrists. TABLE 3 reports the performance parameters in the dataset M1 where the participants wore accelerometer sensors on both wrists at the same time.

B. EFFECT OF BOUTS' DURATION ON PERFORMANCE
A Spearman test demonstrated a high correlation ($R^{2} = 0.95$) between the total (summed) duration of GBs of each subject detected by the proposed wrist-based method and the reference, using both datasets M1 and M2 (see Fig. 8). More specifically, the proposed method showed a median [interquartile range] error of +12.6 [2.9, 27.3] (min) and 14 [6, 26] (%) for the estimation of the total duration of GB of each subject. Note that the reference reported a median [interquartile range] of 73.6 [45.1, 153.0] (min) for the total duration of GB of each subject. Fig. 9 shows that the sensitivity of the proposed method is higher for longer GB, while the specificity remains almost constant.
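As an illustration of the agreement analysis above, the sketch below computes per-subject duration errors, their median and interquartile range, and a Spearman rank correlation. The per-subject durations are hypothetical values chosen only to exercise the code, the relative error is taken as signed (the paper does not state whether it is signed or absolute), and the helper names are assumptions.

```python
from statistics import median, quantiles


def duration_errors(wrist_min, reference_min):
    """Signed absolute (min) and relative (%) errors per subject."""
    abs_err = [w - r for w, r in zip(wrist_min, reference_min)]
    rel_err = [100.0 * (w - r) / r for w, r in zip(wrist_min, reference_min)]
    return abs_err, rel_err


def median_iqr(values):
    """Median and (Q1, Q3) using inclusive quartiles."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), (q1, q3)


def ranks(values):
    """1-based ranks, ties receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def spearman(x, y):
    """Spearman rho, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical total gait durations (minutes) for four subjects.
wrist = [50.0, 80.0, 110.0, 165.0]
reference = [40.0, 75.0, 100.0, 150.0]

abs_err, rel_err = duration_errors(wrist, reference)
med, (q1, q3) = median_iqr(abs_err)
rho = spearman(wrist, reference)
print(med, (q1, q3), rho)
```

A positive median error, as in this toy example, corresponds to the wrist-based method overestimating the total gait duration relative to the reference.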

C. POWER CONSUMPTION AND COMPUTATION TIME
We employed a low-power accelerometer (MC 3635, mCube Inc.) as well as an ARM microprocessor (nRF52840, Cortex-M4 @ 64 MHz) including 256 kB RAM and 1 MB FLASH memory (see Fig. 10) in the designed prototype. Our analysis shows that the proposed method consumes around 135.5 mAh over a whole year of effective usage. Moreover, the computation time for processing one feature window (6 seconds at 20 Hz, i.e. 120 samples) is around 1 ms.

IV. DISCUSSION
In this study, an accurate and precise algorithm was devised to recognize GB and estimate their durations using a single low-power accelerometer mounted on the wrist in unsupervised real-world situations. Two different datasets (younger and elderly, recorded in free-living conditions) were employed to validate the proposed algorithm. Moreover, the importance of the features, the effect of the duration of GB, and the effect of wearing the sensor on the left or right wrist were investigated.
FIGURE 7. Probability density functions (PDF) of the extracted features within each class (gait: blue, non-gait: red) based on different biomechanical criteria. In addition, the models built by the Bayes estimator on each feature for gait (green) and non-gait (black) are presented. The LASSO score of each feature is also reported.
The proposed method was highly capable of following the decisions made by the shank-based reference algorithm (see typical example in Fig. 6). Almost all GB with arm swing were perfectly recognized (e.g. bouts B1, B7, B9, B11, and B13). These GB probably occurred outdoors, where regular arm swing exists to reduce energetic cost and facilitate the movement of the leg [81]. Moreover, the results in Fig. 6 demonstrate that the proposed method was able to deal with challenging arm-in-motion non-gait periods that can frequently occur in daily life (e.g. B6, B8, B10, and B14). The proposed method might miss a few short-duration bouts (e.g. end of B5), mainly because the analysis is window-based (6 seconds). Finally, non-gait bouts in which the arm has a periodic movement with a period similar to that of gait (0.5-2 seconds) can challenge the proposed method. Our analysis demonstrated that the algorithm is robust to variations (±10 %) of the experimentally-adjusted thresholds. This is very important, especially when the algorithm is applied to complex real-world situations where acceleration patterns may vary considerably.
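To make the periodicity criterion concrete, the sketch below computes the maximum of a normalized autocorrelation over lags corresponding to gait-like periods (0.5-2 seconds) within one 6-second window sampled at 20 Hz. It is only an illustration: the paper's exact definition of its periodicity features is not given in this section, and the function name `nacf_max`, the biased normalization by zero-lag energy, and the test signals are assumptions.

```python
import math

FS_HZ = 20      # sampling rate reported for the prototype
WINDOW_S = 6    # feature window length in seconds


def nacf_max(signal, fs=FS_HZ, min_period_s=0.5, max_period_s=2.0):
    """Maximum normalized autocorrelation over gait-like lags.

    Assumption: the autocorrelation is normalized by the zero-lag energy
    of the mean-removed window; other normalizations are possible.
    """
    n = len(signal)
    mean = sum(signal) / n
    x = [s - mean for s in signal]
    energy = sum(v * v for v in x)
    if energy == 0.0:
        return 0.0  # a constant window carries no periodicity
    lo = int(round(min_period_s * fs))
    hi = min(int(round(max_period_s * fs)), n - 1)
    best = -1.0
    for lag in range(lo, hi + 1):
        acf = sum(x[i] * x[i + lag] for i in range(n - lag))
        best = max(best, acf / energy)
    return best


n_samples = FS_HZ * WINDOW_S
# Hypothetical arm swing at 1 Hz versus a single isolated wrist movement.
periodic = [math.sin(2 * math.pi * 1.0 * i / FS_HZ) for i in range(n_samples)]
pulse = [0.0] * 60 + [1.0] + [0.0] * 59

print(nacf_max(periodic), nacf_max(pulse))
```

A periodic non-gait arm movement whose period falls in the same lag range would also score high here, which is consistent with the limitation noted above.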
In order to obtain sufficient activity diversity, the method was validated on two different free-living datasets, M1 (young and active) and M2 (older and less active). When only dataset M1 was used for training, the proposed method obtained the highest performance when tested on the same dataset ( […] accuracy, precision, and F1-score of detection of GB, respectively. A higher value of sensitivity than precision indicates that the proposed method is more sensitive in the detection of gait than non-GB. Therefore, misdetection of non-GB as gait was more frequent than vice versa. This ensures that the method successfully detects most of the GB, which is crucial in daily life where there is a limited number of GB. As TABLE 2 specifies, for more than 1280 hours of recording of free-living physical activities, including 148 hours of gait, only 17 hours of GB were missed, which is less than 11 (%). For non-GB, the performance was much better: only around 3 (%) of the bouts were misclassified.
[…] Here, the dark curves and shadows, respectively, indicate inter-subjects' median and IQR of the specificity and sensitivity. As an example, the red stars show median sensitivity and specificity of around 99 (%) and 97 (%), respectively, for the proposed method in detection of activities longer than 2 minutes.
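The relation between the performance measures discussed above can be illustrated from a confusion matrix expressed in hours. In the sketch below, the gait hours (148) and missed gait hours (17) are the rounded values quoted from TABLE 2; the non-gait total is derived from the rounded 1280 hours, and the false-positive hours are an assumption obtained from the quoted "around 3 (%)". The results are therefore pooled and approximate, and do not reproduce the per-subject medians reported in the paper.

```python
def detection_metrics(tp, fn, fp, tn):
    """Standard binary detection metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "precision": precision,
        "f1": f1,
    }


gait_h = 148.0                      # quoted (rounded) hours of gait
missed_gait_h = 17.0                # quoted (rounded) hours of missed gait
non_gait_h = 1280.0 - gait_h        # derived from the rounded total
false_gait_h = 34.0                 # ASSUMPTION: about 3 % of non-gait hours

metrics = detection_metrics(
    tp=gait_h - missed_gait_h,
    fn=missed_gait_h,
    fp=false_gait_h,
    tn=non_gait_h - false_gait_h,
)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these inputs the precision is lower than the sensitivity even though only a small fraction of non-gait is misclassified, because non-gait time dominates the recording and a small error rate on it produces false detections comparable in size to the true gait time.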
According to Fig. 7, the probability density functions of the biomechanically-derived features (i.e. intensity, periodicity, posture, and non-gait dynamicity) showed that the selected features discriminate well between gait and non-GB. The LASSO scores showed that NACFmax, SAmax, NI and WristPost are among the best features. In addition, the periodicity was a better criterion to separate gait and non-gait. As illustrated in Fig. 8, a high correlation ($R^{2} = 0.95$) was observed between the wrist and reference methods for the total duration of GB of each person, indicating that the proposed method can accurately estimate how long a person is engaged in gait during the monitoring time. However, the wrist-based method showed a slight overestimation (around +13 min where the subjects had a median total gait duration of 75 min). One reason for this overestimation may be the use of the Laplace parameter to determine the prior probability of the G and NG classes for the Bayes estimator. This issue should be considered when the method is applied for clinical assessment, especially when patient populations with critically limited numbers and durations of GB are targeted. Analysis of data from the left and right wrists (TABLE 3) illustrated that sensor location does not significantly affect the results.
According to Fig. 9, the proposed method shows significantly higher performance (mainly sensitivity) for longer activities than for short ones. For GB longer than 2 minutes (probably occurring outdoors), the method was able to detect almost all activities correctly (99 %). However, the detection of short GB, especially around 6 seconds, was challenging. One reason could be the higher stability of the acceleration signal within a window (6 seconds) in longer activities. Moreover, short activities happen mainly in indoor situations where more artifacts (e.g. sudden stops, turns, transients) exist. Another point is that, since the number of long non-GB is usually much larger than the number of short non-GB, the specificity changes little when short-duration non-GB are removed.
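The dependence of sensitivity on bout duration can be illustrated with the sketch below, which recomputes sensitivity after discarding reference gait bouts shorter than a threshold. The exact procedure behind Fig. 9 is not described in this section, so the duration-weighted definition, the function name `sensitivity_for_min_duration`, and the toy bouts are all assumptions.

```python
def sensitivity_for_min_duration(bouts, min_duration_s):
    """Duration-weighted sensitivity over reference gait bouts that last
    at least min_duration_s.

    Each bout is (reference_duration_s, correctly_detected_s).
    Returns None when no bout satisfies the threshold.
    """
    selected = [(dur, det) for dur, det in bouts if dur >= min_duration_s]
    total = sum(dur for dur, _ in selected)
    if total == 0:
        return None
    return sum(det for _, det in selected) / total


# Hypothetical reference bouts: short ones are detected less completely.
bouts = [(6, 0), (12, 6), (60, 54), (180, 180)]

for threshold_s in (6, 60, 120):
    print(threshold_s, sensitivity_for_min_duration(bouts, threshold_s))
```

In this toy example the sensitivity rises as the threshold increases, mirroring the trend described above; real data need not be monotonic.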
By using only one low-power accelerometer and by optimizing the feature computation and the code, the implementation of the proposed method showed a very low power consumption (135.5 mAh per year) in real-world conditions. The implemented method offers around one year of continuous effective measurement of gait with a standard primary battery cell (250 mAh). This is a great advantage since many medical and sport applications crucially need long-term measurement of physical activities in real-life situations. Moreover, the simplicity of the proposed method and its low computation time (1 ms per window) enable real-time, on-board analysis of PA, making it possible to generate real-time feedback that can be important in many applications.
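The reported figures can be turned into a rough budget with a few lines of arithmetic. The sketch below uses only the numbers quoted in the text (135.5 mAh per year, a 250 mAh cell, a 6-second window at 20 Hz, 1 ms of computation per window); the variable names are illustrative, and the interpretation of the remaining battery capacity is not stated in the paper.

```python
HOURS_PER_YEAR = 365 * 24          # 8760 h, leap years ignored

yearly_charge_mah = 135.5          # reported yearly consumption of the method
battery_capacity_mah = 250.0       # reported capacity of the primary cell
window_s = 6.0                     # feature window length
sampling_hz = 20.0                 # accelerometer sampling rate
compute_time_s = 1e-3              # reported processing time per window

samples_per_window = int(window_s * sampling_hz)
avg_current_ua = yearly_charge_mah / HOURS_PER_YEAR * 1000.0
cpu_duty_cycle = compute_time_s / window_s
battery_share = yearly_charge_mah / battery_capacity_mah

print(f"samples per window: {samples_per_window}")
print(f"average current: {avg_current_ua:.2f} uA")
print(f"processing duty cycle: {cpu_duty_cycle:.6f}")
print(f"share of battery per year: {battery_share:.3f}")
```

The yearly figure corresponds to an average current of roughly 15.5 µA and about 54 % of the quoted cell capacity, while the processor is busy with feature processing for less than 0.02 % of the time.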
Compared to previous works, the proposed method has obtained an excellent performance, considering that only one single accelerometer sensor mounted on the wrist was used, which is challenging for GB detection in real-world situations. A fair comparison of the results is limited by the fact that only a few works have been validated in free-living situations ([25], [11]). The authors in [25] have recently employed 22 features extracted from wrist-worn accelerometer and gyroscope sensors to classify four types of PA in daily life (walking, standing, sitting, lying) and obtained an accuracy and F-measure of 75.8 (%) and 58.1 (%), respectively. They also used a combination of IMUs mounted on the thigh plus low back (L5) which achieved an accuracy and F-measure of 96.8 (%) and 88.1 (%), respectively. Another study, [11], performed GB detection in real-life situations based on a biomechanical model to obtain an absolute intra-class correlation of 0.941 using a single accelerometer attached to the lower back. Unfortunately, none of the mentioned methods reports sensitivity and specificity parameters for the gait detection.
Most previous works have validated their algorithms only in supervised (e.g. in a controlled laboratory setting) or semi-supervised (e.g. simulated real-life) situations. TABLE 4 lists some of the methods that are specifically wrist-based for classification of PA. They employed long (e.g. 2-5 minutes) and clean trials of PA in a pre-defined sequence to test their methods. It has been shown that validating PA detectors in real-world situations leads to a significant drop (up to 20 %) in performance [23], [24], [65]. Even in this unfair situation, the proposed method outperforms the existing algorithms while using fewer features and sensors and providing a high autonomy (around 1 year). Furthermore, the developed method works in real time, using a small portion of data (6 seconds) and limited resources.
The proposed method, validated in unsupervised daily situations in young and elderly subjects, offers a high potential to be used in clinical settings for the monitoring of patients with activity restrictions due to various diseases. As an example, the system is currently being used in a large population of older adults to characterize the distribution of daily GB as well as the effect of various factors such as aging, obesity, and frailty on the quality and quantity of PA in daily life situations. More importantly, the proposed method can be used as a primary stage of many PA analysis algorithms where accurate detection of GB is needed, such as cadence [60] and speed [61] estimation.

V. CONCLUSIONS
This study presented an accurate and precise method for detection of GB in free-living situations using wrist acceleration data. We extracted biomechanically-derived features integrated with a naïve Bayes classifier followed by two physically-meaningful post-classification steps to deal with the difficulties posed by challenging movements of the wrist in real-world situations. Such a wrist-based, low-power, and calibration-free (no calibration phase is needed for sensor-to-body alignment) system offers a versatile measurement tool with high usability and autonomy, well suited for long-term monitoring of PA in free-living situations. In addition, the simplicity of the proposed method and its real-time operation allow the method to be implemented inside a wristwatch, which protects the privacy of the user. This also provides the possibility of giving meaningful online feedback to the user in daily life to promote a more active lifestyle.
His research interests include methodologies for human movement monitoring and analysis in real world conditions mainly based on wearable technologies, with an emphasis on gait, physical activity, and sport. He is teaching in the areas of physiology and instrumentation, medical devices, biomechanics, and sports. He has authored or coauthored over 500 scientific articles published in reviewed journals and presented at international conferences, and holds 12 patents related to medical devices. His research aims to perform outcome evaluation in orthopaedics, to improve motor function and intervention programs in aging and patients with movement disorders and pain, and to identify metrics of performance in sport science.