Probabilistic modelling of gait for robust passive monitoring in daily life

Passive monitoring in daily life may provide invaluable insights about a person's health throughout the day. Wearable sensor devices are likely to play a key role in enabling such monitoring in a non-obtrusive fashion. However, sensor data collected in daily life reflects multiple health and behavior related factors together. This creates the need for structured principled analysis to produce reliable and interpretable predictions that can be used to support clinical diagnosis and treatment. In this work we develop a principled modelling approach for free-living gait (walking) analysis. Gait is a promising target for non-obtrusive monitoring because it is common and indicative of various movement disorders such as Parkinson's disease (PD), yet its analysis has largely been limited to experimentally controlled lab settings. To locate and characterize stationary gait segments in free living using accelerometers, we present an unsupervised statistical framework designed to segment signals into differing gait and non-gait patterns. Our flexible probabilistic framework combines empirical assumptions about gait into a principled graphical model with all of its merits. We demonstrate the approach on a new video-referenced dataset including unscripted daily living activities of 25 PD patients and 25 controls, in and around their own houses. We evaluate our ability to detect gait and predict medication induced fluctuations in PD patients based on modelled gait. Our evaluation includes a comparison between sensors attached at multiple body locations including wrist, ankle, trouser pocket and lower back.


Introduction
Ubiquitous consumer devices such as smartphones and wearables are equipped with low power inertial sensors such as accelerometers and gyroscopes capable of continuously recording their wearer's movements. In controlled laboratory settings, such sensors have been used successfully to measure symptoms of patients with various movement disorders, such as Parkinson's disease (PD) [1,2,3]. However, these measurements only provide a snapshot of the patient's condition, and may not be representative of the symptoms experienced in daily living conditions outside the lab, for example because of observer effects [4]. Unobtrusive wearable sensors enable us to monitor patients in daily life, which may provide patients, care providers and researchers with useful insights into the course of symptoms [5].

arXiv:2004.03047v1 [cs.HC] 7 Apr 2020
However, obtaining reliable and interpretable measurements in uncontrolled environments is difficult. One strategy has been to record the patient's ability to perform specific tasks (e.g. walk 10 meters) at different times of the day (active tests) [6]. However, an important limitation of active tests is that patients are interrupted during the tests, which can lead to high attrition in compliance [7]. Additionally, it is practically impossible to obtain a continuous view of symptom fluctuations using short active tests.
Instead of instructing patients to perform specific tasks, we could use daily routine activities that are affected by the patient's condition to measure how someone's symptoms fluctuate throughout the day (i.e. passive monitoring). An important example of such an activity is walking, otherwise known as gait. Many movement disorders are associated with alterations in gait patterns, and neurologists often use in-clinic gait examination to establish a diagnosis. PD-related changes in gait patterns consist of continuous impairments involving slowness and reduced arm swing (bradykinetic gait) and episodic hesitations to produce effective steps (freezing of gait). In many patients, bradykinetic gait is already present early in the disease [8] and is responsive to symptomatic medication (e.g. levodopa) [9]. Therefore, measuring free living gait could serve as a marker for disease progression and therapy-related symptom fluctuations in PD patients. This would allow for unobtrusive remote patient monitoring, and can potentially facilitate titration of medication, early diagnosis and evaluation of new drugs [10].
In order to extract meaningful information about a patient's free living gait, we need a robust framework for gait detection and characterization of the gait pattern. When it comes to gait detection, most existing work can be classified into two systems: (1) activity recognition systems which classify (real time) data into a fixed number of pre-specified activities [11,12,13], and (2) gait detection systems that perform binary classification to determine whether a window of data belongs to the gait or no gait class [14,15,16]. Systems in both categories are typically trained and evaluated using labelled data from a pre-defined, and sufficiently distinguishable set of scripted activities, often collected in controlled environments. However, in uncontrolled environments, it is practically impossible to anticipate all the activities users might engage in. This means we expect a distributional mismatch between the activity data available during training and the actual out-of-sample data. In particular when such systems use many different features or deep learning, it is difficult to anticipate how the algorithm will cope with unseen activities. Furthermore, in health monitoring applications, we expect large differences in the way patients perform the same activity, gait in particular, due to their disease symptoms. This means that, on the one hand, we need labelled training data that better reflects real-life variation and variation due to disease symptoms. On the other hand, we need to acknowledge that it will remain infeasible to capture all variation in training data sets, and we need models that can account for this.
Binary gait detection systems are often implemented using threshold values applied to statistical summaries of windowed data [17,18,15,19,20]. Whereas this "low complexity" approach may have acceptable accuracy to globally describe how much users walk, problems can emerge when it is used as a starting point for evaluating the quality of the gait in health monitoring applications. For example, these systems group all gait together, regardless of changes in the gait pattern that can occur even within the same gait segment (e.g. because of changes in symptoms, pace or environment). As a result, the detected gait segments are likely heterogeneous and non-stationary, introducing unpredictable biases into subsequent gait pattern analysis.
In this work, we propose a unified framework for gait detection and gait pattern analysis. We have combined some of the most common criteria used for gait detection into a principled probabilistic graphical model, which can be directly applied to the accelerometer data to infer varying gait and non-gait patterns occurring in free living. We adopt a flexible nonparametric model which can locate different gait and non-gait activities that vary both in terms of their statistical and temporal characteristics. Specifically, we use a set of high order autoregressive (AR) processes. The AR process is a parametric model of the frequency spectrum, hence it directly captures characteristics derived from the power spectral density of the data. At the same time, AR processes are time domain models, which allows us to couple them with a nonparametric hidden Markov model (HMM), leading to an AR-iHMM (also known as a nonparametric switching AR process [21]), to capture the longer-term changes in behavior patterns and gait types in free living conditions. Different HMMs have been applied previously for generic activity recognition [22,23,24,25,26], binary gait classification [27] and sub-typing of gait [28]. However, this has been done in a supervised setting where a parametric HMM is trained on features extracted from the windowed sensor data. By contrast, the AR-iHMM proposed here can be directly applied to the pre-processed time series, which allows us to circumvent challenges related to spectral estimation of windowed signals, and avoid the need for engineering many features.
Once free living data is segmented into different states, these states can be used to separate gait from non-gait data, but also to identify segments in time of sufficiently different gait as well as short-term events causing interruption of gait. To demonstrate the applicability of this analytical framework, we use a new, unique dataset consisting of sensor data from various wearables and concurrent reference video annotations, collected during unscripted daily living activities in and around the homes of 25 patients with PD, and 25 age-matched controls. Our results demonstrate state-of-the-art accuracy at detecting healthy and pathological gait in free living conditions from different sensor wear locations. Furthermore, we show that the model can identify changes in gait pattern after medication intake in individuals with PD.
Figure 1: Example of 3-axis accelerometer data collected during unscripted activities in and around the home using a smartphone placed in a PD patient's front trouser pocket.

Related work
In the last two decades, advances in wearable sensors have made it feasible to unobtrusively monitor patients outside controlled laboratory conditions, allowing us to study real-life gait patterns. However, to successfully deliver on that promise, we need tools which can reliably and robustly model data recorded from wearables in this setting. Here we review relevant prior work in terms of device location, gait detection algorithms, and gait characteristics under study.
Device location Studies on gait detection and gait pattern analysis have used accelerometers, gyroscopes and/or magnetometers worn on various body locations, including the trouser pocket [15], the lower back [29], the shin or ankle [26], the shoe [30], as well as the wrist [31]. The choice of device location is influenced by the expected gait detection accuracy, the type of gait characteristics that can be reliably estimated, patient acceptance, and the commercial availability of devices. An extensive review of widely-used wearable devices and their sensors for gait analysis can be found in Tao et al. [18], and a focused review on sensor placement for monitoring of PD can be found in Brognara et al. [32].
There is no consensus on the best device location to detect and characterize the gait of PD patients, and whether there is added value in combining multiple locations. Therefore, we evaluate our proposed framework on various commonly used sensor locations.
Another concern can be the limited commercial availability and high costs of "research-grade" devices. For this reason, we include a consumer smartphone in our comparison, which is widely available and relatively low-cost.
Gait detection Most gait detection techniques rely on parametric assumptions about the spectral density, time domain distribution or both [15]. Typically, features are extracted from windows of fixed width, and the decision to classify a window as gait or non-gait behavior is made using pre-defined thresholds or using a trained classifier. For example, one of the most widely used methods for identifying gait estimates the standard deviation of a windowed accelerometer signal, and uses a fixed threshold value [33]. An alternative, and similarly popular approach is the window-based analysis of spectral features [19,18,34]. Gait is typically highly periodic with Nyquist bandwidth of 10-15Hz [35]. This has motivated the use of the short-time Fourier transform (STFT) [36] to detect gait. For example, Sama et al. [37,17] studied the energy of the accelerometer signal in 800 different frequency bands. They applied Relief feature selection to identify the energy bands that are most descriptive of gait and then they used a support vector machine (SVM) to detect gait. Karantonis et al. [38] suggested directly analyzing the Fourier coefficients of the z-axis on the accelerometer to look for sufficient power at the expected range of walking frequencies (0.7-3.0 Hz). The time-frequency resolution issues of STFT-based walking detection have sometimes been addressed using wavelet transforms [39]. Continuous wavelet transforms often require large computational effort [40], but discrete wavelet transforms can be used to efficiently estimate high quality features of gait [41], more efficiently even compared to Fourier transform [42, page 254]. We can also encode the power spectrum directly in the time domain if we use windowed auto-correlation [43] and then use the values at a subset of time lags corresponding to the duration of the gait cycle [44,43].
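As a rough illustration of the spectral criterion described above, the following minimal Python sketch computes the fraction of a window's spectral power that falls inside the 0.7-3.0 Hz walking range mentioned for Karantonis et al. [38]. It is not the implementation of any of the cited works: the function name `band_power_fraction`, the plain DFT, the 50 Hz sample rate and the synthetic sinusoids are all assumptions made for the example.

```python
import cmath
import math


def band_power_fraction(window, fs, f_lo=0.7, f_hi=3.0):
    """Fraction of (mean-removed) spectral power inside [f_lo, f_hi] Hz.

    Uses a plain O(n^2) DFT so that the example has no dependencies.
    """
    n = len(window)
    mean = sum(window) / n
    x = [v - mean for v in window]  # remove the constant offset
    total = 0.0
    in_band = 0.0
    # Only positive frequencies up to the Nyquist frequency are needed.
    for k in range(1, n // 2 + 1):
        coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power = abs(coeff) ** 2
        freq = k * fs / n
        total += power
        if f_lo <= freq <= f_hi:
            in_band += power
    return in_band / total if total > 0 else 0.0


fs = 50.0  # assumed sample rate in Hz
t = [i / fs for i in range(200)]  # one 4-second window
slow_oscillation = [math.sin(2 * math.pi * 2.0 * s) for s in t]  # 2 Hz, inside the band
fast_oscillation = [math.sin(2 * math.pi * 8.0 * s) for s in t]  # 8 Hz, outside the band

frac_slow = band_power_fraction(slow_oscillation, fs)
frac_fast = band_power_fraction(fast_oscillation, fs)
```

A detector of this kind would compare the returned fraction against a threshold; any such threshold would have to be chosen or tuned separately and is deliberately left out here.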
A problem with these different window-based feature extraction methods is that signals acquired in daily life are highly non-stationary. When these non-stationarities occur within a window, for example, the transition from standing to gait, they may reduce the usefulness of the extracted features, particularly in the case of STFT (as we will further discuss in Section 4).
These different gait detection systems not only vary in the features they rely on, but also in the classification algorithm they use. "Traditional" classification techniques such as support vector machines and random forest classifiers are commonly trained on window-based features [45,46]. In addition, HMMs have also been used to detect gait based on window features, which offers the potential advantage of incorporating the sequential nature of human behavior [47,25,48]. Haji et al. [48] trained a hierarchical HMM on frame variance, raw data and second order polynomial coefficients and demonstrated that, in more challenging settings, this method significantly improves gait detection compared to, for example, peak detection and dynamic time warping (explained below). Despite the heterogeneity in gait patterns, gait detection is generally treated as a binary classification problem (gait/non-gait).
Other approaches avoid feature extraction altogether. For example, a stride template can be formed offline and online similarity to the template be determined (e.g. via cross-correlation [49,50] or dynamic time warping [51]). However, this approach is not very practical for detection of pathological gait whose temporal pattern can vary significantly even within the same individual. More recently, generic activity recognition pipelines based on deep learning methods [52,53] are being introduced, although the scarcity of labelled free living data currently limits their practical use.
Characterization of the gait pattern Once gait episodes have been identified, studies have used various approaches to characterize the gait pattern in movement disorders such as PD. Many studies try to identify important events of the gait cycle, including the heel strike or initial contact (IC), and final contact (FC) of both feet. Several variations to peak detection have been used for this, which may benefit from pre-processing the acceleration signal using continuous wavelet transforms (CWT) [54]. The timing of IC and FC events is then used to compute temporal gait features such as step time, swing time, stance time, and double support time. Additionally, based on assumptions about the exact sensor positioning and the biomechanics of gait, location-specific algorithms can be used to estimate spatial gait features. For example, having identified the ICs and FCs, one can use the inverted pendulum model to estimate the step length from the accelerometer signal of a sensor on the lower back [55]. Del Din et al. [56] used this approach and showed that free living gait analysis discriminated better between PD patients and healthy controls than lab-based gait analysis, which illustrates the potential of free living gait analysis. Moore et al. [57] suggested that the step length estimated using an ankle sensor could be used to track the free living gait pattern of PD patients, but only included three PD patients monitored over 24 hours in an apartment-like setting.
Other approaches focus on analyzing the periodicity of the accelerometer signal during gait, either based on the PSD or auto-correlation in the time domain. An advantage of these methods is that they are less dependent on location-specific assumptions, compared to identifying gait cycle events and computing the step length. For example, Weiss et al. [58] computed the width of the dominant frequency in the PSD during free living gait (based on the accelerometer signal from a lower back sensor), and demonstrated that it could be used to predict future falls in patients with PD. Similarly, Rispens et al. [59] computed the PSD during free living gait based on a lower back accelerometer, and showed that the spectral power in the lower frequencies, and the amplitude and slope of the dominant frequency, were related to the number of falls in older adults. Pérez-López et al. [60] combined the identification of ICs with analysis of the PSD during individual strides, and showed that the power in the gait range (based on a waist accelerometer) was correlated to changes after medication intake in PD patients. Bellanca et al. [61] suggested that the harmonic ratio (ratio of the sum of the amplitudes of the even and uneven harmonics, computed over the PSD of a single stride) could be used as a measure of step symmetry. Alternatively, the periodicity of free living gait can also be analyzed in the time domain, for example by estimating the auto-correlation [62]. All analyses mentioned in this paragraph strongly depend on accurate localization of stationary gait segments, which may be sub-optimal given current gait detection algorithms. In this work, we propose that free living gait analysis can be improved by employing a unified approach to gait detection and gait pattern characterization.

Free living data collection
Most gait modelling pipelines are both designed and tested on data recorded under controlled lab conditions. In order to allow for a more realistic understanding of the challenges of modelling free living gait data, we have used a new reference dataset from the Parkinson@Home validation study. This study includes sensor data and video recordings during uninterrupted and unscripted daily life activities in the participants' natural environment. In brief, 25 patients with Parkinson's disease (PD group) and 25 age-matched participants without PD (non-PD group) were recruited. Inclusion criteria for both groups consisted of: (1) age 30 years or older and (2) in possession of a smartphone running on Android OS version 4.4 or higher. Additional inclusion criteria for participants in the PD group were: (1) diagnosed with PD by a neurologist, (2) receiving treatment with dopaminergic medication (levodopa and/or dopamine agonist), (3) experiencing motor fluctuations (MDS-UPDRS item 4.3 ≥ 1), and (4) known to have PD-related gait abnormalities, i.e. bradykinetic and/or freezing of gait (MDS-UPDRS item 2.12 ≥ 1 and/or item 2.13 ≥ 1). PD patients who received advanced treatment (deep brain stimulation and/or intestinal infusion of levodopa or apomorphine) were excluded.
Figure 2: Illustrative example of feature smoothing over accelerometer data during unscripted gait. The top panel displays the magnitude of 10 minutes of pre-processed accelerometer data collected using a smartphone placed in the front trouser pocket. The next panel below displays the standard deviation over 1 second windows, which is a descriptive feature of the switching gait patterns in the considered example. In the following panels we display the filtered values of the feature using: a standard moving average in time (5 seconds); a "conditional" moving average with the same window, conditioned on the states identified by the AR-iHMM (displayed in red, blue and green). Note: the white segment contains multiple rapidly switching states as identified by the AR-iHMM (not shown).
Participants were visited in their own homes and each visit included a standardized clinical assessment (full MDS-UPDRS [63] and AIMS [64]) and an unscripted free living assessment of at least one hour. To ensure indicative behaviors such as longer gait cycles were captured, assessors encouraged participants to include these in their routines. Participants in the PD group were asked to skip their morning dose of dopaminergic medication before the visit, so that they were in the OFF medication state at the start of the visit. After the MDS-UPDRS part III (motor examination) and free living assessment were conducted in the OFF state, participants took their usual medication and the full MDS-UPDRS, AIMS and free living assessment were performed in the ON state, i.e. with the symptomatic effects of medication present.
During the full visit, participants wore various light-weight sensors on different body locations. In this study, we used the accelerometer data from the smartphone worn in the front trouser hip pocket (collected using the HopkinsPD app [65]; all participants were instructed to wear trousers with a front pocket), and the accelerometer data from Physilog 4 devices worn on both ankles, both wrists and the lower back. To allow for time synchronization, all devices were triggered together (hit ten times against a table) in front of the video camera at the beginning and end of data collection.
The video recordings during the free living assessments were annotated by a research assistant, who labeled as "gait" any activity that involved at least 5 consecutive steps, with the exception of any running episodes.

Challenges of modelling free living gait
Because of its simplicity, robustness and affordability, the 3-axis accelerometer is by far the most widely used sensor for free living gait analysis. The accelerometer sensor measures the vector sum of all sources of acceleration acting on the device in each spatial direction. The unit of measurement is m/s², and if the device is not subject to other sources of acceleration, the only acceleration measured by the device is due to the force of gravity (zero magnitude under free-fall). An example of accelerometer data collected during the free living assessment is shown in Figure 1.
Analysis of free-living gait is challenging because accelerometer data simultaneously reflects disease symptoms, behaviour, device orientation, sensor location and environment. This makes it difficult to design a reliable analytical pipeline which untangles these factors and allows us to focus solely on representative aspects of the gait that are relevant for monitoring PD. To make a necessary step in this direction, we highlight some of the common estimation challenges which we aim to address. We use examples from the unscripted free living assessments of the Parkinson@Home validation study.
Device orientation Accelerometers measure any forces due to accelerations which partly prevent the device from free-fall in the Earth's gravitational field. If we are interested in monitoring gait, however, we first need to remove this field effect from the raw accelerometer data, as irrelevant device rotations (e.g. slight variations in the attachment of the sensor) may otherwise confound any inferences we make about a person's gait. This analytical step is most commonly done using fusion of data from a magnetometer, gyroscope and accelerometer [66], or simply using a digital low pass filter [67] applied to the accelerometer signal. Sensor fusion is well justified from a physical modelling point of view, but it is not very accurate in practice during fast motion [68]. On the other hand, low pass filters are poorly justified, since unwanted orientation changes can be rapid with a broad bandwidth, leading to unwanted distortions in the time domain depending on the cut-off frequency of the filter. In this work we opt for a piecewise $\ell_1$-trend filter as motivated in Badawy et al. [69], which assumes that changes due to orientation are piecewise linear [70].
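For reference, the generic $\ell_1$ trend filtering problem can be written as below. This is the standard textbook formulation, given here only as an assumption about the general form of the filter; the exact variant and the value of the regularization parameter $\lambda$ used in [69] are not specified in this section, and the symbols $a_t$ (one axis of the raw accelerometer signal) and $g_t$ (the fitted trend) are introduced purely for illustration.

```latex
\hat{g} = \arg\min_{g \in \mathbb{R}^{T}} \;
  \frac{1}{2} \sum_{t=1}^{T} \left( a_t - g_t \right)^2
  + \lambda \sum_{t=2}^{T-1} \left| g_{t-1} - 2 g_t + g_{t+1} \right|
```

The $\ell_1$ penalty on second differences favours solutions in which most second differences are exactly zero, so that $\hat{g}$ is piecewise linear in time, which matches the piecewise linear assumption on orientation changes stated above.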
The accelerometer data we use in any subsequent analysis is pre-processed by interpolating to a uniform sample rate (using cubic spline interpolation [71]), applying the $\ell_1$-trend filter to each individual axis and computing the magnitude of acceleration according to $\sqrt{a_x^2 + a_y^2 + a_z^2}$.
Parsimonious representation of gait data We have seen in Section 2 that most pipelines for analysis of gait data involve windowing of the sensor data and estimation of some statistical or spectral features. The estimated feature values are then used to make inferences about the behaviour monitored at that point in time (i.e. gait vs non-gait) or the gait pattern. However, in free living the variability of these features is large. For example, in Figure 3 we show how much the window standard deviation varies for both gait and non-gait classes, even within a single individual.
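The magnitude computation and the window standard deviation feature described above can be sketched in a few lines of Python. This is a minimal illustration on synthetic data, not the authors' pipeline: it skips the interpolation and trend filtering steps, and the function names, the 50 Hz sample rate and the non-overlapping 1 second windows are assumptions made for the example.

```python
import math
import statistics


def magnitude(ax, ay, az):
    """Per-sample magnitude sqrt(ax^2 + ay^2 + az^2) of a 3-axis signal."""
    return [math.sqrt(x * x + y * y + z * z) for x, y, z in zip(ax, ay, az)]


def window_std(signal, fs, window_seconds=1.0):
    """Standard deviation over consecutive non-overlapping windows."""
    width = int(round(fs * window_seconds))
    return [
        statistics.pstdev(signal[start:start + width])
        for start in range(0, len(signal) - width + 1, width)
    ]


fs = 50  # assumed sample rate in Hz
# Synthetic data: 2 seconds without movement, then 2 seconds of a 2 Hz oscillation.
az = [1.0] * 100 + [1.0 + 0.5 * math.sin(2 * math.pi * 2.0 * i / fs) for i in range(100)]
ax = [0.0] * 200
ay = [0.0] * 200

mag = magnitude(ax, ay, az)
stds = window_std(mag, fs)  # one value per 1 second window
```

In this toy signal the feature is zero in the still windows and clearly positive in the oscillating ones; in free living data the two classes overlap far more, which is the variability the text refers to.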
To reduce some of this variability, we tend to aggregate feature values (i.e. across time, across individuals, across similar behaviours). The way we make such aggregation will inevitably affect the quality of the inferences we make.
Let us consider the following example: we have 10 minutes of consecutive gait data from a PD patient where the gait varies significantly across different segments. In Figure 2 we plot how much, for example, the (1 second) window standard deviation varies in time and how the variation can be reduced by smoothing through time (using a moving average). The underlying assumption is that feature values collected closely in time should be similar (i.e. change smoothly). However, when monitoring heterogeneous behaviours such as gait, this assumption does not hold. In contrast, we also display the "conditional" moving average obtained if we first segment the series into 4 stationary states (using the proposed AR-iHMM) and only smooth data that belong to the same state.
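The difference between the two smoothing schemes can be made concrete with a small Python sketch. It shows one possible reading of the "conditional" moving average, in which each value is averaged only with neighbours in the window that share its state label; the function names and the toy data are assumptions, and the state labels are simply given rather than inferred by an AR-iHMM.

```python
def moving_average(values, width):
    """Centred moving average; the window is truncated at the edges."""
    half = width // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def conditional_moving_average(values, states, width):
    """Moving average that only pools neighbours sharing the centre's state."""
    half = width // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        same = [values[j] for j in range(lo, hi) if states[j] == states[i]]
        out.append(sum(same) / len(same))
    return out


# Toy feature series with an abrupt change between two states.
values = [1.0] * 5 + [3.0] * 5
states = [0] * 5 + [1] * 5

plain = moving_average(values, width=5)
conditional = conditional_moving_average(values, states, width=5)
```

Around the change point the plain moving average blends the two levels, whereas the conditional version keeps the step intact because it never averages across a state boundary.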
A similar argument can be made for the features based on the spectrum. A common "building block" for the estimation of the spectrum is the short-time Fourier transform (STFT). Gait detection based on STFT assumes that if the data contains sufficient spectral power within the range of normal gait cadence, it represents gait activity [15]. However, the support at the different frequencies also exhibits large variation both within gait and non-gait classes, as shown in Figure 4. Similar to standard deviation, more stable spectral estimation can be done by aggregating across neighbouring windows, e.g. using Welch's overlapped averaging power spectral density (PSD) estimator [72, 42, Section 7.4]. The problem is that this still assumes that the signal is stationary across the windows, which is often not the case in free living data because of abrupt changes in behaviour (e.g. changing pace, turning, starting to make gestures), environment (e.g. changing walking terrain), and PD symptoms (e.g. hesitations to walk through doorways) which affect the characteristics of the gait pattern. If we ignore such variability when choosing the gait segment boundaries, we would obtain less useful spectrum estimates, damaging any further inference. Figure 5 displays an example interval of consecutive PD gait where the gait pattern changed abruptly within the interval. We show that the Welch PSD associated with each of the two gait patterns varies significantly from the Welch PSD estimated when grouping both gait patterns together. An additional problem that arises with estimating Fourier features in free living is that the accurate estimation of the spectrum at the gait frequencies rests on the assumption of periodic continuation [42]. Because of common non-stationarities in free living (e.g. the changes within gait episodes mentioned above, but also the start and end of gait episodes), violations of this assumption are common and can lead to spurious spectral artifacts, for example caused by the Gibbs phenomenon [42]. Typically, these issues are ameliorated by using window functions other than the rectangular window, such as the Hanning window [73]. However, while windowing matches samples at window edges (by zeroing), it also distorts the waveform because it causes amplitude modulation. In conclusion, the usefulness of spectral estimates largely depends upon accurately locating stationary segments in time, i.e. by accurately detecting the start and end of gait episodes, and by detecting (abrupt) changes within gait episodes. At the same time, doing this depends on having access to spectral estimates. Because of this interdependence, we propose a unified framework that addresses both these problems simultaneously.
Figure 5: Illustrative example of estimating the power spectral density over an unscripted gait segment that contains switches between two different gait patterns (i.e. is approximately piece-wise stationary). The top panel displays the signal magnitude of 20 seconds of pre-processed gait data from a PD patient, obtained from a smartphone worn in the trouser pocket. The red and blue shading indicates different gait states (as identified by the AR-iHMM). The bottom panels display the Welch's PSD estimates: for data from the red gait state (left); for data from the blue gait state (right); for an equal amount of data from both (middle). To allow for the same resolution in the 3 bottom plots, we have used 20 seconds of data for each plot.
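The effect of pooling two different patterns into one Welch estimate can be reproduced on synthetic data with the minimal Python sketch below. It is only an illustration of the argument, not the analysis behind Figure 5: the two "gait patterns" are pure sinusoids at 1 Hz and 2 Hz, the sample rate and segment length are arbitrary choices, and `welch_psd` is a small hand-written estimator (Hann window, 50% overlap, density scaling by `1 / (fs * sum(w^2))`, without the factor of two used for one-sided spectra).

```python
import cmath
import math


def welch_psd(x, fs, seg_len):
    """Welch PSD estimate: Hann-windowed segments with 50% overlap, averaged."""
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * n / seg_len) for n in range(seg_len)]
    scale = fs * sum(w * w for w in hann)
    step = seg_len // 2
    n_bins = seg_len // 2 + 1
    psd = [0.0] * n_bins
    count = 0
    for start in range(0, len(x) - seg_len + 1, step):
        seg = x[start:start + seg_len]
        mean = sum(seg) / seg_len
        tapered = [(v - mean) * w for v, w in zip(seg, hann)]
        for k in range(n_bins):
            c = sum(tapered[n] * cmath.exp(-2j * math.pi * k * n / seg_len)
                    for n in range(seg_len))
            psd[k] += abs(c) ** 2 / scale
        count += 1
    freqs = [k * fs / seg_len for k in range(n_bins)]
    return freqs, [p / count for p in psd]


fs = 20.0  # assumed sample rate in Hz
n = 200    # 10 seconds per pattern
pattern_a = [math.sin(2 * math.pi * 1.0 * i / fs) for i in range(n)]  # 1 Hz
pattern_b = [math.sin(2 * math.pi * 2.0 * i / fs) for i in range(n)]  # 2 Hz

freqs, psd_a = welch_psd(pattern_a, fs, seg_len=100)
_, psd_b = welch_psd(pattern_b, fs, seg_len=100)
_, psd_pooled = welch_psd(pattern_a + pattern_b, fs, seg_len=100)
```

Each separate estimate has a single peak at the frequency of its own pattern, while the pooled estimate shows two lower peaks and therefore describes neither pattern well.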

Probabilistic modelling of gait
Most systems focus on segmenting accelerometer data into gait vs. non-gait classes. By contrast, we first segment the data into multiple different groups (more than two) and afterwards assign these groups to the gait or non-gait class. We do this efficiently by designing a flexible, probabilistic model which is trained directly on the magnitude of acceleration obtained after removing piecewise linear device orientation changes (see Section 4). The proposed model does not rely on any labels, hence it is more objective than a supervised classification approach and reduces the risks of over-fitting.
Autoregressive modelling of gait The first assumption we make is that the repetitiveness of the gait cycle (heel strike, midstance, heel off, midswing, heel strike) is one of the key properties that characterize gait episodes. The periodic nature of the accelerometer data during gait [62] makes it efficient to detect and model gait based on the spectrum, for example using the Fourier transform. However, Fourier spectral analysis inherently assumes periodic continuation (see Section 4). We address this problem by simultaneously estimating the spectrum and the start and end points of the stationary gait episodes. To achieve this, we first model the spectrum of the gait in the time domain, using an autoregressive (AR) process [42]. An order $r$ AR model is a random process which describes a sequence $\{x_t\}_{t=1}^{T}$ as a linear combination of previous values in the sequence and a stochastic term:
$$x_t = \sum_{j=1}^{r} A_j x_{t-j} + e_t,$$
where $A_1, \ldots, A_r$ are the AR coefficients, $T$ denotes the length of the sequence and $e_t$ is a zero mean random variable, assumed to be an i.i.d. Gaussian sequence (we can trivially extend the model such that $e_t \sim \mathcal{N}(\mu, \sigma^2)$ for any real-valued $\mu$). We assume that the AR noise variance $\sigma^2$ is unknown and place a conjugate inverse-Wishart prior over it. This essentially means that in addition to modelling the periodicity of the input signals, we also account for changes in the non-periodic components of the signals. We saw in Section 4 that the window variance of the acceleration can be a useful discriminator of gait versus non-gait on its own in certain scenarios. If we assume an AR model of order $r = 0$, the variance of $e_t$ is the variance of the window.
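To make the AR definition concrete, the following Python sketch simulates an order $r$ AR process with Gaussian noise and recovers the noise terms as one-step prediction residuals. It is a minimal illustration under stated assumptions: the coefficients are fixed by hand rather than inferred, the initial conditions are set to zero, and the function names and parameter values are hypothetical.

```python
import random
import statistics


def simulate_ar(coeffs, sigma, length, seed=0):
    """Draw x_1..x_T from x_t = sum_j A_j x_{t-j} + e_t with Gaussian e_t."""
    rng = random.Random(seed)
    r = len(coeffs)
    x = [0.0] * r  # zero initial conditions (an assumption of this sketch)
    for _ in range(length):
        past = sum(a * x[-j] for j, a in enumerate(coeffs, start=1))
        x.append(past + rng.gauss(0.0, sigma))
    return x[r:]


def one_step_residuals(x, coeffs):
    """Residuals e_t = x_t - sum_j A_j x_{t-j} for all t with a full history."""
    r = len(coeffs)
    return [
        x[t] - sum(a * x[t - j] for j, a in enumerate(coeffs, start=1))
        for t in range(r, len(x))
    ]


# A stable, oscillatory AR(2) process with noise standard deviation 0.1.
coeffs = [1.8, -0.9]
x = simulate_ar(coeffs, sigma=0.1, length=5000)
residual_std = statistics.pstdev(one_step_residuals(x, coeffs))

# Order r = 0: the signal is its own residual, so the noise variance is the signal variance.
white = simulate_ar([], sigma=0.1, length=5000, seed=1)
white_std = statistics.pstdev(white)
```

With the true coefficients the residual spread is close to the noise level even though the signal itself oscillates with a much larger amplitude, and the $r = 0$ case reduces to the plain variance feature discussed in the text.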
AR processes are commonly used as parametric models of the PSD since the power spectrum is determined by the AR parameters [42]:
$$S(f) = \frac{\sigma^2}{\left| 1 - \sum_{j=1}^{r} A_j \exp(-2\pi i f j) \right|^2},$$
where $f$ is the frequency variable and $i$ denotes the imaginary unit. This means that the number of non-zero AR coefficients determines the complexity of the PSD which the model can represent: there is a peak in the PSD for each complex-conjugate pair of roots of the coefficient polynomial. Parametric spectral estimation is often more stable than non-parametric PSD methods, and can be of high quality using fairly little data, assuming the model is correct. By contrast, non-parametric PSD estimation methods require more data to produce stable estimates, unless we trade some of the frequency resolution via averaging, as in Welch's PSD [42]. The parametric model of the spectrum will allow us to construct a flexible, non-parametric model of the switching dynamics of different gait and non-gait activities in free living. A more detailed discussion on the relative merits of different spectral estimation methods combined with machine learning can be found in Little [42].
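The link between AR coefficients and spectral peaks can be checked numerically. The Python sketch below evaluates the standard AR spectrum $\sigma^2 / |1 - \sum_j A_j \exp(-2\pi i f j)|^2$ on a grid, assuming $f$ is expressed in cycles per sample, for an AR(2) process built from one complex-conjugate pole pair. The pole radius, the target frequency and the function name `ar_psd` are assumptions chosen for illustration.

```python
import cmath
import math


def ar_psd(coeffs, sigma2, f):
    """AR power spectrum at normalized frequency f (cycles per sample)."""
    denom = 1 - sum(a * cmath.exp(-2j * math.pi * f * j)
                    for j, a in enumerate(coeffs, start=1))
    return sigma2 / abs(denom) ** 2


# AR(2) with a complex-conjugate pole pair at radius rho and angle theta.
rho, f0 = 0.95, 0.1
theta = 2 * math.pi * f0
coeffs = [2 * rho * math.cos(theta), -rho ** 2]

# Evaluate the spectrum between 0 and the Nyquist frequency (0.5 cycles per sample).
grid = [k / 1000 for k in range(0, 501)]
psd = [ar_psd(coeffs, 1.0, f) for f in grid]
peak_f = grid[max(range(len(psd)), key=psd.__getitem__)]
```

The single pole pair produces a single spectral peak close to `f0`; multiplying `peak_f` by the sample rate would convert it to Hz. With an empty coefficient list the same function returns a flat spectrum equal to the noise variance.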
High order adaptive autoregressive processes As mentioned above, the AR order $r$ we use will determine the complexity of this parametric model of the spectrum. The optimal AR order $r$ is likely to vary across different stationary segments of sensor data, and choosing a fixed $r$ which is too large will lead to problems with parameter estimation (fitting the AR coefficients). At the same time, gait is typically characterized by a low fundamental frequency, with bandwidth of up to 10-15 Hz (see Section 2). This implies the need for AR processes of fairly high order $r$ (together with a sufficiently high sample rate) in order to accurately capture the typical range of gait frequencies. To address this conflict, we use a non-conjugate Bayesian prior on the AR coefficients $A_1, \ldots, A_r$ which induces sparsity of the coefficients (only a few are non-zero at any one time) and allows us to identify the autoregressive model coefficients that do not contribute to the underlying dynamics of the gait. In effect, this means that we attempt to learn fewer than $r$ AR coefficients supported by the signal, but potentially associated with larger AR time delays. This is done by assuming independent, zero-mean Gaussian priors on the coefficients $A_1, \ldots, A_r$ with unknown precisions, which acts as an automatic relevance determination (ARD) prior [74]. The ARD prior was first proposed in the context of neural network models in MacKay [74], later adopted for discrete latent variable models in Beal [75], and for switching AR processes in Fox et al. [76].
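Written out, the ARD construction described above takes the following form. The first line restates the independent zero-mean Gaussian priors with unknown precisions; the symbol $\alpha_j$ for the precision and the Gamma hyperprior in the second line (with hypothetical hyperparameters $a$, $b$) are assumptions made here, reflecting the common textbook formulation rather than a choice stated in this section.

```latex
\begin{align}
A_j \mid \alpha_j &\sim \mathcal{N}\!\left(0, \alpha_j^{-1}\right), \qquad j = 1, \ldots, r, \\
\alpha_j &\sim \mathrm{Gamma}(a, b) \qquad \text{(assumed hyperprior)}.
\end{align}
```

When the data do not support lag $j$, the inferred precision $\alpha_j$ becomes large and $A_j$ is shrunk towards zero, which effectively prunes that lag from the model.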
Latent switching behavior dynamics To analyze free living data, it is insufficient to define a parametric spectral model for the patients' gait, because participants regularly switch between different gait and non-gait episodes, which results in highly non-stationary time series (see Figure 1 and Figure 5).
Even within gait episodes, the optimal AR parameters to model the gait might change depending on the speed, amplitude and other characteristics of the walking pattern. In order to group similar gait signals, but also separate gait from non-gait data, we use a switching AR process model [77] (AR-HMM). However, one drawback of conventional switching AR processes is that they require a fixed number of hidden states and a fixed AR order. Since the heterogeneity in both gait and non-gait episodes will increase as more free living data becomes available, we adapt the more flexible non-parametric switching AR process first proposed in Fox et al. [21]. The model can be thought of as an infinite-state extension of the model of Kim [77] (AR-iHMM). If we view the switching AR model as a hidden Markov model (HMM) with AR processes used to model the HMM emissions, then in the non-parametric switching AR model the parametric HMM is effectively replaced with an infinite HMM [78].

Table 1: Gait detection performance of the proposed AR-iHMM and of common gait detection algorithms (using the thresholds reported in the literature, and after pre-processing and optimizing thresholds). We have computed the average performance and standard deviation using leave-one-subject-out cross-validation. For PD patients, we show the performance of the complete free-living assessments, and the difference in balanced accuracy (average of sensitivity and specificity) between the parts before and after medication intake.
In the AR-iHMM model, we assume that the data is an inhomogeneous stochastic process and that multiple AR models are required to represent the dynamic structure of the signal, i.e.:

$$x_t = \sum_{j=1}^{r} A_j^{z_t} x_{t-j} + e_t^{z_t},$$

where $z_t \in \{1, \ldots, K^+\}$ indicates the AR model associated with time index $t$. The latent variables $z_1, \ldots, z_T$ describing the switching process are modelled with a Markov chain. A transition matrix $\pi$ is estimated with $K^+$ rows and $K^+ + 1$ columns indicating the probability of specific transitions from existing state $i$ to existing state $j$, $\pi_{ij}$, or from existing state $i$ to a new state $K^+ + 1$, $\pi_{i, K^+ + 1}$. Transitions that are observed more often during the training of the model will have higher probability, represented in the transition term $\pi_{ij}$.
When $K^+ \ll T$, this model clusters together parts of the signal into an a priori unknown number $K^+$ of time segments which are best represented with the same AR coefficients. In the AR-iHMM, $K^+$ is unknown: instead of being fixed, it is inferred from the data and can adapt to new, unseen structure in the data. The AR-iHMM is obtained by augmenting the transition matrix $\pi$ of the Markov process underlying the latent variables $z_1, \ldots, z_T$ with a hierarchical Dirichlet process (HDP) [79] prior.
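The generative structure of a switching AR process can be sketched as follows. To keep the example short, it uses a fixed, finite number of states and a known transition matrix, so it corresponds to the parametric AR-HMM rather than the non-parametric AR-iHMM with its HDP prior; the function name `simulate_switching_ar` and all parameter values are assumptions chosen for illustration.

```python
import numpy as np

def simulate_switching_ar(coeffs_per_state, sigmas, trans, T, seed=0):
    """Simulate x_t = sum_j A_j^{z_t} x_{t-j} + e_t^{z_t} with Markov states z_t."""
    rng = np.random.default_rng(seed)
    coeffs_per_state = np.asarray(coeffs_per_state, dtype=float)  # shape (K, r)
    trans = np.asarray(trans, dtype=float)  # shape (K, K), rows sum to one
    K, r = coeffs_per_state.shape
    z = np.zeros(T, dtype=int)  # the chain starts in state 0
    x = np.zeros(T)
    for t in range(T):
        if t > 0:
            z[t] = rng.choice(K, p=trans[z[t - 1]])
        # Lagged values x_{t-1}, ..., x_{t-r}; zeros before the start of the signal
        past = np.array([x[t - j] if t - j >= 0 else 0.0 for j in range(1, r + 1)])
        x[t] = coeffs_per_state[z[t]] @ past + sigmas[z[t]] * rng.standard_normal()
    return x, z

# Hypothetical two-state example: an oscillatory "gait-like" state and a noise-only state
coeffs_per_state = [[1.5, -0.9], [0.0, 0.0]]
sigmas = [0.5, 0.1]
trans = [[0.99, 0.01], [0.02, 0.98]]
x, z = simulate_switching_ar(coeffs_per_state, sigmas, trans, T=2000)
```

Large self-transition probabilities yield long segments that share the same AR coefficients, which mirrors the clustering of the signal into a small number of recurring dynamics described above.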

Empirical comparison of gait detection algorithms
In order to make inferences about the gait pattern, we first need to verify that our proposed framework is able to identify gait segments. In this section, we evaluate our ability to detect gait as annotated in the video recordings of the Parkinson@Home validation study. To establish whether our approach achieves reasonable results, we include a comparison with widely used gait detection algorithms. It is important to note that our goal was not to maximize gait detection accuracy (when compared with human annotation), but to locate gait segments in time, that are useful to study the effect of PD on the gait pattern.
In this section, we use the pre-processed accelerometer data (see Section 4) from the smartphones and Physilog 4 devices placed on various body locations (see Section 3). We infer the AR-iHMM described in Section 5 using the scalable iterative MAP inference proposed in Raykov et al. [80]. Any hyperparameters associated with the AR state priors or the HDP prior (see Section 5) are fixed across patients and are selected using standard Bayesian model selection. To enable direct comparison, we assign each point $x_t$ to its most likely state $z_t = k^*$, i.e. we ignore the estimated uncertainty associated with the segmentation indicators.
To determine if the identified hidden Markov states should be classified as gait or non-gait, we consider the AR-based PSD estimates associated with each state. Specifically, we compute the total energy at frequencies in the range 0.5-10 Hz, and select a threshold of minimal spectral energy that maximizes the balanced accuracy (average of sensitivity and specificity) averaged across participants (measured against the manual video annotations for the presence of gait). We evaluate the performance of selecting the threshold using leave-one-subject-out cross-validation. Thresholding using a shared PSD range across participants is done only to enable a fair and intuitive comparison with the other commonly used techniques for detection of gait in smartphones and wearables; in principle, once the AR-iHMM model is trained, we can derive multiple features related to the distribution of the sensor data and train a supervised classifier on these features.
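The threshold selection step can be sketched as a simple search over candidate thresholds. This simplified example pools all labelled points and omits both the per-participant averaging and the leave-one-subject-out loop described above; the function names `balanced_accuracy` and `best_threshold` and the toy data are assumptions, and both classes must be present in the labels.

```python
import numpy as np

def balanced_accuracy(y_true, y_pred):
    """Average of sensitivity and specificity for boolean labels."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    sensitivity = np.mean(y_pred[y_true])
    specificity = np.mean(~y_pred[~y_true])
    return 0.5 * (sensitivity + specificity)

def best_threshold(energy, is_gait):
    """Return the energy threshold (gait if energy >= threshold) with the highest balanced accuracy."""
    energy = np.asarray(energy, dtype=float)
    candidates = np.unique(energy)  # sorted unique values observed in the data
    scores = [balanced_accuracy(is_gait, energy >= c) for c in candidates]
    return candidates[int(np.argmax(scores))]

# Hypothetical band energies for six segments, with video-based gait labels
energy = [0.1, 0.2, 0.3, 2.0, 2.5, 3.0]
is_gait = [False, False, False, True, True, True]
threshold = best_threshold(energy, is_gait)
```

In a leave-one-subject-out setting, `best_threshold` would be called on the data of all but one participant and the resulting threshold evaluated on the held-out participant.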
For the comparison with existing algorithms, we implemented the following widely used generic gait detection algorithms: STD-thresholding [33,15]; STFT-thresholding [36]; normalized autocorrelation step detection and counting (NASC) [43]; and continuous wavelet transform (CWT) thresholding [81]. We evaluate the performance of the original formulations of the algorithms, and the performance after applying our pre-processing pipeline and adjusting the corresponding thresholds to maximize the balanced accuracy across participants using leave-one-subject-out cross-validation¹:
• STD-thresholding: we set a threshold based on the 1 second window standard deviation which optimally discriminates gait from non-gait classes;
• STFT-thresholding: we set a threshold based on the 1 second window total energy at frequencies in the range 0.5-10 Hz to maximize the balanced accuracy averaged across participants;
• NASC algorithm: the NASC involves first applying STD-thresholding and then evaluating the autocorrelation of the remaining data over 2 second windows, specifically looking at the time delays representative of gait. We set a modified STD threshold, a range of delays, and an autocorrelation threshold to maximize the balanced accuracy averaged across participants (iteratively, one at a time);
• CWT-thresholding: we compute the ratio between the energy in the band of walking frequencies and the total energy across all frequencies, and set a threshold to maximize the balanced accuracy averaged across participants.
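As an illustration of the simplest baseline in the list above, the following sketch implements STD-thresholding on non-overlapping 1 second windows. It is a generic reading of the description, not the exact implementation of the cited references; the function names, the choice of non-overlapping windows and the example threshold are assumptions.

```python
import numpy as np

def window_std(signal, fs, window_sec=1.0):
    """Standard deviation of the signal in consecutive non-overlapping windows."""
    signal = np.asarray(signal, dtype=float)
    n = int(round(fs * window_sec))  # samples per window
    n_windows = len(signal) // n     # trailing samples that do not fill a window are dropped
    windows = signal[:n_windows * n].reshape(n_windows, n)
    return windows.std(axis=1)

def std_threshold_detector(signal, fs, threshold, window_sec=1.0):
    """Label a window as gait when its standard deviation exceeds the threshold."""
    return window_std(signal, fs, window_sec) > threshold

# Hypothetical example: two seconds of rest followed by two seconds of a 2 Hz oscillation
fs = 50
t = np.arange(2 * fs) / fs
signal = np.concatenate([np.zeros(2 * fs), np.sin(2 * np.pi * 2.0 * t)])
detected = std_threshold_detector(signal, fs, threshold=0.3)
```

Because the detector only looks at signal variance, any sufficiently vigorous non-gait movement will also exceed the threshold, which is one motivation for the spectral approaches in the remainder of the list.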
In Table 1 we report the performance of the different methods when applied to the smartphone data. In Figure 6, we plot the trade-off between sensitivity and specificity as we vary the threshold in the receiver operating characteristic (ROC) curve for the different methods. Finally, in Table 2 we evaluate all algorithms on data from different accelerometry devices, placed at different body locations (see Section 3).

Inclusion criteria
It should be noted that we do not provide a comprehensive comparison of all available gait detection algorithms: we have omitted some because (1) they were largely based on heuristics which could not be trivially adapted for detection of pathological gait; (2) they demonstrated very poor performance on our dataset; or (3) they had strong conceptual overlap with the techniques included in the comparison. In addition, we have excluded activity recognition pipelines that rely on a large number of features to detect gait. The main reason for this is the inherent curse of dimensionality, which we want to account for in most health monitoring applications. The huge variation in unconstrained free living data, coupled with the fairly limited amount of labeled data (in the Parkinson@Home validation study, a few hours from 50 participants), suggests that there is a high risk of overfitting our training data and collecting unreliable gait measurements out-of-sample in the unsupervised setup. Because of this, we propose that one should aim for principled gait detection criteria, rather than simply maximizing the accuracy on training datasets.

Discussion
First of all, the results in Table 2 show that it is feasible to identify gait using our modelling approach, with at least as good average performance compared to existing algorithms. In addition, the results underline the importance of appropriate pre-processing and threshold adjustment when applying algorithms to patients with PD.
In most methods, we observe a difference in accuracy between PD patients and controls, and between before and after medication intake for PD patients. The latter is most notable in patients with a strong response to medication. This difference in accuracy between before and after medication intake was less prominent for the AR-iHMM, which also demonstrated less variability in the performance across PD patients. Moreover, the performance of the AR-iHMM was relatively robust to different body locations of the sensor in comparison to STD-thresholding, NASC, and CWT-thresholding (Table 2). While balanced accuracy is fairly similar, the AR-iHMM consistently captures longer gait segments but misses some shorter walks, whereas the remaining techniques are more accurate on shorter periods but interrupt long, consistent walking intervals.
It is worth noting that the occurrences (prevalence) of the gait/non-gait classes are not balanced in the free living assessments from the Parkinson@Home validation study. Across PD patients, the mean walking time is 16% with a standard deviation of 6%. The mean walking time is slightly higher for non-PD controls at 21% with the same standard deviation. Because we cannot assume that this prevalence is representative of truly free living situations, we choose to evaluate the methods with measures that are independent of the prevalence of gait (i.e. sensitivity and specificity). However, the thresholds were set to optimize the balanced accuracy (mean of sensitivity and specificity), which implicitly optimizes for the situation where class prevalences are equal, and misclassification costs are equal as well. Different applications may require different settings of the thresholds. In contrast with the other methods evaluated, the trained AR-iHMM model offers a simplified representation of the data, which can be used to distinguish between multiple gait and non-gait patterns. We discuss the use of the different gait states in Section 7.

Modelling gait pattern changes
The non-parametric nature of our approach allows us to rigorously identify different segments in a data-driven manner. In addition to gait detection, this segmentation can also be used to identify clinical changes in the gait pattern. We demonstrate this in PD patients by showing that the gait segments identified by our algorithm have a different gait pattern before and after intake of symptomatic medication.
Discrimination of gait before/after medication intake An important potential application of free living gait analysis in PD patients is monitoring real-life variations in the response to medication. It is well known that dopaminergic medication (e.g. levodopa) can have a visible effect on PD patients' ability to walk [82]. Here we investigate the effectiveness of our probabilistic modelling approach for the problem of classifying gait episodes into "before medication" and "after medication" classes. The comparison is done using smartphone accelerometer data from the home visits of the Parkinson@Home validation study (described in Section 3).
For this classification problem, we consider all segments that were identified as gait by our model (see Section 'Empirical comparison of gait detection algorithms'). The AR-iHMM segments the data into intervals, with the state variable $z_t$ denoting the AR state representing the signal at time index $t$. If we then assume $K^+$ unique values for $z_t$ as $t = 1, \ldots, T$, we will estimate $K^+$ sets of AR coefficients: $\{A_1^k, \ldots, A_r^k\}_{k=1}^{K^+}$. For each state $k$ we estimate the spectrum based on the AR coefficients $A_1^k, \ldots, A_r^k$. There are multiple PSD features that in principle can be used to monitor PD related changes in the gait pattern: position of the dominant peak (often the fundamental frequency); height of the dominant peak; width of the dominant peak; ratio of the first and second peak; energy in a specific frequency range; and others [20]. In our evaluation, we consider the height and position of the dominant peak, and the total energy in the range 0.5-10 Hz (gait related information is expected in this frequency range). In Figure 7 we illustrate how the total energy estimated from the PSD varies throughout a single visit and how it varies during the predicted gait periods of that visit. Because we expect that the relative rather than the absolute within-person changes are relevant to distinguish between before and after medication intake, we normalize all features per patient using z-scores.
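The three features used in the evaluation, and the per-patient normalization, can be sketched as follows for a PSD sampled on a uniform frequency grid. Restricting the dominant-peak search to the 0.5-10 Hz band, approximating the band energy with a rectangle-rule sum, and the function names `psd_features` and `zscore_per_patient` are assumptions made for this illustration.

```python
import numpy as np

def psd_features(freqs, psd, band=(0.5, 10.0)):
    """Height and position of the dominant in-band peak, and total in-band energy."""
    freqs = np.asarray(freqs, dtype=float)
    psd = np.asarray(psd, dtype=float)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    peak = int(np.argmax(np.where(in_band, psd, -np.inf)))
    df = freqs[1] - freqs[0]  # uniform grid spacing
    return {
        "peak_height": psd[peak],
        "peak_position": freqs[peak],
        "total_energy": psd[in_band].sum() * df,  # rectangle-rule approximation
    }

def zscore_per_patient(values, patient_ids):
    """Standardize feature values within each patient (requires non-constant values per patient)."""
    values = np.asarray(values, dtype=float)
    patient_ids = np.asarray(patient_ids)
    out = np.empty_like(values)
    for pid in np.unique(patient_ids):
        mask = patient_ids == pid
        out[mask] = (values[mask] - values[mask].mean()) / values[mask].std()
    return out

# Hypothetical example: a single spectral peak at 2 Hz
freqs = np.linspace(0.0, 25.0, 251)
psd = np.exp(-(freqs - 2.0) ** 2)
features = psd_features(freqs, psd)
z = zscore_per_patient([1.0, 2.0, 3.0, 10.0, 20.0, 30.0], ["a", "a", "a", "b", "b", "b"])
```

After per-patient z-scoring, two patients whose features differ only in offset and scale produce identical normalized values, so the classifier sees relative within-person changes only.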
Of the 25 PD patients taking part in the study, 18 were considered to have sufficient walking periods both before and after medication. In these patients, we train a logistic classifier using each of the features described above, to predict whether a gait segment occurred before or after medication intake. For each patient, we compute the out-of-sample accuracy based on leave-one-subject-out cross-validation². As displayed in Table 3, we can predict with reasonable accuracy whether a gait segment occurred before or after medication intake; not all patients have a visible motor response after medication intake, so we expect that achieving perfect prediction accuracy will not be possible. Note that combining multiple gait features may slightly increase accuracy, but the goal here is to obtain an interpretable model.

Table 3: Balanced accuracy (average of sensitivity and specificity) to predict whether gait segments occurred before or after medication intake, using a logistic classifier based on PSD features obtained from the AR-iHMM, normalized per subject. We use leave-one-subject-out cross-validation, and present the mean and standard error across subjects. Results are compared between using gait segments as annotated on the video recordings ("Annotated gait"), and gait segments identified by our model ("Predicted gait").

Feature          Annotated gait    Predicted gait
Peak height      72% (9%)          70% (10%)
Peak position    51% (14%)         50% (18%)
Total energy     74% (8%)          78% (8%)
To examine how our approach for gait detection affects the ability to discriminate between before and after medication intake, we compare results with using the gait segments as annotated on the video recordings. Here, we learn the AR-iHMM on all annotated gait data, and use all the identified states to obtain the AR-based PSD. We see that the accuracy when using the gait segments identified by our model is at least as high as when using the annotated gait segments. When using the total energy, the model-based gait segments appear to be even more informative. A possible explanation for this is that not all gait segments are equally informative, and that the model selects longer, more periodic, "steady-state" gait segments, which are less sensitive to environmental or behavioral factors that are irrelevant for the classification problem.
Exploratory gait analysis Because our probabilistic model is unsupervised, we can use the model not only as a tool to make predictions, but also as an exploratory tool to study the gait data. For example, in Figure 8 we show the gait segmentation of a PD patient with a notable clinical improvement in gait pattern after medication intake (based on the video annotations), where different colors indicate different hidden states $z_t$. What we observe is that the probabilistic model not only allows us to identify non-gait segments (pink and green), but also discriminates between different variations in gait quality. In this patient, the model separates before-medication gait (red) from after-medication gait (yellow). By contrast, Figure 9 shows the segmented gait of a PD patient whose gait does not notably improve after medication intake (based on the video recordings). Interestingly, we can still identify different gait segments both before and after medication intake, but their pattern of occurrence is similar in both conditions. Furthermore, inspection of the AR-based PSD estimates associated with the states in both figures indicates that the gait states in Figure 9 are more similar to each other than the gait states associated with the before and after medication periods in Figure 8. It should be noted that this contrast was not present in all patients, and we show two illustrative cases here. Further research is needed to identify reasons why some cases do not show the expected change.

Discussion
In this report we study the problem of passively monitoring movement disorders such as Parkinson's disease (PD) in daily living using wearable sensors. This is a challenging problem for two main reasons. First, many factors other than changes in the medical condition, such as voluntary behaviour and device orientation, contribute to the enormous variation seen in daily living sensor signals. Second, it is costly and logistically difficult to collect representative data in daily living with reliable labels. This may explain why highly flexible methods such as deep learning have not been successful in the context of monitoring symptom fluctuations in PD [83]. This has stimulated the search for signal models that are based on principled assumptions which reduce the model's flexibility while still allowing it to capture subtle disease-related changes. In this work we propose a simple, structured probabilistic modelling approach specifically for the analysis of free living gait. Gait is a promising target for passive monitoring because it is a common behaviour and responds to symptomatic medication in PD patients. Our approach is designed to simultaneously locate stationary gait segments and characterize the gait pattern based on pre-processed accelerometer data. We achieve this by adopting a non-parametric switching autoregressive model, circumventing the need to use window-based analysis and the need to pre-define the number of gait and non-gait classes that can be observed in daily life.

Figure 8: Segmentation of smartphone data during the free living assessment obtained from the AR-iHMM, from one PD patient with clinically observable changes in the gait pattern after medication intake (based on the video recordings). Top: before medication intake. Bottom: after medication intake. The red, yellow, blue and grey segments are all associated with gait data (according to both the video annotations and the classification based on the AR-iHMM-based PSD features); the remaining segments indicate different non-gait patterns.

Figure 9: Segmentation of smartphone data during the free living assessment obtained from the AR-iHMM, from one PD patient with no clinically observable changes in the gait pattern after medication intake (based on the video recordings). Top: before medication intake. Bottom: after medication intake. The yellow, blue and grey segments are all associated with gait data, with similar occurrence before and after medication intake; the remaining segments are not associated with gait.
We demonstrate our approach on a new reference dataset including 25 PD patients and 25 controls. The dataset is unique because it combines unscripted daily living activities in and around the house with detailed video annotations, which allows us to test our model on a much more realistic setting. First, we show that the identified classes can be used to accurately detect gait. Second, we show that states that represent gait can be used to predict medication induced fluctuations in PD patients.
There are other potential advantages of the proposed model which are not reflected in empirical classification accuracies. Our approach has two key advantages when it comes to estimating the spectrum of the free living accelerometer data (or similar sensors): (1) the time boundaries of each segment of stationary data over which we compute the spectrum are adaptively selected by the model, which avoids the need for window-based analysis and problems associated with this; (2) in a fully probabilistic fashion, we can leverage multiple repeating patterns to get a more robust estimate of the spectrum.
Additionally, because our algorithm does not treat the problem as binary classification (gait/no gait), but is designed instead to learn multiple gait and non-gait states, it can deal with changes in gait pattern during gait episodes. This avoids grouping different gait patterns together, which can introduce problems in further gait pattern analyses. Moreover, because it is unsupervised, the model can be used as an exploratory tool to locate gait or non-gait segments that share the same (spectral) characteristics. The user can explicitly control the prior parameters of the model to determine the temporal granularity of the segmentation and focus on more or less detailed changes in the gait signals. This extra control allows us to focus on sufficiently stationary ("steady state") segments of good quality gait data that are representative of the patient's PD symptoms. Lastly, using this fully Bayesian model to describe the acceleration signals allows us to estimate the uncertainty involved with both the segmentation and the estimation of the spectrum.

Limitations and future directions
In order to develop an intuitive, robust and easy to interpret probabilistic signal model for gait data in free living, we have made some restrictive assumptions about the distribution and occurrence of such data in daily life. Despite the flexibility of inferring an unknown number of different spectral AR representations, we have focused on the states that have sufficient spectral power in the gait range to monitor changes after medication intake. We believe this approach is appropriate for monitoring the highly prevalent continuous impairments in PD patients (bradykinetic gait). However, by focusing on short-term interruptions of the gait, the model could potentially also be very suitable to monitor the rarer episodic hesitations (freezing of gait). Because of the limited number of patients that presented with this symptom during the free-living assessments of the Parkinson@Home validation study, this remains to be evaluated using other data sets. In addition, the proposed framework does not use the axis meanings in the sensor outputs. This was done because, in smartphones, the default orientation of the device can differ between users. Our approach can also be applied to the three-dimensional dynamic component of the acceleration vector, or to any specific axis. Before the framework can be used in medical contexts, further validation is necessary. For example, in the current study protocol the gait before medication intake was measured after overnight withdrawal of dopaminergic medication. While this allowed for a detailed assessment of the changes after medication intake, in some patients the effects in daily life might be more subtle. Future work will aim to evaluate how well response fluctuations can be captured for naturally occurring shorter withdrawal periods in truly unsupervised conditions. Finally, we emphasize that the developed framework aims merely to segment varying gait patterns in a principled and largely unsupervised manner. In order to assign meaning to detected changes in the gait pattern observed in real life, e.g. to estimate causal effects of medication, further analysis using a carefully designed causal map is required. For example, real-life factors such as environment (e.g. crowded city versus park) might also influence the observed gait pattern. Future work will include adjusting for real-life confounding, using contextual sensors such as GPS.