Gait Phase Estimation of Unsupervised Outdoors Walking Using IMUs and a Linear Regression Model

Human gait analysis and detection are critical for many applications, including wearable and rehabilitation robotic devices and reducing or tracking injury risk. The proposed work allows researchers to study the gait phase of human subjects in an unsupervised outdoor environment without the need for fixed thresholds or sensor-embedded insoles. We present an experimental protocol that labels gait events in data from two body-worn inertial measurement units (IMUs) using subject-specific gait patterns. The gait patterns are developed using a force plate and a motion capture system. After the gait patterns are defined, human subjects walk outdoors for forty minutes to provide data for training and testing a principal component analysis (PCA)-based linear regression model. Next, gait phase estimation is performed using patterns defined from other human subjects, to accommodate cases where motion capture and force plate data are unavailable. Results showed a normalized gait phase estimation error of at minimum 1.81%, at maximum 2.48%, and on average 2.21 ± 0.258% across all subjects. These results are significant because the proposed work can be extended to precise control of human-assistive devices, rehabilitation devices, and clinical gait analysis.


I. INTRODUCTION
Walking is an essential part of humans' activities of daily living. As a result, there has been increasing clinical and research interest in gait monitoring and analysis. A gait cycle begins when the heel of one foot strikes the ground (0% of the gait cycle) and ends with the next heel strike of the same foot. Each cycle includes two major phases: stance and swing. The stance phase is when the studied foot is on the ground; it starts at heel strike (HS) and ends at toe-off (TO). The swing phase is when the studied foot is off the ground; it starts at TO and lasts until the next HS of the studied foot (100% of the gait cycle) [1]. Gait analysis systems are used to evaluate patients' health and study human gait [2], [3], and they are an essential clinical tool for a wide range of applications, such as assessing neurological and sports injuries [4], risk of falls [5], prosthesis design [6], [7], rehabilitation devices [8], [9], and gait emulators [10], [11]. Such applications require accurate gait phase detection to perform well. For example, Cherelle et al. designed the Ankle Mimicking Prosthetic Foot (AMP-Foot), which uses torque control of DC motors [12]; distinct gait events provide the parameters required to control the motor torque. Yang et al. used two Hall-effect sensors to detect two phases of the gait cycle and command a fuzzy controller for a passive prosthesis [13].
The associate editor coordinating the review of this manuscript and approving it for publication was Rajeswari Sundararajan.
Gait analysis and estimation are typically conducted in a proctored lab environment equipped with motion capture systems [14] and force plates [15]. These approaches can measure the human gait phase accurately; however, they are expensive and require large laboratory spaces [16]. Alternatively, inertial measurement units (IMUs) are used in many gait studies due to their low cost and portability. In more recent work, IMUs are attached to human subjects' feet, shanks, or waists to detect HS and TO events [17]. Placing IMUs on the foot is most common because it provides a clear distinction between the swing and stance phases [16], [18], [19]. Gait phase estimation approaches use either online or offline detection methods. Lee and Park used a single foot IMU to detect HS (using spikes in linear acceleration) and TO in an offline process [20]. Bae [25]. Quintero et al. proposed using a single thigh IMU to continuously estimate gait phase and speed based on a single mechanical parameter derived from human thigh motion [26]. Potter et al. used an array of body-worn IMUs and an error-state Kalman filter to estimate lower-limb kinematics, testing their work on a simplified 3-body model of the human lower limbs [27]. Previous groups typically validated their work with 3 to 20 recruited human subjects. The authors previously used foot and shank magnetic, angular rate, and gravity (MARG) sensor modules and an error-state Kalman filter to obtain real-time estimates of foot and shank orientations and calculate the ankle angle of 1) a 2-DOF prosthetic ankle and 2) a human ankle [28]. Similarly, Qiu et al. estimated HS and TO using wavelet analysis of angular data from thigh and shank IMUs [29].
So far, available algorithms have estimated the gait phase in a proctored indoor lab environment. One advantage of lab settings is that they provide tools to estimate ground truth reliably. However, gait analysis in an indoor lab environment relies on multiple assumptions and does not capture all the variability in natural gait that typically occurs outdoors. Wearable devices using IMUs, alone or combined with other sensors, have been tested under more realistic conditions. Aminian et al. presented an algorithm for detecting gait events and identifying terrain types based on IMUs and electromyography (EMG) sensors on the thigh and shank [30]. Pham et al. used a waist-worn IMU device to estimate stride length during an outdoor walk along a square trajectory [31]. Wearable technology can be used to study numerous activities of daily living beyond walking, and PCA remains an effective parameter-reduction technique for machine learning and deep learning models [32]. Other efforts to label outdoor gait events used threshold values to detect HS and TO [16], [18], [19]. Alternatively, smart shoe insoles with force sensors have been used to identify the stance phase [21], [33]. However, threshold labeling is not robust to changes in gait cycles and can produce false labels from noisy IMU readings. In addition, smart insoles can impede human subjects' natural gait and are sensitive to sweat, complicating the experimental protocol.
This paper proposes a novel approach to labeling outdoor kinematic data (HS and TO) based on patterns of foot IMU data developed from ground-truth measurements. The patterns are derived in an indoor lab environment using a force plate. A PCA-based gait model is developed for each subject to reduce the number of features, prevent overfitting, and fit within the available computational power. The model estimates each subject's gait phase during an outdoor walking session. Furthermore, models trained on three subjects were tested on a fourth subject.
The rest of this paper is organized as follows. Section II describes the 2-IMU setup on the human subject and the experimental protocol, which includes IMU calibration, an 8-minute indoor walk, and a 40-minute outdoor walk. Section III presents the results of using the generated models to estimate the gait phase, and Section IV discusses further implications of the developed methodology. Section V concludes the paper. Finally, Section VI provides the details, videos, and data required to implement the proposed work. All recorded human subject data are available as open-source data sets that can be used to replicate the methodology or evaluate other gait analysis methods.

II. METHODOLOGY
A. 2-IMU SETUP
Two IMUs (Precision NXP 9-DOF, Adafruit, USA) are secured to the shank and foot of each human subject, as shown in Fig. 1. The IMUs are attached to the subject using athletic, skin-safe double-sided tape. The wires are twisted together and protected with wire sleeves to limit interference with the subject's natural gait. The IMUs are connected to a microcontroller unit (MCU, Teensy 4.1, PJRC, USA) with a built-in SD card, which is placed in a fanny pack around the subject's waist. The MCU is powered by a small 3.7 V LiPo battery. IMU$_1$ is attached to the subject's foot, and IMU$_2$ is attached to the subject's shank. The MCU stores data packets of raw IMU measurements, and the IMU signals are sampled at 400 Hz. The Teensy 4.1 was selected because 1) its built-in SD card reader can write to a text file at a high rate (400 Hz) using the Arduino SDFat library, and 2) it can be powered by a small 3.7 V battery. The shank IMU was placed midway between the knee and ankle joints, away from a dominant muscle group (tibialis anterior), to reduce vibration effects. The foot IMU was centered over the subject's laces. The exact placement of the shank and foot IMUs is not critical, because raw IMU readings are corrected through calibration against a motion capture system that accurately defines the shank and foot frames.

B. EXPERIMENTAL PROTOCOL
The experimental protocol aimed to quantify the accuracy of gait phase estimation for human subjects during natural gait in an outdoor environment. To this end, four unimpaired participants (22.5 ± 1.5 years old, 1.71 ± 0.06 m, 77.5 ± 14.81 kg) were enrolled in this study (Table 1) and gave written consent to participate in the experiment, as approved by the Institutional Review Board (application number 1902021698).
Three experiments were designed to 1) calibrate the IMUs to correct raw IMU readings, 2) identify gait patterns through an indoor walk in a lab environment, and 3) train and test gait phase estimation models through an outdoor walk. All human subjects participated in all three experiments. Based on a statistical power analysis, 1068 steps are needed for a 95% confidence interval estimate with a margin of error of ±3%. Each subject had at least 1100 steps on the studied foot (20 minutes of walking time) to validate the model and compute the error.
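The figure of 1068 steps coincides with the standard sample-size formula for estimating a proportion, $n = z^2 p(1-p)/E^2$, with $z = 1.96$ (95% confidence), $E = 0.03$, and the conservative choice $p = 0.5$. The text does not state which formula was used, so the short sketch below is an assumption that only reproduces the arithmetic; the function name is hypothetical.

```python
import math


def required_sample_size(z: float = 1.96, margin: float = 0.03, p: float = 0.5) -> int:
    """Sample size for estimating a proportion within +/- margin.

    Assumed formula: n = z^2 * p * (1 - p) / margin^2, rounded up,
    with the conservative p = 0.5 (maximizes p * (1 - p)).
    """
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


print(required_sample_size())  # 1068
```

With a ±5% margin the same formula gives 385 steps, which shows how quickly the requirement grows as the margin tightens.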

1) IMU CALIBRATION
The two IMUs are calibrated to account for 1) the coordinate frame rotation between the IMU frames (IMU$_1$ and IMU$_2$) and the body frames (foot and shank), 2) scaling errors, and 3) axis misalignment errors of the IMUs [34]. The calibration parameters transform raw IMU measurements ($\omega^{\mathrm{IMU}_{1,2}}_t$, $a^{\mathrm{IMU}_{1,2}}_t$) into foot ($\omega^F_t$, $a^F_t$) and shank ($\omega^S_t$, $a^S_t$) gyroscope and accelerometer measurements, as shown in equations 1-4.
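Equations 1-4 are not reproduced in this text. As a minimal sketch, assuming each of them has the common form $\omega^F_t = T_{F\omega}(\omega^{\mathrm{IMU}_1}_t - b_\omega^{\mathrm{IMU}_1})$ (and likewise for the accelerometer and the shank IMU), where the 3×3 matrix lumps frame rotation, scale, and misalignment, applying the calibration to one sample could look like the code below. The transformation matrices and biases are the ones estimated in the next step; the bias-before-transform ordering and the function name are assumptions, not the authors' code.

```python
from typing import Sequence

Vec3 = tuple[float, float, float]


def apply_calibration(raw: Vec3, T: Sequence[Sequence[float]], bias: Vec3) -> Vec3:
    """Map one raw IMU sample into the body (foot or shank) frame.

    Assumed model: body = T @ (raw - bias), with T a 3x3 matrix combining
    frame rotation, per-axis scale, and axis misalignment.
    """
    d = [r - b for r, b in zip(raw, bias)]  # remove the sensor bias first
    x, y, z = (sum(T[i][j] * d[j] for j in range(3)) for i in range(3))
    return (x, y, z)


# Example: a pure 90-degree rotation about z, with a bias on the x axis.
T_rot = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
print(apply_calibration((1.5, 0.0, 0.0), T_rot, (0.5, 0.0, 0.0)))  # (0.0, 1.0, 0.0)
```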
Ten optical marker cameras (OMC) (Miqus M5, Qualisys, Sweden) are used to calibrate the IMUs with an extrinsic calibration scheme [28]. Eight optical markers are placed on the subject's foot and shank (four each) to define the shank and foot frames. The IMUs are calibrated by exciting the gyroscope and accelerometer axes in two steps: 1) the human subject raises the foot off the ground and moves the foot and shank linearly along all axes of motion (x, y, z) to excite the IMUs' accelerometers; then 2) with the foot still in the air, the subject rotates the foot and shank about all axes of rotation to excite the IMUs' gyroscopes. Each IMU axis is excited for 15 seconds in each calibration step. The OMC records the shank's and foot's orientations, translations, and body rates. Synchronized IMU measurements ($\omega^{\mathrm{IMU}_{1,2}}_t$, $a^{\mathrm{IMU}_{1,2}}_t$) are fitted against the OMC measurements using continuous-time batch estimation to obtain the transformation matrices ($T_{S\omega}$, $T_{Sa}$, $T_{F\omega}$, $T_{Fa}$) between the IMU frames and the OMC shank and foot frames [35]. The IMU bias parameters ($b_\omega^{\mathrm{IMU}_{1,2}}$, $b_a^{\mathrm{IMU}_{1,2}}$) are estimated using Allan variance analysis [36].
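Allan variance characterizes sensor noise from a stationary recording by averaging the signal over clusters of duration $\tau$ and measuring how much consecutive cluster means differ. The text does not detail how the biases were extracted from this analysis, so the code below is only a minimal sketch of the non-overlapping Allan variance for one sensor axis; the function name and the choice of the non-overlapping estimator are assumptions.

```python
from statistics import fmean


def allan_variance(samples: list[float], rate_hz: float, m: int) -> tuple[float, float]:
    """Non-overlapping Allan variance of one stationary sensor axis.

    The record is split into clusters of m samples (tau = m / rate_hz);
    the result is 0.5 * mean((ybar[k+1] - ybar[k])**2) over cluster means ybar.
    """
    n_clusters = len(samples) // m
    if n_clusters < 2:
        raise ValueError("need at least two clusters of m samples")
    means = [fmean(samples[k * m:(k + 1) * m]) for k in range(n_clusters)]
    diffs = [(b - a) ** 2 for a, b in zip(means, means[1:])]
    return m / rate_hz, 0.5 * fmean(diffs)
```

Sweeping m (for example over powers of two) and plotting the square root of the result, the Allan deviation, against $\tau$ on log axes gives the usual Allan deviation curve; bias instability is conventionally read near its minimum.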

2) GAIT PATTERN GENERATION
A gait pattern representative of each subject's gait is produced from an indoor walk. The pattern captures how the foot IMU signals ($\omega^F_t$, $a^F_t$) behave during heel strike (HS) and toe-off (TO). A walkway equipped with a force plate (9260AA3, Kistler, Switzerland) is used to accurately detect and label HSs and TOs during the subject's gait. The force plate measures force ($F_N$) along three axes of motion. Subjects wearing the 2-IMU setup walked repeatedly along the walkway, making sure to step on the force plate on each pass (Fig. 1.c). Subjects walked for eight minutes, performing both straight and turning gait cycles and replicating their natural gait as closely as possible. A heel strike is identified whenever the vertical force measured by the force plate exceeds a threshold (2 N), and a toe-off is identified whenever the vertical force drops back below the threshold. The threshold was set by empirically examining the data to find the instant the heel contacts the force plate.
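The threshold rule above amounts to detecting rising and falling crossings of the vertical force. A minimal sketch follows; it returns sample indices, and the function name is hypothetical. Converting indices to times requires the force plate sampling rate, which the text does not state.

```python
def force_plate_events(fz: list[float], threshold_n: float = 2.0) -> tuple[list[int], list[int]]:
    """Return (heel_strike_idx, toe_off_idx) from vertical force samples.

    HS: first sample where Fz rises above the threshold.
    TO: first sample where Fz falls back to or below the threshold.
    """
    hs, to = [], []
    loaded = bool(fz) and fz[0] > threshold_n  # foot already on the plate at start?
    for i in range(1, len(fz)):
        now = fz[i] > threshold_n
        if now and not loaded:
            hs.append(i)
        elif loaded and not now:
            to.append(i)
        loaded = now
    return hs, to


print(force_plate_events([0.0, 0.5, 5.0, 400.0, 350.0, 1.0, 0.0]))  # ([2], [5])
```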
An average gait cycle duration of 1 second was assumed [37], and it takes 4 steps to go from one end of the walkway (with the embedded force plate) to the other, meaning subjects averaged approximately 125 steps on the force plate [38]. Taking more steps (a longer walk) would strengthen the gait pattern but could fatigue the subjects. The HSs and TOs identified by the force plate are used to label the adjacent foot IMU measurements ($\omega^F_t$, $a^F_t$), as shown in Fig. 2. HS and TO patterns are saved for all steps on the force plate. Each pattern is centered on its HS or TO event at time zero and spans half a second before and after the event. Half a second on each side of the event was chosen to capture approximately half a gait cycle of data on each side (assuming an average gait cycle of approximately 1 second [37]).
The mean of each subject's generated patterns is shown in Fig. 3, along with the standard deviation, which represents the variance of $\omega^F_t$ and $a^F_t$ during the different gait phases performed in the eight-minute indoor walk. Higher variance is desirable because it indicates that the subject replicated the various step trajectories of their natural gait.
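Cutting the half-second windows around each labeled event and averaging them gives the mean and standard deviation curves described above. The code below is a minimal single-channel sketch; it assumes the force-plate events have already been mapped to foot IMU sample indices after synchronization, and the function names are hypothetical. In practice it would be applied to each of the six foot IMU channels.

```python
from statistics import fmean, pstdev


def event_windows(signal, event_idx, rate_hz=400.0, half_s=0.5):
    """Cut a window of +/- half_s seconds around each event sample.

    Events too close to either end of the record are skipped.
    """
    h = round(half_s * rate_hz)  # 200 samples at 400 Hz
    return [signal[i - h:i + h + 1] for i in event_idx
            if i - h >= 0 and i + h < len(signal)]


def pattern_mean_std(windows):
    """Sample-wise mean and (population) standard deviation across windows."""
    columns = list(zip(*windows))
    return [fmean(c) for c in columns], [pstdev(c) for c in columns]
```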

3) OUTDOOR WALK
After completing the indoor walk, subjects walked outdoors while wearing the 2-IMU setup (Fig. 1.b). Subjects were instructed to walk for forty minutes in any direction, stopping whenever needed. Fig. 4 shows the shank and foot IMU measurements during the outdoor walking trial of one subject.

C. GAIT MODEL
A gait model of each subject is used to predict their current gait phase. Note that the aim is to predict the gait phase using measurements from the shank IMU ($\omega^S_t$, $a^S_t$). The foot IMU ($\omega^F_t$, $a^F_t$) is used only to label HS and TO events in the outdoor data sets, from which the gait phase of the outdoor walk is found.

1) LABELING OUTDOOR DATA SETS
To produce and train the gait model, the previously generated gait patterns are used to identify the HSs and TOs of the outdoor walk in the foot IMU data. HS and TO are labeled based on the cross-correlation (computed with MATLAB's xcorr2 function) between the generated HS and TO patterns (Fig. 3) and the measured foot IMU data ($\omega^F_t$, $a^F_t$), as shown in Fig. 5. To label HS and TO distinctly, the IMU data points with the highest correlation to the pattern (peaks) were selected. Peaks are constrained to be at least 0.8 seconds apart (assuming 0.8 seconds is the minimum gait cycle duration [36]), and each peak's magnitude must be at least the 75th percentile of the correlation between the pattern and the outdoor walk data (using MATLAB's findpeaks function).
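The labeling step can be sketched as a sliding (valid-mode) cross-correlation summed over the foot IMU channels, followed by peak picking with a minimum height at the 75th percentile and a minimum spacing of 0.8 s (320 samples at 400 Hz). This is a plain-Python stand-in for the xcorr2 and findpeaks calls under stated assumptions, not a reproduction of MATLAB's exact behavior; the function names are hypothetical.

```python
from statistics import quantiles


def sliding_xcorr(data, pattern):
    """Valid-mode cross-correlation summed over channels.

    data, pattern: lists of channels, each a list of samples (channels x samples).
    Score k aligns the first pattern sample with data sample k.
    """
    n, m = len(data[0]), len(pattern[0])
    return [sum(sum(d[k + j] * p[j] for j in range(m)) for d, p in zip(data, pattern))
            for k in range(n - m + 1)]


def pick_peaks(score, min_dist, pct=75):
    """Local maxima at or above the pct-th percentile, at least min_dist samples apart.

    Taller peaks are kept first, and later peaks closer than min_dist are dropped,
    in the spirit of findpeaks' MinPeakHeight and MinPeakDistance options.
    """
    height = quantiles(score, n=100)[pct - 1]
    cands = [i for i in range(1, len(score) - 1)
             if score[i - 1] < score[i] >= score[i + 1] and score[i] >= height]
    kept = []
    for i in sorted(cands, key=lambda i: score[i], reverse=True):
        if all(abs(i - k) >= min_dist for k in kept):
            kept.append(i)
    return sorted(kept)
```

Because score index k marks where the pattern starts, the labeled event itself lies at k plus half the pattern length, since the event sits at the center of each pattern.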
The labeled data set (Fig. 5) is used to find the gait phase ($\theta$) of the outdoor walk. The gait phase is defined by linear interpolation between two adjacent heel strikes: $\theta = 0$ at the first heel strike and $\theta = 2\pi$ at the next heel strike (the end of the gait cycle). Furthermore, $\theta$ is split into two phasors ($\gamma$), as shown in equation 5 and Fig. 6.a:

$$\gamma = [\sin(\theta) \;\; \cos(\theta)] \qquad (5)$$

where $\gamma \in \mathbb{R}^2$.
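Phase labeling by linear interpolation between adjacent heel strikes, followed by the phasor split of equation 5, can be sketched as follows. Samples before the first or after the last labeled heel strike are left unlabeled (None); that handling, like the function names, is an assumption.

```python
import math


def gait_phase(n_samples, hs_idx):
    """Phase theta in [0, 2*pi) by linear interpolation between consecutive heel strikes."""
    theta = [None] * n_samples
    for a, b in zip(hs_idx, hs_idx[1:]):
        for i in range(a, b):
            theta[i] = 2 * math.pi * (i - a) / (b - a)  # 0 at HS a, approaches 2*pi at HS b
    return theta


def phasors(theta):
    """gamma = [sin(theta), cos(theta)] for every labeled sample (equation 5)."""
    return [(math.sin(t), math.cos(t)) for t in theta if t is not None]
```

Regressing on the phasors rather than on $\theta$ itself avoids the jump from $2\pi$ back to $0$ at every heel strike, since sine and cosine are continuous across it.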

2) TRAINING MODEL USING PCA AND LINEAR REGRESSION
Twenty minutes of each subject's outdoor data are used to create a regression model of the gait phase phasors ($\gamma$). The independent variables ($X$) are the z-scores ($Z$) of the shank IMU accelerations and angular velocities ($\omega^S_t$, $a^S_t$) from one second before heel strike (400 samples), as shown in Fig. 6.b, c and equation 6.
where $X \in \mathbb{R}^{n \times 6}$ and $n$ is the number of data points. PCA is used to select the principal components of $X$; reducing many features to a smaller set of principal components removes noise. PCA compresses large amounts of information, which speeds up the real-time implementation of models and helps prevent overfitting of predictive algorithms [39]. Because PCA maximizes the retained variance of the data, the number of features in the model can be reduced while still keeping a varied representation of the subject's gait cycle. Since the IMUs are sampled at 400 Hz, $X$ is down-sampled to 10 Hz to avoid repetitive data samples, and 100 PCA coefficients (PC) are used to select the principal features. Both the 10 Hz down-sampling rate and the 100 PCA coefficients were tuned empirically under the constraints of the computer's processing power. A high number of PCA coefficients increases the chance of overfitting, whereas reducing the number of coefficients can eliminate important gait features; the 100 coefficients were selected experimentally by examining figures of the features before and after PCA reduction. The down-sampled features, $X_d$, are used to fit the model ($\beta$), as shown in equations 7-9.
where $X_d \in \mathbb{R}^{n \times 6}$ and $PC \in \mathbb{R}^{6 \times 100}$. The gait phase error ($e_\gamma$) of the model ($\beta$) is calculated using equation 10 (shown for a representative subject in Fig. 7.a). $\beta$ is then used to predict the gait phase over the remaining twenty minutes of the outdoor walk, which serve as the validation data set. The gait phase error ($e_\gamma$) for the validation data set of subject D is shown in Fig. 7.b.
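The training pipeline (z-scoring, down-sampling from 400 Hz to 10 Hz, PCA, and a least-squares fit of $\beta$) can be sketched in dependency-free Python. Equations 6-9 are not reproduced in this text, so this is not the authors' formulation: it assumes each row of the input is one feature vector, keeps $k$ components with $k$ no larger than the number of columns, computes them by power iteration with deflation, and regresses both phasor components on the component scores plus an intercept. Down-sampling by keeping every 40th row (no anti-aliasing filter) is likewise an assumption, and all names are hypothetical; a production version would use a linear algebra library.

```python
import math
import random
from statistics import fmean, pstdev


def zscore_columns(X):
    """Standardize each column (rows = samples) to zero mean and unit std."""
    cols = list(zip(*X))
    mu = [fmean(c) for c in cols]
    sd = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(row, mu, sd)] for row in X]


def downsample(X, factor=40):
    """Keep every factor-th row (400 Hz -> 10 Hz for factor=40)."""
    return X[::factor]


def principal_components(X, k, iters=1000, seed=0):
    """Leading k eigenvectors of the covariance of zero-mean X (power iteration + deflation)."""
    n, p = len(X), len(X[0])
    C = [[sum(r[i] * r[j] for r in X) / (n - 1) for j in range(p)] for i in range(p)]
    rng = random.Random(seed)
    comps = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(p)]
        for _ in range(iters):
            w = [sum(C[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        lam = sum(v[i] * C[i][j] * v[j] for i in range(p) for j in range(p))
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]  # deflate
        comps.append(v)
    return comps


def solve(A, B):
    """Solve A x = B for several right-hand sides by Gauss-Jordan with partial pivoting."""
    n = len(A)
    M = [A[i][:] + B[i][:] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [[x / M[i][i] for x in M[i][n:]] for i in range(n)]


def scores(row, pcs):
    """Intercept term followed by the projection of one row onto each component."""
    return [1.0] + [sum(a * b for a, b in zip(row, v)) for v in pcs]


def fit_phase_model(X, gamma, k):
    """Least-squares beta mapping PCA scores of X to gamma = (sin, cos)."""
    pcs = principal_components(X, k)
    Z = [scores(row, pcs) for row in X]
    q = k + 1
    ZtZ = [[sum(z[i] * z[j] for z in Z) for j in range(q)] for i in range(q)]
    ZtG = [[sum(z[i] * g[c] for z, g in zip(Z, gamma)) for c in range(2)] for i in range(q)]
    return pcs, solve(ZtZ, ZtG)


def predict(pcs, beta, row):
    """Predicted phasor (sin, cos) for one standardized feature row."""
    z = scores(row, pcs)
    return tuple(sum(zi * beta[i][c] for i, zi in enumerate(z)) for c in range(2))
```

In real use, the column means and standard deviations from the training twenty minutes would be stored and reused to standardize the validation data, rather than recomputed on it.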

A. GAIT PHASE ESTIMATION
VOLUME 10, 2022
The average gait phase estimation error ($e_\gamma$) for the validation data sets of the human subjects is shown in Table 2. The gait phase estimation error ($e_\gamma$) for each subject's validation data set was calculated using a model trained from that subject's own gait pattern (Fig. 3). Subject D had the highest gait phase estimation accuracy, owing to the varied gait pattern observed in the subject's recorded indoor walk. Fig. 3.d shows that subject D had the highest standard deviation in the HS and TO patterns, meaning that the subject tried to replicate all the various steps she typically takes during an outdoor walk. This allows a fully representative model of the subject's outdoor walk to be generated. Conversely, subject C had the lowest tracking accuracy and the lowest deviation in their gait pattern (Fig. 3.c), meaning that their indoor gait was not fully representative of their gait characteristics.
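Equation 10 is not reproduced in this text. One plausible way to turn predicted phasors into a normalized percentage error, assumed here rather than taken from the paper, is to recover $\hat\theta$ with atan2 (which also tolerates predicted phasors that are not unit length) and divide the wrapped phase difference by one full cycle. The function names are hypothetical.

```python
import math

TWO_PI = 2 * math.pi


def phase_from_phasor(s, c):
    """Recover theta in [0, 2*pi) from gamma = (sin(theta), cos(theta))."""
    return math.atan2(s, c) % TWO_PI


def normalized_phase_error(theta_true, theta_est):
    """Absolute phase error wrapped to [0, pi], as a percentage of one gait cycle."""
    d = abs(theta_est - theta_true) % TWO_PI
    return 100.0 * min(d, TWO_PI - d) / TWO_PI
```

Wrapping matters near heel strike: an estimate just below $2\pi$ for a true phase just above $0$ is a small error, not an almost full-cycle one.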

B. GAIT PHASE ESTIMATION WITHOUT MOTION CAPTURE AND FORCE PLATE PATTERN
Because HS and TO patterns (based on foot IMU data) are similar across humans [19], [21], [22], large sets of gait patterns can be used to estimate the gait phase of a human subject. This is useful for outdoor studies performed without a motion capture system and a force plate, where a gait pattern cannot be created for each human subject. Table 3 shows the gait phase estimation error ($e_\gamma$) for the subjects' outdoor walks when trained with other subjects' concatenated gait patterns. Although the gait phase estimation accuracy is reduced, it is still comparable to that of other gait phase estimation methods [17]. Increasing the number of concatenated gait patterns decreases the gait phase estimation error for the studied subject, as it becomes more likely that one of the concatenated patterns is representative of the studied subject's gait. To extend this work, gait patterns from more subjects can be concatenated and used to estimate the gait phase of subjects without a uniquely defined gait pattern. In addition, IMUs can be calibrated without a motion capture system through repeated static calibration to correct IMU readings [40].

IV. DISCUSSION
The presented work was designed to help analyze the gait kinematics of healthy and impaired humans in a natural outdoor environment. Besides accurately detecting distinct gait events, the 2-IMU setup provides a robust algorithm that does not require fixed, subject-independent thresholds and instead labels outdoor gait events based on statistical correlation to ground-truth data. Another advantage of the proposed work is an experimental setup that does not require special shoes or insoles, or excessive sensors that could interfere with the subject's natural gait.
The outdoor gait events (HS and TO) are successfully identified based on a ground-truth pattern for each subject. Each subject's gait pattern is accurately defined using a force plate that precisely detects heel strikes and toe-offs, and the outdoor walk data are labeled with HS and TO events based on these patterns. The proposed methodology benefits researchers 1) by allowing them to label gait events consistently, with a known correlation to a ground-truth measurement and without any external validation instrumentation, and 2) by allowing them to conduct gait tests outside a proctored lab space, eliminating bias related to the lab environment. Another implication of the proposed work is the continuous control of a lower-limb prosthesis.

A. PROSTHESIS IMPLEMENTATION
The team previously designed a 2-DOF prosthesis [6] that detected heel strike upon impact of the prosthetic ankle with the ground, which delayed the launch of the prosthesis controller. This approach generally does not account for changes in the user's speed or gait length. The PCA-based model can be extended to control the prosthesis by installing foot and shank IMUs. In future work, amputee users will wear the 2-DOF prosthesis and walk indoors to generate a gait pattern for their walk. Amputee subjects will then wear the prosthesis for a long duration to provide training and validation data sets for the linear regression model. This work can be extended to real-time gait phase estimation, allowing a powered prosthesis to follow gait trajectories as a function of a time-varying gait phase.
In a limited study, an unimpaired operator carried the prosthesis and walked with it, as shown in Fig. 8. The operator walked with the prosthesis for 10 minutes: 4 minutes were used to produce a pattern, 3 minutes to train the model, and 3 minutes to test the model (validation data set).
Longer durations were not used because it is difficult for the operator to carry the prosthesis for an extended period. The HS and TO patterns of $\omega^F_t$ The gait phase estimation errors of the training and validation data sets are shown in Fig. 9. Note that PCA and down-sampling were not used for this estimate due to the small data set size.
Nevertheless, the estimation results were promising, and they need to be extended to human subjects with amputations, for whom further deviation is expected. In future work, impaired subjects will use the 2-DOF prosthesis and walk with it to generate a gait pattern similar to Fig. 3. Impaired subjects will then walk outdoors at their natural gait speed, and a gait model will be generated for each of them. This benefits amputees by allowing improved autonomous interaction between the amputee and the prosthesis. To estimate the gait phase in real time, shank data ($\omega^S_t$, $a^S_t$) from half a second before the gait event (200 samples) can be used to predict the upcoming gait event. The linear regression model can predict the current point in the gait cycle from previous shank IMU data ($\omega^S_t$, $a^S_t$). A limitation of this work, and the subject of ongoing study, is its inability to identify gait activities such as turning and incline (stairs) maneuvers. When subjects vary their indoor gait pattern to include several turning maneuvers representing outdoor turns, the gait model can still accurately detect HSs and TOs. For optimal results, subjects should try to replicate as many of their typical outdoor maneuvers as possible during the indoor gait trial. Further studies are needed to identify and predict incline activities; the team conducted a preliminary study using a previously designed ESKF to estimate stair activities [28].
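For the real-time use described above, a fixed-length buffer that always holds the most recent 200 shank samples (half a second at 400 Hz) is one straightforward way to feed the regression model at each new sample. The code below is a minimal sketch built on collections.deque; the class name and the flattened, oldest-first feature layout are assumptions.

```python
from collections import deque


class ShankWindow:
    """Rolling buffer of the latest shank IMU samples (wx, wy, wz, ax, ay, az)."""

    def __init__(self, length=200):  # 200 samples = 0.5 s at 400 Hz
        self.buf = deque(maxlen=length)

    def push(self, sample):
        """Append one 6-channel sample; the oldest one drops out when full."""
        self.buf.append(tuple(sample))

    def ready(self):
        """True once the buffer holds a full window."""
        return len(self.buf) == self.buf.maxlen

    def features(self):
        """Flatten the window, oldest sample first, into one feature vector."""
        return [v for sample in self.buf for v in sample]
```

A deque with maxlen discards the oldest entry in constant time on each append, so the per-sample cost stays fixed regardless of how long the walk lasts.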

V. CONCLUSION
This paper presents two IMU-based methods that can accurately estimate the gait phase of human subjects from a one-second time history. The estimation method is based on first producing a gait pattern representative of each human subject. To produce the pattern, human subjects carry out two experimental protocols: 1) IMU calibration to correct raw IMU readings, transforming them into the frame of the body segment to which each IMU is attached, and 2) an indoor walking experiment in which subjects carry out continuous and varying gait cycles while being monitored by a motion capture system and a force plate. The motion capture and force plate readings form the ground truth and are synchronized with the foot IMU readings to produce a gait pattern for heel strike (HS) and toe-off (TO) events. The pattern contains the gait event and half a second of data before and after it. After completing the indoor walk, subjects walk for 40 minutes in an unsupervised outdoor environment. Subjects walked for 40 minutes because 1) this is long enough to provide adequate data sets for training and testing the gait model, 2) longer walks risked draining the battery, and 3) longer walks risked the adhesives on the subject's shank weakening due to sweat. An outdoor walk is used because it is more representative of a human's natural gait than a lab environment or a treadmill. Data from twenty minutes of the outdoor walk are used to generate a linear regression model representative of the subject's natural gait. The model is built on shank IMU readings, with the outdoor walk data labeled with gait events (HS and TO) based on the subject's predefined gait pattern. The model uses PCA to reduce the large data sets to 100 representative features, remove unnecessary noise, and limit overtraining. The model then estimates the gait phase over the remaining twenty minutes of the subject's outdoor walk.
The gait phase estimation error was calculated for each subject. The average error across subjects was 2.21 ± 0.258%. Subject D had the lowest error due to having the highest variability in the original gait pattern, meaning that the subject's pattern included a wide variety of the steps the subject took in the outdoor environment. The lowest estimation accuracy was found for subject C, the subject with the lowest variability in their indoor walk pattern (Fig. 3). We also proposed a method that allows groups without a force plate and a motion capture system to replicate the described work with high accuracy: concatenated patterns from other human subjects can be used to label outdoor data from a subject who does not have a representative pattern. The more subjects represented in the concatenated pattern, the higher the correlation confidence between the pattern and the unlabeled outdoor data. A further application of the proposed work is the autonomous control of an active 2-DOF lower-limb prosthesis.
The proposed work allows researchers to study the gait phase of human subjects in an unsupervised outdoor environment. This is advantageous over lab and treadmill testing because it more closely represents a human's natural gait and allows subjects to vary their speed as needed. Furthermore, for labeling gait events (HS and TO), the proposed work offers a unique methodology that does not require fixed thresholds, external sensors (force sensors), or special shoes (sensor-embedded insoles). The HS and TO labels are based on statistical correlation confidence, which is robust across human subjects and does not require embedded sensors that could impede the subject's natural gait. Additionally, the proposed methodology may enable extended gait experiments without the fully equipped laboratory settings otherwise needed to replicate the work. Finally, the methodology can be extended to aid patients in gait training, as the proposed regression models can be run both offline and in real time. The presented work can also be used to control wearable assistive devices, offering seamless interaction between humans and robotic devices.

VI. AVAILABILITY
The MATLAB and C implementations of the gait model training, together with the raw experimental data described in the Methods section, are available at (https://github.com/hirolab/gaitphaseestimation).