Gait Identification Using Limb Joint Movement and Deep Machine Learning

Person identification is a key problem in the security domain and may be used to automatically identify criminals or missing persons. The traditional face matching approaches adopted by police and security services across the world have recently been shown to produce a high rate of false positive identification. Alternatively, gait-based person identification has been shown to be a convenient method, particularly as it can be performed at a distance and without the cooperation of the subject, and gait is a biometric trait which cannot easily be disguised. In this work, we propose a gait-based person identification approach which uses limb joint motion data and deep machine learning models to identify individuals. Distinct statistical features are identified and extracted from limb movement using a fixed-width sliding window to train a Long Short-Term Memory model. The proposed solution outperforms existing methods, achieving 98.87% accuracy when evaluated over unseen samples. In addition, we propose a simple two-stage filtering approach which increases the prediction accuracy to 100% when identifying individuals from longer sequences of samples. This finding may improve current solutions in controlled environments such as airports. In the future, this approach may help to overcome the problem of occlusion in gait-based identification because, unlike existing works, it does not require information regarding the entire body. The study also presents a primary dataset comprising limb joint movement acquired from a diverse range of participants during casual walking, captured through two digital goniometers.


II. RELATED WORK
For decades, psychologists have demonstrated that people are able to recognise individuals based solely on the way that they walk [11], [12]. Early attempts at technological gait analysis included rule-based systems such as that in [13]; however, very little work on technological approaches to gait identification was reported before machine learning became prominent, after which significant advances have been made. Several existing works have explored machine learning approaches to gait identification, as shown in Table 1. For instance, [14] used a clustering algorithm in combination with an accelerometer-based device to learn the user's gait pattern when they first begin to use the device. A K-Nearest Neighbour (KNN) classifier is then used to determine whether detected footsteps belong to a known and approved user of the device [14]. This approach is appropriate where the access rights of a single person must be confirmed in a one-vs-all fashion (e.g., to grant or refuse access to a device); however, in its current form it is not appropriate for problems where people must be identified from a known list of individuals (i.e., multi-class classification). Separately, an unsupervised approach has been demonstrated for gait analysis in [15], which makes use of Rapid Centroid Estimation (RCE) for clustering, and also for the detection of neurological diseases such as Parkinson's [16]. Similarly, a joint-based approach has been shown to be useful in areas such as medical gait analysis [23], [24]. However, an unsupervised learning approach may be ill-suited to the gait identification problem, as identities are not labelled during the clustering process. Supervised machine learning, by contrast, can conveniently label the identified classes.
A variety of supervised machine learning models have been deployed for gait identification, including Artificial Neural Networks (ANN), Support Vector Machines (SVM), Long Short-Term Memory (LSTM), and KNN, as described in Section III. For example, in [17] an LSTM model is used to classify accelerometer data to identify participants, and the results are compared to a traditional approach which uses hand-crafted features and a random forest model. The results suggest that the LSTM model outperforms the alternative random forest approach when using the hand-crafted […] identification task in the event of partial occlusion. To the best of the authors' knowledge, none of the existing works investigate movement data from individual joints for the purpose of gait identification.

The existing literature also provides examples of alternative sensors for the purpose of gait analysis and identification. For example, in [28] and [29] pressure sensors are attached to the foot in order to analyse the wearer's gait pattern. Similarly, in [30] pressure sensors are placed on the floor to identify people walking across it. Alternatively, in [31] and [32] information regarding footsteps is acquired from sounds, and the work presented suggests that such an approach may be applicable to the problem of person identification. Despite demonstrating some degree of success, such solutions are not practical outside of a controlled environment, nor are they able to provide information regarding specific body parts, unlike wearable IMU sensors. Furthermore, due to these limitations, these sensors are not appropriate for prototyping solutions intended for adaptation to computer vision problems, as they do not provide information which can currently be retrieved using existing computer vision methods.
The proposed gait identification solution is a composite of several tasks, mainly related to the data science cycle, as shown in Figure 1. In the first step, the IMU signal for the body joints is acquired from real walking sequences of 30 participants. The raw data is then pre-processed (e.g., cleaned, standardised) using an overlapping (50%) fixed-width window equivalent to the duration of the average gait cycle. A feature vector is then extracted from each overlapping window, resulting in a large number of statistical features (442 in total). A similar approach is taken in [17], [33], and [34], where it is expected that windowing data in this manner will allow the LSTM to learn temporal dependencies between windows, each of which provides complete information regarding a full gait cycle at a given time.

The pre-processed data is then partitioned into two subsets: the training subset contains 80% of the data per participant and is used to train the various machine learning models required, while the remaining 20% is reserved and not exposed to the machine learning models until the testing phase. As this is a person identification problem, both training and testing data must be provided for each participant, because such models are unable to identify a specific individual unless they have previously been trained on data associated with that individual.
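
The per-participant 80/20 partition can be sketched as below. This is an illustrative sketch, not the authors' code: the function name `split_per_participant` is hypothetical, and it assumes each participant's windows are stored in temporal order and split contiguously (first 80% for training, last 20% for testing), which the text does not specify.

```python
import numpy as np


def split_per_participant(X, y, train_frac=0.8):
    """Split windowed samples so that each participant contributes
    train_frac of their windows to training and the rest to testing.

    Assumption: each participant's windows are stored in temporal order,
    and the split is contiguous (first 80% train, last 20% test).
    """
    train_idx, test_idx = [], []
    for participant in np.unique(y):
        idx = np.flatnonzero(y == participant)  # this participant's windows, in order
        n_train = int(round(train_frac * len(idx)))
        train_idx.extend(idx[:n_train])
        test_idx.extend(idx[n_train:])
    train_idx = np.asarray(train_idx, dtype=int)
    test_idx = np.asarray(test_idx, dtype=int)
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]


# Toy usage: 2 participants with 10 windows each (2 features per window).
X = np.arange(40).reshape(20, 2)
y = np.repeat([1, 2], 10)
X_train, X_test, y_train, y_test = split_per_participant(X, y)
print(len(y_train), len(y_test))  # 16 4
```

Because every participant appears in both subsets, the classifier is always evaluated on individuals it has seen during training. A contiguous split also limits the sharing of samples between 50%-overlapping training and test windows to the single boundary between the two subsets.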

From the 442 features available, the 30 most important features are identified using feature selection methods including Principal Component Analysis (PCA), the Boruta algorithm, and Recursive Feature Elimination (RFE). In the next step, an LSTM model is trained to identify the relevant participant using the extracted features. Finally, the model is evaluated with varying experimental configurations using the reserved and previously unseen test data (20% of the original data per […]
VOLUME 10, 2022
[…] Accelerometers measure the acceleration of the device, gyroscopes measure its angular velocity, and magnetometers measure the Earth's magnetic field [10]. In this work, the device was attached in two positions, as shown in Figure 2. In the first scenario (Figure 2a), the sensor is attached to the right leg above the knee to capture movement data for the hip joint, while in Figure 2b, the sensor is attached to the right arm above the elbow to capture movement data for the shoulder joint. An adjustable strap is used to attach the sensor comfortably and firmly to the participant; securing it with the provided clip takes less than 10 seconds, and the clip can be instantly detached using a button if required. These joints were chosen for their prominence in the gait cycle, and to cover joints from more than one limb.

Participants were required to perform 12 walking sequences, each 8 metres in length, which took approximately 6 to 8 seconds on average to complete depending on walking speed. This provided a total of 240 walking sequences, comprising approximately 1,680 complete gait cycles and 120,000 labelled samples. The raw data, along with the extracted features, is available in the supplementary materials (S1).

Once the dataset is acquired, the next stage is to extract distinct features from the raw data which can be used for the efficient identification of individuals. Firstly, windowing is performed on the recorded data. The duration of an average complete gait cycle is empirically selected as the optimal fixed-width window size, as shown in Table 3. The average gait cycle is determined by sampling 10 gait cycles from each of the participants in the dataset, giving a mean duration of 0.965 seconds. Table 3 suggests that the optimal window size is equal to the […] Table 3 also suggests that a sample size of 5 is optimal.
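
The windowing step can be sketched as follows: average the sampled gait-cycle durations, convert that duration to a window length in samples, and cut the recording into 50%-overlapping fixed-width windows. This is a minimal sketch under stated assumptions: the sampling rate `SAMPLE_RATE_HZ`, the toy durations, and the function names are hypothetical and not taken from the paper.

```python
import numpy as np


def average_gait_cycle(cycle_durations_s):
    """Mean duration (seconds) of manually sampled gait cycles."""
    return float(np.mean(cycle_durations_s))


def sliding_windows(signal, window_len, overlap=0.5):
    """Split a (n_samples, n_channels) signal into fixed-width windows.

    With overlap=0.5, consecutive windows share half of their samples.
    Trailing samples that cannot fill a complete window are dropped.
    """
    if len(signal) < window_len:
        return np.empty((0, window_len) + signal.shape[1:])
    step = max(1, int(window_len * (1 - overlap)))
    starts = range(0, len(signal) - window_len + 1, step)
    return np.stack([signal[s:s + window_len] for s in starts])


SAMPLE_RATE_HZ = 100  # hypothetical; substitute the sensor's actual rate
toy_durations = [0.95, 0.97, 0.975]  # toy values, not measured data
window_len = int(round(average_gait_cycle(toy_durations) * SAMPLE_RATE_HZ))
recording = np.random.default_rng(0).normal(size=(700, 17))  # ~7 s walk, 17 data points
windows = sliding_windows(recording, window_len)
print(window_len, windows.shape)
```

Dropping the incomplete trailing window keeps every feature vector computed over exactly one gait-cycle-length span.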

For each overlapping window segment, a feature vector is generated by combining each of the 17 data points provided by the sensor, as described in Table 4, with 13 statistical features, as described in Table 5. As a result, a vector containing 221 features (17 × 13) is produced for each window […]

Table 5 describes the statistical features calculated in each window for each of the data points described in Table 4, […] Table 5 provide a range of temporal and spatial features when combined with the data points in Table 4. For example, the maximum, minimum, and mean motion degree provide information regarding the stride length, while kurtosis and skewness describe the shape of the data distribution (its tailedness and asymmetry). Furthermore, Table 5 […] [39]. PCA can be used to identify the features which contribute most to the high-variance principal components, thus ensuring variety in the feature vector [40]. The correlation coefficients between the statistical features extracted from gait data and the principal components (obtained through […] features which are deemed statistically less important [42].
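
The per-window feature extraction and a PCA-based ranking can be sketched as below. The sketch computes only six of the statistics (maximum, minimum, mean, standard deviation, skewness, excess kurtosis); the remaining statistics in Table 5 are not reproduced. The ranking rule, which scores each feature by its largest absolute correlation with any of the first `n_components` principal components, is an assumption about how the correlation coefficients are used, and both function names are hypothetical.

```python
import numpy as np


def window_features(window):
    """(window_len, n_channels) -> flat vector of 6 statistics per channel.

    Only six statistics are computed here; Table 5 defines 13 per data point.
    """
    feats = []
    for x in window.T:  # one sensor data point (channel) at a time
        mu = x.mean()
        var = x.var()  # population variance
        d = x - mu
        if var > 0:
            skew = np.mean(d ** 3) / var ** 1.5  # asymmetry
            kurt = np.mean(d ** 4) / var ** 2 - 3.0  # excess kurtosis (tailedness)
        else:
            skew = kurt = 0.0  # constant channel: shape statistics undefined
        feats.extend([x.max(), x.min(), mu, np.sqrt(var), skew, kurt])
    return np.array(feats)


def pca_feature_ranking(F, n_components=3):
    """Rank columns of F (n_windows, n_features) by their largest absolute
    correlation with any of the first n_components principal components."""
    std = F.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing constant features by zero
    Z = (F - F.mean(axis=0)) / std
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = Z @ Vt[:n_components].T  # principal component scores
    corr = np.array([[np.corrcoef(Z[:, j], scores[:, c])[0, 1]
                      for c in range(n_components)]
                     for j in range(Z.shape[1])])
    importance = np.nan_to_num(np.abs(corr)).max(axis=1)
    return np.argsort(importance)[::-1]  # most important feature first


# One toy window: channel 0 varies, channel 1 is constant.
window = np.array([[1.0, 2.0], [2.0, 2.0], [3.0, 2.0], [4.0, 2.0]])
print(window_features(window).shape)  # (12,)
```

Stacking `window_features` over all windows gives the feature matrix `F` that `pca_feature_ranking` expects.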

The less important features are identified as those which are statistically less relevant than random probes [42]. A detailed explanation of the Boruta algorithm can be found in [42]. […] Table 4 and Table 5, respectively.
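
The "random probe" idea behind Boruta can be illustrated with a simplified sketch. Each real feature is compared against shuffled copies of all features (shadow features), and a feature is kept if its random-forest importance beats the best shadow in a majority of iterations. This is a deliberate simplification: the full Boruta algorithm in [42] uses a statistical test over repeated rounds and progressively removes rejected features. The function name and parameters are hypothetical, and the sketch assumes scikit-learn is available.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def boruta_like_selection(X, y, n_iter=10, seed=0):
    """Simplified Boruta: keep features whose random-forest importance beats
    the best shadow (permuted) feature in a majority of iterations."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    hits = np.zeros(n_features, dtype=int)
    for i in range(n_iter):
        # Shadow features: each column permuted independently, breaking any link to y.
        shadows = np.column_stack([rng.permutation(col) for col in X.T])
        forest = RandomForestClassifier(n_estimators=100, random_state=seed + i)
        forest.fit(np.hstack([X, shadows]), y)
        importances = forest.feature_importances_
        real, shadow = importances[:n_features], importances[n_features:]
        hits += real > shadow.max()  # a "hit" for each feature that beats every probe
    return hits > n_iter / 2  # boolean mask of confirmed features


# Toy data: only feature 0 determines the label.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))
y = (X[:, 0] > 0).astype(int)
mask = boruta_like_selection(X, y, n_iter=5)
print(mask)
```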

As shown in Table 6, the most commonly appearing sensor […] problems which involve long sequences of data [45]. The main difference between an LSTM and a standard RNN is the construction of the LSTM cell, which contains three gates. A forget gate controls how much of the previous cell state is retained, an input gate controls how much new information is written to the cell state, and an output gate controls how much of the cell state is passed to the cell's output (hidden state) [45]. For the given task of person identification, LSTM is an appropriate model because of the time-series nature of the data involved and the ability of LSTM to learn temporal dependencies between data samples [46].

Table 7 presents the LSTM model configuration implemented in this work; the configuration was selected empirically by repeating the experiment with additional layers until the optimum configuration was found. The optimum configuration contains four layers: an LSTM layer, a dropout layer with a dropout rate of 50% to help prevent overfitting, and two dense layers. The final layer has an output shape of 30, one unit for each of the 30 participants included in this work, and the final output is a single predicted label in the range 1 to 30.
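
To make the gate descriptions concrete, the following NumPy sketch runs an LSTM layer over a short sequence of feature vectors, then applies two dense layers and maps the arg-max of the softmax output to a participant label from 1 to 30. It is an illustration, not the trained model from Table 7: the layer sizes (64 LSTM units, 32 dense units), the ReLU activation, the input of 5 windows × 30 features, and all function names are hypothetical, and the weights are random.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_forward(x_seq, W, U, b, hidden):
    """Run one LSTM layer over x_seq (T, d) and return the final hidden state.

    W: (d, 4*hidden), U: (hidden, 4*hidden), b: (4*hidden,).
    Gate order in the stacked weights: forget, input, candidate, output.
    """
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    for x_t in x_seq:
        z = x_t @ W + h @ U + b
        f = sigmoid(z[:hidden])                # forget gate: how much old cell state to keep
        i = sigmoid(z[hidden:2 * hidden])      # input gate: how much new content to write
        g = np.tanh(z[2 * hidden:3 * hidden])  # candidate cell content
        o = sigmoid(z[3 * hidden:])            # output gate: how much cell state to expose
        c = f * c + i * g
        h = o * np.tanh(c)
    return h


def init_params(n_in=30, hidden=64, dense=32, n_classes=30, seed=0):
    """Random weights; all layer sizes here are hypothetical."""
    rng = np.random.default_rng(seed)
    s = 0.1
    return {
        "W": rng.normal(0, s, (n_in, 4 * hidden)), "U": rng.normal(0, s, (hidden, 4 * hidden)),
        "b": np.zeros(4 * hidden),
        "D1": rng.normal(0, s, (hidden, dense)), "d1": np.zeros(dense),
        "D2": rng.normal(0, s, (dense, n_classes)), "d2": np.zeros(n_classes),
        "hidden": hidden,
    }


def predict_participant(x_seq, p):
    h = lstm_forward(x_seq, p["W"], p["U"], p["b"], p["hidden"])
    # Dropout(0.5) is only active during training; at inference it is the identity.
    a = np.maximum(0.0, h @ p["D1"] + p["d1"])  # first dense layer (ReLU assumed)
    logits = a @ p["D2"] + p["d2"]              # output dense layer: one unit per participant
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                         # softmax
    return int(np.argmax(probs)) + 1, probs      # participant label in 1..30


params = init_params()
sequence = np.random.default_rng(1).normal(size=(5, 30))  # 5 windows x 30 selected features
label, probs = predict_participant(sequence, params)
print(label)
```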

To provide a comparison with the LSTM model implemented in the proposed solution, a variety of popular machine learning models have also been implemented for the classification task: an ANN, KNN, and SVM. The following subsections describe these models and their implementation. […] ANNs superficially resemble the neural networks of the human brain. In this work, a feed-forward ANN has been implemented. In feed-forward ANNs, connections move in one direction only (i.e., forward); unlike RNNs, there are no loops in the model, which means that inputs are considered in isolation and not in combination with any prior or subsequent inputs [47]. Each node in an ANN computes a function of its inputs, and the result is then passed on to the next nodes in the model [47]. This ANN is used to classify each of the windowed samples. […] [48]. KNNs retain the entire training set, so classification simply involves assigning the majority label of a data point's neighbours [48]. In this work, as in [29], the 10 nearest neighbours of the data point to be classified are examined when performing classification. The KNN algorithm is provided in [48].
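
The KNN rule described above (majority label among the 10 nearest training samples) can be sketched in a few lines of NumPy. Euclidean distance and smallest-label tie-breaking are assumptions here, since the text does not state a distance metric or tie rule; the function name is hypothetical.

```python
import numpy as np


def knn_predict(X_train, y_train, X_query, k=10):
    """Majority vote among the k nearest training samples (Euclidean distance)."""
    preds = []
    for q in X_query:
        dist = np.linalg.norm(X_train - q, axis=1)
        nearest = np.argsort(dist)[:k]
        labels, counts = np.unique(y_train[nearest], return_counts=True)
        preds.append(labels[np.argmax(counts)])  # ties resolve to the smallest label
    return np.array(preds)


# Two well-separated toy clusters of 10 samples each.
X_train = np.array([[0.1 * i] for i in range(10)] + [[10 + 0.1 * i] for i in range(10)])
y_train = np.array([1] * 10 + [2] * 10)
print(knn_predict(X_train, y_train, np.array([[0.5], [10.5]])))  # [1 2]
```

Because the whole training set is retained and scanned per query, prediction cost grows linearly with the number of stored windows.
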
SVMs perform classification by finding an optimal hyperplane which separates the members of one class from another [49]. The hyperplane may then be used to predict the most likely classification label for previously unseen data points [49]. Russell […] for one individual), therefore making a single classification prediction for the entire sequence as opposed to making predictions per sample. This is performed by replacing all sample predictions for that sequence with their mode. Both stages of the filtering algorithm are evaluated using the aforementioned reserved test data.
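
The two-stage filter can be sketched as follows. The mode stage follows the text directly. For the moving-average stage, Algorithm 1 is not reproduced here, so this sketch assumes it averages each sample's class-probability vector with the preceding samples of the same walking sequence (a trailing window of 5, matching the sample size from Table 3) before taking the arg-max. Both function names are hypothetical.

```python
import numpy as np


def moving_average_filter(probs, window=5):
    """Stage 1 (assumed form): smooth per-sample class probabilities of ONE
    walking sequence with a trailing moving average, then take the arg-max."""
    smoothed = np.empty_like(probs, dtype=float)
    for t in range(len(probs)):
        smoothed[t] = probs[max(0, t - window + 1): t + 1].mean(axis=0)
    return smoothed.argmax(axis=1)


def mode_filter(labels):
    """Stage 2: replace every prediction in the sequence with their mode."""
    values, counts = np.unique(labels, return_counts=True)
    return np.full_like(labels, values[np.argmax(counts)])


# One toy sequence: sample 2 is a one-off misclassification (class 1).
probs = np.array([[.8, .1, .1], [.7, .2, .1], [.2, .7, .1],
                  [.8, .1, .1], [.9, .05, .05], [.6, .3, .1]])
stage1 = moving_average_filter(probs)
stage2 = mode_filter(stage1)
print(probs.argmax(axis=1), stage1, stage2)
```

In the toy sequence, the isolated spike towards class 1 is absorbed by the neighbouring samples in stage 1, and stage 2 then collapses the sequence to a single label.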

This work contains three main experiments: the first uses the sensor data for the leg only, the second uses the data for the arm only, and the third uses synchronized data for both the leg and arm. In addition, a further three experiments are provided using the most important selected features, rather […] Table 5 for each of the 17 data points provided in Table 4, 221 potential features […]

Following the experimental design (Section IV), statistical results are retrieved from multiple experiments and evaluated using the reserved, previously unseen test data as described in Section III. For each experiment, the results of the proposed LSTM model are provided alongside those of the alternative models described in Section III for comparison. Furthermore, the results of both stages of the two-stage filtering algorithm (i.e., the moving average and mode columns), as detailed in Algorithm 1, are presented. Both stages of the filtering algorithm are evaluated using the reserved test data. The first stage, the moving average, is evaluated on a per-sample basis (i.e., many samples per walking sequence), whereas the second stage, the mode, is by its nature evaluated on a per-sequence basis (i.e., one prediction per walking sequence).

Table 8 presents the results of Exp.1-A, which was completed using leg data only, utilizing all 221 available features. As Table 8 shows, the highest accuracy was achieved by the proposed LSTM approach, at 97.3% when evaluated on purely unseen test data. However, the additional metrics (precision, recall, F1 score, and Cohen's Kappa) are all highest for the SVM model, suggesting that it may provide a more stable and balanced classification with a reduced rate of false positives.

Table 9 presents the results of Exp.1-B, which was also completed using the leg data only; however, only the 15 most important features were used in this experiment. As in Exp.1-A, the proposed LSTM approach provides the best accuracy.
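
The metrics reported alongside accuracy can be computed from a confusion matrix, as in the NumPy sketch below. Cohen's Kappa is (p_o − p_e) / (1 − p_e), where p_o is the observed agreement (the accuracy) and p_e is the agreement expected by chance from the row and column totals. Macro averaging of precision, recall, and F1 is an assumption, since the averaging scheme behind Tables 8 to 11 is not stated here; the function name is hypothetical.

```python
import numpy as np


def classification_metrics(y_true, y_pred):
    """Accuracy, macro-averaged precision/recall/F1, and Cohen's Kappa."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    classes = np.union1d(y_true, y_pred)
    index = {c: i for i, c in enumerate(classes)}
    cm = np.zeros((len(classes), len(classes)))  # rows: true, columns: predicted
    for t, p in zip(y_true, y_pred):
        cm[index[t], index[p]] += 1
    total = cm.sum()
    tp = np.diag(cm)
    pred_totals, true_totals = cm.sum(axis=0), cm.sum(axis=1)
    precision = np.divide(tp, pred_totals, out=np.zeros_like(tp), where=pred_totals > 0)
    recall = np.divide(tp, true_totals, out=np.zeros_like(tp), where=true_totals > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    p_o = tp.sum() / total                                # observed agreement (= accuracy)
    p_e = (pred_totals * true_totals).sum() / total ** 2  # agreement expected by chance
    kappa = (p_o - p_e) / (1 - p_e) if p_e < 1 else 1.0
    return {"accuracy": p_o, "precision": precision.mean(), "recall": recall.mean(),
            "f1": f1.mean(), "kappa": kappa}


m = classification_metrics([1, 1, 2, 2], [1, 2, 2, 2])
print(m["accuracy"], m["kappa"])  # 0.75 0.5
```

Kappa discounts the agreement a classifier would achieve by chance, which is why it can rank models differently from raw accuracy.
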
Furthermore, when using only the top 15 features, as in Exp.1-B, all models report slightly improved results compared to using all features, as in Exp.1-A (see Tables 8 and 9), suggesting that an appropriate subset of the original feature vector has been selected.

Table 10 presents the results of Exp.2-A, which was completed using the arm data only, with all 221 available features. As with Exp.1-A, the proposed LSTM architecture provides the highest accuracy: 98.9% on the previously unseen test data, the highest reported across all experiments. As with Exp.1, the SVM model provided higher scores for the remaining metrics. […] Table 11 presents the results of Exp.2-B, which was also completed using the arm data only, with only the 15 most important features. As in Exp.2-A (Table 10), LSTM provides the best accuracy, as shown in Table 11. Furthermore, when using only the […] however, all alternative models report lower accuracies than when using the full feature set. […] In summary of the existing works' results, the accuracies presented in Table 14 show that the proposed method […] provided the highest accuracy; this is likely due to the more varied arm movement displayed by people, as discussed in Section IV. This suggests that there is an opportunity to further explore arm and joint movement as an alternative to the traditional approaches, which focus on either the legs or the trunk of the body.

As shown in Section V, each trained model (when evaluated over unseen instances) achieved higher accuracy on the arm data (Exp.2-A) than on the leg data (Exp.1-A). This is likely due to the more visually obvious differences in arm movement between individuals. For example, some people are rigid and have very little arm movement when they walk, some allow their arms to swing naturally with the rhythm of their walking, and some use more energy and swing their arms more flamboyantly as they walk [11].

Furthermore, it can be observed that, overall, Exp.2-A (arm data only) achieved higher accuracy than Exp.3-A (the combination of leg and arm data). This was an unexpected result, as it was anticipated that combining both limbs would provide more distinctive gait traits for each participant.

As presented in Section V, Table 14 compares the accuracy of the proposed solution to those of similar IMU-based gait identification works from the literature. In [19], movement data was gathered from 24 participants via an IMU device attached to the ankle, containing an accelerometer and gyroscope; however, unlike the proposed work, it does not include a magnetometer, which would provide additional features. A CNN is used in [19] to perform feature extraction, after which an SVM classifies participants. An accuracy of 80% is reported when provided with data containing 5 complete gait cycles. In contrast, we use a window size equivalent to a single complete gait cycle and a sample size of 5 windows, achieving an accuracy of up to 98.9% using a similar amount of data, as shown in Table 10. Furthermore, the use of a CNN for feature extraction in [19], as compared to our hand-crafted features, will limit the explainability and interpretability of the model [51].

Similarly, in [20] IMU data was gathered from 30 participants via a smartphone attached to the trunk of the body. As in [19], magnetometer data was not included, and neither was gyroscope data, both of which may provide useful features, as demonstrated in our proposed solution. As described in Section II, attaching the sensor to the trunk means that only generic whole-body data is provided, which does not allow for the evaluation of individual limbs and joints, unlike the proposed work. Furthermore, the approach described in [20] used the same number of participants as the proposed solution but achieved only 80.3% accuracy, considerably lower than the 98.9% we report in Table 14.

Unlike the other works described in Table 14 […] .1-B). […] an accuracy of 91% is reported, which is substantially lower than the 98.9% achieved by the proposed work, which contains 30 participants and requires fewer sensors.
Again, unlike the statistical features used in the proposed work, CNN-based feature extraction would likely make the model less explainable. Furthermore, only the pre-processed dataset is available, not the original raw data, which limits the ability to reuse or experiment with the original data. Similar to [20], the sensor used in [21] was attached to the trunk; again, this provides motion data for the body as a whole and does not allow limb- or joint-based approaches to be implemented or evaluated. A further limitation of the dataset is that only 2.5 seconds of data is recorded per participant, thus limiting the number of complete gait cycles [52]. In comparison, we provide approximately 60 seconds of data per participant in our primary […]

Such an implementation may also help to address the problem of occlusion that affects many current computer vision-based approaches to gait identification. This is possible because, except in cases of total occlusion, at least one body joint is likely to be visible for the purpose of gait-based identification. As demonstrated in this work, a single joint may provide enough information to accurately identify an individual.

It is also important to note that the related problem of gait identification when unknown people are detected (i.e., those not in the training set, or those whom we do not seek to identify) should be addressed to improve the real-world applicability of the proposed solution. For example, when searching for missing people, the solution should exclude people who are not on the missing persons list (i.e., not attempt to identify them). To the best of the authors' knowledge, this is an important area of research which has gained little attention, and therefore a robust solution is not currently available. One potential solution, OpenMax [53], is an alternative final layer for machine learning models which aims to estimate the probability of a sample belonging to an unknown class. It achieves this by adapting SoftMax, removing the requirement for the probabilities over all known classes to sum to 1 and adding a category for unknown classes [53]. However, implementing and evaluating this would require the collection of gait data from additional participants to act as the unknown classes since, to the best of the authors' knowledge, no such open-set gait dataset is currently available.

Furthermore, the authors are not aware of any existing works exploring how person identification can be performed using pathological gait (i.e., gait abnormalities caused by, for example, pain, reduced range of motion, or weakness) or where the person has aged significantly since their gait sample was collected. As current works utilize datasets captured over a relatively brief period, the data provides information regarding the person's gait at the time of recording and therefore does not account for future ageing, injury, or illness. Matovski et al. [54] report that the time between recording sessions has less impact on identification accuracy than other factors such as clothing. However, their dataset contains only a single six-month gap between recordings, and the authors are not aware of any alternative gait datasets with time gaps between recordings. An initial approach to further explore this problem would require the collection of a gait dataset recorded at significant intervals (i.e., years) containing instances of pathological gait. It may then be possible to identify unaffected gait features which may aid in person identification from pathological gait data. Solving this problem would potentially create an identification system capable of identifying individuals long after their gait samples have been collected, without the requirement of updating the gait data for each participant.

This work has demonstrated that the hip and shoulder joint movements, both on their own and in combination, possess sufficient information to provide accurate gait-based […] Table 6. Moreover, the most important features contain features from two of the IMU sensors (gyroscope and accelerometer). Therefore, including a variety of sensors will likely be beneficial for future works, as this is […]

The authors wish to extend gratitude to all the participants who volunteered to be included in our gait dataset.

[…] include several areas: the first is the use of technology to enhance learning; specifically, he is interested in how game-like environments can be used to promote learning and to motivate learners to engage in their studies. He has a growing interest in the security and application of 5G wireless technologies. He has acted as a reviewer for numerous conferences and journals, and as a reviewer for the HEA in funding bids for technology-enhanced learning. He is a member of the British Computer Society.