Gait Recognition With Wearable Sensors Using Modified Residual Block-Based Lightweight CNN

Gait recognition with wearable sensors is an effective approach to identifying people by recognizing their distinctive walking patterns. Deep learning-based networks have recently emerged as a promising technique in gait recognition, yielding better performance than template matching and traditional machine learning methods. However, most recent studies have focused on improving gait detection accuracy while neglecting model complexity, making the resulting models unsuitable for low-power wearable devices: inference from these models incurs latency due to calculation overhead. This study proposes an efficient network suitable for wearable devices without sacrificing prediction performance. We have modified the residual block and incorporated it into a shallow convolutional neural network with only five weighted layers for gait recognition, and demonstrated the efficacy of all the architectural components with extensive experiments on publicly available IMU-based datasets: whuGait and OU-ISIR. Our proposed model outperforms all the state-of-the-art methods in recognition accuracy and reduces model parameters and memory consumption by more than 85 percent on average.


I. INTRODUCTION
Biometrics is the process of automatically identifying an individual based on physiological or behavioral characteristics that are highly unique, stable, and easily obtained [1]. Physiological biometrics are concerned with the shape of the body, such as the human face [2], fingerprints [3], iris [4], etc., while behavioral biometrics are concerned with the pattern of behavior of a person, such as keystrokes, gait, signatures, etc. Many physiological biometrics have been commercially deployed. However, some of these biometrics are intrusive to users since they rely on the active participation of the users to collect data [5]. For example, users may be asked to place a finger on a gadget to take their fingerprints or stare at a camera close enough to have their irises photographed. In such circumstances, a user may feel insulted and quickly understand that his or her identity is being scrutinized [6]. Moreover, physiological biometrics have several insurmountable flaws. First, sensors for acquiring physiological characteristics (e.g., fingerprint scanners, cameras) are costly and large. Second, biometrics can be falsified and hacked in some cases. Face recognition, for example, may be tricked by using a picture or video of the target face [7], [8]. Finally, if any unlocked device is lost, there is a risk of revealing private data to strangers [9].
The associate editor coordinating the review of this manuscript and approving it for publication was Mohamad Forouzanfar.
Gait is a behavioral biometric that refers to the walking posture of a person [10], which is very difficult to duplicate or copy [1], [2], [11]. Gait-based identification is dynamic, real-time, and continuous in nature, does not require direct user participation, and has a high level of security [5], [9]. In addition, as microelectronics technology has advanced, practically all wearable intelligent devices (WIDs) have been integrated with inertial measurement units (IMUs) because of their low cost, compact size, and low power consumption, which enables researchers to collect gait information using the built-in IMUs in the WIDs and authenticate the users [12]. In inertia-based gait recognition approaches, inertial sensors, such as accelerometers and gyroscopes, are used to record the inertial data created by the movement of a walking body. As the sensors record gait dynamics, the inertial data effectively capture walking patterns [13].
Using IMU sensors, the effective use of wearable devices for gait recognition requires efficient recognition networks with minimal computation overhead. We therefore faced two major challenges: (a) designing a lightweight model suitable for low-powered wearable devices; and (b) achieving state-of-the-art performance in gait recognition, which is particularly difficult for datasets with a large number of subjects. Our study addresses both challenges, and the main contributions are as follows:
• We proposed an efficient residual convolutional neural network for gait recognition, which is suitable for wearable devices and outperforms all the state-of-the-art methods on multiple publicly available IMU-based datasets.
• We modified the residual block by using a non-linear activation function before the batch normalization layer and showed the superiority of the proposed residual block by comparing it with the existing residual blocks. Finally, we designed a novel shallow convolutional neural network using the proposed modified residual block to build the lightweight model.
• Furthermore, we demonstrated the efficacy of all architectural components in the proposed lightweight model through extensive ablation, insertion, and modification experiments.
The rest of this paper is organized as follows: Section II presents previous works related to this study. The datasets are described in Section III, along with the proposed residual learning recognition methodology. The experimental analysis and performance evaluation of the models are presented in Section IV. Finally, Section V concludes the paper by emphasizing our contributions.

II. RELATED WORKS
Gait recognition with sensors can be done in several ways, including with sensors on the floor, shoes, and body [14]-[16]. The inertia-sensor-based approaches are the most appealing among these methods and their variations because inertial sensors can be put on the body to collect movement characteristics in great detail [17]-[23], and the acquired time-series gait data can be used to identify and authenticate people [24], [25]. Template matching and machine learning-based methods are the two primary approaches to IMU-based gait recognition [26], [27]. In template matching methods, the user is identified by comparing the observed gait against the gait templates stored in WIDs [28]. If the resemblance exceeds a predefined level, the user is accepted as authentic. Dynamic time warping (DTW) [29], Pearson correlation coefficient (PCC) [30], and cross-correlation [31] are commonly used methods for calculating resemblance. Many earlier studies have looked at various template matching algorithms [32]-[38], and good results have been produced under controlled laboratory circumstances [39].
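As a minimal sketch of the template matching idea described above, the following computes a DTW distance and a Pearson correlation between a stored template and a probe sequence, and accepts the probe when the DTW distance is below a threshold. The function names, the absolute-difference local cost, and the threshold value `max_dtw` are illustrative assumptions and are not taken from any of the cited studies.

```python
import math

def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # assumed local cost
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def accept(template, probe, max_dtw=1.0):
    """Accept the probe when it is close enough to the stored template.

    max_dtw is a hypothetical threshold chosen for this toy example.
    """
    return dtw_distance(template, probe) <= max_dtw

# Toy sequences standing in for one gait cycle of a single sensor axis
template = [0.0, 1.0, 2.0, 1.0, 0.0]
stretched = [0.0, 1.0, 1.0, 2.0, 1.0, 0.0]  # same shape, slower pace
other = [2.0, 0.0, 2.0, 0.0, 2.0]           # different pattern
```

Because DTW allows one sample to align with several, the time-stretched sequence still matches the template, whereas a fixed-lag comparison would penalize the change in pace.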
In their research, Ailisto et al. presented a signal-correlation approach for inertia-based gait identification where the recognition was done using template matching and cross-correlation computing [40], [41]. Following this research, Gafurov et al. made numerous significant improvements [16], [37], [42], [43]. In [43], they looked at the gait biometrics of the minimal-effort impersonation attack and the closest person attack. By inserting an accelerometer sensor into the pocket of the user, they were able to collect 300 gait sequences from 50 participants and achieve an equal error rate (EER) of 7.3% [37]. In [16], they tested user authentication using the foot, pocket, arm, and hip and discovered that a sideways motion of the foot makes the most difference and that a different portion of the gait cycle often leads to a different level of discrimination. DTW was used by Liu et al. to match gait curves [44].
As an improvement to this work, the wavelet denoising and gait-cycle segmentation techniques were introduced in their later work [45]. Trivino et al. proposed a method for modeling the perception of signal evolution using a fuzzy finite state machine (FFSM) [46]. Zhang et al. presented a method for avoiding cycle detection failures and phase misalignment between cycles [36]. Derawi et al. improved the gait-based authentication by providing a stable cycle detection mechanism [47] along with thorough comparisons.
Due to the fast growth of mobile devices in recent years, the accelerometer and gyroscope have become increasingly available on smartphones [48] and smartwatches [49]. In a variety of scenarios, such as person authentication [50]-[52], medical analysis [53]-[55], and impersonation-attack defense [56], smartphones have been used for gait recognition [57]-[59]. Data can be collected by keeping smartphones in participants' pockets [50]. However, template matching methods need to detect the gait cycles to construct the gait templates and test samples [39]. Gait cycle identification is difficult since it is sensitive to noise and device placement. Changes in pace, road conditions, and device position are all likely to produce gait cycle detection failures or inter-cycle phase misalignment, resulting in incorrect recognition results [36], [57], [60], [61]. Though there is currently no standard for manually extracting distinguishing gait features [62], Xu et al. proposed an adaptive preprocessing algorithm for extracting the effective components from gait data, and tested it on four publicly available datasets and three different neural networks [63]. To acquire good results, researchers must have extensive professional knowledge and experience in related domains, as well as go through data preprocessing, feature engineering, and continual experimental verification and improvement, which takes time and effort [64].
Gait recognition has also been performed using machine learning approaches that extract and classify the unique properties of gait signals into separate classes [65]-[67]. For gait identification, previous research employed support vector machines (SVM) [68]-[70], k-nearest neighbors (KNN) [71]-[73], and random forests (RF) [68], [69] and found that these performed better than the template matching approaches. The manually derived features used in machine learning-based approaches had a significant effect on the recognition accuracy of these models. In recent years, deep learning has seen much success in the fields of secure computing [74], [75], and activity recognition [76]. Gait identification based on deep learning approaches performed better than classic machine learning-based methods [57], [77], [78]. According to recent studies, the application of deep learning approaches to gait identification has become a promising new trend [6], [26], [27], [61], [62], [79], [80]. Gadaleta and Rossi [26] used a convolutional neural network (CNN) for gait recognition. They created the IDNet framework based on a CNN and a one-class SVM [81] for user identification and authentication, utilizing data obtained by smartphones' accelerometers and gyroscopes. The PCA [82] approach was used to lower the dimension of the gait features after they were extracted using a three-layer CNN. The features were then used to identify and authenticate users using the one-class SVM. Compared to manual feature extraction, their findings demonstrated that the CNN automatically acquired more useful features and performed better. In [83], three deep CNNs were built for gait detection, utilizing the users' gait energy images (GEIs) as input. To boost classification accuracy, feature maps from different convolutional stages were combined. Deep CNNs with contrastive loss and triplet ranking loss were proposed by Takemura et al. for cross-view gait recognition, and better performance in person authentication and identification was obtained [84].
Elharrouss et al. were able to recognize gait with high accuracy using extracted GEIs and multitask convolutional neural network models [85]. Gul et al. proposed a 3D CNN architecture for gait recognition using a holistic approach in the form of GEIs [86]. Liu et al. presented a lightweight double-channel depthwise separable convolutional neural network (DC-DSCNN) model for gait recognition on wearable devices, which could classify gait with high accuracy [87].
Traditional feature selection and machine-learning methods like PCA, Bayesian classifier, and SVM can also be integrated with CNN [26], [88], [89]. In [90], CNN was used to process three-dimensional data that included pictures and optical flow information in order to recognize gait and activity. In [76], to take the temporal characteristics into account, a series of 2D images were combined into 3D data, and 3D convolution kernels were used to derive activity recognition characteristics. In the experiment performed by Donahue et al., LSTMs and CNNs were integrated for activity recognition [91]. Yu et al. built a gait feature extractor using a generative adversarial network (GAN) to decrease the influence of view angle, weight, and clothes [92]. Chen et al. used the Multi-view Gait Generative Adversarial Network (MvGGAN) to produce synthetic gait samples in order to augment existing gait datasets, which provide sufficient gait samples for deep learning-based cross-view gait recognition methods [93].
Deep learning-based gait recognition methods yield better performance than template matching and machine learning methods. However, they have an apparent flaw: the models are too complex, with too many parameters for wearable intelligent devices with limited computational power and capacity [6], [39]. Therefore, inference from these models results in latency due to calculation overhead. The models must be made more efficient by constructing simpler architectures with minimal computation overhead, i.e., fewer parameters, without sacrificing prediction performance, so that they are suitable for wearable devices. Residual learning was first proposed for computer vision tasks to train very deep convolutional neural networks. We have successfully adopted residual learning for gait recognition, incorporated it into shallow networks with a minimal number of parameters, and demonstrated the efficacy of all the components of the architecture with extensive experiments. Our proposed model has achieved state-of-the-art accuracy on multiple widely used IMU sensor-based gait datasets and, on average, uses more than 85% fewer parameters and less memory than the latest gait recognition study [39].

A. DATASET DESCRIPTION
We used two publicly available gait datasets, namely the whuGait dataset and the OU-ISIR dataset. Both consist of sensor data collected using accelerometers and gyroscopes. Ngo et al. from Osaka University released the OU-ISIR dataset, which is the most extensive public gait dataset, with the largest number of subjects [94]. Zou et al. from Wuhan University provided the whuGait dataset. They also preprocessed both datasets, split them into benchmark training and independent test sets, and shared them in their research [6].
The whuGait dataset comprises inertial data from 118 persons acquired using smartphones in an unconstrained environment with no knowledge of when, where, or how the subjects walked. The sensors sampled at 50 Hz, with each sample containing 3-axis accelerometer and 3-axis gyroscope data [6]. In this study, we have used four whuGait datasets, namely Dataset #1 to Dataset #4, for gait recognition. The most significant variations across the four datasets are the number of participants, the data segmentation method, whether or not the samples overlap, and the sample size.
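To illustrate what overlapping and non-overlapping samples mean for a continuous recording, the sketch below cuts a 6-channel signal into fixed-length windows. It covers only a fixed-length variant; the window length of 128 samples follows the 6 × 128 sample shape used later in this paper, while the step sizes, the function name `segment`, and the all-zero stand-in recording are assumptions made for illustration and do not reproduce the preprocessing of [6].

```python
def segment(signal, window=128, step=64):
    """Cut a (T x C) list of samples into fixed-length windows.

    step == window gives non-overlapping samples;
    step < window gives overlapping samples.
    """
    last_start = len(signal) - window
    return [signal[s:s + window] for s in range(0, last_start + 1, step)]

# 10 seconds of hypothetical 6-channel IMU data at 50 Hz (zeros as a stand-in)
recording = [[0.0] * 6 for _ in range(500)]

overlapping = segment(recording, window=128, step=64)   # 50% overlap (assumed)
disjoint = segment(recording, window=128, step=128)     # no overlap
```

With overlap, the same recording yields more training samples, at the cost of neighboring samples sharing part of their content.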
The OU-ISIR dataset consists of 744 subjects. Among them, 389 are males and 355 are females, with a broad age range of 2 to 78 years. The gait signals in this dataset were recorded at 100 Hz via a central IMU mounted on a waist belt. Each of the 744 participants walked for 9 meters on a flat area [6]. The 3-axis acceleration and 3-axis angular velocity data were acquired from the accelerometer and gyroscope. A summary of the datasets used in this study is provided in Table 1.

B. PROPOSED METHODS FOR GAIT RECOGNITION
1) MODIFIED RESIDUAL BLOCK BASED LIGHTWEIGHT CNN
VOLUME 10, 2022
TABLE 1. Summary description of the datasets used in this study [6], [39].
Since the inception of residual learning, residual blocks and skip connections have been an integral part of training very deep neural networks with hundreds to thousands of layers [95], [96]. Besides very deep architectures, residual learning can be effective and efficient in shallow architectures in many cases. In this study, we have designed a lightweight residual convolutional neural network with two residual blocks using a total of only five convolutional layers. The proposed residual block is created by modifying the original residual block [95]. In contrast to [95], we have used the non-linear activation before the batch normalization operation. We assume that if we insert the batch normalization layer after an activation layer, the batch normalization layer can fully control the statistics of the input going into the next layer, which yields better accuracy. Therefore, we have proposed our residual block with batch normalization after the non-linearity. To incorporate non-linearity, we have investigated the performance of four widely used activation functions: Rectified Linear Unit (ReLU) [97], Exponential Linear Unit (ELU) [98], Leaky Rectified Linear Unit (LeakyReLU) [99], and Parametric Rectified Linear Unit (PReLU) [100].
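The ordering of operations inside such a block can be sketched in plain Python as follows. This is an illustrative forward pass under several assumptions that the text does not fix: two convolutions per block, a kernel size of 3 with zero padding, a LeakyReLU slope of 0.3, the skip connection added directly after the second batch normalization, and a simplified normalization that uses the statistics of a single sample and omits the learned scale and shift. All function names are hypothetical.

```python
import math
import random

def conv1d_same(x, w, b):
    """1-D convolution with zero 'same' padding.

    x: T x C_in, w: K x C_in x C_out, b: C_out. Returns T x C_out.
    """
    T, K = len(x), len(w)
    c_in, c_out = len(w[0]), len(b)
    pad = K // 2
    out = []
    for t in range(T):
        row = list(b)
        for k in range(K):
            s = t + k - pad
            if 0 <= s < T:
                for i in range(c_in):
                    for o in range(c_out):
                        row[o] += x[s][i] * w[k][i][o]
        out.append(row)
    return out

def leaky_relu(x, alpha=0.3):
    """Element-wise LeakyReLU; alpha is an assumed negative slope."""
    return [[v if v > 0 else alpha * v for v in row] for row in x]

def batch_norm(x, eps=1e-3):
    """Normalize each channel to zero mean and unit variance.

    Simplification: statistics come from the time axis of one sample,
    and the learned scale and shift are omitted.
    """
    T, C = len(x), len(x[0])
    out = [[0.0] * C for _ in range(T)]
    for c in range(C):
        col = [x[t][c] for t in range(T)]
        mean = sum(col) / T
        var = sum((v - mean) ** 2 for v in col) / T
        for t in range(T):
            out[t][c] = (col[t] - mean) / math.sqrt(var + eps)
    return out

def modified_residual_block(x, params):
    """(conv -> activation -> batch norm) per layer, then add the skip."""
    y = x
    for w, b in params:
        y = batch_norm(leaky_relu(conv1d_same(y, w, b)))
    return [[a + r for a, r in zip(xr, yr)] for xr, yr in zip(x, y)]

def random_conv(k, c_in, c_out, rng):
    """Random weights and zero biases, for demonstration only."""
    w = [[[rng.uniform(-0.5, 0.5) for _ in range(c_out)]
          for _ in range(c_in)] for _ in range(k)]
    return w, [0.0] * c_out

rng = random.Random(0)
x = [[rng.gauss(0.0, 1.0) for _ in range(6)] for _ in range(128)]  # 128 x 6
params = [random_conv(3, 6, 6, rng), random_conv(3, 6, 6, rng)]
out = modified_residual_block(x, params)
```

Because normalization is the last operation on the residual branch in this sketch, the branch output added to the skip connection is normalized rather than raw activation output, which is the property the argument above relies on.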
The high-level architecture of the proposed residual block and the model is depicted in Fig. 1. In the model, a weight layer is placed between the two successive residual blocks to increase the number of feature maps. Here, the weight layers are one-dimensional convolution layers, and the activation refers to one of the non-linearities from the ReLU family. The number of units with softmax activation in the dense layer changes with the number of subjects in the dataset. Table 2 contains further architectural details of the proposed model. The operation "x conv, y" denotes a convolution with a receptive field size of x and y filters, and "/z" refers to global average pooling across z channels. The operations, output shapes, and numbers of parameters of the layers change across datasets, as the complexity of the dataset and the number of subjects vary (see Table 1). In Table 2, the Operation, Output Shape, and No. of Params. columns have been populated for Dataset #1 and Dataset #3 for better understanding. Note that we have permuted the axes of the signals beforehand so that the data shape of 6 × 128 is converted into 128 × 6, which is suitable for our architecture.
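How the parameter count of such a shallow network grows with the number of filters and subjects can be estimated with a few lines of arithmetic. The sketch below assumes a layout of two convolutions per residual block plus one middle convolution, a kernel size of 3, biases in every convolution, one batch normalization (trainable scale and shift only) after each convolution, and a single softmax dense layer after global average pooling. These assumptions and the function names are illustrative, so the resulting counts are not claimed to match Table 2. The helper `to_time_major` shows the 6 × 128 to 128 × 6 permutation mentioned above.

```python
def conv1d_params(kernel, c_in, c_out):
    """Weights plus biases of a 1-D convolution."""
    return kernel * c_in * c_out + c_out

def batch_norm_params(channels):
    """Trainable scale and shift only (moving statistics excluded)."""
    return 2 * channels

def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

def count_params(filters, n_subjects, kernel=3, c_in=6):
    """Trainable parameters of the assumed layout:
    block(c_in) -> conv(filters) -> block(filters) -> GAP -> dense.
    Global average pooling has no parameters.
    """
    convs = [(c_in, c_in), (c_in, c_in),        # first residual block
             (c_in, filters),                   # middle convolution
             (filters, filters), (filters, filters)]  # second residual block
    total = sum(conv1d_params(kernel, i, o) + batch_norm_params(o)
                for i, o in convs)
    return total + dense_params(filters, n_subjects)

def to_time_major(sample):
    """Permute a channels x time sample into time x channels."""
    return [list(step) for step in zip(*sample)]

sample = [[float(c)] * 128 for c in range(6)]   # hypothetical 6 x 128 sample
permuted = to_time_major(sample)                # 128 x 6
small = count_params(filters=32, n_subjects=118)
large = count_params(filters=64, n_subjects=118)
```

Under these assumptions, the second residual block and the dense layer dominate the count, which is why the number of filters in the second block is the main lever for keeping the model lightweight.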

2) PROPOSED AND OTHER VARIANTS OF RESIDUAL BLOCK
From the inception of the brilliant idea of residual learning [95], there have been numerous proposals for the residual block architecture. Along with the original residual block, He et al. proposed another four different configurations of residual block [96]. Moreover, with different settings of weighted layers, batch normalization, activation functions, and skip connections, there are some other residual blocks adopted to solve some other specific problems [101]- [104]. Fig. 2 contains a few of the depictions of the residual block architectures found in the literature. The sequence of the weighted and normalization layers and the skip connection varies among the blocks, whereas some have tried to replace the ReLU activation function with the other ones.

IV. EXPERIMENTAL ANALYSIS
A. EXPERIMENTAL SETTINGS AND EVALUATION METRICS
Zou et al. provided the benchmark datasets split into a training set and an independent test set [6]. We have further split the training set randomly into 90% and 10% portions, where 90% of the data was used for training and the other 10% as a validation set. In all experiments, we trained the models for up to 1000 epochs using the Adam optimizer with a learning rate of 0.001, with an early stopping mechanism monitoring the validation loss. After training, we evaluated the performance of the models using the independent test set provided by Zou et al. [6].
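The data split and the stopping rule can be sketched independently of any deep learning framework. In the following illustration, the random seed, the patience of 10 epochs, and the names `split_train_val`, `EarlyStopping`, and `run_training` are assumptions; the text specifies only the 90%-10% split, the 1000-epoch limit, and that validation loss is monitored. The list of validation losses stands in for real per-epoch evaluation.

```python
import random

def split_train_val(samples, val_fraction=0.1, seed=0):
    """Randomly split samples into training and validation subsets."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    n_val = int(round(len(samples) * val_fraction))
    val = [samples[i] for i in idx[:n_val]]
    train = [samples[i] for i in idx[n_val:]]
    return train, val

class EarlyStopping:
    """Stop when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=10):  # patience is an assumed value
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def should_stop(self, val_loss):
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

def run_training(val_losses, max_epochs=1000, patience=10):
    """Return the epoch at which training ends.

    val_losses stands in for the validation loss measured after each epoch.
    """
    stopper = EarlyStopping(patience)
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if stopper.should_stop(loss):
            return epoch
    return min(len(val_losses), max_epochs)

train_set, val_set = split_train_val(list(range(100)))
stopped_at = run_training([1.0, 0.9, 0.8] + [0.85] * 50, patience=5)
```

With early stopping, the 1000-epoch setting acts as an upper bound rather than the actual training length.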
We have reported accuracy (Acc) as the primary metric of measuring gait recognition performance and comparing the methods (see (1)). To measure the efficiency with respect to the previous studies in terms of parameter reduction (PR) and memory-usage reduction (MR), we have used (2) and (3), respectively. Moreover, the performance gain (PG) was computed as the difference between the performance with the existing methods and ours (see (4)). All of the experiments conducted in this study were backed by a tensor processing unit (TPU v2 with 8 cores) provided by Google Colab. The Keras API over TensorFlow backend was used to construct the models [105].
Here, Acc, PR, MR, and PG refer to accuracy, parameter reduction, memory-usage reduction, and performance gain, respectively.
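Under the usual definitions of these quantities, and as an assumption rather than a transcription of (1) to (4), the four metrics can be written as below. The symbols are introduced here for illustration: N_correct and N_total are the numbers of correctly classified and all test samples, P and M are the number of trainable parameters and the memory usage of a model, and the subscripts distinguish the existing method from ours.

```latex
\begin{align}
\mathrm{Acc} &= \frac{N_{\text{correct}}}{N_{\text{total}}} \times 100\% \tag{1}\\
\mathrm{PR}  &= \frac{P_{\text{existing}} - P_{\text{ours}}}{P_{\text{existing}}} \times 100\% \tag{2}\\
\mathrm{MR}  &= \frac{M_{\text{existing}} - M_{\text{ours}}}{M_{\text{existing}}} \times 100\% \tag{3}\\
\mathrm{PG}  &= \mathrm{Acc}_{\text{ours}} - \mathrm{Acc}_{\text{existing}} \tag{4}
\end{align}
```

With this sign convention, positive PR, MR, and PG all indicate an improvement of our model over the existing method.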

B. PERFORMANCE OF PROPOSED MODIFIED RESIDUAL BLOCK
We have evaluated the performance of the proposed residual architecture (see Fig. 1 (b) and Table 3) in different configurations. The performance of the model when varying the activation function and the number of filters in the convolution layers of the second residual block is shown in Table 3. We have fixed the number of feature maps in the first residual block of the model to 6, matching the number of features in the input data. By convolving the output of the first residual block with a 1D convolution operation, we have increased the number of feature maps to 16, 32, 64, 128, or 256, which is carried through the second residual block. Furthermore, the activation functions play an important role in signal propagation inside the model.
From Table 3, we can see that the LeakyReLU activation performs best among the activation functions in most of the cases. From the models that outperformed the state-of-the-art, we have identified configurations based on the number of parameters and calculation overhead (marked in bold in Table 3). We call these the baseline configurations of our proposed model. The baseline configurations differ in the number of filters in the second residual block (# filters) and the non-linear activation function, keeping the rest of the architecture identical. For dataset #1 and dataset #3, we have selected the models with 64 filters in the second residual block using LeakyReLU as the activation function. For dataset #2 and dataset #4, we have selected models with identical settings but 32 filters. Interestingly, the ELU activation works well for the OU-ISIR dataset. Therefore, we have selected the model with 256 feature maps and ELU activation for that dataset, although the LeakyReLU activation also outperformed all the state-of-the-art methods. Table 4 contains the summary of the baseline configurations of the proposed model. Although our proposed architecture is shallow, the number of parameters can increase if we use a larger number of filters. Increasing the number of filters yields better accuracy, but our goal is to build a lightweight model with acceptable performance. Thus, we have selected the lower number of filters when the difference in accuracy between the higher and lower # filters is negligible.

C. COMPARISON WITH OTHER RESIDUAL BLOCKS
We have incorporated all the prominent residual block architectures into our proposed architecture (see Section IV-B) and measured their performances. We have reported the results in Table 5. The table shows that our proposed residual block performs better than any other residual blocks for all datasets. The notable change in our residual block architecture is to perform non-linear activation before the batch normalization layer, which yields better performance.

D. ABLATION, INSERTION AND MODIFICATION STUDY ON PROPOSED ARCHITECTURE
To demonstrate the efficacy and stability of our proposed architecture and evaluate the partial importance of the components of the model, we have examined our model in three different ways. Along with the widely performed experiments called ablation [106], [107], in which some module or portion of the proposed models (see Section IV-B) is removed and the performance is measured to estimate the partial importance of that module, we have also made some insertions and modifications in the model. Insertions are cases where we incorporate additional modules into the proposed model, and modifications are made by changing some portions of the proposed model. In Table 6, we have listed all the ablation, insertion, and modification experiments and presented the performance of the models. For ease of understanding, we have depicted these models in Fig. 3.
We have ablated the skip connections from the residual blocks to turn them into a simple feed-forward CNN (see Fig. 3 (a)) and the batch normalization layers to observe the effect of normalization (see Fig. 3 (g)). Most of the experiments are insertions of new modules, e.g., new skip connections from the input (head) (see Fig. 3 (b)), an additional residual block (a repetition of the second residual block) with and without multi-headed skip connections (see Fig. 3 (e-f)), an additional convolutional layer (with # filters = # filters of the baseline configuration / 2) before the first residual block (see Fig. 3 (d)), and dropout in the batch normalization layer [108], tested with different dropout rates (see Fig. 3 (i)). In the modifications, we have changed the first-layer receptive field and tested different kernel sizes (see Fig. 3 (c) I), used a convolution with a receptive field size of 1 to increase the feature maps in the convolution layer between the residual blocks (see Fig. 3 (c) II), and replaced the global average pooling with one and two fully connected layers incorporating batch normalization and dropout with p = 0.3 (see Fig. 3 (j)). Note that all the ablations, insertions, and modifications are independent of each other, i.e., we have performed each of these experiments on the proposed model with the baseline configurations defined in Section IV-B.
From Table 6, we can see that the skip connections play an important role, as the performance decreased without skip connections (Fig. 3 (a)) for all the datasets compared to the proposed lightweight model, whereas the additional skip connection from the head (Fig. 3 (b)) shows similar results except for the OU-ISIR dataset. Wei et al. [101] chose their first-layer receptive field to cover a 10-millisecond duration of the signal, similar to the window size for many MFCC computations. The sampling rate of the signals they used was 8000 Hz; therefore, they used 80 as their receptive field size. The sampling rates of our datasets are 50 Hz and 100 Hz. So, the optimal receptive field according to [101] would be 50/1000 * 10 = 0.5 and 100/1000 * 10 = 1. A receptive field size of 0.5 is invalid; therefore, we have experimented with a receptive field size of 1, along with 7 and 15, to observe the impact of increasing the receptive field size. Nevertheless, in most of the cases, the proposed model (with receptive field size = 3) performs better, and increasing or decreasing the field size decreases the performance.
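The receptive field arithmetic above can be checked with a short calculation. The function name and the `usable` check (a positive whole number of samples) are illustrative choices.

```python
def receptive_field_samples(sampling_rate_hz, window_ms=10.0):
    """Number of samples spanned by a window of window_ms milliseconds."""
    return sampling_rate_hz * window_ms / 1000.0

for rate in (8000, 100, 50):
    size = receptive_field_samples(rate)
    # A kernel must span a whole number of samples, at least one
    usable = size >= 1 and float(size).is_integer()
    print(rate, size, usable)
```

At 8000 Hz the 10-millisecond window spans 80 samples, at 100 Hz exactly one sample, and at 50 Hz only half a sample, which is why the 10-millisecond rule cannot be applied directly to the 50 Hz data.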
Similarly, using a convolution with a receptive field size of 1 at the middle convolution layer (Fig. 3 (c) II) performs worse than the proposed model. The additional convolutional layer before the first residual block (Fig. 3 (d)) slightly improved the performance for dataset #1, and this model achieved the best performance (99.08%) for the OU-ISIR dataset. The additional residual block without multi-headed skip connections (Fig. 3 (e)) performed better for all the datasets. Without the batch normalization operation (Fig. 3 (g)), the performance decreased drastically; therefore, the batch normalization operation should be an integral part of the residual models. Where to put the batch normalization operation, before or after the non-linear activation, is a long-standing debate. In contrast to the proposed models, with batch normalization after the activation function, and the original residual block, with batch normalization before the activation function, we have also experimented with batch normalization both before and after the activation function (Fig. 3 (h)); however, this does not improve the performance. Dropout in the batch normalization layer was proposed in [108]. A very small dropout rate of p = 0.05 can improve performance slightly, whereas increasing the dropout rate reduces accuracy due to excessive regularization. Replacing global average pooling with fully connected layers with batch normalization and dropout (Fig. 3 (j)) results in drastic overfitting. Increasing the number of fully connected layers improves the performance slightly, but none of these variants reaches the performance of the proposed model, while they increase the number of parameters considerably. Our proposed model maximizes the representation learning in the convolutional layers without the use of fully connected layers.
From the above analysis of the experimental results, the performance of our proposed model with the baseline configurations can be improved by additional weighted layers and residual blocks and by a proper choice of dropout in the batch normalization layers. In particular, inserting an additional residual block yields marginally better performance than the lightweight model on all datasets. Since this insertion costs more than the proposed lightweight model, and our study aims to propose an efficient state-of-the-art model, we retain the proposed model in the further efficiency analysis in terms of number of parameters and memory usage.

E. COMPARISON WITH THE STATE-OF-THE-ART METHODS
We have listed the performances of all the recent studies on gait recognition that utilized the whuGait datasets and the OU-ISIR dataset in Table 7, along with that of our proposed method.
We have reported the performances of the lightweight models described in Section IV-B and of the proposed model with an additional residual block from Section IV-D (see Fig. 3 (e)). Although the proposed model with the additional residual block produces slightly better accuracy than the proposed model, it increases the number of parameters. Since both of our models perform better than all the previous methods and our goal is to produce a lightweight model, the rest of the analysis is performed only on the proposed lightweight model.
Simplifying the deep learning model is essential to propose a real-time gait recognition system and reduce the computation overhead. We have reduced the number of trainable parameters by a considerable margin. Our proposed model has a significantly lower number of parameters, and therefore, the inference time is decreased. The average inference time for Dataset#1 and Dataset#3 is 0.32 ms and 0.56 ms for GPU and CPU, respectively, whereas the time for Dataset#2 and Dataset#4 is 0.29 ms (GPU) and 0.43 ms (CPU). For the OU-ISIR dataset, the average inference time is 0.57 ms for GPU and 3.71 ms for CPU. We have calculated all these inference times in milliseconds (ms) on the Intel (R) Xeon(R) 2.20GHz CPU and the Tesla K80 GPU. Consumption of memory for inference has also dropped drastically for our model compared to the previous ones. Comparison with the existing methods in terms of the number of trainable parameters and memory-usage is listed in Table 8.
Using more than 99% fewer parameters than Zou et al. [6], we have achieved better performance on dataset #1 and dataset #2, whereas we have surpassed their performance by nearly 26% on the OU-ISIR dataset with 88% fewer parameters. On the other hand, using 92% fewer parameters on average than Huang et al. [39], we have achieved better performance on all four whuGait datasets. As discussed earlier, our model had to use a greater number of feature maps in the second residual block to surpass Huang et al. [39] on the OU-ISIR dataset. Still, the number of parameters is nearly 60% lower than in that study. The reduction in parameters and memory usage (see (2) and (3)) is presented in Table 9 along with the performance gain (see (4)). Though the performance gain with respect to Huang et al. is marginal, our model incurs much less computation overhead. Moreover, as we discussed in the ablation, insertion, and modification study section, the performance of our model can be improved further at the cost of some trade-offs.

V. CONCLUSION
The primary intention of residual learning was to train very deep architectures. Nevertheless, we have successfully adopted the residual block with some modifications and efficiently created shallow convolutional neural networks for gait recognition. We have evaluated the performance of our methodology on two publicly available datasets, one collected in the wild and one with the largest population, and achieved state-of-the-art accuracy while reducing the number of parameters by more than 85% on average compared to recent works. Our model predicts better than the other methods compared in this study with minimal latency, since the computation overhead decreases with the number of parameters, which makes it suitable for practical applications using wearable devices. In future studies, we will explore and evaluate our methodology in other domains using IMU sensor datasets. Furthermore, the usage of mobile computing power in smartphone-based recognition systems, response time, storage usage, energy consumption, etc., can also be evaluated in real-life scenarios. Many applications of gait patterns in medicine remain to be studied, such as the classification of gait abnormalities, which can utilize a similar setting. Abnormal gait patterns such as spastic, scissors, propulsive, steppage, etc., can be classified using wearable sensors. Moreover, abnormal gait patterns that develop over time due to some musculoskeletal or neurologic diseases can be predicted before they become life-threatening, potentially saving lives.