Fall Risk Prediction Using Wireless Sensor Insoles With Machine Learning

Accidental falls are a significant health risk among the elderly. However, most fall detection systems issue a notification only after a fall occurs. Therefore, medical attention has shifted to preventive measures that reduce the risk of falling and avoid the resulting injury entirely. Because most fall prediction data in the previous literature are obtained from inertial sensors or static pressure sensors, this study uses insoles with embedded wireless pressure sensors to train machine learning (ML) models that predict an individual's risk of falling. The novelty of this paper is that dynamic walking data were obtained from 1101 subjects wearing smart pressure insoles. We applied six different ML models, i.e., support vector machine (SVM), random forest (RF), logistic regression (LR), naive Bayes (NB), decision tree (DT), and k-nearest neighbor (kNN). Results show that the LR model with oversampling techniques achieved the highest area under the curve (AUC) of 0.82, whereas the RF model with oversampling achieved the highest accuracy of 0.81 and specificity of 0.88. The results show that such models, combined with wireless pressure sensor insoles, are capable of fall risk prediction.


I. INTRODUCTION
There were 727 million persons aged 65 years or over in 2020 [1]. Over the next three decades, the number of elderly persons worldwide is projected to more than double, reaching over 1.5 billion in 2050. Globally, the share of the population aged 65 years or over is expected to increase from 9.3% in 2020 to around 16% in 2050 [1]. In every region of the world, hundreds of thousands of elderly people face risks and complications caused by fall accidents. Medical research shows that the aging process in humans involves a decline of the nervous system and of physiological functions [2], which reduces the ability to walk. The elderly are therefore more prone to falls than younger people. Falls among the elderly are one of the major health problems that lead to a decreased quality of life and increased morbidity and mortality [3]. Health centers have to deal with a large number of patients due to accidental falls, resulting in a substantial cost to society. In 2015, the estimated medical costs attributable to fatal and nonfatal falls were approximately $50 billion [4].
The associate editor coordinating the review of this manuscript and approving it for publication was Fu-Kwun Wang.
To alleviate the severity of fall accidents, several systems have been developed. Most are post-fall detection systems [5], [6], [7], [8], which notify caretakers or medical staff only after a fall occurs. However, despite the early notification, the damage has already occurred. Thus, there is a need for a fall prediction system that prevents falls at an early stage and helps the elderly reduce their fall risk. A fall risk prediction system that uses previous data to notify the elderly before they fall is therefore a promising way to avoid falls. Although there are many studies on fall risk prediction systems, most used accelerometer sensor data [2], [3], [9], [10] or static pressure plates [11], [12]. In this work, a smart insole with embedded wireless pressure sensors is proposed for fall risk prediction. The smart insoles enable dynamic gait analysis. Since gait and balance disorders have been identified as strong fall risk factors [13], we extracted feature sets from the gait and balance data, applied different ML models, and compared their performance on fall risk prediction. In addition, the smart insoles are lightweight, thin, and comfortable to wear, providing an unobtrusive way to perform gait and balance analysis.
VOLUME 11, 2023. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

II. RELATED WORK
In general, there are two different approaches to mitigating falls: fall detection [5], [6], [7], [8] and fall prevention [2], [3], [9], [10], [11], [12]. The disadvantage of fall detection is that it notifies only after the fall occurs. In contrast, a fall prevention system uses users' past data to assess their fall risk and to recommend exercise or training that reduces it.
In [2], a mobile phone and a three-dimensional accelerometer were used to develop a simple statistical fall risk prediction system. Stability and symmetry of gait derived from accelerometer data were used to predict the fall risk (normal, attentive, dangerous) of walking subjects. In another study [3], deep learning models were used to automatically derive features from raw accelerometer data to assess binary fall risk (fall, non-fall) on an existing dataset of 296 older adults. The best performance was an AUC of 0.75 for prediction of fall risk. Hemmatpour et al. [9] proposed a polynomial classification model of human gait for real-time fall prediction. Their approach can detect the transition from a normal to an abnormal gait pattern. Accelerometer and gyroscope sensor data were used to analyze various gait features. Their statistical-based approach outperformed other fall prediction algorithms, with 99.2% accuracy. Bizovska et al. [10] proposed a statistical model for medial-lateral local dynamic stability during gait using inertial sensors combined with a clinical score, called the Tinetti total score, as a potential fall risk assessment measure. Results showed an AUC of 0.75 for a total of 131 elderly subjects observed for one year.
All of the work discussed above [2], [3], [9], [10] used inertial sensors (accelerometer, gyroscope) to analyze gait data and predict fall risk. The challenge with inertial sensors is that the placement of the sensors on the body may affect the model's performance. Also, it is well known that errors such as measurement noise and time-variant sensor biases can corrupt the data from inertial sensors [14], [15]. Therefore, such sensors require longer preprocessing time to filter the raw sensor data and extract useful features. Furthermore, [2] and [10] used statistical models to predict fall risk.
Statistical models are used for inference, which is achieved by creating and fitting a project-specific probability model [16]. In contrast, ML models predict by using general-purpose algorithms to find patterns in often rich data [16]. Therefore, ML models tend to generalize better than statistical models and are more likely to handle future incoming data effectively. In [9], a polynomial classification model and a threshold-based approach were used to classify normal and abnormal gait. However, the abnormal gait scenario was synthesized by adding obstacles, so the approach may not reflect realistic fall-related gait patterns.
As mentioned above, [2], [3], [9], [10] used only inertial sensors, whereas [11] combined inertial sensors with static pressure sensors. The authors of [11] concluded that features from inertial sensors combined with pressure platforms obtained better results than inertial sensors alone. Their balance features were calculated from a static pressure platform, which helped predict fall risk. However, gait cannot be analyzed with the pressure platform because of its static nature. In this work, we used a smart insole with embedded pressure sensors to collect dynamic pressure during real-time gait movement.
In this work, we propose a different approach that employs ML models with inputs obtained from a smart insole with embedded wireless pressure sensors, instead of inertial sensors, to predict an individual's fall risk. In particular, we analyze the dynamic gait data by using a butterfly loop, which describes the stability and symmetry of the gait cycle. The application of the center of pressure (COP) as a dynamic gait/balance assessment is novel for fall risk prediction. Results show that the LR model with oversampling has the highest AUC of 0.82, whereas the RF model with oversampling has the highest accuracy of 0.81 and specificity of 0.88.

III. MATERIALS AND METHODS
This study was conducted at Khon Kaen University and the Dan Sai Crown Prince Hospital, Thailand. The study was approved by the Khon Kaen University Ethics Committee for Human Research (EC number: HE631529). All participants gave written informed consent.

A. GENERAL INFORMATION
A total of 1101 volunteers (341 male and 760 female subjects) participated in this study. Subjects were excluded if they had a plantar wound, were aged below 65 years, had an unstable medical condition with lower limb amputation, had gait and/or mobility disorders, were unable to walk 10 meters without support, or were unable to wear the smart insoles.
A fall risk assessment index, the Timed Up and Go (TUG) score, was also collected. The TUG is a simple screening test that is a sensitive and specific measure of the probability of falls among older adults and one of the primary tests used by experts for fall risk assessment [17]. The TUG measurement is shown in Fig. 1.

B. SMART INSOLE (SURASOLE)
A commercial smart insole (provided by Suratec Co., Ltd., Nakhon Ratchasima, Thailand), shown in Fig. 2, was used to collect dynamic plantar pressures while a subject was walking. Five force-sensitive resistor sensors, each with a diameter of 18 mm, were embedded in both insoles. These sensors were positioned to detect and quantify a relative change in the pressure or applied load. As shown in Fig. 3, the data collected during the experiment were divided into five zones of the foot: the big toe (Hallux: HA), the medial forefoot (M1), the lateral forefoot (M5), the midfoot (MF), and the heel (HF). The sensors were connected to a microcontroller through a voltage divider circuit, whose output was fed to a 10-bit analog-to-digital converter. The circuit was adjusted so that each sensor operated over the full force range (0-20 kg) with a response time of less than 10 µs. The sampling rate was 20 Hz.
The dynamic data collection process began with the patient walking back and forth over a ten-meter distance to assess the pressure distribution under their feet. These real-time measurements were transmitted to a smartphone through Bluetooth and then recorded on a database server for further analysis.

C. PREPROCESSING
After data collection, 20% of the sensor data was cut from the beginning and from the end of each sample. This percentage was selected by evaluating different cutoff percentages (5%, 10%, 15%, 20%, and 25%) on the collected smart insole data, with the aim of eliminating the transitions at the start and end of the data collection. For this experiment, we used 2202 samples (each participant walked back and forth twice, giving two samples per participant). After removing the samples with invalid data and noise, 2070 samples remained for our experiment. The data was then analyzed to determine various motion-related parameters, including sway information and COP. To enhance the features for fall prediction and gait analysis, a collection of COP points from a two-step walk cycle, called the butterfly loop, was introduced. The COP traces form the butterfly loop shown in Fig. 4.
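The trimming step can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that the 20% cut is applied independently at each end of a recording, and the function name `trim_recording` and the 200-frame recording are hypothetical.

```python
def trim_recording(frames, cut_fraction=0.20):
    """Return `frames` without its first and last `cut_fraction` share."""
    if not 0.0 <= cut_fraction < 0.5:
        raise ValueError("cut_fraction must lie in [0, 0.5)")
    n_cut = int(len(frames) * cut_fraction)  # frames dropped at each end
    return frames[n_cut:len(frames) - n_cut]


# Hypothetical 10 s recording at the 20 Hz sampling rate: 200 frames.
recording = list(range(200))
trimmed = trim_recording(recording)
print(len(trimmed), trimmed[0], trimmed[-1])
```

Under this assumption, 60% of each recording is kept, which removes the acceleration and deceleration phases at the cost of a shorter signal.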
Let the coordinates of the butterfly loop in Fig. 4 be defined as A, B, C, D, E, F, G, H, I, where B is the intersection point of the butterfly loop. The common features extracted from a butterfly loop in gait analysis are listed below [18]:
• Cadence: The number of steps a person walks in one minute.
• Stance time: The time taken to complete the stance phase of the corresponding foot in a gait cycle (i.e., the time for which the corresponding foot is in contact with the ground in a gait cycle).
• Sway distance: The distance obtained from the corresponding segment of the butterfly loop. The larger the sway distance, the larger the pressure applied on the foot, i.e.,
DL (left sway distance) = BE + EA + AD + DB
DR (right sway distance) = BG + GC + CF + FB.
• Cycle time: The time taken to complete a gait cycle (2 steps), in other words, the time taken to complete a butterfly loop.
It is worth noting that these butterfly loop features have been used for gait analysis [18], but to the best of our knowledge they have not yet been used for fall risk prediction. The additional features extracted from the butterfly loop for fall risk prediction are presented in Table 1.
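The sway distance and cadence definitions above can be written down directly. The sketch below assumes that each labeled loop point is available as an (x, y) COP coordinate and that a segment such as BE is the straight-line distance between B and E. The coordinates and the helper names are made up for illustration only.

```python
import math


def path_length(points):
    """Total Euclidean length of the polyline through `points`."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


def sway_distances(loop):
    """Return (DL, DR) for a dict mapping loop labels to (x, y) points."""
    left = path_length([loop[k] for k in "BEADB"])   # BE + EA + AD + DB
    right = path_length([loop[k] for k in "BGCFB"])  # BG + GC + CF + FB
    return left, right


def cadence(step_count, duration_s):
    """Steps per minute for `step_count` steps taken in `duration_s` seconds."""
    return 60 * step_count / duration_s


# Made-up coordinates, chosen only so the segment lengths are easy to check.
loop = {
    "B": (0.0, 0.0),
    "E": (-3.0, 0.0), "A": (-3.0, 4.0), "D": (0.0, 4.0),
    "G": (6.0, 0.0), "C": (6.0, 8.0), "F": (0.0, 8.0),
}
dl, dr = sway_distances(loop)
print(dl, dr, cadence(20, 12))
```

A large difference between the two returned distances would indicate an asymmetric loop, which is the kind of information the butterfly loop is meant to expose.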

D. FEATURE SELECTION
The selection of features is a central concept in ML that has a significant effect on the model's performance. Irrelevant features may adversely affect the performance of the model. Therefore, a good selection of features reduces overfitting, improves accuracy, and shortens training time.
We use a method called feature importance [19] to select useful features in our dataset. In this method, each feature is assigned an individual score. Features with a high score are more important for the target outcome, i.e., the classification into high risk of fall or low/no risk of fall.
From Fig. 5, the feature with the highest score (i.e., step count) has the most impact, whereas the feature with the lowest score (SD_X) has little or no impact on the performance of the model. Fig. 6 shows that using more than 5 features provides no additional advantage: beyond 5 features, the AUC score did not increase any further. Therefore, we used the top 5 features from Fig. 5 in our experiment.
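The selection logic described here (rank features by score, then keep adding features until the AUC stops improving) can be sketched as below. All numbers and the names `feature_b` to `feature_f` are placeholders invented for the example and are not the values behind Fig. 5 or Fig. 6. Only the ordering of step count (highest) and SD_X (lowest) follows the text.

```python
def top_k_features(scores, k):
    """Return the k feature names with the highest importance scores."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


def plateau_size(auc_by_k, tol=1e-3):
    """Smallest feature count after which AUC never improves by more than tol."""
    if not auc_by_k:
        raise ValueError("auc_by_k must not be empty")
    ks = sorted(auc_by_k)
    for k in ks:
        # Always satisfied at the largest k, so the loop is guaranteed to return.
        if all(auc_by_k[j] - auc_by_k[k] <= tol for j in ks if j > k):
            return k


# Placeholder importance scores (NOT the paper's values).
scores = {
    "step_count": 0.30, "feature_b": 0.22, "feature_c": 0.18,
    "feature_d": 0.12, "feature_e": 0.09, "feature_f": 0.06, "SD_X": 0.03,
}
# Placeholder AUC per number of top-ranked features (NOT the paper's values).
auc_by_k = {1: 0.70, 2: 0.74, 3: 0.77, 4: 0.79, 5: 0.80, 6: 0.80, 7: 0.80}

k = plateau_size(auc_by_k)
selected = top_k_features(scores, k)
print(k, selected)
```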
In this study, we used TUG to label the data. For community-dwelling adults, the cutoff score indicating a risk of falls is 13.5 seconds [17], which gives the following threshold used in this experiment.
• If the TUG score is greater than or equal to 13.5 seconds, the participant is labeled as the high risk of fall (fall) category.
• Otherwise, the participant is labeled as the low/no risk of fall (non-fall) category.
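The labeling rule above amounts to a single threshold comparison. A minimal sketch, in which the function name and the 1/0 encoding of the two categories are our own choices:

```python
TUG_CUTOFF_S = 13.5  # cutoff in seconds for community-dwelling adults [17]


def label_from_tug(tug_seconds, cutoff=TUG_CUTOFF_S):
    """Return 1 (fall: high risk) if TUG >= cutoff, else 0 (non-fall)."""
    return 1 if tug_seconds >= cutoff else 0


labels = [label_from_tug(t) for t in (9.8, 13.5, 17.2)]
print(labels)
```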

IV. EXPERIMENTS AND RESULTS
Six well-known ML models (support vector machine (SVM), random forest (RF), logistic regression (LR), naive Bayes (NB), decision tree (DT), and k-nearest neighbor (kNN)) were used in this study to perform binary classification of participants based on their risk of fall. Because model performance in clinical fall risk studies is evaluated by physicians using the AUC score [3], [10], we used AUC as the primary performance measure to compare the ML models. AUC measures the ability of an ML model to distinguish between classes (high risk/low risk of fall) and summarizes the receiver operating characteristic (ROC) curve. In addition to AUC, the accuracy, sensitivity, and specificity were also evaluated. In our experiment, we used the GridSearchCV hyperparameter tuning tool [20] in combination with k-fold cross-validation to find the optimal hyperparameters for each model. After obtaining the optimal hyperparameters for each ML model, we split the data into a training set and a test set at the commonly used ratio of 70:30 for all models.
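For readers less familiar with these metrics, the sketch below computes them from first principles. It uses the standard interpretation of AUC as the probability that a randomly chosen positive (fall) sample receives a higher score than a randomly chosen negative (non-fall) sample, with ties counted as one half. The scores, the 0.5 decision threshold, and the function names are illustrative assumptions and do not come from the study.

```python
def roc_auc(y_true, scores):
    """AUC as P(score of a positive > score of a negative), ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity (recall on falls) and specificity."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Toy labels (1 = fall) and made-up classifier scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.8, 0.3, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

auc = roc_auc(y_true, scores)
metrics = confusion_metrics(y_true, y_pred)
print(auc, metrics)
```

Unlike accuracy, sensitivity, and specificity, AUC does not depend on the chosen decision threshold.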

A. DATASET HANDLING
The dataset from Section III-C consisted of 2070 data samples, of which 500 were falls and 1570 were non-falls based on the TUG score. As the ratio between non-fall and fall samples was approximately 3:1, the dataset was imbalanced. Because an imbalanced dataset directly affects the performance of ML models, we used resampling methods to handle it. These methods involve creating a new, transformed, balanced training dataset.
TABLE 1. Feature set extracted from the butterfly loop (Fig. 4).
The most commonly used resampling techniques are
• Undersampling: which decreases the majority class samples, i.e., in our case the non-fall samples (Section IV-A2),
• Oversampling: which increases the minority class samples, i.e., in our case the fall samples (Section IV-A3).
ML models were applied and their performance was compared across three types of dataset, i.e., the dataset without resampling, the oversampled dataset, and the undersampled dataset, as follows.

1) EXPERIMENT-1 WITHOUT RESAMPLING DATASET
We applied ML models in this experiment without introducing resampling techniques to our imbalanced dataset. The objective was to evaluate the effect of the imbalanced dataset without any resampling treatment.
In Fig. 7, we can see that three models (SVM, LR, kNN) share the highest scores, with an AUC of 0.80 and an accuracy of 0.83, whereas DT has the lowest AUC of 0.75. DT may be inefficient if there are features with weak or no interactions, since every feature in the DT is compelled to interact with every feature further up the tree [21]. This could be the reason that DT performed the worst in this experiment.
The optimal hyperparameters were selected using the GridSearchCV library, as shown in Table 2.

2) EXPERIMENT-2 WITH UNDERSAMPLING DATASET
In this experiment, we applied the undersampling technique to balance the dataset. We used two popular undersampling methods in this experiment, i.e.,
• Random undersampling: This method involves randomly selecting samples from the majority class and deleting them from the training dataset,
• Centroid-based undersampling: This method undersamples the majority class by replacing a cluster of majority samples with the cluster centroid of a K-means algorithm. The K-means clustering algorithm iteratively recomputes the centroids until they converge, where K denotes the number of clusters.
The main difference between the two undersampling methods is as follows. In the centroid-based undersampling method, the newly generated data samples are synthesized from the centroids, whereas in the random undersampling method, original data samples are used. The samples chosen by random undersampling may be biased, and the method can discard potentially useful information that could be essential for building binary classifiers. In contrast, the samples chosen in centroid-based undersampling are less biased because centroid samples are used instead of the original samples. We applied different ML models with both undersampling methods to compare their performance.
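Random undersampling can be sketched in a few lines. This is an illustrative standard-library implementation under our own assumptions (function name, fixed seed, every class reduced to the size of the smallest class). The implementation used in the study is not stated in the text. The centroid-based variant additionally requires a K-means step and is not shown.

```python
import random


def random_undersample(X, y, seed=0):
    """Randomly drop samples so every class has as many rows as the smallest."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    n_min = min(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        # Keep the minority class whole; sample without replacement otherwise.
        kept = rows if len(rows) == n_min else rng.sample(rows, n_min)
        X_out.extend(kept)
        y_out.extend([label] * n_min)
    return X_out, y_out


# Toy data: six non-fall (0) rows and two fall (1) rows.
X = [[i] for i in range(8)]
y = [0] * 6 + [1] * 2
X_bal, y_bal = random_undersample(X, y)
print(y_bal.count(0), y_bal.count(1))
```

Off-the-shelf alternatives exist, for example the `RandomUnderSampler` and `ClusterCentroids` classes of the imbalanced-learn package.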
In Fig. 8, RF and kNN share the highest AUC for random undersampling, whereas NB and kNN perform best for centroid-based undersampling, with an AUC of 0.81.
In Fig. 9, we can see that DT has the highest accuracy of 0.78 in random undersampling and NB has the highest accuracy of 0.77 in centroid-based undersampling.
Since NB has both the highest AUC and the highest accuracy with centroid-based undersampling, NB with centroid-based undersampling was selected as the optimal model for this experiment. After applying undersampling to our original dataset, the non-fall dataset was reduced from 1570 to 500 samples compared to the other two experiments. As NB works better on smaller datasets than other ML models [22], this may explain why NB performed best with undersampling.

3) EXPERIMENT-3 WITH OVERSAMPLING DATASET
In this experiment, we applied the oversampling techniques to the training dataset to balance the data. We used two popular oversampling methods in this experiment, i.e.,
• Random oversampling: This method involves randomly selecting samples from the minority class with replacement and adding them to the training dataset,
• SMOTE (Synthetic Minority Oversampling TEchnique): This method is a type of data augmentation for the minority class that creates new samples by interpolating between a minority sample and one of its nearest minority-class neighbors.
The main difference between the two oversampling methods is as follows. In the SMOTE method, the newly generated data samples are synthesized from the existing samples, whereas in the random oversampling method, the existing data samples are duplicated. We applied different ML models with both oversampling methods to compare their performance.
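The difference between duplicating and synthesizing samples can be made concrete with the sketch below. It is a simplified, standard-library illustration. The function names, seeds, and toy points are our own, and `smote_like` only mimics the core interpolation idea of SMOTE rather than reproducing any particular library implementation.

```python
import math
import random


def random_oversample(minority, n_target, rng):
    """Duplicate randomly chosen minority rows until n_target rows exist."""
    extra = [rng.choice(minority) for _ in range(n_target - len(minority))]
    return minority + extra


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new points on segments between a row and a near neighbour."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbours of x (x itself excluded by identity).
        neighbours = sorted((p for p in minority if p is not x),
                            key=lambda p: math.dist(x, p))[:k]
        nn = rng.choice(neighbours)
        u = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + u * (b - a) for a, b in zip(x, nn)])
    return synthetic


# Toy minority class: the four corners of the unit square.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
duplicated = random_oversample(minority, 10, random.Random(0))
synthetic = smote_like(minority, 6, k=2, seed=1)
print(len(duplicated), synthetic)
```

Every row returned by `random_oversample` already exists in the minority class, whereas `smote_like` returns new points that lie between existing ones. The imbalanced-learn package offers `RandomOverSampler` and `SMOTE` classes for practical use.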
In Fig. 10, LR has the highest AUC score of 0.82 in both oversampling methods. LR is useful when the response variable is binary but the features are numeric [23]. This is the case here, since we were predicting whether or not a subject is at fall risk using information on their age, step count, cycle time, and other COP parameters, which are numeric variables.
In Fig. 11, RF has the highest accuracy of 0.81 and specificity of 0.88 in the random oversampling method. After applying oversampling to our original dataset, the fall dataset expanded from 500 to 1570 samples compared to the other two experiments. As RF works better in terms of accuracy on larger datasets than other ML models [24], this may explain why RF had the highest accuracy with oversampling.
Across the experiments, the AUC scores were the highest for oversampling. Therefore, the two oversampling methods were selected as the optimal methods, and their remaining performance metrics are shown in Table 3. From Table 3, for both oversampling methods, the LR model has the highest AUC and the RF model has the highest accuracy and specificity. Since AUC is the most significant factor for clinical fall risk assessment, our results suggest that the LR model with oversampling may be used. However, if accuracy and specificity are the main concern, then RF with oversampling may be used.
In comparison, a previous work [3] used deep learning methods (convolutional neural network (CNN), long short-term memory (LSTM), and a combination of CNN and LSTM, i.e., ConvLSTM) to predict the risk of falls from raw accelerometer sensor data and achieved an AUC of 0.75. In our work, we achieved an AUC of 0.82 by using the LR model with data obtained from the oversampling method. In another study [12], a One-One-One deep learning model (a combination of 1D-CNN, LSTM, and Dense layers) was proposed for fall risk prediction on force plate data. This model achieved accuracy, precision, and sensitivity of 0.99, 1, and 1, respectively. This suggests a potential improvement in performance if deep learning methods are used with pressure sensor data from dynamic gait.

V. CONCLUSION
In this work, six different ML approaches are compared for predicting the risk of falls by using the butterfly loop gait/balance assessment obtained from wireless pressure-embedded smart insole data. Our proposed method was evaluated on walking data from 1101 elderly subjects collected using a pressure-embedded smart insole. Different oversampling and undersampling methods were applied due to the imbalanced nature of the dataset. In our study, the LR model with oversampling achieved the highest AUC of 0.82, and the RF model with oversampling achieved the highest accuracy of 0.81 and specificity of 0.88. Therefore, the results suggest that our methods have the potential to predict the risk of falls using smart insoles with embedded wireless pressure sensors.
Furthermore, because existing literature has demonstrated that high accuracy can be achieved by using deep learning methods, we will explore different deep learning methods (CNN, recurrent neural network (RNN), LSTM, etc.) in future research to extract insights into human gait and balance with the help of the hidden layers of neural networks. In addition, we will investigate more gait and balance features from our insole data to develop a multiple-class fall risk prediction model.