Machine Learning-Based Automatic Classification of Knee Osteoarthritis Severity Using Gait Data and Radiographic Images

Knee osteoarthritis (KOA) is a leading cause of disability among elderly adults; it causes pain and discomfort and limits their functional independence. The aim of this study was to develop an automated classification model for KOA, based on the Kellgren–Lawrence (KL) grading system, using radiographic imaging and gait analysis data. Gait features highly associated with the radiological severity of KOA, identified in our previous study, were combined with radiographic image features extracted using a deep learning network, namely, Inception-ResNet-v2, and used as input to a support vector machine for multi-class KOA classification. The areas under the receiver operating characteristic curve (AUC) for KL Grades 0–4 were 0.93, 0.82, 0.83, 0.88, and 0.97, respectively. The sensitivity, precision, and F1-score of the model were 0.70, 0.76, and 0.71, respectively. The proposed model outperformed a common deep learning approach that uses only radiographic images as the input data. This result indicates that gait data and radiographic images are complementary with respect to KOA classification, and that the use of both data types can improve the accuracy of the automated diagnosis of multi-class KOA.


I. INTRODUCTION
Osteoarthritis (OA) is a leading cause of disability among elderly adults, affecting 30% of the global population over 60 years of age [1]. With more than 250 million patients worldwide, 1-2% of the gross domestic product is spent on OA [2], [3]. With the aging of the global population, the number of patients who suffer from knee osteoarthritis (KOA) is expected to increase [4]. The typical symptoms of KOA include pain, stiffness, decreased joint range of motion, and gait dysfunctions, which worsen as the disease progresses [5], [6]. These symptoms can impair the functional independence of individuals and decrease their quality of life.
The associate editor coordinating the review of this manuscript and approving it for publication was Emre Koyuncu.
The current gold standard for the radiographic assessment of KOA is the Kellgren-Lawrence (KL) grading system [7]. The KL grading system classifies KOA into five grades, ranging from 0 to 4, where Grade 0 indicates healthy subjects with no KOA symptoms, and Grade 4 indicates the most severe cases. The KL grade is determined by observing the presence of joint space narrowing, osteophytes, bone deformity, and sclerosis in radiographic images. Although the KL grading system is widely implemented in clinical applications for the diagnosis of KOA, it is time consuming and requires highly trained experts, generally with fellowship training experience in arthroplasty or radiography [8]. For the accurate evaluation of the KL grades, two experts are required to independently conduct radiographic evaluations without considering other data. If the diagnoses of the two experts are contradictory, the results are discussed to reach a conclusion. An automatic diagnosis system can reduce the time required, allowing clinicians to direct their attention toward clinical findings. Accordingly, deep learning, machine learning, and image processing techniques were proposed in previous studies for automatic KOA diagnosis using radiographic imaging [8]-[13]. Deep learning is an effective technique for the analysis and classification of images; it is widely applied in various fields, including medicine, and demonstrates excellent performance. However, the deep learning approach did not demonstrate a satisfactory performance when it was applied to the classification of KOA based on radiographic images.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Although deep learning demonstrated a satisfactory performance for binary classification between OA and non-OA cases, with an area under the curve (AUC) of 0.92, the overall results of the multi-class diagnosis of KOA were unsatisfactory, with an accuracy of 66.7% [8], when compared with the accurate results of deep learning approaches in other fields of application [11], [14]-[16]. The low performance of deep learning can be attributed to the limitation of two-dimensional (2D) radiographic images in indicating the wear of articular cartilage, the structure of which is three-dimensional (3D). Abedin et al. proposed machine learning models based on patient assessment data; in their study, a random forest achieved a root mean squared error of 0.94 when predicting KL grades. Saleem et al. presented a computer-vision-based image processing system with a high KOA detection accuracy of 97%. However, this result was based on 82 subjects.
Modern gait analysis is an effective technique for the analysis of the biomechanical information of the lower-limb joints, and it provides a temporal signal for each joint and additional gait information such as cadence, stride length, and step width. Because gait function degrades as the disease progresses, the relationship between gait analysis and the severity of KOA based on radiographic images has received significant research attention [6], [17], [18]. In our previous study [19], twenty critical features associated with the radiographic grade of KOA were identified. We extracted 149 features from the kinetic and kinematic data of the hip, knee, and ankle in the gait analysis data, together with 16 additional gait features. The extracted gait features include kurtosis, the area under the curve during the swing phase, stride length, and other characteristics of gait. For feature selection, a machine learning-based method was applied, namely, neighborhood component analysis (NCA) [20]. The 20 selected features were further validated using one-way analysis of variance, followed by a post-hoc Student's t-test.
In this cross-sectional study, the radiographic images and gait data were hypothesized to contain critical and complementary information with respect to the knee joint and the severity of the disease; thus, the use of both data types was expected to improve the classification accuracy of the model. An automatic diagnosis model was developed for KOA severity, ranging from non-KOA (Grade 0) to end-stage KOA (Grade 4), based on radiographic images and gait data. Features were extracted from the radiographic images using a deep learning network, and a classification model for KOA was then developed based on these image features and the 20 gait features identified as KOA-related in our previous work [19].

A. PARTICIPANTS
This study was approved by the Institutional Review Board of Seoul National University Hospital (IRB no. 1810-004-974) and conducted in accordance with the relevant guidelines and regulations. Written informed consent was obtained from all the study participants, and the study was conducted using the gait lab database from 2013 to 2017. The database consists of gait reports of patients with various degrees of KOA, in addition to healthy volunteers. All participants underwent a physical examination, and radiographic imaging was conducted on their knees in the full-limb, standing, knee-extended position. Subjects were excluded based on the following criteria: (1) age < 20 years; (2) spine disease, hip, or ankle arthritis; (3) inflammatory or traumatic arthritis of the knee; and (4) prior bone surgery in the lower extremities. Each limb was counted as a separate data sample, and a total of 728 limbs from 364 subjects were included in this study. The degree of KOA was determined based on the KL grading system. Table 1 summarizes the demographic characteristics and walking speeds of the participants.

B. GAIT DATA COLLECTION
All the gait analysis data, which included the kinetic, kinematic, and spatial-temporal data, were collected at the Human Motion Analysis Laboratory of the Seoul National University Hospital. The subjects were first familiarized with the experimental setup. Thereafter, an operator with 20 years of experience tagged the subjects with reflective markers based on the Helen Hayes set. After the markers were placed, the subjects were asked to walk along a track with a length of 9 m. Gait data were collected using a three-dimensional optical motion capture system with twelve charge-coupled device cameras (Motion Analysis Corp., Santa Rosa, CA) and two force plates embedded in the floor, at a sampling frequency of 120 Hz. The kinetic and kinematic gait data of all the joints were averaged over five or six trials and used as the experimental data.

C. RADIOGRAPHICAL ASSESSMENT
All the full-limb radiographic images used in this study were digitally obtained using an image archiving and communication system (Maroview 5.4; INFINITT Healthcare, Seoul, Korea). The radiographic evaluations were conducted independently, in accordance with the KL grading system, by two experts with fellowship training experience in arthroplasty. The two experts did not consider any other information related to the subjects. When the evaluation results were contradictory, the grading was discussed to reach a conclusion. The inter-observer reliability of the radiographic assessments was satisfactory (intra-class correlation coefficient, 0.93). The collected radiographic images did not undergo any prior image processing other than cropping around the knee joint and resizing to match the input size of the deep learning network.

D. FEATURE EXTRACTION AND CLASSIFICATION
The data analyses and classification were conducted using MATLAB 2019b (MathWorks, Natick, MA). Twenty gait features (9 from the knee, 7 from the hip, 2 from the ankle joint, and 2 spatiotemporal parameters) identified in our previous work were employed [19]. The features were extracted by calculating the area under the curve, the difference between the maximum and minimum values, the kurtosis, and other characteristics of the kinetic and kinematic gait parameters. The gait parameters from which the critical features were derived were as follows: the knee extension moment, knee abduction moment, knee rotational moment, knee flexion angle, hip abduction moment, hip extension moment, hip extension angle, ankle dorsiflexion moment, cadence, and stride length.
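As an illustration of the waveform descriptors named above, the following minimal Python sketch computes the area under the curve, the max-min range, and the kurtosis of a single gait waveform. It is not the authors' MATLAB code. The function name `waveform_features`, the uniform sample spacing `dt`, the trapezoidal rule, and the use of the non-excess (Pearson) kurtosis are all assumptions made for this example. The area here refers to the gait curve, not to the ROC curve used for model evaluation.

```python
from statistics import fmean


def waveform_features(signal, dt=1.0):
    """Return area under the curve, range, and kurtosis of one gait waveform.

    signal: list of samples over one gait cycle (hypothetical input format).
    dt: assumed uniform spacing between samples.
    """
    n = len(signal)
    if n < 2:
        raise ValueError("need at least two samples")

    # Trapezoidal approximation of the area under the waveform.
    area = sum((signal[i] + signal[i + 1]) * dt / 2.0 for i in range(n - 1))

    # Difference between the maximum and minimum values.
    value_range = max(signal) - min(signal)

    # Non-excess kurtosis: fourth central moment over squared variance.
    mean = fmean(signal)
    m2 = fmean([(x - mean) ** 2 for x in signal])
    m4 = fmean([(x - mean) ** 4 for x in signal])
    kurtosis = m4 / (m2 ** 2) if m2 > 0 else float("nan")

    return {"auc": area, "range": value_range, "kurtosis": kurtosis}


# Toy waveform, not real gait data.
features = waveform_features([0.0, 1.0, 0.0, -1.0, 0.0])
```

In practice, such descriptors would be computed per joint parameter (for example, the knee flexion angle) and possibly per gait phase; the paper does not specify those details here.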
The Inception-ResNet-v2, which is a convolutional neural network pre-trained on the ImageNet database, was used for two purposes: (1) to extract features from the radiographic images, and (2) to serve as a classification model for comparison with the proposed SVM model. The Inception-ResNet-v2 combines the Inception architecture with residual connections to accelerate training and improve the accuracy of the network [21]. It consists of 164 layers with image input dimensions of 299 × 299, and is considered an effective and efficient network. The full-limb radiographic images obtained from all subjects were cropped around the knee and resized to dimensions of 299 × 299. The features were extracted after the final average pooling layer, before the Softmax layer. An NCA was then conducted on the features extracted by the deep learning network. The NCA feature selection obtains feature weights with regularization by minimizing the leave-one-out classification error [20]. Leave-one-out classification is conducted using one sample as a test set and the remaining data as a training set. This process is repeated until every sample has been used as a test set. For the selected gait and image features, the Student's t-test was conducted to determine the features that differed significantly between two different KL grades (p < 0.0001). This analysis provides information on the features that significantly influence the classification of KL grades.
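The leave-one-out procedure described above can be sketched as follows. This is a simplified illustration, not NCA itself: NCA optimizes the feature weights against a smooth, regularized leave-one-out objective, whereas this sketch only evaluates the leave-one-out error of a 1-nearest-neighbor classifier for a given set of weights. The function name `loo_error` and the toy data are hypothetical.

```python
def loo_error(samples, labels, weights=None):
    """Leave-one-out error of a 1-nearest-neighbor classifier.

    Each sample is held out once and classified by its nearest neighbor
    among the remaining samples, using a feature-weighted squared distance.
    """
    n = len(samples)
    if weights is None:
        weights = [1.0] * len(samples[0])

    errors = 0
    for i in range(n):
        best_j, best_d = None, float("inf")
        for j in range(n):
            if j == i:
                continue  # the held-out sample is never its own neighbor
            d = sum(
                (w * (a - b)) ** 2
                for w, a, b in zip(weights, samples[i], samples[j])
            )
            if d < best_d:
                best_j, best_d = j, d
        if labels[best_j] != labels[i]:
            errors += 1
    return errors / n


# Toy data: feature 0 separates the classes, feature 1 is large-scale noise.
samples = [[0.0, 5.0], [0.1, -5.0], [1.0, 5.1], [1.1, -5.1]]
labels = [0, 0, 1, 1]

error_unweighted = loo_error(samples, labels)            # noise dominates
error_weighted = loo_error(samples, labels, [1.0, 0.0])  # noise switched off
```

On this toy data, setting the weight of the uninformative feature to zero lowers the leave-one-out error, which is the intuition behind selecting features by their learned weights.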
A support vector machine (SVM) with a cubic kernel function was used for the final classification of the KL grade, using both the gait features identified in our previous study and the image features extracted by the deep learning network. A hold-out validation was conducted to train and test the model. The data were split into train and test sets with a ratio of 7:3. The training set was used to train the SVM model, and the remaining unseen data were used for testing. A diagram of the study flowchart is shown in Fig. 1.
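The 7:3 hold-out split and the cubic kernel mentioned above can be illustrated with a short Python sketch. Both functions are assumptions made for this example: the paper does not state the random seed, whether the split was stratified by KL grade, how the split size was rounded, or the kernel scale and offset. The kernel below uses the common polynomial form (1 + x·z)^3.

```python
import random


def holdout_split(n_samples, train_fraction=0.7, seed=0):
    """Shuffle sample indices and split them into train and test index lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_train = round(n_samples * train_fraction)
    return indices[:n_train], indices[n_train:]


def cubic_kernel(x, z):
    """Polynomial kernel of order 3 in the common form (1 + x.z)^3."""
    dot = sum(a * b for a, b in zip(x, z))
    return (1.0 + dot) ** 3


# 728 limbs, as reported in the Participants subsection.
train_idx, test_idx = holdout_split(728)
```

With this rounding rule, 728 limbs yield 510 training and 218 test samples; the exact counts used in the study are not given in the text.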
For a comparison of the proposed SVM model, which uses both gait and image features as input, with a deep learning method using the same dataset, a deep learning network was trained and validated using the Inception-ResNet-v2. An Adam optimizer and a cross-entropy loss function were used with a mini-batch size of 32. A learning rate of 1e-4 was set to stochastically optimize the network. The cropped radiographic images used for feature extraction were used as input to train and validate the model. A hold-out validation method was used to test the model by splitting the dataset into train and test sets with a ratio of 7:3. The total number of iterations was 900. The two models were examined by calculating the AUC of the ROC, sensitivity, precision, and F1-score. The sensitivity, precision, and F1-score were calculated using Equations 1-3, respectively:

$$\text{Sensitivity} = \frac{TP}{TP + FN} \tag{1}$$

$$\text{Precision} = \frac{TP}{TP + FP} \tag{2}$$

$$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}} \tag{3}$$

where TP, FP, and FN denote the numbers of true positives, false positives, and false negatives, respectively.

The occlusion map [22], which indicates the relative significance of each 2D area for classification, was also examined to validate whether the deep learning model uses the appropriate location for decision-making. The occlusion map is generated by covering a small portion of the image with a mask, which moves across the image. The changes in the probability score for each mask location were measured and used to identify the relatively significant areas of an image.

Fig. 2a presents an example of a cropped radiographic image of the knee used as input for both the radiographic image feature extraction and the deep learning model. Fig. 2b shows the occlusion map of the deep learning model based on the radiographic image input. Table 2 presents the confusion matrix of the validation result of the model and the AUC results, which were 0.93, 0.80, 0.85, 0.78, and 0.97 for KL Grades 0-4, respectively. The sensitivity, precision, and F1-score of the model were 0.55, 0.60, and 0.55, respectively. Fig. 3a presents the ROC curve for the results obtained using the deep learning approach based on radiographic images.
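The sensitivity, precision, and F1-score reported above can be derived from a multi-class confusion matrix by treating each KL grade in turn as the positive class. The following Python sketch shows one way to do this. The function names, the row/column convention, and the use of an unweighted (macro) average across classes are assumptions; the paper does not state how the per-class values were aggregated. The matrix in the example is toy data, not Table 2.

```python
def per_class_metrics(confusion):
    """Per-class (sensitivity, precision, F1) from a square confusion matrix.

    confusion[i][j] is the number of samples whose true class is i and whose
    predicted class is j (assumed convention).
    """
    k = len(confusion)
    results = []
    for c in range(k):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                       # missed class-c samples
        fp = sum(confusion[r][c] for r in range(k)) - tp  # wrongly predicted as c
        sensitivity = tp / (tp + fn) if tp + fn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        denom = precision + sensitivity
        f1 = 2 * precision * sensitivity / denom if denom else 0.0
        results.append((sensitivity, precision, f1))
    return results


def macro_average(results):
    """Unweighted mean of each metric across classes."""
    k = len(results)
    return tuple(sum(r[i] for r in results) / k for i in range(3))


# Toy two-class example.
toy = [[8, 2],
       [1, 9]]
per_class = per_class_metrics(toy)
overall = macro_average(per_class)
```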

B. PROPOSED MODEL BASED ON GAIT DATA AND RADIOGRAPHICAL IMAGES
Among the 1,536 features extracted from the deep learning model, 65 features were selected. Table 3 presents the confusion matrix and AUC for the SVM model with a total of 85 gait

IV. DISCUSSION
This study demonstrated that a machine learning approach based on gait and radiographic image features can improve the classification performance of KOA on the KL grading scale when compared with the use of only radiographic images. The proposed model is based on gait analysis data and radiographic images of the knee, whereas the models proposed in previous studies utilized only one of the data types. Moreover, it was demonstrated that gait data can be used in clinical applications. Although gait data include significant information on joints, their application is limited due to their complexity [23]. This paper proposes a method for the application of gait data to the diagnosis of KOA, in combination with radiographic images. Moreover, the proposed method demonstrated an improved performance in the classification of KOA when compared with the methods proposed in previous studies. In particular, the proposed method demonstrated a relatively high accuracy, AUC, and F1-score, among other metrics.
In a previous study, 20 key gait features associated with the severity of KOA based on radiographic images were identified.
Instead of using the traditional method based on statistics, machine learning techniques were employed to identify gait features related to KOA. Many traditional features are limited to the maximal or minimal points of the gait parameters. The extracted features included the traditional features, in addition to features such as the variance, root mean square (RMS), and area under the curve. In this study, the novel gait features were found to have a direct relationship with the severity of KOA based on radiographic images. Moreover, the gait features and information obtained from radiographic images can be used to develop an accurate and automatic KOA classification model.
As shown in Fig. 2b, the occlusion map highlights the region of interest (ROI), which indicates that the model was suitably trained. Tiulpin et al. [8] presented an attention map that accurately determined the ROI and demonstrated a classification performance similar to that of the deep learning model in this study. However, the classification performance of the deep learning network based on radiographic images was not sufficient for the accurate diagnosis of KOA using the KL grading system. Therefore, rather than restricting deep learning to the role of a classifier, deep learning was used to extract features from the radiographic images. Given that the deep learning model identified the ROI as the relatively important spatial region of the 2D radiographic image, the features extracted using deep learning also contain information on the ROI. The combined gait and radiographic image features demonstrated a superior performance when compared with the deep learning method. The accuracy of the combined model was 75.2%, whereas that of the deep learning method was 64.7%.
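For readers unfamiliar with occlusion maps, the procedure (sliding a mask across the image and recording the change in the class score) can be sketched in plain Python. This is a generic illustration under stated assumptions, not the implementation used in the study: `score_fn` stands in for the trained network's probability for the class of interest, and the mask size, stride, fill value, and per-pixel averaging are hypothetical choices.

```python
def occlusion_map(image, score_fn, mask_size=2, stride=1, fill=0.0):
    """Average score drop per pixel when a square mask covers that pixel.

    image: 2D list of pixel values.
    score_fn: callable returning the class score for an image (hypothetical
    stand-in for a trained classifier).
    """
    height, width = len(image), len(image[0])
    baseline = score_fn(image)
    heat = [[0.0] * width for _ in range(height)]
    counts = [[0] * width for _ in range(height)]

    for top in range(0, height - mask_size + 1, stride):
        for left in range(0, width - mask_size + 1, stride):
            # Copy the image and cover one patch with the fill value.
            occluded = [row[:] for row in image]
            for r in range(top, top + mask_size):
                for c in range(left, left + mask_size):
                    occluded[r][c] = fill
            drop = baseline - score_fn(occluded)
            # Attribute the score drop to every pixel under the mask.
            for r in range(top, top + mask_size):
                for c in range(left, left + mask_size):
                    heat[r][c] += drop
                    counts[r][c] += 1

    return [
        [heat[r][c] / counts[r][c] if counts[r][c] else 0.0 for c in range(width)]
        for r in range(height)
    ]


# Toy "classifier" whose score depends only on pixel (1, 1).
toy_image = [[1.0] * 4 for _ in range(4)]
toy_heat = occlusion_map(toy_image, lambda img: img[1][1])
```

Pixels whose occlusion causes a large score drop are those the model relies on; in the toy example, the highest values cluster around pixel (1, 1).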
As shown in Table 4, the radiographic image and gait data features were found to be complementary. For example, with respect to KL Grades 1-3, half of the 20 gait features were significantly different, whereas there were only four significantly different features among the 65 image features. Moreover, only two of the gait data features differed significantly between KL Grades 1 and 2, whereas 46 image features, which constituted more than half of the total number of features, were found to be significantly different. The results support the hypothesis that the gait data and radiographic image features are complementary for the distinction of the KL grades. It should be noted that the number of input features in the proposed model is significantly smaller than that of the deep learning network based on radiographic images, which demonstrates a poorer classification performance. To the best of our knowledge, this is the first study wherein KOA was diagnosed based on the KL grading system using both gait and radiographic image features.
The limitations of this study were as follows. First, the results were only validated internally. For the further validation of the model with respect to clinical applications, the validation should be conducted based on external data; however, there is no existing open-source database that contains both the radiographic images and gait data of KOA patients. Second, the sample size was smaller than those used in previous studies, which may have resulted in the poor performance of the deep learning model based on radiographic images. In future work, the gait data collected from wearable devices will be input into the proposed model. As reported in previous studies, most of the gait data used in this study can be obtained using wearable sensors [24]-[26].
The acquisition of gait data using wearable sensors can decrease the cost required for gait analysis and allow for the use of the daily-life information of patients, which can lead to more accurate diagnoses. In addition, a long-term follow-up investigating the relationship between gait data and the progression of KOA would provide information to prevent or delay the progression of KOA.
In conclusion, the proposed model based on gait data and radiographic images was demonstrated to improve the accuracy of diagnosing the severity of KOA using the KL grading system. The automatic classification of KOA using the proposed method can reduce the work of the clinician and improve the reliability of the KL grading system.

HEE CHAN KIM (Member, IEEE) received the Ph.D. degree in control and instrumentation engineering (biomedical engineering major) from Seoul National University, Seoul, South Korea, in 1989. From 1989 to 1991, he was a Staff Engineer working with the National Institute of Health (NIH)-funded electrohydraulic total artificial heart project at the Artificial Heart Research Laboratory, University of Utah, Salt Lake City, USA. He joined the faculty of the Department of Biomedical Engineering, College of Medicine, Seoul National University and Seoul National University Hospital, in 1991, where he is currently a Professor leading the Medical Electronics Laboratory (MELab). Major research activities in the MELab include development of intelligent algorithms and electronic instrumentations for medical and biological applications including artificial organs (such as artificial heart and artificial pancreas), biosensors, ubiquitous/mobile healthcare systems, and man-machine interface. In these areas, he has published over 180 peer-reviewed scientific articles in international journals and holds more than 170 patents. He is a member of Korea Society of Medical and Biological Engineering (KOSOMBE), IEEE/EMBS, and American Society of Artificial Internal Organs (ASAIO).