Wavelet Features of the Thickness Map of Retinal Ganglion Cell-Inner Plexiform Layer Best Discriminate Prior Optic Neuritis in Patients With Multiple Sclerosis

The goal of this study was to analyze wavelet features of topographic thickness maps of the retinal ganglion cell-inner plexiform layer (GCIPL) and evaluate their discrimination ability in patients with multiple sclerosis (MS) and a history of optic neuritis (ON). Twenty-nine patients with relapsing-remitting MS and a history of ON were recruited together with 63 age- and sex-matched controls (HC). There were 33 eyes with a history of ON (MSON), 25 non-ON fellow eyes (MSFE) and 63 HC eyes. Ultrahigh-resolution optical coherence tomography was used to image the macula of each participant, and the volumetric dataset was segmented to yield a topographic thickness map of the GCIPL. The thickness map was analyzed using discrete wavelet transform to extract useful features, which were further analyzed with three machine learning methods, including logistic regression (LR), logistic regression regularized with the elastic net penalty (LR-EN), and support vector machine (SVM), for classification between groups. LR-EN with nested leave-one-out cross-validation resulted in 94% (83%) sensitivity and 100% (97%) specificity in discriminating MSON (MSFE) eyes from HC eyes in the training data and 88% (64%) sensitivity and 94% (94%) specificity in discriminating MSON (MSFE) eyes from HC eyes in the validation data, which were similar to or better than those resulting from LR and SVM. LR-EN improved the sensitivity and specificity of discriminating prior ON and MSFE eyes in patients with MS from HC eyes compared with traditional thickness analysis. It may be promising in facilitating the diagnosis of subclinical ON in patients with MS.


I. INTRODUCTION
Multiple sclerosis (MS) is an autoimmune neurodegenerative disorder of the central nervous system [1]. Early diagnosis may facilitate treatment and a better prognosis [2]. Brain and spine magnetic resonance imaging (MRI) are the main
imaging diagnostic methods to detect demyelinating lesions. Space and time dissemination of the lesions are major diagnostic criteria of MS [3]. The ocular pathway is often involved in the MS disease course, but optic nerve lesions are not included in the imaging diagnostic criteria, perhaps because of the low resolution of MRI images for the optic nerve. The thinning of the peripapillary retinal nerve fiber layer (pRNFL) and ganglion cell-inner plexiform layer (GCIPL) measured by optical coherence tomography (OCT) is regarded as a marker of neurodegeneration in MS [3]-[7].
Although retinal neurodegeneration occurs in patients with MS, loss of the pRNFL and GCIPL is usually pronounced in eyes with a history of optic neuritis. Indeed, intereye thickness differences of the pRNFL and GCIPL have been reported as possible imaging markers of optic neuritis (ON) [8]-[10]. However, the reported discrimination power is approximately 70%, and the assumption of unilateral ON must be satisfied when the intereye thickness difference is used [8], [9], [11]. If both eyes have had ON, this approach may not be applicable [12]-[14]. Alternatively, a detailed examination of the thickness map may improve the identification of focal GCIPL thinning [15], [16], which may represent optic nerve damage due to ON. However, this requires specialized laboratory processing of the thickness data, such as alignment and subtraction from a normal healthy population. Such sophisticated and complicated data interpretation may be impractical in routine clinical practice.
Machine learning is an important component of artificial intelligence (AI) systems; it allows AI systems to acquire their own knowledge by extracting patterns from raw data [17]-[19]. Recent advances in AI, especially in the type of machine learning known as deep learning, have enabled applications of AI in many areas of medicine and healthcare, including medical imaging [17]-[19]. Machine learning algorithms can analyze various features in medical images to make predictions or even diagnoses [18], [19]. In ophthalmology, the AI approach has focused on the diagnosis of early glaucoma [20]-[23] and diabetic retinopathy (DR) [17]. The AI approach improves the diagnostic accuracy of glaucoma, although the majority of studies used only about 30 patients with glaucoma [20], [21], [24]-[29]. The success of a pivotal trial of an AI system to detect DR led the U.S. Food and Drug Administration (FDA) to authorize the system for use by health care providers to detect mild DR and diabetic macular edema [17]; it is the first FDA-authorized autonomous AI diagnostic system in any field of medicine [17]. In addition, applying a transform such as the wavelet transform before machine learning analysis may offer an advantage over raw data analysis. The wavelet transform can exploit certain features of the thickness maps: wavelet features capture the patterns in an image at multiple resolutions and therefore provide more information, which is expected to improve discriminative power. Classifiers that combine the wavelet transform with machine learning can be designed based on optimality criteria to discriminate images in the feature vector space, which may further exploit the discriminative information [19], [24], [30].
In the present study, we applied wavelet transform and machine learning methods to analyze topographic thickness maps of GCIPL. The goal was to analyze wavelet features of topographic thickness maps of GCIPL and evaluate their discrimination ability in patients with MS and a history of ON.

II. METHODS
Approval was obtained from the Institutional Review Board (IRB) of the University of Miami, and each recruited subject signed an informed consent form. MS was diagnosed according to the 2010 Revised McDonald Criteria [31]. In addition, the following exclusion criteria were applied: any other systemic, ocular or neurological disease; refractive error > ±6.0 diopters; or MS relapse within the past 6 months.
The patient data were from an ongoing MS study, from which several reports involving segmentation of intraretinal layers and thickness maps have been published [15], [16]. In the present study, thirty patients with relapsing-remitting MS and a history of ON were recruited together with 63 age- and sex-matched controls (HC) (Table 1). There were 33 eyes with a history of ON (MSON), 25 non-ON fellow eyes (MSFE) and 63 HC eyes.
Ultrahigh-resolution optical coherence tomography (UHR-OCT) was used to image the macula, as in previously reported studies [15], [16], [32]-[35]. The custom system configuration, imaging settings and image processing have been detailed in previous studies [15], [16], [32]-[35]. Briefly, UHR-OCT is a spectral-domain OCT with an axial resolution of ∼3 µm and a scan speed of 24,000 A-scans per second. The raster scan containing 512 × 128 A-scans covers a 6 × 6 mm macular area centered on the fovea. The volumetric dataset was exported and segmented using an automated segmentation software program (Orion, Voxeleron LLC, Pleasanton, CA, USA). The software segmented up to six intraretinal layers and exported thickness maps. Previous studies [15], [16] demonstrated that the most profound alteration in topographic thickness in the macula occurred in the GCIPL in patients with MS and a history of ON. Therefore, only the thickness map of the GCIPL was used to identify patterns in the present study. Each thickness map contained 512 × 128 data points.
To apply machine learning analysis, we first used a two-dimensional discrete wavelet transform (DWT) [36] to extract features for the classification of images. The DWT can characterize the texture features of an image at different scales [30], [37] and facilitates the extraction of features with discriminative power. We employed the wavedec2 function in MATLAB to implement the DWT with the biorthogonal wavelet bior1.5. Of note, the DWT has also been employed for the classification of fundus images in glaucoma [24], [38]. Since the images are of size 512 × 128 pixels, we performed a 7-level DWT (log2(128) = 7). At the first level, the original image was decomposed into four images in four frequency bands, HH1, HL1, LH1, and LL1, using the biorthogonal wavelet bior1.5 as mentioned earlier. At the ith level (i = 2, 3, ..., 7), the image in the LL(i−1) band was decomposed into four bands: HHi, HLi, LHi, and LLi. Therefore, each image yields 22 images of different resolutions in a total of 22 frequency bands. For each image, the energy in each of the 22 frequency bands was calculated and standardized using the z-score, which yielded 22 features that were further used to classify images. More specifically, let the image in a given frequency band of the ith individual be represented as a p × q matrix D_i; then the energy of the band is E_i = Σ_{j=1..p} Σ_{k=1..q} d_i(j, k)^2, where d_i(j, k) is the entry of matrix D_i in the jth row and kth column. Supposing there are n individuals in the dataset, the energies E_1, ..., E_n of each band were standardized across individuals using the z-score. Of note, previous research has shown that the energy distribution in different frequency bands has discriminatory power for image classification [30], [36].
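The band-energy feature extraction described above can be sketched in Python. For simplicity, this sketch uses an orthonormal Haar wavelet and plain NumPy rather than bior1.5 and MATLAB's wavedec2, so the numbers will differ from the study's, but the 3 × 7 + 1 = 22-feature structure is the same; the function names are illustrative, not from the study.

```python
import numpy as np

def haar_dwt2(img):
    """One level of a 2-D Haar DWT: returns LL and the (LH, HL, HH) detail bands."""
    lo = (img[0::2, :] + img[1::2, :]) / np.sqrt(2)   # row low-pass
    hi = (img[0::2, :] - img[1::2, :]) / np.sqrt(2)   # row high-pass
    ll = (lo[:, 0::2] + lo[:, 1::2]) / np.sqrt(2)
    lh = (lo[:, 0::2] - lo[:, 1::2]) / np.sqrt(2)
    hl = (hi[:, 0::2] + hi[:, 1::2]) / np.sqrt(2)
    hh = (hi[:, 0::2] - hi[:, 1::2]) / np.sqrt(2)
    return ll, (lh, hl, hh)

def band_energy_features(thickness_map, levels=7):
    """Energies of the 3*levels detail bands plus the final LL band (22 for levels=7)."""
    ll = np.asarray(thickness_map, dtype=float)
    feats = []
    for _ in range(levels):
        ll, details = haar_dwt2(ll)          # decompose the running LL band
        feats.extend(float(np.sum(band ** 2)) for band in details)
    feats.append(float(np.sum(ll ** 2)))     # energy of the final LL band
    return np.array(feats)

def zscore_across_individuals(feature_matrix):
    """Standardize each feature (column) across the n individuals (rows)."""
    return (feature_matrix - feature_matrix.mean(axis=0)) / feature_matrix.std(axis=0)
```

Because the Haar transform is orthonormal, the 22 band energies of each map sum to the total energy of the map, which makes the decomposition easy to sanity-check.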
We then applied three machine learning methods to classify images based on the 22 wavelet features: logistic regression (LR), support vector machine (SVM) and logistic regression regularized with the elastic net penalty (LR-EN). LR and LR-EN were implemented with the software package glmnet [39], and SVM was implemented with the svm function [40]. LR-EN can perform feature selection and classification simultaneously, which may improve classification accuracy compared with methods that carry out feature selection and classification separately, whereas LR and SVM cannot select features and must perform classification based on all 22 wavelet features. The mathematical formulations of LR and SVM can be found in machine learning textbooks [41], [42], while LR-EN can be found in [24] and is briefly described as follows. Let x_i be the feature vector of the ith individual, and let y_i = 0 or 1 be the class label of the ith individual. Here, x_i is a 22 × 1 vector containing the energies in the 22 frequency bands. The logistic regression model assumes the probability p(y_i = 1 | x_i) = 1 / (1 + exp(−β_0 − β^T x_i)). LR-EN finds the model parameters (β_0, β) by minimizing the negative log-likelihood of the data regularized by the elastic net penalty [24]. More specifically, the following optimization problem is solved to find the regression coefficients of LR-EN:

(β_0, β) = argmin (1/n) Σ_{i=1..n} [log(1 + exp(β_0 + β^T x_i)) − y_i (β_0 + β^T x_i)] + λ [α ‖β‖_1 + (1 − α) ‖β‖_2^2 / 2],   (1)

where ‖β‖_k, k = 1, 2, is the l_k-norm of β, and λ > 0 and 0 ≤ α ≤ 1 are two model parameters that can be determined with cross validation (CV). If we set λ = 0 in (1), the solution of (1) gives the parameters of LR. Since the number of data samples is relatively small and there was an issue of quasicomplete separation [43] in the datasets under consideration, the solution of (1) with λ = 0 is not stable. To overcome this problem, we implemented LR using (1) with α = 0 and λ > 0. The model parameter λ was determined using cross validation.
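The study fits LR-EN with glmnet; as an illustration of objective (1), a minimal NumPy sketch solving it by proximal gradient descent (soft-thresholding handles the l_1 term) might look as follows. The function names, step size and iteration count are assumptions of this sketch, not the study's settings.

```python
import numpy as np

def fit_lr_en(X, y, lam=0.05, alpha=0.9, step=0.5, n_iter=3000):
    """Minimize objective (1): mean logistic loss
    + lam * (alpha*||b||_1 + (1-alpha)/2*||b||_2^2), intercept b0 unpenalized."""
    n, p = X.shape
    b0, b = 0.0, np.zeros(p)
    for _ in range(n_iter):
        z = b0 + X @ b
        prob = 1.0 / (1.0 + np.exp(-z))
        resid = (prob - y) / n                         # gradient of the mean log-loss
        b0 -= step * resid.sum()
        grad = X.T @ resid + lam * (1 - alpha) * b     # smooth (ridge) part
        b_half = b - step * grad
        thr = step * lam * alpha                       # proximal step for the l1 part
        b = np.sign(b_half) * np.maximum(np.abs(b_half) - thr, 0.0)
    return b0, b

def predict_prob(b0, b, X):
    return 1.0 / (1.0 + np.exp(-(b0 + X @ b)))
```

The soft-thresholding step is what zeroes out coefficients, giving LR-EN its built-in feature selection; with alpha = 0 the threshold vanishes and the update reduces to ridge-penalized LR, matching the LR implementation described above.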
We used the nested CV procedure [44] to tune the parameters of SVM, LR, and LR-EN and to estimate the generalization error of the classifiers. The nested CV procedure has two loops, both of which employed a leave-one-out (LOO) approach. The inner loop was used to tune the parameters, while the outer loop was used to estimate the generalization error. Since the samples used in the outer loop for error estimation were never used by the training process in the inner loop, the nested CV provided an unbiased estimate of the generalization error. Let us use the dataset of 33 MSON samples and 63 HC samples to illustrate the nested CV. First, one of the 96 samples was held out. The remaining 95 samples were used in an LOO procedure, referred to as the inner CV loop and described in detail below, to train the model and select the optimal values of the parameters. After the parameter values were selected, the 95 samples were used to fit the model, which was then used to calculate the error on the one sample that had been held out. This process, referred to as the outer loop, was repeated 96 times until every sample had been held out and used to calculate the error. This error is the generalization error, also called the cross-validation error in this paper. Now, let us describe the inner CV loop. One of the 95 samples was left out; the remaining 94 samples were used to train the model for a set of parameter values, and the trained model was used to predict the label of the left-out sample. This process was repeated 95 times until every sample had been left out and its label predicted. The predicted labels of the 95 samples were used to calculate the error, and the parameter value(s) yielding the smallest error were selected as optimal.
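The nested LOO procedure above can be written schematically in Python, with a generic fit/predict pair standing in for LR-EN (the helper names and the parameter grid are illustrative, not from the study):

```python
import numpy as np

def loo_error(X, y, fit, predict, param):
    """Inner loop: leave-one-out error for a single candidate parameter value."""
    errs = []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        model = fit(X[keep], y[keep], param)
        errs.append(predict(model, X[i]) != y[i])
    return np.mean(errs)

def nested_loo_error(X, y, fit, predict, param_grid):
    """Outer loop: each held-out sample is predicted by a model whose parameter
    was tuned by an inner LOO run on the remaining samples only."""
    errs = []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        X_tr, y_tr = X[keep], y[keep]
        best = min(param_grid, key=lambda p: loo_error(X_tr, y_tr, fit, predict, p))
        model = fit(X_tr, y_tr, best)
        errs.append(predict(model, X[i]) != y[i])
    return np.mean(errs)   # estimate of the generalization error
```

Because the held-out sample in the outer loop never enters the inner tuning loop, this estimate is not biased by the parameter selection, which is the point of nesting the two loops.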

III. RESULTS
The three approaches yielded different areas under the ROC curve (AUCs) in discriminating MSON and MSFE from HC, as well as in discriminating MSON from MSFE. The AUCs, sensitivity and specificity, and the ROC curves are shown in Tables 2, 3, and Fig. 1, respectively, for the training data and in Tables 4, 5, and Fig. 2, respectively, for the cross-validation data. The sensitivity and specificity of LR-EN and LR were calculated at the default cutoff probability of 0.5, and the sensitivity and specificity of SVM were determined at the default threshold of 0 for the decision statistic.
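For reference, sensitivity and specificity at a probability cutoff and a rank-based (Mann-Whitney) AUC can be computed as follows. This is a generic sketch rather than the study's code, and the AUC helper assumes no tied scores.

```python
import numpy as np

def sens_spec(y, prob, cutoff=0.5):
    """Sensitivity and specificity of the rule prob >= cutoff (labels 0/1)."""
    pred = (prob >= cutoff).astype(int)
    sens = np.mean(pred[y == 1] == 1)   # true-positive rate
    spec = np.mean(pred[y == 0] == 0)   # true-negative rate
    return sens, spec

def auc(y, score):
    """Rank-based AUC, equivalent to the Mann-Whitney U statistic (no ties)."""
    order = np.argsort(score)
    ranks = np.empty(len(score))
    ranks[order] = np.arange(1, len(score) + 1)
    n1, n0 = np.sum(y == 1), np.sum(y == 0)
    return (ranks[y == 1].sum() - n1 * (n1 + 1) / 2) / (n1 * n0)
```

The AUC summarizes discrimination across all cutoffs, whereas the reported sensitivity/specificity pairs correspond to one fixed operating point (0.5 for LR/LR-EN, 0 for the SVM decision statistic).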
First, let us look at the results for the training errors in Tables 2, 3, and Fig. 1. LR-EN yielded AUCs of 0.999 and 0.992 in discriminating MSON and MSFE, respectively, from HC, which are larger than the corresponding AUCs of SVM and LR. The AUC of LR-EN in discriminating MSON from MSFE is 0.744, which is lower than that of SVM and LR.
The LR-EN method provided 94% (83%) sensitivity and 100% (97%) specificity in discriminating MSON (MSFE) from HC, which were higher than or equivalent to those of the SVM and LR methods. In addition, the LR-EN method provided a sensitivity of 0.91 and a specificity of 0.33 in differentiating MSFE from MSON.
Next, let us look at the results for the cross-validation errors in Tables 4, 5, and Fig. 2. LR-EN yielded an AUC of 0.950 (0.849) in discriminating MSON (MSFE) from HC and 0.632 in discriminating MSON from MSFE. These discrimination powers of the LR-EN method ranked higher than those of the SVM and LR methods. The LR-EN method provided 88% (64%) sensitivity and 94% (94%) specificity in discriminating MSON (MSFE) from HC, which were higher than or equivalent to those of the SVM and LR methods. In addition, the LR-EN method provided 91% sensitivity and 28% specificity in differentiating MSFE from MSON. The power of discriminating MSFE from MSON is much smaller than that of discriminating MSON or MSFE from HC. At the optimal parameter value obtained with cross validation, we fitted the LR-EN model to the whole dataset and found that LR-EN selected 15 and 17 of the 22 wavelet features when discriminating MSON and MSFE, respectively, from HC and 4 of the 22 wavelet features when discriminating MSON from MSFE. LR-EN's inherent feature selection capability may contribute to its superior performance compared with SVM, LR, and the traditional thickness mapping approach. To illustrate the difference between traditional thickness mapping and frequency band-based mapping, images were reconstructed using the inverse DWT (IDWT) (Fig. 3). The 17 frequency bands selected by the LR-EN method were used to reconstruct IDWT images of the MSFE eyes for comparison with the HC eyes (Fig. 3). The IDWT images appeared to show more profound changes, which may explain the improvement in performance with the LR-EN method.
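The band-selection reconstruction can be illustrated with a small sketch: decompose an image, zero every detail band that the classifier did not select, and invert the transform. As with the feature-extraction sketch, this uses an orthonormal Haar wavelet in NumPy rather than bior1.5 in MATLAB, so it mirrors the idea rather than reproduces the study's figures; the function names are illustrative.

```python
import numpy as np

def haar_dwt2(img):
    """One level of a 2-D Haar DWT: LL and the (LH, HL, HH) detail bands."""
    lo = (img[0::2, :] + img[1::2, :]) / np.sqrt(2)
    hi = (img[0::2, :] - img[1::2, :]) / np.sqrt(2)
    return ((lo[:, 0::2] + lo[:, 1::2]) / np.sqrt(2),    # LL
            ((lo[:, 0::2] - lo[:, 1::2]) / np.sqrt(2),   # LH
             (hi[:, 0::2] + hi[:, 1::2]) / np.sqrt(2),   # HL
             (hi[:, 0::2] - hi[:, 1::2]) / np.sqrt(2)))  # HH

def haar_idwt2(ll, details):
    """Exact inverse of haar_dwt2 (the Haar basis is orthonormal)."""
    lh, hl, hh = details
    lo = np.empty((ll.shape[0], ll.shape[1] * 2))
    lo[:, 0::2], lo[:, 1::2] = (ll + lh) / np.sqrt(2), (ll - lh) / np.sqrt(2)
    hi = np.empty_like(lo)
    hi[:, 0::2], hi[:, 1::2] = (hl + hh) / np.sqrt(2), (hl - hh) / np.sqrt(2)
    img = np.empty((lo.shape[0] * 2, lo.shape[1]))
    img[0::2, :], img[1::2, :] = (lo + hi) / np.sqrt(2), (lo - hi) / np.sqrt(2)
    return img

def reconstruct_selected(img, levels, keep):
    """Zero every detail band whose (level, name) is not in `keep`, then invert.
    `keep` is a set like {(1, "LH"), (3, "HH")}; the final LL band is always kept."""
    stack, ll = [], np.asarray(img, dtype=float)
    for lvl in range(1, levels + 1):
        ll, (lh, hl, hh) = haar_dwt2(ll)
        bands = {"LH": lh, "HL": hl, "HH": hh}
        for name in bands:
            if (lvl, name) not in keep:
                bands[name] = np.zeros_like(bands[name])
        stack.append((bands["LH"], bands["HL"], bands["HH"]))
    for details in reversed(stack):
        ll = haar_idwt2(ll, details)
    return ll
```

Keeping all bands reproduces the original map exactly; keeping only the selected bands yields an image in which the discriminative spatial-frequency content is isolated, which is the effect shown in Fig. 3.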
The absolute value of a regression coefficient of LR or LR-EN indicates the importance of the corresponding feature in classifying the data points: a larger absolute value indicates a more important feature. Since SVM employed the Gaussian kernel, which effectively transfers the original features into a new feature space through a nonlinear transform, it is difficult to determine feature importance from the results of SVM. We ranked the importance of features using the regression coefficients of LR and LR-EN. Both LR and LR-EN picked LH2, LH3, and HH7 as the top three features in discriminating MSON from HC but ranked the other features differently. They selected HL4, HH7, and HH3 as the top three features when discriminating MSFE from HC and LH3, LH4, LH1, and HL4 as the top four features when discriminating MSON from MSFE. While the three sets of top features have some overlap, they are largely different.
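Ranking features by the absolute values of the regression coefficients is straightforward; a sketch follows, in which the band-name ordering mirrors the HHi/HLi/LHi-per-level plus final LL convention described in Methods (the helper name and ordering are illustrative assumptions):

```python
import numpy as np

# 3 detail bands per level for 7 levels, plus the final LL band: 22 names in total
BAND_NAMES = [f"{sub}{lvl}" for lvl in range(1, 8)
              for sub in ("HH", "HL", "LH")] + ["LL7"]

def rank_features(coefs, names=BAND_NAMES):
    """Return (name, coefficient) pairs, most important first;
    zero coefficients (features discarded by LR-EN) are dropped."""
    order = np.argsort(-np.abs(np.asarray(coefs)))
    return [(names[i], float(coefs[i])) for i in order if coefs[i] != 0]
```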

IV. DISCUSSION
This study demonstrates the improvement in the discrimination powers for MSON and MSFE achieved by analyzing wavelet features of the GCIPL thickness maps. A previous study reported that focal thinning of the GCIPL in eyes with a history of ON formed a U-shaped zone [15]. In contrast, Shi et al. reported that the focal thinning of the GCIPL resembled a circular area (i.e., the M Zone) on the nasal side in eyes without a history of ON in patients with MS [16]. Thinning in the M Zone was also found in MSFE eyes [15]. Although these focal thinning zones provided better discrimination powers in identifying eyes of patients with MS, the discrimination powers did not reach 100%, as was achieved in the present study. In addition, determining the focal thickness requires alignment of the thickness maps at the foveal center, which was done manually in previous studies [16], [33]. Therefore, this traditional approach is time-consuming and subject to possible misalignment, and it may not translate into clinical use.
In contrast, the machine learning analysis of the thickness maps did not require alignment, since the wavelet features were computed directly from the raw thickness maps. The machine learning approaches, particularly LR-EN, provided better performance than the traditional focal thickness analysis performed in a previous study [15]. The traditional focal thickness analysis provided a sensitivity of 0.97 and a specificity of 0.83 on the training data, using the U-shaped thinning zone, to discriminate MSON from HC [15]. The improvement of the machine learning methods is mainly due to two factors. First, wavelet features were exploited, whereas the focal thickness analysis used only the average thickness; the wavelet features captured the patterns of the images and provided more information, which contributed to the improvement in discriminative power. Second, the refined optimality criteria used by machine learning to discriminate images in the feature vector space may further exploit the discriminative information [19], [24], [30].
LR and LR-EN are essentially built on the same probabilistic model to discriminate two classes of images. However, LR-EN is capable of selecting features, whereas LR does not have such capability. LR-EN discarded features in several low-frequency bands and enhanced the difference between images of two different classes, as shown in Figure 2, which in turn improved the performance. SVM employs a different model from LR and LR-EN, and it is not capable of selecting features. We may use a feature selection method to select features first and then input selected features to SVM. However, such a method may not guarantee performance improvement. We tried to feed features selected by LR-EN to SVM but only observed performance degradation compared to the performance of SVM using all 22 wavelet features. This demonstrates the importance of both feature selection and mathematical models for improving discrimination power.
There are several limitations to the present study. First, the sample size is relatively small, although similar sample sizes were used in our previous studies, in which significant focal thinning was found using traditional thickness mapping [15], [33]. Of note, approximately 30 cases with the targeted disease were used in the majority of previous studies applying the AI approach to the diagnosis of glaucoma [20], [21], [24]-[29]. As a first attempt to apply machine learning in patients with MS, this study may serve as a good starting point for the AI approach in the field. Second, we did not clinically confirm that the MSFEs had subclinical ON, although focal thinning of the GCIPL was evident [15]. Third, we did not conduct a longitudinal follow-up of these patients. Fourth, we did not use other information, such as thickness maps of other intraretinal layers and retinal angiography, which, if incorporated into the machine learning process, may further improve the accuracy and robustness.
In conclusion, the machine learning approach using wavelet features of GCIPL thickness maps improves on the performance of traditional thickness analysis in discriminating prior ON in patients with MS, and it may be promising in facilitating the diagnosis of prior ON in patients with MS.
HONG JIANG received the M.D. and Ph.D. degrees. She is currently an Associate Professor of neurology and ophthalmology with the University of Miami. She completed her neurology residency training at the Jackson Memorial Hospital/University of Miami, Miami, in 2010, and the neuro-ophthalmology fellowship at the Bascom Palmer Eye Institute/University of Miami, in 2011. As a neuro-ophthalmologist at the Bascom Palmer Eye Institute, she specializes in the diagnosis and treatment of various neuro-ophthalmologic disorders, such as vision loss due to optic neuritis, brain tumor or dementia, and double vision.
In the Department of Neurology, she has expertise in the evaluation and treatment of the various neurologic diseases such as multiple sclerosis, memory disorders, headaches and spinal diseases. Her research interest is to study the ocular microvascular dysfunction in ocular and central nervous system diseases, such as multiple sclerosis, dementia, and dry eye. She has multiple publications in ocular microvascular function studies.
Dr. Jiang is a member of the International Multiple Sclerosis Visual System Consortium (IMSVISUAL), the North American Neuro-Ophthalmology Society (NANOS), the American Academy of Neurology (AAN), the American Academy of Ophthalmology (AAO), and the Association for Research in Vision and Ophthalmology (ARVO).