A Novel Convolutional Variation of Broad Learning System for Alzheimer’s Disease Diagnosis by Using MRI Images

Alzheimer’s disease (AD) is a serious chronic condition that causes great pain and loss to patients and their families. Early and accurate diagnosis would mean significant progress in the prevention and treatment of the disease. Magnetic Resonance Imaging (MRI) is a commonly used technique in medical diagnostics. However, distinguishing AD, Control Normal (CN), and Mild Cognitive Impairment (MCI) remains challenging because of the complex structures present in MRI. In this paper, diagnostic models for MRI images are proposed to identify the various stages of AD based on the Broad Learning System (BLS) and its convolutional variants. To verify the validity of the proposed models, experiments on MRI images collected from the ADNI website are tested and evaluated. The results show that our algorithms outperform other state-of-the-art algorithms on various tasks with better accuracy and shorter training times. Finally, the cross-domain learning ability of the proposed algorithms is verified on an independent AD dataset.


I. INTRODUCTION
Alzheimer's disease (AD), the most common form of dementia, is a complex disease leading to memory impairment and other cognitive problems. Currently, there is no known treatment that can cure this disorder, and the average remaining life expectancy after diagnosis is about three to nine years [1]-[5]. According to the 2018 World Alzheimer Report, nearly 50 million people worldwide were living with dementia in 2018, and this number is projected to triple to 152 million by 2050, with a new case arising every 3 seconds worldwide [6]. The state of AD can be divided into three categories: Alzheimer's disease (AD), mild cognitive impairment (MCI), and control normal (CN), of which MCI can be subdivided into stable MCI (sMCI) and progressive MCI (pMCI). Imaging techniques provide the corresponding required diagnostic images [10]-[12]. In comparison, an MRI image is easier to obtain than PET because MRI requires a shorter acquisition time and is relatively less expensive for patients. At the same time, machine learning has grown by leaps and bounds, and in terms of medical images, researchers have designed many effective algorithms for different data types to diagnose the state of AD.
On one hand, some of these algorithms are based on conventional neural networks and support vector machine (SVM) classifiers and contain a range of feature processing methods to preprocess MRI images. For example, Liu et al. [13] design a whole-brain hierarchical network (WBHN) to classify Alzheimer's disease. WBHN is a special hierarchical neural network built on a number of regions of interest (ROI) in the brain. In addition, Jha et al. [14] apply a series of feature processing methods, such as principal component analysis (PCA) and linear discriminant analysis (LDA), to improve the effect of feature extraction on MRI images. The model designed by Salvatore et al. [15], [16] uses an SVM as the classifier, while the intermediate feature extraction and selection are handled by PCA. In summary, such methods rely on a series of feature processing algorithms to classify accurately.
On the other hand, using convolutional networks to extract high-dimensional features from MRI or PET images is another approach. Lin et al. [17] propose a convolution-based model, which extracts features from three different axes of an MRI image, to diagnose pMCI and sMCI. Liu et al. [18] adopt a data-driven landmark discovery algorithm to locate the most informative image patches in MRI images, and a group of 3D CNNs is designed to extract features and classify the various states of AD [19]. Ding et al. [20] design a deep approach based on Inception Network version 3 [21], [22], a deep convolutional neural network, to classify AD, MCI, or CN from FDG-PET images; their InceptionNet performs better than three radiologists on an independent dataset. The success of the above-mentioned methods inspires us to process MRI images with convolutional networks. In addition, some methods based on multi-modal data and functional-MRI images have also been proposed and applied [23]-[25], but they are beyond the scope of this paper.
Considering the above two types of methods comprehensively, the former traditional methods must be built on certain additional feature processing methods, and these preprocessing steps have a considerable impact on the results and reduce the efficiency of the diagnostic algorithm. Among the latter methods, some convolutional networks also depend on prior knowledge of the data. Besides that, the InceptionNet model suffers from a huge model size and unsatisfactory performance on MRI images, although it performs well on PET data. Hence, in this paper, we first aim to accurately predict the various states of AD and the diagnosis of AD using MRI images. The second goal is to design an efficient algorithm that relies on feature processing as little as possible while achieving an accurate prediction of AD. Consequently, based on specific image processing, we first design a method named Convolution Feature-based Cascade of Enhancement Nodes BLS (CF-CEBLS) that combines the convolutional variation of the Broad Learning System (BLS) [26] to construct an effective diagnosis model of AD. After that, other variants based on the proposed CF-CEBLS and the direct-solution BLS are introduced to solve the AD diagnosis problem. Fig. 1 presents the overall flow of AD diagnosis based on MRI images. The main contributions of this paper can be summarized as follows:
• BLS and its variation algorithms are introduced into medical image analysis for the first time, providing an effective tool for the diagnosis of AD from MRI images.
• Combined with the advantages of the direct solution method, a fast classification model of medical images is experimentally verified.
• Compared with other models, our method provides a feature module with a smaller number of parameters, and the experimental results of the overall model show advantages on multiple datasets.
This paper is organized as follows. The introduction and succinct related work are given in the first section. The second section offers the details of our dataset, data preprocessing, and the proposed models. In the third section, we evaluate our algorithm and other related algorithms with experimental analysis. Finally, the summary of the full paper and future work are given in the fourth section.

II. MATERIALS AND METHODS
Here, we first introduce the datasets and MRI image processing pipeline used in this work and then present the proposed methods.

A. DATA DESCRIPTION
Data used in the preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database [27]. The ADNI was launched in 2003 by the National Institute on Aging (NIA), the National Institute of Biomedical Imaging and Bioengineering (NIBIB), and the Food and Drug Administration (FDA), as a 5-year public-private partnership, led by the principal investigator, Michael W. Weiner, MD. The primary goal of ADNI is to test whether serial magnetic resonance imaging (MRI), positron emission tomography (PET), other biological markers, and clinical and neuropsychological assessments of participants can be combined to measure the progression of MCI and early Alzheimer's disease (AD). For more details about the ADNI database, please visit http://adni.loni.usc.edu.
In our experiments, T1-weighted MRI images were randomly selected from the ADNI website to serve as the base training and test dataset. The dataset includes 768 MRI images in 3 categories, each of which has 256 images; specific details are shown in Table 1. The Mini-Mental State Examination (MMSE) [28], a 30-question questionnaire used to assess cognitive impairment, is often used by the medical community to examine patients with dementia. The highest MMSE score is 30 points, and the higher the score, the better the cognitive ability; usually, a person with an MMSE score of less than 24 is considered an AD patient. As shown in Table 1, the average age of the AD group is 75 years (range 55 to 92 years) with an average MMSE of 22.2. The average age of the MCI group is 77 years (range 58 to 90 years), and the average age of the CN group is 78 years (range 55 to 92 years). Furthermore, an external independent dataset is also used to evaluate the performance of our algorithm, following [16]. The independent dataset is derived from MRI images obtained from each patient when their condition was stable. It contains four categories, AD, CN, sMCI, and pMCI, for a total of 201 images; Table 2 shows specific information about this dataset. As shown in Table 2, the average age of the AD group is 77 years (range 58 to 89 years), that of the pMCI group is 76 years (range 57 to 90 years), that of the sMCI group is 76 years (range 60 to 88 years), and that of the CN group is 77 years (range 62 to 89 years).
Since the number of categories in this independent dataset differs from that of the training dataset, we need to adjust the categories of the validation set to fit the trained model when validating generalization performance; the processing of the independent dataset is described in Section III-A. All images were downloaded in NIfTI format after being licensed on the ADNI website: http://ida.loni.usc.edu/.

B. DATA PROCESSING
According to the ADNI acquisition protocol [27], examinations were performed at 1.5T using a T1-weighted sequence. The MRI images undergo the following preprocessing steps: (1) 3D Gradwarp correction for the geometric distortion caused by gradient non-linearity [29]; (2) B1 non-uniformity correction for the intensity variation caused by field non-uniformity [30]. These preprocessing steps help to improve standardization among MRI images from different platforms. Fig. 2(a) shows expanded views of an MRI image from three viewing angles.

1) STEP 1
After the above preprocessing, the MRI images are further processed by a sequence of procedures consisting of: (1) image re-orientation; (2) cropping; (3) skull-stripping; (4) image normalization to the MNI standard space by means of co-registration to the MNI template [31]. After this phase, all MR images have a size of 121 × 145 × 121 voxels, and noise mitigation [32] is also accounted for by the above steps. Fig. 2(b) shows the expanded views of an MRI image along three viewing axes. The whole process was performed using the CAT12 [33] software package installed on the Matlab platform (Matlab R2016b, The MathWorks).

2) STEP 2
At this point, all MRI images retain the entire brain area, and the excess invalid area is cut off using the software packages Nilearn [34] and Nibabel [35]. Then, starting with the 20th slice of the axial view, one slice is extracted out of every 5 slices of an MRI image. At the end of this step, 16 slices are obtained and every slice is scaled to a size of 125 × 100. To fit the convolutional neural network, the 16 slices are arranged into a single image in a 4 × 4 grid, ordered from skull base to cranial top, similar to [20]. After all the processing is completed, the original image is changed from a 3D volume to a 2D image with a size of 500 × 400. Fig. 2(c) is an example of the final image fed into the model.
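The slicing step above can be sketched as follows. This is a minimal NumPy illustration only: the function name and the nearest-neighbour resampling are our own choices (real volumes would be loaded and masked with Nibabel/Nilearn as described), and note that 16 tiles of 125 × 100 arranged in a 4 × 4 grid yield a 500 × 400 montage.

```python
import numpy as np

def volume_to_montage(vol, start=20, step=5, n_slices=16,
                      slice_hw=(125, 100), grid=(4, 4)):
    """Flatten a 3D MRI volume into a 2D montage of axial slices.

    `vol` is a (sagittal, coronal, axial) array such as the 121x145x121
    volumes produced by the registration step. Resizing uses
    nearest-neighbour index sampling to keep the example dependency-free.
    """
    h, w = slice_hw
    rows, cols = grid
    tiles = []
    for i in range(n_slices):
        sl = vol[:, :, start + i * step]                 # pick every 5th axial slice
        ri = (np.arange(h) * sl.shape[0] / h).astype(int)
        ci = (np.arange(w) * sl.shape[1] / w).astype(int)
        tiles.append(sl[np.ix_(ri, ci)])                 # resample to 125 x 100
    # arrange the 16 tiles in a 4x4 grid, skull base to cranial top
    rows_img = [np.hstack(tiles[r * cols:(r + 1) * cols]) for r in range(rows)]
    return np.vstack(rows_img)

vol = np.random.default_rng(0).random((121, 145, 121))
montage = volume_to_montage(vol)
print(montage.shape)  # (500, 400)
```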
From another perspective, medical workers often take a similar approach to reduce a 3D image to 2D images. It is worth mentioning that although 3D computer vision technology has been widely used in image and video processing and analysis, 2D convolution still plays an irreplaceable role. We consulted physicians in medical imaging, learned from clinicians' conventional processing of MRI images and from [20], and then designed the relevant models in the hope that they could be applied in practice. Therefore, from the perspective of practical application, this paper conducts image preprocessing in a way that is close to clinical practice. All processing steps were conducted with Python 3 and SciPy [36], [37]. MRI volumes were visually inspected for homogeneity and the absence of artifacts both before and after the processing steps. The image representation is visible in Fig. 2.

C. METHOD PRELIMINARIES
The neural network is a machine learning tool widely used in disease diagnosis, among which the Broad Learning System (BLS) is a special network structure. BLS was proposed by Chen et al. [26], [38] based on the Functional-Link Neural Network (FLNN) and regularization techniques [39], [40]. It aims to offer an alternative to learning a deep structure for classification problems. Prior to this, a graph-CNN variant of BLS proved prominent in processing electroencephalograms (EEG) [41]. BLS has a three-layer network structure. Fig. 3 illustrates the basic model of BLS, where X ∈ R^{N×m} is the input matrix, N is the number of instances, and m is the dimension of every instance. Y is the output layer and W is the weight matrix connecting the output layer to both the feature mapping layer and the enhancement layer.
After concatenating the outputs of the feature mapping layer and the enhancement layer, we can regard the result as the overall input of the output layer, denoted as A. W_e, β_e are the weights and biases from X to the feature mapping layer, and W_h, β_h connect the feature mapping layer to the enhancement layer. Furthermore, Z is the output of the feature mapping layer. W_e, W_h and β_e, β_h are randomly generated, and the dimensions of W_e and W_h are hyper-parameters. ψ and ξ are non-linear functions, such as tansig and tanh. Finally, the problem is transformed into a linear equation system:

A W = Y, with A = [Z | H], (1)

where Z = ψ(X W_e + β_e) and H = ξ(Z W_h + β_h). To obtain W, Eq. (1) can be solved by generalized inverse and ridge regression methods through the optimization

argmin_W ||A W − Y||_σ^ρ + λ ||W||_2^2, (2)

where ρ and σ always equal 2 and λ is the regularization coefficient. We can deduce that

W = (λ I + A^T A)^{−1} A^T Y, (3)

where, as λ → 0, (λ I + A^T A)^{−1} A^T converges to the pseudo-inverse A^+, giving the direct solution

W = A^+ Y. (4)

By changing the network structure, BLS has derived many varieties, among which the most common one changes the connection mode of the nodes in the enhancement layer. Hence, one variation named Cascade of Enhancement Nodes BLS (CEBLS) [38] is proposed. In this method, for the input data X, the first n groups of feature nodes are generated by

Z_i = ψ(X W_{e_i} + β_{e_i}), i = 1, . . . , n, (5)

where W_{e_i} and β_{e_i} are sampled from the given distribution. Projecting the feature nodes Z^n := [Z_1, Z_2, . . . , Z_n] by the function ξ(·), the first group of enhancement nodes is

H_1 = ξ(Z^n W_{h_1} + β_{h_1}), (6)

where the associated weights are randomly sampled. The enhancement nodes from the second to the mth group are compositely established as follows:

H_j = ξ(H_{j−1} W_{h_j} + β_{h_j}), j = 2, . . . , m. (7)

Consequently, the output is Y = [Z^n | H_1 | · · · | H_m] W, with W solved as above.
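The BLS construction with cascaded enhancement nodes can be sketched in a few lines of NumPy. This is a toy illustration under our own assumptions (function names, weight shapes, and the use of tanh for both ψ and ξ are illustrative; the output weights use the ridge-regression closed form):

```python
import numpy as np

rng = np.random.default_rng(0)

def cebls_fit(X, Y, n_feat=30, n_enh=50, n_groups=2, lam=1e-2):
    """Toy BLS with cascaded enhancement nodes (CEBLS-style).

    Feature nodes Z = tanh(X We + be); each enhancement group feeds the
    next; output weights solved in closed form by ridge regression.
    """
    d = X.shape[1]
    We, be = rng.standard_normal((d, n_feat)), rng.standard_normal(n_feat)
    Z = np.tanh(X @ We + be)                       # feature mapping layer
    Ws, bs, Hs, Hprev = [], [], [], Z
    for _ in range(n_groups):
        Wh = rng.standard_normal((Hprev.shape[1], n_enh))
        bh = rng.standard_normal(n_enh)
        Hprev = np.tanh(Hprev @ Wh + bh)           # cascaded enhancement group
        Ws.append(Wh); bs.append(bh); Hs.append(Hprev)
    A = np.hstack([Z] + Hs)                        # A = [Z | H_1 | ... | H_m]
    # ridge solution: W = (lam*I + A^T A)^{-1} A^T Y
    W = np.linalg.solve(lam * np.eye(A.shape[1]) + A.T @ A, A.T @ Y)
    return We, be, Ws, bs, W

def cebls_predict(params, X):
    We, be, Ws, bs, W = params
    Z = np.tanh(X @ We + be)
    Hs, Hprev = [], Z
    for Wh, bh in zip(Ws, bs):
        Hprev = np.tanh(Hprev @ Wh + bh)
        Hs.append(Hprev)
    return np.hstack([Z] + Hs) @ W

# toy 3-class problem with one-hot targets
X = rng.standard_normal((150, 10))
labels = (X[:, 0] > 0).astype(int) + (X[:, 1] > 0).astype(int)
params = cebls_fit(X, np.eye(3)[labels])
pred = cebls_predict(params, X)
print(pred.shape)  # (150, 3)
```

The random hidden weights stay fixed; only the top-layer W is learned, which is what makes the closed-form solution possible.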

D. METHOD PROPOSED
We design a multi-layer convolution module to serve as the feature extractor for the raw image. Meanwhile, in the enhancement layer of the BLS model, we draw on the successful experience of the CEBLS [38] structure and adopt the cascade network structure for our BLS enhancement layer. In short, the proposed method can be regarded as two main parts: the convolutional feature extractor and CEBLS. Fig. 5 illustrates the overall network structure of CF-CEBLS, and the details are explained in the following parts.
First, for the feature extractor module, a convolution-activation-pooling (CAP) block is designed as the basic unit. With the stacking of multiple CAP blocks, the feature extractor module is built up; each block is defined as

CAP(X) = P(θ(C_t(· · · θ(C_1(X)) · · ·))),

where t is a natural number denoting the number of convolution-activation layers in each block, C_1, . . . , C_t are convolution layers, P is a pooling layer, and θ is an activation function; the block structure is illustrated in Fig. 4. Second, the outputs of the last feature extractor block are fed into the first feature mapping node of CEBLS. The specific form is as in Eq. (5), where X is now the output of the feature extraction module, and the data then flow through each enhancement node in turn: the first enhancement layer receives the concatenated feature mapping outputs, and every subsequent enhancement layer receives the output of the previous one. In addition, based on the convolution structure mentioned above, we also combine the direct-solution BLS of Eq. (4) with the proposed convolution feature extractor for AD diagnosis in this paper.
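A single-channel CAP block can be sketched as follows. This is a didactic NumPy illustration only (the helper names, the naive valid convolution, and the two-layer setting are our own; the paper's blocks use 5 Convolution-ReLU layers with many kernels per layer):

```python
import numpy as np

def conv2d(x, k):
    """Naive 'valid' 2-D convolution, single channel, for illustration only."""
    kh, kw = k.shape
    out = np.zeros((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def max_pool(x, s=2):
    """Non-overlapping s x s max pooling (trailing rows/cols dropped)."""
    H, W = x.shape[0] // s * s, x.shape[1] // s * s
    return x[:H, :W].reshape(H // s, s, W // s, s).max(axis=(1, 3))

def cap_block(x, kernels):
    """Convolution-Activation-Pooling block: t conv+ReLU layers, then pooling."""
    for k in kernels:
        x = np.maximum(conv2d(x, k), 0.0)   # convolution followed by ReLU
    return max_pool(x)

rng = np.random.default_rng(1)
img = rng.standard_normal((32, 32))
out = cap_block(img, [rng.standard_normal((3, 3)) for _ in range(2)])
print(out.shape)  # (14, 14): 32 -> 30 -> 28 after two 3x3 convs, then /2 pooling
```

Stacking several such blocks, each halving the spatial size, yields the compact feature map that is handed to the CEBLS head.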
In a word, the algorithms proposed in this paper can be divided into two types according to how the top weights are solved. Model 1 integrates the convolution module and a CEBLS module as a whole and optimizes the weights of both modules simultaneously by gradient descent. Model 2 is based on model 1: once model 1 is trained, its convolution module is kept as the feature extractor, and a BLS module solved by the pseudo-inverse is stacked on top. Model 2 is only applied to the independent dataset.

1) MODEL SETTING
In our model, a basic (multiple-convolutions)-pooling block is composed of 5 Convolution-ReLU layers and a pooling layer. The number of Convolution-ReLU layers is inspired by VGGNet and by our trial experience in the experiments. The convolution kernel number within each CAP block is consistent: 32, 64, 128, 256, and 512, respectively, as shown in Figure 4. According to the theory of BLS, connections between the outputs of the feature extraction modules and Y are optional; such variants are not proposed or discussed here, and these connections are drawn with dotted lines in Figure 4. In the figure, Z_1, Z_2, . . . , Z_n are the feature mapping layers and H_1, H_2, . . . , H_m are the feature enhancement layers. The difference between Z and H is the type of inner connection: the Z layers share the same input and have no connections among themselves, whereas each enhancement layer H is connected to the previous one, except the first, which is connected to the concatenation of all the feature mapping layers. To reduce parameter redundancy and the risk of overfitting, we apply one feature mapping layer and two enhancement layers in the experiments; detailed information on the structure is shown in Table 3. Unless otherwise specified, all convolution kernel sizes in this study are 3 × 3. For the top-level network structure, the structural parameters are found by grid search, and the resulting optimal structure is listed in Table 3. For the BLS model in Model 2, similar searches through the search space are implemented, and the resulting network structure parameters are shown in Tables 5 and 7; the total numbers of feature mapping and enhancement nodes are 20 × 19 + 2000 and 5 × 24 + 900 in Tables 5 and 7, respectively.

2) LOSS FUNCTION
To solve the weights from the other layers to the output layer, that is, to solve the objective function of Eq. (1), an analytic solution can be obtained by the pseudo-inverse in BLS. An alternative way to train the output weights is gradient descent. Therefore, different from the direct solution method of Eq. (2), the cross-entropy loss function is used as the objective function of our model. Cross-entropy loss measures the performance of a classification model whose outputs are probability values between 0 and 1; it increases as the predicted probability diverges from the actual label. A typical categorical cross-entropy loss can be formulated as

L = − Σ_{c=1}^{M} y_{o,c} log(p_{o,c}),

where M is the number of classes and y_{o,c} is a binary indicator (0 or 1) that equals 1 when class label c is correct for observation o; p_{o,c} is the predicted probability that observation o is of class c.
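The loss above, averaged over observations, can be sketched directly (a minimal NumPy helper; the function name and the clipping constant are our own):

```python
import numpy as np

def categorical_cross_entropy(y_true, p_pred, eps=1e-12):
    """L = -sum_c y_{o,c} * log(p_{o,c}), averaged over observations o."""
    p = np.clip(p_pred, eps, 1.0)          # avoid log(0)
    return -np.mean(np.sum(y_true * np.log(p), axis=1))

y = np.eye(3)[[0, 2]]                      # two one-hot labels: class 0 and class 2
p = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.1, 0.8]])
loss = categorical_cross_entropy(y, p)
print(round(loss, 4))  # 0.2899  (= -(ln 0.7 + ln 0.8) / 2)
```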

III. EXPERIMENTS AND DISCUSSION
In this section, experimental results are presented to verify the proposed methods. Firstly, we introduce the comparison methods and detailed experimental settings. Then, the experimental results for both the AD state prediction task (AD vs. MCI vs. CN) and the AD classification task (AD vs. CN) are presented.

A. METHODS FOR COMPARISON AND EXPERIMENTAL SETTINGS
1) METHODS FOR COMPARISON
To prove the effectiveness of our model, we compare the performance of our method with existing methods, including InceptionNet [20], a typical deep model, and a set of PCA-SVM models [16]. Deep Model: InceptionNet was first proposed by Christian Szegedy et al. [21] for general computer vision problems such as image classification. Ding et al. [20] apply the third version of InceptionNet [22] to the AD diagnosis problem using FDG-PET images, achieving excellent results that outperform three clinicians. For InceptionNet [20], based on the code released by the authors, we made extra improvements by reducing the learning rate (1e-3 → 1e-5) and adjusting the dropout rate. It is worth noting that InceptionNet is pre-trained on the ImageNet dataset.
Traditional Model: Traditional machine learning methods, such as PCA and SVM, and their variants often bring unexpectedly excellent results. Salvatore et al. propose various methods based on PCA and SVM to predict AD [16]. Since the related code for the PCA-SVM models has not been released, the relevant results are taken from those claimed in their paper. The PCA-SVM models are available for multiple binary classification tasks, for instance, AD vs CN and CN vs MCI. Because the proposed models and the InceptionNet are multi-class classification models, we also divide the independent dataset into the same partitions as PCA-SVM to ensure a fair comparison.

2) EXPERIMENTAL SETTINGS
We validate our proposed methods on both AD states prediction (AD vs CN vs MCI) and AD classification (AD vs CN). For AD states prediction, since sMCI and pMCI both belong to the MCI state, the prediction task is equivalent to classifying AD vs CN vs MCI (sMCI + pMCI). sMCI means that a patient with MCI will not convert to AD for at least 36 months, whereas pMCI means the patient will convert within 36 months. On the other hand, a patient with sMCI has a much smaller risk of becoming an AD patient than one with pMCI. Therefore, we categorize sMCI as CN and pMCI as AD accordingly, so the AD classification task is to separate AD (AD + pMCI) from CN (CN + sMCI). A truncated normal initialization method is adopted in our model. For both models, the number of epochs is set to 200. The ADNI dataset is randomly divided into two parts: 80% as the training set and the remaining 20% as the testing set. Meanwhile, 5-fold cross-validation is applied to the independent dataset. Furthermore, in order to reduce over-fitting, we adopt the early-stopping technique.

B. MODEL EVALUATION
We take multiple assessment indicators to evaluate the different models. Accuracy (ACC) is the overall accuracy on the test dataset. Sensitivity (SEN) measures how well a model identifies the positive samples among all actual positives, while specificity (SPE) measures how well it identifies the negative samples among all actual negatives. Precision (PRE) refers to how many of the samples predicted as positive are truly positive. The F1-score checks the balance between precision and sensitivity; the closer it is to 1, the better the performance of the model. These indicators are computed as

ACC = (TP + TN) / (TP + TN + FP + FN),
SEN = TP / (TP + FN),
SPE = TN / (TN + FP),
PRE = TP / (TP + FP),
F1 = 2 · PRE · SEN / (PRE + SEN),

where TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives, respectively.
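As a quick sketch, these indicators can be computed directly from the confusion-matrix counts (a minimal helper; the function name is ours):

```python
def binary_metrics(tp, tn, fp, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)   # overall accuracy
    sen = tp / (tp + fn)                    # sensitivity (recall)
    spe = tn / (tn + fp)                    # specificity
    pre = tp / (tp + fp)                    # precision
    f1 = 2 * pre * sen / (pre + sen)        # harmonic mean of precision/recall
    return acc, sen, spe, pre, f1

# example: 40 TP, 45 TN, 5 FP, 10 FN
print(binary_metrics(40, 45, 5, 10))
```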

C. MODEL TRANSFER LEARNING
As mentioned in Section II-A, two datasets, the images collected from the ADNI website and the independent dataset, are used to validate the proposed models. In addition to conventional model training, we also adopt the training method commonly used in transfer learning (fine-tuning): a) fix the trained feature extractor module; b) modify the number of final output nodes; c) re-train the top CEBLS model. In detail, both the proposed model and InceptionNet are first trained on the ADNI dataset for the classification of AD, CN, and MCI. After that, their feature extractor blocks are fixed while the remaining networks are retrained on the independent dataset for the classification task of AD and CN. Both models are also trained individually on the independent dataset to verify the cross-domain and cross-task abilities of our proposed model. Note that the InceptionNet model has been pre-trained on the ImageNet dataset before being applied to the AD diagnosis task.

2) INDEPENDENT DATASET
In order to verify the performance of the models on different datasets, we apply the dataset of [16], the independent dataset mentioned in the previous section. To match the category settings of the independent dataset with those of the ADNI dataset, the sMCI and pMCI classes are merged into MCI. To accomplish the above tasks on the independent set, we propose an additional model, model 2, combining our convolution feature extraction module with the BLS model with analytic solutions. Table 5 shows the results of the InceptionNet model, our model 1, and model 2 on the independent dataset. Both our models and the InceptionNet model are fine-tuned without structural changes on the independent dataset. In addition, the results of model 2 are obtained with 5-fold cross-validation on the convolution features of the independent dataset. It can be seen that our models have an obvious advantage over InceptionNet. Moreover, our model 1 needs far fewer parameters to be trained, and our model 2 achieves relatively excellent results even with fixed convolution features.

E. RESULTS OF AD CLASSIFICATION: AD VS CN
1) ADNI DATASET
The results of the AD vs CN task on the ADNI dataset can be obtained from the AD vs CN vs MCI task, as the former is a simplification of the latter. The relevant experimental results are extracted and presented in Table 6. Clearly, our model 1 performs better than InceptionNet on various indicators.

2) INDEPENDENT DATASET
For the purpose of evaluating the models, we first trained both our model 1 and the InceptionNet model from scratch on the independent dataset. The results are shown in Table 7 with the corner mark *. Further, we fine-tuned the models obtained from the ADNI dataset, and the results are marked with the corner mark # in Table 7. In this subsection, we still perform 5-fold cross-validation on the trained models.
As shown in Table 7, our model 1 with the corner mark # performs better under various evaluation indicators; for example, its ACC is 93.51%, which is 1.51% higher than the best PCA-SVM model, and the number of parameters it needs is relatively minimal. Comparing the algorithms denoted with the corner mark *, we found that both our model 1 and InceptionNet show obvious overfitting due to the limited data size. However, when the experiments are conducted with fine-tuning, the performance of our model improves, and the results are significantly better than those of InceptionNet. Therefore, we conclude that the top-level CEBLS structure of the proposed model provides a more effective feature expression. For the same reason, although the experimental results of our model 2 are not optimal compared with the other models, its structure is more flexible and the parameters and training time it requires are smaller. If some feature optimization methods were added, we believe its performance would improve to some extent.
In addition, we provide the training time and the number of parameters for a comprehensive description of model performance. Our model 1 achieves the best accuracy with only 0.24M parameters, about one-tenth of the fine-tuned InceptionNet's parameters, although its training time is longer. The success of transferring our model 1 to the independent dataset proves the efficiency of our feature extractor module.

1) CLASS ACTIVATION MAPPING
Class activation mapping [42]-[44] is a typical method for visualizing the convolution features of a raw image. It shows where in the image the model focuses its attention on the underlying features. Similar to other neural networks, such as [11], CNNs have excellent data-expression capabilities. In this paper, to demonstrate the correspondence between the class activation mappings and the original image, the class activation mappings of the first and second feature extraction blocks are shown in Fig. 6 and Fig. 7. By comparing the class activation maps of samples from different categories, it can be seen that the underlying convolution feature extraction blocks of the model exert different feature attention on different categories. In Fig. 6, although the model attends to the global information of samples from all classes, the weights change across categories; specifically, the activation weight for samples belonging to AD is significantly higher than that for normal samples. As the number of convolutional layers increases, the model pays more attention to the key classification areas and further extracts stronger features conducive to classification. In Fig. 7, by comparing AD samples and normal samples, we can conclude that the differentiation of activated regions occurs in local parts of the images; the region with the most obvious differences is outlined by the dotted line in the figure, while the solid line marks a local specific region.
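The core of class activation mapping is a class-weighted sum of the last convolutional feature maps. A minimal sketch (shapes and names are illustrative, not the paper's implementation):

```python
import numpy as np

def class_activation_map(feature_maps, class_weights):
    """CAM = sum_k w_k * A_k: weighted sum of conv feature maps for one class.

    feature_maps: (K, H, W) activations of the chosen block;
    class_weights: (K,) output-layer weights of the target class.
    """
    cam = np.tensordot(class_weights, feature_maps, axes=1)  # -> (H, W)
    cam = np.maximum(cam, 0)                                 # keep positive evidence
    return cam / cam.max() if cam.max() > 0 else cam         # normalize to [0, 1]

rng = np.random.default_rng(2)
fmaps = rng.random((8, 16, 16))        # 8 toy feature maps
w = rng.random(8)                      # toy class weights
cam = class_activation_map(fmaps, w)
print(cam.shape)  # (16, 16)
```

The normalized map is then upsampled to the input resolution and overlaid on the MRI montage as a heatmap, which is how figures like Fig. 6 and Fig. 7 are typically produced.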

2) RESULT ANALYSIS
A popular way to evaluate the performance of various models is statistical analysis [45]. However, in this paper we can intuitively find obvious differences between the models through simple comparison. By comparing the experimental results of our models and the InceptionNet model, we see that on the ADNI dataset our model is superior to InceptionNet by relatively small margins, while on the diagnosis task of the independent dataset our model outperforms InceptionNet obviously.
According to our analysis, there may be two reasons for the relatively poor performance of InceptionNet on the independent dataset. Firstly, the independent dataset is much smaller than the ADNI dataset, which leads to overfitting of the models. Secondly, the images in the independent set are not completely consistent with the samples in the ADNI dataset; therefore, whether trained from scratch or by transfer learning, the InceptionNet model remains relatively ineffective. In contrast, the differences in convolution modules and top-level layers between our models and InceptionNet lead to different results, and the results have proven that the models we propose handle cross-domain tasks better than InceptionNet.
In addition, for the independent dataset, if only PCA or Partial Least Squares (PLS) methods are used for dimensionality reduction and feature extraction, with an SVM for the final classification, the results are obviously lower than those of the other models. The main reason is that although PCA reduces the dimension of the data, part of the useful information is lost in the process. Unlike these methods, our models apply convolution modules as the feature extractor instead of PCA, so that stronger features associated with the various classes can be extracted.

IV. CONCLUSION
Once diagnosed, AD is hard to cure and afflicts patients for the rest of their lives, and the number of people suffering from AD continues to rise. The early diagnosis of AD is of great significance in delaying the deterioration of the disease. In this paper, novel models combined with the BLS method are presented for the diagnosis of AD, CN, and MCI based on MRI images. The proposed models are tested on the selected ADNI dataset and the independent dataset, and the results are compared with InceptionNet and a series of PCA-SVM methods. Overall, the experimental results show that our model is superior to the other models. Moreover, the convolutional part of our model can serve as a strong feature extraction module when cross-domain tasks are needed.
As for future work, we first aim to optimize the output of each feature extraction block to achieve a balance between network performance and parameter count. Second, we would like to optimize the convolution module and improve our algorithm to achieve a more realistic end-to-end AD diagnosis method.