ADD-Net: An Effective Deep Learning Model for Early Detection of Alzheimer Disease in MRI Scans

Alzheimer’s Disease (AD) is a neurological brain disorder marked by dementia and neurological dysfunction that affects memory, behavioral patterns, and reasoning. It is an incurable disease that primarily affects people over 40 and is diagnosed through manual evaluation of a patient’s MRI scans and neuro-psychological examinations. Deep Learning (DL), a branch of Artificial Intelligence (AI), has pioneered new approaches to automating medical image diagnosis. This study aims to create a reliable and efficient system for classifying AD from MRI by applying a deep Convolutional Neural Network (CNN). In this paper, we propose a new CNN architecture for detecting AD with relatively few parameters, making the proposed solution well suited to training on a smaller dataset. The proposed model successfully distinguishes the early stages of Alzheimer’s disease and renders class activation maps as heat maps over the brain. The proposed Alzheimer’s Disease Detection Network (ADD-Net) is built from scratch to precisely classify the stages of AD while reducing parameter counts and computational cost. Because the Kaggle MRI image dataset has a significant class imbalance problem, we exploited a synthetic oversampling technique to distribute the images evenly among the classes. The proposed ADD-Net is extensively evaluated against DenseNet169, VGG19, and InceptionResNet V2 using accuracy, precision, recall, F1-score, Area Under the Curve (AUC), and loss. ADD-Net achieved an accuracy of 98.63%, an AUC of 99.76%, an F1-score of 98.61%, a precision of 98.63%, a recall of 98.58%, and a loss of 0.0549. The simulation results show that the proposed ADD-Net outperforms the other state-of-the-art models on all evaluation metrics.

The proposed CNN model uses a series of convolutional blocks consisting of different deep layers to accomplish outstanding classification results. The proposed ADD-Net aims to obtain an accurate classification result for detecting AD in its earlier stages. The main contributions of this research study are:
• We propose a new convolutional neural network architecture for detecting AD with relatively few parameters, and the proposed solution is well suited to training on a smaller dataset.

• The accuracy of previous methods [22], [23], [25] was compromised on the Alzheimer's dataset due to an imbalanced number of classes. To handle the imbalance problem of the Alzheimer's dataset, we exploited the SMOTETOMEK oversampling algorithm, which interpolates new images to balance the class samples.
• In our proposed model, we used Grad-CAM to highlight the affected part of the brain for different stages of Alzheimer's disease; the intensities of the generated heat map indicate each stage's severity.

• The proposed model is extensively compared with several other approaches using various evaluation parameters: accuracy, AUC, precision, recall, F1-score, and the number of trainable parameters. Our approach is observed to outperform other state-of-the-art models.

The rest of the paper is arranged in the following way: the related studies are briefed in Section II. The methodology and the proposed ADD-Net model for AD classification are presented, together with the dataset description and model components, in Section III. The visualization process and the evaluation of ADD-Net against state-of-the-art models are presented in Section IV. ADD-Net's limitations and the conclusion with future goals are described in Section V and Section VI, respectively.

Precise classification of medical images is a strenuous task because of the complicated procedure of obtaining medical datasets [25]. Unlike other datasets, medical datasets are prepared by expert specialists and contain sensitive, private information about patients that cannot be publicly disclosed. That is why organizations and institutions providing medical datasets, such as the Alzheimer's Disease Neuroimaging Initiative (ADNI) [26] and the Open Access Series of Imaging Studies (OASIS) [27], have a screening process for accessing their datasets, which requires the researcher to fill out an application and agree to terms restricting use to research purposes only [28], [29], [30], [31]. Medical datasets are inherently highly imbalanced because it is practically impossible to compile a dataset with an equal number of healthy and ailing samples, and the techniques to tackle this problem are quite challenging themselves [32], [33], [34], [35]. The OASIS dataset containing 416 3D samples is used by Islam

These two models were selected because VGG19 can train on many classes with remarkable accuracy, while DenseNet169 can handle vanishing-gradient issues and reduce the number of training parameters. The dataset from Kaggle was fed to both models via an Image Data Generator (IDG) with different augmentation parameters. Through this augmentation, the pre-trained VGG19 and DenseNet169 models achieved accuracies of 88% and 87%, respectively. Battineni et al. [33] employed the OASIS-3 dataset and created a five-layer CNN model to classify three different early stages of Alzheimer's disease [45].
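An augmentation stage of the kind an Image Data Generator performs can be sketched with a couple of simple transforms in NumPy. The specific transforms here (a random horizontal flip and a small width shift) are illustrative assumptions, not the exact augmentation parameters used in the study:

```python
import numpy as np

def augment(image, rng):
    """Apply simple random augmentations of the kind an image-data-generator
    pipeline performs (illustrative sketch, not the paper's exact settings)."""
    if rng.random() < 0.5:           # random horizontal flip
        image = image[:, ::-1]
    shift = rng.integers(-2, 3)      # small random width shift
    image = np.roll(image, shift, axis=1)
    return image

rng = np.random.default_rng(0)
img = np.arange(16, dtype=float).reshape(4, 4)
out = augment(img, rng)
print(out.shape)  # shape is preserved: (4, 4)
```

Both transforms only permute pixels, so the augmented image keeps the original value range, which matters when normalization happens before augmentation.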

Not all the features extracted by a deep model are helpful in accurately predicting the correct class of a sample, and some hinder a model from reaching the desired results [46], [47]. El-Aal et al. [29] tackled this issue of deep models and presented a novel approach to selecting specific features from the feature map of deep models, which ultimately improves the classification results and reduces the training time of the model.

A few researchers have used data augmentation techniques to improve their results. In contrast, none of the reviewed research papers on the classification of Alzheimer's disease has recognized the dataset imbalance issue. Some researchers failed to obtain notable results because they did not train their models enough. It is observed that research papers focus on discovering new approaches to classification for biomedical diagnosis. In this proposed model, the input dataset is pre-processed using normalization, and the categorical data variables are converted with a one-hot encoder before being provided to the ADD-Net. Then, the Synthetic Minority Oversampling Technique (SMOTETOMEK) algorithm is utilized to solve the imbalanced-dataset issue by over-sampling the classes to balance the dataset. Afterward, the dataset is split into train, test, and validation sets of 60%, 20%, and 20%, respectively. Furthermore, the features are extracted using a standard CNN to effectively train the ADD-Net, as shown in Fig. 1. The number of training parameters is smaller in comparison with [29], [31], and [33], for the robustness of the model in AD classification. The Grad-CAM heat-map algorithm is utilized to visualize the class activation map, highlighting the features that lead to the classification of an image sample.
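The pre-processing steps above (normalization, one-hot encoding, 60/20/20 split) can be sketched in a few lines of NumPy. The helper name `preprocess` and the toy array shapes are assumptions for illustration, not the paper's implementation:

```python
import numpy as np

def preprocess(images, labels, n_classes, rng):
    """Normalize pixels, one-hot encode labels, and split 60/20/20."""
    x = images.astype(np.float32) / 255.0   # min-max normalization to [0, 1]
    y = np.eye(n_classes)[labels]           # one-hot encoding of class labels
    idx = rng.permutation(len(x))           # shuffle before splitting
    n_train = int(0.6 * len(x))
    n_val = int(0.2 * len(x))
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]
    return (x[train], y[train]), (x[val], y[val]), (x[test], y[test])

rng = np.random.default_rng(0)
imgs = rng.integers(0, 256, size=(10, 8, 8))   # toy 8x8 grayscale images
labs = rng.integers(0, 4, size=10)             # four AD stages
(tr, _), (va, _), (te, _) = preprocess(imgs, labs, 4, rng)
print(len(tr), len(va), len(te))  # 6 2 2
```

In practice the oversampling step would run on the training portion only, so that validation and test sets contain no synthetic samples.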
According to the description of the dataset, each sample in the dataset available on Kaggle was personally verified by the uploader. The dataset size is also reasonable, and the images are already cleaned up, i.e., resized and organized.

Based on these factors, this dataset is used in our research.

The dataset has 6400 samples in total. Its only downside is that it is imbalanced, as discussed in Table 2.

Typically, oversampling and under-sampling are the two techniques for re-sampling. However, a third type of re-sampling approach exists, which is a hybrid of both methods. For this research study, we employed the hybrid SMOTETOMEK algorithm. It combines SMOTE, an up-sampling algorithm, and TOMEK, a down-sampling method. SMOTE generates new samples relying on a class's nearest neighbors, while TOMEK is an implementation of condensed nearest neighbors. Both algorithms work in sequence: SMOTE chooses a random instance from a minority class and increases its proportion by interpolating new samples, and TOMEK then selects a random sample and discards it if its nearest neighbors belong to the minority class. In this way, SMOTETOMEK evens out the examples of each type and effectively solves the dataset imbalance problem, as depicted in Table 3. To balance out the dataset, SMOTETOMEK utilizes the nearest-neighbor technique to interpolate new imitation samples for the minority classes, as shown in Fig. 3.

The layer configuration is summarized in Table 4, and a description of the hyper-parameters that play a vital role in the practical training of the ADD-Net model is given in Table 5. Each block consists of a convolutional 2D layer, a ReLU activation, and an average-pooling 2D layer. The kernel initializer is used to choose the initial weights for the convolutional 2D layer. The ReLU activation function is used to overcome the vanishing-gradient problem and allow the network to learn and perform faster, while the convolutional 2D layer down-samples the image and reduces its spatial dimensions.
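The SMOTE interpolation step described above can be sketched as follows. This shows only the core up-sampling idea (a new point interpolated between a minority instance and one of its k nearest neighbors); the TOMEK cleaning pass is omitted, and `smote_sample` is a hypothetical helper, not the library implementation:

```python
import numpy as np

def smote_sample(minority, k, rng):
    """Create one synthetic minority sample by interpolating between a random
    minority instance and one of its k nearest minority neighbors."""
    i = rng.integers(len(minority))
    x = minority[i]
    d = np.linalg.norm(minority - x, axis=1)   # distances to other samples
    d[i] = np.inf                              # exclude the point itself
    neighbor = minority[rng.choice(np.argsort(d)[:k])]
    gap = rng.random()                         # interpolation factor in [0, 1)
    return x + gap * (neighbor - x)            # point on the connecting segment

rng = np.random.default_rng(0)
minority = rng.normal(size=(20, 4))            # toy minority-class features
new = smote_sample(minority, k=5, rng=rng)
print(new.shape)  # (4,)
```

Because the synthetic point lies on the segment between two existing minority samples, it always stays inside the minority class's feature envelope.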

There are two Dense blocks in the proposed architecture, and each ADD-Net block has a few layers. The details of each layer are discussed in the next subsection.

Activation functions are mathematical operations that decide whether the output from a perceptron is forwarded to the next layer; in short, they activate and deactivate nodes in a deep model. An activation function is used in the output layer to produce the label that is then assigned to the image processed through the model. There are several activation functions. We used ReLU in the hidden layers because of its simple and time-saving calculation. SoftMax, a probability-based activation function, is used for the output layer because our model performs multi-class classification.
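The two activation functions can be written out in a few lines of NumPy; the toy logits below are arbitrary values for illustration:

```python
import numpy as np

def relu(z):
    """ReLU: cheap to compute and non-saturating for z > 0."""
    return np.maximum(0.0, z)

def softmax(z):
    """SoftMax: turns output-layer logits into class probabilities."""
    e = np.exp(z - z.max())        # subtract max for numerical stability
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1, -1.0])   # one logit per AD stage
probs = softmax(relu(logits))
print(round(probs.sum(), 6))  # 1.0 — valid probability distribution
```

The predicted class is simply the index of the largest probability, which is why SoftMax pairs naturally with a categorical cross-entropy loss.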
where L is the calculated loss of each class, and P is the probability calculated by the SoftMax function.

ROC curves are commonly used in binary classification to investigate a classifier's output. Binarizing the output is required to extend the ROC curve and ROC area to multi-class or multi-label classification. One ROC curve can be generated for each label; however, each element of the label indicator matrix can also be treated as a binary prediction (micro-averaging). Using this extension of the ROC curve, the proposed ADD-Net is compared with DenseNet169, InceptionResNet V2, and VGG19 on the balanced and imbalanced AD datasets, as depicted in Fig. 9. We can note that after balancing the AD dataset using the SMOTETOMEK algorithm, the AUC improves significantly for all the approaches, as shown in Fig. 10. A similar effect is noted in the AUC for all the classes of the proposed ADD-Net.

Several deep models were created to classify the early stages of AD. Some were conventional CNN models, while others were based on pre-trained deep architectures. Our proposed model is a deep CNN-based ADD-Net consisting of different ADD blocks and is very effective in classifying the different AD classes, as discussed earlier in this paper. We also created a few hybrid models using the state-of-the-art classification models InceptionResNet V2, VGG19, and DenseNet169. The first model is a hybrid framework of DenseNet169 and MobileNet V2, reaching an AUC of 98% and 99% before and after balancing the AD dataset through SMOTETOMEK, respectively, as depicted in Fig. 13.

Several deep models were developed to classify Alzheimer's disease in its early stages. Some were traditional CNNs, while others were pre-trained deep architectures. As mentioned earlier in this paper, our proposed model is a deep CNN-based ADD-Net comprising distinct ADD blocks.
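The micro-averaging scheme described above can be sketched as follows: binarize the labels one-vs-rest into a label indicator matrix, flatten it alongside the predicted probabilities, and score once. This sketch assumes the rank-sum (Mann-Whitney) formulation of AUC; `binary_auc` and `micro_auc` are hypothetical helpers, and the toy predictions are illustrative:

```python
import numpy as np

def binary_auc(y_true, scores):
    """AUC via the rank-sum (Mann-Whitney) formulation."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos = y_true.sum()
    n_neg = len(y_true) - n_pos
    return (ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def micro_auc(labels, probs):
    """Micro-average: binarize labels one-vs-rest, flatten, score once."""
    onehot = np.eye(probs.shape[1])[labels]    # label indicator matrix
    return binary_auc(onehot.ravel(), probs.ravel())

labels = np.array([0, 1, 2, 1])
probs = np.array([[0.8, 0.1, 0.1],
                  [0.2, 0.7, 0.1],
                  [0.1, 0.2, 0.7],
                  [0.3, 0.5, 0.2]])
print(micro_auc(labels, probs))  # 1.0 — every true class outranks the rest
```

Micro-averaging weights every prediction equally, so on an imbalanced set it is dominated by the majority class; per-class ROC curves remain useful alongside it.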
We compared our model with the InceptionResNet V2, VGG19, and DenseNet169 classification models, as shown in Fig. 19.

In this proposed ADD-Net model, the input dataset is pre-processed using normalization, and the categorical data variables are converted with a one-hot encoder before being provided to the model.

Then, the SMOTETOMEK algorithm is applied to resolve the imbalanced-dataset issue by over-sampling the classes, as shown in Table 6.

No solution to a real-world problem is perfect in every aspect; the ideal case would be a solution to a critical real-world problem that is well matured in its early versions and needs no upgrades. In practice, solutions are prepared after studying the base requirements necessary to fix a problem and are then gradually improved by analyzing real-time feedback about the system. In this proposed study, we present ADD-Net. Although it outperforms other models, the proposed model still has shortcomings: its efficiency suffers on the imbalanced dataset. As discussed above, due to an imbalanced dataset, the accuracy of deep learning models is compromised; our