A Deep Convolutional Neural Network-Based Approach for Alzheimer’s Disease and Mild Cognitive Impairment Classification Using Brain Images

Alzheimer’s disease (AD) is a hazardous neurological disorder that mainly affects people in their early 60s. The main symptom of AD is significant memory loss. Mild Cognitive Impairment (MCI) is a state of dementia in which a patient exhibits the early symptoms of AD. Since the brain is the most impacted region, the disorders can be classified by analysing factors from the brain tissues of different subjects. Machine Learning (ML) is a widely utilised concept that aids in the decision-making process. The Deep Convolutional Neural Network (DNN) is a type of ML technique that uses artificially connected neurons to mimic the human brain. In this work, we propose a novel DNN-based model for distinguishing AD and MCI patients from Cognitively Normal (CN) individuals. Inspired by the original VGG-19, we have created 19 deep layers in the network. During back-propagation, deeper models suffer from the problems of vanishing gradients and information loss. As a solution, we borrowed the Dense-Block notion from the original DenseNet architecture, which provides a path of information exchange amongst all the layers. Furthermore, we have implemented depth-wise convolutional procedures to make the model computationally faster. The outcome of the proposed model is compared with some prominent DNN models, and it is observed that the proposed approach performs most convincingly, with an average performance rate of 95.39%.

leaving the patient incapable of performing even the most basic tasks and suffering from extensive memory loss [6]. Mild Cognitive Impairment (MCI) is a condition in which a patient's cognitive deterioration is greater than that of a Cognitively Normal (CN) person of the same age [7], [8]. Though MCI patients have difficulties with communication, cognition, and reasoning skills, their problems are not as serious as those of AD. According to a study, most persons with MCI develop AD within a few years [8]. For this reason, it is imperative to diagnose MCI, and effective neurological therapy may help prevent the patient from developing AD.

• In the proposed network, we have replaced all the convolutional operations with depth-wise convolutional operations. The use of depth-wise convolutions resulted in a significant reduction in computational time.
• We compared the proposed model's performance with those of other widely used models such as LeNet, AlexNet, VGG-16 & 19, Inception-V3, ResNet152-V2, InceptionResNet, MobileNet-V2, EfficientNet-B7, Xception, NasNet-C, and DenseNet-121. It is observed from the performance evaluations that the proposed model performs better and faster (in terms of training/testing time required per epoch).

The rest of the paper is organised as follows: a) in Section 2, we discuss some of the recently published related state of the art; b) in Section 3, we discuss the materials and methods used in this work; c) in Section 4, we discuss the proposed architecture; d) in Section 5, we discuss and compare the results of different DNN models; e) in Section 6, we conclude the work; and f) in the last section, we provide the references.

The ANN is considered one of the most effective approaches for AD categorisation for its ability to learn from prior iterations and improve predictions in subsequent rounds [19]. This section discusses some of the recent publications on AD classification utilising ANN-based techniques.

Jae Y. Choi et al. presented a new AD classification method focused on the fusion of numerous DNNs by accumulating the generalisation loss [20]. Taking brain MRIs as inputs, the authors proposed to combine several DNN-based models together. For the combination, several perspectives of the brain images (axial, sagittal, and coronal) are combined to form assemblages for the various DNNs. A deep-ensemble-oriented generalisation loss is used to discover the most optimal weights among the neurons, which also aids in connecting and collaborating during the optimal weight search.

To determine the most discriminatory regions in the brain, [27] performs Alzheimer's disease categorisation based on tissue-wise hippocampus characteristics extracted from brain scans. A dual-ensemble Hough CNN is used to choose the optimum slices for localising the hippocampus lobes. All 3D slices are converted to 2D, and hippocampus patches are then extracted. The anthropometric detail from the 2D slices is obtained using a Discrete Volume Estimation CNN (DCNN). Convolutional layers, Rectified Linear Units, batch-normalisation layers, and hidden layers are utilised in the Hough CNN. In the DCNN as well, a set of convolutional layers, batch-normalisation layers, and a ReLU activation layer is used. The model achieved an average performance of 93%.

A new CNN-based AD classification paradigm is presented by Liu et al. [28]. As per the authors' claim, they used only a minimal number of MR scans for training and validation and yet obtained good results. For faster training, Depth-wise Separable Convolution (DwSC) is used in place of normal convolutional layers. The authors applied transfer learning from two popular DNN models, AlexNet and GoogLeNet, to achieve better and more precise categorisation. The overall performance of the model is around 92.21%.

In [29], a revised functional 3D DNN is designed for conducting two tasks simultaneously: hippocampal separation and AD diagnosis using MRI scans. The authors combined the concepts of the V-Net and DenseNet models, where the lower elements of V-Net are replaced by the bottleneck of DenseNet. Following the acquisition of the separated hippocampal areas, the separated data are transmitted to a 3D CNN for categorisation of Alzheimer's disease. Regional hippocampus characteristics as well as structural similarity from brain imaging are exploited for individual classification. Furthermore, the authors propose a new loss estimation method that aided in the production of persuasive findings. The average performance of the model is around 86.2%.

A. DATA AND TOOLS
All data utilised in this study are obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) public data sets [30]. The acquired data are volumetric T1-weighted, Magnetization Prepared Rapid Gradient Echo (MP-RAGE) MRIs. More than 2500 images are collected for three patient groups (CN, MCI, AD). The images are taken from 240 people (CN = 80, MCI = 80, AD = 80), 120 of whom are male and 120 of whom are female.

Python is a popular programming language that is often used in clinical computer vision tasks [31]. Python compares favourably with several other tools in terms of implementation because of its simple and user-friendly features [32].

To analyse the model in a more effective way, we have further divided the data into different sub-classes, namely CN1, CN2, CN3, MCI1, MCI2, MCI3, AD1, AD2, and AD3. Table 2 depicts the data organisation.

The architecture of the proposed DNN model is presented in Figure 2. Inspired by the original VGG-19 model, the proposed architecture uses 19 deep layers (convolution layers: 16, dense layers: 3). For pooling operations we have used MaxPooling layers. For information sharing, shortcut bridges are created from the output of each pooling layer to the inputs of all subsequent convolution layers in the forward direction. If we divide the whole architecture into blocks, each block of the architecture looks like Figure 3.

In the input layer, brain images of the different subject groups are used for training as well as testing the model. As discussed in Table 1, 11,700 training images and 4,500 testing and validation images are used.

The next layer in the model performs convolutional operations. Convolution layers comprise a series of feature maps that are used to retrieve crucial image features such as boundaries, curves, and so on. Kernels are small square arrays of shared parameters, and convolution is the process of sliding and overlaying these filters across the whole image. In the proposed model, normal convolution operations are computationally expensive, since the model shares information from each output layer to all subsequent layers in the forward direction. To overcome this issue, we have used Depth-wise Convolutional (DwC) operations, a popular method for reducing execution cost and enhancing representation effectiveness [38]. For the channels within the frames, DwC utilises separate kernels, and a point-wise 1 × 1 convolutional procedure is used to integrate the outcomes across the distinct channels. Equations 4 and 5 represent the total computational cost of the regular convolution process and of the depth-wise alternative, respectively. The model's subsequent pooling layer is utilised to reduce the dimensions of the feature maps. Once the loss estimate is determined, the next task is to compute the gradients for all of the essential parameters and update them using an effective approach; Equations 8 and 9 represent this process.
In Equation 9, β represents the learning rate.
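The computational saving from depth-wise convolution can be illustrated with a small cost calculation using the standard multiply-accumulate (MAC) formulas. The kernel, channel, and feature-map sizes below are illustrative only, not taken from the proposed model:

```python
# Cost of a standard convolution vs. a depth-wise separable one,
# counted in multiply-accumulate operations (MACs).
# d_k: kernel size, m: input channels, n: output channels, d_f: feature-map size.

def standard_conv_cost(d_k, m, n, d_f):
    # Every output location of every output channel sees all m input channels.
    return d_k * d_k * m * n * d_f * d_f

def depthwise_separable_cost(d_k, m, n, d_f):
    depthwise = d_k * d_k * m * d_f * d_f  # one kernel per input channel
    pointwise = m * n * d_f * d_f          # 1x1 convolution mixes the channels
    return depthwise + pointwise

# Illustrative sizes: 3x3 kernels, 64 -> 128 channels, 56x56 feature maps.
std = standard_conv_cost(3, 64, 128, 56)
dws = depthwise_separable_cost(3, 64, 128, 56)
print(std, dws, round(dws / std, 4))  # the ratio equals 1/n + 1/d_k**2
```

For these sizes, the depth-wise separable version needs roughly 12% of the multiplications of the standard convolution, matching the well-known reduction factor of 1/n + 1/d_k².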

Since the proposed model is deep in nature, it faces a number of difficulties, including vanishing gradients and information loss. To overcome these issues, we used a Dense-Block-like notion from the DenseNet design, which provides a conduit for information sharing across all layers, as can be seen in Figure 3. Equation 10 can be deduced from Figure 3.
In Equation 10, L_0, L_1, ..., L_j represent the blocks, and w_1, w_2, ..., w_j represent the weights of the blocks. The back-propagation of a block can be presented as Equation 11.
In Equation 11, ϕ is the loss function.
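The dense-block-style information sharing described here can be sketched as a small forward pass in which each layer receives the concatenation of the block input and all previous layer outputs. This is a minimal NumPy sketch with toy sizes, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def dense_block_forward(x, weights):
    # Every layer i receives [L_0, L_1, ..., L_{i-1}] concatenated,
    # mirroring the shortcut bridges of a DenseNet-style block.
    features = [x]  # L_0 is the block input
    for w in weights:  # w_1 ... w_j
        inp = np.concatenate(features)
        features.append(relu(w @ inp))
    return features[-1]

# Toy sizes: an 8-feature input and 3 layers that each emit 4 features,
# so the concatenated input widths grow to 8, 12 and 16.
weights = [rng.standard_normal((4, s)) for s in (8, 12, 16)]
x = rng.standard_normal(8)
out = dense_block_forward(x, weights)
print(out.shape)  # (4,)
```

Because every earlier feature map is still visible to later layers, the gradient during back-propagation also has a direct path to each layer, which is the property Equation 11 captures.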

After a series of DwC and pooling layers, we have dense or fully connected layers. All neurons in the dense layers are interconnected with one another. For example, let 'k' be a neuron in a dense layer, and l_1, l_2, l_3, ..., l_i the input weights from i different neurons. Equation 12 can be used to express the outcome of 'k'.

Table 3 shows the average performance based on the validation data, with further comparisons presented in Table 4. The ROC (Receiver Operating Characteristic) curves for all the classes are also obtained, as shown in Figures 4 to 12.

From Figure 4, it can be observed that, while classifying the classes CN1 vs MCI1, the average ROC score is 0.95. The mean micro & macro average ROC score is 0.905.
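The fully connected computation of Equation 12 can be sketched for a single neuron: a weighted sum of the incoming activations followed by a non-linearity. A sigmoid is assumed here purely for illustration; the model's actual activation may differ:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neuron_output(inputs, weights, bias=0.0):
    # Output of one dense-layer neuron 'k': the dot product of its
    # input weights l_1..l_i with the incoming activations, squashed
    # through the activation function.
    return sigmoid(np.dot(weights, inputs) + bias)

inputs = np.array([0.5, -1.0, 2.0])   # activations from 3 previous neurons
weights = np.array([0.2, 0.4, 0.1])   # l_1, l_2, l_3
print(round(float(neuron_output(inputs, weights)), 4))  # 0.475
```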

As presented in Figure 5, while classifying the CN2 vs MCI2 classes, the average ROC score for both classes is 0.92, and the mean micro & macro average ROC score is 0.915.

As shown in Figure 6, the proposed method achieved an average ROC score of 0.955 while classifying the CN3 vs MCI3 classes.

The ROC curve of MCI1 vs AD1 is presented in Figure 7. The ROC curve of MCI3 vs AD3 is presented in Figure 9; the average ROC score is found to be 0.95 for the proposed model.

The mean micro & macro average ROC score is 0.93.

The ROC curve of CN1 vs AD1 is presented in Figure 10. As presented in Figure 11, the average ROC score achieved by the proposed model for the CN2 vs AD2 classes is 0.975, and the mean micro & macro average ROC score is 0.975.

The ROC curve of the remaining class pair is presented in Figure 12. The performance of the proposed model is compared with the discussed state-of-the-art works, as presented in Figure 13. The proposed model's performance is also compared to that of some of the most widely used DNN models, as well as some of the most recent state-of-the-art models. Based on all of the performance comparisons, the proposed architecture appears to be the most convincing.

Although the proposed model performs admirably, some work may still be done in the future. Firstly, the model holds a significant number of features due to the many bridge connections; an appropriate feature-minimisation method can be utilised in the future to remove extraneous features from the feature maps. GradCAM/ScoreCAM visualisation can also be added to analyse the model's behaviour. Secondly, data from several sources can be accumulated in the future to enhance the performance evaluations. Furthermore, several more dementia stages, including early stages of Alzheimer's disease such as Progressive MCI and Stable MCI, can be added as classification classes.

All authors are responsible for the analysis, conceptualisation, and writing of the original manuscript. All authors have read and agreed to the published version of the manuscript.

The authors declare no potential conflict of interest.