AlzheimerNet: An Effective Deep Learning Based Proposition for Alzheimer’s Disease Stages Classification From Functional Brain Changes in Magnetic Resonance Images

Alzheimer’s disease is, owing to its progressive neurodegenerative nature, largely the underlying cause of dementia among the elderly. The disease can be divided into five stages: Subjective Memory Concern (SMC), Mild Cognitive Impairment (MCI), Early MCI (EMCI), Late MCI (LMCI), and Alzheimer’s Disease (AD). Alzheimer’s disease is conventionally diagnosed using an MRI scan of the brain. In this research, we propose a fine-tuned convolutional neural network (CNN) classifier called AlzheimerNet, which can identify all five stages of Alzheimer’s disease and the Normal Control (NC) class. MRI scans from the ADNI database are used to train and test the proposed model. To prepare the raw data for analysis, we applied the CLAHE image enhancement method. Data augmentation was used to remedy the class imbalance of the dataset, and the resulting dataset consisted of 60000 images across the 6 classes. Initially, five existing models, VGG16, MobileNetV2, AlexNet, ResNet50 and InceptionV3, were trained and tested, achieving test accuracies of 78.84%, 86.85%, 78.87%, 80.98% and 96.31% respectively. Since InceptionV3 provided the highest accuracy, this model was then modified to design AlzheimerNet, using the RMSprop optimizer and a learning rate of 0.00001, to achieve the highest test accuracy of 98.67%. The five pre-trained models and the proposed fine-tuned model were compared in terms of various performance metrics to determine whether AlzheimerNet in fact performs better in classifying and detecting the six classes. An ablation study shows the hyperparameters used in the experiment. The suggested model outperforms the traditional methods for classifying Alzheimer’s disease stages from brain MRI, as measured by a two-tailed Wilcoxon signed-rank test, with a significance of <0.05.


I. INTRODUCTION
Alzheimer's disease is a progressive neurodegenerative condition that causes dementia, damage to brain cells, and cognitive impairment over time [1], [2]. Detecting Alzheimer's is a difficult and time-consuming process, taking an average of 2.8 years in cases of late-onset and 4.4 years in cases of early-onset dementia [3]. A delay of several years leaves considerable room to improve the diagnostic method. There is considerable evidence that AD is the most prevalent kind of dementia and that it begins to show clinical signs at the age of 65 or above [4]. Advancing diagnosis is vital to speeding up the discovery of a cure, because late therapy is likely to be a key factor in failed treatment [5]. Furthermore, early and correct diagnosis has considerable potential to reduce health care expenditures, since it gives patients access to supportive care that enables them to avoid institutionalization [6]. Technological advances in medical imaging continue to make it possible to assess diagnostic medical images and to provide in-depth diagnostic information to clinicians, researchers, and academics, inspiring further study and analysis [7], [8]. Medical imaging techniques used today include fMRI, MRI, CT, EEG, and PET scans, among others. Many studies have been conducted using MRI brain scans. MRI data have the substantial benefit of offering better spatial resolution, so that the image features needed to diagnose disease are visible. AD most commonly affects the elderly. As the population ages, its prevalence is predicted to treble within the coming two decades, posing a substantial burden on society [9].
One of the most obvious characteristics of AD disease is brain shrinkage spreading from AD hallmark regions throughout the whole cortical region, as seen by an MRI scan. These apparent structural alterations occurred long before there was a significant loss in cognition, providing a potential for an early diagnosis of AD utilizing imaging techniques. To stop the aberrant deterioration of brain tissue, save healthcare costs, and enhance treatment, researchers are examining methods for detecting this disease sooner rather than later. Recent struggles in AD studies may emphasize the need for early detection and treatments [10]. Dementia diagnosis has come to rely heavily on neuroimaging techniques, and as a result, there have been a variety of newer diagnoses developed. Using machine learning, neuroimaging improves diagnostic accuracy for several kinds of dementia. Early identification and automated categorization of AD have recently evolved, resulting in large-scale multimodal neuroimaging data. MRI, PET, and genetic sequencing findings are among the several techniques used to examine AD.
The ADNI database is one of the most credible sources of brain imaging data [11]. The database contains T2-weighted MRI scans of patients at different stages of AD progression. The dataset contains six classes: AD, CN, MCI, EMCI, LMCI, and SMC. Mild cognitive impairment (MCI) is the stage that occurs between the normal ageing-related decrease in memory and thinking and the more severe dementia-related decline. Progression is slower in EMCI patients, who are those with moderate levels of cognitive and functional impairment. The rate of progression to dementia, and of problems with language or judgment, is greater in LMCI groups. Using deep learning models, the features of the disease can be identified readily for faster and earlier diagnosis. The proposed AlzheimerNet model employs numerous layers to assess functional and structural MRI data for the recognition of Alzheimer's disease. This differs from other comparable models, which often employ only a single type of network or analyze only structural or functional MRI data. Additionally, AlzheimerNet can operate on a high volume of data. It implements a multi-task learning strategy in which the network concurrently learns to identify both the progression and the diagnosis of Alzheimer's disease, whereas previous models normally predict only one or the other. The key contributions of this research are as follows:
• Exploring filters for smoothing the images.
• Exploring a high-volume dataset; this is new for Alzheimer's disease classification.
• The proposed method integrates extracted features with deep learning in a novel way for the classification of Alzheimer's disease from the ADNI dataset.
The work was carried out as follows:
• A fine-tuned AlzheimerNet model is proposed in this work.
• The model is able to classify individuals into the five stages of Alzheimer's disease (MCI, EMCI, LMCI, SMC, and AD), giving six classes overall when NC is included.
• The performance of five different CNN models, VGG16, ResNet50, MobileNetV2, AlexNet, and InceptionV3, is compared to that of AlzheimerNet to determine the proposed model's efficiency.
• Data pre-processing and augmentation are carried out to enhance the features and balance the dataset.
The following steps are taken for performance comparison and model evaluation:
• Metrics such as accuracy, recall, precision, and F1-score are computed from the confusion matrix and used to evaluate the models' efficiency.
• Grad-CAM is used to provide a visual representation of the model.
• To evaluate the proposed model, a two-tailed Wilcoxon signed-rank p-value is computed.
• To further evaluate the proposed model, the ROC curve is generated for the model and the AUC value is calculated.
The remainder of this article is organized as follows. Section II reviews previous research on AD detection using the ADNI database, together with the accuracy rates of the models employed. Section III describes the dataset collection, pre-processing, and preparation before the data are fed into the model. Section IV presents the selected transfer learning models, their parameters, and the proposed model. Section V presents the ablation study used to choose the hyperparameters for the fine-tuned model (AlzheimerNet). Section VI presents and discusses the results. Section VII summarizes the study and discusses further research. Figure 1 depicts the entire research process flow.

II. LITERATURE REVIEW
Machine learning provides a method for automatic categorization by learning complicated and nuanced patterns derived from massive volumes of data. In AD studies, various techniques are developed on a regular basis to automate diagnosis and forecast prospective medical state utilizing biomarkers. To classify AD using MRI and PET brain scans, research [12], suggests the use of a cascaded CNN model. Multiple imaging modalities may be used to automatically learn the general, multi-level, and multi-modal characteristics for categorization, which are rather resistant to changes in scale and rotation. The pre-processing of the brain scans does not need any image segmentation or rigorous registration. A total of 397 people, including 93 Alzheimer's patients, 204 people with mild cognitive impairment, and 100 normal controls (NC), were scanned as part of the ADNI database. Based on the experimentation results, the suggested strategy has a 93.26% accuracy rate. This study [13] tries to demonstrate a classifying approach that depends on RF (Random Forest) feature selection and CNN classification. The model is trained on a mixed cohort of patients with HC, AD, MCI, and cMCI. Overall, this effort had a 38.8% accuracy rate.
The authors in [14] propose three effective methods for creating graphical descriptions for AD classification using a 3D CNN. In this study, the 3D-VGGNet and 3D-ResNet methods for categorizing MRI scans from the AD and normal control (NC) cohorts are trained using ADNI brain MRI images. The area under the receiver operating characteristic curve (AUC) and classification accuracy (ACC) are used to quantify classification performance, with 3D-ResNet achieving the best accuracy of 79.4%. The authors of [15] suggest a technique for diagnosing AD that applies a CNN to a combination of sMRI and DTI data on a hippocampus ROI, using data from the ADNI database. The authors selected 214 individuals: 48 AD cases, 108 MCI cases, and 58 NC subjects. Each patient is scanned with a T1-weighted sMRI and a DTI scan. Additionally, the authors provide an approach for balancing classes of varied sizes and investigate the influence of ROI size on classification results, achieving a classification accuracy of 96.7%. The goal of [16] is to evaluate the accuracy of predicting MCI-to-AD conversion using MRI data and a deep learning method based on CNNs. First, MRI images are processed for age correction. Second, local patches are taken from these scans and combined into 2.5 dimensions. The patches from AD and NC subjects are then used to train a CNN to identify the deep learning features of MCI patients. Following that, FreeSurfer is used to extract structural brain image features to complement the CNN. Finally, all sets of features are fed into a learning-algorithm-based classifier, which is used to forecast AD conversion. This method yields a 79.9% accuracy.
The purpose of the research in [17] was to use a spectral graph-CNN with cortical thickness and shape to detect MCI and AD in 3089 T1-weighted MRI scans from the ADNI-2 cohort, and to examine its viability for predicting AD in the ADNI-1 cohort and an Asian cohort. The model outperformed other deep learning approaches such as Euclidean domain-based multilayer networks and 1D CNNs on brain structure, as well as 2D and 3D CNNs on T1-weighted MRI data from the ADNI-2 cohort. On the ADNI-1 cohort, the fine-tuned graph-CNN achieved an excellent classification accuracy of 89.4% for CN vs AD. The authors of [18] developed a three-dimensional CNN for the purpose of detecting AD. The suggested model is based on 1230 PET scans of 988 individuals (169 AD, 661 MCI, and 400 CN). The data were acquired from the ADNI database. The raw scans were skull-stripped and normalized to exclude non-cerebral structures such as the skull, scalp, and dura, which reduces computational complexity and time. For CN/AD classification, the network obtains an accuracy of 88.76%. The authors of [19] extract features using ensembles of basic CNNs and then classify them using softmax cross-entropy. They then used a patch-based method in light of the data shortage. After preprocessing, the authors concentrated on both the left and right hippocampal sections and input three view patches (TVPs) to the CNN, achieving 90.05% accuracy. The authors of [20] describe a deep learning technique for predicting the progression of AD in a specific patient using resting-state fMRI data and a 3D CNN. This approach extracts spatial information from a four-dimensional volume and eliminates the time-consuming feature extraction steps used previously. In the experiment, a very basic deep learning architecture achieves 94.58% accuracy in classifying AD. This highlights the value of geometric feature extraction and classifiers for the identification of biomarkers for neurological diseases.
The research [21] aimed to evaluate a novel deep-learning strategy for diagnosing AD and simultaneously predict Mental Status Assessment outcomes. The authors created operational 3D independent element spatial maps to implement as features in regression and classification problems by analyzing 331 individuals' resting-state fMRI images. For the classification job, a 3D CNN architecture was designed. In order to predict MMSE scores, we used linear least square, support vector, and bagging-based ensemble regressions, all of which were based on variables from the groups' independent component analyses. Techniques for maximizing features, such as the least absolute shrinkage and selection operator, as well as SVM-RFE, were employed to improve the performance of the MMSE regression. For the categorization of AD vs healthy controls, the mean balanced test accuracy was determined at 85.27%. By dealing directly with four-dimensional fMRI data, the authors of [22] describe a four-dimensional deep learning algorithm (C3d-LSTM) for AD classification that can quickly use spatial information. To extract spatial information from every region in a three-dimensional static picture sequence collected from fMRI, the proposed C3d-LSTM model integrates a number of 3D CNN models. The instant information contained in the data was then retrieved using the produced features and supplied into an LSTM technique. The findings demonstrate the suitability of the suggested C3d-LSTM model for processing four-dimensional fMRI data and precisely extracting its spatiotemporal aspects for AD diagnosis. The primary goal of [23] is to apply a streamlined approach for feature extraction and classification of AD to fMRI data using a modified 3D CNN. The fMRI data were motion-corrected, normalized, and coregistered, the global mean and temporal signal drift was decreased, and intensity-based thresholding and masking were conducted before being input to the CNN. 
The trained network has a 93% success rate in properly labeling regions of 4D BOLD fMRI data as having either AD, cerebrovascular disease, encephalomyelitis, or traumatic brain injury.
In [24], machine learning techniques such as KNN, SVM, DT, LDA, and RF are used to classify the data. Additionally, a novel CNN architecture is proposed for classifying the severity of AD. The objective of this research is to examine the link between Alzheimer's patients' functional MRI scans and their MMSE scores. The robust multitask feature learning technique is used to extract the features. Additionally, the Mini-Mental State Examination score is used to determine the severity, which is categorized as low, mild, moderate, or severe. The proposed CNN approach yields a 96.7% accuracy. The study in [25] proposes and evaluates a number of deep models and architectures, including two- and three-dimensional CNNs and RNNs. To apply a 2D CNN to 3D MRI volumes, every MRI scan is split into 2D slices, without regard for the relationships among the 2D slices inside the MRI volume. Alternatively, a CNN model can be combined with a recurrent neural network in such a way that the 2D CNN + RNN model can learn the connections across sequences of 2D slices acquired during an MRI. The 3D voxel-based technique combined with transfer learning achieved a classification accuracy of 96.88%. The suggested approach in [26] creates multilayer perceptron representations of individual AD risk and high-resolution disease probability maps from regional brain activity. To prevent imbalance between classes, samples should be distributed uniformly. The five subtypes of the ADNI dataset are called AD, MCI, EMCI, LMCI, and NC. The 1296 images in the ADNI dataset were resized to 176 × 176 to fit the DEMNET model. The accuracy of the DEMNET model is 84.83%. The authors of [27] trained a CNN to identify AD using N = 663 T1-weighted MRI scans of persons with dementia and amnestic MCI. The models were subsequently validated using cross-validation and three independent datasets totalling N = 1655 cases. They examined the relationship between relevance ratings and hippocampal volume in order to demonstrate the approach's clinical value. To help with model understanding, the authors created an interactive depiction of 3D CNN relevance maps that enables intuitive model evaluation. Classification of AD vs. CN was performed with a 94.9% accuracy.
The model proposed in [28], which uses AlexNet and MRI datasets, achieves an accuracy of up to 95% and a loss of 0.1643% in distinguishing non-demented, very mild, and moderately severe dementia states. The AlexNet architecture is trained on MRI images with the Adam optimizer, a binary cross-entropy loss, and a variety of learning rates, including 0.0001, 0.001, 0.01, and 0.1. In [29], Angkoso et al. suggested a Multiplane CNN method on 1500 MRI scans from the ADNI dataset to classify AD, MCI and NC. The method uses the brain extraction tool (BET2) to remove the non-brain area from the MRI image. The proposed architecture is based on a sequential CNN to identify the spatial structure of the data. The experiment achieved 93% overall accuracy in the classification of the three classes. A 2D CNN is proposed in [30] to classify AD and MCI using 3312 MRI images. The study uses BET2 for skull stripping when pre-processing the images. The proposed model is based on LeNet-5, modified with the Leaky ReLU activation function and the sigmoid function. Additionally, batch normalization was used to stabilize the learning process. The fine-tuned model achieved its highest accuracy of 84% in the classification of AD. In [31], the authors proposed a fine-tuned ResNet18 model to classify MCI, AD, and CN from MRI and PET data. The fine-tuned model used transfer learning and a weighted loss function to balance the weight for each class. Additionally, the Mish activation function was employed to increase classification accuracy. Overall, the proposed model classifies with an accuracy of 88.3%. The authors of [32] propose a 3D CNN with dilated convolutions at the subject level. The system is designed using ADNI data from individuals with stable AD-type dementia (sDAT) and stable normal controls (sNC). The sDAT and sNC groups were distinguished from one another with an accuracy of 88% after a 5-fold cross-validation.
In [33], the authors used MRI to address AD multiclass classification while experimenting with randomized weights of ResNet18 and DenseNet201. The image's discriminating region was identified using a gradient class activation map for the suggested model's prediction. The experiments showed that the modified model could perform multiclass classification with an accuracy of 98.89%, suggesting that Alzheimer's disease may be classified using cutting-edge deep learning methods. In [34], the authors combined ResNet18 and DenseNet121 to create a ResD hybrid technique that was used to classify Alzheimer's disease into five categories. The study uses MRI data from the ADNI database. In [35], the authors developed and tested CNN models for feature extraction from MRI image data and classification of AD. The validated model is then assessed by examining the features that the CNN (ResNet) extracted from the fully connected layer. SVM, RF, and Softmax are the three ML classifiers applied to each set of features to analyze the outcomes. The ADNI and MIRIAD public datasets are used in this work. The classification accuracy produced by the combined models ResNet50 + SVM and ResNet50 + RF is 92% and 85.7%, respectively.
The study in [36] makes use of the pre-trained models SqueezeNet, ResNet18, AlexNet, VGG16, DenseNet, and InceptionV3 to classify AD into four categories. MRI images from the OASIS database were used in the authors' experiments. After evaluating the implemented models, the authors observed that the pre-trained SqueezeNet model achieved the highest validation accuracy, 82.53%, for multiclass classification of AD. For AD classification, the authors in [37] suggested an improved VGG-16 architecture employing the arithmetic optimization algorithm (AOA). During pre-processing, the T1-weighted MRI image file is processed using the CAT12 toolbox, and linear contrast stretching raises the image contrast level. Similarly, image-enhancing techniques correct the uneven light distribution. After data preprocessing, an optimized VGG-16 employing AOA categorizes the four AD classes with 97.89% accuracy. The study in [38] is an overview of various pre-trained CNN models applied to the ADNI dataset. The authors use MRI brain images to determine the NC, MCI and AD classes. To determine the three classes, CNN models such as GoogLeNet, ResNet50, AlexNet, ResNet18, DenseNet and SqueezeNet were implemented. The highest classification accuracy is obtained by GoogLeNet with 96.81%, followed by AlexNet and ResNet50 with accuracies of 96.59% and 96.13% respectively. The neural networks used in [39] were applied to datasets of MRI images. Five different types of subjects with Alzheimer's disease are used in the study. For the classification task, the authors employed MobileNet for transfer learning. For multi-class AD stage classification, the pre-trained MobileNet model was improved and obtained 96.6% accuracy. By adopting a Learnable Weighted Pooling (LWP) method to transform 3D brain images into 2D fused images, the authors in [40] developed a unique 3D-2D methodology. Using the 3D-to-2D conversion, the suggested model can readily pass the fused 2D image through a pre-trained 2D model while attaining higher performance than various 3D and 2D baselines. ResNet34 was used for feature extraction since it surpassed other 2D CNN structures, and it reaches an accuracy of 88%.

A. ADNI DATASET
The research utilized the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset [11]. Michael Weiner, MD, founded ADNI in 2003 as a public-private collaboration with the sole focus of testing whether MRI and PET scans, biological markers, biochemical tests, and neurophysiological evaluations may be used to monitor the development of AD at every stage. The database provided 23508 T2-weighted MRI scans from 2456 different subjects. T2w scans were obtained in 3D using an FSE sequence with 3.0 mm slice thickness, 56 axial slices spanning the whole brain, a 256 × 256 scan matrix, a voxel size of 0.8594 × 0.8594 × 5 mm, TR = 3000 ms, and TE = 95.2 ms. The imaging scans are divided into six categories, namely NC, MCI, EMCI, LMCI, SMC, and AD. A sample of the dataset is presented in Figure 2. Each scan consists of multiple slices that show the brain's functional properties. Table 1 shows the demographics of the dataset utilized in the study. In this study, we included all subjects and their scan images; however, we excluded the variables ''Sex'' and ''Age'' as they do not contribute to the image classification task.

B. DATASET PREPARATION
The ADNI MRI data is collected in DICOM (Digital Imaging and Communications in Medicine) format. Since the CNN model employed for detection requires image data in JPEG format, the 3D DICOM image is converted to 2D JPEG images. The data downloaded from the ADNI website is converted to JPEG using a DICOM converter Windows application. The converted images are viewable with any image viewer application and are ready to be used as input to CNN models. In order to improve image quality, image enhancement and data augmentation approaches are used. This section describes the image processing techniques in detail.

1) IMAGE ENHANCEMENT
Image enhancement is a process that emphasizes certain parts of an image while reducing or eliminating unnecessary elements, for instance by reducing noise, revealing blurred features, and adjusting the intensity of an image to draw attention to specific regions. Images are enhanced by increasing their contrast and sharpness so that they can be processed or analyzed further. Applied before further processing, image enhancement boosts the quality, brightness, and apparent information content of the raw data. Enhancement does not increase the data's fundamental information value; rather, it increases the contrast of the selected features, making them more clearly recognizable [41].

a: APPLYING CONTRAST LIMITED ADAPTIVE HISTOGRAM EQUALIZATION (CLAHE)
CLAHE is a more sophisticated form of Adaptive Histogram Equalization (AHE) that outperforms AHE in improving the quality of complicated structures, as demonstrated by the authors in [42]. Additionally, it boosts local contrast to improve the utility of medical imaging [43] and works on tiny areas of an image to eliminate the false borders by combining the adjacent tiles utilizing bilinear interpolation [44], [45].
Suppose an image has a size of $N \times N$ and each tile in that image has a size of $n \times n$. Then the total number of tiles is computed using Equation (1). The required histograms of these tiles are generated by applying the clip limit, $C_L$, as given in Equation (2).
Here, $N_{CL}$ refers to the normalized form of the contrast limit, and $N_{AVG}$ indicates the average count of pixels in the image. Equation (3) is used to determine the value of $N_{AVG}$.
In Equation (3), $N_g$, $N_x$ and $N_y$ refer to the number of gray levels and the number of pixels in the x and y dimensions of the tile, respectively. Equation (4) calculates the average count of clipped pixels, $N_{cp}$.
Here, $C_L$ indicates clipped pixels and $N_{C_L}$ refers to the total number of $C_L$. The $N_{cp}$ pixels are distributed among the corresponding gray levels. However, some pixels remain to be distributed. To redistribute the remaining pixels, Equation (5) is used,
where the redistribution step, $R$, is computed by dividing $N_g$ by the number of pixels to be redistributed, $N_r$. Bilinear interpolation then merges neighboring tiles to remove artificial boundaries between them.
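For reference, the quantities defined above match the usual CLAHE formulation, which can be written as follows. Equations (1), (3), and (5) follow directly from the definitions in the text; the multiplicative form of Equation (2) and the quotient in Equation (4) are assumptions taken from the standard CLAHE derivation, not forms confirmed by this section.

```latex
\begin{align}
T       &= \frac{N \times N}{n \times n}   \tag{1} \\
C_L     &= N_{CL} \times N_{AVG}           \tag{2} \\
N_{AVG} &= \frac{N_x \times N_y}{N_g}      \tag{3} \\
N_{cp}  &= \frac{N_{C_L}}{N_g}             \tag{4} \\
R       &= \frac{N_g}{N_r}                 \tag{5}
\end{align}
```

Here $T$ is a symbol introduced only for illustration to denote the total number of tiles.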
The CLAHE technique overcomes the constraints of global methods by enhancing local contrast. This technique, however, depends on two critical hyperparameters: (1) the number of tiles and (2) the clip limit. An incorrect hyperparameter selection may significantly reduce image quality. Iteratively testing different values for these parameters led to the best setting found, which in this case is a tileGridSize of (12, 12) and a clip limit of 3. Figure 3 illustrates the findings.
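The clipping and redistribution steps described in this subsection can be sketched for a single tile as follows. This is a minimal pure-Python illustration, not the pipeline used in the study: the function names, the toy 8-level histogram, and the reading of the clip limit as a multiple of the average bin count are all assumptions made for the example.

```python
def tile_count(image_size, tile_size):
    """Number of n x n tiles in an N x N image."""
    return (image_size * image_size) // (tile_size * tile_size)


def compute_clip_limit(normalized_limit, n_x, n_y, n_gray):
    """Clip limit taken as the normalized limit times the average pixel
    count per gray level (assumed multiplicative form)."""
    average_count = (n_x * n_y) / n_gray
    return int(normalized_limit * average_count)


def clip_and_redistribute(hist, clip_limit):
    """Clip a tile histogram at clip_limit and spread the excess pixels.

    The excess is first shared equally among all gray levels; any remainder
    is handed out one pixel at a time with a step of n_gray // remainder.
    The total pixel count is preserved. Bins may end up slightly above the
    limit, which this simplified sketch does not correct.
    """
    n_gray = len(hist)
    clipped = [min(count, clip_limit) for count in hist]
    excess = sum(hist) - sum(clipped)

    share, remainder = divmod(excess, n_gray)
    clipped = [count + share for count in clipped]

    if remainder:
        step = max(n_gray // remainder, 1)
        for index in range(0, n_gray, step):
            if remainder == 0:
                break
            clipped[index] += 1
            remainder -= 1
    return clipped


# Toy tile: 8 x 6 pixels and 8 gray levels (hypothetical values).
toy_hist = [10, 0, 2, 30, 4, 0, 1, 1]
toy_limit = compute_clip_limit(3, n_x=8, n_y=6, n_gray=8)
print(toy_limit, clip_and_redistribute(toy_hist, toy_limit))
```

A full CLAHE implementation would then equalize each tile with its clipped histogram and blend neighboring tiles by bilinear interpolation; library implementations may also scale the clip limit differently from this sketch.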

b: FILTER SELECTION
Applying a filter is the last enhancement step of this study. To ''filter'' an image means to alter its visual appearance by adjusting the pixel colors in the image. Applying filters has the effect of increasing contrast and adding a number of distinctive effects to the images. The filtering step enhances the image features and makes them better suited to the model that is being trained, validated, and tested. As shown in Figure 4, several filters (Blue Orange Icb, 16 Colors, Green Fire Blue, 6 Shades, Cyan Hot) were applied to the MRI scan images, and the filter yielding the highest accuracy was selected. Figure 5 shows the enhanced image produced by applying CLAHE followed by the Green Fire Blue filter.

C. IMAGE AUGMENTATION
A deep learning model needs a large amount of data to perform properly. Several augmentation approaches are applied in this research to augment the enhanced data. By adding more diverse samples to the training dataset, data augmentation can help machine learning techniques perform better and produce better results. An algorithm performs well and is more reliable if the dataset used to train it is sufficiently large and diverse. The use of data augmentation methods [46] increases the accuracy of the results. Additionally, data augmentation is a well-established strategy for increasing the variety of the dataset. This strategy reduces overfitting in CNN models [47], [48] by providing sufficiently diverse data for training. Flipping, rotating, zooming, mirroring, and cropping are the techniques most often used to augment data. In this study, seven augmentation techniques are applied to the pre-processed data: horizontal flip, vertical flip, rotate 45°, rotate 45° horizontal, rotate 90° left, rotate 90° right, and translate. Table 2 provides a concise overview of all augmentation settings. After augmentation, the dataset was expanded to 60000 images (NC: 10000, MCI: 10000, EMCI: 10000, LMCI: 10000, SMC: 10000, AD: 10000). The process of data augmentation is shown in Figure 6.
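Several of the augmentations listed above are exact pixel rearrangements and can be sketched without any imaging library. The example below is an illustrative sketch on a small nested-list "image"; the function names are hypothetical, "left" is assumed to mean counter-clockwise, and the 45° rotations are omitted because they require interpolation.

```python
def horizontal_flip(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def vertical_flip(image):
    """Mirror the image top to bottom."""
    return [row[:] for row in image[::-1]]


def rotate_90_right(image):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def rotate_90_left(image):
    """Rotate the image 90 degrees counter-clockwise (assumed meaning of 'left')."""
    return [list(row) for row in zip(*image)][::-1]


def translate(image, dx, dy, fill=0):
    """Shift the image by dx columns and dy rows, padding with `fill`."""
    height, width = len(image), len(image[0])
    shifted = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            new_y, new_x = y + dy, x + dx
            if 0 <= new_y < height and 0 <= new_x < width:
                shifted[new_y][new_x] = image[y][x]
    return shifted


# A 2 x 2 toy image stands in for an MRI slice.
toy_image = [[1, 2],
             [3, 4]]
augmented = {
    "horizontal_flip": horizontal_flip(toy_image),
    "vertical_flip": vertical_flip(toy_image),
    "rotate_90_right": rotate_90_right(toy_image),
    "rotate_90_left": rotate_90_left(toy_image),
    "translate": translate(toy_image, dx=1, dy=0),
}
for name, result in augmented.items():
    print(name, result)
```

In practice the same operations would be applied to full-size image arrays, with the number of augmented copies per class chosen so that every class reaches the same size.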
The final step before training is to divide the dataset. The MRI scans were separated into three groups in a 60:20:20 proportion for the training, validation, and test sets, respectively. After splitting the augmented dataset of 60000 images into three subsets, the training set contains 36000 MRI scans, the validation set contains 12000 MRI scans, and the test set contains 12000 MRI scans. All the transformed images in the training, validation, and test sets are independent of each other. Table 3 gives the dataset's final description after all preprocessing procedures.
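The 60:20:20 split can be illustrated with a short sketch. The file names, the fixed seed, and the choice to split each class separately (so every subset stays balanced) are assumptions made for this example; the section does not state how the split was implemented.

```python
import random

CLASSES = ["NC", "MCI", "EMCI", "LMCI", "SMC", "AD"]


def split_class(items, train_pct=60, val_pct=20, seed=0):
    """Shuffle one class and cut it into train/validation/test parts.

    Integer percentages avoid floating-point rounding in the cut points.
    """
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = len(items) * train_pct // 100
    n_val = len(items) * val_pct // 100
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])


def split_dataset(images_by_class, seed=0):
    """Apply the per-class split and collect (file name, label) pairs."""
    train, val, test = [], [], []
    for label in sorted(images_by_class):
        part_train, part_val, part_test = split_class(images_by_class[label], seed=seed)
        train += [(name, label) for name in part_train]
        val += [(name, label) for name in part_val]
        test += [(name, label) for name in part_test]
    return train, val, test


# Hypothetical file names: 10000 images for each of the six classes.
images_by_class = {
    label: [f"{label}_{index:05d}.jpg" for index in range(10000)]
    for label in CLASSES
}
train_set, val_set, test_set = split_dataset(images_by_class)
print(len(train_set), len(val_set), len(test_set))
```

With 10000 images per class this yields 6000, 2000, and 2000 images per class in the three subsets. Keeping augmented variants of one original slice within a single subset is one way to keep the subsets independent.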

IV. PROPOSED MODEL
Our proposed model was developed with the primary intent of automating the identification of persons with AD while enhancing prediction accuracy. In order to detect AD using MRI data, five pre-trained models, including the VGG16, MobileNetV2, AlexNet, ResNet50, and InceptionV3 models, are investigated to determine the most efficient transfer learning strategy for the classification job. To achieve superior classification efficiency, a modified DL framework is presented to identify AD.

A. ENVIRONMENTAL SETUP
All experiments were carried out on an AMD Ryzen 7 3800X central processing unit with 8 cores and 16 threads running at 3.90 GHz, with 32 GB of RAM, using the Anaconda 3 Spyder platform. It is paired with a GPU from the AMD Radeon RX 580 series. Following extensive analyses of the five architectures, AlzheimerNet was selected as the most accurate model. Training AlzheimerNet took over 6 days with this configuration. Python and OpenCV were used to produce the results. The model is evaluated using Scikit-learn and NumPy.

B. DEEP LEARNING MODELS FOR CLASSIFICATION

1) VGG16
Simonyan and Zisserman [49] introduced the VGG16 deep CNN (DCNN). VGG models may help the kernel learn increasingly complicated features. A fine-tuned VGG16 model outperformed a fully trained network [46]. The network was trained on 1 million ImageNet images. The model is made of sixteen layers and is able to identify images from diverse categories. The conv1 layer receives a fixed-size 224 × 224 RGB image as input. The image is processed through a stack of convolutional layers whose filters have the smallest practical receptive field, 3 × 3. In addition, one of the configurations employs 1 × 1 convolution filters, which perform a linear transformation on the input channels. Spatial pooling is carried out by five max-pooling layers, which follow some of the convolutional layers. Max-pooling is performed over a 2 × 2 pixel window with a stride of 2. Three fully connected layers follow the stack of convolutional layers. The first two have 4096 channels each, while the third performs 1000-way ILSVRC classification. Lastly, there is a softmax layer [50]. In all networks, the structure of the fully connected layers is the same [51].
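The layer arithmetic described above can be checked with a few lines of code. The sketch below assumes that the convolutions preserve spatial size and that the last convolutional block has 512 channels, as in the original VGG16 configuration; it treats the three final layers (4096, 4096, and 1000 channels) as fully connected layers. The function names are hypothetical.

```python
def spatial_sizes(input_size=224, num_pools=5):
    """Feature-map side length after each 2 x 2, stride-2 max-pooling layer."""
    sizes = [input_size]
    for _ in range(num_pools):
        sizes.append(sizes[-1] // 2)
    return sizes


def dense_params(n_inputs, n_outputs):
    """Weights plus biases of one fully connected layer."""
    return n_inputs * n_outputs + n_outputs


sizes = spatial_sizes()
# Flattened input to the first fully connected layer: 7 x 7 x 512
# (512 channels is an assumption taken from the original VGG16 design).
flattened = sizes[-1] * sizes[-1] * 512

fc_params = [
    dense_params(flattened, 4096),
    dense_params(4096, 4096),
    dense_params(4096, 1000),
]
print(sizes, flattened, fc_params, sum(fc_params))
```

Under these assumptions the three final layers hold the large majority of the network's parameters, which is one reason fine-tuning often replaces them with a smaller task-specific head.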

2) MobileNetV2
MobileNetV2 [52] is a 53-layer convolutional neural network. It is a pre-trained classifier based on over a million images from ImageNet. The model has been pre-trained to classify images into 1000 different object categories. Consequently, the network has learned rich feature representations for a broad range of images. MobileNetV2's architecture starts with a fully convolutional layer with 32 filters, followed by 19 bottleneck layers. It consists of two types of blocks, each with three layers. Both blocks begin and end with 1 × 1 convolutional layers containing 32 filters; the second layer, however, is a depthwise convolution. ReLU is used as the activation throughout the structure. The stride values differ between the two blocks: block one uses a stride of 1, and block two uses a stride of 2.

3) AlexNet
AlexNet [53] is a well-known convolutional neural network model. Its fundamental building blocks are convolutions, max pooling, and dense layers. The model is fitted across two GPUs using grouped convolutions. Eight layers make up the AlexNet architecture, and each layer has its own set of trainable parameters. The model is made up of 5 convolution layers combined with max pooling and fully connected layers, as well as 2 normalization layers and 1 softmax layer. Each convolution layer is followed by a nonlinear ReLU activation function. Max pooling is performed by the pooling layers. Because of the fully connected layers, the input size is fixed at 224 × 224 × 3. When a grayscale image is used as input, it is converted to RGB by replicating the single channel, resulting in a three-channel image. The model has 60 million parameters and uses a batch size of 128.
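The grayscale-to-RGB conversion mentioned above can be sketched with NumPy as follows. This is a minimal illustration, not the authors' preprocessing code; the helper name `gray_to_rgb` and the all-zero placeholder image are assumptions.

```python
import numpy as np


def gray_to_rgb(gray: np.ndarray) -> np.ndarray:
    """Replicate a (H, W) grayscale image into a (H, W, 3) array."""
    if gray.ndim != 2:
        raise ValueError("expected a 2-D grayscale image")
    return np.repeat(gray[:, :, np.newaxis], 3, axis=2)


# Hypothetical placeholder standing in for a single-channel MRI slice.
mri_slice = np.zeros((224, 224), dtype=np.uint8)
rgb = gray_to_rgb(mri_slice)
print(rgb.shape)  # (224, 224, 3)
```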

4) ResNet50
ResNet50 [54] is a 50-layer convolutional neural network comprising 48 convolution layers, one max-pooling layer, and one average-pooling layer. It requires about $3.8 \times 10^{9}$ floating-point operations (FLOPs). The ResNet50 architecture employs a combination of convolutional filters of various sizes to address the degradation problem associated with deep CNN models and to decrease the training time associated with the deep structure. ResNets have fewer filters and are thus faster. For comparison, the 34-layer ResNet requires 3.6 billion FLOPs, whereas the smaller 18-layer ResNet requires 1.8 billion FLOPs. The architecture has approximately 23 million trainable parameters. The network accepts an input image whose height and width are multiples of 32 and whose channel width is 3. Each ResNet design uses a 7 × 7 kernel for the initial convolution and a 3 × 3 kernel for max-pooling.

5) InceptionV3
InceptionV3 [55] reduces the computational power required by modifying previous Inception designs. Techniques used to reduce computing costs include dimension reduction, regularization, factorized convolutions, and parallelized computations. This network improves upon earlier Inception models in several ways, including factorized convolutional layers, label smoothing, and the inclusion of auxiliary classifiers to propagate label information through the network. The InceptionV3 model's training time is reduced by substituting smaller convolutions for larger ones. Numerous optimization strategies have been proposed to remove constraints and enhance flexibility in an InceptionV3 model. Image pre-processing is a critical component of the system and significantly affects the maximum accuracy the model reaches during training. At the very least, images must be decoded and scaled to match the model. For Inception, images should be 299×299×3 pixels in size. The model is composed of symmetric and asymmetric building blocks, including average pooling, convolutions, concatenation, max pooling, dropout, and fully connected layers. The model applies batch normalization extensively to activation inputs, and the loss is computed using softmax.
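To make the saving from factorized convolutions concrete, the following sketch compares weight counts for a single large kernel and its factorized replacements. The channel count of 64 is a hypothetical value chosen for illustration, and biases are ignored.

```python
def conv_weights(kh: int, kw: int, c_in: int, c_out: int) -> int:
    """Weight count of a kh x kw convolution, ignoring biases."""
    return kh * kw * c_in * c_out


channels = 64  # hypothetical channel count, not taken from the paper

one_5x5 = conv_weights(5, 5, channels, channels)
two_3x3 = 2 * conv_weights(3, 3, channels, channels)

one_3x3 = conv_weights(3, 3, channels, channels)
asymmetric = conv_weights(1, 3, channels, channels) + conv_weights(3, 1, channels, channels)

print(two_3x3 / one_5x5)  # 0.72
print(asymmetric, "vs", one_3x3)
```

Two stacked 3 × 3 convolutions cover the same 5 × 5 receptive field with 18/25 of the weights, and a 1 × 3 followed by a 3 × 1 convolution uses two thirds of the weights of a full 3 × 3 kernel.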

6) PROPOSED MODEL (AlzheimerNet)
Among the five architectures described above, InceptionV3 achieves the best classification performance. Thus, the AlzheimerNet model is proposed, based principally on a modified InceptionV3 framework and evaluated using the ADNI dataset. Additionally, an ablation study is conducted to improve the architecture's robustness for the Alzheimer's classification problem. AlzheimerNet is compact yet computationally intensive, and it improves performance on both large and small datasets. Figure 7 depicts the model's overall structure. The dataset consists of 60,000 images in 6 classes, namely NC, MCI, EMCI, LMCI, SMC, and AD. The dataset is split into 60% for training, 20% for validation, and 20% for testing. Because the images initially differ in size, they are all resized to 255 × 255. The base layers of InceptionV3 are then frozen, and different types of layers are added to customize the model for better accuracy on the MRI scan dataset. The input layer of the architecture takes images of size 255×255×3. In the first block, the InceptionV3 base architecture is used, and dropout with a rate of 0.5 is applied to avoid overfitting. A convolution layer with kernel size 3×3, a GlobalAveragePooling layer, and a flatten layer are added. BatchNormalization is applied in the second block, and a dense layer of 512 neurons with the ReLU activation function is added.
Similarly, Dropout and BatchNormalization are used in a different combination in the third block, where another dense layer with 256 neurons and a ReLU activation function is introduced. BatchNormalization and ReLU activation functions are added to simplify the process. The same method is repeated for the next two dense layers with the kernel size 3×3, containing 128 and 64 neurons, respectively, for blocks 4 and 5.
In the last block, a flatten layer, Dropout, and BatchNormalization are applied, and a softmax activation is included in the final layer. The model is trained with the RMSprop optimizer, and a learning rate of 0.00001 is used throughout. Finally, the performance is evaluated based on the performance metrics. The layer connections and novel features of the model are described stepwise below.
Step 4: Flattening the image data.
Step 7: Adding another dense layer with 256 neurons and ReLU as the activation function.
Step 11: Adding another dense layer with 64 neurons and the ReLU activation function.
Step 12: Applying Batch Normalization, adding a flatten layer with SoftMax activation function.
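To give a sense of the size of the dense head described above (512, 256, 128, and 64 neurons followed by the 6-class output), the sketch below counts its weights and biases. The input width of 2048 is an assumption based on the channel count of InceptionV3's final feature map; the width after the added convolution layer is not stated in the text, and BatchNormalization and Dropout layers are left out of the count.

```python
# Dense head described in the text: 512 -> 256 -> 128 -> 64 -> 6 classes.
HEAD_UNITS = [512, 256, 128, 64, 6]
FEATURES_IN = 2048  # assumption: pooled InceptionV3 feature width


def dense_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def head_param_counts(features_in: int, units: list) -> list:
    """Parameter count of each dense layer in a sequential head."""
    counts = []
    n_in = features_in
    for n_out in units:
        counts.append(dense_params(n_in, n_out))
        n_in = n_out
    return counts


counts = head_param_counts(FEATURES_IN, HEAD_UNITS)
for width, count in zip(HEAD_UNITS, counts):
    print(f"Dense({width}): {count} parameters")
print(sum(counts))  # 1221958
```

Under this assumption the first dense layer dominates the head, which is one reason global pooling before the dense layers keeps the trainable part of the model small.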

V. ABLATION STUDY
A common technique for parameter tuning is the grid search strategy, which is employed here to select parameters including epochs, learning rate, optimizer, dropout, and batch size. Furthermore, an ablation study has been conducted to verify the robustness of the proposed model. The following elements have been varied in the ablation study: flatten layer, GlobalAveragePooling2D (GAP2D), optimizer, loss function, filters, and learning rate. In this research, the validation loss [56] and validation accuracy are abbreviated 'Vld_Ls' and 'Vld_Acc', respectively, and the test loss and test accuracy are indicated by 'Tst_Ls' and 'Tst_Acc', respectively. In this investigation, a flatten layer has been incorporated to flatten the feature maps produced by the preceding convolutional layers. To analyze its influence on the network's performance, this layer is replaced in turn by GlobalAveragePooling2D, AveragePooling2D, GlobalMaxPooling2D, and MaxPooling2D. As seen in Table 5, the accuracy decreases moderately with GlobalAveragePooling2D. The network produces the lowest results with AveragePooling2D, GlobalMaxPooling2D, and MaxPooling2D, with Tst_Acc values of 96.16%, 95.70%, and 95.10%, respectively.
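The difference between flattening and global pooling, which this ablation case compares, can be illustrated with NumPy on a single feature map. The feature-map shape (8, 8, 2048) and the random values are hypothetical and serve only to show how each option changes the size of the vector passed to the dense layers.

```python
import numpy as np

rng = np.random.default_rng(0)
feature_map = rng.random((8, 8, 2048))  # hypothetical (height, width, channels)

flattened = feature_map.reshape(-1)           # Flatten: keeps every activation
global_avg = feature_map.mean(axis=(0, 1))    # GlobalAveragePooling2D: one mean per channel
global_max = feature_map.max(axis=(0, 1))     # GlobalMaxPooling2D: one maximum per channel

print(flattened.shape, global_avg.shape, global_max.shape)  # (131072,) (2048,) (2048,)
```

Flattening preserves spatial detail at the cost of a much wider input to the first dense layer, whereas global pooling discards position and keeps one value per channel.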

C. CASE 3: ALTERING LOSS FUNCTIONS
To decide on the proper loss function for the proposed model, we examined several loss functions, including Categorical Crossentropy, Cosine Similarity, and Mean Squared Error. Table 6 displays the model's results using the specified loss functions. The model demonstrated the highest test accuracy of 98.68% when it was trained with Categorical Crossentropy. The Cosine Similarity (95.80%) and Mean Squared Error (95.01%) loss functions led to a decline in test accuracy. The categorical cross-entropy loss function was therefore employed in subsequent experiments for optimum classification performance.
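For reference, the three loss functions compared in this case can be written out for a single sample as below. This is a NumPy sketch of the standard definitions, not the framework code used in the experiments; the one-hot target and predicted probabilities are hypothetical, and the cosine similarity loss is written as the negative cosine so that lower is better.

```python
import numpy as np


def categorical_crossentropy(y_true, y_pred, eps=1e-12):
    """Negative log-likelihood of the true class for a one-hot target."""
    y_pred = np.clip(y_pred, eps, 1.0)
    return float(-np.sum(y_true * np.log(y_pred)))


def cosine_similarity_loss(y_true, y_pred):
    """Negative cosine similarity: -1 means the vectors point the same way."""
    denom = np.linalg.norm(y_true) * np.linalg.norm(y_pred)
    return float(-np.dot(y_true, y_pred) / denom)


def mean_squared_error(y_true, y_pred):
    """Mean of squared element-wise differences."""
    return float(np.mean((y_true - y_pred) ** 2))


# Hypothetical six-class example: the third class is the true one.
y_true = np.array([0.0, 0.0, 1.0, 0.0, 0.0, 0.0])
y_pred = np.array([0.02, 0.02, 0.90, 0.02, 0.02, 0.02])

cce = categorical_crossentropy(y_true, y_pred)
cos = cosine_similarity_loss(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
print(cce, cos, mse)
```

Cross-entropy penalizes only the probability assigned to the true class, while the other two losses also depend on how the remaining probability mass is spread.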

D. CASE 4: ALTERING OPTIMIZERS AND LEARNING RATES
Different optimizers have been employed, including RMSprop, Adam, SGD, Nadam and Adadelta, to find the best outcome. Furthermore, the optimizer uses the momentum method to help accelerate gradient vectors in the right directions, thus leading to faster convergence. Momentum was developed to enhance learning in low-curvature directions without becoming unstable in high-curvature directions.

F. MODEL PARAMETERS SELECTION BASED ON ABLATION STUDY
The proposed AlzheimerNet model was shown to train more effectively on the image classes than the other models. As seen in Table 9, it also produced more consistent outcomes than competing models.
Following the completion of the ablation study on the proposed AlzheimerNet network, it was found that the model's classification accuracy improves when the appropriate learning rate, loss function, activation function, and optimizer are applied. Table 10 summarizes AlzheimerNet's final configuration.

G. PERFORMANCE EVALUATION MATRIX
To evaluate the models, the True Positive (TP), False Positive (FP), True Negative (TN), and False Negative (FN) counts are derived from the confusion matrix produced for each model. For a given class or stage of the disease, TP counts the samples correctly assigned to that class. FP counts the samples of other classes incorrectly assigned to that class. TN counts the samples correctly identified as not belonging to that class. Finally, FN counts the samples of that class incorrectly assigned to another class. From these confusion matrix elements, measures such as accuracy, precision, sensitivity, specificity, and F1-score, among others, are calculated.
From Equation (6), accuracy is the fraction of correctly classified samples among all samples.
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{6}$$
In Equation (7), Accuracy (overall) refers to the model's overall accuracy, where n is the total number of classes in the dataset.
$$\text{Accuracy (overall)} = \frac{\sum_{i=1}^{n} TP_i}{\text{total number of data}} \tag{7}$$
From Equation (8), precision is the fraction of samples predicted as the detected class that truly belong to it.
$$\text{Precision} = \frac{TP}{TP + FP} \tag{8}$$
From Equation (9), recall (sensitivity) is the fraction of samples of the detected class that are correctly classified.
$$\text{Recall} = \frac{TP}{TP + FN} \tag{9}$$
From Equation (10), specificity is the fraction of samples of the other classes that are correctly identified as not belonging to the detected class.
$$\text{Specificity} = \frac{TN}{TN + FP} \tag{10}$$
From Equation (11), F1-score is the overall performance calculated from the precision and recall of the model.
$$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{11}$$
From Equation (12), Negative Predictive Value (NPV) is the fraction of samples predicted as not belonging to the detected class that truly belong to other classes.
$$\text{NPV} = \frac{TN}{TN + FN} \tag{12}$$
From Equation (13), False Negative Rate (FNR) is the fraction of truly positive samples that are predicted as negative.
$$\text{FNR} = \frac{FN}{FN + TP} \tag{13}$$
The False Positive Rate (FPR) from Equation (14) is the fraction of truly negative samples that are classified as positive.
$$\text{FPR} = \frac{FP}{FP + TN} \tag{14}$$
The False Discovery Rate (FDR) from Equation (15) is the fraction of samples classified as positive that are actually negative.
$$\text{FDR} = \frac{FP}{FP + TP} \tag{15}$$
From Equation (16), False Omission Rate (FOR) is the fraction of samples classified as negative that are actually positive.
$$\text{FOR} = \frac{FN}{FN + TN} \tag{16}$$
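The per-class measures defined in this section can be derived from a multiclass confusion matrix in a one-vs-rest fashion, as in the NumPy sketch below. It uses the standard definitions of these measures; the row/column convention (rows are true classes, columns are predicted classes) and the 3-class matrix are assumptions for illustration, and classes with no predictions would need a guard against division by zero.

```python
import numpy as np


def per_class_metrics(cm):
    """One-vs-rest metrics; rows = true classes, columns = predicted (assumed)."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp   # predicted as the class, but from another class
    fn = cm.sum(axis=1) - tp   # from the class, but predicted as another class
    tn = cm.sum() - tp - fp - fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / cm.sum(),
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * recall / (precision + recall),
        "npv": tn / (tn + fn),
        "fnr": fn / (fn + tp),
        "fpr": fp / (fp + tn),
        "fdr": fp / (fp + tp),
        "for": fn / (fn + tn),
    }


# Hypothetical 3-class confusion matrix, not data from the paper.
cm = [[8, 1, 1], [2, 7, 1], [0, 1, 9]]
metrics = per_class_metrics(cm)
overall_accuracy = np.trace(np.asarray(cm)) / np.sum(cm)
mean_precision = metrics["precision"].mean()  # macro average over classes
print(overall_accuracy)  # 0.8
```

Averaging a per-class array over the classes gives the mean (macro) scores, and the trace divided by the total sample count gives the overall accuracy.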

A. ANALYSIS
The work employs five existing CNN models: VGG16, MobileNetV2, AlexNet, ResNet50 and InceptionV3. The performance of the models is compared to identify the most efficient model for classifying the six classes from T2-weighted MRI scan images acquired from the ADNI database. In comparison to the other models, InceptionV3 is found to have the highest classification accuracy. Subsequently, InceptionV3 is modified to obtain higher classification accuracy, and the fine-tuned model AlzheimerNet is proposed. The AlzheimerNet model is fine-tuned for 300 epochs with the RMSprop optimizer to minimize the loss function. The learning rate of the model is 0.00001. To ensure high-quality data, the MRI scan images pass through a series of image preprocessing stages and are then resized to 255×255 pixels for improved classification accuracy. After augmentation, the balanced dataset contains a total of 60000 MRI scan images, which are divided into training, validation and test sets in the ratio of 60:20:20. Figure 8 visualizes the performance of each of the employed models using a confusion matrix. The classification performance for each class is shown in detail. The test dataset contains a total of 12000 images. The mean precision score, mean recall score and mean specificity score of the models are calculated by adding the values of the respective parameter for each class and dividing the sum by the number of classes. The overall f1-score of the models is determined using equation 11, where precision and recall are the mean precision and mean recall. The proposed model gives the best result compared to the existing models, with a mean precision of 98.68%, mean recall of 98.68%, mean specificity of 99.74%, and overall f1-score of 98.68%.
In table 11, the accuracy and loss for training, validation and testing are also presented alongside the performance parameters as an overview of the models' complete performance, where the training accuracy and loss are represented by Trn_Acc and Trn_Ls, respectively. The proposed AlzheimerNet model shows the highest accuracy during training, validation and testing, with 98.95%, 97.81% and 98.68%, respectively. The computed losses during training, validation and testing are also the lowest, at 0.20%, 0.23% and 0.18%, respectively, which indicates that the model is more efficient than the existing models.

D. PERFORMANCE COMPARISON OF ALL APPLIED MODELS
From the confusion matrix depicted in figure 8, the components TP, FP, TN, and FN are computed. Accuracy, precision, recall, specificity, and f1-score are determined from these components using equations 6 to 16. The classification performance on the six classes of the dataset is visualized in detail as bar graphs. From the graphs, it can be seen that the proposed model AlzheimerNet shows superior performance across all parameters for the six classes. In figure 9, the accuracy (ACC), precision (PRE), recall (REC), specificity (SPE), f1-score (FSC) and NPV are plotted for the six implemented models. The bar diagram is presented to compare the classification performance of the implemented models on the six classes. It is observed from figure 9 ... In figure 10, the FOR, FPR, FDR and FNR of the implemented models are stated. In figure 10(a) ... 1.70% respectively and 0.06%, 0.14%, 0.19%, 0.44%, 0.34%, and 0.42% respectively. The highest FDR and FOR scores are 2.20% and 0.46% respectively, both for the class MCI. This suggests that the model displays superior performance in detecting the stages of AD.

E. GRAD-CAM ANALYSIS
Numerous studies have been undertaken to make deep learning more understandable and applicable. In several deep learning applications connected to medical imaging, it is crucial to increase the interpretability of deep neural networks. Gradient-weighted Class Activation Mapping (Grad-CAM), a method created by Selvaraju et al. [57], helps explain how a deep learning model arrives at its prediction. Grad-CAM provides a visual explanation for any densely connected neural network, which helps in learning more about the model when it makes predictions or performs recognition tasks. The proposed model is employed as a detection method, with the input being an image from a T2-weighted MRI scan. After the predicted label is computed using the proposed model, Grad-CAM is applied to the final convolution layer. Here, the Grad-CAM approach is utilized to extract the feature map for the proposed network. As a visual representation of the proposed network, the heat map highlights the image area that is crucial for identifying the target class. The proposed methodology for visualising heatmaps on MRI scan images is shown in Figure 11. The figure depicts the heatmaps generated by the Grad-CAM algorithm for original images of CN, SMC, MCI, EMCI, LMCI, and AD. This highlights the areas of the brain that have the most impact on the model's prediction. This information may be significant in guiding Alzheimer's detection and validating the probable diagnosis. Additionally, regular application of the algorithm with integrated visual explanations to various patients would raise a doctor's confidence in the algorithm's predictions, provided the system is correct in its predictions.
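The core Grad-CAM computation can be sketched in NumPy once the activations of the final convolution layer and the gradients of the predicted class score with respect to those activations are available. In practice both arrays come from the deep learning framework; here they are random placeholders, and the function name `grad_cam` and the array shapes are assumptions for illustration.

```python
import numpy as np


def grad_cam(feature_maps: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Grad-CAM heatmap from (H, W, K) activations and their class-score gradients."""
    # Channel importance: gradients averaged over the spatial dimensions.
    weights = gradients.mean(axis=(0, 1))               # shape (K,)
    # Weighted sum of the feature maps, followed by ReLU.
    cam = np.maximum((feature_maps * weights).sum(axis=-1), 0.0)
    peak = cam.max()
    return cam / peak if peak > 0 else cam              # scale to [0, 1]


rng = np.random.default_rng(0)
activations = rng.random((8, 8, 16))   # placeholder last-conv activations
grads = rng.random((8, 8, 16))         # placeholder gradients of the class score
heatmap = grad_cam(activations, grads)
print(heatmap.shape)  # (8, 8)
```

The resulting low-resolution map is typically resized to the input image size and overlaid on the MRI slice to produce visualizations like those described for Figure 11.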

F. ROC CURVE ANALYSIS
To measure the suggested model's performance, the ROC (Receiver Operating Characteristic) curve and AUC (Area Under the Curve) [58] values are computed. For multiclass classification, a one-vs-rest procedure is applied. The ROC curve is constructed with sensitivity (true positive rate) on the y-axis and 1-specificity (false positive rate) on the x-axis. The AUC score is determined by calculating the area under the ROC curve. The AUC value ranges from 0 to 1. The closer the value is to 0, the poorer the performance of the model. Similarly, the closer the value is to 1, the better the performance of the model. In figure 12, the ROC curves of the six classes are presented for the proposed AlzheimerNet model. The AUC value of the class CN is 0.9773, SMC is 0.9785, AD is 0.9796, MCI is 0.9733, LMCI is 0.9741, and EMCI is 0.9758.
Observing the values, it can be stated that the proposed model is an acceptable model.
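A one-vs-rest AUC of the kind reported above can be computed without plotting the curve, using the equivalence between the area under the ROC curve and the probability that a positive sample receives a higher score than a negative one. The sketch below is a plain NumPy illustration under that definition; the labels and scores are hypothetical, and a library routine would normally be used instead.

```python
import numpy as np


def binary_auc(labels, scores) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()   # ties count as half
    return float((greater + 0.5 * ties) / (pos.size * neg.size))


def one_vs_rest_auc(y_true, probabilities) -> list:
    """One AUC per class: that class is positive, all others negative."""
    y_true = np.asarray(y_true)
    probabilities = np.asarray(probabilities)
    return [binary_auc(y_true == c, probabilities[:, c])
            for c in range(probabilities.shape[1])]


# Hypothetical binary example.
example_auc = binary_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(example_auc)  # 0.75

# Hypothetical 3-class softmax outputs for four samples.
y = [0, 1, 2, 2]
p = [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.1, 0.2, 0.7], [0.3, 0.3, 0.4]]
aucs = one_vs_rest_auc(y, p)
```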

G. WILCOXON SIGNED-RANK TEST
A statistical analysis of significance (S) was carried out to ensure that the results could not have been the result of random chance. We accomplished this by performing Wilcoxon signed-rank tests and computing the p-values for each model. The Wilcoxon signed-rank test is often used to compare two paired samples non-parametrically. This test was used to compare two related samples by assessing the pairwise differences between repeated observations on the same data. The outcome indicates whether or not their population mean ranks differ. Table 12 summarizes the p-values for pairwise comparisons of the models [59], [60]. AlzheimerNet outperformed the other models. Specifically, the p-value for each comparison between AlzheimerNet and the other models is less than 0.05, indicating that AlzheimerNet outperformed the other five models with statistical significance.
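The mechanics of a two-tailed Wilcoxon signed-rank test on paired results can be sketched with the standard library alone. The version below computes an exact p-value by enumerating sign patterns, which is practical only for small samples; the paired accuracy values are hypothetical and are not the observations behind Table 12, which the text does not list.

```python
from itertools import product


def wilcoxon_signed_rank(x, y):
    """Exact two-sided Wilcoxon signed-rank test for paired samples (small n)."""
    diffs = [a - b for a, b in zip(x, y) if a != b]   # zero differences are dropped
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:                                      # average ranks for tied |differences|
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    total = sum(ranks)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    stat = min(w_plus, total - w_plus)
    # Under the null hypothesis every sign pattern is equally likely;
    # count the patterns that are at least as extreme as the observed one.
    extreme = 0
    for signs in product((0, 1), repeat=n):
        wp = sum(r for r, s in zip(ranks, signs) if s)
        if min(wp, total - wp) <= stat:
            extreme += 1
    return stat, extreme / 2 ** n


# Hypothetical paired accuracies (%) of two models over six repeated runs.
model_a = [98.6, 98.7, 98.5, 98.8, 98.6, 98.7]
model_b = [96.3, 96.1, 96.5, 96.0, 96.4, 95.8]
stat, p_value = wilcoxon_signed_rank(model_a, model_b)
print(stat, p_value)  # 0.0 0.03125
```

With six pairs that all favor the same model, the smallest attainable two-sided p-value is 2/64, which falls below the 0.05 threshold used in the text.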

H. COMPARISON ACCURACY WITH SOME EXISTING STUDIES
This article provides a comparison for selecting a CNN model to identify the stages of Alzheimer's disease. A comparison of the results obtained using the employed pre-trained models with the results obtained using the suggested modified model demonstrates that the suggested model yields a higher performance score. To determine how accurate the proposed model is, it was compared with other models that have been utilized in previous research, as can be seen in table 13. The prior works were selected for their relevance to AD detection models. The table demonstrates that the suggested model has higher classification accuracy than previous works.

VII. CONCLUSION
Alzheimer's disease is a detrimental form of dementia that affects a sizable percentage of the world's geriatric population. AD is a degenerative neurological disorder that typically affects the elderly. It is responsible for various symptoms, including memory loss and confusion. Individuals who have AD are unable to perform everyday tasks. It cannot be overemphasized how critical it is to detect this condition early to provide appropriate therapy and improve patients' lives. In this study, a fine-tuned model, AlzheimerNet, is proposed to classify the stages of Alzheimer's disease: AD, EMCI, LMCI, MCI, SMC, and NC. The quality of the raw MRI scan data is improved with an image enhancement method. For classification, five pre-trained models and one fine-tuned model are evaluated to identify the one with the highest accuracy. Different layers and hyperparameters were used to create AlzheimerNet's architecture, which is based on the InceptionV3 model. An ablation study on the proposed network is conducted to see how well it holds up under various hyperparameters. Seven augmentation methods were applied to the enhanced image dataset to enlarge and balance the MRI dataset to 60000 images. AlzheimerNet performed best with the RMSprop optimizer and a learning rate of 0.00001, with a training accuracy of 98.95%, validation accuracy of 97.81% and testing accuracy of 98.68%. The model that combined image processing, fine-tuning, and the ablation study achieved a high rate of correct classification. To evaluate the proposed model, a two-tailed Wilcoxon signed-rank test is performed. Results from this research show that image processing and enhancement methods may boost the model's accuracy. Transfer learning can be an excellent way to work with both small and large numbers of images in computer vision. Additionally, the way the model's architecture and hyperparameters are configured through an ablation study also affects its accuracy.
As future work, a hybrid model could be implemented on ADNI fMRI and PET datasets to diagnose the stages of Alzheimer's disease.
F M JAVED MEHEDI SHAMRAT received the B.Sc. degree in software engineering from Daffodil International University, in 2018. He was formerly employed with Daffodil International University and also worked as a Lecturer with the Department of Computer Science and Engineering, European University of Bangladesh. He has been actively engaged in collaborative research with researchers from Bangladesh, USA, Canada, China, and Australia. He has several research publications published in prestigious journals and conferences (Scopus). His research interests include the IoT, deep learning, image processing, neural networks, bioinformatics, and machine learning.
SHAMIMA AKTER received the B.S. degree in biochemistry and molecular biology from the University of Dhaka, Bangladesh, and the M.S. degree in genomics and bioinformatics from Virginia Tech, in 2017, with a focus on machine learning research. She is currently pursuing the Ph.D. degree in bioinformatics and computational biology with George Mason University. From 2017 to 2019, she was a Research Associate in biomedical science with Virginia Tech to perform bioinformatics that includes Python, R, and Linux programming-based complex data analysis and machine learning-based virology/microbiology projects. She also performed networks analysis, different machine learning projects both in Python and R, and some deep learning projects in Python. Her research interests include artificial intelligence and machine learning in biomedical science i.e., liver disease, chronic kidney disease, mathematical modeling of Alzheimer's diseases, and computational biology.
SAMI AZAM is currently a leading Researcher and a Senior Lecturer with the College of Engineering and IT, Charles Darwin University, Casuarina, NT, Australia. He has several publications in peer-reviewed journals and international conference proceedings. His research interests include computer vision, signal processing, artificial intelligence, and biomedical engineering.
ASIF KARIM (Member, IEEE) is currently a Research Active Lecturer with Charles Darwin University, Australia. Alongside being an Active Researcher, he has considerable industry experience in IT, primarily in the field of software engineering. His research interests include machine intelligence, health informatics, and smart contracts.
PRONAB GHOSH received the Bachelor of Science (B.Sc.) degree from the Department of Computer Science and Engineering, Daffodil International University, Dhaka, Bangladesh, and the master's degree from the Department of Computer Science (Artificial Intelligence), Lakehead University, ON, Canada. His several research works have been published at conferences and in reputable journals. His current research interests include artificial intelligence, machine learning, deep learning, image processing, computer vision, health informatics, smart vehicular transportation, and wireless communications.
ZARRIN TASNIM received the bachelor's degree in software engineering from Daffodil International University, Dhaka, Bangladesh. Several of her research papers have been published in recognized journals and conferences. Her research interests include artificial intelligence, health informatics, and deep learning.
KHAN MD. HASIB (Member, IEEE) received the B.Sc. degree from the Computer Science and Engineering Department, Ahsanullah University of Science and Technology (AUST), and the M.Sc. degree from the Computer Science and Engineering Department, BRAC University. He has more than four years of teaching and three years of research experiences in computer science. He has been heavily involved in collaborative research activities especially in the fields of applied machine learning, medical image processing, health informatics, computer vision, and natural language processing. He has published over 25 research papers in highly recognized journals, book chapters, and conference proceedings. He is currently working on several projects, such as efficient detection of specific language impairment in children, applied deep learning in image-based cervical cancer detection, activity recognition using LFR, feature-based CT image registration of liver cancer, and COVID-19 vaccination prediction from numeric data using machine learning algorithms. He is currently pursuing the Ph.D. degree in biomedical engineering with the University of Saskatchewan, Canada. He is also working as an Assistant Professor with Mawlana Bhashani Science and Technology University. Prior to that, he joined the Software Engineering Department, Daffodil International University, as a Lecturer. He has more than 250 publications in IEEE, IET, OSA, Elsevier, Springer, ISI, and PubMed-indexed journals. He has published two books on bioinformatics and photonic sensor design. He is also a Research Coordinator of the Group of Biophotomati. His research interests include biomedical engineering, biophotonics, biosensor, machine learning, federated learning, data mining, and bioinformatics. He is also a member of SPIE and OSA. He holds the top position at his department and in university and is listed among the top ten researchers in Bangladesh, from 2017 to 2020 (Scopus indexed-based). 
His research group received the SPIE Travelling Award and the Best Paper Award at the IEEE WIECON ECE-2015 Conference. He has achieved gold medals for engineering faculty first both in the B.Sc. and M.Sc. degrees from Mawlana Bhashani Science and Technology University for his academic excellence.