An Improved LeNet-Deep Neural Network Model for Alzheimer’s Disease Classification Using Brain Magnetic Resonance Images

Alzheimer’s Disease (AD) is a psychological disorder in elderly people that causes severe intellectual disabilities. Proper processing of neuro-images can reveal differences in brain tissues that may help in diagnosing the disease more effectively. However, because of the complex structure of the brain, differentiating brain tissues and classifying AD with traditional classification mechanisms is a challenge. A Deep Neural Network (DNN) is a machine learning technique that can learn the most important information for classifying an object accurately. LeNet is a popular DNN-based model with a simple and effective architecture that also requires very little implementation time. Like most DNN models, LeNet uses MaxPooling layers for dimensionality reduction, which discards the information carried by the minimum-valued elements. In brain images, low-intensity pixels may also contain very important features. To keep the minimum-valued elements in the network as well, we have created a separate layer that performs the MinPooling operation. The MinPooling and MaxPooling layers are then concatenated. Finally, we have replaced all MaxPooling layers in LeNet with the concatenated layers. We have analysed and compared the performance of the modified LeNet model with 20 other commonly used DNN models and with some related works. The modified LeNet model achieved the highest performance: the original LeNet model classifies AD with a performance rate of 80%, whereas the proposed modified LeNet model achieved an average performance rate of 96.64%.


I. INTRODUCTION
Alzheimer's disease (AD) is one of the deadliest psychological disorders in elderly people [1]. In AD, the gray matter regions of the brain that control intellectual and behavioural functions, such as the hippocampus and the amygdala, are severely affected [2]-[4]. Initially the memory cells in the brain are affected; in later stages the disease destroys other gray matter cells, which leaves a patient unable to perform the simplest tasks. As a result, AD patients experience serious behavioural and intellectual disabilities along with severe memory loss [5]. Most patients who develop AD have gone through an intermediate dementia stage called Mild Cognitive Impairment (MCI) [6], [7]. Since the effects of MCI are not as serious as those of AD, it is important to diagnose it early, because proper neurological assistance may prevent an MCI patient from developing AD. Sample brain images of Cognitively Normal (CN), MCI, and AD patients are shown in Figure 1.
The associate editor coordinating the review of this manuscript and approving it for publication was M. Shamim Kaiser.
From Figure 1, it can be observed that the overall gray matter size of the brain changes markedly from CN to MCI to AD. Similarly, the hippocampus is smaller in the AD and MCI patients than in the CN subject.
Traditional AD diagnosis requires a variety of approaches. In most traditional diagnostic processes, physicians, often with the help of specialists such as neurologists and neuro-psychologists, carry out various assessments, such as a review of the patient's medical history [8], physical examination and diagnostic tests [9], neurological examination [10], the Mini-Mental State Exam (MMSE) [11], and mood assessment [12]. Performing all these assessments requires various tools, which makes the process long and less effective.
Magnetic Resonance Imaging (MRI) is a well-known tool for obtaining detailed tissue-wise information about the brain [13]. MRI has been used successfully in diagnosing various diseases, such as cancer and tumors [14]. With proper image processing, it is possible to determine the differences in brain tissues among AD, MCI, and Cognitively Normal (CN) patients. AD classification using brain images requires less time and fewer tools. Moreover, accurate processing of brain images can provide important bio-markers long before a person develops AD [15]. Hence, AD classification using brain images is one of the first choices of researchers. However, because of the complex structures and pixel information, it is a challenge to classify AD against other patients by determining the tissue differences with traditional classifiers [16].
An Artificial Neural Network (ANN) is a popular machine learning technique in which a set of artificial neurons is used to design a network that works as a replica of the human brain and helps to train a machine to take smart decisions [17]. In an ANN, the neurons, also known as processing elements, are interconnected via their weights. In the training step, a set of relevant data is processed by a training algorithm that estimates and assigns the weights of the neurons. After the model is well trained, it can be used to classify unknown relevant data. The multi-layer perceptron is the most common algorithm used in ANNs [18]. A Deep Neural Network (DNN) is a well-known ANN model in which a set of connected hidden layers transmits signals from the input layer to the output layer [19]. DNNs have been widely used in image classification problems with convincing performance [20]. A sample DNN architecture for image classification, drawn for a two-class problem, is shown in Figure 2. As we can see from Figure 2, all neurons are connected with each other. If 'a' is a neuron in the network and w_1, w_2, w_3, ..., w_i are its input weights from the previous neurons, then the output of 'a' can be expressed as Equation 1.
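As a hedged sketch of such an expression, a commonly used form of a neuron's output is the weighted sum of its inputs passed through an activation function. The input symbols x_j, the bias b, and the activation g are assumptions introduced here for illustration; only the weights w_1, ..., w_i come from the text above, and the authors' exact formulation may differ.

```latex
a = g\left( \sum_{j=1}^{i} w_j \, x_j + b \right)
```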
To train the model, a soft-max based energy function is a popular choice, where the loss is estimated using a cross-entropy based function. The soft-max operation is defined mathematically in Equation 3.
In Equation 3, φ represents the feature channel and y_φ(x) represents the activation at pixel x. M is the number of classes, and f_φ(x) approximates the maximum function, i.e., it is close to 1 for the channel φ that gives the maximum activation y_φ(x) and close to 0 for any other value of φ. All data in a network are divided into several batches, and in each iteration of training and testing the loss function is calculated to improve the results in forthcoming iterations. The loss function is calculated as the summation of the deviations between the actual and the projected outputs [21]. This procedure is also called Forward Propagation. Mean Square Error (MS) and Binary Cross Entropy (BC) are two of the most popular loss functions, expressed in Equation 4 and Equation 5.
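The soft-max operation and the two loss functions named above can be illustrated with a minimal sketch in plain Python. The function names and the averaging over samples are assumptions made for illustration; the exact normalization used in the paper's equations may differ.

```python
import math

def softmax(activations):
    """Convert raw class activations into probabilities that sum to 1."""
    shift = max(activations)  # subtract the max for numerical stability
    exps = [math.exp(a - shift) for a in activations]
    total = sum(exps)
    return [e / total for e in exps]

def mean_square_error(actual, predicted):
    """Average squared deviation between actual and projected outputs."""
    return sum((y - p) ** 2 for y, p in zip(actual, predicted)) / len(actual)

def binary_cross_entropy(actual, predicted, eps=1e-12):
    """Average binary cross entropy; eps guards against log(0)."""
    total = 0.0
    for y, p in zip(actual, predicted):
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(actual)

# Three hypothetical class activations for one input
probs = softmax([2.0, 1.0, 0.1])
```

The class with the largest activation receives the largest probability, and both losses are zero only when the projected outputs match the actual ones.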
In Equations 4 and 5, y_j is the actual outcome and f(y_j; w) represents the projected outcome. Based on the loss value, the network then estimates the gradients of the cost function with respect to the most crucial parameters and applies an appropriate descent process to minimise the loss value. This whole process is called Back Propagation, which can be expressed as Equation 6 and Equation 7.
VOLUME 9, 2021
In Equation 7, η represents the learning rate. Suppose a Back Propagation (BP) operation is performed across the neurons 'p', 'q', and 'r' (p to r via q); then BP at neuron 'p' can be expressed as Equation 8.
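The role of the learning rate η in the descent process can be sketched with a toy one-weight model. This is only an illustration under stated assumptions: the model f(x) = w * x, the toy data, and the function names are hypothetical and are not the network used in this work.

```python
def gradient_step(weights, gradients, learning_rate):
    """One descent update: w <- w - eta * dL/dw for every weight."""
    return [w - learning_rate * g for w, g in zip(weights, gradients)]

def mse_gradient(w, xs, ys):
    """Gradient of the mean square error of the model f(x) = w * x."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n

xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]  # toy data generated by y = 2x
w = 0.0
for _ in range(200):
    (w,) = gradient_step([w], [mse_gradient(w, xs, ys)], learning_rate=0.05)
```

With this learning rate each update shrinks the error in w by a constant factor, so w approaches 2; a learning rate that is too large would make the updates diverge instead.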
Researchers have been trying to develop suitable DNN-based models for image classification, and various successful models have been designed to date. DNNs have been widely applied to AD classification and have achieved very convincing results [22]. For dimensionality reduction of the input data, DNN models use the concept of a pooling layer [23]. Two types of pooling layers are most commonly used in DNNs: the MaxPooling layer and the Average Pooling layer. The MaxPooling layer works very well in traditional image classification, where pixels with higher intensity values play the most important roles. But in images such as brain MRI, a major drawback of the MaxPooling operation is that it ignores the elements with minimum values, which may contain important information [24]. Alternatively, the Average Pooling layer takes the average value of the elements in a stride [25]. A major drawback of Average Pooling is that, when it averages very high and very low valued elements, the output behaves neither as a high-valued nor as a low-valued element. Moreover, if there are many zero-valued elements in the stride, the output value of the operation is reduced significantly [24].
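The behaviour of the pooling operations discussed above can be compared on a small example. The following is a minimal sketch in plain Python; the helper `pool2d`, the 4 × 4 feature map, and the non-overlapping 2 × 2 windows are assumptions chosen for illustration.

```python
def pool2d(matrix, size, reducer):
    """Apply `reducer` to each non-overlapping size x size window."""
    rows, cols = len(matrix), len(matrix[0])
    pooled = []
    for r in range(0, rows - size + 1, size):
        row = []
        for c in range(0, cols - size + 1, size):
            window = [matrix[r + i][c + j] for i in range(size) for j in range(size)]
            row.append(reducer(window))
        pooled.append(row)
    return pooled

def mean(values):
    return sum(values) / len(values)

feature_map = [
    [9, 0, 1, 2],
    [0, 0, 3, 4],
    [5, 6, 0, 0],
    [7, 8, 0, 8],
]

max_pooled = pool2d(feature_map, 2, max)   # [[9, 4], [8, 8]]
avg_pooled = pool2d(feature_map, 2, mean)  # [[2.25, 2.5], [6.5, 2.0]]
min_pooled = pool2d(feature_map, 2, min)   # [[0, 1], [5, 0]]
```

In the top-left window (9, 0, 0, 0), max pooling keeps only the 9, min pooling keeps only a 0, and average pooling returns 2.25, which represents neither the high value nor the zeros.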
To overcome the limitations of the max pooling and average pooling layers, this work takes LeNet as a base model and proposes a novel concept: creating a min pooling layer and then wrapping the min and max pooling layers together, which helps the model to choose better features from brain images for AD classification. LeNet is one of the oldest DNN models, with the simplest architecture, introduced in 1989 by Yann LeCun [26]. The model is famous for its ability to operate faster than other models. Like most DNN models, LeNet uses Max Pooling layers to reduce the dimensionality of the input data. All MRI data for this experiment are acquired from the online data-set ''Alzheimer's Disease Neuroimaging Initiative (ADNI)'' [27]. To improve the AD classification performance of LeNet, our main contributions can be summarized as below:
• Since low-valued pixels in brain images may also contain important information, we have created a new type of layer that performs Min Pooling operations. We have replaced the original MaxPooling layer in LeNet with the MinPooling layer and observed the importance of MinPooling layers by comparing the performance with MaxPooling and AveragePooling layers.
• To keep both high-valued and low-valued pixels in the network, we have concatenated the MaxPooling and MinPooling layers. All the MaxPooling layers of LeNet are then replaced by the concatenated layer.
• Concatenating the max and min pooling layers makes the model slower in execution. To overcome this issue, we have used depth-wise convolution layers in place of the original convolutional layers.
• To analyse the effectiveness of our proposed model, we have implemented 20 other commonly used DNN models and compared our work with some recently published related works. We observed that our proposed methodology achieved more convincing results, with an average performance of 96%.
The remainder of the paper is organized as follows: a) Section 2 discusses some recently published related works; b) Section 3 describes the data, tools, pre-processing operations, and model construction; c) Section 4 discusses the experimental performance of the proposed work and of some related works; and d) Section 5 presents the conclusion of our work along with the future scope.

II. RELATED STUDY: ANN IN CLASSIFICATION OF AD USING BRAIN IMAGES
ANN is one of the preferred choices of researchers for AD classification because it learns from previous iterations and accordingly improves its predictions in upcoming iterations [28]. Some recently published works on AD classification using ANN-based approaches are discussed in this section. Abol Basher, et al. presented a novel approach to AD classification using tissue-wise hippocampal features from brain images [29]. The appropriate slices for localizing the left and right hippocampal regions are determined by applying a double-staged ensemble Hough-CNN (HCNN). 3D patches are then mined from the regions of interest (the hippocampus) and converted into 2D slices along all three directions (axial, sagittal, coronal). A Discrete Volume Estimation CNN (DVECNN) is used to extract the volumetric information from the 2D slices, which is then used in training and testing the network. In the HCNN, six Convolutional Layers (CLs), Rectified Linear Units (ReLU), a Batch-Normalization (BN) layer, and a set of Connected Hidden Layers (CHLs) are used along with max pooling (MaxPool) layers for dimensionality reduction. For the DVECNN, the authors used six CLs, three CHLs, BN layers, and a ReLU activation layer. As in the HCNN, the authors also used MaxPool layers in the DVECNN.
For classifying AD, MCI and CN subjects, P C Muhammed Raees, et al. implemented various deep learning-based approaches using brain MR images [30]. The authors acquired MRI data for 111 subjects from the ADNI online data-set. They tried different machine learning algorithms, including an SVM classifier, and implemented some commonly used DNN models, namely AlexNet, VGG-16, VGG-19, and GoogleNet. After comparing the performances, the authors concluded that the DNN models achieved higher performance (80-90%). Among the implemented DNN models, VGG-19 achieved the highest performance (approximately 90%).
V. Sathiyamoorthi, et al. proposed a DNN-based CAD system [31]. The authors used an Adaptive Mean Shift Modified Expectation Maximization (AMS-MEM) based approach for brain image segmentation. For pre-processing, they used the 2D Adaptive Bilateral Filter (ABF) as well as the Adaptive Histogram Adjustment (AHA) toolboxes. For feature estimation, the 2D Gray Level Co-Occurrence Matrix (GLCM) is used. After selecting the appropriate features, a DNN is used for classification. The authors used transfer learning in a CNN constructed with five convolutional layers, three pooling layers, fully connected layers, and the output layer.
In similar research, Pemmu Raghavaiah, et al. proposed a novel approach to diagnose AD from brain MR images using an optimal DNN model [32]. The authors used the Statistical Parametric Mapping (SPM) toolbox to segment the input brain images into three parts, namely Cerebrospinal Fluid (CSF), Gray Matter (GM), and White Matter (WM). A Gaussian filter is applied for image smoothing, and a Gabor filter with 8 orientations is used to extract texture features from the 2D image slices. The authors designed a DNN model for classifying AD, MCI, and CN subjects from brain images, in which the important features are learned by stacked sparse auto-encoders consisting of input, hidden, and output layers. The Squirrel Search Algorithm (SSA) is used as the optimization algorithm.
Sneha Mirulalini Gnanasegar, et al. proposed a Long Short-Term Memory (LSTM) DNN for MR imaging based classification of AD dementia [33]. From the input brain images, the most relevant features are selected using the Boruta algorithm, which is a Random Forest (RF) based approach. After selecting the features, the authors used an LSTM DNN to classify AD vs CN subjects. In the LSTM, four specific components are added for better performance, namely the input, forget, memory, and output gates. According to the authors, the approach achieved convincing results with no over-fitting issues.
Jong Bin Bae, et al. proposed a CNN-based model for AD classification [34]. The authors trained the networks on 5 batches covering the Medial Temporal Lobe (MTL) in 30 coronal slices from the input brain images. The atrophy of the MTL regions among different subject groups is determined. For the pre-processing operations, including MTL extraction, the authors used the FreeSurfer toolbox. For classification, the authors constructed a CNN inspired by the well-known Inception-v4 model. For the experiment, 156 AD and 156 CN subjects are taken for training, and 39 AD and 39 CN subjects are taken for testing the network.
For early detection of AD, a novel framework combining CNNs and ensemble learning is proposed in the literature [35]. Initially, a set of CNNs is constructed for input data of sagittal, coronal, and transverse brain tissues. All the CNNs are then combined into a single network for classification. In pre-processing, all the tissues are converted into Montreal Neurological Institute (MNI) space using the Computational Anatomy Toolbox (CAT). The important bio-markers include those regions where most of the pixels intersect. The ensemble learning comprises two steps. In step 1, a set of different CNNs (40 CNNs for sagittal, 50 for coronal, and 33 for transverse) is constructed for all tissues in the MNI space, and the five best performing CNNs for each slice orientation are selected for further operations. In step 2, the CNNs from all three orientations are combined for the final classification.
Amnaya Pradhan, et al. proposed an AD classification framework using brain MR images and DNNs [36]. For this work, the authors acquired data from the Kaggle online dataset for four subject groups, namely Mildly, Moderately, Very Mildly, and Non-Demented subjects. The acquired data are then split in an 8:2 ratio for training and testing. For performance comparison, the authors took two well-known DNN models, namely VGG-19 and DenseNet-169, and used the same dataset for both models. The authors concluded that VGG-19 performs better than DenseNet.
Eman N. Marzban, et al. proposed an AD classification approach using Diffusion Tensor Images (DTI) and a DNN [37]. All input images are segmented and normalized using the Statistical Parametric Mapping (SPM) toolbox, and the volumes of Gray Matter (GM) and White Matter (WM) are determined. The CNN comprises several layers, including an input layer, convolutional layer, batch-normalization layer, ReLU activation layer, pooling layer, connected hidden layers, and the output layer. The Root Mean Square Propagation (RMSprop) algorithm is used for weight estimation. For training and testing, the authors used 10-fold cross-validation.
Using the concept of depthwise separable CNN, Junxiu Liu, et al. proposed a novel approach for AD classification [38]. The authors claimed that high classification performance was achieved even though only a small set of MR images was acquired for training and testing. To improve portability and time complexity, Depth-wise Separable Convolution (DSC) is used in the network. DSC reduces the unwanted parameters and the computational time, while the classification performance also increases. DSC splits a normal convolution layer into two layers: the first layer works as a filter, and the second layer extracts features using several 1 × 1 kernels. DSC applies each kernel to a particular channel of the image, followed by a point-wise convolution that integrates the outputs of all the channels. For faster and more accurate classification, the authors used transfer learning with two well-known DNN models, namely AlexNet and GoogLeNet.
Taking DenseNet as a reference, Braulio Solano Rojas, et al. proposed a DNN-based approach for AD classification [39]. From the 3D MR images, the authors selected the 42 most appropriate slices for further processing. The authors adopted the Bottleneck-Compressed model from DenseNet. In addition to the original architecture, they included a channel parameter that considers three channels (RGB) from the monochromatic MR images. To improve the selection of imaging features, the M3d-Cam tool is used in combination with a Guided Gradient-weighted Class Activation Mapping (Grad-CAM) algorithm. The resulting attention maps help in discovering the unwanted features. Using appropriate processing operations, all unwanted pixels are then removed.
In similar research, Jingwen Sun, et al. proposed a novel DNN-based approach for AD classification [40]. The authors proposed a modified functional 3D DNN that performs two operations simultaneously: hippocampus segmentation and classification of AD using MR images. Taking V-Net as a base model, the authors designed an architecture in which the lower part of the network is replaced by a bottleneck block (inspired by DenseNet). The segmented hippocampus regions are then forwarded to a 3D CNN for classification of AD. For classification of the subjects, local hippocampal features as well as global features from the brain images are mined. Moreover, the authors also proposed a novel loss estimation function that helped in achieving convincing results.
For classifying AD, MCI, and CN subjects, Boo Kyeong Choi, et al. proposed a neural network based approach using brain images [41]. Initially, the hippocampus regions in the brain images are segmented using the 3D Slicer toolbox. Then, the segmented regions are processed by the Local-Entropy-Minimization-bicubic Spline (LEMS) based homogeneity rectification approach. Finally, a binary neural network based classifier is designed to perform the classification. The proposed CNN comprises an input layer, two convolutional layers, two max pooling layers, flatten layers, and fully connected layers, followed by the output classification layer.
Yechong Huang, et al. proposed an AD classification framework using a Multi-Modality CNN [42]. The authors constructed a CNN-based model in which the most important features of the hippocampus regions are integrated from T1-MR and FDG-PET images. No segmentation operations are performed. To prepare data for the classifier, all MR and PET images are transformed into the same space, and rigid registration is performed to ensure that the image pairs of both modalities show identical tissues of the same brain regions. To construct the classification network, the authors followed the idea behind the VGG-based DNN models. The classification model is designed to classify CN vs. AD, CN vs. pMCI (Progressive MCI), and sMCI (Stable MCI) vs. pMCI subjects.
Using structural MR images, Chunfeng Lian, et al. proposed a framework for joint atrophy localization and AD classification [43]. A hierarchical CNN (HCNN) model is constructed to identify the most discriminative patch-wise and region-wise locations. Based on the identified regions, the most important features are extracted, which are then used to train the HCNN. To train the HCNN model, local brain image patches are taken as inputs. For generating the estimated locations for feature extractions, a tissue wise anatomical for each of the linearly aligned images is constructed. For better performance, the authors also used a hybrid loss function.
Mingxia Liu, et al. proposed a Deep Multi Task Multi Channel Learning (DMTMCL) based approach for AD classification [44]. The proposed DMTMCL performs two operations simultaneously from the brain images and the demographic information: AD classification and neurological score regression. Initially, the most discriminative anatomical landmarks are identified from the input images, after which the important tissues around the identified landmarks are extracted. The model can also explicitly incorporate the demographic features of all the subjects in the training phase. Finally, the selected tissues and the demographic properties are combined and forwarded as inputs to the DNN model for performing the classification and regression operations.
Jae Young Choi, et al. proposed a novel AD classification approach based on a combination of several DNNs through an ensemble generalization loss [45]. Multiple DNNs are combined, with brain MR images taken as inputs. For the combination, numerous MRI projections (axial, sagittal, coronal) are ensembled across different deep neural networks, which also helps in increasing the heterogeneity of the deep ensemble. To find the ideal weights among the DNNs, the authors proposed a deep ensemble based generalization loss that lets the networks interact and cooperate in the weight search. For constructing the multiple DNNs, the authors took the popular VGG-16, GoogLeNet, and AlexNet as base models.
For diagnosing and predicting the progression of AD, Yan Zhao, et al. proposed an ANN-based framework [46]. The model consists of a 3D Multi-information Generative Adversarial Network (MGAN) for predicting the brain changes with age. For classification of the brain images, a DenseNet-based architecture is constructed, which basically optimizes the focal decay of the brain images to estimate the dementia stages. Multiple kinds of information, such as age and gender, are used in the model. In pre-processing, skull stripping and segmentation of the brain images into three parts (GM, WM, and CSF) are performed using the Voxel Based Morphometry (VBM) toolbox. The proposed model can classify different dementia stages, such as MCI vs AD, MCI vs CN, and pMCI vs. sMCI. The model is also tested for multiclass classification and achieved convincing results.
Ruizhi Han, et al. proposed a Broad Learning System (BLS) based AD classification approach [47]. The diagnostic tool uses brain MR images and classifies multiple stages of AD using the BLS and its convolutional variants. For the pre-processing operations, the authors used the Computational Anatomy Toolbox (CAT-12). Based on the processed images, the authors designed a model called Convolution Feature based Cascade of Enhancement Nodes BLS (CCEBLS), which helps in combining the variations of the BLS. One more variant is then proposed with reference to the CCEBLS as well as the BLS. A multi-layer CNN is used for extracting features from the images. The architecture of the model is inspired by the well-known VGG model.
Xin Zhang, et al. proposed a novel Residual Self-Attention Neural Network (ReSANNEt) for atrophy localization and AD classification [48]. The novelty of the framework can be divided into three parts: a) to improve the classification performance, a residual self-attention DNN is designed that helps in capturing local, global, and spatial properties from the brain images; b) a Gradient-based Localization Class Activation Mapping (GCAM) based approach is used to improve explainability; c) a sub-sequential learning proposition is used for automated classification. The 3D ReSANNEt for AD classification is inspired by the ResNet model. The 3D GCAM is applied to the 3D ReSANNEt to get the best performance. The framework is designed to classify AD vs CN, as well as pMCI vs sMCI.
Python is an effective toolbox widely used in different medical image processing applications [49]. Because of its easy and user-friendly interfaces, Python is faster to implement with than many other toolboxes [50]. We have used Python to execute all the model architectures. To improve the training performance, we have used data generator functions such as rotation, contrast enhancement, and flipping to increase the number of input data.

B. PREPROCESSING
For deep learning models, 3D images require a huge number of layers, which also increases the computational load [34]. Moreover, various post-processing operations are sometimes needed, which further increases the execution time [51]. For this work, we have converted the 3D brain images into groups of 2D slices. Across the dementia stages of AD, the hippocampus is known to be the most severely affected brain region [52], and MCI/AD patients experience a steady decay of the hippocampus [53], [54]. Hence, we preferred to use sagittal-view brain images containing the hippocampus regions for the classification network. Under the supervision of a neurology expert from the ''North Eastern Indira Gandhi Regional Institute of Health & Medical Sciences'' (NEIGRIHMS, http://www.neigrihms.gov.in/), we analysed the 2D images and identified the most suitable slices showing the hippocampus regions. All input MR images are resized to 256 × 256 × 1.
All the brain MR images also contain some non-brain parts, known as the skull. Since the skull contributes negligibly to AD classification, its presence only increases the dimensionality of the feature maps. Hence, we have segmented the brain from the skull parts. As shown in Table 1, to segment the skull properly we implemented some of the most commonly used image segmentation techniques and chose the best performing one, which is the Histogram Based Thresholding approach. A part of our skull stripping operation is published in [55]. One visual outcome of the skull removal operation is shown in Figure 3: part a shows a sample input brain image, and part b shows the result after applying the skull stripping technique.
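As an illustration of what a histogram-based threshold looks like in practice, the following minimal sketch selects a threshold with Otsu's method, which is used here as a stand-in and is an assumption: the text does not state how the threshold is chosen in the authors' approach, and this sketch covers only the thresholding step, not a complete skull stripping pipeline. The function names and the toy image are hypothetical.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level that maximizes between-class variance."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_level, best_variance = 0, -1.0
    weight_bg, sum_bg = 0, 0
    for level in range(levels):
        weight_bg += hist[level]        # pixels at or below this level
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg   # pixels above this level
        if weight_fg == 0:
            break
        sum_bg += level * hist[level]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        variance = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if variance > best_variance:
            best_level, best_variance = level, variance
    return best_level

def apply_threshold(image, threshold):
    """Binary mask: 1 where the pixel is brighter than the threshold."""
    return [[1 if p > threshold else 0 for p in row] for row in image]

# Toy 8-bit image with a dark region (left) and a bright region (right)
image = [
    [10, 12, 200, 210],
    [11, 13, 205, 208],
]
flat = [p for row in image for p in row]
t = otsu_threshold(flat)
mask = apply_threshold(image, t)
```

The threshold falls between the two intensity clusters, so the mask separates the dark and bright regions of the toy image.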

C. IMPROVED LeNet MODEL CONSTRUCTION
LeNet is one of the most effective DNN models and consumes little computational time. A sample architecture of the LeNet model is shown in Figure 4. It comprises 7 layers along with an output layer. The input layer of the network takes the images as inputs and forwards them to the next layer after performing the size-normalization operation.
The next layer is the Convolutional layer, which consists of a set of feature maps/kernels that extract important feature information, such as edges and corners. Kernels, or feature maps, are simply a set of square matrices having identical weights. The step of sliding and overlapping the kernels across all the image pixels is known as the convolution operation. Since the proposed model concatenates the max and min pooling layers, the normal convolutional operations take more time and memory. Depth-wise convolution is a well-known way to improve the execution time and representational efficiency [56]. Depth-wise convolution uses a different kernel for each of the input channels in the images. Finally, the outputs from the different channels are combined with a point-wise 1 × 1 convolutional operation. Mathematically, the depth-wise convolution is expressed in Equation 9.
where Â is the depth-wise convolution filter of size P_A × P_A × X (P_A is the spatial dimension of the kernel, and X is the number of input channels). Here the c-th kernel in Â is applied to the c-th channel of Z to produce the c-th channel of the filtered feature map. The computational cost of the depth-wise convolution can be estimated as Equation 10.
In contrast to Equation 10, the computational cost of the normal convolution operation is P_A · P_A · X · Y · P_Z · P_Z (Y is the total number of output channels, and P_Z is the spatial dimension of the feature map Z), which is more expensive than the depth-wise convolution. To combine the depth-wise convolutions, a 1 × 1 point-wise convolution is used. Including the point-wise convolution, the overall cost can be expressed as Equation 11.
The overall cost reduction can be expressed as Equation 12 and Equation 13.
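The cost comparison can be checked numerically with a short sketch. The standard convolution cost is the expression given in the text; the depth-wise and point-wise terms below follow the commonly used depth-wise separable formulation and are an assumption about the form of the cost equations. The numeric values are illustrative only and are not taken from the paper.

```python
def standard_conv_cost(p_a, x, y, p_z):
    """Multiply-adds of a standard convolution: P_A * P_A * X * Y * P_Z * P_Z."""
    return p_a * p_a * x * y * p_z * p_z

def depthwise_separable_cost(p_a, x, y, p_z):
    """Depth-wise filtering plus the 1 x 1 point-wise combination."""
    depthwise = p_a * p_a * x * p_z * p_z
    pointwise = x * y * p_z * p_z
    return depthwise + pointwise

p_a, x, y, p_z = 5, 6, 16, 28  # illustrative values, not taken from the paper
ratio = depthwise_separable_cost(p_a, x, y, p_z) / standard_conv_cost(p_a, x, y, p_z)
# Under these assumptions the ratio simplifies to 1 / Y + 1 / P_A**2
```

For a 5 × 5 kernel with 16 output channels, the separable form needs roughly a tenth of the multiply-adds of the standard convolution under these assumptions.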
The next layer in LeNet reduces the dimensionality of the matrices. This process is known as the pooling operation, in which less important information is discarded according to a mathematical function, and the layer is known as the Pooling layer. The standard LeNet model uses the Max Pooling operation, hence the layer is also known as the MaxPooling layer. In the Max pooling operation, only the maximum-valued elements in a kernel are selected and forwarded to the next layer. The Max pooling operation can be represented as Equation 14.
The Max pooling operation with a 2 × 2 kernel is illustrated in Figure 5. From Figure 5, it can be observed that, as the kernel slides over the input feature matrix, only the highest-valued element in each window is extracted. Max pooling works very well in general image processing tasks, such as handwriting detection and object detection, where the highest-valued pixels play the most important role. But in medical image processing, such as AD classification using brain MRIs, where the image quality is limited, small-valued pixels may also contain very important features. Hence, we have introduced Min Pooling alongside the Max Pooling operation in LeNet for AD classification. The Min Pooling operation can be expressed as Equation 15. A sample Min Pooling operation is shown in Figure 6. As we can see from Figure 6, the Min pooling operation selects only the minimum-valued elements in the kernel. Our main aim is to keep both the minimum- and maximum-valued elements in the network. Hence, we have applied both Max and Min Pooling operations and concatenated their results. The concatenation operation can be expressed as Equation 16.
A sample concatenated pooling operation is illustrated in Figure 7. As Figure 7 shows, the concatenation of pooling layers selects both the highest- and lowest-valued elements from the feature maps. The modified network architecture, inspired by the original LeNet model (Figure 4), is shown in Figure 8. To train the model, a softmax-based energy function is defined, and the loss is estimated using a cross-entropy function. The softmax operation is defined mathematically in Equation 17.
In Equation 17, $\varphi$ indexes the feature channel, $y_\varphi(x)$ is the activation in channel $\varphi$ at pixel position $x$, and $M$ is the number of classes. $f_\varphi(x)$ approximates the maximum function: it is close to 1 for the channel $\varphi$ that gives the maximum activation $y_\varphi(x)$ and close to 0 for every other $\varphi$. The cross entropy then penalizes, at each pixel, the deviation of $f_{n(x)}(x)$ from 1, as given in Equation 18.
where $n : Z \to \{1, \dots, M\}$ gives the true label of each pixel and $e$ is a weight map defined on $Z$ that gives more importance to the pixels that contribute the most.
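The softmax and weighted cross-entropy steps can be sketched in NumPy as below. This is only an illustration under several assumptions: each row of `activations` holds the M class activations for one pixel or sample, labels are 0-based (the text uses 1, ..., M), the loss is written as a negative log-likelihood averaged over rows, and the weight map defaults to all ones. The function names and the sample values are hypothetical.

```python
import numpy as np


def softmax(activations):
    """Softmax over the last axis (one activation per class)."""
    shifted = activations - activations.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def weighted_cross_entropy(activations, labels, weights=None):
    """Mean of -e(x) * log f_{n(x)}(x), where n(x) is the true label of row x."""
    probs = softmax(activations)
    true_class_probs = probs[np.arange(len(labels)), labels]
    if weights is None:
        weights = np.ones(len(labels))  # assumption: uniform weight map
    return float(np.mean(-np.asarray(weights) * np.log(true_class_probs)))


# Made-up activations for two samples and three classes (e.g. CN, MCI, AD).
activations = np.array([[2.0, 0.5, 0.1],
                        [0.2, 0.1, 3.0]])
labels = np.array([0, 2])
print(softmax(activations))
print(weighted_cross_entropy(activations, labels))
```

The loss approaches zero as the softmax output for the true class approaches 1, which matches the description of penalizing the deviation of the true-class output.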

IV. EXPERIMENTAL RESULTS AND DISCUSSION
Experimental Setup: For all the experimental analysis in this work, we used a computer with an i7 processor, 16 GB RAM, 500 GB SSD storage, and 4 GB of graphics memory, running Windows 10. Because of its user-friendly interface and fast execution capabilities, Python is popularly used in medical image processing [49], [50]. For the experimental implementations, we used the Python 3.0 toolbox. The models were trained for 50 epochs with a batch size of 32.
We have implemented the improved LeNet model as shown in Figure 8. For training and testing the model, we acquired MR images of more than 200 patients from three subject groups: CN, MCI, and AD. More than 2000 images were acquired. Using data generator functions, the number of images was increased to more than 15000. For better performance evaluation, we further subdivided the images into three groups based on patient age. Within each subject group, group 1 contains patients aged 60-69 years, group 2 contains patients aged 70-79 years, and group 3 contains patients aged 80 years or older.
Some of the most commonly used parameters for classification performance analysis, namely accuracy, sensitivity, specificity, and precision, are used for performance evaluation. In addition to these parameters, we have also used the ROC (Receiver Operating Characteristic) curve to evaluate the proposed model. For better performance comparison, we have implemented the LeNet model with four different types of pooling operations: MaxPooling layers, AveragePooling layers, MinPooling layers, and the concatenated pooling layers. The ROC curves for each of the age-wise subject groups are shown in Figure 9 to Figure 11. The performance evaluation of the proposed model is shown in Table 2 and Table 4.
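The four evaluation measures named above can be computed from a confusion matrix, as in the following sketch. It assumes a one-vs-rest treatment of the three classes (CN, MCI, AD), which the paper does not state explicitly. The function name is hypothetical, and the counts are made up for illustration only; they are not results from the paper.

```python
import numpy as np

CLASSES = ["CN", "MCI", "AD"]


def per_class_metrics(confusion):
    """One-vs-rest metrics from a confusion matrix (rows: true class, columns: predicted)."""
    confusion = np.asarray(confusion, dtype=float)
    total = confusion.sum()
    metrics = {}
    for i, name in enumerate(CLASSES):
        tp = confusion[i, i]
        fn = confusion[i, :].sum() - tp  # class i samples predicted as another class
        fp = confusion[:, i].sum() - tp  # other classes predicted as class i
        tn = total - tp - fn - fp
        metrics[name] = {
            "accuracy": (tp + tn) / total,
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
            "precision": tp / (tp + fp),
        }
    return metrics


# Made-up counts for illustration only; not results from the paper.
example = [[48, 2, 0],
           [3, 45, 2],
           [0, 4, 46]]
results = per_class_metrics(example)
for name in CLASSES:
    print(name, results[name])
```

Averaging each measure over the three classes gives a single summary value per model, which is one plausible way to obtain an average performance rate.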
We have implemented and compared the average results of the improved LeNet model with those of the original LeNet model, LeNet with AveragePooling layers, and LeNet with the newly introduced MinPooling layers. The models are implemented using the same training and testing data for CN, MCI, and AD patients. For all the variants of LeNet, we used the same age groups of patients (60-69 years, 70-79 years, and 80+ years). The performances of all the LeNet variants are presented in Table 2.
For a better performance comparison, we have also implemented 20 other commonly used DNN models. All the models are implemented using the same data distributions. Part of this experimental comparison has been submitted as [57].
From Table 2 and Table 3, it can be observed that, amongst all the DNN models implemented for AD classification, the proposed improved LeNet model achieved the highest performance. The second highest performance is achieved by the DenseNet-121 model. Because it has the simplest architecture, the LeNet model consumes the least computational time. Although the improved LeNet requires a little more computational time than the original LeNet model, its time requirement is still lower than that of all other models except AlexNet.
From Table 3, it can be observed that the proposed improved LeNet model has an average performance rate of 96.64%. The detailed evaluation of the proposed model is presented in Table 4.
From Table 3 and Table 4, it can be observed that the improved LeNet can classify the different stages of AD more accurately and in less execution time. Some recently published related state-of-the-art works are summarized in Table 5.
From Table 5, it can be observed that, amongst all the recently published state-of-the-art works discussed, the proposed improved LeNet model classifies the different stages of AD most accurately. The ROC curves of the proposed model in Figure 9 to Figure 11 also indicate a convincing performance.

V. CONCLUSION AND FUTURE WORK
Diagnosing AD using traditional approaches is less effective and more time consuming. Since the brain is the main region affected by AD, researchers are trying to develop an accurate methodology for classifying the different stages of AD using brain images. The three most commonly considered stages are CN, MCI, and AD. MCI is also known as the middle stage between CN and AD.
Since the structures of brain tissues are quite complex, and due to the complex pixel information in brain images, it is difficult to classify AD using traditional classifiers. DNNs are well known for training a machine to take very complex decisions and are popularly used in various image processing applications. However, to the best of our knowledge, very few DNN models have been evaluated for AD classification.
Amongst all the popular DNN models, LeNet is the simplest and the oldest. LeNet is also one of the least time-consuming models and is used effectively in various image classification frameworks. Like most DNN models, LeNet uses MaxPooling layers to reduce the dimensionality of the input data. One drawback of using MaxPooling layers in AD classification using brain images is that they consider only the highest-valued elements in the feature maps. That means they do not consider the pixels in the images that have low intensity values. Since brain images consist of complex pixel information and are less enhanced than other digital images, low-intensity pixels may also contain very important features. To keep both maximum- and minimum-valued pixels in the model, we first created a separate pooling layer to perform Min-Pooling operations, and then concatenated the MaxPooling and MinPooling layers. The concatenated pooling layers add computational time to the model. To reduce the computational time, we replaced all the convolutional layers with depthwise convolutional layers.
For the experimental analysis of the improved model, we acquired MR images of more than 200 patients from the online ADNI data set. For better performance analysis, we distributed the data into subgroups by age (60-69 years, 70-79 years, and 80+ years). The hippocampus is the region of the brain most affected in AD. Hence, with the help of an expert radiologist and using the 3D Slicer toolbox, we extracted the slices of the MR images that contain the hippocampus region. Finally, 2D brain images containing the hippocampus region are used as inputs to the model.
The average performances of the constructed model for the different age groups of the various subjects are presented in Table 2. The ROC (Receiver Operating Characteristic) curves of the improved model's classification performance are shown in Figure 9 to Figure 11. We have implemented and compared the performances of different variants of the LeNet model, i.e., the original LeNet (using MaxPooling layers), LeNet using AveragePooling layers, LeNet using the newly constructed MinPooling layers, and the improved LeNet model (using concatenated pooling layers), and observed that the improved LeNet model achieved the most convincing classification results, as shown in Table 3. Using the same data distributions, we have implemented 20 other commonly used DNN models for AD classification. After comparing the average classification performances of all the implemented DNN models, it is observed that the improved LeNet model achieved the highest performance rate of 96.64%, as shown in Table 4. We have also discussed 20 recently published state-of-the-art works on AD classification using brain images and various neural network models, and compared their average performances with that of the improved model. From the comparison in Table 5 and Table 6, it can be observed that, amongst all the related works, our proposed model achieves the highest classification performance.
Although the improved LeNet model achieved a convincing result, it can be further improved in future work. One drawback of the improved LeNet is that it requires more memory than the original model. In future work, a proper feature elimination method can be used to remove unnecessary features, which may help in reducing the memory requirements of the model. Data for additional stages of AD (such as stable MCI (s-MCI), progressive MCI (p-MCI), etc.) can be acquired and the classification results tested, which may help in the early detection of AD. Data from sources other than ADNI can also be acquired to test the performance of the model.
AJITH ABRAHAM (Senior Member, IEEE) received the M.Sc. degree from Nanyang Technological University, Singapore, in 1998, and the Ph.D. degree in computer science from Monash University, Melbourne, Australia, in 2001. He is currently the Director of the Machine Intelligence Research Labs (MIR Labs), a Not-for-Profit Scientific Network for Innovation and Research Excellence connecting Industry and Academia. The Network with HQ in Seattle, USA, has currently more than 1 000 scientific members from more than 100 countries. As an Investigator/Co-Investigator, he has won research grants worth more than U.S. $100 million from Australia, USA, EU, Italy, Czech Republic, France, Malaysia, and China. He works in a multi-disciplinary environment involving machine intelligence, cyber-physical systems, the Internet of Things, network security, sensor networks, web intelligence, web services, data mining, and applied to various real-world problems. In these areas, he has authored/coauthored more than 1 400 research publications out of which there are more than 100 books covering various aspects of computer science. One of his books was translated to Japanese and a few other articles were translated to Russian and Chinese.
About more than 1 100 publications are indexed by Scopus and more than 900 are indexed by Thomson ISI Web of Science. Some of the articles are available in the ScienceDirect Top 25 hottest articles. He has more than 1 100 coauthors originating from more than 40 countries. He has more than 45 000 academic citations (H-index of 98 as per Google Scholar). He has given more than 150 plenary lectures and conference tutorials (in more than 20 countries). For his research, he has won seven best paper awards at prestigious international conferences held in Belgium, Canada, Bahrain, Czech Republic, China, and India. Since 2008, he has been the Chair of IEEE Systems Man and Cybernetics Society Technical Committee on Soft Computing (which has more than 200 members) and served as a Distinguished Lecturer of IEEE Computer Society representing Europe from 2011 to 2013. He is also the Editor-in-Chief of Engineering Applications of Artificial Intelligence (EAAI) and serves/served the editorial board of more than 15 international journals indexed by Thomson ISI. He is actively involved in the organization of several academic conferences, and some of them are now annual events. More information can be available at http://www.softcomputing.net/ scholars. He is also reviewer of several reputed international journals and a guest editor of one Springer journal. His research interests include machine learning, image processing, and natural language processing.