Automatic Localization and Discrete Volume Measurements of Hippocampi From MRI Data Using a Convolutional Neural Network

ABSTRACT Automatic hippocampal volume measurement from brain magnetic resonance imaging (MRI) is an important task and research area, especially in the study of neurodegenerative diseases; hippocampal atrophy is known to be associated with Alzheimer's disease. In this work, we propose a deep learning-based method to automatically measure the discrete hippocampal volume without prior segmentation of the volumetric MRI scans. We constructed a 2-D convolutional neural network (CNN) model that uses 3-channel 2-D patches to predict the number of voxels attributed to the hippocampus; the estimated number of hippocampal voxels is multiplied by the voxel volume to obtain the discrete volume of the hippocampus. In addition, we demonstrate a preprocessing scheme to prepare the data using a relatively small number of MRI scans. The average errors between the volumes measured by the proposed approach and by the compared atlas-based system were 4.3173 ± 3.5436% and 4.1562 ± 3.5262% (avg. error ± STD) for the left and right hippocampi, respectively. The correlations between the proposed approach and the atlas-based volume measurement were statistically significant (p-value < 0.01 at the 0.05 significance level; $R^2$ = 0.834 for the left hippocampus and $R^2$ = 0.848 for the right hippocampus), which suggests that the proposed approach can be used as a proxy method for the atlas-based system. Furthermore, the proposed approach is computationally efficient and requires less than 2 seconds to calculate the number of voxels for an MRI scan. Moreover, our method outperforms state-of-the-art deep learning approaches, such as 2-D U-Net and SegNet, in terms of voxel/volume estimation error (%) for the left and right hippocampi.

I. INTRODUCTION
The hippocampus is a widely studied structure in the context of learning, memory, stress and neurological disorders. Hippocampal atrophy is known to be linked to various serious brain dysfunctions, such as Alzheimer's disease [1], [2], schizophrenia [3], and depression [1]. Hippocampal volume measurement is therefore an important task; because it is relatively time consuming, it is desirable to perform it automatically.
The hippocampus consists of distinct, interacting subregions with a complex, heterogeneous structure [4]. Therefore, it is highly difficult to perform critical analysis on its subregions and measure the exact volume. Because of the increasing resolution of MRI scans, automatic segmentation of the hippocampus with its subfields has become possible. To segment and measure the volume of the hippocampus with its subfields, several methods [5]-[9] have been developed. Moreover, a few software packages, such as FreeSurfer, FIRST, SPM, and Neuro I, and online platforms, such as VolBrain (https://volbrain.upv.es/), are available to estimate the volume from MRI scans. The gold standard method to measure hippocampal volume is the manual delineation of brain MRI scans. However, this method requires careful work by trained operators and is often impractically labor intensive. Therefore, research communities are paying considerable attention to developing an automatic system to analyze the hippocampus with its important subfields, such as the right parasubiculum, left and right presubiculum, right subiculum, left dentate gyrus, left CA4, left HATA and right tail. These subregions are believed to be correlated with normal aging and Alzheimer's disease. Moreover, they are known to be more sensitive biomarkers of AD and neurological disorders [4].
The associate editor coordinating the review of this manuscript and approving it for publication was Wei Wei.
On the other hand, data acquisition plays a significant role in the analysis process. Cross-sectional data provide less diversity than their longitudinal counterparts. Therefore, several large-scale investigations on MRI data are being carried out, and different organizations, including the Alzheimer's Disease Neuroimaging Initiative (ADNI), are collecting longitudinal MRI data along with cross-sectional scans. In the longitudinal data acquisition process, the confounding between-subject variability is removed, which enables accurate quantification of within-subject neuroanatomical changes and delivers high sensitivity [4], [10]-[12]. Using longitudinal high-resolution 3T MRI scans, it is possible to generate an ex vivo atlas that allows us to distinguish the multiple subregions of the hippocampus. However, both MRI acquisition techniques [13] are being used to provide a broad range of methods for analyzing different neurological disorders effectively.
There are many different MRI analysis and clinical motivations to pursue various methods for segmentation, such as manual delineation, atlas/statistical-atlas-based methods [14]-[18], statistical parametric approaches and/or statistical shape models [6], [19], [20], deformable morphometry-based approaches [21], Bayesian approaches [4], [10], patch-based methods [22], [23], machine learning-based approaches, and deep learning-based approaches [3], [8], [9], [24]-[27]. Segmentation of brain regions from MRI scans using any of these approaches does not rely strictly on intensity information, because the intensity distributions of different subfields overlap considerably. Furthermore, not all edge boundaries are clearly visible in MRI scans. For example, the white matter of the hippocampus is often not well resolved and its boundary can overlap with that of the amygdala; in other words, the boundary can be invisible [28]. Therefore, prior knowledge of hippocampal boundaries is crucially important to trace them properly.
The most common method for segmenting and measuring the volume is an atlas-based system. In an atlas-based system, a reference image is used to coregister the target images, where the regions of interest are manually traced onto reference images by an expert radiologist [14]. FreeSurfer, a software package for subcortical segmentation and cortical parcellation, is a popular example of an atlas-based system. It uses atlas images to register/segment the regions of interest from MRI scans. It offers the easiest way to perform such automated operations, but it is time consuming.
A. Basher et al.: Automatic Localization and Discrete Volume Measurements of Hippocampi From MRI Data
In this research project, we considered the T1-weighted Gwangju Alzheimer's and Related Dementia (GARD) cohort dataset, consisting of 326 MRI scans, analyzed using the ANT algorithm [7], which is included in the Neuro I software package (http://www.infomeditech.com/). We explored a deep learning algorithm to analyze MRI scans to localize the hippocampi and measure the discrete volumes. The hippocampi were localized automatically using a two-stage ensemble Hough convolutional neural network (Hough-CNN) model [29], and the voxel positions of the hippocampi were traced. Using those voxel locations, 2-D patches of the left and right hippocampi were extracted to train deep learning models. Utilizing the deep learning models, we quantitatively estimated the left and right hippocampal volumes automatically. To the best of our knowledge, this is the first attempt to measure discrete volumes without prior segmentation of hippocampi from the MRI scans.
This paper is organized as follows. We present the methodology in section II. In the same section, we describe the preprocessing steps necessary to train, validate and test the individual models with their corresponding loss functions. In section III, we describe the dataset and error estimation procedures. The discrete volume measurement procedure in the test phase and the comparative analysis with other state-of-the-art deep learning methods are explained in the same section. In section IV, we provide a detailed overview of our method and its limitations. Finally, a brief summary of the entire process is given in section V.

A. PRIOR WORKS
Hippocampal shape, size, partial volume, contrast and resolution constraints of MRI scans have led researchers to develop several methods to facilitate the recognition of its structure. To localize the hippocampus, measure the volume accurately and visualize the subfields successfully, several scanning processes and scanners are being built. Although the manual delineation of MRI scans is still considered the gold standard, many automatic methods are being proposed to perform segmentation and localization.
To account for variations in head size, the intracranial volume (ICV), also known as the total intracranial volume (TIV), is calculated. Variations are observed because of differences in sex, age or race [6]. In [6], Ian B. Malone et al. used Statistical Parametric Mapping 12 (SPM 12) to automate the segmentation for TIV measurement. Mulder et al. [5] measured hippocampal atrophy rates in healthy aging, MCI and AD patients using the automated software package FreeSurfer (longitudinal processing stream) and manual delineation. Their study observed the atrophy between baseline scans and the 12-month follow-up visit in the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset of 80 subjects. On the other hand, the hippocampal subfields were studied using high-resolution 3T MRI scans in [1]. Winterburn et al. [1] measured the whole hippocampal volume along with the subfield volumes using ex vivo specimens in a 9.4T small-bore scanner. In [30], [31] and [32], hippocampal and amygdala volume measurements were performed to analyze different types of neurological disorders, such as Alzheimer's disease and temporal lobe epilepsy. The relationship between memory/learning and hippocampal volume was studied in 7-year-old children in [33] to understand whether any meaningful relationship with hippocampal volume exists at an early age. To predict Alzheimer's disease (AD) in patients with mild cognitive impairment (MCI), the hippocampal volume was studied by different groups of researchers. In [34], the progression from MCI to AD was observed using hippocampal volume changes. FreeSurfer 5.3.0 uses its own atlas scaling factor, derived from the registration of images to an average template, to measure the TIV by utilizing a full affine transformation [6].
However, the latest version, FreeSurfer 6.0, has upgraded functionality to segment and measure the volume of hippocampal subfields using a computational atlas [35]. It requires a large amount of time (several hours using the recon-all pipeline with an additional flag) to register and/or segment a single MRI scan and measure the volume of the whole hippocampus with its subfields. On the other hand, FSL FIRST is a Bayesian model-based subcortical brain segmentation/registration tool that uses shapes/appearances from manually segmented images [36]. Similarly, Neuro I is another software package, built on an open source algorithm provided by ANTs (Advanced Normalization Tools, https://github.com/ANTsX/ANTs) [7], that reconstructs MRI images into a three-dimensional model and measures the cerebral substructure volumes and the thickness of the cerebral cortex. In this study, we used the results of the ANT method as the reference standard for discrete volume measurement to train, validate and test our proposed approach.
Deep learning is a powerful tool for solving numerous complex problems in various disciplines, such as pattern recognition, speech recognition and medical imaging. Various complicated medical imaging problems have already been addressed using deep learning-based algorithms [9], [37]-[41]. Wachinger et al. [9] proposed the deep learning-based DeepNAT method to segment the neuroanatomy. BrainNetCNN was proposed by Kawahara et al. [24] and was used to predict the neurodevelopment of the brain using diffusion MRI scans of preterm infants. Automatic hippocampal subfield segmentation is a crucial task for large-scale studies. Dolz et al. [3] proposed a 3-D fully convolutional network for subcortical segmentation of MRI scans. They tested their proposed network on two publicly available datasets (the IBSR dataset and the ABIDE dataset). In [42], the authors proposed U-Net, a deep learning-based method, to segment biomedical images. Algorithms based on Hough-CNNs [8], [29], [43] are used to detect and localize objects from images of different modalities. Also, a two-phase, multi-model automatic brain tumour diagnosis system was developed using a CNN in [44]. Ensemble systems of deep convolutional neural networks (CNNs) and transfer learning-based approaches were proposed in [27], [45], [46] to diagnose Alzheimer's disease from MRI and fMRI scans.
Figure caption (beginning missing): The data augmentation operation is performed on the slices. Similarly, using the same localized voxel position, a 3-D patch (size: 64 × 64 × 64) is extracted from the labeled T1-weighted MRI data, and then the patches are split into axial, coronal and sagittal slices. The numbers of voxels assigned to the left and right hippocampi are counted in each slice. The total number of voxels attributed to the hippocampus in any slice is considered the slice area of that particular slice. The HLV denotes the hippocampus label value. In our case, 17 and 53 were assigned as the hippocampus label values for the left and right hippocampi, respectively. The total number of voxels calculated from any slice is used as the ground truth for that particular slice. (b) In the test phase, we only generated the patches. The generated patches are passed through the trained network to determine the number of voxels in the slices contributing to the hippocampus. Then, the areas of the slices in the MRI scan are summed to measure the total area of an MRI scan and multiplied by the slice placement distance ($D_{bSP}$) to calculate the volume.

B. CONTRIBUTIONS
This study investigates the possibility of measuring the hard segmented discrete volume of hippocampi from MRI scans using a CNN without performing prior segmentation. Measuring the volume by segmenting the region of interest (ROI) is the conventional method. We bypass this conventional step and propose a direct approach to measure the hippocampal discrete volume automatically using deep learning. The proposed method for discrete volume measurement is illustrated in Fig. 1 and Fig. 2. The analysis of MRI scans using a CNN after performing several preprocessing steps offers a quantitatively efficient deep learning-based model to estimate the volume. Our proposed approach requires a few seconds to estimate the hippocampal volume, which allows for the on-site analysis of a patient's condition. Moreover, this method can be extended to measure any ROI volume from 3-D cross-sectional/longitudinal MRI scans.

II. METHODS AND PREPROCESSING
Hippocampal volumes were automatically estimated from MRI scans using the deep learning model. From the 3-D MRI scans, the slices containing the hippocampi were separated, and the number of hippocampal voxels in each slice was calculated. Then, the total number of voxels associated with the left or right hippocampus was multiplied by the voxel volume; thus, the hippocampal discrete volume was measured.
An algorithm was developed using a CNN to estimate the number of voxels associated with each slice. Prior to measuring the number of voxels from each slice, it is necessary to locate the left and right hippocampi inside an MRI scan. The locations of the left and right hippocampi were estimated using a two-stage Hough-CNN model similar to that in [29]. Using these estimated locations, 3-D patches were extracted, and then the slices were separated into 2-D patches for further preprocessing. The 2-D CNN models were trained using the 2-D patches. These trained models were used to calculate the number of voxels contributing to the hippocampus in each slice of the respective hemisphere in the test phase.

A. CONVOLUTIONAL NEURAL NETWORK
The proposed convolutional neural network (CNN) models [47], [48] consist of a number of layers that conduct operations on the input data ($I_{size}$). The convolutional layers ($C^{\#filter}_{kernel}$) perform convolution operations on the input images $I_{size}$ with a number of preset kernels. They are usually followed by a batch normalization layer and an activation function, where the normalization layer normalizes the convolution results and the activation function rescales the batch-normalized convolution outputs in a nonlinear manner. Pooling layers ($P^{Type}_{stride}$) are used to reduce the dimensions of the outputs produced by the previous layers through downsampling. The type of pooling can be max pooling or average pooling. Finally, to extract the high-level features, the fully connected layers ($F_{\#filter}$) are employed. The weights are optimized during training through backpropagation [49]. CNNs are good at extracting features without requiring any assistance from the user. The data are processed through the layers in a feed-forward manner, the results of the network are compared with the ground truth through a loss function, and the error is backpropagated to update the weights of all the layers. The training process continues until the model converges. After training is complete, predictions can be made by using the trained CNN model in a feed-forward manner, and the results are reported from the outputs of the last layer.

B. PATCH AND LABEL GENERATION FOR LOCALIZATION
The CNN models were designed in a similar way to those in [29], [50] to localize the left and right hippocampi. The network architecture was changed slightly for this research work. Instead of using three models for each stage, one model was used for each stage; i.e., for the global phase, one global model was used, and for the local phase, one local model was used. For the global model, the patches were extracted from the whole MRI scan except the boundary region. Uniformly distributed random sample points were used to extract the global patches. The 96 × 96 patches were extracted with their corresponding displacement vectors and then resized into 32 × 32 2-D patches. The global patches were used to train the global model to estimate the global position of the hippocampus. For the local model, 32 × 32 patches were generated with corresponding displacement vectors. The patches were normalized to zero mean and unit standard deviation. Using these patches, a local model was trained and validated to locate the hippocampal positions in both hemispheres.
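As a rough illustration of the patch generation described above, the following Python sketch samples a patch centre uniformly at random away from the image boundary, records the displacement vector from the patch centre to the target structure, and normalizes the patch to zero mean and unit standard deviation. It is only a sketch under stated assumptions: the function names, the 2-D toy image stored as nested lists, and the sign convention of the displacement vector (target minus patch centre) are hypothetical and are not taken from [29]; resizing from 96 × 96 to 32 × 32 is omitted.

```python
import random
import statistics


def sample_patch(image, target, size, rng):
    """Cut a size x size patch at a random centre and return it with its displacement.

    image  : 2-D list of intensities (rows x cols); a hypothetical toy input
    target : (row, col) of the structure to localize
    """
    half = size // 2
    rows, cols = len(image), len(image[0])
    # Sample the centre uniformly while excluding the boundary region.
    r = rng.randint(half, rows - half)
    c = rng.randint(half, cols - half)
    patch = [row[c - half:c + half] for row in image[r - half:r + half]]
    # Assumed convention: the displacement points from the patch centre to the target.
    displacement = (target[0] - r, target[1] - c)
    return patch, (r, c), displacement


def normalize(patch):
    """Rescale a patch to zero mean and unit standard deviation."""
    values = [v for row in patch for v in row]
    mean = statistics.fmean(values)
    std = statistics.pstdev(values) or 1.0  # guard against constant patches
    return [[(v - mean) / std for v in row] for row in patch]
```

Adding the stored displacement to the patch centre recovers the target position, which is the property the voting step relies on.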

C. PATCH AND LABEL GENERATION FOR HARD SEGMENTED DISCRETE VOLUME MEASUREMENTS
The 326 MRI samples were separated into 5 roughly equal folds. Cross-validation was performed by leaving one fold out for testing. For the comparison with deep learning segmentation methods such as U-Net and SegNet, only the results from the fifth fold are compared.
Using the two-stage Hough-CNN model, we located the left and right hippocampi in the MRI scans. The located coordinates were used to extract 64 × 64 × 64 3-D patches in the vicinity of the left and right hippocampi. From these 3-D patches, the axial, coronal, and sagittal slices were extracted; 64 slices were collected for each view from each MRI scan. Then, the slices were normalized separately. For data augmentation, the slices were rotated by θ degrees, expanding the dataset 10 times, where θ (−90 < θ < 90) is drawn from a random integer generator. The augmented slices were then resized and reshaped into 32 × 32 × 1. Finally, the reshaped axial, coronal and sagittal slices were concatenated along axis=2 (0-based axis) to construct 3-channel 2-D patches of size 32 × 32 × 3. In this way, 3-channel 2-D input data for our proposed model were generated from 1-channel gray 3-D MRI images.
Similarly, the 64 × 64 × 64 3-D patches were extracted from the segmented label MRI scans and were separated into slices. Then, the numbers of voxels assigned to the left and right hippocampi were counted. The number of voxels counted in each slice is considered the ground truth for the corresponding slice in the original MRI. The complete network details and the step-by-step processes for training and testing are illustrated in Fig. 2. The patch generation and label counting procedures for each MRI scan are explained in Fig. 1.
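The construction of the 3-channel patches can be sketched as follows. This is a minimal illustration, assuming the 3-D patch is stored as a nested list indexed by three axes; the function name is hypothetical, the mapping of the three axes to the axial, coronal and sagittal views is an assumption, and the normalization, rotation and resizing steps are left out to keep the example short. Plain Python lists are used only to keep the sketch dependency-free.

```python
def three_channel_patches(cube):
    """Turn an N x N x N cube into N patches of shape N x N x 3.

    Channels 0, 1 and 2 of patch i hold the i-th slice taken along the first,
    second and third axis of the cube, respectively (an assumed ordering).
    """
    n = len(cube)
    patches = []
    for i in range(n):
        first = cube[i]                                                  # cube[i][a][b]
        second = [[cube[a][i][b] for b in range(n)] for a in range(n)]  # cube[a][i][b]
        third = [[cube[a][b][i] for b in range(n)] for a in range(n)]   # cube[a][b][i]
        # Stack the three slices along a new last axis (axis=2, 0-based).
        patch = [
            [[first[a][b], second[a][b], third[a][b]] for b in range(n)]
            for a in range(n)
        ]
        patches.append(patch)
    return patches
```

With a 64 × 64 × 64 cube this yields 64 patches of size 64 × 64 × 3, which would then be resized to 32 × 32 × 3.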
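The label counting step can be illustrated with a short Python sketch. The label values 17 and 53 come from the text; the function name and the nested-list representation of the label patch are assumptions made for illustration.

```python
LEFT_HLV, RIGHT_HLV = 17, 53  # hippocampus label values reported in the text


def slice_counts(labels, hlv):
    """Count voxels equal to hlv in every slice along each of the three axes."""
    n0, n1, n2 = len(labels), len(labels[0]), len(labels[0][0])
    along0, along1, along2 = [0] * n0, [0] * n1, [0] * n2
    for i in range(n0):
        for j in range(n1):
            for k in range(n2):
                if labels[i][j][k] == hlv:
                    # The same voxel contributes to one slice in each view.
                    along0[i] += 1
                    along1[j] += 1
                    along2[k] += 1
    return along0, along1, along2
```

Because every labeled voxel falls in exactly one slice per view, the per-slice counts of each view sum to the same total number of hippocampal voxels.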

D. LOCALIZATION NETWORK ARCHITECTURE
The two-stage ensemble Hough CNN [29] was implemented with a minor change. The network sizes were kept the same for the global Hough CNN (GH-CNN) and the local Hough CNN (LH-CNN). The network consists of 6 convolutional layers followed by 3 fully connected layers. Three max pooling layers are used after the 2nd, 3rd, and 4th convolutional layers. Batch normalization [51] and ReLU [52] activation layers are used after each convolutional layer. Similarly, the fully connected layers are followed by ReLU activation and batch normalization layers, except for the last layer, which is followed only by a batch normalization layer. After the first and second fully connected layers, 25% and 35% dropout are used, respectively. The Adam optimizer [53] was used with its default parameter settings except for the learning rate, which was set to 1e-4. The network architecture details are given in Table 1.

E. LOCALIZATION
The global and local models were trained in exactly the same way as in [29]. The global model learned the features from the whole MRI scan and predicted the global position of the hippocampus. Similarly, the local patches were extracted to train the local model to predict the exact location of the hippocampus. In each phase, the models were trained using the Hough voting strategy described in [8], [29], [43]. In the test phase, the positions estimated by the global model were used to extract the local patches to predict the exact position of the hippocampus. The localized coordinates of the hippocampi in the test MRI scans were used to extract 3-D patches to estimate the volume of the hippocampi in the corresponding MRI scans. A representative localization of the right hippocampus by the two-stage Hough-CNN is shown in Fig. 3.
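The two-stage voting idea can be sketched in a few lines of Python. This is an illustrative simplification and not the implementation of [29]: the function names are hypothetical, the trained CNNs are replaced by callables that map a patch centre to a predicted displacement, and the component-wise median used to aggregate the votes is an assumed aggregation rule.

```python
import statistics


def hough_vote(centres, displacements):
    """Aggregate per-patch votes (centre + predicted displacement) into one position."""
    votes = [
        tuple(c + d for c, d in zip(centre, disp))
        for centre, disp in zip(centres, displacements)
    ]
    # Assumed aggregation: component-wise median, which tolerates outlier votes.
    return tuple(statistics.median(axis) for axis in zip(*votes))


def two_stage_localize(global_centres, global_model, local_model, local_offsets):
    """Coarse estimate from the global model, refined by the local model."""
    coarse = hough_vote(global_centres, [global_model(c) for c in global_centres])
    # Local patch centres are placed around the coarse estimate.
    local_centres = [
        tuple(round(p) + o for p, o in zip(coarse, offset)) for offset in local_offsets
    ]
    return hough_vote(local_centres, [local_model(c) for c in local_centres])
```

In this sketch, a biased global estimate is corrected as long as the local model predicts accurate displacements from patches near the coarse position.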

F. DISCRETE VOLUME MEASUREMENT NETWORK ARCHITECTURE
For the discrete volume measurements of the left and right hippocampi, we designed identical network structures. Six convolutional layers, each followed by a batch normalization layer and a ReLU activation function, were used to form the CNN structures. Max pooling layers are used after the 2nd, 3rd, and 4th convolutional layers. The fully connected layers are followed by a batch normalization layer and a ReLU activation function. After the last fully connected layer, we used only a batch normalization layer. The Adam optimizer is used with the default parameter settings and a learning rate of 1e-4. The detailed network architecture is shown in Table 2 and Fig. 2(a).

G. LOSS FUNCTION
To train the two-stage ensemble localization network model, the mean squared error is considered as the loss function.

The localization loss, $\mathrm{MSE}_{\text{Hippocampus localization}}$, is computed over the displacement vectors, where α is the number of patches generated from each MRI scan and q is the total number of MRI scans used for training. $(X_j, Y_j, Z_j)$ are the target displacement vectors and $(\hat{X}_j, \hat{Y}_j, \hat{Z}_j)$ are the predicted displacement vectors.
For hard segmented volume measurements, we used the mean squared error as a loss function as well. The squared differences between the predicted number of voxels of any particular slice and the true number of voxels of the same slice were used to train the network. Here, $A_{X_j}$, $A_{Y_j}$, $A_{Z_j}$ are the true numbers of voxels attributed to the corresponding $j$th slices of the axial, coronal and sagittal views, respectively, calculated by the ANT method, whereas $\hat{A}_{X_j}$, $\hat{A}_{Y_j}$, $\hat{A}_{Z_j}$ are the corresponding predicted numbers of voxels of the same $j$th slice using our proposed approach. β is the number of slices contributing to the hippocampus and n is the factor by which the slices are augmented to increase the dataset size to train the proposed neural network.
The representative training and testing curves are shown in Fig. 4. We validated the proposed models by performing 5-fold cross-validation. The cross-validated results are shown in Table 3. We used an HP workstation with an Intel Xeon processor (3.10 GHz), 32 GB RAM and an NVIDIA Quadro MD4000 GPU (8 GB) to conduct the training, validation and testing operations.
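One form of the two mean squared error losses that is consistent with the symbol definitions of the loss function subsection is sketched below. The normalization constants, the grouping of the three components inside a single sum, the summation range of the second loss (the slices of one scan), and the hat notation for predicted quantities are assumptions made for illustration; the published equations may differ in these details.

```latex
\mathrm{MSE}_{\text{Hippocampus localization}}
  = \frac{1}{\alpha q} \sum_{j=1}^{\alpha q}
    \left[ (X_j - \hat{X}_j)^2 + (Y_j - \hat{Y}_j)^2 + (Z_j - \hat{Z}_j)^2 \right]

\mathrm{MSE}_{\text{Volume}}
  = \frac{1}{\beta n} \sum_{j=1}^{\beta n}
    \left[ (A_{X_j} - \hat{A}_{X_j})^2 + (A_{Y_j} - \hat{A}_{Y_j})^2 + (A_{Z_j} - \hat{A}_{Z_j})^2 \right]
```

In both cases the loss is zero only when every predicted component matches its target, so the three views are fitted jointly by one network output.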

A. DATASET
We used the GARD cohort dataset to verify our proposed approach. The GARD cohort dataset consists of 326 MRI scans of 326 patients. The patient age range is from 49 to 87 years, and the average age is 70.0184 ± 6.074 (avg. age ± STD). This dataset is divided into four classes: Alzheimer's disease dementia (ADD), asymptomatic Alzheimer's disease (aAD), mild cognitive impairment (MCI), and normal control (NC). The GARD cohort dataset was analyzed using the ANT algorithm (Neuro I software package). Using ANT, the regions of interest (ROIs) were segmented, and the volumes were measured. The segmented MRI ROIs were marked using specific label values. The left and right hippocampi were identified using label values of 17 and 53, respectively. The 326 T1-weighted MRI scans with their corresponding 326 segmented label MRI scans are considered in this research work.
The MRI scans have voxels of 0.512 mm³ and dimensions of 320 × 212 × 240.

B. DISCRETE VOLUME MEASUREMENT
In an MRI scan $M_{XYZ}$, if a voxel at location (X, Y, Z) contributes to the hippocampus, it is assigned as a target voxel to the corresponding slice. The total number of voxels attributed to the hippocampus in any particular slice is considered the area of that slice. In this way, we calculated the total number of voxels in the different slices of an MRI scan. The slice area is multiplied by the slice placement distance $D_{bSP}$ to calculate the total discrete volume.
Let $A_X$, $A_Y$, and $A_Z$ denote the numbers of hippocampal voxels in the axial, coronal, and sagittal views of a particular MRI scan. Each can be considered separately as the total area of the hippocampus in that scan for the corresponding view, and the total number of voxels attributed to the hippocampus is expressed in terms of these three quantities.
Next, for the β slices attributed to the hippocampus in any MRI scan with an augmentation factor of n, a general formula can be derived to calculate the total number of voxels that form the hippocampus in any particular hemisphere.
To calculate the discrete volume, the estimated number of voxels is multiplied by the voxel volume.
Here, $V_d$ is the measured discrete volume of the target MRI scan. The whole procedure of our proposed approach to discrete volume measurement is shown in Algorithm 1. The discrete volumes of the MRI scans of the fifth fold measured using the proposed approach and the ANT method are reported as bar graphs in Fig. 5 and Fig. 6.
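The aggregation from per-slice predictions to a discrete volume can be sketched as follows. The voxel volume of 0.512 mm³ is the value reported for the dataset; everything else is an assumption made for illustration. In particular, the sketch assumes that each view yields its own estimate of the total voxel count, that predictions over the n augmented copies are averaged, and that the three views are combined by a simple mean. The function names are hypothetical.

```python
VOXEL_VOLUME_MM3 = 0.512  # voxel volume reported for the GARD scans


def total_voxels(pred_axial, pred_coronal, pred_sagittal, n_aug=1):
    """Combine per-slice voxel counts from the three views into one total.

    Each list holds beta * n_aug per-slice predictions for one view. Summing a
    view and dividing by n_aug gives that view's estimate of the total count;
    averaging the three views is an assumed combination rule.
    """
    per_view = [sum(view) / n_aug for view in (pred_axial, pred_coronal, pred_sagittal)]
    return sum(per_view) / 3.0


def discrete_volume(n_voxels, voxel_volume=VOXEL_VOLUME_MM3):
    """Discrete volume V_d = number of voxels x voxel volume (in mm^3)."""
    return n_voxels * voxel_volume
```

For example, three views that each sum to 3 voxels give a total of 3 voxels and a discrete volume of 3 × 0.512 mm³.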

C. ERROR ESTIMATION FROM THE PREDICTED VOLUME
If the predicted discrete volume and the discrete volume measured by the ANT method are $V_{predicted}$ and $V_{Actual}$, the prediction error of our proposed approach relative to the ANT method can be expressed as follows.
Now, we can express the estimated errors in the discrete volume measurements as a percentage using the following expression.
The average errors between the volumes measured using the proposed approach and the ANT method for the left and right hippocampi are 4.3173% and 4.1562%, respectively. In Tables 4 and 5, the predicted discrete volumes with their corresponding errors are shown for 10 MRI scans (best-case and worst-case scenarios) along with their original discrete volumes.
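The error computation can be illustrated with a small Python sketch. It assumes the percentage error is the absolute difference between the predicted and reference volumes divided by the reference volume, and that the reported spread is a sample standard deviation; both are assumptions, and the function names are hypothetical. Any numbers passed to these functions in an example are illustrative and are not results from the study.

```python
import statistics


def percent_error(v_predicted, v_actual):
    """Absolute difference between predicted and reference volume, in percent."""
    return abs(v_predicted - v_actual) / v_actual * 100.0


def summarize(errors):
    """Return the average error and its sample standard deviation."""
    return statistics.fmean(errors), statistics.stdev(errors)
```

Applying `percent_error` to every test scan and passing the results to `summarize` gives a pair in the same "avg. error ± STD" form used in the text.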

D. STATISTICAL ANALYSIS
We performed a statistical analysis on the volumes measured by our proposed approach and the ANT method to determine whether the agreement between them is statistically significant. To show the agreement in the volume measurements between our proposed method and the ANT method, Bland and Altman [54] mean-difference plots were generated using SPSS 16.0. The plots are shown in Fig. 7 and Fig. 8. The p-value measured in the test was less than 0.01 (p < 0.01), which is below the 0.05 significance level and indicates that the relationship between the volumes measured by the two methods is statistically significant. We report the measured volumes in Fig. 5 and Fig. 6.
The results were further analyzed to assess the suitability of the proposed approach as a substitute for atlas-based systems, such as the ANT algorithm. We calculated the squared Pearson correlation coefficients ($R^2$) comparing our method with the ANT method, where a large $R^2$ indicates that the method can be utilized as a proxy. The measured $R^2$ values were 0.834 (95% confidence intervals) and 0.848 (95% confidence intervals) for the left and right hippocampi, respectively. Two scatter plots of the discrete volumes of the left and right hippocampi measured by our method and the ANT algorithm with a linear line of best fit are shown in Fig. 9 and Fig. 10.
Figure caption: Left hippocampus: fifth-fold scatter plot (our approach minus the ANT method plotted against their mean) of the discrete volumes measured by our proposed approach and the ANT method, with 95% limits of agreement (LOAs) bounded by green and orange lines.
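The two agreement statistics used in this subsection can be computed as sketched below. This is a generic illustration and not the SPSS procedure used in the study; the function names are hypothetical, and the limits of agreement are taken as the mean difference ± 1.96 sample standard deviations, which is the usual Bland-Altman convention.

```python
import statistics


def r_squared(x, y):
    """Squared Pearson correlation coefficient between two paired samples."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def bland_altman(x, y):
    """Return the mean difference (bias) and the 95% limits of agreement."""
    diffs = [a - b for a, b in zip(x, y)]
    bias = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd
```

Here `x` would hold the volumes from the proposed approach and `y` the volumes from the reference method for the same scans.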

E. COMPARATIVE ANALYSIS WITH OTHER METHODS
We compared our method with the 2-D U-Net [42] and SegNet [55]. Segmentation metrics such as IoU, F1 score, and Dice coefficient are not applicable to our method in this comparison study, because it does not produce a segmentation mask. Instead, average voxel/volume errors (%) are compared. The Dice coefficients of U-Net and SegNet are also reported as references.
VOLUME 8, 2020
Algorithm 1 (fragment): $A_{X_{\beta * n}}, A_{Y_{\beta * n}}, A_{Z_{\beta * n}} \leftarrow$ pass $S_{\beta * n}$ to the trained CNN model;

1) DATA PREPROCESSING FOR U-Net AND SegNet
We have trained the U-Net and SegNet using the same MRI scans that were used to train our proposed models. Similarly, the testing set was also kept the same. We have used the axial slices for training, validation and testing the U-Net and SegNet. Previously, using two-stage Hough-CNN, we have located the left and right hippocampi's locations of the MRI scans. Those voxels locations were used to extract 3-D patches of size 80 × 80 × 80. We intentionally increased the size of patches to provide more global view of the brain regions. Then the axial slices (size: 80 × 80) were extracted     FIGURE 11. The visual interpretation of segmented volume with the voxel estimation are shown for the right hippocampus of a test sample for ANT method (a), U-Net (b), SegNet (c) and our proposed approach (d). As our method does not generate any mask instead it directly predicts the number of voxels attributed to the hippocampus, therefore, there is no graphical interpretation available for our method.
The original U-Net used 512 × 512 images [42] (ISBI 2012 dataset), which are much larger than our 2-D patches of size 80 × 80. As with the T1-weighted MRI scans, the segmented label MRI scans were preprocessed to generate the ground truths. After generating the patches for both the left and right hippocampi, we replaced the HLV values (left hippocampus = 17, right hippocampus = 53) with 1. The preprocessed data were used to train, validate, and test the U-Net and SegNet. The details of the U-Net and SegNet are described in the following sections.
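The preprocessing described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' code: the helper names (`extract_patch`, `binarize`, `axial_slices`) are hypothetical, the first array axis is assumed to be the axial direction, and zero-padding at the volume border is an assumption made only so that every crop has the full 80 × 80 × 80 size.

```python
import numpy as np

LEFT_HLV, RIGHT_HLV = 17, 53   # label values quoted in the text
PATCH = 80                     # patch edge length used for U-Net and SegNet

def extract_patch(volume, center, size=PATCH):
    """Crop a size^3 cube centred on `center`, zero-padding at the borders."""
    half = size // 2
    padded = np.pad(volume, half, mode="constant")
    # After padding, original index c sits at c + half, so the crop
    # [c, c + size) in padded coordinates covers [c - half, c + half).
    z, y, x = (int(c) for c in center)
    return padded[z:z + size, y:y + size, x:x + size]

def binarize(label_patch, hlv):
    """Map voxels carrying the hippocampus label to 1 and all others to 0."""
    return (label_patch == hlv).astype(np.uint8)

def axial_slices(patch):
    """Split a 3-D patch into 2-D slices along the first (assumed axial) axis."""
    return [patch[i] for i in range(patch.shape[0])]

# Synthetic label volume with a small block labelled as left hippocampus
labels = np.zeros((96, 96, 96), dtype=np.int16)
labels[40:50, 45:55, 30:40] = LEFT_HLV
patch = extract_patch(labels, center=(45, 50, 35))
mask = binarize(patch, LEFT_HLV)
slices = axial_slices(mask)
print(patch.shape, int(mask.sum()), len(slices))
```

In this sketch the same crop would be applied to the T1-weighted scan and to its label scan so that each image slice stays aligned with its ground-truth mask.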

2) U-Net
We implemented the U-Net [42] using the Keras library (backend: TensorFlow). The original architecture was kept the same, with a few modifications. The loss function was changed: we used Dice loss with sigmoid activation in the last layer instead of the cross-entropy loss with softmax activation used in the original Caffe implementation. The network parameters were optimized using the Adam optimizer with a learning rate of 1e-5 and otherwise default settings, instead of the SGD optimizer.
We trained the U-Net for 20 epochs, which took approximately 12 hours for each side of the hippocampus on the same machine that was used to train the proposed models. The Dice losses for training, validation, and testing are shown in Table 6, together with the Dice coefficients of the predicted mask images for the test MRI scans (fifth fold) against the ground truth. The Dice coefficients were calculated after reconstructing the 3-D volume (size: 80 × 80 × 80) from the axial slices of the mask images predicted by the trained U-Net. The average Dice coefficients for the left and right hippocampi were 0.8114 ± 0.0471 and 0.8130 ± 0.0359 (Dice ± STD), respectively, for the MRI scans from the fifth fold. The average computational time to predict the mask was 3.28 seconds per MRI scan (each scan is an 80 × 80 × 80 3-D patch consisting of 80 axial slices).
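The evaluation step described above, in which predicted axial slices are stacked back into a 3-D volume before the Dice coefficient is computed, can be sketched as below. The function names are hypothetical, and the 0.5 threshold applied to the sigmoid outputs is an assumption, since the text does not state how the predictions were binarized.

```python
import numpy as np

def reconstruct_volume(slice_probs, threshold=0.5):
    """Stack per-slice sigmoid outputs into a binary 3-D mask (threshold assumed)."""
    return np.stack(slice_probs, axis=0) >= threshold

def dice_coefficient(pred, truth, eps=1e-7):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary volumes."""
    pred = np.asarray(pred).astype(bool)
    truth = np.asarray(truth).astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    return (2.0 * intersection + eps) / (pred.sum() + truth.sum() + eps)

# Synthetic ground truth: a 10 x 10 x 10 block inside an 80^3 patch
truth = np.zeros((80, 80, 80), dtype=np.uint8)
truth[30:40, 30:40, 30:40] = 1

# Synthetic per-slice predictions, shifted by two voxels along the last axis
probs = [np.zeros((80, 80), dtype=float) for _ in range(80)]
for z in range(30, 40):
    probs[z][30:40, 32:42] = 0.9

pred = reconstruct_volume(probs)
dice = dice_coefficient(pred, truth)
print(f"Dice = {dice:.3f}")
```

In this synthetic case both masks contain exactly 1000 voxels, so the voxel-count error is zero even though the overlap is imperfect, which illustrates that Dice and voxel/volume error measure different things.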

3) SegNet
SegNet [55] (encoder-decoder architecture) was implemented using the Keras library (backend: TensorFlow). On the decoder side, we changed the activation function of the last layer from softmax to sigmoid and changed the loss function to Dice loss. The official SegNet was trained on RGB images, so the encoder input is 3-channel data; in our case, we changed it to a 1-channel encoder input. The rest of the architecture was kept the same as the original SegNet, except for the optimizer.
SegNet was trained for 40 epochs, which required more than 57 hours for each side of the hippocampus on the same machine. We replaced the SGD optimizer with Adam and used a learning rate of 1e-5 with the other parameters left at their defaults. The detailed Dice losses for SegNet training, validation, and testing, along with the Dice coefficients for the left and right hippocampi, are shown in Table 6. The Dice coefficients were estimated after reconstructing the 3-D mask volumes (size: 80 × 80 × 80) from the axial slices of the test MRI scans predicted by the trained SegNet. The average Dice coefficients were 0.8960 ± 0.0356 and 0.9080 ± 0.0230 for the left and right hippocampi, respectively. The average computational time for SegNet was 5.92 seconds for each reconstructed 3-D volume.

IV. DISCUSSION
Hippocampal atrophy is a primary feature that contributes to the diagnosis of Alzheimer's disease. Tracing the hippocampus and measuring its volume is a complicated and time-consuming task, even for expert neuroradiologists [17]. Several methods have been introduced to segment the hippocampus and measure its volume automatically, such as automatic segmentation using an atlas and/or a probabilistic atlas-based method [4], deformation-based morphometry [21], [56], and the statistical parametric approach [20], [57], [58]. More recently, machine learning and deep learning algorithms have been proposed to trace the hippocampus and estimate its volume from MRI scans. All these methods were developed to detect or segment and analyze ROIs in MRI scans for the study of neurodegenerative diseases.
In this research work, we analyzed the GARD cohort dataset of 326 MRI scans using a deep learning algorithm. The purpose of this study was to measure the discrete hippocampal volumes from the MRI scans and compare them with those from the atlas-based system to determine whether the proposed approach can be used as a proxy method. For each slice, we estimated the number of voxels that belong to the left and right hippocampi. Then, the total number of voxels was multiplied by the voxel volume to obtain the discrete volume for that particular MRI scan. Several research initiatives have segmented ROIs on a voxel basis. In [28], multiple ROIs, including the hippocampi, were segmented based on probabilistic information by assigning a specific label to the voxels of each ROI. In contrast, we used a CNN model to determine the number of hippocampal voxels directly and then calculated the volume by multiplying that number by the voxel volume.
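The volume computation described above reduces to a sum and a product, as in the minimal sketch below. The function name `discrete_volume`, the per-slice counts, and the voxel spacings are hypothetical; the actual voxel dimensions of the GARD scans are not restated here.

```python
import numpy as np

def discrete_volume(slice_voxel_counts, voxel_spacing_mm):
    """Return the volume in mm^3: total voxel count times single-voxel volume."""
    total_voxels = float(np.sum(slice_voxel_counts))
    voxel_volume = float(np.prod(voxel_spacing_mm))  # mm^3 per voxel
    return total_voxels * voxel_volume

# Hypothetical per-slice voxel counts predicted for one hippocampus
counts = [0, 12, 40, 55, 38, 9]
volume_iso = discrete_volume(counts, (1.0, 1.0, 1.0))    # isotropic 1 mm voxels
volume_aniso = discrete_volume(counts, (0.5, 0.5, 2.0))  # anisotropic voxels
print(volume_iso, volume_aniso)
```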
We performed a statistical analysis of the measured volumes to determine whether the proposed approach is a viable substitute for the atlas-based system. We calculated the squared Pearson correlation coefficients of the reported discrete volumes. The values of R² were 0.834 and 0.848 (95% confidence intervals) for the left and right hippocampi, respectively. In [59], the authors reported Pearson correlation coefficients of 0.83 / 0.82 between the manually measured volume and their method for the left hippocampus. Although the two cases are not directly comparable because the datasets differ, this result gives some intuition for what can be expected of a proxy method. Two scatter plots with linear lines of best fit are shown in Fig. 9 and Fig. 10 based on our study. To assess the agreement between the values from our proposed CNN-based approach and the atlas-based approach, we used Bland and Altman [54] plots. Two measurements of the same MRI scan are expected to report the same result, i.e., the same volume, in which case the slope of the regression line will be close to 1. We plotted the difference against the mean value, which allows us to assess the bias and deviation. The bias values of 42.72 and 9.98 for the left and right hippocampi shown in Fig. 7 and Fig. 8 signify that, on average, the proposed method measures volumes that are 42.72 mm³ and 9.98 mm³ smaller than those of the compared ANT method [7] for the left and right hippocampi, respectively. However, we found that both methods were strongly correlated, and the measured p-value was less than 0.01.
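The quantities behind a Bland and Altman plot can be computed as in the sketch below. It assumes the common convention of bias ± 1.96 standard deviations of the differences for the 95% limits of agreement, and it uses the sample standard deviation (ddof=1); neither choice is confirmed by the text. The function name and sample values are hypothetical.

```python
import numpy as np

def bland_altman(method_a, method_b):
    """Return per-case means, differences, bias, and 95% limits of agreement."""
    a = np.asarray(method_a, dtype=float)
    b = np.asarray(method_b, dtype=float)
    diff = a - b                # the sign depends on the subtraction order
    mean = (a + b) / 2.0        # x-axis of the plot
    bias = diff.mean()          # average difference between the methods
    sd = diff.std(ddof=1)       # sample standard deviation (assumed)
    loa = (bias - 1.96 * sd, bias + 1.96 * sd)
    return mean, diff, bias, loa

# Hypothetical paired measurements (illustrative values only)
mean, diff, bias, (lower, upper) = bland_altman([1, 2, 3, 4], [2, 2, 2, 2])
print(bias, lower, upper)
```

Plotting `diff` against `mean` with horizontal lines at `bias`, `lower`, and `upper` gives the usual figure.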
Finally, the proposed method was compared with state-of-the-art deep learning methods, namely U-Net and SegNet. The implementation details and the changes made to the original U-Net and SegNet architectures are explained in the previous sections. The U-Net performed well in terms of the Dice coefficient (0.8114 ± 0.0471 and 0.8130 ± 0.0359 (Dice ± STD) for the left and right hippocampi, respectively); however, its average voxel/volume errors% were comparatively very high (13.086 ± 10.761 and 17.812 ± 14.748 (Error% ± STD) for the left and right hippocampi, respectively). Therefore, the proposed method outperformed the U-Net in terms of average voxel/volume estimation errors%. SegNet performed better than U-Net; however, our proposed method yields lower average voxel/volume errors% than SegNet. The average voxel/volume errors% for the left and right hippocampi of the MRI scans from the fifth fold, along with the computational times, are shown in Table 7 and Table 8. The inference time of the proposed method is given for the final CNN model only; the localization models (two-stage Hough-CNN) require 2 seconds to estimate the hippocampus position in an MRI scan. The reported inference time of the ANT method is for whole-brain processing, whereas the other methods process only the hippocampal region. The representative segmentation performance of U-Net, SegNet, and the ANT method, together with the direct voxel estimation by the proposed method on the same MRI scan, is shown in Fig. 11.
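The comparison above is based on the average voxel/volume error%. One plausible reading, sketched below, takes the absolute difference between the estimated and reference voxel counts relative to the reference, and summarizes it as mean ± STD over the test scans. The exact definition used in the study is not restated in this section, so this formula, the population standard deviation (NumPy's default), the function names, and the sample counts are all assumptions.

```python
import numpy as np

def error_percent(estimated, reference):
    """Absolute relative error in percent, per scan (assumed definition)."""
    estimated = np.asarray(estimated, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return 100.0 * np.abs(estimated - reference) / reference

def summarize(errors):
    """Return (mean, STD) of the per-scan errors; STD uses ddof=0."""
    return float(np.mean(errors)), float(np.std(errors))

# Hypothetical voxel counts for four test scans (illustrative values only)
est_counts = [2950, 3120, 2800, 3300]
ref_counts = [3000, 3000, 3000, 3000]
errors = error_percent(est_counts, ref_counts)
mean_err, std_err = summarize(errors)
print(f"{mean_err:.4f} ± {std_err:.4f} (avg. error% ± STD)")
```

For a segmentation network, the estimated count would be the number of foreground voxels in its predicted mask; for the proposed method, it is the sum of the predicted per-slice counts.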

A. LIMITATIONS
Although the proposed approach estimates the discrete hippocampal volume, it cannot visualize the exact shapes of the hippocampi. The motivation of this research work is to measure the volume directly for on-site diagnosis and to observe the volume changes between the baseline scans and scans from other time points, such as the 4-, 8-, and 12-month visits.

V. CONCLUSION
In this research paper, we demonstrated a fully automatic neural network-based approach to measure discrete volumes from MRI scans. Our proposed method can accurately measure the left and right discrete hippocampal volumes from MRI scans and saves a large amount of time by avoiding the conventional practice of measuring the volume through manual segmentation of the region of interest. As volume atrophy is one of the primary biomarkers for Alzheimer's disease, the automatic estimation of discrete volumes from MRI scans can help in its diagnosis. The average errors of the predicted discrete volumes relative to the ANT method were 4.3173 ± 3.5436 and 4.1562 ± 3.5262 (avg. error % ± STD) for the left and right hippocampi, respectively. The statistical analysis showed that the volumes measured by the proposed approach were significantly correlated with those from the ANT method. In addition, the proposed approach outperforms state-of-the-art deep learning methods, such as U-Net and SegNet, in terms of average voxel/volume estimation errors%. The proposed approach can therefore be used as a proxy method for measuring the discrete hippocampal volume from MRI scans.