Automatic Detection of Amyloid Beta Plaques in Somatosensory Cortex of an Alzheimer’s Disease Mouse using Deep Learning

Identification of amyloid beta (Aβ) plaques in the cerebral cortex in models of Alzheimer’s Disease (AD) is of critical importance for research into therapeutics. Here we propose an innovative framework which automatically measures Aβ plaques in the cortex of a rodent model, based on anatomical segmentation using a deep learning approach. The framework has three phases: a data acquisition phase, which enhances image quality using preprocessing techniques and image normalization with a novel plaque removal algorithm; an anatomical segmentation phase using the trained model; and an analysis phase to quantitate Aβ plaques. Deep neural networks (DNNs) were trained in a supervised manner on 946 sets of mouse brain section annotations exhibiting Aβ protein-labeled plaques (Aβ plaques). Five DNN architectures were tested: FCN32, FCN16, FCN8, SegNet, and U-Net. Of these, U-Net was selected as it showed the most reliable segmentation performance. On the test dataset, the framework achieved an accuracy of 83.98% and a Dice coefficient of 91.21% for atlas segmentation. The proposed framework automatically segmented the somatosensory cortex and calculated the intensity and extent of Aβ plaques. This study contributes to image analysis in the field of neuroscience, allowing region-specific quantitation of image features using a deep learning approach.


I. INTRODUCTION
Alzheimer's Disease (AD) is one of the most common types of dementia, and the second leading cause of death in Australia [1,2]. AD is a degenerative brain disorder causing progressive cognitive decline and widespread neuron death [3]. Although the cause of AD is not yet understood, the presence of amyloid beta (Aβ) plaques, insoluble protein deposits in the cerebral cortex and hippocampus, is hypothesized to be a key element in the disease process [4][5][6].
The mouse brain has been utilized as a model for many human disorders [7][8][9][10], as the mouse shares more than 80% of its genome with humans [11]. Rodent studies overcome the limitations of human experiments, such as ethical and economic issues. However, unlike segmentation of human brain images, anatomical segmentation of the mouse brain has received comparatively little attention [8,12].
Transgenic animals, such as mice and rats, are commonly utilized in studying Aβ accumulation and investigating therapeutics targeted at Aβ removal, which requires quantitative analysis of brain images [7]. To automate this analysis, the use of a brain atlas to segment the anatomical structures of the brain is essential for anatomically accurate quantitation [13]. Several studies have been conducted to develop brain atlas maps from brain images to help improve the AD diagnosis process [3,7,13,14,15]. However, the quantitative analysis of Aβ in brain image datasets remains a great challenge in the field of neuroscience: it requires anatomical expertise and compensation for the distortion of brain sections, and dataset acquisition is an expensive process [3,13,16]. Therefore, an innovative framework for Aβ quantitation is needed to reduce the cost and difficulty of the process, and to reduce the number of experimental animals needed for adequately powered studies.
When studying cortical pathology, such as Aβ plaques, the somatosensory cortex is one of the most readily identifiable regions in coronal sections. This region processes sensory stimulation from innervation of the body in mammals, and especially the mystacial vibrissae in mice [17]. Although human AD tends to spare primary sensory areas, in transgenic mice the increasing level of transgene-driven Aβ in the somatosensory cortex disrupts neural timing relationships, leading to abnormalities in sensory processing [18]. Accurate quantitation of Aβ distribution with reference to consistent anatomical segmentation allows greater reproducibility and analytical reliability in studies of amyloid accumulation, which ultimately can help to understand the mechanism of AD in humans.
Typically, identifying cortical regions in brain sections is performed by human experts. Like all human judgements, however, it is subject to inconsistency, inexactness, subjectivity, and a degree of irreproducibility, as well as being hard to learn [19]. To overcome these challenges, several previous studies have attempted consistent anatomical segmentation using image processing techniques [11,20,21,22,23]. Although they improved the recognition process, these approaches still had several limitations, including limited area localization [21,24], the requirement for a predefined template [25,26], and constraints on the input image [10,23,27]. Advances in deep learning methods offer an opportunity to overcome these limitations and further improve anatomical registration [3,6,28] to make the process automated, reproducible, and reliable [29]. However, there is no existing comprehensive deep learning framework providing Aβ plaque quantitation within automatically delineated regions of interest in a mouse brain image.
In this paper, we propose an innovative framework that quantitates pathology in a transgenic mouse model by automatically measuring Aβ plaques in anatomically defined regions. The proposed framework deploys both deep learning technology and image processing techniques to extract regions of interest within mouse brain images. By combining the advantages of both techniques and generating region boundary guidelines for mouse brain images in a fully automated way, quantitation of pathology becomes more reliable, cost-effective, objective, and consistent.

A. DEVELOPING BRAIN ATLASES USING IMAGE PROCESSING TECHNIQUES AND DEEP LEARNING
Segmentation is one of the most important analytical steps applied to brain images for identifying anatomical structures [8,30,31]. Current approaches for anatomical segmentation of the mouse brain are generally semi-automatic [13], using supplementary image processing tools, such as ImageJ, custom plug-ins [7], and PyVips [6]. Conventional laboratory studies use microscopic imaging of postmortem brain slices that vary slightly in form, scale, texture, position, and pathology [25], making conventional manual approaches to segmentation time-consuming and labor-intensive [8]. Several previous approaches have used image processing techniques with the aim of simplifying this process [21,32,33,34].
Template-based, or model-based segmentation, has been attempted by several groups [10,20,23,25,27]. In this approach, the template is a common reference frame created by magnetic resonance imaging (MRI) anatomists to consistently and accurately standardize brain regions [31,35,36]. Template-based segmentation enables regions of interest to be identified within images that cannot be segmented by simple image processing [10,27]. Various approaches, such as Automatic Nonlinear Image Matching and Anatomical Labeling [20], Multiple Automatically generated templates [23], and Advanced Normalization Tools [37], have been devised to segment images using manually derived templates [26,35]. By automatically overlaying the predefined template on the target image, structures are indicated by coloured labels [38], which reduces the time and difficulty of the segmentation process [23]. Template matching techniques generally show better performance than manual annotation [11] as well as relieving the need for experts to spend long periods annotating large numbers of brain images [10,23].
The template matching technique, however, is limited by serious shape and alignment constraints [39] due to the diversity of anatomical differences between brains [40]. These differences prevent templates from covering all variations of anatomical structures [22]. The technique also faces the logistic difficulty of obtaining images of uniform quality and appearance when imaging postmortem slices. Since brain sectioning and processing is complex and exacting, variations such as shape distortion, uneven labelling, and tissue damage are common in these images and impair the reliability and accuracy of segmentation. Therefore, a reliable and consistent image analysis technique, robust to these variations in tissue and image quality, is needed to overcome these constraints.
Even though there are several computer-aided diagnostic systems for AD pathology analysis [3,6,41], a fully automated pipeline for anatomical segmentation and pathology quantitation using deep learning technology has not yet been reported. With deep learning, anatomical registration can be improved by applying object detection and segmentation technologies [13], as has been done in several studies aimed at creating brain atlases [3,12,13,42,43].
One such project used a fully convolutional neural network (F-CNN), inspired by the DeepLab architecture, to develop a human brain atlas from MRI scans [44], which was independent of differences in alignment or registration of brain sections [43]. The experiment used the Internet Brain Segmentation Repository (IBSR) dataset [45], containing MRIs with 18 classes of annotation, as well as the dataset from the Rolandic Epilepsy (RE) study with 35 human brain MRIs. The researchers compared the F-CNN method with one based on a Random Forest (RF) approach, a classification and regression algorithm with a randomizing layer [46]. The average Dice coefficient using the CNN-based method was 82.4%, whereas that of the RF-based method was 78.8% [43]. However, they reported study limitations, such as inaccurate segmentation of thin tail areas, ambiguity in training on symmetric structures, and constraints in identifying volumetric structures [43].
Another project proposed a fully automated, deep neural network-based method named Segmenting Brain Regions of interest (SeBRe) [13], to overcome the difficulty of anatomically segmenting sections that vary in shape and size, as is common with histological processing. To anatomically register sections with minimal human supervision, SeBRe deploys optimized masks using an architecture of region-based convolutional neural networks (R-CNN) [47] and convolutional backbones. In use, SeBRe showed 84% mean average precision (mAP) for the original dataset and 87% mAP for the extended test dataset, with an evenly distributed accuracy rate across predefined classes. In addition to its excellent performance on mouse brain images, the SeBRe pipeline achieved 95% mAP in segmenting human MRI brain images. This approach demonstrates the utility of real-time segmentation of microscope images, even with typical variation in morphology [13]. However, the approach was limited to providing sectional segmentation without quantitative data.

B. QUANTITATION OF Aβ ACCUMULATION
Although the amyloid hypothesis of AD etiology has come under question [48,49], Aβ plaque load has long been the accepted metric for staging AD progression in postmortem tissue [50] and recently, via selective PET ligands in clinical trials for human AD [51]. In laboratory models such as transgenic rodents [52], studies of therapeutics targeting Aβ accumulation require precise quantitation of pathology from tissue sections, for which current approaches are often inconsistent [41], resulting in studies which are often underpowered to detect the effects they are designed to test [53].
Other than pathological methods, neuroimaging techniques such as computerized tomography (CT), magnetic resonance imaging (MRI) [54], or positron emission tomography (PET) imaging [51,55] are used to quantitate neurodegenerative changes and plaque loads in the brain. These methods are relatively non-invasive and can monitor disease progression promptly [54], particularly when combined with newer modalities [56]. Imaging methods commonly use a template-based approach [57] and multimodal contrasts [58]; however, they show low sensitivity and specificity in Aβ quantitation [59].

A. FRAMEWORK OVERVIEW
The framework proposed in this paper has three phases. First, a data acquisition phase improves image quality with preprocessing techniques and normalization using a novel plaque removal algorithm. Second, an anatomical segmentation phase applies the trained model, and third, an analysis phase quantitates Aβ plaques. The DNN model is trained with the set of preprocessed images to infer brain atlas regions from section images (Figure 1a). Utilizing the trained model, an atlas is inferred from the preprocessed input image so that the somatosensory cortex ROI can be identified (Figure 1b). This is then overlaid on the original image (Figure 1c), and the somatosensory area is extracted as regions of interest (Figure 1d). Finally, Aβ plaques in these ROIs are quantitated for analysis (Figure 1e).

B. DATA ACQUISITION
A total of 1,558 mouse brain sectional images containing Aβ plaques were collected from 21 mice, with an average of 76 images per mouse. Six anatomical structures were annotated on these sections: hippocampal formation, thalamus, hypothalamus, retrosplenial cortex, somatosensory cortex, and striatum. Aβ plaques in the images (white areas in Figure 1a) were removed and infilled with the average intensity of the boundary in the training dataset.
The training dataset contained images from 6-month-old transgenic mice with two human familial AD genes, APPswe and PS1dE9, driven by the PrP promoter to rapidly generate large quantities of Aβ in their cortex [60]. The mice were housed in standard conditions at 20℃, with a 12/12 hour light/dark cycle, and with standard lab chow and water ad libitum [7]. Brains were collected as follows: mice were anaesthetized (100 mg/kg sodium pentobarbitone i.p.; Sigma, USA) and then transcardially perfused with 4% paraformaldehyde in a 0.01M phosphate buffer (Sigma). The brains were removed and coronally sliced between bregma -0.22mm and bregma -1.22mm (17 in total) in 50 μm intervals using a vibrating microtome (VT 1000, Leica Microsystems, Germany). Brain slices were incubated in 88% formic acid for 8 min at room temperature before 6 × 5 min washes in 0.01M phosphate buffered saline (PBS), to expose the Aβ antigen for antibody labelling. Sections were then incubated in 10% normal goat serum (Sigma) in PBS, followed by an anti-amyloid-β primary antibody (MOAB-2, 1:2000, NBP2-13075, Novus Biologicals, USA) in 0.01M PBS, then a goat anti-mouse IgG2b secondary antibody conjugated with Alexa 546 fluorophore (A21143, Thermo Fisher Scientific, USA). After mounting, the sections were scanned with a VS120-L100-W Olympus Virtual Slide Microscope using a 20× Olympus UPLSAPO objective.
All animal work was compliant with the NHMRC Guidelines for Animal Research and was approved by the Animal Ethics Committee at the University of Tasmania (permit A16276).
Level 5 resolution images were extracted from the Olympus VSI file format, at 8 bits per pixel and a 1:1 pixel aspect ratio. Images were extracted from the original multi-resolution VSI files generated by the microscope imaging software using the Bio-Formats plugin for ImageJ. Image size varied according to brain section dimensions, typically around 19200 × 38000 pixels. For standardization in training, images were downsampled and unified to 1750 × 1250, the maximum crop which minimizes pixel losses from the original images.

C. Aβ PLAQUE DELETION
To allow the model to segment images more efficiently, an infilling method was employed as a simple and efficient method to clean the images (Figure 2). This process is important to improve the DNN model's performance and capacity: deleting the plaques from the input image allows the framework to perform consistent segmentations regardless of plaque load. Since the acquired images vary greatly in the number of plaques present, robustness to variation in plaque load is essential to this framework. Thus, this process infills Aβ plaques with a neutral gray level to remove anatomically irrelevant features for the anatomical registration step.
RGB images were converted to grayscale images by eliminating hue and saturation, whilst retaining the luminance, as given by equation (1) [61]. A global threshold value, T, was then calculated using Otsu's method [62], which minimizes the intra-class variance of the pixels above and below this value, to produce a binary image, which was then dilated by 2 pixels.
Next, all connected components from the binary image are extracted as "plaque regions". Their boundaries are traced, and the intensity values are averaged along the length of the boundary to give an average intensity (Av), which is used to infill the plaque area and thereby "remove" it from the image.
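The plaque deletion steps described above can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the authors' implementation: the luminance weights (Rec. 601), the square structuring element used for the 2-pixel dilation, the 4-connectivity of components, and the definition of a boundary pixel (a component pixel with at least one 4-neighbour outside the component) are all assumptions, and the function names are hypothetical. A practical pipeline would more likely use an image processing library on full-size images.

```python
from collections import deque

def to_gray(rgb):
    # Assumed Rec. 601 luma weights; the paper's equation (1) may differ.
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row] for row in rgb]

def otsu_threshold(gray):
    # Otsu's method: pick T that maximizes between-class variance
    # (equivalent to minimizing intra-class variance).
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * n for i, n in enumerate(hist))
    best_t, best_var, w0, sum0 = 0, -1.0, 0, 0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        var_between = w0 * w1 * (sum0 / w0 - (sum_all - sum0) / w1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

def dilate(mask, radius=2):
    # Binary dilation with an assumed (2*radius+1)-square structuring element.
    h, w = len(mask), len(mask[0])
    out = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                    for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                        out[yy][xx] = True
    return out

def infill_plaques(gray):
    # Threshold, dilate, then fill each connected component with the
    # average intensity (Av) measured along its boundary.
    h, w = len(gray), len(gray[0])
    t = otsu_threshold(gray)
    mask = dilate([[v > t for v in row] for row in gray], radius=2)
    seen = [[False] * w for _ in range(h)]
    out = [row[:] for row in gray]
    steps = ((1, 0), (-1, 0), (0, 1), (0, -1))
    for y0 in range(h):
        for x0 in range(w):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            comp, queue = [], deque([(y0, x0)])
            seen[y0][x0] = True
            while queue:  # breadth-first flood fill of one plaque region
                y, x = queue.popleft()
                comp.append((y, x))
                for dy, dx in steps:
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boundary = [gray[y][x] for y, x in comp
                        if any(not (0 <= y + dy < h and 0 <= x + dx < w and mask[y + dy][x + dx])
                               for dy, dx in steps)]
            av = round(sum(boundary) / len(boundary))
            for y, x in comp:
                out[y][x] = av
    return out
```

Because the mask is dilated before the boundary is traced, the boundary pixels fall on tissue just outside the bright plaque core, so Av approximates the local background rather than the plaque itself.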

D. DATA ANNOTATION AND DIVISION
Image annotations were guided manually by experienced neuroscientists to produce templates generated using computer vision annotation tools (CVAT) (https://github.com/opencv/cvat) in an RGB scale image format. A total of 1,558 mouse brain images were used (Table 1) as the experiment dataset. This dataset was divided into training data, split into five folds that served in turn as training and validation sets, and testing data (Table 1). These were used to train the model and to test the trained model, respectively. The data splitting process ensured that the data from the same sectional image was in the same split set. This data splitting policy ensured mutual exclusion between the datasets, which means that the validation and test datasets were unseen data from the view of the trained model. To enlarge the training dataset, additional images were created by augmenting the original images with rotations from +18° to -18° at 2° intervals. This allowed the model to be trained with a more abundant dataset of images at various angles.

Table 1 (first numeric column: images; remaining six columns: counts for the six annotated structures, of whose headers only "Hippocampal formation" survives)
Dataset  Fold    Images  Structure counts
Train    Fold 1  285     570    570    494    570    532    532
Train    Fold 2  285     513    570    551    532    513    551
Train    Fold 3  266     494    532    532    532    437    532
Train    Fold 4  266     494    532    513    532    456    532
Train    Fold 5  266     532    532    513    532    494    532
Test     -       190     361    380    380    380    361    361
Total            1,558   2,964  3,316  2,983  3,078  2,793  3,040
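The splitting and augmentation policy described in this section can be illustrated with a short sketch. It assumes that each original section has an identifier (the `section_ids` argument, the naming scheme, and the `n_test` and `seed` parameters are hypothetical), that the split is made at the level of sections before augmentation so that all rotated copies of a section stay in the same set, and that the rotation range includes 0°.

```python
import random

# Rotation angles from -18° to +18° at 2° intervals (19 angles, including 0°).
ROTATION_ANGLES = list(range(-18, 19, 2))

def split_sections(section_ids, n_folds=5, n_test=2, seed=0):
    # Split unique section identifiers into a held-out test set and n_folds folds.
    ids = sorted(set(section_ids))
    random.Random(seed).shuffle(ids)
    test, rest = ids[:n_test], ids[n_test:]
    folds = [rest[i::n_folds] for i in range(n_folds)]
    return folds, test

def cv_rounds(folds):
    # Each fold serves once as the validation set; the others form the training set.
    for k, val in enumerate(folds):
        train = [s for i, fold in enumerate(folds) if i != k for s in fold]
        yield train, val

def augmented_names(section_id):
    # Hypothetical naming scheme for the rotated copies of one section.
    return [f"{section_id}_rot{angle:+03d}" for angle in ROTATION_ANGLES]
```

Splitting before augmenting is the design choice that matters here: if rotated copies were generated first and then shuffled, near-duplicates of a training image could leak into the validation or test sets.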

E. ARCHITECTURE SELECTION AND TRAINING
A variety of DNNs were trained using collected images, along with corresponding annotation data, to build an object detection and segmentation model (Figure 3a) [12,63,64,65,66]. All the training, validation, and test datasets were processed using the Av algorithm as previously described (Figure 3b). With reference to previous literature, we compared five candidates, FCN32, FCN16, FCN8, U-Net [67], and SegNet [68], to find the most reliable architecture for anatomical segmentation (Figure 3c). All configurations were set to be equal for a fair comparison, minimizing any possible variation between model training processes. After several attempts, the training hyperparameters were experimentally determined as follows: training for 100 epochs with 512 steps per epoch, a learning rate of 0.001 with the Adam optimizer, and a batch size of 1. These trained models automatically generated anatomical ROIs from input images (Figure 3d). After training and evaluation with statistical performance measures, such as the Dice coefficient and accuracy [69], U-Net was found to be the most accurate architecture.
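As a small illustration of keeping the configuration identical across the five candidates, the reported hyperparameters can be held in one immutable object that every training run shares. The class, field names, and architecture labels below are hypothetical and are not taken from the authors' code; only the numeric values come from the text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainConfig:
    # Values as reported in the text; field names are illustrative.
    epochs: int = 100
    steps_per_epoch: int = 512
    learning_rate: float = 0.001
    optimizer: str = "adam"
    batch_size: int = 1

    @property
    def total_updates(self) -> int:
        # Number of optimizer steps over the whole run.
        return self.epochs * self.steps_per_epoch

    @property
    def images_seen(self) -> int:
        # Number of (possibly repeated) training images processed.
        return self.total_updates * self.batch_size

# Hypothetical labels for the five compared architectures.
ARCHITECTURES = ("FCN32", "FCN16", "FCN8", "U-Net", "SegNet")

# One shared, frozen configuration per architecture keeps the comparison fair.
RUNS = {name: TrainConfig() for name in ARCHITECTURES}
```

With these settings each model performs 100 × 512 = 51,200 weight updates, and with a batch size of 1 it processes the same number of images.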
Model training was conducted on Anaconda 4.7.11, running 64-bit Ubuntu Linux 16.04.6 LTS and Python v3.8.3. TensorFlow-GPU v1.14.0 was used to accelerate the DNN framework's training process and Keras v2.4.0 was used as a Python deep learning application programming interface (API). To allow the final model to cope with brightness variations inherent in the image acquisition process, the Keras framework was used to apply brightness adjustments of up to ±40% to the input images.
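The brightness augmentation can be pictured as multiplying every pixel by a random factor between 0.6 and 1.4 and clipping to the 8-bit range. The sketch below shows that idea in plain Python; it is an assumption about what "±40%" means here, not the Keras call the authors used (Keras' `ImageDataGenerator` accepts a `brightness_range` argument that serves a similar purpose), and the function names are hypothetical.

```python
import random

def adjust_brightness(image, factor):
    # Scale each 8-bit grayscale pixel by `factor` and clip to [0, 255].
    return [[min(255, max(0, round(v * factor))) for v in row] for row in image]

def random_brightness(image, max_shift=0.40, rng=random):
    # Draw a factor uniformly from [1 - max_shift, 1 + max_shift] and apply it.
    factor = rng.uniform(1 - max_shift, 1 + max_shift)
    return adjust_brightness(image, factor), factor
```

For example, `adjust_brightness([[100, 200]], 1.4)` brightens the first pixel to 140 and saturates the second at 255.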

H. Aβ PLAQUE QUANTITATION
To analyze pathology, an Aβ plaque quantitation step was then performed. Using the brain atlas overlay to extract the somatosensory cortex area from the original images, the number and pixel extent of Aβ plaques were calculated as an estimate of plaque load in this region.
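A minimal sketch of this quantitation step is given below, assuming that a binary plaque mask and a binary somatosensory cortex ROI mask of the same size are already available. The function name, the use of 8-connectivity to count plaques, and the returned dictionary keys are assumptions for illustration; plaque load is expressed as the percentage of ROI pixels that are plaque pixels.

```python
from collections import deque

def quantitate_plaques(plaque_mask, roi_mask):
    # Count plaques (8-connected components) and plaque pixels inside the ROI.
    h, w = len(roi_mask), len(roi_mask[0])
    inside = [[bool(plaque_mask[y][x] and roi_mask[y][x]) for x in range(w)]
              for y in range(h)]
    seen = [[False] * w for _ in range(h)]
    count = 0
    for y0 in range(h):
        for x0 in range(w):
            if not inside[y0][x0] or seen[y0][x0]:
                continue
            count += 1  # found a new plaque; flood-fill the rest of it
            seen[y0][x0] = True
            queue = deque([(y0, x0)])
            while queue:
                y, x = queue.popleft()
                for ny in range(max(0, y - 1), min(h, y + 2)):
                    for nx in range(max(0, x - 1), min(w, x + 2)):
                        if inside[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    plaque_pixels = sum(v for row in inside for v in row)
    roi_pixels = sum(1 for row in roi_mask for v in row if v)
    burden = 100.0 * plaque_pixels / roi_pixels if roi_pixels else 0.0
    return {"count": count, "plaque_pixels": plaque_pixels,
            "roi_pixels": roi_pixels, "burden_percent": burden}
```

Plaques that straddle the ROI boundary are counted by their in-ROI portion only in this sketch; whether the original pipeline treats them the same way is not stated.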

F. MODEL EVALUATION
The classification performance of the trained model was evaluated using the following metrics: accuracy (2), recall (3), precision (4), and Dice coefficient (5). Compared to the reference annotation, each pixel is counted as one of four possible outcomes: true positive (TP), true negative (TN), false positive (FP), or false negative (FN) [70], from which these metrics are derived as follows:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{2}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{3}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{4}$$

$$\mathrm{Dice} = \frac{2\,TP}{2\,TP + FP + FN} \tag{5}$$

... (Figure 5f) show relatively low segmentation accuracy. FCN8 shows false positive pixels in the somatosensory cortex and thalamus, whereas SegNet shows many true negative pixels in both the somatosensory cortex and hypothalamus.

Table 2 compares the five trained models. In both the validation and test datasets, U-Net shows the highest overall result across the evaluation criteria. In the training results (Table 2), the Dice coefficient of U-Net (93.64 ± 0.40%) was significantly (P < 0.05) higher than FCN32 (92.59 ± 0.36%), FCN16 (92.03 ± 0.54%), FCN8 (91.62 ± 0.33%), or SegNet (92.41 ± 0.19%). U-Net also performed significantly (P < 0.01) higher in terms of accuracy (92.61 ± 0.18%), precision (92.69 ± 0.18%) and recall (92.56 ± 0.19%). (Table note: values are represented as mean ± standard error of the mean; means in the same column with different superscripts differ significantly, * (P < 0.05) and ** (P < 0.01).)

Additionally, the same training process was conducted with an animal-based split of the dataset. However, Dice, accuracy, recall, and precision were not significantly different between image-based splits and animal-based splits of the dataset during validation and testing (data not shown). In the test dataset results for trained models (Table 3), a representative model is selected for each architecture. Model selection is based on the median value of the Dice coefficient in the results of 5-fold cross validation. It can be seen that U-Net also performed significantly higher than other DNNs in terms of accuracy, precision and recall for testing datasets.
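The pixel-wise metrics (2) to (5) and the median-based model selection can be sketched as follows. The sketch treats one anatomical class at a time as a binary mask, which is an assumption about how the multi-class result is scored; the function names are hypothetical, and the standard definitions of accuracy, recall, precision, and Dice in terms of TP, TN, FP, and FN are used.

```python
def confusion_counts(pred, ref):
    # Compare a predicted binary mask with the reference annotation pixel by pixel.
    tp = tn = fp = fn = 0
    for p_row, r_row in zip(pred, ref):
        for p, r in zip(p_row, r_row):
            if p and r:
                tp += 1
            elif p and not r:
                fp += 1
            elif r:
                fn += 1
            else:
                tn += 1
    return tp, tn, fp, fn

def _ratio(num, den):
    # Guard against empty classes, where a metric is undefined.
    return num / den if den else 0.0

def segmentation_metrics(tp, tn, fp, fn):
    return {
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "recall": _ratio(tp, tp + fn),
        "precision": _ratio(tp, tp + fp),
        "dice": _ratio(2 * tp, 2 * tp + fp + fn),
    }

def median_fold(dice_by_fold):
    # Index of the fold whose Dice coefficient is the median (odd fold counts).
    order = sorted(range(len(dice_by_fold)), key=lambda i: dice_by_fold[i])
    return order[len(order) // 2]
```

Note that accuracy rewards true negatives, so it can stay high for small structures even when the overlap is poor, whereas the Dice coefficient ignores TN and reflects overlap only; this is one reason for reporting both.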
Table 4 reports the analysis of Aβ plaques identified in the segmented somatosensory cortex of the test datasets. Ten images were randomly selected from the test datasets, and Aβ plaque detection by the automated framework proposed in this paper was compared with experts' manual quantitation (Table 4). The proposed framework successfully extracts somatosensory cortex ROIs from Aβ immunolabeled mouse brain images and quantitates the number and extent of Aβ plaques in those anatomical regions (Table 4). The percentage of plaque area, also known as "plaque burden" or "plaque load", is calculated by dividing the number of detected Aβ pixels by the total number of segmented ROI pixels, from which accuracy is calculated by using ground truth data (Table 4). Compared with the performance of experts, the proposed framework gave reliable results, with an average accuracy of 95.63% (Table 4).

V. DISCUSSION
The proposed framework suggests a comprehensive solution for automated analysis of Aβ plaques in specified anatomical regions of mouse brain sections. However, there are several potential areas of improvement needed to supplement the reliability of the framework. First, there are more sources of variation to be considered in the input images when preprocessing the dataset. These occur because the brain slices are manually placed and then adjusted by experimenters when they are placed on the scanner. In addition, because the brain slices are secured in a floating state, it is physically hard to maintain the same angle and position [7]. Many other factors can affect the quality of microscopic imaging, such as tissue perfusion and fixation, the length of time in a storage solution, the effectiveness of antigen retrieval, the quality of primary and secondary antibodies, fading of fluorophores, the intensity and alignment of the light source and imaging optics, tissue autofluorescence, and occasional clipping in the imaging sensor [41]. Such variations can cause irregular data during the data augmentation process. For example, if a +20° rotation augmentation is applied to an already tilted image, it may no longer be in the acceptable range for training. To moderate extreme cases, angle and brightness adjustment are applied. The section angle is adjusted by reference to the standardized mouse brain atlas template provided by the Allen Institute [71] and brightness normalization is performed using OpenCV. After angle adjustment and normalization, the resulting image set can be utilized more effectively for model training.
In addition to this, the Av algorithm is needed to delete plaques from input images to enable accurate segmentation, which might limit this framework in fully automatic applications. To resolve this limitation, a more automated method to apply the Av algorithm needs to be developed.
Second, the number of DNN architectures tested was small, given the proliferation of new models in this fast-growing domain. It might be expected that new and improved DNN architectures could show better performance than those evaluated here.
Third, the input dataset was limited to a set of images from one experiment. To create a suitable model training input which mimics possible wider variations in input images, the dataset was artificially expanded with augmentation, by duplicating and rotating images. Adding more datasets for training might be expected to produce more robust performance across a range of applications.
Fourth, the quantitation result is highly dependent on the automatic segmentation process. The performance of the segmentation DNN is vital for the performance of the proposed framework. In addition, the quantitation result will be affected by errors produced by each step. This successive error dependency might be lowered by applying image processing techniques at the formal stage of quantitation. This will be the focus of a future study that refines detected segments by removing false positive pixels, or the use of machine learning based techniques for more robust plaque quantitation [41].
Finally, the number of annotation classes for the brain atlas regions was limited. The mouse brain contains more than 60 major anatomical divisions and over 500 substructures [72]. In this study, only a few classes were annotated and trained, but they provided guidance for identifying most of the major regions. The models also accurately segmented the somatosensory cortex, which is often assayed in animal models of AD. Therefore, the present study forms a basis to expand the variety of identifiable structures in future work, using additional datasets with more extensive annotation.
An important advantage of the use of automated anatomical segmentation is the potential for high-throughput quantitation using consistent criteria. The great majority of experimental outcomes in studies of transgenic models of Aβ pathology use just a few brain sections for which ROIs are drawn by hand, and which also are quantitated using measures highly susceptible to bias, such as manual thresholding [53]. Automation, combined with robust ML-based segmentation of pathology [41], would permit large numbers of sections to be quantitated in an unbiased, reproducible manner that does not require experimenter blinding to study conditions. Along with the evaluation metrics reported, qualitative evaluation of segmentation results should be made, not only from the perspective of a deep learning engineer, but also by neuroscientists, who are the target domain experts for this method. For the engineer, having a low false positive rate and low true negative rate for a high accuracy rate is a key criterion. However, for a specific neuroscience application, having a high true positive rate and high false negative rate may be more acceptable when considering the segmentation in its regional context, rather than pixel by pixel. This emphasizes the need for a range of performance criteria, both quantitative and qualitative, exemplified by comparing Table 2 and Figure 5.

VI. CONCLUSION
The proposed framework demonstrated reliable anatomical segmentation using the standalone knowledge in the trained DNNs. The best model, U-Net, showed an 83.98% accuracy and a 91.21% Dice coefficient score on the test dataset.
This study contributes to image analysis in the field of neuroscience, allowing region-specific quantitation of image features by means of a deep learning approach. In the case of measuring plaque loads in AD transgenic mice, this approach offers consistent and unbiased selection of measurement ROIs, using a documented and reproducible algorithmic technique. This has the potential to improve reproducibility and inter-study comparison, as well as reducing the intrinsic variation in current manual ROI selection. The aim of this refinement is to increase the statistical power of studies using tissue analysis, with the goal of more reliably detecting effects, and reducing the need for large cohorts of experimental animals. Going forward from this study, the authors will work on enhancing the framework in the domains of image transformation, machine learning, neuroanatomy, and multi-resolution imaging. The aim of this future research is to contribute techniques and skills which could be adopted in medical imaging, image recognition, and artificial intelligence applications.

DATA AND CODE AVAILABILITY
The data and code used in this study are publicly available from the following link: https://github.com/boguss1225/image-segmentation-keras