Skip Connection U-Net for White Matter Hyperintensities Segmentation From MRI

White matter hyperintensity (WMH) is associated with various aging and neurodegenerative diseases. In this paper, we proposed and validated a fully automatic system that integrates classical image processing and a deep neural network for segmenting WMH from fluid-attenuated inversion recovery (FLAIR) and T1 magnetic resonance (MR) images. In this system, a novel skip connection U-net (SC U-net) was proposed. In addition, an atlas-based method was introduced in the preprocessing stage to remove non-brain tissues (i.e., skull-stripping) and thereby improve segmentation accuracy. The proposed system was validated on a dataset of 60 paired images using cross-scanner validation. Our experimental results revealed the effectiveness of the skull-stripping strategy. More importantly, compared with two existing state-of-the-art methods for segmenting WMH, a U-net-like method and another deep learning method, the proposed SC U-net converged faster, reached a lower loss and achieved higher segmentation accuracy. Both quantitative and qualitative analyses (via visual examination) revealed the superior performance of the proposed SC U-net. The mean Dice score of the proposed SC U-net was 78.36%, higher than those of the U-net-like method (74.99%) and the alternative deep learning method (74.80%). The software environment and model of the proposed system were made publicly accessible on DockerHub.


I. INTRODUCTION
White matter hyperintensities (WMH), also known as leukoaraiosis, are brain areas of increased signal intensity, indicating macroscopic changes in brain tissue induced by white matter damage [1].

A. CLINICAL MOTIVATION
WMH are characteristic of aging and of neurodegenerative diseases that target the white matter, including stroke, dementia and multiple sclerosis (MS) [2]-[5]. These diseases may cause irreversible damage to the human brain [6], [7]. Although their etiologies are not yet fully understood, there is considerable evidence that they are related to WMH [3], [8]. The white matter is primarily composed of myelinated axons, and Trapp and colleagues concluded that MS is an immune-mediated demyelinating disease [9]. A systematic review and meta-analysis also showed that WMH may be an important biomarker for predicting the risk of stroke, dementia and mortality [3]. In addition, since WMH often occur in the preclinical stage of dementia, the presence of WMH may increase the likelihood of progressing from mild cognitive impairment (MCI) to dementia [10]. Most of the risk factors associated with WMH have been shown to be similar to those associated with cerebrovascular and cardiovascular diseases, including elevated blood pressure, diabetic atherosclerosis and homocysteine levels [11]. WMH can be observed on magnetic resonance images (MRIs), especially on images acquired with the fluid-attenuated inversion recovery (FLAIR) sequence [6]. WMH volume has been shown to correlate with symptom severity, disability progression and clinical outcomes [12]-[14]. Because WMH volume can be measured non-invasively and quantitatively, it can aid the design of treatment plans. Accordingly, quantifying WMH volume has become a focus of clinical research, and a prerequisite for it is segmenting the WMH (see Fig. 1). Manual delineation of WMH is reliable and accurate, but it is labor-intensive, time-consuming and subjective [15]. There is therefore a pressing need for automatic WMH segmentation methods.

The associate editor coordinating the review of this manuscript and approving it for publication was Muhammad E. H. Chowdhury.

B. RELATED WORK
Over the last few years, various automatic WMH segmentation methods have been proposed. Depending on whether expert-annotated data are used, these methods can be broadly classified into unsupervised methods and supervised methods.
Unsupervised methods do not require annotated WMH masks at any stage of the segmentation process. Since WMH appear brighter than healthy brain tissue on FLAIR images, many unsupervised methods use thresholding to segment WMH. For instance, Jack et al. [16] segmented WMH with an optimal FLAIR intensity threshold derived from the image histogram. Gibson et al. [17] proposed a method using a conservative FLAIR threshold followed by two-class fuzzy C-means (FCM) clustering. Yoo et al. [18] introduced an optimal intensity threshold that varies with WMH volume.
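To make the thresholding idea concrete, the sketch below flags brain voxels whose FLAIR intensity exceeds the brain mean by more than k standard deviations. This global rule and the default k = 2.5 are illustrative assumptions; they are not the specific threshold selection of [16]-[18], which derive or adapt the threshold from the histogram or lesion volume.

```python
import numpy as np

def threshold_wmh(flair: np.ndarray, brain_mask: np.ndarray, k: float = 2.5) -> np.ndarray:
    """Flag brain voxels brighter than mean + k * std of the brain intensities.

    Hypothetical global rule for illustration only; published methods choose
    the threshold from the histogram or adapt it per subject.
    """
    inside = brain_mask > 0
    brain = flair[inside]
    threshold = brain.mean() + k * brain.std()
    # Voxels outside the brain mask are never labelled as WMH.
    return (flair > threshold) & inside
```

A single global threshold is simple but sensitive to scanner-dependent intensity ranges, which is one motivation for the normalization and learning-based approaches discussed below.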
Most other unsupervised methods are probabilistic models, usually designed as hybrid models [19]. For example, Van Leemput et al. [20] proposed a weighted expectation-maximization (EM) model to detect MS lesions from a large dataset of T1-, T2- and proton density (PD)-weighted scans. This was an early attempt to develop an atlas-based technique, and the total lesion load correlated more strongly with the automated segmentation results than with the manual delineations. Schmidt et al. [21] proposed a lesion growth algorithm (LGA) for WMH segmentation using a parametric mixture model with an embedded Markov random field (MRF). Schwarz et al. [22] employed a Bayesian MRF model with lognormal distributions to detect white matter and WMH. Wu et al. [23] proposed a multi-atlas based method for simultaneously detecting and localizing WMH on FLAIR data. Although these unsupervised methods obtained promising results, some of their parameters may be inadvertently over-tuned to particular datasets.
In contrast to the unsupervised methods above, many supervised methods train classifiers on manually delineated data to segment WMH. Early studies usually used k-nearest neighbours (k-NN) classifiers for voxel-wise WMH classification. Subsequently, other models were adopted. For instance, Dadar et al. [24] trained a linear model on spatial and intensity features from multiple MRI contrasts and on manually labeled data to predict lesions.
Recently, with the development of deep learning, especially convolutional neural networks (CNNs) in computer vision, CNNs have been introduced into biomedical image processing [25]-[27]. For WMH segmentation, many CNN-based methods have been proposed and have yielded promising results [28]-[35]. Specifically, Xu et al. [31] applied transfer learning by taking a Visual Geometry Group (VGG) model pre-trained on ImageNet for natural image classification and fine-tuning it on WMH data. Guerrero et al. [34] trained a CNN with large, high-resolution image patches and differentiated between WMH-related pathology and stroke. Among these CNN methods, one of the best performing for WMH is the U-net-like method proposed in [30]. U-net has been widely used in biomedical image segmentation because it works efficiently even with limited training samples [36]. In [30], three U-net-like models with identical architectures but different random weight initializations were ensembled, and this ensemble ranked first in a WMH segmentation challenge [37]. Although the U-net-like model in [30] performs well, its connections link only some layers of the down-convolutional part to those of the up-convolutional part, which can reduce segmentation accuracy and slow convergence. To alleviate these two issues, we build on the U-net-like model of [30] and propose a new U-net-like architecture, the skip connection U-net (SC U-net), which uses a novel skip connection strategy.

C. CONTRIBUTION
In this paper, we propose a novel fully automatic system for WMH segmentation. The main contribution of this work is a new skip connection U-net model, SC U-net, that classifies and localizes WMH. This architecture consists of a contracting part that captures context, a symmetric expansive part that gradually combines features to enable precise localization, and a skip connection part that alleviates the vanishing-gradient problem and hence improves the convergence speed of optimization. Another contribution is the introduction of a preprocessing step, skull-stripping, to improve WMH segmentation. In addition, we encapsulate our method and all related dependencies into a Docker image that is publicly available on DockerHub¹. To stabilize training of the neural network, we apply MSRA initialization [38].

This paper is organized as follows. In Section II, the flowchart of our method is depicted, and the data processing procedures and SC U-net are detailed. The datasets, evaluation criteria and five metrics for assessing segmentation performance are presented in Section III. Evaluations of the proposed method on a public training dataset using leave-one-out testing and cross-scanner testing are presented in Section IV. Finally, Section V discusses the advantages of the proposed method and how the key parameters are optimized.

II. METHOD
As shown in Fig. 2, our pipeline for WMH segmentation consists of a data preprocessing procedure, an SC U-net training procedure and a testing procedure. Data preprocessing consists of four steps applied in sequence: skull stripping, uniform sizing, Gaussian normalization and data augmentation. The SC U-net is then trained on the preprocessed training data. In the testing procedure, the testing data undergo the same preprocessing steps except data augmentation and are fed into the trained SC U-net to obtain the final segmentation results.

A. DATA PREPROCESSING
To reduce false positives caused by scalp fat, which has high intensity in both T1 and FLAIR images, to normalize image size and voxel intensity, and to equip the model with the desired invariance and robustness, we apply the following preprocessing steps in sequence.

1) SKULL STRIPPING
To separate the brain from non-brain tissues such as the skull and eyes, we use the Brain Extraction Tool (BET) in the FMRIB Software Library (FSL)² to generate brain masks from the T1 images, and then use these masks to extract the brain [39].
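Once BET has produced a binary mask (for example, `bet T1.nii.gz T1_brain -m` also writes `T1_brain_mask.nii.gz`), applying it reduces to zeroing every voxel outside the mask. The sketch below assumes the mask and the image to be stripped (e.g. a FLAIR volume) are already loaded as NumPy arrays on the same voxel grid; file loading and any T1-to-FLAIR registration are omitted and are assumptions here.

```python
import numpy as np

def apply_brain_mask(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Zero out non-brain voxels using a binary brain mask.

    Assumes `mask` (e.g. the *_mask output of FSL BET run on the T1 image)
    lies on the same voxel grid as `image`.
    """
    if image.shape != mask.shape:
        raise ValueError(f"shape mismatch: {image.shape} vs {mask.shape}")
    # Keep intensities inside the brain, set everything else to 0.
    return np.where(mask > 0, image, 0).astype(image.dtype)
```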

2) UNIFORM SIZING
The sizes of the axial slices vary across datasets. We set the target image size to 200 × 200; slices smaller than the target are automatically padded, and slices larger than the target are automatically cropped.
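A minimal sketch of the uniform-sizing step. The text does not say how padding is split between the two sides or which fill value is used, so centred zero padding and centred cropping are assumptions.

```python
import numpy as np

def pad_or_crop(slice_2d: np.ndarray, target=(200, 200)) -> np.ndarray:
    """Centre-pad with zeros or centre-crop a 2D axial slice to `target`."""
    out = slice_2d
    for axis, size in enumerate(target):
        current = out.shape[axis]
        if current < size:
            # Split the missing rows/columns as evenly as possible.
            before = (size - current) // 2
            after = size - current - before
            pad = [(0, 0)] * out.ndim
            pad[axis] = (before, after)
            out = np.pad(out, pad, mode="constant", constant_values=0)
        elif current > size:
            # Keep the central `size` rows/columns.
            start = (current - size) // 2
            out = np.take(out, np.arange(start, start + size), axis=axis)
    return out
```

Each axis is handled independently, so a 180 × 256 slice is padded vertically and cropped horizontally in one call.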

3) GAUSSIAN NORMALIZATION
MRI scans are often collected under different acquisition parameters or protocols, which may result in large variations in intensity ranges. Gaussian normalization is applied to rescale the voxel intensities so that each 3D image has a similar intensity range and overall brightness:
$$I_{\mathrm{norm}}(x, y, z) = \frac{I(x, y, z) - \mu}{\sigma},$$
where $I(x, y, z)$ is the intensity value at location $(x, y, z)$, and $\mu$ and $\sigma$ are the mean and standard deviation computed across all voxels of the image of interest.
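Gaussian (z-score) normalization, $(I - \mu)/\sigma$ with statistics taken over all voxels of the volume, can be sketched as follows. The small `eps` term is an added safeguard against constant images and is not part of the text.

```python
import numpy as np

def gaussian_normalize(volume: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Z-score a 3D volume using its own mean and standard deviation.

    `eps` is an assumption (not in the text) that avoids division by zero
    for constant images.
    """
    mu = volume.mean()       # mean over all voxels of this image
    sigma = volume.std()     # standard deviation over all voxels
    return (volume - mu) / (sigma + eps)
```

Because the statistics are computed over all voxels, the zeroed background left by skull-stripping also contributes to $\mu$ and $\sigma$; restricting them to brain voxels would be a different design choice.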

4) DATA AUGMENTATION
Data augmentation is an effective way to equip a deep network with the desired invariance and robustness properties when training data are limited. Rotation, shearing and scaling are applied to each axial slice, which expands our dataset fourfold.
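A NumPy-only sketch of rotation, shearing and scaling applied jointly to a slice and its label mask. The parameter ranges, the use of nearest-neighbour sampling and the reading of "fourfold" as the original plus three augmented copies are all assumptions, not details from the text.

```python
import numpy as np

def affine_nearest(img, matrix):
    """Resample a 2D array through an affine map about the image centre.

    `matrix` (2x2) maps output coordinates to input coordinates (inverse
    mapping); nearest-neighbour sampling, zero fill outside the image.
    """
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    yy, xx = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    coords = np.stack([yy.ravel() - cy, xx.ravel() - cx])   # 2 x (h*w), centred
    src = matrix @ coords
    sy = np.rint(src[0] + cy).astype(int)
    sx = np.rint(src[1] + cx).astype(int)
    valid = (sy >= 0) & (sy < h) & (sx >= 0) & (sx < w)
    out = np.zeros(h * w, dtype=img.dtype)
    out[valid] = img[sy[valid], sx[valid]]
    return out.reshape(h, w)

def random_affine(rng, max_rot_deg=15.0, max_shear=0.1, scale_range=(0.9, 1.1)):
    """Random inverse map combining rotation, shear and isotropic scaling.

    The ranges are illustrative assumptions, not values from the paper.
    """
    theta = np.deg2rad(rng.uniform(-max_rot_deg, max_rot_deg))
    rotation = np.array([[np.cos(theta), -np.sin(theta)],
                         [np.sin(theta), np.cos(theta)]])
    shear = np.array([[1.0, rng.uniform(-max_shear, max_shear)], [0.0, 1.0]])
    scale = rng.uniform(*scale_range)
    return rotation @ shear / scale

def expand_fourfold(img, mask, rng):
    """Return the original pair plus three augmented copies (4x in total).

    Image and mask share each transform so labels stay aligned; nearest
    neighbour keeps the mask binary.
    """
    pairs = [(img, mask)]
    for _ in range(3):
        m = random_affine(rng)
        pairs.append((affine_nearest(img, m), affine_nearest(mask, m)))
    return pairs
```

In practice the intensity image would usually be resampled with bilinear interpolation and only the mask with nearest neighbour; a single sampler is used here to keep the sketch short.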
We build an SC U-net based on the work of Li et al. [30] and Ronneberger et al. [36]. As shown in Fig. 2, our SC U-net consists of a U-net part and a skip connection part. Similar to the U-net model proposed in [36], the U-net part consists of a down-convolution part, which extracts features for classifying each voxel as WMH or non-WMH, and an up-convolution part, which localizes WMH more precisely. Each step of the down-convolution part consists of two 3 × 3 convolution layers, each followed by a rectified linear unit (ReLU), and a 2 × 2 max-pooling layer for down-sampling. In each step of the up-convolution part, the feature map is up-sampled and then passed through two 3 × 3 convolution layers, each followed by a ReLU. Concatenations are performed between the down-convolution part and the up-convolution part, shown as gray lines in Fig. 2. Random initialization is used to initialize the model weights.
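Why a cropping operation is needed when encoder and decoder features are combined can be seen by tracing spatial sizes. The sketch assumes "same"-padded convolutions and four 2 × 2 pooling steps on a 200 × 200 input; the depth is an assumption, since it is not stated here.

```python
def trace_sizes(size: int = 200, depth: int = 4):
    """Trace spatial sizes through 'same'-padded convs and 2x2 max pooling.

    Returns the encoder sizes and, per decoder step, a tuple
    (up-sampled size, matching encoder size, pixels the encoder map exceeds it by).
    """
    enc = [size]
    for _ in range(depth):
        enc.append(enc[-1] // 2)      # 2x2 max pooling floors odd sizes
    dec = []
    current = enc[-1]
    for skip in reversed(enc[:-1]):
        up = current * 2              # 2x up-sampling
        dec.append((up, skip, skip - up))
        current = min(up, skip)       # encoder map is cropped to match
    return enc, dec

enc, dec = trace_sizes()
print(enc)   # encoder sizes
print(dec)   # (up-sampled, encoder, excess) per decoder step
```

Under these assumptions the odd size 25 is floored to 12, so every up-sampled decoder map is smaller than the encoder map it meets, and the encoder map must be cropped before it can be concatenated or added.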
Inspired by the work of [40], we propose a novel skip connection between the up-convolution part and the down-convolution part to increase convergence speed and improve segmentation performance. It consists of a cropping operation and a convolution operation (orange arrows with an addition sign in Fig. 2). Specifically, the outputs of the 4th, 7th, 10th and 13th layers in the down-convolution part are respectively connected to the outputs of the 15th, 18th, 21st and 24th layers in the up-convolution part, and the results serve as the inputs of the 16th, 19th, 22nd and 25th layers.
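The skip path described above (crop, convolve, then add to a decoder feature map) can be sketched in NumPy. Assumptions not stated in the text: channels-last arrays, centred cropping, and a 1 × 1 convolution used to match channel counts; the actual kernel size on the skip path may differ.

```python
import numpy as np

def center_crop(x: np.ndarray, h: int, w: int) -> np.ndarray:
    """Centre-crop a (H, W, C) feature map to (h, w, C)."""
    top = (x.shape[0] - h) // 2
    left = (x.shape[1] - w) // 2
    return x[top:top + h, left:left + w, :]

def additive_skip(encoder_feat: np.ndarray, decoder_feat: np.ndarray,
                  kernel: np.ndarray) -> np.ndarray:
    """Return decoder_feat + conv1x1(crop(encoder_feat)).

    `kernel` has shape (C_enc, C_dec); a 1x1 convolution is a per-pixel
    matrix product over the channel axis.
    """
    h, w, _ = decoder_feat.shape
    cropped = center_crop(encoder_feat, h, w)
    projected = cropped @ kernel          # (h, w, C_enc) @ (C_enc, C_dec)
    return decoder_feat + projected       # element-wise addition
```

Unlike the concatenations of the U-net part, the addition keeps the channel count of the decoder map unchanged; for a 1 × 1 kernel, cropping before or after the convolution gives the same result.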

2) THE INFLUENCE OF SKIP CONNECTION
Skip connections are extra connections between nodes in different layers of a neural network [41], [42]. To gain an analytic understanding of skip connections, we first consider a linear network. In an $L$-layer linear plain network, the input-output mapping is given by
$$x_L = W_{L-1} W_{L-2} \cdots W_1 \, x_1,$$
where $x_1$ denotes the input, $x_L$ denotes the output and $W_i$ ($i = 1, \ldots, L-1$) denotes the weights used to perform a linear projection that matches the number of feature maps at layer $i$ to that at layer $i+1$. In a linear network with skip connections between adjacent layers, the input-output mapping becomes
$$x_L^{SC} = (W_{L-1} + I)(W_{L-2} + I) \cdots (W_1 + I)\, x_1,$$
where $x_L^{SC}$ denotes the output of the $L$-th layer under the action of skip connections and $I$ denotes the identity matrix.
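A small numerical illustration of the linear case: compose $x_L = W_{L-1}\cdots W_1 x_1$ (plain) and $x_L = (W_{L-1}+I)\cdots(W_1+I)x_1$ (skip) with small random weights. The depth (10 layers), width (4 feature maps) and weight scale (0.05) are arbitrary choices for illustration. Since back-propagated gradients with respect to $x_1$ pass through the transpose of this product, the same contrast applies to gradients.

```python
import numpy as np

def end_to_end_map(weights, skip: bool) -> np.ndarray:
    """Compose linear layers x_{i+1} = W_i x_i (plain) or (W_i + I) x_i (skip)."""
    n = weights[0].shape[0]
    total = np.eye(n)
    for w in weights:                        # W_1 is applied first
        layer = w + np.eye(n) if skip else w
        total = layer @ total
    return total

rng = np.random.default_rng(0)
weights = [0.05 * rng.standard_normal((4, 4)) for _ in range(10)]
plain = np.linalg.norm(end_to_end_map(weights, skip=False))
with_skip = np.linalg.norm(end_to_end_map(weights, skip=True))
print(f"plain: {plain:.2e}, with skip: {with_skip:.2f}")
```

The plain product shrinks towards zero as layers are added, whereas the skip-connected product stays close to the identity because each factor is a small perturbation of $I$.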
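In the nonlinear case, the input-output mapping with skip connections between adjacent layers has the form
$$x_{i+1}^{SC} = f\big((W_i + I)\, x_i^{SC}\big), \quad i = 1, \ldots, L-1,$$
where $f$ denotes the ReLU nonlinearity and $I$ denotes the identity matrix having the same size as the input. To learn more features across different layers, we introduce skip connections between some of the symmetric layers rather than between adjacent layers. The skip connection operation can be computed as
$$\tilde{x}_j = x_j + \mathrm{Crop}(W_{ij}\, x_i),$$
where $x_i$ is the output of layer $i$ in the down-convolution part, $x_j$ is the output of its symmetric layer $j$ in the up-convolution part, $W_{ij}$ is the convolution on the skip path, $\mathrm{Crop}(\cdot)$ is the cropping operation, and $\tilde{x}_j$ is the input to layer $j+1$. Skip connections are also effective at mitigating vanishing gradients by creating short paths from the top layers to the bottom layers [44]; by improving the flow of information and gradients, they make the network easier to train. Consider the example in Fig. 3, where $X$ denotes the input of the first layer and, after two convolution layers $c_1$ and $c_2$, the output is $X_1$. During back-propagation, a layer receives gradients from every layer it is connected to. To update the parameters $\theta_2$ of $c_2$, the derivative of the loss function $J$ with respect to $\theta_2$ is calculated as
$$\frac{\partial J}{\partial \theta_2} = \frac{\partial J}{\partial X_1}\frac{\partial X_1}{\partial \theta_2} + \frac{\partial J}{\partial X_2}\frac{\partial X_2}{\partial \theta_2}, \tag{7}$$
where $X_1$ and $X_2$ are the same tensor (the output of $c_2$), fed to the next layer and to the skip connection, respectively. We can further formulate (7) as
$$\frac{\partial J}{\partial \theta_2} = \frac{\partial J}{\partial X_1}\frac{\partial X_1}{\partial \theta_2} + \frac{\partial J}{\partial X_4}\frac{\partial X_4}{\partial X_2}\frac{\partial X_2}{\partial \theta_2}. \tag{8}$$
Without the skip connection, only the first term is computed, and its magnitude may become very small after back-propagating through many layers, so the gradient is more likely to vanish. The second term, $\frac{\partial J}{\partial X_4}\frac{\partial X_4}{\partial X_2}\frac{\partial X_2}{\partial \theta_2}$, carries a larger component because it passes through fewer layers. These analyses suggest that the proposed skip connection helps update the weights of the bottom layers, alleviates gradient vanishing and thereby accelerates the convergence of the network.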
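A toy scalar version of the two-path gradient argument, in which every "layer" is a multiplication by a scalar. Assumptions (not from the text): the output of $c_2$ is $\theta_2 x$, the plain path from $X_1$ to $X_4$ consists of k = 8 layers that each scale by a = 0.3, the skip adds $X_2$ directly to $X_4$, and $J = \tfrac{1}{2}X_4^2$. The two chain-rule terms should sum to a finite-difference derivative, with the plain-path term far smaller.

```python
def grads(theta, x=1.0, a=0.3, k=8):
    """Return (plain-path term, skip-path term) of dJ/dtheta for the toy model."""
    h = theta * x                   # X_1 = X_2: output of c_2
    deep = (a ** k) * h             # plain path: k layers, each scaling by a
    x4 = deep + h                   # skip connection adds X_2 to the deep path
    dj_dx4 = x4                     # J = 0.5 * x4**2
    plain_term = dj_dx4 * (a ** k) * x   # dJ/dX_1 * dX_1/dtheta (through the deep path)
    skip_term = dj_dx4 * 1.0 * x         # dJ/dX_4 * dX_4/dX_2 * dX_2/dtheta
    return plain_term, skip_term

def loss(theta, x=1.0, a=0.3, k=8):
    h = theta * x
    return 0.5 * ((a ** k) * h + h) ** 2

pt, st = grads(2.0)
eps = 1e-6
numeric = (loss(2.0 + eps) - loss(2.0 - eps)) / (2 * eps)
print(pt, st, numeric)
```

With these settings the plain-path term is roughly $a^k \approx 6.6 \times 10^{-5}$ times the skip-path term, which is the effect the argument above relies on.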

3) DICE LOSS
In general, a loss function maps the values of one or more variables to a real number and assigns a penalty to incorrect predictions. In WMH segmentation, WMH voxels (positives) are vastly outnumbered by non-WMH voxels (negatives). The authors of [45] proposed a loss function based on the Dice coefficient and obtained good performance on a highly unbalanced dataset, so we use it to train the proposed SC U-net. Let $G = \{g_1, \cdots, g_N\}$ be the ground-truth masks over $N$ slices and $P = \{p_1, \cdots, p_N\}$ the corresponding predicted maps over those $N$ slices. The Dice loss function can be expressed as
$$L_{\mathrm{Dice}} = 1 - \frac{2\sum_{n=1}^{N} |g_n \circ p_n| + s}{\sum_{n=1}^{N} |g_n| + \sum_{n=1}^{N} |p_n| + s},$$
where $\circ$ denotes the entrywise product of two matrices and $|\cdot|$ denotes the sum of matrix entries. The term $s$ is a smoothing factor that avoids division by zero when $G$ and $P$ are all zeros.
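A NumPy sketch of a smoothed Dice loss of the form $1 - (2\sum_n|g_n \circ p_n| + s)/(\sum_n|g_n| + \sum_n|p_n| + s)$. The default $s = 1$ is an assumption; in training, the same expression would be written with TensorFlow/Keras tensor operations so that it is differentiable.

```python
import numpy as np

def dice_loss(g: np.ndarray, p: np.ndarray, s: float = 1.0) -> float:
    """Soft Dice loss over a batch of N slices.

    g: binary ground-truth masks, shape (N, H, W)
    p: predicted probability maps in [0, 1], same shape
    s: smoothing term (the value 1.0 is an assumption)
    """
    intersection = np.sum(g * p)          # sum over slices of |g_n o p_n|
    denom = np.sum(g) + np.sum(p)         # sum of |g_n| plus sum of |p_n|
    return 1.0 - (2.0 * intersection + s) / (denom + s)
```

Because the loss is a ratio of overlap to total foreground, the many true-negative background voxels do not dominate it, which is why it suits the unbalanced WMH setting.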

C. IMPLEMENTATION
The proposed method is implemented in Python using Keras with the TensorFlow backend. All experiments are conducted on a Linux machine running Ubuntu 16.04 with 32 GB of RAM, and the model is trained on two GTX 1080 Ti GPUs. The Adam optimizer is employed with an initial learning rate of 0.001, and the batch size is set to 30 × 30. MSRA initialization [38] is used to initialize the model weights.
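MSRA (He) initialization draws zero-mean weights with standard deviation $\sqrt{2/\text{fan\_in}}$, where fan\_in is the kernel area times the number of input channels. The NumPy sketch below uses a plain normal distribution as in the original formulation; Keras's built-in `he_normal` initializer targets the same variance but samples from a truncated normal.

```python
import numpy as np

def msra_normal(kernel_size: int, in_channels: int, out_channels: int,
                rng: np.random.Generator) -> np.ndarray:
    """He/MSRA initialisation for a conv kernel: N(0, 2 / fan_in)."""
    fan_in = kernel_size * kernel_size * in_channels
    std = np.sqrt(2.0 / fan_in)
    # Keras-style kernel layout: (k, k, C_in, C_out).
    return rng.normal(0.0, std, size=(kernel_size, kernel_size, in_channels, out_channels))
```

The factor 2 compensates for ReLU zeroing roughly half of the activations, which keeps activation variance approximately constant across layers.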

III. MATERIAL AND EVALUATION CRITERIA
A. MICCAI WMH SEGMENTATION CHALLENGE DATASET
In this work, all data came from the MICCAI WMH challenge³. Sixty cases from three centers were released as a publicly available benchmark dataset for researchers. Characteristics of the data are summarized in Table 1.

B. EVALUATION
1) DICE SIMILARITY COEFFICIENT (DSC)
DSC is a statistic for comparing the similarity of two sets, defined as
$$\mathrm{DSC} = \frac{2|G \cap S|}{|G| + |S|},$$
where $G$ denotes the gold-standard segmentation of WMH, $S$ denotes the corresponding automatic segmentation, $|G \cap S|$ denotes the overlap of $G$ and $S$, and $|\cdot|$ denotes the number of voxels in a set.
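For binary masks, DSC $= 2|G \cap S|/(|G| + |S|)$ can be computed directly with NumPy. Returning 1.0 when both masks are empty is an assumption; the definition itself is undefined in that case.

```python
import numpy as np

def dice_similarity(g: np.ndarray, s: np.ndarray) -> float:
    """DSC = 2|G ∩ S| / (|G| + |S|) for binary masks."""
    g = g.astype(bool)
    s = s.astype(bool)
    denom = g.sum() + s.sum()
    if denom == 0:
        return 1.0   # assumption: two empty masks count as perfect agreement
    return float(2.0 * np.logical_and(g, s).sum() / denom)
```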

2) HAUSDORFF DISTANCE USING 95TH PERCENTILE (H95)
The Hausdorff distance measures how far apart two subsets of a metric space are. It is defined as
$$H(G, S) = \max\Big\{\sup_{x \in G}\inf_{y \in S} d(x, y),\ \sup_{y \in S}\inf_{x \in G} d(x, y)\Big\},$$
where $d(x, y)$, $\sup$ and $\inf$ respectively denote the distance between $x$ and $y$, the supremum and the infimum. To avoid potential issues induced by noisy segmentations, a robust version, H95, is used, which takes the 95th percentile of the distances instead of the maximum distance.

³ https://wmh.isi.uu.nl/
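A brute-force sketch of H95 for small binary masks: compute every foreground-to-foreground distance, take the 95th percentile of the nearest-neighbour distances in each direction, and keep the larger value. It uses all foreground voxels and an $O(|G|\cdot|S|)$ distance matrix; the official challenge evaluation may differ in which voxels are used and how it scales to full volumes.

```python
import numpy as np

def hausdorff_95(g: np.ndarray, s: np.ndarray, spacing=(1.0, 1.0, 1.0)) -> float:
    """95th-percentile symmetric Hausdorff distance between two 3D binary masks.

    `spacing` converts voxel indices to physical units (e.g. mm).
    """
    pg = np.argwhere(g) * np.asarray(spacing)
    ps = np.argwhere(s) * np.asarray(spacing)
    if len(pg) == 0 or len(ps) == 0:
        return float("inf")
    d = np.linalg.norm(pg[:, None, :] - ps[None, :, :], axis=-1)  # |G| x |S|
    g_to_s = d.min(axis=1)   # for each x in G: distance to nearest y in S
    s_to_g = d.min(axis=0)   # for each y in S: distance to nearest x in G
    return float(max(np.percentile(g_to_s, 95), np.percentile(s_to_g, 95)))
```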

3) AVERAGE VOLUME DIFFERENCE (AVD)
AVD quantifies the difference between the volumes of two subsets. It is defined as
$$\mathrm{AVD} = \frac{|V_S - V_G|}{V_G} \times 100\%,$$
where $V_G$ and $V_S$ respectively denote the volumes of WMH in $G$ and $S$.
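AVD $= |V_S - V_G| / V_G \times 100\%$ as a short NumPy function. The voxel volume cancels out of the ratio, so it only matters if absolute volumes are also reported.

```python
import numpy as np

def average_volume_difference(g: np.ndarray, s: np.ndarray,
                              voxel_volume: float = 1.0) -> float:
    """AVD in percent: |V_S - V_G| / V_G * 100."""
    v_g = g.astype(bool).sum() * voxel_volume
    v_s = s.astype(bool).sum() * voxel_volume
    if v_g == 0:
        raise ValueError("AVD is undefined for an empty reference mask")
    return float(abs(v_s - v_g) / v_g * 100.0)
```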

4) SENSITIVITY AND F1-SCORE FOR INDIVIDUAL LESIONS
Let $N_G$ be the number of individual lesions in $G$, and let $N_P$ and $N_F$ be the numbers of correctly detected lesions and wrongly detected lesions, respectively. Each individual lesion is defined as a 3D connected component. The recall (sensitivity) is then defined as
$$\mathrm{Recall} = \frac{N_P}{N_G},$$
and the F1-score is defined as
$$F_1 = \frac{2N_P}{N_G + N_P + N_F}.$$
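A self-contained sketch of the lesion-wise metrics, using Recall $= N_P/N_G$ and $F_1 = 2N_P/(N_G + N_P + N_F)$. Assumptions: 26-connectivity for 3D components, $N_P$ counted as ground-truth lesions that overlap the segmentation, and $N_F$ as segmented components that overlap no ground-truth lesion. The official challenge script may count these differently. The pure-Python labelling is only practical for small volumes; `scipy.ndimage.label` is the usual tool for real data.

```python
import numpy as np
from collections import deque
from itertools import product

def label_components(mask):
    """Label 3D connected components (26-connectivity); returns (labels, count)."""
    mask = np.asarray(mask, dtype=bool)
    labels = np.zeros(mask.shape, dtype=int)
    offsets = [o for o in product((-1, 0, 1), repeat=3) if o != (0, 0, 0)]
    count = 0
    for start in zip(*np.nonzero(mask)):
        if labels[start]:
            continue                          # voxel already belongs to a lesion
        count += 1
        labels[start] = count
        queue = deque([start])
        while queue:                          # breadth-first flood fill
            z, y, x = queue.popleft()
            for dz, dy, dx in offsets:
                n = (z + dz, y + dy, x + dx)
                inside = all(0 <= n[i] < mask.shape[i] for i in range(3))
                if inside and mask[n] and not labels[n]:
                    labels[n] = count
                    queue.append(n)
    return labels, count

def lesion_recall_f1(g, s):
    """Lesion-wise recall = N_P / N_G and F1 = 2 N_P / (N_G + N_P + N_F)."""
    g = np.asarray(g, dtype=bool)
    s = np.asarray(s, dtype=bool)
    g_labels, n_g = label_components(g)
    s_labels, n_s = label_components(s)
    # N_P: ground-truth lesions that overlap the automatic segmentation
    n_p = sum(1 for k in range(1, n_g + 1) if s[g_labels == k].any())
    # N_F: segmented components that overlap no ground-truth lesion
    n_f = sum(1 for k in range(1, n_s + 1) if not g[s_labels == k].any())
    recall = n_p / n_g if n_g else float("nan")
    denom = n_g + n_p + n_f
    f1 = 2 * n_p / denom if denom else float("nan")
    return recall, f1
```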

IV. EXPERIMENTAL RESULTS
For the validation experiment, we randomly assigned 80% of the cases to a training set and the remaining 20% to a validation set. For the cross-scanner experiments, we used 40 subjects from two scanners as training data and the 20 subjects from the remaining scanner as testing data. Note that we used 2D axial slices rather than entire 3D images as samples. U-net is among the most widely used convolutional networks in biomedical image segmentation, and Li et al. used a U-net-like model in the same WMH segmentation challenge and obtained superior performance over other deep learning based methods [30]. We therefore compared our proposed method with the U-net-like model proposed in [30], which we refer to as U-net Li for convenience. For a fair comparison, the ensemble strategy adopted in [30] was not used.
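The two data partitions can be sketched at the case level, so that 2D slices from one subject never appear in both training and evaluation data. The site names follow the text (Utrecht, Singapore, GE3T); the case identifiers, the seed and the enumeration of all three site rotations are assumptions, since the text does not state whether every rotation was run.

```python
import random

def random_split(case_ids, train_fraction=0.8, seed=0):
    """Randomly assign cases (not slices) to training and validation sets."""
    ids = list(case_ids)
    random.Random(seed).shuffle(ids)
    n_train = int(round(train_fraction * len(ids)))
    return ids[:n_train], ids[n_train:]

def cross_scanner_splits(cases_by_site):
    """Yield (test_site, train_ids, test_ids): train on two sites, test on the third."""
    for test_site in cases_by_site:
        train = [c for site, ids in cases_by_site.items() if site != test_site for c in ids]
        yield test_site, train, list(cases_by_site[test_site])

sites = {name: [f"{name}_{i}" for i in range(20)] for name in ("Utrecht", "Singapore", "GE3T")}
for test_site, train_ids, test_ids in cross_scanner_splits(sites):
    print(test_site, len(train_ids), len(test_ids))
```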
In addition, we also compared our proposed method with another deep learning based segmentation approach proposed in [35].

A. VALIDATION EXPERIMENTAL RESULTS
The validation experiment was used to determine the network hyperparameters and to compare the convergence rates of U-net Li and SC U-net. The corresponding loss curves over 200 epochs are shown in Fig. 4, where the red and blue lines denote SC U-net and U-net Li, and the circle and star markers denote the training data and the validation data, respectively. The training loss no longer decreases significantly after about 100 epochs, so we use 100 epochs in the subsequent cross-scanner experiments. The loss of SC U-net decreases faster than that of U-net Li, and the validation loss of SC U-net is lower than that of U-net Li; a lower loss indicates a smaller difference between the network outputs and the ground truth. Fig. 5 depicts representative segmentation results with and without skull-stripping in the data preprocessing. The segmentation results are clearly affected by the skull fat, because skull fat and WMH have similar intensity profiles. Without removing the skull from the FLAIR image, the segmentation results contain false positives, as well as some false negatives because the neural network focuses more on the skull fat. Applying skull-stripping in the data preprocessing greatly improves the segmentation performance.

B. CROSS-SCANNER EXPERIMENTAL RESULTS
The cross-scanner experiments are used to qualitatively and quantitatively analyze the segmentation performance of U-net Li and SC U-net. The first two rows of Fig. 6 show representative segmentation results on the Utrecht data when the model is trained on data from the other two datasets, Singapore and GE3T.
In the segmentation results of Fig. 6, the green, red and blue areas represent true positives, false negatives and false positives, respectively. The results of SC U-net contain fewer false positives (blue areas) and false negatives (red areas) than those of U-net Li. Table 2 reports the segmentation performance of U-net Li [30] and SC U-net in terms of the five evaluation criteria described in Section III-B. DSC is the main criterion for quantitatively comparing the segmentation performance of different methods. The proposed SC U-net has a higher mean and a lower standard deviation of DSC than U-net Li, suggesting that SC U-net is more accurate and more stable. The average DSC, recall and F1 achieved by the proposed SC U-net are 78.36%, 81.49% and 70.86%, respectively, all higher than those of U-net Li (74.99%, 76.06% and 69.25%). The average H95 and AVD achieved by SC U-net are 7.36 mm and 28.23%, respectively, both lower than those of U-net Li (10.51 mm and 36.18%). These quantitative results demonstrate the effectiveness of SC U-net in terms of localization accuracy and overall lesion detection.

C. QUANTITATIVE RESULTS
Compared with the method proposed in [35], our proposed method achieves a higher mean DSC (78.36% versus 74.80%), which further demonstrates the superior performance of our segmentation method.

V. CONCLUSION
In this work, we proposed and validated a novel WMH segmentation method that combines image preprocessing and deep learning techniques. Skip connections were used to help the U-net capture more features and converge to a better optimum. Both qualitative and quantitative analyses revealed the superior performance of the proposed pipeline compared to U-net Li. The proposed method is fully automatic and has great potential to become an end-to-end computer-aided diagnosis application. In the future, we will explore more efficient and more accurate skull-stripping methods, given that FSL-based skull-stripping is neither very accurate nor computationally efficient. A plausible direction is to develop a CNN-based method to remove non-brain tissues in the preprocessing stage and then use the proposed SC U-net to automatically segment WMH.

He was thereafter appointed as an NSERC Postdoctoral Fellow of the Thunder Bay Health Science Centre, Thunder Bay, Canada, where he was conducting research in biomedical X-ray imaging until he joined Apple Inc., Cupertino, CA, in August 2011. At Apple Inc., he was involved in touch sensor development for various Apple products such as iPad and Apple Watch. He is currently a Full Professor with the School of Electronics and Information Technology, Sun Yat-Sen University, in China, and also an Adjunct Professor with the Department of Electrical and Computer Engineering, Carnegie Mellon University, USA. His current research interests include emerging applications of thin-film transistors including flat-panel X-ray imaging, fingerprint biometrics, tactile sensors, energy harvesting, and sensor interfaces.
XIAOYING TANG received the B.S. degree in control system engineering and foreign language education from the Huazhong University of Science and Technology, Wuhan, China, in 2009, the M.S. degrees in electrical and computer engineering and in applied mathematics and statistics in 2011 and 2014, respectively, and the Ph.D. degree in electrical and computer engineering in 2014, from Johns Hopkins University, USA.
From 2014 to 2015, she was a Visiting Professor in electrical and computer engineering with Carnegie Mellon University. From 2015 to 2018, she was a Tenure-Track Assistant Professor with the SYSU-CMU Joint Institute of Engineering, Sun Yat-sen University. She is currently an Assistant Professor and an Associate Researcher with the Department of Electrical and Electronic Engineering, Southern University of Science and Technology. Her research interests include medical image segmentation and registration, diffusion tensor image analysis, statistical shape analysis, manifold learning and clustering, spatiotemporal analysis, multi-modality MRI analysis, pattern recognition, machine learning, and big data in medicine.