DCSegNet: Deep Learning Framework Based on Divide-and-Conquer Method for Liver Segmentation

Image segmentation plays a vital role in medical diagnosis and intervention. Segmentation methods can be classified as fully automated, semiautomated, or manual. Manual segmentation generally yields the highest-quality results, but it is time-consuming and tedious, and it may introduce operator bias. In this work, a continuity-aware probabilistic network based on the divide-and-conquer method is proposed. The network comprises a backbone network, local segmentation networks, and a weighted network. The backbone network extracts features from the image, the local segmentation networks divide the data space, and the weighted network provides continuity-aware weights. Combining the weighted local segmentation results yields precise estimates. The proposed model was compared with several recent methods on three datasets using several performance metrics for liver segmentation, and the results show that it outperforms the compared state-of-the-art methods. The source code of this work is publicly available at https://github.com/licongsheng/DCSegNet so that others can easily reproduce the work and build their own models with the introduced mechanisms.


I. INTRODUCTION
A basic task in planning liver operations and other computer-assisted interventions, such as radiotherapy, is to detect and evaluate the liver shape in abdominal CT images [1]. Much research has focused on improving segmentation accuracy and efficiency [2]. Nonetheless, liver segmentation still faces numerous challenges, including large variations in liver shape and appearance and ambiguous boundaries [3].
Liver segmentation algorithms fall mainly into the following categories: region growing [4], graph cut [5], level set [6], and deep learning [7]. Driven by accumulated data and increased computing power, deep learning methods have recently become a major research focus, especially in machine vision tasks such as image classification [8] and segmentation [9].
The associate editor coordinating the review of this manuscript and approving it for publication was Seok-Bum Ko .
Deep convolutional neural networks (DCNNs) for image segmentation are generally categorized into two types: 2D and 3D networks [10]. 2D DCNNs have achieved favorable performance in many 2D medical image segmentation scenarios [11]-[13]. However, 2D networks do not consider the interslice correlations of the liver's 3D structure [14]. Together with the varying structure of liver data, this makes 2D CNNs ill-suited to volumetric liver segmentation. A volumetric liver segmentation algorithm should take both interslice and intraslice characteristics into account. To address this issue, 3D DCNNs have been constructed [15], [16], such as V-Net [17], DenseVNet [16], Z-Net [18], and other 3D neural networks [19]. However, they greatly increase the model complexity and the number of model parameters [17], [20]. This makes it difficult to train a 3D DCNN with limited training data and hardware resources. Other studies have sought to improve segmentation accuracy by adding loss function constraints, such as the boundary loss [20], the Hausdorff distance [21], the Mumford-Shah loss function [22], and the signed distance map [23]. These methods improve the accuracy of tissue edge segmentation in 2D slices, but they hardly account for the continuity between layers.
A 2D DCNN segmentation network extracts the features of all liver cross-sections through in-plane convolutions, which is a key cause of poor continuity between layers [24]. Moreover, the structure of cross-sectional liver tissue changes considerably along the vertical axis. As a result, it is difficult to capture both the differences between liver tissue and other tissues and the structural changes of the liver in the vertical direction. Based on these considerations, this study analyzed the structural similarity [25] of cross-sectional liver tissue across different human bodies, as shown in Fig. 1. This continuity-induced similarity is predominant in neighboring layers. It suggests that segmenting a region of high structural similarity with a local model should, in theory, give a better segmentation result. To address these difficulties, a continuity-aware probabilistic network is proposed in this work. First, a data volume is divided into several subdomains (Fig. 1 (a)). Then, a DCNN model consisting of a backbone network, local segmenters, and a weighted network is constructed. The backbone network is employed for feature extraction, and all local segmenters segment the target area from the features it extracts. The weighted network determines which local segmenter should be used for each subspace. In other words, the weighted network tells us which expert segments the current subdomain best. The proposed model has several advantages. First, the local segmentation networks explicitly model heterogeneous data according to the divide-and-conquer method [26]. Second, the weighted network captures the continuity between any two local segmentation networks. Third, the weighted network uses a probabilistic soft decision rather than a hard decision [27], so that all local segmenters contribute robust and accurate estimates. Fourth, the local segmentation networks are trained simultaneously with the weighted network, and the model is easily integrated with a deep neural network to form an end-to-end model. Our main contributions are as follows.
1) The DCSegNet model is proposed. It introduces the divide-and-conquer method to 3D data segmentation and improves segmentation accuracy through local segmentation.
2) A new data partitioning approach is proposed for training DCSegNet. Adjacent local sequences of the training data share overlapping layers to ensure continuity between adjacent sequences.
3) The proposed DCSegNet model outperforms the compared methods on three datasets.

II. PROPOSED APPROACH
A. OVERALL FRAMEWORK
The proposed liver segmentation framework based on the divide-and-conquer method is shown in Fig. 2. It consists of three parts: a backbone network, local segmentation networks, and a weighted network. The backbone network extracts features from any input image $x \in X$; each slice image is fed into a deep convolutional neural network for feature extraction. The weighted network and the local segmentation networks then produce segmentation results from the extracted features. The final segmentation is estimated as the weighted combination of all local segmentation networks. To handle the varying cross-sectional data, the training data are divided into $L$ overlapping subdomains, and each data subset is used to train the corresponding local segmenter. Let $X$ and $Y$ denote the sets of input images and segmentation results, respectively, and let $y \in Y$ be the output target of an input sample $x \in X$. The segmentation for the $l$th subdomain ($l = 1, 2, \ldots, L$) is modeled as

$$p(y \mid x, z = l) = \mathcal{N}\left(y \mid \mu_l(x), \sigma_l^2\right),$$

where $z$ is the latent variable indicating the subdomain to which $\{x, y\}$ belongs, and $\mu_l(x)$ is the segmentation result of the $l$th local segmentation network for the input sample $x$. That is, the segmentation error is modeled by a Gaussian distribution with mean $\mu_l(x)$ and variance $\sigma_l^2$.

VOLUME 8, 2020
To combine these segmentation results efficiently, a weighted network with a novel convolutional neural network structure is introduced. It generates the weight of every local segmenter. Let $\pi_l(x)$ denote the weight of the $l$th local segmentation network. For all $x \in X$, $\pi_l(x) > 0$ and $\sum_{l=1}^{L} \pi_l(x) = 1$. Liver segmentation is then carried out by modeling the conditional probability function

$$p(y \mid x) = \sum_{l=1}^{L} \pi_l(x)\, \mathcal{N}\left(y \mid \mu_l(x), \sigma_l^2\right).$$

Segmentation aims to find the mapping $g: x \to y$. The output $y$ for an input sample $x$ is therefore estimated as the expectation of this conditional distribution:

$$\hat{y} = g(x) = \mathbb{E}[y \mid x] = \sum_{l=1}^{L} \pi_l(x)\, \mu_l(x).$$

The final estimate is thus the combination of the local segmentation results, weighted by the weight functions. The following sections describe the backbone network, the local segmenter networks, and the weighted network in detail.
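The weighted combination above can be illustrated at a single pixel with a minimal standard-library sketch. It assumes the local segmenter outputs $\mu_l(x)$ are already available as numbers and obtains the weights $\pi_l(x)$ as a softmax over hypothetical gating logits. The helper names and example values are illustrative assumptions, not taken from the DCSegNet code.

```python
import math

def softmax(logits):
    """Numerically stable softmax: positive weights that sum to 1."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def gaussian_pdf(y, mean, var):
    """Density of N(y | mean, var)."""
    return math.exp(-(y - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def mixture_density(y, weights, means, variances):
    """p(y | x) = sum_l pi_l(x) * N(y | mu_l(x), sigma_l^2)."""
    return sum(w * gaussian_pdf(y, m, v) for w, m, v in zip(weights, means, variances))

def expected_output(weights, means):
    """E[y | x] = sum_l pi_l(x) * mu_l(x), the final weighted estimate."""
    return sum(w * m for w, m in zip(weights, means))

# Hypothetical per-pixel values for L = 3 local segmenters.
gate_logits = [2.0, 0.5, -1.0]   # assumed raw outputs of the weighted network
mu = [0.9, 0.6, 0.1]             # assumed local segmenter outputs mu_l(x)
pi = softmax(gate_logits)
y_hat = expected_output(pi, mu)
```

Because the weights are soft, every local segmenter contributes to every pixel. A hard decision would instead keep only the output of the segmenter with the largest weight.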

B. BACKBONE NETWORK
DeepLabv3+ extends DeepLabv3 by adding a simple yet effective decoder module that refines the segmentation results, especially along object boundaries [28]. It encodes multiscale contextual information by probing the incoming features with filters or pooling operations at multiple rates and multiple effective fields of view. It also captures sharper object boundaries by gradually recovering spatial information.
The backbone network of DCSegNet is shown in Fig. 3. It is DeepLabv3+ with the last convolution layer and the upsampling operation removed. The input of the backbone network is a single slice of a CT volume. The output consists of 256 feature maps of size 64 × 64, which serve directly as the input of both the local segmentation networks and the weighted network.
Fig. 3. Backbone network employed in this study [28].

C. LOCAL SEGMENTER
In the divide-and-conquer method, local segmenters are adopted to model the subdomain data effectively. The architecture of the local segmenter network is shown in Fig. 4. The number of local segmenters equals the number of subdomains. Each local segmenter contains two convolution layers with 1 × 1 kernels. The first convolution layer extracts three feature maps from the output of the backbone network. The local segmenter then merges these local features with the intermediate features of the neighboring subdomain segmenters to form the input of the last convolutional layer. The local segmentation networks can be regarded as a set of experts. Each expert specializes in a small segmentation subdomain, and different experts cover distinct regions. Therefore, whichever subdomain the data come from, some expert can provide the expected outcome.
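A 1 × 1 convolution acts independently at each pixel as a linear map over channels, so the local segmenter can be sketched for a single pixel. The following minimal pure-Python sketch rests on several assumptions not stated in the text:

- Each segmenter's three first-layer features are concatenated with those of its immediate left and right neighbors (only one neighbor at either end).
- A ReLU follows the first layer.
- Only 8 backbone channels are used instead of 256.

The random weights and helper names are hypothetical.

```python
import math
import random

def conv1x1(features, weights, bias):
    """A 1x1 convolution at one pixel: out[k] = sum_c weights[k][c] * features[c] + bias[k]."""
    return [sum(w * f for w, f in zip(row, features)) + b for row, b in zip(weights, bias)]

def relu(v):
    return [max(0.0, x) for x in v]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def make_layer(n_out, n_in, rng):
    """Hypothetical randomly initialised 1x1 conv parameters (learned in a real model)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)], [0.0] * n_out

def local_segmenters_at_pixel(backbone_feat, num_segmenters, rng):
    """Return mu_l(x) for every local segmenter l at one pixel, with assumed neighbour merging."""
    c = len(backbone_feat)
    first = [make_layer(3, c, rng) for _ in range(num_segmenters)]  # 3 feature maps each
    hidden = [relu(conv1x1(backbone_feat, w, b)) for w, b in first]
    outputs = []
    for l in range(num_segmenters):
        merged = list(hidden[l])
        if l > 0:
            merged += hidden[l - 1]          # left neighbour's intermediate features
        if l < num_segmenters - 1:
            merged += hidden[l + 1]          # right neighbour's intermediate features
        w2, b2 = make_layer(1, len(merged), rng)
        outputs.append(sigmoid(conv1x1(merged, w2, b2)[0]))  # sigmoid activation per expert
    return outputs

rng = random.Random(0)
feat = [rng.gauss(0.0, 1.0) for _ in range(8)]  # stand-in for the 256 backbone channels
mu = local_segmenters_at_pixel(feat, num_segmenters=5, rng=rng)
```

In a real model, these per-pixel maps would be applied to whole 64 × 64 feature maps by a convolution layer, with parameters shared across pixels.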
To further model the continuity of liver labels, the segmentation regions of the local segmenters are densely overlapped. Neighboring local segmenters share a large overlap in specific regions and are therefore highly similar. As a result, several segmenters are responsible for segmenting the liver in each subdomain, which allows ensemble learning to produce precise segmentation results.
A fully connected layer is employed to densely overlap the local segmenters. A sigmoid function is used as the activation function, and the activation value of every local segmenter is mapped into the corresponding subdomain space as that expert's output. Following the discussion above, $\mu_l(x)$ denotes the result of the $l$th local segmentation network, and the resulting segmentation loss is given by the following:

D. WEIGHTED NETWORK
The proposed approach needs a weighted network to determine the weights of the local segmentation networks. The weighted network is designed so that it and the local segmentation networks cooperate within the divide-and-conquer structure. Tree structures are widely used to realize the divide-and-conquer principle hierarchically. For instance, in the computer vision and machine learning communities, decision trees are frequently used as classifiers in coarse-to-fine decision-making. In this study, the weighted network contains one convolution layer and three fully connected layers. The weights $\{\omega_i\}$, $i = 1, 2, \ldots, L$, are regressed from the features extracted by the backbone network. A SoftMax activation ensures that they satisfy

$$\omega_i(x) > 0, \qquad \sum_{i=1}^{L} \omega_i(x) = 1.$$

The KL divergence [29] is used as the loss term for training the weighted network:

$$\mathcal{L}_{KL} = D_{KL}\left(z \,\|\, \omega(x)\right) = \sum_{l=1}^{L} z_l \log \frac{z_l}{\omega_l(x)},$$

where $\omega_l(x)$ is the weight of the $l$th subdomain regressed by the weighted network for the input $x$. The vector $z = (z_1, \ldots, z_L)$ is a one-hot vector indicating the subdomain to which $x$ belongs, with the convention $0 \log 0 = 0$.
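For a one-hot target, the KL divergence above reduces to the cross-entropy $-\log \omega_l(x)$ of the true subdomain. The minimal standard-library sketch below shows this on hypothetical gating logits. The 0-based indexing (labels 1 to L in the text) and the small `eps` guard are implementation assumptions.

```python
import math

def softmax(logits):
    """SoftMax activation: weights omega_i > 0 that sum to 1."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl_loss(z, omega, eps=1e-12):
    """D_KL(z || omega) = sum_l z_l * log(z_l / omega_l), skipping terms with z_l = 0."""
    return sum(zl * math.log(zl / max(ol, eps)) for zl, ol in zip(z, omega) if zl > 0)

def one_hot(index, length):
    """Subdomain label as a one-hot vector; index is 0-based here."""
    return [1.0 if i == index else 0.0 for i in range(length)]

omega = softmax([0.2, 1.5, -0.3, 0.0, 0.4])  # hypothetical gating logits, L = 5
z = one_hot(1, 5)                           # slice assumed to belong to subdomain 2
loss = kl_loss(z, omega)                    # equals -log(omega[1])
```

Because of this equivalence, a standard cross-entropy layer over the subdomain labels can implement the same training signal.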

E. LOSS FUNCTION
Finally, the local segmenters and the weighted network are learned jointly by defining the total loss as

$$\mathcal{L} = \mathcal{L}_{seg} + \lambda\, \mathcal{L}_{KL},$$

where $\mathcal{L}_{seg}$ is the local segmentation loss and $\mathcal{L}_{KL}$ is the KL loss of the weighted network. The parameter $\lambda$ balances the importance of the gating task against the regression task; $\lambda = 1$ is used in this study. The proposed network can easily be implemented with the fully connected, sigmoid, and softmax layers available in standard deep learning frameworks. Moreover, because the network is fully differentiable, it can be embedded in any deep convolutional neural network. This allows end-to-end training while learning a better feature representation.

III. EXPERIMENTS
A. DATASET
The data used in this work come from three datasets:

- the Liver Tumor Segmentation Challenge (LiTS) dataset;
- the 3D Image Reconstruction for Comparison of Algorithm Database-01 (3D-IRCADb-01);
- custom CT scans from the First Affiliated Hospital, Zhejiang University School of Medicine, China.

Fig. 6 shows volume-rendered abdominal CT images from the three datasets.

1) LiTS DATASET
The LiTS dataset contains 200 contrast-enhanced 3D abdominal CT scans with segmentation labels for the liver and tumor regions. Each axial slice has a resolution of 512 × 512 pixels. Of these, 130 scans have ground-truth labels; they were split into 100 training, 20 validation, and 10 testing scans. The slice thickness ranges from 0.8 mm to 5 mm, and the number of liver slices ranges from 31 to 301. During training, intensity values were clipped to the [−10, 200] HU range to suppress irrelevant details. The images were then normalized to [0, 1] and downsampled to 256 × 256. Random cropping and additive Gaussian noise, N(0, 0.01), were used to augment the training data [30].
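The intensity preprocessing and augmentation described above can be sketched with the standard library on plain Python lists. The sketch makes two assumptions the text does not settle:

- The 0.01 in N(0, 0.01) is treated as a variance, giving a standard deviation of 0.1.
- Noise is added after normalization.

The helper names are hypothetical. Downsampling to 256 × 256 is omitted.

```python
import math
import random

HU_MIN, HU_MAX = -10.0, 200.0   # clipping window reported for LiTS training
NOISE_STD = math.sqrt(0.01)     # assumes N(0, 0.01) denotes mean 0, variance 0.01

def normalize_hu(values, lo=HU_MIN, hi=HU_MAX):
    """Clip HU values to [lo, hi] and rescale linearly to [0, 1]."""
    return [(min(max(v, lo), hi) - lo) / (hi - lo) for v in values]

def add_gaussian_noise(values, std, rng):
    """Add zero-mean Gaussian noise with standard deviation std to each value."""
    return [v + rng.gauss(0.0, std) for v in values]

def random_crop(image, size, rng):
    """Crop a size x size window at a random position from a 2D list-of-lists image."""
    h, w = len(image), len(image[0])
    top = rng.randint(0, h - size)     # randint is inclusive on both ends
    left = rng.randint(0, w - size)
    return [row[left:left + size] for row in image[top:top + size]]

rng = random.Random(0)
row = normalize_hu([-300.0, 40.0, 150.0, 900.0])
noisy_row = add_gaussian_noise(row, NOISE_STD, rng)
```

In practice, the same operations would be applied to whole slices with array libraries rather than Python lists.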

2) 3D-IRCADb-01 DATASET
3D-IRCADb-01 is a database of anonymized medical images of patients in which clinical experts manually segmented the various structures of interest. The 3D medical images and the masks of the segmented structures are available as DICOM files. The dataset is publicly available and offers a greater variety and complexity of livers and their lesions. 3D-IRCADb-01 includes 20 venous-phase contrast-enhanced CT volumes acquired at various European hospitals with different CT scanners. The slice thickness ranges from 1.0 mm to 4 mm, and the number of liver slices ranges from 45 to 163. In our study, all volumes were used to validate the performance of the proposed model.

3) CUSTOM DATASET
The custom dataset consists of CT scans of ten patients acquired at the First Affiliated Hospital, Zhejiang University School of Medicine, from 2018 to 2019. All personal information about the patients was removed before the data left the hospital. The slice thickness ranges from 0.7 mm to 1 mm, and the number of liver slices ranges from 134 to 162. The ten volumes were manually segmented using 3D Slicer (https://www.slicer.org/), and all of them were used to validate the performance of the proposed model.

B. EXPERIMENTAL SETTINGS
Our model was implemented with Torch [31] and optimized with the Adam algorithm on an NVIDIA Tesla K40c GPU. The initial learning rate was set to 1e−4 and decayed according to a poly learning-rate schedule. Ten volumes from the LiTS validation data were used to monitor the performance of the model. During training, each slice of a CT scan was resampled to 256 × 256 pixels as the model input, and the model was trained for 20 epochs.
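The poly schedule decays the learning rate as $\text{lr} = \text{lr}_0 (1 - t/T)^{p}$, where $t$ is the current step and $T$ is the total number of steps. The text does not give the exponent. The sketch below assumes $p = 0.9$, a common choice for DeepLab-style training. It also assumes a hypothetical 1,000 steps per epoch. In recent PyTorch releases, `torch.optim.lr_scheduler.PolynomialLR` provides a built-in equivalent.

```python
def poly_lr(base_lr, step, max_steps, power=0.9):
    """Poly schedule: lr = base_lr * (1 - step / max_steps) ** power (power assumed)."""
    if not 0 <= step <= max_steps:
        raise ValueError("step must lie in [0, max_steps]")
    return base_lr * (1.0 - step / max_steps) ** power

BASE_LR = 1e-4             # initial learning rate reported in the text
EPOCHS = 20                # number of training epochs reported in the text
STEPS_PER_EPOCH = 1000     # hypothetical; depends on dataset size and batch size
MAX_STEPS = EPOCHS * STEPS_PER_EPOCH

# Learning rate at the start of each epoch, plus the final step.
epoch_lrs = [poly_lr(BASE_LR, s, MAX_STEPS) for s in range(0, MAX_STEPS + 1, STEPS_PER_EPOCH)]
```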
To find the best configuration for liver segmentation, DCSegNet was trained with the number of subdomains $L$ set to 3, 5, and 7. The number of overlapping layers between adjacent subdomains was set to 0, 3, and 5 (Fig. 7). The labels of the weighted network's outputs were 1, 2, ..., $L$.
When the number of subdomains is set to 1, the proposed model reduces to the original DeepLabv3+. Table 1 shows how the amount of training data available to each local segmenter changes with the number of local segmenters and the size of the overlap between adjacent subdomains.
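The text does not specify exactly how slices are assigned to subdomains. The following sketch is one plausible reading, under several assumptions:

- A volume's slice indices are split into $L$ contiguous, near-equal chunks.
- Each boundary is widened so that adjacent subdomains share exactly `overlap` slices.
- Each slice receives the 1-based labels of every subdomain that contains it.

The function names are hypothetical.

```python
def partition_slices(num_slices, num_subdomains, overlap):
    """Split slice indices 0..num_slices-1 into contiguous subdomains.

    Adjacent subdomains share exactly `overlap` slices (assuming each base chunk
    is at least `overlap` slices long). Returns half-open (start, end) ranges.
    """
    if num_subdomains < 1 or overlap < 0:
        raise ValueError("need num_subdomains >= 1 and overlap >= 0")
    bounds = [(l * num_slices) // num_subdomains for l in range(num_subdomains + 1)]
    left_pad, right_pad = overlap // 2, overlap - overlap // 2
    ranges = []
    for l in range(num_subdomains):
        start = bounds[l] - (left_pad if l > 0 else 0)
        end = bounds[l + 1] + (right_pad if l < num_subdomains - 1 else 0)
        ranges.append((max(start, 0), min(end, num_slices)))
    return ranges

def subdomain_labels(num_slices, ranges):
    """For each slice, the 1-based labels of every subdomain that contains it."""
    return [[l + 1 for l, (s, e) in enumerate(ranges) if s <= i < e] for i in range(num_slices)]

ranges = partition_slices(100, 5, 3)        # e.g. 100 liver slices, L = 5, 3 overlap layers
sizes = [e - s for s, e in ranges]          # training slices per local segmenter
```

Under this scheme, with $L = 1$ the single subdomain covers the whole volume, matching the DeepLabv3+ special case. Raising $L$ shrinks each segmenter's share of the data, whereas raising the overlap enlarges it. This is qualitatively the trend that Table 1 is described as showing.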

C. EVALUATION METRICS
In this section, |TP|, |TN|, |FP|, and |FN| denote the numbers of true positive, true negative, false positive, and false negative pixels, respectively.

1) DICE COEFFICIENT
The Dice coefficient (DICE) measures the spatial overlap between the gold-standard image and the segmented one. It ranges from 0 (no overlap) to 1 (perfect match) [33]:

$$\mathrm{DICE} = \frac{2|TP|}{2|TP| + |FP| + |FN|}.$$

2) RAND INDEX
The Rand index (RI) assesses the pixel consistency between the ground-truth and segmented images and ranges from 0 to 1. A value of 0 indicates that the segmentation result is completely different from the ground truth, whereas 1 indicates that it is identical to the ground truth. RI is calculated using the following equation:
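The overlap metrics can be computed from the confusion counts of two flattened binary masks. The sketch below implements DICE as defined above and TPR as $|TP| / (|TP| + |FN|)$. The extracted text does not preserve the paper's RI equation. The sketch therefore uses the classic pair-counting Rand index, which is an assumption. It computes that index from a 2 × 2 contingency table instead of enumerating all pixel pairs. Returning 1.0 for empty denominators is also an assumed convention.

```python
from math import comb

def confusion_counts(pred, truth):
    """Return (|TP|, |TN|, |FP|, |FN|) for two equal-length flat binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    tn = sum(1 for p, t in zip(pred, truth) if not p and not t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    return tp, tn, fp, fn

def dice(pred, truth):
    """DICE = 2|TP| / (2|TP| + |FP| + |FN|); 1.0 if both masks are empty (assumed)."""
    tp, _, fp, fn = confusion_counts(pred, truth)
    denom = 2 * tp + fp + fn
    return 1.0 if denom == 0 else 2 * tp / denom

def tpr(pred, truth):
    """True positive rate |TP| / (|TP| + |FN|); 1.0 if the ground truth is empty (assumed)."""
    tp, _, _, fn = confusion_counts(pred, truth)
    return 1.0 if tp + fn == 0 else tp / (tp + fn)

def rand_index(pred, truth):
    """Pair-counting Rand index: fraction of pixel pairs on which the two labelings agree."""
    tp, tn, fp, fn = confusion_counts(pred, truth)
    total_pairs = comb(tp + tn + fp + fn, 2)
    if total_pairs == 0:
        return 1.0
    same_both = sum(comb(c, 2) for c in (tp, tn, fp, fn))      # same label in both
    same_pred = comb(tp + fp, 2) + comb(tn + fn, 2)            # same label in prediction
    same_truth = comb(tp + fn, 2) + comb(tn + fp, 2)           # same label in ground truth
    agreements = total_pairs + 2 * same_both - same_pred - same_truth
    return agreements / total_pairs
```

For example, with `pred = [1, 1, 0, 0]` and `truth = [1, 0, 0, 0]`, the counts are |TP| = 1, |TN| = 2, |FP| = 1, and |FN| = 0. This gives DICE = 2/3 and TPR = 1.0.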

4) HAUSDORFF DISTANCE
The Hausdorff distance (HD) is widely used to evaluate medical image segmentation methods. It is one of the most informative and useful criteria because it indicates the largest segmentation error. In some applications, segmentation is one step in a more complicated multi-step process. For example, some multimodal medical image registration methods rely on segmenting an organ of interest in one or several images. In such applications, the largest segmentation error as quantified by HD can be a good measure of the usefulness of the segmentations for the intended task. For two point sets $X$ and $Y$, the one-sided HD from $X$ to $Y$ is defined as [21], [34]

$$hd(X, Y) = \max_{x \in X} \min_{y \in Y} \lVert x - y \rVert_2,$$

and similarly

$$hd(Y, X) = \max_{y \in Y} \min_{x \in X} \lVert x - y \rVert_2.$$

The bidirectional HD between these two sets is

$$HD(X, Y) = \max\left(hd(X, Y),\ hd(Y, X)\right).$$

The above definitions use the Euclidean distance, but other metrics can be used instead. Intuitively, $HD(X, Y)$ is the longest distance one has to travel from a point in one of the two sets to its closest point in the other set. In image segmentation, HD is computed between the boundaries of the estimated and ground-truth segmentations, which are curves in 2D and surfaces in 3D.
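The one-sided and bidirectional definitions translate directly into code on small boundary point sets, as the brute-force sketch below shows. Tables 2 to 5 report a 90th-percentile HD, whose exact definition the text does not give. The sketch assumes a nearest-rank percentile of the pooled nearest-neighbor distances from both directions. For real volumes, a distance transform such as `scipy.ndimage.distance_transform_edt` avoids the O(|X|·|Y|) cost.

```python
import math

def directed_distances(a, b):
    """For each point in a, the Euclidean distance to its nearest point in b."""
    return [min(math.dist(p, q) for q in b) for p in a]

def hausdorff(a, b):
    """HD(X, Y) = max(hd(X, Y), hd(Y, X))."""
    return max(max(directed_distances(a, b)), max(directed_distances(b, a)))

def percentile_hausdorff(a, b, pct=90.0):
    """Assumed convention: nearest-rank percentile of pooled distances in both directions."""
    d = sorted(directed_distances(a, b) + directed_distances(b, a))
    rank = max(1, math.ceil(pct / 100.0 * len(d)))
    return d[rank - 1]

# Tiny 2D boundary point sets for illustration.
estimated = [(0.0, 0.0), (1.0, 0.0)]
ground_truth = [(0.0, 0.0), (4.0, 0.0)]
hd_value = hausdorff(estimated, ground_truth)
hd90_value = percentile_hausdorff(estimated, ground_truth, 90.0)
```

The percentile variant is less sensitive than the maximum to a few outlying boundary points.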

A. VERIFICATION OF THE WORKING PRINCIPLE OF LOCAL SEGMENTERS
In this study, a liver segmentation model based on the divide-and-conquer principle was proposed, exploiting the structural similarity of adjacent liver cross-sections. High liver segmentation accuracy is achieved by jointly training the local segmenters and the weights of the corresponding segmenters. Fig. 8 shows an example of the segmentation results of the local segmenters (L = 5) for cross-sections at different positions.
Based on the slice features extracted by the backbone network, DCSegNet selects the corresponding segmenters and computes the weight of each segmenter. Because the data of adjacent subdomains overlap, some regions of a slice are segmented by multiple local segmenters. The weighted integration of these outputs therefore yields the final segmentation.
Fig. 9 shows 2D slices from the LiTS, 3D-IRCADb-01, and custom datasets, together with the ground truth and the segmentation results obtained by the DCSegNet model.
Tables 2 to 5 list the means and standard deviations of DICE, RI, TPR, and the 90th-percentile HD, respectively. They report DCSegNet's performance with different numbers of local segmenters on the three datasets. With five local segmenters, the three segmentation indicators achieve their best results on all datasets.

C. ABLATION INVESTIGATION AND PARAMETER ANALYSIS
This study evaluated how the number of local subdomains and the overlap between adjacent subdomains affect the segmentation results. For the LiTS dataset, the number of subdomains was set to 3, 5, and 7, and the overlap between adjacent subdomains to 0, 3, and 5 layers. We trained a network for each configuration and evaluated it on all three datasets. Table 6 compares the means and standard deviations of the DICE coefficient, from which several conclusions can be drawn.

1) THE NUMBER OF LOCAL SEGMENTERS
Among networks built with 3, 5, and 7 local segmenters, the best results are obtained with 5. This is plausible: more local segmenters mean more experts, each responsible for a narrower subdomain. However, when the number of local segmenters is increased further to 7, the segmentation accuracy decreases to varying degrees. We therefore conclude that performance saturates once the number of local segmenters is large enough. With too many segmenters, some neighboring local segmentation networks are trained on largely the same data, so the effective number of experts does not increase. In addition, Table 1 shows that the amount of training data for each local segmenter drops significantly as the number of local segmenters increases. This may leave the parameters of each local segmenter insufficiently trained; that is, the model underfits.
Furthermore, patients differ individually in liver and lesion structure, and the CT scans differ in slice thickness. Both reduce the structural similarity of CT images within the same subdomain, which further increases the difficulty of model fitting. These factors need further verification by standardizing the imaging process (e.g., using the same slice thickness) and enlarging the training set.

2) THE SIZE OF THE OVERLAP LAYERS FOR SUBDOMAINS
In this study, overlapping regions of different sizes between adjacent subdomains of the training data were selected to study their impact on the segmentation results. Table 6 shows that increasing the number of overlapping layers between adjacent subdomains does not significantly improve segmentation accuracy, but it reduces the standard deviation of the results. Table 1 shows that more overlap increases the training data available for optimizing each local segmenter's parameters, which benefits model performance. In addition, as the overlap grows, adjacent local segmenters share more of each other's subdomain data and become more consistent with one another. In other words, each expert is skilled not only in segmenting its own subdomain but also, increasingly, in segmenting the adjacent subdomains. This makes the segmentation results more stable and lowers the standard deviation of the DICE coefficient.

3) THE LOCAL SEGMENTATION ARCHITECTURE
To further improve the performance of each local segmenter on adjacent subdomains, the intermediate layer of each local segmenter incorporates part of the intermediate CNN features of the adjacent local segmenters, as shown in Fig. 4. We compared the segmentation results with and without these adjacent intermediate features; Table 7 shows the results. They indicate that the choice of local segmenter architecture has no significant impact on segmentation accuracy. We attribute this mainly to the overlapping subdomains in the training data, which already expose each segmenter to its neighbors' data. In addition, the LiTS training scans were acquired with different slice thicknesses. This reduces both the structural similarity of data within the same subdomain and the structural overlap between adjacent subdomains. Therefore, introducing the intermediate features of the adjacent local segmenters does not significantly improve segmentation performance.

V. CONCLUSION
This work proposes DCSegNet, a continuity-aware probabilistic network for liver segmentation. It combines overlapping local segmentation networks through a probabilistic weighted network and thereby explicitly models the continuity relationship across different parts of the volume. Experiments on three datasets show that, with appropriately chosen parameters, the model achieves higher accuracy than the other advanced models compared.
The training set has varying slice thicknesses, individual differences in abdominal anatomy, and a limited amount of data. Because of these, the local segmenters have difficulty reaching optimal expert performance. Consequently, the results show that this study has not yet fully solved the problem of continuity between adjacent layers. Future work should standardize the imaging parameters and collect more training data to further optimize the local segmenters and achieve more accurate segmentation.