Self-Paced Learning With Diversity for Medical Image Segmentation by Using the Query-by-Committee and Dynamic Clustering Techniques

Convolutional neural networks (CNNs), as a typical deep learning technique, have been widely used in image segmentation, but they often require a large amount of annotated data. However, the number of available pixel-wise labeled medical images is extremely small, and this prevents the application of CNNs in many medical image segmentation tasks. We propose a self-paced learning with diversity (SPLD) framework to boost the performance of medical image segmentation models with a limited amount of annotated data. Self-paced learning (SPL) is a learning regime that selects training samples in order from the easiest to the most difficult for model training. In addition, we take the diversity of the data into consideration. The proposed self-paced learning with diversity by query-by-committee (SPLD-QBC) algorithm dynamically and diversely selects the appropriate training data to boost the performance of an image segmentation model. SPLD-QBC incorporates the query-by-committee (QBC) technique for data selection and affinity propagation for optimizing the data diversity. By dynamically selecting the optimal sequence of training samples from different probability distributions, the segmentation models achieve improved performance. To verify the effectiveness of the proposed SPLD-QBC framework, we conducted experiments on three medical image segmentation tasks with five different datasets. The experimental results indicate that the proposed SPLD-QBC algorithm significantly improves upon the segmentation performance of the baseline models in terms of the Dice score, surface distance and mean intersection over union (mIoU). The proposed SPLD framework significantly boosts segmentation performance and is easily embedded into CNN-based image segmentation models.


I. INTRODUCTION
The precision of medical image segmentation is crucial for obtaining accurate diagnoses. In recent years, deep learning-based techniques have become the most popular approaches in the various subfields of medical image semantic segmentation, such as retinal vessel segmentation [1], organ segmentation [2], and cell segmentation [3]. To the best of our knowledge, training convolutional neural networks (CNNs) requires a large amount of labeled training data. Furthermore, deep learning often requires strenuous annotation by clinical experts since a large amount of image data are required for training and validating deep neural networks. Under such circumstances, it is necessary to fully use the information behind the medical data to generate reliable results for clinical usage.
The associate editor coordinating the review of this manuscript and approving it for publication was Mostafa M. Fouda.
Most of the optimization functions used in deep learning are nonconvex [4], which increases the difficulty of finding global minima. Self-paced learning (SPL) [5] is a popular training strategy that enhances model performance by first feeding easy samples to the model, thereby further exploiting the information behind the training data. SPL imitates the learning process of human beings, which also starts with easy samples and then gradually moves to more complex data. In SPL, an additional regularization term is added to the optimization function so that the loss and curriculum are trained jointly. However, the ease of a training sample is hard to determine, and what is intuitively "easy" for a human may not be as easy for the computer to comprehend.
According to [6], the probability distribution behind a dataset is difficult to capture. In addition, the training data are assumed to be independently and identically distributed. When the SPL framework is applied directly to image segmentation tasks, the SPL model tends to select images that reside in the same cluster of the probability distribution, so the weights of the model are optimized toward a local minimum [7]. Hence, employing data from diverse distributions is important.
Self-paced learning with diversity (SPLD) was proposed in [7] by incorporating data diversity into the SPL learning strategy. The original SPLD algorithm demonstrated that the intuitive approach for SPLD is to select samples from different groups, i.e., data from different classes. However, in semantic image segmentation tasks, categorical labels are often missing, thereby limiting the application of SPL with diversity.
In this paper, a versatile self-paced learning with diversity by query-by-committee (SPLD-QBC) framework for medical image segmentation is presented. The proposed SPLD-QBC segmentation framework integrates a regularization term into the SPL loss function. Instead of using image-level accuracy, i.e., the total Dice score and pixel-level metrics, we incorporate the query-by-committee (QBC) [9] technique to determine the simplicity of the data in SPL regimes. A member in the committee is a deep CNN-based segmentation network, and each member is trained with the same dataset. Under the SPLD-QBC learning framework, the data simplicity and diversity are determined by the latent features of each member. The next query is chosen according to the principle of maximal similarity, which is calculated by the cosine distances between the extracted features among all members. An overview of the proposed SPLD-QBC framework is shown in Figure 1.
As depicted in Figure 1, the SPLD-QBC framework contains several members in the committee. Each member is a deep CNN for medical image segmentation. The feature maps with the smallest sizes, located in the deepest layer of the image encoder, are extracted to represent the features of an image patch. All of the members work cooperatively to determine the learning pace: easy samples are utilized to train the model first, and then hard samples are iteratively added to the training set for model fine-tuning. Furthermore, we use the extracted feature vectors to group the training data and thus guarantee the diversity of the data.
VOLUME 9, 2021
The main contribution of this paper is that we propose a versatile medical image semantic segmentation framework that combines SPLD and the QBC technique. To the best of our knowledge, this paper is the first work that uses self-paced learning to perform medical image semantic segmentation. The proposed SPLD-QBC framework is a supervised learning regime that aims at improving the performance of an image segmentation model. Instead of determining the difficulty level of the training samples directly, the features extracted from the deep neural networks are used as surrogates to measure the simplicity of the data and thus determine the training sequence. In addition, quantitative results and a theoretical convergence analysis supporting the effectiveness of the proposed SPLD-QBC framework are provided.

II. RELATED WORK
A. SELF-PACED LEARNING
Curriculum learning (CL) indicates that learning first from the easiest aspects of a task and gradually increasing the difficulty level can benefit model training. CL helps the model find improved local minima and speeds up the convergence of the training process towards the global minimum. SPL and CL share a similar learning regime concept but differ in their designs of the learning pace. In comparison to human education, CL is guided by instructors, while SPL is advised by students [8]. In CL, the syllabus or the data learning sequence is determined by prior knowledge; however, the learning sequence is dynamically updated during successive training iterations in SPL. An application using a neural network to determine an appropriate syllabus was presented in [9], and the learning progress was determined using reinforcement learning rewards to maximize learning efficiency.
In detail, SPL embeds curriculum learning and adds a regularization term to the learning objective function [8]. During the training periods, the loss function and the SPL regularization term are optimized jointly. The goal of a typical SPL model is to jointly learn the model parameter w and the data weight variable v = [v_1, v_2, ..., v_n]^T by minimizing the objective function in Eq. 1:
$$\min_{w,\, v \in [0,1]^n} E(w, v; \lambda) = \sum_{i=1}^{n} v_i L(y_i, f(x_i, w)) - \lambda \sum_{i=1}^{n} v_i \quad (1)$$
where λ is the hyperparameter for controlling the learning pace, f is the machine learning model, L is the loss function, and x_i and y_i are the training data and corresponding label, respectively.
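To make the role of λ concrete, the following is a minimal sketch, assuming the hard-threshold form of SPL known from the literature: for fixed w, each weight v_i is set to 1 when the sample loss is below λ and to 0 otherwise. The function names and the loss values are hypothetical and are not taken from the authors' implementation.

```python
from typing import List


def spl_weights(losses: List[float], lam: float) -> List[int]:
    """Hard SPL weights: v_i = 1 if the loss of sample i is below lam, else 0."""
    return [1 if loss < lam else 0 for loss in losses]


def spl_objective(losses: List[float], v: List[int], lam: float) -> float:
    """Weighted loss minus the pace term lam * sum(v)."""
    weighted_loss = sum(vi * li for vi, li in zip(v, losses))
    return weighted_loss - lam * sum(v)


# Hypothetical per-sample losses for four training patches.
losses = [0.2, 1.5, 0.7, 0.05]
v = spl_weights(losses, lam=0.8)
objective = spl_objective(losses, v, lam=0.8)
```

Raising λ admits samples with larger losses, which is how the learning pace grows from easy to hard samples.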
From an extendibility perspective, SPL has been extended into self-paced curriculum learning [8], self-paced learning with reinforcement learning [10], and self-paced boosted learning [11]. In the extensions of SPL, several regularization terms are added to Eq. 1 for the optimization of specific tasks. From an applicability perspective, SPL has been applied to classification [11], matrix factorization [12], and mixtures of regressions [13]. Zhang et al. [14] proposed a self-paced fine-tuning network for localizing and segmenting objects in weakly-labeled videos. A joint SPL regularizer was employed to compute the data priority values according to the localization task and segmentation task. Moreover, SPL is utilized in multi-instance learning for object detection [15]. In addition, CL and SPL are jointly utilized to perform weak object detection tasks, where the prior knowledge of the learning sequence is integrated by the corresponding regularization term [16]. For image segmentation tasks, Tong et al. [17] proposed a self-paced DenseNet that dynamically adjusts the weights of each target class to train the easiest classes first and then train the hard object classes.
However, most SPL applications are implemented on sparsely labeled classification data and for object detection tasks, where the label often exists as a scalar or a bounding box. However, studies related to dense label classification, such as image segmentation using SPL, are limited.

B. SELF-PACED LEARNING WITH DIVERSITY
SPLD was first proposed by Jiang et al. [7], who embedded a regularization term independent of the specific model objectives to determine the diversity of the training data. SPLD integrates both diversity and easiness when generating a curriculum, so that reasonable data are sequentially fed into the model for training, and it aims to guarantee improved model performance. SPLD modifies SPL by adding a regularization term to Eq. 1 to form the diversity constraint. The resulting objective function is shown in Eq. 2:
$$\min_{w,\, v \in [0,1]^n} E(w, v; \lambda, \gamma) = \sum_{i=1}^{n} v_i L(y_i, f(x_i, w)) - \lambda \sum_{i=1}^{n} v_i - \gamma \|v\|_{2,1} \quad (2)$$
where \|v\|_{2,1} is the sum over the groups of the l2 norms of the per-group weight vectors, and γ is the newly added hyperparameter for the diversification pace. SPLD was first performed on an action recognition task and a multimedia event detection task with a random forest [18] and support vector machine (SVM). The datasets used in both tasks were sparsely labeled datasets in which the class of each data point was known by the SPLD model. However, in medical image semantic segmentation tasks, the image-level labels, i.e., the categorical labels, are unknown. Therefore, controlling data diversity by using previously obtained class labels is impossible.
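The effect of the diversity term can be illustrated with a minimal sketch of the rank-based selection rule described for the original SPLD algorithm in [7]: within each group, samples are ranked by loss, and the sample at rank i is selected when its loss is below λ + γ / (√i + √(i − 1)). The function name and the toy values are hypothetical, and the group labels are assumed to be given.

```python
import math
from typing import Dict, List


def spld_select(losses: List[float], groups: List[int],
                lam: float, gamma: float) -> List[int]:
    """Return binary weights v, selecting easy samples spread across groups."""
    by_group: Dict[int, List[int]] = {}
    for idx, group in enumerate(groups):
        by_group.setdefault(group, []).append(idx)

    v = [0] * len(losses)
    for members in by_group.values():
        # Rank the samples of one group from the smallest loss to the largest.
        ranked = sorted(members, key=lambda i: losses[i])
        for rank, i in enumerate(ranked, start=1):
            # The threshold shrinks with the rank, so each group contributes
            # its easiest samples before any group is exhausted.
            threshold = lam + gamma / (math.sqrt(rank) + math.sqrt(rank - 1))
            if losses[i] < threshold:
                v[i] = 1
    return v


losses = [0.1, 0.2, 0.6, 0.9, 1.0]   # hypothetical sample losses
groups = [0, 0, 0, 1, 1]             # hypothetical group labels
v_diverse = spld_select(losses, groups, lam=0.25, gamma=0.8)
v_plain = spld_select(losses, groups, lam=0.25, gamma=0.0)
```

With γ = 0 the rule reduces to plain SPL and only the first group is sampled; with γ > 0 the easiest sample of the second group is also selected, even though its loss is larger.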

C. QUERY-BY-COMMITTEE
QBC is an active learning algorithm. The key technique of QBC is that it actively decides which data are critical for the model training process. In QBC, a set of active learners is trained, and the learners vote on the data to decide which data points need to be added to the training set. To some extent, the active learning of QBC can effectively evaluate the informativeness of the training data. In QBC, the cross-entropy method is often used to calculate the data similarity [19]. However, the entropy is always based on sparse label prediction, so it is not suitable for dense label classification tasks.

D. MEDICAL IMAGE SEGMENTATION
Deep neural networks have been widely used in medical image segmentation tasks. For a long time, various algorithms in the research field of computer vision have been proposed and developed for the automatic segmentation of retinal vessels in fundus images [20]. Jung et al. proposed an iterative deep learning model for medical image segmentation by iteratively inputting the network output as the shape prior for model fine-tuning [21]. A curvilinear structure extraction algorithm was presented in [22], and it trained the model from a single patch to the entire image sequentially. Different from natural images with RGB channels, medical image data are always grayscale and contain multimodal data. A cross-modality feature representation method was proposed in [23]; the features from multimodal MR images were extracted, and a precise brain tumor segmentation model was trained. In medical image segmentation, incorporating a shape prior and clinical knowledge is important. Zhang et al. [24] proposed a task-structured brain tumor segmentation network that models the task-modality structure as a weighted combination structure and mimics clinical practices to extract brain tumors. For cell segmentation, splitting clumps of convex objects has many practical applications in biomedical and industrial fields. The splitting framework proposed in [25] successfully solved the overlapping and touching problem in cell segmentation and improved the robustness of the cell-splitting model.

III. APPROACH
In this paper, we propose a new framework for medical image semantic segmentation using the SPLD strategy, which dynamically selects the samples in order from the easiest to the hardest ones while guaranteeing the diversity of the data to facilitate model convergence and to improve the performance of the image segmentation model.

A. QBC BASED SPL REGULARIZATION TERM
Before describing the algorithm, we first define the variables. Let L = Labeled Data = {{x_1, y_1}, {x_2, y_2}, ..., {x_n, y_n}} be a collection of labeled image patches cropped from medical images with ground truth. Let C = {θ_1, θ_2, ..., θ_|C|} be a committee with |C| members, where each member θ_i is a deep CNN for medical image semantic segmentation.
Our approach utilizes the QBC framework. In each iteration, each committee member extracts the features of all data points. A CNN-based image segmentation network contains an encoder and a decoder. The convolutional layers deep in the network provide the same receptive field as that of large filters in the initial layers. As the network goes deeper, most existing CNNs employ pooling layers to increase the receptive field. By deepening the network with small filters and pooling layers, the performance is enhanced [26]. In the encoder part of most image segmentation networks, the number of feature maps increases with decreasing resolution. The shallowest layers of the CNN learn low-level image features, such as shapes and edges, while the deepest layers learn high-level features, which are most important to the specific application [27]. In our approach, we extract the feature maps from the last layer of the encoder and use a global average pooling (GAP) layer to convert the feature maps into feature vectors. Each member in the committee generates a feature vector representing the features of the input image. The feature vector generated from θ_i is defined as vector_i. The similarity between two members is measured by the cosine similarity between their feature vectors, as given in Eq. 3:
$$\mathrm{sim}(\mathrm{vector}_i, \mathrm{vector}_j) = \frac{\mathrm{vector}_i \cdot \mathrm{vector}_j}{|\mathrm{vector}_i| \times |\mathrm{vector}_j|} \quad (3)$$
where |vector_i| represents the length (Euclidean norm) of the high-level feature vector extracted by member θ_i in committee C, '·' represents the dot product operation and '×' indicates the multiplication of the two scalar lengths. All the members in the committee have the same CNN architecture, indicating that the meaning of the elements in the feature vector is the same. According to Eq. 3, the cosine function is used to calculate the global similarity using the two extracted feature vectors. Each pair of elements from the two vectors contributes to the cosine similarity.
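The GAP step can be sketched as follows. This is a minimal illustration in plain Python, assuming the encoder output is available as a nested list indexed as [row][column][channel]; the type alias and function name are hypothetical.

```python
from typing import List

# Feature maps of one image patch, indexed as [row][col][channel].
FeatureMaps = List[List[List[float]]]


def global_average_pool(fmaps: FeatureMaps) -> List[float]:
    """Average every channel over all spatial positions."""
    height, width = len(fmaps), len(fmaps[0])
    channels = len(fmaps[0][0])
    sums = [0.0] * channels
    for row in fmaps:
        for pixel in row:
            for k, value in enumerate(pixel):
                sums[k] += value
    return [s / (height * width) for s in sums]


# A hypothetical 2 x 2 feature map with two channels.
fmaps = [[[1.0, 0.0], [3.0, 2.0]],
         [[5.0, 4.0], [7.0, 6.0]]]
vector = global_average_pool(fmaps)
```

Each committee member would produce one such vector per input patch, with one entry per feature map.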
The pairwise similarities between the members can be arranged as a |C| × |C| matrix whose (i, j)-th entry is the cosine similarity between vector_i and vector_j (Eq. 4). From this matrix, we define the similarity sim(x_k) of an input datum x_k (Eq. 5). The hypothesis behind our framework is that a set of extracted SPLD-QBC feature vectors with a high similarity indicates that all of the members in the committee have effectively learned the input datum x_k. Thus, a higher similarity indicates that the datum is an easier sample and is therefore selected earlier in the SPL training process. In contrast, a small similarity means that the features generated by the members possess large discrepancies, indicating a difficult sample.
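The committee agreement described above can be sketched as follows. The cosine similarity follows Eq. 3; the aggregation into a single per-sample score is an assumption of this sketch (the mean over all distinct member pairs), since the exact aggregation rule is not reproduced in the text. All names are hypothetical, and the feature vectors are assumed to be nonzero.

```python
import math
from typing import List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Dot product divided by the product of the Euclidean norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_matrix(vectors: List[List[float]]) -> List[List[float]]:
    """Pairwise cosine similarities between all committee members."""
    return [[cosine_similarity(u, w) for w in vectors] for u in vectors]


def sample_similarity(vectors: List[List[float]]) -> float:
    """Assumed aggregation: mean similarity over distinct member pairs."""
    matrix = similarity_matrix(vectors)
    n = len(vectors)
    pairs = [matrix[i][j] for i in range(n) for j in range(i + 1, n)]
    return sum(pairs) / len(pairs)


# Three members agree on the first sample and disagree on the second.
agree = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
disagree = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
easy_score = sample_similarity(agree)
hard_score = sample_similarity(disagree)
```

Under this assumption, a sample on which the members produce aligned feature vectors receives a higher score and is treated as easier.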
After obtaining the similarities of all the training samples, we normalize them into [0, 1] by Eq. 6 to form the SPL regularization term in Eq. 1:
$$v_k = \frac{\mathrm{sim}(x_k) - \min_{i \in [1,n]} \mathrm{sim}(x_i)}{\max_{i \in [1,n]} \mathrm{sim}(x_i) - \min_{i \in [1,n]} \mathrm{sim}(x_i)} \quad (6)$$
To successfully apply the SPL strategy during the training process, we need to tackle the following three problems:
(1) How many data should be selected during the initial training epochs? The SPL strategy has to first exploit the information provided by a small number of data and then add more labeled data to the training set to control the learning pace. However, during the first several training epochs, the parameters in the CNNs have not yet been well optimized. If the data selection rule is directly applied to the SPL strategy, the model may collapse, and the SPL framework will not work. To handle this problem, we randomly select a small number of training samples to train the model. For example, we can train the committee with 10% randomly selected samples. The initial ratio of training data is defined as a hyperparameter named ITDR in our algorithm. Once the model converges, we use Eq. 6 to control the pace of learning.
(2) Another problem for QBC-based learning is that all the members have the same CNN architecture, indicating that they would generate the same feature vector for an input image if their parameters were the same. To overcome this shortcoming, we randomly initialize the model variables in each CNN with different seeds. With different seeds, the parameters in each member differ, so the members can effectively determine the data similarity. In addition, during the training process, the parameters are optimized toward different solutions. The experimental evaluation section shows that our method is effective.
(3) How can we control the learning pace? In practice, we need to prevent SPL learners from greedily adding data. If the SPL learner adds a fixed number of training data after each training epoch, all the data will eventually be added to the training set, and the proposed method will become a fully supervised one. This is not our original intention with regard to controlling the learning pace. To tackle this problem, we impose a restriction on the query rule by setting an uncertainty threshold, which is inspired by the uncertainty sampling strategy [28]. The threshold is named UT in our algorithm. If and only if the v_k of a datum evaluated by all the members in the committee is larger than the threshold, the SPL learner defines the sample as an "easy" sample and adds it to the training set.
In addition, UT changes dynamically during the training process. At the first training epoch, UT equals 0.9 times the largest v_k in v. As the training iterations proceed, UT is updated according to Eq. 7, which is a function of the values v_i, i ∈ [1, n], where m is the total number of training epochs and e is the current epoch during the training process.
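The normalization of Eq. 6 and the threshold test can be sketched as follows. This is a minimal illustration: the initial threshold follows the stated rule of 0.9 times the largest v_k, while the epoch-dependent update of Eq. 7 is not implemented and UT is simply passed in. The function names, the handling of the degenerate case where all similarities are equal, and the similarity values are assumptions of this sketch.

```python
from typing import List


def normalize(sims: List[float]) -> List[float]:
    """Min-max normalize the sample similarities into [0, 1] (Eq. 6)."""
    lo, hi = min(sims), max(sims)
    if hi == lo:
        # Degenerate case, chosen for this sketch: treat all samples as easy.
        return [1.0 for _ in sims]
    return [(s - lo) / (hi - lo) for s in sims]


def initial_threshold(v: List[float], ratio: float = 0.9) -> float:
    """UT at the first epoch: 0.9 times the largest normalized weight."""
    return ratio * max(v)


def select_easy(v: List[float], ut: float) -> List[int]:
    """Indices of the samples whose weight exceeds the threshold UT."""
    return [k for k, vk in enumerate(v) if vk > ut]


sims = [0.2, 0.5, 0.8, 1.0]   # hypothetical committee similarities
v = normalize(sims)
ut = initial_threshold(v)
easy = select_easy(v, ut)
```

A lower UT in later epochs would admit more samples, which is how the threshold controls the pace at which data are added.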

B. CLUSTERING BASED DATA DIVERSITY
Intuitively, selecting data from different classes is an easy method to maintain data diversity during model training. However, as discussed in Section I, the medical images used in segmentation often lack categorical labels. To address this circumstance, a clustering algorithm is employed to determine the groupings of the data. In Section III-A, the extracted features from each member determine the ease of the training data because the features contain the representations of the data. In the grouping stage, the extracted features are reutilized for clustering. In a committee, for each training sample, the model generates |C| feature vectors. We use the average of the |C| features as the feature vector of the sample, as shown in Eq. 8:
$$\overline{\mathrm{vector}} = \frac{1}{|C|} \sum_{i=1}^{|C|} \mathrm{vector}_i \quad (8)$$
where vector_i is defined in Eq. 3. However, even if we can use a clustering algorithm to divide all the data into several groups, the optimal number of groups is still unknown. If the number of groups is too large, the data will be scattered across clusters with small numbers of samples; in contrast, if the number of groups is too small, the diversity of the data is not effectively represented. To handle this problem, affinity propagation [29] is employed to cluster the generated feature vectors. In affinity propagation, three matrices are calculated: (1) a similarity matrix that indicates the data similarity between the training samples; (2) a responsibility matrix that measures how well a sample serves as the exemplar relative to other candidates; and (3) an availability matrix that represents how appropriate it is for a sample to pick a given exemplar. During the clustering iterations, the three matrices are updated until the cluster boundaries remain unchanged. The exemplars are extracted as the cluster centers for classification. Affinity propagation does not require the number of clusters to be determined or estimated before clustering because the number of groups in the diversity process is automatically calculated according to the extracted feature vectors.
During each training epoch, the affinity propagation algorithm is applied to the extracted features to cluster the data into different groups. Then, the SPL approach described in Section III-A is employed to select several easy samples from each clustered group. Formally, by applying the clustering algorithm, the dataset is divided into g groups, where g(i) represents the number of samples in the i-th group. Obviously, $\sum_{j=1}^{g} g(j) = n$. To guarantee the diversity of the training samples, the SPL learner selects the easiest samples from each group j ∈ [1, g]. By applying the grouping strategy, the loss function for SPLD changes from Eq. 2 into Eq. 9.
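The group-wise selection can be sketched as follows. This is a minimal illustration that assumes the cluster labels have already been produced (for example, by affinity propagation) and that the weights are normalized within each group before the threshold test, which is one reading of the per-group steps of the algorithm. The function names and values are hypothetical.

```python
from typing import Dict, List


def mean_feature(vectors: List[List[float]]) -> List[float]:
    """Element-wise average of the committee feature vectors of one sample."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def select_per_group(sims: List[float], labels: List[int],
                     ut: float) -> Dict[int, List[int]]:
    """For every cluster, normalize the similarities and keep those above UT."""
    groups: Dict[int, List[int]] = {}
    for idx, label in enumerate(labels):
        groups.setdefault(label, []).append(idx)

    selected: Dict[int, List[int]] = {}
    for label, members in groups.items():
        lo = min(sims[i] for i in members)
        hi = max(sims[i] for i in members)
        span = hi - lo
        chosen = []
        for i in members:
            # Min-max normalization restricted to the current group.
            weight = 1.0 if span == 0 else (sims[i] - lo) / span
            if weight > ut:
                chosen.append(i)
        selected[label] = chosen
    return selected


averaged = mean_feature([[1.0, 2.0], [3.0, 4.0]])
sims = [0.9, 0.8, 0.5, 0.3, 0.2, 0.1]   # hypothetical sample similarities
labels = [0, 0, 0, 1, 1, 1]             # hypothetical cluster labels
picked = select_per_group(sims, labels, ut=0.7)
```

Because the normalization is done per group, the second group still contributes its easiest sample even though all of its similarities are lower than those of the first group.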

C. ALGORITHM
In summary, the proposed SPLD-QBC algorithm for medical image segmentation contains two parts: SPL data selection and data diversity calculation using affinity propagation. Combining both parts, the procedure is summarized in Algorithm 1.

Algorithm 1 Proposed SPLD-QBC Algorithm
5. Calculate v_j for each datum in the j-th group by Eq. 6;
6. Select the easy samples in the j-th group whose v > UT;
7. Train C = {θ_1, θ_2, ..., θ_|C|} with the loss function defined in Eq. 9 on the selected data from each group;
8. Update UT by Eq. 7.
According to the description of Algorithm 1, there are four inputs: the labeled dataset, the initialized committee with |C| members, and two hyperparameters. The outputs of our algorithm are the trained members. Line 1 represents the random data selection process. The while loop and line 2 represent the initial training epochs. The first 'for' loop carries out the SPLD procedure. Lines 4 to 6 represent the cluster-based data diversity algorithm in Section III-B. Without the grouping operation in lines 4 to 6, the proposed SPLD algorithm would degrade into the SPL algorithm. In our experiments, the SPL strategy is used as a baseline.
We summarize our algorithm as follows:
(1) During the initial training epochs, the proposed algorithm is an SPL-based algorithm.
(2) The presented algorithm uses λ, γ and a dynamically changed UT to control the learning pace and to prevent the algorithm from degrading into a fully supervised learning (FSL) algorithm.
(3) The segmentation results are generated by the member with the best performance. Like students in a class, the best student can produce the best results for a given task.

D. PROOF OF CONVERGENCE
To guarantee the feasibility of the proposed SPLD-QBC algorithm, it is necessary to prove the convergence of the SPLD approach. The proposed algorithm aims at finding a global optimum to minimize E(w, v; λ), as defined in Eq. 9 for any given w and v.
Optimize w with fixed v. First, if we fix v, then the second and third terms in Eq. 9 are constants, and our algorithm degrades into an SPL algorithm. It has been proved that SPL converges after several training epochs [34]; the authors show that the learning process of the traditional SPL regime is guaranteed to converge to rational critical points.
Optimize v with fixed w. Next, we need to prove that the v computed in Algorithm 1 attains the global optimum for a fixed w. The loss function defined in Eq. 9 can be rewritten as Eq. 10.
Eq. 10 indicates that the loss function in the proposed SPLD-QBC algorithm is decomposed into g groups of traditional SPLD loss functions, as defined in Eq. 2. Our proposed SPLD-QBC algorithm is viewed as a linear combination of several traditional SPLD algorithms. Since the traditional SPLD algorithm can converge [7], the proposed SPLD-QBC converges after several training iterations.
From these two arguments, we conclude that our algorithm converges. By minimizing the loss function in Eq. 9, the optimal parameters are obtained.

IV. RESULTS AND DISCUSSION
In this section, we introduce medical image segmentation experiments. Three types of medical image semantic segmentation tasks, including retina vessel segmentation, lung organ semantic segmentation and nuclear image segmentation, were performed on five different datasets to demonstrate the effectiveness of the proposed SPLD-QBC learning framework.

A. IMPLEMENTATION DETAILS
Our implementation was based on the TensorFlow deep learning library. To apply the SPL framework, we used image patches cropped from the original medical images, each with a resolution of 128 × 128, for model training. During the testing stage, the CNN model of each member inferred the whole segmentation results rather than the patched images. Due to hardware limitations, the number of members was set to 2, 3 and 4 for the SPLD-QBC environment. Inspired by [31], we used a poly learning rate strategy, in which the learning rate equals the base learning rate multiplied by (1 − e/m)^0.9, where e indicates the current training epoch and m represents the total number of training epochs. Typically, we set m = 1000 for retinal vessel segmentation, m = 500 for lung organ segmentation and m = 300 for nuclear cell segmentation. The ITDR was set to 10%. In addition, the hyperparameters γ and λ were both set to 1 in our experiments.
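The poly learning rate schedule can be written as a short function. This is a minimal sketch of the stated formula; the base learning rate of 0.01 is a hypothetical value chosen for illustration, since the base rate is not given in the text.

```python
def poly_learning_rate(base_lr: float, epoch: int, total_epochs: int,
                       power: float = 0.9) -> float:
    """Base learning rate scaled by (1 - e / m) raised to the given power."""
    return base_lr * (1.0 - epoch / total_epochs) ** power


# m = 1000 epochs, as used for retinal vessel segmentation.
lr_start = poly_learning_rate(0.01, epoch=0, total_epochs=1000)
lr_middle = poly_learning_rate(0.01, epoch=500, total_epochs=1000)
lr_end = poly_learning_rate(0.01, epoch=1000, total_epochs=1000)
```

The rate starts at the base value and decays smoothly to zero at the final epoch.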
Furthermore, the architecture of the committee members differed between tasks so that the proposed model could fit each specific task. In the following sections, we discuss the features of these tasks and then describe the network used for each member.
For each experiment, we compared the proposed SPLD-QBC algorithm with the SPL strategy and the fully supervised learning (FSL) strategy. For the SPL baseline, the diversity module implemented by affinity propagation was removed, and the number of committee members was fixed at 4 (|C| = 4). For the FSL baseline, all training patches were fed to the model during the entire training period. In the SPL and FSL experiments, we trained the model for the same number of epochs as that of SPLD-QBC. For SPL, the hyperparameter ITDR was the same as the setting in SPLD-QBC.

B. RETINA VESSEL SEGMENTATION
Retinal vessel segmentation plays an important role in the automatic detection of retinal diseases with funduscopic images. Retinal blood vessel image analyses provide important information for the detection of several diseases, such as diabetes, retinopathy and glaucoma. The segmentation and localization of retinal blood vessels serve as important cues for the diagnosis of ophthalmological diseases, such as hypertension, microaneurysms and arteriosclerosis [32].
The DRIVE dataset is a public dataset consisting of 40 fundus images of size 565×584 [33]. The images were manually divided into a training set and a test set, both containing 20 images. The STARE dataset is another public dataset consisting of 20 fundus images of size 605 × 700 [34]. The images were manually divided into a training set and a test set, both containing 10 images.
Because the vessels in fundus images vary in shape, thickness and contrast, the model for retina vessel segmentation was designed accordingly. For the vessel segmentation tasks, we employed a full-resolution residual network (FRRN) [35] as the architecture for each member. The FRRN incorporates inputs of different sizes to successfully capture the segment boundaries and enhance the multiscale ability of the model. The last full-resolution residual unit defined in the FRRN was extracted as the latent feature of each member, and we used the feature to compute v k for SPLD-QBC training.
The retina vessel segmentation task is a binary classification task because each pixel belongs to either the vessel or the background. To evaluate the model performance, we calculated the Dice score (DSC) and Hausdorff distance (HD) between the segmentation results and the ground truths. The DSC measures the overlap between the segmented results and ground truths; a higher DSC indicates better model performance. The HD measures how far two subsets of a metric space are from each other; a lower HD indicates a better model. The comparison results are shown in Table 1.
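Both metrics can be sketched for small binary masks as follows. This is a minimal illustration that assumes the HD is computed between the sets of all foreground pixels with the Euclidean distance; evaluation code often restricts the HD to boundary pixels, and the exact variant used in the experiments is not stated. The function names and toy masks are hypothetical.

```python
import math
from typing import List, Tuple

Mask = List[List[int]]  # binary mask: 1 = vessel, 0 = background


def dice_score(pred: Mask, truth: Mask) -> float:
    """Twice the overlap divided by the total number of foreground pixels."""
    overlap = sum(p & t for rp, rt in zip(pred, truth) for p, t in zip(rp, rt))
    total = sum(map(sum, pred)) + sum(map(sum, truth))
    return 1.0 if total == 0 else 2.0 * overlap / total


def foreground(mask: Mask) -> List[Tuple[int, int]]:
    """Coordinates of all foreground pixels."""
    return [(r, c) for r, row in enumerate(mask)
            for c, value in enumerate(row) if value]


def hausdorff_distance(pred: Mask, truth: Mask) -> float:
    """Symmetric Hausdorff distance between the two foreground sets."""
    a, b = foreground(pred), foreground(truth)
    if not a or not b:
        raise ValueError("both masks need at least one foreground pixel")

    def directed(src, dst):
        # Largest distance from a source point to its nearest target point.
        return max(min(math.dist(p, q) for q in dst) for p in src)

    return max(directed(a, b), directed(b, a))


pred = [[1, 1, 0], [0, 0, 0], [0, 0, 0]]
truth = [[1, 0, 0], [0, 0, 0], [0, 0, 1]]
dsc = dice_score(pred, truth)
hd = hausdorff_distance(pred, truth)
```

In this toy case the masks share one of two foreground pixels each, and the HD is driven by the isolated ground-truth pixel in the corner.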
Several examples are shown in Figure 2. We juxtaposed the original vessel image, the segmentation results obtained by the FSL, SPL, and SPLD approaches and the corresponding ground truths. It is observed that the proposed SPLD-QBC algorithm extracted more capillaries than the SPL and FSL algorithms, thereby demonstrating that the model trained by the SPLD-QBC strategy has the strongest ability to capture both small and large objects because of diversity learning. The results also indicate that by using the SPLD training algorithm developed in Algorithm 1, the feature representation ability of the model was improved.

C. ORGAN SEMANTIC SEGMENTATION IN CHEST X-RAYS
Scanning patient organs using chest X-rays is one of the most important procedures, and it creates significant diagnostic workloads [2]. Accurate segmentation of the lung and heart boundaries provides valuable information for developing a computer-aided diagnosis (CAD) system for inspecting lung and heart function. However, due to the high variations in the shapes, sizes and contrasts of chest X-rays, organ segmentation in the lungs remains a challenging task.
Different from retina vessel segmentation tasks, the lung organ segmentation task is semantic because the model predicts each pixel as belonging to the left lung, right lung, heart or background. In this application, the experiments were performed on the JSRT dataset [36] and Montgomery dataset [37]. The JSRT dataset contains 247 chest X-rays, and the annotations include left lung, right lung and heart. The number of total pixel classes is 4. The Montgomery dataset contains 138 chest X-rays, and the total number of classes is three, including left lung, right lung and background. We scaled all images to 256 × 256 pixels while guaranteeing that the details of organ structures remained clear. In the experiments, 80% of the samples were randomly selected for training, and the remaining 20% were selected for testing.
The organ segmentation task is a semantic segmentation task because the number of pixel categories is more than 2. For this task, we employed AdapNet as the model of each member; it combines the convolutional mixture of deep experts (CMoDE) fusion scheme for learning robust kernels and enlarges the receptive field [38]. We extracted the feature maps of the layer directly before the first deconvolution layer as the latent features of each member. The mean intersection over union (mIoU) and HD between the segmentation results and the ground truths were used to measure the model performance. A higher mIoU indicates better model performance. The comparison results are shown in Table 2.
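The mIoU metric can be sketched for integer label maps as follows. This is a minimal illustration; skipping classes that are absent from both the prediction and the ground truth is a choice of this sketch, and the function name and toy label maps are hypothetical.

```python
from typing import List

LabelMap = List[List[int]]  # one class index per pixel


def mean_iou(pred: LabelMap, truth: LabelMap, num_classes: int) -> float:
    """Average the intersection over union across the classes that occur."""
    ious = []
    for cls in range(num_classes):
        intersection = union = 0
        for row_p, row_t in zip(pred, truth):
            for p, t in zip(row_p, row_t):
                in_pred, in_truth = p == cls, t == cls
                intersection += in_pred and in_truth
                union += in_pred or in_truth
        if union:  # skip classes absent from both maps
            ious.append(intersection / union)
    if not ious:
        raise ValueError("no class occurs in either label map")
    return sum(ious) / len(ious)


# Toy 2 x 2 label maps with three classes.
pred = [[0, 1], [2, 2]]
truth = [[0, 1], [1, 2]]
miou = mean_iou(pred, truth, num_classes=3)
```

Here class 0 is matched exactly, while classes 1 and 2 each reach an IoU of 0.5 because of the single mislabeled pixel.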
As shown in Table 2, the SPLD training strategy outperformed the SPL and FSL methods. A further comparison showed that although the mIoU improved by only a limited amount, the HD improved significantly. The proposed method achieved the best performance in terms of the HD metric; on the Montgomery dataset, its HD was approximately half the value achieved by the FSL strategy (reduced from 40.5771 to 22.7487). For the JSRT dataset, the HD was reduced by 5 pixels using the proposed SPLD-QBC compared to the FSL baseline. Segmentation examples are shown in Figure 3. For the Montgomery dataset, as the number of members in the committee increased, the amount of noise in the segmentation maps was reduced significantly. For the JSRT dataset, our proposed SPLD model generated impressive results on heart RoIs, even though these regions are difficult for humans to delineate. In addition, for both datasets, the shapes of the organs in the segmented images were more complete with SPLD than with FSL.
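
The two metrics used in this task can be illustrated with a minimal NumPy sketch. It is not the evaluation code used in the experiments: the function names `mean_iou` and `hausdorff_distance` are hypothetical, classes absent from both maps are skipped when averaging (an assumption), and the HD is computed by brute force over all foreground pixels of two non-empty binary masks, which is only practical for small masks.

```python
import numpy as np


def mean_iou(pred, target, num_classes):
    """Average the per-class IoU over classes present in pred or target."""
    pred = np.asarray(pred)
    target = np.asarray(target)
    ious = []
    for c in range(num_classes):
        p = pred == c
        t = target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # assumption: a class absent from both maps is skipped
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious))


def hausdorff_distance(mask_a, mask_b):
    """Symmetric Hausdorff distance, in pixels, between two non-empty masks."""
    a = np.argwhere(np.asarray(mask_a, dtype=bool))
    b = np.argwhere(np.asarray(mask_b, dtype=bool))
    # Pairwise Euclidean distances between every pixel of a and every pixel of b.
    d = np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2))
    # Largest nearest-neighbour distance, taken in both directions.
    return float(max(d.min(axis=1).max(), d.min(axis=0).max()))
```

For a four-class JSRT label map, `mean_iou(pred, target, num_classes=4)` would average over background, left lung, right lung and heart; the HD of a single organ would be obtained by passing the binary masks `pred == c` and `target == c`.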

D. NUCLEI IMAGE SEGMENTATION
Nuclei segmentation is one of the most important tasks for whole slide image analysis in digital pathology [39]. A key step in clinical practice is to extract informative components from whole slide images (WSIs) [40]. However, thousands of cells exist in a single WSI, and thus manual segmentation is tedious and time-consuming. To automatically segment the cells, we applied our model to the H&E-stained multiorgan nuclei segmentation dataset [40], which contains 30 WSIs and 22,000 corresponding cell boundaries. Each pixel in the WSIs was classified into one of two categories: cell or background. We set each member as a fully convolutional DenseNet (FC-DenseNet) [41]. In this task, we randomly selected 20 samples for training and the remaining 10 for testing. Each image was resized to a resolution of 1024 × 1024, and 64 patches with a resolution of 128 × 128 were then cropped from each WSI.
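
The patch count follows from the image and patch sizes: 1024 / 128 = 8 patches per side, and 8 × 8 = 64 patches per image. The sketch below illustrates this with a non-overlapping tiling in NumPy; the function name `crop_patches` is hypothetical, and the non-overlapping grid is an assumption that is consistent with the stated count of 64 rather than a description of the actual preprocessing code.

```python
import numpy as np


def crop_patches(image, patch_size=128):
    """Split an H x W (x C) image into non-overlapping square patches."""
    h, w = image.shape[:2]
    if h % patch_size or w % patch_size:
        raise ValueError("image size must be a multiple of patch_size")
    rows, cols = h // patch_size, w // patch_size
    extra = image.shape[2:]  # empty for grayscale, (C,) for colour images
    # View the image as a rows x cols grid of patch_size x patch_size tiles.
    tiles = image.reshape(rows, patch_size, cols, patch_size, *extra)
    tiles = tiles.swapaxes(1, 2)  # -> (rows, cols, patch_size, patch_size, ...)
    return tiles.reshape(rows * cols, patch_size, patch_size, *extra)
```

Patches are returned in row-major order, so patch `k` covers rows `(k // 8) * 128` onward and columns `(k % 8) * 128` onward of a 1024 × 1024 image.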
Similar to the retinal vessel segmentation task, we used the DSC and HD to evaluate the model performances. As shown in Table 3, our model achieved the best mIoU and HD performance. Figure 4 shows several examples of the model outputs, in which the improved areas are marked with yellow circles. For example, in the first row of Figure 4, the cells segmented by SPLD with 4 members are fully connected regions without holes, whereas holes exist in the segmented cells generated by the other four models. In addition, the number of false positive pixels is smaller in the results obtained by the SPLD method than in the SPL- and FSL-generated results. These observations indicate that learning in order from the easiest samples to the hardest samples facilitated model training, and that the segmentation performance of the model was improved by adding more members to the committee.
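
For reference, the DSC of a binary cell/background prediction can be computed as twice the overlap divided by the total foreground size of the two masks. The following is a minimal sketch under that standard definition; the function name `dice_score` and the convention of returning 1.0 when both masks are empty are assumptions, not details taken from the experiments.

```python
import numpy as np


def dice_score(pred, target):
    """Dice similarity coefficient between two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        return 1.0  # assumption: two empty masks count as a perfect match
    return float(2.0 * np.logical_and(pred, target).sum() / denom)
```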

V. CONCLUSION
In this paper, we proposed a versatile framework for medical image semantic segmentation that can enhance the performances of existing segmentation models. By applying the query-by-committee (QBC) technique, we dynamically selected the optimal sequence of training samples from different probability distributions so that the models could achieve improved performances. To prevent the model from becoming trapped in local minima, we employed a clustering algorithm to guarantee data diversity and thus encourage the model to pursue global minima. The experimental results indicated that the proposed SPLD could significantly boost the segmentation performance of the model and that it is easy to embed into a CNN-based segmentation model.