Model-Agnostic Post-Processing based on Recursive Feedback for Medical Image Segmentation

In medical image segmentation, post-processing can effectively improve the performance of a segmentation model. Existing post-processing methods generally require additional training of a post-processing model using training data or designing a post-processing procedure based on a high level of domain knowledge. Their application is limited in many real-world situations due to the lack of prerequisites. In this study, we present a post-processing method that can be applied to any existing segmentation model without requiring the use of training data or domain knowledge. Given a segmentation model of any type, the proposed method improves its prediction based on the recursive feedback mechanism. For a query image, we first obtain its prediction mask by using the segmentation model. Based on the prediction mask, we modify the original image by selectively blurring the area in which the target object is expected to be absent. Subsequently, the modified image is fed again into the model to acquire a refined prediction mask. We repeat this process to obtain multiple prediction masks, which are then combined to yield the final prediction. We verified the effectiveness of the proposed method through experiments using real-world medical datasets.


I. INTRODUCTION
Image segmentation is the task of classifying every pixel in an image to identify the set of pixels that belong to the target object. Segmenting numerous images manually is time-consuming and tedious; hence, the demand for automated image segmentation is high [1]. In particular, image segmentation has been actively studied in the medical domain because it plays an important role in identifying objects of interest in medical images, such as polyps, tumors, and skin lesions [1][2][3].
The image segmentation task can be formulated mathematically as follows. A query image of size $m \times n$ is given as $X \in \mathbb{R}^{m \times n \times 3}$. Based on its object region, image $X$ is annotated at the pixel level with the segmentation mask $Y \in \{0, 1\}^{m \times n}$, in which each element is 1 if the corresponding pixel in $X$ belongs to the target object region, and 0 otherwise. In the absence of the actual mask $Y$, a segmentation model $f$ seeks to segment $X$ by predicting $Y$ as a function of $X$, thereby yielding the prediction mask $\hat{Y} = f(X) \in [0, 1]^{m \times n}$. Each element of the prediction mask $\hat{Y}$ is a probabilistic prediction of the corresponding element in $Y$, ranging from 0 to 1.
Considerable research efforts have been made to achieve high-performance segmentation models. Conventional approaches are based on Markov random fields [4,5], conditional random fields (CRFs) [6,7], support vector machines [8,9], and random forests [10,11]. In addition, unsupervised learning algorithms have been applied to image segmentation, such as k-means clustering [12][13][14], region growing [15][16][17], and watershed transformation [18,19]. With the advance of deep learning in recent years, convolutional neural networks (CNNs) have shown dramatic performance improvement in image segmentation [20]. Given a training dataset of N images and their segmentation masks, a CNN is trained in an end-to-end fashion as a segmentation model. The primary research direction for improving image segmentation has focused on designing CNN architectures for segmentation models [21,22].
Another important research direction on improving image segmentation is the post-processing of the output from the segmentation model for better predicting the segmentation mask [23,24]. Existing post-processing methods can be categorized into two main approaches. The first approach is to build a post-processing model that refines the output from the segmentation model. The requirements of this approach are access to the original training dataset, and extra time and resources for additional training. The second approach is to devise a post-processing procedure manually, for which a high level of domain knowledge is necessary in general. Hence, the existing methods may not be applicable in situations where their prerequisites are not satisfied due to some practical constraints.
In this study, we present a model-agnostic post-processing method that can be applied to any segmentation model without requiring any specific prerequisites. The key idea of the proposed method is to make the segmentation model focus more on its input region, where the target object is expected to be present. Given an existing segmentation model of any type, the proposed method improves its prediction by introducing a post-processing procedure based on the recursive feedback mechanism. For a query image, we feed it into the segmentation model to obtain the prediction mask. Based on the prediction mask, we modify the original input image by selectively blurring its region where the target object is expected to be absent. Subsequently, we obtain an improved prediction mask by re-entering the modified image into the segmentation model. We perform the process repeatedly to obtain several prediction masks, which are then aggregated to obtain the final prediction mask. To validate the effectiveness of the proposed method, we conducted experiments with four real-world medical datasets.
The remainder of this paper is organized as follows. In Section II, we review the related work. In Section III, we introduce the proposed post-processing method. Section IV shows the experimental results. Finally, the conclusion and future work are given in Section V.

II. RELATED WORK
A. POST-PROCESSING METHOD FOR IMAGE SEGMENTATION
Various post-processing methods have been developed to improve the performance of existing segmentation models for image segmentation tasks. Given a query image, these methods post-process the prediction mask obtained from the model to predict the actual segmentation mask more accurately. Existing methods for medical image segmentation can be categorized into two main approaches.
The first approach is to build an additional post-processing model using the training data. In general, this approach has no restrictions on the domain to which it applies. The post-processing model uses the prediction mask produced by the segmentation model as well as additional inputs, such as the original image and handcrafted features, to obtain a refined prediction mask. Many studies utilized CRFs as a post-processing model [25]. CRFs help correct noise and inconsistency in the prediction mask by considering the geometric relations between pixels [26,27]. To alleviate the high computational cost of CRFs, dense-CRFs [28][29][30][31] and convolutional-CRFs [32] have been proposed. In other studies, non-CRF post-processing models have been used. Chlebus et al. [33] used a random forest with 36 handcrafted features regarding the shape of the object and its location in the image. Larrazabal et al. [34] used a denoising autoencoder to reduce noise in the prediction mask by considering topological restrictions and convexity.
The second approach is to manually design a post-processing procedure specialized to the target task based on a high level of domain knowledge for a specific domain. In the medical domain, this approach works by imposing constraints on the number, volume, shape, or location of objects in the prediction mask. The methods based on this approach are generally applicable to any segmentation model without requiring the use of training data. Zhao et al. [35] eliminated isolated small blocks in the prediction mask to reduce noise in brain tumor segmentation from MRI images. Feng et al. [36] retained the largest connected block and removed the other parts of the prediction mask to identify an organ in multi-organ segmentation from CT images. Groza and Kuzin [37] regarded small blocks in the prediction mask as noise to improve pneumothorax segmentation from chest X-ray images. Birenbaum and Greenspan [38] set a threshold on lesion size for multiple sclerosis lesion segmentation from MRI images. Wang et al. [39] merged neighboring blocks with border smoothing in the prediction mask for skin lesion segmentation from dermoscopy images.
Both approaches effectively improve the prediction mask from an existing segmentation model when specific prerequisites are satisfied. The first approach involves training an additional model using training data. The second approach utilizes high-level domain knowledge. Despite their effectiveness, their prerequisites may not be fulfilled in many real-world applications. To overcome this limitation, this study aims to present a post-processing method that does not require any specific prerequisites, such as the original training data or domain knowledge.

B. FEEDBACK MECHANISM FOR IMAGE SEGMENTATION
The feedback mechanism is a process wherein the output of a model is recursively routed back to the input of the model. This makes the input and output of the model affect each other consecutively [40]. Recently, several studies have adapted the feedback mechanism to improve image segmentation. In existing methods, a segmentation model is augmented to incorporate the feedback mechanism and is trained using a specialized learning objective.
An and Liu [41] augmented a CNN to incorporate feedback connections from the output layer to hidden layers based on feedback recovery and feedback selective algorithms. Shen et al. [42] used multi-stage multi-recursive-input CNNs as a segmentation model. In the sequence of CNNs, the side outputs in each stage were concatenated with the original input image to serve as the input to the CNN in the next stage. Shibuya and Hotta [43] used a CNN with LSTM layers as a segmentation model. The first output of the model was fed back into the model as input to obtain the second output as the final prediction mask. Girum et al. [44] used an encoder-decoder CNN as a segmentation model. The first output of the model was processed with another CNN to extract features, which were reintegrated into the intermediate layer of the model to obtain the second output.

Algorithm 1 Post-processing procedure based on recursive feedback
input: query image $X \in \mathbb{R}^{m \times n \times 3}$, segmentation model $f$
output: prediction mask $\hat{Y} \in [0, 1]^{m \times n}$
procedure POST-PROCESSING
    $X^{(0)} \leftarrow X$
    $\hat{Y}^{(0)} \leftarrow f(X^{(0)})$
    for $t = 1$ to $T$ do
        $X^{(t)} \leftarrow \hat{Y}^{(t-1)} \odot X^{(0)} + (1 - \hat{Y}^{(t-1)}) \odot \phi_{\mathrm{blur}}(X^{(t-1)})$
        $\hat{Y}^{(t)} \leftarrow f(X^{(t)})$
    end for
    $\hat{Y} \leftarrow \sum_{t=0}^{T} \gamma^{t} \hat{Y}^{(t)} / \sum_{t=0}^{T} \gamma^{t}$
end procedure
Unlike existing methods, this study aims to make use of the feedback mechanism as the post-processing for a given segmentation model of any type. The proposed post-processing method does not require any architectural modifications or additional training.

III. PROPOSED METHOD
A. OVERVIEW
In this section, we present the proposed method. The problem situation we address in this study is as follows. We are given a segmentation model $f$ of any type, which predicts the segmentation mask $Y \in \{0, 1\}^{m \times n}$ for an input image $X \in \mathbb{R}^{m \times n \times 3}$ as $\hat{Y} = f(X) \in [0, 1]^{m \times n}$. Neither the training data nor explicit domain knowledge is available for use. For a query image $X$, we wish to improve its prediction mask $\hat{Y}$ under this situation. The proposed method introduces a model-agnostic post-processing procedure that can be applied to any segmentation model without requiring any specific prerequisites. The key idea of the proposed method is the iterative updating of the prediction mask $\hat{Y}$ based on the recursive feedback mechanism. At each update, the prediction mask $\hat{Y}$ is utilized as feedback information to modify the input image $X$ by selectively blurring the area in which the target object is expected to be absent.
Algorithm 1 shows the pseudocode of the post-processing procedure for obtaining the predicted mask $\hat{Y}$. The schematic diagram is illustrated in Fig. 1.

B. POST-PROCESSING PROCEDURE
The post-processing procedure consists of three steps: (1) initial prediction; (2) iterative refinement based on recursive feedback; (3) aggregation for final prediction. We describe each step below.

1) Initial prediction
Given a query image $X$ and a segmentation model $f$, the query image is used as the initial input $X^{(0)}$, i.e., $X^{(0)} = X$. Using the segmentation model $f$, we obtain the initial predicted mask as:

$$\hat{Y}^{(0)} = f(X^{(0)}).$$

2) Iterative refinement based on recursive feedback
After initialization, we repeat the recursive feedback process $T$ times to obtain refined predicted masks. At the $t$-th iteration, $t = 1, \ldots, T$, the previously predicted mask $\hat{Y}^{(t-1)}$ is used to produce the modified image $X^{(t)}$ as follows:

$$X^{(t)} = \hat{Y}^{(t-1)} \odot X^{(0)} + (1 - \hat{Y}^{(t-1)}) \odot \phi_{\mathrm{blur}}(X^{(t-1)}),$$

where $\odot$ is the element-wise product and $\phi_{\mathrm{blur}}$ is a blur operator. Each pixel in $X^{(t)}$ is a convex combination of the corresponding pixels in the original image $X^{(0)}$ and the blurred version of the previous image $\phi_{\mathrm{blur}}(X^{(t-1)})$, with the prediction mask $\hat{Y}^{(t-1)}$ weighting their contributions. If an element of $\hat{Y}^{(t-1)}$ is close to 1, the corresponding pixel of $X^{(0)}$ contributes more; if it is close to 0, the corresponding pixel of $\phi_{\mathrm{blur}}(X^{(t-1)})$ contributes more. The pixels in $X^{(t)}$ that are expected to be devoid of the target object are blurred, while the remaining pixels are preserved. Consequently, the modified image $X^{(t)}$ maintains the target object context, whereas irrelevant aspects are suppressed. The modified image $X^{(t)}$ is fed into the segmentation model $f$ to obtain the updated predicted mask $\hat{Y}^{(t)}$, which is expressed as:

$$\hat{Y}^{(t)} = f(X^{(t)}).$$
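As an illustration of the image-modification step, the following minimal NumPy sketch blends the original image with a blurred copy of the previous input, weighted by the previous prediction mask. The function names `median_blur3` and `refine_input` are hypothetical, the 3×3 median blur with edge-replicated borders is a stand-in for the blur operator, and broadcasting the single-channel mask over the three color channels is an assumption about how the element-wise product is applied.

```python
import numpy as np


def median_blur3(image: np.ndarray) -> np.ndarray:
    """3x3 median blur per channel with edge-replicated borders (assumed border rule)."""
    padded = np.pad(image, ((1, 1), (1, 1), (0, 0)), mode="edge")
    # windows has shape (m, n, c, 3, 3): one 3x3 neighborhood per pixel and channel.
    windows = np.lib.stride_tricks.sliding_window_view(padded, (3, 3), axis=(0, 1))
    return np.median(windows, axis=(-2, -1))


def refine_input(x0, x_prev, y_prev, blur=median_blur3):
    """One feedback step: keep x0 where the mask is near 1, use blurred x_prev where near 0."""
    w = y_prev[..., np.newaxis]  # (m, n) -> (m, n, 1), broadcast over channels (assumption)
    return w * x0 + (1.0 - w) * blur(x_prev)


# Toy example with synthetic data: a square "object" mask on a random image.
rng = np.random.default_rng(0)
x0 = rng.random((8, 8, 3))
y0 = np.zeros((8, 8))
y0[2:6, 2:6] = 1.0
x1 = refine_input(x0, x0, y0)  # at t = 1 the previous image is the original image
```

In this toy example, pixels inside the square are copied unchanged from the original image, while pixels outside it are replaced by their blurred values.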

3) Aggregation for final prediction
After repeating the recursive feedback process, we obtain $T + 1$ prediction masks $\hat{Y}^{(0)}, \hat{Y}^{(1)}, \ldots, \hat{Y}^{(T)}$. We aggregate them with an exponential decay with respect to $t$ to obtain the final prediction mask $\hat{Y}$, as follows:

$$\hat{Y} = \frac{\sum_{t=0}^{T} \gamma^{t} \hat{Y}^{(t)}}{\sum_{t=0}^{T} \gamma^{t}},$$

where $\gamma$ is a decay factor whose value is in the range of (0, 1).
Owing to the decay, earlier prediction masks with smaller $t$ are weighted more heavily than later ones with larger $t$ and therefore have a greater effect on the final prediction mask $\hat{Y}$. Normalization by the denominator keeps every element of $\hat{Y}$ within the range of [0, 1].
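The aggregation step can be sketched in a few lines of NumPy. The function names `decay_weights` and `aggregate_masks` are hypothetical; the sketch only assumes that the masks are arrays of equal shape, ordered from t = 0 to t = T.

```python
import numpy as np


def decay_weights(num_iters: int, gamma: float = 0.5) -> np.ndarray:
    """Normalized weights gamma^t / sum_t gamma^t for t = 0, ..., T."""
    raw = gamma ** np.arange(num_iters + 1)
    return raw / raw.sum()


def aggregate_masks(masks, gamma: float = 0.5) -> np.ndarray:
    """Exponentially decayed average of the prediction masks, ordered from t = 0 to t = T."""
    stacked = np.stack(masks, axis=0)  # shape (T + 1, m, n)
    weights = decay_weights(len(masks) - 1, gamma)
    return np.tensordot(weights, stacked, axes=1)  # weighted sum over t


# With T = 2 and gamma = 0.5, the raw weights 1, 0.5, 0.25 normalize to 4/7, 2/7, 1/7.
weights = decay_weights(2, 0.5)
```

With T = 2 and γ = 0.5, the initial prediction therefore contributes 4/7 of the final mask, which illustrates why earlier masks dominate the aggregate.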

C. DISCUSSION
The proposed method utilizes the prediction mask of the segmentation model to determine where to blur. In the input image, the region predicted as the target class is regarded as important context, whereas the remaining region is regarded as irrelevant context. The image is repetitively modified by selectively blurring irrelevant contexts while preserving important contexts. Accordingly, the modified image contains fewer irrelevant aspects. This enables the segmentation model to focus more on important contexts while suppressing irrelevant contexts to predict the segmentation mask more accurately. Consequently, the final prediction is enhanced toward improving the segmentation performance.
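To make the overall procedure concrete, the following sketch wires the three steps together for an arbitrary model given as a Python callable. It is a minimal sketch under stated assumptions, not the authors' implementation: `recursive_feedback`, `box_blur3`, and `toy_model` are hypothetical names, the 3×3 average blur with edge-replicated borders is a stand-in blur operator, and `toy_model` is a brightness-based placeholder rather than a trained segmentation network.

```python
import numpy as np


def box_blur3(image: np.ndarray) -> np.ndarray:
    """3x3 average blur per channel with edge-replicated borders (stand-in blur operator)."""
    padded = np.pad(image, ((1, 1), (1, 1), (0, 0)), mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, (3, 3), axis=(0, 1))
    return windows.mean(axis=(-2, -1))


def recursive_feedback(model, x, num_iters=5, gamma=0.5, blur=box_blur3):
    """Post-process the predictions of `model` for one image x of shape (m, n, 3)."""
    x0 = np.asarray(x, dtype=np.float64)
    x_t = x0
    y_t = model(x0)                      # step 1: initial prediction
    masks = [y_t]
    for _ in range(num_iters):           # step 2: iterative refinement
        w = y_t[..., np.newaxis]         # broadcast the mask over channels (assumption)
        x_t = w * x0 + (1.0 - w) * blur(x_t)
        y_t = model(x_t)
        masks.append(y_t)
    weights = gamma ** np.arange(num_iters + 1)
    # Step 3: exponentially decayed, normalized aggregation.
    return np.tensordot(weights, np.stack(masks), axes=1) / weights.sum()


def toy_model(image: np.ndarray) -> np.ndarray:
    """Hypothetical placeholder for f: a sigmoid of mean brightness, not a real network."""
    brightness = image.mean(axis=-1)
    return 1.0 / (1.0 + np.exp(-10.0 * (brightness - 0.5)))


# Synthetic image: a bright square on a noisy darker background.
rng = np.random.default_rng(0)
image = 0.4 * rng.random((16, 16, 3))
image[4:12, 4:12] += 0.5
final_mask = recursive_feedback(toy_model, image, num_iters=5, gamma=0.5)
```

Because the model is only called as a black box, the same loop applies to any segmentation model that maps an image to a mask with values in [0, 1]. Setting `num_iters=0` reproduces the unprocessed prediction of the model.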

IV. EXPERIMENTS
A. DATA DESCRIPTION
We investigated the effectiveness of the proposed method using four medical image datasets: CVC-ClinicDB, Kvasir-SEG, Kvasir-Instrument, and ISIC-2017. The datasets correspond to different image segmentation tasks. Each dataset contains a number of images and their corresponding segmentation masks, which are annotated and verified by domain experts. Table 1 summarizes the characteristics of the datasets. Since the original images have various resolutions, we resized every image and segmentation mask to 384×384.
In the experiments, we randomly partitioned each dataset into 80% as the training set for training the segmentation model and the remaining 20% as the test set to evaluate the performance of the proposed method.

B. SEGMENTATION MODEL
To build segmentation models, we employed U-Net [49], which is the most widely used CNN architecture for medical image segmentation tasks [50,51]. U-Net has an encoder-decoder-like architecture with skip connections. It can extract global and local features simultaneously via feature map concatenation through the skip connections. Consequently, it affords high localization performance. We used VGG16 [52] as the encoder backbone of the architecture. Segmentation models were implemented based on the Segmentation_Models library.
For the experiments, each segmentation model was trained as follows. The loss function was set to binary cross-entropy. The parameters of the encoder in the model were pre-trained on ImageNet [53]. Subsequently, the parameters of the model were updated using 75% of the training dataset, for which we used the Adam optimizer [54] with a learning rate of $10^{-4}$ and a mini-batch size of 10. We applied online data augmentation with rotation, flip, shift, zoom, and brightness operations. The remaining 25% was used to validate the model for early stopping. The training was terminated when the validation loss failed to decrease over 10 successive epochs, or when the number of epochs reached 500.
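The data partitioning and the stopping rule described above can be sketched as follows. This is an illustrative sketch only and does not use the Segmentation_Models API: `partition_indices` and `stopping_epoch` are hypothetical helpers, the rounding of subset sizes is an assumption, and "failed to decrease" is interpreted here as "did not fall below the best validation loss seen so far".

```python
import numpy as np


def partition_indices(num_images: int, seed: int = 0):
    """Split image indices into train/validation/test subsets.

    20% of the images form the test set; the remaining 80% are split
    75%/25% into training and validation subsets.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(num_images)
    n_test = int(round(0.2 * num_images))
    test, trainval = order[:n_test], order[n_test:]
    n_val = int(round(0.25 * len(trainval)))
    val, train = trainval[:n_val], trainval[n_val:]
    return train, val, test


def stopping_epoch(val_losses, patience: int = 10, max_epochs: int = 500) -> int:
    """Return the 1-based epoch at which training would stop."""
    best = float("inf")
    stale = 0  # number of successive epochs without improvement
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if loss < best:
            best, stale = loss, 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return min(len(val_losses), max_epochs)


train_idx, val_idx, test_idx = partition_indices(100, seed=0)
```

For a dataset of 100 images, this yields 60 training, 20 validation, and 20 test images, i.e., the validation subset is 25% of the 80% training portion.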

C. EXPERIMENT SETTINGS
The proposed method involves three hyperparameters: the number of iterations $T$, the blur operator $\phi_{\mathrm{blur}}$, and the decay factor $\gamma$. In the experiments, we used the following settings for the hyperparameters. The number of iterations $T$ was varied from 1 to 10 to investigate its effect on the performance. As a baseline, we evaluated $T = 0$, for which the vanilla prediction mask of the segmentation model was used without applying recursive feedback. For the blur operator $\phi_{\mathrm{blur}}$, we compared four algorithms implemented in the OpenCV library (see footnote 2) to evaluate their effect on the performance: average blur, median blur, Gaussian blur, and bilateral filter. For all blur operators, we set the kernel size to 3. For Gaussian blur, the standard deviation of the Gaussian kernel was automatically set based on the default setting in the library. For the bilateral filter, the standard deviation of the filter was set to 75. The decay factor $\gamma$ was set to 0.5 for all experiments.
The performance of the proposed method was evaluated in terms of intersection over union (IoU) and area under the receiver operating characteristic curve (AUROC), which are commonly used metrics for image segmentation. Given a ground-truth segmentation mask $Y$ and its prediction $\hat{Y}$ for an image, IoU is defined as the number of target object pixels common to both masks divided by the number of target object pixels present in either mask. For IoU calculation, we set the decision threshold for identifying the target object pixels in $\hat{Y}$ to 0.5. The ROC curve plots the true positive rate against the false positive rate of pixel-level predictions for an image by varying the decision threshold. AUROC evaluates the overall performance for an image in a threshold-independent manner by integrating the performance over all possible settings of the threshold.
All experiments were performed 10 times independently with different random seeds. We report the mean and standard deviation of each result over 10 replications. Tables 2 and 3 show a comparison of the performance of the proposed method in terms of IoU and AUROC, respectively, with various settings of the number of iterations $T$ and the blur operator $\phi_{\mathrm{blur}}$. In each row, the results better than the baseline ($T = 0$) are shown in bold. The single and double asterisks (* and **) indicate that the proposed method significantly outperformed the baseline at significance levels of 0.05 and 0.01, respectively, based on the paired t-test.
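The two metrics can be computed per image as in the following NumPy sketch. The function names `iou` and `auroc` are hypothetical. Treating a probability exactly equal to 0.5 as a target pixel, and returning 1.0 when both masks are empty, are assumptions not stated in the text. AUROC is computed through the rank-sum formulation, which equals the area under the ROC curve and handles tied scores by average ranks.

```python
import numpy as np


def iou(y_true: np.ndarray, y_prob: np.ndarray, threshold: float = 0.5) -> float:
    """Intersection over union after thresholding the predicted probabilities."""
    pred = y_prob >= threshold  # assumption: values equal to the threshold count as object
    true = y_true.astype(bool)
    union = np.logical_or(pred, true).sum()
    if union == 0:
        return 1.0  # assumption: two empty masks count as perfect agreement
    return float(np.logical_and(pred, true).sum() / union)


def auroc(y_true: np.ndarray, y_prob: np.ndarray) -> float:
    """Pixel-level AUROC via the rank-sum (Mann-Whitney U) formulation."""
    labels = y_true.ravel().astype(bool)
    scores = y_prob.ravel()
    n_pos = int(labels.sum())
    n_neg = labels.size - n_pos
    if n_pos == 0 or n_neg == 0:
        return float("nan")  # undefined when only one class is present
    # Average 1-based ranks, so that tied scores share the same rank.
    _, inverse, counts = np.unique(scores, return_inverse=True, return_counts=True)
    ends = np.cumsum(counts)
    ranks = (ends - (counts - 1) / 2.0)[inverse.ravel()]
    u_stat = ranks[labels].sum() - n_pos * (n_pos + 1) / 2.0
    return float(u_stat / (n_pos * n_neg))


# Tiny worked example with synthetic values.
y_true = np.array([[1, 1], [0, 0]])
y_prob = np.array([[0.9, 0.4], [0.6, 0.1]])
example_iou = iou(y_true, y_prob)      # 1 common pixel out of 3 in the union
example_auroc = auroc(y_true, y_prob)  # 3 of 4 object/background pairs ranked correctly
```

In the worked example, thresholding at 0.5 marks two pixels as object, of which one overlaps the ground truth, giving an IoU of 1/3. The AUROC of 0.75 does not depend on any threshold.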
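The significance marking relies on a paired t-test, whose test statistic can be sketched with the standard library alone. The helper `paired_t_statistic` is hypothetical, and pairing the baseline and post-processed scores by replication is an assumption about how the test was set up. The numbers in the example are arbitrary placeholders, not results from the paper.

```python
import math


def paired_t_statistic(baseline, method):
    """Return (t, degrees of freedom) for paired samples of equal length."""
    if len(baseline) != len(method) or len(baseline) < 2:
        raise ValueError("need two samples of equal length with at least two pairs")
    diffs = [m - b for b, m in zip(baseline, method)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # unbiased sample variance
    if var == 0.0:
        raise ValueError("all paired differences are identical; t is undefined")
    return mean / math.sqrt(var / n), n - 1


# Arbitrary placeholder scores: the differences are 1, 2, 3, so t = 2 / sqrt(1 / 3).
t_value, dof = paired_t_statistic([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

With 10 replications there are 9 degrees of freedom, for which standard t tables give two-sided critical values of about 2.262 at the 0.05 level and 3.250 at the 0.01 level. Whether a one-sided or two-sided test was used is not stated in the text.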

D. EXPERIMENTAL RESULTS
As shown in the tables, the proposed method improved the segmentation performance over the baseline in terms of IoU and AUROC with statistical significance for the CVC-ClinicDB, Kvasir-SEG, and Kvasir-Instrument datasets. In these datasets, IoU continued to increase with $T$ and saturated at approximately $T = 5$, whereas AUROC was the highest at approximately $T = 1$. On the ISIC-2017 dataset, the proposed method did not improve IoU and only slightly improved AUROC. No significant performance differences were observed among the blur operators, although median blur consistently showed favorable performance across the experiments. Fig. 2 shows the images and their prediction masks, in which the proposed method with the median blur was the most effective. We observed that, as the number of iterations $T$ increased, the prediction mask improved by expanding the target object region while removing noise.
Footnote 2: https://github.com/opencv/opencv-python/
We examined the characteristics of the images in the benchmark datasets to determine the cases in which the proposed method performed better. For the first three datasets, the images exhibited distinct features in both the target object and background regions. Hence, blurring operations helped smooth the features in the background region. By contrast, the images in the ISIC-2017 dataset exhibited a monotonous background region, so blurring operations had little effect on the background region.
Based on the findings from the experimental results, we suggest the following settings for the proposed method. For the number of iterations $T$, we recommend setting $T = 5$ when the goal is to improve IoU and $T = 1$ when AUROC is the target performance indicator. For the blur operator $\phi_{\mathrm{blur}}$, we recommend using the median blur owing to its robustness and low computational cost compared with the other blur operators.

V. CONCLUSION
We presented a model-agnostic post-processing method based on the recursive feedback mechanism to improve image segmentation. Given a segmentation model and a query image, the proposed method first inputs the query image into the segmentation model to obtain the prediction mask. Subsequently, the image is modified by selectively blurring the region where the target object is not expected to be present. The modified image is re-entered into the segmentation model to obtain a refined prediction mask. Multiple prediction masks are obtained by repeating the process, and they are then aggregated with an exponential decay to obtain the final prediction mask. The proposed method can be applied to any existing segmentation model without requiring additional training.
We demonstrated the effectiveness of the proposed method using four real-world medical datasets. Experimental results showed that the proposed method improved the segmentation performance with statistical significance on the benchmark datasets. We expect that the proposed method will contribute to additional improvement of segmentation models in real-world situations where existing post-processing methods are not applicable owing to practical constraints. In fact, the proposed method can also be employed along with other post-processing methods. For future work, we will further enhance the proposed method by utilizing other image-smoothing operators. In addition, we will investigate the characteristics of query images for which the application of the proposed method is more beneficial, such that the hyperparameter $T$ can be adjusted adaptively for different query images.

VOLUME 4, 2016

[Fig. 2: images and their prediction masks. Recoverable panel labels: CVC-ClinicDB; T = 0 (baseline), IoU = 0.1132.]

He is currently an M.S. student at the Department of Industrial Engineering, Sungkyunkwan University. His research interests include machine learning applications in uncertainty quantification and post-processing methods for image segmentation.
SEOKHO KANG is an assistant professor of systems management engineering (industrial engineering) at Sungkyunkwan University. He received the B.S. and Ph.D. degrees in industrial engineering from Seoul National University in 2011 and 2015, respectively, and was a research staff member with Samsung Advanced Institute of Technology. His research focuses mainly on developing learning algorithms for efficient data-driven modeling and their applications to real-world data mining problems. He has published a number of papers in refereed journals and conference proceedings related to these areas.