Polyp Segmentation of Colonoscopy Images by Exploring the Uncertain Areas

Colorectal cancer is one of the leading causes of death worldwide. Polyps are early symptoms of colorectal cancer and are prone to malignant transformation. Polyp segmentation of colonoscopy images can therefore aid diagnosis. However, existing studies on polyp segmentation of colonoscopy images face two main difficulties: blurry polyp boundaries and close resemblances between polyps and surrounding tissues. The former may lead to partial segmentations, while the latter can result in false positive segmentations. This paper proposes a new polyp segmentation framework to tackle these two challenges. In this method, an uncertainty-region based module called Uncertainty eXploration (UnX) is introduced to obtain the integral polyp region while eliminating interference from the background. Specifically, it refines the feature maps with ternary guidance masks by dividing the initial guidance maps into three types: foreground, background and uncertain region, so that the uncertain areas are highlighted to recover more foreground objects while the background is forcefully suppressed to avoid interference from background tissues. Taking UnX as side supervision of the transformer-encoder based backbone stages, the proposed method can gradually mine the boundary areas from the uncertain regions and finally obtain integral polyp segmentations. Moreover, a new module called Feature Enhancement (FeE) is also incorporated in the framework to enhance the discrimination for images with significant variations in polyp sizes and shapes. FeE supplies multi-scale features to the globally oriented transformer features. Experiments on five polyp segmentation benchmark datasets of colonoscopy images, Kvasir, CVC-ClinicDB, ETIS, CVC-ColonDB and CVC-300, show the superior performance of our proposed method. In particular, for ETIS, the most challenging of the five datasets, our method achieves 7.7% and 5.6% improvements in mDSC and mIoU respectively in comparison with state-of-the-art methods.


I. INTRODUCTION
COLORECTAL cancer (CRC) poses a serious threat to human health, with the fourth-highest cancer mortality rate in the world [1]. Polyps in the intestinal tissues are precursors of CRC and can easily turn into malignant lesions [2]. Colonoscopy can provide information on the location and appearance of colorectal polyps, allowing doctors to remove them before they turn cancerous. However, colonoscopy can sometimes miss polyps. Therefore, automatic and accurate polyp segmentation of colonoscopy images is of great significance, as it can provide additional support to clinicians.
These methods are still not robust enough to cope with challenging cases where there are blurry boundaries (e.g., the polyp in the top image of Fig. 1) or close resemblances between polyps and the surrounding tissue in colonoscopy images (e.g., the tissue in the bottom image of Fig. 1(a)). Consequently, partial segmentations of polyps and false positive segmentations are difficult to avoid for these methods (Fig. 1(d) and Fig. 1(e)).
FIGURE 1. Example segmentation results of two state-of-the-art methods (PraNet [3] and UACANet-S [4]) and ours for challenging polyps with blurry boundaries (e.g., top image) or similar background tissues (e.g., bottom image). (a) Original images; the green and yellow boxes mark the polyps and background tissue respectively. (b) GT represents the ground truth. (c)-(e) Segmentation results from the different methods. It can be seen that ours obtains more robust segmentation results.
Interestingly, salient foreground regions can be effectively obtained by different methods (e.g., transformer encoders [25]-[27] and convolutional neural network (CNN) encoders [28], [29]) and have been successfully applied to medical image segmentation [3], [22]. The salient areas provide important cues about the object distribution in the scene: the apparently dissimilar background should have features significantly different from those in the salient areas, while areas with intermediate feature distances to the salient areas may include the regions that are difficult to distinguish. The boundaries surrounding polyps and the polyp-like tissues in the background generally lie in these intermediate areas. Therefore, further mining the intermediate regions will improve the discrimination of the network, so that boundary regions and appearance-similar background tissues can be recognized properly for robust segmentation.
Therefore, a novel segmentation framework is proposed, which integrates the exploration of the uncertain areas as the main drive for robust polyp segmentation of colonoscopy images. In particular, it adopts a new module, Uncertainty eXploration (UnX), for this purpose, which guides the network to highlight uncertain areas and suppress the background areas through a ternary guidance mask, and consequently mines more polyp regions in the uncertain areas efficiently, without interference from background tissues. In addition, considering that the shapes and sizes of polyps vary greatly, a multi-scale feature augmentation module, Feature Enhancement (FeE), is also introduced to boost the performance of UnX through multi-scale atrous convolutions. Unlike previous atrous-convolution based methods [30], [31], FeE applies convolutions at different but appropriate rates in parallel to fit the large size variations of polyps, so that scale-rich features can be input to UnX without a significant increase in the number of parameters. This combined UnX and FeE strategy lets the proposed polyp segmentation method obtain strongly discriminative features and thus achieve robust segmentation of polyps (Fig. 1(c)).
This uncertainty-exploration idea can also be explained from human cognition [32]: humans have a deliberation ability by which, after initially seeing the salient parts of an object, they keep seeking its interesting parts without interference from the background. However, most existing studies [8], [21], [33] do not consider explicit background priors and may easily produce false positive segmentations because of the similarities between polyps and surrounding tissues [34]. Recent state-of-the-art methods try background priors based on a direct foreground/background binary division [3], [4]. However, this binary division can only obtain a rough background prior that may include fake foreground objects in the background. Therefore, it still cannot avoid false positive segmentations and partial segmentations. This paper, in contrast, explores the uncertain areas after separating the whole feature map into three areas with a ternary mask and, therefore, explicitly provides such a background prior to help the network fulfill a robust segmentation process. It is also worthwhile to note that, in a weakly supervised study on natural image segmentation, Hou et al. [35] once proposed a ternary-mask based method, where both ternary and binary masks are applied once sequentially. They need the binary mask to find more foreground areas from the background because of the possible errors, i.e., missing foreground objects, brought by the ternary mask due to their weak image-level supervision. Our method, however, takes a strongly supervised path, so the ternary mask alone is strong enough to help obtain salient foregrounds and backgrounds. In addition, multiple ternary-mask based side supervisions can gradually mine the boundary areas for integral polyp segmentations.
Our main contributions can be summarized as follows:
• An uncertainty exploration module, UnX, which takes a ternary guidance mask to explore the uncertain areas for more foreground objects while suppressing interference from the background, and thus boosts the discrimination of the network for robust foreground recognition.
• A feature enhancement module, FeE, which takes parallel atrous convolutions to augment the multi-scale feature representation with finer scales and thus helps improve polyp segmentation across different scales.
• A deep polyp segmentation framework for colonoscopy images, which integrates the two proposed modules, UnX and FeE, into the transformer encoder for more accurate segmentation of polyp images.

II. RELATED WORK
Existing methods can be classified into two types: traditional methods and deep learning based methods. The former adopt artificially designed features while the latter utilize neural networks to extract them automatically. Traditional methods are mainly based on low-level features, such as textures [14], geometric features [14] or superpixels [36]. However, they tend to have poor segmentation performance, especially in comparison with the deep learning based methods. Therefore, our main focus here is on the deep learning based methods. Many deep learning based methods have been proposed for polyp segmentation. Some of them [37]-[40] directly use basic deep learning models for polyp segmentation. For example, Vazquez et al. [38] adopted a fully convolutional network (FCN), which was perhaps the first deep learning work for polyp segmentation; Qadir et al. [40] employed Mask R-CNN for joint polyp detection and segmentation learning. These models focus on the basic application of deep learning and thus obtain limited performance, especially on the two main challenges.
Considering that the boundary of a polyp is often blurry, some researchers [41]- [43] took extra boundaries as supervision, which however need more edge ground truth.
The attention mechanism is also adopted by some methods [3], [4], [33], where robust salient foreground regions provide important initial cues for segmentation. Among them, foreground/background binary-division oriented methods [3], [4] achieve state-of-the-art performance. However, these methods build on a binary classification of the whole image into foreground and background without considering the uncertainty of the background. Kim et al. [4] incorporated an additional uncertain area through a bias operation, but such uncertain areas are actually very thin and lie along the foreground edges and, therefore, can only bring limited performance gains as edge supervision. We, however, propose the ternary-mask based UnX to uncover more foreground areas from much wider uncertain regions; it acts as side supervision to fully explore the uncertain areas and suppress interference from the apparent background objects.
The recent development of the vision transformer has spawned novel attention based methods [21]-[24], [44]. For example, TransFuse [21] and SwinE-Net [44] employ a two-branch architecture combining CNNs and transformers in a parallel style, while Dong et al. [22] applied the transformer to the backbone. These latest ideas show impressive results with higher accuracies. However, transformers mainly extract global features. Our method additionally integrates FeE into the transformer encoder for finer scales and takes UnX to discover more foreground objects from the uncertain regions.
There are some studies on learning robust multi-scale features [10], [19], [30], [31], [42] to handle the vast variety of polyp shapes and sizes. For example, some studies [10], [42] adopt standard convolutions or pooling of different sizes to extract multi-scale features. However, standard convolution may require more parameters or have a limited receptive field, and pooling may cause loss of spatial information. Atrous convolution [45] was put forward to enlarge the receptive field and was adopted by several studies [19], [30], [31] for polyp segmentation. For example, Sun et al. [30] adopted only one atrous convolution at the end of the encoder to widen the receptive field, but it is difficult to capture the appropriate context with a single atrous convolution. ResUNet++ [31] introduces atrous spatial pyramid pooling (ASPP) [46] to capture multi-scale context information with larger atrous rates, but this can be insufficient for small polyps. Our proposed FeE adopts several parallel atrous convolutions with atrous rates appropriate for polyps, which can extract multi-scale context information for large size variations of polyps.

III. THE PROPOSED METHOD
This section will first introduce the structure of the proposed framework briefly and then present the two main modules, UnX and FeE, in detail.
Overall, the proposed method aims to segment polyps in colonoscopy images robustly, without blurry boundaries or distraction from background objects. Its main strategy is to explore the uncertain areas through a ternary mask that removes the apparent background, which can also be seen as mimicking the human deliberation process with an explicit background prior.
The first important component is the salient foreground extraction, which provides important foreground cues for creating an effective ternary guidance mask. Here, the transformer [26], [27], [47], which has been shown to outperform traditional CNN methods [48], [49], is adopted. In particular, the transformer encoder is applied so that strong backbone features can be obtained to support the robust extraction of the salient foreground parts and, further, complete and efficient segmentation.
Then the features from different stages of the encoder can be merged through a Partial Decoder (PD) module [28] to obtain the segmentation result. However, this backbone can only detect salient foreground regions and thus has difficulty localizing whole objects, especially when there are blurry boundaries between polyps and surrounding tissues. Therefore, UnX is integrated, which takes the ternary guidance mask to expand the foreground areas into the uncertain regions. UnX relies on the features from PD, which are generated from those of the encoder transformers.
However, the transformer features mainly provide information about the global distribution [21] and thus miss the important local details needed for robust segmentation of size- and shape-varied polyps. Therefore, FeE is integrated at various stages and supplies rich features at both local and global scales to boost the performance of UnX. Fig. 2 demonstrates the principle of the proposed method with the refined features progressively obtained by UnX and FeE. It can be seen that UnX helps obtain more complete foreground features than the baseline (Fig. 2(d)). However, its features are less rich and discriminative than those obtained after additionally integrating FeE (Fig. 2(e)).
Accordingly, our framework is designed as follows (Fig. 3). It takes the four-stage transformer encoder T as in Wang et al. [27] to obtain the initial backbone features and applies FeE to each encoder stage so that multi-scale features can be obtained as input to PD. FeE is not applied to the first backbone stage because its features are too coarse. The PD features U(5) are then input to UnX, which combines them with the backbone features F(5) from the last encoder stage to enhance the discrimination of the network, so that more foreground objects in the uncertain areas can be discovered. The enhanced features U(4) are then up-sampled and input to the next UnX, where they are combined with the corresponding backbone features F(4) at the higher scale to further boost the foreground features in the uncertain areas. This process repeats at the earlier encoder stages until reaching the first stage, which is too coarse to consider, producing the correspondingly enhanced features U(3) and U(2); the final features U(2) are taken as the prediction after applying the sigmoid function. Formally, with I the input image from which the backbone features F(i) are extracted, and U, F and P the functions of UnX, FeE and PD respectively, the process can be summarized as U(5) = P({F(F(i))}) and U(m) = U(U(m+1), F(m+1)) for m = 4, 3, 2. We now discuss the details of UnX and FeE.
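The dataflow just described can be sketched abstractly. The function below is an illustration only: `encoder`, `fee`, `pd`, `unx` and `act` are passed in as callables standing for the transformer encoder, FeE, PD, UnX and the final sigmoid, and all names are assumptions rather than the paper's actual implementation.

```python
def segment(image, encoder, fee, pd, unx, act):
    """Illustrative dataflow of the framework (all names are assumptions).

    encoder: image -> per-stage backbone features [F(1), ..., F(4)]
    fee:     Feature Enhancement, applied to every stage but the first
    pd:      Partial Decoder fusing enhanced features into U(5)
    unx:     (previous U, stage feature) -> refined U
    act:     final activation (sigmoid in the paper)
    """
    feats = encoder(image)
    enhanced = [fee(f) for f in feats[1:]]  # FeE skips the first stage
    u = pd(enhanced)                        # U(5)
    for f in reversed(enhanced):            # deep-to-shallow: U(4), U(3), U(2)
        u = unx(u, f)
    return act(u)                           # prediction P
```

The deep-to-shallow loop mirrors the cascade of UnX modules, with each step consuming the previous refined features and one stage of backbone features.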

A. THE UNCERTAINTY EXPLORATION MODULE
The uncertainty exploration module UnX aims at finding more complete foreground regions in the uncertain areas while eliminating interference from the background, according to the saliency distribution from the transformer encoder. The salient regions represent significant foreground objects and, therefore, their features can be taken as a guidance to divide the whole feature map into three types of areas: foreground, background and uncertainty. Foreground represents the high-response areas, i.e., the salient foreground; background represents the low-response areas, i.e., the background objects apparently dissimilar to the foreground; and uncertainty covers the remaining, intermediate responses, i.e., the uncertain areas.
The uncertain areas may contain both foreground and background objects and thus should be explored again for more salient features belonging to the foreground. Those features can be incorporated with the salient features from the backbone and consequently help boost the discrimination of the network, so that the foreground is recognized more completely and accurately.
Now the question is how to obtain the uncertain areas. Naively, a binary mask could be used to isolate them, e.g., by setting both the low- and high-response areas to zero. However, boosting the foreground features under such an equal treatment of both types of areas may lead to some background areas being falsely detected again. Therefore, it is better to have a mask that helps suppress the low-response areas. Consequently, a ternary guidance mask is defined, where the pixels corresponding to the low-response areas and the high-response ones are set to -1 and 0 respectively, with the pixels corresponding to the uncertain ones set to 1. This setup makes the network suppress the low-response areas so that no background objects are detected, while emphasizing the uncertain areas with the highest mask value.
UnX is designed based on such a ternary mask (Fig. 4). The current feature map U_c from the previous low-resolution module is first up-sampled and then normalized by the sigmoid function to obtain a feature map U_s:

U_s = δ(U(U_c)),

where δ and U denote the sigmoid function and the up-sampling operation respectively. Then, a ternary guidance mask M is generated from U_s according to two thresholds δ_h and δ_l. The value of the i-th pixel M_i is set to 1, 0 or -1 according to whether its corresponding feature value in U_s lies between the two thresholds, is higher than δ_h, or is lower than δ_l, representing the area types of uncertainty, foreground and background, respectively.
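As a concrete sketch, the mask generation can be written in a few lines of NumPy. The function name is illustrative, and the default thresholds reuse the values given later in the training settings (δ_h = 0.7, δ_l = 0.3).

```python
import numpy as np

def ternary_guidance_mask(u_s, delta_h=0.7, delta_l=0.3):
    """Build the ternary mask M from the sigmoid-normalized map u_s in [0, 1]."""
    m = np.ones_like(u_s)      # intermediate responses: uncertain areas -> 1
    m[u_s > delta_h] = 0.0     # high responses: salient foreground -> 0
    m[u_s < delta_l] = -1.0    # low responses: apparent background -> -1
    return m
```

For example, responses 0.1, 0.5 and 0.9 map to -1 (background), 1 (uncertain) and 0 (foreground) respectively.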
Once M is ready, it is multiplied with the features from the current stage, F_c, to obtain a weighted feature map F_t:

F_t = M ⊗ F_c, (6)

where ⊗ denotes element-wise multiplication. This multiplication significantly highlights the uncertain areas for further foreground feature extraction.
According to Eq. (6), the responses from the background areas are inverted in F_t and consequently their possibility of being taken as foreground objects is greatly reduced. At the same time, the foreground areas are erased. As a result, the features from the potential foreground areas stand out. Finally, F_t is convolved with a 3 × 3 filter for further feature integration and then combined with the up-sampled features U(U_c) from the previous module through element-wise addition. This map is convolved again with a 3 × 3 filter for integration, so that a new UnX feature is output for the next module:

U_c' = f_c(f_c(F_t) + U(U_c)),

where f_c represents the 3 × 3 convolution and U_c' is the output UnX feature.
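The refinement step can be sketched in PyTorch as below. The channel count, interpolation mode and exact module layout are assumptions not specified at this level of detail in the paper; the thresholds reuse δ_h = 0.7 and δ_l = 0.3 from the training settings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UnX(nn.Module):
    """Sketch of the Uncertainty eXploration step (layout is an assumption)."""
    def __init__(self, ch, delta_h=0.7, delta_l=0.3):
        super().__init__()
        self.delta_h, self.delta_l = delta_h, delta_l
        self.conv1 = nn.Conv2d(ch, ch, 3, padding=1)  # integrate masked features
        self.conv2 = nn.Conv2d(ch, ch, 3, padding=1)  # integrate the fused map

    def forward(self, u_c, f_c):
        # Up-sample previous features to the current resolution, then normalize.
        u_up = F.interpolate(u_c, size=f_c.shape[2:], mode='bilinear',
                             align_corners=False)
        u_s = torch.sigmoid(u_up)
        # Ternary guidance mask: uncertain -> 1, foreground -> 0, background -> -1.
        m = torch.ones_like(u_s)
        m = torch.where(u_s > self.delta_h, torch.zeros_like(m), m)
        m = torch.where(u_s < self.delta_l, -torch.ones_like(m), m)
        f_t = m * f_c          # highlight uncertain areas, invert background ones
        return self.conv2(self.conv1(f_t) + u_up)
```

The mask multiplication leaves only the uncertain responses positive, matching the suppression behavior described above.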

B. THE FEATURE ENHANCEMENT MODULE
The feature enhancement module FeE aims at supplying multi-scale features for polyps of varying sizes and shapes, considering that the backbone transformer features can only supply limited global-scale information. Here, atrous convolution is adopted for this purpose. As a special form of the standard 3 × 3 convolution, it expands the receptive field by inserting gaps between pairs of convolution elements without introducing more parameters. Different atrous rates lead to different receptive fields and thus can help obtain abstract features for large or small targets, which fits the varying polyps well. Therefore, FeE takes multiple atrous convolutions at different atrous rates to obtain multi-scale features. However, a big atrous rate is not necessary because big polyps are generally few. Therefore, FeE only considers three parallel branches with a maximum rate of 5 to extract features at different scales simultaneously (Fig. 5). Taking the ordinary 3 × 3 convolution as an atrous convolution with rate 1, the three branches all start from a convolution at rate 1 and further convolve at the different atrous rates 1, 3 and 5. Features extracted from these branches are integrated by the concat operation and further convolved with a 3 × 3 filter to obtain the final output F_c:

F_c = f_c(C(B_1(X), B_3(X), B_5(X))),

where X denotes the input feature map, C and B_i (i = 1, 3, 5) represent the concat operation and the function of the branch with maximum atrous rate i respectively, and f_c the 3 × 3 convolution. We now discuss the loss design.
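A minimal PyTorch sketch of FeE under these assumptions follows; channel counts are illustrative, and the assumption that each branch starts with a rate-1 convolution and the rate-3 and rate-5 branches add one atrous convolution each reflects one plausible reading of Fig. 5, not a confirmed layout.

```python
import torch
import torch.nn as nn

class FeE(nn.Module):
    """Sketch of the Feature Enhancement module (layout is an assumption)."""
    def __init__(self, ch):
        super().__init__()
        def branch(rate):
            layers = [nn.Conv2d(ch, ch, 3, padding=1)]   # shared rate-1 start
            if rate > 1:                                 # add one atrous conv
                layers.append(nn.Conv2d(ch, ch, 3, padding=rate, dilation=rate))
            return nn.Sequential(*layers)
        self.branches = nn.ModuleList([branch(r) for r in (1, 3, 5)])
        self.fuse = nn.Conv2d(3 * ch, ch, 3, padding=1)  # 3x3 conv after concat

    def forward(self, x):
        feats = [b(x) for b in self.branches]  # three receptive fields in parallel
        return self.fuse(torch.cat(feats, dim=1))
```

Setting `padding` equal to `dilation` for a 3 × 3 kernel keeps the spatial size unchanged, so the three branch outputs can be concatenated directly.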

C. LOSS FUNCTION
In total, four side supervisions are gradually integrated into the four stages of the framework (Fig. 3). Therefore, the overall loss can be written as

L = Σ_{m=2}^{5} L_m,

where L_m (m = 2, 3, 4, 5) are the corresponding side-supervision losses between the UnX features U(m) and the ground truth. For each side supervision, two losses are considered: the binary cross entropy (BCE) loss [50] and the Intersection over Union (IoU) loss [51]. The former is the most widely used loss function based on pixel-level constraints, while the latter aims at optimizing the global structure rather than focusing on individual pixels. Consequently, the loss for the m-th side supervision can be formulated as

L_m = L_m^BCE + L_m^IoU,

where L_m^BCE and L_m^IoU are the BCE and IoU losses respectively:

L_m^BCE = -(1/N) Σ_i [p_i log p̂_i + (1 - p_i) log(1 - p̂_i)], (11)

L_m^IoU = 1 - (Σ_i p_i p̂_i) / (Σ_i (p_i + p̂_i - p_i p̂_i)),

where p_i and p̂_i represent the ground truth and the output of the proposed framework respectively for the i-th pixel of the m-th stage output, and N is the total number of pixels. The whole framework, including the new modules UnX and FeE, has now been introduced; next, the training process to fulfill the segmentation is summarized.
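Assuming the standard, unweighted forms of the BCE and IoU losses (the paper does not state any extra weighting), one side-supervision loss and the overall loss can be sketched in NumPy as follows; the function names are illustrative.

```python
import numpy as np

def side_loss(pred, gt, eps=1e-8):
    """L_m = BCE + IoU loss for one side supervision.

    pred: predicted probabilities in (0, 1); gt: binary ground truth.
    """
    bce = -np.mean(gt * np.log(pred + eps)
                   + (1 - gt) * np.log(1 - pred + eps))
    inter = np.sum(pred * gt)
    union = np.sum(pred) + np.sum(gt) - inter
    iou = 1.0 - (inter + eps) / (union + eps)
    return bce + iou

def total_loss(preds, gt):
    """Overall loss: sum of the side-supervision losses L_2..L_5."""
    return sum(side_loss(p, gt) for p in preds)
```

When a prediction closely matches the ground truth, both terms approach zero, so the combined loss does as well.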

A. TRAINING SETTINGS
The proposed model is implemented in PyTorch [52] with an NVIDIA RTX 2080Ti GPU for acceleration. The Adam [53] algorithm is used to optimize the model parameters. The learning rate is set to 1E-4, the batch size for training is 16, and the maximum number of epochs is set to 100. The thresholds δ_h and δ_l are set to 0.7 and 0.3 respectively.

B. TRAINING ALGORITHM
The whole process is trained in an end-to-end way as shown in Algorithm 1. Note that the j-th UnX module in this algorithm refers to the UnX module directly connected to the j-th encoder stage, as shown in Fig. 3.

A. DATASETS
The experiments were carried out on five colonoscopy image datasets for polyp segmentation:
• CVC-300 [38]: a test set selected from EndoScene [38], which contains 60 574×500 images from 44 colonoscopy sequences of 36 patients.
• Kvasir [57]: consists of 1000 polyp images. Unlike the other datasets, its image sizes vary from 332×487 to 1920×1072, and the polyp sizes and shapes also vary.
The same training and testing data as [3] are adopted for fair comparison. The training dataset contains 1450 images selected from Kvasir [57] and CVC-ClinicDB [55]. The sizes of all training images are set to 352 × 352. All five datasets except the training images from Kvasir and CVC-ClinicDB are taken as testing images.

Algorithm 1 Polyp Segmentation of Colonoscopy Images by Exploring the Uncertain Areas
Input: Image I, Ground truth G.
Output: The segmentation prediction, P.
1: Initialize the transformer encoder T with pre-trained parameters and randomly initialize the remaining parts.
2: while training is not convergent do
3: Generate the backbone feature maps F(i) (i = 1, 2, 3, 4) ← T(I, G).
...
13: end while
14: Optimize network by minimizing L.
15: end while
16: Compute P by sigmoid, i.e., P = δ(U(2)).

B. EVALUATION METRICS
Several popular metrics are adopted to evaluate the performance, including the Dice Similarity Coefficient (DSC) [58], Intersection over Union (IoU) [59], Weighted F-measure (F_β^w) [60], and Mean Absolute Error (MAE) [61]. DSC and IoU are similarity measures at the regional level, focusing on the internal consistency of the segmented objects:

DSC = 2|A ∩ B| / (|A| + |B|), IoU = |A ∩ B| / |A ∪ B|,

where A and B represent the pixel sets of the ground truths and their detection results, respectively. This paper additionally takes mDSC and mIoU to represent the average DSC and IoU over all n test images, i.e.,

mDSC = (1/n) Σ_{i=1}^{n} DSC_i, mIoU = (1/n) Σ_{i=1}^{n} IoU_i,

where DSC_i and IoU_i represent the DSC and IoU of the i-th test image respectively. F_β^w comprehensively considers precision and recall and removes the equal treatment of all pixels in conventional indicators.
F_β^w = (1 + β²) · P^w · R^w / (β² · P^w + R^w),

where: 1) P^w and R^w represent the weighted precision and weighted recall [60], respectively; and 2) β is the coefficient, set to one here. MAE is a pixel-by-pixel comparison index, denoting the average absolute error between the predicted values and the true values. The smaller the MAE, the better the model.
MAE = (1/n) Σ_{i=1}^{n} |p̂_i - p_i|,

where p̂_i and p_i represent the prediction and the corresponding ground truth for the i-th pixel of n pixels in total.
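These metrics translate directly into code. A small NumPy sketch is given below (function names are illustrative; F_β^w is omitted since its pixel-weighting scheme follows [60]):

```python
import numpy as np

def dsc(pred, gt, eps=1e-8):
    """Dice Similarity Coefficient on binary maps."""
    inter = np.sum(pred * gt)
    return 2.0 * inter / (np.sum(pred) + np.sum(gt) + eps)

def iou(pred, gt, eps=1e-8):
    """Intersection over Union on binary maps."""
    inter = np.sum(pred * gt)
    return inter / (np.sum(pred) + np.sum(gt) - inter + eps)

def mae(pred, gt):
    """Mean absolute error between prediction and ground truth."""
    return np.mean(np.abs(pred - gt))

def mean_metric(metric, preds, gts):
    """mDSC / mIoU: average a per-image metric over all n test images."""
    return np.mean([metric(p, g) for p, g in zip(preds, gts)])
```

For instance, a prediction covering two pixels against a one-pixel ground truth with one pixel of overlap yields DSC = 2/3 and IoU = 1/2.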

C. RESULTS
Three types of experiments are conducted to show the performance of the proposed method: qualitative experiments, quantitative experiments, and an ablation study. In the experiments, several methods including the state-of-the-art ones are adopted for comparison, including U-Net [16], SFA [42], PraNet [3], UACANet-S [4], SANet [33] and MSNet [8]. For fair comparison, the segmentation results are not thresholded at 0.5.

1) Qualitative experiments
These experiments visually compare the proposed method with the existing methods, including the state-of-the-art ones. Fig. 6 shows some segmentation results of the different methods. The proposed method can effectively detect polyps of different shapes without blurry boundaries (e.g., 1st and 2nd rows) or fakes from similar background tissues (e.g., 3rd row). It can be seen that our results are closest to the ground truths for all images among all methods. Fig. 7 shows the comparison of segmentation results on some challenging images. Those polyps are either rather small (e.g., 2nd row) or have irregular shapes (e.g., 1st row). The proposed method again demonstrates strong segmentation abilities. In particular, for ETIS (3rd row), only our method can successfully locate the polyp without interference from other tissues in the background.

2) Quantitative experiments
Statistical comparisons with existing methods are made to quantitatively evaluate the effectiveness of the proposed method. Considering that the training sets are selected from CVC-ClinicDB and Kvasir, the comparisons on learning ability are undertaken first. As shown in Table 1, our model is optimal on both datasets, with mDSC 1.8% and 0.5% higher than the state-of-the-art method MSNet on CVC-ClinicDB and Kvasir, respectively.
Next, the generalization abilities are tested on three unseen datasets (CVC-ColonDB, ETIS and CVC-300) (Table 2). The results of the proposed method are still optimal. In particular, for ETIS, the most challenging of the five datasets, where most images contain polyps too small to be easily found, our method achieves 7.7% and 5.6% improvements in mDSC and mIoU respectively compared with the state-of-the-art method MSNet. Fig. 8 shows the DSCs of these methods under different thresholds on the five polyp segmentation benchmark datasets. The curves show that the proposed model consistently outperforms the other models, which proves its good capability for polyp segmentation.

3) Ablation study
The quality of the proposed method, especially the performance of UnX and FeE, is evaluated through this study. Different configurations of the proposed method are considered here, denoted as follows. First, the feature abstraction abilities of the different configurations are visually tested, where the feature maps are extracted from the last convolution before PD and uniformly resized to 44×44 for better display. As can be seen in Fig. 9, the proposed network (full model) can detect more complete polyps than the network without UnX (w/o UnX). This is because UnX can efficiently mine more polyp regions in the uncertain areas. Compared to the network without FeE (w/o FeE), our proposed network (full model) can capture clearer details to enhance the feature representation. Then the statistical comparisons of the performances under these configurations are made (Table 3). As can be seen, except on Kvasir, the results of w/o UnX and w/o FeE are lower than those of the proposed full model. For Kvasir, the results of the full model are only slightly lower than those of w/o FeE. This is because there are more surrounding tissues in the Kvasir dataset, so the network with FeE may extract features that interfere with the feature responses of the polyps.

VI. CONCLUSIONS AND DISCUSSIONS
There are two main challenges in polyp segmentation of colonoscopy images: blurry boundaries and close resemblances between polyps and surrounding tissues. To overcome these two difficulties, a new transformer-encoder based polyp segmentation network is introduced, which takes a powerful ternary-guidance-mask based module, Uncertainty eXploration (UnX), to disclose more latent foreground areas from the uncertain areas and thus obtain robust foreground responses for complete object localization without distraction from background tissues. A multi-scale feature augmentation module, FeE, is also incorporated into the framework to obtain enhanced multi-scale features for UnX, so that varying sizes and shapes of polyps can be coped with efficiently. A series of quantitative and qualitative experiments shows that the proposed method is superior to the state-of-the-art methods and can achieve robust polyp segmentation of colonoscopy images without the effects of blurry boundaries and surrounding tissues. However, the inconsistent color distributions across different colonoscopy image datasets may affect the segmentation results. In addition, images with an extremely small proportion of polyps, i.e., less than 0.1 of the total image area, also challenge the segmentation performance. In the future, new tactics to overcome these limitations will be explored, e.g., new data enhancement methods or the inclusion of a weighted loss to balance the contributions of background and foreground pixels.
QINGQING GUO received the M.S. degree in 2017 from Anhui University. She is currently pursuing the doctor's degree with the School of Computer Science and Technology. Her research interests include computer vision and medical image processing.
XIANYONG FANG is currently a Full Professor with the School of Computer Science and Technology and also the Director of the Institute of Media Computing, Anhui University, Hefei, China. He was a Post-Doctoral Researcher with the Centre National de la Recherche Scientifique, Laboratoire d'informatique pour la mécaniqueet les sciences de l'ingénieur, Orsay, France. His current research interests include computer vision and computer graphics.
LINBO WANG received the B.S. degree in computer science from Shandong University, Jinan, China, in 2005, and the Ph.D. degree in computer science from Nanjing University, China, in 2014. He is now an associate professor at the School of Computer Science and Technology, Anhui University, China. His research interests include computer vision, image processing and computer graphics.