Automatic Image Thresholding Based on Shannon Entropy Difference and Dynamic Synergic Entropy

An automatic thresholding method based on Shannon entropy difference and dynamic synergic entropy is proposed to select a reasonable threshold from the gray level image with a unimodal, bimodal, multimodal, or peakless gray level histogram. Firstly, a new concept called Shannon entropy difference is proposed, and the stopping condition of a multi-scale multiplication transformation is automatically controlled by maximizing Shannon entropy difference to produce edge images. Secondly, the gray level image is thresholded by the gray levels in order from smallest to largest to generate a series of binary images, followed by extracting contour images from the binary images. Then, a series of gray level histograms that can dynamically reflect gray level distributions and pixel positions are constructed using the edge images and the contour images synergically. Finally, dynamic synergic Shannon entropy is calculated from this series of gray level histograms, and the threshold corresponding to maximum dynamic synergic entropy is taken as the final segmentation threshold. The experimental results on 40 synthetic images and 50 real-world images show that, although the proposed method is not superior to 8 automatic segmentation methods in computational efficiency, it has more flexible adaptivity of selecting threshold and better segmentation accuracy.


I. INTRODUCTION
Image segmentation is one of the most fundamental, useful, and studied topics in image processing and analysis. The goal is a partition of the image into coherent regions, which is an important initial step in the analysis of image content. Numerous image segmentation algorithms have been developed in the last several decades, from the earliest methods, such as image thresholding [1], region growing and merging [2], [3], clustering [4], [5], watershed segmentation [6], [7], to more complex algorithms, such as active contours [8], graph cuts [9], [10], and deep learning-based methods [11], [12].
Among these methods, image thresholding is a simple, yet effective, way of separating targets from the background when the gray levels of the pixels belonging to targets are substantially different from the gray levels of the pixels belonging to the background [1], [13]-[15]. Image thresholding is also one of the most commonly used low-level image processing methods in various image analysis systems [1], [16]-[19]. Image thresholding compares the gray level of each pixel in a gray level image with a selected threshold to determine whether the pixel belongs to the targets or the background. Selecting an appropriate threshold thus becomes a key step for accurate thresholding segmentation, and the core goal of image thresholding is to make the selected threshold as close as possible to the expected optimal threshold [20].
The associate editor coordinating the review of this manuscript and approving it for publication was Jiachen Yang.
In terms of automatically selecting threshold, one representative idea is to apply the principle of maximum entropy [21] in information theory to select the segmentation threshold [22], [23]. The types of entropy involved in the thresholding methods derived from this idea are mainly Shannon entropy, Rényi entropy, Tsallis entropy, Arimoto entropy, and Masi entropy [24]. The basis for calculating the above various entropies is first to obtain the gray level distribution, or the gray level histogram, from a gray level image. According to the dimensions of gray level histogram and the spatial position relationship between pixels involved in constructing the gray level histogram, the gray level histogram can be divided into two categories: global one-dimensional histogram and local two-dimensional histogram. The change from global one-dimensional histogram to local two-dimensional histogram reflects a main idea in the development of thresholding methods based on the principle of maximum entropy: considering both gray level distributions and pixel positions.
The global one-dimensional gray level histogram can be established by counting the occurrence frequency of each gray level in a gray level image. A thresholding method based on the principle of maximum entropy with the global one-dimensional gray level histogram is usually called a global entropy method [25]- [30]. The global entropy method can be traced back to maximum Shannon entropy (MSE) method [25] proposed by Kapur et al. The core idea of MSE method is that when the sum of the background Shannon entropy and the target Shannon entropy takes a maximum value, the corresponding gray level in that situation is selected as a segmentation threshold. The theoretical premise of applying the MSE method is that Shannon entropy satisfies the additivity principle for statistics independent subsystems. When the gray level distributions of the background and the target are independent of each other, for example, the gray level distributions of the background and the target are both a uniform distribution and their distributions do not overlap, the MSE method can obtain a theoretical optimal segmentation threshold [31]. However, many real-world images, affected by factors such as noise and low pass filter, usually contain non-extensive information that is long-range correlation or long-term memory [30]. Shannon entropy cannot effectively express non-extensive information. Thus, it is difficult for the MSE method to select a reasonable segmentation threshold from those real-world images.
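The core idea of the MSE method described above can be illustrated with a minimal sketch. It is not the reference implementation of [25]: the function names, the use of the natural logarithm, and the convention that gray levels up to and including t belong to the background are assumptions made for illustration only.

```python
import numpy as np

def shannon_entropy(p):
    """Shannon entropy (natural log) of a probability vector; empty bins are skipped."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def kapur_threshold(hist):
    """Return the gray level t that maximizes H(background) + H(target).

    Assumed convention: background = levels 0..t, target = levels t+1..end.
    """
    p = np.asarray(hist, dtype=float)
    p = p / p.sum()
    best_t, best_h = None, -np.inf
    for t in range(len(p) - 1):
        w0 = p[: t + 1].sum()          # total background probability
        w1 = 1.0 - w0                  # total target probability
        if w0 <= 0 or w1 <= 0:
            continue                   # one class is empty, skip this split
        h = shannon_entropy(p[: t + 1] / w0) + shannon_entropy(p[t + 1:] / w1)
        if h > best_h:
            best_t, best_h = t, h
    return best_t
```

For two non-overlapping uniform distributions, the case mentioned above, any threshold in the gap between them separates the classes, and the sketch returns such a threshold.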
Many subsequent methods, such as maximum Rényi entropy method [26], maximum Tsallis entropy method [27], maximum Arimoto entropy method [28], and maximum Masi entropy method [30], continue this line of thinking similar to the MSE method, but they adopt Rényi entropy, Tsallis entropy, Arimoto entropy, or Masi entropy in terms of entropy model. For statistics independent subsystems, Rényi entropy, Tsallis entropy, and Arimoto entropy have the ability to describe non-extensive information, while Masi entropy can describe both extensive and non-extensive information [30]. Theoretically, the maximum Rényi, Tsallis, Arimoto, or Masi entropy method has the potential to improve the MSE method. However, Rényi entropy, Tsallis entropy, Arimoto entropy, and Masi entropy are all the entropy with parameters, and they just utilize additional parameters to characterize the extensibility and/or non-extensibility of random system. In practice, segmentation results are often sensitive to the used parameters when these entropies with parameters are applied to image thresholding [30]. This means that before a reasonable threshold can be automatically selected, an appropriate entropy parameter should be first automatically evaluated [29]. However, this is still currently an open issue. If an empirical fixed parameter or manual adjustment parameter is adopted, it will inevitably limit the automatic adaptability of these methods based on the principle of maximum entropy.
A thresholding method based on the principle of maximum entropy with the local two-dimensional gray level histogram is usually called a local entropy method [22], [32]- [36]. Local entropy methods realize a common problem of global entropy methods: the spatial correlation between pixels is not considered when the one-dimensional gray level histogram is constructed. For a specific global entropy method, as long as the one-dimensional gray level histograms of the gray level images are the same, the method will obtain a same segmentation threshold even if the contents of the gray level images are different. Local entropy methods attempt to alleviate the deficiencies of global entropy methods in the description of image content by constructing two-dimensional gray level histograms. Local entropy methods mainly adopt two ways to construct two-dimensional gray level histograms. The first way is to use an original gray level image and its local mean image [23], [32], [33], [35]. The second way is to construct a gray level co-occurrence matrix from an original gray level image [22], [34], [36]. Although it is possible to construct various gray level co-occurrence matrices with different directions and pixel distances, it mainly uses simple four-neighbor relationship of pixel pairs to build the gray level co-occurrence matrix because of the difficulty in automatically determining the most reasonable direction and pixel distance for a given gray level image.
While extending global entropy methods, local entropy methods implicitly or explicitly inherit three shortcomings of global entropy methods. First, although local entropy methods consider the spatial correlation between pixels, they limit the correlation to a small range of local neighborhoods. Second, if local entropy methods adopt Shannon entropy, the non-extensive information in a two-dimensional gray level histogram still acts as an obstacle to evaluating segmentation thresholds; if Rényi entropy, Tsallis entropy, Arimoto entropy, or Masi entropy is adopted, local entropy methods will also face the problem of automatically evaluating the corresponding entropy parameters. Third, local entropy methods extend the dimension of the gray level histogram, and thus there are more choices when applying the principle of maximum entropy. However, local entropy methods no longer consider the original gray level image after establishing two-dimensional gray level histograms, and this cuts off the association between the original image and the segmentation threshold.
In addition to the inherent shortcomings of global entropy methods and local entropy methods, the complex and variable targets and backgrounds also objectively increase the difficulty of automatically selecting a reasonable segmentation threshold. Affected by factors such as random noise, low pass filter, as well as the size of target and background, the gray level histogram of an image may be unimodal, bimodal, multimodal, or even peakless. When the basic distribution constituting a gray level histogram is a non-Gaussian distribution, such as a gamma distribution, an extreme value distribution, a Rayleigh distribution, a uniform distribution or a beta distribution, it remains challenging to automatically select the segmentation threshold that is as reasonable as possible.
To overcome the shortcomings of global entropy methods and local entropy methods, and also to deal with automatic threshold selection under the above different gray level distributions in a unified framework, this article proposes an automatic thresholding method guided by maximizing dynamic synergic entropy (MDSE). The MDSE method constructs dynamically a series of one-dimensional gray level histograms using synergically an invariant guiding edge image and a series of changing contour images. The guiding edge image is produced by performing a multi-scale multiplication transformation on an original gray level image, where the stopping condition of the multi-scale multiplication transformation is automatically controlled by maximizing Shannon entropy difference. The changing contour images are generated by continuously thresholding the original gray level image with every possible gray level, and then extracting internal and external contours from these binary images. The one-dimensional gray level histograms are synergically constructed from the guiding edge image and the changing contour images, which considers both gray level distributions and pixel positions. Thus, the calculated Shannon entropy (dynamic synergic entropy, DSE) based on this series of one-dimensional gray level histograms lays a foundation for applying the principle of maximum entropy to automatically select a reasonable threshold.
Some main contributions of this study can be summarized as follows:
1) In terms of extending the theory of Shannon entropy, a new concept called Shannon entropy difference is proposed, and it is used for automatically controlling the stopping condition of a multi-scale multiplication transformation.
2) Based on the analysis of the monotonicity of DSE, a new objective function is proposed to automatically select appropriate segmentation thresholds from the gray level images with different gray level distributions.
3) A novel image thresholding method called MDSE is proposed, which has better segmentation adaptability and robustness than other image thresholding methods.
4) 40 synthetic images and 50 real-world images with different gray level distributions are generated or collected, and their corresponding segmentation reference images are also provided. All these images are shared online.
The rest of this article is organized as follows: Section II focuses on a new concept, i.e., Shannon entropy difference, and its application. In particular, Section II.A defines the concept of Shannon entropy difference, and discusses the relationship between Shannon entropy difference and gray level histogram; Section II.B analyzes a technique called multi-scale multiplication transformation and proposes a criterion maximizing Shannon entropy difference to automatically stop computing multi-scale multiplication.
Section III proposes a new criterion for selecting the threshold based on DSE, and analyzes the rationality of calculating the final threshold according to the new criterion. Section IV describes the corresponding algorithm steps. Section V analyzes and discusses the experimental results of the proposed MDSE method and 9 compared methods on 40 synthetic images and 50 real-world images. Finally, Section VI draws several conclusions and describes future work.

A. DEFINITION AND ANALYSIS OF SHANNON ENTROPY DIFFERENCE
Given a gray level histogram of a gray level image, let a gray level $l$ divide this histogram into two parts (see Fig. 1). Suppose that there are $m$ gray levels on the left part, the discrete probability distribution of these gray levels is $q_i$ ($1 \le i \le m$) with $\sum_{i=1}^{m} q_i = 1$, and the total frequency on the left part is $L$. Suppose that there are $n$ gray levels on the right part, the discrete probability distribution of these gray levels is $p_i$ ($1 \le i \le n$) with $\sum_{i=1}^{n} p_i = 1$, and the total frequency on the right part is $R$. Let $H_{Left}$ and $H_{Right}$ denote the Shannon entropy corresponding to the left and the right parts, respectively. According to the definition of Shannon entropy, the following equations can be given:
$$H_{Left} = -\sum_{i=1}^{m} q_i \ln q_i, \qquad H_{Right} = -\sum_{i=1}^{n} p_i \ln p_i.$$
Let $H_{Left \cup Right}$ indicate the Shannon entropy corresponding to the whole histogram, and its expression is
$$H_{Left \cup Right} = -\sum_{i=1}^{m} \frac{s}{1+s} q_i \ln\left(\frac{s}{1+s} q_i\right) - \sum_{i=1}^{n} \frac{1}{1+s} p_i \ln\left(\frac{1}{1+s} p_i\right) = \frac{s}{1+s} H_{Left} + \frac{1}{1+s} H_{Right} + \ln(1+s) - \frac{s}{1+s} \ln s,$$
where
$$s = \frac{L}{R}.$$
Further, we define $H_{Right} - H_{Left \cup Right}$ as Shannon entropy difference. The next 3 propositions are used to analyze what distribution characteristics the gray level histogram in Fig. 1 should have in order to increase the Shannon entropy difference.
Taking the partial derivative of the Shannon entropy difference with respect to $s = L/R$ and simplifying it, we have
$$\frac{\partial (H_{Right} - H_{Left \cup Right})}{\partial s} = \frac{H_{Right} - H_{Left} + \ln s}{(1+s)^2}.$$
Further, let the above expression be greater than 0, then we get the solution $s > e^{-(H_{Right} - H_{Left})}$. Thus, a greater $H_{Right}$, a smaller $H_{Left}$, and a greater $s$ within this range make the Shannon entropy difference $H_{Right} - H_{Left \cup Right}$ take a relatively greater value. In other words, the Shannon entropy $H_{Right}$ corresponding to the right histogram composed of relatively few pixels should be as great as possible, and the Shannon entropy $H_{Left}$ corresponding to the left histogram composed of relatively many pixels should be as small as possible, which will make the Shannon entropy difference $H_{Right} - H_{Left \cup Right}$ take a relatively greater value. Further, according to the property that the more uniform the probability distribution is, the greater the Shannon entropy becomes [37], a kind of gray level histogram that makes the Shannon entropy difference $H_{Right} - H_{Left \cup Right}$ tend to take a relatively greater value is as follows: the right histogram composed of relatively few pixels should be distributed as uniformly as possible over a wider range of gray levels, and the left histogram composed of relatively many pixels should be concentrated in as narrow a range of gray levels as possible. Section II.B will elaborate on how to obtain the gray level histogram with the above distribution characteristics.
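The Shannon entropy difference of Section II.A can be checked numerically with a short sketch. The function names are hypothetical, the natural logarithm is assumed, and assigning the gray level l itself to the left part is an assumed convention. The sketch computes the entropy of the whole histogram from the left entropy, the right entropy, and the ratio s = L/R, so that it can be compared with the entropy computed directly.

```python
import numpy as np

def entropy(counts):
    """Shannon entropy (natural log) of a histogram given as raw counts."""
    c = np.asarray(counts, dtype=float)
    c = c[c > 0]
    if c.size == 0:
        return 0.0
    p = c / c.sum()
    return float(-(p * np.log(p)).sum())

def entropy_difference(hist, l):
    """Return (H_Right - H_LeftURight, H_LeftURight) for a split at gray level l.

    Assumed convention: levels 0..l form the left part, levels l+1.. the right part.
    """
    hist = np.asarray(hist, dtype=float)
    left, right = hist[: l + 1], hist[l + 1:]
    L, R = left.sum(), right.sum()
    if L == 0 or R == 0:
        raise ValueError("both parts of the histogram must contain pixels")
    s = L / R
    h_left, h_right = entropy(left), entropy(right)
    # Whole-histogram entropy expressed through H_Left, H_Right, and s
    h_whole = (s / (1 + s)) * h_left + h_right / (1 + s) \
        + np.log(1 + s) - (s / (1 + s)) * np.log(s)
    return h_right - h_whole, h_whole
```

A histogram whose many left pixels sit on a single gray level and whose few right pixels are spread uniformly yields a larger difference than a balanced histogram, in line with the analysis above.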

B. GUIDING EDGE IMAGE BASED ON MAXIMIZING SHANNON ENTROPY DIFFERENCE
Let the symbol $f$ denote a gray level image, and let the symbol $\nabla g_x$ denote the partial derivative of the two-dimensional Gaussian function $g(x, y) \propto \exp\left(-\frac{x^2 + y^2}{2\sigma^2}\right)$ with respect to $x$. Let the symbol $\kappa_x(\sigma)$ denote the absolute value of the output image resulting from the convolution of the image $f$ with the kernel $\nabla g_x$:
$$\kappa_x(\sigma) = \left| f * \nabla g_x \right|. \tag{5}$$
The image $\kappa_x(\sigma)$ is closely related to the convolution scale $\sigma$, and different $\sigma$ will produce different images $\kappa_x(\sigma)$. Define the multi-scale multiplication transformation of the image $f$ in the x-axis direction as the product of $u_x$ different images $\kappa_x(\sigma_i)$:
$$K_{u_x} = \prod_{i=1}^{u_x} \kappa_x(\sigma_i). \tag{6}$$
According to the sampling theory based on the Gaussian kernel function, convolving an image with a $(8\sigma_i + 1) \times (8\sigma_i + 1)$ Gaussian kernel can ensure that the convolution result sufficiently approximates the result obtained by convolving the image with the full Gaussian distribution [38]. In addition, for the convolution operation on digital images, the neighborhood defined by a convolution kernel is often centered on a pixel and the neighborhood size is usually odd [39], such as 3×3, 5×5, 7×7, etc. Thus, matching the kernel size $8\sigma_i + 1$ to the $i$-th odd size $2i + 1$, each convolution scale can be calculated in turn as
$$\sigma_i = \frac{i}{4}, \quad i = 1, 2, \ldots, u_x.$$
In Eq. (5), the existence of the Gaussian function makes edge signals and noises (or random details) have different response characteristics to $\nabla g_x$. When the convolution scale $\sigma$ gradually increases, the responses of noises (or random details) rapidly decrease, while the responses of edge signals show the following characteristics: 1) the response at the center of the edge remains relatively good; 2) the response gradually decreases as the pixel position gets farther and farther from the center of the edge.
Under the premise that the gray level of the image $K_{u_x}$ is normalized to [0, 255], as the number $u_x$ of images participating in the multi-scale multiplication transformation gradually increases, the product of the response values of noises (or random details) will gradually approach the minimum value 0, and the product of the response values of the edge signals will be spread between 0 and 255 (see the left images in Figs. 2(b)-(e)). As $u_x$ gradually increases, the gray level histogram of the image $K_{u_x}$ will show the following characteristics or tendencies (see the right images in Figs. 2(b)-(e)): 1) the gray level histogram shows a heavy right-tailed distribution; 2) the mode of the gray level histogram gradually shifts to the left, and the mode will be equal to 0 when $u_x$ is large enough; 3) if the value $u_x$ is continuously increased so that it exceeds a certain critical value, the multiplicative effect will cause more response values of edge signals to gather near 0. As a result, the gray level distribution between 0 and 255 becomes increasingly sparse, and the number of gray levels with frequency 0 gradually increases.
The relationship between the gray level histogram of the image $K_{u_x}$ and the value $u_x$ indicates that a relatively appropriate value $u_x$ needs to be found so that the gray level histogram of the image $K_{u_x}$ is more consistent with the gray level distribution characteristics expected in Section II.A. To this end, a natural choice is to maximize the Shannon entropy difference of the gray level histogram of the image $K_{u_x}$, which can be formalized as:
$$(u_x^*, l_x^*) = \arg\max_{u_x,\, l} \left( H_{Right} - H_{Left \cup Right} \right), \tag{7}$$
where $H_{Right}$ and $H_{Left \cup Right}$ are calculated from the gray level histogram of the image $K_{u_x}$ divided by the gray level $l$. Once the number $u_x^*$ of images participating in the multi-scale multiplication transformation is determined, the corresponding image $K_{u_x^*}$ can be calculated by combining Eqs. (5) and (6). Here the image $K_{u_x^*}$ will be called the guiding edge image in the direction of x-axis. The above analysis and reasoning for the image $f$ in the direction of x-axis are also applicable to the case in the direction of y-axis. Thus, for the image $K_{u_y}$ in the direction of y-axis, the relatively appropriate number $u_y^*$ of images participating in the multi-scale multiplication transformation can be calculated according to the following Eq. (8):
$$(u_y^*, l_y^*) = \arg\max_{u_y,\, l} \left( H_{Right} - H_{Left \cup Right} \right), \tag{8}$$
where the entropies are calculated from the gray level histogram of the image $K_{u_y}$. Similarly, the image $K_{u_y^*}$ will be called the guiding edge image in the direction of y-axis.
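The construction of a guiding edge image can be sketched as follows. This is an illustrative approximation, not the authors' implementation. Several choices are assumptions: the scales are taken as i/4 with kernel sizes 3, 5, 7, and so on; borders are padded by edge replication; the product is normalized by dividing by its maximum; and the best number of scales is found by an exhaustive search up to a hypothetical limit `max_u` rather than by an explicit stopping rule. All function names are hypothetical.

```python
import numpy as np

def entropy(counts):
    """Shannon entropy (natural log) of a histogram given as raw counts."""
    c = np.asarray(counts, dtype=float)
    c = c[c > 0]
    if c.size == 0:
        return 0.0
    p = c / c.sum()
    return float(-(p * np.log(p)).sum())

def best_entropy_difference(hist):
    """Search the split level l that maximizes H_Right - H_LeftURight."""
    h_whole = entropy(hist)
    best_d, best_l = -np.inf, None
    for l in range(len(hist) - 1):
        left, right = hist[: l + 1], hist[l + 1:]
        if left.sum() == 0 or right.sum() == 0:
            continue
        d = entropy(right) - h_whole
        if d > best_d:
            best_d, best_l = d, l
    return best_d, best_l

def derivative_response(f, sigma, radius, axis):
    """|f * dG/d(axis)| with a separable Gaussian kernel of size 2*radius + 1."""
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    g /= g.sum()
    dg = -x / sigma ** 2 * g               # derivative of the 1-D Gaussian

    def filt(img, kernel, ax):
        # Edge-replicated padding keeps the output the same size as the input
        return np.apply_along_axis(
            lambda v: np.convolve(np.pad(v, radius, mode="edge"), kernel, mode="valid"),
            ax, img)

    return np.abs(filt(filt(f.astype(float), dg, axis), g, 1 - axis))

def guiding_edge_image(f, axis=1, max_u=8):
    """Return (K, u, l): the edge image, scale count, and split level maximizing
    the Shannon entropy difference (axis=1 is taken as the x direction)."""
    product = np.ones(f.shape, dtype=float)
    best = (-np.inf, None, None, None)
    for u in range(1, max_u + 1):
        product *= derivative_response(f, sigma=u / 4.0, radius=u, axis=axis)
        top = product.max()
        if top == 0:
            break                          # flat image, no edge response at all
        product /= top                     # rescaling does not change the normalized result
        k = np.round(255 * product).astype(np.uint8)
        d, l = best_entropy_difference(np.bincount(k.ravel(), minlength=256))
        if l is not None and d > best[0]:
            best = (d, u, l, k)
    return best[3], best[1], best[2]
```

Calling `guiding_edge_image(f, axis=0)` gives the counterpart for the y direction under the same assumptions.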

III. CRITERION OF SELECTING THRESHOLD BASED ON DSE
A binary image $b_t$ is produced by thresholding a gray level image $f$ with a gray level $t$ (see Fig. 3(c)), and a contour image $c_t$ is extracted from the binary image $b_t$ (see Fig. 3(d)). The pixels with the value 1 in the image $c_t$ are further utilized to sample the guiding edge images $K_{u_x^*}$ and $K_{u_y^*}$ calculated in Section II.B, and then two new gray level histograms $\Upsilon_x$ and $\Upsilon_y$ are reconstructed from the sampled pixels. Figs. 3(b), (d), and (e) show a visual example of constructing a gray level histogram $\Upsilon_x$.
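The MDSE method proposes the following objective function to select the final threshold $t^*$:
$$t^* = \arg\max_{t_{min} \le t \le t_{max}} \frac{1}{2} \left( H^{x}_{Left_t \cup Right_t} + H^{y}_{Left_t \cup Right_t} \right), \tag{9}$$
where $t_{min}$ and $t_{max}$ represent the minimum and maximum gray level in the gray level image $f$, and $H^{x}_{Left_t \cup Right_t}$ and $H^{y}_{Left_t \cup Right_t}$ are the Shannon entropies calculated from the gray level histograms $\Upsilon_x$ and $\Upsilon_y$, respectively.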
When the gray level $t$ gradually varies from $t_{min}$ to $t_{max}$, the contour image $c_t$ changes continuously, and consequently the corresponding gray level histograms $\Upsilon_x$ and $\Upsilon_y$ also change continuously, so that the Shannon entropies $H^{x}_{Left_t \cup Right_t}$ and $H^{y}_{Left_t \cup Right_t}$ dynamically and synergically reflect the spatial relationship between the pixels that constitute targets and the background. The Shannon entropy calculated in this way is hereinafter called DSE (dynamic synergic entropy).
Here we further analyze the rationality of calculating the final threshold $t^*$ according to Eq. (9). Because the analysis of $H^{y}_{Left_t \cup Right_t}$ is similar to that of $H^{x}_{Left_t \cup Right_t}$, only the latter is analyzed here. There are two methods to calculate the Shannon entropy $H^{x}_{Left_t \cup Right_t}$ corresponding to the gray level histogram $\Upsilon_x$. One is to do the calculation based on the entire $\Upsilon_x$; the other is to divide the gray level histogram $\Upsilon_x$ into left and right parts with the gray level $l_x^*$ calculated by Eq. (7), and compute Shannon entropy in the same way as $H_{Left \cup Right}$ in Section II.A. The first method is more direct and more efficient, and it will be used in the specific implementation of the proposed algorithm. The second method can construct a multivariate function consisting of 3 independent variables, which facilitates theoretical analysis of the rationale for the threshold $t$ taking a reasonable value. According to the second method, we can calculate $H^{x}_{Left_t \cup Right_t}$ as follows:
$$H^{x}_{Left_t \cup Right_t} = \frac{s_x^*}{1 + s_x^*} H^{x}_{Left_t} + \frac{1}{1 + s_x^*} H^{x}_{Right_t} + \ln(1 + s_x^*) - \frac{s_x^*}{1 + s_x^*} \ln s_x^*,$$
where $H^{x}_{Right_t}$ denotes the Shannon entropy corresponding to the right part in the gray level histogram $\Upsilon_x$ greater than the gray level $l_x^*$, $H^{x}_{Left_t}$ denotes the Shannon entropy corresponding to the left part in the gray level histogram $\Upsilon_x$ less than the gray level $l_x^*$, and
$$s_x^* = \frac{L_x^*}{R_x^*},$$
where $L_x^*$ and $R_x^*$ are the number of pixels corresponding to the left and right part in the gray level histogram $\Upsilon_x$, respectively. The gray level histogram $\Upsilon_x$ is constructed by sampling the guiding edge image $K_{u_x^*}$ with the pixels taking the value 1 in the contour image $c_t$, and therefore the gray level histogram $\Upsilon_x$ is composed of the edge and non-edge pixels in the guiding edge image $K_{u_x^*}$. In the gray level histogram $\Upsilon_x$, the left part less than the gray level $l_x^*$ is mostly composed of non-edge pixels in the image $K_{u_x^*}$, and the right part greater than the gray level $l_x^*$ is mostly composed of edge pixels in the image $K_{u_x^*}$. Thus the value $s_x^*$ reflects the ratio of the non-edge regions to the edge regions in the contour image $c_t$ located in the guiding edge image $K_{u_x^*}$.
For the contour image c t , the less s * x , the less the proportion of non-edge regions, which is an important feature expected for good segmentation. Thus, we can find a greater DSE H . Second, the y-direction edge in the image f is also suppressed to near 0 in the guiding edge image K u * x , whereas the y-direction contour still exists in the contour image c t (see Figs. 3(b) and (d)), which facilitates s * x easier to be greater than s ⊥ x . This is another reason why to separately process in the directions of x-axis and y-axis.

IV. ALGORITHM DESCRIPTION
Algorithm 1.
Algorithm Name: MDSE
Input: a gray level image $f$ to be segmented
Output: a selected threshold $t^*$ and a thresholding result image
Step 1: Calculate the guiding edge images $K_{u_x^*}$ and $K_{u_y^*}$ according to Eqs. (5), (6), (7), and (8).
Step 2: Use $H^x_t$ and $H^y_t$ to record $H^{x}_{Left_t \cup Right_t}$ and $H^{y}_{Left_t \cup Right_t}$ during the subsequent loop, respectively, and use $H_t$ to record the arithmetic mean of $H^x_t$ and $H^y_t$. Use $H_{max}$ to record the maximum $H_t$ during the subsequent loop, and use $t^*$ to record the gray level corresponding to $H_{max}$. The initial values of $H^x_t$, $H^y_t$, $H_t$, $H_{max}$, and $t^*$ are all 0. For each possible gray level $t \in [t_{min}, t_{max}]$ in the image $f$, repeat the following Steps 3 to 6 in ascending order.
Step 3: Utilize the gray level $t$ to threshold the image $f$ to obtain the corresponding binary image $b_t$.
Step 4: Extract the contour image $c_t$ from the binary image $b_t$, which can be specifically divided into 3 small steps. Initially, let all pixels in the image $c_t$ take value 1; then extract the inner contour: if the value of a pixel in $b_t$ is 1 and its 4-neighborhood pixels all take value 1, set the pixel value of its corresponding position in $c_t$ to 0; finally extract the outer contour: after generating the complement image $\bar{b}_t$ by reversing 0 and 1 in the image $b_t$, the pixel and its 4-neighborhood pixels in the image $\bar{b}_t$ are analyzed and judged in the same way as in the image $b_t$, and the image $c_t$ is also set accordingly.
Step 5: Utilize the pixels with the value 1 in the image $c_t$ to sample the guiding edge image $K_{u_x^*}$, and reconstruct a normalized gray level histogram from the sampled pixels, then calculate the DSE $H^x_t$ from this normalized gray level histogram. Process the guiding edge image $K_{u_y^*}$ in the same way and calculate the corresponding DSE $H^y_t$.
Step 6: Calculate the mean $H_t$ of $H^x_t$ and $H^y_t$, and then compare $H_t$ with $H_{max}$. If $H_t > H_{max}$, let $H_{max}$ equal $H_t$ and $t^*$ equal $t$.
Step 7: Generate the binary image $b_{t^*}$ by thresholding the image $f$ with the final threshold $t^*$, and output the final threshold $t^*$ and the binary image $b_{t^*}$.
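Steps 2 to 7 of Algorithm 1 can be sketched as below, taking the two guiding edge images from Step 1 as given inputs. This is a minimal illustration rather than the authors' Matlab code. The function names are hypothetical, and three details are assumptions: pixels with gray level greater than t are set to 1, image borders are handled by edge replication, and the guiding edge images are integer arrays with values in [0, 255].

```python
import numpy as np

def contour_image(b):
    """Step 4: mark inner and outer contour pixels of a binary image with True."""
    def interior(mask):
        # A pixel is interior when it and its 4 neighbors are all set
        p = np.pad(mask, 1, mode="edge")   # border handling is an assumption
        return mask & p[:-2, 1:-1] & p[2:, 1:-1] & p[1:-1, :-2] & p[1:-1, 2:]

    b = np.asarray(b, dtype=bool)
    c = np.ones(b.shape, dtype=bool)       # initially all pixels take value 1
    c[interior(b)] = False                 # inner contour: clear interior of b
    c[interior(~b)] = False                # outer contour: same test on the complement
    return c

def dse(edge_image, contour):
    """Step 5: Shannon entropy of the edge-image values sampled on the contour."""
    samples = edge_image[contour]
    if samples.size == 0:
        return 0.0
    hist = np.bincount(samples, minlength=256).astype(float)
    p = hist[hist > 0] / samples.size      # normalized gray level histogram
    return float(-(p * np.log(p)).sum())

def mdse_threshold(f, k_x, k_y):
    """Steps 2 to 7: return the selected threshold and the binary result."""
    h_max, t_star = 0.0, 0                 # initial values are 0 (Step 2)
    for t in range(int(f.min()), int(f.max()) + 1):
        b_t = f > t                        # Step 3 (assumed: target is brighter)
        c_t = contour_image(b_t)           # Step 4
        h_t = 0.5 * (dse(k_x, c_t) + dse(k_y, c_t))   # Steps 5 and 6
        if h_t > h_max:
            h_max, t_star = h_t, t
    return t_star, f > t_star              # Step 7
```

Because the comparison in Step 6 is strict, the sketch keeps the smallest gray level among ties, which follows from scanning the gray levels in ascending order.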

A. TEST ENVIRONMENT, QUANTITATIVE EVALUATION INDICATOR, AND METHODS PARTICIPATING COMPARISONS
The main software and hardware used for the test experiments are as follows: Intel Core i3-2350M 2.3GHz CPU, 4GB DDR2 memory, Windows 7 32-bit operating system, and Matlab 2009a 32-bit development platform. The test image set contains 40 synthetic images and 50 real-world images, and these test images have different gray level histograms. The test image set and the image set of segmentation reference are available online [40].
Misclassification error (ME) [20], [30] is a commonly used quantitative evaluation indicator when the segmentation accuracy of a thresholding method is evaluated. ME indicates the percentage of background pixels that are misclassified as target pixels and target pixels that are misclassified as background pixels in a segmentation result image, and its calculation formula is given as follows:
$$ME = \left( 1 - \frac{|\varphi \cap \varphi_t| + |\phi \cap \phi_t|}{|\varphi| + |\phi|} \right) \times 100\%,$$
where $\varphi$ and $\phi$ respectively denote the set of the target and background pixels in the reference image, $\varphi_t$ and $\phi_t$ respectively denote the set of the target and background pixels in the result image obtained by thresholding the test image with a threshold $t$, the symbol $\cap$ represents the intersection operation, and the symbol $|\cdot|$ means to calculate the number of elements in a set. ME will be 0% when the thresholding result image is exactly the same as the reference image; ME will be 100% when the thresholding result image is the complement of the reference image. Intersection over Union (IoU), also known as the Jaccard index, is also a popular evaluation metric for tasks such as segmentation, object detection and tracking [41]. IoU emphasizes similarity between finite sample sets, and is formally defined as the size of the intersection divided by the size of the union of the sample sets. The mathematical representation of IoU is written as:
$$IoU = \frac{|\varphi \cap \varphi_t|}{|\varphi \cup \varphi_t|},$$
where the symbol $\cup$ represents the union operation. IoU will be 0 when the target pixels of the reference image and the thresholding result image do not overlap at all; IoU will be 1 when the target pixels of the reference image and the thresholding result image are a perfect match.
FIGURE 5. ME values of 10 segmentation methods on 40 synthetic images with different gray level histograms. In each subfigure, the red dots, the green dots, the blue dots, the black dots, the red triangles, the green triangles, the blue triangles, and the black triangles indicate the ME values of the corresponding method on the first to eighth group of test images, respectively.
FIGURE 6. IoU values of 10 segmentation methods on 40 synthetic images with different gray level histograms. In each subfigure, the red dots, the green dots, the blue dots, the black dots, the red triangles, the green triangles, the blue triangles, and the black triangles indicate the IoU values of the corresponding method on the first to eighth group of test images, respectively.
FIGURE 7. The first group test images and the thresholds selected by the MDSE method. In each subfigure, the test image is shown on the left, and its gray level histogram is displayed in the blue area on the right. In addition, the black curve on the right shows the objective function curve of the MDSE method for calculating thresholds, and the red dotted line and the number next to it indicate the threshold selected by the MDSE method (the same below). (a)-(e) sequentially show that the gray level distributions of the target and the background are a Gaussian distribution, a gamma distribution, an extreme value distribution, a Rayleigh distribution, and a uniform distribution.
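The two evaluation indicators, ME and IoU, can be computed from a pair of binary images as in the following sketch. The function names are hypothetical, target pixels are assumed to be encoded as nonzero values, and returning an IoU of 1 when both images contain no target pixels is an assumed convention for an otherwise undefined case.

```python
import numpy as np

def misclassification_error(reference, result):
    """ME in percent: share of pixels whose class differs from the reference."""
    ref = np.asarray(reference, dtype=bool)
    res = np.asarray(result, dtype=bool)
    # Correctly classified target pixels plus correctly classified background pixels
    agree = np.logical_and(ref, res).sum() + np.logical_and(~ref, ~res).sum()
    return 100.0 * (1.0 - agree / ref.size)

def intersection_over_union(reference, result):
    """IoU of the target pixel sets of the reference and result images."""
    ref = np.asarray(reference, dtype=bool)
    res = np.asarray(result, dtype=bool)
    union = np.logical_or(ref, res).sum()
    if union == 0:
        return 1.0                         # assumed convention: both sets empty
    return float(np.logical_and(ref, res).sum() / union)
```

For a 2×2 reference with two target pixels and a result that recovers only one of them, the sketch gives an ME of 25% and an IoU of 0.5.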
The proposed MDSE method is compared with global Masi entropy thresholding (MET) method [30], global adaptive Tsallis entropy thresholding (TET) method [29], local Shannon entropy thresholding (SET) method [22], iterative triclass thresholding (ITT) method [42], fuzzy entropy thresholding (FET) method [31], transition region thresholding (TRT) method [20], slope difference distribution (SDD) clustering method [4], fast and robust fuzzy c-means (FRFCM) clustering method [5], and interactive thresholding (IT) method [43]. The IT method interactively selects a segmentation threshold, and the binary image corresponding to this threshold has the smallest ME value. Therefore, the IT method can be used as a reference method for other methods participating comparisons in terms of segmentation accuracy.

B. EXPERIMENTS ON SYNTHETIC IMAGES
Affected by factors such as random noise, low pass filter, as well as the size of target and background, the gray level histograms of different gray level images may show different histogram patterns. In addition to the common bimodal pattern, there are unimodal, peakless, and even multimodal patterns. Moreover, there are many possibilities for the basic distribution constituting the histogram pattern. In addition to the common Gaussian distribution, there are also gamma distribution, extreme value distribution, Rayleigh distribution, uniform distribution, and beta distribution.
In comparison experiments on synthetic images, to comprehensively test the segmentation adaptability of 10 segmentation methods to different histogram patterns, 40 different synthetic test images are used, arranged in 8 groups of 5 images each.
In the first group of test images, the size ratio of the target to the background is approximately equal to 3:7. To simulate basic distributions such as the Gaussian, gamma, extreme value, Rayleigh, and uniform distributions, the test images are generated by adding Gaussian noise, gamma noise, extreme value noise, Rayleigh noise, and uniform noise to the noise-free synthetic image, respectively (see Fig. 7). Because the size ratio of the target to the background is relatively balanced, when the basic distribution is a Gaussian, gamma, extreme value, or Rayleigh distribution, the gray level histograms of these test images show typical bimodal patterns (see Figs. 7(a)-(d)); when the basic distribution is a uniform distribution, the gray level histogram of the test image shows a peakless pattern (see Fig. 7(e)). As shown by the red dots in Figs. 5-6 (the corresponding image numbers are 1 to 5): (1) the overall segmentation result of the MET method is the worst; (2) the segmentation results of the TET method are slightly better than those of the MET method in general, but its mis-segmentation is serious when the basic distribution is a gamma distribution or a uniform distribution; (3) the overall segmentation results of the SET, FET, and SDD methods are better than those of the MET and TET methods, but their mis-segmentation is also serious when the basic distribution is a uniform distribution or a gamma distribution; (4) the segmentation results of the ITT, TRT, FRFCM, and MDSE methods are obviously better than those of the other 5 compared methods; in particular, the segmentation results of the MDSE and IT methods are completely consistent, and they achieve the theoretically optimal segmentation in terms of minimizing ME and maximizing IoU.
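The two accuracy measures used throughout these comparisons can be illustrated with a minimal sketch. It assumes the commonly used definitions: ME is the fraction of pixels whose label disagrees with the ground truth, and IoU is the overlap of the segmented target with the ground-truth target divided by their union. The function names, the flat-list image representation, and the convention that pixels brighter than the threshold belong to the target are assumptions made for illustration and are not taken from the paper.

```python
def threshold_image(pixels, t):
    """Binarize a flat list of gray levels: 1 (target) if pixel > t, else 0.

    The 'brighter than t is target' convention is an assumption.
    """
    return [1 if p > t else 0 for p in pixels]


def misclassification_error(truth, result):
    """ME = 1 - (correct background + correct target) / (all pixels)."""
    if len(truth) != len(result) or not truth:
        raise ValueError("truth and result must be non-empty and equally long")
    agree = sum(1 for g, r in zip(truth, result) if g == r)
    return 1.0 - agree / len(truth)


def intersection_over_union(truth, result):
    """IoU of the target class between ground truth and segmentation result."""
    inter = sum(1 for g, r in zip(truth, result) if g and r)
    union = sum(1 for g, r in zip(truth, result) if g or r)
    # If neither mask contains a target pixel, treat the overlap as perfect.
    return inter / union if union else 1.0


# Toy example: six pixels, three of which belong to the target.
pixels = [10, 20, 200, 210, 90, 220]
truth = [0, 0, 1, 1, 0, 1]

good = threshold_image(pixels, 100)   # separates the classes exactly
poor = threshold_image(pixels, 50)    # mislabels the pixel with gray level 90

print(misclassification_error(truth, good), intersection_over_union(truth, good))
print(misclassification_error(truth, poor), intersection_over_union(truth, poor))
```

With the threshold at 100 the toy image is segmented without error (ME of 0, IoU of 1); lowering it to 50 mislabels one of six pixels, so ME rises to 1/6 and IoU falls to 3/4.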
In the second group of test images, the size ratio of the target to the background is approximately equal to 3:7. The test images are generated by adding beta noise with different parameters to the noise-free synthetic image. As a result, the gray level histograms of the test images are bimodal (see Figs. 8(a), (c)-(e)) or multimodal (see Fig. 8(b)). As shown by the green dots in Figs. 5-6 (the corresponding image numbers are 6 to 10): (1) both the MET and SET methods have serious mis-segmentations; (2) the segmentation results of the TET, ITT, FET, and SDD methods are unstable, and the ME value is sometimes large and sometimes small; (3) compared with the other 6 methods, the TRT, FRFCM, MDSE, and IT methods obtain significantly better segmentation accuracy; in particular, the proposed MDSE method and the IT method achieve completely consistent segmentation results, and they once again obtain the theoretically optimal segmentation.
In the third and fourth groups of test images, the size ratio of the target to the background is approximately equal to 3:1997. The test images in the third group are generated by adding Gaussian noise, gamma noise, extreme value noise, Rayleigh noise, and uniform noise to the noise-free synthetic image, respectively (see Fig. 9), while the test images in the fourth group are generated by adding beta noise with different parameters to the noise-free synthetic image (see Fig. 10). Because the size ratio of the target to the background is seriously unbalanced, the gray level histograms of the test images show a unimodal pattern (see Figs. 9(a)-(d) and Figs. 10(a), (c)-(e)), a bimodal pattern (see Fig. 10(b)), or a peakless pattern (see Fig. 9(e)). The blue dots and the black dots in Figs. 5-6 (the corresponding image numbers are 11 to 15, and 16 to 20, respectively) show that: (1) the MET and TET methods are more suitable for segmenting test images with a right-biased unimodal histogram, but not for test images with a left-biased unimodal histogram (see Fig. 9(c)); (2) the SET, ITT, FET, TRT, SDD, and FRFCM methods have serious mis-segmentations on these two groups of test images, which shows that these methods are not suitable for segmenting images with a serious imbalance between the sizes of the target and the background; (3) on these two groups of test images, the thresholds calculated by the MDSE method are always consistent with the thresholds selected by the IT method, so their ME values and IoU values are also always consistent.
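The first four groups differ only in the noise distribution and in the target-to-background size ratio (3:7 versus 3:1997). The sketch below shows one way such test data could be generated with the Python standard library. It is not the authors' generator: the clean gray levels (160 and 90), the noise scale, every shape parameter, and the flat-list image representation (which ignores spatial layout) are hypothetical values chosen purely for illustration.

```python
import math
import random


def draw_noise(kind, rng, a=2.0, b=5.0):
    """Draw one noise sample; all shape parameters here are illustrative."""
    if kind == "gaussian":
        return rng.gauss(0.0, 1.0)
    if kind == "gamma":
        return rng.gammavariate(2.0, 1.0)
    if kind == "extreme":
        # Standard Gumbel sample by inverse transform; u is kept inside (0, 1).
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        return -math.log(-math.log(u))
    if kind == "rayleigh":
        # Rayleigh sample with unit scale by inverse transform.
        return math.sqrt(-2.0 * math.log(1.0 - rng.random()))
    if kind == "uniform":
        return rng.uniform(-1.0, 1.0)
    if kind == "beta":
        return rng.betavariate(a, b)
    raise ValueError(f"unknown noise kind: {kind}")


def make_test_image(kind, n_pixels=2000, n_target=600,
                    target=160, background=90, scale=15.0, seed=0):
    """Return (pixels, labels) for a noisy two-class synthetic image.

    n_target=600 of 2000 pixels gives a 3:7 ratio; n_target=3 gives 3:1997.
    """
    rng = random.Random(seed)
    labels = [1] * n_target + [0] * (n_pixels - n_target)
    pixels = []
    for label in labels:
        clean = target if label else background
        noisy = clean + scale * draw_noise(kind, rng)
        pixels.append(int(min(255, max(0, round(noisy)))))  # clip to 8 bits
    return pixels, labels


balanced, balanced_labels = make_test_image("gaussian")          # 3:7
unbalanced, unbalanced_labels = make_test_image("beta", n_target=3)  # 3:1997
```

Building a 256-bin histogram from `balanced` and `unbalanced` is a quick way to see why the balanced case tends toward a bimodal pattern while the unbalanced case is dominated by the background mode.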
The test images of the fifth and sixth groups are generated by performing Gaussian point diffusion on the test images of the first and second groups, respectively (see Figs. 11-12). The targets and the backgrounds in these two groups of test images are both blurred. As a result, the gray level histograms of these test images are bimodal with flat valleys, although the flat valleys differ from image to image. The red triangles and the green triangles in Figs. 5-6 (the corresponding image numbers are 21 to 25, and 26 to 30, respectively) show that: (1) the ME values of the MET, SET, FET, and SDD methods significantly decrease in general, and the IoU values of the MET and SET methods significantly increase in general, which indicates that, compared with the test images with bimodal deep-valley histograms, the MET and SET methods are more suitable for segmenting images with bimodal flat-valley histograms; (2) some ME or IoU values of the TET method have risen and some have fallen, but the TET method has relatively better adaptability to the test images with bimodal flat-valley histograms; (3) the average ME values of the ITT method, the MDSE method, the TRT method, and the IT method on the test images of Group 5 and Group 6 are 0.94%, 0.59%, 0.44%, and 0.42%, respectively, which indicates that the ITT method, the TRT method, and the MDSE method can segment test images with bimodal flat-valley histograms very well.
The test images of the seventh and eighth groups are generated by performing Gaussian point diffusion on the test images of the third and fourth groups, respectively (see Figs. 13-14). The two groups of test images all show a small blurred target on a blurred background. The gray level histograms of these test images are right-biased unimodal. As shown by the blue triangles and the black triangles in Figs. 5-6 (the corresponding image numbers are 31 to 35, and 36 to 40, respectively): (1) the MET, TET, FET, TRT, and SDD methods are more suitable for segmenting test images with a right-biased unimodal histogram; (2) the SET, ITT, and FRFCM methods have more serious mis-segmentation on these two groups of test images; (3) the MDSE and IT methods have completely consistent segmentation results on the seventh group of test images; only on the first, second, and fourth test images in the eighth group are the segmentation results of the MDSE and IT methods slightly different.

C. EXPERIMENTS ON REAL-WORLD IMAGES
50 real-world images with different gray level histograms are utilized to further test the segmentation adaptability of 10 segmentation methods. The gray level histograms of these real-world images can be approximated by a mixture of several distributions such as the Gaussian, gamma, extreme value, Rayleigh, uniform, and beta distributions. The 50 test images are divided into 4 groups. The first group of images are numbered from 1 to 11, the second group from 12 to 28, the third group from 29 to 46, and the fourth group from 47 to 50. The gray level histograms of the first group of test images appear as unimodal left-biased, unimodal right-biased, or unimodal unbiased patterns; the gray level histograms of the second group of test images appear as bimodal deep-valley or bimodal flat-valley patterns; due to the complex background, the gray level histograms of the third group of test images appear as multimodal patterns with 3 or more modes; the gray level histograms of the fourth group of test images appear as peakless patterns. As 4 representative examples, Fig. 15 shows the test images numbered 3, 18, 39, and 47 and their corresponding gray level histograms, as well as the segmentation results obtained by the MDSE method. It can be observed that the MDSE method accurately separates the target from the background. In fact, for the 4 test images, the corresponding differences of the ME values between the MDSE method and the IT method are 0.088%, 0.092%, 0.003%, and 0.317%, respectively; the corresponding differences of the IoU values between the MDSE method and the IT method are −0.053, −0.003, −0.01, and −0.005, respectively. Additionally, the differences between the thresholds selected by the MDSE method and the IT method are −4, −1, 1, and 1, respectively. All this indicates that the segmentation results of the MDSE method are very close to the optimal segmentation results determined by the IT method.
Figs. 16-17 show the quantitative comparison of the segmentation accuracy of 10 segmentation methods on 50 real-world images. As shown in Figs. 16-17: (1) for the test images with unimodal, bimodal, multimodal, or peakless histograms, the average ME values of the MDSE method and the IT method are much smaller than the average ME values of other methods, and the average IoU values of the MDSE method and the IT method are much larger than the average IoU values of other methods; (2) although the ITT and FRFCM methods have good segmentation accuracy for the images with bimodal histograms, they have serious mis-segmentation in the case of unimodal, multimodal, and peakless patterns; (3) the SET method has serious mis-segmentation in all 4 cases: the respective average ME values all exceed 11%, and the respective average IoU values are all less than 0.736; in particular, the ME values on 12 test images are greater than 20%; (4) compared with the SET, ITT, and FRFCM methods, the overall segmentation accuracies of the TET, FET, and MET methods are slightly better, but they have relatively large ME values for some test images, even greater than 20%.
The IT method interactively selects segmentation thresholds under the criterion of minimizing the ME value, so the thresholds it chooses are optimal in the sense of minimizing the ME value. By calculating and comparing the differences between the thresholds selected by the IT method and those selected by other methods (see Fig. 18), the degree of deviation in the threshold selection of other methods can be quantitatively compared. As shown in Fig. 18: [...] images are relatively close to the optimal thresholds, their selected thresholds are far from the optimal thresholds as a whole: the numbers of cases where the absolute value of the threshold difference is greater than or equal to 10 are 45, 41, 37, 35, 44, and 27, respectively; the numbers of cases where the absolute value of the threshold difference is greater than or equal to 20 are still 33, 31, 31, 28, 33, and 15, respectively; (3) overall, the thresholds selected by the MDSE method on different gray level histograms are closer to the optimal thresholds, and the absolute values of the threshold differences are all within 9 gray levels.
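The role of the IT threshold as an ME-optimal reference, and the threshold-difference statistics built on it, can be sketched as follows. The IT method in the paper is interactive; the exhaustive search below is only a stand-in that reaches the same kind of ME-minimizing threshold when a ground-truth mask is available. The function names, the "method minus reference" sign convention, and the rule that pixels brighter than the threshold belong to the target are assumptions.

```python
def me_at_threshold(pixels, truth, t):
    """Fraction of pixels mislabeled when 'pixel > t' is taken as target."""
    wrong = sum(1 for p, g in zip(pixels, truth) if (1 if p > t else 0) != g)
    return wrong / len(pixels)


def optimal_threshold(pixels, truth, levels=256):
    """Exhaustively search gray levels for the smallest ME.

    Ties are resolved in favor of the smallest gray level (an assumption).
    """
    best_t, best_me = 0, float("inf")
    for t in range(levels):
        me = me_at_threshold(pixels, truth, t)
        if me < best_me:
            best_t, best_me = t, me
    return best_t, best_me


def threshold_difference(t_method, t_reference):
    """Signed deviation of a method's threshold from the reference threshold."""
    return t_method - t_reference


def count_large_deviations(differences, bound):
    """Number of images whose absolute threshold difference is >= bound."""
    return sum(1 for d in differences if abs(d) >= bound)


pixels = [10, 20, 90, 200, 210, 220]
truth = [0, 0, 0, 1, 1, 1]
t_ref, me_ref = optimal_threshold(pixels, truth)

# Hypothetical thresholds from some automatic method on five images,
# compared against hypothetical reference thresholds.
diffs = [threshold_difference(m, r)
         for m, r in [(86, 90), (120, 95), (60, 61), (140, 118), (77, 77)]]
print(t_ref, me_ref, diffs, count_large_deviations(diffs, 10))
```

Counting deviations with bounds of 10 and 20 gray levels reproduces the kind of tally reported for Fig. 18, here on made-up numbers.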

D. COMPARISON EXPERIMENTS ON COMPUTATIONAL EFFICIENCY
Comparing the CPU time of different methods under the same software and hardware conditions intuitively reflects the differences in their computational efficiency. Note that the CPU time of the same program fluctuates slightly between runs. To reduce this fluctuation, each segmentation method to be compared is run 20 consecutive times on the same test image, and the average CPU time of these 20 runs is taken as the CPU time of this segmentation method on this test image.
FIGURE 16. ME values of 10 segmentation methods on 50 real-world images with different gray level histograms. In each subfigure, the red dots, the green dots, the blue dots, and the black dots indicate the ME values of the corresponding method on the first to fourth group of test images, respectively.
FIGURE 17. In each subfigure, the red dots, the green dots, the blue dots, and the black dots indicate the IoU values of the corresponding method on the first to fourth group of test images, respectively.
FIGURE 18. The differences between the thresholds selected by the MDSE, MET, TET, SET, ITT, FET, TRT methods and the thresholds selected by the IT method. In each subfigure, the abscissa represents the image number, the ordinate represents the threshold difference, and the number next to each black dot indicates the corresponding threshold difference.
Further, the mean and standard deviation of the CPU time of each segmentation method on 40 synthetic images and 50 real-world images can be calculated. It can be observed from Table 1 that the computational efficiencies of the ITT, MET, FRFCM, and SDD methods are relatively higher, while the computational efficiencies of the SET, FET, and MDSE methods are relatively lower.
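The timing protocol (20 repeated runs per image, then mean and standard deviation over images) can be sketched in a few lines. This is an assumed harness, not the authors' measurement code: `time.process_time` is used as the CPU clock, the population standard deviation is chosen arbitrarily because the text does not say which estimator was used, and `mean_threshold` is a hypothetical placeholder for a real segmentation method.

```python
import statistics
import time


def average_cpu_time(method, image, runs=20):
    """Average CPU time (seconds) of `method` over `runs` consecutive calls."""
    total = 0.0
    for _ in range(runs):
        start = time.process_time()
        method(image)
        total += time.process_time() - start
    return total / runs


def summarize_cpu_time(method, images, runs=20):
    """Mean and standard deviation of the per-image average CPU times."""
    per_image = [average_cpu_time(method, img, runs) for img in images]
    # Population standard deviation; the estimator is an assumption.
    return statistics.mean(per_image), statistics.pstdev(per_image)


def mean_threshold(image):
    """Hypothetical placeholder method: threshold at the mean gray level."""
    return sum(image) / len(image)


images = [[(7 * i + k) % 256 for i in range(5000)] for k in range(3)]
mean_time, std_time = summarize_cpu_time(mean_threshold, images)
print(f"mean CPU time {mean_time:.6f} s, standard deviation {std_time:.6f} s")
```

Averaging over repeated runs reduces run-to-run fluctuation, but the clock resolution still limits what can be resolved for very fast methods on small images.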
The ITT method first constructs the gray level distribution of the original gray level image, and then performs a relatively simple calculation of mean and variance; therefore, it takes the least average CPU time. The MET method also needs to calculate the gray level distribution of the original image in advance, and then computes the Masi entropy. The computation of the Masi entropy involves logarithm operations and is more complicated than the computation of mean and variance, so the computational cost of the MET method is slightly higher than that of the ITT method. The FRFCM method is an improved FCM clustering algorithm. Its main computations include morphological reconstruction of the image to be segmented, construction of the gray level histogram of the reconstructed image, and fast filtering of the membership. The computational cost of the FRFCM method is higher than that of the ITT method, so its average CPU time is greater than that of the ITT method. The SDD method is also a clustering segmentation method. It first constructs a normalized gray level histogram and smooths the histogram in the frequency domain, and then calculates the slope difference of the smoothed gray level histogram. It finally selects clustering centers based on the peaks of the computed slope difference distribution. The last two calculations of the SDD method are more complex than the calculation of the Masi entropy in the MET method, so the computational cost of the former is slightly higher than that of the latter. In addition to the gray level distribution of the original image, the TET method also needs to automatically calculate the entropy index q of the Tsallis entropy by analyzing the redundancy of the original image, as well as the q-th power of each gray level probability. As a result, its average CPU time is greater than that of the MET method. The TRT method is a thresholding method based on the transition region. Its key points are to first extract a transition region set between the targets and the background, and then to follow a basic rule, namely that the elements of a stable transition region set are equal or close to each other in average gray level, to obtain the final segmentation threshold using the maximizing 1-STRS strategy. The main computations of this method occur in the construction of the transition region set.
The SET method is a local entropy method. It first uses the original gray level image to construct a gray level co-occurrence matrix, then divides the gray level co-occurrence matrix into 4 quadrants, and defines the Shannon entropy on each quadrant. Compared with the ITT, MET, and TET methods, which are all based on a one-dimensional gray level histogram, the SET method spends more time building the gray level co-occurrence matrix, so its average CPU time is greater than those of the ITT, MET, and TET methods. The FET method is a thresholding method based on fuzzy entropy. It essentially utilizes a fuzzy membership function to modify the gray level histogram of the original gray level image. Its main work involves two parts: one is to construct a fuzzy membership function based on the division of the gray level histogram, and the other is to define a new fuzzy entropy based on the constructed fuzzy membership function. The computational cost of the MDSE method mainly arises from the relatively complex calculations involving the guiding edge images, the changing contour images, and the DSE. Consequently, its average CPU time is the largest.
To reduce the CPU time of the MDSE method, the task of determining the maximum DSE can be divided into 2 procedures: jumping processing and stepwise processing. In the jumping processing, the one-by-one processing of the gray levels in the interval $[t_{\min}, t_{\max}]$ in Step 2 of Algorithm 1 in Section IV is changed to a jumping processing with a certain step size $\gamma$, while the other steps for calculating the DSE are unchanged. A gray level $t^{*}_{\mathrm{temp}}$ is output after completing the jumping processing, and it corresponds to the maximum DSE generated during the jumping processing. Then, in the stepwise processing, the gray levels in the interval $[t^{*}_{\mathrm{temp}} - \gamma + 1, t^{*}_{\mathrm{temp}} + \gamma - 1]$ are processed one by one, that is, the interval $[t_{\min}, t_{\max}]$ in Step 2 of Algorithm 1 is replaced by the interval $[t^{*}_{\mathrm{temp}} - \gamma + 1, t^{*}_{\mathrm{temp}} + \gamma - 1]$, while the other steps for calculating the DSE are unchanged. The final threshold $t^{*}$ is output after completing the stepwise processing, and it corresponds to the maximum DSE in the gray level interval $[t^{*}_{\mathrm{temp}} - \gamma + 1, t^{*}_{\mathrm{temp}} + \gamma - 1]$. As shown in Table 2, for 40 synthetic images and 50 real-world images: (1) when the step size $\gamma$ changes from 1 to 5, the corresponding average CPU time decreases gradually, but the corresponding average ME value and average IoU value remain basically unchanged; (2) when the step size $\gamma$ changes from 5 to 10, the corresponding average CPU time remains basically stable, but the corresponding average ME value and average IoU value fluctuate. Consequently, when the step size $\gamma$ is 3, 4, or 5, the MDSE method can approach the TET method or the TRT method in terms of computational efficiency, while maintaining the segmentation accuracy.
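The jumping-then-stepwise search can be written generically as a coarse-to-fine maximization. In the sketch below, `score` stands in for the DSE computed at a candidate gray level; the DSE computation itself (Algorithm 1) is not reproduced here, and the toy score function, the function names, and the clamping of the refinement interval to the original search range are assumptions made for illustration.

```python
def jump_then_step_search(score, t_min, t_max, gamma):
    """Coarse-to-fine search for the gray level that maximizes `score`.

    Jumping stage:  evaluate t_min, t_min + gamma, ... up to t_max.
    Stepwise stage: evaluate every level in
                    [t_temp - gamma + 1, t_temp + gamma - 1],
                    clamped to [t_min, t_max] (the clamping is an assumption).
    """
    if gamma < 1:
        raise ValueError("gamma must be a positive integer")
    if t_min > t_max:
        raise ValueError("t_min must not exceed t_max")

    # Jumping processing: coarse scan with step size gamma.
    t_temp = max(range(t_min, t_max + 1, gamma), key=score)

    # Stepwise processing: exhaustive scan around the coarse maximum.
    lo = max(t_min, t_temp - gamma + 1)
    hi = min(t_max, t_temp + gamma - 1)
    return max(range(lo, hi + 1), key=score)


calls = []


def toy_score(t):
    """Hypothetical stand-in for the DSE: a single peak at gray level 137."""
    calls.append(t)
    return -(t - 137) ** 2


t_star = jump_then_step_search(toy_score, 0, 255, gamma=5)
print(t_star, len(calls))
```

On this toy score the search returns gray level 137 after 61 evaluations, compared with 256 for a full scan. With a step size of 1 the jumping stage already visits every gray level, so the procedure reduces to the original exhaustive search. The shortcut is only guaranteed to find the global maximum when the score has no narrow peak that falls between coarse samples, which is consistent with the observation that accuracy starts to fluctuate for larger step sizes.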

VI. CONCLUSION AND FUTURE WORKS
For the gray level images with a unimodal, bimodal, multimodal or peakless histogram, when the gray level distribution of the target or the background can be approximated by a Gaussian distribution, a gamma distribution, an extreme value distribution, a Rayleigh distribution, a uniform distribution, or a beta distribution, compared with the MET, TET, SET, ITT, FET, TRT, SDD, and FRFCM methods, the MDSE method has more flexible adaptivity of selecting threshold and better segmentation accuracy.
The MDSE method possesses the above 2 advantages mainly due to the following 3 aspects: (1) the MDSE method generates the guiding edge images in the horizontal and vertical directions respectively, which is conducive to highlighting the transitional regions between the target and the background in advance. Moreover, sampling the guiding edge images with the changing contour images favors dynamically constructing the gray level distribution of the transitional and non-transitional regions. (2) The MDSE method constructs a series of one-dimensional gray level histograms using the invariant guiding edge image and the dynamically changing contour images, and utilizes the parameter-free Shannon entropy as the entropy calculation model. Thus, the MDSE method can calculate a Shannon entropy that indirectly reflects both gray level distributions and pixel positions. (3) For each possible gray level in the original image, the MDSE method utilizes it to threshold the original image to generate a corresponding binary image, and extracts a contour image from the binary image. During this whole procedure, the segmentation threshold maintains a close relationship with the original image by means of the contour image.
The MDSE method needs to calculate the guiding edge image and the changing contour images, which makes it inferior to the MET, TET, SET, ITT, FET, TRT, SDD, and FRFCM methods in terms of computational efficiency. How to improve the computational efficiency of the MDSE method is worth further investigation. In the future, some generalized entropies can be considered to replace the Shannon entropy adopted by the MDSE method; the automatic evaluation of the parameters in those generalized entropies will then be worth further study.