A Defect Detection Method for the Image With Intersecting Feature

Because the features of the defect and the background intersect in the image, defect detection often results in under-segmentation or over-segmentation. To solve this problem, we propose a new defect extraction method that calculates the maximum mutual information of the intersecting features. Firstly, we construct a new two-dimensional histogram according to the defect features. The new histogram, called the Gray Level and Local Spatial Difference (GLSD) histogram, is constructed from the grayscale and an improved local gray difference that accounts for the spatial relationship. Secondly, considering the geometric distribution of high-probability background events, we improve the segmentation shape of the background event distribution and preliminarily divide the GLSD histogram. Finally, we calculate the maximum mutual information of the intersecting feature between the defect and the background, which determines the boundary of the intersecting feature interval of the GLSD histogram. To verify the effectiveness of the proposed method, we use two databases for performance evaluation. The experimental results show that the proposed method is suitable for detecting non-obvious defects under a locally uniform background. Meanwhile, it improves the sensitivity, specificity, and accuracy of defect detection compared with classical threshold segmentation methods.


I. INTRODUCTION
Image segmentation is one of the simplest, most effective, and most common methods in visual defect detection; it separates the defect from the background into nonoverlapping, homogeneous regions. Based on the extracted defect features, intuitive segmentation methods have been developed (e.g., threshold [1], edge contour [2], [3], matching [4], clustering [5]). Besides, more sophisticated segmentation methods based on neural networks [6], intensity estimation [7], and deep learning [8] have been widely applied in recent years. However, these methods require large amounts of data to provide prior knowledge. In contrast, the threshold segmentation method has a prominent advantage in terms of time complexity: it can quickly and quantitatively analyze the features of the target to determine the threshold. Image threshold segmentation includes bi-level thresholding segmentation and multi-level thresholding segmentation [9], and bi-level thresholding segmentation is suitable for defect detection.
The associate editor coordinating the review of this manuscript and approving it for publication was Tomasz Trzcinski.
The core of threshold segmentation is to determine the appropriate target features. Gray level, mean value [10], variance [11], entropy [12], [13], and other grayscale-related information [14] have performed well in recent studies. Since defects are usually local rather than global, the local features of images have attracted increasing attention. The local gray mean is a linear grayscale smoothing method that reduces noise and boundary interference. Local variance reflects the degree of dispersion of the local grayscale. Local entropy reflects the aggregation of the local grayscale distribution. The local gradient magnitude in [14] and the gradient direction in [15], [16] refer to the intensity and direction of the edge, respectively. These local features can eliminate the influence of the overall nonuniformity and reflect the local gray information related to the pixel spatial location.
Threshold segmentation usually divides the image into two distinct regions (i.e., the target region and the background region). However, in defect detection for industrial production, the features of defects cannot be accurately separated from the background. In this case, a precise threshold segmentation results in over-segmentation or under-segmentation. Thus, the authors of [17] introduced the concept of fuzziness. In [18], [19], the fuzzy set is segmented through S or Z membership functions with a membership degree of 0.5. Parameterization greatly limits the universality of segmentation methods based on the fuzzy set [20]. The fuzzy concept can therefore be used to explain the intersecting feature interval, but its fuzzy membership function cannot be used to describe the subset of the intersecting feature interval. The fuzzy membership function is not fully applicable to defect detection, as reflected in the unexplainable forced segmentation of the coupling region between the defect and the background. The primary idea of this paper is not to classify the features of defects and background to the maximum extent but to locate the intersecting feature interval based on the fuzzy concept by using relevant information theory.
It is necessary to extract the feature threshold adaptively in defect detection under a complex background environment [21]. Histogram-based feature statistics are the most widely used threshold extraction tool. Unless otherwise specified, a histogram records the distribution of gray values. In the histogram, we classify the image components based on different effective features. Larger inter-class differences and smaller intra-class differences between defects and backgrounds help to determine accurate thresholds. The two-dimensional histogram improves on this by adding a feature dimension [8], which changes plane feature statistics into stereo feature statistics. The additional dimension enriches the feature statistics and improves both the separability of the defect and the background and the reliability of the segmentation. We also use the concept of the 2D histogram in the algorithm proposed in this paper. To sum up, the selection of the second feature and the adaptive threshold calculation method play important roles in the application of a 2D histogram to defect segmentation [22].
Previous adaptive threshold calculation methods based on the 2D histogram include OTSU [23], [24], maximum entropy [10], some derived entropy methods (e.g., Renyi entropy [25], [26], Tsallis-Havrda-Charvat entropy [27], [28], K-L divergence [29], relative entropy [30] and Tsallis entropy [31]), and some entropy methods with additional conditions (the fuzzy set in [18] and weights in [32] were added to the entropy as additional constraints to calculate the maximum entropy threshold). It is worth mentioning that inter-class algorithms, such as the OTSU algorithm, are not suitable for detection when the defect is out of proportion to the background. In contrast, entropy can represent the uncertainty of random variables. Maximum entropy threshold segmentation (including its extensions) can maximize the uniformity of the probability distributions of defects and backgrounds. Among the two-dimensional histogram threshold calculation methods, entropy is one of the most commonly used methods that can achieve parameter-free segmentation. Mutual information is an information measure derived from entropy. It has been widely used in machine learning to test the similarity between the real data and the predicted data [33], which makes it suitable for defining intersecting feature information. We borrow the idea in [34] that different pixels of an object belong to the same class and transform it into the idea that the same pixel can belong to different classes. Mutual information can represent the degree of correlation of two random variables. When the mutual information is maximum, the probability distributions of the two sets have the maximum correlation.
To solve the problem of over-segmentation or under-segmentation caused by intersecting features, a new defect detection method based on mutual information is proposed. The main contributions of this paper are the following four points. (a) For non-obvious defect features, we improve the local gray difference with the spatial relationship. We then use it to construct a new two-dimensional histogram, which is named the Gray Level and Local Spatial Difference (GLSD) histogram. (b) Since the traditional two-dimensional histogram segmentation shape does not match the actual characteristics, we change the segmentation shape of the high-probability background events in GLSD. (c) Since there is an intersecting feature interval between the background and the defect, an accurate segmentation boundary cannot be determined. Therefore, we use mutual information as a measure of uncertainty to locate the intersecting feature interval for segmentation. To the best of our knowledge, there are no studies that use mutual information to locate the intersecting feature interval. (d) Besides, to verify the effectiveness of the proposed algorithm, we publish a surface image dataset of motorcycle wheels, which is reasonably representative of metal surface defect detection. Experimental evaluation demonstrates the effectiveness of the proposed method in detecting non-obvious defects, compared with current state-of-the-art methods in this domain.
This paper is organized as follows. In section II, we develop an additional feature and use it to construct a new histogram (GLSD). In section III, we study the distribution of the defect feature and locate the intersecting feature interval. In section IV, we carry out experiments on two databases to verify the accuracy of the proposed algorithm. Finally, we conclude this paper in section VI.

II. GRAY LEVEL AND LOCAL SPATIAL DIFFERENCE HISTOGRAM
Based on the 2D histogram, we make a statistical analysis of the extended features, which can be summarized as the local mean value [10], the local variance [11], the local entropy [12], and the local gray-level spatial correlation [32]. These all describe grayscale features within the mask. In this paper, the second dimension of the histogram is generated based on the physical characteristics of the defect. In general, defects are minute pits on a locally uniform surface, which cause the defect component to appear as a slightly darker block of the image. What defects have in common is not their area, shape, or texture, but how they differ in appearance from the background. The significant difference between the defect and its neighborhood (background) is related to the grayscale. Therefore, the primary purpose of this paper is to extract the grayscale difference between the defect and the background.
VOLUME 8, 2020

A. LOCAL SPATIAL DIFFERENCE
To enhance the difference between the defect and the background, we use the local gray difference. We first introduce how the local gray difference is calculated. At the macro level, a filter eliminates the defects from the image, and the filtered image is compared with the original image by subtraction, so that the differences between the two images (i.e., the defect regions) are enhanced. At the micro level, the local gray difference is the difference between the gray value of a pixel and a gray correlation value of its neighborhood (e.g., the local gray mean). In this paper, the gray correlation value of the neighborhood is the local gray mean, which is also used as the minuend. Computing the local gray mean acts as an averaging filter, i.e., a linear smoothing of the gray values of the input image. There are no defects in the smoothed image, so subtracting the original image from it removes the background and preserves the defects. Therefore, the local gray difference can represent the difference between the defect and the background. Although the local gray difference can suppress the influence of global non-uniformity, it cannot avoid the influence of noise on small defects. To solve this problem, this paper improves the calculation of the local gray difference by introducing the spatial weight coefficient α. In the high-resolution image, we first select a mask that covers the defect and calculate the gray difference of each pixel. Then, we compare several cases of the pixel gray difference and its neighborhood gray difference. Finally, we obtain the spatial weight coefficient α from these cases and adjust the local gray difference accordingly.

1) LOCAL GRAY DIFFERENCE
Let the digital image size be M × N, the gray level range be [0, L], and the gray level of the pixel (x, y) be g(x, y) ∈ [0, L]. We take any pixel (x, y) as the center of its neighborhood Ω(x, y). The size m × n of the neighborhood Ω(x, y) (i.e., the mask size) is chosen so that the mask can cover a defect no matter how large it is. Then we calculate the average gray value A_v(x, y) of the neighborhood Ω(x, y) as the local gray mean. The local gray difference d(x, y) between the pixel (x, y) and its neighborhood Ω(x, y) is calculated as follows:

$$ d(x, y) = A_v(x, y) - g(x, y) \tag{1} $$

where A_v(x, y) is the grayscale mean of the neighborhood Ω(x, y), which is expressed as follows:

$$ A_v(x, y) = \frac{1}{m \times n} \sum_{(s, t) \in \Omega(x, y)} g(s, t) \tag{2} $$

To prevent overflow or underflow, we normalize d(x, y) to d_n(x, y) ∈ [0, L′]. Since d_n(x, y) is a grayscale-related variable, we take the range [0, L′] of the local gray difference d_n(x, y) to be the same as the grayscale range [0, L] (i.e., L′ = L). Once d(x, y) has been calculated for all pixels, d_n(x, y) is obtained as:

$$ d_n(x, y) = L' \cdot \frac{d(x, y) - d_{\min}}{d_{\max} - d_{\min}} \tag{3} $$

where d_max and d_min are the maximum and the minimum values of d(x, y), respectively.
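The calculation described above can be illustrated with a minimal sketch in plain Python. The function name `local_gray_difference`, the default mask size, and the border handling (the neighborhood is simply clipped at the image edge) are assumptions made for illustration; the paper does not specify how borders are treated.

```python
def local_gray_difference(img, m=3, n=3, L=255):
    """Return (d, d_n): raw and min-max normalized local gray differences.

    img is a list of rows of gray values; m x n is the mask size.
    """
    rows, cols = len(img), len(img[0])
    hm, hn = m // 2, n // 2
    d = [[0.0] * cols for _ in range(rows)]
    for x in range(rows):
        for y in range(cols):
            # Neighborhood clipped at the image border (an assumption).
            vals = [img[s][t]
                    for s in range(max(0, x - hm), min(rows, x + hm + 1))
                    for t in range(max(0, y - hn), min(cols, y + hn + 1))]
            mean = sum(vals) / len(vals)   # local gray mean A_v(x, y)
            d[x][y] = mean - img[x][y]     # the mean is the minuend
    flat = [v for row in d for v in row]
    d_min, d_max = min(flat), max(flat)
    span = d_max - d_min
    if span == 0:                          # perfectly uniform image
        d_n = [[0.0] * cols for _ in range(rows)]
    else:
        d_n = [[L * (v - d_min) / span for v in row] for row in d]
    return d, d_n


# A uniform 5 x 5 patch with one slightly darker pixel in the middle.
patch = [[100] * 5 for _ in range(5)]
patch[2][2] = 80
d, d_n = local_gray_difference(patch)
```

In this toy patch the darker center pixel receives the largest difference, so it maps to the top of the normalized range, while its immediate neighbors (whose local mean is pulled down by the dark pixel) map to the bottom.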

2) LOCAL GRAY DIFFERENCE WITH SPATIAL RELATIONSHIP
The local difference is determined not only by the current pixel but also by the surrounding pixels. Therefore, we improve the local gray difference d_n(x, y) to d_w(x, y) with a spatial weight. The improved local gray difference d_w(x, y) of the pixel (x, y) uses the local gray difference d_n(x, y) as the benchmark and refers to the maximum gray difference d_n^max(x, y) and the average gray difference d_n^mean(x, y) of its 8 neighbors. The local gray difference set of the 8 neighbors is {D_8(x, y)}, and its maximum value and average value are expressed as:

$$ d_n^{\max}(x, y) = \max \{D_8(x, y)\} \tag{4} $$

$$ d_n^{\mathrm{mean}}(x, y) = \frac{1}{8} \sum_{d \in \{D_8(x, y)\}} d \tag{5} $$

Suppose the comparison value ω is the maximum of the gray difference d_n(x, y) for all pixels. The local gray difference d_n(x, y) of the central pixel and the local gray difference set {D_8(x, y)} of its 8-neighborhood are compared with the comparison value ω. There are four possible scenarios: (a) d_n(x, y) and most elements of {D_8(x, y)} are less than the comparison value ω, i.e., d_n(x, y) and d_n^mean(x, y) are less than ω. Then the pixel (x, y) does not belong to the defects, and d_n(x, y) remains unchanged. (b) Only d_n(x, y) is greater than the comparison value ω, i.e., d_n(x, y) is greater than ω, but d_n^max(x, y) and d_n^mean(x, y) are less than ω. Then the pixel (x, y) is noise, and d_n(x, y) needs to be suppressed. (c) d_n(x, y) and a few elements of {D_8(x, y)} are greater than the comparison value ω, i.e., d_n(x, y) and d_n^max(x, y) are greater than ω, but d_n^mean(x, y) is less than ω. Then the pixel (x, y) is likely to be an edge or noise (defect uncertainty), and d_n(x, y) remains unchanged. (d) Most of the set {D_8(x, y)} is greater than the comparison value ω, i.e., d_n^mean(x, y) is greater than ω, whatever the values of d_n(x, y) and d_n^max(x, y) are. Then the pixel (x, y) belongs to the defects, and d_n(x, y) needs to be enhanced.
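The four scenarios can be written as a small decision function. This is only a sketch of the preliminary judgment: the name `classify_pixel` and the returned labels are hypothetical, the comparison value ω is passed in as a plain parameter, and values equal to ω are treated as "not greater", which the text leaves open. It does not compute the spatial weight coefficient α.

```python
def classify_pixel(d_n, neighbors, omega):
    """Preliminary judgment for one pixel.

    d_n       -- normalized local gray difference of the center pixel
    neighbors -- the normalized differences of its 8 neighbors
    omega     -- comparison value
    """
    d_max = max(neighbors)
    d_mean = sum(neighbors) / len(neighbors)
    if d_mean > omega:
        return "defect"      # case (d): d_n is to be enhanced
    if d_n > omega and d_max > omega:
        return "uncertain"   # case (c): edge or noise, d_n unchanged
    if d_n > omega:
        return "noise"       # case (b): d_n is to be suppressed
    return "background"      # case (a): d_n unchanged


print(classify_pixel(80, [5] * 8, 50))         # isolated high value
print(classify_pixel(80, [90] + [5] * 7, 50))  # one high neighbor
```

Checking the neighborhood mean first mirrors case (d), where the pixel is judged a defect regardless of its own difference value.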
According to the preliminary judgment above, it is necessary to adjust d_n(x, y) by the spatial weight coefficient α. The spatial weight coefficient α is not set manually but adapts robustly to the value of d_n(x, y). In this way, the statistical curve is not deformed into a particular shape (e.g., compressed or stretched) and retains its original statistical shape. The corresponding adjustment with the spatial weight coefficient α is d_w(x, y): where d_wn(x, y) is the normalization of the stretch interval according to the comparison value ω, the expression is: According to the four situations analyzed above, we make a simplification and summary. The value of the spatial weight factor α is determined by d_n(x, y), and d_w(x, y) is then kept constant, enhanced, or suppressed. The value of the spatial weight factor α is as follows: The main functions of the local spatial difference d_w(x, y) are as follows: (a) when the pixel (x, y) is preliminarily judged to be noise, its gray difference is suppressed; (b) when the pixel (x, y) is preliminarily judged to be a defect, its gray difference is amplified; (c) in other or uncertain cases, the gray difference remains the same. The improved local gray difference can extend the interval where the defect is located. It can also improve the segmentation accuracy, as shown in the experimental section. Intuitively, the local gray difference is mainly determined by the gray value of the current pixel, while the local spatial difference also considers the distribution of the local grayscale differences.

B. CONSTRUCT TWO-DIMENSIONAL HISTOGRAM
The two-dimensional histogram in this paper is built from the gray dimension (horizontal axis) and the local spatial difference dimension (vertical axis), and is called the Gray Level and Local Spatial Difference (GLSD) histogram. Viewed from above, the 2D histogram resembles a matrix: the two feature dimensions index the matrix, and the statistics are the values of the matrix. We first analyze the statistics of each one-dimensional histogram separately.
For example, the frequency of the gray level i ∈ [0, L] in the gray level histogram is the probability p(i). The gray level probability p_g(i) of the gray level value i ∈ [0, L] and the local spatial difference probability p_w(j) of the local spatial difference value j ∈ [0, L′] are as follows:

$$ p_g(i) = \frac{h_g(i)}{M \times N} \tag{9} $$

$$ p_w(j) = \frac{h_w(j)}{M \times N} \tag{10} $$

where h_g(i) is the number of pixels with grayscale i, and h_w(j) is the number of pixels with local spatial difference j. Therefore, in the GLSD histogram, the probability p(i, j) of the gray level i and the local spatial difference j is:

$$ p(i, j) = \frac{h(i, j)}{M \times N} \tag{11} $$

where h(i, j) is the number of pixels with grayscale i and local spatial difference j. It is the result of two feature dimensions.

The two-dimensional histogram is often used to analyze the distribution of targets and backgrounds. This paper considers the defect as the target. In this section, we improve the segmentation shape by replacing the classical rectangular segmentation shape with an elliptical projection shape. We use our original data (Porosity Defect-1) to verify the authenticity of the defect distribution and the background distribution, as shown in Fig. 1. Firstly, we use the ground truth to calculate the defect distribution and the background distribution of the gray level and the gray difference level, respectively, as shown in Fig. 1(a). The two-dimensional defect distribution and background distribution are shown in Fig. 1(b). Previous scholars divide the target and background into two groups by a threshold (i*, j*): defect set A and background set B, as shown in Fig. 2(a). However, in Fig. 1(a), we can see the intersection between the distributions of defects and background. In this case, it is not guaranteed that the features of the two dimensions can completely separate the target and background, and there may even be an intersecting feature interval, as shown in Fig. 1(c-1).
Therefore, we preliminarily improve the segmentation model, as shown in Fig. 2(b). The intersecting feature interval is called the intersecting feature set C. In this set, feature values extracted from the defect also fall within the background interval, so it is difficult to separate them completely with an accurate threshold.
The distribution of high-probability target events and high-probability background events usually has double or even multiple peaks, with significant differences in distance, density, or other geometric features between the peaks. However, defects are usually low-probability events, characterized by a small amount of data and a relatively scattered distribution, as shown in Fig. 1(c-1). When the background is single, there is only one peak (formed by the background), with a large amount of data and a relatively concentrated distribution, as shown in Fig. 1(c-2). Specifically, each of the two one-dimensional histograms in Fig. 1(a-2) shows a single-peak planar quasi-Gaussian distribution, and the GLSD histogram in Fig. 1(b-2) shows a single-peak stereoscopic quasi-Gaussian distribution. Since the Gaussian fitting peak widths of the background distribution differ between the two dimensions, the projection shape is elliptical; if the peak widths were the same, the projection shape would be a perfect circle. We take the cross-section of the background distribution with a slightly elevated X-Y plane to further show that the projection of the stereoscopic quasi-Gaussian distribution onto the two-dimensional histogram is elliptical, as shown in Fig. 1(c-2). The rectangular projection used in previous classification is inconsistent with the actual stereoscopic quasi-Gaussian distribution or other shapes [35]. Therefore, we divide the background high-probability events into an elliptical set B, as shown in Fig. 2(c), and the axis of the ellipse in the two-dimensional histogram is parallel to the diagonal. Since the data of low-probability defect events are few and scattered, they form no obvious shape in the two-dimensional histogram. Therefore, the defect is still divided into a rectangular set A in this paper, and the intersecting feature between the defect and the background forms the intersecting set C.
The common two-dimensional histogram is usually used to eliminate noise and edge interference, and then to maximize the separation of the target and background. However, when there is an intersecting feature between the target and the background, they cannot be precisely separated. In this paper, the two-dimensional histogram is built on the improved gray difference, which already eliminates the noise and enhances the defect features. Therefore, the purpose of the two-dimensional histogram proposed in this paper is to locate the intersecting feature interval between the defect and the background.
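The construction of the GLSD histogram described in this subsection can be sketched as follows. The function name `glsd_histogram` and the index names `i` (gray level) and `j` (local spatial difference) are illustrative assumptions, as is the rounding of the difference values to integer bins; the joint counts are simply normalized by the number of pixels.

```python
def glsd_histogram(gray, diff, L=255):
    """Joint probability p[i][j] of gray level i and local spatial difference j.

    gray and diff are equally sized lists of rows; gray values are integers
    in [0, L] and diff values are quantized to integer bins in [0, L].
    """
    rows, cols = len(gray), len(gray[0])
    h = [[0] * (L + 1) for _ in range(L + 1)]
    for x in range(rows):
        for y in range(cols):
            i = int(gray[x][y])
            j = min(L, max(0, int(round(diff[x][y]))))  # clamp to [0, L]
            h[i][j] += 1
    total = rows * cols
    p = [[count / total for count in row] for row in h]
    p_g = [sum(row) for row in p]                       # gray-level marginal
    p_w = [sum(p[i][j] for i in range(L + 1)) for j in range(L + 1)]
    return p, p_g, p_w


# Tiny example with four pixels and four levels.
p, p_g, p_w = glsd_histogram([[0, 1], [1, 3]], [[0, 2], [2, 3]], L=3)
print(p[1][2], p_g[1], p_w[2])
```

The two marginals returned here correspond to the one-dimensional gray level and local spatial difference histograms analyzed before the joint histogram is introduced.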

III. THRESHOLD WITH MAXIMUM MUTUAL INFORMATION
The role of entropy in information theory is to measure the uncertainty of information. In image processing, the maximum entropy method is usually used to calculate the segmentation threshold between the target and the background. When the target and the background are correlated, additional information must be introduced to measure the intersection interval. In this paper, the intersection interval is the intersecting feature set C, as shown in Fig. 2(c). The significance of the intersecting feature set C for segmentation is as follows: when a single threshold cannot completely separate the defect set from the background set, we need to locate the intervals of the defect set and the background set separately; that is, we need to focus on locating the fuzzy intersecting feature interval.

A. PROBABILITY DISTRIBUTION AND MUTUAL INFORMATION
Mutual information, as an extension of entropy, is also an information measure in information theory. It represents the degree of interdependence between two variables, i.e., the information that one random variable contains about another random variable. The greater the value of mutual information, the more closely related the two variables are. In this paper, the practical significance of mutual information lies in the extent to which the intersecting feature set C belongs to both the defect set A and the background set B. When the mutual information of set C is maximized, the feature intersection of defects and background in set C is maximized. This also indirectly explains the necessity of determining the interval of set C.
The mutual information of two discrete random variables A and B can be defined as follows:

$$ I(A; B) = \sum_{a \in A} \sum_{b \in B} p(a, b) \log \frac{p(a, b)}{p(a)\, p(b)} \tag{12} $$

where p(a, b) is the joint probability distribution function of random variables A and B, and p(a) and p(b) are the marginal probability distributions of A and B, respectively. Mutual information is the relative entropy between the joint distribution p(a, b) and the product distribution p(a)p(b) of the two discrete random variables A and B. Since the histogram is two-dimensional, random events in this paper are determined by two features, e.g., the subset (i_A, j_A). The subset (i_A, j_A) of the defect set A is located in the rectangle, as shown in Fig. 2(c), and its probability is: where: The subset (i_B, j_B) of the background set B is located in the ellipse. We segment the background set B by an ellipse [36]. The center of the ellipse is located at the maximum value P(i_p, j_p) of the 2D histogram. The axial direction θ of the ellipse is parallel to the diagonal line between the point (0, L′) and the point (L, 0) of the 2D histogram. Since L′ is equal to L in this paper, the direction θ of the ellipse axis is π/4. As a result, the interval of the background set B is determined only by the long semi-axis r_a and the short semi-axis r_b. The constraint on the background subset (i_B, j_B) is: And the probability of the subset (i_B, j_B) is: where: The subset (i_A∩B, j_A∩B) of the intersecting feature set C is the intersection of the defect set A and the background set B. Since mutual information is the interdependence of two hypothetically independent random variables, the probability distribution of set C is determined by the hypothetically independent events A and B. Therefore, the joint probability distribution of A and B is as follows: where: We substitute the above probability expressions into Eq. (12) and obtain the solution of mutual information in the two-dimensional histogram as follows: where Q is as follows:
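As a small numerical illustration of the general definition of mutual information for two discrete random variables (not of the paper's histogram-specific expressions, which are not reproduced here), the following sketch computes it from a joint probability table. The function name `mutual_information` and the choice of base-2 logarithm are assumptions; a different base only rescales the value.

```python
import math


def mutual_information(joint):
    """Mutual information (in bits) of a joint probability table joint[a][b]."""
    p_a = [sum(row) for row in joint]         # marginal of the row variable
    p_b = [sum(col) for col in zip(*joint)]   # marginal of the column variable
    mi = 0.0
    for a, row in enumerate(joint):
        for b, p_ab in enumerate(row):
            if p_ab > 0:                      # 0 * log(0) is taken as 0
                mi += p_ab * math.log2(p_ab / (p_a[a] * p_b[b]))
    return mi


independent = [[0.25, 0.25], [0.25, 0.25]]
dependent = [[0.5, 0.0], [0.0, 0.5]]
print(mutual_information(independent))  # 0.0
print(mutual_information(dependent))    # 1.0
```

The two toy tables show the extremes: independent variables share no information, while fully dependent binary variables share one bit.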

B. INTERSECTING FEATURE INTERVAL
The larger the mutual information value of the defect set A and the background set B, the larger the information of the intersecting feature set C. In this paper, this is equivalent to a higher degree of mixing of the defect feature and the background feature. The purpose of maximum mutual information segmentation is not to obtain an accurate threshold value between the target and the background, but to locate the boundary of the intersecting feature set C through the sets A and B. On the premise of minimizing the number of missed detections and false detections, we calculate the optimal threshold vector (i*, j*, r_a, r_b) by maximizing the mutual information I(A; B):

$$ (i^*, j^*, r_a, r_b) = \arg\max I(A; B) $$

where the threshold i* ∈ [0, argmax[h_g(i)]] lies on the grayscale axis and the threshold j* ∈ [argmax[h_w(j)], L′] lies on the gray difference axis. With the threshold vector (i*, j*, r_a, r_b), the image can be segmented as: where the set {A − C} is the defined defect region, while the set {C} cannot be identified with certainty as the defect region. This is similar to positioning target regions and retaining fuzzy target boundaries (regions) [19]. Although the set {C} is not identified as a defect region, it can be used as a supplement to the defined defect set A. The set {B − C} and all remaining features belong to the background region (including interference).
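To make the roles of the rectangle, the ellipse, and their intersection concrete, the following sketch labels a single feature pair once a threshold vector is given. It rests on several assumptions that are not stated in this form in the text: the rectangle A is taken as gray level at or below the gray threshold and difference at or above the difference threshold (consistent with the stated threshold ranges), the ellipse test is the standard rotated-ellipse inequality, and the rotation angle is left as a parameter because its sign in array index coordinates depends on how the histogram axes are drawn. All function and parameter names are hypothetical.

```python
import math


def in_ellipse(i, j, center, r_a, r_b, theta):
    """True if the feature pair (i, j) lies inside the rotated ellipse."""
    di, dj = i - center[0], j - center[1]
    u = di * math.cos(theta) + dj * math.sin(theta)    # along the long axis
    v = -di * math.sin(theta) + dj * math.cos(theta)   # along the short axis
    return (u / r_a) ** 2 + (v / r_b) ** 2 <= 1.0


def label_feature(i, j, i_star, j_star, center, r_a, r_b, theta):
    """Label (i, j) as 'A-C' (defined defect), 'C' (uncertain) or 'background'."""
    in_a = i <= i_star and j >= j_star   # rectangle A (assumed orientation)
    in_b = in_ellipse(i, j, center, r_a, r_b, theta)
    if in_a and in_b:
        return "C"                       # intersecting feature set
    if in_a:
        return "A-C"                     # defined defect region
    return "background"                  # B - C and everything else


# Illustrative values only; they are not taken from the paper.
params = dict(i_star=90, j_star=110, center=(100, 100),
              r_a=40, r_b=10, theta=-math.pi / 4)
print(label_feature(80, 120, **params))
print(label_feature(20, 200, **params))
print(label_feature(100, 100, **params))
```

In practice the threshold vector would come from the maximization described above; here it is fixed by hand purely to show how the three outcomes arise.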

IV. EXPERIMENT RESULTS
In this section, we use two databases to evaluate the performance of the proposed method. The first database is a set of images of motorcycle wheel surfaces (MWS database), which we publish for the first time and have added to the supplementary materials. The high-quality images were scanned by a line-array camera, as shown in Fig. 3. Fig. 3 shows part of the original data and their ground truth (i.e., binary images with white defect regions). It should be noted that only the marked regions are defects. The white vertical stripes on the left or right of the image are other reflective surfaces (non-defective). Other slightly larger black strips were drawn with a marker pen (non-defective). The image resolution is 0.6 mm/pixel. The original data contain two defect types, namely, Porosity and Gas Porosity (for each defect type, there are two sets of images). Meanwhile, the defect features of the original data are non-obvious, which helps to evaluate the performance of the algorithm. We compile statistics on the defect gray distribution, the defect gray difference distribution, the background gray distribution, and the background gray difference distribution of all original images.
Because the areas of defects and backgrounds are disproportionate, we normalize their distributions, as shown in Table 1. In each distribution, the defect and the background intersect. The second database is the NEU surface defect database [37]. This public database contains steel surface defects, and some of the images have heterogeneous defects. We selected five defect types with comparable features for testing. The defects in the selected images of the NEU database and in the images of the MWS database have an intersecting feature. All these characteristics make surface defect detection far more challenging.
The detection evaluation indexes, namely the sensitivity, specificity, and accuracy, are defined in Eq. (24), Eq. (25), and Eq. (26) to evaluate the experimental results of the defect detection:

$$ \mathrm{Sensitivity} = \frac{TPs}{TPs + FNs} \tag{24} $$

$$ \mathrm{Specificity} = \frac{TNs}{TNs + FPs} \tag{25} $$

$$ \mathrm{Accuracy} = \frac{TPs + TNs}{TPs + FPs + TNs + FNs} \tag{26} $$

where TPs is the number of true-positive pixels (defects correctly detected), FPs is the number of false-positive pixels (defects wrongly detected), TNs is the number of true-negative pixels (non-defects correctly detected), and FNs is the number of false-negative pixels (non-defects wrongly detected).

We compare the local spatial difference with traditional features, such as the local mean, local variance, local entropy, and local gray-level spatial correlation. Fig. 4 shows the features of Porosity Defect-1 in the MWS database. The experimental results show that the main function of the local mean is to eliminate noise, rather than to highlight the features of the defects. The local variance can only extract regions with a large local grayscale deviation. Local entropy is concerned with the degree of aggregation of the gray distribution. The local gray-level spatial correlation reflects the degree of local grayscale similarity. The above features are inconsistent with the actual defect representation: (a) the defect area is much smaller than the image area, (b) the defect differs from the background in gray level, (c) the defect has no obvious step edge compared with its neighborhood. The local spatial difference meets the requirements of defect detection. We also use the maximum mutual information method proposed in this paper to segment the 2D histogram, and the segmentation results are shown in Fig. 4. It is concluded that the defect areas extracted using the local gray difference are more accurate and complete.

FIGURE 5. The contrast of local gray difference and local spatial difference of the original data (Porosity defect-1). (a) Their one-dimensional distribution, (b) local gray difference segmentation results, and (c) local spatial difference segmentation results.

FIGURE 6. In the seven methods, the original data (Porosity defect-1) uses rectangular projection shape and elliptic projection shape, respectively (the red regions are the defects after morphological selection). (a) 2DOTSU [23], (b) 2DME [10], (c) 2DMME [38], (d) 2DMTE [28], (e) 2DMRE [30], (f) MMSE [40], (g) the proposed method (2DMMI).

Fig. 5 shows the comparison results of the local gray difference and the local spatial difference. The results show that the local spatial difference disperses the concentration degree of the background in the intersection interval. And the local spatial difference can slightly extend the distance between the background and the defect without changing the basic shape of the statistical distribution. This approach increases the separability of defects and backgrounds. We also segment the images according to the proposed maximum mutual information method. The results show that the local spatial difference can significantly reduce the noise and improve the integrity of the detected defects. Fig. 1(a) shows that the one-dimensional distribution of Porosity Defect-1 in the MWS database is approximately Gaussian. To accurately verify the influence of projection shape on image segmentation results, we select seven non-parametric threshold segmentation methods for auxiliary analysis, including 2D-OTSU [23], 2D maximum entropy (2DME) [10], 2D maximum the minimum entropy (2DMME) [38], 2D maximum Tsallis-Havrda-Charvát entropy (2DMTE) [28] with a parameter of 0.1 [39], 2D minimizing relative entropy (2DMRE) [30], maximum Masi entropy (MMSE) [40], and our 2D maximum mutual information method (2DMMI). Fig. 6 shows the segmentation results (Porosity Defect-1) of 7 methods under the rectangular projection and the elliptical projection, respectively. The detailed performance evaluation results are shown in Table 2. It can be seen that the boundary between the defect and the background under the elliptic projection shape is clearer.
For most methods, the segmentation accuracy of the elliptical projection shape is better than that of the rectangular projection shape. Moreover, the large difference in sensitivity indicates that the defect shape detected under the elliptical projection shape is more consistent with the actual defect shape, and that the elliptical projection shape decreases the influence of interference on defect extraction (i.e., the number of false-positives). Therefore, in the subsequent experiments of this paper, the GLSD histogram uses the elliptical projection shape of high-probability background events.

C. PERFORMANCE EVALUATION
We demonstrate and quantitatively analyze the results on the MWS database. Based on the elliptical projection shape, we apply 2D-OTSU, 2DME, 2DMME, 2DMTE, 2DMRE, MMSE, Lei's minimum square rough entropy method (MSRE) [41], and our 2DMMI to the GLSD histogram on the MWS database for a more detailed and complete comparison experiment, as shown in Fig. 7. We use morphology to further extract the defect region and mark it in red. The experimental results demonstrate that: (a) Because the area of the background is much larger than that of the target, 2D-OTSU is not effective in defect detection. (b) 2DME cannot extract defect regions with non-obvious features (e.g., the overlap between the defect periphery and the background), and the integrity of the extracted defect is poor. (c) 2DMME produces numerous false-negatives, which is a serious failure in defect detection. (d) 2DMTE produces many false-positives in some images, and these cannot be separated from the defect region. (e) 2DMRE produces many false-negatives but few false-positives. (f) The parameter r of the MMSE method has a great impact on the results; we use r = 0.5, which gave the best effect after many experiments. MMSE extracts defects with obvious features well, but its detection of non-obvious defects is poor. (g) MSRE locates the boundary of obvious features well, but its detection of non-obvious defects is still not ideal. (h) Our 2D maximum mutual information method gives results that are closest to the real defect shape, with fewer detection errors.

[Comparison figure panels, as far as recoverable from the source: (b) 2D-OTSU [23], (c) 2DME [10], (d) 2DMME [38], (e) 2DMTE [28], (f) 2DMRE [30], (g) MMSE [40], (h) MSRE [41], (i) the proposed method.]
We evaluate the detection results over the MWS database, as shown in Table 3. The most prominent advantage of the proposed method is that it improves the evaluation indexes for seriously unbalanced defect detection. The results show that the sensitivity, specificity, and accuracy of the proposed method are better than those of the other comparison methods. In particular, the high sensitivity shows that the proposed method significantly improves the detection of non-obvious defect features. However, in Gas Porosity defect-1 there are many porosities of extremely small area around a large defect. After segmentation with the proposed method, some true-positives (isolated points) are misjudged as false-negatives in the subsequent morphological selection. For sensitivity, the core problem lies in the false-negatives. Although the specificity and accuracy indexes are not as good as those of 2DMME, they should be compared under the premise of a good sensitivity index, because false-negatives are often a more serious problem than false-positives. At this point, we have preliminarily verified the effectiveness of the proposed method for detecting defects with intersecting features.
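The three evaluation indexes can be illustrated with a minimal sketch. It assumes pixel-wise counting against a ground-truth mask and the standard definitions (sensitivity = TP/(TP+FN), specificity = TN/(TN+FP), accuracy = (TP+TN)/all pixels); the function names, the nested-list mask format, and the toy masks are assumptions made for illustration and are not taken from the paper.

```python
def confusion_counts(pred, truth):
    """Count TP, FP, TN, FN between two binary masks (nested lists of 0/1)."""
    tp = fp = tn = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1          # defect pixel correctly detected
            elif p and not t:
                fp += 1          # background pixel reported as defect
            elif not p and t:
                fn += 1          # defect pixel missed
            else:
                tn += 1          # background pixel correctly rejected
    return tp, fp, tn, fn


def evaluate(pred, truth):
    """Return sensitivity, specificity, and accuracy as a dict."""
    tp, fp, tn, fn = confusion_counts(pred, truth)
    total = tp + fp + tn + fn
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "accuracy": (tp + tn) / total if total else 0.0,
    }


# Hypothetical 4x4 example: 2 defect pixels in a 16-pixel image.
truth = [[0, 0, 0, 0],
         [0, 1, 1, 0],
         [0, 0, 0, 0],
         [0, 0, 0, 0]]
# The prediction finds one defect pixel, misses one, and adds one false alarm.
pred = [[0, 0, 0, 1],
        [0, 1, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]]

scores = evaluate(pred, truth)
print(scores)
```

In this toy case the accuracy is 0.875 although only half of the defect pixels are found (sensitivity 0.5), which is why sensitivity is the more informative index when the background dominates the image.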
Although we have verified the proposed method on the MWS database, this is not enough to demonstrate that the method generalizes to other surface inspection tasks. Therefore, we use five defect images (the defect types are crazing, inclusion, patches, pitted surface, and rolled-in scale) from the NEU surface defect database to verify the proposed method, as shown in Fig. 8. Similarly, we use the detection evaluation indexes (sensitivity, specificity, and accuracy) to evaluate the proposed method and the seven comparison methods, as shown in Table 4. The NEU surface defect database leads to the same conclusion as the MWS database.
The experimental results show that the proposed method has certain applicability to other data. In Fig. 8, the proposed method extracts more complete defects (especially defect edges with non-obvious features) with relatively less interference. According to the quantitative analysis in Table 4, the advantage of the proposed method is its sensitivity (i.e., the minimum number of false-negatives). Meanwhile, the proposed method obtains better values of specificity and accuracy. Nevertheless, the proposed method still has some shortcomings. When a tiny defect is assigned to a subset of the intersecting feature set and forms an isolated point, it is not retained as a true-positive. In summary, the proposed method is a reliable and effective method for detecting defects with intersecting features.
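The isolated-point shortcoming can be made concrete with a small sketch. The paper does not specify its morphological selection step in this section, so the rule below (drop any foreground pixel that has no foreground pixel among its 8 neighbours) is only an assumed stand-in, and the function name and toy mask are hypothetical.

```python
def remove_isolated_points(mask):
    """Return a copy of a binary mask (nested lists of 0/1) in which every
    foreground pixel without any 8-connected foreground neighbour is cleared.
    This is an assumed, simplified stand-in for morphological selection."""
    rows, cols = len(mask), len(mask[0])
    cleaned = [row[:] for row in mask]  # do not modify the input mask
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue
            has_neighbour = any(
                mask[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, rows))
                for cc in range(max(c - 1, 0), min(c + 2, cols))
                if (rr, cc) != (r, c)
            )
            if not has_neighbour:
                cleaned[r][c] = 0  # isolated point is discarded
    return cleaned


# Hypothetical segmentation result: a 2x2 defect blob plus one tiny
# single-pixel defect in the bottom-right corner.
mask = [[1, 1, 0, 0, 0],
        [1, 1, 0, 0, 0],
        [0, 0, 0, 0, 0],
        [0, 0, 0, 0, 1]]

cleaned = remove_isolated_points(mask)
for row in cleaned:
    print(row)
```

If the single pixel in the corner is a real tiny defect, it was a true-positive before the cleaning step and becomes a false-negative afterwards, which lowers the sensitivity.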

V. CONCLUSION AND DISCUSSION
Image threshold segmentation is a simple and effective method of defect extraction. Based on the two-dimensional histogram, we propose a defect detection method using maximum mutual information. Firstly, we improve the local gray difference with the spatial relationship. The local spatial difference not only suppresses the noise but also expands the feature interval between the defect and the background, which improves the segmentation effect. We construct a new two-dimensional histogram from the gray level and the local spatial difference level, which is called the Gray Level and Local Spatial Difference histogram (GLSD). Secondly, we improve the segmentation projection shape of the GLSD histogram. The segmentation projection shape is established based on the geometric projection of high-probability background events, which is more suitable for the actual data distribution. Thirdly, we use mutual information to measure the degree of correlation between two random variables. When the mutual information of the intersecting feature interval between the defect and the background reaches its maximum, the uncertainty of the intersecting feature interval is the largest and the mixing degree is the highest. Thus, the corresponding eigenvector can determine the boundary of the segmentation interval. Finally, we publish a data set of images of motorcycle wheel surfaces (MWS database) to verify the effectiveness of the proposed method.
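As a reminder of the quantity involved, the following is a minimal sketch of the textbook mutual information of two discrete variables, I(X;Y) = Σ p(x,y) log2[p(x,y) / (p(x) p(y))], computed from a joint count table such as a two-dimensional histogram. It is not the paper's maximization over the intersecting feature interval of the GLSD histogram; the function name, the nested-list input format, and the base-2 logarithm are assumptions made for illustration.

```python
import math


def mutual_information(joint_counts):
    """Mutual information (in bits) of the two variables indexed by the rows
    and columns of a joint count table (nested lists of non-negative numbers)."""
    total = sum(sum(row) for row in joint_counts)
    if total == 0:
        return 0.0
    # Marginal probabilities of the row variable and the column variable.
    row_p = [sum(row) / total for row in joint_counts]
    col_p = [sum(col) / total for col in zip(*joint_counts)]
    mi = 0.0
    for i, row in enumerate(joint_counts):
        for j, count in enumerate(row):
            if count == 0:
                continue  # empty bins contribute nothing (0 * log 0 := 0)
            p = count / total
            mi += p * math.log2(p / (row_p[i] * col_p[j]))
    return mi


# Two hypothetical 2x2 joint histograms.
independent = [[1, 1],
               [1, 1]]   # knowing the row says nothing about the column
dependent = [[2, 0],
             [0, 2]]     # the row fully determines the column

print(mutual_information(independent))
print(mutual_information(dependent))
```

The independent table gives zero mutual information and the fully dependent table gives one bit, which are the two extremes for a pair of binary variables with uniform marginals.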
The proposed algorithm can effectively reduce the influence of the difference between low-probability defect events and high-probability background events, and solve the problem of over-segmentation or under-segmentation caused by intersecting features. The common robust segmentation methods are used for target detection with a large background. The proposed method is qualitatively and quantitatively evaluated using the defect images from the MWS database and the NEU surface defect database. The results show the superiority of the proposed method over the state-of-the-art methods.
In the field of image defect detection, non-obvious defects are often ignored. With the rapid development of visual detection technology, a large number of researchers are committed to improving the accuracy of the segmentation threshold between background and target. However, the features of the background and the target usually intersect. In other words, there is an intersecting feature interval that cannot be explicitly defined. We consider that this interval is not negligible, and that it helps to improve the accuracy of separating the target region from the background. To our knowledge, in the field of image defect detection, few studies clearly define the intersecting feature interval and extract valuable information from it. We explore the feature distribution of the image from the perspective of information theory. The results show that the intersecting feature interval is an effective entry point for improving the accuracy of image defect detection.
Two interesting topics for future research are: (a) to focus on the defect components of the intersecting feature interval and reduce the uncertainty of that interval, and (b) to explore the influence of defect types on the segmentation projection shape of the GLSD histogram to improve the universality of the algorithm.