A Method for Wafer Defect Detection Using Spatial Feature Points Guided Affine Iterative Closest Point Algorithm

In the integrated circuit manufacturing industry, wafers are designed to be smaller and smaller to meet the high demand for electronic products, which makes automatic wafer defect detection a great challenge. Existing wafer defect detection methods are mainly based on the precise segmentation of a single wafer, which relies on costly and complicated hardware instruments. The resulting segmentation performance is unstable because the hardware implementation imposes many constraints, such as the camera location, the light source location, and the product location. To address this problem, we propose a method for wafer defect detection that consists of two phases, namely wafer segmentation and defect detection. In the wafer segmentation phase, the target wafer image is segmented using the affine iterative closest point algorithm guided by spatial feature points (AICP-FP). In the defect detection phase, a simple and effective machine-vision algorithm is proposed that exploits the inherent characteristics of wafers. Simulations demonstrate that, with these two phases, higher accuracy and higher speed of wafer defect detection can be achieved at the same time. In a real industrial system, this method can satisfy the real-time detection requirements of an automatic production line.


I. INTRODUCTION
With the rapid development of the semiconductor industry, the demand for its basic element, the wafer, keeps growing, and the quality standards for wafers are becoming stricter. The manufacturing process involves many complex procedures, so wafers are likely to be contaminated on the assembly line. Therefore, it is necessary to recognize the defect pattern in order to find the abnormal sources in the manufacturing process [1]. According to the research conducted by A. Freeman [2], the accuracy of human-expert-based detection is less than 45%. What is worse, the final product contains a large number of single wafers, as shown in Figure 1, which makes detection much more difficult. The location of a spot is unknown and the angle of a scratch is uncertain; example wafers are shown in Figure 2.
The associate editor coordinating the review of this manuscript and approving it for publication was Sudipta Roy.
As a research hotspot, automatic defect detection has been studied for decades, and many useful methods have been proposed. For example, Chang et al. [3] proposed an unsupervised self-organizing neural network for automatic detection. Ng [4] revised and improved the Otsu method for common defect detection applications. Some statistics-based methods have also been applied to wafer defect detection: Hwang and Kuob [5], Yuan and Kuo [6], and Wang et al. [7] each proposed a different probabilistic model to describe wafer defect patterns. Yuan et al. [8] proposed a particle filter re-detection method via correlation filters, and Tsai and Luo [9] proposed a mean-shift-based method for solar wafer surface defect detection.
In recent years, the focus of wafer defect detection has shifted to defect classification. Wang [14] proposed a spectral clustering method to recognize defect patterns. In [15], a set of novel rotation- and scale-invariant features is presented for obtaining a reduced representation of wafer maps for wafer map failure pattern recognition (WMFPR). Fan et al. [16] proposed a detection method based on OPTICS and multi-label classification. Bourgeat and Meriaudeau [17] used digital holography and Gabor filters to extract wafer features for defect recognition. Liu et al. [18] studied a spectral subtraction method for defect classification.
The methods mentioned above perform well in wafer defect detection. However, they share a common premise: the target wafer image must match the template wafer image perfectly, which means the wafer segmentation must be precise. To obtain this precise match, most existing conventional methods rely on sophisticated hardware instruments, which complicate the production chain and increase costs. What is worse, the segmentation obtained with hardware instruments is uncertain, cannot guarantee a precise match, and seriously affects the subsequent wafer defect detection phase. Therefore, proposing a high-precision and low-cost wafer segmentation method is the primary task.
Wafer segmentation has been studied extensively in recent years, and the existing methods can be divided into two main categories: learning-based methods and methods based on feature matching/registration. Among learning-based methods, deep learning models such as convolutional networks are applied [11]-[13] to capture spatial differences for segmentation in different kinds of images or videos. Learning-based methods have achieved good segmentation performance, but they have some limitations. They require a large number of samples to learn features, which is unrealistic for wafer segmentation. What is worse, the wafer samples change all the time, and a learning-based method cannot adapt to new samples. In addition, learning-based methods learn statistical information rather than image details, which makes precise segmentation difficult to obtain. Therefore, matching/registration methods are more suitable in this case.
Matching/registration methods have also been studied extensively. In particular, Ou et al. [10] proposed a patch-based visual method with object selection/segmentation. Bourgeat et al. [19], [20] proposed a segmentation algorithm suitable for semiconductor wafer images generated by optical inspection tools. Zhen et al. [21] directly used Hough line detection to detect the wafer border and then located the wafer based on machine learning. Pan et al. [22] proposed a binary segmentation method for locating the center line of wafer images based on density histograms. Although these methods are able to locate the wafer cutting line, the segmentation accuracy is greatly influenced by the image quality and still cannot satisfy the requirements of defect detection. Besides, Pugazhenthi and Singhai [23] proposed an image segmentation algorithm based on k-means clustering, which can automatically cluster the centroids. However, the changes in the gray scale of the wafer image caused by wafer defects make this algorithm unstable for complex wafer centroid detection. Li et al. [24] proposed an LSD [25] line detection algorithm for automatic checkerboard corner extraction. Although LSD line detection is generally more accurate than Hough line detection, the results are not as good as expected because the detection performance is greatly influenced by the image quality and border defects.
It can be concluded from the above discussion that the segmentation performance is easily influenced by the wafer image quality. With an unstable segmentation process, the expected results cannot be obtained in the subsequent defect detection step.
To address the problems mentioned above, we design a method that combines a high-precision wafer segmentation algorithm with a real-time defect detection algorithm. In the wafer segmentation phase, we first extract spatial feature points of the wafer images as guidance and then use the affine iterative closest point algorithm (AICP) for precise matching. This method, which we refer to as AICP-FP, extracts important spatial feature points in wafer images and uses them to guide the matching. Experiments illustrate that the proposed AICP-FP algorithm outperforms five other segmentation methods. In the defect detection phase, a simple and effective detection method is designed for defect pattern recognition. This method exploits the inherent characteristics of the wafer image and accomplishes the detection task in a short time. Compared with previous studies, this method greatly improves the segmentation accuracy and the real-time performance of defect detection, and can therefore satisfy the requirements of realistic production. Furthermore, our method can reduce the cost of the whole detection system and has good prospects for industrial application.
The rest of the paper is organized as follows: Section 2 provides a detailed discussion of the wafer segmentation phase and the defect detection phase employed in this method. Experiments and the related analysis are presented in Section 3. Finally, conclusions are drawn in Section 4.

II. PROPOSED METHOD
According to the discussion above, the schematic of this method is shown in Figure 3. This method includes two main phases: wafer segmentation and wafer defect detection. After we obtain the product wafer, the first step is to cut out the single wafer, as shown in Figure 3, and the second step is to recognize the defect pattern of the defective wafer.

A. WAFER SEGMENTATION
In the wafer segmentation phase, wafer images are first pre-processed using the Canny edge operator and the Hough transform to extract point cloud sets for registration. Harris corner detection is also used to extract the spatial feature points as guidance. Then, we apply the affine iterative closest point algorithm guided by spatial feature points to achieve the optimal registration between the wafer template image and the wafer target image. Finally, we obtain the segmented single wafer sample. The flowchart of this phase is shown in Figure 4.

1) POINT CLOUD SET EXTRACTION
We choose the Canny edge detector [26] to extract edge points. As a common edge extraction operator, the Canny edge operator has several advantages: it reduces image noise through optimal smoothing and connects disconnected edges effectively.
The steps of the Canny edge detector are summarized as follows:
1) Apply a Gaussian filter to smooth the image and remove noise;
2) Find the intensity gradients of the image;
3) Apply non-maximum suppression to remove spurious responses to edge detection;
4) Apply a double threshold and track the wafer image edges by hysteresis [27].
Some results of the Canny edge detector are shown in Figure 5. However, the precise registration of two point sets depends on a rough alignment of these two point sets, so an appropriate initial value is important for establishing the best match between them. Considering the fixed structure of wafer products and the regularity of product images, we use Hough line detection [28], [29] to detect the wafer borders, and we take the intersection points $J = \{m_i \mid i = 1, 2, \ldots, N_j\}$ between borders as the initial value for the next step.
A straight line y = ax + b in Cartesian coordinates can be expressed in polar coordinates as follows:

$$\rho = x\cos\theta + y\sin\theta \tag{1}$$

where ρ is the perpendicular distance from the origin to the straight line and θ is the direction of the normal to this line. The basic principle of the Hough transform is to map the straight line from Cartesian coordinates into polar coordinates using the duality of points and lines. For each feature point (x, y) in the image, traverse all values of θ, calculate the corresponding ρ according to (1), quantize the pair to the discrete cell (θ_i, ρ_i), and add one to the counter of that cell. The straight line detection problem thus becomes a problem of searching for peaks in the parameter space.
In order to reduce the time complexity of the algorithm, this paper adopts a method based on the edge gradient [30] to simplify the calculation.
The steps of the Hough transform are as follows:
1) Initialize the Hough array (θ, ρ) with the value 0;
2) Compute the first-order differential edge gradient G(x, y) of the image;
3) For each non-zero pixel (x, y) with G(x, y) > T, perform the coordinate transformation ρ = x cos θ + y sin θ (where T is a threshold; in the experiment we let T = 0.7 G_max(x, y) to ensure the accuracy of edge extraction and to reduce the number of multiplications in line detection);
4) Add one to the counter (θ, ρ);
5) Find the maximum point, record its position, and reset that entry of the Hough array to 0;
6) Repeat step 5 until the required number of peaks is found;
7) Calculate the slope of each line, k = (y_2 − y_1)/(x_2 − x_1), from the corresponding points on that line;
8) Connect the straight lines that share the same slope and a common point, and establish the point set of intersections.
Hough line detection has stronger anti-noise ability than other methods, and it can reduce the impact of wafer border flaws because it is insensitive to discontinuities in the edge. Hough line detection based on the edge gradient effectively reduces the number of multiplications and accelerates the detection.
Finally, we take the border points as point cloud sets for next registration.
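The voting procedure described above can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the function names `select_edge_points`, `hough_votes`, and `strongest_line`, the 180-step angular resolution, and the unit ρ bin width are all hypothetical choices, and the gradient image is assumed to be given as a list of rows.

```python
import math

def select_edge_points(gradient, ratio=0.7):
    """Keep pixels whose gradient magnitude exceeds ratio * G_max (T = 0.7 * G_max)."""
    g_max = max(max(row) for row in gradient)
    threshold = ratio * g_max
    return [(x, y)
            for y, row in enumerate(gradient)
            for x, g in enumerate(row)
            if g > threshold]

def hough_votes(points, n_theta=180, rho_step=1.0):
    """Accumulate votes in (theta, rho) cells using rho = x*cos(theta) + y*sin(theta)."""
    thetas = [math.pi * k / n_theta for k in range(n_theta)]
    votes = {}
    for x, y in points:
        for k, theta in enumerate(thetas):
            rho = x * math.cos(theta) + y * math.sin(theta)
            cell = (k, round(rho / rho_step))  # quantize rho to a discrete bin
            votes[cell] = votes.get(cell, 0) + 1
    return thetas, votes

def strongest_line(points, n_theta=180, rho_step=1.0):
    """Return (theta, rho, vote count) of the highest peak in the accumulator."""
    thetas, votes = hough_votes(points, n_theta, rho_step)
    (k, r), count = max(votes.items(), key=lambda item: item[1])
    return thetas[k], r * rho_step, count
```

For example, `strongest_line([(5, y) for y in range(100)])` recovers the vertical line x = 5 as θ = 0 and ρ = 5. Thresholding the gradient before voting reduces the number of points that enter the inner loop, which is where the multiplications are spent.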

Given the point cloud set of the wafer target image $P = \{p_i\}_{i=1}^{N_p}$ and the point cloud set of the wafer template image $Q = \{q_j\}_{j=1}^{N_q}$, we solve the matching problem by shape registration. The similarity between the wafer target image and the wafer template image is measured by the error distance between the two point cloud sets. In other words, we need to find the transformation T that optimally matches the wafer target point set to the wafer template point set. The objective based on the least squares (LS) criterion is therefore:

$$\min_{T,\; c(i) \in \{1, \ldots, N_q\}} \sum_{i=1}^{N_p} \left\| T(p_i) - q_{c(i)} \right\|_2^2 \tag{2}$$

In (2), $c(i)$ denotes the correspondence between the wafer target point set and the wafer template point set, i.e., $q_{c(i)}$ is the template point matched to $p_i$. The matching process between the two point sets is an affine transformation process, the so-called shape registration. Writing the transformation T in linear form, (2) becomes:

$$\min_{A,\; t,\; c(i)} \sum_{i=1}^{N_p} \left\| A p_i + t - q_{c(i)} \right\|_2^2 \quad \text{s.t. } \det(A) \neq 0 \tag{3}$$

In (3), T is decomposed into a full-rank matrix $A$, which carries the linear part of the mapping from point set $P$ to point set $Q$, and a translation vector $t$. The traditional affine iterative closest point algorithm [31] solves the optimization problem (3) by iteration. Each iteration consists of the following two steps:
1) Establish the correspondence between point sets P and Q according to the affine transformation $(A_{k-1}, t_{k-1})$ of the (k − 1)th step:

$$c_k(i) = \arg\min_{j \in \{1, \ldots, N_q\}} \left\| A_{k-1} p_i + t_{k-1} - q_j \right\|_2^2, \quad i = 1, \ldots, N_p \tag{4}$$

2) Compute the new affine transformation from this correspondence:

$$(A_k, t_k) = \arg\min_{A,\; t} \sum_{i=1}^{N_p} \left\| A p_i + t - q_{c_k(i)} \right\|_2^2 \tag{5}$$

These two steps are repeated until the mean square error between the two point sets reaches its minimum or the maximum number of iterations is reached, which yields the final affine transformation $(A_k, t_k)$.
The traditional affine iterative closest point algorithm achieves fast and efficient registration between two point sets, but without constraints the affine registration easily falls into a local optimum. Because wafer images have strong local features, we rebuild the objective function under the guidance of spatial feature points [32] to avoid local optima. Harris corner detection [33] is used to extract the spatial feature point sets X and Y from the target and template point sets, respectively. We then pre-register these two spatial feature point sets with the affine iterative closest point algorithm and obtain the corresponding sets $X = \{x_l\}_{l=1}^{N_l}$ and $Y = \{y_l\}_{l=1}^{N_l}$. Introducing the spatial feature information into (3) gives the new objective function:

$$\min_{A,\; t,\; c(i)} \left( \sum_{i=1}^{N_p} \left\| A p_i + t - q_{c(i)} \right\|_2^2 + \alpha \sum_{l=1}^{N_l} \left\| A x_l + t - y_l \right\|_2^2 \right) \quad \text{s.t. } \det(A) \neq 0 \tag{6}$$

where α is the weight of the spatial feature points. As α increases, the second term in (6) contributes more to the objective. As in the traditional affine iterative closest point algorithm, we solve the constrained optimization problem (6) iteratively in two steps. In the first step, the correspondence between the two point sets is established using (4), according to the affine transformation of the (k − 1)th step.
In the second step, the kth transformation $(A_k, t_k)$ is calculated from the correspondence obtained in the first step:

$$(A_k, t_k) = \arg\min_{A,\; t} \left( \sum_{i=1}^{N_p} \left\| A p_i + t - q_{c_k(i)} \right\|_2^2 + \alpha \sum_{l=1}^{N_l} \left\| A x_l + t - y_l \right\|_2^2 \right) \tag{7}$$

where α is the weight coefficient of the spatial feature points, $x_l$ and $y_l$ are the corresponding spatial feature points in the two feature point sets, and $N_p$ and $N_l$ are the numbers of points in the target point set and the feature point set, respectively. The first step of the algorithm can be accomplished by fast algorithms such as nearest point search based on Delaunay tessellation [34] or a k-d tree [35].
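Therefore, the key point is to solve the second step of the iteration. Problem (7) amounts to minimizing:

$$F(A, t) = \sum_{i=1}^{N_p} \left\| A p_i + t - q_{c_k(i)} \right\|_2^2 + \alpha \sum_{l=1}^{N_l} \left\| A x_l + t - y_l \right\|_2^2 \tag{8}$$

Differentiating the objective function F(A, t) with respect to t gives:

$$\frac{\partial F(A, t)}{\partial t} = 2 \sum_{i=1}^{N_p} \left( A p_i + t - q_{c_k(i)} \right) + 2\alpha \sum_{l=1}^{N_l} \left( A x_l + t - y_l \right) \tag{9}$$

Setting dF(A, t)/dt = 0, we get:

$$t = \bar{q} - A \bar{p}, \qquad \bar{p} = \frac{\sum_{i=1}^{N_p} p_i + \alpha \sum_{l=1}^{N_l} x_l}{N_p + \alpha N_l}, \qquad \bar{q} = \frac{\sum_{i=1}^{N_p} q_{c_k(i)} + \alpha \sum_{l=1}^{N_l} y_l}{N_p + \alpha N_l} \tag{10}$$

Substituting t into (8) eliminates the translation:

$$F(A) = \sum_{i=1}^{N_p} \left\| A p_i + \bar{q} - A \bar{p} - q_{c_k(i)} \right\|_2^2 + \alpha \sum_{l=1}^{N_l} \left\| A x_l + \bar{q} - A \bar{p} - y_l \right\|_2^2 \tag{11}$$

Then, the objective function can be written as:

$$F(A) = \sum_{i=1}^{N_p} \left\| A (p_i - \bar{p}) - (q_{c_k(i)} - \bar{q}) \right\|_2^2 + \alpha \sum_{l=1}^{N_l} \left\| A (x_l - \bar{p}) - (y_l - \bar{q}) \right\|_2^2 \tag{12}$$

Let $N = N_p + N_l$, $E = \{e_j\}_{j=1}^{N}$, $F = \{f_j\}_{j=1}^{N}$, and define $e_j$ and $f_j$ as follows:

$$e_j = \begin{cases} p_j - \bar{p}, & 1 \le j \le N_p \\ \sqrt{\alpha}\,(x_{j-N_p} - \bar{p}), & N_p < j \le N \end{cases} \qquad f_j = \begin{cases} q_{c_k(j)} - \bar{q}, & 1 \le j \le N_p \\ \sqrt{\alpha}\,(y_{j-N_p} - \bar{q}), & N_p < j \le N \end{cases} \tag{13}$$

Then (12) simplifies to:

$$F(A) = \sum_{j=1}^{N} \left\| A e_j - f_j \right\|_2^2 \tag{14}$$

In order to minimize the objective function F(A), we set dF(A)/dA = 0:

$$\frac{\partial F(A)}{\partial A} = 2 \sum_{j=1}^{N} \left( A e_j - f_j \right) e_j^{T} = 0 \tag{15}$$

and the affine matrix A is obtained as:

$$A = \left( \sum_{j=1}^{N} f_j e_j^{T} \right) \left( \sum_{j=1}^{N} e_j e_j^{T} \right)^{-1} \tag{16}$$

Substituting the affine matrix A into (10) gives the translation vector t. To evaluate the registration between the two point sets, an error function $\varepsilon_k$, the mean square error of the registration after the kth iteration, is used. In the kth iteration, if the change in the error function satisfies $|\varepsilon_k - \varepsilon_{k-1}| \le \delta$ or the maximum number of iterations is reached, the iteration stops and outputs the final affine transformation $(A_k, t_k)$.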
3) AICP ALGORITHM WITH SPATIAL FEATURE POINTS GUIDED
The whole process of the affine iterative closest point algorithm guided by spatial feature points is summarized as follows:
1) Obtain the point cloud set of the wafer target image $P = \{p_i\}_{i=1}^{N_p}$, the point cloud set of the wafer template image $Q = \{q_j\}_{j=1}^{N_q}$, and the two spatial feature point sets $X = \{x_l\}_{l=1}^{N_l}$ and $Y = \{y_l\}_{l=1}^{N_l}$;
2) Initialize the affine matrix $A_0$, the translation vector $t_0$, and the spatial feature point weight α;
3) Find the correspondence $\{p_i, q_{c_k(i)}\}$ between the two point sets using (4), based on the affine transformation of the (k − 1)th step;
4) Calculate the kth transformation $(A_k, t_k)$ using (10) and (16);
5) Repeat step 3 and step 4 until $|\varepsilon_k - \varepsilon_{k-1}| \le \delta$ or the maximum number of iterations is reached;
6) Output the affine matrix $A_k$ and the translation vector $t_k$.
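The two-step iteration summarized above can be illustrated with a small 2-D sketch in standard-library Python. It is only a sketch under assumptions: the function names are hypothetical, the nearest-point search is brute force rather than a k-d tree, the feature points are weighted by α directly (equivalent to scaling the centered points by the square root of α), and the error is normalized by the total number of points, which is one plausible choice rather than a definition taken from the paper.

```python
def transform(A, t, p):
    """Apply the 2-D affine map p -> A p + t."""
    return (A[0][0] * p[0] + A[0][1] * p[1] + t[0],
            A[1][0] * p[0] + A[1][1] * p[1] + t[1])

def sq_dist(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def solve_affine(src, dst, weights):
    """Closed-form weighted least squares for dst ~ A src + t."""
    w_sum = sum(weights)
    p_bar = [sum(w * p[d] for w, p in zip(weights, src)) / w_sum for d in range(2)]
    q_bar = [sum(w * q[d] for w, q in zip(weights, dst)) / w_sum for d in range(2)]
    s_ee = [[0.0, 0.0], [0.0, 0.0]]  # weighted sum of e e^T
    s_fe = [[0.0, 0.0], [0.0, 0.0]]  # weighted sum of f e^T
    for w, p, q in zip(weights, src, dst):
        e = (p[0] - p_bar[0], p[1] - p_bar[1])
        f = (q[0] - q_bar[0], q[1] - q_bar[1])
        for r in range(2):
            for c in range(2):
                s_ee[r][c] += w * e[r] * e[c]
                s_fe[r][c] += w * f[r] * e[c]
    det = s_ee[0][0] * s_ee[1][1] - s_ee[0][1] * s_ee[1][0]
    inv = [[s_ee[1][1] / det, -s_ee[0][1] / det],
           [-s_ee[1][0] / det, s_ee[0][0] / det]]
    # A = (sum f e^T) (sum e e^T)^(-1), then t = q_bar - A p_bar
    A = [[s_fe[r][0] * inv[0][c] + s_fe[r][1] * inv[1][c] for c in range(2)]
         for r in range(2)]
    t = (q_bar[0] - A[0][0] * p_bar[0] - A[0][1] * p_bar[1],
         q_bar[1] - A[1][0] * p_bar[0] - A[1][1] * p_bar[1])
    return A, t

def aicp_fp(P, Q, X, Y, alpha=1.0, max_iter=50, delta=1e-10):
    """Iterate correspondence search and closed-form update; X[l] pairs with Y[l]."""
    A, t = [[1.0, 0.0], [0.0, 1.0]], (0.0, 0.0)
    weights = [1.0] * len(P) + [alpha] * len(X)
    errors = []
    for _ in range(max_iter):
        # Step 1: closest template point for each transformed target point.
        moved = [transform(A, t, p) for p in P]
        matched = [min(Q, key=lambda q, m=m: sq_dist(m, q)) for m in moved]
        # Step 2: closed-form update using cloud points and weighted feature points.
        A, t = solve_affine(P + X, matched + Y, weights)
        total = sum(sq_dist(transform(A, t, p), q) for p, q in zip(P, matched))
        total += alpha * sum(sq_dist(transform(A, t, x), y) for x, y in zip(X, Y))
        errors.append(total / (len(P) + len(X)))
        if len(errors) > 1 and abs(errors[-2] - errors[-1]) <= delta:
            break
    return A, t, errors
```

The returned `errors` list makes the stopping rule visible: each entry is the error after one full iteration, and the loop ends once two consecutive entries differ by at most `delta`. The inputs are expected as Python lists of (x, y) tuples, and the point sets must not be collinear, otherwise the 2 × 2 matrix being inverted is singular.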

Remark 1:
The computational complexity of the AICP-FP algorithm is the same as that of the traditional AICP algorithm, and the overall process is similar. Both contain two main steps: the first establishes the correspondence and the second calculates the transformation. Because the transformation has a closed-form solution, its computing time is negligible compared with the first step. The computing time is therefore dominated by establishing the correspondence, whose computational complexity is $O(N_p \ln N_q)$ [36].
Theorem 1: The AICP-FP algorithm converges monotonically to a local minimum with respect to the mean square error from any given initial transformation.
Proof: Given two point sets $P = \{p_i\}_{i=1}^{N_p}$ and $Q = \{q_j\}_{j=1}^{N_q}$, let the initial affine matrix be $A_0$ and the initial translation vector be $t_0$. At the (k − 1)th step, assume $\{A_{k-1}, t_{k-1}\}$ is known and the corresponding closest point set $\{p_i, q_{c_k(i)}\}$ has been computed. Let $e_k$ denote the resulting mean square error (MSE), i.e., the error of the objective in (7) evaluated at $(A_{k-1}, t_{k-1})$ with the correspondence $c_k$. By minimizing (7), we obtain the kth transformation $(A_k, t_k)$; let $\varepsilon_k$ denote the MSE evaluated at $(A_k, t_k)$ with the same correspondence $c_k$. It is apparent that $\varepsilon_k \le e_k$; in particular, if $A_k = A_{k-1}$ and $t_k = t_{k-1}$, i.e., the algorithm has converged, then $\varepsilon_k = e_k$.
Next, assume $\{A_k, t_k\}$ is known. We compute the corresponding closest point set $\{p_i, q_{c_{k+1}(i)}\}$ by minimizing (4), and let $e_{k+1}$ denote the MSE evaluated at $(A_k, t_k)$ with the new correspondence $c_{k+1}$. Since $e_{k+1}$ is the minimum of (4), we have $e_{k+1} \le \varepsilon_k$. Repeating the procedure above for any k, we obtain:

$$0 \le \cdots \le e_{k+1} \le \varepsilon_k \le e_k \le \cdots \le \varepsilon_1 \le e_1$$

According to the Monotonic Sequence Theorem, "Every bounded monotonic sequence of real numbers is convergent", so the AICP-FP algorithm converges monotonically to a local minimum with respect to the mean square error.

B. DEFECT DETECTION
In the wafer defect detection phase, we first subtract the qualified flawless wafer image from the target wafer image. Next, we calculate the peak value of the gray histogram of the resulting image and distinguish defective wafers from flawless wafers with a threshold T_1. For a defective wafer, we binarize the image and apply the close operation to extract its inherent characteristics. Finally, we calculate the major axis length of the connected domain to distinguish the scratch defect from the spot defect with another threshold T_2. The flowchart of this phase is shown in Figure 6.

1) SUBTRACTION BETWEEN TWO GRAY-SCALE IMAGES
After obtaining the single wafer image, the first step is to distinguish between the defective wafer and the qualified flawless wafer. Considering the wafer image features, we propose a simple and efficient strategy using the image histogram.
An image histogram is a type of histogram that acts as a graphical representation of the tonal distribution in a digital image. It plots the number of pixels for each tonal value. The frequency $P(b_k)$ of the gray-scale value $b_k$ is given by:

$$P(b_k) = \frac{N_k}{M}, \quad k = 0, 1, \ldots, L - 1$$

where M is the number of pixels in the image, $N_k$ is the number of pixels whose gray-scale value is $b_k$, and L is the number of gray levels, normally L = 256. A defective wafer image differs in gray scale from a flawless one in some regions. To highlight this difference, we subtract the qualified flawless image from the target image, so that the difference emerges in the gray-scale histogram. If the target image has no defect, the subtraction result is almost entirely 'black' because the gray values of corresponding pixels are similar, which produces a very high peak at low gray levels. Accordingly, we calculate the peak frequency and set an appropriate threshold T_1 to distinguish defective wafers from flawless ones. If the peak value is greater than the threshold T_1, the target wafer is flawless; otherwise, it is defective. Experiments show that the peak frequency of flawless wafer images is greater than 10000 and the peak frequency of defective wafer images is smaller than 3000. Therefore, we set the threshold T_1 = 5000.
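The flawless/defective decision described above can be sketched as follows. The sketch makes two assumptions that are not stated in the text: the subtraction is taken as the absolute pixel-wise difference, and the peak is measured as a raw pixel count N_k (which is what thresholds in the thousands suggest) rather than the normalized frequency. The function names are hypothetical, and images are plain lists of rows of integers in 0..255.

```python
def abs_difference(target, template):
    """Pixel-wise absolute difference of two equally sized gray-scale images."""
    return [[abs(a - b) for a, b in zip(row_t, row_r)]
            for row_t, row_r in zip(target, template)]

def gray_histogram(image, levels=256):
    """Count N_k, the number of pixels at each gray level k."""
    counts = [0] * levels
    for row in image:
        for value in row:
            counts[value] += 1
    return counts

def peak_count(image, levels=256):
    """Height of the tallest histogram bin."""
    return max(gray_histogram(image, levels))

def is_flawless(target, template, t1=5000):
    """A flawless wafer gives a nearly black difference image, hence a very high peak."""
    return peak_count(abs_difference(target, template)) > t1
```

The threshold `t1=5000` only makes sense for images of roughly the size used in the experiments, since a raw count scales with the number of pixels; dividing the counts by M would make the threshold independent of image size.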

2) BINARIZATION USING 2-MODE METHOD
There are two main defect patterns shown in Figure 2: spot and scratch. The main difference between them is that the scratch has an obvious linear feature, so it is natural to use this inherent feature to distinguish the scratch from the spot. Extracting the scratch information is therefore the crucial step in the defect detection phase.
Because the region of interest, which contains the defect, contrasts sharply with the image background, the gray histogram shows two peaks, formed by the region of interest and the background, respectively. Based on this, we use the 2-mode method [37] to obtain the binary image. The 2-mode method chooses the trough value of the gray histogram as the threshold for image binarization.
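A minimal sketch of the 2-mode idea is given below. It is one possible realization under assumptions, not the procedure of [37]: the histogram is smoothed with a moving average of hypothetical radius 2 to suppress spurious local maxima, the two tallest local maxima are taken as the two modes, and the lowest bin between them is used as the threshold.

```python
def smooth(counts, radius=2):
    """Moving-average smoothing of a histogram (window shrinks at the ends)."""
    n = len(counts)
    out = []
    for k in range(n):
        lo, hi = max(0, k - radius), min(n, k + radius + 1)
        out.append(sum(counts[lo:hi]) / (hi - lo))
    return out

def two_mode_threshold(counts, radius=2):
    """Return the gray level at the trough between the two tallest histogram peaks."""
    h = smooth(counts, radius)
    peaks = [k for k in range(1, len(h) - 1) if h[k - 1] < h[k] >= h[k + 1]]
    if len(peaks) < 2:
        raise ValueError("histogram does not have two peaks")
    tallest = sorted(peaks, key=lambda k: h[k], reverse=True)[:2]
    left, right = sorted(tallest)
    return min(range(left, right + 1), key=lambda k: h[k])

def binarize(image, threshold):
    """Pixels brighter than the threshold become 1, all others 0."""
    return [[1 if value > threshold else 0 for value in row] for row in image]
```

When the valley between the two modes is flat, `min` returns its first bin, so the threshold sits at the left edge of the valley; any value inside the valley separates the two modes equally well.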

3) SCRATCH CONNECTION USING CLOSE OPERATION
However, the scratch is very likely to be discontinuous in the binary image, either because the scratch itself is discontinuous or because its gray-scale distribution is uneven. This discontinuity weakens the linear feature and inevitably affects the performance of defect detection, so we need to connect the scratch to enhance the linear feature. Common methods for scratch connection include the Hough transform and least-squares line fitting [38].
However, the Hough transform involves a space conversion and heavy computation, which takes too much time. With least-squares line fitting, too many regions in the image can be detected as lines, which is inappropriate in this case. In order to improve the accuracy and save computing time, we use the close operation [37] to connect the scratch. The close operation is a simple image processing operation given by:

$$A \bullet B = (A \oplus B) \ominus B$$

where A is a binary image, B is a structuring element, and ⊕ and ⊖ denote dilation and erosion, respectively. The close operation fills small holes in an object, connects adjacent objects, and smooths the object boundary without significantly changing its area, which makes it suitable in this case.
After the close operation, we calculate the major axis length of the connected domain and set a threshold T_2 to distinguish the scratch from the spot. The scratch is much longer than the spot, and the two lengths are not of the same order of magnitude. Experiments show that setting the threshold T_2 = 15 gives the best performance.
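The close operation, dilation followed by erosion with the same structuring element, can be sketched on a binary image stored as a list of rows. The square (2·radius + 1) × (2·radius + 1) structuring element and the function names are assumptions made for illustration; the image is zero-padded before the operation so that pixels near the border behave as they would on an unbounded plane.

```python
def _morph(img, radius, combine):
    """Apply `combine` (any or all) over a square window; outside pixels count as 0."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [img[j][i] if 0 <= j < h and 0 <= i < w else 0
                      for j in range(y - radius, y + radius + 1)
                      for i in range(x - radius, x + radius + 1)]
            row.append(int(combine(window)))
        out.append(row)
    return out

def dilate(img, radius=1):
    return _morph(img, radius, any)

def erode(img, radius=1):
    return _morph(img, radius, all)

def closing(img, radius=1):
    """Closing = erosion of the dilation, computed on a zero-padded copy."""
    w = len(img[0])
    blank = [0] * (w + 2 * radius)
    padded = ([blank[:] for _ in range(radius)]
              + [[0] * radius + list(row) + [0] * radius for row in img]
              + [blank[:] for _ in range(radius)])
    closed = erode(dilate(padded, radius), radius)
    return [row[radius:radius + w] for row in closed[radius:radius + len(img)]]
```

A scratch broken by a one-pixel gap is joined by `closing` with radius 1, while its length and thickness stay the same; wider gaps need a larger radius.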
All steps can be summarized as:
Step 1: Subtract the qualified flawless wafer image from the wafer target image and compute the gray histogram of the result;
Step 2: Calculate the peak frequency of the result in step 1. If the peak value is greater than the threshold T_1, the target wafer is flawless; otherwise, it is defective;
Step 3: Binarize the defective image using the 2-mode method;
Step 4: Apply the close operation to the binary image from step 3 for scratch connection;
Step 5: Calculate the major axis length of the connected domain. If the length is greater than the threshold T_2, the target wafer has a scratch defect; otherwise, it has a spot defect.
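Step 5 can be sketched as follows. The text does not say how the major axis length is computed, so this sketch assumes the common second-moment definition: the major axis of the ellipse that has the same normalized second central moments as the region. The function names and the choice of 8-connectivity are likewise assumptions, and the binary image is assumed to contain at least one foreground pixel.

```python
import math
from collections import deque

def connected_components(img):
    """Group foreground pixels into 8-connected components of (x, y) tuples."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for y in range(h):
        for x in range(w):
            if img[y][x] and not seen[y][x]:
                seen[y][x] = True
                queue, comp = deque([(x, y)]), []
                while queue:
                    cx, cy = queue.popleft()
                    comp.append((cx, cy))
                    for ny in range(max(0, cy - 1), min(h, cy + 2)):
                        for nx in range(max(0, cx - 1), min(w, cx + 2)):
                            if img[ny][nx] and not seen[ny][nx]:
                                seen[ny][nx] = True
                                queue.append((nx, ny))
                components.append(comp)
    return components

def major_axis_length(comp):
    """Major axis of the ellipse with the same second central moments as the region."""
    n = len(comp)
    mx = sum(x for x, _ in comp) / n
    my = sum(y for _, y in comp) / n
    # The 1/12 term is the second moment of a unit-square pixel.
    uxx = sum((x - mx) ** 2 for x, _ in comp) / n + 1 / 12
    uyy = sum((y - my) ** 2 for _, y in comp) / n + 1 / 12
    uxy = sum((x - mx) * (y - my) for x, y in comp) / n
    common = math.sqrt((uxx - uyy) ** 2 + 4 * uxy ** 2)
    return 2 * math.sqrt(2) * math.sqrt(uxx + uyy + common)

def classify_defect(binary, t2=15):
    """Label the image by its longest connected domain: scratch if longer than T_2."""
    longest = max(major_axis_length(c) for c in connected_components(binary))
    return "scratch" if longest > t2 else "spot"
```

Under this definition a straight 20-pixel line has a major axis length of about 23, while a 3 × 3 blob has about 3.5, so the two defect types fall on opposite sides of T_2 = 15 by a wide margin.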

III. RESULTS AND ANALYSIS
The whole experiment includes two main parts: wafer segmentation and defect detection. To verify the accuracy and robustness of the proposed method, we first propose an evaluation metric for wafer segmentation and then carry out comparative experiments. After wafer segmentation, we measure the accuracy and running time of the defect detection part, and some intermediate results are demonstrated using example wafer images. The experimental dataset was acquired in an industrial process, and an example of the wafer product image is shown in Figure 7. The dataset contains 638 wafers in total, including both flawless and defective wafers. The experimental environment is an Intel(R) Core(TM) i7-8700 @ 3.20GHz with 8GB RAM.

A. WAFER SEGMENTATION
In this method, wafer segmentation is the essential prerequisite, and precise wafer segmentation results are crucial to the subsequent defect detection. To verify the accuracy of the proposed wafer segmentation method, we carry out comparative experiments with five other wafer segmentation methods: segmentation based on size, segmentation based on the line segment detector (LSD) algorithm, centroid segmentation based on the k-means clustering algorithm, the traditional affine iterative closest point algorithm, and the corner point detection algorithm.

1) EVALUATION METRIC
Since there is no existing evaluation metric for wafer segmentation, in order to verify wafer segmentation performance, we propose a new evaluation metric based on the wafer structure. This evaluation metric is illustrated in Figure 8.
Let the point sets be $Z = \{p_i, i = 1, 2, 3, 4\}$ and $Z' = \{p'_i, i = 1, 2, 3, 4\}$, where $p_i = (x_i, y_i)$ and $p'_i = (x'_i, y'_i)$. We compute the Euclidean distance between corresponding points to represent the segmentation error E.
The smaller E is, the better the segmentation performance, and vice versa.
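As an illustration of this metric, the sketch below computes an error from two sets of four corresponding points. The exact way the four distances are combined is not recoverable from the text here, so the sum of the four Euclidean distances is an assumption, and `segmentation_error` is a hypothetical name.

```python
import math

def segmentation_error(reference_points, segmented_points):
    """Sum of Euclidean distances between corresponding points (assumed aggregation)."""
    return sum(math.hypot(px - qx, py - qy)
               for (px, py), (qx, qy) in zip(reference_points, segmented_points))
```

For instance, if every one of the four points is displaced by (3, 4) pixels, each distance is 5 and the error under this assumption is 20; a perfect segmentation gives 0.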

2) RESULTS
We number the 638 wafers from top left to bottom right and then run experiments with each of the six wafer segmentation methods. We run each algorithm 50 times and then calculate the running time for each sample. We select 8 representative wafers, some of which are flawless and some of which have scratches or spots, for quantitative analysis. The segmentation errors E_i, where i is the wafer number, are shown in Table 1.
Table 1 shows that the method based on size performs worst, because the cumulative error grows with each successive cut.
The errors of the method based on the LSD algorithm are all greater than 20. We infer that the wafer image quality is the primary reason, because it badly affects the line extraction and leads to imprecise segmentation. The centroid segmentation method based on the k-means clustering algorithm is better than the previous two methods according to the average error in Table 1. However, its performance depends on the accuracy of centroid positioning, and because defects can easily shift the wafer centroid, the results are unstable. The errors of the method based on the AICP algorithm are greater than those of the centroid segmentation method and similar to those of the LSD algorithm. The straight lines extracted by the Hough transform often show jitter and bending, so the result of LSD is not as good as that of AICP most of the time. However, some feature information is missing when AICP is used, which makes the result of AICP worse than that of LSD in some cases, such as E333 and E457 in Table 1.
The error of the segmentation method based on corner points varies widely according to Table 1, because one wafer image contains several similar parts (the four corners) and the extracted corner feature points are matched to the wrong part with high probability. The proposed segmentation method performs best. With the constraint of spatial feature points, the AICP-FP algorithm achieves the optimal match even when some feature information is missing, and thus we obtain precise coordinates for the next phase.
The running time of each method for one wafer sample is shown in the last line of Table 1. The shortest time, 0.001s, is obtained by the method based on size, because this method only considers the size of the wafer and involves little computation. The running time of the centroid segmentation method is only 0.2s, because this method only calculates the centroid of the wafer image and also involves little computation. The running time of the LSD algorithm and the AICP algorithm is longer, because these two methods not only extract features but also match them to the corresponding features in the template wafer. The running time of the proposed AICP-FP algorithm is the longest but is still less than 1s. Considering its outstanding performance, this cost is worthwhile in the segmentation process. We will focus on improving the efficiency of the algorithm in our future work.
In order to present the segmentation effect directly, the sample 89 with spots and the sample 125 with serious scratches are shown for qualitative analysis.
The error of the segmentation method based on size, shown in Figure 9(a) and Figure 10(a), is clearly the largest: the segmented wafer is not complete at all. As Figure 9(b) and Figure 10(b) show, the right and bottom borders are not complete either. For the centroid segmentation method and the segmentation method based on the AICP algorithm, the wafer sample edge (red line) in Figure 9(c)-9(d) and Figure 10(c)-10(d) basically coincides with the template edge (green line). However, in Figure 9(c) and Figure 10(c), the outer edges such as the borders deviate farther from the edges of the template. The middle parts segmented using the AICP algorithm are registered better than those segmented using the centroid segmentation method. As Figure 9(e) and Figure 10(e) show, some feature points in the left wafer image are linked to points in a similar but wrong part of the right wafer image, which leads to the large error in Table 1. For the proposed method, Figure 9(f) and Figure 10(f) show fewer red lines, especially in the left part of Figure 9(f) and the lower right part of Figure 10(f), which means the error is smaller. These results are consistent with the errors in Table 1. In conclusion, the proposed segmentation method based on the AICP-FP algorithm outperforms the other methods.

3) VALIDATION
To validate the proposed segmentation method, we use another kind of wafer sample. An example of this wafer image is shown in Figure 11. As above, we number these wafer samples from top left to bottom right and then run experiments with each of the six wafer segmentation methods. The segmentation errors E_i are shown in Table 2.
As can be seen in Table 2, the results of the method based on size are still the worst because of the accumulated error. The error of the method based on corner points is large and still unstable because of mismatches. Because of the asymmetry of the wafer sample shape, the average error of the centroid segmentation method is 13.136, which is the third largest among the six methods. The method based on LSD performs better than the method based on AICP because the straight lines in this wafer sample are clear and easy to detect. To sum up, the proposed AICP-FP method is still the best among the six segmentation methods, which means our method is robust to different kinds of wafer samples.

B. WAFER DEFECT DETECTION
In this section, we evaluate the proposed wafer defect detection method on the segmented wafer images. Some intermediate results of this phase, as well as the final results, are shown.

1) RESULTS
Several example wafer images are employed to validate the performance of the defect detection method. Figure 12 shows the subtraction results and their gray histograms using the three wafer images in Figure 2.
As shown in Figure 12, if the target wafer image is flawless, the subtraction result is almost entirely black, which leads to a very high peak in its gray histogram; we use this characteristic to distinguish defective wafers from flawless ones. For a defective wafer image, the defect emerges in the subtraction result and contrasts with the background. The gray histograms in Figure 12(b) and 12(c) show that the subtraction result of a defective wafer has a two-peak characteristic.
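The subtraction and histogram check described above can be sketched as follows. This is only an illustrative sketch on small nested-list images, not the implementation used in the experiments: the function names and the thresholds `dark_level` and `dark_fraction` are assumptions introduced here, and a practical version would operate on real images with an image processing library.

```python
def abs_difference(target, template):
    """Pixel-wise absolute difference of two equally sized grayscale images."""
    return [[abs(t - r) for t, r in zip(row_t, row_r)]
            for row_t, row_r in zip(target, template)]


def gray_histogram(image, levels=256):
    """Count how many pixels fall on each gray level."""
    hist = [0] * levels
    for row in image:
        for value in row:
            hist[value] += 1
    return hist


def looks_flawless(diff, dark_level=10, dark_fraction=0.99):
    """Return True when nearly all difference pixels are close to black.

    dark_level and dark_fraction are illustrative thresholds (assumptions),
    not values taken from the paper.
    """
    hist = gray_histogram(diff)
    total = sum(hist)
    dark = sum(hist[:dark_level + 1])
    return dark / total >= dark_fraction


# Toy 8x8 template, a near-identical flawless sample, and a scratched sample.
template = [[100] * 8 for _ in range(8)]
flawless = [[101] * 8 for _ in range(8)]
scratched = [row[:] for row in template]
for col in range(8):
    scratched[3][col] = 220  # a bright horizontal scratch

flawless_result = looks_flawless(abs_difference(flawless, template))
scratched_result = looks_flawless(abs_difference(scratched, template))
```

In this toy case the flawless difference image has a single peak near gray level 0, while the scratched one has a second group of pixels at a much higher gray level, which is the two-peak behaviour the text relies on.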
We also use the common region growing algorithm [39] for comparison. Figure 13 shows the results of the region growing algorithm applied to the same defective wafer image with different seed points. In both Figure 13(a) and 13(b), the left image is the same defective image with a different seed point marked by a star, and the right image is the result of the region growing algorithm.
As shown in Figure 13, the results differ with the seed point. If we choose the seed point starred in Figure 13(a), the result shows the complete scratch but only its contour. If we choose the seed point starred in Figure 13(b), we cannot obtain the whole scratch. Since the seed point also has to be selected manually, the region growing algorithm is complex and not ideal in this case.
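The seed dependence of region growing can be illustrated with a minimal sketch. The variant below (4-connected growth, with a fixed `tolerance` around the seed's gray value) is an assumption chosen for simplicity and is not necessarily the variant of [39]; the toy image is likewise hypothetical.

```python
from collections import deque


def region_grow(image, seed, tolerance=10):
    """Grow a 4-connected region from seed over pixels whose gray value is
    within tolerance of the seed's value. Returns a set of (row, col)."""
    rows, cols = len(image), len(image[0])
    seed_value = image[seed[0]][seed[1]]
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in region
                    and abs(image[nr][nc] - seed_value) <= tolerance):
                region.add((nr, nc))
                queue.append((nr, nc))
    return region


# A discontinuous scratch: two bright segments separated by a one-pixel gap.
image = [[20] * 7 for _ in range(3)]
for col in (0, 1, 2, 4, 5, 6):
    image[1][col] = 200

left = region_grow(image, (1, 0))   # seed on the left segment
right = region_grow(image, (1, 6))  # seed on the right segment
```

Each seed recovers only the segment it lies on, so neither run returns the whole scratch, which mirrors the incomplete result reported for Figure 13(b).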
In Figure 14, the first column shows some example wafer images, the second column shows the results of image binarization using the 2-mode method, and the last column shows the results of applying the close operation.
We can see from the second column in Figure 14 that the defects emerge in the binary images and that the scratches in Figure 14(b) and 14(c) are discontinuous. In the third column, after the close operation is applied, the discontinuous scratches are connected effectively. In Figure 14(d), some of the spots also get connected because they are so close to each other that they are treated as parts of the same scratch. However, even when those spots are connected, the resulting length is still shorter than that of an actual scratch by over an order of magnitude. Comparing the results of the close operation in Figure 14 with the results of the region growing algorithm in Figure 13, it is obvious that the close operation is much more efficient in this case.
In this experiment, we calculate the average running time and the detection accuracy of the steps of the defect detection phase described above (II.B). The results are shown in Table 2.
Steps 1 and 2 are designed to distinguish defective wafers from flawless ones, and the following three steps are designed to identify the defect patterns, so we calculate the average running time of these two major components separately.
We run steps 1 and 2 50 times and the average running time is 0.08s. We also run steps 3, 4 and 5 50 times and the average running time is 0.027s. The running time is short, which can save a lot of time in actual production.
There are 638 wafer samples, including 313 flawless samples and 325 defective ones (171 samples with scratches and 154 samples with spots). The accuracy of steps 1 and 2 is 100%, which means all the defective wafer samples can be distinguished correctly. Among the defective wafer samples, 164 out of 171 samples with scratches (95.91%) and 142 out of 154 samples with spots (92.21%) can be correctly identified. In total, 306 of the 325 defective samples are correctly identified, a recognition rate of 94.15%.
Finally, we combine the two phases into a whole method and test it. We run the whole method 50 times and its average running time is about 1s, which is fast enough for deployment on industrial assembly lines.
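The effect of binarization followed by the close operation on a discontinuous scratch can be sketched as below. This is a simplified illustration under stated assumptions: the threshold is passed in directly (in the 2-mode method it would be taken from the valley between the two histogram peaks), the structuring element is assumed to be a 3x3 square, and pixels outside the image are ignored at the borders.

```python
def binarize(image, threshold):
    """Map pixels above threshold to 1 and all others to 0."""
    return [[1 if value > threshold else 0 for value in row] for row in image]


def _window(image, r, c, radius):
    """Values in the square neighbourhood of (r, c), clipped at the borders."""
    rows, cols = len(image), len(image[0])
    return [image[nr][nc]
            for nr in range(max(0, r - radius), min(rows, r + radius + 1))
            for nc in range(max(0, c - radius), min(cols, c + radius + 1))]


def dilate(image, radius=1):
    """Binary dilation: a pixel becomes 1 if any neighbour is 1."""
    return [[max(_window(image, r, c, radius)) for c in range(len(image[0]))]
            for r in range(len(image))]


def erode(image, radius=1):
    """Binary erosion: a pixel stays 1 only if all neighbours are 1."""
    return [[min(_window(image, r, c, radius)) for c in range(len(image[0]))]
            for r in range(len(image))]


def close(image, radius=1):
    """Morphological closing: dilation followed by erosion."""
    return erode(dilate(image, radius), radius)


# A bright scratch on row 2 with a one-pixel gap at column 3.
gray = [[20] * 7 for _ in range(5)]
for col in (0, 1, 2, 4, 5, 6):
    gray[2][col] = 200

binary = binarize(gray, threshold=110)  # 110 is an assumed threshold
closed = close(binary)
```

After closing, the gap in the middle row is filled while the rows above and below remain background, so the scratch becomes one connected component whose length can then be measured.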
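The reported rates can be checked directly from the sample counts given above. The snippet below is only an arithmetic check; the variable and function names are introduced here for illustration.

```python
flawless_total = 313
scratch_total, scratch_correct = 171, 164
spot_total, spot_correct = 154, 142


def rate(correct, total):
    """Recognition rate in percent, rounded to two decimals."""
    return round(100 * correct / total, 2)


defective_total = scratch_total + spot_total        # 325
defective_correct = scratch_correct + spot_correct  # 306
all_samples = flawless_total + defective_total      # 638

scratch_rate = rate(scratch_correct, scratch_total)
spot_rate = rate(spot_correct, spot_total)
overall_rate = rate(defective_correct, defective_total)
```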

IV. CONCLUSION
A method for wafer segmentation and defect detection is proposed in this paper. This method includes two main functions: cutting out the single wafer and recognizing the defect pattern. Aiming at the problems of the existing methods, we propose a wafer segmentation method based on the AICP-FP algorithm that does not rely on hardware facilities, as well as a defect detection method based on machine vision. The details are as follows: 1) The proposed segmentation algorithm based on AICP-FP outperforms the five other segmentation algorithms. Its running time is acceptable considering that its performance is the best. 2) The proposed defect detection algorithm is much more accurate than the human-expert based detection method. Its running time is about 0.1s, which is fast enough for industrial assembly lines. When these two parts are combined as a whole, the accuracy and the running time are outstanding and can satisfy the requirements of realistic production. The research in this paper provides an important technical reference for reducing manufacturing costs and improving the automation level.