Data Augmentation With CycleGAN to Build a Classifier for Novel Defects From the Dicing Stage of Semiconductor Package Assembly

Industry 4.0, a concept first proposed in Germany, has led an increasing number of companies to adopt a mass customization strategy: producing small batches of diversified products that best fulfill individual customer requirements. As a result, the product life cycle under mass customization is much shorter than under other production strategies. When product lines change frequently and customized products have high yield rates, accurately detecting potential defects from a limited number of images is a daunting challenge. If a defect identification classification model must maintain a certain level of accuracy and be deployed quickly, it is impossible to wait until a large number of defect images has been collected before deploying an accurate model for new defects. Obtaining a high-precision defect identification classification model quickly is therefore crucial. In this study, we employed the style-transfer method of CycleGAN, which can learn from unpaired training images, to successfully transfer the style of defective images of old products onto defect-free images of new products. However, CycleGAN requires a large number of training images, so for the rare sample categories this study takes a different route: we first obtained the defect mask through a semantic segmentation model, then separated the foreground defect from the wafer background using digital image processing techniques, and copied and pasted the separated defect onto a new wafer background to generate fake defect images. Finally, a generative adversarial network architecture was used to perform image blending so that the fake defect images look more natural and realistic.
The effectiveness of the data augmentation method was verified with a convolutional neural network model. The proposed method successfully increased the number of defect images for new products, which helps to deploy a defect identification classification model for new products quickly.


I. INTRODUCTION
OSAT stands for ''Outsourced Semiconductor Assembly and Test.'' It is an important part of the semiconductor industry, involving the final stages of semiconductor device manufacturing. OSAT companies are specialized service providers that handle the assembly, testing, and packaging of semiconductor chips produced by fabless semiconductor companies or integrated device manufacturers (IDMs).
(The associate editor coordinating the review of this manuscript and approving it for publication was Ravi Mahajan.)
Automated Optical Inspection (AOI) in OSAT: After wafers are shipped from wafer fabs (such as TSMC and Intel), they are sent to OSAT factories for packaging and testing. In the front end of the OSAT process, wafer grinding, thinning, and dicing are performed to create individual chips.
At this stage, surface inspection of the wafers is necessary. Traditionally, manual inspectors performed visual inspections, focusing on significant defects such as chip cracks and scratches. In response to the need for increased yield and automation, AOI has been introduced to replace human inspectors. The purpose of AOI is to prevent human error, but it relies on rule-based judgments and often flags all irrelevant anomalies as failures. Moreover, distinguishing scratches from particles is extremely difficult due to their similarity, and enhancing the detection capability leads to a high rate of false positives (overkill). Figure 1 shows the semiconductor factory processes, with emphasis on AOI after wafer dicing.
AOI+AI: This process involves using AI to confirm the images that were flagged as overkill by AOI and then having professional inspectors confirm and label the final images before feeding them back into the AI model for further training. This closed-loop system is effective, and many OSATs are implementing AOI+AI to reduce the time required for manual inspections and improve training for new products.
Industry 4.0 [1], also known as the Fourth Industrial Revolution, aims to create ''smart factories'' where machines, products, and systems communicate and cooperate with each other, making production processes more flexible, adaptive, and cost-effective.
However, with Industry 4.0, new products have shifted from mass production to customization, and semiconductor factories have had to adjust their processes and circuits to stay competitive, resulting in shorter product life cycles. As a result, the current AOI+AI model cannot keep up with the pace of product iteration in wafer fabs. A robustness test [2] of the AI model shows that accuracy drops by 30% to 40% on novel wafer patterns. This makes it impossible for the factory to operate effectively.
AOI uses computer vision technology and a trained golden sample to distinguish between normal and abnormal conditions. However, AOI's judgment method is rule-based and has its limitations: if thresholds are set strictly, it produces a large amount of overkill. In addition, the AOI capability of OSATs is less robust than that of semiconductor fabs, primarily due to position deviations induced by the cutting process. This paper does not discuss AOI itself but concentrates on the effective utilization of AI. Section II reviews past experience with GANs; Section III presents the new combined AI model architecture; Section IV presents the experimental results; and Section V gives the conclusion and future work.

II. REVIEW
Wafer surface defect detection: The judgment and standards for defects may differ between OSATs and semiconductor manufacturers. For semiconductor manufacturers, most defects may lead to short circuits in the chip's circuitry. The cleanliness level is specified by ISO 14644-1 [3], which classifies cleanrooms by the number of particles of specific sizes per cubic meter of air; the lower the class number, the cleaner the environment. The cleanliness level of semiconductor manufacturing is typically considered to be Class 100, indicating no more than 100 particles of size 0.5 micrometers or larger per cubic foot of air (the per-cubic-foot count follows the older FED-STD-209E convention, whereas ISO 14644-1 counts per cubic meter). However, for most OSATs, the grinding and dicing processes may leave some dirt on the surface caused by water stains, and the cleanliness level of an OSAT is usually considered to be 1000. There may also be particles and burrs that adhere to the surface due to static electricity, which are blown away during postprocessing. However, they can cause AOI misjudgment, the overkill mentioned earlier.
There is a geometric progression in the number of defects of each type: Table 1 lists the common defect types found at the dicing stage, such as foreign material, backside chipping, chip fly, wafer not cut through, passivation defect, chip crack, chip scratch, residual burr, silicon dust, and cut shift. Counting these defects reveals large differences in their respective quantities, and because the OSAT yield rate is extremely high, the defects of interest are rare. To protect our customers' intellectual property rights, the original images are replaced with photos obtained from online searches, composited with the defect items received from the factory.
Traditionally, image augmentation relies on digital image processing techniques, such as angle rotation, mirror flipping, random cropping, and even the addition of salt-and-pepper noise, to enhance the diversity of training images. Although this method can provide some augmentation for a small number of new-product images, it may not be feasible for new products, especially in the pilot-run stage, where there may not even be a single defective image.
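For reference, the traditional augmentations listed above can be sketched in a few lines of NumPy; the image, crop size, and noise level below are illustrative, not values from the production pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

def rotate90(img, k=1):
    """Rotate an HxW image by k * 90 degrees."""
    return np.rot90(img, k)

def mirror_flip(img):
    """Flip an HxW image left-right."""
    return np.fliplr(img)

def random_crop(img, size, rng):
    """Crop a random size x size patch."""
    h, w = img.shape[:2]
    y = rng.integers(0, h - size + 1)
    x = rng.integers(0, w - size + 1)
    return img[y:y + size, x:x + size]

def salt_and_pepper(img, amount, rng):
    """Set a random fraction of pixels to 0 (pepper) or 255 (salt)."""
    out = img.copy()
    mask = rng.random(img.shape[:2]) < amount
    out[mask] = rng.choice([0, 255], size=mask.sum())
    return out

img = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
augmented = [rotate90(img), mirror_flip(img),
             random_crop(img, 48, rng), salt_and_pepper(img, 0.02, rng)]
```

Each transform preserves the defect's appearance while varying its pose or adding noise, which is precisely why such augmentation cannot help when no defective image exists at all.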
The current AI models include various Generative Adversarial Network (GAN) technologies [4]. CycleGAN [5], [6], [7], [8] is commonly applied in wafer defect inspection. It enables the transformation of one type of image into another, which proves helpful in generating defect images to facilitate the training of defect recognition models. The advantage of CycleGAN is that it can generate a greater variety of defect images to expand the training dataset, thereby improving model accuracy. Moreover, CycleGAN can produce high-quality defect images that can be used for model testing and validation. However, the disadvantage is that CycleGAN requires a large amount of training data to generate high-quality defect images, which may result in significant time and resource costs for model training.
In addition, CycleGAN-generated images may exhibit bias or distortion, which may negatively impact model accuracy. Overall, CycleGAN is a promising technology for wafer defect inspection, but issues such as unrealistic defects and long training times may arise in practical applications.

III. RESEARCH METHOD
A. DATA COLLECTION AND LABELING
Just like in an OSAT factory, we treat the production of wafer surface defects as a process and have integrated various AI models into a standard procedure consisting of five steps, which we call the Defect Image Value-added Architecture (DIVA) as shown in Table 2. Of course, the company also uses other models in combination, but each has its limitations or shortcomings, such as WGAN [9], WGAN-GP [10], Conditional GAN [11], Pix2Pix [12], U-Net [13], among others.

B. CycleGAN WITH ATTENTION MECHANISM
CycleGAN and the attention mechanism [14] have gained attention in recent years in the field of wafer defect detection. CycleGAN is a generative adversarial network that learns two generators from data in two different domains, enabling image style transfer and image translation. To accomplish these tasks, the model introduces an additional loss function so that the learned mapping functions maintain cycle consistency, thereby reducing the probability space of the mapping functions. In wafer defect detection, CycleGAN can transform normal wafer images into defective wafer images to generate more defect samples for training the defect detector. The attention mechanism is a neural network structure that weights the output based on different features of the input. In wafer defect detection, it focuses the network's attention on areas where defects may exist, improving detector accuracy. Tsai et al. [15] proposed a wafer defect detection method based on CycleGAN and the attention mechanism: CycleGAN first transforms normal wafer images into defective ones, the attention mechanism then focuses the network on areas where defects may exist, and finally a convolutional neural network performs defect detection. Their experiments show better detection performance and accuracy in wafer defect detection.
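The cycle-consistency loss referred to above is L_cyc = E[||F(G(x)) - x||_1] + E[||G(F(y)) - y||_1], weighted by a coefficient lambda. A minimal NumPy sketch, with toy stand-in "generators" in place of trained networks:

```python
import numpy as np

def cycle_consistency_loss(G, F, x, y, lam=10.0):
    """L_cyc = E[||F(G(x)) - x||_1] + E[||G(F(y)) - y||_1], scaled by lam.
    G maps domain X (defect-free) -> Y (defective); F maps Y -> X."""
    forward = np.abs(F(G(x)) - x).mean()   # x -> Y -> back to X
    backward = np.abs(G(F(y)) - y).mean()  # y -> X -> back to Y
    return lam * (forward + backward)

# Toy stand-in generators that are exact inverses, so the loss is zero.
G = lambda img: img + 1.0
F = lambda img: img - 1.0

x = np.zeros((4, 8, 8))   # batch of "defect-free" images
y = np.ones((4, 8, 8))    # batch of "defective" images
loss = cycle_consistency_loss(G, F, x, y)
```

When G and F are not inverses of each other the loss is positive, which is the pressure that keeps the learned mappings from collapsing to arbitrary translations.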
BASNet [16] semantic segmentation model: Qin et al. [16] and Van Gansbeke et al. [17] studied the advantages and disadvantages of the BASNet semantic segmentation model for detecting wafer defects. The model has some disadvantages, including: 1) it requires a large amount of training data and a long training time to reach good detection accuracy, and 2) it performs inadequately on high-contrast images, which can lead to missed detections or false detections.
As shown in Figure 3, defects are extracted and placed in other locations during the process. However, if the defect falls in a high-density circuit area, it may be necessary to try other semantic segmentation models.
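Given the boolean mask produced by the segmentation model, the copy-and-paste step described above amounts to the following NumPy sketch (the patch, mask, and paste coordinates are made-up illustrations):

```python
import numpy as np

def paste_defect(defect, mask, background, top, left):
    """Copy the masked defect pixels onto a new wafer background.
    mask: boolean HxW array from the segmentation model (True = defect)."""
    out = background.copy()
    h, w = mask.shape
    region = out[top:top + h, left:left + w]
    region[mask] = defect[mask]   # only defect pixels overwrite the background
    return out

defect = np.full((4, 4), 200, dtype=np.uint8)    # hypothetical defect patch
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True                            # segmented defect region
background = np.zeros((16, 16), dtype=np.uint8)  # new-product wafer image
fake = paste_defect(defect, mask, background, top=5, left=5)
```

The hard seams this naive paste leaves around the mask boundary are exactly what the GAN-based blending stage in the next subsection is meant to smooth away.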

C. GPGAN IMAGE BLENDING MODEL
GPGAN [18] is a generative adversarial network used for image blending; it can generate high-quality synthetic images for training detection models. According to Jalayer et al. [19] and Wu et al. [18], GPGAN can synthesize defect images to expand existing datasets and improve the accuracy of defect detection models. However, GPGAN training requires a large amount of image data and computational resources, and the synthesized images may contain some unrealistic details, which could affect model performance. Figure 4 shows the results of image blending using GPGAN.

D. EFFICIENTNET IMAGE CLASSIFICATION MODEL
The EfficientNet [20] image classification model offers a solution to more intricate problems. In convolutional neural networks, the conventional approach adjusts the depth, width, or input resolution. While it is possible to modify all three dimensions simultaneously, previous studies typically adjusted only one at a time, due to the high complexity of changing all three and the absence of a guiding principle. EfficientNet, by contrast, is a highly efficient and accurate image classification model designed to classify input images into different categories. Atila et al. [21] reported that this model achieved the highest accuracy at the minimum computation cost. In wafer defect detection, EfficientNet improves defect classification accuracy while reducing computation time, thus improving efficiency. However, it requires a large amount of training data and computing resources, as well as more careful model parameter tuning. A series of EfficientNet models were compared with well-known convolutional neural network models; Figure 5 shows that EfficientNet achieves higher accuracy at the same parameter count.
In practice, models with fewer parameters and higher accuracy are preferred, so EfficientNet was chosen as the classification model for this study.
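EfficientNet's accuracy-per-parameter advantage comes from compound scaling: depth, width, and input resolution are scaled jointly by a single coefficient phi, under the constraint that FLOPs roughly double per unit of phi. The base coefficients below are those reported in the EfficientNet paper:

```python
# Compound scaling: depth d = alpha**phi, width w = beta**phi,
# resolution r = gamma**phi, with alpha * beta**2 * gamma**2 ~= 2
# so that total FLOPs grow by roughly 2**phi.
alpha, beta, gamma = 1.2, 1.1, 1.15   # grid-searched base coefficients

def compound_scale(phi):
    depth = alpha ** phi       # multiplier on number of layers
    width = beta ** phi        # multiplier on number of channels
    resolution = gamma ** phi  # multiplier on input image side length
    return depth, width, resolution

d, w, r = compound_scale(2)                  # e.g. EfficientNet-B2-like scaling
flops_factor = alpha * beta**2 * gamma**2    # ~2.0 per unit of phi
```

Scaling all three dimensions together is why, in Figure 5, each larger EfficientNet variant buys accuracy more cheaply than widening or deepening a conventional network alone.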

IV. EXPERIMENTAL RESULTS
A. DATA DESCRIPTION
Over a period of three months, images were collected from an automated optical inspection machine; 20% of them were of new products. The number of images for each type of defect is shown in Figure 6: Normal (1433 images), Particle (1322 images), Burr (280 images), Die Crack (61 images), and Scratch (29 images), for a total of 3125 images. The Burr, Die Crack, and Scratch counts are extremely small, resulting in severe data imbalance.
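From the counts above, the severity of the imbalance can be quantified directly; the inverse-frequency class weights shown here are a common remedy offered as an illustration, not necessarily the method used in this study:

```python
# Per-class image counts reported in Figure 6.
counts = {"Normal": 1433, "Particle": 1322, "Burr": 280,
          "Die Crack": 61, "Scratch": 29}

total = sum(counts.values())                       # 3125 images in total
imbalance = counts["Normal"] / counts["Scratch"]   # ~49:1 majority/minority

# Inverse-frequency class weights, normalized so the average weight is 1.
n_classes = len(counts)
weights = {c: total / (n_classes * n) for c, n in counts.items()}
```

A roughly 49:1 ratio between Normal and Scratch explains why accuracy alone is a misleading metric here and why Section IV-C also reports per-class precision, recall, and F1-score.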

B. DEFECT DATA AUGMENTATION
The effect of image blending versus direct copy-and-paste is compared in Figure 7. The Fréchet Inception Distance (FID) [22] is used to evaluate the quality of the generated images; it measures the difference between images generated by a generative adversarial network (GAN) and real images, with lower values indicating greater similarity. By both visual inspection and this metric, the blended images look closer to real images, so the image blending method is used to generate defect images. The resulting data increments are shown in Table 3. In addition to the normal images, we randomly generated approximately 300 pseudo-defect images for each of the other defect categories to reduce the severity of the data imbalance.
These pseudo-defect images will only be added to the training set, and the number of images in the testing set will remain unchanged. Therefore, the training set will increase from 2,186 images to 3,410 images.
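The FID mentioned above compares the statistics of deep feature vectors extracted from real and generated images. A simplified NumPy sketch, assuming diagonal covariances (the full metric uses a matrix square root of the covariance product) and random vectors standing in for Inception features:

```python
import numpy as np

def fid_diagonal(feats_real, feats_fake):
    """Frechet Inception Distance between two sets of feature vectors,
    simplified by assuming diagonal covariances. The full FID replaces
    the last term with Tr(2 * sqrtm(sigma1 @ sigma2))."""
    mu1, mu2 = feats_real.mean(0), feats_fake.mean(0)
    var1, var2 = feats_real.var(0), feats_fake.var(0)
    mean_term = np.sum((mu1 - mu2) ** 2)
    cov_term = np.sum(var1 + var2 - 2.0 * np.sqrt(var1 * var2))
    return mean_term + cov_term

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(500, 16))     # stand-in "Inception" features
same = rng.normal(0.0, 1.0, size=(500, 16))     # same distribution -> low FID
shifted = rng.normal(2.0, 1.0, size=(500, 16))  # shifted distribution -> high FID
```

Identical feature sets score near zero while distribution shifts score high, which is why a lower FID for the blended images indicates they sit closer to the real-image distribution.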

C. COMPARISON OF RESULTS
The EfficientNet classification model was used to evaluate the difference before and after using the data augmentation method proposed in this study. Evaluation metrics include a confusion matrix [23], [24], [25], as shown in Figure 8. It can be seen that the accuracy of the model has been slightly improved from 95% to 97%. In addition to model accuracy, we also compare the performance of the model for each defect class.
Due to the imbalanced number of samples per defect category, evaluation metrics beyond accuracy, such as precision, recall, and F1-score, need to be compared, as shown in Table 4 and Table 5. Although the model without data augmentation achieved an accuracy of 95%, the precision, recall, and F1-score of the major defect classes, Die Crack and Scratch, were all below 60%, the lowest being only 22%. Table 5 shows that the proposed data augmentation method effectively improves the model's performance for each defect class, especially the major defect classes Die Crack and Scratch: recall for Die Crack increased from 53% to 79% and for Scratch from 22% to 67%, both significant improvements.
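The per-class metrics in Tables 4 and 5 are derived from the confusion matrix in the usual way; the 2-class matrix below is hypothetical, not the values of Figure 8:

```python
import numpy as np

def per_class_metrics(cm):
    """Precision, recall, and F1 per class from a confusion matrix
    (rows = true class, columns = predicted class)."""
    tp = np.diag(cm).astype(float)
    precision = tp / cm.sum(axis=0)   # of everything predicted as class i
    recall = tp / cm.sum(axis=1)      # of everything truly class i
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical 2-class confusion matrix for illustration.
cm = np.array([[90, 10],
               [ 5, 95]])
precision, recall, f1 = per_class_metrics(cm)
accuracy = np.diag(cm).sum() / cm.sum()
```

Note that a rare class can have low recall while overall accuracy stays high, which is exactly the effect observed for Die Crack and Scratch before augmentation.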

D. VISUAL VERIFICATION
Class Activation Mapping (CAM) is a method for visualizing the attention regions of a neural network for classification. In this study, Gradient-weighted Class Activation Mapping (Grad-CAM) [26], [27] was used to visualize whether the EfficientNet classification model truly learned the types of defects and the correct attention regions.
Using Grad-CAM, we can observe that even for generated defect images, the EfficientNet classification model can correctly identify the defect regions, as shown in Figure 9. This further confirms the feasibility of the image generation method proposed in this study.
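The Grad-CAM heatmap is computed by weighting each feature map of the last convolutional layer by its global-average-pooled gradient, summing, and applying a ReLU. A NumPy sketch on synthetic activations (a real application would take the maps and gradients from the trained EfficientNet):

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM heatmap.
    activations: (K, H, W) feature maps from the last conv layer.
    gradients:   (K, H, W) gradients of the class score w.r.t. those maps."""
    weights = gradients.mean(axis=(1, 2))             # alpha_k, one per channel
    cam = np.tensordot(weights, activations, axes=1)  # sum_k alpha_k * A_k
    cam = np.maximum(cam, 0)                          # ReLU keeps positive evidence
    if cam.max() > 0:
        cam = cam / cam.max()                         # normalize to [0, 1]
    return cam

# Toy example: one channel that activates only in the top-left quadrant.
acts = np.zeros((1, 8, 8))
acts[0, :4, :4] = 1.0
grads = np.ones((1, 8, 8))   # positive gradient: channel supports the class
heatmap = grad_cam(acts, grads)
```

The heatmap lights up only where the supporting channel is active, which is the behavior checked visually in Figure 9 for the generated defect regions.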

V. CONCLUSION AND DISCUSSION
Combining techniques can reduce model training: since AlphaGo, AI has become increasingly powerful, but each model can still handle only one task. Until general AI [28] arrives, combining AI models to obtain high-quality working abilities is good practice that saves computing power and training time.
If more abnormal images are desired, deformation can be used to increase the training data. Elastic deformation [29] is a method for calculating material deformation that can also be used for incremental augmentation of wafer defect images. A paper titled ''Elastic Deformation-Based Defect Detection in Semiconductor Manufacturing,'' published by Wen-Chieh Wang et al. in IEEE Transactions on Semiconductor Manufacturing in 2013, proposes a defect detection method based on elastic deformation and demonstrates its effectiveness experimentally, as shown in Figure 10. In addition, patents and technical reports provide related information about elastic deformation in semiconductor defect inspection; for example, US patent US20150092066A1 describes a method for detecting semiconductor defects based on elastic deformation, as well as systems and devices implementing it.
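Elastic deformation as used for image augmentation generates smooth random displacement fields and resamples the image through them. A minimal NumPy sketch (a box filter stands in for the usual Gaussian smoothing, and nearest-neighbor resampling replaces bilinear interpolation):

```python
import numpy as np

def elastic_deform(img, alpha, sigma, rng):
    """Warp img by smooth random displacement fields.
    alpha scales the displacement magnitude; sigma is the smoothing radius."""
    h, w = img.shape

    def smooth(field):
        # Box-filter smoothing of a random field (stand-in for a Gaussian).
        k = sigma
        padded = np.pad(field, k, mode="edge")
        out = np.zeros_like(field)
        for dy in range(-k, k + 1):
            for dx in range(-k, k + 1):
                out += padded[k + dy:k + dy + h, k + dx:k + dx + w]
        return out / (2 * k + 1) ** 2

    dx = smooth(rng.uniform(-1, 1, (h, w))) * alpha
    dy = smooth(rng.uniform(-1, 1, (h, w))) * alpha
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    ys = np.clip(np.round(ys + dy).astype(int), 0, h - 1)
    xs = np.clip(np.round(xs + dx).astype(int), 0, w - 1)
    return img[ys, xs]   # nearest-neighbor resampling through the warp

rng = np.random.default_rng(0)
img = np.zeros((32, 32))
img[8:24, 8:24] = 1.0   # a square "defect" region
warped = elastic_deform(img, alpha=4.0, sigma=3, rng=rng)
```

Because the displacement field is smooth, the defect's topology is preserved while its contour varies, yielding plausible new defect shapes rather than noise.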
Commercialization in the future will bring significant benefits to anomaly prevention, such as surface defect detection. Currently, however, we face barriers: AOI vendors do not release APIs, and OSAT vendors do not allow AI system integration or share anomaly patterns with non-customers. Until these barriers are removed, we can only use inefficient methods.