Weakly-Supervised Defect Segmentation on Periodic Textures Using CycleGAN

The importance of an automated defect inspection system has been increasing in the manufacturing industries. Various products to be examined have periodic textures. Among image-based inspection systems, it is common that supervised defect segmentation requires a great number of defect images with their own region-level labels; however, it is difficult to prepare sufficient training data. Because most products are of normal quality, it is difficult to obtain images of product defects. Pixel-wise annotation for semantic segmentation tasks is an exhausting and time-consuming process. To solve these problems, we propose a weakly-supervised defect segmentation framework for defect images with periodic textures and a data augmentation process using generative adversarial networks. With only image-level labeling, the proposed segmentation framework translates a defect image into its defect-free version, called a golden template, using CycleGAN and then segments the defects by comparing the two images. The proposed augmentation process creates whole new synthetic defect images from real defect images to obtain sufficient data. Furthermore, synthetic non-defect images are generated even from real defect images through the augmentation process. The experimental results demonstrate that the proposed framework with data augmentation outperforms an existing weakly-supervised method and shows remarkable results comparable to those of supervised segmentation methods.


I. INTRODUCTION
Most manufacturing industries aim to provide their clientele with defect-free products to enhance their corporate competitiveness. To achieve this goal, product inspections are usually conducted at the final stage of the manufacturing process, and a large number of manufactured products have been examined by human inspectors. The accuracy of an inspector is rarely uniform over long working hours because inspectors become tired as time elapses. Furthermore, because it is difficult for novices to adequately check the product quality initially, it is likely that the inspection results will be unsatisfactory, and time will be required for training purposes. For these reasons, the demand for automated defect inspection has been increasing in the manufacturing industries. In particular, defect inspection is quite important in the semiconductor manufacturing process.
The associate editor coordinating the review of this manuscript and approving it for publication was Li He.
To satisfy this need, several automated inspection systems have been proposed. Image-based, thermography-based, and ultrasound-based inspection systems, each utilized for specific purposes, have been widely applied. Image-based systems perform inspection by applying image processing and computer vision techniques to images. These systems focus on the appearance of the area where a defect exists on visible objects. Thermography-based systems examine defects by analyzing the thermal distribution of objects [1]-[4]. They are applied to objects whose defects are derived from certain thermal characteristics. Ultrasound-based systems, which transmit ultrasonic waves into objects to detect flaws, are mainly used when an inspection of the internal structures of the objects is needed [5]-[7].
FIGURE 1. Examples of periodic texture images for inspection: (a) TFT-LCD [8], (b) wafer [25], and (c) fabric [26].
An image-based inspection system is one of the most utilized approaches and has been widely applied to the inspection of products such as textiles, wafers, and thin film transistor liquid crystal displays (TFT-LCDs). As shown in Figure 1, each of these products has periodic patterns. To detect defects in periodic patterns, various methods utilizing image processing techniques have been introduced: template-based, filter-based, and statistical methods [8]. Template-based methods are simpler than other approaches and compare an input image with its defect-free version, called a golden template [9]-[12]. This approach is only useful when a golden template for the input image can be obtained. Filter-based methods perform convolution with filter banks and detect a defective region by analyzing the filter responses [13]-[15]. In this approach, knowing the structure of both defective and defect-free regions helps to design appropriate filters. Statistical methods inspect defects based on the statistical difference between the defective and defect-free regions [16]-[18]. To discriminate between the two regions, an adequate number of sample images is needed. In recent years, convolutional neural networks (CNNs) have shown remarkable outcomes in various computer vision applications. In particular, several networks including FCN [19], SegNet [20], and AdapNet [21] have demonstrated notable performance for semantic segmentation, and thus, deep learning-based approaches have been widely applied to defect inspection [22]-[24].
Although the field of image-based defect inspection has advanced, some challenging issues still remain: data insufficiency, data imbalance, and annotation cost, to name a few. The first issue is data insufficiency. Several datasets for general object detection and semantic segmentation have been publicly released, such as PASCAL VOC [27], COCO [28], and Cityscapes [29]. Unlike general objects, it is difficult to obtain images that contain defects because most products are not faulty. In modern manufacturing industries employing the Six Sigma methodology, it is common that only three or four defects per million units or events are allowed. It is especially difficult to capture defects in semiconductors because these defects are only several nanometers in size. Furthermore, generating defective products for the purpose of inspection is even more difficult. In addition, in most cases, the inspection results are treated as strictly in-house and confidential. The second issue is data imbalance, which means that data of certain classes are lacking or missing. In general cases of defect segmentation, imbalanced data concern the kinds of defects; however, the imbalance occasionally extends to the existence of defects itself. Because inspection tasks mainly focus on defective products, it is possible that data of normal products will not exist. The last but not least issue is annotation cost. In the field of machine learning, model training often requires labels of data called the ground truth (GT), which are the correct answers for the data (e.g., class, bounding box, and segmentation mask) on the object of interest. In particular, the annotation required for a semantic segmentation task is tedious and time-consuming owing to the pixel-wise labeling.
In this paper, we propose an image-based defect segmentation framework for periodic texture images using GAN-based golden template generation and data augmentation process. The proposed framework generates the golden template of the input image and then segments defects in a pixel-wise manner using simple post-processing. The concept of this framework is inspired by the dissimilarities between the defective and normal regions. Moreover, synthetic defect and non-defect images are generated from a small number of real defect images through the proposed data augmentation process, the volume of which is sufficient to train a model. The process does not apply simple geometric transformations to existing images, such as scaling, translation, and rotation, but generates whole new images.
The main contributions of this paper are:
• We propose a framework using CycleGAN [30] for defect segmentation on periodic textures. We achieve a competitive performance of pixel-wise segmentation compared to supervised learning-based methods while using only class labels. The proposed framework lowers the burden of the annotation cost.
• We propose two data augmentation processes for generating synthetic defect and non-defect images, respectively. The proposed data augmentation process alleviates the data insufficiency and data imbalance problems described earlier.
The rest of this paper is organized as follows. Section II introduces existing image-based defect segmentation methods. Section III describes the proposed defect segmentation framework and data augmentation process. Section IV presents our experimental settings and results. Finally, Section V provides some concluding remarks.

A. TRADITIONAL APPROACHES
As mentioned in the introduction section, various methods based on traditional image processing and computer vision techniques have been reported. Template-based methods utilize defect-free template images for comparison. Khalaj et al. [9] constructed the building block, the structure of repeated patterns, from direct patterned wafers. The repetition periods of the patterns were estimated along the horizontal and vertical directions in the input image. Xie and Guan [10] generated defect-free images using a simulated building block whose size is equal to the horizontal and vertical periods of repeating patterns for patterned wafer inspection. A golden template was built in a manner analogous to that of constructing a building block. Shankar and Zhong [11] introduced a template-based vision system to inspect semiconductor wafer surfaces. In this system, the mean square error (MSE) analysis between the reference circuit image and the test image was performed using a two-dimensional discrete cosine transform (DCT). A rule-based approach for semiconductor defect segmentation was reported in which the segmentation was performed based on the diagnostic rule, similarity rule, and logical rule using an error image [12]. The resulting error image is generated by matching the inspected die image and the golden master (GM) image.
VOLUME 8, 2020
Filter-based methods are based on various types of filters that output different responses in the defective and defect-free regions. Tsai and Wu [13] chose the best parameters of a Gabor filter based on the output response of convolution for each textured surface to deal with unseen defects in the given surface. They demonstrated the effectiveness of their method for both structural and statistical textures. Gabor filter-based supervised and unsupervised approaches for defect detection in textured materials were presented [15]. In the supervised scheme, the best representative Gabor filter was determined based on the filter-selection methodology. For unsupervised defect detection, a multichannel filtering scheme was used and an imaginary Gabor function (IGF) was employed to lower the computational time. Chan and Pang [14] analyzed the frequency spectrum to detect defects in the patterned fabric on the basis that the frequency spectrum will vary when the fabric image has defects. They utilized a fast Fourier transform (FFT) instead of a discrete Fourier transform (DFT) for computational efficiency.
In the statistical methods, defects can be distinguished from flawless regions based on the characteristics of the texture, such as the intensity distribution. Liu et al. [16] introduced an inline defect-detection (IDD) system for TFT-LCD inspection. After some pre-processing to obtain patches for classification, they classified whether a patch is defective or not using locally linear embedding (LLE) and support vector data description (SVDD). To solve the problems in SVDD, automatic target defect identification based on a fuzzy support vector data description (F-SVDD) ensemble was reported [17]. A partitioning-entropy-based kernel fuzzy c-means (KFCM) algorithm was utilized for constructing the F-SVDD ensemble. Yu and Lu [18] developed a wafer map defect detection and recognition method using local and nonlocal preserving projection (LNPP) and joint local and nonlocal linear discriminant analysis (JLNDA). They used several features for representing wafer maps: geometrical features, gray features, texture features, and projection features.
In particular, there were some studies where local binary patterns (LBP) variants were used to detect defective regions. Tajeripour and Fekri-Ershad [31] developed an approach for porosity detection in stone textures using one-dimensional local binary patterns (1DLBP). They divided a stone image into sub-windows and compared a feature vector of each window with that of the porosity-less image. A surface defect detection approach using noise-resistant color local binary patterns (NrCLBP) was presented [32]. They combined feature vectors that have different sizes of neighborhood radius in NrCLBP for multi-resolution analysis. Cao et al. [33] introduced a nickel foam surface defect detection method using multi-scale block local binary patterns (MB-LBP). They utilized a non-subsampled contourlet transform (NSCT) to extract multi-scale texture characteristics.

B. METHODS IN THE ERA OF DEEP NEURAL NETWORKS
After the success of AlexNet [34], several defect inspection methods using deep neural networks (DNNs) have been recently reported. According to the level of supervision for the training data, DNN-based methods can be categorized into three groups: supervised, weakly-supervised, and unsupervised learning.
Ouyang et al. [22] constructed a network called PPAL-CNN, which consists of seven layers for fabric defect detection. In order to localize fine defects and deal with data imbalance, they generated a defect probability map from an input image and utilized it as a dynamic activation layer (PPAL) instead of an activation function. Marino et al. [35] applied class activation mapping (CAM) [36] to potato defect classification and localization. CAM gives a network trained for classification tasks the ability to localize target objects in images by adding a convolutional layer and a global average pooling layer before the last fully-connected layer [36]. They employed several well-known networks such as AlexNet [34], VGGNet [37], and GoogLeNet [38] as backbone networks and modified these networks to extract the CAM results. A network called LEDNet, which is based on CAM for classifying and localizing defects in LED chip images, was presented [39]. In addition, data augmentation was performed randomly for the collected images using geometric transformation techniques including rotation, flipping, translating, noising, and blurring to improve the accuracy. Schlegl et al. [40] introduced AnoGAN to detect lesions in medical images with deep convolutional generative adversarial networks (DCGAN) [41]. They trained a model with only normal data, and thus, the trained network can represent the distribution of normal data. They then compared the input images and the images generated from the latent variable computed by the inverse operation of the generator. When an input image is normal, it is analogous to its generated image. By contrast, a visual difference exists between the input image and the generated image when the input image is anomalous.
Niu et al. [42] presented DefectGAN using CycleGAN [30] for weakly-supervised defect detection. This study is similar to our proposed framework in terms of non-defect image generation using CycleGAN; however, the total loss function used in that study is inadequate for generating a good golden template for periodic patterns. To deal with this problem, we added another term, the identity mapping loss [30], to the total loss function and demonstrated the effectiveness of the additional term. Moreover, we performed data augmentation to solve the problem of data insufficiency, and the generalization performance of the golden template generation is enhanced through our data augmentation scheme.

C. DATA AUGMENTATION FOR IMPROVING PERFORMANCE
The volume and diversity of data are crucial to data-driven approaches such as DNN-based methods. Various data augmentation approaches for improving performance in different tasks have been presented.
Budvytis et al. [43] investigated the effect of video data augmentation for semantic segmentation in driving environments. They increased the segmentation performance of different networks by performing label propagation from coarsely labeled frames to adjacent unlabeled ones. The effectiveness of data augmentation in image classification tasks was demonstrated [44]. Three augmentation approaches including traditional transformations, GANs, and the augmentation network were utilized in this study. Bowles et al. [45] improved segmentation accuracy in medical imaging with augmenting training data. They investigated the performance of segmentation networks trained with different amounts of synthetic data. GAN-based medical image augmentation was performed for liver lesion classification [46]. In this study, two GAN variants were employed to generate synthetic liver lesion images, and the classification performance was improved by using the generated synthetic images. Zhao et al. [47] synthesized labeled medical images for the segmentation task in magnetic resonance imaging (MRI) brain scans. They trained spatial and appearance transform models for generating synthetic images and labels.

III. PROPOSED FRAMEWORK

A. OVERVIEW
Before describing the details of the defect segmentation in periodic texture images, we first describe the overall scheme. Flowcharts of the proposed data augmentation process and the defect segmentation framework are depicted in Figures 2 and 3, respectively.
As shown in Figure 2, the proposed data augmentation process includes the two subordinate procedures for synthetic image generation: defect and non-defect. By using DCGAN [41] and CycleGAN [30], we create synthetic defect images whose volume is sufficient for training a network. PatchMatch [48] and periodic spatial generative adversarial network (PSGAN) [49] are utilized to create synthetic non-defect images. The proposed data augmentation allows the golden template generator to produce more plausible results. This suggests that our data augmentation scheme improves the generalization performance of the golden template generation.
The proposed defect segmentation framework consists of the golden template generation and post-processing, as illustrated in Figure 3. The golden template generation is performed using CycleGAN, which makes a defect-free version of the input image. After the golden template is generated, straightforward image processing techniques are applied to detect defects. To make a golden template for periodic patterns, we employ an additional loss term, the identity mapping loss [30]. Although this loss is often auxiliary in other CycleGAN applications, in this work it is crucial to the golden template generation because the periodicity of the pattern must be preserved.

B. SYNTHETIC DEFECT IMAGE GENERATION
Synthetic defect images are generated from real defect images through DCGAN [41] and CycleGAN [30]. DCGAN is utilized to make synthetic defect images; however, their resolution is too low for them to be used as training samples for golden template generation. In addition, applying a naive scaling method to the images is inadequate. To solve this problem, image-to-image translation using CycleGAN is carried out for super-resolution.
In the training phase, each of the two networks is trained for their own purposes. In the generating phase, synthetic defect images are created through the trained networks.
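DCGAN consists of two adversarial modules, a generator $G$ and a discriminator $D$, which are trained through a min-max game with the loss function $V(G, D)$:

$$\min_G \max_D V(G, D) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))] \tag{1}$$

where $x$ is the input data for a discriminator $D$, and $z$ is a latent variable for a generator $G$. In this work, a generator that creates fake defect images indistinguishable from real ones is learned when given the actual defect images as training samples. Hereafter, this network is called $N_{FD}$.

CycleGAN for super-resolution learns the two mapping functions $G_H : X_L \to X_H$ and $G_L : X_H \to X_L$, where $X_L$ is the low-resolution domain of the defect and $X_H$ is the high-resolution domain of the defect. The two adversarial discriminators, $D_{X_H}$ and $D_{X_L}$, each aim to distinguish the data of their own domain from the data mapped from the other domain. The loss functions that make up the total loss are expressed as:

$$L_{GAN}(G_H, D_{X_H}, X_L, X_H) = \mathbb{E}_{x_H \sim X_H}[\log D_{X_H}(x_H)] + \mathbb{E}_{x_L \sim X_L}[\log(1 - D_{X_H}(G_H(x_L)))] \tag{2}$$

$$L_{cyc}(G_H, G_L) = \mathbb{E}_{x_L \sim X_L}[\lVert G_L(G_H(x_L)) - x_L \rVert_1] + \mathbb{E}_{x_H \sim X_H}[\lVert G_H(G_L(x_H)) - x_H \rVert_1] \tag{3}$$

$$L_{idt}(G_H, G_L) = \mathbb{E}_{x_H \sim X_H}[\lVert G_H(x_H) - x_H \rVert_1] + \mathbb{E}_{x_L \sim X_L}[\lVert G_L(x_L) - x_L \rVert_1] \tag{4}$$

Our total loss function used to train a model for super-resolution is:

$$L(G_H, G_L, D_{X_H}, D_{X_L}) = L_{GAN}(G_H, D_{X_H}, X_L, X_H) + L_{GAN}(G_L, D_{X_L}, X_H, X_L) + \lambda_{cyc} L_{cyc}(G_H, G_L) + \lambda_{idt} L_{idt}(G_H, G_L) \tag{5}$$

where $L_{GAN}$, $L_{cyc}$, and $L_{idt}$ are the adversarial loss, cycle consistency loss, and identity mapping loss, respectively. $\lambda_{cyc}$ and $\lambda_{idt}$ are used to control the impact of $L_{cyc}$ and $L_{idt}$, respectively. The network for super-resolution will be termed $N_{SR}$ hereafter.

After the two networks are trained, the images produced by $N_{FD}$, which is trained to create fake defect images, are fed into $N_{SR}$, which is trained for super-resolution. We use the resulting images of $N_{SR}$ as our synthetic defect images for training the golden template generator.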
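The way the cycle consistency and identity mapping terms enter the total loss can be illustrated with a minimal NumPy sketch. This is not the authors' training code: the helper names (`l1`, `cycle_loss`, `identity_loss`, `total_loss`) are hypothetical, the generators are passed in as plain callables, the adversarial term is supplied as a precomputed number, and the mean absolute error is used as a stand-in for the expected L1 norm.

```python
import numpy as np

def l1(a, b):
    """Mean absolute error, used here as a stand-in for the expected L1 norm."""
    return float(np.mean(np.abs(a - b)))

def cycle_loss(x_low, x_high, g_h, g_l):
    """Cycle consistency: translating to the other domain and back should recover the input."""
    return l1(g_l(g_h(x_low)), x_low) + l1(g_h(g_l(x_high)), x_high)

def identity_loss(x_low, x_high, g_h, g_l):
    """Identity mapping: a generator fed an image of its own target domain should not change it."""
    return l1(g_h(x_high), x_high) + l1(g_l(x_low), x_low)

def total_loss(loss_gan, x_low, x_high, g_h, g_l, lambda_cyc=10.0, lambda_idt=0.5):
    """Weighted sum of the adversarial, cycle consistency, and identity mapping terms.

    loss_gan is assumed to be computed elsewhere (it needs the discriminators).
    The default weights are the values reported for the super-resolution network.
    """
    return (loss_gan
            + lambda_cyc * cycle_loss(x_low, x_high, g_h, g_l)
            + lambda_idt * identity_loss(x_low, x_high, g_h, g_l))
```

With toy generators such as `g_h = lambda x: x + 0.1` and `g_l = lambda x: x - 0.1`, the cycle term vanishes while the identity term does not, which shows that the identity mapping loss penalizes changes that cycle consistency alone cannot see.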

C. SYNTHETIC NON-DEFECT IMAGE GENERATION
To train CycleGAN for golden template generation, images from two domains are required. For this reason, PatchMatch [48] and PSGAN [49] are employed to perform synthetic non-defect image generation. To deal with a situation in which real non-defect images cannot be acquired, artificial defect-removed images are generated using PatchMatch. Periodic textures are then synthesized from the defect-removed images using PSGAN.
In PatchMatch, a nearest-neighbor field (NNF) is initialized as patches, which are at the uniformly random offset f (x, y) across the whole image. Based on the patch distance D(v) between the patch at (x, y) in an image and the patch at (x, y) + v in the other image, the offset f (x, y) is propagated. On odd iterations, f (x, y) is changed into a value that minimizes {D(f (x, y)), D(f (x − 1, y)), D(f (x, y − 1))}. Furthermore, this propagation is performed in the opposite direction using f (x + 1, y) and f (x, y + 1) on even iterations.
After the propagation, the offset $v_0 = f(x, y)$ is tested against different candidate offsets to avoid convergence to local minima. The candidate offsets lie at an exponentially decreasing distance from $v_0$:

$$u_i = v_0 + w \alpha^i R_i \tag{6}$$

where $w$ is the maximum search radius, which is initially set to the maximum image dimension, $\alpha$ is the decaying parameter for reducing the search window sizes, and $R_i$ is a random value in $[-1, 1] \times [-1, 1]$. This random search finishes when $w\alpha^i$ is less than 1 pixel.

PSGAN is based on the DCGAN architecture; however, it is composed solely of convolutional layers. In addition, the generator $G$ is extended in a two-dimensional spatial domain to map a latent variable $Z \in \mathbb{R}^{L \times M \times d}$ to an image $X \in \mathbb{R}^{H \times W \times C}$. A latent variable $Z$ consists of three sections: a local independent part $Z^l$, a spatially global part $Z^g$, and a periodic part $Z^p$. The channel dimension of $Z$, $d$, is the sum of the channel dimensions of the three parts, $d_l$, $d_g$, and $d_p$. In accordance with the extension of the generator $G$, the discriminator $D$ outputs an $L \times M$ field from an image $X$.
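The PatchMatch random-search step described earlier in this subsection can be sketched as follows. The function name `random_search_offsets` is hypothetical, and the default `alpha = 0.5` is an assumption, since the text does not state the value of the decaying parameter. The sketch only generates the candidate offsets; evaluating the patch distance for each candidate and keeping the best one is left out.

```python
import numpy as np

def random_search_offsets(v0, w, alpha=0.5, rng=None):
    """Generate candidate offsets u_i = v0 + w * alpha**i * R_i.

    v0    : current best offset (dx, dy)
    w     : maximum search radius (e.g., the maximum image dimension)
    alpha : decay factor of the search radius (0.5 is an assumed default)
    Sampling stops once the radius w * alpha**i drops below one pixel.
    """
    rng = np.random.default_rng() if rng is None else rng
    v0 = np.asarray(v0, dtype=float)
    candidates = []
    i = 0
    radius = float(w)
    while radius >= 1.0:
        r_i = rng.uniform(-1.0, 1.0, size=2)  # R_i drawn from [-1, 1] x [-1, 1]
        candidates.append(v0 + radius * r_i)
        i += 1
        radius = w * alpha ** i
    return np.array(candidates)
```

For a 256-pixel search radius and `alpha = 0.5`, the radii are 256, 128, ..., 1, so nine candidates are tested per pixel and iteration.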
In accordance with the changes to the generator and discriminator in PSGAN, the standard GAN loss function is also altered as:

$$V(G, D) = \frac{1}{LM} \sum_{i=1}^{L} \sum_{j=1}^{M} \mathbb{E}_{Z \sim p_Z(Z)}[\log(1 - D_{ij}(G(Z)))] + \frac{1}{LM} \sum_{i=1}^{L} \sum_{j=1}^{M} \mathbb{E}_{X' \sim p_{data}(X)}[\log D_{ij}(X')] \tag{7}$$

where $D_{ij}(X')$ is the discriminator at $(i, j)$, $1 \le i \le L$ and $1 \le j \le M$, for a local part $X'$ of an input image $X$. After a network for texture synthesis is trained, the enlarged synthesis results are randomly cropped. The resulting images of the random cropping are used as our synthetic non-defect images for the training of a golden template generator. Henceforth, the network for texture synthesis will be called $N_{TS}$.

D. GOLDEN TEMPLATE GENERATION
For golden template generation, CycleGAN learns the two mapping functions $G_N : X \to Y$ and $G_D : Y \to X$ between the defect domain $X$ and the non-defect domain $Y$. To train a network for golden template generation, we used the same form of total loss function as in Equation (5). The total loss function for a golden template generator is expressed as:

$$L(G_N, G_D, D_X, D_Y) = L_{GAN}(G_N, D_Y, X, Y) + L_{GAN}(G_D, D_X, Y, X) + \lambda_{cyc} L_{cyc}(G_N, G_D) + \lambda_{idt} L_{idt}(G_N, G_D) \tag{8}$$

where $D_X$ and $D_Y$ denote the discriminators of the two domains, and $L_{GAN}$, $L_{cyc}$, and $L_{idt}$ are calculated using Equations (2) to (4). Hereafter, the network for golden template generation is called $N_{GT}$.

We employ the identity mapping loss [30], which allows a model to preserve the color composition after translation, as the additional term in the total loss function. Although the loss term is not commonly used in other applications, we found that it helps to preserve the periodicity of patterns. When the coefficient of the identity mapping loss is zero, flawless regions are slightly altered. This phenomenon also occurs when $N_{SR}$ is trained without $L_{idt}$ in Equation (5). Our goal is to change only the defective region while leaving the defect-free region unaltered, which is the reason for using the identity mapping loss.

E. DEFECT SEGMENTATION
After the golden template of the input image is obtained, simple image processing techniques are applied for detecting defects. To measure the similarity between the input image and its golden template, the patch-wise sum of the absolute difference (pSAD) is calculated as:

$$pSAD(i, j) = \sum_{m=i-W}^{i+W} \sum_{n=j-H}^{j+H} \lvert I_D(m, n) - I_G(m, n) \rvert \tag{9}$$

where $(i, j)$ is the center of a patch ranging from $(i - W, j - H)$ to $(i + W, j + H)$. $I_D$ and $I_G$ denote the input defect image and the golden template of the input image created by the golden template generator, respectively. In general, defective regions are quite different from their golden templates. On the contrary, defect-free regions are extremely similar to the templates. Therefore, the values in the pSAD results are usually larger in the defective region than in the flawless region. Defects are then segmented by applying hysteresis thresholding to the pSAD results.
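The post-processing described above can be sketched in NumPy as follows. This is an illustrative sketch rather than the authors' implementation: the function names are hypothetical, the image border is zero-padded (the text does not specify border handling), and 8-connectivity is assumed for the hysteresis step.

```python
import numpy as np

def psad(defect_img, golden_img, half_w=2, half_h=2):
    """Patch-wise sum of absolute differences over a (2*half_h+1) x (2*half_w+1) window."""
    diff = np.abs(defect_img.astype(float) - golden_img.astype(float))
    # Zero-pad so that every pixel has a full window (border handling is an assumption).
    padded = np.pad(diff, ((half_h, half_h), (half_w, half_w)))
    # Integral image with a leading zero row and column gives O(1) window sums.
    integral = np.pad(padded, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)
    kh, kw = 2 * half_h + 1, 2 * half_w + 1
    return (integral[kh:, kw:] - integral[:-kh, kw:]
            - integral[kh:, :-kw] + integral[:-kh, :-kw])

def hysteresis_threshold(score, t_low, t_high):
    """Keep pixels above t_low that are 8-connected to at least one pixel above t_high."""
    weak = score > t_low
    mask = score > t_high
    while True:
        padded = np.pad(mask, 1)
        grown = np.zeros_like(mask)
        # Union of the mask shifted to all nine neighbor positions (3x3 dilation).
        for dy in (0, 1, 2):
            for dx in (0, 1, 2):
                grown |= padded[dy:dy + mask.shape[0], dx:dx + mask.shape[1]]
        grown &= weak
        if np.array_equal(grown, mask):
            return mask
        mask = grown
```

Because the thresholds reported later lie between 0 and 1, one plausible usage is to normalize the pSAD map first, for example `hysteresis_threshold(score / score.max(), t_low, t_high)`; this normalization is an assumption and is not stated in the text.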

IV. EXPERIMENTS

A. IMPLEMENTATION DETAILS
We utilized several networks to verify our framework. In our experiments, all the network training and testing were performed on an AMD Ryzen 7 2700X CPU and an NVIDIA RTX 2080Ti GPU using CUDA 10.0.
Synthetic Defect Image Generation
First, we trained $N_{FD}$ on defect images of 64 × 64 resolution and employed the architecture introduced by Radford et al. [41]. The network was trained for 2500 epochs and the dimension of the latent variable was set to 100. Second, we trained $N_{SR}$ on low-resolution defect images and high-resolution defect images of 256 × 256 resolution with the architecture presented by Zhu et al. [30]. The 64 × 64 low-resolution images were resized to 256 × 256 for training. We trained the network for 200 epochs, keeping the learning rate constant for the first 100 epochs and linearly decaying it to zero over the next 100 epochs. The coefficients $\lambda_{cyc}$ and $\lambda_{idt}$ in Equation (5) were 10 and 0.5, respectively.
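The learning-rate schedule used for the super-resolution network can be written as a small function. This is a sketch under assumptions: the function name is hypothetical, epochs are counted from zero, and the initial rate of 2e-4 is only a placeholder because the text does not report it.

```python
def learning_rate(epoch, initial_lr=2e-4, constant_epochs=100, decay_epochs=100):
    """Constant rate for the first phase, then linear decay to zero.

    epoch is zero-based; initial_lr is an assumed placeholder value.
    """
    if epoch < constant_epochs:
        return initial_lr
    remaining = constant_epochs + decay_epochs - epoch
    return initial_lr * max(remaining, 0) / decay_epochs
```

Under these assumptions the rate stays at its initial value through epoch 100, reaches half of it at epoch 150, and is zero from epoch 200 onward.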
Synthetic Non-defect Image Generation
To obtain defect-removed images from real defect images, we performed PatchMatch. The defective region was located manually and re-drawn through the approximate nearest-neighbor algorithm. For texture synthesis, $N_{TS}$ was trained on images of 160 × 160 resolution with the architecture introduced by Bergmann et al. [49]. The network was trained for 100 epochs to generate fake texture images from the latent variables. For the channel dimension of the latent variables, we set the dimensions of the three parts as $d_l = 10$, $d_g = 0$, and $d_p = 2$. With the trained network, we produced synthesized texture images of 640 × 640 resolution and then randomly cropped the images to obtain small texture patches.

Golden Template Generation
To perform golden template generation, we trained $N_{GT}$ on defect images of 256 × 256 resolution and non-defect images of 256 × 256 resolution with the same architecture as that used for super-resolution in synthetic defect image generation. In addition to random flipping in the training phase, random rotation with a maximum of ±5° was performed to improve the generalization performance of the representation. The coefficients $\lambda_{cyc}$ and $\lambda_{idt}$ in Equation (8) were both 10.
Defect Segmentation
We chose the parameters of the patch size and thresholding values empirically. The pSAD results were calculated with a patch whose size ranges from 5 × 5 to 21 × 21. Hysteresis thresholding was performed using an upper bound $T_u$ and a lower bound $T_l$. The upper bound $T_u$ and the lower bound $T_l$ were multiples of 0.02 with a constraint that $T_u$ should be between 0.5 and 0.98, and that $T_l$ should be lower than $T_u$.
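The search space implied by these constraints can be enumerated with a short sketch. Two details are assumptions not stated in the text: the lower bound is taken to start at 0.02, and only odd, square patch sizes between 5 and 21 are considered. The names `threshold_grid` and `patch_sizes` are hypothetical.

```python
def threshold_grid(step=2, t_u_min=50, t_u_max=98):
    """Enumerate (T_l, T_u) pairs, working in hundredths to avoid float drift.

    T_u runs over multiples of 0.02 in [0.50, 0.98];
    T_l runs over multiples of 0.02 that are strictly below T_u (assumed to start at 0.02).
    """
    pairs = []
    for t_u in range(t_u_min, t_u_max + 1, step):
        for t_l in range(step, t_u, step):
            pairs.append((t_l / 100, t_u / 100))
    return pairs

# Odd patch side lengths from 5 to 21 (odd sizes are an assumption).
patch_sizes = list(range(5, 22, 2))
```

Under these assumptions there are 25 admissible values of the upper bound and 900 threshold pairs in total, each of which can be combined with any of the nine patch sizes.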

B. DATASET
We experimented with three periodic textures. One consists of images of defects in semiconductor wafers, and the others consist of images of defects in textiles. Each of them has distinctive defects and periodic textures.
Our Dataset
To demonstrate our proposed framework, we experimented with our private dataset. This dataset covers defects occurring in semiconductors, the images of which were captured by a scanning electron microscope (SEM). There are seven types of defects in the dataset, and sample images of our dataset and their descriptions are shown in Table 1. The dataset contains 264 grayscale defect images of 480 × 480 resolution with periodic textures; however, there are no non-defect images.
We resized the original images to 256 × 256 resolution and used the resized images for the experiments. For the golden template generation and data augmentation process, 200 images were randomly selected as the training data. The performance on this dataset was evaluated with the remaining images.
Because our dataset does not include non-defect images, we used some non-defect images acquired at different scales to construct the real subset. By utilizing the scale information in the defect and non-defect images, we resized the non-defect images to make the textures in the non-defect images similar to the patterns of the defect images. Then, we randomly cropped the resized non-defect images to the size of the real defect images, as shown in Figure 4. Consequently, we obtained 200 non-defect images with this modification scheme. As shown in Table 2, the real subset includes 200 real defect and non-defect images. The real+syn subset contains 200 real defect and synthetic non-defect images. The syn subset consists of 10,000 synthetic defect and non-defect images.
TILDA
In addition, we utilized a public dataset, TILDA textile texture-database [50], to apply our framework to other periodic textures. This is a dataset of defects in textiles, and there are eight types of fabric. Among the textiles, we used the subsets, {C3R1, C3R3}, which have periodic structures. In the subcategories of the two subsets, we used {E1, E2, E3, E4} as target defects and {E0} as defect-free textures. Each subcategory contains 50 grayscale images of 768 × 512 resolution and the sample images of the subcategories in the two subsets are shown in Figures 5 and 6.
To adapt the data to the proposed framework, the original non-defect images were divided into six patches of size 256 × 256 without overlap. For defect images, we first manually labeled defects in a pixel-wise manner. Based on the annotation results, we set the smallest ROI that covers all the defective regions in the image. The ROI is constrained such that its center coincides with that of the defects and its width and height are multiples of 256. With this constraint, we divided the ROI into patches of size 256 × 256. Among the acquired patches, those that contain fewer than 100 defect pixels were discarded. As shown in Figure 7, the patches covered with the blue bounding boxes were obtained as defect images, whereas the regions in the red boxes were not used. Accordingly, we obtained 397 and 323 defect patches in {C3R1, C3R3}, respectively. To train networks for the golden template generation and data augmentation process, 300 and 250 images were arbitrarily chosen, respectively. The other images were used as test data.
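The patch extraction described above can be sketched as follows. The function names are hypothetical, the defect center is taken as the center of the defect bounding box, and an ROI that would extend past the image border is shifted back inside the image; the last two points are assumptions, since the text does not say how these cases were handled. The image sides are assumed to be at least as large as the ROI.

```python
import numpy as np

PATCH = 256

def tile(image, patch=PATCH):
    """Split an image into non-overlapping patch x patch tiles in row-major order."""
    h, w = image.shape[:2]
    return [image[r:r + patch, c:c + patch]
            for r in range(0, h - patch + 1, patch)
            for c in range(0, w - patch + 1, patch)]

def defect_patches(image, mask, patch=PATCH, min_defect_pixels=100):
    """Cut a defect-centered ROI into patches and keep those with enough defect pixels."""
    rows, cols = np.nonzero(mask)
    top, bottom = rows.min(), rows.max() + 1
    left, right = cols.min(), cols.max() + 1
    # Smallest ROI whose sides are multiples of the patch size.
    roi_h = int(np.ceil((bottom - top) / patch)) * patch
    roi_w = int(np.ceil((right - left) / patch)) * patch
    # Center the ROI on the defect bounding box, then shift it inside the image (assumption).
    h, w = mask.shape
    r0 = int(round((top + bottom) / 2 - roi_h / 2))
    c0 = int(round((left + right) / 2 - roi_w / 2))
    r0 = min(max(r0, 0), h - roi_h)
    c0 = min(max(c0, 0), w - roi_w)
    kept = []
    for r in range(r0, r0 + roi_h, patch):
        for c in range(c0, c0 + roi_w, patch):
            if mask[r:r + patch, c:c + patch].sum() >= min_defect_pixels:
                kept.append(image[r:r + patch, c:c + patch])
    return kept
```

Applied to a 768 × 512 non-defect image, `tile` returns the six patches mentioned in the text, while `defect_patches` drops any ROI patch that holds fewer than 100 labeled defect pixels.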

C. QUALITATIVE RESULTS OF THE PROPOSED FRAMEWORK
To verify the effectiveness of our data augmentation process and the identity mapping loss, we trained the network for golden template generation with the three subsets {real, real+syn, and syn} of our dataset. Through our data augmentation process, we generated synthetic defect images and synthetic non-defect images using our dataset, as shown in Figures 8 and 10. The resulting non-defect images appeared almost identical to the golden templates of the real defect images; however, the generated defect images did not look entirely like the real ones. Nonetheless, we achieved remarkable results with them. With many synthetic images, the network for golden template generation could learn a robust mapping from various defective regions to defect-free textures.
We acquired the most reasonable results for the test images when the network for golden template generation was trained on the syn subset, as shown in Figure 9. The defective regions in the input image were converted into normal textures, and it was difficult to tell where the original defective region had been. When the network was trained on the real+syn subset, the defect-free regions in the golden template were very similar to those in the input image. By contrast, the regions in the golden template where defects had existed were slightly different from the defect-free regions. As shown in Figure 11, our framework generated clean defect-free images from the input defective images. The post-processing for defect segmentation might seem superfluous; however, it handles the noise caused by large intensity changes in normal pixels. In addition, despite the different intensity changes of the pixels within a defective region, we detected the region as a single defect. With this step, we achieved better performance.
As shown in Figure 12, the identity mapping loss has a great effect on the golden template generation. The golden template generator trained without the identity mapping loss failed to preserve the periodicity of the textures, so the difference values in the defect-free regions were as large as those in the defective regions. By contrast, the generator trained with the loss preserved the periodicity while removing defects from the images.
To analyze these results numerically, we used the region-level labels to compute the average values of the absolute difference images over the defective and the defect-free regions, as shown in Figures 13 and 14. Because the non-defect images in the real subset are quite different from the defect-free regions in the real defect images, the average value of the defect-free region was the largest for the real subset among the three subsets. The network trained on the syn subset reduced the average value of the absolute difference in the defect-free region. This suggests that our data augmentation process produced more realistic images. It was difficult to discriminate between the defective regions and the defect-free regions in the absolute difference images when the identity mapping loss was not used; however, using the loss made the gap between the average values of the two regions clear.
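The region-wise averages used in this analysis can be sketched as below. This is an illustrative snippet under stated assumptions: `region_means` and its argument names are hypothetical, the images are single-channel arrays of equal shape, and the region-level label is a boolean mask that is true on defective pixels.

```python
import numpy as np


def region_means(input_image, golden_template, defect_mask):
    """Mean absolute difference inside and outside the labeled defect region."""
    # Cast to float first so that unsigned 8-bit images do not wrap around.
    diff = np.abs(input_image.astype(np.float64)
                  - golden_template.astype(np.float64))
    mask = defect_mask.astype(bool)
    defect_mean = float(diff[mask].mean()) if mask.any() else float("nan")
    normal_mean = float(diff[~mask].mean()) if (~mask).any() else float("nan")
    return defect_mean, normal_mean
```

A large gap between the two returned values corresponds to the situation described for the generator trained with the identity mapping loss; similar values correspond to the case in which the two regions are hard to separate.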
A closely related method, DefectGAN [42], does not include the identity mapping loss in the total loss function. The model trained without this loss created textures that were shifted from those in the input image, which could cause defect-free regions to be segmented as defects. In addition, with our data augmentation process, the regions where defects existed in the input image could be turned into textures more similar to those of the defect-free regions. For these reasons, our framework differs from DefectGAN in segmenting defects on periodic textures.
Additionally, we applied our framework to the TILDA dataset and adjusted the training scheme and hyperparameters of some of the data augmentation networks for this dataset. Because the directions of the patterns in the subsets are different, the two networks, N_FD and N_TS, for synthetic defect and non-defect image generation were trained separately for each pattern direction. We generated the same number of synthetic defect and non-defect images as for our dataset.
As shown in Figures 16 and 18, our framework produced decent golden templates for the input defect images; however, a slight vestige of the defective region remained in the generated golden templates.

D. QUANTITATIVE EVALUATION OF THE PROPOSED FRAMEWORK
To demonstrate the competitiveness of our framework in defect segmentation, we compared it with three methods: CAM [36], FCN [19], and AdapNet++ [51].
The two supervised semantic segmentation networks were trained on the real subset of our dataset. With the real defect images and their region-level labels, FCN and AdapNet++ were trained for 200 and 100 epochs, respectively. Our framework and CAM were trained on the real, real+syn, and syn subsets of our dataset with only image-level labels. They were trained for 200 epochs, except that our framework learned the two mapping functions between the defect and non-defect domains on the syn subset for 50 epochs. In the training and testing phases of the CAM, several backbone networks were employed: DenseNet [52], ResNet [53], and SqueezeNet [54]. The segmentation masks of the CAM were obtained by applying the same thresholding scheme as in our framework to the resulting heatmaps of the CAM. We selected the thresholding parameters that achieved the best performance of the CAM.
As shown in Figure 19, our framework produced remarkable segmentation results for various defects. These outcomes were comparable to those of the two supervised methods. While the CAMs of the three backbone networks localized defects, our framework segmented the defective region more accurately. Specifically, our framework showed better segmentation results than the three CAMs for relatively small defects. These results could be due to the downsampling in the backbone networks of the CAM: the spatial resolution of the feature map was reduced through pooling layers or strided convolutions, and the resulting heatmap was enlarged to the original input size.
We adopted the intersection over union (IoU) to evaluate the segmentation performance [27]. The IoU of the two regions, R_p and R_gt, is expressed as:
$$\mathrm{IoU}(R_p, R_{gt}) = \frac{\mathrm{area}(R_p \cap R_{gt})}{\mathrm{area}(R_p \cup R_{gt})}$$
and the mean IoU (mIoU) is calculated as:
$$\mathrm{mIoU} = \frac{1}{N} \sum_{i=1}^{N} \mathrm{IoU}\left(R_p^{(i)}, R_{gt}^{(i)}\right)$$
where area(R_p ∪ R_gt) denotes the union of the predicted segmentation mask and the ground truth, area(R_p ∩ R_gt) indicates their intersection, the superscript (i) indexes the test images, and N is the number of test data. In this work, the mIoU was measured with only the defective region. As shown in Table 4, the performance of our framework was enhanced with the proposed data augmentation process. Moreover, our framework outperformed the CAMs by a large margin. The two supervised segmentation methods still performed better than our framework. However, given the qualitative segmentation results, our framework achieved a decent defect segmentation performance. Above all, our framework produced competitive segmentation results without pixel-wise labeling.
In addition, we compared the inference time of the proposed framework with those of the two segmentation networks. As shown in Table 5, our framework showed the fastest inference time. In our framework, the inference times of the golden template generation and the defect segmentation were about 16 and 2 ms, respectively.
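The IoU and mIoU described above can be computed from binary masks as in the following sketch. The function names are hypothetical, each mask is assumed to mark only the defective region, and the convention that two empty masks give an IoU of 1 is an assumption that the paper does not state.

```python
import numpy as np


def iou(pred_mask, gt_mask):
    """Intersection over union of two binary masks of equal shape."""
    pred = np.asarray(pred_mask, dtype=bool)
    gt = np.asarray(gt_mask, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        # Assumption: both masks empty is treated as a perfect match.
        return 1.0
    return float(np.logical_and(pred, gt).sum() / union)


def mean_iou(pred_masks, gt_masks):
    """Average IoU over the N test images."""
    scores = [iou(p, g) for p, g in zip(pred_masks, gt_masks)]
    return float(np.mean(scores))
```

For example, a prediction covering two pixels of which one overlaps a one-pixel ground truth has an IoU of 0.5; averaging it with a perfect prediction gives an mIoU of 0.75.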

E. DISCUSSION
There were a few limitations and failure cases of our framework, as shown in Figure 20.
Because pSAD was applied to deal with noise in the absolute difference images between the input image and its golden template, our framework produced less detailed results than the supervised methods. Hysteresis thresholding was employed to determine defective regions; however, the number and the area of the detected defects depended heavily on the two thresholding parameters. When a small foreign body lay between line patterns, the segmentation result of our framework was wider than the actual region. When some line patterns were broken or bridged, a few defective regions were not detected because the intensities of these regions hardly changed.
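The dependence on the two thresholding parameters can be illustrated with a generic hysteresis thresholding routine. This is a textbook sketch, not the authors' implementation: it operates on an arbitrary 2-D difference map (pSAD is not reproduced here), the parameter names `low` and `high` are hypothetical, and 8-connectivity is an assumption.

```python
import numpy as np


def hysteresis_threshold(diff, low, high):
    """Keep pixels >= low that are connected to at least one pixel >= high."""
    weak = diff >= low
    strong = diff >= high
    out = np.zeros(diff.shape, dtype=bool)
    h, w = diff.shape
    # Grow regions from the strong pixels through neighboring weak pixels.
    stack = [tuple(p) for p in np.argwhere(strong)]
    while stack:
        r, c = stack.pop()
        if out[r, c]:
            continue
        out[r, c] = True
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                rr, cc = r + dr, c + dc
                if 0 <= rr < h and 0 <= cc < w and weak[rr, cc] and not out[rr, cc]:
                    stack.append((rr, cc))
    return out
```

Raising `low` shrinks each detected region, whereas raising `high` removes whole regions that no longer contain a strong pixel, which is consistent with the sensitivity noted above.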
The TILDA dataset is more challenging than our semiconductor dataset because the textures of the two subsets are less strictly periodic and more complex than those of our dataset. In particular, the directions of the patterns in the original images differ from one another. Furthermore, some defects are indistinct and hard to tell apart from defect-free textures. The qualitative results of the golden template generation appeared decent to the naked eye; however, the segmentation results were not good. This was because the intensities in the defective regions were similar to those in the defect-free textures. In addition, the structures of the defects in the two subsets are more complex than those of the defects in our dataset, and the detailed shapes of defects and their surrounding textures could not be formed in the synthetic defect generation.
Particularly, the patterns in the C3R1 subset have both global and local periodicity. This means that the images in the C3R1 subset have check patterns globally and line patterns locally. Unfortunately, it was difficult to reproduce these distinctive textures in the synthetic defect images. For these reasons, the quantitative performance on the TILDA dataset was not good, as shown in Tables 6 and 7.

V. CONCLUSION
In this paper, we proposed a weakly-supervised defect segmentation framework for periodic textures and a data augmentation process applicable to our framework. We generated a golden template from an input defect image and segmented the defective region by applying straightforward post-processing to the two images. Furthermore, we found that the identity mapping loss is crucial to the golden template generation of defect images with periodic textures. As a result, we localized the defects in a pixel-wise manner without region-level labeling. Through the proposed augmentation process, we created synthetic defect and non-defect images from only real defect images. With the augmented data, the golden template generator produced more plausible results, and the segmentation performance of our framework was enhanced. The proposed framework was qualitatively and quantitatively compared to other defect segmentation methods on periodic texture images with various defects. The experimental results suggest that the proposed framework outperformed the CAM-based method and showed results comparable to those of the supervised segmentation methods on strictly periodic textures.
In future work, we plan to simplify the whole proposed framework and develop the data augmentation process to make more realistic images. Since the difference of intensities showed a limitation for segmenting defective regions, we plan to develop more in-depth post-processing. Particularly, we will try to improve the quality of the golden template generation for loosely periodic textures and to segment the detailed structure of defects.