Image Demoireing Via U-Net for Detection of Display Defects

Mura defects, which occur during display manufacturing, degrade the quality of the display; therefore, Mura detection is critical. When the camera is focused on the display to accurately detect Mura defects, a moire pattern appears in the captured image because of the frequency difference between the subpixels of the display and the color filter array of the camera. The image data handled by existing demoireing methods do not contain Mura defects and include synthetic moire images. Therefore, we created a dataset for detecting Mura defects that includes real moire patterns, classified into two categories: weak and strong. We propose a new demoireing framework to remove the moire patterns in the captured image, thereby accurately detecting Mura defects. We also propose inserting ArUco markers for accurate alignment and automation, and we conducted multiple experiments with U-Net. On the captured data, the proposed U-Net, which combines a frequency loss and data augmentation, achieves a peak signal-to-noise ratio 6.41 dB higher for the weak moire pattern and 4.14 dB higher for the strong moire pattern than state-of-the-art networks.


I. INTRODUCTION
During display manufacturing, defects, including poor brightness, low contrast, and saturation, occur due to process flaws, dust, and manufacturer mistakes. Stain defects caused by defective pixels are called Mura defects. Because Mura defects degrade the quality of the display, a process to inspect for them is essential. Because of differences in judgment, skill, and fatigue, human inspectors cannot examine Mura defects consistently, and their measurements are subjective. Therefore, to measure Mura defects objectively and quantitatively, the display is captured with a camera, automating the inspection. When the camera is focused on the display to detect Mura defects accurately, an aliasing phenomenon occurs between the subpixels of the display and the camera color filter array, producing irregular or amorphous moire patterns in the captured images and degrading image quality. Although the moire pattern can be suppressed by adjusting the focus or adding a filter in front of the camera lens, accurately detecting Mura defects then becomes difficult because the image is blurred or over-smoothed. Consequently, a moire pattern is inevitable in an image captured by a camera when detecting Mura defects, and the moire pattern must be removed.
Various methods have been proposed to remove moire patterns [1], [13], [17], [18], [21], [22]. The TIP 2018 dataset [13], used predominantly by existing methods, was captured based on ImageNet [14] and does not include Mura defects, and the moire patterns of the AIM2019 dataset [23] are synthetic, not real. Therefore, general image data are not suitable for our task. We address the limitations of these datasets by creating a real moire dataset with Mura defects. To the best of our knowledge, this is the first real moire dataset relevant to detecting Mura defects in displays. The position and angle of the camera relative to the display must vary to diversify the dataset's moire patterns. However, to fairly compare the captured image and the ground truth (GT) image, aligning the two images is essential. Because manual alignment is less accurate and requires significant time and money, an automated algorithm that can align the two images is required.
The TIP 2018 dataset created corners by masking the image in black and aligned the images with corner detection. However, the masking method hinders the detection of Mura defects because it covers a large part of the display. Therefore, we propose inserting ArUco markers as a new alignment method. We also propose a novel framework for a demoireing network to capture Mura defects accurately. As depicted in Fig. 1(a), our framework is divided into a capture process and a training process, and the capture process is depicted in Fig. 1(b). The capture process consists of inserting an ArUco marker for alignment; camera settings, including white balance and exposure adjustment; and alignment and cropping of the captured image. The training process consists of augmentation, the model, and loss calculation. We compared various networks [13], [18], [21], [24] for moire pattern removal. As depicted in Fig. 2, the moire pattern is more pronounced in the frequency domain, and removing the moire pattern using a frequency transform is a recent research trend. Therefore, we calculated the frequency loss directly in the fast Fourier transform (FFT) domain. To the best of our knowledge, this is the first time an FFT loss has been used for moire pattern removal. We experimented with an augmentation to reduce capturing errors in the dataset, a bit-depth variation for capture accuracy, and additional modules for attention to Mura defects. Our contributions can be summarized as follows:
• A new moire dataset for Mura defect detection is generated to remove the moire pattern made when capturing a display.
• A new alignment method using inserted ArUco markers is proposed for aligning an input image and a reference image for supervised learning.
• A demoireing network optimized for detecting a display defect is proposed that demonstrates state-of-the-art performance compared with existing methods.
The remainder of this paper is organized as follows. Section II reviews studies related to display defects and removing moire patterns. Section III introduces the details of the proposed method. Section IV presents experimental results, including a bit-depth variation. Section V presents the conclusions.

II. RELATED WORK

A. DISPLAY DEFECTS
Various defects occur during display manufacturing. Mura defects are typical visual defects of a liquid crystal display (LCD) panel; they degrade the quality of the display as defective pixels with a luminance distinct from that of the surroundings. Therefore, to improve the quality of the display, Mura defects should either be prevented or inspected for. Mura defects can be classified into white, black, dot, and line stains and are caused primarily by photomask problems, misconduct during process control, dust, liquid crystal quality issues, polarizer uniformity problems, and polyimide pollution [2]. Because the size of Mura defects is minimal compared with the size of the display, images without a background should be used [3]. Furthermore, Mura defects are difficult to detect due to low contrast and unclear edges [28]. The authors of [34] conducted a study with a similar purpose: detecting stains on fabric. However, our capture target is a display, which is rigid rather than flexible. Detecting defects in stones, which have characteristics similar to those of Mura defects, has also been studied [35]. Compared with other defects, Mura defects are more difficult to detect due to their intrinsic non-uniformity [5]. Although Mura defect detection has been automated, quantification is not objective because it is performed by inspectors rather than machines [4]. One Mura defect detection approach applies deep learning. Kim et al. [3] proposed an efficient Mura defect classifier using a convolutional neural network (CNN) with accumulated ensemble techniques to eliminate the background pattern of LCDs and improve Mura defect classification performance. Yang et al. [6] proposed an online sequential classifier and a transfer learning method based on a deep neural network (DNN) for Mura defect classification in a flat-panel display.

B. MOIRE PATTERN REMOVAL
Before the era of deep learning, the moire pattern was analyzed with manual and computational methods, such as matrix composition, and in the frequency domain using the DFT and DCT. Because conventional methods [12], [22] were designed based on signal processing or on removing moire from images with rough background patterns, they did not produce satisfactory results in removing the moire pattern generated while capturing a display.
Recently, end-to-end image demoireing methods have been proposed. Sun et al. [13] proposed the multiscale dual-domain CNN (DMCNN) and created the first moire dataset based on ImageNet [14]. Cheng et al. [15] improved on DMCNN [13] by introducing adaptive instance normalization [16] based on dynamic feature encoding. He et al. [17] labeled the shape, frequency, and color of the TIP 2018 dataset to remove moire patterns more precisely. These methods used the TIP 2018 dataset, which is inappropriate for our task because it does not contain Mura defects. Zheng et al. [18] divided demoireing in the frequency domain into moire pattern removal and color restoration, using learnable weights in the frequency domain after the discrete cosine transform (DCT) [19] of the moire pattern. Liu et al. [8] proposed transforming the moire image into the wavelet domain to remove the moire pattern. However, these networks [13], [15], [17], [18], [8] only considered the removal of the moire pattern; the Mura defects of the display must also be considered.

III. PROPOSED METHOD

A. MOIRE DATASET GENERATION BASED ON MURA DEFECTS
To perform supervised learning, images with Mura defects require reference images; the Mura defects were reproduced by displaying the reference image on a mobile phone. The process of generating the dataset is depicted in Fig. 1. The first step in capturing images is to focus the camera on the display, after displaying a GT image on the mobile phone, so that Mura defects can be detected. The second step is to adjust the resolution of the captured image. We repeatedly focused the camera and adjusted the distance to set the resolution of the images equal to the size of the display. However, it was practically impossible to exactly match the resolution of the images with the pixel count of the display. Therefore, we calculated the distance between pixels of the mobile phone with the following formula:
p = L / N,

where p is the distance between pixels in the display, L is the length of the display, and N is the number of pixels in the display. The images were captured after adjusting the pixel-to-pixel distance of the image to approximate p. The third step is to adjust the white balance. The RGB weights were adjusted because the size and number of subpixels in the mobile phone differ between the RGB channels. The last step is to set the exposure. We set the camera exposure to match the luminance of the GT images and the captured images. During the capturing process, we ensured the dataset contained various shapes of moire patterns to prevent the model from overfitting to a specific moire pattern. Because the moire pattern is an aliasing phenomenon, the shape of the moire patterns varies depending on the angle, arrangement, and direction between the camera and the mobile phone. Therefore, we fixed the camera and secured moire patterns with various shapes by shifting or rotating the mobile phone; even if the moire patterns of the captured data seem the same, they were captured in different states.
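As a minimal sketch of this pixel-pitch calculation, p = L / N (the display length and pixel count below are illustrative values chosen so that the pitch comes out near the 55.2 μm implied by the numbers in Section IV, not measurements from the paper):

```python
def pixel_pitch_um(length_mm: float, n_pixels: int) -> float:
    """p = L / N: the distance between adjacent pixels, in micrometres."""
    return length_mm * 1000.0 / n_pixels

# Illustrative values: a display ~139.8 mm long with 2,532 pixels along that axis
pitch = pixel_pitch_um(139.8, 2532)  # ~55.2 micrometres
```

The pixel-to-pixel distance of the captured image is then adjusted until it approximates this pitch.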
Furthermore, all displayed images were different to prevent the networks from overfitting to the Mura defects. Due to the display's subpixels, when the display is aligned vertically or horizontally with the camera, the aliasing phenomenon becomes severe, and the moire pattern becomes rapidly stronger. We refer to images captured when the camera and display were vertical or horizontal to one another as having a strong moire pattern. In contrast, we refer to images captured when the camera and display were diagonal to one another as having a weak moire pattern. As depicted in Fig. 3, the dataset is classified into these two categories.

After capturing a target display, aligning the captured and GT images is essential for training a model and for quantitative comparison. For the TIP 2018 dataset, the images were masked in black to create corners and aligned by matching the keypoints of the corners. However, if the image is masked in black, the Mura defect in the masked portion becomes challenging to detect. Therefore, the area used for alignment should be minimal. Hence, we propose inserting ArUco markers. The overall process and images are depicted in Fig. 4. The first step in alignment is converting the three-channel images to one-channel images. The second step is to detect the keypoints of the image with the scale-invariant feature transform (SIFT) and compute their descriptors. The third step is to apply brute-force matching (BFM) to the keypoints of the two images to be aligned and extract the locations of the matched keypoints. The last step is to compute a homography from these locations and warp the captured image.
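The last two alignment steps (estimating a homography from matched keypoint locations, then warping) can be sketched as follows. The pipeline above would typically use OpenCV's SIFT, brute-force matcher, and homography estimation; the plain-numpy direct linear transform (DLT) solver below is an illustrative stand-in for the final estimation step, assuming noise-free matches:

```python
import numpy as np

def estimate_homography(src, dst):
    """DLT: solve for the 3x3 homography H mapping src -> dst (>= 4 point pairs)."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    # The homography is the null-space vector of the stacked constraint matrix.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]

def warp_points(H, pts):
    """Apply H to Nx2 points (the same mapping a perspective warp applies per pixel)."""
    p = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return p[:, :2] / p[:, 2:3]
```

With real SIFT/BFM matches, a RANSAC-based estimator would be used instead so that outlier matches are rejected before the captured image is warped.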

B. PROPOSED NETWORK
We experimented with various demoireing networks, such as the high-resolution demoire network (HRDN) [21], the multiscale bandpass CNN (MBCNN) [18], and DMCNN [13]. However, the existing demoireing networks [21], [18], [13] cannot correctly maintain the Mura defect or remove the moire pattern. The existing demoireing datasets consist of images with patterned backgrounds and contain 10,000 [23] or 135,000 [13] images, so the task difficulty is high. In contrast, the dataset we created presents a relatively low task difficulty because it contains Mura defects with no pattern in the background and only 1,000 images.
Accordingly, we experimented with U-Net, which can be trained with smaller datasets. The experimental detail of adding the modules is depicted in Fig. 5. Our model is based on the U-Net structure, an end-to-end fully convolutional network proposed for image segmentation tasks in the biomedical field and still used as a backbone network and benchmark for numerous models. U-Net is divided into a contracting path that understands the context of the input images and an expanding path that performs fine localization and feature map expansion. The contracting path removes the moire pattern that appears across the entire image, and the expanding path performs localization of the Mura defects. The advantages of U-Net are that it demonstrates excellent performance in various segmentation problems even with a small dataset and that it is fast because of its end-to-end design. One of the features of U-Net is the skip connection, which connects the contracting path and the expanding path, so context identification and localization are possible simultaneously.
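The structure described above can be sketched as a minimal PyTorch model; the depth and channel widths here are illustrative, not the configuration used in the paper:

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """Minimal U-Net-style sketch: one contracting stage, one expanding stage,
    and a skip connection concatenating the encoder feature into the decoder."""
    def __init__(self, ch: int = 16):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(1, ch, 3, padding=1), nn.ReLU())
        self.down = nn.MaxPool2d(2)
        self.enc2 = nn.Sequential(nn.Conv2d(ch, ch * 2, 3, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(ch * 2, ch, 2, stride=2)
        self.dec1 = nn.Sequential(nn.Conv2d(ch * 2, ch, 3, padding=1), nn.ReLU())
        self.out = nn.Conv2d(ch, 1, 1)

    def forward(self, x):
        e1 = self.enc1(x)                       # contracting path, full resolution
        e2 = self.enc2(self.down(e1))           # contracting path, half resolution
        d1 = self.up(e2)                        # expanding path
        d1 = self.dec1(torch.cat([d1, e1], 1))  # skip connection: concatenation
        return self.out(d1)
```

A single-channel input is used here because, as noted in Section IV, the images are handled in grayscale.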

C. LEARNING PROCESS
We performed comparison and analysis under multiple conditions.

a) Image augmentation
Although the camera exposure was set for identical conditions, the luminance of the images varied due to camera quantization error [30] and the display scanning rate problem [9]. We experimented with luminance augmentation to enhance the network's robustness to luminance variation.

FIGURE 6. Comparison of output images for two types of strong moire images (an input, an output, an output heatmap, and ground truth)

b) Frequency loss
The second experiment is the combination with a frequency loss. As depicted in Fig. 6, the strong moire pattern was removed using only the mean square error (MSE) loss. However, much stronger moire patterns were not removed. As depicted in Fig. 2, the difference between the moire and GT images is noticeable in the frequency domain. Given that MBCNN [18] and the Full High Definition Demoireing Network (FHDe2Net) [7] perform the DCT, the Watermark-Decomposition Network (WDNet) [8] performs a wavelet transform, and the Moire pattern Removal Neural Network (MopNet) [17] performs frequency analysis, several deep learning studies [7], [8], [17], [18] have tended to remove the moire pattern in the frequency domain. In contrast, we are the first to apply a loss computed directly in the frequency domain. The Fourier transform converts signals from the spatial domain to the frequency domain. When the signal is discrete, the discrete Fourier transform (DFT) is used, and the FFT is an algorithm that computes the DFT efficiently:

F(u,v) = Σ_{x=0}^{H-1} Σ_{y=0}^{W-1} f(x,y) e^{-j2π(ux/H + vy/W)},

where H is the height of the image and W is the width of the image. F(u,v) obtained through the FFT consists of real and imaginary parts:

F(u,v) = R(u,v) + jI(u,v),

where R is the real part and I is the imaginary part. After calculating the frequency loss between the output images and the GT images using the FFT, we combine the FFT loss and the MSE loss. The frequency loss, the MSE loss, and their combination are

L_freq = (1/HW) Σ_{u,v} |ℱ(y)(u,v) - ℱ(ŷ)(u,v)|,
L_MSE = (1/HW) Σ_{x,y} (y(x,y) - ŷ(x,y))²,
L_total = L_MSE + L_freq,

where ℱ, L_freq, y, and ŷ are the FFT, the frequency loss, the GT, and the output of the DNN, respectively.
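A minimal numpy sketch of the combined objective, assuming the frequency loss is the mean absolute difference of the complex 2-D FFTs and that the two terms are summed with an illustrative weight (the paper's exact weighting is not shown here):

```python
import numpy as np

def fft_loss(gt, out):
    """Frequency loss: mean absolute difference of the complex 2-D FFTs."""
    return np.abs(np.fft.fft2(gt) - np.fft.fft2(out)).mean()

def mse_loss(gt, out):
    """Spatial-domain mean square error."""
    return ((gt - out) ** 2).mean()

def total_loss(gt, out, lam=0.1):
    """Combined loss; the weight lam is illustrative, not the paper's value."""
    return mse_loss(gt, out) + lam * fft_loss(gt, out)
```

In a training framework, the same computation would be done with differentiable FFT operations (e.g., the framework's 2-D FFT) so that the frequency term contributes gradients.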

c) Additional Modules
The third experiment is the addition of a module to the skip connection. The skip connection performs the vital function of delivering the low-level features of the contracting path to the expanding path, but it has poor localization accuracy because the low-level features are delivered as-is without correction. Therefore, Attention U-Net [31] added an attention module to the skip connection, and Attention Augmented (AA)-TransUNet [20] replaced the skip connection with a Convolutional Block Attention Module (CBAM) [27]. However, both networks pass the low-level features through the module. We compensated for these shortcomings by designing the skip connection to preserve the original information and supplement the insufficient information of the low-level features: the module output is concatenated with the original features and delivered to the expanding path. CBAM [27] consists of a channel attention module and a spatial attention module, which create attention maps for channels and space, respectively. By multiplying the attention maps with the input feature map, important information is emphasized. The Vision Transformer (ViT) [26] is a representative network that brings the attention mechanism commonly used in natural language processing (NLP) into computer vision. It disassembles the input feature into patches and calculates attention scores between patches. Because ViT calculates the attention score over all patches, it overcomes the limited receptive field of CNNs. Among the methods in the demoireing field, the state-of-the-art MBCNN [18] converts spatial-domain features into frequency-domain features through the DCT [19] and multiplies the features by learnable weights. We obtained the localization information and compensated for the lack of information in the low-level features by concatenating CBAM, ViT, or the moire pattern removal block (MPRB) of MBCNN [32] in the skip connection.
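A PyTorch sketch of the proposed skip connection, which concatenates the CBAM-refined feature with the untouched low-level feature rather than replacing it (module sizes and the reduction ratio are illustrative):

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """CBAM channel attention: shared MLP over avg- and max-pooled descriptors."""
    def __init__(self, c: int, r: int = 4):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(c, c // r), nn.ReLU(), nn.Linear(c // r, c))
    def forward(self, x):
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        return torch.sigmoid(avg + mx)[:, :, None, None] * x

class SpatialAttention(nn.Module):
    """CBAM spatial attention: 7x7 conv over channel-wise avg/max maps."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, 7, padding=3)
    def forward(self, x):
        m = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], 1)
        return torch.sigmoid(self.conv(m)) * x

def cbam_skip(feat, cbam):
    """Skip connection that preserves the original low-level feature and
    concatenates the CBAM-refined copy, as described above."""
    return torch.cat([feat, cbam(feat)], dim=1)
```

Because the refined copy is concatenated, the decoder block receiving the skip must accept twice as many channels as the plain skip connection would supply.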

IV. EXPERIMENTAL RESULTS
For our dataset, the resolution of the images is 2,000×1,024 pixels, and the image size for testing is the same. During training, images were randomly cropped to 256×256, with a batch size of 2. The resolution of the display is 2,532×1,170, and the resolution of the captured image is approximately 2,480×1,150. The distance between pixels of the image is 56.3 μm, which is 1.1 μm longer than the distance between pixels of the display. Adam [33] is used as the training optimizer. Training was repeated for 500 epochs. The learning rate was initialized to 10^-3 and reduced by 30% every 50 epochs.
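The stated schedule (initial rate 10^-3, reduced by 30% every 50 epochs) can be expressed as a step decay; whether the reduction is applied multiplicatively at each step is an assumption, and in PyTorch it would correspond to torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.7):

```python
def learning_rate(epoch: int, base_lr: float = 1e-3,
                  drop: float = 0.3, every: int = 50) -> float:
    """Step decay: start at base_lr and cut the rate by `drop` every `every` epochs."""
    return base_lr * (1.0 - drop) ** (epoch // every)
```

Over 500 epochs this yields ten plateaus, ending near 4 × 10^-5.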
The dataset consists of 8-bit images with 500 strong and 500 weak moire patterns, of which 450 and 50 of each were used for training and testing, respectively. The average luminance of the captured images differed by approximately 0.2 from the background luminance of the GT images. Therefore, when we experimented with image augmentation, we randomly added ±0.2 to the input. Due to GPU memory limitations, the ViT module was added
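A numpy sketch of the ±0.2 luminance augmentation described above; whether the offset is drawn uniformly from [-0.2, 0.2] or applied as exactly ±0.2 is an assumption:

```python
import numpy as np

def luminance_augment(img, max_offset=0.2, rng=None):
    """Shift the luminance of a [0, 1] image by a random uniform offset in
    [-max_offset, +max_offset], clipping back to the valid range."""
    rng = rng if rng is not None else np.random.default_rng()
    return np.clip(img + rng.uniform(-max_offset, max_offset), 0.0, 1.0)
```

Applying a different random offset to each training crop exposes the network to the luminance variation that the capture process introduces.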

FIGURE 8. Comparison among various U-Nets adding the proposed modules in skip connection (an input, an input heatmap, four output heatmaps for an original U-Net and U-Nets adding proposed modules, and ground truth).
Existing networks cannot correctly maintain Mura defects, and the strong moire pattern was not completely removed. However, U-Net maintained the Mura defects well, resulting in a negligible error and the complete removal of the moire pattern. As presented in Table 1, the peak signal-to-noise ratio (PSNR) score of U-Net is 4.39 dB higher for the weak moire pattern and 1.91 dB higher for the strong moire pattern than that of MBCNN, the state-of-the-art network. The existing demoireing networks have many parameters and are designed for the existing moire datasets [13], [23]; therefore, they are not adequate for our dataset. In contrast, U-Net can be trained with smaller datasets and is simple, so it performs well on our dataset. Therefore, using a complex network does not guarantee improved performance over a light network when removing the moire pattern that occurs when detecting Mura defects. As presented in Table 2, when augmentation and frequency loss were used, the PSNR increased for the weak moire pattern by 0.39 and 0.82 dB and for the strong moire pattern by 0.55 and 1.83 dB, respectively. As depicted in Fig. 7, image augmentation enhances the network's robustness to the changing luminance of the dataset, overcoming the luminance variation of the capture process, and the frequency loss ensures the removal of the moire pattern. Because the moire pattern exists in the spatial domain but is more distinct in the frequency domain, combining the frequency domain is necessary to achieve high performance. As presented in Table 3, adding the ViT, CBAM, and MPRB modules to the skip connection increased the PSNR for the weak moire pattern by 0.27, 0.8, and 0.7 dB, and for the strong moire pattern by 0.22, 1.35, and 1.13 dB, respectively. As depicted in Fig.
8, the addition of ViT reduced the error in the regions with Mura defects, but ViT did not completely remove the moire pattern because it requires substantial data and training. The addition of MPRB effectively eliminates the moire pattern and reduces errors, but some moire patterns remain. The addition of CBAM directs attention to the moire pattern and the Mura defects, so the moire pattern is adequately removed, the background is clean, and the Mura defect is well maintained. CBAM should therefore be added rather than ViT or MPRB when the number of captured images is small and the Mura defect requires attention. We also experimented with combining frequency loss, augmentation, and CBAM. As presented in Table 4, the PSNR increased by 2.02 dB for the weak moire pattern and 2.23 dB for the strong moire pattern compared with U-Net. As depicted in Fig. 9, frequency loss alone leaves a significant error in the Mura defect region, and augmentation alone does not remove the moire pattern sufficiently. However, when frequency loss, CBAM, and augmentation are combined, the moire pattern is removed, the Mura defects are maintained, and the error area is reduced.
We also evaluated U-Net++ for a broader comparison with U-Net. U-Net++ achieved 55.86 dB in PSNR for the weak moire pattern. Because U-Net++ has skip connections with a mesh-like structure, information is mixed and lost; therefore, its performance was lower than that of U-Net. Among the existing methodologies, we also experimented with a recent method, MopNet, whose performance was 56.43 dB in terms of PSNR. MopNet structurally focuses on the color of the moire pattern: its channel-wise edge predictor guides the removal of the moire pattern in the red and blue channels using the characteristic that the moire pattern of the green channel is weaker. However, because we used grayscale images, the green channel did not matter. Therefore, MopNet is not suitable for the detection of Mura defects.

Ablation study
We compared 8-bit and 12-bit images to verify the performance of U-Net. When using images captured in a 12-bit domain, the PSNR increased by 0.02 dB for the weak moire pattern and 1.90 dB for the strong moire pattern. As depicted in Fig. 10, increasing the number of bits did not produce a change for the weak moire pattern. In contrast, the moire pattern was sufficiently removed for the strong moire pattern.
Because the model is unchanged and only the number of bits in the input changes, the Mura defect maintenance performance differs little, but the moire pattern removal performance increases slightly because the moire pattern is captured more accurately. For the weak moire pattern, the edges of the moire pattern are faint and the PSNR is already high. For the strong moire pattern, however, the accurate values of the moire pattern can be captured, so training on the 12-bit images resulted in a significant increase in performance.
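The benefit of 12-bit capture can be illustrated by comparing quantization steps after normalizing pixel values to [0, 1]: a 12-bit image resolves luminance differences about sixteen times finer than an 8-bit image, which is what allows the strong moire pattern to be captured more accurately (a sketch, not the paper's preprocessing code):

```python
def quantization_step(bits: int) -> float:
    """Smallest representable luminance difference for a `bits`-bit capture,
    after normalizing pixel values to [0, 1]."""
    return 1.0 / (2 ** bits - 1)

def normalize(raw: int, bits: int) -> float:
    """Map a raw integer pixel value to a [0, 1] float."""
    return raw / float(2 ** bits - 1)
```

For example, quantization_step(8) is 1/255, while quantization_step(12) is 1/4095.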

V. CONCLUSION
In this paper, we proposed a novel demoireing network based on U-Net with augmentation, frequency loss, and CBAM for detecting Mura defects. This network removes the moire pattern and maintains the Mura defects simultaneously. We also created a new moire dataset for detecting Mura defects and proposed inserting ArUco markers to align two images. We conducted various ablation experiments to demonstrate the effectiveness of the augmentation, frequency loss, CBAM, and bit-depth variation in the proposed method. Our proposed method achieved the highest PSNR compared with previous state-of-the-art methods and is appropriate for increasing the PSNR while simultaneously maintaining Mura defects and removing the moire pattern. In the future, we will study more efficient networks that can be trained smoothly even on datasets captured under various conditions.

This article has been accepted for publication in IEEE Access. Citation information: DOI 10.1109/ACCESS.2022.3186685