Effective Defect Detection Method Based on Bilinear Texture Features for LGPs

Automatic defect detection of light guide plates (LGPs) is an important task in the manufacture of liquid crystal displays. During thermo-printing, defects in the tag lines of LGPs occur easily, and they fall into two categories: bubbles and missing tag lines. These defects lack salient visual attributes, such as edge-based and region-based features, so traditional methods fail to detect them. To address this, we propose a Dense-bilinear convolutional neural network (Dense-BCNN), an end-to-end defect detection network that combines Dense-blocks (Huang et al., 2017), bilinear feature layers (Lin et al., 2015), and squeeze-and-excitation blocks (Hu et al., 2018). Our network exploits fine-grained texture features to enhance accuracy, while its Dense-blocks reduce the number of parameters. We validate our network on our LGP dataset containing 5,860 images from three cases: bubbles, tag line existence, and missing tag lines. Our network outperforms AlexNet (Krizhevsky et al., 2012), VGG (Simonyan and Zisserman, 2014), and ResNet (He et al., 2016) on both public datasets and our LGP dataset, with lower GPU memory consumption.


I. INTRODUCTION
In recent years, liquid crystal displays have become increasingly thin, and as a result, the demand for high-quality light guide plates (LGPs), which are core components of the backlight module, has increased. Defect detection of LGPs is an essential step in liquid crystal display manufacturing and can be performed using machine vision. Specifically, LGP defect detection distinguishes three cases, as shown in Fig. 1: bubbles (defect), missing tag lines (defect), and tag line existence (normal). To save cost, industrial detection systems are usually equipped with low-cost cameras and cheap GPUs with small memory sizes. Therefore, defect detection is mostly performed using gray-level cameras and simple image processing algorithms.
Traditional detection methods usually involve image preprocessing to extract the edges or regions of LGPs in order to locate the tag line. However, LGPs have high light transmittance, which blurs the image edges and makes the regions inconsistent. This renders traditional image preprocessing and defect detection algorithms ineffective. As shown in Fig. 2(a), we test the Canny edge detector [7], Gabor filters [8], Otsu's thresholding [9], and partial adaptive threshold methods [10] on LGP images. As can be seen, it is difficult to distinguish among normal tag lines, missing tag lines, and bubbles based on the results of these methods. Fig. 2(b) illustrates the result of tag line detection based on high-order polynomial line fitting and Gaussian elimination [11]. Fig. 2 shows that traditional image preprocessing and line detection methods are sensitive to the blurred edges and inconsistent regions of LGPs. Moreover, detecting bubbles using these methods is challenging.
Traditional machine-vision methods also lack flexibility, because their features must be hand-crafted to suit a particular domain. Thus, they conflict with the trend toward generalized production lines. Deep learning-based methods provide flexible solutions that can be quickly adapted to new types of products, requiring only an appropriate number of training images [12]. To address the challenges that traditional methods face in LGP defect detection, we use texture features, which provide rich information for defect classification. Our detection method is mainly based on image texture classification, where the texture features provide context about the image for inference; often, the richer the features, the better the inference.
The associate editor coordinating the review of this manuscript and approving it for publication was Guitao Cao.
Several studies have focused on designing optimal filters to extract texture features with high discriminability. However, such sophisticated hand-designed features cannot be learned automatically and directly from large datasets. The recent impressive results of deep learning-based methods in machine vision applications have opened up new possibilities for the research and industrial communities. This success can be attributed to the fact that these methods learn data-driven features, so hand-crafted features are not required. Moreover, deep neural networks are trained end-to-end directly on raw image pixel values. Recently, bilinear convolutional neural networks (BCNNs [2]) were proposed to build orderless texture representations that can be trained in an end-to-end manner. The original BCNNs use a VGG [5] backbone. Essentially, the bilinear feature generated by BCNNs is equivalent to the Gram matrix representation, a well-known classical second-order texture descriptor. Inspired by BCNNs, and to ensure cost-effectiveness (GPU memory size is a dominant factor in GPU price), we improve BCNNs in two ways: we reduce the number of parameters with a hybrid framework of BCNNs and Dense-blocks, and we boost the performance of the network with squeeze-and-excitation (SE) blocks.
The main contributions of this work are two-fold:
• We perform defect detection of LGPs using improved BCNNs in which the VGG backbone is replaced with Dense-blocks and SE-blocks. To the best of our knowledge, few studies have proposed the use of bilinear features for defect detection with potential for practical application. Our method outperforms state-of-the-art CNNs on our LGP dataset.
• We build an LGP dataset dedicated to defect detection. To the best of our knowledge, there is no publicly available dataset for the automatic defect detection of LGPs. We annotated 5,860 images covering the three main LGP categories. Our method requires only gray-level images as input and very few parameters, which significantly increases its applicability in industry because cheap GPUs can be used.
In the remainder of the paper, we first discuss related work, then introduce our method, and finally evaluate it and compare it with state-of-the-art methods.

II. RELATED WORK
Studies concerning defect detection for industrial inspection are scarce. We now review the relevant literature.

A. HAND-CRAFTED FEATURES METHODS
Traditional defect inspection methods usually involve image acquisition, image preprocessing, defect region segmentation, feature extraction, and defect classification. Among these steps, defect region segmentation and feature extraction are the most critical for defect classification. Bi et al. [13] and Gan and Zhao [14] used region-based and modified region-based segmentation methods to detect Mura defects. Li and Tsai [15] proposed Hough transform-based line detection to identify low-contrast defects in unevenly illuminated images. Lu and Tsai [16] detected scratch and fingerprint defects with a global image reconstruction scheme based on singular value decomposition. However, because these methods are sensitive to noise and uneven backgrounds, they cannot detect diverse types of defects. Texture-based models are more robust and more natural for defect classification, and it is generally agreed that extracting powerful texture features leads to reliable classification results. The study of texture analysis can be traced back to the early work of Julesz [17], who suggested that texture can be modeled using k-th-order statistics of pixels, also called co-occurrence statistics. The gray-level co-occurrence matrix (GLCM) method [18] was developed based on these co-occurrence statistics, and Jiang et al. [19] performed weld defect classification using GLCM. Filter-based approaches, such as the Gabor filter, were widely used for texture representation in the early years [8]; for example, Li et al. [20] analyzed the texture information of woven fabric using Gabor filters. However, in a traditional machine-vision fashion, features have to be hand-crafted to suit the particular application. In the final stage, a decision is made using a hand-crafted rule-based approach or a learning-based classifier such as a support vector machine (SVM), a decision tree, or k-nearest neighbors (kNN).
Because such classifiers are less powerful than deep-learning methods, the hand-crafted features become very important, and much effort has been devoted in the past to designing optimal features manually.
VOLUME 9, 2021
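The co-occurrence statistics behind GLCM can be made concrete with a small sketch. The following Python code, written for illustration only, counts pixel pairs at a single offset (one pixel to the right, an assumed choice) on a tiny image that has already been quantized to four gray levels, normalizes the counts, and derives the contrast statistic, one of the classic GLCM features. The function names and the test image are hypothetical and are not taken from [18] or [19].

```python
from typing import List

# Hypothetical helpers for illustration; not the implementation of [18] or [19].

def glcm(image: List[List[int]], levels: int, dx: int = 1, dy: int = 0) -> List[List[float]]:
    """Normalized gray-level co-occurrence matrix for one offset (dx, dy).

    Entry [i][j] is the fraction of pixel pairs (p, q), with q = p + (dy, dx),
    such that image[p] == i and image[q] == j.
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:  # skip pairs that leave the image
                counts[image[r][c]][image[r2][c2]] += 1
                total += 1
    return [[v / total for v in row] for row in counts]

def contrast(p: List[List[float]]) -> float:
    """GLCM contrast: sum over (i, j) of (i - j)^2 * p[i][j]."""
    n = len(p)
    return sum((i - j) ** 2 * p[i][j] for i in range(n) for j in range(n))

if __name__ == "__main__":
    img = [
        [0, 0, 1, 1],
        [0, 0, 1, 1],
        [0, 2, 2, 2],
        [2, 2, 3, 3],
    ]
    p = glcm(img, levels=4)
    print(round(contrast(p), 4))
```

In practice, such statistics are usually computed for several offsets and directions and then passed to a classifier such as an SVM, which is the hand-crafted pipeline described above.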

B. CNN-BASED METHODS
With the advent of neural networks, it is now possible to perform defect inspection using convolutional neural network (CNN)-based methods. CNNs make feature extraction more flexible because they are data-driven: the resulting methods can be quickly adapted to new types of products and defects simply by changing the training images. Li [21] used neural networks to learn and detect different types of Mura defects; however, his method requires the fringe images to be enhanced before they are learned by the networks. In fact, popular generic CNN models, including AlexNet [4], VGG [5], GoogLeNet [22], ResNet [6], and DenseNet [1], are good choices for feature extraction. AlexNet is the earliest of these, and the later models are deeper and more complex. Deep-learning methods began to be applied more often to defect classification problems shortly after the introduction of AlexNet. Masci et al. [23] showed that their deep-learning-based surface-defect classification approach can outperform classic machine-vision approaches in which hand-engineered features are combined with SVMs. They achieved excellent results; however, their work was limited to a shallow network (a CNN with five layers), as they did not use ReLU and batch normalization. Faghih-Roohi et al. [24] used a similar architecture for the detection of rail-surface defects; they used the ReLU activation function and evaluated several networks for classifying rail defects. Weimer et al. [25] evaluated several deep-learning architectures of varying depth for surface-anomaly detection. Some surface-anomaly detection tasks can be formulated as pixel-wise binary classification problems; therefore, DeepLabv3+ [26] and U-Net [27], which are normally used for semantic segmentation, have also been applied to defect detection. However, some defects are difficult to recognize using semantic segmentation methods. Recently, Gatys et al. [28] showed that the Gram matrix representations extracted from various layers of a VGG network can be inverted for texture synthesis. BCNNs [2] compute a pooled outer product of features from two CNNs, which is equivalent to the Gram matrix representation, i.e., second-order statistics. Lin and Maji [29] showed that bilinear pooling of CNN features is advantageous for texture recognition.
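The equivalence between a sum-pooled outer product and a Gram matrix, and the orderless property that follows from it, can be checked with a minimal sketch. We assume, for illustration, that a feature map is stored as a list of spatial locations, each holding a D-dimensional channel vector; this toy layout is not the tensor format of [2] or [28]. Summing the outer products over locations yields the D × D Gram matrix F^T F, and because addition is commutative, shuffling the locations does not change it.

```python
import random
from typing import List

Matrix = List[List[float]]

# Hypothetical helper for illustration only.

def gram(features: Matrix) -> Matrix:
    """Sum-pooled outer product of a feature map with itself.

    `features` holds one D-dimensional vector per spatial location, so it is
    an L x D matrix F; entry [i][j] of the result is sum_l F[l][i] * F[l][j].
    """
    d = len(features[0])
    g = [[0.0] * d for _ in range(d)]
    for f in features:  # loop over spatial locations
        for i in range(d):
            for j in range(d):
                g[i][j] += f[i] * f[j]
    return g

if __name__ == "__main__":
    fmap = [[1.0, 0.0, 2.0], [0.5, 1.0, 0.0], [0.0, 3.0, 1.0], [2.0, 1.0, 1.0]]
    shuffled = fmap[:]
    random.Random(0).shuffle(shuffled)  # permute the spatial locations
    print(gram(fmap) == gram(shuffled))  # orderless: spatial layout is discarded
```

The toy values are dyadic fractions, so the sums are exact regardless of order; with arbitrary floats, the comparison would need a tolerance.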
We reformulate defect detection as a fine-grained texture recognition problem. Fine-grained texture recognition is challenging and has recently emerged as an active topic, owing to the diverse appearance and complex structure of textures, high intra-class variability, and small inter-class differences. Similar to traditional texture methods, the bilinear feature vector is an orderless representation of the input image and is therefore suitable for modeling textures. Compared with related methods, the approach proposed in this paper combines DenseNet and a bilinear network in an end-to-end design. We evaluate our method and compare it with state-of-the-art methods, and the results provide insight into the complexity of different industrial defect recognition tasks.

III. METHOD
A. OVERVIEW OF THE NETWORK
As discussed in Section I, GPU cost and performance are critical considerations for industrial defect detection. A drawback of bilinear features is the memory overhead of storing high-dimensional features. Thus, our method is based on DenseNet, which requires fewer parameters and alleviates the vanishing-gradient problem, thereby meeting the economic requirement. For detection accuracy, we use BCNNs to extract orderless texture features and augment the network with SE-blocks, which strengthen its representational power by adaptively recalibrating channel-wise feature responses. Therefore, our proposed network meets the cost and performance requirements of industrial applications.
Fig. 3 illustrates the architecture of our network for LGP defect detection. The design of the defect classification network follows two important principles. First, several layers of convolution and bilinear modules provide the capacity needed for large, complex appearances. This enables the network to capture not only local texture features but also global ones that span a large area of the image. We also consider features across channels and use SE-blocks to improve the quality of the representations produced by the network by explicitly modeling the interdependencies between the channels of its convolutional features. Second, the network should limit the overfitting caused by a large number of parameters. We therefore employ Dense-blocks, whose shortcut connections allow the network to avoid using a large number of feature maps when they are not needed. Our network uses five Dense-blocks, four SE-blocks, and a bilinear feature layer at the end, for a total of 118 convolution layers. We also employ transition layers, each consisting of a 1 × 1 convolution followed by 2 × 2 average pooling with stride 2.
The main difference between DenseNet and our network is that in our network, the final feature passes through a pooled outer product, which generates an orderless texture feature. In the experiments, we find that our architecture achieves higher precision than VGG-based BCNNs.
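The channel bookkeeping of a Dense-block and the pooling step of a transition layer can be sketched as follows. In a Dense-block, each layer receives the concatenation of the block input and all earlier layer outputs, so with k0 input channels and a growth rate k, layer l sees k0 + (l - 1)k channels and the block emits k0 + Lk channels. The growth rate and block depth used below are hypothetical, since they are not listed here, and the 1 × 1 convolution of the transition layer is omitted; only its 2 × 2 average pooling with stride 2 is shown.

```python
from typing import List

# Hypothetical helpers for illustration; the configuration is not the authors'.

def dense_block_channels(k0: int, growth: int, num_layers: int) -> List[int]:
    """Input channel count seen by each layer of a Dense-block.

    Layer l (1-based) receives the concatenation of the block input (k0
    channels) and the outputs of the l - 1 earlier layers (`growth` each).
    The last entry is the channel count of the block output.
    """
    inputs = [k0 + (l - 1) * growth for l in range(1, num_layers + 1)]
    return inputs + [k0 + num_layers * growth]

def avg_pool_2x2(channel: List[List[float]]) -> List[List[float]]:
    """2 x 2 average pooling with stride 2 on one channel (even height and width assumed)."""
    h, w = len(channel), len(channel[0])
    return [
        [
            (channel[r][c] + channel[r][c + 1] + channel[r + 1][c] + channel[r + 1][c + 1]) / 4.0
            for c in range(0, w, 2)
        ]
        for r in range(0, h, 2)
    ]

if __name__ == "__main__":
    # Hypothetical configuration: 64 input channels, growth rate 32, 6 layers.
    print(dense_block_channels(64, 32, 6))
    print(avg_pool_2x2([[1, 3, 5, 7], [1, 3, 5, 7], [2, 2, 0, 0], [2, 2, 0, 4]]))
```

Because each layer contributes only `growth` new channels rather than a full-width output, the layers stay narrow, which is one reason Dense-blocks keep the parameter count low; each transition layer then halves both spatial dimensions (for example, 448 × 448 becomes 224 × 224).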

1) FEATURE FUNCTIONS AND LOSS
The bilinear feature (or its Gram matrix form) is an orderless representation of an image and is therefore a good texture descriptor. The simplest bilinear feature layer can be implemented as a pooled outer product of two identical convolutional feature maps. The bilinear layer is closely related to the Second-Order Pooling approach [30]. Moreover, further studies show that the bilinear feature is generic and that several texture representations can be written as bilinear features: the work in [2] showed that various orderless texture descriptors can be written in bilinear form and derived variants that are end-to-end trainable. In addition, bilinear layers can easily be plugged into existing CNNs and used for domain-specific fine-tuning in transfer learning. Our loss function is the cross-entropy cost for classification. The feature output is obtained by sum pooling the outer products of the features from the last Dense-block.
In the classification task, the BCNN model B is defined as a quadruple B = (f_A, f_B, P, C), where f_A and f_B are two CNN feature functions, P is a pooling function, and C is a classification function. A feature function is a mapping f : I × L → R^{c×D} that outputs a feature of size c × D for image I at location l ∈ L. The BCNN extracts deep visual features φ for image I by sum pooling the bilinear feature combinations over all locations:
$$\phi(I) = \sum_{l \in L} \mathrm{bilinear}(l, I, f_A, f_B), \qquad \mathrm{bilinear}(l, I, f_A, f_B) = f_A(l, I)^{\top} f_B(l, I),$$
where bilinear(l, I, f_A, f_B) is the bilinear feature combination of f_A and f_B at location l ∈ L. For the classification task, function C is trained using the image features φ. Note that φ, once flattened, is a high-dimensional feature vector. In our network, we implement B by concatenating the feature outputs from H(·) into a vector and feeding it into an outer product. Fig. 4 and Fig. 5 provide the details of implementing B and H(·). Our network comprises 100 convolution layers, 9 connected layers, and 115 ReLU and batch normalization layers. Table 1 shows that the bilinear feature layer significantly improves accuracy and reduces loss.
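A minimal sketch of the quadruple B = (f_A, f_B, P, C) up to the classifier input is shown below, using plain Python lists with c = 1 so that each f(l, I) is a single row vector. P is sum pooling of the per-location outer products f_A(l, I)^T f_B(l, I). The signed square root and L2 normalization applied to the flattened φ follow the original BCNN formulation [2]; whether the network described here uses the same normalization is not stated, so treat that step as an assumption. The function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]

# Hypothetical helpers for illustration; not the authors' implementation.

def bilinear_pool(fa: Matrix, fb: Matrix) -> Matrix:
    """phi(I) = sum over locations l of f_A(l, I)^T f_B(l, I).

    fa[l] and fb[l] are the D_A- and D_B-dimensional features at location l,
    so the result is a D_A x D_B matrix.
    """
    if len(fa) != len(fb):
        raise ValueError("f_A and f_B must cover the same set of locations")
    da, db = len(fa[0]), len(fb[0])
    phi = [[0.0] * db for _ in range(da)]
    for xa, xb in zip(fa, fb):  # one outer product per location, summed
        for i in range(da):
            for j in range(db):
                phi[i][j] += xa[i] * xb[j]
    return phi

def normalize(phi: Matrix) -> List[float]:
    """Flatten phi, then apply the signed square root and L2 normalization."""
    flat = [v for row in phi for v in row]
    signed = [math.copysign(math.sqrt(abs(v)), v) for v in flat]
    norm = math.sqrt(sum(v * v for v in signed)) or 1.0  # guard against all zeros
    return [v / norm for v in signed]

if __name__ == "__main__":
    fa = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]                 # 3 locations, D_A = 2
    fb = [[1.0, -1.0, 0.0], [0.5, 0.0, 1.0], [0.0, 1.0, 1.0]]  # 3 locations, D_B = 3
    z = normalize(bilinear_pool(fa, fb))
    print(len(z))  # D_A * D_B entries, fed to the classification function C
```

With D_A = D_B = D, φ has D² entries, which is the memory overhead of bilinear features mentioned in the overview.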
To exploit the discriminability of BCNNs, we employ SE-blocks and Dense-blocks. SE-blocks improve the quality of the representations produced by the network by explicitly modeling the interdependencies between the channels of its convolutional features, and Dense-blocks reduce the number of parameters. Our ablation study demonstrates the effectiveness of SE-blocks and Dense-blocks for the precision and recall of the network: Dense-blocks help the network converge to a low loss, and SE-blocks achieve high precision on the validation set.
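The squeeze-and-excitation mechanism referred to above can be sketched in a few lines. Following the general design of Hu et al. (2018), the squeeze step averages each channel to a single descriptor, the excitation step passes the descriptors through a bottleneck (a ReLU layer followed by a sigmoid layer), and each channel is rescaled by its resulting gate. The weight matrices below are hand-picked toy values rather than learned parameters, biases are omitted, and the reduction ratio of 2 is an assumption for illustration.

```python
import math
from typing import List

FeatureMap = List[List[List[float]]]  # indexed as [channel][row][col]

# Hypothetical helper for illustration; weights are toy values, not learned.

def se_block(fmap: FeatureMap, w1: List[List[float]], w2: List[List[float]]) -> FeatureMap:
    """Squeeze-and-excitation on a C x H x W feature map (biases omitted).

    w1 has shape (C/r) x C and w2 has shape C x (C/r) for reduction ratio r.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]
    # Excitation: bottleneck with ReLU, then sigmoid gates in (0, 1).
    hidden = [max(0.0, sum(w * zc for w, zc in zip(row, z))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden)))) for row in w2]
    # Scale: recalibrate every channel by its gate.
    return [[[v * g for v in row] for row in ch] for ch, g in zip(fmap, gates)]

if __name__ == "__main__":
    fmap = [[[1.0, 1.0], [1.0, 1.0]], [[3.0, 3.0], [3.0, 3.0]]]  # C = 2, H = W = 2
    w1 = [[1.0, 0.0]]    # 1 x 2 (reduction ratio r = 2)
    w2 = [[0.0], [4.0]]  # 2 x 1
    out = se_block(fmap, w1, w2)
    print([ch[0][0] for ch in out])  # channel 0 damped more than channel 1
```

The block adds only two small matrices per stage, which is why it can raise representational power without conflicting with the low-memory goal.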

2) END-TO-END TRAINING
Unlike existing methods, we do not need to initialize BCNNs with pre-trained weights. We train our network directly from scratch in an end-to-end manner and find that training converges quickly.

IV. EXPERIMENTS AND ANALYSIS
We extensively evaluate the proposed network on LGP defect detection and object classification. This section first presents the details of the dataset and then the evaluation and its results. We implement our end-to-end architecture with TensorFlow and run it on a machine with an i5-10400F CPU and an NVIDIA GeForce RTX 3060 GPU with 12 GB of memory.

A. LGP DATASETS
The LGP dataset comprises 4 categories and 3,969 gray-level images of 448 × 448 pixels, captured using industrial CMOS cameras. We perform data augmentation on the "NG-0," "NG-1," and "NG-2" categories by flipping images horizontally and vertically. In total, we obtain 6,905 images: 1,419 images of "NG-0," 889 images of "NG-1," 2,097 images of "NG-2," and 2,500 images of "OK." Fig. 7 presents instances of the different types in the LGP dataset. We split the LGP dataset into a training set (5,594 images), a validation set (620 images), and a test set (691 images). Fig. 6 shows the corresponding statistics of the LGP dataset, namely the distributions of bubble sizes, tag line pollution levels, and contamination levels. Small bubbles, heavy pollution, and heavy contamination account for a non-negligible portion of the dataset; these can be regarded as hard samples.
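The flip-based augmentation can be sketched as follows, treating an image as a list of pixel rows. The text does not say whether a combined horizontal-and-vertical flip was also generated, so this hypothetical `augment` helper assumes each flip is applied separately, which yields up to three images per original. The check in the main block only confirms that the class counts and split sizes reported above are mutually consistent (both total 6,905).

```python
from typing import List

Image = List[List[int]]  # one list of pixel values per row

# Hypothetical helpers for illustration; not the authors' augmentation code.

def hflip(img: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in img]

def vflip(img: Image) -> Image:
    """Mirror the image top to bottom (rows copied so the input is untouched)."""
    return [row[:] for row in img[::-1]]

def augment(img: Image) -> List[Image]:
    """Original image plus its horizontal and vertical flips."""
    return [img, hflip(img), vflip(img)]

if __name__ == "__main__":
    print(augment([[1, 2], [3, 4]]))
    # Consistency of the counts reported in the text.
    class_counts = {"NG-0": 1419, "NG-1": 889, "NG-2": 2097, "OK": 2500}
    split_sizes = {"train": 5594, "validation": 620, "test": 691}
    assert sum(class_counts.values()) == sum(split_sizes.values()) == 6905
```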

B. OPTIMIZATION USING RECTIFIED ADAM
Adaptive learning rates are now the main approach that optimizers use to accelerate the training of deep learning models. Many such optimization methods have been proposed, such as the adaptive gradient algorithm (AdaGrad), Adadelta, Adamax, root mean square propagation (RMSProp), adaptive moment estimation (Adam), and Nesterov-accelerated adaptive moment estimation (Nadam). Rectified Adam (RAdam), developed in [31], is one of the most recent of these algorithms. It introduces a term that rectifies the variance of the adaptive learning rate, which has an effect similar to a warmup with a low initial learning rate, and it improves generalization. RAdam has been applied successfully in other research projects [32].
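The weights are computed according to the Adam optimizer. With the gradient
$$g_t = \nabla_{\theta} f_t(\theta_{t-1}),$$
the first moving moment is
$$m_t = \beta_1 m_{t-1} + (1 - \beta_1)\, g_t,$$
the second moving moment is
$$v_t = \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2,$$
and the bias corrections of the moments are
$$\hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^t}.$$
Adam then updates the parameters as
$$\theta_t = \theta_{t-1} - \eta \, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}, \tag{1}$$
where β_1 and β_2 are the exponential decay rates of the moments and ε is a small constant for numerical stability. Adding the rectification term to Equation (1), the recent variant of Adam named rectified Adam (RAdam) has the form
$$\theta_t = \theta_{t-1} - \eta \, r_t \, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon},$$
where the step size η is an adjustable hyperparameter, and the rectification rate is
$$r_t = \sqrt{\frac{(\rho_t - 4)(\rho_t - 2)\,\rho_\infty}{(\rho_\infty - 4)(\rho_\infty - 2)\,\rho_t}},$$
with
$$\rho_\infty = \frac{2}{1 - \beta_2} - 1, \qquad \rho_t = \rho_\infty - \frac{2 t \beta_2^t}{1 - \beta_2^t}.$$
Here, ρ_t is the length of the approximated simple moving average. When ρ_t ≤ 4, the variance of the adaptive learning rate is intractable, so the adaptive learning rate is deactivated and the parameters are updated with the bias-corrected momentum alone, θ_t = θ_{t-1} - η m̂_t. Otherwise, the variance rectification term is calculated and the parameters are updated with the rectified adaptive learning rate.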
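The update rules above translate directly into code. The following scalar sketch (a hypothetical helper, not the implementation of [31] or of any library optimizer) minimizes a one-dimensional function given its gradient; the defaults β_1 = 0.9, β_2 = 0.999, and ε = 1e-8 are the usual Adam defaults and are assumptions here. With β_2 = 0.999, ρ_t first exceeds 4 at t = 5, so the first four steps are plain bias-corrected momentum updates.

```python
import math
from typing import Callable

# Hypothetical scalar sketch of RAdam for illustration; not the code of [31].

def rho(t: int, beta2: float) -> float:
    """Length of the approximated simple moving average, rho_t."""
    rho_inf = 2.0 / (1.0 - beta2) - 1.0
    return rho_inf - 2.0 * t * beta2 ** t / (1.0 - beta2 ** t)

def radam_minimize(grad: Callable[[float], float], theta: float, steps: int,
                   lr: float = 1e-3, beta1: float = 0.9, beta2: float = 0.999,
                   eps: float = 1e-8) -> float:
    """Run `steps` RAdam updates on a scalar parameter and return it."""
    rho_inf = 2.0 / (1.0 - beta2) - 1.0
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(theta)
        m = beta1 * m + (1.0 - beta1) * g       # first moving moment
        v = beta2 * v + (1.0 - beta2) * g * g   # second moving moment
        m_hat = m / (1.0 - beta1 ** t)          # bias correction
        rho_t = rho(t, beta2)
        if rho_t > 4.0:
            v_hat = v / (1.0 - beta2 ** t)
            r_t = math.sqrt((rho_t - 4.0) * (rho_t - 2.0) * rho_inf
                            / ((rho_inf - 4.0) * (rho_inf - 2.0) * rho_t))
            theta -= lr * r_t * m_hat / (math.sqrt(v_hat) + eps)
        else:
            # Adaptive learning rate deactivated: bias-corrected momentum step.
            theta -= lr * m_hat
    return theta

if __name__ == "__main__":
    # Minimize f(x) = x^2, whose gradient is 2x, starting from x = 1.
    print(radam_minimize(lambda x: 2.0 * x, theta=1.0, steps=500, lr=0.05))
```

Because r_t starts near zero and grows toward 1 as ρ_t approaches ρ_∞, the effective step size ramps up automatically, which is the warmup-like behavior described above.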

C. RESULTS ON THE LGP DATASET AND PUBLIC DATASETS
In this section, we evaluate our method first on public datasets and then on the LGP dataset, including images with added Gaussian noise. In Table 1, we compare our method on public datasets with popular classification CNNs, namely AlexNet, VGG16, a BCNN with a VGG backbone, ResNet with 101 layers, ShuffleNet (version 2), MobileNet (version 3), and DenseNet. Our method (referred to as "BCNN + SE + Dense," where "SE" and "Dense" denote SE-blocks and Dense-blocks, respectively) achieves the best performance on most metrics and significantly outperforms most of the aforementioned methods. Although our method is slightly inferior to the BCNN with a DenseNet backbone, the standard deviation (STD) of accuracy over 10 folds shows that our method is more stable. Note that the accuracy, recall, precision, and STD values in Table 1 to Table 7 are averaged over the four categories.
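The averaging described above can be sketched as follows: per-class precision and recall are read off a confusion matrix and then averaged over the classes (macro-averaging), and the stability measure is the standard deviation of accuracy over the folds. The confusion matrix and fold accuracies in the example are toy numbers, not results from this paper, and whether the sample or population standard deviation was used is not stated; the sketch assumes the sample version (`statistics.stdev`).

```python
import statistics
from typing import List, Sequence, Tuple

# Hypothetical helpers for illustration; toy data only.

def per_class_precision_recall(cm: Sequence[Sequence[int]]) -> Tuple[List[float], List[float]]:
    """Per-class precision and recall from a confusion matrix.

    cm[i][j] counts samples of true class i predicted as class j.
    """
    n = len(cm)
    precision, recall = [], []
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))  # column sum
        actual_k = sum(cm[k])                          # row sum
        precision.append(tp / predicted_k if predicted_k else 0.0)
        recall.append(tp / actual_k if actual_k else 0.0)
    return precision, recall

def macro_average(values: Sequence[float]) -> float:
    """Unweighted mean over classes."""
    return sum(values) / len(values)

def fold_summary(fold_accuracies: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of accuracy across folds."""
    return statistics.mean(fold_accuracies), statistics.stdev(fold_accuracies)

if __name__ == "__main__":
    # Toy 4-class confusion matrix (rows: true class, columns: predicted class).
    cm = [[8, 2, 0, 0], [0, 10, 0, 0], [0, 0, 5, 5], [0, 0, 0, 10]]
    p, r = per_class_precision_recall(cm)
    print(macro_average(p), macro_average(r))
    print(fold_summary([0.90, 0.92, 0.94]))
```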
To further evaluate our method, we add Gaussian noise to the LGP images and compare our network with other networks, including ResNet and ShuffleNet. Table 2 to Table 7 show that our network is robust to different levels of noise. Although the accuracy of our network drops slightly under Gaussian noise ranging from σ = 0.1 to σ = 1.0, our network has a smaller accuracy STD than most other methods at all noise levels. Similarly, we achieve good performance under salt-and-pepper noise ranging from 5% to 20%. Moreover, we find that our method performs better in terms of recall, which is significant in industrial defect detection. We note that AlexNet and VGG16 achieve better performance, except in terms of recall, under heavy salt-and-pepper noise. These older networks may shed light on how to design networks that are robust to noise. Table 8 reports the GPU memory usage as well as the training and inference times of the different methods. We observe that our method provides a good balance among network parameters, training time, and inference time: our network consumes less GPU memory and achieves fast inference. However, its training time is slightly longer than that of the other methods; nevertheless, this is acceptable for industrial applications.
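The two corruptions can be sketched as follows. We assume, for illustration, that pixel intensities are normalized to [0, 1] so that σ is on the same scale, that noisy values are clipped back to [0, 1], and that salt-and-pepper noise replaces the stated fraction of pixels with 0 or 1 with equal probability; the exact noise protocol behind Table 2 to Table 7 is not specified, and the helper names are hypothetical.

```python
import random
from typing import List

Image = List[List[float]]  # intensities assumed to lie in [0, 1]

# Hypothetical helpers for illustration; not the authors' noise protocol.

def add_gaussian_noise(img: Image, sigma: float, rng: random.Random) -> Image:
    """Add zero-mean Gaussian noise with standard deviation sigma, then clip to [0, 1]."""
    return [[min(1.0, max(0.0, v + rng.gauss(0.0, sigma))) for v in row] for row in img]

def add_salt_and_pepper(img: Image, ratio: float, rng: random.Random) -> Image:
    """Set a fraction `ratio` of distinct pixels to 0 (pepper) or 1 (salt) with equal probability."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]  # leave the input image unchanged
    for idx in rng.sample(range(h * w), round(ratio * h * w)):
        r, c = divmod(idx, w)
        out[r][c] = 1.0 if rng.random() < 0.5 else 0.0
    return out

if __name__ == "__main__":
    rng = random.Random(0)
    gray = [[0.5] * 8 for _ in range(8)]
    noisy = add_gaussian_noise(gray, sigma=0.1, rng=rng)
    corrupted = add_salt_and_pepper(gray, ratio=0.05, rng=rng)
    print(sum(v != 0.5 for row in corrupted for v in row))
```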
We also provide a visual explanation of the output of our last convolutional layer in Fig. 8. The positions of the highest responses of our network coincide with the defect areas, indicating that our network captures the features of bubbles and tag lines.

D. ABLATION STUDY
To investigate the behavior of the proposed method, we conduct several ablation studies. First, we investigate the effect of bilinear layers by comparing DenseNet with a BCNN with Dense-blocks; as listed in Table 1, the latter is superior to DenseNet. To validate the effect of SE-blocks, we compare BCNNs with and without SE-blocks and find that SE-blocks lead to improvements in most cases, as listed in Table 1. Even under Gaussian and salt-and-pepper noise, SE-blocks still yield improvements, as listed in Table 2 to Table 7. Furthermore, we present the Precision-Recall curves of four alternatives: BCNN, BCNN with Dense-blocks, BCNN with SE-blocks, and BCNN with SE-blocks and Dense-blocks. As shown in Fig. 9, SE-blocks and Dense-blocks improve the performance significantly and achieve higher AUC values. The Precision-Recall curves are consistent with the results in Table 2 to Table 7. Moreover, Fig. 10 and Fig. 11 show that our network converges faster during training.
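A Precision-Recall curve for one class can be traced by ranking samples by their predicted score and sweeping the decision threshold; the area under it can be summarized in several ways. The sketch below computes the curve points for a single one-vs-rest class and the step-wise average precision, which is one common summary; the text does not state which area estimate underlies the AUC values in Fig. 9, so this choice, the tie handling, and the function names are assumptions.

```python
from typing import List, Sequence, Tuple

# Hypothetical helpers for illustration; not the evaluation code behind Fig. 9.

def pr_curve(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """(recall, precision) pairs obtained by lowering the threshold one sample at a time.

    labels[i] is 1 if sample i belongs to the positive class, else 0.
    Tied scores are not grouped in this simplified sketch.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_pos = sum(labels)
    tp = fp = 0
    points = []
    for i in order:
        if labels[i]:
            tp += 1
        else:
            fp += 1
        points.append((tp / n_pos, tp / (tp + fp)))
    return points

def average_precision(points: Sequence[Tuple[float, float]]) -> float:
    """Step-wise area: sum of (R_k - R_{k-1}) * P_k over the curve points."""
    ap, prev_recall = 0.0, 0.0
    for recall, precision in points:
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap

if __name__ == "__main__":
    pts = pr_curve([0.9, 0.8, 0.7, 0.6], [1, 1, 0, 1])
    print(pts)
    print(average_precision(pts))
```

For a four-class problem, this would be repeated once per class (each class treated as positive against the rest), giving one curve per class.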

E. INDUSTRIAL SCENARIOS TEST
We also conduct a realistic test of our method using LGP images captured from a production line. Fig. 12 illustrates our industrial setup, which includes an industrial CMOS camera and several software modules. The camera captures the LGP images and sends them to our network through a backend. Our network runs on a GPU and processes images at approximately 66 FPS. Because our network has a very low GPU memory footprint (63.29 MB), our method can also run on a cheap GPU with small memory, which is advantageous for the industrial community.
Compared with the traditional method, our method achieves a high true positive rate and a low false positive rate, and its performance has also been verified on a real production line. Notably, traditional image processing algorithms are not robust to noise and regional inconsistency, and they are not flexible enough for new tasks.
We also present the confusion matrix of our method in Fig. 13. Our algorithm yields a very high true positive rate and very low false positive and false negative rates, demonstrating good discrimination ability and little confusion between classes.

V. CONCLUSION AND DISCUSSION
In industrial processes, one of the most important tasks is defect detection, which ensures the quality of the finished product. Quality control is often carried out manually by workers trained to identify complex defects. Such control, however, is very time-consuming and inefficient, and it seriously limits production capacity.
This paper explored a deep-learning approach to surface-defect detection with a texture classification network from the point of view of a specific industrial application. We proposed a novel end-to-end LGP defect detection network based on an improved version of BCNNs. Specifically, we aimed to detect three cases: tag line existence, missing tag line defects, and bubble defects. To address the challenges associated with noise and regional inconsistency in images, we introduced Dense-blocks and SE-blocks into BCNNs to improve the discriminability of defect textures. Our method achieves the best performance compared with AlexNet, VGG, ResNet, ShuffleNet, MobileNet, and DenseNet. Furthermore, our network requires fewer parameters and is suitable for industrial inspection on cheap GPUs with less memory. We also verified our method in real industrial scenarios and confirmed that it achieves superior performance for use by the industrial community. In future work, we will focus on acquiring new complex datasets based on real-world visual defect inspection problems, where deep-learning (and other) methods can be realistically and fully evaluated; the dataset presented in this paper is a first step in this direction. Our CNN model can still be optimized further in terms of inference and training time.
DIBIN ZHOU was born in Anyi, Jiangxi, China, in 1978. He received the Ph.D. degree in computer science from Zhejiang University, in 2008.
He was a Research Associate with British Bedfordshire University, U.K. He is currently a Lecturer with the School of Information Science and Technology, Hangzhou Normal University. His research interests include commercial digital image processing software design and development, machine vision, deep learning, and wireless internet planning.
He was a Postdoctoral Research Fellow in computer science and engineering with Ewha Woman University and Nanyang Technological University. He is currently an Associate Professor with the School of Information Science and Technology, Hangzhou Normal University. His research interests include scene understanding, computer graphics, GPU rendering, and collision detection.