Fusing Multilevel Deep Features for Fabric Defect Detection Based NTV-RPCA

Fabric defect detection plays an important role in automated inspection and quality control in textile manufacturing. Because fabric images have complex and diverse textures and defects, traditional detection methods show poor adaptability and low detection accuracy. The robust principal component analysis (RPCA) model, which separates an image into object and background, has proven applicable to fabric defect detection. However, how to represent the texture features of the fabric image more effectively remains an open problem for this kind of method. In addition, the traditional RPCA may yield low accuracy and more noise in the sparse part. In this article, a novel fabric defect detection method based on multilevel deep feature fusion and non-convex total variation regularized RPCA (NTV-RPCA) is proposed. Firstly, the image representation ability is enhanced through multilevel deep features extracted by a convolutional neural network. Then, the non-convex total variation regularized RPCA is proposed, in which the total variation constraint significantly reduces the noise in the sparse part and the non-convex solution is closer to the authentic one. Next, multilevel saliency maps generated from the sparse matrices are fused via RPCA to produce a more reliable detection result. Finally, the defect region is located by segmenting the fused saliency map via a threshold segmentation algorithm. Qualitative and quantitative experiments conducted on two public fabric image databases demonstrate that the proposed method improves the adaptability and detection accuracy compared with state-of-the-art methods.


I. INTRODUCTION
Detection of fabric defects is an essential task in textile manufacturing, as the presence of defects in fabrics can lead to significant losses, e.g., a 45% to 65% reduction in sales price [1]. However, in many production lines, inspection of fabric defects is still conducted visually by workers, whose skill and ability play a crucial role in the efficiency of detection, and the performance can be influenced by many factors, e.g., visual fatigue of the workers. Automated detection of fabric defects based on machine vision technology can provide objective, stable and reliable performance in defect examination, and hence has become a research focus. However, most existing methods can only be used for limited types of fabrics, and it is hence imperative to further study this topic by developing new methods with improved adaptability and accuracy.
The associate editor coordinating the review of this manuscript and approving it for publication was Jinming Wen.
Existing machine vision methods can be classified into two categories according to the type of fabric texture they suit. The first category is devised for unpatterned fabric images, as shown in Fig. 1(a), and includes statistical-based methods [2], spectral analysis-based methods [3], [4], model-based methods [5], [6] and dictionary learning-based methods [7], [8]. These methods work well for simple plain and twill fabric images, but not for fabrics with complex texture, and hence they cannot be directly applied to patterned fabrics. The second category is for fabric images with patterned texture, as shown in Fig. 1(b), and includes the Elo rating method [9], the motif-based method [10], and the convolutional matching pursuit (CMP) dual-dictionary method [11]. These methods localize the defect using a template-matching approach, which requires a suitable template and precise alignment. Recently, deep learning techniques, particularly convolutional neural networks (CNNs), have achieved excellent performance on image classification, localization and detection. This has inspired great efforts to apply CNNs to the classification and localization of defects in fabric images. However, little work has been reported in the literature on CNN-based defect segmentation and detection, principally because training images with pixel-level annotations, which are needed in the thousands, are still scarce for fabric defect detection tasks.
Robust principal component analysis (RPCA), also known as the low-rank decomposition model, is an effective tool for separating an image into a low-rank part and a sparse part. The non-defective backgrounds of fabric images are macro-homogeneous and highly redundant, so they can be treated as a low-rank subspace. In contrast, the defective areas are sparse and deviate from the low-rank subspace. Therefore, the low-rank decomposition model can naturally be used for fabric defect detection. A few methods have been proposed based on this idea and have achieved good results [12]-[15]. However, further improvement is still required, as fabric images may be contaminated by various kinds of noise and interference, which are also sparse in nature and hence may be falsely detected as defects by the low-rank decomposition model. A couple of techniques [16], [17] address this problem by integrating the total variation (TV) model into RPCA to improve the detection performance. However, the total variation model is solved through a convex surrogate, which can result in significant deviation from the original solution.
In addition, effective feature characterization of the fabric plays an important role in the performance of the above-mentioned low-rank based detection methods, because a good feature descriptor can effectively separate the background part from the sparse defect part. The above-mentioned RPCA based detection methods use traditional handcrafted feature descriptors, such as Gabor and HOG (Histogram of Oriented Gradients) features, to characterize the fabric texture. These feature descriptors are designed for a specific task and cannot adapt to texture changes. As the convolutional layers in a CNN resemble simple and complex cells in the human visual system, and the fully connected layers act like higher-level inference and decision making [18], a CNN is a powerful tool for automatically learning feature representations. Therefore, features extracted using CNNs are widely regarded as highly versatile and superior to traditional handcrafted feature descriptors. In general, the high-level convolutional layers abstract more semantic information, which is good for category classification but weak in shape and location, while the low-level and mid-level convolutional layers have a higher resolution and can generate sharp and detailed boundaries. Different convolutional layers therefore describe different characteristics. In the proposed method, the fabric texture is characterized by fusing multilevel deep features, which improves the capability of the representation to distinguish the defect from the background.
In order to learn a robust feature representation and to cope with noise contamination in fabric defect detection, a method based on multilevel deep feature fusion and non-convex total variation regularized RPCA (NTV-RPCA) is proposed. The main contributions of this article can be summarized as follows: 1) Multilevel deep features are extracted to characterize the complex and diverse fabric texture. 2) RPCA is used to separate the defect from the background. 3) A non-convex total variation regularization term is integrated into the RPCA model to detect fabric defects, which reduces noise and improves accuracy. 4) A novel fusion strategy based on RPCA is presented to improve the detection results.
The remainder of this article is organized as follows: Section II briefly reviews the related work for fabric defect detection. Section III presents the details of the proposed method. In Section IV, the performance of the proposed method is comprehensively evaluated. Section V summarizes our research work.

II. RELATED WORK

A. FABRIC DEFECT DETECTION
Existing fabric defect detection methods based on machine vision can be roughly divided into two categories according to the types of fabric, i.e., the methods for unpatterned fabric images with simple texture and the techniques for the patterned fabric images with complex texture. The methods for the plain and twill fabric include statistical-based methods, spectral analysis methods, model-based methods and dictionary learning-based methods.
The main idea of the statistical-based methods is to divide the test image into blocks and to assess each block by measuring its statistical properties, i.e., texture features such as the texton feature [2]. Image blocks containing defects then exhibit different statistical properties. However, it is difficult to extract appropriate statistical features, and such methods are sensitive to scale changes in the fabric texture.
Spectral analysis methods transform the test image into a spectrum domain, and then detect the defect by computing the energy of filter responses [3], [4]. The performance of these methods depends on the filter banks selected and the type of the defect.
The model-based methods detect fabric defects by modeling and parameter estimation. In [5], a Markov random field (MRF) is used as the texture model and a Karhunen-Loeve transform is proposed for defect detection. Susan and Sharma [6] used a non-extensive entropy calculated from a Gaussian mixture model as the regularity index to detect defects. However, the model-based methods are difficult to implement because of their high computational complexity.
Dictionary learning based methods use labeled samples to train defect classifiers. Tong et al. [7] proposed a non-locally centralized sparse representation model to estimate the non-defective class. However, the accuracy of detection is affected by the sparse coding model, and small defects are difficult to detect. Sezer et al. [8] used an independent component approach to detect defects on raw textile. However, the method does not work well for twill and plain weave fabric.
All the methods described above were developed for, and can work on, certain plain and twill fabric images. However, they cannot be used for patterned fabric because of the complexity and sophistication of its texture. Recently, several approaches have been proposed for patterned fabric, such as the Elo rating method, the motif-based method, and the convolutional matching pursuit (CMP) dual-dictionary method.
Tsang et al. [9] proposed an Elo rating (ER) method that detects patterned fabric defects by the idea of sportsmanship, i.e., fair matches between the image blocks in a test image. The test image is divided into image blocks of a standard size, the matches between the blocks update an Elo point matrix, and the image blocks are then classified as defective or defect-free. However, the performance of the method relies on the partition size and the number of randomly located partitions.
The motif-based method [10] assumes that the patterned texture can be divided into lattices. The symmetry property of motifs is then used to calculate the energy of moving subtraction and its variance among different motifs. By counting the distribution of defective and defect-free patterns, the threshold for discriminating between them can be determined. However, this method cannot detect defects that are smaller than the partitioned lattices.
Jing et al. [11] proposed a convolutional matching pursuit (CMP) dual-dictionary method for patterned fabric defect detection. A set of defect-free image blocks is selected as a sample set by a sliding window. Subsequently, the dual-dictionary and sparse coefficients of the defect-free sample set are obtained via CMP and the K-singular value decomposition (K-SVD). The projections of defect-free and defective fabric images on the dual-dictionary are then used as features for defect detection. Finally, the test results are determined by comparing the distances between the features to be measured. However, this method requires a set of defect-free images and is sensitive to the size of the blocks.
Based on the above analysis, the methods based on machine vision are more objective, stable and reliable than the traditional manual method. However, when applied to unpatterned fabric, these methods perform unstably and are sensitive to the selection of parameters. The methods designed for patterned fabric images usually require defect-free samples for training or a suitable template; thus they lack adaptability and are restricted to specific types of fabric.

B. CNN BASED FABRIC DEFECT DETECTION
Deep convolutional neural networks have proven to be a powerful tool for image classification, localization and detection, promoted by the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) [19]. Recently, deep learning has also been used for fabric defect localization [20], [21]. For fabric defect detection, i.e., defect segmentation, which tackles pixel-wise object instance segmentation, little work has been reported, because collecting large numbers of fabric defect samples with pixel-level labels, especially for rare types of fabric, is extremely difficult in practice. The key to the success of deep learning lies in a well-designed convolutional neural network. Such a network can be regarded as a feature extractor with stronger versatility and portability than traditional handcrafted features. Moreover, a key advantage of deep learning is that these layers of features are not designed by human engineers; they are learned from data using a general-purpose learning procedure, which reduces the dependency on domain-specific knowledge and the complex procedures needed in traditional feature engineering. Inspired by this, we employ the activations of a CNN as descriptors to represent fabric images.

C. RPCA BASED FABRIC DEFECT DETECTION
A series of RPCA based methods have been proposed for fabric defect detection. Li et al. [12] introduced dictionary learning into RPCA, i.e., low-rank representation (LRR), which retains certain edge and texture information and hence can accurately detect salient defects. Cao et al. [13] proposed prior knowledge guided least squares regression. In this work, an 8-dimensional texture feature is used to characterize the fabric image, and a local prior learnt from local texture features is incorporated into LRR to further guide and improve the detection. In [14], a feature descriptor, DERF, derived from the biological modeling of P ganglion cells is used to improve the representation of fabric images. Meanwhile, Laplacian regularization is integrated into LRR to further enlarge the gaps between defective regions and the background. In [15], a spatial pooling strategy is used to improve the discrimination ability of GHOG, an efficient second-order orientation-aware descriptor. The nuclear norm in RPCA is then replaced by a non-convex log-det surrogate, which improves the efficiency. In [36], a second-order multi-channel feature extracted by modeling P-type ganglion cells in the primate retina is proposed to characterize the fabric texture. Then, a joint low-rank decomposition method is used to model biological visual saliency and detect defects.
The performance of RPCA based fabric defect detection depends on the effectiveness of both the feature descriptor and the RPCA model. However, the above methods still use traditional handcrafted feature descriptors to characterize the fabric texture, which is not sufficient for fabric images with complex texture. In addition, the existing RPCA model may introduce additional noise into the sparse part. Meanwhile, traditional machine-vision methods cannot handle all types of fabric images, and CNN based methods currently address only defect classification or localization, owing to the lack of a large number of fabric defect samples with pixel-level labels. In order to overcome or alleviate these problems, we propose a new method based on multilevel deep feature fusion and NTV-RPCA in this work.

III. THE PROPOSED METHOD
The proposed method is based on multilevel deep feature fusion and NTV-RPCA and can be divided into five steps, as described in Fig. 2.
To begin with, multilevel deep features are extracted by a retrained VGG16 [22] and are employed to represent the fabric image. Then, the deep feature maps are partitioned into multiple regions by overlapping uniform partition, and the feature vectors of each region are aggregated to generate the deep feature matrix of the image. Subsequently, the RPCA model is used to separate the deep feature matrix into a redundant matrix representing the background and a sparse matrix representing the defect, with the non-convex total variation term integrated into the RPCA model to further improve the detection performance. Next, the multiple saliency maps generated from the various convolutional layers are fused via the RPCA model to combine their advantages and obtain a better result. Finally, the detection results are obtained by a threshold segmentation operation.

A. MULTILEVEL DEEP FEATURE EXTRACTION
Feature extraction is crucial for fabric defect detection based on the RPCA model. Traditional handcrafted feature descriptors require careful engineering and considerable expertise, and are usually devised only for a specific kind of image. In other words, they cannot adapt to fabric images with different texture patterns. As a powerful feature extraction method, convolutional neural networks can automatically learn hierarchical and representative features via a layer-to-layer successive propagation pipeline. As CNNs were originally inspired by biological neural networks, they are a natural choice for building a feature extractor for visual saliency. Moreover, deep features extracted by CNNs have been shown to exhibit stronger versatility and portability than traditional handcrafted features. Therefore, feature extraction based on CNNs is used in this article.
Feature extraction by CNNs requires a well-trained network, which in turn needs tens of thousands or even millions of labeled images. However, there is no public fabric defect database with enough labeled images to support training a new network. Since the ImageNet database has a large number of images, many of which have complex texture similar to fabric images, we can transfer a model pre-trained on the ImageNet database to fabric images [23]. Specifically, we adopt a typical transfer learning approach that retrains only the last fully connected layer of a pre-trained model.
Considering that the VGG16 network offers higher expansibility than other CNN models, we adopt it as the pre-trained model to extract deep features from the input fabric images. The 13 activation layers of the VGG16 network, corresponding to 13 feature extractors, are each used to extract features. LeCun et al. [24] pointed out that the learned features in the first layer typically represent edges at particular orientations and locations in the image, like a Gabor filter bank extracting low-level features, while subsequent layers detect objects as combinations of these detailed parts. The features extracted from the deeper layers correspond to abstract features of the input image, including its semantic properties, and are suitable for locating the salient regions. The features at the shallow layers contain the spatial structural details and are suitable for locating boundaries. For fabric defect detection, the features extracted from the shallower layers are more important than those from the deeper layers.
Since the feature maps differ in size because of the convolution and pooling operations in VGG16, we resize each feature map to the size of the input image. Then, for each pixel, the deep feature is formed by concatenating the activations from the feature maps at the location of that pixel. Let $f_i$ be the feature vector of the $i$th pixel of the fabric image, where $i = 1, 2, \cdots, N \times N$ and $N \times N$ is the image size; $x_{il}$ denotes the activation taken from the $l$th resized feature map of a certain convolutional layer at the $i$th pixel. In order to construct the feature matrix, the deep feature maps are divided into image blocks of the same size $n_b \times n_b$. For each segment $R_k$, where $k = 1, 2, \cdots, N_b$ and $N_b$ is the number of image blocks, the mean $\bar{f}_k$ of the feature vectors within the segment is regarded as the feature of that image block:
$$\bar{f}_k = \frac{1}{|R_k|} \sum_{i \in R_k} f_i$$
Then the deep feature matrix is formed by stacking the features of all image blocks:
$$F = [\bar{f}_1, \bar{f}_2, \cdots, \bar{f}_{N_b}] \tag{3}$$
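The block-averaging step can be illustrated with a minimal NumPy sketch. It assumes the feature maps have already been resized to the image size, and it treats the function name, the `stride` parameter (used here to model the overlapping partition) and the column-wise stacking as illustrative assumptions rather than details taken from the method itself.

```python
import numpy as np

def block_mean_feature_matrix(feature_maps, block, stride):
    """Build a feature matrix with one column per image block.

    feature_maps: array of shape (C, N, N), the C resized feature maps of
                  one convolutional layer (C is a hypothetical name).
    block:        block side length n_b.
    stride:       step between blocks; stride < block gives overlap
                  (an assumption about how the partition overlaps).
    """
    channels, height, width = feature_maps.shape
    columns = []
    for top in range(0, height - block + 1, stride):
        for left in range(0, width - block + 1, stride):
            region = feature_maps[:, top:top + block, left:left + block]
            # Mean of the per-pixel feature vectors inside this block.
            columns.append(region.reshape(channels, -1).mean(axis=1))
    return np.stack(columns, axis=1)

# Toy input standing in for resized activations.
maps = np.random.default_rng(0).random((4, 32, 32))
F = block_mean_feature_matrix(maps, block=8, stride=4)
print(F.shape)  # (4, 49)
```

With a 32 x 32 map, 8 x 8 blocks and a stride of 4, there are 7 block positions per axis, hence 49 columns.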

B. SALIENCY INFERENCE WITH NON-CONVEX TV-RPCA
As fabric is woven from warp and weft in a particular way, and a defect breaks this regularity, the background of a fabric image can be considered highly redundant and lying in a low-dimensional subspace, while the defect differs from the background and usually occupies a small area, which implies sparsity. Therefore, RPCA can be used for the task of fabric defect detection, and it can be implemented in two steps, namely model construction and model solution.

1) MODEL CONSTRUCTION
The RPCA can be realized by solving the following minimization problem:
$$\min_{L,S} \|L\|_* + \gamma \|S\|_1 \quad \text{s.t.} \quad F = L + S \tag{4}$$
where $F$ is the deep feature matrix extracted from a certain convolutional layer, $L$ is a low-rank matrix representing the background, $S$ is a sparse matrix indicating the defective object, and $\gamma$ balances the effect of the two terms. $\|\cdot\|_*$ denotes the nuclear norm, defined as the sum of the singular values of the matrix, and $\|\cdot\|_1$ denotes the $l_1$ norm, defined as the sum of the absolute values of all entries. However, fabric images are easily contaminated by noise from camera sensors and background clutter. The noise mainly consists of Gaussian or impulse noise, which is also sparse; thus, if we use the traditional RPCA in (4), both the defects and the noise tend to be separated into the matrix $S$, making the defects difficult to detect. Therefore, effective separation of defects and noise is a great challenge for fabric defect detection based on RPCA.
As an effective approach for denoising images and videos, the total variation norm (TV-norm) is able to suppress discontinuous changes, preserve edges and promote spatially piecewise smoothness. Inspired by this, we integrate the TV-norm into the RPCA model to detect defects and denoise simultaneously. The model is denoted as TV-RPCA and can be described as follows:
$$\min_{L,S} \|L\|_* + \gamma \|S\|_1 + \beta \|S\|_{TV} \quad \text{s.t.} \quad F = L + S \tag{5}$$
where $\|\cdot\|_{TV}$ is the total variation regularization and $\beta$ is a weighting parameter whose role is the same as that of $\gamma$. The convex total variation regularization $\|S\|_{TV}$ has two forms: the isotropic form $\|S\|_{TV}^{iso}$ and the anisotropic form $\|S\|_{TV}^{ani}$, defined as $\|DS\|_{2,1}$ and $\|DS\|_1$ respectively:
$$\|S\|_{TV}^{iso} = \|DS\|_{2,1} = \sum_i \|D_i u\|_2 \tag{6}$$
$$\|S\|_{TV}^{ani} = \|DS\|_1 = \sum_i \|D_i u\|_1 \tag{7}$$
where $D_x$ and $D_y$ denote the first-order forward finite-difference operators along the horizontal and vertical directions respectively. $D_i = [D_x; D_y]$ represents a two-row matrix formed by stacking the $i$th rows of $D_x$ and $D_y$, and $D_i u$, with $u = \mathrm{Vec}(S)$, denotes the first-order difference of $S$ at pixel $i$ in both the horizontal and vertical directions.
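As a small illustration of the two convex TV forms, the following NumPy sketch evaluates the anisotropic and isotropic total variation of a matrix using forward differences. The zero difference at the last row and column is an assumed boundary convention, and the function names are hypothetical.

```python
import numpy as np

def forward_differences(S):
    """First-order forward differences; the last column/row is left at zero
    (an assumed boundary convention)."""
    S = np.asarray(S, dtype=float)
    dx = np.zeros_like(S)
    dy = np.zeros_like(S)
    dx[:, :-1] = S[:, 1:] - S[:, :-1]   # horizontal direction
    dy[:-1, :] = S[1:, :] - S[:-1, :]   # vertical direction
    return dx, dy

def tv_anisotropic(S):
    """l1 norm of the stacked differences: sum of |dx| + |dy| per pixel."""
    dx, dy = forward_differences(S)
    return np.abs(dx).sum() + np.abs(dy).sum()

def tv_isotropic(S):
    """l2,1 norm of the stacked differences: sum of sqrt(dx^2 + dy^2)."""
    dx, dy = forward_differences(S)
    return np.sqrt(dx ** 2 + dy ** 2).sum()

# A single bright pixel on a flat background.
spike = np.zeros((3, 3))
spike[1, 1] = 1.0
print(tv_anisotropic(spike), tv_isotropic(spike))
```

For the single-pixel example, the anisotropic value is 4 and the isotropic value is $2 + \sqrt{2}$, which shows how the two forms weigh diagonal changes differently.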
Among approaches for solving TV-norm problems, Split Bregman iteration [26] transforms a constrained optimization problem into a series of unconstrained ones and achieves the highest efficiency. However, this method relies on convex regularization, with the $l_{2,1}$ norm for the isotropic case and the $l_1$ norm for the anisotropic case. The common drawback of such a relaxation is that the solution may deviate significantly from the authentic one. In order to improve the solution accuracy, we integrate non-separable non-convex TV regularization into the TV-RPCA model for both the anisotropic and isotropic cases, and we call the result non-convex total variation regularized RPCA (NTV-RPCA). It can be solved by minimizing the following objective function:
$$\min_{L,S} \|L\|_* + \gamma \|S\|_1 + \beta \|S\|_{NTV} \quad \text{s.t.} \quad F = L + S \tag{8}$$
where $\|\cdot\|_{NTV}$ is the non-convex TV regularization based on the Moreau envelope and the minimax-concave penalty [27], [28].

2) MODEL OPTIMIZATION
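In order to split the energy function, we introduce an auxiliary variable $J = S$, so that the NTV-RPCA problem becomes
$$\min_{L,S,J} \|L\|_* + \gamma \|S\|_1 + \beta \|J\|_{NTV} \quad \text{s.t.} \quad F = L + S,\; J = S \tag{9}$$
Thereafter, the alternating direction method of multipliers (ADMM) is employed to solve problem (9). The augmented Lagrangian function of problem (9) is written as follows.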
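Under the usual ADMM sign convention, which is an assumption here, and with the NTV term placed on the auxiliary variable $J$, the augmented Lagrangian for the two constraints $F = L + S$ and $J = S$ would take the form:

```latex
\mathcal{L}(L, S, J, Y_1, Y_2, \mu) = \|L\|_* + \gamma \|S\|_1 + \beta \|J\|_{NTV}
  + \langle Y_1, F - L - S \rangle + \langle Y_2, J - S \rangle
  + \frac{\mu}{2} \left( \|F - L - S\|_F^2 + \|J - S\|_F^2 \right)
```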
where $Y_1$ and $Y_2$ are the Lagrange multiplier matrices, $\langle \cdot, \cdot \rangle$ represents the inner product, $\|\cdot\|_F$ denotes the Frobenius norm and $\mu$ is a positive penalty parameter. In addition to the Lagrange multipliers, there are three variables, i.e., $L$, $S$ and $J$. It is difficult to optimize over them jointly, so we solve the problem approximately by minimizing over one variable with the others fixed. The details of the iteration are as follows.

a: UPDATING L WITH THE OTHER VARIABLES FIXED
The solution of $L$ at the $(k+1)$th iteration, $L^{k+1}$, is obtained by solving sub-problem (11), which can be solved directly by singular value thresholding (SVT) [29].
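For reference, singular value thresholding is the closed-form minimizer of $\tau \|L\|_* + \frac{1}{2}\|L - M\|_F^2$. The sketch below is a generic NumPy version of the operator; the function name is hypothetical, and how $M$ and $\tau$ are assembled from the other ADMM variables is left to the sub-problem itself.

```python
import numpy as np

def singular_value_threshold(M, tau):
    """Shrink the singular values of M by tau and rebuild the matrix."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)
    # Scale the columns of U by the shrunken singular values.
    return (U * s_shrunk) @ Vt

# Singular values 3 and 1 shrink to 1 and 0, so the result has rank 1.
M = np.diag([3.0, 1.0])
L_new = singular_value_threshold(M, tau=2.0)
print(np.round(L_new, 6))
```

For a nuclear-norm term with unit weight and a quadratic penalty $\mu/2$, the threshold is $\tau = 1/\mu$.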

b: UPDATING S WITH THE OTHER VARIABLES FIXED
The solution of $S$ at the $(k+1)$th iteration, $S^{k+1}$, is obtained by solving sub-problem (12), which can be solved directly by soft thresholding (ST) [30].
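The soft-thresholding operator itself is the element-wise minimizer of $\tau |x| + \frac{1}{2}(x - m)^2$. The following NumPy sketch shows the generic operator only; the function name is hypothetical, and the exact argument and threshold used in the S update depend on the sub-problem.

```python
import numpy as np

def soft_threshold(M, tau):
    """Element-wise shrinkage: move each entry toward zero by tau."""
    M = np.asarray(M, dtype=float)
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

values = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
shrunk = soft_threshold(values, tau=1.0)
print(shrunk)
```

Entries with magnitude below the threshold are set to zero, which is what makes the recovered matrix sparse.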

c: UPDATING J WITH THE OTHER VARIABLES FIXED
The solution of $J$ at the $(k+1)$th iteration, $J^{k+1}$, is obtained by solving sub-problem (13). For convenience of description, this non-convex TV-norm (NTV) denoising model (13) can be written in a form in which $\phi_{\alpha,q}(\cdot) := \|\cdot\|_{NTV}$ is the minimax-concave penalty function of (6) or (7), and $q$ denotes either the $l_1$ or the $l_{2,1}$ norm, corresponding to the anisotropic and isotropic cases respectively. The problem is convex when $0 \le \alpha \le 1/\lambda$, and it reduces to the standard TV-norm denoising model when $\alpha = 0$. This sub-problem can be solved by forward-backward splitting (FBS); details of FBS can be found in [31]. The iterations are repeated until the stop condition is reached. The procedure for minimizing the augmented Lagrangian function of (9) using ADMM is summarized in Algorithm 1.

Algorithm 1 Solving NTV-RPCA by ADMM
Input: fusion feature matrix $F$; parameters $\gamma > 0$, $\beta > 0$; …$25/\|F\|_2$, $\mu_{\max} = \mu_0 10^{7}$, $\rho = 1.5$, $k = 0$, $tol = 10^{-6}$
while not converged do
1) Update $L^{k+1}$ using (11);
2) Update $S^{k+1}$ using (12);
3) Update $J^{k+1}$ using (13);
4) Update the Lagrange multipliers $Y_1^{k+1}$, $Y_2^{k+1}$ using (15) and (16)

The higher the saliency score $m(I_j)$, the higher the probability that the image block belongs to a defect. Then, the corresponding saliency map $m$ is generated according to the spatial positions of the blocks.
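One plausible way to turn the sparse matrix into a saliency map is sketched below in NumPy. The score formula is not reproduced in the text above, so the sketch assumes that the score of block $I_j$ is the $l_2$ norm of the $j$th column of $S$, that blocks are non-overlapping and stored in row-major order, and that scores are min-max normalized; all of these, and the function name, are assumptions made for illustration.

```python
import numpy as np

def saliency_map_from_sparse(S, grid_shape, block):
    """Map per-block scores back to pixel positions.

    S:          sparse matrix with one column per image block.
    grid_shape: (rows, cols) of the block grid (assumed row-major order).
    block:      block side length in pixels (non-overlapping, assumed).
    """
    scores = np.linalg.norm(S, axis=0)           # assumed score: column l2 norm
    grid = scores.reshape(grid_shape)
    span = grid.max() - grid.min()
    grid = (grid - grid.min()) / span if span > 0 else np.zeros_like(grid)
    # Repeat each block score over its block x block pixel area.
    return np.kron(grid, np.ones((block, block)))

# Four blocks in a 2 x 2 grid; only the third block carries sparse energy.
S = np.zeros((3, 4))
S[:, 2] = [3.0, 4.0, 0.0]
m = saliency_map_from_sparse(S, grid_shape=(2, 2), block=2)
print(m)
```

With overlapping blocks, the scores of all blocks covering a pixel would have to be combined, for example by averaging.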

D. MULTILEVEL SALIENCY FUSION
For each convolutional layer, one saliency map is obtained with the above process, so multiple saliency maps are obtained for the different convolutional layers. As analyzed above, a saliency map generated from any single convolutional layer may fail to capture the intrinsic salient defect regions. In order to achieve reliable saliency detection, we propose an adaptive fusion strategy via RPCA.
To compute the adaptive fusion weights, each saliency map, which is a data matrix, is first converted to a row vector, and all the vectors are then stacked to form a saliency indication matrix $\hat{F}$. Because the saliency maps generated from different convolutional layers are similar, $\hat{F}$ is of low rank. We can model this using RPCA, decomposing $\hat{F}$ into a low-rank part and a sparse part $\hat{S}$, where each row $\hat{S}_i$ of $\hat{S}$ represents the disparity of the corresponding saliency map. The larger $\hat{S}_i$ is, the more inconsistent this saliency map is with the others, and so the corresponding saliency map $m_i$ should be given a relatively small weight $\omega_i$, which is calculated from $\hat{S}_i$.
Then the final saliency map is obtained from the multilevel saliency maps using these weights.
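The weighting idea can be sketched in NumPy as follows. The weight formula is not reproduced in the text above, so this sketch assumes an exponential weight $\omega_i \propto \exp(-\|\hat{S}_i\|_2)$, normalized to sum to one, and a weighted sum of the maps; the sparse part $\hat{S}$ is taken as an input produced by any RPCA solver. The function name and both formulas are assumptions.

```python
import numpy as np

def fuse_saliency_maps(maps, S_hat):
    """Weighted fusion of K saliency maps.

    maps:  array-like of shape (K, H, W).
    S_hat: sparse part of the stacked maps, shape (K, H * W).
    """
    maps = np.asarray(maps, dtype=float)
    disparity = np.linalg.norm(S_hat, axis=1)    # one value per map
    weights = np.exp(-disparity)                 # assumed weight formula
    weights /= weights.sum()
    fused = np.tensordot(weights, maps, axes=1)  # weighted sum over K
    return fused, weights

# Two consistent maps and one outlier with a large sparse residual.
maps = np.stack([np.full((2, 2), 0.2), np.full((2, 2), 0.2), np.ones((2, 2))])
S_hat = np.zeros((3, 4))
S_hat[2] = 10.0
fused, weights = fuse_saliency_maps(maps, S_hat)
print(weights)
```

The outlier map receives a weight close to zero, so the fused map stays near the value shared by the two consistent maps.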

E. SALIENCY MAP SEGMENTATION
Since the defective regions always occupy a small part of the entire saliency map, a threshold operation can be used to segment it. The upper and lower bounds of the threshold are estimated from a constant $c$ and the mean $\mu$ and standard deviation $\sigma$ of the pixel values in the saliency map. Finally, the segmentation result is obtained as a binary image $\hat{M}$ that locates the defect regions.
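A minimal NumPy sketch of such a threshold operation is given below. The exact equation is not reproduced in the text above, so the sketch assumes bounds of the form $\mu \pm c\sigma$ and marks pixels outside this interval as defective; the function name and this rule are assumptions.

```python
import numpy as np

def segment_saliency(saliency, c):
    """Binary segmentation with assumed bounds mu - c*sigma and mu + c*sigma."""
    mu = saliency.mean()
    sigma = saliency.std()
    lower, upper = mu - c * sigma, mu + c * sigma
    # Pixels outside [lower, upper] are marked as defect (1), others as 0.
    return ((saliency < lower) | (saliency > upper)).astype(np.uint8)

# A flat saliency map with a small bright region.
saliency = np.zeros((10, 10))
saliency[4:6, 4:6] = 1.0
mask = segment_saliency(saliency, c=0.27)
print(mask.sum())  # 4
```

In this toy example the mean is 0.04 and the standard deviation is about 0.196, so the background value 0 falls inside the interval and only the four bright pixels are marked.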

A. EXPERIMENTS SETUP

1) DATASET
Two fabric databases are selected for a comprehensive evaluation of the proposed method. One is the TILDA fabric image dataset [32], constructed by the workgroup on texture analysis of the German Research Council, which includes 284 plain or twill fabric images with simple textures. The other is from the Research Associate of Industrial Automation Research Laboratory, Department of Electrical and Electronic Engineering, Hong Kong University. It mainly includes patterned fabric images with complex texture from star-, box- and dot-patterned fabrics. Among them, the star-patterned fabric database contains 25 defect-free and 25 defective images, the box-patterned fabric database contains 30 defect-free and 26 defective images, and the dot-patterned fabric database contains 30 defect-free and 30 defective images. Note that only the patterned fabric images have corresponding ground-truth images, which are treated as the standard criterion.

2) IMPLEMENTATION DETAILS
First, we adapt VGG16 to our fabric databases by replacing the original softmax layer with a 2-way output, namely defective or defect-free.
Then, transfer learning is carried out by stochastic gradient descent with a batch size of m = 200, a momentum of 0.9, and a weight decay of 0.0001. The learning rate is initially set to 0.0001 and is decreased by a factor of 3 when the validation set accuracy stabilizes. For the saliency inference model, the parameters $\gamma$ and $\beta$ are empirically set to 0.002 and 0.01 respectively, and the constant $c$ in the threshold operation is set to 0.27. All parameters are kept fixed for all the experiments to demonstrate the robustness and stability of our method. The simulation is performed in MATLAB R2018b, running on a PC with an i7-8750H CPU and accelerated by an NVIDIA GeForce GTX 1080 GPU.

3) EVALUATION CRITERIA
To perform a comprehensive evaluation, four statistical quantities are introduced: true positive (TP), true negative (TN), false positive (FP) and false negative (FN). True positive is the number of defective periodic blocks identified as defective; true negative is the number of defect-free periodic blocks identified as defect-free; false positive is the number of defect-free blocks identified as defective; and false negative is the number of defective blocks identified as defect-free.
Based on the above quantities, the following evaluation criteria are adopted in this article: accuracy ACC = (TP + TN)/(TP + FN + FP + TN), true positive rate TPR = TP/(TP + FN), false positive rate FPR = FP/(FP + TN), positive predictive value PPV = TP/(TP + FP) and negative predictive value NPV = TN/(TN + FN). Moreover, curve metrics, including receiver operating characteristic (ROC) curves and precision-recall (PR) curves, are also reported, together with the AUC (area under the ROC curve) score. It should be noted that, because the TILDA fabric database lacks ground truth, the quantitative evaluation is conducted only on the patterned fabric databases.
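The five block-level criteria follow directly from the four counts, as the short Python sketch below shows. The function name and the choice to return NaN for an empty denominator are illustrative assumptions.

```python
def detection_metrics(tp, tn, fp, fn):
    """Compute ACC, TPR, FPR, PPV and NPV from block-level counts."""
    def ratio(num, den):
        # Return NaN instead of raising when a denominator is empty.
        return num / den if den else float("nan")

    return {
        "ACC": ratio(tp + tn, tp + fn + fp + tn),
        "TPR": ratio(tp, tp + fn),
        "FPR": ratio(fp, fp + tn),
        "PPV": ratio(tp, tp + fp),
        "NPV": ratio(tn, tn + fn),
    }

# Made-up counts, used only to exercise the formulas.
metrics = detection_metrics(tp=40, tn=50, fp=5, fn=5)
print(metrics["ACC"])  # 0.9
```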

B. ABLATION STUDY 1) COMPARISONS OF DIFFERENT CONVOLUTION LAYERS
The activations of each convolutional layer are employed to form deep feature. The features derived from different convolutional layers range from Conv1-1 to Conv5-3, which are used to characterize the multilevel deep features. Among them, the shallow layers of VGG16 act like a Gabor filter bank, which can extract low-level contrast information. The deeper layers with receptive field of entire image represent high-level semantic information, but have low discriminability for pixels. Due to the limited space, we will only show the saliency maps generated from the top 9 of convolutional layers, as presented in Fig. 3. The first column is the original image; the second column to the last column are the generated saliency map using the Conv1-1 to Conv4-2, respectively.  It can be seen that the shallowest and the deepest convolution layers shown in Fig.3 yield the worse results, while the intermediate layers achieve better results. Therefore, we can conclude that the layers higher than Conv4-2 cannot generate the good detection result. We also find that, from different convolution layer, the best result for different image is generated. For example, the best detection results for the first image come from Conv3-3, but the best results for the last image come from Conv3-1.
In order to perform a quantitative evaluation, the PR curves of the different convolutional layers on the patterned fabric database are shown in Fig. 4. From this figure, we can see that Conv2-2 achieves the best performance, and the intermediate layers outperform the other layers. This verifies the discussion in Section III-A that shallow and intermediate layers are more important than deep layers for characterizing the fabric texture.
Through quantitative and qualitative experiments, we can see that no single convolutional layer generates the best detection result for all kinds of fabric images. The results of the individual layers are complementary, so different convolutional layers should be fused to improve the detection results.

2) COMPARISONS WITH HANDCRAFTED FEATURES
In this section, the deep features extracted by VGG16 are compared with frequently used low-level handcrafted features, such as Gabor, DAISY, HSOG and LBP. We directly use the ROC curve to perform the quantitative evaluation of these features, and we use only the Conv2-2 layer feature for the comparison. The detection results are shown in Fig. 5. It can be seen that even a single-layer feature is significantly superior to all the handcrafted features, which demonstrates the substantial superiority of deep features. In addition, we adopt the AUC to evaluate the detection results, as shown in Table 1. The deep feature achieves better performance than the handcrafted features. Because only the single-layer feature Conv2-2 is used here, it is expected that the performance can be further improved by fusing multilevel deep features.
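For reference, a ROC curve and its AUC can be obtained from a saliency map by sweeping a threshold over the normalized saliency values. The sketch below is only one possible way to do this; the function names, the 256-level threshold grid and the toy data are assumptions rather than the procedure used to produce the reported curves, and the ground truth is assumed to contain both defect and background pixels.

```python
import numpy as np

def roc_curve_points(saliency, gt, num_thresholds=256):
    """Sweep a threshold over a saliency map and return (FPR, TPR) arrays.

    Assumes gt contains at least one defect pixel and one background pixel.
    """
    s = np.asarray(saliency, dtype=float).ravel()
    g = np.asarray(gt, dtype=bool).ravel()
    span = s.max() - s.min()
    # Normalize saliency to [0, 1]; a constant map becomes all zeros.
    s = (s - s.min()) / span if span > 0 else np.zeros_like(s)
    pos, neg = g.sum(), (~g).sum()
    fpr, tpr = [0.0], [0.0]              # start the curve at the origin
    # Decreasing thresholds make FPR and TPR non-decreasing.
    for t in np.linspace(1.0, 0.0, num_thresholds):
        pred = s >= t
        tpr.append(np.sum(pred & g) / pos)
        fpr.append(np.sum(pred & ~g) / neg)
    return np.array(fpr), np.array(tpr)

def auc_trapezoid(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

# Toy data (illustrative only): a map that ranks defect pixels highest
# and the same map with its ranking reversed.
gt = np.array([0, 0, 1, 1])
fpr_good, tpr_good = roc_curve_points(np.array([0.1, 0.2, 0.8, 0.9]), gt)
fpr_bad, tpr_bad = roc_curve_points(np.array([0.9, 0.8, 0.2, 0.1]), gt)
auc_good = auc_trapezoid(fpr_good, tpr_good)
auc_bad = auc_trapezoid(fpr_bad, tpr_bad)
print(auc_good, auc_bad)
```

A saliency map that separates defect from background perfectly gives an AUC of 1, while a map with the ranking reversed gives 0, so a higher AUC indicates a more discriminative feature.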

3) COMPARISONS OF SALIENCY INFERENCE MODELS
In Section III-B, a saliency inference model, NTV-RPCA, is proposed to generate the defect saliency map. To validate its effectiveness, we compare five configurations: 1) original RPCA; 2) isotropic TV regularized RPCA (ITV-RPCA); 3) anisotropic TV-norm based RPCA (ATV-RPCA); 4) non-convex isotropic TV regularized RPCA (NITV-RPCA); 5) non-convex anisotropic TV-norm based RPCA (NATV-RPCA). The last two are the isotropic and anisotropic variants of NTV-RPCA. For the sake of a fair comparison, we consistently use Conv2-2 of VGG16 to extract the features. The results are shown in Fig. 6, where the first row is the original image and the last five rows correspond to the saliency maps generated by RPCA, ITV-RPCA, ATV-RPCA, NITV-RPCA and NATV-RPCA, respectively.
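To make the distinction between the isotropic and anisotropic variants concrete, the sketch below evaluates the two standard (convex) discrete TV norms of a 2-D map using forward differences. It assumes the common textbook definitions with zero differences at the last row and column; the exact discretization and the non-convex surrogate used in NTV-RPCA are those given in Section III-B and are not reproduced here. All function names are hypothetical.

```python
import numpy as np

def forward_differences(x):
    """Horizontal and vertical forward differences, zero at the far border."""
    x = np.asarray(x, dtype=float)
    dh = np.zeros_like(x)
    dv = np.zeros_like(x)
    dh[:, :-1] = x[:, 1:] - x[:, :-1]   # difference to the right neighbour
    dv[:-1, :] = x[1:, :] - x[:-1, :]   # difference to the lower neighbour
    return dh, dv

def anisotropic_tv(x):
    """Anisotropic TV: sum over pixels of |dh| + |dv|."""
    dh, dv = forward_differences(x)
    return float(np.sum(np.abs(dh) + np.abs(dv)))

def isotropic_tv(x):
    """Isotropic TV: sum over pixels of sqrt(dh^2 + dv^2)."""
    dh, dv = forward_differences(x)
    return float(np.sum(np.sqrt(dh ** 2 + dv ** 2)))

# Toy map (illustrative only): a single isolated bright pixel, which is
# the kind of spot noise that a TV penalty discourages in the sparse part.
spot = np.array([[0.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0],
                 [0.0, 0.0, 0.0]])
flat = np.zeros((3, 3))
tv_aniso = anisotropic_tv(spot)
tv_iso = isotropic_tv(spot)
print(tv_aniso, tv_iso, anisotropic_tv(flat))
```

Both norms are zero for a constant map and grow with isolated spots, which is why penalizing them in the sparse component suppresses scattered noise while a compact defect region is penalized only along its boundary.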
From Fig. 6, it can be concluded that the TV-regularized models (ITV-RPCA and ATV-RPCA) consistently improve on the results of RPCA on both databases, owing to the integration of total variation. Furthermore, thanks to the non-convex TV term, NTV-RPCA significantly reduces noise and enhances the outline of defects compared with RPCA and TV-RPCA; this is especially true of NATV-RPCA on the patterned fabric image database. To further demonstrate the effectiveness of the proposed model, the ROC and PR curves of the five configurations on the patterned fabric dataset are shown in Fig. 7. We can conclude that NATV-RPCA and NITV-RPCA perform better than the other three configurations, with the curve of NATV-RPCA slightly higher than that of NITV-RPCA. Both qualitative and quantitative experiments confirm the effectiveness of the non-convex TV regularization term and show that NATV-RPCA is more suitable for fabric defect detection.

4) COMPARISONS WITH THE STATE OF THE ARTS
In the previous sections, the ablation study examined the contributions of image representation and saliency inference. In this section, we compare the detection results of our method with state-of-the-art methods, including HOG [34], PGLSR [13], ER [9], LSF-GSA [35] and SOMC [36].
A subjective comparison is shown in Fig. 8, where the first column is the original fabric image, and the second to the seventh columns are the detection results generated by HOG, PGLSR, ER, LSF-GSA, SOMC and our proposed method. The eighth column shows the segmentation results generated by our method, and the last column shows the ground-truth images. It can be observed that the results of the HOG method are highly fragmented, especially in Fig. 8 (a) and (b), so this method does not work for patterned fabric images. The PGLSR method can effectively detect the defect position in patterned fabric, but at the cost of inaccurate descriptions of the defect shape. The ER method can not only locate the defect position but also retain some contour information. However, it requires defect-free fabric images as matching samples, which makes it a supervised approach. LSF-GSA generates the saliency map by incorporating local texture features with global analysis, but its detection results contain a large amount of spot noise. SOMC, which is based on multi-channel feature matrix extraction and joint low-rank decomposition, can effectively detect the defect position and outline, but part of its detection results for box- and dot-patterned fabric images are discontinuous and noisy. Our method not only highlights the position of defective regions but also outlines the shape of defects for all types of fabric images, and its segmentation results closely match the ground-truth images. In addition, the proposed method detects large defects more effectively than the other methods, as shown in the first row of Fig. 8 (c).
Besides the methods compared in Fig. 8, the evaluation results of WGIS reported by [14] and TDVSM [37] are also included in the quantitative evaluation. The average evaluation results for each defect type of the star-, box- and dot-patterned fabric images are listed in Table 2, where the best result for each criterion is shown in bold. From this table, we can see that our proposed method performs better than the existing methods in most cases on the three patterned fabric datasets. Even where our method scores slightly lower than others on a few evaluation criteria, the defect contours it produces are more complete and more continuous than those of the other methods, as shown in Fig. 8. In summary, qualitative and quantitative experiments verify the robustness and superiority of our proposed method.

V. CONCLUSION
In this article, we proposed a novel fabric defect detection method based on multilevel deep features and NTV-RPCA. Because handcrafted features cannot characterize the fabric texture comprehensively, multilevel deep features extracted by VGG16 are used to improve the image representation ability. In order to separate the defects effectively, RPCA is adopted to decompose the fabric images into background parts and salient defect parts. Meanwhile, a non-convex total variation regularization term is integrated into RPCA to keep the defect saliency map as free of noise as possible. In addition, the saliency maps generated from the multilevel deep features are fused to combine the advantages of all convolutional layers. We also compared the performance of the proposed approach with that of previous approaches, namely the HOG, PGLSR, ER, LSF-GSA and SOMC methods. The qualitative and quantitative experimental results demonstrate that our proposed algorithm is more effective than these state-of-the-art methods. Furthermore, the proposed algorithm provides a new solution for detecting surface defects in other industrial products.