Surface Defect Detection of Wet-Blue Leather Using Hyperspectral Imaging

Detecting surface defects on wet-blue leather is much more challenging than on raw-hide leather. Wet-blue leather, a semi-finished product of cowhide processing, turns blue and retains moisture after pre-treatment. At present, defect detection on wet-blue leather is mostly carried out manually, which is time-consuming and labor-intensive for professional inspectors. This paper is the first to use hyperspectral imaging (HSI) for pixel-level surface inspection of five wet-blue leather defects: brand masks, rotten grain, ruptures, insect bites, and scratches. A Hyperspectral Leather Defect Detection Algorithm (HLDDA) that combines Hyperspectral Target Detection (HTD) and Deep Learning (DL) techniques is proposed. For HTD, Weighted Background Suppression Constrained Energy Minimization (WBS-CEM) and WBS-Hierarchical CEM (WBS-hCEM) were developed; both use weighting to suppress the background and enhance the contrast between the target and the background. Experimental results showed that the overall performance of the WBS variants was better than that of the original CEM. For DL, 1D-Convolutional Neural Network (1D-CNN), 2D-UNet, and 3D-UNet architectures were designed to segment defect areas. To match the different characteristics of the defects, 1D-CNN emphasizes defects with spectral features, 2D-UNet emphasizes defects with spatial features, and 3D-UNet processes the spatial and spectral information in HSI simultaneously. The experimental results verified that the proposed HLDDA can effectively quantify and estimate the size of a defect, thereby accelerating inspection by professional inspectors and supporting the development of an automated leather grading system towards Industry 4.0.


I. INTRODUCTION
The leather industry is one of the important traditional industries. The produced leather is mainly supplied for downstream leather goods factories, which use leather as the raw material to make various leather goods, including leather shoes, bags, suitcases, gloves, belts, and sofas. The foremost raw material of the leather industry is cowhide. The skin peeled from the freshly slaughtered cattle is generally known as fur or rawhide. It turns blue after deoiling, degreasing, unhairing, and chroming. It is often called wet-blue leather because it contains moisture and is the semi-product of the cowhide processing procedure. Some marks are left on the leather surface during the growth of cattle, such as brand masks, rotten grain, ruptures, insect bites, healed scars (closed/open), and scratches, as shown in Fig. 1. Therefore, the products made from natural leather tend to retain these marks, which in turn, affect the leather grade [1]. At present, the common grading standard is SATRA [2], which contains six grades according to the usable area ratio of leather with Grade 6 being considered as unusable. Grades 1-6 are listed in Table 1. Leather is inspected manually; the inspectors must visually inspect the leather and mark the defects with chalk. Manual inspection is likely to cause fatigue and misrecognition; thus, the judgment result must be verified and approved by other inspectors. Therefore, a rapid, comprehensive, and noninvasive inspection method for leather has become an important issue.
The associate editor coordinating the review of this manuscript and approving it for publication was Emre Koyuncu.
As the leather surface is blue after pretreatment, the defects are less obvious than those on raw-hide leather. The major challenge in detecting and recognizing defects on wet-blue leather is that the specimens provide very limited spatial information, and the defects are generally difficult to confirm or detect with the naked eye using RGB images. Therefore, the image processing techniques of the traditional spatial domain may be inapplicable. In recent years, with the development of remote sensing instruments, hyperspectral imaging (HSI) has become an emerging technology and has been extensively used in the domains of geology [3], agriculture [4]- [7], global change [8], and national defense [9], with highly promising industrial potential [6], [10]- [13]. The hyperspectral sensor has 200 contiguous spectral bands, which enhances the spectral resolution. Hyperspectral sensors with very dense bands can be used for detecting, classifying, distinguishing, identifying, and quantifying micro-objects and substances. This study employed the spectral information of leather defects, instead of spatial information processing techniques, to effectively identify leather defects.
VOLUME 9, 2021 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
This study proposed a novel method that uses HSI to replace the traditional manual inspection of wet-blue leather. The method, known as the Hyperspectral Leather Defect Detection Algorithm (HLDDA), combines Hyperspectral Target Detection (HTD) and Deep Learning (DL) techniques to locate and quantify the defective areas of leather. Since the hyperspectral data volume is very large, a high data storage capacity is required, and reducing the data volume is also a topic worth exploring. In terms of band selection, this study limited the number of bands to 10, so that the defect position can be detected rapidly with the spectral information of only one target object. The experimental results showed that the WBS technique of HTD effectively suppressed background interference and achieved better results. In addition, this study proposed 1D-CNN, 2D-UNet, and 3D-UNet architectures based on DL. The 1D-CNN performs its analysis and quantification entirely on the spectral signals, and the 2D-UNet focuses on defects with spatial characteristics. As the HSI is a 3D image, the spectral and spatial information can be processed simultaneously by the 3D-UNet. The experimental results illustrated that the 3D-UNet has the best performance in detecting rupture, rotten grain, and scratch defects, whereas the 1D-CNN performs better in detecting insect bites and the 2D-UNet has the best result in detecting brand masks.
The HTD and DL are completely different concepts. The HTD separates the target from the background based on the concept of a matched filter and can detect the defect position rapidly. For DL, this study applied three different dimensions of convolution to extract the features and correlations from the spatial and spectral domain in HSI, thus, DL requires more prior information to train the neural network. HTD and DL both have their own merits and demerits. The experimental results of this paper can provide appropriate algorithms and spectral ranges, which can effectively quantify and estimate the size of various defects. Using HLDDA to detect defects can ensure the consistency of the inspection criteria, save time, and increase efficiency. The findings of this study can provide a reference for professional inspectors of image analysis, and for leather factories to develop an effective leather grading system. This technique will play an important role in the future development of leather grading towards Industry 4.0.

II. RELATED RESEARCH
Many previous studies have conducted experiments on defect classification for rawhide images. The shape features (e.g. length, width) and textural features (e.g. contrast, entropy) were selected according to the defect features and standard features of the leather surface. Then the Feedforward Neural Network (FNN) [14] was combined with a Decision Tree [15] to select the optimal attributes and classify the defects. Bong et al. [16] extracted the image features and located the defects. The extracted image features included color, shape, and textural features for Support Vector Machine (SVM) classification. Kwak et al. [17] used geometrical and statistical characteristics as characteristic sets in the defect classification process. A three-stage sequential decision-tree classifier was used to classify five kinds of defects. Liong et al. [18] used AlexNet classification and the U-Net segmentation method to detect three kinds of defect data. Each collected image was separated into 400 × 400 pixels and then classified. Liong et al. [19] used Mask R-CNN for automatic defect segmentation of tick bites. Each collected image was divided into 400 × 400 pixels, and a neural network was built using a sliding window. Villar et al. [20] used a Multilayer Perceptron (MLP) to detect four defects in wet-blue leather. The first-order statistics and contrast characteristics were extracted from the grayscale image and from the RGB and HSV (Hue, Saturation, Value) channels for training. Viana et al. [21] used Interaction Maps and Grey Level Co-occurrence Matrices (GLCM) to extract the attributes of samples as textural features. Several color features were also extracted. LIBSVM (a library for SVM) and SMO (sequential minimal optimization) were used for recognition. Pereira et al. [22] used GLCM, Local Binary Patterns (LBP), and the Structural Co-occurrence Matrix (SCM) to extract features and trained k-nearest neighbors (KNN), MLP, and SVM classifiers to detect defects.
Pistori et al. [23] used the software of the DTCOURO leather defect detection system to extract small samples from the defect and non-defect regions. The textural and color features were extracted from each sample, and the SVM and KNN were used to detect the defects in rawhide and wet-blue leather. Yeh and Perng [24] used image processing techniques to detect wet-blue leather defects and calculated the number of pixels in the unusable region. Aslam et al. [25] investigated various transfer learning strategies and integrated networks to improve the classification performance of leather defects on their Wet-blue Leather Image Dataset (WBLID). Although some deep learning-based methods have been applied to defect detection in recent studies [26]- [30], they are either not workable for wet-blue leather defect detection or may not be effective even if they can be applied to the defects in wet-blue leather.
Unlike most of the studies that used rawhide, this study focused on wet-blue leather, whose defects become less apparent and difficult to recognize with the naked eye after the dyeing process. The traditional image processing techniques may fail to detect those defects. This paper is the first to use HSI to detect defects at the pixel level, and its contributions are as follows:
1) Most of the existing studies used the color, shape, and texture of rawhide leather defects as the benchmark for defect detection, and then used classifiers such as SVM, KNN, and MLP to classify the images of each defect. Their evaluation results used the number of images as the unit. This study used HSI to combine spectral and spatial information and achieve pixel-level detection and segmentation on wet-blue leather. Since the grading of leather is defined by the size of the available area, the proposed HLDDA can effectively quantify the area of a defect and is more accurate for automated leather grading, which is great progress for the leather industry.
2) Past studies used deep learning for the image segmentation of rawhide leather, such as U-Net [18], which could only detect the more obvious defects (e.g., black line and wrinkle), and Mask R-CNN [19], which only had an overall accuracy of 70% on image segmentation. The method proposed in this study could detect five common defects on wet-blue leather and reached an accuracy of over 96% for each defect.
3) Different from other literature proposing one method to classify various defects, this study designed appropriate algorithms including matched filter based HTD, spectral based 1D-CNN, spatial based 2D-UNet, and spatial-spectral based 3D-UNet for various defects. The different characteristics were compared in HSI, which is a novel approach.
4) Different from other wet-blue leather literature using public data sets, this study created its own data set by collecting and establishing more than 20 HSI data sets with the assistance of the tannery, which can better reflect the actual defects handled by the leather factories.

III. MATERIALS AND METHODS
The hyperspectral sensors, leather defect types, and proposed algorithms are detailed as follows:

A. HYPERSPECTRAL SENSORS
The hyperspectral signal has a wide spectral range and a high spectral resolution [31]- [33], meaning that abundant information is hidden in each pixel. This study used the pushbroom hyperspectral sensors FX10 and FX17 of SPECIM in a hyperspectral system supplied by Isuzu Optics Corp. The wavelength ranges are 400∼1000 nm and 900∼1700 nm, respectively. The number of bands is 224, and the system is controlled by the software provided by Isuzu Optics Corp. Table 3 tabulates the specifications of the sensors. Before shooting, the data of the whiteboard (absolute reflection, reflectivity of 100%) and the black cloth (absolute absorption, reflectivity of 0%) in the dark box were recorded automatically. During shooting, the sample was completely enclosed and the linear light in the dark box was the only light source, as shown in Fig. 2. Finally, the sample reflectivity used in the experiments was calculated from the collected light intensity.
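The reflectivity calculation is not spelled out in the text. A minimal sketch, assuming the commonly used white/dark reference correction (raw − dark) / (white − dark), is given below; the function name, array layout, and toy values are hypothetical and are not taken from the acquisition software.

```python
import numpy as np

def calibrate_reflectance(raw, white, dark, eps=1e-8):
    """Relative reflectance from raw intensities using white and dark references.

    raw:   (rows, cols, bands) raw intensities of the sample
    white: (cols, bands) whiteboard reference frame (reflectivity taken as 100%)
    dark:  (cols, bands) black-cloth reference frame (reflectivity taken as 0%)
    """
    raw = np.asarray(raw, dtype=np.float64)
    white = np.asarray(white, dtype=np.float64)
    dark = np.asarray(dark, dtype=np.float64)
    span = np.maximum(white - dark, eps)           # guard against division by zero
    return np.clip((raw - dark) / span, 0.0, 1.0)  # references broadcast over rows


# Toy example (assumed values): 2 scan lines, 3 spatial pixels, 4 bands.
dark = np.full((3, 4), 100.0)
white = np.full((3, 4), 900.0)
raw = np.full((2, 3, 4), 500.0)
reflectance = calibrate_reflectance(raw, white, dark)
```

In this toy case every pixel lies halfway between the two references, so the calibrated value is 0.5 in every band.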

B. LEATHER SAMPLES
The skin peeled off the slaughtered cattle is known as fur, and it rots easily. To prevent it from rotting and losing hair, it is generally cured with salt. The salt-cured fur is called wet salted hide, which can generally be preserved for 3 to 5 months. The fur and wet salted hide are generally known as rawhide, which then turns blue after soaking, fleshing, liming, deliming, bating, pickling, and chroming. As it contains moisture, it is generally known as wet-blue leather. The defects on the surface after processing are less apparent than those on rawhide.
The skinner, Shan Been Jeou Ind. Co., Ltd., in Taichung, Taiwan provided nearly 20 sheets of wet-blue leather. Professional inspectors had identified the defects including brand mask, rotten grains, rupture, insect bites, and scratches, as shown in Fig. 3. For this study, the brand mask, rupture, and scratches were shot by the FX10 whereas the rotten grain and insect bite were shot by the FX17 camera. The collected images were cropped for the leather only (the original image contains the whiteboard and black cloth for calculating reflectivity). The dimensions of the cropped images are 350 × 750 pixels. The total number of pixels is 120,000 × 280,000.

1) BRAND MASK
Brand mask is a destructive sign on the cattle made by the livestock farm for the convenience of cattle management. Its feature is quite apparent but its color is similar to the leather.

2) ROTTEN GRAIN
Rotten grain is a bead-like defect formed on the leather surface as a result of defective preservation after the animal is slaughtered. There are micro craters in the defect, and the defect is not obvious in color images.

3) RUPTURE
Rupture is the destruction induced by folding the cowhide in preservation. It is apparent in color images, like broken skin.

4) INSECT BITE
An insect bite is the scar resulting from a mosquito or parasite bite during the growing period of cattle, including scars both before and after healing. It is very small and not obvious in color images, although craters can be seen on closer inspection, and its signal in HSI is similar to that of normal leather.

5) SCRATCHES
Scratches are the marks induced by a slight cut in cattle. In general, there is a black region around a scratch. This region is obvious in color images, but the cut itself is not.

C. HYPERSPECTRAL TARGET DETECTION (HTD)
HTD can detect targets both actively and passively. Active target detection looks for certain target information, and detects specific targets. These targets of interest can be obtained by the supervised or unsupervised method. This study used active target detection to detect specific defects.

1) CONSTRAINED ENERGY MINIMIZATION (CEM)
The CEM [34]- [40] is an active target detection algorithm. Among the numerous existing target detection algorithms, CEM is stable and excellent in sub-pixel detection. During target detection, the CEM algorithm is given only one spectral signature as the parameter d (desired signature) and requires no other prior knowledge (e.g. multiple targets of interest or the background). Users can therefore extract specific targets of interest and obtain detection results even when the other factors of the detection environment are unknown. Moreover, as many signals cannot be recognized or observed with the naked eye, the CEM uses the correlation matrix R to suppress the background and designs a finite impulse response (FIR) filter matched to d to highlight the target, which enhances both the detection ability and the execution efficiency. If there are n spectral signatures, they are denoted as {r_1, r_2, r_3, ..., r_{n−2}, r_{n−1}, r_n}. The sample correlation matrix and the CEM output for a pixel r are expressed as

$$R = \frac{1}{n}\sum_{i=1}^{n} r_i r_i^{T}, \qquad \delta_{CEM}(r) = \frac{d^{T} R^{-1} r}{d^{T} R^{-1} d}$$

2) WEIGHTED BACKGROUND SUPPRESSION CEM (WBS-CEM)
The WBS-CEM [40] was first proposed for RGB images. Its main concept is to use a different weight for each pixel in calculating the correlation matrix R, so that a new nonparametric correlation matrix is defined for feature extraction. The method is aimed at data with a non-Gaussian distribution, for which the average value of each class cannot represent the center of the class. The WBS-CEM uses a weight w_i for each pixel r_i when calculating the correlation matrix. The weight is based on the distance between the pixel and the target signature: the shorter the distance, the closer the pixel is to the target spectrum. Multiplying each pixel's contribution to the correlation matrix by this distance reduces the influence that spectral signatures similar to the target of interest have on the correlation matrix. The weighted correlation matrix can also be used as a component of other target detection algorithms to enhance their accuracy. The weight w_i can be computed from the Euclidean distance [41] or the Spectral Information Divergence [42]. The weighted correlation matrix then replaces R in the CEM filter.
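The CEM filter and its weighted variant can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the weight is taken to be the plain Euclidean distance to d, the weighted correlation matrix is normalized by the sum of the weights, a small diagonal loading term is added for numerical stability, and the synthetic data are invented for the example.

```python
import numpy as np

def cem(X, d, weights=None, ridge=1e-6):
    """CEM detector output for every pixel.

    X: (N, L) array, one spectral signature per row
    d: (L,) desired target signature
    weights: optional (N,) non-negative weights for the correlation matrix
             (None gives the ordinary CEM)
    ridge: relative diagonal loading for numerical stability (an assumption)
    """
    X = np.asarray(X, dtype=np.float64)
    d = np.asarray(d, dtype=np.float64)
    n, bands = X.shape
    if weights is None:
        R = X.T @ X / n
    else:
        w = np.asarray(weights, dtype=np.float64)
        R = (X * w[:, None]).T @ X / w.sum()   # weighted correlation matrix
    R = R + ridge * np.trace(R) / bands * np.eye(bands)
    Rinv_d = np.linalg.solve(R, d)
    filt = Rinv_d / (d @ Rinv_d)               # FIR filter with d^T filt = 1
    return X @ filt


def euclidean_weights(X, d):
    """Distance of each pixel to d: pixels close to the target get small weights."""
    X = np.asarray(X, dtype=np.float64)
    return np.linalg.norm(X - np.asarray(d, dtype=np.float64), axis=1)


# Synthetic scene (assumed): 500 background pixels, 5 noisy targets, 1 exact target.
rng = np.random.default_rng(0)
bands = 10
d = np.linspace(0.5, 1.5, bands)
background = rng.normal(1.0, 0.1, size=(500, bands))
targets = d + rng.normal(0.0, 0.01, size=(5, bands))
X = np.vstack([background, targets, d])

y_cem = cem(X, d)
y_wbs = cem(X, d, weights=euclidean_weights(X, d))
```

Because the filter is constrained so that its response to d equals one, the exact target pixel scores 1 in both variants, while the background energy is minimized. In the weighted variant, pixels resembling d contribute little to R, so the target is less likely to be suppressed as part of the background.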

3) HIERARCHICAL CEM (hCEM)
The CEM uses an FIR filter and the correlation matrix to suppress the background, but it does not perform well in some scenarios; hence, this paper also uses hCEM [43]. The method performs the first calculation according to the traditional CEM method. Let X = [x_1, x_2, ..., x_N] be the input image data consisting of N spectral signatures. At the k-th iteration, the CEM filter is computed from the current data and applied to obtain the k-th output y. Afterward, the output is converted into a weight q for each hyperspectral signal (Eq. 7), and each signal is multiplied by its weight to attenuate the signals that are not targets of interest, which gives the data for iteration k + 1 (Eq. 8). Then k ⇐ k + 1, the output of Eq. 8 is used in (6) and (8), and the procedure is repeated until the stopping criterion is satisfied. The output of the last iteration is the final detection result.
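The iteration described above can be sketched as follows. The exponential weight q(y) = 1 − exp(−λy) for positive outputs (zero otherwise), the value λ = 200, the diagonal loading, and the energy-based stopping rule are assumptions borrowed from the commonly cited hCEM formulation; they are not reconstructed from this paper's equations, and the synthetic data are invented.

```python
import numpy as np

def hcem(X, d, lam=200.0, max_iter=50, tol=1e-6, ridge=1e-4):
    """Hierarchical CEM sketch: repeat CEM and damp pixels with low output.

    X: (N, L) spectral signatures, d: (L,) desired signature.
    lam, ridge, tol, and max_iter are assumed settings, not values from the paper.
    """
    X = np.asarray(X, dtype=np.float64).copy()
    d = np.asarray(d, dtype=np.float64)
    n, bands = X.shape
    prev_energy = np.inf
    y = np.zeros(n)
    for _ in range(max_iter):
        R = X.T @ X / n + ridge * np.eye(bands)
        Rinv_d = np.linalg.solve(R, d)
        filt = Rinv_d / (d @ Rinv_d)
        y = X @ filt                                       # k-th layer output
        q = 1.0 - np.exp(-lam * np.clip(y, 0.0, None))     # weight in [0, 1)
        X = X * q[:, None]                                 # data for layer k + 1
        energy = float(np.mean(y ** 2))
        if abs(prev_energy - energy) < tol:                # assumed stopping rule
            break
        prev_energy = energy
    return y


# Synthetic scene (assumed): 500 background pixels, 5 noisy targets, 1 exact target.
rng = np.random.default_rng(0)
bands = 10
d = np.linspace(0.5, 1.5, bands)
background = rng.normal(1.0, 0.1, size=(500, bands))
targets = d + rng.normal(0.0, 0.01, size=(5, bands))
X = np.vstack([background, targets, d])

y_hcem = hcem(X, d)
```

Pixels whose output is zero or negative are removed from the next layer entirely, while pixels with an output near one are left almost unchanged, which is why repeated layers sharpen the contrast between target and background.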

4) WEIGHTED BACKGROUND SUPPRESSION HIERARCHICAL CEM (WBS-hCEM)
This method combines the concepts of the previous two sections. Each hyperspectral signal is first given an initial weight as in WBS-CEM (Eq. 2 or 3). The output is then multiplied by the weight q as in hCEM (Eq. 7), so that information irrelevant to the target of interest is removed gradually by continued iteration. The remaining steps are the same as for hCEM in the previous section and are iterated until the stopping criterion is satisfied.

D. DEEP LEARNING (DL)
The present DL [44] is one of the major application domains of machine learning and has been widely used in Artificial Intelligence (AI). The earliest basic concept of neural networks was proposed by Chen et al. [40], and its network architecture has been used to simulate the image recognition method of humans to enhance the image recognition capability of the machine.

1) 1D-CNN
A CNN looks for feature maps of the data through its convolutional layers. In the 1D case, it is usually used to search for continuous, coherent signals. The 1D-CNN is very effective when features of interest must be obtained from shorter segments of the entire data set and the position of a feature within the segment is not highly relevant [41]. It is highly applicable to sensor data, such as time series analysis of audio signals or accelerometer data. As the hyperspectral signal has numerous correlated bands, this paper attempts to find the features of leather defects through 1D-CNN. A 2-layer and a 4-layer 1D-CNN are proposed in this paper, as shown in Fig. 4. The 2-layer 1D-CNN is designed for brand masks, rotten grain, ruptures, and scratches; the 4-layer 1D-CNN is designed for insect bites. Since the spectral signature of insect bites is very similar to that of normal leather, two more layers of 1D convolution are added to obtain more features. The input is the spectral signal of each pixel in the leather image. Each 1D convolution uses a kernel of size 3 and the ReLU activation function. The convolution result is flattened into 1D and passed through a fully connected layer with 64 outputs. Finally, a fully connected layer with two outputs is connected and combined with the Softmax activation function for classification.
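To make the layer shapes concrete, the forward pass of such a per-pixel network can be sketched in plain NumPy. The filter counts (16 and 32), the unpadded convolutions, the ReLU after the 64-unit layer, the class order, and the random untrained weights are all assumptions for illustration; the actual parameters are those of Fig. 4.

```python
import numpy as np

def conv1d_relu(x, kernels, bias):
    """Unpadded 1D convolution over the band axis followed by ReLU.

    x: (length, c_in), kernels: (k, c_in, c_out), bias: (c_out,)
    """
    k = kernels.shape[0]
    # windows has shape (length - k + 1, c_in, k)
    windows = np.lib.stride_tricks.sliding_window_view(x, k, axis=0)
    return np.maximum(np.einsum("lck,kco->lo", windows, kernels) + bias, 0.0)


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


def init_params(n_bands, filters=(16, 32), kernel=3, seed=0):
    """Random (untrained) weights; the filter counts are hypothetical."""
    rng = np.random.default_rng(seed)
    convs, c_in, length = [], 1, n_bands
    for c_out in filters:
        convs.append((rng.normal(0, 0.1, (kernel, c_in, c_out)), np.zeros(c_out)))
        c_in, length = c_out, length - kernel + 1
    dense1 = (rng.normal(0, 0.1, (length * c_in, 64)), np.zeros(64))
    dense2 = (rng.normal(0, 0.1, (64, 2)), np.zeros(2))
    return convs, dense1, dense2


def forward(spectrum, params):
    """Class probabilities for one pixel spectrum."""
    convs, (w1, b1), (w2, b2) = params
    x = np.asarray(spectrum, dtype=np.float64)[:, None]  # (bands, 1 channel)
    for kernels, bias in convs:
        x = conv1d_relu(x, kernels, bias)
    h = np.maximum(x.reshape(-1) @ w1 + b1, 0.0)         # flatten + 64-unit layer
    return softmax(h @ w2 + b2)                          # assumed order: [background, defect]


params = init_params(n_bands=192)
probs = forward(np.linspace(0.2, 0.8, 192), params)
```

A 4-layer variant for insect bites would only change the `filters` argument, for example `init_params(192, filters=(16, 16, 32, 32))`, again with hypothetical filter counts.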

2) 2D-UNet
The architecture of 2D-UNet [47] evolved from the Fully Convolutional Network (FCN) [48]. The concept of the residual block, which is well known from ResNet, was brought into this type of architecture by Milletari [49]. Its original design goal is to reduce the gradient vanishing and saturation problems induced by stacking many convolutional layers. UNet is designed for prediction from a small amount of data. The dimension-reduction path on the left of the architecture diagram is called the contracting path, and the path on the right is the expanding path. To obtain accurate information, at least two successive convolutional layers precede each dimension reduction or dimension increase, which is known as successive convolution.
Another characteristic is that a large number of channels is maintained during up-sampling, so that relative position relations and detailed features can be fully combined and the quality of recognition is improved. U-Net has been used for image segmentation in areas such as medical imaging [50], [51] and remote sensing [52]; it can also be extended to 3D-UNet models, which have been applied in medical applications [53]- [55]. The network architecture with parameters is shown in Fig. 5(a). The convolution kernel of all the 2D convolutional layers of the model is [3,3].

3) 3D-UNet
This paper considers both the spatial and the spectral information content of the hyperspectral signal and constructs the 3D-UNet architecture based on the 2D-UNet. The network model architecture is shown in Fig. 5(b). The shape of the input data is set to [192,192,192]. The value 192 is used because the 3D-UNet requires four dimension reductions, each of which halves the size of every dimension, so 1/16 of each input dimension remains after four reductions. To avoid zero-padding in the model, 192, the largest multiple of 16 below 200, is selected as the input size. The convolution kernel of all the 3D convolutional layers of the model is [3,3,3]. The activation function is Swish [56], and the output of the second-to-last layer is 3D data. The last layer uses 2D convolution to eliminate the third dimension, with ReLU as the activation function, so that the output is a 2D image. It is noteworthy that the 3D-UNet has fewer training samples than the 1D-CNN. The loss function used in this model is the balanced cross-entropy [57]: the weight of the class with fewer samples (i.e. defect) is set to 0.85 and the weight of the other class (i.e. background) is set to 0.15, which counteracts the tendency of the model to identify the full sheet of leather as background. The loss function thus gives an individual weight to each class.
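The input-size arithmetic and the class weighting can be illustrated with a short sketch. The loss below is the usual two-class weighted cross-entropy with the weights 0.85 and 0.15 quoted above; that the authors' loss takes exactly this form is an assumption, and the function names are hypothetical.

```python
import numpy as np

def largest_valid_size(limit, n_down=4):
    """Largest size <= limit that can be halved n_down times without padding."""
    step = 2 ** n_down
    return (limit // step) * step


def balanced_cross_entropy(y_true, p_defect, w_defect=0.85, w_background=0.15, eps=1e-7):
    """Mean weighted binary cross-entropy over pixels.

    y_true:   1 for defect pixels, 0 for background pixels
    p_defect: predicted probability of the defect class
    """
    y = np.asarray(y_true, dtype=np.float64)
    p = np.clip(np.asarray(p_defect, dtype=np.float64), eps, 1.0 - eps)
    loss = -(w_defect * y * np.log(p) + w_background * (1.0 - y) * np.log(1.0 - p))
    return float(loss.mean())


input_size = largest_valid_size(200)                    # 192
bottleneck_size = input_size // 2 ** 4                  # 12 after four halvings
missed_defect = balanced_cross_entropy([1], [0.1])      # defect predicted as background
false_alarm = balanced_cross_entropy([0], [0.9])        # background predicted as defect
```

With these weights, a missed defect pixel is penalized 0.85 / 0.15 ≈ 5.7 times more than an equally confident false alarm, which pushes the network away from labelling everything as background.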

E. BAND SELECTION (BS)
The concept of BS was discussed as early as the multispectral era. Mausel [34] used the statistical correlation coefficient to determine a subset from an 8-channel multispectral image for classification, and the classification rate was not affected. In recent years, minimum similarity-based methods have been the most frequently used BS methods. Briefly, a newly selected band must have the minimum similarity and correlation with the bands already selected. The measures include the correlation coefficient, Euclidean distance, Spectral Angle Mapper (SAM) proposed by Keshava [58], and Orthogonal Subspace Projection (OSP) [59]. This study used the target detection concept as the main band selection algorithm.

1) CONSTRAINED-TARGET BAND SELECTION (CTBS)
Constrained-Target Band Selection (CTBS) [60] is a band selection method for multi-target detection. It is derived from the concept of CEM by constraining the target energy while minimizing the variance induced by the background. Based on CEM, the variance resulting from the target of interest can be used to prioritize the bands and to select the bands for a specific target.
Let $\{b_l\}_{l=1}^{L}$ be the group of band images representing the hyperspectral signal cube, where $b_l$ is the l-th band. If BS is a group of selected bands and $\{r_i^{BS}\}_{i=1}^{N}$ is the data set restricted to those bands, the CEM error obtained on this data set defines the CTBS priority criterion.

2) MINIMAL/MAXIMAL VARIANCE-BASED BAND PROCESSING (MIN/maxV-BP)
Following the optimization used in CEM, the priority score of a band is derived from the variance of the filter output when d is the designated target of interest. For each band b_l, the variance V(b_l) can be calculated according to Eq. (12). As V(b_l) only uses the data sample vectors restricted to b_l, its value can be used as the priority score assigned to b_l. Therefore, all bands can be ranked according to the value of V(b_l). A smaller V(b_l) means that b_l is more important. This band selection is known as Minimal Variance-based BP (MinV-BP).
Alternatively, the FIR filter can be computed on the full band set Ω with one band left out: b_l is removed from Ω, and the resulting variance V(Ω − b_l) is used as the priority score. In other words, the higher V(Ω − b_l) is, the more important the band b_l is. This band selection is known as Maximal Variance-based BP (MaxV-BP).
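Both rankings can be sketched with the minimum output energy of the CEM filter, V = (dᵀR⁻¹d)⁻¹, evaluated on a subset of bands. That the paper's variance criterion takes exactly this form is an assumption, as are the function names, the small diagonal loading, and the synthetic data.

```python
import numpy as np

def cem_variance(X, d, ridge=1e-8):
    """Minimum CEM output energy (d^T R^-1 d)^-1 on the bands contained in X."""
    X = np.asarray(X, dtype=np.float64)
    d = np.asarray(d, dtype=np.float64)
    R = X.T @ X / X.shape[0] + ridge * np.eye(X.shape[1])
    return 1.0 / float(d @ np.linalg.solve(R, d))


def minv_bp(X, d):
    """MinV-BP order: single-band variance V(b_l), smallest first."""
    X = np.asarray(X, dtype=np.float64)
    d = np.asarray(d, dtype=np.float64)
    scores = np.array([cem_variance(X[:, [l]], d[[l]]) for l in range(X.shape[1])])
    return np.argsort(scores), scores


def maxv_bp(X, d):
    """MaxV-BP order: variance with band l removed, largest first."""
    X = np.asarray(X, dtype=np.float64)
    d = np.asarray(d, dtype=np.float64)
    scores = np.array([
        cem_variance(np.delete(X, l, axis=1), np.delete(d, l))
        for l in range(X.shape[1])
    ])
    return np.argsort(-scores), scores


# Synthetic data (assumed): band 0 is nearly zero for the background but not for d,
# so it separates target from background much better than the other bands.
rng = np.random.default_rng(0)
X = rng.normal(1.0, 0.1, size=(300, 5))
X[:, 0] = rng.normal(0.0, 0.01, size=300)
d = np.ones(5)

order_min, v_single = minv_bp(X, d)
order_max, v_leave_one_out = maxv_bp(X, d)
```

Both criteria rank band 0 first in this toy case; keeping the first 10 entries of either order (`order_min[:10]`) would correspond to the 10-band setting used in this study.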

F. HYPERSPECTRAL LEATHER DEFECT DETECTION ALGORITHM (HLDDA)
This paper combined the above methods to develop the Hyperspectral Leather Defect Detection Algorithm (HLDDA), which includes HTD and DL techniques. Considering the noise during imaging with the hyperspectral camera, HLDDA removed the low signal/high noise bands: bands 1-20 and bands 211-224. A total of 192 bands were used in the experiments. In addition, to remove noise more efficiently, the Minimum Noise Fraction Transform (MNF Transform) [61] was applied to the hyperspectral data. The method regards the Signal-to-Noise Ratio (SNR) as the indicator for evaluating signal quality. The MNF Transform reduces the noise interference, which makes the data cleaner. HLDDA does not use the MNF Transform to reduce the number of bands directly, because the cleanest bands are not always the most suitable for every defect; it is sufficient that the noise of the original image is reduced, and the bands suitable for each defect are then found by the BS method.
The BS method is performed on the imported HSI to select 10 bands. It is worth noting that DL requires a large amount of data to achieve better results; in this case, BS can be skipped and the full-band HSI kept to obtain more features. In addition, the HTD requires a desired target as prior knowledge; HLDDA uses the average of three target pixels selected from the ground truth as the input parameter. In the training process of DL, the 1D-CNN randomly takes 1% of its data as the training set. The 3D-UNet uses a sliding window to split the HSI and obtain five small images with defects as the training set, while the rest of the data is the testing set. Fig. 6 plots the flowchart of HLDDA.
The True Positive Rate (TPR), False Positive Rate (FPR), Accuracy (ACC), Kappa, and Intersection over Union (IoU) are also used for evaluation.

IV. EXPERIMENTS
The TPR is the proportion of correctly identified samples to all the samples that are defective in fact (positive). The FPR is the proportion of misidentified samples to all the samples that are normal in fact (negative). The (FPR, TPR) coordinates of each threshold of the same model are drawn in the ROC space to form the ROC curve of a specific model. The area under the curve is AUC [62].
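The construction of the ROC curve and its area can be sketched as follows, assuming a detector that outputs one gray-level score per pixel and a binary ground-truth mask containing both classes. The function names and the number of thresholds are hypothetical.

```python
import numpy as np

def roc_curve(scores, labels, n_thresholds=256):
    """(FPR, TPR) pairs for thresholds swept from the highest score to the lowest."""
    scores = np.asarray(scores, dtype=np.float64).ravel()
    labels = np.asarray(labels).ravel().astype(bool)
    thresholds = np.linspace(scores.max(), scores.min(), n_thresholds)
    n_pos = labels.sum()        # defective pixels (assumed > 0)
    n_neg = (~labels).sum()     # normal pixels (assumed > 0)
    tpr = np.array([(scores[labels] >= t).sum() / n_pos for t in thresholds])
    fpr = np.array([(scores[~labels] >= t).sum() / n_neg for t in thresholds])
    return fpr, tpr, thresholds


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule, anchored at (0,0) and (1,1)."""
    f = np.concatenate([[0.0], fpr, [1.0]])
    t = np.concatenate([[0.0], tpr, [1.0]])
    return float(np.sum(np.diff(f) * (t[1:] + t[:-1]) / 2.0))


# Toy detector output (assumed): the two defect pixels score highest.
scores = np.array([0.1, 0.2, 0.8, 0.9])
labels = np.array([0, 0, 1, 1])
fpr, tpr, thresholds = roc_curve(scores, labels)
area = auc(fpr, tpr)
```

For perfectly separated scores such as these the curve passes through (0, 1) and the area equals 1; a detector that ranks pixels at random would give an area near 0.5.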
Cohen's kappa [63] differs from the ROC curve in that it measures the effect of the detector at individual thresholds. The kappa coefficient (κ) accounts for the two types of error (omission and commission) as well as the overall accuracy of the classifier, and it is computed from the confusion matrix (Eq. (15)).
The Intersection over Union (IoU) [64], also known as the Jaccard similarity coefficient, is a statistic for comparing the similarity and diversity of sample sets. The Jaccard index measures the similarity of finite sample sets and is defined as the ratio of the intersection size to the union size of two sets A and B, $J(A, B) = |A \cap B| / |A \cup B|$.
AUC, ACC, Kappa, and IoU were used for evaluation, and the aforesaid five methods were compared. The result of HTD is a gray-level image, which must be binarized before classification. This study selected the optimum threshold as the one that maximizes TPR + (1 − FPR). TPR, FPR, ACC, Kappa, and IoU were then calculated from the classification result. Finally, a comparison histogram of the detection results of each algorithm and of the overall performance is plotted in Fig. 13. There are at least three HSIs of each defect, and each reported value is the average over three HSIs.
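The threshold selection and the pixel-level indexes can be sketched together. The standard definitions of Cohen's kappa and IoU for a two-class confusion matrix are used; the function names, the number of candidate thresholds, and the toy data are hypothetical, and both classes are assumed to be present in the ground truth.

```python
import numpy as np

def metrics(pred, truth):
    """TPR, FPR, ACC, Cohen's kappa, and IoU from binary prediction and ground truth."""
    pred = np.asarray(pred, dtype=bool).ravel()
    truth = np.asarray(truth, dtype=bool).ravel()
    tp = int(np.sum(pred & truth))
    fp = int(np.sum(pred & ~truth))
    fn = int(np.sum(~pred & truth))
    tn = int(np.sum(~pred & ~truth))
    n = tp + fp + fn + tn
    acc = (tp + tn) / n
    # Agreement expected by chance, from the row and column totals.
    p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    return {
        "TPR": tp / (tp + fn),
        "FPR": fp / (fp + tn),
        "ACC": acc,
        "Kappa": (acc - p_e) / (1.0 - p_e),
        "IoU": tp / (tp + fp + fn),
    }


def best_threshold(scores, truth, n_thresholds=256):
    """Threshold on a gray-level detection map that maximizes TPR + (1 - FPR)."""
    scores = np.asarray(scores, dtype=np.float64).ravel()
    truth = np.asarray(truth, dtype=bool).ravel()
    best_t, best_j = None, -np.inf
    for t in np.linspace(scores.min(), scores.max(), n_thresholds):
        m = metrics(scores >= t, truth)
        j = m["TPR"] + (1.0 - m["FPR"])
        if j > best_j:
            best_t, best_j = t, j
    return best_t


# Toy example (assumed): one true defect pixel, two pixels predicted as defect.
toy = metrics(pred=[1, 1, 0, 0], truth=[1, 0, 0, 0])

# Toy gray-level map (assumed) binarized at the selected threshold.
scores = np.array([0.1, 0.2, 0.8, 0.9])
truth = np.array([0, 0, 1, 1])
t_opt = best_threshold(scores, truth)
at_opt = metrics(scores >= t_opt, truth)
```

In the first toy case the accuracy is 0.75 while Kappa and IoU are both 0.5, which shows how a single false alarm on a small defect lowers Kappa and IoU much more than it lowers accuracy.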
According to the results in Fig.13(a), the 2D-UNet has the best performance in detecting brand mask because the brand mask is a defect with more spatial features. The 3D-UNet combining spatial information with spectral information is better than the other algorithms with only spectral information.
In detecting the rotten grain defect in Fig. 13(b), the WBS-CEM has the best performance in AUC and TPR, and the 3D-UNet has the best performance in Kappa and IoU. Because the WBS-CEM uses the concept of a matched filter, it achieves a higher AUC and TPR but also a higher false alarm rate, which leads to worse Kappa and IoU.
In detecting the rupture defect in Fig. 13(c), the WBS-CEM performs better in AUC, but the 3D-UNet still has the best Kappa and IoU among all algorithms.
In detecting the insect bite in Fig. 13(d), the 2D-UNet and 3D-UNet do not work well, possibly because the insect bite defect is very small, carries very limited spatial information, and its features disappear after multiple convolutions. In this case, we can rely on the spectral information used by the 1D-CNN. According to the experimental results, the 1D-CNN has the best result, whereas the HTD may be disturbed by the other defects, leading to excessive false alarms. The spectral signature of insect bites is very close to that of normal leather, which makes the analysis very difficult.
The results on detecting scratches in Fig. 13(e) are universally good. The HTD results in better AUC and TPR. The DL has better FPR, ACC, Kappa, and IoU.
The results of overall performance are shown in Fig. 13(f). As seen, the proposed WBS-CEM and WBS-hCEM perform better than the original CEM and hCEM in AUC and ACC. However, the overall performance of the 3D-UNet in ACC, Kappa, and IoU is still better than that of the other methods. Table 4 lists the HTD and DL prediction times for reference, given as the average computing time for one image of size [390,482]. It is worth noting that HTD does not require any training process, which is its major advantage. DL, in contrast, takes much more time than HTD when training and testing time are both counted. In particular, the 3D-UNet takes the longest time to train because of its 3D convolutions. Among the HTD methods, the original CEM is the fastest, but the computing times are all very close and short.

D. DISCUSSION
According to the experimental results in the previous section, our proposed WBS-CEM and WBS-hCEM perform much better than the original CEM for different defects. The results indicate that background suppression plays an important role in detection. Visually, WBS-hCEM suppresses the background most effectively, followed by WBS-CEM. The hCEM is an excellent algorithm, but it suppresses the background so strongly that, although the false alarm rate is greatly reduced, some indistinct targets are also suppressed as background. This excessive suppression results in worse evaluation indexes, such as AUC, than those of WBS-CEM. If WBS-hCEM is selected, the result generally lies between the former two, although it is better or worse than both in some instances. In terms of DL, 1D-CNN has the best result in detecting insect bites, since this defect can only be recognized by spectral information; 2D-Unet takes advantage of spatial information, so it performs best on the brand mask, which has more spatial features; and 3D-UNet considers spatial and spectral information simultaneously, so it performs best on rotten grain, rupture, and scratch defects. A disadvantage of the U-Net architectures is that tiny defects cannot be detected because their features vanish during convolution, which is why the insect bite defect is not detected by 2D-Unet and 3D-UNet. Nevertheless, 3D-UNet generally has better overall performance than 1D-CNN and 2D-Unet, although it requires more GPU memory. If the image is divided into small pieces to reduce memory use, part of the spatial characteristics will be lost and the detection results will be unacceptable.
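The remark that the features of tiny defects vanish in the 2D and 3D networks can be made concrete with a small numerical sketch. It assumes repeated 2 × 2 average pooling as a stand-in for the downsampling path of a U-Net; the paper does not state which downsampling operator its networks use, so the helper `avg_pool2x2`, the 64 × 64 image, and the single-pixel "defect" are all hypothetical.

```python
import numpy as np

def avg_pool2x2(img):
    """2x2 average pooling with stride 2 (assumed downsampling operator)."""
    h, w = img.shape
    h2, w2 = h - h % 2, w - w % 2          # crop to even size if needed
    img = img[:h2, :w2]
    return img.reshape(h2 // 2, 2, w2 // 2, 2).mean(axis=(1, 3))

# Synthetic image: flat background with one defect pixel of unit contrast.
img = np.zeros((64, 64))
img[21, 37] = 1.0

# Track the strongest response after each of four downsampling stages.
peaks = [float(img.max())]
x = img
for _ in range(4):
    x = avg_pool2x2(x)
    peaks.append(float(x.max()))

print(x.shape)  # spatial size after four stages
print(peaks)    # the peak contrast shrinks by a factor of 4 per stage
```

Under this assumption a one-pixel defect keeps only 1/256 of its contrast after four stages, whereas a defect covering many pixels would be attenuated far less. Other downsampling operators behave differently, so this is only an illustration of why small targets are hard for encoder-decoder networks.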
To sum up, the HTD and DL perform signal analysis from different starting points. The HTD uses the concept of a matched filter to separate the target object from the background. Our proposed WBS-CEM and WBS-hCEM use weighting to suppress the background and enhance the contrast between target and background. The advantage is that the target object can be located as long as one spectral signature is given as a parameter. In contrast, the 1D-CNN, 2D-Unet, and 3D-UNet analyze spatial and spectral signatures with neural networks and must collect a large amount of data for training. The 1D-CNN mainly convolves the spectral signature and the 2D-Unet mainly convolves spatial features, whereas the 3D-UNet convolves spatial and spectral information simultaneously but requires a longer computing time. The experimental results are consistent with this analysis. Therefore, with limited resources and time, the HTD is preferred. Conversely, if adequate resources and time are available, the DL is workable.
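The statement that a target can be located from a single spectral signature can be sketched with the standard CEM filter, w = R⁻¹d / (dᵀR⁻¹d), where R is the sample autocorrelation matrix of all pixels and d is the desired signature. The sketch below is the plain CEM only; it does not implement the weighting used in the proposed WBS-CEM or WBS-hCEM. The function name `cem_detector`, the small regularization term `eps`, and the synthetic cube are assumptions for illustration.

```python
import numpy as np

def cem_detector(cube, d, eps=1e-6):
    """Standard CEM on an (H, W, L) cube with target signature d of length L."""
    h, w, bands = cube.shape
    X = cube.reshape(-1, bands).astype(float)       # N pixels x L bands
    d = np.asarray(d, dtype=float)
    R = X.T @ X / X.shape[0]                        # sample autocorrelation
    # Small diagonal loading (an assumption) keeps R well conditioned.
    R = R + eps * np.trace(R) / bands * np.eye(bands)
    Rinv_d = np.linalg.solve(R, d)
    w_cem = Rinv_d / (d @ Rinv_d)                   # enforces w^T d = 1
    return (X @ w_cem).reshape(h, w)

# Synthetic data: noisy flat background plus one pixel equal to the target.
rng = np.random.default_rng(0)
cube = rng.normal(0.5, 0.05, size=(20, 20, 8))
d = np.linspace(0.2, 0.9, 8)                        # hypothetical signature
cube[5, 5] = d

score = cem_detector(cube, d)
peak = np.unravel_index(np.argmax(score), score.shape)
print(peak, float(score[5, 5]))
```

The filter returns a response of one for any pixel equal to d while minimizing the average output energy, so only the signature d is needed and no training step is involved. Thresholding `score` would give a binary defect mask.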

V. CONCLUSION
This paper analyzed five defects in wet-blue leather, namely brand mask, rotten grain, rupture, insect bite, and scratches, by using HSI. Our proposed HLDDA includes the HTD-based WBS-CEM and WBS-hCEM, as well as the DL-based 1D-CNN, 2D-Unet, and 3D-UNet architectures, to detect defects. The experimental results show that the HLDDA is feasible and provides a new direction for wet-blue leather inspection. The main contributions of this paper are summarized as follows.
1) Prior studies mostly extract features from the spatial domain and then implement image classification on rawhide leather images. This paper is the first analytical study using HSI for wet-blue leather at the pixel level. The wet-blue leather after pre-processing is more challenging than general rawhide leather in defect detection.
2) Our proposed HLDDA uses the spectral and spatial information of leather and implements pixel-level defect detection and segmentation to effectively quantify and estimate the size of defects, so that leather grading becomes more precise.
3) We created our own data set by collecting more than 20 HSIs, almost 5.25 million spectral signatures, with a tannery's help. Our data set could be closer to the real situation of the factory.
4) In the HTD results, our proposed WBS-CEM and WBS-hCEM perform much better than the original CEM for different defects. The results indicate that the background suppression technique plays an important role and provides better visual contrast. In DL, we designed the spectral-based 1D-CNN, the spatial-based 2D-Unet, and the spatial-spectral-based 3D-UNet for various defect features, which had not been investigated before.
5) This paper provides an important wavelength range for recognizing different defects. This is an advantage in practical applications and in customizing hyperspectral or multispectral sensors in the future.
6) This paper is a pilot study and guideline for HSI in the detection of wet-blue leather defects, showing how to design appropriate algorithms from the perspectives of HTD and DL.
Our future work is to present a better and novel network structure in the HTD or DL part. The short-term goal is to enable professional inspectors to locate defects faster. The long-term goal is to work with automated manipulators to develop intelligent leather grading towards Industry 4.0.