Damage Identification of Low Emissivity Coating Based on Convolutional Neural Network

At present, low emissivity coatings are widely used in many fields, but damage greatly reduces their efficiency, so damage detection of low emissivity coatings has become an important task. A model for automatic identification of low emissivity coating damage based on a convolutional neural network is proposed. First, an optical image data set of low emissivity coatings is constructed and extended by data augmentation. Then, VGG-19 and ResNet-50 models are built with TensorFlow, using the cross-entropy loss function, and SGD, momentum, RMSprop and Adam are applied to optimize the models; during optimization, the learning rate is adjusted to obtain the optimal model. The results show that when the learning rate is $5\times 10^{-5}$ and the Adam method is used, the recognition accuracy of the VGG-19 model is 90.64%, while that of the ResNet-50 model is 94.14%. This work is of significance for the study of automatic damage identification of low emissivity coatings.


I. INTRODUCTION
Low emissivity materials can be divided into coating materials and structural materials according to forming process and load-bearing capacity [1]. A coating material is a coating with a stealth function applied to the surface of a structure, while a structural material has both stealth and load-bearing functions (commonly laminated plates and sandwich structures) [2]. Research on low emissivity materials focuses on coating materials [3]. Therefore, this paper studies the automatic damage identification of low emissivity coatings.
Low emissivity coatings are widely used on aircraft, ships, missiles, military vehicles and other weapon platforms. Over the service life of such equipment, a low emissivity coating is subjected to environmental factors during storage, transportation and use, resulting in changes of physical and chemical properties such as discoloration, pulverization, delamination, cracking and reduced adhesion, as well as fluctuations of the coating's low emissivity performance. In addition, mechanical damage caused in the battlefield and in normal training seriously degrades the low emissivity performance of the coating, and the emissivity increases sharply where the coating peels off. However, identification of low emissivity coating damage is highly dependent on professionals and is inefficient, so an automatic damage identification method is needed. (The associate editor coordinating the review of this manuscript and approving it for publication was Xiaokang Yin.)
A neural network is a supervised learning algorithm that is usually trained on labeled data. In this study, an advanced variant, the convolutional neural network, is used: it preserves the spatial topology of the data and extracts fine details from the input image to improve recognition accuracy [5].
The main purpose of this paper is to propose an automatic damage identification method based on optical images of low emissivity coatings. In other words, this study uses a convolutional neural network to identify damage from optical images of low emissivity stealth coatings. A data set of low emissivity coating damage is constructed and augmented, the damage identification performance of different convolutional neural networks is compared, and the optimal model is obtained by hyperparameter tuning. This study is of significance for the damage identification of low emissivity coatings.

II. DATA SET CONSTRUCTION AND IMAGE PREPROCESSING
A. BUILDING DATA SETS
In this paper, five damage types of low emissivity coating are considered: blistering, rusting, cracking, abrasion marks and peeling off. 1500 damaged-coating pictures (300 per damage type) and 300 pictures of undamaged coating were collected from different parts of different types of aircraft, giving a data set of 1800 pictures in total. 90% of the pictures are used as the training set and the rest as the test set, and the proportion of each damage type in the test set is exactly the same as in the training set.
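As a sketch of the class-proportional 90/10 split described above: the `stratified_split` helper and the file names are our own illustrative assumptions, not code from the paper, but the class list and per-class counts follow the text.

```python
# Stratified 90/10 split: each class is split separately so that class
# proportions in the test set match those in the training set.
import random

def stratified_split(samples_by_class, train_fraction=0.9, seed=0):
    """Split every class with the same ratio; returns (train, test) lists."""
    rng = random.Random(seed)
    train, test = [], []
    for label, samples in samples_by_class.items():
        shuffled = samples[:]
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * train_fraction)
        train.extend((s, label) for s in shuffled[:cut])
        test.extend((s, label) for s in shuffled[cut:])
    return train, test

# 300 images per class: five damage types plus undamaged ("normal") coating.
classes = ["blistering", "rusting", "cracking", "abrasion", "peeling", "normal"]
data = {c: [f"{c}_{i:03d}.jpg" for i in range(300)] for c in classes}
train_set, test_set = stratified_split(data)
print(len(train_set), len(test_set))  # 1620 180
```

With 6 classes of 300 images each, the split yields 1620 training and 180 test images, 30 test images per class.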

B. DATA ENHANCEMENT
For neural networks applied to computer vision, collecting more training data effectively enhances the generalization ability of the model, but in reality the number of available pictures of specific low emissivity coating damage is very limited. To address this, this paper expands the training set by data augmentation: random cropping, random flipping, random brightness changes, and mosaic data augmentation.
Random cropping extracts a region of specified size from each picture in the training set. By reducing the weight of background factors relative to the damage features, it weakens data noise and enhances the generalization ability of the model while expanding the data. The random cropping process is shown in Figure 2.
Random flipping, either horizontal or vertical, mirrors the picture left-right or up-down. It changes the absolute position of the damage feature in the image while keeping its details, weakening the position weight of the feature and thereby expanding the training set. The random flipping process is shown in Figure 3.
Randomly changing the brightness of the coating images effectively simulates image acquisition under different light intensities and significantly enhances the generalization ability of the model. The process of randomly changing brightness is shown in Figure 4.
Mosaic augmentation is a newer data augmentation method that mixes 4 training images, and thus 4 different contexts, whereas CutMix mixes only 2 input images. This allows detection of objects outside their normal context; in addition, batch normalization computes activation statistics from 4 different images at each layer, which significantly reduces the need for a large mini-batch size [6]. Data augmentation makes the data set 11 times larger than the original: the total number of images grows from 1800 to 19800, and each class grows from 300 to 3300. Random cropping adds 4 times the original images, random flipping adds 2 times, random brightness change adds 1 time, and mosaic augmentation adds 3 times.
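A minimal numpy sketch of the mosaic idea described above: four training images are tiled into one composite. The fixed 2×2 tiling here is a simplifying assumption for illustration; practical mosaic implementations crop and place the four images at random offsets.

```python
# Combine four equally sized images into one 2x2 mosaic composite.
import numpy as np

def mosaic(imgs):
    """imgs: four HxWxC arrays of equal shape -> one 2Hx2WxC composite."""
    a, b, c, d = imgs
    top = np.concatenate([a, b], axis=1)      # left | right
    bottom = np.concatenate([c, d], axis=1)
    return np.concatenate([top, bottom], axis=0)

# Four dummy 64x64 RGB "images" with distinct gray levels.
four = [np.full((64, 64, 3), v, dtype=np.uint8) for v in (0, 85, 170, 255)]
m = mosaic(four)
print(m.shape)  # (128, 128, 3)
```

Each composite shows damage features in four different contexts at once, which is the property the paper credits for the accuracy gain reported in Section V.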

C. NORMALIZATION
In order to reduce the training difficulty of the model, improve its generalization ability and prevent gradient explosion, the images in the low emissivity coating data set are normalized, as shown in formula (1). After normalization, the data conforms to the standard normal distribution.
$$f(x,y)=\frac{g(x,y)-\mu}{\sigma} \tag{1}$$
where g(x, y) is the original value of the pixel at coordinate (x, y), f(x, y) is its value after normalization, µ is the mean of the pixel values of all images in the training set, and σ is the standard deviation of the pixel values of all images in the training set.
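The standardization of formula (1) can be sketched in a few lines of numpy; the synthetic training images here stand in for the real data set.

```python
# Per-dataset standardization: subtract the training-set mean and divide by
# the training-set standard deviation, as in formula (1).
import numpy as np

rng = np.random.default_rng(0)
train_images = rng.uniform(0, 255, size=(100, 32, 32, 3))  # dummy data

mu = train_images.mean()     # mean over all training-set pixels
sigma = train_images.std()   # standard deviation over all training-set pixels

normalized = (train_images - mu) / sigma
print(normalized.mean(), normalized.std())
```

After the transform the pixel values have zero mean and unit standard deviation; the same µ and σ computed on the training set would also be applied to test images.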

III. CONVOLUTIONAL NEURAL NETWORK
A. VGG-19
The VGG-19 network [7] adopts a modular design with a ''5 + 3'' structure: 5 conv modules followed by 3 fully connected layers. The first and second conv modules each contain two convolution layers and one pooling layer; the third, fourth and fifth each contain four convolution layers and one pooling layer. Each convolution layer is followed by the ReLU activation function. The structure of VGG-19 is shown in Figure 6. The activation function introduces nonlinearity into the neural network. Without it, each layer reduces to a matrix multiplication, so the output of each layer is a linear function of the input of the previous layer; no matter how many layers the network has, the output remains a linear combination of the inputs, equivalent to the most primitive perceptron. With activation functions, the network can model nonlinear relationships. Compared with the traditional sigmoid activation function, the ReLU activation function,
$$\mathrm{relu}(x)=\max(0,x) \tag{2}$$
saves computation and prevents the gradient from vanishing.
Compared with a traditional 5 × 5 convolution kernel, the VGG-19 network adopts small 3 × 3 convolution kernels, which effectively reduces the model parameters and computation without reducing accuracy. Two stacked 3 × 3 convolution layers have the same receptive field as one 5 × 5 convolution layer, and their feature extraction effect is similar, as shown in Figure 7. However, per input-output channel pair, one 5 × 5 kernel requires 25 weights, while two stacked 3 × 3 kernels require only 18. Replacing one 5 × 5 layer with two 3 × 3 layers therefore effectively reduces the number of parameters to be learned, and in turn the amount of computation.
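The weight-count comparison above is simple arithmetic; the sketch below works it out for a representative channel width (C = 64 is our own example value, not from the paper).

```python
# Weight count of a KxK convolution with c_in input and c_out output channels
# (biases ignored): K*K*c_in*c_out.
def conv_weights(kernel, c_in, c_out):
    return kernel * kernel * c_in * c_out

C = 64                                  # example channel width
one_5x5 = conv_weights(5, C, C)         # single 5x5 layer
two_3x3 = 2 * conv_weights(3, C, C)     # two stacked 3x3 layers
print(one_5x5, two_3x3)  # 102400 73728
```

The two-layer 3×3 stack covers the same 5×5 receptive field with 28% fewer weights, and also inserts an extra ReLU between the layers.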
In the pooling layer, max pooling is used to downsample the feature map and reduce its spatial size, as shown in Figure 8. Max pooling effectively preserves texture features and weakens position weight. In this paper, the three fully connected layers have 512, 256 and 6 units respectively. In the last layer, the output is normalized by the softmax function, which maps the outputs from (−∞, +∞) to [0, 1] with a sum of 1, transforming the outputs of the fully connected layer into probabilities:
$$S_j=\frac{e^{x_j}}{\sum_i e^{x_i}} \tag{3}$$
where x_i and x_j are the outputs of the i-th and j-th nodes of the last fully connected layer, and S_j is the probability obtained after normalization.
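Formula (3) can be sketched directly in numpy; the 6 logits below are made-up values standing in for the outputs of the final 6-unit layer.

```python
# Softmax: exponentiate the logits and normalize so they form a
# probability distribution over the 6 classes.
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))   # subtract the max for numerical stability
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1, -1.0, 0.5, 3.0])  # example outputs
probs = softmax(logits)
print(probs, probs.sum())
```

Subtracting the maximum logit before exponentiating leaves the result unchanged mathematically but avoids overflow for large logits.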

B. RESNET-50
The ResNet-50 network [8] adopts both a modular design and the residual idea. It consists of five conv modules, an average pooling layer and a fully connected layer. The first conv module consists of a 7 × 7 convolution layer with stride 2 and a 3 × 3 max pooling layer with stride 2. The second, third, fourth and fifth conv modules contain three, four, six and three residual units respectively. The ResNet-50 network structure is shown in Figure 9.
In a convolutional neural network, when the number of layers is below a certain value (about 20), performance increases with depth; beyond that value, performance decreases rather than increases, because adding layers leads to vanishing or exploding gradients [8]. Back-propagation multiplies the partial derivatives of the layer functions together, layer by layer. When the network is very deep and the absolute values of the partial derivatives are less than 1, the error signal reaching the shallow layers is a product of many numbers less than 1 and approaches 0, so the shallow weights are not updated; this is the vanishing gradient. Conversely, when the absolute values of the partial derivatives are greater than 1, the error gradient accumulates into a very large value, causing large weight updates and making the network unstable; this is the exploding gradient. The residual unit solves this problem with the idea of residuals; its structure is shown in Figure 10.
To address the problem that accuracy decreases as the network deepens, ResNet uses two parallel paths: an identity mapping and a residual mapping. If the network has already reached its optimum and is deepened further, the residual mapping can be driven to 0, leaving only the identity mapping, which prevents vanishing or exploding gradients. In this way the ResNet-50 network stays near its optimal state, and performance does not degrade as depth increases.
Each residual unit consists of three convolution layers, with kernel sizes 1 × 1, 3 × 3 and 1 × 1 respectively. Each convolution layer is followed by a batch normalization layer, which effectively increases the generalization ability of the model and reduces training difficulty, and the ReLU activation function follows each batch normalization layer. Each residual unit also contains a ''shortcut connection''. Because of the shortcut connection, the deep convolutional neural network can degenerate into a shallower network during training, preventing vanishing or exploding gradients.
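The core behavior of a residual unit can be illustrated with a toy numpy function: the unit computes relu(F(x) + x), so when the residual branch F collapses to zero the unit reduces to an identity mapping. The branches below are arbitrary stand-ins, not the real three-convolution branch of ResNet-50.

```python
# Toy residual unit: shortcut connection plus ReLU.
import numpy as np

def residual_unit(x, residual_branch):
    """Compute relu(F(x) + x), where F is the residual branch."""
    return np.maximum(residual_branch(x) + x, 0.0)

x = np.array([1.0, 2.0, 3.0])
zero_branch = lambda v: np.zeros_like(v)   # "learned" residual of zero
small_branch = lambda v: 0.1 * v           # small learned refinement

print(residual_unit(x, zero_branch))   # identity: [1. 2. 3.]
print(residual_unit(x, small_branch))  # [1.1 2.2 3.3]
```

This is why the text can say the deep network "degenerates into a shallower network": a unit whose branch outputs zero simply passes its input through unchanged.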

IV. EXPERIMENTAL METHOD
The experiments are implemented in TensorFlow, an open-source machine learning framework developed by Google with a Python interface. The program comprises data preparation, network configuration, model training and model evaluation: the data preparation part constructs the data provider and preprocesses the data; the network configuration part determines the network model, loss function and optimizer; the model training part trains and saves the model; and the model evaluation part monitors the intermediate results of training. The VGG-19 network is built in TensorFlow, the preprocessed training set of the low emissivity coating damage data set is fed to the network for 100 epochs, and the loss of each epoch is recorded. The cross-entropy loss function, shown in formula (4), is adopted. After each epoch, the test set is evaluated and the top-1 damage identification accuracy is obtained. The learning rate is adjusted through experiments, and the optimal model is obtained at a learning rate of $5\times10^{-5}$; the training loss curve is shown in Figure 11, and the top-1 accuracy curve in Figure 12.
$$H(p,q)=-\sum_x p(x)\log q(x) \tag{4}$$
where p(x) represents the true value and q(x) represents the predicted value.
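As a sketch of formula (4) for classification with one-hot labels (the three-class example values are our own, not from the paper):

```python
# Cross-entropy between a true distribution p and a prediction q:
# H(p, q) = -sum(p * log q). A small eps guards against log(0).
import numpy as np

def cross_entropy(p, q, eps=1e-12):
    return float(-np.sum(p * np.log(q + eps)))

p = np.array([0.0, 1.0, 0.0])           # one-hot: true class is index 1
good = np.array([0.05, 0.90, 0.05])     # confident, correct prediction
bad = np.array([0.60, 0.20, 0.20])      # confident, wrong prediction

print(cross_entropy(p, good), cross_entropy(p, bad))
```

The loss is low when the predicted probability mass sits on the true class and grows as the prediction moves away from it, which is what makes it a useful training signal here.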
The experimental results show that as training progresses the model converges and gradually reaches a stable state; the minimum loss of $1.639\times10^{-5}$ is reached after the 77th epoch. Nevertheless, the detection accuracy of the model is low, and it is difficult to identify low emissivity coating damage accurately. The reason is that the damage features in the data set constructed in this paper are complex, while the VGG-19 structure is relatively simple and cannot extract and identify them well; VGG-19 is therefore not suitable for low emissivity coating damage identification. A ResNet-50 network is then built in TensorFlow with the cross-entropy loss function and trained for 100 epochs on the same data set. At a learning rate of $5\times10^{-5}$, SGD, momentum, RMSprop and Adam are each applied to optimize the network; the training loss curves are shown in Figure 13, and the top-1 accuracy curves in Figure 14.
The computation time of the SGD optimization algorithm (formula (5)) [9] does not depend on the number of training samples, so it can converge even when the training set is very large; for large enough data sets, SGD may converge to within some tolerance of the final error before processing the whole training set. However, a suitable learning rate is difficult to choose: if it is set too large, the learning curve oscillates violently and the loss usually increases significantly; if it is too small, learning is very slow, and with too low an initial learning rate the training may remain stuck at a very high loss.
$$w_i^{t+1}=w_i^{t}-\alpha\frac{\partial L}{\partial w_i^{t}},\qquad b_i^{t+1}=b_i^{t}-\alpha\frac{\partial L}{\partial b_i^{t}} \tag{5}$$
where L is the loss function, w_i is the i-th weight parameter of the network, b_i is the i-th bias parameter, t is the iteration number, and α is the learning rate.
The momentum optimization algorithm (formula (6)) [10] accelerates the learning of parameters whose gradients keep the same direction and reduces the updates of parameters whose gradients change direction. It therefore accelerates learning along consistent directions, suppresses oscillation, and speeds up convergence.
$$v_{w_i}^{t}=\beta v_{w_i}^{t-1}+(1-\beta)\frac{\partial L}{\partial w_i^{t}},\qquad w_i^{t+1}=w_i^{t}-\alpha v_{w_i}^{t} \tag{6}$$
where β is the exponential weighting parameter; in this paper β = 0.9, and the bias b_i is updated in the same way.
The RMSprop optimization algorithm (formula (7)) [11] adapts the learning rate. At the beginning of training the denominator is small, so the learning rate is large and learning is fast; later the learning speed gradually slows down. It is also suitable for sparse gradients: parameters with large partial derivatives have a quickly decreasing learning rate, while parameters with small partial derivatives have a slowly decreasing one.
$$s_{w_i}^{t}=\beta s_{w_i}^{t-1}+(1-\beta)\left(\frac{\partial L}{\partial w_i^{t}}\right)^{2},\qquad w_i^{t+1}=w_i^{t}-\frac{\alpha}{\sqrt{s_{w_i}^{t}}+\epsilon}\frac{\partial L}{\partial w_i^{t}} \tag{7}$$
where β is the exponential weighting parameter, here β = 0.999, and ε prevents the denominator from being zero, here ε = 1 × 10^{−9}; the bias b_i is updated in the same way.
The Adam optimization algorithm (formula (8)) [11] combines momentum and RMSprop, using first- and second-order moment estimates of the gradient to dynamically adjust the learning rate of each parameter, with bias correction.
$$v_{w_i}^{t}=\beta_1 v_{w_i}^{t-1}+(1-\beta_1)\frac{\partial L}{\partial w_i^{t}},\qquad s_{w_i}^{t}=\beta_2 s_{w_i}^{t-1}+(1-\beta_2)\left(\frac{\partial L}{\partial w_i^{t}}\right)^{2}$$
$$\hat{v}_{w_i}^{t}=\frac{v_{w_i}^{t}}{1-\beta_1^{t}},\qquad \hat{s}_{w_i}^{t}=\frac{s_{w_i}^{t}}{1-\beta_2^{t}},\qquad w_i^{t+1}=w_i^{t}-\frac{\alpha}{\sqrt{\hat{s}_{w_i}^{t}}+\epsilon}\hat{v}_{w_i}^{t} \tag{8}$$
where β_1 and β_2 are the exponential weighting parameters, here β_1 = 0.9 and β_2 = 0.999, and ε prevents the denominator from being zero, here ε = 1 × 10^{−9}; the bias b_i is updated in the same way.
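The four update rules can be sketched for a single scalar parameter and checked on the toy quadratic loss L(w) = w² (gradient 2w). The learning rate 0.1 and the test function are our own illustrative choices; the β and ε values follow the paper.

```python
# Scalar versions of the SGD, momentum, RMSprop and Adam update rules.
import math

def grad(w):
    return 2.0 * w              # dL/dw for L(w) = w^2

def run(update, steps=200, w0=5.0):
    w, state = w0, {}
    for t in range(1, steps + 1):
        w = update(w, grad(w), state, t)
    return w

def sgd(w, g, s, t, lr=0.1):
    return w - lr * g

def momentum(w, g, s, t, lr=0.1, beta=0.9):
    s["v"] = beta * s.get("v", 0.0) + (1 - beta) * g
    return w - lr * s["v"]

def rmsprop(w, g, s, t, lr=0.1, beta=0.999, eps=1e-9):
    s["s"] = beta * s.get("s", 0.0) + (1 - beta) * g * g
    return w - lr * g / (math.sqrt(s["s"]) + eps)

def adam(w, g, s, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-9):
    s["v"] = b1 * s.get("v", 0.0) + (1 - b1) * g       # 1st moment
    s["s"] = b2 * s.get("s", 0.0) + (1 - b2) * g * g   # 2nd moment
    v_hat = s["v"] / (1 - b1 ** t)                     # bias correction
    s_hat = s["s"] / (1 - b2 ** t)
    return w - lr * v_hat / (math.sqrt(s_hat) + eps)

for name, u in [("SGD", sgd), ("momentum", momentum),
                ("RMSprop", rmsprop), ("Adam", adam)]:
    print(name, run(u))
```

On this well-conditioned toy problem all four rules drive w toward the minimum at 0; the differences the paper reports only emerge on the real, high-dimensional loss surface.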
The experimental results show that the detection accuracy of the ResNet-50 model is higher than that of the VGG-19 model. At a learning rate of $5\times10^{-5}$, when SGD is used the loss decreases very slowly, the top-1 accuracy stays at 0.5319, and the training barely converges, so SGD is not suitable for optimizing the model constructed in this paper. With the momentum method, the loss reaches its minimum of $4.02\times10^{-3}$ after the 96th epoch. With the Adam algorithm selected to optimize the network, the learning rate is then varied and the network is trained under different learning rates; the training loss curves are shown in Figure 15, and the top-1 accuracy curves in Figure 16.
The experimental results show that when the learning rate is 0.001 or 0.0005, the loss does not decrease with training but oscillates continuously, and the top-1 accuracy remains at 0.5319. This is because the learning rate is too large: the model fluctuates strongly, jumping around the global optimum without converging to it. When the learning rate is 0.0001, the loss reaches its minimum of $2.838\times10^{-8}$ after the 89th epoch, the final total loss of the model is $6.001\times10^{-3}$, and the top-1 accuracy reaches its maximum of 0.9549 after the 47th epoch; the overall top-1 accuracy after training is 0.8911.
As shown in Figure 17, in the range from 0.001 to 0.00001 the overall top-1 accuracy first increases and then decreases with the learning rate, and models with too large or too small a learning rate tend to converge to a local optimum. When the learning rate is too high, the model fluctuates greatly and easily skips over the global optimum; when it is too low, the network keeps learning near a local optimum and cannot find the global optimum.
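The learning-rate trade-off just described can be demonstrated on the same toy quadratic loss L(w) = w²: too large a step oscillates and diverges, a moderate step converges, and a tiny step barely moves within the same budget of iterations. The specific rates below are illustrative, not the paper's.

```python
# Plain gradient descent on L(w) = w^2 under three learning rates.
def final_w(lr, steps=50, w0=5.0):
    w = w0
    for _ in range(steps):
        w -= lr * 2.0 * w   # gradient of w^2 is 2w
    return w

for lr in (1.1, 0.1, 1e-5):
    print(lr, final_w(lr))
```

With lr = 1.1 each step multiplies w by −1.2, so |w| grows without bound; lr = 0.1 shrinks w by a factor 0.8 per step toward 0; lr = 1e-5 leaves w almost unchanged after 50 steps, mirroring the oscillating, converging, and stagnating regimes seen in Figures 15 and 16.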

V. RESULTS
For the low emissivity coating damage data set constructed in this paper, VGG-19 and ResNet-50 networks are built and trained, and the optimal model is found by adjusting the optimization method and the learning rate. As shown in Table 1, compared with training without mosaic data augmentation, the recognition accuracy of the model improves when mosaic augmentation is used, indicating that expanding the data set by mixing different types of images is effective for improving model performance.
With the Adam optimization algorithm and a learning rate of $5\times10^{-5}$, the VGG-19 and ResNet-50 networks each reach their optimal models. However, the overall detection accuracy of the former is only 0.9064, while that of the latter is 0.9414. According to the experimental results, the ResNet-50 network constructed in this paper is better suited to automatic damage identification of low emissivity coatings.

VI. CONCLUSION
In this paper, a low emissivity coating damage data set is constructed and augmented, to facilitate further research by other researchers in this field.
We build VGG-19 and ResNet-50 models and train them on the low emissivity coating damage data set, showing that ResNet-50 is better for identifying low emissivity coating damage. With the Adam optimization algorithm and a learning rate of $5\times10^{-5}$, the optimal model is obtained for automatic identification of low emissivity coating damage. This study is of significance for the automatic detection of low emissivity coating damage.