Generation of CNN Architectures Using the Harmonic Search Algorithm and Its Application to Classification of Damaged Sewer Pipes

When used for image processing, a convolutional neural network (CNN) offers no way to verify in advance which configuration will achieve high performance; nevertheless, CNNs perform well enough to be used in many fields. A CNN's performance is determined by its architecture and numerous parameters, and it is infeasible to evaluate every possible combination; therefore, well-known CNN models are generally reused. Recently, many methods for tuning CNN parameters or generating CNN architectures have been studied. Methods based on metaheuristic algorithms often focus on parameter tuning or on simple hierarchical architectures. This paper proposes a method that uses the harmony search (HS) metaheuristic algorithm to create a CNN model with a complex architecture that can be applied to different datasets. The aim is to generate a CNN architecture using few computing resources and to verify the results. To build the CNN model in units of cells, the internal and hierarchical architecture of each cell was generated through HS by training on the CIFAR image datasets, and the resulting model's performance was confirmed by applying it to the classification of a damaged-sewer-pipe image dataset.


I. INTRODUCTION
In image classification, the task that popularized machine learning, CNNs [1] have become a widely used method. CNNs perform well in image processing, are used in various fields, and new models with different layers and connection schemes are released every year. Since the emergence of AlexNet [2], which increased the popularity of CNNs, models such as GoogLeNet [3], VGGNet [4], ResNet [5], and DenseNet [6] have been created, each showing that CNN performance can be improved through its distinctive features. However, these are models produced by a person directly determining the layer types, the connection architecture, and the parameters, so a model's performance depends on the problem to which it is applied. It is difficult for humans to design a different CNN architecture for every type of problem: no one can find the best model by manually checking the performance of each candidate and adjusting layer types, connections, and parameters by hand. Therefore, various studies aim to generate better CNN models automatically. This paper proposes a method for generating CNN architectures using the harmony search (HS) [7] algorithm; the method is executed on the CIFAR image datasets, and the result is applied to damaged-sewer-pipe classification.
One of the methods to improve the performance of a CNN model is to tune the parameters of the layers used in the CNN.
For parameter tuning, methods using genetic algorithms [8][9][10], particle swarm optimization [11][12][13][14], or HS [15], [16] have been proposed. The details differ between studies, but they share one property: the type and order of the layers are fixed, and only parameters such as the kernel size, stride, padding, and number of output channels of each layer are changed to obtain better results. Parameter tuning has the advantage of taking less time because the layer structure itself is fixed; however, because the layer types and their order cannot change, the performance of the original CNN model cannot be improved greatly.
There have also been studies proposing methods for generating the CNN architecture itself, using approaches such as neural architecture search (NAS) [19], which uses reinforcement learning [17], and genetic CNN [20], which uses genetic algorithms (GAs) [18]. Published CNN architecture generation methods include Meta-QNN [21], hierarchical evolution [22], large-scale evolution [23], genetic programming CNN (CGP-CNN) [24], efficient architecture search (EAS) [25], evolving deep CNN [26], advanced neural architecture search (NASNet) [27], and CNN-genetic algorithms [28]. Each study constructs the CNN architecture somewhat differently: some fix the overall architecture and change only the connections between layers; some adjust both the architecture and the layer parameters; and some apply the architecture obtained from one dataset to another dataset. Evidently, this topic has been studied from a variety of perspectives.
Generating CNN architectures and comparing their performance is difficult. Because architecture generation requires substantial computational resources, the quality of the resulting model depends on how much compute time is spent, even when the same method is used. In the case of NAS, 800 graphics processing units (GPUs) were used for 28 days to generate a CNN architecture with 93.99% accuracy on the CIFAR10 dataset; with a single GPU, roughly 20,000 days would be required. By contrast, genetic CNN used 17 GPUs for one day to build a model with 92.90% accuracy on CIFAR10. Accuracy is an important factor in a CNN classification model, but considering the time and computational resources spent to obtain it, it is difficult to conclude whether NAS or genetic CNN is the better method. Moreover, because studies differ in training method, data preprocessing, generation objective, and so on, no single CNN architecture generation method can be declared the best, even on the same dataset; each has its own advantages and disadvantages.
In existing CNN architecture generation methods based on metaheuristic algorithms, the values of the algorithm's population are mapped to layer types and connection relationships, and the layer parameters are adjusted accordingly. Compared with reinforcement learning, metaheuristic methods can check results in less time and are easier to implement, because each evaluation of a single model adjusts many values at once. However, they have the disadvantage that the size of the generated CNN architecture is determined by the number of variables in the algorithm. With HS, by contrast, the size of the architecture to be generated need not be fixed; in this paper, we further propose applying the generated CNN model to a dataset other than the one used to produce the architecture.
The main features of the proposed HS-based CNN generation method are as follows. First, CNN architectures generated with existing metaheuristic algorithms mainly change the connection architecture of the layers, or change the layer types within a model whose connection architecture is fixed. Such methods can be applied easily to various fields, but the number of variables in the algorithm is fixed because the number of layers in the architecture does not change. This study demonstrates that, among metaheuristic algorithms, HS can generate a complex model: the cell unit introduced in NASNet is adopted, and HS determines the variables that define the cell structure. The internal structure of a cell is divided into up to three levels, each composed of three layers, and a cell-unit model is created through add operations between the layers.
Second, existing CNN architecture generation methods compare model performance using accuracy, with a different number of epochs in each study. In this study, candidate CNN models are compared after 20 epochs of training. In addition to the accuracy at 20 epochs, the fitness uses the accuracy before 20 epochs and the number of model parameters. This favors models whose accuracy has not yet converged at 20 epochs (and is therefore still improving) and, among architectures with many layers, selects a model with as few parameters as possible.
Third, generating a CNN architecture requires checking the training results of many candidate models; the larger the architecture, the more computational resources and time are needed to find a suitable model. Because training architectures directly on large images takes a long time, the cell structure of NASNet is adopted. This study shows that a model whose cells were trained on the small-sized CIFAR image datasets can be applied to a large-sized sewer pipe image dataset to produce results.
The remainder of this paper is organized as follows. Related work is discussed in Section II. Section III describes the HS algorithm used to generate the CNN architecture and the method for setting its main parameters. Section IV describes the generation of a CNN architecture using HS, and Section V the conditions used to generate it. Section VI compares the CNN architecture generated using HS with other CNN models, and Section VII presents the conclusions.

II. RELATED WORK

A. CONVOLUTIONAL NEURAL NETWORKS
CNNs are a type of neural network that produces good results with fewer computations than a fully connected network. A CNN mainly uses two layer types: the convolution layer and the pooling layer. The convolution layer applies filters to the input data; a filter is a matrix whose size is called the kernel size and which has the same number of channels as its input. At each position, the filter's values are multiplied element-wise with the input and summed across all channels, and the result is stored. The filter moves along the input by the given stride, first horizontally to the end of a row and then vertically, repeating until the whole input is covered. The matrix of stored values is the output, called a feature map; the number of feature maps produced is specified by the layer's number of output channels. The edges of the input can additionally be treated as filled with zeros, which is referred to as padding. The main parameters of the convolution layer are therefore the number of output channels, kernel size, stride, and padding. The pooling layer operates in the same sliding-window manner, but the filter itself has no values: instead of a weighted sum, the maximum or average value inside the window is taken, so the numbers of input and output channels are the same. Its main parameters are kernel size, stride, padding, and the pooling operation type. Among these parameters, kernel size, stride, and padding determine the size of the feature map, so the output size can be increased or decreased according to these three values.
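The relationship between kernel size, stride, padding, and feature-map size described above follows the standard convolution-arithmetic formula. A minimal sketch (the function name is ours; the floor-division convention matches common frameworks such as PyTorch):

```python
def conv_output_size(in_size: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial size of a convolution/pooling output (floor division,
    the convention used by most deep learning frameworks)."""
    return (in_size + 2 * padding - kernel) // stride + 1

# A 32x32 CIFAR image through a 3x3 convolution, stride 1, padding 1
# keeps its spatial size:
assert conv_output_size(32, kernel=3, stride=1, padding=1) == 32
# A pooling layer with kernel 2 and stride 2 halves it:
assert conv_output_size(32, kernel=2, stride=2, padding=0) == 16
```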
Convolution layers also have several variants, depending on how they are used. One of them, depth-wise separable convolution, is composed of two layers: depth-wise convolution and point-wise convolution [29]. Unlike a general convolution layer, depth-wise convolution generates a feature map for each channel separately rather than summing the filtered values over all input channels. Consequently, depth-wise convolution requires less computation than a standard convolution layer with the same number of output channels; however, as the number of channels grows, the cost of the next operation increases significantly. Point-wise convolution, a convolution layer with a 1 × 1 kernel, is therefore used to reduce the number of channels produced by depth-wise convolution. Because its kernel is small, its cost is low, and reducing the number of output channels also reduces the computation of the following layer, at the price of some information loss. In this study, depth-wise convolution multiplies the number of channels by 8. Another technique used in CNNs is the skip connection, first introduced in [30]. Backpropagation, the primary training method of neural networks, suffers from the vanishing gradient problem in deep networks: when values smaller than one are repeatedly multiplied during backpropagation, the gradient becomes too small and training stalls. To avoid this, a bypass that does not pass through certain layers is created, called a skip connection.
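The computational saving of depth-wise separable convolution described above can be illustrated by counting weights. This is a sketch (function names are ours, biases omitted), including the channel multiplier of 8 used in this paper:

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard convolution: one k x k x c_in filter
    per output channel (bias omitted)."""
    return c_out * c_in * k * k

def separable_conv_params(c_in: int, c_out: int, k: int, mult: int = 8) -> int:
    """Weights in depth-wise convolution (one k x k filter per input
    channel, with a channel multiplier) followed by point-wise (1 x 1)
    convolution down to c_out channels."""
    depth_wise = c_in * mult * k * k
    point_wise = c_out * (c_in * mult)
    return depth_wise + point_wise

# 3x3 convolution mapping 64 to 128 channels:
standard = standard_conv_params(64, 128, 3)          # 73728 weights
with_mult_8 = separable_conv_params(64, 128, 3, 8)   # 70144 weights
with_mult_1 = separable_conv_params(64, 128, 3, 1)   # 8768 weights
```

With the multiplier of 8 used in this paper the saving over a standard convolution is modest; with a multiplier of 1 (as in MobileNet-style layers) the separable form needs roughly 8x fewer weights.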
ResNet [31] is a representative CNN model that uses skip connections effectively.
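The effect of a skip connection can be sketched in a few lines (a toy illustration, not the cell structure of this paper): the identity path keeps the signal, and therefore its gradient, alive even when the bypassed layers attenuate it:

```python
def residual_block(x, f):
    """Skip connection: the output is f(x) + x, so the identity path
    carries the signal (and gradient) past the layers in f."""
    return [fx + xi for fx, xi in zip(f(x), x)]

# Toy "layer" that strongly attenuates its input, mimicking a
# vanishing signal; the skip connection preserves most of it.
shrink = lambda v: [0.01 * vi for vi in v]
out = residual_block([1.0, 2.0], shrink)  # approximately [1.01, 2.02]
```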

B. RELATED WORK
There have been many studies on improving the performance of CNNs without human involvement.
Early methods [8][9][10][11][12][13][14][15][16] adjusted the parameters within the layers of a CNN: the layer types and connection architecture were fixed, and only the layer parameters were changed to alter the model's performance. Their disadvantage was the need to start from an already designed CNN model.
Subsequently, methods for generating the CNN architecture itself were developed. Methods such as genetic CNN [20], based on a metaheuristic algorithm, determine the connection architecture between layers. In these methods the number of algorithm variables is fixed: a small number of variables encode the layer types together with the major parameters, and the connection order of the layers is determined from them, either by changing the order in which layers connect or by adding skip connections. Generating an architecture from layer connections keeps the architecture itself uncomplicated, so it is easy to use and quick to evaluate; however, it is difficult to create a large architecture, and the layer types are fixed. In contrast, NAS [19], which uses a recurrent neural network (RNN) [32] to determine almost the entire CNN architecture, is the representative method that decides both layer types and layer connections. Its disadvantage is the large amount of time spent generating an architecture of comparatively low accuracy. NASNet [27], however, abandoned several parameters and focused on the architecture itself, showing that better results can be produced, and has influenced many subsequent studies on CNN architecture generation.

III. PSF-HARMONY SEARCH ALGORITHM
The HS algorithm [7] is a metaheuristic that stores high-fitness solutions in a harmony memory (HM) and gradually improves them; parts of the solutions in HM are reused when building the candidate checked in the next iteration. HS has three main parameters: the harmony memory consideration rate (HMCR), the pitch adjustment rate (PAR), and the bandwidth. When a new candidate is created, each variable takes, with probability HMCR, the corresponding value from one of the solutions stored in HM; if a variable takes its value from HM, a random value within the bandwidth is added to it with probability PAR. If a variable does not refer to HM, a random value is used instead. HMCR, PAR, and bandwidth therefore determine the exploration and exploitation behavior of the algorithm. A parameter-setting-free (PSF) method [33] that automatically changes HMCR, PAR, and bandwidth as the algorithm runs was later devised; this study uses the advanced PSF-HS [34]. Advanced PSF-HS induces exploitation by increasing, as the search progresses, the values of HMCR (the probability of reusing a value stored in HM) and PAR (the probability of adding a value within the bandwidth). The advanced PSF-HS used in this study computes HMCR, PAR, and bandwidth in the i-th iteration with equations (1)-(3).
where fit_obj is the target fitness, fit_mean is the average fitness of the solutions currently stored in HM, fit_start is the average fitness of the solutions entered in the first HM, fit_i is the fitness obtained in the i-th iteration, and n(val) is the number of HS variables.
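As a concrete reference for the mechanism described above, the following is a minimal plain HS (fixed HMCR, PAR, and bandwidth, not the advanced PSF variant of equations (1)-(3); all names and the toy objective are ours), minimizing a function over real-valued variables:

```python
import random

def harmony_search(fitness, bounds, hm_size=10, hmcr=0.9, par=0.3,
                   bandwidth=0.05, iterations=2000, seed=0):
    """Minimal harmony search (fixed parameters, not the PSF variant):
    minimizes `fitness` over box-constrained real variables."""
    rng = random.Random(seed)
    rand_solution = lambda: [rng.uniform(lo, hi) for lo, hi in bounds]
    hm = [rand_solution() for _ in range(hm_size)]      # harmony memory
    for _ in range(iterations):
        new = []
        for d, (lo, hi) in enumerate(bounds):
            if rng.random() < hmcr:                     # memory consideration
                v = rng.choice(hm)[d]
                if rng.random() < par:                  # pitch adjustment
                    v += rng.uniform(-bandwidth, bandwidth)
            else:                                       # random selection
                v = rng.uniform(lo, hi)
            new.append(min(max(v, lo), hi))
        worst = max(hm, key=fitness)
        if fitness(new) < fitness(worst):               # replace worst member
            hm[hm.index(worst)] = new
    return min(hm, key=fitness)

# Toy objective: sphere function, optimum at the origin.
best = harmony_search(lambda x: sum(v * v for v in x), [(-5, 5)] * 3)
```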

IV. PROPOSED METHOD
Herein, we propose a new method for determining fitness and for creating a cell composed of multiple layers, in order to generate a CNN architecture with few parameters.

A. FITNESS
In most studies on CNN architecture generation, classification accuracy is used to compare the performance of candidate CNN models, and some studies evaluate each candidate after training for only a small number of epochs [19], [21], [24]-[26]. This study compares each generated CNN model after 20 epochs and performs the final evaluation at 300 epochs. The fitness used by HS is not the accuracy at a single epoch alone: because models are compared after only 20 epochs, their ranking inevitably differs from that after 300 epochs. To predict future performance from a short run, the accuracy at an intermediate epoch is also used. After training a candidate for 20 epochs, the difference between the accuracy at 10 epochs and at 20 epochs is added to the fitness as an estimate of how far the accuracy is from the value it would converge to by 300 epochs; this early difference indicates how much the model may still improve after 20 epochs. The number of parameters of the CNN model is also included in the fitness: one goal of this study is to operate with minimal computing resources, and a suitably sized model may be preferable to a marginally more accurate one, so a term rewarding a small parameter count is added. The fitness created from these conditions is given in equation (4).
where acc_i is the classification accuracy at the i-th epoch and param is the number of parameters of the CNN model.
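The exact form and weights are those of equation (4); purely as an illustration, a composite fitness of the kind described (lower is better; the target value, the convergence term, and the parameter weight below are our assumptions, not the paper's coefficients) could look like:

```python
def fitness(acc20, acc10, params, target=95.0, param_weight=1e-7):
    """Illustrative composite fitness (lower is better). Combines the
    gap to a target accuracy at 20 epochs, the 10 -> 20 epoch gain
    (a model still improving is likely to keep gaining), and a small
    penalty on the parameter count. Coefficients are assumptions."""
    still_improving = acc20 - acc10
    size_penalty = param_weight * params
    return (target - acc20) - still_improving + size_penalty

# A model at 80% (up from 75%) with 1M parameters scores better than
# one at 78% (up from 77%) with 5M parameters:
a = fitness(80, 75, 1_000_000)   # 10.1
b = fitness(78, 77, 5_000_000)   # 16.5
```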

B. CNN ARCHITECTURE
This paper is inspired by NASNet [27] and uses its cell structure for CNN architecture generation: both normal cells and reduction cells are used, and the internal structure of each cell is based on Figure 1. For the input h_i, a configuration of layers and add nodes with a minimum of one and a maximum of three levels is used; Figure 1 shows the three-level case. HS determines the number of output channels of each layer, and layers at the same level share the same number of output channels. Every convolution layer is followed by batch normalization and a leaky rectified linear unit (leaky ReLU). The output of each layer enters one of the three add nodes of its level, according to the value set by HS, and each add node serves as an input to the next level. At the last level, the add nodes are concatenated to produce the output h_{i+1}. An add node not connected to any layer instead receives the input h_i passed through a 1 × 1 convolution, which makes its number of channels equal to that of the other add nodes at the same level; although the channel adjustment means this is not a plain skip connection, the otherwise unused add node achieves a similar effect. A normal cell uses this structure as is; in a reduction cell, a pooling layer that halves the feature map size is added immediately after the concatenation. As shown in Figure 2, normal and reduction cells are connected to form the CNN architecture: the normal cell is repeated as many times as set by HS, a block of normal cells and a reduction cell form one group, and the group is repeated a number of times set by HS, connecting the input image to the output softmax.
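The cell stacking of Figure 2 can be summarized in a short sketch (names are ours; `a` and `b` are the HS-chosen repetition counts, and each reduction cell halves the spatial size):

```python
def architecture_summary(input_size, a, b):
    """Sketch of the Figure 2 layout: each group is `a` normal cells
    followed by one reduction cell that halves the feature map; the
    group repeats `b` times."""
    layout, size = [], input_size
    for _ in range(b):
        layout += ["normal"] * a
        layout.append("reduction")
        size //= 2
    return layout, size

# a = 2, b = 2 (the configuration HS found for CIFAR10 in Section VI):
layout, size = architecture_summary(32, a=2, b=2)
# layout: ['normal', 'normal', 'reduction', 'normal', 'normal', 'reduction']
# size: 8 (32 halved twice)
```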

C. HARMONY SEARCH
Herein, HS determines the number of repetitions of the cell structure, the type and connections of each layer inside the cell, and the number of output channels per layer. The difference from normal HS operation is that if a variable is unused (depending on the number of levels of the generated architecture), its value is not stored in HM. If the algorithm attempts to retrieve a value with probability HMCR for the next iteration but no value is stored in HM, a random value is used. The pseudocode of the HS used in this study is outlined in Algorithm 1.

V. EXPERIMENTAL DESIGN
In this study, a single GPU card, an Nvidia GeForce GTX 1060, was used to generate the CNN architecture. Training used stochastic gradient descent (SGD) with a default learning rate of 0.01, multiplied by 0.1 at epochs 101 and 201 over a total of 300 epochs. Generating the CNN architecture from the CIFAR image dataset took 10 days. Cross-entropy was used as the training loss function, including for the runs on the sewer pipe image dataset.
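The step schedule above is equivalent to the following sketch (in PyTorch this corresponds to `torch.optim.lr_scheduler.MultiStepLR` with milestones `[101, 201]` and `gamma=0.1`):

```python
def learning_rate(epoch, base_lr=0.01, milestones=(101, 201), gamma=0.1):
    """Step schedule for the 300-epoch runs: the base rate is
    multiplied by 0.1 at epochs 101 and 201."""
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma
    return lr

assert abs(learning_rate(1) - 0.01) < 1e-12     # epochs 1-100
assert abs(learning_rate(150) - 0.001) < 1e-12  # epochs 101-200
assert abs(learning_rate(250) - 0.0001) < 1e-12 # epochs 201-300
```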

A. HARMONY SEARCH SETTING
The main parameters of HS are computed by PSF-HS using equations (1)-(3), so they need not be set separately; however, the input range of each variable must still be chosen by hand. Each variable and its input range are configured as shown in Figure 3. The normal cell is repeated 1-2 times, and the normal-reduction cell group 1-3 times. The level number determines how many levels each cell has, from 1 to 3. Each level's layers can have 32, 64, 128, 256, or 512 output channels, expressed as the values 1-5. The reduction cell halves the feature map by adding a pooling layer with a stride of 2 at the end of the cell structure, and the type of that pooling layer is expressed as a value of 1-4. Each layer's type is set with a value of 1-13, as listed in Table 1, and the add node to which each layer connects is determined by a value of 1-3.

B. CIFAR IMAGE DATASET
In this study, the CIFAR image dataset [35] was used to generate the cell structure. The dataset comprises 60,000 32 × 32 RGB images: 50,000 for training and 10,000 for testing. It is divided into CIFAR10, with 10 classes (5,000 training and 1,000 test images per class), and CIFAR100, with 100 classes (500 training and 100 test images per class). The 32 × 32 CIFAR images have been used as a benchmark in many studies, including much of the work on CNN architecture generation. However, those studies differ in training method, epochs, GPU model, GPU days, and model generation method; even when classification performance is compared on the same dataset, the results cannot be compared directly, so no specific method can be called the best for generating CNN architectures.
Herein, the results of other studies using the CIFAR image dataset are tabulated, but the results of the individual studies are not compared in detail.
In this study, fitness was determined using only the test dataset, without a separate validation dataset. Each image was zero-padded by four pixels in every direction, randomly cropped back to 32 × 32, and horizontally flipped with a probability of 0.5. Because transformations of the image dataset affect the results, only the representative image transformations used in CNN architecture generation papers were applied.
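The augmentation above (zero-pad 4, random 32 × 32 crop, horizontal flip with probability 0.5) can be sketched in plain Python for a single-channel image given as a list of rows; in practice, torchvision's `RandomCrop(32, padding=4)` and `RandomHorizontalFlip()` perform the same steps on tensors:

```python
import random

def augment(img, pad=4, seed=None):
    """Zero-pad `pad` pixels on each side, randomly crop back to the
    original size, flip horizontally with probability 0.5."""
    rng = random.Random(seed)
    h, w = len(img), len(img[0])
    padded = [[0] * (w + 2 * pad) for _ in range(pad)]
    padded += [[0] * pad + list(row) + [0] * pad for row in img]
    padded += [[0] * (w + 2 * pad) for _ in range(pad)]
    top, left = rng.randint(0, 2 * pad), rng.randint(0, 2 * pad)
    crop = [row[left:left + w] for row in padded[top:top + h]]
    if rng.random() < 0.5:
        crop = [row[::-1] for row in crop]  # horizontal flip
    return crop

# The output keeps the original 32x32 size regardless of the crop offset.
out = augment([[1] * 32 for _ in range(32)])
```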

C. SEWER PIPE IMAGE DATASET
The goal of this study was to create a model that classifies images of damaged sewer pipes. The sewer pipe image dataset consists of 12 classes. Three classes show undamaged pipes: pipe joints, the inside of the pipe, and inverts. Nine classes show damage: longitudinal cracks, circumferential cracks, surface damage, broken pipes, lateral protrusions, faulty joints, displaced joints, silt deposits, and an "etc." class for damage types not listed above. The dataset contains 2,000 images per class; because caption text was overlaid during recording, RGB noise was added to that region to prevent overfitting. The images were resized to 128 × 128; 19,200 images (80%) were used for training and the remaining 4,800 (20%) for testing. Example images are shown in Figures 4 and 5. For CNN training on the sewer pipe images, 16 pixels of zero padding were added in each direction, the image was randomly cropped back to 128 × 128, and a horizontal flip was applied with a probability of 0.5. CNN training on the transformed sewer pipe images ran for a total of 100 epochs.

VI. EXPERIMENTAL RESULT
The results of this study are compared with those of other CNN models from other studies. The results of VGGNet [4], ResNet [5], and DenseNet [6] were obtained under the same conditions as HS-CNN using the models provided by PyTorch; for VGGNet, the variant with batch normalization was used. For VGGNet, ResNet, and DenseNet, two cases were evaluated: with transfer learning from a model pre-trained on the ImageNet image dataset, and without it. An exact comparison with results cited from other papers is not possible because the epochs, training methods, image preprocessing, and GPU models differ between papers; the contribution of this study lies in comparing CNN architecture generation results obtained with relatively few GPU days.

A. CIFAR10 IMAGE DATASET
The cell and cell connection method of the CNN architecture obtained from the CIFAR10 image dataset are shown in Figure 6. A batch size of 4 was used during architecture generation, and a fit_obj of 95 was used in the advanced PSF-HS algorithm. The architecture using the cell in Figure 6 repeats the normal cell twice, and the normal-reduction cell group also twice; this is the configuration a = 2, b = 2 in Figure 2. The results of 300 epochs of training with a batch size of 32 on this architecture are indicated as HS-CNN in Table 2. Compared with VGGNet, which has the highest accuracy among the reference CNN models, HS-CNN uses fewer parameters and its accuracy is 0.15% higher. HS-CNN took only 10 GPU days and showed better results than the pre-trained CNN models.

B. CIFAR100 IMAGE DATASET
The cell and cell connection method of the CNN architecture obtained from the CIFAR100 image dataset are shown in Figure 7. A batch size of 4 was used during architecture generation, and a fit_obj of 75 was used in the advanced PSF-HS algorithm. The architecture using the cell in Figure 7 repeats the normal cell twice and the normal-reduction cell group twice, i.e., a = 2 and b = 2 in Figure 2.
The results of 300 epochs of training with a batch size of 32 on this architecture are indicated as HS-CNN in Table 3. Compared with VGGNet, which has the highest accuracy among the reference CNN models, HS-CNN achieves a 2% higher accuracy.

C. SEWER PIPE IMAGE DATASET
Training on the sewer pipe image dataset was performed for 100 epochs with a batch size of 4; the learning rate was 0.01 at the start, 0.001 from epoch 50, and 0.0001 from epoch 75. In Table 4, the results of GoogLeNet [3] and WideResNet [37] without transfer learning are added to those of VGGNet, ResNet, and DenseNet, together with the results of the NASNet-A Mobile and NASNet-A Large models created in the NASNet [27] study. The VGGNet and ResNet results in Table 4 show that, in contrast to the CIFAR image datasets, accuracy may decrease rather than increase significantly when transfer learning is used.
As seen in Figures 4 and 5, the sewer pipe images have low RGB values, with many regions close to 0 near the center of the image, so they look quite different from ImageNet images, which have relatively high RGB values. When performing sewer pipe classification with NASNet-A, as with the other CNN models, the number of output channels of the final linear layer was changed to 12, the number of sewer pipe classes, and the remaining training conditions were kept the same. In Table 4, the classification accuracy of the NASNet-A models without transfer learning is very low; given their training loss, NASNet-A appears, like ResNet and DenseNet, to suffer from overfitting. With transfer learning from a model pre-trained on the ImageNet dataset, the NASNet-A models showed higher accuracy than without it; this large gap, compared with the other CNN models, suggests that the pre-trained NASNet-A parameters are better suited to classifying sewer pipe images. However, since the accuracy of every NASNet-A model is lower than that of the VGGNet model, NASNet-A is not well suited to sewer pipe image classification, and the HS-CNN-based model proposed in this paper performs better on this task.

VII. CONCLUSION
This paper proposes a method to generate a CNN architecture with HS from a dataset of small images and to create a classification model for large images using the generated architecture. We generated the CNN architecture through HS using the CIFAR image datasets as the small-image data and compared it with transfer learning results from other papers and with known CNN models; it showed better accuracy than the known models. Unlike other CNN architecture generation methods, the HS-based architecture should be considered a model built with only 10 GPU days of computing resources. Based on the HS-CNN produced from the CIFAR image datasets, we created a classification model for the large-image sewer pipe dataset and confirmed the results. Because the sewer pipe image dataset is an original dataset that has not been used in other studies, it cannot be compared with other studies' results; therefore, the sewer pipe classification was compared with existing CNN models. On the sewer pipe dataset, the existing CNN models could be evaluated on their own, since their accuracy decreased when transfer learning from ImageNet pre-trained models was applied. The CNN architecture generated using HS showed a classification accuracy at least 1.96% and up to 5.67% higher than that of VGGNet, the most accurate of the existing CNN models.
In this study, the two CNN models created with HS and the CIFAR image datasets for classifying the sewer pipe image dataset showed higher accuracy than the other CNN models, confirming that the searched CNN architecture is transferable. This paper thus shows that classification models for different image datasets can be created through CNN architecture generation using HS with few computing resources.