Using Hybrid Filter-Wrapper Feature Selection With Multi-Objective Improved-Salp Optimization for Crack Severity Recognition

The emerging technology of Structural Health Monitoring (SHM) paved the way for spotting and continuous tracking of structural damage. One of the major defects in historical structures is cracking, which represents an indicator of potential structural deterioration according to its severity. This paper presents a novel crack severity recognition system using a hybrid filter-wrapper with multi-objective optimization feature selection method. The proposed approach comprises two main components, namely, (1) feature extraction based on hand-crafted feature engineering and CNN-based deep feature learning and (2) feature selection using hybrid filter-wrapper with a multi-objective improved salp swarm optimization. The proposed approach is trained and validated by utilizing 10 representative UCI datasets and 4 datasets of crack images. The obtained experimental results show that the proposed system enhances the performance of crack severity recognition with ≈ 37% and ≈ 31% increase in recognition average accuracy and F-measure, respectively. Also, a reduction rate of ≈ 67% is achieved in the extracted feature set with all the tested datasets compared to the conventional classification approaches using the whole set of features. Moreover, the proposed approach outperforms other approaches with classical feature selection methods in terms of feature reduction rate and computational time. It is noticed that using VGG16 learned features outperforms using the fused hand-crafted features by 17.7%, 15.9%, and 23.5% for fine, moderate, and severe crack recognition, respectively. The significance of this paper is to investigate and highlight the impact of applying multi-feature dimensionality reduction through adopting hybrid filter-wrapper with multi-objective optimization methods for feature selection considering the case study of crack severity recognition for SHM.


I. INTRODUCTION
Protection and preservation of historical buildings have attracted the interest of many researchers due to their importance. Historical buildings face many threats, such as environmental and human factors that induce stresses inside the buildings and cause degradation over time [1]-[3]. Therefore, developing automated systems for detecting historical building damage is a pivotal step to save cultural heritage.
Structural Health Monitoring (SHM) is considered a substantial method for damage detection and estimation of structures' durability, drawing on multidisciplinary fields such as sensors, image processing, materials, system integration, signal processing, and interpretation. The main aim of SHM technology is not solely detecting structural failures/deterioration, but also providing an early indication of material damage. Consequently, these early warnings can be used to determine repair strategies before the structural damage results in failure [4], [5].
The associate editor coordinating the review of this manuscript and approving it for publication was Xiaoou Li.
Cracks represent a major defect in building infrastructure, with a critical economic impact, and can constitute safety risks if left untreated. Also, cracks are considered the main indicators of potential structural damage and of a building's durability. In order to help safeguard historical buildings from further damage and to understand the future development of structural defects, it is necessary to develop effective and adaptable monitoring strategies to assess the state of damaged structures/buildings. Observation results could be used to plan and optimize maintenance and conservation works. Usually, cracks are distinguished by dark areas in an image. This makes it easy for a human to identify cracks; however, classifying crack severity by human judgment is difficult, time-consuming, and error-prone. Hence, automating crack detection and crack severity recognition to obtain information about a building's structural health condition is a challenging task [6], [7].
Nowadays, many researchers use Machine Learning (ML) and Deep Learning (DL) techniques for crack detection. The performance of these techniques is strongly influenced by a variety of factors such as the size of the training data, the number of features, algorithm parameters, and, in some cases, the nature and complexity of the studied problem. However, feeding ML techniques with a huge number of features often causes several challenges such as high computational complexity, overfitting, and low interpretability of the final trained model. This is due to the curse of dimensionality arising from the ratio between the number of features and the number of samples [8]-[10]. Hence, to improve the performance of crack detection systems, there is a need to resolve this problem. Reducing the number of features, using either feature selection or feature construction, is the most common way to handle the dimensionality problem.
Feature selection (FS) is one of the most substantial preprocessing steps in any classification task. The main objective of FS is to pick out the best subset of features by removing extraneous or redundant features, in order to effectively reduce the dimensionality of data, shorten the training time, and enhance the classification accuracy. Utilizing a brute-force method to achieve FS leads to a high computational cost, with considerable risk of overfitting in most circumstances. On the other hand, manual selection of prevalent features is inconsistent or infeasible in most cases. Hence, searching for the optimal feature subset, which accurately detects cracks, is a highly demanding task [11]-[13].
Feature selection methods can be generally classified into three types, namely, wrapper-based, filter-based, and hybrid methods, which differ in the evaluation approach. Wrapper-based methods employ an ML algorithm to compute the quality of the selected subset of features. Although they obtain better accuracy than filter-based methods, they have the following drawbacks: 1) the selected features depend on the learner and may not be optimal for other learners, 2) the computational cost is high, 3) fine-tuning the parameters of the learners may be time-consuming, and 4) they inherit the limitations of the learners. In contrast to wrapper-based methods, filter-based methods use predefined metrics or statistical measures (e.g., mutual information) to evaluate the selected features. Even though these approaches are fast and independent of the learners, they do not capture the interaction between learners and features and they lack the ability to handle redundant features [11]-[13].
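The contrast between the two evaluation styles can be illustrated with a minimal sketch. It assumes a toy dataset with discrete features, uses mutual information as the filter score, and uses the leave-one-out accuracy of a 1-nearest-neighbour learner as the wrapper score. The function names, the data, and the choice of learner are hypothetical and are not taken from the proposed system.

```python
import math
from collections import Counter

def mutual_information(feature, labels):
    """Filter score: mutual information (in bits) between one discrete feature and the labels."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    px, py = Counter(feature), Counter(labels)
    return sum(
        (c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )

def loo_1nn_accuracy(rows, labels, subset):
    """Wrapper score: leave-one-out accuracy of a 1-NN learner restricted to `subset`."""
    correct = 0
    for i, row in enumerate(rows):
        # Nearest other sample, measured only on the candidate feature subset.
        nearest = min(
            (j for j in range(len(rows)) if j != i),
            key=lambda j: sum((row[k] - rows[j][k]) ** 2 for k in subset),
        )
        correct += labels[nearest] == labels[i]
    return correct / len(rows)

# Hypothetical toy data: feature 0 tracks the label, feature 1 is noise.
rows = [(0, 1), (0, 0), (0, 1), (1, 0), (1, 1), (1, 0)]
labels = [0, 0, 0, 1, 1, 1]

# The filter scores each feature on its own, without training any learner.
filter_scores = [mutual_information([r[k] for r in rows], labels) for k in range(2)]
# The wrapper scores whole subsets by running the learner on each of them.
wrapper_scores = {s: loo_1nn_accuracy(rows, labels, s) for s in [(0,), (1,), (0, 1)]}
```

The filter needs one pass per feature, whereas the wrapper re-evaluates the learner for every candidate subset, which is where its higher cost comes from.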
In this work, we seek to integrate the power of the two feature selection processes described above by developing a multi-objective filter-wrapper feature selection method, which will be utilized within the framework of a multi-objective classification approach.
Moreover, during the last two decades, many state-of-the-art meta-heuristic algorithms such as Particle Swarm Optimization (PSO), Cuckoo-Search Algorithm (CSA), Artificial Bee Colony (ABC), Whale Optimization Algorithm (WOA), Genetic Algorithm (GA), Firefly Algorithm (FA), Bat Algorithm (BA), Grey Wolf Optimization (GWO), and Salp Swarm Algorithm (SSA) have been utilized to solve the problem of feature selection due to their efficient and powerful performance in handling several complex real-world problems [11]-[13].
Accordingly, in this paper, to improve the performance of crack detection and severity recognition, we present a multi-objective filter-wrapper feature selection method based on an improved Binary Salp Swarm Algorithm (BSSA). The SSA mimics the behavior of salps during foraging and navigation in oceans to conduct global optimization [14].
The significance of this paper is to improve the performance of crack detection and severity recognition through the optimal fusion/selection of hand-crafted features such as Local Binary Patterns (LBP) and Histogram of Oriented Gradients (HOG) and Convolutional Neural Networks (CNN) based learned features.
The main contributions of this paper are summarized in the following points:
• Constructing expert-based annotated primary datasets of crack images, collected over 2 years from a historical location.
• Developing a multi-objective improved BSSA (MO-IBSSA) approach with Kappa index as a fitness function for handling the feature selection problem.
• Designing and investigating the use of feature fusion of LBP and HOG hand-crafted features and CNN-based learned features with MO-IBSSA optimization.
• Establishing a novel hybrid filter-wrapper feature selection method based on MO-IBSSA for crack detection and crack severity recognition.
• Testing and validating the performance of the proposed approach by conducting multiple experiments on the primary real-world datasets obtained and on other selected publicly available benchmark datasets.
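Since the Kappa index is named as the fitness function, a minimal sketch of Cohen's kappa may help: it measures the agreement between true and predicted labels after discounting the agreement expected by chance. The function name, the handling of the degenerate case, and the example labels are assumptions made for illustration and are not taken from the proposed system.

```python
from collections import Counter

def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    # Chance agreement: probability that both label sources pick the same class independently.
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        # Degenerate case (a single class everywhere); returning 0.0 is a convention chosen here.
        return 0.0
    return (observed - expected) / (1.0 - expected)

# Hypothetical severity labels for six crack images.
y_true = ["fine", "fine", "moderate", "moderate", "severe", "severe"]
y_pred = ["fine", "fine", "moderate", "severe", "severe", "severe"]
kappa = cohen_kappa(y_true, y_pred)  # observed 5/6, expected 1/3, so kappa = 0.75
```

Unlike plain accuracy, kappa stays near zero for a classifier that only reproduces the class proportions, which is one reason it is a common choice when classes are imbalanced.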
The remainder of this paper is organized as follows. Section II presents state-of-the-art studies related to the feature selection problem and crack detection. An overview of the methods used in the proposed approach is given in Section III. The phases of the proposed crack severity recognition system are described in Section IV. Section V presents and discusses the obtained experimental results. Conclusions and future work are presented in Section VI.
VOLUME 8, 2020

II. RELATED WORK
Recently, feature selection, feature fusion, and structural damage detection have become very important issues. As mentioned in the previous section, many studies have tackled these problems using diverse ML and DL methods. This section provides a review of the state-of-the-art literature that addresses feature selection and fusion techniques as well as crack detection approaches.

A. FEATURE SELECTION/FUSION METHODS
Recently, various studies addressed several feature selection and fusion methods for achieving considerable advantages for classification performance.
In [15], Kaur et al. presented a hybrid feature weighting approach for enhancing the diagnostic accuracy of Magnetic Resonance (MR) brain tumor classification. The presented approach integrated the Fisher Criterion and the Parameter-Free BAT optimization (PFree BAT) algorithm for weighting conventional metabolite ratio features extracted from short echo MR spectroscopy. The K-nearest neighbor (KNN) algorithm with 5-fold cross-validation achieved an accuracy, sensitivity, and specificity of 93.72%, 96%, and 91%, respectively.
Sasikala et al. proposed a feature fusion approach based on FA with an Optimum-Path Forest classifier fitness function [16] for enhancing breast cancer detection. The proposed approach serially fused LBP features extracted from both Craniocaudal (CC) and Mediolateral Oblique (MLO) view mammogram images. The Support Vector Machine (SVM) classifier achieved a detection accuracy of 96.6% and an F-measure of 96.53% using the DDSM dataset. Moreover, the same study achieved a detection accuracy of 86.5% and an F-measure of 82.88% using the INbreast dataset.
Also, Gunasundari et al. in [17] proposed a wrapper-based feature selection approach, based on Boolean Particle Swarm Optimization (BoPSO), for improving classification accuracy in liver and kidney disease diagnosis. The proposed approach used first-order and second-order statistical features extracted from Grey Level Co-Occurrence Matrix (GLCM) using abdominal CT image slices. The proposed approach presented two new modified BPSO algorithms, namely, Velocity Bounded BoPSO (VbBoPSO) and Improved Velocity Bounded BoPSO (IVbBoPSO) with Probabilistic Neural Network (PNN)/SVM classifier fitness function. The PNN/SVM achieved accuracies of 77.14% and 82.86%, respectively, for liver diseases using elite features selected by IVbBoPSO. On the other hand, PNN/SVM achieved accuracies of 77.14% and 90%, respectively, for kidney diseases using elite features selected by VbBoPSO.
Also, in [18], Alirezaei et al. proposed a wrapper-based method for feature selection using four meta-heuristic algorithms, namely, Non-dominated Sorting Genetic Algorithm II (NSGA-II), FA, PSO, and Imperialist Competitive Algorithm (ICA), with bi-objective fitness for reducing feature dimensionality and improving classification accuracy in diabetes diagnosis. The SVM classifier achieved classification accuracies of 100%, 100%, 98.2%, and 94.6% when evaluating the features selected by FA, ICA, NSGA-II, and PSO, respectively, using the PIMA Indian Type-2 diabetes dataset obtained from the UCI Machine Learning Datasets Repository. The reported results showed that both the FA and ICA algorithms outperformed NSGA-II and PSO.
Moreover, González et al. proposed in [19] a wrapper-based method for multi-objective feature selection based on the NSGA-II evolutionary algorithm with Linear Discriminant Analysis (LDA) fitness for enhancing the performance of Brain Computer Interface (BCI) systems. A new objective function and feature ranking procedure were proposed for analyzing the stability of the proposed method. It used the second moment (variance) of the wavelet coefficient set as features. Evaluating the proposed method on the collected EEG signals using the Naive Bayesian Classifier (NBC), KNN, LDA, and combinations of these classifiers showed that KNN obtained the best Kappa values on the test dataset.
In [14], Ibrahim et al. presented a feature selection approach based on an improved SSA for dimensionality reduction. The presented approach integrated both SSA and PSO as a hybrid optimization technique (SSAPSO) to increase the convergence rate and enhance the effectiveness of SSA for the exploration and exploitation. The proposed SSAPSO approach was evaluated on a set of 15 benchmark functions and proved its performance enhancement without affecting the computational time. Also, feeding the KNN with the features selected by the SSAPSO achieved an accuracy of 74.8%-98.31% and F-measure of 78%-98.34% on evaluating with 10 representative datasets obtained from the UCI repository.
Additionally, in [20], Aljarah et al. proposed a feature selection approach based on asynchronous accelerating multi-leader salp chain with KNN and number of features fitness to avoid trapping in local solution when applied to high-dimensional datasets. The proposed approach improved the BSSA via adding 3 different asynchronous updating rules and a novel leadership structure. Through feeding the KNN with the features selected by the proposed approach, an accuracy of 73.64%-99.77% was achieved using 70% of the 20 selected UCI datasets, compared to the other implemented optimization techniques.
Farisa et al. proposed two wrapper-based feature selection approaches based on BSSA with KNN and number of features fitness to test the ability of SSA in feature selection problems [21]. In the first approach, the continuous version of conventional SSA was converted to binary using eight transfer functions to be applied for feature selection. In the second approach, the average operator was replaced by the crossover operator, used together with the transfer function, to improve the exploratory behavior of BSSA. Through feeding the KNN with the features selected by the proposed BSSA, an accuracy of 73.35%-100% was achieved using around 90% of the 22 selected UCI datasets, compared to the BGWO, Binary Gravitational Search Algorithms (BGSA), BBA, BPSO, and GA.
Another variant of SSA was proposed in [22] by Hegazy et al., by introducing a new control parameter called inertia weight, used to adjust the present best solution in order to improve solution accuracy, convergence speed, and reliability. The proposed algorithm was applied along with KNN and number of selected features fitness for feature selection problems. The proposed feature selection algorithm was evaluated on a set of 23 selected UCI datasets and proved its superior performance in terms of accuracy and feature reduction rate, against other optimizers. The KNN achieved an accuracy of 63.19%-98.99% when evaluated with 23 datasets obtained from the UCI repository.
Also, in [23], to tackle the problem of slow convergence speed and getting stuck in local optima of SSA, Hegazy et al. presented an improved SSA by combining five chaotic maps with SSA (CSSA). The proposed CSSA with KNN and number of selected features fitness was applied for feature selection problems. Through employing the KNN with the features selected by the proposed CSSA, an accuracy of 60.44%-97.83% was achieved using around 92.6% of the 27 selected UCI datasets, compared to the standard SSA and other optimizers, especially, when using the Tent chaos.
Zawbaa et al. developed a feature selection approach based on hybrid bio-inspired heuristic [24] for solving the dimensionality reduction problem of large datasets with small instance-set. The developed approach combined the Ant-Lion Optimization (ALO) algorithm with the GWO to form a hybrid bio-inspired optimization algorithm ALO-GWO. The developed ALO-GWO takes the characteristics of diversification and stagnation avoidance from ALO and the characteristic of faster convergence from GWO. Evaluating the ALO-GWO with the KNN as well as the number of selected features fitness function on a set of 18 different datasets from the UCI repository showed that the developed approach outperformed other optimization algorithms such as GA, PSO, GWO, and ALO. Moreover, the proposed approach achieved fitness value of 0-0.192 and 0-0.178, when evaluating on 7 different Micro-array gene expression datasets and 5 different face detection image datasets, respectively.
In a similar vein, Chantar et al. [25] presented a wrapper-based method for feature selection based on an improved Binary GWO (BGWO) with elite-based crossover for enhancing the classification accuracy of Arabic text. The proposed method employed Term-frequency Inverse Document Frequency (TF-IDF) for term weighting. It presented an improved BGWO with KNN and the number of selected features as a single-objective fitness function. Feeding decision trees with the features selected by the proposed BGWO achieved F-measures of 0.8255, 0.73663, and 0.79702, whereas Naive Bayes achieved F-measures of 0.93445, 0.82036, and 0.86686, and SVM achieved F-measures of 0.9616, 0.91706, and 0.90473.
After that, Abdel-Basset et al. [26] proposed a new wrapper-based feature selection method based on a combination of GWO with two-phase mutation to strengthen the exploitation capability of the GWO. Moreover, the effect of two different transformation functions was studied. The first mutation phase aimed to reduce the number of selected features while maintaining high classification accuracy, and the second tried to add more informative features to enhance classification accuracy. Feeding the KNN classifier with the features selected by the proposed method achieved an accuracy of 33%-98.88% using around 94.3% of the 35 selected UCI datasets, compared to other algorithms.
In [27], Mafarja et al. presented two different binary variants of WOA to solve feature selection problems, proposing a novel wrapper-based feature selection method based on WOA with KNN and the number of selected features fitness to improve classification accuracy. In the first variant, the random selection operator in the searching process was replaced by both Tournament and Roulette Wheel selection mechanisms. In the second variant, WOA was equipped with crossover and mutation operators to improve its exploitation capability. The performance of the proposed method was evaluated using 20 selected datasets from the UCI repository. The experimental results showed that WOA equipped with crossover and mutation outperformed other algorithms using around 70% of the datasets with an accuracy of 78.5%-100%.
In [28], Thaher et al. presented a wrapper-based feature selection method for high-dimensional low sample size datasets, based on binary Harris Hawks optimizer (HHO) with KNN fitness. The S-shaped and V-shaped transfer functions were utilized to convert the continuous HHO algorithm to a binary HHO (BHHO). To evaluate the proposed BHHO algorithm for feature selection, 9 online public high-dimensional datasets with low samples were used. The obtained results showed that BHHO with S-shaped transfer function achieved higher accuracy for 6 datasets, that is, an accuracy of 54.1%-100% was obtained by feeding the KNN classifier with the features selected by the proposed BHHO using around 55.56% of the used datasets.
Another binary variant of HHO, called Quadratic Binary Harris Hawk Optimization (QBHHO), for solving feature selection problem was proposed in [29] by Too et al. The proposed QBHHO used four quadratic transfer functions to transform the HHO into a binary one. To validate the proposed approach, 22 datasets selected from the UCI repository were used. Feeding the KNN with the features selected by the proposed QBHHO using the fourth quadratic transfer function achieved an accuracy of 39.78%-97.76% using around 50% of the tested datasets.
In [30], Sindhu et al. proposed a wrapper-based feature selection method based on an Improved Sine-Cosine Algorithm (ISCA), which combines the SCA with the Elitism strategy and a new updating mechanism for the best solution to enhance classification accuracy. The performance of the proposed ISCA was validated with 10 datasets selected from the UCI repository. By feeding an Extreme Learning Machine (ELM) classifier with the features selected by the proposed ISCA, an accuracy of 76.80%-98.73% was achieved using around 50% of the selected UCI datasets, compared to other optimization algorithms.
Table 1 summarizes the presented exhaustive survey of state-of-the-art studies related to feature selection/fusion approaches based on meta-heuristic algorithms. The feature selection/fusion approaches are classified into the following categories: filter, wrapper, hybrid, single-objective, and multi-objective approaches.
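Several of the wrapper methods surveyed in this subsection describe a fitness that combines a KNN classifier with the number of selected features. One common way to fold the two terms into a single objective is a weighted sum of the classification error and the fraction of features kept. The sketch below assumes that form and an illustrative weight of 0.99. Neither is confirmed here for any particular cited study, and the function name is hypothetical.

```python
def weighted_fitness(error_rate, n_selected, n_total, alpha=0.99):
    """Single-objective fitness to minimize: alpha * error + (1 - alpha) * feature ratio.

    alpha is an assumed trade-off weight; values close to 1 favor accuracy over compactness.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * error_rate + (1.0 - alpha) * (n_selected / n_total)

# Two hypothetical subsets with the same 10% error: the smaller one scores better (lower).
large_subset = weighted_fitness(0.10, 15, 20)
small_subset = weighted_fitness(0.10, 5, 20)
```

A multi-objective formulation, by contrast, keeps the two terms separate and searches for trade-off solutions instead of fixing the weight in advance.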

B. CRACK DETECTION TECHNIQUES
Lately, various studies have developed promising crack detection systems; however, only a limited number have considered utilizing feature selection and fusion methods to achieve enhanced recognition.
In [31], Jang et al. proposed a hybrid crack detection approach based on deep learning to improve crack detection rate and reduce false alarms. The proposed approach has the power of combining vision and laser IR thermography images. After image reconstruction using Time-spatial-Integrated (TSI) coordinate transform, the proposed approach utilized features extracted from a pre-trained GoogLeNet CNN architecture. The used dataset is a total of 200 raw images increased to 20,000 images via segmentation and augmentation. It was observed that the performance of the proposed crack detection system was improved using hybrid images, specifically, through increasing precision from 59.84% to 98.72% and recall from 97.26% to 99.23%.
In a similar vein, Silva et al. [32] developed a concrete crack detection system based on deep learning using a transfer learning scheme. The proposed system used the pre-trained VGG16 deep learning CNN model. Besides the system development, the authors studied the impact of different training parameters on the performance of the proposed crack detection system. The proposed system was evaluated on a balanced dataset of 3500 images in total (intact and cracked) of concrete surfaces, with 80% used for training and 20% for testing. The obtained experimental results showed that the proposed system achieved an accuracy of 92.27%.
Moreover, Dorafshan et al. in [33], proposed a new hybrid crack detector that combined an edge detection method with a DCNN model for enhancing the accuracy of crack detection via reducing the residual noise generated by edge detectors in the final binary images. Moreover, the authors presented a comprehensive comparison between conventional edge detectors and deep learning models trained in three modes for crack detection problem. The dataset used is a total of 100 raw concrete images. Edge detection methods performed well, specifically, the LoG technique that detected about 53-79% of cracked pixels accurately, whereas, it produced noise in the final binary images. For AlexNet, the transfer learning mode achieved an accuracy of about 98% with a slight increase compared to both full training and classification modes, which achieved an accuracy of about 97%. Generally, the hybrid method reduced the noise by a factor of 24.
Also, in [34], Dorafshan et al. presented a new benchmark dataset for crack detection called SDNET2018, with benchmark results for crack detection using a deep learning model. This dataset includes many challenges such as edges, shadows, scaling, and surface roughness. A total of 230 raw images of cracked and intact concrete surfaces of three different types (bridge decks, walls, pavements) were divided into sub-images of size 256 × 256 pixels, resulting in more than 56,000 sub-images used for training, validation, and testing. Then, the AlexNet DCNN architecture was used in both full training (FT) and transfer learning (TL) modes for crack detection. The achieved results showed that using AlexNet in TL mode outperformed the FT mode, as the TL mode achieved an accuracy of 91.92% against 90.45% for the FT mode with bridge decks, 89.31% against 87.54% with walls, and 95.52% against 94.86% with pavements.
In [35], Maeda et al. proposed a CNN-based road crack detection approach. The dataset used consists of 500 pavement images captured using a smartphone at the Temple University campus. After segmentation and augmentation, datasets consisting of 640,000, 160,000, and 200,000 samples were used as the training, validation, and testing datasets, respectively. A comparative study was performed between the proposed method, the SVM method, and the Boosting method. The experimental results showed that the proposed ConvNet method outperformed both the SVM and Boosting methods by increasing the F-measure from 0.7359 to 0.8965.
In [36], Cha et al. proposed an automatic crack detection method based on DL to minimize the influence of noise caused by different reasons on the classifier. The proposed method used a deep architecture of CNN that is capable of learning features automatically from raw data. Many images of a complex engineering building were captured with several image variations using a DSLR camera. Then, all images were cropped into small batches of size 256 × 256, then divided into training and validation datasets. The results of the experiment proved that the proposed method outperformed the other methods by achieving an accuracy of 97.95%.
In [37], Zhang et al. proposed a unified crack detection approach for pavement and sealed cracks using transfer learning. The main advantage of this approach is to solve the difficulty of crack extraction and the inaccurate budgeting resulting from noises and sealed cracks and cracks with similar intensity and width, respectively. A novel two-step preclassification based on transfer learning was conducted to increase the detection accuracy. After the pre-processing step, a two-step DCNNs model was applied in transfer learning mode to classify images into 3 classes (background, crack, and sealed crack). After that, a thresholding-based segmentation was used to generate the binary image. In order to extract the final crack region, a curve detection method based on tensor voting was applied. Based on the conducted experiments, the proposed approach achieved a reasonable performance with recall of 0.951 and precision of 0.847.
In [6], Kim et al. proposed a ML based crack classification approach to solve the challenge of classifying crack-like patterns as a crack. The proposed approach consisted of two main steps, namely, crack candidate region (CCR) generation and Speeded-Up Robust Features-based (SURF) and CNN based classifications. Image binarization was used to initially extract all crack candidates from images, which accordingly were manually annotated as crack or non-crack. After the annotation, these crack candidates were used in building the classification models. The experimental results showed that the proposed CNN-based crack classification approach achieved an accuracy of 98% and F-measure of 0.95.
Wang et al. in [38] proposed an efficient crack detection model combining the strengths of multiple visual features, such as texture and edges, with the power of multi-task learning. This research aimed to handle the problem of limited representation resulting from using a single type of feature by combining the LBP and the HOG as two complementary features. The extracted texture and edge feature vectors were combined and fed into a multi-task learning classifier based on the ELM approach for crack detection. The obtained experimental results showed that the proposed crack detection model achieved an accuracy of 92%, compared to other traditional methods.
In [39], Xu et al. presented an end-to-end CNN-based crack detection model to improve the crack detection accuracy via avoiding losing the information of crack edge caused by the process of pooling. The proposed model has the advantage of combining the power of the atrous convolution, Atrous Spatial Pyramid Pooling (ASPP) module, and depth-wise separable convolution in obtaining denser feature map, and multi-scale image feature with avoiding details loss and reducing computational complexity. On evaluating the proposed model using the collected bridge crack images, it achieved a crack detection accuracy, precision, sensitivity, specificity and F-measure of 96.37%, 78.11%, 100%, 95.83%, and 0.8771, respectively. Also, it was observed that the proposed model outperformed the investigated traditional DL models. Table 2 summarizes the presented exhaustive survey of state-of-the-art studies related to crack detection systems.
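The F-measure values quoted in this subsection are tied to precision and recall (sensitivity) through their harmonic mean, so the three figures can be cross-checked against each other. As a small worked check, the sketch below recomputes the F-measure from the precision and sensitivity quoted for [39]. The function name is hypothetical.

```python
def f_measure(precision, recall):
    """F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)

# Precision 78.11% and sensitivity (recall) 100%, as quoted for [39].
f_xu = f_measure(0.7811, 1.0)  # about 0.8771, matching the quoted F-measure
```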

III. PRELIMINARIES

A. HISTOGRAM OF ORIENTED GRADIENTS (HOG)
The Histogram of Oriented Gradients (HOG) feature descriptor is one of the most widely used feature descriptors in computer vision [38], [40], [41]. It can be applied in various pattern recognition domains such as face recognition, pose estimation, and human detection, achieving superior results because of its strong ability to describe texture and shape. The HOG descriptor can also preserve local image information using the orientation intensity distribution and edge gradients. The HOG feature descriptor can be calculated according to Algorithm 1 [38], [40], [41].
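The gradient and orientation-histogram computation at the core of HOG can be sketched for a single cell as follows. This is a simplified illustration, not the exact procedure of the proposed system: it assumes central-difference gradients, unsigned orientations in [0, 180) degrees, 9 bins, and magnitude-weighted voting, and it omits the block normalization and multi-cell concatenation of a full HOG descriptor. The function name is hypothetical.

```python
import math

def cell_orientation_histogram(img, n_bins=9):
    """Magnitude-weighted histogram of gradient orientations for one cell.

    img is a 2-D list of grayscale values; border pixels are skipped for simplicity.
    """
    height, width = len(img), len(img[0])
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = img[y][x + 1] - img[y][x - 1]  # horizontal gradient
            gy = img[y + 1][x] - img[y - 1][x]  # vertical gradient
            magnitude = math.hypot(gx, gy)
            # Unsigned orientation: fold the angle into [0, 180) degrees.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % n_bins] += magnitude
    return hist

# Hypothetical 4x4 cell containing a vertical edge between a dark and a bright half.
cell = [[0, 0, 10, 10] for _ in range(4)]
hist = cell_orientation_histogram(cell)
```

For this cell all gradient energy falls into the first bin, since the intensity changes only along the horizontal direction.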

Algorithm 1: HOG Feature Descriptor Computation
Input: Grayscale image
Output: Final HOG feature vector $F_{HOG}$
1 Calculate both vertical and horizontal gradients using equations (1) and (2).
2 Calculate gradient magnitude and angular orientations using equations (3) and (4).

B. LOCAL BINARY PATTERNS (LBP)
The Local Binary Patterns (LBP) feature descriptor is widely used as an effective statistical texture descriptor of images in various computer vision systems. The main advantage of the LBP in extracting image features is its robustness to illumination and rotation variation. The LBP feature descriptor can be calculated according to Algorithm 2 [38].
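As a minimal sketch of the basic 3 × 3 LBP operator, the code below thresholds the eight neighbours of a pixel against the center gray level, reads the resulting bits as one 8-bit number, and accumulates the codes into a 256-bin histogram. The neighbour ordering, the use of "greater than or equal" as the threshold test, and the function names are assumptions. Other orderings only permute the code values.

```python
def lbp_code(img, y, x):
    """8-bit LBP code of pixel (y, x): bit i is set when neighbour i >= center."""
    center = img[y][x]
    # Neighbour order (assumed): clockwise, starting at the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dy, dx) in enumerate(offsets):
        if img[y + dy][x + dx] >= center:
            code |= 1 << bit
    return code

def lbp_histogram(img):
    """256-bin histogram of LBP codes over all interior pixels of a grayscale image."""
    hist = [0] * 256
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            hist[lbp_code(img, y, x)] += 1
    return hist

# Hypothetical 3x3 patch with center gray level 6.
patch = [[6, 5, 2],
         [7, 6, 1],
         [9, 8, 7]]
code = lbp_code(patch, 1, 1)  # bits 0, 4, 5, 6, 7 are set: 1 + 16 + 32 + 64 + 128 = 241
```

Because only the sign of each difference is kept, adding a constant to every pixel leaves the codes unchanged, which illustrates the robustness to illumination shifts mentioned in the text.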

C. CONVOLUTIONAL NEURAL NETWORK (CNN)
Among the various deep neural network models, CNN is considered the most commonly used model for image classification. A standard CNN consists of several convolutional layers, pooling layers, and fully-connected (FC) layers. The main aim of the CNN is the automatic and adaptive learning of spatial hierarchies of useful features, from low- to high-level patterns [32], [36], [42], [43]. Table 3 describes the different CNN layers.
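To make the roles of the convolutional and pooling layers concrete, the following minimal sketch applies one hand-set 3 × 3 kernel, a ReLU activation, and 2 × 2 max pooling to a tiny image. It is an illustration under stated assumptions rather than a trainable network: the kernel is fixed instead of learned, the convolution is the unflipped sliding-window form used by most deep learning frameworks, and all names and values are hypothetical.

```python
def conv2d_valid(img, kernel):
    """Slide the kernel over the image without padding and sum the element-wise products."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(img[y + i][x + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for x in range(len(img[0]) - kw + 1)
        ]
        for y in range(len(img) - kh + 1)
    ]

def relu(fmap):
    """Keep positive responses and zero out the rest."""
    return [[max(0.0, v) for v in row] for row in fmap]

def max_pool2x2(fmap):
    """Downsample by taking the maximum of each non-overlapping 2x2 window."""
    return [
        [
            max(fmap[y][x], fmap[y][x + 1], fmap[y + 1][x], fmap[y + 1][x + 1])
            for x in range(0, len(fmap[0]) - 1, 2)
        ]
        for y in range(0, len(fmap) - 1, 2)
    ]

# Hypothetical 6x6 image: bright surface (9) with a dark vertical line (1) in column 2.
image = [[9, 9, 1, 9, 9, 9] for _ in range(6)]
# Hand-set kernel responding to dark-to-bright transitions from left to right.
kernel = [[-1, 0, 1]] * 3

pooled = max_pool2x2(relu(conv2d_valid(image, kernel)))  # 2x2 map, strong on the right side
```

In a trained CNN the kernel weights are learned from data, and many such feature maps are stacked before the fully-connected layers.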

D. BINARY SALP SWARM ALGORITHM (BSSA)
The Binary Salp Swarm Algorithm (BSSA) is one of the most recent meta-heuristic algorithms, mimicking the swarming behavior of salps during navigation and foraging in deep oceans. It can be used for solving various optimization problems with superior results. In order to simulate the behavior of the salp chain mathematically, the population is divided into two groups, namely, the leader salp and the follower salps, based on the positions of the salps in the population. The salp at the front of the chain, which is the nearest salp to the food source, is considered the leader salp; the rest are the followers. Thereby, the swarm is guided by the leader salp, and the follower salps keep track of each other (and of the leader salp directly or indirectly) [14], [20]. The leader position is calculated by equation (6), where X^1_j stands for the leader's position and F_j denotes the position of the food source in the j-th dimension, and ub_j and lb_j represent the upper and lower limits of the j-th dimension, respectively. C_2 and C_3 are parameters with random values inside [0,1], whereas C_1 is the main parameter of SSA and is expressed according to equation (7), where t is the current iteration number and Max_iter denotes the maximum iteration number. The location of each follower salp is updated by equation (8), where i ≥ 2 and X^i_j is the position of the i-th follower salp in the j-th dimension.
Since feature selection (FS) is a binary problem, the salps are restricted to binary positions (0 or 1). To convert the continuous SSA positions into a form suitable for the FS problem, a transfer function is used, as defined in equation (9), which gives the probability of updating an element in the solution (1 for selected or 0 for not selected).
To update an element of a solution in the next iteration, equation (10) is applied to the probability calculated from equation (9).
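The update rules referred to by equations (6)-(10) can be sketched as below. The formulas are the ones commonly stated in the SSA and BSSA literature and are assumed, not confirmed, to match the numbered equations. In particular, the 0.5 threshold on C_3, the sigmoid transfer function, and the rule "set the bit to 1 when the random number is below the probability" are assumptions, and all function names are illustrative.

```python
import math


def c1_coefficient(t, max_iter):
    # Commonly used SSA schedule: c1 = 2 * exp(-(4t / Max_iter)^2).
    return 2.0 * math.exp(-((4.0 * t / max_iter) ** 2))


def leader_position(food_j, lb_j, ub_j, c1, c2, c3):
    # Move the leader around the food source in dimension j.
    step = c1 * ((ub_j - lb_j) * c2 + lb_j)
    return food_j + step if c3 >= 0.5 else food_j - step


def follower_position(x_i_j, x_prev_j):
    # Follower i takes the midpoint between itself and salp i-1.
    return 0.5 * (x_i_j + x_prev_j)


def sigmoid(x):
    # S-shaped transfer function mapping a continuous value to a probability.
    return 1.0 / (1.0 + math.exp(-x))


def to_binary(x, r):
    # r is a uniform random number in [0, 1); the feature is selected (1)
    # when r falls below the transfer probability (assumed convention).
    return 1 if r < sigmoid(x) else 0
```

Passing the random numbers in as arguments keeps the sketch deterministic; in an optimizer loop they would be drawn anew for every salp and dimension.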
Due to the high dimensionality of the feature selection problem, BSSA needs some modifications and adaptations to be applied to such a problem. Therefore, in [20], Aljarah et al. proposed an improved BSSA with new operators for dynamically updating the main parameter of BSSA. They also presented a new leadership structure, which treats half of the population as leaders and the rest as followers instead of using a single leader. The whole chain is then divided into several sub-chains with different leaders. In each sub-chain, the salps' positions inside the search space are adaptively updated using a different strategy to strengthen the exploitative and exploratory tendencies of BSSA. As a result, according to the best updating strategy, called Termite Colony Salp Swarm Algorithm (TCSSA3) [20], the main SSA parameter C_1 is dynamically updated using asynchronous updating rules according to equation (11) instead of decreasing C_1 gradually over the course of the iterations. Then, the salps' positions are updated using equations (6) and (8). The IBSSA (TCSSA3) works as follows:
• Initialize the population of salps,
• Repeat the following steps until the stopping condition is met:
  - Calculate the fitness values of all salps,
  - Set the best salp as F,
  - Divide the population of salps into different sub-chains,
  - Update each salp in the leaders group, as follows:
    * Update C_1 by equation (11),
    * Update the leader's position by equation (6),
    * Calculate the probability of a feature to be selected using equation (10),
  - Update the followers' positions by equation (8),
• Return F.
In this paper, a multi-objective version of the Improved BSSA (MO-IBSSA) with the Kappa index as fitness is proposed. The details of the proposed MO-IBSSA are described in section IV-D2.

Algorithm 2: LBP Feature Descriptor Computation
where g_c and g_i are the gray levels of the cell's center pixel and the surrounding pixels, respectively, resulting in an 8-bit binary vector
3 Convert each 8-bit binary vector into its corresponding decimal value and replace the intensity value with this decimal value using equation (5).

Algorithm 3: Binary Salp Swarm Algorithm (BSSA)
Input: Swarm size n; problem dimension d; maximum number of iterations Max_iter
Output: The leader salp F
1 Initialize the swarm X_i (i = 1, 2, ..., n)
2 while (t < Max_iter) do
3   Obtain the fitness of all salps; F = the best search agent
4   Set F as the leader salp
5   Update C_1 by equation (7)
6   for (every salp (x_i)) do
7     if (i == 1) then
8       Update the position of the leader by equation (6)
9       Calculate the probabilities using equation (10), which takes the output of equation (9)
10    else
11      Update the position of followers by equation (8)

Mutual Information (MI) is one of the most widely used feature selection methods; it measures the mutual dependence between random variables. It therefore provides a way to evaluate the relevance between individual features and classes. The mutual information I(X; Y) between two random variables X and Y can be expressed as in equation (12) [12], [46]:
I(X; Y) = Σ_{x∈X} Σ_{y∈Y} p(x, y) log( p(x, y) / (p(x) p(y)) ), (12)
where p is the joint probability mass function of x and y.
ReliefF is a feature ranking approach based on a k-nearest neighbor algorithm and can be computed according to equation (13) [12]:
ReliefF(x_k) = P(x_k value | class_d) − P(x_k value | class_s), (13)
where x_k is the k-th feature, class_d and class_s are the different class and the similar class, respectively, and P denotes probability.
Fisher score is a widely used supervised approach for ranking features according to their discriminant ability. It evaluates features by considering both the minimization of the intra-class distance and the maximization of the inter-class distance. The Fisher score for the k-th feature F_k can be computed according to equation (14) [12], [47],
where µ^k_i and µ^k_j are the means of the k-th feature in the i-th and j-th classes, and σ^k_i and σ^k_j are the corresponding standard deviations. Because the Fisher score evaluates each feature individually, it cannot account for redundancy among features.
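As an illustration of the filter scores above, the mutual information of equation (12) can be estimated from empirical frequencies of a discrete (or discretized) feature and the class labels. The sketch below is a minimal example; the function name, the base-2 logarithm, and the assumption that the feature is already discretized are choices made here, not details given in the text.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))   # counts of (x, y) pairs
    count_x = Counter(xs)          # marginal counts of x
    count_y = Counter(ys)          # marginal counts of y
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x, y) / (p(x) p(y)) written with counts to limit rounding error.
        ratio = count * n / (count_x[x] * count_y[y])
        mi += (count / n) * math.log2(ratio)
    return mi
```

A feature that determines the class perfectly in a balanced two-class problem scores 1 bit, while a feature independent of the class scores 0.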

F. SUPPORT VECTOR MACHINE (SVM)
Support Vector Machine (SVM) is a widely used ML algorithm for classification and regression tasks with superior results. The conventional SVM classifier is defined for binary class problems and maximizes the margin between both classes based on the training cases placed on the borders. For example, consider a training dataset D with n samples {x_1, x_2, ..., x_n}, where x_i is a feature vector in a v-dimensional feature space belonging to either of two linearly separable classes C_1 and C_2. Geometrically, the SVM algorithm finds an optimal decision boundary that achieves the maximal margin separating the samples of the two classes. Achieving this objective requires solving the optimization problem defined in equation (15) [48], [49], where α_i is the weight assigned to the training sample x_i. If α_i > 0, x_i is called a support vector. C is a regularization parameter that trades off the training accuracy against the model complexity in order to achieve a superior generalization capability. K is a kernel function used to compute the similarity between each pair of samples [48], [49].
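To make the roles of α_i and K concrete, the following sketch evaluates a trained kernel SVM on a new sample using a degree-2 polynomial kernel, the kernel type used later in this paper. The weights, bias, and the kernel offset `coef0` are hypothetical values supplied by the caller; training (solving the optimization problem itself) is not shown.

```python
def polynomial_kernel(u, v, degree=2, coef0=1.0):
    # K(u, v) = (u . v + coef0)^degree; coef0 = 1.0 is an assumed default.
    return (sum(a * b for a, b in zip(u, v)) + coef0) ** degree


def svm_decision(x, support_vectors, labels, alphas, bias):
    """Classify x as +1 or -1 from the support vectors and their weights."""
    score = sum(alpha * y * polynomial_kernel(sv, x)
                for sv, y, alpha in zip(support_vectors, labels, alphas))
    return 1 if score + bias >= 0 else -1
```

Only samples with α_i > 0 (the support vectors) contribute to the sum, which is why they alone need to be stored after training.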

IV. THE PROPOSED SYSTEM
In this section, we introduce our novel crack detection and crack severity recognition system. The proposed system is composed of two modules. The first is the feature extraction module, based on hand-crafted feature engineering and CNN-based deep feature learning techniques. The second module applies feature selection using a hybrid filter-wrapper with MO-IBSSA to reduce feature dimensionality and improve the crack severity recognition rate. Figure 1 depicts the general structure of the proposed system for crack detection and severity recognition. The idea behind the proposed system is to improve crack severity recognition accuracy by using the optimal fused feature set consisting of hand-crafted features or CNN-learned features.

A. IMAGE ACQUISITION
In this phase, real data of surface cracks is collected from an ancient building with crack problems. A crack is defined as damage distinguishable by the human eye. Two primary datasets of 40

B. DATA PREPARATION
In this phase, data is prepared and pre-processed for the next training phase. As shown in Figure 1, the data preparation phase consists of multiple steps, as follows:
• Step 1: Data-bank generation: The raw images were cropped into small images of 256 × 256 pixel resolution for binary classification and of 128 × 128 pixel resolution for multi-class classification. With the help of an expert, these were then manually annotated as intact or crack images for binary classification and as fine, moderate, and severe crack images for multi-class classification.
• Step 2: Data cleansing: Images that contain wood patterns, are highly illuminated, or have complex shading, and therefore have a high potential to trigger false alarms, are discarded.
• Step 3: Image augmentation: To enlarge the training dataset by generating new samples similar to the original training samples, data augmentation was applied [31], [50]. In this work, spatial and intensity transformations were used to generate new samples from the training samples. Gaussian and salt-and-pepper noise (for our-dataset-1 and our-dataset-2 only), flipping, rotation, and a combined transformation were applied for image augmentation, in the following systematic way:
  - Flipping the image vertically.
  - Adding Gaussian and salt-and-pepper noise.
  - Combining the output images of the previous steps to create the final augmented dataset.
Samples of different crack and intact types are shown in Figure 2.
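The spatial and intensity transformations listed in Step 3 can be sketched on a grayscale image stored as a list of pixel rows. This is a minimal illustration: the function names, the 90-degree rotation angle, and the noise parameters are assumptions, since the exact settings are not stated here.

```python
import random


def flip_vertical(img):
    # Reverse the row order (top-bottom flip).
    return [list(row) for row in img[::-1]]


def rotate_90(img):
    # Rotate clockwise by 90 degrees.
    return [list(row) for row in zip(*img[::-1])]


def add_salt_and_pepper(img, amount, rng):
    # Replace a fraction `amount` of pixels by pure black (0) or white (255).
    noisy = [list(row) for row in img]
    for row in noisy:
        for c in range(len(row)):
            if rng.random() < amount:
                row[c] = rng.choice((0, 255))
    return noisy


def add_gaussian_noise(img, sigma, rng):
    # Add zero-mean Gaussian noise and clip to the valid 8-bit range.
    return [[min(255, max(0, round(p + rng.gauss(0.0, sigma)))) for p in row]
            for row in img]


def augment(img, rng):
    # Combine the original image with its transformed copies.
    return [img, flip_vertical(img), rotate_90(img),
            add_gaussian_noise(img, 10.0, rng),
            add_salt_and_pepper(img, 0.05, rng)]
```

Passing a seeded `random.Random` instance as `rng` makes the augmented dataset reproducible across runs.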

C. FEATURE EXTRACTION
In this phase, the proposed system utilizes two types of feature extraction methods for extracting features from the prepared images: hand-crafted features, using both the HOG and LBP methods, and CNN-learned features, using the pre-trained VGG16 CNN model. The output of these methods is used to obtain three feature vectors, which represent the characteristics of the input images.

1) CNN FEATURE LEARNING
In this step, the prepared training dataset is fed into a VGG16 deep learning model pre-trained on the ImageNet dataset, which serves as the base network for feature extraction. The CNN-learned features are extracted from the fully-connected (FC) layer of the pre-trained VGG16 model.

2) HAND-CRAFTED FEATURES ENGINEERING
In this phase, after converting the prepared images from RGB to grayscale, the data of HOG and LBP feature vectors are extracted from the grayscale images according to Algorithm 1 and Algorithm 2, respectively. The hand-crafted features extraction works as illustrated in Algorithm 4.

D. FEATURE FUSION AND SELECTION
This phase consists of two components, namely, feature pre-selection and feature re-selection.

Algorithm 4: Hand-Crafted Feature Vectors Generation
Input: RGB image
Output: HOG feature vector HOGhist; LBP feature vector LBPhist
1 Convert RGB image into grayscale image GImg
2 Compute HOG feature vector HOGhist using Algorithm 1
3 Divide GImg into 15 non-overlapped regions as shown in Figure 3
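The first two steps of Algorithm 4 can be illustrated with a simplified sketch: grayscale conversion followed by a single gradient-orientation histogram. This is not the full HOG descriptor of Algorithm 1 (cells, blocks, and normalization are omitted); the luma weights, the centered-difference gradients, and the nine unsigned orientation bins are assumptions made for the example.

```python
import math


def rgb_to_gray(rgb_image):
    # Weighted sum of the R, G, B channels (ITU-R BT.601 luma weights).
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_image]


def orientation_histogram(gray, bins=9):
    """Magnitude-weighted histogram of gradient orientations in [0, 180)."""
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for r in range(1, len(gray) - 1):
        for c in range(1, len(gray[0]) - 1):
            gx = gray[r][c + 1] - gray[r][c - 1]   # horizontal gradient
            gy = gray[r + 1][c] - gray[r - 1][c]   # vertical gradient
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % bins] += magnitude
    return hist
```

A horizontal intensity ramp places all gradient energy in the first bin, while a vertical ramp places it in the bin containing 90 degrees.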

1) FEATURE PRE-SELECTION AND FUSION STEP
Using all features, especially high-dimensional ones, as input to a classifier may consume a lot of computational time while still producing unsatisfactory results. Therefore, in the feature pre-selection and fusion step, a novel filter-based feature pre-selection is applied to pre-select the top N ranked features from the original high-dimensional space. These pre-selected features are the candidates for the next feature re-selection step. This feature pre-selection step works according to Algorithm 5.

Algorithm 5: Features Pre-Selection
Input: Training dataset T
Output: Selected features Feat_pre
1 Compute the mutual information MU, Fisher score FS, and ReliefF score RS according to equations (12), (14), and (13), respectively, using T
2 Select the highest weighted 80% of ranked features MU_r, FS_r, and RS_r according to MU, FS, and RS
3 Compute the pre-selected features Feat_pre as the intersection among MU_r, FS_r, and RS_r
4 return Feat_pre

After selecting the top-ranked features from the strongly relevant features using Algorithm 5 for both the HOG histogram and the LBP histogram, a normalization process is performed to transform all feature vectors into the range [0,1], so that the feature vectors are compatible with each other when fused later. The feature pre-selection and fusion approach for hand-crafted features is illustrated in Figure 4.
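The steps of Algorithm 5 and the subsequent normalization can be sketched as follows, assuming the three filter scores have already been computed for every feature. The function names and the rounding used to turn 80% into a feature count are assumptions.

```python
def top_fraction(scores, fraction=0.8):
    """Indices of the highest-scoring `fraction` of features."""
    k = max(1, int(round(fraction * len(scores))))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:k])


def preselect(mi_scores, fisher_scores, relieff_scores, fraction=0.8):
    # Keep only the features ranked in the top fraction by all three filters.
    keep = (top_fraction(mi_scores, fraction)
            & top_fraction(fisher_scores, fraction)
            & top_fraction(relieff_scores, fraction))
    return sorted(keep)


def min_max_normalize(vector):
    # Rescale a feature vector to [0, 1] so that fused vectors are comparable.
    lo, hi = min(vector), max(vector)
    if hi == lo:
        return [0.0] * len(vector)
    return [(v - lo) / (hi - lo) for v in vector]
```

Because the intersection is taken, the number of pre-selected features is at most 80% of the original set and shrinks further whenever the three rankings disagree.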

2) FEATURE RE-SELECTION STEP
After the feature pre-selection step, which reduces the original feature dimensionality, a feature re-selection step is carried out using a wrapper feature selection method based on MO-IBSSA to obtain the optimal trained SVM model (polynomial kernel of degree = 2) with the highest training and testing Kappa index. The output of this step is the optimal feature subset. As discussed in section III-D, an IBSSA was proposed in [20] for feature selection problems. In this paper, a multi-objective version of IBSSA is developed.
In multi-objective problems, the IBSSA must be able to save multiple solutions as the best solutions and to update the food source with the best solution obtained so far in every iteration. To achieve this goal, the IBSSA is equipped with a repository of food sources that maintains the best non-dominated solutions obtained so far during optimization. The food source is selected from the group of non-dominated solutions with the least congested neighborhood using a ranking process and Roulette Wheel selection, like the one employed in the repository maintenance operator, but with a different probability of selecting the non-dominated solutions [51]. The workflow of the proposed MO-IBSSA is depicted in Figure 5.
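The repository mechanism can be sketched as follows for objectives that are maximized. This is a simplified illustration rather than the operator of [51]: the neighborhood is measured with a fixed radius instead of a grid of segments, the repository has no size limit, and all names are hypothetical.

```python
import random


def dominates(a, b):
    """True if objective vector a dominates b (all objectives maximized)."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def update_archive(archive, candidate):
    # Reject the candidate if an archived solution dominates or equals it.
    if any(dominates(m, candidate) or m == candidate for m in archive):
        return archive
    # Otherwise drop every archived solution that the candidate dominates.
    return [m for m in archive if not dominates(candidate, m)] + [candidate]


def select_food_source(archive, radius, rng):
    # Roulette wheel favoring solutions with fewer neighbors within `radius`.
    weights = []
    for a in archive:
        crowd = sum(1 for b in archive
                    if max(abs(x - y) for x, y in zip(a, b)) <= radius)
        weights.append(1.0 / crowd)   # crowd >= 1 because a counts itself
    return rng.choices(archive, weights=weights, k=1)[0]
```

Selecting from sparsely populated regions encourages the swarm to spread along the non-dominated front instead of clustering around one solution.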
Because the Kappa index considers not only the correct rate of a classifier but also the distribution of the per-class error, it is utilized in this research as an objective function. It can be defined according to equation (16) [19]:
κ(C, D) = (p_o(C, D) − p_e(C, D)) / (1 − p_e(C, D)), (16)
where p_o(C, D) and p_e(C, D) are the relative observed agreement (similar to accuracy) and the hypothetical probability of chance agreement between the labeled data in the dataset D and the classifier C, respectively. Since the features extracted from crack images have a very high dimensionality and the number of samples is sometimes small, both the training Kappa index and the testing Kappa index are used in this research as two objective functions for MO-IBSSA, in order to avoid over-fitting without resorting to cross-validation. The reason is that applying cross-validation to evaluate all the solutions in the population in all the generations of an optimization algorithm can be quite expensive in computational cost.
The two objective functions are defined as O_1 = κ(C, D_training) and O_2 = κ(C, D_testing). The proposed MO-IBSSA aims to maximize both O_1 and O_2 and therefore provides a set of non-dominated solutions. Thus, for comparison and evaluation purposes, one solution (the solution that achieves the best balance between the two objective functions) is selected for evaluating the performance of the proposed approach with the 10 selected UCI datasets and the crack detection and crack severity datasets.
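The Kappa index and the choice of a balanced solution can be sketched as below. The Kappa computation follows the standard Cohen's kappa definition. The rule used to pick the "best balance" (largest minimum of the two objectives) is only one plausible reading and is an assumption, as the text does not state the criterion.

```python
from collections import Counter


def kappa_index(y_true, y_pred):
    """Cohen's kappa between true labels and classifier predictions."""
    n = len(y_true)
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: sum over classes of the product of the marginals.
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        return 0.0   # kappa is undefined here; treated as 0 (an assumption)
    return (p_o - p_e) / (1.0 - p_e)


def pick_balanced(solutions):
    # solutions: list of (O1, O2) pairs; keep the pair whose weaker
    # objective is the largest (assumed balance criterion).
    return max(solutions, key=lambda s: min(s))
```

Under this criterion, a solution with a very high training Kappa but a poor testing Kappa is passed over in favor of one that performs well on both.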

E. CLASSIFICATION
Finally, in the classification phase, the proposed approach uses the final optimal subset of features obtained by the employed feature selection method for crack detection and crack severity recognition. It applies the SVM algorithm (polynomial kernel with degree = 2) as the classifier.

V. EXPERIMENTAL RESULTS AND DISCUSSION
This section presents and discusses the experiments carried out to investigate and evaluate the performance of the proposed MO-IBSSA based hybrid filter-wrapper feature selection approach. The evaluation of the proposed crack severity recognition system is also described. Simulation experiments were performed on an HP ZBook 15 G2 Mobile Workstation with 32 GB RAM, an Intel Core i7-4610M CPU (3.00 GHz, 1600 MHz, 4 MB L3 Cache, 2 cores, 37W), and an NVIDIA Quadro K5100M GPU with 8 GB RAM. The proposed approach is implemented in Matlab R2015b on the Windows platform.

A. DATASETS AND EVALUATION METRICS
Several experiments were conducted with the different crack image datasets listed in Table 4 using the proposed crack detection and crack severity recognition system. The filter-wrapper feature fusion and selection method based on MO-IBSSA is used to reduce feature dimensionality and increase crack detection accuracy.
The first three crack datasets in Table 4 are expert-annotated primary datasets of crack images, collected over 2 years from a historical location. Figures 6, 7, and 8 show samples selected from the three datasets.
To evaluate the performance of the proposed system, several performance metrics, namely, Accuracy, Recall (Sensitivity), Precision, and F-measure, were calculated according to equations (17)-(21), respectively. In addition, the Jaccard similarity score and the Geometric Mean (GM) are calculated using equation (22) and equation (23), respectively, where TP is the number of true positives, FP is the number of false positives, TN is the number of true negatives, FN is the number of false negatives, and N is the total number of observations.
GM = √(Sensitivity × Specificity). (23)
Also, the Area Under Curve (AUC) separability measure is calculated. The AUC metric is commonly used as a performance measurement for classification problems across various threshold settings. It measures the model's ability to separate or distinguish between classes.
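The metrics above can be computed directly from the confusion-matrix counts. The sketch below uses the standard textbook definitions, which are assumed to correspond to equations (17)-(23); the function name and dictionary keys are illustrative.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix metrics for a binary classifier."""
    n = tp + fp + tn + fn                 # total number of observations
    recall = tp / (tp + fn)               # sensitivity
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / n,
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "f_measure": 2 * precision * recall / (precision + recall),
        "jaccard": tp / (tp + fp + fn),
        "gm": math.sqrt(recall * specificity),
    }
```

For instance, 509 true-positive hits out of 599 actual crack images (509 TP, 90 FN) give a recall of 84.97%.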

B. CRACK DETECTION EVALUATION
To evaluate the proposed crack detection and crack severity recognition system, each dataset is randomly split into 70% and 30% of the samples as the training and testing sets, respectively, using hold-out validation. In all experiments, 5 independent runs are adopted, the population size is set to 20, and the maximum number of generations is set to 100. The filter-based feature pre-selection step is first run on the training set to obtain the most relevant, discriminative features and to reduce the dimensionality for the wrapper-based feature selection approach. Then, the performance of the optimal feature subsets is evaluated by the learning/classification algorithm on the testing set. The SVM classifier with a polynomial kernel (degree = 2) is selected as the learning algorithm due to its superior results in the field of image classification.
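The 70%/30% hold-out split can be sketched as follows. The function name and the use of a fixed seed for reproducibility are assumptions; the paper's experiments were implemented in Matlab.

```python
import random


def holdout_split(samples, train_fraction=0.7, seed=0):
    """Randomly split samples into training and testing subsets."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)          # reproducible shuffle
    cut = int(round(train_fraction * len(samples)))
    train = [samples[i] for i in indices[:cut]]
    test = [samples[i] for i in indices[cut:]]
    return train, test
```

Using a different seed for each of the independent runs yields a different random partition per run.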
For crack detection datasets (1, 2, 4), the average and standard deviation of the fitness and reduction rate, obtained using the subsets of features selected by the proposed system as the training and testing datasets over the 5 independent runs, are presented in Table 5. In addition, the average running time is also shown. For crack severity recognition dataset (3), two types of features, namely, the fused hand-crafted features and CNN-learned features extracted from VGG16 deep learning model are used for recognition purposes. The average and standard deviation of the attained fitness and reduction rate for the training and testing datasets over the 5 independent runs are also presented.
Table 6 presents the accuracy and F-measure of classification using the whole set of raw features as well as the accuracy and F-measure of classification using the selected subset of features. From Table 6, it is noticed that the proposed crack detection and severity recognition system outperformed traditional classification, which does not apply feature selection, for the 3 crack detection datasets (1, 2, 4). Specifically, it increased the detection rate by approximately 27%, 27%, and 57% for bridge-cracks-dataset [39], our-dataset-2, and our-dataset-1, respectively. It also reduced the number of features by approximately 64.68%-68.8% in all cases.
For the crack severity dataset (3), the proposed system improved the recognition rate by 5.84% using the fused hand-crafted features. Using the learned features, there is a slight degradation in performance of only 0.44%, but with a feature reduction rate of approximately 54.9%.
Tables 7 and 8 present the different performance metrics mentioned above, which are used to evaluate the performance of the proposed crack detection and severity recognition.
In Table 7, the performance of the proposed system is illustrated considering our-dataset-1, our-dataset-2, and bridge-cracks-dataset [39]. For our-dataset-1, it was noticed that the achieved accuracies were 96.27% and 89.82% for the internal and external testing datasets, respectively. Moreover, the obtained recall percentages were 84.97% and 76.36% as a result of 509 TP hits out of 599 actual crack images and 884.2 TP hits out of 1158 actual crack images for the internal and external testing datasets, respectively, as shown in Figure 9 (I).
Also, for our-dataset-2, it was noticed that the attained accuracies were 96.86% and 93.85% for internal and external testing datasets, respectively. Moreover, the achieved recall percentages were 93.52% and 92.84% as a result of 1083 TP hits out of 1158 actual crack images and 591.4 TP hits out of 599 actual crack images for internal and external testing datasets, respectively, as shown in Figure 9 (II).
For bridge-cracks-dataset [39], it was observed that the obtained accuracy was 93.46%. Moreover, the achieved recall percentage was 85.37% as a result of only 1115.8 TP hits out of 1169 actual crack images, as shown in Figure 9 (III). Accordingly, it was concluded from Table 6 that the proposed system in general improved the accuracy by ≈ 27%-57% and the F-measure by ≈ 23%-42% compared to the traditional classification approaches that use the whole set of features. The resulting accuracy ranges from 93.46% to 96.84% over both the Crack and Intact classes, with a discriminatory power (AUC) of 94.19% to 98.42%. Table 8 shows the performance of the proposed crack detection and severity recognition system considering our-dataset-3 using both fused hand-crafted and VGG16 learned features.
For crack severity recognition dataset (3), it was noticed from Table 8 that accuracies of 61.85% and 80.41% were obtained using fused hand-crafted and VGG16 learned features, respectively. Moreover, the achieved recall percentages were 62.15% and 80.52% using fused hand-crafted and VGG16 learned features, respectively. It was noticed from Figure 10 (a) that the proposed severity recognition system can recognize crack severity with an accuracy of 67.8%, 54.1%, and 61.7% for fine, moderate, and severe cracks, respectively. It was also noticed that high confusion occurred between the moderate and severe classes, with an average of approximately 31.5%, followed by the fine and moderate classes, with an average of approximately 16.3%, and finally the fine and severe classes, with an average of approximately 10.4%, and an AUC of 0.9287. On the other hand, as shown in Figure 10 (b), using VGG16 learned features improved the performance by 17.7%, 15.9%, and 23.5% for fine, moderate, and severe cracks, respectively. Meanwhile, the confusion between crack severity classes is reduced by 15.2%, 4.3%, and 9.3% between (moderate and severe cracks), (moderate and fine cracks), and (fine and severe cracks), respectively.

C. FEATURE SELECTION EVALUATION
For further validation and to investigate the performance of the proposed feature selection approach, we tested 10 representative datasets obtained from the UCI Machine Learning Datasets Repository [52], as listed in Table 9, including different numbers of features, samples, and classes. The baseline approaches, which are based on the single objective function described in equation (24) and are run on the same datasets, are briefly presented, as the results obtained by the proposed hybrid filter-wrapper feature selection method will be compared with those obtained by these baseline approaches.
Fitness = αE + β(R/N), (24)
where E is the classification error rate, R represents the number of selected features, N represents the total number of features in the dataset, and α and β are two constants controlling the importance of the classification performance and of the length of the feature subset. All the tabulated evaluations of the proposed method are recorded and compared to state-of-the-art methods using an HP ZBook 15 G2 Mobile Workstation with 32 GB RAM, an Intel Core i7-4610M CPU (3.00 GHz, 1600 MHz, 4 MB L3 Cache, 2 cores, 37W), and an NVIDIA Quadro K5100M GPU with 8 GB RAM. The proposed approach is designed and developed using Matlab R2015b on the Windows 10 64-bit platform. For fair comparison, all methods are developed and tested using MATLAB on the same computing platform with the same global settings for all algorithms. That is, all algorithms are initialized uniformly at random, and all use a population size of 20 search agents and 100 iterations. These values were selected after conducting several initial empirical studies. Moreover, the average and standard deviation of the fitness and reduction rate, in addition to the average computational time over 20 independent runs, are used for comparison purposes.
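The single-objective fitness used by the baseline approaches can be sketched as below, based on the definitions of E, R, N, α, and β given above. The default weights α = 0.99 and β = 0.01 are values commonly used in the wrapper feature selection literature and are an assumption here, as is the definition of the reduction rate as 1 − R/N.

```python
def baseline_fitness(error_rate, n_selected, n_total, alpha=0.99, beta=0.01):
    # Weighted sum of the classification error and the selected-feature ratio;
    # lower values are better.
    return alpha * error_rate + beta * (n_selected / n_total)


def reduction_rate(n_selected, n_total):
    # Fraction of the original features that were discarded (assumed definition).
    return 1.0 - n_selected / n_total
```

With these weights the classification error dominates the fitness, and the subset size acts mainly as a tie-breaker between solutions of similar accuracy.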
Generally, to provide credible evidence that a system has good generalization performance, k-fold cross-validation is used to assess the processes of feature selection and training [12]. Thus, all baseline approaches were assessed using 5-fold cross-validation so that they could be compared against the proposed feature selection approach, highlighting the effectiveness of the proposed approach without k-fold cross-validation. On the other hand, to evaluate the proposed feature selection approach, each dataset is randomly split into 70% of the samples as the training set and 30% of the samples as the testing set using hold-out validation. For comparison purposes, we used four different algorithms, namely, BPSO, BGWO, BSSA, and IBSSA. The parameters of these algorithms are presented in Table 10. The values of the parameters listed in Table 10 were selected based on both initial experiments and previous research [21], [26], [29].
The filter approaches were first run on the training set as a pre-selection step to obtain the most relevant, discriminative features and to reduce the problem dimensionality for the wrapper feature selection approach. Then, the performance of the optimal feature subsets was evaluated by the learning/classification algorithm on the testing set. Due to its simplicity and popularity, KNN is selected as the learning algorithm, with K set to 5 in the experiments.
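The wrapper evaluation of a candidate feature subset with KNN can be sketched as follows. The binary `mask` plays the role of a salp position (1 for a selected feature, 0 otherwise). The Euclidean distance and the function names are assumptions made for the example.

```python
from collections import Counter


def knn_predict(train_X, train_y, x, mask, k=5):
    """Majority vote of the k nearest training samples on the selected features."""
    selected = [j for j, bit in enumerate(mask) if bit]

    def sq_dist(a, b):
        return sum((a[j] - b[j]) ** 2 for j in selected)

    nearest = sorted(range(len(train_X)),
                     key=lambda i: sq_dist(train_X[i], x))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def error_rate(train_X, train_y, test_X, test_y, mask, k=5):
    # Fraction of test samples misclassified using only the masked features.
    wrong = sum(knn_predict(train_X, train_y, x, mask, k) != y
                for x, y in zip(test_X, test_y))
    return wrong / len(test_y)
```

The returned error rate is the quantity E that a wrapper method feeds into its fitness function for each candidate mask.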
The average and standard deviation of the obtained fitness and reduction rate for all the base-line algorithms and the proposed MO-IBSSA over the 20 independent runs are illustrated in Tables 11 and 12. In addition, the average running time is also presented.
According to Tables 11 and 12, it can be observed that the proposed hybrid filter-wrapper feature selection method based on MO-IBSSA achieved higher performance and feature reduction rate over 70% and 90% of the used datasets, respectively. Also, based on the presented results, it can be noticed that the proposed feature selection method can generally evolve a small number of features and achieve similar or better classification performance compared to all the other optimization algorithms, except for the Australian, Colon cancer, and SonarEW datasets. For the Australian dataset, BPSO has a slight improvement in accuracy of only 0.79% at the expense of more than doubling the average running time. For the SonarEW dataset, BGWO has an improvement in accuracy of 2.42%, but with approximately 3 times longer running time and 16% more features. For the Colon cancer dataset, the basic Salp optimization algorithm has a very slight improvement of only 0.28% at the expense of increasing the number of features by 53.68% and more than doubling the average running time.
In terms of reduction rate, the proposed hybrid filter-wrapper feature selection method based on MO-IBSSA is in first place, as it has the lowest number of features over 90% of the used datasets. BPSO comes in second place, followed by BGWO, then the Improved Salp optimization algorithm, and finally the basic Salp optimization algorithm in last place. In terms of average time, the proposed method has the shortest average running time over 90% of the used datasets; only for the Clean-1 dataset does BPSO beat the proposed method. Compared with the other baseline approaches, the proposed method reduces the computational time by at least half in most cases.
Tables 13-17 present the different performance metrics mentioned above, which are used to evaluate the different algorithms on the testing dataset.
The average and standard deviation of the different performance metrics for the achieved results from performing the proposed hybrid filter-wrapper feature selection method based on MO-IBSSA on 10 UCI datasets over the 20 independent runs are illustrated in Table 13.
The average and standard deviation of the different performance metrics for the attained results from performing the BPSO feature selection method on 10 UCI datasets over the 20 independent runs are illustrated in Table 14.
The average and standard deviation of the different performance metrics for the obtained results from performing the BGWO feature selection method on 10 UCI datasets over the 20 independent runs are illustrated in Table 15.
The average and standard deviation of the different performance metrics for the achieved results from performing the BSSA feature selection method on 10 UCI datasets over the 20 independent runs are illustrated in Table 16.
The average and standard deviation of the different performance metrics for the obtained results from performing the IBSSA feature selection method on 10 UCI datasets over the 20 independent runs are illustrated in Table 17.

• Updating the archive of non-dominated solutions has O(M * N^2) computational complexity in the worst case, as all salps in the population may wish to enter into the archive.
• The computation of the objective function has O(Cof * N) computational complexity.
• The computational complexity of updating the salps' positions is O(d * N).
Thus, the overall complexity of the proposed MO-IBSSA is O(t * (M * N^2 + Cof * N + d * N)), where M indicates the number of objective functions, t represents the number of iterations, d is the number of variables (dimension), N is the number of solutions, and Cof indicates the cost of the objective function. Although the proposed MO-IBSSA approach has the same computational complexity as the basic multi-objective SSA version, it requires less computational time owing to the filter-based pre-selection phase, which reduces the feature set dimensionality. Table 18 shows a comparative summary of the characteristics of the proposed approach against several state-of-the-art crack detection approaches.

E. COMPARATIVE ANALYSIS AGAINST STATE-OF-THE-ART CRACK DETECTION APPROACHES
From Table 18, it is noticed that the proposed system outperforms the model based on HOG and LBP proposed in [38] in terms of F-measure. The proposed system obtains an F-measure of 96.86% using hand-crafted features. Compared to the systems proposed in [6], [32], [34], [35], [37], [39] based on CNN deep learning models, the proposed system outperforms all of them in terms of accuracy. The proposed system achieves an accuracy of 96.86% for crack detection datasets. Moreover, the proposed system has a feature reduction rate of 64.86%-68.8% and the shortest computational time. On the other hand, the proposed system shows slightly lower accuracy, by a maximum of 2.14%, than the systems proposed in [31], [33], [36], but it uses only approximately 33% of the features with the shortest time, so there is no need for high computational resources. However, as accuracy alone is not sufficient for reflecting the actual performance of crack detection systems, several additional performance metrics have been measured in this paper. Finally, to the best of our knowledge, the proposed system is the first system that uses optimization techniques in the field of crack detection.

F. LIMITATIONS
As illustrated in the previous sections, the proposed system has several advantages: a feature reduction rate of approximately 61% on average for both crack detection and crack severity recognition, a reduction in computational time to approximately half of the original time, and improved crack detection performance. Moreover, as far as we know, it is the first attempt to develop a system that handles the problem of crack severity recognition based on a hybrid filter-wrapper feature selection method with multi-objective bio-inspired optimization. However, the proposed MO-IBSSA feature selection method also has several observed limitations. The main limitation of the proposed MO-IBSSA is that it still has a lower feature reduction rate, as it selects more features than other optimization algorithms on most of the used datasets. Therefore, a new filter-based selection strategy is used as a pre-selection phase so that the proposed algorithm selects fewer features. The remaining limitations are as follows: 1) the proposed crack severity recognition still suffers from high confusion between the moderate and severe crack classes, 2) the proposed feature pre-selection step applies only a filter method that selects the high-ranked features without accounting for redundant features, and 3) the fitness function relies on the Kappa index, which may not accurately reflect the true level of agreement between raters because it is influenced by the prevalence of the condition under observation.
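The prevalence sensitivity of the Kappa index mentioned in the third limitation can be illustrated with a small numerical sketch. The two confusion matrices below are hypothetical and are not taken from the experiments in this paper; both have the same observed agreement (90%), yet they differ in class balance. The helper `cohen_kappa` is an illustrative implementation of the standard Cohen's kappa formula, not the fitness function used by the proposed system.

```python
def cohen_kappa(matrix):
    """Cohen's kappa for a square confusion matrix (rows: true, columns: predicted)."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(matrix[i][i] for i in range(n)) / total
    # Chance agreement from the row and column marginals.
    row_marg = [sum(matrix[i]) / total for i in range(n)]
    col_marg = [sum(matrix[r][c] for r in range(n)) / total for c in range(n)]
    expected = sum(r * c for r, c in zip(row_marg, col_marg))
    return (observed - expected) / (1 - expected)


# Hypothetical 2-class results, 100 samples each, both with 90% observed agreement.
balanced = [[45, 5], [5, 45]]   # classes equally prevalent
skewed = [[85, 5], [5, 5]]      # one class dominates

print(f"balanced kappa: {cohen_kappa(balanced):.3f}")
print(f"skewed kappa:   {cohen_kappa(skewed):.3f}")
```

With identical accuracy, the balanced case yields a kappa of 0.8 while the skewed case yields about 0.44, because the chance-agreement term grows as one class dominates. This is one reason a kappa-based fitness value may be hard to compare across datasets with different class distributions.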

VI. CONCLUSION AND FUTURE WORK
This paper proposed a novel crack detection and crack severity recognition system that utilizes a hybrid filter-wrapper feature selection method based on a Multi-Objective Improved Binary Salp Swarm Algorithm. The proposed system consists of image acquisition, data preparation through dividing crack images into small patches, data cleansing and augmentation, feature extraction, fusion, selection, and classification. In order to train and validate the proposed system, we conducted multiple experiments with 4 collected primary real-world datasets and other selected publicly available UCI benchmark datasets. Based on the obtained experimental results, the essential finding is that the proposed crack detection system in general improves the accuracy by ≈ 27%-57% and the F-measure by ≈ 23%-42% compared to state-of-the-art classification using the whole set of features, with an accuracy ranging from 93.46% to 96.84% on both the Crack and Intact classes and a high discriminatory power (AUC) ranging from 94.19% to 98.42%. Moreover, a feature reduction rate of approximately 54.9%-68.8% was achieved with all datasets.
For crack severity recognition, it was observed that the VGG16 CNN learned features outperformed the fused hand-crafted features by 17.7%, 15.9%, and 23.5% for fine, moderate, and severe cracks, respectively. Also, using the CNN learned features reduced the confusion between crack severity degrees. It is also worth mentioning that the proposed system with CNN learned features led to a slight degradation in performance of only 0.44%, while reducing the features by approximately 54%.
For future research, several challenges and research directions could be considered, such as investigating the performance of end-to-end deep learning models using the proposed feature selection approach. Also, applying semantic segmentation to crack images as an additional pre-processing step to improve the recognition rate of crack severity is another challenge.