Optimization Driven Adam-Cuckoo Search-Based Deep Belief Network Classifier for Data Classification

Data classification assigns data to classes based on a labeled class distribution. Classifying data with an imbalanced class distribution poses a significant challenge, known as the class inequity problem. Various data classification methods have been developed in the learning framework, but providing better classification accuracy remains a significant challenge in the application domain. Therefore, an effective classification method named Adam-Cuckoo search based Deep Belief Network (Adam-CS based DBN) is proposed to perform the classification process. At first, the input data is forwarded to the pre-processing stage and then to the feature selection stage. The wrapper-based feature selection model conducts the search in the space of possible parameters. The operators specify the connectivity between the states, and the features are selected based on their state. The classification is performed using the Deep Belief Network (DBN) classifier, whose multilayer perceptron (MLP) layer is trained using the proposed Adam based Cuckoo search (Adam-CS) algorithm. The breeding behavior of cuckoos is integrated with the step size parameter to enhance the accuracy of the classification process. The adaptive learning rate parameter effectively estimates the moments using a sparse gradient. The proposed Adam-CS algorithm attained better performance in terms of accuracy, specificity, and sensitivity with 90% training data.


I. INTRODUCTION
Classification is a significant process in the pattern recognition framework. Various learning models, such as the backpropagation neural network, support vector machines, Bayesian network, decision tree, newly formed associative classification, and nearest neighbor, have been introduced for classification and are effectively applied in different application domains [1]. In general, classification is a complex issue in the data analysis framework, because a huge amount of data is required to train the classifier. The classifier is modeled using the training data, and the trained model is then used to classify future data. Mostly, the classifier captures the structure of the training data, which is further used to classify the future data, but not the noise that exists in the training information [2]. Moreover, an imbalanced class distribution is a complex issue for most classifier learning techniques [1]. Classifying data with an imbalanced class distribution poses a significant challenge to the performance of classifier algorithms, which must ensure the balance between the misclassification costs and the class distribution. The significant complexity and the frequent occurrence of the class inequity problem call for additional research effort. Classification is also a significant process in data mining and knowledge discovery in databases (KDD). The classification model learns a function from the training data that makes few errors when applied to previously unseen data.
Moreover, reports from both industry and academia specify that the imbalanced class distribution of the data set poses a major issue in most of the learning models in the classifier approaches, which are further used to generate a moderately imbalanced distribution [3].
The associate editor coordinating the review of this manuscript and approving it for publication was Liangxiu Han.
Among state-of-the-art machine learning approaches, deep learning has recently gained major attention in data classification. The traditional training approach has the drawbacks of requiring many iterations and getting trapped in local minima [36]. The main motivation for the deep learning model is that it automatically generates valuable features in the problem domain, eliminating the handcrafted and complicated feature engineering process. As a specialized deep learning architecture, the convolutional neural network (CNN) is widely applied in image segmentation and image recognition, as it focuses on the temporal and the spatial correlation between pixels [4]. Deep learning (DL) is a field of machine learning that learns representations at multiple levels to build an effective feature hierarchy. The higher levels of the hierarchy are obtained from the lower levels, whereas the features available at the same level help to define various higher-level features [5]. The structure of DL extends the CNN by including additional hidden layers between the input and the output layers for modeling highly nonlinear and complex relationships [31]. Hence, DL has recently gained more interest among researchers owing to its ability to attain better solutions in various medical application-oriented problems, 3D sensed data classification [38], and image analysis tasks, such as image classification, segmentation, denoising, and registration [6]. The DL method consumes less time than traditional methods [37]. Recently, DL techniques have been triggered by various learning models or traditional representations. DL uses the data set and assimilates the complex behavior to select the effective characteristic features automatically through the deep layers of the neural network structure [7].
The DL method has recently attained significantly better performance in the classification of hyperspectral images (HSI). In [8], a DL technique, the DBN, is explored in the classification of HSI for the first time. The spatial correlation is incorporated into the DBN to attain better classification. In [11], the DBN performance is enhanced by placing a diversity-promoting prior on the latent factors. CNN is the most significant deep learning model, as most of the reports generated in recent years are based on CNN [12]. The spectral-based joint features are used in the DL model to minimize computational complexity [13]. Owing to the multiple hidden layers with non-linear functions, DL is effective in different tasks, such as machine translation, image classification, network optimization [15], and automatic speech recognition [14], and it efficiently captures the higher-level data in the logistic functions [16]. In DL, the Long Short Term Memory (LSTM) is used to improve the classification results [17].

A. LITERATURE SURVEY
Various existing classification methods are surveyed as follows. Hassan and Mahmood [20] developed a joint CNN and RNN model for extracting higher-level features. Owing to the pooling and convolutional layers, the long-term locality dependencies were effectively captured. It attained better performance in sentiment analysis benchmarks. However, the model was not applied to machine translation or information retrieval. Perera and Patel [21] introduced a deep learning approach to learn features for one-class classification. It operates on top of a CNN and generates descriptive features for each class in the feature space. It attained better performance than the state-of-the-art mechanisms but failed to generate precise values. Zhong et al. [22] developed a deep learning approach for remotely sensed data. It captures the temporal variations and focuses on seasonal patterns. It used a feature extractor for extracting features from the time-series data and offered an efficient and effective representation of data in the classification tasks. However, it does not provide a provision to integrate the patterns with human knowledge. Wang et al. [23] introduced a deep CNN based approach to extract hidden patterns automatically from massive smart meter data. It extracts the features from the load profiles and effectively identifies the demographic information. It enhances the accuracy of the learning model, but the granularity of the data might be affected. Zhang et al. [24] introduced an adaptive dropout deep computation model for the data features in the industrial IoT framework. It prevents overfitting by setting the dropout rate at each hidden layer. However, other optimal techniques were not considered in the initialization process. Hossain and Muhammad [25] modeled an urban environment classification framework for communication purposes. It utilized mobile edge technology and offered efficient transmission with low latency. It effectively classified the noise and was robust in sound classification, but it did not use a noise cancellation algorithm. Zhai et al. [26] developed an ensemble learning model to train the hidden layer in the feed-forward network. It enhanced prediction stability but failed to consider the relationship between testing and dropout probability. Chambon et al. [27] introduced a deep learning approach that learns end to end without computing spectrograms. It learns spatial filters for exploiting the array of sensors, followed by a softmax classifier. It attained better classification performance with minimum computational cost, but the drawback of this method was the limitation of the spatial context.
The main intention of this work is to design and develop an effective data classification method using the proposed Adam-CS algorithm. The proposed method involves three stages: pre-processing, feature selection, and classification. Initially, the input data is subjected to the pre-processing stage, where the missing value imputation and log transformation are performed to remove the noise and artifacts present in the data. The resulting pre-processed data is forwarded to the feature selection stage, where the essential features are effectively selected using the wrapper-based feature selection method. Finally, the data classification is carried out with the selected features using the DBN classifier, which is trained by the proposed Adam-CS algorithm.
VOLUME 8, 2020
The major contributions of this work are as follows:
• The missing value imputation and log transformation are performed in the pre-processing stage to remove the noise and artifacts present in the data. The features are selected by the wrapper-based model, which finds the state with the maximum evaluation, using accuracy estimation for both the evaluation function and the heuristic function.
• The selected features provide an intuitive and effective abstraction, which facilitates the data classification. Moreover, the data classification is performed using the DBN classifier, which is trained by the proposed Adam-CS algorithm. The breeding behavior of cuckoos is integrated with the step size parameter to enhance the accuracy of the classification process. The adaptive learning rate parameter effectively estimates the moments using a sparse gradient.
The rest of this paper is organized as follows: Section 2 elaborates on the proposed algorithm, and Section 3 presents the results and discussion along with the performance evaluation. Finally, Section 4 concludes the paper.

II. PROPOSED ALGORITHM DATA CLASSIFICATION
This section describes the overall details about the proposed Adam-CS based DBN.

A. PROBLEM FORMALIZATION
Data classification is a method of organizing data into various categories, which is useful in different fields for future usage. Some of the challenges in data classification are as follows. Due to the increasing population in urban areas, it is very difficult for the service provider to manage the big data coming from various devices and users, which degrades performance [25]. The deep learning model may overfit the training data, as only small datasets are available in medical image classification. Hence, more processing is required to acquire and annotate the image data, which is a major challenge in medical image analysis [18]. CNN is the significant architecture used to perform critical classification operations using the convolution filter. Training the CNN using a large labeled data set is a complex and time-consuming issue in the classification model [6]. The DL model also performs classification on hyperspectral data. The feature representation is dependent on the training samples in the deep approach, which is a major challenge associated with the classification framework. Moreover, the manual annotation of hyperspectral data is significantly complex [19]. The extreme learning machine model is quite simple and efficient for training the hidden layer, but selecting the architectural features and the prediction instability were major problems in the feed-forward network [23]. The ability to quickly generate large sets of data by validating and evaluating the data poses a major challenge in data classification.
These are the drawbacks found in the existing methods for data classification. The proposed Adam-CS based DBN is used to overcome those drawbacks of the existing methods.

B. ARCHITECTURE OF THE PROPOSED SYSTEM
The proposed approach involves three stages, which are depicted as follows: Pre-processing, feature selection, and classification. Initially, the input data is passed into the pre-processing phase, where the missing value imputation and log transformation is performed to remove the noise and artifacts present in the data. The resulted pre-processed data is subjected to the feature selection phase. Moreover, the feature selection is carried out using the wrapper selection method. The features selected from the data provide an intuitive and effective abstraction, which is used to facilitate the data classification. Finally, the selected features are subjected to the classification phase. In the classification stage, the data classification is carried out using the DBN classifier, such that training of the DBN classifier is performed using the proposed Adam-CS optimization algorithm. The proposed algorithm is the integration of the Adam optimization algorithm [28] and the Cuckoo search (CS) algorithm [29]. Figure 1 shows the architecture of the proposed Adam-CS based DBN for data classification.

1) PRE-PROCESSING
At first, the input data k with the dimension [X × Y] is fed to the pre-processing stage, where the log transformation and the missing data imputation are carried out to remove the noise and artifacts present in the data. The input data k is represented as,
$$k = \{k_{ij}\}; \quad 1 \le i \le X, \; 1 \le j \le Y \tag{1}$$
where $k_{ij}$ denotes the $i$-th data at the $j$-th attribute, X is the total number of data, and Y denotes the total number of attributes. Missing data imputation is a statistical model, which replaces the missing data points with substituted values. The log transformation process is used to minimize the variability of data by transforming the skewed data to approximately conform to normality. The log transformation process is denoted as,
$$L = \log(K) \tag{2}$$
where L denotes the log-transformed data, and K is the data obtained after missing value imputation. The resulting pre-processed data is denoted as L with the dimension of X × Y.
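The two pre-processing steps can be illustrated with a minimal sketch. The text does not state which imputation rule or logarithm base is used, so the column-mean imputation, the `log(1 + v)` form (chosen here so that zero values stay finite), and the function names are all assumptions made for illustration.

```python
import math

def impute_missing(rows):
    """Replace None entries with the mean of the observed values in that column (assumed rule)."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]

def log_transform(rows):
    """Apply log(1 + v) element-wise; assumes non-negative attribute values."""
    return [[math.log1p(v) for v in r] for r in rows]

def preprocess(rows):
    """Imputation followed by log transformation; the X x Y shape is preserved."""
    return log_transform(impute_missing(rows))

# Toy input data k with X = 3 records and Y = 3 attributes; None marks a missing value.
k = [[1.0, None, 9.0],
     [3.0, 4.0, None],
     [5.0, 8.0, 99.0]]
L = preprocess(k)
```

The skewed third attribute (9 versus 99) is compressed by the logarithm, which is the variance-reducing effect described above.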

2) WRAPPER-BASED FEATURE SELECTION MODEL
The pre-processed data L is subjected to the feature selection stage, where the essential features Z are selected using the wrapper-based feature selection model. The wrapper model performs the search in the space of possible parameters [30]. The search requires a state space, an initial state, a termination condition, and a search engine. Each state in the search space denotes a feature subset. For n features, there exist n bits in each state, and each bit is '1' for the presence of a feature and '0' for its absence. The operators specify the connections between the states, and the operators used here add features to or delete features from a state. The heuristic function is used to find the state with the maximum evaluation, such that the accuracy estimation is used in both the evaluation function and the heuristic function. The accuracy estimation executes cross-validation repeatedly on small datasets, as a small dataset takes less time to estimate and learn the function. The wrapper approach uses the forward selection and the backward elimination models. In the forward model, the searching process begins with the empty feature set, while the backward approach starts the search using the full feature set. Most searches start from the empty feature set, as it is faster in building the classifier. A complexity penalty is included in the evaluation function to penalize large feature subsets. The penalty value is set as 0.1%, which is small when compared to the accuracy estimation. Moreover, the feature subset is randomly selected based on the estimated accuracy. In the feature selection process, the dimension of the pre-processed data may change based on the dataset used.
The features selected using the wrapper approach are denoted as Z. Z has the dimension of [303 × 13] using the heart disease dataset, [286 × 9] using the breast cancer dataset, [32 × 56] using the lung cancer dataset, and [768 × 8] using the diabetes dataset. The selected features Z are then sent to the classification module.
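A minimal sketch of the forward wrapper search may make the state-space description concrete. The induction algorithm wrapped here (a nearest-centroid classifier), the 3-fold split, the per-feature reading of the 0.1% penalty, and the synthetic data are all assumptions for illustration; the text does not specify them.

```python
import random

def centroid_predict(train_X, train_y, x, feats):
    """Nearest-centroid classifier restricted to the selected feature indices."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        members = [r for r, t in zip(train_X, train_y) if t == label]
        centroid = [sum(r[j] for r in members) / len(members) for j in feats]
        dist = sum((x[j] - c) ** 2 for j, c in zip(feats, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def cv_accuracy(X, y, state, folds=3):
    """Cross-validated accuracy for the feature subset encoded by the bit tuple `state`."""
    feats = [j for j, bit in enumerate(state) if bit]
    if not feats:
        return 0.0
    correct = 0
    for f in range(folds):
        test_idx = [i for i in range(len(X)) if i % folds == f]
        train_idx = [i for i in range(len(X)) if i % folds != f]
        tX = [X[i] for i in train_idx]
        ty = [y[i] for i in train_idx]
        correct += sum(centroid_predict(tX, ty, X[i], feats) == y[i] for i in test_idx)
    return correct / len(X)

def evaluate(X, y, state, penalty=0.001):
    """Accuracy estimate minus a complexity penalty of 0.1% per selected feature (assumed)."""
    return cv_accuracy(X, y, state) - penalty * sum(state)

def forward_selection(X, y):
    """Greedy forward search: start from the empty subset and add one feature at a time."""
    n = len(X[0])
    state = tuple([0] * n)
    best = evaluate(X, y, state)
    while True:
        # The "add feature" operator generates every neighbour of the current state.
        neighbours = [state[:j] + (1,) + state[j + 1:] for j in range(n) if not state[j]]
        scored = [(evaluate(X, y, s), s) for s in neighbours]
        if not scored:
            return state, best
        top_score, top_state = max(scored)
        if top_score <= best:  # termination: no neighbour improves the evaluation
            return state, best
        state, best = top_state, top_score

# Synthetic data: feature 0 separates the two classes, features 1 and 2 are noise.
random.seed(0)
X, y = [], []
for i in range(60):
    label = i % 2
    X.append([label * 10.0 + random.gauss(0, 0.5), random.gauss(0, 1), random.gauss(0, 1)])
    y.append(label)
selected, score = forward_selection(X, y)
```

Because each extra feature costs a small penalty, the search stops adding noise features once the accuracy estimate no longer improves.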

3) PROPOSED ADAM-CUCKOO SEARCH-BASED DEEP BELIEF NEURAL NETWORK
The DBN classifier is used to perform the data classification using multiple layers of hidden units. The RBM layers use unsupervised learning, whereas the MLP layer undergoes supervised learning based on the new optimization algorithm. The optimization is based on the characteristics of the Adam optimization and cuckoo search algorithms. The breeding behavior of cuckoo species is used to compute the weight and bias for training the classifier. In order to compute the weight and bias optimally, the Adam features are combined with the cuckoo search to attain better performance in the data classification.

a: ARCHITECTURE OF DEEP BELIEF NEURAL NETWORK
The proposed DBN classifier is developed using two restricted Boltzmann machine (RBM) layers and one MLP layer. In DBN, there are no connections among the hidden neurons as well as among the visible neurons. The feature vector F is passed as the input to the visible layer of RBM layer 1. The resultant output from hidden layers of RBM layer 1 forms the input for RBM layer 2, and the output of RBM layer 2 is the input to the MLP layer. Figure 2 shows the architecture of the DBN classifier.
The feature vector F is passed as input to the visible layer of RBM layer 1 such that the visible and the hidden layers of RBM layer 1 are expressed as:
$$b^1 = \{b^1_1, b^1_2, \ldots, b^1_m, \ldots, b^1_l\}; \quad 1 \le m \le l \tag{3}$$
$$c^1 = \{c^1_1, c^1_2, \ldots, c^1_n, \ldots, c^1_f\}; \quad 1 \le n \le f \tag{4}$$
where f denotes the number of hidden neurons, l denotes the number of visible neurons, $b^1_m$ indicates the $m$-th visible neuron at RBM layer 1, and $c^1_n$ denotes the $n$-th hidden neuron. Each neuron in the hidden and the visible layers has a bias.
Let x and y represent the biases of the visible and hidden layers. The two biases that correspond to the neurons at both layers are represented as,
$$x^1 = \{x^1_1, x^1_2, \ldots, x^1_m, \ldots, x^1_l\} \tag{5}$$
$$y^1 = \{y^1_1, y^1_2, \ldots, y^1_n, \ldots, y^1_f\} \tag{6}$$
where $y^1_n$ indicates the bias that corresponds to the $n$-th hidden neuron, and $x^1_m$ denotes the bias that corresponds to the $m$-th visible neuron. The weight applied to RBM layer 1 is expressed as,
$$S^1 = \{S^1_{mn}\}; \quad 1 \le m \le l, \; 1 \le n \le f \tag{7}$$
where $S^1_{mn}$ denotes the weight between the $m$-th visible and the $n$-th hidden neuron, with the weight vector dimension l × f. Hence, the output of the hidden layer in RBM layer 1 is calculated using the weight and bias associated with each neuron, which is expressed as,
$$c^1_n = \lambda\left[ y^1_n + \sum_{m=1}^{l} b^1_m S^1_{mn} \right] \tag{8}$$
where λ denotes the activation function. Moreover, the output generated from RBM layer 1 is expressed as,
$$c^1 = \{c^1_n\}; \quad 1 \le n \le f \tag{9}$$
The learning process of RBM layer 2 uses the output obtained from the hidden layer of RBM layer 1. The output of RBM layer 1, specified in Eq. (9), is fed as the input to the visible layer of RBM layer 2. Hence, the number of visible neurons in RBM layer 2 is equivalent to the number of hidden neurons of RBM layer 1, which is represented as,
$$b^2 = \{b^2_1, b^2_2, \ldots, b^2_f\} = \{c^1_n\}; \quad 1 \le n \le f \tag{10}$$
where $c^1_n$ denotes the output vector of RBM layer 1. Thereby, the hidden layer at RBM layer 2 is represented as,
$$c^2 = \{c^2_1, c^2_2, \ldots, c^2_n, \ldots, c^2_f\}; \quad 1 \le n \le f \tag{11}$$
The biases of the hidden and the visible layers have a similar representation as specified in Eq. (5) and (6), but they are represented as $x^2$ and $y^2$. The weight vector applied to RBM layer 2 is denoted as,
$$S^2 = \{S^2_{nn}\}; \quad 1 \le n \le f \tag{12}$$
where $S^2_{nn}$ denotes the weight between the $n$-th visible and the $n$-th hidden neuron at RBM layer 2, with the weight vector dimension f × f. Hence, the output of the $n$-th hidden neuron is computed as,
$$c^2_n = \lambda\left[ y^2_n + \sum_{n'=1}^{f} b^2_{n'} S^2_{n'n} \right] \tag{13}$$
where $y^2_n$ denotes the bias linked with the $n$-th hidden neuron, and $n'$ indexes the visible neurons of RBM layer 2. Hence, the output generated by the hidden layer is expressed as,
$$c^2 = \{c^2_n\}; \quad 1 \le n \le f \tag{14}$$
The output of RBM layer 2 is given as the input of the MLP layer, such that the number of neurons present in the input layer is f.
The input of the MLP layer is expressed as,
$$p = \{p_1, p_2, \ldots, p_n, \ldots, p_f\} = \{c^2_n\}; \quad 1 \le n \le f \tag{15}$$
where p denotes the neurons of the input layer, which receive the output of the hidden layer at RBM layer 2, $c^2_n$. The MLP hidden layer is expressed as,
$$d = \{d_1, d_2, \ldots, d_A, \ldots, d_C\}; \quad 1 \le A \le C \tag{16}$$
where C denotes the total number of hidden neurons. Let $D_A$ be the bias of the $A$-th hidden neuron, with A = 1, 2, ..., C. The output of the MLP layer is expressed as,
$$o = \{o_1, o_2, \ldots, o_\omega, \ldots, o_\upsilon\}; \quad 1 \le \omega \le \upsilon \tag{17}$$
where υ denotes the number of neurons present at the output layer. The MLP contains two weight vectors, where one is applied between the input and hidden layers while the other is applied between the hidden and output layers. $S^E$ is the weight vector used between the input and hidden layers, which is expressed as,
$$S^E = \{S^E_{nA}\}; \quad 1 \le n \le f, \; 1 \le A \le C \tag{18}$$
where $S^E_{nA}$ denotes the weight between the $n$-th input neuron and the $A$-th hidden neuron, and the size of $S^E$ is [f × C]. Based on the weight and bias of the neurons, the output of the hidden layer is represented as,
$$d_A = \left[ \sum_{n=1}^{f} S^E_{nA}\, p_n \right] + D_A \tag{19}$$
where $D_A$ denotes the bias of the hidden neuron and $p_n = c^2_n$, as the output of RBM layer 2 is the input of the MLP. Moreover, the weight between the hidden and the output layers is indicated as $S^G$ and is represented as,
$$S^G = \{S^G_{A\omega}\}; \quad 1 \le A \le C, \; 1 \le \omega \le \upsilon \tag{20}$$
Based on the output of the hidden layer and the weight $S^G$, the output vector is computed as,
$$o_\omega = \sum_{A=1}^{C} S^G_{A\omega}\, d_A \tag{21}$$
where $S^G_{A\omega}$ denotes the weight between the $A$-th hidden and the $\omega$-th output neuron, and $d_A$ denotes the hidden layer output.
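The data flow through the three stacked layers can be sketched as a single forward pass. This is only an illustration under stated assumptions: the sigmoid is assumed for the activation λ and for the MLP hidden layer, the output layer is taken as linear, and the layer sizes, random weights, and names such as `dbn_forward` and `params` are hypothetical.

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def layer(inputs, weights, biases, activation):
    """Compute activation(bias_n + sum_m inputs[m] * weights[m][n]) for each output n."""
    return [activation(biases[n] + sum(inputs[m] * weights[m][n] for m in range(len(inputs))))
            for n in range(len(biases))]

def init(rows, cols, rng):
    """Small random weight matrix with `rows` inputs and `cols` outputs."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def dbn_forward(F, params):
    c1 = layer(F, params["S1"], params["y1"], sigmoid)    # hidden output of RBM layer 1
    c2 = layer(c1, params["S2"], params["y2"], sigmoid)   # hidden output of RBM layer 2
    d = layer(c2, params["SE"], params["D"], sigmoid)     # MLP hidden layer, input p_n = c2_n
    no_bias = [0.0] * len(params["SG"][0])
    o = layer(d, params["SG"], no_bias, lambda v: v)      # linear MLP output layer
    return c1, c2, d, o

rng = random.Random(0)
l, f, C, v = 4, 3, 5, 2  # visible, RBM hidden, MLP hidden, and output sizes (illustrative)
params = {"S1": init(l, f, rng), "y1": [0.0] * f,
          "S2": init(f, f, rng), "y2": [0.0] * f,
          "SE": init(f, C, rng), "D": [0.0] * C,
          "SG": init(C, v, rng)}
c1, c2, d, o = dbn_forward([0.2, 0.7, 0.1, 0.9], params)
```

The sketch shows why the visible layer of RBM layer 2 and the MLP input layer both have f neurons: each simply receives the f hidden outputs of the layer before it.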
Training of RBM Layer 1: The training sample F is subjected to the input to RBM layer 1. It calculates the probability distribution for the hidden neurons and determines the positive gradient using the visible vector. Moreover, the probability of each visible neuron is identified, and the negative gradient function is calculated.
Training of RBM Layer 2: The output attained from the hidden layer of RBM layer 1 is passed as the input to the visible layer of RBM layer 2. It computes the probability distribution, and calculates the energy based on the weight and selects the weight having the lowest energy.
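The positive and negative gradients mentioned for the RBM layers correspond to the two phases of contrastive divergence. The sketch below shows one common variant (CD-1) together with the standard RBM energy function; whether the authors use exactly this variant is not stated, so the update rule, learning rate, toy patterns, and function names are assumptions.

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def hidden_probs(v, S, y):
    """P(hidden_n = 1 | visible vector v)."""
    return [sigmoid(y[n] + sum(v[m] * S[m][n] for m in range(len(v)))) for n in range(len(y))]

def visible_probs(h, S, x):
    """P(visible_m = 1 | hidden vector h)."""
    return [sigmoid(x[m] + sum(h[n] * S[m][n] for n in range(len(h)))) for m in range(len(x))]

def energy(v, h, S, x, y):
    """Standard RBM energy E(v, h) = -x.v - y.h - v^T S h; lower energy means a better fit."""
    interaction = sum(v[m] * S[m][n] * h[n] for m in range(len(v)) for n in range(len(h)))
    return -sum(a * b for a, b in zip(x, v)) - sum(a * b for a, b in zip(y, h)) - interaction

def cd1_step(v0, S, x, y, rate, rng):
    """One CD-1 update on a single visible vector; returns the reconstruction error."""
    h0 = hidden_probs(v0, S, y)
    h_sample = [1.0 if rng.random() < p else 0.0 for p in h0]
    v1 = visible_probs(h_sample, S, x)   # reconstruction of the visible layer
    h1 = hidden_probs(v1, S, y)
    for m in range(len(x)):
        for n in range(len(y)):
            positive = v0[m] * h0[n]     # positive gradient (data phase)
            negative = v1[m] * h1[n]     # negative gradient (reconstruction phase)
            S[m][n] += rate * (positive - negative)
    for m in range(len(x)):
        x[m] += rate * (v0[m] - v1[m])
    for n in range(len(y)):
        y[n] += rate * (h0[n] - h1[n])
    return sum((a - b) ** 2 for a, b in zip(v0, v1))

rng = random.Random(0)
l, f = 4, 3
S = [[rng.uniform(-0.1, 0.1) for _ in range(f)] for _ in range(l)]
x, y = [0.0] * l, [0.0] * f
patterns = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
errors = [cd1_step(p, S, x, y, 0.1, rng) for _ in range(200) for p in patterns]
```

After RBM layer 1 is trained this way, its hidden probabilities would serve as the visible vectors for training RBM layer 2 with the same routine.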
Training of MLP Layer: The MLP layer of the DBN classifier is trained using the proposed Adam-CS optimization algorithm. The training data obtained from the RBM layer 2 is passed as the input to the MLP layer. By analyzing the data, the network is modified iteratively to select the best weight. Moreover, the proposed Adam-CS is adopted to calculate the weight based on the error function. Adam-CS is developed by integrating the Adam optimization algorithm [28] with the Cuckoo search (CS) algorithm [29]. Adam optimization is a stochastic gradient descent algorithm used to efficiently compute the gradients in the data classification model. The cuckoo search is a metaheuristic algorithm, which aimed to perform the global search by imitating the best feature in nature.

b: PROPOSED ADAM-CS ALGORITHM FOR TRAINING THE DBN
The proposed algorithm is used to train the MLP layer of the DBN classifier to enhance the performance of data classification. Adam-CS is the integration of the Adam algorithm and the CS algorithm. The breeding behavior of cuckoos is integrated with the step size parameter to enhance the accuracy of the classification process. The adaptive learning rate parameter effectively estimates the moments using a sparse gradient. Cuckoos are fascinating birds because of their effective reproduction strategy and the beautiful sounds they make. Certain species, such as the Guira and ani cuckoos, lay their eggs in communal nests and may remove the eggs of other birds in order to increase the hatching probability of their own eggs. Moreover, various species lay their eggs in the nests of other host species, which is known as obligate brood parasitism. The three different kinds of brood parasitism are nest takeover, cooperative breeding, and intraspecific brood parasitism. Some host birds engage in direct conflict with the intruding cuckoos. When a host bird discovers that an egg is not its own, it either throws the alien egg away or abandons the nest and builds a new nest elsewhere. In the Tapera cuckoo species, the parasitic female is specialized in mimicking the colour and pattern of the host eggs, which minimizes the probability of the eggs being abandoned and maximizes the reproductivity rate.
The proposed Adam-CS algorithm incorporates the metaheuristic algorithms with the stochastic gradient descent algorithm to enhance the performance for providing better classification accuracy. The breeding behavior of cuckoos is integrated with the step size parameter to enhance the accuracy of the classification process. The adaptive learning rate parameter effectively estimates the moments using a sparse gradient. The data classification is effective even using a single optimization algorithm, but to enhance the accuracy of the classification performance, the breeding parameter of the CS algorithm is integrated with the gradient function. Hence, the proposed Adam-CS algorithm effectively performs the data classification process and increases the accuracy rate of the classification.
Levy Flight: Furthermore, the timing of egg-laying of some species is also remarkable. Parasitic cuckoos select a nest where the host bird has just laid its own eggs. Moreover, the cuckoo eggs hatch earlier than the host eggs. Once the first cuckoo chick is hatched, it pushes the host eggs out of the nest, which increases the cuckoo chick's share of the food. In general, animals search for food in a random or quasi-random manner in nature. Moreover, the foraging behavior of the animals is characterized using the features of the Levy flight.
Cuckoo Search: The cuckoo search model uses the following rules for effectively describing the proposed algorithm.
• Each cuckoo lays a single egg at a time and places it in a randomly selected nest.
• The nests with better solutions (high-quality eggs) are carried over to the subsequent generations.
• The host species either abandons the nest or throws the alien eggs out of the nest and builds a new nest at some other location.
The algorithmic steps involved in the proposed algorithm are discussed as follows:
Population Initialization: The objective function of the cuckoo species is defined as $F(P)$; $P = (P_1, \ldots, P_v)^M$, and the initial population of the k host nests is specified as $P_z$ $(z = 1, 2, \ldots, k)$. The new solution for the cuckoo species is computed as,
$$P_z^{u+1} = P_z^{u} + \beta \oplus \mathrm{Levy}(\gamma) \tag{22}$$
where $P_z^{u+1}$ denotes the new solution, z represents the cuckoo, ⊕ denotes entry-wise multiplication, Levy(γ) denotes the Levy flight, and β > 0 indicates the step size.
Fitness Evaluation: The fitness is evaluated by considering the cuckoo egg: when the cuckoo's egg is similar to the host egg, the cuckoo's egg is less likely to be discovered. The solution with the minimum error value is taken as the best fitness solution, where $e_w$ is the output of the classifier, and $q_t$ is the estimated output.
Update the Nest: The update equation of the Adam optimization algorithm, considering the step size β, is expressed as,
$$\phi_u = \phi_{u-1} - \frac{\beta\, \hat{f}_u}{\sqrt{\hat{s}_u} + K} \tag{24}$$
where $\hat{f}_u$ denotes the moving average of the gradient, $\hat{s}_u$ denotes the moving average of the squared gradient, and β denotes the step size. Let us assume u = u + 1 and u − 1 = u. Substituting these indices in the above equation results in,
$$\phi_{u+1} = \phi_u - \frac{\beta\, \hat{f}_{u+1}}{\sqrt{\hat{s}_{u+1}} + K} \tag{25}$$
Rearranging for the step size gives,
$$\beta = \frac{(\phi_u - \phi_{u+1})\left(\sqrt{\hat{s}_{u+1}} + K\right)}{\hat{f}_{u+1}} \tag{26}$$
By substituting the term β of Eq. (26) in Eq. (22), and assuming ϕ = P, the update equation of the proposed algorithm, Eq. (30), is obtained. Here, the terms $\hat{f}_u$ and $\hat{s}_u$ are the bias-corrected estimates of the moving average gradient $f_u$ and the squared gradient $s_u$ used in Adam [28], with $K = 10^{-8}$.

Eq. (30) is the update equation of the proposed algorithm.
Here, $\hat{f}_{u+1}$ denotes the moving average of the gradient, $\hat{s}_{u+1}$ denotes the squared gradient, and $P_z^{u+1}$ specifies the newly updated solution of the cuckoo species.
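For reference, the Adam update that the derivation starts from can be sketched as follows. This is the standard Adam rule from [28], not the combined Adam-CS rule of Eq. (30); the decay rates `eta1` and `eta2`, the step size value, the toy quadratic objective, and the function name are assumptions for illustration, while `K = 1e-8` follows the text.

```python
import math

def adam_minimize(grad, phi, beta=0.1, eta1=0.9, eta2=0.999, K=1e-8, steps=500):
    """Standard Adam: phi_u = phi_{u-1} - beta * f_hat / (sqrt(s_hat) + K)."""
    f = [0.0] * len(phi)  # moving average of the gradient (first moment)
    s = [0.0] * len(phi)  # moving average of the squared gradient (second moment)
    for u in range(1, steps + 1):
        g = grad(phi)
        for i in range(len(phi)):
            f[i] = eta1 * f[i] + (1 - eta1) * g[i]
            s[i] = eta2 * s[i] + (1 - eta2) * g[i] ** 2
            f_hat = f[i] / (1 - eta1 ** u)  # bias-corrected first moment
            s_hat = s[i] / (1 - eta2 ** u)  # bias-corrected second moment
            phi[i] -= beta * f_hat / (math.sqrt(s_hat) + K)
    return phi

# Toy objective: sum((phi_i - target_i)^2), whose gradient is 2 * (phi_i - target_i).
target = [1.0, -2.0]
result = adam_minimize(lambda p: [2 * (a - t) for a, t in zip(p, target)], [0.0, 0.0])
```

Dividing by the square root of the second moment makes the effective step size roughly independent of the gradient scale, which is the adaptive learning rate property referred to above.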
Determine the Feasibility: The feasibility of the solution is determined to generate the best optimal solution. The fitness of the new solution is compared with the fitness of the previous solution. When the new solution has a better fitness (a lower error) than the old solution, the old solution is replaced with the new solution as the optimal solution.
Termination: The above steps are repeated until the optimal solution is attained. Algorithm 1 shows the pseudo-code of the proposed Adam-CS algorithm for data classification.
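The steps above (initialization, Levy-flight update, fitness evaluation, feasibility check, termination) can be sketched as a plain cuckoo search loop. This sketch uses the basic update of Eq. (22) with a fixed step size rather than the Adam-derived step of the proposed algorithm; the Mantegna method for drawing Levy steps, the abandonment fraction, the iteration budget, the mean squared error fitness, and the toy linear model are all assumptions for illustration.

```python
import math
import random

def levy_step(gamma, rng):
    """Draw one Levy-distributed step with Mantegna's method (e.g. gamma = 1.5)."""
    num = math.gamma(1 + gamma) * math.sin(math.pi * gamma / 2)
    den = math.gamma((1 + gamma) / 2) * gamma * 2 ** ((gamma - 1) / 2)
    sigma = (num / den) ** (1 / gamma)
    u = rng.gauss(0, sigma)
    v = rng.gauss(0, 1)
    return u / abs(v) ** (1 / gamma)

def mse(weights, samples):
    """Fitness: mean squared error between the model output and the target output."""
    return sum((sum(w * a for w, a in zip(weights, x)) - t) ** 2
               for x, t in samples) / len(samples)

def cuckoo_search(fitness, dim, nests=15, beta=0.05, gamma=1.5,
                  abandon=0.25, iters=300, seed=0):
    rng = random.Random(seed)
    P = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(nests)]
    fit = [fitness(p) for p in P]
    for _ in range(iters):
        for z in range(nests):
            # New solution: P_z^{u+1} = P_z^u + beta (entry-wise) Levy(gamma)
            cand = [p + beta * levy_step(gamma, rng) for p in P[z]]
            c_fit = fitness(cand)
            j = rng.randrange(nests)   # the egg is placed in a randomly chosen nest
            if c_fit < fit[j]:         # lower error means better fitness
                P[j], fit[j] = cand, c_fit
        # Host birds abandon a fraction of the worst nests and rebuild elsewhere.
        order = sorted(range(nests), key=lambda i: fit[i], reverse=True)
        for i in order[:int(abandon * nests)]:
            P[i] = [rng.uniform(-1, 1) for _ in range(dim)]
            fit[i] = fitness(P[i])
    best = min(range(nests), key=lambda i: fit[i])
    return P[best], fit[best]

# Toy task: fit two weights of a linear model to three (input, target) pairs.
samples = [([1.0, 0.0], 0.5), ([0.0, 1.0], -0.3), ([1.0, 1.0], 0.2)]
best_weights, best_error = cuckoo_search(lambda w: mse(w, samples), dim=2)
```

In the proposed method the candidate weights would be those of the MLP layer and the fitness would be the classifier error; the heavy-tailed Levy steps allow occasional long jumps that help the search leave local minima.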

III. RESULTS AND DISCUSSION
This section describes results and discussion obtained using the proposed Adam-CS algorithm along with the performance analysis. The performance of the classification model is evaluated and the analysis is carried out using the metrics, like accuracy, sensitivity, and specificity, respectively.

A. EXPERIMENTAL SETUP
The proposed algorithm is implemented in MATLAB using the Breast Cancer Data Set, Diabetes Data Set, Heart Disease Data Set, Lung Cancer Data Set, and Cardiotocography Data Set. The breast cancer data set contains 201 instances of one class and 85 instances of the other class. Here, the instances are specified using nine attributes. The diabetes data set contains numerous files, and each file holds four fields. The heart disease data set consists of 76 attributes, where the goal field indicates the presence of heart disease. The lung cancer dataset describes three different pathological types of lung cancer, where the attributes contain a class label. The Cardiotocography Data Set contains 2126 instances, 23 attributes, and 162219 web hits. This is a multivariate dataset with real attribute characteristics.

B. EVALUATION METRICS
The performance of the proposed algorithm is evaluated using the metrics, such as accuracy, sensitivity, and specificity.
Accuracy: It is defined as the ratio of the true measures to the total number of observations, which is expressed as,
$$D = \frac{I + J}{I + J + U + V} \tag{34}$$
where I denotes the true positive, J indicates the true negative, U denotes the false positive, V denotes the false negative, and D specifies the accuracy.
Sensitivity: It is defined as the ratio of the true positives to the total number of actual positive observations, which is represented as,
$$H = \frac{I}{I + V} \tag{35}$$
where V denotes the false negative, and H is the sensitivity.
Specificity: It is defined as the ratio of the true negatives to the total number of actual negative observations, which is expressed as,
$$L = \frac{J}{J + U} \tag{36}$$
where U denotes the false positive, and L is the specificity.
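The three metrics follow directly from the confusion counts, as the short sketch below illustrates. The variable names mirror the symbols used above (I, J, U, V for the counts and D, H, L for the metrics); the helper names and the toy labels are hypothetical.

```python
def classification_metrics(I, J, U, V):
    """I = true positives, J = true negatives, U = false positives, V = false negatives."""
    accuracy = (I + J) / (I + J + U + V)
    sensitivity = I / (I + V)
    specificity = J / (J + U)
    return accuracy, sensitivity, specificity

def confusion_counts(actual, predicted, positive=1):
    """Count the four outcomes of a binary classifier against the true labels."""
    pairs = list(zip(actual, predicted))
    I = sum(a == positive and p == positive for a, p in pairs)
    J = sum(a != positive and p != positive for a, p in pairs)
    U = sum(a != positive and p == positive for a, p in pairs)
    V = sum(a == positive and p != positive for a, p in pairs)
    return I, J, U, V

actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]
D, H, L = classification_metrics(*confusion_counts(actual, predicted))
```

On an imbalanced dataset, accuracy alone can look high while sensitivity or specificity is poor, which is why all three are reported.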

C. COMPARATIVE METHODS
The performance of the proposed Adam-CS based DBN is analyzed and is compared with the existing methods, such as Deep Artificial Neural network (Deep ANN) [31], Convolutional Recurrent Neural Network (Convolutional RNN) [20], Convolutional Neural Network (CNN) [21], Firefly and Particle Swarm Optimization with Levenberg Marquardt Neural Network (FFPSO+LM NN) [39], Particle Swarm Optimization-Genetic Algorithm in Artificial Neural Network (PSO-GA ANN) [40], respectively.

D. COMPARATIVE ANALYSIS
This section presents the comparative analysis of the proposed data classification approach using the different datasets.
FIGURE 3. Comparative analysis using heart disease dataset, a) accuracy, b) sensitivity, c) specificity.
Figure 3 shows the comparative analysis of the data classification methods when the heart disease dataset is used.
Figure 4 shows the comparative analysis of the proposed algorithm using the breast cancer dataset. The proposed Adam-CS based DBN obtained a sensitivity of 0.8408. Figure 4 c) shows the analysis of specificity with respect to the training data. When training data = 50%, the specificity obtained by the proposed Adam-CS based DBN is 0.5764. Hence, the percentage of improvement reported by the proposed algorithm with respect to the existing methods, like Deep ANN, Convolutional RNN, CNN, FFPSO+LM NN, and PSO-GA ANN is 15%, 15%, 15%, 3.82%, and 4.37%, respectively. When training data = 60%, the specificity obtained by existing methods, like Deep ANN, Convolutional RNN, CNN, FFPSO+LM NN, and PSO-GA ANN is 0.5529, 0.60, 0.50, 0.6292, and 0.6195, respectively, whereas the proposed Adam-CS based DBN obtained a specificity of 0.5765.
Figure 5 shows the comparative analysis of the proposed algorithm using the lung cancer dataset.
Figure 6 shows the comparative analysis of the proposed algorithm using the diabetes dataset.
Figure 7 shows the comparative analysis of the proposed algorithm using the Cardiotocography Dataset.

6) ANALYSIS BASED ON SENSITIVITY PARAMETERS
This section discusses the analysis based on the sensitivity parameters of the proposed Adam-CS based DBN algorithm for the Cardiotocography Dataset. The sensitivity parameters considered are a) hidden neurons, b) epochs, and c) batch size.

a: ANALYSIS BASED ON VARYING THE HIDDEN NEURONS
The sensitivity analysis based on varying the hidden neurons is shown in Figure 8. Figure 8 a) shows the accuracy obtained by varying the hidden neurons, reported for the proposed Adam-CS based DBN algorithm at 60% training data.

b: ANALYSIS BASED ON VARYING THE EPOCHS
The sensitivity analysis based on varying the epochs is shown in Figure 9. Figure 9 a) shows the accuracy of the proposed system when the epochs are varied. For 50% training data, the proposed Adam-CS based DBN algorithm has an accuracy of 0.6715, 0.6892, 0.7031, and 0.7235 for 10, 20, 30, and 40 epochs, respectively. Figure 9 b) shows the sensitivity of the proposed Adam-CS based DBN algorithm when the epochs are varied. For 50% training data, the sensitivity is 0.7054, 0.7218, 0.7218, and 0.7518 for 10, 20, 30, and 40 epochs, respectively. Figure 9 c) shows the specificity of the proposed method when the epochs are varied. For 50% training data, the specificity of the proposed Adam-CS based DBN algorithm is 0.6315, 0.6673, 0.6683, and 0.6793 for 10, 20, 30, and 40 epochs, respectively.

c: ANALYSIS BASED ON VARYING THE BATCH SIZE
Figure 10 shows the sensitivity parameter analysis based on varying the batch size for the proposed Adam-CS based DBN algorithm. Figure 10 a) depicts the accuracy obtained by varying the batch size, reported for 70% training data. Figure 10 c) shows the specificity: for batch sizes of 50, 100, 150, and 200, the proposed Adam-CS based DBN algorithm has a specificity of 0.739, 0.7695, 0.7595, and 0.7026, respectively, for 70% training data.
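The reported sensitivity-parameter results can be tabulated to read off the best setting for each metric. The sketch below uses only the values stated above for the Cardiotocography Dataset (epochs at 50% training data, batch size at 70% training data); the dictionary layout and the helper name `best_setting` are illustrative assumptions.

```python
# Reported values for the proposed Adam-CS based DBN, keyed by parameter value.
EPOCH_RESULTS = {  # training data = 50%
    "accuracy": {10: 0.6715, 20: 0.6892, 30: 0.7031, 40: 0.7235},
    "sensitivity": {10: 0.7054, 20: 0.7218, 30: 0.7218, 40: 0.7518},
    "specificity": {10: 0.6315, 20: 0.6673, 30: 0.6683, 40: 0.6793},
}
BATCH_RESULTS = {  # training data = 70%
    "specificity": {50: 0.739, 100: 0.7695, 150: 0.7595, 200: 0.7026},
}


def best_setting(scores):
    """Return the parameter value with the highest score."""
    return max(scores, key=scores.get)


for metric, scores in EPOCH_RESULTS.items():
    print(f"epochs, {metric}: best at {best_setting(scores)}")
for metric, scores in BATCH_RESULTS.items():
    print(f"batch size, {metric}: best at {best_setting(scores)}")
```

On these values, all three metrics increase with the number of epochs up to 40, while specificity peaks at a batch size of 100 and declines for larger batches.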

7) COMPARATIVE DISCUSSION
This section compares the proposed Adam-CS based DBN algorithm for data classification with the existing methods, namely Deep ANN, Convolutional RNN, CNN, FFPSO+LM NN, and PSO-GA ANN. Table 2 presents the analysis on the various datasets in terms of accuracy, sensitivity, and specificity for varying training data. Across the five datasets, accuracy, sensitivity, and specificity reach their maximum values with the proposed algorithm when training data = 90%. The performance improvement of the proposed system is attributed to the optimization-driven training of the classifier, which raises the classification accuracy, improves the quality of the classifier parameters, and finds the optimal solution.

8) SENSITIVITY PARAMETERS DISCUSSION
This section discusses the performance of the proposed Adam-CS based DBN algorithm for data classification under the sensitivity parameters, namely hidden neurons, epochs, and batch size. Table 3 presents the analysis based on the accuracy, sensitivity, and specificity for the Cardiotocography Dataset. The maximum accuracy and sensitivity of the proposed Adam-CS based DBN algorithm occur when the hidden neurons are varied.

IV. CONCLUSION
An effective data classification method named Adam-Cuckoo search based Deep Belief Network (Adam-CS based DBN) is proposed to perform the classification process. At first, the input data is subjected to the pre-processing stage, where log transformation and missing value imputation are carried out to eliminate the noise and artifacts present in the data. Then, the wrapper-based approach is used to select suitable features using the heuristic function. The data classification is achieved using the Deep Belief Network classifier, in which the MLP layer is trained by the proposed Adam based Cuckoo search algorithm. The breeding behavior of cuckoos is integrated with the step size parameter to enhance the accuracy of the classification process. The key idea of the cuckoo search is that cuckoos perform a biased random walk with random step sizes. The proposed Adam-CS based DBN attained better performance with respect to the metrics of accuracy, specificity, and sensitivity, with 90% of training data. In the future, the classification accuracy of the data classification approach will be enhanced using other optimization algorithms.
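The random-walk behavior summarized above is commonly realized in cuckoo search with Lévy-distributed step sizes. The following is a minimal sketch of that generic step, not the Adam-CS update rule of this work: it assumes Mantegna's algorithm for drawing the step, a step scaled by the distance to the current best solution, and hypothetical names and default values (`alpha = 0.01`, `beta = 1.5`).

```python
import math
import random


def mantegna_sigma(beta):
    """Standard deviation of the numerator Gaussian in Mantegna's algorithm."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    return (num / den) ** (1 / beta)


def levy_step(beta, rng):
    """Draw one heavy-tailed step length (approximately Levy-stable)."""
    u = rng.gauss(0.0, mantegna_sigma(beta))
    v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)


def cuckoo_move(position, best, alpha=0.01, beta=1.5, rng=None):
    """Move a candidate solution by a random walk biased by the best solution.

    Each coordinate takes its own random step, scaled by alpha and by the
    distance to the best solution (an assumed, commonly used variant).
    """
    if rng is None:
        rng = random.Random()
    return [
        x + alpha * levy_step(beta, rng) * (x - b)
        for x, b in zip(position, best)
    ]


# Hypothetical two-dimensional weights and best-so-far solution.
rng = random.Random(0)
print(cuckoo_move([1.0, 2.0], [0.5, 0.5], rng=rng))
```

Most steps drawn this way are small, with occasional large jumps, which is what lets the search refine a solution locally while still escaping poor regions. A candidate that already equals the best solution is left unchanged by this variant.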
HONG LI received the M.Sc. degree in mathematics and the Ph.D. degree in pattern recognition and intelligence control from the Huazhong University of Science and Technology, Wuhan, China, in 1986 and 1999, respectively. She is currently a Professor with the School of Mathematics and Statistics, Huazhong University of Science and Technology. Her research interests include approximation theory, wavelet analysis, learning theory, neural networks, signal processing, and pattern recognition.
HEMN BARZAN ABDALLA received the Ph.D. degree in communication and information engineering. He worked as a Project Assistant in various higher education places. He is currently working as a Lecturer with Wenzhou-Kean University and is a member of the Institute of Training and Development in Sulaimani (KRG). He possesses one decade of experience in teaching. He has more than 100 project systems for several places. His research interests include big data, data security, NoSQL, and application. He is an editorial board member/reviewer of international/national journals and conferences.