A Fast Accurate Deep Learning Framework for Prediction of All Cancer Types

Cancer has one of the highest mortality rates in the world: one in every six deaths worldwide is due to cancer. Scientists have used both machine learning (ML) and deep learning (DL) to predict cancer. In addition, DL can analyze a huge amount of healthcare data in a short period of time to study the chances of recurrence, progression and patient survival. This study presents an accurate and fast framework for improving cancer prognosis prediction. Because a fast and accurate optimizer is necessary to predict both critical and non-critical cases, a modified binary version of the Whale Optimization Algorithm (WOA) is proposed. Based on sigmoid transfer functions, this version identifies the minimal feature subset that maximizes classification accuracy. The framework is composed of a Long Short-Term Memory (LSTM) neural network with optimized parameters, whose input is the optimal feature set produced by the feature selection layer. The proposed framework performs better than previous frameworks, with an average accuracy of 100% and an execution time of 4113 seconds.


I. INTRODUCTION
In 2020, cancer was the second most deadly illness, responsible for one in six deaths, according to the International Agency for Research on Cancer, which estimated 19.3 million new cases and 10 million deaths [1], [2]. Breast, lung, prostate, colon, ovarian and cervical cancers are the most prevalent forms; they weaken the immune system and alter other biological processes, which is why this illness causes increasing concern.
Disease detection entails classifying tumor types and identifying cancer symptoms in order to train a machine capable of detecting new metastatic tumor forms or diagnosing the disease early, before treatment becomes more difficult. Several earlier studies have proposed frameworks for predicting cancer prognosis, recurrence risk, progression, and patient survival, as researchers have demonstrated that prediction accuracy contributes to the efficient treatment of patients [1], [3].
The associate editor coordinating the review of this manuscript and approving it for publication was Gustavo Callico.
Recent advances in Deoxyribonucleic Acid (DNA) fragmentation technologies have created a large amount of data, making genomics one of the first fields to generate large volumes of data [4]. Sequences do not convey ready-to-use information; however, they can be translated by a complex process that uses the sequence to generate protein. By matching the constructed genome sequence against previously recognized cancer genome sequences, the protein's expression can be assessed to determine whether it is malignant [5]. The gathering of genomic data has created various challenges in presenting a rational definition of cancer's genetic basis.
Moreover, one of the main difficulties in the treatment and prevention of diseases is that a person's gene expression data contain a vast number of features but relatively few samples [6]. The chances of a successful recovery increase with early diagnosis, so its study is crucial [7].
In a Deep Learning Architecture (DLA), features are extracted hierarchically with several degrees of nonlinearity. The H2O framework is a multi-layer neural network (NN) architecture designed for DL tasks [8]. It is possible to train DL models to represent original data in a useful way. Furthermore, they produce the greatest results for complex data [9].
Meta-heuristic algorithms can be categorized into two types: single-solution algorithms and population-based algorithms. A single-solution method uses only one candidate solution in the optimization process, which evolves and is updated over time. A population-based method, in contrast, begins the optimization process with a set of random search agents, each holding its own candidate solution to the optimization problem. By sharing information about the search area and collaborating with one another, the agents avoid stagnating in local optima and ensure that the global search goal is covered.
As a rule, metaheuristic algorithms are evaluated based on their ability to solve decision-making difficulties and their significant balance between exploring and exploiting [10], [11], [12], [13], [14], [15]. Exploitation refers to the capacity to find better alternatives to well-known answers. The exploration of search spaces means finding better-scoring places by employing metaheuristics.
A single optimization technique can solve both the Feature Selection (FS) problem and binary optimization problems. References [16], [17], [18], and [19] introduced several hybrid techniques that combine WOA with simulated annealing or with other optimizers such as the Genetic Algorithm (GA), Gray Wolf (GW) and Particle Swarm Optimizer (PSO), as well as a hybrid strategy combining the filter and wrapper approaches of FS.
In the FS problem, there is no guarantee that a better selection of features will be found. Furthermore, according to the No Free Lunch (NFL) theorem [20], no single optimizer is appropriate for solving all optimization problems.
For these reasons, there is still a need for a modern, accurate, and high-speed system for cancer prediction: non-critical cases need high accuracy in an appropriate time, while critical cases, where the chance of survival is lower, need high speed. This research therefore provides a fast and accurate DL framework to overcome these problems.
The rest of this paper is organized as follows: Section 2 describes the motivations for the study and its contributions. A literature review and a brief overview of the LSTM and Modified WOA (MWOA) are presented in Section 3. The proposed BMWOA-S and its designed frameworks are described in Section 4. Section 5 discusses the experimental results. Section 6 concludes the paper and discusses future work.

II. MOTIVATIONS & CONTRIBUTIONS OF THE STUDY
The motivations for this study are as follows: 1) To propose a fast, accurate, and scalable DL H2O framework that uses big data to improve cancer prognosis prediction. 2) To provide a binary modified WOA optimizer that is highly accurate and fast, allowing FS to reduce the dataset size and also to tune the LSTM (number of layers and number of neurons per layer).
This study provides the following relevant contributions:
• The DL H2O framework can handle a lot of data in many forms. Using patient health data to predict cancer prognosis is useful since it incorporates multi-source data. The framework has high accuracy and speed.
• FS is a method for quickly picking the best features for NN training, possibly enhancing cancer prognostic prediction while also lowering the bulk of the input data to LSTM.
• BMWOA-S is compared to other popular optimizers for its benefits and efficiency.
• The framework reaches 100% accuracy on all six benchmark cancer datasets, which supports starting treatment earlier and thus a greater chance of a cure.
Based on the severity of the patient's case (critical vs non-critical patients), this study will help propose the most appropriate framework.

III. RELATED WORK
There has been considerable research into cancer diagnosis prediction using a variety of methodologies, with some showing high accuracy. Previous approaches have revealed the following findings: To enhance therapy and medicine discovery for diagnosis, several researchers employ Machine Learning (ML) classifiers such as k-nearest neighbour (KNN), logistic regression (LR), decision trees (DT), random forest (RF), and support vector machine (SVM).
In [21], such classifiers were applied to a cervical cancer dataset. In [22] and [23], they were applied to four and six different breast cancer datasets, respectively. In [24], two types of datasets were used to study colon cancer. Further study with larger data sets would help improve these models' performance.
In [27], S. Parisapogu et al. used a multi-layered DL algorithm on diverse microarray data to diagnose the type of illness. To categorise illness samples, their model must use many deep learning classification approaches on biological data sets.
N. G. El-Seddeq et al. introduced in [28] one of three newly introduced frameworks used to improve the performance of cancer prognosis prediction. With the exception of the lung and cervical cancer datasets, their proposed optimizer outperforms the other FS algorithms in fitness value on all datasets. In [29], various feature selection methods, such as correlation analysis and Fisher-ratio, are used to extract useful features and reduce dimensionality. Then, the Particle Swarm Optimization (PSO) algorithm is used to generate a pool of candidate base classifiers for learning the subsets re-sampled from the different feature subsets.
In [30], the classification of microarray data using artificial neural networks (ANNs) was suggested. More research with a larger data set would help improve the model's effectiveness.
In [31], cuckoo search (CS) is shown to be an effective algorithm for global optimization. Cuckoo search fundamentals and applications are reviewed along with the latest developments.
Liu et al. deployed in [32] ReliefF and PSO algorithms for selecting feature genes. The first step is to use ReliefF as a feature prefilter to eliminate genes that have a low correlation with the target class. Search is then carried out using PSO. For the final optimal subset of genes, the classification accuracy of SVM is used as the evaluation function.
According to [33], gene FS data should be integrated with cancer classification for gene expression, along with other types of genomic data. The use of other algorithms for parameter optimization can help with this model.
In [34], Principal Component Analysis (PCA) was used for dimension reduction with SVM and Levenberg-Marquardt Back Propagation (LMBP). Its deficiency is the excessive time the model spends in training; the choice of architecture must also still be made in a more organised manner.
Othman et al. showed in [35] that cuckoo search can be combined with evolutionary operators for gene selection as a hybrid multi-objective search. The evolutionary operators used are double mutation and single crossover. The aim is to improve the search capabilities and the values of the dimensions.
Saqib et al. proposed in [36] a hybrid approach, Multiple Filters and GA Wrapper for Feature Selection (MF-GARF), to ensure relevancy and remove irrelevant features. It consists of three phases. The first is a relevance block that uses Information Gain, Gain Ratio and Gini Index. The second phase removes redundancy among features using Pearson Correlation statistics. The third is an Optimization Block, which consists of a Genetic Algorithm wrapper with Random Forest as a fitness evaluator and yields a feature subset with high predictive power.
Cahyaningrum et al. in [37] used PCA with ANN and GA. This model needs better tuning of the parameters of the genetic algorithm, such as mutation rate, crossover rate, elitism, and fitness normalization.
A hybrid model (CNN-LSTM) is proposed in [38]. Using convolutional layers, it can extract features at multiple time scales. By a wide margin, this hybrid model outperforms other existing methods.
Finally, in [39] a classification approach based on deep feature fusion and selection using whale optimization techniques was proposed. A PCA is used to select the best features, which are serially merged to produce a vector of Nx2125 features. Furthermore, WOA was used to select informative features Nx1049 among Nx2125 features and provide them to SVM, KNN, and Wide Neural Network (WNN) classifiers.
We chose DL over other standard approaches because we anticipate dealing with large data sizes. Furthermore, the prediction time for cancer diagnosis is crucial since the patient's life depends on it, particularly in serious circumstances. DL is the best approach in this scenario, although it requires high-end infrastructure to train in a reasonable amount of time.
The next section describes the LSTM neural network used in our proposal and the Binary Modified Whale Optimization Algorithm (BMWOA-S).

IV. THE PROPOSED CANCER PREDICTION FRAMEWORK
Several biomedical studies have published deep learning frameworks that use different optimizers to select the best input features for the FFNN and to tune it (number of hidden layers and number of neurons per layer) [22], [23], [24], [25], [26], [27], [28], [29], [30], [31], [32], [33], [34], [35], [36], [37], [38], [39]. This idea has succeeded in applying deep learning to biomedical problems, especially with large datasets, which cover two important cases that occur in all diseases: the initial and the severe state of the disease. The initial state allows some time to treat the patient with different attempts and different medicines, whereas the severe state does not allow enough time for the prediction. Such systems therefore require both high prediction accuracy and high execution speed, which motivates the search for a fast and accurate optimizer. Our latest published research [28], on different types of cancer, obtained 100% accuracy in most cases at a reasonable processing time; this encouraged us to search for an optimizer that reaches 100% accuracy on all types of cancer in less processing time. Accordingly, this research proposes a Fast Accurate Deep Learning (FADL) framework, in the hope that it can predict all types of cancer in a time that allows saving the patient's life, especially in severe cases. The framework is divided into four layers. As shown in Figure 1, these layers are pre-processing, feature selection, deep learning, and prediction (classification).

A. PRE-PROCESSING LAYER
Medical data sets collected from several sources are often incomplete and riddled with errors that lead to misclassification. Certain machine learning algorithms are impaired by features of varying magnitudes. The first stage of the pre-processing layer is therefore normalization, which scales each feature's (column's) values to between 0 and 1.
Reducing data imbalance: As a second step of preprocessing, redundant columns, such as the ID column, are removed. Class imbalance is a classification problem in which some classes are significantly underrepresented; as a result, the classifier favours the majority class. The approach presented in [40], which over-samples the occurrences of the minority class, was employed to overcome the imbalanced dataset problem.
It is essential to remove incorrect values before separating the dataset into training and testing subsets. The performance of this layer is measured using six benchmark datasets. The first is for breast cancer. The second and third are for lung and prostate cancers, respectively, the fourth is for colon cancer, the fifth is for ovarian cancer, and the sixth is for cervical cancer. Finally, in the third step of preprocessing, the labels for each element in the dataset are converted from textual values to numerical values. Each dataset has two classes (Malignant and Benign), which are transformed to the values 0 and 1, respectively.
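The normalization and label-encoding steps of this layer can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names `min_max_scale` and `encode_labels` and the toy rows are hypothetical, while the mapping of Malignant to 0 and Benign to 1 follows the text.

```python
def min_max_scale(rows):
    """Scale every column of a list-of-lists to the range [0, 1]."""
    columns = list(zip(*rows))
    scaled_columns = []
    for col in columns:
        lo, hi = min(col), max(col)
        span = hi - lo
        # A constant column carries no information; map it to 0.0.
        scaled_columns.append([(v - lo) / span if span else 0.0 for v in col])
    return [list(r) for r in zip(*scaled_columns)]


LABEL_MAP = {"Malignant": 0, "Benign": 1}  # mapping stated in the text


def encode_labels(labels):
    """Convert textual class labels to their numerical values."""
    return [LABEL_MAP[label] for label in labels]


# Toy data (hypothetical): three records with two features each.
features = [[1.0, 200.0], [3.0, 400.0], [5.0, 300.0]]
scaled = min_max_scale(features)
encoded = encode_labels(["Malignant", "Benign", "Benign"])
```

Scaling is done per column so that features with large magnitudes do not dominate features with small ones.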

B. THE FS LAYER
To address the FS issue, BMWOA-S is proposed in this layer as a binary variant of WOA. A dataset containing N features has 2^N possible feature subsets, so a large space of subsets must be explored thoroughly for feature reduction. WOA [41] is a meta-heuristic algorithm that effectively employs two attacking strategies: bubble-net attacking and searching for prey. Many suggestions for improving the updating process have been tested, such as modifying the exploration equations; unfortunately, none of them yielded a fast, accurate optimizer. Therefore, the following paragraphs explain a new version of WOA, named MWOA, in detail.

a: ENCIRCLING PREY
Mathematically, this behaviour is described in terms of the iteration i, the coefficients A and C, and X*, the position vector of the best solution found so far.
The bubble-net attacking behaviors are:
1) Shrinking encircling mechanism: the humpback reduces the value of a.
2) Spiral updating mechanism: this uses the space between the whale and its prey, where D is the gap between the population's best solution and the present individual whale, w is a fixed value, and q is a value in [−1, 1].
The two mechanisms are combined into a single position-update formula.
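For reference, the standard WOA formulation [41] (restated here as an assumption about the equations this section relies on) uses $D = |C \cdot X^*(i) - X(i)|$ and $X(i+1) = X^*(i) - A \cdot D$ for encircling, with $A = 2a \cdot r - a$ and $C = 2r$, where r is random in [0, 1] and a decreases from 2 to 0 over the run. The spiral update is $X(i+1) = D' e^{wq} \cos(2\pi q) + X^*(i)$ with $D' = |X^*(i) - X(i)|$, and the two mechanisms are usually chosen with equal probability. A minimal sketch under those assumptions follows; the function name and the default w = 1 are illustrative, and only the exploitation step is shown.

```python
import math
import random


def woa_exploitation_step(x, x_best, a, w=1.0, rng=random):
    """One exploitation update of a single whale position x (list of floats).

    a is the control parameter (decreased from 2 to 0 over the run) and
    w is the spiral-shape constant, as in the standard WOA.
    """
    if rng.random() < 0.5:
        # Shrinking encircling: X(i+1) = X* - A * |C * X* - X|
        new_x = []
        for xj, bj in zip(x, x_best):
            A = 2.0 * a * rng.random() - a
            C = 2.0 * rng.random()
            new_x.append(bj - A * abs(C * bj - xj))
        return new_x
    # Spiral update: X(i+1) = |X* - X| * e^(w q) * cos(2 pi q) + X*
    q = rng.uniform(-1.0, 1.0)
    return [abs(bj - xj) * math.exp(w * q) * math.cos(2.0 * math.pi * q) + bj
            for xj, bj in zip(x, x_best)]


random.seed(0)
moved = woa_exploitation_step([0.2, 0.8], [0.5, 0.5], a=1.0)
```

As a approaches 0, the encircling branch collapses onto the best solution, which is what makes this phase exploitative.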

c: SEARCH FOR PREY (EXPLORATION PHASE)
A Gaussian equation is used to increase the likelihood of discovering the global minimum while avoiding becoming trapped in local minima. In the diffusion-limited aggregation growth process, a randomness mechanism based on the Gaussian walk is used to produce new particles. A sequence of diffusion operations can then be computed around the improved solution, and as the iterations proceed, a better alternative becomes available. Employing this diffusion approach enhances the search-for-prey capability of the proposed WOA in discovering an ideal solution [42].
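The diffusion idea can be illustrated with a small sketch of Gaussian random walks around the current best solution. The exact diffusion equation is not reproduced in this section, so the step-size rule `sigma0 / iteration`, the function names, and the sphere test function are all assumptions made for illustration only.

```python
import random


def gaussian_diffusion(best, iteration, n_walks=5, sigma0=1.0, rng=random):
    """Generate n_walks candidate positions by Gaussian walks around `best`.

    The step size sigma0 / iteration is an illustrative choice (an assumption,
    not taken from the paper): walks become more local as iterations advance.
    """
    sigma = sigma0 / max(1, iteration)
    return [[rng.gauss(bj, sigma) for bj in best] for _ in range(n_walks)]


def best_of(candidates, fitness):
    """Keep the candidate with the lowest fitness (minimization)."""
    return min(candidates, key=fitness)


def sphere(x):
    """Toy objective (hypothetical): squared distance from the origin."""
    return sum(v * v for v in x)


random.seed(1)
start = [0.5, -0.5]
walks = gaussian_diffusion(start, iteration=10)
# Keeping the starting point in the pool guarantees the result is never worse.
improved = best_of(walks + [start], sphere)
```

Retaining the current best among the candidates is a common elitist choice; it ensures the diffusion step cannot degrade the solution.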
The MWOA is an updated version of WOA (see Figure 2). To resolve the exploration disadvantage of WOA, an advanced technique replaces the search technique of the exploration group: the diffusion process generates a list of random walks about the optimal solution, which enhances the exploration potential of WOA in finding the optimal solution. In other words, MWOA enhances exploration performance by applying the diffusion process instead of the search-for-prey step of WOA. This lets individuals explore a promising location in the search area further and so avoid local stagnation.
The continuous MWOA employs Eq. 2 to move search agents across the search space, where each agent can adjust its location to any position; this is referred to as a continuous space. The FS problem, however, handles only binary data [43], so the continuous form of MWOA cannot be employed unchanged to solve it. As a result, we propose the BMWOA-S variant, which is suitable for solving the FS problem. In BMWOA-S, candidate solutions take only the binary values {0, 1}: a feature with a value of 0 is not picked, whereas a feature with a value of 1 is selected.
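The binary encoding can be sketched as follows, assuming the basic S-shaped transfer function $S(x) = 1/(1 + e^{-x})$ and the probabilistic thresholding rule introduced for binary PSO by Kennedy and Eberhart. The function names are hypothetical, and the other members of the transfer-function family are not shown.

```python
import math
import random


def s_shaped(x):
    """Basic S-shaped (sigmoid) transfer function: maps a real value to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def binarize(position, rng=random):
    """Turn a continuous position into a 0/1 feature mask.

    Each bit is set to 1 with probability s_shaped(x).
    """
    return [1 if rng.random() < s_shaped(x) else 0 for x in position]


def select_features(row, mask):
    """Keep only the features whose mask bit is 1."""
    return [value for value, bit in zip(row, mask) if bit == 1]


random.seed(0)
mask = binarize([-50.0, 50.0, 0.0])
reduced = select_features([10, 20, 30], [1, 0, 1])
```

Strongly negative components are almost never selected and strongly positive ones almost always are, while components near zero are selected about half the time.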
To change the MWOA solutions from continuous to binary, we first scale the data to the interval [0, 1]. According to a prior study, the translation is accomplished with an S-shaped (sigmoid) transfer function (TF), whose family consists of S1, S2, S3, and S4 (Table 1). To move in a binary space, the components of the position vectors are converted to 0 or 1. Table 1 contains the formulas for each TF, and Figure 3 depicts the S-shaped curves. The stages involved in BMWOA are depicted in Algorithm 1. The scaled continuous data are converted to binary values by applying the formula provided by Kennedy and Eberhart [44].
The LSTM neural network is a type of recurrent neural network (RNN). The RNN architecture is trained on sequential information that travels from the input vector through the network to the output neurons. Errors are calculated and propagated backward to update the parameters of the network. This type of network incorporates loops of information in its hidden layer. With these loops, information can flow in multiple directions, so the hidden state at a given time step represents past data. Consequently, every output depends on the predictions already made. There are, however, limits to the number of steps an RNN can bridge. Vanishing gradients are the main reason RNNs capture only short-term dependencies: the information from earlier steps decays over time, because the gradient of the loss function approaches zero as the number of layers containing activation functions increases.
LSTM neural networks (LSTM-NNs) enable learning long-term dependencies. By introducing a memory unit and a gate mechanism, LSTM can capture sequences with long dependencies. LSTM-NNs can therefore selectively remember or forget information and learn over thousands of time steps by using three gates and a cell state.
Medical journals have published practical applications of LSTM. A variety of variants of RNN have been used for classification and prediction purposes in studies. According to one study, longitudinal medical records with irregular time

Algorithm 1 (fragment), step 18: Select a random search agent ($X_{rand}$).
As Figure 4 shows, an LSTM block extends the memory of an RNN by using structures such as a cell state and three gates that allow it to selectively remember and forget information. An LSTM block has four layers in addition to the hidden state of an RNN: the cell state ($C_t$), input gate ($i_t$), output gate ($O_t$), and forget gate ($f_t$). To generate knowledge from the training data, each layer interacts with the others in a unique way.
A sigmoid activation function (shown in Tab. 2) is implemented in the forget gate. For the input and output gates, however, a combination of sigmoid and hyperbolic tangent (TanH) functions is used to provide the necessary information to the cell state. In an LSTM neural network, as shown in Eq. 15, the information generated by each block flows through the cell state from block to block. The forget gate has the form

$$f_t = \sigma\left(W^{(f)}\,[h_{t-1}, p_t] + b^{(f)}\right)$$

where $\sigma$ is the sigmoid activation function, and $W^{(f)}$ and $b^{(f)}$ are the weight matrix and bias vector, which are learned from the input training data. The function takes the old output ($h_{t-1}$) at time t−1 and the current input ($p_t$) at time t to calculate the components that control the cell state and hidden state of the layer. The results lie in [0, 1], where 1 represents "completely hold this" and 0 represents "completely throw this away".
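The forget-gate computation can be made concrete with a small sketch, assuming the standard form $f_t = \sigma(W^{(f)}[h_{t-1}, p_t] + b^{(f)})$ followed by an element-wise scaling of the previous cell state. The function names and toy weights are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forget_gate(h_prev, p_t, W_f, b_f):
    """f_t = sigmoid(W_f . [h_prev, p_t] + b_f), computed row by row."""
    concat = h_prev + p_t  # concatenate previous output and current input
    return [sigmoid(sum(w * v for w, v in zip(row, concat)) + b)
            for row, b in zip(W_f, b_f)]


def apply_forget(c_prev, f_t):
    """Scale each cell-state entry by its gate value (1 keeps, 0 discards)."""
    return [f * c for f, c in zip(f_t, c_prev)]


# Toy values (hypothetical): 2 hidden units, 1 input feature.
W_f = [[0.0, 0.0, 0.0], [10.0, 10.0, 10.0]]
b_f = [0.0, 0.0]
f_t = forget_gate([1.0, 1.0], [1.0], W_f, b_f)
kept = apply_forget([4.0, 4.0], f_t)
```

With these weights the first unit halves its stored value while the second keeps almost all of it, matching the "hold" versus "throw away" reading of the gate output.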
LSTM networks, like RNNs, produce outputs $\hat{y}_t$ that are used to train the network via gradient descent at each time step. Iteratively, the parameters of the network are updated during the backward pass. A minor modification to the algorithm is the only fundamental difference between the RNN and LSTM back-propagation algorithms. At each time step, the calculated error term is $E_t = -y_t \log \hat{y}_t$. As in an RNN, the total error is the sum of the errors from all time steps, $E = \sum_t -y_t \log \hat{y}_t$.
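As a small worked example of this error term, the sketch below evaluates $E_t = -y_t \log \hat{y}_t$ per time step and sums over the sequence. The function names and the toy targets and predictions are hypothetical.

```python
import math


def step_error(y_t, y_hat_t):
    """E_t = -y_t * log(y_hat_t) for one time step."""
    return -y_t * math.log(y_hat_t)


def total_error(targets, predictions):
    """E = sum over time steps of E_t."""
    return sum(step_error(y, p) for y, p in zip(targets, predictions))


# Only steps whose target is 1 contribute to the sum.
E = total_error([1.0, 0.0, 1.0], [0.5, 0.9, 1.0])
```

Here only the first step contributes, so the total equals $-\log 0.5 = \log 2$.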
LSTM is a very powerful ANN architecture for disease subtype analysis, music generation, text generation, handwriting recognition, language translation, time series analysis, and image captioning. As information flows through the cell state, LSTM makes effective predictions because it gives equal attention to all input sequences. Owing to this mechanism, a small change in the input sequence does not affect the LSTM's prediction accuracy.
The data flow into one neuron i is an aggregate over the m neurons of the previous layer, where $w_{ij}$ is the weight of the connection to neuron i in the present layer from neuron j of the previous layer, $x_{ij}$ is the corresponding input data, and $c_{i0}$ is the built-in threshold of neuron i, which is treated as a standard weight.
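A minimal sketch of this aggregation, assuming the usual weighted-sum form $\sum_{j=1}^{m} w_{ij} x_{ij} + c_{i0}$, is given below. The function names and toy values are hypothetical; TanH is used because it is the activation function the framework reports.

```python
import math


def neuron_input(weights, inputs, threshold):
    """Aggregate input to neuron i: sum_j w_ij * x_ij + c_i0."""
    return sum(w * x for w, x in zip(weights, inputs)) + threshold


def neuron_output(weights, inputs, threshold):
    """Apply TanH to the aggregated input."""
    return math.tanh(neuron_input(weights, inputs, threshold))


# Toy values (hypothetical): three neurons in the previous layer.
s = neuron_input([0.5, -1.0, 2.0], [2.0, 1.0, 0.5], threshold=-1.0)
out = neuron_output([0.5, -1.0, 2.0], [2.0, 1.0, 0.5], threshold=-1.0)
```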
For the FFNN weights to represent the relationship between input vectors and desired output vectors correctly, they must be identified accurately. Training the neural network minimizes E, the total mean sum-squared error between the actual and desired outputs, where $d_z$ is the desired state, and z and g denote the z-th training set and the g-th component of the output vector.
This layer represents the LSTM with the best settings found by the proposed optimizer. The LSTM is trained on the selected subset of features with the following structure parameters: the number of layers, the number of neurons in the hidden layer, the biases, and the activation function are 3, 10, random, and TanH, respectively (shown in Table 3). As for the initial weights, H2O initializes them such that the default initial_weight_distribution and initial_weight_scale parameters are uniform adaptive and one, respectively (shown in Table 4). Moreover, the training parameters learning rule and sum-squared error are Levenberg-Marquardt and 0.01, respectively. The LSTM is then trained on the features and tested using the validation data. Subsequently, the error rate used to measure the fitness value is computed. This is repeated for all iterations and all solutions in the population.
Furthermore, the proposed BMWOA-S, binary WOA (BWOA), binary GWO (BGWO), binary PSO (BPSO), and binary GA (BGA) algorithms are examined in this layer. Each optimizer generates its best solution, which is verified using the test data after the optimization process. During this final testing, various metrics were collected for comparison. BMWOA-S uses the training and validation data portions during the optimization process and the testing data after optimization. We therefore ensure that every optimizer examines the same data set portions in every iteration. In this manner, a fair comparison is obtained.

D. THE PREDICTION LAYER
The optimizer suggested in the earlier layers is employed to resolve a problem requiring two different functions: exploration and exploitation. The algorithm of the fast accurate framework for predicting all types of cancer is shown above. Moreover, Figure 5 illustrates the four layers of the framework explained above.

V. EXPERIMENTAL RESULTS AND DISCUSSION
We conducted four experiments. The first evaluated the performance of the proposed optimizer, while the rest examined the performance of the three frameworks. All experiments were conducted on an Intel(R) Core(TM) i7 2.90 GHz processor with 32 GB of RAM and an NVIDIA Quadro M2000M GPU. The four experiments were conducted on the datasets described below.

A. DATASETS DESCRIPTION
Six benchmark datasets, each with two classes (Benign and Malignant), were used to evaluate the framework's performance.
• Breast Cancer (No. 1): obtained from the UCI ML Repository [46], [22], it has 699 records, each with 10 features deduced from digital medical images of the breast.
As summarized in Table 5, the selected datasets include diverse records, features and classes that can be handled by the proposed framework. The records in every dataset were randomly partitioned into three subsets, training, testing and validation, with ratios of 80%, 5% and 15%, respectively.
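The random 80% / 5% / 15% partition can be sketched as below. The function name, the fixed seed, and the use of integer record indices are assumptions for illustration; the 699-record count is that of the breast cancer dataset.

```python
import random


def split_dataset(records, train=0.80, test=0.05, seed=42):
    """Shuffle and partition records into train / test / validation lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_test = int(len(shuffled) * test)
    train_part = shuffled[:n_train]
    test_part = shuffled[n_train:n_train + n_test]
    valid_part = shuffled[n_train + n_test:]  # remainder, about 15%
    return train_part, test_part, valid_part


# Record indices stand in for the 699 breast cancer records.
train_set, test_set, valid_set = split_dataset(range(699))
```

Giving the validation subset the remainder ensures that every record lands in exactly one subset even when the ratios do not divide the record count evenly.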

C. PERFORMANCE ANALYSIS OF THE PROPOSED BMWOA-S

1) EXPERIMENT NUMBER 1
The first experiment is based on the second layer of the DL H2O framework. To validate its performance, the proposed BMWOA-S was evaluated against the BWOA, BGWO, BPSO, and BGA algorithms. The mean error, mean fitness value, mean selection size, and mean standard deviation of the algorithms were compared. Table 6 shows the configuration values. Tables 7-10 describe the outcomes of this investigation and provide the aggregate findings for all optimizers across the six data sets. BMWOA-S had the lowest values for mean error, mean selection size, and standard deviation. In other words, BMWOA-S beat the other FS algorithms in fitness value on all datasets except the cervical dataset, which has relatively few features, demonstrating that BMWOA-S can select the best subset of features with the least error.

2) EXPERIMENT NUMBER 2
To evaluate the proposed optimizer's performance, its accuracy is compared with the results obtained from the GA, PSO, GWO and WOA optimizers. From the following figures, which show the accuracy over 50 iterations for the different optimizers, it is evident that the proposed framework is superior.

D. FRAMEWORK'S PERFORMANCE ANALYSIS
In Experiments 3-4, the framework was compared in terms of confusion matrix values, precision, accuracy, recall, specificity, F1-score, computational time, logarithmic loss (logloss) and mean squared error (MSE) values.
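For readers less familiar with these measures, the sketch below derives them from binary confusion-matrix counts and from predicted probabilities using their standard definitions. The function names and the toy counts are hypothetical and are not results from this study.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Derive the reported metrics from binary confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # also called sensitivity
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall,
            "specificity": specificity, "accuracy": accuracy, "f1": f1}


def log_loss(labels, probabilities, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(labels, probabilities):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


def mean_squared_error(labels, probabilities):
    """Mean of squared differences between labels and probabilities."""
    return sum((y - p) ** 2 for y, p in zip(labels, probabilities)) / len(labels)


# Toy counts (hypothetical).
m = classification_metrics(tp=8, fp=2, tn=9, fn=1)
```

Because logloss and MSE are computed on predicted probabilities rather than on thresholded labels, they can be non-zero even when every thresholded prediction is correct.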

1) EXPERIMENT NUMBER 3 TESTS THE BEHAVIOR OF THE DL H2O FRAMEWORK
The DL H2O framework for cancer case classification in this experiment comprises four layers: pre-processing, FS, DL with FFNN optimization, and prediction. Its performance on the cancer-specific data sets is shown in Table 11. As the table indicates, the DL H2O framework obtained a mean accuracy of 100%, accuracy of 100%, recall of 100%, F1 score of 100%, specificity of 100%, an MSE of 0.085, and a log loss of 0.309 for all cancer data sets (see Figure 13).

2) EXPERIMENT NUMBER 4 WILCOXON'S RANK-SUM
Wilcoxon's rank-sum test was used to calculate p-values for the proposed DL framework. The outcome determines whether the proposal's outputs differ significantly from those of the other frameworks: frameworks with p-values below 0.05 differ significantly, whereas those with p-values above 0.05 do not. The test results are presented in Table 12 along with their p-values and average accuracy. A p-value below 0.05 confirms that the superiority of the suggested framework is statistically significant.

E. DISCUSSION
The BMWOA-S optimizer is used to determine the best collection of features to use as FFNN inputs, as well as the ideal number of layers and neurons. Table 13 shows the outcomes of the suggested framework (100% for the six cancer datasets used). Bold entries in the table mark the best outcomes. For the breast, prostate, colon and cervical cancer data sets, the proposed system outperforms the others. For the lung and ovarian cancer data sets, the suggested framework surpasses the frameworks of Xiongshi, Deng et al. [26], Wu and Wang et al. [33], Adiwijaya et al. [34], Saqib et al. [36], and Cahyaningrum et al. [37]. The proposed framework equals the result achieved in [28] but is faster.

F. DATA ANALYSIS
First, cancer is one of the diseases with the highest mortality rate: worldwide, one in every six deaths is due to cancer. In 2021, 2020, 2019, and 2018, ovarian, lung, prostate and cervical cancers, respectively, were the most common causes of death worldwide. As discussed earlier, early detection increases the chances of the patient receiving treatment. For cancer patients, accuracy and prediction time are the most important factors (see Table 14).
Second, a framework for predicting cancer with high accuracy and a short prediction time has been proposed. According to the results, we advise that the framework is suitable for non-critical patients because of its high accuracy, and for critical patients because of its shorter prediction time.
The modified optimizer performs the main step in the FS algorithm by selecting the optimal features as well as by adjusting the number of layers and neurons per layer. The accuracy of DL increases as inner layers are added, but so does the time. We therefore suggest a three-layer framework: the best choice is the one that takes the least time while retaining high precision.

VI. CONCLUSION AND FUTURE DIRECTIONS
Our goal in this paper is to achieve a balance between exploration and exploitation by proposing a DL cancer prediction framework. Exploration applies searching around individuals and mutation; exploitation applies moving towards and searching around the leader. BMWOA-S has been designed to give the most accurate and fastest prediction for all types of cancer. All trials on all cancer types resulted in 100% accuracy, achieved in less than 4113 seconds; this time is very important for critical cases. Because the proposed framework includes a control scheme, we were able to assure its stability alongside DL H2O's superior accuracy on the six standard data sets.
In the future, the proposed approach will be evaluated against other metaheuristic algorithms on other binary classification problems. The impact of increasing the difficulty level of the data sets used with DL is being analyzed. Tables 6-9 show that the suggested method has a mean error of 0.002, a selection size of 0.265, a fitness value of 0.0115, and a standard deviation of 0.0125, outperforming other algorithms in all datasets except the Cervical cancer dataset. As a result, we want to use the Broad Learning System (BLS) to enhance the outcome.

ZAINAB H. ALI received the B.Sc. degree in computer science and engineering from the Faculty of Electronic Engineering, Menofia University, the master's degree in telecommunication and computer networks, in 2016, and the Ph.D. degree from the Department of Computer Engineering and Control Systems, Faculty of Engineering, Mansoura University, in 2020. Since 2019, she has been an Instructor with Huawei Company. In 2021, she worked as a Lecturer with the Faculty of Graduate Studies for Statistical Research, Cairo University. She is a Lecturer with the Embedded Network Systems and Technology Department, Faculty of Artificial Intelligence, Kafrelsheikh University, Kafrelsheikh, Egypt. She has received many badges and certificates from Huawei, Cisco, and IBM in the fields of the Internet of Things, artificial intelligence, cloud computing, and predictive analysis. She has served as a Reviewer for many high-quality journals, including Journal of Artificial Intelligence university, Journal of Computer, Oxford Academic, and Journal of Supercomputing (Springer).
ALI I. ELDESOUKY received the M.A. and Ph.D. degrees from the University of Glasgow, U.K. He is currently a Full Professor with the Computers Engineering and Systems Department, Faculty of Engineering, Mansoura University, Egypt. He is also a Visiting Part-Time Professor with MET Academy. He teaches in American and Mansoura Universities and has taken over many positions of leadership and supervision of many scientific papers. He has published hundreds of articles in well-known international journals.