HDLNET: A Hybrid Deep Learning Network Model With Intelligent IOT for Detection and Classification of Chronic Kidney Disease

Over 10% of the world’s population now suffers from chronic kidney disease (CKD), and millions die from it every year. CKD should be detected early to extend the lives of those suffering and lower the cost of therapy. Building a multimedia-driven model is necessary to detect the illness effectively and accurately before it worsens. It is challenging for doctors to identify the various conditions connected to CKD early enough to prevent its progression. This study introduces a novel hybrid deep learning network model (HDLNet) for early detection and prediction of CKD. A deep learning technique called the Deep Separable Convolutional Neural Network (DSCNN) is proposed in this research for the early detection of CKD. The Capsule Network (CapsNet) extracts further processing attributes from the characteristics chosen to indicate a kidney issue. The pertinent characteristics are selected using the Aquila Optimization (AO) algorithm to speed up the categorization process; the selected features improve classification effectiveness while needing less computational effort. The DSCNN classifier is optimized with the Sooty Tern Optimization Algorithm (STOA) to diagnose kidney illness as CKD or non-CKD. The model is then tested on the CKD dataset from the UCI machine learning repository. Accuracy, sensitivity, MCC, PPV, FPR, FNR, and specificity are the performance metrics for the suggested CKD classification approach. Experimental findings demonstrate that the suggested method produces a better categorization of CKD than present state-of-the-art methods.


I. INTRODUCTION
Kidney cancer is considered one of the most prevalent cancers and is regrettably challenging to identify in its early stages with standard clinical techniques [1]. Renal cancer research remains limited even though it is among the ten most lethal malignancies. Several cancer forms have dominated the attention of the medical world, which has slowed the development of novel methods for detection and therapy [2]. Life expectancy for kidney cancer patients has typically been estimated at less than one year, with limited treatment choices for decades. Therefore, automatic diagnostic tools can aid a physician in rapidly and readily identifying the condition and help patients survive [3], [4], [5]. Early kidney illness can be challenging to diagnose, so classification algorithms are frequently utilized in many automatic medical diagnostic instruments. Such tools can lower the testing burden. Chronic kidney disease (CKD) affects kidney structure and function [6]. Complications of prolonged illness may include blood-vessel or heart problems, weak bones, high blood pressure, anemia, nerve damage, etc. [7].
The kidneys' inability to perform their typical blood-filtering role and other critical tasks is known as chronic kidney disease (CKD). "Chronic" refers to the gradual degeneration of renal cells over an extended period [8], [9], [10]. CKD is a serious public health concern worldwide, particularly in low- to middle-income countries, where millions perish from a lack of access to healthcare [11], [12]. Heart disease can occur as a result of CKD, which is irreversible; the expected outcomes are kidney transplantation or ongoing dialysis. Earlier research provides evidence that early treatment and diagnosis of CKD can improve the patient's quality of life. As a result, it is crucial to identify and treat CKD as soon as it is detected, which helps patients stop the disease's progression [13], [14], [15].
At present, networks and medical sensors offer broad advances in the IoT field: intelligent medical care devices with specific identification numbers can be addressed and linked to receive and send vital medical digital media data, allowing severe conditions, such as threatening chronic diseases like CKD, to be detected earlier [16], [17]. Each critical healthcare parameter collected by IoT sensors may be examined by ML algorithms in a prediction model, a successful approach to earlier medical detection that effectively and accurately analyzes the patient's healthcare situation [18]. Comprehensive studies show that data mining techniques such as classification approaches are widely used as efficient tools in anomaly detection and disease prediction [19]. Numerous health-tracking and intelligent medical care schemes have been established based on advancements in contemporary IoT devices and biomedical sensors. Current research has used many factors impacting chronic disease, primarily focusing on earlier diagnosis of chronic diseases such as diabetes mellitus, heart disease, and CKD [20]. However, accounting for every crucial factor required for disease prediction while maintaining the effectiveness of the prediction approach and the execution time remains difficult.
Therefore, we intend to integrate an effective and adaptive clinical decision support network model for CKD diagnosis in the suggested strategy. Data collection is initially carried out in an IoT setting. Then, a deep learning architecture based on adaptive optimization is created to identify CKD. The suggested deep architecture's hyperparameter tuning is also carried out via an optimization method.

A. NOVEL CONTRIBUTION
The significant contributions of this paper are as follows:
• In the pre-processing stage, we handle the categorical and missing values and perform feature scaling.
• To extract the features, we utilize the Capsule Network (CapsNet) technique.
• To improve and speed up classification performance, we use the Aquila Optimization (AO) algorithm.
• Finally, we classify the kidney disease as CKD or non-CKD by employing a Deep Separable Convolutional Neural Network, with its parameters optimized by the Sooty Tern Optimization Algorithm (STOA).
• An extensive experimental analysis is carried out using accuracy, precision, recall, f-measure, FPR, FNR, and MCC metrics to examine the improved CKD detection outcomes of the proposed technique.
The remainder of this article is organized as follows. Section II overviews the earlier works. Section III describes the methodology. Section IV presents the results and discussion, along with an analysis of each experimental data point. Section V concludes the article.

II. RELATED WORKS
Prediction of CKD is a popular study topic. Researchers have applied multiple classification algorithms to create compelling and precise prediction systems.
To predict chronic kidney disease, Elkholy et al. [21] developed a modified Deep Belief Network (DBN), a classification method with Softmax as the activation function and categorical cross-entropy as the loss function, to predict kidney-related disorders. With an accuracy of 98.52%, the suggested model outperforms the existing ones.
For the diagnosis and categorization of chronic kidney disease (CKD), Elhoseny et al. [22] created Density-based Feature Selection (DFS) with an Ant Colony Optimization (D-ACO) algorithm. Before running the ACO-based algorithm, the suggested intelligent system removes unnecessary or redundant features via DFS. The suggested approach outperformed the other techniques with a notable improvement in categorization accuracy while utilizing fewer features.
Khamparia et al. [23] proposed the stacked autoencoder model, a deep learning framework for categorizing CKD. A softmax classifier was employed to forecast the final class once the dataset's valuable features had been extracted with the help of the stacked autoencoder. This multimodal model was found to have good classification accuracy compared to other traditional classifiers used to diagnose chronic renal disease.
A Heterogeneous Modified Artificial Neural Network (HMANN) was introduced by Ma et al. [24] for the early recognition, segmentation, and diagnosis of CKD failure on the IoMT platform. The proposed HMANN is further classified into a Support Vector Machine and a Multilayer Perceptron (MLP) employing a Backpropagation (BP) technique. The segmentation of the region of interest for the kidneys in the ultrasound image serves as the foundation for the suggested algorithm's operation. The recommended HMANN method for segmenting the kidney offers high precision while significantly reducing the time needed to delineate the contour.
To classify diseases, Jerlin Rubini and Perumal [25] created the multi-kernel support vector machine (MKSVM) with the fruit fly optimization algorithm (FFOA). First, FFOA was used to select the best features from the available features. To classify medical data, the MKSVM was given the processed versions of the characteristics chosen from the medical dataset. The provided technique achieves improved categorization accuracy when compared to previous techniques.
To screen for CKD using ultrasound pictures, Hao et al. [26] introduced a unique CNN framework called the texture branch network. Here, a texture branch is introduced into a conventional CNN to extract and optimize texture features. The method can automatically produce texture and deep features from input photos and classify objects based on the combined data. Experimental results, which achieved an accuracy of 96.01% and a sensitivity of 99.44%, show the usefulness of the suggested approach.
Senan et al. [27] used the Recursive Feature Elimination (RFE) approach to choose the most important representative characteristics. The chosen features were fed into classification algorithms including SVM, DT, KNN, and RF. Each strategy yielded effective results when the parameters of each model were tuned to achieve the optimum categorization. The random forest strategy exceeded all other pertinent methodologies, achieving above 90% accuracy, precision, recall, and F1-score for all criteria.
Although the earlier studies made significant attempts, the effectiveness of their prediction algorithms still has to be improved if the results are to be highly accurate. To best forecast CKD, this research provides a deep learning-based approach.
The KNN technique was established by Houssein and Sayed [28] to forecast CKD. A modified INFO (mINFO) with two upgrade methodologies was created to enhance INFO. Opposition-based learning (OBL) was employed in the established variant to enhance local search capability and prevent getting trapped in local optima. A Dynamic Candidate Solution (DCS) was utilized to solve the premature convergence issue in INFO and establish an appropriate equilibrium between exploitation and exploration capacity. Compared with existing approaches, the proposed approach yields greater performance.
Swain et al. [29] developed an updated ML model for CKD detection that was trained with the supervised learning methods SVM and RF. Multiple approaches, including missing-value imputation, data balancing, and feature scaling, were used to increase the strategy's scalability. The chi-squared approach was additionally applied for feature selection. Furthermore, ML-based efficiency-boosting techniques like hyperparameter tuning were applied to adjust the algorithm with the best feasible set of variables. The effectiveness of the suggested work was evaluated in comparison to other investigations' accuracy.
For accurate categorization of CKD and non-CKD, Prasad Reddy and Vydeki [30] suggested the Ebola deep wavelet extreme learning machine (EDWELM). The most discriminative features are selected to increase the effectiveness of categorization using the darts battle game optimizer, a combination of the darts game and battle royale optimization processes. The autoencoder, Ebola optimization search algorithm, and wavelet neural network are used in the suggested EDWELM categorization approach to classify CKD effectively. Compared to other methods, the suggested EDWELM classification algorithm performs much better.
For the diagnosis of CKD, Alikhan et al. [31] introduced the Self-Attention Convolutional Neural Network optimized through the Season Optimization Algorithm. The investigation uses a Smart Medical Big Data healthcare system with IoT and cloud computing. The IoT devices that collect data include wearables and sensors. The provided approach outperforms existing approaches in terms of accuracy.

III. PROPOSED METHODOLOGY
An effective DSCNN is created in this paper to handle CKD categorization issues. Fast learning with excellent performance and identification of the essential parameters associated with CKD are two main consequences of the diagnostic method. The suggested strategy involves four phases. During pre-processing, addressing missing values, reducing the dataset, converting it to binary data, and standardizing it are all completed.
The most informative CKD-related characteristics are found using the suggested Aquila optimization-based dimensionality reduction approach, which removes unnecessary features from the CKD dataset. The DSCNN learning and classification accuracy are improved through successful parameter optimization. Figure 1 represents the proposed method framework.

A. PRE-PROCESSING
Data pre-processing transforms unclean data into a clean dataset and is a necessary first step before training in every categorization system. This technique completes tasks including addressing missing values, scaling the dataset down, turning it into binary data, and standardizing the dataset. Rescaling is employed to scale the dataset when the set of attributes has different scales. The binary transformation maps values to 0 and 1: when an attribute's value is over a threshold, it is regarded as 1, and when it is below the threshold, it is regarded as 0. Standardization requires that each characteristic have a mean of 0 and a standard deviation of 1.
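The steps above can be sketched in plain Python. The attribute name, its values, and the binarization threshold below are illustrative assumptions, not values taken from the CKD dataset.

```python
# Hedged sketch of the pre-processing steps: mean imputation of missing
# values, threshold binarization, and z-score standardization.

def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

def binarize(values, threshold):
    """Map values above the threshold to 1, others to 0."""
    return [1 if v > threshold else 0 for v in values]

def standardize(values):
    """Rescale to zero mean and unit standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / std for v in values]

serum_creatinine = [1.2, None, 3.8, 0.9, 2.1]   # hypothetical attribute
filled = impute_mean(serum_creatinine)
flags = binarize(filled, threshold=1.5)
scaled = standardize(filled)
```

After these three passes, every attribute is complete, comparable in scale, and optionally reduced to a binary indicator, matching the pipeline described above.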

B. FEATURE EXTRACTION BASED ON CAPSULE NETWORK
To extract the features from the data collected after pre-processing, we suggest using the deep learning method known as the Capsule Network (CapsNet). Compared to previous feature extraction techniques, the CapsNet technique retrieves the features accurately. Capsules comprise an activation vector produced by a group of neurons whose outputs are interpreted as varying aspects of the same entity. Each capsule consists of a pose matrix describing the configuration of a specific item at a given pixel and an activation likelihood, given by the length of the vector, indicating that item's presence. The direction of the activation vector gathers details about the object's posture, comprising its orientation, whereas the size or amplitude establishes the likelihood that an object of interest is present.
The activation vector will, for example, rotate appropriately when an image is rotated, but its length will not change. There can be several capsule layers. A primary capsule layer (the outcome of the final convolutional layer, reshaped and squashed) and an ActionCaps layer are employed in our suggested architecture. Every capsule predicts the outcome of the parent capsule; if the predicted output matches the actual outcome of the parent capsule, the coupling coefficient between these capsules increases. Given the identified outcome u_i of capsule i, û_{j|i} = W_{ij} u_i stands for the predicted outcome vector for the j-th parent capsule, where W_{ij} is the weight matrix. The coupling coefficient is estimated with the softmax function based on the degree of conformation between the layer and the parent capsules.
The value of b_{ij}, which corresponds to the initial log-likelihood that capsules i and j should be connected, is set to zero at the start of the agreement procedure. The total input vector of parent capsule j is then s_j = Σ_i c_{ij} û_{j|i}, as shown in Equation (3).
In the final step, the result vectors of the capsules are normalized with a non-linear squashing function that keeps their length below 1: v_j = (‖s_j‖² / (1 + ‖s_j‖²)) (s_j / ‖s_j‖). The length of a capsule's output may therefore be considered the possibility that the capsule has discovered a particular trait. Each capsule's final result depends on its beginning vector value.
Here v_j indicates the outcome and s_j the intermediate input of capsule j. During routing, the log-likelihoods are upgraded based on the consensus between v_j and û_{j|i}; using Eq. (5), the revised log-likelihood is computed as b_{ij} ← b_{ij} + û_{j|i} · v_j.
With this dynamic routing technique, the routing coefficient between capsule i and parent capsule j is raised whenever the prediction û_{j|i} agrees with the parent capsule's output.
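The routing-by-agreement procedure described above can be sketched as follows. The toy prediction vectors, dimensions, and three routing iterations are illustrative assumptions, not the paper's configuration.

```python
# Hedged sketch of routing-by-agreement: coupling coefficients
# c_ij = softmax(b_ij), parent input s_j = sum_i c_ij * u_hat[j|i],
# squashed output v_j, and the log-prior update b_ij += u_hat[j|i] . v_j.
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def squash(s):
    # v = (|s|^2 / (1 + |s|^2)) * (s / |s|); keeps the length below 1
    norm_sq = sum(x * x for x in s)
    norm = math.sqrt(norm_sq) + 1e-8
    return [(norm_sq / (1.0 + norm_sq)) * (x / norm) for x in s]

def route(u_hat, iterations=3):
    """u_hat[i][j] is capsule i's prediction vector for parent capsule j."""
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]   # log priors start at zero
    v = []
    for _ in range(iterations):
        c = [softmax(row) for row in b]        # coupling coefficients
        v = []
        for j in range(n_out):
            s_j = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_in))
                   for d in range(dim)]
            v.append(squash(s_j))
        for i in range(n_in):                  # agreement update
            for j in range(n_out):
                b[i][j] += sum(u_hat[i][j][d] * v[j][d] for d in range(dim))
    return v

# Two input capsules agree on parent 0 and disagree on parent 1:
v = route([[[1.0, 0.0], [0.0, 1.0]],
           [[1.0, 0.0], [0.0, -1.0]]])
```

Parent 0, where the predictions agree, ends up with a longer output vector than parent 1, mirroring how agreement strengthens the coupling coefficients.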
If necessary, several convolutional layers can be used prior to the first capsule layer. CapsNet eliminates the major flaw of the CNN framework, the max-pool layers: due to max-pooling, a CNN loses some crucial data and the spatial relationships in the image. Therefore, CapsNet employs convolution with strides greater than 1 to reduce the dimensionality. The complete structure of the network employed for this is displayed in Fig. 3; it has 14,788,864 trainable parameters.
The entire network is trained with the Adam stochastic gradient descent optimizer with a learning rate parameter of 0.0001. Adaptive gradient descent techniques are frequently used for a variety of reasons. First, they can learn both sparse and dense information since they adjust a learning rate for each parameter. Second, they make it possible to learn the learning rate from data rather than having to tune it. Third, compared with non-adaptive systems using the same data and techniques, they frequently reach convergence significantly earlier in the training process.
The primary capsule layer links every capsule to each capsule in the ActionCaps layer. However, instead of max-pooling, routing-by-agreement is employed, a method that promotes better learning.
The initial and input convolution layers are part of the effective capsules in the framework employed to diagnose CKD, and these layers are discussed below.

1) INPUT LAYER
This layer receives input data from CKD to train the network.

2) PRIMARY CAPSULE LAYER
A convolutional layer is initially coupled to the input layer. It has 256 scalar-output filters with a kernel size of 9. The non-linear activation function is ReLU, or Rectified Linear Unit. The primary capsule result is reshaped to create 32 × 6 × 6 feature maps of 8-dimensional vectors. To ensure that the output vectors have a length between 0 and 1, the squashing function must be applied; this provides the primary capsule's output. During training, a modest epsilon value is added inside the squash function to get around the vanishing gradient issue.

3) ACTION CAPS LAYER
For each primary and action capsule pair, the routing-by-agreement technique is employed to identify the projected outcome vectors of the action capsules.

C. FEATURE SELECTION
After the features have been extracted, the essential characteristics must be chosen from them. To speed up the categorization process, the relevant characteristics are chosen utilizing the Aquila Optimization (AO) algorithm. The required features enhance classification efficiency while requiring less computing time. The Aquila Optimizer receives the characteristics extracted from the capsule network and selects the key features based on their quality. The proposed FS method starts by generating an initial population X of N agents; the agents in the current population are then upgraded based on the best agent and the AO operators until they discover the optimal solution.
The number of features is denoted by D in Equation (19). The random vector Rand(1, D) contains D random values, and the search space's borders are denoted by LB and UB. After initializing the population, the algorithm performs exploitation and exploration operations until the best outcome is found. Two main tactics are implemented during the exploitation and exploration processes.
The exploration procedure is carried out using the first method while accounting for the mean of the agents (X_M) and the best agent (X_b). In its mathematical formulation, the parameters u and v are created at random, with β = 1.5 and s = 0.01, and the agent X_R in Equation (6) is chosen at random. Additionally, numerical formulations of x and y are employed to track the spiral, with U = 0.00565, ω = 0.005, and the parameter r1 ∈ [0, 20] generated randomly. The first exploitation method uses X_M and X_b to upgrade the agents, where δ and α denote the adjustment parameters of the exploitation process and rand ∈ [0, 1] is a randomly produced parameter. In the second technique, the quality function QF, X_b, and a Levy flight are used to upgrade the agent during exploitation.
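The Levy-flight component used in the second exploitation tactic can be sketched as below, assuming the Mantegna formulation commonly paired with β = 1.5 and s = 0.01 as stated above; the random seed is an illustrative choice.

```python
# Hedged sketch of a Levy-flight step: step = s * u / |v|^(1/beta), with
# u ~ N(0, sigma^2) and v ~ N(0, 1), sigma given by Mantegna's formula.
import math
import random

def levy_sigma(beta=1.5):
    """Scale of the numerator Gaussian in Mantegna's algorithm."""
    return (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
            / (math.gamma((1 + beta) / 2) * beta
               * 2 ** ((beta - 1) / 2))) ** (1 / beta)

def levy_step(beta=1.5, s=0.01, rng=random.Random(42)):
    u = rng.gauss(0.0, levy_sigma(beta))
    v = rng.gauss(0.0, 1.0)
    return s * u / abs(v) ** (1 / beta)

sigma = levy_sigma()   # ~0.697 for beta = 1.5
step = levy_step()
```

The heavy-tailed steps let the agent make occasional long jumps out of local optima while mostly taking small refining moves.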
1) UPDATING POPULATION
Equation (20) converts X_i, i = 1, 2, . . . , N into BX_i, its Boolean representation at this stage. By ignoring the unnecessary characteristics that correspond to zero values in BX_i, the number of selected features is decreased. Equation (21) is then used to compute the fitness value.
The weight λ ∈ [0, 1] balances the ratio of essential characteristics against the classification error γ_i. The best fitness is then identified, along with its related agent X_b. After that, the current agents are updated with the AO operators.

2) TERMINAL CRITERIA
The stopping criteria are evaluated; if they are not met, the updating step is repeated. Otherwise, the learning process is halted, and the selected characteristics corresponding to X_b are returned.
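The wrapper-style selection fitness described around Equation (21) can be sketched as follows. The weight value and the error values are illustrative stand-ins, not outputs of a trained classifier or the paper's tuned settings.

```python
# Illustrative fitness: Fit = lambda * error + (1 - lambda) * (#selected / D),
# so fewer selected features at equal error give a better (smaller) fitness.

def fitness(mask, error, lam=0.99):
    d = len(mask)                      # total number of features D
    selected = sum(mask)               # ones in the Boolean mask BX_i
    return lam * error + (1 - lam) * (selected / d)

full = fitness([1] * 24, error=0.05)               # all 24 CKD attributes
subset = fitness([1] * 10 + [0] * 14, error=0.05)  # 10 attributes, same error
```

Because the subset term shrinks with fewer selected features, the optimizer is pushed toward compact feature sets whenever accuracy is preserved.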

D. CLASSIFICATION
After choosing the key features, we employ an efficient deep learning method called the Deep Separable Convolutional Neural Network (DSCNN) to classify the kidney condition as CKD or non-CKD. Among the several deep learning models, the DSCNN has proven to be a superior classifier.

1) DEEP SEPARABLE CONVOLUTION NEURAL NETWORK
To minimize processing and the number of parameters, DSCNN consists of layer-by-layer (depthwise) convolution followed by point-by-point convolution. W and H stand for the width and height of the input, and S is the channel number, so the input shape is W × H × S. Assuming that the stride is set to one and the 2D convolution has K 3 × 3 kernels, the estimated cost of the 2D convolution is W × H × S × K × 3 × 3. The results are summed into the final feature map, and b is the bias constant. N 1 × 1 convolutions are then applied to produce N feature maps by point-by-point convolution. In the conventional convolution stated below, the kernel is represented as U with size D_U × D_U, a feature of the input graph is represented as M, the input features are indexed by i, j, u, and v, and W is the number of input channels. Assuming the initial feature graph is P × Q in size with S channels and the kernel size is D_f × D_f with U convolution kernels, DSCNN and regular convolution handle similar amounts of data.
Expressed as a ratio to the traditional convolution cost, DSCNN standard practice employs a convolution kernel of size 3 × 3. When the number of output channels is 64, the proposed method requires only about 0.126 times the computation of the conventional convolution.
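The quoted factor of roughly 0.126 follows from the standard cost ratio of depthwise separable convolution to regular convolution, 1/N + 1/Dk², for N output channels and a Dk × Dk kernel:

```python
# Cost ratio of depthwise separable convolution to standard convolution:
# (depthwise + pointwise cost) / standard cost = 1/N + 1/Dk^2.

def separable_cost_ratio(kernel_size, out_channels):
    return 1.0 / out_channels + 1.0 / (kernel_size ** 2)

ratio = separable_cost_ratio(kernel_size=3, out_channels=64)  # ~0.1267
```

With Dk = 3 and N = 64 the ratio is about 0.127, consistent with the figure quoted in the text.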

2) SIMILARITY PRUNING AND LAYER-BY-LAYER CONVOLUTION
Fewer output channel feature graphs K in layer-by-layer convolution result in less parameter calculation. This process, known as deep convolution, involves the filter K and the input channel feature graphs M. Information among channels is not fused in this process; instead, the main focus is on extracting spatial features, and convolution computation only takes place within each channel. Convolution performed layer by layer allows similar filters to work in harmony: after one of a set of identical filters is eliminated, the remaining ones can continue to assure the accuracy of the network during retraining, which lowers the network's complexity. The resemblance of the filters is evaluated using the KL divergence approach, computed as follows.
Here the initial likelihood distribution is indicated as P, Q is the likelihood approximation, and D_KL(P‖Q) represents the loss of information caused by approximating P with Q. The weight ratios at each respective point of the filter are denoted by p(x_i) and q(x_i). The ordinary entropy is described as follows.
Here p(x_i) is the likelihood of the occurrence X = x_i, and −lg(p(x_i)) is its information content.
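The KL-based filter-similarity test can be sketched as below. Normalizing the absolute filter weights into probability distributions is an illustrative assumption, and the toy weights are not from a trained network.

```python
# Hedged sketch of filter similarity via KL divergence:
# D_KL(P || Q) = sum_i p(x_i) * log(p(x_i) / q(x_i)); small values flag
# redundant (near-duplicate) filters that are candidates for pruning.
import math

def to_distribution(weights):
    total = sum(abs(w) for w in weights)
    return [abs(w) / total for w in weights]

def kl_divergence(p, q, eps=1e-12):
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

f1 = to_distribution([0.2, 0.5, 0.3])
f2 = to_distribution([0.21, 0.49, 0.30])   # near-duplicate filter: prune one
f3 = to_distribution([0.8, 0.1, 0.1])      # dissimilar filter: keep both

similar = kl_divergence(f1, f2)
different = kl_divergence(f1, f3)
```

A pruning pass would remove one filter from each pair whose divergence falls below a chosen threshold, then retrain so the remaining filters compensate.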

3) WEIGHT PRUNING AND POINT-BY-POINT CONVOLUTION
Point-by-point convolution is performed on the K layer-by-layer feature graphs using 1 × 1 × N filters for the output feature graphs. The feature graphs with N channels are produced by the point-by-point convolution process with the help of N convolution kernels. The point-wise convolution realizes the combination of features of the input feature map at the channel level. Point-by-point convolution is the primary focus of computation in deep separable convolution, so it is thought to be more efficient to prune its feature graphs.
Here F_1, F_2, . . . , F_k are the input feature graphs and K_1, K_2, . . . , K_n are the 1 × 1 × N filters. Similar filters work well together in layer-by-layer convolution: when one of the filters is eliminated, the other filters can support one another to keep the network's accuracy. Weights are initialized randomly prior to model training, followed by forward propagation, loss computation, and backward propagation.

4) BATCH NORMALIZATION (BN)
After the dense connection layer, a batch normalization layer is applied using a parameter bias. The input data are standardized along the proper feature axis. BN can adapt to parameter initialization: the converting procedure makes the mean 0 and the variance 1 at every level. Additionally, keeping activations in the sensitive region of every layer's activation function resolves the issue of shifting data distributions and improves coordination among network layers. Its primary goals are faster convergence, avoiding overfitting, and enhanced network stability. The batch normalization algorithm comprises the next four steps.
Step 1. Find the average value for every training batch.
Step 2. Calculate the variance of every training batch.
Step 3. The batch's training data is normalized using the calculated mean and variance to get the 0-1 distribution.
Step 4. Perform scale and shift transformation on the normalized training data. The specific batch normalization formula is y_i = γ x̂_i + β, where x̂_i = (x_i − μ) / √(σ² + ε), γ is the scale factor, β is the translation factor, and ε is a small positive number utilized to prevent the divisor from becoming zero. As the training period is extended, the model's RMS error on the training set shows a considerable downward trend with BN, while without BN there is no discernible reduction in the algorithm's RMS error and no trend toward convergence or model fitting. As training time increases, the algorithm's RMS error with batch normalization on the testing set also declines sharply, demonstrating a continuous improvement in the network's generalization and prediction abilities.
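The four steps above can be sketched compactly; γ = 1 and β = 0 are illustrative initial values rather than learned parameters.

```python
# Minimal sketch of the four BN steps: batch mean, batch variance,
# normalization with a small epsilon, then scale (gamma) and shift (beta).

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    n = len(batch)
    mean = sum(batch) / n                                     # step 1: mean
    var = sum((x - mean) ** 2 for x in batch) / n             # step 2: variance
    x_hat = [(x - mean) / (var + eps) ** 0.5 for x in batch]  # step 3: normalize
    return [gamma * x + beta for x in x_hat]                  # step 4: scale/shift

out = batch_norm([2.0, 4.0, 6.0, 8.0])
```

With γ = 1 and β = 0 the output batch has (approximately) zero mean and unit variance, the 0-1 distribution described in step 3.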

E. SOOTY TERN OPTIMIZATION ALGORITHM (STOA)
To further improve the classification performance, the parameters of the DSCNN approach are optimized by the STOA method. Parameters such as the batch size, learning rate, decay, number of epochs, and momentum are optimized. This section details the inspiration and computational modeling of the suggested algorithm.

1) BIOLOGICAL PARADIGM
Sea birds known as sooty terns, or Onychoprion fuscatus in the scientific community, are widespread worldwide. There are various kinds of sooty terns, all of which vary in size and bulk. The omnivorous sooty tern consumes various foods, including frogs, fish, insects, and reptiles. Sooty terns use their feet to make sounds resembling rain to draw in earthworms buried beneath the surface and use bread crumbs to draw in fish. Sooty terns usually dwell in groups called colonies and employ their brains to discover and hunt their prey. The sooty terns' migration and predation behavior make them stand out. To find the richest and most plentiful sources of food that will give them enough energy, sooty terns migrate, i.e., move seasonally from one location to another. This behavior has the following characteristics:
• Sooty terns move in flocks while migrating. To prevent collisions with one another, sooty terns start at various positions.
• Sooty terns whose fitness level is lower than the others can move in the direction of the sooty tern with the best chance of survival.
• The locations of the other sooty terns can be updated based on the fittest sooty tern. Sooty terns attack prey in the air while flying in a flapping manner. These behaviors may be expressed in a fashion that allows them to be connected to the objective function to be optimized.

2) MIGRATION BEHAVIOR (EXPLORATION)
A sooty tern must meet the three requirements listed below to migrate:
• Collision avoidance: S_A is utilized to compute the new position of the search agent so as to prevent collisions with its neighboring search agents.
Here z is the present iteration, P⃗_st denotes the search agent's present location, S_A denotes the movement of the search agent inside a certain search space, and C⃗_st is the position of the search agent that avoids colliding with other search agents, where

z = 0, 1, 2, . . . , Max_iteration
S_A is adjusted by changing C_f, which decreases linearly from C_f to 0; this work sets the value of C_f to 2. After avoiding collisions, the search agents converge in the direction of their best neighbor, where Rand is a number chosen randomly within [0, 1]. Update in line with the optimal search agent: the search agent (sooty tern) can then alter its position with respect to the best search agent.
D⃗_st describes the difference between the search agent and the best-fit search agent.
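The collision-avoidance step above can be sketched as follows; the positions are toy values, and only the linear decay of the movement factor is taken from the text.

```python
# Hedged sketch of STOA collision avoidance: the movement factor S_A
# decreases linearly from C_f (= 2 here, as in the text) to 0 over the
# iterations, and the adjusted position is C_st = S_A * P_st.

def movement_factor(z, max_iter, cf=2.0):
    return cf - z * (cf / max_iter)

def avoid_collision(position, z, max_iter):
    sa = movement_factor(z, max_iter)
    return [sa * p for p in position]

start = avoid_collision([1.0, -0.5], z=0, max_iter=100)   # S_A = 2: full step
end = avoid_collision([1.0, -0.5], z=100, max_iter=100)   # S_A = 0: no movement
```

Early iterations (large S_A) favor exploration with large displacements; as S_A shrinks toward 0, the agents settle into exploitation around the best neighbor.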

3) ATTACKING BEHAVIOR (EXPLOITATION)
Sooty terns can adjust their speed and attack angle while migrating, lifting off more quickly by using their wings. They exhibit spiral behavior in the air while attacking their victim, described by x′ = Radius × sin(k), y′ = Radius × cos(k), z′ = Radius × k, and Radius = u × e^{kv}, where k is a variable within the limit [0 ≤ k ≤ 2π], Radius denotes the radius of every turn of the spiral, the spiral shape is defined by the constants u and v, and e is the base of the natural logarithm.
The position update at iteration (z) moves the other search agents and collects the best optimal solution. The STOA workflow is depicted in Figure 2.
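The spiral attack trajectory can be sketched directly from the relations above; the constants u = 1 and v = 1 are illustrative, not the paper's settings.

```python
# Hedged sketch of the spiral attack: x' = R*sin(k), y' = R*cos(k),
# z' = R*k, with R = u * e^(k*v) growing along the turn of the spiral.
import math

def spiral_position(k, u=1.0, v=1.0):
    radius = u * math.exp(k * v)    # radius expands as the angle k increases
    return (radius * math.sin(k), radius * math.cos(k), radius * k)

x, y, z = spiral_position(k=math.pi / 2)   # a quarter turn into the spiral
```

As k sweeps through [0, 2π], the agent traces an expanding helix around the prey's position, which is how the exploitation phase closes in on the best solution.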

4) COMPUTATIONAL COMPLEXITY
One of the critical criteria used to assess the effectiveness of an algorithm is its complexity. STOA and other algorithms require O(n_o × n_p) time for population initialization, where n_o denotes the number of objectives and n_p the population size. Determining the fitness of the search agents requires O(Max_iterations × o_f), where o_f denotes the cost of the objective function for a specific task. In contrast, the total amount of space that an algorithm can use at once is its space complexity, which is taken into account in the first phases of the algorithms; all algorithms in this study are thought to have an O(n_o × n_p) space complexity. The suggested STOA algorithm has an O(N × Max_iterations × n_o × n_p) computational complexity.

IV. RESULT AND DISCUSSIONS
The experiment's outcomes and conclusions are discussed in the section that follows. The effectiveness of the suggested strategy is validated in this study using a CKD dataset. Two groups of data samples are created, one of which serves as the training dataset; the classifier is then assessed using the testing dataset. The outcomes were achieved on the Python platform using a computer with an i5 processor and 8 GB of RAM. The parameter setup is shown in Table 1.

A. DATASET DESCRIPTION
The CKD dataset used in the present research was made public in July 2015 by Apollo Hospitals, Chennai, Tamil Nadu, India, and is accessible in the UCI machine learning repository. The collection comprises 400 patient samples with 24 attributes (13 nominal and 11 numerical features). Based on the available clinical features, 250 individuals were diagnosed with CKD, while the remaining 150 cases were categorized as not having CKD.
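The train/test partition described above can be sketched as a stratified split that preserves the 250/150 class balance. This is an illustrative sketch only; the paper does not state its exact split ratio, so the 75/25 split below is an assumption.

```python
import random

def stratified_split(labels, test_fraction=0.25, seed=42):
    """Split sample indices into train/test sets, preserving class proportions.
    The 75/25 ratio is an assumption for illustration, not the paper's setting."""
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = int(round(len(indices) * test_fraction))
        test.extend(indices[:cut])
        train.extend(indices[cut:])
    return sorted(train), sorted(test)

# Mimic the UCI CKD dataset: 250 CKD and 150 non-CKD samples.
labels = ["ckd"] * 250 + ["notckd"] * 150
train_idx, test_idx = stratified_split(labels)
```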

B. IOMT PLATFORM
A key component of IoMT is hardware. To gather real-time data from CKD patients, multiple devices can be employed.
A variety of medical devices can be used to collect data, including the electrocardiogram (ECG), which monitors the heart; the glucometer, which measures blood glucose to detect diabetes; and fitness trackers, which record information about stress, breathing rate, and oxygen level (Figure 1). These devices are all connected to the web and are responsible for sending data to the cloud through connectivity tools such as networks and gateways. Data are often transmitted using an API (Application Programming Interface) key to establish a secure connection; with these API keys, the relevant devices can access and store data on the IoT platform. DL models then offer software solutions that give healthcare professionals access to device control, reporting, and data-analytics capabilities. This paper suggests using IoMT components to handle CKD data and deploying a repeatable application that utilizes the DL method to forecast CKD.
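The API-key transmission step described above can be sketched as follows. The header name, endpoint-agnostic payload layout, and field names here are illustrative assumptions, not any specific vendor's API.

```python
import json

def build_telemetry_request(api_key, device_id, readings):
    """Assemble headers and a JSON body for pushing device readings to a
    cloud IoT platform. The 'X-API-Key' header and field names are
    illustrative assumptions, not a specific platform's API."""
    headers = {
        "Content-Type": "application/json",
        "X-API-Key": api_key,  # assumed header-based API-key authentication
    }
    body = json.dumps({"device_id": device_id, "readings": readings})
    return headers, body

headers, body = build_telemetry_request(
    api_key="demo-key",
    device_id="glucometer-01",
    readings={"glucose_mg_dl": 112, "timestamp": "2023-07-01T10:00:00Z"},
)
```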

C. EVALUATION PARAMETERS
The following evaluation metrics, whose details are explained below, are used to determine how accurately the suggested model makes diagnoses. The effectiveness of the suggested method is evaluated using performance metrics calculated from a confusion matrix that incorporates the terms True Positive (TP), False Positive (FP), True Negative (TN) and False Negative (FN).
• TP: the number of positive incidents correctly predicted as positive.
• FP: the number of negative incidents incorrectly predicted as positive.
• TN: the number of negative incidents correctly predicted as negative.
• FN: the number of positive incidents incorrectly predicted as negative.
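As a minimal sketch (not the paper's code), the four counts can be tallied from label sequences as follows:

```python
def confusion_counts(y_true, y_pred, positive="ckd"):
    """Count TP, FP, TN, FN for a binary classification task."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1  # positive correctly predicted positive
            else:
                fp += 1  # negative wrongly predicted positive
        else:
            if t == positive:
                fn += 1  # positive wrongly predicted negative
            else:
                tn += 1  # negative correctly predicted negative
    return tp, fp, tn, fn

# Tiny hypothetical example (not data from the paper).
y_true = ["ckd", "ckd", "notckd", "notckd", "ckd"]
y_pred = ["ckd", "notckd", "notckd", "ckd", "ckd"]
tp, fp, tn, fn = confusion_counts(y_true, y_pred)
```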

1) ACCURACY (AC)
This evaluation metric calculates the ratio of correct predictions to the total number of predictions made by the classifier. It is written as follows:

AC = (TP + TN) / (TP + TN + FP + FN)

2) RECALL (RC)
This evaluation metric calculates the percentage of correctly classified positive patterns; the higher the recall score, the more favorable the classifier's outcomes. It is defined as follows:

RC = TP / (TP + FN)

3) SPECIFICITY (SF)
This performance metric estimates the percentage of negative patterns that are accurately classified; the higher the specificity value, the better the model identifies negative outcomes. It is written as follows:

SF = TN / (TN + FP)

4) PRECISION (PR)
This performance metric calculates the proportion of correctly predicted positive findings to all positively predicted observations. It is described as

PR = TP / (FP + TP) (37)
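The four metrics above can be computed directly from the confusion-matrix counts, as in this illustrative sketch (the counts below are hypothetical, not results from the paper):

```python
def accuracy(tp, fp, tn, fn):
    """AC: correct predictions over all predictions."""
    return (tp + tn) / (tp + fp + tn + fn)

def recall(tp, fn):
    """RC (sensitivity): correctly classified positives over all actual positives."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """SF: correctly classified negatives over all actual negatives."""
    return tn / (tn + fp)

def precision(tp, fp):
    """PR (PPV), as in Eq. (37): TP / (FP + TP)."""
    return tp / (fp + tp)

# Hypothetical counts for illustration only.
tp, fp, tn, fn = 90, 10, 85, 15
```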

5) F-MEASURE (FM)
It is a single evaluation metric that combines precision and recall via their harmonic mean. The harmonic mean is dominated by the smaller of its two components, so the FM value will be low if either precision or recall is low. It is written as follows:

FM = (2 × PR × RC) / (PR + RC)

6) MATTHEWS CORRELATION COEFFICIENT (MCC)
One of the most effective performance metrics, MCC essentially measures the correlation between the actual and predicted binary classifications. As it incorporates values from all four quadrants of the confusion matrix, it is considered a balanced metric. MCC's possible values range from -1 to +1: a score of +1 represents an entirely accurate classifier, while a score of -1 represents an entirely inaccurate one. It is written as follows:

MCC = (TP × TN - FP × FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN))
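A minimal sketch of these two metrics (illustrative only, not the paper's implementation):

```python
import math

def f_measure(pr, rc):
    """FM: harmonic mean of precision and recall; low if either input is low."""
    return 2 * pr * rc / (pr + rc)

def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient, ranging from -1 (entirely wrong)
    to +1 (entirely correct); 0 is returned for a degenerate denominator."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```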

7) MISS RATE (FNR)
The miss rate, often referred to as the false negative rate, is the percentage of positive samples that are misclassified as negative. It is written as follows:

FNR = FN / (FN + TP)

8) EXPERIMENT 1 (EVALUATION OF FEATURE SELECTION APPROACH)
In this section, the proposed feature selection method is compared with other methods, with accuracy, recall, and specificity used as evaluation criteria. Existing feature selection techniques such as CFS+SVM, MIFS+SVM, RFS+SVM, Relief+CFS+SVM and RFP+SVM are used for comparison. The comparison of feature selection methods is shown in Table 2.
Compared with the existing approaches, the proposed feature selection approach achieves better accuracy, recall, and specificity: 99.18% accuracy, 99.08% recall, and 99.02% specificity. A graphical representation of the existing models alongside the proposed one is shown in Figure 3.
The performance of the feature selection approach on various datasets is depicted in Table 3. The Cleveland, Hungarian, and Switzerland datasets are used for comparison with the dataset employed in this work. Without the feature selection approach, performance is poor: the Switzerland dataset yields 86.17% accuracy, the Hungarian dataset 89.11%, and the Cleveland dataset 90.4%, while our dataset reaches 95.08% accuracy, a better outcome than the others. A comparison of the proposed approach without the feature selection approach is shown in Figure 4, and with the feature selection approach in Figure 5. With feature selection, the Hungarian dataset yields 93.19% accuracy, the Switzerland dataset 95.12%, and the Cleveland dataset 96.03%; similarly, our dataset achieves 99.18% accuracy. Using the feature selection approach thus yields better outcomes across all datasets.

9) EXPERIMENT 2 (EVALUATION OF CLASSIFICATION APPROACH (OPTIMIZED DSCNN))
In this section, we evaluate the performance of the classification approach. For CKD classification, we employed the DSCNN approach optimized by STOA, which improves the performance of the proposed approach, as shown in Table 4.
Comparisons are made with existing approaches such as EDL-CDSS, DBN, CNN-GRU, KELM, FNC, MLP, Decision Tree and ACO, using metrics such as sensitivity, specificity, accuracy, and recall. Compared with the existing approaches, the proposed approach yields higher performance (99.18% accuracy, 98.87% sensitivity, 99.02% specificity, and 98.96% F-Score). A comparison of classification approaches on the CKD dataset is shown in Figure 6.
A comparison of existing machine learning approaches with the proposed one is shown in Table 5 and Figure 7. Logistic Regression, Ensemble Boosted Tree, Support Vector Machines, and RFP-SVM are used for comparison. Compared with these existing machine learning approaches, the proposed approach yields a better solution; the second-best results come from the RFP-SVM approach, while the Ensemble Boosted Tree approach needs much more attention.
The comparative performance on the FPR and FNR metrics is shown in Figure 8. Compared with the others, the proposed approach obtains lower FPR and FNR.
A comparison of the proposed approach with existing approaches on various datasets is shown in Table 6 and Figure 9. Compared with the others, the proposed approach obtains better outcomes. The FNR and FPR comparison with existing approaches is represented in Figure 10.

10) EXPERIMENT 3 (EXECUTION TIME COMPARISON OVER THE MODELS)
The prediction rate is directly affected by the dataset and the model's pre-processing method; in addition, execution time plays a more significant role than other factors. The figure shows that SVM has the lowest predicted run time, whereas GB requires the most time to arrive at a predictable score. The other well-known methods produce intermediate time frames.
Overall classification accuracy and execution time for all used algorithms are shown in Table 7 and Figure 11. The existing approaches PNN, RBF, SVM, and MLP are compared with the proposed method; the proposed approach achieves higher accuracy and lower execution time. The confusion matrix is the most widely used tool for analysing categorization errors, built from the true-positive, false-positive, true-negative, and false-negative values. Figures 13 and 14 show that the true-negative and true-positive values are significantly greater than the false-positive and false-negative values, clearly indicating the effective classification performance of the proposed approach.
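An execution-time comparison of the kind described above can be sketched as a simple best-of-N wall-clock measurement. This is a generic measurement harness for illustration, not the paper's benchmark code, and the stand-in "model" below is a trivial function.

```python
import time

def best_prediction_time(predict_fn, samples, repeats=5):
    """Best-of-N wall-clock time to run predict_fn over all samples.
    Taking the minimum over repeats reduces noise from background load."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for s in samples:
            predict_fn(s)
        best = min(best, time.perf_counter() - start)
    return best

# Time a trivial stand-in "model" on 1000 dummy samples.
elapsed = best_prediction_time(lambda x: x * 2, list(range(1000)))
```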

11) EXPERIMENT 4 (EVALUATION OF TRAINING AND TESTING)
The graphs illustrate the benefits of using the strategy proposed in the paper. Figures 15 and 16 depict the training and testing loss functions and the training and testing accuracy.

D. LIMITATIONS
The study also has some shortcomings. The outcomes could be unreliable because the dataset is small, and finding a dataset with more attributes and more instances is challenging; dynamic data collected from the IoMT platform is even more difficult to obtain. However, overfitting was avoided during optimization by adjusting the parameters used to quantify the difference in error between the training and test datasets. In the present research, we successfully used several input layers, hidden layers, activation functions, and optimizers in the DL model. The difference between the loss and the validation loss resulting from these activities was relatively low (Figures 13 and 14). Thus, it can be said that even though there were only 400 samples, the models were not overfitted.
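The train/validation loss comparison described above can be sketched as a simple gap check. The loss curves and the 0.05 tolerance below are hypothetical illustrations, not the paper's measured values or threshold.

```python
def loss_gap(train_losses, val_losses):
    """Final-epoch gap between validation loss and training loss."""
    return val_losses[-1] - train_losses[-1]

def looks_overfit(train_losses, val_losses, tolerance=0.05):
    """Flag overfitting when validation loss exceeds training loss by more
    than `tolerance` (threshold chosen for illustration only)."""
    return loss_gap(train_losses, val_losses) > tolerance

# Hypothetical loss curves over four epochs (not the paper's data).
train = [0.90, 0.50, 0.30, 0.20]
val = [0.95, 0.55, 0.34, 0.23]
```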

V. CONCLUSION AND FUTURE SCOPE
The mortality rate of chronic kidney disease can only be decreased by early diagnosis and appropriate treatment. The capacity of categorization approaches to accurately classify disease datasets is making them increasingly important in the healthcare industry. Based on objective clinical data, this research proposes an efficient, comprehensive categorization approach to identify renal illness. This diagnostic method is based on effectively optimizing the DSCNN classifier together with a dimensionality reduction approach based on the Aquila optimization algorithm, which determines the most critical risk-factor characteristics associated with CKD. The features are extracted with the help of the Capsule Network approach. To improve performance even further, the parameters of the proposed classification approach are optimized by the Sooty Tern Optimization Algorithm (STOA). The effectiveness of the generated model is evaluated in terms of diagnostic precision, accuracy, recall, PPV, FPR, and specificity. Compared to state-of-the-art algorithms, the findings demonstrate that the proposed approach is the most efficient on the repository CKD dataset, obtaining 99.18% categorization accuracy on the clinical dataset. Therefore, it can be stated that the proposed approach competes with and exceeds existing approaches in the literature while executing in a minimal amount of time with beneficial accuracy.
The clustering approach will be used in further research studies to improve the effectiveness of categorization and reduce instances of incorrect categorization.

FIGURE 1. Architecture diagram of the proposed methodology.

FIGURE 3. Graphical representation of feature selection approaches.

FIGURE 4. Comparison of the proposed approach without using the feature selection approach.

FIGURE 5. Comparison of the proposed approach using the feature selection approach.

FIGURE 6. Performance comparison of classification approaches utilizing the CKD dataset.

FIGURE 7. Comparison of existing machine learning approaches with the proposed approach.

FIGURE 8. Comparison of existing ML methods on the FPR and FNR metrics.

FIGURE 9. Comparison of the proposed approach with existing approaches utilizing various datasets.

FIGURE 10. FNR and FPR comparison over existing approaches.
During the training phase, the prepared training set is used to train the proposed algorithm for 100 epochs, with the learning rate set to 0.1.

FIGURE 16. Training loss vs. testing loss.

TABLE 1. Setting of the parameter values.

TABLE 2. Comparison of feature selection approaches.

TABLE 3. Accuracy of various datasets using feature selection.

TABLE 4. Performance comparison on the CKD dataset.

TABLE 5. Comparison of existing machine learning approaches with the proposed approach.

TABLE 6. Performance of the existing methods utilizing various datasets.

TABLE 7. Overall classification accuracy (%) and execution time for all used algorithms.