Early Prediction of Chronic Kidney Disease Using Deep Belief Network

Chronic kidney disease (CKD) remains a health concern despite advances in surgical care and treatment. The growth of CKD in recent years has drawn considerable interest from researchers worldwide in developing high-performance methods for diagnosis, treatment, and preventive therapy. Improved performance can be achieved by learning the features relevant to the problem. In addition to clinical examination, analysis of patients' medical data can help health care partners predict the disease at an early stage. Although there have been many attempts to build intelligent systems that predict CKD by analyzing health data, the performance of these systems still needs enhancement. This paper proposes an intelligent classification and prediction model. It uses a modified Deep Belief Network (DBN) as the classification algorithm to predict kidney-related diseases, with SoftMax as the activation function and categorical cross-entropy as the loss function. Evaluation shows that the proposed model predicts CKD with 98.5% accuracy and 87.5% sensitivity compared with existing models. Result analysis proves that using advanced deep learning techniques is beneficial for clinical decision making and can aid in early prediction of CKD and its related phases, which reduces the progression of kidney damage.


I. INTRODUCTION
Kidney disease has become a common disease with severe complications [1]. Kidney disease is described as a heterogeneous cluster of disorders affecting the structure and performance of the kidney. It is widely known that even subtle anomalies in measures of kidney composition and function are associated with an increased risk of complications in other organ systems [2].
The associate editor coordinating the review of this manuscript and approving it for publication was Juntao Fei.
There are four main risk factors for kidney disease:
• Obesity and hypertension: many medical conditions, including obesity and hypertension, can cause kidney disease (KD).
• Family history: If anyone in your family has kidney disease, dialysis, or kidney transplantation, you may be more likely to develop kidney disease than someone without this family history.
• Medicines: Some medicines can cause or exacerbate kidney disease, such as over-the-counter pain medicines.
• Age and race: older people and certain racial groups may have a higher chance of developing renal disease.
Diagnosing kidney disease at an early stage saves the patient from serious complications. To predict kidney disease, the factors that cause it must be studied carefully.
All these factors can be translated into data to predict kidney disease and suggest a medical protocol to improve the patient's health state. Medical information is characterized by continuity, multiple attributes, incompleteness, and time dependence. Using large volumes of data efficiently is becoming a major issue for the healthcare industry [5].
Nowadays, in health care domain, data mining plays an indispensable role in disclosing anonymous and valuable knowledge in health data. Health care partners can ameliorate quality of service by identifying latent, potentially valuable patterns that medical diagnosis demands by applying data mining techniques to health care domain [6].
Classification is one of the most widely used data mining tasks in healthcare organizations. The classification method determines the target category for each data record. Classification is a valuable data mining process that builds a model (called a classifier) to assign classes to new cases.
Recently, in the domain of kidney disease, Deep Learning (DL) methods have been used to automate feature extraction and interpretation; hence, models based on these techniques achieve high performance [7].
Deep learning is a branch of machine learning that focuses on learning representations at multiple levels. A Deep Neural Network (DNN) is an input-output neural network with multiple layers of nodes; it identifies and processes a series of layers between input and output in several stages. Deep learning requires computational models consisting of multiple layers of processing to acquire data representations at different scales of abstraction. Deep learning approaches aim to learn features in hierarchical structures, where features at higher levels of the hierarchy are formed by composing lower-level features [8].
The rest of this paper is organized as follows: Section 2 introduces a recent literature review; Section 3 discusses the base methods and materials; Section 4 presents the experiment design, setup, metrics, and results; finally, the conclusion is given.

II. LITERATURE REVIEW
Many researchers have been interested in predicting CKD. They have used different classification algorithms to obtain efficient and accurate prediction systems. Dutta and Bandyopadhyay [9] proposed a neural network with 10-fold cross validation to classify CKD patients, comparing that technique with Decision Tree, Support Vector Machine, Gradient Boost, and K Nearest Neighbours classifiers. Experimental results based on accuracy, F1 score, kappa score, and MSE showed that the neural network with 10-fold cross validation achieved the highest accuracy.
Zhang et al. [10] proposed a new transfer learning approach using DenseNet-201, DenseNet-121, DenseNet-169, and neural networks, and compared them for multiple sclerosis (MS) classification. Histogram stretching (HS) was utilized to preprocess all images, and different composite learning factors (CLF) were assigned to the three layers used. The experimental results showed that the proposed DenseNet-201-D model was the best compared with state-of-the-art approaches.
Anupama et al. [11] developed an intracerebral hemorrhage (ICH) diagnosis model based on the GrabCut segmentation technique with synergic deep learning (SDL) for feature extraction; the model also used Gabor filtering to remove noise and raise image quality.
Gupta et al. [12] built a deep learning classification framework that uses a stacked autoencoder model to extract the most effective and useful features of the chronic kidney disease dataset and a softmax classifier for prediction, which provided high accuracy.
Rodrigues et al. [13] established a hybrid model of Recurrent Neural Networks (RNN) and Convolutional Neural Networks (CNN) to provide well-defined captions and visual descriptions for video images. They used the CNN to extract the features of the video images; these features were then utilized by a Long Short-Term Memory (LSTM) network to generate sentences with a meaningful sequence, trained with the SoftMax function.
Sharma et al. [14] developed an expert system to help the skin specialists using Residual Neural Networks (ResNet) with 50-layers for diagnosing and identifying various skin diseases.
Wang et al. [15] proposed an AlexNet model for alcoholism identification that radiologists can use in patient diagnosis. AlexNet was used as the main transfer learning model, and five data augmentation (DA) techniques were applied: random translation, gamma correction, noise injection, scaling, and image rotation, which improved classification performance.
Zhao et al. [16] established and validated an eGFR prediction model using information derived from a regional health network. The model was developed using Random Forest regression and tested using goodness-of-fit statistics and measures of discrimination, including macro-averaged and micro-averaged parameters. The eGFR estimates were used to stratify patients into CKD levels.
Abdelaziz et al. [17] proposed a CKD-based Cloud-IOT hybrid intelligent model utilizing two intelligent methods, Linear Regression (LR) and Neural Network (NN). LR is used to identify critical factors affecting CKD while NN is used for prediction. Furthermore, as an instance of a cloud technology environment, a hybrid cognitive framework was implemented on windows azure to forecast CKD and help clinicians in smart cities.
Chaitanya and Rajesh [18] attempted to combine various algorithms such as ANN+GSA, ANN+GA, and K-nearest neighbor. Comparing the performance of these algorithms based on accuracy, sensitivity, and specificity showed that the GA-ANN algorithm had the best classification effectiveness.
Besra and Majhi [19] started with preprocessing, then grouped instances into CKD or NOTCKD and calculated the percentage GFR stage. Their classification methodology introduced different classifiers such as Naive Bayes, SOM, IB1, VFI, a multi-classifier, and Random Forest. Eventually, the rate and phases of kidney activity were assessed using the GFR test method.
Tekale et al. [20] built two different ML models using Decision Tree and SVM to predict CKD. The comparison between them, based on variety of evaluation measures, showed that the SVM performed better with accuracy of 96.75%.
Gharibdousti et al. [21] used Decision Tree, Linear Regression, Support Vector Machine, Naive Bayes, and Neural Network as classification strategies. They investigated the similarity of the characteristics to obtain a correlation matrix.
Boukenze et al. [22] predicted kidney disease using multiple learning algorithms, namely SVM, MLP, DT (C4.5), Bayesian Network, and KNN. They compared these algorithms and identified the most efficient one based on multiple criteria; results showed that MLP and C4.5 achieved the best accuracy.
Jojoa et al. [23] combined the elements used for CKD recognition. Two major factors, including prognosis, represented modern research directions in the systematic understanding of the conditions, and allowed the development of a better and more accurate treatment plan by providing quantifiable prognostic risk factors.
Arora and Sharma [24] used three classification algorithms to detect CKD at the earliest possible stage: Naïve Bayes, J48, and SOM. Experiments with these algorithms on the dataset were implemented using WEKA, a data mining tool. Ultimately, J48 was reported as the best of the three algorithms for reliable and early diagnosis of chronic kidney disease, achieving the highest accuracy.
Başar et al. [25] proposed a chronic kidney disease diagnosis approach using the AdaBoost ensemble learning algorithm. In this diagnostic approach, decision tree-based classifiers were used. Parameters such as mean absolute error (MAE), kappa, root mean squared error (RMSE), and area under the curve (AUC) were used to determine classification efficiency. The results showed that the AdaBoost ensemble learning algorithm delivered better classification output than human classification.
Eroğlu and Palabaş [26] introduced six different basic classifiers, namely K Nearest Neighbor (KNN), Naive Bayes, Support Vector Machines (SVM), Random Trees, J48, and Decision Tables, and three different ensemble algorithms, namely bagging, AdaBoost, and random subspace. The results showed that the J48-based algorithm with the random subspace and bagging ensembles, and the random tree-based algorithm with the bagging ensemble, achieved 100% classification efficiency.
Kunwar et al. [27] predicted chronic kidney disease (CKD) by means of classification techniques such as the Artificial Neural Network (ANN) and Naive Bayes. RapidMiner software was used for the experiments, and the results showed that Naive Bayes produced more accurate results than the Artificial Neural Network while taking little time.
Ani et al. [28] created a system for supporting medical decision-making in which classification methods such as probability-based Naive Bayes, Back Propagation Neural Network (BPN), LDA, decision tree, lazy-learner K Nearest Neighbor (KNN), and random subspace classification algorithms were studied. The accuracies of these algorithms were 78%, 81.5%, 90%, 93%, 76%, and 94%, respectively, on the UCI dataset.
Although the previous works made great efforts, the performance of their prediction systems still needs enhancement to obtain highly accurate results. Therefore, this paper proposes a deep learning-based model to predict CKD with high performance.

III. PROPOSED MODEL
The methodology of our research consists of sequential steps including data collection, pre-processing, deep learning-based feature engineering, and classification, followed by evaluation. These computational steps are graphically presented in Figure 1. Table 1 shows the description of the used dataset.

B. DATASET PRE-PROCESSING
Classifying data with missing values is a challenge.
The used dataset contains missing values, which reduce efficiency, so they must be handled before the data is analyzed.
The missing values can be examined from two points of view: cases (records) or attributes.
From the case (record) point of view, the degree of missing values may be simple, medium, or complex. It is simple if the record has a missing value in at most one attribute. It is medium if the record has missing values in 2% to 50% of the total number of attributes, and complex if the record has missing values in 50% up to 80% of the attributes. Imputation is seen as a way to avoid discarding cases with missing values.
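The three missingness degrees described above translate directly into a small helper. The sketch below is illustrative only: the function name `missingness_degree`, the behaviour beyond the stated 80% bound, and the example record are assumptions, not part of the paper's method.

```python
def missingness_degree(record, n_attributes):
    """Classify a record by its share of missing attributes, following
    the simple / medium / complex degrees described in the text."""
    missing = sum(1 for v in record if v is None)
    ratio = missing / n_attributes
    if missing <= 1:
        return "simple"      # at most one missing attribute
    if ratio <= 0.5:
        return "medium"      # up to 50% of attributes missing
    if ratio <= 0.8:
        return "complex"     # 50% up to 80% of attributes missing
    return "beyond-range"    # assumed label; the text stops at 80%

# Example with 24 attributes (the CKD dataset's attribute count):
record = [None] * 3 + [1.0] * 21
print(missingness_degree(record, 24))
```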
The rationale behind missing value imputation is that the dataset is not reduced, because imputation retains the entire sample size. There are many imputation methods with different features for imputing missing data.
Mode imputation is used for the nominal attributes, taking the most frequent value per column; for the numerical attributes, median imputation is used, taking the middle value of the attribute. Some of the missing values are shown in Figure 2, which shows that the dataset contains complex and arbitrary missing values that are overcome in the preprocessing stage.
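The mode/median imputation described above can be sketched with pandas. The miniature `df`, its column names, and its values are hypothetical stand-ins for the CKD dataset:

```python
import pandas as pd

# Hypothetical miniature of the CKD dataset: "rbc" is nominal, "age" numeric.
df = pd.DataFrame({
    "age": [48.0, 7.0, None, 62.0, 51.0],
    "rbc": ["normal", None, "normal", "abnormal", None],
})

# Mode imputation for nominal attributes: most frequent value per column.
df["rbc"] = df["rbc"].fillna(df["rbc"].mode()[0])

# Median imputation for numerical attributes.
df["age"] = df["age"].fillna(df["age"].median())

print(df.isna().sum().sum())  # 0 -> no missing values remain
```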

C. DEEP LEARNING TRAINING
We performed our experiments on DBN-based architectures to train and evaluate on the dataset. The DBN is based essentially on the Restricted Boltzmann Machine (RBM) [31].

1) RESTRICTED BOLTZMANN MACHINE
An RBM is a two-layer bipartite graphical model with a set of visible units v, a set of hidden units h, and symmetric connections between the two layers defined by a weight matrix W. The joint probability of the RBM over the hidden units h and the visible units v is described by:

P(v, h) = e^(-E(v,h)) / Z (1)

where E is the energy function, Z is the partition function, and W, a, and b are the model parameters. The partition function Z is defined by:

Z = Σ_(v,h) e^(-E(v,h)) (2)

The energy function E is:

E(v, h) = -a^T v - b^T h - v^T W h (3)

where T represents the transpose operator. The RBM is pre-trained by the Contrastive Divergence (CD) training algorithm using the training data.
CD training is carried out by stochastic steepest ascent [32], as shown in Figure 3. Starting from the training data v0, Equation (4) samples the hidden state h0; the training data are then reconstructed from the hidden state h0 as v1 by Equation (5), and the reconstruction v1 in turn gives the hidden state h1.
The conditional probabilities of the hidden state given the visible state v are:

P(h_j = 1 | v) = σ(b_j + Σ_i v_i w_ij) (4)

where σ is the logistic sigmoid function and the visible state is given by a binary random state. The conditional probabilities of the visible state given the hidden state are described in a similar way:

P(v_i = 1 | h) = σ(a_i + Σ_j h_j w_ij) (5)
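The CD-1 cycle v0 → h0 → v1 → h1 and the conditional probabilities of Equations (4) and (5) can be sketched in a few lines of NumPy. The layer sizes, learning rate, and sample vector below are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

n_visible, n_hidden, lr = 6, 4, 0.1
W = rng.normal(0, 0.01, (n_visible, n_hidden))
a = np.zeros(n_visible)          # visible biases
b = np.zeros(n_hidden)           # hidden biases

def cd1_step(v0):
    """One Contrastive Divergence (CD-1) update: v0 -> h0 -> v1 -> h1."""
    p_h0 = sigmoid(b + v0 @ W)                  # hidden given visible, Eq. (4)
    h0 = (rng.random(n_hidden) < p_h0) * 1.0    # sample binary hidden state
    p_v1 = sigmoid(a + h0 @ W.T)                # visible given hidden, Eq. (5)
    p_h1 = sigmoid(b + p_v1 @ W)
    # Positive phase minus negative phase, stochastic steepest ascent:
    return lr * (np.outer(v0, p_h0) - np.outer(p_v1, p_h1))

v0 = np.array([1., 0., 1., 1., 0., 0.])
W += cd1_step(v0)
```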

2) DEEP BELIEF NETWORKS (DBN)
Deep Belief Networks are generative systems built by stacking RBMs trained with CD. The greedy learning algorithm finds the locally optimal parameters of each layer; the next stacked RBM layer then takes those trained values as its input and searches for its own local optimum, which is why DBN training works greedily, layer by layer, as shown in Algorithm 2 [26]:

Algorithm 2: Greedy layer-wise training of a DBN
    h^(0) = v_in, where h^(0) denotes the values of the units in the input layer and h^(layer) the units' values of the given layer
    for t = 1 : T do
        for layer = 1 : L do
            Gibbs-sample h^(layer) using P(h^(layer) | h^(layer-1))
            compute the CD update as in Algorithm 1 and apply ω(t + 1) = ω(t) + η · Δω
        end for
    end for

A DBN learns to derive a deep hierarchical representation of the training data. It is a generative graphical model with multiple layers of latent variables, with connections between layers but not between units within a layer. DBN pre-training is an unsupervised learning method, as opposed to supervised neural networks trained with back propagation [32]. Most deep models start with random parameters and gradually learn to approximate locally optimal ones. Unfortunately, weak initial parameters can lead to a poor local optimum, which can have a significant impact on the subsequent learning. A DBN's training process solves this problem in the pre-training phase, in which unsupervised training sets the initial values of the network weights instead of initializing them randomly [33]. A simple belief network [32] consists of layers of stochastic binary units with weighted connections, as shown in Figure 4.
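The greedy layer-by-layer pre-training described above can be sketched as follows. The epoch count, learning rate, random data, and layer widths are illustrative assumptions, not the paper's actual training setup:

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, lr=0.1, epochs=5):
    """Minimal CD-1 training of one RBM; returns (W, hidden biases)."""
    n_visible = data.shape[1]
    W = rng.normal(0, 0.01, (n_visible, n_hidden))
    a, b = np.zeros(n_visible), np.zeros(n_hidden)
    for _ in range(epochs):
        for v0 in data:
            p_h0 = sigmoid(b + v0 @ W)
            h0 = (rng.random(n_hidden) < p_h0) * 1.0
            p_v1 = sigmoid(a + h0 @ W.T)
            p_h1 = sigmoid(b + p_v1 @ W)
            W += lr * (np.outer(v0, p_h0) - np.outer(p_v1, p_h1))
            b += lr * (p_h0 - p_h1)
    return W, b

# Greedy layer-by-layer pre-training: each RBM is trained on the
# hidden representation produced by the RBM below it.
layer_sizes = [24, 13, 28]            # first three layers, as in the model
data = (rng.random((20, 24)) > 0.5) * 1.0
stack, h = [], data
for n_hidden in layer_sizes[1:]:
    W, b = train_rbm(h, n_hidden)
    stack.append((W, b))
    h = sigmoid(b + h @ W)            # propagate up as input to the next RBM
```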

3) PROPOSED CKDPM
The proposed chronic kidney disease prediction model is a deep network consisting of six layers. The first layer is the input layer and contains 24 nodes. The second layer takes its input from the first layer and contains 13 nodes. The third layer consists of 28 nodes and receives its input from the second layer. The fourth layer uses 8 nodes, which take their inputs from the third layer, and the fifth layer consists of 4 nodes fed by the fourth layer. Finally, the output layer has two nodes, as the model performs binary classification. Figure 5 shows the structure of the proposed deep neural network.
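The six-layer topology above (24-13-28-8-4-2) can be written down as a forward pass. The sigmoid hidden units, random weights, and random input are assumptions for illustration, not the trained model:

```python
import numpy as np

rng = np.random.default_rng(2)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

# Layer widths of the proposed network: 24 inputs down to 2 output classes.
sizes = [24, 13, 28, 8, 4, 2]
weights = [rng.normal(0, 0.01, (m, n)) for m, n in zip(sizes, sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def forward(x):
    """Forward pass: logistic hidden layers, SoftMax on the 2-node output."""
    for W, b in zip(weights[:-1], biases[:-1]):
        x = sigmoid(x @ W + b)
    z = x @ weights[-1] + biases[-1]
    e = np.exp(z - z.max())        # shifted for numerical stability
    return e / e.sum()

probs = forward(rng.random(24))
```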
The model passes through three stages. The first stage is the pre-training phase, in which the input layer receives the data and, with the second layer, forms a typical RBM that is trained to reach the locally optimal parameters. This output is then taken as the third layer's input; the third layer and the second layer form a new RBM, and its parameters are trained with the same unsupervised learning method. Continuing in this way, RBMs are greedily stacked layer by layer to reach the locally optimal values. The approximation of the weighting factors is done in one step, resulting in a significant reduction in training time, and the underfitting problem common in other networks can also be solved during the pre-training phase.
The second stage is fine-tuning, which uses the Back Propagation algorithm to fine-tune the whole network's parameters toward the global optimum using the up-down algorithm, a contrastive variant of the wake-sleep algorithm that is used to train DBNs with labeled data.
In the process of learning the recognition weights, a collection of labels is applied to the top layer in order to determine the network's category boundaries. The up-down algorithm, unlike the classical wake-sleep algorithm, does not suffer from the mode-averaging problems that can result in poor recognition weights.
The third stage is classification (testing), in which the trained model with optimal weights and biases is obtained, so data can be classified into the proper class.
Here, in our model, we use the modified DBN as in [32], which applies the DBN built on RBMs as discussed previously but takes the expectation of the output after the activation function, as in (7). The standard formulation applies the activation function to the expected input:

Y = F(E[x]) (6)

Taking the expectation as late as possible is generally better, so the output Y of a layer's nodes is defined by:

Y = E[F(x)] (7)

The difference between (6) and (7) is where the expectation is taken.
As mentioned above, F is the activation function used.
Here we use the SoftMax activation function, denoted in (8): it computes the exponential of the given input x_i and the summation of the exponentials of all input values, and the fraction of these two is the SoftMax output [34]:

F(x_i) = e^(x_i) / Σ_(j=0)^(k) e^(x_j), where i = 0, 1, . . . , k (8)

The output of the SoftMax function is the probability of each class, and the class with the highest probability is the output class.
Finally, in our model, categorical cross-entropy is the loss function used for single-label classification. It measures the difference between the true output distribution and the predicted one, as in (9):

L = - Σ_(i=0)^(k) y_i log(ŷ_i) (9)
where ŷ is the predicted output value. After applying the activation function in the output layer, the categorical cross-entropy compares the predictions with the true values [35].
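The SoftMax activation of Equation (8) and the categorical cross-entropy loss of Equation (9) can be sketched directly; the two-class logits below are a hypothetical example:

```python
import numpy as np

def softmax(x):
    """Eq. (8): exponentiate each input and normalize so outputs sum to 1."""
    e = np.exp(x - np.max(x))     # shift for numerical stability
    return e / e.sum()

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Eq. (9): -sum_i y_i * log(y_hat_i) for a one-hot label y."""
    return -np.sum(y_true * np.log(y_pred + eps))

y_pred = softmax(np.array([2.0, 0.5]))   # two classes: CKD / NOTCKD
y_true = np.array([1.0, 0.0])            # one-hot truth label
loss = categorical_cross_entropy(y_true, y_pred)
```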
In fine-tuning, the weight updates are propagated to the hidden and visible units by the back propagation algorithm. For the final layer, the updates are:

δ_j = (∂L/∂o_j) · F'(n_j), Δw_ij = -η · δ_j · o_i

where L is the loss function, o_j is the output of the j-th node, n_j is the input to the activation function of the j-th node, and i and j identify the input and output nodes. For a hidden layer, the updates are:

δ_j = F'(n_j) · Σ_k δ_k w_jk, Δw_ij = -η · δ_j · o_i

where the δ_k are the updates propagated back from the previously processed layer and k identifies that layer's nodes [32].
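For a SoftMax output layer trained with categorical cross-entropy, the propagated error simplifies to ŷ − y, a standard back-propagation result; the hidden activations and learning rate below are hypothetical values for illustration:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

# For SoftMax output + categorical cross-entropy, the propagated error
# delta_j = dL/dn_j reduces to (y_hat - y) at the output nodes.
z = np.array([2.0, 0.5])          # pre-activations of the two output nodes
y = np.array([1.0, 0.0])          # one-hot truth
delta = softmax(z) - y            # error pushed back toward the hidden layer

# Weight update for incoming hidden activations o_i (learning rate eta):
o = np.array([0.3, 0.7, 0.1])     # hypothetical hidden-layer outputs
eta = 0.1
dW = -eta * np.outer(o, delta)    # gradient-descent step on the 3x2 weights
```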

4) MATERIALS (SOFTWARE AND HARDWARE) USED
A computer with an Intel Core i7 processor, 16 GB DDR RAM, and an NVIDIA GeForce MX150 graphics card was used to conduct the tests and evaluations. The program was implemented using MATLAB 2018 x64-bit.

5) MODEL EVALUATION
The dataset is randomly divided into two parts: the first part contains 70% of the overall data and is used to train the model; the second part, the remaining 30%, is used for testing. The evaluation measures are defined by:

Accuracy = (t_p + t_n) / (t_p + f_p + f_n + t_n) (17)

Precision = t_p / (t_p + f_p) (18)

Recall = t_p / (t_p + f_n) (19)

F-measure = 2 · Precision · Recall / (Precision + Recall) (20)

where t_p, f_p, f_n, and t_n refer to true positives, false positives, false negatives, and true negatives, respectively [35]. Table 2 shows the obtained results for the proposed model during training and testing. Figure 6 and Table 3 compare our model with others that used the same dataset but different algorithms, based on various evaluation measures. The dataset is balanced, so accuracy, which shows the percentage of correctly classified instances, is an appropriate measure and the most important one on which all the classifiers depend. The proposed model shows high accuracy because of the advantages of the DBN: pre-training, in which the locally optimal values and weighting factors are obtained in one step, overcomes the underfitting problem and allows the whole deep model to be fine-tuned efficiently, so the training time is reduced. Finally, the comparison results reveal that the proposed model is effective in the classification of the CKD dataset.
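Equations (17)-(20) can be computed directly from the four confusion-matrix counts. The counts below are hypothetical and are not the paper's reported results:

```python
def evaluation_metrics(tp, fp, fn, tn):
    """Eqs. (17)-(20): accuracy, precision, recall, and F-measure
    from the confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f_measure

# Hypothetical counts for a 120-record test split (30% of 400 records):
acc, prec, rec, f1 = evaluation_metrics(tp=70, fp=1, fn=10, tn=39)
```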

V. CONCLUSION
This research presents an intelligent deep neural network model for the prediction and classification of the chronic kidney disease dataset using a deep belief network. The model is built using a DBN with a SoftMax classifier and categorical cross-entropy as the loss function. We use a dataset from the UCI machine learning repository and perform preprocessing to handle the missing values.
The efficiency of the proposed model is evaluated, and a comparison is made with the existing models. The proposed model achieves better performance than the existing models, with an accuracy of 98.52%. Thus, the proposed model is a proper predictor and classifier for CKD.