Heart Disease Identification Method Using Machine Learning Classification in E-Healthcare

Heart disease is a complex disease, and many people around the world suffer from it. Timely and efficient identification of heart disease plays a key role in healthcare, particularly in cardiology. In this article, we propose an efficient and accurate system, based on machine learning techniques, to diagnose heart disease. The system is developed using classification algorithms including Support vector machine, Logistic regression, Artificial neural network, K-nearest neighbor, Naïve Bayes, and Decision tree, while standard feature selection algorithms, namely Relief, Minimal redundancy maximal relevance, Least absolute shrinkage and selection operator, and Local learning, are used to remove irrelevant and redundant features. We also propose a novel fast conditional mutual information feature selection algorithm to solve the feature selection problem. The feature selection algorithms are used to increase classification accuracy and reduce the execution time of the classification system. Furthermore, the leave-one-subject-out cross-validation method is used for model assessment and hyperparameter tuning. Performance metrics are used to assess the classifiers, whose performance is checked on the features chosen by the feature selection algorithms. The experimental results show that the proposed feature selection algorithm (FCMIM) combined with the support vector machine classifier is feasible for designing a high-level intelligent system to identify heart disease. The suggested diagnosis system (FCMIM-SVM) achieved good accuracy compared with previously proposed methods. Additionally, the proposed system can easily be implemented in healthcare for the identification of heart disease.


I. INTRODUCTION
Heart disease (HD) is a critical health issue, and numerous people around the world suffer from it [1]. Common symptoms of HD include shortness of breath, physical weakness, and swollen feet [2]. Researchers are trying to find an efficient technique for detecting heart disease, because current diagnosis techniques are not very effective at early identification for several reasons, such as accuracy and execution time [3]. Diagnosing and treating heart disease is extremely difficult when modern technology and medical experts are not available [4]. Effective diagnosis and proper treatment can save the lives of many people [5]. According to the European Society of Cardiology, approximately 26 million people have been diagnosed with HD, and 3.6 million are diagnosed annually [6]. A large number of people in the United States suffer from heart disease [7]. HD is traditionally diagnosed by a physician through analysis of the patient's medical history, the physical examination report, and the relevant symptoms. However, the results obtained from this diagnosis method are not accurate in identifying HD patients. Moreover, it is expensive and computationally difficult to analyze [8]. There is therefore a need for a noninvasive diagnosis system based on machine learning (ML) classifiers to resolve these issues. Expert decision systems based on machine learning classifiers and on artificial fuzzy logic diagnose HD effectively and, as a result, reduce the death rate [9], [10]. The Cleveland heart disease data set has been used by various researchers [11], [12] for the HD identification problem. Machine learning predictive models need proper data for training and testing.

The associate editor coordinating the review of this manuscript and approving it for publication was Navanietha Krishnaraj Krishnaraj Rathinam.
The performance of a machine learning model can be increased if a balanced dataset is used for training and testing. Furthermore, the model's predictive capability can be improved by using proper and relevant features from the data. Therefore, data balancing and feature selection are significantly important for improving model performance. Various diagnosis techniques have been proposed in the literature; however, these techniques do not diagnose HD effectively. To improve the predictive capability of a machine learning model, data preprocessing is important for data standardization. The authors of [13] presented techniques for different types of feature selection, such as feature selection for high-dimensional small-sample-size data, feature selection for large-scale data, and secure feature selection. They also discussed important emerging topics in feature selection, such as stable, multi-view, distributed, multi-label, online, and adversarial feature selection. Jundong et al. [14] discussed the challenges of feature selection (FS) for big data. Because of the curse of dimensionality, it is necessary to reduce the dimensionality of data for various learning tasks. Feature selection has a great influence in numerous applications, for example by building simpler models, increasing learning performance, and creating clean and understandable data. Feature selection from big data is a challenging job because big data has many dimensions. The authors further discussed the challenges of feature selection for structured, heterogeneous, and streaming data, as well as its scalability and stability issues. Resolving these feature selection challenges is very important for big data analytics. The authors of [15] designed an unsupervised hashing scheme, called topic hypergraph hashing, to address the limitations of existing hashing methods.
Topic hypergraph hashing effectively mitigates the semantic shortage of hashing codes by exploiting auxiliary texts around images. It achieves superior performance compared with numerous state-of-the-art approaches and is more appropriate for mobile image retrieval. Feature selection algorithms are classified into three types: filter-based, wrapper-based, and embedded. Each of these mechanisms has advantages and limitations in certain cases. A filter method measures the relevance of a feature by its correlation with the dependent variable, whereas a wrapper method measures the usefulness of a subset of features by actually training a classifier on it. The filter method is therefore computationally less complex than the wrapper method. The feature set selected by a filter is general: it is independent of any specific model and can be applied to any model. In feature selection, global relevance is of greater importance.
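As an illustration of the difference between the two families, the following Python sketch (scikit-learn, on synthetic data rather than the Cleveland dataset) selects four features in two ways: with a filter, which scores each feature by its mutual information with the label without training any model, and with a wrapper, forward sequential selection, which repeatedly trains a logistic regression and keeps the subset with the best cross-validated accuracy. The estimator, k = 4, and the synthetic data are illustrative assumptions.

```python
from functools import partial

from sklearn.datasets import make_classification
from sklearn.feature_selection import (SelectKBest, SequentialFeatureSelector,
                                       mutual_info_classif)
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for a 13-feature dataset (illustrative only).
X, y = make_classification(n_samples=200, n_features=13, n_informative=4,
                           n_redundant=2, random_state=0)

# Filter: score every feature against the label; no classifier is trained.
filter_sel = SelectKBest(partial(mutual_info_classif, random_state=0), k=4).fit(X, y)

# Wrapper: greedily add the feature that most improves the 5-fold
# cross-validated accuracy of an actual classifier.
wrapper_sel = SequentialFeatureSelector(LogisticRegression(max_iter=1000),
                                        n_features_to_select=4,
                                        direction="forward", cv=5).fit(X, y)

print("filter :", filter_sel.get_support(indices=True))
print("wrapper:", wrapper_sel.get_support(indices=True))
```

The wrapper fits dozens of models to choose four features, while the filter computes one score per feature, which is the cost difference noted above.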
On the other hand, a suitable machine learning model is necessary for good results. A good machine learning model is one that performs well not only on data seen during training (otherwise, a model could simply memorize the training data) but also on unseen data. Evaluated over all possible data, classifiers are found to get, on average, 50% of the cases right [16]. Furthermore, appropriate cross-validation techniques and performance evaluation metrics are critically necessary when a model is trained and tested on a dataset.
In this research work, we propose a machine learning based diagnosis method for the identification of HD. Machine learning predictive models, including ANN, LR, K-NN, SVM, DT, and NB, are used to identify HD. Standard state-of-the-art feature selection algorithms, such as Relief, mRMR, LASSO, and Local-learning-based feature selection (LLBFS), are used to select the features. We also propose the fast conditional mutual information (FCMIM) feature selection algorithm. The leave-one-subject-out cross-validation (LOSO) technique is applied to select the best hyperparameters for model selection. In addition, different performance assessment metrics are used to evaluate the classifiers' performance. The proposed method is tested on the Cleveland HD dataset. Furthermore, the performance of the proposed technique is compared with state-of-the-art methods in the literature, such as NB [17], the three-phase ANN (artificial neural network) diagnosis system [18], neural network ensembles (NNE) [19], the ANN-Fuzzy-AHP diagnosis system (AFP) [20], and the adaptive-weighted-fuzzy-system-ensemble (AWFSE) [21]. This research study makes the following contributions.
• Firstly, the authors address the feature selection problem by employing pre-processing techniques and four standard state-of-the-art feature selection algorithms (Relief, mRMR, LASSO, and LLBFS) to obtain appropriate subsets of features. These features are then used for effective training and testing of the classifiers, to identify which feature selection algorithm and classifier give good results in terms of accuracy and computation time.
• Secondly, the authors propose the fast conditional mutual information (FCMIM) FS algorithm; the features it selects are input to the classifiers to improve prediction accuracy and reduce computation time. The classifiers' performance on the features selected by the standard state-of-the-art FS algorithms is compared with their performance on the features selected by the proposed FS algorithm.
• Thirdly, the authors identify weak features in the dataset that degrade the performance of the classifiers.
• Finally, the authors suggest a heart disease identification system (FCMIM-SVM) that effectively identifies HD.
VOLUME 8, 2020
The remaining sections of the paper are structured as follows. Section 2 discusses the literature related to the problem. Section 3 discusses the dataset and the theoretical and mathematical background of the feature selection and classification algorithms in detail, together with the cross-validation technique and the performance measuring metrics. Section 4 analyzes and discusses the results of all experiments in detail. Finally, Section 5 presents the conclusion and future directions of the research work.

II. LITERATURE REVIEW
Various machine learning based techniques have been proposed in the literature to diagnose HD. This study presents some existing machine learning based diagnosis techniques to explain the importance of the proposed work. Detrano et al. [11] developed an HD classification system using machine learning classification techniques, and the system achieved 77% accuracy. They used the Cleveland dataset with a global evolutionary method and a feature selection method. In another study, Gudadhe et al. [22] developed a diagnosis system using multi-layer perceptron and support vector machine (SVM) algorithms for HD classification and achieved 80.41% accuracy. Humar et al. [23] designed an HD classification system using a neural network integrated with fuzzy logic; the classification system achieved 87.4% accuracy. Resul et al. [19] developed an ANN-ensemble-based diagnosis system for HD, along with the statistical measuring system Enterprise Miner (5.2), and obtained 89.01% accuracy, 80.09% sensitivity, and 95.91% specificity. Akil et al. [24] designed an ML-based HD diagnosis system using the ANN-DBP algorithm together with an FS algorithm, and its performance was good. Palaniappan et al. [17] proposed an expert medical diagnosis system for HD identification. The system used machine learning predictive models such as Naïve Bayes (NB), Decision Tree (DT), and Artificial Neural Network (ANN). NB achieved 86.12% accuracy, ANN 88.12%, and DT 80.4%. Olaniyi et al. [18] developed a three-phase technique based on an artificial neural network for HD prediction in angina and achieved 88.89% accuracy. Samuel et al. [20] developed an integrated medical decision support system based on an artificial neural network and Fuzzy AHP for diagnosing HD; the proposed method achieved 91.10% accuracy. Liu et al. [25] proposed an HD classification system using Relief and rough set techniques, which achieved 92.32% classification accuracy.
The authors of [26] proposed an HD identification method using feature selection and classification algorithms, with the Sequential Backward Selection (SBS) algorithm used for feature selection. The performance of the K-Nearest Neighbor (K-NN) classifier was checked on the full and on the selected feature sets, and the proposed method obtained high accuracy. In another study, Mohan et al. [27] designed an HD prediction method using hybrid machine learning techniques. They also proposed a new method for selecting significant features from the data for effective training and testing of machine learning classifiers, and recorded 88.07% classification accuracy. Geweid et al. [28] designed HD identification techniques using an improved SVM based on a duality optimization technique. The limitations and advantages of the HD diagnosis methods discussed above are summarized in Table 1 to show the importance of our proposed approach. All these existing techniques use various methods to identify HD at early stages. However, they lack prediction accuracy and require high computation time for HD prediction. According to Table 1, the prediction accuracy of HD detection methods needs further improvement for efficient and accurate detection at early stages, which enables better treatment and recovery. Thus, the major issues in these previous approaches are low accuracy and high computation time, which might be due to the use of irrelevant features in the dataset. New methods are needed to tackle these problems and detect HD correctly. Improving prediction accuracy is a major challenge and research gap.

III. MATERIALS AND METHOD
The research materials and the background of the techniques used are discussed in the following subsections.
A. DATA SET
The Cleveland Heart Disease dataset [29] is used for testing in this study. The original dataset contains 303 instances and 75 attributes; however, all published experiments use a subset of 14 of them. In this work, we pre-processed the dataset and eliminated 6 samples because of missing values. The remaining dataset has 297 samples with 13 features and 1 output label. The output label has two classes, describing the absence and the presence of HD. Hence, a 297 × 13 feature matrix is formed. Details of the dataset matrix are given in Table 2.
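The missing-value removal and two-class label described above can be sketched with pandas as follows. The sketch assumes the UCI file layout of the 14-attribute Cleveland subset (missing values coded as '?', diagnosis column with values 0 to 4) and assumes that any nonzero diagnosis value is mapped to "presence of HD"; the column names are our own labels, not taken from this paper.

```python
import numpy as np
import pandas as pd

# Assumed names for the 14-attribute Cleveland subset, in file order.
COLUMNS = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
           "thalach", "exang", "oldpeak", "slope", "ca", "thal", "num"]


def clean_cleveland(df: pd.DataFrame):
    """Drop rows containing missing values ('?') and binarize the diagnosis."""
    df = df.replace("?", np.nan).dropna().astype(float)
    X = df.drop(columns="num").to_numpy()
    # Assumption: num == 0 means absence of HD; values 1-4 are merged into presence.
    y = (df["num"] > 0).astype(int).to_numpy()
    return X, y
```

Assuming the UCI file `processed.cleveland.data` is available locally, `clean_cleveland(pd.read_csv("processed.cleveland.data", header=None, names=COLUMNS))` should return a 297 × 13 feature matrix and a 297-element label vector, matching the counts reported above.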

B. PRE-PROCESSING OF DATA SET
Pre-processing of the dataset is required for a good data representation. Pre-processing techniques such as removing missing attribute values, Standard Scaler (SS), and Min-Max Scaler have been applied to the dataset.
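Both scalers named above are available in scikit-learn. The sketch below applies each to a small, made-up two-column matrix to show what it does; the values are illustrative only. In a cross-validated experiment, the scaler would be fitted on the training fold only, so that no information from the test subject leaks into the transformation.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Two made-up numeric columns on very different scales.
X = np.array([[63.0, 145.0],
              [67.0, 160.0],
              [37.0, 130.0],
              [41.0, 130.0]])

X_std = StandardScaler().fit_transform(X)  # per column: zero mean, unit variance
X_mm = MinMaxScaler().fit_transform(X)     # per column: rescaled to [0, 1]
```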

C. STANDARD STATE-OF-THE-ART FEATURE SELECTION ALGORITHMS
After data pre-processing, feature selection is required. In general, FS is a significant step in constructing a classification model. It reduces the number of input features of a classifier in order to obtain models with good predictive performance and low computational complexity [30]. We used four standard state-of-the-art FS algorithms and one proposed FS algorithm in this study.

1) RELIEF
The Relief [31] algorithm assigns a weight to each feature of the dataset and updates the weights iteratively. Features with high weight values are selected, and those with low weights are discarded. Relief determines feature weights through a nearest-neighbor process similar to that of K-NN [32]. Relief iterates over m random training samples R_k, chosen without replacement, where m is a user-defined parameter. At each iteration k, R_k is the 'target' sample, and the weight vector W is updated [33]. Algorithm 1 gives the pseudo-code of the Relief FS algorithm.
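The Relief update described above can be sketched in NumPy as follows. This is a minimal two-class version built on our own assumptions (range-normalized Manhattan distance, one nearest hit and one nearest miss per target, at least two samples per class); it is not the authors' exact Algorithm 1.

```python
import numpy as np


def relief(X, y, m, rng=None):
    """Basic two-class Relief feature weights (higher weight = more relevant)."""
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, d = X.shape
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0  # avoid division by zero for constant features
    W = np.zeros(d)
    # m target samples R_k, drawn without replacement.
    for k in rng.choice(n, size=m, replace=False):
        diffs = np.abs(X - X[k]) / span  # per-feature normalized differences to R_k
        dist = diffs.sum(axis=1)
        dist[k] = np.inf                 # the target is not its own neighbor
        same = y == y[k]
        hit = np.argmin(np.where(same, dist, np.inf))    # nearest same-class sample
        miss = np.argmin(np.where(~same, dist, np.inf))  # nearest other-class sample
        # Reward features that separate R_k from the miss, penalize those that
        # separate it from the hit.
        W += (diffs[miss] - diffs[hit]) / m
    return W
```

Features would then be ranked by `W` in descending order and the top-ranked ones kept.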

2) MINIMAL-REDUNDANCY-MAXIMAL-RELEVANCE
The MRMR algorithm chooses features that are relevant for prediction and mutually non-redundant. It does not take the joint effect of combinations of features into account [32]. The MRMR pseudo-code is given in Algorithm 2 [34].
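A greedy mRMR selector in the "mutual information difference" form (relevance minus mean redundancy) can be sketched as follows. The sketch assumes that the features have already been discretized (for example with scikit-learn's `KBinsDiscretizer`) so that scikit-learn's `mutual_info_score` applies to them; it is an illustration, not the exact Algorithm 2.

```python
import numpy as np
from sklearn.metrics import mutual_info_score


def mrmr(X_discrete, y, k):
    """Greedy mRMR: maximize I(feature; y) minus mean I(feature; selected)."""
    d = X_discrete.shape[1]
    relevance = np.array([mutual_info_score(y, X_discrete[:, j]) for j in range(d)])
    selected = [int(np.argmax(relevance))]  # start with the most relevant feature
    while len(selected) < k:
        best_j, best_score = None, -np.inf
        for j in range(d):
            if j in selected:
                continue
            # Average redundancy with the features chosen so far.
            redundancy = np.mean([mutual_info_score(X_discrete[:, j], X_discrete[:, s])
                                  for s in selected])
            score = relevance[j] - redundancy
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected
```

A duplicated copy of an already selected feature gets a score near zero here, because its redundancy cancels its relevance.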

3) LEAST-ABSOLUTE-SHRINKAGE-AND-SELECTION-OPERATOR ALGORITHM
LASSO selects features by shrinking the absolute coefficient values of the features (an L1 penalty). The coefficients of less relevant features are driven to zero, and the features with zero coefficients are eliminated from the feature set.
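One simple way to realize LASSO-based selection is to fit scikit-learn's `Lasso` with the class label as the regression target and keep the features whose coefficients stay nonzero, as sketched below on synthetic data in which only features 0 and 1 determine the label. The fixed `alpha` (scikit-learn's name for the penalty weight, called lambda in the text) is an assumption; `LassoCV` could instead choose it by cross-validation.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(float)  # only features 0 and 1 matter

lasso = Lasso(alpha=0.05).fit(X, y)        # L1 penalty shrinks weak coefficients to 0
selected = np.flatnonzero(lasso.coef_)     # indices of features that survived
print(selected)
```

Increasing `alpha` zeroes out more coefficients and therefore keeps fewer features.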

4) LOCAL-LEARNING-BASED FEATURE SELECTION ALGORITHM
LLBFS assigns weights to features and decomposes a complex non-linear problem into a set of locally linear problems. Features having large weights are selected [36].

D. PROPOSED FEATURE SELECTION ALGORITHM
To tackle the feature selection problem, we propose the fast conditional mutual information (FCMIM) feature selection algorithm [37] in this study. It is an efficient feature selection method derived from conditional mutual information (CMI). The FCMIM algorithm is designed through the following procedure. Consider a dataset O(X, Y), where X denotes the instances and Y the output labels, as written in Eq. 1.
where x_i can be written as in Eq. 2.
We apply pre-processing statistical techniques, such as Min-Max normalization, to the dataset O(X, Y), as expressed in Eq. 3. FCMIM conditions the mutual information between each candidate feature and the output on the result of any feature selected before (O). This condition favors features that differ from those already selected: a feature that is individually relevant is not selected if it does not carry additional information about the output class. It therefore provides a good trade-off between relevance and redundancy [37]. A high FCMIM value shows that feature X_n is more relevant to the output Y and complements the already selected features X_j, where j ∈ O [38]. Mathematically, the stated condition is expressed in Eq. 4.

Algorithm 3 (FCMIM), lines 6-20:
6:      set L_i ← 0
7:  end for
8:  for k ← 1 to K do
        initialize score_k ← 0
9:      for each feature o_i in O do
10:         while p_i > score_k and L_i < k − 1 do
11:             set L_i ← L_i + 1
12:             calculate CM_ik between o_k and o_i
13:             set p_i ← min(p_i, CM_ik)
14:         end while
15:         if p_i > score_k then
16:             set score_k ← p_i
17:
19:     end for
20: end for
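Assuming that FCMIM follows the conditional mutual information maximization (CMIM) rule of Fleuret, on which fast CMIM implementations are based, the selection condition described above can be written as follows. Here O is the set of already selected features; for the first feature, O is empty and the score reduces to I(Y; X_n). The second identity expresses the CMI through joint entropies.

```latex
\nu_k = \operatorname*{arg\,max}_{n \notin O} \; \min_{j \in O} \; I\left(Y ; X_n \mid X_j\right),
\qquad
I\left(Y ; X_n \mid X_j\right) = H(X_n, X_j) + H(Y, X_j) - H(Y, X_n, X_j) - H(X_j)
```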
The FCMIM algorithm tries to balance individual discriminative power against independence by comparing each new feature with the features already selected. A feature X_0 is considered good only if I(Y; X_0 | X) is large for every already selected feature X. The fast implementation updates feature scores during the selection process and evaluates the CMI only for features that can still provide more information while being less redundant. FCMIM keeps a partial score P_i for every feature O_i, equal to the minimum of the CMI values computed so far for the min in Eq. 4. The vector L stores, for each feature O_i, the index of the last selected feature taken into account when computing P_i. The FCMIM pseudo-code is given in Algorithm 3.
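The fast implementation described above can be sketched in Python as follows. We assume that FCMIM follows Fleuret's fast CMIM scheme, with `p_score[i]` playing the role of P_i and `n_used[i]` the role of L_i. The sketch uses plug-in entropy estimates, so it assumes discrete (for example, binned) features; the helper names are our own.

```python
import numpy as np


def entropy(*cols):
    """Plug-in entropy (bits) of the joint distribution of discrete columns."""
    joint = np.stack(cols, axis=1)
    _, counts = np.unique(joint, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def mutual_info(y, x):
    return entropy(x) + entropy(y) - entropy(x, y)


def cond_mutual_info(y, x, z):
    """I(Y; X | Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    return entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)


def fast_cmim(X, y, k):
    """Select k feature indices with the CMIM criterion using lazy score updates."""
    n = X.shape[1]
    p_score = np.array([mutual_info(y, X[:, j]) for j in range(n)])  # partial scores P_i
    n_used = np.zeros(n, dtype=int)  # how many selected features P_i already accounts for
    selected = []
    for _ in range(min(k, n)):
        best_score, best_j = -1.0, -1
        for j in range(n):
            if j in selected:
                continue
            # P_i only decreases, so stop refining once it cannot beat the current best.
            while p_score[j] > best_score and n_used[j] < len(selected):
                s = selected[n_used[j]]
                p_score[j] = min(p_score[j], cond_mutual_info(y, X[:, j], X[:, s]))
                n_used[j] += 1
            if p_score[j] > best_score:
                best_score, best_j = p_score[j], j
        selected.append(best_j)
    return selected
```

Because the partial score is an upper bound on the true CMIM score, skipping further updates once it falls below the round's best score never changes which feature is chosen; it only avoids CMI evaluations.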

E. CLASSIFIERS
The classifiers used in this paper to identify heart disease are briefly described in Table 3.
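A possible scikit-learn configuration of the six classifiers is sketched below. The hyperparameter values are those reported in the experimental results (C = 10 for LR, k = 7 for K-NN, 20 hidden neurons for the MLP, C = 100 and g = 0.0009 for SVM); the use of `GaussianNB`, the `max_iter` values, and the random seeds are our assumptions, because the text does not specify them.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def make_classifiers():
    """Hypothetical classifier set mirroring the hyperparameters reported in the text."""
    return {
        "LR": LogisticRegression(C=10, max_iter=1000),
        "K-NN": KNeighborsClassifier(n_neighbors=7),
        "ANN (MLP)": MLPClassifier(hidden_layer_sizes=(20,), max_iter=2000, random_state=0),
        "SVM (RBF)": SVC(kernel="rbf", C=100, gamma=0.0009),
        "SVM (linear)": SVC(kernel="linear", C=100),
        "NB": GaussianNB(),
        "DT": DecisionTreeClassifier(random_state=0),
    }
```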

F. LEAVE-ONE-SUBJECT-OUT CROSS-VALIDATION TECHNIQUE
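In the LOSO validation strategy, one subject is held out as test data, and the remaining subjects are used to train the model. If the trained model predicts HD for the test subject, the subject is classified as having HD; otherwise, the subject is classified as healthy. The process is repeated so that each subject is used once as the test subject.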
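Assuming that each subject contributes exactly one record, as in the Cleveland data, LOSO coincides with leave-one-out cross-validation, which scikit-learn provides as `LeaveOneOut`. The sketch below, on synthetic data, collects one out-of-fold prediction per subject; with several records per subject, `LeaveOneGroupOut` with subject IDs as groups would be the analogue.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict

X, y = make_classification(n_samples=50, n_features=13, n_informative=4, random_state=0)

# Each fold trains on n - 1 subjects and predicts the single held-out subject.
y_pred = cross_val_predict(LogisticRegression(C=10, max_iter=1000), X, y,
                           cv=LeaveOneOut())
accuracy = (y_pred == y).mean()
```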

G. PERFORMANCE EVALUATION METRICS
Different performance evaluation metrics are used to evaluate the classifiers' performance [50], [51]. These metrics are calculated from the confusion matrix; Table 4 shows the binary classification (confusion) matrix. From Table 4, we computed the performance evaluation metrics given mathematically in Eqs. 5-9.
Here, MCC is the Matthews correlation coefficient.
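Because these metrics are computed from the confusion matrix, a small helper is sketched below using the standard definitions of accuracy, sensitivity, specificity, and MCC; that Eqs. 5-9 use exactly these forms is our assumption. TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, with HD as the positive class.

```python
import math


def confusion_metrics(tp, tn, fp, fn):
    """Standard binary metrics from confusion-matrix counts (HD = positive class)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)  # true positive rate: HD patients detected
    specificity = tn / (tn + fp)  # true negative rate: healthy subjects detected
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "mcc": mcc}


m = confusion_metrics(tp=40, tn=45, fp=5, fn=10)
```

Unlike accuracy, MCC stays informative when the two classes are unbalanced, since it uses all four cells of the matrix.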

H. PROPOSED HEART DISEASE DIAGNOSIS METHODOLOGY
The system is designed to identify heart disease. The performance of various machine learning classifiers for HD identification is checked on selected features. The standard state-of-the-art feature selection algorithms Relief, MRMR, LASSO, and LLBFS, as well as the proposed FCMIM algorithm, are used for feature selection, and the classifiers are evaluated on the feature sets selected by each of them. The LOSO cross-validation technique is also used for model evaluation. The performance metrics, which include accuracy, specificity, sensitivity, MCC, and processing time, are calculated automatically for each classifier. The proposed system methodology is organized into the following steps: preprocessing of the dataset, feature selection, cross-validation, machine learning classification, and computation of the performance evaluation metrics. Algorithm 4 gives the pseudo-code of the proposed system:
2: Pre-process the heart disease dataset using pre-processing methods
3: Select features using the standard state-of-the-art and the proposed FCMIM FS algorithms
4: Train the classifiers using the training dataset
5: Validate using the testing dataset
6: Compute the performance evaluation metrics
7: End
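The steps of Algorithm 4 can be sketched as a scikit-learn pipeline evaluated with leave-one-out (LOSO with one record per subject). `SelectKBest` with mutual information is only a stand-in for the FS algorithms above (it is not FCMIM), and the synthetic data, k = 6, and the linear SVM with C = 100 are illustrative assumptions.

```python
from functools import partial

from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=60, n_features=13, n_informative=4, random_state=0)

pipeline = make_pipeline(
    MinMaxScaler(),                                                  # pre-processing
    SelectKBest(partial(mutual_info_classif, random_state=0), k=6),  # stand-in FS step
    SVC(kernel="linear", C=100),                                     # classifier
)
# Every LOSO fold re-fits scaler, selector, and classifier on the training part only.
y_pred = cross_val_predict(pipeline, X, y, cv=LeaveOneOut())
accuracy = (y_pred == y).mean()
```

Placing the scaler and selector inside the pipeline means the held-out subject never influences scaling or feature selection, which keeps the LOSO estimate honest.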

A. EXPERIMENTAL DESIGN SETUP
Supervised classification experiments were conducted to evaluate the classification performance of the classifiers. In the first phase, the standard feature selection algorithms Relief, MRMR, LASSO, and LLBFS were applied to select appropriate features. In the second phase, the proposed FS algorithm was used for feature selection. The classifiers' performance was then evaluated on the selected features, with the LOSO CV method applied to each classifier, and various performance evaluation metrics were computed. All experiments were performed in a Python environment using different machine learning libraries on an Intel(R) Core(TM) i7-2400 CPU @ 3.10 GHz system.

1) RESULTS OF DATA PRE-PROCESSING TECHNIQUES
Different statistical operations, such as removing missing attribute values, Standard Scaler (SS), Min-Max Scaler, mean, and standard deviation, have been applied to the dataset. The results of these operations are reported in Table 5.
The processed dataset has 297 instances and 13 input attributes with one output label. Data visualization is the presentation of data in graphical format. It helps people understand the significance of data by summarizing and presenting large amounts of data in a simple, easy-to-understand format, and it helps communicate information clearly and effectively. Figure 2 shows histograms of the dataset, which represent how frequently values fall within consecutive, fixed-width intervals, and Figure 3 shows the correlation among the features of the dataset as a heat map. A heat map is a two-dimensional representation of data in which colors represent values. A single heat map provides a quick visual summary of information, and more elaborate heat maps allow the viewer to understand complex datasets. Furthermore, heat maps are very useful for seeing which intersections of categorical values have a higher concentration of data than others.
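The values behind a correlation heat map such as Figure 3 can be computed with pandas' `DataFrame.corr`, which returns the pairwise Pearson correlation matrix. The three columns below are synthetic and only illustrative; the resulting matrix could then be drawn with, for example, matplotlib's `imshow`.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({"f1": rng.normal(54, 9, 100)})
df["f2"] = 200 - 0.9 * df["f1"] + rng.normal(0, 10, 100)  # negatively related to f1
df["f3"] = rng.normal(247, 50, 100)                       # unrelated column

corr = df.corr()  # Pearson correlation matrix: the values a heat map colors
```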

2) FEATURES SELECTED BY STANDARD STATE-OF-THE-ART ALGORITHMS
The important features selected by the Relief, MRMR, LASSO, and LLBFS FS algorithms after data pre-processing are reported in Table 6, along with their feature scores and rankings. According to the Relief results, the most important features for identifying heart disease are THA, EIA, and CPT. The other FS algorithms also select important features such as THA, CPT, SEX, VCA, and EIA, which are therefore well suited to heart disease identification. In contrast, FBS has a low feature score. Some features are selected by almost every FS algorithm. Figure 4 shows the feature scores and rankings of the four FS algorithms graphically. LASSO makes a binary decision for each feature: it labels the features most related to the output target class as true and the remainder as false. Of the 13 features, LASSO labeled 5 as true; these selected features are reported in Table 6. The LASSO cross-validation mean squared error results are shown in Figure 5, where lambda is the weight parameter, with values in [0, 1]. In Figure 5, the y-axis is a validation

In Table 7, we report the features selected by the FCMIM FS algorithm along with their feature scores, which are shown graphically in Figure 6.

Other parameter values were also passed during the training process. Table 8 presents the performance evaluation of the classifiers with LOSO CV. According to Table 8, logistic regression performed well, obtaining 84% accuracy, 93% specificity, 75% sensitivity, 84% MCC, and a processing time of 0.003 seconds at C = 10, compared with other values of the parameter C. For K-NN, experiments were conducted with different values of k, and K-NN performed best at k = 7. The ANN was trained with different numbers of hidden neurons; with 10 hidden neurons it gave its best result, with 60% accuracy, 100% specificity, and 0% sensitivity. SVM (RBF) with C = 100 and g = 0.001 has 61% specificity, 70% sensitivity, and 70% accuracy. SVM with a linear kernel has 95% specificity, 75% sensitivity, and 85% accuracy. NB was the third-best classifier, with 90% specificity, 78% sensitivity, and 80% accuracy. DT has 72% specificity, 83% sensitivity, and 70% accuracy. Figure 7 shows that SVM outperformed the other five classifiers: SVM (linear) achieved 85% accuracy, 77% sensitivity, and 95% specificity. Logistic regression is the second-best classifier, with 84% accuracy, followed by NB. The worst classifier was K-NN at k = 1 with LOSO cross-validation. The MCC of SVM is 85%, which is quite good, so SVM is a good classifier for heart disease prediction. Figure 11 shows the execution time of each algorithm with the LOSO cross-validation method: SVM (linear) with C = 100 and g = 0.009 has a processing time of 30.145 seconds, whereas logistic regression at C = 10 takes only 0.003 seconds, a very fast execution time compared with the other classifiers. Table 8 shows the LOSO cross-validation performance of the classifiers with full features.
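Choosing hyperparameters such as C by LOSO accuracy, as described above, can be sketched with scikit-learn's `GridSearchCV` and `LeaveOneOut` on synthetic data; the grid values are illustrative assumptions. Reporting the best LOSO score of the same search gives a somewhat optimistic estimate, and nested cross-validation would avoid that bias.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, LeaveOneOut

X, y = make_classification(n_samples=40, n_features=13, n_informative=4, random_state=0)

# Every candidate C is scored by its mean leave-one-out accuracy.
search = GridSearchCV(LogisticRegression(max_iter=1000),
                      param_grid={"C": [0.001, 0.01, 0.1, 1, 10, 100]},
                      cv=LeaveOneOut(), scoring="accuracy")
search.fit(X, y)
print(search.best_params_, search.best_score_)
```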
In the following sections, the classifiers' performance is evaluated on the features selected by the standard FS algorithms.

5) CLASSIFIER PERFORMANCE ON THE FEATURE SET SELECTED BY THE RELIEF FS ALGORITHM
In this experiment, the features selected by Relief (Table 6) are used with different classifiers and the LOSO CV method, with various parameter values for the classifiers. The classifiers are first trained and tested on a subset of 3 features, then on subsets of 4, 6, 8, and 10 features, and finally on 12 features. The classifiers performed well on the 6-feature set. Thus, 7 tables were constructed with LOSO; however, we report only the classifiers' performance on the set of 6 important features, in Table 9. Additionally, some graphs were created for a better understanding of the results.
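The procedure of evaluating classifiers on nested subsets of the top-ranked features can be sketched as follows. The `ranking` array and the synthetic data are hypothetical placeholders for a ranking such as Relief's in Table 6; the subset sizes match those listed above. Because the ranking is fixed before cross-validation, a ranking computed on the full dataset would make these estimates somewhat optimistic.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score


def accuracy_by_subset_size(X, y, ranking, sizes, estimator):
    """Mean leave-one-out accuracy using the top-k ranked features, for each k."""
    results = {}
    for k in sizes:
        cols = list(ranking[:k])  # the k highest-ranked feature indices
        scores = cross_val_score(estimator, X[:, cols], y, cv=LeaveOneOut())
        results[k] = scores.mean()
    return results


X, y = make_classification(n_samples=40, n_features=13, n_informative=4, random_state=0)
ranking = np.arange(13)  # hypothetical ranking, best feature first
results = accuracy_by_subset_size(X, y, ranking, [3, 4, 6, 8, 10, 12],
                                  LogisticRegression(C=10, max_iter=1000))
```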
According to Table 9, logistic regression with C = 10 performed very well, obtaining 85% accuracy, 98% specificity, 72% sensitivity, and 88% MCC, with a low processing time of 0.001 seconds on the reduced 6-feature set, compared with other values of the hyperparameter C. The performance of logistic regression clearly improves with feature selection, with significant improvements in all evaluation metrics: its classification accuracy was 84% on the full feature set and 85% on the reduced set. We used various values of k; with k = 7, K-NN performed well in all metrics, achieving 80% accuracy and a computation time of 4.266 seconds on the selected features with the LOSO validation method. However, K-NN did not perform well on the full features with the same value, k = 7. The processing times of K-NN with k = 7 on the full and selected features were 6.601 seconds and 4.266 seconds, respectively. This is one of the advantages of feature selection for the classification problem. The ANN was designed as an MLP with various numbers of hidden units. With 20 hidden units, the MLP gave its best results on the selected features with the LOSO validation method, obtaining 80% classification accuracy, compared with 55% on the full features. This clearly shows the performance improvement from feature selection. The computation time of the ANN algorithm was also reduced, from 9.777 seconds to 1.867 seconds. SVM (RBF) performed best at C = 100 and g = 0.0009 compared with other values of C and g, as shown in Table 9. SVM (RBF) obtained 81% accuracy on the selected features and 57% on the full features with the LOSO validation method. Its computational time was 0.003 seconds on the selected features, compared with 0.008 seconds on the full features.
SVM (linear) at C = 100 and g = 0.0009 achieved 86% accuracy, with a computational time of 11.569 seconds, on the reduced feature set selected by Relief with the LOSO validation method. NB accuracy was 75% on the full features and improved by 1% on the reduced features. Similarly, DT accuracy improved from 70% to 73% with the reduced features. As shown in Figure 7, in terms of accuracy SVM performed better than the other classifiers on the selected features. The ANN specificity of 100% is good for detecting healthy people. The sensitivity of SVM (linear) on the selected features is 76%, which is better than its sensitivity on the full features, so SVM is good for detecting people with heart disease. The performance evaluation of the different classifiers with the Relief feature selection algorithm is shown in Figure 8 for a better demonstration of the results.

6) CLASSIFIER PERFORMANCE ON FEATURES SELECTED BY MRMR
In this section, the features selected by mRMR were used in the classifiers with LOSO CV, again with various parameter values. The classifiers were first trained and tested on a subset of 3 features, then on subsets of 5, 7, 9, and 11 features, and finally on 12 features. The classifiers' results were good on the 6-feature subset. In total, 8 tables were formed with LOSO CV, but we report only the classifiers' results on the 6-feature set, in Table 10, because the overall results at 6 features were better than those of the experiments on the 3, 5, 9, 11, and 12 feature sets. In Table 10, LR with C = 10 gives high performance, achieving 86% accuracy, 97% specificity, 73% sensitivity, and 87% MCC. There are significant improvements in all evaluation metrics: the classification accuracy of LR was 84% on all features and 86% on the selected features with the same parameter value. K-NN with k = 7 gives high results in all metrics, with 82% accuracy and a computation time of 2.376 seconds on the selected features with the LOSO validation method. However, K-NN did not perform well on the full features with the same value, k = 7. The processing times of K-NN at k = 7 on the full and selected features were 6.601 seconds and 3.276 seconds, respectively. The ANN was built as a multilayer perceptron with various numbers of hidden neurons. With 20 hidden neurons, the MLP gave its best results on the selected features, obtaining 80% classification accuracy, compared with 55% on the full features. This clearly shows the performance improvement from feature selection. The computation time of the ANN algorithm was also reduced, from 9.777 seconds to 2.867 seconds. The results of SVM (RBF) at C = 100 and g = 0.0009 were better than those for other values of C and g, as shown in Table 9.
SVM (RBF) achieved 83% accuracy on the selected features and 57% on the full feature set with the LOSO validation method. Its computational time was 0.103 seconds on the selected features, compared with 0.008 seconds on the full feature set. SVM (linear) with C = 100 and g = 0.0009 achieved 87% accuracy, with a computational time of 7.509 seconds, on the reduced set of the 6 most important features selected by MRMR. NB accuracy was 77% on the full feature set and improved by 2% on the reduced feature set. Similarly, DT accuracy improved from 70% to 78% with the reduced features under the LOSO validation method. As shown in Figure 8, SVM achieved higher accuracy than the other classifiers on the selected features. Logistic regression achieved the highest specificity (99%), which makes it well suited to identifying healthy people. Linear SVM reached 79% sensitivity on the selected features, higher than its sensitivity on the full feature set, so SVM is well suited to detecting people with heart disease. Figure 9 shows the performance of the different classifiers with the MRMR feature selection algorithm.
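The mRMR criterion used in this section balances relevance to the class against redundancy with features already chosen. The following is a minimal sketch of greedy mRMR in its difference form (relevance minus mean redundancy), assuming the features have already been discretised so that mutual information can be estimated by counting. It is not the implementation used in these experiments; the function names mutual_info and mrmr and the toy data are hypothetical.

```python
from collections import Counter
from math import log


def mutual_info(a, b):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    # sum over (x, y) of p(x, y) * log(p(x, y) / (p(x) * p(y)))
    return sum(c / n * log(c * n / (pa[x] * pb[y])) for (x, y), c in pab.items())


def mrmr(columns, y, k):
    """Greedy mRMR, difference form (illustrative sketch).

    columns: list of discrete feature columns; y: class labels;
    k: subset size. Returns column indices in selection order.
    """
    relevance = [mutual_info(col, y) for col in columns]
    selected = [max(range(len(columns)), key=lambda j: relevance[j])]
    while len(selected) < min(k, len(columns)):
        candidates = [j for j in range(len(columns)) if j not in selected]

        def score(j):
            # Relevance to the class minus mean redundancy with chosen features
            redundancy = sum(mutual_info(columns[j], columns[s]) for s in selected)
            return relevance[j] - redundancy / len(selected)

        selected.append(max(candidates, key=score))
    return selected
```

Because the selection is greedy, calling mrmr with increasing k produces nested subsets, so subsets of 3, 5, 6, 7, and more features can be read off a single ranking. In the toy call mrmr([[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 0, 1]], [0, 1, 2, 3], 2), the duplicate second column is skipped in favour of the third, which carries new information about the class.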

7) CLASSIFIER RESULTS ON FEATURES SELECTED BY THE LASSO ALGORITHM
The features selected by LASSO were used by the classifiers with LOSO CV. We first used a 3-feature set, then 4- and 6-feature sets, then 8- and 10-feature sets, and finally a 12-feature set. The classifiers performed best on the 6-feature set. Hence, although 8 tables were constructed from these results, we describe only the results on the 6-feature set in Table 11, because the overall results at 6 features were better than those for the 3-, 4-, 8-, 10-, and 12-feature sets. According to Table 11, logistic regression with hyperparameter C = 10 performed very well, obtaining 87% accuracy, 95% specificity, 74% sensitivity, and 86% MCC, with a low processing time of 0.001 seconds on the reduced 6-feature set compared with other values of C under LOSO validation. The performance of logistic regression clearly improves with feature selection, and all evaluation metrics improved. The 95% specificity shows that logistic regression is very effective at identifying healthy people, and its 74% sensitivity reflects its ability to detect people with heart disease. K-NN with k = 7 shows the best results. The ANN was implemented as a multilayer perceptron, and various numbers of hidden neurons were used in the MLP. With 20 hidden neurons, the MLP performed best on the selected feature set with the LOSO validation method, reaching 82% classification accuracy compared with 55% on the full feature set. This difference shows the performance gain from feature selection. The computation time of the ANN also fell from 9.777 seconds to 5.931 seconds. The specificity of the ANN was 94% at 20 hidden neurons, so the ANN is also good at identifying healthy people. The results of SVM (RBF) at C = 100 and Table 11.
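LASSO acts as a feature selector because its L1 penalty drives the coefficients of uninformative features exactly to zero. The following minimal sketch shows this mechanism: it minimises (1/2n)||y - Xb||^2 + alpha * ||b||_1 by cyclic coordinate descent with soft thresholding on standardised features, then keeps the features whose coefficients are non-zero. The objective scaling, the value of alpha, treating the 0/1 diagnosis label as a regression target, and the names soft_threshold and lasso_select are assumptions for illustration; the exact LASSO set-up of these experiments is not specified here.

```python
def soft_threshold(rho, alpha):
    """Shrink rho towards zero by alpha (the L1 proximal step)."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0


def lasso_select(X, y, alpha, n_iter=200):
    """Indices of features with non-zero LASSO coefficients (sketch).

    Minimises (1/2n)||y - Xb||^2 + alpha*||b||_1 by cyclic coordinate
    descent on standardised features and a centred response.
    """
    n, d = len(X), len(X[0])
    cols = []
    for j in range(d):
        col = [row[j] for row in X]
        mean = sum(col) / n
        std = (sum((v - mean) ** 2 for v in col) / n) ** 0.5 or 1.0
        cols.append([(v - mean) / std for v in col])
    y_mean = sum(y) / n
    r = [v - y_mean for v in y]  # residual y - Xb for b = 0
    b = [0.0] * d
    for _ in range(n_iter):
        for j in range(d):
            x = cols[j]
            z = sum(v * v for v in x) / n
            if z == 0:
                continue  # constant feature: coefficient stays zero
            # Correlation of feature j with the partial residual (b_j removed)
            rho = sum(x[i] * (r[i] + x[i] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, alpha) / z
            delta = new_bj - b[j]
            if delta:
                r = [r[i] - x[i] * delta for i in range(n)]
            b[j] = new_bj
    return [j for j in range(d) if b[j] != 0.0]
```

In the toy call lasso_select([[1, 1], [2, -1], [3, -1], [4, 1]], [1, 2, 3, 4], alpha=0.1), only the first feature is kept, because the second is uncorrelated with the target. Increasing alpha yields sparser subsets, which is one way to obtain subsets of different sizes.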
SVM (RBF) obtained 85% accuracy on the selected features and 57% on the full feature set. Its computational time was 0.007 seconds on the selected features, compared with 0.008 seconds on the full feature set. SVM (linear) with C = 100 and g = 0.0009 achieved 86% accuracy, with a computational time of 0.021 seconds, on the 6 features selected by LASSO with the LOSO validation method. NB accuracy was 75% on the full feature set and 76% on the reduced feature set, an improvement of only 1%. Similarly, DT accuracy improved from 70% to 79% with the reduced features under the LOSO validation method. The decision tree accuracy was 78% on the selected features. As shown in Figure 10, logistic regression achieved higher accuracy than the other classifiers on the selected features. Logistic regression achieved the highest specificity (97%), which makes it well suited to identifying healthy people. DT reached 78% sensitivity on the selected feature set, which reflects its ability to identify people with heart disease. Figure 10 shows the performance of the different classifiers with the LASSO feature selection algorithm.
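The NB classifier gains only about 1% from feature selection in these experiments. For readers unfamiliar with how NB scores a patient record, the following sketch shows a Gaussian naive Bayes classifier: each class gets a prior and, for every feature, a normal distribution with its own mean and variance, and a record is assigned to the class with the largest log posterior. The Gaussian variant, the small variance floor, and the names fit_gaussian_nb and predict_gaussian_nb are assumptions; the NB variant used in the experiments is not specified here.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Per-class log prior, feature means and variances (sketch)."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    model = {}
    for label, rows in groups.items():
        n, d = len(rows), len(rows[0])
        means = [sum(r[j] for r in rows) / n for j in range(d)]
        # Small variance floor (an assumption) keeps constant features finite
        var = [sum((r[j] - means[j]) ** 2 for r in rows) / n + 1e-9
               for j in range(d)]
        model[label] = (math.log(n / len(X)), means, var)
    return model


def predict_gaussian_nb(model, x):
    """Class with the largest log prior plus summed feature log-likelihoods."""
    def log_posterior(label):
        log_prior, means, var = model[label]
        # Naive independence assumption: feature log-likelihoods simply add up
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
            for xi, m, v in zip(x, means, var))
    return max(model, key=log_posterior)
```

Because every feature contributes an independent log-likelihood term, removing a few weak features changes NB's scores only slightly, which is consistent with the small NB gains reported above.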

8) CLASSIFIER RESULTS ON FEATURES SELECTED BY LLBFS
In these experiments, the features selected by the LLBFS FS algorithm were used by the classifiers with LOSO CV, and various parameter values were used with the classifiers. Although 8 tables were constructed, we report only the results for the 6-feature subset in Table 12. Table 12 shows that logistic regression with hyperparameter C = 10 performed very well, obtaining 88% accuracy, 93% specificity, 75% sensitivity, and 89% MCC, with an execution time of 0.001 seconds on the selected features, compared with other values of C under LOSO validation. The 93% specificity shows that logistic regression is very effective at identifying healthy people, and its 75% sensitivity reflects its ability to detect people with heart disease. We used different values of k, and K-NN performed best on the selected feature set with k = 7. The ANN was implemented as an MLP with various numbers of hidden neurons. With 40 hidden neurons, the MLP performed best on the selected features with the LOSO validation method, reaching 81% classification accuracy compared with 55% on the full feature set. The computation time of the ANN also fell from 9.777 seconds to 2.501 seconds. The results of SVM (RBF) at C = 100 and g = 0.0009 were higher than for the other values of C and g, as shown in Table 9. SVM (RBF) achieved 82% accuracy on the selected features and 57% on the full feature set. Its computational time was 0.002 seconds on the selected features, compared with 0.008 seconds on the full feature set. SVM (linear) with C = 100 and g = 0.0009 achieved 87% accuracy, with a computational time of 0.032 seconds, on the reduced set of the 6 most important selected features. NB accuracy was 75% on the full feature set and 76% on the reduced feature set, an improvement of only 1%. Similarly, DT accuracy improved from 70% to 74% with the reduced feature set under the LOSO validation method.
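All results in this section are obtained with leave-one-subject-out (LOSO) cross-validation, and K-NN performs best at k = 7. The following sketch shows LOSO evaluation of a K-NN classifier: for each subject, all of that subject's records are held out, the model is fitted on the rest, and the held-out records are predicted. If each Cleveland record comes from a different patient, LOSO coincides with leave-one-out CV. Euclidean distance, majority voting, and the names knn_predict and loso_accuracy are assumptions for illustration, not the exact protocol of these experiments.

```python
from collections import Counter


def knn_predict(train_X, train_y, x, k=7):
    """Majority vote among the k nearest training rows (Euclidean distance)."""
    order = sorted(range(len(train_X)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def loso_accuracy(X, y, subjects, k=7):
    """Leave-one-subject-out CV accuracy of K-NN (sketch).

    subjects[i] identifies the patient that row i belongs to; all rows of
    one subject are held out together, so no patient appears in both the
    training and the test fold.
    """
    correct = 0
    for s in sorted(set(subjects)):
        train = [i for i in range(len(X)) if subjects[i] != s]
        test = [i for i in range(len(X)) if subjects[i] == s]
        tr_X = [X[i] for i in train]
        tr_y = [y[i] for i in train]
        correct += sum(knn_predict(tr_X, tr_y, X[i], k) == y[i] for i in test)
    return correct / len(X)
```

Passing subjects=list(range(len(X))) gives one record per subject (leave-one-out); grouping several rows under one subject identifier keeps them together in the same held-out fold.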
Figure 11 shows the performance of the different classifiers with the LLBFS feature selection algorithm.

Table 7. LOSO CV is used, and different parameter values are used with the classifiers. Graphs have been designed to make the results easier to understand. Table 13 reports the classification performance of the classifiers on the selected features with LOSO validation, and Figures 12 and 13 show the performance of the proposed method graphically.

10) COMPARISON OF CLASSIFIER PERFORMANCE ON FEATURES SELECTED BY THE PROPOSED FS ALGORITHM (FCMIM) AND STANDARD STATE-OF-THE-ART FS ALGORITHMS
To determine which classifier performs best with which feature selection algorithm, we used the LOSO validation method. Table 14 gives the results of the best classifiers, with their evaluation metrics, for the four state-of-the-art feature selection algorithms and the proposed FCMIM algorithm. According to Table 15, SVM performs well in terms of accuracy, achieving 92.37% accuracy on the features selected by the proposed FS algorithm (FCMIM), compared with the state-of-the-art FS algorithms (Relief, MRMR, LASSO, LLBFS) with LOSO CV. Hence, in terms of accuracy, FCMIM is the best FS algorithm and SVM is a suitable classifier for HD diagnosis. The accuracies obtained with LASSO and MRMR under LOSO validation are also good for heart disease diagnosis. As reported in Table 13, the specificity of the ANN classifier is best with the Relief FS algorithm, compared with its specificity with the MRMR, LASSO, LLBFS, and FCMIM feature selection algorithms. Therefore, the Relief FS algorithm combined with the ANN classifier gives the best diagnosis system for correctly classifying healthy people. Table 15 shows that the accuracy of LR improved from 84% to 88% on the features reduced by the LLBFS algorithm. Similarly, SVM (linear) accuracy improved from 85% to 92.37% on the feature set reduced by FCMIM. Thus, the performance of the classifiers improved with the selected features. Finally, we conclude that the diagnosis system using the FCMIM FS algorithm with the SVM classifier is effective for heart disease diagnosis. The proposed system (FCMIM + SVM) achieved 92.37% accuracy, higher than the other combinations of feature selection algorithms and classifiers.
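FCMIM is the authors' own algorithm, and the sketch below is not FCMIM. To illustrate the general idea of selecting features by conditional mutual information, it shows the standard CMIM criterion (Fleuret, 2004) on discrete features: a candidate's score is the smallest information it still carries about the class given any single feature already selected, and the candidate with the largest score is added next. The discretised inputs and the names mutual_info, cond_mutual_info, and cmim are assumptions for illustration.

```python
from collections import Counter
from math import log


def mutual_info(a, b):
    """I(a; b) in nats for two discrete sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(c / n * log(c * n / (pa[x] * pb[y])) for (x, y), c in pab.items())


def cond_mutual_info(a, b, z):
    """I(a; b | z) in nats for discrete sequences."""
    n = len(a)
    pz, paz, pbz = Counter(z), Counter(zip(a, z)), Counter(zip(b, z))
    pabz = Counter(zip(a, b, z))
    # sum of p(x,y,z) * log(p(z) p(x,y,z) / (p(x,z) p(y,z)))
    return sum(c / n * log(c * pz[w] / (paz[(x, w)] * pbz[(y, w)]))
               for (x, y, w), c in pabz.items())


def cmim(columns, y, k):
    """Standard CMIM greedy selection (illustrative, not FCMIM)."""
    d = len(columns)
    # score[j] = min(I(f_j; Y), min over selected s of I(f_j; Y | f_s))
    score = [mutual_info(col, y) for col in columns]
    selected = []
    while len(selected) < min(k, d):
        best = max((j for j in range(d) if j not in selected),
                   key=lambda j: score[j])
        selected.append(best)
        for j in range(d):
            if j not in selected:
                score[j] = min(score[j],
                               cond_mutual_info(columns[j], y, columns[best]))
    return selected
```

In the toy call cmim([[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 0, 1]], [0, 1, 2, 3], 2), the duplicate second column carries no information about the class once the first is known, so the third column is chosen instead. Updating the scores incrementally means each conditional term is computed only once per newly selected feature.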

11) PERFORMANCE OF BACKWARD PROPAGATION DEEP NEURAL NETWORK (BPDNN) FOR DETECTION OF HD
To compare the machine learning models with a deep learning model, we also applied a BPDNN to the classification problem. The training parameters of the BPDNN were tuned to obtain high classification performance: different numbers of hidden layers and hidden neurons, learning rates, and numbers of epochs were tried in our experiments. In Table 16
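The BPDNN is tuned over the number of hidden neurons, the learning rate, and the number of epochs. As a hedged illustration of what backpropagation training involves, the following sketch trains a network with a single hidden layer of sigmoid units by full-batch gradient descent on the cross-entropy loss. It is not the network evaluated in this section: it has one hidden layer rather than several, and the initialisation, loss, and the names train_bpnn and predict are assumptions. The arguments hidden, lr, and epochs correspond to the knobs a hyperparameter search would vary.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def train_bpnn(X, y, hidden=4, lr=0.5, epochs=500, seed=0):
    """One-hidden-layer sigmoid network trained by backpropagation (sketch).

    Returns (predict, losses); losses[e] is the mean cross-entropy measured
    before the weight update of epoch e.
    """
    rng = random.Random(seed)
    d, n = len(X[0]), len(X)
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0

    def forward(x):
        h = [sigmoid(sum(w * xi for w, xi in zip(W1[j], x)) + b1[j])
             for j in range(hidden)]
        return h, sigmoid(sum(w * hj for w, hj in zip(W2, h)) + b2)

    losses = []
    for _ in range(epochs):
        gW1 = [[0.0] * d for _ in range(hidden)]
        gb1, gW2, gb2, loss = [0.0] * hidden, [0.0] * hidden, 0.0, 0.0
        for x, t in zip(X, y):
            h, p = forward(x)
            pc = min(max(p, 1e-12), 1 - 1e-12)  # clip only for the log
            loss -= t * math.log(pc) + (1 - t) * math.log(1 - pc)
            delta_out = p - t  # dLoss/dz_out for sigmoid + cross-entropy
            gb2 += delta_out
            for j in range(hidden):
                gW2[j] += delta_out * h[j]
                # Error propagated back through sigmoid hidden unit j
                delta_h = delta_out * W2[j] * h[j] * (1 - h[j])
                gb1[j] += delta_h
                for i in range(d):
                    gW1[j][i] += delta_h * x[i]
        losses.append(loss / n)
        # Gradient-descent step with batch-averaged gradients
        for j in range(hidden):
            W2[j] -= lr * gW2[j] / n
            b1[j] -= lr * gb1[j] / n
            for i in range(d):
                W1[j][i] -= lr * gW1[j][i] / n
        b2 -= lr * gb2 / n

    def predict(x):
        return 1 if forward(x)[1] >= 0.5 else 0

    return predict, losses
```

A hyperparameter search would call train_bpnn for each combination of hidden, lr, and epochs and keep the setting with the best validation metrics; the recorded losses make it easy to check that training actually converges.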

12) PERFORMANCE COMPARISON OF PROPOSED METHOD WITH PREVIOUSLY PROPOSED METHODS
We compared the accuracy of the proposed method (FCMIM-SVM) with that of existing heart disease diagnosis methods in the literature. The proposed method achieved 92.37% accuracy, higher than the previous methods. The accuracies of the proposed and existing methods are reported in Table 17 and shown graphically in Figure 14. Because of its accurate diagnosis, we recommend the proposed method for heart disease detection, and the system can easily be incorporated into healthcare organizations.

V. CONCLUSION
In this study, an efficient machine learning based system has been developed for the diagnosis of heart disease. The system is designed with the machine learning classifiers LR, K-NN, ANN, SVM, NB, and DT. Four standard feature selection algorithms (Relief, MRMR, LASSO, and LLBFS) and a proposed novel feature selection algorithm, FCMIM, are used to solve the feature selection problem. The LOSO cross-validation method is used to select the best hyperparameters. The system is tested on the Cleveland heart disease dataset, and performance evaluation metrics are used to assess the identification system. According to Table 15, the specificity of the ANN classifier is best with the Relief FS algorithm, compared with its specificity with the MRMR, LASSO, LLBFS, and FCMIM feature selection algorithms. Therefore, ANN with Relief is the best predictive system for identifying healthy people. The sensitivity of the NB classifier on the feature set selected by the LASSO FS algorithm is also the best, compared with the sensitivity of linear SVM with the Relief FS algorithm. The MCC of the logistic regression classifier is 91% on the features selected by the FCMIM FS algorithm. The processing time of logistic regression with the Relief, LASSO, FCMIM, and LLBFS FS algorithms is better than with MRMR and better than that of the other classifiers. Thus, the experimental results show that the proposed feature selection algorithm selects more effective features and obtains higher classification accuracy than the standard feature selection algorithms. According to the feature selection algorithms, the most important and suitable features are thallium scan, chest pain type, and exercise-induced angina. The results of all FS algorithms show that fasting blood sugar (FBS) is not a suitable feature for heart disease diagnosis.
The accuracy of SVM with the proposed feature selection algorithm (FCMIM) is 92.37%, which is very good compared with previously proposed methods, as shown in Table 17. Furthermore, the machine learning based method FCMIM-SVM performs better than the deep neural network for detection of HD. Even a small improvement in prediction accuracy can have a great influence on the diagnosis of critical diseases. The novelty of this study lies in developing a diagnosis system for the identification of heart disease. In this study, four standard feature selection algorithms, along with one proposed feature selection algorithm, are used for feature selection, together with the LOSO CV method and performance measuring metrics. The Cleveland heart disease dataset is used for testing. We believe that a decision support system developed with machine learning algorithms will be well suited to the diagnosis of heart disease. Furthermore, irrelevant features degrade the performance of a diagnosis system and increase its computation time. Thus, another innovative aspect of our study is the use of feature selection algorithms to select the appropriate features, which improves classification accuracy and reduces the processing time of the diagnosis system. In the future, we will use other feature selection algorithms and optimization methods to further increase the performance of predictive systems for HD diagnosis. Because controlling and treating a disease matters after diagnosis, we will also work in the future on the treatment and recovery of critical diseases such as heart disease, breast cancer, Parkinson's disease, and diabetes.
JIAN PING LI is currently a Chairman of the Computer Science and Engineering College and the Model Software College, University of Electronic Science and Technology of China. He is also the Director of the International Centre for Wavelet Analysis and Its Applications. He serves with the National Science and Technology Award Evaluation Committee, the National Natural Science Foundation Committee of China, and the Ministry of Public Security, China, in roles such as Technical Adviser, and holds a dozen academic and social positions. He serves as the Chief Editor of the International Progress on Wavelet Active Media Technology and Information Processing, and as an Associate Editor of the International Journal of Wavelet Multiresolution and Information Processing.
AMIN UL HAQ received the M.S. degree in computer science. He is currently pursuing the Ph.D. degree with the School of Computer Science and Engineering, University of Electronic Science and Technology of China, China. He has extensive academic, technical, and professional experience in Pakistan. He is also a Lecturer with Agricultural University, Peshawar, Pakistan. He is associated with the Wavelets Active Media Technology and the Big Data Laboratory as an international student. He has published high-level research articles in reputable journals. His research interests include machine learning, medical big data, the IoT, e-health and telemedicine, and related technologies and algorithms.
SALAH UD DIN received the master's degree from COMSATS University Islamabad, Pakistan. He is currently pursuing the Ph.D. degree in computer science and technology with the University of Electronic Science and Technology of China, China. His research interests include data stream mining, especially data stream classification, novel class detection, and semi-supervised learning.
JALALUDDIN KHAN received the M.S. degree in computer science from Aligarh Muslim University, Aligarh, India. He is currently pursuing the Ph.D. degree with the School of Computer Science and Engineering, University of Electronic Science and Technology of China (UESTC), Chengdu, China. He has extensive academic, research, and professional experience in Saudi Arabia. He was a Lecturer with the Deanship of Skills Development and a Researcher with the Center of Excellence in Information Assurance (COEIA), King Saud University, Riyadh, Saudi Arabia. He is associated with the Wavelets Active Media Technology and the Big Data Laboratory under the supervision of Prof. J. P. Li and collaborates with other researchers at UESTC. He has authored some research articles. His research interests include the IoT, security and privacy, e-health and telemedicine, machine learning, medical big data related technologies, and IoT security for medical data.
ABDUS SABOOR is currently pursuing the M.S. degree with the School of Computer Science and Engineering, University of Electronic Science and Technology of China, China. He is also a Lecturer with Government University, Peshawar, Pakistan. His research interests include machine learning, medical big data, the IoT, e-health and telemedicine, and related technologies and algorithms.