The Cost-Based Feature Selection Model for Coronary Heart Disease Diagnosis System Using Deep Neural Network

Feature selection models for intelligence systems that diagnose coronary heart disease have been widely developed, often with the aim of minimizing the number of examinations required. Unfortunately, many feature selection models do not consider the cost of each examination, so the selected features are, on average, examinations with high costs. This study proposes an intelligence system model for the diagnosis of coronary heart disease using a feature selection model that considers the cost of the examination. Feature selection is developed using a genetic algorithm and a support vector machine. Decision-making in the diagnosis system is carried out using a deep neural network, and system performance is measured by accuracy, sensitivity, positive predictive value, and area under the curve (AUC). Tests on the Z-Alizadeh Sani dataset show that the feature selection model produces 5 features out of the 54 existing features. Using these 5 features yields an AUC of 93.7%, an accuracy of 87.7%, and a sensitivity of 87.7%. These results show that a feature selection model that considers examination cost can deliver performance in the very good category.


I. INTRODUCTION
Intelligence system models for the diagnosis of coronary heart disease have been developed using data mining techniques [1]. Such a model is built in a number of stages, one of which is dimensionality reduction. Dimensionality reduction takes two forms: reducing the amount of data and reducing the number of attributes [2], [3]. Many studies focus on reducing the number of attributes, known as feature selection. Feature selection methods follow three main approaches, namely filtering [4], [5], wrapper [6], and embedded [7], each with advantages and disadvantages. The filtering approach is independent of the classification algorithm, while in the embedded approach the selection process is attached to the classification algorithm. The wrapper approach searches for the best feature subset, using a classification performance parameter such as accuracy as the control [8].
The associate editor coordinating the review of this manuscript and approving it for publication was Mostafa Rahimi Azghadi.
The wrapper method achieves better accuracy than the filtering method but has high complexity [9]. It is widely used in dimensionality reduction, for example by Shah et al. [10], who used the Accuracy based Feature Selection Algorithm (AFSA) for the feature selection process. AFSA takes a wrapper approach, with accuracy controlled by a Support Vector Machine (SVM) classifier based on the radial basis function (RBF). The wrapper approach is also used by Kumar & Sahoo [11], who combine a genetic algorithm with Random Forest and use accuracy as the fitness function. A genetic algorithm is also used by Gokulnath & Shantharajah [12], combined with a support vector machine and the same fitness function, namely accuracy. Intelligence system models have also been developed with filtering-based feature selection. Gazeloğlu et al. [13] use Correlation-based Feature Selection (CFS) and also test other methods, namely Fuzzy Rough Set and Chi-Square. The study concluded that CFS gave the best performance when combined with Naïve Bayes. A weakness of CFS is that the number of features it produces is still relatively large, so computation takes a long time. In addition to CFS, the fast correlation-based filter (FCBF) is also widely used. This method produces fewer features, so computation is faster [14]. FCBF is also a suitable choice for feature selection on high-dimensional data [15].
VOLUME 10, 2022 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
The feature selection methods in the literature reviewed above select features solely on the basis of which subset gives the best performance.
When an intelligence system model for diagnosis is developed from medical record data, the data are sometimes imbalanced. Imbalanced data is a condition in which the classes are not represented equally, so a model trained on such data may perform poorly. Nasarian et al. [16] propose a model that accounts for imbalanced data by testing the system with the Synthetic Minority Oversampling Technique (SMOTE) and the Adaptive Synthetic Sampling Approach (ADASYN). Both methods improve system performance under imbalanced data conditions [17]. ADASYN adaptively generates synthetic samples for the minority class based on the original data distribution, which reduces the bias caused by the imbalanced distribution. Furthermore, ADASYN adaptively shifts the classifier decision boundary to focus more on examples that are difficult to learn, thereby improving learning performance [18]. He et al. [18] compared ADASYN with SMOTE on a number of datasets and found that ADASYN performed better. Another comparison also shows that ADASYN performs better than Borderline-SMOTE [19].
The development of an intelligence system model for the diagnosis of coronary heart disease requires a classification algorithm that provides good performance. Mehmood et al. [20] and Hussain et al. [21] both use deep convolutional neural networks for classification in coronary heart disease diagnostic systems. Neither study used a feature selection stage, so the model input comprised all the features in the Cleveland dataset. Both models provide good performance, but with a large number of features. A similar study was conducted by Miao et al. [22], using a deep neural network combined with principal component analysis (PCA). Deep neural networks (DNN) perform better than conventional neural networks and a number of other classification methods such as random forest, SVM, and kNN [23], [24]. The ability of DNN was also confirmed by Tomov et al. [25], where the method gave better performance than a number of studies on coronary heart disease diagnosis, especially on the Cleveland dataset [26].
Feature selection with the wrapper method generally uses performance parameters such as accuracy, sensitivity, and F-measure to decide whether a set of attributes is retained or discarded. These parameters alone are sometimes insufficient, because additional considerations may be needed, as when selecting examination attributes for the diagnosis of coronary heart disease. In this case, it is sometimes necessary to consider the cost of, and ease of access to, health services, especially during the COVID-19 pandemic, which has had a negative impact on the community's economy [27]-[29]. During the pandemic, poverty levels increased, which reduced the community's ability to access health services [30]. Preventive action, namely routine checks, is essential for coronary heart disease. Routine checks with many attributes become unaffordable to the public because the costs are high. Under these conditions, a coronary heart disease diagnosis model is needed that uses a small number of low-cost examination attributes while still providing performance within medical tolerance limits, especially for initial screening. Such a model makes diagnosis possible with examination attributes the community can afford.
Building on the studies reviewed above, this research develops an intelligence system model with a feature selection method that considers cost, specifically the cost of examining each attribute used for diagnosis. The feature selection method is a hybrid that combines wrapper and filtering approaches. The wrapper method is based on a genetic algorithm with an SVM classification algorithm. The filtering method uses FCBF, preceded by an oversampling process using ADASYN to balance the data. The intelligence system model draws its conclusions using the DNN algorithm. System testing was carried out on the Z-Alizadeh Sani dataset, and the performance parameters measured were accuracy, sensitivity, precision, and area under the curve (AUC).

II. MATERIAL AND METHOD
This study uses the Z-Alizadeh Sani dataset, which can be accessed online [31]-[33]. The dataset consists of 54 attributes and 303 data instances. The examination fee for each attribute was obtained from the Prodia clinical laboratory and Sebelas Maret University Hospital, Surakarta, Indonesia. Attributes and examination fees are shown in Table 1. The fees in Table 1 were converted from IDR to USD and were accessed in August 2021. The 54 attributes can be grouped into 4 groups, namely demographic, symptom and examination, electrocardiogram (ECG), and laboratory and echocardiogram (ECHO). Some examinations are priced as one package, such as ECG and demographic. For examinations priced as a package, the fee for each attribute is calculated by dividing the total cost by the number of attributes. With this calculation, it is hoped that the feature selection results will not require checking all the attributes in a package, which reduces time and cost.
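The package-price rule described above can be illustrated with a minimal sketch. The function name `per_attribute_cost`, the attribute names, and the 3.0 USD price are hypothetical placeholders, not values from Table 1.

```python
def per_attribute_cost(package_cost, attributes):
    """Split one package price evenly across the attributes it covers."""
    if not attributes:
        raise ValueError("a package must cover at least one attribute")
    share = package_cost / len(attributes)
    return {name: share for name in attributes}


# Hypothetical example: a package priced at 3.0 USD that covers three attributes.
costs = per_attribute_cost(3.0, ["attr_a", "attr_b", "attr_c"])
```

Under this rule every attribute in a package carries the same cost, so selecting fewer attributes from a package lowers the cost term seen by the feature selection step.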
The research method used in this study is shown in Figure 1. The study is divided into several main stages, namely pre-processing, feature selection, data balancing, classification, and performance evaluation. The pre-processing stage includes data normalization using the Min-Max method [34]. The next stage is feature selection, which is carried out using a wrapper approach implemented as a genetic algorithm combined with the SVM algorithm. The SVM algorithm uses the RBF kernel [35], [36]. The performance benchmark in the genetic algorithm takes into account the cost of the examination. The objective function of the genetic algorithm is given in equation (1).
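As an illustration of the Min-Max normalization used in the pre-processing stage, the following minimal sketch rescales one attribute column to the range [0, 1]. The function name and the handling of a constant column are assumptions made for this example.

```python
def min_max_normalize(values):
    """Rescale a list of numbers to [0, 1] with the Min-Max method."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant column carries no information, so map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical attribute column, for illustration only.
normalized = min_max_normalize([30, 50, 70])
```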
In equation (1), sensitivity (Sen) and accuracy (Acc) are performance parameters defined in equations (2)-(4). These parameters are computed from the confusion matrix in Table 2.
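The standard confusion-matrix definitions of accuracy, sensitivity, and precision (positive predictive value) can be sketched as follows. The sketch assumes the usual TP, TN, FP, and FN cell names for the confusion matrix; the counts in the example are hypothetical.

```python
def _ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def confusion_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, and precision from confusion-matrix counts."""
    return {
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": _ratio(tp, tp + fn),   # true positive rate
        "precision": _ratio(tp, tp + fp),     # positive predictive value
    }


# Hypothetical counts, for illustration only.
metrics = confusion_metrics(tp=40, tn=45, fp=5, fn=10)
```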
In feature selection modeling with genetic algorithms, each chromosome represents a solution in the form of a selected subset of attributes, with the objective function as the benchmark. Each chromosome contains a number of genes, which here represent the examination attributes. The representation of an attribute depends on its data type. For example, for an attribute with an ordinal data type, each value is represented by its own gene, as for the Region RWMA attribute: it has 5 categorical values, so it is modeled with 5 genes. The chromosome model for feature selection is shown in Figure 2. A gene value of 0 indicates that the attribute is not selected, whereas a value of 1 indicates that the attribute is included in the subset. The best chromosome is determined by the objective function in equation (1). The cost-based feature selection process can be explained with reference to Figure 2. For each chromosome, which encodes a set of selected features, accuracy and sensitivity are calculated using the SVM classification algorithm. The next step is to sum the costs of the features selected in the chromosome and take the average. The costs are normalized so that their range is the same as that of accuracy and sensitivity. The objective function of each chromosome is then calculated using equation (1). The same process is carried out for all generated chromosomes, both in the initial generation and at each generation change in every iteration of the genetic algorithm. The chromosomes with the best objective function values are selected. Equation (1) implies that a higher examination cost lowers the objective value of a chromosome. The search therefore favors combinations of features that have a low total cost but still provide good performance.
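The cost term described above (the average normalized cost of the selected attributes) can be sketched as follows. The exact form of equation (1) is not reproduced in this text, so the way `fitness` combines accuracy, sensitivity, and cost is only an assumption chosen to show the described behavior, namely that a higher cost lowers the objective value. Both function names are hypothetical.

```python
def mean_selected_cost(chromosome, normalized_costs):
    """Average normalized cost of the attributes whose gene is 1."""
    selected = [c for gene, c in zip(chromosome, normalized_costs) if gene == 1]
    if not selected:
        return 0.0
    return sum(selected) / len(selected)


def fitness(accuracy, sensitivity, chromosome, normalized_costs):
    """Illustrative cost-aware objective.

    ASSUMPTION: mean of accuracy and sensitivity minus the mean normalized
    cost. This is NOT the paper's equation (1), only a stand-in with the
    same qualitative behavior.
    """
    performance = (accuracy + sensitivity) / 2.0
    return performance - mean_selected_cost(chromosome, normalized_costs)


# Hypothetical normalized costs for three attributes.
costs = [0.2, 0.9, 0.4]
cheap = fitness(0.9, 0.9, [1, 0, 1], costs)      # skips the expensive attribute
expensive = fitness(0.9, 0.9, [1, 1, 1], costs)  # includes it
```

With equal accuracy and sensitivity, the chromosome that omits the expensive attribute scores higher, which is the behavior the text attributes to the objective function.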
In this study, the genetic algorithm uses a population of 1000 chromosomes and 150 generations. The crossover probability is 0.55 and the mutation probability is 0.3. The selection method is tournament selection [37], [38], and the crossover method is two-point crossover [39]. Crossover is the process of exchanging genes between one chromosome and another at several cut points to produce a new chromosome. In the two-point crossover method, 2 random numbers are generated as chromosome cut points, so each chromosome is cut into 3 parts, which are then crossed with the opposite chromosome. After the parameters are set, the genetic algorithm is run. Its final result is the chromosomes with the best objective function value. The content of these chromosomes is the result of feature selection, namely the selected subset of attributes [40].
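Two-point crossover and tournament selection, as described above, can be sketched in a few lines. The function names and the tournament size `k=3` are assumptions for illustration; the text does not state the tournament size.

```python
import random


def two_point_crossover(parent_a, parent_b, rng):
    """Cut both parents at two random points and swap the middle segment."""
    if len(parent_a) != len(parent_b) or len(parent_a) < 3:
        raise ValueError("parents must have equal length of at least 3")
    # Two distinct cut points strictly inside the chromosome give 3 parts.
    i, j = sorted(rng.sample(range(1, len(parent_a)), 2))
    child_a = parent_a[:i] + parent_b[i:j] + parent_a[j:]
    child_b = parent_b[:i] + parent_a[i:j] + parent_b[j:]
    return child_a, child_b


def tournament_select(population, scores, rng, k=3):
    """Pick k random individuals and return the one with the best score."""
    contenders = rng.sample(range(len(population)), k)
    best = max(contenders, key=lambda idx: scores[idx])
    return population[best]


rng = random.Random(42)
zeros, ones = [0] * 8, [1] * 8
child_a, child_b = two_point_crossover(zeros, ones, rng)
```

Crossing an all-zero parent with an all-one parent makes the swapped middle segment directly visible in the children.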
After feature selection with the wrapper, the next stage is oversampling using ADASYN [17], [18]. The oversampling process produces data that is balanced between positive and negative coronary heart disease. The next process is feature selection using FCBF [15], a filter-based feature selection method. The result of the FCBF process is a sequence of attributes ranked from highest to lowest. Attributes are selected according to their rank; in this study the top 20, 15, 10, and 5 attributes were taken. The next stage is splitting the data for training and testing. The validation method used is k-fold cross-validation; after the data is divided, the classification process is carried out. Classification is done using a deep neural network (DNN) with the architecture shown in Figure 3. Figure 3 shows a deep neural network architecture with L-1 hidden layers, whose output function can be expressed as in equation (5), where φ_n is a transfer function with n = 1, 2, ..., L, which can be either linear or non-linear. The activation functions used in this study are ReLU and Softmax, whose formulas are given in equations (6)-(7). The DNN input is expressed as the matrix X, while the weights and biases are expressed as the matrices W_n and B_n, where n denotes the nth hidden layer, n = 1, 2, ..., L. Matrix X contains the examination attributes that result from the feature selection process, whether or not cost is considered.
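For orientation, a feed-forward network with transfer functions φ_n, weights W_n, and biases B_n is conventionally written as a nested composition. The form below is the standard textbook expression under that notation and is offered as an assumption; it is not a reproduction of the paper's equation (5), and the symbol \hat{y} for the output is introduced here for illustration.

```latex
\hat{y} = \varphi_L\!\left( W_L \, \varphi_{L-1}\!\left( \cdots \varphi_1\!\left( W_1 X + B_1 \right) \cdots \right) + B_L \right)
```

Each layer applies an affine map followed by its transfer function, and the output of layer n-1 becomes the input of layer n.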
In equation (6), α and λ are hyper-parameters defined by α > 0 and λ > 0. Setting α = 0 and λ = 1 reduces equation (6) to the rectified linear unit (ReLU). The next activation function is softmax, given in equation (7).
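The two activation functions can be sketched as below. The piecewise form in `scaled_elu` is one common parameterization that reduces to ReLU when α = 0 and λ = 1; it is an assumption, since equation (6) itself is not reproduced in this text. The `softmax` function follows the standard definition with a max-shift for numerical stability.

```python
import math


def scaled_elu(x, alpha, lam):
    """ASSUMED form: lam * x for x > 0, else lam * alpha * (exp(x) - 1)."""
    if x > 0:
        return lam * x
    return lam * alpha * (math.exp(x) - 1.0)


def relu(x):
    """Rectified linear unit, the alpha = 0, lam = 1 case of scaled_elu."""
    return scaled_elu(x, alpha=0.0, lam=1.0)


def softmax(xs):
    """Standard softmax over a list of output-layer inputs."""
    shift = max(xs)  # subtracting the max avoids overflow in exp
    exps = [math.exp(v - shift) for v in xs]
    total = sum(exps)
    return [e / total for e in exps]


probabilities = softmax([2.0, 1.0])  # two output units, e.g. CAD and normal
```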
In equation (7), x is the input vector to the output layer and j = 1, 2, ..., K is the index of the output unit. The DNN process has several stages. It starts with the Keras Tuner process, which uses data split by k-fold cross-validation with k = 5 to find the optimal model. Once the optimal model is found, training is carried out using the training data. The testing data is used for validation and to run callbacks such as saving the optimal weights, early stopping, and reducing the learning rate on plateau. After the training phase is complete, testing is carried out using the testing data to obtain a number of performance parameters for later evaluation.
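The 5-fold split mentioned above can be sketched with the standard library alone. The function name, the shuffling seed, and the use of 303 samples (the size of the original dataset before oversampling) are assumptions for illustration; the sketch does not stratify by class.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_indices(303, k=5))
```

Each sample appears in exactly one test fold, so every instance is used for testing once and for training k-1 times.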
The last stage is measuring the performance of the proposed model. Performance is measured using accuracy, sensitivity, and precision (positive predictive value), with the formulas given in equations (2)-(4). In addition to these three parameters, the trade-off between sensitivity and 1-specificity is measured and expressed as the area under the curve (AUC). Based on the AUC, the proposed system model can be categorized as poor, sufficient, good, or very good [42].
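The AUC can be computed directly from predicted scores as the probability that a randomly chosen positive case is scored above a randomly chosen negative case, which equals the area under the curve of sensitivity against 1-specificity. The sketch below uses this pairwise formulation; the function name and the example scores are hypothetical.

```python
def auc_from_scores(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes are required to compute AUC")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = CAD, 0 = normal) and classifier scores.
auc = auc_from_scores([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```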

III. RESULTS AND DISCUSSION
A. RESULTS
The resulting intelligence system model for diagnosis has the deep neural network architectures shown in Table 3. Table 3 shows that the DNN architectures for feature selection with and without cost consideration require varying numbers of hidden layers, depending on the number of features used. For feature selection without considering cost and with 20 features, DNN performance is optimal with 6 hidden layers, namely hidden layers L to L-5. The activation function used in the hidden layers is ReLU, while the output layer uses Softmax. The highest number of hidden layers occurs when the number of features used is 5. For feature selection that considers cost, the DNN architecture requires the most hidden layers when using 10 features and the fewest when using 5 features, where only 3 hidden layers are required.
The DNN architectures in Table 3 were obtained from the DNN training process. During training, the optimal hyperparameter values are determined by automatic tuning with Keras Tuner, a hyperparameter optimization framework for DNNs. Hyperparameter optimization is carried out by defining the search space and using one of the included algorithms to find the best hyperparameter values. The search algorithm used here is Hyperband [43]. Table 4 shows the result of feature selection by the genetic algorithm combined with SVM, followed by filtering with the FCBF algorithm. This process yields 20 features. Feature selection with the genetic algorithm and SVM alone yields 32 attributes without considering cost and 21 attributes when considering cost. FCBF is needed to rank features that are relevant to the class but not redundant with other relevant features. It does so by measuring the correlation between two random variables using Symmetrical Uncertainty (SU) [15], [40], whose value lies in the range 0 to 1. In this study, the top 20, 15, 10, and 5 attributes were selected from the FCBF results according to their SU values. The examination fees in Table 4 refer to Table 1. For examinations priced as one package, such as symptom and examination and demographic, the cost of each attribute is assumed to be the same, so it is the package cost divided by the number of attributes examined. Table 4 shows that only low-cost features are retained in the resulting feature set.
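Symmetrical Uncertainty for two discrete variables is conventionally defined as SU(X, Y) = 2 · IG(X; Y) / (H(X) + H(Y)), where H is the Shannon entropy and IG is the information gain (mutual information). The sketch below computes it with the standard library; the function names and the toy data are assumptions for illustration, and continuous attributes would first need to be discretized.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (base 2) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * (H(X) + H(Y) - H(X, Y)) / (H(X) + H(Y)), in [0, 1]."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables are constant
    joint = entropy(list(zip(x, y)))
    return 2.0 * (hx + hy - joint) / (hx + hy)


# Toy data: one feature identical to the class, one independent of it.
target = [0, 1, 0, 1]
su_identical = symmetrical_uncertainty([0, 1, 0, 1], target)
su_independent = symmetrical_uncertainty([0, 0, 1, 1], target)
```

A feature that fully determines the class reaches SU = 1, while an independent feature reaches SU = 0, which is why ranking by SU orders attributes from most to least relevant.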
According to equation (1), the selection process is influenced by accuracy, sensitivity, and cost, so a high-cost feature that does not significantly improve performance when combined with the existing feature set will not be selected. In this case, all the expensive features, such as Q Wave, Region RWMA, and VHD Severe, are eliminated because their cost is very high and not proportional to the performance gained when they are combined with other features. This does not mean that high-cost features are eliminated automatically. A high-cost feature can still be selected if, combined with other features, it produces good performance at a lower total cost than other feature sets.
The examination cost for 15, 10, and 5 features is obtained by adding up the costs of the top 15, 10, and 5 features, respectively. When feature selection does not consider cost, the total cost is 28.888 USD for 15 features, 27.169 USD for 10 features, and 13.108 USD for 5 features. When feature selection considers cost, the total is 5.159 USD for 15 features, 3.439 USD for 10 features, and 1.720 USD for 5 features. In terms of examination cost, feature selection that considers cost therefore gives a substantial reduction. This reduction was not accompanied by a significant decrease in performance, and performance does not always decrease when cost is considered, as shown in Table 5. Table 5 also shows that the proposed model is better than a number of ensemble learning algorithms, such as Random Forest (RF) and XGBoost, as seen in the AUC and sensitivity parameters. Where considering examination cost does decrease performance, the decrease is not significant and performance remains relatively constant. For example, with 5 features the AUC reaches 93.9% without considering cost and 93.7% when considering cost. The reverse occurs with 20 features, where considering cost in feature selection increases the AUC from 95.1% to 97.3%.
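The size of the cost reduction can be made concrete with a short calculation on the totals reported above. The variable and function names are hypothetical, and the percentage is independent of the currency unit or scale.

```python
# Total examination cost per feature count, as reported in the text (USD).
cost_without = {15: 28.888, 10: 27.169, 5: 13.108}  # selection ignoring cost
cost_with = {15: 5.159, 10: 3.439, 5: 1.720}        # cost-aware selection


def relative_reduction(before, after):
    """Fraction of the original cost that is saved."""
    return (before - after) / before


reductions = {
    n: relative_reduction(cost_without[n], cost_with[n]) for n in cost_without
}

for n in sorted(reductions, reverse=True):
    print(f"{n} features: {reductions[n]:.1%} lower examination cost")
```

On these totals the cost-aware selection saves roughly 82% to 87% of the examination cost at each feature count.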
An overview of the proposed system model is shown in Figure 4. Figure 4 shows that patient examinations can be recorded into the cloud system using desktop-based and mobile applications. The recorded attributes are divided into 4 groups. In use, the intelligence system model requires only the examination attributes produced by the feature selection process. The application model can be tried using the input-output form shown in Figure 5. The test uses 5 examination attributes, namely Typical chest pain, DM, Non-anginal, HTN, and CRF. The system output is the percentage of confidence for each possibility, namely positive or negative coronary heart disease. Figure 5 shows a value of 96%, so the conclusion is positive for coronary heart disease.

B. DISCUSSION
The feature selection model based on a genetic algorithm that considers the cost of the examination provides relatively good performance. The decrease in performance is not significant, and the resulting performance is still in the very good category, with an AUC of 93.7% [42], while requiring only 5 of the 54 existing features. In Table 4, the attributes with high examination costs, namely Q Wave, Region RWMA, and VHD Severe, are eliminated. These three attributes were eliminated from the 20 selected attributes because their cost was above 1 USD, while the others cost less than 1 USD. The proposed system model eliminates high-cost attributes, but because it follows the objective function in equation (1), the elimination also takes the accuracy and sensitivity parameters into account. It is the control exerted by these two performance parameters that keeps performance relatively good.
The result of feature selection for 5 attributes, when cost is not considered, agrees with the research conducted by Alizadehsani et al. [31], including Typical Chest Pain, Region RWMA, and age. When feature selection considers cost, one of these attributes, Region RWMA, is eliminated because its examination cost is relatively high. Chest pain is a symptom of a disease that can cause death within a short time. Patients with a history of diabetes mellitus (DM) experience atypical chest pain with odds of 0.32 compared to patients without a history of DM [61]. The typical chest pain attribute has the highest weight in the diagnosis of coronary heart disease, which is in line with feature selection using information gain [45].
The effectiveness of feature selection, in addition to being shown by the resulting performance, can also be demonstrated by data visualization, such as the distance matrix. Figure 6(a) shows that before feature selection, the distance between objects in the same class is very large. If the classes are well separated, it is easy to identify which class an object belongs to. The classes in this study are CAD and normal. Figure 6(b) shows that after feature selection, the distances within the same class are relatively small. The distance matrix is calculated using the Euclidean distance [62], [63] from one object to another. The effect of feature selection is seen not only in the distance matrix but also in classification performance: feature selection gives better classification algorithm performance. Feature selection that considers cost, with a total of 5 features, compares favorably with some previous studies in terms of the AUC value and the number of attributes used. Some studies report better AUC values, but the proposed model uses fewer features and requires lower costs. Joloudari et al. [58] and Abdar et al. [53] achieved AUC above 95%, but that value falls in the very good category [42], as does the proposed model. In terms of the number of features required, there is a significant difference, since those studies use 40 features and 16 features, respectively. Another consideration is that these 5 attributes are examination services that are easily accessible and available in primary health care [64], [65].
The proposed method is not inferior to a number of previous studies in terms of accuracy or AUC. A comparison with previous studies is shown in Table 6. Relatively similar performance is reported by Abdar et al. [53], with an accuracy of 94.66% using only 16 attributes, while the proposed model requires 20 attributes. The advantage of the proposed model is that the selected examination attributes are not expensive. According to Table 1, the 20 attributes used belong to the demographic group and the symptom and examination group. If the proposed model uses 15 or even 10 attributes, performance differs only slightly, and by AUC it remains in the same performance category, namely very good (AUC > 90%) [42].
In a further comparison, the feature selection in the research by Das et al. [60] produced 21 attributes, which were classified with the Random Forest algorithm for an accuracy of 92.31%. This accuracy is lower than that of the proposed model. Even when the proposed model uses 15 or 10 attributes, it still gives higher accuracy and AUC. The proposed model also compares well with Joloudari et al. [58], whose study required 40 attributes to produce an AUC of 96.70%, while the proposed model needs only 20 attributes. It is also better than the model proposed by Alizadehsani et al. [31], which requires 34 attributes to reach an accuracy of 94.08%.

IV. CONCLUSION
The proposed system model, which combines cost-aware feature selection with DNN classification, provides better performance than a number of previous studies. It achieves an AUC of 97.3% with only 20 attributes and an AUC of 93.7% with only 5 attributes. The proposed model can serve as an alternative feature selection model that adds the consideration of examination cost. Its performance is generally in the very good category.
WIHARTO received the B.E. degree in electrical engineering from Telkom University, Bandung, Indonesia, and the master's and Ph.D. degrees from Universitas Gadjah Mada, in 2004 and 2017, respectively. He is currently an Associate Professor with the Department of Informatics, Universitas Sebelas Maret, Surakarta, Indonesia. His research interests include artificial intelligence, computational intelligence, data mining, expert systems, machine learning, and medical imaging.
ESTI SURYANI received the bachelor's degree in mathematics and the master's degree in computer science from Universitas Gadjah Mada, Yogyakarta, Indonesia, in 2002 and 2006, respectively. He is currently working as an Assistant Professor with the Department of Informatics, Universitas Sebelas Maret, Surakarta, Indonesia. His research interests include image processing, statistics and probability, fuzzy logic, and cryptography.
BINTANG PE PUTRA received the bachelor's degree in informatics from the Faculty of Mathematics and Natural Sciences, Universitas Sebelas Maret, Surakarta, Indonesia, in 2022. His research interests include deep learning, image processing, artificial intelligence, machine learning, and computational intelligence.