A Boosting-Based Hybrid Feature Selection and Multi-Layer Stacked Ensemble Learning Model to Detect Phishing Websites

Phishing is a type of online scam in which an attacker tries to trick users into giving away personal information, such as passwords or credit card details, by posing as a trustworthy entity like a bank, email provider, or social media site. These attacks have been around for a long time and, unfortunately, continue to be a common threat. In this paper, we propose a boosting-based multi-layer stacked ensemble learning model that uses a hybrid feature selection technique to select the relevant features for classification. The dataset with the selected features is sent to various classifiers at different layers, where the predictions of the lower layers are fed as input to the upper layers for phishing detection. Experimental analysis shows that the proposed model achieved an accuracy ranging from 96.16% to 98.95% without feature selection across different datasets, and an accuracy ranging from 96.18% to 98.80% with feature selection. The proposed model is compared with baseline models and outperforms the existing models by a significant margin.


I. INTRODUCTION
In recent times, the internet has brought about revolutionary changes in the way we communicate, making it more convenient and accessible. However, this positive transformation has also led to a significant increase in the number of internet users, providing an opportunity for adversaries to exploit naive individuals by stealing their sensitive credentials. One of the most common methods used by these attackers is phishing, which involves sending fake emails or creating replica websites to lure unsuspecting users into providing their personal information. As a result, innocent users become prey to these attacks.
The associate editor coordinating the review of this manuscript and approving it for publication was Victor Sanchez.
Based on a phishing survey conducted by the Anti-Phishing Working Group (APWG), the total number of phishing websites for the first three quarters of 2022 was 3,394,662, as depicted in Figure 1. In contrast, the combined total for all four quarters of 2021 was 2,847,773, as illustrated in Figure 2. This represents a 19.2% increase in just the first three quarters of 2022 compared to the entire year of 2021. This growth highlights the serious threat posed to naive internet users. Phishing attacks are commonly developed and distributed over the internet through two methods: fake emails and the replication of legitimate websites. Fake emails, also known as spoofed emails, are sent to users under the guise of a legitimate company or organization. In addition, attackers create and deploy replicas of original websites on social media platforms like Twitter, Facebook, and Google. These phishing websites may use the green padlock and Hypertext Transfer Protocol Secure (HTTPS) to trick users into believing they are legitimate sites.
To detect and prevent phishing attacks, various methods have been proposed in the literature, including blacklist [1], [2], [3], [4], feature extraction [5], [6], [7], [8], and machine learning [9], [10], [11]. A blacklist is a list or database of phishing URLs that are typically blocked by modern browsers such as Chrome, Opera, and Mozilla. However, this technique is ineffective in detecting and preventing zero-day phishing sites that have a short lifespan. Feature extraction involves extracting characteristics from different phishing websites to identify and prevent phishing attacks, but not all phishing sites have the same features, so this method may not be reliable for all websites.
As a result, classification models [12] such as Decision Tree (DT) and Random Forest (RF) are used to detect phishing attacks. Existing literature [2], [12], [13] shows that machine learning-based methods can achieve up to 99% accuracy in detecting phishing websites, outperforming the blacklist and feature extraction techniques.
The performance of machine learning (ML) algorithms for detecting and preventing phishing attacks depends on the quantity of training data and the quality of the features extracted from phishing websites. Traditional ML models struggle to capture the diverse characteristics of data, while ensemble learning can extract diversified features, combine the predictive results produced by multiple learning algorithms, and achieve better predictive performance using ensemble methods like voting, stacking, blending, and averaging. Deep learning methods are also used in different domains [38], [39], [40], [41], including medical, security, and NLP. The techniques in [41], [42], [43], and [44] use deep learning models such as LSTM, CNN, and GRU for the classification of phishing sites. To further select the relevant features from a given dataset, feature selection algorithms such as filter, wrapper, and embedded techniques are used.
This work proposes a feature selection-based ensemble model to detect and prevent phishing websites, aiming to reduce training and classification time as well as computation overhead. By harnessing the capabilities of a range of well-performing classification models, the proposed ensemble model shows promise for detecting and preventing phishing attacks. The model is applied to four datasets: two variants of the Mendeley Phishing Dataset (MPD) (small and large), Mendeley with 10,000 instances, and UCI.

A. MOTIVATION
Phishing attacks pose a significant threat to online security and detecting them accurately remains a challenging problem. Various machine learning and feature selection strategies have been proposed to address this issue. Baseline machine learning approaches have successfully identified phishing websites, but ensemble-based models have demonstrated better efficiency and accuracy.
Specifically, the stacking model MLSELM [45] achieved the best results among the baseline and ensemble models. Feature selection approaches have also been employed to obtain an optimal feature subset, reducing model execution time, and improving accuracy. Feature importance-based approaches have shown greater accuracy, as they rank each feature based on its contribution to the model. However, these approaches have not been fully explored, especially in boosting-based ensemble models, stacking, multi-layered stacking, and the averaging of feature ranks obtained from multiple boosting models.

B. CONTRIBUTION
This study proposes a novel hybrid feature selection approach and a boosting-based multi-layered stacking ensemble learning model to address the challenges of detecting phishing attacks accurately. The feature importance rankings of the three boosting models, out of five, that achieved high accuracy on all four phishing datasets were considered. The average feature subset was determined from the three feature subsets selected by the three best boosting models for each dataset. Finally, the K-topmost features were selected for each dataset, with K ranging from 66% to 86% of the features, based on the number of features and the size of the respective dataset.
VOLUME 11, 2023
The proposed multi-layered stacking model integrates the four best-performing boosting models as in the architecture of previously developed MLSELM [45] model. The model achieved high accuracy on all four phishing datasets using all features, both for imbalanced and balanced data.
Additionally, the hybrid feature selection approach followed by the boosting-based multi-layered stacking model achieved high accuracy for both imbalanced and balanced data with reduced features. The hybrid feature selection approach identified the most informative features for detecting phishing attacks accurately, reducing the number of features used in the models. The results demonstrate that the proposed approach achieves high accuracy while using a reduced number of features. The proposed model is designed to achieve a high detection rate using hybrid feature selection and a multi-layered stacked ensemble model. Boosting focuses on reducing bias, while the stacking framework combines the strengths of different models. This combination helps mitigate the weaknesses of individual models, leading to improved generalization performance on unseen data. The model is evaluated on different datasets to assess its behavior as the data varies. It can be deployed as a web application or a browser extension that takes the URL and source code of a website as input and classifies the web page as either legitimate or phishing.
The remainder of this paper is organized as follows. Section II reviews the literature on feature selection-based phishing detection and prevention techniques. Section III outlines the architecture and functionality of the proposed model and covers the implementation of its various phases, including the input dataset and the feature selection ensemble model. Section IV presents the experimental results with both the baseline and ensemble models and provides justifications, key findings, and limitations of the proposed model. Lastly, Section V concludes the paper.

II. RELATED WORK
Feature selection techniques are classified into five categories, as shown in Figure 3: 1. Filter, 2. Wrapper, 3. Embedded, 4. Hybrid, and 5. Evolutionary. Information Gain (IG), the Chi-square test (χ²), Fisher's score, the Correlation Coefficient (ρ), Variance Threshold, mean absolute difference (MAD), Relief (ReliefF, RReliefF), and Dispersion Ratio are known as filter methods, whereas Forward Feature Selection (FFS), Backward Feature Elimination (BFE), Exhaustive Feature Selection (EFS), and Recursive Feature Elimination (RFE) are known as wrapper methods. LASSO Regularization (L1) and Feature Importance are known as embedded approaches. A combination of more than one feature selection approach is known as hybrid, and evolutionary-based feature selection is a category of wrapper approach that selects an optimal feature subset through evolutionary algorithms.
The following are some of the proposed feature selection approaches that belong to the filter, wrapper, or embedded categories. These three types of approaches were applied in different combinations to four phishing datasets (D1, D2, D3, and D4), as described below. The filter method ReliefF [24] was applied to UCI and selected 17 features. ReliefF followed by majority voting over multiple baseline classifiers obtained 95% accuracy. Furthermore, correlation feature selection (CFS) [18] selected 23 features from the UCI phishing dataset, and CFS followed by a statistical t-test with KNN obtained 97% accuracy. Here, highly correlated features were considered redundant and removed by CFS, and the significance of the features was tested through a statistical t-test to obtain the most relevant features. Likewise, [14] applied four filter-based feature selection methods, namely Correlation-Based Features Selection (CBFS), Information Gain (IG), Information Gain Ratio (Gain Ratio), and Chi-Square, to the UCI phishing dataset. Each FS approach selected 9 different features. It is observed that the accuracy of the baseline models, namely Naive Bayes (NB), Decision Tree (ID3 and C4.5), K-Nearest Neighbour (KNN), and Support Vector Machine, decreased, falling within the range of 94.01% to 94.17%. Moreover, [15] selected 20 features from Mendeley [47] through the IG and ReliefF approaches. These FS approaches followed by RF obtained 98.11% accuracy. In a similar way, the union of IG and Relief using RF [28] with 20 features obtained 98.11% accuracy. In addition, Prince et al. [16] compared and analysed multiple feature selection methods.
In addition to filter, embedded, and wrapper approaches, the hybrid FS approach has also been employed by some researchers to obtain optimal features. A combination of more than one FS approach of the same or different categories is known as a hybrid FS approach. Zamir et al. [31] applied a combination of filter (information gain, gain ratio, Relief-F) and wrapper (recursive feature elimination (RFE)) based FS approaches to UCI, which yielded 27 features. The normalized 27-feature subset, fed to Principal Component Analysis (PCA) followed by Stacking (NN+RF+Bagging), obtained 97.4% accuracy.
Likewise, Moedjahedy et al. [29] applied three combinations of hybrid FS approaches. The first combination was predictive score correlation (PSC) and REF, the second was maximal information coefficient correlation (MICC) and REF, and the third was Spearman correlation (SC) and REF. The results showed that the third combination, SC and REF using RF with 10 features, obtained 97.6% accuracy.
In addition, the hybrid feature ensemble [9] employed five FS approaches, namely Info Gain (IG), ANOVA, RFE, ReliefF, and Fisher Score. The three best-performing approaches, IG, ANOVA, and RFE, were ensembled and achieved 97.51% accuracy on UCI and 98.45% accuracy on Mendeley [47].
In recent times, evolutionary learning approaches have gained attention among researchers as an alternative way to determine the best feature subset. As a result, several evolutionary algorithms have been applied to the UCI, Mendeley, Mendeley-small, and Mendeley-full phishing datasets. Some of them are as follows. The Gravitational Search Algorithm (GSA) [30] with a Random Forest (RF) model obtained 95.53% accuracy. GSA selected 15 features from the UCI dataset and was found to perform better than other feature selection methods, namely Correlation Feature Selection (CFS), Information Gain (IG), and Principal Component Analysis (PCA). Likewise, the wrapper method with a Genetic Algorithm [33] using a DT classifier was applied to the UCI dataset and selected the 20 best features. The performance of the selected features was evaluated through Nonlinear Regression based Harmony Search (NR-HS) (a meta-heuristic nonlinear regression approach) and SVM. The accuracies of these two models were 92.8% and 91.83%, respectively. Moreover, Laplacian Particle Swarm Optimization (LAPPSO) [34] [47]. datasets respectively.
In addition, fuzzy rough set (FRS) [36] selected 24 and 30 features from the UCI and Mendeley [47] phishing datasets, respectively. FRS followed by RF obtained F-scores of 93% and 95%. Moreover, differential evolution for feature selection with threshold mechanism (DEFSTH) [37] followed by a Naïve Bayes classifier was applied to the Mendeley-Full dataset and obtained 96.82% accuracy.
Likewise, the Binary Salp Swarm Optimization Algorithm (BSSA) [35] with transfer functions (TFs), namely S-shaped, U-shaped, V-shaped, X-shaped, and Z-shaped TFs, was applied to the Mendeley (111 features) phishing dataset and selected a subset of the 49 best features among the 111. The results show that BSSA with the X-shaped TF followed by KNN outperforms all other TFs with 95.07% accuracy. Similarly, some approaches other than filter, wrapper, embedded, hybrid, and evolutionary approaches have been applied to obtain an optimal feature subset from phishing datasets. The Hybrid Ensemble Feature Selection (HEFS) [32] applied a hybrid perturbation ensemble (i.e., data perturbation and function perturbation) followed by a Cumulative Distribution Function gradient (CDF-g) approach for automatic identification of the feature cut-off rank to obtain the final feature subset. HEFS selected 10 features, and HEFS followed by RF obtained 94.6% accuracy. Likewise, the Effective Neural Network Phishing Detection Model Based on Optimal Feature Selection (OFS-NN) [19] was applied to UCI and obtained 96.75% accuracy with a 26-feature subset. Likewise, [20] applied the feature validity value (FVV) index to select the optimal features from the UCI phishing dataset. FVV selected 23 features, and FVV followed by NN obtained 94.5% accuracy. Moreover, two FS approaches, namely Feature Selection by Omitting Redundant Features (FSOR) and Feature Selection by Filtering Method (FSFM) [21], were applied to the UCI dataset. FSOR followed by RF with 22 features obtained 97.18% accuracy, and FSFM followed by RF with 9 features obtained 95.21% accuracy. Furthermore, the eighteen common features of the UCI and Mendeley [47] datasets were combined in [49], and 13 optimal features among the 18 were selected through Variance Inflation Factor (VIF) and P-value feature analysis approaches. RF on those 13 features achieved 93.2% accuracy.
Likewise, two feature selection approaches, namely consensus and majority voting [23], were applied to the UCI and Mendeley [47] phishing datasets. From Mendeley, the consensus FS approach selected 17 features and obtained 98.17% accuracy, while the majority voting FS approach selected 23 features and obtained 98.63% accuracy. Likewise, from UCI, the consensus approach selected 9 features and obtained 93.55% accuracy, and the majority voting approach selected 13 features and obtained 95.29% accuracy.
The majority of existing works used either classical machine learning algorithms or ensemble algorithms (bagging and boosting) for the classification of phishing websites. Some of the techniques also used feature selection algorithms such as filters or wrappers to identify the relevant and significant features for the classification task. The proposed work uses different boosting algorithms to identify the significant features through the embedded method. The model also consists of a multi-layered stacked ensemble, where the stacked ensemble increases model diversity and the multi-layered structure enables hierarchical feature learning, which learns different levels of abstraction from the data.

III. PROPOSED MODEL
The proposed work introduces a Boosting-based Multi-Layer Stacked Ensemble Learning Model (BMLSELM) to detect phishing websites. BMLSELM is built on MLSELM [45] using only boosting algorithms, and it also uses a hybrid feature selection method to select an optimal feature subset. It has three layers, as shown in Figure 4, all composed of boosting algorithms. The first layer includes four estimators, namely XGBoost, LGBM, CatB, and AdaB. The second layer has three estimators, XGB, CatB, and AdaB, while XGB serves as the meta-learner in the final layer. Additionally, the hybrid feature selection method extracts essential features using three boosting models (XGB, CatB, and LGBM): it computes a feature ranking for all features through XGB, CatB, and LGBM, takes the feature-wise average across the three selected feature subsets based on their feature ranks, and finally selects the K-topmost feature subset that provides the highest accuracy, as presented in Figure 5. Table 1 shows the optimal percentage and number of K-topmost features for each phishing dataset.
The proposed approach involves four phases for evaluating phishing datasets. In the first phase, the four phishing datasets were evaluated using the five boosting models and BMLSELM. The second phase involved selecting the K-topmost features. In the third phase, the unbalanced K-topmost features were evaluated using the five boosting models and BMLSELM. Finally, the balanced K-topmost features were evaluated using the five boosting models and BMLSELM, as shown in Figure 5.

A. DATASET
The proposed work was applied to four datasets, namely D1, D2, D3, and D4. D1 was collected from the UCI repository [46], while D2 was collected from Mendeley [47] and contains 48 features. D3 and D4 were also collected from Mendeley [48], with D3 containing 111 features and 58,645 instances, and D4 containing 111 features and 88,647 instances. Each dataset consists of two classes: phishing and legitimate. A detailed description of each dataset is provided in Table 1.
It should be noted that the UCI phishing dataset [46] and two variants of Mendeley [48] are imbalanced. As discussed in Section III-C, we applied a data re-sampling method to balance the datasets and improve the performance of our proposed model.

B. HYBRID FEATURE SELECTION APPROACH
The proposed feature selection approach uses three boosting models, namely XGB, CatB, and LGBM, to extract essential features from the datasets under consideration. The importance of every feature is computed separately using XGB, CatB, and LGBM. The importance of each feature is then averaged across the three models, as shown in the following equation:
$$AVG_{F_i} = \frac{1}{n} \sum_{j=1}^{n} RF_i^{j}$$
where $AVG_{F_i}$ is the average importance of the $i$-th feature ($i \in [1, m]$) for a dataset with $m$ features, $n = 3$ (since three models, XGB, CatB, and LGBM, are employed to obtain the importance of each feature), and $RF_i^{j}$ is the importance of the $i$-th feature under the $j$-th model. This approach favors the most important features, as they have the highest scores across all three boosting models.
After the average feature importance is computed, the K-topmost feature subset is selected to obtain the highest accuracy. The K-topmost feature subset is the set of features with the highest ranking and is chosen based on their relative importance. This hybrid approach helps to improve the accuracy of the proposed model by selecting only the most important features.
The stepwise approach for the selection of the K-topmost features is presented in Figure 5, and Table 1 provides the optimal percentage of features selected from each phishing dataset, along with the corresponding number of features. This approach ensures that the most relevant features are retained while minimizing the risk of overfitting.
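The averaging and K-topmost selection described in this subsection can be illustrated with a minimal sketch. The feature names and importance scores below are hypothetical placeholders, not values from the datasets, and the helper names `average_importance` and `top_k_features` are introduced here only for illustration. The sketch averages raw scores as written; whether the scores of the three models are rescaled before averaging is not stated in the text.

```python
from typing import List, Sequence


def average_importance(per_model: Sequence[Sequence[float]]) -> List[float]:
    """Average the importance of each of the m features across the n models."""
    n = len(per_model)
    m = len(per_model[0])
    if any(len(scores) != m for scores in per_model):
        raise ValueError("every model must score the same m features")
    return [sum(scores[i] for scores in per_model) / n for i in range(m)]


def top_k_features(names: Sequence[str], avg: Sequence[float], k: int) -> List[str]:
    """Return the k feature names with the highest average importance."""
    ranked = sorted(range(len(avg)), key=lambda i: avg[i], reverse=True)
    return [names[i] for i in ranked[:k]]


# Hypothetical scores for four made-up features from three models (n = 3).
names = ["url_length", "has_ip", "num_dots", "https_token"]
xgb_scores = [0.40, 0.10, 0.30, 0.20]
catb_scores = [0.35, 0.05, 0.45, 0.15]
lgbm_scores = [0.30, 0.15, 0.35, 0.20]

avg = average_importance([xgb_scores, catb_scores, lgbm_scores])
selected = top_k_features(names, avg, k=2)
print(selected)  # ['num_dots', 'url_length']
```

In practice the three score lists would come from the fitted boosting models, and importance scores reported by different libraries may be on different scales, so a rescaling step before averaging may be worth considering.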

C. DATA BALANCING
Imbalanced datasets can be addressed using data balancing techniques such as Random Under Sampling (RUS) and Random Over Sampling (ROS) [50]. In this study, we apply data balancing techniques to the K-topmost selected feature subsets of three datasets, namely D1, D3, and D4, which initially had imbalanced data.
For instance, the D1 dataset has 4898 legitimate instances and 6157 phishing instances. To balance this dataset, we use the ROS method, which randomly duplicates instances of the minority class (legitimate in this case) and adds them to the dataset until the number of instances in the minority class equals that of the majority class (phishing in this case). This results in a balanced dataset with a total of 12314 instances, where each class has 6157 instances.
Similarly, in the D3 dataset, the phishing class is the minority class with 27998 instances, while the legitimate class is the majority class with 30647 instances. Using the ROS method, we duplicate the phishing class instances until we have 30647 instances, resulting in a balanced dataset with a total of 61294 instances.
Finally, in the D4 dataset, the legitimate class with 30647 instances is the minority class, while the phishing class with 58000 instances is the majority class. We duplicate the legitimate class instances until we have 58000 instances, resulting in a balanced dataset with a total of 116000 instances.
It is worth noting that the data balancing step was necessary to ensure that our models were trained on a balanced dataset, which can improve their performance in detecting phishing attacks.
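As an illustration of the ROS procedure described above, the following minimal sketch duplicates randomly chosen minority-class rows until both classes have the same size. The function name `random_over_sample`, the fixed seed, and the single-column toy rows are assumptions made for the example; only the D1 class sizes are taken from the text.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple


def random_over_sample(
    rows: Sequence[Sequence[float]], labels: Sequence[int], seed: int = 0
) -> Tuple[List[Sequence[float]], List[int]]:
    """Duplicate random minority-class rows until every class matches the majority."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))  # sample with replacement
            out_labels.append(cls)
    return out_rows, out_labels


# Toy rows with the D1 class sizes: 4898 legitimate (0) and 6157 phishing (1).
labels = [0] * 4898 + [1] * 6157
rows = [[float(i)] for i in range(len(labels))]

balanced_rows, balanced_labels = random_over_sample(rows, labels)
balanced_counts = Counter(balanced_labels)
print(balanced_counts[0], balanced_counts[1], len(balanced_labels))  # 6157 6157 12314
```

One design point to keep in mind: if rows are duplicated before a train/test split, copies of the same instance can land on both sides of the split. The text does not say at which point the re-sampling was applied.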

D. BMLSELM
The boosting-based MLSELM utilizes four of the five available boosting models, namely XGB, CatB, LGBM, and AdaB, as GB's performance was inadequate. Its architecture includes three layers, as depicted in Figure 4. The first layer integrates all four boosting models, while the second layer integrates three models except for AdaBoost. The last layer employs XGB as the meta-learner. The four phishing datasets, containing all features, were used as input to BMLSELM and the five boosting models in the first phase, followed by the evaluation of the unbalanced K-topmost selected features of each dataset through BMLSELM and the five boosting models in the second phase. Finally, the balanced K-topmost selected features of each dataset were evaluated through BMLSELM, as shown in Figure 5. The proposed model is designed to achieve a high detection rate using hybrid feature selection. It is evaluated on different datasets to assess its behavior as the data varies. The model can be deployed as a web application or a browser extension that takes the URL and source code of a website as input and classifies the web page as either legitimate or phishing.
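The flow of predictions through the three layers can be sketched as follows. This is only an illustration of the inference-time data flow, under several assumptions: the estimators are hypothetical stand-in rules rather than trained boosting models, each upper layer receives only the outputs of the layer below it, and the final decision uses a 0.5 threshold. Training is omitted entirely; stacking implementations commonly fit upper layers on out-of-fold predictions of the lower layers, but the text does not specify the procedure used here.

```python
from typing import Callable, List, Sequence

# An estimator maps an input vector to a phishing probability in [0, 1].
Estimator = Callable[[Sequence[float]], float]


def run_layer(estimators: Sequence[Estimator], inputs: Sequence[float]) -> List[float]:
    """Apply every estimator of one layer to the same input vector."""
    return [est(inputs) for est in estimators]


def stacked_predict(
    layers: Sequence[Sequence[Estimator]], meta: Estimator, features: Sequence[float]
) -> int:
    """Feed each layer's outputs to the next layer, then threshold the meta-learner."""
    signal = list(features)
    for layer in layers:
        signal = run_layer(layer, signal)
    return 1 if meta(signal) >= 0.5 else 0  # 1 = phishing, 0 = legitimate


def mean_rule(weight: float) -> Estimator:
    """Hypothetical stand-in for a trained booster: a clipped, weighted mean."""
    return lambda xs: min(1.0, max(0.0, weight * sum(xs) / len(xs)))


layer1 = [mean_rule(w) for w in (0.9, 1.0, 1.1, 1.2)]  # four first-layer estimators
layer2 = [mean_rule(w) for w in (0.9, 1.0, 1.1)]       # three second-layer estimators
meta = mean_rule(1.0)                                  # single meta-learner

print(stacked_predict([layer1, layer2], meta, [0.9, 0.8, 0.7]))  # 1
print(stacked_predict([layer1, layer2], meta, [0.1, 0.2, 0.1]))  # 0
```

The layer sizes (four, three, one) follow the architecture described in this section; replacing `mean_rule` with fitted classifiers would leave the control flow unchanged.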

IV. EXPERIMENTATION RESULTS
In this study, we applied the proposed BMLSELM algorithm together with five boosting-based machine learning algorithms, namely CatB, LGBM, GB, AdaB, and XGB, to the four datasets listed in Table 1. The classification metrics used to evaluate the performance of the models include Precision, Recall, F-score, and Accuracy. We considered phishing instances as positive and legitimate instances as negative. The calculation of each metric was based on the following definitions:
• P: The total count of phishing instances.
• N: The total count of legitimate instances.
• TN: The count of legitimate instances that are correctly classified as legitimate by the model.
• FN: The count of phishing instances that are incorrectly classified as legitimate by the model.
• TP: The count of phishing instances that are correctly classified as phishing by the model.
• FP: The count of legitimate instances that are incorrectly classified as phishing by the model.
The calculation of each metric is shown below:
$$\text{Precision} = \frac{TP}{TP + FP}$$
$$\text{Recall} = \frac{TP}{TP + FN}$$
$$\text{F-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
$$\text{Accuracy} = \frac{TP + TN}{P + N}$$
We evaluated the performance of BMLSELM and compared it with five classification models on four datasets (D1, D2, D3, and D4) with all features, as well as on balanced and unbalanced K-topmost features, as described in section IV-C. Additionally, we conducted a comparative analysis of the results of BMLSELM on four phishing datasets with the existing literature, which is presented in section IV-D.
also compared with proposed BMLSELM which can be seen in Table 2. From the results, it is observed that XGB outperformed the other boosting algorithms across all datasets. The results also demonstrate that the proposed BMLSELM achieved significant performance in accuracy and MCC across all datasets compared to the XGBoost algorithm.
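The evaluation metrics used in this section all follow from the four confusion-matrix counts, with phishing as the positive class. The sketch below computes Precision, Recall, F-score, and Accuracy, and also the Matthews correlation coefficient (MCC) that the results report, using its standard definition. The function name `classification_metrics` and the example counts are hypothetical and are not taken from the result tables.

```python
import math
from typing import Dict


def _ratio(num: float, den: float) -> float:
    """Divide safely, returning 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute the metrics from counts, with phishing as the positive class."""
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)
    f_score = _ratio(2 * precision * recall, precision + recall)
    accuracy = _ratio(tp + tn, tp + tn + fp + fn)
    mcc = _ratio(
        tp * tn - fp * fn,
        math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    )
    return {
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=90, fp=10, tn=85, fn=15)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Returning 0.0 for an undefined ratio is a convention chosen for this sketch; other conventions, such as reporting the value as undefined, are equally possible.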

B. EXPERIMENT 2: EVALUATION OF BMLSELM ACROSS ALL DATASETS WITH FEATURE SELECTION
In this section, we apply feature selection prior to model training, and the dataset with the selected features is fed to the proposed model for classification. The embedded method, with features selected through the feature importance of the boosting algorithms, is applied across all datasets. The top k features from the boosting algorithms are chosen for the final feature selection. Based on the experimental analysis, k is chosen as 20 for D1, 33 for D2, 96 for D3, and 83 for D4. The results of the boosting algorithms and BMLSELM on the D1 dataset are shown in Table 3. The results clearly show that BMLSELM outperformed the other boosting algorithms, with an accuracy of 97.73 and an MCC of 95.44 on imbalanced data. The proposed model also performed better than the other boosting algorithms when balanced data was fed to the model. However, the proposed model performed better when given the imbalanced data than the balanced data. Note that these results from the proposed model use only 20 features from the balanced and imbalanced data.
As D2 is already balanced, we conducted the experiment with feature selection on the balanced data. The results of the proposed model and the other classifiers are shown in Table 4. The results demonstrate that the proposed model, BMLSELM, with 33 selected features achieved significant performance, with an accuracy of 98.8 and an MCC of 97.6. It is also observed that XGB achieved similar performance to BMLSELM, but with a slightly lower TPR.
Similarly, the traditional boosting algorithms and the proposed model were applied to the D3 and D4 datasets with 96 and 83 selected features, respectively. The results on the D3 dataset are shown in Table 5. It is observed that BMLSELM performed better than the other boosting algorithms with and without data balancing, with an accuracy of 96.18% and an MCC of 92.36. The results on the D4 dataset are given in Table 6. It is observed that BMLSELM achieved an accuracy of 97.33 and an MCC of 94.13 with imbalanced data, and an accuracy of 97.88 and an MCC of 95.77 with balanced data.

C. THE COMPARISON OF BOOSTING ALGORITHMS AND MLSELM WITH BMLSELM
In this section, we compare the proposed work with our existing work, MLSELM, as both were evaluated on the same datasets and use a stacking mechanism. The comparison results are shown in Table 8. Similarly, on D4, MLSELM achieved an accuracy of 98.43% and 97.41% under the balanced and unbalanced categories, respectively, with all features, while BMLSELM achieved 98.13% and 97.33% under the balanced and unbalanced categories, respectively. The performance difference between the two models on D4 was negligible, at 0.3% and 0.08%, respectively. However, BMLSELM with 83 reduced features achieved significant performance, with an accuracy of 97.88% on balanced data and 97.33% on unbalanced data.

D. THE COMPARISON OF BMLSELM WITH EXISTING LITERATURE
In this section, we compare the proposed work with existing works that used the same datasets for their experiments. The comparison results on the D1 dataset are given in Table 10. The table shows that the proposed model achieved better performance than existing works, with an accuracy of 97.87% with all features and 97.73% with only 20 features. The second comparison, on the D2 dataset, is shown in Table 9. Finally, the comparison on the D4 dataset in Table 11 shows that BMLSELM performed lower than our earlier work, but it achieved a significant accuracy of 97.88% with only 83 features, compared to 98.43% with 111 features.

V. CONCLUSION
In this paper, we proposed a feature selection-based stacking model (BMLSELM) that uses various boosting algorithms to identify relevant features. The boosting algorithms are also used to build a multi-layer stacking model, with estimators at different layers, to achieve significant performance. BMLSELM is applied to the D1, D2, D3, and D4 datasets to evaluate the performance of the model across different datasets. The model achieved significant performance on the D1 to D4 datasets in two cases, i.e., with feature selection and without feature selection. The model was evaluated with both balanced and imbalanced data. The experimental results of BMLSELM on the D1-D4 datasets demonstrate that the model achieved a significant accuracy of 97.4 (D1 with 20 features), 98.80 (D2 with 33 features), 96.18 (D3 with 96 features), and 97.88 (D4 with 83 features). Finally, the model is compared with baseline models, where it outperformed the existing models by a significant margin across different metrics. In future work, we would like to use different feature selection ensembles, clustering algorithms, and feature engineering techniques for hidden feature generation, which can help improve the detection accuracy of the model.