A Meta-Analysis Survey on the Usage of Meta-Heuristic Algorithms for Feature Selection on High-Dimensional Datasets

Feature selection (FS) using meta-heuristic algorithms on high-dimensional datasets (HDD) is becoming more prevalent due to the continuous advancement in data mining. However, identifying the threshold number of features at which a dataset should be categorised as an HDD remains an issue, owing to the different schools of thought on the matter. Therefore, this survey set out to determine the feature-count threshold for a dataset to be considered an HDD, and subsequently to identify the trend or potential FS method for HDD, as well as the most preferred meta-heuristic algorithms and classifiers for both wrapper-based and filter-based FS methods when analysing HDD. This study performed an extensive systematic literature review by implementing the PRISMA guidelines on 62 research articles published between 2016 and 2021. The survey proposes a novel grouping technique called literal grouping and data grouping (LGDG) to accurately group the chosen articles based on HDD; the LGDG method can serve as a guide for other researchers who intend to perform FS research related to HDD. Literal grouping searches the selected papers for specific keywords (HDD in this case), while data grouping compares the number of features in the datasets against the threshold, which the majority of studies set at 2,000 features. Based on the analyses of all the LGDG groupings, the filter-based FS method has gained more attention in recent years, with results no worse than wrapper-based methods, especially on HDD. Besides that, Moth Flame Optimisation works well in filter-based methods, the Cuckoo Optimisation Algorithm works well in wrapper-based methods, and the Whale Optimisation Algorithm works well in both FS methods. As for classifier preferences, SVM, DT, and NB are preferred by filter-based methods, while KNN is preferred by the wrapper-based method.
Future studies could extend this survey by reviewing other aspects, such as multi-objective FS on HDD, and by including more FS methods.


I. INTRODUCTION
The expansion of science and technology in the present era has resulted in a tremendous increase in data size and dimensionality. A high-dimensional dataset (HDD) is a collection of data with a large number of features [1]. However, HDD leads to insurmountable memory restrictions, expensive training, and computation costs, resulting in the ''curse of dimensionality'' [1], [2]. Consequently, it is necessary to execute dimensionality reduction by adopting feature selection (FS) to improve classification performance. FS is a process of identifying the most significant features [3]. It can be classified into wrapper-based, embedded-based, and filter-based methods [3]. Each FS method has its advantages and weaknesses; therefore, researchers often integrate effective meta-heuristic algorithms to improve performance. According to previous studies, meta-heuristic optimisation algorithms can simplify optimisation problems [4], [5], classification [6], [7], and FS [8], [9]. Some examples of meta-heuristic algorithms include the Ant Lion Optimiser (ALO) [10], Particle Swarm Optimisation (PSO) [11], and the Whale Optimisation Algorithm (WOA) [12]. Due to the presence of many available methodologies, this study conducted a systematic literature review of the most recent works on the subject from 2016 to 2021. This survey intends to determine the threshold number of features for a dataset to be categorised as an HDD. Besides that, this study also aims to identify the trend or potential FS method for HDDs, along with the most preferred meta-heuristic algorithms and classifiers when dealing with HDDs for both wrapper-based and filter-based FS methods. Several digital libraries and databases were used in this study to gather research articles, namely IEEE Xplore, ScienceDirect, Scopus, Springer, Taylor & Francis, Emerald Insight, and ACM.

(The associate editor coordinating the review of this manuscript and approving it for publication was Jerry Chun-Wei Lin.)
The remaining sections of this systematic literature review are organised as follows. Section II provides an overview and the most important definitions used in FS using meta-heuristic algorithms on HDDs. Section III discusses the research questions and selection criteria. Section IV contains information about data extraction and analysis of chosen articles. Section V groups the selected articles into studies employing HDDs and discusses the scale of datasets, metaheuristic algorithms used, FS methods applied, yearly publication growth, and classifier preferences. Section VI focuses on analysing HDDs by comparing the dataset distribution of each FS method. Section VII presents an overall discussion based on the content covered in the research questions. Finally, section VIII concludes and summarises the entire survey and provides suggestions for future work.

II. BACKGROUND
This section provides a concise summary of FS, meta-heuristic algorithms, and HDDs.
For the past decades, data mining has remained a hot research topic for researchers from various domains. Data mining is a broad field of data science that finds patterns and characteristics in massive amounts of data; it includes regression, clustering, anomaly detection, and classification [13]. Data classification entails assigning a class label to an instance based on a previously trained model [14]. In recent years, classification has relied heavily on FS, which refers to the selection of the most meaningful inputs [15]. FS can also be defined as omitting irrelevant and non-essential features in HDDs [16], [17], [18]. It intends to reduce time complexity and increase predictive precision [16], [17], [18]. Therefore, this data pre-processing step is very important to generate compact, high-quality datasets, enabling the classification model to achieve higher accuracy.
Researchers often deal with different kinds of datasets in feature selection. Datasets are interpreted as a matrix, with rows representing the instances and columns representing the features [1]. Datasets with many features are categorised as HDDs. High dimensionality leads to unmanageable memory constraints and high training computing costs, called the ''curse of dimensionality'' [1], [2]. Thus, feature selection has two key competing goals: (1) optimising classification efficiency and (2) minimising feature numbers to solve the ''curse of dimensionality'' [19]. Moreover, feature selection is perceived as a multi-objective challenge to balance the trade-off between the two opposing priorities. Hence, dimensionality reduction needs to be performed to reduce the number of features without compromising the retrieval of useful information from HDD to ensure classification performance.
Wrapper-based, embedded-based, and filter-based are the three types of feature selection methods [3]. Wrapper-based feature selection uses the strength of base classifiers to determine the best features in a dataset. Contrarily, embedded-based feature selection occurs during model training in the machine learning algorithm [20]. Both wrapper-based and embedded-based methods result in higher time complexity due to the intervention of classifiers in the feature selection process. Meanwhile, the filter-based feature selection method relies on the mutual information in a dataset. It ranks the features by generating a score for each without using the classification model [20]. Consequently, wrapper-based methods are computationally unfriendly for HDDs, and embedded-based methods require specific predictive models, whereas filter-based methods can be combined with any predictive model and thus integrate easily with HDDs [2]. Among the three, filter-based feature selection selects a subset of features without using any learning algorithm; it is therefore relatively faster than the wrapper-based method and feasible on HDDs. Moreover, filter-based methods possess the lowest complexity among the feature selection methods and are compatible with diverse datasets, including HDDs [20]. Since wrapper-based FS obtains high classification accuracy and filter-based FS maintains lower time complexity, this review only involves wrapper-based and filter-based FS methods, and it investigated their performance when dealing with HDDs.
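The contrast between the two FS families discussed in this review can be sketched in a few lines of pure Python. This is a minimal illustration, not code from any surveyed paper: the per-feature score (absolute class-mean difference) is a stand-in for measures such as mutual information, and a toy nearest-centroid classifier stands in for the base classifier of a wrapper.

```python
def filter_rank(X, y, k):
    """Filter-based FS: score each feature independently and keep the top-k.
    No classifier is involved. Assumes binary labels 0/1 (illustrative)."""
    def score(j):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        return abs(sum(a) / len(a) - sum(b) / len(b))
    return sorted(range(len(X[0])), key=score, reverse=True)[:k]

def nearest_centroid_accuracy(X, y, subset):
    """Toy wrapper evaluator: classify by nearest class centroid using only
    the features in `subset`, and return training accuracy."""
    cents = {}
    for label in set(y):
        rows = [row for row, lab in zip(X, y) if lab == label]
        cents[label] = [sum(r[j] for r in rows) / len(rows) for j in subset]
    correct = 0
    for row, label in zip(X, y):
        pred = min(cents, key=lambda c: sum((row[j] - cj) ** 2
                                            for j, cj in zip(subset, cents[c])))
        correct += pred == label
    return correct / len(y)

def wrapper_forward_select(X, y, k):
    """Wrapper-based FS: greedy forward selection driven by classifier
    accuracy. Every candidate subset requires a fresh evaluation, which is
    the source of the higher cost on high-dimensional data."""
    selected, remaining = [], list(range(len(X[0])))
    while len(selected) < k and remaining:
        best = max(remaining,
                   key=lambda j: nearest_centroid_accuracy(X, y, selected + [j]))
        selected.append(best)
        remaining.remove(best)
    return selected
```

Note the cost asymmetry: the filter scores each of the n features once, while the wrapper re-evaluates a classifier for every candidate subset, which is why wrappers scale poorly as dimensionality grows.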

III. SYSTEMATIC LITERATURE REVIEW
The formulated research questions of this systematic literature review are as follows:
1) What is the threshold for the number of features for a dataset to be categorised as an HDD?
2) What is the current trend or potential FS method for HDDs?
3) When dealing with HDDs, which meta-heuristic algorithms do researchers prefer for each FS method?
4) When dealing with HDDs, which classifiers do researchers prefer for each FS method?
Since various publications on FS using meta-heuristic algorithms on HDDs were retrieved, the following inclusion and exclusion criteria were employed to ensure that the search was oriented and relevant.

1) INCLUSION CRITERIA
a) The research articles must be published between 2016 and 2021.
b) The research articles must be published in peer-reviewed journals.
c) The research articles must be written in English.
d) Only research articles written as technical papers are included.
e) The research articles must use more than one dataset.

C. DOCUMENT AND BIBLIOGRAPHY MANAGEMENT
This systematic literature review applies the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flowchart [86] as depicted in Fig. 2. PRISMA is an evidence-based method for reporting systematic reviews and meta-analyses, which includes identifying resources, eligibility-checking resources, and indexing resources. Table 1 summarises the number of research articles discovered. During the stage of identifying resources, 533 publications were identified in 7 databases. Using specific keywords in the advanced-search query yielded fewer but more focused resources. This step ensured that the documents found were relevant to the search criteria, minimising the number of less relevant or irrelevant records.
As for the stage of resource eligibility-checking, 433 publications were excluded after going through the abstract. Duplicate entries were also eliminated at this stage. Of the 100 eligible publications, 62 were chosen after reading the full-text content. Mendeley Desktop was used to organise and manage all the bibliographic details and references.

IV. EXTRACTION AND ANALYSIS OF DATA
This section summarises the content extraction of the 62 selected publications. The publication title, methodology used, related findings, and FS methods are tabulated in Table 2 and are sorted by most recent year first, followed by the publication title arranged alphabetically. The results from each research work in Table 2 were achieved using their own experimental settings. Each paper was denoted with a key number, as indicated in the rightmost column.
Based on Table 2, eight main issues motivated the studies. Those issues are summarised in Table 3. Although the issues undertaken by each study varied, the findings were broadly similar: most of the articles achieved their goals and showed experimentally that the proposed methods outperformed other methods, whether in terms of classification accuracy, number of selected features, execution time, convergence, or fitness values.
Over the years, the publications related to FS using meta-heuristic algorithms increased, as depicted in Fig. 3. Besides that, based on the FS methods used in these 62 articles, the wrapper-based FS method was most preferred by researchers, appearing in the majority of articles at 80.6% (50 publications), while the filter-based FS method was reported in only 9 publications (14.5%), leaving the hybrid FS method the least preferred at 4.8% (3 publications).
From the 62 selected articles, the 10 most active researchers were identified and tabulated in Table 4. Each published at least 3 research articles in the related field from 2016 to 2021, contributing to FS with meta-heuristic algorithms by repeatedly demonstrating the competency of their proposed methods with new improvements each time. The number of publications was calculated by counting authors regardless of whether they were first authors or co-authors.

V. STUDIES USING HDD
The purpose of this section is to identify studies that truly used HDD. Not all of the 62 selected articles listed in Table 2 dealt with HDD; thus, a well-planned grouping technique was necessary to perform the HDD analysis accurately and with ease. This study introduces a novel grouping technique called literal grouping and data grouping (LGDG), which helps to group the selected papers into studies using HDD. The LGDG framework is depicted in Fig. 4.

A. LITERAL GROUPING
Literal grouping searches the selected articles for HDD keywords. All 62 selected articles were screened using specific keywords, and those containing the keywords were categorised as research articles employing HDD.

1) HDD KEYWORDS
The keywords used in searching the research article content were ''HDD,'' ''high dimensional,'' or ''high-dimensional.'' The Boolean operator 'OR' was used to join the terms in a search, resulting in more targeted and productive results; it also reduced time and effort by lowering the number of irrelevant matches. The keyword search is case insensitive, hence the results were not affected by character case. The 3 fields searched were the title, issue, and dataset description. For instance, K02's research title was ''A two-stage hybrid ACO for high-dimensional feature selection,'' its issue was to select the optimum feature subset in HDDs while avoiding local optima, and its datasets were described as 11 high-dimensional, low-sample datasets. Since all 3 fields contained the searched keywords, K02 was categorised as a study utilising HDD. A complete list of studies utilising HDD based on literal grouping is discussed in the following subsection.
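The literal-grouping rule above can be expressed as a small predicate. This is a sketch of the rule as described in the text, not code from the survey; the dictionary field names are illustrative.

```python
# Keywords joined by Boolean OR, matched case-insensitively, as described
# in the text. Note that substring matching also catches plurals ("HDDs").
KEYWORDS = ("hdd", "high dimensional", "high-dimensional")

def is_hdd_by_literal_grouping(article):
    """An article counts as an HDD study if any keyword appears in its
    title, issue statement, or dataset description (illustrative fields)."""
    fields = (article.get("title", ""),
              article.get("issue", ""),
              article.get("dataset_description", ""))
    return any(kw in field.lower() for field in fields for kw in KEYWORDS)
```

For example, the K02 record described above matches in all three fields, while an article that never mentions high dimensionality does not match at all.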

2) RESEARCH STUDIES BASED ON LITERAL GROUPING
Of the 62 selected articles, 19 fell under research topics concerning HDD based on literal grouping. The meta-heuristic algorithms, FS methods, datasets, average number of features, and classifiers used in these 19 articles are listed in Table 5. The number of datasets used ranged from 6 to 30, while the average number of features ranged from 39 to 10,408. In 10 of the articles, the datasets were retrieved from the UC Irvine Machine Learning Repository (UCI ML) [88], 2 from the Arizona State University repository (ASU) [89], 1 from the Kent Ridge Bio-medical Dataset (KRBD), and 7 from unspecified sources.
The top 3 studies with the highest average number of features implemented the Cuckoo Optimisation Algorithm, among others (Table 5).
As for the FS methods for the 19 articles based on literal grouping, 84.2% of the articles (16 publications) employed the wrapper-based FS methods while 15.8% (3 publications) used the filter-based FS methods.
From 2016 to 2021, the publications on FS using meta-heuristic algorithms for filter-based and wrapper-based FS methods, specifically those using HDD based on literal grouping, manifested an increasing trend, as depicted in Fig. 5.
According to Fig. 6, the classifier preference of the 19 HDD research articles based on literal grouping indicated that the filter-based FS methods were more diversified as they can be integrated with SVM (42.9%), RF (14.3%), NB (28.6%), and KNN (14.3%) classifiers, whereas the wrapper-based only used KNN (93.8%) and MLP (6.3%).
In the next section, data grouping is performed to ensure that the journal articles dealt with actual HDD. Some articles did not explicitly frame their research around HDD even though the size of their datasets qualified as HDD. Thus, to categorise such articles as HDD studies, a threshold on the number of features per dataset was identified using the minimum HDD feature numbers across all 19 literal-grouped HDD articles.

B. DATA GROUPING
Data grouping identifies articles that are related to HDD, even when the articles did not state that the datasets used were HDD, by comparing the number of features in the datasets against the threshold obtained from literal grouping. For example, if most of the HDD articles from literal grouping have at least x features in their datasets, then x is used as the threshold for the minimum number of features. Articles that satisfied the threshold were categorised as studies that employed HDD based on data grouping.
The first step in data grouping was determining the thresholds used in all 19 literal-grouped HDD journal articles identified in the previous subsection. In other words, the threshold must be obtained to perform data grouping for all the 62 selected articles.

1) HDD THRESHOLD
The reason for adopting a threshold on the number of features in HDD research is the existence of different schools of thought among the researchers listed in Table 5. Although 19 articles were identified as HDD studies by literal grouping, the dataset dimensions varied. For instance, K40 treated datasets with at least 30 features as HDD, while K62 viewed datasets with at least 2,308 features as HDD. Therefore, it is advisable to determine the threshold for the number of features adopted in these studies. The method used to identify the threshold was relatively straightforward: the 19 literal-grouped HDD papers were read and the minimum number of features used in their datasets was recorded, as depicted in Table 6.
According to Table 6, the minimum number of features across all datasets used sometimes differed from the minimum number of features for HDD, as in K03, K14, K17, K26, K28, K31, K40, K41, and K55. The variation is due to the mixed usage of both HDD and non-HDD in those studies. For instance, K03 had 12 datasets with a minimum of 9 features; however, K03 also specified that only 2 datasets were HDD, and the minimum number of features for those 2 HDDs was 2,000. Thus, based on K03, the threshold for the number of features to be categorised as HDD is 2,000. The minimum number of features for HDD is used as the threshold representing each study's point of view.
Based on the last column in Table 6, 8 of the 19 HDD articles (42.11%) acknowledged datasets with at least 2,000 features as HDD. Another 5 articles (26.32%) considered datasets with at least 2,308 features as HDD, while only 2 articles (10.53%) required at least 1,024 features as the threshold. The remaining articles were in the minority, with thresholds of only 617, 325, 265, and 30 features in their HDD studies. Therefore, from the thresholds of the 19 literal-grouped HDD articles, it was concluded that the threshold for the number of features is at least 2,000; the grouping process that applies this threshold is called data grouping.
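The threshold derivation above amounts to taking the most common per-study minimum. A minimal sketch, with the per-study minimum feature counts mirroring the distribution reported in the text (8 studies at 2,000, 5 at 2,308, 2 at 1,024, and one each at 617, 325, 265, and 30):

```python
from collections import Counter

def hdd_threshold(min_features_per_study):
    """Pick the threshold adopted by the majority of studies: the most
    common value among the per-study minimum HDD feature counts."""
    value, _count = Counter(min_features_per_study).most_common(1)[0]
    return value

# Per-study minimums as summarised from Table 6 in the text above.
mins = [2000] * 8 + [2308] * 5 + [1024] * 2 + [617, 325, 265, 30]
```

With these counts, `hdd_threshold(mins)` yields 2,000, matching the survey's conclusion.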
The threshold of 2000 is used across this review article while analysing whether the research studies covered are HDD research based on data grouping. For instance, since K02 has more than 2000 features in the datasets, it is categorised as a study employing HDD based on data grouping. The 3 situations to apply data grouping on all 62 selected articles are: a) Data grouping (1-match), b) Data grouping (mean-match), and c) Data grouping (all-match).
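The 3 data-grouping situations can be sketched as predicates over a study's list of per-dataset feature counts. This is an illustrative reading of the rules as described in this section (1-match and all-match are defined below; mean-match follows the "average number of features" description used later in the text):

```python
THRESHOLD = 2000  # from literal grouping

def one_match(feature_counts):
    """HDD study if at least one dataset reaches the threshold."""
    return any(n >= THRESHOLD for n in feature_counts)

def mean_match(feature_counts):
    """HDD study if the average number of features reaches the threshold."""
    return sum(feature_counts) / len(feature_counts) >= THRESHOLD

def all_match(feature_counts):
    """HDD study only if every dataset reaches the threshold."""
    return all(n >= THRESHOLD for n in feature_counts)
```

For the worked example in the text, a study with datasets of 500, 1,200, and 2,300 features passes 1-match but fails mean-match (average ≈ 1,333) and all-match, which is why 1-match admits the most studies and all-match the fewest.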
A complete list of research articles using HDD based on the 3 situations of data grouping is discussed in the following subsections.

2) STUDIES BASED ON DATA GROUPING (1-MATCH)
The 1-match technique holds the highest chance of categorising articles as HDD studies. For instance, if a study consists of 3 datasets (500, 1,200, and 2,300 features) and obtains a 1-match (the dataset with 2,300 features hits the threshold), it is categorised as an HDD study because it has at least 1 dataset matching the threshold (2,300 >= 2,000).
Of the 62 articles, 26 were identified as journal articles utilising HDD based on data grouping (1-match). The meta-heuristic algorithms, FS methods, datasets, average number of features, and classifiers used in these 26 articles are summarised in Table 7. The number of datasets used ranged from 4 to 30, while the average number of features ranged from 384 to 12,852. Fifteen of the 26 articles retrieved their datasets from the UCI ML repository, 3 from the ASU repository, 1 from the KRBD repository, 1 from the Kaggle repository [90], 1 from the Keel repository [91], and 9 from unspecified sources.
The top 3 research articles with the highest average number of features in Table 7 implemented WOA (K43), the Cuckoo Optimisation Algorithm (K04), and MFOA (K07), indicating that these algorithms can perform well in FS for HDD based on 1-match data grouping. Furthermore, PSO (K31, K32, K45, K62), GWO (K08, K13, K55), and WOA (K11, K43, K48) were the preferred algorithms according to Table 7, and several other meta-heuristic algorithms also appeared more than once.

As for the classifier preferences of the 26 HDD studies based on data grouping (1-match) (Fig. 8), the filter-based FS methods indicated higher adaptability, integrating with 5 well-known classifiers, namely SVM (40%), RF (10%), NB (30%), KNN (10%), and DT (10%). Meanwhile, the wrapper-based FS methods had only 3 classifier preferences: KNN (86.95%), DT (8.7%), and MLP (4.35%).

3) STUDIES BASED ON DATA GROUPING (MEAN-MATCH)
Of the 62 publications, 14 were identified as research using HDD based on data grouping (mean-match). The meta-heuristic algorithms, FS methods, datasets, average number of features, and classifiers used in these 14 publications are summarised in Table 8. Based on Table 8, the number of datasets used ranged from 4 to 30, while the average number of features ranged from 3,550 to 12,852. Four of the 14 publications retrieved their datasets from the UCI ML repository, 2 from the ASU repository, 1 from the KRBD repository, and 8 from unspecified sources.
Of these 14 HDD studies based on data grouping (mean-match), the wrapper-based FS methods were employed in 11 publications (78.6%), while the filter-based FS methods were utilised in 3 publications (21.4%).
The number of publications on FS using meta-heuristic algorithms for filter-based and wrapper-based FS methods, specifically those under HDD studies based on data grouping (mean-match), demonstrated an increasing trend from 2016 to 2021 (Fig. 9). The trendline gradient for filter-based FS methods was almost as steep as that of the wrapper-based FS methods, indicating that this data grouping situation (mean-match) depicts the actual growth in researchers' preference for adopting filter-based methods in studies involving HDD on average.
As for the classifier preference of the 14 HDD studies based on data grouping (mean-match), as depicted in Fig. 10, the filter-based FS methods illustrated more adaptability than the wrapper-based, integrating with 4 classifiers: SVM (42.86%), RF (14.3%), NB (28.6%), and DT (14.3%). Wrapper-based FS methods yielded a narrower classifier preference, integrating with only 3 classifiers: KNN (83.33%), with DT and MLP at 8.33% each.

4) STUDIES BASED ON DATA GROUPING (ALL-MATCH)
The all-match data grouping was applied to all the 62 selected journal articles in this subsection. All-match refers to the grouping technique that categorises an article as employing HDD if and only if every dataset used matched the threshold of 2,000 features. For instance, if a study consists of 3 datasets (with 2,000, 5,000, and 7,000 features) and obtains an all-match, it is categorised as HDD research since all the datasets matched the threshold. Therefore, this data grouping technique has the narrowest chance of categorising articles as HDD studies, as it requires every dataset to have at least 2,000 features.
Eight of the 62 publications were categorised as studies utilising HDD based on the all-match data grouping. Table 9 lists the meta-heuristic algorithms, FS methods, datasets, average number of features, and classifiers used in these 8 publications. The number of datasets used ranged from 4 to 13, while the average number of features ranged from 6,981 to 12,852. One of the 8 publications retrieved its datasets from the KRBD repository, and 7 from unspecified sources.
The top 3 publications by average number of features, K43, K07, and K48, employed WOA (K43 and K48) and QMFOA (K07), indicating the superior capability of these two meta-heuristic algorithms to solve FS problems on colossal HDD. Besides that, the top two publications (K43 and K07) employed filter-based FS methods over wrapper-based, indicating that the filter-based method was preferred for datasets with extremely high dimensionality. The strong showing of WOA proved its capability to work well in both wrapper-based and filter-based FS methods; notably, the filter-based study K43, with an average of 12,852 features, topped the list for every data grouping, including 1-match, mean-match, and all-match.
Of the 8 HDD works in data grouping (all-match), the wrapper-based FS methods were employed in the majority of publications (6 publications, 75%), while the filter-based FS methods were utilised in only 2 publications (25%). Fig. 11 demonstrates an increase in the number of publications on FS using meta-heuristic algorithms for filter-based and wrapper-based FS methods, specifically those under HDD studies based on data grouping (all-match), from 2016 to 2021. The trendline for filter-based FS methods was as steep as that for wrapper-based FS methods, demonstrating that researchers' preferences for adopting filter-based and wrapper-based FS methods are on par. Based on Fig. 12, filter-based FS methods demonstrated adaptability by integrating with 3 classifiers, SVM (50%), NB (25%), and DT (25%), whereas wrapper-based FS methods only integrated KNN (83.33%) and MLP (16.67%).

VI. HDD ANALYSIS
In this section, HDD analysis was performed based on the 4 LGDG grouping techniques introduced in Section V: 19 publications by literal grouping, 26 by data grouping (1-match), 14 by data grouping (mean-match), and 8 by data grouping (all-match).

A. LITERAL GROUPING
This subsection analyses the datasets used by the 19 HDD studies based on literal grouping, by counting the number of datasets that fall within each specific range of feature numbers. Table 10 lists the details of the dataset distribution. For instance, K01 has 1 dataset with 301-500 features and 5 datasets with at least 2,000 features; hence, the value in column '301-500 (f)' is 1 and the value in column '>=2000 (f)' is 5, while all the other columns hold the value 0.
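The counting step behind Table 10 can be sketched as a simple binning function. This is illustrative only: the text explicitly names the '301-500 (f)' and '>=2000 (f)' columns, so the remaining bin edges below are assumed.

```python
# Assumed bin edges below 2,000 features; only '301-500' and '>=2000'
# are confirmed by the text, the others are illustrative placeholders.
BINS = [(1, 100), (101, 300), (301, 500), (501, 1000), (1001, 1999)]

def distribution(feature_counts):
    """Count how many of a study's datasets fall into each feature range,
    mirroring one row of the Table 10 dataset-distribution layout."""
    row = {f"{lo}-{hi} (f)": 0 for lo, hi in BINS}
    row[">=2000 (f)"] = 0
    for n in feature_counts:
        if n >= 2000:
            row[">=2000 (f)"] += 1
        else:
            for lo, hi in BINS:
                if lo <= n <= hi:
                    row[f"{lo}-{hi} (f)"] += 1
                    break
    return row
```

Applied to a K01-like study (one dataset with 400 features, five with at least 2,000), this reproduces the example row described above.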
Generally, a higher count in the '>=2,000 (f)' column indicates that a study used more datasets with large numbers of features. However, of the 298 datasets across all 19 publications (Table 10) based on literal grouping, only 152 datasets (51.01%) reached the threshold of 2,000 features, while 146 datasets (48.99%) did not. Therefore, only 51.01% of the datasets in literal grouping qualified to be categorised as HDD.
Besides that, of the 234 datasets in wrapper-based HDD studies, 109 (46.58%) did not reach the threshold of 2,000 features, whereas 125 (53.42%) reached the threshold. In other words, the percentage of datasets qualified to be HDD in wrapper-based HDD studies was 53.42%. Fig. 13 illustrates the data distribution for wrapper-based HDD studies based on literal grouping.
On the other hand, there are 64 datasets in filter-based HDD studies, where 37 datasets (57.81%) did not reach the threshold of 2,000 features and 27 (42.19%) reached the threshold. In short, the percentage of datasets qualified to be HDD in filter-based HDD research based on literal grouping was 42.19% (Fig. 14).

B. DATA GROUPING (1-MATCH)
In this subsection, the datasets used by the 26 HDD studies based on data grouping (1-match) were analysed. The number of datasets with a specific range of features was counted in  this process. Table 11 lists the details of dataset distribution. Based on the table, of the 404 datasets, 231 (57.18%) datasets did not reach the 2,000 features threshold, while 173 (42.82%) datasets reached the threshold. This analysis indicated that only 42.82% of the datasets from the 26 publications in data grouping (1-match) qualified to be HDD. This percentage is relatively low.
Furthermore, of the 336 datasets in the wrapper-based HDD studies, 194 (57.74%) did not reach the threshold of 2,000 features, whereas 142 (42.26%) satisfied it. Hence, only 42.26% of the datasets qualified to be HDD in the wrapper-based HDD studies based on data grouping (1-match) (Fig. 15).
Of the 68 datasets in the filter-based HDD studies, only 31 (45.59%) reached the threshold, while 37 (54.41%) did not. Hence, only 45.59% of the datasets qualified to be HDD in the filter-based HDD studies based on data grouping (1-match) (Fig. 16).

C. DATA GROUPING (MEAN-MATCH)
This subsection analyses the datasets used by the 14 HDD studies based on data grouping (mean-match). This process determined the number of datasets within specific ranges for the number of features. The details of the dataset distribution are tabulated in Table 12. A majority of the 169 datasets, 135 (79.88%), reached the threshold of 2,000 features, while 34 (20.12%) did not. Hence, 79.88% of the datasets qualified to be HDD based on data grouping (mean-match), higher than under literal grouping and data grouping (1-match).
Of the 146 datasets in wrapper-based HDD studies, 113 (77.4%) reached the threshold of 2,000 features, while 33 (22.6%) did not. Hence, 77.4% of the datasets in the wrapper-based HDD studies based on data grouping (mean-match) qualified to be HDD (Fig. 17).
In the filter-based studies, only 1 dataset (4.35%) did not reach the threshold, whereas 22 (95.65%) reached the threshold of 2,000 features. According to Fig. 18, 95.65% of the datasets qualified to be HDD in filter-based HDD research based on data grouping (mean-match), 18.25 percentage points higher than the wrapper-based.

D. DATA GROUPING (ALL-MATCH)
This subsection analyses the datasets used by 8 HDD studies based on data grouping (all-match) by determining the number of datasets with specific features range (Table 13).
According to Table 13, all 79 datasets (100%) reached the threshold of 2,000 features. This is a predictable outcome, since the all-match grouping requires every dataset to reach the threshold. It is also the most convincing form of data grouping, whereby every dataset used in all the studies was HDD. Therefore, the datasets in both the filter-based (17 datasets) and wrapper-based (62 datasets) studies were accepted as HDD, as depicted in Fig. 19 and Fig. 20, respectively.

VII. DISCUSSIONS
This section discusses the trend or potential FS methods in dealing with HDD. The percentage of HDD versus non-HDD for all grouping techniques is listed in Table 14. The values under the four categories (all FS, wrapper, filter, and hybrid methods) are percentages (%), with the non-HDD and HDD percentages summing to 100% for each category in each grouping.
Based on Table 14, across all 62 publications, 82.09% were non-HDD and 17.91% were HDD for the all-FS-methods category; that is, only 17.91% of the 62 publications reviewed were HDD studies. Studies that employed hybrid FS methods did not deal with HDD. The wrapper-based and filter-based FS methods measured 17.21% and 27.93% HDD studies, respectively, a difference of 10.72%. This percentage suggested that studies with filter-based FS methods focused more on HDD than those with wrapper-based FS methods (Fig. 21).
For literal grouping, 53.42% of the studies employing wrapper-based FS methods involved HDD, while 46.58% involved non-HDD; for filter-based FS methods, 42.19% of the studies involved HDD. Both the wrapper- and filter-based methods indicated an increment compared to the previous grouping of all selected publications. This observation implied that the literal grouping successfully differentiated studies employing HDD, as presented in Fig. 22.
As for 1-match data grouping, readings similar to literal grouping were observed. The filter-based FS methods (45.59% HDD) outperformed the wrapper-based method (42.26% HDD). Despite the slight difference between the two methods, both presented better numbers than without LGDG grouping, indicating that this data grouping method also effectively differentiated HDD studies (Fig. 22 (b)).
Meanwhile, the mean-match data grouping adopted the average number of features in the datasets used and obtained the most accurate numbers compared to the other grouping techniques, with 79.88% HDD on average. Across the 14 studies that employed HDD, 79.88% of the datasets had 2,000 or more features. Studies employing wrapper-based FS methods counted 77.40% HDD, while for filter-based FS methods, 95.65% of the datasets qualified as HDD (Fig. 22 (c)).
Finally, the all-match data grouping, being the narrowest grouping technique, revealed that 100% of the studies employing wrapper-based and filter-based FS methods reached the threshold. The datasets used in the 8 studies were of extremely high dimension, and therefore the group measured 100% HDD regardless of the FS method (Fig. 22 (d)). Table 15 presents the percentage preference of each FS method under all grouping techniques. It can be concluded that wrapper-based FS methods were preferred over filter-based methods in the studies reviewed in this survey. Specifically, K48 pointed out that wrapper-based FS methods are increasingly being employed in place of filter-based FS methods with the expansion of data mining techniques in many sectors [79]. This idea was supported by 50 of the 62 studies reviewed in this survey. HDD studies based on data groupings such as K02 and K44 stated that filter-based FS methods might overlook feature dependencies and preserve redundant or irrelevant features due to the absence of machine learning algorithms in FS [22], [34]. Furthermore, K02, K13, K44, and K62 also suggested that filter-based FS methods obtained lower classification accuracy but consumed less computational cost, unlike wrapper-based methods, which need to build learning models to evaluate each selected feature subset [22], [34], [48], [71]. Hence, meta-heuristic algorithms are widely used in wrapper-based FS to tackle the computational cost issue when dealing with HDD.
Undeniably, all 62 publications proved that meta-heuristic algorithms were helpful in the FS of HDD, as they can alleviate the computational load of wrapper-based FS methods. At the same time, many studies overlooked an essential factor: filter-based FS methods can also provide promising outcomes in the FS of HDD. The studies reviewed in this survey revealed that filter-based FS methods can achieve high accuracy in experimental results, no less than wrapper-based FS methods. When integrated with powerful meta-heuristic algorithms, filter-based FS methods also offered lower computational costs, excellent FS, and potentially good classification accuracy. Studies utilising HDD based on the LGDG grouping techniques, such as K01, K07, K31, and K43, accomplished excellent experimental outcomes [20], [26], [63], [67].
The graphical presentation of Table 15 is shown in Fig. 23, in which the ratio of filter-based to wrapper-based methods exhibits a significant increase under the LGDG grouping techniques.
As illustrated in Fig. 23, the ratio of filter-based to wrapper-based methods increased when the grouping techniques were applied. The tendency to use filter-based FS methods increased especially under data grouping (mean-match) and data grouping (all-match). Both these grouping techniques are comparatively more accurate because their proportions of HDD versus non-HDD were genuinely higher (79.88% and 100% HDD for mean-match and all-match, respectively; refer to Table 14 and Fig. 22). Therefore, based on Table 14 and Table 15, the filter-based FS methods demonstrated an upward trend. In short, the number of studies that considered using filter-based methods to deal with HDD has increased.
Without applying LGDG grouping, only 14.5% of the 62 papers utilised filter-based methods. However, based on the mean-match and all-match data grouping values in Table 15, filter-based FS methods acquired 21.4% and 25% preference in HDD studies, respectively. Although the filter-based FS method is not as commonly used as the wrapper-based FS method, its acceptance among researchers for HDD has increased.
In short, both wrapper- and filter-based FS methods have their advantages, and both acquired outstanding results with HDD, as discussed in this survey. The key is to choose the right method based on the nature of the datasets used. For instance, wrapper-based FS methods are often used in studies on HDD with low sample numbers (K02 and K48), whereas filter-based FS methods are often used on medical HDD or microarray gene expression HDD (K07 and K43).
Furthermore, integrating suitable meta-heuristic algorithms could also boost the performance of both wrapper- and filter-based FS methods. Table 16 summarises the top 3 meta-heuristic algorithms with the highest average number of features based on the LGDG groupings (refer to Table 5 (literal grouping), Table 7 (1-match), Table 8 (mean-match), and Table 9 (all-match)). In short, different FS methods work well with different classifiers, and knowing the best combination increases the classification performance. Table 18 presents the top classifiers used in the studies with the highest average number of features based on the LGDG groupings (refer to Table 5 (literal grouping), Table 7 (1-match), Table 8 (mean-match), and Table 9 (all-match)).
Based on Table 18, the classifiers were limited to SVM, KNN, DT, and NB. The SVM, DT, and NB were the preferred classifiers for datasets with extremely high dimensions using the filter-based FS methods, whereas KNN was preferred for the wrapper-based FS methods. These classifiers were also widely used across all 62 selected publications, as depicted in Table 19. Each classifier has its own strengths and is hence preferred according to the field of study.
However, categorising HDD research using the proposed LGDG technique is done manually and is therefore time-consuming. The search for HDD keywords in the title, issue, and dataset description fields in literal grouping requires manual effort. Similarly, for data grouping, each study's datasets were manually compared against the threshold of 2,000 features to correctly group them as HDD for analysis.
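The grouping rules described in this survey are mechanical once a study's datasets and text fields have been extracted, so the manual step could, in principle, be scripted. The following is a minimal sketch of the LGDG decision rules; the function names, field layout, and keyword list are illustrative assumptions, not part of the survey's tooling.

```python
# Illustrative sketch of the LGDG grouping rules (not the survey's actual tool).
THRESHOLD = 2_000  # feature-count threshold set by the majority of studies
HDD_KEYWORDS = ("high-dimensional", "high dimensional", "hdd")  # assumed keyword list

def literal_group(text_fields):
    """Literal grouping: any HDD keyword appears in the title, issue,
    or dataset-description fields of a study."""
    return any(k in field.lower() for field in text_fields for k in HDD_KEYWORDS)

def data_group(feature_counts, rule):
    """Data grouping: compare each dataset's feature count to the threshold
    under one of the three subgroup rules."""
    hits = [n >= THRESHOLD for n in feature_counts]
    if rule == "1-match":      # at least one dataset reaches the threshold
        return any(hits)
    if rule == "mean-match":   # the average number of features reaches it
        return sum(feature_counts) / len(feature_counts) >= THRESHOLD
    if rule == "all-match":    # every dataset reaches the threshold
        return all(hits)
    raise ValueError(f"unknown rule: {rule}")

# Example: a hypothetical study using three datasets of 500, 2,400 and 7,100 features
counts = [500, 2_400, 7_100]
print(data_group(counts, "1-match"))     # True  (2,400 and 7,100 reach 2,000)
print(data_group(counts, "mean-match"))  # True  (mean ~3,333 >= 2,000)
print(data_group(counts, "all-match"))   # False (500 < 2,000)
```

As the example shows, the same study can qualify as HDD under 1-match and mean-match yet fail all-match, which is why the all-match subgroup is the narrowest and the most convincing of the three.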

VIII. CONCLUSION AND FUTURE WORKS
Over the years, meta-heuristic algorithms have demonstrated their capability in various domains, including FS. With technological advancement, data expansion is unavoidable, as an enormous amount of data is generated every second across different fields. FS is no longer a task of simply selecting the relevant features that contribute to classification accuracy with a minimum number of selected features. Instead, FS must keep up with the pace of data growth, as it also has to tackle the "curse of dimensionality" on HDD. Therefore, it is crucial to integrate meta-heuristic algorithms to aid FS.
With that being said, different FS methods are available to accomplish the tasks mentioned earlier. For instance, filter- and wrapper-based FS methods are the most frequently used when dealing with HDD using meta-heuristic algorithms. Since there are different schools of thought on determining whether a dataset is HDD, this study surveyed the threshold for the number of features used to categorise a dataset as HDD. This study also aimed to identify the trend and potential FS methods for HDD, together with the most preferred meta-heuristic algorithms and classifiers when employing HDD for both wrapper- and filter-based FS methods. Therefore, an extensive systematic literature review was conducted by implementing the PRISMA guidelines. The 62 journal articles selected for this survey were published between 2016 and 2021 and were retrieved from 7 databases accessed through the digital libraries in Universiti Tun Hussein Onn Malaysia.
To accurately group the 62 publications into HDD studies, this survey proposed a novel grouping technique called the LGDG. Literal grouping refers to searching the selected articles for HDD keywords; a total of 19 literal-grouped articles were categorised as HDD research. The data grouping consisted of 3 subgroups, namely 1-match, mean-match, and all-match. The threshold for the number of features, based on the 19 articles from literal grouping, was set at 2,000 features by the majority. The 26 publications categorised as HDD research by data grouping (1-match) required at least 1 dataset with 2,000 features or more, while mean-match data grouping identified 14 publications as HDD research, in which the average number of features had to be greater than or equal to the threshold. Finally, 8 articles were categorised as HDD research by all-match data grouping, where every dataset required 2,000 features or more. The publications under all LGDG groupings were analysed and discussed from different aspects, such as the overall preference by researchers, the increasing trend of different FS methods in the yearly publication charts, suitable classifiers for each FS method, and the meta-heuristic algorithms that excel in both FS methods.
Based on the findings, different meta-heuristic algorithms and classifiers were preferred for different FS methods. Moreover, studies with wrapper-based FS methods demonstrated a remarkable ability to obtain high classification performance. However, the filter-based FS methods also gained more attention in recent years, with competent results on HDD.
In conclusion, as suggested by the No Free Lunch theorem, there is no absolute answer to the question of the best FS method of all. Once again, the key is to find a suitable method depending on the dataset's nature.
In the future, other aspects such as multi-objective FS on extreme HDD could be reviewed. Including other FS methods could be considered as an extension of the current survey. Lastly, the proposed LGDG technique could be improved to cover more extensive and effective searching methods for HDD research.