Phishing Classification Techniques: A Systematic Literature Review

Phishing has become a serious problem over the past 10 years, and many reviews have described attack patterns and anticipated the use of different methods. However, their results are still not comprehensive, which leaves a critical gap in phishing reports. Therefore, this study conducts a systematic review of a more crucial issue in phishing attacks, namely classification techniques. The issues were categorized into techniques, datasets, performance evaluation, and phishing types. The results are expected to help developers prevent future phishing attacks more effectively, especially by carefully selecting the techniques and evaluations that address specific types of phishing.


I. INTRODUCTION
Phishing attacks are among the most common cyberattack threats on the internet [1]. Phishing is a technique used to obtain sensitive data, such as bank account numbers, or to access larger computerized systems through fraudulent email or website requests (National Institute of Standards and Technology). The attackers often impersonate a legitimate entity to steal information from its members or users [2]. These attacks often focus on requests to change identity details, passwords, and other important information, using email, social media, and other channels. In the industrial sector, the Anti-Phishing Working Group stated that the main targets of these attacks are presently webmail, financial institutions, payments, social media, and e-commerce [3]. Phishing attacks also involve the use of the world's top internet services, such as Namecheap (24%), Google (16%), Public Domain Registry (PDR) (19%), NameSilo (6%), Tucows (7%), and other channels (28%). Moreover, SSL on phishing websites is one of the attackers' mainstays for deceiving their victims. One of the most effective techniques for detecting these attacks is classification, which has been widely used to detect fraudulent activities on websites and emails. To improve the accuracy of detecting phishing attacks, researchers have developed various techniques, such as feature selection [4]-[9] and ensemble learning [8], [10]-[14]. This motivated reviews of classification techniques for preventing phishing attacks. These reviews are expected to provide more insight into the attack techniques.
The associate editor coordinating the review of this manuscript and approving it for publication was Zhan Bu.
Although various classification techniques are continuously emerging, their accuracy is still affected when big and recent data are used [15]. For example, phishing classification has become a trend in recent years, although it produces different performance results depending on the objectives and dataset used. Therefore, a mechanism is needed to systematically analyze the performance and variety of the techniques presently available for detecting phishing, especially classification methods. This study complements existing reports, such as [2], [16]-[18], with a systematic literature review (SLR) that focuses on phishing classification techniques. Qabajeh et al. [16] analyzed phishing prevention techniques based on education and legal aspects, and on computerized approaches using human-crafted and intelligent machine learning methods. Their work focused on comparing conventional and intelligent prevention techniques. In the present SLR, the development of phishing is evaluated, accompanied by a description of how classification techniques have been used over the last 10 years. Akinyelu [17] analyzed detection and prevention techniques for phishing websites and emails, as well as the use of datasets as performance benchmarks. Furthermore, Gangavarapu et al. [18] focused on email phishing prevention techniques by evaluating the feature selection (extraction), applicability, learnability, and generalizability of several state-of-the-art machine learning methods. The present review also builds on [2], [17], [18] with an SLR on phishing classification techniques covering emails, financial data, short messaging services (SMSs), tweets, uniform resource locators (URLs), web pages, and websites.
Subsequently, more insights are provided on the use of feature selection, with the SLR answering the following questions, (1) What phishing types mostly occur for classification techniques?, (2) What data sources do phishing classification techniques mostly use?, (3) What methods are often used for phishing classification techniques?, and (4) What performance evaluations are often used for phishing classification techniques?
This study aims to provide a more comprehensive SLR while focusing on the classification techniques for phishing attacks. It is also intended as a guide for developing the prevention of phishing attacks through more accurate classification techniques. The contributions of this research are as follows:
1) The identification of more comprehensive future development opportunities, such as determining the limit value of performance evaluation, expert collaboration, as well as the exchange of data and information, for phishing detection of different languages.
2) A review focused on performance evaluation, data sources, phishing attack types, as well as the explanation of parameter settings and validation techniques.
3) The investigation of popular phishing attacks, such as emails, financial data, SMS, tweets, URLs, web pages, and websites.
This study is organized into the following sections: (1) Section II, where the SLR is compared with previous related results, (2) Section III, where the obtained literature related to this review is thoroughly evaluated, (3) Section IV, where the basics of phishing attacks are explained from various sources, (4) Section V, where phishing is assessed, including the technical aspects, datasets, types, accuracy performance, recommendations, and future insights, and (5) Section VI, where the conclusions are provided.

II. RELATED WORKS
Many reviews have comprehensively described phishing attacks over the last 10 years, ranging from the environment to technical and non-technical preventive techniques. However, not all of these reviews focused on classification techniques. Several previous studies specifically carried out a more comprehensive assessment of the classification methods, performance evaluation, datasets, and phishing types used within the last ten years (2010-2020). Based on this condition, the related reviews were divided into several groups, namely Twitter, SMS, email, website, URL, and financial data. To add more in-depth insight, these were further divided by more variables, namely dataset, classifier, parameter settings, features, validation method, and evaluation metrics. A summary of the reports related to phishing reviews is shown in Table 1.
Das et al. [2] reviewed phishing URLs, websites, emails, and user studies using various parameters, namely feature, detection method, dataset, and evaluation metrics. The diversity of the dataset was evaluated for each phishing review. The review also provided recommendations for dealing with phishing email issues, although it did not mention the detection technique parameters used by the researchers for URLs, websites, and email. These parameters are indispensable for researchers who wish to perform comparisons with others. Furthermore, Khonji et al. [19] conducted a survey based on human factors, blacklists, heuristics, visual similarity, and data mining, which focused only on a variety of phishing detection techniques and solutions. In contrast, Varshney et al. [20] only surveyed phishing detection techniques without a preventive solution, covering search engines, heuristics and machine learning, phishing blacklists and whitelists, visual similarity, DNS, proactive URLs, and mobile websites. Moreover, neither Khonji et al. [19] nor Varshney et al. [20] described the detection and performance evaluation techniques mostly used against these phishing attacks, leading to the development of other methods by researchers.
Qabajeh et al. [16] conducted a brief survey of detection techniques for website phishing attacks, with the analytical categories grouped into traditional and computerized methods. This provided only limited information on existing techniques, as well as a description of the methods found in phishing detection. Therefore, some information, such as performance evaluation and parameters, was not stated in the analysis. This was in line with [21], which conducted a systematic review on phishing websites using datasets, features, techniques, and evaluation metrics as criteria. The results obtained by [21] were more comprehensive than those of [16], although some explanations were still needed, such as the phishing detection technique parameters. Gangavarapu et al. [18] and Almomani et al. [22] focused on phishing emails, covering methods, datasets, and evaluation metrics. These reviews only surveyed the techniques used against phishing emails, with the difference between Almomani et al. [22] and Gangavarapu et al. [18] being the provided solution. The prevention technique categories were network-level protection, authentication, client-side tools, user education, server-side filters, and classifiers. The results obtained by Almomani et al. [22] were therefore more comprehensive than those of Gangavarapu et al. [18], due to the coverage of both methods and solutions for preventing phishing attacks. However, how the parameters were used in the detection techniques was not comprehensively explained.
VOLUME 10, 2022
Akinyelu et al. [17] also reviewed phishing websites and emails, with detection techniques and datasets being the only categories used. This was a brief categorical survey, in which the influence of the obtained literature was described and feedback was subsequently provided. However, the detection technique parameters were not explained, which limits the understanding of the influence of the variables. Aleroud and Zhou [23] reviewed phishing systems, namely environment, techniques, and countermeasures, through a very comprehensive survey. The prevention techniques included machine learning, text mining, human users, profile matching, and other methods (ontology, honeypots, search engines, and client-server authentication). Their results showed only seven phishing attack classification techniques, compared to the present study. Additionally, performance evaluations were only conducted on anti-phishing tools. Various related studies also described the techniques used to detect phishing attacks, although they left several issues undisclosed, such as (1) how the datasets were distributed across the phishing attacks, (2) the use of popular techniques for phishing types, (3) the use of phishing type evaluations, and (4) the use of parameters for phishing classification techniques. For example, many parameters were not described by the researchers, leading to unbalanced comparisons with other studies. Therefore, information on parameters is crucial to researchers during comparative analyses.

III. METHODOLOGY
This section describes the SLR method, questions (Qs), search strategy, study selection, data extraction, quality assessment, and data synthesis. Based on Fig. 1, the SLR had nine stages, namely (i) identifying SLR needs, (ii) building a code of conduct, (iii) evaluating the code of conduct, (iv) searching for related reviews, (v) selecting related reviews, (vi) obtaining the information related to SLR, (vii) evaluating the quality of related studies, (viii) combining the results, and (ix) describing the SLR results.

B. STUDY QUESTIONS
The questions raised in this study are as follows:
Q1: What phishing types mostly occur for phishing classification techniques?
Q2: What data sources are mostly used for phishing classification techniques?
Q3: What methods are often used for phishing classification techniques?
Q4: What performance evaluations are often used for phishing classification techniques?

C. SEARCH STRATEGY
After identifying the study questions (Fig. 2), several queries and a journal database were selected and evaluated. The database covered many quality publishers, such as IEEE, Springer, ACM, Wiley, and Emerald. Moreover, the queries were defined based on the predefined questions, using phishing and classification terms. These queries were applied to the articles' titles, abstracts, and keywords. Parameters for the publication year, document type, and article category were also added in this process, so that the required documents had to fall within the computer science category. To bring the results closer to answering the study questions, the search focused on documents published between January 2010 and December 2020. Based on Table 2, the inclusion criteria for determining related studies included phishing topics, as well as the comparison and application of classification techniques. Articles outside the inclusion criteria were separated from the priority SLR journals.

Based on the search process, the keyword "phishing" returned 1,669 articles in TOPIC, leading to the rearrangement of the keywords. To narrow the results to phishing articles, the keyword was also used in the TITLE, which produced 225 articles. The keywords were then further refined to TITLE (phishing) AND TOPIC (classifi*), which produced 86 articles. The articles irrelevant to computer science were then discarded, leading to the selection of 68 studies of phishing attack classification techniques.
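The selection funnel described above can be illustrated with a minimal Python sketch. It assumes that each bibliographic record is a dictionary with hypothetical `title`, `topic`, and `category` fields, where `topic` stands in for the database's TOPIC field (title, abstract, and keywords combined). The sample records are invented for illustration and are not taken from the reviewed literature.

```python
import re

# Invented sample records; the field names are assumptions, not a real database schema.
RECORDS = [
    {"title": "Phishing email detection with ensemble learning",
     "topic": "phishing email classification ensemble", "category": "computer science"},
    {"title": "A survey of phishing awareness training",
     "topic": "user education awareness", "category": "computer science"},
    {"title": "Filtering spam messages",
     "topic": "spam classifier", "category": "computer science"},
    {"title": "Phishing and consumer law",
     "topic": "legal classification of fraud", "category": "law"},
]

# Emulates the wildcard term classifi* (classification, classifier, ...).
CLASSIFI = re.compile(r"\bclassifi\w*", re.IGNORECASE)


def matches_query(record):
    """Return True when the record satisfies TITLE (phishing) AND TOPIC (classifi*)."""
    in_title = "phishing" in record["title"].lower()
    in_topic = bool(CLASSIFI.search(record["topic"]))
    return in_title and in_topic


def select_studies(records, category="computer science"):
    """Apply the query, then discard records outside the chosen category."""
    return [r for r in records
            if matches_query(r) and r["category"].lower() == category]


selected = select_studies(RECORDS)
print(len(selected), "record(s) selected")
```

Applying the keyword query first and the category filter second mirrors the order of the narrowing steps reported in the text, although the actual search was run through the database's own interface.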

E. DATA EXTRACTION
After the search and selection processes (Fig. 3), all related reviews were extracted to obtain the information needed to answer the study questions. These data were used to map the selected articles, with the extracted information aligned to the questions, as shown in Table 3.

F. STUDY QUALITY ASSESSMENT AND DATA SYNTHESIS
In this stage, the transformed articles were interpreted to answer the SLR questions, and several graphic models were used to facilitate interpretation. In addition, the resulting interpretations were both quantitative and qualitative.

IV. BACKGROUND
Phishing is one of the most dangerous cyberattacks, capitalizing on human weaknesses by combining social engineering and technology [24]. It results in the loss of confidential information to hackers through emails, SMS, social media including Twitter, Facebook, and Google+, or a web browser pop-up [25]-[27]. It is also found in almost all web pages, including auction and online payment websites [24]. Hackers are known to capitalize on human weaknesses by sending emails or text messages containing gifts or security alerts from an organization [26]. This is often done to steer users toward the action the hacker desires. Therefore, a phishing attack aims to capitalize on the trust of users interacting with an institution believed to be safe and legitimate [24]. Phishing is typically encountered when a user receives an illegitimate link in their email and then responds by following the pop-up directions. These techniques are commonly found on computers and mobile platforms [7].
Phishing is also divided into three groups based on the attack target [26], namely general, spear, and whale phishing. General phishing is often carried out on a massive scale, with hackers simply throwing out bait without much effort, so the chances of success are meagre [26]. This type of attack only traps careless users. Spear phishing targets a specific group of people with an essential organizational role, and hackers need only minimal effort to locate victims, such as through social scams. The hackers constantly change methods when their attacks fail, although the objectives remain similar. The chance of success in this attack is better than in general phishing. Meanwhile, whale phishing targets organizational CEOs or political party leaders. Hackers do their best to profile victims and tailor emails, as well as engage in various exploits to expose specific vulnerabilities. The difference between this attack and spear phishing lies only in the impact achieved and the effort expended.
The consequences of phishing attacks can be severe. For financial institutions, the losses appear as reputational damage, because customers become insecure about the safety of transactions [24]. Meanwhile, users' finances are often disrupted when they are unable to regain access to their financial accounts, such as through the illegal use of credit cards by hackers [28]. Accordingly, a survey of phishing attacks was conducted regarding the occurrences observed over the past 10 years. This indicated that variations of phishing attacks occur every year. Based on Fig. 4, the variation of these attacks increased yearly, with most of the occurrences found in 2019 and 2020, through websites, webpages, emails, URLs, SMS, and tweets. The phishing attack types that occurred over the past 10 years are described below.

A. WEBSITE
Hackers often create a replica of the original website so that victims interact with it fully [29]. The replica is then distributed through various media, according to the phishing target, such as emails, SMS, social media, and browser pop-ups [30]. Subsequently, hackers capitalize on human weaknesses by leading unknowing victims to the replica website, with instructions to complete a validation request, renew credentials, or provide financial information [29]. When the user follows these instructions, accessing financial accounts becomes impossible and money is lost immediately [31].
According to Abbasi et al. [32], two types of phishing websites presently exist, namely concocted and spoof sites. A concocted site replicates a legitimate website for commercial purposes, to conduct sale/purchase transactions or fraud. On these sites, hackers accept money from victims without providing the purchased product. This phishing technique often uses social engineering to obtain its victims. For example, a hacker creates a shop or an account at an e-commerce service provider but performs the transactions at the concocted site. In this case, hackers use various excuses to persuade victims to carry out transactions on the concocted site. In contrast, spoof sites only create replicas similar to the original website, including the web domain and content. When consumers log in to a replica website, the user credentials are stolen by hackers and then used to access the original platform for financial gain [33].

B. WEBPAGES
Phishing webpages manipulate textual forms or the appearance of legitimate websites, including the URL structure [5], [34]. The ability of hackers to imitate legitimate websites is likely to deceive victims who lack phishing knowledge. The loss of personal information is also likely to lead to identity theft and the loss of large amounts of money. Moreover, other forms of webpage phishing exist, such as exploiting vulnerabilities to compromise a legitimate website. The hacked website then automatically has the same properties as the legitimate site, such as its domain and appearance [34].
Some users place their trust in a website's safety from phishing attacks based on the green padlock icon in the browser address bar. In recent years, almost all websites, legitimate or phishing, have reportedly used HTTPS, so HTTPS can no longer serve as a standard for prevention [30]. Using a small amount of JavaScript, hackers can also create green padlock icons and fake HTTPS indicators in the browser address bar [35]. According to the Anti-Phishing Working Group [3], hackers used domain-validated certificates to enable SSL on phishing sites, as these are the weakest form of certificate validation.

C. EMAIL
Email phishing is a present problem that is difficult to solve [36], as spam is no longer limited to legitimate marketing emails and now includes other types of phishing emails [37]. For example, a hacker often sends a phishing email to the victim while posing as an essential or influential person in an organization, to obtain important information. This problem is growing in the big data era, with hackers assessing the profiles of victims, such as name, gender, contact information, and daily activities [26]. Hackers then use fake emails to trick victims into providing confidential information. For example, a phishing victim is often directed to a replicated bank website, with instructions to provide an account number or credit card details [37].
Besides causing the loss of important information, email phishing is also a means of accelerating the spread of malware [38]. When an email contains links leading to dangerous websites, the malware is often clicked unknowingly, with a severe impact on the user [39]. Based on Sur [12], the message categories in phishing emails are as follows:
• Authority: An email from a law enforcement agency or an authoritative institution that regulates social life.
• Commitment: An email from an organization or community group acting on humanitarian concerns, such as raising funds for natural disaster victims.
• Liking: An email from people sharing similar fates, leading users to leave their safe zone. Hackers often use this model to obtain organizational information through an organization's members.
• Perceptual Contrast: An email that capitalizes on the victim's lack of knowledge in profit utilization information.
• Reciprocation: An email that capitalizes on the concept of reciprocity among humans (give and take).
• Scarcity: An email that capitalizes on human feelings, such as offering a short-term profit that causes anxiety about loss when not acted on immediately.
• Social proof: An email that originates from a trusted partner or neighbour. When users receive this message, they often become surer of the email's information because it was sent by a colleague or neighbour.

D. URL
Hackers are constantly innovating through the creation of phishing URLs and various methods of obtaining confidential information [40]. A phishing URL characteristically replicates a legitimate address and then redirects to a website modified for phishing [10]. The chances of being exposed to phishing URLs are high for those who are not fully aware of them. Awareness of URL phishing should therefore be increased by comparing a website's appearance with its actual address [41]. However, present URL attacks often lead to other phishing websites, evading detection by users.
According to Volkamer et al. [42], there are several reasons why people fall victim to URL phishing:
• Awareness of phishing URLs is lacking, leading to inappropriate decisions.
• Reliable URLs are often not recognized when written in the email, the browser address bar, or the status bar.
• The final destination of the URL is unknown, due to redirection or the use of tiny addresses.
• URLs are not carefully checked before being clicked, or are clicked accidentally, because the user cannot recognize a phishing address.
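The third reason above, that a tiny address hides the final destination, can be illustrated with a minimal Python sketch that flags links served by a URL shortener. The `SHORTENER_HOSTS` set is a deliberately small, assumed list chosen for illustration; it is not a technique from the reviewed studies, and a practical check would need a maintained list or would have to follow the redirect.

```python
from urllib.parse import urlparse

# Assumed, incomplete list of shortener hosts used only for this illustration.
SHORTENER_HOSTS = {"bit.ly", "tinyurl.com", "t.co"}


def hides_destination(url):
    """Return True when the URL's host is a known shortener.

    A shortened link reveals nothing about its final destination,
    so the user cannot judge the target address before clicking.
    """
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return host in SHORTENER_HOSTS


for link in ("https://bit.ly/3xYz", "https://www.example.com/login"):
    print(link, "->", "destination hidden" if hides_destination(link) else "destination visible")
```

Note that `urlparse` only extracts a host when the URL includes a scheme such as `https://`, so bare strings like `bit.ly/abc` would not be flagged by this sketch.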

E. FINANCIAL DATA
Financial data is of interest to some phishing researchers, although not all wish to explore this specific data. It has attracted the attention of several researchers, such as [43] and [44], due to its association with phishing attacks. In this data, phishing attack studies often determine a correlation between textual descriptions, as organizational problems reportedly have financially significant impacts. Nishanth et al. [43] and Chen et al. [44] detected phishing attacks on the financial department through fiscal data. Decision trees (DT), support vector machines (SVM), and multilayer perceptron (MLP) classification techniques were used to obtain phishing risk scores [43]. Furthermore, Nishanth et al. [43] extended the study of Chen et al. [44] by focusing on data imputation techniques, which replaced the values of missing financial data based on soft computing [43]. Nishanth et al. [43] and Chen et al. [44] also used financial statements and textual data to derive three levels of phishing attacks.

F. TWEET
Twitter is one of the media that hackers use to spread phishing across networks quickly and undetectably [45], due to its short content and tiny URLs. This medium does not need a technique such as that conveyed by [12]; the simple concepts of Twitter, Follower, and Following mean that hackers are especially interested in its users. Hackers have consistently used these concepts to launch phishing attacks. Very few studies have addressed this medium; Liew et al. [45] detected attacks on Twitter using the random forest classification technique. The technique was applied to datasets obtained by crawling Twitter and classified phishing attacks with 94.64% precision and 95.49% recall. Moreover, for real-time phishing detection, the proposed warning mechanism against these attacks reportedly reached approximately 97.50% in security alerts.
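Precision and recall, the two metrics reported above, follow directly from the counts of true positives, false positives, and false negatives. The Python sketch below shows the standard definitions; the counts in the example are invented for illustration and are not the confusion matrix of Liew et al. [45].

```python
def precision_recall(tp, fp, fn):
    """Compute precision and recall from confusion-matrix counts.

    tp: phishing items correctly flagged as phishing
    fp: legitimate items wrongly flagged as phishing
    fn: phishing items missed by the classifier
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Invented counts for illustration only.
p, r = precision_recall(tp=90, fp=10, fn=30)
print(f"precision={p:.2%}, recall={r:.2%}")
```

Precision penalizes false alarms on legitimate tweets, while recall penalizes missed phishing tweets, which is why studies commonly report both rather than accuracy alone.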

G. SMS
Phishing attacks in the telecom area, namely through SMS (Short Message Service), have reached users with the emergence of the technology, and the convenience it provides has led users to fall victim. Phishing SMS (smishing) is similar to email attacks, as it involves stealing the victim's credentials. In this process, hackers often send messages to victims containing a phone number for subsequent transactions or a URL directing them to a malicious website [46]. These websites are designed to resemble the original platform by imitating all the actual source code. When smishing, hackers often disguise themselves as trustworthy people, companies, or government organizations [47]. Greater use of SMS leads to more smishing [25]. Furthermore, many organizations use blacklisting techniques against suspicious URLs, although this method is easily circumvented through a shortener service. It is difficult to determine whether a URL is safe or dangerous when hackers use a shortener service [47]. Hackers also modify URLs to resemble the original website address, for example with misplaced or misspelt characters, e.g., gooogle.com or facebok.com. The smishing trend in the last five years (2016-2020) increased yearly, with the highest count observed at 241,342 reports in 2020 [48]. Smishing also contributed to the spread of the Banking Trojan [49], as it was used to avoid the screening mechanism carried out by Google Play. McAfee Mobile Security found a significant increase of 141% in Q3 and Q4 of 2020 [49]. Therefore, bank customers are still the target of phishing attacks through telecommunications media [50]. Smishing also retrieves the information stored on smartphones when the user clicks on a malware-based URL [47], [51]. This information includes contacts, notes, financial information, pictures, etc.
This trend subsequently leads to the theft of personal information, such as security cards, photo identification, or accreditation certificates, for use in other crimes [52]. Various forms of text messages are also compromised by malware, ranging from coupon offers to wedding invitations. Mobile phishing is more difficult to detect than email phishing, due to the tiny message size [50].
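The misspelt addresses mentioned for smishing URLs, such as gooogle.com or facebok.com, differ from the legitimate domain by a single inserted or deleted character. A minimal Python sketch of one way to flag such lookalikes uses the Levenshtein edit distance. The `LEGITIMATE` list and the `max_distance` threshold are assumptions chosen for illustration, and this heuristic is not claimed to be the method of any reviewed study.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming, row by row)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete a character from a
                            curr[j - 1] + 1,      # insert a character into a
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


# Assumed reference list of legitimate domains, for illustration only.
LEGITIMATE = ["google.com", "facebook.com"]


def lookalike_of(domain, max_distance=1):
    """Return the legitimate domain that `domain` closely imitates, or None."""
    domain = domain.lower()
    for legit in LEGITIMATE:
        distance = edit_distance(domain, legit)
        if 0 < distance <= max_distance:
            return legit
    return None


for candidate in ("gooogle.com", "facebok.com", "google.com"):
    print(candidate, "->", lookalike_of(candidate))
```

An exact match returns `None` because a distance of zero means the domain is the legitimate one, not an imitation of it.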
The opportunity for smishing attacks on mobile devices is reportedly enormous, due to the higher interaction of people with mobile devices compared to laptops or computers [53]. SMS is commonly used for personal and official purposes, with many popular organizations adopting it as a means of communication with their customers. Most organizations also use SMS to send information, promotions, and surveys [54]. Researchers have used various methods to prevent smishing attacks, such as the blacklist technique. This method blocks SMS from phone numbers included in the blacklist, although it has a weakness in that hackers can perform smishing from another mobile number [53]. Using machine learning together with feature extraction, smishing detection often identifies various sources of telephone numbers in SMS [53]. Smishing messages have also become a part of spam texts, and they share the goal of obtaining users' personal and financial information [46], [50]. Various researchers also had difficulties in obtaining a smishing dataset for comparing their proposed detection techniques [50]. Language translation techniques are therefore an option for researchers in specific countries to obtain new datasets. For example, Rastenis et al. used an English phishing email dataset and translated it into Russian and Lithuanian using the Google Translate service [55]. According to Wu et al. [56], mobile device users are exposed to phishing attacks due to the following:
• Limitations of mobile device capabilities, such as the screen size and the ability to perform computations.
• User habits, such as leisure activities, which make it more convenient to click on a URL than to type it in to check. This is because a virtual keyboard is not as comfortable to use as a physical one.

V. RESULT AND DISCUSSION
Phishing attacks have reportedly led to catastrophic losses, and researchers have made various detection and prevention attempts. Based on this condition, an SLR was conducted on classification techniques against phishing attacks over the last 10 years. The reviewed articles were obtained from the best-ranked journals in the computer science category. According to Table 4, 50%, 26%, 6%, and 18% of the literature obtained were Q1, Q2, Q3, and Q4 Web of Science (WoS) articles, respectively. The most dominant source journals were IEEE ACCESS (15%), COMPUTERS & SECURITY (10%), EXPERT SYSTEMS WITH APPLICATIONS (6%), and IEEE COMMUNICATIONS SURVEYS AND TUTORIALS (6%). Phishing studies were found to be increasing yearly, especially for classification techniques. Based on Fig. 5, the highest increase was observed in 2019 (23%) and 2020 (26%), with an annual rate of 22%. This indicated a significant yearly increase in the study of phishing.

A. MOST OCCUR PHISHING TYPE
The distribution of phishing types is shown in Fig. 6, where websites were the most dominant (39%), followed by webpages (22%), emails (20%), URLs (12%), and others (7%). This is consistent with present phishing incidents, where the combination of emails and other social media has increased attack types such as URLs, webpages, and websites. The yearly distribution of each type of phishing is shown in Table 5, where most studies were observed in 2020 (16 articles). During this period, the articles were distributed across phishing types as follows: webpage (3), website (8), URL (2), email (2), and SMS (1). In 2019, the distribution was webpage (2), website (7), URL (4), email (1), and tweet (1). Only a few studies were conducted in 2010 and 2012. According to Table 5, phishing studies were increasing, especially in 2019 and 2020, when website studies dominated with 7 and 8 articles, respectively. Studies on websites, webpages, and emails appeared every year, showing that researchers were highly focused on solving phishing problems.

B. MOST USED DATA SOURCE
In this study, a dataset is the medium used to test the performance of the researchers' proposed methods. Based on Fig. 7, the most widely used datasets were PhishTank (34 authors), Alexa (15 authors), and UCI Machine learning (12 authors). PhishTank publicly provides information on suspected phishing URLs [57], while Alexa is a web analytics service that obtains data from clients through an installed toolbar [58]. Meanwhile, UCI Machine learning was mostly used for benchmarking, especially in phishing studies [8]. A total of 13 authors also used datasets obtained from their own institutions, such as email servers, spam filters, honeypots, financial data, and common-crawl websites. Other researchers used datasets suited to their problems, such as 3Sharp, DeepPhish, Enron, Phishload, starting point directory, and Stuffgate Free Online Website Analyzer.
In the reviewed phishing studies, datasets were divided into public, private, and blended (public and private) groups. According to Fig. 8, public, private, and blended datasets accounted for 58, 13, and 29% of use, respectively. This indicated that some researchers used public and private datasets together to confirm that the proposed method's performance remained superior. However, the use of private datasets constrains comparison among proposed methods.
Based on Fig. 9, the use of public datasets increased from 2016 to 2020 (30 articles), indicating that these datasets remained a reference for benchmarking proposed methods in phishing studies. The combined use of private and public datasets was also found to be increasing, providing new phishing insights. However, this remained an obstacle to comparing the performance of proposed methods, because part of the data is private.
According to Fig. 10, the most widely used dataset size ranged between 1000-9999 (41%), followed by 10000-99999 (37%) and 100000-999999 (13%). This indicated that dataset size was a consideration when testing a method's performance.
The use of balanced and imbalanced datasets was also surveyed in this report. An imbalanced dataset is one in which the data classes are unevenly distributed. Based on Fig. 11, 13 and 46 researchers (23 and 78%) used balanced and imbalanced datasets, respectively. The use of imbalanced datasets began to increase in 2011 and peaked in 2019 and 2020, although the increase was not significant. In Fig. 12, imbalanced datasets were more dominant than balanced datasets and were commonly used for phishing websites (41%), webpages (24%), and emails (22%). Balanced datasets, by contrast, were used for websites (31%), webpages (15%), and URLs (23%).
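To make the notion of an uneven class distribution concrete, the following is a minimal sketch of how the balance of a labeled dataset might be checked. The function names `class_distribution` and `imbalance_ratio`, the label strings, and the 300/700 split are illustrative assumptions and do not come from any of the reviewed studies.

```python
from collections import Counter


def class_distribution(labels):
    """Return the share of each class label as a fraction of all samples."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


def imbalance_ratio(labels):
    """Return majority-class count divided by minority-class count.

    A value of 1.0 means perfectly balanced; larger values mean a more
    imbalanced dataset.
    """
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


# Hypothetical dataset: 300 phishing samples and 700 legitimate samples.
labels = ["phishing"] * 300 + ["legitimate"] * 700

print(class_distribution(labels))
print(round(imbalance_ratio(labels), 2))
```

A ratio well above 1 suggests that accuracy alone may be misleading, since a classifier that always predicts the majority class would already score highly.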
Some researchers used a single dataset, while others combined several, such as Alexa and PhishTank, to achieve the required standards. For these researchers, the choice of dataset depended on the standards being set. From Fig. 13, 73% of researchers combined multiple datasets and 27% used a single dataset.
The use of dataset features was also surveyed, because features are an essential element in detecting phishing attacks [5]. Feature evaluation affects classification performance [12], [13], [59]: it eliminates features that are ineffective for attack detection [9], which in turn reduces the processing time for training data [60]. Based on Fig. 14, feature evaluation was carried out in 58 articles, of which 33%, 2%, and 65% used ordinary, FVV index, and cross-validation assessments, respectively. Some researchers added further mechanisms to assess their feature evaluation models, such as the FVV index and cross-validation. The FVV index was created by Zhu et al. [7] to evaluate the impact of sensitive features and helped overcome overfitting, especially in neural network classification. Meanwhile, cross-validation was the most widely used technique in machine learning, especially in phishing attack detection, and it also helps overcome overfitting [4], [24]. In addition, the studies that used cross-validation mostly reported performance in terms of accuracy, TPR, precision, and F-measure.
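As an illustration of the cross-validation procedure mentioned above, the following is a minimal k-fold sketch using only the standard library. It is not the procedure of any particular reviewed study: the helper names, the URL-length feature, and the midpoint-threshold classifier are all hypothetical and chosen only to keep the example self-contained.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, folds[i]


def cross_validate(features, labels, fit, predict, k=5):
    """Return the mean accuracy over k folds."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(labels), k):
        model = fit([features[i] for i in train_idx],
                    [labels[i] for i in train_idx])
        correct = sum(predict(model, features[i]) == labels[i]
                      for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / k


# Hypothetical toy classifier: threshold halfway between the class means.
def fit_threshold(xs, ys):
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def predict_threshold(threshold, x):
    return 1 if x > threshold else 0


# Hypothetical data: URL lengths, short = legitimate (0), long = phishing (1).
url_lengths = list(range(10, 30)) + list(range(60, 80))
labels = [0] * 20 + [1] * 20

mean_acc = cross_validate(url_lengths, labels,
                          fit_threshold, predict_threshold, k=5)
print(mean_acc)
```

Because every sample is used for testing exactly once and never in the same fold it was trained on, the averaged score is less prone to rewarding an overfitted model than a single train-and-test run on the same data.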

C. MOST USED METHODS
From 2010 to 2020, 30 methods were used to classify phishing attacks, as shown in Fig. 15. The most widely used techniques were Random Forest, Support Vector Machine (SVM), Logistic Regression, Neural and Bayesian Networks, C4.5, Decision Tree (DT), and Deep Neural Network (DNN). Random Forest, SVM, and Logistic Regression were the three most used methods in this review (39%).
Each study reported that its proposed technique outperformed those used in other studies. According to Chen et al. [58], Logistic Regression performed better than SVM and C4.5 on phishing webpages. However, according to Ali and Ahmed [6], DNN performed better than SVM and C4.5 on websites. Zhang et al. [33] showed that SVM performed better than Random Forest and Logistic Regression on phishing websites. In addition, Random Forest performed better than DT and SVM on phishing URLs, according to Sahingoz et al. [41].
This indicated that specific methods performed better on different phishing types. Even when methods were applied to the same phishing type, they did not necessarily produce the same performance, because of differences in how the data were collected, processed, and tested. Zhang et al. [33] collected data on phishing and legitimate e-Business websites from http://www.315online.com.cn and http://www.anquan.org, obtaining 1,416 phishing and 1,462 legitimate websites. In contrast, Ali and Ahmed [6] used the UCI Machine learning dataset with 1353 websites, comprising 548 legitimate, 702 phishing, and 103 suspicious websites, and validated the developed model with cross-validation. Zhang et al. [33], however, used precision, recall, and F-measure values to evaluate classification techniques.
Based on Fig. 16, the most frequently addressed phishing type was the website (13 articles), followed by the webpage (10 articles), URL (6 articles), and email (5 articles). Random Forest (5 articles) was the most analyzed technique on websites, followed by SVM and Logistic Regression, each with four articles on webpages. Meanwhile, the phishing types with little analysis were financial data, tweets, and SMS.

D. MOST USED PERFORMANCE EVALUATION METHOD
Various performance evaluation methods were also used on the proposed classifications, with the variation caused by researchers' efforts to obtain the best results compared with similar studies. Accuracy was the most commonly used evaluation method [61], although it cannot serve as a benchmark for measuring all types of classification ability. Therefore, the more performance evaluation methods are used, the better the opportunity to develop an effective model.
Based on Fig. 17, the most used performance evaluation method was accuracy (45 articles), followed by the True Positive Rate (TPR) (30 articles), F-measure (22 articles), and Precision (21 articles). The following are the nine most used performance evaluation techniques for phishing classification:

• Accuracy
The percentage of phishing and legitimate websites that were correctly predicted out of the total number of websites [62].

• TPR/Recall/Sensitivity
The percentage of phishing websites that were correctly predicted as phishing out of the total number of phishing websites [62].

• Precision
The percentage of correctly predicted phishing websites out of the total number of websites predicted as phishing [62].

• False Positive Rate (FPR)
The percentage of legitimate websites incorrectly predicted as phishing out of the total number of legitimate websites [62].

• False Negative Rate
The percentage of phishing websites incorrectly predicted as legitimate out of the total number of phishing websites [62].

• True Negative Rate/Specificity
The percentage of legitimate websites correctly predicted as legitimate out of the total number of legitimate websites [62].

• Receiver Operating Characteristics
A plot of TPR against FPR across various threshold settings [64].

• Area Under the Curve
The probability that the classification technique ranks a randomly selected positive instance higher than a randomly selected negative instance [63].

Based on Table 6, the most widely used performance evaluations were accuracy, precision, and TPR/Recall/Sensitivity, whereas the least used were Welch's T-Test, Shapiro-Wilk Test, Ranking Techniques, Prediction error rate, Geometric Mean, F1-Macro, and Matthews Correlation Coefficient.
According to the review, classification technique parameter settings were compared with those in related studies, with some researchers disclosing specific parameters. From Fig. 18, disclosure of the parameters used increased yearly, especially in 2019 and 2020. Meanwhile, studies that did not mention specific parameters increased by almost half compared with the others.
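The confusion-matrix measures in the list above, together with the F-measure and the rank-based reading of the Area Under the Curve, can be sketched as follows. This is only an illustrative computation under stated assumptions: phishing is encoded as the positive class `1`, the function names are hypothetical, and the labels, predictions, and scores are made-up values rather than results from any reviewed study.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives (phishing = positive)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def evaluation_metrics(tp, fp, tn, fn):
    """Standard metrics; assumes every denominator is non-zero."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "tpr": recall,                 # recall / sensitivity
        "precision": precision,
        "fpr": fp / (fp + tn),
        "fnr": fn / (tp + fn),
        "tnr": tn / (fp + tn),         # specificity
        "f_measure": 2 * precision * recall / (precision + recall),
    }


def auc(y_true, scores, positive=1):
    """Probability that a random positive is scored above a random negative.

    Ties count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = phishing), hard predictions, and classifier scores.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1, 0.1, 0.05]

metrics = evaluation_metrics(*confusion_counts(y_true, y_pred))
print(metrics)
print(auc(y_true, scores))
```

In this toy case accuracy is 0.8 while TPR is only 0.75, which illustrates why accuracy alone may not reflect how many phishing instances are missed.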
Based on Figs. 6 and 19, the phishing types most associated with disclosure of classification technique parameters were websites, webpages, emails, and URLs, with websites the most prominent among them. Approximately 54.55% of website phishing researchers comprehensively reported the classification parameters used. Meanwhile, researchers of phishing webpages (30.77%), emails (26.92%), and websites (19.23%) did not provide information on the specific classification technique parameters used.
Regarding the parameter settings of classification techniques, Al-Fayoumi et al. [60] used confidence values of (0.2, 0.5), (0.1, 0.4) and (0.05, 0.3) to produce the best performance. This was in line with Alsariera et al. [64], although in that study only the number of iterations = 100 produced a significant performance comparison. Therefore, disclosing the parameters used supports continuity in the development of phishing studies.
These results were also in line with several studies in which parameter settings were used to ensure a fair comparison with related reports. Subsequent comparisons were also carried out, such as observing changes in the performance of classification techniques as the parameters changed. However, some studies only used the parameters recommended by related reports. Parameter setting is a big challenge [65], as there are no general rules or standards to follow for obtaining the best results. Consequently, many researchers modified parameter values in various ways to improve the performance of classification techniques. The settings tuned by researchers included the number of folds in cross-validation [66], layers [67], hidden nodes [11], [14], learning rate [68], threshold [7], activation function [6], epoch [6], [64], [69], minimum support and confidence values [59], parameters taken from related studies [39], and automation [60]. Therefore, parameter settings were used to push the performance of classification techniques toward maximum accuracy [6].
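Observing how performance changes with a parameter can be sketched as a simple sweep. The example below varies only a decision threshold, one of the settings named above, over a handful of candidate values. The function names, the scores, the labels, and the candidate thresholds are hypothetical assumptions for illustration and are not taken from any cited study.

```python
def accuracy_at(threshold, scores, labels):
    """Accuracy when scores at or above the threshold are flagged as phishing."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == t for p, t in zip(preds, labels)) / len(labels)


def sweep_thresholds(thresholds, scores, labels):
    """Evaluate each candidate threshold and return (best, all results)."""
    results = {th: accuracy_at(th, scores, labels) for th in thresholds}
    best = max(results, key=results.get)
    return best, results


# Hypothetical classifier scores and true labels (1 = phishing).
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

best, results = sweep_thresholds([0.3, 0.5, 0.7], scores, labels)
for th, acc in results.items():
    print(f"threshold={th}: accuracy={acc}")
print("best threshold:", best)
```

Reporting the full table of settings and scores, rather than only the best value, is what allows other researchers to reproduce and compare the result. In practice the sweep would be run on held-out validation data, since selecting a parameter on the test set gives an optimistic estimate.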
Several studies also experimented with classification techniques under dataset changes in addition to parameter settings. In some cases, the best classification performance was achieved by changing the dataset [66]. The role of parameter setting was also very important to the performance of the techniques [89].
Table 7 presents the list of primary studies with six attributes, namely year, main study, publication, dataset, method, and phishing type. The primary studies comprised 68 articles (January 2010-December 2020), ordered from the most recent year of publication. Fig. 20 shows the various phishing attacks together with the classification techniques.

E. INSIGHTS AND FUTURE STUDY DIRECTIONS
Based on the SLR, the following are some future directions for classification techniques against phishing attacks:
• No studies used a dataset in a different language. Most researchers used English datasets, such as phishing emails [36], [39], [66], [73] and SMS [46], which limits the chances of improving attack prevention. Therefore, phishing datasets in various languages are needed to evaluate the classification techniques.
• There are no expert-based feature recommendations. Many researchers depended only on preprocessed features, especially for emails, which can confuse the classification of a message [68]. For example, an email from a colleague that contains a URL and arrives after a webinar or meeting is categorized as regular mail, whereas an email whose short message consists only of a URL is categorized as suspicious. Such complex behavior can lead to various innovations in crime [86], such as the use of persuasion techniques, which many suspicious emails employ to deceive their victims [85], [87], [88]. Therefore, expert validation of emails is required, especially in relation to phishing classification features.
• There is no standard value or cut-off range for performance evaluation. No categories have been defined for assessing the performance of a classification technique; researchers simply treated a value close to 1 as indicating the best performance [28]. Besides accuracy, many researchers also used alternative measurements, based on observing higher TPR or lower FPR values. Another critical issue is the difficulty of comparing different phishing detection techniques. This is mainly due to restrictions on sharing, as well as the lack of standard benchmarks and reference datasets, which stems from the attackers' dynamic nature and potential data sensitivity [21].

F. LIMITATIONS
This study had limitations in the search for articles, as only journal publications with an impact factor were used. Clarivate Analytics WoS was used to obtain the articles, and journals indexed in the Emerging Sources Citation Index of the WoS Core Collection were excluded. Additionally, only phishing-based articles classified in the computer science category were selected.

VI. CONCLUSION
In this study, a more in-depth evaluation of phishing classification techniques was conducted using a systematic literature review. This was the first systematic review in the past 10 years with a comprehensive focus on classification techniques. Several recommendations were also provided to help researchers gain more insight into the development of phishing studies. The results showed that many researchers performed comparisons without describing the parameter settings of the classification technique used. Other issues found were the incorrect evaluation and validation of classification technique performance, as well as the diversity of dataset use. In addition, this systematic literature review thoroughly described the gaps in classification techniques, so that future phishing studies can focus on more efficient solutions.