Imbalanced Classification Methods for Student Grade Prediction: A Systematic Literature Review

Student success is essential for improving student outcomes in the higher education system. One way to measure student success is by predicting students' performance based on their prior academic grades. Given the significance of this area, various predictive models have been developed and applied to help institutions identify students at risk of failure. However, building a high-accuracy predictive model is challenging due to the imbalanced nature of the datasets, which causes biased results. Therefore, this study reviews existing research articles and provides a state-of-the-art account of handling imbalanced classification in higher education, including best practices for dataset characteristics, methods, and a comparative analysis of the proposed algorithms, focusing on student grade prediction problems. The study also presents the most common balancing methods published from 2015 to 2021 and highlights their impact on resolving imbalanced classification through three approaches: data-level, algorithm-level, and hybrid-level. The survey results reveal that the data-level approach using SMOTE oversampling is broadly applied to address imbalanced problems in student grade prediction. However, the application of hybrid and feature selection methods that support the generalization of predictive models to boost student grade prediction performance is generally lacking. In addition, the strengths and weaknesses of the proposed methods are discussed and summarized to guide future research. The outcomes of this review will guide professionals, practitioners, and academic researchers in dealing with imbalanced classification, mainly in the higher education field.


I. INTRODUCTION
Student grade prediction is one of the essential areas for determining and monitoring student performance in higher educational institutions (HEI). This area has gained significant attention in the education sector over the years, as many studies have demonstrated the reliability of student grade prediction with the help of existing machine learning algorithms to enhance student success [1], [2], [3].

(The associate editor coordinating the review of this manuscript and approving it for publication was Wei Liu.)
The aim is to help the educational sector evaluate the risk of academic failure and provide feedback to improve student outcomes each semester. With early grade prediction, the development and progress of students can be assessed more effectively [4], [5]. Student grade prediction is also one of the common indicators of student performance success [6]. Highly accurate grade prediction is beneficial and helps the HEI identify students at risk of failure early in their academic careers. However, as student datasets become more extensive and complex, the effect of imbalanced distribution in the target class grows, which results in poor predictive model performance [7]. Therefore, knowledge of imbalanced classification methods is significant for building an effective predictive model to improve future teaching and learning performance.
Various reviews have been published on predicting student academic performance, as presented in TABLE 1. Nevertheless, few studies highlight the algorithms involved in boosting predictive model accuracy for student performance. Most existing surveys focused on machine learning methods and summarized prominent findings with interesting future directions, but the algorithms for resolving imbalanced datasets were not discussed comprehensively. In this paper, useful methods for resolving imbalanced classification and improving the efficiency of predictive models are discussed in more detail across three approaches: data-level, algorithm-level, and hybrid-level.
Therefore, this paper thoroughly reviews and summarizes the most common methods for addressing imbalanced classification in the education domain, focusing on improving the performance of student grade prediction. The contributions of this comprehensive review are summarized as follows:
1. This SLR analyzes and summarizes imbalanced classification methods in detail from three different approaches (data-level, algorithm-level, and hybrid-level) to improve the accuracy of predictive models.
2. It provides a taxonomy of current imbalanced classification methods used for predicting student grades, highlighting the most applied algorithms in the education field to help professionals, practitioners, and academic researchers understand the significance of these techniques.
3. It presents a comparative study of existing balancing methods with their classifiers in both binary and multi-class settings, together with accuracy scores, that can be used for future educational research.
4. It provides an overview of the evaluation metrics applied to imbalanced classification problems to improve predictive model performance in student grade prediction.
The rest of the paper is organized as follows. Section II gives background information for the reader's basic understanding. Section III describes the review method and how the SLR was conducted, including the research questions and the search strategy for the selected articles. Section IV provides the data extraction and synthesis of the SLR results. Section V discusses the overall findings. Section VI discusses future directions of this research, and finally, the study is summarized and concluded in Section VII.

II. IMBALANCED CLASSIFICATION IN STUDENT GRADE PREDICTION
Data-driven education is a recent trend accelerated by global changes. The knowledge and insightful information gained from this area provide many advantages that can improve HEI decision-making. To achieve this, educational datasets are collected from various online databases and platforms such as course management and learning management systems (LMS) like Moodle, Massive Open Online Courses (MOOC), Open Course Ware (OCW), Open Educational Resources (OER), social media sites such as Twitter, Facebook, and YouTube, and Personal Learning Environments (PLE) [10].
Student grade prediction uses machine learning to predict the final score, with the aim of improving student academic performance by the end of the semester [11]. The goal is to help educators identify students at risk of low results and help them overcome their learning difficulties. Hence, identifying the relevant factors, including student background, academic information, environmental factors, test scores, and Grade Point Average (GPA) or Cumulative Grade Point Average (CGPA), is significant for predicting student performance [6]. However, when a tremendous amount of data is collected and analyzed without balanced class labels, predicting students' grades becomes a significant problem.
During the training phase of student grade prediction, imbalanced classification appears when there is an unequal distribution of instances within the target class of the training dataset [12], [13]. Most datasets involve binary classification with two target outputs: the ''pass'' class as the majority and the ''fail'' class as the minority. In contrast, some datasets comprise more than two classes, known as multi-class classification. When one class significantly outnumbers the other, the training model usually spends more time processing the majority classes than the minority ones, which could be less informative. Consequently, classifiers tend to become biased and produce highly erroneous results. Due to this, many empirical studies have explored various methods to enhance student grade prediction performance [14], [15], [16], [17]. However, the methods and algorithms used to deal with various class-imbalanced distributions when predicting student grades have not been highlighted comprehensively.
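To make the notion of an unequal target distribution concrete, the following minimal sketch computes the class distribution and the imbalance ratio (majority size over minority size) for a hypothetical pass/fail label set; the data are illustrative, not drawn from any surveyed study:

```python
from collections import Counter

def imbalance_ratio(labels):
    """Ratio of the majority class size to the minority class size."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())

# Hypothetical grade labels: "pass" heavily outnumbers "fail".
grades = ["pass"] * 90 + ["fail"] * 10
print(Counter(grades))          # Counter({'pass': 90, 'fail': 10})
print(imbalance_ratio(grades))  # 9.0
```

A ratio near 1 indicates a balanced dataset; the larger the ratio, the more a naive classifier is rewarded for always predicting the majority class.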
Several approaches have been proposed to handle class imbalance and improve the prediction model's performance. These approaches can be categorized into three levels of solutions:
1. Data-level or pre-processing approach: This solution is straightforward, feasible, and easy to implement. The aim is to rebalance the class distribution by either increasing the instances of the minority class or decreasing the instances of the majority class with a sampling method [18], [19]. Data-level approaches include three sampling methods: oversampling, undersampling, and hybrid sampling.
2. Algorithm-level approach: These solutions do not change the training dataset but balance the class distribution by modifying the classifier's learning process [20].
3. Hybrid-level approach: These incorporate data-level and algorithm-level approaches, often based on ensemble learning, to balance the dataset [21].
VOLUME 11, 2023

III. SYSTEMATIC REVIEW PROTOCOL
A systematic review is a review of evidence on a formulated question using a systematic method to summarize research according to a comprehensive study plan [22]. This systematic review was conducted to identify current data pre-processing techniques for imbalanced classification problems and find the best solutions for machine learning classifiers in student grade prediction. This study applied the Kitchenham guidelines [23] to perform the systematic review. The review protocol consists of four phases: formulating research questions, search strategy and article selection, synthesis and results, and reporting. Each phase of the review protocol is shown in FIGURE 1.

A. FORMULATING RESEARCH QUESTION (RQ)
Three RQs have been formulated to explain the exact idea of this SLR, as indicated in TABLE 2. This subsection presents the RQs to address the relevant areas in this study. The aim is to provide an understanding of imbalanced classification methods and discuss how these methods can have a significant impact on improving student grade prediction performance.

B. SEARCH STRATEGY AND ARTICLE SELECTION
This systematic literature review was performed on articles extracted from academic journals and conference proceedings in five online citation databases: Scopus, Web of Science, IEEE Xplore, ScienceDirect, and SpringerLink. According to the RQs, we formulated the main search keywords as three terms: (''student grade prediction''), (''student grade prediction'') AND (''imbalanced classification''), and (''student grade prediction'') AND (''data pre-processing''). To narrow the search results, we limited our search to peer-reviewed journals and articles published from 2015 to 2021 and selected subject areas focused on the computer science field based on relevant titles. The search strategy of this review is based on the PRISMA [24] flow diagram shown in FIGURE 2. Article selection was conducted using the previously mentioned search keywords. Initially, the automated search generated 722 records from the five selected databases. Next, all duplicate articles found during automatic selection were removed using Mendeley software, leaving 323 records. The search was then filtered by reading each article's title and abstract against detailed inclusion and exclusion criteria. After this screening, the articles were classified based on QA criteria to remove all irrelevant articles, leaving 43 articles for in-depth review.

1) INCLUSION AND EXCLUSION CRITERIA
Specific criteria were set for inclusion and exclusion to determine the relevance of this study's selected journals and articles. This ensures that the selection criteria are reliable and correctly classified based on the defined RQs. The detailed criteria for this review are presented in TABLE 3.

2) QUALITY ASSESSMENT
Quality assessment (QA) is another criterion that we considered in this SLR to ensure that the selected articles were conducted according to specific quality measurements [23], [24]. These criteria were used to investigate the suitability of each selected article.

IV. DATA EXTRACTION AND SYNTHESIS
As mentioned in the previous section, this systematic review selected publications between 2015 and 2021 from five different databases. A total of 43 articles fulfilled all the inclusion and QA criteria used in this SLR. These articles were filtered according to the RQs to select those most relevant to solving imbalanced classification and high dimensionality in student grade prediction; other articles only applied conventional data pre-processing in their findings. The publication counts show an increasing trend, with student grade prediction gaining prominence among researchers since 2019. Moreover, based on our observation, methods related to the imbalanced classification problem have gained interest among researchers seeking to improve predictive model performance in the education field. As listed in TABLE 5, we exploited five research databases to find the relevant articles. Among these, 25 articles (58.1%) were published in journals and 18 articles (41.9%) in conference proceedings. TABLE 6 summarizes the selected articles by year, publication title, number of citations, publication type, publication name, and ranking based on impact factor (IF). Meanwhile, FIGURE 5 visualizes the bibliometric analysis of citations and publications using VOSviewer. The node size indicates the frequency of occurrence, whereas the curve between nodes represents their co-occurrence in the same publication.

V. RESULTS AND DISCUSSION
This section presents and discusses the findings by answering the RQs in three subsections: (1) a summary of state-of-the-art approaches and methods for handling imbalanced classification; (2) a discussion of the proposed methods, algorithms, and datasets that significantly affect the performance of student grade prediction; and (3) an exploration of the various evaluation metrics used to measure the performance of student grade prediction.

A. SUMMARY STATE-OF-THE-ART FOR IMBALANCED CLASSIFICATION METHOD
The prevailing methods for addressing imbalanced classification problems in education, mainly in the student grade prediction domain, are broadly discussed under three approaches: data-based, algorithm-based, and hybrid-based. In the student grade prediction domain, we found that 17 of the 43 primary studies most often selected data-level sampling methods to reduce imbalanced classification, whereas 14 articles considered feature selection to solve high dimensionality, which in turn helps address imbalanced classification. The remaining papers use other methods, such as algorithm-level and hybrid approaches, to enhance student grade prediction performance. In answer to RQ1, the taxonomy of methods and algorithms proposed by previous researchers for improving student grade prediction performance is summarized in FIGURE 6.

1) DATA-LEVEL OR DATA PRE-PROCESSING APPROACH
Data-level, also known as data pre-processing, is one of the most commonly used approaches to address imbalanced problems at the training-set level. Three methods are utilized under this approach: oversampling, undersampling, and hybrid sampling. Although much research has been devoted to investigating the strengths and weaknesses of different approaches in the education domain, choosing the most appropriate method for a given task is often difficult. From the literature, oversampling [30], [54], [55], [56], [63], [65], [66] methods have been the most preferred so far due to their effectiveness in dealing with a high proportion of class imbalance compared to undersampling [38], [39].

a: OVERSAMPLING
Oversampling increases the size of the minority class by generating new instances to obtain balanced classes. SMOTE is one of the most popular and classic oversampling algorithms. Chawla et al. [67] introduced SMOTE, which uses interpolation to increase the size of the minority class: it oversamples the minority class by generating ''synthetic'' instances rather than oversampling with replacement. Based on the same principle, [35] compared several oversampling methods (SMOTE, SVMSMOTE, and ADASYN) on 463,956 student records to solve class imbalance for predicting students' performance. Utari et al. [42] developed a high-performance model to predict student dropout on 2,492 records, balancing the data with SMOTE and classifying with RF. Meanwhile, Bouchard et al. [45] used SMOTE on a high-dimensional imbalanced dataset with feature selection to improve student grade prediction. Tanha et al. [21] presented another oversampling algorithm, SMOTEBoost, to analyze the performance of binary and multi-class imbalanced problems. This algorithm applies SMOTE in each boosting iteration, allowing the learner to focus on more minority class instances. However, it takes the highest computational time to process large datasets.
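The interpolation idea behind SMOTE can be sketched in a few lines: pick a minority sample, pick one of its k nearest minority neighbours, and place a synthetic point at a random position on the segment between them. The sketch below, using only NumPy, is illustrative of the core idea from Chawla et al. and is not the reference implementation; the data are hypothetical:

```python
import numpy as np

def smote_sketch(X_min, n_new, k=5, rng=None):
    """Generate n_new synthetic minority samples by interpolating between
    each chosen sample and one of its k nearest minority neighbours
    (the core idea of SMOTE; a sketch, not the reference algorithm)."""
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    # Pairwise distances within the minority class only.
    d = np.linalg.norm(X_min[:, None] - X_min[None, :], axis=2)
    np.fill_diagonal(d, np.inf)                  # exclude each point itself
    k = min(k, len(X_min) - 1)
    nn = np.argsort(d, axis=1)[:, :k]            # k nearest minority neighbours
    synthetic = np.empty((n_new, X_min.shape[1]))
    for i in range(n_new):
        j = rng.integers(len(X_min))             # a random minority sample
        neighbour = X_min[rng.choice(nn[j])]     # one of its neighbours
        gap = rng.random()                       # interpolation factor in [0, 1)
        synthetic[i] = X_min[j] + gap * (neighbour - X_min[j])
    return synthetic

# 5 hypothetical minority ("fail") samples; request 10 synthetic ones.
X_fail = np.array([[1.0, 2.0], [1.2, 1.9], [0.9, 2.2], [1.1, 2.1], [1.0, 1.8]])
X_new = smote_sketch(X_fail, n_new=10, k=3, rng=0)
print(X_new.shape)  # (10, 2)
```

Because each synthetic point lies on a segment between two existing minority samples, the new points stay inside the region already occupied by the minority class rather than being exact duplicates.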

b: UNDERSAMPLING
Undersampling reduces the size of the majority class by eliminating instances from the training dataset. RUS [39] is the most straightforward, naïve undersampling algorithm. It simply removes a random portion of samples from the majority class so that all classes are equally represented. However, randomly discarding samples may delete potentially useful information from the majority class, reducing prediction performance [68]. To overcome this problem, some researchers have proposed heuristic undersampling algorithms such as ENN [69] and Tomek-Links [70], which effectively avoid the blindness of RUS. Advanced methods that combine oversampling and undersampling, known as hybrid sampling, have also been proposed to achieve more reliable results.
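Naive RUS as described above can be sketched as follows: for each class, keep a random subset whose size equals the smallest class. This NumPy sketch is illustrative (data hypothetical), not an implementation from any surveyed study:

```python
import numpy as np

def random_undersample(X, y, rng=None):
    """Randomly drop majority-class samples until every class has as
    many samples as the smallest class (naive RUS sketch)."""
    rng = np.random.default_rng(rng)
    X, y = np.asarray(X), np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    # For each class, keep a random subset of size n_min.
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_min, replace=False)
        for c in classes
    ])
    return X[keep], y[keep]

X = np.arange(20).reshape(10, 2)          # 10 samples, 2 features
y = np.array(["pass"] * 8 + ["fail"] * 2)
X_b, y_b = random_undersample(X, y, rng=0)
print(np.unique(y_b, return_counts=True))  # each class reduced to 2 samples
```

The risk noted in the text is visible here: six of the eight "pass" samples are discarded outright, which is exactly the information loss that ENN and Tomek-Links try to avoid by removing only noisy or borderline majority samples.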

c: HYBRID SAMPLING
This method increases the number of minority samples while reducing the number of majority samples to mitigate sample imbalance. A study [17] used SMOTETL, which combines the SMOTE and Tomek-Links (TL) algorithms, to overcome the imbalance problem in predicting student performance. Meanwhile, Pristyanto et al. [71] proposed a hybrid solution combining SMOTE and OSS to handle imbalanced class distributions when predicting student success. The algorithm used SMOTE to reduce the risk of data duplication, whereas OSS overcame the loss of information and misclassification problems in the majority classes. Hassan et al. [39] proposed SMOTEENN, an algorithm combining SMOTE with ENN undersampling that can overcome the multi-class imbalance problem. The experimental results show that SMOTEENN consistently produces high results with ensemble classifiers to improve students' performance.
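The cleaning step used after SMOTE in hybrid methods such as SMOTETL relies on detecting Tomek links: pairs of mutually nearest neighbours that belong to opposite classes. A minimal NumPy sketch of that detection step (the geometry and labels are hypothetical):

```python
import numpy as np

def tomek_link_mask(X, y, majority_label):
    """Flag majority samples that form Tomek links (mutual nearest
    neighbours from opposite classes). Removing the flagged samples is
    the cleaning step of hybrid methods such as SMOTETL (a sketch)."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    d = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    np.fill_diagonal(d, np.inf)
    nn = d.argmin(axis=1)                      # nearest neighbour of each sample
    drop = np.zeros(len(y), dtype=bool)
    for i, j in enumerate(nn):
        if nn[j] == i and y[i] != y[j]:        # mutual NN, opposite classes
            if y[i] == majority_label:
                drop[i] = True                 # drop only the majority member
    return drop

# Majority "pass" point at [1.5, 0] sits right next to minority "fail" [1.6, 0].
X = np.array([[0.0, 0.0], [0.2, 0.0], [1.5, 0.0], [1.6, 0.0], [3.0, 0.0]])
y = np.array(["pass", "pass", "pass", "fail", "fail"])
print(tomek_link_mask(X, y, "pass"))  # only index 2 (the borderline majority point) is flagged
```

Only the borderline majority sample is removed, which is how the hybrid approach cleans the class boundary without the blind deletion of RUS.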

2) ALGORITHM-LEVEL APPROACH
Unlike the data-level approach, the algorithm-level approach uses dedicated algorithms that directly learn the imbalanced distribution from the dataset, based on cost-sensitive methods and ensemble learning [20]. Instead of creating balanced data distributions through sampling strategies, cost-sensitive learning overcomes the imbalanced learning problem by using cost matrices that describe the costs of misclassifying any particular class [72]. It generally learns the imbalanced distribution of the classes by modifying the decision threshold when assigning the minority class. Such improved algorithms have a significant effect on maximizing the classification accuracy of imbalanced datasets [29].
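A common way to instantiate cost-sensitive learning is to weight each class inversely to its frequency, so that misclassifying a rare "fail" sample incurs a larger loss during training. The sketch below computes the widely used "balanced" heuristic, w_c = n_samples / (n_classes * n_c) (the same formula scikit-learn uses for class_weight='balanced'); the labels are hypothetical:

```python
import numpy as np

def balanced_class_weights(y):
    """'Balanced' cost-sensitive weighting: w_c = n_samples / (n_classes * n_c),
    so misclassifying a minority sample costs proportionally more."""
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    weights = len(y) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))

# 90 "pass" vs 10 "fail": the minority class gets a 9x larger weight.
y = ["pass"] * 90 + ["fail"] * 10
print(balanced_class_weights(y))  # {'fail': 5.0, 'pass': 0.555...}
```

These weights can then be passed to any classifier that accepts per-class misclassification costs, shifting the decision boundary toward the minority class without resampling the data.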

3) HYBRID APPROACH
Other works use hybrid approaches combining data-level and algorithm-level methods to optimize prediction models. Mubarak et al. [29] proposed a custom loss function that assigns different weights to different classes within a new hybrid model, CONV-LSTM, so that it selects significant features and captures temporal dependencies automatically to optimize prediction results. The proposed architecture combines CNN and LSTM deep neural networks, which create complex non-linear features from the interaction of thousands of attributes, and is trained by updating the weights through forward and back propagation to achieve optimal parameters. Applying cost-sensitivity to the loss function then minimizes the misclassification cost of the imbalanced class, increasing the performance of dropout prediction. In another study, Deepika et al. [73] proposed a combination of optimal feature selection using GWO, SMOTE, and an RF classifier. They adopted the GWO optimization method to select optimal parameters for the RF classifier, while SMOTE resolved the imbalance in the chosen datasets to enhance the efficiency of student performance prediction.
A few previous works explore DNN models to alleviate the class imbalance problem. Nabil et al. [30] compared the efficiency of a DNN with SMOTE, ADASYN, ROS, and SMOTEENN using 4,266 students' previous academic achievement records. When a DNN is applied to a class-imbalanced dataset, it aids generalization, enabling the network to correctly discover hidden patterns and extract insightful knowledge, thus achieving better and more reliable results. Similarly, Aslam et al. [74] combined SMOTE with a dense DNN using eight hidden layers to overcome the imbalance problem. TABLE 7 summarizes all approaches, balancing methods, and algorithms, with their strengths and weaknesses in handling imbalanced classification, especially in the student grade prediction domain.

B. IMPACT OF BALANCING METHOD ON STUDENT GRADE PREDICTION PERFORMANCE
The characteristics of an imbalanced dataset cause low accuracy for the minority class in classification, which can produce biased results. Due to this, researchers have proposed many methods and algorithms to solve imbalanced classification. To answer RQ2, several commonly used datasets, covering both real-world and online data, had the most significant impact in each study's experiments. TABLE 8 summarizes the detailed dataset characteristics, with a comparative analysis of the balancing methods and algorithms used to improve student grade prediction performance.
Utari et al. [42] applied SMOTE to predict student dropout on an imbalanced dataset of 2,406 records. The proposed technique improved the accuracy of the RF classifier from 92.3% to 93.4%. Similarly, Gull et al. [41] applied SMOTE to 250 imbalanced records of a multi-class problem to predict final student grades early. The proposed technique achieved high performance for LDA, with 90.7% accuracy, by generating a balanced number of minority class samples. However, SMOTE carries an overfitting risk because the generated synthetic instances can duplicate existing minority samples, which becomes less effective. To overcome this problem and improve imbalanced classification, some researchers applied alternatives such as ROS, ADASYN, RUS, NearMiss, SMOTEENN, and SMOTETL [39]. A study in [52] proposed four algorithms based on oversampling and undersampling techniques to minimize the effects of imbalance in predicting essay grading. The distribution of each class was based on a stratified sampling approach so that the training dataset maintained the same ratios. It was observed that SMOTE and ADASYN did not perform as effectively as ROS and RUS due to unusual patterns in the spatial distribution of feature vectors extracted from textual data. On the other hand, [30] analyzed the sampling methods SMOTE, ROS, and ADASYN and the hybrid sampling algorithm SMOTEENN to handle a highly imbalanced binary dataset. The study used 4,266 anonymized student records with 12 features to predict students' grades based on previous course results. The imbalanced class distribution shows that 91.16% and 8.84% of records were labeled as pass and fail, respectively.
Generally, after applying sampling techniques, classifier performance improves and achieves better results on a balanced dataset. In particular, ROS produced consistent results for all classifiers, including DNN, DT, LR, SVC, KNN, RF, and GB. Intayoad et al. [75] employed three SMOTE variants, Borderline SMOTE1, Borderline SMOTE2, and SVMSMOTE, to enhance the accuracy of the minority class on two educational datasets. Their results revealed that the proposed algorithms improve the precision, recall, and F1-score of the minority class for KNN and NB compared to DT classifiers. However, the effect of SMOTE on classification performance was not influenced by the imbalance ratio.
On the other hand, we also considered methods that address noise in the data, including outliers and bias, for imbalanced datasets. Jishan et al. [63] applied a discretization technique using optimal equal-width binning together with SMOTE to predict students' final grades on 181 instances. Results demonstrate that the accuracy of the NN and NB classifiers improved from 61% to 75% after the proposed algorithms were applied. FIGURE 7 shows the percentages of the balancing algorithms frequently used in the student grade prediction domain. In total, fourteen (14) sampling methods were extracted in this SLR. Most studies used SMOTE oversampling (37.0%), followed by ADASYN (13.0%) and ROS (10.9%). Meanwhile, RUS undersampling and the hybrid method SMOTEENN shared an almost equal distribution at 6.5%, followed by SMOTETL, SVMSMOTE, and BorderlineSMOTE at 4.3% each. However, very few researchers explored other undersampling algorithms (NearMiss, Tomek-Links, ENN, NCR, and Spread Subsampling) or K-means SMOTE oversampling, each at only 2.2%. These findings indicate a valuable research gap that other researchers can explore further as an important contribution to the performance of predictive models in the educational domain.
Another important issue that may adversely affect an imbalanced class distribution is high data dimensionality. Lim et al. [51] investigated the effectiveness of feature selection using the Wrapper approach to reduce the number of 32 features in imbalanced datasets. The results were validated using four popular classification algorithms (DT, NB, NBT, and LibSVM) and showed that using relevant features helped C4.5 produce better prediction accuracy than the others. Besides, selecting relevant features helps upgrade the generalization abilities of the classifiers and reduces computational time and resources. Hussain et al. [46] proposed a feature selection algorithm based on a streaming model using the Alpha-investing method to predict students' difficulties in the learning session. In addition, Saifudin et al. [44] identified that using a forward selection algorithm for feature selection reduces the computational complexity of the imbalanced dataset and improves the NB model's accuracy.
In contrast, Khan et al. [48] proposed a correlation-based filter approach to reduce the number of overlapping features and mitigate overfitting of the training dataset, improving accuracy in student grade prediction. Besides sampling methods, Pristyanto et al. [71] proposed the hybrid method SMOTE + OSS, which combines SMOTE and OSS, on 105 instances to improve prediction performance on a binary imbalanced problem. In their study, minority class instances were generated randomly based on k-nearest neighbors using SMOTE to reduce the risk of duplication, while majority class instances were selected using OSS to remove noise and borderline samples. The results concluded that SMOTE + OSS is more effective, with a high average g-mean of up to 96.5% using SVM compared to KNN (89.4%) and NB (85.4%). Another method [73] proposed a hybrid model based on the GWO algorithm, SMOTE oversampling, and an RF classifier to improve student academic performance; the experiment showed that the proposed model could solve the imbalance problem in the selected datasets with high accuracy. Meanwhile, Mubarak et al. [29] noticed that imbalanced classes could cause poor performance in dropout prediction models and therefore proposed a novel cost-sensitive loss function with CONV-LSTM deep learning to optimize student dropout prediction. The proposed model also applied automatic feature extraction to extract important features from raw clickstream data obtained from MOOCs, showing better results than models relying on manual feature engineering. FIGURE 8 shows the best accuracy percentages of the various algorithms proposed in previous studies for handling imbalanced classification to improve student grade prediction. The related articles are summarized in TABLE 9.

C. EVALUATION PERFORMANCE METRICS USED
In order to ensure that a proposed model is suitable for handling imbalanced classification problems, this subsection identifies the most frequently used performance metrics, which responds to RQ3. As depicted in FIGURE 9, fourteen (14) metrics were frequently used to evaluate classifier performance: accuracy, f1-measure, precision, recall, AUC, ROC, RMSE, False Positive Rate (FPR), True Positive Rate (TPR), specificity, Kappa, MAE, incorrectly classified, and correctly classified.
From the findings, 72.2% of the selected articles considered accuracy, f1-measure, precision, and recall, each accounting for 15.9% (20/126) of metric usage, to evaluate imbalanced classification performance in student grade prediction. Accuracy is the most intuitive performance measure, defined as the ratio of correct predictions to the total number of predictions. Precision is the ratio of correctly predicted positive cases to the total number of predicted positive cases. Recall, also known as sensitivity, is the ratio of correctly predicted positive cases to the total number of actual positive cases. Meanwhile, the f-measure or f1-score is the harmonic mean of precision and recall, which is considered a good indicator of the relationship between them. Most previous researchers found the f-measure more valuable than accuracy for imbalanced classification because it accounts for both false positives and false negatives. For example, studies in [30], [33], [48], and [63] compared accuracy, f1-measure, precision, and recall scores to evaluate the effectiveness of the predictive models. On the other hand, only 27.8% of articles based their performance evaluation on the remaining metrics: AUC at 7.9% (10/126), ROC at 5.6% (7/126), and FPR at 3.2% (4/126), whereas TPR, Kappa, and RMSE contributed the same distribution at 2.4% (3/126). Apart from these, correctly classified, incorrectly classified, and MAE were among the least used metrics in student grade prediction, each at 0.8% (1/126). RMSE and MAE are metrics used to measure regression model error.
Studies [4] and [11] used RMSE and MAE to evaluate matrix factorization techniques for predicting students' grades. Meanwhile, [45] applied accuracy, f1-measure, Kappa, and ROC to evaluate different classifiers using 270 features with imbalanced classes to predict students' final grades. Kappa is another metric that compares the observed and expected accuracy and is a good measure for handling multi-class imbalanced problems; a Kappa score closer to 1 indicates a more accurate prediction model. Hence, the diversity of performance metrics shows a significant impact in evaluating a model's capabilities from different perspectives when handling binary and multi-class imbalanced problems. TABLE 10 lists the specific articles that utilized these performance metrics.
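Kappa's comparison of observed versus expected accuracy can be sketched as (p_o - p_e) / (1 - p_e), where p_e is the agreement expected from the label marginals alone. The hypothetical example below shows why Kappa is informative for imbalanced data: a majority-class guesser scores 90% accuracy but a Kappa of exactly 0:

```python
from collections import Counter

def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement beyond chance, (p_o - p_e) / (1 - p_e),
    where p_e is the expected agreement given the label marginals."""
    n = len(y_true)
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_c, pred_c = Counter(y_true), Counter(y_pred)
    p_e = sum(true_c[c] * pred_c[c] for c in true_c) / (n * n)
    return (p_o - p_e) / (1 - p_e)

y_true = ["pass"] * 90 + ["fail"] * 10
y_maj = ["pass"] * 100                  # hypothetical majority-class guesser
print(cohen_kappa(y_true, y_maj))       # 0.0 despite 90% accuracy
```

A Kappa of 0 means the model agrees with the ground truth no more than chance would, which accuracy alone hides on imbalanced labels.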

VI. FUTURE DIRECTIONS OF STUDENT GRADE PREDICTION
This study presents an overview of imbalanced classification methods used in student grade prediction, focusing on two aspects: (a) appropriate solutions to build high-accuracy prediction models; and (b) appropriate algorithms and metrics to appraise performance on imbalanced datasets. Based on this review, some limitations and underexplored methods were identified in resolving imbalanced datasets to build high-accuracy prediction models. The aspects found to be promising and requiring further discussion as future directions in this field are as follows:

A. SAMPLING BASED WITH FEATURE SELECTION METHOD FOR HIGH DIMENSIONAL IMBALANCED DATASET
Dealing with imbalanced classification on a high-dimensional dataset is very difficult because such data are hard to train on and negatively impact the predictive results. Overcoming the high dimensionality of the data is of utmost importance because it increases the probability of the data becoming sparse and highly imbalanced, leading to inaccurate results. In this study, SMOTE was found to be less effective for high-dimensional imbalanced datasets. Adopting feature selection can reduce the bias against minority classes and improve the predictive results. Therefore, comprehensive data pre-processing is required, considering a hybrid feature selection method that can be integrated with different sampling methods to solve the problem of imbalanced classification effectively.
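A minimal sketch of this direction, on synthetic toy data and with a deliberately simplified hand-written SMOTE interpolation rather than an off-the-shelf implementation: feature selection is applied first, so the synthetic minority samples are generated in the reduced feature space instead of the sparse high-dimensional one.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

def smote_oversample(X_min, n_new, k=3, seed=0):
    """Simplified SMOTE: create n_new synthetic samples, each interpolated
    between a random minority sample and one of its k nearest minority
    neighbours."""
    rng = np.random.default_rng(seed)
    out = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        dists = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbours = np.argsort(dists)[1:k + 1]  # skip the sample itself
        j = rng.choice(neighbours)
        out.append(X_min[i] + rng.random() * (X_min[j] - X_min[i]))
    return np.array(out)

# toy high-dimensional imbalanced data: 90 majority vs 10 minority, 40 features
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 40))
y = np.array([0] * 90 + [1] * 10)
X[y == 1, :5] += 2.0  # only the first 5 features separate the classes

# 1) reduce dimensionality first, so SMOTE interpolates in the selected space
X_sel = SelectKBest(f_classif, k=5).fit_transform(X, y)

# 2) then oversample the minority class up to the majority size
X_syn = smote_oversample(X_sel[y == 1], n_new=80)
X_bal = np.vstack([X_sel, X_syn])
y_bal = np.concatenate([y, np.ones(80, dtype=int)])
```

In practice the same ordering can be expressed with the `imbalanced-learn` pipeline (feature selection step before a `SMOTE` step), which also keeps the resampling inside cross-validation folds.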

B. HYBRID-LEVEL APPROACH FOR IMBALANCED MULTI-CLASS CLASSIFICATION
Selecting appropriate hybrid methods and base classifiers can increase the effectiveness of these algorithms, which can contribute to a better quality of educational data. Referring to FIGURE 8, we noticed that the hybrid approach achieved the highest accuracy in handling the imbalanced classification problem for student grade prediction. Furthermore, several of the discussed and proposed methods deal with binary imbalanced cases, but research on hybrid approaches for multi-class classification is still lacking. In future work, it is recommended to explore other hybrid approaches for resolving the multi-class problem in this domain to boost the accuracy of the predictive model.
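One hedged illustration of such a hybrid design on synthetic multi-class data (the grade labels, class sizes, and classifier choice here are invented for the example): a data-level oversampling step is combined with an algorithm-level cost-sensitive classifier.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

# toy multi-class grades: A is the majority, B and C are minorities
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 6))
y = np.array(["A"] * 200 + ["B"] * 70 + ["C"] * 30)
X[y == "B", 0] += 1.5
X[y == "C", 1] += 1.5

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# data-level step: oversample every class up to the majority size
classes, counts = np.unique(y_tr, return_counts=True)
parts_X, parts_y = [], []
for c in classes:
    Xc = resample(X_tr[y_tr == c], n_samples=counts.max(),
                  replace=True, random_state=0)
    parts_X.append(Xc)
    parts_y.append(np.full(counts.max(), c))
X_bal, y_bal = np.vstack(parts_X), np.concatenate(parts_y)

# algorithm-level step: a cost-sensitive classifier on the rebalanced data
clf = LogisticRegression(class_weight="balanced", max_iter=1000)
clf.fit(X_bal, y_bal)
macro_f1 = f1_score(y_te, clf.predict(X_te), average="macro")
```

The macro-averaged f1 is used here because it weights each grade class equally, which is the appropriate view for multi-class imbalance.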

C. IMPROVED ACCURACY OF IMBALANCED DATASET USING ENSEMBLE METHOD
The ensemble method is designed to increase accuracy without modifying the base classifier, by combining the decisions of multiple classifiers rather than relying on the output of a single one. Based on our observation, most of the studies used common traditional classifiers such as DT, SVM, kNN, NB, and RF, and only a few have investigated the potential of ensemble algorithms such as Bagging, Stacking, and Boosting for improving the performance of student grade prediction. It is more appropriate to highlight the performance of ensemble algorithms when dealing with imbalanced classes than to rely merely on a primary classifier for predicting students' grades.
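A sketch of such a comparison on synthetic imbalanced data (the dataset and any resulting scores are illustrative only, not taken from the surveyed studies): a single decision tree is evaluated against bagged and boosted ensembles under the same cross-validation protocol.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# synthetic imbalanced dataset: roughly an 85% / 15% class split
X, y = make_classification(n_samples=400, n_features=10,
                           weights=[0.85, 0.15], random_state=0)

models = {
    "single tree": DecisionTreeClassifier(random_state=0),
    "bagging": BaggingClassifier(DecisionTreeClassifier(random_state=0),
                                 n_estimators=50, random_state=0),
    "boosting": AdaBoostClassifier(n_estimators=50, random_state=0),
}
for name, model in models.items():
    # f1 rather than accuracy, since the classes are imbalanced
    f1 = cross_val_score(model, X, y, cv=5, scoring="f1").mean()
    print(f"{name}: F1 = {f1:.3f}")
```

Stacking could be added to the same dictionary via `sklearn.ensemble.StackingClassifier`; the uniform `cross_val_score` loop is what makes the single-classifier versus ensemble comparison fair.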

VII. CONCLUSION
This paper presents a survey of the approaches and methods used to address the imbalanced class problem in the student grade prediction domain, including state-of-the-art methods, solutions, the impacts of the methods, and future directions for solving imbalanced classification problems. Overall, this review achieved its objective of highlighting the handling of imbalanced classification as a key solution to improving student grade prediction performance. The various approaches and methods proposed to overcome the difficulties found in imbalanced student grade prediction can be grouped into sampling methods, feature selection, cost-sensitive learning, and hybrid approaches. Nevertheless, these approaches have their limitations, and many aspects still need improvement regarding the diversity of the methods used, although some have shown exemplary performance in handling imbalanced classes. Hence, this review can support new advances in developing more efficient models focused on improving data quality, especially in the education domain. In future work, we will expand our investigation by considering new techniques and methods for other related data-driven issues.

He started his career as an IT Analyst Consultant for a New Zealand-, Singapore-, and Malaysia-based company. It is within this period that he obtained specific experience and field knowledge in software marketing, testing, and training. With his clear interest in academics, he expanded his knowledge by continuing his master's degree while conducting multiple research and development works with the Ministry of Science, Technology and Innovation (MOSTI), and also tutoring at Universiti Tenaga Nasional (UNITEN), Malaysia, and Asia Pacific University (APU). He joined UNITEN's permanent academic force as a Lecturer, in 2010, and has been contributing to the university to date. He is a Lecturer with the College of Computing and Informatics, UNITEN. He is also the Head of the Software Engineering Program, the Head of the External Relations Unit, UNITEN, and a Treasurer for the IEEE Malaysia Section Computer Chapter. He has accumulated 15 years of teaching and training experience, has produced more than 100 student projects, and has won multiple awards in international innovation competitions.
PO CHAN CHIU received the M.Sc. degree in information technology from the Universiti Malaysia Sarawak (UNIMAS), in 2010. She is currently pursuing the Ph.D. degree in computer science with the Universiti Teknologi Malaysia (UTM). She started her career as a Software Engineer for three years. She is also working as a Lecturer with UNIMAS. She has worked on several consultancy projects and developed software solutions to meet the needs of the woodworking industry. Her research interests include artificial intelligence, data analytics, optimization, and neural networks.
HAMIDO FUJITA (Life Senior Member, IEEE) received the Doctor Honoris Causa degree from Óbuda University, Budapest, Hungary, in 2013, and the Doctor Honoris Causa degree from Timisoara Technical University, Timisoara, Romania, in 2018. He is currently an Emeritus Professor with Iwate Prefectural University, Takizawa, Japan. He is also the Executive Chairman of i-SOMET Incorporated Association, Morioka, Japan. He received the title of an Honorary Professor from Óbuda University, in 2011. He is a Distinguished Research Professor with the University of Granada, and an Adjunct Professor with Stockholm University, Stockholm, Sweden; University of Technology Sydney, Ultimo, NSW, Australia; National Taiwan Ocean University, Keelung, Taiwan; and others. He has supervised Ph.D. students jointly with the University of Laval, Quebec City, QC, Canada; University of Technology Sydney; Oregon State University, Corvallis, OR, USA; University of Paris 1 Pantheon-Sorbonne, Paris, France; and the University of Genoa, Italy. He has published more than 400 highly cited papers. He has four international patents in software systems and several research projects with Japanese industry and partners. He was a recipient of the Honorary Scholar Award from the University of Technology Sydney, in 2012. He is the Emeritus Editor-in-Chief of Knowledge-Based Systems. He is also the Editor-in-Chief of Applied Intelligence (Springer). He has headed a number of projects, including intelligent HCI, a project related to mental cloning for healthcare systems as an intelligent user interface between human users and computers, and the SCOPE project on virtual doctor systems for medical applications. He is a Highly Cited Researcher in Cross-Field for the years 2019 and 2020 in the computer science field, from Clarivate Analytics. He has collaborated on several research projects in Europe, and he is currently collaborating in the OLIMPIA project, supported by the Tuscany region, on the therapeutic monitoring of Parkinson's disease.
VOLUME 11, 2023