Multiclass Prediction Model for Student Grade Prediction Using Machine Learning

Today, predictive analytics applications have become an urgent need in higher education institutions. Predictive analytics uses advanced analytics, encompassing machine learning, to derive high-quality and meaningful information at all education levels. Student grades are widely recognized as one of the key performance indicators that help educators monitor academic performance. Over the past decade, researchers have proposed many variants of machine learning techniques in the education domain. However, handling imbalanced datasets remains a severe challenge when trying to improve the performance of student grade prediction. This paper therefore presents a comprehensive analysis of machine learning techniques for predicting final student grades in first-semester courses, with a focus on improving predictive accuracy. Two modules are highlighted in this paper. First, we compare the accuracy of six well-known machine learning techniques, namely Decision Tree (J48), Support Vector Machine (SVM), Naïve Bayes (NB), K-Nearest Neighbor (kNN), Logistic Regression (LR), and Random Forest (RF), on a real dataset of 1282 student course grades. Second, we propose a multiclass prediction model that reduces the overfitting and misclassification caused by imbalanced multi-classification, based on the Synthetic Minority Oversampling Technique (SMOTE) combined with two feature selection methods. The results show that the proposed model integrated with RF gives a significant improvement, with the highest f-measure of 99.5%. The proposed model thus offers comparable and promising results that can enhance prediction performance for imbalanced multi-classification in student grade prediction.


I. INTRODUCTION
In higher education institutions (HEI), each institution has its own student academic management system that stores all student data, including final examination marks and grades across different courses and programs. These marks and grades are used to generate a student academic performance report, which evaluates course achievement every semester. The data stored in this repository can provide valuable insights into student academic performance. Solomon et al. have emphasized that determining student academic performance is a significant challenge in HEI. Consequently, previous researchers have identified various influential factors that can greatly impact student academic performance. The most common factors, however, typically relate to socioeconomic background, demographics, and learning activities rather than final examination grades. It is therefore evident that predicting student grades could be a viable way to enhance student academic performance. Predictive analytics has proven beneficial in HEI, as it enables the identification of hidden patterns and the prediction of trends in vast databases, benefiting the competitive educational domain. It has been successfully applied in various educational areas, such as student performance, dropout prediction, academic early warning systems, and course selection, and its use in predicting student academic performance has steadily increased over the years. The ability to predict student grades is an important capability that can contribute to improving academic performance. Previous research has explored different machine learning techniques for this task; however, little work has addressed the challenges posed by imbalanced multi-classification in student grade prediction.

II. FRAMEWORK OF GRADE PREDICTION
This document aims to identify the most effective predictive model for student grade prediction, specifically for imbalanced multi-classification. Our framework takes as input the students' final course grades, extracted from their academic spreadsheet documents and the academic repository. To tackle imbalanced multi-classification, we employ two data-level solutions: oversampling with SMOTE and two feature selection (FS) methods. These techniques help reduce overfitting and misclassification in the dataset. We then combine these techniques to design our proposed model, which is evaluated using performance metrics through a selected machine learning classifier. Finally, data visualization is used to depict the dataset trends and the final classification results. The students' letter grades (ranging down to E, E-, and F) were grouped into five categories, Exceptional, Excellent, Distinction, Pass, and Fail, which were established as the output classes for prediction. Analysis of the class distribution, however, revealed an imbalance in the number of instances per class: 63 Exceptional, 377 Excellent, 635 Distinction, 186 Pass, and 21 Fail, a high ratio of 3:18:30:9:1. This imbalance could lead to overfitting. To address it, data-level solutions, namely oversampling with SMOTE and two feature selection methods (wrapper-based and filter-based), were employed as benchmark methods in this study. The experiments used the open-source Waikato Environment for Knowledge Analysis (WEKA), version 3.8.3, chosen for its wide range of machine learning algorithms and its user-friendly graphical interface for easy visualization.
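As a quick sanity check, the imbalance ratio quoted above can be recovered from the raw class counts. The short Python sketch below (illustrative only, not part of the study's pipeline) normalizes each class count by the size of the smallest class:

```python
# Class counts reported in the paper and their approximate imbalance
# ratio, normalized by the smallest class (Fail, 21 instances).
counts = {"Exceptional": 63, "Excellent": 377, "Distinction": 635,
          "Pass": 186, "Fail": 21}

smallest = min(counts.values())
ratio = {grade: round(n / smallest) for grade, n in counts.items()}

print(ratio)
# → {'Exceptional': 3, 'Excellent': 18, 'Distinction': 30, 'Pass': 9, 'Fail': 1}
```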

III. PERFORMANCE ANALYSIS
The objective of this study is to forecast students' final grades by analyzing their academic performance in the previous semester's final exams. The research applied various machine learning algorithms to determine which yielded the most accurate predictions of students' final grades. The study consisted of three experiments conducted in four distinct phases across five classes. Prediction accuracy was assessed through ten-fold cross-validation, in which 90% of the dataset is used for training and 10% for testing in each fold. The models employed in constructing the multiclass prediction model included Logistic Regression (LR) and Naïve Bayes (NB). Logistic Regression models classification problems mathematically through a cost function based on the logistic function, making it well suited to categorical data and to understanding the relationships between variables. Naïve Bayes, based on Bayes' theorem, is favored for its simplicity and its ability to provide quick predictions; it is particularly suitable for small datasets, combining a simple, flexible probabilistic model with accurate predictions. The Decision Tree (J48) is a commonly used algorithm for multi-class classification tasks and can handle missing values in high-dimensional data; it has been applied successfully to achieve optimal accuracy while using a minimal number of features. The K-Nearest Neighbor (kNN) algorithm is a non-parametric method that classifies an instance by comparing it with its nearest neighbors, where k specifies how many neighbors, found with a distance function in the n-dimensional feature space, are consulted; kNN performs well on datasets with few features. Lastly, Random Forest (RF) is an ensemble classifier that combines multiple decision trees built from different subsets of the data. This approach helps identify the best features for achieving high accuracy while mitigating overfitting. RF is also relatively robust to outliers and noise, making it an effective classification method.
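A minimal sketch of such a comparison, using scikit-learn stand-ins for the six classifiers (a CART decision tree approximates J48) and a small synthetic dataset in place of the real 1282-instance grade data, might look like this:

```python
# Illustrative comparison of six classifiers with stratified ten-fold
# cross-validation; the dataset and classifier settings are assumptions,
# not the authors' exact experimental setup.
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier        # stand-in for J48
from sklearn.naive_bayes import GaussianNB             # NB
from sklearn.neighbors import KNeighborsClassifier     # kNN
from sklearn.svm import SVC                            # SVM
from sklearn.linear_model import LogisticRegression    # LR
from sklearn.ensemble import RandomForestClassifier    # RF

# Synthetic 5-class dataset standing in for the course-grade data.
X, y = make_classification(n_samples=500, n_features=10, n_informative=5,
                           n_classes=5, random_state=42)

models = {
    "J48": DecisionTreeClassifier(random_state=42),
    "NB": GaussianNB(),
    "kNN": KNeighborsClassifier(),
    "SVM": SVC(),
    "LR": LogisticRegression(max_iter=1000),
    "RF": RandomForestClassifier(random_state=42),
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="f1_macro")
    results[name] = scores.mean()
    print(f"{name}: mean f-measure = {results[name]:.3f}")
```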

IV. EXPERIMENTAL RESULTS
The study's findings are organized into two subsections based on the research questions. A thorough performance analysis was carried out through three experiments on real data, and the results of the J48, kNN, NB, SVM, LR, and RF experiments were examined and compared. The effects of applying oversampling with SMOTE and FS techniques to the same dataset, to address the imbalanced multi-classification issue, were also assessed. The primary aim of this study is to evaluate predictive accuracy by comparing six machine learning algorithms: each was used to train on the student dataset and its prediction accuracy was assessed. Accuracy was analyzed using stratified ten-fold cross-validation as the testing method to identify the most effective predictive model, and metrics such as classification accuracy, precision, recall (sensitivity), and f-measure were employed to verify the predictive model's accuracy. The results of the different classifiers on the student dataset are summarized.
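All of these metrics can be derived directly from a confusion matrix. The sketch below, using a small hypothetical 3-class matrix (illustrative values, not the paper's results), shows how per-class precision, recall, and f-measure, and overall accuracy, are computed:

```python
# Macro-averaged precision, recall and f-measure from a hypothetical
# 3-class confusion matrix (rows = true class, columns = predicted).
import numpy as np

cm = np.array([[50,  2,  1],
               [ 3, 45,  4],
               [ 0,  5, 40]])

tp = np.diag(cm).astype(float)
precision = tp / cm.sum(axis=0)          # per-class precision
recall = tp / cm.sum(axis=1)             # per-class recall (sensitivity)
f_measure = 2 * precision * recall / (precision + recall)
accuracy = tp.sum() / cm.sum()

print(f"accuracy={accuracy:.3f}, macro-F={f_measure.mean():.3f}")
# → accuracy=0.900, macro-F=0.899
```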

A. IMPACT OF OVERSAMPLING AND FEATURE SELECTION FOR IMBALANCED MULTI-CLASS DATASETS
In this study, we concentrate on the impact of oversampling and feature selection techniques on imbalanced multi-class datasets. Specifically, we use oversampling with SMOTE and two feature selection algorithms to address the imbalanced classification issue. To evaluate the performance of the predictive models, we conduct three experiments with six machine learning algorithms. First, we apply SMOTE to the dataset with each of the six machine learning algorithms independently. Next, we employ the two feature selection algorithms separately with three different attribute evaluators. Finally, we implement and test the proposed multiclass prediction model (SFS) on the same dataset with the six selected machine learning algorithms. In addition to accuracy, we also consider precision, recall, and f-measure to ensure the effectiveness of the predictive model.
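The oversample-then-select-then-classify idea behind the proposed model can be sketched roughly as follows. Note the assumptions: plain random duplication stands in for SMOTE, an ANOVA-based filter stands in for the paper's two FS methods, and the dataset is synthetic; this is an illustrative sketch, not the authors' implementation.

```python
# Rough sketch of an oversample -> feature-select -> classify pipeline.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic, deliberately imbalanced 5-class dataset.
X, y = make_classification(n_samples=400, n_features=12, n_informative=6,
                           n_classes=5, weights=[0.05, 0.5, 0.3, 0.1, 0.05],
                           random_state=0)

# 1) Oversample every minority class up to the majority-class size
#    (random duplication here; SMOTE would interpolate new points instead).
rng = np.random.default_rng(0)
target = max(np.bincount(y))
parts_X, parts_y = [], []
for cls in np.unique(y):
    idx = np.where(y == cls)[0]
    resampled = rng.choice(idx, size=target, replace=True)
    parts_X.append(X[resampled])
    parts_y.append(y[resampled])
X_bal, y_bal = np.vstack(parts_X), np.concatenate(parts_y)

# 2) Filter-based feature selection on the balanced data.
selector = SelectKBest(f_classif, k=6)
X_sel = selector.fit_transform(X_bal, y_bal)

# 3) Evaluate the chosen classifier (RF) with stratified ten-fold CV.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(RandomForestClassifier(random_state=0),
                         X_sel, y_bal, cv=cv, scoring="f1_macro")
print(f"mean f-measure: {scores.mean():.3f}")
```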

B. SMOTE OVERSAMPLING TECHNIQUE
The Synthetic Minority Oversampling Technique (SMOTE) is widely used to address overfitting on imbalanced data in machine learning. It modifies imbalanced datasets by generating new instances of the minority class through synthetic sampling, helping to create a more balanced class distribution. In this study, the default parameter for the number of nearest neighbors (k) of a minority-class sample SG was increased. N samples were then randomly selected from these neighbors and recorded as SGi. The new sample, SGnew, is defined by the following expression:

SGnew = SG + random × (SGi − SG)
Here 'random' is a random number drawn uniformly from the range (0, 1). We implemented SMOTE in WEKA using the 'weka.filters.supervised.instance.SMOTE' filter to insert synthetic instances between the minority-class samples in our dataset. The class-value index was set to 0 to auto-detect the non-empty minority class, the number of nearest neighbors was set to k = 10, and the percentage of new instances was set to 100%. The SMOTE filter was applied in ten iterations, increasing the number of instances in the oversampled dataset from 1282 to 2932. The class distribution after SMOTE became 504 Exceptional, 377 Excellent, 635 Distinction, 744 Pass, and 672 Fail, reducing the ratio to 1:1:2:2:2. Table 6 presents the detailed comparison of all predictive models and their performance measures. When the classifiers were combined with SMOTE oversampling, we consistently observed an improvement in the effectiveness of all predictive models. Among them, RF achieved the most promising f-measure of 99.5%, followed by kNN with 99.3%, J48 with 99.1%, SVM with 98.9%, LR with 98.8%, and NB with 98.3%. This result was statistically significant at the 95% confidence level using the corrected paired t-tester, as shown in Figure 6. We also observed that, when SMOTE was applied, the number of minority-class instances increased through the iterations and the 'k' value until they balanced with the other classes. The accuracy performance was then analyzed in detail using the confusion matrix.
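The interpolation rule can be illustrated numerically. The sketch below (illustrative values, not from the study) places a synthetic point on the segment between a minority-class sample SG and one of its nearest minority neighbors SGi:

```python
# Numeric illustration of the SMOTE interpolation rule:
# SGnew = SG + random * (SGi - SG), with random drawn from (0, 1).
import numpy as np

rng = np.random.default_rng(7)
SG = np.array([2.0, 3.0])        # a minority-class sample
SG_i = np.array([4.0, 5.0])      # one of its nearest minority neighbors
gap = rng.random()               # uniform random value in [0, 1)
SG_new = SG + gap * (SG_i - SG)  # synthetic sample on the joining segment

# The new point always lies between the two originals, coordinate-wise.
assert np.all(SG_new >= SG) and np.all(SG_new <= SG_i)
print(SG_new)
```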

V. DISCUSSION
This research tackled imbalanced multi-classification in student grade prediction by focusing on data-level solutions. To do so, we used a real dataset of final course grades from JTMK at one of the Malaysian polytechnics and analyzed the results of our proposed model. A similar earlier study also highlighted the importance of course grades in decision making within the educational domain. To answer our research question, we conducted a comprehensive experiment on the real student dataset, comparing the accuracy of our prediction model across the selected machine learning algorithms. We also applied oversampling with SMOTE and two FS methods to assess the effectiveness of the predictive models, using accuracy, precision, recall, and f-measure as evaluation metrics. The results revealed that the predictive models built from J48, NB, kNN, SVM, LR, and RF all performed better when SMOTE alone was applied to address the imbalanced dataset. When the FS method was applied to the imbalanced dataset with a wrapper-based approach, however, only kNN and NB showed significant improvement, while SVM remained unchanged. SVM struggled to handle imbalanced multi-classification on its own because of the difficulty of computing an optimal hyperplane for high-dimensional imbalanced datasets. NB's use of FS for predicting student grades, on the other hand, is supported by previous research reporting NB's superior accuracy with wrapper-based subset feature selection. FS alone did not improve the accuracy of RF, possibly because of the imbalanced nature of the dataset. Thus, while FS made the predictive model quicker to interpret, performance did not depend solely on a small set of features.

FIG: FRAMEWORK OF GRADE PREDICTION