Abstract:
This research project combines careful data preparation with machine learning model assessment to provide an in-depth analysis of a dataset of college and university students. The initial analysis examines target value distributions, economic variables, and student counts by gender. Subsequent preprocessing addresses outlier handling, feature selection, and class imbalance. Using ROC curves to highlight classification strength, the study assesses several classifiers: XGBoost, Random Forest, K-Nearest Neighbors (KNN), and Decision Tree. Random Forest achieves the highest AUC at 0.99, closely followed by XGBoost at 0.98. XGBoost also performs well on both the training and testing datasets. The findings offer insights into predictive modeling of student outcomes and highlight its potential to enhance educational support systems. This integrated approach, combining exploratory data analysis with machine learning techniques, provides a framework for future research in educational data mining and predictive analytics.
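To make the reported AUC figures easier to interpret, the following is a minimal sketch of how the area under a ROC curve can be computed for a binary classifier. It is not the authors' code: the function name `roc_auc`, the toy labels, and the toy scores are all hypothetical, and the labelling 1 = dropout, 0 = non-dropout is an assumption made only for illustration.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative one
    (ties count as half).
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half a win
    return wins / (len(positives) * len(negatives))


# Toy, made-up data (not from the paper): 1 = dropout, 0 = non-dropout.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)  # 8 of 9 pairs ranked correctly
```

Under this reading, an AUC of 0.99 means that in roughly 99% of positive/negative pairs the classifier scores the positive example higher. The pairwise loop is quadratic in the number of examples, so it is suited to illustration rather than large datasets.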
Published in: 2024 11th International Conference on Computing for Sustainable Global Development (INDIACom)
Date of Conference: 28 February 2024 - 01 March 2024
Date Added to IEEE Xplore: 18 April 2024
- IEEE Keywords
- Index Terms
- Prediction Analysis
- Student Dropout
- Prediction Model
- Machine Learning
- Receiver Operating Characteristic Curve
- Training Dataset
- Random Forest
- Education System
- Test Dataset
- Decision Tree
- Machine Learning Models
- K-nearest Neighbor
- Class Imbalance
- Root Mean Square Error
- Higher Education
- Training Set
- Learning Algorithms
- Artificial Neural Network
- Binary Classification
- Dropout Rate
- Synthetic Minority Oversampling Technique
- Data Pre-processing
- Predictors Of Dropout
- Male Students
- Female Students
- F1 Score
- Target Variable
- Target Value
- Discrimination Performance
- Mode Of Application