A Machine Learning-Based Recommender System for Improving Students Learning Experiences

Outcome-based education (OBE) is a well-proven teaching strategy based upon a predefined set of expected outcomes. The components of OBE are Program Educational Objectives (PEOs), Program Outcomes (POs), and Course Outcomes (COs). The latter are assessed at the end of each course, and faculty members can propose several recommended actions to enhance the quality of courses and therefore the overall educational program. Given the large number of courses and the effort required of faculty members, unsuitable actions could be recommended, and undesirable and inappropriate decisions may therefore occur. In this paper, a recommender system, using different machine learning algorithms, is proposed for predicting suitable actions based on course specifications, academic records, and course learning outcomes’ assessments. We formulated the problem as a multi-label multi-class binary classification problem and applied to the dataset different problem transformation and adaptive methods such as one-vs.-all, binary relevance, label powerset, classifier chain, and the ML-KNN adaptive classifier. As a case study, the proposed recommender system is applied to the College of Computer and Information Sciences, Jouf University, Kingdom of Saudi Arabia (KSA) to help academic staff improve the quality of teaching strategies. The obtained results showed that the proposed recommender system presents more recommended actions for improving students’ learning experiences.


I. INTRODUCTION
Over the last century, significant concern has grown about the ability of education systems to equip students with the professional and career preparation needed for the 21st century. In response, OBE has been proposed as a theory that bases each part of an educational system around outcomes. As stated by [1], outcomes are the abilities that students can acquire by the end of a learning experience. The most important aspect of an outcome is that it should be observable and measurable.
OBE is an enlightening approach of teaching strategy that is based upon a predefined set of expected outcomes [2].
Outcome-based education means clearly focusing and organizing everything in an educational system around what is important for all students to be able to do successfully at the end of a course, program, or graduation, as stated by many works in the literature, including [2]-[5] and [6].
The associate editor coordinating the review of this manuscript and approving it for publication was Inês Domingues.
Teaching is then designed to engage students in learning activities that optimize their chances of achieving those outcomes, and assessment tasks are designed to enable clear judgments as to how well those outcomes have been attained [3], and [5].
Nowadays, this approach is applied in tertiary education by specifying the outcomes at three levels: (i) PEOs, which are broad statements that describe the career that the program is preparing students to achieve; (ii) POs, which are narrower statements that describe the knowledge, skills, and competences that students are expected to have by the time of graduation; and (iii) COs, which are statements that describe what students are expected to know, the attitudes they are expected to hold, and what they are able to do after the completion of a course [7].
In an OBE learning process, no single style of teaching or assessment is prescribed, and by the end of the educational experience each student should have achieved the program educational objectives [1]. Teachers are expected to act as facilitators of learning by creating and sustaining an effective learning environment for students to develop the competencies that the program of study expects to foster [4].
No course of the study program can be dealt with in isolation. Each course is an element of a program and is often related to other courses of the program. OBE does not interfere with the academic freedom of the teacher. It merely asks the teacher to follow a process in offering a course. The process consists of writing the COs about what the students should be able to do, in the context of the course's relationship to the other elements of the program; designing the assessment (how to measure the ability of students to do what they are expected to do); and designing the instruction (how the teacher proposes to help the students acquire the ability to do what they are expected to do). All the decisions in all three steps of the process are made by the teacher [1], [2].
Assuming the COs are written carefully to reflect the intended outcomes of the course, the attainment of the COs should lead to the attainment of the POs and consequently the PEOs. However, it is necessary to follow a well-defined CO assessment process consistently in order to plan for continuous improvement in the quality of learning. The CO assessment process is carried out based on both direct and indirect assessments [2]. The direct assessment is determined from the performance of the students in all the relevant assessment methods. The indirect assessment is based on feedback collected from students.
Feedback is collected using online forms. Teachers are asked to prepare course reports based on the results of the CO assessment process. These course reports are used at the end of each semester to improve the student learning experience. Indeed, each teacher proposes several recommended actions to enhance the quality of the courses he/she taught and therefore improve the overall educational program.
In recent years, assisting academic staff in improving teaching and learning quality has become a major concern in both research and practice. In fact, the huge amount of data accumulated in educational institutions is a gold mine from which useful knowledge can be discovered. This has initiated a new and fast-growing interdisciplinary research field of data mining called Educational Data Mining (EDM) [8] and [9]. Several methods and applications have been proposed in EDM [10] and [11].
Indeed, various tasks and applications in the field of EDM have been studied and their purposes have been categorized [10]. Besides, [11] has described the most popular EDM methods. However, these proposed taxonomies are not exhaustive; they do not cover all the possible tasks. Indeed, there are many more specific objectives depending on the viewpoint of the final user.
These applications can follow both applied research objectives such as improving learning quality, as well as pure research objectives, which tend to improve the understanding of the learning process. The applications of EDM can target any of the stakeholders involved in educational systems, such as students, teachers, and researchers themselves [10].
Making discoveries and providing recommender systems can help teachers improve their teaching performance and decision making [10]. The applications of EDM have been reported in a number of academic works. A combination of neural networks and experts' prior knowledge has been applied in [12] to predict and evaluate students' learning outcomes in an academic program and, as a result, enhance teaching quality. A second work [13] applied the K-means clustering algorithm to investigate the relationship between the skills taught in Business programs and the title of the program, using a dataset extracted from the program catalog.
The concept of using assessment information for course improvements is nothing new; teachers have been doing it forever. It is an integral part of being an effective educational professional. Assessment results can help teachers decide how useful their assessment strategies have been and what changes are needed to improve their effectiveness [14].
The sources of that assessment information have varied; instructional decisions and actions were often based on intuition, teaching philosophy, and personal experience [15]. Indeed, there is no systematic process (or data-driven approach) for course improvements. Teachers often like to try out different instructional strategies based on their own judgment. This trial-and-error process of choosing a strategy and judging its efficiency is different for every teacher. How does a teacher decide which strategy to try, and how does he/she know whether it "worked"? The trial-and-error process is not very consistent and can lead to ambiguous results [15].
Considering the lack of knowledge and guidelines on the actions to improve students' learning [15] and the recent remarkable success of educational data analytics, this paper proposes a recommender system, using different machine learning algorithms, for predicting suitable actions to enhance the quality of the courses and thus to improve the overall educational program. The prediction of suitable actions is based on course specifications, students' academic records, and course learning outcomes' assessments.
The data are fed as input to the proposed model and the recommender system will answer the following questions:
1. Which is the best method for this multi-label classification problem?
2. Which is the best algorithm to predict the most accurate course improvement actions?
3. Which features are selected as the most relevant predictors for course improvement actions?
The remainder of this paper is structured as follows. Section II presents the related works. Section III discusses the methodology carried out in this study, from the data collection to the model proposal. Section IV discusses the experimental results. Section V presents the analysis and discussion of the proposed recommender system. Finally, the paper ends in Section VI with a conclusion and future directions. The definitions of the major acronyms used in this paper are presented in Table 1.

II. RELATED WORKS
Nowadays, it is possible to generate a great volume and wide variety of data from educational environments, such as students' academic records, assessment files, course reports, and the records of e-mail interactions between students and teachers. The data generated by educational environments play a significant role in decision support and the improvement of the learning process. Improvements can be reached by analyzing students' data in terms of their behavior, satisfaction, and performance [16]. Data mining (DM) techniques offer a way to extract knowledge from these data and therefore to enhance the learning process [17].
EDM has been a fast-growing interdisciplinary research field [8] and [11]. Research in EDM has expanded into several areas, including studies on interactions between the educational actors, planning and assessment of educational programs, teaching strategies, course contents and educational outcomes, improving students' performance, identifying students' learning styles, supporting weak students to prevent failure and dropout, gaining knowledge about different educational phenomena, and solving the relevant complex problems [16] and [17].
Different kinds of data can be collected in order to resolve educational problems, depending on the type of educational environment, whether traditional classroom education or computer-based education, and on the information system used, such as LMS, ITS, and MOOC [11]. Data preprocessing is considered a hard and complicated task. Educational data are typically available as raw (original or primary) data that are not in an appropriate form for solving the problem at hand. Therefore, it is necessary to convert the data to an appropriate form for solving each specific educational problem [18].
The majority of traditional DM techniques, including but not limited to visualization, classification, and clustering techniques, have already been applied successfully in the educational domain. However, educational systems also have some special characteristics, with hierarchical and longitudinal data that require specific treatment of the mining problem and preprocessing of the data [11].
Nowadays, a wide array of well-known tools and frameworks can be used for conducting EDM research. As presented in [19], 40 tools used for data mining in the area of education have been reviewed. This review is useful to researchers interested in learning about these tools not just at a theoretical level, but also in terms of practical application and use. The majority of EDM researchers use their own data for solving their specific educational problems [11]. Gathering and preprocessing educational data is a hard and very time-consuming task [18]. For this reason, another option is to use public datasets that are currently available for free download on the Web. Examples of public EDM datasets are given in [11].
Several works have been proposed in the literature to tackle different educational problems. These works have different objectives depending on the viewpoint of the final user (students, instructors, administrators, or other stakeholders). In the sequel, we present some of these works. A complete survey is beyond the scope of the current paper, and more details can be found in [10] and [11].
The first work we identified was proposed in [20] to help students select the most suitable faculty based on several criteria, including their grades in different high school subjects, the state of the country where they are located, and their gender. The proposed enrollment recommender system consists of two phases: a training phase and a runtime phase. The training phase takes the previous high school database and faculty database as input and generates the faculty student model, whilst the runtime phase takes a new student as input and produces as output the recommendation for this student, that is, whether or not he/she is suitable to join that faculty. As shown in [20], the recommender system was applied to the enrollment process of the Faculty of Engineering, Al-Azhar University. The authors employed and compared four machine learning algorithms. Based on the obtained results, the Alternative DT algorithm outperformed the other algorithms, with an overall accuracy close to 80%.
A second work was presented in [21], where a student performance prediction system was proposed to identify students who would be expected to do well in the Faculty, specifically to succeed in studying engineering programs. The authors used an Artificial Neural Network model to predict the performance of a student before he/she starts the sophomore year of Engineering studies, based on a number of factors such as high school score, results in some freshman-year subjects including mathematics and electronics, the student's gender, and the type of high school (private or public). The proposed prediction model was tested and the overall result was 84.6%.
The third work we identified in the literature focuses on factors affecting the students and the instructor performances. Indeed, the authors in [22] investigate the factors that affect students' achievements to enhance the quality of the educational system and propose a predicting model of the instructor performance. Indeed, the main goal of the authors is to build a classification model that enables to predict factors affecting student performance. The authors used four well-known DM techniques, namely J48 DT, MLP, NB, and SMO. Based on the obtained results, authors claim that J48 DT algorithm achieves the best performance compared to the other algorithms with an accuracy of 84.8%. Another interesting issue presented in this research affirmed that the performance of an instructor is mainly affected by the number of courses that is taught.
Another work concerned with students' performance was proposed by [23]. The authors claim that students' performance is related more strongly to students' dynamic behaviors than to the teaching environment and students' intrinsic characteristics. They stressed that living and study habits are greatly associated with academic success. Based on this observation, the authors proposed a framework to model students' academic performance using their behavior patterns. Discovering students with poor academic performance early is helpful for supervision and for developing good study habits. The authors used smart card records to build the students' behavior patterns. After that, they used a regularized multi-task model for the classification task to predict the students' performance in each course simultaneously. The experimental results showed a high recall in discovering poor performance and sufficient feasibility for early warning. As a matter of fact, the authors found that some behaviors, such as shopping, eating in canteens, and leaving the dorm before 8 a.m., have a relatively significant relation with performance. They stated that "moderate diversity on campus and a good lifestyle especially on meals and getting up are in favor of good performance".
The fifth identified work also focuses on predicting outcomes of student performance at the end of a one-year school cycle [24]. The authors employed two datasets obtained from a repository of the State Department of Education of the Federal District of Brazil. The first dataset contains only attributes collected prior to the beginning of the school year, whereas the second contains those same attributes plus a few new variables, such as 'absences' and 'grades'. The authors then built a classification model based on the GBM for each dataset to predict student performance, allowing them to compare the predictive capability at two different moments in the academic cycle. The experimental results showed that, although the attributes 'grades' and 'absences' were the most relevant for predicting end-of-year academic outcomes, the analysis of demographic attributes reveals that 'neighborhood', 'school', and 'age' are also potential indicators of a student's academic success or failure.
Another educational problem has been treated by [8]. The authors focused on a very important problem in higher education, namely student dropout. As stated in this work, early detection of vulnerable students can lead to the success of any retention strategy. At-risk students would be provided with academic and administrative support to increase their chance of staying on the course. The proposed work aims to disclose interesting patterns that could contribute to predicting students' performance and dropout, based on their pre-university characteristics, admission details, and initial academic performance at university. The authors introduced a new feature transformation method to improve the accuracy of conventional classifiers and therefore to maximize the students' attention and recommend courses for them based on their progress.
In this section, we have described some academic works proposed to solve educational problems. However, the number of possible objectives or educational problems in educational data mining is huge, and there is no taxonomy that covers all the possible tasks. In fact, there are many more specific objectives depending on the viewpoint of the final user (students, teachers, scientific researchers, or administrators). To the best of our knowledge, this is the first study to deal with predicting the recommended actions to improve students' learning experiences based on course specifications, academic records, and course learning outcomes' assessments. The current paper implements a recommender system using applied machine learning algorithms with data engineering, formulating the task as a multi-label multi-class classification problem. The proposed recommender system aims to predict suitable actions for enhancing student learning experiences. As mentioned in the previous section, teachers are asked at the end of each semester to prepare course reports based on the results of the CO assessment process. These course reports are used to improve the student learning experience. Each teacher proposes several recommended actions to enhance the quality of the courses he/she taught and therefore improve the overall educational program. A recommender system that automatically and efficiently predicts the most suitable actions will therefore be very helpful for teachers in preparing their course reports.
VOLUME 8, 2020

III. METHODOLOGY
The proposed machine learning-based recommender system can be regarded as analogous to the hybrid recommender approach presented in [25], in which a learner's average score in specific or all modules can affect the quality of course content, which in turn can lead to poor improvements in student learning outcomes.
The dataset of the proposed recommender system is extracted from the student course learning outcomes, academic records, and course specification of each distinct course. In each semester, a student studies several courses in different scientific domains. For each course, a course file must be prepared at the end of the semester in order to determine the improvements that can be made based on the student records and the course learning outcomes. An adaptive methodology is proposed to recommend the most suitable actions for each taught course based on the overall student academic records in that course. The adaptability of the methodology comes from applying different problem transformation and adaptive methods to the dataset in order to dynamically predict the suitable actions that can be taken.
The methodology consists of several steps, starting from dataset collection and continuing with data preprocessing and problem transformation. These steps are described below:

A. DATASET COLLECTION
To the best of our knowledge, there is no dataset addressing the main topic of the current paper. Therefore, the dataset in our research was collected from the course reports of the College of Computer and Information Sciences at Jouf University, Kingdom of Saudi Arabia (KSA). In the College of Computer and Information Sciences, students study for four years divided into eight semesters. The students are distributed among three scientific departments: Information Systems, Computer Science, and Computer Engineering and Networks.
The dataset was collected from the course reports of 127 scientific courses taught in the three departments during 4 semesters in the two academic years 2018 and 2019. In addition, there are graduation projects 1 and 2 in each department. Graduation projects 1 and 2 are taught in separate semesters, where the Project 1 course is the prerequisite of the Project 2 course. Each course has two sections, one for male and one for female students.
At the end of each course, the teacher assesses the course outcomes in order to recommend several actions to enhance the quality of the course and therefore the overall educational program. The course outcomes assessment process is carried out based on both direct and indirect assessments. The direct assessment is determined from the performance of the students in all the relevant assessment methods, including final exams, midterm exams, assignments, labs, and projects. The indirect assessment is based on feedback collected from students. Since students' feedback can be subjective and therefore cannot be considered a valid assessment method, we used only the direct assessment in our proposal. In fact, we used only valid assessment methods (final exams, midterm exams, assignments, projects, and lab exams) to predict the suitable actions to improve each course. This dataset is considered the first dataset that has been collected to handle the recommended actions for improving and enhancing courses.
As presented in [26], different factors can affect the integrity of E-learning in Saudi Arabian universities, such as obtaining solutions from other students during the examination. Such factors change the success rate and, as a result, affect the recommendations for improving teaching strategies.
Another case study on the effect of E-learning in higher education was presented in [27]. The authors of that research focused on student satisfaction, self-efficacy, and the content of E-learning. These factors can also affect the recommended actions that must be taken to improve the course contents and teaching strategies. In order to improve the performance of the proposed recommender system, the collected dataset must be complete and accurate from both the teaching and the quality aspects. Each course under consideration should be identified based on the following factors:

1) NO OF STUDENTS IN EACH SECTION
The number of students in each section of the course must be identified so that the suitable actions for improving the quality of teaching strategies are properly reflected. The more students there are in each section, the more effectively the course learning outcomes can be evaluated.

2) COURSE CREDIT HOURS
Each course taught is classified based on its teaching mechanism, that is, whether it involves theoretical lectures, exercises, and practical labs. Each course is based on at least two of these mechanisms. Table 2 presents a sample of several courses with different teaching mechanisms.
In Table 2, the teaching mechanism of each course differs from that of the other courses according to the methods of teaching. For example, the course CS 230 has a teaching mechanism of (3, 1, 0). This means 3 credit hours of theoretical lectures and 1 credit hour of exercises, with no practical hours. The course CS 350, "Introduction to Database Systems", is considered a theoretical and practical course, so it has a teaching mechanism of (3, 0, 2), while the course CS 360 depends on teaching with labs more than on theoretical lectures. As a result, it has a teaching mechanism of (1, 0, 2). The course CS 410 is also based on practical work, but the theoretical lectures are more than the practical lectures, with a teaching mechanism of (2, 0, 2). These teaching mechanisms are a main concept in determining the best recommended actions based on the distribution of credit hours.
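The teaching-mechanism triples quoted above can be encoded as model features along the following lines. This is only a minimal sketch: the class name `TeachingMechanism`, the feature names, and the derived `lab_share` ratio are illustrative assumptions and are not taken from the paper; only the four course triples come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TeachingMechanism:
    """Hours triple (theory, exercises, lab) as quoted in the text."""
    theory: int
    exercises: int
    lab: int

    @property
    def has_lab(self) -> bool:
        # A course counts as practical here if it has any lab hours.
        return self.lab > 0

    def as_features(self) -> dict:
        # lab_share is a hypothetical derived feature, not from the paper.
        total = self.theory + self.exercises + self.lab
        lab_share = self.lab / total if total else 0.0
        return {
            "theory_hours": self.theory,
            "exercise_hours": self.exercises,
            "lab_hours": self.lab,
            "lab_share": lab_share,
        }


# Triples quoted in the text for the sample courses.
COURSES = {
    "CS 230": TeachingMechanism(3, 1, 0),
    "CS 350": TeachingMechanism(3, 0, 2),
    "CS 360": TeachingMechanism(1, 0, 2),
    "CS 410": TeachingMechanism(2, 0, 2),
}

if __name__ == "__main__":
    for code, mechanism in COURSES.items():
        print(code, mechanism.as_features())
```

Keeping the three components as separate numeric features, rather than a single category label, lets a classifier use the distribution of hours directly.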

3) NO OF TOPICS IN EACH COURSE
Each course contains a number of topics that differ from those of any other course based on the scientific content. There is a direct relationship between the number of course topics and the learning outcomes. The greater the number of course topics, the more learning outcomes will be applied and the more recommendations will be made to improve the course content. Formula (1) relates the course topics ($C_T$) to the course learning outcomes (CLO) and the recommended actions ($A_R$), where $CLO_i$ is each course learning outcome in the intended course and $CLO_n$ is the overall set of course learning outcomes of the course that is related to the program learning outcomes (PLO). Increasing the sum of all course topics $C_{T_i}$ will lead to more recommended actions $A_R$. For example, the information systems program, provided by the College of Computer and Information Sciences at Jouf University, encompasses 12 PLOs that describe the career of students enrolled in this program. The number of topics differs from one course to another, and consequently so does the number of covered course outcomes; e.g., the "Information System Engineering" course, which delivers more than 12 topics, has 4 CLOs, whereas the "Fundamentals of Information System" course, which delivers only 8 topics, covers 3 CLOs. At the end of each semester, each professor proposes actions to enhance the quality of the course he/she taught based on the results of the CLO assessment process. The number of recommended actions for the "Information System Engineering" course, for instance, will be greater than the number of actions recommended for the "Fundamentals of Information System" course.

4) ASSESSMENT MARKS
Based on the course teaching mechanisms presented in Table 2, different assessment marks are applied to courses depending on whether they are theoretical or practical. The assessment categories are as follows:
• Final Exam: the final exam marks depend on whether the course is theoretical or practical. Theoretical courses have a final exam worth 60 marks out of 100; the remaining 40 marks are distributed among midterms and assignments. In practical courses, the final exam is worth 50 marks out of 100, and the remaining 50 marks are distributed among midterms, lab exams, and assignments.
• Midterm 1 and 2 Exams: the first and second midterms are worth 15 marks each, whether the course is theoretical or practical. This is common to all courses, while the remaining marks differ according to the course type.
• Assignments, Labs, and Projects: in this category, assignments such as reports, surveys, and quizzes apply to theoretical courses, while labs and projects apply to courses that depend on practical applications.
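The mark allocation described in the list above can be checked with a little arithmetic. The sketch below is illustrative only: the function name `marks_breakdown` and the `coursework` key are assumptions, and the coursework figures (10 and 20 marks) are derived here by subtraction rather than stated in the text.

```python
# Figures stated in the text.
TOTAL_MARKS = 100
FINAL_EXAM_MARKS = {"theoretical": 60, "practical": 50}
MIDTERM_MARKS = 15  # each of the two midterms, for both course types


def marks_breakdown(course_type: str) -> dict:
    """Return the mark allocation for a 'theoretical' or 'practical' course.

    'coursework' is whatever remains after the final exam and two midterms:
    assignments for theoretical courses; labs, projects, and assignments
    for practical ones. This remainder is derived, not quoted.
    """
    if course_type not in FINAL_EXAM_MARKS:
        raise ValueError(f"unknown course type: {course_type!r}")
    final = FINAL_EXAM_MARKS[course_type]
    coursework = TOTAL_MARKS - final - 2 * MIDTERM_MARKS
    return {
        "final_exam": final,
        "midterm_1": MIDTERM_MARKS,
        "midterm_2": MIDTERM_MARKS,
        "coursework": coursework,
    }


if __name__ == "__main__":
    for kind in ("theoretical", "practical"):
        print(kind, marks_breakdown(kind))
```

Under these figures, a theoretical course leaves 10 marks for assignments and a practical course leaves 20 marks for labs, projects, and assignments.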

5) GRADE DISTRIBUTION
The grade distribution is based on the total marks of all assessment methods used in the course. As shown in Table 3, the grade distribution can be high, above average, average, below average, and fail. Recommending suitable actions for improving students' learning experiences is based on verifying the student category with a feature binary method. The feature binary method is applied as follows:
• Calculate the student category feature δ by calculating the value of the mean µ + 10 for each feature, as the success rate in the credit hour mechanism is based on an overall success percentage of 60%. Based on the mean value µ + 10, the category feature is applied based on formula (2).
In order to recommend the suitable action for improving the course, the total number of students in each grade distribution is determined.
[6]. As stated by [14], learning outcomes are clearly articulated statements that refer to the specific knowledge, practical skills, areas of professional development, attitudes, or higher-order thinking skills that instructors expect students to develop and learn by the end of their learning. Each course learning outcome $CLO_i$ has different suitable actions that can be applied. The CR calculates the achievement percentage of each learning outcome and, based on this value, a set of course actions is recommended to enhance the quality of teaching strategies.
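The feature binary method described under grade distribution can be sketched as a simple thresholding step. Formula (2) is not reproduced in the text, so the rule used below (δ = 1 when a value is at least µ + 10, otherwise 0) is an assumption made purely for illustration, as are the function name `binarize_feature` and the direction of the comparison.

```python
from statistics import mean


def binarize_feature(values, offset=10.0):
    """Binarize one feature column against the threshold mean + offset.

    ASSUMPTION: the rule delta = 1 if value >= mu + 10 else 0 is a guess
    at the shape of formula (2), which is not shown in the text.
    """
    if not values:
        return []
    threshold = mean(values) + offset
    return [1 if v >= threshold else 0 for v in values]


if __name__ == "__main__":
    # Hypothetical per-course values for one feature.
    column = [10, 20, 30, 60]
    print(binarize_feature(column))
```

With the hypothetical column above, the mean is 30 and the threshold is 40, so only the last value is flagged.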

7) TOTAL NUMBER OF QUESTIONS
In the final and midterm exams, each question should cover a specific learning outcome $CLO_i$ from the overall learning outcomes $CLO_n$ of the course, and each question must also carry a specified mark in the exam.
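Because each exam question is tied to one learning outcome and carries a fixed mark, a per-outcome achievement percentage can be aggregated from question-level scores. The following is a minimal sketch under stated assumptions: the paper does not give its achievement formula, so achievement is taken here as marks obtained divided by marks available for the outcome, and the function name, data layout, and sample numbers are all hypothetical.

```python
from collections import defaultdict


def clo_achievement(questions, student_scores):
    """Achievement percentage per course learning outcome (CLO).

    questions: list of (clo_id, max_mark), one entry per exam question.
    student_scores: one list per student, aligned with `questions`.
    ASSUMPTION: achievement = 100 * obtained marks / available marks.
    """
    available = defaultdict(float)
    obtained = defaultdict(float)
    for scores in student_scores:
        if len(scores) != len(questions):
            raise ValueError("each student needs one score per question")
        for (clo_id, max_mark), score in zip(questions, scores):
            available[clo_id] += max_mark
            obtained[clo_id] += score
    return {clo: 100.0 * obtained[clo] / available[clo] for clo in available}


if __name__ == "__main__":
    # Hypothetical exam: three questions covering two outcomes.
    exam = [("CLO1", 10), ("CLO2", 5), ("CLO1", 5)]
    scores = [[8, 5, 2], [6, 3, 4]]
    print(clo_achievement(exam, scores))
```

A low percentage for one outcome would then point to the questions, and hence the topics, where improvement actions are most needed.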

B. DATA PREPROCESSING
Most of the effort in applied machine learning research goes into preparing and formulating the data in a format suitable for machine learning algorithms. Data preprocessing depends strongly on the type of problem. In text classification, for example, the problem should be formulated to generate features from text, while in image classification it should be formulated to extract the features of interest from images. The rest of the study is based on performing different experiments using different machine learning algorithms and selecting the algorithm best suited to the problem under consideration. The success or failure of these algorithms depends on the nature of the dataset, which differs from one problem to another.
The preprocessing stage in many research papers is based on traditional machine learning algorithms that use single-label data [28]. Different classification methods have been enhanced to handle more than one class label. These methods fall into two main categories, discussed as follows:
• Problem Transformation (PT): the problem is formulated as a multi-label multi-class problem, which is the general case for both multi-label and multi-class. In multi-label, the number of targets is more than one and the target cardinality is two (0 or 1), while in multi-class the number of targets equals one and the target cardinality is more than two (such as categorical). In the case of multi-label multi-class or multi-output multi-class, the number of targets is greater than one and the target cardinality is greater than two. As presented in [29], a multi-label classification was proposed to model dependencies between data labels, where a Monte-Carlo mechanism was applied to increase accuracy. As shown in [30], multi-label classification was used to detect connections between different multi-label methods for predicting sequential data. In this research, the multi-label multi-class data is transformed into smaller single-label classification datasets. During the formulation, binary relevance is applied to convert the multi-label problem into single-label classification data. Each resulting dataset has all instances of the source dataset with a single corresponding label [31], [32]. Binary relevance treats each label separately during the learning mechanism, and a binary classification process is associated with each label [33], [34]. Once the transformation is executed, a binary classification algorithm can be used to predict suitable actions.
• Model Ranking and Selection (MRS): the MRS is based on selecting the best algorithm for the problem under consideration. It ranks the algorithms by their performance and then selects the best-performing ones. The two categories work in opposite directions: in PT, the multi-label dataset is reformulated to fit the proposed algorithms, while in MRS the proposed algorithm is adapted to the multi-label dataset in order to select the best one. In this paper, advanced multi-label classification algorithms are deployed to deal with multi-label datasets.
Let (S, O) denote the input and output spaces, respectively. The main objective of multi-label classification is to find a function $f : S \to 2^{L}$ from the input dataset S to the output space $2^{L}$, such that each instance s ∈ S is mapped to an element of the output space $2^{L}$.
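The binary relevance transformation described above can be sketched in a few lines. This is a minimal illustration in plain Python, not the authors' implementation: the function name `binary_relevance_split`, the toy feature values, and the three-label setup are all assumptions made for the example.

```python
from typing import Dict, List, Sequence, Set, Tuple


def binary_relevance_split(
    features: Sequence[Sequence[float]],
    label_sets: Sequence[Set[int]],
    num_labels: int,
) -> Dict[int, List[Tuple[Sequence[float], int]]]:
    """Build one single-label binary dataset per label.

    Every derived dataset keeps all instances of the source data; only the
    target changes (1 if the instance carries that label, else 0).
    """
    datasets: Dict[int, List[Tuple[Sequence[float], int]]] = {
        label: [] for label in range(num_labels)
    }
    for x, labels in zip(features, label_sets):
        for label in range(num_labels):
            datasets[label].append((x, 1 if label in labels else 0))
    return datasets


# Hypothetical toy data: three courses, two features each, three possible actions.
X = [[0.86, 12.0], [0.55, 30.0], [0.61, 25.0]]
Y = [set(), {0, 2}, {2}]

per_label = binary_relevance_split(X, Y, num_labels=3)
targets_for_label_2 = [target for _, target in per_label[2]]
print(targets_for_label_2)
```

Each of the resulting datasets can then be handed to any ordinary binary classifier, which is why the transformation is independent of the base learner.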

1) DATA FILTERING
The process of data filtering is used to eliminate unnecessary information that does not affect the overall performance of the recommender system.
In all educational systems, students are classified according to whether they have continued in the course or not. As shown in formula (3), the student status is defined such that ST i is a single student status from the overall students ST n. Each student ST i can be classified based on his or her status in the course under consideration as follows:
• ST IN refers to the incomplete student who did not complete the project at the end of the semester.
• ST WD is the withdrawn student who withdrew from the course during the semester.
• ST DN is the denied student who was denied from attending the final examination due to exceeding the specified absence percentage.
• ST R is the regular student who attends most theoretical and practical lectures.
Regarding the student status, the recommender system proposes suitable actions only for regular students ST R who have already finished the course. The data filtering process also removes all irrelevant features, such as 'Courses ID', 'Section', 'Semester', 'Year', and 'Department', that do not affect the prediction of suitable actions.
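The filtering step can be illustrated with a short sketch. The record layout is hypothetical: the `Status` key, its `R`/`WD` codes, and the `Total marks` field are assumptions for the example; only the five dropped column names come from the text.

```python
from typing import Dict, List

# Columns named in the text as irrelevant for predicting actions.
IRRELEVANT_COLUMNS = {"Courses ID", "Section", "Semester", "Year", "Department"}


def filter_records(records: List[Dict], status_key: str = "Status") -> List[Dict]:
    """Keep regular students only and drop the irrelevant columns."""
    kept = []
    for row in records:
        if row.get(status_key) != "R":  # assumed code for a regular student
            continue
        kept.append({k: v for k, v in row.items() if k not in IRRELEVANT_COLUMNS})
    return kept


# Hypothetical records, invented for illustration.
records = [
    {"Courses ID": "C1", "Section": 1, "Semester": 1, "Year": 2019,
     "Department": "CS", "Status": "R", "Total marks": 78},
    {"Courses ID": "C1", "Section": 1, "Semester": 1, "Year": 2019,
     "Department": "CS", "Status": "WD", "Total marks": None},
    {"Courses ID": "C2", "Section": 2, "Semester": 1, "Year": 2019,
     "Department": "IS", "Status": "R", "Total marks": 64},
]

filtered = filter_records(records)
print(filtered)
```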

2) HANDLING MISSING VALUES
For handling missing values, all numeric fields that have null values are converted to 0, while courses that have null actions are assigned −1 in order to differentiate between a null value and a null action.
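The rule can be sketched as follows. This is an illustrative helper under stated assumptions: the `Action` key and the sample row are hypothetical, and every non-action field is assumed to be numeric.

```python
from typing import Dict, Optional


def fill_missing(row: Dict[str, Optional[float]], action_key: str = "Action") -> Dict[str, float]:
    """Replace null numeric fields with 0 and a null action with -1."""
    filled = {}
    for key, value in row.items():
        if value is None:
            # -1 marks "no action"; 0 marks a missing numeric value.
            filled[key] = -1 if key == action_key else 0
        else:
            filled[key] = value
    return filled


# Hypothetical row: one CLO percentage is missing and no action was recorded.
row = {"CLO 1.1": 85.67, "CLO 1.2": None, "Action": None}
clean = fill_missing(row)
print(clean)
```

Using two different sentinels keeps a missing measurement distinguishable from a course for which no action was proposed.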

3) MAPPING ACTIONS TO CLASSES
The recommended actions of the proposed recommender system are categorized into nine actions. As presented in Table 4, the nine actions are mapped to the values 0 to 8, with −1 referring to ''no action''. Each action has a number of samples, with a total of 304 samples. The recommended actions were derived from the course reports prepared by professors at the end of each semester. Based on the results of the COs assessment process, each teacher proposes actions to enhance the quality of the course he/she taught and thereby improve the overall learning experience. After gathering the course reports from several semesters, we studied the proposed actions one by one. Then, we eliminated the actions that cannot be applied (such as ''add more credit hours to the course'') and the redundant ones.
Based on the preprocessing step, the final input data to the recommender system is presented in Table 5. As shown, the class actions are identified for each course learning outcome CLO i, where each class action is used to enhance the quality of teaching strategies based on the percentage achieved in each CLO i. For CLO 1.1, no action is taken because of the high percentage of student achievement, which is 85.67%, so the action value is −1.
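One possible way to turn the action codes into a multi-label target is a 0/1 indicator vector with one position per action. The sketch below is an assumption, not the encoding used in the paper: in particular, representing ''no action'' (−1) as an all-zero vector is a choice made only for this example.

```python
NUM_ACTIONS = 9   # actions are numbered 0..8 in the text
NO_ACTION = -1    # code used in the text for "no action"


def encode_actions(action_ids):
    """Turn a list of action ids into a 9-element 0/1 indicator vector."""
    vector = [0] * NUM_ACTIONS
    for action in action_ids:
        if action == NO_ACTION:
            continue  # assumption: "no action" leaves the vector all zeros
        if not 0 <= action < NUM_ACTIONS:
            raise ValueError(f"unknown action id: {action}")
        vector[action] = 1
    return vector


def decode_actions(vector):
    """Recover the action ids, or [-1] when no action is set."""
    ids = [i for i, flag in enumerate(vector) if flag == 1]
    return ids if ids else [NO_ACTION]


encoded = encode_actions([1, 7])
print(encoded)
print(decode_actions(encode_actions([NO_ACTION])))
```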

C. PROBLEM TRANSFORMATION
This section briefly describes four problem transformation methods and one adaptive method that are applied with five base classification algorithms. Multi-label learning is a supervised learning process in which the training dataset is associated with multi-label multi-class binary classification [35], while traditional methods learn from single-label data. As shown in Figure 1, the overall framework is decomposed into training and testing mechanisms. The training dataset is used for learning by fitting the preprocessed data to the specified problem transformation methods. The testing dataset is independent of the training dataset; any feature fitting and selection learned on the training dataset is applied to the features of the testing dataset as well.
As depicted in Figure 1, multiple problem transformations are examined with different machine learning algorithms on the training dataset, and the trained models are scored on the test dataset. The MRS selects the best-performing model to be deployed as the final model for predictions. In problem transformation, the multi-label data is transformed into multiple binary classification datasets. As presented in the prediction framework, four problem transformation methods are applied: OvA, BR, LP, and CC. The fifth method is the adaptive classifier, which uses the ML-KNN classifier. OvA is a heuristic method for using binary classification algorithms [36]. It decomposes the multi-label data into binary classification datasets, and a binary classifier is applied to each of them to improve predictions. In CC, the first classifier is trained on the input data, and each subsequent classifier is trained on the input combined with the outputs of all previous classifiers in the chain.
BR is a straightforward method for handling multi-label classification. Its main strategy is to reduce a multi-label problem with m labels into m independent binary classification problems [37], [38]. BR is not tied to a specific learning mechanism. Instead, it can be applied with several binary classification algorithms and can be adapted to learn from multi-label multi-class datasets with missing labels [39]. LP is another problem transformation method, in which each label combination is used as a class in a multi-class dataset [40]. The multi-label problem is converted to a multi-class problem with one classifier trained on all label combinations in the training dataset.
CC is considered one of the most effective binary classification methods for handling multi-label multi-class problems. In most multi-label classification methods, each label is associated with only a small number of training examples, which is a challenge because of the resulting label imbalance [41]. CC can deal with negative and positive labels even when few examples are available per label.
Adaptive algorithms adapt the specified algorithm to multi-label datasets. ML-KNN is based on the traditional KNN and uses maximum a posteriori estimation to identify the labels of each instance in the dataset under consideration. The advantage of ML-KNN is that it can use smaller training datasets with better efficiency and high performance [42].
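To make the LP transformation concrete, the following sketch maps each distinct label combination in a training set to one multi-class id and back. It is an illustrative outline only: the function names and the toy label sets are assumptions, and the multi-class classifier that would be trained on the resulting ids is omitted.

```python
def label_powerset_fit(label_sets):
    """Assign one class id to each distinct label combination seen in training."""
    combo_to_class = {}
    classes = []
    for labels in label_sets:
        combo = frozenset(labels)
        if combo not in combo_to_class:
            combo_to_class[combo] = len(combo_to_class)
        classes.append(combo_to_class[combo])
    return combo_to_class, classes


def label_powerset_decode(class_id, combo_to_class):
    """Map a predicted class id back to its set of labels."""
    for combo, cid in combo_to_class.items():
        if cid == class_id:
            return set(combo)
    raise KeyError(class_id)


# Hypothetical label sets for five courses (action ids in 0..8).
train_labels = [{1, 7}, {5}, {1, 7}, set(), {3, 5}]
mapping, y_multiclass = label_powerset_fit(train_labels)
print(y_multiclass)
print(label_powerset_decode(3, mapping))
```

A consequence of this design is that a model trained this way can only predict label combinations that occurred in the training data.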

IV. EXPERIMENTAL RESULTS
The experiments were performed using a data split with 70% of the data for training and 30% for testing. The results were obtained by testing different problem transformation and adaptive techniques, namely OvA, BR, LP, CC, and adaptive ML-KNN, with different classification algorithms: SVC [43], [44], LR [45], RF [46], Gaussian NB [47], and DT [48].
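A 70/30 split of this kind can be sketched with the standard library alone. The shuffling, the seed value, and the ten-element toy dataset are assumptions for illustration; the text does not state how the split was drawn.

```python
import random


def train_test_split_70_30(samples, seed=0):
    """Shuffle the samples reproducibly and split them 70% / 30%."""
    rng = random.Random(seed)  # the seed is an arbitrary, assumed value
    indices = list(range(len(samples)))
    rng.shuffle(indices)
    cut = int(round(0.7 * len(samples)))
    train = [samples[i] for i in indices[:cut]]
    test = [samples[i] for i in indices[cut:]]
    return train, test


samples = list(range(10))  # hypothetical dataset of ten instances
train, test = train_test_split_70_30(samples)
print(len(train), len(test))
```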
To measure all the metrics of the experimental results, we must include all classes of the datasets based on the following notation:
• S is the input space dataset.
• L is the number of labels in the label set.
• C is the classifier used in the prediction process.
• L P is the set of labels predicted by the classifier C.
• L I is the set of true labels associated with the predicted labels L P.
• Δ is the symmetric difference between two sets, that is, the elements that exist in either of the sets but not both.
To measure the precision, the number of TP i predictions is divided by the total number of TP i and FP i predictions, as shown in formula (4):
$$P_i = \frac{TP_i}{TP_i + FP_i} \qquad (4)$$
The recall is the ratio of the TP i predictions to the total number of TP i and FN i predictions, as shown in formula (5):
$$R_i = \frac{TP_i}{TP_i + FN_i} \qquad (5)$$
The F1-measure is the harmonic mean of precision and recall, as shown in formula (6):
$$F1_i = \frac{2 \, P_i \, R_i}{P_i + R_i} \qquad (6)$$
The input space dataset S is used with the four problem transformation classifiers, OvA, BR, LP, and CC, and also with the adaptive ML-KNN algorithm. Each classifier is applied with five classification algorithms: SVC, LR, RF, Gaussian NB, and DT. The best three classifiers for each method are explained in the following sections, and the overall experimental results are reported for the top 10 classifiers.
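The per-action metrics of formulas (4) to (6) can be computed from the TP, FP, and FN counts as in the sketch below. The counts are hypothetical; they were chosen only because they round to a 100%, 67%, 80% pattern, and they are not taken from the paper's data.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) for one label from its confusion counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts for one action: 2 true positives, 0 false positives,
# 1 false negative.
p, r, f1 = precision_recall_f1(tp=2, fp=0, fn=1)
print(round(p, 2), round(r, 2), round(f1, 2))
```

The zero-denominator guards are a design choice for the sketch: an action that is never predicted or never present yields 0.0 rather than an error.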

A. OVA CLASSIFIER
The OvA classifier is applied with different classification algorithms. When OvA is used with SVC, the precision, recall, and F1-measure achieve good results on all actions of teaching strategies, especially on action 7, which is ''Using blended learning'', and action 1, which is ''Add further case studies''. As shown in Figure 2, the precision, recall, and F1-measure of action 7 reached 100%, 67%, and 80%, and those of action 1 reached 92%, 85%, and 88%, respectively.
As shown in Figure 3, the OvA is used with the LR algorithm. The precision of both action 2 and 7 achieved 100%. The recall rate of action 5 and action 1 achieved 92% and 85% respectively while F1-measure for both action 1 and 5 achieved 88% and 87% respectively.
The OvA is applied with DT classification algorithm as shown in Figure 4. The precision, recall, and F1-measure achieved high results on most actions. The precision of action 8 which is ''Recall basic concepts used in the course'' achieved 100%. The recall of actions 3 and 5 achieved 82% and 81% respectively while the F1-measure of actions 8 and 1 achieved 88% and 80% respectively.

B. BR CLASSIFIER
When the BR method is used on multi-class multi-label datasets with SVC, it achieves the same results as the OvA method with SVC, where actions 7 and 1 have the best results on precision, recall, and F1-measure. As shown in Figure 5, the recall of actions 3 and 5 is better than the recall of action 7.
As shown in Figure 6, using BR with LR, the precision on actions 7 and 1 achieved high results with 100% and 92% respectively. Action 1 achieved 92% on both recall and F1-measure.
The BR is applied with the decision tree classification algorithm as shown in Figure 7. The precision, recall, and F1-measure achieved high results on most actions. The precision on actions 2, 7, and 8 reached 100%, while the recall of action 5 achieved 81%. The F1-measure of action 8 reached 88%, while both actions 1 and 3 achieved 80%.

C. LP CLASSIFIER
Applying the LP classifier on multi-class multi-label datasets with SVC achieved high results when compared with the OvA and BR classifiers. As shown in Figure 8, the precision on actions 2, 4, 6, and 7 achieved 100%, while the recall rate achieved 96% and 91% on actions 5 and 3 respectively. The F1-measure obtained 88% and 86% on actions 5 and 0 respectively.
As presented in Figure 9, when the Gaussian NB classification algorithm was applied, high results were obtained when compared to Gaussian NB using the OvA and BR classifiers. The precision of actions 2, 6, 7, and 8 achieved 100%, while the recall of action 5 achieved 92%. The F1-measure achieved good results on action 5 with a rate of 81%.
The LP is applied with the DT classification algorithm as shown in Figure 10. The precision on actions 1 and 7 achieved 100%, while the recall of action 5 achieved 96%. The F1-measure of actions 1 and 5 obtained 92% and 89% respectively.

D. CC CLASSIFIER
As shown in Figure 11, when the CC is applied on the same dataset, the results with SVC were somewhat similar to those of the OvA, in that actions 7 and 1 achieved precision of 100% and 92% respectively, in addition to enhancements of precision on actions 5 and 6 with 83%. The recall and F1-measure of action 1 achieved 85% and 88% respectively.
As shown in Figure 12, using CC with LR, the precision on actions 7 and 2 achieved high results with 100%. Action 5 obtained 92% on recall, while action 1 obtained 88% on F1-measure.
The CC is applied with DT classification algorithm as shown in Figure 13. The precision on actions 7 and 8 achieved 100% while the recall of action 5 achieved 85%. The F1-measure of both actions 1 and 3 obtained 80%.

E. ML-KNN ADAPTIVE CLASSIFIER
As explained before, adaptive methods are considered the inverse of problem transformation methods, as the algorithm is changed to fit the multi-class multi-label datasets. The ML-KNN algorithm is one of the best adapted algorithms [26]. By applying ML-KNN, high results are obtained on precision, recall, and F1-measure. As shown in Figure 14, actions 2, 7, and 8 obtained 100% on precision, while action 5 achieved 96% on recall. The F1-measure achieved high results on actions 1 and 3 with 92% and 91% respectively.
Based on the previous performance metrics, the random forest with n_estimators = 50, 100, 150, and 200 obtained poor results in precision, recall, F1-measure, and hamming loss. These configurations are therefore not included in the previous figures and appear only in Table 6, which reports the overall experimental results.

F. PERFORMANCE AND HAMMING LOSS
Based on the previously defined metrics, the overall precision, recall, and F1-measure for all classifiers are measured. In addition, the hamming loss, macro average, and micro average are identified to verify the recommender system performance. The hamming loss is also measured to determine the relevant labels that are missed and the irrelevant labels that are predicted.
The macro average is measured as the average precision, recall, and F1-measure over all actions:
$$\text{Macro}_{avg} = \frac{1}{L} \sum_{i=1}^{L} K_i$$
where K i is the calculated precision, recall, or F1-measure of action i and L is the total number of labels. The micro average is measured by summing all individual true positives, false positives, and false negatives over all classes and computing the metric from the aggregated counts, as shown in formula (9):
$$P_{micro} = \frac{\sum_{i=1}^{L} TP_i}{\sum_{i=1}^{L} (TP_i + FP_i)}, \qquad R_{micro} = \frac{\sum_{i=1}^{L} TP_i}{\sum_{i=1}^{L} (TP_i + FN_i)} \qquad (9)$$
As shown in Table 6, the overall performance of the proposed recommender system is presented.
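The difference between the two averaging schemes shows up clearly on unbalanced labels. The sketch below uses hypothetical (TP, FP, FN) counts for two labels; the function names are assumptions for the example.

```python
def macro_average(per_label_scores):
    """Unweighted mean of a per-label metric (precision, recall, or F1)."""
    return sum(per_label_scores) / len(per_label_scores)


def micro_precision_recall(counts):
    """Pool TP, FP, FN over all labels, then compute precision and recall.

    counts: list of (tp, fp, fn) tuples, one per label.
    """
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts: a frequent label and a rare one.
counts = [(8, 2, 0), (1, 0, 3)]
per_label_precision = [tp / (tp + fp) for tp, fp, _ in counts]

macro_p = macro_average(per_label_precision)
micro_p, micro_r = micro_precision_recall(counts)
print(macro_p, micro_p, micro_r)
```

The macro average gives every action the same weight, whereas the micro average is dominated by the actions with the most samples.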
Based on the performance metrics presented in Table 6, the hamming loss is measured by counting the relevant labels that are not predicted and the irrelevant labels that are predicted. The lower the value of the hamming loss, the higher the efficiency of the predicted labels.
The adaptive ML-KNN achieved the best hamming loss with 0.099715 ≈ 9.97%, while the LP with SVC achieved 0.102564 ≈ 10.256%. LP-SVC and ML-KNN achieved an F1-measure of 0.754202 and 0.753489 respectively.
Hamming loss and F1-measure are the most important metrics in our proposed recommender system. Hamming loss calculates the fraction of labels that are incorrectly predicted. In a multi-label problem, the loss value therefore does not reflect whether the full relevant set of labels is matched, as the hamming loss evaluates the labels individually.
As a result, in our proposed recommender system, it is more practical to predict one or more relevant actions for the course than to require the full set of relevant actions. As the predictions in this research mainly focus on predicting one or more actions for improving courses, the predictions of the proposed recommender system succeed when they recommend full or partial actions. For example, when there are three actual actions and the recommender system correctly predicts one or two of the actions recommended in the course report, it is considered a successful prediction case. This is why hamming loss is the best measure of success for this problem, because it gives credit for partially correct predictions.
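The partial-prediction case described above can be checked numerically. In this sketch the label sets are hypothetical and the nine-label setup follows the action numbering 0 to 8; the function name is an assumption.

```python
def hamming_loss(true_sets, pred_sets, num_labels):
    """Fraction of (instance, label) pairs that are predicted incorrectly."""
    wrong = 0
    for true, pred in zip(true_sets, pred_sets):
        # Symmetric difference: labels present in exactly one of the two sets.
        wrong += len(set(true) ^ set(pred))
    return wrong / (len(true_sets) * num_labels)


# One course with three actual actions, two of which are predicted.
loss = hamming_loss([{1, 3, 5}], [{1, 3}], num_labels=9)
print(round(loss, 3))
```

An exact-match criterion would count this prediction as a complete failure, while the hamming loss penalizes only the single missed action out of nine labels.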
The F1-measure is the harmonic mean of precision and recall. It is therefore more suitable than accuracy when the class distribution of the dataset is unbalanced, as in the case under consideration.
Regarding the precision of the predicted labels, the LP with SVC achieved ≈ 89.6% while ML-KNN achieved ≈ 86.9%. The LP with Gaussian NB classifier achieved ≈ 85.59%.
As shown in Table 6, the best classifier results with different performance metrics are presented.
The machine learning algorithms use the default parameters for Gaussian NB and the DT classifier, while RF is used with different numbers of estimators from 50 to 200, SVC is used with a 'linear' kernel, and LR uses Stochastic Average Gradient (sag) as the solver.

G. FEATURES IMPORTANCE
An analysis of feature importance for each recommended action is presented. These important features can be used by domain experts to understand which features matter without relying on the complexity of machine learning or problem transformation methods. Some machine learning algorithms, such as LR, SVC, and RF, support extracting these important features.
Table 7 explores the best classifier results based on the parameters: hamming loss, F1-measure, precision, and recall. The adaptive ML-KNN classifier recorded the best hamming loss with 0.099715 ≈ 9.97%. The LP-SVC recorded a hamming loss of 0.1026 ≈ 10.26%, while LP-DT recorded a hamming loss of 0.1168 ≈ 11.68%. As shown in Table 7, the hamming loss increases until it reaches 0.1681 ≈ 16.81% with CC-SVC.
The LP-SVC recorded the best F1-measure with 0.8033 ≈ 80.33%, while the adaptive ML-KNN classifier achieved a close result with 0.7956 ≈ 79.56%. The remaining classifiers showed relatively low F1-measure, from LP-DT to CC-SVC. The highest precision and recall are recorded using LP-SVC with 89.56% and 75.42% respectively, while the adaptive ML-KNN recorded 86.89% and 75.33% respectively. Based on the results presented in Table 6, the best results on all parameters are achieved by the adaptive ML-KNN and LP-SVC.
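The ranking step behind this comparison amounts to sorting the candidate models by a chosen metric. The sketch below ranks only the four hamming loss values quoted in the text; the dictionary name is an assumption, and the other classifiers of Table 7 are left out because their values are not quoted here.

```python
# Hamming loss values quoted in the text (lower is better).
reported_hamming_loss = {
    "LP-DT": 0.1168,
    "CC-SVC": 0.1681,
    "ML-KNN": 0.0997,
    "LP-SVC": 0.1026,
}

# Sort model names by ascending hamming loss.
ranking = sorted(reported_hamming_loss, key=reported_hamming_loss.get)
best = ranking[0]
print(ranking)
```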
To extract the feature importance in this research, we selected the LP-SVC algorithm, which is ranked as the second best algorithm in the experimental results as shown in Table 7. The result of a linear SVC is a hyperplane that separates the action classes as well as possible. The coefficients of the algorithm represent this hyperplane by giving the coordinates of a vector that is orthogonal to the separating hyperplane. The absolute value of each SVC coefficient, relative to the others, therefore indicates how important the corresponding feature is for the separation. Tables 8, 9, and 10 present a sample of the most important features for each recommended action.
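Ranking features by the magnitude of their linear coefficients can be sketched as follows. The feature names and coefficient values are invented for illustration and are not the values behind Tables 8 to 10; in practice the coefficients would come from the fitted linear model.

```python
def rank_features(feature_names, coefficients, top_k=3):
    """Order features by the absolute value of their linear coefficient."""
    pairs = sorted(
        zip(feature_names, coefficients),
        key=lambda pair: abs(pair[1]),
        reverse=True,
    )
    return pairs[:top_k]


# Hypothetical feature names and coefficients for one action.
names = ["number of topics", "has project", "F grade count", "CLO 1.4 achievement"]
coefs = [0.9, 0.4, -1.2, -0.1]

top = rank_features(names, coefs, top_k=2)
print(top)
```

The sign of a coefficient is kept in the output because it shows whether the feature pushes toward or away from the action, while the magnitude determines the ranking.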

V. ANALYSIS AND DISCUSSION
In this research, we have demonstrated that course specifications, academic records, and course learning outcomes' assessments can be used for predicting recommended actions to improve students' learning experiences. In this problem, the data was formulated in different ways: OvA, BR, CC, LP, and adaptive ML-KNN, with different machine learning algorithms.
As a result, we concluded that two approaches succeeded in modeling the current problem, namely the LP-SVC and adaptive ML-KNN algorithms, which give the best F1-measure and hamming loss, respectively, as shown in Table 7. The reason LP-SVC achieves the best F1-measure is that it transforms the multi-label data into unique label combinations and treats the problem as a multi-class classification problem. As a result, the best F1-measure is achieved, while the other formulations did not consider the combinations of multiple labels. On the other hand, the success of adaptive ML-KNN in achieving the best hamming loss is due to adapting the algorithm to perform multi-label classification directly, rather than transforming the problem.
This answers the first research question regarding the best method for the multi-label classification problem. Regarding the second research question, identifying the best algorithm for predicting the most accurate course improvement actions, Table 7 shows that the adaptive ML-KNN and LP-SVC have the lowest hamming loss with 0.0997 and 0.1026 respectively, and the best F1-measure with 0.7956 and 0.8033 respectively.
Another interesting conclusion presented in this research is the relationship discovered between each action and the important features that affect these actions either positively or negatively.
As shown in Table 8, when a course has ''more topics'' and has a ''project'' as one of its assessment methods, this leads to the action ''Add further case studies''.
This makes sense since more topics means more course complexity that needs more case studies.
In Table 9, a larger number of students with an ''F grade'' and low achievement of ''CLO 1.4'' (i.e. learning knowledge) lead to the action ''Improve the course content'', which is also a logical recommendation, since more student failures and lower achievement of knowledge mean that there is a crucial need to improve the course content to overcome these defects. This answers the third research question of which features are the most relevant predictors for course improvement actions.
Finally, as shown in Table 10, more students with an A+ grade and fewer students with an F grade lead to the action ''No action required'', as our goal is to achieve the best student grades and fewer student failures.
We acknowledge two important limitations of this research. First, the dataset collected from course reports is small, since, to the best of our knowledge, this is the only data that could be acquired at this time. Second, the data was collected only from the College of Computer and Information Sciences, Jouf University, Kingdom of Saudi Arabia (KSA). Directions for future research are, thus, based on the identified limitations. Further data can be collected from other colleges and universities in future work.

VI. CONCLUSION AND FUTURE DIRECTIONS
This paper proposed a recommender system for predicting the suitable actions that can be proposed by faculty members to enhance the quality of the courses they teach and therefore the overall educational program. The recommended actions are based on course specifications, academic records, and course learning outcomes' assessments. In this work, five multi-label learning methods are used for predicting suitable actions. Four of them, OvA, BR, LP, and CC, are problem transformation methods, while the remaining method is the adaptive ML-KNN. The five methods are applied with different classifiers: SVC, LR, RF, Gaussian NB, and DT. The performance was measured with the hamming loss, F1-measure, precision, and recall for all classifiers. The experimental results showed that the adaptive ML-KNN obtained the lowest hamming loss, while LP-SVC obtained the best F1-measure. Further investigation is required, and more work is needed to enhance teaching strategies and maintain academic integrity, especially in college courses after the move to online teaching during the COVID-19 epidemic. In addition, the dataset of the recommender system needs to be enlarged to include course reports from different universities in the Kingdom of Saudi Arabia (KSA).
NACIM YANES was born in Gabes, Tunisia, in 1981. He received the master's degree in computer science applied to management from the Higher Institute of Management (ISG), Tunisia, and the Ph.D. degree in computer science from the National School of Computer Science (ENSI), Manouba University, Tunisia. He is currently an Assistant Professor with the Higher Institute of Management (ISGGB), University of Gabes, Tunisia, and also an Assistant Professor with the College of Computer and Information Sciences, Jouf University, Saudi Arabia. His current research interests include software reuse, recommender systems in software engineering, serious games and gamification, and outcome-based education.
AYMAN MOHAMED MOSTAFA (Member, IEEE) received the M.Sc. and Ph.D. degrees in information systems from the Faculty of Computers and Informatics, Zagazig University, Egypt. He is currently an Assistant Professor with the Faculty of Computers and Informatics, Zagazig University, Egypt, and also an Assistant Professor with the College of Computer and Information Sciences, Jouf University, Saudi Arabia. He has published more than 25 scientific articles in various national and international journals and conferences. His current research interests include information security, cloud computing, E-business, E-commerce, big data, and data science. He is also an Oracle Certified Associate, Oracle Certified Professional, and EMC Academic Associate in Cloud Infrastructure and Services.
MOHAMED EZZ (Member, IEEE) received the B.Sc., M.Sc., and Ph.D. degrees in systems & computers engineering from the Faculty of Engineering, Al-Azhar University. He is currently an Associate Professor with the Faculty of Engineering, Al-Azhar University, and also a Visiting Professor with the College of Computer and Information Sciences, Jouf University. He has published 20 scientific articles in various national and international journals and conferences. He has contributed to more than 16 mega software projects in electronic banking EBPP, EMV, mobile banking, and e-commerce, and is CBAP certified. His current research interests include pattern recognition, applied machine learning, application security, intrusion detection, and semantic web.
SALEH NAIF ALMUAYQIL received the Ph.D. degree in computer science from the Faculty of Computing and Digital Technologies, Staffordshire University, U.K. Since 2018, he has been the Chair of the Information Systems Department. He is currently an Assistant Professor with the College of Computer and Information Sciences, Jouf University, Saudi Arabia. He has published scientific articles in various national and international journals and conferences. His current interests include health informatics, knowledge discovery, knowledge management, digital transformation, and data science. VOLUME 8, 2020