Introduction
E-learning, which uses a variety of technologies to help students complete their courses outside traditional classrooms, can be described as an innovative approach that is frequently employed in the education sector. To successfully develop conceptual understanding in students, teachers need to assess their conceptual knowledge and give timely feedback. The usage of traditional formative evaluation techniques in today's classrooms is decreasing [1]. Students' cognitive involvement can be used to examine how effectively the online mode of education is working. Previous research has shown that student engagement significantly affects learning as it relates to academic accomplishment [2]. Additionally, it has been stated that cognitively engaged students can discover new knowledge, which is necessary for pupils to acquire meaningful knowledge. There are numerous techniques to assess a student's performance on an online learning platform and to evaluate their cognitive ability [23].
At several universities, e-learning has enhanced the teaching and learning processes. The type of education provided has evolved as a result, with online learning replacing traditional classroom training. To begin "observing" cognitive engagement, the author observed how students behaved when gathering, interpreting, summarizing, and analyzing information, as well as when weighing and debating various possibilities and making decisions [3]. This was done under the premise that cognitive engagement is not well elaborated in online learning. It involves looking at how questionnaires and online learning platforms are used to assess students' cognitive engagement. The level of students' cognitive involvement is a crucial indicator of how effective online learning is. The cognitive participation of pupils is divided into three categories in this study: high, high-low, and low.
Most educational systems favor an online learning environment for students, and some colleges have been using virtual learning for a long time. As a result, throughout an online lecture, it is important to assess students' performance and gauge their cognitive level [4]. When examining each student's involvement through computer vision, researchers have previously focused either on their conduct (the activities they engage in) or on their emotions. However, because of its increased complexity, cognition is often not addressed. Similarly, cognitive engagement relates to students' mental efforts when they may show a high level of knowledge and improved learning [5]. That work also assessed the cognitive domain in online forums using computerized text analysis.
To measure the amount of students' cognitive involvement in virtual learning, this study instead used the levels of Bloom's Taxonomy. Bloom held that one must first master the first level in order to advance to the second level and beyond. The pyramid's lowest level represents the simplest form of learning, while the top level represents the most challenging and abstract form of cognitive aptitude, as shown in Figure 1.
A quick explanation of each level is given below [29]:
Knowledge: At this level, students must demonstrate retention of previously studied content by recalling facts, conditions, key concepts, and solutions. This level, often referred to as "recalling of learned material," is the easiest; characteristic action verbs, such as define, list, and recall, are used to identify this first level.
Comprehension: Students at this level must classify, contrast, translate, grasp, describe, and state essential ideas to show that they understand facts and concepts.
Application: One needs to apply acquired knowledge in order to fulfill the requirements of this level. By using recently gained knowledge, facts, methods, and policies, one must deal with issues in novel settings. An example on the applied side is applying usability heuristics to mobile applications.
Analysis: Students at this level are expected to segment the material by establishing objectives, justifications, or conclusions and to collect data to back up generalizations, for example by performing a comparative analysis among technologies.
Synthesis: At this stage, each student individually gathers information in a distinctive way by putting the fundamentals together to create a new model or by putting forth alternatives, for example by forming a hypothesis and using logical reasoning to state the answer.
Evaluation: In order to express and defend viewpoints, students must develop judgments of knowledge, the value of ideas, or the quality of work based on set criteria.
The remaining sections of the paper are organized as follows. Section II presents a comprehensive review of previous studies. Section III presents the suggested methodology and explains the architectural concept with respect to relevant aspects in the domain of cognitive assessment. The experiments and the experimental setup used to test the proposed framework are described in Section IV. Results and discussion are presented in Section V. Section VI concludes the paper.
Literature Review
Recent research has focused on measuring students' cognitive abilities in e-learning environments using established methodologies. For a few decades, many researchers have been concentrating on the cognitive domain. Content analysis is mostly used to identify the cognitive level and link it to students' performance on online tests. Through a social knowledge creation coding method, the scripts from discussion forum files and server log files are collected and assessed. The author also emphasized that when texting their views in a discussion forum, students frequently did not engage their thinking abilities [1]. They provided a model with four basic steps: corpus development, feature extraction and selection, classification strategy, and testing. The cognitive levels of the communication are determined using that model [3]. Additionally, a Natural Language Processing (NLP) technique was used to automate text analysis-based assessments of students' conceptual knowledge. Features are extracted from this pattern, and three models are then employed as classifiers to determine mental state: SVM, Linear Discriminant Analysis, and K-Nearest Neighbor. SVM's superior accuracy for this type of detection has also been emphasized [13].
Another study used automatically created questions to test the real-time cognitive characteristics of online learners. To do this, text from video lectures is gathered, questions are created using the model, and responses are then evaluated [6]. Bloom's Taxonomy is a useful tool for categorizing cognitive abilities, and several researchers have worked in this area. One study applied Bloom's Taxonomy to assess students' cognitive abilities in online classes: students completed an objective question paper from a Software Engineering course for this purpose, and their answers were assessed against the six levels of Bloom's Taxonomy [24].
Additionally, it has been noted that this kind of testing is beneficial for identifying students' learning strengths and weaknesses. However, it was discovered that a number of the action verbs in Bloom's Taxonomy overlap at various levels of the hierarchy, creating uncertainty regarding the precise level of cognition required. One study utilized two methods to identify the cognitive complexity of more than 2000 questions [20]: a BERT framework for text classification attained 89% accuracy compared to the LDA model's 81%. Another study found a link between students' academic success in e-learning and their motivation levels; 111 social science students participated in a motivation survey, and the data were analyzed using SPSS software.
The findings showed that there was a positive but tenuous relationship between academic achievement and encouragement level [8]. Another study determined cognitive load (CL) by assessing students' performance, attentiveness, behavioral, and subjective characteristics under distractions, with the goal of measuring cognitive load [9]. In another approach, data is collected from students' programming code, and a method is then used to automatically match each student's competency level with the pertinent cognitive level in the written code; the findings of the model evaluation and the manual assessment are then compared [10]. Researchers have also created a novel method for categorizing students' cognitive domain evaluations of free-body-diagram topics in physics. To do this, the test is created so that the only possible answers are diagrams, and the students' responses are then analyzed in accordance with their RBT-based cognitive levels.
Some researchers proposed a prediction model to investigate students' intellectual participation in distance learning. Students' engagement is tracked through LMS data, and their written remarks during problem-solving exercises are also gathered [11]. Finally, written messages are assessed using a cognitive involvement coding structure, and student involvement is analyzed using sign-in activity and how frequently students access course materials and discussions.
As can be seen in Table 1, the majority of scholars have previously conducted work related to the evaluation of students' cognitive abilities in a distance learning setting; the table summarizes the most relevant articles, which were carefully chosen.
In prior research, conceptual knowledge is often extracted from textual answers using NLP and neural network algorithms. The primary goal of the proposed study is to investigate the link between cognitive engagement and academic success in e-learning and to develop a multi-agent system for a more accurate evaluation of students' cognitive ability [33].
Proposed Methodology
To improve the effectiveness and efficiency of evaluating students' cognitive capacities in an online learning environment, a Multi-Agent System has been developed. An agent-based system is a computational system consisting of multiple autonomous agents that interact and collaborate to achieve a common goal. In the context of cognitive assessment, a Multi-Agent System can provide personalized assessment and feedback, adapt to individual learning styles, and support the overall assessment process. Additionally, each agent can measure students' cognitive performance in an online learning environment via their textual answers, after transforming the text into numerical features.
A. System Architecture
The component diagram of the proposed technique in Figure 2 indicates the primary functionalities of the suggested system. Using Bloom's Taxonomy, the system creates a performance forecast model to categorize the cognitive levels. Questions covering all six levels of Bloom's Taxonomy are fed into the model as input. The model parses each question by tokenizing each word; stop words are then eliminated and lemmatization is applied. Additionally, the model employs stemming techniques to reduce the most important terms in the question to their basic form. Finally, the questions are transformed into a dataset with the significant terms acting as independent variables and the question's Bloom's Taxonomy level acting as the class or dependent variable. The support vector machine classifier, which is extensively used and frequently yields the best prediction accuracy among the classifiers available in the literature, is utilized in the model, as shown in Figure 2.
The methodology consists of various stages: design of a sample question paper, data collection, data pre-processing, a Bloom's Taxonomy level prediction model, design of agents for students' textual response analysis, and testing and evaluation of the proposed system. In order to train the SVM classifier, sample software engineering course questions are gathered from previous research work and via web scraping from various online websites such as GeeksforGeeks, Guru99, TutorialsPoint, and IndiaBix. After that, question preprocessing is done using the Natural Language Toolkit library in Python, and a numeric encoding is assigned for each Bloom's Taxonomy level. Each question is converted into feature vectors using TF-IDF, and a Support Vector Machine (SVM) is trained on this dataset to classify questions into cognitive levels. Once the SVM classifier is ready, the multi-agent system is designed using Python to train each agent for textual response analysis. The dataset of textual answers is collected using the Beautiful Soup library, and text features are then extracted using TF-IDF and Word2Vec to train an NLP model, such as a Random Forest classifier, for each agent. Once the system is ready for use, the dataset of collected students' responses is provided to scale the cognitive level into Good, Bad, and Average for the evaluation of students' performance at each level. A minimal sketch of the scraping step is given below.
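The sketch below illustrates how question text could be collected with Requests and Beautiful Soup; the URL and the assumption that questions sit in list items are hypothetical placeholders, since the exact pages and markup are not specified in this paper.

import requests
from bs4 import BeautifulSoup

def scrape_questions(url):
    """Fetch a page and extract candidate question texts (hypothetical markup)."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    # Assumption: each question is contained in an <li> element.
    return [li.get_text(strip=True) for li in soup.find_all("li")]

questions = scrape_questions("https://example.com/software-engineering-questions")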
B. System Modules
A detailed explanation of each module follows.
1) Data Preprocessing
This step involves the use of NLP approaches for pre-processing raw textual data. Exam questions and students' textual answers are pre-processed to convert them into the form of a dataset. Stop words, slang words, misspelled words, and similar noise are common in textual answers, and these extraneous characteristics can have a negative impact on the performance of ML-based algorithms. Therefore, it is necessary to perform text pre-processing first using NLTK, as shown in Figure 3. A minimal sketch of this pipeline is given below.
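A minimal sketch of this pre-processing step, assuming NLTK's standard tokenizer, stop-word list, WordNet lemmatizer, and Porter stemmer (the paper does not name the specific components), is given below; the sample sentence is illustrative only.

import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer
from nltk.tokenize import word_tokenize

# Download the required resources once (tokenizer, stop words, WordNet).
nltk.download("punkt", quiet=True)
nltk.download("stopwords", quiet=True)
nltk.download("wordnet", quiet=True)

stop_words = set(stopwords.words("english"))
lemmatizer = WordNetLemmatizer()
stemmer = PorterStemmer()

def preprocess(text):
    """Lower-case, tokenize, drop stop words and non-alphabetic tokens,
    then lemmatize and stem the remaining terms."""
    tokens = word_tokenize(text.lower())
    tokens = [t for t in tokens if t.isalpha() and t not in stop_words]
    return [stemmer.stem(lemmatizer.lemmatize(t)) for t in tokens]

print(preprocess("Explain the main phases of the software development life cycle."))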
2) Features Extraction
To transform textual information into numerical vectors usable by machine learning models, text feature extraction is a popular approach in NLP. TF-IDF and Word2Vec are two well-known methods for extracting text features.
Term Frequency-Inverse Document Frequency (TF-IDF) is a numerical statistic that measures a word's significance within a collection of documents. In NLP, TF-IDF is a popular method for determining relationships between words.
In TF-IDF, words that are frequent within a given answer but uncommon across the collection of answers are assumed to carry more informational value. This method assigns each word a score based on how often it appears in the text. The formula is given below in Eq. 1:\begin{equation*} W(d, t) = \mathrm{TF}(d, t) \cdot \log\left(\frac{N}{\mathrm{df}(t)}\right) \tag{1}\end{equation*}
As the name implies, the TF-IDF vector scores a word by multiplying the word's Term Frequency (TF) by its Inverse Document Frequency (IDF). TF is the number of occurrences of a term in a document, relative to the total number of words in that document.
Thus, features can be extracted from text data using TF-IDF, as shown in Figure 4, and then utilized in ML models for tasks like classification and clustering. A feature matrix containing the TF-IDF score for each term in a text can be fed to a machine learning algorithm, as in the sketch below.
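The sketch below shows TF-IDF feature extraction with scikit-learn's TfidfVectorizer; the two sample answers are illustrative and not drawn from the actual dataset.

from sklearn.feature_extraction.text import TfidfVectorizer

answers = [
    "software engineering applies engineering principles to software",
    "requirements analysis identifies what the system must do",
]

vectorizer = TfidfVectorizer()
features = vectorizer.fit_transform(answers)  # sparse matrix, one row per answer
print(vectorizer.get_feature_names_out())
print(features.toarray())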
Words are often represented as vectors using the "Word to Vector" (Word2Vec) method, which is normally used to capture word semantics. Word2Vec uses a neural-network-based architecture to learn from a large body of textual answers. The neural network is fed with text to make probabilistic predictions about individual words based on their surrounding context. This technique represents each term as a vector, and the vectors of words with equivalent meanings lie close together. For example, the vectors for "software" and "engineering" will be near each other, but the vectors for "software" and "water" will be far apart.
Both the Continuous Bag-of-Words (CBOW) and the Skip-Gram models can be used to train the Word2Vec model, as shown in Figure 5. A CBOW architecture predicts the current word from its surrounding words, while the Skip-Gram model predicts the neighboring words from the provided current word. In our developed system, we used the CBOW model to implement the Word2Vec technique, which can be combined with the features extracted by the previous method, TF-IDF, to get more accurate results; a minimal sketch follows.
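A minimal sketch of training a CBOW Word2Vec model with Gensim is given below (sg=0 selects CBOW); the toy corpus and hyperparameter values are illustrative assumptions, as the paper does not report them.

from gensim.models import Word2Vec

sentences = [
    ["software", "engineering", "requires", "testing"],
    ["unit", "testing", "verifies", "software", "modules"],
]

# sg=0 selects the CBOW training algorithm, as used in our system.
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=0)

vector = model.wv["software"]  # 100-dimensional embedding for one term
print(model.wv.similarity("software", "testing"))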
3) Bloom’s Taxonomy Level Prediction Model
After text pre-processing and feature extraction, the next module is the Bloom's Taxonomy level prediction model. This step classifies questions on the basis of cognitive levels so that the relevant agent can be called for textual response analysis in the next module.
For the prediction of a question's level as per Bloom's Taxonomy, an SVM classifier is used, which takes the numeric features of questions as input and produces output in the form of a numeric encoding of the cognitive levels. Each level is assigned a numeric encoding in the sequence 1 to 6.
To classify data, SVM seeks a hyperplane in a high-dimensional feature space that best divides the data into distinct categories; the hyperplane forms a decision boundary between sets of data. During training, the SVM algorithm finds the optimal combination of support vectors (the data points nearest to the decision boundary) so as to maximize the separation between the Bloom's Taxonomy level classes. In this case, the radial basis function has been used as the kernel, as in the sketch below.
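The sketch below trains an RBF-kernel SVM on TF-IDF question features with numeric level labels (1-6), as described above; the two training questions and their labels are illustrative placeholders.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC

train_questions = ["define software process model", "compare agile and waterfall"]
y = [1, 4]  # 1 = Knowledge, 4 = Analysis (placeholder labels)

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(train_questions)

clf = SVC(kernel="rbf")  # radial basis function kernel
clf.fit(X, y)
print(clf.predict(vectorizer.transform(["explain coupling and cohesion"])))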
4) Students’ Cognitive Assessment Model
Once the question level is identified by the previous module, the next step is to call the relevant agent for text analysis and to assess the student's performance on that question. If a question is classified as Knowledge level, the Knowledge-level agent is called for the analysis of the textual response, and the same applies to all other agents. In total, six agents have been designed and trained using the Random Forest classifier. Each agent receives the textual response features extracted by TF-IDF and Word2Vec as input and produces an outcome on a scale of good, bad, and average to categorize the student's performance; the per-agent outcomes then feed into the student's ultimate classification. A dispatch sketch is given below.
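The sketch below illustrates one possible shape of the agent-dispatch logic; the class and function names are hypothetical, as the paper does not show its implementation.

LEVELS = {1: "Knowledge", 2: "Comprehension", 3: "Application",
          4: "Analysis", 5: "Synthesis", 6: "Evaluation"}

class LevelAgent:
    def __init__(self, level, model):
        self.level = level  # Bloom's Taxonomy level handled by this agent
        self.model = model  # trained Random Forest for this level

    def assess(self, answer_features):
        """Return 'good', 'average', or 'bad' for one textual answer."""
        return self.model.predict(answer_features)[0]

def dispatch(question_level, answer_features, agents):
    """Route the answer to the agent matching the question's predicted level."""
    return agents[question_level].assess(answer_features)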
The following values are used to set the model's parameters: n_estimators = 600, meaning 600 trees are built; max_depth = 4, the maximum depth allowed for each tree; max_features = 3, meaning no more than three features are considered at any given split; bootstrap = True (the default, emphasizing that bootstrapping is central to random forest models); and random_state = 18, a fixed seed. Once the RF model is trained for each specific agent using the textual answers to the Software Engineering course questions, it represents student performance at the different taxonomy levels, as shown in Figure 6.
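Expressed with scikit-learn, the stated configuration looks as follows; X_train and y_train stand in for an agent's answer features and good/average/bad labels, which are not shown here.

from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(
    n_estimators=600,  # 600 trees in the forest
    max_depth=4,       # each tree limited to depth 4
    max_features=3,    # at most three features considered per split
    bootstrap=True,    # bootstrap sampling of training rows (the default)
    random_state=18,   # fixed seed for reproducibility
)
# rf.fit(X_train, y_train)  # X_train / y_train: one agent's training data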
The students' performance is then tested on the responses they provide, in order to evaluate engagement in the online learning platform.
Experimentation
This section describes the setup of our experiments, including the main libraries used and the dataset collection. Figure 7 compares the accuracy of different algorithms: cosine similarity achieves 70%, Naïve Bayes 92%, and SVM 98%.
A. Experimental Setup
Google Colab and the Python programming language were utilized to conduct the experiment. Google Colab is a popular online IDE for deep learning and machine learning. Moreover, the following libraries were used throughout the experiment's execution. The dataset was split 80:20 between training and testing sets.
Text analysis is performed with Python 3 together with several NLP-related libraries: Pandas, Requests, BeautifulSoup, Natural Language Toolkit (NLTK), NumPy, Scikit-learn, Gensim, Matplotlib, and Seaborn. A sketch of the train/test split follows.
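A sketch of the 80:20 split with scikit-learn is shown below; the placeholder feature matrix, labels, and seed are illustrative assumptions.

import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(300, 50)            # placeholder feature matrix (300 answers)
y = np.random.randint(1, 7, size=300)  # placeholder Bloom's level labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=18  # 80:20 split; seed is an assumption
)
print(X_train.shape, X_test.shape)  # (240, 50) (60, 50)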
B. Dataset Collection
The use of Bloom's Taxonomy in teaching has been widely adopted. In this study, an online software engineering course is considered for student evaluation. To meet the learning objectives of the software engineering course and to guarantee that the assessment is completed effectively, assessors must categorize questions according to the various cognitive levels [7]. For this purpose, a Google Form is used to create an online test of 12 subjective questions, each corresponding to one of the six levels of Bloom's Taxonomy, in order to measure the cognitive state of students. The online test is conducted among students enrolled in the fifth semester of the Bachelor of Computer Science Software Engineering course. To motivate students and ensure maximum participation, the test is turned into a graded activity and administered at the conclusion of the online class. The online test has a total of 12 questions, as previously discussed, and the allotted time is 30 minutes. The collected dataset includes 300 students' textual responses and serves as input to the intelligent system.
The criterion established to place students on the Good/Average/Bad scale depends upon their scores at each level: a student scoring above 60% at the Analysis level and above 70% at the Application level is rated Good, while a student scoring below 60% at the Analysis level but above 50% at the Application level is rated Average or below. A hedged sketch of this banding rule follows.
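The sketch below encodes only the two thresholds stated above; the fallback to Bad is an assumption made for completeness, not a rule stated in the text.

def performance_band(analysis_score, application_score):
    """Map per-level percentage scores to a Good/Average/Bad band."""
    if analysis_score > 60 and application_score > 70:
        return "Good"
    if analysis_score < 60 and application_score > 50:
        return "Average"
    return "Bad"  # assumed fallback; not stated explicitly in the text

print(performance_band(72, 81))  # -> Good
print(performance_band(48, 55))  # -> Average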
Results and Discussion
The findings of this study are discussed below. Since the developed system's methodology has two main phases, the results are also divided into two sections, as follows:
A. SVM Classifier Result
The questions in the online test are designed according to the Bloom's Taxonomy levels, and an SVM classifier is used to classify those levels. We chose SVM for classification because, according to the recent literature, it is the most widely used classifier.
Additionally, the SVM classifier achieves the best accuracy for Bloom's Taxonomy level classification with respect to the other machine-learning-based algorithms, as observed in Figure 7.
The confusion matrix provides an additional metric for evaluating the performance of the proposed SVM classifier. It compares the proportion of cases that are properly categorized by the model against the proportion that are incorrectly predicted; the diagonal values represent the number of correctly classified labels that are true in reality. The SVM classifier's confusion matrix is shown in Figure 8. The values in the confusion matrix are scaled up: because the number of actual instances is small, all instances and model outcomes are scaled out of 100.
The performance of the SVM classifier is also measured in terms of accuracy and kappa statistics, as observed in Table 2; the sketch below shows how these metrics can be computed.
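The sketch below shows how the reported metrics can be computed with scikit-learn; the true and predicted labels are placeholders, not the study's actual outputs.

from sklearn.metrics import accuracy_score, cohen_kappa_score, confusion_matrix

y_test = [1, 2, 3, 4, 5, 6, 1, 2]  # placeholder true Bloom's levels
y_pred = [1, 2, 3, 4, 5, 6, 1, 3]  # placeholder predicted Bloom's levels

print(accuracy_score(y_test, y_pred))     # overall accuracy
print(cohen_kappa_score(y_test, y_pred))  # kappa statistic
print(confusion_matrix(y_test, y_pred))   # per-class error breakdown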
B. Random Forest Classifier Result
After question-level prediction, the numeric features of the textual response from TF-IDF and Word2Vec are provided as input to the respective agent, and a machine learning model is then used to produce the outcome as a performance scale. Random Forest is one of the best classifiers for text classification and is used here to evaluate students' performance. Each agent returns assessment results in the form of good, bad, and average across the 300 students, and the results are then combined to find the overall accuracy of the system, as shown in Table 3.
C. Students Performance Evaluation
The analysis of results shows how well students performed at each level of Bloom's Taxonomy, as observed in Figure 9. Considering all levels on a single performance scale, 144 students performed badly across the Bloom's Taxonomy levels, 66 students showed average performance, and 90 students performed well in the online test. The classified performance at each Bloom's Taxonomy level is shown in the panels of Figure 9; at the Knowledge level, student performance is represented at the Good level, as per Figure 9(A).
Figure 9. Students' performance results at each Bloom's Taxonomy level (panels A-E).
The performance at the Application level is rated Good in the cognitive-level testing of students, as shown in Figure 9(B).
The performance at the Analysis level is rated Average in the cognitive-level testing of students, as shown in Figure 9(C).
The students' performance at the difficult Synthesis level is unsatisfactory compared to their performance at the other cognitive levels; this serves as feedback for students to improve and learn so that they can compete at the Synthesis level of the taxonomy as well, as shown in Figure 9(D).
The students' performance at the Evaluation level is mostly unsatisfactory due to the complexity of this cognitive level, as shown in Figure 9(E).
Table 3 reports an aggregate overall accuracy of 91.83% for students' performance at the different levels of Bloom's Taxonomy, together with the precision, recall, and F1 parameters.
To identify the students who performed badly at each Bloom's Taxonomy level, the experiment is repeated in an episodic way. Figure 9 shows a bar-chart representation of students' performance in the software engineering course.
Students' strengths and weaknesses can be seen more easily when their progress at every level of Bloom's Taxonomy is visualized using a bar chart. According to the results, most students did well on the knowledge and application questions, but only a few did well on the synthesis and evaluation questions. This shows that teachers should place a greater emphasis on helping students acquire higher-order thinking abilities.
Conclusion
This research work provides a solution to the problem of evaluating students' performance in an online environment. In this study, the levels of Bloom's Taxonomy are used to measure students' cognitive level, and a multi-agent system is developed to evaluate students' performance, with each agent mapped to one level of Bloom's Taxonomy. The novelty of this study is that it covers questions that are subjective in nature, whereas previous studies were limited to objective-type questions. Different NLP techniques are used to process the data gathered from students enrolled in the Software Engineering course. The processed dataset is then used to predict the Bloom's Taxonomy level of each question using an SVM classifier with 98% accuracy. After identification of the level, the corresponding level agent is called to analyze the student's textual answer. For this purpose, the Random Forest algorithm classifies students' performance on a scale of bad, average, and good, with 92% accuracy. The results showed that the proposed methodology achieves results that are better than those of existing studies. The system can be enhanced to evaluate students' cognitive skill in other courses or in SE lab assignments. The system can also be modified to provide real-time feedback to students: agents can be trained to compare student responses with the actual answers and point out the differences. In the future, the system can further be enhanced to predict students' mental state by assessing grades across all subjects, not only one.
Declaration
The authors declare no conflict of interest for this research.
ACKNOWLEDGMENT
This research is supported by Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2024R136), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia. The authors are also thankful to the AIDA Lab, Prince Sultan University, Riyadh, Saudi Arabia, for its support.