InPath Forum: A Real-Time Learning Analytics and Performance Ranking Forum System

Numerous studies have examined the influence of peers on students' learning processes and their participation in online forums. However, these studies are limited in terms of system functionality and lack real-time analysis. In this study, we present InPath, a real-time analytics forum system, to rank and provide feedback on students' online forum participation. We used the online discussion forum performance of other students as an effective reference point to inspire forum participation. A set of learning metrics was generated to analyse contributions to online forums. The K-means clustering method was used to classify the students into three broad levels: Hall of Fame, All Star, and Rookies. The results showed that students with higher badge levels were more likely to spend more time on forums. In summary, this study highlights the implications of this state-of-the-art system for learning analytics on online forums, including supporting instructors and students in determining overall and individual performance on the forum.


I. INTRODUCTION
In 2020, the COVID-19 pandemic greatly impacted the educational landscape, affecting 94% of the world's student population in more than 190 countries [1], [2]. According to researchers, the number of students forced to stay home due to educational institution closures reached a high of 1.598 billion across 194 countries on April 1, 2020 [3]. One year into the pandemic, almost 50% of the world's students were still affected by partial or full school closures [1]. Based on research into how the delivery of education was adapted to online learning in response to the pandemic, the authors discussed online discussion forums as asynchronous platforms that improve student-content, student-educator, and student-student interaction [5], [6]. Participation in the online discussion group ensures that students become intimately involved with the course material and understand the meaning of the content through interaction with their peers [7].
The associate editor coordinating the review of this manuscript and approving it for publication was Francisco J. Garcia-Penalvo.
There is a lack of existing studies that investigate student participation in online discussion forums by tracking their performance and providing feedback in real time. Table 1 highlights the gap in existing research with respect to system functionality, such as real-time performance analysis, performance classification using machine learning algorithms, performance ranking, performance feedback mechanisms, and a dashboard for data visualisation. All features listed in Table 1 were incorporated into our proposed system (InPath).
We investigate how students participated in and contributed to the online discussion forum. Our main goal is to develop a real-time system that allows both students and educators to evaluate and reflect on student participation in the online discussion forum. In Section II, we present the proposed learning metrics as a measurement of students' participation level in online forums, based on the literature and the availability of data. Then, we explain the data flow and machine-learning approach used in the proposed system. We used peer success as a reference point for students to compare their participation. We introduce a hierarchical badge system that categorises students into three badge levels by applying machine learning clustering algorithms based on the participation level of each student. We also provide a feedback mechanism for student performance based on the proposed metrics to promote self-efficacy and self-regulated learning skills [8]. As a result, we present a dashboard that displays the visualisation of the analysed data and the ranking based on the students' participation metrics in the online forum, as explained in Section III. In Section IV, we discuss the relationship between the proposed learning metrics and the three badge levels, as well as the current limitations of the InPath system. Finally, in Section V, we conclude our study by summarising the features and implications of the InPath system for students and instructors.
The main contributions of this research are as follows:
• Propose learning metrics to measure and assess student participation in the online discussion forum.
• Apply a machine learning clustering algorithm to categorise different participation levels.
• Introduce a hierarchical badge system based on the level of participation.
• Use the feedback mechanism to improve the student's performance.
• Introduce a real-time automated dashboard that visualises the participation metrics.

A. DATASET
The dataset consists of five months of anonymized data obtained from the Ed Discussion Forum for an Information Technology course at an Australian research university in the 2020/21 academic year during the COVID-19 pandemic. In total, 423 undergraduate students and 16 tutors were registered users of the online forum. After obtaining ethical clearance from the university, we loaded this dataset into an online discussion forum for system testing.

B. LEARNING METRICS
In prior research [4], [9], the authors used users' contributions as metrics to analyse their participation in online discussion forums. Taking these metrics into consideration, we proposed a set of learning metrics based on the objective of this research and the availability of forum data. The proposed learning metrics are shown in Table 2, where the number of words, posts, replies, and posts in each category are considered the students' contributions to the online forum. Meanwhile, the total number of days a student visited the forum and the total number of posts read by the student were used to measure the activity of students participating in the online forum.
Existing research models reported in the literature still have room for improvement. They can be refined using a mechanism to assess the quality of user-generated content [4], [9]. Cheng and Vassileva stated that the quality of user contributions declined as the number of user contributions increased [9]. This could be due to the urge of individuals to increase their contribution scores without considering the quality of content, as quality was not taken into account in the motivation strategy proposed. Therefore, in this study, we decided to use the total number of likes received as a qualitative metric that determines the quality and usefulness of a student's posts or replies.
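The seven metrics above can be thought of as a per-student record. A minimal sketch follows; the field names are our own illustrative assumptions, not identifiers taken from the InPath system:

```python
from dataclasses import dataclass

@dataclass
class ForumMetrics:
    """Seven per-student learning metrics (hypothetical field names)."""
    words: int           # total number of words posted
    posts: int           # total number of posts
    replies: int         # total number of replies
    category_posts: int  # number of posts across the tracked categories
    days_visited: int    # total number of days the student visited the forum
    posts_read: int      # total number of posts read
    likes_received: int  # qualitative metric: total likes received

    def as_vector(self) -> list:
        """Flatten to a feature vector for clustering or ranking."""
        return [self.words, self.posts, self.replies, self.category_posts,
                self.days_visited, self.posts_read, self.likes_received]
```

Representing each student as a fixed-length vector like this is what makes the clustering and ranking steps described later straightforward to apply.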

C. DATA FLOW
In this research, we propose InPath, a real-time automated system built on open-source software that collects student activities on an online student discussion forum and generates instant feedback. The InPath system predicts students' contributions to the forum via machine learning clustering, generates dashboard visualisations that can be viewed as comparisons to peers, and produces auto-generated feedback indicating the gap between the student's current performance (indicated by the badge level) and a higher badge level. Fig. 1 illustrates the standard data flow of the InPath system. First, the database associated with the online discussion forum stores all user information and activity. Any updates to the users are passed through the Kafka Connector and read by Kafka Consumer Group 1; the application then filters out non-student users and extracts only the seven learning metrics specified in the previous section. Next, the Kafka Producer writes each processed record to the Kafka topic belonging to Consumer Group 2. The streaming data is then passed to the Scala-based Spark streaming application, which predicts each student's current badge level using the trained machine learning model. The results are then stored in the database. Finally, the learning analytics dashboard displays each student's performance in comparison with others.
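Omitting the Kafka plumbing, the filter-and-extract step performed by the first consumer group reduces to a simple record transformation. The sketch below uses assumed field names and is not the authors' code:

```python
def extract_student_metrics(record: dict):
    """Drop non-student users and keep only the seven learning metrics.

    `record` mimics a raw forum-database row; the key names are
    assumptions made for illustration.
    """
    if record.get("role") != "student":
        return None  # tutors and staff are filtered out of the stream
    metric_keys = ("words", "posts", "replies", "category_posts",
                   "days_visited", "posts_read", "likes_received")
    # Keep the user ID plus exactly the seven learning metrics,
    # defaulting any missing metric to zero.
    return {"user_id": record["user_id"],
            **{k: record.get(k, 0) for k in metric_keys}}
```

In the real pipeline this function's role is played by the application sitting between Consumer Group 1 and the Kafka Producer; each non-None result would be written to the downstream topic.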

D. MACHINE LEARNING APPROACH
The K-means method is a popular unsupervised clustering algorithm that effectively produces clusters in n-dimensional space [10]. The InPath system employs the same clustering algorithm as [11], a three-level K-means algorithm that performed best at grouping students based on the learning metrics in that research. As demonstrated in the pseudocode in Fig. 2(a), the K-means algorithm takes the number of clusters and the set of data points as input and initialises K centroids randomly. The algorithm proceeds by repeatedly assigning each data point in D to the nearest cluster and recalculating the centroids' positions until there are no further changes in the positions. Our machine learning approach consists of two major steps: (1) training the K-means clustering model and (2) prediction on data points. As shown in the pseudocode in Fig. 2(b), we utilised Apache Spark and implemented a Scala-based Spark streaming application that constructs a Kafka consumer to receive data from the database via a stream. Subsequently, the application initialises the K value to 3 to obtain three ranks and uses the Apache Spark Machine Learning library to load a K-means model. The three-rank mechanism was chosen in our application because it adequately classifies the students based on their participation level on the forum, which can be described as low, moderate, and high. In comparison, a two-rank clustering model performed poorly, as it resulted in many students being flagged as unengaged [10].
Next, the dataset was fitted to a model with three clusters based on the learning metrics we proposed. After transforming the students' attributes from the Kafka stream into vectors, we performed predictions on the vectors one by one. The outcome is a tuple consisting of the student's ID and the predicted cluster ID, which is used to categorise the student under the Hall of Fame, All Star, or Rookies badge. Table 3 shows the three badge levels and the characteristics of the students classified into each level. The outcomes were then updated in the database.
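The K-means procedure described in Fig. 2(a) — random initialisation, assignment of each point to the nearest centroid, and centroid recomputation until the positions stop changing — can be sketched in plain Python. This is an educational sketch only; InPath itself uses the Spark MLlib implementation from a Scala streaming application:

```python
import random

def kmeans(points, k=3, seed=0, max_iter=100):
    """Plain K-means on a list of numeric tuples.

    Returns (centroids, labels), where labels[i] is the cluster index
    assigned to points[i]. Illustrative sketch, not production code.
    """
    rng = random.Random(seed)
    # Initialise the K centroids from randomly chosen data points.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [min(range(k),
                      key=lambda c: sum((pi - ci) ** 2
                                        for pi, ci in zip(p, centroids[c])))
                  for p in points]
        # Update step: recompute each centroid as the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                new_centroids.append([sum(dim) / len(members)
                                      for dim in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep empty cluster fixed
        if new_centroids == centroids:  # converged: no centroid moved
            break
        centroids = new_centroids
    return centroids, labels
```

With k=3, the three resulting cluster IDs play the role of the Rookies, All Star, and Hall of Fame groupings once each cluster is mapped to a badge by inspecting its centroid.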

E. RANKING SYSTEM
To understand and evaluate the performance of students in the online discussion forum under each K-means cluster, we proposed a ranking system on our learning analytics website. Let F = {f1, f2, f3, . . . , f7} be the set of contributions for each student, corresponding to the learning metrics we proposed. The score of each student is calculated as the average of the scores of the seven selected features, as presented in (1):

Score = (1/7) Σ_{i=1}^{7} f_i. (1)

As a result, the top 10 students with the highest scores are displayed on the Learning Analytics website. This ranking system acts as a powerful reference point for all students and motivates user participation, as stated in [12], helping to achieve the objective of creating a better web-based learning environment.
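Under the stated definition — the average of the seven feature scores — the ranking can be computed as follows. This is a sketch under the assumption that each feature has already been normalised to a comparable scale; InPath's exact normalisation is not reproduced here:

```python
def participation_score(features):
    """Average of the seven (already normalised) feature scores, per Eq. (1)."""
    assert len(features) == 7
    return sum(features) / 7

def top10(students):
    """Return the ten highest-scoring (student_id, score) pairs.

    `students` maps a student ID to its 7-element feature list.
    """
    scored = [(sid, participation_score(f)) for sid, f in students.items()]
    return sorted(scored, key=lambda t: t[1], reverse=True)[:10]
```

The returned list corresponds directly to the leaderboard shown on the Learning Analytics website.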
Feedback is a critical element of the learning process because it allows students to define their expectations, track their current learning progress, and compare their success with that of others [13]. Learning analytics-based feedback has been demonstrated to be one of the most influential factors affecting student learning performance, with a significant difference in learning performance between students who received feedback and those who did not [14], [15]. Accordingly, we include two mechanisms in InPath to provide real-time feedback that helps students understand their personal performance in terms of their participation level in the online discussion forum, realise the gap between their current badge level and the next higher badge level (Table 3), and perhaps become motivated to act and improve their learning participation. First, Type 1 Feedback: InPath provides a graphical representation of student-to-student comparisons on the learning metrics, the average Hall of Fame performance, and the highest performance. Second, Type 2 Feedback: InPath emphasises the difference between a student's current personal statistics and a higher badge level. For example, Student A, who holds a Rookies badge, is 10 times behind the All Star average in terms of likes received. From this, Student A may realise that his or her posts or questions are overly long and unproductive in content for peers.
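Type 2 feedback — the gap between a student's current statistics and the next badge level — reduces to a per-metric comparison. A hedged sketch, where the "times behind" ratio mirrors the Student A example above:

```python
def feedback_gap(student_metrics: dict, next_level_avg: dict) -> dict:
    """For each metric, how many times behind the next badge level's
    average the student currently is. Values <= 1 mean the target for
    that metric is already met; a zero current value yields infinity.
    """
    gaps = {}
    for metric, avg in next_level_avg.items():
        current = student_metrics.get(metric, 0)
        gaps[metric] = avg / current if current else float("inf")
    return gaps
```

For instance, a Rookies-level student with 2 likes received, compared against an assumed All Star average of 20, would see a gap of 10 for that metric, matching the "10 times behind" phrasing of the example.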

III. RESULTS
This section presents the visualisations generated by InPath and the results computed based on the methods applied in the implementation stage of our work. An open-source data visualisation tool was used to construct the interactive charts on our learning analytics website.

A. DASHBOARD
The dashboard is the main page of the learning analytics website. It displays a graphical examination of a student's performance compared with others. The dashboard interface is depicted in Fig. 3. At the top of the charts are statistics on the total number of students who have earned each badge level. The dashboard includes six charts that allow comparisons between a student and the data obtained from the performance of other students, corresponding to six of the learning metrics presented in this study. The second section of the dashboard implements the first type of feedback mechanism discussed in Section II. The radar chart shown in Fig. 4 compares the user with the highest statistics and the average statistics of the highest badge level (Hall of Fame).

B. INDIVIDUAL STATISTICS
The Individual Statistics Page, as shown in Fig. 5, is a page where students can view their forum statistics based on their respective participation levels in the forum. The metrics displayed on this page are the number of posts, replies, words posted, reads, days the student visited the forum, and likes received by the student. We also included a summary of feedback based on these metrics with reference to the average statistics calculated from the metrics of the badge, which is  one level higher, as shown in Fig. 6. For students at the highest badge level (Hall of Fame), we will compare them with the highest value obtained for that metric.

C. RANKING SYSTEM
As described in Section II, we present an equation to rank students in each cluster based on their level of participation, in order to leverage the powerful effect of peer pressure and encourage participation. Fig. 7 shows the results of the ranking system on the learning analytics website. This page is designed to function as a leaderboard, allowing tutors and students to discover the top 10 most contributing students on the system, along with their badges. For example, David Hayes had the highest engagement score among the students based on the calculation results. This student is awarded the Hall of Fame badge and is ranked first on the leaderboard.

IV. DISCUSSION
In this study, we present the application of the InPath system, a real-time analytics forum system, to rank and provide feedback on the performance of students' online participation. It is an innovative and revealing way to motivate students to engage in self-reflection, improve student interaction in online discussion forums, and allow instructors to track students' learning behaviour in real time. As noted in Section II (A), the five-month forum dataset that we used as the experimental data for our InPath system was collected during the COVID-19 pandemic. There were 423 students enrolled in this course, and the three K-means clusters computed by InPath are presented in Table 4; the centroids reflect the seven learning metrics. Fig. 8 shows that students who earned a Hall of Fame badge wrote longer posts and received more likes than their peers. This suggests that students in the Hall of Fame were more likely to ask relevant questions and earn better forum interactions than others at a lower badge level. Fig. 9 shows the relationship between the total number of days spent on the online discussion forum and the total number of posts read: students with higher badge levels are more likely to spend more time on online discussion forums and to read more posts. In total, 423 students earned the following badges by the end of the semester: 18 were inducted into the Hall of Fame, 75 were named All Star, and 330 were Rookies; the simplified ratio of the three badges is 6:25:110.
Owing to time constraints, we were unable to collect and analyse data from forums of different sample sizes. Although the InPath results reflect only a single forum dataset, they provide critical insight into a real-time learning analytics system implemented in an online discussion forum environment. Datasets from more comprehensive courses will be used in our future improvements, and we will integrate the InPath system into an ongoing course to track and analyse its influence on students' attitudes towards the online learning environment. Beyond this work, different machine learning approaches, such as classification and quality-based methods, can be used to further improve the performance of our system.

V. CONCLUSION
With the advancement of technology, e-learning has grown in popularity. However, e-learning faces a barrier in allowing instructors to assess student performance and improve the e-learning environment. Therefore, we set a research question for this paper: can we assess student participation in discussion forums in real time? To answer it, this study proposes a learning analytics tool, InPath, to assess student participation in educational online discussion forums. This research was motivated by the need to understand how to analyse an individual's performance in comparison with other students' performance as a reference point, and how an analytics dashboard can assist students in learning their own participation level. As discussed in previous sections, we designed a set of learning metrics and a real-time Spark Streaming K-means system to monitor and rank students based on their participation levels in the discussion forum. As a result of this study, students are categorised into three ranks: Hall of Fame, All Star, and Rookies. Furthermore, this paper presents the implementation of a web-based forum analytics website to illustrate the real-time results generated by the InPath system. Ranking mechanisms and feedback systems are also deployed on the website to encourage learners to become more involved in the forum community and to enhance the quality of the e-learning environment. We believe that InPath will not only motivate students to participate in the online forum but will also support instructors in understanding students' behaviours and improving their teaching methods.

In 2005, she joined Gemplus Technologies Asia Pte. Ltd., Singapore, as a Telecommunication Software Engineer. After her Ph.D. graduation, she started her career as an educator and a researcher. She is currently a Research Track Educator, mainly supervising postgraduate projects at INTI International University, Malaysia. At the same time, she serves as a Freelance Lecturer at Monash University, the Tunku Abdul Rahman University of Management and Technology, and the Methodist College Kuala Lumpur. She has received professional certifications in project management from PMI and in data analytics from SAS. She has more than 12 years of lecturing, project supervision, and research experience. Her research interests include big data analytics, information systems engineering, educational data mining, psycho-academic research, and software engineering.