Academic Performance Prediction Based on Multisource, Multifeature Behavioral Data

Digital data trails from disparate sources covering different aspects of student life are stored daily in most modern university campuses. However, it remains challenging to (i) combine these data to obtain a holistic view of a student, (ii) use these data to accurately predict academic performance, and (iii) use such predictions to promote positive student engagement with the university. To initially alleviate this problem, in this article, a model named Augmented Education (AugmentED) is proposed. In our study, (1) first, an experiment is conducted based on a real-world campus dataset of college students ($N = 156$) that aggregates multisource behavioral data covering not only online and offline learning but also behaviors inside and outside of the classroom. Specifically, to gain in-depth insight into the features leading to excellent or poor performance, metrics measuring the linear and nonlinear behavioral changes (e.g., regularity and stability) of campus lifestyles are estimated; furthermore, features representing dynamic changes in temporal lifestyle patterns are extracted by means of long short-term memory (LSTM). (2) Second, machine learning-based classification algorithms are developed to predict academic performance. (3) Finally, visualized feedback is designed that enables students (especially at-risk students) to potentially optimize their interactions with the university and achieve a study-life balance. The experiments show that the AugmentED model can predict students’ academic performance with high accuracy.


I. INTRODUCTION
As an important step toward achieving personalized education, academic performance prediction is a key issue in the education data mining field. It has been extensively demonstrated that academic performance can be profoundly affected by the following factors:
• Students' Personality (e.g., neuroticism, extraversion, and agreeableness) [1]-[4];
• Personal Status (e.g., gender, age, height, weight, physical fitness, cardiorespiratory fitness, aerobic fitness, stress, mood, mental health, intelligence, and executive functions) [1]-[12];
• Lifestyle Behaviors (e.g., eating, physical activity, sleep patterns, social ties, and time management) [7]-[28]; and
• Learning Behaviors (e.g., class attendance, study duration, library entry, and online learning) [7], [8], [23]-[26], [28]-[38].
For example, [2] investigated the incremental validity of the Big Five personality traits in predicting college GPA. [21] demonstrated that physical fitness in boys and obesity status in girls could be important factors related to academic achievement. Meanwhile, [22] showed that a regular lifestyle could lead to good performance among college students. [24] showed that the degree of effort exerted while working could be strongly correlated with academic performance. Additionally, [32] showed that compared with high- and medium-achieving students, low-achieving students were less emotionally engaged throughout the semester and tended to express more confusion during the final stage of the semester.
The associate editor coordinating the review of this manuscript and approving it for publication was Shadi Alawneh.
VOLUME 9, 2021. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Based on the predicted academic performance, early feedback and interventions could be applied individually to at-risk students. For example, in [33], basic interventions are defined based on GPA predictions to help students with a low GPA. However, research on feedback and intervention is still at an early stage, and its achievements are relatively few.
Although many academic performance prediction systems have been developed for college students, the following challenges persist: (i) capturing a sufficiently rich profile of a student and integrating these data to obtain a holistic view; (ii) exploring the factors affecting students' academic performance and using this information to develop a robust prediction model with high accuracy; and (iii) taking advantage of the prediction model to deliver personalized services that potentially enable students to drive behavioral change and optimize their study-life balance.
To address these challenges, four representative prediction systems (including one online system and three offline systems) are summarized in Table 1. We first discuss the online prediction system, System A [32] (proposed by Z. Liu). This system is relatively simple because its data are captured only from either SPOC or MOOC. Regarding the three offline prediction systems, i.e., Systems B ∼ D [8], [22], [24] (proposed by R. Wang, Y. Cao, and Z. Wang, respectively), the number of data sources decreases from System B to System D, while the corresponding sample size rapidly increases; unfortunately, the number of different types of behaviors that can be considered also decreases. Ideally, multisource data at a medium/large scale could help lead to a better prediction system design. However, in practice, due to limitations such as computing capability, either data diversity or sample size is sacrificed during the system design process.
To initially alleviate the challenges mentioned above, a model named Augmented Education (AugmentED) is proposed in this article. As shown in Fig. 2, this model mainly consists of the following three modules: (1) a Data Module in which multisource data on campus covering a large variety of data trails are aggregated and fused, and the characteristics/features that can represent students' behavioral change from three different perspectives are evaluated; (2) a Prediction Module in which academic performance prediction is considered a classification problem that is solved by machine learning (ML)-based algorithms; and (3) a Feedback Module in which visualized feedback is delivered individually based on the predictions made and feature analysis. Finally, AugmentED is examined using a real-world dataset of 156 college students.
FIGURE 2. Overview of AugmentED. In the data module, the features enclosed in dashed boxes (including LyE, HurstE, DFA, and LSTM-based features) are proposed in our study and, to the best of our knowledge, are used for the first time in the analysis of students' behavior.
The remainder of this article is organized as follows. In Section II, a literature review is given. In Section III, the methodology of AugmentED is described in detail. In Section IV, the experimental results are discussed and analyzed. Finally, a brief conclusion is given in Section V.

II. RELATED WORK
A. FEATURE EXTRACTION
Feature evaluation plays an important role in designing prediction systems. Features that measure various behavioral patterns can enhance our understanding of how a student's behavior changes as the semester progresses. In this part, previous features that quantify students' behavioral patterns are first summarized; new features worthy of inclusion are then introduced.
In general, behavioral change can be quantified by the following three groups of metrics.

1) BEHAVIORAL CHANGE-LINEAR (BC-LINEAR)
Traditionally, behavioral change is mainly quantified by two linear metrics: behavioral slope and behavioral breakpoint.
First, the behavioral slope can be captured by computing the slope of the behavioral time series of each student using a linear regression [8]. The value of the slope indicates the direction and strength of the behavioral changes, e.g., a positive slope with a greater absolute value indicates a faster increase in behavioral change [8]. Given a mid-term day during the semester [8], both the pre-slope and post-slope can be calculated to represent the students' behavioral change during the first and second halves of the semester, respectively.
Second, the behavioral breakpoint can be captured by computing the rate of behavioral changes occurring across the semester. The value of the breakpoint identifies the day during the semester before and after which a student's behavioral patterns differed. Two linear regressions can be used to fit a behavioral time series, and the Bayesian information criterion (BIC) can then be used to select the best breakpoint [8]. If a single regression is selected instead, the breakpoint can be set to the last day.
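The slope and breakpoint metrics described above can be sketched as follows. This is a minimal illustration and not the implementation used in [8]: the function names (`fit_line`, `bic`, `best_breakpoint`), the minimum segment length, the parameter counts passed to the BIC, and the toy series are all assumptions made for the example.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept; returns (slope, intercept, rss)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    rss = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, rss

def bic(rss, n, n_params):
    """Gaussian BIC up to an additive constant; the small epsilon avoids log(0)."""
    return n * math.log(rss / n + 1e-12) + n_params * math.log(n)

def best_breakpoint(series, min_seg=3):
    """Return (breakpoint_day, pre_slope, post_slope) for a daily behavioral series.

    If a single regression has the lowest BIC, the breakpoint is the last day
    and the post-slope is None.
    """
    n = len(series)
    days = list(range(n))
    slope, _, rss = fit_line(days, series)
    best = (bic(rss, n, 2), n - 1, slope, None)
    for b in range(min_seg, n - min_seg + 1):
        s1, _, r1 = fit_line(days[:b], series[:b])
        s2, _, r2 = fit_line(days[b:], series[b:])
        # Assumed parameter count: two slopes, two intercepts, one breakpoint.
        score = bic(r1 + r2, n, 5)
        if score < best[0]:
            best = (score, b, s1, s2)
    return best[1], best[2], best[3]

# Toy series: activity rises for six days, jumps, then declines.
toy_series = [0, 1, 2, 3, 4, 5, 10, 9, 8, 7, 6, 5]
breakpoint_day, pre_slope, post_slope = best_breakpoint(toy_series)
```

On the toy series the two-segment fit is exact, so the selected breakpoint is day 6 with a pre-slope of about 1 and a post-slope of about -1; a perfectly linear series falls back to the single-regression case.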

2) BEHAVIORAL CHANGE-NONLINEAR (BC-NONLINEAR)
Regarding the students' behavioral time series, nonlinear metrics have been used to discover nonlinear behavioral patterns. Entropy is one example. In [22], entropy is proposed to quantify the regularity/orderliness of students' behaviors, and it was demonstrated that a small entropy value generally indicates high regularity and is associated with high academic performance. Another example is entropy calculated based on a Hidden Markov Model (HMM) analysis [44], which is called HMM-based entropy for simplicity in our study. HMM-based entropy is proposed to quantify the uncertainty/diversity of students' behaviors, e.g., the uncertainty in the transitions between different behaviors and the variety of activities that a behavior exhibits. In [44], HMM-based entropy is evaluated by the following two steps: (i) extracting the hidden states of a behavioral time series by HMM [45], [46]; and (ii) subsequently calculating the HMM-based entropy of the extracted hidden states.
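To make the link between entropy and regularity concrete, the following sketch computes the Shannon entropy of the hours at which a behavior occurs. It is a simplified stand-in chosen for illustration: the entropy estimator used in [22] and the HMM-based entropy of [44] may be defined differently, and the function name and the two toy schedules are assumptions.

```python
import math
from collections import Counter

def shannon_entropy(events):
    """Shannon entropy (in bits) of the empirical distribution of event labels."""
    counts = Counter(events)
    total = len(events)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hour of day at which breakfast was recorded on eight days (hypothetical data).
regular_student = [7, 7, 7, 7, 7, 7, 7, 7]        # always the same hour
irregular_student = [6, 7, 8, 9, 10, 11, 12, 13]  # a different hour each day

regular_entropy = shannon_entropy(regular_student)
irregular_entropy = shannon_entropy(irregular_student)
```

The perfectly regular schedule has zero entropy, while the schedule spread evenly over eight distinct hours has an entropy of 3 bits, which matches the intuition that a smaller entropy value corresponds to a more regular lifestyle.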
To further recognize students' activities and discover their nonlinear behavioral patterns, the following three metrics, which have not previously been applied to students' behavioral time series, are also worth studying.
• Lyapunov Exponent (LyE) [47]- [51] is a measure of the stability of a time series. For example, in [47], LyE is used to quantify the stability of a gait time series, and the results demonstrate that a time series with a large LyE value is less stable than a series with a small LyE value, i.e., generally, a large LyE value indicates high instability. Therefore, in gait analyses, LyE is considered a stability risk indicator for falls [47] that can distinguish healthy subjects from those at a high risk of falling.
• Hurst Exponent (HurstE) [52]- [54] is a measure of predictability (in some studies, it is also called long-term memory) of a time series. For example, in [53], HurstE is applied to quantify the predictability of a financial time series, and the results demonstrate that a time series with a large HurstE value can be predicted more accurately than a series with a HurstE value close to 0.5.
• Detrended Fluctuation Analysis (DFA) [54]-[57] is a measure of the long-range correlation (also called statistical self-affinity or long-range dependence) of a time series [56]. For example, in [56], DFA is used to quantify the long-range correlation of a heart rate time series, and it is demonstrated that a time series with a small DFA value exhibits less long-range correlation than a series with a large DFA value. Therefore, in heart rate analyses, DFA is considered a long-range correlation indicator that can distinguish healthy subjects from those with severe heart disease [56].
In summary, the above three nonlinear metrics can measure the stability, predictability, and long-range correlation of a time series. Although these metrics have already been extensively applied in time series analyses, e.g., gait time series [47], this study is the first to apply them to students' behavioral time series. These metrics can enhance our understanding of not only whether a student's behavior is stable, predictable, and long-range correlated, but also how good a student's behavior is (e.g., in terms of self-discipline).
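As an illustration of how one of these metrics is obtained from a raw series, the sketch below implements a basic first-order DFA in plain Python. It is not the implementation used in this study or in any particular library; the window sizes, the use of non-overlapping windows, and the synthetic test signals are assumptions made for the example.

```python
import math
import random

def _linear_fit(xs, ys):
    """Least-squares slope and intercept of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx

def dfa_alpha(series, window_sizes=(4, 8, 16, 32, 64)):
    """Scaling exponent of first-order detrended fluctuation analysis.

    Assumes a non-constant series, so that the fluctuation is strictly positive.
    """
    mean = sum(series) / len(series)
    profile, total = [], 0.0
    for value in series:              # integrate the mean-centred series
        total += value - mean
        profile.append(total)
    log_n, log_f = [], []
    for w in window_sizes:
        xs = list(range(w))
        sq_resid, count = 0.0, 0
        for start in range(0, len(profile) - w + 1, w):
            ys = profile[start:start + w]
            slope, intercept = _linear_fit(xs, ys)  # local linear trend
            sq_resid += sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
            count += w
        log_n.append(math.log(w))
        log_f.append(0.5 * math.log(sq_resid / count))  # log of RMS fluctuation
    alpha, _ = _linear_fit(log_n, log_f)
    return alpha

rng = random.Random(0)
noise = [rng.gauss(0, 1) for _ in range(1024)]  # uncorrelated signal
walk, level = [], 0.0
for step in noise:                               # strongly correlated signal
    level += step
    walk.append(level)

alpha_noise = dfa_alpha(noise)
alpha_walk = dfa_alpha(walk)
```

The exponent of the uncorrelated noise is expected to lie near 0.5 and that of the random walk well above 1, so a larger value signals stronger long-range correlation, in line with the interpretation given above.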

3) BEHAVIORAL CHANGE-LSTM (BC-LSTM)
Features that represent temporal change are also worthy of study. Such features can be extracted by long short-term memory (LSTM) [58] and are called LSTM-based features for short in this article. LSTM-based features have been applied in many fields, including emotion recognition [59], [60], traffic forecasting [61], and video action classification [62]. However, these features have not previously been applied in lifestyle behavioral analysis.

B. PREDICTION ALGORITHMS
In general, academic performance prediction can be considered either a regression or a classification problem. A wide variety of algorithms have been used or proposed in the literature to predict academic performance.
For example, in [8], the Lasso (least absolute shrinkage and selection operator) regularized linear regression model, proposed by Tibshirani [63] in 1996, is used to predict academic performance. In [24], four supervised learning algorithms (support vector machine (SVM), logistic regression (LR), decision tree, and naïve Bayes) are used to classify students' performance. In [22], RankNet, a neural network method proposed by Burges et al. [64] in 2005, is used to predict the ranks of students' semester grades. Similarly, in [27], a layer-supervised MLP-based method is proposed for academic performance prediction. In [32], a temporal emotion-aspect model (TEAM), which models time jointly with emotions and aspects extracted from a SPOC platform, is proposed to explore the effect of the emotion-aspects of greatest concern, as well as their evolutionary trends, on academic achievement. In [65], four classification methods (naïve Bayes, SMO, J48, and JRip) are used to predict students' performance by considering student heterogeneity.
In general, due to the lack of open-access, large-scale, and multisource datasets in the education field, it is to some extent impossible to compare the performance of the existing academic performance prediction algorithms. Moreover, the algorithms proposed in this field are relatively simple and are mainly based on basic statistical models (e.g., ANOVA and post hoc tests) or ML algorithms (e.g., SVM and LR).

C. MULTISOURCE AND MULTIFEATURE
Many studies have verified that predictive power can be improved by multisource data and multifeature fusion. For example, it is demonstrated that the prediction of both at-risk students [65] and the stock market [66] can be improved by combining multisource data. Similarly, in [22], [23], academic performance prediction is improved by combining traditional diligence features with orderliness (and sleep pattern) features. In [67], the accuracy of scholars' scientific impact prediction is improved by using multi-field feature extraction and fusion. In [68], contrast experiments on eleven different feature combinations were conducted, demonstrating that the performance of sentiment classification can be improved by multifeature fusion.
However, we note that multisource and/or multifeature data cannot always guarantee a higher predictive power. For instance, [69] shows that the results of predictive modeling vary strongly across courses, even though the data are collected within a single institution. In fact, compared with a single course, the portability of the prediction models across courses (multisource data) is lower [69]. Therefore, the effect of multisource and multifeature data needs to be verified in experiments.

III. METHODOLOGY
In our study, academic performance prediction is considered as a classification problem. According to the high-low discrimination index proposed by Kelley [41], academic performance is divided into low-, medium-, and high-groups. Given a digital campus dataset, according to Fig. 2, the main task is to first extract features from the raw multisource data; then select the features that are strongly correlated with academic performance and use these features to train the classification algorithm; and finally provide visualized feedback based on the prediction results.
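The grouping step can be sketched as follows. The 27% tail fraction is an assumption for this example (it is the cut-off commonly associated with Kelley's high-low discrimination index, but the fraction used in this study is not stated in the text), and the function name and the sample GPAs are hypothetical.

```python
def split_by_kelley(gpas, tail_fraction=0.27):
    """Label each student 'low', 'medium', or 'high' by GPA rank.

    tail_fraction is the share of students placed in each of the low and
    high groups; 0.27 is an assumed value, not taken from the paper.
    """
    n = len(gpas)
    k = round(n * tail_fraction)
    order = sorted(range(n), key=lambda i: gpas[i])  # ascending GPA
    labels = ["medium"] * n
    for i in order[:k]:
        labels[i] = "low"
    for i in order[n - k:]:
        labels[i] = "high"
    return labels

sample_gpas = [3.9, 2.1, 3.0, 3.5, 2.8, 1.9, 3.7, 2.5, 3.2, 2.9]
sample_labels = split_by_kelley(sample_gpas)
```

With ten students and the assumed fraction, three students fall into each of the low and high groups and four into the medium group; the resulting labels are the classes that the Prediction Module would learn to predict.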
In this section, the three modules designed in AugmentED (see Fig. 2) are described in detail.

A. DATA MODULE
A flowchart of this module is shown in Fig. 3, which includes the following three parts.

1) RAW DATA
Permission to access the raw data was granted by the Academic Affairs Office of our university. The raw dataset used in our study was captured from students enrolled in the ''Freshman Seminar'' course during the fall semester of 2018-2019. The ''Freshman Seminar'' was chosen for the following reasons: (1) more students were enrolled in this course ($N = 156$) than in other comparable courses, and (2) these 156 students were more active on our self-developed SPOC platform, thus providing abundant valuable behavioral data. Our dataset consists of the following four data sources (see Table 2): SPOC, Smart Card, WiFi, and Central Storage. The last of these, comprising information and academic records, is recorded by the central storage system of our university. For simplicity, the first three data sources are designated $D_1$, $D_2$, and $D_3$ (see Table 2). To evaluate the effect of multisource data on academic performance prediction, as in the studies introduced in Section II.C, contrast experiments on different data source combinations were conducted in our study (see Section IV). Specifically, based on $D_1$, $D_2$, and $D_3$, the following seven data combinations can be obtained: $D_1$, $D_2$, $D_3$, $D_1+D_2$, $D_1+D_3$, $D_2+D_3$, and $D_1+D_2+D_3$. The last data source, i.e., Central Storage (which is relatively static and simple), is considered fundamental information shared by all seven combinations.
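The seven data combinations are simply all non-empty subsets of the three behavioral data sources. A short sketch of how they can be enumerated, with the labels `D1`, `D2`, and `D3` used as plain strings for illustration:

```python
from itertools import combinations

sources = ["D1", "D2", "D3"]  # SPOC, Smart Card, and WiFi data

# All non-empty subsets, ordered by size: 2**3 - 1 = 7 combinations.
data_combinations = [
    "+".join(subset)
    for size in range(1, len(sources) + 1)
    for subset in combinations(sources, size)
]
```

Each combination would then be joined with the Central Storage information before feature extraction, since that source is shared by all seven.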
In our study, privacy protection is taken seriously, and all students' identifying information is anonymized. Infringement of students' privacy is avoided during both the data collection period and the data analysis period. First, the student IDs are already pseudonymous in our raw data. Moreover, the resolution of the students' spatiotemporal trajectories is reduced: all information regarding the exact date and area showing when and where a behavior occurred is removed. Therefore, it would be reasonably difficult to reidentify individuals through our dataset.

2) DATA TRAILS
In our study, to initially understand how a student's behavior changes as the semester progresses, data trails across the whole semester are first processed and organized in chronological order, including when, where, and how a behavior occurs. In addition, data trails per week are summarized according to preliminary statistics, including the following information for each week: how often a behavior occurs (i.e., total frequency), how long a behavior lasts (i.e., duration), and how much money a student spends.
Regarding the SPOC data ($D_1$), online learning is quantified by (i) learning frequency and duration, which are extracted from the raw log files; and (ii) online learning emotion, which is extracted from the discussion forum. Regarding the Smart Card data ($D_2$), multiple behaviors are involved, e.g., library interaction (including borrowing a book and library entry); see Table 2. Regarding the WiFi data ($D_3$), first, each student's trajectory is calculated, mainly including when the student arrives at a place, how often he/she visits this place (i.e., frequency), and how long he/she stays there (i.e., duration). Second, attendance is calculated by combining WiFi data with class schedules. Specifically, to distinguish among behavioral patterns during different periods, three types of durations (namely, durations on working days, on weekends, and throughout the semester) and two types of attendance (namely, attendance during the final study week and attendance throughout the semester) are evaluated in our study.

3) FEATURE EXTRACTION
To gain a deeper insight into students' behavioral patterns, as summarized in Section II.A, in our study behavioral change is evaluated by linear, nonlinear, and deep learning (LSTM) methods, see Fig. 3.
• BC-Linear. Similar to the traditional approach, linear behavioral change is quantified by behavioral slope and behavioral breakpoint. Students' behavioral series are fitted by two linear regressions; subsequently, the optimal breakpoint is selected by BIC and the behavioral slopes are calculated. Additionally, to further measure the amount of variance in the dataset that is not explained by the traditional regression model, the residual sum of squares (RSS) is also evaluated (see Table 2). In our study, these linear metrics are mainly calculated by the Python module sklearn.linear_model.
• BC-nonLinear. Similar to the traditional approach, first, entropy and HMM-based entropy are evaluated in our study, measuring the regularity and diversity of campus lifestyles, respectively. Notably, the hidden states are numerically extracted by the MATLAB function hmmestimate, and the HMM-based entropy of the extracted hidden states is then evaluated by the MATLAB function entropy. Second, to further discover nonlinear behavioral patterns, the following three nonlinear metrics are proposed and extracted for the first time: LyE, HurstE, and DFA, measuring the stability, predictability, and long-range correlation of campus lifestyles, respectively. In our study, four nonlinear metrics (entropy, LyE, HurstE, and DFA) are evaluated by nolds, a NumPy-based Python library, based on the 0&1 sequence (see Appendix A).
• BC-LSTM. LSTM-based features representing dynamic changes in temporal behavioral patterns are calculated as follows. First, as input information, data trails from multiple behaviors are organized together week by week; see Fig. 3. For each week, the basic information of all behaviors involved in our study is summarized, including, for example, how many times a student had breakfast or borrowed books from the library. Subsequently, this weekly information is fed into a Keras LSTM network, and features representing the weekly behavioral patterns that might change throughout the semester are extracted.

B. PREDICTION MODULE
The main task of this module is to select features and use these features to train the prediction algorithm.

1) FEATURE SELECTION
In our study, 708 different types of features are extracted, including 510 linear features, 119 nonlinear features, 50 LSTM-based features, and 29 basic features (including, e.g., frequency and duration, gender, age, and grade). For instance, because multiple behaviors are involved in our study, there are 20 DFA-related features in total, which quantify the long-range correlation of each behavior individually (e.g., library entry). The evaluated features and GPA are distributed over different value ranges. Therefore, to eliminate a potential effect on the correlation analysis, both the features and GPA are normalized by min-max normalization. Additionally, to improve the performance of the prediction algorithms, the top 130 features with the most significant effect on academic performance are selected by the SelectKBest function in the Python library scikit-learn.
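The normalization and selection steps can be illustrated with a small plain-Python sketch. Note the substitution: instead of scikit-learn's SelectKBest, whose scoring function in this study is not stated, the sketch ranks features by the absolute Pearson correlation with GPA. The feature names, values, and helper names are hypothetical.

```python
import math

def min_max(values):
    """Rescale values linearly to [0, 1]; constant columns map to 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)

def top_k_features(features, target, k):
    """Names of the k features with the largest |correlation| to the target."""
    scaled_target = min_max(target)
    scores = {
        name: abs(pearson(min_max(column), scaled_target))
        for name, column in features.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical feature table for four students.
features = {
    "library_entry_slope": [0.1, 0.4, 0.5, 0.9],
    "breakfast_entropy": [3.0, 2.0, 2.2, 1.0],
    "random_noise": [5.0, 1.0, 4.0, 2.0],
}
gpa = [2.0, 2.8, 3.1, 3.9]
selected = top_k_features(features, gpa, k=2)
```

In this toy table the slope and entropy features are kept and the noise feature is dropped; in the study itself the same idea is applied with k = 130 over the 708 extracted features.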

2) PREDICTION ALGORITHM
Subsequently, the selected features are used to train the ML-based classification algorithm for academic performance prediction. Specifically, in our study, five ML algorithms are applied: RF (random forest), GBRT (gradient boosted regression tree), KNN (k-nearest neighbor), SVM, and XGBoost (extreme gradient boosting). The hyperparameters of the ML and LSTM algorithms are optimized by GridSearchCV in scikit-learn.

3) CROSS VALIDATION
Our dataset is divided into a training set and a test set at the ratio of 7:3. The classification algorithm is first trained and then applied to the test set to predict academic performance. Finally, the robustness of the algorithm is tested by 10-fold cross validation.
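A minimal sketch of the two evaluation protocols is given below. It only generates index sets and does not train any model; the helper names, the random seed, and the choice to run the 10-fold split over all 156 students (rather than over the training set only) are assumptions, since the text does not specify these details.

```python
import random

def train_test_split_indices(n, test_ratio=0.3, seed=0):
    """Shuffle student indices and split them into (train, test) lists."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = round(n * test_ratio)
    return indices[n_test:], indices[:n_test]

def k_fold_indices(n, k=10, seed=0):
    """Yield (train, validation) index lists for k-fold cross validation."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]

train_idx, test_idx = train_test_split_indices(156)
cv_splits = list(k_fold_indices(156, k=10))
```

For 156 students the 7:3 split gives 109 training and 47 test samples, and each cross-validation fold holds out 15 or 16 students, which shows how small each validation set is at this sample size.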

C. VISUALIZATION MODULE
The main task of this module is to provide personalized feedback, including GPA prediction and a visualized summary of the students' behavioral patterns.

IV. EXPERIMENTAL RESULTS
In this section, first, the experimental results of AugmentED are presented and analyzed. Second, to evaluate the effectiveness of multisource data and multiple features, contrast experiments are conducted, and the corresponding results are discussed. Finally, visualized feedback offered to students is designed.

A. PREDICTION RESULTS
The experimental results of AugmentED are shown in the last five rows of Table 3 (i.e., RF*, GBRT*, KNN*, SVM*, and XGBoost*), which are highlighted in bold. Five indexes (accuracy, precision, recall, f1, and AUC) are used to evaluate the performance.
Notably, AugmentED is based on (i) multisource data, i.e., $D_1+D_2+D_3$ (including SPOC, Smart Card, and WiFi data); and (ii) multiple features, i.e., C-III (including BC-Linear, BC-nonLinear, and BC-LSTM features). The asterisk (*) in Table 3 denotes that the C-III feature combination is used in the corresponding ML algorithm for academic performance prediction.
From Table 3, it can be seen that, first, academic performance can be predicted by AugmentED with quite high accuracy. Second, the performance of the five different ML algorithms (RF*, GBRT*, KNN*, SVM*, and XGBoost*) is similar, and all of them lead to a good prediction result. To clarify, we consider the precision values; see the 5th column of Table 3. The precision values of the five ML algorithms are 0.873, 0.877, 0.863, 0.889, and 0.871, respectively, indicating that (i) the minimum value is 0.863, i.e., the precision of AugmentED is no less than 86.3%; and (ii) the difference between the minimum and maximum values is 0.026, which is quite small, i.e., the performance of AugmentED is largely insensitive to the choice of ML algorithm.
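The two figures quoted above follow directly from the five precision values. As a quick check, assuming the values are listed in the same order as the algorithms (RF, GBRT, KNN, SVM, XGBoost):

```python
# Precision values quoted in the text; the mapping to algorithm names
# assumes the same order in which the algorithms are listed.
precision = {"RF": 0.873, "GBRT": 0.877, "KNN": 0.863, "SVM": 0.889, "XGBoost": 0.871}

lowest = min(precision.values())
highest = max(precision.values())
spread = round(highest - lowest, 3)  # difference between best and worst
```

Here `lowest` is 0.863 and `spread` is 0.026, matching the minimum precision and the maximum-minimum gap reported in the text.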

B. COMPARATIVE EXPERIMENTS
In this part, contrast experiments are conducted to evaluate the prediction effect of multisource and multifeature combinations.

1) MULTISOURCE
Comparisons of the performance of different data source combinations are conducted; see the 1st column of Table 3. As shown in Table 3, a larger number of data sources can lead to a more accurate prediction result.
To clarify, we consider the case of SVM*, from $D_1$ to $D_1+D_2$ and $D_1+D_2+D_3$ (see Table 3 and Fig [...] multisource data can enhance the in-depth insight gained into students' behavioral patterns.

2) MULTIFEATURE
Comparisons of the performance of three different feature combinations (C-I, C-II, and C-III) are also conducted; see the 2nd column of Table 3 and Fig. 5.
• [...] Table 3 highlighted in light pink;
• C-III (including BC-Linear, BC-nonLinear, and BC-LSTM features), see the rows of Table 3 highlighted in light green. Its corresponding ML algorithms are denoted as RF*, GBRT*, KNN*, SVM*, and XGBoost*.
As shown in Fig. 5 and Table 3, all five evaluation indexes (accuracy, precision, recall, f1, and AUC) of C-III are significantly higher than those of C-I and C-II. To clarify, we consider the case of SVM* on the ($D_1+D_2+D_3$) dataset: the accuracy of SVM* is 0.866, which is much higher than that of SVM and LSTM (i.e., 0.635 and 0.501, respectively); see the 4th column of Table 3. This result indicates that the multifeature combination proposed in our study (i.e., C-III) can significantly improve the predictive power.

C. IDENTIFICATION OF AT-RISK STUDENTS BASED ON THE PREDICTION
The prediction result obtained by AugmentED can be used to identify at-risk students, i.e., determine whether a student belongs in a low performance group. It could be quite helpful for early warning and feedback to be provided to at-risk students before the final exam week.
To illustrate how AugmentED could potentially help students optimize their college lifestyles and consequently improve their academic performance, an example of the feedback delivered to one at-risk student is shown in Fig. 6.
We note that, besides the prediction result itself, the extracted features that are strongly correlated with academic performance can also be taken as assistant indicators to identify at-risk students. Traditionally, such features can be selected by either statistical analysis (e.g., ANOVA) or ML algorithms (e.g., the feature importance returned by RF). We recall that in our study, multiple behaviors are involved, and each behavior is quantified by a large number of linear, nonlinear, and LSTM features. Therefore, a particular feature (e.g., entropy) of one single behavior (e.g., either having breakfast or learning online) might not suffice for a comprehensive evaluation of a student's behavioral patterns. From this perspective, nine assistant indicators are calculated and plotted in Fig. 6.
We begin by discussing the indicators of the linear, nonlinear, and LSTM features (see Appendix B), which are denoted as D-linear, D-nonLinear, and D-LSTM, respectively, and represent the (weighted) linear, nonlinear, and temporal patterns of all behaviors involved in our study (rather than one single behavior). Regarding these three indicators, (i) the average values and 95% confidence intervals (for the low-, medium-, and high-academic performance groups) are plotted in the left column of Fig. 6; and (ii) the Pearson correlation between the indicators and academic performance is calculated; see the 2nd, 5th, and 8th rows of Table 4, which are highlighted in light gray. Furthermore, six more indicators are calculated and provided as supplementary information; see the 2nd and 3rd columns of Fig. 6. The Pearson correlation between these indicators and academic performance is also calculated and listed in Table 4.
From Table 4, it can be seen that all nine indicators are strongly correlated with academic performance. Additionally, in Fig. 6, the apparent distinction among the three academic performance groups demonstrates that all nine indicators can offer strong support in at-risk student identification.
To clarify, we consider the case of D-linear. On the one hand, its average values and 95% confidence intervals from low-, medium-, and high-academic performance groups are (1.457±0.199, 2.160±0.193, 3.035±0.341), see Fig. 6(a1), indicating clear separation. On the other hand, its correlation coefficient is 0.534, see the 3 rd row of Table 4, i.e., this indicator is significantly correlated with academic performance. Therefore, D-linear can be taken as an indicator to explore which student is at risk because of the low performance he/she will achieve.

V. CONCLUSION AND FUTURE WORK
As an important issue in the education data mining field, academic performance prediction has been studied by many researchers. However, due to the lack of richness and diversity in both data sources and features, many challenges remain in prediction accuracy and interpretability. To initially alleviate this problem, our study aims to develop a robust academic performance prediction model, to gain in-depth insight into student behavioral patterns, and to potentially help students optimize their interactions with the university.
In our study, a model named AugmentED is proposed to predict the academic performance of college students. Our contributions are threefold. First, regarding data fusion, to the best of our knowledge, this work is the first to capture, analyze, and use multisource data covering not only online and offline learning but also campus-life behaviors inside and outside of the classroom for academic performance prediction. Based on these multisource data, a rich profile of a student is obtained. Second, regarding feature evaluation, behavioral change is evaluated by linear, nonlinear, and deep learning (LSTM) methods, which provides a systematic view of students' behavioral patterns. Specifically, this is the first time that three nonlinear metrics (LyE, HurstE, and DFA) and LSTM have been applied to students' behavioral time series analysis. Third, our experimental results demonstrate that AugmentED can predict academic performance with quite high accuracy, which helps to formulate personalized feedback for at-risk (or less self-disciplined) students.
However, there are also some limitations in our study. To obtain a multisource dataset, we sacrificed the scale of the dataset by using only student-generated data within a single course. This limitation might negatively affect the generalization of AugmentED. Furthermore, in this study, we mainly focus on behavioral change. Other characteristics/features (e.g., peer effect and sleep) that are worthy of consideration were not evaluated.
In conclusion, our study is based on a completely passive daily data capture system that exists in most modern universities. This system can potentially lead to continual investigations on a larger scale. The knowledge obtained in this study can also potentially contribute to related research among K-12 students.

APPENDIX A
To evaluate the four nonlinear metrics (entropy, LyE, HurstE, and DFA) of the time series, we concentrate on the precise time of day at which the behaviors occurred. Therefore, in our study, the involved time is first converted to a discrete time sequence. Then, according to this discrete time sequence, the raw behavioral time series data are converted to the 0&1 sequence as follows.
The time data are converted to a discrete sequence with a normalized time interval by the following three steps: [...]
Following the time data representation, the raw behavioral data are converted to a 0&1 sequence by the following two steps:
• Step 2.1. First, a zero sequence $X_{ij}$ with length $N_t$ is generated, and
• Step 2.2. If a behavior occurs at time $T_{ij}$, the $T_{ij}$-th element of the corresponding discrete behavioral sequence $X_{ij}$ is set to 1, i.e., $X_{ij}[T_{ij}] = 1$. For instance, if a student has a meal at ''03/09/2018, 10:24'' (where $T_{ij} = 117$), the 117th element of the discrete meal sequence is set to 1, i.e., $X_{ij}[117] = 1$.
This process can be described as follows:
$$X_{ij}[t] = \begin{cases} 1, & \text{if a behavior occurs at } t = T_{ij} \\ 0, & \text{otherwise} \end{cases} \qquad \text{(A-2)}$$
where $X_{ij}[t] \in \{0, 1\}$. According to Eq. A-2, all behavioral data listed in Table 2 (including SPOC online study, borrowing a book, library entry, meal consumption, breakfast consumption, consumption, clinical visits, and WiFi data in the study and relaxation areas) are converted to discrete behavioral sequences.
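Steps 2.1 and 2.2 can be sketched as follows. Because the time-discretization steps are not available in the text, the sketch assumes a fixed 10-minute interval over a single day (144 slots) and zero-based slot indices, so its indices will not reproduce the $T_{ij} = 117$ of the worked example. The function name, the reading of ''03/09/2018'' as 3 September 2018, and the meal times are likewise assumptions.

```python
from datetime import datetime

def to_binary_sequence(timestamps, start, interval_minutes, n_slots):
    """Convert event timestamps to a 0&1 sequence on a fixed time grid."""
    sequence = [0] * n_slots                    # Step 2.1: zero sequence
    for ts in timestamps:
        offset_minutes = (ts - start).total_seconds() / 60
        slot = int(offset_minutes // interval_minutes)
        if 0 <= slot < n_slots:
            sequence[slot] = 1                  # Step 2.2: mark the occurrence
    return sequence

day_start = datetime(2018, 9, 3)                # assumed reading of 03/09/2018
meal_times = [datetime(2018, 9, 3, 7, 40), datetime(2018, 9, 3, 10, 24)]
meal_sequence = to_binary_sequence(meal_times, day_start, interval_minutes=10, n_slots=144)
```

The resulting list contains a 1 in the slots covering 07:40 and 10:24 and 0 elsewhere; a sequence of this form is what the nonlinear metrics would take as input.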

APPENDIX B
Regarding the nine assistant indicators described in Section IV.C, the first seven are calculated according to [24], while the last two (LSTM-49 and LSTM-1) are selected from the 50 extracted LSTM features without any further processing. The fundamental mathematical approach used to calculate the first seven indicators is the same. The major similarity between these indicators is that they all represent a certain property of all behaviors involved in our study. The major difference is the input features used for the calculation. To clarify, the mathematical approach to the calculation of D-linear is given in this section.
• Step 1. The score of each linear feature (e.g., slope) for each student is calculated as follows: [...]
We assume that there are $N$ students and $K$ extracted features in total. $Corr(X_k)$ is the Pearson correlation coefficient between the $k$-th feature $X_k$ and students' academic performance, where $k \le K$. $Rank(x_n)$ is the ranking of the feature of the $n$-th student (denoted as $u_n$, where $n \le N$) among all students. For example, if there are three students ($u_1$, $u_2$, $u_3$) whose $k$-th feature values (e.g., the slope value of having breakfast) are (0.8, 0.4, 0.6), then $Score_1^k = 0$, $Score_2^k = 0.667$, and $Score_3^k = 0.333$ because $Corr(X_k) > 0$.
[...] $(|Corr(X_k)| \cdot Score_n^k)$ (B-2)
We note that essentially D-Linear is the weighted mean of all linear feature scores, and its weights are the correlation coefficients.
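The worked example above can be reproduced with the sketch below. Since the score equation is not available in the text, the rule used here, score = (rank - 1) / N with rank 1 assigned to the largest value when the correlation is positive, is an assumption chosen to match the three example scores; the reversed ranking for negative correlations and the plain weighted sum used for aggregation are also assumptions, and ties are not handled.

```python
def rank_scores(values, corr):
    """Per-student scores (rank - 1) / N for one feature.

    Rank 1 goes to the largest value when corr > 0; for corr <= 0 the
    order is reversed (assumed, not confirmed by the text).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i], reverse=(corr > 0))
    scores = [0.0] * n
    for zero_based_rank, student in enumerate(order):
        scores[student] = zero_based_rank / n
    return scores

def d_indicator(feature_columns, corrs):
    """Aggregate feature scores per student, weighted by |correlation|."""
    n_students = len(feature_columns[0])
    totals = [0.0] * n_students
    for column, corr in zip(feature_columns, corrs):
        for student, score in enumerate(rank_scores(column, corr)):
            totals[student] += abs(corr) * score
    return totals

breakfast_slope = [0.8, 0.4, 0.6]            # feature values of u1, u2, u3
example_scores = rank_scores(breakfast_slope, corr=0.5)
example_indicator = d_indicator([breakfast_slope], corrs=[0.5])
```

Rounded to three decimals, `example_scores` is (0, 0.667, 0.333), matching the example; with more features, `d_indicator` accumulates one weighted term per feature for each student.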