Explainable Student Agency Analytics

Several studies have shown that complex nonlinear learning analytics (LA) techniques outperform traditional ones. However, these techniques are still rarely integrated into automatic LA systems because they are generally presumed to be opaque. At the same time, current reviews of LA in higher education point out that LA should be better grounded in learning science and more closely linked to teachers and pedagogical planning. In this study, we aim to address these two challenges. First, we discuss different techniques that open up the decision-making process of complex models and how they can be integrated into LA tools. More precisely, we present various global and local explainable techniques, using the example of an automatic LA process that provides information about different resources that can support student agency in higher education institutions. Second, we exemplify these techniques and the LA process through recently collected student agency data from four courses with the same content taught by four different teachers. Altogether, we demonstrate how this process, which we call explainable student agency analytics, can contribute to teachers' pedagogical planning through the LA cycle.


I. INTRODUCTION
The global COVID-19 pandemic and the related closures of educational institutions showed how important it is for students to be able to rely on their own resources. In particular, to continue learning during the closures, students faced greater demands on their autonomy and their capacity for independent learning, executive functioning, and self-monitoring [1]. The closures also showed that students who lacked the resilience and engagement to learn on their own were particularly at risk of falling behind [1], [2]. In summary, COVID-19 and its consequences for students revealed the importance of being self-determined in learning and being able to adapt to situations involving rapid change.
Student agency equips students to manage such situations. It refers to students' holistic judgement of how they can affect and direct their learning in instructional settings, work effectively, and utilize the assets that are accessible in the learning environment [3], [4]. The importance of agency in education has been emphasized by policy advisors, in particular by the Organisation for Economic Co-operation and Development [5]. Agency is a basic need in any goal-oriented work, particularly in jobs that call for creativity and continuous development of work practices [6]. This means that graduates of higher education institutions, in particular, should be prepared to act as developers and change agents in their field. However, despite this need, which is especially pressing in the COVID-19 context but exists in general, and despite the particular emphasis that policy advisors have placed on student agency, student agency has so far received little explicit attention in higher education practice.
The associate editor coordinating the review of this manuscript and approving it for publication was Shuihua Wang.
Learning analytics (LA) refers to a research field that harnesses data on learners to understand, improve, and optimize learning [7]. LA can, for example, be used to predict academic success, improve quality assurance, and identify at-risk students [8]. Moreover, dashboards are often used to visualize learning processes and study pathways, not only to increase awareness but also to give personalized feedback to the learners. Such personalized feedback, which takes the personal traits of learners into account, can positively influence the learning process and outcomes. Because it is usually unfeasible for teachers to provide such individualized feedback manually to all students (especially in higher education, where teachers often instruct hundreds of students with different backgrounds), automated feedback can offer significant support.
FIGURE 1. The XSAA process can be depicted as a loop, which starts when the teacher makes the initial pedagogical plans. At some point in the learning and teaching process, the students complete the AUS questionnaire, and the agency analytics is executed automatically. The teacher receives results, and can then adjust the pedagogical plans according to the students' experienced resources of agency.
Jääskelä et al. [9] examined student agency as the theoretical framework for assessing and enhancing digital education at universities by making use of LA. Based on a factor and robust cluster analysis of students' responses to a validated scale [3], [9], the students receive automated feedback on their individual agency profile. In addition, the teacher of a higher education course receives an aggregated overview of the different student agency profiles. The essence of this automated agency-based process, which is called student agency analytics (SAA), is to provide actionable information for students on their learning efforts in relation to their perceived affordances in the course, and for teachers on students' judgements of their situational agency, thereby increasing pedagogical knowledge.
In a recent review, Deeva et al. [10] classified automated feedback systems by their educational settings, the properties of the automated feedback they deliver, and their design and evaluation approaches. They concluded that, in most cases, the underlying learning theories or educational frameworks had not been reported. Moreover, they urged developers to use more data-based solutions and to be able to explain the reasoning behind the automated system. Therefore, the purpose of the present article is to show how the integration of explainable artificial intelligence (XAI) techniques with the SAA process (see Figure 1) can support the transparency and data-based development of automated feedback systems in education. More precisely, we aim to integrate XAI techniques into the SAA process in the context of higher education. This procedure makes stakeholders in such organizations more aware of the learning arrangements, takes into account the complexity of the students' capacities and various contextual resources, and supports reflection.
Another reason for integrating XAI techniques into SAA is that explainability has become a key issue in LA [11]. Relationships in educational data are often complex [8], [12], and several LA research studies have shown that these relationships can be modeled better by complex models than by simple linear ones (e.g., [13]-[16]). However, in practice, these complex models are rarely used because they are considered inexplicable. XAI is an emerging research direction that can help the user or developer of complex models understand the model's behavior and provide human-understandable justifications for it [17], [18]. Thus, integrating XAI techniques allows us to use the better-performing complex LA models in SAA and to explain them in such a way that even practitioners with no background in data analysis can easily understand them.
To demonstrate our explainable SAA process (XSAA), we present the results of a study of four concurrently implemented mathematics courses in an engineering education degree program. The content and curriculum of these mathematics courses are identical, but they are taught independently by four different teachers. Thus, we could not only build and explain our models using the student-specific agency data but also link them to the particular teaching approaches of the instructors. Such a setting is new and might help teachers become more aware of the effects of their pedagogical planning and interventions.
The main contributions of this paper are twofold:
• We use XAI to produce explainability and actionability through dashboards. These dashboards not only show summaries of the raw student data (e.g., how active students were with the tasks or how long it took them to solve a problem) but also, through nonlinear and universal machine learning models, explain the reasons for the students' actions, linking them to a well-defined body of pedagogical planning by the teacher.
• We discuss the usability of the results gained through XSAA at the teaching practice level, that is, how they may help teachers reflect on and design their curriculum and develop agency-supportive practices in their teaching implementations.
The rest of the paper is organized as follows. Section II outlines the background of our contribution. First, we situate our research among previous studies on LA and XAI in higher education. Second, we summarize previous student agency LA studies. Section III discusses the need for explainable models, especially in LA, and provides an overview of the XAI techniques that we use for our SAA dashboards. Section IV presents an example application of our explainable SAA in higher education (i.e., the data and our XSAA results from the four groups of students studying the same mathematics course taught by different teachers at a university of applied sciences). Finally, Section V presents the main findings and implications of our study.

A. LA AND XAI STUDIES IN HIGHER EDUCATION
Hundreds of primary studies describing and analyzing the use of LA to improve educational actions in higher education institutions (HEI) have been published, and their impacts and outcomes have been summarized in many recent reviews (e.g., [19]-[21]). Their overall conclusions suggest that LA should be better grounded in learning science, that its effectiveness should be assessed, and that actual linkages to teachers and pedagogical planning should be emphasized.
For example, the review by Aldowah et al. [19], which included 402 articles from 2000 to 2017, identified many student-oriented characteristics, such as ''engagement,'' ''achievement,'' ''participation,'' ''reflection,'' ''motivation,'' and ''satisfaction,'' that can be addressed using LA techniques. However, no linkage to actual teaching activities was presented. In the combined review and meta-review by Du et al. [22] of 901 identified research papers from 2011 to 2017, the authors noted that instructors need to connect LA with learning science and use dashboards for student monitoring. Similarly, the review by Cui et al. [23] emphasized the knowledge gap between the theoretical frameworks of educational domain knowledge and LA models.
After multistage screening, the review by Sonderlund et al. [24] ended up with only 11 out of 689 studies that evaluated the effectiveness of LA interventions, concluding that there is a lack of intervention studies in which the educational institution (in practice, the instructor of a course in an HEI) performs and evaluates systematic changes to its actions. Moreover, based on an analysis of 252 papers published from 2012 to 2018, Viberg et al. [21] concluded that ''the overall potential of LA is so far higher than the actual evidence, which poses a question of how we can facilitate the transfer of this potential into learning and teaching practice.'' Likewise, Ifenthaler and Yau [20] addressed the study success of HEI students through 46 primary studies, concluding that there is a lack of ''rigorous, large-scale evidence of the effectiveness of LA in supporting study success.'' Finally, the review by Leitner et al. [25], which was based on 101 articles published during 2011-2016, found that teachers were treated merely as a ''side-product'' of the research field.
In contrast to the large number of LA studies in HEIs, studies dealing with XAI in HEIs are extremely scarce. A Google Scholar and Scopus search in May 2021 identified only three studies of XAI in HEIs [26]-[28]. Putnam and Conati [26] conducted experiments with nine university students to test whether the students would like to receive explanations for hints given in an intelligent tutoring system (ITS). They concluded that the majority of students would like explanations in the ITS, but the actual implementation of XAI was left for future work. Likewise, Conati et al. [27] discussed, only theoretically, the considerations necessary to make an ITS explainable for the benefit of learning. Alonso and Casalino [28] used XAI in a distance learning setting. However, they did not describe any XAI techniques and solely used existing software (WEKA) to gather explanations for their prediction models. In sum, all three articles emphasized the need for XAI in automated feedback systems in HEIs, but none implemented and explained the underlying XAI techniques.

B. STUDENT AGENCY ANALYTICS IN A NUTSHELL
1) STUDENT AGENCY IN HIGHER EDUCATION
Agency has been under consideration in several disciplines and has been highlighted in various areas of life. In general, agency is one's capacity to act and cause change. However, different disciplines have their own, more detailed perspectives on the meaning of agency. For example, in social cognitive theory, agency is understood as an individual's capability to engage in intentional, self-defined, and meaningful action [29]. Similarly, in social sciences, the concept of agency concerns an individual's capability to take intentional and self-defined (i.e., autonomous) action and is focused on the circumstances and structural factors that constitute frames for action (e.g., [30]). Contemporary educational discourse has emphasized the meaning of agency in lifelong learning [31] and in student-centered learning [32]. Within educational sciences, agency is seen as an integral part of learning, which manifests itself both as individuals' active action in knowledge construction (e.g., [33]) and as a sense of being empowered in learning situations [34].
FIGURE 2. Student agency analytics provides information about the inter-individual differences relating to resources of student agency. This figure shows a student's personal report that consists of his/her individual agency profile in comparison with the general agency profile of the group. A teacher's report consists of a general agency profile of the group combined with four prototypical agency profiles (as visualized in Figure 3).
Our stance on student agency is based on the conceptualization by Jääskelä et al. [3], who synthesized the previous literature on agency and defined student agency in higher education as ''a student's experience of having access to or being empowered to act through personal, relational, and participatory resources, which allow him/her to engage in purposeful, intentional, and meaningful action and learning in study contexts.'' Student agency consists of three resource areas (see Figure 2). Personal agency resources consist of the dimensions of competence beliefs and self-efficacy. Relational resources refer to power relations in different educational settings, which include the experiences of equality among the students, trust in the teacher, and support from the teacher. Participatory resources of student agency involve dimensions relating to engaged and active participation in learning. Altogether, student agency is composed of 11 dimensions, and it is measured using the validated psychometric Agency of University Student (AUS) scale [3], [35].

2) STUDENT AGENCY ANALYTICS
Discerning different study experiences can be demanding in heterogeneous educational settings with many students. To address this challenge, we apply an LA process called student agency analytics, which utilizes robust statistics and psychometric information obtained using the AUS scale [9]. First, the students in a particular study group or course complete the AUS questionnaire. Second, the individual factor scores of agency are calculated for each student using the factor pattern matrix, which enables the determination of the general agency profile of the whole study group. Third, unsupervised learning, specifically robust clustering, is used to derive prototypical agency profiles of four distinct groups, with the number of groups chosen based on cluster validation indices, as described in more detail in [9]. Kruskal-Wallis H and Mann-Whitney U tests can then be used to explain the clustering results in terms of the agency dimensions. Moreover, if information on the quality of learning outcomes or course grades is available, it can be linked to the prototypical agency profiles using supervised learning.
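To make the three steps concrete, the following minimal sketch assumes a simplified pipeline: factor scores are approximated as item responses weighted by a factor pattern matrix, the GAP is taken as the group mean, and a plain k-medians clustering stands in for the robust clustering actually used in [9]. The data, the 30-item questionnaire size, the pattern matrix, and the helper names `agency_scores` and `k_medians` are all hypothetical.

```python
import numpy as np

def agency_scores(responses, pattern):
    """Hypothetical simplification: weight each Likert item response by its
    loading in the factor pattern matrix (items x agency dimensions)."""
    return responses @ pattern

def k_medians(scores, k=4, n_iter=100, seed=0):
    """Simple k-medians clustering (L1 distance, coordinate-wise medians),
    used here only as a stand-in for the robust clustering described in [9]."""
    rng = np.random.default_rng(seed)
    centers = scores[rng.choice(len(scores), size=k, replace=False)]

    def assign(ctr):
        # L1 distance from every student to every prototype profile
        dist = np.abs(scores[:, None, :] - ctr[None, :, :]).sum(axis=2)
        return dist.argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(centers)
        # new prototype = coordinate-wise median of its members (keep old if empty)
        new = np.array([
            np.median(scores[labels == c], axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new, centers):
            break
        centers = new
    return assign(centers), centers

# Hypothetical data: 120 students, 30 Likert items (1-5), 11 agency dimensions
rng = np.random.default_rng(1)
responses = rng.integers(1, 6, size=(120, 30)).astype(float)
pattern = rng.uniform(0.0, 0.1, size=(30, 11))
scores = agency_scores(responses, pattern)   # individual agency profiles (IAPs)
gap = scores.mean(axis=0)                    # general agency profile (GAP)
labels, paps = k_medians(scores, k=4)        # four prototypical agency profiles (PAPs)
```

The statistical tests that explain the clusters are omitted here; in Python, they are available, for instance, as `scipy.stats.kruskal` and `scipy.stats.mannwhitneyu`.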
The main representations obtained using SAA are the students' individual agency profiles (IAPs), the general agency profile (GAP) of a group (e.g., a study group or course), and four distinct prototypical agency profiles (PAPs) within a group. An IAP (Figure 2) represents the values of an individual student's agency dimensions, which can be compared with the GAP.
The IAP is a personal depiction intended only for the student, and it is accompanied by general information about student agency. For the teacher, student agency analytics provides a general overview of the students' agentic resources. To preserve students' privacy, teachers do not receive individual student profiles. Instead, their report consists of de-identified information about the GAP and PAPs. Both the GAP and PAPs are presented in the teacher report as a special combined bar graph (Figure 4).

3) TEACHER'S PERSPECTIVE
Teachers' actions and their pedagogical choices influence students' learning experiences (e.g., [36]-[40]). In terms of pedagogical planning, teachers would benefit from analysis results concerning all their students. For instance, peer support can help students in higher education develop self-regulation skills, decreasing study-related exhaustion or allowing them to manage it better [41]. Thus, it would be worthwhile for the teacher to identify students' different experiences of peer support in order to provide means and opportunities for supportive collaboration. Likewise, students' prior knowledge can significantly influence their achievement [42], and failing to consider it might manifest as a lack of competence beliefs and self-efficacy. In summary, becoming aware of students' agentic experiences could help teachers make better pedagogical plans and decisions.
From the teacher's perspective, SAA summarizes the inter-individual differences of learning experiences in a visually interpretable form. As a result, students' general assessment of their agency and four distinct student agency profiles are presented to the teacher. The process can be depicted as a loop (see Figure 1), which starts when the teacher makes the initial pedagogical plans. At some point in the learning and teaching process, the students complete the AUS questionnaire, and the agency analytics is automatically executed. The teacher receives results, which visually describe the GAP and the PAPs. The teacher can then adjust the pedagogical plans according to the students' experienced agency resources. In the following sections, we develop the SAA process toward explainable LA.

4) ETHICAL CONSIDERATIONS
A general prerequisite in LA should be the responsible use of educational data [43]. It is worth emphasizing that SAA does not aim to evaluate or grade the students or their learning. Instead, the purpose is to identify and make visible different personal learning experiences through the concept of agency. Thus, it is essential to ensure the privacy of both students and teachers. The individual agency profile received by a student is personal and only for the student's use. Neither teachers nor anyone else sees a student's IAP unless the student chooses to disclose the results, for example, to support study counseling. Generating aggregated results (GAP and PAPs) provides a means of presenting detailed but de-identified information to the teacher. Similarly, the teacher report depicting the aggregate results of a course is meant only for the teacher's use in personal pedagogical planning. The results should not be used to evaluate individual teachers or their teaching.

III. TOWARD EXPLAINABLE LEARNING ANALYTICS
From a technical point of view, LA is about modeling students and learning. Its methods have roots in several disciplines, such as statistics, education, psychology, and machine learning [44], [45]. While statistical models have traditionally been the main means of scaffolding students and helping teachers in LA, machine learning models have gained importance in recent years [46]. This is mainly due to the challenge of modeling increasingly rich, varied, and multimodal LA data (such as eye tracking, physical movement, and face recognition for emotion detection) [47], [48].
There is often a trade-off between the performance of a machine learning model and its explainability. For example, in supervised learning, performance (i.e., how closely the model's outputs match the real outputs) is usually better for complex models with nonlinear combinations of inputs, but such models are harder or even impossible to understand. These kinds of models are also called ''black boxes.'' In contrast, simple linear methods tend to perform worse, but they are easier to interpret and understand. One example of the latter is a linear regression model, where the coefficient of an input can be directly interpreted as the importance of that input (provided that the inputs are measured on comparable scales).
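As a small illustration of why linear models are considered interpretable, the sketch below fits an ordinary least squares model to synthetic data with NumPy. The data and the true coefficients are invented for the example; the point is only that, with inputs on a common scale, the fitted coefficients recover the effect sizes and can be ranked directly.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 3))                 # three inputs on the same (standardized) scale
true_coef = np.array([2.0, -0.5, 0.0])      # hypothetical effects; input 2 is irrelevant
y = X @ true_coef + 1.0 + rng.normal(scale=0.1, size=n)

# Ordinary least squares with an intercept column
design = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
intercept, slopes = coef[0], coef[1:]

# Because the inputs share a scale, |slope| ranks the inputs by importance,
# and the sign tells the direction of the effect
ranking = np.argsort(-np.abs(slopes))
```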
Although they usually perform better, black boxes have several problems. One problem relates to ensuring that such a model works as intended. If not even the designer of a model can explain its underlying logic and how it arrived at a result, it is impossible to verify that the model uses the right justifications for its decisions. In the worst case, such black-box models may use questionable reasons for their decisions without anyone noticing. This typically happens when they pick up biases present in the training data. Bolukbasi et al. [49], for instance, showed that a model trained on a corpus of Google News text learned the correct analogy ''man is to woman as king is to queen,'' but at the same time also learned the worrisome analogy ''man is to woman as computer programmer is to homemaker.'' Another example, discussed by Freitas [50], comes from the military: a classifier was trained to distinguish pictures of enemy tanks from pictures of friendly tanks. This classifier performed well on the training set but poorly when it was used in the field. It was later discovered that the pictures of enemy tanks in the training set had been taken mostly on overcast days, while the pictures of friendly tanks had been taken on days with fair weather. The classifier had learned this pattern from the training set and consequently relied mostly on background features to classify the tanks. Such examples undermine users' trust in black-box models. In fact, some studies have shown that most people exhibit an inherent distrust of automated predictive models, even when these models have been proven to be more accurate than human forecasters [51]. If users do not trust a model or a prediction, they will not use or deploy it. Thus, the explainability of models is important not only for developers but also for end users and all other parties involved.
XAI is a new research field. It refers to approaches that attempt to make machine learning models more explainable and thereby address the above-mentioned issues. Several XAI review papers have recently been published, indicating the field's importance and topicality [18], [52]-[56]. Generally, the explainability of a model refers to any approach that helps the user or developer understand the model's behavior and its reasoning process [17]. While no definition of XAI is universally accepted, it can be conceptualized as the ability to provide human-understandable justifications explaining how a model works, so that observers can understand how and why it has delivered particular outcomes. For example, in the military classifier case discussed above [50], an explanation would have shown that the classifier used the background instead of the features of the tanks to classify the photos. Thus, XAI can help to identify potential bias in the training data, ensure algorithmic fairness, and verify that algorithms perform as intended [53].
As pointed out by Baker [11], explainability is also one of the biggest challenges in LA nowadays. Several LA studies have shown that complex models outperform the simpler ones. However, if an instructor does not understand such a complex LA model and if a development team cannot explain it, the LA model will probably never be employed in practice (ibid). Instead, only simple linear models that have been around for years continue to be used. This is a problem, because as argued for example in [12], relationships in educational data are often complex and cannot be modeled well enough with the simple models. If the better performing complex models could also be explained in such a way that even practitioners with no background in data analysis could easily understand them, they would probably be employed more often.
Conati et al. [27] argued that the explainability of models is also important for learners: for instance, if learners cannot comprehend the logic of an intelligent tutoring system, they are less motivated to follow the system's instructions, and their trust in the system as a whole decreases. Another reason why the explainability of LA models has become increasingly important is that the new General Data Protection Regulation (GDPR) includes a right to explanation and information [57], [58]. This means that if automatic profiling (e.g., in student analytics) is used, being able to explain to a student why he/she was assigned to a particular profile is not only a desideratum but actually a requirement.
In general, one can distinguish XAI methods that are intrinsic, meaning interpretable due to their simple structure, and post-hoc XAI methods, meaning methods applied after model training to explain the model's logic in retrospect. Moreover, one distinguishes between local and global explanations [59], [60]. While modular global explanations provide interpretation for the model as a whole, approaching it holistically, a local explanation provides interpretation for a specific observation (such as one particular student). Finally, explanation techniques can be model specific, meaning the explanation technique is specific to its model, or model agnostic, meaning the explanation technique can be applied to any model.
In this work, we use both intrinsic model-specific and post-hoc model-agnostic explanations as well as global and local explanations. Moreover, we want to explain not only the most important characteristics of the different agency profiles (global explanations) but also explain, for specific observations, why they were assigned to a particular group (local explanations). The latter are especially interesting for instructors who receive a report about their students' agency and can then see why a particular student was assigned to a particular agency group. Finally, as pointed out above, students have a right to information about individual decisions made by agency algorithms, and the local XAI techniques enable us to provide such information.

A. MULTINOMIAL LOGISTIC REGRESSION
Logistic regression is an example of a machine learning method that, because of its linear structure, is intrinsically explainable and offers model-specific modular global explanations. It is probably the most traditional technique for predicting a categorical response variable (i.e., the class). If the class is dichotomous, simple (binary) logistic regression can be used, which employs the logistic function to model the relationship between the class and the explanatory variables by estimating probabilities. If the class has more than two categories, multinomial logistic regression should be used. It uses the softmax function (i.e., a generalization of the logistic function to multiple dimensions) to calculate the probability of each class category, normalized over all possible class categories. These probabilities are then used to determine the class (i.e., the response variable category) for the given inputs.
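The softmax step can be illustrated with a short NumPy sketch. The coefficient matrix `W` (one column per class) and the intercepts `b` are hypothetical values chosen by hand, not fitted ones; a reference-category parameterization, as used in many statistics packages, would additionally fix one column of `W` at zero.

```python
import numpy as np

def softmax(z):
    # Subtracting the row maximum avoids overflow and does not change the result
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def predict_proba(X, W, b):
    """Class probabilities of a multinomial logistic regression.
    X: (n, p) inputs, W: (p, K) coefficients, b: (K,) intercepts."""
    return softmax(X @ W + b)

def predict(X, W, b):
    # The predicted class is the category with the highest probability
    return predict_proba(X, W, b).argmax(axis=1)

# Hypothetical coefficients for 2 inputs and 3 classes
W = np.array([[1.0, 0.0, -1.0],
              [0.0, 1.0, 0.0]])
b = np.zeros(3)
X = np.array([[3.0, 0.0], [0.0, 3.0], [-3.0, 0.0]])
probs = predict_proba(X, W, b)   # each row sums to 1
classes = predict(X, W, b)
```

For two classes, `softmax([z, 0])[0]` equals the logistic function of `z`, which is why the softmax is described as its generalization.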
Logistic regression is intrinsically explainable through its coefficients. The coefficient of a continuous explanatory variable can be interpreted as the estimated change in the natural log of the odds of the reference event for each unit increase in the predictor [61]. In general, the larger the absolute magnitude of a coefficient, the more relevant the corresponding explanatory variable is for the classification (again assuming comparably scaled variables). Moreover, the sign of the coefficient indicates whether the explanatory variable increases or decreases the probability of belonging to a certain class. Furthermore, if the logistic regression model is penalized with the ℓ1 norm, some of the feature coefficients shrink to exactly zero, which makes the model simpler and easier to explain [62]. However, although (multinomial) logistic regression generally meets the characteristics of an explainable model, Arrieta et al. [63] point out that it may also require post-hoc explainability techniques, such as visualizations, particularly if the model is to be explained to non-expert audiences.
VOLUME 9, 2021
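Both interpretation aids can be illustrated on synthetic data, assuming a binary outcome: the sketch below fits an ℓ1-penalized logistic regression with proximal gradient descent (ISTA), a standard solver for this penalty but not necessarily the one used in this study. The penalty strength `lam`, the step size `lr`, and the helper `fit_l1_logistic` are illustrative choices. Exponentiating a coefficient gives the odds ratio, that is, the multiplicative change in the odds per unit increase in the predictor.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_l1_logistic(X, y, lam=0.1, lr=0.5, n_iter=2000):
    """Binary logistic regression with an l1 penalty, fitted by proximal
    gradient descent (ISTA). The intercept is not penalized."""
    n, p = X.shape
    w, b = np.zeros(p), 0.0
    for _ in range(n_iter):
        r = sigmoid(X @ w + b) - y          # residuals p_i - y_i
        grad_w = X.T @ r / n                # gradient of the mean log-loss
        grad_b = r.mean()
        w = w - lr * grad_w
        # Soft-thresholding (proximal step of the l1 norm): small weights become exactly 0
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)
        b -= lr * grad_b
    return w, b

# Synthetic data: only input 0 influences the outcome
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
y = (rng.random(1000) < sigmoid(3.0 * X[:, 0])).astype(float)
w, b = fit_l1_logistic(X, y)
odds_ratios = np.exp(w)   # multiplicative change in the odds per unit increase
```

With this setup, the coefficients of the three irrelevant inputs are expected to be exactly zero, while the informative input keeps a positive coefficient that the penalty shrinks toward zero.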

B. MULTILAYER PERCEPTRON
A multilayer perceptron (MLP) is an example of a machine learning technique that is able to find and model complex nonlinear interactions in data and, thus, often outperforms linear techniques such as the previously discussed logistic regression. It consists of an input layer, at least one hidden layer, and an output layer. Each layer consists of nodes, and except for the input nodes, all nodes are neurons with nonlinear activation functions. MLPs are fully connected, meaning that each node in one layer is connected with a certain weight $w_{ij}$ to every node in the succeeding layer. These weights are adjusted automatically during training to construct the mathematical model that most accurately maps the input features (such as the students' agency dimensions and the course in which each student was studying) to the output labels.
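A forward pass through such a network can be sketched in NumPy as follows. The layer sizes, the ReLU activation, the weight initialization, and the input encoding (11 agency dimensions plus a one-hot indicator for the 4 courses, mapped to 4 hypothetical output classes) are assumptions for illustration; training (e.g., by backpropagation) is omitted, so the weights below are untrained.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def init_mlp(sizes, seed=0):
    """Random (untrained) weights for a fully connected MLP.
    For each pair of consecutive layers, W[i, j] is the weight w_ij from
    node i of one layer to node j of the next; b holds the biases."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(scale=np.sqrt(2.0 / m), size=(m, k)), np.zeros(k))
            for m, k in zip(sizes[:-1], sizes[1:])]

def forward(params, X):
    h = X
    for W, b in params[:-1]:
        h = relu(h @ W + b)                # hidden layers: nonlinear activation
    W, b = params[-1]
    return softmax(h @ W + b)              # output layer: class probabilities

# Hypothetical inputs for 5 students: 11 agency dimensions + one-hot course (4 courses)
rng = np.random.default_rng(0)
agency = rng.normal(size=(5, 11))
course = np.eye(4)[rng.integers(0, 4, size=5)]
X = np.hstack([agency, course])            # 15 input features per student
params = init_mlp([15, 16, 4], seed=0)     # one hidden layer with 16 neurons
probs = forward(params, X)
```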
However, MLP models are generally regarded as opaque black boxes. For example, even when techniques are used to identify the features to which a particular MLP model assigned large weights, the relationships between those features and the classification can be weak, because a small perturbation in a seemingly unrelated aspect of the data can result in a substantially different weighting of features [64]. Moreover, different initial settings (e.g., random weight initializations) can result in the construction of different models [65].

C. RANDOM FOREST
Random forests, like other tree-based techniques, are among the most popular nonlinear supervised machine learning methods today [66]. They are ensemble learners based on decision trees, which, on the one hand, are explainable and able to model nonlinear relationships in data but, on the other hand, generally perform poorly because they tend to overfit the training data. By growing each tree in the ensemble (i.e., the forest) on a bootstrap sample of the original data and by considering only a random subset of the features at each node of each tree, random forests keep the main advantages of decision trees while overcoming their main disadvantage. In other words, random forests are also explainable and able to model nonlinear relationships in data but, through the bagging of many decorrelated decision trees, overcome the overfitting and low-performance issues of single decision trees. In fact, they perform so well that they are often among the winning methods in machine learning competitions [66], [67]. Nevertheless, although global, model-specific feature importances are generally provided with random forest implementations (for example, in Python, the Gini impurity decrease measures the global importance of the input features), less attention has so far been paid to local explanations for random forest predictions [66].
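The two sources of randomness that distinguish a random forest (bootstrap samples and random feature subsets) can be sketched with NumPy. To keep the example short, each tree is only a depth-1 stump, so this is a simplified forest rather than a full implementation, and the accumulated Gini decrease per feature only approximates the impurity-based importance reported by libraries such as scikit-learn. All function names and the synthetic data are hypothetical.

```python
import numpy as np

def majority(y):
    values, counts = np.unique(y, return_counts=True)
    return values[np.argmax(counts)]

def gini(y):
    # Gini impurity: 1 - sum_k p_k^2
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def fit_stump(X, y, features):
    """Best depth-1 split among `features`, scored by the decrease in Gini impurity."""
    n, parent = len(y), gini(y)
    best = (0.0, features[0], np.inf, majority(y), majority(y))  # fallback: no split
    for j in features:
        for t in np.unique(X[:, j])[:-1]:
            left = X[:, j] <= t
            child = (left.sum() * gini(y[left]) + (~left).sum() * gini(y[~left])) / n
            if parent - child > best[0]:
                best = (parent - child, j, t, majority(y[left]), majority(y[~left]))
    return best

def fit_forest(X, y, n_trees=50, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    m = max(1, int(np.sqrt(p)))                       # features tried per split
    trees, importance = [], np.zeros(p)
    for _ in range(n_trees):
        idx = rng.integers(0, n, size=n)              # bootstrap sample (with replacement)
        feats = rng.choice(p, size=m, replace=False)  # random feature subset
        gain, j, t, left_label, right_label = fit_stump(X[idx], y[idx], feats)
        trees.append((j, t, left_label, right_label))
        importance[j] += gain                         # accumulate Gini decrease per feature
    total = importance.sum()
    return trees, (importance / total if total > 0 else importance)

def predict(trees, X):
    votes = np.array([np.where(X[:, j] <= t, l, r) for j, t, l, r in trees])
    return np.array([majority(col) for col in votes.T])  # majority vote over trees

# Synthetic data: only feature 0 carries signal
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 4))
y = (X[:, 0] > 0).astype(int)
trees, importance = fit_forest(X, y, n_trees=50)
accuracy = (predict(trees, X) == y).mean()
```

As expected, the global importance concentrates on feature 0; note that such a global ranking says nothing about why a particular observation received its prediction.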

D. LOCAL INTERPRETABLE MODEL-AGNOSTIC EXPLANATIONS
Local interpretable model-agnostic explanations (LIME) is an XAI tool developed by Ribeiro et al. [68]. LIME provides explanations, such as features and rules of features, that were important for predicting a specific observation (i.e., local explanations). It is model agnostic, meaning that it can be used with any prediction model, because it does not need to know the internals of the actual ''black box'' prediction model $f$; it only uses its predictions. More specifically, it perturbs the model's inputs and then uses the model's outputs to draw conclusions about the model. The main idea is that if the model prediction changes significantly after the value of a feature is slightly adjusted, that feature may be an important predictor. Conversely, if the prediction does not change, the adjusted feature may not be important at all.
It accomplishes this by taking the observation $x$ whose prediction should be explained and perturbing its feature values. Each of these perturbed, artificial observations is weighted by its proximity to $x$. Then, the black box model $f$ is used to predict the perturbed observations, and a new surrogate (explanation) model $g$, which can be any explainable model, such as a linear model or a decision tree, is trained to reproduce these predictions as accurately as possible while keeping its complexity as low as possible. Finally, the explanations of the simple surrogate model (for example, the weights if $g$ is a linear model) are used to explain the local behavior of $f$ around $x$.
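Mathematically, this can be expressed as follows:

$$\xi(x) = \arg\min_{g \in G} \; \mathcal{L}(f, g, \pi_x) + \Omega(g),$$

where $G$ is the class of interpretable surrogate models, $\mathcal{L}(f, g, \pi_x)$ measures how unfaithfully $g$ approximates $f$ in the locality defined by $\pi_x$, $\pi_x$ is the proximity measure that defines locality around $x$, and $\Omega(g)$ is the complexity of $g$, which should be kept low (for example, by minimizing the number of non-zero weights if $g$ is a linear model).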
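To illustrate the procedure described above, here is a simplified LIME-style sketch in NumPy; it is not the `lime` package itself. Assumptions: perturbations are Gaussian noise around $x$, the proximity measure $\pi_x$ is an exponential kernel on Euclidean distance, the surrogate $g$ is a weighted linear regression, and the complexity term is not enforced (no feature selection). The black-box function and all parameter values are hypothetical.

```python
import numpy as np

def lime_like_explanation(f, x, n_samples=2000, scale=0.5, kernel_width=1.0, seed=0):
    """Fit a weighted linear surrogate g around x for a black-box model f."""
    rng = np.random.default_rng(seed)
    Z = x + rng.normal(scale=scale, size=(n_samples, x.size))  # perturbed samples around x
    y = f(Z)  # black-box predictions for the perturbed samples
    d = np.linalg.norm(Z - x, axis=1)
    w = np.exp(-(d ** 2) / kernel_width ** 2)  # proximity weights pi_x(z)
    A = np.column_stack([np.ones(n_samples), Z - x])  # intercept + features centred at x
    sw = np.sqrt(w)  # weighted least squares via row scaling
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coef[0], coef[1:]  # local intercept and local feature weights

def black_box(Z):
    # Hypothetical nonlinear model with two features
    return np.tanh(2 * Z[:, 0]) + 0.1 * Z[:, 1] ** 2

intercept, weights = lime_like_explanation(black_box, np.array([0.0, 1.0]))
print(np.round(weights, 2))
```

For this black box, the first feature receives a large positive local weight, while the second lands close to its local slope of 0.2: the feature whose perturbation moves the prediction most gets the largest weight, which is exactly the intuition behind LIME.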
The advantages of LIME are that it is relatively easy to use and understand. However, certain drawbacks are associated with it. One of these is the potential inconsistency between the surrogate model prediction $g(x)$ and the real model prediction $f(x)$. Another drawback is that the LIME values lack a reference value against which they can be compared. SHAP, which will be discussed below, overcomes these drawbacks.

E. SHAPLEY ADDITIVE EXPLANATIONS
Shapley values, introduced by Shapley [69], originate from cooperative game theory. They measure the fair payout that each player should receive based on his/her contribution to the total payout of the game: each player's payout is his/her marginal contribution to the total payout, averaged over all the orders in which the players could join the game. Similarly, when used to explain a prediction, a Shapley value measures the contribution of an individual feature value to the prediction; that is, it is the average marginal contribution of a feature value across all possible coalitions of the features.
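The "average marginal contribution" idea can be checked directly on a toy cooperative game. The sketch enumerates every order in which three hypothetical players can join and averages each player's marginal contribution; the payout table is invented purely for illustration.

```python
from itertools import permutations
from math import factorial

def shapley_by_permutations(players, v):
    """Average marginal contribution of each player over all join orders."""
    phi = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            phi[p] += v(coalition | {p}) - v(coalition)  # marginal contribution of p
            coalition = coalition | {p}
    n_orders = factorial(len(players))
    return {p: total / n_orders for p, total in phi.items()}

# Hypothetical 3-player game: payout obtained by each coalition
payout = {
    frozenset(): 0, frozenset("A"): 10, frozenset("B"): 20, frozenset("C"): 30,
    frozenset("AB"): 40, frozenset("AC"): 50, frozenset("BC"): 60, frozenset("ABC"): 90,
}
phi = shapley_by_permutations("ABC", payout.__getitem__)
print(phi)  # {'A': 20.0, 'B': 30.0, 'C': 40.0}
```

The three values sum to 90, the payout of the full coalition, which anticipates the efficiency property discussed next.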
The fair contribution of feature $i$ is obtained by averaging its marginal contribution over all the different orders (permutations) in which the coalition can be formed. Mathematically, this can be expressed as follows:

$$\phi_i(v) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(|N| - |S| - 1)!}{|N|!}\,\bigl[v(S \cup \{i\}) - v(S)\bigr],$$

where $N$ is the set of all features ($|N|$ being the number of features), $S$ a subset of $N$ that does not contain $i$, and $v(S)$ the prediction obtained with the features in $S$. When feature $i$ joins the features in $S$, its marginal contribution is $v(S \cup \{i\}) - v(S)$.

Shapley values come with four desirable properties: (i) efficiency, meaning that the sum of the Shapley values of all features equals the value of the total coalition, $\sum_{i \in N} \phi_i(v) = v(N)$ (taking $v(\emptyset) = 0$); (ii) symmetry, meaning that two features that contribute equally to every coalition receive the same Shapley value; (iii) dummy, meaning that if a feature contributes nothing to any coalition $S$, its Shapley value is zero; and (iv) additivity, meaning that for any pair of predictions $v$ and $w$, $\phi_i(v + w) = \phi_i(v) + \phi_i(w)$.

SHapley Additive exPlanations (SHAP) is an XAI tool developed by Lundberg and Lee [70] that uses these Shapley values to explain machine learning models. It includes the model-agnostic SHAP KernelExplainer, which works for any prediction model. The KernelExplainer fits a weighted linear regression using the given data, the model's predictions, and the prediction function itself; the coefficients of this local linear regression are estimates of the Shapley values and serve as the feature importance values. Besides the KernelExplainer, the SHAP tool also includes other explainers that have been optimized for specific models. One example is the TreeExplainer, which is optimized for tree-based prediction models [66]. According to Lundberg et al. [66], it is the only tool that enables the exact computation of optimal local explanations for tree-based models. The TreeExplainer can also be used as a global explanation method by averaging local explanations; for example, averaging the (absolute) local explanations over all instances in a dataset yields a global measure of feature importance.
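The following pure-Python sketch evaluates the subset form of the Shapley value, weighting each marginal contribution $v(S \cup \{i\}) - v(S)$ by $|S|!\,(|N|-|S|-1)!/|N|!$. It assumes, for illustration, that $v(S)$ is the model output when the features in $S$ take the explained instance's values and all other features are set to a single reference point; SHAP's explainers use background data in a related but more elaborate way. The model and inputs are hypothetical.

```python
from itertools import combinations
from math import factorial

def exact_shapley(f, x, baseline):
    """Exact Shapley values of f at x by enumerating all feature subsets.

    Assumption: v(S) evaluates f with features in S taken from x and
    all remaining features set to a single baseline (reference) point.
    """
    n = len(x)

    def v(S):
        z = [x[j] if j in S else baseline[j] for j in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                total += weight * (v(coalition | {i}) - v(coalition))
        phi.append(total)
    return phi

def model(z):
    # Hypothetical model with an interaction between features 0 and 1
    return 2 * z[0] + z[1] + 3 * z[0] * z[1] - z[2]

x, base = [1.0, 2.0, 0.5], [0.0, 0.0, 0.0]
phi = exact_shapley(model, x, base)
# Efficiency: base value plus the Shapley values equals the prediction
print(round(model(base) + sum(phi), 6), model(x))  # 9.5 9.5
```

Here the interaction term $3 z_0 z_1$ contributes 6 and is split equally between features 0 and 1, giving Shapley values of 5, 5, and -0.5. Exact enumeration is exponential in the number of features, which is why approximations such as the KernelExplainer, or model-specific algorithms such as the TreeExplainer, are used in practice.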

IV. APPLICATION OF EXPLAINABLE STUDENT AGENCY ANALYTICS
In this section, we present the results from an application of XSAA in higher education. All the analytics were performed in Python 3.8.2, using LIME and SHAP toolboxes.
Sample and Study Context: Four mathematics courses (A1-A4) for first-year engineering students (n = 141) in a Finnish higher education institution (university of applied sciences, ISCED Level 6) were studied. Each course had a different responsible teacher but the same basic contents and learning goals. Overall, the teaching arrangements were mostly traditional: lectures and guided exercises in a classroom, plus homework. The courses also included instructional videos, automatic tests that guided the students depending on their answers, and a final test. In addition to class hours, teachers sent emails to the whole student group via the virtual learning environment, and personal messages between teachers and students were exchanged by email. In all the courses, mid-term feedback was collected, and depending on the results, some small modifications were made (for example, more time was allocated to topics the students found challenging). All the courses also had voluntary support classes guided by the teacher.
The courses also differed in some practices. Attendance affected the evaluation in one course (A2). Two courses (A1 and A4) used continuous self-assessment: one based on homework and its model solutions (A1), and the other based on the results of automatic tests in the learning environment (A4). One course (A3) had extra support hours guided by a student assistant. In one course (A4), the students had the opportunity to receive a small amount of personal guidance from the teacher if necessary. Moreover, this course (A4) included weekly application tasks on the topics practiced, and the students worked in small teams.
Analysis Between Prototypes: Prototypical student agency profiles were created using clustering. The different prototypical agency profiles (PAP1-PAP4) and the general agency profile (GAP), i.e., the profile of all the analyzed students, are presented in Figure 3. All the agency dimensions maintain the order from the lowest profile, PAP1, to the highest profile, PAP4. In general, the relational resources of student agency (equal treatment, trust for the teacher, and teacher support) were experienced as the highest resource domain, with values above 4 in all profiles except PAP1. Three of the participatory resources (participation activity, opportunities to make choices, and opportunities to influence) were generally experienced as lower than the other resources in all the profiles. The remaining participatory resources and the personal resources were experienced close to the factor value of 4 at the GAP level. PAP1 was particularly characterized by low personal resources.
Analysis Between Courses: The analysis between courses revealed differences in student agency between the four course instances (A1-A4). Figure 4 presents box plots of each student agency dimension in each course instance. Pairwise comparisons using the Mann-Whitney U statistic revealed statistically significant differences in all the dimensions. In particular, trust for the teacher, teacher support, and opportunities to influence were experienced as significantly lower in course instance A3 than in the other courses.
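The Mann-Whitney U statistic behind these pairwise comparisons simply counts, over all pairs of students from the two courses, how often a score from the first course exceeds one from the second (ties count as one half). A minimal sketch with hypothetical scores follows; in practice a library routine such as SciPy's `scipy.stats.mannwhitneyu` would also provide the p-value, which this sketch omits.

```python
def mann_whitney_u(a, b):
    """U statistic for sample a versus sample b (ties count as one half)."""
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u

# Hypothetical factor scores of one agency dimension in two course instances
course_1 = [4.2, 3.8, 4.5, 4.0]
course_2 = [3.1, 3.8, 2.9]
u1 = mann_whitney_u(course_1, course_2)
u2 = mann_whitney_u(course_2, course_1)
print(u1, u2, u1 + u2 == len(course_1) * len(course_2))  # 11.5 0.5 True
```

Because the test uses only ranks, it does not assume normally distributed scores, which suits bounded, Likert-based factor values.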
We also examined whether any prototypical profiles were dominant in each of the courses (Table 1). A chi-square test of the contingency table showed statistically significant differences, $\chi^2(9, n = 141) = 30.1$, $p < .001$. More students were assigned to the higher agency profiles PAP3-PAP4 in courses A1 and A4. In course A4, no students were observed in the low agency profile PAP1. In course A3, the majority of the students were in profiles PAP1-PAP3, and only 5% were in the high agency profile. In A2, roughly equal numbers of students were assigned to each PAP.
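The degrees of freedom reported above follow from the table's shape: with four courses and four profiles, $(4-1)(4-1) = 9$. A minimal sketch of Pearson's chi-square statistic for an $r \times c$ contingency table is shown below; the example tables are hypothetical and are not the counts of Table 1, and p-values (which require the chi-square distribution, e.g., via SciPy's `scipy.stats.chi2_contingency`) are omitted.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected under independence
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof

# Hypothetical 2 x 2 table of counts
stat, dof = chi_square_independence([[10, 20], [20, 10]])
print(round(stat, 3), dof)  # 6.667 1

# A 4 x 4 table (courses x profiles) always has 9 degrees of freedom
print(chi_square_independence([[1] * 4 for _ in range(4)])[1])  # 9
```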
Prediction Results: In contrast to earlier work, we not only created the student agency profiles but also built models predicting these profiles. These models, their global model-specific explanations, and the local model-agnostic LIME and SHAP explanations built on top of them allow us to identify the most important characteristics explaining why certain students are assigned to certain profiles. To predict the multinomial class label (i.e., the agency profile), we used all 15 features: the 11 agency dimensions and four binary course features obtained by one-hot encoding the course variable.
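The construction of the 15-dimensional feature vector can be sketched as follows. The function name and the placeholder agency scores are hypothetical; only the 11 + 4 layout comes from the text.

```python
COURSES = ["A1", "A2", "A3", "A4"]

def build_features(agency_scores, course):
    """Concatenate the 11 agency dimension scores with a one-hot course encoding."""
    if len(agency_scores) != 11:
        raise ValueError("expected 11 agency dimension scores")
    one_hot = [1 if course == c else 0 for c in COURSES]  # exactly one 1 per student
    return list(agency_scores) + one_hot

features = build_features([4.0] * 11, "A3")  # placeholder scores, not real data
print(len(features), features[-4:])  # 15 [0, 0, 1, 0]
```

One-hot encoding lets a linear model assign a separate coefficient to each course (as in Figure 9) instead of treating the course labels as an ordered numeric scale.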
To estimate and compare the models for the supervised task (i.e., predicting the student profile), we divided the data with a stratified split into a training set (80%) and an independent test set (20%). Then, we used stratified fivefold cross-validation on the training set to select the best hyperparameters for the classifiers. We compared multinomial logistic regression (MLR) with $\ell_1$, $\ell_2$, and elastic-net penalization, random forest, and MLP classification models for predicting the agency profile. Table 2 summarizes the best model for each classifier, as determined by the fivefold cross-validation on the training set, and its performance on the independent test set. As shown in the table, the two nonlinear classifiers (random forest and MLP) outperformed the three linear classifiers. Overall, random forest was the best-performing classifier, and multinomial logistic regression with $\ell_1$ penalization was the best linear classifier.
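The splitting scheme can be sketched in plain Python as follows. This is an illustration only: the helper names are our own, the labels are synthetic (the actual class distribution is not reproduced here), and in practice one would typically use scikit-learn's `train_test_split(..., stratify=y)` and `StratifiedKFold`.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), keeping each class's share approximately equal."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    train, test = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_fraction)  # per-class test quota
        test.extend(idxs[:n_test])
        train.extend(idxs[n_test:])
    return sorted(train), sorted(test)

def stratified_folds(labels, k=5, seed=0):
    """Assign positions to k folds, dealing each class out round-robin."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    counter = 0  # continues across classes so fold sizes stay balanced
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for idx in idxs:
            folds[counter % k].append(idx)
            counter += 1
    return folds

labels = ["PAP%d" % (i % 4 + 1) for i in range(141)]  # synthetic labels
train_idx, test_idx = stratified_split(labels)
train_labels = [labels[i] for i in train_idx]
folds = stratified_folds(train_labels)  # positions within train_labels
print(len(train_idx), len(test_idx), [len(f) for f in folds])  # 113 28 [23, 23, 23, 22, 22]
```

Stratification matters here because, with only 141 students, an unlucky random split could leave a profile nearly absent from the test set or from a validation fold.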
137452 VOLUME 9, 2021
FIGURE 4. Student agency dimensions in each course instance and pairwise statistical significance using Mann-Whitney U statistics. As usual, * corresponds to p < 0.05, ** to p < 0.01, and *** to p < 0.001.
Global Explanations: Since random forest was the best classifier overall and the multinomial logistic regression with $\ell_1$ penalization the best linear classifier, we focused on these two models to explain the prediction results. Figure 8 shows the coefficients of the multinomial logistic regression with $\ell_1$ penalization predicting the highest agency profile PAP4, and Figure 9 shows its coefficients for all four agency profiles. The figures illustrate that, overall, the agency dimensions seem more important for the prediction model than the course variables. However, being in a certain course can also increase or decrease the probability of belonging to a particular agency profile. For example, being in course A1 decreases the probability of belonging to agency profile PAP2 and increases the probability of belonging to agency profile PAP3 (see Figure 9). Figure 10 shows the feature importance levels of the random forest model predicting the agency profile. In contrast to the coefficients of the multinomial logistic regression, the feature importance levels of the random forest are always non-negative and do not encode which class a feature is indicative of. They can tell us that a certain feature is important, but not whether it is indicative of a student having agency profile PAP1, PAP2, PAP3, or PAP4. Moreover, they provide no information on whether a high feature value increases or decreases the probability of a certain class; they just summarize the importance of each feature for the whole model. By combining the local SHAP values of all the students (the individual local explanations are presented in the next section), we can also obtain global SHAP explanations for a model. This is shown in Figure 5 for the random forest classification model. As the figure shows, a student's competence beliefs were the most important feature for the model, especially when determining whether he/she belongs to the lowest agency profile (PAP1). This model-agnostic explanation agrees with the model-specific feature importance levels (in Figure 10, competence beliefs were also the most important feature) but is more informative, as it also shows which features are important for each profile.
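Assuming the local SHAP values are stored as one array of shape (students, features, profiles), the global per-profile importance can be obtained by averaging absolute values over students, as in the NumPy sketch below. The random array merely stands in for real SHAP output; note that, depending on the version, the `shap` package may return multiclass values as a list of per-class arrays rather than a single 3-D array.

```python
import numpy as np

def global_shap_importance(shap_values):
    """Mean absolute SHAP value per feature and class.

    shap_values: array of shape (n_students, n_features, n_classes)
    holding the local explanations for every student and every profile.
    """
    return np.abs(shap_values).mean(axis=0)  # shape (n_features, n_classes)

rng = np.random.default_rng(1)
local = rng.normal(size=(141, 15, 4))  # placeholder values, not real results
per_class = global_shap_importance(local)
overall = per_class.sum(axis=1)  # total bar length if the per-class parts are stacked
ranking = np.argsort(overall)[::-1]  # features ordered from most to least important
print(per_class.shape, ranking[:3])
```

Keeping the class axis is what distinguishes this from the random forest's single importance vector: each feature's bar can be split into the parts attributable to PAP1, PAP2, PAP3, and PAP4.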
Local Explanations: As explained in Section III, local explanations enable us to explain why a certain student received his/her prediction and how the individual predictors contributed to it. Global feature importance, as discussed above, only shows the results across the entire population. To explain the model predictions for particular students, we used the true positives with the highest probability for each agency profile; that is, the four students from the test set whom the model correctly predicted to be PAP1, PAP2, PAP3, and PAP4, respectively, with the highest probability. Table 3 summarizes these local explanations for the random forest model. As already seen in the global model-specific explanations (Figure 10), opportunities to influence was one of the most important variables for the random forest model. However, Table 3 also shows for which profiles this variable was especially important (namely, agency profiles PAP2, PAP4, and especially PAP3).
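Selecting, for each profile, the correctly predicted test student with the highest predicted probability can be sketched as follows. The function name and the five-student example are hypothetical.

```python
import numpy as np

def most_confident_true_positives(y_true, proba):
    """For each class, index of the correctly predicted sample with the highest probability.

    y_true: integer class labels, shape (n,)
    proba:  predicted class probabilities, shape (n, n_classes)
    Returns a dict mapping class -> sample index (None if the class has no true positive).
    """
    y_pred = proba.argmax(axis=1)
    result = {}
    for c in range(proba.shape[1]):
        candidates = np.flatnonzero((y_true == c) & (y_pred == c))  # true positives of class c
        if candidates.size == 0:
            result[c] = None
        else:
            result[c] = int(candidates[np.argmax(proba[candidates, c])])
    return result

# Hypothetical test-set outputs for five students and four profiles
y_true = np.array([0, 1, 1, 2, 3])
proba = np.array([
    [0.70, 0.10, 0.10, 0.10],
    [0.20, 0.60, 0.10, 0.10],
    [0.05, 0.88, 0.05, 0.02],
    [0.10, 0.50, 0.30, 0.10],  # misclassified: predicted class 1, true class 2
    [0.10, 0.10, 0.20, 0.60],
])
print(most_confident_true_positives(y_true, proba))  # {0: 0, 1: 2, 2: None, 3: 4}
```

Returning `None` makes explicit that a profile may have no true positive in a small test set, in which case no representative student can be explained for it.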
The LIME rules can also be presented visually. Figure 11 shows the LIME rule visualization for the student whom the random forest model predicted to have profile PAP2 with the highest probability. For comparison, Figure 6 shows the SHAP local explanations for the same model and student. This plot provides a more comprehensive overview of the explanation of the prediction than the LIME rules.
More specifically, as Figure 6 shows, the model predicted an 88 percent chance that this student was a PAP2 student, whereas the base value (i.e., the prediction if nothing were known about this student) for PAP2 was a 29 percent chance. Feature values that increase the prediction are shown in red, and their visual size shows the magnitude of the feature's effect. The biggest impact comes from opportunities to influence, which is 3.16 for this student. Feature values that decrease the prediction are shown in blue. As Figure 6 shows, the fact that this student is in course A1 had a meaningful effect, decreasing the prediction. The model predicted only tiny probabilities that this student was a PAP1 or PAP3 student, as his/her competence beliefs are lower than those of PAP3 students and higher than those of PAP1 students. Subtracting the total length of the blue bars from the total length of the red bars gives the distance from the base value to the output. This means that the base value plus the sum of the individual effects adds up to the prediction, as discussed in Section III.
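As a small worked check of this additivity, using only the two probabilities reported above for this student (the individual bar lengths are not reproduced here), the red contributions $\phi_j$ and the blue contributions $\phi_k$ must satisfy:

$$0.29 + \sum_{j \in \text{red}} \phi_j - \sum_{k \in \text{blue}} |\phi_k| = 0.88 \quad\Longrightarrow\quad \sum_{j \in \text{red}} \phi_j - \sum_{k \in \text{blue}} |\phi_k| = 0.59.$$

That is, taken together, the feature values of this student shift the PAP2 prediction 59 percentage points above the base value.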
Local Explanations for the Student Needing the Most Support: The local explanations also enable us to identify the students who need support the most and to obtain explanations of which factors could bring about a change toward higher agency. Based on Table 1 and Figure 9, we conclude that the students in course A3 needed the most support. Since profile PAP1 represents the lowest agency profile, we chose for the local explanations the student from the test set who was in course A3 and was predicted to have profile PAP1 with the highest probability. Figure 7 shows the SHAP values explaining why this student was assigned to profile PAP1. As Figure 7 illustrates, the base value of the prediction in the absence of any information on the independent variables is 0.2138. Knowing that the competence beliefs of this student are only 1.907 increased the prediction that this student is PAP1 by 0.222, and knowing that the self-efficacy value of this student is 1.878 increased the prediction for profile PAP1 by another 0.176 (see Table 4).
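Applying the same additivity to the two contributions reported in this paragraph gives a partial sum; any remaining distance to the model output comes from the other features, whose contributions are not listed here:

$$\underbrace{0.2138}_{\text{base value}} + \underbrace{0.222}_{\text{competence beliefs}} + \underbrace{0.176}_{\text{self-efficacy}} = 0.6118.$$

These two low personal-resource values thus account for a rise in the predicted PAP1 probability from about 21 percent to about 61 percent.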

A. SUMMARY AND DISCUSSION OF RESULTS
Our results can be summarized at the application level and at the methodological level. At the application level, we can conclude that the level of student agency was higher in the two courses, A1 and A4, in which continuous task-driven self-assessment took place. No students in course A4 were in the lowest agency profile PAP1, and the majority of the students in A1 and A4 were in the higher agency profiles PAP3 and PAP4. One reason for the students' generally high sense of agency in course A4 might be the personal guidance that the teacher offered in the course. Furthermore, a joint analysis of Figure 8, Figure 5, and Table 3 suggests that students who found support from their peers and experienced opportunities to influence and participate in the course tended to have higher agency profiles.
From the teacher's perspective, the XSAA results could provide insight for pedagogical planning. For example, the students in course A1 seem to have received an adequate amount of support and attention from the teacher, as they rated the relational resources high, and those resources were among the most important resource areas for the second highest agency profile PAP3 (Figure 3, Figure 5, and Table 3). To foster the agency of the PAP2 and PAP3 students in A1, the teacher could provide low-threshold ways to participate, because the participatory resources were important in the highest profile PAP4. In addition, suggestions for improving pedagogical planning could be derived by analyzing the characteristics of the students in the lowest agency profile PAP1. The findings suggest that low self-efficacy and low competence beliefs are important common denominators for students in PAP1 (Figure 3, Figure 5, and Figure 7). As there were many PAP1 students in course A3, these students might benefit from more extensive encouragement as well as more attention and support in understanding the course contents (cf. [71]).
TABLE 3. LIME rules explaining, for each profile, the true positive student from the test set predicted with the highest probability by the random forest model. For each student, the rules are ordered by importance, with the most important rule first.
FIGURE 6. SHAP values explaining why the random forest model predicted an agency profile 2 student from the test set to be profile PAP2 and not profile 1, 3, or 4 (the bars are ordered by profile number; i.e., the first bar predicts PAP1, the second PAP2, and so on). For each bar, the values explain how to get from the base value that would be predicted if no feature values were known to the current output for this particular profile 2 student. Feature values causing increased predictions are in red, and feature values decreasing the prediction are in blue. Their visual size shows the magnitude of the feature's effect.
At the methodological level, our results showed that the complex nonlinear methods, especially the random forest, yielded more accurate predictive models. The traditional linear techniques performed worse but came with more informative global model-specific explanations. For example, while the global model-specific explanations from the random forest simply provided a ranking of the input features, those of the logistic regression with $\ell_1$ penalization also showed which feature was important for which class and in which direction (i.e., whether it increased or decreased the probability of this class). Moreover, several features were dropped from the model, making it sparser and more interpretable.
FIGURE 7. SHAP values explaining why the random forest model predicted an agency profile PAP1 student studying in course A3 from the test set to be profile PAP1 (true positive). The most important explanations are the low competence beliefs and self-efficacy values of this student.
FIGURE 8. Coefficients of the multinomial logistic regression with $\ell_1$ penalization predicting the highest agency profile (PAP4). For seven features, the coefficient is zero, meaning they were irrelevant for this prediction model. A high value in any of the selected features (except CourseA3) increases the probability that a student will be assigned to PAP4. However, if the student is in course A3, the probability that he/she will be assigned to PAP4 decreases.
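Why $\ell_1$ penalization yields exactly zero coefficients, as for the seven features in Figure 8, can be seen from the soft-thresholding operator that coordinate-descent and proximal-gradient solvers apply at each coefficient update. This is a simplified sketch with hypothetical coefficients and penalty strength; the actual solver used for the multinomial model may differ in its details.

```python
def soft_threshold(beta, lam):
    """Proximal operator of lam * |beta|: shrinks toward zero and zeroes small values."""
    if beta > lam:
        return beta - lam
    if beta < -lam:
        return beta + lam
    return 0.0  # coefficients with |beta| <= lam are set exactly to zero

# Hypothetical unpenalized coefficients and penalty strength
raw = [0.9, -0.05, 0.02, -0.6, 0.1]
penalized = [soft_threshold(b, 0.15) for b in raw]
print(penalized)
```

Weak coefficients (here the second, third, and fifth) are set exactly to zero rather than merely shrunk, which is what makes the $\ell_1$ model sparse; an $\ell_2$ penalty only shrinks coefficients proportionally and almost never zeroes them.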
Through recently developed model-agnostic XAI tools, we were also able to explain the better-performing classifiers. LIME and SHAP can be used on top of any (complex) classifier to explain predictions for particular students (local explanations). These local explanations are important mainly for two reasons. First, the GDPR now includes a right to explanation [57]. This means that if automatic profiling is used in an LA tool, the student has a right to receive an explanation of his/her particular profiling.
Second, the local and global explanations can differ, so the global explanations alone are not enough to explain why a particular student was assigned to a certain profile. For example, according to Figure 5, the most important agency dimensions for PAP2 (judged visually from the lengths of the orange bars) are opportunities to influence, competence beliefs, and then trust for the teacher and self-efficacy. However, according to the LIME rules for the student in the test set who was assigned to PAP2 with the highest probability (Table 3), the order of importance of the agency dimensions was participation activity, opportunities to influence, peer support, and then trust for the teacher.
FIGURE 9. Coefficients of the multinomial logistic regression with $\ell_1$ penalization predicting agency profiles PAP1-PAP4. As a whole, the course features seem less important than the agency dimensions, but they still contribute. For example, if a student is in course A1, the probability that he/she will have the second highest agency profile (PAP3) increases.
FIGURE 11. LIME rules explaining why the random forest model predicted an agency profile PAP2 student from the test set to be profile PAP2 (i.e., a true positive) with the highest probability. The most important local explanations for why this student was assigned to this profile are his/her participation values.
In other words, the LIME rules (including those for the students who are representative of their PAP profiles) do not always resemble the global explanations (Figure 5). For example, for the particular PAP2 student analyzed in Table 3, the value of participation activity was extremely low (2.69, see Figure 11), and the local surrogate model built by LIME to explain this prediction relied heavily on this feature. This exemplifies the ''local fidelity'' of LIME: LIME explanations can be trusted only locally around the specific instance being explained. In contrast, because of their additivity, the local SHAP explanations can be combined to explain the global behavior of the model (Figure 5), and they are therefore more in line with the global model-specific explanations (Figure 10).
Naturally, our results are limited by the relatively small amount of data. Further data collection is required to increase the reliability of the observed connections between student agency and course implementations in higher education. In this paper, we have established the foundations for the use of XAI techniques in analyzing student agency. Further work is required to examine, for example, the causal relationships between teaching practices and student agency.

V. CONCLUSION
Student agency is a key construct in the contemporary discourse about student-centered learning in higher education [3]-[5]. Jääskelä et al. [9] developed an LA process called student agency analytics (SAA), which utilizes a psychometric questionnaire instrument [35] and machine learning to provide information about the different resources of student agency. The recent literature on LA has highlighted the importance of explainability when utilizing complex models in education (e.g., [11], [14], [72]). In this study, we employed XAI techniques to derive more detailed information from student agency data. The purpose was to illustrate how the SAA process, combined with XAI techniques, could advance teachers' pedagogical awareness and reflection.
The purpose of XAI techniques is to help users understand how and why a model works. We used the multinomial logistic regression coefficients, the feature importance levels of the random forest model, and combined SHAP values to explain the essential characteristics of the different agency profiles (global explanations). The prediction of the student profiles showed that the nonlinear techniques (especially random forest) modeled the data best. This finding suggests that the relationships between the prototypical profiles of student agency and the teaching practices in higher education are relatively complex. Local explanations gave insight into why a student was assigned to a particular agency profile. Altogether, the XSAA results could be used to derive tentative explanations of the different experiences of student agency and to suggest ideas for pedagogical planning, as summarized in Section IV-A.
Educators at all levels of education need to take steps toward supporting student agency. To promote educators' efforts, Moses et al. [4] called for connecting theory and practice and suggested increasing research and practitioner-focused work on how teachers could support student agency. They emphasize that student agency ''is a practice-embedded construct that shapes the daily work of educators'' by involving them in reflecting on the ways to create agentic spaces for students and in making pedagogical decisions based on that reflection [4]. We believe that this kind of teacher reflection, pedagogical planning, and sharing of experiences of agency-supporting practices among colleagues could be facilitated using research-based tools and explainable SAA. These tools could help teachers detect and understand the different experiences of student agency in their courses.
In summary, explainable models can provide more detailed and meaningful information about the different dimensions of student agency. By getting an overview of the different experiences of student agency in their courses, teachers could better meet the practical challenges of supporting student agency. Furthermore, higher education institutions could better adapt their capabilities to different learners' needs now and in the future. Thus, XSAA has the potential to contribute to teachers' pedagogical planning through the LA cycle.
MIRKA SAARELA received the degree in computer science from the University of Passau, Germany, and the Ph.D. degree in mathematical information technology from the University of Jyväskylä, Finland. She is currently a Postdoctoral Research Fellow with the Faculty of Information Technology, University of Jyväskylä. Her research interests combine machine learning, explainable artificial intelligence, cognitive computing, and education. She has been awarded several grants for her research by, among others, the Ulla Tuomisen Foundation, the Savings Bank, and the Ella and Georg Ehrnrooth Foundation.
VILLE HEILALA received the master's degree in education and the master's degree in computer science. He is currently pursuing the Ph.D. degree with the Faculty of Information Technology, University of Jyväskylä, where he is also a Researcher. His research interests include learning analytics and student agency.
PÄIVIKKI JÄÄSKELÄ received the Ph.D. degree in education. She is currently an Adjunct Professor and a Senior Researcher with the Finnish Institute for Educational Research, University of Jyväskylä, Finland. She is the main Developer of a validated scale, the Agency of University Students (AUS) scale, which assesses university students' agency within a multidimensional framework that takes into account the interplay of individual, relational, and participatory aspects of agency. She has worked for several years as a Responsible Researcher in various university-level projects on teacher development and the development of the pedagogy of university studies. She has extensive experience as a Teacher in educational sciences and a Teacher Educator at the University of Jyväskylä. Her research interests include university student agency, learner-centered pedagogy, and teacher development in higher education.
ANNE RANTAKAULIO has been a Teacher in mathematics with the Department of Technology and Transport, JAMK University of Applied Sciences, since 1986. As the teaching tools have evolved and e-learning has increased, she has become particularly interested in what learning really is and how it can be best supported.