To Opt in or to Opt Out? Predicting Student Preference for Learning Analytics-Based Formative Feedback

Teachers’ work is increasingly augmented with intelligent tools that extend their pedagogical abilities. While these tools may have positive effects, they require use of students’ personal data, and more research into student preferences regarding these tools is needed. In this study, we investigated how learning strategies and study engagement are related to students’ willingness to share data with learning analytics (LA) applications and whether these factors predict students’ opt-in for LA-based formative feedback. Students (N = 158) on a self-paced online course set their personal completion goals for the course and chose to opt in for or opt out of personalized feedback based on their progress toward their goal. We collected self-reported measures regarding learning strategies, study engagement, and willingness to share data for learning analytics through a survey (N = 73). Using a regularized partial correlation network, we found that although willingness to share data was weakly connected to different aspects of learning strategies and study engagement, students with lower self-efficacy were more hesitant to share data about their performance. Furthermore, we could not sufficiently predict students’ opt-in decisions based on their learning strategies, study engagement, or willingness to share data using logistic regression. Our findings underline the privacy paradox in online privacy behavior: theoretical unwillingness to share personal data does not necessarily lead to opting out of interventions that require the disclosure of personal data. Future research should look into why students opt in for or opt out of learning analytics interventions.

The challenges of LA-based personalized feedback are mainly the challenges of feedback in general. Both the feedback sender and the receiver contribute to the success of feedback. Jonsson [15] has presented five reasons why students may not use feedback: (1) it may not be useful; (2) it may not be sufficiently individualized; (3) it may be too authoritative; (4) students may lack strategies for using feedback; and (5) students may not understand the terminology used. Furthermore, Price and colleagues [16] suggest that students' 'readiness to engage' (i.e., motivation to receive feedback, emotional response, and assessment literacy skills) may contribute to their engagement with feedback. For example, students may misinterpret feedback if they do not understand the difference between formative and summative assessment [16]. However, some challenges are unique to LA-based personalized feedback. For example, while automation decreases the human effort needed to provide feedback, there may be a temptation to send feedback more often than would be optimal from a student's perspective. Furthermore, Lim and colleagues [17] found that some students whose study strategy was based on offline activities felt that the feedback overemphasized engagement with online learning tasks.

Finally, it is essential to note that feedback may also have a negative impact. In particular, feedback administered in a controlling manner may harm intrinsic motivation [18]. While feedback is one of the areas where intelligent technologies have the most potential to augment teachers' abilities, caution is needed not to scale up any adverse effects.

While students are increasingly aware of data mining being used to monitor and influence buying behavior [19], they do not necessarily expect the same in an educational context [20].
Students trust not-for-profit higher education institutions more than for-profit corporations and are comfortable with practices in a university setting that they were skeptical of in a corporate environment [21]. Interesting questions are how aware students are of educational institutions using their data, which data they are willing to share with the institution and for which purposes, and whether students have agency in deciding how their information is used.

Jones and colleagues [21] found that undergraduate students at U.S. higher education institutions lack awareness of analytic practices and the data they rely on. Several students encountered the idea of the university collecting and analyzing information about them for the first time during the study [21]. In Australia, Roberts and colleagues [20] found that most students were unaware or unsure of what big data and learning analytics were. Furthermore, in Finland, Teräs and colleagues [22] found that most students did not know what data their institution collected from them and what purposes the data were used for.

[...]

Overall, there is a tendency toward greater student participation in the deployment of learning technologies that utilize student data. Opt-in procedures are only one part of student agency in LA, yet an essential one.

Technological tools and learning analytics (LA) augment teachers' capabilities for providing formative feedback to support students' self-regulated learning. According to Hattie and Timperley [3], feedback regarding the processing of the task and self-regulation is more powerful than feedback regarding the task or the student as a person. Furthermore, Lim and colleagues [17] suggest providing students with feedback on their time management and learning strategies to further enhance personalized feedback.

There is an increasing focus on student agency and the ethical aspects of LA interventions in the LA literature. Opt-in procedures with an opportunity to review one's decision later are recommended [26]. However, little is known about how students use the possibilities for opting in to or out of LA interventions and how different student characteristics are associated with opt-in behavior. In the present study, we addressed this gap by investigating the interactions between student characteristics and opt-in behavior in a setting where students can manage their opt-in as self-service.

The first aim of our study was to examine the complex interactions between learning strategies, study engagement, and students' willingness to share data with LA applications. We were especially interested in whether it matters what data the students are requested to share: are different components of learning strategies and study engagement associated with willingness to share specific data types? We also investigated to what extent students are willing to share their data. We hypothesized that students are less willing to share fine-grained data revealing their behavior than other data types.

The second aim of our study was to understand how willingness to share data, learning strategies, and study engagement affect students' opt-in for LA-based formative feedback. Given the privacy paradox, students' low self-reported willingness to share data does not necessarily lead to opting out of LA interventions. We investigated whether students' self-reported willingness to share data, learning strategies, or study engagement predicted their initial opt-in for LA-based formative feedback or their choice to change their opt-in status later during the course.

The context of the present study is an undergraduate-level [...] In this study, we used data on students' course progress [...] (see Table 1 for examples). The number of feedback messages received [...]

The Schoolwork Engagement Inventory (EDA) consists of nine items that load onto three factors: energy (3 items; e.g., 'When I study, I feel I'm bursting with energy'), dedication (3 items; e.g., 'I find my studies full of meaning and purpose'), and absorption (3 items; e.g., 'Time flies when I'm studying'). The items were rated on a scale ranging from 1 ('Totally disagree') to 6 ('Totally agree').

The Sharing of Data (SOD) questionnaire focuses on whether students are willing to disclose specific data types (28 items; e.g., 'medical information,' 'records of my downloads in the learning environment') to a learning analytics system ('Please indicate whether you would agree to disclose the following data for a Learning Analytics system'). Items were rated on a Thurstone scale from 0 ('I do not agree') to 1 ('I agree').

Before starting the first module of the course, the students needed to fill out a course enrollment questionnaire in which they selected their Course Completion Goal ('The course can be completed on your own schedule. In which month do you plan to complete the whole course?') and Opt-in for Personalized Feedback ('I WOULD LIKE to receive automated personalized feedback messages' or 'I DO NOT want to receive automated personalized feedback messages'). As students could change their answers throughout the course, we created two dummy variables: Initial opt-in and Opt-in changed.

We applied exploratory factor analysis with generalized least squares estimation and oblique promax rotation to the Sharing of Data (SOD) items using JASP software (Version 0.16) [39]. We found a structure with four factors. The first factor ('Performance data') comprised ten items (e.g., 'school history records', 'motivation questionnaire [...]

Regarding our second research aim, we ran Mann-Whitney U-tests in JASP to determine any statistically significant differences between the opt-in and opt-out groups and between the opt-in changed and no changes groups. Then, we built binomial logistic regression models with Initial opt-in as the dependent variable using the glm function in the R stats package.
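The group comparisons described above can be sketched as follows. This is an illustrative scipy stand-in for the JASP analysis, not the authors' code; the group sizes follow the study, but the scores themselves are synthetic.

```python
# Illustrative Mann-Whitney U test comparing opt-in and opt-out groups.
# Group sizes mirror the study (121 opted in, 37 opted out); the score
# distributions are synthetic, not the study's data.
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(3)
opt_in = rng.normal(3.6, 0.8, size=121)   # e.g., self-efficacy ratings
opt_out = rng.normal(3.4, 0.8, size=37)

# Two-sided test of whether one group tends to have higher scores.
stat, p_value = mannwhitneyu(opt_in, opt_out, alternative="two-sided")
```

The Mann-Whitney U test is a sensible choice here because it compares ranks rather than means and so does not assume normally distributed Likert-scale scores.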

As there were considerably more students opting in than opting out, we built models with both the original data and weighted data, in which we weighted students opting out using a 4:1 ratio. We first built a naive model using all the survey variables as predictors and then chose a model by AIC in a stepwise algorithm using the stepAIC function in R. This resulted in a total of four models (original naive, original stepwise, weighted naive, and weighted stepwise). Sensitivity, specificity, and balanced accuracy metrics were calculated for each model. The small number of students who changed their opt-in status during the course prevented us from building a model with Opt-in changed as the dependent variable.
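The modeling pipeline above (weighted binomial logistic regression, stepwise selection by AIC, and balanced accuracy) can be sketched as follows. This is an illustrative numpy re-implementation with synthetic data, not the authors' R code (which used glm and stepAIC); the predictors and effect sizes are invented for demonstration.

```python
# Sketch: weighted logistic regression (Newton-Raphson), backward
# stepwise selection by AIC, and balanced accuracy. Synthetic data.
import numpy as np

def fit_logit(X, y, w):
    """Weighted logistic regression; returns (beta, AIC, fitted probs)."""
    Xd = np.column_stack([np.ones(len(y)), X])  # prepend intercept
    beta = np.zeros(Xd.shape[1])
    for _ in range(50):  # Newton-Raphson iterations
        p = 1 / (1 + np.exp(-Xd @ beta))
        grad = Xd.T @ (w * (y - p))
        hess = (Xd * (w * p * (1 - p))[:, None]).T @ Xd
        beta += np.linalg.solve(hess, grad)
    p = 1 / (1 + np.exp(-Xd @ beta))
    ll = np.sum(w * (y * np.log(p) + (1 - y) * np.log1p(-p)))
    return beta, 2 * len(beta) - 2 * ll, p  # AIC = 2k - 2 logL

rng = np.random.default_rng(0)
n = 158
X = rng.normal(size=(n, 4))  # four standardized survey predictors
y = rng.binomial(1, 1 / (1 + np.exp(-(1.2 + 0.3 * X[:, 0]))))  # ~77% opt in
w = np.where(y == 0, 4.0, 1.0)  # weight the opt-out minority 4:1

# Backward stepwise: drop a predictor whenever doing so lowers the AIC.
cols = list(range(X.shape[1]))
_, best_aic, p = fit_logit(X[:, cols], y, w)
improved = True
while improved and cols:
    improved = False
    for c in list(cols):
        reduced = [k for k in cols if k != c]
        _, aic, pr = fit_logit(X[:, reduced], y, w)
        if aic < best_aic:
            cols, best_aic, p = reduced, aic, pr
            improved = True
            break

pred = (p > 0.5).astype(int)
sensitivity = ((pred == 1) & (y == 1)).sum() / max((y == 1).sum(), 1)
specificity = ((pred == 0) & (y == 0)).sum() / max((y == 0).sum(), 1)
balanced_accuracy = (sensitivity + specificity) / 2
```

The 4:1 weighting shifts the fitted intercept downward, so more borderline cases are classified as opt-out; this trades some sensitivity for specificity, which is why balanced accuracy rather than raw accuracy is the appropriate metric for the imbalanced groups.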

Most students (N = 121, 77% initially; N = 114, 72% after changes during the semester) opted in for LA-based feedback regarding their progress toward their goals. Eight students who initially opted in decided to opt out later, whereas one student who initially opted out chose to opt in later.

In Table 2, we present the descriptive statistics, internal consistencies (i.e., Cronbach's alphas), and pairwise correlations of the measures used in the study. Correlation analyses revealed multiple significant correlations between the self-report measures. All six variables regarding learning strategies and study engagement were positively correlated (r ≥ .40). Moreover, nearly all sharing of data variables were positively correlated (r ≥ .35), the nonsignificant connection between Demographic data (DEM) and Sensitive data (SEN) being the only exception.
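For reference, the internal consistencies in Table 2 are Cronbach's alphas, which can be computed from a respondents-by-items matrix as below. This is a minimal sketch with synthetic data; the sample sizes and variable names are ours, not the study's.

```python
# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of sums).
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: respondents x items matrix of ratings."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()   # sum of per-item variances
    total_var = items.sum(axis=1).var(ddof=1)     # variance of scale totals
    return k / (k - 1) * (1 - item_vars / total_var)

rng = np.random.default_rng(1)
latent = rng.normal(size=(73, 1))                  # one underlying trait
items = latent + 0.5 * rng.normal(size=(73, 3))    # three noisy indicators
alpha = cronbach_alpha(items)                      # high for correlated items
```

Because the three synthetic items share one latent trait, alpha comes out high; uncorrelated items would push it toward zero.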

Regarding the sharing of data variables, students were, in general, more willing to share demographic data (M = 0.62, SD = 0.37) and performance data (M = 0.60, SD = 0.36) than process data (M = 0.44, SD = 0.35) and especially sensitive data (M = 0.21, SD = 0.29), which students were very hesitant to share. Students' willingness to disclose data about their performance was positively correlated with self-efficacy (r = .34) and all of the study engagement variables (r ≥ .24). Moreover, willingness to disclose process data was positively correlated with absorption (r = .25), and willingness to share sensitive data with self-efficacy (r = .25).

Students who opted in for LA-based formative feedback had, on average, slightly higher scores on most of the learning strategies, study engagement, and sharing of data variables than students who opted out (see Table 3). However, none of these differences were statistically significant [...] who changed their opt-in status during the course to stu[...]

The sharing of performance data node was connected with the self-efficacy (r_p = .12), dedication (r_p = .04), and energy (r_p = .03) nodes. Moreover, there was a weak connection between the self-efficacy node and the sharing of sensitive data node (r_p = .03). To conclude, willingness to share data was rather weakly connected to the different aspects of learning strategies and study engagement in the partial correlation network, the strongest connection being between self-efficacy and sharing of performance data.

The binomial regression models predicting students' initial opt-in for LA-based formative feedback are presented in Table 4. The first model (no weighting, naive) attempted to predict students' initial opt-in from all survey metrics. None of the model's coefficients was statistically significant, and the model accuracy was very weak (balanced accuracy 0.49). Using a model selection algorithm to select the variables for the second model (no weighting, stepwise AIC) resulted in a null model with only the intercept (balanced accuracy 0.50).
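The partial correlations (r_p) behind such a network are derived from the inverse of the covariance matrix (the precision matrix). The study used a regularized estimator, which additionally shrinks weak edges to exactly zero; the minimal numpy sketch below shows the unregularized computation on synthetic data to illustrate the formula.

```python
# Partial correlations from the precision matrix:
# r_ij = -P_ij / sqrt(P_ii * P_jj). Synthetic 4-variable example.
import numpy as np

rng = np.random.default_rng(2)
cov = np.array([[1.0, 0.4, 0.3, 0.0],
                [0.4, 1.0, 0.3, 0.0],
                [0.3, 0.3, 1.0, 0.2],
                [0.0, 0.0, 0.2, 1.0]])
data = rng.multivariate_normal(np.zeros(4), cov, size=500)

P = np.linalg.inv(np.cov(data, rowvar=False))  # precision matrix
d = np.sqrt(np.diag(P))
partial_corr = -P / np.outer(d, d)             # pairwise r_p values
np.fill_diagonal(partial_corr, 1.0)
```

Each off-diagonal entry is the correlation between two variables after controlling for all the others, which is why network edges are sparser and weaker than the raw pairwise correlations reported in Table 2.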

As the opt-in and opt-out groups were imbalanced (see Table 3), we also created weighted models in which students opting out were weighted using a 4:1 ratio. The third model (weighted, naive) had a slightly higher balanced accuracy (0.65) than the first model, but none of the predictors was statistically significant. Using the model selection algorithm, we selected five predictors for the fourth model: self-efficacy, metacognitive self-regulation, energy, dedication, and willingness to share demographic data, of which metacognitive self-regulation was the only significant predictor (p = 0.04). While the fourth model had the highest balanced accuracy (0.67) of our models, it can still be considered weak.

Our results agree with previous findings: applying the factor structure found here to the results of Ifenthaler and Schumacher [23], we found little difference in the results regarding performance data, process data, or sensitive data. The willingness to share demographic data was moderately higher here than in the results of [23] (M = .50), which contextual factors or cultural differences could explain.

The partial correlation network shows that students with higher self-efficacy were more willing to disclose performance data to learning analytics applications. In other words, students not expecting to perform well were less inclined to share their performance data.

While such data (e.g., grades, test results) are usually available to teachers and institutions, we specifically asked whether the students would be willing to disclose these data to learning analytics systems. Perhaps the most prevalent use case for these systems to utilize performance data would be to differentiate content based on performance. Thus, it is logical that a high-performing student is willing to disclose data to get a more personalized experience. While the same could apply to students not expecting to perform well (i.e., students with low self-efficacy), the literature on students' help-seeking strategies shows that low self-efficacy correlates with help-seeking avoidance, that is, a view that needing help is a sign of weakness [47]. In this sense, disclosing performance data might be seen as a threat to one's self-esteem.

While all the models failed to predict students' opt-in decisions, metacognitive self-regulation was a statistically significant predictor in one of the four models. This raises a question regarding the role of LA-based formative feedback as SRL support. For a student with excellent SRL and time management skills, feedback telling them whether they are on track with their goal may feel rather superficial. However, this kind of prompt might be optimal for a student with intermediate self-regulatory skills. Recognizing these nuances might be easy for a teacher thinking about an individual student, but it is challenging to scale up in teaching augmentation tools.

Previous research has shown that students expect to be able to choose whether to opt in for or opt out of learning analytics [20], [26], [30]. Tsai and colleagues [26] have suggested that students should also be able to revisit their opt-in decisions during the semester. Being able to opt out later may even increase the probability of initial opt-in, as students can reflect on whether the intervention is actually helpful for them and then decide whether they would like to continue with it.

While we gave this opportunity to students in the current study, nine students (6%) ended up using the option, mainly to opt out after initially opting in. The students using this opportunity had, on average, higher dedication (i.e., general meaningfulness of and enthusiasm for studies) and self-efficacy (i.e., expectation to perform well) than students who did not change their opt-in decision. The more dedicated students may see the choice of opting in or out as more important than other students do and may thus be more likely to use the opportunity to change their status.

Finally, it should be acknowledged that the current study used self-reports to measure learning strategies and study engagement. Self-report instruments have been found to largely measure students' intentions, which may differ from their actual behavior [48]. This should be considered when interpreting our results.

We investigated how learning strategies and study engagement relate to students' willingness to share data with learning analytics applications and whether these factors predict students' actual opt-in for LA-based formative feedback. We found that students with lower self-efficacy were more hesitant to share data about their performance. However, we could not sufficiently predict students' opt-in decisions based on their self-reported learning strategies, study engagement, or willingness to share data.

Our inability to predict students' opt-in decisions emphasizes the contextuality of opt-in behavior. Previous research has shown that students' willingness to share their data depends on the data to be shared [23] and the purpose for which the data are used [24]. Based on our findings, asking for opt-in for a specific intervention should be preferred over requesting consent for the use of particular data or data categories.

Still, it is vital to acknowledge the differences between data categories. We found that students were more hesitant to share sensitive or process data than performance or demographic data. A balance between the data required and the usefulness [...]

Why did students choose to opt out? Did they understand the intervention? Was there something that raised suspicion? For example, we found that students not expecting to perform well were more hesitant to share their performance data. Should this be due to students being ashamed of their low performance, one could emphasize that the intervention aims to help and support the student, not to monitor performance or facilitate student competition. In this sense, opt-in rates, and particularly changes in them, are valuable tools that help learning analytics developers design and evaluate learning analytics interventions.