Facial Expression Recognition in Classrooms: Ethical Considerations and Proposed Guidelines for Affect Detection in Educational Settings

Recent technological and educational shifts have made it possible to capture students’ facial expressions during learning with the goal of detecting learners’ emotional states. Those interested in affect detection argue these tools will support automated emotions-based learning interventions, providing educational professionals with the opportunity to develop individualized, emotionally responsive instructional offerings at scale. Despite these proposed use-cases, few have considered the inherent ethical concerns related to detecting and reporting on learners’ emotions specifically within applied educational contexts. As such, this article utilizes a Reflexive Principlism approach to establish a typology of proactive reflexive ethical implications in tracking students’ emotions through changes in their facial expressions. Through this approach the authors differentiate between use in research and applied education contexts, arguing that the latter should be curtailed until the ethics of affective computing in educational settings is better established.


Allison Macey Banzon, Jonathan Beever, and Michelle Taub

Index Terms—Affective computing, affect sensing and analysis, education, ethical/societal implications.

I. INTRODUCTION
In the increasingly virtual and surveillance-driven landscape of education [1], [2], many caretakers receive notifications regarding their students' behavior or performance within the school setting, as evidenced by the popularity of education-specific gamified behavior management apps such as ClassDojo [3].
In an imagined future in which affective computing (AC) in the form of facial expression recognition (FER) is similarly applied in school settings, these notifications may include reports of a child's emotions as detected by automated FER software uploaded to their district-assigned laptop. The FER software would run unobtrusively as a student completes their online assignments and assessments, tracking discrete changes across over 50 facial action units [5], [6] to automatically categorize students' facial expressions into one of seven basic emotions (i.e., fear, happiness, disgust, anger, sadness, surprise, and contempt) [4]. This software is marketed by the school district as a low-stakes tool, which helps them provide all students with cognitively and emotionally individuated instruction. In this hypothetical scenario, imagine your child is a high-achiever who has easily navigated their school's institutional requirements thus far, so when this software is introduced within your child's school, you decide against trying to navigate the opt-out process.
A few weeks later you receive an email notifying you that recent patterns in your child's emotional data have placed them on a list of "at-risk" students. The email goes on to explain how your child's affective data was flagged by a predictive surveillance model used to 'keep students safe' and 'provide targeted resources' to learners. Upon reading this, you open a district-provided dashboard to see that, according to action units 4, 5, 7, and 23 (e.g., pursed lips and raised eyebrows) [5], [6], your child was flagged as a discipline risk for high summative anger scores as detected by these facial movements.
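To make the hypothetical concrete, the flagging rule the scenario imagines might be sketched as follows. The anger-linked action units match those named above, but the aggregation rule, threshold, and all function names are our own illustrative assumptions, not any real product's logic:

```python
# Hypothetical sketch of the scenario's flagging rule: sum the activations of
# action units associated with anger (AU4, AU5, AU7, AU23 in the Facial Action
# Coding System) into a summative score and flag students who exceed an
# arbitrary threshold. The AU-to-emotion mapping and threshold are illustrative
# assumptions for this thought experiment, not a real system's parameters.

ANGER_AUS = {"AU4", "AU5", "AU7", "AU23"}

def summative_anger_score(frames):
    """Average the combined intensity (0-100 per AU) of anger-linked AUs across frames."""
    if not frames:
        return 0.0
    totals = [sum(intensity for au, intensity in frame.items() if au in ANGER_AUS)
              for frame in frames]
    return sum(totals) / len(totals)

def flag_discipline_risk(frames, threshold=120.0):
    """Return True when the summative anger score crosses the (arbitrary) threshold."""
    return summative_anger_score(frames) >= threshold

# Two frames in which anger-linked AUs are strongly activated (AU12, a smile
# marker, is ignored by the anger rule):
frames = [{"AU4": 80, "AU5": 60, "AU7": 20, "AU12": 90},
          {"AU4": 70, "AU5": 50, "AU23": 40}]
```

The point of the sketch is how little stands between raw facial-movement intensities and a consequential label: a handful of summed numbers and one threshold.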
In the affective computing research community, this hypothetical use case may feel like an oversimplification of the fine-grained, real-time emotional data FER can currently produce. Further, despite growing attention to AI ethics in the education community [67], the above scenario is in many ways a far-off concern for everyday practitioners, given that current facial expression recognition efforts largely remain at the model-building stage within research settings. It is nonetheless important to consider both the real and the potential ethical ramifications of using affect detection and FER in education that lie somewhere between these contrasting understandings of the technology, especially as the data-driven culture of educational surveillance continues to expand [8]. Despite well-intended efforts, emergent data tools and facial expression recognition systems (such as the Zoom overlay being piloted in virtual classrooms by Intel and Classroom Technologies [29]) are being deployed in education settings to combat undesirable student behaviors such as disengagement prior to thorough, ethics-driven discussions of their use, causing material and psychological harm as a result [7], [9]. As advances in computer vision technologies like FER coincide with increased focus on the role of emotions in learning, it is imperative that researchers and developers alike proactively engage with the tangible, data-driven harms taking place within education while considering the ways in which even altruistically designed, emotionally aware learning technologies and affective computing research may further augment these harms.
In driving this discussion, it is important to first acknowledge that using FER within research contexts is a fundamentally different task from deploying FER-enabled affective tools in applied educational settings. By "applied educational settings" we distinguish research and training in education from its application to learners in classroom settings. As we stand at the precipice of scaled computer vision usage and newly adopted 1:1 technology policies in classrooms [10], it is vital we consider contextually relevant ethical concerns related to scaled usage across a range of affect-aware learning technologies (AALT), as well as the ethical responsibilities of affective computing researchers whose work may indirectly support their development [11]. Given the challenges unfolding across today's educational landscapes, we argue that proactive examination of context-specific ethical considerations (both identified and identifiable) must be conducted alongside ongoing evaluations of the technical and moral implications related to inferring individual emotions from learners' facial expressions, especially within large-scale bureaucratic institutions such as education [4], [13], [19]. To demonstrate this, we model the ways in which researchers, practitioners, and educational stakeholders alike can shift towards a reflexive, principles-based model of ethical decision making [14] that proactively and iteratively weighs the ethical implications of proposed FER use cases alongside the context-specific dimensions of educational settings. As such, this article discusses the intersection of identified AC ethical issues [12], [15], [16], [30] and contextually specific factors related to the usage of FER in PK-16 education, layering ethical discussions of facial expression recognition with current challenges of the modern American educational system to explore the contextual dimensions which shape our recommendation that FER is not appropriate for use in classroom settings.
To support this conclusion, we first explore the differences between using FER in research and learning settings before examining existing ethical guidance outlined by IEEE's Ethically Aligned Design standards [19]. Finally, we expand upon these standards by reviewing the raised ethical issues through the lens of five ethical principles (respect for autonomy, justice, nonmaleficence, beneficence, and explicability [17], [18]) to produce a typology of proactive reflexive ethical considerations, generating contextually grounded guidance on the use of FER within applied education settings. Through this scaffolded approach to the analysis of ethical issues we argue that, while the use of FER in research contexts may advance understanding of the emerging ethical landscape, the use of FER in applied education contexts should be curtailed until the ethics of affective computing in educational settings is better established.

II. USING FACIAL EXPRESSION RECOGNITION TO DETECT EMOTIONS DURING LEARNING
Advances in AC are coinciding with increased focus on the role that emotions play in learning [20], prompting the emergence of multiple affective data channels capable of detecting both static and dynamic representations of learner affect [21]. Data channels such as learners' self-reported emotions [22], [23], logfile representations of learner behaviors [24], physiological data [23], and facial expressions of emotion [25] are being used to better understand process-level learning emotions by generating multimodal representations of affective processes as they unfold across a learning task [11], [21], [26], [27].
While FER is not the only proposed method to detect learner emotions in applied settings like education, technological shifts brought on by the pandemic highlight education's willingness to utilize computer-vision-enabled EdTech tools to monitor students, even as it becomes clear that these tools are propelling deep-rooted educational inequities, as evidenced by schools' and universities' continued use of webcam-based test proctoring software despite mounting evidence that said software is not equipped to recognize black and brown faces [28], [69]. While we recognize that these applications do not represent the aims of all affective computing research or affect-aware learning technologies, we focus our discussion on ethical issues related to current and eventual affect detection systems which utilize facial images to detect emotions, given the existence of both the necessary technological infrastructure and education's established institutional willingness to monitor student behaviors via computer vision.

III. FACIAL EXPRESSION RECOGNITION AS A RESEARCH TOOL IN EDUCATION
Considering the quickly moving landscape of facial expression recognition (FER) technologies, we argue it is particularly important that individuals researching and developing affect-aware learning technologies (AALTs) like FER first consider how ethical issues in applied education settings are distinct from those found in research contexts. For this reason, we begin by referring to research findings from a lab-based study examining college students' facial expressions of emotions during a reflective writing task as a means of exemplifying the inherent subjectivity of FER as an affective data channel. In doing so, we highlight how even theory-driven research results may fail to adequately consider education's contextual dimensions and invite readers to consider the implications of this inadequacy when applied at scale through an FER-enabled tool.
In this study of reference, students' facial expressions were analyzed using iMotions software [35] to identify changes in students' facial landmarks (i.e., action units) [5], [6] as a means of interpreting the presence of seven emotions (i.e., fear, happiness, disgust, anger, sadness, surprise, and contempt) while they completed a reflective writing task. Each emotion was assigned an intensity score ranging from 0 to 100, with intensity scores calculated at a frame rate of 30 Hz for each emotion.
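For readers unfamiliar with this kind of output, the frame-level intensity streams described above can be summarized along the following lines. The record layout and summary statistics are our own assumptions for illustration and do not reproduce the iMotions export format:

```python
# Minimal sketch of summarizing frame-level FER output of the kind described:
# at 30 Hz, each frame carries a 0-100 intensity score for each of the seven
# basic emotions. The dict-per-frame layout and the mean/peak summaries are
# illustrative assumptions, not the actual software's data model.

EMOTIONS = ("fear", "happiness", "disgust", "anger", "sadness", "surprise", "contempt")
FRAME_RATE_HZ = 30

def summarize(frames):
    """Per-emotion mean and peak intensity across a recording, plus its duration."""
    summary = {}
    for emotion in EMOTIONS:
        scores = [frame[emotion] for frame in frames]
        summary[emotion] = {"mean": sum(scores) / len(scores), "peak": max(scores)}
    summary["duration_s"] = len(frames) / FRAME_RATE_HZ
    return summary

# 60 frames (two seconds) of mostly neutral footage with a half-second spike
# of detected disgust:
frames = [{emotion: 0 for emotion in EMOTIONS} for _ in range(60)]
for frame in frames[30:45]:
    frame["disgust"] = 80
```

Note how much interpretation already happens at this stage: a brief spike and a sustained low-level signal can yield the same mean, yet invite very different readings of a learner's state.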
Presentations on this study reported significant relationships between writing students' emotions as captured by FER and variables such as writers' self-efficacy or varying keystroke behaviors [36], [37]. While these results evidence a correlative relationship between students' facial expressions of emotions and learning and writing processes (and were reported as such), we reference this study to point out that, although results revealed emotions were linked to writing, even these findings should not be viewed as a sufficient evidence base for scaled application of FER-enabled affect detection across the range of social, political, and cultural contexts found in education. While statistical relationships between facial expression data and writing tasks open space for future research, we argue against interpreting these results as inherently applicable to all educational settings.
An important facet of applying an RP approach in this area of study is the recognition that results from research like this are open to interpretation driven by value perspectives embedded in both the FER-enabled systems and the spaces in which they are deployed. In the study referenced above, conclusions were based on methodological practices and process-driven theories of learning as selected by the researchers. For example, the study found that the more a writer was detected as expressing disgust, the more time they spent producing text and pausing between producing text, and the more joy they expressed, the less time they spent producing text [36], [37]. Interpreting these results through the lens of self-regulated learning theory [38], [39], [40] suggests that disgust (a negatively valenced emotion) may have prompted learners to spend more time engaging in beneficial self-regulatory behaviors (e.g., re-reading texts or reflecting on their writing) because they were dissatisfied with the text they produced. If these results were interpreted by a researcher interested in promoting positive affect, the same data could be read in a way that associates positive learner emotions with efficient text production, wherein expressions of joy improved time management during writing and added to students' "net happiness" [16].
These juxtaposed interpretations highlight the level of value subjectivity present in using FER data to detect the interplay between emotional states and other complex learning processes. Despite this article's focus on the use of FER in applied educational settings, we present this aside to highlight the foundational ethical concerns present in collecting and interpreting this data channel. In research contexts, the subjectivity of these interpretations and their impact on the development of FER-enabled affect-aware learning technologies can be discussed in the abstract, especially in the presence of institutional research policies, which proactively protect participants from harm [60]. Should FER be used to automate these kinds of interpretations in applied education settings, which lack similar protective measures for individuals, the potential for harm is no longer hypothetical. The original purpose of the lab research as described was to explore potential uses of technological tools for detecting affective measures during a task in a controlled, low-risk setting. We include this study to provide a tangible example outlining the ethical concerns associated with FER while minimizing associated learner risks.

IV. IEEE ETHICS GUIDANCE FOR FER
Even as companies are marketing emotionally aware AI to educational providers [29], [70], ethical thinking in the space of affect detection technologies remains in its nascency, with scholars discussing ethics specific to AC focusing on mapping potential ethical issues on a global scale [12], [15], [30]. A decade ago, U.K. psychologist Roddy Cowie argued that affective computing's rise would center on three types of ethical principles: those codified around data protection and human subjects; other characteristic moral imperatives, including to increase net benefit, avoid deception, respect autonomy, and appropriately and fairly portray individuals while avoiding the "unnaturalness" of the uncanny valley (the space where human comfort with artificial agents drops off sharply because they have become too similar to human agents yet not similar enough; they bear an uncanny but not quite accurate resemblance [71]); and, finally, "widely discussed concerns with less clear connections to moral philosophy or real abilities of the technology" [16]. On that last, more nebulous category, Cowie pointed readers to what ethicists might call the precautionary principle: that some ethical issues are too unknown (and therefore too risky) to easily manage or mitigate [16].
Given those potential ethical risks, we first look to the standards outlined in IEEE's Ethically Aligned Design [19], a document collaboratively authored by stakeholder communities, to examine existing guidance regarding ethical design around Big Data technologies such as affective computing applications. It lays out eight core "principles" intended to guide communities of practice: (1) human rights, (2) well-being, (3) data agency, (4) effectiveness, (5) transparency, (6) accountability, (7) awareness of misuse, and (8) competence [19: pg. 19]. These "principles" are the result of community input, analogous to the norms described in professional codes of ethics. What approaches like these do not do is articulate theoretical grounding or strategies for ethical decision-making from which a set of principles could be a jumping-off point. As such, we use this section to demonstrate how IEEE guidance informs the current conversations around the ethics of facial recognition, which we aim to strengthen and theoretically ground through a Reflexive Principlism (RP) approach in the following sections.

A. Facial Expression Recognition Across Cultures
IEEE's Ethically Aligned Design recommends emotionally aware technologies be designed in ways that are sensitive to cultural differences so as to avoid unintended harms done by systems which interpret and convey emotions based on a limited (and in many cases Western and Anglocentric [4]) view of emotions [19]. Concerns regarding the ways in which affective systems may misinterpret facial expressions of emotions across cultures or inadvertently shift cultural and societal values among users are equally pertinent when considering the use of affectively aware technologies in learning settings, especially as existing inequities compound in education [19], [54]. We build upon the issues raised by the IEEE working group to suggest that this view of cultural sensitivity should be widened by education-specific dimensions that stand to further impact both the accuracy of FER detection in applied educational settings and the unintended consequences of using contested data channels such as FER [63] to shape students' educational offerings in diverse learner populations.
In education, facial expressions will undoubtedly vary across the cultural, social, and religious norms outlined within Ethically Aligned Design. But beyond these variations driven by society-level norms, we argue that students' external expressions of emotions may also depend upon contextually specific dimensions within more local educational settings, such as a learner's developmental stage, the institutional culture they are learning within, the presence of one or more learning disabilities, or the socio-economic factors that shape a specific student's experience. As such, any system widely used to detect learner emotions in applied settings must consider a vast range of dynamic, context-specific factors as well as the cultural elements outlined within IEEE's recent typology. For example, students with autism spectrum disorder may have trouble recognizing or displaying facial expressions of emotions [52] in accordance with the "universal" emotion categories commonly used to detect and interpret emotions with FER [4]. Or, in the case of students who are experiencing economic hardships outside of school, FER-enabled systems may continually detect "negatively valenced" emotions as those learners grapple with aspects of a lived experience taking place outside the learning task. Given the wide range of learners, many of whom represent some of the most vulnerable members of our society, any discussion of affect detection in education must proactively attend to these overlooked dimensions [61] to avoid furthering existing educational inequalities. Departing from IEEE's recommendation, we argue that rather than trying to quantify these amorphous dimensions or diversify existing datasets, affective systems designed for education settings should instead acknowledge the inherent limitations of FER-enabled emotion detection in dynamic settings and prohibit this data channel from being used in any evaluative processes (e.g., utilizing emotion detection to evaluate which students are or are not engaged in a learning activity, as proposed by Intel [29]).
It is also important that those interested in deploying FER-enabled systems in these spaces carefully consider how contextual dimensions interact with broad definitions of culture to shape the settings they hope to serve. An RP approach echoes and expands on IEEE's recommendation for continued, collaborative efforts between researchers, developers, and educational professionals that both proactively consider setting-specific ethical concerns and continually measure the impacts of emotion detection in education spaces. As noted in our discussion of the principle of explicability (see Section VI.E), providing targeted programming designed to help students, parents, and educators develop a comprehensive understanding of proposed affectively aware systems and their potential harms is an important first step in this effort. Further, we argue that clearly defined mechanisms to "shut down" affect detection and biometric data collection, as proposed by the IEEE committee, should be made accessible not only to the educational professionals tasked with monitoring these systems, but to caretakers and their students as well. In envisioning systems that respect individual autonomy (see Section VI.C) by allowing individuals to easily turn off certain data channels at their own discretion, we invite those interested in developing intelligent learning systems to work towards emerging technologies that grant renewed levels of learner agency rather than codifying existing limitations on learner agency into upcoming automated systems.
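As a sketch of what such an accessible shut-down mechanism could look like, consider a per-student consent registry in which every affective data channel defaults to off and can be disabled at any time by students and caretakers as well as staff. The class and method names here are our own hypothetical illustrations, not an existing system's API:

```python
# Hedged sketch of the "shut down" mechanism argued for above: a per-student
# registry of affective data channels that defaults to off (opt-in), can be
# disabled at any time by the student or caretaker as well as school staff,
# and is consulted before any collection occurs. All names are hypothetical.

class DataChannelConsent:
    CHANNELS = ("facial_expression", "logfile", "self_report")

    def __init__(self):
        # Channels are opt-in: nothing is collected until explicitly enabled.
        self._enabled = {channel: False for channel in self.CHANNELS}

    def enable(self, channel):
        self._enabled[channel] = True

    def disable(self, channel):
        # Exposed to students and caretakers, not only administrators.
        self._enabled[channel] = False

    def may_collect(self, channel):
        """Collection code must check this gate before capturing any data."""
        return self._enabled.get(channel, False)

consent = DataChannelConsent()
consent.enable("facial_expression")
consent.disable("facial_expression")  # a student turns the camera channel off
```

The design choice that matters here is the default: agency is preserved only if collection is off until someone with standing turns it on, and any of those parties can turn it back off without navigating an institutional process.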

B. When Systems Care
In line with support for user-end control over information collection, we similarly argue for public, universally discernible notices provided by developers of affect-aware learning technologies that clearly denote what FER-enabled learning tools (e.g., intelligent tutoring systems, affectively aware toy robots like Moxie [70], etc.) can and cannot do, and extend this recommendation to include explicit notices on how systems use collected affect data. This is especially important in educational settings (part of the caring professions [68]) where providers are notoriously overburdened and undervalued. Presenting intelligent learning technologies as care-providing tools capable of performing at the same (or better) level as instructional staff borders on predatory marketing practice: current systems ready for application in diverse educational settings simply do not have this capacity. Thus, we argue that language which exaggerates the care capacities of affect-aware learning technologies (or minimizes the dimensions of care provided by human instructors) should be omitted from descriptions of these systems.
Further, suggesting that affect-aware learning technologies as they exist today can match human instructors' ability to interpret student emotions and emote in kind diminishes the level of nuanced cognitive and affective skill held by effective educators. Because this article is focused on ethical issues surrounding the detection of learners' expressed emotions rather than the ethics of emotive affective systems more generally, we deviate from IEEE's guidance to recommend that those interested in AC for education focus on the ways in which proposed systems can bolster the ethical principles outlined here (see Section VI) for individual learners and promote institutional cultures of care within educational settings by valuing educational professionals' expertise and user control over a focus on quantified outcomes.
Thinking about caring systems in this context recognizes the traditions of the ethics of care [68], which have been seen by some as missing from, and complementary to, a principlist account of ethics. However, caring relationships are central to the contextual specification of principles: part of what it means to act beneficently is, we think, to cultivate and strengthen caring relationships (see Section VI.B). More generally, the cultivation of relationships through an emphasis on care sustains respect for autonomy through recognition of others' capacities and agency (see Section VI.C) and, in so doing, avoids harms (see Section VI.A). As such, it is important that any proposed affective system validates and supports instructional care roles within education rather than seeking to replace them.

C. Manipulation, Nudging, and Deception
As noted in Ethically Aligned Design, affectively aware computing systems are often promoted to develop pro-social behaviors via in-system design elements that can nudge users towards desired emotions or behaviors [19]. Given examples (see Section VI.D) which outline how education systems have promoted singular, politically driven definitions of what behaviors are seen as adaptive or maladaptive in education settings [3], [7], we strongly second the committee's recommendation that all entities consider whether it is "ethically appropriate" to deploy
affect-based nudging in governmental contexts [19: pg. 99]. This is of particular importance within education given the extensive role of state and federal governments in school systems ranging from pre-kindergarten through grade 16. An RP approach helps explain why the manipulative potential of affective computing approaches is of ethical concern: both designers and users ought to worry about manipulation not merely because some cultural contexts agree that it is a problem but, more deeply, because manipulation poses negative challenges to fundamental principles. Through epistemic conflation, it threatens our ability to respect autonomy, opens space for unintentional harms, and positions injustices as data-driven instances of justice [7]. Manipulation, nudges, and deception are not merely useful tools for social guidance, but ethical threats to autonomy and justice. Without explicability (see Section VI.E) to support transparency, even nudges designed to be helpful can turn to manipulation. We urge those interested in affective learning technologies to avoid deceptive or manipulative designs in favor of transparent, user-facing system elements that report findings directly to individual users, provide users with clear discussions of what has been detected, and allow users the autonomy to opt out of "nudges" based on that information, promoting regulatory behaviors while preserving student autonomy.

D. Supporting Human Potential
It is important to clearly define what success or human potential development means as an institutional, political, or socially derived ideal within educational contexts. As noted in IEEE's guidance, affectively aware systems have the potential to reduce emotional capacity, hinder creativity, and shape social outcomes in ways we have yet to fully explicate. When this guidance is applied to educational settings, these risks could have long-term effects on the emotional development of millions of young learners. As such, we build upon the recommendations of the IEEE committee by asking proponents of AC to consider how a particular project is defining human potential or, in the case of education, promoting institutionally defined metrics of potential or academic success. Supporting human potential demands respecting individual learner autonomy, which may in turn demand limiting the use of narrow measures of academic success such as "achievement" measured by year-end assessment scores, or "academic engagement" as measured by a student's ability to appear on task as detected by FER. The use of such narrow measures of success runs the risk of supporting a limited view of human potential and autonomy that sorts students with complex skills and experiences into binary distinctions of "successful" or "unsuccessful" students. For this reason, we recommend that AC researchers and developers, alongside education professionals, reorient their definition of human potential by deeply considering what is measured in today's educational systems as a means of designing affective systems that prioritize student experiences over institutional definitions of achievement.

E. Synthetic Emotions
IEEE's Ethically Aligned Design defines synthetic emotions as a system's ability to artificially emote in ways that may facilitate empathetic, collaborative relationships between humans and affectively aware technologies [19]. We again echo the committee's guidance and our prior argument to recommend that systems utilizing synthetic emotions be clearly marketed as such, making clear that all system-displayed emotions are "authored, designed, and built deliberately," to bolster explicability and prevent potentially harmful bonding between students and synthetically emotive systems that engender trust in users as a means of amassing affective data.
We expand the committee's definition of synthetic emotions to warn against affective systems that generate synthetically expressed user emotions as well. Research on the relationships between current educational surveillance tools like ClassDojo and learner performativity [3], [28] is relevant in considering the ways in which emotion detection may unwittingly impact affective processes or emotion development across student users. Should emotion detection be used to value, track, and place students to the same degree as current educational data has been, we run the risk of teaching students to perform varying expressions to avoid undesirable outcomes. As such, we argue that facial expressions of emotions data should never be used as an evaluative measure, for risk of encouraging synthetic, performative user emotions that stand to impact natural developmental processes in ways that cannot be predicted.

V. ETHICAL REASONING THROUGH PRINCIPLISM FOR AFFECTIVE COMPUTING IN EDUCATION
Generally speaking, IEEE Standards committees have organized the ethical terrain of affective computing around four topics: (1) how affect varies across human cultures, (2) problems for artifacts designed for caring and private relationships, (3) supporting human flourishing, and (4) appropriate policy interventions [19].
While many of the proposed ethical principles for AC align with national-level ethical guidelines in educational research (e.g., social responsibility, respect for dignity [31]), the landscape of ethical consideration for affective computing, and more specifically for affective computing in the context of education, is young, under-developed, and does not yet mirror the degree to which educational codes of ethics prioritize educators' commitment to students [33]. However, common themes emerge around ethical principles of respecting autonomy (even if that includes the "isolated autonomy" [32: pp. 195] of artificially intelligent systems), worrying about risks, rights, responsibilities, and relationships in the context of moral agents and patients (beneficence and nonmaleficence), and thinking about fair representation of identity and transparent access to information (justice). This coalescence of principles-oriented concerns forms the basis of principlism, an approach that grounds contemporary U.S. bioethics. We think a principlist approach, specifically in its iterative form of Reflexive Principlism [14], can help stabilize the conversations around the ethics of affective computing and connect them to a larger discourse across other disciplines such as education. Its four principles guide ethical decision-making, offering guidance while requiring practitioners to iteratively specify and balance them in emerging contexts. For example, respect for autonomy holds as a universal ethical principle, but whose autonomy one must respect might be expanded to account for not only designers and users but also future agential affectively aware systems that are non-human or complex multi-agential systems, especially those serving minors within autonomy-restricting educational systems.
Principlism's advocates, particularly in biomedical ethics contexts, have articulated responses to a wide range of criticisms and counterarguments over the decades since its development out of the post-World War II Belmont Commission [17: pp. 425-57]. They argue consistently that principlism provides an accessible ethics vocabulary that can account for cultural differences in specification of principles. Take, for example, the principle of respect for autonomy. In a 2000 essay, Cameroonian bioethicist Godfrey Tangwa articulates the way in which the European/US conception of individual autonomy differs from the African concept of community-based autonomy, in turn specifying what respect for autonomy means across these cultures [34]. More recent literature has pushed to articulate similar specifications of the principle of respect for autonomy that account for multiagent systems or even non-human animal or artificial agents [72]. Specification and balancing of contextual principles support differences across diverse dimensions of ethics while upholding a common morality approach that articulates shared values, an important practice when considering the ethical implications of using AC in human-facing care professions such as medicine and education.
While principlism satisfies requirements for ethics in autonomous and intelligent systems research set forth by the IEEE committee [32: pp. 195-203], it does not stand as an ethical decision-making theory in that it does not guide the process of decision-making in a procedural way. As a complement to traditional principlism, engineering ethicists proposed Reflexive Principlism (RP) [14]. RP emphasizes the reasoning process: the iterations of specifying, balancing, and justifying principles in accordance with contextual constraints. Starting from initial norms (principles) identified from a shared or common morality and iteratively applying them to the specifics of a case or context is the process of "coherence-making" [14: p. 283], in which RP actively engages interdisciplinary reasoners in ethical decision-making. Coupling the four ethics principles and processes of RP to the complex epistemic context of digital technologies, RP expands to consider explicability as a fifth principle [18]. In the context of artificial and intelligent systems like those driving affective computing, this fifth principle emphasizes epistemic concerns about transparency of algorithm and access, intelligibility of programming, and accountability of developer intention. Table I maps out these five principles along with their ethical dimensions.
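To make RP's iterative process concrete, the specify-and-balance loop can be sketched in code. This is an illustrative sketch only, not part of the RP formulation; the `CaseContext` class, the `specify` stub, and the iteration count are all hypothetical stand-ins for what is, in practice, deliberation among interdisciplinary stakeholders.

```python
from dataclasses import dataclass, field

# The five coupled principles of Reflexive Principlism (cf. Table I).
PRINCIPLES = [
    "nonmaleficence",
    "beneficence",
    "respect for autonomy",
    "justice",
    "explicability",
]

@dataclass
class CaseContext:
    """A concrete deployment context (e.g., FER in a K-12 classroom)."""
    description: str
    specifications: dict = field(default_factory=dict)  # principle -> contextual norm

def specify(principle: str, context: CaseContext) -> str:
    # Hypothetical stub: real specification is done by human reasoners,
    # not a lookup, and produces a context-sensitive norm.
    return f"{principle} as applied to: {context.description}"

def coherence_making(context: CaseContext, iterations: int = 3) -> CaseContext:
    """Iteratively specify and balance initial norms against a case context."""
    for _ in range(iterations):
        for p in PRINCIPLES:
            context.specifications[p] = specify(p, context)
        # Balancing step: resolving conflicts between specified principles
        # is stubbed here; RP requires justification by reasoners, not code.
    return context

classroom = CaseContext("FER-driven affect detection in a compulsory classroom")
result = coherence_making(classroom)
for p, spec in result.specifications.items():
    print(p, "->", spec)
```

The point of the sketch is structural: the loop never terminates with a final verdict by itself; each pass through the context refines the specifications, mirroring RP's sustained, iterative character.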

VI. HOW REFLEXIVE PRINCIPLISM SUPPORTS ETHICAL CONSIDERATIONS FOR FACIAL EXPRESSION RECOGNITION IN EDUCATION SETTINGS
Considering RP's iterative reasoning process and the limitations of the IEEE standards as stated above, we argue for an interdisciplinary and proactive Reflexive Principlism model [14] in which multiple stakeholders (i.e., developers, researchers, educators, policy makers, parents, and students) are invited to continually drive discussions on the use of affect-aware learning technologies within applied educational settings. By design, this iterative model of ethical decision-making expands discussions on the ethics of FER in education to include the voices of those who are most likely to be impacted by affect-aware learning technologies, continually considering changes in affective research, education contexts, and applied ethics while working towards morally and pedagogically aligned intelligent learning systems. We argue that the highly personal nature of affect makes this level of continued collaborative ethical decision-making imperative in any use of emotion detection within the classroom.
To exemplify this, we organize the remaining discussion by the five coupled ethical and epistemic principles within Reflexive Principlism [14] to discuss the ethical concerns surrounding potential applications of FER-enabled affective computing (based on data channels and research results like those presented in Section III) in real-world education settings. In emphasizing the impact of context, we advocate for ethical discussions of FER use in education that consider context-specific dimensions typically excluded from research and existing ethical standards, including ethical, legal, and social implications at institutional scales and emotional and developmental considerations at individual scales.
In accordance with the RP process, we then follow each of these principles with discussion of education-specific contextual problems. Through this reflexive process of "coherence making" [14: p. 283], we aim to highlight the ways in which existing high-level ethical guidelines (see Section V) can be used to propose context-specific guidance for educational applications of intelligent affective computing systems. We will connect the ethical principles outlined in Table I with discussion of issues unique to the education profession to highlight context-specific considerations regarding the design and application of FER-enabled systems for classroom settings.
In recognition of the reflexive principlist approach, we organize ethical considerations by beginning each section with one of the five principles (i.e., initial norms [14]), with each of these including subsections on contextually specific problems. However, given the emergent nature of technology, applied ethics, and the field of education, we recognize that these ethical considerations represent a high-level snapshot of issues surrounding the use of FER in education and will require sustained balancing and specification for emerging contexts.

A. Nonmaleficence
Education settings serve several vulnerable student groups as defined by the Code of Federal Regulations, a designation that necessitates enhanced protections for members of these groups in research settings [41]. The absence of similarly defined protections within applied education settings leaves many students exposed to harms otherwise accounted for in research protocols. As such, it is critical that proponents of deploying affect-aware learning technologies outside of controlled research settings proactively mitigate potential harms unique to educational settings.
1) Privacy and Security of Student Data: Conversations surrounding biometric data collection (such as the data produced by FER) have rightly focused on data privacy to prevent harm. Collecting facial expression data from students, especially those under 18 years of age, necessitates in-depth consideration of data security and privacy concerns as a means of protecting learners' privacy within an involuntary space [13], [18]. While education and EdTech's troubling track record in this area of privacy [7], [42] presents clear ethical concerns for the applied use of FER systems, ongoing discussions and legislation on this topic highlight growing efforts to protect identifiable student information and minimize data-driven harms related to breaches in privacy, whether real or potential [43], [44], [45], [46]. A growing body of work in this area implicates emerging digital technologies such as FER in potentially novel digital harms for which current discussions may be insufficient [44]. Some scholars, like political scientist Hillman, argue that discussions on educational data security must also "unveil the hypocrisy of data practices that are enabled as every educational process becomes digitalized …" [46].
Given the unique nature of FER data and its proposed role in extrapolating human emotions from individuals' outward expressions and appearances, the focus of nonmaleficence in the AC context rests heavily on the individual right to emotional and psychological privacy [18], [47]. FER presents novel challenges here, as it integrates or conflates mental and informational privacy concerns: affective computing systems push for access to inner mental and emotional states and translate those into and through informational systems. In this way, concerns for the privacy of our inner states (which face psychological interference from education's existing behavior management practices [3], [48]) could face a second-order concern about freedom from informational interference as well (see Section VI.C.3 below).
2) Preventing Harms: Indeed, this informationalization of inner emotional states is made more concerning by systems that sort individuals' facial expressions into limited, "universal" categories of emotions [4]. Given the ethical and methodological concerns surrounding claims of emotional universalism [4], [49] and racially driven accuracy disparities in commercial computer vision applications [28], [50], there are real concerns associated with scaled use of FER in education systems serving 26.8 million students of color [51]. Further, concerns about FER's ability to account for the ways that facial expressions are shaped by cultural dimensions [4] or the presence of neurodivergence [52] would inevitably be tested on real student populations should we choose to track students' facial expressions across systems serving a growing number of learners born outside of the U.S. [53] or in K-12 settings where 14% of students receive some form of accessibility support [51]. As such, detecting emotions via FER within these spaces without regard to growing work on the limitations of FER-enabled affective systems (as well as education's tendency to utilize simplified versions of emerging tech tools) could work to automate existing inequities across tens of millions of students.
AC systems driven by an assumption that facial expressions can be used to extrapolate individuals' emotional states could falsely categorize learners (many of whom already receive inequitable education services [54]) without sufficient cause. This highlights how concerns about justice are built on the back of privacy concerns. Ethical concerns do not stand alone but are interlinked, even if weighted differently in a particular context: a key premise of principlism in ethics.
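The accuracy disparities discussed above can be made concrete with a small audit sketch: comparing a system's predicted emotion labels against self-reported labels, broken out by demographic group. All names and data here are hypothetical and for illustration only; a real audit would require validated ground-truth labels and far larger samples.

```python
from collections import defaultdict

def subgroup_accuracy(records):
    """Compute per-group accuracy for predicted vs. self-reported emotion labels.

    `records` is a list of (group, predicted, actual) tuples; all data
    here is hypothetical and for illustration only.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, predicted, actual in records:
        total[group] += 1
        if predicted == actual:
            correct[group] += 1
    return {g: correct[g] / total[g] for g in total}

# Hypothetical audit data: a disparity of this magnitude would warrant
# pausing deployment rather than scaling it.
records = [
    ("group_a", "happy", "happy"), ("group_a", "neutral", "neutral"),
    ("group_a", "sad", "sad"), ("group_a", "happy", "neutral"),
    ("group_b", "happy", "neutral"), ("group_b", "neutral", "sad"),
    ("group_b", "sad", "sad"), ("group_b", "angry", "neutral"),
]
acc = subgroup_accuracy(records)
gap = max(acc.values()) - min(acc.values())
print(acc, "disparity:", round(gap, 2))
```

Even this toy example makes the structural point: aggregate accuracy can look acceptable while one subgroup is systematically mislabeled, which is precisely the harm pattern the cited computer vision audits describe.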

B. Beneficence
Understanding what it means for emotionally aware technologies not only to avoid harm but to actively work towards benefiting the individuals using them requires deeper and iterative inspection of the parties, values, and power structures unique to education as conditions develop. In discussing the ethics of affective computing, Cowie posits that its "most obvious function" is to maximize "net happiness" across users while minimizing negative experiences [16]. When we consider this supposition while focusing solely on the development of the affective systems themselves, it is easier to agree with binary assessments of good and bad emotions as they relate to affectively aware technologies: clearly, no informational system is deployed to deliberately do harm. However, in reflexively considering the nuanced nature of emotions as well as the contextually relevant factors impacting applied use of these systems, the argument for beneficence is less simple.
To determine what beneficence looks like in educational settings, it is important to first distinguish who is benefiting when we collect biometric data during learning. As was the case in the study described in Section III, collecting FER data in research contexts requires that all participants receive a written explanation of the study that clearly defines who will benefit from their participation (whether that be the study participants, the researchers, or both) and how. In applied settings, it is difficult to define who benefits when we collect and analyze learners' biometric data. In the decades following calls for educational reform in the US in the early 1980s [55], rapidly amassed troves of student data have yet to remedy inequitable outcomes in areas like literacy or student disciplinary outcomes [8]. Instead, as we have noted, this data has in many ways proliferated a culture driven by surveillance and performance-based funding models in ways that incentivize certain representations of learning (e.g., summative assessment scores) over others (e.g., skills such as task persistence or individual reflection). Should scaled collection and interpretation of students' FER data follow the path of existing educational data usage, beneficence may inadvertently mean supporting existing structures that prioritize simplified institutional metrics over nuanced learner experiences [56].
Further, should FER-enabled emotions data be utilized to detect and assign valence to learner emotions as a means of nudging individual students towards positive emotions (given research on the association between positive emotions and desirable learning outcomes [57]), educational systems may inadvertently limit dynamic, multidimensional learning processes in ways we cannot foresee. Though admirable in the desire to promote enjoyable experiences and psychological wellbeing, wide-scale application of emotion detection tools used to nudge learners towards positive emotions as a means of increasing happiness or academic achievement has the capacity to shift emotional development processes across entire generations of learners, potentially limiting students' exposure to healthy discomfort when pursuing academic goals [58].
For these reasons, we argue that for AC applications to promote true beneficence within educational systems, they must first promote the wellbeing of individual learners in a more nuanced capacity that avoids nudging students away from "negative" emotions (e.g., confusion or frustration) as they organically arise during problem-solving tasks [58].

C. Respect for Autonomy
Respect for individual autonomy is an important moral value both within and beyond the AC community [16], [17]; however, the institutional, legal, and developmental norms found in education add challenging layers to discussions of agency within this context. It is important that those interested in scaling affect-aware learning technologies consider how existing contextual dimensions external to proposed systems shape individual student agency when deploying FER-enabled emotion detection in education. Beyond the general principles and practices of autonomy laid out in Table I, autonomy is specified within the FER-in-education context in the following ways.
1) Respect for Individuals' Emotional Autonomy Within Surveilled Institutional Settings: We must recognize the inherent power imbalances found in education when discussing the use of emotion detection tools in these settings. Within education, scaled use of FER-enabled technologies would take place within rule-governed, bureaucratic institutions [13] that can curtail individual autonomy. As bioethical principlists Beauchamp and Childress note [17], the presence of an authoritative body does not automatically lead to the absence of individual agency; however, it is worth considering what learner agency looks like when access to education and upward social mobility hinge on adhering to rules defined by educational institutions [13]. As such, discussing individual agency as it relates to the use of emotion detection within these settings must consider the ways in which educational policies and institutional rules significantly reduce individual agency and limit learners' ability to fully consent to biometric data collection, opt out of emotion detection, or advocate for themselves should they disagree with interpretations of their data.
2) Dangers of Artificial and Community Agencies: Perhaps most relevant in conversations of autonomy is the education-specific legal authority which allows education professionals and the institutions they represent to act in loco parentis, or in place of the parent, while a student is within their care [59]. While this designation was born from a desire to provide students with supports and foster social norms, it also grants schools the authority to foster social and moral norms that are deemed beneficial (either within the institution, the larger socio-political sphere, or both) in ways that mirror a level of control granted only to legal caretakers. Instructional content and institutional rules designed with the goal of shaping "ideal" student behaviors are a long-standing element within education; however, the advent of data-driven education technologies allows educational stakeholders to quantify, surveil, and manipulate student behaviors at a scale that is driving a pervasive new chapter in this practice [3], [48]. With affect-aware learning technologies as part of this data-driven, multi-agential system of authority, what it means to recognize autonomy and respect it becomes increasingly complicated.
3) Freedom From Informational Intrusion; the Right to Self-Determine: We argue that collecting and reporting on emotional data may hinder individual students' freedom to emote and express a range of facial expressions should education adopt FER-driven affectively aware systems. It is not yet known how quantifying and reporting on learners' biometric data may impact school discipline or interfere with students' emotional development processes. To what extent might affect-aware learning systems deployed in these settings intrude on students' self-determination or free expression? What we do know is that when we boil down complex processes into data points, the current educational model tends to use this information to feed punitive models of student surveillance [7]. For this reason, it cannot be overstated that wide-scale collection of FER data as a means of monitoring emotions in traditional schooling environments has the potential to further restrict student agency while fostering performative expressions of emotion in ways that fail to support the ideals of emotionally informed pedagogy [3].
4) Individual Autonomy, Compulsory Education, and Consent: Due to the institutional power dynamics discussed in the previous sections, another salient distinction between the use of FER as a research tool (see Section III) and deploying FER to capture and interpret students' emotions in applied settings lies in the differing levels of individual consent across these spaces. Established ethical research practices necessitate some degree of ethical thinking while designing, conducting, and reporting on a research study (even as the particulars of these policies continue to evolve) [60]. For example, each student who participated in the described writing study received a thorough explanation of the research design, which complied with requirements outlined by the university's Institutional Review Board. This ensured that every student participant was informed of which data channels were being collected, where their data would be stored, how their biometric data would be used in the future, and who they could contact directly with related questions or concerns. By providing this level of transparency, each participant could give or withhold their informed consent to participate in a study that utilized FER to detect their emotions.
In contrast, users' (students') consent for data collection within educational technologies (in US contexts specifically, although learning analytics play an increasingly large role in European nations as well [73]) takes the form of an implied, institution-wide consent wherein a student's consent is inferred from actions such as continued enrollment in a course or a parent's decision to keep their child within a given school district. While this practice allows educational institutions to update instructional and technological offerings in a timely manner, it also dampens individual learners' ability to freely consent to the collection of sensitive data. This is especially true as we look towards the use of multimodal systems that utilize FER data alongside other channels to track learners' interactions across an academic task. Within an educational system, opting out or withholding individual consent often requires individuals to navigate opaque, bureaucratic requirements that demand a level of literacy, time, and technological access that is inherently inequitable to caretakers working multiple jobs or whose primary language differs from the one spoken in the school system. Additionally, many parents who do manage to navigate this institutional red tape do so only to realize that opting out of data collection means opting out of the instructional content attached to the tools collecting said data, wherein one must choose between privacy and educational opportunity. Proponents of emotion recognition within these settings should question whether true individualized, informed consent and respect for individual autonomy can take place in scenarios where opt-out processes may prove burdensome for learners and their guardians [13].

D. Justice
Practically, the rights of individual students within educational settings are also constrained by the circumstances of their birth, in that experiences within education systems are often dependent upon factors outside of a student's control such as race, gender, socioeconomic and cultural status, disabilities, or the intersections of these dimensions [54], many of which have yet to receive in-depth focus in AI ethics literature [61]. The collision of these inequities and education's predilection for surveillance and punitive practices has led to clear disparities in disciplinary rates across student ethnicities, genders, and disability statuses [54], [62]. For this reason, it is crucial that ongoing discussions of AC ethics and algorithmic justice expand beyond in-system elements to further consider existing contextual injustices [18].
Ongoing discussions of racially biased facial recognition applications [50] and continued research purporting that physiognomic FER applications can detect dishonesty or criminal intent [63] present alarming implications for unjust usage of FER and emotion detection in public settings. This is especially pertinent in education, where recent zero-tolerance disciplinary policies have translated into a "School to Prison Pipeline" that disproportionately targets Black students and students with disabilities [62]. These contextual injustices are further exemplified by a recent program in which the Pasco County Sheriff's Office in Florida surreptitiously used sensitive student data (e.g., absences, grades, the number of times a child experienced trauma such as sexual abuse) to drive the Sheriff's Office's predictive policing initiatives [7]. Given the undue role of the legal system in contemporary models of education [56], it is important to recognize how increased access to student data has thus far driven increased surveillance rather than targeted learning supports. There is little evidence to suggest that the addition of FER data would ameliorate existing surveillance practices.
These concerns are further compounded by the institutional and legal dimensions which limit individual students' due process rights within educational institutions [62], [64]. As evidenced by the lack of ramifications following the Pasco County Sheriff's Office's illicit use of student data [7], specifics on data usage and resulting decisions are frequently contained within educational bureaucracies. These systemic issues set the stage for significant injustices and psychological harm should students' FER data be utilized in accordance with existing educational surveillance practices.

E. Explicability
While biomedical principlism restricts its list to four key principles, the context of data-driven technologies warrants inclusion of a fifth: explicability. Explicability "ensures individuals have the right to know and understand what led to decisions that have significant consequence in their liberty, employment, [and] economic well-being …" [74]. It complements respect for autonomy by recognizing the complexity of epistemic contexts informed by Big Data.
1) Avoiding Black Box EdTech Models: Explicability represents an essential ethical consideration for emotion detection within bureaucratic spaces [13], [18]. This is particularly relevant within rule-driven education institutions where user-level FER data could be used to determine a learner's educational path. Given the institutional and political dimensions which may limit access to due process in education contexts, we argue that explicability should expand beyond a pronouncement at the onset of data collection to include explicit descriptions of how learners' biometric data will be collected and maintained, how the system will interpret the collected data, which institutional mechanisms will draw from these interpretations to make data-driven decisions, and the avenues with which students can refute these interpretations or opt out of FER data collection altogether. This expanded model of explicability would provide learners and caretakers with a clear understanding of how their data will be interpreted and allow them to orient their behaviors to achieve desired institutional outcomes or avoid punitive actions if needed [13]. Algorithmic and data regulation in European markets is laying the groundwork for expanded explicability policies and discussions of an individual's "right to an explanation" [13], [65]; however, limited legal incentives elsewhere have left explicability in the hands of companies and developers. This leaves governing bodies and private organizations to self-select their own level of explicability, and because robust explicability is often viewed as logistically and financially prohibitive, continued opacity becomes the likely result.
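The expanded model of explicability described above (what is collected, how it is retained, how it is interpreted, which institutional mechanisms act on it, and how learners can refute interpretations or opt out) can be sketched as a disclosure record. The class and field names below are hypothetical illustrations of such a record's minimum contents, not a proposed standard.

```python
from dataclasses import dataclass

@dataclass
class ExplicabilityDisclosure:
    """Hypothetical record of what an FER-enabled system would need to
    disclose to learners and caretakers under the expanded model."""
    data_collected: list          # e.g., ["facial video", "derived emotion labels"]
    retention_policy: str         # how data is stored and for how long
    interpretation_method: str    # how the system maps expressions to labels
    decision_mechanisms: list     # institutional processes that consume the labels
    refutation_contact: str       # avenue to contest interpretations
    opt_out_procedure: str        # how to withdraw from collection entirely

    def is_complete(self) -> bool:
        # A disclosure is complete only when every field is non-empty;
        # omitting any one of them leaves learners in a black box.
        return all([self.data_collected, self.retention_policy,
                    self.interpretation_method, self.decision_mechanisms,
                    self.refutation_contact, self.opt_out_procedure])

disclosure = ExplicabilityDisclosure(
    data_collected=["facial video", "derived emotion labels"],
    retention_policy="deleted at end of school year",
    interpretation_method="vendor FER model mapping expressions to labels",
    decision_mechanisms=["instructional pacing adjustments"],
    refutation_contact="district data protection officer",
    opt_out_procedure="single-form opt-out with no loss of instruction",
)
print(disclosure.is_complete())
```

Framing disclosure as a structured record with a completeness check mirrors the argument in the text: explicability fails whenever any single element, most often the refutation or opt-out avenue, is left unspecified.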
2) Computational Literacy is Key: Concerns around the explicability of FER are further compounded by the limited scope of computational literacy training currently provided to education professionals. Historically, educators are trained on how to select and apply available EdTech offerings to promote student engagement or deliver instructional content. Stakeholders have only recently begun to work towards explicit educator preparatory programming on computational topics such as computational thinking [66] or AI ethics in education [67]. As such, discussions encouraging explicability and transparency to protect users within affectively aware systems must recognize that educational institutions are tasked with an enormous number of existing responsibilities. Professionals within these spaces should not be expected to also develop in-depth knowledge of AC and FER applications to protect the interests of the students they serve. Rather, we argue that the moral responsibility of explicability lies with researchers and developers, wherein the collaborative approach of RP [14] would have them work with education professionals to create and disseminate intelligible explanations of FER-enabled systems specific to educational settings.

VII. CONCLUSION
Classrooms often act as a microcosm of the social, political, and cultural challenges that surround them, and, thus far, data produced by existing educational technologies have failed to capture this complexity, resulting in outcomes that range from morally concerning to actively harmful. While no researcher or developer can mitigate every contextual element, nor shelter users from harms enacted by existing educational policies, they can consider these ethically relevant issues in ways that proactively weigh the moral implications of utilizing FER to detect emotions in applied educational settings.
Incoming reports have highlighted ongoing harms happening at the intersection of emerging tech and education [2], [42], and it is our hope that in presenting these issues and their potential carryover to eventual affectively aware systems, professionals across technology, cognitive sciences, philosophy, and education can join in proactive, critical discussions of what it means to scale FER within spaces already struggling to protect students' wellbeing. In reviewing the range of contemporary education issues presented here, we challenge readers to reflect on the ways that envisioned affective computing applications could join existing efforts from external stakeholders (e.g., EdTech, politicians, assessment companies, etc.) in proliferating these issues and exerting undue influence over educational environments. We intend that future work will build on our Reflexive Principlist evaluation of the ethical landscape related to FER use in educational contexts, developing guidelines for policy and future practice for practitioners.
In the immediate term, we implore readers to heed concerns that FER data collection is not "ethically appropriate" in governmental and bureaucratic spaces like classrooms [19]. The belief that learner facial expressions can be used to reliably detect, extract, and attend to student emotions presumes that facial expressions of emotions are universal and always directly related to the instructional elements within our control. When we reflexively consider the diverse dimensions, both ethical and contextual, shaping today's learners and the contemporary educational experience, it becomes clear that this categorization of emotions may be inappropriate outside of research contexts and in spaces where misuse of FER data stands to expand education's growing digital surveillance culture.