Designing Adaptive Extended Reality Training Systems Based on Expert Instructor Behaviors

Advances in training technologies using Extended Reality (XR) offer dramatic improvements in time and associated costs over traditional training methods. Industries reliant on human labor are starting to embrace XR devices such as Virtual and Augmented Reality (VR/AR) Head-Mounted Displays (HMDs) to capitalize on these advantages. Additionally, developers now have the capacity to create customized user experiences and reduce instructor workload by employing adaptive automation in XR training simulations. Adaptive systems analyze the behavior of a learner for a trigger and then deploy an adaptation to alter the system state. While examples of adaptive XR training systems in the current literature show that they are feasible to develop, further research is needed to quantify their efficacy. Furthermore, recommendations for designing triggers and adaptations for XR training simulations do not currently exist, and inappropriate adaptations can negatively impact training. This paper provides novel, evidence-based recommendations for the design of future adaptive XR training systems by learning from expert XR instructors. To this end, semi-structured interviews were conducted with 11 XR trainers. Participants were asked to discuss their experiences dealing with learners who exhibited confusion during XR training. Interviews were analyzed for existing and emerging themes. Finally, these themes were applied to existing trigger and adaptation models and synthesized into design recommendations for XR training developers. The outcomes of this work will inform the future development of adaptive XR training platforms.


I. INTRODUCTION
Virtual Reality (VR) and Augmented Reality (AR) technologies use computer graphics to generate 3D content and simulated environments. VR does this by replacing the user's view of the real world with a completely artificial one. AR, on the other hand, supplements the user's view of the real world with additional computer-generated content, creating an environment that blends the two. Extended Reality (XR) encompasses AR, VR, and more, including spatial computing technology, Head-Mounted Displays (HMDs), projection systems, and non-invasive brain interfaces. These XR technologies are being applied to a myriad of industries, including training and education.
The associate editor coordinating the review of this manuscript and approving it for publication was Michele Nappi.
Many researchers have studied the impact XR training has on learners, also known as trainees. Many of these studies have found that the use of XR technology significantly increases learner performance metrics during and after training [1], [2]. XR systems were also found to decrease mental workload by providing relevant information in the proper training context [3]. Furthermore, the time savings associated with increased performance and training speed have been shown to decrease associated costs by up to 50% [4]. However, one place where XR training is still lacking is in reducing the effort required of an instructor, also sometimes referred to as the trainer. XR instructors must maintain a high level of situational awareness and attention in order to give appropriate feedback during learning, but this is inhibited by XR devices, such as HMDs, because the instructor often cannot see the virtual environment as the learner does [5]. One way to remedy this problem is by supplanting or augmenting the role of the instructor with adaptive automation.
Adaptive automation is a system that actively adjusts the allocation of tasks between a human and a computer based on a trigger event [6]. Adaptive automation has shown promise as a tool for increasing learning outcomes in training [7], [8]. By combining adaptive automation and XR to create an adaptive training system, greater performance gains can be made while reducing the workload placed on XR instructors [9]. Several examples employing adaptive XR training systems have been published in fields ranging from military training to physical rehabilitation [10], [11]. However, little research currently exists on the design of adaptive triggers and on how to choose the appropriate type of adaptation to best suit the needs of a learner in an XR training system. Additionally, if employed improperly, adaptive automation can lead to over-reliance on the system, meaning that errors are likely to occur if the computer's logic fails or the automation is removed. Poorly designed automation can also increase mental workload, making research in this area even more necessary [12].
This paper begins to fill the void of research in the design of triggers and adaptations for XR training by drawing from existing experts. Based on an extensive literature search, this work is one of the first published that provides detailed, research-based recommendations for the design of adaptations and triggers specifically for XR HMD technology. For this work, 11 XR training experts were interviewed about their experiences providing feedback to learners who were immersed in virtual environments. Given that the population of XR training experts is relatively small, this sample size was determined to be sufficient. By applying their experience to existing adaptive system models, the authors identify common triggers and adaptation methods used by experts. Finally, the authors discuss how this information can be implemented in the development of future adaptive XR training systems by setting forth design recommendations.

II. BACKGROUND
This section reviews previous research essential to understanding adaptive XR training systems. The first topic is XR training, which has been thoroughly researched and shown to provide many benefits in terms of training and procedural task performance. The second is adaptive automation, whose trade-offs will be discussed and exemplified in the text. Lastly, the scarce amount of research combining these two technologies will be discussed, along with the need for the current research.

A. XR SIMULATIONS FOR TRAINING
XR technologies have rapidly evolved in the twenty-first century and are being embraced as a training tool in various industries. One reason is that XR simulations can be used to train for situations that are potentially hazardous or difficult to simulate in real life [13]. Therefore, by using a controlled virtual simulation, learners can be better prepared for dangerous or uncommon circumstances when they arise. Additionally, manufacturing, assembly, and maintenance are frequently cited applications of XR training in the literature and have been found to have dramatically positive effects on training outcomes [14]–[16]. XR training has also been shown to improve performance and reduce training time for medical students learning to conduct various types of laparoscopic surgery [17]–[19]. Additionally, a recent review of studies that used VR HMDs for training and education found that existing research supports the use of this technology for training in psychomotor tasks, such as surgery or search tasks, and affective skills, such as coping with fears and phobias [20]. These studies demonstrate the varied applications of XR training, but each industry has a different reason for choosing XR technology over traditional training mediums.
Many industries have adopted XR technologies because they offer unique advantages over traditional training methods. For example, the use of XR technology in training has been shown to improve a learner's ability to perform spatial assembly tasks and knowledge retention over time [2], [21]. Miller and Waller studied subjects who were trained to complete spatial circuit board assembly tasks using one of three training methods: desktop VR, paper, or video tutorials. They found that the VR training condition took more time to complete but led to better performance retention over the course of two weeks than the traditional training methods [2]. Another study by Ganier, Hoareau, and Tisseau found that VR training could be successfully transferred to real-world tasks [22]. Their study tested 42 participants who were trained for a maintenance procedure on a control panel using one of three methods: 1) VR training, 2) traditional training, and 3) no training. After training, all participants were asked to complete the task on a real control panel. They found that those trained in VR performed just as well as those trained using traditional methods [22].
The most frequently demonstrated advantage of XR training is improved task performance. In one example, researchers studied the use of VR and AR for training in the assembly of an electronic actuator. They found that VR training was equally effective as traditional video training, and AR training provided performance improvement over video training for the application in question [23]. Another study by Hou et al. compared traditional paper schematics to AR instructions in the context of a pipe fitting task [4]. Their study showed that the use of AR instructions reduced completion time and the number of errors by 50%. These reductions were then used to compute a 66% cost savings for stakeholders [4]. In a more recent study, Doshi et al. found that using AR technology to project spot welding locations onto the workpiece could increase industrial welding accuracy by as much as 52% over traditional welding practices without AR [24]. Lastly, XR technologies like wireless HMDs have recently become cheaper and more portable. This allows training to occur where learners are located, instead of being limited to a specialized training facility. In the case of AR, this means the training can take place in the very environment where the skill will be used. This can present unique challenges, such as registration accuracy and privacy, which must be considered by system developers before choosing to use this technology.
Despite all of these potential benefits of XR training, there is still room for improvement when it comes to providing crucial feedback to learners who are immersed in virtual training simulations. In a study by Kruglikova et al., it was found that medical students using VR training equipment learned to perform an endoscopic colonoscopy more quickly and accurately when they received feedback from an instructor, compared to a group that used VR training but did not receive feedback. Additionally, the number of failures due to perforations was reduced from seven in the non-feedback group to zero in the group that received feedback from instructors [25]. However, it can still be difficult for instructors to provide effective and timely feedback because observation of, and communication with, the learner is often hindered by an XR device, such as an HMD [26]. One solution to this problem is to use adaptive automation to monitor and provide customized feedback to the user. This will ultimately reduce the burden that is placed on the instructor while maintaining the benefits that feedback provides during XR training.

B. ADAPTIVE AUTOMATION
In simplest terms, automation is the allocation of a task, once performed by a human, to a computer [27]. Adaptive automation goes one step further by sensing the state of a user and reallocating tasks between the human and the computer in real time with the goal of increasing overall performance of the task [6].
While many different models of adaptive automation exist today, most are not specific to training. However, Kelley defined several key characteristics that all adaptive training systems share. He argued that they are all feedback loops including four basic elements: a stimulus, a human learner, a performance measurement, and adaptive logic [7]. Figure 1, which is adapted from a similar flow diagram by Feigh, Dorneich, and Hayes [28], shows how this process works in practice when combined with XR technology. In the figure, the stimulus is delivered to a user in the form of the virtual environment (VE), which aids in performance measurement by processing user inputs to the system. Next, adaptive logic is used to select a response. During this stage, the adaptive system identifies the need for automation using a trigger, which is activated when a performance measurement meets a certain threshold. Then, an adaptation method (also sometimes called an adaptation strategy) is employed based on the trigger and other system characteristics [29]. Finally, the system applies the automation and repeats the process until the user's training goals are met or the session has ended. The only adaptation made to the original figure is the inclusion of a learner wearing an XR device.
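This trigger-then-adapt loop can be made concrete with a short sketch. The code below is a minimal, hypothetical illustration of the pattern described above; the performance measures (error count, idle time), the thresholds, and the adaptation names are all assumptions made for illustration, not elements of any published adaptive XR system.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PerformanceSample:
    """One performance measurement taken from the learner's inputs to the VE."""
    errors: int          # errors observed since the last measurement
    idle_seconds: float  # time with no user input, a possible sign of confusion

def trigger_fired(sample: PerformanceSample,
                  max_errors: int = 3,
                  max_idle: float = 20.0) -> bool:
    # Trigger: activates when a performance measurement crosses a threshold
    return sample.errors >= max_errors or sample.idle_seconds >= max_idle

def select_adaptation(sample: PerformanceSample) -> str:
    # Adaptive logic: choose an adaptation method based on what fired the trigger
    if sample.idle_seconds >= 20.0:
        return "highlight_next_step"  # learner appears stuck: cue the next action
    return "show_hint"                # learner is making errors: offer guidance

def adaptive_step(sample: PerformanceSample) -> Optional[str]:
    # One pass through the loop: measure, check the trigger, adapt if needed
    if trigger_fired(sample):
        return select_adaptation(sample)
    return None  # no adaptation; keep delivering the stimulus unchanged
```

In a real system, `adaptive_step` would run once per measurement cycle until the learner's training goals are met or the session ends, mirroring the loop in Figure 1.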
Many researchers have studied and defined a concept called Levels of Automation (LOA), which describes how much control the computer has versus the human in an adaptive automated system [30]. One of the most commonly used models of LOA has ten levels, ranging from complete manual control at level 1 to complete computer control at level 10 [31]. This LOA model was later augmented by Parasuraman, Sheridan, and Wickens to include a second axis corresponding to the stages of human information processing: information acquisition, information analysis, decision selection, and action implementation [27]. Kaber et al. tested various levels of automation using an air traffic control task. They found that the human-automation team performed best when the human remained responsible for analytical processes like information analysis and decision making, while the computer acquired the information and executed the action dictated by the human [32]. Other research has found that the use of automation can lead to over-reliance and miscalibrated trust, especially when the human believes the computer is faultless [12]. This finding is especially important in the context of adaptive training, because learners may not have the help of automation after training is complete. Therefore, over-reliance on automation could lead to failure in the field, making it even more important to carefully consider the design of triggers and adaptation methods for adaptive XR training.

C. ADAPTIVE XR TRAINING SYSTEMS
While adaptive automation is not completely new to the field of XR training, its efficacy for XR applications is not well studied. In fact, in 2020, a literature review by Zahabi and Razak found only 12 published lab studies on adaptive VR training applications [33]. Only three of those used HMDs in their research, and all of them suffered from small sample sizes [34]–[36]. Furthermore, a consistent naming convention for this combination of technologies does not yet exist, making it difficult to find relevant information. Several researchers have developed VR training systems that incorporate adaptive elements, but few have studied their effects on training transfer or discussed the process of designing an adaptive XR training system. One of the earliest examples of adaptive training in XR was a system developed by Rickel and Johnson that used a virtual agent acting as either an instructor or a member of a training team for Naval operations [11]. The virtual agent, ''Steve'', could provide verbal adaptive feedback and instructions based on the performance of the trainee and could give physical cues when it was the learner's turn to act. Although no formal evaluation of the efficacy of this type of training was presented, learners were able to successfully complete an aircraft carrier maintenance task with the help of the adaptive virtual agent [11].
More recently, a study was conducted by Fricoteaux, Thouvenin, and Mestre in 2014. Their work evaluated a 3D power wall simulator system for fluvial navigation that could manipulate task difficulty and user interface elements based on user input and computed performance metrics. They found that the adaptive system helped learners predict and control the movement of the boat during the simulated training 10% better than non-adaptive training [37]. A few researchers have studied the use of adaptive XR training using HMDs as well. In a 2018 experiment, Barzilay and Wolf used EMG, motion tracking, and machine learning techniques to adapt upper arm rehabilitation to a patient's needs. They found that their system, using the eMagin 3DVisor HMD, resulted in a 33% average increase in triceps performance on the training task after one VR training session [10]. However, they did not compare their system to one without adaptation. In 2018, Lang et al. published a paper which compared an adaptive driving trainer to one without adaptive features [36]. Their simulation used the driver's performance and gaze measurements to select the difficulty of the driving scenario presented in a VR HMD. They found that those who used the adaptive system outperformed the control group in both driving scores and response time. However, each group had only 10 participants, limiting the statistical power of the results.
Some researchers have also studied the use of adaptive systems that change based on the user's emotions using video or image capture of a user's face. In one recent case, researchers showed that emotions could be successfully interpreted from a single picture of the user's eye area [38]. Similarly, in 2019, Dey et al. used galvanic skin response (GSR) and electroencephalogram (EEG) to successfully recognize user emotions 96.5% of the time [34]. They were able to use this data as a trigger to adapt a VE. However, neither of these emotion recognition methods is currently feasible with commercial HMD hardware. While newer HMDs like the Vive Pro Eye collect eye tracking data, they do not provide images of the eye area. Additionally, GSR and EEG require additional wearable hardware that is somewhat invasive and can constrain movement.
While several examples of adaptive XR training systems have been published, most are descriptive in nature and lack rigorous evaluation of their effectiveness for training when compared to non-adaptive methods. Furthermore, none of these works describe the process used to design adaptations for XR training systems, leaving the rest of the community in the dark when attempting to design new and effective adaptive XR training systems. This gap in research can lead to the design of ineffective, and even detrimental, XR training systems due to over-reliance on and mistrust of the system. The research presented in this paper begins to fill this void in the current body of work by interviewing XR training experts and synthesizing adaptation design recommendations for the XR training community. These recommendations will allow training designers to more effectively combine these two powerful training technologies.

III. METHODOLOGY
In this research, the authors posit that drawing from the experiences of real XR training personnel can aid in the effective design of triggers and the selection of adaptation methods for adaptive XR training systems. In the following subsection, justification for the ''instructor as an adaptive system'' model will be provided. Then the data collection procedures and analysis methods will be discussed.

A. INSTRUCTOR BEHAVIORS AS MODELS FOR ADAPTIVE AUTOMATION
One way to explore the design considerations for adaptive automation within XR training is to examine the behavior of XR training instructors. In fact, Kelley describes adaptive training systems as ''merely the automation of a function performed by a skilled instructor'' [7, p. 547]. The process an instructor uses to provide feedback to learners (Figure 2) is very similar to the process shown in the adaptive automation model in Figure 1. The instructor perceives the current state of the learner and assesses their performance. The instructor then uses their knowledge of the system to identify triggers and select the proper adaptation. Lastly, they implement the adaptation either by communicating with the learner or by making a change to the system. However, differences arise in how information is received by the instructor and translated into an adaptation for the learner. These differences are highlighted in red in Figure 2. The instructor observes the learner through their own senses, through computer-mediated channels such as screen mirroring, and sometimes by viewing information from sensors within the system. The instructor can also choose to act directly on the user, on the simulation, or both, depending on the software features.
The similarities in the two models presented serve to illustrate how the themes gathered from the present interviews of XR training experts can be applied to an adaptive XR training framework. In the adaptive automation model, shown in Figure 1, triggers are used to determine when an adaptation is needed. This is analogous to a human instructor identifying signs of confusion in a learner. Furthermore, the adaptive automation model uses pre-programmed rules or artificial intelligence to select an adaptation method. Similarly, a human instructor uses the information at hand, and previous experience, to choose a mitigation strategy appropriate for the circumstances. Therefore, the methods used by an instructor may also be used to develop XR adaptive training systems. Using semi-structured interviews, the authors identified common triggers and adaptation methods used by adaptive XR training instructors. The data gathered was analyzed and used to make recommendations for the design of adaptive automation within the XR training domain.

B. STUDY PROCEDURE
Participants were recruited using word of mouth and emails to the authors' connections within the XR community. Additional candidates were recruited via referrals from participants. Before participating in the study, potential participants were asked to complete a screener survey to ensure they qualified to participate. Participants qualified if they indicated having previously facilitated training using AR or VR technology. In total, 17 people completed the screener survey. Because of the targeted recruitment tactics, all of the individuals who completed the screener qualified to be interviewed. Of the 17 instructors who qualified, 11 agreed to participate in an interview via Zoom video conferencing software. The sample size of 11 interviews was determined to be sufficient for this research for three reasons:
1) The authors were satisfied that they had reached a point at which no new themes were emerging, also known as a saturation point.
2) The size of the XR training instructor population is quite small since the technology is relatively new.
3) The goal of this research is to inform the development of future adaptive XR training research, not to draw inferential conclusions.
All interviews were conducted in April and May of 2020. The lengths of the interviews ranged from 24 to 61 minutes, with an average of 37 minutes per interview. The interviews had a semi-structured format. Participants were asked to describe their previous experience with XR training and then prompted to answer specific questions about learners who experienced confusion during XR training. A list of interview questions was composed and used to address three research questions:
1) What are common sources of confusion during XR training?
2) How do instructors know that a trainee is confused during training?
3) What do instructors do when they realize a trainee is confused?
A complete list of questions asked during the interviews can be found in the appendix. Additionally, follow-up questions were posed when appropriate to gain a more detailed understanding of each participant's experience as it pertained to this research. Informed consent was collected from each interview participant. All of the interviews were recorded and transcribed either manually or using the Rev.com transcription service.

C. ANALYSIS
The first step in analyzing the transcripts was to create and implement a coding system. The R Qualitative Data Analysis (RQDA) package was used to apply codes to the data. General code categories that aligned with the research questions were created at the onset of the interviews with additional, more specific codes, developed as the coding progressed. The final code book consisted of 100 unique codes grouped into 15 categories.
The majority of coding was conducted by a single coder, while a second coder analyzed a subsample of the data (n = 3) in order to calculate inter-coder reliability. Transcripts were divided into paragraphs based on each new line of questioning during the interview. Relevant codes were then applied to each paragraph. To calculate reliability, each code was considered dichotomous (present or not present in the paragraph) and independent of the other codes. The reliability for each code was calculated using Cohen's Kappa for two raters with unweighted values [39]. Using this method, the agreement between the two coders across all codes was almost perfect (κ = 0.917, p < .0005). This showed that the coding system was robust. In the following section, the variable n is used to describe the number of interview participants who gave a response that fit the theme or had the characteristic described.
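As a concrete sketch of this reliability calculation, the function below computes unweighted Cohen's kappa for a single dichotomous code applied by two raters to the same set of paragraphs (1 = code present, 0 = absent). This is an illustrative reimplementation of the standard formula, not the actual analysis script used in the study, and the example ratings are made up.

```python
def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters over binary (0/1) ratings."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: proportion of paragraphs coded identically
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, from each rater's marginal proportion of "present"
    p_a, p_b = sum(rater_a) / n, sum(rater_b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    # Kappa: agreement beyond chance, normalized by its maximum possible value
    return (observed - expected) / (1 - expected)
```

For example, two raters who agree on three of four paragraphs, with marginals of 0.5 and 0.25, yield a kappa of 0.5; perfect agreement yields 1.0.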

IV. RESULTS
Interview participants consisted of two women and nine men. They had varying amounts of experience with XR training, ranging from 1 to 10 years with an average of 3 years, and all had an educational level of Associate's degree or higher. Participants also had experience with different forms of XR technology. Four of those interviewed had experience facilitating training with only VR technology, one had experience with only AR technology, and six had experience with both AR and VR training technology. The models of hardware used by the instructors varied as well. VR equipment used included the HTC Vive, HTC Vive Pro, Oculus Rift, Oculus Go, Oculus Quest, Valve Index, HTC Cosmos, PlayStation VR, and Windows Mixed Reality devices. AR equipment used by the participants included the Microsoft HoloLens, HoloLens 2, Google Glass, and RealWear HMT-1. Additionally, one participant reported using a flight simulator system with a projection screen for training. The participants had experience conducting XR training for various industries, which fell into four main categories: manufacturing/maintenance (n = 6), military (n = 2), education (n = 2), and demonstrations/research (n = 1). Table 1 further summarizes the characteristics of the interview participants.
The following analysis is divided into three categories that describe the participants' experiences with managing confusion during XR training, the first step when determining the potential need for an adaptation. The first section will describe the causes of confusion, the second will list the ways that instructors identify confusion in learners, and the third section will show how instructors intervene to mitigate this confusion.

A. SOURCES OF CONFUSION
When asked to recall common causes of confusion for learners during XR training, instructors had a wide range of answers that were often specific to the particular type of training they were facilitating. However, trends emerged that allowed the authors to categorize these sources of confusion. The graph shown in Figure 3 depicts the frequency of different causes of confusion as reported by the participants. This chart, as well as subsequent charts, depict the number of distinct experts who cited each cause of confusion, not the total number of mentions summed across all interviews. This method was chosen in order to show the pervasiveness of each cause throughout the XR training community. From Figure 3, it is clear that hardware, software, and instructions were the largest drivers of learner confusion.
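The tallying rule described above, counting each cause once per distinct expert rather than once per mention, can be sketched as follows. The participant IDs and code labels below are illustrative placeholders, not the study's actual coded data.

```python
from collections import defaultdict

# One (participant, code) pair per coded paragraph; a participant may
# mention the same cause many times but should be counted only once.
coded_excerpts = [
    ("P1", "software"), ("P1", "software"), ("P1", "instructions"),
    ("P8", "hardware"), ("P9", "hardware"), ("P9", "hardware"),
]

def distinct_expert_counts(excerpts):
    """Count how many distinct participants mentioned each code."""
    mentioned_by = defaultdict(set)
    for participant, code in excerpts:
        mentioned_by[code].add(participant)  # the set de-duplicates repeats
    return {code: len(people) for code, people in mentioned_by.items()}
```

With the placeholder data above, "hardware" counts two experts even though it was mentioned three times, which is the pervasiveness measure plotted in Figure 3.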

1) HARDWARE
The most commonly cited source of confusion was the hardware itself (n = 8). Instructors often responded that learners became confused because the hardware did not work as intended or they did not know how to perform the required action using it. One expert, quoted below, noted how learners would become confused because they did not know how to use the VR controllers: ''And we really started to notice that, especially for new people too, anything that we asked them to do beyond just that trigger, the difficulty they had just went up exponentially.'' (P9) In another example, Participant 8 reported that a learner experienced confusion when the batteries in the controllers died and the buttons became unresponsive. These examples showed how learners became confused when they did not understand how to use their XR hardware or it did not work properly. Similarly, learners became confused or frustrated when their immersion was interrupted by the hardware. For example, immersion could be lost when the virtual guardian, which is used to render the boundary of the VR play area, was activated, when the learner came in contact with cords, or (in the case of AR) when the visual fidelity was compromised by light.

2) SOFTWARE
Another frequently reported source of confusion was software not behaving as the developers of the simulation had intended (n = 7). These instances occur for a variety of reasons including software bugs or user error. In her interview, Participant 1 described this phenomenon in the context of a battery manufacturing simulation: ''If the system does something [that] we're not expecting it to do. For instance, if the instructions say that a battery is supposed to come down the line, and the battery doesn't come down the line.'' (P1) These software ''bugs'' or ''glitches'' often caused a learner to miss information that was crucial to the completion of their training task. Some software issues also interfered with the learner's ability to interact with the environment. Instructors reported software issues that affected interactions with buttons in menus, natural user interfaces (such as picking up parts), and locomotion.

3) INSTRUCTIONS
A third notable theme that caused learner confusion was instructions (n = 5). Confusion caused by instructions came in two forms: 1) misunderstanding the instructions and 2) not following them. The former often resulted when language used in the instructions was either too technical or not specific enough. For example, Participant 11 reported having to change relative directions like ''forward'' to objective ones like ''clockwise'' to ensure learner understanding.
Learners who read, but did not follow, the instructions did so for several reasons. First, they may not have seen or heard the instructions (e.g., if they were temporal in nature). Second, they may have executed an incorrect action that caused them to skip an instruction. Lastly, some instructors reported that learners purposefully attempted the task without reading the instructions. In these cases, the learners felt that they could execute the task without the help of the instructions, such as in this example given by Participant 1: ''They skip a step, which really shows me that they may not be reading. It could be something as simple as not going to get the cart to pull it near the machine. . . That tells me right away they didn't read the instructions, they're just going according to what they think is best.'' (P1) Typically, learners proceeded without reading the instructions when they were already somewhat experienced with the task. In these situations, a learner would become confused if the simulation did not behave as expected, often because the simulation's behavior did not match the learner's mental model.

4) OTHER CONFUSION CAUSES AND FACTORS
Two other causes of confusion that came up during the interviews are not expressed as themes because of their low number of mentions but are worth noting. The first was visual obstructions, such as limited field of view and occlusion of objects in the simulation (n = 3). Visual obstructions resulted in confusion because learners could not see the part of the environment with which they needed to interact. The second had to do with learners' mental models of objects in the environment (n = 3). In these cases, learners either did not know they were supposed to interact with an object or did not know how to properly interact with an object in the simulation. Finally, when asked about causes of confusion, several of the instructors mentioned other factors that were not direct causes of confusion but could exacerbate a learner's sense of confusion. These factors included discomfort (n = 6) and inexperience with XR technology (n = 4).

B. IDENTIFYING CONFUSION
The second goal of this research was to understand how instructors identify confusion in learners while they are immersed in an XR training environment. Based on the interview responses, all of the instructors (n = 11) recalled noticing confusion in learners during training. Additionally, two of the instructors reported that they identified confusion following a training session through verbal feedback from the learners. The following subcategories will describe how instructors identified confusion during training rather than after.

1) VERBAL
Methods of identifying confusion during XR training were varied and often included the observation of more than one identifying characteristic. The most frequently reported method of identifying confusion was verbal cues made by the learner (n = 10), as shown in Figure 4. Learners used several different kinds of verbal cues to indicate their confusion, as shown in the comments below: ''In a face-to-face training or orientation, they're wearing a headset and I'm standing nearby, and they'll usually ask the question, 'How do I grab this?''' (P8) ''They'll just say, 'What do I do next?' Or, 'What am I supposed to do now?' or something along those lines.'' (P3) Sometimes direct statements were used to indicate the state of confusion as well as its source, such as in the example given by Participant 8 (above). Other times, learners employed indirect statements, or vocalizations that indicated the state of confusion but not its source, such as in the example from Participant 3 (above). This placed the responsibility of finding the cause of the confusion on the instructor.
Verbal signs of confusion were conveyed either naturally or through computer-mediated channels. Natural communication occurred when instructors and learners were co-located. During distributed training, microphones and speakers were used to mediate verbal communication. If any part of this computer-mediated communication channel was interrupted, such as by the muting of a learner's microphone, signs of confusion could go unnoticed by the instructor. Computer-mediated communication also allowed learners to respond to verbal prompts from the instructor in new ways. One instructor (P4), who used an online virtual meeting space to facilitate training, reported that she could gauge understanding based on a learner's use of emojis in VR. She referred to these emojis as a type of virtual ''body language'' that she used to quickly gauge understanding and engagement among large groups of XR learners.

2) PHYSICAL
The data collected during interviews with XR training instructors showed that physical behavior of the learners could also indicate confusion (n = 9). Some instructors cited specific body language characteristics that could indicate confusion such as increased head movements, or lack of body movements. However, the majority of physical indicators described by the participants were more specific to the training task.
Instructors often identified confusion when learners exhibited physical behaviors that did not aid in their progress toward the training goals. According to the instructors interviewed, behaviors that indicated the learner was not making progress toward the training goal included body language cues (n = 6), incorrect actions (n = 5), standing in the wrong location (n = 4), not making an action (n = 3), and not reading the instructions (n = 2). Figure 5 illustrates these trends for physical indicators of confusion.
In order to notice these types of physical indicators of confusion, it was necessary for the instructor to have familiarity with the task and to monitor how the learner was progressing through the training. During VR training, instructors described observing the learner's actions in three ways: 1) by mirroring the learner's view on a 2D monitor, 2) by watching them in real life, and 3) by immersing themselves in the virtual environment with the learner. Instructors who were co-located with the learner typically used the first two methods, while distributed instructors and learners used the third method. AR instructors observed the learner's actions in the real world and did not report using mirroring or immersing themselves in the VE. In cases where computer-mediated observation was not used, the instructor had to be extremely familiar with the task and remain vigilant to accurately assess the learner's progress and identify confusion. Five of the instructors interviewed for this research described their ability to identify confusion in learners as a product of their previous experience with the training materials. The following quote from P7 illustrates the importance of having knowledge and experience of the simulation in order to identify confusion proactively: ''The team of us created those tasks, and so in my head I know how they're working through and the steps that they're working through, so I can walk along with them as they do it.'' (P7) This quote exemplifies the current requirement that XR training instructors must be experts in both the subject of the training and in operating XR hardware. This level of expertise takes significant time and effort to gain. By using adaptive automation, stakeholders can reduce the number of expert instructors needed and increase the number of learners who can be trained at any given time.

C. ADAPTING TO CONFUSION
Once the instructors identified learner confusion, the next step was typically to employ a corrective action, also known as an adaptation, to help the learner meet their training goals. A variety of adaptation techniques were reported by the instructors and could be used alone or in combination to yield the desired training results. Figure 6 shows the most common adaptations ascertained from the interviews and their occurrences.

1) VERBAL
The most frequently used adaptation method was verbal intervention (n = 11). Using this method, the instructor could confirm the source of confusion and then provide a verbal adaptation that allowed the learner to perform a corrective action. This allowed the learner to remedy their confusion autonomously and form mental models to prevent future confusion.
When instructors used verbal adaptation, they often used different strategies to help learners overcome confusion. Some instructors first asked clarifying questions to further understand the root of the problem before giving further instruction, as in an example given by P11. Others directed the learner's attention to key components of the simulation, such as written instructions or user interface elements, to address their confusion. In these cases, the instructor did not give the learner any new information, but rather highlighted a part of the simulation that could help them reach the training goal. Lastly, many of the instructors supplemented the information given in the simulation with more detailed instructions.

2) PHYSICAL
The second most common adaptation method was to physically intervene with a learner (n = 6). In these cases, the instructor used their body to show a learner where to go or how to interact with the training system. This type of adaptation was used in two different ways. The first, and most common, was for the instructor to guide the learner through touch. For example, instructors described touching a learner's hands to show them how to use XR controllers or leading them by the shoulders to stand in the correct position to execute a task. The second way instructors physically intervened was by making gestures. Two of the instructors reported pointing to areas of interest or waving to get a learner's attention. One instructor (P8) also described taking the controller from a learner and demonstrating the correct action for them. However, it should be noted that these methods only work when the learner can see the instructor (such as in AR) or a representative avatar (such as in symmetric VR).

3) INSTRUCTIONAL
The third adaptation method involved changing the instructional content of the simulation itself. In these cases, instructors altered the wording or specificity of the instructions presented to the learner in order to resolve recurring sources of confusion. Because such changes typically required time and collaboration with developers to implement, they could not always be applied during a live session and were often paired with a temporary adaptation, such as verbal feedback, until the updated instructions were in place.

4) PROCEDURAL
The final adaptation method used by expert instructors was procedural in nature (n = 5). In this category, instructors changed the order, or duration, of the tasks presented in the training. This included pausing, restarting, or repeating all or part of the training simulation. For example, Participant 1 recalled having to restart a training simulation after experiencing a software issue. In the experiences recalled by the experts, procedural adaptations did not affect the material presented in the simulation. The same material was presented again or resumed after a break. It is also notable that the instructors' goal when applying procedural adaptations was not to reinforce training goals, but rather to combat software malfunctions or learner fatigue and frustration.

5) INCREMENTAL ADAPTATIONS
Early in the interview process it became clear that instructors often use more than one adaptation method to mitigate confusion, depending on the circumstances. Analyses showed that instructors did not always apply the intervention that would solve the problem the fastest, but rather increased the amount of intervention incrementally (n = 6). Multiple instructors first used a verbal intervention before escalating to a physical intervention, such as in the following example given by Participant 3: ''So, we have one of the headphones propped open, and then we'll just try to verbally explain it, and if not just let them know, 'okay I'm going to touch your hands,' and then place their finger on the button they should be using so that they understand where it is.'' (P3) This allowed instructors to apply the least intrusive adaptation first, observe the outcome, and apply a more aggressive adaptation, if necessary. This also had the advantage of allowing a learner to potentially remedy their confusion somewhat independently.
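As an illustrative sketch (not drawn from the interview data), the incremental strategy described above can be expressed as an ordered escalation ladder, where the system applies the least intrusive adaptation first and escalates only while the confusion trigger remains active. The adaptation names and their ordering below are hypothetical:

```python
# Hypothetical escalation ladder, ordered from least to most intrusive.
# The specific adaptation names are illustrative assumptions.
ESCALATION_LADDER = [
    "verbal_hint",        # e.g., play a short audio prompt
    "visual_highlight",   # e.g., highlight the relevant object or button
    "demonstration",      # e.g., animate the correct action in the VE
]

def next_adaptation(level: int):
    """Return the adaptation at the given escalation level, or None if exhausted."""
    if level < len(ESCALATION_LADDER):
        return ESCALATION_LADDER[level]
    return None

def escalate(still_confused: bool, level: int) -> int:
    """Advance one rung only while the confusion trigger remains active."""
    return level + 1 if still_confused else level
```

A session would start at level 0, apply `next_adaptation`, observe the outcome, and call `escalate` only if the trigger is still firing, mirroring the instructors' observe-then-escalate behavior.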

V. DISCUSSION
Determining when and how instructors adapt to mitigate confusion is the first step in an informed design process for adaptive training systems. By looking at the way instructors deal with learner confusion in XR through the lens of adaptive systems, developers can increase the usability of their simulations and decrease the need for observation during XR training. This discussion section is divided into three sub-sections. The first discusses treating disparate causes of confusion in XR training, along with the feasibility of mitigating them using adaptive automation. The second describes how the information from the interviews can be used to design triggers in adaptive XR training simulations. The third discusses how adaptive methods used by human instructors can be applied within closed-loop XR training simulations. Within each subsection, explanations of how these findings should influence the design of AR and VR adaptive systems are given.

A. TREATING DISPARATE CAUSES OF CONFUSION
Causes of confusion varied widely based on the interview data because of different training goals and conditions. The present work generalized the sources of confusion and summarized them based on factors that were common among all, or most, of the XR training scenarios: hardware, software, and instructions. Another way to explain confusion is through the gulfs of evaluation and execution [40]. Gulfs of evaluation occur when a learner does not know what task they are meant to execute. Gulfs of execution occur when a learner does not know how to execute a task. Each of the causes of confusion reported by instructors can be categorized into one of these two groups.

1) GULFS OF EVALUATION
A gulf of evaluation occurs when a user's goal is unclear. During the interviews, this was best exemplified by confusion caused by the instructions in the simulation. Like all technologies, AR and VR are susceptible to confusion caused by gulfs of evaluation [41]. Gulfs of evaluation also occurred when software bugs prevented instructions, or other pertinent information, from being conveyed to the learner. In all of these cases, the learner did not have enough information to know what task to complete, let alone how to complete it. Therefore, when treating confusion caused by instructions (or lack thereof), the gulf of evaluation can be bridged by providing the missing information that helps the learner understand the state of the system [41]. In many cases, the problem can be solved in the development stage by increasing specificity, changing the wording of instructions, or presenting the information through a more salient channel. Adaptive automation can be helpful for bridging the gulf of evaluation when the XR simulation is meant to provide varying levels of difficulty to the learner. For example, adaptive automation can be used to provide increasingly detailed instructions as the system detects inaction or incorrect actions performed by the learner.
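The progressive-detail adaptation suggested above could be sketched as follows; the instruction wording and the inaction threshold are assumptions for illustration, not taken from any system described in this work:

```python
# Illustrative instruction variants at increasing levels of detail.
# Both the wording and the 10-second threshold are hypothetical.
INSTRUCTION_LEVELS = [
    "Assemble the part.",
    "Attach the bracket to the base plate.",
    "Pick up the bracket, align its holes with the base plate, and press until it clicks.",
]

def select_instruction(seconds_inactive: float, threshold: float = 10.0) -> str:
    """Show more detailed wording the longer the learner remains inactive."""
    level = min(int(seconds_inactive // threshold), len(INSTRUCTION_LEVELS) - 1)
    return INSTRUCTION_LEVELS[level]
```

Each sustained period of inaction advances the learner one level of detail, bridging the gulf of evaluation without overwhelming learners who are making progress.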

2) GULFS OF EXECUTION
A gulf of execution occurs when the goal is clear, but the user does not possess sufficient information to make progress toward that goal. During this research, it became clear that confusion caused by system hardware often resulted in gulfs of execution. In these cases, learners knew their intended goal, but the hardware or software prevented them from executing the requisite actions [41]. For example, not understanding how to use the controllers, dead batteries, or loss of tracking all interfered with the execution of key actions in both AR and VR. Additionally, visual renderings created by AR HMDs suffer in the presence of light pollution, resulting in gulfs of execution. Similarly, confusion derived from software bugs resulted in gulfs of execution when learners were prevented from interacting with the environment. While some of these causes of confusion could be solved by providing more information regarding system interaction interfaces, others require more disruptive interventions such as system restarts, re-calibration, or, in the case of AR, adjusting environment variables such as light. This can make confusion resulting from gulfs of execution difficult for an adaptive system to diagnose and remedy. However, it is still possible to use adaptive automation to remedy confusion caused by gulfs of execution as long as ample signals are still reaching the system. For example, a learner who does not know how to use the controllers to execute a task may be identified using a trigger such as incorrect button presses. The system could adapt by providing the learner with more in-context information about how to use the controllers. The following subsections discuss recommendations for the design of triggers and adaptations that can help overcome both gulfs of evaluation and execution.
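A minimal sketch of the controller-help trigger mentioned above might count unexpected button presses; the expected button name and the press limit below are hypothetical:

```python
# Hypothetical trigger: repeated unexpected button presses suggest the learner
# does not know how to operate the controller, so the system should respond
# with in-context controller help. Button names and the limit are assumptions.
def controller_help_trigger(presses: list, expected: str, limit: int = 3) -> bool:
    """Fire when the number of unexpected button presses reaches the limit."""
    wrong = sum(1 for p in presses if p != expected)
    return wrong >= limit
```

When the trigger fires, the adaptation would present controller usage information in context, addressing the gulf of execution while the learner's goal remains unchanged.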

B. IMPLICATIONS FOR TRIGGERS
Many of the methods that XR instructors used to identify learner confusion can be replaced by adaptive automation in the form of triggers. However, the system must also be able to receive the necessary signals in order to recognize these triggers. The following paragraphs discuss how the data collected during the expert interviews can be used to develop more effective triggers for adaptive XR training simulations.

1) VERBAL
Since most modern AR and VR HMDs include built-in microphones, verbal triggers are easily extensible to adaptive XR systems. The microphones can be used to capture verbal signs of confusion in the form of utterances or specific words. One disadvantage of using XR technology to identify verbal triggers is the wide variety of languages and terms that learners may use. Therefore, coding triggers to specific words or phrases can result in false positives. Artificial intelligence or natural language processing tools may be used to design triggers that more adequately identify confusion using verbal signals. Furthermore, without a human instructor present, it is unknown whether a learner will be as likely to use verbal signals to indicate confusion. Simply informing the user that the system responds to verbal cues, or asking them to use an activation phrase similar to those used by voice assistants (such as Apple's Siri or Amazon's Alexa), may mitigate this effect and make verbal triggers more effective.
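As a simple illustration, a keyword-based verbal trigger could be sketched as below. As the paragraph above notes, matching fixed phrases is brittle, so a deployed system would more likely rely on natural language processing; the phrase list here is an assumption:

```python
# Hypothetical list of confusion phrases; a real system would need to cover
# many languages and phrasings, or use an NLP model instead of string matching.
CONFUSION_PHRASES = [
    "what do i do",
    "how do i",
    "i'm stuck",
    "i don't understand",
]

def verbal_trigger(transcript: str) -> bool:
    """Fire when a transcribed utterance contains a known confusion phrase."""
    text = transcript.lower()
    return any(phrase in text for phrase in CONFUSION_PHRASES)
```

The transcript would come from speech recognition on the HMD's microphone feed; the trigger itself only inspects the resulting text.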

2) PHYSICAL
Physical signs of confusion can also be used to create adaptive triggers in XR. A detailed picture of the learner's physical behavior can be obtained using head and hand position data, button presses, and tracked interactions with objects in the simulation. These data can be used to determine whether the learner is performing an incorrect action, not making an action, or standing in the wrong location. For example, a trigger could be designed so that an adaptation is applied if the learner has not stood in proximity to the object required for the task for a specified period of time. Similarly, a trigger for not reading the instructions could be created by checking whether the learner's head was oriented towards the text for an adequate length of time. Eye tracking data from HMDs such as the HTC Vive Pro Eye could also be used to increase the precision of these measurements and make the trigger more accurate. AR HMDs like the Microsoft HoloLens and Magic Leap are at a disadvantage for identifying physical triggers because they do not continuously track hand positions. Therefore, adaptive systems that obtain input using these devices would have trouble identifying triggers that rely on the handling of objects in the simulation. Instead, Simultaneous Localization and Mapping (SLAM) methods may be used to identify objects in the environment so the adaptive system can evaluate their state in relation to the user [42].
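The proximity-based trigger described above could be sketched as follows, assuming head position samples arrive at a fixed rate; the radius, timeout, and sample rate are illustrative values rather than validated thresholds:

```python
import math

def out_of_position_trigger(head_positions, target, radius=1.0,
                            timeout=15.0, sample_rate_hz=1.0) -> bool:
    """Fire if the learner's head has stayed outside the task object's
    interaction radius for the whole timeout window.

    head_positions: most recent (x, y, z) samples, oldest first.
    target: (x, y, z) position of the object required for the task.
    """
    needed = int(timeout * sample_rate_hz)
    if len(head_positions) < needed:
        return False  # not enough history to decide yet
    recent = head_positions[-needed:]
    return all(math.dist(p, target) > radius for p in recent)
```

An analogous dwell check on head (or gaze) orientation toward the instruction text could implement the not-reading trigger.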

C. IMPLICATIONS FOR ADAPTATION METHODS
The adaptation methods used by human instructors during XR training can be used to inform the design of adaptive methods within XR training systems. As long as the XR device has the ability to output stimuli in ways that are similar to a human instructor (e.g., provide audible feedback or text to a learner), it is possible to emulate these behaviors in an adaptive system.

1) VERBAL
The most common adaptive method used by expert instructors when combating confusion in XR was verbal intervention. If the cause of the confusion can be reliably identified by the system using measurements from XR input devices, the system can apply verbal adaptations using pre-recorded messages or text-to-speech software broadcast through the AR or VR HMD's speakers. Similarly, a voice assistant interface could be implemented to answer common questions asked during training. The same information could also be presented visually using text instructions in the VR or AR environment to supplant or supplement audio cues. By using multiple channels, the information can be reinforced for more effective retention.
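A minimal sketch of the multi-channel idea above, with engine-specific audio playback and text rendering replaced by stand-in functions (the function names are placeholders, not real engine APIs):

```python
def play_audio(message: str) -> str:
    """Stand-in for text-to-speech or pre-recorded audio playback."""
    return f"audio:{message}"

def show_text(message: str) -> str:
    """Stand-in for rendering the hint on an in-world text panel."""
    return f"text:{message}"

def verbal_adaptation(message: str, channels=(play_audio, show_text)) -> list:
    """Reinforce the same hint over every available output channel."""
    return [channel(message) for channel in channels]
```

Dispatching one hint through every channel keeps the adaptation content in one place while letting the system reinforce it both audibly and visually.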

2) PHYSICAL
Physical interventions present a unique set of challenges when designing for adaptive systems. Most notably, current HMDs have limited abilities to apply physical stimuli on a learner besides applying vibration, which is only possible using certain VR controllers. Therefore, adaptations designed to imitate physical interventions used by instructors must be supplemented by other sensory channels in both VR and AR.
For example, instead of physically touching a learner's hand to communicate how to use a controller, an adaptive XR system could use a visual substitution to highlight the buttons on a model of the controller in a simulation.

3) INSTRUCTIONAL
Instructional interventions are the most easily extensible to both AR and VR technologies because they involve an instructor acting on a human-machine interface. Since the human-machine interface (in this case the HMD) is already a part of the system flow, the adaptive system can act on the interface faster than the human instructor could. For example, instructors reported changing the wording of instructions as a confusion mitigation technique. However, this often took a lot of time and collaboration with developers to implement, and was often paired with a temporary adaptation, such as verbal feedback. In contrast, an adaptive system could alter the specificity of the text in real time, using more details for novices and fewer for expert learners to test their skills. Additionally, adaptive automation can replace the need for constant observation by an expert instructor, saving time and increasing training throughput, while also decreasing late cycle product changes for developers.

4) PROCEDURAL
Instructors use procedural adaptations by altering when and how learners experience a simulation. Typically, this is done by restarting the simulation, which can be replicated in an adaptive system by reloading the scene to pre-selected checkpoints. However, the interviews showed that this type of intervention is often used by instructors when the learner is experiencing confusion from hardware malfunctions in both AR and VR. In these cases, the software or hardware has failed, and must be restarted by the instructor, rendering adaptations ineffective. However, if the failure is not catastrophic, or is isolated to a certain feature of the simulation, adaptive methods could be integrated to help the learner troubleshoot the problem themselves. For example, the simulation could use long periods of inaction coupled with input from the learner to identify a malfunction and direct a simulation restart. Additionally, features that allow the software to save the user's progress should be implemented to avoid repeating training.
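The checkpoint mechanism suggested above could be sketched as a simple manager that snapshots progress at pre-selected points; the state contents here are illustrative:

```python
class CheckpointManager:
    """Sketch of checkpoint-based procedural adaptation: save progress at
    pre-selected points so a restart resumes from the latest checkpoint
    instead of repeating completed training."""

    def __init__(self):
        self._checkpoints = []

    def save(self, state: dict) -> None:
        """Record a snapshot at a pre-selected point in the simulation."""
        self._checkpoints.append(dict(state))

    def restart(self) -> dict:
        """Reload the most recent checkpoint, or an empty state if none exist."""
        return dict(self._checkpoints[-1]) if self._checkpoints else {}
```

The state dictionary stands in for whatever the simulation needs to resume (current task, scene configuration, learner progress); copying on save and restart keeps checkpoints immutable.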

VI. CONCLUSION
In the present paper, the authors have summarized the need for adaptive XR training as well as the need for guidance in the design of adaptive XR training software. By using data from interviews with XR training experts from various fields, developers can make more informed decisions when designing triggers and choosing adaptation types. Furthermore, research should be conducted to fully evaluate the efficacy of each of the XR trigger and adaptation methods derived from this research.
In summation, the authors recommend the following guidelines as a starting point for the design of adaptive triggers in both AR and VR training:
1) Verbal triggers should be prioritized when designing adaptive XR training systems. However, measures should be taken to inform and promote the use of vocalization from the learner, especially when a human instructor is not present.
2) Physical triggers should also be of high priority, especially in XR systems where appropriate stimuli can be applied to a learner. These include monitoring for signals of inaction, incorrect action, and body position.
When designing adaptation methods for XR training, the authors make the following recommendations:
1) Physical adaptations can be replicated using models and animations in the VE. They should be prioritized to indicate points of interest in the VE, visualize hardware interactions, and even demonstrate the correct performance of the task.
2) Incremental adaptations can be used to allow the learner to think through a problem on their own.
3) Instructional changes can be used to increase or decrease the amount of instruction given to the learner. This is useful for adapting to different learners' experiences and creating varying levels of difficulty.
4) Procedural adaptations are of lower priority and involve the restart of the application or repetition of some, or all, tasks in the XR simulation. They can be used to reinforce a training concept, or to help the learner troubleshoot a problem with the system.
It should be noted that the recommendations in this work have not yet been tested in live adaptive XR training systems, nor do they constitute an exhaustive list of adaptation types and triggers. Therefore, designers and developers using these recommendations to inform the design of their XR training systems should conduct their own testing to ensure that the adaptations are effective for their system.
Furthermore, continued research should be done to evaluate the accuracy of various triggers in adaptive XR training, as well as the efficacy of different adaptation types, so that a standard can be created and disseminated to the XR training community.

APPENDIX A INTERVIEW QUESTIONS
1) Background
a) What kinds of Virtual or Augmented Reality training have you facilitated?
b) What was the task?
c) What was the learning objective?
d) How many people were trained?
e) Did the trainees have any previous experience with VR/AR?
f) What were the demographics of the trainees?
2) Setting and Methodology
a) Tell me about the last time you facilitated a VR/AR training session with a trainee.
b) What hardware was used?
c) Where did the training take place?
d) Was there any introductory training to use the hardware?
e) How long did the training take?
f) How did you monitor the progress of the trainee?
g) How were the tasks conveyed to the trainee during the simulation?
h) How did you communicate with the trainee while they were immersed in the Virtual Environment?
3)
4) RQ2 - Recognizing Confusion
a) How did you know that the trainee was confused during the training?
b) What do/don't they do?
c) What do they say?
d) What other cues tell you that there is a problem?
e) What did you do when you realized the trainee was confused?
5) RQ3 - Intervention Methods
a) If you intervened, how did you intervene?
b) How did you determine when to intervene?
c) What did you say or do?
d) Why did you choose to intervene this way?
e) How did your method of intervention help the trainee overcome their confusion?
f) How did you know the trainee had overcome their confusion?
g) How often do you have to intervene to clarify or resolve confusion during VR training?
h) If you didn't intervene, did you use any other technique to mitigate the trainee's confusion in the future?