Attention-Based Applications in Extended Reality to Support Autistic Users: A Systematic Review

With the rising prevalence of autism diagnoses, it is essential for research to understand how to leverage technology to support the diverse nature of autistic traits. While traditional interventions focused on technology for medical cure and rehabilitation, recent research aims to understand how technology can accommodate each unique situation in an efficient and engaging way. Extended reality (XR) technology has been shown to be effective in improving attention in autistic users given that it is more engaging and motivating than other traditional mediums. Here, we conducted a systematic review of 59 research articles that explored the role of attention in XR interventions for autistic users. We systematically analyzed demographics, study design and findings, including autism screening and attention measurement methods. Furthermore, given methodological inconsistencies in the literature, we systematically synthesize methods and protocols including screening tools, physiological and behavioral cues of autism and XR tasks. While there is substantial evidence for the effectiveness of using XR in attention-based interventions for autism to support autistic traits, we have identified three principal research gaps that provide promising research directions to examine how autistic populations interact with XR. First, our findings highlight the disproportionate geographic locations of autism studies and underrepresentation of autistic adults, evidence of gender disparity, and presence of individuals diagnosed with co-occurring conditions across studies. Second, many studies used an assortment of standardized and novel tasks and self-report assessments with limited tested reliability. Lastly, the research lacks evidence of performance maintenance and transferability.


I. INTRODUCTION
Autism is a lifelong neurodevelopmental condition clinically defined by difficulties with social communication and interaction and by the presence of restrictive and repetitive behaviors and interests [1]. In reality, the autistic population is far more varied than may be gleaned from the listed criteria in the diagnostic manuals or the other widely used terminology (e.g., high-functioning, low-functioning). Complex differences in sensory sensitivities, the nature of repetitive behaviors, and the various types of all-encompassing interests all contribute to an autistic individual's ability to communicate. It is, therefore, more useful to consider autism as a constellation rather than a linear spectrum, as the latter oversimplifies this variability [2].
Attention is a complex cognitive process which involves and influences perception, memory, and decision-making to select aspects of information with which to interact [3]. Attention-related disorders, such as Autism Spectrum Disorder (ASD), have the potential to benefit from virtual environments. ASD is a pervasive neuropsychiatric diagnosis in which individuals experience hypo- and hyper-sensitivity towards physical stimuli due, in part, to over-selective attention [4]. Despite being easily affected by stimuli, individuals with ASD process visual cues more effectively than other types of sensory stimuli [5].
The umbrella term, extended reality (XR), encompasses a spectrum of augmented reality (AR) and virtual reality (VR) technologies that fall on the virtuality continuum [6]. It involves augmenting real-world content with virtual content and compelling digital cues using a camera and display, desktop monitor, head-mounted display (HMD), projection screen, or mobile device [7]. AR superimposes virtual information onto real-world content using a screen and camera [8] or a head-mounted display (HMD) [3], while VR allows users to interact with virtual environments [9].
XR allows researchers to control testing in a wide variety of scenarios and to produce more generalizable results due to increased ecological validity in comparison to traditional laboratory and clinical interventions.
Previous evidence indicates that performance improvements in a simulated environment transfer to the real world for neurotypical individuals [10]. Similar results were also found for neurodiverse participants. For instance, improved attention performance achieved from a virtual classroom approach transferred to real-world classroom settings for participants with attention-deficit hyperactivity disorder (ADHD) [11]. Regardless of the variability between XR technology and intervention methods, cognitive and physical skills acquired during XR-based training transfer just as successfully to the real world as those obtained from traditional training [12]. Furthermore, Kaplan et al. [12] argue that XR training provides benefits beyond traditional methods, such as the ability to mitigate risk from dangerous situations.
Despite being hyper- or hypo-sensitive to stimuli, autistic individuals process visual cues more effectively than they do other types of sensory stimuli [5]. It is for this reason that XR can be particularly beneficial in supporting autistic users. AR, for instance, has been used to engage behavior by projecting information onto complex tasks to help users direct attention more efficiently [13]. Gaze-contingent interfaces have also been shown to facilitate interaction in joint attention tasks [14], though many of these effects are typically limited to reactions with the performance of neurotypical (NT) individuals. Due to the heterogeneity of autism, there is no particular technology or uniform intervention that can work for all users on the spectrum.
In order to inform researchers in the XR and autistic communities on promising directions for future exploration, we first provide a comprehensive overview of the methodologies used in a selection of studies. Next, we identify key methodological challenges of attention-based research for autism using XR and determine effective measurement tools and metrics. Lastly, this review proposes recommendations on more inclusive ways to conduct studies with autistic participants.

A. UNDERREPRESENTATION IN AUTISM RESEARCH
The global prevalence of diagnosed autism ranges from 0.3% to 1.2% of the population, with Western countries demonstrating a prevalence of 1% [15]. The majority of male participants in autism studies reflects the global male-to-female ratio in the identified autistic population (e.g., 1.33 to 16.0 in Northern Europe; 2.2 to 5.8 in North America), in which diagnosed males variably but consistently outnumber diagnosed females [15]. However, recent findings suggest that autism presents differently in females and that current diagnostic criteria are largely male-biased [16], resulting in underdiagnosis of females and those who identify as other genders.
Autism research also focuses disproportionately on children due, in part, to the prioritization of understanding causal factors [17]. Furthermore, diagnostic criteria were far stricter during the advent of autism research in the 1960s and 1970s, and autistic adults who were not diagnosed or misdiagnosed at the time may have continued to manage thus far.
Lastly, there remains a controversial divide amongst communities, clinicians, and researchers between focusing on rehabilitation and developing medical cures, or on promoting acceptance and establishing rights [17].

B. AUTISM AND ATTENTION
Table 1 describes the distinct attentional processes autistic people experience in comparison to neurotypical (NT) people (i.e., undiagnosed with ASD or do not display evidence of autistic traits). One such attentional process involves shifting focus from one aspect to the next or broadening and narrowing the area of focus (i.e., selective attention) [18]. For instance, in comparison to NT children, autistic children shifted attention more between non-social objects than between people [19].
Another attentional process relevant to autism includes the ability to focus on an activity or stimuli over a long period of time (i.e., sustained attention). Autism is most associated with differences in social attention (i.e., how focus is directed during a social situation). This includes joint attention (JA) (i.e., socially coordinated visual attention [18]), which plays a critical role in the development of social-cognitive processes involved in the inference of intentions or emotions of another person, as well as influencing how stimuli are encoded [20], [21].
Visual attention has been the most widely researched form of attention investigated in autistic populations. However, selected methodologies warrant more clarification as numerous conflicting findings are resulting from the use of methodologies that are not ideally suited to answer research questions or whose implications are not well understood [18].
The main paradigms used in visual attention research include eye-gaze cueing, spatial attention, eye-movement recording in scene-viewing tasks, change detection, and computational modelling [18]. Evidence also demonstrates different brain activity during attentional processing of sensory inputs (e.g., visual, auditory, tactile) [22]. Findings from other types of attention, specifically aural, generally correspond to those of visual attention [23]. These findings suggest that differences in attentional bias observed in autistic people during social and non-social situations may result from differences in brain activity, rather than neurological deficits.

C. RESEARCH GAPS
Previous systematic literature reviews analyzing research involving XR applications for autistic people [31]-[36] have all concluded that XR provided a safe and comfortable environment for autistic users to learn communication and social skills. However, half of these papers were focused on reviewing interventions designed for VR technology [34]-[36]. As a result, these reviews included limited amounts of literature, ranging from 6 articles [35] to 31 articles [33]. Several reviews also reached the common conclusion that there was a lack of methodological validity across studies, which could be improved with more objective evaluation methods [31], [33], [36]. While existing review articles briefly discussed different types of evaluation parameters, none elaborated on the effectiveness of these parameters or on the systematic synthesis of effective physiological and behavioral attention cues in autistic populations.
This systematic review aims to overcome this shortcoming by providing a comprehensive analysis of methodological aspects (i.e., study design, evaluations, limitations) and systematically assessing the effectiveness of physiological and behavioral metrics on attention components. This review is motivated, in particular, by supporting future XR research and development that identifies autistic characteristics as differences, rather than impairments or deficiencies, and works towards developing technology to support these differences across a variety of individuals.

III. METHOD
Standard systematic review methods adopted from the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) protocol [37] were used to conduct and report the current review. To better understand how XR studies can accommodate the differences in attention experienced by autistic individuals, this systematic review was guided by three themes (demographics of articles and participants, study design, and outcomes and findings) that include ten research questions in total. The first theme provides an understanding of how representative the participants are of the target population, while the second theme contributes data on methodological practices used in studies and the third theme addresses the effects of these methodologies on autistic participants. Altogether, these questions aim to demonstrate how much different research methods can be generalized and to shed light on alternative methods to improve generalization to different populations. Table 2 presents our detailed research questions under each theme.

Table 1. Attention type and description.

Selective
• Increased perceptual capacity in autistic people compared to NT people can explain altered attention [23]
• Autistic individuals demonstrate strengths in nonsearch processes (e.g., discrimination) and reduced abilities in the generalization that is dependent on levels of perceptual load (i.e., the extent of distractor processing is dependent on the amount of perceptual load) [24]
• In terms of visual search, autistic adults demonstrated enhanced perceptual capacity during selective attention tasks when presented with greater perceptual load in comparison to non-autistic participants [25]

Sustained
• Autistic children perform similarly to NT peers on sustained attention tasks (e.g., Continuous Performance Test [26]) [27]
• Activity in the prefrontal cortex (associated with motivation and reward) differs between autistic and NT groups, implying that autistic people mediate motivation and respond to reward differently [28]
• Impairments in sustained attention may result from a misperception of disinterest or lack of motivation

Social
• During eye-tracking studies, autistic participants demonstrated differences in gaze patterns towards social regions (e.g., mouth, eyes) while viewing social scenes compared with NT groups [29]
• Autistic individuals demonstrated differences in initiating and responding to joint attention, which was linked to lower information processing in comparison to non-autistic peers [21]. The development of these is closely linked to the ability to learn with and from other people [30], suggesting that differences lead to learning problems in affected individuals [20]
• In autistic children, evidence of disrupted development of attentional bias towards social content resulted in missed opportunities to learn about social behavior and language [17]

B. SEARCH STRATEGY
A comprehensive literature search was conducted in June 2020 on six electronic databases: ACM Digital Library, IEEE Xplore, ProQuest, ScienceDirect, Scopus, and SpringerLink. The search string displayed in Table 3 was developed based on the researchers' subjective judgement of categorical terms that occurred frequently in preliminary literature searches. The terms included in the three identified categories (i.e., XR technology, autistic populations, and attention) were alternative names or related concepts. Search strings (Table 3) were tailored for each database and were used to search the title, abstract, and author-specific keywords of articles. As XR is a relatively recent field, no limit was placed on publication period, allowing more studies to be identified.

C. SCREENING AND ELIGIBILITY
Articles were screened for duplicates using Mendeley software and verified for actual inclusion of the search terms within the full text. After duplicates were removed, titles and abstracts of the pooled articles were scanned, and full-texts were scanned in the following step.

Table 3. Search string by category.
XR: ("virtual reality" OR "virtual environment" OR "VR" OR "augmented reality" OR "AR" OR "diminished reality" OR "DR" OR "mixed reality" OR "MR" OR "extended reality" OR "XR" OR "immersive technology" OR "three dimensional" OR "3D*" OR "3-D*")
AND
Autism: (autis* OR ASD OR ASC OR Asperger OR "pervasive developmental disorder" OR PDD-NOS OR neurodevelopment* OR neurodiver* OR NDD)
AND
Attention: (attention OR distract* OR concentrat* OR focus)

For inclusion into the systematic review, titles and abstracts were scanned for articles focusing on autism and attention-related tasks (Table 4). Empirical studies were included where the interaction between humans and XR was part of an intervention that targeted autistic users.
Papers involving studies that did not include attention-based tasks using XR technology for autistic populations, or involved non-human participants, were excluded. Remaining papers were read in full and those that did not report on the design or effects of XR for autistic participants were also excluded.
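As a rough illustration of how the three category groups of the search string in Table 3 combine, the following Python sketch joins the terms within each category with OR and joins the categories with AND. The helper names (`or_group`, `build_query`) are hypothetical, and the sketch does not attempt the database-specific tailoring mentioned above, since those variants are not detailed here.

```python
# Minimal sketch: compose the Boolean search string from Table 3.
# Terms are copied from the table; helper names are illustrative only.

XR_TERMS = [
    '"virtual reality"', '"virtual environment"', '"VR"',
    '"augmented reality"', '"AR"', '"diminished reality"', '"DR"',
    '"mixed reality"', '"MR"', '"extended reality"', '"XR"',
    '"immersive technology"', '"three dimensional"', '"3D*"', '"3-D*"',
]
AUTISM_TERMS = [
    'autis*', 'ASD', 'ASC', 'Asperger',
    '"pervasive developmental disorder"', 'PDD-NOS',
    'neurodevelopment*', 'neurodiver*', 'NDD',
]
ATTENTION_TERMS = ['attention', 'distract*', 'concentrat*', 'focus']


def or_group(terms):
    """Join alternative terms of one category with OR, in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def build_query(*groups):
    """Require one match from every category by joining groups with AND."""
    return " AND ".join(or_group(group) for group in groups)


query = build_query(XR_TERMS, AUTISM_TERMS, ATTENTION_TERMS)
print(query)
```

Keeping each category as a separate list makes it straightforward to add or drop a synonym without touching the Boolean structure.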

D. DATA EXTRACTION
A data extraction form was developed to capture items related to study characteristics such as primary author, publication year, sample size, male-to-female ratio, age, diagnosis, XR hardware, characteristics of intervention, outcomes, limitations, and future research agendas.
An analysis of the selected literature was performed using data analysis software. Papers were imported and nodes were inductively created to analyse data from the methods, results, discussion, and conclusion sections of each paper according to our research questions. Similar to Stowell et al. [38], we iteratively coded and compared the literature using attributes (e.g., descriptions of participants, sample size of autistic participants, presence of NT participants, type of XR hardware used, intervention tasks, attention evaluation measures, and reported limitations) to determine emerging themes.

IV. RESULTS
This section provides a concise overview of attention-based research for autism using XR. A total of 1531 articles were initially identified from the database search (Fig. 1). After removing duplicates, 1214 titles and abstracts were screened for XR-related keywords. A total of 1073 titles and abstracts were further screened against the exclusion criteria. The full texts of 173 articles were reviewed, of which 59 articles met the criteria for inclusion in the review and nine were used for a meta-analysis.
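The screening counts above imply how many records dropped out at each step. The short Python sketch below derives those differences from the reported totals. The stage labels are our reading of the paragraph, and the per-stage removal counts are derived values that are not stated in the text.

```python
# Minimal sketch: records removed between consecutive screening stages.
# Totals come from the paragraph above; labels paraphrase its wording.

STAGES = [
    ("identified from database search", 1531),
    ("titles/abstracts after duplicate removal", 1214),
    ("titles/abstracts screened against exclusion criteria", 1073),
    ("full texts reviewed", 173),
    ("included in review", 59),
]


def removed_between(stages):
    """Return (from_stage, to_stage, records_removed) for adjacent stages."""
    return [
        (prev_label, next_label, prev_n - next_n)
        for (prev_label, prev_n), (next_label, next_n) in zip(stages, stages[1:])
    ]


for prev_label, next_label, n_removed in removed_between(STAGES):
    print(f"{prev_label} -> {next_label}: {n_removed} removed")
```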

A. RQ1: RESEARCH AFFILIATIONS?
The 59 reviewed papers were led by first authors affiliated with academic or professional institutions across 16 different countries (Table 5). There were also 262 collaborating authors from 71 academic or professional institutions across 24 different countries. The United States had substantially more affiliations than all other countries, followed by India, the United Kingdom, China, Taiwan, and Indonesia. We did not identify any papers affiliated with African countries.

B. RQ2: WHAT WERE THE DEMOGRAPHICS OF PARTICIPANTS?
Out of 59 papers, 54 papers involved autistic participants, totaling 736 participants. The following section covers the different characteristics of these participants. Characteristic distributions were only calculated from papers that provided a sample size of autistic participants. Among the studies that did not disclose the average age for included ASD participants, six papers involved participants with an age range under 18 years [39]-[44], one paper involved adult participants (i.e., over 18 years) [8], and four papers did not report the average age of autistic participants [45]-[48]. Five papers proposed interventions intended for autistic users without involving autistic participants in studies. Of these papers, three presented interventions intended for autistic children [49]-[51] while two others did not specify the age range of target users in their studies [52], [53]. None of the interventions were designed for autistic adults.
Gender distributions were calculated from autistic participants. 35 papers reported participants as either male or female, while 18 papers did not report gender. Of the 440 participants for whom gender was reported, 87.7% of them were male. This echoes previous findings showing that the majority of participants in autism studies are male. None of the studies in the current review acknowledged other gender identities (e.g., non-binary, gender-fluid).
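The reported percentage can be turned back into an approximate head count. The Python sketch below searches for whole-number counts that reproduce 87.7% of 440 when rounded to one decimal place. The function name is hypothetical, and the resulting count is a derived estimate rather than a figure stated in the text.

```python
# Minimal sketch: recover the whole-number count behind a rounded percentage.
# 440 and 87.7 come from the paragraph above; the count itself is derived.

def counts_matching_percentage(total, pct, decimals=1):
    """Return every integer k for which k/total rounds to pct percent."""
    return [
        k for k in range(total + 1)
        if round(100 * k / total, decimals) == pct
    ]


male_candidates = counts_matching_percentage(440, 87.7)
print(male_candidates)
# The remainder of the 440 participants were reported as female.
print([440 - k for k in male_candidates])
```

If the search returns a single candidate, the percentage pins down the count uniquely. Several candidates would mean the rounding leaves the exact count ambiguous.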

C. RQ3: WHAT METHODS WERE USED TO SCREEN PARTICIPANTS DURING THE SELECTION PROCESS?
Out of the 54 papers with ASD participants, 14 papers had no indication of autism type while 33 papers required participants to have received some form of autism assessment. As displayed in Table 6, the following diagnostic tools were used across studies: Autism Diagnostic Observation Schedule [54] (ADOS), Social Responsiveness Scale [55] (SRS), Social Communication Questionnaire [56] (SCQ), Childhood Autism Rating Scale [57] (CARS), Autism Diagnostic Interview-Revised [58] (ADI-R), Autism Spectrum Screening Questionnaire [59] (ASSQ), Gilliam Autism Rating Scale [60] (GARS), Autism Spectrum Inventory [61] (ASI), and Autism Quotient-Child (AQ-Child). 21 papers also reported using some form of intelligence measure (e.g., full-scale IQ, verbal IQ, or performance IQ scores). It is also worth noting that none of the five papers presenting preliminary designs and studies without ASD participants indicated diagnostic traits for their target audience [40], [49]-[51], [53].

E. RQ5: WHICH ATTENTION TYPES WERE TARGETED DURING PRIMARY LITERATURE?
As displayed in Table 7, nearly half of the papers included in this review focused on social attention. Four papers [49], [70], [101], [102] focused on selective and sustained attention. Two articles [51], [89] focused on social, selective, and sustained attention.
Of the included literature, significantly more papers focused on social attention and involved two or more co-occurring tasks (33.3%) than investigated only selective attention (18.8%) or only sustained attention (17.6%).

F. RQ6: HOW WAS ATTENTION MEASURED IN THE PRIMARY LITERATURE?
Overall, 52 of the 59 papers mentioned using attention measurement strategies with autistic participants. Studies fell into two categories: quantitative and qualitative. Physiological and behavioral signatures were used in 22 papers, making them the most frequently used measures across the featured literature (Table 7). We found eye tracking was used exclusively in 72.7% of these studies. Motion-tracking sensors were also widely used (63.6%) to measure body position (n = 7), hand gestures (n = 5), and head position (n = 3). Additionally, several papers measured brain activity using functional magnetic resonance imaging (fMRI) (n = 2) or electroencephalogram (EEG) (n = 4), with two also employing eye-tracking in tandem [69], [71]. Three papers (13.6%) also applied other physiological measurements (e.g., electrocardiogram, electrodermal activity, electromyogram, galvanic skin response, pulse plethysmogram, skin temperature, respiration).

G. RQ7: WHAT TYPES OF ATTENTION TASKS WERE USED DURING THE PRIMARY LITERATURE?
The types of attention-based tasks implemented in each paper were dependent on the aims of the study. These included improving attention or training a skill (e.g., pronunciation, emotion recognition) or assessing attention differences between autistic and NT participants or amongst different types of technology. Tasks involving imitation, interaction with an avatar or agent, following cues, and contextual familiarization were primarily used for improving attention or training skills. Interacting with objects and recognizing and identifying objects, emotions, or words were also used for these aims, apart from [65], [67], [74], [79], which compared differences between autistic and NT participants, and [66], [74], [95], which compared attention between different modalities. Tasks requiring participants to initiate or maintain eye gaze or passively attend to a scene were used for different kinds of aims. Table 8 shows the different types of attention-based tasks used in the literature. While most papers involved interaction with either a virtual agent or object, these interactions often occurred with a second gaze-based task, in which participants fixated eye-contact onto stimuli [72], [73], followed cues to interact with objects [8], [78], [96] or to maintain gaze [49], or interacted with an avatar to identify emotions [64]. Two studies involving four tasks (e.g., interact and maintain gaze with an avatar to follow cues to interact with objects) [50], [51] focused on social, selective, and sustained attention.

H. RQ8: HOW DID XR-BASED TASKS TARGET ATTENTION AND WHAT WERE THE OUTCOMES?
The main purpose of XR-based tasks was to improve attention or attention-related skills through intervention. Other tasks served to help researchers understand the differences in attention between autistic and NT participants. We identified eight domains of tasks (Table 9) that involved responding to social cues from an avatar or agent (e.g., maintain/initiate gaze, imitation), interacting in realistic scenarios (e.g., contextual familiarization), or playing games involving cognitive tasks (e.g., recognition/identification). Table 10 further specifies tasks and their outcomes for each paper.

Table 10 (excerpt).
Train air travel skills using VR HMD | Perform virtual rehearsal of air travel situation | Pre- and post-test subjective questionnaire by parents; clinical observations | Increase in attention to VR intervention associated with increase in use of targeted vocabulary
Wang et al. [8] (2020) | Propose an attention-training interface | ---

... found no difference in participant ability to identify target objects from avatar cues accurately (e.g., pointing, head-turning). Two articles included studies examining responses to viewing social cues [62], [82]. Viewing a social scenario scene was used to distinguish attention patterns between ASD and NT participants. When viewing a realistic scenario of an approaching male figure, [62] found activation in neural regions associated with social attention (i.e., right temporoparietal junction) to averted gaze in autistic participants, compared to activation to direct gaze in NT participants. In another study, researchers examined viewing patterns towards a realistic virtual face using a gaze-contingent display in which everything but the focal point was blurred [82]. Researchers noted highly heterogeneous viewing patterns among autistic participants but found they generally focused on areas surrounding the face more than NT participants. Also, while the gaze-contingent lens resulted in more gaze stability, only one of 13 autistic participants and half the NT group became aware they were controlling the lens, which may have implications for agency judgements.
Vocational skills (e.g., inspecting and sorting boxes in a warehouse) required attention to detail [89], [96]. The authors found that participants tended to ignore assistive prompts and recommended highlighting objects instead.
Driving simulations highlighted salient stimuli (e.g., pedestrians, road signs) and used eye-tracking to track attention towards these [70], [94]. Participants showed significantly fewer driving errors [70], and better baseline performance was related to the rate of improvement in driving skills [94].
In other studies, participants wearing AR smartglasses that guided socially salient visual stimuli improved attention while interacting through conversation with an educator [77], [97], [100]. Another paper [49] found prompting participants with fading cues over faces was more effective than flying and exploding cues for directing attention; however, only NT participants were involved. During a public speaking task in which participants were required to attend to virtual peers while answering questions, researchers [75] examined moderating effects contributing to variances in attention and found that social attention in autistic children was affected by IQ, social anxiety, and ADHD symptoms.

GAMIFICATION OF COGNITIVE TASKS
Many interventions involved gamifying repetitive tasks by having participants use motor movements to perform cognitive activities.
Participants used hand gestures to sort colored balls into corresponding boxes (e.g., [51], [80]) or pair matching items (e.g., words and definitions [99], objects [83]). Two other studies also required participants to move an object along a path using their hand [47] or a haptic arm [65].
Short-term improvements in sustained attention were observed during intervention sessions [51], [99], and long-term improvements in attention towards contextual information during a drag-and-drop task were maintained two weeks post-test [83]. Other long-term improvements in initiating and responding to joint attention cues using pointing, showing, and sharing skills during various scenarios were also made over a three-month period [86].
When comparing visuospatial attention between autistic and NT children, both groups demonstrated similar performance accuracy when raising their arms to hit balloons or bubbles. However, autistic children required longer time for successful attempts [95].

I. RQ9: WHAT WERE THE ATTENTION-RELATED OUTCOMES OF USING PHYSIOLOGICAL AND BEHAVIORAL MEASUREMENTS?
A mini meta-analysis of physiological and behavioral metrics was conducted. Table 11 summarizes the nine papers that were included in this quantitative synthesis. Between-group studies with two groups of participants were included (an autistic group and an age-matched group of neurotypical participants). An NT group was involved in these studies to compare the implications of attention qualities (e.g., absolute or relative) on between-group differences in physiological or behavioral measures. Papers were omitted if they did not report statistical analyses or if the data were unrelated to attention qualities.
We reported the effect sizes (partial eta squared) and F-values. Effect size thresholds of 0.01 (small), 0.06 (medium), and 0.14 (large) were taken from [110]. Relative attention demonstrated the lowest effect size (0.14), while absolute attention had an average effect size of 0.19. While reporting of results varied, we identified 15 p-values that were consistently reported. Furthermore, effect size ($\eta_p^2$) was calculated for one-way ANOVAs that reported F-values. p-values greater than 0.05 were plotted as 0.06, and those less than 0.05 were plotted as 0.04.
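For readers who want to reproduce this kind of calculation, the Python sketch below recovers partial eta squared from a reported F-value and its degrees of freedom using the standard identity F·df_effect / (F·df_effect + df_error), then labels the result with the thresholds quoted above. It is a sketch, not the procedure used in this review: the function names, the "negligible" label for values under 0.01, and the example F-value and degrees of freedom are all assumptions for illustration and are not taken from any reviewed paper.

```python
# Minimal sketch: partial eta squared from an ANOVA F-value.
# Thresholds (0.01, 0.06, 0.14) are those quoted in the text.
# Example inputs below are hypothetical, not data from the review.

def partial_eta_squared(f_value, df_effect, df_error):
    """Standard identity: SS_effect / (SS_effect + SS_error) via F and dfs."""
    numerator = f_value * df_effect
    return numerator / (numerator + df_error)


def classify_effect(eta_p2):
    """Label an effect size using the small/medium/large cut-offs."""
    if eta_p2 >= 0.14:
        return "large"
    if eta_p2 >= 0.06:
        return "medium"
    if eta_p2 >= 0.01:
        return "small"
    return "negligible"  # label assumed here; not defined in the text


# Hypothetical example: F(1, 30) = 4.0
example = partial_eta_squared(4.0, df_effect=1, df_error=30)
print(round(example, 3), classify_effect(example))
```

For a one-way design, partial eta squared coincides with eta squared, so the same function applies to the one-way ANOVAs mentioned above.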
Absolute attention (i.e., total attention spent focusing on stimuli) of social stimuli and relative attention of social stimuli (i.e., the ratio of attention to stimuli compared to entire session) were identified as themes of qualities of attention from [103].
Five out of six of the absolute attention papers reported significant p-values (p < 0.05) or p-values trending towards significance (i.e., [82]). All relative attention literature failed to record strong evidence against null hypotheses [63], [67], [76], [103]. The remaining article recorded a non-significant p-value (i.e., [101]). However, this study also included the fewest participants. Overall, studies showed significant differences in attentiveness toward social ROIs (i.e., neutral virtual human face) between autistic and NT groups. On the other hand, high p-values for relative attention studies demonstrated a lack of significant differences in social viewing patterns between the two groups.
Further limitations of small sample size include the lack of gender parity in the sample [85], [95] and differences in participant profiles [40]. Limitations in sample size may be partly due to selection biases during the recruitment process (e.g., only participants without intellectual disability [87]). Several studies also pointed out constraints on generalizing results to the autistic population due to idiosyncratic differences between individuals with ASD [80], [88]. Other limitations concerned task design appropriateness and ecological validity [63], [80]. Constraints associated with short duration involved inadequate exposure to technology or limited task repetition [96].
Another cited concern was the absence of a comparison group [94], [100] (e.g., no NT group [91], differences in gender ratio of the control group [82]) leading to an inability to examine differentiation effects. An additional recurrent limitation included the lack of equivalent metrics for calculating comparable performance across conditions [70], [95].

V. DISCUSSION
This section discusses research themes and sub-themes of key challenges (see Table 13) and makes recommendations on future research directions.

1) STUDY LOCATION
Similar to previous reviews [112], the earliest study in this systematic review was published in 1996. The papers in this review originate from many of the same countries that consistently produce high outputs of autism research, including those in the U.S., Japan, China, and the U.K. [15]. This is also the first review of autism research using XR with a large number of studies derived from India, where autism diagnoses vary significantly across regions. There were no papers published from African countries, where autism remains largely underexplored [15]. It is crucial for future work to expand to different cultural communities and socioeconomic groups as it would allow researchers to compare intervention strategies.

2) PARTICIPANT CHARACTERISTICS
The majority of studies in the systematic review recruited participants between 3 and 18 years of age, although the prevalence of autism among adults is similar, if not greater [113]. Furthermore, adults may have learned to camouflage autistic characteristics, leading to underdiagnosis [114]. Male participants in these studies also greatly outnumbered female participants and participants of other genders. These results are consistent with those obtained by [33], who found many studies that included only autistic boys. While this is in line with current gender ratios [15], females are often underdiagnosed due to the different ways ASD presents itself in this gender [115], [116]. For instance, autistic females are stronger in socio-cognitive skills [115], [117] and focus skills [118], which may contribute to differences in performance on attention tasks. It is, therefore, important to consider the effect of gender identity on user experience and efficacy, as effective intervention methods from the primary literature may not transfer in the same way to female and non-binary populations.
An estimated 70% of autistic people have a comorbid psychiatric disorder, with 40% having two or more [1]. Approximately 25.7% of autistic people are also diagnosed with ADHD, and diagnosed anxiety disorders related to attention (e.g., anxiety disorder, obsessive-compulsive disorder) are also commonly linked with ASD (17.8%) [119]. Studies that accounted for such co-occurring conditions produced more generalizable and insightful outcomes. For instance, research that accommodated the effects of ADHD [75], [79], [100] observed impacts in autistic participants that were not present in groups with both ASD and ADHD. In contrast, some papers attempted to generalize results without considering the effects of learning disability or language disorder, which resulted in inconclusive findings [85].
Given the substantial heterogeneity within ASD [78], [79], it is essential to examine the interplay of identities and comorbidities to better interpret and apply findings to wider intervention adoption and effects.

1) ASSESSMENT TOOLS
Several studies relied on standardized assessments with high reliability to evaluate outcomes. However, some of these failed to yield significant differences between pre- and post-test scores [85], produced inconsistent results [68], or resulted in ceiling effects [83]. Many studies also relied solely on self-developed assessments or rating scales [40], which raises questions about the validity of the results. Amaral et al. [72] developed a JA assessment that did not yield any effect, whereas the standardized measures used as secondary outcome measures all yielded significant results. Researchers are encouraged to adopt appropriate, standardized quantitative measures consistently across XR studies to enable valid comparisons and more robust evaluations of attention and efficacy over time.
Measurements of the autonomic nervous system can reflect affective responses not outwardly expressed [122], [123]. Jyoti and Lahiri [78] used these physiological indices to investigate participants' ease of understanding during joint attention tasks and found that autistic participants showed higher variations in pulse rate compared to NT participants. Bekele et al. [63] provide supporting evidence, finding such measurements (e.g., respiration, pulse, skin temperature) to be over 90% accurate in assessing stress during an attention task.
Head-position tracking can also be used to examine comorbid effects of ADHD. Jarrold et al. [75] found selective attention was lower in autistic participants with high anxiety and ADHD than those with low anxiety and ADHD based on how they oriented their heads during a public speaking task. However, other studies found it difficult to gauge whether participants were looking at ROIs correctly when turning their heads towards target stimuli [105].
fMRI studies with VR were primarily used to provide insight into differences in brain activity during social situations that may not otherwise be feasible to study in real-world settings. Two studies identified neural regions in which autistic participants showed levels of activation dissimilar to those of NT groups [62], [64]. For instance, direct gaze with an avatar elicited less activity in the region associated with social attention tasks related to the judgement of mental states (right temporoparietal junction) and more activity in the region related to selective attention (dorsolateral prefrontal cortex) [62], suggesting habitually reduced attention to socially salient stimuli.
Based on these findings, physiological measurements provide the most salient insights. Eye-tracking, which appears to be the most used, is also the most accurate at measuring performance in attention tasks. fMRI and ANS measurements (i.e., EEG, electrocardiogram, electrodermal activity, electromyogram, galvanic skin response, pulse plethysmogram, skin temperature, respiration) also provide an additional layer of understanding, although the ANS measurements may be the most feasible and cost-effective option of the three.

1) TASK APPROPRIATENESS
Complex tasks are challenging to assess and require reliable measurements [88]. Inconsistencies in the validity of attention tasks used at pre- and post-test further add to the difficulty of evaluating improvements in performance resulting from interventions [80]. Thorough attention interventions with virtual scenarios highly relevant to the real environment produce transferable results [11].

2) REALISTIC SCENARIOS
Lorenzo et al. [41] highlighted the importance of making VR scenarios as realistic as possible, as this reduces adaptation time and increases the skills learned during virtual role-playing that can then be transferred to the real world. Cheng and Huang [86] also found children were more motivated to learn when they perceived the content to be closely related to their daily school activities.
Jyoti and Lahiri [78] modelled avatars after local Indian populations to facilitate the sense of real-life experience in JA tasks. One study found that when autistic children were able to customize an avatar to look like a real person they were familiar with, they were more likely to focus on context-relevant information [45]. This corresponds to another study in this review, which found that, in contrast to NT children, autistic children were unaffected by the uncanny valley effect. Interestingly, Kumazaki et al. [74] found autistic children responded more to virtual avatars and avoided looking at realistic humanoid robots, which suggests they could be affected by the uncanny valley effect for physical agents. Autistic users of VR report a continuous distinction between real and virtual worlds, and this distinction allows them to consider VEs to be safe places for training and learning [124]. This suggests a more significant potential for knowledge acquisition from realistic virtual tasks that can be applied to the real world. In addition to customizable avatars, there are also research opportunities to investigate which additional aspects of VEs can be customized to increase the likelihood of transferable skills.

3) INTERACTIVE SCENARIOS
Several articles provide promising results for gaze-contingent interfaces that integrate virtual tasks with eye-tracking metrics to facilitate engagement [66], [70]-[72]. Lahiri et al. [66] found that autistic participants were more likely to fixate on an avatar's face when using a gaze-contingent interface compared to a performance-based interface. Specifically, blink rate was more receptive to engagement with the system. Additionally, autistic participants in [82] had similar results when presented with a gaze-contingent interface and shared a nearly equal number of fixations on an avatar's face as NT participants. None of these studies assessed whether progress from virtual interventions translated into real-world settings, which presents an opportunity to expand research in this area to explore transferability to real-life scenarios.
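To make the idea of a gaze-contingent interface concrete, the following is a minimal sketch in Python of one common way such a loop could work: gaze samples are tested against a face region of interest (ROI), and feedback is triggered once the gaze has dwelt on the ROI continuously for a threshold duration. This is an illustration only and is not the implementation used in [66], [70]-[72], or [82]. The names `Rect`, `dwell_triggers`, and `roi_gaze_proportion`, the use of normalized screen coordinates, the fixed sampling period, and the dwell threshold are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

GazeSample = Tuple[float, float]  # (x, y) in normalized screen coordinates (assumed)


@dataclass(frozen=True)
class Rect:
    """Axis-aligned ROI, e.g. a bounding box around an avatar's face (hypothetical)."""
    x: float
    y: float
    width: float
    height: float

    def contains(self, gx: float, gy: float) -> bool:
        return (self.x <= gx <= self.x + self.width
                and self.y <= gy <= self.y + self.height)


def dwell_triggers(samples: Sequence[GazeSample], roi: Rect,
                   sample_period_ms: float, dwell_threshold_ms: float) -> List[int]:
    """Return the sample indices at which continuous dwell on the ROI
    first reaches the threshold. Leaving the ROI resets the dwell timer."""
    triggers: List[int] = []
    dwell_ms = 0.0
    fired = False  # ensures one trigger per uninterrupted visit to the ROI
    for i, (gx, gy) in enumerate(samples):
        if roi.contains(gx, gy):
            dwell_ms += sample_period_ms
            if not fired and dwell_ms >= dwell_threshold_ms:
                triggers.append(i)  # the interface would give feedback here
                fired = True
        else:
            dwell_ms = 0.0
            fired = False
    return triggers


def roi_gaze_proportion(samples: Sequence[GazeSample], roi: Rect) -> float:
    """Fraction of gaze samples that fall inside the ROI (0.0 if no samples)."""
    if not samples:
        return 0.0
    inside = sum(1 for gx, gy in samples if roi.contains(gx, gy))
    return inside / len(samples)


if __name__ == "__main__":
    face = Rect(x=0.4, y=0.4, width=0.2, height=0.2)
    gaze = [(0.5, 0.5)] * 3 + [(0.9, 0.9)] + [(0.5, 0.5)] * 2
    print(dwell_triggers(gaze, face, sample_period_ms=20.0, dwell_threshold_ms=60.0))
    print(roi_gaze_proportion(gaze, face))
```

In a real system the samples would arrive as a stream from the eye tracker, and the ROI would follow the avatar's face as it moves. The dwell threshold is a design choice that trades responsiveness against accidental triggers, and a deployed system would need to calibrate it for its users.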

4) INATTENTION AND DISTRACTION
Studies also focused on different ways to examine inattention using distractions. With the exception of two papers ([66], [74]), studies used distractions closely resembling those in real life. When presented with non-social distractions, two studies found that autistic participants were less resistant to distractions than the NT control group based on tracked eye and head movements [65], [69]. However, some autistic participants were still able to perform similarly to NT participants [65]. In contrast, others performed significantly more poorly, despite having achieved similar scores to NT groups with traditional modalities (e.g., pen-and-paper) [69]. Interestingly, in vocational scenarios, participants in [89] reported feeling unaffected by distractors and found visually distracting environments to be more enjoyable [96]. However, these self-reports contradicted performance scores, which were lower than in conditions with no distractions.
In contrast, autistic participants showed significant improvements in eye-tracking measures of inattention and performance following interventions in which interfaces highlighted relevant features in distracting driving scenarios [70], [94] or social contexts [77], [100]. Participatory design sessions also resulted in the development of an interface that filters irrelevant stimuli to augment attention towards important information [52]. Such visual supports are commonly used in autism interventions, as they provide environmental structure and help users function more independently [125].
Tasks presenting users with rich scenarios and visual supports engage and train users to focus on essential information. Tracking eye and head movements provides valid measurements of improvements in inattention. Given that many inattention tasks are based on realistic situations, research efforts should assess how reactions to real-world distractions are affected by the intervention.

5) FOLLOW-UP SESSIONS
Similar to Khowaja et al.'s [31] review, this systematic review found very few studies that involved maintenance phases or assessed transferability. Four studies that included maintenance sessions demonstrated how well participants retained the significant improvements they achieved during the intervention [71], [86]-[88]. However, the authors did not explore whether these improvements transferred to the participants' real lives. Only one study, which conducted maintenance sessions outside the experimental setting, confirmed that performance could be generalized to the participant's home [73]. Determining whether generalization is maintained during follow-up is considered a fundamental aspect of intervention studies in ASD [126] and should be factored into the study design.

VI. CONCLUSION AND FUTURE WORK
This systematic review relied on six databases to search for peer-reviewed literature and may have missed relevant papers published in other databases (e.g., CINAHL, JSTOR, Web of Science). Furthermore, while the review procedure also attempted to encompass a range of papers, the search may have missed papers under other terms.
The authors focused solely on literature analysis and a basic meta-analysis to examine the corpus of papers. The current review did not include a quality assessment of individual studies or a formal meta-analysis. Future reviews would benefit from a risk of bias assessment and meta-analyses on evaluative research involving better-reported results to provide a more comprehensive understanding of research in this area.
The findings from this review provide substantial evidence for the effectiveness of using XR in attention-based interventions for autism to support autistic traits. However, the conclusions that can be drawn from this review are limited by inconsistent methodologies (e.g., evaluation measures) and inconclusive findings due to limitations such as the small sample sizes of relevant studies to date. Despite these drawbacks, we were able to identify three principal gaps in the literature that could provide promising research directions. First, the lack of diversity in participants, namely underrepresented autistic populations such as adults, females and other genders, and individuals from lower-income countries, could be addressed by considering the effects of demographic differences to increase the generalizability of future findings to larger populations. Second, in many papers, there is a trade-off between using an assortment of standardized tasks and assessments and developing specific ones with untested reliability. However, we found benefits and insights were gained when appropriate assessments were combined with physiological measurements. Third, while many papers reported improvements in performance, future XR studies would benefit from further research to establish the specific aspects of interventions that are most effective in generating transferable skills. Furthermore, efforts to understand how performance in XR interventions is transferred to real-world scenarios are essential for the development of cost-effective and scalable applications. Lastly, understanding the varied interaction patterns of neurodivergent end users provides valuable insight to inform XR developers in the design of more inclusive applications and, in doing so, would enhance user experience for a more diverse range of users.
XR offers an accessible and engaging way to support attention differences in autism. With 1.2% of the global population diagnosed as autistic, and potentially even more undiagnosed, it is essential for research to consider how to accommodate autistic trait differences to create accessible and effective therapy solutions and supportive interfaces for engaging comfortably in learning, playing, and interacting.