Augmented Reality for Learning of Children and Adolescents With Autism Spectrum Disorder (ASD): A Systematic Review

This paper presents a systematic review of relevant primary studies on the use of augmented reality (AR) to improve various skills of children and adolescents diagnosed with autism spectrum disorder (ASD), published from 2005 to 2018 inclusive and retrieved from eight bibliographic databases. This systematic review addresses eleven specific research questions related to the learning skills, participants, AR technology, research design, data collection methods, settings, evaluation parameters, intervention outcomes, generalization, and maintenance. Social communication was the most frequently targeted skill, and individuals with ASD were part of all the studies. Computers, smartphones, and smartglasses were the most frequently used technologies. The most commonly used research design was pre-test and post-test. Almost all the studies used observation as a data collection method, and a classroom environment or a controlled research environment was used as the setting of evaluation. Most of the evaluation parameters were human-assisted. The results of the studies show that AR benefited children with ASD in learning skills. A generalization test was conducted in one study only, but the results were not reported. The results of maintenance tests conducted in five studies during a short-term period following the withdrawal of intervention were positive. Although the effect of using AR on the learning of individuals was positive, given the wide variety of skills targeted in the studies and the heterogeneity of the participants, a summative conclusion regarding the effectiveness of AR for teaching or learning of skills related to ASD based on the existing literature is not possible. The review also proposes a research taxonomy for ASD.
Future research addressing the effectiveness of AR among more participants, different technologies supporting AR for the intervention, generalization, and maintenance of learning skills, and the evaluation in the inclusive classroom environment and other settings is warranted.


I. INTRODUCTION
Autism spectrum disorder (ASD) is a neurological disorder because of which a diagnosed child may face difficulty in social communication or show a repetitive or restricted set of behaviors [1]. The American Psychiatric Association (APA) publishes taxonomies and diagnostic tools referred to as the Diagnostic and Statistical Manual of Mental Disorders (DSM); the diagnosis was extensively revised from DSM-IV to DSM-5. A few definitions of the diagnosis were expanded, while a few definitions were narrowed. Social reciprocity and communicative intent have been merged into social communication. The notation of symptoms has changed; mild, moderate, and severe are renamed to "requiring support," "requiring substantial support," and "requiring very substantial support," respectively. The number of children diagnosed with ASD has increased with an increase in awareness among parents and caregivers. The prevalence rate of ASD in the local context is 1.14 per 1000 or 1 in 87 children [2], while the prevalence rate in the United States is 16.8 per 1000 children. This number may increase if an attempt is made to screen every single child across the country. The learning experience of students (with ASD and neurotypical) has changed from the use of non-interactive media like textbooks, to traditional learning through a teacher in the classroom setting, to a more interactive digital learning experience using computers (desktop or laptop), tablets, and interactive whiteboards, among others [3].
The associate editor coordinating the review of this manuscript and approving it for publication was Luigi De Russis.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
With the changes in learning experiences, the interaction styles are also changing: from the use of a keyboard and mouse for on-screen interaction with a computer to the use of the whole body to interact with content that resides in a virtual world as in virtual reality (VR), in the physical world as in augmented reality (AR), or in a combination of both as in mixed reality (MR). Interaction with a VR world may require a specialized VR headset, which many children with ASD might find difficult to wear or interact with. An AR application can, at a minimum, run on a tablet or smartphone and thus provides a more ubiquitous approach for the intervention of an individual with ASD in the context of this research [4]. Evidence-based research has shown that AR attracts the attention of children with ASD [4], [5]. AR-based applications provide multimodal interaction for children with ASD to learn different skills as a part of intervention or therapy sessions [6]. The market of AR is expected to grow relative to VR in the next few years [7].
Each child with ASD is different; a technology-based solution that works for one child may not work for another. Therefore, researchers have started to use different technologies in interventions for children with ASD to identify the technologies that best suit an individual with ASD. Furthermore, the purchase cost of each technology may vary drastically; thus, evidence-based research on the use of technology can help parents, caregivers, schools, and centers, among others, decide how much of a technology to buy based on their needs and available budget. In this research, instead of highlighting individual studies conducted by the researchers, the review papers are highlighted as they provide the state of the art of the technology used by the researchers in their studies. The reason for citing review papers is that the writing of review papers has increased in the recent past. Reviews on AR, in general, cover technologies and features of the AR environment [8], medical training [9], games for health [10], a taxonomy of VR and AR [11], science, technology, engineering and mathematics (STEM) [12], [13], education [3], [14]-[17], and construction [18]. Reviews on ASD cover social robotics [19], [20], computer-based intervention [21]-[29], multitouch table [27], VR [4], [30], tablet [31], and miscellaneous technologies [27], [32]-[35].
The search revealed two review papers on AR for ASD; both review papers are briefly described below.
Marto, et al. [36] have conducted a systematic review of studies that have focused on the use of AR for the rehabilitation of people diagnosed with ASD. The authors found 16 primary studies to answer research questions (RQs) related to skills targeted, participants, the technology used, and the findings.
Khowaja et al. [37] have conducted a systematic literature review (SLR) of empirical studies that have used AR to support individuals with ASD to learn social communication skills. A total of fourteen primary studies were found to answer RQs related to skills targeted, participants, technologies used, research design, data collection method, setting, and findings. The results contributed to a bigger study (covering the entire spectrum) presented in this research paper.
To sum up, the essence of this SLR is to provide state-of-the-art research regarding (1) the studies utilizing AR for children and adolescents with ASD to learn different skills, (2) taxonomies, and (3) recommendations for future research.
The remaining part of this article is structured as follows. The research method used in this SLR is described in section 2. The results are presented in section 3, whereas section 4 enumerates the research findings. Section 5 concludes this SLR by outlining the limitations of this SLR and the recommendations for future studies.

II. METHOD
Kitchenham [38] has defined a three-stage process for SLR, where each stage involves various activities. In this research, the SLR process defined in [36] is followed. The following sections discuss the stages involved, while their subsections discuss the activities involved at each stage.

A. PLANNING THE REVIEW
1) RESEARCH QUESTIONS
A total of eleven research questions (RQ) and five sub-research questions (SRQ) were formulated to carry out a detailed review of the topic. The research questions related to the objectives are as follows:
• RQ1: What is the demographic information of the primary studies?
• SRQ1: When were the primary studies published?
• SRQ2: Which first authors have frequently published the primary studies?
• SRQ3: Which co-authors have frequently published the primary studies?
• SRQ4: Which countries have published the primary studies?
• SRQ5: Which venues are used by the authors of primary studies?
• RQ2: Which learning skills have been targeted in primary studies?
• RQ3: Which participants have been targeted in the primary studies?
• RQ4: Which technologies have been used in the primary studies?
• RQ5: What research designs are used in the primary studies?
• RQ6: Which data collection methods are used in the primary studies?
• RQ7: Which settings are used in the primary studies?
• RQ8: Which evaluation parameters are used to analyze the performance of the participants in the primary studies?
• RQ9: What are the outcomes of using AR in primary studies?
• RQ10: Did AR support the generalization of the learning skills?
• RQ11: Did AR support the maintenance of the learning skills over time?
Most of the terms used in the RQs have plain meanings except those used in RQ7, RQ10, and RQ11. These are briefly described below.
In RQ7, the setting refers to the environment (classroom, natural, etc.) where the evaluation was conducted. In RQ10, the generalization of learning refers to a transfer of skills learned from one content, situation, or setting to new content, situations, or settings. There is no defined procedure to conduct a test of generalization of learning skills. Therefore, the researchers themselves decide to conduct generalization tests in terms of content, situation, setting, or a combination of them. Lastly, in RQ11, the maintenance of skills learned refers to the retention of skills acquired over time; there is no defined period of intervals at which the maintenance tests are conducted. Therefore, the researchers decide one or more intervals following the withdrawal of intervention to determine the maintenance of skills learned.

B. SEARCH STRATEGY
1) SEARCH STRINGS
The steps used to come up with the search terms are as follows:
1. Identifying the major keywords from each research question
2. For each keyword, identifying alternate spellings, synonyms, and acronyms
3. Identifying the keywords used in the related research papers or book chapters
4. Identifying and dividing the similar keywords into categories
5. Using the Boolean operator OR to combine the keywords in each category
6. Using the Boolean operator AND to combine the keywords across the categories
The following table shows the categories and the keywords identified. The query generated based on the keywords in each category is as follows: ("Autis*" OR "Autism Spectrum Disorder" OR "ASD" OR "Asperger syndrome" OR "Pervasive Developmental Disorder - Not Otherwise Specified" OR "PDD-NOS" OR "Rett syndrome" OR "Childhood disintegrative disorder") AND ("Mobile" OR "Tablet" OR "Smartphone" OR "Phone" OR "Smartglass") AND ("Augmented reality" OR "AR" OR "Mixed reality"
For the trial, this query was executed in one randomly chosen database, and the number of studies returned by the database was below 10. Therefore, it was decided to reduce the number of keywords to get more results. After reducing the keywords, the same database was searched again, and the number of results improved. There were certain limitations in the databases, and these limitations varied from one database to another. The restrictions include 1) the number of characters in the query, 2) the type of document to search (journal, conference proceedings, book chapter, etc.), 3) years, and 4) month and day, among others. The following sub-section describes the actual query used in each database and the parameters used along with their values.
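The OR-within-category, AND-across-categories construction described in steps 5 and 6 can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `build_query` and the category labels are assumptions made for the example, and the keyword lists are a subset of the terms quoted in the query above.

```python
def build_query(categories):
    """Join keywords with OR inside each category, then join categories with AND."""
    groups = []
    for keywords in categories.values():
        quoted = [f'"{keyword}"' for keyword in keywords]
        groups.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(groups)


# Category labels are hypothetical; the keywords are a subset of those in the text.
categories = {
    "disorder": ["Autis*", "Autism Spectrum Disorder", "ASD"],
    "device": ["Mobile", "Tablet", "Smartphone"],
    "technology": ["Augmented reality", "AR", "Mixed reality"],
}

query = build_query(categories)
print(query)
```

Dropping keywords from a category, as was done after the trial search, only shortens the corresponding parenthesized group; the overall structure of the query stays the same.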

2) ELECTRONIC DATABASES
Eight electronic databases were selected to search primary studies for this review. These databases include Web of Science, Scopus, ACM Digital Library, ScienceDirect, IEEE Xplore, SpringerLink, SAGE, and Google Scholar. The search was performed on the title, abstract, and indexed terms for accepted or published journal papers, conference proceedings, and book chapters. Table 2 presents the procedure used to perform queries in each database, and the notes for the reader.

3) SEARCH PROCESS
A systematic literature review requires a rigorous search in the selected bibliographic databases based on the subject of discussion. The search process follows the typical steps described below.
Step 4: Add the references from a recent SLR [29] that includes studies on the use of augmented reality with children with ASD.
Step 5: The next step was to remove all the duplicate references returned from step 4 and include the new references in the list of prospective studies.
Step 6: The next step was to identify the prospective primary studies for the review. For this, the title of the study was considered first, followed by the abstract, introduction, conclusion, and then the remaining sections until a decision to include or exclude a study was made.
Step 7: The references of each identified study were manually searched to identify additional prospective studies that could also be included as a part of the review.
Step 8: The last step was to perform the quality assessment of the prospective studies and include or exclude a study. The details of quality assessment are discussed below.
Table 2 presents the details of the search performed in each database, while Figure 1 shows the flow of search and selection of primary studies.

C. CONDUCTING THE REVIEW -STUDY SELECTION
From the search in eight databases, 331 prospective studies were found. Next, the removal of incorrect citations reduced the prospective studies to 223. The removal of duplicate studies further reduced the prospective studies to 184. A total of 27 primary studies from a recent SLR [29] were candidates for inclusion in the list of prospective studies; however, 24 of them were duplicates, so adding the remaining three studies increased the total to 187 prospective studies. Next, the remaining studies were analyzed against the selection criteria to identify the prospective studies. First, the title of each study was considered; then the content of each primary study, starting with the abstract, was considered. The primary studies incapable of addressing one or more of the RQs related to this SLR were excluded from the list of prospective studies. Then, the inclusion and exclusion criteria were also used as a part of the selection criteria to reduce the number of prospective studies. The inclusion and exclusion criteria are discussed below. The collective selection process reduced the prospective studies further down to 55. Then, the references section of each remaining study was manually checked to identify more prospective studies, but it did not reveal any new prospective study. Lastly, the quality assessment of the primary studies was performed; this process left 30 primary studies. The quality assessment of primary studies is also described below.
The following set of inclusion criteria was utilized to determine the studies that would be covered in the review: (1) the study focuses on an augmented reality app for children with ASD; (2) the study directly answers one or more of the research questions; (3) the study was published between 2005 and 2018; and (4) the study is written in English.
The following set of exclusion criteria was utilized to determine the studies that would not be covered in the review: (1) grey papers, i.e., papers without bibliographic information such as publication date/type, volume, and issue numbers; (2) papers that do not have a link with any of the research questions; and (3) papers written in any language other than English. Papers meeting any of these criteria were excluded.
The step-by-step flow of the entire search process and the selection of studies is depicted in Figure 1.
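The counts reported in the study selection can be cross-checked with a short tally. The sketch below is an illustration only: the stage labels and the helper name `stage_changes` are assumptions, while the counts are the ones stated in the text.

```python
def stage_changes(stages):
    """Return (label, count, change from the previous stage) for each stage."""
    changes = []
    previous = None
    for label, count in stages:
        delta = 0 if previous is None else count - previous
        changes.append((label, count, delta))
        previous = count
    return changes


# Counts taken from the selection narrative; labels are paraphrased.
stages = [
    ("database search", 331),
    ("after removing incorrect citations", 223),
    ("after removing duplicates", 184),
    ("after adding new studies from a recent SLR", 184 + (27 - 24)),
    ("after applying the selection criteria", 55),
    ("after quality assessment", 30),
]

changes = stage_changes(stages)
for label, count, delta in changes:
    print(f"{label:<45} {count:>4} ({delta:+d})")
```

The fourth stage makes the arithmetic explicit: of the 27 studies from the recent SLR, 24 were already present, so only three were new.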

1) QUALITY ASSESSMENT OF PRIMARY STUDIES
The quality assessment (QA) of each study is typically carried out to measure the quality of contents presented in the studies. The following quality assessment questions were created to evaluate the relevance, completeness, and credibility of the primary studies:
1. QA1: Are the aims/objectives clearly defined?
2. QA2: Is the AR-based solution clearly defined?
3. QA3: Is the overall research methodology used in the research clearly described?
4. QA4: Are the data collection methods adequately described?
5. QA5: Is the participants' recruitment procedure clearly stated?
6. QA6: Are research findings clearly reported?
7. QA7: Are the limitations of the current work adequately addressed?
8. QA8: Are future works mentioned?
Each question has one of three possible answers: "yes" represents 1, "partially or partly" represents 0.5, and "no" represents 0. The quality score of each study was then calculated by summing the scores of the answers to all questions. Five authors performed the quality assessment of the studies; each author, referred to as the first assessor, was randomly assigned eleven studies and asked to assess their quality and mark the answers. Table 9 in the appendix presents the quality assessment of all the primary studies by the assessors. Once all the authors had assessed the quality of their assigned studies, the next step was to calculate the inter-rater reliability of the scores. Each author, now referred to as the second assessor, was randomly assigned the studies of one of the first assessors and asked to randomly choose and assess the quality of 30% of those studies, i.e., 3 out of 11. Then, the second assessor was asked to communicate and discuss the scores of the chosen studies with the first assessor. It was decided that:
1. if the difference between the quality scores of the first assessor and the second assessor for any question of a study was greater than or equal to 1, both assessors would discuss the study with respect to that question to resolve the inconsistency and see whether the difference fell below 1;
2. if the difference between the total quality scores of the first assessor and the second assessor for any study was greater than or equal to 2, the second assessor would also assess the quality of the remaining eight studies to discuss and resolve the inconsistency between the scores;
3. if the total quality score of a study was greater than or equal to 4, i.e., 50% of the maximum score, the study would be shortlisted for further review.
Table 10 of the Appendix section presents the inter-rater reliability assessment of the primary studies.
For each study assessed by an assessor, two rows of quality assessment scores are shown. The first row shows the scores given by the first assessor, while the second row shows the scores given by the second assessor (i.e., the column "code" refers to the second assessor). For each study, the difference in total scores between the first assessor and the second assessor is less than 2; therefore, the remaining studies were not reassessed. Based on the quality assessment of the studies, a total of 30 primary studies were shortlisted for further review.
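The scoring scheme and the three decision rules can be expressed compactly in code. The following is a minimal sketch under stated assumptions: the function names, the short answer labels ("yes", "partly", "no"), and the example answers are hypothetical and do not come from Table 9 or Table 10; only the score values and thresholds are taken from the text.

```python
# Score values as described in the text; "partly" stands for "partially or partly".
ANSWER_SCORES = {"yes": 1.0, "partly": 0.5, "no": 0.0}


def quality_score(answers):
    """Sum the scores of the answers to QA1..QA8."""
    return sum(ANSWER_SCORES[answer] for answer in answers)


def questions_to_discuss(first, second):
    """Rule 1: QA numbers whose per-question scores differ by 1 or more."""
    return [
        number
        for number, (a, b) in enumerate(zip(first, second), start=1)
        if abs(ANSWER_SCORES[a] - ANSWER_SCORES[b]) >= 1
    ]


def needs_full_reassessment(first, second):
    """Rule 2: total scores differ by 2 or more."""
    return abs(quality_score(first) - quality_score(second)) >= 2


def is_shortlisted(answers, threshold=4.0):
    """Rule 3: total score is at least 4, i.e., 50% of the maximum of 8."""
    return quality_score(answers) >= threshold


# Hypothetical answers of two assessors for one study (not real data).
first = ["yes", "yes", "partly", "yes", "no", "yes", "partly", "no"]
second = ["yes", "partly", "partly", "yes", "yes", "yes", "partly", "no"]

print(quality_score(first), quality_score(second))
print(questions_to_discuss(first, second))
print(needs_full_reassessment(first, second))
print(is_shortlisted(first))
```

In this made-up case only QA5 would need discussion, the totals differ by less than 2, and the study would pass the shortlisting threshold.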

D. DATA EXTRACTION
A set of guidelines related to the data extraction process was followed to identify relevant information from the primary studies. The data extraction form was designed so that all the authors could use it to accurately record all the information for each study assigned to them. The attributes were decided considering all the research questions related to this SLR. The attributes include: (i) title; (ii) authors and their details; (iii) venue of publication; (iv) year of publication; (v) skills targeted in the study; (vi) participants' characteristics and their symptoms; (vii) technologies used; (viii) AR framework/toolkit used; (ix) research design; (x) data collection method (questionnaire, interviews, focus groups, survey, observations, essay writing, etc.); (xi) setting (classroom, home, controlled research environment, etc.); (xii) evaluation parameters; (xiii) outcome of the study; (xiv) generalization; (xv) maintenance; (xvi) limitations; and (xvii) future work.
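One way to keep such a form consistent across several authors is to encode it as a typed record with one field per attribute. The sketch below is an assumption about how this could be done, not the form actually used in the review; the class name, field names, and the placeholder record are hypothetical.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class ExtractionRecord:
    """One row of a data extraction form; field names are illustrative."""
    title: str
    authors: List[str]
    venue: Optional[str] = None
    year: Optional[int] = None
    skills_targeted: List[str] = field(default_factory=list)
    participants: Optional[str] = None
    technologies: List[str] = field(default_factory=list)
    ar_framework: Optional[str] = None
    research_design: Optional[str] = None
    data_collection_methods: List[str] = field(default_factory=list)
    setting: Optional[str] = None
    evaluation_parameters: List[str] = field(default_factory=list)
    outcome: Optional[str] = None
    generalization: Optional[str] = None
    maintenance: Optional[str] = None
    limitations: Optional[str] = None
    future_work: Optional[str] = None


def missing_attributes(record):
    """List the attributes that have not been filled in yet."""
    return [f.name for f in fields(record) if getattr(record, f.name) in (None, [], "")]


# Placeholder record for illustration only; values are not extracted data.
record = ExtractionRecord(
    title="<title of the study>",
    authors=["<first author>"],
    skills_targeted=["brush teeth"],
    research_design="multiple probe across participants",
)
print(missing_attributes(record))
```

A helper such as `missing_attributes` makes it easy to see which attributes still need to be extracted before a study is considered fully coded.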

III. RESULTS
The summary of the primary studies in terms of skills targeted, participants, technology, research design, data collection methods, and setting is presented in Table 3, while the other details, including evaluation parameters and outcomes of the studies, are presented in Table 4.
Figure 2 shows the publication trend of the thirty primary studies. Each vertical column shows the number of studies accepted for publication in a research journal or presented at a conference. It can be seen that the use of AR to provide interventions for different skills to children with ASD has increased since 2014.
The authors of each primary study specify multiple keywords in the manuscript. These keywords are used by bibliographic databases and search engines for indexing, among other purposes. To help readers see all the keywords used in all the primary studies, the word cloud of these keywords is shown in Figure 3. The font size of each word in the word cloud depicts its frequency: the larger the word, the higher the frequency, and the smaller the word, the lower the frequency. The top six keywords with their frequencies in the primary studies are reality (N=30), augmented (N=28), autism (N=25), spectrum (N=12), disorder (N=10), and education (N=9).
The top five first authors in the primary studies are shown in tabular form. Table 6 presents the top seven co-authors who have also worked as the first author of at least one primary study. Three authors, namely Sahin, Cihak, and Keshav, have worked on three studies each, while two authors, namely Vahabzadeh and I. J. Lee, have worked on two studies each. Lastly, McMahon and Escobedo have worked on two studies each. During the analysis, it was found that one of the top co-authors, i.e., Sahin, has two studies as a first author, and another co-author, i.e., McMahon, has three studies as a first author.
Table 7 shows the countries of the first author and all the co-authors of the primary studies and the number of primary studies contributed by each country. The analysis shows that the first author of eleven primary studies is from the US, while the UK, with four primary studies, stood second, and Taiwan, with three primary studies, is third. Four countries, namely Brazil, Indonesia, Mexico, and Spain, had two primary studies each, and lastly, Argentina, France, Greece, and Japan had one primary study each. It was found that authors from Brazil and Portugal collaborated in two primary studies, and they conducted an evaluation in Portugal (collaborator country), while the authors from the UK and Switzerland collaborated in one primary study, but the location of evaluation is not mentioned in the study.
Table 8 shows all the venues, their type, and references of the primary studies where papers were either presented or published, and the publication years. A total of 21 primary studies were published in journals, eight primary studies were presented at various conferences across the world, and lastly, one primary study was published as a book chapter. IEEE Pervasive Computing, Journal of Research on Technology in Education, Journal of Special Education Technology, and International Conference on Enterprise Information Systems (ICEIS) are the top four venues with two publications each, whereas the remaining venues had one publication each. The rows with the top three venues and types are highlighted in bold.
navigation, pretend play, repetition, social communication, social reciprocity, socioemotional skill, and values. The social communication skill is targeted in 12 studies, pretend play in 3 studies, while navigation and facial expression have been targeted in 2 studies each. The remaining skills are targeted in 1 study each. There are five studies in which authors have used two or more skills.
All primary studies have been grouped based on the skills targeted in the studies and are briefly described below in alphabetic order of skills targeted.
1. Attention Management: Escobedo et al. [43] conducted a study to investigate whether the usage of the Mobile Object Identification System (Mobis) among children with ASD increases selective and sustained attention and provokes positive emotions. The authors recorded the whole interaction and analyzed it to present the findings.
2. Brush teeth: Cihak et al. [52] conducted a study to examine the use of AR to teach elementary-age school students with ASD about chain tasks, more specifically, how to brush their teeth. The authors used a multiple probe across participants design by Hammond and Gast [67] to demonstrate the relationship between augmented reality and brushing teeth independently.
3. Cooking: Papadaki et al. [61] conducted an experimental evaluation of the "Let's Cook" game to teach children with cognitive impairments how to prepare simple meals. The authors evaluated the Let's Cook game in two ways: 1) Using user observation by two usability experts. Both experts observed all the sessions and the interaction of the student with the teacher from a distance to avoid distracting the child.
2) The answers submitted by the student when they were asked any question on-screen as a part of the game.
4. Facial expressions and emotions: Chen et al. [45] have developed an AR-based self-facial modeling (ARSFM) app for children with ASD to learn and improve their emotional expression recognition and social skills. The authors designed 3D facial models of the virtual characters that can fit the face of all the children and six facial expressions that represent basic emotions (happiness, sadness, fear, disgust, surprise, and anger). They conducted a study to assess the use of ARSFM to become aware of facial expressions in a school setting. Chen et al. [51] have developed augmented reality-based (AR) video modeling (VM) with a storybook (ARVMS) for children with ASD to learn crucial social abilities that can help them understand the facial expressions and emotions of people in a social situation/gathering. The authors have used AR in this study for multiple functions; AR extends the social features of the story, but the attention is restricted to the most important parts of the video. The authors have investigated whether the use of ARVMS can improve VM and encourage children with ASD to focus on specific parts of the videos.
5. Handling Plants: Richard et al. [39] have designed a non-immersive recreational and educational augmented reality application (ARVe - Augmented Reality applied to Vegetal field) for young children to handle 2D and 3D plant entities. The authors conducted a study to investigate the performance, behaviors, and attitudes of the participants when using ARVe. The participants recruited were a mix of children with cognitive disabilities, including ASD, and typical children.
6. Literacy: McMahon et al. [53] have conducted a study to examine the use of AR to teach science vocabulary words to college students with ASD and ID.
The authors recorded the student's ability to define and label three sets of vocabulary words (bones, organs, and plant cells) in terms of the number of correct responses.
7. Navigation: McMahon et al. [48] have investigated the use of location-based AR navigation in comparison to Google Maps and paper maps as a navigation aid for postsecondary education college students with ASD and ID to travel to an unknown business location (store, café, university, museum, etc.) within the city. The visual analysis of the percentage of direction checks completed independently during the baseline shows that they were unable to navigate independently. McMahon et al. [49] have conducted a study to compare the effect of three navigational aids with students with intellectual disabilities (ID) who attended a postsecondary education (PSE) program. These navigational aids include a printed map, a Google map on a smartphone, and an AR app for navigation. The authors used an adapted alternating treatment design to compare the three navigational aids.
8. Pretend play: Bai et al. [41] and Bai et al. [44] present an experimental evaluation of a proposed AR system to support and encourage children with ASD to pretend play. The authors recorded all the interactions of the children in both conditions, i.e., AR and non-AR. The analysis of the recorded sessions was conducted based on five play categories: pretend play, constructive play, relational play, simple play, and no play. Additionally, Bai et al. [44] conducted an experiment to control the potential learning effect of a particular order of AR and non-AR conditions. Dragomir et al. [56] have conducted an evaluation of an augmented reality app to engage children with ASD in pretend play. The entire evaluation was video-recorded; the video was analyzed in 10-second intervals and labeled as no play, sensorimotor play, relational play, functional play, or pretend play.
9. Social communication skills: Lee et al.
[5] have combined the use of AR with a concept map (CM) strategy as a training tool for children with ASD to focus on nonverbal social cues and to teach them how to reciprocate when they socially interact with others. They have investigated whether AR can be used to train the children to focus on nonverbal cues and to reciprocate appropriately when they socially interact with others. The AR-based CM system (ARCM) developed by the authors is more like a miniature theatre in which the child plays a role as an avatar in a social situation. Multiple users (children, parents, faculty) can simultaneously use the system. Chung et al. [46] have conducted a study to investigate the differences in terms of communication, positive affect, and aggression in children with ASD while playing AR-based games (AVGs) versus traditional videogames. The AVGs use body motion as an input rather than a joystick or buttons. The hypothesis was that AVGs would increase social behavior (communication and positive affect) and decrease aggression among children with ASD. Da Silva et al. [42] have developed an AR-based system for the therapist to design interactive activities based on Augmentative and Alternative Communication (AAC) and Applied Behavior Analysis (ABA) for assistance during speech therapy sessions of children with ASD. They conducted a qualitative study to test whether the functionalities of the system were clear to the therapist and whether it can support them in the interventions. Da Silva et al. [47] have conducted a study on the use of AR to assist in the intervention of communication and language of children with ASD. The authors combined the elements of Augmentative and Alternative Communication (AAC) and Applied Behavior Analysis (ABA) in the Speech Therapy with Augmented Reality (STAR) system developed as a part of their research. The STAR system allows therapists to create customized interactive activities for each child.
The goal was to test whether the software was clear to the therapist and to evaluate its usage as a supporting tool in the interventions performed. Farr et al. [40] have used the Knight Castle playset with Playmobil figures and digitally augmented them with a voice that can be configured with the voice of an individual child with ASD. They conducted a two-group, two-condition (configuration, non-configuration) study using the Augmented Knight Castle (AKC) playset and investigated whether configuring the AKC with a child's own voice improves their social interaction in comparison to when the AKC is used with a default voice. Two groups of children were given ten Playmobil figures in a configuration condition, whereas twenty Playmobil figures were given to another two groups in a non-configuration condition.
Keshav, et al. [54] have conducted a study to assess the tolerability and usability of the Brain Power Autism System (BPAS), novel smartglasses intended to improve the social communication of children with ASD. The BPAS is based on Google Glass Explorer Edition or other smartglasses and uses both AR and affective artificial intelligence to help children with ASD learn social and emotional skills. Liu, et al. [55] present a feasibility report on the Brain Power System (BPS) as it was used to provide one behavioral session to children with ASD. The BPS is the first AR smartglass system for children and adults with ASD to learn, practice, and improve social and cognitive skills using gamified augmented reality applications (for instance, Face Game and Emotion Game). The authors used the Aberrant Behavior Checklist (ABC) together with subjective caregiver and user reports to assess tolerance and usability. Sahin, et al. [62] have assessed the safety and potential negative effects of the Empowered Brain system. The Empowered Brain serves as a social communication aid that consists of AR smartglasses with apps that allow children and adults with ASD to coach themselves on important socio-emotional and cognitive skills. Eighteen users were recruited to use the system, which consists of three apps, namely, Transition Master, Face2Face, and Emotion Charades. Each user used the system in the presence of their caregiver for 10 minutes once they were able to tolerate wearing it. A structured interview was conducted with the user and caregiver once the user had completed the use of all the apps. Sahin, et al. [63] have investigated the usability and accessibility of Glass Enterprise Edition, also referred to as Glass, with children with ASD and their caregivers.
Glass is the successor of the Google Glass smartglasses. A total of 8 children with ASD and their caregivers were recruited for the experiment. While wearing Glass, each child could use any app of their interest. The interaction with Glass was recorded using video and photographs, for which prior consent was obtained from each user and their caregiver. Each child-caregiver pair was asked a set of questions as a part of a semi-structured interview.
Vahabzadeh, et al. [66] have conducted a study to investigate the changes in ADHD-related symptoms among the participants immediately after the use of the Empowered Brain system (a behavioral and social communication aid for children with ASD running on AR smartglasses). They divided the participants into two groups (a low ADHD-related symptom group and a high ADHD-related symptom group) based on their ABC-H scores. Both groups had four participants. Lorenzo, et al. [58] have conducted a study to assess the effectiveness of an AR training program for children with ASD to improve social skills. They recruited 11 students and divided them into two groups for an intervention of 20 weeks with two sessions per week of 15 minutes each; the control group consisted of five males, while the remaining five males and one female were in the experimental group. They used two instruments for the data collection: 1) the Autistic Spectrum Inventory (Riviere, 2002), and 2) the Quick Vision application for the AR-based intervention. The analysis of the data was carried out in five ways: 1) comparison of pre-test scores between the experimental and control groups, 2) comparison of pre-test and post-test scores in the control group, 3) comparison of pre-test and post-test scores in the experimental group, 4) comparison between the experimental and control groups in pre-test and post-test scores, and 5) comparison of pre-test and post-test global scores in the Autistic Spectrum Inventory between the experimental and control groups. Menéndez [60] investigated the use of KiNEEt, a Microsoft Kinect based system, at the Special Education Center Princesa Sofia to improve physical and cognitive skills among students with special needs. The system supports four activities: 1) Numbers, 2) Shapes, 3) Handwriting, and 4) Coordination. The system allows students to interact with the activities using gesture recognition and body motion.
Two types of evaluations were conducted. The first evaluation involved five experts, including a special education professional, a physiotherapist, an education ICT professional, and a computer science professional. The second evaluation was performed with four actual users: a child with autism, a child with hearing impairment, a child with visual impairment, and a child with physical impairment. The expert evaluation was carried out using survey questionnaires, with questions related to usability evaluation, educational evaluation, and students' behavior. The user evaluation

Farr, et al. [40] converted Playmobil figures into augmented toys (ATs) that can speak and be played with; the voice of these ATs can also be programmed with the voice of a participant.
Chung, et al. [46] used Microsoft Xbox as a part of their research to compare sedentary videogames and active videogames among three dyads, each including a participant with ASD, in terms of joint positive affect, reciprocal communication, and aggression.
Takahashi, et al. [64] have proposed FUTUREGYM, an interactive school gymnasium with a large-scale, interactive floor projection system in a school setting, to improve the interpersonal skills of the participants.

In this study, we have categorized the research designs used by the primary studies into four categories based on the research design types in [68]. The categories are post-test only, post-test control group, pre-test and post-test, and pre-test and post-test control group. These research designs are mostly used for evaluations in educational contexts. The first type, post-test only, involves the evaluation of an independent variable at the end of the system application or intervention. The second type, post-test control group, compares the evaluation of different groups of participants at the end of the evaluation. The third type, pre-test and post-test, involves the evaluation of certain variables before and after the intervention to measure the effects of the intervention. Lastly, the pre-test and post-test control group type evaluates the intervention variables before and after the intervention with different groups of participants to measure the effects of the intervention on each group. Among the primary studies, the pre-test and post-test design, used in 14 studies, was the most common. These studies include [6], [43], [45], [46], [48], [49], [51], [52], [54]-[56], [59], [65], [66].
The post-test control group design was used in four primary studies [40], [42], [53], [5]. Lastly, the pre-test and post-test control group design was used in two studies [58], [64].

Among the primary studies, five main data collection methods have been used: interview, focus group, programmatic collection, observation, and questionnaire. The prominent method of data collection is observation. Six studies applied the questionnaire, four applied an interview, and only one study applied a focus group. While the majority (N=24 out of 30) of the studies used a single method of data collection, a few (N=6 out of 30) combined two or three methods. Six different combinations of the data collection methods were identified: interview, observation, and focus group [6]; observation and questionnaire [41]; questionnaire and interview [5]; questionnaire and programmatically [60]; automatic and interview [62]; and lastly programmatically, interviews, and observation [63]. A brief description and the applications of these methods are given as follows:

1) Interview: This method involves taking the views of experts or users on the design and usability of the application. The experts included in the interview sessions were teachers, parents, and therapists, who were asked for their views on the behavior of the users. The users interviewed were mainly the ones with verbal abilities. The objective of the interview with users was to understand the usability, feasibility, and tolerability of the application, and how it impacted their current behavior and engagement [6], [55], [5]. These primary studies have applied this method as a post-intervention evaluation [6], [55], [63].

2) Focus Group: This method of data collection is commonly used with a group of people to address the issues and approach of a design collectively. Among the primary studies, the focus group was used as a follow-up with teachers and verbal users in one primary study [6]. This study used this method along with other methods, namely interview and observation, to address the technology adoption issues and interesting uses that emerged from the long-term use of the application.

3) Programmatically: This method of data collection evaluates the user interactions with the AR technology to understand its usability and assess user performance based on the programming code implemented in the app. Only one primary study [58] has used a programmatic method of data collection alone, while one other study [60] has used it with a questionnaire.

4) Observation: This method provides detailed information on the behaviors of the users while interacting with the AR technology used as a part of the intervention. The specific technique used in several primary studies was video data analysis [6], [39], [41], [56], [63]-[65]. Another method of observation used was direct visual inspection and direct observation by experts or caregivers [47], [61], [65]. The experts were present during each experiment and were situated at a distance from the user and the teacher to avoid distracting the user. During each session, the experts paid close attention to the way both the user and the teacher interacted with the system. Other techniques, aside from video data analysis, were Henry Mintzberg's structured observation method and lagged sequential analysis. These techniques estimate the total and descriptive statistics of the time users spent paying attention and exhibiting behavior problems, and the time teachers spent prompting [6]. Overall, twenty-one studies have used the observation method alone for the data collection [39], [40], [42]-[52], [54]-[56], [57], [59], [61], [64], [65]. Four studies used it with one other data collection method, such as a questionnaire [41] or an interview [5], [62], [63]. Lastly, one study used it with two data collection methods, i.e., focus group and interview [6].

5) Questionnaire: The views of the therapist on the performance of the prototype were evaluated using a 5-point Likert scale from strongly disagree to strongly agree [5]. This method was used to ensure that the social reliability and validity of the test are close to a real situation. The questionnaire was also used to assess the presence of the various traits evaluated before and after the intervention [58]. Three primary studies have used this method with other data collection methods: observation and questionnaire [41], interview and questionnaire [5], and programmatically and questionnaire [60]. In addition, the questionnaire was used to get feedback from the users, such as a social validity questionnaire regarding the use of AR to learn new vocabulary words [53].

H. RQ8: WHICH EVALUATION PARAMETERS ARE USED TO ANALYZE THE PERFORMANCE OF THE PARTICIPANTS IN THE PRIMARY STUDIES?
The evaluation parameters have been categorized into two types: 1) machine-assisted, and 2) human-assisted. The machine-assisted evaluation parameters rely on programming code to automatically record the actions and calculate the performance of the participants (e.g., [60]). The human-assisted evaluation parameters include all those parameters manually marked by an individual to capture views on subjects, attitudes, or behaviors. The number of correct and incorrect answers can be an example of both machine-assisted and human-assisted parameters. Assume a user is shown a question followed by multiple options and is asked to choose the best option. In the machine-assisted case, when the user selects an option, the programming code running at the backend of the app determines whether the chosen option is correct or incorrect. In contrast, if the app only displays the question and the set of possible options and does not itself record whether the selected option is correct, then a human observer watches the live session or the recorded user interaction with the app to count the number of correct and incorrect answers.
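The machine-assisted case described above can be illustrated with a minimal sketch. It is not taken from any of the primary studies; the class and field names (`Question`, `SessionLog`, `correct_index`) are hypothetical and only show how an app's backend might record each selection and tally correct and incorrect answers without a human observer.

```python
from dataclasses import dataclass, field


@dataclass
class Question:
    """A multiple-choice item (hypothetical structure for illustration)."""
    prompt: str
    options: list[str]
    correct_index: int  # position of the best option in `options`


@dataclass
class SessionLog:
    """Accumulates machine-assisted evaluation parameters for one participant."""
    correct: int = 0
    incorrect: int = 0
    # Each event stores (prompt, selected option index, whether it was correct).
    events: list[tuple[str, int, bool]] = field(default_factory=list)

    def record(self, question: Question, selected_index: int) -> bool:
        """Score a selection automatically and update the running counts."""
        is_correct = selected_index == question.correct_index
        if is_correct:
            self.correct += 1
        else:
            self.incorrect += 1
        self.events.append((question.prompt, selected_index, is_correct))
        return is_correct

    def accuracy(self) -> float:
        """Fraction of correct answers; 0.0 when nothing has been recorded."""
        total = self.correct + self.incorrect
        return self.correct / total if total else 0.0


if __name__ == "__main__":
    log = SessionLog()
    q = Question("Which shape is a circle?", ["square", "circle", "triangle"], 1)
    log.record(q, 1)  # correct selection
    log.record(q, 0)  # incorrect selection
    print(log.correct, log.incorrect, log.accuracy())  # 1 1 0.5
```

In the human-assisted case, the same counts would instead be entered by an observer after watching the live or recorded session, since the app would expose no `record` step of this kind.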

I. RQ9: WHAT ARE THE OUTCOMES OF USING AR IN PRIMARY STUDIES?
The results of the primary studies are presented based on the skills targeted in the studies, utilizing the same groups used in RQ2. The results of each study are briefly described below:

1. Attention Management: The results of Escobedo, et al. [43] indicate that low-functioning children can also use Mobis to uncover digital content. During the therapy, the children started to walk around the classroom to identify the objects painted on the wall as well as outside the classroom. Mobis increased the engagement time spent on the given tasks by 20%, and the children were more motivated. The selective attention of the children increased by 62% while using Mobis. The sustained attention of the children increased by 45% while using Mobis. Selective attention and sustained attention were quite low before and after using Mobis. This shows a positive impact of using AR in the therapy sessions.

2. Brush teeth: In Cihak, et al. [52], the average performance of all three students, in terms of the percentage of steps performed independently, increased from 24.7% during baseline to 98% during the intervention once AR was introduced, and reached 100% independence during the maintenance stage, i.e., nine weeks following the withdrawal of the intervention.

3. Cooking: The analysis of the results in [61] revealed that students were able to comprehend the layout of the game and the interaction style. The graphical elements were appropriate, and the design was intuitive; the students were aware of the locations where items would appear and of the required interaction with them.

4. Facial expressions and emotions: The visual analysis of the results in [45] shows that the performance of all the children was low during the baseline. It improved during the intervention and was slightly lower in the follow-up sessions, but remained higher than the baseline.
The statistical analysis shows that the mean difference between performances of baseline and intervention was significant, and the mean difference between performances of baseline and follow-up was also significant. In the study conducted by Chen, et al. [51], it was found that, during the baseline, children were paying attention to the irrelevant parts of the video scene; they focused only on those parts which were of interest to them. During the intervention, they had to focus on the augmented cues provided by the ARVMS in the storybook. This aroused their curiosity about and interest in the augmented hints and the facial expressions, and they started to ask the therapist questions, such as why the facial expressions changed. They were able to differentiate facial expressions representing specific emotions. The analysis shows that performance significantly improved from the baseline to the intervention, and that the children retained the emotional and social skills they learned during the intervention through the maintenance phase.

5. Handling Plants: The analysis of the video-recorded data in [39] shows that ARVe increased the focused attention and brought enjoyment and positive involvement feedback among the children. All the children were able to use markers and had difficulty in interaction with the app. Overall, 82% of the children showed an interest in the experiments. Nine out of eleven children with cognitive disabilities developed a keen interest in using ARVe. Based on the motivation for performing tasks, the authors found that the level of interest between children with cognitive disabilities and typical children was not the same.

6. Literacy: The visual analysis of the results in [53] shows that all the students demonstrated improvement in the ability to define and label all sets of vocabulary words. This shows that the use of AR provided an effective instructional intervention environment for teaching science vocabulary.

7. Navigation: The analysis of the results in [48] shows that students were able to navigate independently by making their own decisions while using the AR navigation tool; they did not require any assistance. However, for Google Maps and paper maps, they needed person-supported assistance in all the sessions; they preferred the AR navigation tool over Google Maps and paper maps. In terms of independently checking the direction for each navigation aid in the study [49], it was found that checking increased to 10.9%, 46.8%, and 87% for a paper map, Google Maps, and an AR map, respectively. This shows that the AR navigational map was more effective than Google Maps and a paper map and that students were able to reach an unknown location using an AR map.

8. Pretend play: In terms of the play frequency in [41] and [44], the authors found that the mean frequency of pretend play was higher in the AR condition than in the non-AR condition. Conversely, constructive play was higher in the non-AR condition than in the AR condition. Lastly, relational play, simple play, and no play remained the same in both conditions.
In terms of the play duration, the percentage of time spent in pretend play by children in the AR condition was significantly higher, while the percentage of time spent in constructive play was significantly higher in the non-AR condition. The total number of relevant actions performed, including reality-based and novelty-based actions, was significantly higher in the AR condition than in the non-AR condition. Parents were asked a set of questions based on a 5-point Likert scale about the children's engagement in terms of cooperativeness, attentiveness, and happy smiling. The range of values in the scale varied from one question to another; the values include very good to very poor, frequent to never, the first session to the last session, and strongly agree to strongly disagree. In terms of engagement, the mean scores of attentiveness and cooperativeness were between ok and high in both conditions, while the score of happy smiling varied from sometimes to frequent. There was a marginally significant difference in happy smiling. The parents found the children playing more in the AR condition. In Bai et al. [44], the statistical analysis showed that the main effect of order was not significant on either pretend play frequency or constructive play frequency. This indicates that the learning effect was efficiently controlled.
The analysis in the study by [5] shows that all children had lower scores during the baseline. The performance significantly improved from the baseline to the intervention, and the children retained the emotional and social skills they learned during the intervention through the maintenance phase. The training-effect scores given by the therapist indicate that the scores of all children were low at the start but increased significantly, and they were higher in the maintenance phase than in the intervention. The authors found that the use of ARCM was useful in teaching children to understand and recognize the social relationship of an individual and how to respond with an appropriate greeting. It also allowed children to imitate the modeled behavior without actually facing it in a real situation.
The pre-post analysis in the study [56] shows a significant increase in the percentage of time spent in pretend play.
Social communication: Chung et al. [46] found that one of the three groups showed a positive effect, while social behavior remained unchanged in the remaining two groups. This could be because AVGs are less enjoyable than traditional videogames. However, parental feedback shows that AVGs have an equal or greater impact on social behavior compared to traditional videogames. The authors noted that participants who actively played traditional videogames did not show improvements in the quality of social engagement, but that AVGs have the potential to improve peer-to-peer interaction.
The analysis of the results in the studies [42], [47] shows that children were highly motivated to use the system because of the 3D models and the animation shown on top of the card. They showed a high degree of engagement, asked the therapist for the system to play with as soon as they arrived for the intervention, and used it for a longer duration on their own. It was also found that passive students, who used to run away from the therapy sessions towards the window, were very active and performed the tasks as instructed by the system.
Farr et al. [40] found that significantly less time was spent in solitary behavior when the voice was configured than when the default voice was used. The voice configuration option increased the interest of the children, and they were more active when playing with the Playmobil figures of the AKC set.
The preliminary results by [54] show that 91% (19 out of 21) of the children with ASD showed tolerance in all three measures (caregiver report, initial tolerability threshold, and whole session tolerability threshold). The caregivers' report shows that 19 out of 21 children were able to use the BPAS successfully. The users who were able to communicate well also reported that using the BPAS was comfortable.
The caregivers' report in [55] indicates that users had fun and enjoyed using the system. They felt the system had a high tolerance and engagement. They reported the increase in non-verbal communication, eye contact, and social engagement, while verbal communication was not affected. The analysis of differences in five sub-scales (Irritability/agitation, Lethargy/social withdrawal, Stereotypic behavior, Hyperactivity/non-compliance, and Inappropriate speech) of ABC between pre-intervention and post-intervention shows improvement in all subscales of the users.
In the study conducted by Sahin et al. [62], 16 out of 18 users were able to use at least one app, while the remaining two users did not show interest in wearing the system. These two users did not express any negative effect but were non-verbal and relatively young (5.5 and 5.8 years) compared to the remaining users (12.2 ± 5.2 years). Of the 16 remaining users, 14, along with all caregivers, reported no minor negative effects, while all users and caregivers reported no major negative effects.
Sahin et al. [63] found that all children were able to complete the session successfully and reported no stress and no sensory or emotional issues while using Glass, as well as a willingness to use Glass in different settings (home, classroom, etc.). All the caregivers felt it was fun for their children, and most (N=6 out of 8) of the caregivers felt that it was a better experience than expected.
In the study conducted by Vahabzadeh et al. [66], the authors found that all participants were able to use the smartglasses and complete all the sessions. The post-intervention ABC-H scores showed an improvement in ADHD-related symptoms among six of the eight participants at 24 hours and among all the participants at 48 hours. In terms of mean ABC-H scores at 24 hours post-intervention, there was a decrease in score of 54.9% in the high ADHD group and 20.0% in the low ADHD group. In terms of mean ABC-H scores at 48 hours after the session, there was a decrease in score of 56.4% in the high ADHD group and 66.3% in the low ADHD group.
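As a reading aid for the percentages above, the following is a minimal sketch of how a percentage decrease in a mean score is conventionally computed. It assumes the decrease is taken relative to the pre-intervention group mean; the review does not state the exact procedure used in [66], and the function name and the input values are hypothetical.

```python
def percent_decrease(baseline_mean: float, follow_up_mean: float) -> float:
    """Percentage decrease of a mean score relative to its baseline value."""
    if baseline_mean == 0:
        raise ValueError("baseline mean must be non-zero")
    # Multiply before dividing to limit floating-point rounding error.
    return (baseline_mean - follow_up_mean) * 100.0 / baseline_mean


if __name__ == "__main__":
    # Hypothetical group means, not values reported in the primary study.
    baseline = 20.0
    at_24_hours = 9.0
    print(percent_decrease(baseline, at_24_hours))  # 55.0
```

A positive result indicates a lower (improved) score at follow-up, while a negative result would indicate an increase.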
The statistical analysis in [58] shows no significant difference between the control and experimental groups with or without the AR-based intervention; however, the qualitative feedback provided by the experts revealed an improvement in the focus of attention and motivation among children through AR, which can provide fruitful results in the development of skills for children with ASD.
The analysis of the results in [50] shows that the frequency of access to the computer was higher than the frequency of access to write using PECS.
The authors in the study [64] found that the use of pacemaker characters in the Circle-Run game trials improved coordination among the children of each group as they matched their pace and position with those of the pacemaker character. This coordination did not exist in the pre-trial, when the pacemaker character was not used in Circle-Run. The running formation improved during the post-trial compared to the pre-trial. The analysis of the constellation game shows that the number of helping behaviors and positive behaviors increased after playing the game.
In the study conducted by Taryadi and Kurniawan [65], the qualitative analysis of the communication using AR-PECS shows that the average performance of children was 47% before the start of the intervention. The performance improved to 65% during the intervention, and it further improved to 76% after the intervention. The three factors mentioned by the authors that contribute to effective learning using PECS include 1) the items that motivate a child to initiate communication, 2) using concrete symbols as a real one, and 3) giving rewards.

9. Miscellaneous: The quantitative analysis by Tentori et al. [6] revealed that Mobis and FroggyBobby improved attention span, and the results were significant, while attention span slightly decreased using Sensory-Paint, and the results were insignificant among the children with ASD. BxBalloon helped children become aware of bad behaviors, and they stopped when they felt a behavior was bad. They showed positive behaviors and had fewer tantrums. The qualitative analysis showed that the engagement level increased across these prototypes, and after a little training at the start, the children were able to perform exercises on their own. In the study conducted by Lumbreras et al. [57], the results for the learning basic shapes module show that children learned how to progress through the activity levels and showed more interest and excitement when different rewards were given. In the repeat basic habits module, children learned how to perform daily routine tasks step-by-step by placing the pictograms representing those activities in order, and they remembered them. In the draw module, children demonstrated their learning by following the commands to draw different paths as asked by the app in an acceptable manner. Drawing is the first step before they start writing. In the learn to write module, the children were able to draw each letter as they heard the audio and were instructed by the app. In the learning values and empathy module, children were able to successfully recognize and select the correct reaction when the pictograms related to one of the everyday life situations were shown.
The quantitative analysis of the study [59] shows that using a pop-up book with AR content increased interest in the design by 56.25%, duration of observation by 62.5%, exploration of curiosity by 43.75%, mastery of message contents by 56.25%, and communication by 43.75%. Overall, the improvement was significant. The qualitative analysis shows that a pop-up book with AR is a unique and fun way to learn the content and gives an immersive feeling of interacting with the objects directly.
The visual analysis of the results in [60] for all the activities revealed that the time required to perform each activity and the number of errors made decreased every day. The highest times and error counts were seen in the initial ten days, when the participants did not have prior experience of working with these activities, as they were all new to them. Since each participant had different abilities, the child with physical impairment faced difficulty in the number activity, as locations were close to each other. The child with hearing impairment faced some issues with the shape activities, as the activity used speech synthesis to inform the child in order to reduce the errors. The child with autism had better results in shapes and handwriting because of the artistic nature of the topic. The child with visual impairment faced difficulty in the handwriting and coordination activities.

From the primary studies, only one study by Lee et al. [5] conducted generalization probes at various instances (baseline, intervention, and maintenance) of their research, but they did not present the generalization results.

K. RQ11: DID AR SUPPORT IN THE MAINTENANCE OF THE LEARNING?
Five studies had a maintenance phase to determine whether the participants were able to retain, over time, the skills learned as a part of the intervention. The performance analysis of these studies during the maintenance phase revealed that participants retained what they learned as a part of the intervention. The time of starting the maintenance phase, the number of sessions conducted, and the duration of the maintenance phase varied from one study to another and among participants. Cihak et al. [52] conducted the maintenance phase 9 weeks after the intervention; Lee et al. [5] conducted the maintenance phase six weeks after the intervention, with between 4 and 8 sessions lasting 2 to 4 weeks. Chen et al. [45] began the maintenance phase after two weeks, and the number of sessions and the duration varied from one participant to another. Chen et al. [51] started the maintenance phase one month after the intervention. Lastly, Taryadi and Kurniawan [65] have not described the maintenance phase in detail beyond stating that it was conducted.

IV. MAIN FINDINGS
The main findings related to each RQ are briefly described below:

A. RQ1: DEMOGRAPHICS
In terms of authors (first and co-author), McMahon and Sahin are slightly more active than the others. The US is leading the research on the use of AR for children and adolescents with ASD. There is only a subtle difference in the number of studies across the remaining countries. Although two collaborating countries (Portugal and Switzerland) were found in three studies, the evaluation of the AR solution in which Portugal was a collaborator was conducted in Portugal. In contrast, no information regarding location was found in the study with Switzerland as a collaborator. Evaluating the solution in all the collaborating countries might have given additional insight, especially from a cultural perspective. Thus, in general, the design of a solution can be enriched by incorporating the needs of locals and cultural elements. In terms of the publication venue, a conclusive remark is not possible, as there is only a subtle difference between the top venue and the rest. With the increase in the use of AR, future reviews would give a better picture.

B. RQ2: LEARNING SKILLS
Several learning skills have been targeted in the primary studies. Most of the primary studies have been conducted without involving stakeholders (children with ASD, parents, caregivers, or teachers) in the design process. The stakeholders can be used to gather needs, design a system that can fulfill their needs, perform an initial usability evaluation, and improve the system before it is made publicly available for everyone to use. The use of participatory design with children with ASD is not new [69]- [72]. In the context of this SLR, one primary study [43] conducted participatory design sessions with teachers to discuss the prototype and discover new design aspects to be incorporated in the prototype.
The National Autism Center, in its National Standards Report, has highlighted several evidence-based methods that have proven to be useful for teaching different learning skills [73]. From the methods highlighted in the report, PECS, AAC, VM, and ABA have been used in the selected primary studies. The use of these or similar evidence-based methods is important to ensure an authentic skills learning environment for the participants.

C. RQ3: PARTICIPANTS
Due to the inclusion criteria, all primary studies contain at least one child or adult with ASD. The majority of the primary studies were conducted for and with individuals with ASD. A few studies targeted a larger population, including intellectual disability (e.g., [48], [49], [53]), cognitive impairment (e.g., [39]), and special needs (e.g., [60], [64]), and the participants of those studies included children and adolescents with ASD. The results of the latter studies show that participants with ASD had improved outcomes; however, the impact of replicating the same research on a larger population is yet to be seen.
Some researchers recruited participants with one specific symptom level (mild, moderate, or severe), while others recruited participants with different symptom levels. Recruitment may be constrained by the availability of participants in the area where the evaluation takes place.

D. RQ4: TECHNOLOGIES
Several technologies have been used in the primary studies, ranging from smartphones and tablets to smartglasses, smartwatches, and Microsoft Kinect, among others. The cost of technology is an essential factor to consider [74]; handheld devices like smartphones and tablets are becoming cheaper and more widely available, which makes them an obvious choice. The use of smartglasses has also increased, as can be seen from the number of publications. Glass-based technology (smartglasses, HoloLens, etc.) is still evolving, and because of its size, portability, and flexibility, more research studies are expected soon. In contrast, projection-based technology, as used in some of the primary studies, is usually more expensive than handheld devices and challenging to set up. It is typically installed in a dedicated classroom environment and offers users (children and adolescents with ASD) a hands-free environment for interaction when learning specific multisensory skills.
The research on ASD has suggested replicating the existing research with new participants, settings, content, etc. Therefore, researchers can either use the technologies used in the primary studies or investigate related technologies such as Microsoft HoloLens, Magic Leap One, Oculus Rift S, HTC VIVE Pro or Pro Eye, HTC VIVE Cosmos, Realwear HMT-1, or any headset supporting AR that can be used by the participants. Some of these newer technologies have a built-in eye-tracking facility. Each technology has its pros and cons, and researchers should use them with caution.

E. RQ5: RESEARCH DESIGN
The two most commonly used research designs, in descending order, are pre-test and post-test, and post-test only. The overall improvement in the participants is easier to see with a pre-test and post-test design, as it compares performance before and after participants were exposed to the AR technology.

F. RQ6: DATA COLLECTION METHODS
The data to be collected and the evaluation parameters are directly related to each other. For consistency, data collection methods and evaluation parameters can be classified and discussed in two categories: 1) human-assisted, and 2) machine-assisted.
Programmatic data collection is machine-assisted, while the remaining data collection methods, including interviews, questionnaires, focus groups, and observations, are all human-assisted.
The researchers in many primary studies used human-assisted data collection methods for assessing participants' interactions and behaviors. This approach is more prone to error than an automatic one. For example, the number of correct clicks may be miscounted due to human fatigue or boredom, whereas program code can track the participants' correct clicks reliably. However, manual annotation remains well suited to assessing participants' behaviors and system usability, because irregular actions may be hard to capture programmatically. Therefore, a combination of human-assisted and machine-assisted data collection methods should be adopted for a robust and easy evaluation of the AR system.

G. RQ7: SETTINGS
Most of the primary studies were conducted either in a classroom environment or in a controlled research environment. The studies conducted in a classroom environment can be replicated in a home environment, with support and training for parents and caregivers, to support generalization. Similarly, the primary studies conducted in a controlled research environment can be conducted in a classroom or home environment with minimal changes to the setup or instruments (e.g., video cameras or microphones). However, the replication of studies that used a dedicated projection-based setup in a gymnasium or a specific classroom in a different environment (classroom or home) is yet to be seen. Similarly, the generalization of learning from a classroom or home environment to a natural environment is also underexplored.

H. RQ8: EVALUATION PARAMETERS
The primary studies included both programming-assisted and human-assisted parameters. Most of the evaluation parameters used in the primary studies require considerable human effort to analyze and process hours of recorded data. Since the participants perform only a few tasks in each study, tasks such as identifying and counting the number of correct and incorrect responses, the number of attempts made, and the time spent on each attempt, among others, can be offloaded to a machine rather than a human. Human assistance can then be reserved for analyzing the qualitative data recorded during the experiments.
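As an illustration of how such counting could be offloaded to the machine, the following is a minimal sketch in Python. It is not taken from any of the primary studies; the class names, field names, participant identifier, and task identifier are all hypothetical, and a real AR application would call `record` from its own interaction handlers.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import List


@dataclass
class Attempt:
    """One attempt at a task (hypothetical record layout)."""
    task_id: str
    correct: bool
    seconds: float  # time spent on this attempt


@dataclass
class SessionLog:
    """Machine-assisted log of one participant's session."""
    participant_id: str
    attempts: List[Attempt] = field(default_factory=list)

    def record(self, task_id: str, correct: bool, seconds: float) -> None:
        # Called by the application each time the participant responds.
        self.attempts.append(Attempt(task_id, correct, seconds))

    def summary(self) -> dict:
        # Tally the parameters named in the text: attempts, correct and
        # incorrect responses, and mean time spent per attempt.
        n = len(self.attempts)
        n_correct = sum(a.correct for a in self.attempts)
        return {
            "attempts": n,
            "correct": n_correct,
            "incorrect": n - n_correct,
            "mean_seconds": mean(a.seconds for a in self.attempts) if n else 0.0,
        }


# Illustrative values only; no data from the reviewed studies is used.
log = SessionLog("P01")
log.record("match_emotion", True, 4.0)
log.record("match_emotion", False, 6.0)
log.record("match_emotion", True, 5.0)
result = log.summary()
print(result)
```

A log of this kind covers only the countable parameters; observer annotations of behavior would still be collected separately and analyzed by hand.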

I. RQ9: OUTCOMES
Several skills have been targeted in the primary studies, including spectrum-related skills (e.g., social communication, emotion), education (e.g., science, literacy), and day-to-day living (e.g., brushing teeth, planting), among others. The researchers of the primary studies reviewed here have provided learning of limited content as a part of their investigation. One important reason for the restricted content is the size of an AR app, which depends heavily on several parameters, for example, the type of model (2D or 3D) used, the number of polygons, pixel density, and object recognition and tracking, among others. These studies can be replicated in different settings, with different participants or AR technologies, using either the same content or enriched content.

J. RQ10: GENERALIZATION
Generalization plays a vital role in transferring the knowledge learned from a research study to a real-world environment. Despite its importance, it was found that no research study evaluated the generalization of skills learned or reported the results of the evaluation for generalization.

K. RQ11: MAINTENANCE
The maintenance or retention of the skills learned is also important to ensure that participants can use the skills in their day-to-day lives. Despite its importance, only five studies measured retention over time. Although the results were positive, the interval was short (between 1 and 2 months). This leaves two questions unanswered: 1) the long-term retention (6 months to over one year) of content in the primary studies that evaluated maintenance, and 2) the short-term and long-term retention of content learned in the primary studies that did not evaluate maintenance.

V. CONCLUSION
This systematic literature review covered publications from January 2005 to December 2018 (inclusive) and identified the relevant studies in eight databases in order to answer eleven research questions and provide insight into AR-based solutions developed for individuals with ASD to learn different skills. Across the primary studies, the authors have used several AR technologies. They have used both quantitative and qualitative approaches to investigate the impact of AR on the participants' learning as a part of the intervention. The effect of using AR among the participants was positive. Given the wide variety of skills targeted in the studies and the heterogeneity of the participants' characteristics and symptoms, a summative conclusion regarding the effectiveness of AR for teaching different skills to an individual with ASD based on the existing literature is not possible. However, for researchers interested in conducting further research and investigating AR, several important points do emerge.
The researchers are recommended to follow the institutional policies for conducting human subject research and to get the research protocol approved by the institutional review board (if necessary). This process may involve getting approval from multiple institutions or boards if the research is to be conducted at multiple locations. The entire process includes preparing all the consent forms to be signed by the participants, or reading aloud the assent forms to the under-aged participants before they. Figure 4 shows a research taxonomy for ASD that researchers can use as a base to plan the evaluation of AR technology among children and adolescents with ASD. The researchers can adapt the taxonomy based on the needs of their research.
where the intervention and evaluation will take place. The settings used in the primary studies included classroom environment, natural environment (city streets), controlled research environment, school gymnasium, and home environment.
8. Evaluation parameters: The researchers need to select the appropriate evaluation parameters that would be used to determine the performance of each participant while using technology as a part of the intervention. The evaluation parameters used in the primary studies are presented in column 2 of Table 4.
9. Evaluation: The evaluation at all four stages (baseline, intervention, generalization, and maintenance) requires planning of mainly three aspects, namely the number of participants, the number of sessions, and the duration of each session, so that everything can fit in the duration of the whole study. The actual number of sessions and the content may vary from one participant or stage of evaluation to another. However, the researchers need to distribute the sessions considering the chosen research design and whether the evaluation would be a one-time post-test only or a combination of pre-test and post-test.
Additionally, if the researchers intend to have the post-test generalization, then they need to also plan for the additional details. These details include settings (classroom, home or natural environment, etc.) where the evaluation of generalization would take place, number of sessions to be conducted, content to be used, and the evaluation procedure. If the generalization is to take place in a natural environment or home setting, then researchers need to train the parents and caregivers as they would play an important role in the evaluation. Similarly, if the researchers intend to have the post-test maintenance, then they need to plan for the number of occurrences and the timeline of the maintenance tests to be conducted.
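The planning of participants, sessions, and session duration described above comes down to simple arithmetic, which can be sketched as follows. This is only an illustrative sketch: the function name, the per-stage session counts, the number of participants, and the session length are all hypothetical values chosen for the example, not figures from the primary studies.

```python
def total_contact_minutes(participants, sessions_per_stage, minutes_per_session):
    """Estimate total contact time for a study.

    sessions_per_stage maps each evaluation stage to the number of
    sessions planned per participant (hypothetical planning input).
    """
    sessions_per_participant = sum(sessions_per_stage.values())
    return participants * sessions_per_participant * minutes_per_session


# Hypothetical plan covering the four stages named in the text.
plan = {
    "baseline": 3,
    "intervention": 8,
    "generalization": 2,
    "maintenance": 2,
}

# 6 participants x 15 sessions x 20 minutes = 1800 minutes (30 hours).
total = total_contact_minutes(participants=6,
                              sessions_per_stage=plan,
                              minutes_per_session=20)
print(total, "minutes =", total / 60, "hours")
```

Comparing such an estimate against the time available at the chosen setting gives an early indication of whether the number of sessions or participants needs to be adjusted.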
A. FUTURE WORK
Figure 5 shows several themes and sub-themes gathered from the limitations and future work of the primary studies. Each rectangle represents a sub-theme, and the value of N inside the rectangle is the number of primary studies in which the sub-theme was discussed. Each rectangle is linked to a rounded rectangle with a square-dot outline. The linked shape represents a theme or the main area of investigation, and its frequency is the sum of the frequencies of its rectangles. These themes are discussed below. Researchers can investigate one or more sub-themes from a theme, or merge sub-themes from multiple themes, as a part of their future research.
1. Intervention: Several primary studies have recommended a long-term evaluation of the technology for a few reasons. First, it would provide more concrete results on whether the technological intervention produced continuous learning and improvement of skills in the participants or only a short-term effect [40], [66]. Second, the long-term use of technology in different settings simultaneously would shed light on whether participants become independent in using the technology for their learning [44]. Third, it can reveal the positive and negative effects of repeated use over the long term [62]. The number of sessions and their duration need to be planned based on the study. Generalization plays an important role in transferring the skills learned using a computer or smartphone to a natural environment. The importance of generalization has been highlighted in the existing research on ASD [29]. Participants often face difficulties in transferring the skills learned in a particular situation, setting, or content to a new or untrained situation, setting, or content [75]. Despite its importance, it was found that no study investigated the generalization of the skills learned using mobile augmented reality to the natural environment. Therefore, researchers can incorporate a generalization phase in their future studies. Similarly, the test of maintenance was conducted in five studies, which is one-sixth of the primary studies. These studies reported a positive impact on the retention of participants. This suggests that researchers should incorporate the assessment of maintenance in their future research to ensure that participants retain the learned skills over time. This would also provide an opportunity to support participants who underperform; the intervention for those participants can be planned through the parents or caregivers.
2. Technology: Due to the heterogeneity of the participants' impairments and associated symptoms, there is no specific technology that can work for all the participants on the spectrum. The research on ASD has emphasized the need to use similar or new technologies to determine their effectiveness and usability among the participants [73], [76]. A limited number of modalities have been used in the primary studies; researchers can incorporate other modalities and investigate their impact in the AR environment. Keshav et al. [54] conducted a tolerance test to see whether the participant is comfortable using the technology; similarly, researchers can conduct a tolerance study before beginning the actual intervention. Furthermore, they can use quantitative data collection methods to analyze the participants' performance as a part of the intervention, and qualitative data collection methods to understand the user experience while interacting with the technology. One or more independent observers can observe the user experience, or eye trackers can be used. In either case, the analysis can be carried out on eye contact, facial expressions, etc.
3. Participants: Twelve (41%) studies have used fewer than 5 participants; seventeen (57%) studies have used fewer than 10 participants, whereas the remaining 43% of the studies have used more than 10 participants. Only two studies have used more than 21 participants. A few primary studies have recommended replicating the same or similar research with more participants to spur research in this area. The replication of studies has also been suggested in the existing research on ASD [77]. Future research can also incorporate both female and male participants, and participants from different age groups, with different symptoms, and with diagnoses beyond the spectrum of ASD. This way, participants of all genders can sign up and take part in the intervention; this would also support generalization.
4. Classroom environment: The concept of inclusive education for ASD is not new [78], [79]. Researchers can conduct future studies in the classroom environment and as a part of inclusive education. This would allow teachers to determine the educational value and acceptability of the technology for their classroom environment and the effect of using the technology on their workload [43], [61], [63], [64].

B. LIMITATIONS OF THIS RESEARCH
As with every other review, the number of primary studies depends on 1) the research questions formulated from the objectives of the research, 2) the keywords chosen to identify the primary studies, 3) the time frame of the search, 4) the types of venues, and 5) the inclusion and exclusion criteria. The values of all these parameters restrict the studies that are shortlisted and can thus be considered a limitation of this systematic literature review. It is hoped that this systematic literature review will provide useful insight and guidance for educators, practitioners, and researchers.

APPENDIX
See Tables 9 and 10.
DENA AL-THANI received the M.Sc. (Hons.) degree in software engineering from the University of London, in 2009, and the Ph.D. degree in computer science, in 2016. Her thesis, titled Understanding and Supporting Cross-modal Information Seeking, was in Human Computer Interaction (HCI); it investigated visually impaired and sighted people's collaborative computer use and proposed technical approaches to support it. Following her graduation, she managed the online portal and integration platforms for Ooredoo-Qatar before joining Queen Mary University of London's Ph.D. studentship program to continue research in her field of interest. The thesis explored the under-investigated area of cross-modal interaction and inclusive design and evaluation. Her academic and research vocation is to explore and demonstrate how HCI as a field of applied enquiry can contribute to building a more inclusive society. In addition to her research work at Queen Mary University of London, she has worked as a Teaching Assistant in three Computer Science modules including database systems and programming. She has obtained a postgraduate certificate of Learning and Teaching in Higher Education, and is an Associate Member of the Higher Education Academy in the U.K.
MOHAMMED TAHRI SQALLI received the Bachelor of Science in computer science and the master's degree in software engineering from Al Akhwayn University, Ifrane. He is currently pursuing the Ph.D. degree in computer science with Hamad Bin Khalifa University. Along the way, Mohammed enriched his academic career by participating respectively in an undergraduate and a graduate exchange program at the State University of New York in Binghamton, and with the University of Eastern Finland. He also spent one year as a graduate researcher with Meijo University, Nagoya, Japan.
ABOUBAKR AQLE (Graduate Student Member, IEEE) received the B.Sc. degree in computer science from Qatar University, in 2002, and the M.Sc. degree in computer science and engineering from Qatar University, in 2015. He is currently pursuing the Ph.D. degree in computer science and engineering with Hamad Bin Khalifa University, Doha, Qatar. His master's thesis was based on real-time systems analysis for hidden web databases. Before undertaking doctoral studies with Hamad Bin Khalifa University in 2016, Aqle worked in different techno-functional and managerial ICT positions for more than 14 years across private, government, and semi-government institutions. He started his career as a Programmer, then worked as a System Analyst, an Implementation Team Leader, a Project Manager, and finally an ICT Solutions Manager. He is actively involved in the Qatar University research project Analytics based Interface Transformation for Web Databases under the National Priorities Research Program (NPRP), a funding program of the Qatar National Research Fund (QNRF). He applied a new, domain-independent information extraction methodology that gave promising results. His work has appeared in several international conferences and journals. Aqle assists many undergraduate and master's students at Qatar University with the Mobile App development framework and the Formal Concept Analysis approach for hidden web data analytics and model representation. His research interests are formal concept analysis, hidden web data analytics, concept extraction and browsing, semantic and structural analysis in documents, and text summarization techniques.