Robot-Based Intervention for Children with Autism Spectrum Disorder: A Systematic Literature Review

Children with autism spectrum disorder (ASD) have deficits in the socio-communicative domain and frequently face severe difficulties in the recognition and expression of emotions. Existing literature suggests that children with ASD benefit from robot-based interventions. However, studies vary considerably in participant characteristics, applied robots, and trained skills. Here, we reviewed robot-based interventions targeting emotion-related skills for children with ASD following the guidelines of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement. We systematically searched for all relevant articles published in English until April 2021, using the databases Scopus, Web of Science, and PubMed. From a total of 609 identified papers, 60 publications were eligible for the synthesis: 50 original articles and 10 non-empirical articles (review articles and theoretical articles). A total of 892 participants were included in the robot-based intervention studies; 570 of them were children with ASD. Nao and Zeca were the most frequently used robots; recognition of basic emotions and getting into interaction were the most frequently trained skills; and happiness, sadness, fear, and anger were the most frequently trained emotions. The studies reported a wide range of challenges with respect to robot-based intervention, ranging from limitations for certain ASD subgroups and security aspects of the robots to efforts regarding the automatic recognition of the children’s emotional state by the robotic systems. Finally, we summarised and discussed recommendations regarding the application of robot-based interventions for children with ASD.


I. INTRODUCTION
Autism Spectrum Disorder (ASD; OMIM 209850) is a neurodevelopmental disorder that is associated with persistent deficits in the socio-communicative domain. ASD is diagnosed at a mean age of around 4 to 5 years [1]. Baio and colleagues [2] reported an average prevalence of 16.8 per 1,000 children aged 8 years in the USA. The authors found that ASD was more common in boys (26.6 per 1,000) than in girls (6.6 per 1,000). The clinical picture of ASD is heterogeneous, including diverse symptom severity and potential comorbidities such as anxiety disorder or attention-deficit hyperactivity disorder [3]. Limited socio-communicative skills and deficits in the recognition and expression of emotions influence the ability of children with ASD to interact and communicate with others, affecting their relationships with family members, peers, and therapists.
There are promising results in the use of robots in supporting the social and emotional development of children with ASD. Using robots as social mediators to engage children in tasks allows for a predictable and reliable environment; for example, having predictable rules is an important prerequisite for promoting prosocial behaviors. We do not know exactly why children with ASD are eager to interact with humanlike robots but not with humans. Regardless of the reason, social robots have proved to be a way to get past a child's social obstacles and involve the child in the interaction. Once the interaction happens, we have a unique opportunity to engage the child in gradually building and practicing social and emotional skills.
This paper is part of the work of the Erasmus+ project "EMBOA - Affective loop in Socially Assistive Robotics as an intervention for Children with Autism" (see footnote 1). The project aims to implement, evaluate, and develop guidelines on the feasibility of applying emotion recognition technologies in robot-supported intervention for children with ASD. The project combines three domains: intervention for children with ASD, social robots, and automatic emotion recognition. This paper presents one of the results of the project: a systematic literature review (SLR), conducted according to the procedure proposed by Kitchenham et al. [4], [5], and a meta-analysis of robot-based intervention for children with ASD.
The purpose of this paper is to report a study, based on a systematic literature review, that aimed to explore the state of the art in the use of social robots in intervention for children with ASD. The paper uses the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analysis) standard for reporting the study and is organized as follows. Section II describes the research methods used and the procedures of the systematic literature review. Section III reports the quantitative and qualitative results. The results are followed by a discussion of research validity and an outline of challenges that might be addressed by future work.

II. RESEARCH METHOD
A systematic literature review was used in the study as a methodological approach for capturing the state of the art in the domain of interest. The systematic method was chosen because the study aimed at finding key studies and performing the review with a transparency and rigour that would allow the study to be replicated [4], [5]. According to the approach, the following steps were performed: (1) setting up research questions, (2) defining keywords and the search string, (3) defining inclusion and exclusion criteria, (4) deciding on search engines, (5) data extraction, (6) multiple-phase selection based on quality criteria and research questions, (7) final selection of papers and snowballing technique, (8) extraction of the key findings. The study is reported using the PRISMA framework (Preferred Reporting Items for Systematic Reviews and Meta-Analysis) [6].

Footnote 1: https://emboa.eu/

A. RESEARCH QUESTIONS
In the study we aimed to identify the key previous studies covering all aspects, technical as well as psychological, related to robot-based intervention in children with ASD. We arrived at five research questions: [...] regarding the use of robots in interventions for children with ASD? With the first research question we aimed to identify the types of robots used as well as the frequency of their use in intervention for children with ASD. The second and third questions refer to intervention purposes for children with ASD: the focus of our study is to identify the skills trained with robot-based therapeutic approaches and the emotions tracked in order to support the therapy process. The fourth and fifth research questions also refer to therapy focused on emotional skills and enhanced with robots, and they are of a qualitative nature: we want to identify core challenges, recommendations, and lessons learned from previous studies.

B. SEARCH ENGINES AND INCLUSION CRITERIA
In the study we decided to use three literature search engines, according to the triangulation rule. In order to cover articles from both the technical and the medical community, we chose the search engines Scopus, Web of Science, and PubMed. All three databases provide search by both title and topic. The inclusion criteria were defined as follows. We included in the SLR papers written in English and published in journals or conference proceedings, regardless of date. Papers written in other languages were excluded from the further phases of the SLR. In the further stages of paper selection we looked for papers that focus on using robots in intervention for children with ASD and that contain validation with at least one child. At the analysis stage we decided to include some papers without participants as well, if they reported interesting recommendations or challenges. Moreover, we wanted to focus on papers that referred to emotion-related interventions.

C. KEYWORDS
The defined keywords were grouped into five clusters:
1) related to emotions (emotion, affective, emotional, affect),
2) related to children (children, child, young),
3) related to autism (autism, ASD, ASC, autistic, pervasive developmental disorder),
4) related to robots (robot, SAR, RAT, robotic, humanoid),
5) related to therapy (intervention, learn, learning, teach, teaching, tutoring, therapy, coaching).
The phrases were searched in the title and topic fields available in all three search engines. Each scientific database has its own search engine and therefore its own query format. Thus, we had to slightly modify our search query to fit these requirements.
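As an illustration of how the five keyword clusters can be combined into a single search string, the following minimal sketch joins the terms of each cluster with OR and the clusters with AND. This combination logic is an assumption made for illustration; the exact query submitted to each database is not reproduced in this paper, and the names `CLUSTERS`, `quote_term`, and `build_query` are hypothetical.

```python
# Keyword clusters as listed in the text.
CLUSTERS = {
    "emotions": ["emotion", "affective", "emotional", "affect"],
    "children": ["children", "child", "young"],
    "autism": ["autism", "ASD", "ASC", "autistic",
               "pervasive developmental disorder"],
    "robots": ["robot", "SAR", "RAT", "robotic", "humanoid"],
    "therapy": ["intervention", "learn", "learning", "teach", "teaching",
                "tutoring", "therapy", "coaching"],
}


def quote_term(term):
    """Wrap multi-word phrases in double quotes so they are searched as a unit."""
    return f'"{term}"' if " " in term else term


def build_query(clusters):
    """Assumed logic: OR within a cluster, AND between clusters."""
    groups = [
        "(" + " OR ".join(quote_term(t) for t in terms) + ")"
        for terms in clusters.values()
    ]
    return " AND ".join(groups)


if __name__ == "__main__":
    print(build_query(CLUSTERS))
```

The generic string would still need to be adapted to the field syntax of each database, as noted above.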

D. PAPERS EXTRACTION AND SELECTION
There were two rounds of paper extraction and selection. The first round was performed in January 2020 and the second one in May 2021 to capture the most recent studies. Paper extraction by topic resulted in 231 records from Web of Science, 265 records from Scopus, and 113 records from PubMed, which gave 609 papers in total (428 after removal of duplicates). The detailed numbers are provided in Table 1. The paper selection process in each round was manual and consisted of two phases: selection by title and selection by abstract. Each paper was evaluated for its relevance to the eligibility criteria on a three-point scale: 0 = irrelevant, 1 = somewhat relevant or unsure, 2 = surely/strongly relevant. The tagging was performed by four independent raters from among the authors of this paper. Further decisions were made based on the sum of scores. Papers with a score of 8, i.e., papers that each rater scored 2, were taken to the next stage automatically. Papers with a score of less than 4 were excluded automatically. Papers with scores of 4 to 7 were carried to the next phase for further screening. The two rounds are reported separately and in detail, because not all raters participated in both rounds and we considered it invalid to simply sum up the totals. The first and the second rounds are depicted with PRISMA diagrams in Figs. 2 and 3, respectively.
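The sum-of-scores decision rule described above can be sketched as follows. The thresholds (8, below 4, and 4 to 7) are taken from the text; the function name `triage` and the decision labels are hypothetical and chosen only for illustration.

```python
def triage(scores):
    """Classify a paper from the scores of four raters (each 0, 1, or 2).

    Returns "include" for a sum of 8, "exclude" for a sum below 4,
    and "screen" for sums of 4 to 7 (further screening in the next phase).
    """
    if len(scores) != 4 or any(s not in (0, 1, 2) for s in scores):
        raise ValueError("expected four scores, each 0, 1, or 2")
    total = sum(scores)
    if total == 8:
        return "include"
    if total < 4:
        return "exclude"
    return "screen"


if __name__ == "__main__":
    # Hypothetical rater scores, not data from the review.
    for example in ([2, 2, 2, 2], [1, 1, 1, 0], [2, 1, 1, 0]):
        print(example, "->", triage(example))
```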
The Fleiss' kappa coefficient was used to determine the inter-rater consistency of tagging. The coefficient was 0.64 for title tagging and 0.53 for abstract tagging in the first round. In the second round, the inter-rater consistency was 0.59 both for tagging by title and by abstract. Papers that passed the screening (tagging) phase were analysed in detail. We decided to add further papers at the analysis stage if relevant papers were found by citation scanning (snowballing technique) or from other sources. Sixteen papers were additionally included in total (15 in the first round and 1 in the second round).
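For readers unfamiliar with the coefficient, the following minimal sketch computes Fleiss' kappa from raw ratings using the standard textbook formula. It is not the script used in this review; the function name `fleiss_kappa` and the example ratings are hypothetical.

```python
from collections import Counter


def fleiss_kappa(ratings, categories=(0, 1, 2)):
    """Fleiss' kappa for a list of items, each rated by the same number of raters.

    ratings: list of lists, one inner list of category labels per item.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(r) != n_raters for r in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    counts = [Counter(r) for r in ratings]

    # Share of all assignments that fall into each category.
    p_cat = [
        sum(c[cat] for c in counts) / (n_items * n_raters) for cat in categories
    ]
    # Observed agreement per item, then averaged over items.
    p_items = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(p_items) / n_items
    p_expected = sum(p * p for p in p_cat)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_observed - p_expected) / (1 - p_expected)


if __name__ == "__main__":
    # Hypothetical tags from four raters for five papers (0, 1, or 2).
    example = [[2, 2, 2, 2], [0, 0, 1, 0], [1, 2, 1, 1], [0, 0, 0, 0], [2, 1, 2, 2]]
    print(round(fleiss_kappa(example), 2))
```

Values near 1 indicate strong agreement, values near 0 indicate agreement at chance level, and negative values indicate agreement below chance.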

E. KEY FINDINGS ANALYSIS
All papers included in the quantitative study were checked against a number of issues, including: A) information about study participants, including the wording used for individuals with ASD, B) robot(s) used in the intervention, C) skills addressed in the intervention, D) challenges identified by the included articles, E) recommendations given by the included articles for future studies. Papers in both the quantitative and the qualitative study were also screened for challenges and/or recommendations for future studies.

III. SLR QUANTITATIVE AND QUALITATIVE RESULTS
The quantitative analysis included 50 papers that had at least one participant. In the qualitative analysis, we added further papers that did not report any studies with children but offered interesting recommendations for child-robot interaction.

A. STUDY PARTICIPANTS
Fifty of 60 articles included participants. One study included adults only [7]. The other 49 studies investigated children. Forty-eight of these 49 studies included children with ASD; twelve of the 48 studies included typically developing (TD) controls. Miskam and colleagues [8] included one TD child, but no children with ASD. The total number of participants included by the 50 original articles is 892 (min: 1, max: 137, mean: 18, median: 10, standard deviation: 23; rounded to integers). Of these, 570 were children with ASD and 322 were TD participants (137 of these were adults [7]).
Twenty-nine studies included in this SLR investigated participants with ASD and did not specify in their articles whether these children belonged to certain subgroups of ASD. Some studies, however, described their participants as belonging to ASD subgroups (Fig. 1). The most frequently included subgroup is the 'high functioning' subgroup, followed by the 'low functioning' subgroup. It remains open whether the subgroups 'Asperger' and 'high functioning' and the subgroups 'ASD + Intellectual Disability' and 'low functioning' have the same inclusion criteria, respectively, and could thus be grouped together.

1) Gender
Thirty-seven of the 48 studies including participants with ASD provided information about the gender of the participants with ASD: the gender of a total of 510 participants with ASD was reported. Twelve of these studies included only male participants. Across all studies, 420 participants with ASD were reported to be male (min: 1, max: 52, mean: 11, median: 8, standard deviation: 11; rounded to integers) and 90 participants with ASD were reported to be female (min: 1, max: 9, mean: 4, median: 3, standard deviation: 2; numbers refer only to the 25 studies including female participants with ASD). Of the 25 studies including both male and female participants, two studies included an equal number of female and male participants [9], [10], two studies included more females than males [11], [12], and the remaining 21 studies included more males than females.

2) Age
The adult participants of [7] were between 21 and 63 years old (mean: 34.5, standard deviation: 9.6). Forty-four of the 49 studies including children provided information on the age of the participants. Of these, 38 studies provided the exact ages or the age ranges of the participants. The participants of these 38 studies were between 2 and 20 years old: the youngest participants of the studies were 2-14 years old (rounded to full years), respectively (mean: 6, median: 5, standard deviation: 3; rounded to integers); the oldest participants of the studies were 2-20 years old (rounded to full years), respectively (mean: 10, median: 10, standard deviation: 4). Two of the 38 studies included only one participant [13], [14]; for the remaining 36 studies we calculated the number of years between the youngest and the oldest child: it was 0 to 13 years, respectively (mean: 5, median: 5, standard deviation: 3; rounded to integers). The six studies that provided information on the age of the participants, but neither the exact ages nor the age ranges, provided either the mean age of the participants (9.03 years [15], 10 years [16], 11.4 years [17], 2.5 years [18]), the mean age and the standard deviation (5.4±1.5 years [19]), or the inclusion criteria regarding age (6-9 years [20]).

3) Wordings for Participants
Articles used different wordings to refer to participants with ASD and TD participants (Table 2). The most often used wording for children with ASD was 'children with ASD' (30 studies) and the most often used wording for TD children was 'TD children' (nine studies).
The wording 'children with ASD' was used by articles published from the years 2008 to 2021 (median: 2018). The wording 'children with autism' was used by articles published between 2002 and 2020 (median: 2017). 'Autistic children' was used by articles published from the years 2002

TABLE 2. Articles using the given wordings to refer to the participants. Some articles are listed twice or more often because they used two or more wordings. Abbreviations: ASD = autism spectrum disorder; TD = typically developing.

B. ROBOTS
We examined which robots were used in interaction scenarios designed for robot-based intervention. We grouped these robots based on their morphological properties, analyzed their popularity and their presence in the CRI studies over the years, as well as the age intervals of children interacting with them.

1) Robots and Their Popularity
The reviewed articles referred to 38 robots. We divided them into five categories based on their morphological characteristics: humanoid, animal/creature, mobile robot, ball-shaped robot, and other. The morphological categories of the robots, their names, and the corresponding studies are displayed in Table 3. The first, and most popular, category is composed of 16 humanoid robots. The most popular robot in the category is Softbank's Nao. Nao is followed by ZECA and Darwin-Mini. These robots were mentioned multiple times in different articles with slight changes in their model or version. The Darwin-Mini robot from Robotis was mentioned as "Darwin-OP" [58], "Darwin OP-2" [16] (its updated version), or "Mini Darwin" [7]. We grouped all the corresponding studies under the name Darwin-Mini in Table 3. Another humanoid robot that appears in articles under several names is Hanson Robotics' Zeno robot. It is listed as "Zeno R-50" [50], "Robokind R25 robot" [33], or "ZECA" [12], [57], as in "Zeno Engaging Children with Autism". We decided to keep the name "ZECA" and listed all the referenced studies under this name. ZECA is followed by Kaspar, which is primarily designed as a social companion robot for children with autism (Fig. 4). The second category consists of 11 robots having animal or toy-like characteristics. The robots in this category physically resemble animals or toys designed to give the impression of a living being. Most of them take their inspiration from animals: the dinosaur robot Pleo [31], the parrot KiliRo [53], the monkey SAM [26], the bee-like Bee-bot [38], the penguin PABI [63], and the dogs Zoomer [21] and AIBO [59]. The other robots assigned to this category have toy-like or cartoon-like characteristics: Keepon, a small yellow-colored snowman [27]; Muu, a creature resembling a droplet with a single eye [59]; and Probo, a child-sized plushie having the trunk of an elephant [61].
The next category comprises seven mobile robots. This category is composed of wheeled robotic platforms having the appearance of a toy car. They are remotely controlled and used to attract the attention of children and maintain their engagement during the intervention. The most popular robot in this category is Romo, developed by Romotive. It is composed of an iPod mounted on treads and can display a large variety of facial expressions through a virtual penguin agent on its screen [43]. In contrast, the other robots in the category do not have any visual displays or facial components to display facial expressions. GIPY-1 [14] is a wheeled robot with a cylindrical shape and a static neutral face drawn on it; Rovio is a remote-controlled truck-shaped robot [32]; and Labo-1, Bubble blower, and Lego Mindstorms NXT are wheeled robots having a toy-car appearance. Labo-1 is equipped with infrared sensors as well as a heat sensor; the robot is able to navigate autonomously, avoid obstacles, and follow a heat source such as a child [46]. Additionally, the robot includes a speech synthesizer unit and is able to produce short spoken phrases using a neutral intonation to attract the children's attention. Bubble blower, on the other hand, accomplishes the same task by blowing bubbles [59], and Lego Mindstorms NXT is used as a mediator to promote the interaction between the children and other individuals [64]. The last robot in this category is referred to as a "remote-controlled robot" without any additional information on the robot's appearance and physical properties [10]. The robot is used for monitoring the negative facial expressions and body postures of children in response to fear triggered by the motions of the robot or the noise it emits during the interaction.
The fourth category includes two ball-shaped robots. We decided to keep them separate from the mobile robot category because they provide a different level of tactile interaction. In addition to various types of in-hand manipulation, the children can kick them, pick them up, or move them randomly. Similar to mobile robots, these robots also have supplementary functions besides their motion patterns. Both robots can emit sound. Roball can communicate with children through short vocal messages and play a song [65]. Sphero can play music and combines this audio effect with its movements and multicolored LEDs to convey its affective states and emotions [18].
Finally, the last category is composed of two other robots that were not assigned to the previous categories because of their embodiment. These two do not share the morphological characteristics of a standard robot; they are designed as interactive devices rather than fully embodied robots. The robot-based basketball (RBB) robot consists of a robotic manipulator with a basketball hoop attached to its end-effector. RBB displays different affective states by moving the hoop in 3D at different speed levels, accompanied by soft background music [34]. On the other hand, the transitional wearable companion (TWC) robot provides a soft interactive surface looking like an animal-shaped pillow or blanket [29]. TWC was not assigned to the animal/creature category for the same reason that the ball-shaped robots were not assigned to the mobile robot category: TWC provides a different level of physical interaction because it is a wearable robot. It has four soft "paws", so that the children can carry it along by hugging it or wrapping it around their torso or their shoulders. The sensors embedded in the paws can detect the children's touch and trigger child-adaptive feedback by changing the color of RGB LED strips on the paws. TWC also incorporates speakers through which a child can listen to brief sounds or music. Based on the total number of articles on the use of social robots for children with ASD, humanoid robots appear to be the most popular, as seen in Table 4. The second place belongs to the animal/creature category; mobile robots, ball-shaped robots, and the others follow, in that order.

TABLE 3 (excerpt). Humanoid robots and the corresponding studies.
Nao: [7], [8], [13], [15], [17], [20], [22], [24], [25], [32], [36], [37], [39]-[42], [47]-[49], [51], [55], [56]
ZECA: [12], [33], [50], [52], [54], [57]
Darwin-Mini: [7], [16], [23], [43], [58]
Kaspar: [9], [28], [37], [59]
iRobiQ: [30], [44], [45]
CARO: [30], [44], ...
The popularity of robots belonging to humanoid or animal/creature categories may be explained by their capability to provide a higher level of social interaction based on their acquired social skills such as verbal/non-verbal interaction, behavioral cues, turn taking or imitation capabilities as well as displaying their affective states.
When the selected articles were reviewed and analyzed, we found that some of the published articles were theoretical studies on conceptual system design, mechanical implementation, or interaction procedure. There were also two review articles ([37], [59]) reporting the use of robots but not disclosing user studies with children. Therefore, we separated out the articles describing user studies with children and computed the number of user studies with children per robot category. The results show that humanoid robots hold their place as the most popular category, but the second place is shared between the animal/creature and mobile robot categories, as displayed in the second column of Table 4. Due to the large number of robots used in the studies, the articles were grouped by the morphological category of these robots to improve the readability and understanding of the figures. The results are displayed over the years in Fig. 5. They show that the studies with robots in the last two categories (i.e., ball-shaped robots and others) appear as isolated studies on the timeline. In contrast, the robots in the animal/creature and mobile robot categories show a consistent presence: their numbers do not change drastically, and they remain present over the years. The increasing number of studies, whether conceptual or applied, performed with humanoid robots over the years supports the popularity claim made in the previous section. Starting from the early 2000s, the number of articles on the use of social robots has been increasing over the years, reaching its maximum in 2018. The figures also show that in the last two years, only articles with humanoid robots were found under the inclusion criteria.
Nevertheless, the data for 2020 and 2021 should be treated with caution, firstly because of the COVID-19 pandemic and secondly because the literature search for this review ended in May 2021.

3) Robots Used for Age Groups
Among the reviewed articles, only 35 reported the exact ages or age ranges of the children who participated in interaction studies with robots. We decided to group these studies into three age intervals according to the reported age distribution: (1) preschool age: 2-5 years, (2) primary school age: 6-9 years, (3) secondary school age: 10-14 years. However, some of the studies reported larger age intervals than the others, e.g., 4-8 years [22] or 7-20 years [60]. Although the studies reported the exact age range, some did not provide the mean age of the children, so we made the group assignments based on the youngest child included in the study. The "preschool" group included 24 studies, the "primary school" group 11 studies, and the "secondary school" group 4 studies, as displayed in Table 5.
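The assignment rule described above, grouping each study by the age of its youngest participant, can be sketched as follows. The age intervals are the three listed in the text; the names `AGE_GROUPS` and `age_group` are hypothetical, and ages are assumed to be rounded to full years.

```python
# Age intervals (inclusive, in full years) as listed in the text.
AGE_GROUPS = [
    ("preschool", 2, 5),
    ("primary school", 6, 9),
    ("secondary school", 10, 14),
]


def age_group(youngest_age):
    """Return the age group for a study, based on its youngest participant.

    Returns None if the age falls outside the three intervals.
    """
    for name, low, high in AGE_GROUPS:
        if low <= youngest_age <= high:
            return name
    return None


if __name__ == "__main__":
    # Age ranges quoted in the text: 4-8 years [22] and 7-20 years [60].
    print(age_group(4))  # youngest child of a 4-8 years study
    print(age_group(7))  # youngest child of a 7-20 years study
```

Under this rule, a study with a wide range such as 7-20 years is counted in the 'primary school' group even though it also includes older participants.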

C. INTERVENTION
We analysed which skills were taught in robot-based interventions and grouped these skills into eight skill groups. Note that we only included original articles here; articles reporting the general feasibility of a robot to teach certain skills were not included. Table 6 shows the skill groups and skills taught in the respective robot-based intervention studies. The skill group taught by the largest number of studies was 'social interaction: general', with 23 studies teaching related skills. The skills most often taught were 'recognition of basic emotions' (19 studies) and 'getting into attention' (ten studies).
We analysed the skill groups that were taught by the 38 studies we previously assigned to the three age groups 'preschool age' (2-5 years; 23 studies), 'primary school age' (6-8 years; 11 studies), and 'secondary school age' (10-14 years; 4 studies). The most frequently taught skill group of the 'preschool age' studies was 'social interaction: general', followed by 'emotions: recognition'. Both skill groups were also the most frequently taught in the 'primary school age' intervention studies; however, in this age group 'emotions: recognition' was taught by more studies than 'social interaction: general'. The skill group 'emotions: expression' was taught by as many studies as 'social interaction: general'. Three of the four studies of the 'secondary school age' group taught 'emotions: recognition', but none of them taught 'social interaction: general'. 'Social convention' was only taught in 'preschool age' studies. Fig. 6 shows the number of studies using the respective robot types for interventions related to the eight skill groups. The robot NAO was used for the largest number of skill groups, namely six of the eight. NAO was the most frequently used robot for interventions on these six skill groups, namely 'social convention', 'social interaction: general', 'social interaction: initiating', 'emotions: recognition', 'emotions: expression', and 'other skills'.

1) Emotion Recognition
We focused on studies with participants and did not include here articles reporting the general feasibility of a robot to teach emotions. Twenty-one studies provided intervention regarding emotion recognition. We provide the taught emotions in Table 7. We subsumed 'happiness' and 'joy'; 'fear', 'scared', and 'afraid'; and 'anger', 'mad', and 'annoyed', respectively. This results in a total of seventeen different emotions, including the neutral condition, that were taught to be recognised (Table 7). One study did not specify the taught emotions [45]. The recognition of the emotions 'happiness/joy' and 'sadness' was most often taught. The recognition of 'curious', 'proud', 'pleased', 'frustrated', and 'nervous' was taught by only one study [16]. Fig. 7 shows all robots that were used for emotion recognition interventions. The largest number of different emotions was taught by the robots ROMO and Darwin-Mini, followed by NAO (Fig. 7). The robot Probo was used to teach only two emotions.
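The subsumption of synonymous emotion labels described above can be sketched as a simple lookup followed by a per-study count. The synonym groups are those named in the text; the canonical labels chosen for the fear and anger groups, the helper names, and the example studies are hypothetical and do not reproduce Table 7.

```python
from collections import Counter

# Synonym groups from the text, mapped to one label per group (labels assumed).
SYNONYMS = {
    "happiness": "happiness/joy", "joy": "happiness/joy",
    "fear": "fear", "scared": "fear", "afraid": "fear",
    "anger": "anger", "mad": "anger", "annoyed": "anger",
}


def canonical(label):
    """Map a reported emotion label to its subsumed form; others pass through."""
    key = label.strip().lower()
    return SYNONYMS.get(key, key)


def count_studies_per_emotion(studies):
    """Count how many studies taught each emotion, once per study."""
    counts = Counter()
    for emotions in studies.values():
        counts.update({canonical(e) for e in emotions})
    return counts


if __name__ == "__main__":
    # Hypothetical studies, not data from the review.
    example = {
        "study A": ["happiness", "joy", "sadness"],
        "study B": ["scared", "mad"],
        "study C": ["afraid", "annoyed", "sadness"],
    }
    print(count_studies_per_emotion(example))
```

Counting each emotion at most once per study prevents a study that reports both 'happiness' and 'joy' from being counted twice for the same subsumed emotion.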

2) Emotion Expression
Seven studies taught the children to produce or imitate specific gestures that express emotions. We provide the taught emotions in Table 8. We subsumed 'happiness' and 'joy'; 'fear' and 'scared'; and 'anger' and 'annoyed', respectively. This results in a total of eight different emotions, including the neutral condition, that were taught to be expressed (Table 8). The expression of 'fear/scared' was most often taught, namely by all seven studies. Additionally, the expressions of 'hungry' [24] and 'neutral' [19] were each addressed once. Four different robots were used to teach the expression of emotions: NAO was used to teach the expression of 6 emotions, R-50 Alice was used for 6 emotions, ZECA was used for 5 emotions, and Bee-Bot was used for 4 emotions.

3) Emotion Control
Four studies taught the children to control emotions. Zorcec and colleagues [9] taught the participants to make the robot happy again when it is sad. Costescu and colleagues [27] used the robot Keepon to teach the control of sadness and anger. Boccanfuso and colleagues [18] used the robot Sphero to teach the participants to transfer the robot to a positive emotional state when touching it. Javed and Park [43] used the robots ROMO and Darwin-Mini to teach the participants the regulation of 14 emotional expressions.

D. CHALLENGES
In the qualitative analysis, we investigated the challenges, limitations, and concerns reported by the articles. The challenges include those that refer to the specificity of children with autism, those that refer to the robot characteristics, and those that refer to the interaction or intervention involving a child and a robot. Some methodological challenges of performing the studies were reported as well.

1) Children with ASD specificity
Nine studies reported the challenges, limitations, and concerns regarding the characteristics of the specific target group of children with ASD.
The articles reported low expressivity of children as a challenge, in studies that included both low-functioning and high-functioning participants [34] or only high-functioning participants [51] with ASD. Another study, using the iRobiQ and CARO robots, outlined the challenge of determining a child's responses to training stimuli [30]. Another study, using the NAO robot, noted that automatic recognition of engagement requires personalisation for each child separately [47].
Among the other reported challenges, the limited attention span of children was outlined [31], as well as the limited ability to determine whether a child really understood the Probo robot animations used in an intervention [61]. Another study expressed concern that the children did not understand the emotions expressed by the robots Nao and Darwin-Mini [7].
One study reported that wearing physiological sensors was problematic for some children [58].

TABLE 6. Skill groups and skills that were taught in the respective robot-based intervention studies. Some articles are listed twice or more often because they focused on two or more skills.
While reading the papers, we found it disturbing that some of them still contrast "autistic children" with "normal children", whereas the current approach is to put the person first and to use "typical" rather than the stigmatizing "normal" in comparisons.

2) Robot and Child-Robot Interaction
There were papers that reported challenges that result from the robot construction or the specificity of the child-robot interaction.
Those that referred to the robots themselves outlined distractors on the robot (NAO) [15], no or too little adaptation to the therapeutic needs of children (NAO) [36], the gender of a robot, which might affect interaction (the female robot R-50 Alice was used in the study), and a lack of multimodal interaction [19].
Another group of papers reported challenges, limitations, and concerns related to the interaction of children with the robot during intervention sessions. In a study using iRobiQ and CARO robots the concern was raised whether the interaction with a robot is too complex for a child without the help of the therapist [45]. Another study (with Labo-1 robot) mentioned unstructured and unconstrained interaction as a challenge [46]. The other concerns on interaction were only generally expressed as a problem with determining the intervention success [19], [45], [50].

3) Methodological Issues
Nineteen articles discussed the small sample size and the related limited generalisability of their findings, and three articles discussed limitations related to the heterogeneous sample of participants. Table 9 depicts the number of participants with ASD and the ASD subgroups for the respective articles. The 19 articles reporting a small sample size each included between 1 and 27 participants with ASD. The mean number of participants with ASD was around 10 (9.58 ± 6.84), and the median was 9.
While analysing the studies, we found that not all researchers report the characteristics of the participant group. Missing information on age distribution, sex, and level of functioning makes a study less replicable and less suitable for meta-analysis.

E. RECOMMENDATIONS

1) Longitudinal studies
In five articles ([30], [45], [47], [54], [59]), researchers suggested that longitudinal studies over an extended time period should be conducted in the future. In two studies ([30], [45]), each child attended eight clinical sessions (each lasted approximately 30-40 min and contained ten trials for two training interactions). The third study [47] focused on single-day recordings of the children. The fourth article [59] is a review and indicates that the described studies lasted a few days or, rarely, a few weeks or months. The fifth study [54] consisted of two sessions per child during one day, preceded by a familiarization session lasting 10 minutes.

2) Study design
In six articles ([10], [22], [24], [27], [32], [44]), the authors reported that additional measures or assessments should be added in future studies. In two articles, researchers indicated the need to conduct research in different environments as well: in school or group sessions [27], and not only within clinical facilities but also in the child's natural environment [25].

3) Course of the intervention sessions
Recommendations regarding the course of the intervention sessions concern interruptions of the sessions, verbal instructions given to the child, and objects used during the interventions. One study [15] suggests that sessions should be conducted without interruptions. Two studies ([43], [58]) recommended that the wording of verbal instructions be short, simple, and concrete. Another study [32] recommended the inclusion of object-free, creative movement interventions involving rhythm, dance, yoga, and play therapies into the standard-of-care treatment of children with ASD.

4) Participants' characteristics
This section describes the recommendations on the required characteristics of children participating in studies on robot-based interventions. Researchers indicated that certain motor skills [57] and some verbal response capacities [15] are necessary. Some studies ([12], [21], [30], [54]) recommended including children with ASD with different levels of functioning. In addition, studies should include a control group of typically developing children ([12], [44]).

5) Participants' experience with robots
One study [25] recommended that children should not have previous experience with a robot, so as to rule out a familiarity effect (the participants of that study had previous experience with the robot). In contrast, another study [64] recommended a familiarisation phase in which the children can freely explore the robot in order to reduce anxiety levels and make the robot more attractive. Similarly, another study [40] suggested that the therapist should familiarize the participants with the intervention and the robot. This was because, at the beginning of the interventions, two of the three participants were not responding to the robot. The researchers observed that initial human prompts helped the children understand how to interact with the robot.

6) Sustaining participants' engagement during the intervention
One article [32] indicated that future research should develop diverse training activities that can sustain children's engagement over prolonged training durations. Relatedly, another study [40] suggested that children's motivation during intervention sessions can be increased by interspersing requests a child has already mastered with the skills they are learning during the intervention. Another study [52] described children using a tablet to control and respond to the robot in order to improve the child's engagement in the activity. Another study [39] indicated that, among the social stimuli used by the robot (movement, vision, and speech), speech is the most effective in attracting a child's attention. The average latencies for paying attention to the visual, speech, and motion social stimuli were 3.441 s, 3.277 s, and 3.732 s, respectively, while attention was paid to speech 17.53 times, to motion 14.32 times, and to visual stimuli 11.10 times. In addition, another study [49] indicated that children showed a more apparent emotional response while the robot talked or made hand gestures. The results of a further study [41] indicate that the child's motivation can be enhanced when game scenarios make the interaction with the robot meaningful to the child.
However, an attractive study scenario may not be sufficient to keep participants engaged. In one study [40], the robot used five phrases for social praise, which were paired with dances. Although the participants initially enjoyed these reinforcers during the interventions, the researchers noticed that they became monotonous over longer periods of time. They suggest that, in future interventions, this could be improved by increasing the variety of reinforcers delivered by the robot, for example by including reinforcers typically used by human therapists such as edibles, preferred items (e.g., stickers), or other forms of social reinforcers (e.g., jokes, praise, and songs). Another study [52] also found that a reward aspect is needed (such as the robot dancing or cheering). In a further study [24], reinforcement (snacks or access to toys) was offered by the teacher at the end of each pretest, posttest, and training session. Researchers also suggest that reward systems should be customized to individual subjects' needs, interests, and social abilities ([40], [44]). The clinicians also recommended that the robot build a positive relationship with the children and offer reinforcers (e.g., preferred items, edibles, and social reinforcers) without making requests or demands. Such a process builds the association for a child that the robot is fun and will lead to more fun elements in the future [40].
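The comparison of social stimuli reported for study [39] in this subsection can be made explicit with a short ranking. The following is only an illustrative sketch: the latency and attention-count values are the ones quoted above, whereas the dictionary layout and the function names are our own assumptions and are not taken from the cited study.

```python
# Values quoted in this subsection for study [39]; the data layout and
# function names are hypothetical and serve illustration only.
stimuli = {
    "visual": {"latency_s": 3.441, "attention_count": 11.10},
    "speech": {"latency_s": 3.277, "attention_count": 17.53},
    "motion": {"latency_s": 3.732, "attention_count": 14.32},
}


def rank_by_latency(data):
    """Order stimuli from shortest to longest average attention latency."""
    return sorted(data, key=lambda name: data[name]["latency_s"])


def rank_by_attention(data):
    """Order stimuli from most to least frequently attended."""
    return sorted(
        data, key=lambda name: data[name]["attention_count"], reverse=True
    )


fastest_first = rank_by_latency(stimuli)
most_attended_first = rank_by_attention(stimuli)
print("By latency:", fastest_first)
print("By attention count:", most_attended_first)
```

Speech ranks first on both criteria, which is consistent with the conclusion reported for [39]; motion and visual stimuli swap places between the two rankings.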

7) Measuring engagement
As a measurement tool, researchers propose the Modified Fogg's Behavioral Model (MFBM) to demonstrate motivation, i.e., the level of engagement shown by the children while interacting with a robot ([48], [51]). In another study, researchers presented initial design schemes of a robotic framework for initiating and estimating engagement [66]. A musical stimulus will be used for initiating engagement. The system needs to be able to detect the emotional and social states of a child. Once these are perceived, it is imperative that the robotic system displays appropriate expressive behaviors and stimulating motions to engage the child emotionally and socially. The researchers use RGB-D sensors (e.g., Microsoft Kinect) to monitor the physical activities of a child in order to estimate social engagement.

8) Robots
This subsection collects experiences with the use of individual robots, which may serve as a guide for future research applying robots for interventional purposes.

a: NAO and Darwin-Mini
In one of the studies [7], a system was developed to teach five of the six universal emotions, i.e., sadness, anger, happiness, fear, and surprise (disgust was not investigated). The researchers created an emotional gesture set that was performed on the NAO and the Darwin-Mini robots. The relatively simple robot (Darwin-Mini) was able to express happiness better, while the more advanced robot (NAO) was able to express sadness better. The study participants had difficulty differentiating between some emotions expressed by the NAO robot: happiness and surprise were commonly confused with one another, as were fear and sadness. Therefore, when creating future gesture sets, the focus should be on a better differentiation within these emotion pairs. In another study using the Darwin-Mini robot [58], researchers also made recommendations on the gestures that represent emotions performed by the robot. Happiness and sadness received higher scores because of their clear design features, although some features still made them confusing. High recognition rates were achieved, but more effort is needed to adjust the robot's body expressions so that they are perceived consistently among participants. Moreover, the emotions expressed are not exactly like those in real life, since expression in real life is subtler. To express emotion more authentically, the motions should be smaller.

b: ZECA
The study [50] in which the ZECA robot was used (called the Zeno R-50 there) consisted of eight stages. Stages 3, 5, and 7 were the main experiment game. For each game stage, a sequence of 13 emotions was randomly shown by Zeno. After showing each emotion, Zeno would resume a neutral pose and wait for the child's response. The child's task was to guess which emotion the robot was trying to show. In total, 37 animations were used. Nineteen of these consisted of only facial expressions, including a neutral expression. The other 18 animations were each based on the gesture mimicry of 18 human actors. The animations were developed to closely mimic the actors' facial and upper body expressions. The addition of gestures for Happy greatly lowered the guess accuracy in both groups. This is because the gestures could represent more than one emotion, e.g., happiness and surprise when meeting an unexpected friend. However, a significant improvement was shown for Disgust. Since the robot lacks the ability to show the nose wrinkler, it is understandable that Disgust had a low recognition rate for the facial expression.

c: Probo
One of the studies using the Probo robot [61] had two phases. In one phase, a video representing a situation that generates an emotion was played on Probo's belly, followed by a neutral facial expression of the robot. The other phase was identical, except that the video was followed by a facial expression of the robot showing the matching emotion. The emotion recognition performance was recorded for each participant in both phases. In each exposure, the participant had to recognize one of two emotions, i.e., happiness or sadness, from an animation in which something positive or negative was happening to Probo. The results showed that the performance of participants improved when Probo's active face was used, compared with the phase in which Probo showed a neutral face. Additionally, using Probo's active face had similar effects in increasing the emotion recognition performance for both happiness and sadness. In the second study using the Probo robot [62], the main experimenter, together with the child's therapist and parents, identified a specific social skill deficit, and an individualized social story was developed for each of the skills. In one phase of the study, Probo told a story. The robot also expressed the emotions that were included in the story, i.e., happiness and sadness, and moved its head, eyes, and trunk. After the robot had told the story, the child had to exercise the social ability that was targeted in the story. The story is played on the robot without interruption, so the therapist cannot stop the story when necessary. Accordingly, the researchers recommend introducing more interactive stories, so that the robot can respond to the actions and reactions of a child during the presentation of the social stories. Moreover, the two therapists involved in this study offered valuable feedback on the design of the robot.
The size of the robot (80 cm) appears to be appropriate for interaction with children, as does the relative size of the head compared to the body, which makes it easy for the children to focus their attention on the facial expressions. The face area (eyes and mouth) is not really a triangle but rather a rectangle, which probably requires more attentional resources than a triangle does. It was also suggested that a shorter trunk would increase the visibility of Probo's facial expressions, so that the mouth would be more visible to the children when the trunk is in the down position. The therapists did not make any remarks about, or react negatively to, the green color of Probo's coat. A demonstration was held for the other children of the autism center, and the researchers observed that even those children who had problems with touching and being touched were able to touch and interact with the robot at the end of the session.

d: ROMO
In another study [43], the researchers decided to design their own character. The original Romo, developed by Romotive, came with a blue monster-like character capable of articulating emotion through the display of a variety of facial expressions. They replaced the monster with a penguin character, taking into account several factors to ensure a friendly design for children with autism: 1) simple and short verbal interactions, 2) no complex patterns for the background (of the app or agent), 3) no horizontal scroll bars, 4) no background music, to minimize distractions, 5) no flashing or moving content, 6) a simple layout with minimal information displayed on screen, and 7) muted colors to minimize any possible discomfort.

9) Learning outcomes
One study [24] suggested investigating whether the learning outcomes can be maintained for a longer period of time (i.e., beyond two weeks). To examine this, observing the behaviors of the participants in schools and at home for an extended period of time was recommended. In another study [14], researchers came to similar conclusions. They indicated that further experiments are needed to investigate the repeatability and durability of the effects.

IV. LIMITATIONS OF THE STUDY
We performed a systematic literature review regarding the use of robots in interventions in autism.
Out of an initial list of over 600 papers, we included 64 papers in the qualitative analysis and 60 papers in the quantitative summary. We acknowledge that the study is not free from some risks. First of all, it was performed in two rounds. Although we tried to keep the process as similar as possible, the group of people who tagged the papers by title and abstract was different, which might have resulted in slight differences between the two rounds. We also retrieved the papers from three search engines, focusing on search by title and keywords. We believe that the most important studies were included; however, adding more search engines and searching by abstract might extend the list. We decided not to include all papers found by abstract, as the list was too extensive to tag manually in a reasonable time. Nevertheless, we believe that despite the validity risks reported above, the study is valid and repeatable, as we paid special attention to reporting it thoroughly using the PRISMA standard.

V. CONCLUSION
The study revealed that interest in using robots has increased over the years, as more and more studies are performed. Moreover, some robot types (mostly humanoid ones) are used more frequently. The robots are used as social companions, but the range of skills trained with them is extensive, with emotion recognition and interaction skills being the primary focus.
The study might be of interest to therapists who plan or are curious about the use of robots in autism therapy. The study might also be of use for researchers and practitioners who develop new robots and the diverse technological concepts used in them. Our future work in the EMBOA project includes the development of guidelines on how to use emotion recognition technologies in order to observe and enhance child-robot interaction. The practical studies will benefit from this review in several ways. First of all, we are well aware of the importance of the construction of the participant group and of reporting detailed information on gender, developmental rather than chronological age, or at least the functioning level. We found multiple studies that were preliminary and included adults or typically developing children only. Moreover, most of the studies included a small sample of children with autism. The results show the need for more comprehensive studies that include more participants and also cover more factors that influence child-robot interaction.

ACKNOWLEDGMENT
The authors thank Srividya Tirunellai for assistance in the literature search process. The European Commission's support for the production of this publication does not constitute an endorsement of the contents, which reflect the views only of the authors, and the Commission cannot be held responsible for any use which may be made of the information contained therein.