The Role of Intelligent Technologies in Early Detection of Autism Spectrum Disorder (ASD): A Scoping Review

Background: A two-year delay is reported between the first developmental concern raised by parents and the diagnosis of Autism Spectrum Disorder (ASD), delaying the start of early intervention programs, which are most beneficial within the first three years of life. Aim: To evaluate the role of technology in ASD detection by answering four research questions that analyze 1) the evolution of technology, 2) the use of various bio-behavioral data sources, 3) demographic categories, databases, controls, comparators, and assessment instruments, and 4) data collection, processing, and outcomes of the technology-based methods in ASD detection. Methods: The scoping review included behavioral-based ASD screening and diagnostic studies for children under six years, published between 1 January 2011 and 31 December 2021 in the PUBMED, SCOPUS, and IEEE Xplore databases. The studies were evaluated using the Critical Appraisal Skills Programme (CASP) and the PRISMA scoping review checklist (PRISMA-ScR). Results: The 35 shortlisted studies were grouped into seven bio-behavioral categories. The review highlighted the extensive use of machine learning (ML) and deep learning (DL) to detect infants (as young as 9 to 12 months) at risk of ASD and other developmental delays (ODD) using multimodal structured and unstructured data. However, the review also identified various internal and external validity threats. Conclusion: Technology can significantly improve the current ASD detection process. The validation and adoption of technology can be fast-tracked by 1) designing robust study protocols, 2) executing multi-cultural field trials, 3) standardizing datasets, data quality, and feature engineering methods, and 4) recruiting a statistically significant number of participants from the ASD, typically developing (TD), and other developmental disorders (ODD) groups to ensure technological generalization, validation, and adoption outside laboratory settings.

care unit (NICU), are recommended to undergo additional developmental risk assessments [8].
An exhaustive developmental examination can confirm diagnosis and referral to intervention if the screening instrument indicates a developmental concern. Clinicians usually implement gold-standard tools such as Autism Diagnostic Observation Schedule (ADOS-2) [9] and Autism Diagnostic Interview-Revised (ADI-R) [10] to confirm ASD diagnosis.
Though early ASD indicators are evident at 12 months, and diagnosis is possible earlier than 18 months [11], most children are diagnosed between 48 and 60 months [12], [13], highlighting a delay of two years. Delayed diagnosis postpones the start of early intervention services by 12-14 months [14]; these services can improve children's IQ by 10-15 points if started before the age of three [15], owing to the brain's high neuroplasticity. Therefore, early ASD identification and intervention can ensure a better quality of life for children with ASD.
There are various reasons for the delayed diagnosis or misdiagnosis of ASD among children. Firstly, children with ASD exhibit high variability in typical ASD features such as stereotypical interests, repetitive behaviors, and limited communication and social skills [1]. The high behavioral variance makes it challenging for the clinician to establish an early diagnosis for borderline and high-functioning ASD children, for example, those with Asperger syndrome [16]. Moreover, with 80% of ASD cases diagnosed in males [1], women with ASD [17] are susceptible to diagnostic delays and misdiagnosis attributed to stereotypical gender biases [18]. Secondly, the symptomatic similarity of ASD with Attention Deficit Hyperactivity Disorder (ADHD) and speech delays [19] often leads to delayed diagnoses or misdiagnoses [20]. An accurate diagnosis is critical to identifying the child's areas of strength and developing a personalized, need-based intervention plan [21]. Thirdly, the gold-standard ASD diagnostic and screening tools such as ADOS [9], ADI-R [10], M-CHAT-R/F [7], and CARS2 [22] are designed for the western world. Therefore, these tests are sensitive to evaluation biases and subjective decision-making by clinicians from low- and middle-income countries (LMICs), resulting in incorrect results, primarily due to lack of training and cultural disparities [23]. Fourthly, the global availability of clinicians and infrastructure to assist ASD detection and management is limited [24], especially in LMICs, a challenge compounded by poor awareness of the disorder [25]. Also, families have limited access to clinicians and infrastructure and usually travel considerable distances or relocate to access services [26]. These limitations lead to lengthy wait times, delayed diagnosis, and stress for individuals and families [26], [27].
In addition, the current ASD detection process has limitations. Clinicians require significant training and time to administer diagnostic instruments [28]. A 93-point ADI-R questionnaire, for example, can take 2.5 hours to complete [29] across multiple visits. Further, interview responses are based on the caregiver's subjective comprehension of assessment questions and their reliance on memory recall of the child's developmental history, contributing to evaluation and assessment biases [30]. Moreover, developmental evaluations are seldom conducted in children's natural contexts, such as their homes. An encounter with a new clinician in a new environment, combined with social performance pressure, may cause the child discomfort, resulting in assessment and diagnostic biases.
Artificial Intelligence (A.I.) based innovations have fast-tracked ASD diagnostics [31], [32], increased clinician capacity, and improved access to early intervention programs [26]. The adoption of these technologies has surged during the COVID-19 pandemic [33]. These solutions have the following benefits over traditional face-to-face methods: 1) enhancing access to ASD management solutions for rural and underserved persons and families, 2) reducing doctors' and patients' expenditures (such as travel duration and cost), and 3) expanding providers' coverage areas. The preliminary findings provide evidence of the feasibility and efficacy of technological innovation in improving current ASD detection and behavioral intervention methods, enhancing access, quality, and affordability. However, more in-depth analysis and information are needed to confirm the impact and outcomes of these innovations.
Scoping reviews are a descriptive method that aids in analyzing complicated or varied research projects by identifying the critical concepts, theories, and evidence sources to guide and evaluate the adoption of new methods into practice [34]. The results of scoping reviews can identify gaps in the existing literature and indicate areas with limited evidence to merit additional studies or a systematic review. We, therefore, performed a scoping review to evaluate the use of innovative technologies for ASD detection. We investigated a body of literature to examine the extent, nature, and scope of current research activities and answer the following four research questions based on the PICO framework [35], [36] aligned toward diagnostic innovations. In the framework definition, "P" signifies the population in focus, "I" the intervention or researched condition, "C" the comparators, and "O" the psychometric outcomes.
1) RQ1 How has the literature on technology-based ASD detection methods evolved?
2) RQ2 How do researchers use the various bio-behavioral markers to detect ASD?
3) RQ3 What demographic categories, databases, controls, comparators, and assessment instruments are a part of the technology-facilitated ASD detection process?
4) RQ4 How have researchers gathered and processed multimodal data? How do technological innovation's results compare to conventional ASD detection methods?
The review is based on PRISMA scoping review guidelines and includes the following sections. Section II details eligibility criteria for study selection, keyword definition and justification, study search process, data extraction, and analysis. The results are listed in section III, where we synthesize the review findings and answer the four research questions. We present the results under seven multimodal data categories and technological subdivisions, analyzing data sources, data extraction, synthesis, and outcomes. Discussion section IV highlights internal and external validity threats, advantages, disadvantages, ethical, legal, and cultural constraints, high-level limitations, and mitigation measures, and recommends future directions. Section V lists the study's limitations, and section VI lists future directions and additional focus areas for research. Finally, in section VII, we conclude our findings.

II. MATERIALS AND METHODS
This section describes the study's selection criteria, search strategy, justification, data extraction, and analysis. The review is conducted using the PRISMA Extension for Scoping Reviews (PRISMA-ScR) checklist [37]. The 22-point checklist is attached in the appendix section (See Appendix C).

A. ELIGIBILITY CRITERIA
The inclusion criteria for this study are as follows: (1) studies that leveraged technology and included behavioral-based ASD screening or diagnostic methods; (2) included children under the age of six; (3) were published between January 1, 2011, and December 31, 2021; (4) included quantitative ASD detection methods, including cross-sectional experiments, longitudinal data analysis, and dataset investigations; and (5) were indexed in one of the three electronic databases: PUBMED, IEEE Xplore, and SCOPUS. The following are the search criteria justifications:
1) Most evidence-based ASD detection methods [38] and tools [9], [10], [22], [30] track social communication, eye contact, challenging behaviors, and notable play-based landmarks to identify children with ASD. We, therefore, shortlisted studies that used these behavioral landmarks and excluded studies focusing on medicine, biology, genetics, EEG (electroencephalogram), MRI (magnetic resonance imaging) usage, and non-technology-based ASD screening or diagnosis methods.
2) We excluded conference papers to ensure we included only high-quality peer-reviewed journal publications selected from PUBMED, IEEE Xplore, and SCOPUS.
3) We excluded literature reviews as we focussed on studies that conducted experiments, trials, dataset investigations, or longitudinal multimodal data analysis.
4) Since 2011, the growth in mobile and edge-based A.I. innovations can be attributed to the emergence of low-cost, scalable cloud computing infrastructure and sensors [39], [40], [41], [42], [43]. Therefore, we selected studies published between January 1, 2011, and December 31, 2021, to evaluate the role of technology in ASD evaluation.
5) Given the importance and effectiveness of early ASD detection and intervention due to the brain's strong neuroplasticity, the review was limited to studies that included children under the age of six.

B. SEARCH STRINGS
We searched the following search strings in the title, abstract, and keywords fields: The search string justification is as follows: Figure 1.
Any contradictory results were resolved through consultation and mediation by a third author, SS (Shuchi Sinha). The search results were downloaded, compiled, and imported into Zotero to detect and remove duplicates. Zotero assists reference management by syncing citations with bibliographies, DOIs (digital object identifiers), and metadata. Each unique article's title and abstract were screened for relevance, followed by a full-text analysis per the inclusion-exclusion criteria listed in subsection II-A. Thirty-two studies were shortlisted after the full-text analysis, and three additional publications [44], [45], [46] were uncovered by analyzing the references of the shortlisted studies, bringing the total number of shortlisted studies to 35.

D. DATA EXTRACTION AND ANALYSIS
The review extracted the data items listed below from the thirty-five shortlisted studies and summarized them in two tables. Table 1 includes multimodal input data, feature reduction steps, environment setting, data processing algorithms, and psychometric outcomes, i.e., sensitivity, specificity, and accuracy. Table 2 summarizes the enrolment counts, software or hardware devices used, assessment tools, assessment duration, limitations, and future direction of each study. In addition, a quality evaluation using the Critical Appraisal Skills Programme (CASP) was performed for each shortlisted study. The technical terms used in the review are explained in Table 5 in Appendix B.
1) Study objective, methods, and experiment locations
2) Participants' group size and diagnosis status
3) Datasets used in the study
4) Bio-behavioral markers for data extraction
5) Assessment duration, tools, and methods
6) List of software, material, or devices used
7) Multimodal data collection steps
8) Data processing steps
9) Technology used in the study, and
10) Outcomes, limitations, and future direction

III. RESULTS
This section answers four research questions and presents quality assessment results.

A. QUALITY EVALUATION
Two authors (MK and AKK) undertook the quality evaluation of shortlisted studies using the Critical Appraisal Skills Programme (CASP) tool [80]. Each criterion was scored with one of three possible responses: a) criterion met, b) partially met, or c) not applicable, not met, or not mentioned, with scores of 2, 1, and 0, respectively. Table 3 shows the rating scales implemented, with reference to previous clinical studies [81], [82], to rank studies into high, medium, and moderate categories. The quality evaluation sheet for the shortlisted studies is attached in Appendix C.
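The 2/1/0 scoring scheme described above can be illustrated with a minimal sketch. The response labels and the function name `casp_score` are hypothetical, and the sketch deliberately stops at the total and percentage score, because the cut-offs that map scores to quality categories are defined in Table 3 and are not reproduced here.

```python
# Hypothetical response labels; the 2/1/0 points follow the scheme described in the text.
CASP_POINTS = {"met": 2, "partially_met": 1, "not_met": 0}


def casp_score(responses):
    """Return (total points, percentage of the maximum attainable score)."""
    if not responses:
        raise ValueError("at least one criterion response is required")
    total = sum(CASP_POINTS[response] for response in responses)
    maximum = 2 * len(responses)  # every criterion can earn at most 2 points
    return total, 100.0 * total / maximum


# Illustrative responses for one study (not taken from the review).
total, percent = casp_score(["met", "partially_met", "not_met", "met"])
print(total, percent)
```

The percentage form makes studies comparable when the number of applicable criteria differs, which is one plausible design choice rather than the procedure the reviewers necessarily followed.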
In the following sections, four research questions are answered.

B. RQ1 HOW HAS THE LITERATURE ON TECHNOLOGY-BASED ASD DETECTION METHODS EVOLVED?
We respond to the research question by assessing the selected studies' 1) temporal publishing, 2) co-authorship, and 3) keyword trends. In addition, we highlight prominent journals where the shortlisted articles were published.

1) PUBLICATIONS TRENDS
The temporal publication patterns showed that around 80% of the shortlisted studies were published between 2018 and 2021 (Figure 2). Even though the use of technology in ASD management, and in general, has grown since 2011 [85], [86], [87], the review highlights 2018 to 2021 as the dominant years in the adoption of Machine Learning (ML) and Deep Learning (DL) technologies. This aberration can be attributed to the following inclusion criteria used to shortlist studies for the review: 1) selecting studies focussing on ASD detection rather than intervention, which has seen higher technological adoption, 2) including only behavioral-based detection methods and excluding EEG, MRI, and genetic methods, which have incorporated technology since 2011, 3) selecting studies with participants under six years of age, and 4) skewed temporal adoption of technology in ASD detection.

2) COAUTHORSHIP PATTERNS
The publication pattern shown in Figure 3 depicts each country as a node whose size represents the publication frequency of that country's authors. The edge strength between country nodes indicates collaboration between co-authors from multiple countries. The most significant country-level contributions are from the United States of America (USA), whose researchers co-authored with researchers from Austria, Bangladesh, China, Japan, and Israel. Iran, Canada, South Korea, the United Kingdom (UK), the Netherlands, Poland, Italy, the UAE, and India are the other countries whose authors had co-authorship collaborations. Authors from Brazil, France, Sweden, Spain, and Switzerland collaborated with co-authors from the same country. The analysis highlights that most research initiatives and partnerships are from developed economies that have formed partnerships with selected developing economies.

3) KEYWORD TRENDS
Figure 4 depicts the most important and frequently used keywords in the shortlisted studies. The size of the keyword nodes represents the frequency of occurrence in the shortlisted studies, with the edge weights indicating their simultaneous occurrence in other studies.

4) JOURNAL PUBLICATIONS
The frequency distribution of the 35 reviewed articles was as follows: four in the Journal of Autism and Developmental Disorders, three in Scientific Reports, two in the Journal of Medical Internet Research, and the remaining publications in different journals. The breadth of studies published in various journals suggests the adaptability and validation of a wide range of technology-based ASD detection innovations, with multi-country authorships and multimodal data types.

C. RQ2 HOW DO RESEARCHERS USE THE VARIOUS BIO-BEHAVIORAL MARKERS TO DETECT ASD?
Each shortlisted study is assigned to one of the seven data categories shown in Figure 5, also referred to as bio-behavioral categories. Listed below are the study counts for each data category.

1) Stereotypical behavior (Nine Studies)
2) Eye gaze (Six Studies)
3) Facial expressions (Three Studies)
4) Postural analysis (Three Studies)
5) Motor control and movements (Four Studies)
6) Auditory data (Three Studies), and
7) Assessments and electronic health record data (Seven Studies)
We summarize the shortlisted studies in seven data categories in the subsections below.
1) STEREOTYPICAL BEHAVIOR
[47] developed ML models through a two-stage process: (1) feature selection and (2) ASD and TD classification. They trained ML models using historical ADI-R and ADOS-2 records, shortlisted 20 critical features using the DF (Decision Forest) algorithm, and incorporated them in the parental questionnaire (PQ) and annotation-based video-tagging module. In the second stage, the researchers integrated the responses of both modules by applying L2-regularized logistic regression (LR) [88], whose psychometric outcomes outperformed those of M-CHAT, CBCL (Child Behavior Checklist), and the standalone questionnaire and video modules.
[48] enhanced their previous work by introducing a third clinician questionnaire module. The three-module screener implemented in 8-10 minutes outperformed earlier psychometric outcomes using the GBDT (Gradient Boosted Decision Tree) algorithm.
[49] collected one to five-minute home-based videos rated by non-experts generating a feature set analyzed by eight ML classifiers previously trained on ADI-R and ADOS datasets. All classifiers had a sensitivity above 0.945, but only three had a specificity above 0.5. The LR5 (LR model with five shortlisted features) outperformed other ML models. [50] validated their previous work [49] on Bangladeshi children, including those with SLC (speech-language conditions). Non-expert US raters, after one-hour training, reviewed videos and responded to 31 multiple-choice questions, generating a feature set from the responses. The LR with Elastic Net penalty [89] and LR5 were the best performing ML models on the feature set with sensitivity, specificity, AUC, and accuracy for ASD vs. TD as 0.76, 0.58, 0.76, and 0.70 and ASD vs. ODD as 0.76, 0.77, 0.85 and 0.76 respectively.
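The psychometric outcomes quoted throughout this section (sensitivity, specificity, and accuracy) follow from the confusion matrix of a binary classifier. The sketch below shows the standard definitions; the function name `screening_metrics` and the example labels are illustrative assumptions and are not data from any reviewed study.

```python
def screening_metrics(y_true, y_pred):
    """Compute sensitivity, specificity, and accuracy for binary labels (1 = ASD, 0 = non-ASD)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # ASD correctly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # ASD missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-ASD correctly cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-ASD wrongly flagged
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }


# Hypothetical ground-truth diagnoses and classifier outputs for ten children.
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]
print(screening_metrics(truth, predicted))
```

A classifier can pair high sensitivity with low specificity, as reported for several of the models in [49], when it flags most children as at risk; reporting both values guards against that reading.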
[51] video-recorded mother and child social interactions of HR (High-Risk) toddlers aged 9-12 months in three social situations: 1) Face-to-Face (FF) mother-child interactions, 2) the mother's unresponsive Still-face (SF), followed by 3) usual mother and child interactions. The SVM classification model outperformed NB (Naive Bayes) and RF (Random forest) in ASD detection and classification.
Further, [52] developed a Video-referenced Infant Rating System for Autism (VIRSA). The system algorithm presented a series of age-matched parent-infant interaction videos, with parents choosing the ones that best matched their child, resulting in a score computation. At ages 6, 9, 12, and 18 months, children were clinically examined, diagnosed, and rated on the VIRSA. The statistical analysis of VIRSA scores predicted 100% ASD in children at 18 months and 78% at 36 months compared to diagnoses established using gold-standard tools. This study is a first step towards creating a novel video-based online rating system for detecting ASD in children with robust psychometric properties.
[53] developed a smartphone application, NODAsmart-Capture, empowering parents to record home videos of their child's behavior and label social dialogue, play, and problematic behavior in four social scenarios. Diagnosticians annotated the videos with built-in tags based on DSM criteria, such as "no eye contact" or "repetitive play," and ninety-one percent of their recommendations matched the ground truth diagnosis recorded at study enrolment.
West syndrome (WS) [90], diagnosed in 0.06% of infants and children, is characterized by epileptic spasms that often lead to mental impairment. [54] implemented ML to predict the onset of ASD/ID (Intellectual disability) in high-risk 9-12-month-old infants with WS. The researchers captured three video-recorded social engagement scenarios, and, among the SVM, J48, RF [91], and DS (Decision stump) [92] algorithms, DS predicted WS vs. TD with an accuracy of 0.765 and WS+ vs. WS− with an accuracy of 0.812 using multimodal audio and video data.
[55] used movie stimuli to elicit and engage the child's attention and video-recorded the children's behavioral and social reactions. They analyzed the scenes using computer vision to decode the children's emotions, behavioral codes, and head positions and classified ASD children with 85-95% accuracy.

2) EYE-TRACKING
Eye-tracking is a non-invasive method for examining an individual's attention and mental processing abilities, which serve as proxies for cognitive and neurological functioning.
In this review, six studies [56], [57], [58], [59], [60], [61] used eye-tracking and gaze analysis to measure fixation frequency, duration, and AOI (Area of Interest) responses from children's gaze towards social and nonsocial stimuli in images and videos. The studies hypothesized that children with ASD prefer circumscribed interests (CIs) [93], that is, specific animated characters, toys, or activities. Researchers use the gaze preference of children for the content of images or AOIs to classify ASD vs. TD.
In the experiment by [56], TD and ASD groups observed six scenic images with social cues (e.g., people) or without social cues (e.g., a bowl). The researchers extended the experiment with twelve images, half with CI (e.g., a toy car) and the other half without CI (e.g., a plant). Within-subject t-tests on CI and non-CI eye-gaze data for the ASD and TD groups suggested poor social attention processing abilities in the ASD group.
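A within-subject comparison of this kind can be sketched with a paired t statistic computed from each child's fixation time under two conditions. The sketch below is a generic illustration, assuming one fixation measurement per child and condition; the function name `paired_t_statistic` and the sample values are hypothetical and do not reproduce the analysis in [56].

```python
import math
import statistics


def paired_t_statistic(condition_a, condition_b):
    """Return (t, degrees of freedom) for paired measurements from the same participants."""
    if len(condition_a) != len(condition_b) or len(condition_a) < 2:
        raise ValueError("need two equally sized samples with at least two participants")
    differences = [a - b for a, b in zip(condition_a, condition_b)]
    mean_diff = statistics.mean(differences)
    sd_diff = statistics.stdev(differences)  # sample standard deviation of the differences
    if sd_diff == 0:
        raise ValueError("differences have zero variance; t is undefined")
    t = mean_diff / (sd_diff / math.sqrt(len(differences)))
    return t, len(differences) - 1


# Hypothetical fixation durations (seconds) on CI and non-CI images for three children.
ci_fixation = [3.0, 4.0, 5.0]
non_ci_fixation = [2.0, 2.0, 2.0]
t_value, dof = paired_t_statistic(ci_fixation, non_ci_fixation)
print(round(t_value, 3), dof)
```

The resulting t value would then be compared against the t distribution with the returned degrees of freedom to obtain a p-value.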
The study [57] recruited children from ASD and TD groups who were similar in age and gender. For 10 seconds, participants observed a female speak the English alphabet, and their fixation data on various facial and body areas were collected. They applied DA (Discriminant analysis) to mouth and body AOI fixation data and classified ASD and TD children.
[58] studied six-month-old preterm children's gaze and fixation on social figures, suggesting that children preferred looking at the eyes or lips of social figures over nonsocial images. However, at 18 months, each subject tested negative for ASD when evaluated on M-CHAT, and the study had no CG (control group); the results therefore provided weak evidence to detect and classify ASD and ODD.
[59] recorded participants' eye movements while viewing eleven photographs and constructed a virtual network graph using temporal gaze patterns and fixation time on seven AOIs on the human face. Betweenness centrality at four facial features (under the right eye, under the left eye, the left eye, and the mouth) was lower in ASD children than in TD children by 27, 53, 42, and 61%, respectively, forming a basis for ASD detection.
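Betweenness centrality measures how often a node lies on the shortest paths between other nodes. The sketch below is a simplified illustration, assuming an unweighted, undirected graph whose edges link consecutively fixated AOIs; the graph construction in [59] also uses fixation time and may differ. The AOI names and both function names are hypothetical, and the centrality is computed with Brandes' algorithm.

```python
from collections import defaultdict, deque


def build_transition_graph(aoi_sequence):
    """Link each pair of consecutively fixated AOIs with an undirected edge."""
    graph = defaultdict(set)
    for current, following in zip(aoi_sequence, aoi_sequence[1:]):
        if current != following:
            graph[current].add(following)
            graph[following].add(current)
    return dict(graph)


def betweenness_centrality(graph):
    """Unnormalized betweenness for an unweighted, undirected graph (Brandes' algorithm)."""
    centrality = {node: 0.0 for node in graph}
    for source in graph:
        order = []  # nodes in order of non-decreasing distance from source
        predecessors = {node: [] for node in graph}
        path_count = {node: 0 for node in graph}
        distance = {node: -1 for node in graph}
        path_count[source], distance[source] = 1, 0
        queue = deque([source])
        while queue:  # breadth-first search counting shortest paths
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if distance[w] < 0:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    path_count[w] += path_count[v]
                    predecessors[w].append(v)
        dependency = {node: 0.0 for node in graph}
        for w in reversed(order):  # accumulate dependencies from the farthest nodes back
            for v in predecessors[w]:
                dependency[v] += path_count[v] / path_count[w] * (1 + dependency[w])
            if w != source:
                centrality[w] += dependency[w]
    # Each unordered pair was counted from both endpoints in an undirected graph.
    return {node: value / 2 for node, value in centrality.items()}


# Hypothetical fixation sequence over facial AOIs.
gaze = ["left_eye", "nose", "mouth", "nose", "right_eye"]
print(betweenness_centrality(build_transition_graph(gaze)))
```

In this toy sequence every transition passes through the nose, so the nose carries all of the betweenness while the other AOIs carry none.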
[60] captured the gaze modulation of children with ASD and TD children using an eye tracker as they played a variant of the Go/No-Go game. AdaBoost's meta-learning algorithm could distinguish ASD and non-ASD participants with an accuracy of 88.6% based on gaze patterns.
[61] evaluated if an impaired response to joint attention (RJA) in infancy is a critical ASD marker. The infant eye gaze was recorded in a 10-minute session of several IJA (Initiation to joint attention) tasks. Since newborns utilize their gaze for RJA and IJA, this method can be used to quantify children's social cognition milestones at an early development age of 10-18 months.

3) FACIAL EXPRESSIONS
Children with ASD struggle to produce and perceive facial expressions that express a range of emotions and display affection [94], impacting their social functions. Deep learning (DL) models can identify facial expressions from images or videos in three steps: 1) preprocessing the images or videos, 2) extracting facial expression features, and 3) classifying the extracted features into various emotions.
In this review section, two studies [62], [63] used publicly available facial expression datasets to train the DL models and extracted and analyzed facial expressions from the EG (experiment group) and CG (control group) to make an ASD diagnosis.
[62] trained a CVA (computer vision analytics) model on the Binghamton University 3D Facial Expression database [95] to extract facial landmarks that an SVM classified into positive, neutral, and other categories. They observed that children with ASD had more neutral expressions than children without ASD. The AUROC with age covariates ranged from 0.75 to 0.83 for the five movies that children with ASD and TD children watched.
Imitation of facial expressions is a critical measure of social interaction skills. Studies demonstrate that, on prompted stimuli, children with ASD usually perform imitation more slowly than TD children [96]. [63] trained the DL model to recognize facial expressions using the FER2013 [97] and CK [98] databases and augmented the model learning with sixteen Chinese children's facial expressions. The participants imitated seven facial expressions, and their responses were video-recorded. For the ASD group, average expression imitation was lower than 60%, below that of the TD group, a value treated as a critical threshold for determining ASD.
[64] studied facial expressions using the Facial Action Coding System (FACS). The OpenFace software extracted the subtle dynamics of social smiles of ASD and TD children from their home recordings. The results suggested that ASD children display happy facial expressions less intensely than their TD counterparts during the first year of life.

4) POSTURAL AND HEAD MOVEMENT DATA
Children with ASD demonstrate a diminished capacity for postural stability [99] and functional balance [100].
Two studies, [65], [66], used CVA (computer vision analytics) from recorded videos to measure head postural control in study participants to distinguish ASD and TD groups.
[65] presented social and nonsocial stimuli by asking study participants to watch five movies comprising animated and complex characters and recorded the participants' rate of head movements using CVA. After adjusting for age, ethnic origin/race, and sex, the ASD group had a faster head movement rate in four of the five movies with complex stimuli. After removing the ODD (other developmental delays) group from the non-ASD group, the adjusted rate ratios distinguishing ASD from TD were significant at the 95% confidence level.
Reinforcement learning is a subfield of AI (Artificial Intelligence) that guides an intelligent agent's behavior through a reward-based environment [101]. [66] measured head postures, joint attention, and eye-gaze data [102] using RGBD sensors and cameras in a single Child-Robot Interaction (CRI) session with multiple stimuli. They used CNN (Convolutional Neural Network), CVA, and CLNF (Constrained Local Neural Field) to differentiate TD and ASD children. The TD group showed better adherence to IJA (Initiation of Joint Attention) and RJA (Responding to Joint Attention) with the therapist and robot than the ASD group. However, the children with ASD displayed higher comfort and engagement with robots and a high IJA towards the therapist during the transition.
In addition, [67] developed and validated a deep neural network (CNN-LSTM architecture) trained on the non-verbal aspects of social interaction from video recordings captured during ADOS-2 assessments that distinguished ASD and TD peers with an accuracy of 80.9%.

5) MOTOR MOVEMENTS
Children with ASD have varying degrees of fine and gross motor skills. Gross motor deficits in children with ASD can impair body balance and make it challenging to participate in sports or do daily tasks [103]. Difficulties with fine motor skills might limit participation in activities that demand hand muscle movements [104]. In this review section, we covered four studies [68], [69], [70], [71] that used motor data to classify ASD children.
[68] used a smart tablet with touch-sensitive screens and inertial movement sensors to capture the study participants' contact impact data patterns while playing games. They applied the Kolmogorov-Smirnov test (KST) [105] to the sensor dataset, shortlisting the ten most significant features from 262, and classified ASD vs. TD using the RGF2 (Regularized Greedy Forest) [106] algorithm, computing AUC, sensitivity, and specificity scores.
[69] analyzed participants' upper-limb movements in a reach-to-drop task exercise. The participant reached the ball, placed it in the support, and transferred it to the target box hole. They shortlisted seven discriminating features out of seventeen using the Fisher discriminant ratio (FDR) [107] for both the EG and the gender- and mental-age-matched CG. They used the SVM algorithm to identify ASD children using the seven features.
[70] collected participants' body movements in response to visual, auditory, and olfactory stimuli on three real-world virtual reality imitation tasks. They identified joint motions using the DL (Deep Learning) OpenPose and shortlisted critical and extensive body part movements using PCA (Principal component analysis) [108] to detect children with ASD. The SVM algorithm classified ASD children with an accuracy of 0.893 using five joint movements (head, trunk, arms, legs, and feet) in response to visual stimuli.
Inter-joint coordination and motor synergies can be potential substrates of ASD markers [71]. Researchers asked ASD and TD participants to engage in a motor task by manipulating a felt-tip pen to draw on a sheet of paper. At the same time, an optoelectronic motion capture system recorded their movement kinematics, which were analyzed by the SVM algorithm to classify ASD and TD participants with 94.7 percent accuracy. The analysis implies that an ecologically valid autism motor signature can predict ASD risk in children.
6) ASSESSMENTS AND ELECTRONIC HEALTH RECORD DATA
Word2Vec algorithms [109] convert words to vectors, evaluate similarities, and group words logically, allowing the processing of sizeable unstructured text repositories. In addition, LDA (Latent Dirichlet Allocation) [110] uses a prior Dirichlet distribution [111] to match word distributions with logical topics. Combining LDA and Word2Vec, both of which are NLP (natural language processing) techniques, can generate discriminative features for a topic based on contextual associations. [72] analyzed unstructured ASD evaluation referrals by scanning and preprocessing physical records and reading them through OCR (optical character recognition). The dataset was upsampled [112] by adding two simulated positive samples for each positive case, and its features were reduced using L1 and L2 regularization [88] with an SVM. Word2Vec predicted ASD risk with precision, recall, and F2 scores of 0.646, 0.911, and 0.842, respectively, outperforming LDA.
[73] predicted ASD risk by asking families of HR children to state social-communication developmental concerns in a sentence. A regression tree algorithm analyzed the textual responses and either suggested ASD risk directly or presented an additional M-CHAT-R [7] or ASQ [113] question and, after processing the answer, suggested ASD risk. The ML model AUC ranged from 0.36 to 0.54 for text-only analysis and from 0.74 to 0.88 for text combined with the M-CHAT-R [7] questionnaire.
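The F2 score reported for the Word2Vec model in [72] weights recall more heavily than precision, which suits screening tasks where a missed case is costlier than a false alarm. The sketch below applies the standard F-beta formula to the reported precision and recall; the function name `f_beta` is an illustrative choice.

```python
def f_beta(precision, recall, beta=2.0):
    """F-beta score: recall is treated as beta times as important as precision."""
    if precision < 0 or recall < 0 or beta <= 0:
        raise ValueError("precision and recall must be non-negative and beta positive")
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator


# Precision and recall as reported for the Word2Vec model in [72].
print(round(f_beta(0.646, 0.911, beta=2.0), 3))  # prints 0.842
```

The computed value agrees with the reported F2 score of 0.842, which confirms that the three figures are mutually consistent.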
The EMR [114] is usually implemented in clinicians' offices, clinics, and hospitals to capture notes, assessments, and treatment records cross-sectionally and longitudinally for diagnosis and treatment. [74] extracted 89 features from longitudinal retrospective EMR data and shortlisted 20 features using RF Gini impurity [115] scores. They used SMOTE [115] to upsample and overcome the class imbalance in the ASD dataset. The LR predicted ASD risk with an AUC of 0.727. Researchers obtained ground truth labels for patients (ASD or non-ASD) in the studies [116] from the clinical reports.
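The upsampling idea behind SMOTE can be sketched as follows: each synthetic minority sample is placed on the line segment between an existing minority sample and one of its nearest minority neighbours. This is a simplified illustration, not the pipeline of [74]: the base sample is drawn at random rather than iterated over, and the data points, `k`, and `seed` are hypothetical.

```python
import math
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating toward nearby minority points."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base_idx = rng.randrange(len(minority))
        base = minority[base_idx]
        others = [p for i, p in enumerate(minority) if i != base_idx]
        # k nearest minority neighbours of the base sample (Euclidean distance)
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        target = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> target
        synthetic.append(tuple(b + gap * (t - b) for b, t in zip(base, target)))
    return synthetic


# Hypothetical two-feature minority (ASD) samples.
minority = [(1.0, 2.0), (2.0, 3.0), (3.0, 1.0), (2.5, 2.5)]
synthetic = smote_like(minority, n_new=6, k=2, seed=0)
```

Because every synthetic point is a convex combination of two real minority samples, the new data stays inside the region already occupied by the minority class.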
In addition to analyzing and classifying multimodal data, a few studies focused on enhancing ML performance. [75] used the Grasshopper Optimization Algorithm (GOA) [117] on three datasets [79] and predicted ASD with near 100% accuracy.
[76] assessed HR and low-risk infants at eight, fourteen, twenty-four, and thirty-six months. The best ML classifier was SVM (AUC of 0.713) trained on VABS [118] daily living module [119] records that were captured at 14 months, normalized and z-scored. [77] used ML to investigate the Q-CHAT [120] assessment records to distinguish between ASD and non-ASD children. Of five ML algorithms: RF, NB, SVM, LR, and KNN, the SVM achieved the highest accuracy of 95%. [78] used the Q-CHAT and Q-CHAT-10 (Q-CHAT with ten features) datasets to develop two 5-layer DNNs to detect children with ASD. They compared the performance of both the models and observed that the Q-CHAT-10 model reported higher AUROC, sensitivity, and specificity than the outcome of SVM and DNN algorithms processed on Q-CHAT data [78]. The findings confirm the role of ML models in reducing the assessment features and predicting an ASD condition.
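Several studies above report AUC (AUROC) values. As a reminder of what the metric measures, the sketch below computes it as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, counting ties as one half. The scores and labels are hypothetical and are not taken from any reviewed study.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical classifier scores; label 1 = ASD, label 0 = non-ASD.
scores = [0.9, 0.8, 0.4, 0.35, 0.2]
labels = [1, 0, 1, 0, 0]
auc = roc_auc(scores, labels)
```

An AUC of 0.5 corresponds to chance-level ranking, which helps put values such as 0.713 or the text-only range of 0.36 to 0.54 in context.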

7) AUDIO DATA
DL models can identify distinctive vocal patterns by analyzing the production of canonical syllables and speech volubility [121]. Canonical syllables [122] have a consonant and a vowel-like component that emerge by the second half-year of life and not later than ten months in TD children. Volubility refers to syllable production frequency and is usually limited in children with ASD [123]. In the review, three studies analyzed audio data; two used syllable production, speech patterns, and canonical babbling [44], [45], and the third used crying patterns [46] to detect ASD.
[44] used a pre-trained feature extraction auto-encoder (AE) integrated with a joint optimization method and trained four ML models on the eGeMAPS (extended Geneva Minimalistic Acoustic Parameter Set) dataset [124]. The ML models, namely SVM, BLSTM (88 features) [125], BLSTM (54 features), and the optimized AE BLSTM, were tested on 95 ASD and 130 TD utterances across five vocalization categories: syllables, canonical babbling, calling mother or father, screaming, and crying. The AE BLSTM model outperformed the other ML models with precision, recall, and F1 scores of 0.4526, 0.6869, and 0.5457, respectively.
[45] conducted a retrospective study examining the vocalizations of 37 infants in two 5-minute videos recorded in the 9-12 months and 15-18 months age ranges, which included family play, vacations, and familiar routines (e.g., mealtimes). The video recordings were annotated for canonical babbling, syllable production, and speech volubility features. The LR model trained on the canonical babbling features was the strongest predictor, correctly classifying 90% of ASD and 63% of TD infants at 9-12 months. Further, log odds ratios (log OR) confirmed that TD infants reached the canonical babbling [123] stage earlier than infants who were later diagnosed with ASD.
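For readers unfamiliar with the log odds ratio, the sketch below shows how it is computed from a 2x2 table of group (TD vs. later-ASD) against whether the canonical babbling stage was reached by a given age. The counts are hypothetical and are not the data of [45]; a positive value means the odds of having reached the stage are higher in the first group.

```python
import math


def log_odds_ratio(a_yes, a_no, b_yes, b_no):
    """Natural log of the odds ratio for group A versus group B."""
    if min(a_yes, a_no, b_yes, b_no) <= 0:
        raise ValueError("all four cells must be positive")
    return math.log((a_yes * b_no) / (a_no * b_yes))


# Hypothetical counts: infants who did / did not reach canonical babbling.
td_reached, td_not_reached = 15, 5
asd_reached, asd_not_reached = 5, 15

log_or = log_odds_ratio(td_reached, td_not_reached, asd_reached, asd_not_reached)
```

With these illustrative counts the odds ratio is 9, so the log OR is ln(9), roughly 2.2.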
[46] collected crying samples (300 ms to 3 s clips) from ten ASD and TD children, then preprocessed and cleansed them by removing instances of screaming, babbling, or vocalizations with a closed or non-empty mouth. They used phonation and vocal quality features from the Belalcazar-Bolaños dataset [126], created from audio recordings of Parkinson's patients. To minimize misclassification, they used a novel SubSet Instance (SSI) method combining unsupervised and supervised methods. They shortlisted two discriminative speech features, i.e., an MFCC and a SONE coefficient, which measure a tone's timbre and loudness with temporal difference variance, to form a basis for screening children with ASD.

This section identifies various participant counts, datasets, experiment and control groups, assessment instruments, locations, and durations.

2) DATASETS
Large-scale datasets give researchers the motivation and necessary sample size to develop, collaborate, and benchmark the performance of ML and DL algorithms. The studies reported use of datasets in the audio [126], [127], assessments [79], facial expression [95], [97], [98], and EMR category [74]. The studies reported challenges such as data preprocessing, cleaning, and augmenting the audio and EMR datasets as they were neither age-matched [62] nor culturally relevant to the experiment data [63]. Additional datasets are listed in Appendix A, allowing researchers to collaborate and develop ASD detection innovations and improve the current ASD detection process.

4) ASSESSMENT TOOLS
Of the seventeen psychometric tools reported, the six most widely used in the review were ADOS, ADI-R, M-CHAT-R, MSEL, CARS, and DSM. The ADOS, ADI-R, CARS-2, and DSM are gold-standard ASD diagnostic tools. The outcomes of these tools are compared with the outcomes of technology-based tools to calculate psychometric properties such as sensitivity, specificity, and accuracy. The MSEL measures children's cognitive development and is used to ensure that the controls and comparators recruited in a study are age- and IQ-matched.
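The comparison between a technology-based tool and a gold-standard instrument can be sketched as a confusion-matrix calculation. The example below is illustrative only: the function name and the ten predicted and reference labels are hypothetical, with 1 denoting ASD and 0 denoting non-ASD.

```python
def psychometric_properties(predicted, reference):
    """Sensitivity, specificity, and accuracy of predictions against a reference standard."""
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p == 1 and r == 1)
    tn = sum(1 for p, r in pairs if p == 0 and r == 0)
    fp = sum(1 for p, r in pairs if p == 1 and r == 0)
    fn = sum(1 for p, r in pairs if p == 0 and r == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("reference labels must contain both classes")
    return {
        "sensitivity": tp / (tp + fn),  # share of true ASD cases detected
        "specificity": tn / (tn + fp),  # share of non-ASD cases correctly cleared
        "accuracy": (tp + tn) / len(pairs),
    }


# Hypothetical outcomes: gold-standard diagnosis versus tool prediction.
reference = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
metrics = psychometric_properties(predicted, reference)
```

Reporting sensitivity and specificity separately matters in screening, because accuracy alone can look high on imbalanced samples even when many ASD cases are missed.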

2) SHORTLISTED ASD MARKERS
Numerous studies incorporated feature reduction methods, marked in the column "reduced features" in Table 1, and shortlisted critical ASD deterministic landmarks. Researchers trained ML models on these features to perform ASD vs. TD classification using the supervised learning method shown in Figure 6. For example, [72] applied feature reduction on the scanned referral records and shortlisted behavioral patterns such as vocal vowel sounds and mood swings as critical ASD deterministic markers. Additionally, [74] shortlisted parental age, medication use, treatment, and dietary patterns as significant predictors of ASD.
Further, [63] highlighted that children with ASD can comprehend and imitate facial expressions such as happiness and sadness but struggle with complicated facial expressions such as neutrality, aversion, disgust, and surprise. [62] reported that non-ASD children, while watching movies, often raised their eyebrows and opened their mouths, a characteristic of typical development that ASD children did not display. The social communication deficit is a critical marker for ASD. [49], [51], [67] highlighted speech patterns, communicative engagement, language understanding, emotional expression, sensory seeking, responsive social smile, and stereotyped speech as critical markers for ASD. Further, [50], [53], [67], [76] highlighted the child's stereotyped behaviors, repetitive interests, and poor eye contact as important markers for ASD risk determination. In addition, [55] suggested name-call responses and emotional state analysis as enablers of early ASD warning flags in children. [51], [52], [53] emphasized shorter duration and lower frequency of eye contact, lack of social smiling, and poor social engagement as ASD risk markers. However, [76] revealed that poor eye contact and repetitive hand movements alone did not accurately diagnose ASD. Individual behaviors such as daily living skills impairments and compliance within the household must be considered in conjunction with other behaviors to predict ASD accurately. Further, [54] used PCA to identify stereotypical hand motions (HM), mother-child communication exchange, and speech analysis as essential behavioral and auditory markers for ASD among children with WS. Thus, ML models can analyze facial expressions, gestural patterns, stereotypical behavior, and communication exchanges to predict ASD risk with high confidence.
While measuring joint attention skills, [61] reported that infants later diagnosed with ASD exhibit considerable atypical IJA but not RJA. In addition, the prevalence of atypical nonverbal behaviors manifested by displaying uncommon, limited gestural postures decoupled from visual contact, facial affect, and speech in ASD children [67] can lead to ASD identification.
Atypical motor movements can predict the risk for ASD. In one experiment [68], researchers observed that while playing tablet games, gesture velocity was greater in the ASD group, while the time to tap the screen was shorter than in the control group. In another study, a ball-drop task [69] indicated improper wrist angle position, hand inclination, and slower, fragmented movement as critical criteria for ASD and TD classification. Similar findings were reported by [71] in a reaching-grasping paradigm in which children with ASD displayed decreased coupling between DoF (degrees of freedom), which correlated with the severity of their sociocommunicative symptoms. During a virtual reality motor movement task, [70] classified ASD and TD groups with 82.98% accuracy using only head movements and with 74.47% and 72.34% accuracy using arm and leg movements, respectively. The findings corroborate the literature suggesting that head spinning and banging, body rocking, and foot-stomping are three major stereotyped and repetitive motions associated with ASD.
The findings of [77], [78] indicated that ML algorithms could detect ASD with an accuracy greater than 90 percent from a selection of 14 feature items and greater than 80 percent using only three items of Q-CHAT. In addition, VABS (Vineland Adaptive Behavior Scale) [118] daily living normalized, z-scored [119] assessment scores at 14 months yielded an AUC of 0.713 [76] for ASD detection.
A study by [60] using eye gaze reported that ASD children exhibited more unstable gaze modulation and demonstrated significantly shorter initial, average, and total fixation durations for social stimuli [56]. Further, [57] suggested that children with ASD show reduced fixation time at the eyes, mouth, and nose, affirming the critical role of fixation on the eyes in detecting autism via eye-tracking. However, the findings of [58] suggested quite the opposite, as preterm children preferred to glance at the eyes or lips of social images or people. Therefore, the ability to process social cues, measured by analyzing the fixation duration at various body parts, can predict the severity of ASD in children. [45] presented a vocal analysis of the children and confirmed that at 9-12 months, TD infants reached the canonical babbling [123] stage earlier than infants later diagnosed with ASD. They further confirmed that infants later diagnosed with ASD produced fewer words per minute than TD infants. Therefore, canonical babbling ability and syllable production in the early years can indicate the risk of ASD.

3) MULTIMODAL DATA PROCESSING
The research utilized seventeen ML algorithms listed in the column "Algorithms" of Table 2. Decision trees, random forests, and support vector machines were the most often used machine learning models. The CNN algorithm was used approximately 80% of the time when deep learning methods were employed. The reviewed studies employed six statistical methods, with the ANOVA, t-test, and chi-squared test being the most often utilized. Ensemble decision trees [128] performed the best on structured data generated from the video annotation of ASD-relevant behaviors. In eye-tracking, statistical and discriminant analysis were the most effective methods. CVA ranked highest in analyzing unstructured facial expressions and postural and head movement data. Additionally, SVM performed well on the structured, feature-reduced data captured from motor movements. MGOA and Word2Vec algorithms outperformed all other algorithms in the assessments, datasets, and EMR analysis category. Finally, BLSTM (Bidirectional LSTM), AE, and SVM effectively classified audio data to detect ASD conditions.

4) MACHINE LEARNING VS. DEEP LEARNING
An ML model trained on multimodal data can classify ASD and TD children at the current state of the art. However, DL outperformed ML methods in feature extraction and classification tasks on unstructured data. For instance, researchers captured features of interest for ASD classification using DL from facial expressions [62], [63], postural and head movements [65], [66], [67], text analysis using NLP [72], [74], [78], motor movements [68], [69], [70], and audio recordings [44], [45], [46] and incorporated supervised learning techniques, as shown in Figure 6, to classify ASD children. As a result, a conclusive DL model trained on multimodal data sourced from one or more of the seven categories can make the ASD diagnosis procedure efficient.

IV. DISCUSSION
The scoping review shortlisted 35 studies after eligibility and inclusion-exclusion assessments. The review analyzes technology's viability, application, division, and outcomes for the following seven bio-behaviors: (a) stereotypical behavior; (b) facial expressions; (c) eye gaze; (d) motor movements; (e) postural analysis; (f) assessments and EHR datasets; and (g) auditory data. The review data summary is populated in Table 1, which includes multimodal input data, feature reduction steps, environment setting, data processing algorithms, and psychometric outcomes, i.e., sensitivity, specificity, and accuracy. Table 2 lists enrolment counts, software or hardware devices used, assessment tools, assessment duration, limitations, and future directions. The review uses the table data to answer four research questions on technology usability, multimodal data capture, data analysis, quality evaluation, limitations, and strengths.
The review contributes to the literature by 1) shortlisting various multimodal bio-behavioral markers for ASD detection; 2) analyzing automatic multimodal data extraction, feature optimization, and data processing methods; 3) highlighting psychometric outcomes from technological innovations and comparing them with traditional methods; and 4) identifying relevant datasets for researchers to collaborate and co-create ASD and ODD detection innovations, bringing efficacy to the detection process. The review highlights that ASD detection ML and DL methods can be applied to identify children at risk of ODD, including speech and developmental delays and hyperactivity challenges. Researchers can shortlist specific feature sets for each condition and train machine learning models with a statistically significant data volume. The outcomes of the machine learning models can be measured based on psychometric properties calculated by comparing predicted diagnoses with gold-standard tools. The subsections below detail the role of technology in ASD detection and internal and external validity threats.

A. PROMISING ROLE OF TECHNOLOGY IN ASD DETECTION
The analysis highlights an upward trend in adopting technology-based ASD detection solutions during 2018-2021 attributed to multiple factors.
1) The demand for low-cost diagnoses [129], universal screening [130], and the availability of research funding [131] have promoted research initiatives to develop technology-based ASD screening innovations. 2) The high penetration of mobile devices, low-cost cameras, and Micro-Electro-Mechanical System (MEMS) sensors such as accelerometers and gyroscopes [132] has enabled the real-time capture of vast volumes of structured and unstructured data. In clinical situations, cameras and sensors are more practical, less expensive, and less invasive than fMRI and EEG. 3) With technology maturing, researchers have automated the generation and processing of multimodal data by building data pipelines on low-cost cloud infrastructure. Integrating data pipelines with technologies such as AI, ML, and DL has expedited the development of cost-effective and superior detection and at-risk identification of the ASD and ODD population. However, traditional ASD diagnostic services are not always accessible, affordable, or data-driven [27], [133]. The review findings suggest that technology-based ASD methods can be extrapolated to the ODD population and can potentially serve larger population groups effectively, efficiently, and rapidly, with improved quality, access, and affordability [27]. Further, technology-facilitated innovations are expected to supplement traditional detection methods for the following reasons: 1) Diagnostic methods based on ML and DL can be trained on a large volume of involuntarily generated multimodal data from various bio-behaviors to detect children with ASD and ODD risk. 2) Traditional ASD screening methods can misdiagnose children with borderline ASD, speech delay, or ODD as having ASD. These limitations can be overcome using technological innovations such as the inconclusive ML classifier developed by [47], trained solely on misclassified data instances.
The method reduces misdiagnosis of comorbid conditions, with an implementation time of under ten minutes, by assigning borderline or ODD instances to an inconclusive class and referring those users to a clinician for further evaluation. Thus, ML technologies can potentially reduce the misdiagnosis of comorbid, ASD, or borderline ODD conditions such as speech and developmental delays with increased accuracy. 3) A few gold-standard tools, such as CARS-2, can diagnose children only beyond two years. Also, children's social communication, language, and other critical milestones do not develop until the second or third year of life. Therefore, an evaluation of ASD risk in children under two years by an inexperienced clinician can give conflicting results. The review emphasized the extraction and analysis of ASD and ODD landmarks from behavioral [51], [52], [54], eye gaze [58], audio [44], [45], [46], facial expression [62], postural [65], and assessment [73], [74], [76] data to identify children at risk of ASD between 6-18 months, circumventing the age constraints of traditional diagnostic instruments. These improvements can advance the field by promoting early identification, improving clinicians' capacity, and thereby improving access to early intervention [134] services. Even though the review highlights that the demand for technology-based detection methods grew from 2018 to 2021, the actual adoption of these innovations has been minimal. These innovations should ideally be usable by non-specialists, available as mobile applications (to ensure widespread adoption), and able to identify TD, ASD, and ODD (speech delay, developmental delay, intellectual delay) based on well-defined, minimal distinguishable features in the first three years [135].
The adoption of these technologies can be supported through controlled pilots with the participation of stakeholders such as parents, clinicians, and schools, and by digitizing downstream detection processes, assessments, and treatments [136]. A digital human-supported ASD and ODD detection and management framework can be initiated and then transitioned to an autonomous, need-based blended digital model, optimizing cost and maximizing scale [137].
Further, adopting these technologies can be supported with vernacular massive open online courses (MOOCs), training websites, and brief knowledge content, including text-based training procedures with video clips [137].

B. VALIDITY THREATS
Internal [138] and external validity [139] threats need to be reviewed and managed to ensure the reliability and robustness of a study's research methods and outcomes. Internal validity evaluates the appropriateness of a study with respect to its method, experimental rigor, protocol, structure, study variables, and execution. External validity confirms study findings in the real world and leads to broader adoption. While rigorous research procedures can ensure a study's internal validity, they may limit its generalization, application, and external validation. Table 4 lists internal and external validity threats and suggests methods to overcome them.

V. LIMITATIONS
The access and reach of technology-based ASD detection methods depend on the availability of computers, mobile phones, and the internet. A lack of internet coverage may disproportionately disadvantage those in rural and underserved locations hampered by sluggish internet speeds, poor quality, unstable connectivity, and persons' lack of technological ability and trust in technology. In addition, the internal and external validity threats listed above limit the acceptance and generalization of the innovations.
Further, although the scoping review eligibility criteria shortlisted studies for children between two and six years, some of the included studies had overlapping and higher age ranges. Three studies recruited children between two and eight years [48], [63], [70], and two studies included adolescent, teen, and adult age groups [72], [75]. The mismatch between the study eligibility definition and study selection can limit the validity of the scoping review.

VI. FUTURE DIRECTIONS
The review highlighted the presence of a sophisticated technical stack, which produced promising but non-generalizable results with privacy, legal, cultural, and ethical challenges [142]. Therefore, future studies should: 3) focus on policy-level activities involving stakeholders in developing study designs based on the FATE (Fairness, Accountability, Transparency, and Ethics) framework for using AI for ASD detection. In addition, adopting and enforcing strong regulations, policies, and data protection and privacy legislation to prevent inadvertent data leakage can inspire confidence among stakeholders [143]. Further, the development of technology-facilitated early ASD and ODD detection solutions should be supported downstream with reliable referral and intervention infrastructure, improving the healthcare system's efficiency, capacity, and efficacy [136].
Most studies in the review captured data from a single bio-behavior to develop ASD detection innovations. Future improvements should include capturing multimodal data from diverse bio-behavior categories. The feature engineering methods can assign weights to multimodal data originating from more than one of the seven categories, such as eye contact, stereotyped behaviors, postural demonstration, and speech, to develop ML and DL models to predict ASD and ODD risk and their severity levels for broader age groups. These improvements can be offered as a service on a mobile application to improve its adoption and usability.
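One simple way to combine evidence from several bio-behavior categories is score-level (late) fusion, in which each modality produces its own risk score and the scores are averaged with modality weights. The sketch below is only an illustration of that idea and is not a method taken from the reviewed studies; the modality names, weights, and scores are hypothetical, and weighting could equally be applied at the feature level.

```python
def fuse_risk_scores(scores, weights):
    """Weighted average of per-modality risk scores in [0, 1].

    Modalities without a score (e.g., no audio recorded) are skipped and the
    remaining weights are renormalized.
    """
    used = {m: w for m, w in weights.items() if m in scores}
    total_weight = sum(used.values())
    if total_weight <= 0:
        raise ValueError("no weighted modality has a score")
    return sum(scores[m] * w for m, w in used.items()) / total_weight


# Hypothetical weights for four of the seven bio-behavior categories.
weights = {"eye_gaze": 0.4, "speech": 0.3, "posture": 0.2, "stereotyped_behavior": 0.1}
# Hypothetical per-modality model outputs for one child; one modality is missing.
scores = {"eye_gaze": 0.8, "speech": 0.6, "posture": 0.5}

fused_risk = fuse_risk_scores(scores, weights)
```

Renormalizing over the available modalities lets the same scheme run on a mobile application even when a sensor or recording is unavailable, although the weights themselves would need to be learned and validated on representative data.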
Finally, future studies should focus on preventive methods incorporating genetic approaches. For example, [84] used Hidden Markov Models and genetics to examine the risk of having an ASD offspring, as the ASD risk is 40 to 65 times higher for parents with an ASD diagnosis or carrying a risk gene. Therefore, future genetic-focused trials can preempt the risk of ASD in children and empower parents to decide on starting families with possible risk exposure. In addition, technological innovations using trained robots to treat and diagnose ASD in young children, using POMDP (Partially Observable Markov Decision Process) [144], [145], can significantly automate the ASD detection process and should be a focus for future research.

VII. CONCLUSION
The review comprised 35 studies grouped into seven multimodal data categories: (a) stereotyped behavior, (b) facial expressions, (c) eye gaze, (d) motor control and movements, (e) postural analysis, (f) auditory data, and (g) assessments and electronic health record data. The scoping review, based on PRISMA guidelines, revealed a rising trend of technology-based ASD detection tools incorporating multimodal data analyzed through ML and DL methods and supports the role and effectiveness of technology applications in improving current ASD screening and diagnosis methods. The review reported internal and external validity challenges, with ethical, legal, and dataset issues and restricted participant and control groups as critical challenges. In addition, most solutions reported outcomes that were limited to the laboratory and were therefore non-generalizable. Therefore, additional cross-cultural intensive trials with large population groups with various other disorders are needed to examine the field preparedness and the ethical, legal, and adoption challenges of technological solutions in real-world scenarios. The review can aid academics, clinicians, and practitioners by offering vital inputs for developing technology-based ASD screening and diagnostic solutions that are efficient, cost-effective, and data-driven and can address the current constraints of the industry.

APPENDIX A
The appendix lists important datasets in autism research:
JAFFE [146] - The database of 213 images containing the facial expressions of ten Japanese women. There are seven distinct facial expressions: neutral, happiness, sadness, surprise, anger, disgust, and fear.
CK+ [147] - The expression database created in the laboratory includes 593 expression sequences from 123 individuals, 69% female and 31% male from African Americans, Asians, and South Americans. It comprises seven facial expressions: anger, contempt, disgust, fear, happiness, sadness, and surprise.
MMI [149] - The expression database is broken into two sections: the first is a dynamic data set containing over 2,900 video sequences; the second is a static data collection consisting of many high-resolution photographs. The collection contains seven distinct types of expressions.
AFEW [150] - All of the facial photos in the database were extracted from movies and include seven fundamental facial expressions.
SFEW [151] - The expression library consists of static frame images from the AFEW data set containing seven fundamental expressions.
eGeMAPS [152] - A set of acoustic parameters suitable for use in various areas of automatic voice analysis, including para-linguistic and clinical speech analysis. The set is designed to serve as a single reference point for future research evaluations and to prevent discrepancies produced by separate parameter sets or even by different implementations of the same parameter.
The Simons Simplex Collection (SSC) [153] is a resource of the Simons Foundation Autism Research Initiative (SFARI). The SSC established a permanent repository of genetic samples from 2,600 simplex families, each of which has one child affected with an autism spectrum disorder, and unaffected parents and siblings.
The Binghamton University 3D Facial Expression database [95] currently has 100 participants (56% female, 44% male), ranging in age from 18 to 70 years and representing a diversity of ethnic/racial ancestries. Each person made seven different facial expressions in front of the 3D face scanner. Except for the neutral expression, each of the six prototypical expressions (happiness, disgust, fear, anger, surprise, and sadness) has four intensity levels.

APPENDIX B
See Table 5.

APPENDIX C
PRISMA Checklist: PRISMA-ScR checklist for studies.
CASP Evaluation Sheet: The results of the CASP quality assessment tool for studies. (Microsoft Excel Open XML Spreadsheet (XLSX))

ACKNOWLEDGMENT
Study Registration: Similar to most other scoping reviews, the current scoping study was not registered.
Author Contributions: Manu Kohli: conceptualization, methodology, writing (original draft), writing (review and editing), software, formal analysis, investigation, resources, data curation, visualization; Arpan Kumar Kar: conceptualization, methodology, writing (review and editing), supervision, validation, and investigation; and Shuchi Sinha: conceptualization, methodology, writing (review and editing), and supervision.