Visualization and Interaction Technologies in Serious and Exergames for Cognitive Assessment and Training: A Survey on Available Solutions and Their Validation

Exergames and serious games, based on standard personal computers, mobile devices and gaming consoles or on novel immersive Virtual and Augmented Reality techniques, have become popular in the last few years and are now applied in various research fields, including the cognitive assessment and training of heterogeneous target populations. Moreover, the adoption of Web-based solutions, together with the integration of Artificial Intelligence and Machine Learning algorithms, could bring numerous advantages for both patients and clinical personnel, such as allowing the early detection of some pathological conditions, improving the efficacy of and adherence to rehabilitation processes through the personalisation of training sessions, and optimizing the allocation of resources by the healthcare system. The current work proposes a systematic survey of existing solutions in the field of cognitive assessment and training. We evaluate the visualization and interaction technologies commonly adopted and the measures taken to fulfil the needs of the pathological target populations. Moreover, we analyze how implemented solutions are validated, i.e. the chosen experimental designs, data collection and analysis. Finally, we consider the availability of the applications and raw data to the large community of researchers and medical professionals and the actual application of the proposed solutions in standard clinical practice. Despite the potential of these technologies, research is still at an early stage. In spite of the recent release of accessible immersive virtual reality headsets and the increasing interest in vision-based techniques for tracking body and hand movements, many studies still rely on non-immersive virtual reality (67.2%), mainly mobile devices and personal computers, and on standard gaming tools for interaction (41.5%). Finally, we highlight that, although the interest of the research community in this field keeps growing, the sharing of datasets (10.6%) and implemented applications (3.8%) should be promoted, and the number of healthcare structures that have successfully introduced the new technological approaches in the treatment of their patients is limited (10.2%).


I. INTRODUCTION
The technological evolution we have witnessed in recent years has led to important advances in many research fields, including cognitive assessment and training. Researchers have started developing and validating new solutions based on serious games (SGs) and exergames (EGs). SGs are digital applications designed for a primary purpose other than pure entertainment, such as education, information, or the enhancement of cognitive and physical functions. They usually emulate activities of daily living (school lessons, doing the shopping, doing housework, exploring environments) and indirectly assess participants' cognitive functions during gameplay. EGs are videogames which rely on technologies that track body movements and imply a form of physical exercise, such as emulating a sport, playing an instrument, exercising or racing. The advantages of using SGs and EGs for the ecological assessment and the rehabilitation of cognitive functions are manifold. Firstly, gamification allows patients to be evaluated indirectly, without causing the stress and frustration that could affect the results of standard tests. Moreover, SGs and EGs ensure the creation of safe, controlled, standardized settings and strict control over experimental conditions and stimulus delivery. Besides, thanks to the integration of different sensors, it is possible to record different measurements, useful for the assessment of patients' cognitive and motor skills and the monitoring of their well-being, behaviour and improvements.
The associate editor coordinating the review of this manuscript and approving it for publication was Charalambos Poullis.
The purpose of our review is to provide an analysis of the advancements of research on SGs and EGs applied to cognitive assessment and training, conjointly with the development of new technologies for visualization and interaction, i.e. Virtual Reality (VR) and Augmented Reality (AR) [54], platforms for game deployment, i.e. mobile, computer, console and Web platforms, and Artificial Intelligence (AI). Specifically, this review addresses the following research questions:
RQ1 Among the different visualization techniques, from non-immersive (e.g., monitor-based) to fully immersive (e.g., head-mounted displays) methods, which are the most adopted solutions? Over the years, has there been a rise in the use of immersive VR and AR devices?
RQ2 Among the available interaction techniques, from touchful (e.g., mice and keyboards, touchscreens and controllers) to touchless (e.g., vision, sensor or voice based), which are the most adopted solutions? Despite the wide diffusion of gaming tools, has there been a rising interest over the years in the search for alternative solutions?
RQ3 In view of their eventual adoption in standard clinical practice, how are the developed solutions actually validated? Moreover, given the increased number of systems published per year, have we witnessed a parallel growth in the amount of available data and, finally, in the number of subjects/patients effectively using such solutions?
In the following sections, we first describe the procedure adopted to gather data (Section II), then we consider the different technologies available for visualisation (Section III) and interaction (Section IV), their core features, potential and actual application. We then focus on the validation process, in particular on the experimental designs currently adopted to test the proposed solutions (Section V-A) and the way data are collected and analysed, also considering the application of ML algorithms for predicting the onset of certain pathological conditions (Section V-B). Lastly, we evaluate the availability of the proposed frameworks and datasets (Section V-C) and the effective adoption of SG- and EG-based solutions by healthcare structures.

II. PROCEDURE
The current survey is the result of a systematic search that we conducted in several high-profile databases, i.e. PubMed, Scopus, Web of Knowledge and Science Direct, using the search string
(assessment OR training) AND ((cognitive AND (VR OR AR OR serious game OR exergame OR WebGL OR AI)) OR ((memory OR attention OR executive functions) AND (VR OR AR OR serious game OR exergame)))
We included only articles written in English, considering both peer-reviewed journal articles and conference or workshop proceedings. We excluded abstracts, editorials, book chapters and review articles, as well as articles whose full text was not available through our University. Finally, due to the constantly evolving nature of the technologies considered, the search was limited to the period from 2016 to the present. After the first search and selection phase, summarized in Fig. 1, we selected 235 works (N = 235; in the following, N denotes the number of works for the specific topic considered), including 197 journal articles and 38 conference proceedings and workshop articles.
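The boolean structure of the search string above can be made explicit with a short sketch. This is only an illustration, assuming a simple case-insensitive whole-word match on a title or abstract; the function names `has` and `matches_query` are hypothetical, and real bibliographic databases apply their own field rules and term expansion, so the result sets will differ.

```python
import re


def has(text: str, term: str) -> bool:
    """Case-insensitive whole-word match; an optional trailing 's' covers plurals."""
    pattern = r"\b" + re.escape(term) + r"s?\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def matches_query(text: str) -> bool:
    """Evaluate the survey's search string against one piece of text (assumed matching rules)."""
    # (assessment OR training)
    goal = has(text, "assessment") or has(text, "training")
    # (VR OR AR OR serious game OR exergame OR WebGL OR AI), paired with 'cognitive'
    tech_any = any(has(text, t) for t in
                   ("VR", "AR", "serious game", "exergame", "WebGL", "AI"))
    # (VR OR AR OR serious game OR exergame), paired with a specific cognitive function
    tech_core = any(has(text, t) for t in
                    ("VR", "AR", "serious game", "exergame"))
    # (memory OR attention OR executive functions)
    function = any(has(text, t) for t in
                   ("memory", "attention", "executive functions"))
    return goal and ((has(text, "cognitive") and tech_any) or (function and tech_core))


if __name__ == "__main__":
    print(matches_query("A VR serious game for cognitive training of older adults"))
    print(matches_query("Working memory training with WebGL"))
```

Note that, as the string is written, WebGL and AI only count when the term "cognitive" is also present, which is why the second example does not match.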
Papers included in this survey propose methodologies for the cognitive training (N = 151), assessment (N = 83) or both (N = 1) of heterogeneous target populations: elderly subjects and patients with age-related diseases (N = 110); subjects with long-term acquired disabilities caused by different pathological conditions, such as stroke, cancer, MS, epilepsy, traumatic and acquired brain injuries (N = 32), or by drug and alcohol abuse (N = 5);
During the categorization and data extraction phase, articles were classified considering the visualization and interaction technologies used as well as the chosen validation approaches, adopting the taxonomy further illustrated in Table 1.

III. VISUALISATION TECHNOLOGIES
Our search highlights a strong interest in VR technologies (N = 223), which has grown steadily in the last few years, see Fig. 2. The popularity of VR is due to its ability to reproduce realistic and ecologically valid two-dimensional or three-dimensional objects and virtual environments (VEs) the user can interact with, while allowing precise control of test administration and stimulus presentation, the recording of responses and the tracking of participants' performance over time [55]. Considering the level of immersion provided, VR setups can be classified into three main categories: non-immersive, semi-immersive and immersive. Non-immersive systems are based on the use of screens (computers, mobile devices, monitors and projectors). The VE is presented to the users without occluding their Field of View (FOV); hence, even if they feel involved and engaged in the task, the sense of being in the real world while interacting with the virtual one persists. Semi-immersive virtual experiences provide users with a partial VE through the use of driving simulators or multiple screens. The sense of presence is moderate, since users have the perception of being in a different reality while remaining connected to their physical surroundings. Finally, immersive systems comprise room-filling technologies, such as the CAVE or the CAREN High End, and head-mounted displays (HMDs), both standalone and tethered, e.g. the Oculus or the Vive products. They completely occlude participants' FOV, enhancing the sense of being physically present in the VE. Moreover, they often include tracking systems which, on the one hand, are essential for the correct functioning of the system and, on the other hand, can provide additional information on users' movements and behaviour inside the VE.
As shown in Table 2, non-immersive VR is still the most widespread visualization technology (N = 150), followed by immersive VR (N = 59). This holds despite the recent success of VR HMDs, the release of more affordable and better-performing devices, and the benefits of immersive VR in terms of multisensory stimulation, tracking of head and body movements and a higher sense of presence. The minor interest in semi-immersive VR systems (N = 4) can be explained by their high cost and by the fact that they require adequate space for installation and assistance during task execution.
The preference for non-immersive VR may be partially explained by the fact that these technologies are accessible, affordable and portable, which makes them ideal for remote use and home rehabilitation applications. Moreover, target populations mainly include children or adolescents (N = 10) and adults (N = 14), who are used to these technologies, or elderly users (N = 25), who can handle a computer or a tablet more easily than a headset for immersive VR. Home rehabilitation applications are usually videogames readily accessible through Web platforms (N = 25) or games installed directly on devices (N = 24) owned by the participants or provided by the experimenter. Even if studies comparing immersive and non-immersive solutions exist, results are not conclusive and strongly depend on the target population and on the task. Performance in assessing cognitive functions is usually comparable, whereas preference and usability results more consistently favour HMDs, in terms of increased motivation, more intuitive action control and greater enjoyment associated with task fulfillment. However, these studies are often conducted on young subjects; hence, even if results may encourage the use of immersive VR for the assessment and treatment of emotional or neurological disorders, they cannot be easily generalized to the senior population. In [55], researchers specifically address this problem and investigate the effect of the level of immersion (desktop screen and HMD) on seniors' and young adults' performance in a virtual supermarket shopping task. While the score of the young adult group remains stable regardless of the platform used, seniors' scores are higher in the non-immersive case, even if their experiences do not differ between the two platforms and only minimal and rare side effects are reported. Moreover, in both groups, trial execution with the HMD seems to be more influenced by fatigue.
Opposite results are obtained by the authors of [56], who demonstrate that a higher level of immersion can significantly improve inhibitory control and task switching. Similarly, in [57], two different games for training attention and working memory in children with Attention-Deficit/Hyperactivity Disorder (ADHD) are developed in two modes, immersive and non-immersive, and tested on healthy subjects. Electroencephalography (EEG) signals and gameplay data are recorded and analyzed as a measure of participants' cognitive abilities and of their changes over time. Better results are associated with the immersive trials.
A less explored alternative to VR is AR (N = 10), which integrates digital and physical information in real time and allows the user to interact with virtual and real worlds and objects at the same time. AR-related articles have been classified into two overarching categories, i.e. trigger-based versus view-based augmentation, see Table 2. As shown in Fig. 2, our search reveals a growing interest in trigger-based AR solutions, which include applications for AR see-through headsets and for mobile devices in which markers, body movements and locations are used to initiate the augmentation.
A comparison of the effects of VR and AR spatial memory training on short-term and long-term memory has shown that, even if VR outperforms AR in the immediate post-training test, AR is better suited for long-term spatial memory transfer [58]. Nonetheless, physical displacements have been shown to be important in acquiring spatial ability skills [59]. Solutions combining non-immersive VR and AR technologies also exist. The authors of [60] design a tool for screening initial dementia: participants visit a virtual cultural relic exhibition and, during the visit, have to complete a test by scanning the answer codes shown on a ''cognitive board'' with a mobile phone.

IV. INTERACTION TECHNOLOGIES
Interaction is another important factor to consider when designing applications in which participants can benefit from active learning and are asked to perform a specific task that requires interaction with the VE. Thus, intuitive interfaces should be promoted, in order to reduce learning time and optimize the effects of the training. However, the choice of the solution to be adopted is frequently bound by constraints, since all target populations exhibit cognitive and/or physical impairments, which should be taken into account during the design phase and the choice of the proper interface. Moreover, around half of the studies analysed address older adults, who are less experienced with Information and Communication Technologies (ICTs), may have expectations strictly anchored to mental models developed in their past experiences with certain tools and could lack some of the basic knowledge required to interact effectively with the technological solutions proposed.
Interaction modalities can be broadly classified into touchful and touchless, as shown in Table 3. In the first case, the user is required to handle a device and apply physical pressure to a surface to trigger events. In the second case, no physical interaction with the device is needed. This category includes different approaches for body posture and hand gesture detection and tracking, such as movement, electromyography (EMG) and vision-based techniques, as well as interfaces exploiting gaze or brain activation through eye trackers and wearable EEGs.
Touchful devices comprise the widely diffused tools for gaming (N = 181) and physiotherapy devices (N = 17), e.g. treadmills, cycle-ergometers and pressure-sensitive plates, which are particularly suitable for rehabilitation programs aimed at improving motor skills in subjects with age-related disorders [15], [50], [89], [95], [112], [147], [148], [149], [150], [204], [209], major neurocognitive disorders (MNCD) [23], stroke [116], [152] and multiple sclerosis (MS) [22]. In general, touchful interfaces ensure stability, reliability and effectiveness. Moreover, despite the low level of ICT literacy of some subjects, mice and keyboards have been widely used since the last century and can be considered familiar tools. Hence, they represent the most common solution for interaction, with a growing interest over the years, see Fig. 3. However, as interaction is achieved by pressing buttons and triggers or sliding fingers on a touchpad, which are not natural hand gestures, the transfer of skills acquired during training to daily life activities could be questionable.
Touchless approaches are often referred to as ecological, as they are designed to reuse existing skills through intuitive gestures requiring little cognitive effort. Moreover, they potentially allow for nearly unlimited input options, as they could theoretically exploit all the 27 DOFs of the human hand. However, they often present interaction and tracking challenges, as they are prone to occlusion, noisy reconstruction and noisy artifacts, which undermine their intuitiveness, stability and efficiency, causing frustration. This explains the increased interest of researchers over the years in the development of vision-based solutions, although to a lesser extent than for touchful solutions, as shown in Fig. 3. Less explored alternatives are eye tracking (N = 2) and EEG (N = 9) based interfaces. Eye tracking is used to monitor students with autism spectrum disorders (ASD) during the interaction with their educators and to help direct their attention [234]. Portable EEG devices, instead, are employed to create Brain-Computer Interfaces (BCIs), where signals detected by the EEG are translated into inputs for the application. These BCIs are designed with the aim of training attention and concentration in pre-adolescents [84] and in patients with ADHD [94], anxiety disorders [51] or Mild Cognitive Impairment (MCI) [73].
A minority of studies explore other solutions, namely verbal (N = 18) and robot-based (N = 6) interaction. In the first case, the user is usually asked to watch a video [64], [153], observe a scenario [47], [203] or navigate a city [68], [71], [86], [223] or a maze [199] and recall elements, pick up and correctly place objects in a house [193], [201], solve problems by thinking aloud [78], [229], recall a list of items before buying them [173], [218], listen to a list of words and recall them [83], or respond to visual or auditory targets [217]. The second case includes hand end-effector robotic devices [24], exoskeleton gait robots [116], [152] and humanoid robots programmed to substitute therapists during test administration and provide adequate feedback. For example, in [134] the Pepper humanoid robot administers a music-memory-based game, requiring MCI patients to recognise songs from their younger years; in [96] Pepper administers a MoCA-like psychometric assessment and shows its final score; in [126] a tablet is mounted on a Lego robot that moves its arms and legs when the activity is correctly completed.

V. VALIDATION
A. EXPERIMENTAL DESIGN AND DATA ANALYSIS
As highlighted by Fig. 4, games for training are usually validated using three main experimental designs, i.e. within subjects (N = 61), between subjects (N = 27) and randomized controlled trial (RCT, N = 39). While in the first case all participants follow the same training procedure, in the other two cases they are divided into groups and asked to follow two different training procedures, usually the traditional one and the videogame-based one, and results are compared. Test-retest reliability is often used for validation and is obtained by administering standard validated pencil-and-paper tests at different moments of the training schedule, namely before, after and at follow-up. Similarly, games for cognitive assessment are usually validated using a within subjects experimental design (N = 48) or by comparing healthy controls (HC) with patients (N = 25) or young with elderly healthy subjects (N = 2), whereas a between subjects experimental design is only used by [224] to compare different visualization technologies (immersive and non-immersive). In general, participants are asked to play the game, and their performance is correlated with the results of standard pencil-and-paper tests. High correlations suggest the construct validity of the proposed solution, which could become an alternative to standard assessment methods with the advantage of allowing for ecological assessment in a controlled environment. The increasing number of participants involved in the experimental sessions, as shown in Fig. 5, can be an indicator of the research community's interest and efforts to develop solutions for cognitive assessment and training, as well as of the level of progress towards actual application in standard clinical practice.
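The correlation step described above can be sketched as follows. This is a minimal illustration, assuming Pearson's correlation coefficient is the chosen measure (individual studies may rely on other statistics, e.g. rank correlations); the two score lists are hypothetical values, not data from any surveyed work.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical scores for six participants: in-game score vs. pencil-and-paper test
game_scores = [12, 15, 9, 20, 17, 11]
paper_scores = [22, 25, 19, 29, 27, 21]

if __name__ == "__main__":
    print(f"r = {pearson_r(game_scores, paper_scores):.3f}")
```

A coefficient close to 1 on real data would support the construct validity argument, although sample size and significance testing would also have to be reported.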
In fact, even if studies presenting new prototypes, applications at an initial development phase and user tests, which usually involve fewer than 20 participants, are common (N = 77), there is an increasing number of works that describe solutions at the validation phase, which typically involve a larger number of participants (N = 158). Representative samples in combination with large sample sizes are fundamental when the goal is to collect informative data and generalize the results of a study.

B. DATA COLLECTION AND ANALYSIS
During experimental sessions, a large amount of heterogeneous data can be collected; however, as highlighted by Fig. 6, interest is mainly focused on gameplay data and standard cognitive test results. Gameplay scores and parameters or log data are recorded for the evaluation of performance and the monitoring of user actions in the VE. These scores are usually compared with the results of standard cognitive pencil-and-paper tests or, rarely, with performance obtained in real-life tasks. In recent years, authors have become more interested in using a user-centered design approach, and validated questionnaires on usability, presence, workload or simulation sickness have emerged as key tools. Behavioural, observational, physiological and kinematic data, useful to monitor patients' motor skills, task workload and the possible onset of negative side effects when using a certain technology, have instead always been poorly exploited, although recordings can be easily acquired using non-invasive methods, which do not interfere with task execution or reduce the sense of presence. For example, vision-based approaches, such as motion tracking systems and cameras, can provide kinematic information on body movements, postures and gait, while wearable sensors, e.g. armbands, bracelets or EEG headsets, make it possible to obtain physiological measurements, such as heart rate, skin conductance or brain activation signals. Table 5 proposes a classification of articles based on the goal of the intervention, the experimental design, the acquired data and the number of participants, which summarizes the previously reported results.
In some cases, data acquired during gameplay are used in combination with ML techniques for the classification of pathological and non-pathological subjects, in particular concerning age-related diseases [79], [86], [115], [130], [196], [208] and ADHD [188]. Since some pathological conditions can take long periods before being actually diagnosed with the means currently available, an early diagnosis through prediction methods allows prompt intervention, preventing or limiting the onset of severe symptoms and debilitating conditions. Digital biomarkers, i.e. kinematic, gameplay, clinical and neuropsychological data, can be used for the creation of predictive models for the identification of MCI pathological patterns [220], for scoring elderly subjects' brain ability [201] or for the clinical developmental assessment of preschoolers [138]. The authors of [143] use Reinforcement Learning to train bots to generate synthetic data plausibly emulating a large population of players at various stages of learning or, conversely, at various levels of cognitive decline. Subsequently, a prediction model is applied to new gameplay data to classify different levels of play. These examples suggest the feasibility of adopting ML techniques in biomedical applications where the number of patients is small, symptoms can be non-homogeneous and the complexity of the setting can be challenging.
VOLUME 10, 2022
In other cases, ML techniques are adopted for the runtime analysis of data. In [47], two supervised algorithms, namely random forest and support vector machine, analyse verbal and non-verbal input. Outputs are employed both for the classification of MS and Parkinson's disease subjects and for the recognition and automatic verification of the given verbal answers. The authors of [78] develop a multi-label classification model for driven automatic assessment to track and assess the cognitive and emotional states of individuals with ASD during VR-based training.
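To make the idea of classifying subjects from gameplay data concrete, a minimal sketch is given below. It deliberately swaps in a nearest-centroid classifier, which is far simpler than the random forest, support vector machine or other models used in the cited works, and it is not the method of any of them. The two features (mean reaction time in seconds, number of errors), the labels and all values are hypothetical.

```python
from math import dist


def fit_centroids(samples, labels):
    """Compute the mean feature vector (centroid) of each class."""
    sums, counts = {}, {}
    for features, label in zip(samples, labels):
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, features):
    """Assign the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Hypothetical gameplay features: (mean reaction time in s, number of errors)
train_samples = [(0.6, 2), (0.7, 3), (1.4, 8), (1.5, 9)]
train_labels = ["control", "control", "patient", "patient"]

centroids = fit_centroids(train_samples, train_labels)

if __name__ == "__main__":
    print(predict(centroids, (0.8, 3)))
    print(predict(centroids, (1.3, 7)))
```

In a real study the features would need to be standardized, since unscaled distances are dominated by the feature with the largest range, and the model would be evaluated on held-out subjects.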
In [99], the authors describe the implementation and efficacy of a linear regression model for the assessment of intrinsic motivation and performance using the values of the Big Five personality traits (openness to experience, conscientiousness, extraversion, agreeableness and emotional stability).
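As an illustration of the kind of model mentioned above, the following sketch fits an ordinary least squares line with a single predictor. It is a simplification under stated assumptions: the model in [99] uses several personality traits as predictors and its exact formulation is not given here, whereas this example relates one hypothetical trait score to one hypothetical motivation score. The function name `fit_line` and all values are invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor must not be constant")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: one trait score (1-5 scale) vs. an intrinsic motivation score
trait_scores = [2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
motivation_scores = [3.1, 3.4, 4.2, 4.4, 5.1, 5.3]

if __name__ == "__main__":
    slope, intercept = fit_line(trait_scores, motivation_scores)
    predicted = slope * 3.2 + intercept  # prediction for a new participant
    print(f"slope={slope:.3f}, intercept={intercept:.3f}, predicted={predicted:.3f}")
```

Extending the sketch to all five traits would require a multiple regression, typically solved with a numerical library rather than by hand.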
Finally, considering the effective use of the proposed solutions with pathological subjects, patients involved in the experimental sessions usually have a diagnosed disease and are recruited from hospitals, clinics, community dwellings, specialized day care centers or retirement homes. This implies a collaboration between these structures and research groups; however, the nature and the duration of this collaboration are often unclear, even if specialized research centers having stable and long-lasting collaborations with health facilities exist (12.3%). For this reason, it is difficult to understand the practical feasibility and usability of the proposed methodologies in hospitals, rehabilitation centers and day care centers.
Nonetheless, as reported in Table 6, we could find some clinics (10.2%) which have successfully integrated the new technological approaches in the treatment of their patients.

VI. CONCLUSION
The current survey aims at providing an analysis of the research on SGs and EGs for the assessment and training of cognitive dysfunctions related to different pathological conditions, conjointly with technological advancements in visualization and interaction technologies, platforms for game deployment and AI, in the last five years. After the first search and selection phase conducted in high-profile databases, we categorized articles and extracted data according to the classification shown in Table 1.
Considering RQ1, the analysis of the visualization technologies highlights a greater and growing interest in VR (94.9%), in particular non-immersive (67.2%) and immersive VR (28.5%), than in AR (4.3%). The prevalence of non-immersive over immersive VR, despite the recent success and widespread diffusion of immersive VR HMDs, may be explained by the affordability and portability of PCs and mobile devices and by their accessibility and ease of use, even in the absence of supervision, which make them particularly suitable for home rehabilitation and for applications addressing target populations with a low level of ICT literacy. This could also explain why researchers are becoming more interested in trigger-based AR solutions for mobile devices.
Regarding RQ2, touchful interaction modalities have been the most widespread over the last five years (84.3%), whereas among the touchless solutions a growing interest in vision-based techniques (18.3%) has been found. Considering touchful interaction approaches, half of the solutions found use standard gaming tools (51.5%), e.g. mice and keyboards, controllers, joysticks or gamepads, whereas only a minority of works try to combine standard physiotherapy tools (7.2%) or exoskeleton gait robots and end-effector robotic devices (1.3%) with videogame-based training. Among the touchless interactions, excluding vision-based techniques, the application of alternative approaches, such as eye tracking (0.9%) and EEG for attention training (3.8%), is still limited, even if the recent integration of eye trackers in newer HMDs (such as the HTC Vive Pro Eye) could encourage researchers to further investigate their application in the cognitive neuroscience field. Finally, the application of AI techniques for speech recognition and eventual interaction with chat bots or virtual characters (0.4%) or humanoid robots (1.3%) is still poorly exploited, since human therapists still play a prominent role (7.2%).
Concerning RQ3, the most widespread experimental designs for the validation of the designed solutions are within subjects (26%), RCT (16.6%) and between subjects (11.5%) for training, and within subjects (20.4%) and comparison of healthy controls and patients (10.6%) for assessment. During experimental sessions, only gameplay data (63.4%), standard cognitive tests (66%) and validated questionnaires (22.6%) are ordinarily used, whereas physiological (13.2%) and kinematic (5.1%) data are employed by a minority of the works, although they can be easily recorded through non-invasive methods and could provide additional objective quantitative information on pathological conditions, performance and workload. Moreover, we highlight an increasing interest over the years in the development of validated applications aiming at becoming a standard in clinical practice, since the majority of the proposed solutions have been tested on a number of participants between 20 and 60 (37.9%) or higher than 60 (29.4%). However, the number of subjects/patients effectively using such solutions and their actual adoption by healthcare structures (10.2%) or research centers having stable and long-lasting collaborations with health facilities (12.3%) are still limited. Although results are promising and the literature is rich, the exchange and sharing of data and applications within the large community of researchers and medical professionals is limited. In fact, datasets are sometimes available online (0.9%) or can be obtained from the corresponding author on reasonable request (9.8%), whereas implemented applications are rarely freely accessible, i.e. as Web applications (1.3%), freely downloadable (2.1%) or open source (0.4%), which hampers the real impact of the research works.
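The percentages reported for the experimental designs can be related to the work counts given in Section V-A with a short calculation. The sketch below assumes that each share is taken over the 235 selected works, an assumption with which the reported figures are consistent; the dictionary keys and the function name `share` are illustrative only.

```python
TOTAL_WORKS = 235  # number of works selected in Section II

# Work counts per experimental design, as reported in Section V-A
design_counts = {
    "training, within subjects": 61,
    "training, RCT": 39,
    "training, between subjects": 27,
    "assessment, within subjects": 48,
    "assessment, healthy controls vs patients": 25,
}


def share(count, total=TOTAL_WORKS):
    """Percentage of works, rounded to one decimal place."""
    return round(100 * count / total, 1)


if __name__ == "__main__":
    for design, count in design_counts.items():
        print(f"{design}: {count}/{TOTAL_WORKS} = {share(count)}%")
```

Under this assumption, the counts reproduce the shares quoted for RQ3 (26%, 16.6%, 11.5%, 20.4% and 10.6%, respectively).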