Facial Expression Rendering in Medical Training Simulators: Current Status and Future Directions

Recent technological advances in robotic sensing and actuation have prompted the development of a range of new medical training simulators with multiple feedback modalities. Learning to interpret a patient's facial expressions during medical examinations or procedures has been one of the key focus areas in medical training. This article reviews the facial expression rendering systems in medical training simulators that have been reported to date. Facial expression rendering approaches in other domains are also summarized so that knowledge from those works can be incorporated into developing systems for medical training simulators. Classifications and comparisons of medical training simulators with facial expression rendering are presented, and important design features, merits and limitations are outlined. Medical educators, students and developers are identified as the three key stakeholders involved with these systems, and their considerations and needs are presented. Physical-virtual (hybrid) approaches provide multimodal feedback, render facial expressions accurately, and can simulate patients of different age, gender and ethnic groups, making them more versatile than purely virtual or purely physical systems. The overall findings of this review and the proposed future directions will benefit researchers interested in initiating or developing facial expression rendering systems for medical training simulators.


I. INTRODUCTION
Retrospective medical record reviews in British hospitals [1] show that approximately 10% of patients admitted to a hospital experience unintended injuries caused by medical management. A report by the Institute of Medicine [2] showed that up to 98,000 hospital deaths occur in the United States of America each year as a result of medical errors. Enhancing the techniques available to train medical professionals to integrate multiple sources of information and draw correct conclusions can reduce such accidents. Some physical examination procedures, such as palpation [3], require years of experience for general practitioners (GPs) to acquire the right motor skills, perceptual strategies, and therapeutic attitudes [4], [5].
The associate editor coordinating the review of this manuscript and approving it for publication was Shiqing Zhang.

Effective education of practical skills can positively influence clinician behaviour, reducing the risk of patient harm [6], [7]. Traditional teaching methods, such as live demonstrations followed by students practising under tutor supervision, are often perceived as ineffective due to class time being consumed by teaching and demonstration of practical skills, the availability of teaching staff, and resource considerations [8].
Medical procedures such as physical examinations involve physician-patient interaction, a process that involves multiple feedback modalities and is affected by the intrinsic and extrinsic factors of both parties. Intrinsic factors such as the patient's pathological conditions and mood states, and extrinsic factors such as gender, age, and ethnicity, have been shown to affect how the patient expresses the feeling of pain [10]. Fig. 1 shows an overview of this physician-patient interaction system during physical examination (adapted from Prkachin, 1994 [9]), where the interaction points occur between phases 2 and 3, and between phases 5 and 1.
Physicians can learn some of the extrinsic factors from the patient's appearance and medical history, whilst some of the intrinsic factors that directly relate to the patient's current health conditions remain unknown. Physicians attempt to engage with the patient's experience and the encoding process by decoding and interpreting verbal response, and combine with the visual and haptic feedback to determine the unknown factors to make diagnoses.
Visual feedback, including static information conveyed by the patient's general appearance and dynamic information such as changes in the patient's facial expressions, communicates more information than verbal and vocal messages [11]. Mehrabian et al. [12] found that 55% of the total impact of a message comes from the non-verbal component of communication, and Pease [13] found that non-verbal messages can be conveyed by a person's general appearance, personal hygiene, clothing, and so on. Visual perception of facial expressions during primary medical examinations such as manual palpation is often used as a feedback modality to assess a range of medical hypotheses for diagnosis. When patients experience discomfort or pain, they communicate their feelings and emotions through verbal and non-verbal channels, including facial expressions [3]. Pain is one of the most commonly exhibited reactions during medical examinations. It can be communicated through facial expressions, which have been the most direct means for humans to communicate their feelings and emotions since infancy [14]. When properly interpreted, the information about the patient's pain conveyed to the examiner by facial expressions is sometimes more accurate than verbal responses [15]. If the observer wrongly diagnoses the severity of a patient's pain, however, it could result in mistreatment or even mortality [16]. The intrinsic and extrinsic factors of both the patient and the physician increase perceptual subjectivity when perceiving visual feedback. It is therefore important for students to experience treating patients of different demographics and backgrounds, as misjudgments of visual feedback due to subjectivity could result in distrust between patients and health workers [17], [18].
Simulation-based education (SBE) has been used frequently in medical training in recent years, as it provides a safe and effective learning environment for students [19]. Its central tenet is letting students make and learn from mistakes and errors, which in clinical settings would be prevented or immediately terminated to protect the patient [20], [21]. Using SBE, the causes of mistakes made during training can be reviewed openly and without liability, allowing students to confront their mistakes, recognise the importance and value of the experience, and possibly improve the quality of event reporting [20], [22]. Standardized patients (SPs), physical training simulators such as manikins, and computer-based or virtual training simulators [23] are the main SBE types employed by medical educators. They range from low to high fidelity and serve different medical specialties such as internal medicine, emergency medicine, surgery and dentistry.
SPs are healthy people acting as a patient in a consistent and standardized manner [24]. They are the highest-fidelity simulators but are difficult to train, and they cannot exhibit desired physiological conditions and symptoms at will. Moreover, it is difficult to find SPs to represent some demographic groups, such as children and infants [25]. Physical and virtual training simulators do not generate all the types of feedback SPs do. Haptic, auditory, and visual feedback are often regarded as the most important modalities for learning and training medical operations. Many physical training simulators, such as manikins, are designed to provide accurate haptic feedback to help improve clinicians' motor skills, and many guidelines and design considerations have been provided for haptic response and haptic visualization [26]. Manikin simulators with haptic feedback can be classified by their fidelity: low-fidelity manikins [27] are used to train specific tasks or procedures, while high-fidelity manikins [28] can simulate a greater variety of medical conditions. Although manikin simulators are widely used in medical training and students are more comfortable using them than SPs [29], they cannot easily be customised to simulate patients of different demographic characteristics. On the other hand, systems such as computer-based simulators and virtual patients offer great flexibility in simulating patients of different sex, age, and ethnicity, but they often lack compatibility with physical sensing and haptic feedback.
As previously emphasised, it is important to render accurate facial expressions in medical training simulators irrespective of the types. An overview of the implementation of a medical training simulator with visual feedback through facial expression rendering is shown in Fig. 2A. This closed-loop process has two main interaction points: facial expressions generated by the simulator and perceived by the student, and the subsequent reaction generated by the student and exerted on the simulator. The facial expression rendering system can encode inputs from the sensors embedded in the simulator and display real-time dynamic facial responses. Fig. 2B shows the stakeholders involved in this system. Researchers and developers should consider both the educator and student needs when designing the system.
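The closed-loop process described above can be sketched in code. The following is a minimal illustration, not a description of any particular simulator: it assumes a force-sensing palpation surface, a hypothetical linear force-to-pain mapping, and arbitrary expression thresholds.

```python
# Minimal sketch of the closed loop in Fig. 2A: a sensed input from the
# simulator (here, palpation force) is encoded into an expression command
# that the rendering system can display in real time. The full-scale force
# and the thresholds below are illustrative assumptions only.
MAX_FORCE_N = 20.0  # assumed full-scale sensor reading, in newtons

def encode(force_newtons: float) -> dict:
    """Encode a sensed palpation force into a rendering command."""
    pain = max(0.0, min(1.0, force_newtons / MAX_FORCE_N))  # normalise to [0, 1]
    if pain < 0.2:
        expression = "neutral"
    elif pain < 0.6:
        expression = "discomfort"
    else:
        expression = "pain"
    return {"expression": expression, "intensity": round(pain, 2)}
```

In a real system, the output command would drive either virtual blendshapes or physical actuators; the encoding step is where the simulator designer injects the patient model (condition, demographics, pain sensitivity).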
The primary focus of this article is to review facial expression rendering techniques used in medical training simulators that have been reported to date, and to identify important design features, merits and limitations of these systems. Even though there are many reviews of SBEs [19], [30] and medical training simulators with feedback methods such as haptic feedback [31], [32], to the best of our knowledge, no attempts have yet been reported on in-depth reviews of facial expression rendering approaches in medical training simulators. A timely written review paper on facial expression rendering in medical training simulators may help, not only to identify the current status of the research, but also to provide information to future researchers interested in developing such systems.
The paper is organized as follows: Section II presents the methodology used to conduct the literature review. Section III presents an overview of facial expressions, specifically pain expressions, as they are one of the most common and useful feedback modalities of the patient during physical examinations, and of facial expression models. Section IV reviews existing facial expression rendering systems in medical training simulators, with classifications and comparisons. Section V presents a brief review of facial expression rendering approaches in other research domains to explore the potential of incorporating knowledge from these works into developing facial expression rendering systems for medical training simulators. Sections VI and VII summarise the findings of this review and conclude the paper with a discussion of design considerations for facial expression rendering systems in medical training simulators and potential future directions.

II. METHODOLOGY
The focus of the literature review is to identify and summarize past and current facial expression rendering methods used in medical training simulators. The literature review in this article is a combination of qualitative review [33] and narrative overview [34]. Literature directly related to this topic and literature from other domains are included to make the review more comprehensive. The inclusion criteria for the qualitative literature review were based on keyword searches for ''medical training simulators with pain expressions'', ''eye movements'', and ''facial expressions'' using Google Search and Google Scholar. Both commercial products and research work directly related to those keywords were included as primary sources for the review. Software solutions and facial expression databases were also included as supporting materials. The exclusion criteria removed alternative versions of the same search result, as well as works related to ''facial expression recognition'' and ''emotion recognition'', as the focus of this article is on facial expression generation rather than recognition. The selected results were then classified into physical, virtual and hybrid systems, with a narrative overview of each system.

III. FACIAL EXPRESSIONS
Many of the practical skills clinical students need to acquire are based on sensory-motor coordination, which consists of generating, accessing and evaluating useful sensory information through actions. The key difference in diagnostic skill between an experienced doctor and a student is the ability to focus on, identify and distil the perceptual and haptic information that can justify hypotheses in a short time. Therefore, medical simulators should prompt students to use active perception in conjunction with information gained from haptic feedback, and this can be done by incorporating visual feedback into the system.
Visual feedback via facial expressions is an important source of information in medical examination. It is one of the most direct methods for the patient to communicate emotions and feelings, sometimes involuntarily [15]. Needless to say, understanding facial expressions is particularly important when the patient is unable to utilise verbal communication [35]. However, the challenge is to train a physician to extract examination-related information out of a complex repertoire of expressions with subjectivity and biases. In this section, we will explore theoretical frameworks of facial expression generation, common features of facial expression across ethnic groups, the facial expressions of pain, and frameworks for modelling facial expressions.

A. MECHANISMS OF FACIAL EXPRESSION GENERATION
Emotions are typically conveyed via facial expressions, voice intonation, gestures, postures and behaviour [37]. In everyday life, people differentiate between a variety of emotional episodes, such as resentment, guilt, sadness, anger, or happiness [38], and rely on the idea that faces reveal what others feel inside (i.e., emotional expressions), as well as on the information inferred from facial expressions (i.e., emotion perception) [39]. The vast majority of the literature in psychology and neuroscience has relied on the common or classical view of emotions, assuming they have distinct boundaries and neural pathways observed in the brain and body, representing reactions to behavioural or contextual stimuli [40]. For example, emotions are presumed to be ''read'' via prototypical facial configurations representing discrete universal categories such as anger, fear, disgust, happiness, sadness and surprise [39], [41], [42]. Researchers often use the Facial Action Coding System (FACS), a scheme that allows various parts of the face to be combined into emotional prototypes, with participants asked what the person in a picture is feeling. For example, the pain expression may include brow lowering, skin drawn tightly around the eyes, and a horizontally stretched open mouth with deepening of the nasolabial furrow, while anger includes furrowed brows, wide eyes, and tightened lips [43], [44].
Nonetheless, assumptions about configurations of emotional expressions have not been reliably replicated across samples, cultures, or situations/contexts [39]. In terms of other measurement modalities, two recent large meta-analyses on emotions and brain locations indicate large discrepancies between individuals, as emotions such as disgust, sadness, anger, fear and happiness cannot be pinpointed to activity in specific regions of the brain using classical measurement tasks [45], [46]. In addition, the experience of emotional events (a common way to identify emotional states) does not account for individuals who experience emotional events without direct awareness, nor for those who struggle to find words to express and describe their own and others' emotions, such as alexithymics [39]. Despite an exponential increase in applications modelling human emotions in areas such as affective computing, these models provide low predictive capability, often resulting in inaccurate emotion detection in computerised systems [47], [48].
The constructionist view is an opposing framework in which emotions represent dimensions rather than categories. Commonplace mental states, including perceptions, cognitions, and emotions, are assumed to be constructed from a combination of fundamental psychological operations [49]. Viewed as so-called core affect, emotions can be described as a linear combination of two underlying, largely independent neurophysiological systems known as valence and arousal (depicted in a circumplex Cartesian space in Fig. 3, where the inner circle shows a schematic map of core affect and the outer circle shows where several prototypical emotional episodes typically fall). The valence system determines the degree to which an emotion is pleasant or unpleasant, and the arousal system determines the degree to which it is behaviourally activating. Consequently, a 90-degree separation (e.g., ''happy'' and ''tense'') represents independent affective experiences, whereas a 180-degree separation (e.g., ''happy'' and ''sad'') represents opposite ones. Importantly, experiences of emotions are considered to be ambiguous, overlapping sensations that are the product of activity in neural pathways subserving valence and arousal, but which become contextualized and classified through the processing of relevant situational, historical, behavioural, and physiological cues that people use to safely navigate the world [50]. Thus, rather than being 'caused' by external or internal events, emotional experiences are instead predicted based on information from core affect and context [51]. In this view, the brain is a probabilistic prediction processor or engine [52], and core affect can be used to predict the precision (or granularity) of emotional experiences [50].
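The geometric interpretation of the circumplex can be made concrete with a short sketch. The coordinates below are hypothetical placements of prototypical episodes in the valence-arousal plane for illustration only, not empirically fitted values.

```python
import math

# Illustrative core-affect (circumplex) coordinates: (valence, arousal),
# each in [-1, 1]. These placements are assumptions for this sketch.
CORE_AFFECT = {
    "happy": (0.8, 0.4),    # pleasant, moderately activating
    "tense": (-0.4, 0.8),   # unpleasant, highly activating
    "sad":   (-0.8, -0.4),  # unpleasant, deactivating
    "calm":  (0.6, -0.5),   # pleasant, deactivating
}

def angular_separation(a: str, b: str) -> float:
    """Angle in degrees between two episodes in the valence-arousal plane.
    ~90 degrees indicates largely independent affective experiences;
    ~180 degrees indicates opposite ones."""
    va, aa = CORE_AFFECT[a]
    vb, ab = CORE_AFFECT[b]
    ang = abs(math.degrees(math.atan2(aa, va) - math.atan2(ab, vb)))
    return min(ang, 360.0 - ang)
```

With these coordinates, ''happy'' and ''sad'' sit 180 degrees apart (opposite affects), while ''happy'' and ''tense'' sit 90 degrees apart (independent affects), matching the description above.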
One illustration of this is that viewing Serena Williams celebrating a Wimbledon victory allows viewers to identify her joy (a form of happiness) only after viewing the full picture, with a tennis racket and in full gear (i.e., with contextual information indicating a win). Evaluating a close-up of her face without knowing the context would produce an entirely different emotional evaluation, such as that of rage [48]. This ambiguity is also seen in people with mental health problems and in chronic pain [53], [54]. Thus, emotions should be modelled as likelihoods of emotional reality rather than as reactions to environmental stimuli. This has direct implications for the computational modelling that forms the basis for accurate emotion mapping of pain, as the lack of universality in emotional expression as well as the context of medical examination need to be taken into account.

B. FACTORS UNDERPINNING THE DIVERSITY OF FACIAL EXPRESSIONS
The current literature in psychology and neuroscience shows large inconsistencies in findings on the foundation as well as the universality of human emotions, with growing work indicating that people are not always good at recognising emotions from faces without prior contextual clues [51]. In addition, problems with reliability, specificity, generalizability, and validity are widespread in studies based on the view that emotions are categorical [39].
For instance, a seminal study by Ekman and Friesen [55] identified commonality across different cultures for six displayed emotions: happiness, sadness, fear, disgust, surprise and anger, suggesting that understanding a person's emotions and feelings by observing facial expressions is a reliable method when there are barriers to verbal communication. Izard [56] supported this idea of universality in facial expression and took inspiration from Tomkins [57], who proposed the Discrete Emotion Theory, stating that there is a limited number of pancultural basic emotions.
Nevertheless, the authority of this work has been questioned, as subsequent studies removing contextual clues have found less congruence across cultures in emotion perception from facial information [39]. For instance, there are differences in the interpretation of certain emotional expressions. Despite the assumption that emotions such as anger will be easily identifiable by participants using behavioural and/or physiological measures (e.g., via facial or auditory expressions, or brain scans), there is low correspondence across such measures to reliably indicate a universal instance of emotions in the brain or body [40]. Comparing East Asian and Western cultures, Chen et al. [58] found differences in perceptual discrimination of facial expressions of pain and orgasm based on facial actions. Such inconsistencies are also evident in recent technological applications (e.g., Microsoft's Emotion API), with no emotion recognition algorithm thus far reliably predicting either people's emotion expressions or perception from facial information alone [39].
In sum, facial expressions can be viewed as a communication medium with subjectivity embedded in the observer's perception, caused by differences in demographic attributes such as gender, age, and ethnicity [59]–[61]. In medical training, therefore, trainees should be exposed to patients with a great variety of demographic characteristics to increase perceptual objectivity.

C. PAIN EXPRESSIONS
Pain is defined as ''an abstract concept which refers to (1) a personal, private sensation of hurt; (2) a harmful stimulus which signals current or impending tissue damage; (3) a pattern of responses which operate to protect the organism from harm'' [62]. Clinicians and doctors rely on pain as a feedback modality to adjust their examination procedures accordingly to minimise the discomfort the patient experiences [63].
Pain expressions have been identified as common and useful feedback to the clinician. They are subjective by nature, and the intensity of pain has been shown to be measured more accurately by analyzing facial expressions than by the patient's self-reports [15], [63], [64]. It is crucial to understand how pain is generated, expressed, observed and understood: if the observer wrongly diagnoses the severity of a patient's pain, it could result in mistreatment or even mortality [16]. Pain intensity can be measured using the verbal rating scale (VRS-4), the visual analog scale (VAS) and the numeric rating scale (NRS-11). Breivik et al. [65] showed that VRS-4 was less sensitive than VAS and NRS-11, which showed similar sensitivity to pain intensity; such limitations of self-report scales highlight the importance of analyzing facial expressions when evaluating pain. There are also pain scales for assessing children's pain, such as the Faces Pain Scale-Revised [66] and the Wong-Baker FACES Pain Rating Scale [67]. These scales are based on facial expressions, as children are less able to express their feelings verbally.
Pain can be communicated and interpreted via nonverbal communication methods. Scherer [68] presented nonverbal communication as a three-step process: (A) an internal state or experience; (B) the encoding of A into expressive behaviours; and (C) observers drawing inferences about the sender's experience. Ekman's Neurocultural Theory [69] suggested that the facial affect program (a one-to-one mapping between felt emotion and displayed facial expression) is the same for all people in all cultures, but that people use ''management techniques'' to control, and sometimes override, the operation of this universal facial affect program in certain social settings. Prkachin included social stimuli in Rosenthal's model and proposed a general model of a pain episode [9] composed of three stages: experience, encoding and decoding. This model outlined the variability of the relationship between the intensity and quality of pain and one's expression of pain.
Many researchers have evaluated how pain is translated into facial expressions. Chapman and Jones [70] evaluated reactions to a heat pain stimulus. They found similarities between people's facial expressions, such as contraction of the eyelids, and noted that the majority of subjects were unable to constrain that muscle movement when asked to. Prkachin [16] also found similarities in facial features when people experience pain.
Therefore, medical training simulators should be able to render facial expressions of pain with a high level of similarity to human expressions in order to stimulate the trainee's visual information integration process. This can be achieved by synthesising specific facial features and muscle groups. The next section introduces systems and frameworks for describing facial expressions.

D. FACIAL EXPRESSION MODELS
As previously mentioned in Section III-A, FACS was one of the first frameworks to systematically decode facial expressions. FACS divides common facial expressions into smaller movement primitives known as action units (AUs), allowing facial expressions to be analyzed quantitatively. The Emotional Facial Action Coding System (EMFACS) was later developed to improve the computational efficiency of FACS. There are also other facial action coding systems. Izard created the Maximally Discriminative Facial Movement Coding System (MAX) [86], which is commonly used to examine infants' facial expressions. The System for Identifying Affect Expressions (AFFEX) [87] is another facial action coding system for children, which makes direct judgements of the affective meaning of patterns of facial muscle movements in infants and young children. Ekman's system is more quantitative and was later widely used in automatic facial expression recognition systems, whereas Izard's is more qualitative and has been more commonly used in medical training.
Many models and systems for analysing and generating facial expressions have been developed using facial action coding frameworks, data analysis and image processing techniques. Decoding means ''to retrieve information from one's expression'' [15], and decoding methods can be classified into two categories: subjective and objective [88]. Objective methods rely on taking measurements and evaluating numerical differences to determine changes in, and states of, various facial expressions. Many objective methods have been developed using signal processing and data analysis methods, as summarised in Table 1. Subjective methods correspond to various facial nerve grading systems, the most common being the House-Brackmann facial nerve grading system [89]. Subjective methods are considered less reliable than objective methods due to variations caused by observer bias, and they are not commonly used in facial expression decoding systems.
Researchers found that the pain expression can be identified as a combination of at least four AUs using FACS. Prkachin [90] used FACS to evaluate people's responses to a broad range of pain stimuli and found a significant increase in four AUs with pain intensity across all categories of pain: brow lowering, orbit tightening, upper-lip raising/nose wrinkling, and eye closure. Craig and Patrick [91] conducted what is known as the cold pressor test and found six AUs associated with pain. Craig et al. [92] analyzed all the AUs that activate during the pain experience and found that lower facial actions are more likely to occur during severe pain. Table 2 summarises some of the significant work and findings on pain stimulus, intensity and facial expression to date. Fig. 4A shows that some AUs appear more frequently in pain expressions than others; the colours classify AUs based on their frequency of appearance in previous studies. The result shows that AU4, 6 and 10 are the most likely to be present in the pain expression, a finding comparable to those of other researchers. Prkachin and Solomon [97] found that AU4, 6, 7, 9, and 10 are sufficient to exhibit the pain expression at varying intensity levels. Chen et al. [58] found AU4, 6, 9, 10, and 20 to be the most frequent AUs for pain expression in Western and East Asian cultures. This plot also suggests that the simplest pain expression model can be built by synthesising AU4, 6, and 10, as shown in Fig. 4B. By using different numbers of AUs, researchers and developers can build systems of different degrees of realism, though the individual activation intensities and activation orders should be experimentally tested for dynamic simulations.
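The simplest three-AU pain model described above can be sketched as a pair of mappings between a normalised pain intensity and AU activations. This is an illustrative sketch only: the equal-weight linear mapping is an assumption for demonstration, not a validated pain metric, and the FACS 0-5 intensity convention is applied uniformly to all three AUs.

```python
# Illustrative sketch of the simplest pain-expression model suggested
# above: synthesising AU4 (brow lowering), AU6 (orbit tightening) and
# AU10 (upper-lip raising). AU intensities follow the FACS convention
# of 0 (absent) to 5 (maximum). The equal weighting is an assumption.
PAIN_AUS = ("AU4", "AU6", "AU10")

def pain_expression(intensity: float) -> dict:
    """Map a normalised pain intensity in [0, 1] to AU activations."""
    level = max(0.0, min(1.0, intensity)) * 5.0
    return {au: round(level, 2) for au in PAIN_AUS}

def pain_score(aus: dict) -> float:
    """Inverse direction: summarise AU activations as one normalised
    pain estimate in [0, 1]."""
    total = sum(aus.get(au, 0.0) for au in PAIN_AUS)
    return total / (5.0 * len(PAIN_AUS))
```

A richer model would extend `PAIN_AUS` with AU7, AU9 or AU20 and assign per-AU weights and onset timings, which, as noted above, should be tuned experimentally for dynamic simulations.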

IV. MEDICAL TRAINING SIMULATORS WITH FACIAL EXPRESSION RENDERING
The facial expression rendering systems designed for medical training simulators range from fully virtual to fully physical, as classified in Fig. 5. Virtual systems rely on media such as computer monitors, televisions or projections on flat screens to render facial expressions. Physical systems can be active manikins or robotic heads/faces that render facial expressions mechanically. Hybrid facial expression rendering systems consist of both virtual and physical components to synthesize facial expressions. Thus far, a number of different approaches and solutions have been proposed for rendering facial expressions in medical training simulators, as summarized in Table 3.

A. PHYSICAL FACIAL EXPRESSION RENDERING SYSTEMS
Fig. 4. (A) Frequently appearing AUs in pain facial expressions, based on the findings in Table 2: AU4, AU6 and AU10 were the most frequently used AUs to represent pain facial expressions. (B) Rendered facial expressions of pain with different AU activation intensities. Facial muscle groups driven by the assigned AUs are highlighted by the ellipses.

Full-body manikins can simulate many different symptoms. SimMan 3G [28] from Laerdal Medical, an adult patient simulator for advanced training in emergency care, and Pediatric HAL [99] from Gaumard Scientific, a pediatric patient simulator, are established commercial manikin simulators. Both simulators render facial expressions mechanically and can simulate a variety of neurological and physiological symptoms. HAL can simulate emotions through dynamic facial expressions, movement and speech, whereas SimMan 3G is limited to eye movements only. Although only eye movements are implemented as SimMan's facial expression feedback, the state of the eyes is a critical factor in the recognition of consciousness in emergency situations. SimMan can be customized using wigs to simulate patients of different gender, but the skin colour cannot be changed. HAL can be made in three skin colours: light, medium and dark, but the ethnic features of the face do not differ.
Dentistry is another area where simulators are used to improve students' motor skills and hand-eye coordination, which are essential clinical skills [109]. There are two main dental simulation systems with physical facial expression rendering: SIMROID [100] and DENTAROID [101], both designed by Japanese researchers. SIMROID can simulate the stress or discomfort the patient feels during a procedure. The face of SIMROID is actuated by air cylinders to physically display facial expressions of discomfort, pain and positive acknowledgement. Its eyes can rotate about the vertical and horizontal axes, and the eyelids can exhibit different degrees of opening. Its mouth can open and close, and its head can rotate about the vertical and horizontal axes. DENTAROID has automatic dialogue patterns, which give students a more realistic clinical training experience. It helps students learn to avoid accidents and improve their communication competency with patients. DENTAROID is capable of performing different reaction movements that simulate accidents which can occur during treatment, such as reactions to pain. It can also perform facial movements such as eye blinking to make the simulation more realistic. Both robots have skin textures that closely resemble human skin. The ethnicity of SIMROID can be varied, but the gender of both SIMROID and DENTAROID cannot be changed.
In addition to the commercial products, many platforms and systems are being explored by different research groups. Baldrighi et al. [102] proposed a bio-inspired facial expression rendering system for a medical manikin. Their system was able to synthesize seven expressions: happy, sad, angry, surprised, suspicious, disgusted and sarcastic. The skin of the face was made of silicone elastomer to replicate the texture and appearance of real human skin, and the underlying structure consisted of 13 DOFs driven by motors. The system was able to simulate eye movements such as blinking, iris dilation, and object tracking. The neck of the manikin was realized with a 6-DOF Stewart platform.

B. VIRTUAL FACIAL EXPRESSION RENDERING SYSTEMS
Another approach to facial expression rendering is virtual simulation. In 2014, Moosaei et al. [104] reported an approach for synthesizing naturalistic pain on virtual patients. They used the UNBC-McMaster pain archive and the MMI database to acquire source videos of anger and disgust. A Constrained Local Model (CLM) based face tracker was used to extract 68 feature points frame-by-frame from each source video. To render three different expressions (pain, anger and disgust), they mapped the extracted feature points to control points of an animated virtual character using the Steam Source SDK. The authors also claimed that the naturally driven pain synthesis used in the study was comparable to FACS-based pain synthesis. In 2017, Moosaei et al. [108] performed a study on using facially expressive robots to calibrate clinical pain perception. In this study, they developed a virtual patient to synthesize facial expressions of pain, anger and disgust, and compared it with the same facial expressions rendered on a Philip K. Dick (PKD) humanoid robot. The study found that it was easier for observers to detect pain in a virtual patient than in the high-fidelity, facially expressive PKD humanoid robot patient simulator. In addition, their approach allowed accurate generation of virtual human avatars with various demographic features.
In 2017, Maicher et al. [105] developed a conversational virtual standardized patient to help students practice history-taking skills. Character models in the simulator were created with Autodesk Character Generator and refined using Autodesk Maya. The virtual characters were able to synthesize various emotions via facial expressions, including happiness, anger, sadness, fear, surprise, disgust, contempt and pain. This system also allowed the skin color and gender of the character to be changed during sessions.
Wandner et al. [103] used virtual human (VH) technology with the People Putty software program to generate human avatars of different demographic attributes displaying various pain intensity expressions. The benefits of using virtual avatars are that the facial features and pain expressions can be standardized without biases introduced by the construction of the stimuli, and that patients of different sex, race, age and pain intensity can be generated rapidly. Using this novel method, they found that the characteristics of the VHs influenced ratings of pain assessment and treatment recommendations. This finding suggests that medical training systems should give students exposure to patients with different demographic attributes to minimise the effect of biases in treating real patients.

C. HYBRID FACIAL EXPRESSION RENDERING SYSTEMS
Facial expression rendering methods are not limited to purely virtual or purely physical approaches. In 2008, Kotranza et al. [106] proposed a mixed reality human platform for breast cancer examinations, which we classify as a hybrid approach. Mixed reality or hybrid systems preserve features of both the physical and virtual approaches, often by projecting virtual models onto physical objects. The physical system consisted of a full-body physical embodiment in the form of a plastic manikin. The left breast of the manikin was a soft phantom simulating the feel of breast skin, tissue and underlying breast masses, and contained twelve pressure sensors to detect the user's touch. The virtual system was realized using a head-mounted display (HMD) and a wireless microphone. Head orientations and positions of the user were tracked using a combination of infrared-marker based tracking and accelerometers in the HMD. The user was able to interact with the mixed reality model through a combination of verbal, gestural, and haptic inputs. The proposed system was able to generate virtual female characters of different ethnicities to represent the patient. During the pilot experiments conducted in this study, the facial expression of the mixed reality human showed discomfort whenever the breast was touched, and pain when the user pressed an area designated as painful.
In 2012, Diego et al. [107] developed a physical-virtual patient for medical students to conduct ophthalmic exams. The proposed system was based on a concept called Shader Lamps Virtual Patients (SLVP), which combines shader lamps avatars with conversational virtual patients as a means to achieve physical presence. The physical system consisted of a styrofoam head with retro-reflective markers attached to the back, a pan-tilt unit for head motion control, and a static manikin body. A virtual character's head, realized using virtual human software, was projected onto the styrofoam head using a front-mounted Mitsubishi XD300U projector (1024 × 768 resolution). To correctly register the head of the virtual character onto the physical styrofoam head, a closed-loop process using the pan-tilt unit and an eight-camera NaturalPoint OptiTrack infrared tracking system was used. Results from the study suggested that the proposed SLVP has an advantage over a standardized patient (SP) because it can simulate pathological conditions, such as restricted motion of one eye, which a person with healthy eyes cannot exhibit.
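The closed-loop registration step described above amounts to servoing the pan-tilt unit until the projected head pose matches the tracked pose. The sketch below shows the idea with a simple proportional controller; the gain, step clamp, and function name are our illustrative assumptions, not details from the SLVP paper.

```python
def registration_step(tracked_deg, projected_deg, kp=0.5, max_step_deg=2.0):
    """One iteration of a proportional controller nudging a pan-tilt axis
    so the projected head angle converges on the tracked head angle.
    Steps are clamped to max_step_deg to keep actuator motion smooth.
    """
    error = tracked_deg - projected_deg
    step = max(-max_step_deg, min(max_step_deg, kp * error))
    return projected_deg + step

# Example: converge the pan angle onto a tracked pose of 10 degrees.
angle = 0.0
for _ in range(50):
    angle = registration_step(10.0, angle)
```

With the clamp active the axis moves at a constant rate toward the target, then the proportional term takes over for a smooth, overshoot-free approach; in a real system the "tracked" angle would come from the infrared tracker each frame.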
The most recent hybrid approach is the physical-virtual patient simulator by Daher et al. in 2020 [108]. An interchangeable translucent plastic shell served as the physical structure of the patient's body and face, and a virtual patient was rear-projected onto the shell using two AAXA P300 Pico projectors providing imagery for the patient's head and body, respectively. The proposed physical-virtual patient simulator could display a range of multi-sensory cues, including visual cues such as capillary refill, facial expressions and appearance changes, as well as auditory and tactile cues.

V. A BRIEF REVIEW ON FACIAL EXPRESSION RENDERING SYSTEMS REPORTED OUTSIDE MEDICAL TRAINING SIMULATOR DOMAIN
Research projects and insights from other fields such as robotics and computer graphics can also contribute to the developments of facial expression rendering systems in medical training simulators. This section reviews studies reported on virtual, physical and hybrid facial expression rendering approaches in those other research fields and discusses approaches and features that may be applicable to current and future developments of medical training simulators.

A. PHYSICAL FACIAL EXPRESSION RENDERING APPROACHES
An extensive amount of work has been carried out in the field of robotics to develop robots with human-like faces or facial expressions. The majority of these works belong to the social robotics or human-robot interaction (HRI) domains. Fig. 6 shows physical robotic systems that can synthesize facial expressions, ranging from abstract, low-fidelity to human-like, high-fidelity levels.
The number of facial expressions these robots can simulate grows with the number of degrees of freedom (DOFs) they have. For example, AILA [110], the Pepper robot [120] and M3-Neony [121] have low numbers of DOFs and can only execute basic head and eye movements. Some researchers have explored synthesizing facial expressions using ambiguous figures, such as Telenoid R1 [122], and cartoon-like robotic faces, such as Zeno [112] and Flobi [113]. Though the appearance of these robots resembles fictional characters, their facial expressions and emotions are similar to a human's. This fictional-character design approach also allows features such as modularity to be added: for instance, Flobi's modular structure allows easy access to the underlying hardware, and visual features of the robot such as hairstyle and facial features can be altered easily.
Mechanical humanoid robots give a great sense of realism, but many may fall into the uncanny valley: a sudden decrease in the sense of affinity prompted by the realization that the user is interacting with a robot [125]. The "creepiness" and uneasy feeling induced by this realization may have a drastic effect on the perception of facial expressions generated by the robot. Some robot designs tackle this problem by focusing on generating human-like facial expressions and emotions rather than replicating human-like appearance. For instance, the social robot Kismet [126] has an interactive emotion system and is capable of displaying multiple facial expressions, but its appearance remains mostly mechanical.

B. VIRTUAL FACIAL EXPRESSION RENDERING APPROACHES
Advancements in graphical processing and real-time rendering have enabled researchers to build virtual human avatars with realistic and detailed facial features. To animate the facial expression of a virtual human avatar, both the morphological and dynamic characteristics of the virtual face must be considered [127]. Facial expression generation for virtual avatars has been based either on exploiting empirical and theoretical research such as FACS, or on the study of annotated corpora containing the expressions of emotions displayed by humans and virtual characters [127]. Many software solutions use FACS-based methods and have been made open source to allow easy access for researchers and developers. Table 4 lists reported FACS-based software platforms for generating facial expressions. FACSe! [129] was the first 3-D face rendering software to use FACS to define bone-system movements. It was built on the 3DSTUDIO MAX platform and controlled the bone rig of the face, changing the activation intensity of different AUs to generate different facial expressions. Inspired by this application, FACSGen [130], [131] and HapFACS [132], [133] were developed with more functions, including dynamic facial expressions that can be exported as video clips, and bilateral AU activation. Game engines and computer graphics (CG) tools have improved drastically in recent years, leading to the development of OpenFACS [128] and FACSHuman [134], which were built using game engines and CG software. OpenFACS was built with Unreal Engine [139] and can be used on multiple operating systems. Its human face rendering is very realistic, and it offers the functions of FACSGen and HapFACS. However, a high-performance computer is needed to meet the graphical processing requirements of the application; otherwise the frame rate drops significantly and the software becomes unresponsive.
FACSHuman is a FACS-based plug-in for MakeHuman [140], a Python-based human model generation tool that originated as a plug-in for Blender [141]. It offers all the functions of the other applications and allows the human face models to be exported in many 3-D object formats.
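The per-AU intensity control that these FACS-based tools expose can be illustrated with a small sketch. The emotion-to-AU prototypes below follow commonly cited FACS descriptions (e.g. happiness as AU6 cheek raiser plus AU12 lip corner puller); the exact weights and the function interface are our illustrative assumptions, not the API of FACSGen, HapFACS, or any of the tools named above.

```python
# Illustrative mapping from basic emotions to FACS action units (AUs).
# Weights are relative activations of each AU within the prototype.
EMOTION_AUS = {
    "happiness": {6: 1.0, 12: 1.0},
    "sadness":   {1: 1.0, 4: 1.0, 15: 1.0},
    "surprise":  {1: 1.0, 2: 1.0, 5: 0.6, 26: 1.0},
}

def blend_expression(emotion, intensity):
    """Scale a prototype's AU activations by an overall intensity in [0, 1].
    Returns {AU number: activation}, ready to drive a FACS-based face rig.
    """
    if not 0.0 <= intensity <= 1.0:
        raise ValueError("intensity must be in [0, 1]")
    return {au: w * intensity for au, w in EMOTION_AUS[emotion].items()}
```

Because the output is just a dictionary of AU activations, the same blend can drive a virtual rig, a physical robot's motors, or both halves of a hybrid system.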
There are many facial expression databases that help researchers train, build and validate their models. Some databases contain only static images: the Karolinska Directed Emotional Faces [142] includes 4900 pictures of human facial expressions, and the Cohn-Kanade AU-Coded Facial Expression Database [143] has 500 image sequences with FACS AU annotations and emotion-specified expressions. Other databases contain both images and videos. The MMI Facial Expression Database [144] contains 2900 videos and still images of 75 subjects, with AUs annotated at frame level. The UNBC-McMaster Shoulder Pain Expression Archive Database [98] has 200 video sequences, 48398 FACS-coded frames, and associated pain scores from self-report and observer measures.
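The frame-level pain scores in the UNBC-McMaster archive are Prkachin-Solomon Pain Intensity (PSPI) values computed from FACS AU intensities. A minimal sketch of that computation (the formula is standard; the function interface is ours):

```python
def pspi(aus):
    """Prkachin-Solomon Pain Intensity (PSPI) score, the frame-level pain
    measure used to annotate the UNBC-McMaster archive:

        PSPI = AU4 + max(AU6, AU7) + max(AU9, AU10) + AU43

    `aus` maps AU numbers to intensities (0-5 for most AUs; AU43, eye
    closure, is binary). AUs absent from the mapping are treated as zero.
    """
    g = lambda n: aus.get(n, 0)
    return g(4) + max(g(6), g(7)) + max(g(9), g(10)) + g(43)
```

A simulator that drives its face from AU activations can therefore report the PSPI of whatever expression it is currently rendering, giving a ground-truth pain intensity against which trainee ratings can be compared.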

C. HYBRID FACIAL EXPRESSION RENDERING APPROACHES
Lastly, Table 5 lists some of the hybrid facial expression systems that combine virtual and physical components. Furhat [135] is a social robot that synthesizes facial expressions by projecting a virtual face onto a physical shell using a custom wide-angle projector. The physical head with the shell has 3 DOFs (pan, tilt and roll), controlled by servo motors. Using 3-D animation software, Furhat is capable of synthesizing faces of different ethnicities and genders. A similar hybrid approach was used in Socibot [136] and Mask-bot [137], whose physical systems provide 3-DOF and 2-DOF head movements, respectively. All three of these works use the rear-projection method (a projector mounted behind a physical shell) with no active components in the shell itself.
Bermano et al. [138] proposed an augmented physical avatar using projector-based illumination. As mentioned previously, the number of expressions a physical system can produce grows with, but is also limited by, its DOFs. To let this model synthesize more expressions, the authors introduced a system that decomposed facial motion into low-frequency motions, physically reproduced by the robotic head, and high-frequency details, added using projected shading. The physical head was implemented as a layer of silicone skin attached to an underlying rigid articulated structure with 13 DOFs actuated by a set of motors. Five cameras and three projectors were mounted in front of the physical face to reconstruct and project the virtual face. The authors claimed that the result of their system was a highly expressive physical avatar featuring facial details and motion otherwise unattainable due to physical constraints.
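The low/high-frequency split behind Bermano et al.'s approach can be sketched with any low-pass filter. The moving-average filter below is our deliberately simple stand-in (the paper does not reduce its decomposition to this form); the key property is that the two components sum back to the original trajectory, so motors and projected shading together reproduce the full motion.

```python
import numpy as np

def decompose_motion(signal, window=9):
    """Split a 1-D motion trajectory into a low-frequency component (to be
    reproduced by the physical head's motors) and a high-frequency residual
    (to be rendered as projected shading detail).

    A centred moving average of length `window` serves as the low-pass
    filter; low + high reconstructs the input exactly.
    """
    signal = np.asarray(signal, dtype=float)
    kernel = np.ones(window) / window
    low = np.convolve(signal, kernel, mode="same")
    high = signal - low
    return low, high
```

In practice the window (or filter cutoff) would be chosen from the actuators' bandwidth: anything the motors cannot track fast enough is pushed into the projected component.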

VI. DISCUSSION AND FUTURE DIRECTIONS
Given a haptic task such as physical examination, our previous work showed that the brain takes two broad classes of actions: one to condition the body, such as searching for the best finger stiffness [145], and the other to control movements and forces [146], [147]. In the case of physical examination of patients, the process of combining visual perception with haptic feedback plays a critical role in the motor control pathways. For instance, subjective interpretation of facial reactions during a physical examination such as palpation can lead to reduced levels of indentation, which may in turn require a different finger speed regulation method to gain haptic information about a physiological condition. Misperception of the constraints associated with palpation behavioural variables can therefore lead to sub-optimal diagnostic methods. Medical training poses the challenge of finding the best design methods for patient simulators to prompt the learning of multi-modal sensory integration and sensory-motor coordination.
The face is one of the most expressive areas of the body, capable of producing about twenty thousand different facial expressions [148]. Naturally, this makes the facial expressions of patients one of the most powerful non-verbal feedback modalities that medical personnel should account for during medical training and practice. As such, this article reviewed the developments and current implementations of facial expression rendering approaches in medical training simulators. We reviewed models describing facial expression generation, explained the advantages and significance of analysing facial expressions in clinical practice, highlighted the relationship between pain reactions and facial expressions, and introduced systems and frameworks for classifying facial expressions.

A. DESIGN CHALLENGES
We highlighted the debates around the psychological basis of tracing facial expressions to pain. It became clear that facial expressions cannot be interpreted in isolation from their context. In medical education, the patient's medical history and cultural background are important factors to interpret in conjunction with facial expressions when diagnosing a physiological condition. Moreover, other feedback modalities, such as vocal cues and muscle tension, also provide useful clues alongside facial expressions. Therefore, future simulators should be able to present a fair variety of medical contexts in order to give trainees well-rounded preliminary training for generalizable skills.
We then reviewed medical training simulators with facial expression rendering systems and classified them into three categories: physical, virtual and hybrid. Their application domains, commercial availability, implementation methods, and facial expression simulation abilities were evaluated. Facial expression rendering approaches and systems in other research fields were also discussed, as some of the proposed methods and insights can be transferred to develop such systems for medical training simulators. It became clear to us that hybrid approaches have the advantage of achieving a higher level of agency while offering the flexibility to present a variety of patient contexts. This matters because an empathetic connection is an important factor during medical examinations. Ethnic appearance is one such important contextual factor that can be easily changed in a hybrid approach. At a higher level of complexity, it would be interesting to explore how the physical morphology of a hybrid patient simulator could be controlled to render gender and ethnicity interactions.
The challenge for designers is to maximise the efficacy of such designs, using materials, actuators, and graphics rendering tools to manage the level of agency of a medical training simulator with facial expression rendering without crossing into the uncanny valley. Certain medical scenarios, especially those involving examination of the face itself, pose extra challenges for designers. For example, in a dental or emergency treatment training simulator, the mouth of the simulated patient plays an important role and the face is directly involved in the training process. Fig. 7 lists such considerations for all stakeholders: medical educators, students and simulator developers.
To further improve the realism of the training experience, the robotic system could utilise multimodal fusion by capturing the state of the examiner, including facial expressions, speech, force profiles and body movements. Sentic blending strategies [149] can be used to integrate quantitative data such as force profiles with abstract information such as facial expressions and speech to derive how the examiner feels at any given time. As mentioned previously, intrinsic factors affect how the patient reacts to pain, and the cognitive and affective information received by the patient is affected by the state of the examiner. Cameras and microphones can be used in hybrid systems to capture the examiner's movements, facial expressions and speech, and this information can be projected into a common AffectiveSpace through a fuzzy classifier [150]. Future development of hybrid systems may benefit from sentic blending and multimodal fusion to improve the realism of the training experience and to capture more insightful information about the student's performance.
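At its simplest, the fusion step above combines per-modality affect estimates into a single value. The confidence-weighted linear blend below is a deliberately basic stand-in for sentic blending [149], which operates in a richer affective space; the interface and the valence scale are our assumptions.

```python
def fuse_modalities(scores, weights):
    """Confidence-weighted fusion of per-modality affect estimates into a
    single valence value.

    scores:  {modality: valence in [-1, 1]} from each channel
             (e.g. face, speech, force profile)
    weights: {modality: confidence >= 0} for each channel
    """
    total = sum(weights[m] for m in scores)
    if total == 0:
        raise ValueError("at least one modality needs nonzero confidence")
    return sum(scores[m] * weights[m] for m in scores) / total
```

The confidences would typically come from each channel's classifier, so a momentarily occluded face or noisy microphone is automatically down-weighted rather than corrupting the fused estimate.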

B. FUTURE DIRECTIONS
This review highlights the need for controllable medical training simulators, such as robotic patient simulators, that can present multiple physiological conditions in patients of different genders and ethnic backgrounds. This will allow instructors to help medical students learn robust techniques and efficient methods for combining visual and haptic feedback when performing medical procedures on patients from diverse backgrounds. In the case of a physical examination simulator, for instance, this requires robotic simulators to consist of 1) controllable sensorized internal organs that can present several classes of symptoms and measure palpation behaviours, 2) a facial expression rendering method to present the essential features of visual feedback relevant to the training context, and 3) methods to quantify trainees' examination behaviours, both to provide focused feedback for peer-assisted learning [151] and to optimize a given set of training criteria.

VII. CONCLUSION
Being able to present a variety of patient contexts is an open design challenge for patient simulators. Of the currently available methods, physical simulators provide accurate haptic feedback but lack patient customisation and detailed facial expression simulation. Virtual systems, on the other hand, provide facial expression simulations with fine detail and realism and can simulate patients with different demographic attributes, but they often do not provide haptic feedback or interface with hardware sensors to take physical inputs. Hybrid systems integrate features from both physical and virtual systems and are capable of delivering accurate haptic and visual feedback. Based on this review, we conclude that there are several opportunities for technological advances to maximise the efficacy of patient simulators with facial expressions. New methods can be found to control the physical morphology of the robot to render gender and ethnic diversity. This can be backed by animated solutions that overlay detailed nuances representing the patient's medical and cultural context, posing an opportunity for AI techniques such as deep neural networks to parameterize complex phenomena such as culture, gender, and ethnic interactions. In this regard, determining how responsibility should be shared between the physical and virtual rendering methods has become a key future challenge.

SIMON DE LUSIGNAN is currently a Senior Academic GP and a Professor of primary care and clinical informatics with the Nuffield Department of Primary Care Health Sciences, University of Oxford, U.K., and the Director of the Royal College of General Practitioners (RCGP) Research and Surveillance Centre (RSC), U.K. His research interests include clinical informatics/digital health, disease surveillance, quality improvement (QI), measuring health outcomes from routine data, incorporating technology into clinical workflow, and new technology-enabled roles in health care.
FUMIYA IIDA (Senior Member, IEEE) received the bachelor's and master's degrees in mechanical engineering from the Tokyo University of Science, Japan, and the Dr.sc.nat. degree in informatics from the University of Zurich, Switzerland. He is currently a Reader of robotics with the Department of Engineering, University of Cambridge. He is also the Director of the Biologically Inspired Robotics Laboratory and a Fellow of Corpus Christi College. During his Ph.D. project, he was also engaged in biomechanics research on human locomotion at the Locomotion Laboratory, University of Jena, Germany. While working as a Postdoctoral Associate at the Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, USA, he was awarded the Fellowship for Prospective Researchers from the Swiss National Science Foundation, and subsequently the Swiss National Science Foundation Professorship hosted by ETH Zurich. In 2014, he moved to the University of Cambridge as the Director of the Bio-Inspired Robotics Laboratory. His research interests include biologically inspired robotics, embodied artificial intelligence, soft robotics, and robotic interaction with uncertain environments; he has been involved in a number of research projects related to robot locomotion, manipulation, and human-robot interaction, some of which led to start-up companies. He was a recipient of the IROS2016 Fukuda Young Professional Award and the Royal Society Translation Award in 2017. He has published two textbooks and more than 150 peer-reviewed articles.