A Systematic Review of Sensing and Differentiating Dichotomous Emotional States Using Audio-Visual Stimuli

Recognition of dichotomous emotional states such as happy and sad plays an important role in many aspects of human life. Existing literature has recorded diverse attempts at extracting physiological and non-physiological traits to record these emotional states. Selection of the right instrumental approach for measuring these traits plays a critical role in emotion recognition. Moreover, various stimuli have been used to induce emotions. Therefore, there is a current need for a comprehensive overview of instrumental approaches and their outcomes for the new generation of researchers. In this direction, this study surveys the instrumental approaches to discriminating happy and sad emotional states that are elicited using audio-visual stimuli. A comprehensive literature review is performed using the PubMed, Scopus, and ACM Digital Library repositories. The reviewed articles are classified with respect to i) stimulation modality, ii) acquisition protocol, iii) instrumentation approaches, iv) feature extraction, and v) classification methods. In total, 39 research articles were published on the selected topic of instrumental approaches in differentiating dichotomous emotional states using audio-visual stimuli between January 2011 and April 2021. The majority of the papers used physiological traits, namely electrocardiogram, electrodermal activity, heart rate variability, photoplethysmogram, and electroencephalogram based instrumental approaches, for recognizing the emotional states. The results show that only a few articles have focused on audio-visual stimuli for the elicitation of happy and sad emotional states. This review is expected to seed research in the areas of standardization of protocols, enhancing the diagnostic relevance of these instruments, and extraction of more reliable biomarkers.

Sadness is also related to many adverse effects, including depression, sleep disorders, anxiety, suicide attempts, and poor attention. Long-term sadness has negative implications for cardiovascular activity [14]-[16]. Identification of these disorders at an early stage can help to improve treatment. Recently, a report on world happiness has also called for significant attention to happiness [17]. Therefore, it is necessary to understand the neurological, psychiatric, and biobehavioural mechanisms of happy and sad emotions.
This work is motivated by the growing interest in recognition of clinical conditions linked to happy and sad emotional states, such as prediction of major depressive disorder (MDD) in long-term sadness. Prolonged sadness is the precursor of MDD. The effect of MDD may lead to reduced quality of life. MDD is predicted to become the leading cause of disability by 2030 for around 20 percent of the population over the course of life [18]. In the current study, we consider comparing instrumental and physiological trait-based approaches available to recognize happy and sad emotional states, which may help predict the clinical conditions.
The emotions are described using two common and popular ways, namely a discrete emotion approach and a dimensional approach. In a discrete emotion approach, emotions are categorized into six basic emotions, as described above. In a dimensional approach, the emotions are described using valence and arousal dimensions [19], [20]. The dimension of valence is the positive or negative emotion perceived by the users. In contrast, the dimension of arousal is the intensity of the particular emotion experienced by the users [21]. Happiness and sadness are described by opposite valence and arousal levels [7], [22].
In a non-physiological trait, namely the GAIT cycle, people in a sad state tend to incline the body forward and direct their hands towards the source of irritation. Walking speed, vertical head motion, and arm swing are also reduced in people who are perceived to be in a sad emotional state. The magnitudes of shoulder and elbow movement are comparatively reduced in the sad emotional state [30]-[33].
In physiological traits, variations are observed in the amplitude and frequency of the signals. For example, ECG shows significant variations in the ST segment corresponding to happy and sad emotional states. The convex ST segment is highly predictive of a happy emotional state, while a concave ST elevation strongly suggests a sad emotional state [34].
For sad emotional states, sympathetic activation is reported to be high compared to happiness [35], [36]. Moreover, in a sad state, HR increases to provide an increased blood supply [37]. The variation in HRV is inversely correlated with HR. Thus, HRV decreases in sadness and increases in happiness [38]. In a happy emotional state, the mouth muscle (zygomaticus) and the eye muscle (orbicularis) are activated, which raises the corners of the mouth. These muscle activities are reflected by fEMG [39], [40]. Also, the pulse beat cycle of the PPG signal is reported to be more significant for the happy emotional state [41], [42].
In a happy emotional state, the brain regions such as the right frontal cortex, the precuneus, and the left insula are activated; whereas, in a sad state, there is an increase in activity of the brain regions, namely the left insula, the right occipital lobe, the left thalamus, the hippocampus, and the amygdala. The hippocampus is strongly linked with memory, and it makes sense that awareness of specific memories is associated with sad feelings [13], [43], [44]. These changes in the central nervous system activity are reflected in EEG.
EDA is a measure of the continuous variation in the electrical properties of human skin, which reflects the activity of the sympathetic division of the autonomic nervous system [45]-[47]. It is reported that more sweat is expelled through the sweat glands in happiness than in sadness. Thus, the skin conductance measured by EDA is higher in happiness than in sadness [48], [49].
Researchers have proposed various emotional triggers for the understanding of mental or cognitive processes. In particular, standardized collections of words, pictures, faces, and film clips/audio-visual stimuli have enabled research in affective computing by allowing researchers to select suitable stimuli and compare results across lab environments [50], [51]. Audio-visual stimuli are important triggers for evoking intense emotional reactions in the laboratory because of their high resemblance to real emotional experiences [51]-[54].
Several physiological signals and non-physiological traits have been employed for differentiating dichotomous emotional states [39], [42], [55]-[91]. Although a considerable body of literature has been reported, systematic reviews that deal specifically with the happy and sad emotional states elicited using audio-visual stimuli, and that describe the instrumental approaches used to classify them, remain limited. This review also highlights the advantages, limitations, and gaps in the instrumentation-based dichotomous emotion recognition field. In addition, it could contribute to the development of a standardized data collection protocol and assessment procedures for this field to evaluate different data acquisition methods.

II. REVIEW METHODOLOGY
This review methodology is divided into seven subsections, namely search strategy, subject information, stimulation modality, data acquisition protocol, instrumentation approach, feature extraction, and classification.
A total of 655 articles (Scopus, n = 554; PubMed, n = 12; ACM, n = 89) are identified after the initial search process, and 19 articles are omitted as duplicates. The screening phase involves the examination of the records identified in the initial search. The query syntax is reviewed independently by two reviewers. Out of the remaining 636 articles, 373 are excluded because they concern emotions related to animals, robots, pediatric or geriatric populations, or participants with neurological disorders. A further 224 studies are excluded after reviewing the full-text articles, based on the type of stimuli and the type of research article. Finally, 39 articles are included in the review. The inclusion criteria are as follows: (i) studies using audio-visual stimuli for emotion elicitation, (ii) studies differentiating only happy and sad emotional states, (iii) studies differentiating positive and negative emotional states, and (iv) studies with the combination of other emotional states, where happy and sad emotional states are classified discretely. The articles that do not include happy and sad emotional states in their methodology are excluded. The PRISMA flowchart used for the selection of articles in this review is shown in Fig. 1.
Fig. 2(b) shows the type of physiological signal used in the selected studies. It is seen that 49% of the 39 selected articles have used EEG signals, followed by ECG signals (10%). The percentage of studies using EDA, PPG, and multimodal signals is 8%, while the GAIT and HRV signals account for 5% each. HR and fEMG signals account for 2% each, while PPG signals account for only 3%.
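The article counts reported for the selection process can be cross-checked with a few lines of arithmetic. The following is a minimal sketch that uses only the numbers stated in the text; the variable names are illustrative and not part of the original methodology.

```python
# Record counts as reported in the text of this section.
identified = {"Scopus": 554, "PubMed": 12, "ACM": 89}
duplicates = 19
excluded_at_screening = 373   # animals, robots, pediatric, geriatric, neurological disorders
excluded_at_full_text = 224   # type of stimuli and type of research article

total_identified = sum(identified.values())
after_deduplication = total_identified - duplicates
after_screening = after_deduplication - excluded_at_screening
included = after_screening - excluded_at_full_text

print(total_identified, after_deduplication, after_screening, included)
```

The chain of subtractions ends at the 39 included articles, which agrees with the number stated in the text.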

B. SUBJECT INFORMATION
Among the reviewed articles, the number of participants varies depending on the type and field of the experiment.

C. STIMULATION MODALITIES
The choice of emotional stimuli depends on the research question and can be easily determined using the stimuli emotion matrix [51]. The emotion matrix is a graphical representation of five critical emotional stimulus characteristics (see Fig. 3), namely Ecological Validity (EV), Temporal Resolution (TR), Controllability (CNT), Complexity (CMP), and Emotional Intensity (EI), which allows researchers to select suitable stimuli in affective computing (see Table 1). The emotional stimulus must have characteristics such as low CMP, low CNT, high EI, high TR, and high EV to elicit strong emotional reactions [51]. Compared to text, audio, and images, audio-visual stimuli have desirable properties, namely high EV and a dynamic nature, for emotional elicitation. Based on the effectiveness of audio-visual stimuli in inducing emotions, the articles using only audio-visual stimuli for eliciting the dichotomous emotional states are considered and presented in Table 2. The references of the selected 39 studies are assigned a Systematic Review identification number (SRYY), where YY represents a number from 01 to 39, as shown in Table 2, for ease of reference in the rest of the manuscript.
A few of the works have not mentioned the source of the stimuli used for emotion elicitation. Notably, the number and length of the stimuli vary across the published articles. Table 2 shows that the minimum number of video clips used is two, and the maximum is one hundred. Also, the shortest stimulus duration is 0.5 min (SR04, SR17, SR39), and the longest is 40 min (SR15). In 20 of the studies, film clips have been selected with the help of annotators.

D. DATA ACQUISITION PROTOCOL USING AUDIO-VISUAL STIMULI
The protocol followed in the selected articles is summarized in the flowchart shown in Fig. 4. Table 2, which lists the experiment protocols used in the selected articles for recording various modalities using audio-visual stimuli, shows that the shortest duration of the experiment is approximately three min (SR15) and the longest is 96 min (SR26). Before starting the experiment, the procedure is explained clearly to the subjects, and the consent form is filled in. The experiment is carried out in a 30 dB soundproof room (SR09, SR10, SR35) that is well lit (SR03) and kept at a constant temperature (24 ± 2 °C) (SR08), or in a laboratory environment (SR14), where exact measurements are obtained.
To avoid mind wandering, the subject has to be brought into a neutral state using different methods such as taking rest (SR02, SR09, SR11, SR12, SR18), closing the eyes for 60 seconds (SR22), performing a GO/NO-GO task (SR39), and watching a neutral video (SR09-SR11).
Before watching the stimuli, the mood of the participant is identified by various methods, such as rating the subject's mood on the Positive and Negative Affect Schedule (PANAS) scale (SR06, SR07), using a self-report questionnaire (SR18), and conducting a stress-resistance questionnaire test (SR15).
During the experiment, the videos are displayed randomly. However, the random presentation of the videos is reported not to influence emotional responses (SR02, SR06-SR09, SR11-SR13). Participants are also instructed to wear a headset while watching the stimuli to avoid unwanted ambient sound and to prevent the physiological signals from being affected by conversation between subjects (SR15).
Single (unimodal) or multiple (multimodal) physiological signals are acquired at various sampling rates ranging from 25 to 2000 Hz. Four of the selected studies have used a multimodal approach. Park et al. classified happy and sad emotions by fusing two peripheral signals: PPG and skin temperature (SR05). A hybrid fusion strategy has been employed using facial expressions, EDA, and EEG signals to classify happy, sad, and neutral emotions (SR29). Two multimodal fusion methods between ECG and EDA signals are used for happy or sad emotional state recognition with reference to a neutral state (SR32). Steenhaut et al. assessed fEMG, EDA, and ECG signals to measure the emotional reactivity of subjects during happy and sad emotions (SR28).

1) VALIDATION OF PHYSIOLOGICAL SIGNALS
The emotions felt by the subjects are validated with self-reports using various methods such as the Self-Assessment Manikin (SAM), Visual Analogue Scale (VAS), Likert scale, questionnaire, and press file (SR01, SR04, SR06, SR07, SR10, SR13, SR14, SR16, SR17, SR18, SR20, SR26-SR28, SR32, SR35, SR39). In one of the selected studies, happy and sad emotions are labeled from the valence ratings obtained using SAM. The video is labeled as sad when the valence rating is ≤ 3 and happy when the valence rating is ≥ 7 (SR17). Krishna et al. have also considered SAM as the ground truth for assessing the subject's happy, sad, relaxed, and fear emotional states (SR20).
Christensen et al. used a VAS ranging from 0 to 100 ('0': 'sad', '50': 'neutral', and '100': 'happy') for measuring behavioral or subjective experience. The value on the scale is selected using a mouse cursor on the screen (SR16). Steenhaut et al. also used the VAS to indicate subjective emotional reactivity; the lower end of the scale indicates neutral, and the higher end indicates happy or sad (SR28). In another study of fEMG-based emotion recognition, the subject's happy and sad emotions are rated using the VAS (SR01).
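The valence-based labeling rule reported for SR17 can be sketched as a simple threshold function. This is only an illustration: the thresholds come from the text, while the function name, the example ratings, and the choice to leave mid-range ratings unlabeled are assumptions rather than details confirmed by the study.

```python
from typing import Optional

SAD_MAX_VALENCE = 3    # valence rating <= 3 is labeled sad (as reported for SR17)
HAPPY_MIN_VALENCE = 7  # valence rating >= 7 is labeled happy (as reported for SR17)

def label_from_valence(valence: float) -> Optional[str]:
    """Map a SAM valence rating to 'sad', 'happy', or None (unlabeled)."""
    if valence <= SAD_MAX_VALENCE:
        return "sad"
    if valence >= HAPPY_MIN_VALENCE:
        return "happy"
    # Assumption: ratings between the two thresholds receive no label.
    return None

# Hypothetical ratings, for illustration only.
ratings = {"clip_a": 2, "clip_b": 5, "clip_c": 8}
labels = {clip: label_from_valence(v) for clip, v in ratings.items()}
print(labels)
```

A rule of this kind discards ambiguous trials by construction, which is one plausible reason for using two separated thresholds instead of a single cut-off.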
After watching audio-visual stimuli, the subjective tendency of emotions is collected from the questionnaire to validate happy and sad labels. The questionnaire includes the level of emotion felt by the subject, tendency of emotion for a given audio-visual stimulus (SR18). Das   . The data that belongs to no emotion felt by the subject is discarded for further analysis (SR26). Questionnaires, namely type and intensity of emotion elicited by happy, sad, and calm, are considered to validate affect (SR13).
Gao et al. have used a feedback form for validating the emotions triggered by joy and sadness videos (SR27). A self-assessment form has been used to label the positive emotion induced by the happy video and the negative emotion induced by the sad video (SR04). Singhal et al. have used a web-based online form to validate happy, sad, and neutral emotions by collecting participant ratings on a scale of 1-5 ('1': 'very poor' and '5': 'very good') (SR10). A Likert scale ranging from 0 to 10 has been used for obtaining the intensity of happy and sad emotions felt by the subject (SR06, SR07). Liu et al. used a press file to record two strings, namely '0' (target emotion is perceived) and '1' (target emotion is not perceived), to obtain participants' subjective experiences for happy, sad, fear, and anger emotional states (SR14). SAM (SR39) and a self-assessment form (SR35) have also been used to obtain the ground truth labels for differentiating positive and negative emotional states. Six of the selected studies used high-definition cameras to record participants' facial expressions during the experiment (SR11, SR14, SR26, SR27, SR29, SR35). Only six studies have reported the details of ethical committee approval and the validation of the protocol before the experiment (SR01, SR02, SR08, SR09, SR39). Also, one of the selected studies mentioned that the experimental protocol was implemented in strict accordance with the Declaration of Helsinki (SR15).

E. INSTRUMENTATION APPROACHES TO DIFFERENTIATE DICHOTOMOUS EMOTIONAL STATES
During stimuli visualization, various instruments are used to acquire physiological signals. In order to understand the performance based on instrument characteristics, it is important to consider some of the common factors associated with the hardware specifications. The design of instruments is affected by factors such as user, technology, medical, environmental, and economic-related factors (see Fig. 5) [94], [95].

1) USER RELATED FACTORS
When dealing with user-related factors, the instrument should require little time for device setup and subject preparation [95]. The instrument must be easy for the subject to wear without limiting normal activity or causing additional distress [96]. An instrument with good usability can bring a positive experience to the subject [97]. Portable instruments open a new path to the non-intrusive assessment of emotions [98]. For example, Emotive devices are portable and comfortable to use in comparison to Neuroscan devices. Also, the setup time of Neuroscan devices is high compared to that of the Emotive devices [99].

2) MEDICAL RELATED FACTORS
Electrical safety is the most important requirement for medical equipment, and only devices tested for safety should be used in hospitals [95]. The parameters, namely comfort level and system usability, are crucial in an instrument for biofeedback acquisition. Non-invasive instruments are comfortable and easier to use for both the therapist and the patient [96]. In long-term tracking applications, systems without direct skin contact provide many advantages, such as reliability and electrical isolation from the sensor surface [100]. The instruments should dissipate only nominal heat. The excess heat and radiation generated by the instrument may cause irreversible changes in the tissue [95].

3) TECHNOLOGY RELATED FACTORS
Multi-electrode devices are expensive and may be uncomfortable in real-life situations. In most devices, the input impedance, linearity, sensitivity, and Common Mode Rejection Ratio (CMRR) are made high, and the latency of the device is kept low for accurate measurement. The accuracy of emotion recognition also varies with the instrument and its derivatives, and with the placement of electrodes. A high CMRR rejects unwanted common-mode signals in the preamplifier stage, so only the desired signals find a way into the amplifier [94]. Reliable instruments can have standards that allow physicians or clinicians to decide whether their patients are normal or abnormal. Instruments with differential input can operate at lower voltages while maintaining high SNR [94], [95].
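Both CMRR and SNR are commonly expressed in decibels. The sketch below shows the standard definitions: CMRR as 20 log10 of the ratio of differential gain to common-mode gain, and SNR as 10 log10 of the ratio of signal power to noise power. The gain and power values are hypothetical and are not taken from any instrument in the reviewed articles.

```python
import math

def cmrr_db(differential_gain: float, common_mode_gain: float) -> float:
    """CMRR in dB: 20 * log10(|A_d| / |A_cm|)."""
    return 20.0 * math.log10(abs(differential_gain) / abs(common_mode_gain))

def snr_db(signal_power: float, noise_power: float) -> float:
    """SNR in dB from power quantities: 10 * log10(P_signal / P_noise)."""
    return 10.0 * math.log10(signal_power / noise_power)

# Hypothetical amplifier: differential gain of 1000, common-mode gain of 0.01.
example_cmrr = cmrr_db(1000.0, 0.01)
# Hypothetical recording: signal power 1.0, noise power 0.001 (same units).
example_snr = snr_db(1.0, 0.001)

print(round(example_cmrr, 1), round(example_snr, 1))
```

With these illustrative numbers, the gain ratio of 10^5 corresponds to a CMRR of 100 dB, and the power ratio of 10^3 corresponds to an SNR of 30 dB.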

4) ENVIRONMENTAL RELATED FACTORS
Increasing the Signal to Noise Ratio (SNR) can reduce the effect of environmental noise in biomedical instrumentation systems. The stable instrument ensures that results are repeatable and reproducible [95]. The medical devices have to function appropriately in the suggested values for temperature and air humidity. Also, they must be less prone to movement artifacts and designed for minimum energy consumption [95].

5) ECONOMIC RELATED FACTORS
The cost of the instrument and its maintenance, such as labor and spare parts, must be low. The availability of trained manpower, the availability of consumables, and compatibility with existing equipment are always challenging.
The instruments, namely BioNeuro multi-channel feedback (SR15), Biopac (SR28), EMPATICA E4 (SR29), and Power Lab (SR16), have been used for acquiring EDA signals. Among these instruments, the wearable EMPATICA E4 wristband (SR29) can be preferred because of its setup time, cost, real-time usage, portability, and number of channels. Further, it is found that Biopac (SR28) and Power Lab (SR16) have similar specifications in all aspects (see Table 3).
However, in the laboratory environment, Biopac (SR28), Power Lab (SR16), or BioNeuro multi-channel feedback (SR15) is also a good choice because of its high input impedance, sensitivity, and SNR. BioNeuro multi-channel feedback (SR15) has a very high input impedance compared to Power Lab (SR16) and Biopac (SR28). However, the input range of Biopac (SR28) and Power Lab (SR16) is higher than that of the BioNeuro instrument (SR15).
Power Lab (SR01) and Biopac (SR28) have been used for fEMG signal recording. Since both of these devices have similar specifications, any of these instruments can be preferred. Digital-IF Doppler radar (SR27) has been used for measuring respiratory signals. This device has advantages such as less setup time, non-contact type, and being more comfortable to the participant.
Four of the selected articles have not mentioned the instrument type or model used for recording physiological signals (SR19, SR25, SR30, SR32, SR38).

G. COMPARISON OF MEASUREMENT METHODS USED FOR HAPPY AND SAD EMOTIONAL STATES
During stimuli visualization, various physiological and non-physiological traits are acquired from the corresponding instrumentation approaches. The comparison of measurement methods in differentiating dichotomous emotional states using audio-visual stimuli is summarized in Table 4. The physiological measurement methods such as EEG, fEMG, EDA, ECG, PPG, and RSP, and a non-physiological measurement method, GAIT, have been used to classify happy and sad emotional states in the selected review articles. The physiological signal, EEG, directly reflects the neural activity of the emotions, but the installation and maintenance cost of these devices is very high [101].
The EDA measurements are simple and easy to install [102] but are influenced by external factors such as temperature and humidity [102], [103]. ECG generates a higher magnitude output signal compared to other methods. However, these measurements have limitations such as high inter-subject variability and low accuracy due to movement artifacts in mobile systems. Although PPG captures physiological variations, one of its main limitations is inaccuracy in tracking the PPG signal during daily routine activities due to motion artifacts caused by hand movements [104]. EMG has limitations such as being susceptible to noise, measuring only valence, and being difficult to set up. Although this method has a good spatial resolution, it is limited by cost and time resolution. Although the non-physiological method based on the GAIT pattern has strong ecological validity, it is still in its infancy.

H. FEATURE EXTRACTION
For the classification of happy and sad emotional states, the time, frequency, and Time-Frequency (TF) domain features were used.

2) FREQUENCY DOMAIN FEATURES EXTRACTION
Features, namely squared coherence estimate, Vr, MO, CO, frequency cepstral coefficient, spectral Shannon, and k-NN entropy, are calculated from the EEG signal (SR19, SR21, SR31, SR35). Lee and Hsieh extracted brain functional connectivity pattern-based features, namely coherence and phase synchronization index (SR39). Welch's power spectral density has been computed from EDA signals (SR32). From the HRV signal, the power spectral density (PSD) is calculated in the Low Frequency (LF) and High Frequency (HF) bands, together with the LF to HF power ratio (SR08). The indices of LF power, HF power, and LF to HF power ratio in the power spectral density are calculated from the PPG signal (SR11, SR26). The E of PSD at different frequencies is extracted from RSP (SR27).
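The LF power, HF power, and LF to HF ratio mentioned above can be illustrated with a small periodogram-based sketch. This is not the method of any reviewed study: it uses a plain periodogram computed with a naive DFT instead of Welch's method, and the band limits (0.04-0.15 Hz for LF, 0.15-0.40 Hz for HF), the 4 Hz sampling rate, and the synthetic test signal are all assumptions made for illustration.

```python
import cmath
import math

LF_BAND = (0.04, 0.15)  # Hz, assumed conventional LF limits
HF_BAND = (0.15, 0.40)  # Hz, assumed conventional HF limits

def periodogram(samples, fs):
    """Periodogram (naive DFT) of a mean-removed, evenly sampled series."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]
    freqs, psd = [], []
    for k in range(n // 2 + 1):
        coeff = sum(c * cmath.exp(-2j * math.pi * k * i / n)
                    for i, c in enumerate(centered))
        freqs.append(k * fs / n)
        psd.append(abs(coeff) ** 2 / (fs * n))
    return freqs, psd

def band_power(freqs, psd, band):
    """Sum the PSD bins with low <= f < high and scale by the bin width."""
    low, high = band
    df = freqs[1] - freqs[0]
    return sum(p for f, p in zip(freqs, psd) if low <= f < high) * df

# Synthetic series: an LF component (amplitude 2) plus an HF component (amplitude 1).
fs, n = 4.0, 256
signal = [2.0 * math.sin(2 * math.pi * 0.09375 * i / fs)
          + 1.0 * math.sin(2 * math.pi * 0.25 * i / fs) for i in range(n)]

freqs, psd = periodogram(signal, fs)
lf_power = band_power(freqs, psd, LF_BAND)
hf_power = band_power(freqs, psd, HF_BAND)
lf_hf_ratio = lf_power / hf_power
print(round(lf_hf_ratio, 3))
```

Because power scales with the square of amplitude, the 2:1 amplitude ratio in the synthetic signal should yield an LF to HF ratio close to 4. In practice, a library routine for Welch's method would normally replace the naive DFT used here.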
The features, namely Eng, instantaneous phase, and absolute power, are computed from certain bands of EEG signal by applying DTCWT to the selected channels (SR13). The absolute Max, absolute M, Std, Pow, Eng, En, differential En, Vr, MO, and CO features have been computed from the wavelet coefficients of each sub-band generated by using the DWT method (SR19, SR21, SR22, SR24, SR33).
By using the TQWT decomposition method, the features such as mean absolute value, Pow, Std, Sk, and Ku have been computed from each sub-band of EEG signal (SR34). Similarly, Krishna et al. have calculated the time-domain features (RMS, absolute sum and SQRT sum, change in average amplitude, log detector, clearance factor, shape factor, and crest factor) from the amplitude at sampling points of each sub-band, and Hjorth features (Vr, MO, and CO) from the Std of each sub-band (SR20). Gao et al. fused power spectrum generated from STFT and wavelet energy entropy computed from DWT that are derived from the different frequency bands of EEG signal (SR17).
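The Hjorth features can be illustrated with their usual definitions. In the sketch below it is assumed that Vr, MO, and CO correspond to Hjorth activity (variance), mobility, and complexity; the derivative is approximated by the first difference, and the sinusoidal test signal is synthetic. None of this is taken from the implementation in SR20.

```python
import math

def variance(x):
    """Population variance of a sequence."""
    mean = sum(x) / len(x)
    return sum((v - mean) ** 2 for v in x) / len(x)

def first_difference(x):
    """Discrete approximation of the derivative."""
    return [b - a for a, b in zip(x, x[1:])]

def hjorth_parameters(x):
    """Return (activity, mobility, complexity) of a signal segment."""
    dx = first_difference(x)
    ddx = first_difference(dx)
    activity = variance(x)                       # assumed to match Vr
    mobility = math.sqrt(variance(dx) / activity)  # assumed to match MO
    mobility_of_dx = math.sqrt(variance(ddx) / variance(dx))
    complexity = mobility_of_dx / mobility       # assumed to match CO
    return activity, mobility, complexity

# Synthetic 10 Hz sinusoid sampled at 250 Hz (illustrative values).
fs, f0 = 250, 10
segment = [math.sin(2 * math.pi * f0 * i / fs) for i in range(1000)]
activity, mobility, complexity = hjorth_parameters(segment)
print(activity, mobility, complexity)
```

For a pure sinusoid the complexity is close to 1, which makes it a convenient sanity check; broadband signals such as EEG generally give larger values.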
Considering the ECG signal, basic statistical features (M, Std, Min, and Max) are extracted from the DWT coefficients at level 4 decomposition. Similarly, total power, LF, HF, and LF to HF power ratio features have also been computed from the intrinsic mode functions generated by the DWTs empirical mode decomposition and wavelet coefficients at level 14 decomposition (SR30).
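Extracting basic statistical features from DWT coefficients at a given decomposition level can be sketched as follows. The text does not name the wavelet, so the Haar wavelet is used here purely as an assumption because it is the simplest to write out; the input samples are synthetic, and the population standard deviation is one of several possible choices for Std.

```python
import math
import statistics

def haar_dwt_step(x):
    """One level of the orthonormal Haar DWT; len(x) must be even."""
    scale = 1.0 / math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) * scale for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) * scale for i in range(0, len(x), 2)]
    return approx, detail

def haar_wavedec(x, level):
    """Multi-level Haar DWT; details[0] is level 1 (finest scale)."""
    if len(x) % (2 ** level) != 0:
        raise ValueError("signal length must be divisible by 2**level")
    approx, details = list(x), []
    for _ in range(level):
        approx, detail = haar_dwt_step(approx)
        details.append(detail)
    return approx, details

def basic_stats(coeffs):
    """M, Std, Min, and Max of a coefficient vector."""
    return {
        "M": statistics.mean(coeffs),
        "Std": statistics.pstdev(coeffs),  # population Std, an assumed choice
        "Min": min(coeffs),
        "Max": max(coeffs),
    }

# Synthetic 64-sample segment standing in for an ECG window.
samples = [float(i % 8) for i in range(64)]
approx, details = haar_wavedec(samples, level=4)
level4_features = basic_stats(details[3])
print(level4_features)
```

A study would typically use a wavelet library and a wavelet family suited to ECG morphology; the structure of the computation, decomposition followed by per-level statistics, stays the same.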
According to the survey, it is found that the time domain features are mostly used in all instrumentation approaches, followed by frequency domain and TF features. The summary of all the features in the respective instrumentation approach is given in Table 5.

I. CLASSIFICATION AND STATISTICAL ANALYSIS
Classification and differentiation of dichotomous emotional states are carried out using several classifiers and statistical analysis methods. For this, features extracted from various signals using different instrumentation approaches are considered. Out of 39 articles, 30 articles have used classification algorithms, and nine articles have used statistical analysis methods.

1) CLASSIFICATION
Out of 30 classification articles, 23 articles classify happy and sad emotional states, and the remaining seven articles classify positive and negative emotional states. The summary of the classifiers and the respective performance metrics, namely accuracy, F-score, and True Positive Rate (TPR)/False Positive Rate (FPR), are listed in Table 6.
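The performance metrics listed above follow directly from the entries of a binary confusion matrix. The sketch below uses the standard definitions; the counts are hypothetical and do not correspond to any result in Table 6.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Accuracy, TPR, FPR, and F-score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    tpr = tp / (tp + fn)        # true positive rate (recall)
    fpr = fp / (fp + tn)        # false positive rate
    precision = tp / (tp + fp)
    f_score = 2 * precision * tpr / (precision + tpr)
    return {"accuracy": accuracy, "tpr": tpr, "fpr": fpr, "f_score": f_score}

# Hypothetical counts with "happy" treated as the positive class.
metrics = binary_metrics(tp=45, fp=5, tn=40, fn=10)
print(metrics)
```

Reporting TPR and FPR alongside accuracy is useful when the two emotion classes are imbalanced, since accuracy alone can hide poor performance on the minority class.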
Out of 23 happy and sad classification articles, 11 articles have used EEG signals with machine learning algorithms, namely Linear Discriminant Analysis (LDA), Support Vector Machines (SVM), Random Forest (RF), Relevance Vector Machines (RVM), Naïve Bayes (NB), Extreme Learning Machine (ELM), and Artificial Neural Networks (ANN). The highest classification accuracy of 96.81% has been achieved using the SVM classifier and wavelet coefficient features. The EEG signals in that study are recorded in SS, and SI analysis has been carried out for classification (SR19). Another SI study on EEG signals acquired in MS has obtained the lowest accuracy of 63.63% in classifying happy emotional states using ANN (SR22). In the case of EDA signals, Srinivasan et al. have used SI analysis and achieved classification accuracies of 65.38% and 87.50% for happy and sad emotional states, respectively, using the kNN algorithm (SR15).
The classifiers, namely k-Nearest Neighbors (kNN) and Fisher, have been used to categorize happy and sad emotions from ECG signals. The maximum accuracy of 75% for SI analysis has been achieved using the kNN algorithm (SR31). Cheng et al. reported TPR/FPR values of 0.8956/0.005 and 0.9010/0.0162 for happy and sad emotions, respectively, using the Fisher classifier and SI analysis (SR14). Another SI study conducted on ECG signals recorded in MS has achieved the lowest classification accuracy of 65% in classifying happy and sad emotional states using kNN (SR30). Quiroz et al. have performed both SD and SI analysis on GAIT pattern data using three classifiers: Baseline, RF, and Logistic Regression (LR). In both studies, the experiment is conducted in SS. The maximum accuracy of 68.20% (F-score: 0.7630) has been obtained for SI analysis using the LR algorithm (SR06, SR07). Recently, Shu et al. used five classification algorithms, namely kNN, RF, Decision Tree (DT), Gradient Boosting Decision Tree (GBDT), and AdaBoost, for classifying happy and sad emotional states using HR signals. SI analysis has been carried out for the classification, and the highest accuracy of 84% has been achieved using the GBDT algorithm (SR09). The features extracted from the RSP signals have been used to classify joy and sadness emotional states using the kNN classifier with SI analysis, achieving an accuracy of 85% (SR27).
Rakshit et al. have used the combination of SKT and PPG signals to classify happy and sad emotional states. The SD analysis carried out on the combined signals yielded a maximum accuracy of 92.83% using the SVM classifier (SR05). Similarly, EDA and ECG signals are combined to classify dichotomous emotions using SVM, NB, and kNN classifiers with SI analysis. The highest classification accuracies of 100% and 98.92% are achieved using EDA and ECG signals, respectively (SR32). Another multimodal study, combining facial expressions and physiological signals (EEG and EDA), conducted by Cimtay et al. has used SD analysis on the LUMED-2 database and SI analysis on the DEAP database. In both databases, the signals are recorded in SS. The SD analysis conducted on the signals collected from the LUMED-2 database achieves an accuracy of 53.80%. The SI analysis performed on the signals collected from the DEAP database achieves an accuracy of 75% (SR29).
Out of seven positive and negative emotion classification articles (SR33 -SR39), six articles have used EEG signals with machine learning algorithms, namely Multilayer Perceptron Neural Network (MLPNN), enhanced D-score Genetic Programming (eDGP), ELM, RF, sparse Autoencoder based Random Forest (ARF), Quadratic Discriminant Analysis (QDA), and combined Rotation Forest (RoF) with SVM. Among the EEG signal-based positive and negative emotion classification articles, the highest classification accuracy of 94.40% has been achieved using the ARF classifier and entropy-based feature. In this study, SI analysis has been carried out on the signals recorded in MS (SR36).
Two of the selected studies have used multiple databases for the analysis (SR35, SR37). An SI study has been conducted on EEG signals acquired in SS (from experiment and SEED database) and MS (DEAP database) using an eDGP classifier. The signals acquired from the experiment have achieved an accuracy of 86.55% (F-score: 0.9042). The signals collected from the databases, namely DEAP and SEED, have achieved accuracies of 84.81% and 86.22%, respectively (SR35). Similarly, Zheng et al. have used multiple databases, namely DEAP and SEED, to classify positive and negative emotional states using the ELM classification algorithm. The DEAP database has been created in SS, and the SEED database has been created in MS. The accuracy achieved by using EEG signals collected from the DEAP database is 69.67%, and the accuracy achieved by using EEG signals collected from the SEED database is 91.07%. SI analysis is carried out on the data collected from both databases (SR37).
In ECG, an SI analysis was carried out using an RF classifier and achieved accuracies of 92.10%, 93.90%, and 92.20% for positive, negative, and neutral emotional states, respectively (SR38). The classifiers used in the selected studies and the respective performance metrics obtained to classify dichotomous emotions are summarized in Table 6.
Type III ANOVA showed a significant difference in the mean of fEMG activated by the corrugator, orbicularis, and zygomaticus muscles (SR01). The time interval between foot, peak, and two successive feats of PPG signal are varied significantly high between happy and sad emotions (SR02). The frequency-domain indices of HRV, namely LF, HF, and LF to HF ratio, are highly significant to differentiate dichotomous emotions (SR08). The difference in the RI of the PPG signal for both happy and sad is very significant (SR11).
The Kolmogorov-Smirnov test showed a very significant variation in the PPE of HRV between happy and sad emotions (SR12).
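For readers unfamiliar with the test, the two-sample Kolmogorov-Smirnov statistic is the largest gap between the empirical cumulative distribution functions of the two groups. The sketch below computes only that statistic (not the p-value); the function name and the feature values are hypothetical, and it is not the procedure reported in SR12.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: max |ECDF_a(x) - ECDF_b(x)| over all x."""
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")

    a = sorted(sample_a)
    b = sorted(sample_b)
    d = 0.0
    # Both ECDFs are step functions, so the largest gap occurs at a data point.
    for x in a + b:
        ecdf_a = bisect_right(a, x) / len(a)
        ecdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(ecdf_a - ecdf_b))
    return d


# Hypothetical per-subject feature values for the two emotional states.
happy = [0.1, 0.2, 0.3, 0.4]
sad = [0.5, 0.6, 0.7, 0.8]
d_separated = ks_statistic(happy, sad)      # fully separated samples
d_interleaved = ks_statistic([1, 3], [2, 4])  # interleaved samples
```

A statistic near 1 indicates that the two distributions barely overlap, whereas a value near 0 indicates that they are almost identical.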
Steenhaut et al. performed a pairwise t-test to examine differences in emotional reactivity between younger and older adults using happy and sad film clips. For the happy emotion, the tonic component of EDA varied significantly, and for the sad emotion, the VAS ratings of the participants varied significantly. For both happy and sad emotions, older adults reported higher reactivity (SR28).
Paired t-tests showed that the EDA responses of the subjects varied significantly between happy and sad emotions (SR16). The CD of EEG signals in the parietal and frontal regions showed a significant variation in differentiating joy and sadness (SR23). Similarly, the alpha patterns of EEG signals differed very significantly between happy and sad emotions (SR25). A summary of the significance levels obtained in differentiating happy and sad emotions is listed in Table 7.
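A paired t-test of the kind mentioned above compares two measurements taken from the same subjects, one per emotional state. The following sketch computes the t statistic and its degrees of freedom; obtaining the p-value would additionally require the Student t distribution, which is omitted here. The function name and the response values are hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(cond_a, cond_b):
    """Return (t, degrees_of_freedom) for paired observations."""
    if len(cond_a) != len(cond_b) or len(cond_a) < 2:
        raise ValueError("need at least two paired observations")

    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")

    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Hypothetical per-subject responses in the two emotional states.
happy = [2.0, 3.0, 4.0, 5.0]
sad = [1.0, 1.0, 3.0, 3.0]
t, df = paired_t_statistic(happy, sad)
```

Because each subject serves as their own control, the test is sensitive to consistent within-subject changes even when between-subject variability is large.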

III. DISCUSSION
Although these emotional states can be readily measured using physiological traits, measurement needs can differ widely depending on user-, technology-, medical-, and environment-related factors. In this review, six physiological traits and the instruments used for measuring dichotomous emotional states are identified. Moreover, each instrument has been outlined on the basis of its user-related properties (i.e., setup time, measurement intrusiveness, and size), medical-related properties (i.e., invasive/non-invasive and safety), technology-related properties (i.e., compatibility, input impedance, input voltage range, sensitivity, and SNR), and cost. Most of the instruments used in the reviewed articles may be ideal for measurement in a laboratory setting. Still, they may not be the preferred choice for characterizing motion artifacts in real-time applications.
With respect to the number of electrodes, an instrument should be designed to use as few as possible. Nevertheless, most current EEG devices require a relatively large number of electrodes [105]. In this review, the highest number of electrodes used for recording EEG signals is 64 (SR03), whereas the minimum number is two, used for recording HR (SR09). One study used continuous-wave Doppler radar for emotion recognition, where the user is not required to wear any sensor or electrode on the body (SR27).
Recently, with advancements in technology, wearable devices have become popular for real-time emotional state assessment because of their unobtrusiveness and relatively long recording time [106]. Quiroz et al. have used a wearable Samsung Gear 2 device to record accelerometer and gyroscope sensor data from the participants and achieved accuracies of 70% (SR06) and 76.30% (SR07) in classifying happy and sad emotions. The Emotiv EPOC+ headset has been used to record EEG signals and obtained accuracies of 83.93% (SR10), 91.18% (SR17), and 87.50% (SR18) in classifying dichotomous emotional states. Similarly, Jaswini et al. have used the Enobio wearable device to classify happy and sad emotions from EEG signals with 63.63% and 100% accuracy, respectively (SR22). In one of the studies, the Empatica E4 wristband has been used to acquire EDA signals from participants and obtained an accuracy of 81.20% in classifying two opposite emotions, namely happy and sad (SR29). Recently, Shu et al. used a wearable device, namely the Algoband F8, to collect HR signals and classified happy and sad emotions with an accuracy of 84.00% (SR09). The NeuroSky MindWave Mobile 2 wearable headset has been used to acquire EEG signals and classified positive and negative emotional states with an accuracy of 87.61% (SR35).
Based on the reviewed articles, a single modality is commonly considered to recognize dichotomous emotional states. In comparison to a single modality, multiple modalities may provide richer information and enhance recognition accuracy. Instruments such as the RM6240 multi-channel electrophysiological recording system, HealthLab, and Biopac support the recording of multiple physiological signals. Thus, multiple modalities can be explored to classify emotional states.
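One simple way to combine modalities is feature-level fusion, in which the features of each modality are normalized and then concatenated into a single vector per trial. The sketch below illustrates this idea under the assumption that every modality provides one feature row per trial; the function names and values are hypothetical, and the reviewed articles do not prescribe this particular scheme.

```python
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardize each feature column to zero mean and unit variance."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # A constant column has zero deviation; dividing by 1.0 leaves it at zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def fuse_features(*modalities):
    """Concatenate per-trial feature rows from several modalities."""
    n_trials = len(modalities[0])
    if any(len(m) != n_trials for m in modalities):
        raise ValueError("every modality needs one feature row per trial")

    normalized = [zscore_columns(m) for m in modalities]
    return [sum((mod[i] for mod in normalized), []) for i in range(n_trials)]


# Hypothetical features for two trials: one EDA feature, two HR features.
eda = [[0.2], [0.4]]
hr = [[60.0, 5.0], [80.0, 7.0]]
fused = fuse_features(eda, hr)  # two rows of three standardized features
```

Normalizing before concatenation keeps a modality with large numeric values, such as heart rate in beats per minute, from dominating the fused vector.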
The accuracy of dichotomous emotional state recognition can also be enhanced using multiple combinations of features and classifiers. The choice of feature extraction domain depends on the type of signal and its characteristics. The use of TF domain features is insufficiently explored in the selected articles. Various machine learning approaches such as SVM, RF, LDA, and Fisher have been used for the classification of happy and sad emotional states. The choice of classification algorithm depends mostly on both the type of modality and the type of application. In this review, SVM with time-domain features is the most commonly used combination. Deep learning methods can also be incorporated, given the growing availability of machine learning and artificial intelligence tools.
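As an illustration of what time-domain features can look like, the sketch below computes a few common descriptors from a one-dimensional signal segment. The particular feature set, the function name, and the sample values are assumptions made for illustration and do not reproduce the features of any reviewed study; the resulting values could then be passed to a classifier such as an SVM.

```python
import math
from statistics import mean, pstdev


def time_domain_features(signal):
    """Return simple time-domain descriptors of a 1-D signal segment."""
    if len(signal) < 2:
        raise ValueError("signal must contain at least two samples")

    first_diffs = [b - a for a, b in zip(signal, signal[1:])]
    return {
        "mean": mean(signal),
        "std": pstdev(signal),
        "rms": math.sqrt(mean(x * x for x in signal)),
        "range": max(signal) - min(signal),
        "mean_abs_diff": mean(abs(d) for d in first_diffs),
    }


# Hypothetical signal segment, for illustration only.
features = time_domain_features([1.0, 3.0, 1.0, 3.0])
```

Such descriptors are inexpensive to compute, which is one plausible reason for their popularity on wearable devices with limited processing power.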

IV. CONCLUSION
In this study, the sensing approaches involved in the recognition of dichotomous emotional states elicited using audio-visual stimuli with various protocols, recording devices, and classification methods are explored. Performance evaluation is carried out among the instruments used in the selected review articles, but the user-related factors of the approaches considered have received little attention. Despite the undisputed value of ambulatory diagnosis, the monitoring of happy-sad emotional states is not yet established. Most of the methods focused mainly on enhancing emotion recognition accuracy using multiple combinations of features and classifiers.
To increase the quality of life, critical developments in instrumentation are still actively sought to improve the efficiency of ambulatory care monitors. Thus, research on the type of stimuli, features, and classification algorithms remains challenging despite current enhancements in wearable emotion recognition devices. For more effective recognition of happy and sad emotional states, the fusion of multiple physiological parameters is being pursued to improve the monitoring capability of wearable devices.