A Survey on EEG-Based Solutions for Emotion Recognition With a Low Number of Channels

The market uptake of Brain-Computer Interface technologies for clinical and non-clinical applications is driving the scientific community towards the development of wearable systems for daily life. Beyond the use of dry electrodes and wireless technology, reducing the number of channels is crucial to enhance the ergonomics of devices. This paper presents a review of the studies exploiting fewer than 16 channels for electroencephalographic (EEG)-based emotion recognition. The main findings of this review concern: (i) the criteria to select the most promising scalp areas for EEG acquisitions; (ii) the attention to prior neurophysiological knowledge; and (iii) the convergences among different studies with respect to preferable areas of the scalp for signal acquisition. Three main approaches emerge for channel selection: data-driven, prior knowledge-based, and based on commercially available wearable solutions. The data-driven approach is the most widespread, but the neurophysiology of emotions is rarely taken into account. Furthermore, commercial EEG devices usually do not provide electrodes purposefully chosen to assess emotions. Considerable convergences emerge for some electrodes: Fp1, Fp2, F3 and F4 proved to be the most informative channels for the valence dimension, according to both data-driven and neurophysiological prior knowledge approaches. P3 and P4 proved to be significant for the arousal dimension.


I. INTRODUCTION
In recent years, biosignals have become an increasingly used source for measuring emotions alongside other traditional systems such as affective reports (e.g., SAM [1]). Cerebral blood flow [2], electrooculographic (EOG) signals [3], electrocardiogram, blood volume pulse, phalanx temperature [4], galvanic skin response, and respiration are just some of the biosignals employed in the field of emotion recognition over the years. Recently, several studies have focused on brain signal analysis exploiting techniques such as PET (Positron Emission Tomography), MEG (Magnetoencephalography), fNIRS (functional Near-Infrared Spectroscopy), fMRI (functional Magnetic Resonance Imaging), EROS (Event-Related Optical Signal), and EEG (Electroencephalogram). Among the systems mentioned above, EEG has the advantage of offering a better temporal resolution.
The associate editor coordinating the review of this manuscript and approving it for publication was Mohammad Zia Ur Rahman.
The growing use of BCI technologies is boosting the market, mainly for BCI applications that treat brain disorders and injuries. In 2020, the worldwide BCI market size was valued at $1,488.00 million. This value is expected to reach $5,463.00 million by 2030, growing at a Compound Annual Growth Rate (CAGR) of 13.9% from 2021 to 2030 [5]. In recent years, a significant challenge in EEG system prototyping has been to move signal recording outside clinical research laboratories. Bulky technology was generally used in laboratory or clinical settings for subject monitoring. However, the efforts to realize wearable EEG systems have made possible the long-term and non-invasive recording of brain signals outside of the lab [6]. Therefore, the increasing wearability of the prototyped solutions allows the use of BCI for the recognition of emotions in several sectors. In non-clinical applications, EEG is widely used in neuromarketing to evaluate customer reactions to products or services [7], [8], [9]. In this sector, different kinds of commercial EEG devices have been employed in previous studies. However, researchers tend to prefer systems that are more comfortable for the users [10]. Further application fields include, for example, car driving [11], [12], the working environment [13], and entertainment [14]. In clinical applications, wearable systems have been employed for measuring sleep parameters [15], for detecting epileptic seizures [16], and for screening, intervention and monitoring of autism spectrum disorders [17], [18]. These approaches are made possible by the availability of new wearable solutions.
VOLUME 10, 2022. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
From this perspective, a further reduction in the number of channels and the use of dry electrodes [19], [20] represent a fundamental challenge, together with the use of wireless technology, to enhance system ergonomics. However, most commercial wearable solutions are general purpose. Often, no information is offered about how well the positioning of the few electrodes suits the specific phenomenon under investigation. The complexity of emotional phenomena makes it difficult to identify universally recognized electroencephalographic patterns. The absence of such patterns makes the attempt to minimize the number of channels particularly challenging. In recent years, machine learning-based approaches have helped neuroscientists to identify EEG patterns related to emotional phenomena. Moreover, machine learning has been used directly for channel minimization through the identification of the most informative channels.
The complexity of emotional phenomena also poses significant challenges in terms of experimental reproducibility. In an emotion recognition task, it is always necessary to manage the uncertainty due to the relations among the stimulus, the perception of the stimulus, and the physiological response (i.e., how the emotion reverberates at the electroencephalographic level). Each of these elements can introduce variability that affects experimental reproducibility. Reproducibility can be evaluated at both the cross-subject and the within-subject levels. Cross-subject reproducibility loss occurs when the same stimulus does not induce the same emotion in different subjects. A loss in within-subject reproducibility arises from different reactions to the same eliciting stimulus at different times. Over the years, many attempts have been made to standardize both the procedures for emotional elicitation and the eliciting stimulus itself in order to address the problem of reproducibility. Suitable stimuli datasets were experimentally validated (i.e., standardized) by using significant samples and are widely used by researchers (e.g., International Affective Picture System (IAPS) [21], Open Affective Standardized Image Set (OASIS) [22], and Geneva Affective Picture Database (GAPED) [23]). The use of standardized datasets reduces the problem of reproducibility loss. Stimuli are rated according to the valence and arousal dimensions, and, for each stimulus, the corresponding mean and standard deviation are given. Thus, the probability associated with the confidence interval of the stimulus score gives an estimate of the percentage of the experimental sample perceiving the expected emotion.
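As a rough illustration of the last point, the share of a sample expected to rate a stimulus within a given interval can be estimated from the published mean and standard deviation. The sketch below assumes normally distributed ratings, which is an assumption made here and not a property stated by the stimulus databases; the function names and the numerical values are hypothetical.

```python
import math

def normal_cdf(x: float, mean: float, std: float) -> float:
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

def fraction_within(low: float, high: float, mean: float, std: float) -> float:
    """Estimated fraction of raters whose score falls in [low, high].

    Assumes normally distributed ratings (an assumption of this sketch).
    """
    return normal_cdf(high, mean, std) - normal_cdf(low, mean, std)

# Hypothetical stimulus: mean valence 7.0, standard deviation 1.5.
# Share of raters expected within one standard deviation of the mean.
share = fraction_within(5.5, 8.5, mean=7.0, std=1.5)
print(f"{share:.3f}")
```

With real rating scales, which are bounded and discrete, the normal approximation is only indicative, so the estimate should be read as an order of magnitude.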
To date, no EEG-based emotion recognition review focuses on the problem of channel reduction and, consequently, on the problem of optimal electrode positioning. The present review contributes to this issue by answering the following research questions (RQ):
• In an emotion recognition task, what are the criteria used to select the most promising scalp areas for EEG acquisitions? In the literature, what are the preferred approaches: a priori knowledge or data-driven? (RQ1)
• If data-driven approaches are exploited, are the obtained results compared with the neurophysiological knowledge? (RQ2)
• If devices with a low number of channels are already adopted, is this choice justified with respect to the EEG phenomenon to be investigated? (RQ3)
• Are there convergences between different studies with respect to preferable areas of the scalp for signal acquisition? (RQ4)
In Sections II-A and II-B, we briefly review a theoretical framework about emotions and their neurophysiology. In Section III, the paper selection process is reported. In Section IV, findings are reported about: (i) the time trend analysis, (ii) the minimization strategies, (iii) the reference theories, (iv) the experimental sample size and selection, and (v) the eliciting stimuli. In Section V, the three channel reduction strategies most used in the literature are identified: prior knowledge-based (Section V-A), based on commercial EEG devices provided with a low number of channels (Section V-B), and data-driven (Section V-C). In Section VI, the achieved findings are discussed.

A. THEORETICAL FRAMEWORK
The absence of a uniquely accepted definition of emotions strongly impacts their measurability. To date, several definitions of emotions have been proposed by different theories. Kleinginna and Kleinginna proposed a well-established categorization of these definitions in [24]. The authors distinguished among the following categories: (i) affective, focusing on feelings of pleasure/displeasure and excitement/depression; (ii) cognitive, focusing on appraisal processes, namely the perceptual/thinking aspects of emotions; (iii) Stimuli-Organism-Response (SOR) based, focusing on the effects of external stimuli on physiological mechanisms; (iv) adaptive/disruptive, in which emotions are considered to increase the probability for the body to meet its needs or to cause destructive effects on it; (v) multiaspect, which embrace different aspects of emotions; (vi) restrictive, which attempt to differentiate emotions from other processes; (vii) motivational, in which the overlap between emotion and motivation is highlighted; and (viii) skeptical, in which the usefulness of the concept of emotion is denied. Based on the reported definitions, Kleinginna and Kleinginna proposed the following multi-component emotion definition: "Emotion is a complex set of interactions among subjective and objective factors, mediated by neural-hormonal systems, which can: (i) give rise to affective experiences such as feelings of arousal, pleasure/displeasure; (ii) generate cognitive processes such as emotionally relevant perceptual effects, appraisals, labelling processes; (iii) activate widespread physiological adjustments to the arousing conditions; and (iv) lead to behaviour that is often, but not always, expressive, goal-directed, and adaptive".
Further clustering of the theories of emotions refers to the link between the emotion and the underlying neurophysiological system. Discrete theories propose an independent neural system subserving every emotion while Dimensional theories affirm that all affective states arise from few independent and interacting neurophysiological systems [25], [26], [27]. Discrete theories of emotions suggest the existence of separate emotions, each with specific characteristic patterns. Six basic emotions (i.e., anger, disgust, fear, joy, sadness, and surprise) were proposed by Ekman [28].
Basic emotions are also called primary emotions because they are considered present from birth. They have innate neural substrates, and innate and universal expressions [29]. Secondary emotions, instead, result from the combination of the primary ones (e.g., pride, shame, guilt, etc.).
Dimensional theories of emotion propose the existence of underlying affective dimensions common to all emotions [30]. Thus, emotions can be represented in a multidimensional space. In the Circumplex Model of Affect, proposed by Russell [31], emotions are categorized according to two central neurophysiological systems explaining the valence of emotion (i.e., positive/negative affect) and the level of arousal (i.e., the corresponding physiological activation). The choice of the reference theory determines whether the emotional states are classified (when the theory of discrete emotions is exploited) or the dimensions underlying the emotional states, namely valence or arousal, are measured (when dimensional theories are exploited).
The discrete theory entails the use of a nominal scale, which represents non-additive quantities and cannot be employed for measurements according to the International Vocabulary of Metrology [32]. Dimensional models allow measurement since emotions are arranged along interval scales. Studies on EEG-based recognition of emotions mainly refer to cortical brain lateralization theories. The Theory of Right Hemisphere claims that all emotional expression and perception takes place in the right hemisphere [33]. The Theory of Valence affirms that the right hemisphere is dominant for processing negative emotions and the left hemisphere is dominant for processing positive emotions [34]. Similarly, the Approach-Withdrawal model posits a role of the left- and right-anterior regions in processing the emotional states that govern approach and withdrawal behaviours [35]. The Behavioral Activation System - Behavioral Inhibition System (BAS/BIS) model states that the left and the right frontal activity reflects the strength of the BAS and BIS systems, respectively [36]. BAS and BIS are the two anatomical paths governing the emotional/motivational systems. The BAS is responsible for the activation of behaviour in response to rewarding stimuli, and it associates emotions (which are generally positive, like hope and relief) with these behaviours. On the other hand, the BIS inhibits behaviour in response to new, feared, and adverse stimuli. The BIS activates with passive avoidance and extinction behaviours, and the related emotions are generally negative (e.g., anxiety, fear).

B. NEUROPHYSIOLOGY OF EMOTIONS
This section reports the results of previous meta-analyses and surveys on the association between emotions and specific brain areas. Several meta-analyses aimed to verify the hypotheses posed by the theory of discrete emotions. In [37], neuroimaging studies were employed to determine whether basic emotions are associated with consistent and diverse brain activation patterns. Consistency relates to the fact that the same brain region exhibits more significant activity for the same category of emotions (e.g. the amygdala activity increases each time an instance of the category fear is experienced). Brain activation loci strongly associated with the five basic emotions (i.e. happiness, sadness, anger, fear, and disgust) [28] were identified to evaluate consistency. Activation maps for each pair of emotions were compared to verify emotions discriminability. While the consistency of regional brain activations corresponding to each primary emotion was found, the existence of discriminable neural correlates has not been demonstrated [37].
The most significant associations between basic emotions and brain activation regions are the following: (i) fear with the amygdala; (ii) disgust with the insula, ventral prefrontal cortex, and amygdala; (iii) sadness with the medial prefrontal cortex; (iv) anger with orbitofrontal cortex; and (v) happiness with rostral anterior cingulate cortex. Individual differences such as age and sex can influence some brain functions.
In [38], a meta-analysis of neuroimaging studies was carried out to verify whether the data support the locationist or the constructionist-psychological theory of emotions. A locationist account is supported if a certain emotion category (e.g., fear) corresponds to a consistent and specific activation of a brain region across the considered neuroimaging studies. A constructionist-psychological account is supported if the same brain regions activate for different emotion categories. Furthermore, these brain regions may also carry out some basic psychological operations (e.g., core affect, conceptualization, language, or executive attention). The analysis did not find strong evidence for the locationist hypothesis of brain-emotion correspondence. The increase in brain region activation was not specific to instances of a particular discrete emotion.
Few meta-analyses were conducted to explore the link between brain regions and the affective dimensions of valence and arousal because of the lack of neuroimaging studies investigating the two dimensions independently. Two main views were proposed. The first one posits that arousal and valence are separately processed. Thus, an increase in amygdala activity is linked with the arousal dimension. At the same time, an activation in medial and lateral orbitofrontal cortex (OFC) regions is related to positive and negative valence, respectively [39]. Further studies hypothesized the involvement of multiple brain regions in the representation of arousal and valence. This hypothesis was confirmed with a finite impulse response model and suggested the existence of networks underlying the valence and arousal dimensions (e.g., the network responsible for pleasant emotions includes the midbrain, the ventral striatum, and the caudate nucleus) [40]. Emotion is, therefore, the result of a complex process that takes place on several levels. The stimulus enters the brain stem, and the limbic system interprets it. The hypothalamus elaborates the stimulus and triggers the corresponding visceral physiological reactions (e.g., increased heart rate, chills, etc.). The amygdala links the stimulus to the emotional reaction and compares new stimuli to past experience. In the end, the temporal and prefrontal cortices cognitively evaluate the experienced emotion. This explains the role of the frontal cortex as the emotional control centre.
EEG-based studies on the origin and brain processing of emotions are mostly aimed at identifying asymmetrical EEG activation over the frontal cortex, referring to the theory of valence. Thus, the most common feature employed to detect the difference in activation between the two cortical hemispheres is the alpha asymmetry. Alpha activity monitoring is predominantly carried out at F3 and F4 positions, as they are located above the dorsolateral prefrontal cortex [41]. Conversely, among the studies adopting the theory of discrete emotions, there is a lack of studies anchoring the proposed EEG features to neurophysiological theories.
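To make the alpha asymmetry feature concrete, the sketch below computes one common formulation of it, the difference between the natural logarithms of alpha-band power at F4 and at F3. The 8-13 Hz band limits, the plain periodogram estimate, the sampling rate, and the function names are assumptions of this example and are not taken from the surveyed studies; the signals are synthetic.

```python
import numpy as np

def band_power(signal, fs, f_low=8.0, f_high=13.0):
    """Mean periodogram power of `signal` between f_low and f_high (Hz)."""
    signal = np.asarray(signal, dtype=float)
    signal = signal - signal.mean()  # remove the DC offset
    spectrum = np.abs(np.fft.rfft(signal)) ** 2 / (fs * signal.size)
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
    mask = (freqs >= f_low) & (freqs <= f_high)
    return spectrum[mask].mean()

def frontal_alpha_asymmetry(f3, f4, fs):
    """ln(alpha power at F4) - ln(alpha power at F3)."""
    return np.log(band_power(f4, fs)) - np.log(band_power(f3, fs))

# Synthetic example (not real EEG): a 10 Hz rhythm that is stronger at F3.
fs = 128  # assumed sampling rate in Hz
t = np.arange(0, 4, 1.0 / fs)
rng = np.random.default_rng(0)
f3 = 2.0 * np.sin(2 * np.pi * 10 * t) + 0.1 * rng.standard_normal(t.size)
f4 = 1.0 * np.sin(2 * np.pi * 10 * t) + 0.1 * rng.standard_normal(t.size)
faa = frontal_alpha_asymmetry(f3, f4, fs)
print(faa)  # negative here, because alpha power is larger at F3 than at F4
```

In practice, a smoothed spectral estimate and artifact rejection would normally precede this step; the periodogram is used here only to keep the example dependency-free beyond NumPy.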
As evidenced by the literature, neuroimaging studies mainly rely on discrete emotion theories, while EEG-based studies mainly rely on dimensional theories.

III. RESEARCH METHOD
The analysed studies were collected from Scopus, Pubmed, and IEEE Xplore by implementing the PRISMA guidelines about the systematic review reporting [42], [43].
The following query was used on the search engines: (eeg AND emotion AND (("reduced number" AND (channels OR electrodes)) OR "wearable" OR "six channels" OR "portable" OR "one channel" OR "two channels" OR "three channels" OR "four channels" OR "five channels" OR "six channels" OR "seven channels" OR "eight channels" OR "nine channels" OR "ten channels" OR "channel minimisation" OR "channel reduction" OR "channel selection")). The review was carried out by inspecting the titles, the keywords and the abstracts of the papers. Only journal and conference articles were considered; book chapters and reviews were excluded from the results. No time limits were applied. A total of 418 papers were obtained: 258 from Scopus, 89 from Pubmed, and 71 from IEEE Xplore. One hundred five duplicates were excluded, and 313 papers were selected for the next step. Then, the articles were filtered according to their abstracts: all the papers not dealing with the emotion recognition field or not exploiting the EEG signal were eliminated. By the end of this round, 140 articles were left. Subsequently, each article was analyzed by reading the complete text, and a further 25 papers were excluded because they dealt with the theme of emotions but did not carry out a classification of the emotional states (e.g., the aim was to distinguish between depressed and non-depressed groups). One hundred fifteen papers were finally selected and classified. The EEG device employed for recording the signals was not a criterion used to evaluate the inclusion/exclusion of the papers within the review. However, it is important to underline the role played by the equipment in guaranteeing the quality of the signal and therefore the classification performance. An objective criterion for establishing the adequacy of the instrumentation is represented by compliance with the standard IEC 60601-2-26:2012 [44] (replaced in 2019 by the standard IEC 80601-2-26:2019 [45]).
The following instruments will be considered as "compliant": (i) devices in compliance with this standard and specifically intended for clinical use, and (ii) instruments produced for scientific research which are accompanied by datasheets reporting the satisfaction of the minimum requirements identified by the standard. Studies reporting the use of "non-compliant" devices were also included in the review, given their high share of the papers in the current scientific panorama [46]. These papers emblematically represent an important ongoing scientific and technological process of searching for a trade-off between wearability, low cost and classification performance. In general, the metrological characterization of an instrument can be a useful tool to evaluate its adequacy, as in [47]. In Fig. 1, the phases of identification, screening, eligibility, and inclusion of the papers are shown in detail.

IV. GENERAL FINDINGS
A. TIME TREND ANALYSIS
A time-trend analysis of the number of papers published each year reveals increasing attention towards the topic of emotion recognition with a minimal number of channels. From 2013 onwards, the number of published papers per year increased almost linearly. Fig. 2 shows the time trend.

B. MINIMIZATION STRATEGIES
The articles fall into three macro-clusters, namely data-driven, prior-knowledge, and market, according to the channel reduction strategy employed. The data-driven cluster includes studies aiming at channel reduction by applying suitable algorithms to datasets acquired with a high number of channels. The prior-knowledge cluster collects studies that select the optimal subset of channels relying on prior neurophysiological knowledge. Lastly, papers exploiting commercially available devices equipped with a low number of channels are included in the market cluster. Fig. 3 shows the occurrences of papers for the clusters mentioned above.

C. REFERENCE THEORY
Dimensional theory is the most widely adopted, specifically Russell's circumplex model of affect, in which emotions are represented by two dimensions, namely emotional valence and arousal. The discrete theories of emotions are less commonly employed than the dimensional theories. The number of papers exploiting the above-mentioned reference theories is reported in Fig. 4. Ten papers analyzed emotion by referring to both discrete and dimensional theories.
Depending on the reference theory adopted, different emotions can be assessed. In the case of the discrete emotion theory, primary (or secondary) emotions can be classified. In the case of the dimensional model, valence, arousal (possibly dominance) are evaluated.

D. EXPERIMENTAL SAMPLE SIZE AND SELECTION
60% of the surveyed papers employ a self-produced dataset acquired for emotion recognition. The remaining 40% employ publicly available datasets, such as SEED [48], DEAP [49], DREAMER [50], AMIGOS [51], and MAHNOB-HCI [52]. No particular criteria were used for the selection of the experimental sample. The information reported about the dataset mainly concerns the number of subjects and their age, sex and health conditions. Less frequently, the ethnicity and the presence of cognitive or hearing issues are indicated. Four clusters of papers were identified according to the sample size: (i) 1 ≤ n ≤ 10, (ii) 11 ≤ n ≤ 20, (iii) 21 ≤ n ≤ 30, and (iv) n ≥ 31, where n is the number of subjects involved in the experimental activities. In Fig. 5, the number of papers for each interval is reported. The same item can be counted multiple times when it falls into multiple categories.

E. STANDARDIZED STIMULI
A non-standardized set of stimuli (mostly video clips) was used to elicit emotions in most experimental setups. Standardized sets of eliciting stimuli are mainly pictures and are employed in a minority of cases. Results are shown in Fig. 6. The total number of studies exceeds the number of articles reviewed because some studies fall into more than one category and were counted multiple times.

V. SPECIFIC FINDINGS ON CHANNEL REDUCTION APPROACHES
This section focuses on the most known strategies for the reduction of channels in the emotion recognition field reported in the selected literature. Three channel reduction strategies largely used in the literature were identified: A) manually choosing the best subset of channels based on the prior neurophysiological knowledge; B) use of commercial EEG devices provided with a low number of channels, and C) use of machine learning-based algorithms (data-driven approaches) to find the best subset of channels.

A. PRIOR KNOWLEDGE-BASED APPROACHES
Papers employing prior knowledge to select the best electrode placement mainly refer to the theory of the right hemisphere and the theory of valence. Asymmetry in EEG patterns between the two hemispheres is mostly employed for emotion recognition, particularly over the dorsolateral prefrontal cortex. Among these papers, 68% select electrodes only from the frontal area, following the theory of valence. The remaining 32% select a few electrodes symmetrically from different areas of each hemisphere (frontal, parietal, temporal, and occipital).
The most adopted electrodes were identified by counting the number of articles that proposed them, and the percentage for each channel was assessed. The most adopted electrodes in neurophysiology-based channel reduction for emotion recognition are shown in Fig. 7. Only the channels proposed by at least 30% of the studies were reported.

B. COMMERCIAL EEG DEVICES WITH A LOW NUMBER OF CHANNELS
EEG devices have become increasingly available on the market over the last decade. Most of these are general-purpose (i.e., measuring cognitive functions, sleep phases, meditation states, etc.). Therefore, the positioning of the electrodes is not anchored to a consistent neurophysiological theory.
The commercial devices provided with a number of channels ≤ 16 were classified as devices with a low number of channels. The value 16 is an empirical threshold that emerged from the surveyed literature: it is the maximum number of channels for which a device can still be defined as wearable [82]. Below, the commercial devices provided with a low number of channels (< 16) employed in scientific papers on emotion recognition are reported: Emotiv Epoc+ [83] [97], [98], [99], [100], [101], [102], [103], [104], [105], [106], [107]), the Emotiv Epoc+ by 30% [51], [108], [109], [110], [111], [112], [113], [114], [115], [116], and the NeuroSky Mindwave by 12% [117], [118], [119], [120]. The OpenBCI, the Mindlink, the Emotiv Insight, the Nexus10, the Nexus4, and the abmedica Helmate are employed in a minority of cases [82], [121], [122], [123], [124], [125], [126], [127]. In this case too, it is possible to distinguish between compliant and non-compliant devices. Both the abmedica Helmate and the Enobio8 comply with the mentioned standard. The scalp areas covered by the electrodes differ across devices. The Emotiv Epoc+ and the Muse systems lack channels along the midline. The IMEC system shows a higher concentration of electrodes in the frontal area of the scalp, but recording sites in the central area are also present. The NeuroSky Mindwave and the Mindlink present a very small number of channels concentrated in the pre-frontal area. Table 1 reports the channels provided by the aforementioned commercial solutions. The devices not reported in the table do not have a fixed configuration of channels and allow the positioning of the electrodes to be changed within the positions offered by the compatible EEG cap. In Fig. 8, the most used channels are represented. The different colours indicate the percentage of devices that employ each channel. Only the channels used in at least 30% of the devices were reported.
A subset of 12 channels can be identified as the most commonly used in commercially-available EEG systems employed for emotion recognition goals, namely Fp1, Fp2, F3, F4, F7, F8, C3, C4, P7, P8, O1, and O2. It is worth noting the symmetrical distribution of electrodes used in commercial devices.
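The counting procedure behind this kind of summary (share of devices, or of studies, that include each channel, followed by a 30% cut-off) can be sketched as follows. The montages below are hypothetical placeholders chosen for illustration and do not reproduce Table 1; the function name is likewise an assumption.

```python
from collections import Counter

def frequent_channels(montages, min_share=0.30):
    """Return the channels used by at least `min_share` of the devices,
    mapped to the share of devices that include them."""
    counts = Counter(ch for channels in montages.values() for ch in set(channels))
    n_devices = len(montages)
    return {ch: n / n_devices
            for ch, n in sorted(counts.items())
            if n / n_devices >= min_share}

# Hypothetical montages, for illustration only (not the content of Table 1).
montages = {
    "device_a": ["Fp1", "Fp2", "F3", "F4"],
    "device_b": ["Fp1", "F3", "F4", "O1", "O2"],
    "device_c": ["Fp1"],
    "device_d": ["F3", "F4", "C3", "C4"],
}
selected = frequent_channels(montages)
print(selected)
```

The same function applies unchanged to the prior knowledge-based studies if each key is a paper and each value is the list of electrodes that paper proposes.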

C. DATA-DRIVEN APPROACHES
The problem of channel selection is strictly related to the feature selection problem for classification methods, for which several surveys are available in the literature, e.g., [128], [129], [130]. It is commonly accepted that the methods for feature selection fall into three major categories, depending on the search strategy adopted to find the most meaningful features: (i) filter methods, in which the subset of features is selected as a preprocessing step and does not depend on the learning approach; (ii) wrapper methods, in which a learning engine is used to score subsets of features according to their predictive power; and (iii) embedded methods, in which the selection is integrated in the training phase and usually strongly depends on the learning approach. Most of the methods proposed for the selection of the best EEG channels for emotion recognition fall into one of the above categories, once the word feature is replaced with the word channel. In addition, some methods achieve channel selection via feature selection: first, features are selected considering signals from all channels, and then the channels with more selected features are labelled as the most informative.
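The last strategy, channel selection via feature selection, reduces to counting how many selected features belong to each channel. The following minimal sketch assumes that a previous, unspecified feature selection step has already returned a list of (channel, feature) pairs; the feature names and the function name are hypothetical.

```python
from collections import Counter

def rank_channels_by_selected_features(selected_features):
    """Rank channels by how many of their features were selected.

    `selected_features` is a list of (channel, feature_name) pairs returned
    by any feature selection step run on the features of all channels.
    """
    counts = Counter(channel for channel, _ in selected_features)
    # Sort by descending count, then alphabetically for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical outcome of a feature selection step.
selected = [("F3", "alpha_power"), ("F3", "beta_power"),
            ("F4", "alpha_power"), ("P3", "theta_power"),
            ("F3", "theta_power"), ("F4", "beta_power")]
ranking = rank_channels_by_selected_features(selected)
print(ranking)
```

Keeping the top entries of the ranking then yields the reduced channel set; how many to keep is a design choice that the counting itself does not determine.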

1) FILTER METHODS
Unlike the other strategies, filter methods are classifier-independent: no classifier is needed to select the best electrode sets. However, some score measure is needed to rank the electrodes. Three strategies are mainly used to rank the channels/features in the current literature: Correlation-based methods, Mutual Information-based methods, and Relief-based methods.

a: CORRELATION-BASED
The Correlation Coefficient [131] is an indicator of the relationship between a pair of variables X and Y. It is usually defined as:

$$\rho(X, Y) = \frac{\sum_i (x_i - E(X))(y_i - E(Y))}{\sqrt{\sum_i (x_i - E(X))^2}\,\sqrt{\sum_i (y_i - E(Y))^2}}$$

where E(·) is the average operator and the summations run over the values assumed by the variables. In [132], the functional interconnections between electrode pairs are computed for each session of each subject. In that work, the correlation is used as a similarity measure between pairs of electrodes. The work was validated on the DEAP 32-channel signals on four classes sampled from the Valence/Arousal space. The correlations were then reported in connectivity graphs to facilitate pattern detection. Several statistics are then computed using the graphs: electrode degrees (number of electrodes connected to each electrode) and electrode modes. These indexes are used to estimate a final channel activation probability, which is used to select the best four electrodes for the proposed emotions, resulting in CP1, O1, Pz, and PO4. The Reverse Correlation Algorithm (RCA, [133], [134]) is an unsupervised feature selection method. A pseudo-code of the algorithm for feature selection is reported in the Appendix.
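As an illustration of how electrode degrees can be derived from pairwise correlations, the sketch below builds a connectivity graph by thresholding the absolute correlation between channels. The thresholding rule, the threshold value, and the function name are assumptions of this example; the graph construction in [132] may differ.

```python
import numpy as np

def electrode_degrees(signals, threshold=0.5):
    """Degree of each electrode in a correlation-based connectivity graph.

    signals: array of shape (n_channels, n_samples).
    Two electrodes are treated as connected when the absolute value of their
    correlation coefficient exceeds `threshold` (assumed rule).
    """
    corr = np.corrcoef(signals)           # (n_channels, n_channels)
    adjacency = np.abs(corr) > threshold
    np.fill_diagonal(adjacency, False)    # ignore self-connections
    return adjacency.sum(axis=1)

# Synthetic signals: three channels share a common source, one is independent.
rng = np.random.default_rng(0)
source = rng.standard_normal(1000)
signals = np.vstack([
    source,
    source + 0.1 * rng.standard_normal(1000),
    source + 0.1 * rng.standard_normal(1000),
    rng.standard_normal(1000),
])
degrees = electrode_degrees(signals)
print(degrees)
```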
In [135], the RCA was adapted for the channel selection task. The intuition is that a lower correlation between channels can be interpreted as low connectivity with the other ones. In a nutshell, considering each channel as a collection of features, it is possible to give a score to each channel by computing the correlations between the channels' features and summing them up. Next, the channels having the lowest sums are selected iteratively. Experiments were done on the DEAP dataset, where stimuli were one-minute music video clips. The method was assessed on a four-class problem (happy, sad, calm, enthusiasm) in the Valence/Arousal space, compared with three other channel selection methods, and validated with a 10-fold CV. The authors searched for the best three channels, finding the set composed of P8, AF4 and Cz to be the best configuration. The authors highlighted that several of the inspected methods also select the Cz electrode and the right part of the brain. They motivated this result by noting that, as highlighted by other studies, right-brain activity reflects a negative emotional state, and negative emotions are considered to elicit more reactivity than positive ones. Similarly, in [136] and [137], sets of 11 and 4 optimal channels, respectively, are found for the emotion classification problem on the DEAP dataset. To generalise the channel selection to an unseen subject, in [136] and [137] the RCA is applied to each subject, and the most frequent channels are then selected. In particular, the authors highlighted that using the proposed 11 channels in an emotion classification task led to an average difference of less than 1% with respect to using all the 32 channels used to acquire the data.
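The scoring idea can be sketched as follows. This is a simplified, single-pass illustration and not the RCA adaptation of [135]: it assumes that the score of a channel is the sum of the absolute Pearson correlations between its features and the features of every other channel, and it simply keeps the k channels with the lowest scores. The array layout, the use of absolute values, and the function names are assumptions of this example.

```python
import numpy as np

def channel_correlation_scores(features):
    """Sum of absolute feature correlations of each channel with all others.

    features: array of shape (n_trials, n_channels, n_features).
    """
    n_trials, n_channels, n_features = features.shape
    flat = features.reshape(n_trials, n_channels * n_features)
    corr = np.abs(np.corrcoef(flat, rowvar=False))
    # Entry (a, i, b, j) couples feature i of channel a with feature j of b.
    blocks = corr.reshape(n_channels, n_features, n_channels, n_features)
    pairwise = blocks.sum(axis=(1, 3))    # (n_channels, n_channels)
    np.fill_diagonal(pairwise, 0.0)       # ignore within-channel correlations
    return pairwise.sum(axis=1)

def select_least_correlated(features, k):
    """Indices of the k channels with the lowest correlation sums."""
    scores = channel_correlation_scores(features)
    return np.argsort(scores)[:k]

# Synthetic data: channels 0-2 share a latent component, channel 3 does not.
rng = np.random.default_rng(0)
latent = rng.standard_normal((200, 1, 1))
features = 0.3 * rng.standard_normal((200, 4, 2))
features[:, :3, :] += latent
best = select_least_correlated(features, k=1)
print(best)
```

An iterative variant would recompute the scores after each selection; the single ranking pass is used here only to keep the example short.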

b: MUTUAL INFORMATION-BASED
Given two continuous random variables X, Y, the Mutual Information MI is defined as [138]:

$$ MI(X, Y) = \iint p(x, y) \log \frac{p(x, y)}{p(x)\,p(y)} \, dx \, dy $$
where p(x, y), p(x), p(y) are the joint probability density function (pdf) of X and Y, the marginal pdf of X and the marginal pdf of Y, respectively. High values of MI correspond to a strong relation between the two variables. If the variables are discrete, the integrals are replaced by sums and the pdfs by probabilities. Since MI is not computed from summary statistics of the variables, it can measure any kind of relationship between a pair of variables, differently from other measures [139]. On the other hand, MI requires estimating the pdfs of the variables from data samples. Since 0 ≤ MI(X, Y) ≤ min(H(X), H(Y)) [138], [140], a normalized version of MI(·, ·) can be defined as

$$ NMI(X, Y) = \frac{MI(X, Y)}{\min\{H(X), H(Y)\}} $$

where H(X) and H(Y) are the Entropy [138] of X and Y, respectively. In [141], a feature selection algorithm based on MI was proposed. The proposed method builds a set of features iteratively, maximising the MI between the classes and the features and penalising the features highly dependent on each other. A pseudo-code of the algorithm proposed in [141] is reported in the Appendix.
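For discrete variables, the quantities above can be estimated by simple counting. The following is a minimal sketch under stated assumptions: probabilities are plug-in estimates from sample frequencies, the logarithm is natural (so values are in nats), and the normalisation divides by min(H(X), H(Y)), chosen here because it follows from the bound reported above. The function names and the toy samples are hypothetical.

```python
import math
from collections import Counter

def entropy(xs):
    """Shannon entropy (in nats) of a discrete sample."""
    n = len(xs)
    return -sum((c / n) * math.log(c / n) for c in Counter(xs).values())

def mutual_information(xs, ys):
    """Plug-in estimate of MI(X, Y) in nats from paired discrete samples."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )

def normalized_mi(xs, ys):
    """MI divided by min(H(X), H(Y)); set to 0 if either entropy is 0."""
    h = min(entropy(xs), entropy(ys))
    return mutual_information(xs, ys) / h if h > 0 else 0.0

# Toy discrete samples (hypothetical).
x = [0, 0, 1, 1]
y_same = [0, 0, 1, 1]    # fully determined by x
y_indep = [0, 1, 0, 1]   # empirically independent of x
```

With these samples, the normalised value is 1 for the identical pair and the MI is 0 for the independent pair. For continuous EEG features, the pdfs would first have to be estimated (for instance by binning), which is the estimation step mentioned in the text.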
Reference [140] proposed a variation of the algorithm in [141] using the Normalized Mutual Information (NMI) index. NMI is adopted in [142] for the channel selection problem to build connection matrices between channels, which are then used in a channel selection procedure. Also in this case, the method is assessed on the DEAP dataset, showing, as for other methods, a certain channel reduction while maintaining high classification accuracy.
Mutual Information is again used in [143] with wavelet entropy, and average wavelet coefficient (WEAVE) features, halving the number of channels (from 32 to 16) with less than 8 % loss of accuracy.
Instead, in [144] an automatic channel selection procedure is performed exploiting the minimum Redundancy Maximum Relevance feature selection algorithm (mRMR, [145], [146]). mRMR simultaneously selects the features that are least redundant and most relevant for a given class, and then either chooses the corresponding channels or assigns a weight to each channel by averaging the weights of the features in the channel. Both redundancy and relevance are computed considering the Mutual Information between features. In other terms, mRMR optimizes the following conditions at the same time:

$$ \max_{F} \frac{1}{|F|} \sum_{X^{(i)} \in F} MI(X^{(i)}, C), \qquad \min_{F} \frac{1}{|F|^2} \sum_{X^{(i)}, X^{(j)} \in F} MI(X^{(i)}, X^{(j)}) $$

with {X^{(i)}}_{i=1}^{d} the set of features, C the class variable, MI(·, ·) the Mutual Information, and F the set of the desired features. The first condition maximizes the relevance between the selected features and the class, while the second one minimizes the redundancy between different features. In general, mRMR is an incremental search scheme that selects one feature at each iteration, not taking into account the interactions between groups of features. In [140], it is highlighted that the mRMR algorithm can be obtained by setting β = 1/|F| in the algorithm of [141]. In [144], two different methods to adapt mRMR to channel selection were proposed: the former uses mRMR for feature selection and then selects the corresponding channels; the latter uses mRMR to assign a weight to the features and then selects the channels with the highest average feature weights. The method is tested on the DEAP dataset, reducing from 32 to 28 and 22 channels with a slight loss in classification accuracy (around 1.37%) using a Kernel Extreme Learning Machine (KELM, [147]) as a classifier. Furthermore, well-known electrode sets discovered in other studies were tested.
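The incremental nature of mRMR can be sketched as a greedy loop. This is an illustrative sketch and not the implementation of [144]: it assumes discretised features, a plug-in MI estimate, and the common "relevance minus mean redundancy" criterion; the feature names, the toy values, and the `channel_band` naming convention are hypothetical.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in MI estimate (nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def mrmr(features, labels, n_select):
    """Greedy mRMR: at each step, add the feature with the highest
    relevance to the labels minus mean redundancy with selected ones."""
    selected, remaining = [], list(features)
    while remaining and len(selected) < n_select:
        def score(name):
            relevance = mutual_information(features[name], labels)
            if not selected:
                return relevance
            redundancy = sum(mutual_information(features[name], features[s])
                             for s in selected) / len(selected)
            return relevance - redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy discretised features (hypothetical values, named "<channel>_<band>").
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "Fp1_alpha": [0, 0, 0, 1, 1, 1, 1, 1],  # most relevant
    "Fp1_beta":  [0, 0, 0, 1, 1, 1, 1, 0],  # relevant but redundant with Fp1_alpha
    "F4_alpha":  [0, 1, 0, 0, 1, 0, 1, 1],  # equally relevant, less redundant
}
picked = mrmr(features, labels, n_select=2)
# First adaptation described in the text: keep the channels of the picked features.
channels = sorted({name.split("_")[0] for name in picked})
```

In this toy case the second pick is the less redundant feature, so the resulting channel set spans two channels instead of one.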

c: RELIEF-BASED
The Relief Algorithm [148] is a feature selection procedure that ranks each feature by analysing the differences between feature values sampled from the items in the dataset. Relief starts from the hypothesis that a representative feature has similar values between sample acquisitions of the same class and very different values for instances of different classes. Therefore, the basic Relief strategy assigns a score to each feature X (i) in an iterative way. A pseudo-code of this strategy is given in the Appendix.
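The scoring idea can be sketched as follows. This is a simplified illustration and not the pseudo-code of the Appendix: it assumes two classes with at least two instances each, visits every instance once instead of sampling at random, and uses a range-normalised Manhattan distance; the function name and the toy data are hypothetical.

```python
def relief(X, y):
    """Basic two-class Relief: reward features that differ on the nearest
    instance of the other class (miss) and agree on the nearest instance
    of the same class (hit)."""
    n, d = len(X), len(X[0])
    # Feature ranges, used to scale differences into [0, 1].
    span = [(max(r[j] for r in X) - min(r[j] for r in X)) or 1.0
            for j in range(d)]

    def dist(a, b):
        return sum(abs(a[j] - b[j]) / span[j] for j in range(d))

    w = [0.0] * d
    for i in range(n):
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        hit = min(hits, key=lambda k: dist(X[i], X[k]))
        miss = min(misses, key=lambda k: dist(X[i], X[k]))
        for j in range(d):
            diff_miss = abs(X[i][j] - X[miss][j]) / span[j]
            diff_hit = abs(X[i][j] - X[hit][j]) / span[j]
            w[j] += (diff_miss - diff_hit) / n
    return w

# Toy data (hypothetical): feature 0 separates the classes, feature 1 does not.
X = [[0.1, 0.5], [0.2, 0.1], [0.9, 0.4], [0.8, 0.2]]
y = [0, 0, 1, 1]
weights = relief(X, y)
```

Here the first feature receives a positive weight and the second a negative one, which matches the hypothesis that a representative feature is stable within a class and different across classes.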
In [149], ReliefF, a popular filter feature selection method built on top of Relief, is proposed. Its main differences with respect to Relief consist in searching for k neighbours with k > 1, instead of just one as in the Relief algorithm, and averaging the k neighbours' contributions in the w update to improve the reliability of the scores. Furthermore, ReliefF uses a different strategy from Relief to handle multiclass data and can work with incomplete data.
ReliefF is exploited in [150] for the channel selection task. Three different channel ranking strategies are proposed: selecting the channels that contain the top-N features ranked by ReliefF; averaging the weights over the features belonging to each channel (Mean-ReliefF-Channel-Selection, MRCS); and refining the ReliefF channel ranks by exploiting the classification performance given by an SVM. These strategies are tested on a subset of the DEAP dataset. Similarly, [151] performs valence recognition experiments both on DEAP and on self-collected data, selecting the channels having the highest feature ranks. In contrast, in [152], the channels are scored relying on the average feature weights. ReliefF is also tested in [153] and [154] on proprietary datasets, and in [155] and [151].
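Two of the ways of turning feature weights into a channel ranking, namely averaging the weights per channel and keeping the channels of the top-N features, can be sketched as below. The weights, the feature names, and the feature-to-channel map are hypothetical placeholders and not values from [150] to [152].

```python
from collections import defaultdict

def mean_weight_per_channel(weights, channel_of):
    """Average the feature weights within each channel."""
    groups = defaultdict(list)
    for feat, w in weights.items():
        groups[channel_of[feat]].append(w)
    return {ch: sum(ws) / len(ws) for ch, ws in groups.items()}

def channels_of_top_features(weights, channel_of, n):
    """Channels (in rank order, without repeats) owning the n best features."""
    top = sorted(weights, key=weights.get, reverse=True)[:n]
    ordered = []
    for feat in top:
        if channel_of[feat] not in ordered:
            ordered.append(channel_of[feat])
    return ordered

# Hypothetical feature weights, e.g. as returned by a Relief-like ranker.
weights = {"F3_alpha": 0.9, "F3_beta": 0.1,
           "P4_alpha": 0.6, "P4_beta": 0.5,
           "O1_alpha": 0.2, "O1_beta": 0.1}
channel_of = {name: name.split("_")[0] for name in weights}

means = mean_weight_per_channel(weights, channel_of)
by_mean = sorted(means, key=means.get, reverse=True)
by_top2 = channels_of_top_features(weights, channel_of, n=2)
```

The two strategies need not agree: with these placeholder weights, the averaging strategy ranks P4 first, while the top-N strategy ranks F3 first because of its single strongest feature.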

d: OTHER RANK MEASURES
Other measures used to rank the channels are reported.
-ReliefF-mRMR: mRMR and ReliefF are both adopted in [156]. In the proposed work, an intermediate set of 18 optimal channels was selected using the ReliefF algorithm and then refined toward a final set of 10 channels. The strategy is validated on the DEAP dataset.
-Common Spatial Pattern-based: in [157], a channel selection procedure was assessed on self-made data and on the MAHNOB-HCI dataset with three different emotions sampled from the Valence dimension. The adopted channel selection procedure gives a score to each channel relying on a multiclass CSP transform. Each 60 s signal is framed into 6 s windows and the method is validated with a 5-fold CV for each subject. Analysing the classification accuracy with different numbers of channels, the authors observed that the accuracy increases with the number of channels as long as this number is below 19.
-Differential Entropy: in [158], a combined feature-channel selection method is proposed. The features are first extracted by four different Neural Networks and then passed to the channel selection procedure. The channel selection criterion is based on the Differential Entropy. Given a threshold value, the channels with an entropy greater than the threshold are selected, omitting the remaining ones. The work is validated using the DEAP, MAHNOB, and SEED datasets.
-Synchronization Likelihood: another filter method is proposed in [159], measuring the linear interdependency between signals via the Synchronization Likelihood [160].
-Stepwise Discriminant Analysis: the use of Stepwise Discriminant Analysis (SDA) [161] is discussed in [162]. The final classification scores are obtained with Linear Discriminant Analysis.
-EigenVector Centrality Method: in [163], one channel is selected for a four-emotion (fear, sad, happy, relax) classification task. The channel selection is performed by the EigenVector Centrality Method (EVCM). EigenVector Centrality relies on the following principle: given the relations between channels arranged in an adjacency matrix A, the eigenvector of A associated with the greatest eigenvalue gives a centrality score for each node. In this context, the centrality of a node (channel) is a measure of the node's influence on the whole network. Video stimuli are used to elicit the emotions, mapped in the arousal/valence model. The original signal is acquired using a 24-channel EEG device and reduced to one channel. The best channel found by the proposed method was FP1-F3.
-Energy Variation: In [164], optimal channels for each subject are selected, looking for the channels showing the most significant changes in brain activity during emotions. The relevant channels are selected by computing a probability score on the relevance of each channel. This probability is computed considering the Energy variations in the frequency bands. An estimate of the Energy is obtained from DFT and Numerator-Group-Delay (NGD). The chosen electrodes are different for each subject. The idea of selecting channels for each subject is based on the assumption that the folding of the cortex differs between any two people and on the findings of [165] that, in functional magnetic resonance imaging (fMRI) scans, the brain activity was unique for each emotional state. In other words, the authors searched for the most relevant electrodes for the investigated emotional states. The validation was made on the DEAP dataset. Final classification was made with RNN and QDC.
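Among the rank measures listed above, the eigenvector centrality principle lends itself to a compact illustration. The sketch below is not the EVCM implementation of [163]: it assumes a symmetric, non-negative adjacency matrix of a connected graph, uses plain power iteration, and works on hypothetical channel names and weights.

```python
def eigenvector_centrality(A, iters=500, tol=1e-12):
    """Dominant eigenvector of a non-negative symmetric matrix A,
    obtained by power iteration and normalised to sum to 1."""
    n = len(A)
    v = [1.0 / n] * n
    for _ in range(iters):
        # Multiply by (A + I): same eigenvectors as A, but the shift
        # avoids the oscillation that plain iteration can show.
        nxt = [v[i] + sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
        total = sum(nxt)
        nxt = [x / total for x in nxt]
        converged = max(abs(a - b) for a, b in zip(nxt, v)) < tol
        v = nxt
        if converged:
            break
    return v

# Hypothetical channel names and connection weights (e.g. |correlation|).
names = ["Ch1", "Ch2", "Ch3", "Ch4"]
A = [
    [0.0, 0.8, 0.7, 0.6],  # Ch1 is strongly connected to all the others
    [0.8, 0.0, 0.2, 0.1],
    [0.7, 0.2, 0.0, 0.1],
    [0.6, 0.1, 0.1, 0.0],
]
centrality = eigenvector_centrality(A)
most_central = names[centrality.index(max(centrality))]
```

The channel with the strongest links to the rest of the network obtains the highest score and would be the single channel retained in a one-channel setting.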

2) WRAPPER METHODS
In the channel selection domain, a wrapper method gives scores to subsets of channels using a learning engine. The channel subsets can be given a priori, relying on some theory or empirical evidence, or determined by a machine learning method. In particular, a significant part of the literature explores swarm intelligence algorithms for channel discovery.

a: A PRIORI KNOWLEDGE CHANNEL SETS
In [166], several electrode configurations for emotion recognition and attention recognition are proposed. The final aim of the study is to propose a general-purpose set of electrodes suitable for both tasks. The study proposed configurations composed of 2, 4, 6, and 8 electrodes, validated using a 10 × 10-CV on the DEAP dataset on 4 classes sampled from the Valence/Arousal space. Sets with an even number of electrodes are chosen to satisfy the hemispherical symmetry. The search for the best electrode sets is exhaustive, considering different feature configurations. Each set's final rank is given as a combination of the resulting accuracy and the normalized concentration performance measure (CONC). In the conclusions, the authors highlight that, for emotion recognition, all the discovered sets always include the F7 and F8 channels.
Using the appraisal as reference theory, in [56], different electrodes sets are experimentally assessed in a 3-fold Cross-Validation scheme on an SVM classifier, considering the best channels highlighted to be associated with appraisal processing in other studies.

b: CLASSICAL MACHINE LEARNING FOR CHANNEL SETS DISCOVERY
A simple empirical study for channel selection using a simple classifier is reported in [167]. The method builds a classifier for each channel on data acquired from a 21-channel device. The final results showed that the occipital channels have good discriminatory performance (around 80% classification accuracy). The experiments were performed on a proprietary dataset from 26 female volunteers. In [168], the best-performing channels are empirically selected from a set of 32 channels. The channels' performance was assessed using a Neural Network, and then the channels giving the best performance for each subject were selected. The final classification accuracies were validated with a 4-fold Cross-Validation on proprietary data.
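The per-channel wrapper idea, one classifier per channel and a ranking by validated accuracy, can be sketched as follows. The classifier is an assumption made only for illustration (a nearest-class-mean rule on one scalar feature per channel, scored with leave-one-out validation); it is not the classifier of [167] or the Neural Network of [168], and the data are hypothetical.

```python
def loo_accuracy(values, labels):
    """Leave-one-out accuracy of a nearest-class-mean classifier
    on a single scalar feature (needs >= 2 samples per class)."""
    correct = 0
    for i in range(len(values)):
        means = {}
        for c in set(labels):
            train = [v for k, (v, l) in enumerate(zip(values, labels))
                     if k != i and l == c]
            means[c] = sum(train) / len(train)
        pred = min(means, key=lambda c: abs(values[i] - means[c]))
        correct += int(pred == labels[i])
    return correct / len(values)

def rank_channels(data, labels):
    """Score every channel on its own and sort by accuracy (best first)."""
    acc = {ch: loo_accuracy(vals, labels) for ch, vals in data.items()}
    return sorted(acc, key=acc.get, reverse=True), acc

# Hypothetical per-trial feature values for two channels.
labels = [0, 0, 0, 1, 1, 1]
data = {
    "O1": [1.0, 1.2, 0.9, 3.0, 3.1, 2.8],  # separates the two classes
    "C3": [1.0, 3.0, 2.0, 1.1, 2.9, 2.1],  # overlaps across classes
}
ranking, accuracy = rank_channels(data, labels)
```

Being a wrapper, the ranking depends on the chosen learner and validation scheme, so a different classifier may order the channels differently.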
A SEnet architecture [169] for channel selection is proposed in [170]. The network is used to capture the dependencies between channels, assigning a weight to each channel. The method is tested on the DEAP dataset using an SVM classifier, reporting that selecting the top 7 or 12 channels gives better results than two competing methods.
In [54], a binary emotion classification problem on EEG signal of 26 subjects was addressed. A channel selection procedure, based on Gradient Boosting Decision Trees, selected a combination of channels located in the lateral annular region of the brain as the most effective for the proposed emotion classification task.
In [171], the performances on a three emotions classification problem on self-acquired data are used to find a set of 10 optimal channels. The classification performances of each channel are used to measure the most relevant channels for emotion alteration. After the validation procedure, the channel corresponding to the C6 area appears to be most sensitive to the emotion alteration. In general, the right hemisphere seems particularly sensitive to emotions.

c: SWARM INTELLIGENCE FOR CHANNEL SETS DISCOVERY
Several Swarm Intelligence algorithms [172] were adopted for channel selection. Most of them are inspired by mechanisms observed in nature. Channel selection is feasible with a swarm intelligence algorithm since it can exploit the most promising areas of the solution space without an exhaustive search.
In [173], Particle Swarm Optimization (PSO, [174]), Cuckoo Search (CS, [175]), Grey Wolf Optimizer (GWO, [176]), and Dragonfly [177] are adopted to select relevant features. The selected features are used to choose the most relevant channels for an emotion classification problem on the DEAP dataset with SVM and k-NN classifiers. A channel is chosen if at least one of its features was chosen for at least 31 subjects in 90% of the experiments in the feature selection stage. A group of 11 channels distributed over all brain regions was identified as involved in emotion classification.
In [178], a version of Differential Evolution (DE, [179], [180]) for feature selection is exploited for channel selection. In this work, feature and channel selection are tied together by selecting them in pairs via a Sparsity Constrained Differential Evolution (SCDE) approach. The feature-channel pairs are optimised synchronously in the global search, adopting a sparsity-constrained fitness function. The DE fitness function is obtained as the combination of the classification accuracy returned by Quadratic Discriminant Analysis (QDA) and a channel sparsity parameter that limits the number of selected channels. Channels are selected relying on the features selected during the DE procedure. The DEAP dataset is used for the experimental assessment, obtaining different sets of channels when varying the sparsity coefficient. A pseudo-code of the DE algorithm for feature selection is reported in the Appendix.
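The trade-off encoded by a sparsity-constrained fitness can be illustrated with a small sketch. The exact fitness of [178] is not reported here, so the linear penalty below (accuracy minus a coefficient times the fraction of active channels) is an assumption, as are the function names, the candidate masks, and their accuracies.

```python
def sparse_fitness(accuracy, mask, lam):
    """Accuracy penalised by the fraction of active channels.
    The linear form of the penalty is an assumption of this sketch."""
    return accuracy - lam * sum(mask) / len(mask)

# Hypothetical candidate channel masks (1 = channel kept) with
# hypothetical classification accuracies.
candidates = {
    (1, 1, 1, 1, 0, 0, 0, 0): 0.80,  # 4 of 8 channels
    (1, 1, 0, 0, 0, 0, 0, 0): 0.76,  # 2 of 8 channels
}

def best_mask(lam):
    """Candidate with the highest penalised fitness for a given lam."""
    return max(candidates,
               key=lambda m: sparse_fitness(candidates[m], m, lam))

small_penalty = best_mask(0.1)  # accuracy dominates
large_penalty = best_mask(0.5)  # sparsity dominates
```

Increasing the sparsity coefficient moves the optimum toward smaller channel sets, which is consistent with obtaining different sets of channels when the coefficient is varied.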
A combination of DE and the Bat Algorithm [181] was proposed in [182]. The proposed Binary Adaptive Differential Evolution Bat Algorithm (BADEBA) was tested on the DEAP dataset using an SVM classifier. The idea is to apply the mechanisms of the Differential Evolution algorithm to the Bat algorithm so that the mutation mechanism is introduced into the Bat algorithm. To solve the problem in the solution space of channel selection, the authors modify both the Bat and the DE algorithms: the former by imposing that each bat position F lies in the {0, 1}^d space, with d the number of channels; the latter by using logic operations instead of arithmetic ones. Two final sets of 8 and 7 optimal channels for valence and arousal were given. A pseudo-code of the basic Bat Algorithm is reported in the Appendix.
A feature subset selection algorithm based on DE is adopted in [183]. The method is tested on a small dataset of 10 subjects acquired by the authors using an LDA classifier on seven emotions, proposing sets of optimal channels for each examined emotion.

d: OTHER METHODS
In [184], a feature channel reduction and a channel selection procedure were proposed. The proposed channel selection method (Relief-FGSBS) is based on the combination of the Relief Algorithm and the Floating Generalized Sequential Backward Selection (FGSBS). In FGSBS, a feature is removed iteratively from the candidate optimal set. The removal relies on an evaluation function (e.g., the classification accuracy) computed at each iteration. In the proposed work, Relief-FGSBS iteratively removes the least influential channels. The performance of the selected set of channels was compared with the performance of random channel sets. Experiments were made both on self-collected data and on public data (DEAP). For the self-collected data, a 64-channel acquisition device was adopted, and images from the CFAPS dataset were used as stimuli. Validation is made with a 4-fold CV procedure. In the study, a set of 10 channels reached an accuracy close to the one obtained using all the channels. Furthermore, the authors also highlighted that the channel rankings changed depending on the EEG features chosen.
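The iterative removal driven by an evaluation function can be sketched with plain sequential backward selection. This is a simplification and not Relief-FGSBS: there is no floating or generalized step and no Relief pre-ranking, and the evaluation function is a hypothetical stand-in for a cross-validated accuracy.

```python
def backward_select(channels, evaluate, n_keep):
    """Plain sequential backward selection: repeatedly drop the channel
    whose removal leaves the best-scoring remaining set."""
    current = list(channels)
    while len(current) > n_keep:
        drop = max(current,
                   key=lambda c: evaluate([x for x in current if x != c]))
        current.remove(drop)
    return current

# Hypothetical per-channel utilities; the evaluation function below is a
# toy additive score standing in for a classifier's validated accuracy.
utility = {"F3": 0.40, "F4": 0.30, "Cz": 0.05, "O1": 0.10}

def toy_evaluate(subset):
    return sum(utility[c] for c in subset)

kept = backward_select(["F3", "F4", "Cz", "O1"], toy_evaluate, n_keep=2)
```

With a real evaluation function, each removal step would require retraining and validating the classifier on every candidate subset, which is the main computational cost of wrapper-style backward selection.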

3) EMBEDDED METHODS
Group Sparse Canonical Correlation Analysis (CCA) [185], a method that incorporates group effects of features into the correlation analysis while simultaneously performing individual feature selection, is adopted in [186]. In this work, the traditional CCA is formulated as a weighted reduced-rank regression problem. A set of binary weights indicates whether the corresponding group of features is selected. The method is tested on the SEED dataset in a leave-one-trial-out cross-validation setup. The proposed method aims to select the best channels and predict the emotion information of the testing EEG data simultaneously. The study reports a set of 4 channels returning an accuracy of about 80% on three classes.
In [187] and [188] Graph Neural Networks (GNNs) are proposed both for emotion classification and channels relations detection. In particular, [188] returned a channel activation map showing the contribution of each channel for the final classification together with the inter-channel relations. The method is evaluated on SEED and SEED-IV.

4) THE CASE OF ALGORITHMS TESTED ON THE DEAP DATASET
To compare the performance of the proposed minimization algorithms, we focused our attention on papers using public datasets. Indeed, in the case of public datasets, the variability of the data generation process is controlled. The reference theory (i.e., discrete or dimensional), the elicitation stimulus (e.g., standardized or not), the mood induction procedure (how many and what instructions are given to the subjects), and the size of the experimental sample are all sources of variability.
Among the papers exploiting public datasets, the majority (53%) consider the DEAP dataset.
The EEG signals contained in the DEAP dataset are labelled in the framework of the dimensional theory of emotions in terms of valence, arousal, and dominance. Further information is provided about like/dislike and familiarity. 54% of the studies employ those signals in a 4-class emotion recognition problem where the emotions correspond to the four quadrants of the valence/arousal plane. The most informative electrodes were identified by counting the number of articles that proposed them, and the percentage for each channel was assessed. Results of the most informative electrodes in a channel reduction task on the DEAP dataset in a 4-class emotion recognition problem are shown in Fig. 9. Only the channels obtained in at least 30% of the studies were considered significant. The detected electrodes are mostly concentrated in the brain's frontal areas in accordance with the knowledge provided by neurophysiology regarding the relevance of the frontal brain regions in emotional processes.
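The counting procedure described above (share of studies proposing each channel, with a 30% significance threshold) can be sketched as follows. The study lists are hypothetical placeholders and do not reproduce the data behind Fig. 9; the function name is also an assumption.

```python
from collections import Counter

def frequent_channels(studies, min_fraction=0.30):
    """Fraction of studies proposing each channel, keeping only the
    channels that reach the given fraction."""
    n = len(studies)
    # set() ensures a channel is counted at most once per study.
    counts = Counter(ch for chans in studies for ch in set(chans))
    return {ch: c / n for ch, c in counts.items() if c / n >= min_fraction}

# Hypothetical selected-channel lists, one per study.
studies = [
    ["Fp1", "F4", "P3"],
    ["F4", "O2"],
    ["Fp1", "F4"],
    ["Cz"],
]
significant = frequent_channels(studies)
```

With these placeholder lists, only the channels proposed by at least 30% of the studies are retained, each with its share.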
42% of the works consider the valence and arousal dimensions separately in channel reduction processes. Of these studies, 5 investigated only the dimension of emotional valence; the remaining 6 applied a channel reduction approach separately to both the valence and the arousal dimensions. Fig. 10 and 11 show the maps of the most significant channels for valence and arousal, respectively. The reported channels were obtained in at least 30% of the studies. One of the papers exploiting the DEAP dataset did not report the selected channels. The most informative channel for the emotional valence dimension was F4. For the arousal dimension, in addition to some frontal electrodes, electrodes placed in the parietal and occipital regions (i.e., P3, P4, O2) were also informative.
The updated version of the 10/20 International Positioning System proposed by [189] was used to create the electrode maps.
The performances of a subset of the reviewed studies using the DEAP dataset are reported in Table 2. The investigated emotions (i.e., the classes), the classification accuracy, the modality of data division in the validation strategy, the validation strategy, and the reduced number of channels are indicated for each study. Only studies reporting all the requested information are considered.

VI. DISCUSSION
As stated in the Section Background, the neurophysiological theories anchoring EEG patterns to anatomy-functional analysis move within the framework of dimensional theories by considering only the valence dimension. The arousal-neural circuitry is not associated with any EEG pattern [190]. Therefore, neurophysiological theories do not provide specific indications for using the EEG in emotion recognition. In the last years, data-driven approaches have been supporting the identification of EEG patterns useful for more accurate classification of emotional phenomena. The identification of EEG patterns is useful for the goal of channel selection. Conversely, selection strategies can help the identification of EEG patterns.

TABLE 2. Achieved performances of a subset of the reviewed studies exploiting the DEAP dataset. For each row we report the study, the classes (where H, L, V, and A mean high, low, valence and arousal, respectively), the classification accuracy, how the data are divided during the validation strategy, the validation strategy, and the reduced number of channels.
We can now attempt to answer the four questions formulated in the introduction to this paper. Regarding RQ1, we can safely state that three approaches are used to select the most promising scalp areas for EEG acquisitions in an emotion recognition task: data-driven, prior knowledge-based, and based on commercially-available wearable solutions.
Regarding RQ2, the most widespread approach is the data-driven one, but the neurophysiology of emotions is rarely taken into account [54], [144], [191]. Considering question RQ3, it is worth noting that the majority of commercial solutions are general purpose and, to our knowledge, no manufacturer justifies the location of the proposed electrodes. Therefore, when devices with a low number of channels are adopted, this choice is often not justified with respect to the EEG phenomenon to be investigated. Few wearable systems without a fixed configuration of channels allow the positioning of the electrodes to be changed according to the research goal.
As far as RQ4 is concerned, convergences among different studies with respect to preferable areas of the scalp for signal acquisition were found in the present review regarding the use of frontal and parietal channels to measure emotions.
In the case of studies exploiting commercial solutions already based on few channels, findings show electrodes predominantly placed along the sagittal lines (right and left) connecting Fpz to Oz [189]. Fp1 was the most used electrode among wearable devices. Indeed, Fp1 and Fp2 were also found to be significant in data-driven channel reduction analyses. Unlike the scientific literature, several market solutions nowadays propose P7 and P8. These channels could be informative in the framework of the asymmetry theories since they maximise the distance from the midline. P3 and P4 are entirely missing, though they are helpful for arousal recognition. F3 and F4, together with F7 and F8, are largely employed, in agreement with neurophysiological knowledge suggesting a fundamental role of these channels for the measurement of emotions [41], [192]. C3 and C4, and O1 and O2, are widely used even if neurophysiology does not suggest a fundamental role of these channels in the recognition of emotions. For both prior knowledge-based approaches and approaches based on commercially-available wearable solutions, performance analyses on the classification outputs were not carried out because of the different experimental setups employed for the EEG signal recording, i.e., the investigated emotions, the eliciting stimuli, the electrode type (wet or dry), the EEG device, etc. In addition, compliance with the standard should be reported by the manufacturers in the technical documentation. Among the mentioned devices, only the abmedica Helmate and Enobio8 refer to the standard.
By analysing the results provided by the papers dealing with data-driven minimisation, the channels in the frontal area emerged as informative, as anticipated by the anatomical-physiological research. The most informative electrodes were concentrated in the brain's frontal regions in a channel reduction task on the DEAP dataset in 4-class emotion recognition. In the 2-class emotion recognition task, F4 was the most informative channel for the valence dimension. Unexpectedly, the same did not happen for F3. For the arousal dimension, electrodes placed in the parietal area, namely P3 and P4, were significant.
However, despite having restricted the evaluation to those compared on the same public dataset, it is not easy to propose a comparison among the reviewed algorithms. Numerous differences emerged regarding: (i) the object of investigation (only valence or arousal dimensions or both), (ii) the number of classes, (iii) the size of the reduced set of channels and their positioning, and (iv) the validation strategy.
As regards the validation strategy, different performance results and relevant electrodes can be obtained depending on whether the validation strategy (e.g., k-fold Cross Validation (CV), Hold-Out Validation, Leave One Subject Out (LOSO), etc.) is applied considering the recordings of one subject at a time or considering the data of all the subjects together. Unfortunately, only a few of the reviewed works provide a detailed description of the use of the data. Table 2 lists the works that reported these details. From the results of the studies reported in Table 2, it is possible to conclude that satisfactory performance can be produced even with a low number of channels. However, the experiments on the DEAP dataset, as for other datasets largely employed in the scientific literature on EEG-based recognition strategies, are affected by weak reproducibility as concerns the elicitation of the measurand. The system performance is affected by the experimental setup, namely the number and type of emotional states, the kind of stimulus, the stimulus induction procedure, the experimental sample selection, etc. Also, the adoption of a particular reference theory (i.e., discrete vs dimensional) is poorly justified.
Finally, in many studies, the sample size does not exceed 30 subjects (empirical threshold of the central limit theorem [193]) and therefore, the statistical significance of the results is compromised.

VII. CONCLUSION
In this review, different strategies for channel reduction in the context of EEG-based emotion assessment are compared. The goal is to contribute to improving the wearability of EEG devices while minimizing the loss of information. The lack of robust EEG signal patterns linked to emotions makes channel reduction more challenging than for other EEG phenomena (e.g., Steady State Evoked Potentials or Event-Related Potentials). Nevertheless, since 2007, more than 100 papers have pursued the reduction of the number of channels in EEG-based emotion recognition according to three main approaches: data-driven, prior knowledge-based, and based on commercially-available wearable solutions. The majority of the reviewed papers exploited data-driven approaches, but the neurophysiology of emotions is rarely taken into account. Many studies are based on public datasets, and this allows a comparison of the performances of the proposed algorithms. However, it is worth noting that the public datasets are often obtained using non-standardized stimuli and without administering questionnaires to the experimental sample for its preliminary characterization.
In the case of self-produced datasets, the care in preparing a reproducible setup to stimulate emotions is sometimes not accompanied by an equally thorough evaluation of the criteria to be adopted to reduce the number of channels. For example, when devices with few channels are used, the consistency of the placement of the electrodes with the localization of the studied electroencephalographic phenomenon is not discussed beforehand. Also, the compliance of the EEG device with the standards is not always verified, thus affecting the signal quality. Despite the limitations mentioned above regarding the non-univocal definition of the measurand and the low experimental reproducibility, some interesting trends can be identified. By analyzing the results, the channels in the frontal area emerged as informative, as anticipated by the anatomico-physiological research. Fp1, Fp2, F3 and F4 were the most informative channels for the valence dimension, according to both data-driven channel reduction analyses and prior neurophysiological knowledge. For the arousal dimension, electrodes placed in the parietal area, namely P3 and P4, were significant. Generally, commercial EEG devices do not provide the selected electrodes since they are not built for the specific purpose of emotion measurement.

ACKNOWLEDGMENT
This work was carried out as part of the "ICT for Health" Project. The authors would like to thank the Ph.D. Grant "AR4ClinicSur-Augmented Reality for Clinical Surgery" (INPS-National Social Security Institution-Italy).