A Hierarchical Separation and Classification Network for Dynamic Microexpression Classification

Macrolevel facial muscle variations, as used for building models of seven discrete facial expressions, suffice when distinguishing between macrolevel human affective states but cannot discretise continuous and dynamic microlevel variations in facial expressions. We present a hierarchical separation and classification network (HSCN) for discovering dynamic, continuous, macro- and microlevel variations in facial expressions of affective states. In the HSCN, we first invoke an unsupervised cosine similarity-based separation method on continuous facial expression data to extract twenty-one dynamic facial expression classes from the seven common discrete affective states. The between-clusters separation is then optimized for discovering the macrolevel changes resulting from facial muscle activations. A following step in the HSCN separates the upper and lower facial regions for realizing changes pertaining to upper and lower facial muscle activations. Data from the two separated facial regions are then clustered in a linear discriminant space using similarities in muscular activation patterns. Next, the actual dynamic expression data are mapped onto discriminant features for developing a rule-based expert system that facilitates classifying twenty-one upper and twenty-one lower microexpressions. The random forest algorithm classified the twenty-one macrolevel facial expressions with 76.11% accuracy. A support vector machine (SVM), applied separately to the upper and lower facial regions in tandem, classified them with respective accuracies of 73.63% and 87.68%. This work demonstrates a novel and effective method for the dynamic assessment of affective states. The HSCN further demonstrates that facial muscle variations gathered from either the upper, lower, or full face would suffice for classifying affective states. We also provide new insight into the discovery of microlevel facial muscle variations and their utilization in the dynamic assessment of facial expressions of affective states.


I. INTRODUCTION
FACIAL expressions convey one's internal thoughts, feelings and emotions. Thus, facial expressions serve as interpretable external signals conveying variations in one's affective states [1]. Variations in facial expressions result from the conscious and subconscious processing of various internal and external stimuli, including any contextual biases and the interaction between past and present experiences [2]. Patterns of facial muscle movements provide reliable models for automated recognition of human affective states [3]-[5]. Several complex and difficult-to-identify facial expressions of affective states have also been modelled [5], [6].
Humans understand the dynamic and continuous nature of emotions and affective states using their collective experiences [3]. Given the complexities of emotion elicitation and the psychology behind it, one cannot simply look at emotions as static occurrences in time [5], [7]. By defining changes in facial expressions as a continuous and time-dependent function, we posit that micro-expressions are classifiable within the prevailing continuous space, as they form transient macrolevel expressions such as perceived expressions of anger or happiness [8].
Affective computing literature cites a diverse range of facial expression recognition and affective state assessment related works [9], [10]. A large majority of the cited facial expression classifiers were trained using Ekman's discrete and independent models of affective states [11]. The seven distinct and discrete models of affective states Ekman proposed were based on significant differences in facial muscle movement patterns [12].
This work presents a novel, continuous and dynamic affective state assessment solution that uses a rule-based system to model facial muscle movements for classifying micro- and macro-level transient facial expressions. The hierarchical separation and classification network (HSCN) used in this work deploys three subsystems: (i) full-facial, macro-level affective state assessment; (ii) upper-facial region micro-expression classification; and (iii) lower-facial region micro-expression classification. We used an unsupervised, cosine similarity-based separation method for exploiting mutual information in continuous facial expression data, allowing for discovering the boundaries and regions within a multidimensional hyperplane. The following Linear Discriminant Analysis (LDA) subsystem further separates and clusters facial expressions in a linear discriminant space through the discovery of multiple hyperplanes within the discriminant space and identification of discriminant features, allowing for separation of multiple classes [2], [13].
Following the LDA transform, the HSCN uses our novel rule-based expert system for upper and lower facial region micro-expression classification on the basis of continuous muscle movements and the Facial Action Coding System (FACS) logic [14]. The use of an expert system in facial expression recognition systems has been applied in several previous works. For example, in [15] a self-adaptive expert system uses the facial feature contours localized in a static dual-view facial image to label the interpreted facial expression. In [16], the authors deploy a 'Belief Rule-Based Expert System' that exploits the outputs of a convolutional neural network (CNN) classifier to infer the mental state of a person based on their facial expression. In both cases, rule-based systems were exploited to augment and improve the classifier output. Our proposed HSCN provides another example of how rule-based systems can be deployed in dynamic facial expression recognition systems.

A. Contributions
Building upon the previous works, this paper contributes by providing:
1) A novel framework and an ensemble of classifiers that enable hierarchical separation and classification of macro- and micro-level expressions of affective states,
2) A novel affective state assessment schema introducing methods and informing on the benefits of attaining the ability to capture and monitor continuous facial expressions,
3) A systematic approach for separating the upper and lower facial regions and classifying them separately in a dynamic environment,
4) A novel design methodology for developing and applying a rule base for classifying continuous macro-level expressions and,
5) Important details about implementing a rule-based expert system that uses dynamic linear discriminant features for classifying continuous micro-expressions in the upper and lower facial regions.

This paper is organized such that Section I introduces the work and Section II presents relevant works and the approaches used in them. Section III details the methods used for unsupervised clustering and labelling, cosine similarity estimation, data separation, macro- and micro-level linear discriminant analysis and construction of the rule-based expert system. Section IV reports results pertaining to the aforementioned analyses and system validation. Section V concludes this work and provides directions for future research. Section VI informs on the funding source and ethics compliance.

II. AFFECTIVE STATE ASSESSMENT
"Core affect" is a psychological construct that signifies the continuous nature of emotions and affective states [7].The concept of "core affect" helps in interpreting the complexities of human affective states and emotions as it provides theoretical foundation for building a dynamic affective state assessment solution.The fluidity and multidimensional nature of expressions of affective states as described in the core affect model highlights the need for a dynamic classifier capable of accounting for complex, multidimensional expressions.
Several models that represent 'unique emotions' have been introduced in the literature. For example, [4] and [17] respectively attempted to model disgust and anger. Such unique models provide well-defined ways of comprehending and classifying human expressions of affective states. One such model, the Hourglass model [18], followed by its enhanced version [19], highlights the continuous and non-static nature of human sentiment and its assessment. Basically an emotion categorization model, the Hourglass of Emotions was optimized for polarity detection. It was built on empirical data pertaining to sentiment analysis. Nonetheless, it could be used in the context of affective state classification. The Hourglass model categorises similar and dissimilar emotions and presents a dynamic model that appears more representative of human emotions compared with the discrete emotion models. Other continuous emotion models include the Plutchik spectrum and the three-factor models [20], [21]. Such continuous emotion models can delineate emotions and their macro-level expressions [22]. In a somewhat similar manner, macro-level expressions have also been assessed along multidimensional axes of valence, arousal and intensity [23].
Facial expression analysis and affective state classification are complex problems; thus, many of the available solutions underperform in real-life situations. Changes in expressions of affective states are causal, representing some response to a particular temporal event or a combination of multiple external and internal stimuli [5], [7]. Responding to certain stimuli and experiencing particular affective states cause internal pathological and physiological changes in a person. Hence, fluctuations in cues like heart rate, skin conductance and hormone balances have been used for affective state classification. Variations in affective states are also reflected through external cues like speech rate and/or volume, haemodynamic changes on the face, and facial expressions [24]. Recent affective computing and psychophysiology literature highlights limitations of discrete facial expression models and affective state assessment solutions.
In order to overcome the limitations of a single-cue-based, vision-supported affective state classifier, multiple-cue supported classifiers have been proposed. A recent survey [10] outlines a corpus of affective state assessment solutions, focusing on those related to the assessment of audio and visual cues. Research in [25] reports the deployment of a prototype multimodal affective state assessment machine that uses facial expressions and speech signals to improve the classification performances of a septenary classifier that can be compared to those discussed in [10]. For real-time classification of affective states, an active-camera system has been used to track changes in the shape of the face, integrating a classifier that exploits human face and lip features to describe muscle-based expressions of affective states [26].
Previous works [27]-[31] provide examples of micro-expression detection and categorisation. Pfister et al. [27] suggested using temporal interpolation for feature mapping prior to implementing traditional machine learning classifiers like support vector machines, multiple kernel learning and random forests. Xu et al. [28] proposed a "Facial Dynamics Map" which characterises micro-expression related movements using granular pixel features along with an algorithmic approach based on optical flow estimation. Their work employs a support vector machine classifier to identify and categorise different types of facial micro-expressions [28].
Polikovsky et al. [29] used the EMFACS (Emotion Facial Action Coding System) for micro-expression detection. Their method divides full facial images into smaller facial regions based on action unit locations. A histogram of oriented gradients (HOG) approach was combined with a K-nearest neighbour classifier for detecting micro-expression and action unit activations. In [30], [31], rather than exploiting visual cues, facial thermal features were used for facial expression classification. In [30], facial thermal features were compared on the basis of both upper- and lower-facial region muscle activation temperatures. In [31], the authors reported differences in classifier performances when different sub-regions of the face were used for feature extraction.
Through dynamic modelling of expressions of affective states and using multiple features, our proposed HSCN aims to improve on prevailing facial expression classification systems [10], [25]. Considering variations in affective states as functions of time, the HSCN exploits continuous emotion models and attempts to advance beyond their static, discrete classification counterparts [7], [18]-[21]. As the HSCN is based on the argument of continuous expressions, it goes beyond modelling the transient expressions and expression-intensity variations at the macro-level. It also demonstrates the transience of expressions by modelling continuous micro-level muscle movements in the upper and lower facial regions and allows for the classification of various micro-expressions. Furthermore, the HSCN builds upon the prevailing dynamic facial expression recognition systems [32] and proposes an alternate approach for continuous macro- and micro-expression analysis.
Compared to the works discussed above, the HSCN not only detects and classifies micro-expression activations in the upper and lower facial regions, but also uses them as a vehicle for macro-level dynamic affective state assessment. The HSCN is capable of classifying twenty-one upper- and twenty-one lower-facial region micro-expressions as well as twenty-one macro-level facial expressions. We therefore argue that the HSCN promises a robust affective state assessment solution that exploits the multidimensional and dynamic nature of human expressions.

A. Upper and Lower Facial Expressions Assessment
The FACS and EMFACS were discussed in [14] and their application details were presented in [10], [29]-[31]. They categorise facial muscle movements through coding and action units and are important tools in the facial expression assessment and recognition arsenal. The EMFACS action units allow modelling feature fluctuations in time, as one's expressions change from one state to another. The rule-based expert system deployed in this work for upper and lower facial micro-expression classification is based on the muscle movements defined in the EMFACS. It would be prudent to note at this point that the "upper facial region" refers to muscle movements related to the eyes, eyelids, brows, and upper cheek, whereas the "lower facial region" refers to muscle movements related to the nostrils, mouth, lips, buccinator and lower cheek [32], [33].
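For illustration, the region-to-muscle grouping described above can be organised as a simple lookup structure. The following is a minimal Python sketch; the grouping follows the text and [32], [33], and the names are illustrative rather than taken from our implementation:

# Illustrative grouping of facial regions to muscle-movement areas,
# following the upper/lower definitions given above:
FACIAL_REGIONS = {
    "upper": ["eyes", "eyelids", "brows", "upper cheek"],
    "lower": ["nostrils", "mouth", "lips", "buccinator", "lower cheek"],
}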
Our proposed HSCN was trained on the extended Cohn-Kanade (CK+) dataset [34] for the dynamic assessment of affective states and micro-expression classification. The CK+ dataset images contain continuous facial expression information as actors transitioned from an inactive/neutral state to an activated state as outlined in Table I, along with their corresponding action units and muscle movements. Using the FACS, the continuous nature of the CK+ dataset allows defining a continuous model of facial muscle movements in real time. Visualising changes in expressions as time-dependent and supporting these observations with the EMFACS can help in the initial validation of the rule-based expert system being proposed.
Table I makes it obvious that the upper facial muscles move differently from the lower facial muscles across all seven discrete expressions of affective states. This fact poses the question of whether one facial region is more important than the other. Research conducted in [35] answers this question: human participants' responses to video recordings were examined in order to determine the relative importance of the upper and lower facial regions for classifying facial expressions. The study [35] reported that the importance of different facial regions depends on the affective state being expressed and that a full facial expression is always easiest to classify. These findings were also supported by research conducted in [31], which looked at the impact of different facial regions while attempting to classify facial expressions. These studies suggest that if the goal is to develop a comprehensive affective state assessment system, then detection and classification of both upper and lower facial region micro-expressions are important [32], [33]. In [35], the authors reported that, on average, humans more accurately classify affective states using lower facial region expressions compared with upper facial region expressions. This pattern was also observed while validating the performance of the HSCN's micro-expression classifiers.

III. METHODS
Unsupervised learning has been widely and effectively used for affective state classification, as unsupervised learning models enable discerning and labelling patterns from within a collection of unlabelled data. Also, several works have treated continuous expression intensity estimation as an unsupervised learning problem. Generally, continuous expression sequences evolve from a neutral expression to a fully activated and unique facial expression [36]. The corpus of unsupervised learning algorithms is extensive and ranges from dimensionality reduction to manifold learning to clustering techniques [37]. These methods rely on statistical foundations and the detection of similarity within a corpus of unlabelled data, exploiting similarity or dissimilarity measures for the purpose of identifying trends and clusters that may be useful for classification tasks [37].
Our proposed HSCN combines two techniques as shown in Fig. 1. The initial, unsupervised clustering and labelling approach is based on cosine similarity measures in the continuous data that have been projected onto an m-dimensional hyperplane defined by (1):

$$X = \{x_1, x_2, \ldots, x_N\}, \quad x_i \in \mathbb{R}^m \qquad (1)$$

Similar approaches were used in [38], [39] for multi-class facial expression classification.
Linear Discriminant Analysis was then applied to maximise the separation between labelled clusters by detecting multiple hyperplanes within the linear discriminant space.Previously, the LDA transform has been used for maximising the separation between clusters [40].Linear discriminant analysis allows for the projection of high-dimensional data onto a lowerdimensional linear discriminant (feature) space, clustering them in a way that maximises the inter-cluster variance while minimising the intra-cluster variance.This method maximises the separation between cluster centroids while minimising the separation between samples that belong to the same class [41].Analysing these clusters enables the modelling of state-to-state transitions and the classification of macro-level affective states in a continuous domain.Clusters formed at the macro-level via LDA provide foundations for defining micro-level clusters of lower and upper facial region data.This hierarchical clustering approach provides the structure that supports the HSCN's rulebased expert system.
Dimensionality reduction and clustering using LDA are seen as a set of logically apt optimisation steps for determining the optimal value of $b$ in (2), such that the function could be maximised as:

$$J(b) = \frac{b^{\mathsf{T}} B\, b}{b^{\mathsf{T}} W\, b} \qquad (2)$$

where $B$ represents the inter-cluster covariance matrix and $W$ represents the intra-cluster covariance matrix. The solution to the optimisation problem was determining the linear discriminants, which correspond to the largest eigenvalues of $W^{-1}B$, noting that the number of linear discriminants required to solve an LDA problem depends on the number of labelled classes in a given set [41]. Facial expression data $x_i$ were then projected onto the discriminant function used for the classification tasks and for determining to which class $k$ an expression $x_i$ belongs on the basis of similarity measures, for example:

$$c(x_i) = \arg\min_{k} \left\| b^{\mathsf{T}} x_i - b^{\mathsf{T}} \bar{x}_k \right\| \qquad (3)$$

where $\bar{x}_k$ defines the $k^{th}$ cluster centroid. In our proposed HSCN, the initial clustering process splits the continuous data into $k = 21$ classes.
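A minimal NumPy sketch of this optimisation is given below; it solves the Fisher criterion in (2) via the eigen-decomposition of $W^{-1}B$ and assigns classes by the nearest projected centroid as in (3). The variable names and the pseudo-inverse fallback are our illustrative choices, not details of a reference implementation:

import numpy as np

def fisher_lda(X, y, n_components=2):
    # Maximise J(b) = (b^T B b) / (b^T W b): the optimal discriminants
    # are the leading eigenvectors of W^{-1} B.
    classes = np.unique(y)
    mean_all = X.mean(axis=0)
    m = X.shape[1]
    B = np.zeros((m, m))  # inter-cluster (between-class) covariance
    W = np.zeros((m, m))  # intra-cluster (within-class) covariance
    for k in classes:
        Xk = X[y == k]
        mk = Xk.mean(axis=0)
        B += len(Xk) * np.outer(mk - mean_all, mk - mean_all)
        W += (Xk - mk).T @ (Xk - mk)
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(W) @ B)
    order = np.argsort(eigvals.real)[::-1]
    return eigvecs[:, order[:n_components]].real  # projection matrix b

def nearest_centroid_class(x_i, b, centroids):
    # Assign x_i to the class k whose projected centroid b^T x̄_k is
    # closest to the projected sample b^T x_i, as in (3).
    z_i = x_i @ b
    return min(centroids, key=lambda k: np.linalg.norm(z_i - centroids[k] @ b))

Here 'centroids' maps each of the k = 21 class labels to its centroid $\bar{x}_k$ in the original feature space.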
As shown in Fig. 1, projections onto the lower-dimensional linear discriminant space via LDA were applied in two stages: 1) projection of the cosine similarity-separated clusters onto a two-dimensional linear discriminant space, maximising separation between cluster centroids to create the macro-level facial expression classifier; and 2) projection of the upper and lower facial region data onto two-dimensional hyperplanes. Using the rule-based expert system then allowed for the systematic detection and classification of upper and lower facial region micro-expressions. The processes and subsystems contained within the HSCN framework are further discussed in the following sections.

A. Cosine Similarity-based Separation
Some definitions of the terms and concepts are given below to help understand the unsupervised separation and clustering methodology:
• $x_i = \{x_1, x_2, \ldots, x_m\}$ defines a pattern/feature vector, i.e., a flattened facial expression image containing $m$ raw pixels/features.
• $X = \{x_1, x_2, \ldots, x_N\}$ defines a set of $N$ input patterns, each containing $m$ features. In this work, $X$ defines a continuous series of facial expression images ranging from neutral to activated, projected onto an $m$-dimensional hyperplane.
• $C = \{c_1, c_2, \ldots, c_k\}$ defines the $k$ class labels for the patterns contained in the pattern set $X$. As mentioned earlier, there are $k = 21$ classes for all micro- and macro-level classifiers in the network.

Similarity measures have been used to solve both supervised and unsupervised learning problems [36]. Determining similarity and dissimilarity measures across $N$ patterns in a continuous sample set $X$ allows categorising a subset of patterns based on similar features and mutual information.
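The pattern set $X$ defined above can be assembled from a continuous image sequence as in the following minimal sketch. The file handling and helper names are our assumptions; the 150 × 150 sizing mirrors Algorithm 1:

import numpy as np
from PIL import Image

def build_pattern_set(image_paths, size=(150, 150)):
    # Each frame, ordered neutral -> activated, is resized, converted
    # to greyscale and flattened into an m-dimensional vector
    # (m = 150 * 150 = 22500, matching Algorithm 1).
    patterns = []
    for path in image_paths:
        img = Image.open(path).convert("L").resize(size)
        patterns.append(np.asarray(img, dtype=float).ravel())
    return np.stack(patterns)  # X with shape (N, m)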
The HSCN is split into three major subsystems; the first is tasked with the autonomous extraction of dynamic, macro-level affective state clusters from a data set $X$. Separation and initial clustering of patterns was based on mutual information extraction via cosine similarity measures. Separation of continuous data was done by comparing the cosine similarity between all images/patterns within an $m$-dimensional hyperplane.
Cosine similarity leans on measuring the angle between two image vectors $\{x_i, x_j\}$ projected onto a hyperplane of dimension $m$ [42]. As the mutual information between the two vectors increases, the angle between them decreases, such that $\cos\theta = 1$ when $i = j$. The cosine similarity between two images was therefore calculated as:

$$S_{\cos\theta}(x_i, x_j) = \frac{x_i \cdot x_j}{\|x_i\|\,\|x_j\|} \qquad (4)$$

We found that, using the CK+ dataset, the inter-cluster variance was largest when using the cosine similarity approach. In this work, cosine similarity measures were used to detect where any two serial expressions show high levels of dissimilarity. Regarding the CK+ dataset, high levels of dissimilarity indicate a noticeable change in affective state expression intensity. For each continuous set of facial expression samples, the dissimilarity detection algorithm allows separating the large cluster of images into three macro-level facial expression clusters based on similar features, labelling them as follows:
• Cluster 1: Neutral-dominated state
• Cluster 2: Partially activated state
• Cluster 3: Fully activated state

This initial unsupervised separation process is achieved through a "frame-to-frame gradient analysis" which iterates through continuous data and calculates the dissimilarity magnitude $\Delta S_{\cos\theta}$ between facial expressions in the series. The gradient magnitude is calculated as:

$$\Delta S_{\cos\theta} = 1 - S_{\cos\theta}(x, y) \qquad (5)$$

with $S_{\cos\theta}(x, y)$ defining the similarity measurement between the two serial expressions. This equation is applied $N - 1$ times to define all frame-to-frame transitions in $X$.
The dissimilarity magnitudes are used to detect locations of peak dissimilarity, which define the cluster boundaries within the hyperplane. The deployed algorithm splits $X$ into two equal-length subsets, with the global maximum (peak dissimilarity) defined in each half. This allowed for the modelling of the continuous nature of affective states, thus allowing for the classification of twenty-one transient macro-level facial expressions.
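A minimal sketch of the separation step follows; it assumes the dissimilarity magnitude is computed as one minus the cosine similarity of consecutive frames, which is one natural reading of (5), and splits the sequence at the peak found in each half as described above:

import numpy as np

def cosine_similarity(x_i, x_j):
    # S_cos(x_i, x_j) = (x_i . x_j) / (||x_i|| ||x_j||), as in (4)
    return x_i @ x_j / (np.linalg.norm(x_i) * np.linalg.norm(x_j))

def separate_sequence(X):
    # Frame-to-frame gradient analysis: N - 1 dissimilarity magnitudes,
    # one peak per half of the sequence, giving three clusters
    # (neutral-dominated, partially activated, fully activated).
    dissim = np.array([1.0 - cosine_similarity(X[i], X[i + 1])
                       for i in range(len(X) - 1)])
    half = len(dissim) // 2
    b1 = int(np.argmax(dissim[:half]))           # boundary 1
    b2 = half + int(np.argmax(dissim[half:]))    # boundary 2
    return X[:b1 + 1], X[b1 + 1:b2 + 1], X[b2 + 1:]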
Theoretically, this algorithm could be extended to increase the resolution of transient facial expression classes. Increasing the number of detected dissimilarity peaks would correspond to an increase in the number of clusters extracted from a continuous sample, such that $N_{states} = N_{peaks} + 1$. Separation of the continuous CK+ samples via the cosine similarity method is visualised in Fig. 2.

B. Macro-level Linear Discriminant Analysis
The initial clusters created via the cosine similarity-based separation method were input into the second tier of the HSCN, which performed macro-level LDA clustering. Please note that general discriminant analysis has been previously used to further separate and cluster labelled facial expression data by defining hyperplanes within a linear discriminant space [43]. The clustering was achieved by maximising inter-cluster variance in order to optimise cluster centroid separation. Fig. 3 highlights results of the macro-level LDA clustering algorithm when applied to a large volume of continuous facial expression data. Analysing subplot 2 of Fig. 3, we see linear trends from inactive (NEUTRAL) expressions to partial expressions (PARTIAL) to the fully activated expressions of all affective states that were modelled in this work. Furthermore, subplot 3 of Fig. 3 displays two continuous axes that separate these states: sadness (C) to happiness (B), and anger (A) to surprise (D). We needed to understand what these axes represent in a theoretical sense. The linear discriminant space visualised in Fig. 3 was a low-dimensional linear discriminant representation of facial expressions, a mapping that corresponded with certain feature changes and variations in facial expressions at a higher level. We also saw that the centroids of the other three activated states (contempt, disgust and fear) resided on the two defined axes, with contempt existing at the intercept of the two axes. This was predictable given that contempt was the most "neutral" expression relative to the other affective states being modelled.

Defining rules on the basis of the logical foundations provided by the EMFACS and Table I henceforth became very important. Comparing changes in muscle activations from one state to the other would help in determining what these linear relationships actually represented in real life. This would also provide foundations for building a rule-based expert system capable of detecting and classifying micro-expressions. Comparing sadness and happiness muscle activations in Table I, we were able to model the state-to-state transition, visualising how expression changes were based on muscle movements as shown in Fig. 4. Given the common facial muscles or facial regions involved in changing the expression from sadness to happiness, we could define an axis rule: Sadness-Happiness Axis Rule: sadness and happiness share common facial muscle movements surrounding the mouth region; the state-to-state transition could therefore model the lip corners rising from an initially lowered/tightened position. Similarly, using Table I, we could model the transition from anger to surprise as visualised in Fig. 5. Note that in this case, both states evidenced "raised upper eyelids", which was useful when attempting to derive a clearer relationship. Given this second example and the common muscle groups and facial regions that were activated (eyebrow and mouth region), we could define a second axis rule: Anger-Surprise Axis Rule: anger and surprise share common facial muscle movements surrounding the mouth and eyebrow regions and share a consistent "raised upper eyelid" activation. Therefore, the state-to-state transition could model the following transformations: (i) the eyebrows raise from an initial frowned/depressed position, and (ii) the mouth opens from an initial tightened expression.
Expanding on the two axis rules that have been formed thus far, we might postulate an initial hypothesis regarding what the X and Y axes represented in this case (i.e., linear discriminants 1 and 2 respectively). Let the linear discriminant $n$ be denoted by $LD_n$. Table II shows the $|\Delta LD_n|$ values when comparing states, i.e., points A → D in subplot 3 of Fig. 3. Together, Table II and Fig. 6 serve as the basis for developing and proving the rule-based micro-expression classifier [44]. Given the evidence provided, we could define the following hypotheses and macro-expression rules:

1) $LD_1$ relates to the openness of the mouth and the lower region of the face, given the following articles of evidence:
• Sadness and anger shared a low $\Delta LD_1$. The two common actions between the states were "upper eyebrow frown" and "lips tightened/lowered corners".
• The presence of two common actions would be troublesome if not for the presence of the surprise and happiness states, which also shared a low $\Delta LD_1$. The common action between surprise and happiness revolved around raised lip corners and, ultimately, the open mouth.

2) $LD_2$ relates to the region around the eyes, i.e., the eyelids and eyebrows (the upper facial region), evidenced by:
• The anger-sadness transition in Table II showing both a low $\Delta LD_1$ and $\Delta LD_2$; if the initial hypothesis is that $LD_1$ is related to the mouth, then the second common action, "upper eyebrow frown", may be related to $LD_2$, which supports the upper facial region relationship.
• Analysing Fig. 6 and the transition from anger to happiness, we saw that the eyes remained the same shape, with the largest variance evident between the full and partial anger states, when the frown was relaxed slightly. Removing the lower half of the face, we could observe similarities between the brow/eye regions of the two states.
• The large variance between happiness and surprise. Given that the open mouth was deduced as being referred to by $LD_1$, we could clearly identify the difference between happiness and surprise frames in Fig. 6 through the upper region of the face, specifically the brow and eye regions, thus providing further evidence that $LD_2$ relates to the upper facial region.

By inferring the above rules for the macro-level LDA clustering approach, we were able to define a relationship that allowed mapping statistical features to real-world features, thus providing a vehicle for transient macro-level facial expression classification.

C. Micro-level Linear Discriminant Analysis
In the previous section, we defined the following rules: 1) $LD_1$ relates to the shape of the mouth and the lower facial region, and 2) $LD_2$ relates to the region around the eyes, eyelids, and brows (the upper facial region). These claims were substantiated through the experiments described above. The micro-expression LDA clustering subsystem aims to prove the validity of the two claims, while providing a deeper analysis of dynamic facial expressions by focusing on these upper and lower facial regions.
An automated function was implemented to slice the CK+ images in half (horizontally), allowing us to focus on the upper and lower facial regions independently. An additional LDA clustering approach was then applied to the new image vectors in an attempt to validate the above hypotheses, thus allowing for the classification of micro-expressions in the upper and lower facial regions. If the initial hypotheses were correct, then there should be a very discernible trend between states at the micro-level, as this would indicate that the projected feature $LD_n$ is related to a particular group of muscles.
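The horizontal split can be realised as in the following sketch, assuming square greyscale frames flattened as described in Section III-A (the 150-pixel size is carried over from Algorithm 1):

import numpy as np

def split_facial_regions(x, size=150):
    # Slice a flattened facial image in half horizontally, yielding
    # independent upper and lower facial region feature vectors.
    img = x.reshape(size, size)
    upper = img[:size // 2].ravel()  # eyes, eyelids, brows, upper cheek
    lower = img[size // 2:].ravel()  # nostrils, mouth, lips, lower cheek
    return upper, lower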
Let us describe the macro-level linear discriminant features as $LD_n$, i.e., $LD_1$ = the lower facial region and $LD_2$ = the upper facial region. Moving to the micro-level, let $m$ describe the micro-level features contained within the higher, $n^{th}$-level regions, i.e., $LD_{n.m}$. For example, $LD_{1.1}$ and $LD_{1.2}$ describe micro-expressions in the lower facial region.
The clusters shown in Fig. 7A share a similar $LD_{2.2}$ value, with the largest variance being in the direction of $LD_{2.1}$. We could also see that the clusters moved linearly across the discriminant space. Establishing this rule-based system provided the foundations on which the upper and lower facial region micro-expression detection and classification systems were built.

D. The Rule-Based Expert System
As discussed in previous sections, deploying the HSCN in a rule-based expert system allows for the continuous monitoring and assessment of macro-level expressions of affective states as well as micro-expressions, which in turn allow for the modelling of specific muscle movements in the upper and lower facial regions. In the previous sections, the macro- and micro-expressions were modelled using linear discriminant features defined by $n^{th}$- and $m^{th}$-level subscript notation, i.e., $LD_{n.m}$. Fig. 8 visualises the rule-based expert system and how facial image data are processed from input to output stages. Through classification, the proposed system is capable of assessing various levels of facial expressions and muscle movements using the rules derived above.
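The rule base itself can be organised as an ordered list of predicates over the $LD_{n.m}$ features, as in the minimal sketch below. The rules and thresholds shown are placeholders for illustration only; they are not the values derived in this work:

def classify_micro_expression(ld_features, rules):
    # 'ld_features' maps subscripts such as "1.1" (lower region) or
    # "2.1" (upper region) to projected LD_{n.m} values; each rule
    # pairs a predicate over those features with a label.
    for predicate, label in rules:
        if predicate(ld_features):
            return label
    return "unclassified"

# Placeholder rules (illustrative thresholds only):
example_rules = [
    (lambda f: f["1.1"] > 0.5, "mouth open / raised lip corners"),
    (lambda f: f["2.1"] < -0.5, "upper eyebrow frown"),
]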

IV. RESULTS
This work provides novel and useful information on facial expression recognition and on how the movement of micro- and macro-level facial muscles can be used to build a robust feature space. Hence, this work should be regarded as a step forward from discrete affective state classification systems (capable of classifying one of $n$ discrete affective states [10], [25], [30], [31]) to a more continuous affective state classification system. The reported HSCN is a new and powerful classifier that exploits separation and clustering to categorise affective state expressions in a dynamic manner. The HSCN transforms seven independent facial expression clusters into twenty-one transient facial expression clusters and classifies twenty-one upper and twenty-one lower facial region micro-expression classes with the help of a rule-based expert system.
Both macro- and micro-level algorithms used in this work were validated using the Random Forest (RF), Support Vector Machine (SVM) and K-Nearest Neighbour (KNN) approaches. Classifiers were trained using the clusters that had been defined through the separation and clustering subsystems of the HSCN. The CK+ facial expression images used to train the classifiers were resized to 100 × 100 pixels and were then flattened, providing 5842 image samples. An 80/20 train/validation split was used for testing the performance of each classifier. The validation accuracies in this work report the percentage of correct guesses with respect to the total number of guesses made, defined as:

$$\text{Accuracy} = \frac{\text{correct predictions}}{\text{total predictions}} \times 100\% \qquad (6)$$

All classifier results are reported in Table III and Fig. 9, showing that classifier performances were competitive given the range of predictable data in the proposed system. The macro-expression classifiers were capable of predicting twenty-one transient affective state expressions, modelling activation intensities across seven independent state axes. Furthermore, each of the upper and lower facial micro-expression classifiers could predict twenty-one variations of upper and lower facial muscle movements based on $m^{th}$-level $LD_{n.m}$ linear discriminant features and the rules defined in the rule-based system reported in Section III.
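The validation protocol can be reproduced in outline with scikit-learn, as in the sketch below. Default hyperparameters are shown for brevity; the exact classifier settings used in this work are not reproduced here:

from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def validate(X, y):
    # 80/20 train/validation split; accuracy = correct / total, as in (6)
    X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.2)
    results = {}
    for name, clf in [("RF", RandomForestClassifier()),
                      ("SVM", SVC()),
                      ("KNN", KNeighborsClassifier())]:
        clf.fit(X_tr, y_tr)
        results[name] = accuracy_score(y_va, clf.predict(X_va))
    return results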
Fig. 8. Visual summary of the proposed HSCN when applied as a rule-based expert system. The flowchart shows how facial image data are processed from input to macro- and micro-level assessment stages, and how results are reported and displayed to the user after classification.

As reported, the maximum classification accuracy of 76.11% for macro-expression classification was achieved using an RF classifier. We achieved 73.63% and 87.68% classification accuracies respectively for upper- and lower-facial region micro-expression classification with the SVM classifier.
The observed results are comparable with recent discrete affective state assessment solutions. Looking at the facial expression classifiers reported in [10] for example, we see that the accuracies of those systems range between 41% and 88%, while classifying comparably fewer affective states. The observed HSCN classifiers show that the resolution and dimensionality of a recognition system can be improved without hindering classifier performance. Furthermore, they suggest that continuous affective state assessment solutions deserve greater attention compared with discrete models.
Performances of the lower and upper facial micro-expression classifiers were consistent with the human observations made in [31], [35], stating that the classification of lower facial expressions is, on average, more accurate than that of upper facial expressions. Looking at Fig. 7A and Fig. 7B, one can see why this might be the case. The lower facial region micro-expression clusters show a larger separation across the $LD_{1.1}$ and $LD_{1.2}$ axes when compared to the upper facial region micro-expressions, which primarily show variations along the $LD_{2.1}$ axis for most states. In reality, the reason for this could be the prominence and relative size of the mouth and lips in the lower facial region. Relatively speaking, the mouth is a larger facial feature compared to others, and muscle activations around the mouth region would generally have a larger impact than muscle changes around the eyes or brow region, for example, thus supporting why it may be easier to classify lower facial region expressions compared to upper facial expressions.
The rules derived in Section III, along with the classifier results discussed above, allow us to support real-world phenomena with statistical findings. Through these findings, the importance of both upper- and lower-facial region muscle movements in facial expression classification has been highlighted and reinforced.

V. CONCLUSION
The ability to detect and classify micro-expressions would help affective state assessment in demanding conditions. However, the complex nature of continuous expression analysis requires developing comprehensive models of affective state-caused variations in facial features. The HSCN proposed in this work provides a novel and reliable approach to modelling micro- and macro-level facial muscle movements for affective state assessment.
The HSCN approach is capable of predicting twenty-one macro-level transient affective state expressions as well as twenty-one upper and twenty-one lower facial region micro-expressions. The HSCN uses models containing multidimensional expressions of affective states. It was capable of achieving classification accuracies of 76.11%, 73.63% and 87.68% for the macro- and micro-level (upper- and lower-facial region) classification subsystems respectively. As discussed in Section IV, the reported validation performance of the HSCN makes it comparable with several previously reported affective state classification systems.
The proposed rule-based HSCN was built upon the theoretical foundations of the EMFACS and continuous affective state expression models. Through a combination of (i) unsupervised cosine similarity-based separation, (ii) LDA-based clustering and (iii) traditional supervised learning classifiers, the HSCN's predictive capabilities show that it is a quantitative assessment tool supported by a theory-driven back-end.
Following this work, future research should focus on integrating the HSCN into a multimodal affective state assessment system, providing a dynamic assessment tool capable of detecting and classifying transient facial expressions in real time. We intend to expand and modify the HSCN architecture to incorporate human speech as well. The goal will be to model changes in affective speech expressions as continuous, time-dependent functions and to use them for affective state assessment.
Human expression of affective states depends on many external and internal factors, culminating in visible and continuous fluctuations in mood and emotion. As demonstrated in this work, modelling macro- and micro-level affective state expressions as continuous events in time would allow machines to have a greater understanding of the human psyche, consequently improving the way humans and machines interact with each other.

Fig. 1. Visual representation of the proposed HSCN. The unsupervised cosine similarity-based separation method (TOP) leads to macro-level linear discriminant-supported clustering in the discriminant space. As evident in the figure, macro-expression classification becomes possible after clustering (MIDDLE). The HSCN allows for the classification of twenty-one upper and twenty-one lower facial region micro-expressions (BOTTOM).

Fig. 2. Separation of the continuous CK+ samples via the cosine similarity-based method.

Fig. 3. Macro-level LDA clustering results. Subplot 1 on the top left shows the input from the initial, cosine similarity-based separation algorithm displaying all data samples extracted from the CK+ dataset. Subplot 2 on the top right displays the cluster centroids for each of the macro-level expressions. Subplot 3 on the bottom right visualises the two axes (sadness-happiness and anger-surprise) that can be inferred from this projection.
VI. COMPLIANCE WITH ETHICAL STANDARDS

Curtin University's ethics approval No. HRE2019-0722, dated 23.10.2019, obtained for this work required adhering to the highest levels of ethical standards. No animals were involved in this research. All data, hardware, software and related code remain the intellectual property of Curtin University. Dr Masood M Khan and Jordan Vice received funding (APA18336230) from Curtin University to carry out this work. All authors hereby declare that they have no conflict of interest.

JORDAN VICE's research interests include the assessment of affective states and affective computing. He received the 2019 Proxima Consulting Prize for Most Outstanding Final Year Project in mechatronic engineering. His works attracted media attention in 2019, 2020 and 2022.

MASOOD MEHMOOD KHAN (Member, IEEE) received B.E. (mechanical), M.S. (systems engineering) and Ph.D. (computational engineering) degrees. He taught at the National University of Computer and Emerging Sciences, Jefri Bolkiah College of Engineering and the American University of Sharjah before joining the Faculty of Science and Engineering at Curtin University Australia. He has published 55 peer-reviewed research papers in top-tier, high-quality journals and conference proceedings. His research interests include machine learning, affective computing, machine vision, human-computer interaction, AI and robotics. He is a fellow of the Higher Education Academy.

TELE TAN received B.Eng (First Class Hons) and PhD degrees from the University of Surrey (UK) in 1990 and 1993 respectively. He has two decades of research and development experience in computer vision and pattern recognition. In 2002, Tele pioneered the approach of using image synthesis techniques to improve the performance of face recognition systems, making it possible for face recognition systems to be deployed in outdoor and uncontrolled environments while maintaining high accuracy and low error rates. In 2009, he founded the Studio for Experiential Sensing and Virtual Environment (SESVE) at Curtin University to foster research in human factors studies associated with various forms of visual analytics. Tele co-founded the Autism Academy for Software Quality Assurance in 2015 to help support the transitioning of students with autism to employment in the fast-growing IT industry.

SVETLANA YANUSHKEVICH (Senior Member, IEEE) received the Dr.Sci. (Habilitation) degree from the Technical University of Warsaw in 1999. She is a Professor at the Department of Electrical and Software Engineering (ESE), Schulich School of Engineering, University of Calgary. She directs the Biometric Technologies Laboratory, University of Calgary, the only research facility dedicated to biometric systems design in Canada. She was with the West-Pomeranian University of Technology, Szczecin, Poland, prior to joining the ESE Department, University of Calgary, in 2001. She has contributed to the area of artificial intelligence for digital design and biometrics since 1996. Most recently, she and her team have developed novel risk, trust and bias assessment strategies based on machine reasoning, with applications in biometric-enabled border control, forensics, and healthcare.

IAIN MURRAY received a B.Eng (Hons) in Computer Systems Engineering and a PhD degree. He has worked in the field of assistive technology for more than 25 years. Currently, he is the Curriculum Lead (Engineering) in the School of Electrical Engineering, Computing & Mathematical Sciences. His research interests include learning environments for people with vision impairment, embedded sensors in health applications, the IoT and assistive technology. He founded the Cisco Academy for the Vision Impaired in 2002 to deliver ICT training to vision impaired people. He has supervised over 20 research students and has published in excess of 110 peer-reviewed articles. He is a Member of the Order of Australia, a Fellow of the Australian Computer Society and a Curtin Academy Fellow.

TABLE I. LIST OF AFFECTIVE STATES PRESENT IN THE CK+ DATASET AND THEIR CORRESPONDING FACIAL MUSCLE ACTIONS IDENTIFIED IN FACS AND EMFACS STUDIES.
Algorithm 1: Cosine similarity-based separation
input: Continuous CK+ dataset samples
Define X = {x_1, x_2, ..., x_N}
for x_i in X do
    Extract facial image
    Reshape x_i to 150 × 150 pixels
    Convert x_i to greyscale
    Flatten x_i, i.e. x_i = {x_1, x_2, ..., x_22500}
end
Let 'x_i' = the i-th test facial expression vector
Let 'x_j' = the comparison vector

TABLE II. STATE-TO-STATE COMPARISON SHOWING DIFFERENCES BETWEEN THE n-th LINEAR DISCRIMINANTS. THE HIGHLIGHTED NUMERIC VALUES DISPLAY HIGH AND LOW |ΔLDn| VALUES.

TABLE III. HSCN MODEL PERFORMANCES WHEN TRAINED AND VALIDATED ON THE CK+ DATASET USING AN 80/20 TRAIN/VALIDATION SPLIT AND 21 UNIQUE CLASSES PER FACIAL REGION.