Shape and Texture Aware Facial Expression Recognition Using Spatial Pyramid Zernike Moments and Law's Texture Feature Set

Facial expression recognition (FER) requires effective descriptors to represent the facial patterns, as the facial region changes with the movement of the facial muscles during an expression. In this paper, a method of concatenating spatial pyramid Zernike moments based shape features and Law's texture features is proposed to uniquely capture the macro and micro details of each facial expression. The proposed method employs multilayer perceptron and radial basis function feed-forward artificial neural networks for recognizing the facial expressions. The suitability of the features for recognizing the expressions is also explored across datasets, independent of the subjects or persons. Experiments conducted on the JAFFE and KDEF datasets demonstrate that the concatenated feature vectors represent the facial expressions with improved accuracy and minimal errors. The radial basis function based classifier achieves an average recognition accuracy of 95.86% and 88.87% on the JAFFE and KDEF datasets respectively for subject dependent FER.


I. INTRODUCTION
A face is a unique trait for identifying or recognizing the people around us. Facial expressions reflect emotional states and play an important role in non-verbal communication. A facial expression results from the motion of the muscles underneath the skin of the face. Analysis of these expressions helps in understanding the behavior of a person and certain physiological changes; for example, the heart rate is higher in anger than in happiness, and the skin resistance decreases during sadness, revealing high stress. Thus, automated FER and its subsequent analysis have found applications in various domains such as surveillance, crowd emotion monitoring, psychological disorder detection, human-computer interaction and driver safety assistance. Ekman and Friesen [1] formalized the six universal facial expressions: surprise, fear, disgust, anger, sadness and happiness. These expressions have evolved through social learning and are essential for survival.
FER is a pattern recognition system; its basis is the observation, understanding and interpretation of the visual cues on the face. The main components of FER are (i) face detection, (ii) feature extraction and (iii) expression classification. The spatial arrangement of facial features such as shape, fine lines and wrinkles on the facial skin, together with the structural elements of the face such as the forehead, eyebrows, eyes and mouth, creates different patterns on the face during an expression. These patterns form the key characteristics comprising both micro and macro details. The micro details include variations in the wrinkles and fine lines that change the facial appearance and the texture of the skin, whereas the movements of the structural elements of the face constitute the macro details. A raised and arched eyebrow with eyes wide open represents surprise. A lowered eyebrow with intensely staring eyes indicates anger. Disgust is expressed with a wrinkled nose and lowered eyebrows. Similarly, an open mouth represents fear, whereas a mouth with raised corners shows happiness.

The extraction of these patterns forms the feature descriptors for recognizing facial expressions. Feature extraction and description play a major role in deciding the accuracy of the FER system. The features should be non-redundant, reliable, robust and unique, with the best discriminating ability. Extracting the right features is critical and also a challenging task. Most FER methods capture the facial expressions through shape and texture descriptors, including Zernike moments (ZM), the histogram of oriented gradients (HOG), active shape models, local binary patterns (LBP), local directional patterns (LDP), statistical measures and the gray level co-occurrence matrix (GLCM) [2]. These methods extract either the macro or the micro details alone to describe the facial expressions. However, it is important to provide the richest possible description to enhance the performance of the system. Thus, both the texture information provided by the Law's texture energy measures and the shape information provided by Zernike moments are utilized for FER. Texture information is obtained with five types of kernels: level, edge, spot, ripple and wave. Each kernel yields distinct information revealing the changes in the wrinkles and fine lines of the facial skin. Zernike moments capture the shape changes in the facial appearance caused by the movements of the structural elements. In this perspective, ZM is extended to a spatial pyramid representation with three-level decomposition to capture the shape information in each facial sub-region, forming spatial pyramid Zernike moments (SPZM). The Law's texture energy features (LTexM) and SPZM are combined to form an integrated feature vector with improved discriminating ability to recognize and classify the facial expressions.
The contributions of this paper are (i) an integrated feature set for capturing the macro and micro details from the facial expressions using SPZM and Law's texture features (ii) quantitative assessment of improved recognition accuracy by considering the images with different orientations under subject dependent and independent FER (iii) integration of features for the effective representation of the facial expressions (iv) robust features for recognizing the facial expressions from the full left and right profile of facial images.
The rest of the paper is organized as follows. Section II presents the literature review. Section III describes the proposed methodology, including the Law's texture energy measures and SPZM. Section IV presents the experimental results, and Section V concludes the paper.

II. RELATED WORK
The different approaches for extracting facial features to recognize expressions include (i) the facial action coding system (FACS) with action units (AUs) and (ii) geometric, appearance and hybrid methods. In the AU-based method, the movement of the muscles responsible for producing a facial expression is encoded into 46 facial AUs [3]. An FER system detects the facial AUs to classify the expressions using observations and comparisons. FACS, introduced by Ekman and Friesen [1] based on the characteristics of AUs, describes the relationship between the muscle movements of the face and the expressions. The second approach depends on the extraction of content-based facial features using appearance, geometric or hybrid characteristics. Appearance-based approaches are holistic, capturing global information from the facial images to generate the feature vector for distinguishing facial expressions. Non-holistic approaches make use of geometric features, such as the eyes, nose, mouth, chin and head outline of a face, and their relationships.
Holistic approaches apply transformations and use statistical methods to extract features representing the texture characteristics of the image. The Gabor filter [4]-[8] provides texture descriptors with good discrimination ability. It provides both magnitude and phase components, but the magnitudes are selected as features since they are invariant to translation. Gabor filters produce high-dimensional feature vectors at high computational cost, and the dimensionality of the features can be reduced by principal component analysis (PCA). LBP [9], [10] generates binary patterns to represent the texture by comparing the center pixel with its neighborhood pixels within a region of the facial image. Variants of LBP were further introduced to improve the performance of the FER system. A weighted multi-scale LBP method was proposed in [11], in which multiple weighted LBP features are extracted at different scales. The extended LBP is combined with the Karhunen-Loeve (KL) transform in [12]. The role of subspace analysis methods such as PCA and independent component analysis (ICA) in extracting facial expression features is investigated in [13]. Other methods such as local directional numbers (LDN) [14], the local ternary pattern [15], the discrete wavelet transform (DWT) [16] and sparse local Fisher discriminant analysis (SLFDA) [17] have also been explored for FER in recent years.
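To make the neighborhood comparison behind LBP concrete, a minimal 3×3 sketch is given below. It is illustrative only, not the descriptor used in this paper: the 8 neighbors of each pixel are thresholded against the center and packed into an 8-bit code, and a histogram of the codes serves as the texture descriptor.

```python
import numpy as np

def lbp_3x3(img):
    """Basic 3x3 local binary pattern: threshold the 8 neighbors of each
    pixel against the center and pack the results into an 8-bit code."""
    h, w = img.shape
    center = img[1:-1, 1:-1]
    # Clockwise neighbor offsets starting at the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(center, dtype=np.int32)
    for bit, (dy, dx) in enumerate(offsets):
        neighbor = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        code += (neighbor >= center).astype(np.int32) << bit
    return code

# Texture descriptor: histogram of the LBP codes over a facial region.
region = np.random.randint(0, 256, (64, 64))   # placeholder gray-scale patch
hist, _ = np.histogram(lbp_3x3(region), bins=256, range=(0, 256))
```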
Non-holistic approaches extract geometrical features providing the positions of facial landmarks and shape information. Geometric features were extracted using the curvelet transform, where the coefficients of the transform at varying scales and angles form the feature vectors representing the facial expressions [18]. The histogram of oriented gradients, originally proposed for object recognition, has also proved significant for FER. HOG provides the magnitude and phase of the gradients, from which the dominant gradients relating to the edge information are selected [19], [20]. In [21], facial landmarks are tracked based on elastic bunch graph matching (EBGM) displacement; the facial landmarks, or combinations of them, form the feature vector representing a facial expression. In [22], optical flow based facial points were tracked across consecutive frames to detect the movement of facial points and provide dynamic features. Moment-based shape descriptors are effective in representing facial expressions. Zernike moments with magnitude features, exhibiting orthogonality and rotation invariance, are used in [2], [23], [24]. Pseudo Zernike moments (PZM) [25] also provide a good description for FER. Appearance-based methods are affected by lighting and orientation conditions, whereas geometry-based methods provide feature descriptors that work well irrespective of variations in the facial image.
Further, recent years have witnessed the success of deep learning methods with multilayered architectures in facial expression recognition. Deep learning methods automatically learn features for data representation, reducing the need for hand-crafted feature extraction. The work in [27] utilized ZM to derive the coefficients of the convolution kernels in the convolution layer of a CNN architecture, which proved significant in extracting shape features and improved the classification accuracy. Deep sparse autoencoders were implemented in [28] to learn discriminative and robust features. The work in [29] introduced a part-based hierarchical bidirectional recurrent neural network to analyze the dynamic evolution and morphological variations of facial expressions, which proved effective in reducing the error rates. Modifications of CNN architectures were introduced in [30]-[33] to enhance the performance of FER systems. Generative adversarial networks (GAN) [34], [35], with generator and discriminator networks, have also emerged as strong models for discriminating facial expressions. Though deep features are efficient and have outperformed existing hand-crafted feature methods, deep learning methods require larger datasets for training and are computationally expensive.
Thus, this paper proposes a method that combines the advantages of appearance-based and geometry-based methods: the features from holistic and non-holistic approaches are combined to form a robust feature vector for improved classification. The efficacy of the selected feature descriptors in the proposed method is demonstrated for both subject dependent and subject independent FER.

III. SPZM AND LTexM INTEGRATED FEATURE SET MODEL FOR FER
The schematic of the proposed feature concatenation strategy for recognizing facial expressions is shown in Fig. 1. FER is a pattern recognition problem in which a facial pattern (FP) is assigned one of the m expression labels. The key characteristics or patterns are extracted from the facial images with expressions in the form of feature vectors; selecting the right features improves the classification accuracy. The LTexM and SPZM features are extracted from the facial images and integrated to form a feature vector for training a neural network classifier. After training, the classifier model is used to classify the facial expression of an image. Finally, the performance of the model is quantitatively assessed using various performance measures based on the elements of the confusion matrix.
The process of the proposed approach is presented in Algorithm 1. The following subsections describe the proposed work in detail.

A. FACIAL EXPRESSION DATASET
Two databases, (i) the Japanese Female Facial Expression (JAFFE) database and (ii) the Karolinska Directed Emotional Faces (KDEF) database, are used to analyze the performance of the proposed FER method. The JAFFE database [36] is widely used for evaluating the performance of FER systems. It was collected from 10 Japanese females, each posing the six basic facial expressions (angry, disgust, fear, happy, sad and surprise) along with a neutral face. The KDEF database contains facial expression images captured from five camera orientations, including the full left and full right profiles. For the proposed work, 700 facial expression images are acquired from 20 female subjects with all five orientations. All the images were converted to gray-scale for computational reasons. The problem of facial expression classification is more challenging with this dataset, as some images are captured with different orientations and only a partial face is visible in the full left and full right profiles. Thus, an FER system with a robust feature extraction method is required to overcome the complexity of the problem. The performance of the proposed framework was validated considering both subject dependent and subject independent facial expression recognition.

1) SUBJECT DEPENDENT FER
For subject dependent expression recognition, the JAFFE and KDEF databases were used individually. A single dataset, containing the facial expressions of all the subjects, is divided into non-overlapping training and testing sets using the hold-out (HO) method. 5% of the images from the JAFFE database and 60% from the KDEF database are used for training to build the classifier model; the remaining samples are used for testing to evaluate the model. Though the testing and training sets are disjoint, both are framed from the images of all the subjects in the dataset, as shown in Fig. 4(a) for the JAFFE database.
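As a concrete illustration, the hold-out split can be realized as in the sketch below. The feature matrix and labels are placeholders, and the stratification (so that every expression appears in both sets) is our assumption about the protocol.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 700 integrated feature vectors (KDEF case), 7 expression labels.
X = np.random.rand(700, 102)
y = np.random.randint(0, 7, 700)

# Hold-out (HO) split with the 60% training share reported for KDEF; the sets are
# disjoint but, for subject dependent FER, still mix images of all subjects.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.6, stratify=y, random_state=0)
```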

2) SUBJECT INDEPENDENT FER
For Subject independent FER, the subjects or persons considered with their facial expressions for training and testing should be disjoint. Thus the images of the 20 female subjects from the KDEF database were considered for training while the facial expression images of the 10 subjects from the JAFFE database were used for testing the model.

B. FEATURE EXTRACTION
Features were extracted from the facial expression images to carry out classification and recognition.

1) FEATURE EXTRACTION USING ZERNIKE MOMENTS
During a facial expression, many movements in the structural elements of the face (eyebrows, eyes, nose, mouth and chin) can be observed, resulting in changes in the appearance of the face. The most prominent change is in the shape characteristics of the face. These changes need to be captured and represented by a suitable shape descriptor to recognize and classify the facial expression and interpret the emotion. The preferred shape descriptor should have two characteristics to provide the best discrimination between facial expressions: (i) invariance to rotation, translation and scaling and (ii) low redundancy between the features. Zernike moments [38], from the class of orthogonal moments, have proved to be suitable shape descriptors with a high degree of information, satisfying the above requirements [39]-[41]. The orthogonality of the moments ensures the lowest possible redundancy, and the magnitude of the ZM is invariant to rotation. Further, the lower-order and higher-order moments provide global shape information and detailed information respectively. Thus, to capture the complete shape information of an image, a set of ZM is computed by varying the moment order from lower to higher values. This process involves a significantly large number of computations.
To reduce the complexity of the computation, the size of the set of ZM is predefined and combined with the spatial pyramid of the image to obtain the spatial pyramid ZM feature. A key characteristic of ZM that best complies with the spatial pyramid of the image is its hierarchical nature. The lower order moments provide the global shape information and the higher-order moments reveal the local shape information.

2) SPZM FEATURES
The SPZM feature is motivated by the concept of the spatial pyramid representation of an image [42], [43] combined with Zernike moments as shape descriptors. The idea is to extract the shape information at different levels of the pyramid to create a multilevel shape representation capturing the complete pattern created by the facial expression. The ZMs are computed at each pyramid level and concatenated to form the SPZM feature vector. Thus, this descriptor can be viewed as a hierarchical shape representation of the facial image.

The ZM is computed by projecting each cell $C(x, y)$ onto the set of complex Zernike polynomials $V_{nm}(x, y)$:

$$A_{nm} = \frac{n+1}{\pi} \sum_{x} \sum_{y} C(x, y)\, V_{nm}^{*}(\rho, \theta), \quad x^2 + y^2 \le 1 \tag{1}$$

where $n$ is the order of the polynomial, $m$ is the repetition factor such that $|m| \le n$ and $n - |m|$ is even, $\rho = \sqrt{x^2 + y^2}$ is the length of the vector from the origin to the pixel at spatial location $(x, y)$, and $\theta$ is the angle of that vector measured from the x-axis in the counterclockwise direction. The Zernike polynomial is defined as

$$V_{nm}(\rho, \theta) = R_{nm}(\rho)\, e^{jm\theta} \tag{2}$$

where $R_{nm}(\rho)$ is the radial polynomial

$$R_{nm}(\rho) = \sum_{s=0}^{(n-|m|)/2} \frac{(-1)^s (n-s)!}{s!\left(\frac{n+|m|}{2}-s\right)!\left(\frac{n-|m|}{2}-s\right)!}\, \rho^{n-2s} \tag{3}$$

The ZM thus computed is a complex quantity, from which the magnitude $|A_{nm}|$ and phase $\mathrm{Arg}(A_{nm})$ are obtained. The magnitude $|A_{nm}|$ is selected as the shape descriptor, since it is preserved under rotation, and is computed by varying the order of the moment and the repetition factor. For a cell $C(x, y)$, a set of ZM magnitudes $[|A_{11}|, |A_{20}|, |A_{22}|, |A_{31}|]$ is computed by varying the moment order from 1 to 3 with the associated repetition factors. The moment $|A_{00}|$ is ignored because the zeroth-order Zernike polynomial $V_{00}$ is flat; an image projected onto this polynomial provides no edge maps or shape information. In ZM, the order of the polynomial governs the radial component and the repetition factor governs the sinusoidal component. With higher order, the number of turning points in the Zernike polynomial rises, providing better shape information about the image. The lowest nonzero-order polynomials are considered for image description in this work. The Zernike polynomial basis functions $V_{00}, V_{11}, V_{20}, V_{22}, V_{31}$ over a unit circle and their magnitudes $|V_{nm}|$ are provided in Fig. 6(a) and Fig. 6(b); Fig. 6(b) illustrates the rotational invariance of the ZM. The one-dimensional profiles of the 2D Zernike polynomials are displayed in Fig. 6(c) to emphasize the number of turning points obtained with variations in the moment order. The complete shape description for an image (Fig. 5) is obtained by concatenating $[|A_{11}|, |A_{20}|, |A_{22}|, |A_{31}|]$ computed for each cell $C(x, y)$ at levels $l = 0, 1, 2$, as indicated in equation (4).
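A minimal sketch of the ZM magnitude computation and its spatial pyramid extension, following equations (1)-(3), is given below. The mapping of each square cell onto the unit disk and the exact normalization are our assumptions; the sketch is illustrative rather than the paper's exact implementation. With orders (1,1), (2,0), (2,2), (3,1) over pyramid levels l = 0, 1, 2, it yields the (1 + 4 + 16) × 4 = 84-dimensional SPZM used later in Section IV.

```python
import numpy as np
from math import factorial

def radial_poly(rho, n, m):
    """Radial polynomial R_nm(rho) of equation (3)."""
    m = abs(m)
    R = np.zeros_like(rho)
    for s in range((n - m) // 2 + 1):
        c = ((-1) ** s * factorial(n - s) /
             (factorial(s) * factorial((n + m) // 2 - s)
              * factorial((n - m) // 2 - s)))
        R += c * rho ** (n - 2 * s)
    return R

def zm_magnitude(cell, n, m):
    """Rotation-invariant magnitude |A_nm| of equation (1) for one cell,
    assuming the square cell is mapped onto the unit disk."""
    h, w = cell.shape
    y, x = np.mgrid[-1:1:h * 1j, -1:1:w * 1j]
    rho, theta = np.hypot(x, y), np.arctan2(y, x)
    inside = rho <= 1.0                      # ZM is defined on the unit disk only
    V_conj = radial_poly(rho, n, m) * np.exp(-1j * m * theta)   # V*_nm, eq. (2)
    return np.abs((n + 1) / np.pi * np.sum(cell[inside] * V_conj[inside]))

def spzm(img, orders=((1, 1), (2, 0), (2, 2), (3, 1)), levels=3):
    """Concatenate per-cell ZM magnitudes over a 3-level spatial pyramid."""
    feats = []
    for l in range(levels):
        for rows in np.array_split(img, 2 ** l, axis=0):
            for cell in np.array_split(rows, 2 ** l, axis=1):
                feats += [zm_magnitude(cell, n, m) for n, m in orders]
    return np.asarray(feats)                 # 21 cells x 4 moments = 84 features

features = spzm(np.random.rand(128, 128))    # placeholder gray-scale face image
```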

3) LAW's TEXTURE FEATURES
Texture features provide statistical measures of an image based on the spatial arrangement of the pixel intensities. In the proposed work, the texture features are utilized to capture the micro details of the facial expression, reflecting the texture of the skin through the formation of wrinkles and fine lines. These details are significant and contribute to identifying and recognizing the emotional state associated with an expression. Kenneth I. Laws [44] proposed texture energy measures for classifying textures. These features are invariant to changes in rotation, contrast and luminance, and find applications in various domains [45]-[47]. The Law's texture feature extraction method proceeds as follows (a code sketch is given after this list):
1) Law's texture features use a set of one-dimensional convolution masks that are center-weighted and symmetric or anti-symmetric, with dimensions of 1×3, 1×5 and 1×7. Masks of dimension 1×5 have been shown to provide better texture descriptors.
2) The one-dimensional masks are combined to generate two-dimensional masks by computing the outer product of each pair of vectors. This results in a set of 16 convolution masks $\{CM_i\}_{i=1,\dots,16}$, each of dimension 5×5.
3) The appropriate masks are selected from the 16 convolution masks based on the following criteria: (i) the mask L5L5 is dropped, as it is sensitive to changes in intensity and the sum of its elements is non-zero; (ii) masks that provide similar information are combined. For example, L5E5 measures vertical edge content while E5L5 measures horizontal edge content, so their average provides the total edge information. Accordingly, a set of 9 convolution masks $\{CM_i\}_{i=1,\dots,9}$ is finally selected, as presented in equation (5) and Fig. 8.
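A sketch of the Law's texture energy computation follows. Several details are our assumptions: the 15-pixel averaging window, the illumination removal step, the use of the four vectors (level, edge, spot, ripple) whose combinations give exactly the 16 masks described above, and the choice of two statistics (mean and standard deviation) per energy map, which yields an 18-dimensional LTexM consistent with the feature lengths reported in Section IV.

```python
import numpy as np
from scipy.ndimage import convolve, uniform_filter

# Law's 1x5 vectors: level, edge, spot and ripple.
V = {'L5': np.array([ 1.,  4., 6.,  4.,  1.]),
     'E5': np.array([-1., -2., 0.,  2.,  1.]),
     'S5': np.array([-1.,  0., 2.,  0., -1.]),
     'R5': np.array([ 1., -4., 6., -4.,  1.])}

def laws_features(img, win=15):
    img = img - uniform_filter(img, size=win)      # remove mean illumination
    energy = {a + b: uniform_filter(np.abs(convolve(img, np.outer(V[a], V[b]))),
                                    size=win)
              for a in V for b in V}                # 16 energy maps
    # Drop L5L5 and average the symmetric pairs, leaving 9 measures (equation (5)).
    pairs = [('L5E5', 'E5L5'), ('L5S5', 'S5L5'), ('L5R5', 'R5L5'),
             ('E5S5', 'S5E5'), ('E5R5', 'R5E5'), ('S5R5', 'R5S5')]
    maps = [(energy[a] + energy[b]) / 2 for a, b in pairs]
    maps += [energy[k] for k in ('E5E5', 'S5S5', 'R5R5')]
    # Two statistics per map (assumed): mean and std -> 18-dimensional LTexM.
    return np.array([s for m in maps for s in (m.mean(), m.std())])

ltexm = laws_features(np.random.rand(128, 128))    # placeholder face image
```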

C. FEATURE INTEGRATION
The Law's texture features and the shape features from ZM are concatenated to form an integrated feature vector $FV_{int}$. As the contributions from the texture and shape features differ, the feature set consolidates the information from both using a series fusion rule:

$$FV_{int} = \{\text{Law's texture features},\ \text{spatial pyramid ZM features}\}$$

During integration, no weights were applied, giving equal preference to both feature types. The integrated feature set is normalized using Z-score normalization to form a new vector $FV_{int\_norm}$ with mean 0 and standard deviation 1. Feature normalization is crucial, as it is a requirement of the machine learning algorithm and the magnitudes of the features influence the weight updates during training [49]. The normalized feature set is represented by

$$FV_{int\_norm} = \frac{FV_{int} - \mu}{\sigma}$$

where $\mu$ is the mean and $\sigma$ is the standard deviation of the features. The features are normalized while retaining their original properties.
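The series fusion and Z-score step reduce to the sketch below, reusing the spzm and laws_features sketches above. Estimating the per-feature statistics on the training set and applying them unchanged at test time is our assumption about the protocol.

```python
import numpy as np

def integrate(ltexm, spzm_feats):
    """Series fusion: unweighted concatenation of texture and shape features."""
    return np.concatenate([ltexm, spzm_feats])   # e.g. 18 + 84 = 102 dimensions

def zscore_fit(FV_train):
    """Per-feature mean and standard deviation estimated on the training set."""
    return FV_train.mean(axis=0), FV_train.std(axis=0)

def zscore_apply(FV, mu, sigma):
    """FV_int_norm = (FV_int - mu) / sigma: mean 0, standard deviation 1."""
    return (FV - mu) / sigma
```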

D. CLASSIFICATION USING ARTIFICIAL NEURAL NETWORKS (ANN)
The extracted texture and shape features from the facial expression image are fused to form the integrated feature set FV_int_norm. The concatenated feature vector is provided to a feed-forward artificial neural network (ANN) for classification. ANNs are used extensively for facial expression classification.

1) MULTILAYER PERCEPTRON NEURAL NETWORK
The neural network architecture is designed with an input layer, one hidden layer with a hyperbolic tangent activation function $f(x) = \frac{2}{1 + e^{-2x}} - 1$, and an output layer with a linear activation function. Each layer has trainable parameters known as weights and biases. The input layer accepts an input vector and passes it to the hidden layer for processing. The hidden layer output is fed to the output layer, which presents the network's final output.
The MLPNN is trained with the labeled feature vectors {FV_int_norm, C} to adjust the trainable parameters and tune the output close to the target values. For training, the Levenberg-Marquardt back propagation algorithm [48] was used with the mean squared error as the cost function. The stopping criterion was set by defining the number of epochs. The MLPNN was tuned by setting the performance goal (MSE), learning rate (η), momentum (m) and regularization parameter (λ) to generate the optimized classifier model.
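The MLPNN setup can be sketched with scikit-learn as below. Note that scikit-learn provides no Levenberg-Marquardt solver, so plain SGD stands in; the hidden layer width and the mapping of λ to the L2 penalty alpha are our assumptions.

```python
from sklearn.neural_network import MLPClassifier

# One tanh hidden layer with the paper's reported settings: eta = 0.5, m = 0.95,
# lambda = 0.6, at most 1000 epochs. X_train/y_train come from the hold-out split.
mlp = MLPClassifier(hidden_layer_sizes=(64,),      # hidden width is an assumption
                    activation='tanh', solver='sgd',
                    learning_rate_init=0.5, momentum=0.95,
                    alpha=0.6, max_iter=1000, tol=1e-6)
mlp.fit(X_train, y_train)
y_pred = mlp.predict(X_test)
```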

2) RADIAL BASIS FUNCTION NEURAL NETWORK
RBFNN is a three-layered network with input, hidden and output layers. The network is robust and converges quickly. The RBFNN uses a Gaussian activation function, from the class of radial functions, in the hidden layer; it is monotonic with respect to the distance from the center. The Gaussian function with center $\mu$ and spread (radius) $\sigma$ is represented by

$$\varphi(FV_{int\_norm}) = \exp\left(-\frac{\|FV_{int\_norm} - \mu\|^2}{2\sigma^2}\right)$$

For a known input $FV_{int\_norm}$, the hidden layer computes the distance between the input and the centers of the basis functions, $\|FV_{int\_norm} - \mu\|$, where $\|\cdot\|$ is the L2 norm, and applies the RBF. The computation continues in the output layer, which predicts the class label of the sample. The RBFNN is provided with $\{FV_{int\_norm}, C\}$ for training. During training, the learnable parameters of the network, namely the spread of the activation function and the weights connecting the hidden layer to the output layer, are tuned to obtain the best model.
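A compact RBFNN sketch follows. Taking the training samples themselves as centers (an "exact" RBF design) and solving the hidden-to-output weights by least squares on one-hot targets are our assumptions, standing in for the paper's training procedure.

```python
import numpy as np

def rbf_design(X, centers, sigma):
    """Gaussian activations phi_j(x) = exp(-||x - mu_j||^2 / (2 sigma^2))."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def rbf_train(X, y, n_classes, sigma):
    """Centers = training samples; output weights via linear least squares."""
    Y = np.eye(n_classes)[y]                       # one-hot targets
    Phi = rbf_design(X, X, sigma)
    W, *_ = np.linalg.lstsq(Phi, Y, rcond=None)
    return X, W

def rbf_predict(X, centers, W, sigma):
    return rbf_design(X, centers, sigma) @ W       # class scores; argmax = label
```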
The effectiveness of the models is then tested using unseen samples from the testing set. For each test sample, the features FV_int_norm_test are extracted and fed to the classifier model to predict the class label.

IV. RESULTS AND DISCUSSION
To evaluate the performance of the proposed method, two experiments (i) subject dependent FER and (ii) subject independent FER were performed on JAFFE and KDEF databases.

A. SUBJECT DEPENDENT FER
The first experiment was conducted to investigate the combination of ZM-based shape descriptors and Law's texture features for efficient facial expression representation and discrimination. It was performed on the JAFFE and KDEF databases separately using the hold-out method, whereby two disjoint sets were created for training and testing the classifier model.

1) CASE (I)
For all the images of the training set, the SPZM and the texture features LTexM were computed. The SPZM are extracted by considering the ZM orders n = 1, 2, 3 with the associated repetition factors, as presented in equation (4). The SPZM shape features of length 84 are integrated with the LTexM and normalized to form FV_int_norm_1 with dimension 1 × 102. The labeled and normalized feature vectors are provided to the neural network classifiers to train the model, which can then predict the expression label for an unseen sample.
The MLPNN was trained with the labeled training dataset to learn the relationship between the input feature vectors and the class labels by modifying the weights and biases. The network was tuned by limiting the number of epochs to 1000 and setting the cost function goal close to zero. The network converged and provided the best model for η = 0.5, m = 0.95 and λ = 0.6. The model was tested with the facial images from the testing set. The ability of the model to classify all the facial expressions is assessed by computing performance metrics from the confusion matrix, including the classification accuracy (CA), true negative rate (TNR), false acceptance rate (FAR) and false rejection rate (FRR). Table 1 shows the confusion matrix.
Here, True Positive (TP) denotes a true accept, True Negative (TN) a true reject, False Positive (FP) a false accept and False Negative (FN) a false reject. TP and TN represent correct facial expression classifications, whereas FP and FN are misclassifications, for example a happy face falsely recognized as sad or angry.
From the result of the testing process, confusion matrices [49] are framed for each expression for both databases as presented in Table 2. From the attributes of the confusion matrix, the performance metrics are calculated.
$$\text{Classification accuracy (CA)} = \frac{TP + TN}{TP + TN + FP + FN} \tag{9}$$

$$\text{TNR/Specificity} = \frac{TN}{TN + FP} \tag{10}$$

The performance metrics computed for all the expressions of the testing datasets from JAFFE and KDEF are shown in Table 3.
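For one expression treated as the positive class, the four metrics reduce to the sketch below; the FAR and FRR formulas are the standard type-I/type-II error rates implied by the discussion around equations (9)-(10).

```python
def fer_metrics(tp, tn, fp, fn):
    ca  = (tp + tn) / (tp + tn + fp + fn)   # classification accuracy, eq. (9)
    tnr = tn / (tn + fp)                    # TNR / specificity, eq. (10)
    far = fp / (fp + tn)                    # type-I error: false acceptance rate
    frr = fn / (fn + tp)                    # type-II error: false rejection rate
    return ca, tnr, far, frr
```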
The experiment was conducted to prove the discriminating ability of the integrated feature set in the accurate classification of facial expressions. From the results, it is observed that the proposed method delivers improved classification accuracy. It can also be observed that the MLPNN provides a high CA (greater than 90%) for the angry, fear, happy, neutral and surprise expressions on the facial images from the JAFFE dataset, and a CA greater than 80% for fear, neutral and surprise on the KDEF dataset. Further, the specificity of the system was measured through the TNR. A value of TNR = 1 (100%) indicates the best specificity, and a TNR close to one yields few false-positive results. For the JAFFE database, the average TNR is 96.90%, so the average false-positive rate is only 3.1%. Similarly, for the KDEF dataset, the average TNR is 89%.
In the next experiment, the system was tested for type-I and type-II errors, which represent a false match and a false non-match respectively. The FAR indicates the type-I error, whereas the FRR indicates the type-II error; ideally, the two errors are inversely proportional to each other. Both false acceptance and false rejection assign an emotion class label to which the facial expression does not belong. These errors are undesirable and may affect the success rate of an FER system; in applications such as psychological disorder detection, or when a device must act on the detected emotion, they may lead to false diagnoses or incorrect actions. The plots in Fig. 9(a) and Fig. 9(b) display the FAR and FRR for both databases and aid in analyzing the relationship between the FAR and FRR for each facial expression. Fig. 9 confirms the inverse relationship between the errors and shows that the proposed framework returns the least possible errors in identifying the emotions represented by the facial expressions. The proposed system provided an average FAR of 3.08% and an FRR of 3.12% for the JAFFE dataset; the two measures are 11.28% and 9.12% respectively for the KDEF dataset, which is desirable for an FER system with a high performance rate.

Later, the RBFNN was trained and tested with the training and testing sets from the JAFFE and KDEF datasets. For the training process, the cost function, sum squared error (SSE), was set and the spread (σ) of the radial activation function was varied from σ = 2 in incremental steps of 1 to improve the performance. With each variation in σ, the accuracy of the model was noted. The network converged for σ = 8. Fig. 10 shows the accuracy of the model with respect to the spread of the RBF obtained for the JAFFE database.
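The spread search can be sketched as below, reusing the rbf_train/rbf_predict sketch above; evaluating each σ on the held-out test set, as implied by Fig. 10, is our assumption.

```python
import numpy as np

# Sweep sigma = 2, 3, ... and keep the spread with the best accuracy (the paper
# reports convergence at sigma = 8 on JAFFE for this feature set).
accuracies = {}
for sigma in range(2, 21):
    centers, W = rbf_train(X_train, y_train, n_classes=7, sigma=sigma)
    scores = rbf_predict(X_test, centers, W, sigma)
    accuracies[sigma] = float(np.mean(scores.argmax(axis=1) == y_test))
best_sigma = max(accuracies, key=accuracies.get)
```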
From Fig. 10, it is noted that the maximum accuracy of 94.35% is obtained for σ = 8. With this optimized value, the performance metrics are computed for all the expressions of the testing datasets and are presented in Table 4. The analysis of the performance measures provided by the RBFNN indicates that the angry, happy and neutral expressions were classified with a CA greater than 96% on the JAFFE dataset, while the fear, happy, sad and surprise expressions were classified with a CA greater than 80% on the KDEF dataset. The model also provides an average specificity of 96.70% and 90.05% for the JAFFE and KDEF datasets respectively, an indication of few false positives. The evaluation of the type-I and type-II errors shows that the model yields low error values, a FAR of 3.28% and an FRR of 2.38%, for the JAFFE dataset; the two measures are 9.93% and 10.22% respectively for the KDEF dataset.

2) CASE (II)
The experiment on subject dependent FER was continued with the next set of feature vectors. Here, the texture features remain the same as in case (i), whereas the SPZM features are captured by varying the ZM order from n = 2 to 4. The variation in n with the corresponding repetition factors results in [|A20|, |A22|, |A31|, |A33|, |A40|, |A42|], which are utilized in equation (4) to provide an SPZM of length 126. The order of ZM was varied to explore the competency of higher-order ZM in presenting the best shape features. These features are combined with the LTexM features to form the integrated feature vector FV_int_norm_2 with dimension 1 × 144.
The FV_int_norm_2 vectors framed for the training datasets of JAFFE and KDEF were provided to both the MLPNN and RBFNN classifiers. The classifiers were trained by varying the trainable parameters of the networks to improve the performance. The MLPNN parameters were optimized at η = 0.5, m = 0.95 and λ = 0.6, and the RBFNN converged for σ = 18. The models were then evaluated by providing the samples of the testing dataset. The measures computed on the testing dataset with FV_int_norm_2 for both MLPNN and RBFNN are presented in Tables 5 and 6.
The results and the analysis of the performance measures obtained with the second feature vector FV_int_norm_2 indicate an improvement in the performance of the FER system. Average classification accuracies of 94.35% and 95.86% for JAFFE, and 86% and 88.87% for KDEF, were achieved with the MLPNN and RBFNN classifiers respectively. Another noteworthy improvement is in terms of the type-I and type-II errors: the results clearly indicate that the average type-I error across all expressions is reduced by 30% and the type-II error by 36.84%.
The results presented by MLPNN and RBFNN for both FV_int_norm_1 and FV_int_norm_2 were also examined for discrimination ability. The assessment highlighted the superior performance of RBFNN, as can be noted from Fig. 11.
Later, the analysis was extended to compare the performance of the proposed method with state-of-the-art facial expression recognition techniques, as presented in Table 7.
A comparative evaluation was carried out considering different image descriptors for the JAFFE and KDEF databases. Table 7 reports the average classification accuracies, which show that the proposed method is comparable with the state-of-the-art techniques.

B. SUBJECT INDEPENDENT FER
In the second experiment, the suitability of the extracted and integrated feature sets was checked for subject independent FER. To evaluate the performance, the combination of the JAFFE and KDEF databases was considered. The integrated features FV_int_norm_1 and FV_int_norm_2 were extracted from the facial images of both databases. The feature vectors of the KDEF dataset were labeled to form the training set, while the JAFFE dataset formed the testing set. The performance metrics obtained with MLPNN and RBFNN are shown in Tables 8 and 9.
The results show that, with FV_int_norm_1, the MLPNN delivered a CA greater than 80% for the angry and surprise expressions, while the RBFNN did so for the disgust, happy and surprise expressions. Similarly, with the feature set FV_int_norm_2, an accuracy greater than 80% was obtained for the neutral expression with MLPNN and for the sad and surprise expressions with RBFNN. The average specificity is greater than 85% for both integrated feature vectors with both MLPNN and RBFNN, which reduces the false positives. The FAR and FRR metrics presented in Tables 8 and 9 suggest that the proposed method has better generalization capability: it identifies the true facial expression independent of the subject. It is also observed that the recognition of facial expressions improves when the higher-order ZM (FV_int_norm_2) are considered, and that RBFNN outperforms MLPNN in identifying subject independent facial expressions.
The descriptors thus also proved significant in discriminating the emotions across datasets (subject independent FER).

C. ABLATION STUDY
An additional ablation experiment was conducted to prove the effectiveness of integrating the Law's texture features with the SPZM features in enhancing the performance of both subject dependent and subject independent FER. For the ablation study, the Law's texture features were removed from the integrated feature set, retaining only the SPZM features in both the training and testing sets. The reduced training feature set with the class labels was then provided to both the MLPNN and RBFNN classifiers to create the models. For training the neural networks, the same criteria and tuning parameters were retained as mentioned in Section III-D. The trained models were then tested with the reduced testing feature set. The recognition accuracy was evaluated to investigate how the removal of the texture features affects the performance of subject dependent and subject independent FER. The results of the ablation study are displayed in Figs. 12, 13 and 14.
The classification accuracies shown in Figs. 12, 13 and 14, obtained with the two neural network classifiers for both integrated feature vectors, indicate that the combination of texture and shape descriptors is appropriate for representing the facial expressions for recognition and classification. From this study, it is noted that the integration of the texture features contributes significantly to improving the recognition accuracy. It can also be noted that the RBFNN generalizes well, providing better performance in FER.

V. CONCLUSION
This paper presents an FER method using the combination of Law's texture features and ZM-based shape descriptors. The proposed method uses the feed-forward neural networks MLPNN and RBFNN to learn the relationship between the features and the class labels of the emotions. The results proved that the proposed method has good class discrimination ability and performed best with FV_int_norm_2, where the texture features were integrated with the SPZM obtained from higher-order ZM. The RBFNN classifier presented a noticeable result, with an average recognition accuracy of 95.86% and 88.87% on the JAFFE and KDEF datasets respectively for subject dependent FER. The model classified the angry, happy and surprise expressions with a good recognition rate, whereas the performance for the disgust, neutral and sad expressions was comparatively lower. The method also performs well in the presence of facial orientations. Furthermore, the proposed method showed better generalization capability in identifying facial expressions across datasets for subject independent FER. The framed feature set proved accurate in capturing the facial expression patterns, returning fewer misclassifications with the least possible type-I and type-II errors.
Further, the effectiveness of the combined features can be extended to the better understanding and recognition of micro-expressions. These expressions last only a fraction of a second on the face and represent the true feeling of an individual. The proposed method will be revised to use both the spatial and motion information of an image sequence in order to describe the key characteristics of dynamic facial expressions for classification.