
EmoSecure: Enhancing Smart Home Security With FisherFace Emotion Recognition and Biometric Access Control



Abstract:

The focus of smart homes is more inclined towards providing security and comfort to residents rather than treating energy as the foremost concern. This paper proposes a novel technique combining FisherFace-based emotion recognition and biometric security using iris features for smart homes. The proposed emotion-based smart home system can refresh the user's mood by detecting a real-time facial expression and adjusting the house environment (lighting, air-conditioning, and music system) accordingly. The unique features are the pattern of ridges in the iris and the pattern made by nerves on the sclera. Canny edge detection, along with pseudo-coloring, is used to find the ridges. The proposed iris biometric security rests on the unique features obtained from the ridges of the iris. Resident authentication is provided through iris recognition, after which emotion recognition occurs. This model performs iris recognition through modified local binary patterns and Daugman sheet conversion. The face is extracted from live video using Haar cascade XML files, which retrieve the frontal face. Facial emotions are then detected through edge extraction using Sobel filters and a combination of several steps (composite mask and eigenvectors), followed by FisherFace recognition. The combination of these techniques produces a higher accuracy and recognition rate. The proposed system has shown a promising accuracy of 95.25% for iris recognition and 93.93% for emotion detection.
Published in: IEEE Access ( Volume: 12)
Page(s): 93133 - 93144
Date of Publication: 05 July 2024
Electronic ISSN: 2169-3536

SECTION I.

Introduction

The concept of a smart home has gained significant attention in recent years, offering the potential to revolutionize the way we interact with our living spaces. A smart home is a control-based housing system that integrates various devices and technologies to enhance convenience, comfort, and security. In a world of increasing competition and work pressure, home is the one place where a person feels free. Home should be a place to relax, so tasks such as closing doors, shutting off lights, or turning on an air conditioner should require minimal steps. In a conventional housing system, these steps cannot be reduced further without technology [1], [2]. Every person's work day is different, and the day's experience can be readily inferred from the person's mood. Mood can be detected through facial expressions when a person returns from work; emotions can also be detected via speech, and some techniques use emotion and gaze detection for recommendation systems [3], [4]. Based on the person's mood, different household appliances can help lighten it and reduce the number of task steps, giving the resident a pleasant and relaxed environment. A user-centric multimedia access home network [5] enables seamless access to content on the home network, automatically delivering it to nearby rendering devices based on the user's location. The content is adapted to meet the specific constraints of each rendering device, and the system is also capable of maintaining session mobility.

This research paper introduces a novel approach that combines FisherFace-based emotion recognition and biometric security using iris features for smart homes. While traditional smart home systems focus on energy efficiency, the proposed system emphasizes providing comfort to residents by adjusting the house environment based on real-time facial expressions. The system also incorporates biometric security using iris recognition, leveraging unique features found in the iris patterns and the nerves on the sclera. The iris recognition process involves ridge detection using Canny edge detection and pseudo-coloring. Resident authentication is achieved through iris recognition, followed by emotion recognition. The paper presents iris recognition using modified local binary patterns and Daugman sheet conversion. Face extraction from live video is performed using Haar cascade XML files to detect the frontal face, and emotions are detected through edge extraction using Sobel filters and a combination of steps including a composite mask and eigenvectors, followed by FisherFace recognition. The combination of these techniques results in improved accuracy and recognition rates for the system.

Various smart home security systems have been built using an Arduino controller, implementing password- and voice-based authentication mechanisms [6], [7], [8]. Password-based authentication can easily be compromised by hackers using different spyware, whereas voice-based authentication can be defeated using pre-recorded voice data of the actual person. Compared to both mechanisms, real-time facial iris-based recognition is more effective.

The proposed system proves more efficient because its authentication is based on facial iris recognition. It is not limited to smart security; it also addresses real-time facial expression detection for smart home automation. The actual emotion of the person is recognized using the FisherFace algorithm, which reduces the dimensionality of the face data to extract the most essential features for recognizing the person's emotion. Using the proposed model, better iris recognition and emotion detection accuracy are obtained with a smaller number of samples.

SECTION II.

Related Work

There are several ways of incorporating facial emotions and recognizing them after pre-processing the dataset [9]. Increasing the lighting effects makes the image more focused for feature extraction. The pipeline consists of gamma correction, selective filtering, and contrast equalization, which yields a significant rise in recognition rate and accuracy [9]. Since the features are clearly indicated and marked, it is easier to process the emotions. In selective filtering, various schemes such as Gaussian filtering for smoothing, low-pass filtering, and division of the original image by the Low Pass Filter (LPF) output are described as separate means of obtaining an improved-quality image for recognition. To address variations in illumination at low frequencies and noise at high frequencies, a difference-of-Gaussian filter was employed. The experiments also found that the band-pass filter did not preserve the details in the eye and mouth regions as required; this was solved by sharpening both the eye and mouth before applying the band-pass filter. For the eyes, the Viola-Jones technique is applied and false positives are filtered out, while for the mouth, Viola-Jones does not work as well as required [10].

The objective is to assess the impact of the pre-processing pipeline on a classifier utilizing FisherFace [11]. The paper highlights that FisherFace exhibits high sensitivity to changes in illumination due to the substantial differences in lighting compared to facial features. The study shows that a separate scheme of Gaussian filtering and division by the LPF output was devised for the lighting pre-processing of the data. Expression recognition using various approaches was studied, comparing holistic methods (Gabor wavelets and Neural Networks) for the image-based approach with local methods such as Principal Component Analysis (PCA) plus Neural Networks (NNs). Having used the Gabor filter for lighting and intensity enhancement, the authors compared the local and holistic methods; a recognition rate of around 84-86% was achieved using PCA plus NN. It was also found that the Facial Action Coding System (FACS) can improve Facial Emotion Recognition (FER) quite significantly [12].

A simple and efficient pre-processing sequence removes many effects of varying illumination while preserving the important appearance details needed for FER. A new method called Local Ternary Patterns (LTP) was introduced. This technique is an extension of the Local Binary Pattern (LBP) local texture method, designed to be more robust against noise in uniform regions. The researchers demonstrated that replacing the local histogram with a similarity metric based on a local distance transform not only enhances the performance of LBP/LTP-based Facial Expression Recognition (FER) but also reduces sensitivity to noise. This method obtains an accuracy of around 86-87% with a higher recognition rate [13].

A survey and analysis was conducted to determine which of EigenFace and FisherFace is better suited to which dataset; FisherFace was concluded to have the lowest error rate when portraying faces and computing emotions compared to EigenFace. The error rate for EigenFace was around 10%, while for FisherFace it was around 3-4% [14]. In the model of [15], a Feed Forward Neural Network (FFNN) improves face recognition accuracy. Facial expression recognition based on changes in facial geometry has been proposed [16]; this technique uses modified eyemap and mouthmap algorithms and a neural network for emotion recognition. A method for recognizing emotions in images utilizes emotional concepts to establish a connection with the emotional content of the image [17]; it uses a knowledge graph to organize the relationship between concepts and emotions, and a multi-task learning model is then employed to recognize the emotion. An iris-based identification system was implemented using an optimized Support Vector Machine (SVM): the VGG16 model was used for feature extraction, and the optimized SVM was used to classify the iris image, showing the highest accuracy of 95% on the IITD dataset [18]. Another iris-based identification system was developed using machine learning, with DFFT and PCA used for feature selection; the highest classification accuracy of 92% was obtained using the K-NN algorithm [19]. Principal Component Analysis (PCA) based on the Discrete Wavelet Transform has been used to extract optimal features, after which an SVM classified the iris images with 95% accuracy [20]. Different machine learning models have been used to predict the real-time facial emotion of a person: PCA was first applied to reduce the number of features, and basic models such as KNN and SVM then achieved a classification accuracy of 88% [21]. Transfer learning has been used to identify the real-time emotion of a person, with data augmentation generating new samples; that model reported a highest testing accuracy of 65% [22]. After a detailed investigation of the above literature, the following research gaps are identified: the existing models have high complexity, consuming more computational resources and time; the models need additional pre-processing, which increases their time complexity; and the models require a larger number of samples.

SECTION III.

The Proposed Technique

The system provides security through iris recognition; the iris biometric is used for user identification. Only if a person is authenticated by the system as a resident of the house are facial expressions detected. Emotion recognition classifies the expression into happy, sad, angry, or neutral categories. Based on this classification, the system adjusts the air conditioning, sets an appropriate music track, and enables light enhancement. The block diagram of the proposed technique is shown in Figure 1. The proposed technique consists of seven processes, including live video capture, face detection, iris recognition, eye extraction, emotion detection, and environment setting.

FIGURE 1. Block diagram of the proposed system.

Figure 1 shows the proposed block diagram of the smart home system. As the person walks towards the door, the camera captures an image and face detection is attempted. After that, iris recognition is performed: the eye of the person is extracted from the image and the extracted image is saved. If the input iris matches an iris in the database, the person is permitted to enter through the door; otherwise, entry is denied. From the image of the face, the person's emotion is detected, based on which the environment inside the house is set.

Overall, the motivation behind the system is to provide enhanced security, personalized experiences, and user-friendly interactions within a smart home environment.

By incorporating iris recognition and emotion detection, the system aims to create a seamless and customized living experience for residents.

A. Biometric Security Using Iris Features

Innovative home security systems based on iris recognition can be developed using deep learning [23]; such a model provides better recognition in both stable and adverse situations.

One of the major problems in an iris detection system is detecting the liveness of the person facing the camera. In this work, eye blinking is used for liveness detection. The following algorithm is used for liveness detection of a person based on eye blinking.

Algorithm for Liveness detection using Eye Blink

  1. Obtain the live video using camera.

  2. Capture the face image.

  3. Convert it to gray scale.

  4. Detect the face from the gray scale image using Haar cascade Algorithm.

  5. Detect the eyes from the face region of interest using Haar cascade Algorithm.

  6. Compare each eye pixel with a threshold value to create a binary image map:

    If (pixel >= 30)
        Pixel = 255 (white)
    Else
        Pixel = 0 (black)

  7. Count the white pixels and check:

    If (white pixel count < 300)
        Blinkcount = Blinkcount + 1
        Eye Blinking = True
    Else
        Eye Blinking = False

In the above algorithm, after detecting the eye region, each eye pixel is compared with a threshold value that determines whether the eye is open or closed: if the pixel value exceeds the threshold, it is set to white, otherwise to black. Finally, by counting the number of white (or black) pixels, the system determines whether the eye is blinking.
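A minimal Python sketch of this eye-blink liveness check is given below. It assumes OpenCV's bundled Haar cascade files and reuses the illustrative thresholds from the algorithm (30 for pixel intensity, 300 for the white-pixel count); it is a sketch of the described procedure, not the authors' released code.

import cv2

# Haar cascades shipped with OpenCV (paths assumed via cv2.data.haarcascades)
face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
eye_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_eye.xml")

def count_blinks(video_source=0, pixel_threshold=30, white_count_threshold=300):
    cap = cv2.VideoCapture(video_source)
    blink_count = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        for (x, y, w, h) in face_cascade.detectMultiScale(gray, 1.3, 5):
            face_roi = gray[y:y + h, x:x + w]
            for (ex, ey, ew, eh) in eye_cascade.detectMultiScale(face_roi):
                eye_roi = face_roi[ey:ey + eh, ex:ex + ew]
                # Binary map: pixels brighter than the threshold become white (255)
                _, binary = cv2.threshold(eye_roi, pixel_threshold, 255, cv2.THRESH_BINARY)
                white_pixels = cv2.countNonZero(binary)
                # Few white pixels suggests the eyelid is closed, i.e., a blink
                if white_pixels < white_count_threshold:
                    blink_count += 1
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    cap.release()
    return blink_count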

The pre-processing of images is done by following a series of methods consisting of detection and extraction of the eyes, ridge detection using Canny edge detection, iris extraction using Hough circle detection, noise removal, pupil removal, and conversion into a local binary pattern [24]. The obtained output image is used for training.

In Figure 2, the block diagram shows the proposed model for iris detection and feature extraction. Initially, the image of the person is captured using a camera for recognition [25].

FIGURE 2. The proposed block diagram of iris detection and feature extraction.

Then the eye of the person is extracted from the image. Then with the help of Canny-Edge detection, edges are identified. Iris is detected with the Hough method. Noise is removed from the image in the next step.

The proposed modified LBP algorithm is then applied to the output image, and the Daugman sheet is used for data normalization.

Now, from the final image, features are extracted. The details of the iris detection algorithm are listed in the following; a code sketch of the localization steps appears after the list.

Algorithm for Iris Detection:

  1. Capture the image by the following substeps.

    1. Record the video of the incoming person’s face. Select one such frame where the clear face of the user can be identified based on sharpness, uniformity, and contrast.

    2. Crop the image such that the eyes of the user can be seen.

  2. Use the Haar cascade xml files to get the features.

    Then, extract the information about the eyes from the image.

  3. Convert this image to a grayscale image and perform the following substeps.

    1. Execute the iris pre-processing method.

    2. Get the extracted eyes from the database.

    3. Execute the Hough transform for detecting the inner circular portion of the eye (iris).

    4. Apply Canny edge detection to get the ridges in the iris.

    5. Remove the inner pupil of the eye to make the dataset more accurate.

    6. Perform the iris feature extraction method.

  4. Calculate the modified local binary patterns of the image and store them as features.

  5. Divide the image of 160×160 pixels into 40×40 blocks.

  6. Calculate the mean and variance of each block to include them as features.

  7. Create a CSV file which incorporates modified LBP, mean, and variance. Analyze and train the dataset obtained.

    1. Perform the iris recognition method.

    2. Divide the dataset of 450 inputs into 360 inputs for training and 90 inputs for testing

    3. Calculate the features for each test image.

    4. These features are then compared with the features extracted from the training set

    5. The comparison results are calculated in the form of deviation (obtained with the help of Euclidean distance [26]).

    6. The mean of these deviations represents the recognized image in the training set.
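The localization portion of this pipeline (eye extraction, noise removal, Hough-based iris detection, and Canny ridge extraction) can be sketched in Python with OpenCV as follows; the cascade choice, Hough parameters, and the 160×160 output size are illustrative assumptions rather than the authors' exact settings.

import cv2
import numpy as np

eye_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_eye.xml")

def extract_iris(frame_bgr):
    """Locate an eye, find the iris circle with the Hough transform,
    and return the cropped iris together with its Canny ridge map."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    eyes = eye_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(eyes) == 0:
        return None
    ex, ey, ew, eh = eyes[0]
    eye = cv2.medianBlur(gray[ey:ey + eh, ex:ex + ew], 5)   # noise removal
    circles = cv2.HoughCircles(eye, cv2.HOUGH_GRADIENT, dp=1, minDist=eh,
                               param1=100, param2=30,
                               minRadius=eh // 8, maxRadius=eh // 2)
    if circles is None:
        return None
    x, y, r = np.round(circles[0, 0]).astype(int)
    iris = eye[max(y - r, 0):y + r, max(x - r, 0):x + r]
    iris = cv2.resize(iris, (160, 160))      # fixed size before 40x40 block division
    ridges = cv2.Canny(iris, 50, 150)        # ridge (edge) map of the iris
    return iris, ridges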

The detailed steps of LBP are listed as follows:

  1. The window is divided into cells, and the LBP feature vector is computed cell by cell. Each pixel value is compared to its 8 neighbors, following the pixels on a circle around it.

  2. If a neighboring pixel is brighter than the center pixel, it is set to 1; otherwise, it is set to 0.

  3. A histogram of the frequency of each resulting code is computed over the cell.

  4. The histogram is then normalized.

  5. Finally, the histograms of all cells are concatenated.

The result is a binary code describing the local texture pattern, where g_c is the center pixel value, g_p is the value of the p-th neighboring pixel, P is the number of sampling points, and R is the radius of the circle on which the neighbors lie.\begin{equation*} LBP_{P,R}= \sum \limits _{p=0}^{P-1} {s(g_{p}-g_{c})2^{p}} \tag {1}\end{equation*}

The thresholding function s(x) is defined as:\begin{equation*} s\left ({{ x }}\right)=\begin{cases} \displaystyle 1,& x\gt 0 \\ \displaystyle 0,& x\leq 0. \\ \end{cases} \tag {2}\end{equation*}

In the Hough circle transform, circles are extracted from the image. The transform is efficient enough to find even an imperfect circle present in an image. It is based on the principle that, just as a line can be defined in multiple ways, so can a circle, using an angle and a length. Let r be the radius and \theta the angle; a circle can then be defined by its center coordinates and radius:\begin{equation*} C:(x_{centre},y_{centre},r) \tag {3}\end{equation*}

The x and y coordinates of points on the circle are:\begin{align*} x& =a+r\cos \theta \tag {4}\\ y& =b+r\sin \theta \tag {5}\end{align*}

where (a, b) is the center of the circle.

A feature is a piece of information by which an individual can easily be recognized and differentiated; it is considered unique to a particular individual, and iris recognition therefore relies on such features [27]. Once the feature set is ready, the K-nearest neighbor (KNN) algorithm is applied for classification. This algorithm belongs to the supervised machine learning category and, being non-parametric, is widely used in many real-life scenarios [28].
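A minimal classification sketch over the extracted features is shown below, assuming the features have been written to a CSV file as described in the algorithm and using scikit-learn's K-nearest-neighbor classifier; the file name and column layout are hypothetical.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical CSV layout: one row per eye image, a "user" label column,
# and the remaining columns holding modified-LBP, mean, and variance features.
df = pd.read_csv("iris_features.csv")
X = df.drop(columns=["user"]).values
y = df["user"].values

# 360 training / 90 testing images, matching the split described above
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=90, stratify=y, random_state=0)

knn = KNeighborsClassifier(n_neighbors=3, metric="euclidean")
knn.fit(X_train, y_train)
print("Recognition accuracy:", knn.score(X_test, y_test))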

As the extracted features depend on intensity values, the selection of a good-quality camera plays a vital role in system implementation.

The test image has dimensions of 80×80 pixels, and the number of rows and columns is set to 4. The image is divided into 4 rows and 4 columns, i.e., 16 blocks of 20×20 pixels. Figure 3 depicts an example of this block division: Figure 3(a) shows the 80×80 iris image after the pre-processing steps are executed, and Figure 3(b) shows the resultant image after dividing it into 20×20 sub-blocks. The 16 mean values and 16 variance values of these 16 blocks are taken as the features of this image.

FIGURE 3. An example of block division.
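The block-division step of Figure 3 can be expressed directly in NumPy; the sketch below reproduces the 80×80 example (4 rows × 4 columns, i.e., 16 blocks of 20×20, giving 16 means and 16 variances) and is illustrative only.

import numpy as np

def block_mean_variance(image, rows=4, cols=4):
    """Split a 2-D image into rows x cols blocks and return the
    per-block means and variances as a flat feature vector."""
    h, w = image.shape
    bh, bw = h // rows, w // cols
    means, variances = [], []
    for i in range(rows):
        for j in range(cols):
            block = image[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw]
            means.append(block.mean())
            variances.append(block.var())
    return np.array(means + variances)

# Example: a pre-processed 80x80 iris image yields 16 means + 16 variances
iris = np.random.randint(0, 256, (80, 80), dtype=np.uint8)
print(block_mean_variance(iris).shape)   # (32,)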

Then, the modified local binary patterns are calculated: the local binary pattern is computed and the original image is added to it.

One example of the calculation of LBP is described here. The input test image block is taken as shown in Figure 4(a), and the pixel with value 64 located in the center of the block is selected. If a neighboring pixel's intensity is greater than the center pixel's intensity, it is set to 1; otherwise, it is set to 0. The resulting binary block is shown in Figure 4(b). To generate the updated value for the center pixel 64, the weight block shown in Figure 4(c) is used. The updated value 1×1 + 0×2 + 0×4 + 0×8 + 0×16 + 0×32 + 1×64 + 1×128 = 193 is then calculated. Finally, the modified local binary pattern is obtained by adding the updated value to the center pixel. This process is iterated for every pixel with a sliding window of 3×3 pixels.

FIGURE 4. An example of local binary pattern block generation.
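A small NumPy sketch of the per-pixel computation illustrated in Figure 4 is given below; it follows the neighbor weighting described above (powers of two from 1 to 128) and adds the resulting code to the original pixel value. It is an illustration of the described modified LBP, not the authors' implementation.

import numpy as np

def modified_lbp(image):
    """Modified LBP as described in the text: the classic 3x3 LBP code of
    each pixel is added to the original pixel value."""
    img = image.astype(np.int32)
    h, w = img.shape
    out = img.copy()
    # Clockwise neighbor offsets and their powers-of-two weights 1, 2, ..., 128
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            center = img[y, x]
            code = 0
            for bit, (dy, dx) in enumerate(offsets):
                if img[y + dy, x + dx] > center:   # neighbor brighter than center -> 1
                    code += 1 << bit
            out[y, x] = center + code              # modified LBP = original pixel + LBP code
    return out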

The motivation behind the algorithm is to enable iris recognition by extracting key iris features and generating a feature set for classification. The algorithm provides a detailed process for capturing and processing iris images, calculating the modified LBP, and utilizing the Hough circle transform. The outcome of the algorithm is the extraction of iris features, which are used for reliable identification and recognition purposes. The supervised machine learning algorithm KNN then compares the extracted features of the test image with the features from the training set to identify the most similar iris pattern. The recognition result is represented by the mean of the deviations calculated using the Euclidean distance.

B. Feature Extraction

Feature extraction is a process of reducing the dimensionality [29] of images so they become more manageable. An initial set of raw data is reduced and divided into manageable groups. As raw images/input contain many variables, a large amount of computing power is needed to process them. This process helps determine the most important features in a big data set by choosing and combining variables into features. These selected features are easy to process, and they describe the actual data accurately. Feature extraction also minimizes the presence of redundant data within the dataset. An example of feature extraction is text feature extraction based on deep learning [30]. Another example is feature extraction from the iris using the Time Series Feature Extraction Library (TSFEL), a Python library designed to facilitate quick exploratory data analysis and feature extraction from time series data while also considering computational costs [31].

After pre-processing, the images are ready for feature extraction. For feature extraction, the rules are as follows:

  1. The image is divided into ‘n’ blocks per row and column.

  2. Data distribution of the image can be calculated by variance.

  3. For each block mean of object pixels and the data distribution (variance) of pixels are calculated.

In this way, two values are available per block (the mean and the variance of that block). These values are considered features of the iris image and are used alongside the modified local binary pattern. To obtain the modified LBP, the local binary pattern is added to the original image.

Feature extraction results in reduced dimensionality, a meaningful feature representation, and the generation of modified local binary patterns. These outcomes facilitate efficient image analysis and enable tasks such as classification and recognition.

C. Emotion Detection Module

Mood can be easily detected through emotion recognition, and emotion recognition based on facial expressions is feasible in real time. The most prolific results are obtained using the Laplacian of Gaussian, high-boost filters, and unsharp masking. The flow of the emotion detection model is depicted in Figure 5.

FIGURE 5. Flow of the proposed emotion detection model.

The above block diagram shows the steps used for emotion detection. The image of the face is captured by the video camera. The first pre-processing step is to apply a Haar cascade to the image. Histogram equalization is then executed on the image, followed by the composite mask. The final pre-processing step before the actual feature extraction is the Sobel filter. Features are extracted as eigenvectors, and finally FisherFace-based recognition is performed.

Subject detection using the Haar cascade classifier is one of the most accurate object detection methods [32]. It is a machine-learning based approach where a face, eye, or mouth cascade function is trained using a combination of positive and negative images, enabling it to subsequently detect objects in other collected images. Face detection is built using the Haar cascade method. For comprehensive training of the classifier, a substantial number of positive images depicting faces and negative images lacking faces are necessary as per the algorithm’s requirements. Features are extracted after thorough training.

Initially, the Haar-like features of the input image are computed according to the Viola-Jones integral image formula, as shown in equation (6).\begin{equation*} sum=I\left ({{ C }}\right)+I\left ({{ A }}\right)-I\left ({{ B }}\right)-I(D), \tag {6}\end{equation*}

where A, B, C, and D are the corner points of the rectangular region in the integral image I.
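As a quick illustration of equation (6), the rectangle sum over an integral image can be computed as follows; this is a generic sketch of the integral-image idea, not code from the paper.

import numpy as np

def integral_image(img):
    """Integral image: I[y, x] = sum of all pixels above and to the left, inclusive."""
    return img.astype(np.int64).cumsum(axis=0).cumsum(axis=1)

def rect_sum(I, top, left, bottom, right):
    """Sum of pixels in the rectangle [top..bottom, left..right] from four corner
    lookups, mirroring equation (6): sum = I(C) + I(A) - I(B) - I(D)."""
    total = I[bottom, right]
    if top > 0:
        total -= I[top - 1, right]
    if left > 0:
        total -= I[bottom, left - 1]
    if top > 0 and left > 0:
        total += I[top - 1, left - 1]
    return total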

The Sobel mask prominently detects horizontal and vertical edges in an image [33], [34]. It works on the principle of spatial masking and calculates the difference among pixel intensity values. As the middle row of the mask consists of zeros along the scan line, it computes the difference between the pixel intensities above and below an edge, amplifying sudden intensity changes and making the edge more visible and prominent.
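A short sketch of the Sobel edge-extraction step with OpenCV is shown below; the kernel size and the magnitude combination are illustrative assumptions.

import cv2
import numpy as np

def sobel_edges(gray):
    """Apply horizontal and vertical Sobel masks and combine their magnitudes."""
    gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)   # responds to vertical edges
    gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)   # responds to horizontal edges
    magnitude = np.sqrt(gx ** 2 + gy ** 2)
    return cv2.convertScaleAbs(magnitude)             # back to 8-bit for further processing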

PCA is a dimensionality reduction method. The first principal component denotes the direction of greatest variance in the data, while the second principal component represents the direction of maximum variance in a space perpendicular to the first component. The first principal component corresponds to the eigenvector of the covariance matrix that possesses the largest eigenvalue, while the second principal component corresponds to the eigenvector associated with the second highest eigenvalue.

The principal components can be calculated from a collection of 2-D or 3-D data points. To find the eigenvectors of an image, the image is treated as a collection of points: a 150×150 color image is an array of 150×150×3 numbers, which can be viewed as a long 1-D array of 67,500 elements, i.e., a single point in 67,500-dimensional space, analogous to a point in 3-D (x, y, z) space. In OpenCV, eigenvectors are calculated using a direct method as shown in equation (7).\begin{equation*} mean,\; eigenvectors = PCACompute(data,\; mean,\; maxComponents) \tag {7}\end{equation*}

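A minimal example of this call in OpenCV's Python bindings, flattening a set of grayscale face crops into row vectors before PCA; the image size and component count are assumptions.

import cv2
import numpy as np

# Stack N grayscale face crops (e.g., 150x150) as rows of a data matrix
faces = [np.random.randint(0, 256, (150, 150), dtype=np.uint8) for _ in range(20)]
data = np.array([f.flatten() for f in faces], dtype=np.float32)   # shape: (N, 22500)

# Compute the mean face and the leading eigenvectors
mean, eigenvectors = cv2.PCACompute(data, mean=None, maxComponents=10)
print(mean.shape, eigenvectors.shape)   # (1, 22500) and (10, 22500)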

1) Algorithm for Emotion Detection

  1. Capture one frame from the video camera. Then, extract the eyes from this captured image.

  2. Use Haar cascade xml files to get the landmarks and process and extract the human face from the image.

  3. Convert this image to the grayscale and apply the Sobel filters to generate the resultant image of edge detection.

  4. Perform principal component analysis to calculate eigenvectors of image.

  5. Store the eigenvectors as the required features. Employ FisherFace to train and design the emotion detection.

The FisherFace algorithm is an emotion detection algorithm that performs a composite calculation using PCA and LDA. Here, LDA is used to find the projection that maximizes the objective function, reducing the variance within the same class and maximizing the distance between classes so that the targeted class is properly separated from the others. The FisherFace separation thresholds help differentiate one emotion class from another.\begin{equation*} {argmax}_{w}\; O\left ({{w}}\right)=\frac {w^{T} S_{B} w}{w^{T} S_{W} w} \tag {8}\end{equation*}

where S_{B} is the between-class scatter matrix defined as:\begin{equation*} S_{B}={(m_{2}-m_{1})(m_{2}-m_{1})}^{T} \tag {9}\end{equation*}

and S_{W} is the within-class scatter matrix defined as:\begin{equation*} S_{W}=\sum \limits _{j=1}^{2} \sum \nolimits _{x\in C_{j}}{(x-m_{j})(x-m_{j})}^{T} \tag {10}\end{equation*}

where m_{j} is the mean of class j.
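For completeness, a sketch of training and querying a FisherFace model with OpenCV's contrib module (opencv-contrib-python) is shown below; the dataset layout and label encoding are assumptions, and the authors' exact training setup may differ.

import cv2
import numpy as np

# `images` is assumed to be a list of equally sized grayscale face crops
# (after the Sobel pre-processing) and `labels` a list of integer emotion
# codes, e.g. 0 = happy, 1 = sad, 2 = angry, 3 = neutral.
def train_fisherface(images, labels):
    model = cv2.face.FisherFaceRecognizer_create()
    model.train(images, np.array(labels))
    return model

def predict_emotion(model, face_gray):
    label, confidence = model.predict(face_gray)
    return label, confidence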

D. Implementation of Smart Home System

A Raspberry Pi is used to assemble the smart home simulation. An ultrasonic sensor measures distance using ultrasonic waves and is used to determine whether the door is open or closed.

A couple of Light Emitting Diodes (LEDs) indicate the lighting conditions inside the room based on the person's emotion: the LEDs turn red if the person is angry and blue when the person is calm and happy. A DHT11 sensor senses the room temperature and is used to adjust it depending on the user's mood. The detailed processing steps of the smart home system are listed in the following.

  1. Perform the iris recognition module.

  2. If the person is found to be an authentic user (resident), the following sub-steps are executed.

    1. The emotion detection process is executed.

    2. Based on the detected emotion, the lights and music system are set accordingly.

    3. Room temperature is then adjusted.

  3. The following sub-steps are executed if an unauthentic user (a guest or a thief) is found entering the house.

    1. Check if the door is open using an ultrasonic sensor.

    2. If the door is open, capture the image using a camera.

The proposed smart home system provides personalized biometric security and tags the user as authentic or inauthentic by validating the user through his or her iris. If a user is tagged inauthentic and still tries to enter the house, he or she can be considered a guest who visits occasionally or a thief. If an unauthentic person enters the house, his or her image is captured and sent to the owner through the wired Ethernet connection, not only notifying the owner of the risk but also giving the owner the option to use those results to further train the system and increase its efficiency.

If, instead, the detected user is a resident, he or she is tagged authentic, and once the door opens and the resident enters the home, the emotion detection module is initiated to detect the mood. Based on this classification, a particular music track, the lighting conditions, and the air-conditioning of the room are adjusted to lighten the resident's mood. A fixed set of music tracks is preconfigured to play according to the mood detected by the system. The entire system works on the fly and in real time, which is why it is termed smart and self-adjusting.
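A simplified sketch of how the detected emotion could be mapped to the GPIO-driven actuators described here (LED color, a music track, and a temperature set-point) is given below; the pin numbers, track names, target temperatures, and the use of the RPi.GPIO library are illustrative assumptions rather than the authors' wiring.

import RPi.GPIO as GPIO   # available on Raspberry Pi OS

RED_LED_PIN = 17    # hypothetical wiring
BLUE_LED_PIN = 27

GPIO.setmode(GPIO.BCM)
GPIO.setup([RED_LED_PIN, BLUE_LED_PIN], GPIO.OUT)

# Hypothetical mapping from detected emotion to environment settings
ENVIRONMENT = {
    "angry":   {"led": RED_LED_PIN,  "track": "calm_music.mp3",    "temp_c": 22},
    "sad":     {"led": BLUE_LED_PIN, "track": "upbeat_music.mp3",  "temp_c": 24},
    "happy":   {"led": BLUE_LED_PIN, "track": "happy_music.mp3",   "temp_c": 24},
    "neutral": {"led": BLUE_LED_PIN, "track": "ambient_music.mp3", "temp_c": 25},
}

def set_environment(emotion):
    settings = ENVIRONMENT.get(emotion, ENVIRONMENT["neutral"])
    GPIO.output(RED_LED_PIN, settings["led"] == RED_LED_PIN)
    GPIO.output(BLUE_LED_PIN, settings["led"] == BLUE_LED_PIN)
    # The music track and temperature set-point would be forwarded to the
    # speaker and air-conditioning controllers here.
    print("Playing:", settings["track"], "| target temperature:", settings["temp_c"], "C")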

Figure 6 is a block diagram of the working model. The LEDs depict enhancement in the lights; speakers are connected to play the background sound and a temperature sensor is used to detect the temperature of the house and then adjust the cooling. After the person enters the room, the environment is set accordingly [35].

FIGURE 6. The proposed model of the smart home.

SECTION IV.

Experimental Results

In the simulations, a good-quality camera is needed to capture the face of the person entering the smart home. A Raspberry Pi is used for image processing and for signaling the environment according to the detected emotion. Multi-colored LED lights are needed: after the emotion is detected, the LED lights glow with a color that sets the environment accordingly. Quality speakers are also used to set the mood according to the emotions detected by the system. In addition, an air conditioner is needed to set the room's temperature according to the detected emotions and the outside environment.

In the simulations, the testing programs are implemented as Python scripts. Libraries such as NumPy, OpenCV, and pandas are used for image processing. The FisherFace-based algorithm is used for emotion detection. Similar to Eigenfaces, Fisherfaces enable the reconstruction of the projected image. Figure 7 shows that the Fisherfaces algorithm works better than Eigenfaces in terms of recognition rate.

FIGURE 7. Results of the recognition rate for Eigenfaces and Fisherfaces.

A. Data Sets

The “Multimedia University (MMU) iris dataset” [36] contains a total of 450 images, with five images per iris and two irises captured per subject. All images were taken using an LG Iris Access 2200 at a range of 7-25 cm. This dataset is selected for this work because it offers a diversified iris image collection captured with good-quality imaging devices. Other datasets offer three or fewer images per iris, and due to privacy issues, most other iris datasets require a lengthy registration process.

The dataset contains around 450 extracted eye images (both left and right), labeled as user-left-number or user-right-number, starting with user number 1. For instance, if John is User 1, his extracted images would be stored as 1l1 and 1r1 (user, left/right eye, image number). From this set, we use 360 eye images (both left and right eyes) for training and 90 for testing; thus, 80% of the dataset is used for training. The distribution of the eye dataset into the two sub-datasets is listed in Table 1.

TABLE 1. Distribution of the Iris Dataset

The purpose of the Cohn-Kanade AU-Coded Facial Expression Database is to serve as a resource for research and exploration in the automatic analysis and synthesis of facial images. There are 7 different types of expressions present in the dataset, as shown in Figure 8.

FIGURE 8. Sample expressions.

For real-time emotion detection, the facial emotion dataset from the Kaggle repository is used [37]. The dataset contains a total of approximately 1200 images, i.e., ten images per individual. In the first version, there are a total of 486 sequences captured from 97 subjects. Each sequence starts with a neutral expression and progresses to a final expression. The peak expression in each sequence is coded using FACS (Facial Action Coding System) and assigned an emotion label; the label corresponds to the expression actually made rather than the intended expression. Apart from the accuracy rate, we have also calculated the time complexity for iris detection and emotion recognition, as discussed in the following.

Time complexity of iris detection:

  1. Capture the image: capturing an image frame from the video to crop the biometric information = O(1)

  2. Haar cascade feature extraction from an image of size w×h = O(w×h)

  3. Grayscale conversion and image processing = O(w×h)

  4. Feature extraction = O(w×h/16) = O(w×h)

  5. Creating the CSV file with n features = O(n)

  6. Training and testing an image with n features = O(n)

Therefore, the overall time complexity of the iris detection algorithm is O(w×h), where w and h are the width and height of the image and n is the number of features.

Time complexity of emotion detection:

  1. Obtaining one image frame from the video = O(1)

  2. Face extraction using the Haar cascade XML file = O(w×h)

  3. Converting the image to grayscale and applying the Sobel filter = O(w×h)

  4. Performing PCA followed by eigenvalue decomposition = O(n³)

  5. Storing the eigenvectors = O(k×n), where k is the number of eigenvectors and n is the feature dimension

Therefore, the overall time complexity of emotion detection is O(n³).

SECTION V.

Comparative Analysis

Comparative analyses of different approaches to iris recognition are given in Table 3. The modified local binary pattern (MLBP) model has shown the best results, with an accuracy rate of 95.25% and a recognition rate of 90.5%. The other methods discussed apply different machine learning (ML) and transfer learning (TL) models [19], [20], [21]. The MLBP is a very simple model compared to the ML and TL models in terms of resource utilization. Secondly, despite the presence of noise in the iris images, MLBP performs well for iris recognition by capturing low-level texture features, whereas ML- and TL-based techniques require image pre-processing methods to remove unwanted noise. Thirdly, MLBP shows better performance despite a smaller number of training samples, whereas ML and TL techniques need more samples to perform well.

TABLE 2. Distribution of the Emotion Dataset
TABLE 3. Comparative Analysis for Iris Recognition

Comparative analyses of different approaches to emotion detection are given in Table 4. The proposed FisherFace algorithm (FFA) has shown an accuracy rate of 93.93% and a recognition rate of 87.75%. For real-time emotion detection, the FFA outperforms the other discussed ML and TL models for several reasons. Firstly, FFA uses dimensionality reduction to filter out unnecessary features and captures the most discriminant features in the given image, whereas ML and DL models require a separate dimensionality reduction algorithm to find the most useful features for classification. Secondly, FFA applies LDA to maximize the inter-class distance and minimize the intra-class distance by identifying the most distinctive features of the different classes of facial emotions. FFA needs less computational power than the ML and DL models, and it trains well with a smaller number of samples, which is rarely possible for ML- and TL-based models.

TABLE 4. Comparative Analysis for Emotion Detection

SECTION VI.

Conclusion

The proposed smart home system is designed to yield better and more accurate results for user identification, authentication, and facial emotion detection. The proposed model not only gives a higher accuracy and recognition rate but also focuses on simplicity rather than a complex implementation of recognition and detection algorithms. The highest accuracy obtained for iris biometric security, implemented using the modified LBP, is 95.25%.

The main aim of the emotion-based smart home system is to provide comfort to the user. The system demonstrates adjustments to the lighting, air conditioning, and music, and can be extended to more appliances by making it more personalized and robust. The modules used for emotion detection and iris recognition give fairly accurate results. The facial emotions detected using the various approaches mentioned in Table 4 yield accuracies from 89.39% to 93.93%. The system works well for iris recognition even if the user wears eyeglasses.

The proposed system has been developed for smart homes, but it can be extended to offices, classrooms, business units, and many other places by changing the system architecture and including cloud storage. To enhance the robustness of the system, a greater number of iris and facial emotion samples can be collected in the future. More customization, such as voice-command-based and context-based device control, can also be added to the system.

Compliance with Ethical Standards:

Conflict of Interest: The authors declare that they have no conflicts of interest that influence the work reported in this article.

Data availability: The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
