Driver Drowsiness Prediction Based on Multiple Aspects Using Image Processing Techniques

Over the decades, a majority of road accidents have been caused by driver drowsiness. Automation has been playing a key role in many fields to provide comfort and improve the quality of life of users. Although various drowsiness detection systems have been developed during the last decade based on many factors, these systems still demand improvement in terms of efficiency, accuracy, cost, speed, and availability. This paper proposes an integrated approach that depends on the eye and mouth closure status (PERCLOS) along with the calculation of a newly proposed measure, the Facial Aspect Ratio (FAR), analogous to EAR and MAR. This helps to determine the status of closed eyes or an opened mouth, as in yawning, and to identify frames containing hand gestures such as nodding or covering an opened mouth with the hand, which people do instinctively when trying to control sleepiness. The system also integrates texture-based gradient patterns to find the driver's face in various orientations, identify sunglasses on the driver's face, and recognize scenarios such as hands on the eyes or mouth while nodding or yawning. The proposed work was tested on datasets such as NTHU-DDD, YawDD, and a newly proposed dataset, EMOCDS (Eye and Mouth Open Close Data Set), and proved better in terms of accuracy while generalizing across various circumstances.


I. INTRODUCTION
A large number of people across the world want to buy vehicles. It is noteworthy that the menace of road accidents is also increasing rapidly with the increase in the number of vehicles plying on the roads. The number of road accidents is very high in countries having highly crowded streets and roads.
The National Crime Records Bureau (NCRB) conducted a survey and reported that around 0.13 million lives were lost due to road accidents in India in the year 2020 alone [1]. Road accidents represent one of the foremost causes of death worldwide. The average mortality is higher in middle-income countries than in low-income countries, which is an alarming condition. The World Health Organization (WHO) has published an article pointing out that the risk factors leading to accidents are speeding, intoxicated driving, distracted driving, etc. [2]. Almost all of these factors reveal that most road accidents happen due to the carelessness of the driver and negligence in following traffic rules and safety precautions. Drowsiness may occur due to lack of sleep, continuous driving at night, or both, ultimately making the driver tired and diverting concentration from driving. In the transportation industry, where bus and truck drivers drive overnight, it is very common for them to fall asleep, particularly in the wee hours, due to exhaustion, while the vehicle is in motion. The circumstances mentioned above demand a mechanism that alerts drivers in such situations to save many precious lives. Technology is advancing at a very fast pace, and automation is easing people's busy lives while providing services with perfection, in less time, and with more safety. Though top companies are already investing a lot of money to identify the state of a driver's drowsiness, it is still a challenging task with open research avenues. Hence, an automatic and efficient drowsiness detection and driver mood prediction-based system is required to be implemented for real-time applications [20]. This will help to reduce road accidents and increase people's safety [3].
The development of the technologies required to implement driver drowsiness detection tools is a tedious task in the area of accident prevention and accident-avoidance systems [50]. Due to the intensity of the problem, the industry has developed many systems based on various aspects. The driver's inattention may be because of lack of sleep, negligence, or other parameters that might draw the driver's attention away from driving. Alkinani et al. [12] have presented a comprehensive survey on human driving behavior analysis using deep learning techniques and the associated challenges.

A. LIMITATIONS:
A person may fall asleep while driving for various reasons. This is exhibited in different ways, such as nodding, closing the eyes, rubbing the eyes to control drowsiness, covering the eyes with a hand, and instinctively covering the mouth with a hand while yawning. Figure 1 presents sample images of these gestures.

1) Multiple face detection:
Generally, the camera captures the whole scenario that may consist of everything around the driver. Hence, in addition to the face of the driver, the faces of the passengers as well as other objects in the surrounding are also captured. Identification of the driver's face from the various entities in the image is an activity that needs to be performed. Further, the face can be cropped and processed to predict the scenario and to alert the driver.
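One simple way to isolate the driver's face when several faces are detected is sketched below; the Haar cascade detector and the "largest box = driver" heuristic are illustrative assumptions, not the paper's stated selection rule.

# Minimal sketch: pick the driver's face as the largest detection in the frame.
# The bundled Haar cascade and the largest-box heuristic are assumptions.
import cv2

def crop_driver_face(frame_bgr):
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    # Keep the largest bounding box (the closest person, assumed to be the driver).
    x, y, w, h = max(faces, key=lambda box: box[2] * box[3])
    return frame_bgr[y:y + h, x:x + w]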

2) Face orientation:
When the driver is drowsy, his face may be captured in any orientation because he may turn to the side while yawning or trying not to fall asleep [8]. So, the proposed system must be intelligent enough to analyze the situation from the given orientations.

3) Expression differentiation:
A person exhibits different expressions depending on different situations. These expressions include excitement, disgust, and sadness. They can easily divert the driver's attention and must be differentiated from each other for further processing from the research perspective. Feature extraction techniques are applied to extract the differentiated features [4] [55].

4) Illumination:
As stated earlier, most drivers tend to fall asleep during night driving and in the early morning. Notably, these timings, unlike normal daytime conditions, coincide with low illumination in the camera-captured images. Hence, a system designed for drivers must also take care of providing sufficient lighting by using sources like Light Emitting Diode (LED) lights or by using Infrared (IR) cameras [8].
The proposed method integrates the possible cases that generally arise while driving. The state-of-the-art systems address only one or two of these problems, whereas the proposed method addresses many issues simultaneously based on various parameters. This motivated us to combine the different circumstances mentioned above. The paper is organized as follows: Sections I and II present the introduction and the literature review. Section III describes the proposed method. Section IV discusses the results and analysis. Sections V and VI present the conclusions and references.

II. LITERATURE SURVEY
Dasgupta et al. [4] proposed a three-stage drowsiness detection system comprising PERCLOS (percentage of eyelid closure) calculation, speech data captured from a microphone, and feature extraction. Lin et al. [5] and Budak et al. [14] developed drowsiness detection systems based on the EEG (electroencephalogram) integrated with ICA, power spectrum analysis, and linear regression for classifying the state of the driver's drowsiness. Feature-extraction-based, multi-view, and EEG-based systems were introduced and implemented on the training system to overcome the challenges posed by dynamic behavior [6][7]. In addition, functional near-infrared spectroscopy (fNIRS) has been used to investigate brain function using the emitted signals, while classification algorithms like DNN and CNN have been used to classify drowsy and alert states [9] [39]. In similar research works, Lee [28] and Lin et al. [35] proposed methods that convert images into gradient images and use a random regression forest algorithm to find the head orientation. Anilkumar et al. [29] proposed a system based on heartbeat detection using R-peak detection, face movement, etc., which were detected with the help of a frame difference algorithm.
The majority of research works are based on the eye status of the driver [31] [32] [37] [41]. Drowsiness detection has also been performed using LBPH [33] [48]. Cheon and Kang [34] worked on bio-data gathered from PPG (photoplethysmography), processed it with segmentation and averaging, and then performed classification. Tateno et al. [36] developed a drowsiness detection system based on heart rate and respiration changes. Wang and Qin [38] implemented an FPGA-based system to detect the driver's drowsiness. Ishii et al. [41] proposed High-order Local Auto-Correlation (HLAC) for extracting shape features and identifying attention, stress, and drowsiness. Ling et al. [42] introduced a discriminative local feature vector for facial expression recognition using sparse coefficients. Maheswari et al. [53] presented a comprehensive survey on texture-based local patterns such as LBP, LTP, LTrP, DBC, and DLEP. Hong and Wang [43] introduced an integrated feature vector based on multiple features along with an LSTM. Hammedi et al. [44] discussed various driver drowsiness detection methods. Cristiani et al. [45] presented the work of the REFLECT project and discussed the differences in detecting drowsiness and fatal car crashes. Lashkov et al. [46] and Joshi et al. [47] advocated that OpenCV libraries are useful for retrieving the features required to detect driver drowsiness.

III. PROPOSED METHOD
Drowsiness detection is a system that helps to provide safety and accident prevention. The proposed driver drowsiness prediction system identifies various scenarios: it captures closed eyes, an open mouth, and hands on the eyes or mouth while nodding or yawning. It also detects whether the person is yawning or trying not to fall asleep through innate actions such as widening the eyes beyond their normal size or rubbing the eyes with a hand. An image of the driver captured through the camera serves as the system's input. The face is then identified and cropped from the image to create a Region of Interest (ROI). This is followed by detection of the eyes from the ROI, which in turn serves as the input to the CNN algorithm for classification of the various states of sleepiness.
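A compact sketch of this capture-to-classification flow is given below, assuming dlib for face and landmark detection and a trained Keras model; the predictor file name, the largest-face heuristic, and the eye-crop handling are illustrative assumptions rather than the exact components of the proposed system.

# Illustrative flow: frame -> driver's face ROI -> eye crop -> CNN state label.
# The predictor file, the largest-face heuristic, and `model` are assumptions.
import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def classify_eye_state(frame_bgr, model):
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = detector(gray)
    if not faces:
        return None
    face = max(faces, key=lambda r: r.width() * r.height())  # assume driver = largest face
    shape = predictor(gray, face)
    # Right-eye landmarks (0-indexed points 36-41) -> bounding box -> crop.
    pts = np.array([(shape.part(i).x, shape.part(i).y) for i in range(36, 42)])
    x1, y1 = pts.min(axis=0)
    x2, y2 = pts.max(axis=0)
    h, w = model.input_shape[1:3]                             # match the CNN input size
    eye = cv2.resize(gray[y1:y2 + 1, x1:x2 + 1], (w, h)) / 255.0
    pred = model.predict(eye[np.newaxis, ..., np.newaxis])    # open vs. closed scores
    return int(np.argmax(pred))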

A. GRADIENT AND MAGNITUDE CALCULATION
Gradient and magnitude calculation is used to recognize the edges and orientation of a given face. Therefore, to calculate the texture-based gradient magnitude [54] and the orientations, we use Robinson's operator in four orientations: 0°, 45°, 90°, and 135°.
Each 3×3 neighborhood of the image is convolved with the masks in these four orientations to gather the gradient responses, which are calculated using equation (1) and designated as G_0, G_45, G_90, and G_135, respectively [53].
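A minimal sketch of this convolution step is given below; the mask values are the standard Robinson compass kernels, and combining the four responses by taking the maximum absolute response (and its direction) is a common convention assumed here only for illustration.

# Sketch: Robinson compass convolution in four orientations (0, 45, 90, 135 degrees).
# The kernel values are the standard Robinson masks; taking the maximum absolute
# response as the magnitude (and its direction as the orientation) is an assumed
# convention for illustration.
import numpy as np
from scipy.ndimage import convolve

ROBINSON = {
    0:   np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]),
    45:  np.array([[0, 1, 2], [-1, 0, 1], [-2, -1, 0]]),
    90:  np.array([[1, 2, 1], [0, 0, 0], [-1, -2, -1]]),
    135: np.array([[2, 1, 0], [1, 0, -1], [0, -1, -2]]),
}

def robinson_responses(gray):
    """Return G_0, G_45, G_90, G_135 plus a per-pixel magnitude and orientation."""
    responses = {a: convolve(gray.astype(float), k) for a, k in ROBINSON.items()}
    stack = np.stack([responses[a] for a in (0, 45, 90, 135)])
    magnitude = np.abs(stack).max(axis=0)
    orientation = np.array([0, 45, 90, 135])[np.abs(stack).argmax(axis=0)]
    return responses, magnitude, orientation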
Here, R_0, R_45, R_90, and R_135 denote the Robinson masks in the corresponding orientations. The gradient magnitude and the orientation are then calculated for each pixel from these responses using equations (2) and (3). With these values, we could find the driver's face position and, similar to [49], the orientation of the driver's face, as well as facial expressions like sadness, disgust, and excitement, which may divert the driver's concentration from driving.
Identifying prominent facial features is a fundamental process that helps analyze complex problems such as expression recognition, and various applications can then use the status of the specific features for further processing. Automated facial landmarking generally describes the process of locating the distinctive points needed to construct an appropriate model. This method uses the dlib 68-point model to locate the landmarks on the face and to compute the Eye Aspect Ratio (EAR), Mouth Aspect Ratio (MAR), and the newly proposed Face Aspect Ratio (FAR) parameters. These parameters are depicted in Figure 2. After finding the landmarks on the face, EAR, MAR, and FAR are calculated using equations (4), (5), and (6) to find the status of the eyes and mouth.
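A minimal sketch of computing EAR and MAR from the 68 landmark points follows; the formulations below are the commonly used aspect-ratio definitions (assumed, since equations (4)-(6) are not reproduced here), and the proposed FAR of equation (6) is built analogously from the vertical mid-face landmarks and is not reconstructed in this sketch.

# Sketch: EAR and MAR from the dlib 68-point landmarks.
# `shape` is assumed to be a sequence of 68 (x, y) points, 0-indexed.
import numpy as np

def dist(a, b):
    return np.linalg.norm(np.asarray(a) - np.asarray(b))

def eye_aspect_ratio(eye):
    # eye: the six (x, y) landmark points p1..p6 of one eye
    return (dist(eye[1], eye[5]) + dist(eye[2], eye[4])) / (2.0 * dist(eye[0], eye[3]))

def mouth_aspect_ratio(shape):
    # inner-lip landmarks 60-67 in the 0-indexed dlib numbering
    vertical = dist(shape[61], shape[67]) + dist(shape[62], shape[66]) + dist(shape[63], shape[65])
    return vertical / (3.0 * dist(shape[60], shape[64]))

def left_eye_ear(shape):
    return eye_aspect_ratio(shape[42:48])    # left-eye points 42-47

def right_eye_ear(shape):
    return eye_aspect_ratio(shape[36:42])    # right-eye points 36-41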

B. FINDING THE EYE, MOUTH, AND FACE STATUS:
EAR indicates whether the eyes are open or closed, while MAR indicates whether the person is yawning or in a normal state, based on whether the mouth is opened or not. The above expressions or actions are commonly exhibited by people in a sleepy state.
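A small illustrative sketch of how per-frame ratios can be turned into a status decision is shown below; the threshold values and the PERCLOS-style window rule are assumptions and do not reproduce the tuned parameters of Algorithm 1, which additionally uses FAR and hand-gesture cues.

# Sketch: threshold-based status from per-frame EAR/MAR values (PERCLOS-style).
# The thresholds (0.25, 0.6) and the 70%-of-window rule are illustrative assumptions.
def drowsiness_status(ear_values, mar_values, ear_thr=0.25, mar_thr=0.6):
    closed_ratio = sum(e < ear_thr for e in ear_values) / len(ear_values)  # PERCLOS
    yawning = any(m > mar_thr for m in mar_values)                         # open-mouth cue
    return "drowsy" if (closed_ratio > 0.7 or yawning) else "alert"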

FIGURE 3. An integrated driver drowsiness prediction framework with various factors
FAR is calculated using equation (6) from the three facial landmarks located vertically along the middle of the face, whose separation changes as the mouth opens. The gap between the points beside the mouth decreases while the mouth opens and the vertical gap increases, so the yawning status can be found even if the person instinctively covers the mouth with a hand or something else while yawning. The proposed prediction framework is presented in Figure 3, while the corresponding Algorithm 1 is presented below and makes use of the symbols listed in Table I. The following formulas are used for calculating the semantics of the face based on the given landmark positions:

Face Aspect Ratio (FAR)
Left Eye Image (LEI) from the given image: (x1, y1), (x2, y2) = (shape[43][0], shape[44][1]), (shape[46][0], shape[47][1])
Right Eye Image (REI) from the given image: (x1, y1), (x2, y2) = (shape[37][0], shape[38][1]), (shape[40][0], shape[41][1])
Similarly, the crop coordinates of the mouth and face images are computed. Detection of the hand, when it is found on the face in an image captured while yawning or nodding, is a prominent step in the present research work. A camera is used to capture the image, which is then processed further. A training algorithm is trained with the samples, and whenever hands are detected on the face, the region can be cropped and used along with the training data.
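A small sketch of cropping such regions from the landmark coordinates is given below; the bounding-box-with-margin approach, the margin value, and the 0-indexed dlib landmark ranges are illustrative assumptions rather than the exact coordinates listed above.

# Sketch: crop a facial region as the landmark bounding box plus a small margin.
# The margin and the 0-indexed landmark ranges are assumptions for illustration.
import numpy as np

def crop_region(image, shape, indices, margin=5):
    pts = np.asarray([shape[i] for i in indices])
    x1, y1 = pts.min(axis=0) - margin
    x2, y2 = pts.max(axis=0) + margin
    return image[max(0, y1):y2, max(0, x1):x2]

def left_eye_image(image, shape):
    return crop_region(image, shape, range(42, 48))

def right_eye_image(image, shape):
    return crop_region(image, shape, range(36, 42))

def mouth_image(image, shape):
    return crop_region(image, shape, range(48, 68))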

A. DATASETS:
1. NTHU-DDD: The National Tsing Hua University (NTHU) dataset consists of 22 subsets covering different ethnicities at various levels. It contains images capturing various scenarios while driving, such as yawning, blinking, dozing, and laughing, under various illuminations. Each scenario is taken from video recorded at 30 frames/sec [13] [52]. The videos also simulate various scenarios such as wearing glasses in the daytime or nighttime, sleepy or non-sleepy states, etc.

2. YawDD:
This dataset is constructed from videos recorded while driving in real time. The images are captured by cameras fixed either on the front mirror or on the dashboard. Images of people driving have been collected in 24-bit color (RGB) at a resolution of 640 × 480 and 30 frames/sec [51]. The images include people of all ages, with different facial features, ethnicities, etc. Mouth postures are captured under various illumination conditions while talking, singing, etc.

3. EMOCDS (Eye and Mouth Open Close Data Set):
The dataset comprises cropped eye and mouth images with open and closed status. The images were collected from Google, and the dataset contains around 12k images of various people.

Drowsiness Dataset)[56]:
It consists of 180 videos of 60 different participants. Each participant is recorded in three classes: alertness, low vigilance, and drowsiness.

B. CLASSIFICATION:
The model we used is built with Keras using a Convolutional Neural Network (CNN). A convolutional neural network is a special type of deep neural network that performs extremely well for image classification. A CNN basically consists of an input layer, an output layer, and hidden layers, which can be multiple in number. A convolution operation is performed on these layers using a filter that carries out 2D matrix multiplication between the layer and the filter. The CNN model architecture consists of the following layers:
1. Convolutional layer: 75 nodes, kernel size 3
2. MaxPooling layer: (5, 5)
3. Convolutional layer: 64 nodes, kernel size 3
4. MaxPooling layer: (5, 5)
5. Convolutional layer: 128 nodes, kernel size 3
6. MaxPooling layer: (5, 5)
7. Fully connected layer: 64 nodes
The final layer is also a fully connected layer with 2 nodes. A ReLU activation function is used in all layers except the output layer, in which Softmax is used.
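A minimal Keras sketch of the architecture listed above is given below; the 224 × 224 grayscale input size, the Adam optimizer, and the categorical cross-entropy loss are assumptions not specified in this section.

# Sketch of the described CNN in Keras; input size, optimizer, and loss are assumptions.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential([
    Conv2D(75, (3, 3), activation="relu", input_shape=(224, 224, 1)),
    MaxPooling2D(pool_size=(5, 5)),
    Conv2D(64, (3, 3), activation="relu"),
    MaxPooling2D(pool_size=(5, 5)),
    Conv2D(128, (3, 3), activation="relu"),
    MaxPooling2D(pool_size=(5, 5)),
    Flatten(),
    Dense(64, activation="relu"),                 # fully connected layer, 64 nodes
    Dense(2, activation="softmax"),               # output layer: 2 classes
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])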

C. PERFORMANCE ANALYSIS:
The performance of the proposed system can be analyzed using the following parameters for measuring classification accuracy:
True Positive (TP): Yawning / closed-eye status is correctly detected as yawning / closed eye.
True Negative (TN): Non-yawning / opened-eye status is correctly detected as non-yawning / opened eye.
False Positive (FP): Non-yawning / opened-eye status is incorrectly detected as yawning / closed eye.
False Negative (FN): Yawning / closed-eye status is incorrectly detected as non-yawning / opened eye.
The experiments were executed on the NTHU-DDD and YawDD datasets and on an additional dataset created by us. The dataset consists of 45,000 images of human beings collected from various sources such as Kaggle, Google Images, pixel.com, etc. The eye and mouth areas were cropped from the images and grouped into four categories to detect drowsiness based on the closed-eye and open-mouth status using the EAR, MAR, and FAR calculations. In addition, gradients and orientations were used to find the expression and orientation of the driver's face while driving. Hand gesture identification and glasses detection were done with appropriate algorithms such as the convex hull, etc. This depicts the driver's concentration on the task of driving. Classification was done with the CNN deep learning algorithm to check the status of the possible cases mentioned in Algorithm 1.
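For reference, a tiny sketch of deriving the usual scores from these counts follows; precision, recall, and F1 are included only as commonly derived measures and are an assumption beyond the accuracy reported here.

# Sketch: classification scores from the TP/TN/FP/FN counts defined above.
def scores(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                        # sensitivity / true-positive rate
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}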
Another dataset created by us, containing 2685 images, covers scenarios like eye open, eye closed, mouth open, mouth closed, and hands 'on' / 'not on' the mouth, eyes, or face, etc. This dataset, called EMOCDS (Eye and Mouth Open Close Data Set), is used for experimenting and calculating the accuracy of predicting the right state from the nine different cases mentioned in Figure 4. The corresponding results are listed in Table V.

VI. CONCLUSIONS
The major issue in the framework is extracting efficient features from the images cropped from the video sequence. The proposed work detects the driver's drowsiness based on various aspects such as closed eyes, an opened mouth, nodding with a hand, and putting a hand on the mouth while yawning. The methods EAR, MAR, and the proposed novel FAR were used for feature extraction. The orientation of the face was also identified, and gradient-based patterns were used to identify the various scenarios created by different states of the face parts and hands. Moreover, in addition to feature extraction, thresholds were defined based on the gestures generally exhibited. Finally, all the features were integrated to generate an efficient feature vector, and a CNN was adopted to classify the various scenarios that describe the drowsiness state. The proposed method has been validated on the proposed dataset EMOCDS (Eye and Mouth Open Close Data Set), a dataset covering all possible cases of sleepiness, and on the benchmark datasets NTHU-DDD and YawDD, to examine the accuracy and efficiency of the system. The proposed work has proved better compared to the state-of-the-art methods. However,