Suspicious Behavior Recognition Based on Face Features

Intelligent surveillance systems are widely used in critical areas such as airports, ATMs, and bank agencies to improve security and safety, and the need for intelligent behavior recognition keeps increasing. Traditional approaches based on access to restricted places or on suspect actions such as theft, scam, and loitering are insufficient to identify suspicious behavior, because these actions are not reliable indicators of a suspect. Novel trends attempt to extract behaviors from involuntary actions such as face gestures, face characteristics, and feeling features. This paper is motivated not only by the limits of the traditional approaches but also by the complexity of intelligent algorithms. In this context, the present paper uses face features to recognize the feeling of fear as a suspicious behavior. Indeed, several psychologists report that this feeling is the main characteristic of a person committing a crime. Fear is usually accompanied by an increase in heart rate. This paper describes the recognition of fear using a camera as a contactless sensor. Frequencies extracted from face-based video are used to estimate the heart rate through the fusion of three techniques: a bandpass filter, the Eulerian transformer, and the Lagrangian transformer. The proposed algorithm benefits from the advantages of each technique, but it is challenged by the real-time requirement. For this purpose, a Raspberry Pi 3 board running the Raspbian operating system is used to meet real-time criteria. The proposed system is trained on the CK+ dataset. The contributions of this paper aim to ensure not only a high recognition rate using a non-complex algorithm but also real-time computation. Results reveal that the proposed algorithm provides the best heart rate estimation compared with traditional methods. Hardware results show that the proposed design is successful in terms of resource requirements.


I. INTRODUCTION
Nowadays, safety and security have become an important objective for humanity. The history of video surveillance systems starts in 1960 with analog CCTV. In this first phase, surveillance systems served only for storage and did not provide intelligent processing. The second phase started in 1980 with digital CCTV, which allowed researchers to begin exploiting automation. The third phase started in 2000 and is marked by the evolution of the digital camera, which enabled the semi-automation of video surveillance systems [1].
The associate editor coordinating the review of this manuscript and approving it for publication was Dian Tjondronegoro.
Automated surveillance needs to be intelligent: systems should be able to identify threats and dangerous states from video streams. Emerging surveillance systems based on video processing identify suspicious behavior when a person accesses restricted places or commits actions such as theft, scam, and loitering [2]-[4]. The position of the person under surveillance is tracked over successive frames (at different points in time) and is exploited to identify human behavior such as trajectories, gestures, and events. In surveillance, tracking must be accurate, which is not the case in practice.
Detecting abnormal behavior through tracking results in a poor recognition rate [5] and is not really related to suspect actions. Existing surveillance systems suffer from the following shortcomings: (1) manual/visual detection of suspicious behavior is untrustworthy, (2) systems record only what has already happened and cannot anticipate an accident or a crime, (3) systems are tailored to specific cases, (4) systems do not operate in real time, and (5) systems violate the privacy of citizens.
This section focuses mainly on techniques for recognizing involuntary suspicious behavior, including face gestures and the feeling of fear. Working on feelings related to suspects, such as fear, represents a great challenge. Psychologists have shown that offenders feel fear while committing a crime [6], and this fear is accompanied by an increase in heart rate.
The psychologists in [7]-[9] show that morphological features such as high dominance and low trustworthiness signal high criminality. These features can be extracted from facial expressions [10], [11].
Other works attempt to recognize suspicious behavior through the feeling of fear using face video streams. These studies can be classified into two main groups: facial expression methods and heart rate estimation methods.
Methods in the first group [5] attempt to identify feelings among basic emotions, such as anger, disgust, fear, joy, sorrow, and surprise, from facial expressions. The authors in [5] use a Support Vector Machine (SVM) classifier on eye movements extracted from facial expressions and evaluate it on the CK+ and MUG datasets. Fear is often misclassified: only 42% of fear expressions are correctly recognized.
The authors in [12] propose the Local Binary Patterns on Three Orthogonal Planes (LBP-TOP) method to identify fear from facial expressions. Its true recognition rate (about 79%) is better than that of the previous work, but its response time is longer.
In [13], the authors identify the basic emotions from facial expressions according to the mouth status, using an SVM classifier. The accuracy (about 71%) is lower than that of the LBP-TOP method, and the response time (about 1 minute) does not meet the real-time requirement.
The authors in [14] use High-order Joint Derivative Local Binary Pattern (HJDLBP), Local Binary Pattern (LBP) histogram, and SVM algorithms to recognize facial expressions. For the feeling of fear, the accuracy is about 69.3%.
To sum up, different methods based on facial expressions have been proposed in the literature to recognize fear. They share three main shortcomings: (1) the accuracy still needs improvement (the highest rate so far is about 79%); (2) the computation time needs to be reduced; (3) all proposed algorithms identify only the visible signs of fear.
Methods in the second group try to resolve the abovementioned limits using subliminal features such as the heart rate. Scientists report that fear is accompanied by an increase in heart rate. Since our target is surveillance, we focus only on contact-free systems based on face video processing. The authors in [15]-[19] use a frequency analysis of face-based video. The Region Of Interest (ROI) of the face is transformed from the spatial domain to the frequency domain using the Fourier transform. Different functions are then applied to estimate the heart rate: (1) a peak detection algorithm is applied to the frequency signal of the frames of the video scene and indicates frequency variations across a set of frames; (2) a power spectrum algorithm is applied to the frequency signal computed by the Fourier transform; (3) Eulerian transforms reveal subtle changes in the ROI; (4) Lagrangian transforms support unrestrained changes. The authors in [20] apply a band-pass filter to extract peaks or zero-crossings. All the above methods achieve an acceptable accuracy rate, and their computation time is small since they require no training dataset. In [21], the authors use intelligent algorithms such as Markovian models and machine learning techniques. These methods achieve the best accuracy among previous works, but computation time is their main shortcoming because of the training required on datasets. Even with advanced hardware, such as a higher-resolution camera and a specific processor architecture, these solutions still suffer from long delays due to heavy computation.
In this paper, we propose a new method to recognize the feeling of fear based on heart rate estimation. The proposed technique fuses several techniques to achieve the best recognition rate, and a specific design is used to compute in real time.
Author contributions are as follows: conceptualization, Mossaad Ben Ayed and Sabeur Elkosantini; methodology, Mossaad Ben Ayed and Mohamed Abid; validation, Mossaad Ben Ayed and Shaya Abdullah Alshaya; investigation, Mossaad Ben Ayed and Sabeur Elkosantini; writing (original draft preparation), Mossaad Ben Ayed; project administration, Mossaad Ben Ayed. All authors discussed the results and contributed to the final manuscript.
The rest of the paper is organized as follows. Section II proposes a new method to identify the feeling of fear based on heart rate estimation. Section III validates the method in software and on hardware, including the hardware design used to ensure real-time processing, and evaluates fear recognition. Section IV concludes the paper.
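The peak detection and power spectrum estimators listed above can be illustrated with a minimal sketch. It is not the code of [15]-[19]: it assumes the ROI has already been reduced to one mean intensity value per frame, and the function names and the 50-150 BPM search band are our own choices.

```python
import numpy as np


def mean_roi_signal(frames):
    """Reduce each ROI frame to its mean intensity, giving a 1-D temporal signal."""
    return np.array([np.mean(frame) for frame in frames], dtype=float)


def hr_power_spectrum(signal, fps, lo_bpm=50.0, hi_bpm=150.0):
    """Heart rate (BPM) at the strongest power-spectrum bin inside the band."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()                                  # drop the DC component
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fps)      # bin frequencies in Hz
    band = (freqs >= lo_bpm / 60.0) & (freqs <= hi_bpm / 60.0)
    if not band.any():
        raise ValueError("signal too short to resolve the heart rate band")
    return 60.0 * freqs[band][np.argmax(power[band])]


def hr_peak_count(signal, fps):
    """Heart rate (BPM) from the mean interval between positive local maxima."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()
    mid = x[1:-1]
    is_peak = (mid > x[:-2]) & (mid >= x[2:]) & (mid > 0)
    peaks = np.flatnonzero(is_peak) + 1               # indices into x
    if peaks.size < 2:
        return float("nan")
    return 60.0 * fps / np.mean(np.diff(peaks))       # samples per beat -> BPM
```

On a synthetic 10 s, 30 fps sine at 1.2 Hz, both estimators return about 72 BPM. With 5 s windows, the raw FFT bin spacing is 1/5 s = 0.2 Hz, i.e. 12 BPM, so a plain spectral peak is coarse on short windows.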

II. THE PROPOSED FEAR FEELING METHOD FOR SUSPICIOUS BEHAVIOR
The proposed method aims to estimate the heart rate in real time and with higher accuracy. It is broken down into three phases. Its main objective is to extract the feeling of fear from the face by aggregating a bandpass filter, the Eulerian transformer, and the Lagrangian transformer into a non-complex algorithm. This key innovation should achieve higher accuracy than related works based on complex computation while keeping a reasonable response time on a Raspberry Pi 3 board.
The first phase, called pre-treatment, localizes the ROI: the region where the skin is most clearly visible and movement is most limited. Previous works have shown that the forehead is the best ROI. To this end, the proposed technique detects the face in each frame with the Haar cascade classifier provided by the OpenCV library. This detector is sufficient because, in our case, only one face is exposed to the camera; interference is therefore limited, and the true face detection rate is nearly 100%. The forehead region is then located using the eye position.
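The pre-treatment can be sketched with OpenCV as follows. This is an illustration under stated assumptions, not the authors' code: it loads the frontal-face cascade file bundled with the `opencv-python` package, uses detection parameters (`scaleFactor=1.1`, `minNeighbors=5`) chosen by us, keeps the largest detection since only one face is expected in front of the camera, and approximates the forehead with the (1,2) cell of the 3 × 3 face grid described in the third phase, read as a 1-indexed (row, column) pair, i.e. top row, middle column. All function names are hypothetical.

```python
from functools import lru_cache


def grid_cell(box, grid=3, cell=(1, 2)):
    """Return (x, y, w, h) of one cell of a grid x grid split of a face box.

    `cell` is a 1-indexed (row, column) pair; (1, 2) is the top-middle cell.
    """
    x, y, w, h = (int(v) for v in box)
    row, col = cell
    cell_w, cell_h = w // grid, h // grid
    return (x + (col - 1) * cell_w, y + (row - 1) * cell_h, cell_w, cell_h)


@lru_cache(maxsize=1)
def _face_cascade():
    """Load OpenCV's bundled frontal-face Haar cascade once and reuse it."""
    import cv2  # imported lazily so grid_cell() works without OpenCV installed

    return cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")


def detect_largest_face(gray):
    """Return the largest detected face as (x, y, w, h), or None."""
    faces = _face_cascade().detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    boxes = [tuple(int(v) for v in f) for f in faces]
    return max(boxes, key=lambda b: b[2] * b[3])      # largest area


def forehead_roi(gray):
    """Crop the forehead cell from a grayscale frame, or return None if no face."""
    box = detect_largest_face(gray)
    if box is None:
        return None
    x, y, w, h = grid_cell(box)
    return gray[y:y + h, x:x + w]
```

Frames read with `cv2.VideoCapture` are BGR, so they would first be converted with `cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)`. Caching the classifier avoids reloading the cascade file for every frame.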
The second phase estimates the heart rate. The ROI is transformed from the spatial domain to the frequency domain using the Fast Fourier Transform (FFT), see Equation 1.
where u and v are spatial frequencies.
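Equation 1 is missing from this version of the text. Assuming it denotes the standard two-dimensional discrete Fourier transform of an M × N ROI with intensities f(x, y) (M, N, and f are our notation), it would read as follows; the FFT computes the same quantity efficiently.

```latex
F(u, v) = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y)\, e^{-j 2\pi \left( \frac{u x}{M} + \frac{v y}{N} \right)},
\qquad u = 0, \dots, M-1, \quad v = 0, \dots, N-1
```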
Then, we apply the bandpass filter defined by Equation 2. This filter removes the unwanted frequencies above 150 BPM (2.5 Hz) and below 50 BPM (about 0.83 Hz), keeping only the useful signal Fbf. In Equation 2, w is the bandwidth of the filter.
The Eulerian transformer, defined by Equation 3, analyzes the signal to determine its common behavior. This step selects the useful signal.
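Equations 2 and 3 are not reproduced in this version of the text, so the following is only a minimal sketch of the two steps under our own assumptions: the bandpass is an ideal filter applied in the FFT domain along the time axis, keeping 50-150 BPM (about 0.83-2.5 Hz), and the Eulerian step follows the general idea of Eulerian video magnification, temporally filtering every pixel at a fixed position and adding the amplified band back. The spatial pyramid decomposition of full Eulerian magnification is omitted, and the amplification factor `alpha` is an arbitrary illustrative value.

```python
import numpy as np


def ideal_bandpass(x, fps, lo_hz, hi_hz, axis=0):
    """Zero every FFT bin outside [lo_hz, hi_hz] along the time axis."""
    x = np.asarray(x, dtype=float)
    n = x.shape[axis]
    spectrum = np.fft.rfft(x, axis=axis)
    freqs = np.fft.rfftfreq(n, d=1.0 / fps)
    keep = (freqs >= lo_hz) & (freqs <= hi_hz)
    shape = [1] * x.ndim
    shape[axis] = keep.size                       # broadcast the mask over pixels
    spectrum = spectrum * keep.reshape(shape)
    return np.fft.irfft(spectrum, n=n, axis=axis)


def eulerian_amplify(frames, fps, lo_bpm=50.0, hi_bpm=150.0, alpha=50.0):
    """frames: (T, H, W) ROI stack. Returns (magnified frames, band-passed variation)."""
    frames = np.asarray(frames, dtype=float)
    band = ideal_bandpass(frames, fps, lo_bpm / 60.0, hi_bpm / 60.0, axis=0)
    return frames + alpha * band, band
```

Applied to a pixel whose intensity mixes a 0.2 Hz drift, a 1.2 Hz pulse, and a 5 Hz flicker over 10 s at 30 fps, the filter returns only the pulse. An ideal filter keeps the sketch short; a smoother (e.g. Butterworth) response would reduce ringing on real data.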
The Lagrangian transformer is then applied to the signal obtained from the Eulerian transformer to accurately compute the frequency of the signal, which corresponds to the heart rate, as shown in Equation 4.
The third phase computes the variation of the heart rate estimated over successive video sequences of 5 s each. If the variation of the heart rate (VHR) exceeds 30 beats per minute (BPM), suspicious behavior based on fear is expected. This VHR threshold is computed from the fear samples of the publicly available CK+ dataset [23]. As shown in Figure 1, the system detects the face and then localizes the forehead, which represents the ROI. After detecting the face with the standard function provided by the OpenCV library, the algorithm divides the face image into a 3 × 3 grid; the ROI corresponds to the cell at position (1,2). The proposed algorithm is then applied to this region to compute the estimated heart rate.
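Equation 4 is not reproduced here either. In motion analysis, a Lagrangian approach follows individual points as they move, rather than observing fixed pixel positions as the Eulerian approach does. A minimal sketch under that reading, assuming the vertical trajectories of several facial feature points are already available (for example from OpenCV's `cv2.calcOpticalFlowPyrLK`, not shown): the trajectories are centred, decomposed with PCA via an SVD, and the component whose spectrum is most concentrated at a single in-band frequency gives the heart rate. The periodicity score, the `max_components` limit, and the function name are our assumptions, not the paper's Equation 4.

```python
import numpy as np


def lagrangian_heart_rate(trajectories, fps, lo_bpm=50.0, hi_bpm=150.0,
                          max_components=5):
    """trajectories: (P, T) vertical positions of P tracked points over T frames.

    Returns the heart rate in BPM, or NaN if no usable motion is found.
    """
    y = np.asarray(trajectories, dtype=float)
    y = y - y.mean(axis=1, keepdims=True)             # remove each point's rest position
    _, sing, vt = np.linalg.svd(y, full_matrices=False)  # rows of vt: temporal components
    freqs = np.fft.rfftfreq(y.shape[1], d=1.0 / fps)
    band = (freqs >= lo_bpm / 60.0) & (freqs <= hi_bpm / 60.0)
    if not band.any():
        return float("nan")

    best_bpm, best_score = float("nan"), -1.0
    for s, component in zip(sing[:max_components], vt[:max_components]):
        if s <= 1e-9 * sing[0]:                       # skip numerically empty components
            break
        power = np.abs(np.fft.rfft(component)) ** 2
        k = int(np.argmax(np.where(band, power, -np.inf)))  # strongest in-band bin
        score = power[k] / power[1:].sum()            # share of non-DC power at that bin
        if score > best_score:
            best_bpm, best_score = 60.0 * freqs[k], score
    return best_bpm
```

PCA lets the points that move with the pulse reinforce one another, while slower head sway tends to be separated into other components or falls outside the band.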
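The decision rule of the third phase can be sketched as follows. We assume non-overlapping 5 s windows and read "variation" as the absolute difference between the heart rates of two consecutive windows; the paper states neither choice explicitly. The `estimate_bpm` argument stands for any per-window heart rate estimator, and all names are hypothetical.

```python
def split_windows(samples, fps, window_s=5.0):
    """Split a per-frame signal into consecutive, non-overlapping windows."""
    n = int(round(window_s * fps))
    return [samples[i:i + n] for i in range(0, len(samples) - n + 1, n)]


def heart_rate_variations(hr_per_window):
    """Absolute BPM change between each pair of consecutive windows (the VHR)."""
    return [abs(b - a) for a, b in zip(hr_per_window, hr_per_window[1:])]


def fear_suspected(hr_per_window, threshold_bpm=30.0):
    """Flag fear-based suspicious behavior if any VHR exceeds the threshold."""
    return any(v > threshold_bpm for v in heart_rate_variations(hr_per_window))


def analyse(samples, fps, estimate_bpm, window_s=5.0, threshold_bpm=30.0):
    """Estimate the heart rate per window, then apply the VHR rule."""
    rates = [estimate_bpm(w, fps) for w in split_windows(samples, fps, window_s)]
    return rates, fear_suspected(rates, threshold_bpm)
```

For a 30 s recording, this yields six 5 s windows and five VHR values; for example, heart rates of 72, 80, and 115 BPM give variations of 8 and 35 BPM, so the sequence is flagged.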

III. RESULTS
This section describes the validation of the proposed method. The first subsection discusses the simulation results, the second details the hardware implementation, and the third evaluates fear recognition. The investigations were carried out following the rules of the Declaration of Helsinki of 1975 and were approved by the Committee for the Ethics of Scientific Research at Majmaah University.

A. SIMULATION RESULTS
The proposed methods are tested on an Intel i5-5200 CPU at 2.2 GHz using Python 2.7 with the OpenCV library. Participants were seated in front of an HP webcam at a distance of about 30 cm. This setting suits surveillance at ATMs but is insufficient for other applications, such as surveillance in bank agencies or airport areas. Each recorded video is 30 s long and is divided into 5 s sub-sequences, and the proposed method is applied to each sub-sequence. The obtained heart rate estimates are compared with the Samsung Health application, which measures the heart rate in BPM when a finger is placed on the device's sensor.
We compare the proposed method, which aggregates the bandpass filter, Eulerian transformer, and Lagrangian transformer, with each method alone, on the same hardware board (Raspberry Pi 3) and under the same requirements.
The comparison uses two metrics adopted in previous works. The first is the mean error, i.e., the difference between the heart rate measured by the Samsung Health application and the heart rate estimated by our method. The second is the error variance, i.e., the squared standard deviation of the error. Table 1 presents the obtained results.
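The two metrics can be computed as follows. The text does not say whether the error is signed or absolute, nor whether the variance is the population or the sample variance; this sketch assumes absolute errors and the population variance. The function name is hypothetical.

```python
from statistics import fmean, pvariance


def error_metrics(reference_bpm, estimated_bpm):
    """Mean and population variance of the absolute heart rate error (BPM, BPM^2)."""
    if len(reference_bpm) != len(estimated_bpm):
        raise ValueError("reference and estimate lists must have the same length")
    errors = [abs(float(r) - float(e)) for r, e in zip(reference_bpm, estimated_bpm)]
    return fmean(errors), pvariance(errors)
```

For reference readings of 70, 80, and 90 BPM estimated as 72, 79, and 93 BPM, it returns a mean error of 2.0 BPM and a variance of about 0.67 BPM².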
The results clearly show that the proposed method outperforms the methods based on the bandpass filter, the Eulerian transformer, or the Lagrangian transformer alone.
The curves in Figure 2 show that the heart rate estimation error changes with the heart rate range. When the heart beats faster, the error decreases and the spectral estimation is more accurate. This result encourages improving the pre-treatment phase to obtain a convergent error.
The errors of the proposed method also vary with the heart rate range, see Figure 4. According to the statistical analysis, the mean errors decrease by 10% on average when moving from the 50-70 BPM range to the 130-150 BPM range, and the variance errors decrease by 11% on average. These results show that the heart rate estimation algorithm is more accurate for the highest range. Improving the algorithm for the lower heart rate range could be the subject of future work.
In conclusion, the results reveal that the proposed method is accurate. It should next be implemented in real situations to assess its real-time performance and hardware reliability.

B. HEART RATE RESULTS
The implementation requires specific hardware to run the heart rate estimation with high accuracy. A Raspberry Pi 3 board equipped with a camera is used. The Pi 3 is built around a quad-core ARM Cortex-A53 processor running at 1.2 GHz, which the installed 32-bit Raspbian GNU/Linux 9 operating system runs in 32-bit mode. The main advantage of this board is its ability to run code written in high-level languages such as C++, Java, and Python. In our case, the proposed algorithm described in Section II is written in Python 2.7, and the OpenCV library provides high-level APIs for the video processing on which the heart rate estimation relies.
The code is run on the Raspberry Pi board. Memory usage, CPU load average, GPU temperature, and CPU temperature are the key indicators used to evaluate system performance, and the system status is recorded during 1 minute of execution. Memory usage identifies the required memory resources: the average memory used is about 29 MB and the maximum is about 31 MB, so the board provides enough memory. Nevertheless, the system should be installed on a 32 GB SD card, because the Raspbian operating system and the OpenCV library require considerable storage space. Figures 5, 6, and 7 show the results related to these performance factors.
Figure 5 shows the temperature variations of the GPU and CPU. The average temperature reaches 75 °C for the GPU and about 86 °C for the CPU. The results show that the CPU does not require any special cooling mechanism when the ambient temperature around the board is 27 °C to 30 °C. The temperature would increase with a higher-resolution camera because of the additional computation.
Figure 6 shows the variation of the CPU load average, i.e., the average number of processes running or waiting to run on the CPU over a period of time. This value gives an idea of the execution delay and the required buffer size. The curve shows a 1-minute load average of about 1.6, which indicates that the system can meet real-time requirements with an acceptable delay.
Figure 7 shows the CPU usage during a 60 s stream, i.e., the rate of occupation of the CPU while executing processes. The average CPU usage is about 85%. This result means that the hardware architecture of the Raspberry Pi 3 is sufficient for the proposed suspicious behavior recognition.
The heart rate estimation system based on face video thus shows good estimation reliability and hardware performance.
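The status figures above can be collected on Raspbian with a short script such as the following. It is a sketch, not the authors' tooling: it assumes the SoC temperature is exposed in `/sys/class/thermal/thermal_zone0/temp` (in millidegrees Celsius) and that the Raspberry Pi `vcgencmd measure_temp` command is installed, which is how Pi monitoring scripts commonly report the "CPU" and "GPU" temperatures. Memory is read from the `VmRSS` line of `/proc/<pid>/status` for the process ID of the running estimator. All names are hypothetical.

```python
import os
import re
import subprocess
import time

THERMAL_ZONE = "/sys/class/thermal/thermal_zone0/temp"


def parse_millidegrees(text):
    """Thermal-zone files hold the temperature in millidegrees Celsius."""
    return int(text.strip()) / 1000.0


def parse_vcgencmd_temp(text):
    """Parse output such as "temp=48.3'C" from `vcgencmd measure_temp`."""
    match = re.search(r"temp=(-?\d+(?:\.\d+)?)", text)
    if match is None:
        raise ValueError(f"unexpected vcgencmd output: {text!r}")
    return float(match.group(1))


def parse_vmrss_mb(status_text):
    """Resident memory in MB from the 'VmRSS:' line of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1]) / 1024.0      # /proc reports kB (1024 bytes)
    raise ValueError("no VmRSS line found")


def sample_status(pid):
    """One snapshot of temperatures, process memory, and 1-minute load average."""
    with open(THERMAL_ZONE) as f:
        cpu_c = parse_millidegrees(f.read())
    out = subprocess.run(["vcgencmd", "measure_temp"], capture_output=True,
                         text=True, check=True).stdout
    with open(f"/proc/{pid}/status") as f:
        rss_mb = parse_vmrss_mb(f.read())
    return {"cpu_c": cpu_c, "gpu_c": parse_vcgencmd_temp(out),
            "rss_mb": rss_mb, "load_1min": os.getloadavg()[0]}


def monitor(pid, duration_s=60.0, period_s=1.0):
    """Sample the process and board status periodically for duration_s seconds."""
    samples = []
    end = time.monotonic() + duration_s
    while time.monotonic() < end:
        samples.append(sample_status(pid))
        time.sleep(period_s)
    return samples
```

Calling `monitor(pid)` gives roughly one sample per second over one minute, from which averages and maxima such as those reported above can be computed.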

C. FEAR FEELING RESULTS
This section verifies the true recognition rate of the feeling of fear obtained with the proposed technique. To evaluate the proposed heart rate estimation for fear recognition, the system is trained on the CK+ dataset and tested on the video data set presented above. Table 2 compares the proposed system with other recognition systems based on facial expressions. The comparison is done on the CK+ dataset according to two criteria: (1) the True Recognition Rate (TRR) and (2) real-time operation. In our case, a system is considered real-time if its response time is within a few seconds [24]. Suk and Prabhakaran [12] apply a Support Vector Machine (SVM) classifier to extract feelings and emotions from facial expressions. Their method is implemented on a mobile device (Samsung Galaxy S3); its TRR reaches 57.1% for fear, and real-time requirements are met.
Zhao and Pietikäinen [13] use the Volume Local Binary Pattern on Three Orthogonal Planes (VLBP-TOP) to recognize fear. They obtain better results than Suk and Prabhakaran, with a TRR of 79.2%.
Swapna et al. [5] combine Local Binary Patterns (LBP) and a Support Vector Machine (SVM). Their method meets real-time requirements, but its TRR for fear still needs improvement (57.1%).
Sajjad et al. [25] propose several improvements to SVM-based recognition to increase the recognition rate. Some emotions, such as happiness and sadness, are well recognized, but fear retains the lowest TRR (53%). In addition, the method requires considerable computation time.
Xie and Hu [26] perform recognition with deep learning, using a Convolutional Neural Network (CNN). This method processes still images rather than video as in our case, but the comparison focuses on the recognition rate, which is independent of the input type. The authors significantly improve the TRR, reaching 88.89%. Unfortunately, their method does not meet the real-time requirement.
Our proposed method, as described throughout this paper, recognizes fear from heart rate estimation via face features instead of facial expressions. Its TRR reaches 85.6%, close to the best rate, obtained by Xie and Hu, while our method outperforms theirs in computation time.
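For clarity, the two comparison criteria can be written as small helpers. We assume the TRR for a class is the percentage of samples of that class that the system labels correctly (the per-class recall), and we use a hypothetical 5 s budget as a stand-in for "a few seconds"; neither definition is given explicitly in the text, and the function names are hypothetical.

```python
import time


def true_recognition_rate(true_labels, predicted_labels, target="fear"):
    """Percentage of `target` samples that were predicted as `target`."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    hits = [p == target for t, p in zip(true_labels, predicted_labels) if t == target]
    if not hits:
        raise ValueError(f"no samples of class {target!r}")
    return 100.0 * sum(hits) / len(hits)


def check_real_time(fn, *args, budget_s=5.0):
    """Run fn(*args) once; return (result, elapsed seconds, within budget?)."""
    start = time.perf_counter()
    result = fn(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed, elapsed <= budget_s
```

If three of four samples are fear and two of those are labelled fear, the TRR for fear is about 66.7%.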

IV. CONCLUSION
This paper described a heart rate estimation method based on face video. Related works emphasize the importance of such methods in many fields, especially in surveillance systems for detecting fear. The proposed method, which aggregates three known methods, achieves the best results: the obtained mean and variance errors show that it is more accurate than each of the methods it combines.
The hardware implementation on a Raspberry Pi board was evaluated in terms of memory requirements, CPU and GPU temperatures, and CPU load average. The results confirm that the proposed system achieves good heart rate estimation results and that the Pi 3 board is able to ensure real-time operation.
For fear recognition, the heart-rate-based method achieves a TRR of about 85% while respecting real-time constraints.
Although the proposed method achieves pertinent results, as future work we propose combining it with facial expression recognition. This combination is expected to improve the true recognition rate, but it will be challenged by the real-time requirement.
SHAYA ABDULLAH ALSHAYA received the bachelor's and master's degrees in Saudi Arabia, the USA, and Italy. He has put his knowledge into practice as an international entrepreneur, with endeavors in education, research, and the ICT sector. He is currently an Assistant Professor with the College of Science and Humanities at alGhat, Majmaah University, Riyadh, Saudi Arabia. He is also a member of the National Association of Industrial Technology, the Institute of Electrical Engineers, and the Sloan-C Online Association, among others. He has extensive experience as the Head of IT at Majmaah University. His experience as a researcher, consultant, project manager, and information technologist has allowed him to bring his expertise in technology planning, training, and implementation to all of his projects. The numerous national and international companies that use his expertise as a consultant evidence Dr. Alshaya's success. His extensive experience and vast knowledge are integral to his vision and future projects.
MOHAMED ABID received the Ph.D. degree from the National Institute of Applied Sciences, Toulouse, France, in 1989, and the "thèse d'état" degree from the National School of Engineering of Tunis, Tunisia, in 2000, in the area of Computer Engineering and Microelectronics.
He is currently the Head of the Computer Embedded System Laboratory (CES-ENIS), Tunisia. He is also a Professor with the Engineering National School of Sfax (ENIS), University of Sfax, Tunisia. His current research interests include hardware-software co-design, systems on chip, reconfigurable systems, embedded systems, and biometrics. He has also been investigating the design and implementation issues of FPGA embedded systems. Actually Dr.