2D Facial Expression and Movement of Motion for Pain Identification With Deep Learning Methods

As pain is an inevitable part of life, this study examines the use of facial expression technology in assisting individuals with pain. The self-reporting system commonly used to detect discomfort is ineffective and cannot be utilized by patients of all ages; a standardized formula for measuring pain would resolve this issue. Facial monitoring technology is an important tool for measuring pain because it is both easy to use and incredibly precise. Accordingly, this article uses deep learning techniques to examine the use of 2D facial expressions and motion to sense pain. Sequential pictures from the University of Northern British Columbia (UNBC) dataset were used to train a deep learning model, as deep learning can detect motion and assist patients in self-reporting. Our mechanism is capable of classifying pain into three categories: not painful, becoming painful, and painful. The system’s performance was evaluated by comparing its findings to those of a specialist physician. The precision rates of the not painful, becoming painful, and painful classifications were 99.75 percent, 92.93 percent, and 95.15 percent, respectively. In sum, our study has developed an alternative way to test for pain prior to hospitalization that is straightforward, cost effective, and easily understood by both the general population and healthcare professionals. Additionally, this analysis technique could be applied to other screening methods, such as pain detection for infectious diseases.


I. INTRODUCTION
Pain is something that everyone experiences, sometimes at excruciating levels, and diagnosing it typically requires a paid visit to a doctor. Traditionally, patients and healthcare workers measure levels of discomfort using the self-rating approach [1], in which the patient describes the degree of pain. However, this method has drawbacks, primarily inaccuracy: patients' and medical staff's perceptions of pain can vary widely, especially in the case of children [2]. After the self-report approach was introduced, the observer rating scale (ORS) was developed to help individuals improve the accuracy of their self-reporting [3]. In certain instances, however, pain assessment according to the ORS requires the skills of a specialist. Such cases frequently occur in intensive care units (ICUs) and when doctors and nurses are unable to monitor patients for an extended period of time. Both self-reporting and the ORS are applicable in many cases, but people can experience pain from a wide range of causes, from stomach cramps and shoulder injuries to sore throats. Simply describing a patient's condition is therefore inadequate for accurate treatment, despite the additional assessments of pain that may be available. A wide variety of techniques can now be used to assess pain, including neurologic diagnostic and imaging technologies, but such procedures are expensive for patients. Accordingly, Vijayanandh and Balakrishnan [4] proposed facial recognition as a potential solution, specifically using the Facial Action Coding System (FACS), which identifies facial expressions. FACS uses 44 action units (AUs) to describe muscle activity in the face.
The associate editor coordinating the review of this manuscript and approving it for publication was Anandakumar Haldorai.
VOLUME 9, 2021. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
A significant amount of research has used this approach to discern human emotional states, such as sadness, anger, neutrality, and calm [6]-[8]. The Prkachin and Solomon Pain Intensity (PSPI) metric was proposed by Lucey et al. [9] and Prkachin and Solomon [10], who were the first to bring FACS into the medical profession. PSPI quantifies pain using FACS: pain-related facial muscle movements are coded using the corresponding AUs. Each AU is scored on a scale of 0 to 5, except for AU43, which takes one of two values (0 or 1). The facial expression data in this study were extracted from the UNBC dataset, a widely used pain recognition dataset. However, pain levels are unevenly represented across the UNBC dataset.
Generative adversarial networks (GANs), first proposed by Goodfellow et al. [11], are one of the most important developments in deep learning. Many researchers have used GANs for a variety of tasks, such as generating iris images or Modified National Institute of Standards and Technology (MNIST) digits. A GAN is composed of two deep neural networks: a generator that creates synthetic images, and a discriminator that attempts to tell each generated image apart from a genuine one. Training is repeated until the discriminator is unable to differentiate between the genuine and generated images. As a result, a GAN can help address class imbalance and improve accuracy. Pikulkaew et al. [12] applied the Wasserstein generative adversarial network (WGAN) to human faces and found that it may improve the efficiency of pain detection, which is particularly useful given the limited data in the relevant datasets [13].
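The adversarial game between the two networks is commonly written as a minimax objective. The formulation below is the standard one from the GAN literature rather than anything specific to this study; the symbols D (discriminator), G (generator), p_data (distribution of genuine images), and p_z (noise prior) are notation assumed here for illustration.

```latex
\min_{G}\,\max_{D}\; V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_{z}(z)}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

The discriminator increases V by scoring genuine images near 1 and generated images near 0, while the generator tries to decrease it. When the discriminator can no longer separate the two, its output approaches 1/2 for every input.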
This research aims to offer a system for 2D facial pain detection and classification that utilizes both deep learning techniques and motion, specifically a motion technique that maintains a high degree of orientation when individuals shift from side to side. This comes from the system's ability to predict the axis of the decoded picture based on the coordinates from a reference image. Our primary contributions are as follows:
• A strategy for detecting pain that relies on deep learning for classification and recognition. This system is more capable of reaching state-of-the-art performance than other classification methods, such as support vector machines (SVMs).
• Classification of pain into three categories: not painful, becoming painful, and painful. We implemented a three-level scale because we wanted to create a method that can be used as a standard tool for pain measurement in daily life. In the past, certain researchers could only distinguish between painless and painful states, whereas the proposed method's evaluation compares the experimental results with ground truth (doctor analysis). The results of the investigation matched the medical personnel's ground truth data.
We believe that the techniques presented here offer significant contributions to the fields of computer vision and image processing, among others, especially in terms of pain-sensing methods. These techniques can also be used, with appropriate modifications, to address other detection issues. For example, a system that helps people in everyday life could include primary pain screenings for issues such as abdominal pain, shoulder pain, or COVID-19.
The rest of this article is structured as follows: the relevant works are reviewed in Part II; our materials and methods are explained in Part III; the studies and findings are given in Part IV and discussed in Part V; and, finally, our accomplishments and subsequent works are summarized in Part VI.

II. RELATED WORK
The FACS, proposed by Ekman et al. [15], is a standard facial movement estimation tool used by researchers in many fields for facial expression identification. FACS defines facial expressions by encoding facial muscle movements as 44 AUs. While it appears able both to distinguish pain and to identify its intensity, some researchers [8] have observed a correlation between individual emotional states, such as sadness and surprise, making it harder to use the AUs to accurately detect pain. In-depth studies of the emotion-detection process can be found in the work of Zeng et al. [16] and, most recently, Cohn and Torre [17]. These studies are generally tied to three datasets: (1) the classification of pain expressions (COPE) dataset [6] of infants experiencing pain; (2) the Biovid heat pain dataset [18], which archives pain instances; and (3) the UNBC pain dataset [9] of adult patients with shoulder pain.
Gholami et al. [7] used a relevance vector machine (RVM) for binary pain detection on manually selected images of children's faces, and Guo and Zhang [19] suggested a local binary pattern (LBP) and an extension of it to enhance facial imaging and accuracy. During experiments on the Biovid dataset, Werner et al. [18] obtained information from various sources and then used a head pose estimator to assist in pain recognition. They showed that specific individuals had different pain thresholds. Littlewort et al. [20] discussed the perception of facial expressions in those who are experiencing pain by using a previously established AU detector implemented with Gabor filters, AdaBoost, and SVMs. Their analysis focused on AUs and was developed using the UNBC pain database. Lucey et al. [5] used active appearance models (AAMs) to track and align faces from manually labeled keyframes and fed the results into an SVM classifier. A frame was described as indicating pain if any of the pain-related AUs previously identified by Prkachin and Solomon [10] were reported. The methods explored in Kaltwang et al. [21] and Rudovic et al. [22] estimated multiple levels of pain. Kaltwang et al. used LBP, discrete cosine transformations (DCT), and AAM together to determine pain intensity either via AUs or by directly processing all video frames, while Rudovic et al. suggested a more time-specific conditional random field (CRF).
Prkachin and Solomon [10] demonstrated that four actions (lowered eyebrows, eyelid tightening or cheek elevation, nostril or upper lip lift, and eye closure) are consistent enough to be considered significant expressions of pain. Various studies have reported that other actions can also be related to pain. LeResche et al. [23], for instance, stated that horizontal stretching of the lips caused by the risorius muscle correlated with suffering. Similarly, Craig et al. [24] indicated that lifting the lip, which involves the zygomaticus major muscle (also activated when smiling), increased as the pain level increased. Given the well-reviewed recommendations for self-measuring pain based on the FACS, Prkachin and Solomon [10] suggested the PSPI. In their study, pain expression was defined by the activation of a limited set of facial muscles and coded by a group of related AUs: AU4, AU6-AU7, AU9-AU10, and AU43. The PSPI metric is presented in equations 1 and 2. Except for AU43, which has two values (0, 1), each action is rated on a six-point scale (0 = imperceptible, 5 = maximum) using FACS. PSPI measurements yield 16 levels.
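As a concrete illustration, the PSPI score is commonly written in the literature as PSPI = AU4 + max(AU6, AU7) + max(AU9, AU10) + AU43. The sketch below computes the score under that assumption. The function name `pspi_score` and the dictionary input format are hypothetical and are not taken from this study's implementation.

```python
def pspi_score(au):
    """Return the PSPI score from a dict of AU intensities.

    Assumes the commonly cited form
        PSPI = AU4 + max(AU6, AU7) + max(AU9, AU10) + AU43
    where AU4, AU6, AU7, AU9 and AU10 lie in 0..5 and AU43 is 0 or 1.
    """
    # Validate the graded action units (six-point scale, 0 to 5).
    for name in ("AU4", "AU6", "AU7", "AU9", "AU10"):
        if not 0 <= au[name] <= 5:
            raise ValueError(name + " must be between 0 and 5")
    # AU43 (eye closure) is binary.
    if au["AU43"] not in (0, 1):
        raise ValueError("AU43 must be 0 or 1")
    return (
        au["AU4"]
        + max(au["AU6"], au["AU7"])
        + max(au["AU9"], au["AU10"])
        + au["AU43"]
    )


# Example frame with moderate brow lowering and closed eyes.
frame = {"AU4": 2, "AU6": 1, "AU7": 3, "AU9": 0, "AU10": 2, "AU43": 1}
print(pspi_score(frame))
```

With every graded AU at 5 and AU43 at 1, the score reaches its maximum of 16.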
Significant advancements were made in 2001, when Viola and Jones [25] designed a Haar-based cascade classifier for object detection, and in 2002, when Lienhart and Maydt enhanced it [25]. This detection method is both quick and accurate, though it performs better on frontal faces and is limited when the face is viewed from the side. In 2020, Bargshady et al. [30] proposed a new deep learning method for facial expressions that detects pain in four phases, and found that their enhanced joint hybrid convolutional neural network-bidirectional long short-term memory (EJH-CNN-BiLSTM) system substantially outperformed the traditional method in terms of efficiency. Additionally, Bargshady et al. [35] developed an ensemble deep learning model combining a CNN and a recurrent neural network (EDLM-CNN-RNN) that is capable of correctly classifying pain and generating multi-class pain levels, offering several possible applications in the field of medical informatics.
Hussein et al. [36] suggested using a CNN model to recognize three classes of facial emotion: neutral, negative, and positive. This model, inspired by Xception, uses residual blocks and depthwise separable convolutions; it achieved an accuracy of 81% on unseen data, although neutral emotion recognition reached an accuracy of only 51%. In the same year, Egede et al. [37] developed the EmoPain dataset, a novel platform for assessing chronic pain using multimodal facial and body expressions. Dufourq [38] conducted a study on facial expression recognition using CNN and found that CNN enables researchers to reach a higher level of classification accuracy than the conventional techniques. With sufficient effort, future CNNs will be capable of achieving state-of-the-art performance while also requiring less computing power. Li and Xu [39] hypothesized that the emotion-based categorization of facial expression recognition is heavily reliant on the quality of the available data. As a result, they proposed a new framework for pre-selecting relevant pictures based on reinforcement learning.
Ravi and Yadhukrishna [40] presented a balanced comparison of two of the most commonly utilized facial expression recognition (FER) methods, LBP and CNN, noting that the lighting conditions in the images affected accuracy. Their findings showed that CNN outperformed LBP. In the area of animation, Paier et al. [41] pioneered the use of a deep neural network (DNN). The method's central idea is to enable animators to easily manipulate the facial expressions of a virtual human character. This data-driven method allows for the generation of physically accurate facial animations and representations. Ameur et al. [42] presented a method for increasing the detection rate of faces using monogenic binary patterns (MBPs) and CNN, as well as DCNN, which is one of the most effective methods for enhancing large-scale picture recognition.
However, as shown in the summary of these techniques presented in Table 1, none of the existing methods consider 2D facial pain detection and motion together. An important advantage of this approach is that it decreases the amount of time that hospital staff need to monitor patients, including impaired people, children, and ICU cases, or to conduct, for example, a primary screening for COVID-19. Moreover, this method can reduce costs by decreasing the need for expensive tissue analysis.

A. DATA COLLECTION
This study started with data collection to determine the extent of the difficulties involved in pain detection. Since we focused on human subjects, we used the UNBC shoulder pain dataset, which included validation experiments, for our work. This collection includes over 49,000 images of adults of all genders. The sequential images of the subjects were taken under painful conditions at an effective resolution of 320 × 240 pixels. In dynamic movement, the subject turns their shoulder independently, whereas in passive movement the operator assists the patient. Additionally, analysts concentrate on the front-facing images during the dynamic state, while video recording takes place approximately 70 degrees from the front during the passive state. Examples are shown in Figure 1. Due to the benefits of this technique in real-life situations, including basic discomfort like a stomachache or menstruation, we only

B. DATA AUGMENTATION
It is important to keep the number of samples in each class roughly equal, thereby enhancing the overall dataset and handling any potential errors. This approach is known as the imbalance technique [33]. Instead of modifying all of the pictures in the collection, the data augmentation procedure chooses one image, applies simple geometric transformations such as translation and rotation, and then proceeds to the next step. Images in the training set that have been transformed in this way are treated as belonging to the same class as the original image. Our goal in augmenting the data is to increase the generalizability of our models. We must utilize one of the imbalance algorithms because the UNBC dataset is deficient in some classes, meaning that the model may show bias when some classes are less common than others. Thus, before proceeding with the pain detection procedure, we use data augmentation to expand the dataset; the parameters we used are shown in Table 2.
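The idea of balancing classes through augmentation can be sketched as follows. This is a minimal illustration using only the Python standard library, not the study's implementation: images are plain nested lists, only translation is shown (rotation is omitted for brevity), and the helper names `translate` and `balance_by_augmentation`, the `max_shift` parameter, and the fixed seed are all assumptions. The actual augmentation parameters are those of Table 2, which are not reproduced here.

```python
import random
from collections import Counter


def translate(image, dx, dy, fill=0):
    """Shift a 2D image (list of rows) by dx columns and dy rows."""
    height, width = len(image), len(image[0])
    shifted = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            new_y, new_x = y + dy, x + dx
            # Pixels moved outside the frame are dropped.
            if 0 <= new_y < height and 0 <= new_x < width:
                shifted[new_y][new_x] = image[y][x]
    return shifted


def balance_by_augmentation(samples, max_shift=2, seed=0):
    """Oversample minority classes with translated copies.

    samples: list of (image, label) pairs.
    Returns a new list in which every class has as many samples
    as the largest class. Augmented copies keep the source label.
    """
    rng = random.Random(seed)
    counts = Counter(label for _, label in samples)
    target = max(counts.values())
    images_by_label = {}
    for image, label in samples:
        images_by_label.setdefault(label, []).append(image)

    balanced = list(samples)
    for label, images in images_by_label.items():
        for _ in range(target - counts[label]):
            source = rng.choice(images)
            dx = rng.randint(-max_shift, max_shift)
            dy = rng.randint(-max_shift, max_shift)
            balanced.append((translate(source, dx, dy), label))
    return balanced


tiny = [[1, 2], [3, 4]]
data = [(tiny, "no pain")] * 3 + [(tiny, "initial pain"), (tiny, "pain")]
balanced = balance_by_augmentation(data)
print(Counter(label for _, label in balanced))
```

Oversampling only the minority classes keeps the majority class untouched, which matches the description above of transforming selected images rather than the whole collection.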

C. PAIN ASSESSMENT SCALES
Both careful evaluation and the formal diagnosis of a patient's condition are required for adequate pain control. As disease may have individual side effects, pain assessment approaches typically rely on a patient's understanding of pain and its degree of severity. Typical self-reporting approaches therefore suffer from pitfalls such as reactivity to feedback, deception, and discrepancies between a patient's sense of discomfort and the clinician's assessment, which led to the addition of the ORS to address these concerns. The ORS's main disadvantage is its poor reliability and validity in situations where the examination takes a long time, such as when supervising individuals in clinics. We can categorize initial pain in three ways: (1) by computing pain using the PSPI equation; (2) by using the ORS, the Visual Analog Scale (VAS), the Affective Motivational Scale (AMS), and the Sensory Scale (SS) to determine the source of the initial pain; and (3) by training a deep convolutional neural network (DCNN) to locate a point that changes based on pain. Table 3 summarizes how to grade pain. The pain severity measures we used are described below:
• The VAS is a one-dimensional pain scale for adults. On this measure, patients indicate their current level of suffering on a scale ranging from 0 to 10, with 0 representing no pain and 10 representing the most severe pain.
• The ORS is a multidimensional instrument that is related to past or recent performance and practice. Medical personnel will measure the patient's distress on a simple 0 to 5 scale, with 0 representing no discomfort and 5 representing the highest amount of pain [3].
• The AMS is a defined distinction measure that expresses emotional-control inclinations. The scale runs from

D. FACIAL DETECTION PROCESS
We first used OpenCV to improve facial recognition, although it is important to note that OpenCV itself is not designed to recognize faces. We attempted to fit an AAM to find AUs in every patient image before using the DCNN, which plays the primary role in analyzing pain, as shown in Figure 2. Our face recognition architecture is built on ResNet-34, a deep residual learning method developed by He et al. [14] for image recognition. The network was trained on over three million pictures of tagged faces from the wild dataset. The training process then collected sequential photos from the UNBC dataset to create the markers. The markers are drawn to increase the system's accuracy and speed. We then used the movement to predict the axis in the decoded picture based on the coordinates in a reference image. A deep learning model can learn by itself while maintaining precision, though it has limitations. For example, it needs to process a significant amount of data to obtain good results. Time is an additional aspect to consider, as this approach takes much longer than others, including the SVM.
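The text does not specify how the axis is predicted from the reference coordinates. One plausible way to relate the markers in a current frame to those in a reference image is a least-squares rigid alignment, sketched below under that assumption. The function name `estimate_rigid_motion` and the representation of markers as (x, y) tuples are hypothetical.

```python
import math


def estimate_rigid_motion(reference, current):
    """Estimate the 2D rotation and translation mapping reference to current.

    reference, current: equal-length lists of (x, y) marker coordinates.
    Returns (angle_in_radians, (tx, ty)) for the least-squares rigid fit.
    """
    n = len(reference)
    ref_cx = sum(x for x, _ in reference) / n
    ref_cy = sum(y for _, y in reference) / n
    cur_cx = sum(x for x, _ in current) / n
    cur_cy = sum(y for _, y in current) / n

    # Accumulate dot and cross products of the centered point pairs.
    dot = 0.0
    cross = 0.0
    for (rx, ry), (cx, cy) in zip(reference, current):
        ax, ay = rx - ref_cx, ry - ref_cy
        bx, by = cx - cur_cx, cy - cur_cy
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx

    # The optimal in-plane rotation angle for the centered points.
    angle = math.atan2(cross, dot)

    # Translation that carries the rotated reference centroid onto
    # the centroid of the current markers.
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    tx = cur_cx - (cos_a * ref_cx - sin_a * ref_cy)
    ty = cur_cy - (sin_a * ref_cx + cos_a * ref_cy)
    return angle, (tx, ty)


# Hypothetical markers (two eye corners and the nose tip).
reference_markers = [(0.0, 0.0), (4.0, 0.0), (2.0, 3.0)]
theta = math.radians(10)
current_markers = [
    (math.cos(theta) * x - math.sin(theta) * y + 5.0,
     math.sin(theta) * x + math.cos(theta) * y - 2.0)
    for x, y in reference_markers
]
angle, shift = estimate_rigid_motion(reference_markers, current_markers)
print(math.degrees(angle), shift)
```

A rigid fit of this kind only captures in-plane rotation and shift. Out-of-plane head turns, such as the side views in the passive recordings, would need a richer model.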

E. DEEP LEARNING PROCESS
Convolutional neural networks (CNNs) are neural networks that are primarily used to categorize pictures, cluster images based on their similarity, and recognize objects within scenes; CNNs are the most frequently used computational method for image identification. LeNet [32] was one of the first convolutional neural networks to be developed and was crucial in the development of deep learning. It is often regarded as one of the earliest deep neural networks, as it was built in 1998 to address the problem of digit recognition. ResNet, which followed later, is regarded as a particularly distinctive architecture because it is a truly deep design, with up to 152 layers. It gained popularity because it avoids the degradation problem, in which accuracy on both the training and testing sets worsens once the depth of the network is raised beyond a certain threshold. In ResNet, the input of a block is added to the block's output through a skip connection, so the layers in the block only need to learn the residual. Our technology for pain detection is based on the ResNet-34 CNN, with the number 34 indicating that the CNN has 34 layers.
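The skip connection can be illustrated with a toy residual block. The sketch below uses small fully connected layers on plain Python lists purely for readability; real ResNet blocks use convolutions and batch normalization, and the helper names here (`relu`, `dense`, `residual_block`) are assumptions made for illustration.

```python
def relu(vector):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in vector]


def dense(vector, weights, bias):
    """Fully connected layer: weights[i][j] maps input j to output i."""
    return [
        sum(w * v for w, v in zip(row, vector)) + b
        for row, b in zip(weights, bias)
    ]


def residual_block(x, w1, b1, w2, b2):
    """Compute relu(F(x) + x), where F is two stacked layers."""
    hidden = relu(dense(x, w1, b1))
    residual = dense(hidden, w2, b2)  # F(x), the part the block learns
    # Skip connection: add the block input back before the activation.
    return relu([r + xi for r, xi in zip(residual, x)])


zeros = [[0.0, 0.0], [0.0, 0.0]]
no_bias = [0.0, 0.0]
# With all-zero weights the block reduces to relu(x).
print(residual_block([1.0, 2.0], zeros, no_bias, zeros, no_bias))
```

Because a block with zero weights simply passes its (non-negative) input through, adding more blocks does not have to make the network worse, which is the intuition behind avoiding the degradation problem.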
A deep convolutional neural network (DCNN) is composed of many layers. Usually, two distinct layer types, convolutional and pooling, are used alternately. The depth of the filters rises from left to right, that is, from the input toward the output of the network. Typically, the final stage is composed of one or more fully connected layers. The layer count is the distinction between a CNN and a DCNN; a deep learning model with a larger number of layers is said to be deeper. Thus, a DCNN is simply a CNN with more layers.
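The two alternating layer types can be sketched in a few lines. The example below is a minimal single-channel illustration on nested lists, without padding, learned weights, or multiple filters; the function names `conv2d_valid` and `max_pool` are assumptions, and deep learning frameworks provide optimized equivalents.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[i + u][j + v] * kernel[u][v]
                for u in range(kh)
                for v in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling; incomplete edge windows are dropped."""
    height, width = len(feature_map), len(feature_map[0])
    return [
        [
            max(
                feature_map[i + u][j + v]
                for u in range(size)
                for v in range(size)
            )
            for j in range(0, width - size + 1, size)
        ]
        for i in range(0, height - size + 1, size)
    ]


image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
diagonal = [[1, 0], [0, 1]]
features = conv2d_valid(image, diagonal)  # 3 x 3 feature map
print(features)
print(max_pool(image))  # 2 x 2 summary of the 4 x 4 input
```

Convolution extracts local patterns while pooling shrinks the spatial size, which is why the two are typically alternated before the fully connected layers.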

F. PARTICULARS OF APPLICATION
Python 3.5.1 with OpenCV was used on Windows 7 (64-bit) to implement pain sensing via deep learning and motion algorithms, and the UNBC dataset was used to train the pain-sensing system. For the experiments, we used 70% of the sequential images for training and the rest for testing.
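A 70/30 split of this kind can be sketched as follows. The text does not say whether the images were shuffled or split by subject, so the random shuffle, the fixed seed, and the function name `split_train_test` are assumptions made for illustration.

```python
import random


def split_train_test(items, train_fraction=0.7, seed=42):
    """Shuffle items reproducibly and split them into train and test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical frame identifiers standing in for the sequential images.
frames = ["frame_%04d" % i for i in range(100)]
train, test = split_train_test(frames)
print(len(train), len(test))
```

Because neighboring frames in a sequence are highly similar, a frame-level random split can place near-duplicate images in both sets. Splitting by subject or by sequence is a common alternative when that is a concern.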
Another key factor is processing time: on an Intel i7 processor with 16 GB of DDR4 RAM, identifying both pain classes takes about three months. Our system flowcharts are shown in Figure 3, and our algorithm for facial expression is displayed in Table 4.

A. PAIN INTENSITY METRIC
Including an alarm device with audible alerts for emergency situations involving ICU patients or disabled individuals was essential for this research, as we wanted to make it possible for doctors and nurses to apply our software in real life. The results are summarized in Table 6. Six AUs (AU4, AU6, AU7, AU9, AU10, and AU43) determined the PSPI pain intensity. The PSPI scale is given by equations (1) and (2). Other assessments, such as the VAS, ORS, AMS, and SS, were also used to rate subjects. The data was collected from medical professionals and others by a physician/specialist chosen by the National Center for Community Health Informatics. Table 5 illustrates our method for pain identification.

B. EVALUATION PROCEDURES
Determining whether a model performs adequately in terms of reliability and validity is essential, and often the only way to do so is via empirical data. A classification prediction has four possible outcomes: true positive, false positive, true negative, and false negative. Table 7 contains a list of the factors that we utilized in this study.
Precision, accuracy, and recall are the three essential metrics used to evaluate a model. Table 8 summarizes the variables used in the evaluation measurement.
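For a three-class problem such as this one, these metrics are usually computed per class in a one-versus-rest fashion. The sketch below uses the standard textbook definitions, which are assumed to correspond to Table 8; the function name `per_class_metrics` and the example labels are hypothetical.

```python
def per_class_metrics(y_true, y_pred, labels):
    """Compute one-versus-rest precision, recall, and accuracy per class."""
    total = len(y_true)
    metrics = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        tn = total - tp - fp - fn
        metrics[label] = {
            # Of the frames predicted as this class, how many were right.
            "precision": tp / (tp + fp) if tp + fp else 0.0,
            # Of the frames truly in this class, how many were found.
            "recall": tp / (tp + fn) if tp + fn else 0.0,
            # Share of all frames handled correctly for this class.
            "accuracy": (tp + tn) / total if total else 0.0,
        }
    return metrics


truth = ["no pain", "no pain", "initial pain", "pain", "pain", "pain"]
predicted = ["no pain", "initial pain", "initial pain", "pain", "pain", "no pain"]
scores = per_class_metrics(truth, predicted, ["no pain", "initial pain", "pain"])
for name, values in scores.items():
    print(name, values)
```

Reporting the metrics per class makes it visible when a minority class, such as initial pain, performs worse than the overall figures suggest.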

C. PROCESS OF VALIDATION
Our model was validated against classification criteria as well as a variety of batch sizes and epochs. Additionally, we assessed the proposed method's performance both with and without data augmentation. The findings from the tests are summarized in Table 9, while Table 10 compares the proposed model to the unbalanced methodology in terms of precision. The graph in Figure 4 illustrates the relationship between the training loss and the accuracy of our model. Table 9 shows how our 2D facial expression and motion algorithm performs with augmented data. Ninety-nine percent of the results were correct; accuracy was limited when working with small batch sizes but was excellent when working with large batch sizes. Additionally, our objectives could be accomplished within a short time period. Table 11 compares the experimental and ground truth (doctor analysis) data on patients' pain in three classes: no pain, initial pain, and pain. The initial pain and pain classes were affected by the imbalanced data in the UNBC dataset, and these two classes sometimes showed similar accuracy patterns. Nevertheless, it was important to show both the benefits and drawbacks of deep learning, which requires more data to process, and so we decided to demonstrate that some experiments could drop below 90% accuracy, though every experiment actually achieved above 90% accuracy in practice.

V. DISCUSSION
Experiment 5 in the no-pain category had the highest accuracy, 99.75%, and the lowest accuracy in that category was 98.63%. The initial pain category reached a maximum of 92.93%, while the accuracy for the pain category reached 95.15% in Experiment 4.
Overall, the most significant decrease was found in the pain category. In Experiment 7, the accuracy for initial pain hit a low of 88.43%, while pain accuracy fell to 82.17% in Experiment 8. However, all experimental results matched the results of the doctor's analysis, demonstrating that this experiment was reliable. Reliability could be further improved in the future by incorporating more sequential images or by using a GAN or data augmentation to increase the deep learning model's accuracy. Table 12 summarizes the obtained findings and compares them to the state-of-the-art results.

VI. CONCLUSION
This article proposed a novel strategy for detecting pain using deep learning that can be used in both patients' everyday lives and in hospitals. Individuals with shoulder discomfort were used as a model to validate the technique. The proposed approach works effectively in conditions that cannot be handled using standard methods. We also proposed a method that can increase the efficiency and accuracy of pain perception by applying deep convolutional neural networks to monitor conditions and other complex environments.
In a future study, we plan to add supplementary images, as the levels of pain severity between the initial pain and pain classes are currently unbalanced. Accordingly, the GAN, imbalance technique, and data augmentation will be applied to increase the accuracy of the process [31]. Additionally, the deep learning methods presented here, mainly 2D facial expression and movement, were superior to conventional methods (doctor analysis) in terms of both accuracy and cost. This approach could be applied either to other fields or to further medical research in the future; potential examples include facial recognition for criminal identification and pain detection for contagious diseases.