A Novel Facial Thermal Feature Extraction Method for Non-Contact Healthcare System

A non-contact healthcare system can avoid germ infection and can also provide comfortable and convenient healthcare services for caregivers. In current thermal imaging research, local or small-area images are used to represent the overall temperature information of the participants, but facial temperature features should not be drawn from only a small region while other regions are ignored. A facial thermal image shows clear temperature characteristics but is not conducive to meaningful feature extraction. Therefore, this research proposes a novel facial thermal image feature extraction method, which uses facial landmarks to detect and cut 12 blocks and establishes a new feature matrix based on the color mean values and standard deviation values. This establishes clear features on facial thermal images. The core of the proposed healthcare system is a deep learning framework based on CAFFE under the DIGITS platform; CAFFE runs the classic CNN, GoogLeNet. Based on the acquired images and new feature types, four models were trained, for the raw RGB image, raw thermal image, RGB feature image, and thermal feature image, respectively. In the experiment, 800 images were used for training and validation, and 200 images were used for testing. An additional 40 images were used for random testing. The experimental results show that RGB images cannot be used effectively, thermal images can effectively predict health status, and thermal feature images have the highest prediction accuracy.


I. INTRODUCTION
Among developed countries, the social structure of Taiwan is moving toward an aging society, and Japan is already an aging society. Facing a shortage of medical care manpower, healthcare systems are very important. An automated healthcare system requires only a few medical staff to provide care, and some tasks can even be performed fully automatically. With the improvement of the level of care, non-contact care systems have been developed to prevent people from being infected by germs and to provide a safe care environment. In addition, non-contact systems can monitor patients without affecting their daily lives. Remote monitoring is used to submit the immediate physiological status to the management center. In addition to the comfortable and convenient healthcare services provided to caregivers, remote management services can also improve care efficiency. To achieve a non-contact care system, thermal imaging or infrared imaging is a common and well-established technique. In current thermal imaging research, facial thermal images show clear temperature characteristics. Through image analysis and system integration, it is possible to monitor and predict the health status of the patient.
(The associate editor coordinating the review of this manuscript and approving it for publication was Chun-Wei Tsai.)
In recent years, non-contact systems have gained increasing interest. In particular, thermal imaging is one of the emerging important materials, and many studies have used thermal images for development. However, in clinical practice, personnel have not been able to directly interpret unprocessed thermal images; these images must undergo more meaningful processing to increase recognition [13]. There are many state-of-the-art studies on the identification of facial thermal images. In past research, heterogeneous face recognition (HFR) depended on eye center localization in thermal infrared faces. With the advancement of data science and computer throughput, automatic heterogeneous face alignment methods no longer need to rely on eye center localization [2]. B. Jian et al. [3] proposed a face recognition method that uses facial temperature as the feature for constructing infrared thermal facial images. M. M. Khan et al. [4] analyzed thermal infrared images to measure affect-induced facial thermal variations. Infrared facial images also change over time. G. Hermosilla Vigneau et al. [5] observed the relationship between infrared images and the time axis when used in a facial recognition system. The results show that when infrared images of the face are acquired over time, the changes are obvious; if only the local binary pattern is targeted, most of the time variables have no significant effect. Andrea Tangherlini et al. [41] analyzed the development of temperature measurement and infrared technology; the development of image processing technology has improved the reliability of thermal imaging analysis. The authors attempted to de-identify biometrics and tested the approach on faces, heat-sensitive faces, palm prints, palm veins, and finger veins.
(VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/)
Extensive experiments were performed to analyze the recognition ability and protection performance of each recognition method. The results show that not only can the identity message size be reduced by 50%, but a distinguishable and confidential revocable pseudo-bio identity can also be generated. To improve the accuracy of thermal imaging in health prediction, we also searched for further analyses of thermal features. Gerald Schaefer et al. [31] proposed a field-warp-based image registration method as one possible solution. By comparing the skin temperature distribution of the human body under natural skin or posture changes, the reliability of thermal imaging is effectively improved. B. F. Jones [34] proposed a method for generating accurate overlays of thermal and visual medical images. An image segmentation step based on skin detection first removes unnecessary background information from the visual part to obtain a more accurate judgment of the treatment effect on the patient. To effectively diagnose the physiological condition of patients under heat and other stress, Duygu Savasci et al. [35] proposed that image processing programs can be used to analyze statically and dynamically captured images. Adrian Dinculescu et al. [36] created an analytical system based on infrared thermal imaging. This system has several advantages: it is non-contact, non-invasive, and non-ionizing, and it has a lower error rate than existing systems. Monitoring conditions were based on thermal asymmetry and time-dependent thermal differences, and the authors achieved successful results. In addition to the use of thermal imaging in physiological monitoring, thermal state assessment can also use thermal imaging as identification material. E. F. J. Ring [37] proposed a novel psychophysiological assessment architecture.
The proposed vital-parameter determination method was based on combined thermal infrared and visual spectrum imaging technologies. James B. Mercer et al. [7], [42] analyzed infrared thermal images and the thermal change of temperature during fever. The results show that the inner canthi area of the eyes is a preferred and recommended site to represent core temperature, and the temperature also changes significantly when the physiological state changes.
Some studies have shown that adding RGB images enhances the recognition effect. G. Hermosilla et al. [8] proposed a novel face recognition system based on fusing thermal and visible descriptors. S. Wang et al. [9] proposed a novel method that uses visible and thermal images as features for facial recognition, classified by an SVM. A couple representation similarity metric (CRSM) was designed by Peng et al. [10] to measure the similarity between obtained graphical representations. Past analyses of RGB and thermal images relied on human resources; therefore, methods using machine learning for image recognition have been developed in recent years. Machine learning algorithms are applied to the current development of RGB images, and such algorithms require many labeled images. However, these image processing methods are currently unavailable in the thermal domain [14]. Compared to RGB images, the algorithms used for thermal image processing still lack robustness and accuracy, and a more advanced image processing engine is required to enhance processing performance. T. de Freitas Pereira et al. [1] suggested that the high-level features of deep convolutional neural networks trained on visual spectrum images are potentially domain independent. A shallow CNN stacked with an LSTM and a deep CNN were applied in a method proposed by G. Batchuluun et al. [6]. A CNN can capture many subtle 2D or 3D features, and an LSTM can capture temporal features, so the advantages of both networks are combined to capture more spatial and temporal features. To conclude from the above state-of-the-art methods, we need to combine RGB image and thermal image extraction features from different domains; in addition to strengthening the features, this can also prevent counterfeiting. The acquired features use deep learning for image recognition, which allows the recognition process to incorporate more parameters and be more adaptive.
The proposed architecture is described in detail in the next section.
The main contributions of this research are as follows:
• Establishing a non-contact healthcare system that predicts health status based on a deep learning model with GoogLeNet.
• Proposing a novel feature extraction method based on facial thermal images to create a color matrix of facial features.
The rest of this paper is organized as follows: Section II presents the materials and methods related to the deep learning framework as well as the image processing of RGB and thermal images. Section III delivers the results related to preprocessing and the predictions of the deep learning framework. The discussion is presented in Section IV. Finally, Section V concludes the research.

II. MATERIAL AND METHOD

A. PROPOSED ARCHITECTURE DESCRIPTION
The proposed architecture is shown in Fig. 1. The core of this architecture is a deep learning framework based on a convolutional neural network (GoogLeNet). The deep learning framework consists of a prediction model, a pre-trained model, and the OpenCV library. In this research, the model is implemented with CAFFE based on the DIGITS platform, and OpenCV is used for real-time prediction. The input parameters of this architecture are RGB images, thermal images, RGB feature images, and thermal feature images, and the output is the predicted/classified health status of a human. The input images are paired RGB and thermal images in Rainbow-scale mode. The input feature format is a 4 × 6 RGB color matrix whose colors are features reconstructed from the original image. After a current human image is input into the model, the prediction result shows an image and lists the probabilities of the health statuses. More details are given in the following sections.

B. THERMAL IMAGING AND CAMERA
The FLIR One is a mobile infrared thermal camera, shown in Fig. 2, with specifications as listed in Table 1. There are two cameras on the FLIR One: the first captures far-infrared images and the second captures normal RGB images. To provide a better detection effect, the images of the two cameras are superimposed, and the absolute temperature estimate of a point in the image can be calculated. Since the temperature distribution characteristics of different detection materials differ, the results cannot be viewed with a single conventional image. On the FLIR One, different image modes can be selected to observe different image characteristics; there are seven modes, including Contrast, Rainbow, Iron, Gray (White Hot), Lava, Arctic, and Wheel.
The Iron and Gray palettes are general-purpose palettes that quickly identify thermal anomalies and body heat, so they can be used for fast detection on general occasions. The Gray palette offers simplicity for scenes with a wide temperature span, while Iron shows heat distribution and subtle details through color. The Arctic palette detects heat sources quickly and shows them in different colors, with darker shading picking out slight temperature changes. The Lava palette can be used in situations where contrast needs to be improved for detection. The Rainbow palette is the best solution for scenes with minimal heat change and for focusing on an area with similar heat energy.
In general, the overall temperature of the face does not change much. In particular, we want to observe temperature changes in different areas of the face and turn these changes into new features. Therefore, considering that the temperature change is not obvious, the Rainbow mode is quite suitable for facial thermal imaging in this research: relatively small regional temperature changes can be observed from the thermal image.

C. EXPERIMENTAL CONFIGURATION AND IMAGE COLLECTION
To make the collected images share the same specifications, Fig. 3 shows the experimental configuration of the image capture process. The distance between the FLIR One and the participant is 60 cm, and the face must be in the center of the image. Thermal imaging uses Rainbow mode, and the image output contains RGB images and thermal images.
In the image collection, chronological collection was performed on 10 participants. Each participant was photographed at 08:00, 11:00, 14:00, 17:00, and 20:00 each day, producing two images (one RGB image and one thermal image) at each time point. Over the 10-day collection, a total of 1,000 images were stored in the dataset (10 participants × 10 days × 5 time points × 2 images). These 1,000 images were used to validate and test the health status prediction models. In the final stage, an additional image of each participant was taken at a random time to make predictions for practical applications.

D. IMAGE PREPROCESSING AND FEATURE EXTRACTION
The primary preprocessing steps in this research are image resizing and image cutting; Fig. 4 shows the schematic diagram of dataset acquisition and preprocessing. Since the raw image size is as large as 1080 × 1440, in order to give deep learning and image processing better performance, all images were cropped to leave only the head and resized to 480 × 480. The resized image is processed with the landmark function to detect and record important points on the face and facial features. Since the landmark function often fails on thermal images for facial recognition, the RGB images are labeled first, and the labeled points are then mapped to the thermal images to complete the labeling. After the marked points in the image are connected to each other, the face is cut into 12 quadrangles of different sizes (as shown in Fig. 5). The average value and standard deviation value of the RGB colors in each block are then calculated to make a new color matrix. This color matrix is the color-feature image extracted from the raw image. After preprocessing and feature extraction, two images and two color-feature images are obtained, and these are the four input data types of the prediction model. Fig. 5 shows the schematic diagram of the quadrilateral cutting and feature extraction of a facial thermal image. The 12 blocks in this facial image are named A1 to A12 and are mapped into the feature image synchronously. Each block contributes two values (colors): the average and the standard deviation of its RGB colors. The calculation of the average value is shown in (1). After calculating the average of each of the red, green, and blue channels in the block, a new color is formed. Equation (2) illustrates the calculation of the standard deviation value; similarly, the standard deviation of the three channels is calculated separately to form a new color.
Therefore, the feature image is represented as a 4 × 6 color matrix, and its numerical values are represented as a 4 × 6 × 3 three-dimensional matrix.
$$X_{mean} = \frac{1}{N}\sum_{i=1}^{N} X_i, \qquad X \in \{R, G, B\} \tag{1}$$

$$X_{STD} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(X_i - X_{mean}\right)^2}, \qquad X \in \{R, G, B\} \tag{2}$$

where $R_i$, $G_i$, and $B_i$ are the observed values of the sample items; $R_{mean}$, $G_{mean}$, and $B_{mean}$ are the mean values of these observations; $R_{STD}$, $G_{STD}$, and $B_{STD}$ are the standard deviation values of these observations; and $N$ is the number of observations in the sample.
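As a concrete illustration, the per-block computation of (1) and (2) can be sketched in Python with NumPy. The helper names (`block_feature_colors`, `feature_matrix`) and the exact placement of the 12 blocks onto the 4 × 6 grid are our own assumptions for illustration; the paper does not specify an implementation.

```python
import numpy as np

def block_feature_colors(image, mask):
    """Mean and standard-deviation colors of the pixels selected by `mask`.

    `image` is an H x W x 3 RGB array and `mask` a boolean H x W array
    marking one quadrilateral block. Returns two RGB triples, per (1), (2).
    """
    pixels = image[mask].astype(np.float64)  # N x 3 sample of the block
    mean_color = pixels.mean(axis=0)         # (R_mean, G_mean, B_mean)
    std_color = pixels.std(axis=0)           # (R_STD, G_STD, B_STD)
    return mean_color, std_color

def feature_matrix(image, block_masks):
    """Assemble the 4 x 6 x 3 feature matrix from 12 block masks (A1..A12).

    Each block fills two adjacent cells: its mean color on the left and its
    standard-deviation color on the right (cf. Fig. 5). The row/column
    layout chosen here is an assumption for illustration.
    """
    feat = np.zeros((4, 6, 3))
    for k, mask in enumerate(block_masks):   # 12 blocks
        row, col = divmod(k, 3)              # 4 rows of 3 mean/std pairs
        mean_color, std_color = block_feature_colors(image, mask)
        feat[row, 2 * col] = mean_color      # left cell: mean color
        feat[row, 2 * col + 1] = std_color   # right cell: std color
    return feat
```

In this sketch, the boolean masks are expected to come from the quadrilateral cutting step; any pixels outside all masks simply do not contribute to the feature matrix.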

E. DEEP LEARNING FRAMEWORK
The deep learning architecture of this research is illustrated in Fig. 1. DIGITS is a platform for fast training of deep neural networks, on which several frameworks can be run, such as CAFFE, Torch, and TensorFlow. CAFFE is an excellent choice among the common network frameworks. Its advantages are ease of use, network configuration without coding, modular network design, high training performance, and easier training of state-of-the-art models. Among the ready-to-use classic models in CAFFE are AlexNet, GoogLeNet, and VGGNet, which are all based on convolutional neural networks (CNNs).
The earliest CNN classification model (LeNet) was developed in 1998. Fourteen years later, in 2012, AlexNet appeared owing to improvements in computer performance. In the revolution of CNNs, AlexNet was a major breakthrough, and CNNs then began a series of amazing developments. GoogLeNet and VGGNet appeared two years later. In essence, VGGNet is AlexNet with greater network depth. GoogLeNet is also an important milestone in this history because it proposes a new network structure, called Inception, that improves accuracy and reduces training time.
Considering the current development of neural networks, we chose GoogLeNet, which has excellent performance in image applications among classic CNNs, as the core model. Thus, DIGITS is selected as the platform to implement CAFFE and train GoogLeNet to observe the prediction performance of the proposed architecture.

F. PROCEDURE
First, the data are collected and preprocessed; a total of 1,000 images of 10 participants are collected at this stage. Second, 80% of the 1,000 images are extracted at random as the training set and 20% as the test set. These two data sets are used to train, validate, and test GoogLeNet. When the model completes testing, it is deployed. At this stage, an additional 40 images are used to predict health status. After all experiments are performed, the prediction performance on the real images and the feature images is compared through graphical comparison and numerical analysis. Finally, the discussion and conclusions are presented based on the experimental results.
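The 80/20 splits described above (1,000 images into 800 train+validation and 200 test, with the 800 further divided 640/160) can be sketched as follows; the function name and the fixed seed are illustrative assumptions:

```python
import random

def split_dataset(image_paths, seed=0):
    """Randomly partition the collected images into the paper's splits:
    20% held-out test, then 20% of the remainder for validation.
    For 1000 inputs this yields 640 train / 160 validation / 200 test."""
    paths = list(image_paths)
    random.Random(seed).shuffle(paths)   # fixed seed only for reproducibility
    n_test = len(paths) // 5             # 20% -> test set
    test = paths[:n_test]
    trainval = paths[n_test:]            # remaining 80%
    n_val = len(trainval) // 5           # 20% of the 80% -> validation
    val = trainval[:n_val]
    train = trainval[n_val:]
    return train, val, test
```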

III. RESULTS
All the experimental results are described and discussed in this section. Since there is a large amount of image data, one of the images is used as a demonstration while the experimental results are described.

A. PREPROCESSING OF RGB AND THERMAL IMAGE
The original image size is as large as 1080 × 1440. To improve the performance of image training and prediction, redundant image removal and resizing were performed in the preprocessing stage.
First, after acquiring an image, object detection for the human head is performed on the thermal image. Since the thermal radiation of the human body is greater than that of the background in the thermal image, the human face can be detected with high accuracy. Second, the new image range is centered on the human head and cut into a square. Third, this square image is resized to 480 × 480. Finally, the relative position processing parameters are mapped to the RGB image to complete the cropping and resizing. The final result is shown in Fig. 6.
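The crop-and-resize steps above can be sketched with NumPy as follows. This is a minimal sketch under stated assumptions: the helper names are our own, `head_center` is assumed to come from the head detector, and the NumPy-only nearest-neighbour resize stands in for a library call such as `cv2.resize`.

```python
import numpy as np

def square_crop(image, center, side):
    """Crop a side x side square centered on `center` = (row, col),
    clamped so the crop stays inside the image bounds."""
    h, w = image.shape[:2]
    r0 = int(np.clip(center[0] - side // 2, 0, max(h - side, 0)))
    c0 = int(np.clip(center[1] - side // 2, 0, max(w - side, 0)))
    return image[r0:r0 + side, c0:c0 + side]

def resize_nearest(image, out_size=480):
    """Nearest-neighbour resize to out_size x out_size."""
    h, w = image.shape[:2]
    rows = np.arange(out_size) * h // out_size   # source row per output row
    cols = np.arange(out_size) * w // out_size   # source col per output col
    return image[rows][:, cols]

def preprocess_pair(thermal, rgb, head_center, side=1080, out_size=480):
    """Apply the same crop and resize to the aligned thermal and RGB images,
    mirroring the paper's thermal-first, map-to-RGB procedure."""
    return [resize_nearest(square_crop(img, head_center, side), out_size)
            for img in (thermal, rgb)]
```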

B. FEATURE EXTRACTION OF FACIAL IMAGE
In this section, the results of facial landmark detection, image quadrilateral cutting, and feature extraction are described in detail, as follows.

1) FACIAL LANDMARK DETECTION
Object detection easily finds specific contours in an image, but it is not easy to find the eyes or mouth on a face. Facial landmark detection is a good solution for detecting facial organs such as the eyes, mouth, or forehead, and this feature detection technology is mature and open source.
Thus, we perform landmark detection directly on the RGB image and then map the obtained coordinate points of the organ features to the thermal image. The temperature distribution of the human face does not have clear boundaries like real objects, which causes features to be undetectable; therefore, landmark detection does not work on thermal images. Since landmark detection outputs many feature points and we only need some of them as the separation points of the blocks, not all feature points are displayed. The result of landmark detection is shown in Fig. 7.
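Because the RGB and thermal images are aligned and, after preprocessing, share the same 480 × 480 resolution, transferring the landmark coordinates reduces to a simple per-axis scaling (identity in the equal-resolution case). A minimal sketch, with the function name being our own assumption:

```python
def map_landmarks_to_thermal(points, rgb_shape, thermal_shape):
    """Transfer (x, y) landmarks detected on the RGB image to the aligned
    thermal image by scaling coordinates between the two resolutions.
    Shapes are (height, width) tuples."""
    scale_x = thermal_shape[1] / rgb_shape[1]   # width ratio
    scale_y = thermal_shape[0] / rgb_shape[0]   # height ratio
    return [(round(x * scale_x), round(y * scale_y)) for (x, y) in points]
```

When both images are already 480 × 480, the scales are 1.0 and the RGB landmark coordinates can be reused on the thermal image directly.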

2) QUADRILATERAL CUTTING
According to the experimental configuration, 20 feature points are output on the facial image after facial landmark detection is performed. Based on these 20 feature points, the facial image is cut into 12 quadrangular blocks of various sizes. These quadrangular blocks are used as the boundaries for calculating the colors in the feature extraction. The results of the image quadrilateral cutting are shown in Fig. 8.
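Rasterizing one quadrangular block into a pixel mask, which the feature extraction then averages over, can be sketched in NumPy. `quad_mask` is a hypothetical stand-in for a routine such as `cv2.fillConvexPoly`, and it assumes the four corners of a convex block are given in clockwise order (image coordinates, row increasing downward):

```python
import numpy as np

def quad_mask(shape, corners):
    """Boolean mask of the pixels inside a convex quadrilateral.

    `shape` is (H, W); `corners` are four (row, col) points in clockwise
    order. A pixel is inside if it lies on the same side of all four
    directed edges (half-plane intersection).
    """
    rr, cc = np.mgrid[0:shape[0], 0:shape[1]]
    mask = np.ones(shape, dtype=bool)
    pts = list(corners)
    for (r0, c0), (r1, c1) in zip(pts, pts[1:] + pts[:1]):
        # 2D cross product of edge vector with pixel offset
        cross = (r1 - r0) * (cc - c0) - (c1 - c0) * (rr - r0)
        mask &= cross <= 0   # keep pixels on the inner side of this edge
    return mask
```

Calling `quad_mask` once per block with its four landmark corners yields the 12 masks consumed by the color-feature computation.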

3) FEATURE EXTRACTION
After the quadrilateral cutting of the facial image is completed, each block has clear boundaries. For each block, the average and standard deviation values of the RGB colors are calculated according to (1) and (2). According to the configuration illustrated in Fig. 5, in each block the left-side color comes from the mean value and the right-side color from the standard deviation value. This newly generated image is the feature image of the raw image, and this color feature image can also be expressed as a three-dimensional matrix. The features of the RGB and thermal images are shown in Fig. 9.

C. MODEL TRAINING AND VALIDATION
In this research, the following stages are performed from model planning to deployment: establishment of the data sets, modeling, training, validation, and testing.
First, according to the results of Sections A and B, four data sets were created: RGB image, thermal image, RGB feature, and thermal feature. There are five categories in each data set, which respectively indicate the health status. In the experimental configuration, there are a total of 800 training images: 640 for the training program and the remaining 160 for the validation program. In the modeling, the training epoch is set to 50, the basic learning rate is set to 0.001, and the neural network is GoogLeNet. Figs. 10 to 13 show the training and modeling processes for the four data sets. It can be observed from the results that all four models converge at the highest accuracy rate, and the loss converges to zero. Table 2 shows the validation results of the four models. The proposed neural network model (GoogLeNet) outputs the probability of every category (health state) to represent the prediction result. The red box indicates the correct category of the prediction, so from Table 2 we can observe that the prediction results of each model are correct.

D. TEST ON HEALTH PREDICTION MODEL
Table 3 shows the comparison of the accuracy of the test results. The highest and lowest prediction accuracies of each model over the different health states are compared. Overall, the lowest accuracy rate is above 50%, which means that the models can classify meaningfully. Among them, the model that predicts from thermal features has the highest accuracy rate and the smallest difference between its highest and lowest accuracy rates, which means that this model can perform robust health state prediction.
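For reference, the training configuration described above (50 epochs, base learning rate 0.001, GoogLeNet, 640 training and 160 validation images) would correspond roughly to the following CAFFE solver settings. This is a hypothetical sketch: DIGITS generates this file automatically, and the batch size of 16, learning-rate policy, and momentum below are our assumptions, not values stated in the paper.

```protobuf
net: "train_val.prototxt"   # GoogLeNet network definition
base_lr: 0.001              # basic learning rate from the paper
lr_policy: "step"           # assumed decay policy
gamma: 0.1                  # assumed decay factor
stepsize: 660               # assumed decay interval
momentum: 0.9               # assumed
max_iter: 2000              # 50 epochs x (640 images / batch size 16)
test_iter: 10               # 160 validation images / batch size 16
test_interval: 40           # validate once per epoch (640 / 16 iterations)
solver_mode: GPU
```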

E. RANDOM TEST ON HEALTH PREDICTION MODEL
In the random tests, additional test images were taken of the original 10 participants. Each person was photographed once, generating four images (RGB image, thermal image, RGB feature image, and thermal feature image). Therefore, a total of 40 images were used in the random test to predict health status. Table 4 records the prediction results for the participants' images. A cell is filled in as ''Y'' if the prediction is correct and ''N'' otherwise. It can be observed from the results that the images of 3 participants were not correctly predicted.

IV. DISCUSSION
This research proposed a healthcare system for predicting health status based on thermal imaging and a deep learning architecture. As the state-of-the-art research shows, many experts have developed healthcare systems for detecting thermal image temperature, respiratory rate, and even pulse rate. However, from the data preprocessing processes, it can be observed that current research performs numerical analysis on local images rather than a comprehensive evaluation of images of the entire face. Therefore, in this research, the features of the full-face image were evaluated to fill this research gap.
In the data preprocessing stage, all preprocessing directly affects the performance of model learning and prediction. Therefore, only the image of the face area is retained, and the size is reduced to 480 × 480. Since the temperature distribution of the face is relatively smooth, it is difficult to detect the eyes or mouth through object detection or facial feature detection. Therefore, the detection process is performed on the RGB image and mapped to the thermal image.
To the best of our knowledge, the facial feature extraction proposed in this study is a novel method. Owing to the influence of the temperature distribution, thermal image analysis should not consider only a small part of the image but should observe the other thermal characteristics at the same time. Therefore, the cut image and the regenerated feature image serve as biometrics of the participants. From the results in Table 3, it can be observed that when the thermal image is processed with the novel feature extraction method, the prediction can achieve excellent accuracy. This is analogous to a geometric analysis of the face: there are specific physical features between organs, and there are thermal features of similar temperature between different regions. After the facial thermal image is transformed into a new feature image, the blurred boundaries between features are clearly distinguished, so the features are better extracted and the model's prediction is improved.
It can be observed from Table 4 that the thermal image group passes uniformly while the RGB image group fails. Since the data preprocessing method and the shooting angles of the images are similar, RGB images are easier to misjudge: RGB images have no thermal features and contain only skin colors and expressions. This has no effect on thermal image prediction but instead strengthens the expression features.

V. CONCLUSION
This research proposed a non-contact healthcare system for health status prediction based on thermal imaging and a deep learning architecture. The main purpose is to use thermal images to predict health status. Thermal images were captured with the FLIR One from 10 participants. The deep learning architecture is based on CAFFE under the DIGITS platform, where CAFFE runs the classic CNN, GoogLeNet. To improve the prediction effect of the model and strengthen the features of the original images, a novel facial thermal image feature extraction method was proposed: the facial thermal image is cut into 12 pieces, and each piece is re-colored based on the average and standard deviation values of the original image. This new feature image can effectively increase the prediction performance of the model. For future work, the proposed method can be extended to the long-term health tracking of individuals, as well as to biometric technology based on thermal features. The main contributions of this research are (1) establishing a non-contact healthcare system to predict health status and (2) proposing a novel feature extraction method based on thermal imaging.