Deep Neural Network–Based Double-Check Method for Fall Detection Using IMU-L Sensor and RGB Camera Data

Existing methods for fall detection may not detect a fall when it occurs or may generate a false alarm when a fall does not occur. In order to overcome these limitations and detect falls with 100% accuracy, a double-check method for fall detection in elderly people via an inertial measurement unit-location (IMU-L) sensor and a red–green–blue (RGB) camera is proposed. The IMU-L sensor is a combination of an IMU sensor (accelerometer and gyroscope) and an ultrawideband signal-based location sensor; the RGB sensor is mounted on a robot. The proposed method involves detecting and confirming the fall of an elderly individual via the IMU-L sensor and an RGB image, respectively. The IMU-L sensor is worn on the body to detect falls. When a potential fall occurs, the individual’s location information is synchronized with the motion data. During detection, because of the sequential nature of IMU data, a deep learning technique called a recurrent neural network (RNN) is trained to classify falls. When the IMU indicates a suspected fall situation, the robot moves to the corresponding location and confirms whether a fall has occurred. During the confirmation stage, a convolutional neural network-based technique is applied to the RGB image data to recognize and confirm the fall. Repeated confirmed fall detections using this method classified falls more accurately than existing methods that use only an IMU sensor. We conducted a real-time experiment to validate our method using a dataset developed in a laboratory and achieved 100% accuracy in our experimental environment.


I. INTRODUCTION
A fall is a highly threatening situation for elderly people. A fall by an elderly person may cause serious injury or a life-threatening situation [1], [2]. Thus, it is not surprising that the World Health Organization (WHO) [2] reported that falls are the second leading cause of accidental death after traffic accidents and that more than 600,000 people die each year from falls. Because the proportion of elderly people worldwide is expected to increase to approximately 22% (approximately 1.0 billion to 1.2 billion people) by 2050, the problem of falls by elderly people is likely to become even more serious [3], [4]. Therefore, predicting a fall by an elderly person in advance is a very important concept. In [5]-[7], the authors presented a technique for predicting future frames of video data; this technique can help predict a fall by an elderly person. However, if a fall cannot be avoided, measures should be taken to ensure that the fall is detected and treated as quickly as possible, because elderly people cannot ask for help on their own if they are seriously injured or unconscious from a fall.
The associate editor coordinating the review of this manuscript and approving it for publication was Kathiravan Srinivasan.
Consequently, many research teams are developing fall detection systems. The typical methods for automatically detecting falls by elderly people involve using a camera or motion sensor [8]- [15]. One common method involves detecting falls using a red-green-blue-depth (RGB-D) sensor, such as a Microsoft Kinect [16]- [29]. In fall detection methods that use a camera, the system first processes RGB images or depth data to recognize a person and then determines whether a fall has occurred. In addition, fall detection systems that utilize joint information from humans have been developed using depth data. However, because this type of method uses a fixed RGB-D sensor, it can detect falls only within the field of view of the RGB-D sensor.
Another typical method for fall detection is to use a system with a motion sensor such as an inertial measurement unit (IMU). Using parameters such as velocity, angular velocity, direction, and acceleration recorded by the IMU, such systems can control instruments such as robots, track human joint information, classify behavior, or detect falls [30]-[47].
However, IMU-only systems make it difficult to take immediate action in response to a fall because the location where the fall occurred is unknown. Therefore, to solve this limitation, the authors of [48] focused on developing a technique to measure the location of a fall. In addition to these methods, the authors of [49] predicted the fall risk using a gait analysis.
However, the authors of [50]-[52] state that these existing methods can generate false alarms. For example, actions such as the IMU wearer picking up dropped objects or lying on the ground can also be misclassified as falling. In addition, the authors of [52], [53] state that wearable sensors are advantageous in terms of cost, but vision sensors are better in terms of accuracy. In [54], the authors studied a fall detection method using a fused system comprising an IMU sensor, an electroencephalograph (EEG) sensor and a fixed RGB camera. If another type of sensor is fused with a vision sensor, the fall can be verified a second time, and good performance can be achieved, but only falls occurring within the camera's field of view can be detected. This constraint requires a large number of cameras and PCs to be installed to achieve full coverage of a large space such as a nursing home, which, in turn, leads to increased costs.
To overcome these limitations, we use a method that simultaneously acquires a user's location data and motion data, such as acceleration and angle. Our system utilizes the location data to move a robot to the location where a fall may have occurred; the system then double-checks whether a fall occurred by using image data obtained by a camera mounted on the robot. This method reduces fall detection errors as follows: when the analysis of the motion data collected by the IMU-L sensor detects a fall, the robot moves to the fall location and acquires an RGB image using its RGB camera ten times and then analyzes the RGB image to confirm the fall situation.
This study uses a recurrent neural network (RNN) and a convolutional neural network (CNN) to process the motion data from the IMU-L sensor and the image data from the robot, respectively. RNNs are widely used for activity and gesture recognition [55]- [57], gait recognition [58]- [60], and natural language processing [61], [62], and CNNs are widely used for object recognition and detection [63]- [68]. Because motion sensor data are sequential data, they can be analyzed with the RNN, and because an RGB image is a single-frame picture, it can be analyzed using the CNN. Because CNN algorithms require long learning times and large datasets to obtain good results, we used a transfer learning method, which can overcome the limitations of a small dataset and reduce the learning time [69]. Unlike previous studies, during the double-check step, the user's fall is confirmed using an RGB image. Because we use a single-frame image of a person lying on the floor, continuous RGB frames are not required. This is possible because the IMU sensor data are first used to determine a probable fall; then, the double-check to ensure that a fall actually occurred is performed using the image data after moving the robot-mounted image sensor to the user's location.
The novel contributions of the study described in this paper are summarized as follows.
• In existing fall detection methods that use an IMU sensor, when a fall occurs, the detection accuracy is not 100%. We propose a double-check method that can detect 100% of falls using an IMU-L sensor and an RGB camera mounted on a mobile robot. In this double-check method, when a fall is detected using an IMU-L sensor, the robot moves to the corresponding position and checks again for a fall using a mounted RGB camera. RGB images are acquired and analyzed ten times, and the accuracy is improved by deriving a label with a high probability. We use a robot that can move at a speed of 1 m/s. By achieving fall detection using RGB images within 2 seconds per image, the latency to recheck the fall is minimized.
• We compared the performance of RGB-based fall detection using various CNN algorithms used in the ImageNet Challenge (ILSVRC) [74], including VGGNet, ResNet, DenseNet, AlexNet, SqueezeNet, and Inception v3; then, we used the best CNN architecture to build the secondary fall detection algorithm.
The remainder of this paper is organized as follows. In section II, we present related work on fall detection methods. In section III, we describe the systems and environments employed in this study. In section IV, we describe the data set utilized in this study and the algorithm selected to analyze it. In section V, we present the results of experiments conducted using the datasets and algorithms described in the previous section. Section VI provides conclusions, and section VII describes future works.

II. RELATED WORKS
In this section, we present a compilation of related work on methods for detecting falls. The section is divided into three parts, which describe the approaches that use an RGB-D camera, an IMU sensor, and multimodal sensors.
VOLUME 9, 2021

A. APPROACHES USING THE RGB-D CAMERA
Fall detection using an RGB-D camera is a typical approach.
In [8], the authors extracted the fall pose and general pose from the depth image obtained by an RGB-D camera and classified it using a support vector machine (SVM). The accuracy of each pose classification was improved by calculating the distance between a person's center point and a floor surface. They achieved 100% recall.
The authors of [23] create a bounding box using the depth information of a person obtained from an RGB-D sensor and detect a fall using the change in the width and/or depth of the bounding box. They achieved 83.0% and 87.0% for recall and accuracy, respectively.
In [28], the authors measured the direction of the human body by connecting several joints (head, center of shoulder, spine, hip and mean point of the knee) using the human joint information measured by an RGB-D sensor. Afterward, a fall was detected by measuring the angle between the direction of the human body and the floor and the speed of change. They achieved 92.5%, 100%, and 95.8% for recall, precision, and accuracy, respectively.

B. APPROACHES USING AN IMU SENSOR
Fall detection using an IMU sensor is another representative approach.
The authors of [32] used IMU sensors (accelerometers and gyroscopes) to detect falls. They collected 4 types of falls (forward, backward, lateral and failure to sit on a chair) and 6 types of activity of daily living (ADL) data (sitting on a chair, getting up from a chair, lying down on a bed, getting up from a bed, walking and walking along a stairway). The authors detected a fall, possible fall, and ADL using sensors worn on the waist. A certain period of time after a fall was detected, they classified fall and possible fall using the pose information of sensors worn on the waist and ankle. In this way, the authors were able to reduce the false alarms of fall detection. They achieved 95.6% recall.
In [46], the authors obtained acceleration and angular velocity using an IMU sensor worn on the waist. They collected data on nine types of falls (slipping-backward, walking-tripping-forward, jogging-tripping-forward, sittingdown-backward, sitting-backward, forward, backward, lateral and twist) and 14 types of non-falls (walking, jogging, squatting, waist bending, stumbling while walking, jogging in place, jumping, ascending and descending stairs, slowly sitting up on a stool, quickly sitting up in chair, trying to get up and collapsing into a chair, lying, slowly sitting up on a low-height mattress and quickly sitting up in a low-height mattress) and designed a classifier using an artificial neural network (ANN). The authors achieved 99.86% classification accuracy for falls and non-falls.
Representative fall detection technologies that use IMU sensors commercialized in recent years include "Apple Watch 4 (or later version), Apple, Inc. [75]" and "Galaxy Watch 3, Samsung, Inc. [76]". These devices detect a fall and generate an alarm if the user of the device does not move for a certain period of time after a strong shock is delivered to the user while wearing the watch.
There are other studies on fall detection using smartwatches. In [47], the authors proposed an application that detects falls using accelerometer data collected from commercially available smartwatches. They utilized Microsoft Band 2 to collect 7 types of activities of daily living data (sitting, getting up, jogging, throwing an object, waving, taking a drink, and going upstairs and downstairs) and 4 types of fall data (forward, backward, left-side, and right-side). They classified data classes using naive Bayes, SVM, and GRU models and obtained 100% recall and 85% accuracy when using the GRU.

C. APPROACHES USING THE MULTIMODAL SENSORS
Research has also been conducted using multimodal sensors to reduce false alarms and increase measurement accuracy. The multimodal sensor method has the advantage of improving the accuracy of fall detection by redundant inspection of falls. Alternatively, the detection area that cannot be detected using only one sensor is supplemented with different sensors.
In [11], the authors conducted a study to detect falls using an accelerometer and an RGB-D sensor. In this study, the accelerometer data were used by the fuzzy engine to infer motion, and the depth data of the RGB-D sensor were used by the fuzzy engine to infer posture. In addition, inference is not performed on every frame but is executed when the results of the threshold analysis of the accelerometer data predict a possible fall. They achieved 100%, 93.75%, and 97.14% for recall, precision, and accuracy, respectively.
In [54], the authors collected and published data on six types of ADL (walking, standing, picking up something, sitting, jumping and laying) and five types of falls (forward using hands, forward using knees, backward, sitting in a chair and sideward). A total of 17 healthy subjects participated in this experiment. They simultaneously collected data using an accelerometer, gyroscope, light value sensor, electroencephalography (EEG) headset, and RGB cameras, and validated their performance using machine learning models. The authors combined and verified the data obtained from each sensor in seven combinations of modalities, and the best results were obtained when the video frames acquired by the RGB camera were input and verified by a classifier using the CNN algorithm (71.3% recall, 71.8% precision and 95.1% accuracy).

III. MATERIALS
In this section, the IMU-L sensor, robot, and RGB camera used to detect falls are described. In addition, we also describe the environment in which the system is installed and the characteristics of the software to run them.

A. IMU-L SENSOR SETUP AND MOTION DATA ACQUISITION
The IMU-L sensor used in our system uses an ultrawideband (UWB) signal to determine the location of the sensor worn by the user. This IMU-L sensor was developed at the Center for Healthcare Robotics of the Gwangju Institute of Science and Technology (GIST, Republic of Korea). In this sensor, the DWM1000 module from Decawave Corp. was used to transmit/receive UWB signals, and the MPU9250 from TDK Invensense Corp. was used as the IMU sensor.
Four anchors and one master anchor must be installed for location measurement. The master anchor is responsible for controlling the blink rate of the two IMU-L sensors and sending the collected ranges of anchors 1-4 to a personal computer (CPU: Intel Core i5-7260U; RAM: 8 GB; SSD: 120 GB) via a serial port. Then, the PC computes the location of the sensor based on the multilateration method using the strength of each sensor's signal that reaches each anchor. The IMU-L sensor data were acquired at a rate of 16 Hz, and the sensor signal has a stable reach of up to 30 m. In this study, the size of the space consisting of 4 anchors installed to collect IMU-L sensor data is 9.5 m × 6 m, and 5 anchors (including a master anchor) are installed at a height of 2.6 m from the ground. The error range of the positional accuracy of the IMU-L sensor within this space is ±20 cm. IMU-L sensors are mounted on both the user and the robot. Fig. 1 shows the setup of the IMU-L sensor system. The users wear the IMU-L sensor on their shoulder. Although other studies have explored activity recognition with IMU sensors worn at various positions, such as on the arms, torso, and legs, the optimal position for IMU sensors has not yet been defined; however, we found the shoulder to be a good sensor position in our experiments because less physical interference occurred when a user fell than when the sensor was worn at any other position. Fig. 2 shows the IMU-L sensor and a user wearing it.
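The multilateration step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that anchor-to-sensor ranges are already available on the PC, works in two dimensions (ignoring the 2.6 m anchor height), and places four hypothetical anchors at the corners of the 9.5 m × 6 m space. The function name `multilaterate_2d` and the anchor coordinates are illustrative assumptions.

```python
import math


def multilaterate_2d(anchors, ranges):
    """Least-squares 2-D position from anchor coordinates and measured ranges."""
    (x0, y0), d0 = anchors[0], ranges[0]
    # Subtracting the first range equation from the others removes the
    # quadratic terms and leaves a linear system A @ [x, y] = b.
    rows, rhs = [], []
    for (xi, yi), di in zip(anchors[1:], ranges[1:]):
        rows.append((2.0 * (xi - x0), 2.0 * (yi - y0)))
        rhs.append(d0**2 - di**2 + xi**2 - x0**2 + yi**2 - y0**2)
    # Solve the 2x2 normal equations (A^T A) p = A^T b in closed form.
    s11 = sum(a * a for a, _ in rows)
    s12 = sum(a * b for a, b in rows)
    s22 = sum(b * b for _, b in rows)
    t1 = sum(a * r for (a, _), r in zip(rows, rhs))
    t2 = sum(b * r for (_, b), r in zip(rows, rhs))
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("anchor geometry is degenerate (collinear anchors)")
    x = (s22 * t1 - s12 * t2) / det
    y = (s11 * t2 - s12 * t1) / det
    return x, y


# Hypothetical anchor layout: the corners of the 9.5 m x 6 m space.
ANCHORS = [(0.0, 0.0), (9.5, 0.0), (9.5, 6.0), (0.0, 6.0)]
true_pos = (3.0, 2.0)
ranges = [math.dist(a, true_pos) for a in ANCHORS]  # noise-free ranges
estimate = multilaterate_2d(ANCHORS, ranges)
```

With noise-free ranges the estimate coincides with the true position; with real measurements the least-squares solution spreads the ranging error across the anchors, which is why more than the minimum number of anchors is useful.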
Previous research on fall detection using only motion data, such as acceleration or angular velocity data from an IMU sensor, has shown that actions such as picking up an object from the floor can be misclassified as a fall [51]. Similarly, actions such as lying on the ground or jumping can also be detected as falls. When the range of categories detected by an IMU-L sensor is small, a high possibility exists that other common actions will be erroneously detected as falls. Therefore, to overcome these limitations, we propose applying the IMU-L data to classify the various activities that may occur in daily life based on previous research [40], [77].
We collected IMU-L data on eleven types of actions, including four types of falls, two types of lying down, jumping, standing, walking, sitting, and picking up an object from the ground, as shown in Table 1. We collected IMU-L data on the eleven activities at 100 frames per cycle because at least 100 frames of data are required during one activity cycle. IMU-L data were obtained via laboratory experiments, and five males with no abnormalities in health participated in the experiment (age: 32 ± 7, height: 172.5 ± 5.5cm, and weight: 80 ± 19kg).
Five people participated in the data collection, and each person performed each of the eleven activities 150 times. Therefore, we collected a total of 8,250 datasets (five subjects × eleven activities × 150 samples). Examples of acquisition experiments and samples of acquired IMU-L data are shown in Appendix B.

B. MOBILE ROBOT SYSTEM TO DOUBLE-CHECK FALL DETECTION
In this study, when a potential user fall is detected by IMU-L sensor data analysis, a mobile robot system is used to double-check the fall situation of the user using an RGB camera. The robot used was a Silbot-3 [78]-[80], developed and sold by Robocare Corp. of South Korea.
The Silbot-3 is fully compatible with the Robot Operating System (ROS) [81]- [83]. Because Silbot-3 uses omnidirectional wheels, it can move in any direction at a maximum speed of 1 m/s and is equipped with an ASUS Xtion Pro Live [84], [85] RGB-D sensor that is positioned 76 cm from the ground. The ASUS Xtion Pro Live sensor includes an RGB camera whose maximum image resolution is 640 × 480 pixels. We utilized the OpenNI-based image_view package in the ROS to acquire RGB images at a rate of 30 Hz.

C. DATA ACQUISITION USING THE ROBOT-MOUNTED RGB-D SENSOR
Using the robot, we acquired two categories of RGB images: "Fall" and "Non-Fall" events. In the "Fall" datasets, we increased the variety of situations by including a fallen person together with people who were standing or sitting on a chair and with several objects. Image samples for each category are shown in Fig. 3. The fall category includes images where the user is lying on the ground. This category includes both full and partial body images and contains images in various poses. In addition, to ensure the diversity of the data, the participants moved to different locations with floors of different colors and patterns, and we gathered data at a distance of 1 m to 2.5 m around the participants. The non-fall category contains images of situations where the user is not lying on the ground, such as when one or more users are standing or sitting on a chair. The RGB dataset contains images of five subjects totaling 5,000 images (five subjects × two categories × 500 samples). The subjects who participated in the RGB image data collection were the same as those who participated in the IMU-L data collection. The RGB image dataset is summarized in Table 2.

D. SYSTEM INTEGRATION USING ROS
ROS is a framework for controlling several components of a robot from a computer such as a PC. ROS consists of several independent nodes, all of which can communicate with each other through publish/subscribe messages. ROS is open source, does not require multiple PCs to operate multiple nodes, and has the advantage of being able to run each node with a different OS. There are a total of 4 independent nodes in this system, and they exchange messages with each other by registering with the ROS Master. First, the "IMU-L Sensor Node" publishes the message received from the IMU-L sensor connected to the serial port as an ROS message. The "Device Node" controls the wheels of the robot, and the "Camera Node" publishes the RGB image received from the camera connected to the robot as an ROS message. Finally, the "Fall Detection Double-check Node" detects a fall using data subscribed from the "IMU-L Sensor Node" and the "Camera Node". The ROS integration architecture developed in this study is shown in Fig. 4.

IV. METHODS
In this section, we first explain our proposed fall detection method using the IMU-L sensor and the robot's RGB sensor. We also describe how the IMU-L and RGB data are analyzed using the RNN and CNN algorithms, respectively.

A. PROPOSED METHODS TO DOUBLE-CHECK FALLS
We propose a method to double-check falls initially detected using an IMU-L sensor worn by the user with an RGB sensor mounted on a robot. The process designed to detect falls is shown in Fig. 5; the details are presented as follows:
• A subject wearing the IMU-L sensor randomly repeats the eleven actions listed in Table 1: four types of falling, two types of lying down, jumping, standing, picking up an object, sitting, and walking. Each of the five subjects repeats every action a total of 150 times, and each repetition is recorded as 100 frames of IMU-L sensor data. Each 100-frame dataset is input to the trained RNN-based fall detection model, and the results are monitored. If the user's fall is not detected, the IMU-L sensor data are used to continuously track the user's movements in real time.
• If a fall is detected in the first step, the robot moves to the corresponding area using the location information from the IMU-L sensor.
• After moving to the user's expected fall area, the robot acquires images using the RGB sensor mounted on the robot.
• The RGB images are input into the CNN algorithm and used to determine whether a fall truly occurred (double-check).
When a fall is detected in the double-check step, our system takes action such as sending a rescue signal.
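The steps above can be sketched as a short control flow. This is an illustrative sketch under stated assumptions, not the authors' code: the callables `imu_is_fall`, `move_robot_to`, `capture_image`, and `image_is_fall` are hypothetical stand-ins for the RNN classifier, the robot driver, the camera, and the CNN classifier, and the ten per-image results are combined by a simple majority vote, which is only one possible way of deriving the final label from repeated image classifications.

```python
from typing import Callable, Sequence, Tuple

Position = Tuple[float, float]


def majority_fall(labels: Sequence[bool]) -> bool:
    """Return True when more than half of the per-image labels say 'fall'."""
    return sum(labels) > len(labels) / 2


def double_check(
    imu_window: Sequence[Sequence[float]],
    position: Position,
    imu_is_fall: Callable[[Sequence[Sequence[float]]], bool],
    move_robot_to: Callable[[Position], None],
    capture_image: Callable[[], object],
    image_is_fall: Callable[[object], bool],
    n_images: int = 10,
) -> bool:
    """Return True only when both the IMU stage and the image stage agree."""
    # Step 1: IMU-based screening; the robot stays put unless a fall is suspected.
    if not imu_is_fall(imu_window):
        return False
    # Step 2: drive to the location reported by the IMU-L sensor.
    move_robot_to(position)
    # Steps 3-4: classify several single-frame images and aggregate the labels.
    labels = [image_is_fall(capture_image()) for _ in range(n_images)]
    return majority_fall(labels)


# Stub demo: the IMU stage raises a suspected fall, the camera stage rejects it.
visited = []
false_alarm = double_check(
    imu_window=[[0.0] * 6] * 100,      # 100 frames x 6 IMU channels
    position=(3.0, 2.0),
    imu_is_fall=lambda window: True,
    move_robot_to=visited.append,
    capture_image=lambda: "frame",
    image_is_fall=lambda image: False,
)
```

In the demo the robot is dispatched once, but no alarm is raised because the image stage does not confirm the fall; this is the case in which the second check suppresses an IMU false alarm.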

B. RNN NETWORK FOR CLASSIFICATION OF SEQUENTIAL MOTION DATA
The human motion data from the IMU sensor used in this research are sequential, time-dependent data. The IMU sensor collects three-axis acceleration data and three-axis gyroscopic data. Because human activities represented by the IMU sensor data are not uniform in length or in time of occurrence, we explored using the RNN family of algorithms to classify human activities. In addition, the fall detection performance of linear regression, logistic regression, and multilayer perceptron (MLP) algorithms was compared with that of the RNN family of algorithms.
Because the length of human activities varies, it is difficult to predict the exact duration of any given activity. Furthermore, the basic RNN algorithm does not fully cover long-term dependencies [86]. This means that when learning from data using a basic RNN, if the distance between the relevant information and the point used is long, the gradient gradually decreases during backpropagation, which reduces the learning ability. Therefore, we also tested improved RNN architectures such as the LSTM and GRU models to overcome this disadvantage.
An LSTM is a special type of RNN that enables learning that involves a long dependency period. An LSTM is a structure that adds a cell state to an RNN's hidden state. The cell state is updated using input and forget gate values. Because the cell state acts as a type of conveyor belt, the gradients propagate relatively well for quite some time even after the state has elapsed. A GRU performs similarly to LSTM and is designed to overcome the long-term dependency weakness similarly to the method used in an LSTM. GRU uses reset and update gate values to control the amount of information, but its structure is different from that of an LSTM. A GRU performs almost identically to LSTM but has the advantage of a much simpler structure. In this study, we compared the performances of basic RNN, Bi-RNN, LSTM, and GRU models, all of which are representative algorithms of the RNN family, to analyze user motion data. Additionally, we analyzed motion data via linear regression, logistic regression, and MLP.
The structure of the IMU-based fall detection algorithm is shown in Fig. 6. The acceleration and gyroscope values (a total of six values) form the input data to a four-layer RNN structure. The hidden state values then pass through fully connected and softmax layers for classification. In the fall detection task, correctly recognizing a fall is much more important than recognizing other activities. If a fall were misclassified as a non-fall, a serious situation could result, and the proposed double-check method would be rendered useless. Therefore, we trained the model by maximizing the fall detection accuracy, although this might cause other activities to be misclassified as falls more frequently. We gave more weight to the fall datasets during the training by using a weighted softmax cross-entropy loss, in which p_i, l, w_i, and y_i denote the output of the fully connected layer, the loss for a single dataset, the weight assigned to the dataset, and the ground-truth label, respectively. Softmax is the most commonly employed component for efficiently training a deep neural network-based classifier. Because the proposed method is a binary classification method, the weighted softmax cross-entropy loss in (2) can be transformed into (3). The p_1 and p_2 terms represent the confidence associated with fall and non-fall activities, respectively. To give more weight to the fall datasets, w_1 (weight for the fall category) should be larger than w_2 (weight for the non-fall category).
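For reference, a class-weighted softmax cross-entropy loss is commonly written as below. This is a standard textbook form that is consistent with the symbol definitions in the preceding paragraph; it is offered as an assumption and is not a verbatim reproduction of the paper's numbered equations. Here \sigma(p)_i denotes the softmax of the fully connected layer outputs, and class 1 is the fall category.

```latex
% Softmax over the fully connected layer outputs p_i
\sigma(p)_i = \frac{e^{p_i}}{\sum_{j} e^{p_j}}

% Class-weighted softmax cross-entropy loss for a single dataset
l = -\sum_{i} w_i \, y_i \, \log \sigma(p)_i

% Binary case (i = 1: fall, i = 2: non-fall)
l = -\, w_1 \, y_1 \log \sigma(p)_1 \;-\; w_2 \, y_2 \log \sigma(p)_2
```

Because y_i is one-hot, only the term of the true class is nonzero, so choosing w_1 larger than w_2 makes a misclassified fall more costly than a misclassified non-fall by the ratio w_1 : w_2.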
To determine the optimal weight, we evaluated the learning performance using leave-one-person-out cross-validation (LOPO CV). LOPO CV is a kind of k-fold cross-validation in which each fold holds out all the data from one subject.
In this study, we applied data from four subjects for training and one subject for testing. We performed a total of five cross-validations and calculated the accuracy as the average of the five resulting accuracies. As seen in Table 3, the best performance is obtained when an LSTM with a weight of 80:1 (fall:non-fall) is used. Therefore, we set the weight ratio in (3) to 80:1 for the LSTM.
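The LOPO CV procedure can be sketched as follows. This is a minimal illustration, not the authors' code: the helper `lopo_splits` and the subject identifiers 0 to 4 are hypothetical, and only the index bookkeeping is shown (model training and scoring are omitted). The sample counts follow the dataset description of five subjects × eleven activities × 150 samples.

```python
def lopo_splits(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) for each subject."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


def mean_accuracy(fold_accuracies):
    """Average the per-fold accuracies, as done after the five LOPO folds."""
    return sum(fold_accuracies) / len(fold_accuracies)


# One subject label per sample: 5 subjects x 11 activities x 150 samples.
subject_ids = [s for s in range(5) for _ in range(11 * 150)]
folds = list(lopo_splits(subject_ids))
fold_sizes = [(len(train), len(test)) for _, train, test in folds]
```

Each of the five folds trains on 6,600 samples from four subjects and tests on the 1,650 samples of the held-out subject, so no subject contributes to both sides of a split.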

C. CNN FOR CLASSIFYING RGB IMAGE DATA
We conducted a double-check to confirm a user's fall when a fall was detected from the user's IMU-L sensor data. The data used in the double-check step are RGB image data, and the process analyzes the form of a person shown in a single RGB image. A CNN algorithm is suitable for analyzing such single-frame images. Due to fall-focused training, non-fall events might be more frequently classified as falls; however, we solve this problem by using secondary fall detection based on RGB images.
In the CNN algorithm, the convolutional and pooling layers serve to extract features from the input data. A convolutional layer is an essential element that applies filters to the input data and then passes the result through an activation function.
Including pooling layers after the convolutional layers is optional. The last part of the CNN algorithm adds a fully connected layer for image classification. Between the layers that extract the image characteristics and the layers that classify the image, a flatten layer is placed to convert the extracted feature maps into a one-dimensional array. A CNN is a high-performance deep learning algorithm, but it requires a substantial amount of training data and lengthy training times. In fact, training a large CNN typically requires more than a few days, even on high-performance computers. Therefore, we performed fine-tuning, a transfer learning method, using a pretrained model to overcome these disadvantages. Fine-tuning refers to a method of adapting the architecture of a pretrained model to a new purpose (according to task-specific image data) and continuing the training from the already trained model weights, as shown in Fig. 7. Using this approach, we did not use the acquired RGB images to train a new model de novo; instead, we took a CNN model pretrained on the 1K-class ImageNet dataset [87], as distributed with PyTorch, and fine-tuned it. Class labels of the 1K-class ImageNet dataset are shown in [88]. As Fig. 8 shows, the system converges to better learning results with fine-tuning than without fine-tuning.
We performed fine-tuning using various CNN models whose performance abilities have already been demonstrated in the ImageNet Challenge. The first algorithm used was AlexNet, which is considered the first CNN model to produce meaningful results. The dropout technique used in this model has become standard practice in this field. Since AlexNet was first proposed, attempts have been made to deepen it by adding layers or to augment each individual layer via algorithms such as VGGNet and Inception. However, deeper models may experience the vanishing gradient problem, in which the gradient is not transmitted well during backpropagation. To solve this problem, the ResNet model introduced residual blocks with skip connections, and the DenseNet model introduced dense blocks that connect each layer to all subsequent layers; both allow the gradient to pass through. In this study, we attempted to improve the performance of the fall detection method using fine-tuning, which allowed us to reuse the pretrained CNN models described above.

V. EXPERIMENTAL RESULTS
In this section, we present the experimental results obtained via the proposed methods. First, we evaluated the performance of the IMU-based fall detection method using an RNN-based classifier. Then, we assessed the performance of RGB-based fall detection using various CNN algorithms and compared the results. Finally, we evaluated the proposed double-check method, which uses the best IMU-based fall detection RNN model and the best RGB-based fall detection CNN model. In all the experiments, we calculated the model accuracy, precision, sensitivity, and F1-score as [89]-[91]:
\[ \mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \]
\[ \mathrm{Precision} = \frac{TP}{TP + FP} \]
\[ \mathrm{Sensitivity} = \frac{TP}{TP + FN} \]
\[ \mathrm{F1\text{-}score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Sensitivity}}{\mathrm{Precision} + \mathrm{Sensitivity}} \]
where TP, TN, FP, and FN denote true positives (classifying a fall as a fall), true negatives (classifying a non-fall as a non-fall), false positives (classifying a non-fall as a fall), and false negatives (classifying a fall as a non-fall), respectively.
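As a small worked illustration of these four metrics, the sketch below computes them from confusion-matrix counts. The function name `classification_metrics` and the example counts are hypothetical and are not taken from the paper's tables.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, sensitivity (recall), and F1-score from raw counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    # Guard against empty denominators when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "f1": f1,
    }


# Made-up counts for illustration: 100 falls and 100 non-falls.
example = classification_metrics(tp=90, tn=80, fp=20, fn=10)
```

A classifier tuned for sensitivity drives FN toward zero at the cost of more FP, which raises sensitivity while lowering precision; this trade-off is the reason a second confirmation stage is useful.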

A. IMU-BASED FALL DETECTION
The proposed double-check method employs IMU-based fall detection as the first step. We developed four RNN-based binary classifiers (basic RNN, bi-RNN, LSTM, and GRU) and three non-RNN-based binary classifiers (linear regression, logistic regression and MLP). We input the 8,250 IMU data samples (five people × eleven activities × 150 datasets) into the binary classifiers. We fixed the number of data frames at 100. We verified the performance by LOPO CV. The IMU-based fall detection results are shown in Table 3. Among the seven binary classifiers, the LSTM-based classifier achieved the best accuracy. The GRU-based classifier yielded results similar to those of the LSTM, but its performance was slightly poorer. The linear regression, logistic regression, MLP, basic RNN and bi-RNN models exhibited poor fall detection performance. The GRU and LSTM consider more variables than do the linear regression, logistic regression, MLP, basic RNN and bi-RNN, and their structures are more complicated. Therefore, it is easier for them to classify long-term data, and they achieve better performance in IMU-based fall detection.
In IMU-based fall detection, sensitivity is the most important metric because it is much more important to classify a fall as a fall than to classify a non-fall as a non-fall. Therefore, we deliberately assigned a higher weight to falls during training, expecting that the overall accuracy would decrease as a result. Table 4 shows the accuracy, precision, sensitivity, and F1-score of the highest-accuracy experiment for each of the seven binary classifiers.
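The effect of weighting falls more heavily can be illustrated with a class-weighted binary cross-entropy. This is only a sketch under stated assumptions: the text does not specify the loss function or the weight, so the choice of cross-entropy, the function name `weighted_bce` and the value `fall_weight=5.0` are all hypothetical.

```python
import math


def weighted_bce(y_true, p_fall, fall_weight=5.0, eps=1e-12):
    """Mean binary cross-entropy with a larger weight on the fall class (label 1).

    fall_weight is a hypothetical value; the paper does not report the weight used.
    """
    total = 0.0
    for y, p in zip(y_true, p_fall):
        p = min(max(p, eps), 1.0 - eps)  # clip to keep log() finite
        if y == 1:
            total += -fall_weight * math.log(p)   # penalty for under-predicting a fall
        else:
            total += -math.log(1.0 - p)           # penalty for over-predicting a fall
    return total / len(y_true)


# A missed fall (true fall scored 0.1) versus a false alarm (non-fall scored 0.9).
missed_fall_loss = weighted_bce([1], [0.1])
false_alarm_loss = weighted_bce([0], [0.9])
print(missed_fall_loss > false_alarm_loss)
```

With such a loss, a missed fall costs `fall_weight` times as much as an equally confident false alarm, which pushes the classifier toward high sensitivity at the expense of precision.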
We were able to achieve a sensitivity of 100% using the GRU-based and LSTM-based classifiers, of which the LSTM-based classifier achieved the higher accuracy. On the other hand, the precision was only 45.17%, which means that 54.83% of the samples that the model classified as falls were not actual falls. Consequently, a fall detection system developed using only an IMU-based method will waste time and energy because of false alarms. To overcome this problem, we use IMU-based fall detection only for the first check. When a fall is detected using the IMU-based fall detection model, the system does not activate an alarm immediately. The final decision rests with the secondary fall detection model, which uses the RGB data and the CNN-based classifier.

B. RGB-BASED FALL DETECTION
After performing IMU-based fall detection, we use an RGB image to conduct the second step in our double-check method. We developed a classifier that differentiates between fall and non-fall images by testing six CNN-based models: Inception-v3, AlexNet, SqueezeNet, DenseNet, VGGNet, and ResNet. Specifically, we used the DenseNet-121, DenseNet-161, and DenseNet-203 models; the VGG-11, VGG-13, VGG-16, and VGG-19 models; and the ResNet-18, ResNet-34, ResNet-50, ResNet-101, and ResNet-152 models. We used a total of 5,000 RGB image samples (five subjects × two states × 500 samples) to train and validate the CNN models. CNN-based fall detection was also evaluated by LOPO CV, in which 4,000 samples were used for training and 1,000 samples (from the subject not included in the training set) were used for testing. The entire process was repeated five times (once for each of the five subjects). The fall detection results based on the RGB images are shown in Table 5.
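The LOPO CV split described above can be sketched as follows. The function name `lopo_splits` and the integer subject labels are illustrative assumptions; only the sample counts (five subjects, 1,000 images each) come from the text.

```python
def lopo_splits(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) for leave-one-person-out CV."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


# Five subjects x two states x 500 images = 5,000 samples.
# Only the subject label of each sample matters for the split.
subject_ids = [subject for subject in range(5) for _ in range(2 * 500)]

folds = list(lopo_splits(subject_ids))
for held_out, train, test in folds:
    print(f"held-out subject {held_out}: {len(train)} train / {len(test)} test")
```

Because every image of the held-out subject is excluded from training, each fold measures how well the model generalizes to a person it has never seen.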
Regardless of which model we employed, the accuracy was approximately 90% or higher, and DenseNet-121 achieved the best result, with an accuracy of 97.72%. DenseNet performed better than the Inception, AlexNet and SqueezeNet models, likely because its dense connections pass the feature maps of each layer to all subsequent layers, which improves feature reuse and gradient flow. Specifically, DenseNet-121 exhibited excellent performance; it is shallower than the other DenseNet models and is therefore less prone to vanishing gradients.

C. DOUBLE-CHECK FALL DETECTION
To evaluate the proposed double-check method, we used data samples that had not been used previously. First, 1,650 IMU sensor samples (five subjects × eleven activities × 30 samples) were used to detect fall and non-fall events. In each case, the RGB data were obtained along with the IMU sensor data. Fall detection using RGB images is the final step in detecting falls in the proposed method. Therefore, fall detection using RGB images was performed ten times to minimize errors at this stage. This was possible because each RGB-based detection requires only a single RGB image.
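The two-stage decision can be sketched as below, assuming the ten RGB results are combined by keeping the single most confident prediction, which is the selection rule the text reports for the repeated check. The function names, the `(label, probability)` representation and the example probabilities are hypothetical.

```python
def most_confident_label(predictions):
    """Return the label of the (label, probability) pair with the highest probability."""
    if not predictions:
        raise ValueError("at least one RGB prediction is required")
    label, _prob = max(predictions, key=lambda pair: pair[1])
    return label


def double_check(imu_detects_fall, rgb_predictions):
    """Raise an alarm only if the IMU stage flags a fall and the RGB stage confirms it."""
    if not imu_detects_fall:
        return False  # the RGB check is never run without an IMU alarm
    return most_confident_label(rgb_predictions) == "fall"


# Hypothetical example: nine confident "fall" results and one weaker "non-fall" result.
rgb_predictions = [("fall", 0.97)] * 9 + [("non-fall", 0.62)]
alarm = double_check(True, rgb_predictions)

# Hypothetical IMU false alarm that the RGB stage rejects.
rejected = double_check(True, [("non-fall", 0.91)] * 10)
print(alarm, rejected)
```

A single low-confidence misclassification among the ten images does not change the outcome under this rule, which is why repeating the check can remove isolated RGB errors.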
When using the IMU data to detect a fall, we deliberately biased the classifier toward detecting falls so that no fall would be missed; therefore, cases occurred in which a non-fall situation was detected as a fall. As shown in Table 6, when only the IMU data were employed, 662 of the 1,050 non-fall situations were detected as fall situations (FP).
We solved this problem by applying the proposed method. Under the double-check fall detection method, when a fall was detected from the IMU sensor data, an RGB image was used to double-check whether a fall had occurred. Since RGB-based fall detection is performed only when a fall is detected in IMU-based fall detection, a total of 1,262 RGB-based fall detections were performed, and the results are shown in Table 7. However, as shown on the left side of Table 7, when fall detection using the RGB image was performed only once, 31 falls were misclassified as non-fall events even though they had been detected from the IMU sensor data. In addition, 11 cases were classified as falls even though no falls had actually occurred. To solve this problem, we performed fall detection using the RGB image ten times and then selected the detection result with the highest probability. As a result, we were able to eliminate misdetection using RGB images, as shown on the right side of Table 7. In addition, we were able to use this double-check method to reduce the numbers of false negatives and false positives from the IMU data to 0, as shown in Table 8. Using the double-check method, we were able to increase the fall detection accuracy achieved with the IMU sensor data from 59.88% to 100.0% in our experimental environment.
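The reported accuracies can be cross-checked from the counts given in the text. The sketch below is an arithmetic check rather than the authors' evaluation code; the derived counts (600 falls, 388 true negatives, zero false negatives) are inferred here from the stated totals and are not quoted from Tables 6-8.

```python
# Counts stated in the text for the double-check experiment.
total_samples = 1650        # five subjects x eleven activities x 30 samples
non_fall_samples = 1050     # non-fall situations
imu_false_positives = 662   # non-falls flagged as falls by the IMU stage
rgb_checks = 1262           # one RGB check per IMU fall alarm

# Derived counts (inferred, not quoted from the tables).
fall_samples = total_samples - non_fall_samples                # 600
imu_true_positives = rgb_checks - imu_false_positives          # 600
imu_false_negatives = fall_samples - imu_true_positives        # 0
imu_true_negatives = non_fall_samples - imu_false_positives    # 388

# IMU-only accuracy.
imu_accuracy = (imu_true_positives + imu_true_negatives) / total_samples

# After the double check, all 662 false alarms are rejected and all falls confirmed.
double_check_accuracy = (imu_true_positives + non_fall_samples) / total_samples

print(f"IMU only:     {imu_accuracy:.2%}")
print(f"Double check: {double_check_accuracy:.2%}")
```

The IMU-only value agrees with the 59.88% reported above, and the zero false-negative count is consistent with the 100% sensitivity of the first stage.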

VI. CONCLUSION
In this paper, we propose a double-check method using an IMU-L sensor and a mobile robot. Since previous studies could not perfectly detect falls by using only an IMU sensor attached to the body, we attempted to overcome this problem by adding a location sensor to the wearable sensor and an RGB sensor to the mobile robot. We adopted the LSTM model to analyze the IMU data and selected the DenseNet-121 model to analyze the RGB data because these models achieved the best performance. In IMU-based fall detection, we maximized the sensitivity by using a larger weight for the fall data than for the non-fall data. Although some non-fall activities were consequently classified as falls from the IMU data, the mobile robot could move to the site of the suspected fall and confirm whether a fall had occurred with great accuracy. As a result, we achieved perfect performance in detecting falls in our experimental environment. The proposed method increases the cost and time of fall detection but could minimize the occurrence of false alarms and maximize the fall detection precision. Our fall detection system can be applied to large spaces for elderly care, such as nursing hospitals and health centers, where robots can easily move. Multiple robots can be deployed to cover each floor in these places. Furthermore, the system may also be installed in a one-story house with elderly occupants. Indoor care robots can widen their roles to address emergency situations by applying our system.

VII. FUTURE WORKS
Since we tested our method in a limited laboratory environment, we were able to obtain optimal results. In a less-constrained environment, the precision may be lower. We expect follow-up studies to show that considering additional features, such as time or spatial information, can overcome the limitations imposed by real-world environments. Furthermore, the risk may be very low if a person is lying on a bed or sofa or if a person stands up and walks after a fall is detected. If the robot is dispatched to reconfirm the fall even in these less dangerous cases, many unnecessary costs may be incurred. We will reduce costs and improve efficiency by applying a fall detection method that includes this situational information.

APPENDIX A
The proposed fall detection system has been demonstrated in several places, such as laboratories, aisles, conference rooms, offices, nursing hospitals and health centers. Our system successfully detected human falls in all of these places, which indicates that the system can work in environments and with backgrounds that were not included in the training data. Fig. 9 shows demonstration examples of the proposed fall detection system.

APPENDIX B
The participants in this experiment were five healthy adult males. Data on four types of falls (forward, backward, falling on the knees and falling on the hip) and seven types of non-falls (lying supine, lying prone, picking up an object, standing, walking, jumping and sitting) were collected. Examples of the acquisition experiments and samples of the collected IMU-L data are shown in Fig. 10 and Fig. 11, respectively.

FIGURE 11. Examples of IMU-L sensor data.