Internet of Things and Deep Learning Enabled Elderly Fall Detection Model for Smart Homecare

Recently, Internet of Things (IoT) and mobile communication techniques have been developed to gather human and environmental data for a variety of intelligent services and applications. Remote monitoring of elderly and disabled people living in smart homes is highly challenging because of accidents, such as falls, that may occur during daily activities. For elderly people, falls are a major cause of death from post-traumatic complications. Early identification of falls among elderly people in smart homes is therefore needed to increase their survival rate or to offer the required support. The advent of artificial intelligence (AI), IoT, wearables, smartphones, and related technologies makes it feasible to design fall detection systems for smart homecare. In this view, this paper presents an IoT-enabled elderly fall detection model using an optimal deep convolutional neural network (IMEFD-ODCNN) for smart homecare. The goal of the IMEFD-ODCNN model is to enable smartphones and intelligent deep learning (DL) algorithms to detect the occurrence of falls in the smart home. First, the input video captured by the IoT devices is pre-processed through resizing, augmentation, and min-max normalization. Next, the SqueezeNet model is employed as a feature extractor to derive appropriate feature vectors for fall detection, and its hyperparameters are tuned using the salp swarm optimization (SSO) algorithm. Then, a variational autoencoder (VAE) whose parameters are adjusted with the sparrow search optimization algorithm (SSOA), called the SSOA-VAE classifier, is employed to classify fall and non-fall events. Finally, when a fall event is detected, the smartphone sends an alert to the caretakers and hospital management. The performance of the IMEFD-ODCNN model is validated on the UR Fall Detection dataset and the Multiple Cameras Fall dataset.
The experimental outcomes highlight the promising performance of the IMEFD-ODCNN model over recent methods, with maximum accuracies of 99.76% and 99.57% on the Multiple Cameras Fall and UR Fall Detection datasets, respectively.


I. INTRODUCTION
In recent years, the Internet of Things (IoT) and mobile communication have found useful applications in the healthcare sector. With improved healthcare systems in several countries, average life expectancy has increased considerably. Combined with lower natural population growth, this results in an elderly population that needs appropriate care and more attention. However, in several countries, offering appropriate care can be challenging for several reasons. Impaired and elderly populations will soon live in smart homes [1], [2], which offer a pleasant and safe place for the elderly. For independent living, security is considered the main concern of the smart healthcare model [3]. However, emergency incidents will continue to occur in the daily lives of seniors. Falling is the most common problem encountered by elderly people. For older adults, a fall can be highly risky and may cause serious health issues. Additionally, loss of balance and falls may be symptoms of a life-threatening disease. Regardless of its cause, a fall can be critical, and the injured person must obtain quick help. Frequently, the individual may be unable to get up without support and may require immediate medical attention. Falls that go unreported can lead to injuries that would have required earlier treatment. Fear of falling increases the negative post-fall effects and may decrease patient confidence [4]. Consequently, it limits the patient's activities, decreases social interaction, and can finally cause depression [5], [6]. Prompt fall detection, in turn, helps decrease treatment costs and raise the chance of recovery.
The associate editor coordinating the review of this manuscript and approving it for publication was Chi-Hua Chen.
VOLUME 9, 2021 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
In [7], researchers divided fall detection systems into three classes: camera-based, wearable-device-based, and ambient-sensor-based systems. Systems based on wearable devices appear to be the most common, as they can identify a fall precisely regardless of the patient's location (i.e., outdoor or indoor) and do not intrude on the person's privacy or day-to-day activities. Because of their resource limitations (for example, limited storage capacity and power), an innovative scheme is needed that reduces the heavy computational load on wearable sensor nodes while preserving or enhancing the quality of service (QoS).

A. NEED FOR IoT-ENABLED AI TECHNIQUES FOR FALL DETECTION
The independent life of an elderly person can change significantly after a fall. Depending on their health state, nearly ten percent of elderly people who fall suffer severe injuries, or may even die shortly after the fall when no immediate aid is provided [8], [9]. To prevent the serious effects of falls, reliable fall detection is required. The most popular method for detecting falls is a wrist-worn detection system that measures acceleration forces. These wrist devices are gaining interest among the population and are becoming increasingly powerful computationally, which makes the use of AI feasible. Generally, elderly people appear willing to use these devices, although they express concern about privacy and want to know exactly when the device is processing data [10]. Various fall detection methods have been presented in recent years. They range from simple threshold-based techniques, to machine learning (ML) techniques based on handcrafted features, and finally to deep learning (DL)-based neural networks (NNs) with automated feature extraction.
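To make the simplest class of methods concrete, the sketch below shows a threshold-based detector for wrist accelerometer data. It is a minimal sketch under stated assumptions: each sample is an (ax, ay, az) triple in units of g, and the free-fall threshold `FREE_FALL_G`, the impact threshold `IMPACT_G`, and the look-ahead `window` are hypothetical values chosen for illustration, not parameters taken from this paper or from [10].

```python
import math

# Hypothetical thresholds (in g), chosen only for illustration.
FREE_FALL_G = 0.4   # magnitude below this suggests a free-fall phase
IMPACT_G = 2.5      # magnitude above this suggests an impact


def magnitude(sample):
    """Euclidean norm of one (ax, ay, az) accelerometer sample."""
    ax, ay, az = sample
    return math.sqrt(ax * ax + ay * ay + az * az)


def detect_fall(samples, window=10):
    """Return True if a free-fall phase is followed by an impact
    within `window` subsequent samples."""
    mags = [magnitude(s) for s in samples]
    for i, m in enumerate(mags):
        if m < FREE_FALL_G:
            # Look ahead for an impact spike shortly after the free fall.
            if any(v > IMPACT_G for v in mags[i + 1:i + 1 + window]):
                return True
    return False


standing = [(0.0, 0.0, 1.0)] * 20
fall = ([(0.0, 0.0, 1.0)] * 5 + [(0.1, 0.1, 0.2)] * 3
        + [(1.5, 1.2, 2.0)] + [(0.0, 0.0, 1.0)] * 5)
print(detect_fall(standing), detect_fall(fall))  # False True
```

Fixed thresholds like these are cheap enough for a wearable node, but they are sensitive to sensor placement and user behaviour, which is what motivates the learned approaches discussed next.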
IoT is a suitable candidate for such systems since it encompasses a broad range of technologies, such as wireless sensor networks (WSNs), cloud computing (CC), and sensing, to interconnect physical objects with virtual ones. Gateways can execute complex fall detection techniques such as the discrete wavelet transform or data mining. Additionally, a smart gateway helps enhance QoS by offering advanced services, e.g., local storage for temporary data or push notifications to report anomalies in the real world. IoT is expected to help reduce the power consumption of wearable devices by allocating tasks to other nodes. However, IoT cannot always guarantee a high level of energy efficiency in wearable devices. Another major issue is that data acquisition and transmission cause high energy consumption in wearable sensor nodes, which must be carefully considered. If a wearable sensor node is energy inefficient, it may become unreliable and decrease QoS.
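As an illustration of the kind of lightweight signal processing a gateway might run, the sketch below computes one level of a discrete wavelet transform over acceleration magnitudes. The Haar wavelet, the single decomposition level, and the function name `haar_dwt` are assumptions made for illustration; the text does not specify which wavelet the cited gateways use.

```python
import math


def haar_dwt(signal):
    """One level of the orthonormal Haar discrete wavelet transform.
    Returns (approximation, detail) coefficients; the length must be even."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = 1.0 / math.sqrt(2.0)
    approx = [s * (signal[k] + signal[k + 1]) for k in range(0, len(signal), 2)]
    detail = [s * (signal[k] - signal[k + 1]) for k in range(0, len(signal), 2)]
    return approx, detail


# Acceleration magnitudes (g): steady, then a sharp spike such as an impact.
acc = [1.0, 1.0, 1.0, 1.0, 1.0, 3.0, 1.0, 1.0]
approx, detail = haar_dwt(acc)
# Large |detail| coefficients flag abrupt changes worth inspecting further.
print([round(d, 3) for d in detail])  # [0.0, 0.0, -1.414, 0.0]
```

Because the Haar transform is orthonormal, the signal energy is preserved across the approximation and detail coefficients, so the detail energy can be compared directly against a budget on the gateway.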

B. PAPER CONTRIBUTIONS
This paper presents an intelligent IoT-enabled elderly fall detection model using an optimal deep convolutional neural network (IMEFD-ODCNN) for smart homecare. In the initial stage, the input video captured by the IoT devices is pre-processed through resizing, augmentation, and min-max normalization. Next, the SqueezeNet model is used as a feature extractor, and its hyperparameters are tuned using the SSO algorithm. Furthermore, a sparrow search optimization algorithm (SSOA) with a variational autoencoder (VAE), called the SSOA-VAE classifier, is employed. The SSO algorithm is preferred owing to its high efficiency, robustness, accuracy, and convergence rate. The VAE is chosen because of its capability to learn smooth latent state representations of the input data. Lastly, when a fall event is detected, the smartphone sends an alert to the caretakers and hospital management. An extensive set of simulations is carried out on the UR Fall Detection dataset and the Multiple Cameras Fall dataset. The key contributions of the paper are as follows.
• Propose a novel IMEFD-ODCNN model for elderly fall detection in smart homecare.
• Develop a SqueezeNet-based feature extractor whose hyperparameters are tuned with the SSO algorithm to generate a useful set of feature vectors.
• Design an SSOA-VAE-based classification model to detect the occurrence of fall and non-fall events.
• Enable the smartphone to generate an alert to the caretakers and hospital authority when a fall event occurs.
• Validate the fall detection performance of the IMEFD-ODCNN model on the UR Fall Detection dataset and the Multiple Cameras Fall dataset.

C. PAPER ORGANIZATION
The rest of the paper is organized as follows. Section II reviews existing fall detection approaches, and Section III describes the overall system architecture. Section IV then explains the different modules of the proposed IMEFD-ODCNN model. Section V assesses the experimental results, and Section VI draws the concluding remarks.

II. LITERATURE REVIEW
Hussain et al. [11] presented a wearable-sensor-based continuous fall monitoring scheme that can detect falls and identify fall patterns and the activities related to fall incidents. In [15], an energy-efficient sensing module was designed that senses and stores information on human activities, waking from sleep mode when needed, and an interrupt-driven technique was presented for transmitting the data to a server over Zigbee. Next, an FD-DNN running on the server was carefully designed to detect falls accurately. The FD-DNN, which integrates CNN and LSTM techniques, was verified using offline and online datasets. In Kong et al. [16], a HOG-SVM-based IoT fall detection scheme for older adults was presented. To ensure privacy and robustness to changes in light intensity, a depth sensor is used instead of an RGB camera to obtain binary images of older adults. After the denoised binary images are obtained, the person's features are extracted using the histogram of oriented gradients (HOG), and a linear SVM classifies the images to judge the fall condition.
Carletti et al. [17] proposed a new smartphone-based fall detection scheme that treats falls as anomalies with respect to a model of usual events. The technique was compared with other methods and shown to be suitable for running on a smartphone placed in a trouser pocket, as established by the attained accuracy and the required hardware resources. Mrozek et al. [1] proposed a scalable framework that can monitor thousands of elderly people, detect falls, and notify caretakers. Scalability tests were executed to reveal the requirements for enabling large-scale processing. Furthermore, they validated various ML models to evaluate their suitability for the detection procedure. Among the tested models, the Boosted Decision Tree achieved the best classification performance.
To improve classification accuracy, data from smartphones and smartwatches are integrated in [18]. Since few publicly available datasets combine smartphone and smartwatch data, the data were collected independently. A decision tree (J48) classifier is used to classify falls. Gia et al. [19] proposed lightweight, tiny, energy-efficient, and flexible wearable devices. Although several methods are available in the literature, the impact of distinct variables (for example, transmission protocol, communication bus interface, transmission rate, and sampling rate) on the energy consumption of the wearable device needed to be examined. Additionally, they provided a complete analysis of the wearable's energy consumption under different configurations and operating conditions. They also gave software and hardware recommendations for implementing an optimal wearable device for IoT-based fall detection systems with high QoS and energy efficiency.

III. SYSTEM ARCHITECTURE
The overall system architecture of the proposed model is depicted in Fig. 1. The proposed fall detection model uses a smartphone for processing. The IMEFD-ODCNN model allows smartphones and intelligent DL algorithms to detect the occurrence of falls in the smart home. It involves distinct stages of operation: data acquisition, pre-processing, SqueezeNet-based feature extraction, SSO-based parameter tuning, and SSOA-VAE-based classification. First, the input videos are captured and sent to the cloud server for further processing, where the proposed model is executed. Then, the videos are split into frames, which are pre-processed at three major levels (resizing, augmentation, and normalization) to enhance their quality. Afterwards, useful feature vectors are extracted from the video frames using the SqueezeNet model, whose hyperparameters are tuned with the SSO algorithm. Subsequently, the feature vectors are fed into the SSOA-VAE classifier to detect the occurrence of falls. According to the classification outcome, the following actions are taken:
• When an event is detected as a fall (class 1), an alarm is transmitted to the patient's device, from which the caretaker is notified automatically unless the monitored person dismisses the event in the application.
• When an event is detected as a non-fall (class 0), no alarm is transmitted and the event is discarded.
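The post-classification rule above can be sketched as a small dispatch function. The names `handle_event`, `notify_caretaker`, and `dismissed_by_user` are hypothetical; the paper does not specify how the alarm is delivered or how the monitored person cancels it, so both are modeled here as plain callables.

```python
from typing import Callable

FALL, NON_FALL = 1, 0  # class labels produced by the classifier


def handle_event(event_id: str, label: int,
                 notify_caretaker: Callable[[str], None],
                 dismissed_by_user: Callable[[str], bool] = lambda _id: False) -> bool:
    """Apply the post-classification rule; return True if caretakers were notified."""
    if label == NON_FALL:
        return False                 # class 0: no alarm, the event is discarded
    if dismissed_by_user(event_id):
        return False                 # the monitored person cancelled the alarm
    notify_caretaker(event_id)       # class 1: forward the alert to the caretaker
    return True


sent = []
handle_event("evt-1", FALL, sent.append)
handle_event("evt-2", NON_FALL, sent.append)
handle_event("evt-3", FALL, sent.append, dismissed_by_user=lambda e: True)
print(sent)  # ['evt-1']
```

Passing the notification channel in as a callable keeps the rule independent of whether alerts go out by push notification, SMS, or a hospital backend.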
Using backend systems, physicians and caretakers can monitor elderly people in real time from remote locations. The backend system also helps doctors treat diseases using the collected data and the patient history.

IV. WORKING PROCESS OF IMEFD-ODCNN MODEL
The overall working process of the IMEFD-ODCNN model involves the following subprocesses: data acquisition, pre-processing, SqueezeNet-based feature extraction, SSO-based parameter tuning, and SSOA-VAE-based classification. These processes are discussed in detail in the following subsections.

A. DATA PRE-PROCESSING
In the initial stage, the frames are pre-processed to improve image characteristics, remove noise artefacts, and enhance specific groups of features. The frames are processed at three major levels: resizing, augmentation, and normalization. To decrease the computational cost, the frames are resized to 150 × 150. The frames are also augmented, with the augmentations changing at every training epoch. Augmentation techniques such as zooming, horizontal flipping, rotation, and width and height shifting are used. Finally, min-max normalization is applied to enhance the generalization of the model.
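A minimal, dependency-free sketch of the three pre-processing levels is shown below for a single grayscale frame stored as a list of rows. Nearest-neighbour interpolation for resizing, and horizontal flipping as the only augmentation, are assumptions made to keep the example short; zooming, rotation, and shifting would be re-sampled per epoch in the same way. Normalization is min-max scaling to [0, 1].

```python
import random

TARGET = 150  # frames are resized to 150 x 150, as stated in the text


def resize_nearest(frame, size=TARGET):
    """Nearest-neighbour resize of a 2D grayscale frame (list of rows)."""
    h, w = len(frame), len(frame[0])
    return [[frame[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]


def hflip(frame):
    """Horizontal flip: reverse every row."""
    return [row[::-1] for row in frame]


def min_max_normalize(frame):
    """Scale pixel values to [0, 1]; a constant frame maps to all zeros."""
    lo = min(min(row) for row in frame)
    hi = max(max(row) for row in frame)
    span = hi - lo
    if span == 0:
        return [[0.0 for _ in row] for row in frame]
    return [[(v - lo) / span for v in row] for row in frame]


def preprocess(frame, training=True, rng=random):
    out = resize_nearest(frame)
    if training and rng.random() < 0.5:  # augmentation is re-drawn at every call/epoch
        out = hflip(out)
    return min_max_normalize(out)
```

For example, `preprocess(frame, training=False)` on a 300 × 200 frame returns a 150 × 150 frame with values in [0, 1]; with `training=True`, roughly half of the calls also flip the frame.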

B. SqueezeNet BASED FEATURE EXTRACTION
A CNN generally contains convolutional layers, pooling layers, and fully connected layers. Initially, features are extracted with one or more convolution and pooling layers. Then, all feature maps from the last convolutional layer are flattened into a 1D vector for the fully connected layer. Lastly, the output layer classifies the input image. The network adjusts its weights using backpropagation (BP) to minimize the squared error between the classification outputs and the expected outputs. The neurons in every layer are arranged in three dimensions (height, width, and depth), where height and width give the spatial size of the layer, and depth denotes the number of channels of the input image or the number of input feature maps. A convolutional layer has many convolution filters that extract distinct features from the image through the convolution operation. The convolution filters of the current layer convolve the input feature maps to extract local features and produce the output feature maps. Then, nonlinear feature maps are obtained with an activation function. The pooling layer, also called the subsampling layer, follows the convolutional layer. It performs downsampling, outputting a single value for each region. By removing insignificant points from the feature map, the size of the input feature map of the next layer is decreased, and the computational complexity is reduced. At the same time, the robustness of the network to image rotation and translation is increased [20]. Common pooling operations include average pooling and max pooling. An architecture built on alternating convolutional and pooling layers enhances the robustness of the network. A CNN can be deepened by stacking multiple convolutions. As the number of layers increases, the learned features become more global. Eventually, the learned global feature map is converted to a vector that is connected to the fully connected layer.
Most of the network's parameters reside in the fully connected layers.
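The convolution, activation, and pooling steps described above can be traced on a tiny example. In this sketch the 2 × 2 kernel `edge` is hand-picked to respond to dark-to-bright vertical edges (in a real CNN it would be learned by backpropagation), ReLU is assumed as the activation function, and pooling is non-overlapping 2 × 2 max pooling.

```python
def conv2d_valid(image, kernel):
    """2D 'valid' cross-correlation with stride 1 (what CNN layers compute)."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + u][j + v] * kernel[u][v]
                 for u in range(kh) for v in range(kw))
             for j in range(ow)] for i in range(oh)]


def relu(fmap):
    """Element-wise ReLU nonlinearity."""
    return [[max(0, x) for x in row] for row in fmap]


def max_pool(fmap, k=2):
    """Non-overlapping k x k max pooling; leftover rows/columns are dropped."""
    return [[max(fmap[i + u][j + v] for u in range(k) for v in range(k))
             for j in range(0, len(fmap[0]) - k + 1, k)]
            for i in range(0, len(fmap) - k + 1, k)]


image = [[0, 0, 1, 1, 0]] * 5     # a bright vertical stripe
edge = [[-1, 1], [-1, 1]]         # hand-picked vertical-edge detector
fmap = relu(conv2d_valid(image, edge))   # 4 x 4 feature map
pooled = max_pool(fmap)                  # 2 x 2 after pooling
print(pooled)  # [[2, 0], [2, 0]]
```

The 5 × 5 input becomes a 4 × 4 feature map after the valid convolution and a 2 × 2 map after pooling, which shows how pooling shrinks the input of the next layer while keeping the strongest edge response.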
Since VGGNet and AlexNet have a large number of parameters, the SqueezeNet network was proposed, which has far fewer parameters while maintaining accuracy. The fire module is the fundamental building block of SqueezeNet, and its structure is displayed in Fig. 2. It is divided into a squeeze layer and an expand layer. The 1 × 1 convolutional layer has gained considerable interest in network design. Prior work explains it from the perspective of cross-channel pooling, where an MLP is equivalent to cascaded cross-channel parametric pooling layers following a conventional convolution kernel, thereby achieving a linear combination of multiple feature maps and information integration across channels. When the numbers of input and output channels are large, the convolution kernels have many parameters. Inception modules therefore include 1 × 1 convolutions to decrease the number of input channels, which reduces the kernel parameters and the computational complexity. Finally, a 1 × 1 convolution is included to increase the number of channels and improve feature extraction. If downsampling is delayed, the convolutional layers receive large activation maps, which retain more information and can yield higher classification accuracy [27]-[29].
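The parameter savings of the squeeze-and-expand design can be checked with simple arithmetic. The sketch below counts the weights and biases of a fire module and of a plain 3 × 3 convolution producing the same number of output channels; the channel counts are illustrative assumptions, not the configuration used in this paper.

```python
def conv_params(k, c_in, c_out, bias=True):
    """Weights (+ biases) of a k x k convolution from c_in to c_out channels."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def fire_params(c_in, s1x1, e1x1, e3x3):
    """Fire module: a 1x1 squeeze layer, then parallel 1x1 and 3x3 expand
    layers whose outputs are concatenated (e1x1 + e3x3 output channels)."""
    squeeze = conv_params(1, c_in, s1x1)
    expand = conv_params(1, s1x1, e1x1) + conv_params(3, s1x1, e3x3)
    return squeeze + expand


c_in = 96                                         # illustrative input channels
plain = conv_params(3, c_in, 128)                 # plain 3x3 conv, 128 outputs
fire = fire_params(c_in, s1x1=16, e1x1=64, e3x3=64)  # also 128 outputs
print(plain, fire)  # 110720 11920
```

With these counts the fire module needs 11,920 parameters versus 110,720 for the plain convolution, roughly 9× fewer, because the 3 × 3 kernels only see the 16 squeezed channels instead of all 96 input channels.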

C. HYPERPARAMETER OPTIMIZATION USING SSO ALGORITHM
To tune the hyperparameters of the SqueezeNet model, the SSO algorithm is applied. Salps belong to the family Salpidae and have a transparent, barrel-shaped body. Their tissues are similar to those of jellyfish, and they also move like jellyfish, pumping water through the body as propulsion to move forward. Before this algorithm, no mathematical model of salp swarms had been used to solve optimization problems, whereas swarms of fish, bees, and ants had been widely modeled and applied to such problems. To model the salp chain mathematically, the population is divided into two groups: the leader and the followers. The leader is the salp at the front of the chain, and the remaining salps are called followers. As the names imply, the leader guides the swarm and the followers follow each other. As in other swarm-based methods, the position of each salp is defined in an n-dimensional search space, where n is the number of variables of the given problem. Hence, the positions of all salps are stored in a two-dimensional matrix called x. A food source called F is also assumed to exist in the search space as the swarm's target. The leader position is updated as:
$$x^1_j = \begin{cases} F_j + c_1\left((ub_j - lb_j)\,c_2 + lb_j\right), & c_3 \ge 0.5 \\ F_j - c_1\left((ub_j - lb_j)\,c_2 + lb_j\right), & c_3 < 0.5 \end{cases} \qquad (1)$$
where $x^1_j$ is the position of the first salp (the leader) in the jth dimension, $F_j$ is the position of the food source in the jth dimension, $ub_j$ and $lb_j$ are the upper and lower bounds of the jth dimension, and $c_1$, $c_2$, and $c_3$ are coefficients [21].
Eq. (1) shows that only the leader updates its position with respect to the food source. The coefficient $c_1$ is the most important parameter in SSA, since it balances exploration and exploitation; it is defined as:
$$c_1 = 2e^{-\left(\frac{4l}{L}\right)^2} \qquad (2)$$
where l is the current iteration and L is the maximum number of iterations. The parameters $c_2$ and $c_3$ are random numbers generated uniformly in [0, 1]; they determine whether the next position in the jth dimension moves towards positive or negative infinity, as well as the step size. To update the positions of the followers, Newton's law of motion is used:
$$x^i_j = \frac{1}{2}at^2 + v_0 t \qquad (3)$$
where $i \ge 2$, $x^i_j$ is the position of the ith follower salp in the jth dimension, t is time, $v_0$ is the initial speed, and $a = v_{final}/v_0$ with $v = (x - x_0)/t$. Because time in optimization corresponds to iterations, the difference between consecutive iterations is 1, and with $v_0 = 0$ the follower update becomes:
$$x^i_j = \frac{1}{2}\left(x^i_j + x^{i-1}_j\right) \qquad (4)$$
where $i \ge 2$ and $x^i_j$ is the position of the ith follower salp in the jth dimension. With Eqs. (1) and (4), the salp chain can be simulated.
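The leader and follower updates in Eqs. (1), (2), and (4) can be combined into a compact optimizer. This is a minimal sketch assuming minimization, uniform random initialization within the bounds, and clipping to [lb, ub] after each move (a common practical addition that the text does not state). The sphere function is only a stand-in objective; in the IMEFD-ODCNN setting the objective would be some validation measure of a SqueezeNet hyperparameter configuration, which the text does not detail.

```python
import math
import random


def salp_swarm(objective, lb, ub, n_salps=20, max_iter=100, seed=0):
    """Minimise `objective` over the box [lb, ub] with the salp swarm algorithm:
    Eq. (1) moves the leader, Eq. (4) moves the followers."""
    rng = random.Random(seed)
    dim = len(lb)
    salps = [[rng.uniform(lb[j], ub[j]) for j in range(dim)] for _ in range(n_salps)]
    food = min(salps, key=objective)[:]      # food source F = best position so far
    food_fit = objective(food)
    for l in range(1, max_iter + 1):
        c1 = 2 * math.exp(-(4 * l / max_iter) ** 2)   # Eq. (2)
        for i in range(n_salps):
            for j in range(dim):
                if i == 0:                             # leader, Eq. (1)
                    c2, c3 = rng.random(), rng.random()
                    step = c1 * ((ub[j] - lb[j]) * c2 + lb[j])
                    salps[i][j] = food[j] + step if c3 >= 0.5 else food[j] - step
                else:                                  # follower, Eq. (4)
                    salps[i][j] = 0.5 * (salps[i][j] + salps[i - 1][j])
                salps[i][j] = min(max(salps[i][j], lb[j]), ub[j])  # stay in bounds
        for s in salps:                                # update the food source
            fit = objective(s)
            if fit < food_fit:
                food, food_fit = s[:], fit
    return food, food_fit


sphere = lambda x: sum(v * v for v in x)
best, best_fit = salp_swarm(sphere, lb=[-5.0] * 3, ub=[5.0] * 3)
print(best_fit)
```

Because $c_1$ decays from about 2 towards 0, the leader's steps around the food source shrink over time, which is how the algorithm shifts from exploration to exploitation.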

D. FALL DETECTION USING SSOA-VAE MODEL
During the classification stage, the SSOA-VAE model is executed to determine the class labels of the input video frames, i.e., fall or non-fall event. A VAE is a variant of the autoencoder (AE) rooted in Bayesian inference. It models the underlying distribution of the observations and generates new data by introducing a set of latent random variables z. This process can be written as $p(x) = \int p(x|z)\,p(z)\,dz$. However, this marginalization is computationally intractable because the search space of z is continuous and combinatorially large. Instead, following the notation of [22], the marginal log-likelihood of an individual data point can be written as $\log p(x) = D_{KL}\left(q_\phi(z|x)\,\|\,p_\theta(z|x)\right) + \mathcal{L}_{vae}(\phi, \theta; x)$, where $D_{KL}$ is the Kullback-Leibler divergence between the variational approximation $q_\phi(z|x)$ and the true posterior $p_\theta(z|x)$, and $\mathcal{L}_{vae}$ is the variational lower bound on the log-likelihood of x obtained with Jensen's inequality. Note that ϕ and θ denote the parameters of the encoder and decoder, respectively. Fig. 3 demonstrates the structure of the VAE. A VAE optimizes ϕ and θ by maximizing the lower bound
$$\mathcal{L}_{vae}(\phi, \theta; x) = -D_{KL}\left(q_\phi(z|x)\,\|\,p_\theta(z)\right) + \mathbb{E}_{q_\phi(z|x)}\left[\log p_\theta(x|z)\right].$$
The first term regularizes the latent variable z by minimizing the KL divergence between the approximate posterior and the prior. The second term reconstructs x by maximizing the log-likelihood $\log p_\theta(x|z)$ with samples drawn from $q_\phi(z|x)$. The choice of distribution types is significant, since the VAE models the approximate posterior $q_\phi(z|x)$, the prior $p_\theta(z)$, and the likelihood $p_\theta(x|z)$. A common choice for the posterior is a Gaussian $\mathcal{N}(\mu_z, \Sigma_z)$, whereas the standard normal distribution $\mathcal{N}(0, I)$ is used as the prior. For the likelihood, a Bernoulli distribution is frequently used for binary data and a multivariate Gaussian distribution for continuous data.
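The two terms of the lower bound can be computed directly for the Gaussian posterior, standard normal prior, and Bernoulli likelihood described above. The sketch below assumes the encoder outputs a mean `mu` and a log-variance `log_var` per latent dimension and uses the reparameterization $z = \mu + \sigma\epsilon$ of the standard VAE formulation; the toy `decode` function is hypothetical. Note that this paper adjusts the VAE parameters with SSOA rather than gradient descent, and how the lower bound enters that search is not specified here.

```python
import math
import random


def kl_gaussian_std_normal(mu, log_var):
    """Closed-form D_KL(N(mu, diag(exp(log_var))) || N(0, I))."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var))


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def bernoulli_log_likelihood(x, p, eps=1e-7):
    """log p(x|z) for binary data x given Bernoulli means p from the decoder."""
    return sum(xi * math.log(pi + eps) + (1 - xi) * math.log(1 - pi + eps)
               for xi, pi in zip(x, p))


def elbo(x, mu, log_var, decode, rng):
    """Single-sample estimate of L_vae = E_q[log p(x|z)] - D_KL(q(z|x) || p(z))."""
    z = reparameterize(mu, log_var, rng)
    return bernoulli_log_likelihood(x, decode(z)) - kl_gaussian_std_normal(mu, log_var)


rng = random.Random(0)
decode = lambda z: [1.0 / (1.0 + math.exp(-z[0]))] * 4   # hypothetical toy decoder
value = elbo([1, 0, 1, 1], mu=[0.5], log_var=[0.0], decode=decode, rng=rng)
print(value)
```

The KL term is zero only when the posterior matches the prior exactly (μ = 0, σ = 1), and both terms make the bound non-positive for binary data, which is a quick sanity check on any implementation.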
To determine the parameters of the VAE (ϕ and θ), the SSOA is applied so that the fall detection performance is improved. Sparrows are generally gregarious birds with many species. They are distributed worldwide and prefer to live in areas around humans. They are omnivorous and mostly eat seeds and grains, and they are usually resident birds. Compared with many other small birds, sparrows have strong memory and intelligence. Captive house sparrows fall into two types: producers and scroungers. Producers actively search for food sources, while scroungers obtain food from the producers. Furthermore, evidence shows that the birds flexibly use behavioural strategies and switch between producing and scrounging. Studies have also shown that sparrows use either the producer or the scrounger strategy depending on circumstances. Notably, birds at the edge of the population are more likely to be attacked by predators and continuously try to obtain a better position, whereas sparrows in the center may move closer to their neighbours to reduce their risk.
Initially, virtual sparrows are used to search for food. The positions of the sparrows are represented by the matrix
$$X = \begin{bmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,d} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,d} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n,1} & x_{n,2} & \cdots & x_{n,d} \end{bmatrix}$$
where n is the number of sparrows and d is the dimension of the variables to be optimized. The fitness values of all sparrows are then given by [23]:
$$F_X = \begin{bmatrix} f\left([x_{1,1}, x_{1,2}, \ldots, x_{1,d}]\right) \\ \vdots \\ f\left([x_{n,1}, x_{n,2}, \ldots, x_{n,d}]\right) \end{bmatrix}$$
where each row of $F_X$ is the fitness value of an individual. In SSOA, producers with better fitness values have priority in obtaining food during the search. Producers are responsible for finding food and guiding the movement of the entire population, so they can search for food over a wider range than scroungers. Following the behavioural rules described above, the position of a producer is updated as:
$$X^{t+1}_{i,j} = \begin{cases} X^{t}_{i,j}\cdot\exp\left(\dfrac{-i}{\alpha\cdot iter_{max}}\right), & R_2 < ST \\ X^{t}_{i,j} + Q\cdot L, & R_2 \ge ST \end{cases}$$
where t is the current iteration, $j = 1, 2, \ldots, d$, $X^{t}_{i,j}$ is the value of the jth dimension of the ith sparrow at iteration t, $iter_{max}$ is a constant giving the maximum number of iterations, $\alpha \in (0, 1]$ is a random number, $R_2 \in [0, 1]$ and $ST \in [0.5, 1.0]$ are the alarm value and the safety threshold, respectively, Q is a random number drawn from a normal distribution, and L is a $1 \times d$ matrix whose elements are all one. When $R_2 < ST$, no predator is nearby and the producers switch to a wide search mode. When $R_2 \ge ST$, some sparrows have detected a predator, and all sparrows must fly quickly to a safe area.
Some scroungers closely follow the producers. When a producer finds good food, they leave their current location to compete for it. If they succeed, they obtain the producer's food; otherwise, they keep updating their positions with the scrounger rule.
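The position update formula for the scroungers is:
$$X^{t+1}_{i,j} = \begin{cases} Q\cdot\exp\left(\dfrac{X^{t}_{worst} - X^{t}_{i,j}}{i^{2}}\right), & i > n/2 \\ X^{t+1}_{P} + \left|X^{t}_{i,j} - X^{t+1}_{P}\right|\cdot A^{+}\cdot L, & \text{otherwise} \end{cases}$$
where $X_P$ is the optimal position occupied by a producer, $X_{worst}$ is the current global worst location, A is a $1 \times d$ matrix whose elements are randomly assigned 1 or $-1$, and $A^{+} = A^{T}(AA^{T})^{-1}$. When $i > n/2$, the ith scrounger, which has a poor fitness value, is most likely starving.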
In addition, sparrows that are aware of danger have a better chance of survival. The initial positions of these sparrows are randomly generated in the population, and their positions are updated as:
$$X^{t+1}_{i,j} = \begin{cases} X^{t}_{best} + \beta\cdot\left|X^{t}_{i,j} - X^{t}_{best}\right|, & f_i > f_g \\ X^{t}_{i,j} + K\cdot\left(\dfrac{\left|X^{t}_{i,j} - X^{t}_{worst}\right|}{(f_i - f_w) + \varepsilon}\right), & f_i = f_g \end{cases}$$
where $X_{best}$ is the current global optimal position, β is the step-size control parameter, drawn from a normal distribution with mean zero and variance one, and $K \in [-1, 1]$ is a random number. Here, $f_i$ is the fitness value of the current sparrow, $f_g$ and $f_w$ are the current global best and worst fitness values, respectively, and ε is a small constant that avoids division by zero. For simplicity, $f_i > f_g$ indicates that the sparrow is at the edge of the group, and $X_{best}$ represents the center of the population, which is safe. $f_i = f_g$ indicates that a sparrow in the middle of the population is aware of the danger and moves closer to the others. K indicates the direction in which the sparrow moves and also acts as the step-size control coefficient.
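Putting the producer, scrounger, and danger-aware rules together gives the sketch below. It is a minimal illustration assuming minimization, producers taken as the fittest 20% of the population (`pd`), danger-aware sparrows as a random 10% (`sd`), a safety threshold ST = 0.8, and clipping to the search bounds; these settings and the function name `sparrow_search` are assumptions, not values reported in the paper. In the IMEFD-ODCNN model the objective would be a fall-detection loss of the VAE, which is replaced here by a simple test function.

```python
import math
import random


def sparrow_search(objective, lb, ub, n=20, max_iter=100,
                   pd=0.2, sd=0.1, st=0.8, seed=0):
    """Minimise `objective` on the box [lb, ub] with the sparrow search rules
    (producers, scroungers, danger-aware sparrows). Indices i in the
    formulas are 1-based, hence the (i + 1) terms below."""
    rng = random.Random(seed)
    d = len(lb)
    eps = 1e-50  # small constant that avoids division by zero

    def clip(x):
        return [min(max(v, lb[j]), ub[j]) for j, v in enumerate(x)]

    X = [[rng.uniform(lb[j], ub[j]) for j in range(d)] for _ in range(n)]
    best = min(X, key=objective)[:]
    n_prod = max(1, int(pd * n))
    for _ in range(max_iter):
        X.sort(key=objective)                  # X[0] best ... X[-1] worst
        worst = X[-1][:]
        r2 = rng.random()                      # alarm value R2 in [0, 1]
        for i in range(n_prod):                # producers
            if r2 < st:                        # no predator: wide search
                alpha = 1.0 - rng.random()     # alpha in (0, 1]
                X[i] = [v * math.exp(-(i + 1) / (alpha * max_iter)) for v in X[i]]
            else:                              # predator seen: jump by Q * L
                q = rng.gauss(0.0, 1.0)
                X[i] = [v + q for v in X[i]]
            X[i] = clip(X[i])
        xp = min(X[:n_prod], key=objective)    # best producer position X_P
        for i in range(n_prod, n):             # scroungers
            if i + 1 > n / 2:                  # starving: fly elsewhere
                q = rng.gauss(0.0, 1.0)
                X[i] = [q * math.exp((w - v) / (i + 1) ** 2)
                        for v, w in zip(X[i], worst)]
            else:                              # follow the best producer
                a = [rng.choice((-1.0, 1.0)) for _ in range(d)]
                # |X_i - X_P| A+ L, where A+ = A^T (A A^T)^-1 = A^T / d for A in {-1, 1}^d
                step = sum(abs(v - p) * aj for v, p, aj in zip(X[i], xp, a)) / d
                X[i] = [p + step for p in xp]
            X[i] = clip(X[i])
        f_g = objective(best)
        f_w = max(objective(x) for x in X)
        for i in rng.sample(range(n), max(1, int(sd * n))):  # danger-aware
            f_i = objective(X[i])
            if f_i > f_g:                      # at the edge: move towards best
                beta = rng.gauss(0.0, 1.0)
                X[i] = [b + beta * abs(v - b) for v, b in zip(X[i], best)]
            else:                              # in the centre: move closer to others
                k = rng.uniform(-1.0, 1.0)
                X[i] = [v + k * abs(v - w) / ((f_i - f_w) + eps)
                        for v, w in zip(X[i], worst)]
            X[i] = clip(X[i])
        cand = min(X, key=objective)
        if objective(cand) < objective(best):
            best = cand[:]
    return best, objective(best)


sphere = lambda x: sum(v * v for v in x)
best, best_fit = sparrow_search(sphere, lb=[-5.0] * 3, ub=[5.0] * 3)
print(best_fit)
```

The global best is tracked separately from the population, so the returned fitness never gets worse even when an alarm ($R_2 \ge ST$) scatters the producers.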

V. PERFORMANCE VALIDATION
The proposed model is validated using the Multiple Cameras Fall dataset [24]. Fig. 4 shows sample test images from the dataset, and Fig. 5 illustrates the depth levels of these sample test images.
The classification results of the IMEFD-ODCNN model on the Multiple Cameras Fall dataset under varying training sizes (TS) are given in Table 1. With a TS of 40%:60%, the IMEFD-ODCNN model gained a specificity of 99.10%, precision of 99.67%, recall of 99.82%, accuracy of 99.54%, and F-score of 99.46%. With a TS of 60%:40%, the IMEFD-ODCNN technique achieved a specificity of 99.22%, precision of 99.51%, recall of 99.81%, accuracy of 99.80%, and F-score of 99.53%. Furthermore, with a TS of 80%:20%, the IMEFD-ODCNN methodology obtained a … Table 2 … Table 6 and Fig. 9 [26]. The results show that the VGG-16 technique required long training and testing times of 2352.6 s and 1108.8 s, respectively, and that the VGG-19 approach required even longer training and testing times of 2778.6 s and 1372.2 s, respectively.
The ResNet-101 method resulted in moderate training and testing times of 1545.6 s and 925.8 s, respectively. Meanwhile, the ResNet-50 and 2D Conv NN models also showed moderate training and testing times. The Depthwise model required comparatively short training and testing times of 1093.2 s and 725.4 s, respectively. However, the presented IMEFD-ODCNN methodology outperformed these models, with training and testing times of 1014 s and 677.4 s, respectively. The experimental outcomes highlight the promising performance of the IMEFD-ODCNN model over recent methods, with maximum accuracies of 99.76% and 99.57% on the Multiple Cameras Fall and UR Fall Detection datasets, respectively. The proposed model outperforms the existing methods owing to the inclusion of the SqueezeNet model and the SSO-based hyperparameter optimization process.
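The five metrics reported above follow from confusion-matrix counts, with the fall class treated as positive. The sketch below uses hypothetical counts, not data from the experiments, and the function name `fall_metrics` is illustrative. Because the F-score is the harmonic mean of precision and recall, it always lies between the two.

```python
def fall_metrics(tp, fp, tn, fn):
    """Binary classification metrics (in %) with 'fall' as the positive class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                      # sensitivity
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f_score = 2 * precision * recall / (precision + recall)
    return {name: round(100 * value, 2) for name, value in [
        ("specificity", specificity), ("precision", precision),
        ("recall", recall), ("accuracy", accuracy), ("f_score", f_score)]}


# Hypothetical counts for illustration only.
m = fall_metrics(tp=95, fp=5, tn=90, fn=10)
print(m)
# {'specificity': 94.74, 'precision': 95.0, 'recall': 90.48, 'accuracy': 92.5, 'f_score': 92.68}
```

Reporting specificity alongside recall matters for fall detection: recall captures missed falls, while specificity captures false alarms sent to caretakers.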

VI. CONCLUSION
This paper has designed a new IMEFD-ODCNN model to detect fall events in the smart homecare of elderly people. The IMEFD-ODCNN model allows IoT devices and intelligent DL algorithms to detect the occurrence of falls in the smart home. The proposed model involves several stages of operation: data acquisition, pre-processing, SqueezeNet-based feature extraction, SSO-based parameter tuning, and SSOA-VAE-based classification. Once a fall is identified, an immediate alert is sent to the caretakers and hospital management. Using the SSO algorithm to select the hyperparameters of the SqueezeNet model and the SSOA to adjust the parameters of the VAE model considerably improves the overall fall detection performance. An extensive set of simulations was carried out on the UR Fall Detection dataset and the Multiple Cameras Fall dataset. The experimental results highlight the promising performance of the IMEFD-ODCNN model over recent state-of-the-art methods. In the future, the fall detection performance of the IMEFD-ODCNN model can be improved by using advanced DL models for classification. Besides, scalable and robust versions of the IMEFD-ODCNN model can be developed to support real-time fall detection from low-quality videos.