Background-Subtraction Algorithm Optimization for Home Camera-Based Night-Vision Fall Detectors

Background subtraction is one of the key pre-processing steps necessary for obtaining relevant information from a video sequence. The selection of a background subtraction algorithm and its parameters is also important for achieving optimal detection performance, especially in night environments. The research contribution presented in this paper is the identification of the optimal background subtraction algorithm for indoor night-time environments, with a focus on the detection of human falls. Thirty background subtraction algorithms are analyzed to determine which performs best in indoor night-time environments. Genetic algorithms have been applied to identify the best background subtraction algorithm, to optimize the background subtractor parameters, and to calculate the optimal number of pre- and post-processing operations. The results show that the best algorithm for fall detection in indoor, night-time environments is LBAdaptativeSOM; the optimal parameters and processing operations for this algorithm are reported.


I. INTRODUCTION
The risk of falling is one of the most prevalent problems faced by elderly individuals. A study published by the World Health Organization [1] estimates that between 28% and 35% of people over the age of 65 suffer at least one fall each year, and this figure increases to 42% for people over 70. According to the World Health Organization, falls account for more than 50% of elderly hospitalizations and approximately 40% of non-natural deaths in this segment of the population. Falls are a significant source of mortality for elderly individuals in developed countries. They are particularly dangerous for people who live alone because of the amount of time that can pass before they receive assistance. Approximately one-third of the elderly (those over the age of 65) in Europe live alone [2], and the elderly population is expected to increase significantly over the next twenty years. The fall detection system proposed by Fallert [3] is based on a low-cost device comprising an embedded computer and a camera. Installed on walls or ceilings, this device monitors a room without human intervention. Thus, people monitored at home are not required to wear devices, and the system is capable of 24 h monitoring. Fallert's fall detection system works relatively well (over 96% accuracy) during daylight, but performs poorly at night because of the lack of light. To solve this problem, the inclusion of an infrared emitter and a camera without an IR filter was required. Improvements to the background subtraction algorithm used previously [3] were also required because of its poor performance under night-time conditions.

The associate editor coordinating the review of this manuscript and approving it for publication was Jagdish Chand Bansal.
Background subtraction is important for many image processing problems and has been extensively studied [4]. Several different background subtraction approaches are compared in [5]; however, the methods studied in this comprehensive review fall short when applied to infra-red video images. This paper analyses the performance of the background subtraction algorithms presented in [5] on night images taken with infra-red light, with the aim of selecting algorithms able to work with infra-red video. Pre- and post-processing improvements to background subtraction accuracy were considered, and a genetic algorithm was implemented for optimal parameter selection. The authors of [5] applied hand-fitted parameters for each of the algorithms, a technique that is relevant for many applications, from traffic cameras to fall detection of elderly people. In contrast, in this paper, the background subtraction parameters of several algorithms are optimized using a genetic algorithm, in order to compare them and select the best background subtraction algorithm for detecting falls in night-time video images. This article is structured as follows: previous work and the state of the art in background subtraction algorithms are discussed in sections II and III. The description of the Fallert system is presented in section IV. The methodology is described in section V, and the results and discussion in section VI. Finally, the conclusions are summarized in section VII.

VOLUME 7, 2019. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/

II. PREVIOUS WORK
Fall detection technologies can be divided into three categories, as explained by [6]: wearable sensors, ambient sensors, and vision-based technologies.
Most wearable sensors are based on accelerometers and gyroscopes that a person must wear or carry [7]-[11]. Some sensors use a mobile phone as a primary [12] or accessory device [13] to detect falls. These systems have a fundamental problem: they rely entirely on the person carrying the device. In contrast, ambient sensors and vision-based technologies work independently of any user action. Most currently available commercial devices for fall detection are portable. In fact, the top 10 fall detectors listed on the website ''toptenreviews'' 1 are based on portable devices.
Ambient sensors monitor the environment continuously to detect falls. There are several approaches: technologies based on the interruption of some type of beam or set of beams of infra-red light (laser or not) [14], [15], 3D cameras [16], WiFi signal strength analysis [17], and sound [18], [19] and vibration [20], [21] detection.
Because of their applicability, vision-based systems are one of the most interesting approaches. They rely on artificial intelligence algorithms to analyze images or video taken from cameras. Following the discussion presented in [6], vision-based approaches are focused on real-time execution of a detection algorithm using standard computing platforms and low-cost cameras. There are several methods used to obtain semantic information through video analysis. Many such methods make use of a 2D or 3D model, while others are based on the extraction of some features after video image segmentation of the body. A more detailed explanation of these approaches can be found in [6], where they are classified into the following categories: body and shape change, posture detection, inactivity, spatiotemporal, and 3D head change.
In addition, two types of cameras are mainly used for fall detection: 2D cameras (like the one used in this paper or in [22]) and 3D time-of-flight (ToF) cameras, as discussed in [23] and [24]. Although ToF cameras provide more information, they have lower resolution and are more expensive; as a result, traditional cameras remain particularly attractive.
A common feature of most vision-based systems is the use of a background-subtractor algorithm. The segmentation of relevant scene information is a common first step for many computer vision algorithms. The most basic techniques involve subtraction of a background image or registration of scene changes between frames. More advanced algorithms, such as the background subtraction algorithm developed by [25], register the most common colors of each pixel and update learned data over time, consequently exhibiting adaptive capabilities responsive to scene changes over time.
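As a minimal sketch of this adaptive idea (not the exact method of [25]), a per-pixel running average can serve as the learned background, with the model slowly absorbing scene changes over time; the blending factor `alpha` and threshold `tau` below are illustrative values:

```python
import numpy as np

def update_background(bg, frame, alpha=0.05):
    """Exponential running average: recent frames gradually replace old ones."""
    return (1.0 - alpha) * bg + alpha * frame

def foreground_mask(bg, frame, tau=25):
    """Pixels differing from the model by more than tau are foreground."""
    return np.abs(frame.astype(float) - bg) > tau

# A static scene with one bright moving blob.
bg = np.full((8, 8), 10.0)        # learned background model
frame = np.full((8, 8), 10.0)
frame[2:4, 2:4] = 200.0           # moving object

mask = foreground_mask(bg, frame)
bg = update_background(bg, frame)  # the object slowly bleeds into the model
```

Repeated updates make a stopped object eventually become background, which is exactly the adaptive behavior described above.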
The recognition of specific features can also be used to extract relevant information from a scene. Feature descriptor algorithms, such as histograms of oriented gradients, can be trained to identify certain features of the human body. One example of this application can be found in [26], [27], where a subject's head and body are independently followed, and accurate readings of their relative trajectories over time are found.
Due to their importance, many background-subtraction algorithms (included in the background-subtraction library BGSLibrary) were studied in [5] under daytime conditions. The current article presents an analysis of background subtractors on indoor, night-time images. Furthermore, the use of background subtractors here is focused on a specific application (the detection of falls), while in [5] the analysis is performed for more general applications. A more detailed review of background-subtraction algorithms can be found in section III.

III. REVIEW OF THE BACKGROUND-SUBTRACTION ALGORITHMS UNDER ANALYSIS
In the previous section, the importance of background subtraction algorithms was explained. The background subtraction library (BGSLibrary) includes many algorithms, some of which were analyzed and compared in [5]. The purpose of this paper is to reanalyze these algorithms (as well as some more recent ones) to find the most suitable background-subtraction algorithm for detecting human falls under night-time conditions. All background-subtraction algorithms can be found at https://github.com/andrewssobral/bgslibrary. The complete list of the algorithms is shown in Table 1.
Most of the algorithms listed in Table 1 require one or more input parameters. Parameter choice is critical to algorithm performance; thus, we used a genetic algorithm to find optimal parameters in order to obtain the best results with each of the algorithms.
In this section, we present an overview and brief analysis of the algorithms.

A. BASIC METHODS, MEAN AND VARIANCE WITH TIME
Methods based on basic functions usually rely on a single parameter. Briefly, in these methods each frame is compared with a frame selected as the background reference, and the difference is computed. The reference frame is usually the first frame, or the mean of the frames processed so far.
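A minimal frame-difference subtractor of this kind can be sketched in a few lines (the threshold value is illustrative):

```python
import numpy as np

def frame_difference(reference, frame, threshold=30):
    """Classic frame difference: |frame - reference| > threshold -> foreground.
    The reference can be the first frame or a running mean of past frames."""
    diff = np.abs(frame.astype(int) - reference.astype(int))
    return (diff > threshold).astype(np.uint8)

reference = np.zeros((4, 4), dtype=np.uint8)   # empty scene
frame = reference.copy()
frame[1:3, 1:3] = 120                          # an object enters the scene

mask = frame_difference(reference, frame)      # binary foreground mask
```

The single parameter here is the threshold, which matches the observation that these basic methods usually expose one tunable value.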

B. STATISTICAL METHODS USING A SINGLE GAUSSIAN
In algorithms based on a single Gaussian model, each pixel is modelled with a probability function defined by its mean and standard deviation. With this Gaussian model, the probability of a pixel being background or foreground can be obtained. This method is more robust when illumination changes occur.
The DPWren GA algorithm, which models people in the image and the background, and the LB Simple Gaussian algorithm, which updates the mean and standard deviation over time, belong to this category.
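A per-pixel single-Gaussian model of the kind described above can be sketched as follows; the update rate `rho` and the `k`-sigma decision rule are illustrative choices, not the exact parameters of these algorithms:

```python
import numpy as np

def gaussian_update(mean, var, frame, rho=0.1):
    """Update the per-pixel mean and variance with the new frame
    (running single-Gaussian model)."""
    diff = frame - mean
    mean = mean + rho * diff
    var = (1 - rho) * var + rho * diff ** 2
    return mean, var

def classify(mean, var, frame, k=2.5):
    """A pixel is foreground when it lies more than k standard deviations
    from its modeled mean."""
    return np.abs(frame - mean) > k * np.sqrt(var)

mean = np.full((4, 4), 50.0)
var = np.full((4, 4), 4.0)          # std = 2 everywhere
frame = np.full((4, 4), 50.0)
frame[0, 0] = 90.0                  # one outlier pixel

fg = classify(mean, var, frame)
mean, var = gaussian_update(mean, var, frame)
```

Because the standard deviation is modeled per pixel, a gradual illumination change inflates the variance instead of triggering false foreground, which is why this family is more robust to lighting changes.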

C. STATISTICAL METHODS USING MULTIPLE GAUSSIANS
One of the most popular background detection techniques is based on a parametric adaptive mixture of models. This model was first presented in [31] and improved in [48]. In this algorithm, the values of each pixel are modeled by a mixture of Gaussians. These distributions are generally updated using a mean-minimization algorithm that improves the use of the Gaussian distribution (see [30] and [4]). Every time a new frame is processed, the mixture of Gaussians for each pixel is updated.
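To make the mixture idea concrete, the sketch below maintains a toy Stauffer-Grimson-style mixture for a single gray-scale pixel; the learning rate, initial components, and simplified matching rule are assumptions for illustration, not the exact update of [31] or [48]:

```python
import numpy as np

class PixelMOG:
    """Toy mixture-of-Gaussians model for one gray-scale pixel.
    Real implementations run this per pixel over the whole image and
    rank components by weight/sigma to decide which are background."""
    def __init__(self, k=3, alpha=0.05, match_sigmas=2.5):
        self.means = np.array([0.0, 128.0, 255.0][:k])
        self.vars = np.full(k, 100.0)
        self.weights = np.full(k, 1.0 / k)
        self.alpha = alpha
        self.match_sigmas = match_sigmas

    def update(self, x):
        """Returns True if x matches an existing component (background-like)."""
        d = np.abs(x - self.means)
        matches = d < self.match_sigmas * np.sqrt(self.vars)
        self.weights = (1 - self.alpha) * self.weights + self.alpha * matches
        if matches.any():
            i = int(np.argmax(matches))      # first matching component (simplified)
            self.means[i] += self.alpha * (x - self.means[i])
            self.vars[i] = (1 - self.alpha) * self.vars[i] \
                + self.alpha * (x - self.means[i]) ** 2
        else:
            # replace the least probable component with one centred on x
            i = int(np.argmin(self.weights))
            self.means[i], self.vars[i], self.weights[i] = x, 400.0, 0.05
        self.weights /= self.weights.sum()
        return bool(matches.any())

pix = PixelMOG()
for _ in range(20):
    pix.update(10.0)           # the scene background settles around 10
is_bg = pix.update(11.0)       # near the learned mode -> matched (background)
is_fg = not pix.update(180.0)  # matches no component -> foreground
```

The per-component update on each new observation is the "mixture updated every frame" behavior described above.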

D. METHODS BASED ON EIGENVALUES AND EIGENVECTORS
The only algorithm considered here that is based on eigenvalues and eigenvectors is the DPEigenbackgroundBGS method ( [33]). This method builds an eigenspace that models the background. The method includes characteristics of surrounding pixels to obtain a more precise description of each pixel.

E. LOCAL BINARY PATTERN (LBP)
The Local Binary Pattern method is a gray-scale invariant texture operator presented in the early 1990s. In the original formulation (see [49]), a 3 × 3 pixel neighborhood is thresholded by the value of the central pixel. The values of the pixels in the thresholded neighborhood are multiplied by the weights assigned to the corresponding pixels. Finally, the values of the surrounding pixels are summed to obtain the value of the texture unit.
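The texture-unit computation can be illustrated directly; the weights are powers of two as described above, while the specific clockwise neighbor ordering below is an illustrative choice:

```python
def lbp_texture_unit(neighborhood):
    """Texture unit of a 3x3 neighborhood: threshold the 8 neighbors by the
    central value, weight each resulting bit by 2**p, and sum the weights."""
    center = neighborhood[1][1]
    # clockwise neighbor order starting at the top-left pixel
    coords = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    value = 0
    for p, (r, c) in enumerate(coords):
        if neighborhood[r][c] >= center:
            value += 2 ** p
    return value

patch = [[ 90, 120,  60],
         [100, 100, 110],
         [ 95, 100,  80]]
code = lbp_texture_unit(patch)   # one of 256 possible texture units
```

The resulting code is invariant to any monotonic gray-scale change, since only the comparisons with the central pixel matter.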

F. LOCAL BINARY SIMILARITY PATTERN (LBSP)
The LBSP is based on the definition of a new characteristic index defined by the following equations:

LBSP(x_c, y_c) = \sum_{p=0}^{P-1} d(i_p, i_c) \cdot 2^p, \qquad d(i_p, i_c) = \begin{cases} 1 & \text{if } |i_p - i_c| \le T_d \\ 0 & \text{otherwise} \end{cases}

where T_d is a similarity threshold, i_c corresponds to the intensity of the central pixel (x_c, y_c), and i_p is the intensity of the p-th pixel in the set of neighboring pixels P (see [50]).
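A sketch of the index for a 3 × 3 neighborhood (P = 8) follows; bit p is set when the p-th neighbor is similar (within T_d) to the central intensity, and the threshold value and neighbor ordering are illustrative:

```python
def lbsp(patch, td=30):
    """LBSP index over a 3x3 patch: bit p is set when the p-th neighbor's
    intensity is within td of the central pixel intensity i_c."""
    ic = patch[1][1]
    coords = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for p, (r, c) in enumerate(coords):
        if abs(patch[r][c] - ic) <= td:
            code |= 1 << p
    return code

patch = [[100, 250, 100],
         [100, 100, 100],
         [  0, 100, 100]]
code = lbsp(patch)   # the bright and dark outliers clear their bits
```

Unlike LBP, the similarity test makes the code tolerant to small intensity noise around the central value.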

G. METHODS BASED ON FUZZY LOGIC
The methods in this category use fuzzy logic in three different ways as described below. First, the Fuzzy Sugeno Integral uses a fuzzy integral to fuse texture and color features for background subtraction (see [38]).
Second, in the Fuzzy Choquet Integral method (see [39]), background initialization is performed using the average of the first N video frames containing objects. An update rule applied to the background image is necessary for the algorithm to adapt to the scene over time. For this, a selective maintenance scheme is adopted.
Finally, the LBFuzzy Gaussian method (see [40]) uses a saturating linear function instead of a hard limiter (in the fuzzy background subtraction) to determine if the pixel belongs to the background or to the foreground.

H. METHODS BASED ON TYPE-2 FUZZY LOGIC
The first detector in this category is presented in [51] (see also [41], [52]) in two modes, T2-FGMM-UM and T2-FGMM-UV. Although both modes can be used to model the background, the T2-FGMM-UM is expected to be more robust than the T2-FGMM-UV.
In [42], the authors introduced spatial-temporal constraints into the T2-FGMM using the MRF (Markov Random Field), a framework to achieve superior modeling performance for dynamic backgrounds.

I. NEURAL AND FUZZY-NEURAL METHODS
Neural and fuzzy-neural methods use SOMs (Self-Organizing Maps) to detect the background and foreground (see [43], [44]). Each node of the SOM has an associated weight vector of the same dimension as the input data vector and a position in the map space. The nodes are usually organized in a hexagonal or quadrangular regular 2-dimensional grid. The SOM describes a mapping between the higher-dimensional input space and the lower-dimensional map. The process of placing an input vector in the map requires finding the node with the most similar weight (the closest one).
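Finding the most similar node described above amounts to a nearest-neighbor search over the node weights; a small sketch for a 3 × 3 quadrangular map over 3-dimensional inputs (the hand-made weight values are illustrative):

```python
import numpy as np

def best_matching_unit(som_weights, x):
    """Return the grid position of the node whose weight vector is closest
    (Euclidean distance) to the input vector x."""
    dists = np.linalg.norm(som_weights - x, axis=-1)
    return np.unravel_index(np.argmin(dists), dists.shape)

# 3x3 quadrangular SOM over 3-dimensional inputs (e.g. RGB pixel values)
som = np.array([[[200., 200., 200.], [120.,  30.,  60.], [ 90., 180., 250.]],
                [[ 40.,  40.,  40.], [255.,   0.,   0.], [  0., 255.,   0.]],
                [[ 60., 120.,  30.], [ 10.,  10.,  10.], [180.,  90.,  45.]]])

bmu = best_matching_unit(som, np.array([12.0, 9.0, 11.0]))
```

In SOM-based background subtractors, the distance between the input pixel and its best-matching unit is what decides whether the pixel is background or foreground.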

J. OTHERS
Algorithms that could not be placed in any of the previous categories are described in this section.
The Independent Multimodal Background Subtraction (IMBS) method discussed in [45] seeks fast and efficient background subtraction. The background model is computed through per-pixel on-line statistical analysis of a set L of N frames in order to achieve high computational speed. According to a sampling period P, the current frame I is added to L, thus becoming a background sample S_n, where 1 ≤ n ≤ N.
The non-parametric model called Vu Meter [46] is based on a discrete estimation of the probability distribution. The key aspects of this method are the probabilistic model and the temporal update.
In [47] (see also [4]), a Parzen-window estimator for each background pixel is proposed:

P(I_t) = \frac{1}{N} \sum_{i=1}^{N} K(I_t - I_i)

where K is the kernel (usually a Gaussian kernel), and N is the number of previous frames used to estimate P. A pixel is classified as an object when P(I_t) is lower than a predefined threshold, i.e. when the probability that it belongs to the background is low.
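The estimator can be sketched for a single pixel's intensity history; the kernel bandwidth and threshold below are illustrative values:

```python
import math

def kde_background_prob(x, samples, sigma=5.0):
    """Parzen-window estimate P(I_t): average of Gaussian kernels centred on
    the last N background samples of this pixel."""
    norm = 1.0 / (math.sqrt(2 * math.pi) * sigma)
    return sum(norm * math.exp(-0.5 * ((x - s) / sigma) ** 2)
               for s in samples) / len(samples)

history = [100, 102, 98, 101, 99]        # stable background intensities
p_bg = kde_background_prob(100, history)  # high: consistent with history
p_obj = kde_background_prob(200, history) # near zero: an object pixel

threshold = 1e-4   # pixels with P(I_t) below this are classified as object
```

Being non-parametric, the estimate adapts to any intensity distribution without assuming a fixed number of Gaussian modes.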

K. FINAL CONSIDERATIONS ABOUT THE BACKGROUND-SUBTRACTORS
As pointed out at the beginning of the section, most of the algorithms discussed above were studied in [5] using both real and computer-generated videos. In that extensive work, a broad comparison including detector accuracy, execution time, and CPU and memory requirements showed that there is no perfect algorithm, and performance is highly dependent on the application. The algorithms studied in [5] were evaluated based on car detection in outdoor, daytime scenes under different weather conditions. In contrast, our algorithm comparison is based on the detection and tracking of people inside a house. In addition, we are interested in night vision, which is not considered in [5].
In order to focus the tests of the background subtractors (available in the library) on night-time videos taken inside a house, a metric was designed by removing the D-score index used by [5]. This D-score index weighs the importance of a pixel depending on its location relative to the contour of the object. However, the shape of the object is not critical under indoor, night-time conditions, and the D-score index could lead to misleading results in our case.
In [5], the parameters of the algorithms were adjusted by manually searching for the best results. In our work, we systematically compare algorithms using a set of test cases and optimize parameters using a genetic algorithm.

IV. DESCRIPTION OF THE FALLERT SYSTEM
The detection system used in this paper (called ''Fallert'' and shown in Fig. 1) was originally developed to be executed on a low-cost embedded computer. Several options were taken into account, and the Raspberry Pi board was chosen due to its sound technical characteristics, widespread adoption, and low price. In addition to the board (version 3B+ currently), the camera module designed for Raspberry Pi was used, which connects to the Raspberry Pi board via the CSI (Camera Serial Interface) port, thereby requiring significantly fewer CPU (Central Processing Unit) resources than a regular USB camera. An IR-LED ring is placed around the camera to provide IR illumination at night. The Fallert system also includes a case (designed specifically and printed with a 3D printer), an SD card, and a power supply. The electronic diagram can be found in Fig. 2.
This prototype is a fully capable, independent fall detection system with an estimated cost of less than €80. The system connects to the internet using the built-in WiFi adapter of the Raspberry Pi and sends a message (email or Telegram) when a fall has been detected (as in Fig. 3). The message includes an image of the fall. If the person recovers, another message is sent.
Most currently available commercial devices for fall detection are portable. Regarding vision-based systems, products similar to Fallert exist, for example ''Carecams'' 2 , an online system based on IP cameras. However, these products run their fall detection algorithms on a server outside the camera, where powerful computers can be used. In contrast, all image processing and fall detection in the Fallert system are performed on the Raspberry Pi.

V. METHODOLOGY
After reviewing the background subtraction algorithms in section III, this section presents the methodology used to evaluate each algorithm and to tune its parameters. The effectiveness of a background subtraction algorithm depends on the tuning of its parameters. Each algorithm has different parameters, and testing all combinations can be a time-consuming task. To simplify this task, we developed a genetic algorithm to select the best combination of parameters for each algorithm and to compare the performance of the different algorithms under night conditions in home environments.

2 https://www.carecams.co.uk/peace-of-mind-cameras
The genetic algorithm uses a fitness function that objectively measures algorithm performance for our task. This function is also used to choose the best parameter combination for each algorithm. In addition, to get the best performance from each algorithm, we also included a series of pre- and post-processing operations applied to the input frames and the algorithm output in order to improve the background subtraction. All these operations require significant amounts of computation time; therefore, the hardware used is an important factor in this process. It is important to take into account that not all algorithms take the same amount of time to process the video, and this can be a limitation in some applications.
Summarising, the methodology includes the following steps: (1) for each video, a set of pre-processing operations is performed; (2) the background subtraction algorithm is executed with the chosen parameters; and (3) some post-processing operations are performed. The genetic algorithm optimizes the number of pre- and post-processing operations and the values of the parameters of the background subtractors.

A. IMAGE PRE-AND POST-PROCESSING
In order to maximize the results of the background subtraction, a set of common operations was performed on each frame. The image processing cycle can be seen in Fig. 4. These operations, which are simple pixel transformations that help to remove possible noise or increase the efficiency of the method, are as follows:
• Contrast: Used to adjust the image contrast.
• Dilation: This operation consists of convolving an image A with some kernel (B). The shape of the kernel is set to be a circle. This operation makes white objects bigger and is used to enlarge the foreground (usually brighter than the background), making it easier to analyze.
• Erosion: This operation computes a local minimum over the area of the kernel in order to reduce noise.
• Open: This operation consists of an erosion followed by a dilation and is used to remove noise from the image after background detection.
• Close: This operation consists of a dilation followed by an erosion and is used to fill holes in the foreground figures that may have appeared after background detection.

The experiments and results presented in the next section clarify the role that these basic image operations play in increasing the effectiveness of the background subtractors. Each operation is performed with the default parameters, but the number of times each operation is applied is optimized using a genetic algorithm designed for speed and simplicity.
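The four morphological operations can be sketched with plain NumPy for binary masks; a 3 × 3 square kernel is used here as a simple stand-in for the circular kernel mentioned above:

```python
import numpy as np

def dilate(mask, iterations=1):
    """Binary dilation: a pixel becomes 1 if any 3x3 neighbor is 1,
    growing foreground blobs."""
    out = mask.copy()
    h, w = mask.shape
    for _ in range(iterations):
        padded = np.pad(out, 1)
        nxt = np.zeros_like(out)
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nxt |= padded[1 + dr:1 + dr + h, 1 + dc:1 + dc + w]
        out = nxt
    return out

def erode(mask, iterations=1):
    """Binary erosion: a pixel stays 1 only if all 3x3 neighbors are 1,
    removing speckle noise."""
    out = mask.copy()
    h, w = mask.shape
    for _ in range(iterations):
        padded = np.pad(out, 1)
        keep = np.ones_like(out)
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                keep &= padded[1 + dr:1 + dr + h, 1 + dc:1 + dc + w]
        out = keep
    return out

def opening(mask):   # erosion then dilation: removes isolated noise pixels
    return dilate(erode(mask))

def closing(mask):   # dilation then erosion: fills small holes in blobs
    return erode(dilate(mask))

mask = np.zeros((7, 7), dtype=np.uint8)
mask[2:5, 2:5] = 1       # a 3x3 foreground blob
mask[3, 3] = 0           # with a one-pixel hole
mask[0, 6] = 1           # plus an isolated noise pixel

opened = opening(mask)   # noise pixel removed
closed = closing(mask)   # hole in the blob filled
```

Opening and closing show why the number of repetitions matters: each extra iteration removes larger noise specks or fills larger holes, at the cost of distorting the silhouette.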

B. GENETIC ALGORITHM
There are several alternatives for dealing with the optimization of the proposed algorithms. Other optimization algorithms, such as Differential Evolution or Particle Swarm Optimization, could also have worked. However, considering the similarity between the parameters to be optimized and genetic algorithm genes, the genetic algorithm was considered the most appropriate.
The genetic algorithm was implemented using MatLab and its ga (genetic algorithm) function. To codify the information to optimize, the genetic sequence shown in Fig. 5 was used. The number of genes depends on the algorithm being optimized, as the gene codification includes the intrinsic parameters of the algorithm. Here, N represents the number of specific parameters of the algorithm, i.e. the total number of genes minus the 5 genes used for the pre- and post-processing steps. The range for the pre- and post-processing parameters for all algorithms was set as shown in Table 2.
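As an illustration of this encoding, a chromosome can be decoded into the five pre/post-processing repetition counts plus the N intrinsic parameters; the gene names and parameter names below are hypothetical, chosen only to mirror the layout of Fig. 5, not the paper's exact encoding:

```python
# Hypothetical gene layout: five pre/post-processing genes followed by the
# N intrinsic parameters of the background subtractor under optimization.
PRE_POST_KEYS = ["contrast", "dilation", "erosion", "open", "close"]

def decode_chromosome(genes, param_names):
    """Split one GA individual into pre/post-processing repetition counts
    and intrinsic algorithm parameters (names are illustrative)."""
    assert len(genes) == len(PRE_POST_KEYS) + len(param_names)
    pre_post = {k: int(round(g)) for k, g in zip(PRE_POST_KEYS, genes[:5])}
    intrinsic = dict(zip(param_names, genes[5:]))
    return pre_post, intrinsic

# Example individual for an algorithm with N = 2 intrinsic parameters.
pre_post, intrinsic = decode_chromosome(
    [1.2, 0.0, 2.7, 1.0, 0.0, 0.05, 30.0],
    ["learning_rate", "threshold"])
```

Rounding the first five genes to integers reflects that these genes encode how many times each operation is repeated rather than a continuous value.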
Each algorithm has a specific set of parameters (or, in some cases, no parameters). An explanation of each parameter is beyond the scope of this paper. Optimization was initialized using either the default parameter values (set by the algorithm authors) or those selected in the review paper [5].
For the selection of the internal genetic algorithm parameters, we followed the method used in [53]. The objective is to obtain acceptable results within a reasonable simulation time. A set of experiments was carried out in order to determine these parameters. The experiments were run simultaneously on different computers to reduce the impact of the randomness present in the genetic algorithm. The simplest algorithm, Frame Difference, was used in order to save time in this process. An example of these experiments can be seen in Figure 6.
Finally, the genetic algorithm parameters selected for the optimization were as follows:
• FitnessLimit: If the fitness function reaches this value (−1, the perfect value), the optimization stops.
• FunctionTolerance: The algorithm stops when the relative change in the fitness function over the last generations is equal to or less than this value (1e−1).
• PopulationSize: The number of individuals in each generation. A value of 30 was chosen after trials.
• CrossoverFraction: The fraction of individuals in the next generation created with the crossover function. This value does not change during the execution of the optimization process. A value of 0.6 was chosen after trials.

Apart from these parameters, the default genetic algorithm settings from MatLab were used, including the crossover and mutation functions.

C. FITNESS FUNCTION
The fitness function used is based on the one used in [5]. This metric is based on the comparison between the result obtained by the algorithm in a set of frames and the ''ground truth''. The ground truth is obtained manually for each frame we want to compare. A first comparison is done pixel by pixel, obtaining 4 classes (see Table 3). An example of this is shown in Fig. 7.
With this pixel information, we use the following indexes:

Precision = \frac{TP}{TP + FP}, \qquad Recall = \frac{TP}{TP + FN}, \qquad F = \frac{2 \cdot Precision \cdot Recall}{Precision + Recall}

The fitness function also takes into account other indexes that treat the images as a whole, and not just the pixels alone (see [54]):
• SSIM: The Structural Similarity Index (SSIM) is used for measuring the similarity between two images, providing results in the range [0,1].
• PSNR: The Peak Signal-to-Noise Ratio is the ratio between the maximum possible power of a signal and the power of the corrupting noise that affects the fidelity of its representation. An additional transformation is needed to map the dB results to the range [0,1].

Finally, the fitness function combines these indexes, averaged over the N frames analyzed in the video.
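The pixel-class comparison and one of the whole-image indexes can be sketched as follows; note that the dB-to-[0,1] squashing function below is an assumption (the paper only states that such a transformation is applied), and the exact combination of indexes into the fitness value is omitted:

```python
import math

def pixel_classes(result, truth):
    """Compare a binary detection mask with the ground truth, pixel by pixel,
    into the four classes of Table 3 (TP, FP, TN, FN)."""
    tp = fp = tn = fn = 0
    for r_row, t_row in zip(result, truth):
        for r, t in zip(r_row, t_row):
            if r and t:
                tp += 1
            elif r and not t:
                fp += 1
            elif not r and not t:
                tn += 1
            else:
                fn += 1
    return tp, fp, tn, fn

def psnr01(result, truth, max_val=1.0):
    """PSNR between two binary masks, squashed from dB into [0, 1]
    (the squashing map is an illustrative assumption)."""
    n = sum(len(row) for row in result)
    mse = sum((r - t) ** 2 for r_row, t_row in zip(result, truth)
              for r, t in zip(r_row, t_row)) / n
    if mse == 0:
        return 1.0                        # identical masks: perfect score
    db = 10 * math.log10(max_val ** 2 / mse)
    return 1 - 1 / (1 + max(db, 0.0))     # monotone map of dB into [0, 1]

truth  = [[0, 1, 1], [0, 1, 1], [0, 0, 0]]   # manual ground truth
result = [[0, 1, 1], [0, 1, 0], [1, 0, 0]]   # detector output

tp, fp, tn, fn = pixel_classes(result, truth)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
```

These per-frame values are what the genetic algorithm aggregates across the N evaluated frames when scoring an individual.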

D. HARDWARE USED
To execute the experiments and simulations, several computers were used; their main features are listed in Table 4. The videos were taken with a Raspberry Pi model 3B and a Pi NoIR camera. This camera does not have an infra-red filter, allowing us to record videos without visible light (night videos). An infra-red LED ring was used to emit infra-red light, as explained in Section IV. The videos were recorded in very different scenarios in order to obtain results as general as possible. Some examples of frames used in the algorithm can be seen in Fig. 8.

VI. RESULTS AND DISCUSSION
All results are given in Table 9. In this section, we explain and analyze the results. To determine the limits of the detection cost value, we obtained the results produced when the algorithm detects everything as either foreground or background:
• Everything detected as foreground: 0.3082
• Everything detected as background: 0.5683

Therefore, the value 0.5683 is considered the lowest valid value for the algorithms: any useful algorithm must exceed it.
A first test was made using the default parameters (the ones set in the paper [5] or in the code description), and results from the best-performing algorithms are shown in Table 5. An example of the detection in two particular frames is shown in Figure 9. Consequently, the intrinsic algorithm parameters were optimized using the genetic algorithm (pre- and post-processing parameters were not optimized at this stage). The best results with this optimization are shown in Table 6. The results after the optimization are never worse than those obtained with the default parameters, as expected. We found that accurately tuning the intrinsic parameters of each algorithm is critical to performance, as clearly shown by the results presented in Fig. 10. The default and optimized intrinsic parameters are given in Table 7. Tests showed that, after several executions, the genetic optimization for this type of algorithm yielded similar results with small, non-significant variations in the adjustments. Therefore, the results were considered robust enough to draw conclusions.
Finally, the pre- and post-processing parameters were optimized for the best algorithm of the previous phase (LBAdaptativeSOM), obtaining a fitness value of 0.8877. The result is shown in Fig. 11. From the initial result of the algorithm without optimization (0.7321) to the final result, the optimization improved the result by 21%.
The parameters obtained after adding the pre- and post-processing parameters to the genetic algorithm are shown in Table 8, along with the ranges used in the optimization process.

FIGURE 11. Best result obtained with the parameters optimized using the genetic algorithm (shown in Table 7) and additional processing, for the LBAdaptativeSOM algorithm. The color code is explained in Table 3.

Comparing the results in Tables 7 and 8 shows that the parameters of the algorithm change with and without the additional pre- and post-processing. We can conclude that the parameters need to be re-optimized when additional pre- and post-processing is performed. The parameters optimized in the LBAdaptativeSOM algorithm are related to the learning process of the SOM and its sensitivity.
To show how the evolutionary process helped the detection of background and foreground, the results obtained using the original background detector of the Fallert system are compared with those obtained using the optimized algorithm in Figure 12. The original background subtractor, the OpenCV default Mixture of Gaussians V2, performs well in daylight conditions but fails on the night videos. The improvement in background/foreground detection with the optimized algorithm is clearly shown in Figure 12. From the initial result of the Mixture of Gaussians V2 algorithm without optimization (0.4790) to the final result with the LBAdaptativeSOM algorithm (0.8877), the optimization and the change of algorithm improved the result by 85.3%. Thus, the optimization process greatly improves results, even in night-time conditions, for a variety of scenarios.
Finally, to see the improvement provided by the GA, Fig. 9(c), showing the initial algorithm without optimisation, can be compared with Fig. 12(a), showing the result after the whole GA optimisation.

VII. CONCLUSION AND FUTURE WORK
In this paper, we analyze several background-subtractor algorithms to determine which has the best performance in detecting human falls in indoor, night-time environments.
The starting point of this research was the analysis of several background-subtractor algorithms (from the library BGSLibrary) performed by [5]. These algorithms have been reanalyzed (adding recent algorithms) to find the most suitable background-subtractor algorithm for detecting human falls in night-time conditions. In order to improve the detection process, a genetic algorithm was used to optimize the background subtractor parameters and to select the optimal number of pre- and post-processing operations performed on the images. Our results show that the use of genetic algorithms can help to optimize artificial vision algorithms.
In conclusion, the best background subtractor for detecting falls in indoor, night-time environments is the LBAdaptativeSOM with the parameters shown in Table 8.
Future work will focus on testing the application in other home environments with a larger set of videos and falls.