A Review of Abnormal Behavior Detection in Activities of Daily Living

Abnormal behavior detection (ABD) systems are built to automatically identify and recognize abnormal behavior from various input data types, such as sensor-based and vision-based input. Despite the attention that ABD systems have received, the number of studies on ABD in activities of daily living (ADL) is limited. Owing to the increasing rate of accidents involving the elderly at home, ABD in ADL research deserves comparable attention, since such systems help prevent accidents by sending out signals when abnormal behavior such as falling is detected. In this study, we compare and contrast the formation of the ABD system in ADL from input data types (sensor-based input and vision-based input) to modeling techniques (conventional and deep learning approaches). We scrutinize the available public datasets and provide solutions for one of the significant issues: the lack of datasets in ABD in ADL. This work aims to guide new researchers toward a better understanding of the field of ABD in ADL and to serve as a reference for future studies of better Ambient Assisted Living amid the growing smart home trend.


I. INTRODUCTION
ABD in ADL is not a new topic in computer vision. Over the past ten years, much work has been done on anomaly detection and recognition in a wide array of areas, such as city surveillance for national security, also known as crowd scene analysis [1], driving activities for safe traveling [2], smart in-home monitoring, and health care for elderly people [3]. In particular, much research has been done in the city surveillance field to ensure public security [4]. Public safety is a concern that needs to be taken care of. However, the safety of people at home should not be neglected, particularly the safety of elderly people living alone. According to [5] and [6], home accidents have been occurring at an increasing rate, especially falling accidents (75%) among the elderly. Most home accidents occur unexpectedly and many go unnoticed, leading to severe outcomes such as bone fractures, sprains, burns, or even more severe injuries [5]. Fortunately, with the help of emerging technology, home accidents can be addressed with state-of-the-art computer vision techniques that send out warning signals to alert the authorities should any accident happen. A thorough review of this field is necessary to guide the future development of a safer society, given how important the field is to safety-critical events involving elderly people. To the best of our knowledge, our work is the first comprehensive literature review that covers the types of input data (sensor-based and vision-based), methodologies (conventional approaches and deep learning), and datasets (available datasets and ways to tackle the lack of data) in the field of ABD in ADL, specifically for the elderly.
The associate editor coordinating the review of this manuscript and approving it for publication was Omid Kavehei.
The work in [7] reviews ABD, but it mainly focuses on video-based crowd behavior, and only one type of ADL, the ''falling'' action, is mentioned. References [8] and [9] comprehensively reviewed ABD but only partially discussed ADL and included no analysis of input data type. These papers are more suitable for advanced study than for new researchers. The survey by [10] stressed video-based ABD and mentioned only a few ADLs. References [11] and [12] emphasized only deep learning methods that are popular in the field of ABD. All the studies mentioned overlooked the most significant issue in ADL: the lack of datasets, which is analyzed and discussed in our study.
In this survey paper, we look into ABD in ADL from different aspects ranging from types of input data (sensor-based input and vision-based input), methodology (conventional approaches and deep learning approaches), available datasets and the challenges faced. This study comprises comprehensive information regarding ABD in ADL. A conceptual diagram is depicted in Fig. 1 to enable a better understanding of the topic discussed.

A. SURVEY SELECTION
This study conducted an intensive search on ACM Digital Library, IEEE, Springer, and Scopus. The searches covered article titles, abstracts, and keywords from 2012 to 2022. In the first phase, we searched by title, abstract, and keyword. Since the term ''abnormal behavior'' can usually be exchanged with the synonym ''anomaly'', both terms were included in the search queries. The four databases were searched with the query (abnormal OR anomaly AND behavior AND detection AND in AND activities AND of AND daily AND living), and a total of 19931 results were obtained. The number of results for ACM was too large because every word in the query was taken into account by the ACM search engine; thus, in the second phase, the query for ACM was changed to (abnormal OR anomaly behavior detection AND activities in daily living). As a result, the number of papers was significantly reduced to 8087, but it was still too large for manual selection. Hence, in the third phase, the scope for ACM was refined again to a title search with the same query, which returned 312 results. Table 1 summarizes the study selection from Phase 1 to Phase 3. In the fourth phase, filter functions were used in each database, and the results obtained are shown in Table 2.
The collected papers were skimmed manually to select only the relevant ones that matched the study scope. In the last phase, some papers that are relevant to the topic but whose keywords were not covered by the search queries, such as ''falling detection in activities of daily living,'' were also included. Table 3 displays the final number of papers selected for each database in Phase 5.
VOLUME 11, 2023
Our Contributions: In this study, we thoroughly analyzed the scientific literature on ABD in ADL from 2012 to 2022. To the best of our knowledge, we are the first to carry out a review study on the work of ABD in ADL, specifically for the elderly. This paper focuses on the types of input data to the ABD system, methods for extracting features and classifying behavior as abnormal or normal, relevant datasets available, the challenges faced, and how to resolve these issues. Fig. 2 depicts the overall flow of the system discussed. Instead of merely summarizing the current methods, this study attempts to: 1) introduce and promote the importance of ABD in ADL to the readers; 2) compare and contrast the types of input data and how they are applied in the ABD system; 3) discuss the current methods for ABD in ADL based on conventional or deep learning approaches and how they affect the performance of the detection system; 4) suggest potential applications of ABD systems in real life; 5) assess publicly available datasets suitable for ABD systems; 6) analyze the performance metrics used in ABD in ADL; 7) outline the challenges of ABD in ADL and ways to resolve them.

II. CHARACTERISTICS OF ABNORMAL BEHAVIORS
The definition of ''abnormal behavior'' varies depending on the user's context. For example, a behavior is defined as abnormal if it differs from other behaviors observed in the same context [90]. Another definition of abnormal behavior is an ''activity done in an unusual location, at an unusual time'' or ''events that are fundamentally different in appearance or having an odd order of occurrences'' [91]. In this survey, abnormal behaviors are defined based on ADL and can be further classified into ''accidental'' or ''non-accidental'' activities.
''Accidental'' activities refer to the common accidents in the household. According to [92] and [93], common home accidents include falls, poisoning, falling objects, bruises, sprains, cuts, burns, choking, mechanical suffocation, drowning, glass-related injuries, and more. Although there are many kinds of common home accidents, most studies focus mainly on fall detection. Falls can happen anywhere in the house and to any household member. However, studies have shown that most falls happen among the elderly and young children [92]. Since falls can occur in unexpected places, much research has been done to examine them further. Falls can be classified into many types, for instance, (a) forward and backward falls [22], [94], [95], (b) falls when sitting down in a wrong way or losing balance [22], [94], [95], and (c) falls from a standing position and from sitting on a chair [96], [97]. Fig. 3 shows examples of falling from the studies. Besides falling, [63] presented abnormal behaviors in the form of gas leaks and flooding.
Apart from ''accidental'' activities, according to the definition from [86], abnormal behavior happens when an unusual activity is done at an unusual time. ''Non-accidental'' activities often involve patients with dementia or long-term degradation of the elderly's health. The study from [19] shows that dementia patients tend to exhibit abnormal behavior such as (a) forgetting or doing things repeatedly and (b) sleep disturbance and dehydration. References [24] and [31] discussed long-term trajectory analysis for elders with cognitive decline. Those with the symptoms tend to repeat activities such as pacing around, lapping, or random walking.
The difference between ''accidental'' and ''non-accidental'' activities is that the latter cannot be identified merely from the data obtained. Some alterations and measures need to be taken for the logic to work. The authors of [26] used wellness indices, which include the user's well-being, movement, and emotion level, while [33] used rules, to determine whether an activity is abnormal. References [28], [35], and [59] produced synthetic abnormal behaviors based on three factors: frequency, order, and time taken to perform an activity. An abnormal behavior flag is raised if any activity reaches the threshold set for these factors. For instance, [14] first defined a sequence of normal activities based on rules: opening the door, using a kettle, using a cup, opening a cupboard, and using coffee. Fig. 4 shows the normal and abnormal behavior for the ''non-accidental'' type, such as repeating activities. The behavior is classified as abnormal because the ''use kettle'' action has been repeated multiple times and has exceeded the time allocated for its routine use. Table 4 summarizes the abnormal behavior types categorized by this study.
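As a rough illustration of the three factors mentioned above (frequency, order, and time), the following Python sketch flags a sequence of activities as abnormal. The routine mirrors the example sequence from [14]; the threshold values, the function name find_abnormalities, and the (activity, duration) event format are assumptions made for this sketch and are not taken from the reviewed studies.

```python
from collections import Counter

# Routine from the example in [14]; the limits below are assumed values.
NORMAL_ORDER = ["open door", "use kettle", "use cup", "open cupboard", "use coffee"]
MAX_REPEATS = 2        # assumed frequency threshold per activity
MAX_DURATION_S = 300   # assumed time threshold per activity, in seconds


def find_abnormalities(events):
    """Return the reasons why a list of (activity, duration_s) events is abnormal.

    An empty list means that no rule was violated.
    """
    reasons = []

    # Frequency: an activity repeated more often than allowed.
    counts = Counter(activity for activity, _ in events)
    for activity, count in counts.items():
        if count > MAX_REPEATS:
            reasons.append(f"frequency: '{activity}' repeated {count} times")

    # Time: a single activity that lasts longer than allowed.
    for activity, duration in events:
        if duration > MAX_DURATION_S:
            reasons.append(f"time: '{activity}' took {duration} s")

    # Order: repeats are tolerated, but going backwards in the routine is not.
    positions = [NORMAL_ORDER.index(a) for a, _ in events if a in NORMAL_ORDER]
    if any(later < earlier for earlier, later in zip(positions, positions[1:])):
        reasons.append("order: activities deviate from the routine sequence")

    return reasons


if __name__ == "__main__":
    kettle_loop = [("open door", 30), ("use kettle", 120),
                   ("use kettle", 120), ("use kettle", 400)]
    for reason in find_abnormalities(kettle_loop):
        print(reason)
```

Under these assumed thresholds, the repeated kettle use is flagged twice, once for frequency and once for time, which corresponds to the kind of repeated activity described for the ''non-accidental'' type.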

III. INPUT DATA TYPE
The selection of input data type varies according to the problem to be solved and the resources available. For human activity recognition, the most popular types of input data are (a) sensor-based input and (b) vision-based input. Sensor-based input is further classified into (i) ambient sensors and (ii) wearable sensors. Vision-based input is divided into (i) image and video sequences and (ii) pose estimation. Fig. 5 shows the diagram of the input data types discussed in this paper.

A. SENSOR-BASED INPUT
For sensor-based input, [3], [24], and [45] applied ambient sensors such as door and motion sensors to collect ADL data of elderly people suffering from dementia. Reference [52] applied two prototypes of acoustic floor sensors to capture the audio features of falling actions. Wearable sensors are less common than ambient sensors, but much research has also been done on them, demonstrating the benefits of such a data collection method.
For instance, [13] developed a tri-axial accelerometer system and mounted it on different locations of the human body. The study found that the system detects and differentiates normal actions and falling actions more easily when the accelerometer is installed on the upper trunk of the human subject. In the study by [53], an accelerometer-based wearable sensor was placed on the elder's waist to help detect the wearer's geographic features and send an alarm to the caregiver should any falling action occur. In [101], the elderly were asked to walk casually on a force plate, which recorded the participants' kinetic features to detect whether the elder had Parkinson's disease. In the research carried out by [19], both ambient and wearable sensors were studied. The ambient sensors (contact sensor, thermal sensor) were installed in the kitchen area, and the wearable sensor (accelerometer) was mounted on the wearer, to detect abnormal activities in kitchen ADL.
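To make the accelerometer-based idea concrete, the sketch below computes the magnitude of each tri-axial acceleration sample and reports the first sample that exceeds a threshold. It is only a minimal illustration: the threshold of 2.5 g and the function names are assumptions, and the systems in [13] and [53] may rely on different features and rules.

```python
import math

G = 9.81                   # standard gravity in m/s^2
IMPACT_THRESHOLD_G = 2.5   # assumed threshold, not taken from the reviewed papers


def magnitude_g(sample):
    """Return the magnitude of one (ax, ay, az) sample in units of g."""
    ax, ay, az = sample
    return math.sqrt(ax * ax + ay * ay + az * az) / G


def detect_impact(samples, threshold_g=IMPACT_THRESHOLD_G):
    """Return the index of the first sample above the threshold, or None."""
    for index, sample in enumerate(samples):
        if magnitude_g(sample) > threshold_g:
            return index
    return None


if __name__ == "__main__":
    # Three samples at rest (about 1 g) followed by one strong impact.
    stream = [(0.0, 0.0, 9.81)] * 3 + [(20.0, 15.0, 25.0)]
    print(detect_impact(stream))
```

A single threshold on the magnitude is deliberately simple; a practical detector would typically combine it with further cues such as posture after the impact.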

B. VISION-BASED INPUT
Vision-based input is further classified into (i) image/video sequences and (ii) pose estimation. Image or video sequences act as raw data, where the whole input is fed into the model to obtain the salient features, whereas pose estimation feeds in the key points of human joints. Both input types are popular in human ABD, but (i) is slightly more common than (ii), as image sequences are considered raw data in human activity recognition. This input type has also proven effective alongside emerging technology like deep learning, which produces significant results with raw images.
Various studies have been carried out on ABD in ADL with vision-based input, and some significant ones are analyzed in this paper. The authors in [23] and [99] fed dynamic images from long videos into the model for early detection of human falls. Reference [14] implemented a wireless camera system to capture the daily activities of the elderly for monitoring purposes and abnormal activity detection, including slipping and falling actions. Data collection in [18] was carried out using a wireless camera system divided into functional and regular sensors based on the different positions of sensors at home. Instead of using the raw image sequences, the work in [100] converted the image sequences into motion history images before feeding them into the human fall detection model. Besides motion history images, depth images are a common input when the problem involves human action recognition. Depth images in [94] were applied to the machine learning model. As for pose estimation, [102] proposed a five-point inverted pendulum model built on the key points of a human subject to detect human fall behavior.
However, the simplest yet sufficient input for human activity recognition is the set of raw key points extracted from the human subject. For instance, studies from [10], [11], [13], [17], and [93] applied the human pose estimators OpenPose and PoseNet to obtain the key points from a human subject and fed the features into the proposed models for senior fall detection. Table 5 displays the overview of sensor-based input, and Table 6 summarizes the vision-based input. Both types of input data have strengths and weaknesses in ABD in ADL. Table 7 summarizes the input data types compared across different attributes.
The biggest strength of sensor-based input in ABD in ADL is that it is less likely to raise privacy issues for the users, as no faces are shown during the data collection process. Besides, data can be collected regardless of the wearer's location [14], which makes the data-obtaining process easier. However, sensor-based input also has some significant limitations. First, it is uncomfortable for daily use, especially when it involves mounting the sensors on the human body [103]. The data analysis process also requires experienced users, as it often involves complicated equipment or gadgets [103]. Besides, wearers must understand the essential operation of the sensors mounted on their bodies, and they might need to charge the devices [19].
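A motion history image condenses a short clip into a single image in which recently moving pixels hold higher values than pixels that moved earlier. The sketch below shows one common formulation based on frame differencing, with grayscale frames stored as nested lists; the parameter names tau and diff_threshold and their defaults are assumptions, and the exact variant used in [100] may differ.

```python
def motion_history_image(frames, tau=None, diff_threshold=30):
    """Build a motion history image from a list of grayscale frames.

    Each frame is a list of rows of pixel intensities. A pixel whose
    intensity changed by more than diff_threshold between two consecutive
    frames is set to tau; otherwise its history value decays by 1 (not below 0).
    """
    if len(frames) < 2:
        raise ValueError("at least two frames are required")
    if tau is None:
        tau = len(frames) - 1  # by default, remember every frame transition

    height, width = len(frames[0]), len(frames[0][0])
    mhi = [[0] * width for _ in range(height)]

    for previous, current in zip(frames, frames[1:]):
        for y in range(height):
            for x in range(width):
                moved = abs(current[y][x] - previous[y][x]) > diff_threshold
                mhi[y][x] = tau if moved else max(0, mhi[y][x] - 1)
    return mhi


if __name__ == "__main__":
    # A bright pixel moving from left to right across a 1 x 4 image.
    clip = [
        [[255, 0, 0, 0]],
        [[0, 255, 0, 0]],
        [[0, 0, 255, 0]],
        [[0, 0, 0, 255]],
    ]
    print(motion_history_image(clip))
```

In the resulting image, larger values mark more recent motion, so the direction of movement is encoded in a single frame that can be passed to a classifier.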
Additionally, it has been demonstrated that the performance of infrared sensors may drop when a person is wearing warm clothing or when the temperature difference between the body and the room is small [104]. Finally, as noted in [105], detecting abnormal behavior such as a fall calls for higher sensitivity than specificity, since an accident like a fall should not be overlooked. This also holds for all the other abnormal behaviors in ADL.
On the other hand, the biggest strength of vision-based input is that the data collection process is easy and convenient for the users, as it requires no contact between the users and the cameras [103]. It is also more likely to be accepted by the public, as it is more realistic to carry out in the long term. However, vision-based input also has its weaknesses. Occlusion sometimes occurs in the image [106], and different viewpoints might lead to different model performance [106]. Apart from that, the video's quality significantly impacts the model's performance [106] as well. Another disadvantage is that data collection using a camera may intrude on the user's privacy [18] and lead to insufficient data [106].
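The trade-off between sensitivity and specificity can be made explicit with a small helper. The sketch below assumes binary labels in which 1 denotes abnormal behavior (for example, a fall) and 0 denotes normal behavior; the function name and the example labels are illustrative only.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, 1 = abnormal."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # falls detected
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # falls missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normal kept quiet
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms

    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


if __name__ == "__main__":
    truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
    print(sensitivity_specificity(truth, predicted))
```

In this example one of four falls is missed, so the sensitivity is 0.75; in a fall detection setting, that missed fall is usually costlier than the two false alarms that lower the specificity.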

IV. METHODOLOGY
The selection of recognition characteristics informs the design of the ABD system. Modeling techniques, which include the feature extractors and the classifier, can be categorized into (a) conventional approaches, (b) deep learning approaches, and (c) hybrid approaches (a combination of handcrafted features and learned features). Fig. 6 illustrates the three categories. With conventional approaches, the algorithm responds faster and uses fewer resources since the recognition features are ''handcrafted,'' that is, determined by the designer before the application [107]. However, selecting handcrafted features requires both meticulous planning and sufficient testing. On the other hand, deep learning is a type of machine learning, in which a computer system learns without being explicitly programmed and improves on a task based on past experience [108]. The drawback of such state-of-the-art technology is that deep learning often involves high computational costs and long model training time. Fig. 7 displays the overall concept of the conventional and deep learning approaches.

A. CONVENTIONAL APPROACHES
Many of the characteristics mentioned in the literature were ''hand-crafted'' by the designer to address particular problems like occlusions and differences in scale and lighting. However, finding the ideal balance between precision and computing efficiency is sometimes tricky when designing handcrafted features [109].
In this paper, conventional approaches are classified into two categories: (a) spatiotemporal-based and (b) appearance-based. Spatiotemporal-based approaches find features based on the spatial and temporal statistics of the data [103]. Appearance-based approaches, on the other hand, find features relying on shape features, motion features, or a combination of the two from 2D or 3D depth pictures [103].

1) SPATIOTEMPORAL-BASED
The fall alarm system's algorithm by [53] is based on rotation angle and cumulative acceleration thresholds. Using similarity and dissimilarity measurements, [45] gauges the similarity of two patterns. To attain the optimum results, the right measure must be selected for a specific binary data analysis [110]. Classic Hamming distance does not take nearby bits (the close bits) into account [111]. To enhance the activity detection of kitchen ADL, the sensor fusion technique extracts pertinent information from each type of sensor data and combines it. The purpose of [19] is to determine the key actions of the ADL by using a sensor fusion technique. The unique statistical characteristics of the Hidden Markov Model (HMM) have made it a tool for probability-based modeling that can separate various characteristics of a random signal sequence. This led to the development of an HMM-based system that could not only detect falls but also forecast them, so that protective methods like airbags can be used [112].
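The remark that classic Hamming distance ignores nearby bits can be illustrated with a short example. The sketch below assumes that each pattern is a binary vector with one bit per time slot (1 meaning that a sensor fired in that slot); this encoding and the example patterns are assumptions for illustration and are not the data used in [45].

```python
def hamming_distance(a, b):
    """Count the positions at which two equal-length binary patterns differ."""
    if len(a) != len(b):
        raise ValueError("patterns must have the same length")
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    usual = [0, 0, 1, 1, 0, 0, 0, 0]     # activity in time slots 2 and 3
    slightly_late = [0, 0, 0, 0, 1, 1, 0, 0]  # same activity, two slots later
    much_later = [0, 0, 0, 0, 0, 0, 1, 1]     # same activity, four slots later

    print(hamming_distance(usual, slightly_late))
    print(hamming_distance(usual, much_later))
```

Both comparisons give the same distance, although the first pattern is only slightly shifted in time. This is why the choice of similarity measure matters for binary sensor data.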

2) APPEARANCE-BASED
The work in [94] provided a novel vision-based method for automatic fall detection in an indoor setting. The method used a Kinect depth camera to retrieve data about human silhouettes. To protect the anonymity of the people captured, the authors purposefully avoided using color photos. Then, for each frame, curvature scale space (CSS) features of the extracted silhouette [113], [114] were computed. To describe the fall action, a bag of words model (BoW) [115] based on CSS features (BoCSS) was employed. Apart from using depth images, a two-stage fall identification system based on aspects of human posture was proposed by [98]. Based on the human skeleton obtained from OpenPose, two additional essential characteristics were provided for preprocessing, namely deflection angles and spine ratio, to represent variations in human posture. For classification, support vector machines (SVM), random forests (RF), decision trees (DT), K-nearest neighbors (KNN) [98], and extreme learning machines (ELM) [94] are used.
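As an illustration of a posture feature derived from skeleton key points, the sketch below computes a deflection angle between the spine and the vertical image axis. The exact definitions used in [98] are not reproduced here: the choice of the neck and mid-hip key points, the function name, and the pixel coordinates are assumptions, and the spine ratio is left out because its definition is not given in this text.

```python
import math


def deflection_angle_deg(neck, mid_hip):
    """Return the angle in degrees between the spine and the vertical axis.

    neck and mid_hip are (x, y) image coordinates. 0 means an upright spine
    and 90 means a horizontal spine. Absolute differences are used, so the
    sketch does not distinguish an upside-down body from an upright one.
    """
    dx = abs(neck[0] - mid_hip[0])
    dy = abs(neck[1] - mid_hip[1])
    return math.degrees(math.atan2(dx, dy))


if __name__ == "__main__":
    print(deflection_angle_deg((100, 50), (100, 150)))   # standing
    print(deflection_angle_deg((50, 100), (150, 100)))   # lying
```

A feature like this could then be passed, together with others, to one of the classifiers listed above.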

B. DEEP LEARNING APPROACHES
As the name implies, deep learning trains a multi-layered neural network, which calls for a significant amount of data. The network layers learn the key characteristics of the input data in order to assign the input to the appropriate category. In this study, deep learning is further classified into (a) vision-based and (b) temporal-based. Vision-based refers to input in the form of image or video sequences, whereas temporal-based refers to input in the form of sensor signals.

1) VISION-BASED
For image and video sequence input, Deep ConvNet [116] can extract discriminant characteristics at many levels of abstraction thanks to its deep design. Reference [23] trained a Deep ConvNet as an early event detector using this information. In the study by [22], the authors specifically used a CNN as the feature extractor of the model, as VGG-16 [116] performs better on particular image sequences using posture data. Although OpenPose can generate 18 or 25 key points for 2D pictures, the extra key points (often face or finger key points) are unnecessary for fall detection and slow down the model processing time. Apart from the feed-forward network, [16] proposed deep LSTMs that demonstrated remarkable performance while learning predictable and erratic sequences [117]. According to research results published in the field, deep neural networks and their derivatives surpass methods like SVM and HMM [118].

2) TEMPORAL-BASED
In the case of sensor-based input, [52] proposed a Siamese Neural Network (SNN) capable of learning a latent representation of an auditory event. Besides audio features, neural networks also work well in processing other sensor-based features. References [3] and [24] presented RNNs, specifically LSTM, and feed-forward networks like CNNs, for identifying sensor-based dementia-related abnormalities in smart homes. Their results on activity recognition demonstrate that these techniques outperform NB, HMMs, HSMMs, and CRFs. Studies from [3] and [24] showed that RNNs effectively recognize activities. They can also effectively handle imbalanced data and anomaly detection, which is crucial in the case of dementia. They outperform several well-known and widely used algorithms for activity recognition, including SVM, NB, HMM, and HSMM, while still being quite competitive with CRF.
Additionally, the empirical tests revealed that all RNN versions performed similarly across all datasets utilized in that work, although LSTM appeared to perform somewhat better. These results suggest that RNNs are highly suitable for activity recognition and abnormal behavior detection.

3) HYBRID
The ''five-point inverted pendulum model'' is a new model of human posture representation of fall behavior proposed by [102] based on research on the stability of human body dynamics. The inverted pendulum structure of human posture in complex real-world scenes was extracted and constructed by employing an improved two-branch multi-stage convolutional neural network (M-CNN). The approach provides great resilience, broad universality, and excellent detection accuracy according to experimental results in real scenarios. Table 8 shows a detailed summary of the methodology studied in this paper. Table 9 provides a summary of comparisons among the approaches.
In Table 9, performance is defined in terms of prediction accuracy. Computational cost is the cost incurred in training the model, and computational time refers to the time required for training the model. In the context of ADL, performance and computational time are the most crucial aspects, as ABD in ADL involves safety-critical events. Below are the critical evaluations of the three approaches, summarized from the quantitative analysis of the reviewed papers.

a: CONVENTIONAL APPROACH
The conventional approach is best known for its short computational time and low cost. Besides that, this approach also offers customization of features, allowing users to handcraft specific features to enhance their work [53]. However, there are certain limitations to this method. It requires much labor to create salient features suitable for the situation, and the most frequently used feature extractors are created using a particular dataset, making them dataset-biased, as they cannot extract features for all purposes [124]. Furthermore, conventional approaches are insufficient to support ABD in ADL because they are only effective with limited data, whereas abnormal behavior in ADL covers such a wide range of behaviors that even a seemingly insignificant behavior may include several distinct series of actions. For instance, falling includes actions like falling forward, falling backward, falling while seated, and more [95]. Therefore, even though conventional approaches consume less computational time, they are the least recommended for ABD in ADL since they often only perform well with a limited dataset. Due to the limitations above, state-of-the-art deep learning methods were introduced to enhance the work in video classification.

b: DEEP LEARNING
CNN, which can automatically extract salient features from the input data and thereby reduce manual labor while maintaining the model's accuracy, was adopted in [22] and [64] for fall detection in the elderly. A network variation inspired by CNN, VGG-16, was also used [116], as it performs better on particular image sequences using posture data. To improve the accuracy of ABD in ADL, RNNs such as LSTM, GRU, and VRNN were used [24], as these methods take into account the temporal information of the input data, which is crucial in ABD because it involves the interpretation of data over time. However, RNNs have one significant drawback: high computational time and cost. To solve this problem, the work in [125] proposed a Simple Recurrent Unit that enables highly parallelized implementation, which surpasses the performance of LSTM while reducing the computational time required. According to research results published in the field, deep neural networks and their derivatives surpass methods like SVM and HMM [118].

c: HYBRID
A deeper network was created and demonstrated by combining handcrafted and deep-learning human activity recognition (HAR) approaches. In the cases of [102], [122], and [123], the hybrid approach yielded the best performance of all approaches. However, the overall architecture of the hybrid approach is technically more complex and might result in higher computational cost and time.
In summary, there is no precise answer regarding the best ABD technique because each method has advantages and disadvantages that vary depending on the problems and scenarios, but the deep learning method is strongly advised in the area of ABD in ADL based on the findings from the study.

V. DATASETS AND EVALUATION METRICS

A. DATASETS
The volume and variety of publicly available datasets for experimentation have significantly increased with the development of new technologies. However, such datasets remain limited for anomaly events in ADL. In this survey, datasets are grouped by the type of abnormal behavior in ADL: (a) falling and (b) others, such as dementia or doing the ''wrong'' actions in the given time frames. The issue of limited datasets is discussed in Section V, Part 2.

1) FALL DATASETS

a: FALL DETECTION DATASET (FDD)
The authors in [126] collected 250 video sequences in four locations (including a home), of which 192 contained falls and 57 featured many everyday actions and body transfers, such as moving from a chair to a sofa. The videos have a resolution of 320 × 240 pixels and a frame rate of 25 frames per second. The video data highlights the critical challenges of creating realistic scenes appropriate for an elderly person's household.

b: SDU FALL DATASET
Using a low-cost Kinect camera, the authors in [94] created a depth action dataset. The study included ten young male and female participants. Each participant performed six movements (falling, bending, squatting, sitting, lying, and walking) 30 times. The subjects fell on purpose since recordings of actual falls are difficult to obtain.

c: MULTICAM DATASET
Eight affordable IP cameras with wide-angle lenses make up a multi-camera system [95] that can capture the whole room. A healthy individual carried out the 24 realistic scenarios in the dataset. Recorded with the eight-camera setup, the first 22 scenarios include a fall and confounding events, whereas the last two include only confounding events (11 crouching positions, nine sitting positions, and four lying on a sofa positions). The study space measures 7 m × 4 m and is furnished with a table, a chair, and a sofa to mimic a genuine living room. The video feeds were recorded at 720 × 480 pixels and 30 frames per second.

d: HIGH-QUALITY FALL SIMULATION DATA
A space was decorated to resemble a nursing home room. In this room, five webcams (12 frames per second, 640 × 480 resolution) were set up to film various fall and non-fall events [128]. The room did not have any windows because it was in the basement. Near-infrared spots were employed to speed up the acquisition process at lower light intensities. In total, 55 fall scenarios were recorded. Each scenario included several typical events.

e: UR FALL DETECTION DATASET
This dataset [96] includes 70 sequences (40 everyday life activities + 30 falls). Fall incidents were captured using two Microsoft Kinect cameras together with the corresponding accelerometric data. ADL events were captured using only one camera and an accelerometer. PS Move (60 Hz) and x-IMU (256 Hz) devices were used to gather the sensor data. For cameras 0 and 1, which are positioned parallel to the floor and ceiling, each row includes a sequence of depth and RGB pictures, synchronization information, and raw accelerometer data.

f: UTD-MHAD DATASET
UTD-MHAD [15] covers a collection of 27 human activities and includes RGB videos, depth videos, skeleton positions, and inertial data from a Kinect camera and a wearable inertial sensor. The dataset has 8 subjects (4 females and 4 males), and each participant performed each action four times. In total, it includes 861 data sequences. Table 10 shows the overview of publicly available fall datasets with their details.

2) OTHER DATASETS
Even though fall datasets are publicly available, some of them involve falling actions in crowd scenes, which are unsuitable for ABD in ADL. The problem of limited datasets can be resolved by simulating anomaly events [3] or by defining the abnormal behavior based on specific time frames [19]. Both methods are helpful and can be applied to cases such as dementia detection.

a: VAN KASTEREN DATASET
Van Kasteren dataset is one example of a dataset consisting of only normal behavior data in ADL.

i) ALTERATION MADE
The authors in [24] intentionally introduced some anomalies into the Van Kasteren dataset because no dataset on the anomalous behavior of dementia patients was accessible at the time of their study. To demonstrate how the suggested approach may be used to find these anomalies, the authors concentrated on two distinct types of aberrations seen in the daily activities of dementia-affected seniors: (1) forgetting or doing things repeatedly and (2) sleep disturbance and dehydration.
(1) forgetting or doing things repeatedly: Manually introducing a specific set of acts into the sequence of regular activities, the authors created these kinds of abnormal behaviors. As a result, that behavior will happen more than once and at an inappropriate time of the day, like having supper in the middle of the night. To create abnormal behavior sequences associated with frequency, the authors inserted instances of the following activities into the normal behavior sequences: brushing teeth, making dinner, eating, and grabbing a snack.
(2) sleep disturbance and dehydration: The authors recreated these abnormalities by adding certain artificial activities to a person's normal nighttime activity patterns. Specifically, they included the activities of drinking and using the restroom to the sequences of everyday activities involving sleeping. This mimics the behavior of obtaining a drink and repeatedly using the restroom in the middle of the night.
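The two injection strategies described above can be illustrated with a minimal Python sketch. It is not the implementation used in [24]: the `Activity` record, the function names, the choice of "odd" hours, and the number of injected events are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Activity:
    """Hypothetical record for one activity in a day (not from the original dataset)."""
    label: str               # e.g. "sleeping", "eating"
    hour: int                # hour of day (0-23) at which the activity starts
    synthetic: bool = False  # True for injected (abnormal) activities


# Activity labels taken from the description above.
REPEATED = ["brushing teeth", "making dinner", "eating", "grabbing a snack"]
NIGHT = ["drinking", "using the restroom"]


def inject_repetition(day, rng, n_extra=2, odd_hours=(1, 2, 3)):
    """Type (1): add repeated activities at inappropriate (assumed) night hours."""
    extra = [Activity(rng.choice(REPEATED), rng.choice(odd_hours), True)
             for _ in range(n_extra)]
    # sorted() is stable, so the original activities keep their relative order.
    return sorted(list(day) + extra, key=lambda a: a.hour)


def inject_sleep_disturbance(day, rng, n_trips=3, night_hours=(0, 1, 2, 3, 4)):
    """Type (2): add repeated drinking and restroom visits during the night."""
    extra = []
    for _ in range(n_trips):
        hour = rng.choice(night_hours)
        extra.extend(Activity(label, hour, True) for label in NIGHT)
    return sorted(list(day) + extra, key=lambda a: a.hour)


# A made-up normal day, ordered by hour.
normal_day = [
    Activity("sleeping", 0),
    Activity("brushing teeth", 7),
    Activity("eating", 8),
    Activity("making dinner", 18),
    Activity("eating", 19),
    Activity("sleeping", 23),
]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
abnormal_repetition = inject_repetition(normal_day, rng)
abnormal_sleep = inject_sleep_disturbance(normal_day, rng)
```

Keeping a `synthetic` flag on every injected activity gives ground-truth labels for evaluating a detector on the altered sequences.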

b: ARUBA DATASET AND WSU DATASET
Motion, door, and temperature sensors are employed in the Aruba testbed [129]. The activities recorded in this dataset are entirely normal. The WSU testbed covers 5 activities, and its participants were instructed to include mistakes in their performance. Such mistakes can be observed in the routine activities and behavior of older adults experiencing the effects of cognitive decline.

ii) ALTERATION MADE
Three categories of abnormalities are considered: (1) repeating activities, (2) disruption in sleep, and (3) confusion. Similar to the previous case, the Aruba testbed [3] was modified to simulate the first two categories. The dataset, in this instance, has a single participant. The authors first used the activities in the training data as a norm, then synthesized deviations from this norm and incorporated them into the test data. These activities are entirely acceptable on their own but become abnormal when they take place at the wrong time of day or immediately before or after a particular activity. Therefore, it is crucial to capture these abnormal behaviors in their context. The WSU dataset already includes the third anomaly category; therefore, it is used without changing the sensor readings. The summary of datasets from Section V Part 2 is listed in Table 11.
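To make the notion of a contextual anomaly concrete, the following Python sketch learns a simple norm from training days and flags an activity that occurs at an unseen time of day or after an unseen preceding activity. It is an illustrative assumption, not the method of [3]: the function names, the `(activity, hour)` tuple format, and the one-hour tolerance are hypothetical.

```python
from collections import defaultdict


def learn_norm(training_days):
    """Record, per activity, the hours and preceding activities seen in training.

    training_days: list of days, each an ordered list of (activity, hour) tuples.
    """
    hours = defaultdict(set)
    predecessors = defaultdict(set)
    for day in training_days:
        prev = None  # None marks "first activity of the day"
        for activity, hour in day:
            hours[activity].add(hour)
            predecessors[activity].add(prev)
            prev = activity
    return hours, predecessors


def is_contextual_anomaly(activity, hour, prev, norm, tolerance=1):
    """Flag an activity whose time of day or preceding activity deviates from the norm."""
    hours, predecessors = norm
    if activity not in hours:
        return True  # activity never seen in training
    # Circular distance on the 24-hour clock, so 23:00 and 00:00 are one hour apart.
    time_ok = any(min(abs(hour - h), 24 - abs(hour - h)) <= tolerance
                  for h in hours[activity])
    order_ok = prev in predecessors[activity]
    return not (time_ok and order_ok)


# Made-up training data: two identical normal days.
day = [("sleeping", 0), ("eating", 8), ("making dinner", 18), ("eating", 19)]
norm = learn_norm([day, day])

breakfast_flag = is_contextual_anomaly("eating", 8, "sleeping", norm)
night_meal_flag = is_contextual_anomaly("eating", 2, "sleeping", norm)
double_meal_flag = is_contextual_anomaly("eating", 19, "eating", norm)
```

In this toy setting, eating is acceptable at breakfast time but is flagged when it occurs at 02:00 or directly after another meal, which mirrors the idea that the same activity can be normal or abnormal depending on its context.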

B. EVALUATION METRICS
Evaluation metrics are often used in classification problems to evaluate a model's performance. This study discusses some basic performance measures frequently used in ABD.

1) CONFUSION MATRIX
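A confusion matrix summarizes a model's results statistically by tabulating the predicted classes against the actual classes. For a binary ABD system with abnormal behavior as the positive class, its four cells are the true positives (TP), false negatives (FN), false positives (FP), and true negatives (TN). Because the classification errors and their categories are displayed in the confusion matrix, it is also known as the error matrix [103]. Fig. 8 shows the idea of a confusion matrix for an ABD system.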
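As a minimal illustration, the four cells of a binary confusion matrix can be counted directly from paired labels. The function name, the label strings, and the example data below are assumptions for illustration only.

```python
def confusion_counts(y_true, y_pred, positive="abnormal"):
    """Count TP, FN, FP, and TN, treating `positive` as the abnormal class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fn = fp = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == positive:
            if predicted == positive:
                tp += 1  # abnormal case correctly detected
            else:
                fn += 1  # abnormal case missed
        else:
            if predicted == positive:
                fp += 1  # normal case wrongly flagged (false alarm)
            else:
                tn += 1  # normal case correctly passed
    return {"TP": tp, "FN": fn, "FP": fp, "TN": tn}


# Made-up labels for six observations.
y_true = ["abnormal", "abnormal", "normal", "normal", "normal", "abnormal"]
y_pred = ["abnormal", "normal", "normal", "abnormal", "normal", "abnormal"]
counts = confusion_counts(y_true, y_pred)
```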

3) SENSITIVITY
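Sensitivity is also known as recall, true positive rate (TPR), or probability of detection. In the case of ABD, it represents the ratio of correctly classified abnormal cases to all actual abnormal cases. The calculation of sensitivity is as stated below:
$$\text{Sensitivity} = \frac{TP}{TP + FN}$$
where TP and FN are the numbers of true positives and false negatives, with abnormal behavior as the positive class.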

4) SPECIFICITY
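Specificity is also known as the true negative rate (TNR); it equals one minus the false positive rate (FPR). In the case of ABD, it represents the ratio of correctly classified normal cases to all actual normal cases. The calculation of specificity is as stated below:
$$\text{Specificity} = \frac{TN}{TN + FP} = 1 - \text{FPR}$$
where TN and FP are the numbers of true negatives and false positives.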

5) PRECISION
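Precision is also known as positive predictive value (PPV). In the case of ABD, it represents the ratio of correctly classified abnormal cases to all cases predicted as abnormal. The calculation of precision is as stated below:
$$\text{Precision} = \frac{TP}{TP + FP}$$
where TP and FP are the numbers of true positives and false positives.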

6) NEGATIVE PREDICTIVE VALUE (NPV)
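NPV is also known as negative precision. In the case of ABD, it represents the ratio of correctly classified normal cases to all cases predicted as normal. The calculation of NPV is as stated below:
$$\text{NPV} = \frac{TN}{TN + FN}$$
where TN and FN are the numbers of true negatives and false negatives.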

7) F SCORE
The F score is also known as the F-measure or F1 score. It is the harmonic mean of precision and recall [103]. The F score ranges from 0 to 1, and the higher the score, the better the model's performance. The calculation of the F score is as stated below:
$$F_1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
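The ratio-based metrics in this subsection can all be derived from the four confusion-matrix counts. The sketch below is a plain-Python illustration; the function names and the example counts are assumptions, and undefined ratios (zero denominators) are returned as NaN by choice.

```python
import math


def safe_div(numerator, denominator):
    """Return NaN instead of raising when a metric is undefined."""
    return numerator / denominator if denominator else float("nan")


def abd_metrics(tp, fn, fp, tn):
    """Compute the metrics above, with abnormal behavior as the positive class."""
    sensitivity = safe_div(tp, tp + fn)  # recall / true positive rate
    specificity = safe_div(tn, tn + fp)  # true negative rate
    precision = safe_div(tp, tp + fp)    # positive predictive value
    npv = safe_div(tn, tn + fn)          # negative predictive value
    f1 = safe_div(2 * precision * sensitivity, precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "npv": npv,
        "f1": f1,
    }


# Made-up counts: 10 abnormal and 10 normal cases.
metrics = abd_metrics(tp=8, fn=2, fp=1, tn=9)
```

For imbalanced ABD data, where abnormal events are rare, reporting sensitivity and precision alongside specificity gives a clearer picture than any single figure.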

8) AUC (AREA UNDER THE CURVE)
AUC is the area under the receiver operating characteristic (ROC) curve, which plots the TP Rate (True Positive Rate) against the FP Rate (False Positive Rate) as the decision threshold is varied. It summarizes the separability between the classes of the model: an AUC near 1 indicates a near-ideal model, whereas an AUC near 0.5 indicates performance no better than random guessing. Fig. 9 shows the typical graph of AUC. Table 12 shows the types of performance metrics applied by the papers in this study.
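The relation between the threshold sweep and the AUC value can be sketched in plain Python. The function names and the example scores are assumptions; the sketch sweeps every distinct score as a threshold and integrates the resulting curve with the trapezoidal rule.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) points obtained by sweeping the decision threshold.

    y_true: 1 for abnormal (positive), 0 for normal (negative).
    scores: higher score means "more likely abnormal".
    """
    pos = sum(y_true)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    points = [(0.0, 0.0)]  # threshold above every score: nothing is flagged
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc_trapezoid(points):
    """Area under the piecewise-linear curve through the ROC points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Made-up anomaly scores for four observations.
y_true = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6]
auc = auc_trapezoid(roc_points(y_true, scores))
```

The resulting value equals the probability that a randomly chosen abnormal case receives a higher score than a randomly chosen normal case, which is why it is read as a measure of class separability.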

VI. DISCUSSION
A. POTENTIAL APPLICATION DOMAINS

1) HOME MONITORING FOR ELDERLY
Safety at home raises the same concern as safety in public places. Even in the most vigilant of families, accidents can still occur at home, no matter how hard we work to keep our homes as safe as possible. An ABD system in ADL meets the need for automatic and autonomous abnormal behavior detection. Elderly people nowadays are often alone at home without supervision; accidents can happen anywhere at any time and might lead to severe outcomes such as death. Fig. 10 shows home and community death rates in the United States from 1978 to 2020. With an ABD system, the ADL of the elderly can be monitored 24/7, and abnormal behavior can be detected early so that the authorities are alerted in time.

2) ELDERLY HEALTH CARE CENTRE
Several business associations forecast a shortfall of about 100,000 physicians by 2030 [139]. With state-of-the-art techniques, the ABD system in ADL can be implemented in elderly health care centers. Moreover, a continuous and autonomous detection system can better monitor the behaviors of the elderly and send a warning to the authorities should any abnormal behavior happen.

3) COGNITIVE DECLINE DETECTION
Other than detecting abnormal behavior in ADL, ABD systems such as [28] and [29] also look into cognitive decline in elders. This type of application is geared toward long-term monitoring and analysis. For instance, symptoms such as increased pacing or random walking could lead to the detection of dementia, upon which a warning is sent to the authorities for further action.

4) HEALTH MONITORING SYSTEM
Apart from common illnesses in elders like dementia, the ABD system in ADL can identify many more health-related issues, such as loss of appetite and even urinary tract infection [37]. This is feasible because monitoring ADL produces a vast amount of information about the users' health conditions.

5) FALL DETECTION SYSTEM
With advances in technology, ABD systems in ADL can detect falls both indoors and outdoors [43]. This is important because ADL are not confined to home activities but also include daily activities such as buying groceries. Hence, detecting falls outdoors is as crucial as detecting falls indoors.

B. CHALLENGES AND LIMITATIONS
Even though many studies have been done on the ABD system in ADL, the research still faces many challenges and difficulties. Below are some of the significant challenges and limitations currently faced by ABD in ADL:
• Lack of real-life datasets: The community faces a hurdle because few datasets can be used as benchmarks to train and evaluate contextual anomaly detection in ADL. This is understandable, given how uncommon and diverse deviant actions are in the real world. Instead of using real-life datasets, most researchers [94], [95], [96], [126], [128] simulated their datasets in controlled settings to train the model. However, the diversity of the simulated datasets is limited; they mainly cover the action "falling" only. Our study shows that injecting synthetic data can alleviate the problem, but a long-term solution should be established.
• Limited actions are studied: Besides the "accidental" action "falling," other accidents commonly happen in ADL, such as poisoning, falling objects, bruises, sprains, cuts, burns, choking, mechanical suffocation, drowning, glass-related injuries, and more. However, these areas have seldom or never been studied.
• Vast variation of ADL: There are no rules that define a standard action sequence for a human being in ADL, because ADL vary widely [103] from subject to subject. For instance, subject A likes to drink water before going to bed, whereas subject B is used to waking up in the middle of the night to get water and then resuming sleep. Both action flows seem normal to a human observer, but how do we train a model to treat two opposite action flows as normal? Although studies like [27], [28], [59], and [65] specified threshold indices for the abnormal behaviors, and [140] allowed the participants to define their own abnormal behaviors in ADL, these methods are too specific and do not generalize to all elderly people. Hence, the vast variety and composite nature of ADL might be one of the biggest challenges in ABD, and this could be the next topic to be studied thoroughly.
• High false alarm rate: "Non-accidental" ABD often requires an examination of external factors. For instance, elder A wakes up in the middle of the night and opens the front door of his house. This might seem abnormal and has a high chance of being classified as a "sleepwalking" symptom. However, the scenario changes if external factors are considered, such as a dog barking at the door or people knocking. Hence, to reduce the false alarm rate of ABD systems in ADL, external factors need to be considered, and the integration of Internet of Things technology, such as door sensors, is vital to improve the performance of the detection system.
• Limited to one location: The ADL of the elderly should not be limited to homes, health care centers, or any other single place. They include all the activities of the elderly from the time they wake up, no matter where those activities take place. For example, elder A tends to go out to the market to buy groceries for cooking after she finishes her breakfast at home. Hence, the ABD system should not be limited to her activities at home only. The system should monitor her behavior from home to the market and from the market back home. This is crucial because abnormal behaviors such as falling might happen on the way to or from the market. The authors in [43] proposed a model that can detect abnormal behavior, such as falling or getting lost during elders' outdoor activities, by attaching a sensor to the users' slippers. Even though this is one way to trace the behavior of an elderly person outside the house, it depends on each user's comfort and acceptance, as not everyone is used to wearing a sensor under the foot.
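The false-alarm scenario in the list above (a front door opened at night) can be sketched as a simple context-aware rule. This is a hypothetical illustration rather than a method from the reviewed papers: the event names, the night-time window, and the 120-second look-back window are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SensorEvent:
    """Hypothetical IoT sensor event."""
    kind: str         # e.g. "front_door_open", "doorbell", "dog_bark"
    timestamp: float  # seconds since midnight


# Assumed external factors that can explain a night-time door opening.
EXTERNAL_TRIGGERS = {"doorbell", "door_knock", "dog_bark"}


def is_night(timestamp, start_hour=23, end_hour=5):
    """Return True if the timestamp falls in the assumed night-time window."""
    hour = int(timestamp // 3600) % 24
    return hour >= start_hour or hour < end_hour


def should_alert(event, recent_events, window_s=120.0):
    """Alert on a night-time door opening unless an external trigger preceded it.

    Simplification: events just before midnight are not matched with
    events just after midnight.
    """
    if event.kind != "front_door_open" or not is_night(event.timestamp):
        return False
    for other in recent_events:
        elapsed = event.timestamp - other.timestamp
        if other.kind in EXTERNAL_TRIGGERS and 0 <= elapsed <= window_s:
            return False  # plausible external cause: suppress the alarm
    return True


door_at_2am = SensorEvent("front_door_open", 2 * 3600)
alert_without_context = should_alert(door_at_2am, [])
alert_with_doorbell = should_alert(
    door_at_2am, [SensorEvent("doorbell", 2 * 3600 - 50)]
)
```

A rule of this kind only suppresses alarms that have a recorded external cause, so its benefit depends on which additional sensors are installed in the home.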

C. FUTURE PROSPECT
We aim to achieve three milestones for the future study of ABD in ADL. First, rather than focusing merely on elders, we wish to expand our study scope to children: according to a survey [141] from the United States, approximately 2200 children die each year from injuries at home, and 3.5 million children visit the emergency department due to injuries at home. Second, we will explore more of the abnormal behaviors mentioned in the earlier section, targeting both "accidental" and "non-accidental" types, to make the study more comprehensive. Third, Ambient Assisted Living (AAL) and smart home technologies can be exploited to enhance the ABD system in ADL in the future.

VII. CONCLUSION
ABD has been receiving increasing attention. However, comparatively little research addresses ABD in ADL. This study highlights a few essential steps in developing an ABD system. First, we introduce the input data types, sensor-based and vision-based, to give a rough idea of how to choose the data; both types have their benefits and disadvantages. Defining the study's scope, such as the target users, is also crucial to narrow the research to specific settings, such as homes or public facilities like elderly health care centers. Once the research setting is known, the types of abnormal behavior to be studied can be defined. Once the input data type is determined, the most appropriate methods for performing ABD must be selected. A typical ABD system consists of a feature extractor and a classifier. This paper discusses state-of-the-art methods, from conventional approaches using hand-crafted features to deep learning approaches with learned features. Conventional approaches are known for their simplicity and customizable features but are impractical for ABD in ADL because they perform well only with small data. Deep learning approaches have proven to be more effective in ABD but come at the cost of data-intensive training. ABD in ADL also faces many challenges, including a limited number of real-life datasets, the vast variation of ADL, a high false alarm rate, restriction to one location, and more. These are the most basic yet complex challenges to be tackled in most ABD systems in ADL. Hence, they could be the next topics to be studied thoroughly, and the integration of Internet of Things technology should be considered.

Her research interests include computer vision, machine learning, image processing, and biometric authentication.
VOLUME 11, 2023