Learning Age From Gait: A Survey

Age is an important human attribute that needs to be determined for various purposes, including security, health, human identification, and law enforcement. Hence, there is an increasing research interest in automatic age estimation using biometric traits such as face and gait. In recent years, gait analysis has received growing attention due to the pervasive nature of video surveillance. Gait signals that measure the manner of walking can be obtained using vision and sensor-based techniques. Individual gait patterns obtainable from videos, images, or sensors are shown unconsciously and are not easily obscured. Additionally, gait signals can be obtained unobtrusively with cameras placed at a long distance because gait does not require high-resolution images. However, the extraction of age-associated gait features is a challenging task due to various gait covariates. These covariates include clothing and view changes for vision-based gait; walking slope and footwear for sensor-based gait. This paper provides a survey of scientific literature on age estimation using gait features. We focus on the approaches to extracting age-associated gait features, namely, vision-based and sensor-based approaches, how they may be affected by the different covariates, and domain-specific applications. To make this work useful for as wide an audience as possible, we also include discussions on key topics such as existing datasets, evaluation strategies, and open challenges that should be addressed in the future.


I. INTRODUCTION
Age is an important human attribute that needs to be determined for various purposes, including security, health, human identification, and law enforcement. Given its importance, it is not feasible to rely on human discretion for age estimation [1]. Automatic age estimation involves automatically labeling a human with a precise age or age group based on physical attributes. The age attribute of an individual could be classified into two categories: the apparent age inferred from physical attributes; and the chronological age, which is the total number of years the person has lived from birth [2].
There has been a lot of research on automatic age estimation using face images [3]-[5]. Face-based age estimation has found practical application in many domains, such as preventing cybercrime and age verification in the gaming industry. For example, Innovative Technology (ITL) offers age verification as a service for the online gaming market, using face images obtained from selfies of online gamers [6]. A limitation of face-based age estimation is that its performance depends on face image quality and exposure. For example, most surveillance cameras produce low-resolution images in which a subject's face could be occluded. Hence, face-based age estimation systems may be unsuitable for live surveillance. In recent years, gait analysis has received growing attention due to the pervasive nature of video surveillance. Individual gait is unique and is considered a behavioral biometric trait. It comprises posture and observable periodic patterns shown during bipedal locomotive activities such as walking, running, and jogging. In comparison with other biometrics, gait has many advantages. Individual gait patterns obtainable from videos, images, or sensors are shown unconsciously and are not easily obscured. Additionally, gait signals can be obtained unobtrusively with cameras placed at a long distance because gait does not require high-resolution images. For example, Watrix, a prominent artificial intelligence company in China, developed software that recognizes people by gait. In 2018, Chinese authorities began a trial deployment of this software as a surveillance tool to identify citizens from up to 50 meters away, even with their backs turned or their faces occluded [7]. While the software's initial release required recording clips for analysis, Watrix later released an update that enabled real-time person identification [8].
FIGURE 1. Gait research trends in gait-based age estimation, gait and aging, and gait recognition, showing that research on gait-based age estimation is still in its infancy and has not received nearly the same research attention as gait recognition in general.
The associate editor coordinating the review of this manuscript and approving it for publication was Joewono Widjaja.
The extraction of age-associated gait features is a challenging task due to various gait covariates. These covariates include clothing and view changes for vision-based gait; walking slope and footwear for sensor-based gait. Before 2001, gait features had been extracted for human recognition [9]-[11] and authentication [12], [13]. In 2001, Davis [14] classified pedestrians as adults or children based on their walking styles. Other early attempts at age estimation using vision-based gait include the works of [15]-[17] and [18] in 2005, 2010, 2014, and 2015, respectively. The first attempt at age estimation using sensor-based gait was made by Riaz et al. [19] in 2015. These early works and medical research on gait and aging paved the way for more research in gait-based age estimation. However, research on gait-based age estimation is still in its infancy and has not received nearly the same research attention as gait recognition in general, as age estimation requires careful feature selection and more data. Due to the limited availability of age-annotated data and the challenges involved in gait feature extraction, only a few early studies focused on age estimation using gait. These studies were conducted using self-collected datasets or relatively small datasets. In 2017, the largest gait dataset for age estimation was published [20], and there was a corresponding increase in the number of related papers. The trends described above are shown in Fig. 1. Based on a systematic search on Google Scholar, we compare the numbers of research papers published in gait recognition, medical gait, and age estimation from gait between 2010 and 2020. From the figure, it is interesting to note that the number of studies on gait-based age estimation between 2019 and 2020 surpasses the total number of related publications before 2019.
This increase in research interest is likely due to the recent increase in the number of publicly available large datasets and the deep learning revolution, which enables automatic feature extraction from silhouette images for further analysis.
Our Contributions: As shown in Table 1, the related reviews on gait-based age estimation [21]-[23] have coverage only up to 2019 and present general overviews on age and gender estimation using gait features. We aim to fill this gap by providing a comprehensive survey of scientific literature on age estimation using gait features from 2001 to 2021. We focus on the approaches to gait feature extraction, namely, vision-based and sensor-based approaches, how they may be affected by the different covariates, and domain-specific applications. This survey aims to go beyond just a summary of the existing approaches to age estimation using gait by: 1) introducing the readers to the notions of age estimation using gait; 2) presenting a critical analysis of age-associated gait features and descriptors and how they have been applied for age estimation; 3) discussing the current approaches to gait feature extraction in age estimation and how they are affected by covariates; 4) suggesting domain-specific applications for each gait feature extraction approach; 5) assessing publicly available gait datasets suitable for age estimation; 6) outlining some current issues that should be addressed in the future. For the discussions in this paper, three age groups are defined, namely: child (age in (0, 15) years), adult (age in [15, 65) years), and senior (age ≥ 65 years).
The rest of this paper is organized as follows. Section 2 gives a high-level overview of the processes involved in age estimation using gait features. This overview includes gait sensing, feature extraction, age estimation functions, and factors affecting gait variability. In Section 3, we present a review of scientific literature on gait features as they are used for gait age prediction. We conclude the section with a brief discussion on the suitability of features based on the age estimation task. Section 4 discusses publicly available gait datasets with age metadata and metrics used for evaluation. Finally, before concluding in Section 6, Section 5 highlights the current progress, potential domains of application, open challenges, and potential research areas in age estimation using gait.

II. OVERVIEW OF AGE ESTIMATION USING GAIT
Human gait is characterized by two key attributes: the way of walking and posture [24]. While walking is a voluntary process, the overall gait is regulated by the nervous system [25]. Gait changes are part of the human growth process, and it takes some time for a child to reach gait maturity. During the first year of life, a child goes from being carried to walking on all fours and then progresses to walking unsurely on two feet. Somewhere between ages 2 and 7, a child reaches some level of walking maturity and can walk confidently. Then, at age 60, the gait begins to decline [26].
Automatic age estimation using gait involves labeling a person with a precise age or age group based on extracted gait features. Gait is a unique behavioral biometric with attributes that can be modeled and learned to predict human age. While the actual age of a subject is referred to as chronological age, the age estimated based on gait is referred to as gait age [27]. Therefore, the terms gait age and age from gait are used interchangeably throughout this paper. As illustrated in Fig. 2, gait age prediction involves three main components: gait sensing, feature extraction, and age estimation. These components are introduced in this section but discussed in greater detail in the subsequent sections.

A. GAIT SENSING AND FEATURE EXTRACTION
Gait sensing is performed using either gait sensors or vision-based techniques such as images or video recordings. A gait feature is a measurement of an observed gait attribute, and a gait signature is a vector of a subject's gait features. Gait feature extraction is the process of obtaining the gait signature of subjects. Depending on the method of capturing gait, existing approaches to gait feature extraction are either vision-based or sensor-based. A camera is also a kind of sensor, but for simplicity, we categorize gait from cameras as vision-based and gait from other types of sensors as sensor-based. Preprocessing and post-processing steps may be applied depending on the method of acquiring gait.

1) GAIT FROM SENSORS
In the sensor-based approaches, gait features such as acceleration and gait speed are extracted from wearable sensors attached to the subject's body. Pressure and force-based gait features such as ground reaction force (GRF) are extracted from floor sensors installed on walking platforms or pressure-sensitive shoe insoles. Gait features such as stride length and range of motion can also be extracted using sensors.

2) GAIT FROM VIDEOS OR IMAGES
In the vision-based approach, gait features are extracted from images or videos of human walking sequences using a model-free or model-based approach. The model-based approaches utilize a skeleton model that includes the main body joints but ignores the general appearance of subjects. A skeleton model is fitted to the subjects' bodies to measure parameters based on body parts. On the other hand, the model-free approach is based on the appearance of subjects and does not require prior modeling of the human body. It involves background subtraction and binary silhouette generation. Silhouette images are grayscale and contain only the general outline and appearance information of subjects. Hence, the terms "model-free" and "appearance-based" are often used interchangeably.
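The silhouette generation step of the model-free approach can be illustrated with a minimal sketch. It assumes a static background image and a fixed intensity threshold, which are simplifications; the function name `binary_silhouette`, the threshold value, and the toy pixel values are illustrative and are not taken from any of the surveyed methods.

```python
from typing import List

Image = List[List[int]]  # grayscale image as rows of 0-255 intensities


def binary_silhouette(frame: Image, background: Image, threshold: int = 30) -> Image:
    """Mark a pixel as foreground (1) when it differs enough from the background."""
    return [
        [1 if abs(p - b) > threshold else 0 for p, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


# Toy 3x3 example: a bright "subject" in the middle column of a dark scene.
background = [[10, 10, 10], [10, 10, 10], [10, 10, 10]]
frame = [[12, 200, 11], [9, 210, 10], [10, 190, 13]]
silhouette = binary_silhouette(frame, background)
```

In practice, background models are usually adaptive rather than a single reference frame, so this sketch only conveys the idea of separating the subject from the scene.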

B. AGE ESTIMATION
Gait age prediction is often formulated as a classification problem or a regression problem. In the regression approaches, age is taken as a continuous value, and the predicted age could take on any value within a specified range. This solution is referred to as Real-Value Age Encoding (RVE) [28]. On the other hand, the classification approaches adopt Classification Age Encoding (CAE), where each age or predefined age group is taken as a separate class.
Other techniques used for age estimation include ranking and granular learning, which are based on classification, regression, or a combination of both. In ranking approaches, classification tasks are broken down into a series of classification subtasks for proper modeling of the ordinal information in age [31], [32]. In granular learning techniques, age estimation is performed from coarse-to-fine age groups. More specifically, age group classification is performed and repeated for each classified age group [33]. Granular learning takes advantage of age group-dependent features and improves age estimation performance [34].
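The difference between these target encodings can be sketched as follows. The class boundaries reuse the three age groups defined in this paper; the function names, the threshold list, and the simple rank decoder are hypothetical choices made for illustration, not the formulation of any cited work.

```python
from typing import List


def rve(age: float) -> float:
    """Real-value encoding: the regression target is the age itself."""
    return float(age)


def cae(age: float) -> int:
    """Classification encoding: 0 = child, 1 = adult, 2 = senior."""
    if age < 15:
        return 0
    if age < 65:
        return 1
    return 2


def ordinal_targets(age: float, thresholds: List[int]) -> List[int]:
    """Ranking-style encoding: one binary subtask 'is age > t?' per threshold."""
    return [1 if age > t else 0 for t in thresholds]


def decode_rank(bits: List[int]) -> int:
    """Hypothetical decoder: the number of positive subtasks gives the age rank."""
    return sum(bits)


thresholds = [10, 20, 30, 40, 50]  # assumed thresholds, for illustration only
targets = ordinal_targets(34, thresholds)
rank = decode_rank(targets)
```

Because neighboring ages share most of their binary targets, the ranking encoding keeps the ordering of ages that a plain class index discards.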
Evaluation: In classification tasks, age groups are predicted for each subject in a testing set. Performance is evaluated by measuring the prediction accuracy, that is, the proportion of correctly predicted age groups to the total number of subjects in the testing set. In regression tasks, the performance is evaluated by measuring the deviation of a subject's predicted age from the subject's actual age. This deviation is measured as the prediction error (in years). The prediction error across several subjects is measured as the Mean Absolute Error (MAE) or Mean Squared Error (MSE). Lower values of MAE and MSE are preferred. We discuss evaluation metrics in greater detail in Section 4.
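As a minimal sketch of these measures, the functions below compute accuracy for age group classification and MAE and MSE for age regression. The function names and the toy ages are assumptions made for illustration.

```python
from typing import Sequence


def accuracy(actual: Sequence[str], predicted: Sequence[str]) -> float:
    """Proportion of subjects whose age group was predicted correctly."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average absolute deviation between predicted and actual age, in years."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_squared_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average squared deviation; penalizes large errors more than MAE does."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Made-up ages for three test subjects.
actual_ages = [20, 40, 70]
predicted_ages = [25, 38, 61]
mae = mean_absolute_error(actual_ages, predicted_ages)  # (5 + 2 + 9) / 3
mse = mean_squared_error(actual_ages, predicted_ages)   # (25 + 4 + 81) / 3
acc = accuracy(["adult", "adult", "senior"], ["adult", "adult", "adult"])
```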

C. SPATIOTEMPORAL GAIT VARIABILITY
While individual gait is unique, it is influenced by various factors that lead to spatiotemporal gait variations even in the same individuals. These factors are collectively known as gait covariates. They include emotion, walking speed, and viewpoint. The approaches to gait feature extraction for age estimation are affected in different ways by gait covariates. For example, in an experiment conducted by [25], children and adults were allowed to go on free walks with sensors to note each time their heels touched the ground. The time between consecutive heel strikes was measured as the interstride interval (ISI). There were variations in ISI for both children and adults, but the ISI of the children showed more variation. In general, spatiotemporal variations reduce as gait matures [35].
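The interstride interval measurement described above can be sketched as follows. The heel-strike timestamps are made up, and the coefficient of variation is used here as one possible variability measure; it is an assumption and not necessarily the statistic reported in [25].

```python
from statistics import mean, stdev
from typing import List


def interstride_intervals(heel_strikes: List[float]) -> List[float]:
    """Time (s) between consecutive heel-strike timestamps."""
    return [later - earlier for earlier, later in zip(heel_strikes, heel_strikes[1:])]


def coefficient_of_variation(values: List[float]) -> float:
    """Sample standard deviation relative to the mean (dimensionless)."""
    return stdev(values) / mean(values)


# Hypothetical heel-strike times (s) for a steady and an irregular walker.
steady = [0.0, 1.0, 2.0, 3.0, 4.0]
irregular = [0.0, 0.8, 2.0, 2.7, 4.0]

steady_cv = coefficient_of_variation(interstride_intervals(steady))
irregular_cv = coefficient_of_variation(interstride_intervals(irregular))
```

A larger value indicates more stride-to-stride variation, which is the pattern reported for children relative to adults.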
Gait covariates are not part of gait and may or may not be observable. Although their effect on gait is often transient, they have considerable effects on the extracted gait features. Hence, gait feature extraction is perhaps the most important and challenging process in gait age prediction. We first briefly discuss the types of covariates and how they can influence gait age prediction. We discuss factors that affect the spatiotemporal variation of gait under three categories: internal, external, and demographic (Fig. 3). The internal factors include muscle strength, nervous system function, and the general state of health of the individual. External factors include viewpoint, carried objects (COs), worn objects (WOs), multi-tasking, path angle of elevation, and walking speed. The demographic factors include age, gender, and race. Since this paper focuses on gait age prediction, we take age as the main variable and consider other factors as covariates. We summarize the effects of these categories of covariates on age-associated gait features in Table 2.

1) INTERNAL FACTORS
Health is a significant influencer of gait. Gait affected by disease or disability is known as pathological gait. For example, the effects of degenerative diseases such as Parkinson's disease on gait have been well studied [37]. This paper focuses on healthy gait because the existing datasets used for research on gait age prediction only include gait features acquired from healthy individuals.
The other internal factors that affect spatiotemporal gait variability include strength and emotion. These often affect the model-based features such as walking speed and sensor-based gait features such as range of motion. Adults are physically stronger than children and have a more balanced and symmetrical walking motion [38]. Studies have shown that compared to adults, seniors have a slower walking speed, shorter step length, and reduced range of motion in the ankle joints [39]. The decline in gait speed in seniors can be attributed to a decrease in stride length [40], which could, in turn, be attributed to a decrease in muscle strength with age [41]. In a recent study, Roether et al. [42] demonstrated that emotions influence gait by identifying anger, sadness, fear, and happiness from gait features.
While medical and physiological studies can measure the impact of internal factors on gait, research in pattern recognition focuses on external and observable factors and estimates demographic factors from gait patterns. Therefore, an exhaustive review of internal factors that affect gait variability is beyond the scope of this paper.
VOLUME 9, 2021

2) DEMOGRAPHIC FACTORS
There are variations in gait for individuals of different genders but of the same age or age group, affecting gait features. Studies [43], [44] have shown gender-based differences in stride length, gait speed, cadence, and range of motion (ROM) of the hip and ankle. According to Callisaya et al. [45], the relationship between gait and age varies depending on gender. For males, the relationship between gait and age is simple and linear. However, for females, the relationship between gait and age involves more variables and is more complex. For example, worn objects and carried objects differ widely among males and females in most cultures. Men tend to dress more plainly and carry backpacks, which requires them to lean forward to counterbalance the added weight [46]. On the other hand, females traditionally carry handbags, which reduce their arm swings. Additionally, females often wear high-heeled shoes, which require more adaptability to maintain a balanced gait.
It is possible to improve the accuracy of age estimation by incorporating gender information. However, this approach requires either knowing the gender of all subjects in advance or gender prediction as a separate task [47]. Lu and Tan [16] partially mitigated this challenge by assuming that the gender information for randomly selected training samples was missing. Hence, the labels used were 0 for male, 1 for female, and 0.5 for missing values.

3) EXTERNAL FACTORS
The performance of gait as a biometric is affected by covariates such as view angles, carried objects, and dressing. According to Connie et al. [48], the most significant gait covariate is the view angle. The method presented by Khamsemanan et al. [49] was an attempt to perform gait recognition across different views. To facilitate research on multi-view gait analysis, [50], [51] recently released two large multi-view datasets with 14 viewpoints for each subject. Leveraging recent advances in pose estimation, [52], [53] proposed gait recognition methods that use body joint coordinates as features or inputs into Graph Convolutional Neural Networks [54]. Varying walking speed also presents a challenge in the measurement of spatiotemporal gait features. When a subject alters walking speed, static features relating to body size are not affected, but dynamic features such as stride length and ROM are. This covariate can be addressed by normalizing with walking speed or applying transforms such as the wavelet transform on gait sequence data [55].

III. LEARNING AGE FROM GAIT FEATURES
This section presents a review of techniques for gait age prediction following the hierarchical structure of age-associated gait features shown in Fig. 4.

A. GAIT FEATURES FROM VIDEOS AND IMAGES
In most approaches to gait age prediction, gait features are extracted from videos or images containing walking sequences of subjects using computer vision approaches. The vision-based features are either model-based or model-free. While the model-based features are obtained by measuring body parameters from skeleton models fitted to the subjects in the video or image, the model-free features are extracted from silhouette images obtained after background subtraction.

1) GAIT AGE FROM MODEL-BASED FEATURES
Model-based gait features could be broadly categorized as biological or kinematic. Biological features are based on the subject's shape and size, such as limb lengths, body height, head height, and derived features such as head-to-body ratio. These could also be referred to as static features, as they remain constant throughout a subject's gait cycle. On the other hand, kinematic features such as stride length and step length are dynamic and vary during locomotion. They measure the distances covered while walking and the amount of time taken for different gait cycle phases (Fig. 5). We define some gait features in Table 3 and compare children, adults, and seniors based on gait features in Table 4.
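To make the kinematic features concrete, the sketch below derives step length, stride length, cadence, and gait speed from a list of heel-strike events. It assumes that heel strikes alternate between feet and that each event comes with a time in seconds and a forward position in meters. The definitions used are the common ones and may differ in detail from those in Table 3; the function name and sample values are illustrative.

```python
from typing import Dict, List


def kinematic_features(times: List[float], positions: List[float]) -> Dict[str, float]:
    """Summarize a walk from heel-strike times (s) and forward positions (m)."""
    if len(times) != len(positions) or len(times) < 3:
        raise ValueError("need at least three matched heel strikes")
    # A step spans consecutive heel strikes (opposite feet).
    steps = [b - a for a, b in zip(positions, positions[1:])]
    # A stride spans consecutive heel strikes of the same foot (two steps).
    strides = [b - a for a, b in zip(positions, positions[2:])]
    duration = times[-1] - times[0]
    return {
        "step_length_m": sum(steps) / len(steps),
        "stride_length_m": sum(strides) / len(strides),
        "cadence_steps_per_min": 60.0 * len(steps) / duration,
        "gait_speed_m_per_s": (positions[-1] - positions[0]) / duration,
    }


# Made-up walk: one heel strike every 0.5 s, each 0.6 m further along.
features = kinematic_features(
    times=[0.0, 0.5, 1.0, 1.5, 2.0],
    positions=[0.0, 0.6, 1.2, 1.8, 2.4],
)
```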

a: BIOLOGICAL FEATURES
According to studies [38], [57], certain body ratios provide age information. One of these is the head-to-body ratio, which is larger in children. Using the head-to-body ratio as a feature, Ince et al. [17] performed age classification of pedestrians as child or adult. Additionally, children have shorter limbs, and their upper limbs appear closer to the ground. Children also tend to walk with their eyes focused on the ground, giving them a forward-tilted posture. Biological features are often sufficient to classify subjects as children or adults but may not be discriminative enough for more fine-grained classification or age regression. For example, subjects within the age range 30-60 years would all be classified as adults based on their head-to-body ratios. More fine-grained classification or regression can be achieved using kinematic features or hybrid features that combine biological and kinematic features.
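As a small illustration of a biological feature, the sketch below computes a head-to-body ratio from three vertical pixel coordinates. The landmark names (`head_top_y`, `chin_y`, `ankle_y`), the choice of landmarks, and the pixel values are assumptions for illustration; the cited works may measure the ratio differently.

```python
def head_to_body_ratio(head_top_y: float, chin_y: float, ankle_y: float) -> float:
    """Head height divided by body height, using image rows (y grows downward)."""
    body_height = ankle_y - head_top_y
    if body_height <= 0:
        raise ValueError("ankle must lie below the top of the head")
    return (chin_y - head_top_y) / body_height


# Made-up pixel rows for two subjects seen at different scales.
child_like = head_to_body_ratio(head_top_y=100, chin_y=140, ankle_y=300)  # 40 / 200
adult_like = head_to_body_ratio(head_top_y=50, chin_y=80, ankle_y=290)    # 30 / 240
```

Because the ratio is dimensionless, it does not depend on how far the subject is from the camera, which is one reason such ratios are convenient in surveillance footage.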

b: KINEMATIC FEATURES
A study by Sutherland [58] revealed that children between 12-18 months of age have reciprocal arm swings and heel-strikes, suggesting a difference in the wrist and ankle movement of children and adults. Because their limbs are relatively short, children have smaller arm swings and shorter stride lengths, but a higher cadence than adults and seniors [38]. Davis [14] made the earliest attempt at age classification with only 15 participants. A point model of the human body was used to perform a spatiotemporal analysis of head and ankle movements. A correct classification rate of 95% was achieved in classifying subjects as children (3-5 years) or adults.
Using features based on minimum foot clearance obtained from 58 subjects walking on treadmills, Begg et al. [15] performed age classification using Support Vector Machines. Zhang et al. [41] proposed an age classification technique using Hidden Markov Models. Contour features extracted from silhouette images were sufficient to model the subject's shape variations while walking and classify the subjects as either young or elderly. Frame to Exemplar Distance (FED) was applied for dimensionality reduction of extracted contour features.
In a study from the Baltimore Longitudinal Study of Aging [34], there were 190 participants with ages ranging from 32 to 93 years. These were divided into middle-aged (32-57 years), old (58-78 years), and oldest (79-93 years). At preferred walking speed, the range of motion (ROM) of the ankle (θ5 in Fig. 6) was lower for the middle-aged, while hip ROM (θ3 in Fig. 6) was lowest for the oldest group. Within the middle-aged group, stride width reduced with age, while stride width increased with age within the old group. These findings suggest that age-associated gait changes are age group-dependent. Hence, age estimation performance can be improved by granular learning, that is, by considering successively smaller age groups and learning the age-associated differences within each.
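The coarse-to-fine idea can be sketched as a two-stage predictor: a first model selects an age group, and a second model trained only on that group produces the final estimate. The code below shows only this control flow. The stand-in models, their parameters, and the single input feature are arbitrary placeholders, not results or models from the cited studies.

```python
from typing import Callable, Dict, Sequence, Tuple

Features = Sequence[float]


def granular_predict(
    features: Features,
    coarse_classifier: Callable[[Features], str],
    fine_estimators: Dict[str, Callable[[Features], float]],
) -> Tuple[str, float]:
    """Stage 1 picks a coarse age group; stage 2 applies that group's own estimator."""
    group = coarse_classifier(features)
    return group, fine_estimators[group](features)


# Placeholder models with arbitrary parameters, used only to exercise the flow.
def toy_coarse(features: Features) -> str:
    return "old" if features[0] < 1.0 else "middle-aged"


toy_fine = {
    "middle-aged": lambda f: 45.0,
    "old": lambda f: 70.0,
}

group, age = granular_predict([0.8], toy_coarse, toy_fine)
```

In a real system each fine estimator would be fitted on the subjects of its own group, so that it can exploit the group-dependent trends described above.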
Wu et al. [60] studied the kinematic characteristics of middle-aged, elderly, and young subjects with mean ages of 52.1 years, 74.8 years, and 23.3 years, respectively. Spatiotemporal parameters including gait speed, stride length, and step length were measured. The range of motion (ROM) of the angle formed by the center of pressure (COP) and center of mass (COM) was also measured. The most significant differences in spatiotemporal parameters were between the middle-aged and young subjects. A more recent study [65] discovered differences in the cadence and stride length of healthy adults and seniors. At the same walking speed, adults were reported to have higher cadence, longer stride length, and higher minimum toe clearance (MTC) than seniors. Seniors spent more time in the double-support stance phase and less time in the swing phase than adults. Adults showed greater flexion for both knee and ankle angles, with a lower ROM at the hip. Seniors also have a bent posture, wider bodies, and greater arm swings [38].
Yang and Wang [29] proposed a descriptor based on the lower limb joint angles. The joint angles were extracted as time-series signals during subjects' walks. The periodic signals obtained were expanded as Fourier series to solve the problem of missing data caused by occlusion. The harmonic coefficients of the resulting signal were then obtained using a genetic algorithm. The moduli of the coefficients obtained were used as feature vectors in age classification.
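The use of harmonic moduli as features can be illustrated with a short sketch. Note the substitution: [29] fits the Fourier coefficients with a genetic algorithm to cope with missing samples, whereas the code below computes them directly with a discrete Fourier sum and therefore assumes one complete, uniformly sampled gait cycle with no gaps. The function name and the synthetic joint-angle signal are illustrative assumptions.

```python
import math
from typing import List


def harmonic_moduli(signal: List[float], n_harmonics: int) -> List[float]:
    """Moduli of the first n_harmonics Fourier-series coefficients of one period."""
    n = len(signal)
    moduli = []
    for k in range(1, n_harmonics + 1):
        a_k = 2.0 / n * sum(
            x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(signal)
        )
        b_k = 2.0 / n * sum(
            x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(signal)
        )
        moduli.append(math.hypot(a_k, b_k))
    return moduli


# Synthetic joint angle (degrees) over one cycle: offset plus two harmonics.
N = 64
joint_angle = [
    10 + 3 * math.cos(2 * math.pi * i / N) + 1.5 * math.sin(4 * math.pi * i / N)
    for i in range(N)
]
feature_vector = harmonic_moduli(joint_angle, 3)
```

The moduli discard phase, so the resulting feature vector does not depend on where in the gait cycle the recording happened to start.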

c: HYBRID FEATURES
While biological and kinematic features contain age-associated information, higher accuracies are often attainable by a fusion of biological and kinematic features. Some features can also be derived from both biological and kinematic features. An example is the step factor, which keeps increasing until the age of 4 years, after which it remains constant [38]. Hediyeh et al. [61] explored step frequency and step length obtained automatically in uncontrolled environments for age and gender classification, achieving an accuracy of 86% for age classification. In a study to compare the performance of face, gait, and speech features, Punyani et al. [21] combined gait speed, head-to-body ratio, and gait height to perform age estimation.
Chuen et al. [16] extracted features including stride length, stride frequency, head length, body length, head-to-body ratio, leg length, and stature. The body parts and joint positions were identified and labeled from the silhouette images of subjects, and features were extracted. The extracted features were used in classifying subjects as either adults or children. Using skeleton models obtained from an RGB-D sensor, Yoo and Kwon [62] performed age and gender classification. They extracted features such as shoulder width, hip width, spine length, leg length, step width, and joint angles and achieved an accuracy of 85.58% for age classification.
Hema and Pitta [63] proposed gait energy image projection models (GPM), which combine the longitudinal and transverse projections of the gait image. The transverse projection captures variations in image height, while the longitudinal projection captures variations in image width across the gait cycle. This descriptor mainly focuses on the subject's head and arm movements, body size, and stride length.
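One possible reading of these projections is sketched below. It assumes that the longitudinal projection is the foreground count of each image row (the silhouette width at each height) and that the transverse projection is the foreground count of each image column (the silhouette height at each horizontal position). This mapping is an assumption made for illustration and may not match the exact definitions in [63]; the toy silhouette is also made up.

```python
from typing import List

Silhouette = List[List[int]]  # rows of 0/1 foreground flags


def longitudinal_projection(silhouette: Silhouette) -> List[int]:
    """Foreground pixels per row: how wide the silhouette is at each height."""
    return [sum(row) for row in silhouette]


def transverse_projection(silhouette: Silhouette) -> List[int]:
    """Foreground pixels per column: how tall the silhouette is at each position."""
    return [sum(column) for column in zip(*silhouette)]


# Toy 4x3 silhouette: head, arms spread, torso, legs apart.
toy = [
    [0, 1, 0],
    [1, 1, 1],
    [0, 1, 0],
    [1, 0, 1],
]
widths = longitudinal_projection(toy)
heights = transverse_projection(toy)
```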
Aderinola et al. [64] attempted age group classification using walking sequences of 154 subjects obtained from public domain video repositories such as YouTube. First, they performed pose estimation using a state-of-the-art pose estimation framework [67]. Then, they extracted the head-to-body ratio, lower limb length, step length, upper-limb-to-ground distance, cadence, and gait speed from body joint pixel coordinates. They achieved 96% accuracy in classifying the subjects as children, adults, or seniors. A summary of model-based age estimation techniques is shown in Table 5.

2) GAIT AGE FROM MODEL-FREE FEATURES
Model-free gait features are also referred to as appearance-based features since they are based on subjects' general shape and appearance in images or videos. Model-free feature extraction involves two main processes, namely, background subtraction and silhouette generation. In some gait recognition tasks, gait is represented as a binary silhouette sequence. The average silhouette image, also known as Gait Energy Image (GEI), was proposed by Han and Bhanu [66]. The GEI is a compact and robust gait descriptor widely used as an appearance-based gait descriptor for age estimation. Given a gait video sequence with N frames, the GEI G(x, y) can be obtained from binary silhouette images B_n(x, y) as:
$$G(x, y) = \frac{1}{N} \sum_{n=1}^{N} B_n(x, y)$$
where n is the frame number and (x, y) is the 2D image coordinate. GEIs include both space and time information.
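The averaging that produces a GEI can be sketched in a few lines. The code assumes that the silhouettes are already size-normalized and aligned, and it represents images as nested lists to stay free of dependencies; the function name and the toy frames are illustrative.

```python
from typing import List

Silhouette = List[List[int]]  # rows of 0/1 foreground flags


def gait_energy_image(silhouettes: List[Silhouette]) -> List[List[float]]:
    """Pixel-wise mean of aligned binary silhouettes over the sequence."""
    n_frames = len(silhouettes)
    if n_frames == 0:
        raise ValueError("at least one silhouette is required")
    height, width = len(silhouettes[0]), len(silhouettes[0][0])
    return [
        [sum(frame[y][x] for frame in silhouettes) / n_frames for x in range(width)]
        for y in range(height)
    ]


# Two toy 2x2 frames: one pixel is foreground in only one of them.
frames = [
    [[1, 0], [1, 1]],
    [[1, 0], [0, 1]],
]
gei = gait_energy_image(frames)
```

Pixels that are foreground in every frame keep the value 1 (static body parts), while pixels covered only part of the time take intermediate values (moving limbs), which is how a single image carries both spatial and temporal information.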
In comparison to the traditional representation of gait with binary silhouette sequences, GEIs are compact, lightweight, and robust. Each row in Fig. 7 shows examples of binary silhouette images with their corresponding GEIs. Since GEIs are affected by viewpoint variations in gait, Lu et al. [68] proposed a cluster-based average gait image (C-AGI). Unlike GEIs, C-AGIs can model human gaits from varying views and poses, but this causes excessive intra-class variations.
In more recent studies [69], [70], gait is taken as a set of gait silhouettes. This approach is more robust to view and walking variations and has achieved state-of-the-art accuracy in multi-view gait recognition. For other state-of-the-art gait representations for gait recognition that could be applied to age estimation, the reader is referred to [71]- [73].
We categorize model-free features as deep, handcrafted, or hybrid features. Deep features are extracted using Convolutional Neural Networks (CNNs). CNNs require no manual feature selection, as they can learn features automatically from silhouette images. On the other hand, handcrafted features are more suitable for classical machine learning techniques, and results obtained using handcrafted features are often more interpretable than those obtained using deep features.

a: DEEP FEATURES
Convolutional Neural Networks offer the unique capability of automatic feature learning from input GEIs and can perform well with large-scale data. Berksan [74] explored this to evaluate CNN architectures for gender classification and age estimation using average silhouette images as features, achieving an MAE of 5.74 years for age estimation. Using a large gait database with more than 60,000 subjects, Sakata et al. [75] performed age estimation using a DenseNet. Using a deep residual network, Zhang et al. [76] performed a multitask classification of subjects based on age and gender. The results obtained by [76]- [78] suggest that multi-task learning can improve the accuracy of age estimation. For example, learning age and gender in parallel can improve the accuracy of age estimation. However, this is not always the case. Instead of parallel multi-task learning, Sakata et al. [47] proposed a sequential multi-task CNN for age estimation by predicting gender as a first step.
Compared to model-based features, appearance-based gait features often show greater disparities between the gait age and the actual age of subjects. For example, depending on the quality of life and habits, a forty-year-old man may have a younger gait age than a twenty-year-old man based on appearance. To properly model these uncertainties, Sakata et al. [79] presented a method that predicts the age of subjects with a confidence score. Xu et al. [80] also performed uncertainty-aware age estimation and demonstrated its effectiveness in human search and counting by age group.
Additionally, appearance-based gait often varies widely with gender, dressing, and carried objects. To account for the effects of gender, Abirami et al. [81] presented a method that relates age group with gender, using the Hilbert-Schmidt Independence Criterion (HSIC) to project high-dimensional data onto a low-dimensional subspace for age estimation. Li et al. [82] proposed a method to mitigate the challenge of carried objects by using generative adversarial networks (GANs). Given GEIs with or without carried objects as input, the GAN is trained to generate GEIs without carried objects. To model the ordinal information in age, Zhu et al. [83] posed the age regression task as a series of binary age classification sub-tasks. The proposed neural network architecture contains sub-networks capable of learning the local and global gait features.
The extraction of GEI or average silhouette features from subjects requires at least one gait cycle. This requirement introduces latency in the gait age prediction process. To address this and attempt real-time gait age prediction, Xu et al. [87] reconstructed the silhouette sequence of an entire gait cycle from a single image for age estimation.
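The one-cycle requirement follows from how an average silhouette is formed: the descriptor is a pixel-wise mean over the aligned binary silhouettes of a gait cycle. A minimal Python sketch is given below, assuming the silhouettes are already size-normalized and aligned, and are represented as nested lists of 0/1 values. The function name `gait_energy_image` is illustrative.

```python
def gait_energy_image(silhouettes):
    """Pixel-wise average of aligned binary silhouettes from one gait cycle.

    `silhouettes` is a list of frames; each frame is a list of rows of 0/1.
    All frames are assumed to have the same height and width.
    """
    if not silhouettes:
        raise ValueError("at least one silhouette frame is required")
    height = len(silhouettes[0])
    width = len(silhouettes[0][0])
    # Accumulate foreground counts per pixel, then divide by the frame count.
    totals = [[0] * width for _ in range(height)]
    for frame in silhouettes:
        for r in range(height):
            for c in range(width):
                totals[r][c] += frame[r][c]
    count = len(silhouettes)
    return [[value / count for value in row] for row in totals]


if __name__ == "__main__":
    frames = [
        [[1, 0], [1, 1]],
        [[1, 1], [0, 1]],
    ]
    print(gait_energy_image(frames))
```

Pixels that are foreground in every frame take the value 1, while pixels swept by the limbs take intermediate values, so the descriptor is only meaningful once the frames span a full cycle.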

b: HANDCRAFTED FEATURES
Using Gaussian process regression, Makihara et al. [27] performed age estimation using three silhouette-based features: GEI, FREQuency domain features, and Gait Periods (GP). The FREQ features achieve the lowest MAE of 8.2 years. Lu and Tan [18] proposed a Gabor-filtered GEI, which showed superior performance to the original GEI feature in age estimation tasks. Different age labels were encoded as a binary sequence, and a multilabel guided subspace (MLG) was proposed as a projection to characterize the relationship between age and gender. Makihara et al. [38] also used a multi-view dataset to classify subjects into four classes: children, adult males, adult females, and elderly, using the average gait features for each class. They first selected nine age groups and then performed Linear Discriminant Analysis (LDA) to view the inter-class distances between adjacent age groups. Adjacent age groups with inter-class distances below a threshold were then combined to form a class.
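The merging of adjacent age groups described above can be sketched as follows. This is only an illustration of the thresholding step: the distances between adjacent groups are assumed to have been computed beforehand (in [38], in the LDA space), and the group labels, distance values, threshold, and the function name `merge_adjacent_groups` are hypothetical.

```python
def merge_adjacent_groups(groups, adjacent_distances, threshold):
    """Merge neighbouring age groups whose inter-class distance is small.

    groups: ordered list of age-group labels.
    adjacent_distances: distance between groups[i] and groups[i + 1],
        so it must have len(groups) - 1 entries.
    threshold: adjacent groups closer than this are placed in one class.
    """
    if len(adjacent_distances) != len(groups) - 1:
        raise ValueError("need one distance per pair of adjacent groups")
    classes = [[groups[0]]]
    for group, distance in zip(groups[1:], adjacent_distances):
        if distance < threshold:
            classes[-1].append(group)   # too similar: extend current class
        else:
            classes.append([group])     # well separated: start a new class
    return classes


if __name__ == "__main__":
    labels = ["0-4", "5-9", "10-14", "15-24"]
    print(merge_adjacent_groups(labels, [0.2, 1.5, 0.3], threshold=0.5))
```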
With a similar approach, Li et al. [33] first clustered gait features with age labels. Then, using manifold learning techniques, they trained a support vector regressor for each cluster. Lu and Tan [84] also proposed an ordinary preserving manifold learning technique, which they applied to age estimation using a multiple linear regression model.

c: HYBRID FEATURES
Inspired by the improved performance offered by biometric feature fusion, there have been attempts to improve age estimation performance through a fusion of gait descriptors. For example, [85] proposed a Silhouette Model (SM) descriptor for age group classification based on the longitudinal and transverse projections of subjects' silhouette images during the gait cycle. The longitudinal projections describe the stride length, arm swing, and body size, while the transverse projection describes the height and posture of subjects. As an improvement on SM, Mansouri et al. [30] proposed SGF, a fusion of SM, GEI, and FED. They showed that the fusion-based descriptor performed better than any of the individual descriptors for age classification. It has also been shown that gait offers better performance in age estimation when combined with features obtained from face images [86]. Table 6 shows a summary of model-free age estimation techniques.
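As a rough illustration of projection-based silhouette descriptors such as SM, the Python sketch below counts the foreground pixels of a binary silhouette along each row and each column. It is a generic sketch under stated assumptions, not the exact descriptor of [85]: which profile corresponds to the "longitudinal" and which to the "transverse" projection is left open here, and the function name `silhouette_projections` is hypothetical.

```python
def silhouette_projections(silhouette):
    """Project a binary silhouette onto the two image axes.

    silhouette: list of rows, each a list of 0/1 pixel values.
    Returns (row_profile, column_profile), where each entry is the number
    of foreground pixels in that row or column.
    """
    row_profile = [sum(row) for row in silhouette]
    column_profile = [sum(column) for column in zip(*silhouette)]
    return row_profile, column_profile


if __name__ == "__main__":
    frame = [
        [0, 1, 0],
        [1, 1, 1],
        [1, 0, 1],
    ]
    print(silhouette_projections(frame))
```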

B. GAIT AGE FEATURES FROM SENSORS
In the sensor-based approaches, gait features are extracted from wearable sensors attached to the subject's body, floor sensors installed on walking platforms, or audio signals. Floor sensors measure the forces or pressures exerted during the stance phase of gait, also referred to as Ground Reaction Forces (GRF). A GRF profile is obtained when these forces are measured for a footstep.
By fusing the data from multiple low-cost wearable inertial sensors, several gait parameters can be reliably estimated. For example, [92] estimated step length from low-cost Inertial Measurement Units (IMUs) placed on both feet of subjects. Qiu et al. [93] obtained lower limb joint angles of subjects. Gait parameters such as gait speed, stance phase, and swing phase can also be measured using inertial sensors [94]. Wearable sensors such as accelerometers measure the acceleration of subjects during locomotion as a time-series signal. Since acceleration is a function of body mass and forces acting on the body, accelerometer-based features capture a lot of gait information [36]. Gait features can also be extracted from the sound signals generated during a gait cycle. For example, the sound signal generated during a subject's heel strike occurs periodically and is distinct from the more subtle signal generated during the swing phase.
These approaches are commonly used for person recognition. However, sensor-based gait signals have been shown to contain discriminative features that can be used to estimate age, gender, and height [19]. In a recent competition for age and gender classification using wearable inertial measurement units (IMUs), a mean absolute error (MAE) as low as 5.39 was obtained using accelerometer-based gait features [90]. Since acceleration is a function of mass and force, the acceleration values recorded from wearable sensors depend on how close they are to the subject's center of mass and center of pressure. Riaz et al. [19] estimated age, gender, and height from the accelerations and angular velocities obtained from IMUs attached to subjects' chest, lower back, right wrist, and left ankle. They applied the moving average technique to suppress noise in the raw acceleration features. Khabir et al. [88] extracted time-domain features from inertial sensor data and performed age estimation using Support Vector Machines (SVM) and Decision Trees. They eliminated noise using a low-pass Butterworth filter. Gillani et al. [89] performed age estimation and gender classification based on accelerometer-based gait features collected from inertial sensors. Using accelerometer-based features, Yuhan et al. [91] classified subjects as young-middle age (18-65 years), healthy older (65 years and above), and geriatric patients. A summary of sensor-based approaches to age estimation is presented in Table 7.
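The moving average technique mentioned above for suppressing noise in raw acceleration signals can be sketched in a few lines of Python. The window length and the function name `moving_average` are assumptions for illustration; the window sizes used in the cited works are not reproduced here.

```python
def moving_average(signal, window):
    """Smooth a 1-D signal with a simple (unweighted) moving average.

    Returns len(signal) - window + 1 values, one per full window position.
    """
    if window < 1 or window > len(signal):
        raise ValueError("window must be between 1 and len(signal)")
    running = sum(signal[:window])
    smoothed = [running / window]
    for i in range(window, len(signal)):
        # Slide the window: add the newest sample, drop the oldest one.
        running += signal[i] - signal[i - window]
        smoothed.append(running / window)
    return smoothed


if __name__ == "__main__":
    acceleration = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(moving_average(acceleration, window=3))
```

A longer window removes more high-frequency noise but also blurs sharp events such as heel strikes, so the window length is a trade-off that depends on the sampling rate.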

C. SUITABILITY OF GAIT AGE FEATURES
The age estimation task is often cast as classification, regression, or ranking. As with any pattern recognition task, the kind of features required for classification differs from those required for regression. Sensor-based gait features can be used for both age group classification and regression (Table 7). However, vision-based features may vary in their applicability to age estimation tasks depending on whether they are model-free or model-based.
Model-free gait descriptors are lightweight and require little computational power. Their compactness makes them suitable for the storage of large amounts of gait data. Hence, the largest gait datasets make use of the model-free GEI descriptor. Convolutional Neural Networks offer the unique capability of automatic feature learning from input GEIs and can perform well with large-scale data. Due to the availability of large amounts of well-annotated data and the robustness of GEIs, many model-free techniques can take advantage of the power of deep learning for age group classification, age regression, and age ranking.
Working with model-based features often involves a manual feature selection procedure, which is time-consuming and requires domain expertise. Moreover, since most publicly available datasets are model-free, model-based approaches often use modest-sized self-collected datasets. Whereas deep learning models offer automatic feature selection and powerful prediction capabilities, they require large datasets. The small size of model-based datasets and the feature selection requirement make them unsuitable for deep learning models. Additionally, due to the wide variability of gait across ages and age groups, it may be challenging to estimate specific ages directly from model-based gait features. Hence, most model-based approaches use conventional machine learning approaches for prediction and perform only age group classification (see Table 5). Only Punyani et al. [21] performed regression using model-based features. The main advantage of model-based features over model-free features is the greater interpretability of age prediction results.

IV. PERFORMANCE BENCHMARKING
In recent years, public datasets with age information that can be used to evaluate age estimation techniques have become more widely available. This section discusses the publicly available datasets and the most common evaluation metrics for gait-based age estimation.

A. DATASETS
While many public gait datasets are available, datasets with no age metadata are unsuitable for age estimation. Public gait datasets containing age metadata are listed in Table 8.
The USF dataset was published by the University of South Florida. The initial version of the dataset [95] contained 452 walking sequences from 74 subjects, with 75% being male. The dataset contains variations in the walking surface, viewpoint, and footwear. The current version of the USF dataset [96] contains 122 subjects with 1870 gait sequences extracted from video sequences. The number of subjects is small, and the subjects' age range is narrow. There are neither children nor seniors in the dataset. Additionally, the male-to-female ratio is more than 2:1, which may introduce gender bias in predictions. Notwithstanding, the USF dataset contains many sequences per subject and is often used to evaluate gait recognition and gait age prediction techniques.
The TUM-GAID dataset [97] was collected at The Technical University of Munich using Microsoft Kinect sensors. This dataset has the advantage of being multi-modal, as Microsoft Kinect sensors output visual images, depth images, and audio streams. Additionally, the TUM-GAID is the only publicly available multi-modal sensor-based dataset containing age and gender metadata. However, with 305 subjects, 186 of whom are male, the TUM-GAID dataset is relatively small and not well balanced in terms of gender. Also, there are no children or seniors in the dataset.
Vajdi et al. [98] published a dataset based on accelerometer data collected from 93 subjects. The data was collected from two mobile phones attached to each subject's body. One was attached to the right thigh, and the other to the left side of the waist. Each subject walked a total of 640 meters. Since acceleration and angular velocity data were captured, the dataset is suitable for gait recognition and more general gait analysis based on walking motion. So far, this is the only publicly available accelerometer-based gait dataset containing age and gender metadata. The dataset is balanced in terms of gender, with a male-to-female ratio very close to 1. However, the number of subjects is small, and the dataset does not include children or seniors.
The Osaka University Institute of Scientific and Industrial Research (OU-ISIR) has published several datasets suitable for age estimation. The first was the OU-ISIR Treadmill dataset C [38], which includes 88 males and 80 females between the ages of 4 and 75 years from 25 view angles. Each 10-year age group in the dataset consisted of at least 10 subjects, making the dataset very diverse and robust in terms of age. However, the size of this dataset is relatively small. Additionally, it was collected on treadmills, which may not accurately model the normal walking gait of subjects.
The OULP dataset [99] with normalized silhouette images was obtained from the walking sequences of 4007 subjects. The OULP dataset is large, gender-balanced, and all age groups are represented. However, the number of seniors in the dataset is small compared to the number of children and adults. Besides, the dataset does not include any variation in walking conditions.
On the other hand, the OULP-Sensor [100] is the largest inertial sensor-based gait dataset to date. The walking path slope angle of subjects was varied to make the data more robust. Four different types of sensors were used and placed around the subjects' waists at the back, left, and right. Both acceleration and angular velocity were captured, making the dataset suitable for gait recognition and general gait analysis based on walking motion. With the subjects' ages ranging from 2 to 78 years, all age groups are represented. In addition, the dataset is well balanced in terms of gender, with 389 males and 355 females. The main limitation of this dataset is that the walking sequence for each subject is relatively short, a total of 12 meters.
The OULP-Age dataset [20], with more than 60,000 subjects, is by far the largest gait dataset in the world. It includes the average silhouette images of the subjects' walking sequences. The subjects' ages range from 2 to 90 years. Both male and female subjects are equally represented in the dataset. Between the ages of 0 and 70 years, each 5-year age group consists of 500 subjects or more. The main advantage of this dataset is its coverage of a very large population.
The OULP multi-view dataset, OUMVLP [50], is the largest multi-view gait dataset. With average silhouette images obtained from walking sequences of 5144 males and 5193 females, the OUMVLP dataset is balanced in terms of gender. The cameras were placed at 14 viewpoints with 15-degree intervals for each subject: 7 views between 0° and 90° and another 7 views between 180° and 270°. The wide view angle variation makes the OUMVLP dataset very suitable for evaluating multi-view gait recognition and gait age prediction techniques. However, the age information for some of the subjects is not provided.
The OUMVLP-Pose dataset [51] was generated based on the OUMVLP dataset. The dataset includes the pose sequences of all the subjects in the OUMVLP dataset. The pose sequences were extracted using two state-of-the-art pose estimation frameworks. Sharing the advantages of the OUMVLP dataset, the OUMVLP-Pose is the largest model-based gait dataset and is suitable for evaluating gait recognition and gait age estimation techniques. However, some pose sequences are missing, as well as the age metadata of some subjects.
VersatileGait [101] is a large synthetic dataset with 11000 silhouette images generated directly from game engines. It is the first publicly available synthetic gait dataset. Apart from its size, the dataset includes complex scenarios such as dressing and flexible viewpoint angles. Aside from high-level metadata such as identity, fine-grained descriptions such as walking style, age, and gender are also included.

B. EVALUATION METRICS
The evaluation metrics used for gait age prediction include mean absolute error (MAE), mean squared error (MSE), root mean squared error (RMSE), cumulative score (CS), and correct classification rate (CCR). For consistency in comparison among the different techniques, we only report MAE, MSE, and CCR.

1) REGRESSION METRICS
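Metrics used for gait age regression tasks include MAE, MSE, RMSE, and CS. Given an evaluation dataset with $N$ subjects, let $y_n$ be the actual age and $\hat{y}_n$ the gait age of the $n$-th subject. The MAE, MSE, and RMSE are given as:
\[ \mathrm{MAE} = \frac{1}{N} \sum_{n=1}^{N} \left| y_n - \hat{y}_n \right| \]
\[ \mathrm{MSE} = \frac{1}{N} \sum_{n=1}^{N} \left( y_n - \hat{y}_n \right)^2 \]
\[ \mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{n=1}^{N} \left( y_n - \hat{y}_n \right)^2} \]
Lower values of MAE, MSE, and RMSE mean better performance. The MAE is the most used metric for regression in gait age prediction. However, a few works, such as [88], use MSE for evaluation. The MSE metric places higher penalties on large errors than the MAE.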
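For concreteness, the three regression metrics can be computed as in the following minimal Python sketch. The function names and the sample ages are illustrative only and do not come from any of the surveyed works.

```python
import math


def mae(actual, predicted):
    """Mean absolute error between actual and predicted ages."""
    return sum(abs(y - y_hat) for y, y_hat in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Mean squared error between actual and predicted ages."""
    return sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error, expressed in the same unit as the ages."""
    return math.sqrt(mse(actual, predicted))


if __name__ == "__main__":
    actual_ages = [20, 30, 40, 50]      # hypothetical ground-truth ages
    gait_ages = [22, 26, 40, 50]        # hypothetical predicted gait ages
    print(mae(actual_ages, gait_ages))  # absolute errors are 2, 4, 0, 0
    print(mse(actual_ages, gait_ages))
    print(rmse(actual_ages, gait_ages))
```

In this example the single 4-year error contributes 16 of the 20 units of squared error, which illustrates why the MSE and RMSE are more sensitive to occasional large errors than the MAE.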
Apart from obtaining the performance of gait age prediction using the MAE, MSE, or RMSE, it is possible to gain more insight into the performance of an age regression model by determining its error tolerance. This is often achieved by using the cumulative score, $CS(k)$, given as:
\[ CS(k) = \frac{N_{e \le k}}{N} \times 100\% \]
where $e = \left| y_n - \hat{y}_n \right|$ is the absolute prediction error for sample $n$, $k$ is the set error threshold, and $N_{e \le k}$ is the total number of samples for which $e \le k$. $CS(k)$ shows the percentage of test samples for which the absolute error is within the set error threshold.
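A minimal Python sketch of the cumulative score is given below. The function name `cumulative_score` and the sample values are illustrative assumptions.

```python
def cumulative_score(actual, predicted, k):
    """Percentage of samples whose absolute age error is at most k years."""
    within = sum(1 for y, y_hat in zip(actual, predicted) if abs(y - y_hat) <= k)
    return 100.0 * within / len(actual)


if __name__ == "__main__":
    actual_ages = [20, 30, 40, 50]   # hypothetical ground-truth ages
    gait_ages = [22, 26, 40, 50]     # hypothetical predicted gait ages
    for k in (0, 2, 5):
        print(k, cumulative_score(actual_ages, gait_ages, k))
```

Evaluating the score over a range of thresholds gives a curve that is non-decreasing in k, which makes it easy to compare how quickly different methods reach a tolerable error level.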

2) CLASSIFICATION METRICS
The accuracy score or correct classification rate (CCR) is commonly used as the main evaluation metric in age classification tasks. In binary classification, CCR can be obtained in terms of the True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN):
\[ \mathrm{CCR} = \frac{TP + TN}{TP + TN + FP + FN} \]
However, age classification is often cast as a multi-class problem with more than two target age groups. Given a dataset with $N$ samples and $C$ classes, let $p_n$ be the predicted class and $x_n$ the actual class of the $n$-th sample, with $p_n, x_n \in \{1, 2, \ldots, C\}$. The CCR may be generalized for multi-class problems by using the Kronecker delta $\delta(p_n, x_n)$, which equals 1 when $p_n = x_n$ and 0 otherwise:
\[ \mathrm{CCR} = \frac{1}{N} \sum_{n=1}^{N} \delta(p_n, x_n) \]
While the CCR is often used as the main metric in classification tasks, it could sometimes be misleading. Some more reliable measures of model performance include the class-weighted precision, recall, and F1 score. For multi-class problems, they can be obtained from the confusion matrix, $M \in \mathbb{N}^{C \times C}$, which can be represented as:
\[ M = \begin{bmatrix} n_{11} & n_{12} & \cdots & n_{1C} \\ n_{21} & n_{22} & \cdots & n_{2C} \\ \vdots & \vdots & \ddots & \vdots \\ n_{C1} & n_{C2} & \cdots & n_{CC} \end{bmatrix} \]
where $n_{ij}$ is the number of predictions for class $i$ when the actual class is $j$, and the diagonal elements $n_{ii}$ represent correct predictions. The precision for class $i$ shows the proportion of samples predicted to be of class $i$ that truly belong to class $i$. It can be obtained by dividing the diagonal element of row $i$ by the sum of all row $i$ elements. On the other hand, the class recall shows the proportion of samples in class $j$ that are correctly predicted. It can be found by dividing the diagonal element of column $j$ by the sum of all column $j$ elements.
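If we denote the diagonal elements of $M$ as $m \in \mathbb{N}^{C}$, the precision, recall, and F1-score for class $x$, $x \in \{1, 2, \ldots, C\}$, can be written as:
\[ \mathrm{Precision}_x = \frac{m_x}{\sum_{j=1}^{C} n_{xj}} \]
\[ \mathrm{Recall}_x = \frac{m_x}{\sum_{i=1}^{C} n_{ix}} \]
\[ \mathrm{F1}_x = \frac{2 \cdot \mathrm{Precision}_x \cdot \mathrm{Recall}_x}{\mathrm{Precision}_x + \mathrm{Recall}_x} \]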
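The classification metrics described in this subsection can be sketched in Python as follows. The sketch follows the convention stated in the text, in which rows of the confusion matrix index the predicted class and columns index the actual class. Classes are numbered from 0 rather than 1 for convenience, and the function names and sample labels are illustrative assumptions.

```python
def ccr(predicted, actual):
    """Correct classification rate: fraction of samples predicted correctly."""
    return sum(1 for p, x in zip(predicted, actual) if p == x) / len(actual)


def confusion_matrix(predicted, actual, num_classes):
    """Build M where M[i][j] counts predictions of class i with actual class j."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for p, x in zip(predicted, actual):
        matrix[p][x] += 1
    return matrix


def per_class_scores(matrix, cls):
    """Return (precision, recall, f1) for one class of the confusion matrix."""
    correct = matrix[cls][cls]
    predicted_total = sum(matrix[cls])               # sum over row `cls`
    actual_total = sum(row[cls] for row in matrix)   # sum over column `cls`
    precision = correct / predicted_total if predicted_total else 0.0
    recall = correct / actual_total if actual_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    actual_groups = [0, 0, 1, 1, 2, 2]      # hypothetical age-group labels
    predicted_groups = [0, 1, 1, 1, 2, 0]   # hypothetical model outputs
    m = confusion_matrix(predicted_groups, actual_groups, num_classes=3)
    print(ccr(predicted_groups, actual_groups))
    print(m)
    print(per_class_scores(m, 1))
```

Returning 0.0 when a class is never predicted or never occurs is a design choice of this sketch to avoid division by zero; other conventions, such as reporting the score as undefined, are equally possible.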

V. DISCUSSION
This section discusses the current progress of gait age predictions, potential applications in the laboratory and real-life scenarios, and challenges in gait age prediction that should be addressed in the future.

A. POTENTIAL APPLICATION DOMAINS
Gait age prediction is by no means a solved problem as gait feature extraction remains a challenging task due to the various covariates. Additionally, feature fusion is usually necessary to improve the performance of gait age prediction.
To the best of our knowledge, gait age prediction has not been deployed for public use. One question that may arise for public deployment is, "Which gait modality is best suited for gait age prediction?" The vision-based and sensor-based approaches to gait feature extraction each have their merits and demerits. However, no single approach can be said to be generally better than the other. Instead, their suitability depends on the application domain. The application domains can be grouped generally as gait in the lab and gait in the wild. Gait in the lab refers to application areas in which gait is collected under constrained conditions with control over the covariates such as dressing, gender, carried objects, walking speed, walking slope, and footwear. Gait in the wild refers to gait collected with little or no control over these covariates and many others.

1) GAIT IN THE LAB
Sensor-based gait analysis is more predominant in gait in the lab research, such as medical research, where there is strict control over gait covariates. One of the main advantages of sensor-based gait features is that they are not affected by external covariates such as viewpoint and dressing. Sensor data can provide precise gait data, but it often requires specialized equipment and a laboratory setting with constraints. Though inertial sensors are relatively low-cost and gait features can be extracted from inertial sensors embedded in smartphones and smartwatches, acquiring gait features from sensors is still not unobtrusive, and wearable sensors must be carried by the subjects.
Even with recent developments, including the development of pressure-measuring floor tiles and shoe insoles [36], the question remains whether sensor-based techniques can be deployed on a large scale outside laboratory settings. A potential use for sensor-based gait age is gait simulation based on age-associated gait features. Age-based gait simulation could be used in specific domains such as animation and gaming to make character movement more realistic based on age.

2) GAIT IN THE WILD
Gait signals in the wild can only be acquired using vision-based techniques, either model-based or model-free. Though more computationally expensive, the model-based approaches do not require background subtraction and are more robust to external covariates such as clothing and carried objects. Hence, model-based techniques may be more suitable for gait in the wild.
Also, to take full advantage of gait age prediction for surveillance, prediction should be done in real-time, which can be achieved more easily with model-based techniques. Since model-free approaches are based on average silhouettes or a set of binary silhouette images, they require several video frames to capture at least one gait cycle. This requirement introduces some latency in gait age prediction. To address this, Xu et al. [87] proposed a model-free method that predicts age and gender from a single image. Given a single image, they reconstruct the complete gait silhouette sequence for age estimation and gender classification. They demonstrated the efficacy of this method for real-time age and gender prediction. However, the accuracy of gait age prediction depends on the accuracy of the generated average silhouette. Potential application domains for gait age prediction in the wild include public surveillance, security, and access control. For example, the age group of a masked criminal can be predicted from public surveillance footage; an unattended child can be detected in a public place; a minor can be prevented from buying an alcoholic drink from a vending machine, and so on.

B. CHALLENGES IN GAIT AGE PREDICTION
Several issues and challenges in gait age prediction have not been addressed or are yet to be thoroughly addressed in the literature. They include:

1) GAIT AGE UNCERTAINTY
Errors in age estimation often arise due to disparities between predicted age and the actual age of subjects. These disparities arise because age prediction is based only on observable patterns. Subjects' gait patterns may be affected by unobservable factors such as disability, quality of life, and health. Hence, gait age prediction should include some corresponding level of uncertainty. To the best of our knowledge, only the works of [79] and [80] present methods that predict subjects' ages while considering these uncertainties.

2) EFFECTS OF TIME-LAPSE
Several datasets consider covariates such as view angle, dressing, and carried objects. However, no gait dataset provides gait information of subjects over a long time. For instance, the FG-NET face dataset [102] provides face images of the same individual across different ages, which could be used to simulate the effects of aging on face images. It would be worthwhile to have gait datasets that would enable the simulation of aging effects on gait. To the best of our knowledge, the only works that attempt gait-based age progression are [103] and [104].

3) VIEW INVARIANT GAIT AGE PREDICTION
Viewpoint is one of the main covariates of gait. Certain gait features can be obtained in the frontal view, while others can be obtained only in the sagittal view of gait. There are numerous studies on gait recognition across different views. However, there is a need to study which view of gait offers the most age-discriminating features. Studies in this area can give insight into gait features that are both view-invariant and age-discriminative.

4) MULTI-MODAL GAIT AGE PREDICTION
Research in biometrics has found that a fusion of features from different biometric modalities offers higher accuracy in recognition tasks. While there are fusion approaches that combine different types of gait features for age prediction, only the work of [86] combines gait with another biometric modality for age estimation. In the study, age estimation was performed by fusing gait features with features obtained from face images of subjects.

5) REAL-TIME GAIT AGE PREDICTION
If gait is captured based on posture and how humans walk, what is the minimum number of steps required to capture age-discriminative features? This question needs to be answered if gait age prediction is to be applied in the real world. Many existing approaches to gait age prediction introduce some latency due to the number of steps or video frames required to capture sufficient gait information. One study that attempts to solve this problem is [87], which reconstructs the silhouette sequence of a complete gait cycle from a single image.

VI. CONCLUSION
Automatic age estimation is a rapidly growing research area that finds practical use in medicine, security, surveillance, and access control. Gait age prediction takes advantage of the uniqueness and unobtrusive nature of human gait. The list of potential practical applications is long. For example, the age group of a crime suspect can be predicted from the manner of walking captured in surveillance footage; unattended children in public places such as airports can be detected, and so on. Many of these require obtaining subjects' features from a distance in an unobtrusive manner, making gait a perfect candidate.
Compared to research on face-based age estimation, gait age prediction is still in its infancy and has not been well studied. With the current progress in gait age prediction, it is still essentially unusable in real-life scenarios, that is, in the wild. This is perhaps due to the challenges involved in obtaining gait features, such as the effects of covariates on the performance of gait features. However, these covariates apply mainly to gait features and have little or no effect on biometrics obtained from other modalities. Future research in gait age prediction will most likely overcome these challenges by fusing gait features with features from other biometric modalities that are not affected by the same covariates, making for more robust age prediction methods.

He is currently an Associate Professor with the Faculty of Information Sciences and Technology (FIST), Multimedia University. He has published more than 60 international refereed journal articles and conference papers. His research interests include biometric security, machine learning, and data analytics. He was the Conference Chair of ICoICT2017 and ICoICT2019. He has served as the Editorial Board for IEEE BIOMETRIC COUNCIL NEWSLETTER, from 2013 to 2015.
WEI-CHUEN YAU (Member, IEEE) received the B.S. and M.S. degrees from National Cheng Kung University, Taiwan, and the Ph.D. degree from Multimedia University. He is currently an Associate Professor with the School of Electrical and Computer Engineering, Xiamen University Malaysia. He is also a Chartered Engineer (CEng) and a Certified Information Systems Security Professional (CISSP). His research interests include cryptography, security protocols, machine learning, and network security. He was a General Co-Chair of Mycrypt 2016. He has also served as a Guest Editor for the ETRI JOURNAL Special Issue on Cyber Security and AI.
ANDREW BENG JIN TEOH (Senior Member, IEEE) received the B.Eng. degree in electronic and the Ph.D. degree from the National University of Malaysia, in 1999 and 2003, respectively. He is currently a Full Professor with the Department of Electrical and Electronic Engineering, College Engineering, Yonsei University, South Korea. He has published more than 300 internationally refereed journal articles, conference papers, edited several book chapters, and edited book volumes. His research, for which he has received funding, focuses on biometric applications and biometric security. His current research interests include machine learning and information security. He served and is serving as a Guest Editor for IEEE Signal Processing Magazine, an Associate Editor for IEEE TRANSACTIONS OF INFORMATION FORENSICS AND SECURITY, IEEE BIOMETRICS COMPENDIUM, and Machine Learning with Applications, and the Editor-In-Chief for IEEE BIOMETRICS COUNCIL NEWSLETTER.