Physical Fatigue Detection From Gait Cycles via a Multi-Task Recurrent Neural Network

This paper describes a deep learning approach to classify physically fatigued and non-fatigued gait cycles via a recurrent neural network (RNN), where each gait cycle is represented as a time series of three-dimensional coordinates of body joints. Gait cycles inherently have large intra-class variations caused by gait stance differences (e.g., which foot is supporting/swinging) at the beginning of each gait cycle, which makes it difficult to identify subtle differences induced by fatigue. To overcome these difficulties, we introduce a supporting foot-aware RNN model in a multi-task learning framework for better fatigue detection. More specifically, the RNN model has two branches of layers: one is assigned to the main task of fatigue classification and the other is assigned to the auxiliary task of estimating the first supporting foot in the gait cycles. We collected physically fatigued and non-fatigued gait cycles from eight subjects and conducted experiments to evaluate the accuracies of the proposed multi-task model in comparison to a single-task model. As a result, the proposed method achieved an overall area under curve (AUC) of 0.860 for fatigue classification in a leave-one-subject-out cross-validation, and an AUC of 0.915 in a leave-one-day-out evaluation. It can be concluded from the experimental results that a fatigue detection system for daily use, especially for screening purposes, is very feasible on the basis of the proposed approach.


I. INTRODUCTION
Fatigue is a serious concern and several studies have reported its worldwide prevalence. Fatigue has been measured in a variety of ways and populations with various numbers of survey respondents: 14.3% of males and 20.4% of females in a United States Health and Nutrition Examination Survey [1]; 38% (18.3% for 6 months or longer) in a British community survey [2]; 25% in an Australian primary-care study [3]; 22% (11% for 6 months or longer) in the general Norwegian population [4]; and 17.2% for fatigability (from a state of no fatigue to fatigue onset) and 13.6% for residual fatigue (from fatigue onset to complete recovery) in the general Japanese population [5]. Fatigue is common in the working population [6] and has adverse effects on working efficiency, resulting in the loss of productivity and substantial economic costs [7]-[9]. Furthermore, fatigue can lead to serious occupational accidents [10], for example, in the transportation industry [11], in the construction industry [12], and during surgical operations [13]. Physical fatigue [14] in particular degrades physical performance and might lead to increasingly more fatal accidents (e.g., a fall from a high place on a construction site).
The associate editor coordinating the review of this manuscript and approving it for publication was Joewono Widjaja.
The condition of physical fatigue affects human gait and posture [15]-[18]. In addition, gait (walking) is one of the most common activities in everyday life, and hence it can be a promising clue for detecting physical fatigue in various situations [14], [19], [20].
As a means of gait measurement, an optical motion capturing (MoCap) system and inertial measurement units (IMUs) have been used for fatigue analysis [14], [19]-[21]. Both the MoCap system and the IMUs, however, place a burden on the subject (e.g., optical markers or the IMUs need to be attached to body parts). A user therefore might feel uncomfortable using this system in everyday life.
As another means of gait measurement, computer vision-based approaches are promising, because they do not place the abovementioned burden on the subject. In particular, recent progress in human pose estimation using depth images (e.g., Microsoft Kinect [22]) or color images (e.g., OpenPose [23]) enables us to obtain gait kinematics (e.g., sequences of joint positions) relatively easily. In fact, vision-based gait kinematics have already been used in biometric person identification [24].
Variations in the gait kinematics between physically fatigued and non-fatigued samples of the same subject (i.e., inter-class variations) are, however, quite subtle, and hence intra-class variations (e.g., differences among subjects) overwhelm the inter-class variations. Moreover, different gait stances (e.g., which foot is the supporting foot at the beginning of a sequence) increase the intra-class variations because temporal patterns of gait cycles as time-series signals inherently differ from each other according to the gait stance.
To overcome the large intra-class variation due to gait stance differences, we propose a supporting foot-aware deep learning model for physical fatigue detection. Fig. 1 shows the overview of the proposed approach. More specifically, we take a gait cycle of three-dimensional (3D) joint positions as the input and feed it to a recurrent neural network (RNN) to output fatigue/non-fatigue probabilities. Moreover, the RNN outputs supporting feet (i.e., left or right) probabilities and hence both physical fatigue detection and supporting foot classification tasks are handled in a multi-task learning (MTL) framework. Note that the effectiveness of the MTL was verified in several situations where the inter-class variations for one task are regarded as intra-class variations for another task (e.g., in the MTL of action recognition and subject identity recognition [25], inter-subject variations are regarded as intra-class variations for action recognition, whereas inter-action variations are regarded as intra-class variations for subject identity recognition). The main contributions of this paper are summarized as follows.
• A computer vision-based approach to physical fatigue detection: We propose the first computer vision-based approach to physical fatigue detection with gait analysis. Because our method does not place any burden on the subject, such as requiring him or her to wear the optical markers or the IMUs, it can be a part of a transparent system in which users are not necessarily aware that their fatigue state is being monitored.
• A supporting foot-aware MTL-RNN model: Our model simultaneously estimates physical fatigue conditions and the first supporting foot in the gait cycles in an MTL framework, which enables us to effectively handle intra-class variations due to gait stance differences.
• High-accuracy physical fatigue detection: The proposed method yielded higher accuracy than a baseline of an RNN model with a single-task framework, which demonstrates the effectiveness of the proposed MTL framework. Specifically, we achieved approximately 80% sensitivity and specificity, which shows the feasibility of physical fatigue detection in screening use.

A. RELATED WORK

1) FATIGUE DETECTION
Various approaches of fatigue detection have been proposed in the literature, where each study aims at detecting a specific type of fatigue usually defined by the authors because there are still more questions than answers with respect to the status of fatigue [6], [26]. An individual's internal or external alterations due to fatigue can be measured objectively by various sensors and/or subjectively through questionnaires. An internal measure of fatigue is the heart rate. Heart rate variability derived from electrocardiography signals has been found to be useful for the early detection of driver fatigue or drowsiness [27] and to be a sensitive indicator of mental stress during computer-based work [28]. Although consumer wearable devices having the function of heart-rate monitoring have become popular and easily obtainable and are possibly accurate enough for fatigue detection [29], a noninvasive and contact-free approach, which is based on externally measurable alterations of fatigue, is more desirable and useful for daily use.
External alterations due to fatigue have been observed and measured as, for example, an increase in step width during walking found in older adults [15] and young adults [16], head stability during walking [17], and postural control impairment when standing on one leg [18]. As these studies suggest, gait patterns can be altered by physical fatigue and quantified through various measurements. An optical MoCap system provides accurate measurements of human body movements and can, for example, be used to estimate the gradual increase in fatigue from kinematic changes during a training exercise [21]. However, a MoCap system requires substantial setup and calibration time and a large space that makes it difficult to bring the system out of the laboratory. Movements of some body parts can probably also be measured by IMUs or actigraphy and employed as clues for classifying the state of physical fatigue or predicting the development of physical fatigue [14], [30]. Optical markers and wearable sensors obviously have to be attached to body parts and might be uncomfortable for use in everyday life. Therefore, they are still far from being part of a transparent system in which users are not conscious of being monitored for their fatigue state.
Gait patterns can be altered by physical fatigue and quantified through various measurements. Ground reaction forces have been used to investigate the effects of fatigue on gait parameters [31]. While those gait parameters based on the kinetic interaction of the foot with the ground should not be irrelevant to upper-body movements, ground reaction forces are not helpful in extracting and evaluating gait features that are directly related to parts of the upper body, such as the head and wrists. IMUs have also been used to represent gait patterns that can be classified into non-fatigued and fatigued conditions using machine learning methods, such as support vector machines [19], [20]. Strapping an IMU to a single body part, such as the sternum [20] or ankle [19], can be more comfortable than attaching IMUs to several body parts but cannot extract gait features from the other body parts.

2) DEEP LEARNING MODELS TO ANALYZE SKELETON SEQUENCES
Skeleton-based action recognition has recently become an active research topic. Deep learning approaches that can handle 3D skeleton sequences as inputs have been proposed and achieved high performance on publicly available data sets of action recognition [32]. An independently recurrent neural network (IndRNN) [33] applicable to various tasks involving sequential data outperformed state-of-the-art methods in action recognition [34]. A spatio-temporal long short-term memory (ST-LSTM) network [35] was proposed to encode spatial context in the skeleton structure of the human body as well as temporal context related to the motion dynamics of each joint. We adopted an IndRNN as a building block to construct a classifier model because an IndRNN has some advantages over an LSTM [33].
If the joint coordinates of skeletons are simply fed into an RNN model as inputs, the spatial relationship among body joints will not be explicitly taken into account. Geometric representations considering relations among body joints have been investigated in the literature [36]. On the other hand, encoding a 3D skeleton sequence into an image-like representation [37] allows convolutional neural networks that can achieve high performance in image recognition to be employed. Inspired by these works, we adopted a spherical coordinate system to encode the spatial relationship between anchor joints and the other joints.
A multi-task RNN [25] was proposed to simultaneously tackle both action recognition and person identification, where inter-action and inter-subject variations are regarded as intra-class variations for subject identity recognition and action recognition, respectively. The architecture was similar to ours except that they adopted an LSTM as a building block.
Fatigue detection from gait cycles requires specific approaches because, unlike the abovementioned action recognition, there is only one action (i.e., a walk), and a classifier model has to learn subtle differences in gait patterns between fatigued and non-fatigued conditions under large intra-class variations due to gait stance differences among gait cycles.

3) RECURRENT NEURAL NETWORKS FOR GAIT RECOGNITION
RNN and LSTM models have been recently applied to skeleton-based gait recognition tasks: for example, person identification [38] and pathological gait classification [39], [40]. None of these studies, however, considered the first supporting foot in the gait cycles. We demonstrate that the RNN models can achieve high performance in physical fatigue classification when simultaneously estimating the first supporting foot in the gait cycles.

A. DATA COLLECTION
The procedures of data collection were as follows (Fig. 2): Participants 1) had their heart rate measured at rest for 10 minutes, 2) performed four trials of walking for a duration of 5 minutes with a 1-minute rest between trials, 3) exercised until they could be considered to be fatigued, and 4) walked for 10 minutes. Participants gave a response on a visual analogue scale (VAS) [41] after each procedure. Eight males (23.0 ± 0.76 years of age) participated in this study with informed consent. There was not much difference in body shape among participants, but the heights of the participants ranged from 169.9 cm to 188.0 cm. This study, including data collection procedures, was approved by the ethics committee of the Institute of Scientific and Industrial Research at Osaka University. Neither how physically active a participant was in daily life nor how often he exercised or played a sport was taken into account. The proposed approach could fail if gait behavior were affected by factors such as neuromuscular, balance, and vision disorders, but such participants were not included in this study.
VOLUME 9, 2021
We were interested in the alterations of gait patterns due to physical fatigue and the feasibility of fatigue detection based on gait behavior. This study adopted a step-up exercise to induce physical fatigue. Paillard [42] proposed a conceptual model on the basis of previous studies [43], [44], whereby muscular exercise deteriorates postural control when its intensity is above the lactate threshold. On the basis of these studies, participants were asked to step up and down on a box having a height of 33.5 cm for 600 seconds at an exercise intensity above 60%, which could be considered exercise at vigorous intensity [45]. Figure 3 shows that it typically took a few minutes of exercise for a participant to exceed the exercise intensity of 60%, and participants thus had to continue exercising for approximately 11-17 minutes.
The exercise intensity was calculated every second as the percentage of the heart rate reserve (%HRR) according to [46]:
$$\%\mathrm{HRR} = \frac{\text{HR} - \text{resting HR}}{\text{maximal HR} - \text{resting HR}} \times 100, \tag{1}$$
where the maximal heart rate (maximal HR) was estimated as 220 − age. The heart rate at rest (resting HR) was measured in the supine position for 10 minutes and the average rate for the last 5 minutes was used. During the data collection procedures, participants wore a chest-strap heart-rate sensor (WHS-1, UNION TOOL CO.) that transmitted data wirelessly to monitor the heart rate and exercise intensity in real time.
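As an illustration of the %HRR computation described above, the following minimal sketch assumes the standard heart-rate-reserve definition and the 220 − age estimate stated in the text. The function name and the example values are hypothetical and are not taken from the study data.

```python
def percent_hrr(hr, resting_hr, age):
    """Exercise intensity as a percentage of the heart rate reserve."""
    max_hr = 220 - age                 # maximal HR estimate stated in the text
    reserve = max_hr - resting_hr      # heart rate reserve
    if reserve <= 0:
        raise ValueError("resting HR must be below the estimated maximal HR")
    return 100.0 * (hr - resting_hr) / reserve


# Hypothetical participant: 23 years old, resting HR 60 bpm, current HR 150 bpm.
intensity = percent_hrr(150, 60, 23)
above_target = intensity > 60.0        # the 60% target intensity from the text
```

Under this definition, the resting heart rate maps to 0% and the estimated maximal heart rate maps to 100%.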
The gait capturing setup from the top view is depicted in Fig. 4. The participants were asked to start at one end of a pathway, walk to the other end, and then return and repeat at their preferred pace. The length of the pathway is about 10 meters. Participants walked for 5 minutes in each trial and performed four trials with a 1-minute rest between trials before they performed the exercise. After the exercise, they also walked for 10 minutes. Using this protocol, we can capture the participant's gait behavior under non-fatigued and physically fatigued conditions. ''Round-trip'' walks were captured by two Kinect sensors with an image size of 512 by 424 pixels and a frame rate of 30 fps. The sensors were located at the two ends of the pathway.
The participants responded to a VAS question four times in total. They were asked to put an ''X'' mark on a horizontal line segment with a length of 10 cm according to the level of fatigue they felt; the left end of the line segment corresponded to the participant feeling no tiredness at all while the right end corresponded to the participant feeling too tired to do anything. A score was obtained by measuring the distance in millimeters from the left side of the scale to the participant's mark on the line segment. The VAS can be used as a subjective measure of fatigue. Figure 5 shows that the VAS scores just after exercise increased compared with those before exercise for all participants, which means that the participants felt tired by the exercise.

B. DATA PREPROCESSING
Because the Kinect sensor with its software development kit (SDK) can estimate 3D joint positions reliably in its effective range (i.e., 0.9-4.0 m) and in a frontal view (i.e., when a subject walks toward the Kinect sensor), we used data captured from the frontal view in the effective range (see Fig. 4).
The Kinect sensor sometimes failed to capture frames exactly at 30 fps (resulting in non-uniform capturing/sampling intervals because of such issues as frame drops and latency) and also suffered from noisy 3D joint positions due to frame-by-frame independent estimation. To arrange the frames at regular intervals and mitigate the noise, we upsampled an original time series of the 3D joint positions from 30 fps to 100 fps and then applied a Savitzky-Golay smoothing filter [47] to the upsampled time series.
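The resampling and smoothing step can be sketched as follows. This is only an illustration under stated assumptions: the text does not name the interpolation scheme or the filter settings, so the linear interpolation, `window_length=11`, and `polyorder=3` below are hypothetical choices, as are the function and variable names.

```python
import numpy as np
from scipy.signal import savgol_filter


def resample_and_smooth(timestamps, joints, target_fps=100.0,
                        window_length=11, polyorder=3):
    """Resample irregularly captured frames to a uniform rate, then smooth.

    timestamps: (T,) capture times in seconds, strictly increasing.
    joints:     (T, D) flattened joint coordinates per frame.
    """
    timestamps = np.asarray(timestamps, dtype=float)
    joints = np.asarray(joints, dtype=float)
    duration = timestamps[-1] - timestamps[0]
    n_out = int(np.floor(duration * target_fps + 1e-9)) + 1
    uniform_t = timestamps[0] + np.arange(n_out) / target_fps
    # Linear interpolation per coordinate (assumed; not specified in the text).
    resampled = np.column_stack(
        [np.interp(uniform_t, timestamps, joints[:, d])
         for d in range(joints.shape[1])]
    )
    # Savitzky-Golay filter applied along the time axis.
    smoothed = savgol_filter(resampled, window_length, polyorder, axis=0)
    return uniform_t, smoothed


# Hypothetical capture: roughly 30 fps with jittered intervals.
rng = np.random.default_rng(0)
t_raw = np.cumsum(1.0 / 30.0 + rng.uniform(-0.003, 0.003, size=90))
raw = np.column_stack([np.sin(2 * np.pi * 2.0 * t_raw),
                       np.cos(2 * np.pi * 2.0 * t_raw)])
raw += rng.normal(scale=0.02, size=raw.shape)
t_uniform, clean = resample_and_smooth(t_raw, raw)
```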
If the periodicity of a time series was low (e.g., due to severe outliers), the time series was regarded as an erroneous observation and excluded from the dataset. Such observations can be detected computationally by checking for non-periodicity according to the power spectral density of the Y coordinate of the trunk. The power spectral density of the Y coordinate of the trunk was estimated using its periodogram P and used to remove erroneous observations that met both the conditions in Eq. (2), where the former condition ensures that the peak of the periodogram is significant according to Fisher's g-statistic [48], [49] and the latter ensures that the frequency at which the power is a maximum is valid. Figure 6 shows an example of an erroneous observation characterized by non-periodicity, where the significance of the largest peak is low and the corresponding frequency is irrelevant to the walking cadence.
We then detected gait cycles from the remaining upsampled time series. Specifically, the time series of the Y coordinate (i.e., the vertical direction) of the trunk shows its up-and-down movements during a gait cycle, and hence the interval spanning three consecutive peaks (i.e., two steps) is extracted as a gait cycle (see Fig. 7). Note that the gait stance at the beginning of a gait cycle (i.e., at a peak of the vertical position of the trunk) is a single support phase, in which one foot is supporting while the other foot is swinging.
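The periodicity check can be sketched as below. The sketch computes Fisher's g-statistic as the largest periodogram ordinate divided by the sum of all ordinates. How the two thresholds are compared and combined is an assumption here (a series is kept only if its peak is significant and its peak frequency is at least f_th); the threshold values are those reported in the implementation details, and the function name is hypothetical.

```python
import numpy as np


def periodicity_check(y, fs=100.0, g_th=0.32, f_th=1.0):
    """Return (is_periodic, g, f_max) for a trunk Y-coordinate series.

    The comparison directions are assumptions, not taken from the paper.
    """
    y = np.asarray(y, dtype=float)
    y = y - y.mean()                               # remove the DC component
    power = np.abs(np.fft.rfft(y)) ** 2            # unnormalized periodogram
    freqs = np.fft.rfftfreq(y.size, d=1.0 / fs)
    power, freqs = power[1:], freqs[1:]            # drop the zero-frequency bin
    total = power.sum()
    if total == 0.0:
        return False, 0.0, 0.0                     # constant signal
    g = power.max() / total                        # Fisher's g-statistic
    f_max = freqs[np.argmax(power)]                # frequency of maximum power
    is_periodic = (g >= g_th) and (f_max >= f_th)
    return bool(is_periodic), float(g), float(f_max)


# Hypothetical signals sampled at 100 fps for 5 seconds.
t = np.arange(500) / 100.0
walking_like = 0.02 * np.sin(2 * np.pi * 2.0 * t)      # 2 Hz oscillation
noise_only = np.random.default_rng(0).normal(size=500)

ok_walk, g_walk, f_walk = periodicity_check(walking_like)
ok_noise, g_noise, f_noise = periodicity_check(noise_only)
```

Because g is a ratio of powers, it does not depend on the amplitude of the signal, which makes a single threshold usable across subjects of different heights.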
Moreover, we detect the supporting foot at the beginning of a gait cycle, which is later used for training the RNN model which simultaneously estimates physical fatigue condition and the supporting foot. More specifically, because the ankle joint of the supporting foot stays almost still while that of the swinging foot moves, the foot with the smaller Z coordinate derivative (i.e., speed in the depth direction) of the ankle joint is recognized as the supporting foot (see Fig. 7).
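Gait cycle extraction and supporting foot labeling can be sketched together as follows. The peak finder is deliberately simple, the use of overlapping cycles (one starting at every peak) is an assumption, and all function names and synthetic signals are hypothetical.

```python
import numpy as np


def local_peaks(y):
    """Indices of strict local maxima (a deliberately simple peak finder)."""
    y = np.asarray(y, dtype=float)
    return np.flatnonzero((y[1:-1] > y[:-2]) & (y[1:-1] > y[2:])) + 1


def gait_cycles(trunk_y):
    """(start, end) frame indices spanning three consecutive trunk peaks."""
    peaks = local_peaks(trunk_y)
    return [(int(peaks[i]), int(peaks[i + 2])) for i in range(len(peaks) - 2)]


def first_supporting_foot(left_ankle_z, right_ankle_z, start, fs=100.0):
    """Label the foot whose ankle moves more slowly in depth at frame `start`."""
    left = np.gradient(np.asarray(left_ankle_z, dtype=float))[start] * fs
    right = np.gradient(np.asarray(right_ankle_z, dtype=float))[start] * fs
    return "left" if abs(left) < abs(right) else "right"


# Synthetic example: trunk oscillating at 2 Hz, sampled at 100 fps.
t = np.arange(250) / 100.0
trunk_y = 0.02 * np.cos(2 * np.pi * 2.0 * t)
cycles = gait_cycles(trunk_y)

# The left ankle stays still while the right ankle approaches the sensor.
left_z = np.full(250, 3.0)
right_z = 3.0 - 1.2 * t
foot = first_supporting_foot(left_z, right_z, cycles[0][0])
```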
The lengths (i.e., the number of frames) of the gait cycles differ, but they can be padded with dummy (zero) values to align them for efficient batch computation on GPUs. We, however, chose a fixed number of frames per gait cycle. Specifically, we adopted a sampling strategy proposed in [35]. We first divide an upsampled time series of a gait cycle into $N_{\mathrm{seg}}$ segments, and then randomly choose a frame from each segment, which yields $N_{\mathrm{seg}}$ frames in total. Note that the $N_{\mathrm{seg}}$ segments retain the temporal order and hence the chosen $N_{\mathrm{seg}}$ frames also retain the temporal order. In contrast, the time interval between adjacent frames is not regular but varying, which can be interpreted as a sort of simulated time-warped signal with unstable sampling intervals. The many possible combinations of the $N_{\mathrm{seg}}$ frames are useful for data augmentation in the training phase, and are also useful during the test phase because we can combine predicted probabilities from multiple sets of randomly chosen frames in a (soft) voting scheme, which enhances the statistical reliability of physical fatigue detection.
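The segment-wise random sampling can be sketched as follows, assuming equal-width segments computed with integer arithmetic. The function name and the array shapes in the example are hypothetical.

```python
import numpy as np


def sample_frames(cycle, n_seg=60, rng=None):
    """Pick one random frame from each of n_seg consecutive segments.

    cycle: (T, D) array holding one gait cycle with T >= n_seg frames.
    Returns the sampled frames (n_seg, D) and their frame indices.
    """
    cycle = np.asarray(cycle)
    n_frames = cycle.shape[0]
    if n_frames < n_seg:
        raise ValueError("gait cycle is shorter than n_seg frames")
    rng = np.random.default_rng() if rng is None else rng
    # Segment k covers frames [edges[k], edges[k + 1]).
    edges = (np.arange(n_seg + 1) * n_frames) // n_seg
    idx = rng.integers(edges[:-1], edges[1:])      # upper bound is exclusive
    return cycle[idx], idx


# Hypothetical gait cycle: 107 frames at 100 fps, 288 features per frame.
rng = np.random.default_rng(0)
cycle = rng.normal(size=(107, 288))
frames_a, idx_a = sample_frames(cycle, rng=rng)
frames_b, idx_b = sample_frames(cycle, rng=rng)    # a different augmented sample
```

Because every index is drawn from its own segment, the sampled frames are always in temporal order, while repeated calls produce different frame sets from the same cycle.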

C. NETWORK ARCHITECTURE
Among the variants of RNN and LSTM that can capture time-series patterns, IndRNN [33] has some advantages such as regulation to address the vanishing [50] and exploding gradient problems, which are partly owing to the use of skip connections to convey features learned in earlier layers to later layers (see Fig. 8). The IndRNN has also achieved state-of-the-art accuracy in skeleton-based action recognition. We therefore adopted the IndRNN as our backbone network.
Furthermore, we combine an MTL framework with the IndRNN model to extend it beyond a straightforward single-task IndRNN model. The aim is to cope with the large intra-class variations due to gait stance differences at the beginning of the gait cycle as well as to leverage the subtle inter-class variations between physically fatigued and non-fatigued samples derived from the same walking action, as discussed in the introduction. More specifically, the MTL-RNN model has task-specific branches: one for the main task of physical fatigue detection and the other for a sub-task (or an auxiliary task) that estimates the first supporting foot. The model also has a shared (or common) stream between them, as shown in Fig. 8. The whole MTL-RNN model contains four IndRNN layers: two for the common stream and two for each of the main-task and sub-task streams. The branch for physical fatigue detection can focus on extracting features that are effective for its own task because the other branch undertakes the estimation of the first supporting foot and the shared layers extract essential features from the gait cycles for both tasks.
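The layer layout described above can be illustrated with a forward-only NumPy sketch. It uses the basic IndRNN recurrence, h_t = ReLU(W x_t + u ⊙ h_{t−1} + b), with two shared layers followed by two layers per task branch. Everything else is an assumption made for brevity: the small hidden size, the random initialization, the use of the last time step for classification, and the omission of skip connections, dropout, and training. All names are hypothetical.

```python
import numpy as np


def indrnn_layer(x, W, u, b):
    """IndRNN recurrence: h_t = relu(W @ x_t + u * h_{t-1} + b)."""
    hidden = np.zeros(W.shape[0])
    outputs = np.empty((x.shape[0], W.shape[0]))
    for t in range(x.shape[0]):
        hidden = np.maximum(0.0, W @ x[t] + u * hidden + b)
        outputs[t] = hidden
    return outputs


def init_layer(rng, n_in, n_out):
    W = rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_out, n_in))
    u = rng.uniform(0.0, 0.5, size=n_out)   # elementwise recurrent weights
    return W, u, np.zeros(n_out)


def init_model(rng, n_in, n_hidden):
    model = {"shared": [init_layer(rng, n_in, n_hidden),
                        init_layer(rng, n_hidden, n_hidden)]}
    for task in ("fatigue", "foot"):
        model[task] = [init_layer(rng, n_hidden, n_hidden) for _ in range(2)]
        model[task + "_head"] = (rng.normal(0.0, 0.1, size=n_hidden), 0.0)
    return model


def mtl_forward(x, model):
    """Return a probability for each task from one (T, D) input sequence."""
    h = x
    for W, u, b in model["shared"]:          # shared (common) stream
        h = indrnn_layer(h, W, u, b)
    probs = {}
    for task in ("fatigue", "foot"):         # task-specific branches
        z = h
        for W, u, b in model[task]:
            z = indrnn_layer(z, W, u, b)
        w, c = model[task + "_head"]
        logit = float(w @ z[-1] + c)         # last time step (assumed)
        probs[task] = 1.0 / (1.0 + np.exp(-logit))
    return probs


rng = np.random.default_rng(0)
model = init_model(rng, n_in=12, n_hidden=16)
sequence = rng.normal(size=(60, 12))         # 60 frames of a toy input
probs = mtl_forward(sequence, model)
```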
The proposed model can be trained using the whole collected data starting with both left and right supporting feet because the MTL-RNN model can effectively handle the intra-class variations derived from the difference in the first supporting foot, i.e., it does not matter whether the first supporting foot of a gait cycle is left or right.

D. LOSS FUNCTION
Because both the physical fatigue detection and supporting foot classification tasks are binary classification problems, the binary cross-entropy can be used to measure the training and validation losses. The weight of the cross-entropy loss for physical fatigue detection was determined by the fact that the number of negative samples was nearly double the number of positive samples (as shown in Fig. 9). Once we define the loss functions for both tasks as $L_{\mathrm{PFD}}$ and $L_{\mathrm{SFC}}$, respectively, the whole loss function $L$ is defined as a combination of the two task losses, each weighted according to its uncertainty, together with regularization terms in $\log \sigma$. This loss function was inspired by the study [51]. Here, $\sigma_{\mathrm{PFD}}$ and $\sigma_{\mathrm{SFC}}$ represent the uncertainty of physical fatigue detection and the first supporting foot estimation, respectively (in practice, physical fatigue detection is less confident, or more uncertain, than the first supporting foot estimation). The loss of a task with a larger uncertainty contributes less to the total loss. The $\log \sigma$ term plays the role of a regularizer to avoid a trivial solution with high uncertainty.
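One commonly used form of an uncertainty-weighted multi-task loss is sketched below for illustration. The exact coefficients of the loss used in this work are not reproduced here, so the 1/σ² weighting, the log σ parameterization, and the positive-class weight of 2 are all assumptions, as are the function names.

```python
import math


def weighted_bce(p, y, pos_weight=1.0):
    """Binary cross-entropy for one sample with a weight on the positive class."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)
    return -(pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p))


def uncertainty_weighted_loss(l_pfd, l_sfc, log_sigma_pfd, log_sigma_sfc):
    """Assumed form: L = L_PFD / s_PFD^2 + L_SFC / s_SFC^2 + log s_PFD + log s_SFC."""
    w_pfd = math.exp(-2.0 * log_sigma_pfd)   # 1 / sigma_PFD^2
    w_sfc = math.exp(-2.0 * log_sigma_sfc)   # 1 / sigma_SFC^2
    return w_pfd * l_pfd + w_sfc * l_sfc + log_sigma_pfd + log_sigma_sfc


# Hypothetical predictions for one fatigued gait cycle starting on the left foot.
l_pfd = weighted_bce(0.6, 1, pos_weight=2.0)   # weight of 2 is an assumption
l_sfc = weighted_bce(0.98, 1)
total = uncertainty_weighted_loss(l_pfd, l_sfc,
                                  log_sigma_pfd=math.log(2.0),
                                  log_sigma_sfc=0.0)
```

Optimizing log σ rather than σ keeps the uncertainty positive without constraints. With σ_PFD = 2 in the example, the fatigue loss is scaled by 1/4, while the added log 2 term penalizes inflating the uncertainty further.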

E. INPUT REPRESENTATION
In this subsection, we introduce the input representation of gait cycles fed into the deep learning model for physical fatigue detection.
Because the absolute positions of the 3D joints depend on the walking position of a subject, which causes unnecessary intra-class variations, we employ their positions relative to some anchor joints, which encodes the spatial relationship between the other joints and the anchor joints. Specifically, the relative position of each joint from the anchor joint is represented using a spherical coordinate system (i.e., a 3D representation using radius r and two angles φ, θ, as shown in Fig. 1), which is unlike the cylindrical coordinate system used in [37]. This is because the representation with the spherical coordinate system yielded better accuracy than that with the cylindrical coordinate system in our preliminary experiments.
Following [37], we utilized $N_{\mathrm{anchor}} = 4$ anchor joints: left and right shoulders and hips. Given a total of $N_{\mathrm{joint}}$ joints, we obtain a total of $N_{\mathrm{anchor}}(N_{\mathrm{joint}} - 1)$ pairs of the anchor joints and all other joints except for themselves. We then simply concatenate the 3D relative positions from all the pairs into a single vector whose dimension is $3N_{\mathrm{anchor}}(N_{\mathrm{joint}} - 1)$. Finally, $N_{\mathrm{seg}}$ frames of the vectors constitute the input for the deep learning model, and the temporal characteristics of the gait patterns as well as the spatial configuration of the body joints are learned through training.
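The per-frame feature vector can be sketched as follows. The angle conventions (polar angle measured from the vertical Y axis, azimuth in the X-Z plane), the joint count of 25, and the anchor indices are assumptions chosen for illustration; the function name is hypothetical.

```python
import numpy as np


def spherical_features(joints, anchor_ids):
    """Encode one frame as anchor-relative spherical coordinates.

    joints: (N_joint, 3) array of x, y, z positions.
    Returns a vector of length 3 * N_anchor * (N_joint - 1).
    """
    joints = np.asarray(joints, dtype=float)
    features = []
    for a in anchor_ids:
        rel = np.delete(joints, a, axis=0) - joints[a]   # all joints but the anchor
        r = np.linalg.norm(rel, axis=1)
        cos_theta = np.divide(rel[:, 1], r, out=np.zeros_like(r), where=r > 0)
        theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))  # angle from +Y (assumed)
        phi = np.arctan2(rel[:, 2], rel[:, 0])            # azimuth in X-Z (assumed)
        features.append(np.stack([r, theta, phi], axis=1).ravel())
    return np.concatenate(features)


# Hypothetical frame: 25 joints and four anchor indices (shoulders and hips).
rng = np.random.default_rng(0)
frame = rng.normal(size=(25, 3))
vector = spherical_features(frame, anchor_ids=[4, 8, 12, 16])
```

With 25 joints and four anchors, the vector has 3 × 4 × 24 = 288 elements, which matches the dimension formula given above.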

F. IMPLEMENTATION DETAILS
Each IndRNN layer has 512 units and was trained for 150 epochs with 32 sequences per batch, where 16 samples were randomly generated from each gait cycle sequence, and thus, 512 samples were generated per batch in the training and validation phases. In the testing phase, 64 samples were generated from each gait cycle sequence to predict the label by voting. Adam optimization [52] was used with an initial learning rate of $2 \times 10^{-4}$ and a weight decay of $1 \times 10^{-4}$, and the learning rate was reduced by a factor of 10 after five epochs with no improvement. The dropout rate was set to 0.5. We experimentally set $N_{\mathrm{seg}} = 60$, and $g_{\mathrm{th}} = 0.32$ and $f_{\mathrm{th}} = 1.0$ in Eq. (2).

G. EXPERIMENTAL PROTOCOLS AND EVALUATION MEASURES

1) NUMBER OF POSITIVE AND NEGATIVE SAMPLES
We confirmed and compared the number of positive (i.e., physically fatigued) and negative (i.e., physically non-fatigued) samples for each subject, which were obtained by the proposed gait cycle detection. As described before, data augmentation was applied to the obtained samples by means of gait cycle upsampling and the subsequent random selection of frames, which were used for the following experiments.

2) LEAVE-ONE-SUBJECT-OUT CROSS-VALIDATION
The generalization capability of classification models that robustly discriminate between the physically fatigued and non-fatigued conditions of even unknown persons can be evaluated through a leave-one-subject-out cross-validation scheme, where all subjects but one are used for training and the remaining (previously unseen) subject is used for testing.
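The leave-one-subject-out split can be sketched as a small generator over per-sample subject labels. The subject identifiers in the example are hypothetical.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out, train_indices, test_indices) for each subject."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


# Hypothetical subject label for each gait cycle in the dataset.
subject_ids = ["S1", "S1", "S2", "S3", "S3", "S3"]
folds = list(leave_one_subject_out(subject_ids))
```

Each fold keeps every gait cycle of the held-out subject out of training, so the test subject is unseen by the model.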
The trade-off between the sensitivity and specificity of a classification model was evaluated with an ROC curve, which allowed us to determine a cut-off value representing the best trade-off between sensitivity and specificity and to assess the overall accuracy using the area under curve (AUC). The sensitivity was defined as the ratio of the number of correctly classified fatigue samples to the total number of fatigue samples, whereas the specificity was defined as the ratio of the number of correctly classified non-fatigue samples to the total number of non-fatigue samples. The Youden index (sensitivity+specificity−1) [53] was used to select the best cut-off point at which the index reached its maximum. The performance of fatigue classification was evaluated for each subject and also in terms of an average for all subjects.
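The ROC-based evaluation can be sketched with plain NumPy as follows: the ROC points are obtained by sweeping the cut-off over the observed scores, the AUC by the trapezoidal rule, and the best cut-off by maximizing the Youden index. The function names and the toy scores are hypothetical.

```python
import numpy as np


def roc_points(scores, labels):
    """False/true positive rates for every distinct cut-off, highest first."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    thresholds = np.r_[np.inf, np.sort(np.unique(scores))[::-1]]
    pos, neg = scores[labels == 1], scores[labels == 0]
    tpr = np.array([(pos >= th).mean() for th in thresholds])   # sensitivity
    fpr = np.array([(neg >= th).mean() for th in thresholds])   # 1 - specificity
    return fpr, tpr, thresholds


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))


def youden_cutoff(fpr, tpr, thresholds):
    """Cut-off maximizing sensitivity + specificity - 1."""
    k = int(np.argmax(tpr - fpr))
    return float(thresholds[k]), float(tpr[k]), float(1.0 - fpr[k])


# Toy example: predicted fatigue probabilities and true labels (1 = fatigued).
scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
fpr, tpr, thresholds = roc_points(scores, labels)
area = auc(fpr, tpr)
cutoff, sensitivity, specificity = youden_cutoff(fpr, tpr, thresholds)
```

When several cut-offs share the maximum Youden index, `np.argmax` returns the first one, which in this ordering is the highest cut-off.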

3) ABLATION STUDY FOR MULTI-/SINGLE-TASK FRAMEWORKS
The proposed MTL-RNN model was compared with single-task (ST/non-MTL) RNN models that aimed to detect only physically fatigued gait cycles, particularly with two ST-RNN models: one was trained using the whole data set and the other was trained using about half the data set that had only gait cycles starting from the left foot. The architecture of the ST-RNN model was similar to that of the MTL-RNN model except that the former was not split into two branches.

4) ABLATION STUDY FOR THE NETWORK ARCHITECTURES OF THE MULTI-TASK FRAMEWORK
In [25], different multi-task RNN architectures have been investigated for skeleton-based action recognition, including early-, middle-, and late-split architectures that differ from each other in the number of shared layers. Following their naming of architectures, the model proposed in the present study is considered a kind of middle-split architecture. These three types of architectures were compared with respect to the performance of fatigue classification.

5) LEAVE-ONE-DAY-OUT EVALUATION
A leave-one-day-out evaluation was also conducted to evaluate the performance of the proposed MTL-RNN model under a situation in which a data set could be collected from a target subject (and other subjects) in advance of testing and used for training. The last days when the subjects participated in data collection were used for testing, whereas the training set included data collected in the remaining days as well as the data of those who participated for only one day. The proposed MTL-RNN model was extended to be able to estimate subjects as well as physical fatigue conditions and the first supporting foot in gait cycles. It was compared with its variants. Cross-entropy was used as a loss function for the task of person identification.

A. NUMBER OF POSITIVE AND NEGATIVE SAMPLES
The numbers of gait cycles that could be extracted from the collected gait dataset are summarized in Fig. 9. Although the number of gait cycles collected under the non-fatigued condition should be about twice as large as that collected under the fatigued condition (because subjects were asked to walk for 5 minutes for four trials before exercising and then walk for 10 minutes while fatigued), there were exceptions because the proposed algorithm failed to detect some gait cycles owing to severe noise in the joint coordinates. Further, the total number of gait cycles differs among subjects because different subjects participated a different number of times in the present study.

B. LEAVE-ONE-SUBJECT-OUT CROSS-VALIDATION
Table 1 gives the result of fatigue classification obtained by the proposed MTL-RNN model and Fig. 10 shows the ROC curves in the leave-one-subject-out cross-validation. The varying performance of fatigue classification for the subjects indicates that the differences in the gait patterns collected under physically fatigued and non-fatigued conditions were somewhat dependent on each individual. Here, the proposed model achieved an average AUC of 0.860, sensitivity of 0.763, and specificity of 0.812, and thus learned the gait patterns commonly seen in the subjects. In contrast, for the estimation of the first supporting foot in gait cycles, the proposed model achieved an average sensitivity of 1.000 and specificity of 1.000. These results indicate that the estimation of the first supporting foot is much easier than fatigue classification, which is also supported by the fact that the uncertainty of the former converged to a much smaller value than that of the latter (see Fig. 11).

C. ABLATION STUDY FOR MULTI-/SINGLE-TASK FRAMEWORKS
The results of the ablation study for multi-/single-task frameworks are shown in Table 2. The performance of both ST-RNN models was worse than that of the proposed MTL-RNN model. This is probably because the former ST-RNN model (trained using the whole data set) was prevented from learning the subtle differences between the physically fatigued and non-fatigued gait cycles by the significant differences in gait patterns between cycles whose first supporting foot is the left foot and those whose first supporting foot is the right foot. Moreover, it is likely that the latter ST-RNN model (trained using only gait cycles starting from the left foot) was not given enough data to learn from.

D. ABLATION STUDY FOR THE NETWORK ARCHITECTURES OF THE MULTI-TASK FRAMEWORK
The results of the ablation study for the network architectures of the multi-task framework are shown in Table 3. Although these architectures are comparable with each other, the middle-split architecture is the best from the viewpoint of overall performance. One possible reason is that the middle-split architecture could have the balanced ability to learn both task-specific and general gait patterns for fatigue classification.

E. LEAVE-ONE-DAY-OUT EVALUATION
The results of the leave-one-day-out cross-validation are shown in Table 4. The addition of person identification as a second auxiliary task did not improve but instead degraded the performance of the proposed MTL-RNN model. Although the proposed model could achieve an overall AUC of 0.994 for person identification, it was difficult to simultaneously learn the differences in gait patterns caused by physical fatigue and those that distinguish individuals, because these two tasks might contradict each other.
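The two evaluation protocols differ only in the variable used to group the gait cycles when forming folds. The sketch below illustrates this under the assumption that each gait cycle carries a subject identifier and a recording-day identifier; the function name and the toy identifiers are hypothetical, and the exact protocol used in this study may differ in detail.

```python
from typing import Hashable, Iterator, List, Sequence, Tuple


def leave_one_group_out(
    groups: Sequence[Hashable],
) -> Iterator[Tuple[Hashable, List[int], List[int]]]:
    """Yield (held_out_group, train_indices, test_indices), one fold per group.

    Passing subject IDs gives leave-one-subject-out folds; passing
    recording-day IDs gives leave-one-day-out folds.
    """
    # dict.fromkeys keeps the first-seen order of the distinct groups.
    for held_out in dict.fromkeys(groups):
        train = [i for i, g in enumerate(groups) if g != held_out]
        test = [i for i, g in enumerate(groups) if g == held_out]
        yield held_out, train, test


# Toy example: five gait cycles with hypothetical subject and day IDs.
subject_ids = ["S1", "S1", "S2", "S2", "S3"]
day_ids = ["day1", "day2", "day1", "day2", "day1"]

subject_folds = list(leave_one_group_out(subject_ids))  # 3 folds
day_folds = list(leave_one_group_out(day_ids))          # 2 folds
```

In the leave-one-day-out setting, cycles from the test subject recorded on other days may remain in the training set, which is consistent with the numerically higher AUC observed under that protocol.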

IV. DISCUSSION
Although the performance of fatigue classification cannot be directly compared with the results of previous studies owing to different experimental conditions, our approach achieved comparable performance: for example, Karg et al. [21] achieved an average fatigue recognition rate of 81% whereas Zhang et al. [20] achieved a classification accuracy of 96%. Maman et al. [14] achieved a sensitivity of 1.00 and specificity of 0.79 on a test set that included only two subjects. Because, as shown above, the performance in the leave-one-day-out cross-validation was numerically higher than that in the leave-one-subject-out cross-validation, performance improvements can be expected from the use of previously captured data of the same user as anchors to train a classifier model with a triplet loss [54]. As far as the authors are aware, this paper is the first to propose a deep learning approach for the detection of physical fatigue from gait.
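As an illustration of the triplet loss mentioned above, the following minimal sketch computes the loss for a single triplet of embedding vectors. It assumes a common formulation with (non-squared) Euclidean distance and a fixed margin; the formulation in [54] may differ, and the function names, the margin value, and the toy embeddings are hypothetical. In the setting envisaged here, the anchor would be a previously captured gait cycle of a user, the positive a cycle under the same fatigue condition, and the negative a cycle under the opposite condition.

```python
import math
from typing import Sequence


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor: Sequence[float],
                 positive: Sequence[float],
                 negative: Sequence[float],
                 margin: float = 1.0) -> float:
    """max(0, d(anchor, positive) - d(anchor, negative) + margin).

    The loss is zero once the negative is farther from the anchor than
    the positive by at least the margin (the margin is an assumed value).
    """
    return max(0.0,
               euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Toy 2-D embeddings (hypothetical values, not data from this study).
well_separated = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])  # 1 - 5 + 1 < 0
violating = triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0])       # 5 - 1 + 1
```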
The proposed approach could lead to a promising system for fatigue detection in everyday life for the following reasons:
• a user only needs to walk rather than perform extraordinary actions, such as standing on one leg;
• the user does not need to be fitted with wearable sensors or motion capture markers; and
• whole-body movements can be captured by an easily accessible range sensor, which offers more effective information for physical fatigue detection than the capture of partial-body movements.
Furthermore, the proposed approach is not restricted to fatigue classification but is also applicable to other classification problems in which physical and mental conditions are accompanied by alterations in gait, such as the detection of emotions and mood disorders that can also affect gait movements [55].
There were several limitations in this study. First, the number of participants was small. The experimental results showed that the performance of the classifier model varied among individuals, which indicates that the data collected in this study were not sufficient for the classifier model to learn the variability of individuals' gait behavior. Accordingly, the classifier model may achieve higher performance if it is trained using a larger data set. Second, all of the participants in this study were males. Several studies have suggested that females are more resistant to skeletal muscle fatigue than males [56]. The difference in the classifier's performance between males and females is an interesting open question for future work. Lastly, the classifier model lacked interpretability. Interpretable machine learning (ML) models, which can explain the reasoning behind their predictions, will allow users to accept or reject the predictions and recommendations [57], and will also provide novel biomedical insights [58]. One interesting question in this context is whether the explanations that ML models provide are consistent with the known effects of physical fatigue on walking ability [59] and postural control [60].
In this study, the VAS was used only for reference. Because the VAS is a subjective measure of fatigue and can depend on the participant's feelings, it would be difficult to determine a subject-independent threshold to distinguish between fatigued and non-fatigued conditions. Developing a measure to quantify the degree of fatigue, together with a regression model to predict it, is an open direction for future research.
In conclusion, this paper has described a deep learning approach for the detection of physical fatigue from whole-body gait movements, on the basis of a multi-task RNN model that simultaneously tackles two tasks: classifying physical fatigue conditions and estimating the first supporting foot in gait cycles. The auxiliary task of estimating the first supporting foot in a gait cycle helped the classifier model to significantly improve its performance for the main task of fatigue classification, while the additional task of person identification did not contribute to such a performance gain. The proposed network architecture was also compared with its variants from the viewpoint of classification performance and was evaluated with both leave-one-subject-out cross-validation and leave-one-day-out evaluation. The experimental results showed that the proposed approach performed well, from which it can be concluded that a fatigue detection system for daily use is very feasible, especially for screening purposes.