Development of Game-Like System Using Active Behavior Input for Wakefulness-Keeping Support in Driving

Various drowsiness detection systems have been developed to prevent traffic accidents. Drivers feel such systems are annoying or useless when the systems provide many false drowsiness alarms. Drowsy driving prevention systems should satisfy the following three features to be accepted by many drivers: 1) Precise drowsiness detection, 2) Drowsy driving warning whereby drivers are less annoyed by false alarms, and 3) Mechanisms that encourage drivers to adopt active behavior to maintain their wakefulness. The present work proposes a new drowsy driving prevention system that has these features. The proposed system, referred to as a wakefulness-keeping support system (WKSS), consists of a drowsiness detection system (DDS), which corresponds to the first feature, and an active game system (AGS), which corresponds to the second and third features. The AGS is a simple game that encourages drivers to adopt specific active behavior to play the game. Drivers can keep their wakefulness by playing the AGS, while enjoying it, even though false drowsiness alarms may occur. The usefulness of the proposed WKSS was evaluated through experiments using a driving simulator, which suggest that the WKSS could be accepted by many drivers and could contribute to realizing a zero-traffic-accident society.


I. INTRODUCTION
According to traffic accident statistics reported by the National Police Agency of Japan, 17.4% of 3,410 fatal traffic accidents in 2016 were classified as careless driving, which was the most frequent cause of fatal traffic accidents [1]. Careless driving in these statistics is defined as impairment of driving ability due to fatigue or drowsiness. In 2009, the US National Highway Traffic Safety Administration (NHTSA) reported that 83,000 traffic accidents occurred due to drowsy driving, of which 886 were fatal accidents [2]. The risk of traffic accidents for drowsy drivers is estimated to be four to six times higher than that for awake drivers [3]. Hence, a drowsy driving prevention system is needed for realizing a zero-traffic-accident society.
Tatsuro Ibe and Erika Abe are with Kyoto University, Kyoto 606-8501, Japan (e-mail: ibe@sys.i.kyoto-u.ac.jp; erika.abe127@gmail.com).
Color versions of one or more of the figures in this article are available online at https://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TIV.2020.3029260
Kaplan et al. classified drowsiness detection methods into visual feature-based and non-visual feature-based methods [4]. The former methods detect driver drowsiness by analyzing images of drivers' faces captured by a camera [5], and the latter methods mainly use drivers' physiological signals [6], [7] or driving behavior [8]. Recently, Abe et al. developed a new drowsiness detection system by integrating heart rate variability (HRV) analysis and an anomaly detection algorithm [7], which utilizes the same framework as an epileptic seizure prediction system [9]. Some studies have improved the sensitivity of drowsiness detection methods [5]-[7], but it is difficult to eliminate false alarms completely. Drivers may feel annoyed by frequent false alarms [10], and therefore an appropriate warning method against drowsy driving is required so that drivers feel less annoyed by false alarms.
In addition to drowsiness detection, support for driver wakefulness-keeping is also needed to prevent drowsy driving. Some researchers have been studying methods for supporting driver wakefulness-keeping using active behavior. Arimitsu et al. showed that singing is more effective for wakefulness-keeping than seat vibration or beep sounds [11]. Takayama et al. verified the effectiveness of foreign language learning while driving on wakefulness-keeping, and showed that speaking is more suitable than listening [12]. Sleep science also tells us that such active behavior is effective for keeping wakefulness. There is a close relationship between the circadian rhythm and the autonomic nervous system (ANS), and the sympathetic nervous system (SNS) becomes predominant over the parasympathetic nervous system (PNS) during wakeful periods [13]. Arai et al. investigated the effects of exercise on the ANS by an experiment using a cycle ergometer [14], which showed that PNS activities decrease shortly after exercise starts and recover gradually after exercise is finished. LeDuc et al. evaluated wakefulness-keeping effects of exercise [15]. In the experiment, sleep-deprived participants who exercised were able to keep wakefulness longer than those who did not. These studies indicate that active behavior stimulates the ANS and helps maintain wakefulness.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
In order to realize a drowsy driving prevention system that is acceptable to many drivers, the following three features should be taken into account: 1) Precise drowsiness detection, 2) Drowsy driving warning whereby drivers are less annoyed by false alarms, and 3) Mechanisms that encourage drivers to adopt active behavior to maintain their wakefulness. The aim of this study is to construct a wakefulness-keeping support system that satisfies all three of these features. However, the present study focuses on the second and the third features, because an existing drowsiness detection method that satisfies the first feature can be adopted.
The present work proposes a new drowsy driving prevention system, referred to as a wakefulness-keeping support system (WKSS), that satisfies all three of the above features. Fig. 1 shows a schematic diagram of the proposed WKSS. It consists of a drowsiness detection system (DDS) and an active game system (AGS). The HRV-based drowsiness detection method [7] is used as the DDS for realizing the first feature. When driver drowsiness is detected by the DDS, the AGS, instead of sounding drowsiness alarms, requires the driver to adopt specific active behavior through playing a game in order to maintain wakefulness. Moreover, drivers can enjoy the game even if the DDS has detected driver drowsiness in error. Therefore, the second and the third features are realized by using the AGS. The usefulness of the proposed WKSS is demonstrated through an experiment using a driving simulator.
Negative effects of active behavior on driving safety should be considered, in addition to its wakefulness-keeping effect. Kahneman referred to general resources that are required when executing cognitive work as cognitive resources [16]. Here, cognitive work is defined as information processing in response to perception [17]. Therefore, active behavior in response to drowsy driving alarms may consume the driver's cognitive resources and may impair safe driving.
Baddeley's working memory (WM) is a well-known model of cognitive processing capacity [18]. The multiple resource model (MRM) has been proposed as an expansion of the WM [17]. According to the MRM, human beings do not use a single type of cognitive resource but multiple types of cognitive resources at the same time. For example, they can understand a picture and a voice simultaneously, although it is difficult to recognize multiple voices simultaneously, because information processing of visual and auditory perceptions uses different cognitive resources. Since driving consumes various kinds of cognitive resources, it is important to investigate which type of cognitive resource should be used for driver wakefulness-keeping from the viewpoint of safe driving.
The MRM assumes that "response" is also a kind of cognitive resource. Since active behavior for driver wakefulness-keeping is regarded as a "response" to drowsy driving alarms, the mental workload of the active behavior can be analyzed by the MRM. In the MRM, response is classified into manual response and verbal response. When driving, an example of the former is tuning the radio and an example of the latter is voice input of a car navigation system. Although Takayama et al. have already advocated that verbal response is suitable for wakefulness-keeping [12], the effects of manual response and verbal response on driver wakefulness-keeping and safe driving should be investigated.
Accordingly, the present study employs two types of physical responses, body movement and speech, as input commands for the AGS used in the experiment, and their wakefulness-keeping effects, as well as effects on safe driving, are evaluated. This paper is organized as follows: the DDS and the AGS, which are elements of the WKSS, are introduced in Sections II and III, respectively. Section IV explains experiments using the WKSS. Section V describes the experimental results of the WKSS, which are discussed in Section VI. Conclusions and future work are given in Section VII.

II. DROWSINESS DETECTION
A precise drowsiness detection system (DDS) is required in order to realize a wakefulness-keeping support system (WKSS). The proposed WKSS adopts a heart rate variability (HRV)-based drowsiness detection algorithm developed by Abe et al. [7]. HRV is a well-known phenomenon which reflects the autonomic nervous system (ANS) activities such as body core temperature, blood pressure, and respiration, and has been used for monitoring stress, drowsiness, and cardiovascular disease [19]. Thus, drowsiness can be detected by monitoring HRV because drowsiness affects the ANS. This section briefly explains HRV analysis and the HRV-based drowsiness detection algorithm.

A. Heart Rate Variability Analysis
A typical ECG trace of the cardiac cycle (standard lead II) consists of some peaks, as shown in Fig. 2, and the highest peak is called the R wave. The RR interval (RRI) [ms] is defined as the interval between an R wave and the next R wave, and the fluctuation of RRI is called HRV. In general, HRV tends to become low when the heart rate increases. On the other hand, higher HRV is associated with activation of the PNS [20].
A part of the raw RRI data collected from a healthy person is shown in Fig. 3(a). The raw RRI data are interpolated by using a spline and resampled at equal intervals in order to extract frequency domain features. Fig. 3(b) shows the resampled RRI data whose sampling interval is one second. The following time domain features can be calculated from the original RRI data [21].
• meanNN: Mean of RRI.
• SDNN: Standard deviation of RRI.
• RMSSD: Root mean square of difference of adjacent RRI.
• Total Power (TP): Variance of RRI.
• NN50: Number of pairs of adjacent RRI whose difference is more than 50 [ms] within a given length of measurement time.
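As an illustration of the time domain features listed above, the following minimal Python sketch computes them from a short list of RR intervals. The function name `time_domain_features`, the use of the sample (n - 1) standard deviation and variance, and the example values are assumptions made here for illustration and are not taken from the paper.

```python
import math
import statistics


def time_domain_features(rri_ms):
    """Compute time domain HRV features from RR intervals given in milliseconds."""
    # Differences between adjacent RR intervals.
    diffs = [b - a for a, b in zip(rri_ms, rri_ms[1:])]
    return {
        "meanNN": statistics.fmean(rri_ms),                       # mean of RRI
        "SDNN": statistics.stdev(rri_ms),                         # sample standard deviation of RRI
        "RMSSD": math.sqrt(statistics.fmean([d * d for d in diffs])),
        "TP": statistics.variance(rri_ms),                        # total power as sample variance of RRI
        "NN50": sum(1 for d in diffs if abs(d) > 50),             # adjacent pairs differing by more than 50 ms
    }


# Hypothetical RRI values, for illustration only.
example_rri = [800, 810, 790, 850, 845, 900, 780]
for name, value in time_domain_features(example_rri).items():
    print(name, value)
```

In practice these features would be computed over a sliding window of the measured RRI data rather than over a handful of beats.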
The following frequency domain features can be obtained from the power spectrum density (PSD) of the resampled RRI data, and the PSD can be calculated by using Fourier analysis or an autoregressive (AR) model [21].
• LF: Power of the low frequency band (0.04-0.15 Hz) of the PSD.
• HF: Power of the high frequency band (0.15-0.4 Hz) of the PSD.
• LF/HF: Ratio of LF to HF. LF/HF expresses the balance between the sympathetic nervous system activity and the parasympathetic nervous system activity.
Fig. 3(c) shows the PSD and its LF/HF of the resampled RRI data shown in Fig. 3(b). According to the HRV analysis guideline, the RRI data should be measured for two to five minutes for frequency analysis [21]. LF and HF are often used as indices to describe mental fatigue. For example, an increase of LF and a decrease of HF were confirmed in mental arithmetic experiments [22].
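The frequency domain computation can be sketched as follows. This is a simplified illustration, not the authors' implementation: it resamples with linear interpolation instead of a spline, estimates band power with a plain discrete Fourier transform, and assumes the conventional band edges of the HRV guideline (LF: 0.04-0.15 Hz, HF: 0.15-0.4 Hz). The function names and the synthetic signal are hypothetical.

```python
import cmath
import math


def resample_rri(rri_ms, fs=1.0):
    """Resample RR intervals at fs Hz by linear interpolation (a spline is used in the paper)."""
    times, t = [], 0.0
    for r in rri_ms:               # time stamp of each beat in seconds
        t += r / 1000.0
        times.append(t)
    out, ts, j = [], times[0], 0
    while ts <= times[-1]:
        while times[j + 1] < ts:   # advance to the segment that contains ts
            j += 1
        w = (ts - times[j]) / (times[j + 1] - times[j])
        out.append(rri_ms[j] + w * (rri_ms[j + 1] - rri_ms[j]))
        ts += 1.0 / fs
    return out


def band_power(x, fs, f_lo, f_hi):
    """One-sided periodogram power of x in the band [f_lo, f_hi)."""
    n = len(x)
    mean = sum(x) / n
    xc = [v - mean for v in x]     # remove the mean (DC component)
    power = 0.0
    for k in range(1, n // 2 + 1):
        if f_lo <= k * fs / n < f_hi:
            s = sum(xc[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
            power += 2.0 * abs(s) ** 2 / (n * n)
    return power


def lf_hf(x, fs=1.0):
    """Return LF power, HF power, and LF/HF for an evenly resampled RRI series."""
    lf = band_power(x, fs, 0.04, 0.15)
    hf = band_power(x, fs, 0.15, 0.40)
    return lf, hf, lf / hf


# Synthetic five-minute series sampled at 1 Hz with one LF and one HF oscillation.
demo_signal = [800 + 40 * math.sin(2 * math.pi * 0.10 * i)
               + 20 * math.sin(2 * math.pi * 0.25 * i) for i in range(300)]
print(lf_hf(demo_signal))
```

With this normalization, a sinusoid of amplitude A contributes A²/2 to the band that contains its frequency, so the band powers are in the same units as the variance of RRI.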
Yamakawa et al. developed an ECG-based wearable RRI sensor that measures RRI with millisecond-order precision, is easy to use, and can be manufactured for less than 100 US dollars [23]. The present study adopts this sensor for the HRV analysis.

B. Drowsiness Detection
Abe et al. proposed a drowsiness detection algorithm by integrating HRV analysis and an anomaly detection method [7]. The procedure for driver drowsiness detection is as follows: 1) Measure RRI data of a driver by using the wearable RRI sensor. 2) Extract HRV features from the measured RRI data. 3) Monitor for signs of drowsiness based on the extracted HRV features by using an anomaly detection method. 4) Provide a warning when driver drowsiness is detected. The HRV-based drowsiness detection algorithm [7] uses multivariate statistical process control (MSPC) for monitoring driver HRV, although any anomaly detection method can be used.
The simplest way of detecting anomalies is to monitor each variable with upper and lower constraints independently, which is called univariate statistical process control (USPC). However, USPC cannot monitor changes in the correlation between variables. For example, when two variables have a positive correlation, as shown in Fig. 4(a), USPC cannot detect the anomaly R, which does not follow a positive correlation, because its constraints form a rectangular area. If an ellipsoid control limit shown by the dashed line is defined, the anomaly R can be detected since the correlation between variables is taken into account. MSPC is such a correlation-based anomaly detection method, which is widely used in process control [24]. Therefore, MSPC can detect a sample that does not follow the major trend in the modeling data.
In MSPC, the correlation between variables is modeled by using principal component analysis (PCA), which finds linear combinations of variables referred to as principal components (PC). PCA can describe major trends in a dataset as shown in Fig. 4(b). MSPC usually defines the normal operating condition (NOC) by two monitored indices, i.e., the T^2 and the Q statistics [25], instead of an ellipsoid control limit.
The T^2 statistic is defined as a Mahalanobis distance between a sample and the origin in the subspace spanned by principal components, which expresses a circular control limit as shown in Fig. 4(c). When the T^2 statistic is small, the sample is close to the mean of the modeling data. In addition, the Q statistic is a measure of dissimilarity between the sample and the modeling data from the viewpoint of the correlation between variables. MSPC detects an anomaly when either the T^2 or the Q statistic exceeds the predefined control limit. Fig. 4(d) shows a schematic diagram of NOC of MSPC.
In driver drowsiness detection, the awake RRI data and the drowsy RRI data are defined as normal data and anomalous data, respectively. MSPC requires only normal data, which are easier to collect than anomalous data, and it can be easily implemented because it is a linear method.
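The idea behind the T^2 and Q statistics can be illustrated with a two-variable toy example in the spirit of Fig. 4. The sketch below is not the algorithm of [7]: it uses only two variables, keeps one principal component (for standardized two-variable data the PCA eigenvalues are 1 + |r| and 1 - |r|, where r is the correlation coefficient), and takes the largest T^2 and Q values observed in the normal data as control limits, which is an assumption made here for simplicity. The class name and the data are hypothetical.

```python
import math
import statistics


class TwoVariableMSPC:
    """PCA-based monitor for two correlated variables, retaining one principal component."""

    def __init__(self, x1, x2):
        self.m1, self.s1 = statistics.fmean(x1), statistics.stdev(x1)
        self.m2, self.s2 = statistics.fmean(x2), statistics.stdev(x2)
        z = [self._standardize(a, b) for a, b in zip(x1, x2)]
        # Sample correlation coefficient of the standardized normal data.
        self.r = sum(a * b for a, b in z) / (len(z) - 1)
        self.sign = 1.0 if self.r >= 0 else -1.0
        self.lam = 1.0 + abs(self.r)          # variance of the score on the first PC
        train = [self._t2_q(a, b) for a, b in z]
        # Assumed control limits: the largest values seen in the normal data.
        self.t2_limit = max(t2 for t2, _ in train)
        self.q_limit = max(q for _, q in train)

    def _standardize(self, v1, v2):
        return (v1 - self.m1) / self.s1, (v2 - self.m2) / self.s2

    def _t2_q(self, z1, z2):
        score = (z1 + self.sign * z2) / math.sqrt(2.0)     # projection onto the first PC
        resid = (z1 - self.sign * z2) / math.sqrt(2.0)     # part not explained by the first PC
        return score * score / self.lam, resid * resid

    def is_anomaly(self, v1, v2):
        t2, q = self._t2_q(*self._standardize(v1, v2))
        return t2 > self.t2_limit or q > self.q_limit


# Normal data with a strong positive correlation (hypothetical values).
x1 = [float(i) for i in range(20)]
x2 = [2.0 * i + 0.5 * ((7 * i) % 5 - 2) for i in range(20)]
model = TwoVariableMSPC(x1, x2)

# (17, 6) lies inside the univariate range of both variables but breaks the correlation.
print(model.is_anomaly(17.0, 6.0), model.is_anomaly(10.0, 20.0))
```

The point (17, 6) would pass univariate limits on each variable, yet its Q statistic is far above the limit because it does not follow the positive correlation of the modeling data.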
Abe et al. reported that sensitivity of the HRV-based drowsiness detection algorithm was 68% and it detected seven out of eight drowsy driving episodes [7], in which the facial expressional drowsiness estimation criterion was used for driver drowsiness evaluation [26].

III. ACTIVE GAME SYSTEM (AGS)
In the proposed WKSS, the AGS is started instead of providing drowsiness alarms when the DDS detects driver drowsiness. The AGS is a game that requires drivers to adopt active behavior in order to keep their wakefulness, but drivers can enjoy it even if they do not know its true purpose. That is to say, drivers can keep wakefulness naturally through playing the AGS. In the AGS, drivers are required to attack a lion when it is roaring or to do nothing when a cat is crying. The AGS uses a head gesture or speech for attacking the lion, and sounds for score notification.

A. How to Play
In the AGS, approach of an animal is notified by a simple sound such as a lion's roar or a cat's cry instead of visual stimuli because simple sounds are less likely to greatly disturb driving operation. Animals approach from three directions: front, right, and left with respect to the driver's position. Two speakers located around the driver seat create either of two kinds of animal cries from one of the three directions.
When drivers hear the lion roar, they have to adopt active behavior to attack the virtual lion. There are two types of the AGS, each having a different input method: head gesture and speech. The AGS with head gesture is defined as AGS-Body, and the AGS with speech is defined as AGS-Voice.
In the AGS-Body, drivers tilt their head in the direction of the lion roaring. Such a head gesture is expected not to disturb driving operation, because the head is not used for steering, accelerating, or braking. In particular, head tilt does not avert the driver's gaze, while turning the head to the left and right may incur danger. In the AGS-Voice, drivers are required to answer vocally, in Japanese, the direction of the lion's roar: forward, right, or left. The voice command input is also expected not to disturb driving operation.
In contrast, drivers must do nothing when they hear a cat crying in both the AGS-Body and the AGS-Voice. When the driver attacks an animal with a head gesture or speech, the AGS makes a beating sound and judges whether the response was correct. The result is notified to the driver by sound. A coin-collecting sound means correct, and a beep sound means incorrect. The sound effects used in the AGS are summarized in Table I. Images of the two types of AGS input methods and the sound effects are shown in Fig. 5. Microsoft Kinect v2, which is equipped with a depth camera and a microphone array, is used for recognizing head gesture and speech.
The procedure of the AGS is shown in Algorithm 1. Here, variable T denotes the elapsed time [sec] from when the AGS starts and is reset at every AGS start. The activated and inactivated durations of the AGS in steps 5 and 10 were arbitrarily determined so that frequent AGS activation does not disturb driving.
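The game rules and the activation schedule described above can be sketched as a small simulation. This is only an illustration under stated assumptions: the function names `judge` and `run_wkss`, the callbacks `dds_detects` and `respond`, the one-second time step, and the 10-second spacing between animal sounds (`stimulus_period`) are all hypothetical and do not come from the paper; only the 60-second active and waiting periods and the lion/cat rule follow the text.

```python
import random

DIRECTIONS = ("front", "right", "left")


def judge(animal, direction, response):
    """Return True if the response is correct.

    response is a direction string when the driver attacks, or None when the driver does nothing.
    """
    if animal == "lion":
        return response == direction   # a lion must be attacked in its direction
    return response is None            # a cat must be ignored


def run_wkss(total_sec, dds_detects, respond, stimulus_period=10, seed=0):
    """Simulate the AGS schedule and return a log of (time, animal, direction, sound)."""
    rng = random.Random(seed)
    events = []
    now = 0
    while now < total_sec:                        # while driving
        if dds_detects(now):
            T = 0                                 # elapsed time since the AGS started
            while T <= 60 and now < total_sec:    # the AGS stays active for 60 sec
                animal = rng.choice(("lion", "cat"))
                direction = rng.choice(DIRECTIONS)
                correct = judge(animal, direction, respond(animal, direction))
                events.append((now, animal, direction, "coin" if correct else "beep"))
                T += stimulus_period              # assumed spacing between sounds
                now += stimulus_period
            now += 60                             # wait 60 sec before the AGS can start again
        else:
            now += 1
    return events


# A driver who always answers correctly, and a DDS that fires once at t = 0.
perfect = lambda animal, direction: direction if animal == "lion" else None
log = run_wkss(300, lambda t: t == 0, perfect)
for entry in log:
    print(entry)
```

In the real system the response would come from the Kinect-based head gesture or speech recognizer rather than from a callback.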

IV. DRIVING SIMULATOR EXPERIMENT
This section describes an outline of driving simulator experiments that were performed to verify the effectiveness of the proposed WKSS.

A. Participants
When recruiting the experimental participants, the Japanese version of the Epworth Sleepiness Scale (JESS) [27], which is a simple questionnaire for scoring daytime sleepiness levels, was used for sleep disorder screening, using a cut-off value of 15. All participants gave informed consent before the experiment started.

Algorithm 1
3: if DDS detects driver drowsiness then
4:   Set T = 0.
5:   while T ≤ 60 do
6:     Make a cat cry or a lion roar from any of three directions and require a driver to respond.
7:     Judge whether the driver response was correct.
8:     Notify the result by sound.
9:   end while
10:  Wait 60 sec.
11: end if
12: end while

The number of participants used for the analysis was 32 (24 men and 8 women) out of 34 participants, because the data recorded from two participants were discarded due to failure of DDS and unexpected behavior, respectively. Participants were between 18 and 40 years old (Mean: 24.13, SD: 4.29), and their driving experience was between 3 and 219 months (Mean: 42.41, SD: 49.44).
This experiment was approved by the Engineering Research Ethics Committee of Kyoto University.

B. Experimental Environment
Fig. 6 shows the driving simulator (DS) used in this experiment. The participants drove a virtual vehicle at night along a loop line with a radius of 1,500 [m], with no traffic lights and no stop lines, because many drowsy driving accidents occur at high speed on monotonous roads such as highways or rural roads [28]. In addition, the participants were instructed to keep a constant distance from a preceding vehicle whose constant velocity was 90 [km/h]. The experiments were conducted between 1 pm and 3 pm, because drowsiness often occurs at that time according to the circadian rhythm [29].
There were four groups, each using a different type of WKSS: AGS-Body, AGS-Voice, PA, and NA, where PA and NA were controls with respect to the two types of AGS. In the PA condition, the WKSS provides a beep sound when the DDS detects driver drowsiness, and the WKSS in the NA condition does not provide any information even when driver drowsiness is detected. Thresholds of the DDS were set so that sensitivity became high, since drowsiness prevention systems must not miss driver drowsiness.
Participants were separated into the above four groups so that the groups were substantially equivalent from the viewpoint of their duration of driving license (experience), as shown in Table II.

C. Experimental Procedure
The experimental procedure was as follows:
1) An experimenter explained the purpose of the research and the experimental procedure, and instructed the participants on how to use the DS and the WKSS including the DDS and the AGS, the PA, or the NA, according to their assigned group.
2) The participants performed practice driving on the simulator course and experienced the AGS while driving if they belonged to the AGS-Body group or the AGS-Voice group. The practice was ended when they felt accustomed to the DS and the AGS.
3) They answered a NASA-TLX questionnaire (see Section IV-D).
4) They drove for 10 minutes to collect normal RRI data for determining the DDS threshold by using the wearable RRI sensor [23].
5) They kept their eyes closed quietly for 10 minutes in a dark room so as to increase drowsiness [12].
6) They drove the virtual vehicle on the DS using the WKSS for 50 minutes.
7) They answered the NASA-TLX questionnaire again and questionnaires for evaluating subjective drowsiness and the usability of the WKSS.
The purpose of step 5 is to include drowsy participants in step 6 by causing drowsiness, which is consistent with the research of Takayama et al. [12]. This step helps to clarify the difference of the wakefulness-keeping effect between the drivers who already were sleepy at the beginning of driving and those who were not. The driving data obtained in steps 4 and 6 were analyzed to verify the proposed WKSS. The collected data were divided into ten intervals shown in Fig. 7 in order to investigate changes in participant states according to elapsed driving time.
The participants were required to answer the Karolinska Sleepiness Scale (KSS) verbally for three points in time: intervals 1, 5, and 9, which are the beginning, the middle, and the end of the driving with the WKSS. Because asking during driving could affect driver drowsiness, KSS could not be asked in interval 5; instead, the participants provided their answers for interval 5 and interval 9 at the same time, after finishing driving.
2) Physiological Measurement: It is well known that there are individual differences in HRV. Therefore, this study adopted normalized HRV indices, LF norm and HF norm, in order to cope with individuality [31]. By using the power of the very low frequency (VLF) band, they are written as
$$\mathrm{LF}_{\mathrm{norm}} = \frac{\mathrm{LF}}{\mathrm{TP} - \mathrm{VLF}}, \qquad \mathrm{HF}_{\mathrm{norm}} = \frac{\mathrm{HF}}{\mathrm{TP} - \mathrm{VLF}}.$$
This study adopted the difference of RRI (RRI diff) as follows:
$$\mathrm{RRI}_{\mathrm{diff}} = \mathrm{RRI} - \mathrm{RRI}_0,$$
where RRI 0 is RRI in interval 0. Furthermore, for the ANS activity evaluation, this study used mean values of LF norm and HF norm, LF/HF, and RRI diff that were calculated for each interval. These values could not be measured in interval 1 for participant No. 2 in AGS-Voice (Voice-2) and in intervals 7-9 for participant No. 5 in AGS-Voice (Voice-5) due to sensor attachment failure. These missing data were interpolated by regression imputation [32] as follows:
$$y = \beta_0 + \sum_{i=1}^{m} \beta_i x_i + e,$$
where e and m denote the error term and the number of explanatory variables, respectively. The explanatory variables contain groups, intervals, and measurements in the last interval. The model was fitted by using ordinary least-squares.
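Regression imputation with ordinary least-squares can be illustrated by the following minimal sketch: a linear model is fitted to the complete records and the fitted model predicts the missing value. The helper names `solve`, `fit_ols`, and `predict`, and all numbers, are hypothetical; the actual explanatory variables (groups, intervals, and measurements in the last interval) and their coding are not reproduced here.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]      # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))  # pivot row
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                        # back substitution
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(rows, y):
    """Fit y = b0 + b1*x1 + ... + bm*xm by least squares via the normal equations."""
    X = [[1.0] + list(r) for r in rows]                   # prepend the intercept column
    p = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    xty = [sum(x[i] * t for x, t in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(beta, row):
    """Predict the response for one record of explanatory variables."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], row))


# Complete records (two explanatory variables) and one record with a missing response.
complete_x = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
complete_y = [2.0, 5.0, 1.0, 4.0, 7.0]
beta = fit_ols(complete_x, complete_y)
imputed = predict(beta, (3, 2))
print(beta, imputed)
```

Solving the normal equations directly is adequate for a few explanatory variables; a numerically more robust solver would be preferable for larger or ill-conditioned designs.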
Here, the effect of respiration should be taken into account when HRV is evaluated because HRV, particularly the frequency domain HRV, is significantly affected by respiration [19], [33] and the proposed AGS-Voice requires voice command input.
3) Driving Safety: Steering entropy (SE) was calculated to assess instantaneous variation in steering smoothness [34]. The SE decreases when drivers perform steering smoothly, and increases when they turn the steering quickly. Such a quick steering maneuver may occur as a result of inattention due to drowsy driving [35]. In addition, standard deviation of lane position (SDLP) was calculated to evaluate the driver's ability to keep the lane position [36]. The SE and the SDLP were measured throughout intervals 0 to 9.
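As a simple illustration of the SDLP, the sketch below computes the standard deviation of sampled lateral positions, both over a whole recording and per interval. The function names, the use of the sample (n - 1) standard deviation, and the example values are assumptions made for illustration; the steering entropy is not reproduced here because its computation depends on details given in [34].

```python
import statistics


def sdlp(lateral_positions_m):
    """Standard deviation of lane position (sample standard deviation, in meters)."""
    return statistics.stdev(lateral_positions_m)


def sdlp_by_interval(lateral_positions_m, samples_per_interval):
    """Split the recording into consecutive intervals and compute the SDLP of each."""
    chunks = [lateral_positions_m[i:i + samples_per_interval]
              for i in range(0, len(lateral_positions_m), samples_per_interval)]
    # Ignore a trailing chunk that is too short to have a standard deviation.
    return [sdlp(c) for c in chunks if len(c) >= 2]


# Hypothetical lateral offsets from the lane center in meters.
positions = [0.0, 0.2, -0.2, 0.1, -0.1]
print(sdlp(positions))
print(sdlp_by_interval([0.0, 1.0, 0.0, 1.0, 0.0, 2.0, 0.0, 2.0], 4))
```

A larger SDLP in a given interval indicates that the vehicle weaved more within, or out of, its lane during that interval.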

4) Subjective Workload:
The NASA Task Load Index (NASA-TLX) is a widely used scoring system for assessing driver mental workloads [37], which consists of the following six subscales: Mental Demand (MD), Physical Demand (PD), Temporal Demand (TP), Own Performance (OP), Effort (EF), and Frustration (FR). The score becomes high when the driver's workload increases. The participants scored the NASA-TLX before the 10-minute driving in step 4 and after the 50-minute driving in step 6.
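The overall NASA-TLX score is commonly summarized as a weighted workload (WWL) score. The sketch below follows the standard NASA-TLX weighting procedure, in which each subscale's weight is the number of times it is chosen in the 15 pairwise comparisons of the six subscales; it is assumed here, not stated in the text, that this standard procedure applies. The function name and all ratings and tallies are hypothetical.

```python
SUBSCALES = ("MD", "PD", "TP", "OP", "EF", "FR")


def weighted_workload(ratings, tally):
    """Weighted workload: sum of rating x weight over the six subscales, divided by 15.

    ratings: subscale -> rating (for example on a 0-100 scale)
    tally:   subscale -> number of times the subscale was chosen in the 15 pairwise comparisons
    """
    if sum(tally[s] for s in SUBSCALES) != 15:
        raise ValueError("pairwise comparison tallies must sum to 15")
    return sum(ratings[s] * tally[s] for s in SUBSCALES) / 15.0


# Hypothetical ratings and tallies for one participant.
ratings = {"MD": 70, "PD": 20, "TP": 50, "OP": 40, "EF": 60, "FR": 30}
tally = {"MD": 5, "PD": 0, "TP": 3, "OP": 2, "EF": 4, "FR": 1}
print(weighted_workload(ratings, tally))
```

The weighting lets the subscales that a participant considers most relevant to the task dominate the overall score.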
The participants also answered questionnaires on seven-point Likert-type scales [38] after driving, which evaluate the reliability of the DDS, the annoyance of intervention by the AGS or the PA, and the enjoyableness of the AGS. Table III shows the questionnaires. The scales range from "1: Entirely disagree" to "7: Entirely agree". The participants were also asked to give impressions about the AGSs or the PA by free description.

V. EXPERIMENTAL RESULTS
This section describes the experimental results of the driving simulator experiments mentioned in the previous section. Each measurement was analyzed using analysis of variance (ANOVA) and a post hoc analysis [39].

A. Karolinska Sleepiness Scale (KSS)
Fig. 8(a) shows KSS at three intervals 1, 5, and 9 of four groups for all participants. Two-factor mixed-design ANOVA found that the simple main effect of intervals was significant (F(2, 56) = 5.20, p < .01), while there were no statistically significant differences between the four groups (F(3, 28) = 1.09, p = .36) and no interaction between intervals and groups (F(6, 56) = 1.58, p = .16). Accordingly, the result implies that there were no significant differences among the four groups with respect to the wakefulness-keeping effect.
Here, let us focus on KSS of the participants who were not drowsy just before the driving started. Fig. 8(b) depicts KSS of 25 participants (AGS-Body: 6, AGS-Voice: 5, PA: 7, NA: 7) whose KSS values in interval 1 were less than seven. Two-factor mixed-design ANOVA found a significant interaction between groups and intervals (F(6, 42) = 2.69, p < .05), and showed significant simple main effects of groups and intervals, respectively (F(3, 21) = 3.29, p < .05; F(2, 42) = 9.93, p < .001). A multiple comparison test revealed that KSS of PA was greater than that of AGS-Body in both intervals 5 and 9 at a 5% significance level. Furthermore, it showed that KSS of AGS-Voice in interval 5 was greater than that in interval 1 at a 5% significance level, and KSS of NA in intervals 5 and 9 were greater than that in interval 1 at 5% and 1% significance levels, respectively.
These results indicate that the AGS-Body was more effective in wakefulness-keeping compared to the PA, the NA, and even the AGS-Voice for those who are not drowsy at the beginning of driving.

B. Physiological Indices
Figures 9(a) and (b) depict the LF norm and the HF norm of all participants. Two-factor mixed-design ANOVA showed that there was no interaction between groups and intervals in the LF norm (F(27, 252) = 1., p = .45) and the HF norm (F(27, 252) = .6, p = .94). However, it also found that simple main effects with respect to intervals were significant in the LF norm (F(9, 252) = 4.98, p < .001) and the HF norm (F(9, 252) = 8.25, p < .001), and there were marginally significant effects with respect to groups (F(3, 28) = 2.50, p < .1 and F(3, 28) = 2.59, p < .1). A multiple comparison test revealed that there was no significant difference between groups in the LF norm, although the HF norm of AGS-Body was greater than that of AGS-Voice (p < .1).
Next, Fig. 9(c) shows the LF/HF. According to the two-factor mixed-design ANOVA, there was no interaction between groups and intervals (F(27, 252) = 1.28, p = .17); however, there was a significant simple effect among groups (F(3, 28) = 3.49, p < .03) and with respect to intervals (F(9, 252) = 10.94, p < .001). A multiple comparison test revealed that the LF/HF of AGS-Voice was significantly higher than those of AGS-Body and PA (p < .05). The result suggests that the ANS might be slightly activated when using the AGS-Voice. Fig. 9(d) depicts the RRI diff. Two-factor mixed-design ANOVA showed that there was no interaction between groups and intervals (F(27, 252) = .53, p = .98) and no significant simple effect with respect to groups in the RRI diff (F(3, 28) = .61, p = .61); however, there was a significant simple effect with respect to intervals (F(9, 252) = 2.14, p < .05).

C. Driving Safety Indices
Fig. 10 depicts the SDLP of participant No. 8 (Body-8), who used the AGS-Body, and the average SDLP of other participants in the AGS-Body group. The SDLP of Body-8 was greater than the others' average, which shows that the participant drove unstably even in interval 0 when the participant drove without the WKSS. According to the questionnaire about driving experience, Body-8 was the only subject who had not been driving for more than one year.
Here, lane departure is defined as the deviation from the center of the road. As shown in Fig. 11(a), the lane departure of Body-8 was much greater than that of other participants in AGS-Body. On average, participant No. 8 crossed the traffic lane line for 12 seconds per minute, while other participants in AGS-Body crossed it for less than 2 seconds per minute on average. The result suggests that the driving skill of Body-8 is lower than other participants of AGS-Body.
Driving safety indices were analyzed excluding the data of Body-8 and participants whose duration of driving license was less than one year. The number of participants used for the analysis was 28 (AGS-Body: 7, AGS-Voice: 7, PA: 6, NA: 8). Figures 12(a) and (b) show the SE and the SDLP, respectively. There were significant simple effects among intervals in the SE (F(9, 216) = 4.01, p < .001) and the SDLP (F(9, 216) = 6.46, ...). The results showed no significant differences in the driving safety among groups. In other words, body movement and speech in the two types of AGSs did not significantly deteriorate driving safety. Fig. 12(b) shows that the variance of the SDLP of NA was also relatively high in interval 7. The reason is assumed to be that the lane departure of participant No. 5 in NA (NA-5) was greater than that of other participants in NA, as shown in Fig. 11(b). According to the analysis of the recorded video of NA-5, the lane departure increased when the participant closed their eyes due to drowsiness.

D. Mental Workload and User Acceptance
Fig. 13 shows the WWL score of the NASA-TLX. One-way ANOVA with respect to the WWL score found that there was no significant difference among the four groups (F(3, 28) = 1.54, p = .22).
Regarding the subjective evaluation of the DDS reliability, excluding NA, using seven-point Likert-type scales, the participants scored higher points in Question 1: "Did the AGS or the PA start when you were not drowsy?" compared to Question 2: "Did the AGS or the PA not start when you were drowsy?" (F(1, 23) = 13.85, p < .01). In other words, the participants felt that the DDS provided false alarms frequently. The result implies that the threshold setting for drowsiness warning in the DDS was appropriate because it was set so that the DDS did not miss driver drowsiness, even if it meant providing false alarms. Nevertheless, the results for Question 3: "Did you feel annoyed by the AGS or the PA?" showed that the participants did not feel excessively annoyed by false alarms, even though they felt that many false alarms occurred. Two out of eight participants in AGS-Body, one out of eight participants in AGS-Voice, and four out of eight participants in PA answered five or higher in Question 3. Seven out of eight participants in AGS-Body and six out of eight participants in AGS-Voice answered five or higher in Question 4: "Did you enjoy playing the AGS?", which suggests that the participants in both groups enjoyed the AGS.
Six participants in PA answered that they paid less and less attention to the beep sounds provided from the WKSS or that they ignored the sounds from a certain point during the driving because the alarm did not stop.

VI. DISCUSSIONS
The results of the driving simulator experiment showed that the AGS-Body and the AGS-Voice were more effective than the PA in wakefulness-keeping when the participants were not drowsy at the start of driving. This means that active behavior such as head tilting and speech is more effective in maintaining a driver's wakefulness than passive stimuli, which is consistent with related studies [12], [40].
The subjective sleepiness values (KSS) of the participants using the AGS-Body remained constant and did not increase, while those of the participants using the AGS-Voice gradually deteriorated. Takayama et al. reported that speech interaction is effective in keeping driver wakefulness and alertness [12]. This study, however, showed that head gesture is more effective than speech in wakefulness-keeping. Moreover, the experimental result that the subjective sleepiness value of PA was not significantly different from that of NA suggests that providing beep sounds alone is not effective in maintaining a driver's wakefulness. Unfortunately, an effect of improving wakefulness was not confirmed for any of the methods in the experimental results. This implies that there might be a limitation in wakefulness-keeping support systems, and therefore sufficient rest is essential for improving the driver's wakefulness even if wakefulness-keeping support systems are available.
Further investigation is required because the sample size in this study was not large enough to conclude that the AGS-Body is effective in wakefulness-keeping. In addition, the KSS was measured only three times because frequent questioning might affect driver drowsiness. A measurement method that does not affect driver drowsiness is needed in order to examine the wakefulness-keeping effect in detail.
The physiological index LF/HF indicated that the ANS was activated slightly more when using the AGS-Voice than when using the other WKSSs, which suggests that the AGS-Voice might be effective. However, there is a limitation in the verification of the proposed WKSS equipped with the AGS-Voice. A voice command input, which is required when using the AGS-Voice, affects respiration, and respiration affects the ANS and HRV [19], [33]. It is necessary to control for respiration in order to appropriately evaluate ANS activities by using LF/HF; however, this may be an unrealistic experimental condition.
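To make the LF/HF index concrete, the following Python sketch computes it from a series of RR intervals. It is an illustration under stated assumptions, not the procedure used in this study: the band limits (LF: 0.04 to 0.15 Hz, HF: 0.15 to 0.40 Hz) are the conventional ones, and the 4 Hz linear resampling, the mean-only detrending, the unwindowed periodogram, and all function names are choices made for brevity.

```python
import cmath
import math

LF_BAND = (0.04, 0.15)  # Hz, conventional low-frequency band (assumed)
HF_BAND = (0.15, 0.40)  # Hz, conventional high-frequency band (assumed)


def resample_rr(rr_s, fs=4.0):
    """Linearly interpolate RR intervals (seconds) onto a uniform grid."""
    times = []
    t = 0.0
    for rr in rr_s:
        t += rr
        times.append(t)  # each RR value is placed at the beat that ends it
    n = int((times[-1] - times[0]) * fs)
    out = []
    j = 0
    for k in range(n):
        tk = times[0] + k / fs
        while times[j + 1] < tk:
            j += 1
        frac = (tk - times[j]) / (times[j + 1] - times[j])
        out.append(rr_s[j] + frac * (rr_s[j + 1] - rr_s[j]))
    return out


def band_power(x, fs, band):
    """Sum the periodogram of the mean-removed signal over one band."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    power = 0.0
    for k in range(1, n // 2 + 1):
        f = k * fs / n
        if band[0] <= f < band[1]:
            coeff = sum(c * cmath.exp(-2j * math.pi * k * i / n)
                        for i, c in enumerate(centered))
            power += abs(coeff) ** 2
    return power


def lf_hf_ratio(rr_s, fs=4.0):
    """Return LF power divided by HF power for a list of RR intervals."""
    x = resample_rr(rr_s, fs)
    lf = band_power(x, fs, LF_BAND)
    hf = band_power(x, fs, HF_BAND)
    if hf == 0.0:
        raise ValueError("no power in the HF band")
    return lf / hf
```

Because respiration typically modulates the RR series within the HF band, any change in breathing caused by speaking shifts this ratio directly, which is why the index is difficult to interpret for the AGS-Voice without controlling respiration.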
Next, let us discuss the usability of the WKSS. The participants using the two types of AGSs and the PA were not annoyed, even though most of them felt that the DDS provided many false alarms. The reason why the annoyance caused by the AGS was small may be that the participants enjoyed the game. The participants using the WKSS with the PA were also not much annoyed by the alarms because they came to ignore them. Such behavioral adaptation and indifference to stimuli may have resulted in the decrease of the wakefulness-keeping effect of the PA as shown in Section V-A. Motivation for long-term use of the AGS remains future work, since it was not investigated in this study.
The experimental results also suggest that head gestures are potentially dangerous for inexperienced drivers, as shown in Section V-C. Thus, we should investigate which body gestures are safe and effective for keeping inexperienced drivers awake. Recarte et al. [41] showed that the mental workload of distraction caused by conversation during driving increases the pupil diameter, concentrates the gaze, and decreases the inspection frequency of mirrors and the speedometer. According to Fig. 13, the WWL scores of the NASA-TLX in AGS-Body and AGS-Voice were not significantly different from those in PA and NA in our experiment. Nevertheless, the previous study [41] suggests that the AGS-Voice may deteriorate driving safety, and it is therefore necessary to weigh the benefit of the proposed system in maintaining wakefulness against its risk of reducing safety. We need to evaluate the effect of the proposed system on driving safety by means of objective methods such as gaze behavior and pupil diameter measurement.
Reyes et al. [42] indicated that the operation of an in-vehicle information system (IVIS) reduces the ability to detect bicyclists on the side of the road and that this performance deterioration continues even after IVIS operation ends. The AGS-Body and the AGS-Voice were activated for 60 seconds when the DDS detected a low arousal state. When the driver remained drowsy at the end of the game, the game was not continued; instead, the next game was activated after a 60-second interval. The cat crying or the lion roaring appears eight or nine times during the 60-second game period. If, as the findings of [42] suggest, the mental workload remains high during non-game intervals, the driver's safety confirmation could become slow, so we should verify this point in the next experiment.
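The activation timing described above can be sketched as a small scheduling routine in Python. This is a minimal illustration, not the implementation of the WKSS: the per-second list of drowsiness flags standing in for the DDS output, the function name `schedule_games`, and the assumption that the 60-second interval follows every game (whether or not the driver is still drowsy) are all hypothetical.

```python
GAME_SECONDS = 60   # duration of one game, as stated in the text
REST_SECONDS = 60   # interval before the next game can start


def schedule_games(drowsy_flags):
    """Return (start, end) second indices of each game, end exclusive.

    drowsy_flags: one boolean per second, True when the DDS reports a
        low arousal state (hypothetical representation of the DDS output).
    """
    games = []
    n = len(drowsy_flags)
    t = 0
    while t < n:
        if drowsy_flags[t]:
            end = min(t + GAME_SECONDS, n)
            games.append((t, end))
            # Do not continue the game; wait for the interval instead.
            t = end + REST_SECONDS
        else:
            t += 1
    return games
```

For a driver who becomes drowsy 10 seconds into the drive and stays drowsy, this sketch alternates 60-second games with 60-second non-game intervals, and the latter are the periods in which a lingering workload would need to be examined.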

VII. CONCLUSION
The present study proposed a wakefulness-keeping support system (WKSS) which consists of a drowsiness detection system (DDS) and an active game system (AGS). The AGS requires a driver to adopt active behavior: a head gesture when using the AGS-Body or speech when using the AGS-Voice. The driving simulator experiments suggested that active behavior was more effective in maintaining a driver's wakefulness than a conventional alarm system, such as one providing beep sounds. The AGS-Body was more effective than the AGS-Voice in wakefulness-keeping. The participants were not annoyed by either AGS, even though they felt that the DDS provided false alarms.
In future work, we need to examine the relationship between driving skill and the safety of body gestures in wakefulness-keeping. This research tested only the five types of sounds listed in Table I; other types of sound may be effective for the AGS and will be evaluated in future work. In addition, we need to develop a method for evaluating the effect of body movement on wakefulness-keeping from the viewpoint of sleep medicine, such as the use of sleep scoring based on EEG measurement. Although the playing time of the AGS was determined arbitrarily in this study, it would be useful to investigate the appropriate duration of one play of the game in terms of wakefulness-keeping.