Classification of Driver Head Motions Using a mm-Wave FMCW Radar and Deep Convolutional Neural Network

Eight different driver head movements are measured using a millimeter-wave FMCW radar mounted in the dashboard of a car. The micro-Doppler signatures are converted into a spectrogram image format for analysis and classification purposes. The eight different head motions exhibit unique time-frequency profiles, which can be classified by deep learning algorithms. In this study, a convolutional neural network is used to classify the eight head motions with an optimized window size. Various dataset permutations are considered, such as the effect of window width on classification accuracy and the classification accuracy of head motions in a still car compared to a moving car.


I. INTRODUCTION
Driver drowsiness and distraction are significant causes of automobile traffic accidents. The United States National Highway Traffic Safety Administration (NHTSA) estimates that drowsiness accounted for approximately 91,000 crashes and at least 795 road deaths in 2017 [1]. Driver-alertness safety features have begun to appear in several car models as manufacturers recognize their market potential [2], but most of these features rely on driver control inputs and do not accurately monitor the driver's true condition with high resolution. In contrast, analyzing the motion of the driver's head allows one to infer the driver's drowsiness condition more accurately [3], [4].
Previous works have investigated human head movements using a variety of sensing methods in different environments. Wearable sensors have proven viable for monitoring head motion [4], [5], but this approach is too cumbersome for most natural driving applications. The two most popular noncontact approaches are camera-based video processing and radar sensing. Smith et al. [6] laid foundational work in developing a video-fed algorithm to monitor driver head and eye motion. Tawari et al. [7] used a multi-camera vision algorithm to monitor and detect driver behavior in a variety of occlusion and lighting conditions. Additional studies have developed models to analyze head motion using video capture methods, as seen in [8]-[10]. While video methods enable a high degree of visual clarity for processing driver gaze, the privacy concern of being recorded is undesirable. Poor lighting conditions and head-worn accessories such as sunglasses also limit the usefulness of video processing in more general scenarios.
Radar has proven to be a reliable approach to sensing human head motion. Radar works well regardless of lighting conditions or clothing covering the driver, and it does not raise the privacy concerns that a camera-based system does. Chae et al. [11] used a 5.8 GHz Doppler radar to measure six distinct head motions in a laboratory environment to analyze driver head motions. Cardillo et al. [12] used 120 GHz FMCW phase tracking to monitor head and eyelid motions for patients with neurodegenerative disorders. Jung et al. [13] collected radar data of four driver head motions in a stationary car environment and used a convolutional neural network to classify the resulting spectrogram images with over 80% accuracy. Despite these advancements, most previous studies have been limited to a few head movements and restricted to laboratory environments, and the classification accuracy is not high enough to render a critical assessment of the driver's condition.
The goal of this study is to measure and classify eight common head movement patterns exhibited by motor vehicle drivers using a millimeter-wave FMCW radar in a moving car interior environment with passengers sitting nearby. The radar measurement data are processed using a compressed two-dimensional fast-Fourier transform (FFT), creating joint time-frequency visualizations that reveal features of different head motion patterns. These time-frequency images are used to train a deep convolutional neural network (DCNN) to recognize and classify new data from the eight activity classes.
This paper is organized as follows: Section II will detail the radar measurement setup and parameters and illustrate the eight head motions to be considered. Section III will display the spectrogram images containing the human micro-Doppler information and will discuss relevant features for each image. Section IV will present the neural network modeling and classification performance. Section V will conclude the study and discuss future directions.

II. EXPERIMENT SETUP AND DATA COLLECTION

A. RADAR PARAMETERS
In this study, the Texas Instruments AWR1642BOOST [14] and DCA1000EVM [15] evaluation boards are used together to collect raw radar measurements inside a car environment. The FMCW radar can capture both range and velocity information simultaneously on a frame-by-frame basis. For each frame, the radar emits several linear chirps with a specific slope and bandwidth, as shown in Table 1. The received chirps are mixed with the transmitted chirps to produce the intermediate frequency signal, which is then processed using the FFT to generate the radar range profile. This mm-wave radar has four receiver antenna elements and can resolve targets in the cross-range dimension as well. A two-dimensional FFT across the multiple chirps allows a range-Doppler map to be created from each frame. The radar is mounted in the dashboard behind the steering wheel facing the driver. This placement allows the radar to capture a wide view of the driver's range of motion. In the first set of experiments, the car is not moving and there are no passengers present in the car. Fig. 2 shows the range-azimuth profile when the driver is sitting still inside the car. The strong red return at 0.25 m is the steering wheel, the yellow return at 0.8 m is the driver, and the soft blue return at 1.2 m is the roof of the car.
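The two-dimensional FFT processing described above can be sketched with a synthetic single-target beat signal. The chirp parameters below are illustrative assumptions, not the actual AWR1642 configuration from Table 1:

```python
import numpy as np

# Sketch of range-Doppler processing for one FMCW frame (hypothetical
# chirp parameters, not the actual AWR1642 configuration).
n_samples, n_chirps = 256, 128          # fast-time samples x chirps per frame
fs, slope = 5e6, 30e12                  # ADC rate (Hz), chirp slope (Hz/s)
fc, t_chirp = 77e9, 60e-6               # carrier (Hz), chirp repetition (s)
c = 3e8

# Synthetic beat signal for a single target at range R with radial velocity v,
# e.g. the driver at 0.8 m leaning toward the radar.
R, v = 0.8, 0.4
t = np.arange(n_samples) / fs
f_beat = 2 * slope * R / c              # range-induced beat frequency
f_dopp = 2 * v * fc / c                 # Doppler shift from radial motion
frame = np.empty((n_chirps, n_samples), dtype=complex)
for k in range(n_chirps):
    phase = 2 * np.pi * (f_beat * t + f_dopp * k * t_chirp)
    frame[k] = np.exp(1j * phase)

# Range FFT along fast time, then Doppler FFT across chirps (slow time).
range_profile = np.fft.fft(frame, axis=1)
range_doppler = np.fft.fftshift(np.fft.fft(range_profile, axis=0), axes=0)

# The peak location maps back to range and velocity estimates.
dopp_bin, range_bin = np.unravel_index(np.argmax(np.abs(range_doppler)),
                                       range_doppler.shape)
range_res = c * fs / (2 * slope * n_samples)
vel_res = c / (2 * fc * t_chirp * n_chirps)
est_R = range_bin * range_res
est_v = (dopp_bin - n_chirps // 2) * vel_res
```

The peak of the range-Doppler map recovers the target's range to within one range bin and its radial velocity to within one Doppler bin, which is the mechanism exploited throughout the paper.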

B. EXPERIMENT PROCEDURE
Five driver test subjects perform eight distinct head movements in a repeating manner for 60 seconds each while the radar collects data. Testing takes place in three different vehicle types: a sedan, a small SUV, and a pickup truck. Three subjects test in the sedan, one subject tests in the SUV, and one subject tests in the pickup truck. The period of each repeated activity is controlled and listed below. We do not control the precise movement pattern of each test subject, allowing the subject to act out the motions in a natural way. The slight differences in motion across the subjects enrich our dataset diversity and enhance the classification capability for a general population. The eight head movements are chosen to represent a wide set of common human actions while seated in a car. They are described as follows and are illustrated in Fig. 3.
1) Still
2) Head Droop Forward
3) Head Turn Left-Right
4) Body Turn Left-Right
5) Chewing-period controlled to be about 0.5 seconds.
6) Lean Back-Forth-the driver slowly leans forward with the entire torso and then slowly leans back to the original position. Period controlled to be about 5 seconds.
7) Emergency Jolt-the entire body jumps/convulses with an impulse as if the car has collided with a solid surface. Period controlled to be about 5 seconds.
8) Sudden Head Jerk-a sharp motion throws the head forward and backward, as if a car collision has occurred or the driver is startled while drifting to sleep. Period controlled to be about 4 seconds.
This experiment procedure is repeated twice for two different scenarios: a ''still car'' case and a ''moving car'' case. In the ''still car'' case, the test subject sits alone in the parked car with the engine off and performs the head activities in front of the radar. In the ''moving car'' case, the test subject drives the car under 25 miles per hour in an empty parking lot while turning randomly and braking periodically. Two additional passengers are also present, one in the front passenger seat and one in the right rear seat.
The passengers are instructed not to make large motions but are allowed to move their heads and arms naturally. The ''moving car'' case simulates a more realistic driving environment with interference and extra vibration, the effects of which are explored in section III.

III. MEASUREMENT RESULTS AND ANALYSIS

A. DATA PROCESSING
The radar data are post-processed using a customized MATLAB script: a two-dimensional FFT is taken along each frame, resolving the range-Doppler profile of the car cabin at each time sample. The range-Doppler frames are then compressed into one unified range bin using a sum operation along the range dimension. Each compressed frame becomes a column of the spectrogram, ordered from left to right to create the slow-time axis. A difference filter is applied across the frames to remove the strong spectral component at zero Doppler frequency. The colormap is plotted on a dB scale for visual clarity.

Fig. 4 displays the spectrograms corresponding to the eight driver head motion activities for the ''still car'' case. The horizontal axis is time in seconds, and the vertical axis is radial velocity in meters per second. A positive velocity implies forward motion toward the radar, and a negative velocity implies backward motion away from the radar. Fig. 4(a) shows the still results. This is the baseline for no head motion; only a weak return around 0 m/s is visible. Fig. 4(b) shows the head droop forward results. This motion creates a spectral profile with distinct upward-curve shapes in the positive domain (head moving forward) followed by shorter triangular shapes in the negative domain (head moving backward). Fig. 4(c) shows the head turn left-right motion, which is characterized by small symmetrical lobes appearing along the center of the plot. Fig. 4(d) shows body turn left-right, which looks similar to the head turn left-right results but with larger velocity values and longer duration. The lobes alternate between a positive and a negative velocity skew; this is due to the driver moving his shoulder toward the radar as he looks to the left and then returning to the neutral position before repeating the motion while looking to the right. Fig. 4(e) shows the chewing motion, which appears as small, rapid perturbations near zero velocity. Fig. 4(f) shows the lean back-forth motion. This motion exhibits slower, sinusoidal shapes in the spectrogram.
The velocity does not exceed ±0.5 m/s. Fig. 4(g) shows the emergency jolt motion. This motion generates a tall, sharp, and quick Doppler response in both the positive and negative directions. This is due to the sudden convulsion of the body moving its parts in more than one direction at once: the head moves away from the radar while the torso moves slightly toward the radar. These spikes are followed by a small lobe where the body relaxes and returns to its original position. Fig. 4(h) shows the sudden head jerk motion. This motion's spectrogram exhibits a sharp positive Doppler response as the head is sharply pulled back up from a slouching position. This motion is similar to head droop forward but is more focused on the aggressive snapping-back motion of the head and less focused on the slow, drowsy, forward motion of the head droop.

B. SPECTROGRAMS FOR MOVING CAR
Fig. 5 displays the spectrograms for the ''moving car'' case. These spectrograms exhibit additional low-level background energy compared to the still-car baseline of Fig. 4(a), attributable to the extra vibration of the moving vehicle; this phenomenon is present on all the Fig. 5 spectrograms. Another difference between these sets of data is the presence of low-velocity energy randomly appearing as time progresses, e.g., in Fig. 5(a) at 10 and 15 seconds. This is due to the driver's hands turning the steering wheel 7 inches from the radar. However, the action of turning the steering wheel is primarily tangential to the radar, which causes negligible Doppler shift.
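The post-processing chain of section III-A can be sketched as follows (a Python/NumPy stand-in for the paper's MATLAB script; the array sizes and random stand-in frames are assumptions for illustration only):

```python
import numpy as np

# Stand-in range-Doppler magnitude frames, shape (time, Doppler, range).
# Real data would come from the 2-D FFT of each radar frame.
n_frames, n_chirps, n_range = 200, 128, 256
rng = np.random.default_rng(0)
rd_frames = rng.random((n_frames, n_chirps, n_range))

# 1) Compress each range-Doppler frame into one unified range bin by
#    summing along the range dimension; each frame becomes one Doppler column.
columns = rd_frames.sum(axis=2)              # (n_frames, n_chirps)

# 2) Stack the columns left to right to form the slow-time axis.
spectrogram = columns.T                      # (Doppler bins, time)

# 3) Difference filter across frames suppresses the strong stationary
#    clutter concentrated at zero Doppler frequency.
spectrogram = np.abs(np.diff(spectrogram, axis=1))

# 4) Convert to dB for plotting (small offset avoids log of zero).
spectrogram_db = 20 * np.log10(spectrogram + 1e-12)
```

The difference filter costs one time sample per recording, which is why the filtered spectrogram has one fewer column than the number of frames.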

C. SPECTROGRAMS FOR MOVING CAR WITH PASSENGERS
Two docile passengers were sitting in the moving vehicle during the measurement trials. The presence of the passengers seems to have little visible effect on the spectrograms. Based on the distinctive traits of the data in Fig. 4 and Fig. 5, we next employ a deep learning algorithm, the deep convolutional neural network, for classification of image data.

IV. NEURAL NETWORK CLASSIFICATION TECHNIQUE

A. NETWORK STRUCTURE
For real-world applications, a vehicle safety system must be able to recognize and classify driver head motions in an accurate and efficient way. Deep learning can accomplish most types of narrow, well-defined classification challenges given a large, diverse set of training data [16]. In this work, a deep convolutional neural network is trained to classify the eight driver head activities using the spectrogram images from section III as training data. Previous works have employed DCNN models for human body motion classification and have shown that well-trained models produce highly accurate and computationally efficient results for spectrogram-based image classification [5], [13], [17]. Fig. 6 illustrates the architecture of the DCNN. The network is implemented using MATLAB® and the Deep Learning Toolbox available from MathWorks® [18]. It is composed of an input layer for raw pixel information, three convolutional layers, three maximum pooling layers to reduce model dimensionality, and a fully connected layer that predicts which of the eight head motion classes the input image belongs to.
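The layer stack can be sanity-checked by tracing the spatial dimensions through the network. The kernel sizes, filter counts, and input resolution below are assumptions for illustration; the actual hyperparameters are those listed in the paper's Table 2:

```python
# Shape trace through a 3-conv / 3-max-pool / fully-connected stack.
# Kernel sizes, filter counts, and the 128x128 input are assumed values.
def conv2d_out(size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * pad - kernel) // stride + 1

def maxpool_out(size, pool, stride=None):
    """Spatial output size of a max-pooling layer along one dimension."""
    stride = stride or pool
    return (size - pool) // stride + 1

h = w = 128                      # assumed spectrogram input resolution
channels = 1                     # single-channel (grayscale) input
for n_filters in (8, 16, 32):    # assumed filter counts per conv layer
    h, w = conv2d_out(h, 3, pad=1), conv2d_out(w, 3, pad=1)   # 3x3 conv
    h, w = maxpool_out(h, 2), maxpool_out(w, 2)               # 2x2 pool
    channels = n_filters

n_classes = 8                    # eight head-motion classes
fc_inputs = h * w * channels     # flattened features into the FC layer
```

Each pooling layer halves the spatial resolution, so three conv/pool stages reduce a 128x128 input to a 16x16 feature map before the fully connected classifier.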

B. DATASET
The experiment described in section II involving the still car is iterated a total of three times for three different test subjects, collecting three complete data sets for all eight driver head motions where each motion is recorded for 60 seconds. A fourth dataset is collected from one test subject driving a moving car while acting out the eight head motions. In total, there is a combined dataset of four trials with three unique test subjects where each trial contains eight head activity recordings that are each 60 seconds in duration.
To generate a sufficiently large training data pool for the DCNN, each 60-second recording is split into overlapping windows of various widths. The resulting number of unique images per recording using this technique is given as

N = (T_total - T_window) / T_step + 1,

where T_total is the recording duration, T_window is the window width, and T_step is the step size. With a 4-second window and a step size of 0.5 seconds, each 60-second recording generates 113 spectrogram windows. These values are selected based on a heuristic search, and the effect of varying the window size is quantified in section IV-C. The total combined dataset including the eight activity recordings sums to 4,520 unique images. These data are labeled according to their activity and randomly partitioned into a training set and a validation set using a 70-30 split: 70% of the data are used to train the model to recognize the unique features of each activity, and 30% are withheld from the network until the validation step to help prevent overfitting to the training data. The training and validation sets are shuffled every epoch. The network predicts the activity label for each of the validation images to compute the classification accuracy. The model is trained for 30 epochs, with 30 iterations per epoch. Table 2 lists the key parameters for the DCNN model training procedure.

Fig. 7 shows the confusion matrix of the DCNN model for the ''still car'' case. The average classification accuracy, referring to the mean of the diagonal elements, is 99.36%. The model had the most difficulty recognizing the chewing action, misclassifying it as still 3.8% of the time. The network displays excellent classification performance given the distinct spectrogram patterns for each of the head motions. We repeat the sliding-window dataset generation procedure for the ''moving car'' data shown in Fig. 5. Fig. 8 shows the confusion matrix for the ''moving car'' case.
The average classification accuracy is slightly lower than the ''still car'' case, at 94.64%, which can be attributed to the unpredictable hand movements near the steering wheel confusing the DCNN. Nevertheless, this network model still performs well and is expected to improve with additional training data from a multitude of unique test subjects.
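The sliding-window bookkeeping of section IV-B can be sketched as follows; the helper implements the window-count formula with the window and step values quoted in the text:

```python
# Number of overlapping windows extracted from one recording:
#   N = floor((T_total - T_window) / T_step) + 1
def num_windows(t_total, t_window, t_step):
    return int((t_total - t_window) / t_step) + 1

# A 4 s window and 0.5 s step over a 60 s recording yields 113 windows,
# matching the value given in the text.
per_recording = num_windows(60.0, 4.0, 0.5)

# Window counts for the widths considered in the window-size study of
# section IV-C (same 0.5 s step).
counts = {w: num_windows(60.0, w, 0.5) for w in (1, 2, 3, 4, 5)}
```

Note that wider windows yield slightly fewer images per recording, so the window-size study trades a small reduction in dataset size for windows that capture a full motion period.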

C. DCNN CLASSIFICATION RESULTS
Next, we combine the datasets for the ''still car'' case and the ''moving car'' case into one dataset to quantify the effect of window size on classification accuracy. We consider window widths of 1, 2, 3, 4, and 5 seconds. A new DCNN model is trained for each window width, with the DCNN parameters kept the same as those shown in Table 2. The accuracy vs. window size results are shown in Fig. 9. Evidently, the classification accuracy improves as the window size increases. At smaller windows, such as 1 or 2 seconds, the regions of no motion between the spikes of emergency jolt (Fig. 4(g)) may look like still (Fig. 4(a)) to the network, leading to poor classification accuracy. At larger window sizes of 4 seconds or more, the window captures at least one full period of each repeated motion, resulting in high classification accuracy due to the distinctive features of the full motions. Based on Fig. 9, one can select the appropriate window width depending on the classification accuracy requirement.
The training process using MATLAB takes approximately 1 hour to complete. Once finished, the model can classify new images almost instantly.

D. COMBINED MOTIONS
Finally, real-life driver head monitoring would involve different head motions displayed in an unpredictable order, and it is necessary to continually monitor the driver's head motions in real time during the trip. Fig. 10(a) shows a test case where the driver subject chooses to act out several of the eight actions in random order, each performed for a naturally short duration. The first 7 seconds are still, and then we see the small, rapid perturbations of chewing. Next, at 11 seconds, we see the symmetrical lobes characteristic of head turn left-right. At 20 seconds, we find the sinusoidal curve of lean back-forth. Finally, at 26 seconds, we see the impulse-like signature of sudden head jerk. A sliding window is used to segment this 30-second spectrogram into overlapping windows of 4-second width and 0.5-second step size to be compatible with the DCNN model we already trained. Each of the overlapping spectrogram windows is fed into the pre-trained DCNN model, and the prediction results are shown in Fig. 10(b). The model correctly predicts most of the activity regions of Fig. 10(a), starting with still, then transitioning to chewing, followed by head turn left-right, then lean back-forth, and finally sudden head jerk.
The transitions between each main activity region may cause some transient confusion for the network, as seen in the sudden deviations to other class predictions such as sudden head jerk at 16 seconds or emergency jolt at 23 seconds. Fig. 11(a) and (b) show another case of random combined head motions and their classification. The classification results in Fig. 11(b) closely match the respective spectrogram patterns but become temporarily confused, producing spurious predictions around 15 and 20 seconds due to similar head motions occurring close together.
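The continuous monitoring procedure above can be sketched as a sliding-window inference loop. The classify function below is a hypothetical stand-in that replays the activity schedule of Fig. 10(a) in place of the real trained DCNN:

```python
# Sliding-window inference over a 30 s recording: 4 s windows, 0.5 s step,
# one class label per window, assigned to the window's center time.
def window_starts(t_total, t_window, t_step):
    n = int((t_total - t_window) / t_step) + 1
    return [i * t_step for i in range(n)]

def classify(t_center):
    # Hypothetical stand-in replaying the Fig. 10(a) activity schedule;
    # a real system would run the trained DCNN on the spectrogram window.
    if t_center < 7:
        return "still"
    elif t_center < 11:
        return "chewing"
    elif t_center < 20:
        return "head turn left-right"
    elif t_center < 26:
        return "lean back-forth"
    else:
        return "sudden head jerk"

t_window, t_step = 4.0, 0.5
starts = window_starts(30.0, t_window, t_step)
timeline = [(s + t_window / 2, classify(s + t_window / 2)) for s in starts]
```

Because each 4 s window straddles activity boundaries for up to 4 seconds, label flips near transitions are expected, which is consistent with the transient confusion observed in Fig. 10(b) and Fig. 11(b).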

V. CONCLUSION
This study investigated eight common head motions of a human driver in a realistic motor vehicle environment and computed visualizations of the time-frequency signatures generated by the body interacting with the millimeter-wave radar. We trained a deep convolutional neural network to recognize and classify the activities based on the features exhibited in their respective spectrograms. This work has shown that the high velocity resolution offered by a mm-wave FMCW radar enables differentiation of many human body motions for practical applications in real-world environments.
Future directions for this research include measuring and classifying driver motions under various real driving scenarios such as parking lots, highway cruising, bumpy roads, and bad weather. Furthermore, the effects of vigorous passenger motion inside the car while driving need to be studied. The effects of different driver body shapes and sizes may also be explored. Finally, conducting a large-scale measurement campaign with many driver test subjects will enable the subsequent DCNN training to produce robust, high-quality models capable of detecting driver behavior in commercial safety applications. The end goal is to develop practical sensors and edge computing devices that detect head motions and alert the driver in real time.