3D Head Motion Detection Using Millimeter-Wave Doppler Radar

In systems ranging from advanced driver assistance to conditional automation, monitoring of driver state is vital for predicting the driver’s capacity to supervise or maneuver the vehicle in cases of unexpected road events and for facilitating better in-car services. The paper presents a technique that exploits millimeter-wave Doppler radar for 3D head tracking. Identifying the bistatic and monostatic geometry for antennas to detect rotational vs. translational movements, the authors propose the biscattering angle for computing a distinctive feature set to isolate dynamic movements via class memberships. Through data reduction and joint time–frequency analysis, movement boundaries are marked for creation of a simplified, uncorrelated, and highly separable feature set. The authors report movement-prediction accuracy of 92%. This non-invasive and simplified head tracking has the potential to enhance monitoring of driver state in autonomous vehicles and aid intelligent car assistants in guaranteeing seamless and safe journeys.


I. INTRODUCTION
As vehicles progress from advanced driver assistance systems (ADAS) to partial and conditional automation, the role of humans in driving is expected to shift from active control to supervision or passiveness. Monitoring driver state is vital for ascertaining the driver's capability to supervise or maneuver the vehicle when unexpected road events occur and for understanding the driver's comfort level.
The associate editor coordinating the review of this manuscript and approving it for publication was Haiwen Liu.
In-vehicle sensing of driver attention/distraction can be achieved by analyzing posture changes and head, hand, foot, and eye movements [1], [2]. Sensing of human activities in a vehicular setting has been a hot topic because of its importance in ADAS and conditionally autonomous driving and its potential in facilitating human-car interaction and monitoring driver health/fatigue/distraction in driving-safety systems. Most of the current solutions are based on either visible-light cameras or wearable sensors. Camera-based sensing is affected strongly by variations in light levels during a journey, and in darkness it relies on external illumination, which could distract the driver. Also, privacy concerns arise [3], [4].
In addition, the technology entails a trade-off between frame rate (a higher rate decreases motion blur) and cost. While wearable sensors [5] can overcome some of these issues, wearing the devices may be inconvenient for the driver, and the connection to the environment (e.g., via the steering wheel) can limit which driver states may be inferred (e.g., stress and fatigue) [6].
In recent years, device-free human sensing from wireless signals [7] has gained popularity, thanks to being less invasive and coping with occlusion and darkness. This development has inspired in-vehicle wireless human sensing in the form of monitoring vital signs, such as heart rate [8] and breathing [9], and the driver's state [10]-[12]. Our WiBot system [10] characterizes driver motion from head turns and hand gestures by means of WiFi signals, and, by developing WiCAR [11], we extended this to in-vehicle activity recognition that is independent of the human, the car model, and the external setting. Importantly, all three studies utilized channel state information (CSI) as the signal descriptor, since WiFi is seeing increasing application for in-vehicle entertainment. WiFi's operating frequency, however, is susceptible to the multipath effect due to passengers and nearby cars, and the head tracking's accuracy is constrained by wavelength. We have addressed this via radio signals at widely separated frequencies (1.8 and 30 GHz) in a monostatic configuration, for distinguishing a driver's head motions from those of the body [13]. So far, most radio frequency sensing efforts have focused on changes in the received signal strength, channel state information, and micro-Doppler motion [14]. In contrast, we have used millimeter-wave Doppler radar to perform in-vehicle driver/passenger 3D head tracking for three distinct head movements, as illustrated in Fig. 1.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/
Pitch, roll, and yaw movements in 3D space are of great interest because they provide information on driver attention levels and behavior [15]. For example, repeatedly pitching forward and backward may indicate drowsiness, yaw movements may point to shoulder checking and thereby indicate awareness of one's surroundings, and rolling motions may reveal distraction (e.g., reaching for objects or attending to rear-seat passengers).
We recently presented detection of translational movements and separation of head and torso movements [13]. In this paper, we demonstrate that an appropriate choice of the radar configuration (bistatic vs. monostatic), followed by estimation of the time-varying Doppler spectra, allows rotation and translation of the head to be distinguished and accurately estimated within a single system, which has not been studied before. While head translation along the radar line of sight leads to a Doppler shift in the frequency of the reflected signals, rotational head movements have a minute impact on the Doppler spectra and introduce a time-varying Doppler spectrum around the carrier frequency. A car-like arrangement in an anechoic chamber with a carrier frequency of 30 GHz was used to investigate the optimal radar configuration of receiver (RX), transmitter (TX), and subject.
A dynamic object's detectability by radar depends on its size and geometry and on the signal wavelength. For the complete reradiated signal to be obtained, the wavelength must be smaller than the body part being sensed (λ < size of the body part). At 30 GHz, λ is 1 cm, which is much smaller than a substantial head movement, so these motions can be captured with high accuracy.
Our approach computes a distinctive feature set by marking movement boundaries and performing joint short-term time-frequency analysis. Head rotations introduce Doppler modulations that can be examined to estimate the manner in which the head is moving. Via reduction and removal of redundancies and correlations in the feature set, we reduce it to the two most separable features. The final step is to separate among movement classes by using a support-vector machine classification algorithm.
The in-vehicle environment is normally static; i.e., the physical surroundings are fixed, and only the human body is in motion. This enables us to focus on dynamic Doppler-based features and to extract features that allow distinguishing between classes of movement, among them a pitch forward, pitch backward, roll left, and roll right (translation) and also a yaw left and yaw right (rotation).
To the best of our knowledge, 3D head movement detection using mmWave wireless radar has not been presented before. The published work closest to our research is [12], which detects 2D head orientation with CSI signals. Moreover, this is the first work that presents the importance of the bistatic and monostatic configuration of the TX/RX antennas and the subject for separating rotational and translational head movements and improving the accuracy of detection. Most importantly, the idea of translating head movements detected from non-intrusive wireless signals into an analysis of distracted human behavior is a new advancement.
The main objectives of this work are to 1) derive the optimal geometry to separate yaw movements in a car-like arrangement within an anechoic chamber and show that 30 GHz millimeter-wave signals are appropriate for this scenario; 2) demonstrate the advantage of bistatic over monostatic configurations for RF sensing that identifies head rotation and translation, since the bistatic configuration captures more scattered components of the signal; and 3) show that translation along the radar line of sight is more easily distinguished in a monostatic configuration and that detecting all rotational and translational movement classes within one system requires additional RX units, at 0° and 90°. The discussion is organized as follows. Section II describes the concepts that were the building blocks of our research. Details of the experimental setup, human study, and frequency-time analysis are presented in Section III, and Section IV covers the system implementation. The results are presented in Section V, followed by system limitations in Section VI and conclusions in Section VII.

II. FUNDAMENTAL CONCEPTS
In this section, we introduce the significance of mmWave radar and its specifications in our work, then specify the target head movements and the reasoning behind focusing our methodology on them. Finally, we provide an empirical demonstration of the feasibility of both monostatic and bistatic configurations with regard to translational and rotational movements.

A. THE SIGNIFICANCE OF MMWAVE RADAR
Millimeter-wave radar is a well-established vehicular technology. Short-range radar applications (parking assistance and pre-crash applications), medium-range ones (cross-traffic alerts, lane-change assistance, and blind-spot detection), and long-range ones (adaptive cruise control) already operate in the automotive domain, at such frequencies as 76-81 GHz [16]. Typically, a frequency-modulated continuous waveform is used for deriving information about the existence, location, and velocity of neighboring vehicles from reflections. We are not, however, aware of applications of mmWave radar within the passenger cabin for monitoring driver state, even though it offers significant advantages over lower-frequency regimes with regard to in-vehicle driver state monitoring.
The accuracy with which minute movement can be recognized from RF signals is conditioned on the wavelength for the signal frequency and also on the bandwidth available (for 30-300 GHz, λ=0.001-0.01 m). Moreover, in in-car scenarios, the reduced range of mmWave signals does not limit the sensing capabilities, since the distance between antenna and sensed subject is small (usually 1-2 m). Compared to other mmWave spectrum areas, the frequencies around 30 GHz specifically provide a good compromise in this respect, showing the lowest attenuation among all frequencies above 20 GHz [17]. Since device-free RF sensing relies on reflected, heavily attenuated signal components, this makes frequencies around 30 GHz particularly suitable for in-vehicle device-free RF sensing.
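As a quick check of the wavelength figures quoted above, the free-space relation λ = c/f can be evaluated at the 30 GHz carrier used in this work and at the upper edge of the mmWave band. The value c ≈ 3 × 10^8 m/s is the usual rounded speed of light.

```latex
\lambda = \frac{c}{f}, \qquad
\lambda_{30\,\mathrm{GHz}} = \frac{3\times10^{8}\ \mathrm{m/s}}{30\times10^{9}\ \mathrm{Hz}} = 0.01\ \mathrm{m}, \qquad
\lambda_{300\,\mathrm{GHz}} = \frac{3\times10^{8}\ \mathrm{m/s}}{300\times10^{9}\ \mathrm{Hz}} = 0.001\ \mathrm{m}
```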

B. TARGET MOVEMENT DESCRIPTION
Humans' head movements are physically complex and unstructured but contain rich information about behavioral characteristics. Moving the head relies fundamentally on support against the force of gravity. Head movements steer and are closely linked to the sensory structures of the head, especially those for vision [18]. Humans tend to use a combination of head and eye movements to stabilize their line of sight and focus on a target [19]. Especially in head rotations, the eye movements follow the head to keep gaze shifts in balance. In addition, head movements such as nodding or shaking are used for nonverbal communication. Therefore, head motions, alongside eye movements, are vital in characterizing human states of attention and interest. Since motions of the head have a bigger impact on RF signals than eye movements do, we chose them as our target movement.
In situations wherein the human's position is partially fixed, such as being seated (in a car or otherwise), locomotion is restricted and a subset of complex head movements such as pitch, roll, and yaw could provide enriched information about the behavior, interest, and attention. The emphasis here is on large, significant movements for behavior-detection purposes, and we specify these in the form of class memberships involving the horizontal (pitch and roll) and rotational (yaw) plane. The geometry of these movements is explained below.

1) TRANSLATIONAL HEAD MOVEMENTS
As depicted in Fig. 1, the subject is modeled as a vector S from the origin O_s, pointing toward the head. Relative to O_s, S is described in the Cartesian coordinate system as (x, y, z) and in the spherical coordinate system as (r, θ, φ), where φ = 0° is in the +x-axis direction. The antenna coordinate system is described by (r_a, θ_a, φ_a) relative to origin O_a, where R_a = (r_a, 0, 0) is the direction of maximum gain. The subject is placed electrically far from the antenna (||O_s − O_a|| ≫ λ). For measuring pitch, the receiver is placed such that R_a is parallel to the x-axis. Pitch is defined as a change in θ in the φ = 0° direction,
$$\omega_p = \left.\frac{d\theta}{dt}\right|_{\phi = 0^\circ} \qquad (1)$$
Projecting the angular velocity ω_p onto the x-axis shows that the component of the angular velocity in the direction of R_a results in a measurable change in the Doppler frequency at the receiver; therefore, any pitch forward or backward by the head can be detected in changes to the Doppler shift (see Fig. 1). A roll movement with angular velocity ω_r is defined in the same manner as a change in θ in the φ = 90° direction, and R_a is parallel to the y-axis (the receiver is placed beside the subject).

2) ROTATIONAL HEAD MOVEMENTS
Yaw of the head is a rotational movement defined as a change in φ in the direction of θ = 0°,
$$\omega_y = \left.\frac{d\phi}{dt}\right|_{\theta = 0^\circ} \qquad (2)$$
Yaw movement of the head is expected to cause less Doppler frequency shift than head pitch or roll. This is because yaw movement is only partially projected onto the x-axis and does not result in significant components projected onto either the x- or the y-axis. Hence, the change in Doppler frequency is not sufficient for distinguishing between right and left yaw. These can be distinguished, however, with an appropriate bistatic configuration and the joint frequency-time analysis discussed in the following sections.

C. THE IMPACT OF GEOMETRY ON MOVEMENT DETECTION
While TX and RX are collocated in a monostatic radar configuration, bistatic radar is a system configuration in which they are separated by a considerable distance [20]. The bistatic configuration is specified in terms of the bistatic angle, β, which is defined as the angle (0-180°) between TX and RX, with its vertex at the target. Bistatic radar has complex geometry relative to monostatic radar but provides better receiver sensitivity, a refined radar cross-section, and wider spatial diversity [21]. The different geometry leads to differences between the two setups in scattering of the radiating signal. The angle of incidence between the transmitting and receiving antenna is large in a bistatic setup, in sharp contrast to the collocated antennas in a monostatic one, and this guarantees that the scattering will not be confined to the strongest scattering lobe and should significantly reduce the impact of backscatter [22]. That creates motivation for identifying the less observable or less dominant movements with a reduced radar cross-section (RCS). Scattering due to movement in a different direction results in a large RCS in some bistatic configurations, revealing the subtle movements and providing a higher signal/clutter ratio [23]. This implies that bistatic radar should be favored when the target reflects very little energy in the monostatic direction. Although the design and development of an optimal bistatic geometrical configuration is challenging, it could significantly improve the classification accuracy, since it can capture information about those dynamic scattering properties of the target that are invisible to monostatic radar [24], [25]. The bistatic system's geometry affects radar operation characteristics such as the Doppler equation.
For a static transmitter and receiver, the bistatic Doppler frequency f_B is given by [26]
$$f_B = \frac{2V}{\lambda}\cos\delta\,\cos\left(\frac{\beta}{2}\right) \qquad (3)$$
where β is the bistatic angle, δ is the angle between the target velocity vector and the bistatic angle bisector, and V is the target velocity. We consider different values of β (0°, 45°, 90°, and 180°), as shown in Fig. 2, and keep all experiment settings and parameters (described in Section III) constant to find out how changing β affects the signal characteristics from head movements.

D. THE IMPACT OF MOVEMENTS ON THE RSS
The received signal strength (RSS) reflects the variation in amplitude of the base signal caused by human body movements, and it is the parameter most easily measured by mainstream wireless technologies.
Our first step was to analyze the impact of head rotations on the RSS. The normalized RSS indicator graphs for yaw left and yaw right in various configurations are given in Fig. 3. These graphs provided some fundamental information about the configurations that aided the Doppler-effect and STFT analysis. The 180° configuration does not capture any movement information and has high noise levels (see Fig. 3d and 3h) because the subject sits in the line of sight between the TX and RX and blocks the signal that should be received at the RX. The 90° configuration displays a clear gradient rise and drop for yaw right and left (see Fig. 3c and 3g). This occurs because rotational head movement leads to multipath scattering, and less-observable movement is captured better in the bistatic configuration, as explained in Subsection II-C. The 90° configuration seems the most reasonable in comparison to 45° and 0°. Head movement is more easily detected with a larger radar cross section (RCS), which depends on the target's reflectivity; i.e., the RCS of the head is unique in each configuration. In the 90° configuration, the complete head rotation from TX to RX is captured, whereas the complete range of motion for a 90° head rotation is not covered in the 0° and 45° configurations (see Fig. 3a and 3e, and Fig. 3b and 3f, respectively).

E. THE IMPACT OF MOVEMENTS ON THE DOPPLER EFFECT
The Doppler effect is the primary feature used in detecting body movements. A positive frequency shift indicates that the target is moving toward the RX, while a negative one indicates that it is moving away from the RX. The full distribution of the frequency shift due to the Doppler effect is called the Doppler spread, B_D. We calculated B_D by computing the fast Fourier transform (FFT) of the received signal [27]. Fig. 4a and 4f show the Doppler spread graphs for pitch movements toward and away from the RX in the monostatic configuration. Both forward and backward pitch motions are clearly distinguishable from the Doppler spread, since the Doppler shift is significantly greater in one direction for each movement. Yaw right and left movements have a less pronounced impact and Doppler shift at either end of the frequency spectrum and create almost equal shifts on the two sides, so they cannot be separated with a monostatic configuration (cf. Fig. 4b and 4g). In line with the RSS analysis, yaw right and left can be distinguished clearly via the Doppler shift information in the Doppler spread graphs for 45° (Fig. 4c and 4h) and 90° (Fig. 4d and 4i).
Using Eq. (3) and assuming that the amplitude of the 2V/λ term is the same for all yaw movements, we compared the range of variation of cos δ · cos(β/2) for yaw right and yaw left in the 0°, 45°, and 90° configurations. While the ranges for yaw right and left do not overlap in the 90° configuration, they do overlap in the 45° and 0° configurations, and f_B = 0 in the 180° configuration.
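To make the role of the geometry concrete, the following minimal Python sketch evaluates the standard bistatic Doppler relation f_B = (2V/λ) cos δ cos(β/2) for the four bistatic angles considered here. The head speed of 0.1 m/s and the two values of δ are illustrative assumptions, not measured values, and the function names are hypothetical.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def wavelength(freq_hz):
    """Free-space wavelength in metres for a carrier frequency in Hz."""
    return SPEED_OF_LIGHT / freq_hz


def bistatic_doppler(speed_mps, freq_hz, beta_deg, delta_deg):
    """Bistatic Doppler shift in Hz.

    beta_deg  : bistatic angle between TX and RX, vertex at the target
    delta_deg : angle between the velocity vector and the bistatic bisector
    """
    geometry = math.cos(math.radians(delta_deg)) * math.cos(math.radians(beta_deg) / 2.0)
    return (2.0 * speed_mps / wavelength(freq_hz)) * geometry


if __name__ == "__main__":
    carrier_hz = 30e9   # carrier frequency used in this work
    head_speed = 0.1    # m/s, assumed purely for illustration
    for beta in (0.0, 45.0, 90.0, 180.0):
        toward = bistatic_doppler(head_speed, carrier_hz, beta, 0.0)
        away = bistatic_doppler(head_speed, carrier_hz, beta, 180.0)
        print(f"beta = {beta:5.1f} deg: f_B = {toward:+.2f} Hz (delta = 0), "
              f"{away:+.2f} Hz (delta = 180)")
```

With β = 0° and δ = 0° the expression reduces to the familiar monostatic shift 2V/λ, and at β = 180° the cos(β/2) factor vanishes, which matches the observation that the 180° configuration carries no Doppler information.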

III. METHODOLOGY
The measurements were conducted at 30 GHz in a 3 × 2.78 × 5 m anechoic chamber in the Radio Science Lab at the University of British Columbia. At 30 GHz, λ is 1 cm, which is much smaller than a typical head movement of approximately 10-15 cm (measured manually), so such movements can be captured with high accuracy. We chose 30 GHz specifically because it met our criteria for separating 3D head turns and because it is representative of the unlicensed 60 GHz band: the wavelength differs only by a factor of 2 (λ = 0.5 cm), and the low attenuation (<1.2 dB/10 m) is similar to that at 30 GHz. In addition, our lab was fully equipped for 30 GHz measurements, which allowed us to set up the experiment and evaluate our technique. Examining a range of higher frequencies is left for future work. The chamber provided a static environment with stable positioning of the objects, a certain signal-to-noise ratio, and known clutter, which should afford isolation of signal variation due solely to the human body. Wireless channel impairments are expected in realistic environments and will need to be resolved by future work. The schematic block diagram and the experimental testbed used in our study are shown in Fig. 5 and Fig. 6.
An E8362C PNA microwave network analyzer was used as both source and VNA, at the TX and RX. Because of the limited VNA frequency range, a block upconverter (BUC) (Norsat 7040STC) was used to upconvert the 1.45 GHz CW source signal to a 30 GHz signal transmitted with a horn antenna (Pasternack PE9850). At the RX side, an orthomode transducer (OMT) (SAT-303-31528-C1-1) was used to separate two orthogonally polarized signals for the RX horn antenna (SAC-2309-315-S2). To monitor the received 30 GHz signal with the VNA, an E8257D analog signal generator created a 14.275 GHz sinusoidal wave, which was doubled with a frequency multiplier and passed through a mixer to downconvert the received signal. A laptop was connected to the VNA via a TCP connection for remote control of experiment operations. The equipment parameters are listed in Table 1.
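The frequency plan implied by these values can be checked with simple arithmetic. Assuming the mixer output of interest is the difference between the received carrier and the doubled generator tone (an assumption; the text does not state which mixing product is used), the downconverted signal lands at the original source frequency:

```latex
f_{\mathrm{LO}} = 2 \times 14.275\ \mathrm{GHz} = 28.55\ \mathrm{GHz}, \qquad
f_{\mathrm{IF}} = f_{\mathrm{RF}} - f_{\mathrm{LO}} = 30\ \mathrm{GHz} - 28.55\ \mathrm{GHz} = 1.45\ \mathrm{GHz}
```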
For all measurements carried out in both monostatic and bistatic configurations, the TX antenna location was fixed at 90 cm from the human subject. The RX antenna was placed 90 cm from the human subject at 0°, 45°, 90°, and 180° with respect to the TX antenna, as shown in Fig. 2. The human subject sat in front of the TX antenna such that the head was within the antennas' aperture, to make sure all head movements were captured. Each type of head movement was performed similarly across all configuration angles, with the subject's face starting toward the TX and ending at an angle of up to 90° toward or away from the RX for yaw and roll movements. Finally, measurements were collected multiple times for different subjects at different times to avoid any environment or subject bias in the results.

A. STFT ANALYSIS OF MOVEMENTS
We performed a discrete-time short-time Fourier transform (STFT) for input signal x[n] with window function w[n]. While the FFT of the full t-second dataset captures all Doppler activity in a single spectrum, an STFT is better for identifying the direction, time of occurrence, and duration of movements. The STFT takes the Fourier transform of a windowed input signal, with the input signal split into m = N/τ chunks and the Fourier transform performed for each individually. N is the total number of points in x[n], and τ is the time separation between sections. In this case, time-localized frequency information can be obtained, given by
$$X(i, \omega) = \sum_{n=0}^{N-1} x[n]\, w[n - i\tau]\, e^{-j\omega n}$$
where i = 0, ..., m − 1 indexes the chunks. The windowing function w[n] can be understood as a brick-wall filter of width τ, but it is recommended to use a tapered window, such as a Hann or Hamming function, combined with some overlap between neighboring sections to minimize sidelobe amplitude [28]. In Fig. 7(a), the frequency spreads from negative to positive values in all movements; thus, yaw left and right are indistinguishable from each other. In Fig. 7(b), however, consecutive movements have opposite frequency shifts, which makes the movements clearly distinguishable.
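The windowed transform described above can be sketched in a few lines of NumPy. This is a minimal illustration and not the processing chain used for Fig. 7: the window length, the 50% overlap, the sampling rate, and the synthetic two-tone test signal are all assumptions chosen for the example, and `stft` is a hypothetical helper name.

```python
import numpy as np


def stft(x, fs, win_len=256, hop=128):
    """Two-sided STFT of a complex baseband signal.

    x       : 1-D complex array (e.g., the downconverted received signal)
    fs      : sampling rate in Hz
    win_len : samples per Hann-tapered section (assumed value)
    hop     : samples between section starts; win_len // 2 gives 50% overlap
    Returns (freqs, times, S) with S of shape (win_len, n_sections).
    """
    x = np.asarray(x)
    window = np.hanning(win_len)
    n_sections = 1 + (len(x) - win_len) // hop
    sections = np.stack(
        [x[i * hop : i * hop + win_len] * window for i in range(n_sections)]
    )
    # Shift so that negative Doppler frequencies sit below positive ones.
    spectrum = np.fft.fftshift(np.fft.fft(sections, axis=1), axes=1)
    freqs = np.fft.fftshift(np.fft.fftfreq(win_len, d=1.0 / fs))
    times = (np.arange(n_sections) * hop + win_len / 2) / fs
    return freqs, times, spectrum.T


def dominant_doppler(x, fs, **kwargs):
    """Frequency of the strongest spectral bin in each section."""
    freqs, times, S = stft(x, fs, **kwargs)
    return times, freqs[np.argmax(np.abs(S), axis=0)]
```

Applied to a signal whose Doppler shift changes sign halfway through, `dominant_doppler` returns a positive frequency for the early sections and a negative one for the late sections, which is the kind of direction information a single full-length FFT cannot localize in time.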

IV. IMPLEMENTATION
In this section, we describe the steps in our process to detect head movements (see Fig. 8). We achieved accurate separation between movements via event boundary detection, in which the boundaries were marked by means of an impulsive windowing approach, and frequency-time STFT analysis was used to calculate the features. To obtain a distinctive, highly uncorrelated, and easily separable feature set, data reduction was employed and reduced features were extracted accordingly. Then an appropriate machine learning algorithm was trained and applied to predict states of the human head.

A. EVENT BOUNDARY DETECTION
Before one can detect and identify individual activities being performed, windowing the RF signal is required to separate them from each other over time.
Abrupt and instantaneous changes occurring in time-series data from a natural physical environment demand efficient change detection at an optimized cost. We adopted the concept of finding locations where data values change abruptly and using these locations to mark the boundaries of a window. This is a change-point problem, which identifies the points in the input data where statistical attributes fluctuate [29].
In mathematical terms, the input data in ordered sequence can be represented as y_{1:n} = (y_1, ..., y_n). The output model should include the number of change points, m, along with their locations, τ_{1:m} = (τ_1, ..., τ_m). The change-point positions must be integers, with each lying in the range between 1 and n − 1. We defined τ_0 = 0 and τ_{m+1} = n and assumed the change points to be ordered such that τ_i < τ_j if, and only if, i < j. Consequently, the m change points will split the dataset into m + 1 segments, with the ith segment containing y_{(τ_{i−1}+1):τ_i}.
To resolve the issues highlighted above, we made use of a maximum-likelihood estimation algorithm that dynamically identifies indices in our data where a significant statistical change has occurred. In particular, we aimed to find all points in our data where the standard deviation has large and abrupt changes.
We refer to these points as the window boundaries, and we assume that one independent activity is being performed between every two adjacent boundaries.
In our 20 seconds of measurement, four distinct head movements were performed, with no movement regions between these. The total number of change points detected is 12-15, with accurate capturing of the head-movement events, as indicated by Fig. 9. We assigned labels to windows of movements and of no movements accordingly.
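The boundary-marking idea can be illustrated with a small NumPy sketch. It is not the estimator used in this work: it assumes a Gaussian likelihood per segment, applies greedy binary segmentation instead of an exact search, and caps the number of change points rather than using a penalty term. All function names are hypothetical.

```python
import numpy as np


def segment_cost(y):
    """Gaussian negative log-likelihood of one segment (up to constants),
    with the segment's own mean and variance as free parameters."""
    return len(y) * np.log(max(np.var(y), 1e-12))


def best_split(y, min_size):
    """Index of the best single change point in y, with its likelihood gain."""
    best_idx, best_gain = None, 0.0
    whole = segment_cost(y)
    for t in range(min_size, len(y) - min_size + 1):
        gain = whole - segment_cost(y[:t]) - segment_cost(y[t:])
        if gain > best_gain:
            best_idx, best_gain = t, gain
    return best_idx, best_gain


def find_change_points(y, max_points, min_size=10):
    """Greedy binary segmentation: repeatedly add the split with the largest
    gain until max_points change points have been placed."""
    y = np.asarray(y, dtype=float)
    boundaries = [0, len(y)]
    for _ in range(max_points):
        candidates = []
        for start, end in zip(boundaries[:-1], boundaries[1:]):
            idx, gain = best_split(y[start:end], min_size)
            if idx is not None:
                candidates.append((gain, start + idx))
        if not candidates:
            break
        boundaries.append(max(candidates)[1])
        boundaries.sort()
    return boundaries[1:-1]  # interior boundaries only
```

On a trace that alternates between quiet and active periods, the returned indices fall close to the transitions, and each pair of adjacent indices delimits one window that can then be labeled as movement or no movement.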

B. FREQUENCY-TIME ANALYSIS

1) FEATURE CALCULATION
For each window computed, joint frequency-time analysis was performed and all instantaneous features were obtained from the Doppler spectra. Our pool of extracted features is composed of derived features such as mean, minimum, and maximum values of RSS, alongside frequency, velocity, displacement, and Doppler spread. The pool of multi-dimensional features still requires further processing, to reduce them to a subset that is more robust in distinguishing the set of classes. For this purpose, dimensional reduction was performed.

2) DATA REDUCTION AND FEATURE EXTRACTION
Since the features obtained via the spectrogram analysis are derived features and we allow for a large number of explanatory variables, there is a possibility of high correlations and over-fitting the model, through which some results might not generalize between datasets. To address this problem, the d-dimensional feature set was projected into an l-dimensional feature set, where l < d. This method, called principal component analysis (PCA), involves orthogonal transformation of the data matrix, converting the set of correlated variables to linearly uncorrelated variables, referred to as principal components.
The amount of variance in each principal component is explained by its eigenvalue. PCA is performed by eigendecomposition of the correlation matrix E_PCA of Z to transform correlated rows into a new orthogonal coordinate system:
$$E_{PCA} = Z Z^{T} = U \Lambda U^{T}$$
where U is the matrix of eigenvectors, Λ is the diagonal matrix of eigenvalues λ_(k) of Z Z^T, and λ_1 has the highest variance [30].
Dimensionality reduction with PCA was a very critical step for our multi-dimensional dataset which helped to remove the correlations between features and to extract the most distinguishable features.
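The eigendecomposition step can be sketched directly in NumPy. The sketch below assumes that features are stored as rows of Z and standardizes each row before forming the correlation matrix; the function name and the choice of two retained components as a default are illustrative.

```python
import numpy as np


def pca_by_eigendecomposition(Z, n_components=2):
    """PCA of a (d features x n samples) matrix via its correlation matrix.

    Returns (eigenvalues sorted in descending order,
             d x n_components matrix of leading eigenvectors,
             n_components x n matrix of projected samples).
    """
    Z = np.asarray(Z, dtype=float)
    # Standardize every feature (row) to zero mean and unit variance.
    Zs = (Z - Z.mean(axis=1, keepdims=True)) / Z.std(axis=1, keepdims=True)
    corr = Zs @ Zs.T / Zs.shape[1]           # d x d correlation matrix
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    components = eigvecs[:, :n_components]
    scores = components.T @ Zs               # uncorrelated projected features
    return eigvals, components, scores
```

Because the eigenvalues of a d × d correlation matrix sum to d, the ratio `eigvals[:2].sum() / eigvals.sum()` gives the share of variance retained by the first two components.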

3) CLASSIFICATION AND PREDICTION OF HUMAN STATE
Supervised learning was employed to train the classification model on the labeled data and to predict values for an unknown dataset. To achieve this, the data was split into a training set (75%) and a test set (25%), using a cross-validation library from Scikit-learn. The classification model was built with the training set, and its performance was evaluated with the test set. Our calculated feature set contains data in non-standard form: the independent variables' values are not on the same scale. Hence, to apply PCA and the classification model, the Scikit-learn pre-processing library was used to scale the feature set. After that, PCA was applied to the standardized training and test sets. It was observed that the first two principal components account for more than 50% of the variance in our dataset. Since it is still possible to visualize the separation of boundaries, as shown in Fig. 10, we opted to retain the first two components.
A support-vector machine (SVM), a discriminative classifier, was used to create a hyperplane that separates the classes. It is a supervised learning method that takes labeled training data of n points in the form (x_1, y_1), ..., (x_n, y_n) as input and determines the optimal hyperplane to categorize new data points. In two-dimensional space, where the y_i values are either 1 or −1, the hyperplane is a single line dividing the plane into two parts, to categorize the data into two classes [31]. SVMs are reliable classical machine learning classifiers used in wireless sensing for applications such as human fall detection [32], [33].
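The scaling, reduction, and classification steps described in this subsection can be chained in Scikit-learn as sketched below. This is an illustration under stated assumptions rather than the exact configuration used here: the linear kernel, the stratified split, the random seed, and the synthetic two-class feature matrix are all assumptions, and the helper names are hypothetical. The scaler and PCA are fitted on the training set only and then applied to the test set.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def make_synthetic_features(n_per_class=100, n_features=7, seed=0):
    """Stand-in for the real feature pool: two well-separated Gaussian classes."""
    rng = np.random.default_rng(seed)
    class_a = rng.normal(0.0, 1.0, (n_per_class, n_features))
    class_b = rng.normal(3.0, 1.0, (n_per_class, n_features))
    X = np.vstack([class_a, class_b])
    y = np.array([0] * n_per_class + [1] * n_per_class)
    return X, y


def train_head_motion_classifier(X, y, seed=0):
    """75/25 split, then standardize -> 2-component PCA -> SVM."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y
    )
    model = make_pipeline(
        StandardScaler(),
        PCA(n_components=2),
        SVC(kernel="linear"),  # kernel choice is an assumption
    )
    model.fit(X_train, y_train)
    return model, model.score(X_test, y_test)


if __name__ == "__main__":
    X, y = make_synthetic_features()
    model, accuracy = train_head_motion_classifier(X, y)
    print(f"held-out accuracy on synthetic data: {accuracy:.2f}")
```

Wrapping the steps in a single pipeline keeps the test data out of the scaling and PCA fits, so the held-out score is not inflated by information leaking from the test set.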

V. EVALUATION AND RESULTS
The following metrics, which are commonly used to evaluate machine learning algorithms, were applied to assess our algorithm's performance:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \mathrm{Precision} = \frac{TP}{TP + FP},$$
$$\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}},$$
where TP = true positives, FP = false positives, TN = true negatives, and FN = false negatives. To clarify these metrics, the example of the "yaw movement" class is considered here. TP counts yaw movements correctly detected as yaw, and FP counts non-yaw movements detected as yaw. TN counts non-yaw movements correctly detected as non-yaw, and FN counts actual yaw movements detected as non-yaw. Recall and precision values provide a more concrete evaluation of the detection algorithm. Recall, or sensitivity, is the percentage of actual yaw movements correctly detected as yaw by the system, and precision, or positive predictive value, is the proportion of yaw-movement detections that reflect actual yaw movements. The F1 score is the harmonic mean of precision and recall, and accuracy refers to the fraction of correctly detected head movement classes out of all the events detected. One should strive for algorithm performance levels of 100% for accuracy, precision, recall, and F1.
Extensive research was performed to achieve maximum accuracy for detecting translational and rotational movements independently and in combination. Independent detection refers to the system being able to detect the given type of movement, translation or rotation, while considering all other movements to be noise. Combined movement detection refers to the system's detection of both translational and rotational movements. Fig. 11 shows the confusion matrices for translational and rotational movements in monostatic and bistatic configurations. From the empirical study of RSS and Doppler spread, it was inferred that bistatic angles of 45° and 90° preserve the most relevant information about rotational movements. Further analysis of the STFT features for these two configurations revealed that maximum accuracy is reached at 90° for a complete 90° head rotation. For pitch movements, as shown in Fig. 11(a) and (b), the 0° configuration shows a maximum accuracy of 96%, with precision and recall of 90%, while the accuracy falls to 85%, precision to 73%, and recall to 56% with the 90° configuration. For yaw movements (Fig. 11(c) and (d)), in contrast, the 90° configuration performs best, with an accuracy of 91% and precision and recall of 96%. In a monostatic configuration, the accuracy drops to 77%, with precision of 53% and recall of 57%.
Fig. 12 shows the confusion matrices for combined translational and rotational movements. For combined pitch and yaw, the maximum accuracy of 80% was obtained in the monostatic configuration, while the bistatic 90° configuration yielded an accuracy of 76% (see Fig. 12(a) and (b), respectively). This shows that we cannot accurately detect complete 3D movement with one type of configuration. We strongly expect that both monostatic and bistatic configurations are needed simultaneously for detecting combined translational and rotational movements; owing to hardware limitations, however, simultaneous measurement was not performed. Instead, we evaluated the data of the combined 0° and 90° configurations in Fig. 12, which shows that the accuracy improves to 88%, with precision and recall of 90%, for combined movement detection. Roll is another translational movement that behaves similarly to the pitch movement, with a greater Doppler impact in the 90° configuration than in the 0° configuration owing to the direction of movement. In an analysis with a feature set containing labels for all pitch, roll, and yaw movements, 84% accuracy was reached when pitch movements were captured in a 0° configuration and roll and yaw movements in a 90° configuration.
The evaluation metrics calculated for head movements support our claim that a bistatic configuration significantly improves detection accuracy for rotational movements, while translational movements are handled best by a monostatic configuration. Based on our individual movement-detection results, we strongly believe that accuracy would improve significantly if data were collected in a multistatic configuration.
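The metric definitions given at the start of this section can be made concrete with a short sketch. It assumes the standard one-vs-rest formulas (for example, yaw vs. non-yaw); the `Counts` class, the `metrics` function, and the example counts are hypothetical and are not taken from Fig. 11 or Fig. 12.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """One-vs-rest confusion counts for a single class, e.g. yaw vs. non-yaw."""
    tp: int  # yaw detected as yaw
    fp: int  # non-yaw detected as yaw
    tn: int  # non-yaw detected as non-yaw
    fn: int  # yaw detected as non-yaw


def _ratio(num, den):
    # Return 0.0 instead of raising when the denominator is zero.
    return num / den if den else 0.0


def metrics(c):
    """Compute accuracy, precision, recall, and F1 from confusion counts."""
    precision = _ratio(c.tp, c.tp + c.fp)
    recall = _ratio(c.tp, c.tp + c.fn)
    f1 = _ratio(2 * precision * recall, precision + recall)
    accuracy = _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up counts for illustration only (assumption, not measured data).
example = Counts(tp=8, fp=2, tn=88, fn=2)
result = metrics(example)
print(result)
```

Because true negatives usually dominate in a one-vs-rest setting, accuracy can remain high while precision and recall are much lower, which is why all four values are reported together.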

VI. LIMITATIONS
The primary focus of this research is the complex problem of separating rotational and translational movements from each other with high accuracy. The technique is demonstrated for a single person because potential solutions exist for removing the impact of another person, for instance a passenger sitting behind the driver. One way is to exploit the relationship between RMS delay spread and signal power. In both monostatic and bistatic configurations, when a passenger sits behind the driver, the driver blocks the transmitted signal, so less power reaches the passenger. Any echo from the passenger will also have a higher RMS delay spread, especially if it undergoes multiple reflections, which implies less received power than the signal reflected from the driver. At the receiver, any received signal below a certain threshold can simply be ignored.
The other way to address this challenge with our solution is to choose an appropriate antenna beamwidth in the bistatic configuration, which minimizes reflections from movements outside the beam.
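The power-threshold idea from the first option can be sketched as a simple filter. This is only an illustration under stated assumptions: the echo records, the `power_dbm` field, the threshold value, and the function name are all hypothetical, and a real system would have to calibrate the threshold empirically.

```python
def keep_strong_echoes(echoes, threshold_dbm):
    """Keep only echoes whose received power reaches the threshold."""
    return [e for e in echoes if e["power_dbm"] >= threshold_dbm]


# Made-up echoes for illustration; the "source" labels are for readability only.
echoes = [
    {"source": "driver", "power_dbm": -42.0},
    {"source": "passenger", "power_dbm": -61.5},
    {"source": "multipath", "power_dbm": -70.0},
]

# Hypothetical threshold: weaker echoes are treated as coming from
# behind the driver and are discarded.
kept = keep_strong_echoes(echoes, threshold_dbm=-50.0)
print([e["source"] for e in kept])
```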
To enhance the robustness of the approach, we need to account for any changes in accuracy due to adverse road conditions. The current results were obtained from measurements performed in an anechoic chamber, primarily to separate translational and rotational head movements. The road-condition challenge can be addressed by incorporating car-movement information from the motion sensors already present in modern cars. These sensors integrate a three-axis accelerometer, a three-axis gyroscope, and a three-axis magnetometer, which provide accurate information about the car's movement, including any sudden changes in velocity, angle, or direction. The data collected by the motion sensors under usual and unusual road conditions could be used to spot abrupt changes in the wireless signals. Such changes in signal behaviour could be treated as car-movement noise and filtered out so that the accuracy of human movement detection is not degraded.
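One simple way to picture this filtering is to flag the time windows in which the car's motion sensors report an abrupt change and to exclude the matching radar windows from classification. The sketch below assumes time-aligned windows and a fixed tolerance around a baseline acceleration magnitude. The function names and values are hypothetical, and dropping windows is only one possible interpretation of "filtered out".

```python
def car_motion_mask(accel_magnitudes, baseline, tolerance):
    """Flag windows whose acceleration deviates from the baseline by more than tolerance."""
    return [abs(a - baseline) > tolerance for a in accel_magnitudes]


def drop_disturbed_windows(radar_windows, mask):
    """Return the radar windows that were not flagged as car-movement noise."""
    if len(radar_windows) != len(mask):
        raise ValueError("radar windows and mask must be time-aligned")
    return [w for w, disturbed in zip(radar_windows, mask) if not disturbed]


# Made-up accelerometer magnitudes in m/s^2, one value per window (assumption).
accel = [9.8, 9.9, 13.5, 9.7]
# Placeholder labels standing in for time-aligned radar windows.
radar = ["w0", "w1", "w2", "w3"]

mask = car_motion_mask(accel, baseline=9.81, tolerance=1.0)
clean = drop_disturbed_windows(radar, mask)
print(clean)
```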

VII. CONCLUSION AND FUTURE WORK
Human head tracking is an intriguing concept, especially in ADAS and in partially and conditionally autonomous driving, because of its potential for monitoring human behavior. Research into RF sensing is yielding effective tools which, in combination with the availability of cheaper high-frequency devices, allow for high-resolution movement detection. In this paper, to achieve in-vehicle 3D head tracking, we proposed a new technique that exploits a 30 GHz millimeter-wave Doppler radar in a car-like arrangement within an anechoic chamber. To detect rotational vs. translational movements, which to the best of our knowledge has not yet been studied, we evaluated the use of bistatic and monostatic antenna geometries. Data reduction and joint time-frequency analysis provided a distinctive feature set to isolate dynamic rotational and translational head movements via class memberships.
From our experiments and analysis of translational and rotational movements in the 0°, 45°, 90°, and 180° configurations, we demonstrated the advantages of a bistatic over a monostatic configuration for RF sensing of rotational movements. It was shown that 90° is the optimum bistatic angle for capturing the full rotational (yaw) movement, with 92% accuracy. Our combined approach of refined configuration and time-frequency analysis shows that translation along the radar line of sight is more easily distinguished in the monostatic configuration, and that detecting all classes of rotational and translational movement within one system requires additional RX units at 0° and 90° for an accuracy above 88%.
Our approach can be summarized as follows. Movement boundaries are marked to detect individual movements and separate them from each other. A pool of features is then computed via STFT analysis, and a distinctive feature set for separating dynamic movements is obtained by removing the correlations between features with a PCA-based feature extraction technique. A supervised machine learning classifier, a support vector machine (SVM), was used to train our model and predict the unknown movement classes. This non-invasive and simpler approach to head tracking holds potential to improve driver state monitoring in ADAS and conditionally autonomous vehicles and to help intelligent car assistants guarantee a smooth and safe journey. In particular, it is vital to predict the driver's capacity to supervise or maneuver the vehicle in case of unexpected road events.
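The STFT and PCA stages summarized above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The frame length, hop size, time-averaged magnitude feature, number of components, and synthetic input signals are all assumptions chosen for illustration, and the SVM stage is omitted.

```python
import numpy as np


def stft_magnitude(signal, frame_len=64, hop=32):
    """Magnitude spectrogram from Hann-windowed, overlapping frames."""
    window = np.hanning(frame_len)
    starts = range(0, len(signal) - frame_len + 1, hop)
    frames = np.array([signal[s:s + frame_len] * window for s in starts])
    # Shape: (n_frames, frame_len // 2 + 1)
    return np.abs(np.fft.rfft(frames, axis=1))


def pca_project(features, n_components):
    """Project the rows of `features` onto their top principal components."""
    centered = features - features.mean(axis=0)
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


# Synthetic stand-in for radar returns: 20 segments, each a tone plus noise.
rng = np.random.default_rng(0)
t = np.arange(512) / 512.0
segments = [np.sin(2 * np.pi * f * t) + 0.1 * rng.standard_normal(512)
            for f in np.linspace(10, 60, 20)]

# One feature vector per segment: the time-averaged STFT magnitude (assumption).
pool = np.array([stft_magnitude(s).mean(axis=0) for s in segments])
reduced = pca_project(pool, n_components=3)
print(pool.shape, reduced.shape)
```

The projected columns are mutually uncorrelated by construction, which matches the stated purpose of the PCA step. A classifier such as an SVM would then be trained on labeled rows of `reduced`.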
We intend to repeat these experiments in real-world vehicle conditions to verify the practicality of our approach. Owing to the beamforming capability of mmWave signals and the static nature of the in-car environment, we expect to see similar accuracy for head movements. We also plan to extend our system's movement-detection capability and classes by detecting small head displacements and by conducting comparisons with fine-motion-tracking devices.