Human Head Yaw Estimation Based on Two 3-Axis Accelerometers

The estimation of the orientation of an object, and a human head in particular, can be defined by the Euler angles: the yaw, pitch and roll. The robust and drift-free estimation of those angles is usually achieved with the data from several sensors such as accelerometers, gyroscopes and magnetometers, processed with sensor fusion algorithms. However, wearable devices such as hearing instruments are rarely equipped with all those sensors, and might usually only have a single accelerometer embedded per device. While it is possible to retrieve a correct estimation of the roll and pitch using only a single accelerometer, estimating the yaw is a more challenging task as accelerations around the gravity vector cannot be detected by an accelerometer. In the context of binaural communication devices and spatial hearing, the yaw of the head is a key information that helps achieving dynamic binaural synthesis. This work proposes an algorithm that aims at estimating the yaw of a human head by using only two 3-axis accelerometers that are placed at each side of the head. The algorithm is evaluated with acceleration measurements of human subjects achieving a realistic task. The results suggest that the strategy used is promising as a good performance can be achieved for a majority of the measures as long as an individualized set of parameters can be selected. The results also suggest that the performance of the current implementation is sensitive to sensor displacement after calibration or a wrong estimation during initialization.

spatial awareness [4], [5]. Voss et al. [6] showed significant improvement in speech understanding, sound localization as well as preference in hearing-impaired listeners with such steering approaches. Being able to track the head motions of a user is of high interest since this would allow controlling the beam of a directional microphone system for example. In the context of spatial audio rendering with wearable devices, the knowledge of the orientation of the user's head in the azimuth plan enables achieving dynamic spatial rendering, i.e. the spatial filtering can be adapted to remain valid according to the listener's motions. Such processing can greatly improve the listener's spatial audio experience by increasing plausibility and reducing typical perception artefacts such as front-back confusions [7]. Previously developed binaural spatialization algorithms aim at improving the perception of auditory externalization and distance in normal hearing and hearing-impaired listeners [8]. In particular they are optimized for low computational cost, to be compatible with wearable devices [9]. The addition of head-tracking in this context is motivated by the potential increase in auditory externalization demonstrated in several studies [10], [11].
The orientation of an object, and a human head in particular, can be determined using Euler angles. Each angle, named the roll, pitch and yaw, corresponds to the rotation from an initial position, around the x-, y-and z-axis respectively. Euler angles This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ are expressed as a sequence of those three rotations around the rigid body's local coordinate axes. One of the most common method for tracking each of these angles relies on inertial sensors, and is usually achieved using at least a combination of one or more three-axis accelerometers and one or more threeaxis gyroscopes. The gyroscope measures the sensor's angular velocity, while the accelerometer measures the external specific force acting on the sensor, i.e. the earth's gravity and the sensor's acceleration. A magnetometer can also be used in combination, to help determine the orientation of the system using the earth's magnetic field. Devices using such sensors are usually referred to as inertial measurement units (IMUs). However, IMUs are typically prone to integration drift. Errors due to the measurement of acceleration and angular velocity can progressively be integrated and the accumulation leads to larger errors in the estimation of the velocity and the angle. Some devices combine an IMU and an on-board processing system to run sensor fusion algorithms and aim at reducing this type of drift [12], [13]. Such devices are usually referred to as Attitude and Heading Reference Systems (AHRSs). Drift from the gyroscopes integration can then be compensated by the reference vectors: gravity and the earth magnetic field. With all these devices, eventually in combination with machine learning algorithms, a reliable drift-free instantaneous estimation of the head orientation can be obtained [14], [15].
Nevertheless, not every device can be equipped with all the necessary sensors and few solutions have been proposed to track head movements relying solely on bilateral accelerometers so far. Some devices, such as HIs for which battery life and compactness are key factors, are only equipped with a single three-axis accelerometer. Such an accelerometer can be sufficient to determine the roll and pitch of the object to which it is attached, as detailed in [16], but does not allow estimating the yaw alone. An algorithm aiming at estimating the yaw rate of a vehicle using two 1-axis accelerometers installed on the center-line of the vehicle was presented in [17]. Nevertheless the proposed algorithm relies on a model and assumptions associated with the vehicle tracking use case, which are not applicable for tracking the motions of a human head.
In [18], an algorithm is proposed to determine the yaw and roll angular rotation rates of a spinning rigid object using only two linear three-axis accelerometers. The method was found to achieve satisfactory performance when the rotational motions were rather short (< 2 s) and fast (140 • /s to 1200 • /s). The proposed method is dependant on the sensors noise and relies on a fixed threshold to prevent drifting due to this noise. It is also specified that the angular accelerations need to be lower than the accelerometer sensing range in order to not saturate the sensor. For example, in [18], this corresponded to 1200 • /s. In the use case of head motions, typical movements rarely exceed peak values of 350 • /s in the case of healthy subjects [19].
This article describes a real-time method aiming at estimating the relative yaw position of a human head using only two 3-axis accelerometers such as the ones found on wearable devices. Such solution relying only on accelerometers may provide significant power consumption savings over solutions relying on the use of a gyroscope and additional sensors.

II. DESCRIPTION OF THE ALGORITHM
This section describes a method algorithm for estimating the yaw position of a human head using two 3-axis accelerometers. The base concept of the proposed algorithm is directly inspired by the one described in [18], which aims at estimating the yaw velocity of a spinning object. The principle of this base algorithm is to combine the data from two accelerometers placed on the spinning object in order to estimate the angular rotation speeds. The present method proposes several additional steps and improvement to extend it to the more complex task of tracking the yaw of human head motions using the accelerometers embedded in ear-worn hearing devices.

A. Description of the Model
Two accelerometers A1 and A2 are mounted on the opposite sides of a head, at the position of the ears. In practice, in the work reported in this paper, they were embedded in hearing devices placed above the ears. It is assumed that each of those accelerometers is fixed on the head, and cannot move independently from each other nor from the head. Their coordinate system (x, y, z, ω x , ω y , ω z ) is defined as shown in Fig. 1. In this algorithm, both accelerometers are supposed to be aligned with the center of the head, on the y-axis of rotation.

B. Base Algorithm
For this part of the algorithm, both x-axes of the accelerometers are considered parallel to each other, and the same applies to both z-axes. This is not the case in practical cases where accelerometers are placed on a human head, but the issue is addressed in II-C.4.
1) Computation of the Rotational Speed Magnitude: The acceleration measured by each of the accelerometer A1 and A2 along the y-axis is the superposition of a rotation component (the radial acceleration) and a translation component: a y2 = a r y 2 + a t y = 2 r d 2 + a t y (2) where d 1 (respectively d 2 ) is the distance between the center of the head and the accelerometer A1 (respectively A2) and r is the angular speed. By taking the difference between a y1 and a y2 , it is possible to eliminate the translation component, and deduce the angular speed: 2) Computation of the Angular Velocities: To compute the angular velocities ω x and ω z around the x-and z-axis, it is necessary to compute and integrate the angular accelerations α x and α z . The acceleration along the x-axis measured by each accelerometer A1 and A2 is the superposition of a rotation component (the tangential acceleration) and a translation component: By taking the difference between a x2 and a x1 , it is possible to deduce the angular accelerations around the z-axis as: Similarly, the angular accelerations around the x-axis is: The integration of α z and α x enables determining ω z and ω x from an initial time instant t init at which the speed is known, 3) Computation of the Yaw Estimation: In the setup described in Fig. 1, the accelerometers A1 and A2 are located on the rotation y-axis. Consequently, an arbitrary angular motion around the y-axis has a contribution equal to zero in a x1 , a x2 , a y1 , a y2 , a z1 and a z2 . Hence, the total angular rotation measurable by this setup is only affected by rotations around the x-axis and z-axis. So the total angular rotation rate ω total is given by: Using the total rotation rate magnitude calculated in Eq. (10) and the relative magnitude and direction of rotation obtained from Eq. (3), the yaw velocityẏaw can be determined as: Then, it is possible to add another integration step to the algorithm described in [18] to derive the yaw: where t init is a time instant at which the orientation of the user's head relative to a fixed point in the environment is known, e.g. when the user is facing forward at 0 • at the start. However, the accelerometer signals contain a certain amount of noise. Hence the signals integrated in Eq. (12), i.e. ω x , ω z and r , are noisy. Even with efficient denoising, the residual sensor noise remains challenging for the estimation of the rotation speed magnitude r . Indeed, in Eq. (3) the absolute rectification of a signal corrupted by a zero-mean noise produces a biased noisy r . This leads to r ≥ 0 even in the absence of movement. Integrating a biased r would yield erroneous rotation estimations. To avoid integrating the noise bias when there is no movement, any value of r that is below a threshold T is set to 0: T is constant in the algorithm described in [18] while it is dynamically controlled in the algorithm proposed in this article, as described in section II-C.1.
C. Upgrades of the Algorithm for the Head Tracking Application 1) Dynamic Definition of the Threshold: r , as defined in Eq. (3) is always positive, even when the system or the head of the user does not move. The mean value of r is then defined by the mean and standard deviation of the noise of the sensors a y1 and a y2 . A non-null r implies a non-null rotation speed. In order to ensure that r is null when there is no movement, a threshold T is used to force any smaller value of r to be set to 0. i.e. when a steady-state is detected. This value should depend on r , the mean of r and its standard deviation σ ( r ). As the noise is likely to evolve between measures, and during the estimation, it is necessary to dynamically define T so that the value can be optimally adapted to every situation. The priority is to have a robust computation, as an underestimation of T would lead to integrate noise in the next step. It is also important to have a reactive computation of T that adapts to the evolution of r with a small delay, in order to avoid integrating noise for too many samples nor missing too much of the motion if T is temporarily overestimated. A satisfying method for defining and updating T was defined using the following computation. First let K T be defined as: where γ is a parameter of the algorithm which value can be set empirically and Nb is the fixed size of the sliding buffer used for this computation. Nc is used to select the first samples of the buffer and can be set in relation to Nb, e.g. Nc = 3 4 Nb. The following conditions are checked at every sample n: where S R is the tolerance on the standard deviation that can be set empirically as an absolute value. Finally clock is the running sample counter and r T is the refresh rate in samples When the user is steady, Ω T is set to remain just above Ω r , and no computation is performed. As soon as a movement is performed (peak in Ω r ), Ω T remains constant until the next steady state, and the values of Ω r are used.
used to update T (n), defined such as: An example of the results obtained for T with this method is displayed in Fig. 2 2) Integration: In practice, the sensor data acquisition is discrete. The various stages of integration could therefore be processed with either a cumulative sum or with the trapezoidal rule. Both approaches have been compared and no significant difference were found. The simpler cumulative sum was therefore chosen for the integration, and Eqs. (8) and (9) therefore become: where k varies between the first sample of the buffer n 0 and the last sample n, and f s is the sampling frequency. Similarly, Eq. (12) in discrete time becomes: These integration steps introduce some errors, due to the noise of the sensors and the discrete approximations. Consequently, the initial conditions of the two stages of integration should be reset whenever possible. For Eqs. (17) and (18), this corresponds to when the rotation speed of the system r is null. For Eq. (19) this corresponds to when the rotation speed r is null and when the system receives information about its orientation compared to a fixed point in the room.
In the current implementation of the system, the reset of the integrals in Eqs. (17) and (18) is performed with the following method: unchanged otherwise (21) This means that the integral is reset when no movement is detected during N I nit T / f s seconds. The integral of Eq. (19) is only reset at the beginning of the yaw estimation. Otherwise, the last estimated orientation before detection of a steady-state is always kept in a buffer as initial condition of the next integration.

3) Sensor Denoising Using an Adaptive Smoother:
Let A Y , A X and A Z be defined as the difference of the accelerations measured by each sensor on the x-, y-and z-axis in Eq. (13), Eq. (6) and Eq. (7) respectively: The axis label will be discarded in the following as the concepts developed in this section are independent of the axis. Because the sensors are noisy, only the noisy counterparts a 1 andã 2 of the true accelerations a 1 and a 2 are available. The noise model can reasonably be assumed as additive: with the noise components v 1 and v 2 orthogonal to each other and to A (i.e. zero mean and uncorrelated ( . v can be interpreted as an average noise signal orthogonal to The variance ofÃ is: Compared to a method based on a single signalã 1 , using A leads to up to 3 dB signal-to-noise ratio improvement for d 1 = d 2 : The signalÃ should be further denoised before being processed in the integration stages and prior to compute T . A second reasonable assumption about A and v can be made: v is stationary and its power spectral density is relatively uniformly distributed over frequencies. A is a non-stationary band-limited signal with a short-term power spectral density A ( f, n) upper bounded by a maximal frequency f B (n) that can vary over time, depending on the head motion. The motion can be qualitatively characterized by two different motion modes: "slow" and "fast". In the slowest mode, the head is either moving and/or accelerating very slowly or not moving at all. A ( f, n) is band-limited at f B (n) = f 0 , which means that v is the dominant signal above f 0 as qualitatively pictured in Fig. 3(a). In the fastest mode, the head is moving quickly and/or accelerating strongly. A ( f, n) is smeared over a broader frequency range, delimited by a frequency f B (n) = f 1 with f 1 f 0 as qualitatively depicted in Fig. 3 Based on these assumptions, one can expect a good esti-mationÂ(n) of A(n) using a low complexity smoothing of A(n) made of a first order recursive filter with a 3 dB cut-off frequency f c (n) than can be adapted at each sample n to track f B (n), with f B (n) in the range bounded by f 0 and f 1 for the slowest and fastest modes respectively. This is equivalent to the application of an input power adaptive threshold mechanism between f 0 and f 1 : the input power in the spectral band below f c is passed through, while the input power beyond f c is stopped with f c being equal to the frequency f B at which the input power distribution ends. The heuristic behind this approach is the following: if the detected power between f 0 and f B cannot be produced by the frequency components of v alone, some participation of A is necessary. Otherwise, the frequency components of v are sufficient to explain the amount of power detected beyond f B . So, it makes sense to Such a smoothing has to be transparent while the user is moving (fast mode), in order to accurately track the variations during those segments ( f c close to f 1 ). When the user is nearly static (slow mode), it is desirable to have a more radical smoothing ( f c close to f 0 ), in particular because one of the goals is to reduce the standard deviation of r as much as possible so that the computation of T can be optimized to the lowest possible value. Overestimating T means that slow motions will not be detected, whereas underestimating T means that some noise will be integrated, and thus the final estimation of the yaw will drift.
The estimateÂ(n) of A is obtained via recursive averaging: where λ(n) is the time variable smoothing coefficient that must be adaptive controlled to guarantee a good approximation of A(n) independently of the context (slow or fast mode). λ(n) can be derived from f c (n) [21]: with f s the sampling frequency. λ(n) is bounded by λ 0 = λ( f 0 ) and λ 1 = λ( f 1 ).
To use the above equation, the adaptive control of λ(n) requires the signalÃ(n) to be transformed into the frequency domain to compute an estimatef B (n) from the evaluation of Ã( f, n). To keep the complexity low, it is desirable to realize the adaption control in the time domain. λ(n) must be adapted as a function of the time constant τ c (n) instead of the cut-off frequency f c (n): Eq. (31) can be obtained as follows: In the Z domain, Eq. (28) produces the following transfer function [21]: the last equality being valid because |(1 − λ)z −1 | < 1. The above transfer function can be Z inverse transformed [21] into the impulse response h[k]: Using either the Impulse Invariance [22]- [24] or the matched Z-Transform [23], [24] methods, one can derive Eq. (31) from Eq. (33). Similarily to f c (n), τ c (n) has to be adapted to suit the motion at sample n. Instead of using τ c (n) =τ B (n) withτ B (n) an estimate for τ B (n), the time constant λ c (n) that corresponds to f B (n) is directly obtained using the heuristic procedure described below.
In the slowest mode, there is only little power above f 0 . Thus, the variation between the current and the previous observed samples defined as: cannot be large. Indeed, in the slowest mode, we can expect the acceleration to be approximately constant, i.e.: This means that the observed change is approximately the noise variation between two successive samples: with |v(n)| ∼ σ v . By replacing A(n − 1) by its estimatê A(n − 1) in Eq. (35), a distance function d(n) can be defined as: such that in slow mode, the magnitude of v(n) can be approximated using this distance d(n): In fast mode, the acceleration signal contains power in the high frequencies, allowing A(n) to be very different from A(n −1). Hence, the approximation of Eq. (38) is not valid anymore. Slow and fast modes can thus be distinguished from each other by comparing the distance function d(n) to a threshold l 0 ∼ σ v . Practically, λ(n) is mapped from the distance d(n) using a piece-wise linear curve as pictured in Fig. 4 and implemented using the following equation: or and where λ 0 and λ 1 are obtained from τ 0 and τ 1 , respectively, using Eq. (31). Note that the time constants τ 0 and τ 1 have to be set empirically, trading good denoising performance in slow mode for low-latency tracking ability in fast mode. They are however expected to correspond approximately to their frequency domain counterpart f 0 and f 1 , respectively. Other mapping functions (e.g. arctangent, hyperbolic tangent, sigmoid, etc), could also be used to provide a smoother control of λ(n), potentially with larger computational costs.
Finally, r , α z and α x are obtained with the following equations: o t h e r w i s e (43) that replace Eq. (13), Eq. (6) and Eq. (7), respectively.

4)
Initialization of the Head-Tracking Algorithm: Because of the human head morphology, when using two sensors placed on the ears of a human head, the axes of the accelerometers may not be aligned with the axis as described in the algorithm. The y axis should be aligned with the interaural axis, the z-axis should be vertical, and the x-axis should be in the forward direction. In order to correct for this, a multi-stage rotation initialization has been implemented. A 12-stages calibration of multiple sensors was proposed in [20]. The method proposed here requires only two stages of calibration.
First, the user is asked to remain upright, looking in front of them, and steady. This is the first steady-state, and helps doing an initial alignment of the z-axis by applying a roll and a pitch correction. The users then need to lower their head, i.e. doing a pitch movement without any roll nor yaw movement, and keep their head down steadily for several seconds. This step allows to estimate the initial yaw of each accelerometer. S cal = max(σ (a x1 ), σ (a x2 ), σ (a y1 ), σ (a y2 ), σ (a z1 ), σ (a z2 ))δ After S cal is determined, if the standard deviation of the last N Steady ST D samples remains below this threshold for every acceleration, the head is considered in steady state. A counter is incremented for each new steady state detected, enabling triggering the appropriate computation corresponding to each of the two first steady states.

6) Compensation of the Initial Orientation of Each Sensor:
The first step consists in computing the initial pitch and roll angle for each accelerometer individually. This can be done at the start, as soon as the first steady state is detected. For each sensor ( j = 1 or j = 2 for the accelerometer A1 or A2, respectively), the following equations, for which the demonstration can be found in [16], are used for the pitch and the roll, respectively: For each sensor, the following rotation matrices are then used to compensate for the initial pitch and roll respectively: Once the first steady-state is detected, the means a x j,steady1 , a yj,steady1 , a zj,steady1 for each sensor j are calculated. The means are computed over N Steady ST D samples of smoothed sensor data. For this specific part of the algorithm, the sensor data is smoothed using a basic exponential smoothing, independent from the adaptive smoother as described in II-C.3. A fixed smoothing factor is used, with a value λ R = 0.05 (τ = 0.39 s with f s = 50H z), and the smoothed accelerations for each axis i (i = x, y or z) and each sensor j are computed as: The roll and pitch value of both sensors are then computed using the means for each sensor. In order to correct the yaw ψ j of each sensor, a second steady-state is required. When users rotate their head so as to have a non-null new pitch, but no new roll nor yaw, if the yaw of one of the sensor is non-null, the y acceleration of this sensor will also be nonnull. Therefore when a second steady-state is detected, the algorithm assumes the user moved only in pitch (around the y-axis The values of ψ j are then used in the rotation matrix associated with the initial yaw of each sensor: In order to increase the precision of the estimations, it is possible to run this estimation in a loop. After the rotation and a steady2 values, the roll, pitch, and yaw correction can be re-estimated and added to the initial estimations. This can be done in a loop until the values of a steady1 and a steady2 are below a predefined threshold.

D. Real-Time Implementation
The real-time version of the algorithm was implemented on Simulink (The Mathworks, Portola Valley, CA, USA). A simplified descriptive block diagram is depicted in Fig. 5. The sampling frequency was fixed to 50 Hz. This value was determined mainly by the hardware implementation of the prototype, and is sufficient to cover the needed range of accelerations. The algorithm uses sliding block processing. Most of the processing is performed sample by sample, inside each block. The size of the block was therefore reduced to 1 sample, keeping the latency low. No overlapping blocks are necessary in the implemented algorithm.

III. EVALUATION OF THE ALGORITHM
In order to evaluate the reliability of the algorithm, a database of head movements was first recorded, on which the algorithm was then tested. Next, the evaluation was performed by comparing the obtained estimation against the performance of a complete state-of-the art Euler angles tracking algorithm.

A. Database
The aim of this database is to display a large and realistic range of head movements that can occur in conditions where the algorithm could be used. It was therefore chosen to measure data from subjects, doing a task where they would not limit their head movements to the horizontal plane.
For the data acquisition, HIs with embedded accelerometers were placed over the pinna of the participants. A headband was used to carry the small circuit boards used to process the data of the sensors. The cable linking the HIs to the circuit board and the circuit board to the computer were chosen to be thin and flexible to avoid unwanted tension forces to be applied on the HIs. The device fitting is shown with a close-up shot in Fig. 7. The subject was also equipped with a NGIMU 1 IMU/AHRS device.
The acceleration data from both sensors was retrieved via USB. Euler angles measured with the IMU/AHRS device were recorded via USB as well, and served as a reference measurement. All the data was acquired and recorded on a dedicated Simulink model.
The listeners were asked to point their head toward the loudspeaker which emitted a sound (a speech sample pronouncing the number designating the loudspeaker). Then they had to wait in this position until a word is pronounced through the same loudspeaker. Subsequently they had to write down the word on a piece of paper on the table. Then they could move their head back to the last loudspeaker emitting the sound, and wait for the next number and word. The total duration of the measure was about 120 s and included 12 target positions.
For this database, 11 participants were fitted with the prototype and the reference IMU/AHRS sensor, and followed the protocol mentioned above. The instruction was given to the participants to turn their head as naturally as possible, as they would be attending a meeting with several speakers around them.
For the calibration, visual marks were placed in front of the subject and on the table for the first and second steadystates respectively. To ensure a precise movement from the subject, a visual feedback of the roll angle, as measured by the IMU/AHRS device, was provided to them on a screen. When the subject performed the second steady state, with a targeted pure pitch motion, they were instructed to keep this roll value between −2 • and +2 • , which was empirically found to be sufficient precision. The participants managed to perform the calibration with those constraints after a maximum of 3 trials.

B. Evaluation Method
A batch evaluation was conducted using the measures and various settings of the algorithms. The goal was first to assess the effect of various parameters used in the algorithm and look at the best performance achievable for each measurement.
The evaluation was achieved by comparing the estimation obtained with the proposed method to the results obtained with a state-of-the-art fusion algorithm for tracking Euler angles. 1 https://x-io.co.uk/ngimu/

TABLE I LIST OF THE TESTED PARAMETERS AND THEIR VALUES
The latter reference was taken from the Euler angles estimation performed by the NGIMU device, which is based on the implementation described in [25]. More precisely, a score was computed using the root-mean-square error (RMSE) between the reference yaw obtained with the NGIMU device and the estimation obtained from the proposed algorithm using two hearing devices with embeded accelerometers. This corresponds to: where N m is the length of the measurement in samples, yaw est is the estimated yaw, and yaw ref is the reference. The choice of the tested parameters and their range of interest were set empirically and are shown in Tab. I. The factorial design with those factors and levels leads to total of n = 576 combinations.

C. Results
During the measure, the USB cable was attached to the headband and could potentially apply some tension on the sensors if the participant involuntarily pulled the cables. Participants were wearing face masks, which made it complicated to have the HI fixed behind the ear firmly enough. Consequently, the HIs could sometimes move, in which case the calibration would not be adequate anymore, leading to drift in the yaw estimation. Hence the occurrences of displacement had to be monitored both with feedback from the participants and by scanning the acceleration data and detect unexpected behaviors. For that reason, in order to maximize the chances of obtaining at least one measure per participant with no hardware issues, two to three measures were recorded per participant. Out of the total of 30 measures, 19 were kept, for which no obvious hardware issue causing the sensors to move was detected.
For those 19 measurements and for all tested combinations of parameters, the results of the RMSE of the estimation compared to the state-of-the-art algorithm are displayed in Fig. 8. Examples of various yaw estimations on different measures are depicted in the following figures. First two measures, associated with lower RMSE values, are shown in Fig. 9(a) and 9(b). In Fig. 9(a), it can be seen that the main movements are detected in the correct direction and are well tracked while the subject is moving in azimuth. While the subject performs the writing task, the azimuth might not always be tracked. The same observations can be made in Fig. 9(b), while some local errors decrease the performance. Fig. 9(c) depicts a case for which most of the movements are underestimated, and a part is not detected at all. This could be explained by head movements which are too slow. In this case, the acceleration values are too small and thus, a part or the totality of the movement is not properly detected and set to 0 (see Eq. (43)) as the magnitude of the acceleration is too close to the magnitude of noise. Finally an example of a low performance of the algorithm is depicted in Fig. 9(d). In this case a drift can be observed resulting in a large shift at the end of the measure. Some sign inversions sometimes occur at the end of the movement, which yields large errors. An hypothesis can be made in this case that the initial yaw of each sensor was poorly estimated during the calibration, which results in an overestimation for a part of the movements in one direction, and underestimation for the other direction, resulting in a drift over time.

IV. LIMITATIONS OF THE ALGORITHM A. Measure Dependent Performance
In a view to assessing the possibility to find an optimal setting which might yield a performance close to the state-ofthe-art method on most measurements, a factorial analysis was conducted, as described in [26]. For seven of the measures (A1, B1, C1, D1, E1, F3, H1), it is considered that the errors are too large to be included in the data used to find a common set of parameters. It is hypothesized that sensors displacement might have occurred during these measurements, and that those were not visible when checking the acceleration data. This could happen if a displacement occurred during a movement. In this case, the displacement-induced shift in acceleration would not be visible while the acceleration is changing because of the head movement. Another hypothesis is that the calibration phase was not achieved precisely enough, notably to compute the initial yaw of the HI behind the ear. Hence, it was decided to investigate the possibility of an optimal parametrization with the twelve other measures.
The factorial analysis method described in [26] assumes that the dependent variable Y of the experiment is determined by the experimental conditions, and can be approximated by a polynomial function with second order interaction terms. An example is given below for three hypothetical parameters x 1 , x 2 and x 3 :  Then a sign table is built from this table, for each column (associated with one parameter), a "−" is associated with the minimum value and a "+" is associated with the maximum value. The coefficients b 1 , b 2 …are evaluated by either adding or subtracting the value of the response (in this case the RMSE value) for each line based on the signs in their corresponding columns from the factorial sign table. The result is divided by the total number of observations. Once the coefficients b 0 , b 1 , b 2 …are evaluated, the function describes qualitatively how the experimental variables and their interactions influence the response. This analysis was conducted for the twelve selected measurements and the seven  Those results indicate that the largest influence in the range of tested values is due to γ , Nb and D. It also shows that the interaction coefficient between γ and Nb is important as well (mean = 0.52, ST D = 4.21). However, no clear trend can be observed about which direction those parameters tend to affect the RMSE value, as the STD is large and the mean mostly centered around 0.
The results show that with this algorithm, a performance close to the state-of-the-art method can be achieved for certain of the measurements. The attempts to find a satisfying tuning that could work on every measure revealed that it is difficult at this point to find a unique set of parameters which might improve the performance of all measures. In particular, Nb, γ and D have the largest influence in the range of the tested values, but a change in certain direction in those values might improve the performance for some measurements and decrease it for other measurements. This indicates that the algorithms could benefit from a dynamic tuning of those three parameters depending on some characteristics of the accelerations retrieved from the sensors.

B. Other Limitations and Related Perspectives for Improvement
The acquisition of the data revealed that the algorithm is sensitive to displacements of the sensors. Indeed, if one of the sensors is displaced after calibration, the initial orientation of the sensors as computed during initialization becomes erroneous. This means that a differential is generated between the two sensors, and this offset might be interpreted as a constant speed movement. The dynamic definition of T as described in Eq. (16) should prevent to drift during static parts. Nevertheless, the differential created by a displacement of the sensor might result in overestimation for certain movements and underestimation of other movements which will create an accumulation of error with time. The recording of the data with the prototype included several hardware constraints (cables in particular) which might have made the measures particularly sensitive to this kind of issue, compared to HIs that would be placed behind the ears without being attached to other devices, and might be less prone to displacement.
The integration errors that might occur during the estimations are likely to accumulate over time, and generate a shift of the overall estimation. The current implementation would benefit from an absolute reference in the environment where it is used, that could punctually help re-defining the initial condition of the second stage of integration. HIs being equipped with two microphones per device, it is e.g. imaginable to use direction-of-arrival (DOA) algorithms with sound emitted from a known position in the room.
In the theory behind the equations of this algorithm, the sensors are supposed to be approximately aligned with the center of rotation of the head. In reality, this is dependent on the morphology of the head of the user. Further investigations and developments should study the influence of this parameter in the results.

V. CONCLUSION
In this article, a method was proposed to estimate the azimuth orientation of a human head in the horizontal plane, using two 3-axis accelerometers only. The method was evaluated with measures of realistic movements made by human subjects, and acquired with the accelerometers of HIs.
The results are promising and a performance close to a stateof-the-art algorithm can be achieved for a part of the measures as long as certain parameters are set individually, as revealed by a factorial analysis and a batch analysis investigating the effects of several parameters. The algorithm is sensitive to sensor displacement and wrong estimation of the initial orientation of the sensors, which both yield poor results. With the current implementation, it is not possible to find a unique set of the fixed parameters that would yield good performance for every measurements. This encourages to investigate the possibility to tune dynamically some of them, based on the acceleration data.