Efficient Representation and Sparse Sampling of Head-Related Transfer Functions Using Phase-Correction Based on Ear Alignment

With the proliferation of high quality virtual reality systems, the demand for high fidelity spatial audio reproduction has grown. This requires individual head-related transfer functions (HRTFs) with high spatial resolution. Acquiring such HRTFs is not always possible, which motivates the need for sparsely sampled HRTFs. Additionally, real-time applications require compact representation of HRTFs. Recently, spherical-harmonics (SH) has been suggested for efficient interpolation and representation of HRTFs. However, representation of sparse HRTFs with a limited SH order may introduce spatial aliasing and truncation errors, which have a detrimental effect on the reproduced spatial audio. This is because the HRTF is inherently of a high spatial order. One approach to overcome this limitation is to pre-process the HRTF, with the aim of reducing its effective SH order. A recent study showed that order-reduction can be achieved by time-alignment of HRTFs, through numerical estimation of the time delays of the HRTFs. In this paper, a new method for pre-processing HRTFs in order to reduce their effective order is presented. The method uses phase-correction based on ear alignment, by exploiting the dual-centering nature of HRTF measurements. In contrast to time-alignment, the phase-correction is performed parametrically, making it more robust to measurement noise. The SH order reduction and ensuing interpolation errors due to sparse sampling were analyzed for these two methods. Results indicate significant reduction in the effective SH order, where only 100 measurements and order 6 are required to achieve a normalized mean square error below $-$10 dB compared to a fully-sampled, high-order HRTF.


I. INTRODUCTION
Spatial audio interpolation plays an increasingly important role in applications such as virtual and augmented reality, spatial music, multimedia and gaming [1], [2].Spatial audio gives the listener the sensation that sound sources are positioned in 3D space and helps to create immersive virtual sound scenes [3], [4].A key component in spatial audio rendering through headphones is the head-related transfer function (HRTF) [5], which is the acoustic transfer function from a sound source to a listener's eardrum [6].The HRTF is a complex-valued function of direction and frequency.At each frequency, it represents both the magnitude and phase shifts in the transformation of the sound-pressure waveform from the source to the eardrum.
For high quality spatial audio rendering, an individual HRTF is required [7], with high resolution in both the space and Z. Ben-Hur (e-mail: zami@post.bgu.ac.il) and B. Rafaely are with the Department of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Beer-Sheva 84105, Israel.D. L. Alon and R. Mehra are with Facebook Reality Labs, Facebook, 1 Hacker Way, Menlo Park, CA 94025, USA. the frequency domains.Measurement of an individual HRTF at high spatial resolution is challenging and requires special and expensive equipment [8][9][10].Therefore, using sparsely measured HRTFs is of great interest.However, the use of a sparse HRTF in head-tracked virtual audio systems necessitates some type of interpolation and real-time filter updating in order to create an accurate virtual sound environment.A considerable amount of work has been done to find an adequate representation in the spatial domain that will facilitate efficient spatial interpolation, and that can be used in realtime applications.However, current methods of interpolation, such as bilinear interpolation or cubic spline interpolation, do not provide sufficiently accurate or high quality HRTFs from sparse measurements, due to the high spatial complexity of the HRTF, especially at high frequencies [5], [11].
Recently, the spherical-harmonics (SH) representation has become key to a very common interpolation approach [12][13][14][15][16], taking advantage of the spatial continuity and completeness properties of the SH basis functions over the sphere.However, SH representation of sparse HRTFs may lead to sparsity error, which comprises both truncation and aliasing errors [17], [18].Truncation error is caused by the limited SH order [19], due to the limited number of spatial samples, or due to realtime requirements, where a small number of SH coefficients leads to a lower computational cost.The truncation error has been shown to have a detrimental effect on the reproduced spatial audio [19], [20].Spatial aliasing error is caused by the limited number of spatial samples, where the high order SH coefficients of the sampled function are aliased into the low order coefficients [21], [22].HRTFs can be considered to be order limited functions, where their order increases with frequency, and has been theoretically shown to be more than N = 40 at a frequency of 20 kHz [15].This means that at least (N + 1) 2 = 1681 measurement directions are needed for accurate SH representation of the HRTF up to 20 kHz.Therefore, it is of great importance to find ways to reproduce high quality HRTFs from a sparse set of measurements.
One approach for reducing the required number of measurements is to pre-process the HRTF in order to reduce its effective SH order.Several pre-processing methods have been previously suggested in the context of SH representation of HRTFs [12], [16], [23][24][25].A recent study by Brinkmann and Weinzierl [26] compared these methods, using a simulated HRTF, in terms of SH energy distribution and binaural models for source localization, coloration and correlation.They showed that complex frequency representation of the HRTF, which seems to be the most common method [12], [15], [19], [27][28][29], requires the highest SH order and leads to the largest errors for a given SH order.The time-alignment method, which was first suggested by Evans et al. [12], where the Head-Related Impulse Responses (HRIRs, which are the time domain representation of the HRTFs) are aligned in the time domain by removing their delays, requires the lowest SH order.They also tested several mixed methods, i.e. magnitude-phase representations, where the magnitude and phase responses were interpolated separately (using frequency unwrapped phase [12], spatial unwrapped phase [23], logarithmic magnitude [16] and smoothed magnitude [24]).In a recent study by Zaunschirm et al. [25], another method that uses a frequency-dependent time-alignment of the HRTF and results in similar SH order reduction to that of the original time-alignment method was suggested.The time-alignment method requires estimation of the time delay of each HRIR.However, this approach might be sensitive to errors, mainly due to low SNR values and the multiple peaks in the HRIR on the contralateral side [30], [31].
In this paper, we present a new pre-processing method that exploits the fact that the HRTF is defined as a ratio between two transfer functions, and performs phase-correction accordingly.The correction of the phase of a measured HRTF is performed by translating the origin of the free-field component from the center of the head to the position of the ear.The advantage of using such a correction is that it can be computed parametrically, without the need for numerical estimations, which makes it more robust to measurement noise.The effect of the phase-correction on the SH representation of sparse HRTFs is presented theoretically (Secs.III, IV and V), and demonstrated numerically in terms of SH order reduction and interpolation errors using a rigid sphere approximation of the HRTF, and manikin and human HRTFs (Secs.VI, VII and VIII).Results indicate that the SH order required to achieve a normalized Mean Square Error (MSE) of less than -10 dB, compared to a high-order HRTF, is reduced to 9 or less.This leads to sparse HRTF sampling with only 100 measurement directions, which is less than 10% of the number of measurements required to fully sample an HRTF of order around 30. Sec.II reviews the basic theory of SH representation of HRTFs and Sec.IX outlines the conclusions of the research.

II. SH REPRESENTATION OF HRTFS
This section presents the relevant background for SH representation of HRTFs.The HRTF, which describes the filtering effect due to the head, torso and ears of a human, is introduced as an acoustic transfer function.Consider a far-field arbitrary source from direction, Ω ≡ (θ, φ) ∈ S 2 .A pair of HRTFs, H l and H r , for this source and for the left and right ears, respectively, is defined as [6]: where p l and p r represent the complex-valued sound pressure in the frequency domain at the left and right ears, respectively, p 0 represents the complex-valued free-field sound pressure in the frequency domain at the center of the head with the head absent, k = 2πf c is the wave number, f is the frequency, and c is the speed of sound.Note that defining the HRTF as a ratio between sound pressures, instead of between transfer functions, is derived from the assumption that both pressures are measured with the same sound source.
The SH decomposition, also referred to as the Inverse Spherical Fourier Transform (ISFT) of the HRTF is given by: where Y m n (Ω) is the complex SH basis function of order n and degree m [33], and h l\r nm (k) are the SH coefficients, which can be derived from H l\r (Ω, k) by the Spherical Fourier Transform (SFT): where It is now assumed that the HRTF is order limited to order N , and that it is sampled at Q directions.The infinite summation in Eq. (2) can then be truncated and reformulated in matrix form as where the Q×1 vector h = [H(Ω 1 , k), . . ., H(Ω Q , k)] T holds the HRTF measurements over Q directions (i.e. in the space domain).The l\r notation has been removed for brevity.The Q × (N + 1) 2 SH transformation matrix, Y, is given by and Given a set of HRTF measurements over a sufficient number of directions Q ≥ (N + 1) 2 , the HRTF coefficients in the SH domain can be calculated from the HRTF measurements, by using the discrete representation of the SFT [34], where is the pseudo inverse of the SH transformation matrix.Such representation in the SH domain allows for interpolation, i.e. for the calculation of the HRTF at any L desired directions using the discrete ISFT, where Y L is the SH transformation matrix, as given in Eq. ( 5), calculated at the L desired directions, and h L = [ H(Ω 1 , k), . . ., H(Ω L , k)] T is the interpolated HRTF at the L desired directions.
In practice, the number of HRTF measurement directions is limited, and the SH order of the HRTF increases with frequency [15], [29].Thus, if the relation Q ≥ (N + 1) 2 no longer holds, a sparsity error is introduced, which comprises both aliasing and truncation errors [17], [21].

MEASUREMENTS
In this section, the effect of defining the HRTF as a ratio of two transfer functions measured at different positions, as given in Eq. ( 1), on the SH representation of the HRTF is presented.
While HRTFs are measured with a microphone located at the ear, the origin of the spherical coordinate system is at the center of the head.In order to gain insight into the potential effect of this dual-centering measurement process, the simple case of a "free-field HRTF" is analyzed.This simplification is chosen because real HRTFs can only be analyzed numerically, while the "free-field HRTF" can be presented analytically.
Consider a single plane-wave of unit amplitude in free-field arriving from direction Ω with wave number k.The sound pressure at position (Ω 0 , r), can be written as [35]: where Θ is the angle between Ω and Ω 0 , and j n (•) is the spherical Bessel function.
Defining the position of the ear to be at (Ω l\r , r a ), the "freefield HRTF" is defined by substituting Eq. (8) in Eq. ( 1): where p 0 (Ω, k, Ω 0 , 0) = 1 for all (Ω, k).Thus, for a sound field composed of plane-waves from directions Ω ∈ S 2 the "free-field HRTF" can be written as: (10) This equation is similar to the ISFT presented in Eq. ( 2).However, some mathematical manipulation is needed in order to derive the SH coefficients of the "free-field HRTF", as presented in Appendix A. The resulting coefficients are given by: h Theoretically, these coefficients have some energy at every order n, which means that the HRTF is of infinite SH order.Nonetheless, due to the behavior of the spherical Bessel function, which has a negligible magnitude for kr >> n [35], the "free-field HRTF" can be considered to be order limited by N = kr a (Ward & Abhayapala [36] showed that N = kr gives an interpolation error of around -14 dB).Thus, while the possible order of a "free-field HRTF" measured at the position of the ear and normalized by the center pressure is N = kr a , looking at Eq. ( 9) it is clear that if the pressure at the position of the ear and the center pressure were measured at the same position, the "free-field HRTF" would be a constant 1, which is of SH order zero.This case demonstrates how sound pressure at the ear, which is measured at a distance r a from the origin (center of the head), when normalized by a sound pressure at the origin, can lead to an increase in the SH order of the HRTF by approximately N = kr a .Figure 1 shows an example of this added order as a function of frequency, demonstrating the possible large effect at high frequencies where the SH order increases up to 32.Note the similarity of the orders in Fig. 1 to the actual order of HRTFs [15], which suggests that the translation of the origin may explain the high orders, as will be investigated in the remainder of this paper.
Although the theoretical explanation presented in this section may not be accurate for real measured HRTFs, it gives an insight with regard to the possible increase in SH order due to the fact that the ear microphone is located away from the origin.

IV. PHASE-CORRECTION BY EAR ALIGNMENT
To counteract the effect described in the previous section, and to possibly reduce the effective SH order of the HRTF, a phase-correction method based on ear alignment is suggested.
Considering the effect of normalizing the ear pressure by the center pressure, which possibly increases the SH order of the HRTF, an ear-aligned HRTF could potentially be of lower SH order than the original HRTF.The ear-aligned HRTF can be defined as the ratio of the pressure at the ear and the free-field pressure at the position of the ear.A measured HRTF can be aligned by translating the free-field pressure from the center of the head to the position of the ear.This can be formulated using Eq. ( 1) as: where H a is the ear-aligned HRTF, and p l\r 0 is the free-field pressure at the position of the left or right ear.Assuming a far-field HRTF, the sound source can be a plane-wave, and the pressure in free-field can be computed using an exponential formulation as given in Eq. ( 8), which leads to the phasecorrection formulation: where r a is the radius of the head, Θ l\r is the angle between the direction of the source, Ω, and the direction of the ear, Ω l\r , and cos Θ l\r = cos θ cos θ l\r + cos(φ − φ l\r ) sin θ sin θ l\r .Note that, unlike other previously suggested methods for efficient HRTF representation (e.g.minimum phase [16]), this phase-correction process is invertible, which means that going from H l\r to H l\r a and back can be performed without any loss of information.
The proposed phase-correction may seem similar to translating the HRTF in space to be centered at the position of the ear.In fact, in the context of surrounding spherical microphone arrays, acoustic centering has been studied [37], and it was found that the location of a source inside a spherical microphone array influences the energy distribution of the SH coefficients.It was concluded that acoustic centering of the sound source achieved the most compact SH representation.Richter et al. [27] suggested performing HRTF acoustic centering in the SH domain in post-processing, in order to achieve a compact transformation for real-time auralization.However, in this work, in contrast to acoustic centering, the proposed phase-correction can be performed in pre-processing, before performing the SFT, which means that a lower number of spatial sampling points of the HRTF may be required.Furthermore, the approach in [27] may suffer from spatial aliasing and truncation errors, due the the post-processing in the SH domain, while in the proposed method herein these errors are absent.
Additionally, the assumed SH order reduction of the earaligned HRTF can explain the findings of Brinkmann and Weinzierl [26] regarding the reduced order of time-aligned HRIRs.Phase-correction by ear alignment can be interpreted as "virtually" removing the inherent delay in an HRIR caused by normalizing the pressure at the ear by the center pressure.This is evident from Eq. ( 13), where the phase in the exponential represents a delay from the center to the ear due to a source at Ω.The difference between these two methods (timealignment and phase-correction) is that while performing timealignment requires numerically estimating the time delays, phase-correction can be performed parametrically with the parameters r a and Ω l\r .However, estimation of the time delays may be challenging and its accuracy depends on the HRTF direction and on the quality of the measurements [30], [31].On the other hand, using the phase-correction with fixed parameters makes it data-independent (except for the choice of Ω l\r and r a ), which can potentially improve its robustness to measurement noises.In subsequent sections we will compare the two pre-processing methods, in the context of SH order reduction (Sec.VII) and sparse sampled HRTFs (Sec.VIII).

V. SPARSE SAMPLING WITH PHASE CORRECTION
The previous section presented the phase-correction method for the pre-processing of HRTFs.This section will show the use of this method for sparse sampling of HRTFs.
Given a sampled HRTF over Q directions, {Ω q } Q q=1 , its SH coefficients can be calculated using the SFT, as in Eq. ( 6).Assuming that the SH order of the HRTF is N , it will require Q ≥ (N + 1) 2 measurement directions for aliasingfree sampling [21].Now, assuming that the phase-correction can provide an HRTF with a lower SH order N < N (this assumption is validated later in Secs.VII and VIII), which requires Q measurements, then for aliasing-free sampling we get Q < Q.For the case where Q Q, sparse sampling can be performed, while preserving the ability to reproduce high quality HRTFs.
In order to interpolate an HRTF from its sparse samples, first, prior to the SFT, a phase-correction is performed directly on the measured HRTF as given in Eq. ( 13): where and Θ q is the angle between the measured direction Ω q and the direction of the ear Ω l\r .Assuming that the phase-correction results in an ear-aligned HRTF, h a , of SH order N , the SH coefficients of h a can be calculated without error: where Y is the order N SH transformation matrix of size Q × ( N + 1) 2 defined similarly to in Eq. ( 5).From here, the HRTF can be interpolated to any desired direction, using an ISFT of order N , as given in Eq. ( 7), h a L = Y L h a nm , and the inverse phase-correction can then be applied: where h L is the interpolated HRTF at the L desired directions, C L = diag[e −ikra cos Θ1 , . . ., e −ikra cos Θ L ] and Θ L is the angle between the desired direction Ω l and the direction of the ear Ω l\r .The overall process of interpolating an HRTF at any desired L directions from its sparse samples, can be formulated as:

VI. MEASURES FOR EFFECTIVE SH ORDER
To quantify the effect of the phase-correction on SH representation of HRTFs, several measures will be used.This section presents these measures.First, the energy distribution of the SH coefficients is analyzed by calculating the SH spectrum and the parameter b X , which is the lowest order that contains at least X% of the energy.The SH spectrum, which is the energy of the SH coefficients at every order n, is defined as: Note that the measures in this section are presented as a function of frequency, f , instead of wave number, k.This change is done in order to be consistent with the presentation of the measures as a function of frequency in Secs.VII and VIII.
The parameter b X is then calculated as: (19) To evaluate the effect of the phase-correction on the interpolated HRTF, an interpolation error is calculated as a normalized MSE: where h L is the original HRTF, and h L is the interpolated HRTF.Furthermore, magnitude and phase errors are defined as: mag (f ) = 10 log 10 where

VII. SIMULATION STUDY OF RIGID SPHERE AND MANIKIN HRTFS
The measures defined in the previous section were calculated for three different HRTFs based on: (i) rigid sphere approximation, (ii) simulated HRTF of KEMAR [38], (iii) measured HRTF of Neumann KU-100 [39].
(i) Rigid Sphere: As a first evaluation, an HRTF of an ideal rigid sphere is considered.This simplification offers us the ability to mathematically analyze the response, and to evaluate numerically the theoretical solution [40].Consider a unit amplitude incident plane-wave with wave number k arriving from Ω k .The "left ear" HRTF of a rigid sphere of radius r a can be written, using the SH, as [41]: where b n is given for a rigid sphere as follows: where j n and h n are the spherical Bessel and Hankel functions, and j n and h n are their derivatives.Using Eq. ( 13) to correct the phase will give: The rigid sphere HRTF was computed for a sphere with radius r a = 8.75 cm, and position of the ear at (90 • , 90 • ), for a total of Q = 2702 directions in accordance with a Lebedev sampling scheme [42], which can provide an HRTF up to a spatial order of 44.(ii) KEMAR: In order to study the effect of phase-correction on an HRTF, and to be able to compare the results with previously suggested pre-processing methods [26], a simulated HRTF of KEMAR that was used in [26] was analyzed.The HRTF was simulated using the Boundary Element Method (BEM) based on a 3D scan of the head of a KEMAR.The edge lengths of the mesh were graded from 1 mm at the left ear to 10 mm at the right ear (only the left ear HRTF was computed).The simulated frequencies were in the range of 100 Hz to 22 kHz with a resolution of 100 Hz.HRIRs were obtained by inverse Fourier transform, and shortened to 256 samples at a sampling rate of 44.1 kHz by applying squared sine fadeins and fade-outs of 10 samples.A total of Q = 4334 directions were simulated in accordance with a Lebedev sampling scheme (N = 56).For detailed information about the simulation process see Refs.[26], [43].(iii) KU-100: The HRTF of the Neumann KU-100 dummy head was used to demonstrate the effect of phasecorrection on measured HRTF data.The measurement contains a full sphere far field HRTF with a total of Q = 2702 directions in accordance with a Lebedev sampling scheme [39].The measurement was conducted in the anechoic chamber at Cologne University of Applied Sciences.The chamber has a lower frequency boundary of around 200 Hz.Measured HRIRs were obtained with 128 samples at a sampling rate of 48 kHz.For detailed information about the measurement process see Ref. [39].As a first step, for all three HRTFs, the phase-correction was applied using nominal parameters of r a = 8.75 cm and assumed position of the left ear at (θ l , φ l ) = (90 • , 90 • ).These parameters were chosen in accordance with the spherical head model [40].A sensitivity analysis for these parameters is presented in Sec.VIII, together with an optimization study.In order to compare between the phase-correction and the time-alignment methods, a time-aligned HRTF, H ta (Ω, k), was calculated as in Ref. [26], using an onset detection with a threshold of −20 dB and a ten times upsampled and lowpassed HRIR (8th order Butterworth, f c = 3 kHz) [31], [44].For the interpolation of the time-aligned HRTFs, the delays were also interpolated using the SFT and were added to the interpolated time-aligned HRTFs.18).The dashed red and green lines represent the parameters b 90 and b 99 , respectively, where b X is the lowest order at which at least X% of the total energy is contained at order b X and below, computed according to Eq. ( 19) with N = 35 and X = 90, 99.This verifies the proposition presented to explain Fig. 1, in which the high orders of the original HRTF actually originate from the translation of the origin.In particular, the b 90 line, which presents the order at which 90% of the energy is maintained, is reduced to be below order 10 for all frequencies and all HRTFs.While the rigid sphere and simulated KEMAR HRTFs have negligible energy above order 20 for all frequencies, the KU-100 HRTF still has energy up to order 35.This is also evident from the b 99 line.This behavior can be explained by the fact that the KU-100 HRTF is measured and contains measurement artifacts that may lead to added spatial complexity, as is also evident from the energy distribution of the un-processed HRTF, which has higher energy at high SH orders compared to the simulated KEMAR HRTF.

A. SH Spectrum Analysis
Note that for the rigid sphere HRTF, time-alignment seems to reduce the SH energy slightly more than phase-correction.This can be explained by the fact that for this ideal case, the time-alignment has an advantage due to its ability to completely align the HRIR, while the phase-correction only corrects the free-field component.This means that the HRIR from the contralateral direction will still have some delay compared to the HRIR from the ipsilateral direction (due to diffraction).This behavior is demonstrated in Figure 3, where the HRIRs of a rigid sphere after phase-correction and after time-alignment are presented, compared to the original HRIR.It can be seen that the time-aligned HRIR results in ideally aligned impulse responses, while the phase-corrected HRIR shows some misalignment, due to the signals from the contralateral directions.

B. Interpolation Error Analysis
For each HRTF, the interpolation error, L (f ), was computed as in accordance with Eq. ( 20) with h L that contains 240 directions that were chosen from the original HRTF (closest to nearly-uniform distribution).The interpolated HRTF, h L , was computed by performing SFT with the remaining directions up to order 35 using Eq. ( 6) (without any processing, with phase-correction or with time-alignment).Finally, the HRTF coefficients were truncated in the SH domain to various orders, and then interpolated into the original 240 directions (using Eq. ( 7)).
Figure 4 shows the minimum SH order in the truncation process needed to guarantee an interpolation error of less than a specific value of [-20 dB, -15 dB -10 dB].This is shown for the three different HRTFs (rows) and pre-processing methods (columns).The figure demonstrates the reduction in the required SH order for achieving reasonable HRTF interpolation over all frequencies, where the required SH order is significantly reduced when using phase-correction or timealignment.In order to achieve an interpolation error of less than -10 dB, the required order is reduced to less than 8 for the entire bandwidth for all three tested HRTFs.For a -15 dB error, order 13 is needed.
As presented in Sec.IV, the phase-correction was shown to theoretically reduce the SH order by N = kr (for an error of around -14 dB).The figure shows that this is a good approximation in the case of a rigid sphere, and also a very close approximation for KEMAR and KU100 HRTF sets.
In order to achieve a -20 dB error, which can be translated to a 1% error, the required order is still reduced significantly for the rigid sphere and KEMAR HRTFs.However, for the measured KU-100 HRTF the pre-processing methods are able to reduce the required order only up to around 8 kHz.This can be considered to be the limitation of these methods for a measured HRTF, as will be further investigated in Section VIII.
Figure 5 shows the interpolation errors for the KU-100 HRTF as a function of SH order, averaged across frequencies from 0.3 to 20 kHz. Figure 5(a) shows the complex error, L , computed by Eq. ( 20), as presented at the beginning of the section, where h L is defined as an interpolated version of h L after order-truncated SFT, with or without pre-processing.The effect of the phase-correction and the time-alignment are clearly seen as a large reduction in the interpolation error over all orders.It seems that the phase-correction has some advantage at low SH orders.Figure 5(b) shows the magnitude error, mag L , computed by Eq. ( 21), where h mag L and h mag L are the vectors of magnitude values of h L and h L , respectively.In addition to the computation of the magnitude error by using the magnitude values of the original and interpolated HRTFs (i.e. using complex response interpolation), the error was computed for interpolating only the magnitude response of the HRTF, which has been shown to improve the magnitude interpolation error [12], [16].This is marked as "HRTF magnitude" in Fig. 5(b).In this case, the interpolation is performed directly on the magnitude response of the HRTF, which means that the pre-processing methods presented in this paper cannot affect it.The figure shows that using phase-correction or timealignment and interpolating the complex response, result in similar performance to the case of interpolating only the magnitude response.Figure 5(c) shows the results for the phase error, phase L , computed by Eq. ( 22), where h phase L and h phase L are the vectors of phase values of h L and h L , respectively.Additionally, the phase error was computed by interpolating directly the unwrapped phase, which is different to interpolating the complex response and then computing the phase.This is marked "HRTF Unwrap Phase" in Fig. 5(c).Interestingly, the phase-correction and the time-alignment achieved better results than interpolating directly the unwrapped phase.
Figure 6 shows the interpolation error, L , for different frequency bands, demonstrating the significant contribution of high frequencies (above 4.5 kHz) to the interpolation error, and the improvement in the error at these frequencies obtained by using phase-correction or time-alignment.
Similar behavior of the interpolation error was obtained by using the rigid sphere and KEMAR HRTFs (with even lower errors when using the pre-processing methods, as can be expected from Figs. 2 and 4).These results are not presented here due to space constraints.

VIII. EXPERIMENTAL INVESTIGATION OF SPARSELY
SAMPLED HUMAN HRTFS The previous section analyzed the effect of phase-correction and time-alignment on SH representation of HRTFs using densely sampled HRTFs, where only truncation error is introduced.This section analyzes the effect on sparsely sampled HRTFs, where sparsity error is introduced, by investigating human HRTFs obtained both from simulations and measurements.

A. Experimental setup
The HTRFs were taken from the HUTUBS database [45], [46].This database contains numerical simulations and acoustic measurements of full-spherical HRTFs of 94 human subjects.The simulated HRTFs were computed using the BEM method for frequencies between 100 Hz and 22 kHz in steps of 100 Hz, generating a Q = 1730 point Lebedev grid.The edge length of the meshes was gradually increased from 1 mm at the simulated ear to 10 mm at the opposite ear (HRTFs were simulated separately for the left and right ear).HRIRs were obtained by inverse Fourier transform, and shortened to 256 samples at a sampling rate of 44.1 kHz by applying squared sine fade-ins of 10 samples and fade-outs of 20 samples.The acoustically measured HRTFs were measured in the anechoic chamber of the Technical University of Berlin with a sampling rate of 44.1 kHz, in a frequency range between 200 Hz and 20 kHz.Final HRIRs were truncated to 256 samples.The measurement regime comprised an equal elevation grid with 10 • resolution and equal azimuth grids of different resolutions at each elevation (10 • resolution at the horizontal plane decreasing towards the poles, to yield an almost constant great circle distance between neighboring points of the same elevation).This results in a full-spherical sampling grid with Q = 440 points, which ideally allows a SH transform of order N = 16.For detailed information about the simulation and measurement processes see Ref. [45].

B. Performance analysis with nominal parameters
Interpolation errors were computed for each HRTF from the tested database by first taking out 121 directions from the HRTF, and then, using sparse subsets from the remaining directions and interpolating these into the original 121 directions.The 121 directions were chosen as the closest to a 10th order Extremal sampling scheme [47].The sparse subsets were taken by choosing the Q directions that are the closest to an N th order Gaussian sampling scheme [34].The normalized MSE was calculated for the left ear HRTF of each subject, with phase-correction, with time-alignment and without any processing.The phase-correction was performed using the same nominal parameters as in Sec. for all subjects (head radius of r a = 8.75 cm, and left ear position of (90 • , 90 • )). Figure 7 shows the mean and STD of the error across subjects, as a function of frequency, for different subsets.
Figure 7(a) presents the results for the simulated HRTFs.The figure shows the significant effect of the phase-correction and time-alignment on the interpolated HRTF when using sparse HRTFs.With only 50 measurements and SH order 4, the errors reduced to below -10 dB up to 8 kHz and below -5 ḋB up to 20 kHz.With Q = 322 and N = 12, the errors are below -20 dB up to 18 kHz.These results are in agreement with the results presented in Sec.VII, where the phase-correction and   (c), computed as in Eqs. ( 20), ( 21) and ( 22), respectively, for KU-100 HRTF, as a function of SH order, averaged across frequencies from 300 Hz up to 20 kHz."HRTF Magnitude" was computed by interpolating only the magnitude of the original HRTF, and comparing it to the original HRTF magnitude."HRTF Unwrap was computed by interpolating the unwrapped phase of the original HRTF, and comparing it to the phase of the original HRTF.time-alignment reduced 99% of the SH energy and lowered the required SH order for achieving -20 dB error to be around N = 12.
Figure 7(b) presents the results for the measured HRTFs.Note that the numbers of used measurements are in some cases different from a Gaussian sampling of the same order; this is due to the limited number of original measurements (440).The positive effect of the pre-processing methods on the measured HRTF is smaller compared to the simulated HRTFs.However, the interpolation errors are still reduced significantly, where, using only 50 measurements and SH order 4, the errors reduced to below -10 dB up to 7 kHz and below -3 ḋB up to 14 kHz.With Q = 186 and N = 10, the errors are below -10 dB up to 12 kHz when using phase-correction, and below -5 dB up to 20 kHz.Interestingly, no improvement is visible above Q = 186; this is also evident in the analysis of the measured KU-100 HRTF in Sec.VII.This behavior can be explained by the fact that the phase at high frequencies in acoustic measurements is very sensitive to measurement noise, which introduces errors that the pre-processing methods cannot overcome.
The results of the measured HRTFs demonstrate the advantage of the phase-correction compared to the time-alignment method.As the number of measurements increases, the differences between the two pre-processing methods also increases, where the largest difference is around 4 dB.The higher error of the time-alignment method can be explained by the fact that it is based on delay estimation, which cannot be robustly performed for signals with a low SNR.Such signals exist, for example, at low elevation directions due to the measurement system1 .This result may suggest that the fact that the phase-correction is done parametrically and is dataindependent makes it more robust to measurement noise, such as low SNR.To demonstrate this, the interpolation errors of the simulated HRTFs were computed again, with added noise. Figure 8 shows the error as presented in Fig. 7 the error when using time-alignment increases significantly when the SNR decreases.
In order to qualitatively evaluate how the interpolation error varies across directional regions of the HRTF, the interpolation errors were calculated for different angles.Figure 9 shows the errors as a function of azimuth and elevation angles, averaged across all 94 subjects.The interpolated HRTFs were calculated by taking the subset of Q = 230, N = 10 of the simulated HRTF and interpolating it into a full sphere with 1 degree resolution, with and without pre-processing.The figure shows the errors at 5 kHz and 10 kHz, demonstrating the reduction in the interpolation error when pre-processing is applied.It is interesting to note that relatively large errors appear at the contralateral directions.However, these error may not be as important as the figure may suggest.First, at these directions the overall energy of the HRTF is expected to be relatively low (see Fig. 10 for example).Second, as shown by psychoacoustic experiments, the information at at the contralateral directions are less important for sound localization [48].
To demonstrate how the interpolation error affects the magnitude of an interpolated HRTF as a function of direction, a simulated HRTF from the HUTUBS database of the FABIAN head-and-torso-simulator [49] analyzed.As a reference, the original HRTF was also into the same grid using SH interpolation of order 35. Figure 10 shows the magnitude of the interpolated left ear HRTF as a function of direction for the same two analyzed frequencies, 5 kHz and 10 kHz.The figure illustrates the errors introduced to sparse measurements, which can be seen as a highly distorted magnitude pattern for the un-processed HRTF, especially at 10 kHz.The pre-processed HRTFs, with phase-correction and time-alignment, are much more similar to the reference, while larger differences are still visible at the contralateral directions.
In addition to the magnitude errors, the effect of the different pre-processing methods on the Interaural Time Difference (ITD) is of great interest for spatial audio reproduction.To analyze this effect, the ITDs were computed for the simulated HRTFs of the 94 subjects using the subsets of Q = 230, N = 10 and Q = 50, N = 4, which were then interpolated into the horizontal plane with 1 degree resolution, with and without pre-processing.As a reference, the original HRTF was also interpolated into the same grid using SH interpolation of order 35.Then, the ITD was computed using the threshold detection method, with a threshold of -30 dB applied to a 3 kHz low-pass filtered version of the HRIRs.This method has been shown by Andreopoulou and Katz [31] to be the most perceptually relevant procedure for ITD estimation.Figure 11 presents the mean and STD of the ITD values for the two subsets across subjects.As expected from the behavior of the ITD, which is mostly relevant at frequencies up to 1.5 kHz [5], [6], the ITD values for the high order subset (Q = 230, N = 10) are accurate even with the un-processed HRTF.However, with the low order subset of Q = 50, N = 4, significant errors in the ITD are introduced with the un-processed HRTF, while preprocessing the HRTFs with both methods achieved significant improvement.As reported by Andreopoulou and Katz [31], the Just Noticeable Difference (JND) for the ITD is subject and angle dependent.The lower JND is in the frontal direction (φ = 0 • ), with an average value of about 40 µs, and the highest is at the lateral directions (φ between 80 • to 100 • ), with an average value of about 100 µs.To assess the errors in ITD between the original HRTF and the pre-processed HRTFs with comparison to the JND, the average errors across directions were calculated for each subject.The maximum of these errors were found to be 6 µs and 12.7 µs for the timealigned and phase-corrected HRTFs, respectively, under both sampling conditions.As these errors are much smaller than the JND, it is expected that pre-processing may distort the ITDs but in a level that is not noticeable in most directions.

C. Sensitivity analysis for parameter values
The results presented above for the phase-correction method were obtained using nominal subject-independent parameters (head radius r a , left ear azimuth φ l and elevation θ l ).However, it is known that actual human heads are neither single sized, nor have their ears in a centered position.In order to study sensitivity of the phase-correction method to the applied parameters, the interpolation errors were computed again for 91 simulated HRTFs (the HRTFs with anthropometric metrics given the The analysis in Fig. 7 was repeated here with the Q = 230, N 10 subset, using phase-correction method different parameter values (subject-independent).The error was calculated for the 121 directions selected from the original HRTF, for all 91 subjects.Figure 12(a) shows the mean and STD values of the interpolation error over all directions and subjects using phase-correction with the left ear position at (90 • , 90 • ) and different values of head radius.Figures and 12(c) show the errors when using r a = 8.75 cm and different values of left ear elevation and azimuth positions, respectively.The different parameter values were chosen to cover a reasonable range of deviations from nominal parameters.The figures show that small deviations from the nominal parameters (7 cm ≤ r a ≤ 9 cm, 85 • ≤ θ l ≤ 92 • and 85 • ≤ φ l ≤ 95 • ) have a moderate effect on the interpolation error, mostly at very high frequencies (above 15 kHz), where a maximum of 2-3 dB differences in the errors appear in this parameter range.However, higher deviation from the nominal parameters, for example r a = 6 cm, θ l = 100 • or φ l = 80 • , lead to larger errors.These results suggest that the selection of values for the phase-correction parameters can be made in practice without the need for great accuracy, as long as the deviations from the nominal values as employed here are reasonably small.

D. Individualized selection of parameter values
To further analyze the effect of the parameter values on the phase-correction method, an optimization study was performed.The optimal parameters for each subject from the database were found by minimizing the center of power of the SH coefficients of the HRTF, as defined by Ben-Hagai et al. [37]: where x = (r a , θ l , φ l ), x opt = (r a opt , θ l opt , φ l opt ) and J 2 is the center of power, defined as: Minimization of J 2 means that the HRTF power is concentrated at the low SH orders, which leads to reduction in the effective SH order as shown in Sec.VII-A.An optimization solver was applied to compute x opt .Note that the optimization can be applied in practice on real data, as it does not require any additional information.The optimization problem was formulated using MATLAB version R2018b, and solved using the built-in function fminsearch, which implements the optimization method of the simplex search method [50].Additionally, the subjects' head-widths were taken from the anthropometric metrics given in the database and were used for comparison with the optimal r a .The average error between the computed optimal radius and half the measured head-width is 1.65 cm, where, in most cases, the optimal radius is larger.This can be explained by the fact that the human head is not spherical and usually the head-width (measured as the distance between the temples) is the smallest part of the head.Note that other anthropometric measures could be used for estimating the head radius, such as the head-depth and headheight.However, preliminary studies have indicated that the use of these measures, or an average of their combinations, do not yield better results than the ones presented herein.This implies that the suggested method seems to optimize for the smallest measure of the head, which is not unexpectedthe smallest measure typically corresponds to the translation distance between the origin (center of the head) and the ear. Figure 13 presents the interpolation error, computed from the sampling subsets of Q = 230 and Q = 186 of the simulated and measured HRTFs, respectively, with SH order 10.The phase-correction was applied with: (i) the nominal parameters (r a = 8.75 cm, (θ l , φ l ) = (90 • , 90 • )), (ii) the optimal parameters per subject (mean and STD over subjects are presented in Tab.I), and (iii) the head radius taken from the anthropometric metrics (in this case, the optimal (θ l opt , φ l opt ) were used).The figure shows the errors for both simulated and measured HRTFs.It can be seen that subject-dependent optimization of the parameters has a relatively small effect on the interpolation error, which is mostly present at the very high frequencies (above 15 kHz).Additionally, as expected, using the computed optimal head radius is preferable to using the measured head-width.The results presented here reinforce the  conclusions from Fig. 12, that phase-correction can be used in practice by employing nominal parameter values without the need for additional subject-dependent data.Note that the presented analysis demonstrates the relative objective errors when varying the parameters values.However, while small relative errors may lead to small perceptual differences, the actual implication on perception is not clear and requires additional investigation.Overall, it seems that using nominal values for the phase-correction parameters may be useful in practice, while a further small improvement can be achieved using optimization or individualization.
The results presented in this section suggest that using the phase-correction method has practical benefit in the design of HRTF measurement systems.The proposed method enables the measurement of an HRTF at a relatively small number of directions, but with accurate interpolation with high spatial resolution.In comparison to the time-alignment method, it has been shown to be more robust to additive measurement noise, while its parametric nature makes it easier to implement in practice.Additionally, using phase-correction leads to a compact SH representation of the HRTF, which is beneficial for real-time applications.Furthermore, as was presented in [26], this reduction in the effective SH order of the HRTF is expected to lead to a smaller HRTF sampling error and, therefore, to improved perceptual performance when using these HRTFs.

IX. CONCLUSION
This paper presented a method for efficient representation of sparse sampled HRTFs using phase-correction by ear alignment.A formulation of the phase-correction method was developed by considering the dual-centering of HRTF measurements.The method was evaluated by simulation and experimental studies with simulated and measured HRTFs.Significant reduction in the effective SH order was achieved, which affords the ability to measure HRTFs with many fewer directions and still interpolate it with high spatial resolution.
A comparison with a previously suggested method of timealignment was performed, demonstrating the robustness of the phase-correction method to additive measurement noise.Sensitivity analysis and an optimization study imply that nominal ear-alignment parameters can be used in practice and yield similar results to those obtained with optimized parameters.This paper presented an objective evaluation of the suggested method, while subjective evaluation of the effects on spatial perception, which is essential for the completeness of this study, is suggested for future work.

Figure 1 .
Figure 1.Added SH order due to normalization of the ear pressure by center pressure, N = kr , as function of frequency.Computed for r = 8.75 cm and c = 343.5 m/s.
vectors of the magnitude values of the original and interpolated HRTFs, respectively, and h phase L and h phase L are vectors of the phase values of the HRTFs.

Figure 2
Figure 2 presents the SH spectrum and the parameters b 90 and b 99 as a function of frequency and SH order.Each row in the figure presents a different HRTF, and each column presents a different pre-processing method.The SH spectra were normalized by the maximum value for each frequency.The figure shows how the energy of the high order SH coefficients of the phase-corrected and time-aligned signals are significantly reduced compared to the un-processed signal.

Figure 2 .
Figure 2. Normalized SH spectra, En, of different HRTFs (rows) and different pre-processing methods (columns), computed according to Eq. (18).The dashed red and green lines represent the parameters b 90 and b 99 , respectively, where b X is the lowest order at which at least X% of the total energy is contained at order b X and below, computed according to Eq. (19) with N = 35 and X = 90, 99.

Figure 4 .
Figure 4.The required SH order to achieve Normalized MSEs which are lower than a selected value [ L = -20 dB, -15 dB, -10 dB].The normalized error is computed as in Eq. (20), and presented as a function of frequency and kr, for different HRTFs (rows) and different pre-processing methods (columns).

Figure 6 .
Figure 6.Normalized MSE, L , computed as in Eq. (20) for KU-100 HRTF, as a function of SH order, averaged across different frequency bands.

Figure 7 .Figure 8 .
Figure 7. Normalized MSE, L , as a function of frequency, computed using HRTFs of 95 subjects taken from HUTUBS database[45].The errors are computed for each subject using different subsets from the original HRTF.The bold lines represent the means across subjects, and the dashed lines represent the STDs: (a) results for simulated HRTFs; (b) results for measured HRTFs.

Figure 9 . 4 Figure 11 .
Figure 9. Normalized MSE, L , as a function of direction (azimuth, elevation), for two frequencies (rows) and un-processed and the two studied pre-processing methods (columns), averaged across 91 simulated HRTFs.The HRTFs were interpolated from a subset of Q = 230 directions with SH order N = 10.

Figure 10 .
Figure 10.Magnitude of the left ear HRTF of FABIAN head-and-torso-simulator, as a function of direction (azimuth, elevation), for two different frequencies (rows) and different pre-processing methods (columns).The HRTFs were interpolated from a subset of Q = 230 directions with SH order N = 10.The original HRTF (left column) was calculated using the original Q = 1730 directions using SH order N = 35.

Figure 13 .
Figure 13.Error analysis for the optimized phase-correction parameters.Each figure presents interpolation error, L , as a function of frequency, computed using HRTFs of 91 subjects taken from HUTUBS database[45].The bold lines represent the means across subjects, and the dashed lines represent the STDs.The interpolated HRTFs were computed using the phase-correction method with nominal, optimal and anthropometric parameters: (a) results for simulated HRTFs; (b) results for measured HRTFs.

Figure 12 .
Figure 12.Sensitivity analysis of the phase-correction parameters.Each figure presents interpolation error, L , as a function of frequency, computed using simulated HRTFs of 91 subjects taken from HUTUBS database[45].The errors are computed for each subject using a subset of Q = 230 directions from the original HRTF.The bold lines represent the means across directions and subjects, and the dashed lines represent the STDs.The interpolated HRTFs were computed using the phase-correction method with different parameters: (a) different head radius; (b) different ear elevation angle; (b) different ear azimuth angle.