Directionally Weighted Wave Field Estimation Exploiting Prior Information on Source Direction

A wave field estimation method exploiting prior information on source direction is proposed. First, we formulate a wave field estimation problem as regularized least squares, where the norm of the wave field is used for a regularization term. The norm of the wave field is defined on the basis of the weighting function that reflects the prior information on the source direction. We derive the closed-form solution using theories on Hilbert spaces. Results of numerical experiments indicated that high estimation accuracy can be achieved by using the proposed method in comparison with other current methods that do not use any prior information.


I. INTRODUCTION
C APTURING a three-dimensional wave field is an essential technique for its analysis, visualization, and other related applications. In the field of spatial audio, in particular, sound field estimation techniques using multiple sensors (i.e., microphones) have attracted considerable attention owing to their wide variety of applications, such as the reproduction of a captured sound field using loudspeakers [1]- [4] or headphones [5]- [7] and the spatial active noise control [8]- [10].
Wave field estimation methods are classified on the basis of whether wave sources are allowed to exist inside the target region [11]- [15] or not [2], [16]- [20]. In general, the former case is much more difficult than the latter and often requires additional assumptions, such as spatial sparsity of source distribution and a reverberant-free condition. Here, we focus on the wave field estimation inside a source-free target region as shown in Fig. 1. Note that wave sources may exist outside the target region also in this context.
There are a large number of wave field estimation methods for source-free target regions, which can be further classified on the basis of whether the linearity between the observed signals and estimated wave fields at each frequency, i.e., the linear time-invariant (LTI) property, holds. For example, many methods based on the higher-order Ambisonics (HOA) approach in spatial audio have the LTI property [2], [18]- [20]. In these methods, the wave fields are expanded by spherical wavefunctions up to a certain order, and their coefficients are estimated linearly from the observed signals [2], [18], [20]. In our previous work [19], the spherical wavefunction expansion is considered up to infinite dimensions, which is essentially equivalent to the expansion by infinite number of plane-wave functions [21]. This makes the estimation process independent of the settings of the expansion center and truncation order. Owing to the LTI property, these methods allow a fast implementation using LTI digital filters, which is preferable to non-LTI ones from the viewpoint of computational efficiency and almost essential for real-time systems (e.g., spatial active noise control).
On the other hand, sparsity-based methods [15], [22]- [24] are typical examples of non-LTI approaches. In these methods, wave fields are decomposed into sparse basis functions, and their coefficients are determined by sparse optimization methods. In contrast to the HOA approach, the plane-wave functions and monopole functions (free-field Green's functions) are often used as the basis functions. These methods may improve estimation accuracy in various practical situations since they exploit the prior information on the target wave field, i.e., the sparsity of the coefficients. However, these non-LTI methods require iterative calculations for optimization, which makes a real-time implementation difficult. In addition, the function space in which we seek the solution has to be approximated by a finite number of basis functions, e.g., by discretizing the directions of plane-wave functions or the positions of monopole functions, which causes This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ a tradeoff relationship between the approximation accuracy and the computational cost. Therefore, an LTI method exploiting some prior information on a target wave field is desired to achieve both high estimation accuracy and low computational cost.
In this paper, we present a wave field estimation method exploiting prior information on source direction. Although it is not always easy to obtain the exact source positions beforehand, their rough estimates are available in many practical situations (e.g., when the possible source position is limited within a certain region owing to the physical constraint or when the source position can be estimated by using some other sensors such as image sensors). We formulate the wave field estimation problem as the regularized least squares in a Hilbert space [25], where the prior information is incorporated as a directional weighting into the regularization term, and the closed-form solution can be obtained using the representer theorem [26], [27]. In particular, this method r has the LTI property, r incorporates prior information on source direction in a flexible form (e.g., single direction, multiple directions, and diffuse field), and r does not require the finite-dimensional approximation of the function space composed of wave fields as in our previous work [19], [28], which is made possible by utilizing theories of functional analysis. Note that the formulation in this paper includes those in our previous works [19], [28] as special cases; the proposed method can be regarded as their generalization. The proposed method is also closely related to our other previous work [29], where the kernel interpolation of two-dimensional sound fields based on directional weighting is presented. Whereas only omnidirectional sensors in two-dimensional wave fields are considered in [29], the sensor directivity is extended to general cases and three-dimensional wave fields are considered in this paper.
The rest of this paper is organized as follows. In Section II, several notations and preliminaries are introduced. In Section III, the proposed method is described. In Section IV, results of several numerical experiments for comparison between the proposed method and other LTI methods are reported. Finally, we present our conclusions in Section V.

II. NOTATIONS AND PRELIMINARIES
First, we provide several basic notations and preliminaries on the representation of wave fields and modeling of sensor directivities.

A. Notations
The set of natural numbers including zero are denoted by N, and the set of positive integers are denoted by N + . For any integers m and n satisfying m ≤ n, [[m, n]] denotes the integer interval between m and n inclusive.
The sets of real and complex numbers are denoted by R and C, respectively. The imaginary unit is denoted by i, and the complex conjugate of a complex number is denoted by (·) * . The unit sphere in R 3 is denoted by S 2 . The transpose and Hermitian transpose of a matrix are denoted by (·) T and (·) H , respectively. For any two vectors x := [x 1 , x 2 , x 3 ] T and y := [y 1 , y 2 , y 3 ] T in S 2 , R 3 , or C 3 , the operator • is defined as x • y := x 1 y 1 + x 2 y 2 + x 3 y 3 .
An integral of a measurable function f : S 2 → C over the sphere S 2 (with respect to the natural measure) is denoted by The set of square-integrable functions from S 2 to C is denoted by L 2 (S 2 ). According to conventions, we do not distinguish functions in L 2 (S 2 ) that are equal almost everywhere on S 2 .

B. Representation of Wave Fields
Let Ω ⊆ R 3 be a simply connected open subset of R 3 and u : Ω → C be a wave field in Ω at a fixed angular frequency ω ∈ R, i.e., u(r) denotes a pressure at r ∈ Ω. 1 If Ω does not include any wave sources, u can be well modeled as a solution of the following (homogeneous) Helmholtz equation [30]: where Δ denotes the Laplace operator and k := ω/c is the wavenumber with c ∈ (0, ∞) being the phase velocity, which is assumed to be constant in Ω. A typical modeling of u satisfying (1) is a superposition of plane-wave functions (also called the Herglotz integral [31]) as follows: Here,ũ ∈ L 2 (S 2 ) represents the complex amplitude of plane waves arriving from each direction. Let A denote a transform of functions fromũ to u defined as (2), and we define a function space H as In what follows, the tilde symbol over a letter denotes the inverse of A (e.g.,ũ := A −1 u).
Although H does not include all solutions of (1), any solution of (1) can be approximated arbitrarily by functions in H in the sense of the uniform convergence on compact sets (see Appendix VII for a proof). Therefore, a wave field estimation problem can be regarded practically as a process of determining a function u within H on the basis of some criterion.

C. Modeling of Sensor Directivity
Suppose a single sensor with a certain directivity located at r 0 ∈ Ω in a wave field u ∈ H . Its directivity can be modeled as the directional function γ ∈ L 2 (S 2 ), i.e., γ(x) denotes the sensor's response to the plane wave arriving from the direction x ∈ S 2 . For example, omnidirectional, bidirectional, and firstorder sensors are respectively modeled with constant parameters y ∈ S 2 and ζ ∈ [0, 1] as follows. Omnidirectional: Bidirectional: First-order: In many practical cases including the above examples, the directivity γ can be well represented using finite-order spherical harmonic functions as where N ∈ N is the maximum order, N ν,μ is the abbreviated form of N ν=0 ν μ=−ν , and Y ν,μ (·) : S 2 → C denotes the spherical harmonic function of order ν ∈ N and degree μ ∈ [[−ν, ν]] (see [21] for the definition and [32] for an efficient computational algorithm). Here, we consider the complex conjugate on the right-hand side of (4) to simplify later discussion.
Since u can be expanded around r 0 as the observed signal s ∈ C of the sensor is given by where ∈ C is the observation noise. Hereafter, we use the simplified notation as Here, F : H → C is a functional defined as In addition, we refer to F as an observation operator of the sensor. Note that F is a linear functional on H , which means the superposition principle in the observation.

A. Formulation
Let M ∈ N + sensors be located arbitrarily in Ω as shown in Fig. 1. For each m ∈ [ [1, M]], the position, directivity, and observed signal of the mth sensor are denoted respectively by r m ∈ Ω, γ m ∈ L 2 (S 2 ), and s m ∈ C, and they are assumed to be given. In this case, the observation operator of the mth sensor, denoted by F m , is given for each We consider the following formulation of wave field estimation problems: where σ 1 , . . . , σ M ∈ (0, ∞) are dispersion parameters representing the observational uncertainty, λ ∈ (0, ∞) is a regularization parameter, and · H : H → [0, ∞) is a norm on H , which is defined later. The first term of (10) is a loss term, which represents a deviation between the predicted values F 1 u, . . . , F M u and the observed values s 1 , . . . , s M . On the other hand, the second term is a regularization term, which evaluates the reasonability of u independently of the observation. If we can design the norm · H so that the wave fields that are likely to occur have small norms, the solution of (10) will be induced to such wave fields. Therefore, it is desirable to design an appropriate norm exploiting prior information of wave fields. On the basis of the above discussion, we introduce the norm · H over H defined as Here, w : where β ∈ [0, ∞) and η ∈ S 2 are constant parameters and C(β) is a scaling constant (so that w satisfies x∈S 2 w(x) dχ = 1) defined as . (13) This function w is also known as the probability density function of the von Mises-Fisher distribution used in directional statistics [33]. Note that w(x) takes a large value when x is close to η, especially in the case of large β. Therefore, by using this norm in (10), we can impose large penalties for wave fields originating from wave sources in the direction away from η while imposing small penalties for wave fields from wave sources in the direction close to η. When β = 0, on the contrary, the solution of (10) will be induced relatively to a diffuse field. The weighting function w can be further generalized by using a linear combination of (12) for different parameters β and η, which is useful in cases of multiple wave sources. This extension is discussed in Section III-D after we describe the basic concept of the proposed method.

B. Closed-Form Solution
We provide the closed-form solution of (10) on the basis of theories on Hilbert spaces. First, we define the inner product ·, · H on H as Then, (H , ·, · H ) is a complex Hilbert space because it is isomorphic to the Hilbert space L 2 (S 2 ) with the weighted L 2 inner product. Using this inner product ·, · H , we can write the objective function as where Then, we can apply the representer theorem [26], [27], which guarantees that the solution of (10), denoted byû, has the form ofû with someα := [α 1 , . . . ,α M ] T ∈ C M . Therefore, we need only to seek the optimal coefficientsα in the finite-dimensional linear space C M instead ofû in the infinite-dimensional linear space H . Here,α can be obtained by solving the following optimization problem: which is given by substituting Finally, by solving (18), we obtain andû is given by substituting (23) into (17). Therefore, the remaining problems are the calculations of v 1 , . . . , v M and K.

C. Directionally Weighted Spherical Wavefunctions and Directionally Weighted Translation Operators
As noted in Section II-C, the directivities of various sensors can be well modeled by finite-order spherical harmonic functions. Suppose γ 1 , . . . , γ M can be represented as (24) with N 1 , . . . , N M ∈ N. Then, by substituting (12) and (24) into (16) and (22) and solving the integrals (see Appendix IX for detailed derivations), we obtain where with Here, j ν (·) : C → C is the νth-order spherical Bessel function of the first kind, y ν,μ (·) : C 3 → C is the normalized homogeneous harmonic polynomial [21] of order ν and degree μ (i.e., a homogeneous harmonic polynomial satisfying y ν,μ (x) = Y ν,μ (x) for x ∈ S 2 ), and G(·) denotes the Gaunt coefficient with a slight modification regarding complex conjugation, which is defined as The closed-form expression of (31) can be obtained using the formulae in [21], and an efficient computational algorithm is proposed in [34]. Note that (29) is well defined independently from branches of the 1/2-power function and can also be defined at z = [z 1 , z 2 , z 3 ] T ∈ C 3 such that z 2 1 + z 2 2 + z 2 3 = 0 by using limit values because these points are removable singularities  (also note that the term (z 2 (29) is not the complex norm given by (|z 1 | 2 + |z 2 | 2 + |z 3 | 2 ) 1 2 ). When β = 0, it can be immediately shown that (27) and (28) correspond respectively to the usual spherical wavefunction and translation operator [4], [19], [20] (except for the constant coefficients), and this method corresponds to our previous work [19]. Conversely, the proposed method is obtained formally by replacing the usual spherical wavefunctions and translation operators in [19] with (27) and (28), respectively (therefore, the computational costs are essentially the same in these two methods). Since the differences from the previous work [19] in β ∈ (0, ∞) are derived from the directional weighting function w, we refer to (27) and (28) as a directionally weighted spherical wavefunction and a directionally weighted translation operator, respectively. We show several examples of directionally weighted spherical wavefunctions in Fig. 2, where k = 10 and η = [cos(π/3), sin(π/3), 0] T . One can see that these functions become close to the plane-wave function arriving from η as β increases.
As a special case, we consider the case of omnidirectional sensors, i.e., γ m ( In this case, the proposed method corresponds to the kernel ridge regression [35] with the kernel function κ : i.e., (H , ·, · H ) is a reproducing kernel Hilbert space [36] generated by κ. Similar discussion for the two-dimensional case is also given in [29].

E. Comparison With Non-LTI Methods
From (17) and (23), the estimated wave fieldû is immediately shown to be linear with respect to the observed signal s in the proposed method. Here, for a theoretical evaluation of the computational cost, we consider some application represented as a linear system with inputû and output φ ∈ C E in the form of where E ∈ N + is the dimension of the output variable and Φ 1 , . . . , Φ E are arbitrary linear functionals. For example, in the visualization of wave fields, φ is the pointwise values ofû, and in sound field reproduction using loudspeakers/headphones, φ is the loudspeaker/headphone signals that reproduce a sound field u (the details of the reproduction methods are beyond the scope of this paper; however, note that the loudspeaker/headphone signals are obtained linearly with respect to the desired field u in most methods [1]- [7]). From (17), we can rewrite (38) as φ = Vα (39) with V ∈ C E×M defined as Therefore, from (23), the relationship between φ and s is represented as where M ∈ C E×M is defined as M := V (K + λΣ) −1 . The calculation of M requires O(M 3 + EM 2 ) times of scalar multiplication and addition; however, it can be calculated before we obtain s. Therefore, once we have obtained M , only O(EM ) times of scalar multiplication and addition are required in the calculation of φ from a given s. Also in the time domain, (41) can be implemented using E × M LTI filters determined independently of s. For comparison, we provide a brief description of the sparsitybased method based on l p norm regularization [15], [22]- [24]. In this method, the wave field u is first decomposed into a finite number of basis functions as where N basis ∈ N + is the number of basis functions (typically set to be greater than M ), ψ 1 , . . . , ψ N basis : Ω → C are basis functions and b 1 , . . . , b N basis ∈ C are unknown coefficients that are expected to be sparse (i.e., only a small number of them have nonzero values). The unknown coefficients b 1 , . . . , b N basis are determined by solving the following minimization problem: with p ∈ [0, 2), where p = 0 or 1 is typically used. Here, · p denotes the l p norm [37], 2 and D ∈ C M ×N basis is a dictionary matrix whose (m, n)th element models the observation of ψ n by the mth sensor for m ∈ [ [1, M]] and n ∈ [[1, N basis ]]. For p > 0, the following formulation is also often used instead of (43): where λ ∈ (0, ∞) is a regularization parameter. By defining B : C M → C N basis as the nonlinear map where B(s) ∈ C N basis denotes the solution of (43) or (44) for a given s ∈ C M and Ψ ∈ C E×N basis as we can represent the relationship between φ and s as φ = ΨB(s). (46) In this case, we cannot utilize the same precomputation as in (41) since B is a nonlinear mapping. For the calculation of B(s), i.e., the optimization of (43) or (44), the orthogonal matching pursuit [37], [38], (accelerated) proximal gradient [39], and iteratively reweighted least squares [37], [40] are representative algorithms. In total, the following times of scalar multiplication and addition are required in the calculation of (46).
Here, K ∈ N + is the number of nonzero values in b, and K iter ∈ N + is the number of iterations in the algorithm. Particularly when E is small, these computational costs are much larger than that of the proposed method. Also in the time domain, (46) does not allow a fast implementation using LTI filters in the time domain owing to the nonlinearity of B. Therefore, the proposed method is preferable to the above non-LTI methods from the viewpoint of the computational cost, which is an important factor in applications such as real-time systems, where fast implementation is required.

IV. NUMERICAL EXPERIMENTS
Numerical experiments of sound field estimation using a microphone array were conducted to demonstrate the performance of the proposed method. The sound field in the air was considered, and the phase velocity (i.e., the speed of sound) was set as c = 340 m/s. We compared the proposed method with the current LTI method presented in [18] (also introduced as the general sampling array approach in [2]), which is referred to here as the truncation method. In addition, the proposed method for β = 0, which corresponds to our previous work [19], was also investigated for comparison. Hereafter, the notation of temporal frequency is used instead of angular frequency.

A. Estimation of Plane-Wave Field Under Free-Field Condition
In a three-dimensional free field, 64 microphones were located on a sphere with a radius of 1.0 m centered at the origin. Their positions were determined according to the spherical t-design [41]. Each microphone was modeled as a cardioid microphone oriented outward, i.e., the observed signals s 1 , . . . , s M in a sound field u were given by Furthermore, these directivities γ 1 , . . . , γ M can be represented as (24) with The true sound field u true was set as a single plane wave, which was defined as with x inc = [1, 0, 0] T (note that x inc denotes the incident direction, not traveling direction). An example of the true sound field is shown in Fig. 4, where the black line denotes the boundary of the microphone array. The observed signals were calculated using (47), and the observation noises were sampled independently from the circularly symmetric Gaussian distribution with zero  mean and variance of 10 −2 × S, where S denotes the average power of the noise-free signals (i.e., the signal-to-noise ratio was 20 dB).
In the truncation method [18], the truncation order was determined as 7 so that the number of unknown coefficients corresponds to the number of microphones, i.e., 64. The global origin (center of the spherical wave function expansion) was set at the center of the spherical microphone array. The regularization parameter in the matrix inversion was set at 10 −2 as in the proposed method.
As an evaluation criterion, the normalized mean squared error (NMSE) was used, which was defined as NMSE := 10 log 10 i∈I eval |u true (r Here, u est denotes the estimated sound field, and the evaluation points {r (i) eval } i∈I eval were set as all grid points with an interval of 0.1 m on and inside the surface of the spherical microphone array.
First, the relationship between frequencies and NMSEs was plotted in Fig. 5. We can see that the NMSEs for the proposed method (β = 0) were almost the same as those for the truncation method at low frequencies and lower than those for the truncation method at high frequencies, which was also reported in [19] for the two-dimensional case. Among the proposed methods, the NMSEs for θ = 0 were lower than those for the other conditions, and even the NMSEs for θ = π/6 (i.e., inaccurate prior information) were lower than those for β = 0. For further investigation, the NMSEs at different θ and β were plotted at frequencies of 500 and 1000 Hz in Fig. 6. It can be seen that for θ = 0, the NMSE decreased as β increased, and for θ > 0, the NMSE took a minimum value at a certain value of β. This is considered to be because the estimation was strongly affected by inaccurate prior information in cases of large β for θ > 0. The best value of β varied depending on θ and the frequency, and their quantitative relationship seems complex; however, at least, many conditions achieved lower NMSEs than the condition of β = 0. These results indicate that even rough prior information on the source direction may improve the estimation accuracy in the proposed method. We also show an example of the estimated sound fields and the (pointwise) normalized error distributions at 500 Hz in Figs. 7 and 8, respectively. In this example, the NMSEs were (a) −4.87, (b) −18.20, (c) −24.74, and (d) −1.18 dB. The tendencies described above can also be seen in these results.

B. Estimation of Monopole Field Under Free-Field and Reverberant Conditions
In a 6 m × 4 m × 3 m rectangular room with its center defined as the origin, the same spherical microphone array as used in the previous experiments was located with its center positioned at [−1, 0, 0] T m. The reverberation in the room was simulated by the image-source method [42], where image sources were considered up to the 20th reflection order. Here, the reflection coefficients were set as Γ ∈ {0, 0.4, 0.8} for all six wall surfaces, where each of the above three values was investigated (Γ = 0 corresponds to the free-field condition). In these settings, the reverberation time (from Sabine's formula) and the reverberation radius (critical distance) [43] were respectively 0.13 s and 1.35 m for Γ = 0.4 and 0.30 s and 0.88 m for Γ = 0.8.
The true sound field u true was set as a superposition of two monopole functions, whose direct wave component was defined   in Fig. 9, where the black line denotes the boundary of the microphone array and the blue dots denote the position of the sound sources projected into the xy-plane. The observed signals were calculated using (47), and the observation noises were added in the same way as in the previous experiments.
In the proposed method, σ 1 , . . . , σ N and λ were set to be the same as in the previous experiments. For prior information, we used the weighting function defined in (35) with L = 3, where we defined  using the parameters β ∈ [0, ∞) and a ∈ [0, 1]. We defined β 3 = 0 for the third weighting to represent the reverberant component, and different β and a were investigated. In the truncation method, the same parameters as in the previous experiments were used. The NMSE was used as an evaluation criterion. First, the relationship between frequencies and NMSEs was plotted in Fig. 10. In the free-field condition, i.e., Γ = 0, one can see similar tendencies to the experimental results in Section IV-A. In the reverberant conditions, the NMSEs for the proposed method (β = 6, a = 0) became large as Γ increased, which was because the regularization term for a = 0 takes a large value for reverberant components. On the other hand, the proposed method (β = 6, a = 0.5) achieved the lower NMSEs than the proposed method (β = 0) at most frequencies although the performance improvement decreased as Γ increased. The reduced effectiveness is considered to be related to the extent of reverberation; in highly reverberant environments, even sound fields originated from monopole sources become close to diffuse fields, which makes it difficult to improve estimation performance by using a directional weighting.
For further investigation, the NMSEs for different a and β at 500 Hz were plotted in Fig. 11. Also in this case, most conditions achieved lower NMSEs than β = 0, which indicates again that even rough prior information may contribute to the improvement of estimation performance. We also show an example of the estimated sound fields and normalized error distributions for Γ = 0.8 at 500 Hz in Figs. 12 and 13, respectively. In this example, the NMSEs were (a) −5.22, (b) −5.90, and (c) −3.90 dB. The tendencies described above can also be seen in these results.
Finally, to investigate how the above results generalize, we conducted the same evaluations for several different source positions, i.e., r (1) src , r (2) src , and several accuracies of prior information, i.e., η 1 , η 2 . The source positions were sampled randomly according to the uniform distribution on the entire room excluding the ball with a radius of 2.0 m centered at [−1, 0, 0] T m. The directions η 1 , η 2 were sampled randomly so that the angle between η l and the true lth source's direction from the center of the spherical microphone array was θ ∈ {0, π/6, π/3} for each l ∈ {1, 2}, where each of the above three values of θ was investigated. Figure 14 shows the relationship between NMSEs averaged over 20 trials and frequencies for different values of a, β, and Γ. For θ = 0 (i.e., accurate prior information), one can see that the performance improvement using the directional weighting can be generalized in various settings of source positions. Moreover, the proposed method (β = 6, a = 0.5, θ = π/6) exhibited lower NMSEs than the proposed method (β = 0). For Γ = 0.8, the NMSEs for these two conditions were very close; however, the proposed method (β = 6, a = 0.5, θ = π/6) showed slightly lower NMSEs at most frequencies. These results mean a certain degree of inaccurate prior information can be used in the proposed method.

V. CONCLUSION
We proposed a wave field estimation method exploiting prior information of source directions by introducing a directional weighting function. The closed-form solution was obtained using the directionally weighted spherical wavefunctions and directionally weighted translation operators. In the numerical experiments, we confirmed that a higher estimation accuracy can be achieved by incorporating prior information in the proposed method than by using other existing methods that do not use any prior information.

APPENDIX A APPROXIMATION BY PLANE-WAVE FUNCTIONS
We show that for any solution u of (1), bounded closed set Ω K ⊂ Ω, and positive number , there is a function u aprox ∈ H satisfying |u(r) − u approx (r)| < ∀r ∈ Ω K .
First, from the boundedness of Ω K , there is an open ball B ⊃ Ω K centered at the origin. For such B, from the Lax-Malgrange theorem [44], [45] (note that the Helmholtz equation is an elliptic partial differential equation), there is a function f : B → C satisfying (Δ + k 2 )f = 0 and |u(r) − f (r)| < /2 ∀r ∈ Ω K .
Moreover, let Y ν,μ be differential operators obtained by re-