Efficient 3D Acoustic Simulation of the Vocal Tract by Combining the Multimodal Method and Finite Elements

Acoustic simulation of sound propagation inside the vocal tract is a key element of speech research, especially for articulatory synthesis, which allows one to relate the physics of speech production to other fields of speech science, such as speech perception. Usual methods, such as the transmission line method, have a very low computational cost and perform relatively good up to 4–5 kHz, but are not satisfying above. Fully numerical 3D methods such as finite elements achieve the best accuracy, but have a very high computational cost. Better performances are achieved with the state of the art semi-analytical methods, but they cannot describe the vocal tract geometry as accurately as fully numerical methods (e.g. no possibility to take into account the curvature). This work proposes a new semi-analytical method that achieves a better description of the three-dimensional vocal-tract geometry while keeping the computational cost substantially lower than the fully numerical methods. It is a multimodal method which relies on two-dimensional finite elements to compute transverse modes and takes into account the curvature and the variations of cross-sectional area. The comparison with finite element simulations shows that the same degree of accuracy (about 1% of difference in the resonance frequencies) is achieved with a computational cost about 10 times lower.


I. INTRODUCTION
A. GENERAL CONTEXT Speech sounds are produced by sound sources (vocal fold oscillations and/or turbulent flow) placed inside a waveguide, the vocal tract, defined as the air volume between the vocal folds and the lips. The motion of the articulators, such as the tongue or the jaw, modify the vocal tract shape and the speech sound produced. Resonances are generated inside the vocal tract by the wave propagating forward and backward from The associate editor coordinating the review of this manuscript and approving it for publication was Guido Lombardi . the glottis to the mouth and the mouth to the glottis. They enhance some part of the speech spectrum, usually referred to as formants, which constitute acoustic cues allowing us to differentiate phonemes.
Acoustic simulation of the vocal tract is useful to establish a relationship between the vocal tract shape and the acoustic properties of speech. Thus, it is possible to synthesise speech directly from the vocal tract shape using articulatory synthesis [8], [23], [36], [48], [49], [51].
The most popular method to simulate vocal tract acoustics is the transmission line model (TLM) [49]. It takes advantage of the fact that acoustic waves are guided along the vocal tract, and that under a low frequency assumption (valid up to about 5 kHz), only plane waves propagate. This means that the local cross-sectional shape has little impact on the propagation and that one needs to consider the cross-sectional area only. Thus, under these hypotheses, the curvature of the vocal tract can be ignored and the description of its shape can be limited to the variation of its cross-sectional area. The TLM can be implemented in both frequency and time domain. It requires very few computational resources and can run in real time.

B. LIMITS OF CURRENT SIMULATION METHODS
A first limitation of the TLM is the impossibility to take into account accurately the three-dimensional (3D) shape of the vocal tract. A second one is that it can intrinsically not describe a non uniform transverse acoustic field. The consequence of this is particularly visible at high frequencies, above about 4-5 kHz, when other modes than plane waves (higher order modes) can propagate [11], [40], [41]. This induces additional resonances and antiresonances in the vocal tract that cannot be predicted by TLM. However, it has also consequences at lower frequencies in the vicinity of discontinuities where the transverse acoustic field can be slightly nonuniform. This induces inaccuracies in the prediction of the resonance frequencies. Note that it is theoretically possible to reduce these inaccuracies by applying length corrections to the TLM segments [49].
The consequences of each of the two limitations of the TLM have been explored. On one hand, the effect of geometrical details such as the piriform fossae, 1 the vallecula, 2 the inter-dental space, the nasal cavity or more generally the precise 3D vocal-tract shape have been observed to affect resonance frequencies and induce additional resonances and anti-resonances [53], [54]. On the other hand, the higher-order modes have been shown to induce additional resonances and anti-resonances at frequencies above 4-5 kHz [11] and affect speech radiation patterns [12], [16]. Note that side branches can be modelled with TLM, however, to our knowledge, no correction has been proposed to take into account transverse modes.
So far, most of the works using 3D acoustics simulations have been limited to small numbers of static geometries. Some of the most recent works are focused on the generation of articulated sounds such as specific diphthongs [3], [5], [18], [31] or consonant-vowel sequences [4], [32].
The main reason for this limitation is that the 3D numerical methods are strongly limited by their high computational cost, which prevent to use them as a substitute for the TLM.
To illustrate that, using the FEM in the frequency domain, about 10 hours are required to simulate 1000 frequencies using a standard single processor (without meshing the outer space though) [10]. The BEM may have a lower computational cost because it does not require to mesh the outer space and involves a smaller number of degree of freedom compared to FEM and FDM [35]. However, its application to vocal tract acoustics simulation is very anecdotal [34].
One possibility to reduce this cost is to use analytical or semi-analytical methods which have a lower computational cost, but are limited in term of geometrical accuracy. As for the TLM, it is possible to take advantage of the elongated shape of the vocal tract to simulate the propagation of higherorder modes, in addition to the plane waves simulated with the TLM. This type of method, which simulates several transverse modes instead of just the plane mode is called multimodal method. For example Motoki [42], [43] have taken advantage of the lower computational cost of this type of method to run numerous simulations with small perturbations of the geometry. However, the use of rectangular crosssections, which allowed a fully analytical solution with a very small computational cost, implied a strong geometrical approximation of the vocal tract shape. Blandin et al. [11] improved this method by introducing a numerical computation of the transverse modes that substantially increased the realism of the cross-sectional shapes. However, the curvature was not taken into account, the cross-sectional area was piecewise constant and no wall losses were simulated. The multimodal method can also help to understand the physical phenomena involved in speech production better than fully numerical methods, as they provide a theoretical background (higher-order modes propagation) that can be related to experimental observations. For example, it was useful to Yoshinaga et al. [56] for understanding better the mechanisms of sound generation of the fricative /s/. However, in this case as well, the use of rectangular cross-sections made the geometry used less realistic than what could be achieved with fully numerical methods.

C. PROPOSED APPROACH
The objective of this work is to achieve a better trade-off between geometrical accuracy and computational cost by combining the advantages of numerical methods and the state-of-the-art multimodal method.
For this purpose the vocal tract geometry is sliced in segments in which the cross-sectional shape is considered constant, as illustrated in Fig. 1. The local transverse modes are computed with two-dimensional (2D) FEM, and the propagation along the waveguide axis is solved using a multimodal formulation. The wave problem is solved for a static geometry. Similar semi-analytical finite-elements (SAFE) methods have already been applied to the simulation of street acoustics [47] or vibrations in rails [6], [26], [33], [38]. The multimodal formulation of the propagation along the waveguide axis allows us to take into account the effects of the curvature and cross-section area variations, as detailed below in Sec. II. An implementation of this method is compared with a full FEM solution of the wave equation in a single segment and in three vocal tract geometries corresponding to the vowels /a/, /i/ and /u/. The geometries, the parameters used for the multimodal simulations and the method used for the full FEM simulations are presented in Section III. The outcomes of both methods are compared in Section IV.

II. MULTIMODAL METHOD
Throughout the paper, small bold letters denote vectors, capital bold letters denote matrices and ∂ x is understood as ∂ ∂x .

A. SEGMENTATION OF THE 3D GEOMETRY
The 3D vocal tract geometry can be provided as a 3D surface mesh (e.g. from medical image) or by a set of functions defining 3D surfaces (e.g. from an articulatory model). In order to apply the proposed method, they are first segmented in portions with constant cross-sectional shape, as illustrated in Fig. 1. For this purpose, a curve, referred to as the centerline, passing through the vicinity of the centers of the transverse cross-sections and contained in the sagittal plane is defined [9]. The 3D geometry is cut in several planes perpendicular to the centerline. A segment is defined between two consecutive cuts. It is characterized by a contour extracted from one of the cuts, the local curvature of the centerline and a scaling factor describing the area variation along the centerline (see Fig. 2a). Keeping the contour shape constant along the segment allows one to use the same transverse modes along the segment. Several methods have been proposed to implement this segmentation process [7], [9], [28], [36]. For classical TLM simulation, the number of segments generated for vocal tract geometries ranges between 40 and 80 [50], [52]. For 3D simulations, the comparison of FEM simulations of geometries without segmentation and segmented with 40, 60 and 80 segments showed that the segmentation process induces an upward shift of the resonance frequencies (lower than 5% below 5 kHz for 80 segments), the global shape of the transfer functions being preserved [2].

B. EXPRESSION OF THE WAVE EQUATION INSIDE THE SEGMENTS 1) GEOMETRICAL TRANSFORMATION
Following earlier works on the multimodal formulation in complex-shaped waveguides (2D [39] or 3D with circular cross-section [30]), we define a geometrical transformation from the Cartesian coordinates (X , Y , Z ) in which the segments are curved with a varying cross-sectional area (see Fig. 2a), to the coordinates (x, y, z) in which they are straight with a constant cross-sectional area, as illustrated in Fig. 2b. The position m of a point can be described by the coordinates (x, y, z) in a Frenet-Serret frame (t, n y , n z ), whose origin is a point C of the centerline, to which is associated the position vector s. The vector t is tangent to the centerline, and the vectors n y and n z are orthogonal to each other and contained in the local transverse plane (y, z). x is the curvilinear abscissa along the centerline curve. A scaling function l(x) is defined that describes the variations of the cross-section dimensions along the centerline. The position vector m of a point in both coordinate systems can be expressed as An angle α(x) is defined between t and i (see Fig. 2a). The differentials of the vectors s, n y and n z are dn y dx = 0 and where κ(x) is the local curvature of the centerline. In order to apply the proposed transformation to the wave equation, it is necessary to compute its Jacobian matrix J. For this purpose, using the derivatives given in Eq. (2), Eq. (1) is differentiated where f (x) = 1−zκl and l = dl/dx. The vectors t, n y and n z are expressed as a function of the vectors i, j, k and the angle α(x), and Eq. (3) is expressed in a matrix format with the Jacobian matrix where R α is a rotation matrix of angle α. The determinant of this Jacobian is det J = 1 fl 2 .

2) WAVE EQUATION
The wave equation in Cartesian coordinates (X , Y , Z ) is where X p = ∂ 2 X p + ∂ 2 Y p + ∂ 2 Z p is the Laplacian operator, p the acoustic pressure, k = ω/c the wavenumber, ω the angular frequency, c the sound speed and a time factor e jωt is understood. The wall boundary condition is expressed in Cartesian coordinates as ∇ X p · n X + jkζ p = 0, where ∇ X p = ∂ X p ∂ Y p ∂ Z p t is the gradient operator, n X the outward pointing normal to the boundary and ζ = Z 0 /Z w is a boundary admittance coefficient related to the air impedance Z 0 = ρc, with ρ the volumic mass and c the sound speed. The wall impedance Z w is defined as where u is the particle velocity. According to Eq. (2.1) in [44], the Laplacian X can be expressed as a function of the divergence div and the gradient ∇ expressed in the transformed coordinates as where Substituting directly this expression in the wave equation Eq. (6) gives the wave equation in transformed coordinates According to Eq. (2.10) in [44], the outward pointing normal n X in Cartesian coordinates can be expressed as a function of the outward pointing normal n in the transformed coordinates as Knowing that from Eq. (4) the gradient in Cartesian coordinates is related to the gradient in the transformed coordinates as ∇ X = J∇, the boundary condition equation Eq. (7) can be written as After rearrangement, it can be rewritten as where W is the surface constituting the wall boundary of the waveguide. In order to transform the wave equation Eq. (11) in a first order evolution equation along x, a secondary field q is introduced Identifying its expression in the wave equation, it can be rearranged as where the local frame transverse coordinate vector is v = y z t , the transverse gradient is ∇ ⊥ = ∂ y ∂ z t and VOLUME 10, 2022 the transverse divergence is div ⊥ u = ∂ y u y + ∂ z u z . Likewise, an expression is obtained for the boundary condition where n x is the x component of the outward pointing normal n and n ⊥ is the projection of the normal n in the local frame transverse plane (y, z).

3) MODAL FORMULATION
The fields p and q are decomposed into the transverse modes ϕ n (y, z) The transverse propagation modes ϕ n fulfill the orthogonality relationships where a, b ≡ Sā b dS and a, b ≡ Sā · b dS is the scalar product, S is the cross-sectional surface,ā is the complex conjugate of a and γ 2 m is the eigenvalue associated to the transverse mode ϕ m .
Combining Eqs. (16) and (17) and projecting them onto the transverse propagation modes ϕ n yields a set of coupled equations describing the evolution of the vectors p and q where where where is the contour of the segment and the component ζ n corresponds to the boundary impedance specific to the mode ϕ n . This allows one to account for the tangential velocity specific to each mode in the computation of viscous losses as proposed in [17]. For the full development of the modal projection, see Appendix V-B.

C. COMPUTATION OF THE ACOUSTIC FIELD 1) COMPUTATION OF THE TRANSVERSE MODES AND THE PROPAGATION MATRICES
Since the vocal tract cross-sectional shapes are quite different from simple shapes such as ellipses or rectangles, the transverse eigenvalues problem giving the transverse modes ϕ n and the associated eigenvalues γ 2 n is solved using 2D finite elements. A Neumann boundary condition is considered as it is closer to the properties of the vocal tract walls and thus, ensure a faster convergence of the method. Note that an improved formulation using a so called supplementary mode would further improve the convergence [21]. The solution obtained is used to compute the propagation matrices E, C, K 2 and D, defined in Eqs. (25), (26), (27) and (29).
The transverse functions ϕ n can be expressed as the summation of shape functions e i (y, z) corresponding to the finite elements where ξ ni are the amplitudes of the shape functions. The transverse modes ϕ n and their associated eigenvalues γ 2 n are obtained solving the eigenvalue problem where A is the stiffness matrix whose terms are and M is the mass matrix whose terms are The propagation matrices C, D, E and K R2 can be obtained from the eigenvectors ξ n solutions of Eq. (31) as where For the details of the implementation of the 2D finite element method the reader is referred to classical textbooks such as [37]. The computation of the transverse modes and the propagation matrices has been validated comparing it with analytical solutions obtained for a rectangular shape of dimensions 5.5 cm × 3.2 cm. The expression of the transverse modes and the propagation matrices for a rectangular shape are provided in Appendix V-A.
The relative error of the matrices C, D, E and K R2 has been computed as the ratio of the norm of the difference between the numerical and the analytical solutions over the norm of the analytical solution. For example, for the matrix C the relative error is ||C A − C||/||C A ||, with C A the analytical solution. It is presented for various densities in Figs. 3a and 3b for the 10 and 50 first modes respectively. The mesh density is defined as the ratio of the square root of the cross-sectional surface over the average side length of the triangular elements. It can be seen as the number of elements per characteristic length.
As expected, the relative error decreases when the mesh density is increased. For the 10 first modes a density of 15 is sufficient to obtain an error lower than 1%. But for the 50 first modes, one need a density of 30 to achieve this. The error depends on the type of matrix: the matrix C has the lowest errors, and the matrix D has the highest. This may be due to the approximation of the gradient of the modes ∇ ⊥ ϕ for the matrices D and E which could potentially generate more errors, and of the integration on the contour for the matrix K R2 which involves less elements than surface integration. The impact of the mesh density on the computation of transfer functions is evaluated in Section IV-A.

2) CONNECTION OF THE SEGMENTS
As shown in Fig. 1, the vocal tract geometry is described by a succession of segments which need to be connected to each other. This is done by assuming the continuity of the acoustic field at the junctions, as proposed by Pagneux et al. [45]. Fig. 4 illustrates a junction between two waveguide segments with different cross-sectional contours and area varying along the propagation axis. The cross-section of the segment a is contained in the one of the segment b, and, hence, the surface S a (in grey in Fig. 4) is smaller than the surface S b . The variations of cross-sectional area inside the  Relationships between the modal amplitudes at both sides of the interface between the segments a and b can be derived from the condition of continuity of the acoustic field on the common surface S a and the boundary condition q b = 0 on the surface S b − S a of the wall of the largest segment where where S aX is the surface S a expressed in Cartesian coordinates (X , Y , Z ). The scaling factors l a and l b must be introduced as the integration is done in Cartesian coordinates (X , Y , Z ) (see Appendix V-C). Combining Eqs. (42) and (43) and introducing the impedance and admittance matrices Z and Y defined as p = Zq and q = Yp allows one to derive relationships between the impedance and admittance matrices on side a and b However, the segment interfaces obtained by slicing vocal tract geometries do not always correspond to this case: the cross-section of side a may not be fully contained in the cross-section of side b. According to Ginsberg [27], this case cannot be treated directly using an orthogonality based method as used here. To overcome this limitation, zero length segments whose cross-section is the intersection between the cross-section a and b are introduced in this case. This allows one to use the expression derived above. Note that, according to Ginsberg [27], a collocation based method would not have this limitation, and could be used as an alternative. VOLUME 10, 2022 FIGURE 5. Disposition of the physical quantities solution of the wave problem: the impedance Z and admittance Y matrices, the modal amplitude of the acoustic pressure p and the related quantity q, at the input and output of the segment s n .

3) PROPAGATION
The vectors p and q and the impedance and admittance matrices Z and Y can be computed solving numerically the Ricatti equation with a Magnus-Möbius scheme as proposed by Pagneux [46]. The vectors p(x n+1 ) and q(x n+1 ) can be obtained at a position x n+1 from their value at the position x n through the relationship where is, for the fourth order of the Magnus scheme = exp (20) and As the boundary condition at the mouth is a radiation impedance matrix, it is necessary to compute the impedance matrix Z Z Z n+1 at a position x n+1 from the impedance Z n at a position x n . This is obtained through the relationship

D. GENERAL WORKFLOW
The whole process of solving the wave problem is described in the pseudo-code algorithm 1. The different physical quantities implied in solving the wave problem are shown in Fig. 5.
The equation (20) is solved by setting a radiation impedance boundary condition at the mouth end and a particle velocity field at the glottis end. The radiation impedance condition is described by a radiation impedance matrix which can be numerically integrated following the method described in [13]. It is necessary to compute this radiation impedance matrix in the Cartesian coordinates system. Thus, the distances must be multiplied by the factor l at the end of the last of the N s segments. In an infinite baffle, the radiation impedance matrix Z out Ns is given by where h = (y − y 0 ) 2 + (z − z 0 ) 2 . Alternatively, the radiation impedance matrix can be computed using the method proposed by Felix and Doc [22]. The impedance and admittance matrices are computed from the mouth to the glottis, and the acoustic field is computed from the glottis input particle velocity to the mouth using the impedance and admittance matrices computed previously.
The acoustic pressure radiated to an external point (x r , y r , z r ) is computed with the Rayleigh-Sommerfeld integral in the same scaled space where r = x 2 r + (y − y r ) 2 + (z − z r ) 2 and (y, z) ∈ S. This simulation method was implemented as a new functionality of the articulatory synthesizer VocalTractLab [8], which will be made available to the public soon on the website www.vocaltractlab.de.

III. SIMULATIONS
The method presented in section II has been applied to test geometries and compared to FEM simulations. This section presents the test geometries used, the parameters for the multimodal simulations and the FEM used. The results of these simulations are compared and discussed in Section IV.

A. GEOMETRIES 1) SINGLE SEGMENT
The first test geometry is a single horn shaped segment with a circular cross-section. Its centerline is defined as a circle arc of radius 7.5 cm and angle 130 • yielding a total length L = 16.95 cm (see Fig. 8). These dimensions are chosen to be of the same order as vocal tract dimensions. The scaling factor l(x) along the centerline is

2) VOCAL TRACT GEOMETRIES
After testing the method on one segment, it was tested on more realistic vocal tract geometries constituted of multiple segments. They were generated by the articulatory model Compute the matrices A, M, M z , A z , B and R integrating Eqs. (32), (33), (38), (39), (40) and (41) 5: Compute the transverse modes ϕ m and their eigenvalues γ 2 m solving Eq. (31) 6: Compute the matrices C, D, E and K R2 using Eqs. (34), (35), (36) and (37)  7: end for 8: for Each segment do 9: if the segment is connected to another one then 10: Compute the mode matching matrix F integrating Eq. (44) 11: end if 12: end for 13: for each frequency do 14: Compute the radiation impedance matrix Z out Ns following [22] or [13] 15: n = N s 16: while n > 0 do 17: Propagate either Z n or Y n depending on the input using Eqs. (51) or (52)  Propagate p n and q n using Eq. (47) 32: if entrance area of s n+1 > exit area of s n then 33: Compute q in n+1 at the entrance of the next section using Eq. ( Compute the radiated acoustic pressure using the Rayleigh-Sommerfeld integral Eq. (54) 43: end for implemented in VocalTractLab 2.3 (www.vocaltractlab.de). The geometries provided by this software have been fitted on MRI scans of a male subject [8]. The vowels /a/, /i/ and /u/ were selected as they represent well the variety of vocal tract shapes for vowels. To ensure the best geometrical accuracy possible, the maximal number of segments provided by the segmentation algorithm implemented in VocalTractLab 2.3 was used. This results in 129 segments whose length ranges between 0.55 mm and 1.6 mm. The corresponding contours were exported in text files so that they could be used with Matlab and GID 3 to generate finite element meshes. During this process, the consecutive contours were linked by straight segments. This introduces slight differences with the geometry actually simulated with the multimodal method. In fact, since the proposed method considers the contour shape as constant within each segment, small discontinuities are introduced, making the geometry simulated rough compared to the one simulated with FEM. The potential consequences of such discontinuities are discussed in the section IV-B.

B. PARAMETERS FOR THE MULTIMODAL SIMULATIONS
For both the single segment and the vowel geometries, the multimodal simulations were performed using the same value of wall boundary admittance coefficient ζ = 0.005 as in [2]. A uniform input particle velocity G(f ) = q in · ϕ, with f the frequency, was imposed on the glottal cross-sectional surface G . This was done setting the amplitude of the plane mode to q in 0 = −j2πf ρ and the other ones to zero. A vocal tract transfer function was computed as where P o (f ) is the acoustic pressure tracked inside the geometries at 3 mm from the center of the exit surface, G(f ) = −j2πf ρϕ 0 the uniform input velocity (ϕ 0 being the plane mode) and A g the glottal cross-sectional area.

1) SINGLE SEGMENT
A zero pressure boundary condition was simulated at the open end of the single segment. It was approximated by setting a uniform impedance of 10 −16 kg.m −2 .s −1 on the exit surface. This was done by defining Z out Ns as a diagonal matrix whose diagonal terms are all equal to this value. The impact of different numbers of modes, mesh densities and number of integration points were evaluated. Simulations were performed using the 6, 17, 32 and 53 first modes (ordered by increasing cutoff frequency). The choice of these numbers is done to include the second, third, fourth and fifth axi-symmetric modes which have significant contribution to the acoustic field in geometries with circular cross-section. The mesh density was varied from 10 to 30 and the number of integration points from 25 to 200 (about 1.5 to 12 points per centimeter).

2) VOWELS
The boundary condition at the open end was simulated using a radiation impedance matrix numerically integrated using the method proposed in [13]. The number of modes in each segment was determined by a maximal cutoff frequency criterion: all the transverse modes having a cutoff frequency lower than a chosen value were included. This results in a reduced number of modes for the smaller segments, which allows one to minimise the number of modes to be simulated over the entire geometry for a given accuracy. For a maximal cutoff frequency of 40 kHz the number of transverse modes ranges between 2 and 42. Different values of this criterion ranging from 20 kHz to 60 kHz were used to evaluate the influence of this parameter (see section IV-B).
Different geometrical approximations were explored: • straight segments with constant cross-sectional area (κ = 0, l(x) = 1), corresponding to the state of the art multimodal simulations of vocal tract geometries [11], where R c is the curvature radius determined by the angle between the normals of the cutting planes at both ends of the segment, • segments with curvature (κ = 1/R c ) and area variations defined as a linear interpolation between the area at the entrance and the end of the segments (case presented in Fig. 2).

C. FINITE ELEMENT SIMULATIONS
FEM simulation were performed to compare to the multimodal simulations. It was used to numerically solve the wave equation in the time domain for the acoustic pressure p(m, t) and particle velocity u(m, t) 1 ρc 2 ∂ t p + ∇ · u = 0, (57a) in a computational domain . Equation (57) is supplemented with the following set of boundary and initial conditions (see Fig. 6) is used to introduce a particle acoustic velocity at the glottal cross-sectional area G , Z w is the wall impedance imposed at the vocal tract walls W to introduce losses, and Z 0 = ρc is the air impedance used in (58d) to implement a Sommerfeld boundary condition so as to absorb sound waves reaching the outer boundary ∞ . Z w is computed from the boundary admittance coefficient ζ = Z 0 /Z w , which was set to the same value as for the multimodal simulations: ζ = 0.005. A rigid wall boundary condition was enforced on the baffle surface H of the external domain. For the vocal tract geometries, the computational domain consists of a vocal tract set in a hemisphere of radius 0.16 m to allow sound waves radiate outwards from the mouth (see Figure 6). In the single segment geometry, however, the radiation domain is removed and terminates at the horn exit. A zero pressure release condition (p = 0) is imposed on it to emulate an open-end condition. A volume mesh was next generated discretizing in a set of tetrahedra of average size 0.003 m in the horn or vocal tract, and {0.004, 0.0065} m in the inner and outer radiation domains of the vocal tract geometries.
The wave equation (57) with boundary and initial conditions (58) was next solved in using a custom FEM code based on the stabilization strategy defined in [29]. The stabilization parameter was reduced from 0.05 to 0.01 compared to [29], which allowed us to use larger time steps while preserving a low numerical dissipation. The following Gaussian pulse was used as input for the vocal tract, with T gp = 0.646/f c and f c = 10 kHz. This pulse was low-pass filtered at 10 kHz to avoid numerical errors above the maximum frequency of analysis (10 kHz). A numerical simulation of a time event lasting 0.05 s was then performed using a time step of 2e-6 s. The acoustic pressure p o (t) was tracked close to the vocal tract exit, inside of it, at 3 mm from the mouth aperture center. This helped us to minimize possible spurious reflections coming from ∞ , compared to capturing the acoustic pressure in the radiation domain. A vocal tract transfer function was finally computed in the same way as for the multimodal simulations defined in Eq. (56). P o (f ) and G(f ) were computed as the Fourier transform of p o (t) and g(t) respectively. This simulation method showed in previous studies a good agreement with experimental data [2], [11]. More generally, the FEM can be trusted to describe accurately vocal tract acoustics, as illustrated by the good agreement obtained by Fleischer et al. [25] between a different implementation of FEM and experimental data.

IV. COMPARISON WITH FINITE ELEMENT SIMULATIONS A. SINGLE SEGMENT
The figure 7 shows the transfer function of the single segment computed with FEM and the multimodal method with 53 modes, a mesh density of 30 and a 200 integration points (about 12 points per centimeter). It shows the typical resonances produced by the plane mode, but also exhibits troughs and additional resonances related to transverse resonances induced by the curvature and higher order modes propagation. See as an example the trough located at 5755 Hz and the corresponding acoustic field, computed with the multimodal method, presented in Fig. 8, in which these effects are clearly observable. The wall boundary condition described in Eq. (7) implies that in the vicinity of the walls the normal gradient of the acoustic pressure is very small. This implies that the isoamplitude lines are almost perpendicular to the walls. This can be observed in Fig. 8, thus, confirming that this boundary condition is properly simulated.
The influence of the number of modes, the mesh density and the number of integration points on the simulations was examined by computing the transfer function with different values of these parameters. The resonance frequencies, -3 dB bandwidths and amplitude of the resonances of the obtained This reference case was first compared with the FEM simulation with which a very good agreement was observed. As can be seen in Fig. 7, the amplitude of both transfer functions is almost perfectly superimposed, with some slight differences increasing toward the high frequencies which consists mainly in stronger losses and slightly higher resonance frequencies for FEM. The latter can be attributed to the stiffening effect of FEM [19]. More precisely, the average differences are 0.16%, 15.6% and 1.2% for the resonance frequencies, bandwidth and amplitudes respectively. The maximal differences are 0.33%, 51.3% and 3.7% for the frequencies, bandwidth and amplitudes respectively. The significant differences in bandwidth can be attributed to the time and spatial discretization of FEM, which tends to artificially increase the losses with frequency. Indeed, note that below 8 kHz the FEM and MM curves match to a large extend, reducing the average differences to 0.1%, 6.8% and 0.45% for the resonance frequencies, bandwidths, and amplitudes, respectively. It is to be mentioned that FEM results could be improved using a finer mesh and smaller time step, but at the price of increasing the computational cost.
The relative difference between different values of parameters and the reference case (mesh density of 30, 53 modes and 200 integration points) are presented as box plots in Fig. 9. When a parameter is changed, it is implicitly understood that VOLUME 10, 2022 the other parameters are set to their maximal values (e.g. the mesh density and the number of integration points are set to 30 and 200 when the number of modes is varied). The differences reduce as the parameters get closer to the reference case. The parameter which induces the greatest impact is the number of modes. Large differences are observed for the bandwidth which is equally influenced by the three tested parameters. 32 modes are sufficient to achieve an average difference lower than 0.1% for the frequencies and amplitudes, but still more than 1% for the bandwidths.

B. VOCAL TRACT GEOMETRIES 1) TRANSFER FUNCTIONS
The transfer functions of the vowel geometries obtained with different geometrical approximations and with FEM are presented in Fig. 10. There is an overall good agreement for the 3-4 first resonances between the different geometrical approximations: the difference in resonance frequencies with FIGURE 11. Acoustic pressure amplitude computed with the proposed method in the sagittal plane and in two transverse cutting planes indicated as dashed lines in a vocal tract geometry corresponding to the vowel /i/ at 7500 Hz for an input volume velocity of 1cm 3 /s. This corresponds to the 6 th resonance of the transfer function obtained with curved segments in Fig. 10. The iso-amplitude lines are highlighted in black.
FEM remains lower than 5% in all cases. This is in line with the fact that the plane wave assumption is a good approximation up to 4-5 kHz [15].
The agreement with the FEM solution is better when the curvature is considered, and further improved when the variations of cross-sectional area are taken into account. The resonance frequencies are generally lower with the straight segment approximation. In particular for the second resonance of the /i/ (see Fig. 10b). This is in line with the findings of Arnela et al. [2]. Furthermore, taking into account the curvature just by changing a parameter of the model confirms that this effect is induced by the curvature alone. As a matter of fact, in the study of Arnela et al. [2] this could have also been induced by the method used to straighten the vocal tract geometries.
There is overall more difference between the geometrical approximations above 4-5 kHz, in particular for /i/ and /u/. Taking into account the variation of cross-sectional area substantially improves the agreement with FEM in this frequency range for /u/ (see Fig. 10c). Thus, the high frequency acoustic properties of the vocal tract appear to be sensitive to small geometrical details. This is in line with the findings of Motoki [43] who showed that small geometrical perturbations of simplified vocal tract geometries induced more important variations of the transfer function above 4 kHz. Despite the roughness introduced by the contour change, there is a relatively good agreement with FEM simulation performed with a smoother geometry.
Substantial differences remain between FEM and the multimodal method for the case accounting for the curvature and the cross-sectional area variations (about 1% of difference in the resonance frequencies), see also around 8 kHz for /i/ and /u/ (see Figs. 10b and 10c). This can be attributed to remaining geometrical differences due to the discontinuities introduced by the contour changes, and to intrinsic differences between both methods. The small geometrical differences could be slightly reduced, for example by using a better interpolation of the cross-sectional area, or by refining the segmentation of parts having significant changes of cross-sectional shape. However, the gain in accuracy may be of the order of the uncertainties in the extraction of the vocal tract geometries from medical images. It would also be possible to optimise the segmentation by using longer segments in the parts having little variations of cross-sectional shape, and thus further reduce the computational cost. The algorithm used to generate the centerline curve can have an impact as well, as it was observed by Arnela et al. [5]. However, it is expected to be smaller for 3D simulations than for TLM simulations.
The losses are slightly lower with the multimodal method: the resonances have smaller -3 dB bandwidths (of the order of 10%). This can be attributed to the perfectly reflective surfaces which are considered at the discontinuities between the segments (see Section II-C2). On the other hand, the numerical scheme used for the FEM could tend to slightly artificially increase the losses as the frequency increases due to the adopted stabilization strategy and the time and space discretization (see Section III-C).

2) ACOUSTIC FIELD
The Fig. 11 illustrates the acoustic field computation inside a segmented geometry with the field corresponding to the 6 th resonance of the vowel /i/ (7500 Hz). As for the single segment, the effect of the transverse modes is clearly visible as transverse variations both in the sagittal and transverse planes. The iso-amplitude lines tend to be perpendicular to the walls of the geometry, confirming that the boundary condition is properly simulated. When looking carefully, the effect of the segmentation is visible as small discontinuities in the iso-amplitude lines where there is a strong discontinuity at the interface between two segments (see as an example at the expansion at the base of the geometry). This may be improved by using more transverse modes to describe the strong discontinuities.

3) INFLUENCE OF THE NUMBER OF MODES
Multimodal simulations were run with values of the maximal cutoff frequency ranging from 20 kHz to 60 kHz using the best geometrical approximation (taking into account the curvature and the variations of cross-sectional area). The resonance frequencies of the transfer functions were compared with the solution obtained with 60 kHz and the FEM solution. The distribution of the relative error is presented as box plots in Fig. 12. For the 3 vowels tested the simulations converge toward the 60 kHz solution, with less than 1% of difference from 40 kHz. It does not strictly converge toward the FEM solution, as from 40 kHz onward the difference with FEM remains stable or slightly increases, but remains smaller  than 1%. Note also that for some resonances the difference does not decrease rigorously monotonously: it can increase for some specific values of the maximal cutoff frequency (see as an example the outlier at 30 kHz for /a/). points per segment (about 20 points per centimeter) and 1000 frequencies. The computations were run in a single thread on a processor Intel R Xeon R W-2145 3.7 Ghz. The computation time varies with the vowel geometries, and from 30 kHz onward, it increases exponentially. Fig. 13b shows the distribution of the computation time over the different tasks required to solve the wave problem for the vowel /i/ (vowels /a/ and /u/ show similar trends). They consist in the computation of the transverse modes, the junction matrices (Eq. (44)), the radiation impedance matrix (Eq. (53)), the propagation of the physical quantities (Z, Y, p and q) and the computation of the radiated field (Eq. (54)). The computation time for the propagation is split into the computation of the matrix exponential in Eq. (48) and the other parts of the task since this part of the process requires a lot of computation time. Most of the computation time is dedicated to the computation of the transverse modes and the matrix exponential. When few modes are used, the computation time is mainly spent in the computation of the modes. As the number of modes used is increased, the time spent in the computation of the matrix exponential increases and becomes the task occupying most of the computation time. Note that the computation of the transverse modes, the junction matrices and the radiation impedance are done only at the beginning of the whole process. Thus, the time required for these tasks is not affected by the number of frequencies at which the transfer function is computed. A maximal cutoff frequency of 40 kHz appears to be a good compromise between the accuracy and the computation time.

4) COMPUTATION TIME
For frequency domain FEM, 10 hours were necessary to compute 1000 frequencies with a single processor [10]. However, in this example a simplifying assumption was made for the radiation and the external space was not meshed, which reduces the computation time in comparison to a more accurate simulation. Thus, it can be concluded that the proposed method provides simulations with the same degree of accuracy as FEM, but with a computation time reduced by about a factor 10. The implementation used can be optimised to achieve even shorter computation times. One can also mention that it requires far less exchange memory, and it is thus easier to run on a standard laptop without special equipment for simulations (e.g. additional memory).

V. CONCLUSION
The proposed method can simulate curved vocal tract geometries with realistic cross-sectional shapes and cross-sectional area variations within the segments. The same degree of accuracy as FEM is achieved for the computation of transfer functions of a single segment (about 0.2% of differences in the resonance frequencies). Accounting for the curvature and the cross-sectional area variation substantially improves the agreement with FEM, particularly above 4-5 kHz. The segmentation process of realistic 3D vocal-tract geometries introduces small differences, but the agreement with FEM simulations remains very good (less than 1% of difference in the resonance frequencies). A good trade-off between accuracy and computation time is achieved using a mesh density of 15, about 10 integration points per centimeter and a maximal cutoff frequency of 40 kHz. With these parameters, the proposed method achieves computation times about 10 times shorter than FEM while preserving the same degree of accuracy. The proposed method can be used to synthesize articulated sounds by simulating a succession of static geometries corresponding to various instants of the articulation motion and following the approach proposed by Sondhi and Schroeter [49].

A. ANALYTICAL EXPRESSIONS OF THE TRANSVERSE MODES AND PROPAGATION MATRICES FOR A RECTANGULAR SHAPE 1) TRANSVERSES MODES
The transverse modes corresponding to a rectangular shape of side lengths a and b along y and z respectively, can be expressed as and the corresponding eigenvalues are 2) MATRIX C The terms of the matrix C are where C y and C z are the integrals over y and z respectively. 2 (cos(nπ) − 1) if j = 0 and n = 0, √ 2b (jπ) 2 (cos(jπ) − 1) if j = 0 and n = 0. (64)

5) MATRIX K R2
RÉMI BLANDIN was born in Pau, France, in 1989. He received the master's degree in acoustics from the University Pierre and Marie Curie, Paris, France, in 2013, and the Ph.D. degree in acoustics from the University of Grenoble, France, in 2016.
Since 2019, he has been working as a Postdoctoral Researcher with the Dresden University of Technology (TU Dresden). His research interests include waveguide acoustics, semi-analytical simulation methods, speech production, articulatory synthesis, and speech directivity.
MARC ARNELA received the degree in telecommunications engineering, with a specialization on image and sound, the M.Sc. degree in networks and telecommunications engineering, and the Ph.D. degree in acoustic engineering from La Salle-Universitat Ramon Llull (URL), Barcelona, Spain, in 2009, 2010, and 2015, respectively, and the professional degree in music education, with a specialization on piano, from the Conservatori Municipal de Música de Barcelona, in 2009. He is currently a Lecturer and a Researcher on acoustics with La Salle-URL, where he is also the Coordinator of the Acoustics Laboratory. His main research interests include the numerical production of voice using finite element methods, the design and experimental testing of new loudspeakers made of ultrasonic sensors, and the acoustics of ducts and mufflers with a special interest on the acoustic black hole effect. VOLUME 10, 2022 SIMON FÉLIX received the M.S. degree in fundamental physics from Université Paris-Sud, France, and the Ph.D. degree in acoustics from Le Mans University. He is currently working as a Senior Researcher with the Laboratory of Acoustics, French National Centre for Scientific Research (CNRS), Institute of Acoustics-Graduate School, Le Mans, France. His research interests include wave propagation in inhomogeneous waveguides, or structured complex media whose heterogeneity, anisotropy, or regularity (symmetries, periodicity, and disorder) are used for wave control applications. The fields of application of this work cover various fields of wave physics and various problems, such as urban acoustics, musical acoustics, metamaterials, or wave control.