An Innovative Architecture of Full-Digital Microphone Arrays Over A²B Network for Consumer Electronics

Microphone arrays of various sizes and shapes are currently employed in consumer electronics devices such as speakerphones, smart TVs, smartphones, and headphones. In this paper, a full-digital, planar microphone array is presented. It makes use of digital Micro Electro-Mechanical Systems (MEMS) microphones, connected through the Automotive Audio Bus (A²B). A clock propagation model for A²B networks, developed in a previous work, was employed to estimate the effects of jitter and delay on microphone arrays. It will be shown that A²B allows for robust data transmission while ensuring deterministic latency and channel synchronization, thus overcoming the signal integrity issues that usually affect MEMS capsules. The microphone positioning is also discussed, since it greatly affects the spatial accuracy of beamforming. Numerical simulations were performed on four regular geometries to identify the optimal layout in terms of number of capsules and beamforming directivity. An A²B planar array with equilateral triangle geometry and four microphones, three at the vertices and one in the center, was built. Experimental measurements were performed, showing excellent agreement with the numerical simulations. Finally, the concept of an array of arrays (meta-array) is presented, designed by combining several triangular units and analyzed through numerical simulations.

transmission architecture are critical aspects, as they have a significant influence on the audio quality. The geometry of the array, the number of capsules and their layout, instead, mostly affect the spatial accuracy and working frequency range of beamforming. As a result, we see a steadily growing demand for systems capable of supporting more and more channels. Examples of massive multichannel microphone arrays can be found in [10]-[12].
Regarding the choice of capsule type, most of today's solutions employ analog microphones, which may provide high-quality audio signals. On the other hand, they entail bulky wiring and noise immunity problems, particularly in case of long wires connecting the capsules with the Analog-to-Digital (A/D) converters. In addition, analog capsules, pre-amplifiers, and A/D converters increase the cost and design complexity of the system. Conversely, low-cost digital MEMS microphones are more robust to electrical noise, usually at the price of worse acoustic performance (dynamic range and Signal-to-Noise Ratio).
In this paper, a full-digital solution for multichannel arrays is presented, overcoming most of the above-mentioned limitations. Although the usage of MEMS capsules is not new [13]-[15], the adoption of the A²B bus [16], [17] is significantly innovative. In fact, most of the existing solutions employing digital MEMS capsules make use of the I²S or TDM interfaces [18], [19], which do not provide robust transmission of digital signals over distance, thus limiting the length of the cabling to a few centimeters. An A²B network, instead, can transport up to 32 channels through a series of distributed nodes connected in daisy chain over up to 40 m. The connection between nodes is made with a single low-cost Unshielded Twisted Pair (UTP) cable, which can also carry power supply together with data. Each node can include transducers such as, but not limited to, microphones and accelerometers. To manage the data flow, a single low-cost A²B transceiver is required on each node, dramatically reducing the cost of the network. At the same time, it offers several advantages for building microphone arrays, such as a low deterministic latency of just two samples (e.g., less than 50 μs at 48 kHz) and clock synchronization. Last but not least, the A²B bus is expandable: the number of capsules of an array, or in general the complexity of an A²B network, can be increased just by adding nodes to the daisy chain. This makes it possible to cover large areas by distributing the nodes, and to build modular arrays, or "meta-arrays", by combining multiple nodes into a single physical device.
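The latency figure quoted above can be checked with a one-line calculation. The following is only an illustrative sketch; the function name `latency_us` is hypothetical and not part of any A²B tool.

```python
def latency_us(n_samples: int, sample_rate_hz: float) -> float:
    """Duration of n_samples sampling periods, in microseconds."""
    return n_samples / sample_rate_hz * 1e6


# Two-sample bus latency at 48 kHz, as quoted in the text.
two_sample_latency = latency_us(2, 48_000)
print(f"{two_sample_latency:.2f} us")  # 41.67 us, below the 50 us bound
```

At 48 kHz one sampling period lasts about 20.83 μs, so two samples stay below the 50 μs bound stated in the text.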
The synchronization of the clock signals is mandatory for ensuring optimal beamforming of transducer arrays. For this reason, the effects of the clock propagation in an A²B network [20] have been studied, evaluating how the latency and the jitter could affect the beamforming of an A²B microphone array. The clock propagation was modelled with uncorrelated normal distributions, whose mean values (μ) and standard deviations (σ) represent deterministic latency and random jitter, respectively. The values of μ and σ were assessed by means of experimental measurements. By applying the clock model to numerical simulations of the array, it will be shown that the A²B bus is optimal for building microphone arrays, since the effects of clock propagation and clock reconstruction are negligible in almost every practical application.
The geometry is another critical aspect in the design of a microphone array. In fact, the position of the capsules greatly affects the spatial sampling of the sound field. This has an impact on the robustness and stability of the beamforming filters and on the performance of the array in terms of spatial accuracy. Many solutions have already been explored, mostly spherical and cylindrical [21]-[26], and examples of two-dimensional planar microphone arrays can be found in [27]-[30] with circular, radial or random distribution of the capsules.
In this paper, numerical simulations of planar arrays having regular polygon geometry with different numbers of vertices were carried out, with the aim of optimizing the beamforming accuracy and the number of microphones.
Capsules have been positioned on the vertices of triangular, square, pentagonal, and hexagonal geometries, and the theory that a central capsule can improve the spatial accuracy of the beamforming was exploited, as suggested in [31]. It will be shown that the equilateral triangular array with four capsules, three at the vertices and one in the center, is the most efficient solution with respect to square, pentagonal, or hexagonal arrays with a number of capsules ranging between 4 and 7. Finally, a prototype of the triangular array with four capsules over the A²B bus was designed and built. Measurements were performed, showing excellent agreement between numerical and experimental results.
The aforementioned geometries were preferred, as they allow exploiting the A²B bus expandability, an innovative feature of the described solution. In the proposed design, several arrays can be attached in daisy chain and combined to form more complex and better-performing structures, the meta-arrays, which are currently under development by the same authors. The paper concludes by presenting a preliminary study of a planar meta-array, constituted by seven triangular units, designed according to the previous results, and positioned side by side. The beamforming accuracy of the meta-array was analyzed through numerical simulations, also considering the effects of the clock propagation.

II. BEAMFORMING THEORY
The raw signals captured by the capsules of a microphone array can be combined to obtain arbitrary directivity beams, called virtual microphones. This is generally known as encoding or beamforming. In this work, such operation is performed with a linear processing, which makes use of a matrix of Finite Impulse Response (FIR) filters, computed with the regularized Kirkeby [32] inversion:
$$H_{m,v,k} = \left[ C_{m,d,k}^{*} \cdot C_{m,d,k} + \beta_{k} \cdot I \right]^{-1} \cdot \left[ C_{m,d,k}^{*} \cdot A_{d,v} \cdot e^{-j \pi k} \right] \tag{1}$$
where m = [1, ..., M] is the capsule index; v = [1, ..., V] is the virtual microphone index; d = [1, ..., D] is the index of the Directions-of-Arrival (DoA) of the sound wave; k is the frequency index; the matrix C is the complex response of each capsule m for each direction d; the matrix A defines the frequency-independent amplitude of the target directivity patterns; $e^{-j \pi k}$ introduces a latency that ensures filter causality; the dot (·) is the scalar product; I is the identity matrix; $[\,]^{*}$ denotes the conjugate transpose; $[\,]^{-1}$ denotes the pseudo-inverse; and β is a frequency-dependent regularization parameter [33]. The latter represents the most significant improvement of the Kirkeby method over the traditional Tikhonov regularization, where β is instead constant. The grid of DoA employed for simulating and measuring the array responses is a spherical t-design geometry [34], [35], of order t = 21, consisting of a total of D = 240 directions uniformly distributed over a unit-radius sphere.
A unidirectional virtual microphone (V = 1) was encoded, centered, and pointed outward, perpendicularly to the surface of the array. The target directivity A is a fourth-order cardioid without any side or rear lobes [36], defined as follows:
$$A_{d,v} = \left[ 0.5 + 0.5 \cdot \cos(\alpha_{d,v}) \right]^{4} \tag{2}$$
where $\alpha_{d,v}$ is the angle between the direction d and the pointing direction of the virtual microphone v. The virtual microphone is obtained by multiplying, in the frequency domain, the response of the array $C_{m,d,k}$ with the beamforming filter $H_{m,v,k}$, as follows:
$$V_{d,v,k} = \sum_{m=1}^{M} C_{m,d,k} \cdot H_{m,v,k} \tag{3}$$
Ideally, i.e., in case of perfect reconstruction, it would result V = A for all k frequencies and d directions. Two-dimensional and three-dimensional directivity of a fourth-order cardioid virtual microphone are shown in Fig. 1.
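The regularized inversion and the virtual microphone synthesis described above can be sketched, for a single frequency bin and a single virtual microphone, in plain Python. This is a minimal illustration and not the authors' implementation: the causality delay term is omitted, the regularization value `beta` is assumed to be given, and all function names and the toy data are hypothetical.

```python
def solve_linear(a, b):
    """Solve a @ x = b for a small complex system (Gaussian elimination, partial pivoting)."""
    n = len(a)
    aug = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[row][j] -= factor * aug[col][j]
    x = [0j] * n
    for i in range(n - 1, -1, -1):
        acc = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - acc) / aug[i][i]
    return x


def regularized_filter(c, target, beta):
    """Return h (length M) minimizing |c h - target|^2 + beta |h|^2 at one frequency.

    c      : D x M nested list, complex response of capsule m for direction d
    target : length-D list, target directivity for one virtual microphone
    beta   : regularization value at this frequency (assumed given)
    """
    n_dirs, n_caps = len(c), len(c[0])
    gram = [[sum(c[d][i].conjugate() * c[d][j] for d in range(n_dirs))
             + (beta if i == j else 0.0)
             for j in range(n_caps)] for i in range(n_caps)]
    rhs = [sum(c[d][i].conjugate() * target[d] for d in range(n_dirs))
           for i in range(n_caps)]
    return solve_linear(gram, rhs)


def virtual_microphone(c, h):
    """Virtual microphone response per direction: V_d = sum_m c[d][m] * h[m]."""
    return [sum(c_dm * h_m for c_dm, h_m in zip(row, h)) for row in c]


# Toy check with hypothetical data: two directions, two capsules.
c_toy = [[1 + 0j, 0j], [0j, 1 + 0j]]
h_exact = regularized_filter(c_toy, [1.0, 0.5], beta=0.0)
h_damped = regularized_filter(c_toy, [1.0, 0.5], beta=1.0)
v_exact = virtual_microphone(c_toy, h_exact)
```

In the toy case the response matrix is the identity, so without regularization the filter reproduces the target exactly, while `beta = 1.0` halves the filter gains. This is the trade-off that a frequency-dependent β is meant to control.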
The effective directivity is evaluated by employing two parameters as a function of frequency: the directivity factor Q and the half-power beamwidth BW [37]. The directivity factor Q is given by:
$$Q_{v,k} = \frac{I_{max}}{I_{0}} \tag{4}$$
where $I_{max}$ is the magnitude of the sound intensity vector in the direction of maximum emission, $I_{0}$ is the average of the magnitude of sound intensity over the whole sphere, v is the virtual microphone, and k is the frequency. The parameter BW is equal to twice the angle of the beam between the direction of maximum sensitivity and the direction at −3 dB below the maximum, hence:
$$BW_{v,k} = 2 \cdot \angle\left( d_{-3\,\mathrm{dB}} \right) \tag{5}$$
where $d_{-3\,\mathrm{dB}}$ is the direction where the directivity is reduced by 3 dB with respect to the maximum, ∠ denotes the angle, v is the virtual microphone, and k is the frequency.
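As a reference point for the two metrics, they can be evaluated numerically on the ideal fourth-order cardioid target. The sketch below assumes an axisymmetric pattern with its maximum on the axis, and assumes that the intensity is proportional to the squared magnitude of the pattern; the function names are hypothetical.

```python
import math


def cardioid(alpha_rad, order=4):
    """Cardioid of a given order; alpha_rad is the angle from the pointing axis."""
    return (0.5 + 0.5 * math.cos(alpha_rad)) ** order


def directivity_factor(pattern, steps=20000):
    """Q = peak intensity / spherical average, for an axisymmetric pattern.

    Intensity is assumed proportional to the squared magnitude of the pattern,
    and the maximum is assumed to lie on the axis (alpha = 0).
    """
    h = math.pi / steps
    integral = 0.0
    for i in range(steps):
        alpha = (i + 0.5) * h  # midpoint rule
        integral += abs(pattern(alpha)) ** 2 * math.sin(alpha) * h
    mean_intensity = 0.5 * integral
    return abs(pattern(0.0)) ** 2 / mean_intensity


def half_power_beamwidth_deg(pattern, steps=20000):
    """Twice the first angle at which the pattern falls 3 dB below its on-axis value."""
    threshold = abs(pattern(0.0)) * 10 ** (-3 / 20)
    h = math.pi / steps
    for i in range(steps + 1):
        if abs(pattern(i * h)) <= threshold:
            return 2.0 * math.degrees(i * h)
    return 360.0


q_ideal = directivity_factor(cardioid)
bw_ideal = half_power_beamwidth_deg(cardioid)
print(f"Q = {q_ideal:.2f}, BW = {bw_ideal:.1f} deg")
```

Under these assumptions the ideal target yields Q = 9 (the analytical value for a fourth-order cardioid) and a beamwidth of roughly 67°, which are the limits that a real array approaches as its beamforming accuracy improves.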

III. ARRAY DESIGN AND SIMULATION
Beamforming is physically constrained by the ratio between the distances of the capsules within the array and the wavelength, as:
$$s = \frac{c}{4 f} \tag{6}$$
where c = 343 m/s is the speed of sound and f is the frequency. The minimum distance s between the capsules was designed considering the typical frequency range of the vocal band, which is usually 300 Hz - 3.4 kHz in telecommunications [38], [39]. Thus, the array would fit voice applications, such as teleconferencing, speech recognition, speakerphone, or ANC systems.
The array was designed with a distance between the capsules $s_{min}$ ≈ 25 mm, obtained by substituting $f_{max}$ = 3.4 kHz in (6).
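The quoted spacing is consistent with a quarter-wavelength criterion at the upper band edge. The check below assumes that (6) amounts to s = c / (4 f), which is an inference from the numbers given in the text; the function name is hypothetical.

```python
SPEED_OF_SOUND_M_S = 343.0  # value used in the text


def quarter_wavelength_mm(freq_hz: float) -> float:
    """Quarter of the acoustic wavelength at freq_hz, in millimetres."""
    return SPEED_OF_SOUND_M_S / (4.0 * freq_hz) * 1000.0


s_min_mm = quarter_wavelength_mm(3400.0)
print(f"{s_min_mm:.1f} mm")  # 25.2 mm, rounded to 25 mm in the design
```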
In this way, array dimensions fit consumer electronics applications. By exploiting meta-array geometries, it is possible to increase the dimension of the array and the number of capsules, hence the maximum distance between the capsules $s_{max}$, thus improving low-frequency performance without affecting high-frequency performance.
To investigate the minimum number of capsules satisfying the requirements, four arrays were designed, having the shape of an equilateral triangle, square, regular pentagon, and regular hexagon, arranging the capsules at the vertices along a circle, and keeping the radius constant at r = $s_{min}$ = 25 mm, as shown in Fig. 2. The analyzed layouts feature a central capsule, as suggested in [31].
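The four layouts can be generated with a few lines of code. This is a sketch under the stated geometry (vertices on a circle of radius 25 mm plus a central capsule); the orientation of the first vertex and the function name `polygon_array` are arbitrary choices made for illustration.

```python
import math


def polygon_array(n_vertices: int, radius_m: float = 0.025, central: bool = True):
    """(x, y) capsule positions: regular polygon vertices plus optional center."""
    positions = [
        (radius_m * math.cos(2 * math.pi * i / n_vertices),
         radius_m * math.sin(2 * math.pi * i / n_vertices))
        for i in range(n_vertices)
    ]
    if central:
        positions.append((0.0, 0.0))
    return positions


layouts = {n: polygon_array(n) for n in (3, 4, 5, 6)}
capsule_counts = {n: len(p) for n, p in layouts.items()}  # M = 4, 5, 6, 7
triangle_side_m = math.dist(layouts[3][0], layouts[3][1])  # r * sqrt(3)
```

With a constant radius, the distance between the central capsule and each vertex stays at 25 mm for all four shapes, while the distance between adjacent vertices shrinks as the number of vertices grows.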
The four geometries were numerically simulated in the frequency domain by employing the Finite Element Method (FEM). The material of the domain is air, while the arrays were modeled as rigid bodies. The simulation is performed considering the near-field effect: the system is excited by a point source radiating spherical waves of 1 Pa at 1 m distance. The simulations were calculated for each direction d of the previously described grid (D = 240).
A three-dimensional model was used, discretized with a tetrahedral mesh featuring six elements per wavelength [40]. The simulations were solved within the frequency range 300 Hz - 3.4 kHz, with a frequency resolution of 5 Hz. For each direction d, the solution is evaluated at the M points corresponding to the positions of the capsules, considering an ideal frequency response. This provided the matrix $C_{m,d,k}$ required to solve (1).
The solutions were processed by substituting (2) in (1), for values of M ranging between 4 and 7. Then, the frequency-dependent parameters Q (Fig. 3) and BW (Fig. 4) were evaluated for each of the four different cases.
The values of Q are in the range 4.2 - 7.8, while BW values lie between 105° and 69°. The parameter BW shows only slight variations as a function of M in the frequency range of interest. The same applies to the parameter Q in the range 300 Hz - 2 kHz. Instead, at 3.4 kHz the value of Q increases from 6.7 to 7.8. This corresponds to an increment of 16.4% of Q when using 7 capsules instead of 4, that is, a 75% increase of M. The ratio Q/M, which is the directivity factor normalized on the number of capsules, represents the effectiveness of a single capsule, and it is maximum when M = 4, as shown in Table I.
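The percentages and the normalized ratio can be reproduced from the two Q values quoted at 3.4 kHz. The sketch uses only those two figures, assuming that 6.7 refers to M = 4 and 7.8 to M = 7 as the comparison implies; the intermediate cases (M = 5 and M = 6) are not given in the text and are therefore left out.

```python
q_at_3400_hz = {4: 6.7, 7: 7.8}  # M -> Q, values quoted in the text

q_gain_percent = (q_at_3400_hz[7] / q_at_3400_hz[4] - 1.0) * 100.0
m_gain_percent = (7 / 4 - 1.0) * 100.0
q_per_capsule = {m: q / m for m, q in q_at_3400_hz.items()}

print(f"Q gain: {q_gain_percent:.1f} %, M gain: {m_gain_percent:.0f} %")
print(f"Q/M: {q_per_capsule[4]:.3f} (M = 4), {q_per_capsule[7]:.3f} (M = 7)")
```

The normalized directivity drops from about 1.7 per capsule with M = 4 to about 1.1 per capsule with M = 7, which is why the four-capsule layout is regarded as the most efficient.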
In conclusion, the triangular array with four capsules allows maximizing the directivity and minimizing the half-power beamwidth with the minimum number of capsules.

IV. A²B ARCHITECTURE AND CLOCK PROPAGATION MODEL
The block diagram of the architecture of an A²B network is shown in Fig. 5. The network is composed of a single main node and multiple subordinate nodes. Each subordinate node has I/O ports to communicate with local devices, e.g., transducers and codecs. The main node provides the clock to the network, and it is connected to an interface board, which converts A²B signals into protocols commonly used for digital audio distribution, such as USB, AES67 or MADI. A²B can transport up to 32 audio channels, as well as control data, on a single network composed of up to 10 subordinate nodes. In addition, subordinate nodes can be powered through the same data bus (up to 2.7 W), thus keeping the wiring as simple as possible. Access to the bus and audio flow are managed by dedicated A²B transceivers, removing the need for additional devices that would increase the system cost and design complexity. A simple microcontroller (e.g., an 8-bit microcontroller) is required on the main node to configure the network at start-up, but it is no longer needed during normal operation.
Since A²B subordinate nodes reconstruct the clock from the network, in this section the effects of clock latency and jitter on the beamforming performance are evaluated by means of an equivalent additive noise [41]-[43]. Fractional delays were applied in the frequency domain, overcoming the limited resolution of the discrete time domain, namely ±1 sample (e.g., 20.83 μs at 48 kHz). Clock propagation was introduced in the post-processing of numerical simulations by means of (7):
$$\tilde{C}_{m,d,k} = C_{m,d,k} \cdot e^{-j \omega t_{delay}} \tag{7}$$
where C is the numerical response of the array, $\tilde{C}$ is the response including clock propagation, and $e^{-j \omega t_{delay}}$ accounts for the clock skew contribution. Finally, $t_{delay}$ [s] is:
$$t_{delay} = \mathcal{N}(\mu, \sigma^{2}) \tag{8}$$
where $\mathcal{N}$ is the normal distribution, μ is the mean value of $\mathcal{N}$, σ² is the variance of $\mathcal{N}$, d = [1, ..., D] is the DoA index of the sound waves, m = [1, ..., M] is the capsule index, and k is the frequency index. Hence, the clock propagation was modelled with uncorrelated normal distributions, whose mean value (μ) represents a deterministic latency and whose standard deviation (σ) represents the effect of a random jitter.
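The clock model can be sketched for one complex response sample: a delay is drawn from a normal distribution and applied as a phase rotation in the frequency domain. The values μ = 10 ns and σ = 1.6 ns are taken from the ranges reported later for the meta-array study, while the function name and the fixed random seed are illustrative assumptions.

```python
import cmath
import math
import random


def apply_clock_delay(response, freq_hz, mu_s, sigma_s, rng):
    """Multiply one complex response sample by exp(-j*omega*t), t ~ Normal(mu, sigma)."""
    t_delay = rng.gauss(mu_s, sigma_s)
    omega = 2.0 * math.pi * freq_hz
    return response * cmath.exp(-1j * omega * t_delay)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
clean = 1.0 + 0.0j
skewed = apply_clock_delay(clean, 3400.0, mu_s=10e-9, sigma_s=1.6e-9, rng=rng)
phase_error_rad = abs(cmath.phase(skewed / clean))
```

The delay only rotates the phase and leaves the magnitude untouched. At 3.4 kHz a delay of 10 ns corresponds to a rotation of about 2 × 10⁻⁴ rad, which gives an intuition of why nanosecond-scale skew has little influence in the voice band.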
The model was tuned according to dedicated experimental measurements, whose results are summarized in Table II and represented in Fig. 6. Each A²B chip reconstructs the clock on each subordinate node, therefore all the capsules of an array connected to the same A²B transceiver share the same clock source and are affected by the same amount of jitter and latency. Hence, assigning different latency and jitter values to each capsule of a single node models the worst case. The central capsule, assumed as the relative reference of the time scale, has μ = 0. The latency of the other capsules is modelled with the worst possible combination. The standard deviation values are measured, and they reflect the tendency of the jitter to increase at each reconstruction of the clock on subsequent subordinate nodes. By processing numerical simulations with and without jitter and latency, no observable effect on the beamforming performance was found. This result confirms that A²B technology is optimal for building microphone arrays and particularly advantageous when several A²B chips are employed at the same time, which is the case of meta-arrays.

V. EXPERIMENTAL RESULTS
A prototype of the triangular array was built (Fig. 7, microphone capsules are highlighted with red circles). The capsules face onto a perfectly flat and smooth PCB surface to minimize diffraction. Therefore, the electronics were designed to have all components and connectors on the rear side. The employed digital MEMS capsule is characterized by an Acoustic Overload Point (AOP) of 130 dB (SPL), an SNR of 69 dB(A), a dynamic range of 105 dB, and an operating voltage range of 1.62 - 3.6 V [44].
The block diagram of the designed triangular array is shown in Fig. 8. The triangular array is seen as a single A²B subordinate node. The UTP cable connecting the main node with the subordinate node can be up to 15 m long. This allows placing the microphone array in the desired position, keeping the acquisition interface away. The acquisition board comprises the A²B main node and the interface board. In this work, a USB connection was adopted between the interface board and the PC. Such a design allows connecting many arrays in daisy chain to form a larger A²B network, realizing either distributed arrays or the previously mentioned meta-arrays, made by positioning several array boards side by side, in two or three dimensions.
The prototype was measured in an anechoic chamber with a two-axis turntable [23] and a loudspeaker positioned at 1 m. The schematic of the experimental setup is shown in Fig. 9. The turntable is controlled by the PC via an Ethernet link. The test signal is sent from the PC to the interface board through USB. This system allows playing the test signal and synchronously recording all the channels of the Device Under Test (DUT), in this case the microphone array. The sound source is a studio monitor, which is connected to the interface board through S/PDIF. In this way, the whole measurement system is full-digital.
The test signal is an Exponential Sine Sweep (ESS) [45], pre-equalized to flatten the spectrum of the sound source in the range 50 Hz - 18 kHz within ±0.5 dB. The same test grid employed for the simulations was used (D = 240 directions). After each measurement, the PC sends the new measurement direction to the turntable, and then the ESS is played and recorded. Impulse Responses (IRs) were calculated by convolving the recorded signals with the inverse filter associated with the test signal, namely the inverse sweep [45]. Finally, the $C_{m,d,k}$ matrix of (1) is obtained by applying a Fast Fourier Transform (FFT) to the IRs. Subsequent processing is unchanged with respect to the simulations.
The triangular array was processed and compared to the simulation, by superimposing the directivity polar patterns (Fig. 10).
One can note a very good agreement between the two methods, particularly in the frequency bands centered at 2 kHz and 4 kHz. Instead, the numerical solution provided slightly narrower polar patterns at 500 Hz and 1 kHz. This behavior is explained by the non-idealities affecting the experimental approach. These non-idealities are geometrical, related to the measurement system, and constructive, related to the MEMS capsules, which are not identical in terms of magnitude and phase response.

VI. META ARRAY SIMULATION
The planar meta-array of Fig. 11 was designed and analyzed according to the outcomes and methodology previously described in Section III. The simulation was performed up to 3.4 kHz but extended down to 20 Hz, to analyze the performance improvement in the low frequency range provided by the increased value of $s_{max}$. Seven triangular arrays were employed, thus the total number of capsules of the meta-array is M = 28 (4 capsules for each triangular array).
To exploit the increased beamforming capabilities of such a meta-array, the target function required in (1) was modified to produce three virtual microphones of the same type described by (2), uniformly distributed along the azimuth at angles 0°, 120°, and −120°, with an elevation of 45°, as shown in Fig. 12.
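The three pointing directions and the corresponding cardioid targets can be sketched as follows. The coordinate convention is an assumption made for illustration (z axis perpendicular to the array plane, elevation measured from that plane), and the function names are hypothetical.

```python
import math


def unit_vector(azimuth_deg, elevation_deg):
    """Cartesian unit vector for a direction given as azimuth and elevation."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    return (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))


def angle_between_deg(u, v):
    """Angle between two unit vectors, in degrees."""
    dot = sum(a * b for a, b in zip(u, v))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))


def cardioid_target(doa, pointing, order=4):
    """Target gain of a cardioid of the given order aimed along `pointing`."""
    alpha = math.radians(angle_between_deg(doa, pointing))
    return (0.5 + 0.5 * math.cos(alpha)) ** order


pointing = [unit_vector(az, 45.0) for az in (0.0, 120.0, -120.0)]
separation_deg = angle_between_deg(pointing[0], pointing[1])
on_axis_gain = cardioid_target(pointing[0], pointing[0])
cross_gain = cardioid_target(pointing[1], pointing[0])
```

With this convention, any two of the three beams are separated by about 75.5°, so each target has a gain well below unity in the direction of the other two.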
Then, the parameters Q (Fig. 13) and BW (Fig. 14) were calculated for the following three cases: only the central triangular array (M = 4), the central triangular array plus the three triangular arrays around it (M = 16), and all the seven triangular arrays (M = 28). Each parameter was averaged among the three virtual microphones, thus reducing the number of curves and improving the reliability of the result.
One can note that the directivity factor Q increases in the whole frequency range from M = 4 (black line) to M = 16 (red line), and a further increase is observed below 700 Hz with M = 28 (green line). Conversely, the parameter BW decreases in the whole frequency range between the cases M = 4 and M = 16, and a further decrement below 700 Hz is observed with M = 28. At high frequency the improvement is provided by the increased number of capsules, which in turn gives a better conditioning of the Kirkeby inversion in (1), while at low frequency it is provided by the increment of the maximum distance $s_{max}$ between the capsules of the meta-array.
Clock propagation effects were introduced in the meta-array simulations, to consider the non-idealities introduced by the data acquisition and transmission architecture. Clock propagation measurements obtained in [20] were employed, as discussed in Section IV. A latency value μ between −10 ns and 10 ns was randomly assigned to each node, with a uniform probability distribution. The standard deviation of the random jitter σ was linearly increased by 0.2 ns for each node, starting from the minimum measured value of 1.6 ns. Hence, the four capsules of each node are affected by the same amount of jitter, as happens in practice. It was concluded that clock propagation has no effect on either Q or BW, even in large meta-arrays. Fig. 15 shows the results for the Q parameter of the meta-array with 7 triangular boards and M = 28. It can be seen that the ideal curve without jitter (green solid line) overlaps the curve obtained with the measured values of jitter (black line). The clock non-idealities must be increased by three orders of magnitude to make deviations observable (red dashed line).
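The per-node assignment described above can be sketched as follows. Two points are assumptions rather than statements from the text: the first node in the chain receives σ = 1.6 ns, and the capsules of a node share that node's (μ, σ) pair while their delays are drawn independently. All names are hypothetical.

```python
import random


def node_clock_parameters(n_nodes, rng, mu_bound_s=10e-9,
                          sigma_first_s=1.6e-9, sigma_step_s=0.2e-9):
    """One (mu, sigma) pair per node: uniform latency, linearly growing jitter."""
    return [(rng.uniform(-mu_bound_s, mu_bound_s), sigma_first_s + i * sigma_step_s)
            for i in range(n_nodes)]


def capsule_delays(node_params, capsules_per_node, rng):
    """Draw one delay per capsule; capsules on a node share that node's (mu, sigma)."""
    return [[rng.gauss(mu, sigma) for _ in range(capsules_per_node)]
            for mu, sigma in node_params]


rng = random.Random(1)  # fixed seed so the sketch is repeatable
params = node_clock_parameters(7, rng)
delays = capsule_delays(params, 4, rng)
```

For seven nodes the jitter standard deviation grows from 1.6 ns to 2.8 ns along the chain, and the 28 resulting delays can then be applied to the simulated responses as phase rotations in the frequency domain.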

VII. CONCLUSION
An A²B-based, full-digital, and modular microphone array was proposed. A²B offers optimal characteristics for automotive and consumer electronics applications, such as robust data transmission (up to 40 m), synchronized acquisition, cheap cabling, and expandability thanks to daisy-chain connections. In addition, the design complexity of the proposed microphone array is minimal, since data acquisition of digital MEMS microphones and data transmission are carried out by dedicated A²B transceivers. Hence, neither A/D converters nor programmable devices (e.g., microcontroller, DSP, or FPGA) are required, as they would be in the case of analog or digital capsules, respectively. The system architecture allows creating both distributed microphone arrays and meta-arrays, by positioning several array boards side by side in two or three dimensions. In such applications the adoption of A²B, which communicates by means of UTP cables, avoids bulky wiring between capsules, arrays, and acquisition boards.
The effect of clock propagation in the A²B bus was studied by modelling latency and jitter. The model is implemented with normal distributions, whose mean and standard deviation represent latency and jitter, respectively, and were tuned against experimental measurements. In conclusion, it was possible to assess that the effect of clock propagation can be neglected, confirming that the A²B bus is an optimal solution for building arrays of transducers, even when capsules are connected to different A²B nodes, as happens for meta-arrays.
Finite element numerical simulations were employed for analyzing several planar geometries, with the aim of optimizing the design of the array in terms of position and number of capsules. Only regular polygon geometries were considered, as they are the most suitable to be combined in two or three dimensions to obtain meta-arrays. Triangular, square, regular pentagon, and hexagon arrays with microphones located at the vertices and in the center of the polygons were studied. Beamforming performance was analyzed by means of two metrics, Q and BW. The triangular array with three capsules at the vertices and one in the center allowed optimizing the beamforming with the minimum number of capsules. As suggested by previous theoretical results, the presence of a central capsule provided an improvement in terms of spatial accuracy, minimizing the deviation between the directivity of the virtual microphone (beam) and the target function (4th-order cardioid).
A prototype of the triangular array was built and measured, employing a full-digital test bench. A good match between numerical and experimental results was obtained, as demonstrated by the superimposition of the directivity polar patterns. The proposed architecture is particularly suitable for different consumer electronics applications. In hands-free phones for meeting rooms, depending on room size, delocalized microphones can be easily connected to the main unit thanks to flexible wiring, thus creating meta-arrays. This can also be exploited in smart home or voice assistant devices, where the flexible positioning of the microphones with respect to the main unit can enhance acoustic performance. The proposed architecture guarantees positioning flexibility, ease of connection using simple UTP wiring, and modularity.
Finally, a concept of planar meta-array was presented, obtained by positioning seven triangular arrays side by side. Numerical simulations were performed and processed by including a growing number of units, and therefore by increasing the number of capsules and the maximum distance between them. The improvement observed in the metrics adopted for evaluating the accuracy of beamforming, Q and BW, demonstrated the potential of meta-arrays, whose implementation is made particularly advantageous by the adoption of the A²B architecture. The clock propagation model was applied to meta-array simulations, showing that jitter effects do not worsen beamforming performance.

Fig. 7. Prototype of the triangular microphone array with four capsules, three at the vertices and one in the center.
4 kHz but extended down to 20 Hz, to analyze the

Fig. 10. Polar patterns of a triangular microphone array with four capsules: comparison of numerical (solid line) and experimental (dashed line) solutions.

Fig. 11. Design of a meta-array with seven triangular arrays and 28 capsules.

TABLE II. JITTER AND LATENCY FOR EACH CAPSULE OF THE TRIANGULAR ARRAY

Fig. 6. Clock propagation applied to four capsules connected to four different A²B chips in daisy chain.