Sound Pressure Field Reconstruction for Airborne Ultrasound Tactile Display Encountering Obstacles

Airborne ultrasound tactile display (AUTD) is used to provide non-contact tactile sensations with specific foci sound fields through the optimization of transducer phases. However, most existing optimization approaches are not directly applicable in case of an inhomogeneous medium, such as in the presence of obstacles between the AUTD and objective sound field. Certain methods can perform optimizations by considering the sound-scattering surfaces of the obstacles to compute the transmission matrix, which requires several complex measurements. This study proposed two methods to reconstruct the sound field under an inhomogeneous medium, wherein the need to calculate the impact of the obstacles was eliminated. The two methods are Bayesian optimization and greedy algorithm with brute-force search. Further, the process of the foci field generation was assumed as a black box. The proposed methods require only the pressure intensity at the control point generated by the input phases, discarding the need for transmission matrix in the presence of obstacles. Moreover, these methods offer the advantage of optimization of the phases in the presence of obstacles. This study explains the working of proposed methods in different forms of foci fields encountering obstacles.


I. INTRODUCTION
An airborne ultrasound tactile display (AUTD) is an array of ultrasound transducers that facilitates individual control of transducer phases [1].The ultrasound waves emitted from these transducers interfere with each other and form a sound field in space.Through appropriate control of the phases, a focus (one control point) or a foci (multiple control points) field can be generated.The single-focus radiation pressure field can be generated by the phases from solving an inverse problem.One example of using a single focus is a mid-air interaction system that allows users to touch a floating virtual screen with non-contact tactile feedback [2].
In addition, the foci field has been used in many applications such as three-dimensional haptic shape [3], [4], [5], [6].In such applications, a linear synthesis scheme (LSS) is a simple method to achieve the foci field by linearly superimposing the phase patterns that form a single focus at a specific position.The foci sound fields can be easily formed by calculating the phase patterns of every single focus.
The conventional LSS linearly superposes the phases of each focus generated by each transducer.If the output of the transducer has no upper limit, the sound field can be generated by conventional LSS perfectly.However, as the transducer has an upper limit, it should be considered.Therefore, the output of the transducer must be normalized to the possible range.LSS assumes that all focal points are in phase, with room for optimization.Consequently, the sound field generated by LSS is generally weaker than that generated by the optimal solution [7].As a result, the problem of obtaining the optimal phases of sound fields has been widely discussed.Long et al. assumed the process of forming a foci field as an eigenvalue problem to optimize the phases and generated three-dimensional haptic shapes in mid-air with foci fields [5].Moreover, certain methods have been proposed to solve the optimization problem at high computational speeds [7], [8].
In general, such optimization methods cannot be utilized in case of the presence of inhomogeneous media in the propagation.For example, if there are obstacles between the AUTD and the target sound field, the measurement of the acoustic characteristics of the propagation (i.e., transmission matrix) is challenging.The impact of the obstacles will affect the form of the sound field and reduce the sound pressure, which makes the tactile experience worse.
Recently, some studies used the boundary element method to model the surface of scattering objects (i.e., obstacles) between the AUTD and field [14].It resulted in the transmission matrix being calculable with obstacles.Conversely, the impact of the obstacles must be precomputed, and if the used acoustic characteristics or the position of the obstacles is not precise, the calculated phases cannot always be optimal.
In this article, we propose two phase estimation methods that allow obstacles between the transducer and the target without specifying transfer characteristics.The proposed method performs phase optimization by observing sound pressure patterns on a specific plane.Even if the acoustic characteristics of propagation, including the effects of obstacles, are unknown, optimization can be achieved by observing the relationship between the phase patterns of the transducers and the corresponding changes in the sound field.By repeating the exploration, the phase pattern with the desired sound field can be found.
Two methods to solve the problem are Bayesian optimization (BO) and greedy algorithm with brute-force search (GABS) [9].BO is a strategy for the global optimization of black-box functions that does not assume any functional forms.Similarly, GABS only observes the inputs and output without considering the effect in the black box.Using these two methods, we generated the foci field in the presence of obstacles without explicitly calculating the transfer characteristics.The process of generating a foci field from phases is assumed as a black box.
Situations that can be realized with the methods proposed in this article are as follows: "there is an immobile obstacle between the ultrasound device and the hand, and the phase to generate the desired pattern with the obstacle at a specific position can be calculated in advance."For example, in a previous study [15] that used airborne ultrasound stimulus and mist to present a cool sensation, the cameras were set beyond the propagation path to avoid influencing the transmission of ultrasound waves.However, due to the mist, the distant cameras had difficulty in detecting the finger position.If the sensor can be placed in front of the fingers without affecting the foci field, the cool sensation will be experienced better.Therefore, our proposed methods optimize the immobile sensor in advance to solve the aforementioned problem.
In this study, to investigate whether optimization from such measurements is feasible or not, the basic principle is examined assuming that the correct sound field is precisely measured using simulations.As the proposed methods are intended to be implemented on a real system with actual measurements as the ultimate goal, we aim to utilize the proposed methods by using a thermal camera [12] or a microphone to measure the sound field in two dimensions in the future.
Reportedly, a method optimized LSS with GABS (referred to as LSSG, "G" means greedy algorithm) to obtain the optimal phases in a homogeneous medium [7].In LSSG, the obstacles were not considered in the phases pattern of LSS at first; however, GABS can be used to optimize the offsets between the phase patterns of LSS for the case involving the presence of obstacles.Consequently, LSSG can be theoretically applied even in the presence of obstacles.The workings of LSS and LSSG are explained in Section II in detail.This study utilized LSSG as the baseline for encountering obstacles.
To evaluate in a realistic computation time, we decided to treat the AUTD with a small array size without the loss of generality.Owing to the arrangement of transducers in the AUTD, changing the number only affects the pressure intensity of the sound field; however, a small number of transducers can generate sound fields normally.Herein, we first performed the simulations in a 4 × 4 array in a homogeneous medium, followed by another in an inhomogeneous medium.

II. METHODS
This section first presents a brief introduction of LSS, followed by short introductions regarding the working of BO and GABS.

A. Linear Synthesis Scheme
LSS generates the foci field by linearly synthesizing each focus signal.Thus, the phase set that generates each focus must be calculated.
Let the phase set q be defined as where φ i is the phase of i-th transducer.Assuming the sound emitted from the transducer is a spherical wave, the acoustic field p(r) generated by M transducers is expressed as: where k is a wavenumber, and r i is the position of i-th transducer.
When creating a focus at a position r f , we should set φ i as follows: Here, let the phase sets of the transducers that generate the focus at r f be q(r f ), that is, By linearly synthesizing the phase set that generates each focus, we can generate N foci located at r f 1 , . .., r f N .
As providing the same offset to one phase set does not change the field, the phase of each focus can be optimized.Therefore, the phase set q, which generates N foci, is represented as where o n ∈ (0, 2π] is the phase offset of n-th focus.Theoretically, as there is a degree of freedom for the offsets, setting all the offsets to 0 is the fastest way to form a foci field; however, since conventional LSS does not consider an upper limit to the transducer output, the sound pressure may become weaker.We used GABS (introduced later) to handle the optimization of the offsets.Instead of the conventional LSS, we used LSSG as it can form a better foci field.
The calculations for LSS (or LSSG) were performed using (5); however, the amplitude is set in the range of [0, 1].Further, the normalization is as follows:

B. Bayesian Optimization
Instead of using the single focus phase set, we aimed to explore the optimal phases φ = {φ 1 , φ 2 , . . ., φ M } that construct a better foci field by observing the input phases and output pressure pattern.
Assuming the existence of an unknown objective function f (x), BO was used to find an input x * to maximize the model parameter (pressure), that is, x * = arg max x∈X f (x) [10].As the objective function is unknown, BO treated it as a black box and used a prior to evaluate it.Within the iterations, the prior was updated to form the posterior distribution over the objective function.The most common method used to define the posterior distribution is Gaussian process (GP).The mean function of a GP is usually set as 0. Further, the covariance function (i.e., kernel function) affects the observations of BO as well, and the Matérn 5/2 kernel is commonly used.The prior was used to perform the search and these data D induced a posterior over the function; however, there is still the need for a method to determine the next point to be evaluated via a proxy optimization x next = arg max x a(x), where a(x) is referred to as the acquisition function.Several different functions have been proposed as the acquisition function.Herein, we introduced the definition of the expectation of improvement (EI): where the maximum observation is at x + .That is, x + = arg max x∈x 1:t f (x).μ(x) is the mean of the GP, while σ(x) is the standard deviation.Further, Φ(•) is the cumulative distribution function and ϕ(•) is the probability density function.In addition, Z is expressed as We set the inputs of BO as the phases φ i ∈ [0, 2π) of each transducer of the AUTD, and the output as the sound pressure p at the target foci position.Based on the procedures, we can apply BO (Algorithm 1) to explore the optimal phases.

C. Greedy Algorithm With Brute-Force Search
The greedy algorithm, used in optimization problems, breaks the problem into sub-problems and searches for the local optimal solution independently at each step.
Suzuki et al. [9] formulated the reconstruction problem as a combinational problem through sampling from the original continuous [0, 2π) phase space in an equal and discrete manner L. They treated each transducer in the AUTD as a separate problem and then applied the greedy algorithm.First, the initial amplitude output of all transducers was set as 0. Starting from the first explored transducer, the amplitude output was set as 1, and from the L equally divided phases, the phase φ l i with the largest sound pressure at the target foci position was determined.After obtaining the local optimal phase of the transducer, the output of the Authorized licensed use limited to the terms of the applicable license agreement with IEEE.Restrictions apply.Algorithm 1: Bayesian Optimization for Sound Field Reconstruction. 1: Find q t by optimizing the acquisition function over the GP: q t = arg max q a(q|D 1:t−1 ), where D is the observation data.

5:
Sample the objective function with the acquisition function: p t = f (q t ) 6: Augment the data D 1:t = {D 1:t−1 , (q t , p t )} and update the GP 7: end for Algorithm 2: Greedy Algorithm With Brute-Force Search. 1: for l = 0, . . ., L − 1 do 6: obtain p l i = p(q 0 , . . ., q i−1 , e jφ l i ), where p is the current sound pressure.7: end for 8: q i ← e jφ l * i s.t.p l * i = max{p l i } 9: end for transducer's amplitude and phase was recorded, and the next transducer was explored.These steps were repeated until the phase exploration of all transducers in the AUTD was completed (Algorithm 2).

III. EXPERIMENTS
Here, we briefly summarize the experimental procedure followed in this study.First, we used the proposed methods to explore the phases, which were then used to calculate the sound pressure field on a specific observation plane in the simulation.Subsequently, the pressure pattern was used to solve the optimization problem by optimizing the pressure in the foci position and updating the phases.These steps were iterated as required.
The simulation was performed using the K-wave library [11].The simulation space was a cube with a side length of 100 mm.The air density and speed of sound were set as 1.293 kg/m 3 and 340 m/s, respectively.The 4 × 4 transducers array was placed on the plane x = 1, and the center of the array was (1, 50, 50) at the center of the yz plane.The interval between each transducer was 10 mm.Further, the observation plane was set at x = 80 mm, parallel to the transducer array.In addition, the lengths of the planes were y = (0, 100) mm and z = (0, 100) mm.The homogeneous medium is shown in Fig. 1(a).We simulated two situations in an inhomogeneous medium.One of the obstacles used was three cubes with a side length of 10 mm.The three cubes were arranged in a stepped shape (Fig. 1(b)), where the centers of the three cubes were all at the position of x = 41 mm, and the yz plane coordinates were (60, 40), (50, 50), (40, 60).Another obstacle used was two cubes arranged in sequence, where the larger cube had a side length of 16 mm with the center coordinates at (41, 50, 50); the small cube had a side length of 6 mm with the center coordinates at (30, 50, 50) (Fig. 1(c)).The position and shape of the observation plane were the same as that of the homogeneous medium environment.To show the shape of the obstacles more clearly, the observation planes were not plotted in Fig. 1(b) and (c).The density of each cube was 100 kg/m 3 and the speed of sound was 340 m/s, implying an acoustic impedance approximately 100 times higher than that of air.The experiments were run on a desktop computer with an Intel(R) Core(TM) i9-12900KF CPU @3.20 GHz to 5.2 GHz CPU and an NVIDIA(R) GeForce RTX(TM) 3090 Ti 24 GB GDDR6X GPU.The BO code was executed in Python library [13], while GABS was performed in MATLAB.
The target field in our experiments was a 4-foci field (Fig. 1(a)).The foci were on the periphery of a circle with a radius of 30 mm.The target field was generated in both homogeneous and inhomogeneous mediums using the three methods.LSSG was executed for one experiment in each situation.In addition, we performed 50 experiments in each situation using BO and GABS.
We used EI as the acquisition function in BO.The iteration number was 256.The input dimension was M = 16, based on the 4 × 4 array.Due to the periodicity of the phases, the time-averaged sound pressure does not change, when the same offset is added to all transducers.To avoid this situation, we fixed the 7-th phase value as 0 within the optimization.Moreover, we discretized the exploration phase in GABS as L = 16 equal divisions, such that the iteration number of GABS was also 256.As LSSG is based on LSS and GABS, the iteration number depends on the foci number and the equal divisions.Herein, the foci number was 4, and the equal divisions L = 16 were same as GABS; therefore, the total iteration number of LSSG was 64.Additionally, from (5), note that the phase patterns calculated by LSS are used for LSSG; wherein, the obstacles are not considered at first.Next, GABS is used to optimize the offset for the case involving the presence of obstacles.Therefore, LSSG is applicable even in the case of obstacles.
LSSG, BO, and GABS all aimed to maximize the pressure in the foci position.The objective function of the two proposed methods was maximize f , where f = min(p n ), and n is the n-th focus.This objective function was used to find the phase that achieved a stronger sound pressure at the focus position where the pressure was the weakest in the current exploration.

IV. EVALUATION
To better observe the difference between LSSG and the proposed method, we compared one of the 50 experimental results generated by BO and GABS in each case.Fig. 2 shows the results of different methods.The range of the color chart changed according to the output results.LSSG formed a 4-foci field; however, the pressure in the foci position was weaker.In contrast, the two proposed methods formed strong amplitude points.
In the presence of three-cube obstacles, the pressure generated by LSSG was much weaker, with a few peaks being formed.Using the proposed methods, the 4-foci still formed clearly, and the pressure was stronger similar to that in the case of without obstacles.In the second obstacle situation, LSSG generated the 4-foci field successfully, but the pressure was still weak.In contrast, the proposed methods generated the 4-foci field with strong pressure.
The average pressure of the field in each situation is shown in Fig. 3(a).In the case of LSSG, only one experiment was carried out because the result would not change if it were repeated.While both BO and GABS were used for 50 trials.As evident, LSSG always formed a low-pressure field in both homogeneous and inhomogeneous mediums.In the presence of obstacles, the average foci pressure of LSSG was much weaker.We used the phase patterns that were calculated by LSS, wherein the obstacles were not considered at first; however, we used GABS to optimize the offset (i.e., the phase patterns) for the case involving the presence of obstacles.However, the average foci pressure of LSSG was still at a lower level.In contrast, BO and GABS maintain a strong pressure even when encountering obstacles, with the average pressure of the foci field generated by GABS found to be stronger than BO.We evaluated the area of the focus generated by our proposed methods in Fig. 3(b) (We did not discuss LSSG because of the weaker pressure).We set the threshold above √ 2 of the average sound pressure at objective foci positions as the foci area, and Fig. 3(b) shows the average value of one focus point.Note that in the presence of obstacles, GABS can generate a foci field with a smaller spreading area.Fig. 3(c) shows the standard deviation among the focus amplitude.As evident, the standard deviation among foci of BO was lower than GABS.
Regarding the calculation time, as we executed the BO algorithm in python, the MATLAB library must be called when calculating the sound field pattern during the exploration process.Further, the exploration of GABS was conducted in MATLAB; thus, we will discuss the computational cost base on theoretical time complexity and the exploration number required for convergence.The cost of BO depends on the observed data points (i.e., iteration number t) in GP, which is O(t 3 ).Meanwhile, the cost of GABS depends on L equally divided phases.Additionally, the cost of forming a foci field is determined by M number of the transducers, N number of the foci: O(MN ).That is, the time complexity to optimize a foci field by BO is O(t 3 + tMN ), while GABS is O(LMN ).Futhermore, LSSG optimized the offset sets based on GABS.In other words, the time complexity depends on the number of foci and the equally divided phases: O(LN ).For the exploration number required for convergence, BO may optimize the phases within the iteration number.In contrast, GABS and LSSG have to run until all the divided phases and transducers have been explored.Therefore, we evaluated the convergence of the three methods as well in Fig. 3(d).
V. DISCUSSION BO and GABS can both explore the optimal phases even in the presence of obstacles between the AUTD and observation plane.The results indicate that GABS always generates a strong pressure field, which can provide a strong tactile sensation.In the absence of obstacles, the focus areas generated by BO and GABS are at the same level.However, when obstacles exist, the area spread of the focus generated by GABS is smaller than that generated by BO.The standard deviation among each focus generated by BO was lower than that generated by GABS.This implies that the sound pressure generated by BO between focus points is closer, while GABS produces a more focused and high-intensity control point.However, BO requires greater computational time to handle the optimization than GABS in theory, but it may complete the optimization within the set convergence times.Further, LSSG requires the least time, although it yields the weakest pressure.
The simulation was performed based on a 4 × 4 transducers array.If the number of transducers is increased, GABS still exhibits a stable performance as reported in [9].In contrast, conventional BO does not perform well when the number of dimensions is higher than 20.At present, improved Bayesian optimization methods have greatly resolved this problem [16].However, the purpose of our paper is to discuss the optimization effect of BO in generating the foci field under homogeneous and inhomogeneous mediums, rather than evaluating the effect of a specific Bayesian optimization method.Consequently, we used the conventional BO to experiment with a 4 × 4 array in our study.However, BO still can be applied in large scale scenarios.

VI. CONCLUSION
In conclusion, we assumed the acoustic impedance of the propagation process as a black box and explored the optimal phases based on the pressure intensity at the control points by BO and GABS.Using the proposed methods, the sound field was reconstructed regardless of the presence of obstacles between the AUTD and the target sound field.Compared with LSSG, the proposed methods ensured the shape of the foci field and maintained a strong sound pressure in an inhomogeneous environment.The average pressure of GABS was higher than that of BO, but the standard deviation of sound pressure generated by BO was smaller.The spreading area of the foci generating by GABS was smaller, while BO might converge within the selected iteration numbers.We expect that even if there is some kind of obstacle in the propagation path, or interference, such as the cameras, our proposed methods can provide a stable and good haptic experience.Currently, we did the experiments base on a 4 × 4 transducers array in simulation.In the future, we will conduct our experiments in the real world with the AUTDs by observing the ultrasound pressure pattern using a thermal camera.

Fig. 2 .
Fig. 2. Results of the sound field in different situations using different methods.