PAX (Passive–Active Crossing) Method for Sub-Millimeter Coregistration of Passive Acoustic Mapping and B-Mode Images

Nonlinear ultrasonic emissions produced during a therapeutic ultrasound procedure can be detected, localized, and quantified through a class of methods that can be referred to as passive acoustic mapping (PAM). While a variety of PAM beamforming algorithms may be employed, they share a common limitation that a single sound speed is specified for both phase steering of array elements and for calculation of source power or energy. The specified value may be inadequate whether derived from B-mode-based metrics or literature values for constituent materials. This study employed experiments and simulations with linear and curvilinear array geometries to investigate the impact of in situ sound speed uncertainties on source localization in layered media. The data were also used to evaluate a new method for optimizing coregistration of PAM and B-mode images. Coregistration errors as large as 10 mm were observed with the curvilinear array, which also showed much greater sound speed sensitivity than the linear array. Errors with both array geometries were typically reduced to the order of 0.1 mm using the proposed optimization method regardless of beamformer choice or whether the array was calibrated. In a further step toward reliable implementation of PAM, the current work provides an approach that can help ensure that therapeutic ultrasound procedures are accurately guided by cavitation emissions.

therapeutic ultrasound field can be detected, localized, and quantified through a class of methods that can be collectively referred to as passive acoustic mapping (PAM). The resulting images, which are formed based on one-way propagation rather than from tissue backscatter, do not inherently contain information about tissue structure. To provide context for treatment guidance and monitoring, PAM images must be paired with anatomic data from other imaging modalities. Accurate coregistration of PAM and anatomic image data is therefore a critical part of the overall treatment monitoring scenario.
PAM beamformers steer array elements to a location in the imaging domain using a single sound speed, leaving the process susceptible to phase biasing and variability errors induced by propagation in complex tissues. These errors may be mitigated through correction of individual element data prior to beamforming, as has been described in detail for transcranial imaging with 3-D [14] and conventional 2-D arrays [15]. For imaging in soft tissues not encased by bone, errors in sound speed have previously been discussed in the context of handheld array lens refraction [16], the estimation of energy and path attenuation [17], and the use of multiple arrays [18]. While these effects have primarily been described for PAM beamforming in homogeneous media with linear arrays, there has been relatively little exploration of other (e.g., curvilinear) geometries or of strongly refracting tissue paths that may induce substantial phase aberration in the received signals.
The issue of image registration can be illustrated in a simplified form as follows. When using a single array, automatic alignment of PAM and B-mode methods has previously been assumed [19] on the basis of shared array elements. However, the two imaging modalities are mechanistically different, as illustrated with a point scatterer example in Fig. 1(a). In its simplest form, B-mode interprets the absolute time of flight (τ ) to and from a scatterer through the use of an assumed sound speed value (c est ). For the simplified case of an A-line path from the j th element of a J -element linear array at (x rcv, j , 0) to a scatterer at (x sct , z sct ) with identical lateral positions (x sct, j = x rcv ), the time of flight is where c o is the actual sound speed.  (2) and (6) with a true depth of z sct = 50 mm and sound speed errors of up to ±3% This leads to a B-mode-based depth estimate (z pk,B−mode ) of z pk,B−mode = c est τ m /2 = (c est /c o )z sct . (2) Rather than using absolute time of flight, passive beamformers localize scatterers based on wavefront curvature that manifests as relative delay/phase across the J elements of the array. For a scatterer located at (0, z sct ), the relative delays on the array may be expressed as where x rcv and τ are sized ( J by 1). Writing the square root term in (3) as a series expansion to second order under the assumption that x rcv z sct gives τ = z sct + x 2 rcv /2z sct − z sct /c o = x 2 rcv /(2c o z sct ). (4) For an assumed sound speed c est , the passive beamformer reports a peak output at a depth z pk,PAM that minimizes the difference between the actual and estimated relative delay distributions over the array, leading to so that Physically, this relationship is a consequence of how wavefront phase changes with sound speed for a fixed scatterer and receiver geometry. A high estimate of c est decreases the apparent relative delays between elements and flattens the phase curvature across the array. Therefore, the phase curvature seen in the data is only recovered when the source position is moved closer to the receivers. Although illustrated here with simplified expressions, the localization behavior with respect to sound speed holds for a range of array geometries and target locations, as will be presented in Section III.
Comparing the depth estimates for B-mode and PAM in (2) and (6), respectively, it is clear that their sensitivities to sound speed errors act inversely. This relationship is represented graphically in Fig. 1(b) for a scatterer at z sct = 50 mm. By way of example, a choice of c est that is 2% high (c est /c o = 1.02) will cause a 1-mm increase in the B-mode depth and a 1-mm reduction in the PAM depth. The net effect is to amplify the impact of sound speed uncertainty by moving time of flight (B-mode anatomic) and relative delay (PAM) features away from each other. However, we hypothesize that this behavior can be harnessed to estimate an in situ sound speed that coregisters B-mode and PAM image information regardless of array geometry or propagation environment.
Through a combination of simulations and experiments, the present study investigates how: 1) sound speed errors lead to divergent range estimates from PAM and B-mode imaging; 2) the sensitivity to sound speed uncertainty is dependent on array geometry; and 3) coregistration of PAM and B-mode may be ensured through a simple algorithm that operates on conventionally acquired array data.

II. MATERIALS AND METHODS
This section begins with a description of experiments performed with two common array geometries in layered media in order to assess the impact of sound speed uncertainty on image coregistration. This is followed by descriptions of companion ray-tracing simulations and PAM beamforming implementation. This section concludes with the proposed procedures for optimizing the sound speed passed to PAM and B-mode algorithms so that their images are coregistered.

A. Media Characterization Experiments
Range estimation and sound speed optimization experiments were carried out using two conventional handheld diagnostic arrays: 1) L7-4 linear array with 0.30-mm element spacing (ATL Ultrasound, Bothell, WA, USA) and 2) C5-2v curvilinear array with 0.51-mm element spacing on a 49.57 mm radius (Verasonics, Kirkland, WA, USA). Signal production and reception capabilities were facilitated by a research ultrasound system (Verasonics Vantage 256). All experiments employed a scattering target consisting of a 10 cm length of 0.10-mm-diameter Tungsten wire (Goodfellow Cambridge Ltd., Huntington, U.K.) held taut in a Perspex u-shaped fixture.
The arrays were operated in single plane-wave transmission B-mode in order to illuminate the target. The resulting backscatter data were acquired with B-mode image reconstruction running in real time and raw receive data saved to disk for further processing (see Sections II-C and II-D). Data were collected for array centerline to target distances of 35-95 mm, with the arrays oriented orthogonally to the wire length in order to minimize the scattering aperture.
The array experiments were performed with two propagation media configurations (Fig. 2). The first consisted solely of filtered water, with the aim of aligning and confirming target positions with the aid of sound speed estimated based on water temperature [20].
Monitoring of liquid temperatures for this and subsequent media configurations was provided by a thermocouple (Omega Engineering HYP2 type-T, Manchester, U.K.) and a dedicated data logger (Pico Technologies TC-08, St. Neots, U.K.).
In the second configuration, water was replaced with a 22.5% dilution of glycerol (Sigma Aldrich 49781-5L, Gillingham, U.K.) in filtered water in order to provide a sound speed in the published range [21] of muscle ∼1594 m/s. A 2-cm layer of gel wax (Fullmoons Cauldron, Ascot, U.K.) was placed in contact with the face of each array to mimic the sound speed contrast and refraction conditions between human muscle and peripheral fat. The gel wax was prepared first by melting and repeated cycling under vacuum to remove entrapped air before pouring into an 8-cm-diameter, 2-cm-tall mold. The molded gel wax was held on its periphery against the array faces using a plastic fixture (not shown) well away from the array elements.
Independent sound speed estimates for all materials used in the above experiments were found using through-transmission measurements made with a pair of 3.5-MHz center frequency unfocused transducers (Panametrics V384, Evident, Stansted, U.K.). The transducers were configured to face each other and were mounted to digital calipers (CD-15DAX, Mitutoyo, Andover, U.K.) in order to provide a thickness measurement when the transducers were pressed against the sample [ Fig. 2(c)] [22], [23]. Liquids were placed in a 0.25-L clear plastic bag for testing, while the gel wax was tested in air. For all measurements, transducers were coupled to the test media using ultrasound gel (Aquasonic-100, Parker Laboratories, Fairfield, NJ, USA). Signal generation and reception capabilities were provided by a pulser-receiver (Panametrics 5900PR, Evident). For each sample, data were acquired at three different positions (on the gel layer or the bag), with the media compressed to three different thicknesses per position. Sound speed at each position was found from a linear fit to the signal arrival time versus thickness data, the process of which removes the contribution of the bag used in the fluid measurements. Temperature was monitored using a thermocouple (Omega type K) mounted on one of the caliper blocks and monitored with the same logger used for the scattering experiments.

B. Array Calibration
To assess the importance of array element response variability for the present study, each array was calibrated as a function of scatterer depth using a substitution technique [17]. Briefly, a single-element focused transducer (Olympus V380) was driven by a pulser (Olympus 5900 PR) to provide a spatially concentrated and broadband (2)(3)(4)(5)(6)(7)(8) illumination of the same wire fixture described in Section II-A. The resulting scattered field was first scanned with a calibrated needle hydrophone (HNA0400, Onda Corporation, Sunnyvale, CA, USA) whose directivity had been separately measured. The arrays were then placed in the wire-scattered field in turn, and the voltage-to-pressure sensitivity of each element M j ( f, z sct ) was computed as a function of frequency ( f ) for a set of wire target positions centered on the array midline. Magnitude and phase information was retained as a function of frequency so that the full effects of the array elements and the covering lens were incorporated in the calibration.
These calibration functions were applied to the array element responses v j (t, z sct ) in the backscatter experiments (Section II-A) by: 1) applying a 6th-order Butterworth bandpass filter B[·] (3-8 MHz for L7-4 and 1-6 MHz for C5-2v) in the forward and reverse time directions to reduce out-of-band noise while eliminating filtration delay; 2) applying a Fourier transform; 3) dividing by the complex-valued sensitivities; and 4) inverting back to the time domain This last inversion step is needed purely for compatibility with the time-domain beamformers used in this study.

C. Simulations
To support simulations of PAM and B-mode behaviors for the experiments described in Section II-A, propagation delays between array elements and scatterer positions were calculated using conventional ray tracing [24] in the xz plane using custom code written in MATLAB (Mathworks, Natick, MA, USA). Array elements were treated as point receivers located at manufacturer-specified positions. Sound speed values for external media (diluted glycerol and gel wax) were taken from the results of the caliper testing and are presented in Section III. The path average sound speed for each array and media combination was found from where the "glyc" subscript refers to diluted glycerol, "gw" refers to the gel wax, and the thickness terms h compose a straight-line path from the scatterer to the nearest array element so that r sct = h glyc + h gw . The lens was excluded from the simulations, as it was assumed that each array was perfectly calibrated and therefore incorporated a lens correction.
The computed ray-tracing delays were used to generate synthetic time-domain datasets for use in PAM beamforming simulations (see Section II-D) of the configurations described in Section II-A. These simulation datasets were formed first by creating an array-specific baseline short pulse and then by propagating the pulse from each scatterer to each array element location using the information from the ray-tracing calculations above.
A baseline short pulse signal s b (t) representative of a typical B-mode transmission was created from a tone pulse with center frequency f ctr and duration matching those that were observed in the experiments: 3.5 and 5.5 MHz, respectively, for the C5-2v and L7-4, each with 1.36-μs duration The pulse envelope was modified with a Hanning window W to emulate the experimentally observed shape. The baseline signal was further shaped with a 4th-order Butterworth bandpass filter to emulate the band limits of the array element frequency response indicated by the manufacturer. The final baseline signal was normalized to have unity pressure amplitude.
To form a synthetic dataset of received pressures p sim, j on each of the j = 1:J array elements from an individual scatterer, the baseline signal was propagated from the scatterer location to each array element using the delays τ j and path lengths l j found from the ray-tracing simulations. A planewave delay τ pw was included to account for the travel of the signal emitted from the array to the scatterer. For simplicity, this delay was taken to be the time from the scatterer to the closest element of the array aperture (e.g., array centerline for an on-axis target) so that the backscattered signals were Finally, Gaussian random noise was added to the signals in (10) so that its root-mean-square value was 0.01 times that of the largest signal. This scaling approximated the upper bound of the broadband electronic noise observed in the experiments.
B-mode spatial responses were calculated by: 1) cross correlating the receiver data from (10) with the baseline pulse in (9); 2) extracting an envelope by applying a Hilbert transform to the cross correlation; and 3) interpreting time of flight as distance using assumed sound speeds that are used for both B-Mode and PAM processing. The sound speeds used with the simulated data were the same as used with the experiment data.
To keep the focus of this work on issues related to signal phase, layer reflection and path attenuation terms were neglected. The significance of this approach is discussed in

D. PAM Beamforming
This study employed two well-established PAM algorithms: a conventional beamformer [time exposure acoustics (TEA)] [25] and an adaptive beamformer [robust Capon beamforming (RCB)] [26]. Inclusion of both facilitates comparisons of source localization performance weighed against tradeoffs in computational burden and spatial resolution.
The algorithms have been implemented in the time domain and produce estimates of monopolar source energy E(r) at an imaging location r where t 0 is the data record start time, T is the signal window duration, ρ and c are the density and sound speed of the ambient medium, respectively, and bold-type quantities indicate vectors. The estimated source strength q is defined by a sum over the array elements where p(t, r j ) refers to the array response signals received at element locations r j , r s = r j − r is the distance from the j th element to the observation point, |r s |/c is a geometric steering delay, c is the sound speed chosen for the calculations, and w j are the weights applied to each element. The beamformer versions used for the present work applied the steering delays explicitly rather than as discrete sample period shifts in order to eliminate phase noise in the steering process. While the weights in (12) are set to unity for TEA, they are independently optimized by the RCB algorithm for each imaging location and each new dataset. A single input factor (a real, positive number) is specified to control the degree to which weights may be adjusted: increasing allows greater flexibility in array weight adjustment. A broader discussion of the selection of epsilon and its impact on RCB performance can be found elsewhere [26], [27]. Table I provides a full listing of the PAM parameters employed in this study. A value of = 1 was used for simulations and experiments when array calibrations had been applied. When the experiment data were left uncalibrated, values of = 12 (L7-4) and 25 (C5-2v) were applied after a preliminary processing review of the data. These higher values are consistent with prior studies [16] conducted with uncalibrated arrays.

E. B-Mode Experiment Processing
B-mode image processing for the experiment data was handled directly through the native scripts provided by and running on the array controller (L7-4Flash.m and C5-2vFlash.m, Verasonics Vantage version 4.5.4). No alterations were made to the reconstruction processing other than to evaluate the backscatter data over a range of sound speeds for use with the PAX analysis (Section II-E). The position of the spatial peak was found from the reconstruction matrix (intensity as a function of lateral and axial position) produced by the B-mode scripts.

F. PAX Method
Given the divergence of PAM and B-Mode range estimates in the presence of increasing sound speed uncertainties, it is hypothesized that this behavior could be used to coregister the two modalities. Specifically, if both imaging methods are evaluated using a modest range of sound speeds, the intersection of the speed-target depth curves should identify the coregistering sound speed. As seen in Fig. 1(b), "x marks the spot" where the sound speed is optimized for coregistration. The proposed scheme, named "passive-active crossing" (PAX), is outlined as follows and shown in Fig. 3.
Step 1: Perform a B-mode scan of the region identified for imaging (e.g., treatment volume).
Step 2: Identify a feature within a region of interest (e.g., bubble hyperecho or another echoic feature).
Step 3: Estimate the depth of this feature using B-mode and PAM for a small number (3-4) of candidate sound speeds c est,n bounded by literature or otherwise expected values for the fastest and slowest tissues. The calculations can be performed over a condensed imaging domain in order to accelerate the processing, particularly in the lateral dimension.
Step 4: Find the intersection of the resulting speed-depth curves-this defines the speed-depth pair (c pax , z pax ) that coregisters the two imaging methods for the selected scattering feature, and c pax can be used for subsequent imaging in the region of interest.
Step 5: Repeat steps 2-4 for other regions of interest if needed in order to profile the treatment volume for the therapeutic ultrasound procedure.
To summarize, backscatter data acquired during a simple imaging measurement are processed actively (B-mode, absolute time of flight) and passively (PAM, relative phase) to estimate the speed that ensures coregistration in a region of interest. This speed can then be used in subsequent therapeutic ultrasound exposures for accurate mapping of cavitation activity to the underlying tissue structure. The optimization steps, particularly 3) and 4), may be further simplified in some cases as described in Section III.

A. Curvilinear Array
The presentation of results begins with the specific example of a calibrated curvilinear array imaging through the gel wax layer and diluted glycerol to localize a target whose nominal position is (x, z) = (0, 55) mm, with all PAM results generated with the RCB algorithm. Fig. 4(a) shows the PAM and B-mode on-axis normalized intensities resulting from the assumption of a uniform sound speed equal to the path average (8) as derived from the media measurements described in Section II-A. The image registration error (distance between peaks) is 4.5 mm, which is clearly discernible relative to the individual image widths. After PAX optimization [see Fig. 4(c)], the beam peaks in Fig. 4(b) are aligned within 0.1 mm.
The trend of the PAM result being too shallow when computed using the path average speed implies that excess refraction delays through the layered media make the apparent speed slower than the path average. This behavior [see Fig. 4(d)] is consistent across all depths, as expected, and is well-captured by the simulations [see Fig. 4(e) and (f)]. The above results raise the question as to whether the ability to coregister images is dependent on array calibration or beamformer type. The answer is presented in Fig. 5, where path average and PAX optimized registration errors are shown for all four on-axis target positions, and with the data processed both with and without calibration, as well as for both TEA and RCB beamformers. Bars heights illustrate the registration errors, with numerical values listed atop each bar to the nearest 0.1 mm. To cover the calculated range of values, the displays are shown on a logarithmic scale.
Axial registration errors increase with increasing depth (up to 12.5 mm at 95-mm target depth), which are slightly larger when the arrays are uncalibrated but are not strongly dependent on beamformer type. Regardless of all these variables, PAX optimized coregistration errors of the axial image coordinate are typically no greater than 0.1 mm. Although unlikely to be of practical concern, the small residual misregistration occurs because the optimization as proposed here is noniterative. Once a handful of possible speeds are reviewed, a single estimate of the actual speed is made. Inclusion of an iteration step is certainly possible, but the error sizes do not appear to warrant further computations.
All results presented to this point have been for on-axis targets, but cavitation monitoring is routinely performed over a lateral span that at minimum captures the width of the therapeutic ultrasound beam employed to instigate bubble activity while also allowing for observation of unexpected off-target responses. Fig. 6 shows the peak locations of the PAM and B-mode images in the 2-D imaging plane for targets at ideal lateral offsets of 0.0, 5.0, and 10.0 mm from the array centerline, as calculated with the path average (left) and PAX-optimized (right) speeds. Numerical values indicate the 2-D distance between image peaks.
The results for wider lateral positions generally follow the on-axis trends, and all axial registration errors are 0.1 mm or less. However, there are small residual lateral displacements that typically drive the total optimized error to 0.3 mm or less for all but one case: for a nominal target position of (10, 35) mm, the optimized PAM result increases the lateral offset from the B-mode by 0.8 mm. For this target position, array elements on the contralateral side of the imaging plane are heavily shadowed, and in trying to use them, the adaptive beamformer creates a distorted image. This problem can be mitigated either by increasing the RCB epsilon factor (raising ε from 1 to 2 reduces the total registration error to 0.8 mm) or by allowing the beamforming aperture to rotate with imaging position as is routinely done with B-mode. Doing so comes  at a cost of reduced spatial resolution, and optimization of this aspect of PAM beamforming is the subject of ongoing research.
The foregoing presentation has compared results obtained with PAX-optimized sound speeds with those found using the path average speed, which should be near-optimal for absolute time-of-flight-based imaging. Of course, this path average may not be known a priori, and it is convenient in practice to assume a path-independent speed to simplify the imaging calculations. The impact of fixed speed choice on image coregistration is shown in Fig. 7, where two fixed speeds are added to the depth-dependent results first presented in Fig. 4(d). As shown in Fig. 7(b), a fortunate choice of a fixed speed of 1510 m/s in this example limits the registration error (z) to be within 2 mm over a depth span of 35-75 mm. By comparison, the choice of 1540 m/s leads to the largest observed error at the shallowest depth.
The foregoing examples were conducted in a configuration intended to demonstrate refraction effects in a geometry with modestly thick fat-like layer (2 cm) and an exterior environment with a measured sound speed of 1594 m/s, which is at the upper end of reported values for muscle and liver. The combination of media with a high but still realistic sound speed contrast and a thick fat layer serve to exacerbate the refraction-induced delay changes on the peripheral elements of the arrays. Examples of misregistration under less-refractive conditions were evaluated using the simulations described in Section II-C. As expected, the results (Supplemental Materials) show a gradual decrease in registration error as the fat layer is thinned and the speed of the distal medium is slowed.
The results of this section demonstrate the high sensitivity of measured and simulated curvilinear arrays to uncertainties in path sound speed, particularly as it impacts localization with passive (relative phase) imaging algorithms. At a target depth of 95 mm, the largest examined, image peak locations for PAM and B-mode are misregistered by more than a centimeter when using the path average speed, which provides an accurate B-mode localization performance. The proposed PAX algorithm typically reduces PAM-B-mode registration errors in the imaging plane to less than 0.3 and 0.1 mm in depth.

B. Linear Array
Relative to the curvilinear array, linear arrays, including the geometry examined here, have considerably smaller path angles between the array elements and imaging targets. It would therefore be expected that refraction effects would be much weaker with the linear array, leading to smaller coregistration errors. This is confirmed by the measurement and simulation results in Fig. 8, where the path average speed is seen to give an offset between image peaks of less than 1 mm. PAX optimization provides improved registration, and the optimized speeds are on the order of 0.5% above the path averages, which is rather small compared to the range of 2.2%-2.5% seen with the curvilinear array. Again, the measurement results are well supported by the simulations.
The optimized speeds for both simulation and measurement trends are slightly higher than the path average. Recalling that the path average defined here (8) is for the direct line between the target and closest array element (e.g., to the centermost element from a target lying on the array centerline), the higher optimized values simply are indicative of the longer ray paths to the peripheral elements. Since refraction from the fast to slow media shifts incoming rays closer to normal incidence on each element (Supplemental Materials), each ray segment in the faster media is slightly lengthened. This in turn acts to elevate the array-mean speed to the target relative to the previously defined path average. In other words, the path average speed is applicable for single element or small aperture processing, while the optimized value is applicable for beamforming with the full array in refracting media.
Extending the processing to targets at x = 0, 5, and 10.0 mm relative to the array centerline (see Fig. 9) reinforces the notion that the path average speed works well, with maximum While the path average speed works well for coregistration with a linear array, this speed value may not be known in advance, as pointed out earlier. The use of depth-independent speed estimates can still lead to substantial coregistration errors (see Fig. 10) depending on the constant value chosen. Therefore, even with the reduced refraction sensitivity of the linear array, PAX optimization will likely provide enhanced coregistration and take the "guesswork" out of the process. Simulations performed with the linear array geometry  For the imaging scenarios shown here, the PAX evaluation of a single imaging line at four candidate sound speeds takes approximately 0.1-0.01 (depending on the image domain used) of the time to form the full image on a common desktop computer. Small as this is, it may be possible to further accelerate and simplify the process.
For a linear array in homogeneous media, the rearrangement of (2) and (6) reveals that the true sound speed can be found directly from z pk,B−mode and z pk,PAM evaluated at any single sound speed c est . Specifically, c o = z pk,PAM /z pk,B−mode 1/2 c est .
Evaluating this expression with the on-axis linear array data above (even though the media are inhomogeneous) leads to speed estimates (c o ) that differ from the full optimization by no more than 4.4 m/s for all depths and initial speed guesses (c est ), with corresponding registration errors no greater than 0.3 mm. It therefore appears that (13) could be used for a one-step version of PAX, where beams need only be formed with a single value of c est rather than multiple values employed for the full optimization.
In summary, the results of this section showed that the linear array geometry was considerably less sensitive to path sound speed uncertainty than the curvilinear array. Still, when the propagation environment is not well known, either the full optimization or one-step versions of PAX reduce coregistration errors to sub-millimeter levels.

IV. DISCUSSION
This study has shown that innately accurate coregistration of targets imaged with absolute time of flight and relative phase processing cannot be safely presumed, even when employing the same array for both ultrasound-based calculations. Even the most thorough calculation of cavitation dose through improved beamforming, calibration, and attenuation correction would be of markedly reduced value if the region to which the activity is attributed does not reliably map to the targeted tissues. The proposed PAX algorithm is intended to ensure that ultrasound-based treatment monitoring can provide accurate treatment guidance regardless of array geometry, calibration, or beamformer choice.
Although evaluated here with several combinations of geometry, environment, and beamforming algorithm, there are many more variations to consider. For example, all simulations were 2-D, but of course, refraction can also occur in 3-D. A mitigating feature is the elevation focusing of the 2-D handheld arrays, which limits the extent to which refraction in the elevation dimension can impact detections.
An additional simplification in this study was the use of a single plane-wave B-mode sequence, which was chosen for its simplicity of interpretation and simulation. Since PAX was effective with two different PAM beamformers, it is anticipated that the method would work with other B-mode sequences as well. Evaluation of PAM and PAX under more complex and realistic scenarios is a matter of ongoing research.
The demonstrated sub-millimeter registration capabilities of PAX may be more precise than necessary, for example, when the bubble activity is widely distributed over a large target. Sub-millimeter precision may be helpful for small target treatments, but it is unclear if such precision would have general radiological value for real-time monitoring. Still, the use of PAX optimization may reduce practitioner doubt as to whether the cavitation activity is visualized properly in relation to the targeted tissue.
There are a large number of algorithms already described for sound speed estimation based on ultrasound backscatter data [28], [29], [30]. All have the aim of improving image quality and/or diagnostic efficacy, but PAX is the first to directly address coregistration of data processed with active (absolute time of flight) and passive (relative phase) algorithms, as is necessary for overlaying PAM imaging of cavitation data with B-mode structural information. Critically, the present work is not intended to estimate the local sound speed distribution or its path average but is instead intended to find a compromise for the benefit of coregistration.
To illustrate an example of how PAX performs relative to an established algorithm, the minimum phase variance method for speed estimation [31] was evaluated using the linear and curvilinear array simulation sets in layered media described in Section II-C. As shown in Table SM1 (Supplemental Materials), PAX produces similar results for the linear array, with discrepancies in speed and depth of less than 4 m/s and 0.4 mm, respectively. However, performance was significantly degraded with the curvilinear array data, leading to discrepancies in speed and depth of as much as 41 m/s and 9.8 mm for the deepest target, respectively. Other techniques may perform better than with curvilinear arrays in layered media, but their evaluation is left to future research.
In considering the literature on in situ sound speed estimation, PAX appears to be distinct from methods using array element-level variance or coherence metrics in that it operates on beamformed image data so that it should be less sensitive to noise. PAX also employs a relatively small number of sound speed evaluations-four in the present work-whereas over 100 candidate speeds may be evaluated when searching for a global minimum or maximum in a variance-or coherencerelated cost function [30], [31]. As such, there may be a considerable advantage to PAX in terms of computational efficiency and the potential to operate in real time. PAX can be implemented with coarse survey of a broad sound speed range because it compares two types of beamformers that have simple but opposing trends in sound speed sensitivity.
The acoustic properties literature contains a broad range of values for soft tissues of practical interest such as fat, liver, and muscle [32]. With uncertainties arising from measurement techniques, sample handling, and tissue variability, methods that employ patient-specific data would appear preferable to those that assume any specific literature value.
The use of a passive method to estimate path average speed and distance to a feature of interest has been described previously [33] and was later applied to short-pulse imaging of cavitation [34]. This concept was evaluated solely with linear array geometries, and as originally written, the formulation would not work with a curvilinear array geometry nor does it fully account for the covering lens. However, it appears that the method could be reformulated to account for array curvature in the imaging plane with a multiparameter fit.
The experiments and simulations in this study were conducted using idealized thin line and point scatterers, respectively, but for PAX to have a clinical impact, it must be evaluated and adopted for use with realistic scattering features. These may include exogenous highlights, including infused microbubble bubbles, cavitation-induced bubbles (as in a test exposure from a therapeutic system), or other high contrast features such as calcifications or surgical implants. Methods for windowing and processing of backscattered data from such features and the surrounding clutter have been explored in the sound speed and attenuation [35], [36] estimation literature, and their adaptation for PAX is the subject of ongoing research.
In comparison to linear arrays, curvilinear arrays are clinically valuable for their broad fields of view, and their typical operating frequency range permits diagnostic imaging at much greater depths. The present work points out challenges for the implementation of curvilinear arrays for cavitation monitoring, and the balance between benefits and difficulties must be carefully considered when defining an imaging system for monitoring of therapeutic ultrasound procedures.
Use of a single imaging depth line for PAX optimization allows for rapid evaluation, but this approach may have difficulty in tracking lateral displacement of shallow scatterers at wide angles. The approach could be modified to include a modest range of lateral image lines without great impact of evaluation time, especially since PAX is envisioned to be run before therapeutic procedures begin, and only periodically thereafter to check and update the path speed to the region of interest.
Although the arrays were calibrated, their data were not attenuation-compensated in order to focus purely on the rangeestimation issue. Full calibration of the array and propagation path is appropriate for minimum bias quantitative PAM, and a demonstration with all proposed calibration and characterization methods is part of ongoing research.
Finally, the present study is restricted to applications that are ultrasound-based for both structure and cavitation imaging. If the structural information was provided by computed tomography (CT) or magnetic resonance (MR), there would still be a need for a sound speed optimization to properly register the PAM images, and the methods in this article should still be helpful, provided that transmit-receive data collection is available. Registration of cavitation data with CT and MR structural imaging has been studied in detail for transcranial therapies, with success demonstrated using skull aberration corrections primarily based on propagation simulations [37]. Approaches based on in situ data such as described here may offer more flexibility for adjusting to changes that may occur during treatment.

V. CONCLUSION
For PAM to offer clinically viable real-time treatment guidance, it is critical that the cavitation images can be accurately mapped to anatomic information. However, this accuracy may be degraded by the different algorithmic interpretations of signal timing with B-mode (absolute time of flight for arraytarget-array propagation) and PAM (relative delay from target to array). In this work, we have illustrated how the above methodological difference may lead to substantial image misregistration over a range of environments, array geometries, and scatterer depths, with errors exceeding 1 cm in some instances.
The PAX method has been proposed to mitigate these issues, and initial evaluations performed with simulations and phantom experiments portray an in situ data-based solution that is rapidly calculable while typically yielding image coregistration errors on the order of 0.1 mm over a range of beamformers, array geometries, and imaging environments. Linear array geometries appear to be much less sensitive to path sound speed uncertainties than curvilinear array geometries, and further simplified optimization of sound speed appears to be feasible. Based on the results obtained thus far, the PAX method can help ensure that therapeutic ultrasound procedures will be guided by cavitation emissions with accurate anatomic context, representing a further step toward reliable clinical implementation.