Real-Time Speed-of-Sound Estimation In Vivo via Steered Plane Wave Ultrasound

Speed-of-sound (SoS) is an intrinsic acoustic property of human tissues and has been regarded as a potential biomarker of tissue health. To foster the clinical use of this emerging biomarker in medical diagnostics, it is important for SoS estimates to be derived and displayed in real time. Here, we demonstrate that concurrent global SoS estimation and B-mode imaging can be achieved live on a portable ultrasound scanner. Our innovation hinges upon the design of a novel pulse-echo SoS estimation framework that is based on steered plane wave imaging. It accounts for the effects of refraction and imaging depth when the medium SoS differs from the nominal value of 1540 m/s that is conventionally used in medical imaging. The accuracy of our SoS estimation framework was comparatively analyzed with through-transmit time-of-flight measurements in vitro on 15 custom agar phantoms with different SoS values (1508-1682 m/s) and in vivo on human calf muscles (N = 9; SoS range: 1560-1586 m/s). Our SoS estimation framework has a mean signed difference (MSD) of −0.6 ± 2.3 m/s in vitro and −2.2 ± 11.2 m/s in vivo relative to the reference measurements. In addition, our real-time system prototype has yielded simultaneous SoS estimates and B-mode imaging at an average frame rate of 18.1 fps. Overall, by realizing real-time tissue SoS estimation with B-mode imaging, our innovation can foster the use of tissue SoS as a biomarker in medical ultrasound diagnostics.

form ultrasound images. Typically, for image formation, the SoS inside the human body is assumed to be a constant 1540 m/s [1]. While this assumption is sufficient for forming images with adequate quality, it neglects to account for the varying SoS of different types of tissue (e.g., muscle versus fat) in the human body [2], [3]. After all, the SoS, being an intrinsic acoustic property of human tissues, can potentially serve as a biomarker for tissue composition. Indeed, studies have shown that tissue SoS can help to assess intramuscular fat content [4], [5], detect age-related or cast-related muscle decay [6], [7], differentiate between benign and malignant tumors [8], [9], [10], and quantify accumulation of fat in the liver [11], [12], [13]. It would be advantageous to clinically derive tissue SoS estimates and, preferably, render this biomarker in parallel to real-time B-mode imaging that is known for its bedside applicability [14].
Ultrasound-based SoS estimation is typically accomplished via four types of methods: 1) through-transmission time-

Highlights
• A new, real-time speed-of-sound (SoS) estimation framework has been designed for live SoS measurements in parallel with B-mode imaging on a portable ultrasound scanner.
• Our framework yields a real-time frame rate of 18.1 fps and is accurate in vitro (-0.6 ± 2.3 m/s) and in vivo (-2.2 ± 11.2 m/s) compared to time-of-flight SoS measurements.
• Live SoS estimation is a key advance in fostering the clinical translation and adoption of tissue SoS as a biomarker for pathological conditions.
In addition, there are currently no systems that can perform USCT in real time [27]. In contrast, global SoS estimation is a potential solution for accurate in vivo SoS measurement with real-time capabilities. Although global SoS estimation inherently lacks localization, this drawback is of less concern in applications with roughly homogeneous media, such as muscle tissue where the SoS has been recognized as a potentially useful biomarker [6], [7].
Early methods for estimating the global SoS could only measure the SoS using through-transmission time-of-flight methods [28], [29]. To overcome this limitation, pulse-echo SoS estimation methods have been developed to enable SoS estimation in vivo without the need for bilateral access to the tissue. These pulse-echo SoS estimation methods are based on principles such as beam tracking [30], speckle brightness maximization [31], spatial coherence [11], [12], geometric model fitting [32], or image registration for steered transmissions [33], [34]. However, few of these SoS estimation techniques have been demonstrated in real time. Another challenge of these methods is that their estimation accuracy is difficult to validate in vivo [33]. To advance the practical use of SoS as a tissue biomarker, there is a pressing need to develop an effective global SoS estimation algorithm that can function in real time while maintaining high accuracy in vivo.
In this work, we propose a new pulse-echo global SoS estimation framework with demonstrable real-time capability. We hypothesize that the SoS can be estimated through a loss minimization approach that seeks to find, for a given range of candidate SoS values, the best-fit SoS with the smallest difference between images beamformed from a set of steered plane wave transmissions using the delay-and-sum (DAS) algorithm. As will be shown, our framework can be implemented with real-time throughput on a portable research ultrasound scanner that is connected to a laptop with a discrete GPU. This setup is capable of achieving concurrent B-mode imaging and SoS estimation in real time to foster clinical translation of SoS in tissue health diagnostics.

II. THEORY
To estimate the SoS, we propose to leverage the information contained in steered plane wave transmissions that insonify the entire imaging view in every transmit firing. One key benefit of using this data acquisition approach is that one image frame can be generated from each pulse-echo sensing event. Accordingly, it enables a set of full-view images to be acquired with a limited number of pulse-echo sensing events, and these images are typically less susceptible to patient and operator motion as they are acquired over a short period. From a physics standpoint, if the assumed SoS used for DAS beamforming differs from the true SoS of the medium, differences would be expected in the beamformed images between different steering angles. As such, the global SoS of the medium can in principle be estimated by finding the SoS that minimizes the differences between the set of steered images. The following subsections explore this principle in greater detail.

A. Effect of Speed-of-Sound Mismatch on Point Targets for Steered Plane Wave Transmissions
When the beamforming SoS does not match the true medium SoS, there is a corresponding loss of image quality. We define this type of scenario to be an SoS mismatch that results in a quantifiable loss of the imaging resolution and contrast. Such an SoS mismatch phenomenon has previously been studied for focused transmissions [31]. Because our proposed method relies on unfocused transmissions, the effect of SoS mismatch for the plane wave transmission scenario needs to be examined.
For unfocused plane wave transmissions, we use a wire suspended in distilled water to study the effect of SoS mismatch on the point spread function (PSF). Fig. 1 shows the B-mode images of a single point target that was placed at approximately 25-mm depth relative to the transducer surface and beamformed under three SoS scenarios [1]. For scenarios with SoS mismatch, not only was the resolution worsened, but the PSFs of the point targets were also positionally offset for different steering angles. The bottom row of Fig. 1 shows, for two steering angles (blue: 15°; red: 1°), the corresponding beamformed point targets in transparent overlaid form for the three SoS scenarios. It can be observed that when there was an SoS mismatch [see Fig. 1(c) and (i)], the nonoverlapping PSF regions are much larger compared to when there was no SoS mismatch [see Fig. 1(f)]. Accordingly, the defocusing of the scatterers across steering angles due to SoS mismatch can be leveraged as the core physical principle to estimate SoS via plane wave pulse-echo sensing.

B. Beamforming Images for Speed-of-Sound Estimation
To leverage the inter-steering defocusing principle for global SoS estimation, we begin by acquiring raw RF data from a range of steering angles. From the data for each steering angle, images can then be beamformed at a range of beamforming SoS values. We denote the specific SoS values used for beamforming as SoS candidates. By performing DAS beamforming on the same set of RF data using many SoS candidates, we can examine the inter-steering difference between the B-mode images of the same SoS candidate and, in turn, select the SoS candidate with the minimum inter-steering difference as the global SoS estimate of the medium.

Fig. 2. Differences in transmit paths for refraction-corrected beamforming and standard delay-and-sum beamforming [35]. In (a), the difference between the assumed SoS for a plane wave transmission and the actual medium SoS leads to a refraction effect for the transmit path, compared to the straight transmit path of (b).
Our beamformer applies two key modifications to the conventional plane wave DAS method [35]. First, when calculating the transmission time-of-flight that is needed for setting beamforming delays, we have accounted for the effects of SoS mismatch since the actual transmitted angle would differ from the intended steering angle in such a scenario. Second, when using different SoS candidates for beamforming, we have scaled the axial field of view to account for the apparent axial shift of the imaged targets that would appear at, respectively, shallower or deeper depths in the beamformed images for lower or higher beamforming SoS values, as observed in Fig. 1(a)-(h). These two corrections are needed to reliably compare the computed image differences between steered transmits for different SoS candidates.
Compensating for the SoS mismatch on transmit, the steering angle used for beamforming is adjusted based on the interface between the probe surface and the imaged medium. Fig. 2 highlights the relationship between the intended steering angle and the actual steering angle when there is an SoS mismatch. If the medium SoS is higher than the transmit SoS, the steering angle is effectively increased upon transmission [see Fig. 2(a)]; conversely, if the medium SoS is lower than the transmit SoS, the steering angle is effectively reduced. Compensating for this effect in DAS beamforming is equivalent to compensating for the effect of refraction between two media. Thus, we can apply Snell's law at the interface to calculate the theoretical transmitted steering angle as follows [33]:

θ_M = sin⁻¹((v_M / v_T) sin θ_T)    (1)

where v_T and θ_T are the assumed SoS and intended transmit steering angle, respectively, and v_M and θ_M are the SoS of the imaged medium and the corresponding actual steering angle after refraction, respectively. In the delay calculation for beamforming at pixel position (z, x), we follow the modified transmit path for the time-of-flight calculation as in Fig. 2(a), leading to the following delay equation:

τ_TX(z, x) = z / (v_M cos θ_M) + (x − z tan θ_M) sin θ_T / v_T    (2)

The first term in the expression corresponds to the portion of the path that is traveling through the medium, while the second term corresponds to the virtual path for the transducer to form a steered plane wave. As shown in the Appendix, the transmit time-of-flight τ_TX of (2) can be re-expressed by substituting (1) into the second term of (2) and applying trigonometric identities to arrive at the following form that is similar to conventional plane wave DAS beamforming [35]:

τ_TX(z, x) = (z cos θ_M + x sin θ_M) / v_M    (3)

Intuitively, (2) explicitly characterizes the refraction effect versus the intended transmit, while in (3), the transmission angle θ_M was implicitly adjusted to match the medium such that the plane wave followed a straight propagation path as illustrated in Fig. 2(b). When either (2) or (3) is applied to the DAS calculations for each SoS candidate, we can calculate accurate time-of-flight values to beamform images for each steering angle, assuming that the medium SoS is equal to the beamforming SoS.
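To make the delay calculation concrete, the refracted angle of (1) and the two-term transmit delay of (2) can be sketched as follows. This is an illustrative NumPy sketch under our reading of the geometry (the function names and the decomposition into an in-medium ray plus the steering delay of its originating element are assumptions here), not the authors' implementation:

```python
import numpy as np

def refracted_angle(theta_t, v_t, v_m):
    """Snell's law at the probe-medium interface, as in (1): the intended
    steering angle theta_t (set assuming SoS v_t) maps to the actual
    propagation angle theta_m in a medium with SoS v_m."""
    return np.arcsin(np.clip(v_m / v_t * np.sin(theta_t), -1.0, 1.0))

def tx_time_of_flight(z, x, theta_t, v_t, v_m):
    """Refraction-corrected transmit time-of-flight to pixel (z, x), as in
    (2): the path through the medium along the refracted direction, plus
    the virtual steering delay of the element the ray originates from."""
    theta_m = refracted_angle(theta_t, v_t, v_m)
    in_medium = z / (v_m * np.cos(theta_m))
    virtual = (x - z * np.tan(theta_m)) * np.sin(theta_t) / v_t
    return in_medium + virtual
```

Substituting (1) into the second term recovers the compact form (3), (z cos θ_M + x sin θ_M)/v_M, consistent with the Appendix derivation referenced above.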
To adjust for the effect of different beamforming SoS values on the imaging view, the axial positions of the pixels should be adjusted depending on the SoS. In Fig. 1, the effect of different beamforming SoS values on the apparent point target position was observed. For higher beamforming SoS values [see Fig. 1(h)], the point target appears at a deeper imaging depth. For lower beamforming SoS values [see Fig. 1(b)], the point target appears at a shallower imaging depth. Without restricting the axial imaging view based on the beamforming SoS, new structures can be unintentionally introduced to or removed from the imaging view depending on the SoS candidate. To fairly compare image differences for each SoS candidate, the effective imaging view should be the same, regardless of the beamforming SoS. We can align the imaging depth in the following way to ensure the same physical imaging view (though the actual positions of the pixels would not be the same): 1) set an imaging view for a specific beamforming SoS and 2) scale the imaging view for other SoS candidates by the ratio between the candidate and the set SoS. If we denote the intended imaging depth at beamforming SoS v_T to be z_T, then the converted position for beamforming SoS v_M is given by

z_M = (v_M / v_T) z_T    (4)

While this equation is only strictly true for the unsteered axial path, it is a sufficient approximation for adjusting the field-of-view given the steering angles typically used for plane wave imaging. For example, if the desired axial field-of-view is 10-40 mm at 1540 m/s, then the axial field-of-view for an SoS candidate of 1400 m/s would be 9.1-36.4 mm and the axial field-of-view for an SoS candidate of 1700 m/s would be 11.0-44.2 mm. In this manner, we ensure that no new structures are introduced to or removed from the beamformed images for any of the SoS candidates. Thus, any differences between images beamformed from separate steering angles must be caused by the SoS mismatch.
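The depth-scaling rule above amounts to multiplying the reference axial limits by the ratio of the candidate SoS to the reference SoS; a minimal sketch (the helper name is hypothetical):

```python
def scale_axial_view(z_min, z_max, v_ref, v_candidate):
    """Scale an axial field of view set at reference SoS v_ref to a
    candidate beamforming SoS, z_M = z_T * v_M / v_T, so that every
    SoS candidate images the same physical region."""
    ratio = v_candidate / v_ref
    return z_min * ratio, z_max * ratio
```

For the reference view of 10-40 mm at 1540 m/s, this reproduces the 9.1-36.4 mm and 11.0-44.2 mm ranges quoted above.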

C. Estimating the Speed-of-Sound Through Loss Minimization
Applying the previous principles, images are beamformed for each SoS candidate and each steering angle. Next, we can evaluate the image differences between images from each steering angle. For each SoS candidate, we encapsulate the image differences between angles as a single loss value, from which the SoS that minimizes the loss can be identified. This SoS candidate can then be declared the SoS estimate for the imaged medium.
Formalizing the idea of minimizing the differences, let L be a loss function that encapsulates image differences. Also, let N_θ, N_Z, and N_X be the number of steering angles, axial pixel positions, and lateral pixel positions, respectively. Then, the input to L is an N_θ × N_Z × N_X tensor consisting of the complex (after analytic signal conversion) pixel values for each steering angle, while the output of L is a single real value representative of the image differences. The loss function is evaluated on all images beamformed using each SoS candidate to derive an associated real loss value for each candidate. From these values, a curve can be fit to evaluate where the minimum loss lies. We denote this loss curve to be the inter-steering loss. The SoS that minimizes the inter-steering loss is the SoS estimate produced by our framework.

D. Full Overview of SoS Estimation Framework
A full detailed overview of the SoS estimation framework is shown in Fig. 3. We begin by acquiring raw radio frequency (RF) data using several steered plane waves (Fig. 3, step 1). Let the number of transmit steering angles be N_θ. After one set of acquisitions, we then have a full N_RF × N_C × N_θ set of channel RF data, where N_RF is the number of RF samples per channel and N_C is the number of elements on the probe. Then, N_S SoS candidates ranging from a minimum possible SoS v_min to a maximum possible SoS v_max are selected for beamforming images to calculate the inter-steering loss. For each of the N_S SoS candidates, an N_Z × N_X image is then beamformed according to Section II-B using that SoS candidate for each of the N_θ steering angles, leading to an N_Z × N_X × N_S × N_θ data tensor (see Fig. 3, step 2). From the data tensor, the inter-steering loss is then calculated for each SoS candidate, leading to an N_S × 1 vector (see Fig. 3, step 3), from which a more refined inter-steering loss can be formed via a smoothing curve fit (see Fig. 3, step 4). After the inter-steering loss has been smoothed (see Fig. 3, step 4, red line), a minimum value can be found (see Fig. 3, step 4, black arrow). The associated SoS leading to that loss is then declared the SoS estimate.
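The four steps above can be sketched end-to-end as follows. This is a structural sketch only: `beamform` and `loss` are stand-ins for the refraction-corrected beamformer of Section II-B and an inter-steering loss, and a raw argmin stands in for the spline smoothing of step 4:

```python
import numpy as np

def estimate_sos(rf, angles, candidates, beamform, loss):
    """Steps 2-4: beamform one image per (SoS candidate, steering angle)
    pair, reduce each candidate's image stack to a single inter-steering
    loss value, and return the candidate with the minimum loss."""
    losses = []
    for v in candidates:
        # step 2: one image per steering angle at this SoS candidate
        imgs = np.stack([beamform(rf[..., i], a, v)
                         for i, a in enumerate(angles)])
        # step 3: one real loss value per candidate
        losses.append(loss(imgs))
    # step 4 (simplified): raw argmin in place of the spline smoothing
    return candidates[int(np.argmin(losses))]
```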

A. Generating Accurate Reference Speed-of-Sound Measurements With Water Validation
To validate that our proposed algorithm leads to accurate SoS estimates, we measured the SoS of in vitro phantoms and in vivo human muscles by performing through-transmission ultrasound time-of-flight measurements. For these reference measurements, we constructed a device [see Fig. 4(a)] to simultaneously measure the distance traveled (d) and the time-of-flight (t) for an ultrasound pulse for SoS calculation based on the classical d/t relation. For the distance measurement, an electronic caliper (ABSOLUTE; Mitutoyo; Kawasaki, Japan) with 0.01-mm resolution was used to accurately measure the distance. Two custom resin 3-D-printed attachments [see Fig. 4(a)] matching the lower jaws of the caliper were then used to hold two single-element transducers (C567; Olympus; Tokyo, Japan) in alignment with the caliper. One transducer was connected to an arbitrary waveform generator (SDG2042X; Siglent; Solon, OH, USA), emitting a 3-cycle, 10-V peak-to-peak, 5-MHz sinusoidal pulse. Both transducers were also connected to an oscilloscope (DS1104Z; Rigol; Beijing, China) to measure the time-of-flight from one transducer to the other. For automated time-of-flight picking from the acquired oscilloscope data, we applied the Akaike information criterion time-of-flight picker used for USCT [37]. The accuracy of our custom SoS measurement setup was validated in distilled water at 22.1 °C, which has a known SoS of 1488.6 m/s [36]. The temperature was measured with an electronic thermometer with 0.4 °C accuracy (Traceable; Webster, TX, USA). In the water, the mean device measurement of the SoS was 1488.9 m/s (σ: 0.3 m/s) with 23 independent measurements at distances ranging from 43.9 to 104.9 mm. This level of accuracy was within the temperature measurement error of the validation system. The custom device was then used to take reference SoS measurements [see Fig. 4(b)] both in vitro and in vivo to validate the accuracy of the proposed pulse-echo SoS estimation method. Before each new measurement session, the reference measurement setup was calibrated and validated again in water to ensure the correctness of our SoS reference values.

Fig. 3. Four-step overview for speed-of-sound estimation framework. Data are acquired from several steered plane waves (step 1) and then beamformed for many SoS candidates while accounting for refraction and adjusting the imaging depths so that no new structures are introduced in the beamforming view (step 2). The beamformed images are then evaluated for image differences between steering angles via a loss function (step 3). Finally, the loss function is smoothed and the loss minimum is declared to be the SoS estimate (step 4).
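The Akaike-information-criterion arrival picker mentioned above can be sketched as follows; this is a common form of the criterion, with the variance floor being our own illustrative choice rather than a detail from [37]:

```python
import numpy as np

def aic_pick(trace, fs):
    """Akaike-information-criterion first-arrival picker: the split index k
    minimizing AIC(k) = k*log(var(x[:k])) + (N-k-1)*log(var(x[k:]))
    best separates pre-arrival noise from signal. Returns time in seconds."""
    x = np.asarray(trace, dtype=float)
    n = len(x)
    eps = 1e-20  # variance floor to keep the logarithms finite
    ks = np.arange(1, n - 1)
    aic = np.array([k * np.log(np.var(x[:k]) + eps) +
                    (n - k - 1) * np.log(np.var(x[k:]) + eps)
                    for k in ks])
    return ks[np.argmin(aic)] / fs
```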

B. Construction of In Vitro Agar Phantoms
The in vitro phantoms were constructed using a mix of agar (2% by weight), graphite (2%), water (64.6%-94%), and glycerol (2%-35.6%). The water and glycerol added up to 96% of the mixture by weight, with higher glycerol concentrations leading to correspondingly higher phantom SoS [38].
To construct in vitro phantoms with a homogeneous SoS, three-step staircase molds were first 3-D printed out of polylactic acid (PLA), with step heights of 4, 5, and 6 cm. Next, the water and agar were boiled while stirring to form the material for the phantom. After the mixture boiled, glycerol and graphite were added alongside a small amount of potassium sorbate (0.3% of the total mixture weight) as a preservative. This mixture was stirred for 5 min on heat and then transferred to a magnetic stirrer to cool until the mixture reached 50 °C. Finally, the mixture was poured into the staircase molds and then refrigerated until used for the experiments. Using this procedure, we constructed 15 total phantoms akin to Fig. 4(b), which enabled robust assessment of the SoS in vitro via the reference measurement device. Before any measurements, all phantoms were removed from the refrigerator and left at room temperature overnight to ensure a consistent temperature, minimizing the effects of the SoS of the medium changing during the experiments. The measured SoS of the constructed phantoms ranged from 1508.0 to 1681.9 m/s, with the standard deviation within the reference measurements for each staircase ranging from 0.5 to 1.9 m/s.

C. In Vitro and In Vivo Measurement and Data Collection
We collected both in vitro and in vivo data to assess our proposed SoS estimation framework. First, we used B-mode ultrasound to locate and mark the position on the phantom or the muscle that we intended to measure. Next, pulse-echo plane wave RF data were acquired using the parameters of Table I. Single pulse-cycle steered plane waves were transmitted at 5 MHz, with steering angles ranging from −15° to 15° in intervals of 1°. A SonixTouch research scanner (Analogic Ultrasound; Peabody, MA, USA) with the ability to save raw channel RF data was used for data collection. To establish a reference SoS, each phantom or muscle was measured several times with the custom caliper-based setup (see Section III-A) at the marked location. Finally, the raw RF data were transferred to a computing server for offline processing of the data with MATLAB (2020b; MathWorks; Natick, MA, USA).
For the in vitro staircase phantoms, all phantoms were left outside the refrigerator overnight before any measurements were taken. This process ensured that the phantoms were at a consistent temperature. Each step of each staircase was measured five times with the custom reference setup, and then, all measurements were averaged to form a reference SoS for each staircase. Afterward, the topmost step (6-cm step) of each staircase was insonified with steered plane wave transmissions as in Table I and the raw RF data were saved for later processing. The pulse-echo acquisitions were repeated a total of ten times for each phantom.
For the in vivo human data, we acquired data from muscles of nine healthy volunteers (age: 27.7 ± 2.9 years; six males, three females). For each volunteer, we examined the gastrocnemius (calf) muscle with both the leg extended and the leg bent. The calf muscle on the dominant side of the body was selected for measurement. To minimize sources of error during SoS measurement, the reference measurements for each muscle were taken immediately following the pulse-echo acquisitions. Starting with the volunteer's leg in a bent position, ten pulse-echo acquisitions (31 angles per acquisition) were performed without moving the probe position. Immediately afterward, the muscle was measured seven times with the reference setup with slight changes in position between each measurement to cover the same field-of-view as the linear probe. The volunteer was then asked to extend the muscle and the same quantity of pulse-echo and reference data were acquired at the same location on the muscle. The bent calf and extended calf scenarios were used to investigate whether partial calf loading would lead to an SoS difference in the calf muscle, with the additional benefit of informing future studies. This research data acquisition protocol has been approved by the Research Ethics Board of the University of Waterloo (Protocol No. 44778).

D. Beamforming Specifics
From the RF data, we beamformed the set of images for each SoS candidate according to the previously described DAS beamforming scheme (see Section II-B). We selected SoS candidates ranging from 1400 to 1750 m/s in 5-m/s intervals; this range encompasses the expected range of SoS for soft tissue in the body. To avoid near-field effects, the axial imaging view was selected to range from 10 to 45 mm at 1540 m/s, which was correspondingly scaled for each SoS candidate. We additionally examined using a reduced imaging view ranging from 20 to 45 mm at 1540 m/s to mitigate errors in estimation due to thicker regions of subcutaneous fat. Because the ideal beamforming SoS of deeper regions depends on the SoS of the superficial tissue, selecting a range further from the fat layer reduces the influence of the SoS of the fat. For these axial ranges, refraction-corrected DAS beamforming was then performed according to the parameters summarized in Table II after the RF data were prefiltered and Hilbert transformed. From each set of steered transmit RF data (size 3120 × 128 × 31), we thus formed a corresponding 128 × 128 × 31 × 71 complex data tensor consisting of beamformed analytic images for each SoS candidate.

E. Speed-of-Sound Estimation
Once the images for each SoS candidate for all steering angles have been beamformed, the SoS estimates were formed by minimizing the inter-steering loss (see Section II-C). We examined two pixel-wise loss functions to capture image differences between steered angles, as pixel-wise functions are amenable to GPU acceleration for real-time operation. The first loss function is based on the coefficient of variation (CV), defined as the ratio of the standard deviation to the mean. For each pixel and each SoS candidate, we calculate the CV across the steering angles and then sum the CV over all the pixels to arrive at a single real value for each SoS candidate. Expressed symbolically, the CV loss for a specific SoS candidate v is given by the sum over the pixels p

L_CV(v) = Σ_p σ_θ(p) / |μ_θ|(p)    (5)

where σ_θ(p) is the standard deviation across angles at pixel p and |μ_θ|(p) is the mean of the absolute pixel magnitudes across angles. Intuitively, images that match better across angles should have a lower standard deviation for each pixel. For this loss to accurately capture the inter-steering loss, we assume that the effect of the SoS variation is greater than that of angle-dependent transmission. The second loss function is based on the complex autocorrelation of the pixels across angles. For each pixel and each SoS candidate, we evaluate the autocorrelation function across all steering angles, then take the negative of the absolute discrete integral of the autocorrelation at each pixel position. Expressed symbolically, the autocorrelation loss at a specific SoS candidate v is given by the sum over the pixels p

L_AC(v) = −Σ_p |Σ_θ R_p(θ)|    (6)

where R_p(θ) is the autocorrelation function at pixel p over the steering angles evaluated at lag θ. Intuitively, the pixel-wise autocorrelation across angles should be high if the images match well across steering angles. Both of these pixel-wise loss functions [see (5) and (6)] permit fast GPU implementations.
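Both losses can be sketched in NumPy. The per-pixel averaging (matching the pixel-count normalization described below) and the unwindowed autocorrelation sum are our own discretization choices, so this is an illustrative sketch rather than the exact implementation:

```python
import numpy as np

def cv_loss(imgs):
    """CV loss, as in (5): per-pixel ratio of the standard deviation across
    steering angles to the mean absolute pixel magnitude, averaged over
    pixels. imgs: complex array of shape (N_theta, N_z, N_x)."""
    sigma = np.std(imgs, axis=0)            # real-valued for complex input
    mu_abs = np.mean(np.abs(imgs), axis=0)
    return float(np.mean(sigma / (mu_abs + 1e-12)))

def autocorr_loss(imgs):
    """Autocorrelation loss, as in (6): negate the magnitude of the
    per-pixel autocorrelation summed over steering-angle lags; images that
    match across angles keep the autocorrelation high, lowering the loss."""
    n = imgs.shape[0]
    r_sum = sum(np.sum(imgs[:n - k] * np.conj(imgs[k:]), axis=0)
                for k in range(n))
    return float(-np.mean(np.abs(r_sum)))
```

A stack of identical images across angles gives zero CV loss and a strongly negative autocorrelation loss; decorrelated images raise both.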
When the loss was calculated, only the pixel positions that were fully insonified by all plane wave transmissions were used for the summations. In doing so, we avoided situations where image regions of naturally poor image quality were being compared. The loss was normalized to the number of pixels that were included in the summation. Once the loss was calculated for each SoS candidate, we then fit a normalized smoothing spline in MATLAB (using the fit function) with a smoothing parameter of 0.99. The minimum of the smoothing spline was selected as the SoS estimate for each sample. A simplified MATLAB code sample of our entire framework has been posted online (https://github.com/litmus-uwaterloo/Global-SoS).
To evaluate the accuracy of the SoS estimates generated by our framework, we compared the estimates with the through-transmit reference measurements. For the in vivo samples, we additionally analyzed the difference between the SoS estimates and reference due to the subcutaneous fat layer by measuring the thickness of subcutaneous fat and skin via B-mode imaging. We also performed paired t-tests on both the reference SoS values and the estimated values in vivo to assess whether there was a statistically significant difference between the SoS of calf muscle in a bent state versus an extended state (see Section III-C). Finally, we assessed how the DAS beamforming calculations that incorporate SoS effects (see Section II-B) affected the accuracy by estimating the SoS without the two adjustments (refraction correction and depth alignment) to the beamformer.
For all accuracy evaluations, we quantified the error via the mean signed difference (MSD) and standard deviation between the estimates and the reference measurements. A negative MSD implies that our framework underestimates the SoS, while a positive value implies that our framework overestimates the SoS; values close to 0 indicate accurate estimation on average. The standard deviation of the MSD indicates the amount of variation in the accuracy of our SoS estimates. We additionally report the maximum intra-sample variability (MIV) as the maximum over all the samples of the standard deviation across the ten acquisitions for each sample. For the in vitro samples where the probe is fixed to a stationary stand, this value indicates the robustness to noise across multiple acquisitions. For in vivo samples where the probe is held free hand, this value additionally indicates the robustness to operator and tissue motion during the acquisition.
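The two reported metrics can be computed directly; a minimal sketch with hypothetical helper names:

```python
import numpy as np

def msd(estimates, references):
    """Mean signed difference and its standard deviation: negative MSD
    means underestimation, positive means overestimation."""
    d = np.asarray(estimates) - np.asarray(references)
    return float(np.mean(d)), float(np.std(d))

def miv(per_sample_estimates):
    """Maximum intra-sample variability: the largest per-sample standard
    deviation across repeated acquisitions (rows: samples, cols: repeats)."""
    return float(np.max(np.std(per_sample_estimates, axis=1)))
```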

F. Real-Time Implementation Using Sparse Matrices and GPU Acceleration on Portable Scanner
Implementing the SoS estimation procedure in real time added additional challenges for our algorithm. First, an image for every steering angle must be beamformed for each SoS candidate. Thus, the operation time scales linearly not only with the number of pixels for the DAS beamformer but also with both the number of steering angles and the number of SoS candidates. Second, the loss function evaluation must be performed in an efficient manner after all images have been beamformed. For both of these challenges, we relied on efficient GPU implementations for a real-time realization of our estimation framework.
For real-time operation, the US4R-Lite (US4US; Warsaw, Poland) portable research scanner was connected via USB-C to a GPU-enabled ThinkPad X1 Extreme Gen 4 laptop (Lenovo; Beijing, China). The laptop was running an i9-11950H CPU (Intel; Santa Clara, CA, USA) and an RTX 3080 laptop GPU (Nvidia; Santa Clara, CA, USA) with 16 GB of VRAM. All programming was done in Python (ver. 3.8.5), with GPU acceleration provided through the CuPy (ver. 10.6.0) library and CUDA (ver. 11.2), interfacing with the US4R Python API (US4US; Warsaw, Poland) for real-time imaging. During operation, SoS estimation was applied as a pass-through method; the raw channel RF data were first processed for our custom SoS estimation framework, and then, the same raw channel RF data were output again for image formation using the built-in US4R beamforming pipeline.
To efficiently beamform the entire data tensor, we applied sparse matrix beamforming (SMB) for GPU acceleration. SMB has previously demonstrated high efficiency for high-intensity focused ultrasound monitoring and passive acoustic mapping but has also proven to be an effective means of acceleration for DAS beamforming [1], [39], [40]. SMB enabled the simple conversion of our DAS beamforming algorithm (see Section II-B) for GPU processing through CuPy. Expressing DAS beamforming as a single sparse matrix multiplication with the vectorized RF data is a mathematically equivalent reformulation that enables the use of efficient, prebuilt sparse matrix libraries for performance instead of developing a custom GPU kernel. GPU-based sparse matrix multiplication was implemented via the csr_matrix class in CuPy. For real-time implementation on our portable scanner, we additionally reduced the scope of the framework in two ways: 1) reducing the transmit steering angles to range from −15° to 15° in intervals of 3° instead of 1°, skipping the 0° transmit; and 2) reducing the SoS candidates used for beamforming to range from 1400 to 1740 m/s in 20-m/s intervals. We conducted additional SoS estimation experiments to ensure that we minimized the loss of accuracy with the framework scope reduction.
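The SMB idea can be illustrated on the CPU with SciPy's csr_matrix, which shares its interface with the CuPy class named above; the nearest-neighbor delay indexing and channel-major data layout here are simplifying assumptions of this sketch:

```python
import numpy as np
from scipy.sparse import csr_matrix

def build_das_matrix(sample_idx, n_rf):
    """Build the sparse DAS operator: row p carries a 1 at column
    c * n_rf + sample_idx[p, c] for each receive channel c, so that
    A @ rf.ravel() sums the delayed samples of all channels per pixel.
    sample_idx: (n_pixels, n_channels) nearest-neighbor delay indices."""
    n_px, n_ch = sample_idx.shape
    rows = np.repeat(np.arange(n_px), n_ch)
    cols = np.tile(np.arange(n_ch), n_px) * n_rf + sample_idx.ravel()
    data = np.ones(n_px * n_ch)
    return csr_matrix((data, (rows, cols)), shape=(n_px, n_ch * n_rf))
```

Because the matrix depends only on the geometry and the SoS candidate, it can be prebuilt once per candidate and reused for every incoming frame.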
To efficiently evaluate the inter-steering loss, we used functions from the CuPy library for GPU acceleration. Because the two loss functions [see (5) and (6)] yielded similar estimation accuracy, we implemented the CV loss [see (5)] for real-time operation as it was more computationally efficient. In CuPy, the complex beamformed data were split into real and imaginary portions for computation of the mean and standard deviation of each pixel, matching the MATLAB definitions of those functions for complex data. For loss smoothing, the calculated losses for each SoS candidate were transferred to the CPU so that the UnivariateSpline function from the SciPy (ver. 1.5.2) library could be applied (setting s = 0) before finding the loss minimum. This loss minimum was then selected as the SoS estimate for the corresponding compounded B-mode image frame. For smoother real-time estimation, the displayed SoS estimate was a running average of the ten most recent estimates.
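A CPU sketch of this loss-and-minimization step, using NumPy/SciPy analogues of the CuPy calls, might look as follows. The aggregation of the per-pixel CV into a scalar loss by averaging is an illustrative assumption, and the complex standard deviation follows MATLAB's convention (ddof = 1 over real and imaginary parts jointly):

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

def cv_loss(bf_stack):
    """Coefficient-of-variation loss across steering angles for one SoS candidate.
    bf_stack: complex array (n_angles, nz, nx) of beamformed images.
    Real and imaginary parts are split so that mean/std match MATLAB's
    definitions for complex data (illustrative reimplementation)."""
    mean = bf_stack.mean(axis=0)
    std = np.sqrt(bf_stack.real.var(axis=0, ddof=1) +
                  bf_stack.imag.var(axis=0, ddof=1))
    return np.mean(std / (np.abs(mean) + 1e-12))

def pick_sos(candidates, losses):
    """Fit an interpolating spline (s = 0) over the per-candidate losses and
    return the SoS at the minimum of the smoothed curve."""
    spline = UnivariateSpline(candidates, losses, s=0)
    fine = np.linspace(candidates[0], candidates[-1], 1000)
    return fine[np.argmin(spline(fine))]
```

Evaluating the spline on a fine grid lets the estimate fall between the coarse 20-m/s candidate spacing used in the real-time configuration.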
Videos of live simultaneous B-mode imaging and SoS estimation were captured for three scenarios: 1) a bipartite in vitro agar phantom with a slower SoS on one side and a faster SoS on the other; 2) ex vivo porcine meat from the shoulder (a fatty cut) and the loin (a lean cut); and 3) an in vivo human biceps, where the muscle was flexed and relaxed during the acquisition. For scenario 1, the SoS values of the two sides were quantified by creating staircase phantoms using the same mixtures as the bipartite phantom. For scenario 2, the SoS was quantified using our custom SoS measurement device. For scenario 3, only qualitative observations were made. These three scenarios served to demonstrate the generalizability and applicability of our SoS estimation framework. In addition, we measured the computation time for each step of our framework during real-time operation, averaging the timings over 200 frames.

IV. RESULTS

A. High Accuracy of Speed-of-Sound Estimates Compared to Reference Measurements
On the acquired data (see Section III-C), the estimates generated by our proposed framework are accurate both in vitro and in vivo, regardless of the applied loss function. The top row of Fig. 5 shows the SoS estimates compared to the reference values for the in vitro agar staircase phantoms for two loss functions [CV loss: (a) and (c); autocorrelation loss: (b) and (d)] at two different beamforming ranges (10–45 mm, left side; 20–45 mm, right side). Substantiating the estimation accuracy for the in vitro cases, all framework variations have less than 2.5 m/s mean error. Examining one subplot in greater detail, in Fig. 5(c), the mean error is −0.6 m/s, while the error standard deviation is 2.3 m/s. For all plots in Fig. 5, the error is quantified via the MSD in the bottom-right corner. Also shown in the bottom-right corner, the MIV for all in vitro samples was less than 0.5 m/s in all plots. For the more challenging in vivo scenarios (see Fig. 5, bottom row), our framework still maintains less than 10-m/s mean difference across the varying framework conditions. There is a minimal difference in the estimates between the two compared loss functions [see Fig. 5(e) versus (f) and Fig. 5(g) versus (h)], but a large difference in the estimates between the two beamforming depths for the same loss function [see Fig. 5(e) versus (g) and Fig. 5(f) versus (h)]. Beamforming at depths that excluded the subcutaneous fat layer [see Fig. 5(g) and (h)] resulted in reduced SoS estimation differences compared to beamforming at depths that may include the subcutaneous fat layer for specific individuals [see Fig. 5(e) and (f)]. In the in vivo cases, the MIV was higher than in the in vitro cases but did not exceed 2.5 m/s. We denote the framework of Fig. 5(c) and (g), using CV loss and 20–45-mm beamforming depths, as the baseline framework.
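For clarity, the two reported metrics can be sketched as below. The MSD is the mean of (estimate − reference); the MIV definition shown, the largest standard deviation of repeated estimates on any single sample, is our assumed reading of the metric:

```python
import numpy as np

def mean_signed_difference(estimates, references):
    """MSD ± SD, as reported on each Fig. 5 subplot."""
    d = np.asarray(estimates, dtype=float) - np.asarray(references, dtype=float)
    return d.mean(), d.std(ddof=1)

def max_intra_sample_variability(estimates_per_sample):
    """MIV: largest spread of repeated estimates on any one sample
    (assumed definition, quantified here as a standard deviation)."""
    return max(np.std(reps, ddof=1) for reps in estimates_per_sample)
```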
Further investigating the effect of subcutaneous fat on the SoS estimates, Fig. 6 plots the absolute difference between the estimated and reference SoS (CV loss) against the measured subcutaneous fat depth for two beamforming depth ranges. In Fig. 6(a), beamformed at depths from 10 to 45 mm, the difference tends to increase with subcutaneous fat depth, especially beyond 10 mm of fat; quantified, the mean absolute difference was 11.8 m/s. In contrast, the baseline framework, which restricted the beamforming depths to between 20 and 45 mm, led to a profile less dependent on the subcutaneous fat layer, with a mean absolute difference of 9.3 m/s. Fig. 6(c) and (d) shows two sample B-mode images of a male subject (M) with a thin subcutaneous fat layer (including skin) and a female subject (F) with a thick subcutaneous fat layer (including skin), alongside the points used to estimate the thickness of the layer in the B-mode images (red diamonds). The points were manually selected to lie on the upper boundary of the first region containing muscle fibers [white arrows in Fig. 6(c) and (d)].
We additionally examined whether there was a difference between bent-leg and extended-leg measurements of the calf SoS (with no muscle activation). Using the baseline framework setup, a paired t-test (n = 9) on the estimated SoS for the pairs of measurements failed to reject the null hypothesis (p = 0.19). A paired t-test on the reference SoS for the same measurements likewise failed to reject the null hypothesis (p = 0.18).
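This paired comparison can be reproduced with `scipy.stats.ttest_rel`; the nine values below are fabricated for demonstration and are not the study's data:

```python
import numpy as np
from scipy.stats import ttest_rel

# Illustrative paired t-test mirroring the bent- vs. extended-leg comparison.
# These nine SoS values (m/s) are made up for demonstration purposes only.
bent = np.array([1570., 1575., 1580., 1572., 1578., 1574., 1576., 1571., 1579.])
extended = np.array([1571., 1574., 1582., 1570., 1579., 1573., 1577., 1572., 1578.])
t_stat, p_value = ttest_rel(bent, extended)
# A p-value above 0.05 would, as in the paper, fail to reject the null hypothesis.
```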

B. Importance of Refraction Correction for Accurate Estimates
To assess why correctly modeling the geometry in DAS beamforming (see Section II-B) is important for accurate estimates, we separately omitted refraction correction and depth alignment from the baseline framework. Fig. 7 shows the results of performing SoS estimation without the corrected DAS beamforming procedure. Compared to the baseline SoS estimation framework [see Fig. 7(a)], eliminating refraction compensation leads to large errors in the SoS estimates overall (in vitro MSD: −26.5 m/s; in vivo MSD: −20.6 m/s) [see Fig. 7(b)]. Eliminating depth alignment affects the standard deviation of the in vivo estimates only, increasing it from the 11.2-m/s baseline to 12.8 m/s, though an F-test for equality of variances reveals that this difference is not statistically significant [see Fig. 7(c)]. While the depth alignment proved insignificant on our dataset, we believe that the alignment can still be relevant as it leads to greater consistency in the selection of RF data used for beamforming across all SoS candidates. For example, if a bone structure is at the bottom edge of the imaging view, then SoS estimation bias may result from the images of some SoS candidates including data from the bone while others do not. Our baseline framework with the beamforming modifications [see Fig. 7(a)] has the least mean difference, along with the lowest standard deviation, for both the in vitro and in vivo samples, compared to not applying the beamforming modifications.

C. Framework Is Capable of Real-Time SoS Estimation on Portable Open-Platform Ultrasound Scanner
For real-time implementation, we decreased the scope of the estimation framework by reducing the number of transmit angles (from 31 to 10) and the number of SoS candidates (from 71 to 18) for beamforming (see Section III-F). We observed that the reduction in framework scope did not significantly reduce the SoS estimation accuracy (see Fig. 8) compared to the baseline framework. For both the in vitro and in vivo samples, the reduced framework [see Fig. 8(b)] had similar accuracy to the baseline framework [see Fig. 8(a); identical to Fig. 7(a)]. Quantifying the accuracy on the dataset, the reduced framework achieves only marginally lower accuracy in vivo than the baseline framework (MSD: −2.2 ± 11.6 m/s versus −2.2 ± 11.2 m/s baseline), along with a similarly marginal increase in the MIV (2.93 m/s versus 2.50 m/s). Three videos of live SoS estimation were taken to demonstrate the applicability and accuracy of our real-time framework (see Section III-F). In the first video (Movie 1), the probe was slid along the length of an in vitro bipartite phantom. We observed that when the imaged region was fully contained in one medium, the SoS estimates closely matched the corresponding measured SoS values. In addition, the transition of the SoS estimates was smooth as the probe was slid from one side of the phantom to the other [as in Fig. 9(a) and (b)]. In the second video (Movie 2), the probe was moved between a fattier sample (porcine shoulder, left side) and a leaner sample (porcine loin, right side). Qualitatively, the shoulder has a slower SoS [see Fig. 9(c)], while the loin has a faster SoS [see Fig. 9(d)]. Our live estimates were within 5 m/s of the reference SoS measurements. In the third video (Movie 3), the probe was held in approximately the same position on the right biceps muscle while the individual flexed and relaxed the muscle. Not only was there a qualitative difference in the SoS between the flexed [see Fig. 9(e)] and relaxed [see Fig. 9(f)] states, but the displayed estimates were also stable when the muscle was not transitioning between states. Due to the rounded geometry of the biceps muscle, through-transmit measurements were difficult to obtain.
The average frame rate during real-time operation was 18.1 fps, measured over 200 frames. To further characterize this frame rate, we measured the average time taken by each section of the estimation algorithm and by the built-in image formation and display pipeline. The average preprocessing time (prefiltering and Hilbert transform) for the RF data from all ten steered transmissions was 7.8 ms. The average time taken for beamforming across all SoS candidates and angles was 38.6 ms. The average time for loss calculation and minimization was 1.2 ms. Finally, the average time taken by the built-in image formation pipeline was 6.1 ms.
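Summing these per-stage averages gives a compute-bound ceiling on the frame rate that is consistent with the measured 18.1 fps, with the small remaining gap attributable to transfer and display overhead:

```python
# Per-frame time budget from the measured stage averages (ms).
timings_ms = {"preprocessing": 7.8,
              "beamforming": 38.6,
              "loss + minimization": 1.2,
              "image formation": 6.1}
total_ms = sum(timings_ms.values())   # 53.7 ms per frame
ceiling_fps = 1000.0 / total_ms       # ~18.6 fps, just above the measured 18.1 fps
```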

V. DISCUSSION

A. New Algorithm and Device for Real-Time Speed-of-Sound Estimation
In this work, our two main contributions are: 1) a new algorithm for SoS estimation via steered plane waves and 2) the development of a device capable of simultaneous SoS estimation and B-mode imaging in real time. Our new SoS estimation algorithm leverages the variations that arise from the rotation of the defocused scatterers when an incorrect SoS is used for beamforming (see Fig. 1). By beamforming the same RF data at many SoS candidates, we can find the SoS that minimizes the differences between images across the set of steering angles; the minimizing SoS is declared the SoS estimate (see Fig. 3). To validate that our new framework results in accurate SoS estimates, a custom through-transmit device (see Fig. 4) was used to take reference SoS measurements for custom agar phantoms and human calves. The average error of the proposed framework, as quantified by the MSD, was measured to be −0.6 m/s in vitro [see Fig. 5(c)] and −2.2 m/s in vivo [see Fig. 5(g)] on the collected datasets, showing a high degree of accuracy compared to the reference measurements. The MIV was 0.23 m/s in vitro and 2.50 m/s in vivo, demonstrating our framework's consistency and robustness. For real-time application, the proposed framework was implemented on a portable open-platform ultrasound scanner (see Section III-F). The numbers of steering angles and SoS candidates used for beamforming were both reduced to accommodate the device hardware and improve the real-time frame rate, without reducing the SoS estimation accuracy (see Fig. 8). Using the US4R-Lite system, we demonstrated concurrent SoS estimation and B-mode imaging at 18 fps for in vitro (Movie 1), ex vivo (Movie 2), and in vivo (Movie 3) imaging scenarios.

B. Importance of Live Speed-of-Sound Estimation Framework
The ability to perform point-of-care imaging with ultrasound is a distinct advantage over other medical imaging modalities [41]. By performing tissue SoS estimation in real time, we can better connect the SoS as a biomarker with the B-mode images displayed to the sonographer. For example, in Movie 3, the estimated SoS can be associated with the state of the biceps muscle as it transitions from one state to another. Recently, it has been shown that ultrasound of the biceps using shear wave elastography can help assess rigidity in Parkinson's disease [42]; SoS estimation, similar to elastography, quantitatively measures a biomechanical property of the muscle tissue. Our work can aid clinical translation efforts that relate tissue SoS to muscle health and function. We envision that tissue SoS could serve as a one-number metric, similar to how FibroScan summarizes liver stiffness via elastography [43].
Our investigation has advanced the state of the art in SoS estimation in multiple ways. First, we have explicitly demonstrated the effects of omitting refraction compensation during beamforming on the resulting SoS estimates [see Fig. 7(b)]. These effects are seldom mentioned in the literature, though a few works have previously asserted the necessity of such compensation [22], [24], [33]. Local SoS estimation methods could also benefit from refraction correction, potentially increasing their estimation accuracy where such correction has not yet been implemented [17], [18], [44]. One method for local SoS mapping that already applies a form of refraction correction uses the eikonal equation to calculate the time of flight [45]. Second, we show that global SoS estimation in real time is feasible and that the corresponding real-time insights can be used simultaneously with B-mode imaging. Other global SoS estimation algorithms may also be amenable to real-time implementations. We believe that live imaging will have the most impact on SoS as a biomarker. Because of the simplicity and efficiency of our framework, the global SoS estimate can potentially serve as an initial SoS value for local SoS estimation algorithms that depend on post-beamformed images [22], [46]. Finally, unlike SoS estimators based on through-transmission time-of-flight measurements, our framework can be integrated into an ultrasound scanner without hardware modifications, as long as the scanner is capable of plane wave transmissions and raw RF data collection. Because we used a pass-through method (see Section III-F), our framework can be inserted into the processing pipeline of any system capable of real-time software beamforming [47]. In addition, by passing the live SoS estimate into the beamformer used to display the image, B-mode image quality could be improved as well, though we did not implement this feature on our system.

C. Limitations and Future Work
While we achieved our goal of real-time SoS estimation, our work can still be improved via both framework improvements and application studies. SoS estimation is highly sensitive to the measured delays and distances. Before each experimental session, calibrating the reference measurement device (see Fig. 4) to the order of 0.01 mm and 0.1 µs is necessary to ensure optimal accuracy. Similarly, the receive delays introduced by the imaging system when collecting the raw RF data (the time between the first RF sample and the first element firing) were characterized to 0.1 µs. An even more precise understanding of all the system and probe parameters, such as accounting for the effect of the probe lens, would improve the SoS estimation accuracy. Moreover, the demonstrated frame rate of 18.1 fps can be improved further. Options include using a custom GPU implementation for the entire framework or performing a parameter sweep to assess the minimum numbers of angles and SoS candidates required for accurate estimation.
Measuring the SoS in vivo is challenging for several reasons. For our experimental dataset, the calf muscles of the healthy volunteers almost all fell within the 1570–1590 m/s range, limiting the breadth of our in vivo validation. Measuring the SoS at fine resolution in vivo poses inherent difficulty because the SoS depends on the temperature of the tissue at the time of measurement [29]. This temperature dependence can also be leveraged beneficially, as live SoS estimation can potentially find applications in monitoring thermal changes in tissue, such as during an ablation procedure [48]. We are planning a future study involving more muscle groups and more participants to better assess how muscle SoS relates to muscle health and other related parameters. Yet another in vivo challenge is how the subcutaneous fat thickness affects the accuracy of our global estimation framework (see Fig. 6). While our real-time framework does demonstrate robustness in inhomogeneous scenarios (Movies 1 and 2), separating the SoS values of the fat and muscle layers would be superior for using tissue SoS as a biomarker of tissue health. One possible approach from the literature is to assign nominal values to the subcutaneous layer and then solve for the SoS in the region of interest [11]. Extending the methods in this article toward real-time layer-wise (as opposed to global) SoS estimation in vivo is another direction that our group is currently developing, following a trend similar to [24]. Frameworks capable of performing SoS estimation layer-wise, region-wise, or locally in real time can be extremely useful for improved quantification of tissue health via the SoS biomarker [13].

VI. CONCLUSION
The ability to estimate tissue SoS in real time augments the benefits of ultrasound as a point-of-care imaging modality. In this article, we proposed a new real-time algorithm for estimating the SoS using steered plane wave transmissions and implemented it on a portable ultrasound scanner. By creating a device capable of simultaneous SoS estimation and B-mode imaging, we hope to accelerate clinical translation research on the use of the SoS as a biomarker for tissue health.

Fig. 1. B-mode images of a wire target suspended in distilled water, beamformed at three SoS values: (a) and (b) 1389 m/s; (d) and (e) 1489 m/s; (g) and (h) 1589 m/s. Results are shown for two steering angles (1° and 15°) and are displayed at 6-dB dynamic range from the peak signal. Overlaid point targets for different steering angles are shown in blue (1° steering) and red (15° steering) in the bottom row.

Fig. 2. Differences in transmit paths for refraction-corrected beamforming and standard delay-and-sum beamforming [35]. In (a), the difference between the assumed SoS for a plane wave transmission and the actual medium SoS leads to a refraction effect along the transmit path, compared with the straight transmit path in (b).

Fig. 4. (a) Setup for measuring a reference SoS via through transmissions by attaching transducers to an electronic caliper. (b) Using the setup to measure the SoS of an agar staircase phantom.

Fig. 5. Speed-of-sound estimates for the collected ultrasound data as compared to the reference measurements. Two beamforming depths and two loss functions are computed for (a)-(d) in vitro agar staircase phantoms and (e)-(h) in vivo human calves in two states. On each plot, the mean signed difference (MSD) and the maximum intra-sample variability (MIV) are reported in m/s. Values in bold represent the quantified accuracy of what we denote as our baseline framework.

Fig. 6. (a) and (b) Difference analysis graphs comparing the absolute difference between the estimated and reference SoS (coefficient of variation loss) versus the measured subcutaneous fat depth via B-mode imaging for two beamforming depths. (c) and (d) Sample B-mode images along with the points used for manual estimation of subcutaneous fat depth (the upper boundary of the muscle fibers).

Fig. 7. Speed-of-sound estimates for the collected ultrasound data for (a) the proposed framework with coefficient of variation loss, (b) the same framework as in (a) but without refraction compensation, and (c) the same framework as in (a) but without depth alignment. On each plot, the mean signed difference (MSD) and the maximum intra-sample variability (MIV) are reported in m/s.

Fig. 8. Plots comparing the SoS estimation of the full original proposed framework (a), using 31 steering angles and 71 SoS candidates, with a reduced framework (b), using ten angles and 18 SoS candidates. On each plot, the mean signed difference (MSD) and the maximum intra-sample variability (MIV) are reported in m/s.

Fig. 9. Snapshots of Movies 1-3 at different time points. (a) and (b) The probe is slid along the middle of the bipartite phantom, showing that the SoS estimate lies between the reference measurements. (c) and (d) The probe is moved between the ex vivo fattier pork shoulder and the leaner pork loin. (e) and (f) The probe is placed on the biceps muscle in vivo while the muscle is contracted and relaxed. * indicates that reference through-transmit measurements were taken.