Efficient Parallel-Beamforming Based on Shared FIFO for Ultra-Compact Ultrasound Imaging Systems

Realizing a parallel-beamforming (pBF) method in ultra-compact ultrasound imaging systems is challenging because uncompromised beamforming accuracy must be achieved with limited hardware and power resources. In this paper, we present a new hardware- and power-efficient pBF method that utilizes a shared first-in-first-out block on a post-fractional delay filtering architecture (pBF-sFIFO). For a given analog-to-digital conversion (ADC) rate, the proposed pBF-sFIFO method yielded beamforming accuracy comparable to that of an unconstrained pBF (pBF-CON) method, with up to 15% less power consumption. In contrast, a conventional time-sharing pBF method (pBF-TS) was more vulnerable to aliasing artifacts. Although increasing the ADC rate in the pBF-TS method could recover the beamforming accuracy, it came at the cost of a steep, nonlinear increase in power consumption. Therefore, the pBF-sFIFO method would be an effective solution to enable advanced imaging features in ultra-compact ultrasound imaging systems.


I. INTRODUCTION
In medical ultrasound imaging, there is a trend toward supporting point-of-care diagnosis outside of hospitals, such as in the home, at a patient's bedside, on the battlefield, and in the emergency room [1]. This framework necessitates an ultra-compact form factor built on highly integrated circuit solutions: field-programmable gate arrays (FPGAs) and application-specific integrated circuits (ASICs) [2]-[10]. Along with this trend, a high frame rate has been desired to enable more advanced imaging features that need to compound or analyze temporal changes of ultrasound signals [11]-[16]. For example, color Doppler imaging, an essential diagnostic technique, necessitates multiple ensemble scanlines to extract the Doppler component, and the imaging frame rate is inversely proportional to their number. To address this technical need, the parallel-beamforming (pBF) method has been investigated to generate multiple scanlines simultaneously from a single excitation event. This leads to an increase in frame rate and scanline density over a conventional single-beamforming (sBF) method [17]-[22]. However, an unconstrained pBF (pBF-CON) method, which employs a dedicated dynamic receive beamformer for each parallel scanline, would be suboptimal because of the excessive burden on hardware resources and power consumption. This eventually lowers the area efficiency and battery life of the implemented system. Therefore, a hardware- and power-efficient pBF architecture is needed for ultra-compact ultrasound imaging systems. A time-sharing pBF (pBF-TS) method has been most widely used for its efficiency in hardware and power consumption; in it, a single dual-port memory at each channel generates multiple scanlines by simply time-multiplexing the given operating rate, f_s [23], [24]. However, the pBF-TS method has limited applicability because the effective beamforming rate is lowered by the number of parallel scanlines, M (i.e., f_BF = f_s/M). For example, the effective f_BF for quad-beamforming (M = 4) at an f_s of 40 MHz would be 10 MHz, at which aliasing artifacts appear for signal components above 5 MHz. It would therefore only support clinical ultrasound transducers in a low center-frequency range (f_0 = 2-4 MHz) with a bandwidth restriction [25], limiting the efficacy of the ultra-compact ultrasound imaging system.
The associate editor coordinating the review of this manuscript and approving it for publication was Yiming Huo.
In this paper, we present a pBF method based on a shared first-in-first-out block (pBF-sFIFO) on a post-fractional delay filtering dynamic receive beamforming architecture [26], [27], in which both optimal image quality and low power consumption can be obtained without lowering f_BF below the given f_s.

II. BASE ARCHITECTURE
In dynamic receive beamforming, a temporal resolution of time-of-flight calculation corresponding to 16 times f_0 is needed to obtain appropriate image quality. There are two straightforward solutions: (1) operating the ADCs and beamforming circuitry at 16 f_0, or (2) temporally interpolating the radio-frequency (RF) data digitized at a lower f_s (e.g., 4-8 f_0), followed by beamforming circuitry operating at 16 f_0 [28]. However, both solutions have clear drawbacks in ultra-compact ultrasound imaging systems. Implementing an analog or digital front end operating at 16 f_0 is impractical (e.g., 160 MHz for a 10-MHz f_0) because it results in poor battery life and high implementation cost with challenging timing constraints.
The post-fractional delay filtering-based dynamic receive beamforming architecture was proposed for a portable ultrasound imaging system. This architecture employs selective fractional delay (FD) filtering on coarsely pre-beamformed data in order to achieve uncompromised beamforming accuracy without increasing either f_s or f_BF. In the previous studies, 16-f_0 beamforming performance comparable to that of an interpolation-based beamformer was obtained at less than 46% of the hardware complexity [18]. Figure 1 shows the conceptual diagram of the architecture when delay compensation requires 4 fractional delays (i.e., 0, 0.25, 0.5, and 0.75 of the ADC sampling interval). The architecture consists of 5 procedures:
1) The digitized RF data from the ADCs are sequentially stored in the RF memory of each channel (i.e., dual-port memory) at f_s.
2) The RF data samples indicated by the successive focusing addresses in units of the ADC sampling interval (i.e., coarse addresses) are temporarily pulled into the first-in-first-out (FIFO) block, by which the array of sequential RF data samples (block data set) is generated. The coarse address of each channel is determined by the Cartesian geometry of the ith channel, with i ranging from 1 to the total number of channels N; the sample index k; the distance R from the center channel to a focusing point; the speed of sound c in soft tissue, e.g., 1,540 m/s; and the steering angle θ.
3) The block data sets with the same fractional delays along the channels are pre-beamformed on a selective summation block (SSB).
4) The block data sets are parallelized and fed into the FD filters [27], [29], [30], which yield fractionally delayed RF data samples. For example, 8-tap FD filters optimized for the frequency spectrum can provide the group delay responses over the frequency range from 0 to 15 MHz when an f_s of 40 MHz is employed, without altering the magnitude responses (Figure 2).
5) Finally, the aggregation of the FD filter outputs yields the beamformed data for an individual focusing point without increasing f_BF beyond the given f_s.
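As a minimal sketch of how a focusing delay separates into the coarse address and one of the four fractional delays described above, the following Python helper quantizes a delay expressed in ADC sampling intervals. The function name `split_delay`, the `n_frac` parameter, and the round-to-nearest rule are assumptions made for illustration; the text does not state how the hardware quantizes the delay.

```python
import math


def split_delay(delay_samples, n_frac=4):
    """Split a delay (in ADC sampling intervals) into (coarse, fractional) parts.

    Assumption: the delay is rounded to the nearest 1/n_frac of a sampling
    interval; the coarse part would address the RF memory and the fractional
    part would select the FD filter.
    """
    # Number of 1/n_frac steps, rounded to nearest (assumed rule).
    steps = math.floor(delay_samples * n_frac + 0.5)
    # Integer part -> coarse address, remainder -> fractional delay index.
    coarse, frac_index = divmod(steps, n_frac)
    return coarse, frac_index / n_frac
```

With these assumptions, a delay of 10.3 sampling intervals maps to coarse address 10 with a fractional delay of 0.25, and a delay of 10.9 rounds up to coarse address 11 with a fractional delay of 0.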

III. PARALLEL-BEAMFORMING BASED ON A SHARED FIFO BLOCK
A. OVERALL ARCHITECTURE
Figure 3 shows the conceptual diagram of the modified architecture supporting the proposed pBF-sFIFO method. The fundamental architecture is based on the post-fractional delay filtering described in the previous section, but now involves additional parallel-beamforming paths. Each path includes dedicated SSB and FD filter blocks, and the RF data at each channel can be accessed simultaneously by these parallel-beamforming paths. A detailed hardware processing flowchart is presented in Figure S1 in the supplementary materials.

B. SHARED FIFO BLOCK
To supply a different block data set to each parallel-beamforming path, the proposed sFIFO block at each channel temporarily stores an extended number of RF data samples. Thus, the appropriate block data set can be multiplexed to the respective parallel-beamforming paths without lowering f_BF, as shown in Figure 3 (e.g., L_sFIFO = 5 and L_FD = 4, where L_sFIFO and L_FD are the lengths of the sFIFO block and FD filter, respectively). More specifically, Figure 4(a) shows the sFIFO block, which stores the extended number of RF data samples and shifts by a single step whenever the RF memory address is increased. The RF memory address is pointed by , respectively. In this case, the parallel-beamforming address calculator will produce 1, 0, and γ for τ 1 , so that local block data sets can be concurrently delivered to each parallel-beamforming path at every focusing point k.
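The shared FIFO behavior described above can be sketched in software as follows. This is only an illustrative model, not the hardware design: the class name `SharedFifo`, its methods, and the integer `offset` used to select each path's window are hypothetical, and the example lengths (L_sFIFO = 5, L_FD = 4) are the ones quoted for Figure 3.

```python
from collections import deque


class SharedFifo:
    """Toy model of one channel's shared FIFO (names are hypothetical)."""

    def __init__(self, l_sfifo, l_fd):
        if l_sfifo < l_fd:
            raise ValueError("sFIFO must be at least as long as the FD filter")
        self.l_fd = l_fd
        # A bounded deque drops the oldest sample on every push,
        # i.e., the FIFO shifts by a single step per new RF sample.
        self.buf = deque([0] * l_sfifo, maxlen=l_sfifo)

    def push(self, sample):
        """Shift in the RF sample read at the next RF memory address."""
        self.buf.append(sample)

    def block(self, offset):
        """Return the L_FD-sample block data set for one parallel path."""
        if not 0 <= offset <= len(self.buf) - self.l_fd:
            raise IndexError("offset outside the shared FIFO")
        data = list(self.buf)
        return data[offset:offset + self.l_fd]
```

After pushing the samples 1 to 5 into `SharedFifo(5, 4)`, two parallel paths with offsets 0 and 1 read `[1, 2, 3, 4]` and `[2, 3, 4, 5]` from the same storage in the same cycle, which is the sharing that avoids a separate memory per path.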

D. ERROR-FREE CONDITION
The proposed pBF-sFIFO method should satisfy two constraints for error-free operation. The first constraint is that the difference between adjacent τ^RF_i[k] should not exceed a single ADC sampling interval, so that block data sets are constructed without omitting any RF data sample. Not meeting this condition causes beamforming errors during FD filtering that last until the erroneous boundary is drained out of the sFIFO block. When τ^RF_i[k] is fixed at a specific m, the parallel-beamforming addresses at the kth focusing point can be rewritten from Eq. (1-1) as Eq. (3-1), where (x_m, z_m) and θ^m_i are the position offset and steering angle for the mth parallel scanline, respectively. Therefore, the instantaneous increment of τ^RF_i[k] at the kth focusing point can be derived by differentiating Eq. (3-1) with respect to k, which gives Eq. (4). Eq. (4) can be rewritten for the given error-free condition (i.e., an increment ≤ 1) as Eq. (5). Finally, Eq. (5) is reduced to Eq. (6). Because Eq. (6) is an identity, it shows that the proposed pBF-sFIFO method can conduct parallel beamforming without errors under the given condition.
The second constraint ensuring error-free operation is that the total length of an sFIFO block should span the maximal difference among the parallel-beamforming delays throughout a frame, so that the sFIFO block can always provide appropriate block data sets to the parallel-beamforming paths. To consistently satisfy this condition, the length of an sFIFO block is set from this maximal difference, which is found by a global search (denoted by the subscript max) over all channels and depths for all transmit events comprising a frame. This heuristic approach is necessary because the difference among the parallel-beamforming delays depends on numerous combinations of user-defined imaging specifications (e.g., the steering angle, the number of array elements, the distance between array elements, and the sampling frequency).
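The first condition can be checked numerically with a short sketch. The geometry below is an assumption, not the paper's Eq. (1-1): the array elements lie on the line z = 0, the transmit path equals the depth R of the focusing point, the receive path is the distance from the focusing point to the element, and f_BF = f_s so that one depth step corresponds to one sampling interval. Under these assumptions the increment per focusing point is (1 + (R − x_i sin θ)/d)/2, which cannot exceed 1 because |R − x_i sin θ| ≤ d. The function names are hypothetical.

```python
import math


def receive_address(k, x_i, theta, f_s=40e6, c=1540.0):
    """Two-way delay in ADC sampling intervals for depth index k.

    Assumed geometry: element at (x_i, 0), focusing point at depth
    r = k * c / (2 * f_s) along the steering angle theta (radians).
    """
    r = k * c / (2.0 * f_s)
    focus_x, focus_z = r * math.sin(theta), r * math.cos(theta)
    d = math.hypot(focus_x - x_i, focus_z)  # focusing point -> element
    return (r + d) * f_s / c


def max_increment(x_i, theta, n_points=4096):
    """Largest address increment between adjacent focusing points."""
    taus = [receive_address(k, x_i, theta) for k in range(n_points)]
    return max(b - a for a, b in zip(taus, taus[1:]))
```

For example, `max_increment(0.01, math.radians(20))` evaluates an element 10 mm off center with a 20-degree steering angle; in this model the result stays at or below one sampling interval for any element position and angle.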

IV. QUANTITATIVE PERFORMANCE EVALUATION
A. BEAMFORMING QUALITY
Ultrasound data were acquired from a tissue-mimicking phantom (Model 040GSE, CIRS, Norfolk, Virginia, USA) and from healthy volunteers. The f_s was kept at 40 MHz for every pBF method for a fair comparison under identical system resources. The pre-beamformed RF data were captured by a commercial ultrasound imaging system (SonixTouch, Ultrasonix, Corp., Vancouver, Canada) equipped with a research package (SonixDAQ) using commercial convex, phased, and linear array probes (C5-2/60, PA4-2/20, and L14-5/38). Table 1 shows the detailed specifications of the array transducers employed and the imaging specifications applied. In this experimental setup, we took the sBF method at the target scanline density as the reference for each array transducer (i.e., number of scanlines: 256 for the convex and linear array transducers; 128 for the phased array transducer). This is justified because it represents the ideal beamforming accuracy when evaluating the pBF methods. Quad-beamforming was conducted for the pBF-TS and pBF-sFIFO methods with the reduced number of ultrasound transmissions: 64 for the convex and linear array transducers; 32 for the phased array transducer. In the pBF methods, 16-channel ultrasound transmission was used to alleviate the striping artifacts in parallel scanlines. For the quantitative comparison between parallel-beamforming methods, the quality of the ultrasound images was measured for each array transducer with the peak signal-to-noise ratio (PSNR), which is expressed as
$$\mathrm{PSNR} = 10 \log_{10}\left(\frac{I_{\max}^{2}}{\mathrm{MSE}}\right)$$
where I_max is the maximum possible image intensity (e.g., 255 for an 8-bit grayscale image), and MSE is the mean square error, expressed by E[(I_ref − I_signal)^2], where I_ref and I_signal are the ultrasound images reconstructed by the sBF and pBF methods, respectively.
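A minimal sketch of the PSNR computation, assuming the standard definition 10·log10(I_max²/MSE) with the MSE taken over all pixels, is given below. The function name `psnr` and the nested-list image representation are illustrative choices, not part of the original evaluation code.

```python
import math


def psnr(ref, test, i_max=255.0):
    """PSNR in dB between a reference image and a test image.

    Images are nested lists of pixel intensities with identical size;
    `ref` plays the role of the sBF image and `test` the pBF image.
    """
    ref_flat = [p for row in ref for p in row]
    test_flat = [p for row in test for p in row]
    if len(ref_flat) != len(test_flat):
        raise ValueError("images must have the same number of pixels")
    # Mean square error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(ref_flat, test_flat)) / len(ref_flat)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(i_max ** 2 / mse)
```

A higher PSNR means the pBF image is closer to the sBF reference; identical images give an infinite value in this sketch.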
The contrast resolution was also comparatively evaluated using the relative contrast-to-noise ratio (rCNR), given by
$$\mathrm{rCNR} = \frac{\mathrm{CNR}_{\mathrm{pBF}}}{\mathrm{CNR}_{\mathrm{sBF}}}$$
where CNR_sBF and CNR_pBF are the contrast-to-noise ratios measured from specific regions of interest (ROIs) in the ultrasound images reconstructed by the sBF and pBF methods, respectively. The rCNR value ranges from 0 to 1 because CNR_sBF should be consistently higher than CNR_pBF. Each CNR value is computed from µ_c and µ_s, the mean intensities of the speckle and noise regions, respectively, and from σ_c and σ_s, the standard deviations of the speckle and noise regions, respectively.
The bulk power consumption (static + dynamic) was also estimated from the aforementioned implementations using the Vivado software. An f_BF of 40 MHz was consistently applied for each pBF method to simulate an identical user experience in practical diagnostics. In particular, this required lifting the system resource limitation on f_s, because the pBF-TS method necessitated 80 MHz, 160 MHz, and 320 MHz for dual-, quad-, and octa-beamforming, respectively. Since the maximal synthesizable f_s was 120 MHz, the power consumption at an f_s of 160 MHz and 320 MHz was extrapolated from the measurements at 40 MHz, 80 MHz, and 120 MHz using a quadratic polynomial curve fitting tool in MATLAB (R2016b, MathWorks, Inc., Natick, MA, USA).
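The contrast metrics can be sketched as follows. The exact CNR expression is not reproduced in this text, so the form used here, |µ_c − µ_s| / sqrt(σ_c² + σ_s²), is an assumption (one common definition), and rCNR is taken as the ratio of the pBF value to the sBF value, consistent with its stated 0-to-1 range. The function names are hypothetical.

```python
import math
from statistics import mean, pstdev


def cnr(region_c, region_s):
    """Contrast-to-noise ratio of two ROIs given as flat lists of pixels.

    Assumed form: |mu_c - mu_s| / sqrt(sigma_c^2 + sigma_s^2).
    """
    mu_c, mu_s = mean(region_c), mean(region_s)
    sigma_c, sigma_s = pstdev(region_c), pstdev(region_s)
    denom = math.sqrt(sigma_c ** 2 + sigma_s ** 2)
    if denom == 0:
        raise ValueError("both regions are constant; CNR is undefined")
    return abs(mu_c - mu_s) / denom


def relative_cnr(cnr_pbf, cnr_sbf):
    """rCNR: CNR of the pBF image relative to the sBF reference."""
    return cnr_pbf / cnr_sbf
```

In use, `cnr` would be evaluated on the same ROIs in the sBF and pBF images, and `relative_cnr` then reports how much of the reference contrast the pBF method retains.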

A. PARALLEL-BEAMFORMING QUALITY ANALYSIS
The quad-beamforming accuracy of each pBF method was evaluated with the tissue-mimicking phantom. Figures 5(a), (b), and (c) show the B-mode ultrasound images acquired with the convex, phased, and linear array transducers, respectively, each reconstructed by the sBF, pBF-TS, and pBF-sFIFO methods. Under visual assessment, there are only slight differences between the beamforming methods when using the convex and phased array transducers at f_0 of 3.5 MHz and 2.5 MHz, respectively. However, for the linear array transducer, the pBF-TS method shows greater image quality degradation than the sBF and pBF-sFIFO methods, as shown in Figure 5(c). This is because the Nyquist criterion could not be met for the signal component centered at 7.5 MHz with ∼70% fractional bandwidth (2.25-12.75 MHz) when f_BF is lowered by a factor of four in quad-beamforming (i.e., to 10 MHz). Figure 6 shows the in vivo experimental results for the liver, heart, and thyroid regions of a healthy volunteer when using the convex, phased, and linear array transducers, respectively. Under visual assessment, the in vivo ultrasound images showed a trend similar to that of the tissue-mimicking phantom experiments. The proposed pBF-sFIFO method shows image quality comparable to that of the sBF method with the same scanline density, regardless of the type of ultrasound array transducer used, whereas the pBF-TS method suffers from aliasing artifacts when using the linear array transducer. In both phantom and in vivo experiments, the residual error images of the pBF methods agreed well with the visual observations (Figure S2 in the supplementary materials).
To perform a quantitative evaluation, the PSNR and rCNR values were computed for the phantom and in vivo experiments (Table 2). In the phantom experiments, the pBF-sFIFO method showed 1.4 dB, 5.0 dB, and 10.0 dB improvements in PSNR over the pBF-TS method for the convex, phased, and linear array transducers, respectively. The rCNR values from the ROIs in Figure 5 indicated that the pBF-sFIFO method provides 97.8%, 92.5%, and 94.9% of the contrast resolution of the sBF method with the same scanline density for the convex, phased, and linear array transducers, respectively. These are 2.8, 2.2, and 22.8 percentage points higher than those of the pBF-TS method, which provided rCNR values of 95.0%, 90.3%, and 72.1%.
The in vivo studies agree well with the tissue-mimicking phantom study. The pBF-sFIFO method improved the PSNR (by 2.4 dB, 8.4 dB, and 11.8 dB) and the rCNR (by 1.5%, 11.5%, and 16.9%) over the pBF-TS method when using the convex, phased, and linear array transducers, respectively.

B. HARDWARE RESOURCE UTILIZATION AND POWER CONSUMPTION
Table 3 shows the hardware resource utilization of each method. Note that f*_BF indicates the beamforming frequency for uncompromised beamforming resolution in the axial direction, preventing image degradation and aliasing artifacts. First, the implementation of the pBF-CON method necessitated significantly more LUT memory for dual-, quad-, and octa-beamforming, about 1.0-, 3.0-, and 4.6-fold more than for the pBF-TS and pBF-sFIFO methods. This is because multiple memory components should be allocated for every parallel-beamforming path. On the other hand, even though the pBF-TS method presented the least hardware utilization because it shares a single beamforming path to generate all parallel scanlines, it necessitates an f*_BF of M · f_s, which significantly increases the dynamic power consumption. In contrast, the pBF-sFIFO method preserved f_s as f*_BF with significantly reduced use of RF memory compared to the pBF-CON method, but it utilizes more hardware resources than the pBF-TS method: FF pair, 34%; logic, 39%; LUTs, 38%; register, 3%; and slice, 35% in octa-beamforming. A more meaningful comparison among the pBF methods is given by the following power consumption analysis, because each hardware component consumes power differently at different input f_s and toggle rates.
From the system implementations in Table 3, the bulk power consumption of the sBF, pBF-CON, pBF-TS, and pBF-sFIFO methods at f*_BF was measured (Figure 7). The sBF method requires 1.85 W, whereas the pBF-CON method exhibits a proportional increase in power consumption for dual-, quad-, and octa-beamforming: 2.35 W, 3.33 W, and 5.31 W. The conventional pBF-TS method with identical beamforming quality consumes 2.57 W, 6.19 W, and 20.60 W for dual-, quad-, and octa-beamforming, which are 9%, 86%, and 288% higher than those of the pBF-CON method. The results confirm that the use of the pBF-TS method would be limited when the performance objective is uncompromised beamforming accuracy and power efficiency for transducers at high f_0, which reduces the efficacy of the ultra-compact ultrasound imaging system in general diagnostics. In contrast, the proposed pBF-sFIFO method necessitates 2.09 W, 2.83 W, and 4.69 W for dual-, quad-, and octa-beamforming, which are 11.06%, 15.01%, and 11.67% less than those of the pBF-CON method. These power consumptions are only 81.48%, 45.79%, and 22.76% of those of the pBF-TS method for dual-, quad-, and octa-beamforming at f*_BF, respectively.
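The relative figures quoted above can be recomputed from the reported wattages with a few lines of Python. The table `POWER_W` simply transcribes the values in the text, and the function name `saving_percent` is illustrative. Because the wattages are rounded to two decimals, the recomputed percentages may differ slightly in the last digit from the quoted ones.

```python
# Reported bulk power consumption in watts, keyed by the number of
# parallel scanlines M (2 = dual, 4 = quad, 8 = octa).
POWER_W = {
    "pBF-CON": {2: 2.35, 4: 3.33, 8: 5.31},
    "pBF-TS": {2: 2.57, 4: 6.19, 8: 20.60},
    "pBF-sFIFO": {2: 2.09, 4: 2.83, 8: 4.69},
}


def saving_percent(method, baseline, m):
    """Percent power saved by `method` relative to `baseline` for M = m."""
    p, b = POWER_W[method][m], POWER_W[baseline][m]
    return 100.0 * (b - p) / b


if __name__ == "__main__":
    for m in (2, 4, 8):
        vs_con = saving_percent("pBF-sFIFO", "pBF-CON", m)
        vs_ts = saving_percent("pBF-sFIFO", "pBF-TS", m)
        print(f"M={m}: {vs_con:.1f}% vs pBF-CON, {vs_ts:.1f}% vs pBF-TS")
```

The saving over pBF-CON stays in the 11-15% range for all three cases, whereas the saving over pBF-TS grows with M because the pBF-TS clock rate scales with the number of parallel scanlines.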

VI. DISCUSSION
In this paper, a new pBF-sFIFO method based on a post-fractional delay filtering architecture is presented for ultra-compact ultrasound imaging systems. For an identical given system resource (i.e., f_s), the proposed pBF-sFIFO method consumes less power than the pBF-CON method and also provides significant improvements in image quality over the pBF-TS method (Figures 5, 6, and 7; Table 2). Also, when uncompromised parallel-beamforming accuracy is required, the pBF-sFIFO method consumes significantly less power than the pBF-TS method (Figure 7).
In the proposed pBF-sFIFO method, there is still room for further optimization of hardware resource utilization and power consumption. For example, using two sFIFO blocks and feeding both at a data rate of f_s/2 would reduce the dynamic power by approximately half, while block data sets can still be assembled by accessing both sFIFO blocks. Also, the summation block implemented by the Vivado software may not have the best timing and power consumption. Replacing it with a carry-save adder followed by a ripple-carry adder or a carry-lookahead adder would yield better timing and power efficiency, owing to its capacity to add multiple binary numbers at the same time. Moreover, using more advanced CMOS technology would lower the bulk power consumption [32]. On the other hand, there could be further strategies depending on the clinical application. If tissue at superficial depth, where τ^m_i[k] is maximized, is not of interest in the target clinical application, the second error-free constraint in subsection III.D can be relaxed, which reduces L_sFIFO and, in turn, the power consumption. For example, in our system configuration, the power consumption in watts follows 0.003018 L_sFIFO^2 − 0.002587 L_sFIFO + 1.681 (R^2 = 1.000; Figure 8a). One may neglect the superficial depth ranging from 0 cm to 1 cm for quad-beamforming with the linear array transducer. This reduces L_sFIFO from 20 to 17 (Figure 8b), leading to an additional 11.3% saving in power consumption, from 2.83 W to 2.51 W. This provides total savings of 24.62% and 59.45% relative to the pBF-CON and pBF-TS methods, respectively. Therefore, a detailed strategy for the target diagnostic application may further optimize an ultra-compact ultrasound imaging system.
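The fitted relation between L_sFIFO and power can be evaluated directly, as in the short sketch below. The coefficients are the ones quoted above for this particular system configuration; the function name `power_from_fifo_length` is illustrative, and the fit should not be assumed to hold for other configurations.

```python
def power_from_fifo_length(l_sfifo):
    """Fitted bulk power (W) as a function of the sFIFO length (quoted fit)."""
    return 0.003018 * l_sfifo ** 2 - 0.002587 * l_sfifo + 1.681


if __name__ == "__main__":
    p20 = power_from_fifo_length(20)
    p17 = power_from_fifo_length(17)
    saving = 100.0 * (p20 - p17) / p20
    print(f"L=20: {p20:.2f} W, L=17: {p17:.2f} W, saving: {saving:.1f}%")
```

Evaluating the fit at L_sFIFO = 20 and 17 gives approximately 2.84 W and 2.51 W, in line with the 2.83 W and 2.51 W quoted in the text; the quoted 11.3% saving is computed from those rounded wattages.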
Further work will be conducted to improve the efficacy and applicability of ultra-compact ultrasound imaging systems. An ultra-compact ultrasound imaging system with advanced imaging features (e.g., synthetic aperture imaging, elastography, vector flow imaging, and/or photoacoustic imaging) will be implemented using the proposed pBF-sFIFO method. This research would require an optimized architecture that provides beamforming accuracy comparable to the sBF method while still reducing hardware resource utilization and power consumption for a better user experience.
The architecture could be expanded to volumetric beamforming. Recently, the two-dimensional array has attracted attention for its capability to capture volumetric features of tissue in real time, but it suffers from a technical bottleneck: the data throughput is limited relative to the massive amount of RF data received. To address this, hybrid beamforming strategies have been proposed (e.g., digital and analog beamforming in the lateral and elevation directions), but precise two-dimensional dynamic beamforming cannot be expected yet because only a limited number of control lines are allowed on an ASIC chip to control thousands of channels [33], [34]. These technological challenges guide us beyond point-of-care diagnosis; we envisage ultimately replacing the analog beamforming part with a digital hardware beamforming solution. This would be a groundbreaking innovation for better imaging quality with efficient hardware use and power management, leading to a compact footprint of the two-dimensional array transducer for premium ultrasound diagnostic applications. To this end, we will further generalize the analytic equations and realize them with an optimized system architecture to support volumetric scanning. As the first step of this expansion, we may use the 32-channel system-on-chip (SOC) solution we recently built for point-of-care ultrasound diagnosis, which can support the pBF-sFIFO method (quad-beamforming) with a large degree of freedom through on-board delay calculation and quad-chip extension (control of up to 128 channels at the same time) [3]. We can expect a four-times higher volumetric scanning rate using the compact multi-SOC solution in a two-dimensional transducer by taking advantage of its small footprint: 27 × 27 mm^2. In addition, it allows a 'chip extension' function interconnecting up to 4 SOCs, which would enable a 16-times higher volumetric scanning rate.
Further improvement can be expected from a hybrid approach with the pBF-TS method. The strongest advantage of the pBF-TS method is that it can be integrated without changing any hardware architecture. Even though the proposed pBF-sFIFO method outperforms the pBF-TS method, especially at high f_0, there is room for lowering f_BF when employing a transducer at a lower f_0. For instance, when using an f_s of 40 MHz and an f_0 of 3 MHz with 100% fractional bandwidth, the upper band edge of the transducer is only 4.5 MHz. Therefore, time-shared quad-beamforming can be supported, since its f_BF of 10 MHz covers a frequency band of up to 5 MHz. In this case, the hybrid strategy combining the pBF-sFIFO and pBF-TS methods would allow the generation of up to 4M parallel scanlines, which would be a significant benefit for supporting advanced imaging technologies. On the other hand, integrating a compressive beamforming approach into the pBF-sFIFO method may also provide substantial synergy by reducing the data rate in dynamic receive beamforming (lowering f*_BF) at minimal image quality degradation [35]-[37]. The integration may result in lower power consumption, enabling an ultra-compact ultrasound imaging system with longer battery life.
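The arithmetic behind this example can be sketched as follows, assuming that the upper band edge is f_0·(1 + FBW/2) and that time-sharing is alias-free as long as f_BF/2 = f_s/(2M) is not below that edge. Both function names are hypothetical, and the rule is a simplification that ignores filter roll-off and any guard band.

```python
import math


def upper_band_edge(f0_hz, fractional_bw):
    """Upper band edge for a center frequency and fractional bandwidth."""
    return f0_hz * (1.0 + fractional_bw / 2.0)


def max_time_shared_beams(f_s_hz, f0_hz, fractional_bw):
    """Largest M for which f_BF = f_s / M still satisfies Nyquist.

    Returns 0 if even M = 1 would alias under this simplified rule.
    """
    f_max = upper_band_edge(f0_hz, fractional_bw)
    return math.floor(f_s_hz / (2.0 * f_max))
```

With f_s = 40 MHz, `max_time_shared_beams(40e6, 3e6, 1.0)` returns 4, matching the quad-beamforming case in the text, whereas a 7.5-MHz transducer with 70% fractional bandwidth leaves no headroom for time-sharing under this rule (the function returns 1).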
Software implementation would be an interesting point for further investigation. A previous comprehensive comparison of various sBF architectures [38] indicated that the post-fractional delay filtering-based dynamic receive beamforming method, the basic architecture used in the pBF-sFIFO method, yields an inferior execution time because of more memory accesses. However, when it comes to realizing pBF, the memory access could be significantly reduced by the proposed pBF-sFIFO method, since a single block data set fetched into the shared memory can be accessed by multiple parallel-beamforming paths. Therefore, further study is necessary to evaluate the architectures that support the pBF feature.
The collective evolution of the pBF-sFIFO method will enable further innovative applications beyond conventional diagnostic ultrasound imaging. Photoacoustic imaging is a rapidly emerging modality that can provide optical contrast with ultrasound imaging resolution at penetration depths on the centimeter scale [39]. There has been promising progress in both science and clinical practice [40]-[44]. The investigators envision developing an implantable and/or portable photoacoustic sensor for instantaneous or continuous monitoring of patients with various disorders [45]-[48]. An ultra-compact ultrasound receive module would facilitate the translation of photoacoustic imaging investigations into practice.
In all, we present the efficient pBF-sFIFO method, covering the architectural considerations for practical design, the theoretical constraints for error-free operation, and the phantom and in vivo performance evaluation. The results indicate that the proposed pBF-sFIFO method can outperform the conventional pBF-TS method in terms of efficacy for wider clinical diagnostic applications, especially when there is a need for better image quality with higher M (higher scanline density) and f_0 (higher spatial resolution).