Artificial-Intelligence-Enhanced Ultrasound Flow Imaging at the Edge

Ultrasound flow imaging has long been used for cardiovascular diagnostics. Color Doppler imaging (CDI) is the predominant ultrasound flow imaging mode, but its diagnostic value is hampered by aliasing artifacts that limit the range of detectable blood velocities. Here, we present the first demonstration of how edge artificial intelligence (AI) can enable real-time CDI with aliasing resistance. Specifically, graphical processing unit acceleration and AI-ready edge computing hardware have been leveraged to realize the first end-to-end CDI processing pipeline that involves AI-based aliasing correction. Performance results show that, using our edge AI engine, aliasing-resistant CDI frames with threefold velocity detection range can be generated at a real-time frame rate of 25 frames per second (for raw datasets with 192 channels, 12-bit data resolution, and 25-MHz sampling rate). Overall, edge AI can critically improve the real-time visualization quality of ultrasound flow imaging and, in turn, potentially transform its bedside application value.

Artificial intelligence (AI) solutions based on deep learning principles are becoming increasingly prevalent in medical ultrasound imaging, with successful applications seen throughout the imaging pipeline.1,2 Specifically, with its data-driven paradigm and the potency of convolutional neural networks (CNNs),3 deep learning has demonstrated efficacy in complicated ultrasound processing tasks that were challenging to address with traditional approaches. For instance, deep learning was found to be useful in performing ultrasound image feature detection tasks such as the identification of tumors or damaged organs.1 These AI techniques have also demonstrated effectiveness in handling ultrasound signal processing tasks such as beamforming and noise filtering.2 Despite such achievements, the incorporation of AI-based ultrasound data processing techniques into clinical ultrasound scanners is still in the early stages.
AI in ultrasound imaging needs to be edge compatible to foster commercial and clinical adoption. This is because medical ultrasound imaging systems are typically realized as mobile scanner platforms that can be moved from bedside to bedside, where real-time scanning is the application norm. These systems may also be deployed in the emergency room and in intensive care units.4 As a general expectation for real-time ultrasound imaging, data collected from the ultrasound transducer needs to be processed (i.e., including the steps of beamforming, filtering, flow estimation, and image rendering5) at imaging frame rates of over 25 frames per second (fps) to provide the sonographer with immediate feedback and to allow image navigation. Therefore, any AI-based solution that is to be integrated into the ultrasound data processing pipeline must have minimal computational overhead so as to meet the real-time imaging requirement.
On the topic of edge AI in medical ultrasound imaging, color Doppler imaging (CDI) is one promising application direction that would benefit from the use of AI tools to overcome systemic limitations. CDI is the go-to ultrasound modality for noninvasive flow imaging in cardiovascular healthcare and excels as an accessible, point-of-care solution. One of the main and recurring issues in CDI is the emergence of so-called aliasing artifacts6 that occur when the flow being imaged exceeds the maximum detectable velocity. Aliasing is most likely to arise in conditions with a fast flow that typically emerge in arteries with stenotic plaque. It may also arise in imaging scenarios where the maximum detectable velocity must be lowered to achieve a finer velocity detection resolution or a greater imaging depth.7 If aliasing is present at a pixel position, the resulting pixel value becomes misleading by indicating an erroneous flow speed and direction. We have previously presented a postprocessing framework for aliasing-resistant CDI through a deep-learning-based strategy.8 That framework, however, was trained and implemented on a high-performance workstation, and real-time performance was not achieved.

In this article, we demonstrate the feasibility of edge-AI-based processing in ultrasound flow imaging, specifically in expanding the velocity detection limit of CDI to enhance its flow visualization quality. To that end, we design an end-to-end graphical processing unit (GPU)-accelerated and AI-based computational pipeline based on our previously reported framework8 and demonstrate its feasibility in an edge computing setting. The framework was implemented on the Clara AGX platform (NVIDIA Corporation, Santa Clara, CA, USA), a general-purpose edge computing system for AI applications.
The platform is a power-efficient, ARM-based system with an AI-optimized GPU and a small form factor. This article provides an evaluation of the efficacy of this strategy for AI-based aliasing-resistant CDI in both phantom and in vivo imaging scenarios.

Overview of Computational Algorithm
The computational goal of the presented framework is to ultimately produce an aliasing-free color Doppler image from the high-frame-rate data received at an ultrasound transducer and to do so at 25 fps using an AI-based aliasing removal pipeline on an edge computing platform [see Figure 1(a)]. The real-time performance of the framework is achieved by devising a GPU-accelerated pipeline for image formation, flow estimation, AI-based aliasing removal, and image display [see Figure 1(b)].

Aliasing artifacts in CDI stem from the underlying pulse-echo sensing and Doppler processing paradigm. Flow estimation for CDI is done by measuring the mean change in phase, also called Doppler frequency, in successively received pulse echoes induced by moving scatterers within the region of interest.7 Aliasing occurs when the pulse repetition frequency (PRF) is lower than the minimum rate (i.e., the Nyquist limit) needed to sample flow-induced Doppler frequency contents. Aliasing manifests as discrete errors in the measured Doppler frequency. An aliased region is shown in Figure 1(a) for conventional CDI.
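The Nyquist wrap-around described above can be illustrated numerically. The following minimal C++ sketch (ours, not part of the article's pipeline) shows how a true Doppler frequency outside the detectable interval folds back into it:

```cpp
#include <cmath>

// Illustrative sketch (not from the article's codebase): a Doppler
// frequency is only observable modulo the pulse repetition frequency
// (PRF), so any true frequency outside [-PRF/2, +PRF/2) wraps back
// into that interval -- producing the aliasing error described above.
double aliasedDopplerHz(double trueDopplerHz, double prfHz) {
    // Shift by PRF/2, wrap with fmod, and shift back into [-PRF/2, +PRF/2).
    double wrapped = std::fmod(trueDopplerHz + prfHz / 2.0, prfHz);
    if (wrapped < 0.0) wrapped += prfHz;  // std::fmod preserves the sign
    return wrapped - prfHz / 2.0;
}
```

For example, with the 1,500-Hz effective Doppler PRF used later in this work, a true Doppler shift of +1,000 Hz would be measured as −500 Hz, i.e., an erroneous flow speed and direction.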
A deep learning approach to achieve aliasing detection has previously been designed by our group. 8 In brief, this method works by training a U-net CNN to detect aliasing errors that may emerge in every flow pixel in the image frame. It resolves aliased flow pixels by applying phase unwrapping individually to these aliased locations. Further algorithmic details from an edge computing perspective are given in the "AI-Based Aliasing Removal" section.

Edge Computing Hardware for Real-Time CDI Processing
Our aliasing-resistant CDI processing pipeline was implemented on the Clara AGX Developer platform to achieve real-time AI-based performance in a compact computing system. The AGX edge computing platform consists of an 8-core Carmel ARM CPU, 32-GB CPU RAM, an RTX 6000 discrete GPU (dGPU) with 24-GB of VRAM, as well as 2 PCIe 4.0 slots and a 100-Gbps ethernet connection powered by a ConnectX-6 Mellanox network interface card. A photo of this edge computing platform is shown in Figure 1(c).
In contrast to typical high-end x86-based workstations, the ARM processor is power efficient and ideal for edge applications, operating between 10 and 20 W. The platform also includes an internal GPU (iGPU) with 512 Compute Unified Device Architecture (CUDA) cores and 64 Tensor cores, the latter of which are optimized to run deep learning operations. The CPU and iGPU may be run together at 30 W for applications with low power requirements.

AI-Based Aliasing Removal
Aliasing errors in CDI were removed using an AI-based pipeline consisting of two main stages 8 [see Figure 2(a)]: 1) A pretrained CNN is used to segment aliasing artifacts in the underlying Doppler frequency measurements. 2) An adaptive aliasing removal algorithm, known as phase unwrapping, is applied to the segmented pixels to remove the aliasing errors.

Aliasing Segmentation Using CNN
For the first step, we used our previously trained CNN model for aliasing segmentation. This model was trained using 284 original CDI data frames with three Doppler features (power, frequency, and bandwidth) that were acquired from eight human acquisition trials. 8 The model architecture is a 13-layer U-net based encoder-decoder (total parameters: 31,032,837) with skip connections that were demonstrated to be suitable for aliasing segmentation in CDI for the femoral bifurcation. 8 The architecture of this network, along with the major parameters, is detailed in Figure 2(b). We refer readers to the previous publication 8 for further details on CNN training and related performance measures.
Here, we focus on realizing real-time inferencing with the trained CNN model, which was deployed using the TensorRT library (NVIDIA), on the Clara AGX edge computing platform.

Adaptive Phase Unwrapping
After the inference step, aliased pixels were corrected using the adaptive phase unwrapping algorithm (see Figure 3).8 We implemented the algorithm using the two-dimensional image and signal performance primitives (NPP) library (NVIDIA Corporation) for CUDA-accelerated image processing. This algorithm consists of the following steps: 1) Extraction of Aliased Regions: Aliased regions were extracted by thresholding the inference map produced by the CNN using the nppiCompare function to produce a binary map indicating where aliasing was present. The binary map was then multiplied with the flow detection map to eliminate spurious aliasing segmentation outside the vessel lumen. 2) Extraction of Edges of Aliased Regions: The edges of aliased regions were extracted via the Canny edge detection algorithm using nppiFilterCannyBorder with a 5 × 5 Sobel filter. The detected edges were then dilated using nppiDilate3x3 with a 3 × 3 mask.

3) Classification of Aliased Pixels Into Connected Islands: The next step in the pipeline identifies islands of connected aliased pixels, as they should be corrected with the same correction factor. The islands were identified using four-way connectivity with nppiLabelMarkers and nppiCompressMarkerLabels. To prevent computational overhead, we excluded islands whose total pixel count (computed using nppiSum) was less than 20 and limited the dealiasing to a maximum of 20 islands. 4) Identification of Internal and External Boundaries of Aliased Islands: For each valid aliasing island, we extracted the internal and external edge pixels around the island to determine the aliasing correction factor for that island. This operation was done using a logical AND operation (nppiAnd) between the dilated boundaries map and the dilated body of the aliasing island. 5) Determination of Aliasing Factor for Each Island: The means of the Doppler data on the internal and external boundaries of each island were then computed using nppiMean, with the external and internal boundaries used as masks in the function. The means were then subtracted and rounded to find the correction factor for that island. 6) Correction of Each Aliasing Island: Doppler phase unwrapping was applied to each aliasing island by adding the correction factor to the Doppler data via nppiAdd. The final aliasing-free Doppler map was then median filtered using nppiFilterMedian with a 3 × 3 mask.
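Steps 5 and 6 above can be sketched on the CPU as follows (a simplified, single-island illustration with our own hypothetical names; the actual pipeline performs these operations on the GPU via nppiMean and nppiAdd):

```cpp
#include <cmath>
#include <vector>

// Step 5 (sketch): with Doppler frequencies normalized to cycles in
// [-0.5, 0.5), the correction factor for an island is the rounded
// difference between the mean on its external boundary (unaliased) and
// the mean on its internal boundary (aliased), typically +1 or -1.
double islandCorrectionFactor(double meanExternal, double meanInternal) {
    return std::round(meanExternal - meanInternal);
}

// Step 6 (sketch): phase unwrapping adds the correction factor to every
// aliased pixel in the island (the GPU version applies nppiAdd under a mask).
void unwrapIsland(std::vector<double>& doppler,
                  const std::vector<int>& islandIdx, double factor) {
    for (int i : islandIdx) doppler[i] += factor;
}
```

In this normalized-cycles view, a fast forward flow wrapped to a strongly negative reading is restored by adding one full cycle to every pixel of the island.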

Ultrasound Imaging Scheme
Our aliasing-resistant CDI processing framework was devised for high-frame-rate plane-wave imaging with a 60-frame repeated transmission sequence, consisting of two interleaved B-mode and Doppler-related subsequences that are each 30 frames in size. The B-mode subsequence consisted of 30 plane waves transmitted with steering angles ranging from −15° to 15° (spaced by 1°, with 0° skipped). The Doppler subsequence was all transmitted with a 10° steering angle.
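For concreteness, the steering schedule above can be generated programmatically (a sketch with our own function names, not code from the authors' implementation):

```cpp
#include <vector>

// B-mode subsequence: 30 plane waves steered from -15 deg to +15 deg in
// 1-deg steps, with the 0-deg angle skipped (15 negative + 15 positive).
std::vector<int> bModeSteeringAnglesDeg() {
    std::vector<int> angles;
    for (int a = -15; a <= 15; ++a)
        if (a != 0) angles.push_back(a);
    return angles;
}

// Doppler subsequence: 30 transmissions, all steered at +10 deg.
std::vector<int> dopplerSteeringAnglesDeg() {
    return std::vector<int>(30, 10);
}
```

Together, the two subsequences form the 60-frame batch that repeats throughout an acquisition.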

CDI Processing Pipeline
Our computing algorithm was implemented as a C++ program that invoked calls to the CUDA application programming interface (API) (v11.1; NVIDIA) to realize GPU computing on the Clara AGX platform, which operated in a Linux environment. Two computing modes were devised: 1) live mode and 2) replay mode. In the live mode, high-frame-rate raw RF data was processed at every second batch to achieve a live frame rate of 25 fps. In the replay mode, the data stream was processed continuously with one skipped frame. This work leverages previously reported9 GPU kernels. Details on the implemented CDI processing pipeline are given in the following subsections.

Prefiltering and Image Formation
Dedicated GPU kernels were used to perform two preprocessing operations: 1) bandpass filtering of the raw high-frame-rate data (to suppress out-of-band noise) and 2) analytic signal conversion. In the live mode, for every sequence of 60 analytic frames, beamforming was performed on each frame via the delay-and-sum method with a 96-channel Hanning apodization to generate a set of low-resolution images (LRI). The beamforming grid was set to a size of 400 × 368 pixels that covered a depth range of [0, 3.86] cm and a lateral span of [−1.9, 1.9] cm. Frames in the B-mode subsequence were all beamformed with 0° receive steering to form 30 LRI frames, after which they were coherently compounded to form one high-resolution image (HRI). In contrast, frames in the Doppler subsequence were beamformed with −10° receive steering to facilitate Doppler estimation with a 20° span between the transmit and receive angles. In the replay mode, image filtering and beamforming were done on all the frame batches in the buffer before they were fed one batch at a time to subsequent CDI processing stages.9
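The delay-and-sum principle used here can be reduced to a toy single-pixel sketch (far simpler than the article's GPU kernels, and with hypothetical names): each channel's sample is fetched at its precomputed delay, weighted by the Hanning apodization window, and summed coherently.

```cpp
#include <cmath>
#include <vector>

// Toy delay-and-sum beamformer for one image pixel. rf[ch][t] holds the
// (pre-filtered) channel data; delaySamples[ch] is the round-trip delay
// for this pixel, precomputed from the imaging geometry.
double delayAndSumPixel(const std::vector<std::vector<double>>& rf,
                        const std::vector<int>& delaySamples) {
    const double pi = std::acos(-1.0);
    const int n = static_cast<int>(rf.size());
    double sum = 0.0;
    for (int ch = 0; ch < n; ++ch) {
        // Hanning apodization across the receive aperture
        double w = 0.5 - 0.5 * std::cos(2.0 * pi * ch / (n - 1));
        int d = delaySamples[ch];
        if (d >= 0 && d < static_cast<int>(rf[ch].size()))
            sum += w * rf[ch][d];  // coherent, apodized summation
    }
    return sum;
}
```

In the real pipeline, this per-pixel sum runs in parallel across the full 400 × 368 grid for every frame, which is what makes GPU acceleration essential.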

Doppler Processing
From the LRI frames in the Doppler subsequence, corresponding Doppler ensembles with 30 samples each were formed at every pixel position. Pixelwise Doppler ensemble processing was then performed to compute the mean Doppler frequency, power, and bandwidth. In brief, this processing involved two main steps. First, Doppler clutter filtering (removal of tissue echoes) was done via highpass filtering at a normalized cutoff frequency ω_c that can be controlled by the user. Second, a lag-one autocorrelation phase estimation algorithm7 was implemented to derive the mean Doppler frequency, power, variance, and bandwidth. In the live mode, Doppler processing was applied to the 30 Doppler frames from each 60-frame batch to produce one new duplex CDI frame (Doppler and B-mode) at 25 fps. In the replay mode, sliding window processing (Doppler ensemble size: 30; step size: 1) was performed to generate a high-frame-rate CDI cineloop from all collected data in the buffer. Note that the sliding window strategy was not applied to B-mode processing, so one B-mode background image was still produced per batch in the replay mode.
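The lag-one autocorrelation step can be sketched for a single pixel as follows (the standard Kasai-style estimator; a simplified version with our own naming that returns only the mean frequency and power):

```cpp
#include <cmath>
#include <complex>
#include <vector>

struct DopplerEstimate {
    double normFreq;  // mean Doppler frequency as a fraction of the PRF
    double power;     // mean Doppler power
};

// Lag-one autocorrelation estimator over one slow-time ensemble of
// complex (analytic) samples: the mean Doppler frequency is the phase of
// the lag-one autocorrelation R(1) divided by 2*pi, and power is the
// lag-zero term R(0) normalized by the ensemble size.
DopplerEstimate lagOneEstimate(const std::vector<std::complex<double>>& ens) {
    const double pi = std::acos(-1.0);
    std::complex<double> r1(0.0, 0.0);
    double r0 = 0.0;
    for (std::size_t k = 0; k + 1 < ens.size(); ++k)
        r1 += std::conj(ens[k]) * ens[k + 1];
    for (const auto& s : ens) r0 += std::norm(s);
    return { std::arg(r1) / (2.0 * pi),
             r0 / static_cast<double>(ens.size()) };
}
```

With the 30-sample ensembles used here and an effective Doppler PRF of 1,500 Hz, a normFreq reading of 0.1 would correspond to a mean Doppler shift of 150 Hz.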

Image Display
To facilitate the generation of each duplex CDI frame, log compression was first applied to the corresponding B-mode HRI, and the frame of Doppler frequency estimates was mapped to a hot-cold bicolor hue to form a raw Doppler image. Next, a color gain map with feathered boundaries was formed to identify image grid pixels that were deemed to contain flow. It was created by thresholding the Doppler frequency variance map and smoothing it using persistence, median, and Gaussian filters. Subsequently, OpenGL was used to render the duplex CDI frame on the screen in an 800 × 787 window. This process involved live layering of the B-mode background image and the Doppler foreground image. Doppler color codes were only depicted in flow regions identified by the color gain map.
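The hot-cold bicolor mapping can be illustrated with a minimal sketch (the article does not specify its exact colormap, so the shades below are assumptions for illustration only): positive Doppler frequencies map to red shades and negative ones to blue, with brightness scaled by magnitude.

```cpp
#include <algorithm>
#include <cmath>

struct Rgb { unsigned char r, g, b; };

// Map a normalized Doppler frequency in [-0.5, 0.5] to a bicolor hue:
// red shades for one flow direction, blue for the other, with
// brightness proportional to the measured speed.
Rgb bicolorMap(double normFreq) {
    double mag = std::min(1.0, std::fabs(normFreq) / 0.5);
    unsigned char level = static_cast<unsigned char>(255.0 * mag + 0.5);
    if (normFreq >= 0.0) return { level, 0, 0 };
    return { 0, 0, level };
}
```

An aliased pixel is jarring under such a map: a wrapped frequency flips the hue from bright red to bright blue, which is precisely why dealiasing improves the visual continuity of the rendered flow.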

Femoral Imaging as a Representative Application Scenario
Our aliasing-resistant CDI framework was tested on phantoms and in vivo. For the phantom experiment, an anthropomorphic wall-less model of the femoral bifurcation was fabricated using tissue-mimicking material and a previously reported lost-core fabrication protocol.10 The model was connected to a pulsatile flow pump that generated a femoral-resembling pulsatile flow profile with a peak flow rate of 25 ml/s and a pulse rate of 72 beats/min using blood-mimicking fluid.10 The in vivo femoral bifurcation scan was acquired from a 28-year-old male with no known vascular conditions. This data collection was approved by the Clinical Research Ethics Committee at the University of Waterloo.

Data Acquisition
Long-axis acquisitions of the femoral bifurcation were performed using an ultrasound research scanner (US4US; Warsaw, Poland) with a linear array transducer (SL1543; Esaote, Genoa, Italy). Raw channel-domain ultrasound datasets of plane-wave data acquisition were collected for offline processing on the Clara AGX system. Table 1 summarizes the probe properties and imaging parameters used in this investigation. Because the transmission sequence consisted of interleaved B-mode and Doppler subsequences, the effective PRF for Doppler estimation was 1,500 Hz.8

Performance Assessment
Processing time performance on the Clara AGX system was evaluated for the overall pipeline and its components in live and replay processing modes. For a given acquisition, raw data was first loaded by the program into the RAM. In live mode, the data was then copied to the GPU memory in 60-frame batches for processing using an independent thread at 20 ms per batch. For a batch size of 60 frames, this data copying step corresponded to an effective transfer rate of 3,000 fps, mimicking the operations of the scanner. In replay mode, the computational time for GPU-based operations was measured using CUDA events before and after the key stages, and the frame update rate was measured using the C++ Chrono library. The system's screen was recorded using the Kazam software.
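The frame-update-rate measurement via the C++ Chrono library can be sketched as follows (CUDA event timing for the GPU stages is omitted here, since it requires a GPU runtime; the helper name is ours):

```cpp
#include <chrono>

// Measure the wall-clock duration of one frame update (or any callable)
// in milliseconds using a monotonic clock, in the spirit of the
// frame-rate measurements reported in this work.
template <typename F>
double elapsedMs(F&& work) {
    const auto t0 = std::chrono::steady_clock::now();
    work();  // e.g., process-and-display of one CDI frame
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

A sustained 25-fps display corresponds to an elapsedMs reading of at most 40 ms per frame update.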

Phantom Demonstration
The feasibility of real-time aliasing-resistant CDI was successfully demonstrated on the Clara AGX platform in an edge setting. Figure 4(a) shows the proposed framework running on the Clara AGX, with the results visualized on an external display. A recording of the framework output in live processing mode on the acquisition from the femoral bifurcation phantom is shown in Movie 1 (see supplementary material published online). A snapshot of this output at systole is shown in Figure 4(b), with the corresponding aliasing segmentation and aliased CDI frame shown in Figures 4(c) and (d), respectively. Aliasing artifacts were correctly segmented in every frame by the CNN, and pixels that were aliased in conventional CDI became aliasing-free in the output of the proposed framework. The computational pipeline achieved this goal in a live processing mode where raw channel data was processed to a final aliasing-free image with a frame rate matching our target of 25 fps. Table 2 shows the mean and standard deviation of the processing time for key stages in the computational pipeline. Aliasing segmentation via the CNN was the bottleneck in the process (17.7 ± 1.7 ms), whereas the phase unwrapping step required a substantially shorter processing time (1.4 ± 0.5 ms). Image formation was the second most time-consuming step (9.2 ± 1.2 ms). Our edge AI framework has also demonstrated its efficacy in high-frame-rate processing for slow-motion playback. Movie 2 provides a visual demonstration of the framework in the replay mode, where data was processed continuously and displayed in slow motion. Movie 2a shows the aliasing-resistant CDI output of the proposed framework; the corresponding aliasing segmentation and conventional CDI outputs are shown in Movies 2b and 2c, respectively. This cineloop confirms that aliased pixels in conventional CDI were correctly segmented by the CNN throughout the cardiac cycle and that aliasing-free CDI was achieved with the proposed framework.
As in the live mode, our target frame rate of 25 fps was achieved. Table 2 shows the processing time of the batch-based processing steps used in replay mode for image formation and Doppler processing. Note that the processing times for the other steps (aliasing removal, image preprocessing) are identical to those in live mode.
The variability of the processing time of the phase unwrapping algorithm throughout the cardiac cycle can be tracked in the replay mode and is shown in Figure 4(e) alongside the flow speed sampled from inside the star annotation [see Figure 4(d)]. This plot is also shown dynamically in Movie 2. The processing time of the phase unwrapping algorithm varied between approximately 1 and 3 ms, and it was correlated with increased flow speed in the blood vessel. As flow increased at systole, more pixels exceeded the maximum detectable velocity and thus became aliased. Accordingly, these pixels led to larger aliasing segmentation zones (Movie 2b) and the need to apply phase unwrapping to more pixels, so the processing time inevitably increased for these frames.

In Vivo Demonstration
The proposed framework successfully removed aliasing errors in the in vivo scenario for both live and replay processing modes. Screen recordings of the framework output are provided in Movies 3 and 4, with representative results shown in Figure 5, showcasing the successful aliasing segmentation and removal. The frame processing times were measured at 40.8 ± 4.3 ms and 41.0 ± 1.4 ms for the live and replay modes, respectively.

Enhanced Flow Visualization With CDI Using Edge AI
This work has demonstrated that it is feasible to realize an AI-based ultrasound imaging pipeline in real time on an edge computing platform for aliasing-resistant CDI (see Figure 1). Specifically, for the first time, a pretrained CNN8 was successfully deployed and a GPU-accelerated pipeline (see Figures 2 and 3) was realized on a Clara AGX platform (NVIDIA), where real-time performance was achieved in the absence of cloud computing services. The efficacy of the framework for live and replay mode processing was demonstrated, with the range of measurable flow speeds expanded in both phantom (Movies 1 and 2, and Figure 4) and in vivo (Movies 3 and 4, and Figure 5) scenarios. In turn, aliasing removal in CDI improves the visualization quality of the modality.7

Achieving aliasing-free CDI in real time remains challenging from both computational and system perspectives. Previous solutions for aliasing, such as staggered transmissions11 and dual-wavelength processing,12 have demonstrated utility in resolving aliasing artifacts but may be challenging to integrate on existing clinical scanners due to their unconventional pulse sequencing requirements. Alternatively, postprocessing solutions13,14 have been proposed that operate on the Doppler frequency measurements used for CDI. Our CNN-based approach is one such solution and thus may be generalized to a wider range of applications. The AI-based framework stands out in that it integrates more information about the Doppler signal and extracts the required aliasing features automatically.

In this work, we demonstrated the feasibility of our AI-based approach on an edge computing platform. This demonstration is the first of its kind to enhance CDI's flow visualization quality via an edge computing approach. It demonstrates the application potential of our solution in point-of-care settings to enhance the bedside diagnostic value of CDI.

Future Development
With the computational framework integrated on a Clara AGX edge computing platform, it will be natural to next realize aliasing-resistant CDI in a truly live imaging application with data actively being received from the probe.9 This work will be of high practical importance, as ultrasound imaging is a point-of-care modality and live imaging is important for image guidance.4 Such a live demonstration will require a direct connection between the Clara AGX and an ultrasound scanning platform that allows access to the raw channel data. The associated data transfer rate requirements (up to 80 Gbps9) can in principle be accommodated by the Clara AGX, which is equipped with massive input/output (IO) data streaming resources, including two PCIe 4.0 slots with a 128-GB/s directional transfer bandwidth, 100-Gbps Ethernet connectivity, an 18-Gbps HDMI 2.0 input, and USB 3.1 Gen 2 at 10 Gbps. Given that our computational pipeline can deliver aliasing-resistant CDI from raw RF data in approximately 40 ms (see Table 2), we envision that a live demonstration should be achievable in a point-of-care setting using the Clara AGX platform. Establishing this integrated system pipeline will enable us to efficiently conduct human experiments and, in turn, expand the training set of our CNN models and enhance their generalizability to other imaging scenarios.
Our realization of aliasing-resistant CDI can be considered a case demonstration of how the performance of ultrasound imaging can benefit from edge AI. In the future, other real-time AI solutions for ultrasound imaging may be implemented via edge computing, such as deep-learning-based beamforming, speckle reduction, tissue segmentation, classification of abnormal conditions, exam guidance, and real-time patient deidentification for the creation of consumable datasets. For computationally demanding applications, such as the concurrent use of several deep learning models, the additional computing resources of the platform's discrete GPU may be leveraged. It should be emphasized that the future of medical edge computing will require more than just impressive computing hardware. It will by necessity require a secure operating system and a software stack that adheres to demanding security and medical device safety standards. An openly available platform that meets such standards could help future developers save time and resources compared with the current reality of developing specialized software stacks from the ground up for any new edge computing solution.

CONCLUSION
Deep-learning-based algorithms have the potential to address long-standing issues in medical ultrasound imaging. With the use of AI-ready, general-purpose edge hardware, a real-time aliasing-resistant CDI framework for high-frame-rate ultrasound was achieved, extending the measurable velocity range threefold without extensive computational resources. By realizing such a solution on a small form-factor system, this work brings aliasing-resistant CDI and AI-based processing closer to clinical adoption at the point of care.