Quantifying Valve Regurgitation Using 3-D Doppler Ultrasound Images and Deep Learning

Accurate quantification of cardiac valve regurgitation jets is fundamental for guiding treatment. Cardiac ultrasound is the preferred diagnostic tool, but current methods for measuring the regurgitant volume (RVol) are limited by low accuracy and high interobserver variability. Following recent research, quantitative estimators of orifice size and RVol based on high frame rate 3-D ultrasound have been proposed, but measurement accuracy is limited by the wide point spread function (PSF) relative to the orifice size. The aim of this article was to investigate the use of deep learning to estimate both the orifice size and the RVol. A simulation model was developed to simulate power-Doppler images of blood flow through orifices with different geometries. A convolutional neural network (CNN) was trained on 30 000 image pairs. The network was used to reconstruct orifices from power-Doppler data, which facilitated estimators for regurgitant orifice areas and flow volumes. We demonstrate that the network improves orifice shape reconstruction, as well as the accuracy of orifice area and flow volume estimation, compared with a previous approach based on thresholding of the power-Doppler signal (THD) and with spatially invariant deconvolution (DC). Our approach reduces the area estimation error in simulations (THD: 13.2 ± 9.9 mm², DC: 12.8 ± 15.8 mm², and ours: 3.5 ± 3.2 mm²). In a phantom experiment, our approach reduces both the area estimation error (THD: 10.4 ± 8.4 mm², DC: 10.98 ± 8.17 mm², and ours: 9.9 ± 6.0 mm²) and the flow rate estimation error (THD: 20.3 ± 9.9 ml/s, DC: 18.14 ± 13.01 ml/s, and ours: 7.1 ± 10.6 ml/s). We also demonstrate in vivo feasibility in six patients with aortic insufficiency, with comparison against standard echocardiography and magnetic resonance references.


I. INTRODUCTION
Heart valve regurgitation is a condition in which backward flow of blood through leaky valves may cause volume overload and reduced net forward stroke volume, and it is associated with a poor prognosis for the patient. The prevalence of valve regurgitation has been estimated at 18%-19% in middle-aged adults [1], and it is projected to increase due to an aging population [2]. Patients with mild or moderate regurgitation undergo regular follow-up but do not benefit from routine surgery. In contrast, patients with severe regurgitation generally require surgical intervention to improve symptoms and prevent heart failure. It is therefore essential to accurately separate severe from mild and moderate cases of valve regurgitation, so that patients with severe regurgitation receive appropriate treatment while patients with mild or moderate regurgitation avoid the unnecessary risks associated with surgery.
Transthoracic echocardiography (TTE) is the most common noninvasive tool for assessing the severity of valve regurgitation. Current guidelines [3], [4] recommend a comprehensive evaluation of severity that integrates multiple quantitative and qualitative metrics. As a consequence, grading valve regurgitation with TTE is time-consuming and subject to high inter- and intraobserver variability. According to the guidelines, the main method for quantitative evaluation of valve regurgitation is the 2-D proximal isovelocity surface area method (2-D PISA), which provides the effective regurgitant orifice area (EROA), instantaneous flow rates, and total regurgitant volume (RVol) from a combination of color flow and continuous wave (CW) Doppler recordings [5]. However, 2-D PISA is highly user-dependent [6], [7] because several important steps must be performed manually, such as selecting the imaging plane, time frame, and color flow gain, and measuring the radius of the flow convergence region. Moreover, its accuracy is limited by dynamic changes in regurgitant flow rate during systole and by deviations from the assumption of hemispheric convergence zones [8].
Recent research based on 3-D Doppler ultrasound in invasive transesophageal echocardiography (TEE) has shown promise for accurate and less user-dependent assessment [9], [10], [11], [12], [13]. However, TTE acquisition involves larger imaging depths and lower transmit frequencies, which limit both spatial resolution and pulse repetition frequency (PRF). Consequently, current TTE 3-D Doppler methods have limited diagnostic value on their own [3].
Avdal et al. [14] previously proposed a quantitative estimator for the cross-sectional area (CSA) of the regurgitant jet and RVol based on TTE high frame rate 3-D Doppler ultrasound. A high PRF, weakly focused acquisition was used to acquire the entire region of interest continuously, rather than using packet acquisitions as in color flow imaging. The use of continuous acquisitions enabled the estimation of pulsed wave (PW) Doppler spectra in each voxel. The spectra could be used for maximum velocity envelope estimation, which is more robust to clutter than the autocorrelation estimator used in color flow. This method can therefore achieve quantitative flow measures efficiently and in fewer steps than both 2-D/3-D PISA and current 3-D Doppler-based approaches. Using this approach, accurate estimates of the flow volume through a circular orifice phantom were achieved. However, the area estimator is highly dependent on the choice of power threshold for detecting voxels that contain flow. Moreover, due to the large point spread function (PSF), it is difficult to accurately depict irregular orifice geometries [15]. This makes it difficult to distinguish small orifices from larger ones, limiting the clinical value of the method.
In ultrasound imaging, blurring of the imaged object by the PSF is a common problem that limits resolution and image quality. Techniques for restoring such images are typically based on deconvolution (DC) [16], [17], [18]. DC aims to restore the object $f$ from the image $s$, given a model of the imaging system that is commonly described as $s = f * h + \varepsilon$, where $h$ is the PSF, $*$ denotes convolution, and $\varepsilon$ is the noise.
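A minimal 1-D sketch of this forward model follows. It assumes a Gaussian PSF and additive white Gaussian noise (neither is specified above) and measures the half-maximum width of a blurred boxcar "orifice", which corresponds to the −3 dB width for a power signal. For objects much narrower than the PSF, the measured width saturates near the PSF width, which illustrates why thresholding struggles to separate small orifices from larger ones. All function names are illustrative.

```python
import math
import random


def gaussian_psf(sigma, half_width):
    """Sampled Gaussian PSF h with unit sum (assumed shape, for illustration only)."""
    h = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-half_width, half_width + 1)]
    total = sum(h)
    return [v / total for v in h]


def convolve_same(f, h):
    """Discrete convolution (f * h)[i] = sum_m f[i - m] h[m], zero-padded, cropped to len(f)."""
    c = len(h) // 2
    out = []
    for i in range(len(f)):
        acc = 0.0
        for k, hk in enumerate(h):
            j = i - (k - c)  # m = k - c is the PSF offset
            if 0 <= j < len(f):
                acc += f[j] * hk
        out.append(acc)
    return out


def blurred_image(f, h, noise_std=0.0, seed=0):
    """Forward model s = f * h + eps, with eps assumed to be white Gaussian noise."""
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, noise_std) for v in convolve_same(f, h)]


def half_max_width(s):
    """Number of samples at or above 50% of the peak (about -3 dB for a power signal)."""
    peak = max(s)
    return sum(1 for v in s if v >= 0.5 * peak)


def measured_width(true_width, sigma=6.0, n=120, noise_std=0.0):
    """Apparent width of a centred boxcar object of `true_width` samples after blurring."""
    f = [0.0] * n
    start = n // 2 - true_width // 2
    for i in range(start, start + true_width):
        f[i] = 1.0
    psf = gaussian_psf(sigma, half_width=int(4 * sigma))
    return half_max_width(blurred_image(f, psf, noise_std))


for w in (2, 10, 40):
    print(w, measured_width(w))
```

A 2-sample object and a 10-sample object end up with similar apparent widths, while a 40-sample object is measured close to its true size.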
In recent years, convolutional neural networks (CNNs) have been increasingly used in ultrasound research. CNNs have been applied for DC and image enhancement [19], [20], and extensively for structure segmentation in B-mode images [21], [22], [23], [24].
Inspired by recent developments in deep learning, we investigate how CNNs can be trained to perform joint DC and segmentation on highly blurred power-Doppler images of regurgitant cardiac jets acquired using TTE. We develop a CNN-based approach for segmenting the CSA of jets from leaky valves using cross-sectional images extracted from 3-D Doppler recordings. We apply transposed convolutional layers in our CNN to deconvolve and upscale the low-resolution image and segment the jet CSA. Our CNN is trained on simulated power-Doppler data, as the amount of real data is limited and target labels are unavailable. The data are simulated using ultrasound simulation software and procedurally generated orifice masks of arbitrary shape and size. Hence, although training is supervised, the target labels are produced by the simulation, removing the need for manually labeled training data. Finally, we combine velocity estimates from a conventional spectral velocity estimator with the segmented CSA to quantify the instantaneous flow rate.

II. METHODS

A. Orifice Generation
We simulated pairs of power-Doppler images and binary target label maps that mimic regurgitant orifices. The orifices were generated using Bézier polygons, where $B(t)$ is the closed boundary of the Bézier polygon parameterized by $t$, and $C_i$ are the Bézier curve control points, whose sampling is defined in terms of $U(x, y)$, a uniform distribution over the image pixel positions $(x, y)$ with a domain equal to the imaging region. The initial control point $C_0$ is drawn from $U(x, y)$.
We define the object function as a binary image $I(x, y)$, where $I(x, y) = 1$ within the region enclosed by $B(t)$ and $I(x, y) = 0$ elsewhere. The generation parameters are listed in Table I. Fig. 1 shows six procedurally generated example orifices, illustrating the variety of possible shapes. In patients with valvular regurgitation, the orifice shape can indeed vary from case to case, depending on the cause and position of the regurgitation [25], [26]. Our approach generates a large variety of shapes for the training set, which helps prevent the model from overfitting to specific geometries.
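The orifice generation can be sketched as below. The article's own sampling rules for the control points follow its Table I parameters; here, as a hypothetical stand-in, control points are placed at sorted random angles and radii around a center point playing the role of $C_0$, the control polygon is closed by repeating the first point, and the curve is rasterized into $I(x, y)$ with an even-odd point-in-polygon test on pixel centers. The names `random_closed_bezier`, `rasterize`, and their parameters are illustrative only.

```python
import math
import random


def bezier_point(ctrl, t):
    """Evaluate B(t) = sum_i binom(n, i) (1 - t)^(n - i) t^i C_i."""
    n = len(ctrl) - 1
    x = y = 0.0
    for i, (cx, cy) in enumerate(ctrl):
        b = math.comb(n, i) * (1 - t) ** (n - i) * t ** i
        x += b * cx
        y += b * cy
    return x, y


def random_closed_bezier(rng, center, n_ctrl=6, r_min=2.0, r_max=6.0, n_samples=200):
    """HYPOTHETICAL generator: control points at sorted random angles and radii around
    `center`, closed by repeating the first point so that B(0) == B(1)."""
    angles = sorted(rng.uniform(0.0, 2 * math.pi) for _ in range(n_ctrl))
    ctrl = []
    for a in angles:
        r = rng.uniform(r_min, r_max)  # one radius per control point
        ctrl.append((center[0] + r * math.cos(a), center[1] + r * math.sin(a)))
    ctrl.append(ctrl[0])
    return [bezier_point(ctrl, k / n_samples) for k in range(n_samples)]


def point_in_polygon(px, py, poly):
    """Even-odd (ray casting) test of a point against a closed polygon."""
    inside = False
    n = len(poly)
    for k in range(n):
        x1, y1 = poly[k]
        x2, y2 = poly[(k + 1) % n]
        if (y1 > py) != (y2 > py):
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


def rasterize(poly, width, height):
    """Binary object function I(x, y): 1 if the pixel centre lies inside B(t), else 0."""
    return [[1 if point_in_polygon(x + 0.5, y + 0.5, poly) else 0 for x in range(width)]
            for y in range(height)]


boundary = random_closed_bezier(random.Random(1), center=(16.0, 16.0))
mask = rasterize(boundary, 32, 32)
print(sum(map(sum, mask)), "pixels inside the orifice")
```

Multiplying the pixel count by the pixel area converts such a mask into an orifice area in mm².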

B. Ultrasound Simulation
The power-Doppler images were generated in a two-step process. First, the pulse-echo field $h_{pe}$ was computed assuming dynamic receive focusing, using the Field II ultrasound simulation software [27], [28] with the parameters listed in Table II. Second, the power-Doppler realizations $R_0(x, y)$ were generated by integrating the pulse-echo contributions from the pixels that belong to the orifice $I(x, y)$, where $x_h$ and $y_h$ denote the coordinates at which the pulse-echo response is calculated and $P_h$ is the energy of $h_{pe}$. The same pulse-echo field can be used to generate different power-Doppler realizations by changing the orifice map, which allows training data to be generated quickly. Integrating the pulse-echo energy in this way yields efficient simulations without modeling individual scatterers: for uncorrelated scatterers, the expected received power equals the sum of the individual pulse-echo energies. This approach was preferred over averaging the backscattered signal from randomly distributed scatterers, which would take too long given the number of training examples needed.
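The equivalence between energy integration and scatterer averaging can be checked numerically. The 1-D sketch below replaces the Field II field with a hypothetical toy pulse-echo response (Gaussian in space and time, with a carrier) and draws independent complex Gaussian scatterer amplitudes with unit variance. Averaged over realizations, the received energy approaches the sum of the per-position energies $P_h$, which the deterministic model computes in a single pass. Function names and response parameters are illustrative assumptions.

```python
import cmath
import math
import random

TIMES = range(-10, 11)  # hypothetical fast-time samples


def toy_pulse_echo(x_img, x_scat, t, width=3.0, f0=0.25):
    """HYPOTHETICAL pulse-echo response h_pe at image point x_img from a scatterer at x_scat."""
    envelope = math.exp(-((x_img - x_scat) / width) ** 2 - (t / 4.0) ** 2)
    return envelope * cmath.exp(2j * math.pi * f0 * t)


def energy(x_img, x_scat):
    """P_h: energy of h_pe, i.e. |h_pe|^2 summed over time samples."""
    return sum(abs(toy_pulse_echo(x_img, x_scat, t)) ** 2 for t in TIMES)


def expected_power(x_img, orifice):
    """Energy-integration model: sum of P_h over the positions belonging to the orifice."""
    return sum(energy(x_img, xs) for xs in orifice)


def monte_carlo_power(x_img, orifice, n_real=2000, seed=0):
    """Mean received energy over random complex scatterer amplitudes with E|a|^2 = 1."""
    rng = random.Random(seed)
    sd = math.sqrt(0.5)  # per real/imaginary component
    total = 0.0
    for _ in range(n_real):
        amps = [complex(rng.gauss(0.0, sd), rng.gauss(0.0, sd)) for _ in orifice]
        for t in TIMES:
            field = sum(a * toy_pulse_echo(x_img, xs, t) for a, xs in zip(amps, orifice))
            total += abs(field) ** 2
    return total / n_real


orifice = [-2.0, -1.0, 0.0, 1.0, 2.0]
print(expected_power(0.5, orifice), monte_carlo_power(0.5, orifice))
```

Cross terms between scatterers vanish in expectation because the amplitudes are independent and zero-mean, which is why only the individual energies remain.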

C. Data Augmentation
The training set consisted of 30 000 power-Doppler and orifice pairs with varying imaging depths and center frequencies. To account for a local reduction in contrast observed in our experimental setup, we superimposed $N_i$ bivariate Gaussian functions onto each simulated power-Doppler image, where $N_i$ is uniformly sampled from $\{0, \ldots, N_{max}\}$, $\mu_j$ and $S_j$ are uniformly distributed random variables that determine the position and covariance of the $j$th Gaussian, and $\alpha$ is the augmentation intensity. The data generation parameters are summarized in Table I.
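A possible implementation of the blooming augmentation is sketched below. The exact expression is not reproduced in the text, so the additive form, the ranges of the standard deviations, and the random orientation of each Gaussian are assumptions; only the roles of $N_i$, $\mu_j$, $S_j$, and $\alpha$ follow the description. The image is a nested list indexed `[y][x]`, and the function name is hypothetical.

```python
import math
import random


def add_gaussian_blooming(image, alpha, n_max, rng):
    """ASSUMED additive augmentation: superimpose N_i ~ U{0, ..., n_max} bivariate Gaussians
    with random centre mu_j and covariance S_j, each scaled by alpha. Returns (image, N_i)."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]  # do not modify the input
    n_i = rng.randint(0, n_max)  # randint is inclusive on both ends
    for _ in range(n_i):
        mu_x, mu_y = rng.uniform(0, w - 1), rng.uniform(0, h - 1)
        s_u, s_v = rng.uniform(1.0, 0.5 * w), rng.uniform(1.0, 0.5 * h)  # principal std devs
        theta = rng.uniform(0.0, math.pi)  # orientation of the principal axes of S_j
        c, s = math.cos(theta), math.sin(theta)
        for y in range(h):
            for x in range(w):
                dx, dy = x - mu_x, y - mu_y
                u, v = c * dx + s * dy, -s * dx + c * dy  # rotate into the principal frame
                out[y][x] += alpha * math.exp(-0.5 * ((u / s_u) ** 2 + (v / s_v) ** 2))
    return out, n_i


image = [[0.0] * 12 for _ in range(15)]
augmented, n_i = add_gaussian_blooming(image, alpha=1e-2, n_max=3, rng=random.Random(1))
print(n_i, max(max(row) for row in augmented))
```

Each Gaussian peaks at $\alpha$, so $\alpha$ directly controls how strongly the low-contrast haze competes with the jet signal.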

D. Model Training
The simulation and training phases are depicted in Fig. 2. The probe geometry of the GE 4Vc-D was used, with the simulation parameters shown in Table II. $N$ = 30 000 orifices were procedurally generated with areas uniformly distributed between 0 and 75 mm². The power-Doppler images were computed using $h_{pe}$ with varying transmit configurations, i.e., varying values of the imaging depth $z$ and center frequency $f_c$.
We normalized the images using Z-score standardization and superimposed random Gaussian blooming onto each image, as described in Section II-C. Depth and frequency information was added as two separate input channels: constant images in which every pixel holds the value of $z$ in meters and of $f_c$ in MHz, respectively. For training, we generated 1000 images for each transmit configuration, amounting to 30 000 images in total. The CNN was trained on the augmented power-Doppler images with the binary orifice maps as target labels, using the Adam optimizer [29] with a learning rate of 0.001 and a binary cross-entropy loss function. The augmentation intensity $\alpha$ was chosen by training models with different augmentation intensities and selecting the one that achieved the highest flow rate estimation accuracy in an experimental flow setup.
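The input assembly described above can be sketched as follows: the image is Z-score standardized and stacked with two constant channels holding $z$ in meters and $f_c$ in MHz. The channels-last layout matches the Keras default, but the function names and the guard against constant images are illustrative assumptions; in the described pipeline, the Gaussian blooming would be superimposed after standardization.

```python
import statistics


def zscore(image):
    """Z-score standardization of a 2-D list (population standard deviation)."""
    flat = [v for row in image for v in row]
    mean = statistics.fmean(flat)
    std = statistics.pstdev(flat)
    std = std if std > 0 else 1.0  # guard against constant images
    return [[(v - mean) / std for v in row] for row in image]


def build_network_input(power_doppler, depth_m, fc_mhz):
    """Stack the standardized image with constant z [m] and f_c [MHz] channels.
    Returns a nested list indexed [y][x][channel] (channels-last)."""
    img = zscore(power_doppler)
    return [[[v, depth_m, fc_mhz] for v in row] for row in img]


x = build_network_input([[1.0, 2.0], [3.0, 4.0]], depth_m=0.08, fc_mhz=2.0)
print(x[0][0])
```

Encoding depth and frequency as constant images lets an ordinary convolutional network condition on the transmit configuration without any architectural changes.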
The model was implemented using the Keras Python deep learning application programming interface (API) and trained on an NVIDIA Quadro RTX 3000. Data generation and training took about 2 h. For validation, we generated 600 images for each transmit configuration. The model was validated on this in silico test dataset. The area estimates from the model predictions were compared with the ground truth and with the reference segmentation approaches described in Section II-F.

E. Network Architecture
The model architecture was a lightweight (∼20 000 parameters) network using transposed convolutional layers (UpConv2-D) at the end of each convolutional block, such that the spatial dimensions of the image are doubled at the output of each block. The input images are thereby transformed from 15 × 12 to 128 × 128 pixels. The transposed convolutional layers learn to upscale the image directly from the training data, as opposed to using simple interpolation techniques. Each convolutional block consists of two convolutional layers with rectified linear unit (ReLU) activations and batch normalization layers to stabilize training. Dropout layers with a dropout rate of 0.25 are added at the end of each block for regularization. The output activation function constrains the output toward a binary map; we chose the hyperbolic tangent for this purpose, although a sigmoid would perform equally well. Residual connections link the first and last layers of each convolutional block, allowing low-resolution features to flow through the network. This is a common technique in deep residual networks (ResNets) [30] and has been shown to improve training stability. Hereafter, we refer to our network as "ResNet."
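The upsampling arithmetic can be checked with the standard output-length formula for a transposed convolution with dilation 1, $(n - 1)s + k - 2p + p_{out}$. In Keras, `Conv2DTranspose` with `strides=2` and `padding="same"` doubles each spatial dimension. The kernel sizes and padding used in the article are not stated, so the values below are assumptions. Repeated doubling maps 16 × 16 to 128 × 128 in three blocks, whereas 15 × 12 would give 120 × 96, so the text leaves open how the input is resized, padded, or cropped to reach 128 × 128.

```python
def conv_transpose_out(n_in, kernel, stride, pad=0, output_padding=0):
    """Output length of a transposed convolution with dilation 1:
    (n_in - 1) * stride + kernel - 2 * pad + output_padding."""
    return (n_in - 1) * stride + kernel - 2 * pad + output_padding


def upsampling_chain(shape, n_blocks, stride=2):
    """Spatial shapes after n_blocks 'same'-padded transposed convolutions,
    each multiplying every spatial dimension by `stride`."""
    h, w = shape
    sizes = [(h, w)]
    for _ in range(n_blocks):
        h, w = h * stride, w * stride
        sizes.append((h, w))
    return sizes


print(conv_transpose_out(15, kernel=4, stride=2, pad=1))  # kernel 4, pad 1: exact doubling
print(upsampling_chain((15, 12), 3))
print(upsampling_chain((16, 16), 3))
```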

F. Reference Segmentation Methods
For comparison with the proposed deep-learning-based segmentation, we used two reference approaches. The first is conventional −3 dB thresholding of the power-Doppler image, as used by Avdal et al. [14]. The second is spatially invariant nonblind DC, similar to [16]. We used the Richardson-Lucy DC algorithm [31] with 20 iterations to deconvolve the power-Doppler images using an analytical PSF, where $L_{AZ}$ and $L_{EL}$ are the aperture dimensions calculated using Table II, $c$ is the speed of sound, $z$ is the imaging depth, and $x$ and $y$ are the azimuth and elevation positions, respectively. The deconvolved image was segmented by thresholding at 50% pixel intensity.
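The DC reference can be sketched as a plain Richardson-Lucy loop followed by 50% thresholding. The analytical PSF expression is not reproduced above, so `sinc2_psf` assumes a separable far-field pattern $\mathrm{sinc}^2(x L_{AZ} f_c/(c z))\,\mathrm{sinc}^2(y L_{EL} f_c/(c z))$; this form is an illustration, not the article's equation. Since $10^{-3/10} \approx 0.501$, a −3 dB threshold on a power image is essentially the same as 50% of the peak. All parameter values below are hypothetical.

```python
import math


def sinc(u):
    """Normalized sinc, sin(pi u) / (pi u)."""
    return 1.0 if u == 0 else math.sin(math.pi * u) / (math.pi * u)


def sinc2_psf(n, dx, dy, l_az, l_el, c, z, fc):
    """ASSUMED separable PSF sinc^2(x L_AZ / (lambda z)) * sinc^2(y L_EL / (lambda z)),
    lambda = c / fc, on an n x n grid (n odd) with spacing dx, dy [m], normalized to unit sum."""
    lam = c / fc
    half = n // 2
    p = [[sinc(i * dx * l_az / (lam * z)) ** 2 * sinc(j * dy * l_el / (lam * z)) ** 2
          for i in range(-half, half + 1)] for j in range(-half, half + 1)]
    total = sum(map(sum, p))
    return [[v / total for v in row] for row in p]


def conv2d_same(img, k):
    """2-D convolution img * k with zero padding, cropped to the size of img."""
    h, w, kh, kw = len(img), len(img[0]), len(k), len(k[0])
    cy, cx = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for j in range(kh):
                yy = y - (j - cy)
                if 0 <= yy < h:
                    for i in range(kw):
                        xx = x - (i - cx)
                        if 0 <= xx < w:
                            acc += img[yy][xx] * k[j][i]
            out[y][x] = acc
    return out


def richardson_lucy(image, psf, n_iter=20, eps=1e-12):
    """Spatially invariant, non-blind Richardson-Lucy deconvolution."""
    psf_flip = [row[::-1] for row in psf[::-1]]
    est = [[max(v, eps) for v in row] for row in image]  # positive initial estimate
    for _ in range(n_iter):
        blurred = conv2d_same(est, psf)
        ratio = [[o / max(b, eps) for o, b in zip(ro, rb)] for ro, rb in zip(image, blurred)]
        corr = conv2d_same(ratio, psf_flip)  # correlation with the PSF
        est = [[e * cc for e, cc in zip(re, rc)] for re, rc in zip(est, corr)]
    return est


def threshold_mask(image, fraction=0.5):
    """Binary segmentation: 1 where a pixel is >= fraction * peak."""
    peak = max(map(max, image))
    return [[1 if v >= fraction * peak else 0 for v in row] for row in image]


psf = sinc2_psf(11, 3.7e-4, 3.7e-4, 0.02, 0.02, 1540.0, 0.08, 2.5e6)
obj = [[1.0 if 7 <= x <= 9 and 7 <= y <= 9 else 0.0 for x in range(17)] for y in range(17)]
blurred = conv2d_same(obj, psf)
restored = richardson_lucy(blurred, psf, n_iter=20)
print(sum(map(sum, threshold_mask(blurred))), sum(map(sum, threshold_mask(restored))))
```

The multiplicative update keeps the estimate nonnegative and preserves the total image intensity, but it assumes the same PSF everywhere, which is the limitation discussed for spatially variant blurring.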

G. Phantom Experiments
We validated the method using the custom-made flow phantom shown in Fig. 3. The acquisition and signal processing parameters are summarized in Table III. The phantom was filled with a mixture of water and corn starch to mimic the scattering properties of blood. Channel data were acquired using a GE 4Vc-D probe and a GE E95 scanner operating in high PRF mode. The scanner was locally modified to enable diverging wave acquisitions with a focal point 40 cm behind the transducer. We performed the measurements at insonation angles of 0°, 30°, 40°, and 50°, and for three different flow rates. The flow rate was varied by adjusting the height of the upper fluid reservoir. An ultrasonic flowmeter (Cynergy3 UF25) was used as a reference. We performed the experiment for circular orifices with sizes of 15, 25, 35, and 45 mm². We performed a similar experiment for three orifices with noncircular shapes, namely, an equilateral triangle (35 mm²), a half circle (35 mm²), and a bifurcation of two circular orifices (15 and 25 mm²).
The in-phase and quadrature (IQ) channel data were recorded for offline processing. The channel data were beamformed using the MATLAB UltraSound ToolBox (USTB) and clutter filtered using a finite impulse response (FIR) filter with an asymmetric frequency response to remove clutter from recruited flow. The passband of the filter was adjusted in each recording to match the observed PW spectrum. We estimated the power-Doppler signal $R_0(x, y, z, t)$ from the filtered signal $s(x, y, z, t)$ by calculating the energy $|s|^2$ with an observation window of 10 ms and an overlap of 50%. The power-Doppler signal was smoothed temporally using a moving average filter with length $N_{smooth}$ = 11 (1 ms) and radially using a filter with length $N_z$ = 3 (0.2 mm).
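The power-Doppler estimation for a single voxel can be sketched as below: the mean of $|s|^2$ over sliding windows with 50% overlap, followed by a centered moving average that could serve for both the temporal and the radial smoothing. Using the mean rather than the sum only rescales the result. Window lengths are given in samples; for example, at a hypothetical PRF of 10 kHz, a 10 ms window spans 100 samples. The function names are illustrative.

```python
def power_doppler(iq, win, overlap=0.5):
    """Mean |s|^2 over sliding windows of `win` slow-time samples for one voxel."""
    hop = max(1, int(round(win * (1 - overlap))))
    return [sum(abs(v) ** 2 for v in iq[k:k + win]) / win
            for k in range(0, len(iq) - win + 1, hop)]


def moving_average(x, n):
    """Centered moving average of odd length n; the window shrinks at the edges."""
    half = n // 2
    out = []
    for i in range(len(x)):
        seg = x[max(0, i - half):i + half + 1]
        out.append(sum(seg) / len(seg))
    return out


iq = [complex(1.0, 0.0)] * 400  # toy slow-time signal for one voxel
r0 = moving_average(power_doppler(iq, win=100), 3)
print(r0)
```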
The mean velocity $\hat{v}_{mean}$ was estimated using a spectral envelope estimator based on $\hat{v}_{max}$, the maximum velocity envelope estimated from the PW Doppler spectrum, and $\hat{B}$, the estimated bandwidth. We estimated the PW Doppler spectrum using a discrete Fourier transform applied to the same temporal window used to generate the power-Doppler images. The spectrum was smoothed along the temporal dimension in the same way as $R_0$. Before envelope detection, the spectrum was binarized automatically using Otsu adaptive thresholding [32]. Estimating the mean velocity from the spectral envelope was preferred over autocorrelation, as autocorrelation-based mean velocity estimators are biased toward low velocities by low-frequency clutter, which was present in cases of suboptimal clutter filtering. The maximum velocity estimator was shown to be more robust in [14].
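Otsu binarization and maximum-envelope detection can be sketched as follows. This is a generic histogram-based Otsu implementation; applying one global threshold to the whole spectrogram and assuming flow toward positive velocities are simplifications, and the formula that combines $\hat{v}_{max}$ and $\hat{B}$ into $\hat{v}_{mean}$ is not reproduced here. Function names are illustrative.

```python
def otsu_threshold(values, n_bins=256):
    """Otsu's threshold: maximize the between-class variance over a histogram of `values`."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo
    width = (hi - lo) / n_bins
    hist = [0] * n_bins
    for v in values:
        hist[min(int((v - lo) / width), n_bins - 1)] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w0 = sum0 = 0.0
    best_var, best_k = -1.0, 0
    for k in range(n_bins):
        w0 += hist[k]
        sum0 += k * hist[k]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2  # unnormalized between-class variance
        if var_between > best_var:
            best_var, best_k = var_between, k
    return lo + (best_k + 1) * width  # upper edge of the last "background" bin


def max_velocity_envelope(spectrogram, velocities):
    """v_max per frame: highest velocity bin whose power exceeds a global Otsu threshold.
    spectrogram[t][k] is the power at velocities[k] (flow toward positive v assumed)."""
    thr = otsu_threshold([p for frame in spectrogram for p in frame])
    env = []
    for frame in spectrogram:
        above = [v for v, p in zip(velocities, frame) if p > thr]
        env.append(max(above) if above else 0.0)
    return env


vel = [0.5 * k for k in range(10)]  # velocity axis in m/s
frame = [10.0 if k <= 6 else 1.0 for k in range(10)]
print(max_velocity_envelope([frame, frame], vel))
```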
Finally, the quantitative metrics were estimated according to

$$\mathrm{CSA}(t) = \iint_{x,y} g\big(R_0(x, y, z_{vc}, t)\big)\,dx\,dy \qquad (8)$$

$$Q(t) = \iint_{x,y} \hat{v}_{mean}(x, y, z_{vc}, t)\, g\big(R_0(x, y, z_{vc}, t)\big)\,dx\,dy \qquad (9)$$

where CSA(t) and Q(t) are the cross-sectional area and flow rate, respectively. The segmentation operator is denoted by $g(\cdot)$. We obtained the segmented orifice image sequence by segmenting the power-Doppler cross sections $R_0(x, y, z_{vc}, t)$.
The parameter $z_{vc}$ is the vena contracta depth, which was selected manually for each recording in the phantom experiment. For clinical use, $z_{vc}$ must be estimated automatically, as described in Section II-H.
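Discretized, the CSA and flow-rate integrals reduce to sums over the pixels of the cross section. The sketch below assumes a binary mask produced by the segmentation operator, mean velocities in m/s, and a pixel area in mm²; since 1 m/s × 1 mm² = 10³ mm³/s = 1 ml/s, the flow rate comes out directly in ml/s. The helper name is hypothetical.

```python
def csa_and_flow(mask, v_mean, pixel_area_mm2):
    """Discrete CSA = sum(g * dA) [mm^2] and Q = sum(v_mean * g * dA) [ml/s].
    mask: binary segmentation g(R0) at z_vc; v_mean: mean velocity map [m/s]."""
    csa = 0.0
    q = 0.0
    for mrow, vrow in zip(mask, v_mean):
        for g, v in zip(mrow, vrow):
            csa += g * pixel_area_mm2
            q += g * v * pixel_area_mm2  # 1 m/s * 1 mm^2 == 1 ml/s
    return csa, q


mask = [[1, 1, 1, 1, 1], [1, 1, 1, 1, 1]]
vel = [[2.0] * 5, [2.0] * 5]
print(csa_and_flow(mask, vel, 0.5))  # CSA in mm^2, Q in ml/s
```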

H. In Vivo Feasibility Analysis
We acquired 3-D channel data from six patients with aortic valve regurgitation. All the patients provided written consent, and approval was given by the regional committee for medical and health research. We used a GE Vivid E95 scanner with a 4V-D probe in high PRF mode, using the same parameters as in Section II-G. These recordings were made with a setup using a focal point 30 cm from the transducer. At the time of the recordings, we did not have approval from our industry partners for our improved setup using a −40 cm focal point; approval was granted later. Comprehensive echocardiographic examinations were performed to provide reference values for EROA and RVol using 2-D PISA. Magnetic resonance imaging (MRI) was also performed to provide RVol. The reference measurements were performed by a cardiologist who was blinded to the results of the 3-D Doppler method. We applied our method to the 3-D channel data to estimate RVol and CSA.
The processing chain for the in vivo data was the same as in the phantom experiment. However, to account for valve and vena contracta motion in the clinical recordings, we estimated the vena contracta depth $z_{vc}$ at each time $t$ as the depth at which the maximum velocity envelope $\hat{v}_{max}$, the same envelope used for the mean velocity estimator in Section II-G, reaches its largest value. Here, we assumed that the maximum velocity occurs at the vena contracta. RVol was estimated by integrating the flow rate $Q(t)$ over the regurgitation time.
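The vena contracta tracking and volume integration can be sketched as below. Following the stated assumption, the depth is chosen where the maximum velocity envelope peaks; collapsing each cross section to its largest $\hat{v}_{max}$ per depth and using the trapezoidal rule for the time integral are assumptions, and the function names are hypothetical.

```python
def vena_contracta_depth(vmax_by_depth, depths):
    """z_vc(t): depth at which the maximum velocity envelope peaks.
    vmax_by_depth[i] is the largest v_max over the cross section at depths[i]."""
    i_best = max(range(len(depths)), key=lambda i: vmax_by_depth[i])
    return depths[i_best]


def regurgitant_volume(q_ml_s, dt_s):
    """RVol [ml] by trapezoidal integration of Q(t) [ml/s] sampled every dt_s seconds."""
    return sum(0.5 * (a + b) * dt_s for a, b in zip(q_ml_s, q_ml_s[1:]))


print(vena_contracta_depth([3.1, 4.2, 3.9], [0.060, 0.065, 0.070]))
print(regurgitant_volume([0.0, 100.0, 0.0], 0.05))
```

Selecting the depth per frame lets the cross section follow the valve as it moves, at the cost of depending on how cleanly $\hat{v}_{max}$ is estimated.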

III. RESULTS

A. Model Training and In Silico Validation
We evaluated ResNet segmentation accuracy on a test set consisting of 600 simulated power-Doppler images. The test images were generated in the same way as the training images, as explained in Section II-D. The mean area estimation errors were 3.5 ± 2.2 mm² for ResNet, 13.2 ± 9.9 mm² for power thresholding, and 12.8 ± 15.8 mm² for DC. Two examples from the test set are shown in Fig. 4. ResNet accurately reconstructs the underlying orifice. The DC method is less able to restore the original shapes, likely due to its assumption of a spatially invariant PSF. Thresholding can only provide near-elliptical predictions in the object centers, since the PSF severely blurs any sharp edges. The results shown in Fig. 5 indicate that ResNet achieves better segmentation accuracy than the references and differentiates better between small and large orifices. Fig. 6 shows the performance of models with different training schemes when subjected to test data with varying imaging depths and transmit frequencies. The results indicate that providing explicit knowledge about depth and frequency during training is beneficial. This was expected, as the relationship between the PSF and the object size becomes ambiguous when these parameters change. In addition, estimation accuracy decreases with increasing depth and for frequencies outside the training domain. This was also expected, since the transmit frequency and imaging depth affect both the axial and lateral resolutions.
Fig. 7 shows the results from models trained with different augmentation intensities $\alpha$. Fig. 7(b) shows a power-Doppler elliptical cross section of a jet from the experimental setup described in Section II-G. Using $\alpha \ll 10^{-2}$ results in incorrect segmentation of local areas with reduced contrast, whereas using $\alpha \gg 10^{-2}$ results in a poor training phase and therefore inadequate predictions with high errors. Using $\alpha = 10^{-2}$, we mostly avoid incorrectly segmenting low-contrast regions. Following a grid search, a value of $\alpha = 10^{-2}$ gave the best quantitative accuracy while visibly mitigating the effects of local reductions in contrast. The search was performed by training models with $\alpha$ values between 0 and 1, computing their average flow rate estimation error for the flow phantom study with circular orifices, and monitoring the segmentation qualitatively.
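The summary statistics and the grid search over $\alpha$ can be expressed compactly. The sketch assumes that errors are absolute differences and that the reported spread is the sample standard deviation; the error values in the usage example are illustrative placeholders, not results from this study, and the function names are hypothetical.

```python
import statistics


def abs_error_summary(estimates, truths):
    """Mean and sample standard deviation of absolute errors ("mean +/- SD")."""
    errs = [abs(e - t) for e, t in zip(estimates, truths)]
    return statistics.mean(errs), statistics.stdev(errs)


def select_alpha(flow_error_by_alpha):
    """Grid search: the augmentation intensity with the lowest mean flow rate error."""
    return min(flow_error_by_alpha, key=lambda a: statistics.mean(flow_error_by_alpha[a]))


# Illustrative placeholder errors per alpha (not measured values):
errors = {0.0: [5.0, 6.0], 1e-2: [2.0, 3.0], 1.0: [9.0, 9.0]}
print(select_alpha(errors))
print(abs_error_summary([11.0, 9.0, 12.0], [10.0, 10.0, 10.0]))
```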

B. Phantom Experiments
Figs. 8 and 9 show the flow rate and area estimates from the phantom setup with four circular orifices, described in Section II-G. The results indicate that ResNet achieves slightly less biased estimates than power thresholding and DC, while the interframe variability is similar. Fig. 10 shows flow phantoms with different orifice shapes along with the experimental power-Doppler images, with the jet cross sections segmented using ResNet, power thresholding, and DC. The results indicate that ResNet reconstructs the orifice shape better than the references.
Fig. 8. Flow rate estimates for the circular orifices (15, 25, 35, and 45 mm²) using segmentation from power thresholding, DC, and ResNet. Each separate measurement is from a recording for a certain orifice size, flow rate, and angle. Error bars signify the standard deviation between frames in each recording (10-15 frames per recording). Each recording had a duration of about 60 ms, which is close to the regurgitation durations we observed clinically. The black line shows the flowmeter reference. The estimated velocity field was the same for all the methods. Linear regression slopes and coefficients of determination for each orifice size are denoted by β and r², respectively.
Fig. 9. Angle-corrected CSA estimates of circular orifices (15, 25, 35, and 45 mm²) using segmentation from power thresholding, DC, and ResNet. The error bars signify the standard deviation over all the angles and flow rates for a given orifice size.

C. In Vivo Feasibility Analysis
Fig. 11 shows the results from six patients with aortic regurgitation. The plots compare the RVol and CSA estimated using ResNet, DC, and thresholding with MRI and 2-D PISA references. In Fig. 11(b), 2-D PISA EROA estimates are plotted along with the CSA estimated by our method. Note that since 2-D PISA EROA is estimated indirectly using the peak velocity from the CW spectrum, it is not directly comparable to our method, which estimates the CSA directly. Fig. 12 shows PW spectra, power Doppler with jet segmentation, and velocity estimates from the six patients. Fig. 13 shows a summary of the flow rate results, comparing the accuracy from simulations, the experimental validation, and the patient data.
We observe that ResNet is more robust than the other segmentation methods. Power thresholding and DC are more prone to overestimation, most notably in patients 2 and 3. We attribute this to ResNet's ability to infer smaller areas from the highly blurred power-Doppler images, as can be seen in Fig. 12. Fig. 11(a) shows that this ability has a large impact on the RVol estimates, for which ResNet agrees better with the 2-D PISA and MRI references.

IV. DISCUSSION
In this work, we combined deep learning and high frame rate 3-D ultrasound to quantify regurgitant jets in heart valves. This was done using a neural network trained on simulated data to segment the regurgitant orifice from low-resolution power-Doppler images, which facilitated estimators for the orifice area and RVol. The experimental and simulation results shown in Figs. 4, 5, 8, and 10 suggest that deep learning-based segmentation achieves higher accuracy than power thresholding and spatially invariant DC, and that it is able to reconstruct the orifice shapes from low-quality images. We also demonstrated feasibility for six in vivo cases of aortic regurgitation, as shown in Figs. 11 and 12.
The neural network can be trained entirely on simulated data, and the inference time is short due to the lightweight architecture. The experimental validation showed that our approach transfers from the simulated domain to real acquisition data, even though ResNet was trained solely on simulated data. One challenge we encountered was the difference in signal-to-noise ratio (SNR) between the simulated and observed power-Doppler images. Moreover, we noted the presence of a diffuse signal surrounding the jet in the observed data, which further reduces the contrast. We believe that this signal component could be caused by a combination of recruited flow, defocusing due to phase aberration, and side lobes. We could account for these blooming effects using augmentation with bivariate Gaussian functions, a strategy that has previously been applied to account for shadowing artifacts in B-mode images [33], [34].
Fig. 12. (Left) PW spectra, with mean velocity traces plotted in green and current time frames marked by blue dashed lines. (Right) Power-Doppler cross sections with the CSA segmented using thresholding (white), DC (blue), and ResNet (red). In patient 6, the spectrum shows that the PRF is insufficient to capture the entire velocity envelope. This likely explains the underestimation of RVol for this patient, as can be observed in Fig. 11.
To illustrate how increasing realism increases the problem complexity, flow rates from the different test environments are compared in Fig. 13. When moving from the simulations to the experimental data, and then to patient data, the accuracy decreases at each step. This trend shows that even though our deep learning model performs well under simulated conditions, the simulator is limited in providing sufficiently realistic training examples that cover challenging clinical conditions. Several effects that are present in the clinical environment, but not in the simulated or experimental environments, strongly affect the overall signal quality. Valve motion introduces high-intensity clutter that is difficult to filter effectively and also affects vena contracta depth estimation. We also suspect that aberration and small intercostal windows further degrade the image quality.
The experimental setup facilitated validation of our method in a controlled environment where the true orifice geometries and flow rates were known. However, the setup was not intended to mimic the clinical case accurately, and it has limited realism compared with TTE of aortic regurgitation. Notable limitations of the experimental setup include the lack of fatty tissue aberrators and ribs, and the lack of a moving valve apparatus, which may cause shadowing and clutter noise. Future work should aim at creating experimental environments closer to the clinical case, which would facilitate a better analysis of the method's limitations in a controlled environment.
Deep learning was only used to segment the regurgitant orifice from power-Doppler images, while velocity was estimated using a conventional PW Doppler estimator. Moreover, the vena contracta depth needed to be estimated prior to segmentation. Future work could expand the method to infer both area and velocity from 3-D plus time volumes of IQ data, alleviating the need for handcrafted estimators. The neural network architecture would need to be changed to learn correlations in three spatial dimensions and across time. A combination of 3-D convolutional layers and temporal units, such as recurrent layers or attention-based mechanisms, could be used.
To achieve a model capable of inference directly from 3-D plus time volumes, we would need an abundance of such training volumes. This creates the need for a simulator that is fast enough to generate enough training examples in reasonable time, while still accounting for spatially variant PSFs. Field II would not be fast enough for this purpose; however, FLUST [35] is a viable alternative. In the future, we plan to develop a 3-D plus time model based on a FLUST simulator, and to improve key steps in high PRF acquisition and processing, such as adaptive clutter filtering. We believe that these improvements would provide a method capable of producing more robust and accurate results in a clinical environment.

V. CONCLUSION
In this article, we presented a method that combines deep learning segmentation and 3-D high frame rate ultrasound for the quantification of flow rates, flow volumes, jet areas, and shapes in heart valve regurgitation. We showed that our approach distinguishes better between different regurgitation sizes and reconstructs the orifice shape better than a previous approach using thresholding and an approach using spatially invariant DC. In vivo feasibility was demonstrated for six patients with aortic regurgitation. Challenges in acquisition and image formation need to be solved to ensure sufficient in vivo image quality prior to segmentation. We believe our method could be valuable in future clinical assessment, as it could provide more accurate results with less user dependency than the currently recommended methods.