Two-Stage Motion Correction for Super-Resolution Ultrasound Imaging in Human Lower Limb

The structure of microvasculature cannot be resolved using conventional ultrasound (US) imaging due to the fundamental diffraction limit at clinical US frequencies. It is possible to overcome this resolution limitation by localizing individual microbubbles through multiple frames and forming a superresolved image, which usually requires seconds to minutes of acquisition. Over this time interval, motion is inevitable and tissue movement is typically a combination of large- and small-scale tissue translation and deformation. Therefore, super-resolution (SR) imaging is prone to motion artifacts as other imaging modalities based on multiple acquisitions are. This paper investigates the feasibility of a two-stage motion estimation method, which is a combination of affine and nonrigid estimation, for SR US imaging. First, the motion correction accuracy of the proposed method is evaluated using simulations with increasing complexity of motion. A mean absolute error of 12.2  $\mu \text{m}$  was achieved in simulations for the worst-case scenario. The motion correction algorithm was then applied to a clinical data set to demonstrate its potential to enable in vivo SR US imaging in the presence of patient motion. The size of the identified microvessels from the clinical SR images was measured to assess the feasibility of the two-stage motion correction method, which reduced the width of the motion-blurred microvessels to approximately 1.5-fold.

A MONG the many medical imaging modalities, ultrasound (US) imaging stands out in terms of accessibility and cost. Using conventional B-mode or contrast-enhanced ultrasound (CEUS) imaging at clinical frequencies, sub-wavelength structures such as the microvasculature cannot be resolved due to the fundamental diffraction limit. This limit can however be overcome by the method of ultrasound super-resolution (SR) techniques, such as ultrasound localization microscopy [1], where the final image is formed by localizing spatially isolated microbubbles (MBs) through multiple acquired frames. Viessmann et al. demonstrated that it is possible to spatially resolve two touching 200 µm internal diameter tubes using an unmodified clinical CEUS system operating 2 MHz [2]. Since then, several research groups have demonstrated the use of SR imaging within microfluidic channels [3], a tissue phantom with microvessels through an ex vivo human skull [4], and in vivo in mouse and rat models [5]- [9]. A theoretical localization precision as low as 1.8 µm was predicted for ultrasonic localization microscopy for human breast imaging at 7 MHz [10].
In vivo imaging introduces the additional complexity of sample motion during image acquisition. Christensen-Jeffries et al. mapped the micro-vasculature in vivo in a mouse ear where vessel features as fine as 19 µm, which is approximately 6 times smaller than the receive wavelength (∼ λ rx /6), were visualized in under 10 minutes of ultrasound acquisition using an unmodified clinical system [5]. In this work, a 2D sub-pixel cross-correlation was used for motion correction and gating to avoid artefacts due to motion caused by breathing. Errico et al. imaged an in vivo rat brain that was fixed within a stereotactic frame to minimize the motion. A ten minute acquisition was required for each coronal plane of the whole-brain scan to form the SR images that can resolve two vessels located 16 µm (∼ λ rx /6) apart [6]. In their later study, Hingot et al. used a cross correlation based method to correct for the motion between frames, which was applied within a block of 200 images acquired at 500 frames per second [11]. Ackermann and Schmitz performed multiple microbubble tracking in vivo in a tumor xenograft-bearing mouse and measured capillary blood flow (< 1 mm/s). Due to respiratory motion, they discarded 1151 frames out of 6000 frames acquired over four minutes [7]. Lin et al. detected vessels in vivo in tumor-bearing rats as small as 25 µm (∼ λ rx /7) by 3D ultrasound localization microscopy with a total acquisition time of 11.5 minutes (16 seconds for each 2D slice). They excluded 20 − 30% of the acquired frames due to the breathing induced motion artefacts to avoid the interference of bubble positions [8].
Rather than gating or stabilizing the sample, another approach to reducing motion artefacts is to use a higher concentration of microbubbles and shorter acquisition times to form SR images; however the final resolution of SR images obtained via localization of spatially non-isolated MBs is currently poorer than those obtained using the methods described above. Bar-Zion et al. imaged in vivo rabbit kidney and tumor models using higher order statistics by acquiring less than a second of high frame rate ultrasound data. They achieved an improvement of 50% in spatial resolution with a significantly shorter acquisition time as low as 0.1 seconds that makes the proposed approach clinically applicable [9]. However, even at this short time scale, super-resolution images will still be prone to motion artefacts at the micrometer level and motion correction may be required, especially for handheld clinical scans.
For SR US imaging to become useful clinically, motion artefacts must be addressed first. During normal breathing, the diaphragm moves 15 mm and the chest circumference changes 7 mm, and respiration causes translation of organs in the upper body [12]. Although respiratory motion is usually considered to be rigid, human soft tissue is mostly anisotropic and tissue deformation is only in the linear elastic region of the stressstrain curve for tissue strains up to 5% [13]. Cardiac motion is very complex and non-rigid, involving longitudinal and radial contractions. Although it does not generate as much motion as respiration, the region of the liver adjacent to the heart is typically displaced by approximately 4 mm [12]. Moreover, there are many unpredictable and unavoidable sources of motion in the body generating rigid and non-rigid motion such as: swallowing, coughing, peristalsis, bowel movements, pulsations of arterioles and venules, and other local muscle movements. Motion is an inherent part of diagnostic imaging and, unless corrected, it sets the achievable resolution limit in SR US imaging.
Doppler-based motion estimation is sensitive to small phase delays in the RF data and it can compensate for the local motion in the axial (or equivalently the radial) direction. Poree et al. achieved contrast and image quality improvement by applying this technique to high frame rate echocardiography [14]. Doppler based motion estimation worked well for this application since the lateral motion observed in their study was smaller than half of the lateral (or equivalently the crossrange) resolution. Gammelmark and Jensen demonstrated that axial motion compensation alone is not sufficient when the total motion in lateral direction is large in comparison to the wavelength [15]. They have performed the motion correction by tracking the position of the pixels in each low-resolution image acquired with synthetic transmit aperture method, which is similar to super-resolution imaging in a way that a combination of many low resolution images is necessary to generate a high resolution image.
Ideally, the motion should be compensated with an accuracy higher than the spatial resolution to be achieved in the SR image. When the problem of motion correction is solved for clinical ultrasound imaging, it will be possible to achieve a resolution below 10 µm, which will enable the imaging of human capillary vessels that may benefit many applications. Imaging blood vessels at micro-scale can reveal the elements that modulate endothelial barrier function; such as blood-brain barrier opening [16]. Many features of the immune system's interactions with small blood vessels and microcirculatory networks can be observed by using SR imaging. Vascular abnormalities associated with tumor growth can be monitored in detail, where the acoustic angiography using contrast agents already revealed the potential of using high resolution imaging [17]. High resolution and accurate imaging is the key to success for the diagnosis and endovascular treatment of peripheral arterial disease [18].
In this study, a two-stage motion estimation algorithm previously used in magnetic resonance imaging was applied to SR US imaging [19], with the goal of correcting for both rigid and non-rigid sample motion. The accuracy of the motion estimation method was analyzed in silico and also the application of this method to clinical SR imaging was demonstrated in vivo.

II. MOTION ESTIMATION AND CORRECTION
Here, we refer to motion as the combination of ultrasound probe motion and tissue motion such as respiratory motion, cardiac motion and other patient movements. It is not usually possible to control these sources of motion, which can be on the order of millimeters for an in vivo SR image that requires seconds to minutes of data acquisition. In SR imaging, motion can be as low as a micrometer for high frame rate imaging (> 1000 fps). However, the total acquisition time is determined by the speed of the physiological processes, such as the blood flow velocity in microvessels, not by the imaging frame rate [5], [20]. Over such a long duration, the motion between the first and the last frame will be a combination of rigid motion and local non-rigid deformations with different amplitudes.

A. Two-Stage Motion Estimation
The motion estimation is based on an image registration approach which was previously applied to MRI and the MATLAB (The MathWorks, Natick, MA) code is currently available to download [21]. This approach is based on the work of Rueckert et al. and Lee et al. [22], [23] and it is capable of performing rigid, affine, non-rigid, and two-stage motion estimation. The rigid registration is capable of capturing the translation and rotation. The affine registration can estimate translation, rotation, shearing and resizing. The non-rigid registration is a B-spline based free-form deformation that can estimate the local compression and rarefaction of tissue [22]. Non-rigid registration is achieved by minimizing a cost function, which is a combination of the cost associated with the smoothness of the transformation and the cost associated with the image similarity. Smoothness of the transformation is crucial to mimic the local deformation of the soft tissue, where adjacent points move cohesively. Smoothness is achieved by introducing a penalty term C regularization (T ) which regularizes the transformation and ensures that the resultant transformation field is not noisy. In summary, a high value of the regularization parameter ensures neighboring transformation points are similar and that they vary smoothly over space. A low value allows greater freedom of changes in neighboring transformation points. It is hard to relate this term to an actual physical quantity or fit a model to predict the optimal value, therefore this study used a grid search method to determine an appropriate value of C regularization (T ). When the registration algorithm finishes optimizing the cost function for a given C regularization (T ) and converges, the transformation matrix T saves the estimation result that can be used to correct the motion in the registered frames. Two-stage image registration is a combination of affine registration which estimates the global motion, and non-rigid registration which can estimate the local deformation of tissue.
For a small regularization penalty, motion estimation results in a viscous fluid-like registration with long computational time where pixels can move almost independently which is not realistic for human soft tissue. For a large regularization penalty registration finishes after a small number of iterations and can result in large errors for complex motion fields, such as combinations of large global movement and small local deformations. The two-stage registration approach is advantageous because the affine registration finds a rough global estimate first and then the non-rigid registration refines the final solution. The two-step approach effectively increases the range over which the non-rigid registration will work and improves the speed and the convergence of the optimization process.
There is an obvious trade-off between the resolution of the mesh size used in the registration model and computational complexity. In order to achieve the best compromise between the resolution or the accuracy of the non-rigid deformation and the computational cost, this model implements a hierarchical multi-resolution approach [23]. Resolution of the control grid spacing is increased along with the image resolution from a coarse to fine level and several iterations are performed at each stage until the cost function is minimized [22]. Fig. 1 shows the SR image processing chain. Motion estimation was performed as the first step on the B-mode image and the transformation matrix was used to correct the motion in the CEUS images. Two-stage registration implicitly performs affine and non-rigid registrations and outputs one transformation matrix and the motion corrected image.

B. Application of Motion Correction to Super-Resolution
For the clinical study, the CEUS images were generated by the clinical ultrasound system. After the motion correction stage, a spatio-temporal filtering based on singular value decomposition was applied to remove the residual tissue echoes from the CEUS images [24]. Separation of microbubble and tissue signals are crucial for SR imaging and it is not a straightforward procedure since nonlinear propagation of ultrasound through microbubble contrast agents can lead to imaging artefacts including subsequent erroneous localization of MBs [25], [26]. After the filtering stage a threshold was applied to remove the noise before microbubble detection. In the microbubble detection stage an intensity threshold was used to reject large signals which might be due to multiple microbubbles. Finally, the super-localization stage was performed as explained in [27].

A. Simulation Study
To verify the accuracy of the proposed motion estimation method, Field II simulations were performed [28], [29]. Controlled motion patterns were simulated on a tissue phantom with increasing complexity of motion as illustrated in Fig. 2, Fig. 3 and Fig. 4. In all simulations, white Gaussian noise was added to the simulated data before the beamforming operation and the SNR was calculated as a ratio between mean tissue signal and standard deviation of the white Gaussian noise.
Probe motion was simulated by moving the location of scatterers together in the axial or lateral direction as shown in Fig. 2. Translation of scatterers in the axial or lateral direction creates a rigid tissue motion with uniform displacement. These simulations were performed for 11 different exponentially spaced motion amplitudes for a range of 1 − 1024 µm and they were repeated 20 times with each repeat having a different noise.
Tissue deformation was generated by displacing the scattering points as a function of depth or lateral distance using a linear stress-strain relation to mimic the effect compression from top by an ultrasound probe or compression from side by a moving organ or muscle. By moving scatterers independently from each other, a non-rigid tissue motion was created as illustrated in Fig. 3 (left) and (middle). As shown in Fig. 3 (right), the motion within the imaging field changes from a few micrometers to up to a millimeter, which corresponds to a maximum of 2% compression in the axial direction and 2.5% compression in the lateral direction.
For the last simulation, a more realistic motion pattern was simulated in Field II by using the motion field estimated from a clinical scan. Two frames acquired approximately ten seconds apart were chosen as the reference frame and the frame with motion. The extracted motion field from these two frames accommodates a combination of a large scale counterclockwise motion located at south-east of the image and a small scale clockwise motion located at north-west of the image with an average motion of 203 ± 113 µm as shown in Fig. 4 (middle). This motion field was then applied to a numerical simulation of a homogeneous tissue phantom shown in Fig. 4 (left). Simulation parameters were chosen specifically to match the parameters of the clinical study with a center frequency 6 MHz, 80% bandwidth, 160 elements and a pitch of 237.5 µm. The −6 dB width of the point spread function at The simulated tissue phantom had 10 scatterers per resolution cell for every simulation to generate a fully developed speckle pattern [30]. For the simulations with probe motion and tissue deformations, the phantom included circular hypoechoic and hyperechoic regions with a diameter of 2, 3, 4, and 5 mm and 4 point scatterers ( Fig. 2 and Fig. 3). For the realistic motion simulations (Fig. 4), a homogeneous phantom was used without any structure, which makes the motion estimation harder because in this case there are no dominant features in the B-mode frames to aid the motion estimation, and the motion estimation is therefore obtained purely from the simulated speckle pattern. The attenuation coefficient was set to 0.5 dB/cm/MHz and white Gaussian noise was added to the simulated data before beamforming. By shifting the scatterers together or independently, a different speckle pattern was generated for all tested scenarios. The signal to noise ratio in the simulated B-mode image was varied between 20 dB, which is the empirical lower bound where a suitable motion correction is possible for SR imaging with the proposed approach, and 50 dB, which is the dynamic range of the ultrasound scanner used in the clinical study.
The motion estimation algorithm was applied to the simulated B-mode image after envelope detection, log-compression and downsampling. The resulting images had a pixel size of 60×60 µm 2 and a 50 dB dynamic range. During image registration, most of the parameters were kept the same except the grid spacing and the regularization penalty, which had a large effect on motion estimation accuracy for some cases. The similarity measure of squared pixel distance (also called squared difference) was used for all estimations. Among two interpolation techniques available with this motion estimation method, linear interpolation was used instead of cubic, as cubic interpolation was found to have higher error values due to the discontinuities inside and at the boundaries of the image [31]. The final result was cubic interpolated. The registration was always initialized with a uniform grid and the maximum number of grid refinement steps used by the multi-resolution method was fixed to two. The registration method was changed between rigid, affine, non-rigid, and twostage (also referred to as both inside the MATLAB code) in order to demonstrate the accuracy of each method for different scenarios. A grid spacing of 128×128 was used for all simulations except the simulated translational motion, where a larger spacing of 256×256 and 512×512 showed an improvement. When a grid size of 128×128 was chosen initially, the registration function also used a 64×64 grid and a 32×32 grid thanks to two subsequent grid refinement steps. To demonstrate the potential of the motion estimation method, the regularization of the two-stage and non-rigid registrations were optimized for each simulation. Each simulation was run for a set of regularization values (10 −6 ≤ C regularization (T ) ≥ 1) and the penalty parameter that achieved the minimum error was chosen as the optimum value. For the simulated probe motion, lowest error values were achieved by using a higher regularization parameter as opposed to the cases for simulated tissue deformation and realistic motion.

B. Clinical Study
Healthy volunteers were recruited from a research center (Charing Cross Hospital, Imperial College London). The study was approved by the National Research and Ethics Committee (Reference 13/LO/0943), and each participant provided written informed consent.
Clinical data was acquired in the tibialis anterior muscle using a Philips iU22 ultrasound scanner (Philips Medical Systems, Bothell, WA) with a handheld 3 − 9 MHz linear array probe. A vial of Sonovue (Bracco S.p.A, Milan, Italy) was diluted using normal saline (25 mg in 20 mL) and was administered as an intravenous infusion (VueJect, Bracco S.p.A, Milan, Italy) at a rate of 4 mL/min via an 18G cannula placed in an antecubital vein. The cannula was flushed with saline (5 mL of sodium chloride 9 mg/mL (0.9%) solution) and disconnected. B-mode and CEUS (power modulation) frames were acquired using the RS imaging mode of the Philips scanner operating around a 6 MHz center frequency with a mechanical index of 0.06 and a dynamic range of 50 dB.
Several acquisitions from three healthy volunteers were recorded over a duration of 40 to 55 seconds with 500 to 700 B-mode and CEUS frames at a frame rate of 13 Hz and the resulting data was saved to disk as video files. Supplementary video shows the in vivo CEUS and B-mode data. It can be seen that the tissue signal dominates the B-mode images and that the moving bubble signals will not significantly affect the motion estimation. B-mode frames were used for motion estimation and the motion correction was performed on CEUS frames before formation of the SR images.

A. Simulation Study
The accuracy of the motion estimation for the simulated probe motion in the axial and lateral directions are given in Fig. 5 (left) and Fig. 5 (right) respectively. The mean absolute error values are plotted with an error bar that represents the standard deviation for a range of motion between 1 and 1024 µm.
In the case of rigid probe motion in the axial direction, all methods estimated the motion with less than 3.5 µm absolute error and with a standard deviation smaller than 3 µm. In the case of rigid probe motion in the lateral direction, the absolute error increased for all motion estimation methods, where nonrigid motion estimation gave larger error values (8 ± 7 µm) for the simulations with 20 dB SNR.
Results of the non-rigid tissue deformation simulations are shown in Fig. 6 (left) and Fig. 6 (right) for all motion estimation methods. Results of the rigid motion estimation for all noise levels were above 45 µm, which is not suitable for SR US imaging. Both affine and two-stage motion estimation performed similarly for tissue deformation illustrated in Fig. 3 with absolute error values below 5.4 ± 5.2 µm for lateral and below 3.7 ± 2.9 µm for axial motion. Due to the nature of the simulated linear elastic compression of tissue, affine motion estimation performed better than the non-rigid motion estimation.
The computation time including both the motion estimation and motion correction were measured for an image with a size of 641 × 670 pixels while using only a single core of 12 core processor at 2.6 GHz. For this given setup, rigid registration and motion correction took 42 iterations and approximately 10 seconds. The second fastest method was the affine registration, which took 47 iterations and approximately 15 seconds. Non-rigid registration with an optimized C regularization (T ) = 2 × 10 −4 was the slowest method, which took 201 iterations and approximately 90 seconds. Two-stage registration with an optimized C regularization (T ) = 3 × 10 −2 took 47 iterations for affine and 8 iterations for non-rigid registration with a total of 20 seconds approximately.

B. Simulation Results for the Realistic Motion
A B-mode image with motion was generated with 20 dB SNR using the motion field extracted from the clinical scan shown in Fig. 4 (middle). This frame with motion was registered to a reference B-mode image without motion by using non-rigid, affine, and two-stage motion estimation methods. Fig. 7 (top) shows the motion field estimated by different methods and Fig. 7 (bottom) shows the absolute difference between induced and estimated motion fields. Amongst all the methods presented, the two-stage method has the lowest mean absolute error value of 12.2 µm, which increased to 15.3 µm for the non-rigid method and 28.7 µm for the affine method. When using the two-stage method, the absolute motion estimation error was less than 15 µm for 70% of the total image area, whereas this area drops down to 65% for the non-rigid and 38% for the affine method.
Overall, the rigid method has the worst performance since it cannot accommodate the localized shearing and rotation of the tissue for this case of simulated realistic motion. The affine method has a good performance for this case, however it performs worse than the two-stage method. The second nonrigid registration stage of the two-stage method compensates for the local deformations and gives an advantage over the affine method. The non-rigid method works best for small local deformations, however, when the motion is larger than   a few pixels the performance drops significantly, which is visible in the south-west corner of the Fig. 7 (bottom-middle). Therefore, the two-stage method is a good composite method that compensates the large global motion first and provides a better starting point for the non-rigid registration stage.

C. Effect of Regularization Parameter
By comparing the computation time, it is easy to notice the advantage of the two-stage registration against non-rigid registration. According to the previously given example, nonrigid registration with an optimized regularization parameter performed 201 iterations, where the two-stage registration with an optimized regularization parameter performed a total of 55 iterations to minimize the cost function. The first stage compensates a large portion of the motion by using the faster affine method and requires fewer iterations for the slower second stage based on the non-rigid registration.
Although speed improvement is a big advantage when using the two-stage method, the most important benefit of this method is improved estimation robustness when using a non-optimized regularization parameter. To demonstrate this, a large image region (between 10 to 35 mm in depth and between -13 to 13 mm in lateral) and a smaller region (between 10 to 30 mm in depth and between -8 to 13 mm in lateral) were chosen from the same dataset simulated with the realistic motion, where the small region avoids the large motion at the south-west corner of Fig. 4 (middle).
Performance of all estimation methods for this simulation is shown in Fig. 8 (top) for the small region and (bottom) the large region. The mean absolute error values calculated for each method is plotted against the regularization parameter. Rigid and affine registration methods do not use a regularization parameter, so they have the same constant value for the whole range. The performance of the non-rigid and two-stage registration methods depends on the regularization parameter, where the performance is best for the optimized values highlighted by red and blue circles in Fig. 8.
For the small region, the rigid method has the largest error between all methods due to the shearing and rotation of the tissue. When the regularization parameter is optimized, the non-rigid and two-stage methods have similar error values. However, the two-stage method provides a better performance for a broader range of regularization parameters, which is advantageous for real-time applications where the optimization of the regularization parameter may not be possible for every frame.
For the large region, the rigid method again has the largest error as shown in Fig. 8 (bottom). The affine and two-stage methods performed similarly for both large and small regions, but the performance of the non-rigid method dropped significantly. By comparing the results after choosing a large and a small area from the same B-mode image, one can conclude that the robustness of the two-stage motion estimation method is better than the non-rigid method.

D. Clinical Results
It is hard to demonstrate the accuracy of the motion estimation and correction on the clinical dataset without knowing the ground truth. Therefore, this section demonstrates the use of the two-stage motion estimation on clinical SR images based on the assumption that healthy volunteers without peripheral arterial disease should have long straight vascular structures without stenosis and tortuous vessels, as demonstrated by references [32] and [33] using angiograms and micrographs. Fig. 9 shows the CEUS maximum intensity projection (MIP) and the SR image generated from a 45 second clinical scan with 580 frames. Using less than a minute of clinical data acquisition, the imaging region was visualized using the SR method with an average number of localizations of 30 MBs per frame. Motion correction was performed by using the estimated motion with the two-stage method, where the estimated motion from two selected regions are shown in Fig. 9 (bottom) as a demonstration. An average motion of 233 ± 17 µm per second was estimated from the clinical US scans. Qualitatively, it is clear that the spatial resolution of the SR image is higher than that of the MIP. Fig. 10 (top) demonstrates the effect of motion correction on a chosen vessel. The average thickness of the vessel inside the boxes are given in Fig. 10 (bottom). The sizes of the vessels from these images were measured by using linear interpolation [34]. The full-width at half maximum (FWHM) of the vessel was measured as 1075 µm from the MIP image without motion correction. The SR image without motion correction achieved a sub-wavelength vessel FWHM of 220 µm; however after the application of proposed twostage motion correction method the FWHM of the vessel was reduced to 104 µm and the double-vessel feature disappeared.
The benefit of using two-stage motion estimation and motion correction on SR imaging in human microvasculature is demonstrated in Fig. 11. The microvessels were chosen from four different SR images acquired from three healthy volunteers. Figure shows the effect of the proposed twostage motion correction on different microvessels where the thickness of vessels are given in Table I. After motion correction the torturous vessels appeared as straight vessels, which illustrates the significance of the two-stage motion correction on clinical interpretation of SR US images. Motion correction also potentially removed blurred vessels and artificial double copies, which were mostly visualized as single vessels after correction. After motion correction the size of the average vessel in the SR image dropped from 146 µm down to 94 µm by reducing the width of the motion blurred microvessels approximately 1.5-fold as listed in Table I.

V. DISCUSSION
Motion is an inherent part of in vivo imaging and ultrasound imaging methods based on multiple acquisitions suffer from motion artefacts even for images acquired at high frame rates with or without microbubbles [35]- [37]. For super-resolution imaging, sub-wavelength motion correction methods are required to visualize microvascular structures and flow beyond the diffraction limit through localization of spatially isolated microbubbles. Rigid motion estimation techniques using image data cannot compensate for local deformations as demonstrated in the simulation study presented here.
This study employed an image based motion estimation approach for SR US imaging. The applied two-stage method is a combination of affine image registration that can estimate the global motion, and non-rigid image registration that can estimate the local deformation of tissue. The main advantage of using a two stage registration instead of a non-rigid registration is that the non-rigid method is more complex and computationally heavy. The first stage, affine registration, compensates for the global motion and it gives a better starting point for the non-rigid stage, which also reduces the number of iterations required to minimize the cost function.
For the first two simulation studies with the probe motion and tissue deformations, presented in Fig. 5 and Fig. 6, the standard deviation values are relatively large since the displayed results present a combination of many simulations performed for 11 different motion amplitudes between 1 µm and 1024 µm. For the same simulations, the accuracy of the motion estimation was better in the axial direction for all methods due to the shape of the point spread function, which was narrower in the axial direction. These two sets of simulations were performed to assess the feasibility of the motion estimation methods; however the motion field was too simplistic to highlight the advantages of the twostage registration method compared to affine registration. The affine method has the degrees of freedom required to correct for the simplistic motion, whereas the two-stage method has many more degrees of freedom. For these simulations, the proposed two-stage method achieved similar results with the affine method with a mean absolute motion estimation error  Fig. 11. Effect of motion correction is presented in 12 SR image pairs. Images are displayed in three columns, where the left hand side image is without motion correction and the right hand side image is with motion correction. Colorbar is the same as Fig. 9 and it corresponds to the number of localized microbubbles. of 7.5 ± 5.8 µm or less for the simulated motion range of 1 − 1024 µm with 20 dB SNR. However, for the simulation with realistic motion field, the advantage of using the twostage method over affine method became obvious, where the mean absolute error was 2.3 times smaller for the two-stage method.
The error values presented in this study will change as a function of many variables such as wavelength, sampling frequency, imaging resolution, SNR, and regularization parameter. Improving some of these parameters and also using RF data instead of image data can increase the estimation accuracy. Although in this study the motion estimation was performed on B-mode images acquired by a commercial scanner, the application of the proposed motion correction scheme to RF data is possible [38]. Using RF data instead of image data can increase the estimation accuracy, as the RF domain is a super-set of the image domain with an additional signal phase information.
The regularization penalty might have a significant impact on the motion estimation error for the non-rigid method, however it is possible to achieve a reasonable estimation accuracy for a large range of regularization values while using the two-stage method as shown in Fig. 8. For both the simulation and clinical ultrasound data used in this study, the smallest used value was C regularization (T ) = 2 × 10 −4 . Below this value the computation time of the estimation process increases without a benefit since a lower penalty results in a viscous-like registration which is not realistic for human soft tissue. Above the largest used value of C regularization (T ) = 0.5, estimation results in an affine-like registration which is not suitable for estimating complex non-rigid motion fields.
The motion estimation can be performed using either a dynamic or a static reference frame. It is possible to perform registration between each pair of consecutive frames by changing the reference frame for every registration, however this results in an accumulation of error over every registered frame. In this study, motion estimation was performed by using a single reference frame. In this case, the choice of the reference frame becomes crucial. If the specific chosen reference frame is corrupted with artefacts or significant motion, this can make it very different from the rest of the frames in the sequence. An automatic and systematic way of choosing a good quality reference frame may eliminate this problem. Future improvement can include group-wise registration of the entire video sequence together rather than pairwise registration [39].
There are two significant limitations in SR US imaging performed in 2D. The biggest problem is the out-of-plane motion that cannot be compensated using 2D imaging methods. Before starting the motion correction procedure for the clinical study, the B-mode and CEUS frames were visually inspected. Video files were segmented into smaller sections with no obvious out-of-plane motion in the B-mode images. From these segmented videos, only those with 250 or more frames were selected for further processing. Out of five to seven minutes of US and CEUS acquisitions from three volunteers, only four continuous acquisitions with a duration of 40 to 55 seconds were suitable for SR imaging after motion correction. The tibialis anterior muscle is located at one of the extremities of human body and is therefore not affected by respiration and cardiac motion. When imaging in the abdomen and chest region, while the motion correction algorithm was designed to cope with large motion amplitude including that expected in abdomen, increased out-of-plane motion will limit the applicability of our approach in its 2D form, as the current correction procedure relies on a constant 2D plane in the sample being imaged over time. In the future, it should be possible to apply our approach when imaging in the liver, pancreas or kidney using 3D imaging approach, or 2D SR US provided that an experienced clinician places the probe in a way that the imaging region only moves inplane with the probe. Secondly, the acquired SR images did not have the required resolution in the elevation direction. Both of these issues can be addressed by using 3D imaging methods, which can achieve the required elevational resolution and SNR for 3D SR imaging [40]- [42]. Although 3D imaging offers a solution to the problem of out-of-plane motion, it may introduce other limitations for SR US imaging. For multiplexed 3D ultrasound systems, the acquired volume data is a combination of multiple transmissions and acquisitions, which may generate intra-volume motion artefacts. High frame rate 3D ultrasound imaging with plane waves is capable of imaging the whole volume with a single acquisition. However, this method suffers from low SNR due to the lack of elevational focusing, which may increase the localization error in SR US imaging. A carefully chosen imaging strategy is required for the 3D SR US to balance the trade-offs between motion artefacts and localization error. High speed implementation of 3D SR remains a big challenge due to post-processing complexity and data size for both multiplexed and plane wave 3D ultrasound imaging.
The SR images shown in Fig. 11 have all been selected using visually-perceived improvement in image quality, based on an assumption that the microvessels should be long and straight in healthy patients [32], [33]. It is not possible to calculate the resolution of the SR images due to the lack of the ground truth. Nevertheless, it is possible to measure the width of the microvessels, where the motion corrected SR imaging method will not give an underestimate of the width due to motion and localization error. For this reason, the spatial resolution of the SR images generated in this study using a clinical US system has to be < 94 µm (λ ≈ 250 µm), and therefore these results represent the first clinical localizationbased super-resolution ultrasound imaging.
This study used a dataset acquired by a clinical scanner with normal frame-rate and demonstrated the use of a motion correction method without considering the computational speed. The motion estimation and correction were performed by using a MATLAB code and executed on a CPU; however it is possible to significantly improve the computational time of the applied method by using a GPU or other parallel processing approaches. A fast motion correction can empower the image based super-localization technique and lead to quick clinical translation of super-resolution ultrasound imaging.

VI. CONCLUSIONS
Clinical motion observed in ultrasound imaging is an aggregation of various motion types. Probe movement, respiration, cardiac motion and many other unavoidable sources of body motion result in a combination of translation, shearing and non-rigid deformations at different scales. Super-resolution images are generated through multiple ultrasound frames acquired over a duration of seconds to minutes, where both large tissue movements and small local deformations can be observed. To estimate both a large and a small scale motion and local deformations simultaneously with high precision, this study used a two-stage approach for ultrasound superresolution imaging.
Ideally, super-resolution ultrasound imaging should be limited by the localization precision rather than sample motion. Therefore, motion should be compensated with an accuracy higher than the spatial resolution of the super-resolution image as demonstrated in this study. The feasibility study showed that it was possible to achieve a sub-pixel (x = 60 µm) and sub-wavelength (λ ≈ 250 µm) motion estimation accuracy of 7.5 ± 5.8 µm or better for the simulated probe motion and tissue deformations while using the two-stage image registration method. Similar results were achieved for the simulations with a realistic motion extracted from a clinical dataset, where the mean absolute error was 12.2 µm and 70% of the motion was estimated with an absolute error smaller than 15 µm. The two-stage method was then applied to achieve clinical superresolution ultrasound imaging of microvasculature in human lower limb using a commercially available clinical ultrasound scanner.