RobUSt–An Autonomous Robotic Ultrasound System for Medical Imaging

Medical ultrasound (US) systems are widely used for the diagnosis of internal tissues. However, there are challenges associated with acquiring and interpreting US images, such as incorrect US probe placement and limited available spatial information. In this study, we expand the capabilities of medical US imaging using a robotic framework with a high level of autonomy. A 3D camera is used to capture the surface of an anthropomorphic phantom as a point cloud, which is then used for path planning and navigation of the US probe. Robotic positioning of the probe is realised using an impedance controller, which maintains stable contact with the surface during US scanning and compensates for uneven and moving surfaces. Robotic US positioning accuracy is measured to be 1.19±0.76 mm. The mean force along the US probe z-direction is measured to be 6.11±1.18 N on static surfaces and 6.63±2.18 N on moving surfaces. An overall lowest measured force of 1.58 N demonstrates constant probe-to-surface contact during scanning. Acquired US images are used for the 3D reconstruction and multi-modal visualization of the surface and the inner anatomical structures of the phantom. Finally, K-means clustering is used to segment different tissues. The best segmentation accuracy of the jugular vein according to the Jaccard similarity coefficient is measured to be 0.89. With such an accuracy, this system could substantially improve autonomous US acquisition and enhance the diagnostic confidence of clinicians.


I. INTRODUCTION
Robots are gradually transforming medical procedures by facilitating the earlier diagnosis of diseases, improved operational efficiency, and the implementation of advanced surgical techniques [1]. Medical imaging modalities are used to acquire data and create a visual representation of the human anatomy that can be utilized for diagnostics or during surgical procedures [2]. Specifically, medical ultrasound (US) is widely used for the diagnosis of internal tissues due to its beneficial features, including low cost of operation, noninvasive nature, and real-time imaging. However, there are challenges associated with acquiring and interpreting US images. The acquisition of US images requires correct placement and consistent force between the US probe and the
patient. Variables such as probe-to-surface orientation, probe position, and tissue contact force affect the quality of the US images [3]. Once the US images are acquired, an analysis follows, whereby the clinician localizes and inspects the tissues. To enhance this process, image processing and analysis techniques are used to extract additional diagnostic information from the US images [4].
Volumetric data is useful to clinicians for visualizing and inspecting internal tissues. Such data is generated from 2D and 3D US images using reconstruction techniques [5] and knowledge of the US probe pose. The pose relation between the US probe and the frame of reference is obtained using electromagnetic [6], visual [7], or mechanical [8] spatial sensing devices. Based on this notion, a robotic system can be used to manipulate the US probe and obtain accurate spatial data. For example, Zhang et al. demonstrated the application of a 6-DOF robot with an in-hand 2D US probe that facilitates 3D US volume reconstruction [9]. Aalamifar et al. introduced a calibrated robot-assisted US system, which employs visual tracking of movements to follow and align to a manually operated probe [10]. US probe calibration has been investigated by incorporating automatic calibration using a 6 degrees-of-freedom (DOF) robot, a marker-based optical tracking system, and a US calibration phantom [11], [12]. Employing a robot is also useful for providing contact force information and controlling the end-effector position in relation to the scanned surface [13]. As shown by Jiang et al., a combination of US images and force feedback from a 7-DOF robot can be used to control the alignment of the US probe with reference to the scanning surface [14]. Furthermore, Jiang et al. demonstrated the importance of US probe-to-surface alignment when acquiring images of human tissue [15].
Despite the aforementioned large body of work on the subject, the planning and execution of trajectories ensuring stable robot contact with an anthropomorphic body remains a vital research problem. This problem is catalyzed by high anatomical variation in the human population, including uneven skin-surface geometries and varying skin elasticity. Robotic path planning is intensively researched in the fields of mobile robotics [16] and non-destructive testing [17]. The translation of these techniques to medical applications involving robotic US, and the use of point clouds acquired from a 3D camera for path planning of a probe in contact with the body surface, is studied in [18]–[20]. For example, Graumann et al. gave only a rudimentary account of using impedance control for stable contact. Next, Huang et al. proposed the planning of a single-line surface path and used two single-axis force sensors for correcting US probe poses. Finally, Virga et al. studied autonomous robotic US acquisition to automate 3D camera-based single-line trajectory planning, and reported on force-based position compensation during breathing motions. Nevertheless, the planning of paths that contain multiple scanning sweeps on curved surfaces, and the motion control necessary to execute these trajectories while compensating for both the position and the orientation of the US probe in the presence of physiological movement, have not been demonstrated. To address this issue, we propose a novel approach whereby the planned path is transformed into a series of equilibrium poses for the impedance controller. With this approach we maintain constant contact with both deformable and moving surfaces during US scanning.
US image analysis is a well-researched topic within the context of manual US equipment without robotic assistance [21]. While conventional US image analysis involves a clinician and requires experience, automatic segmentation methods can improve accuracy and clinical outcomes [22] or enable real-time systems to control medical instruments such as needles [23]–[25], catheters [26], or drills [27]. However, US images inherently contain many imaging artifacts and have a low signal-to-noise ratio, so filtering noise from the signal is one of the main perceived challenges [28].
Furthermore, due to the aforementioned anatomical variances and US machine characteristics, the intensity, density, and speckle content of tissues can be perceived differently in US images. This occurrence requires the use of machine learning methods, which segment such tissues automatically [29], [30]. In this study we demonstrate that the robotic-aided reconstruction of US volumes and the utilization of the unsupervised learning method K-means enable the robust and automatic segmentation of sub-dermal tissues. The resulting accurate volumetric data of segmented tissue can be used for diagnostics and computer-assisted surgery, and can be readily integrated with other automated interventional systems.

FIGURE 1. RobUSt, a ''Robotic Ultrasound System''. (a) A 3D camera is used to obtain a point cloud of an anthropomorphic ultrasound (US) phantom surface. This point cloud is used for planning of a US probe scanning path. (b) US images are acquired using a probe that is manipulated along the planned path with the impedance controller, which ensures constant surface contact. (c) A US volume is reconstructed, and the unsupervised-learning K-means clustering method is used for the segmentation of scanned tissues and multi-modal visualization.
This study presents a novel integrated medical imaging solution, combining 3D US volume reconstruction and image analysis with autonomous path planning and robot control. RobUSt is a ''Robotic Ultrasound System'' that performs autonomous path planning and implements impedance control during US scanning (Fig. 1). Such an approach facilitates the capture of US images while compensating for tissue deformation, body motions, and positioning errors. Our system improves diagnostic capabilities by performing 3D US reconstruction, tissue segmentation, and registration of the multi-modal data. To the best of our knowledge, this study on robotic US is the first to combine path-to-motion robot control and the generation of multi-modal data. The paper is outlined as follows: In Section II, we present both the hardware of RobUSt and its key software components, comprising novel algorithms for autonomous path planning, robotic impedance control, 3D US reconstruction, and tissue segmentation. The proposed solutions are validated through a series of experiments in Section III. Conclusions and directions for future studies are provided in Section IV.

II. THE RobUSt SYSTEM
The main components of the RobUSt system are as follows: a 7-DoF robot (Panda, Franka Emika GmbH) with integrated torque sensors (force resolution < 0.05N and relative force accuracy 0.8N), a diagnostic medical US machine (Sonix-Touch Q+, BK Medical), and a 3D camera (Intel Realsense SR305). The end-effector of the robot is used to manipulate the attached 3D camera and the US probe. The 3D camera uses a projected light pattern to capture a 3D-RGB point cloud of the body surface (resolution of 640 × 480 pixels), which is used for planning of the US scanning path. During 2D B-mode image acquisition the robot moves the US probe along the planned path while keeping contact with the surface using the impedance control.
US images are used for 3D volume reconstruction and segmentation by employing the Point Cloud Library (PCL v1.9.1) for the handling of point clouds, and the Open Source Computer Vision Library (OpenCV v4.4.0) for image processing. Algorithms for robot control and machine vision are integrated into a C++ application on a portable workstation (Intel Core i7-9750H, 2.60 GHz, 32 GB RAM) running Ubuntu 18.04 with Kernel 5.4 and RT-PREEMPT for real-time robot control capabilities. Experiments are performed using an anthropomorphic US mannequin (Gen II US Central Line Training Model, CAE Healthcare, USA), which we will refer to as the anthropomorphic US phantom.

A. SYSTEM CALIBRATION
The robot base frame (B) is chosen to serve as a global reference for position control of the robot (Fig. 2). The homogeneous transformation matrix between any two coordinate frames (X) and (Y) is denoted as T^X_Y ∈ SE(3). It is composed of the rotation matrix R^X_Y ∈ SO(3), which describes the orientation, and the position vector p^X_Y ∈ R^3 of frame (X) with respect to the reference frame (Y). The robot flange pose is the transformation T^F_B. The 3D camera and the US probe are rigidly fixed to the robot flange frame (F), making T^U_F and T^C_F constant. T^C_F is estimated using the calibration method implemented in the Visual Servoing Platform (ViSP v3.3.0) [31]. The least-squares error solution for p^U_F is estimated by retrieving T^F_B while moving the robot through several configurations while keeping the position of the US probe constant. The positioning of the 3D camera and the US probe on the robot end-effector ensures that R^U_F = R^C_F. The transformation of a frame (A), originally observed in the camera frame (C), to the robot base frame (B) is valid under the assumption that T^F_B is retrieved at the same time as T^A_C:

T^A_B = T^F_B T^C_F T^A_C.    (1)

During US scanning, the probe is in contact with the surface and captures B-mode US images. Frame (I) is assigned to the US image matrix, and any pixel is transformed to position coordinates in frame (B) using the equation

p^(c,r)_B = T^F_B T^U_F [c·w − v/2, 0, r·h, 1]^T,    (2)

where c ∈ N and r ∈ N are the c-th column and r-th row in the US image matrix, v is the width of the probe, and h and w are the height and width of any pixel, respectively.
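As an illustration of the pixel-to-base mapping, the chain of homogeneous transforms can be sketched in C++. The matrix layout, function names, and the pixel-frame convention (lateral axis centred on the probe width, depth along z) are our assumptions for this sketch, not the system's actual implementation:

```cpp
#include <array>
#include <cstddef>

// 4x4 homogeneous transform, row-major.
using Mat4 = std::array<std::array<double, 4>, 4>;
using Vec4 = std::array<double, 4>;

Mat4 identity() {
    Mat4 I{};
    for (std::size_t i = 0; i < 4; ++i) I[i][i] = 1.0;
    return I;
}

Vec4 apply(const Mat4& T, const Vec4& p) {
    Vec4 out{0, 0, 0, 0};
    for (std::size_t i = 0; i < 4; ++i)
        for (std::size_t j = 0; j < 4; ++j)
            out[i] += T[i][j] * p[j];
    return out;
}

Mat4 multiply(const Mat4& A, const Mat4& B) {
    Mat4 C{};
    for (std::size_t i = 0; i < 4; ++i)
        for (std::size_t j = 0; j < 4; ++j)
            for (std::size_t k = 0; k < 4; ++k)
                C[i][j] += A[i][k] * B[k][j];
    return C;
}

// Maps US image pixel (c, r) to a 3D point in the robot base frame (B):
// the pixel is first expressed in the image frame (x centred on the probe
// width v, z pointing into the tissue, homogeneous coordinate 1), then
// chained through the probe-to-flange and flange-to-base transforms.
// w, h: pixel width and height; v: probe width (all in metres).
Vec4 pixelToBase(const Mat4& T_B_F, const Mat4& T_F_U,
                 int c, int r, double w, double h, double v) {
    Vec4 p_I{c * w - v / 2.0, 0.0, r * h, 1.0};
    return apply(multiply(T_B_F, T_F_U), p_I);
}
```

With identity calibration transforms, a pixel in column 10 and row 5 with w = 1 mm, h = 2 mm, and a 10 mm probe lands 5 mm laterally and 10 mm deep, which matches the hand computation.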

B. PATH PLANNING
The goal of the path planning algorithm is to automatically generate a sequence of poses on the body surface that are used to image and reconstruct the underlying volume beneath this area. These poses are based on user-defined outer point limits of the scanning area. The developed path planning algorithm is described in Fig. 3. First, the 3D camera is used to capture a 3D-RGB point cloud of the anthropomorphic US phantom surface. The point cloud ({κ}) is used in the algorithm that autonomously plans a path that the robot carrying the US probe needs to follow when performing US scanning. Only positional data (without the colors) are used to define the point cloud ({κ}) in the (C) frame, comprising f points κ_j ∈ R^3 with j = 1, ..., f. Outer limits of the scanning area are defined by a set of points ({ι}), where ι_k ∈ R^3 and k = 1, ..., 4. The number of generated grid points (s) and the spacing between them depend on the distance between the limit points and the user-set variable (g), which is based on the US probe size. A set of grid points ({σ}), shown in Fig. 3(a), is generated such that a minimal number of points σ_i, with i = 1, ..., s, are linearly spaced, and g is the largest allowed distance between any two points. All the grid points σ_i lie on straight 3D lines between the points of the set ({ι}). Thus, they need to be projected to the actual surface to compensate for curvatures. Each grid point σ_i is projected along the direction of the normal (n) to the least-squares fitted grid plane. The point κ_j nearest to the ray from σ_i along n then becomes a surface path point ρ_i:

ρ_i = argmin_{κ_j ∈ {κ}} ‖(κ_j − σ_i) × n‖.    (3)

The orientation of the US probe (ϑ_i) is calculated for path point ρ_i using the surface points around it and the position of the next path point ρ_{i+1}. The direction of the z-axis unit vector for every path point ρ_i is calculated based on the local neighborhood of points in a radius of size g around it:

z_i = normal{ t ∈ {κ} : ‖t − ρ_i‖ ≤ g },    (4)

where t is any point in the point cloud ({κ}). Therefore, the value of the variable g, the maximal distance between neighboring path points, is also used for the calculation of the local surface normal. The normal to the surface for a set of points is estimated using the PCL function, denoted as normal{} in (4), which solves the eigenvalues and eigenvectors of a given covariance matrix and estimates the least-squares plane normal. A vector yy_i, which connects consecutive path points ρ_i and ρ_{i+1} and lies in the yz-plane, defines the scanning direction. It should be noted that for the last path point in every row, yy_i from the previous point is used. When the probe is returning on even row sweeps, yy_i := −yy_i, to avoid turning of the US probe. Unit vectors x_i and y_i for orientation ϑ_i are derived from z_i and yy_i, as shown in Fig. 3(c). Path point position ρ_i and orientation ϑ_i together define a transformation T^{A_i}_C that is transformed to frame (B) using (1), yielding a complete set of robot end-effector path poses in frame (B).
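The projection of a grid point onto the captured surface can be sketched as a brute-force search over the point cloud. The function names and the ray-distance criterion are illustrative assumptions; a real implementation would use PCL's accelerated nearest-neighbour structures rather than a linear scan:

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

using Vec3 = std::array<double, 3>;

static Vec3 sub(const Vec3& a, const Vec3& b) {
    return {a[0] - b[0], a[1] - b[1], a[2] - b[2]};
}
static Vec3 cross(const Vec3& a, const Vec3& b) {
    return {a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]};
}
static double norm(const Vec3& a) {
    return std::sqrt(a[0] * a[0] + a[1] * a[1] + a[2] * a[2]);
}

// Projects a grid point sigma onto the surface point cloud kappa along the
// grid-plane normal n: the cloud point with the smallest perpendicular
// distance to the ray through sigma becomes the surface path point rho_i.
// n is assumed to be unit length.
Vec3 projectToSurface(const Vec3& sigma, const Vec3& n,
                      const std::vector<Vec3>& kappa) {
    std::size_t best = 0;
    double bestDist = std::numeric_limits<double>::max();
    for (std::size_t j = 0; j < kappa.size(); ++j) {
        // Perpendicular distance from kappa[j] to the line sigma + t*n.
        double d = norm(cross(sub(kappa[j], sigma), n));
        if (d < bestDist) { bestDist = d; best = j; }
    }
    return kappa[best];
}
```

Selecting the cloud point closest to the ray, rather than simply the point closest to σ_i, keeps the projected path point directly "above" the grid point even on steeply curved surfaces.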

C. IMPEDANCE CONTROL FOR PROBE-TO-SURFACE CONTACT
The autonomous positioning of the RobUSt probe in contact with the body surface necessitates a control algorithm that adheres to a number of requirements. First of all, the contact forces involved in controlling the probe must be deemed safe. Second, at all times, the distal surface of the probe must remain completely in contact with the surface of the body for imaging. Finally, the positioning accuracy on a plane tangent to the surface of the body must be sufficient to ensure that the entire planned path is mapped. We fulfil these requirements by using a two-stage algorithm (Fig. 4) comprising a motion generator and a closed-loop impedance controller. As an input, the algorithm takes the Cartesian position vector p^{A_i}_B and the orientation quaternion q^{A_i}_B ∈ H of the desired robot pose. Operating continuously, the algorithm generates reference joint torques τ_r ∈ R^7 that are fed as an input to the low-level built-in controller of the robot.
The motion generator contains an internal model of the Panda robot, described by joint angles θ_m ∈ R^7. The generator uses an inverse kinematics scheme based on the damped Moore-Penrose pseudoinverse (†) [32] of the space robot Jacobian J_B ∈ R^{6×7},

J_B^† = J_B^T (J_B J_B^T + ψ I)^{-1},    (5)

where ψ ∈ R > 0 is the damping coefficient and I ∈ R^{6×6} is an identity matrix. The generator provides a continuous motion trajectory comprising intermediate US probe reference poses.

The impedance control (Fig. 4) takes place in the reference frame (U) of the US probe. We use the control law

τ_r = J_U^T (K_c e_c^U − K_d [ṗ_U^T, ω_U^T]^T) + τ_c,    (6)

where e_c^U ∈ R^6 is the error between the reference and current configuration of the robot expressed in frame (U), ṗ_U ∈ R^3 and ω_U ∈ R^3 are the linear and rotational velocities of T^U_B expressed in frame (U) [33], and τ_c ∈ R^7 are torques compensating for robot dynamics. These torques are computed using a dynamic model of the Panda robot provided by the manufacturer. We use the gain matrices K_c ∈ R^{6×6}_{≥0} and K_d ∈ R^{6×6}_{≥0} to define the stiffness and the damping of the force-position relation. The stiffness is defined as K_c = diag(K_x, K_y, K_z, K_i, K_j, K_k), while K_d is chosen such that the system is critically damped. The force-position relation provided by impedance control enables robot compliance in the presence of contact force.

FIGURE 4. A smooth trajectory is generated with an internal kinematic model, described by joint angles θ_m. This model is used in an inverse-kinematics motion generator based on the damped pseudo-inverse of the robot space Jacobian J_B. The generator minimizes the error e_m between the end-effector pose of the model and the path pose. The smooth trajectory is then used as an equilibrium reference in an impedance controller. The control input comprises reference torques (τ_r) for the low-level torque controllers of individual robot joints. The input is based on the actual displacement of the US probe from the equilibrium, simulating a spring-damper system with parameters defined by stiffness matrix K_c and damping K_d.
By introducing a constant offset (δz ∈ R^+) to the instantaneous position reference p_ref^U along the axis z_U, we use this compliance to achieve sustained probe-to-surface contact with a resulting normal force F_z^U ∈ R^+. The steady-state component of that force can be modelled as

F_z^U = K_z (δz − e_rz),    (7)

where e_rz ∈ R represents the error between the surface location registered using the point cloud and the actual one. As long as δz > e_rz, the probe will remain in contact with the body. By tuning the coefficient K_z, the range of contact forces can be selected. Additionally, we choose K_j such that the resulting torsional compliance compensates for uneven contact of the probe surface with the body [18]. The remaining coefficients of K_c, i.e., K_x, K_y ≫ K_z and K_i, K_k ≫ K_j, are set to achieve high stiffness, which improves the precision of the motion. The values for all coefficients are specified in Section III.
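The steady-state force model can be sketched as a small helper. The clamping to zero when contact is lost and the example gain value in the note below are our assumptions for illustration:

```cpp
#include <algorithm>

// Steady-state normal contact force of the impedance controller when the
// equilibrium pose is offset a constant dz below the registered surface:
// F_z = K_z * (dz - e_rz), where e_rz is the error between the
// point-cloud-registered surface and the true one. If the true surface
// lies deeper than the offset reaches (dz <= e_rz), the probe loses
// contact and the normal force is zero.
double steadyStateContactForce(double K_z, double dz, double e_rz) {
    return std::max(0.0, K_z * (dz - e_rz));
}
```

With an illustrative stiffness of K_z = 150 N/m and the offset δz = 40 mm used in the experiments, the model predicts a nominal contact force of about 6 N when the surface registration error is zero, which is consistent with the force range reported in Section III.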

D. ULTRASOUND VOLUME RECONSTRUCTION AND SEGMENTATION
The 3D camera and impedance control are used to realise autonomous US surface scanning for volume reconstruction and segmentation. During US scanning, 2D-intensity images are retrieved and inspected to detect potential loss of contact between the US probe and the body. In case of uncertainties (e.g., variation in tissue stiffness or body surface motion), part of the probe can lose contact with the surface, resulting in zero-value pixels in columns of the US image (Fig. 5(a)). The intensity check algorithm is developed to detect such columns and prevent the conversion of their pixels to 3D points in the reconstructed US volume. Redundant control using contact force and US image inspection is used to detect unexpected situations during US scanning. The acquired 2D US image and the corresponding transformation T^F_B of the robot flange are used with (2) to map all the valid pixels to the 3D points that represent the reconstructed US volume (Fig. 5(b)). The reconstructed US volume and the surface previously captured with the 3D camera are transformed to the robot base coordinate system using (1) and (2). The result is a multi-modal data set registered in frame (B), as shown in Fig. 5(c). The advantage of this data set over a standard US data set is that the visualisation of inner tissues is shown relative to the surface of the body.
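The intensity check for lost-contact columns can be sketched as follows; the image layout and function name are hypothetical choices for this sketch:

```cpp
#include <cstdint>
#include <vector>

// Flags US image columns whose pixels are all zero, indicating partial
// loss of probe-to-surface contact; such columns are excluded from the
// 3D reconstruction. The image is a row-major grayscale matrix of size
// rows x cols.
std::vector<bool> validColumns(const std::vector<std::uint8_t>& image,
                               int rows, int cols) {
    std::vector<bool> valid(cols, false);
    for (int c = 0; c < cols; ++c)
        for (int r = 0; r < rows; ++r)
            if (image[r * cols + c] != 0) { valid[c] = true; break; }
    return valid;
}
```

Only pixels in columns flagged as valid are then mapped to 3D points; columns that are entirely zero are skipped rather than producing spurious points in the volume.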
Segmentation is used to visualize different tissues within the reconstructed 3D US volume. The segmentation process starts with speckle reduction and intensity homogenization in 2D US images using the Edge-Preserving-Filter (EPF) implemented in OpenCV (v4.4.0). EPF avoids the edge shifting that characterizes simpler methods, such as Mean or Gaussian filters, and has a lower computational cost in comparison to other edge-preserving algorithms, like the bilateral texture filter and anisotropic diffusion [34]. Before EPF filtering, the image is transformed from grey-scale to RGB color representation. The filtering results are shown in Fig. 6(b). Following the image processing, the pixels are transformed to 3D space using (2). Due to anatomical variability and changes in the US machine settings, identical tissues can be observed differently in US images with regard to intensity, density, and speckle content. The K-means clustering method [35], implemented in the PCL (v1.9.1), is used to separate differently observed intensities of tissues in the reconstructed US volume. K-means clustering is an unsupervised learning method that identifies a set number of centroids in any given data set and then appoints every data point to the nearest cluster by reducing the in-cluster sum of squares. K-means is chosen for the segmentation task because it requires no prior knowledge and performs well in cases with large variability in the input data.

FIGURE 7. Positioning accuracy measurements are performed using a series of 5 mm diameter cylindrical shafts fitted inside the phantom filled with a gelatin-water mixture. Fabricated dimensions between the shafts are considered as the ground truth, since their accuracy is at least an order of magnitude higher than the measured accuracy. Shafts localized in US images are transformed to the robot base frame (B) using the corresponding robot poses and (2).

In our implementation, the complete
3D-RGB point cloud is used as the input for the K-means algorithm with three clusters. This results in clustering that is predominantly influenced by color values, while the impact of point position grows with the distance between clustered points. Segmentation of the neck area is shown in Fig. 6(d).
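A minimal version of the clustering step can be sketched as follows. The study uses the PCL implementation on 3D-RGB points, whereas this sketch clusters colors only, runs a fixed number of iterations, and takes the initial centroids as given; all of these are simplifying assumptions:

```cpp
#include <array>
#include <cstddef>
#include <vector>

using Color = std::array<double, 3>;  // RGB components of a cloud point

static double dist2(const Color& a, const Color& b) {
    double s = 0;
    for (int i = 0; i < 3; ++i) s += (a[i] - b[i]) * (a[i] - b[i]);
    return s;
}

// Minimal K-means over point colors: each point is assigned to the
// nearest centroid (squared Euclidean distance), then each centroid is
// recomputed as the mean of its assigned points; repeated for a fixed
// number of rounds. Returns one cluster label per point.
std::vector<int> kmeans(const std::vector<Color>& pts,
                        std::vector<Color> centroids, int iters = 20) {
    std::vector<int> label(pts.size(), 0);
    for (int it = 0; it < iters; ++it) {
        // Assignment step.
        for (std::size_t p = 0; p < pts.size(); ++p) {
            int best = 0;
            for (std::size_t k = 1; k < centroids.size(); ++k)
                if (dist2(pts[p], centroids[k]) < dist2(pts[p], centroids[best]))
                    best = static_cast<int>(k);
            label[p] = best;
        }
        // Update step: centroid = mean of its assigned points.
        std::vector<Color> sum(centroids.size(), Color{0, 0, 0});
        std::vector<int> cnt(centroids.size(), 0);
        for (std::size_t p = 0; p < pts.size(); ++p) {
            for (int i = 0; i < 3; ++i) sum[label[p]][i] += pts[p][i];
            ++cnt[label[p]];
        }
        for (std::size_t k = 0; k < centroids.size(); ++k)
            if (cnt[k] > 0)
                for (int i = 0; i < 3; ++i)
                    centroids[k][i] = sum[k][i] / cnt[k];
    }
    return label;
}
```

Extending the point type from three color channels to six dimensions (position plus color) recovers the behavior described above, where position influence grows with the spatial spread of the clusters.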

III. EXPERIMENTAL RESULTS
We validate the RobUSt system in a series of studies. First, we assess the performance of the robotic system by measuring the positioning accuracy and the contact forces during US scanning. Second, the segmented US volumes are evaluated based on their robustness to changes in US system parameters. The protocol for all the experiments involves aligning the robot end-effector with the attached 3D camera towards the surface of interest and capturing the point cloud. The outer limits of the scanning area are then selected interactively in the visualization window. Based on those inputs, the algorithm autonomously plans the motion trajectory. The robot moves the US probe with the motion controller while simultaneously capturing the US images, end-effector poses, and forces.
The positioning accuracy of the robot with the US probe is defined as the ability of the robot to localize 3D points in the gelatine phantom (Fig. 7). The positioning error between localised points is a cumulative error composed of robot positioning errors, robot-to-US probe calibration errors, US image reconstruction parameters, and in-image localization errors. Based on the localization of 20 points uniformly distributed inside the phantom, the mean Euclidean error is 1.19±0.76 mm. However, the authors expect larger positioning errors when scanning uneven surfaces, due to changes in the orientation of the robot end-effector [36]. The maximum distance between grid points σ_i is set to g = 30 mm, and each equilibrium point is translated for δz = 40 mm along the z-axis of orientation ϑ_i from every path point position ρ_i. The probe-to-surface forces are measured during scanning of four surfaces with different geometry on the static anthropomorphic US phantom. Measured forces relative to the z-axis of frame (U) are reported in Fig. 8. The experiment demonstrated the ability of RobUSt to keep stable contact with surfaces of different geometries. The mean normal (|F_z|) forces exerted on the surface are between 5.31 N and 6.93 N, which falls within the range advised for acquiring clear US images [13], [37].
Next, the experiment with the moving surface is performed. The probe position in frame (B) and the force in frame (U) are measured during path execution while the anthropomorphic US phantom is moved using a second robot with a cycle time of 3 s and a vertical position amplitude of 25 mm (Fig. 9). The experiment is performed 4 times on the chest and 2 times on the neck area, each with a duration of 22 s. The mean normal (|F_z|) force exerted on the surface is 6.63±2.18 N, with the overall lowest and highest forces measured to be 1.58 N and 11.86 N, respectively. This experiment demonstrates that probe-to-surface contact is consistent on a moving body.
The robustness of the segmentation algorithm is validated by changing the US transducer frequency (f), the dynamic range (DR, the ratio of the largest to the smallest signal), and the US image gain (K_u, the overall brightness) during three separate scanning experiments on the collum (neck) area of the static anthropomorphic US phantom, as shown in Fig. 10 (1a, 2a, and 3a). Changing the values of the US settings results in capturing US images that differ with regard to intensity, speckle content, and the level of tissue homogeneity. During robotic US scanning, images of size 256 × 460 pixels (38 × 70 mm) are retrieved and filtered with EPF. The parameters of the EPF algorithm are determined heuristically, with the size of the neighborhood set to 60 and the coefficient of the edge-preserving threshold set to 0.15. The reconstructed US volume is segmented using the algorithm described in Section II.D. A qualitative segmentation analysis is performed based on visual examination of the featured segmentation results shown in column (b) of Fig. 10. We conclude that, even with different US system parameters, the algorithm can differentiate between the jugular vein wall (green points), the liquid within the jugular vein (red points), and the tissue surrounding the vein (blue points). For the quantitative evaluation, the ground truth mask of the jugular vein is manually denoted in 2D images and the 3D ground truth volume is reconstructed. The pixels in the rectangular area around every 2D ground truth mask are compared to their corresponding segmented 3D points. As a measure of similarity between the masks, the Jaccard similarity coefficient, i.e., the Intersection over Union (IoU), is reported in column (d) of Fig. 10. In conclusion, the unsupervised learning algorithm demonstrated that it can robustly segment a variety of tissues with significant accuracy and no prior training data.
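The IoU evaluation follows directly from its definition; the mask representation and function name here are our choice for this sketch:

```cpp
#include <cstddef>
#include <vector>

// Jaccard similarity (Intersection over Union) between a ground-truth and
// a segmented binary mask of equal size: |A intersect B| / |A union B|.
// Returns 0 when both masks are empty.
double jaccard(const std::vector<bool>& truth, const std::vector<bool>& seg) {
    int inter = 0, uni = 0;
    for (std::size_t i = 0; i < truth.size(); ++i) {
        if (truth[i] && seg[i]) ++inter;
        if (truth[i] || seg[i]) ++uni;
    }
    return uni == 0 ? 0.0 : static_cast<double>(inter) / uni;
}
```

The same computation extends to the 3D case by flattening the voxelised ground-truth and segmentation volumes into linear masks before comparison.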

IV. CONCLUSION AND FUTURE WORK
In this study, we present a framework for RobUSt: a robotic US system with a high level of autonomy. RobUSt is capable of performing autonomous path planning that accounts for anthropomorphic surface geometry by utilizing the 3D camera data. The positioning accuracy measured using the gelatin phantom is 1.19±0.76 mm. The impedance control enables the positioning of the US probe while compensating for tissue deformation or surface movement. This is shown by measuring the probe-to-surface force during US scanning of a static body and a moving body, with 6.17±1.19 N and 6.63±2.18 N, respectively. Furthermore, the overall lowest measured force of 1.58 N demonstrates that constant probe-to-surface contact is kept. The acquired US images and corresponding probe poses are used for the 3D US reconstruction and registration of multi-modal images consisting of the US volume and the body surface point cloud. The resulting data is used for the segmentation of the jugular vein in three scenarios with varying US machine settings. Ground truth and segmentation volumes of the jugular vein are used to calculate the best-case IoU of 0.89. Using K-means clustering, we can reliably differentiate between the inside of the vein, the vein walls, and the surrounding tissue. In conclusion, with added task-specific functionalities, the RobUSt system framework could be utilized for autonomous medical examinations, clinical diagnosis, and automated surgical interventions.
One of the challenges observed when registering the surface point cloud to the US volume is the decreased accuracy between the two due to tissue deformations caused by contact with the US probe. To account for this in our future work, we plan to acquire 3D point clouds of the probe-surface contact area in real-time. Based on the obtained US images and measured forces, a model can be created that describes and predicts the internal tissue deformations. This approach will enable the estimation and reconstruction of the undeformed US volume and improve the registration accuracy with the surface point cloud. Furthermore, we will utilize the RobUSt framework in specific applications, e.g., endovascular interventions, in clinically relevant scenarios.