Suturing Support by Human Cooperative Robot Control Using Deep Learning

Considering the widespread use of surgical robots in recent years, computer-assisted surgery is becoming significantly popular. Because the automation of surgical tasks without human intervention remains complex owing to individual patient differences, a human cooperative control is proposed in this study. A system in which an operator manipulates one surgical instrument to insert a suture needle was developed, along with another surgical instrument that automatically pulls out the needle from the operated instrument. In the proposed method, YOLOv3 and a standard convolutional neural network (CNN) are used to estimate the penetration and pull state of the needle. An image-based state estimator classifies the state regardless of the stiffness of the object to which the suture needle is inserted. Furthermore, after the pull state is detected, despite a failure of the needle pulling, the position can be corrected and the automated surgical instrument can approach the needle again. Experiments on human cooperative control demonstrated the effectiveness of state estimation using the proposed method. In addition, the failure of grasping observed in a previous study caused by the needle angle error was reduced.


I. INTRODUCTION
Endoscopic surgery, wherein only a few small incisions are required to insert surgical instruments into the body, has recently become popular. Postoperative esthetics and rapid recovery are of significant benefit to patients; however, the minimally invasive operation is more complex than open surgery. For example, endoscopic surgery involves pivoting the surgical instrument around its insertion point in a narrow working space within the body. The instruments are also longer than those used in open surgery, so the physician requires training to operate them precisely. Surgical robots have been developed to solve the problems of endoscopic surgery [1], [2]. Surgical robots have eliminated complex operations around the insertion point and introduced hand tremor suppression and motion scaling features to ensure accurate surgery. Intuitive Surgical's da Vinci is the most popular surgical support robot and is used for abdominal, pelvic, and thoracic surgeries [3], [4]. Most surgical robots are master-slave robots, and the surgeon continuously controls the position of the surgical instruments while viewing endoscopic images. Therefore, the automation of complicated tasks during surgery using precise positioning control of the robot would lead to efficient surgery.
The associate editor coordinating the review of this manuscript and approving it for publication was Xiwang Dong.
Murali et al. automated surgical cutting and debridement subtasks through "Learning By Observation" with a surgical support robot [5]. However, the color of the suture needle and the shape of the forceps were different from those of real surgical tools. Khoorjestan et al. proposed an automatic suture device that achieves greater strength than a hand-sutured specimen [6]. However, it is difficult to use this open-space device in minimally invasive surgery. Several systems have been developed to automate suture tasks in minimally invasive surgery [7], [8]. Shademan et al. verified the effectiveness of suturing task automation by experiments using pigs [9]. However, automation of surgical tasks without human intervention is still difficult due to individual differences in human bodies.
VOLUME 8, 2020 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Watanabe et al. proposed a single-master dual-slave system in which manually manipulated forceps insert a suture needle and autonomously operated forceps pull it out [10]. This previously proposed method shortens the time of the entire task and demonstrates that semi-autonomous assistance to the operator is useful. However, a few issues remain. The method uses the product of the external force estimated at the tip of the forceps and the angle of the roll axis of the holder robot to detect needle penetration. Penetration is not detected when the needle is inserted into an organ softer than the one used to define the threshold. For penetration into a rigid organ, the threshold is exceeded prior to penetration. Furthermore, because the method cannot detect whether the suture needle was pulled, the automated slave conducts a hand-off motion without the needle when it fails to pull out the needle.
Image processing is effective for autonomous surgery and several automation methods are based on it. Stefan et al. verified the phase estimation of surgical videos from images using machine-learning technology and demonstrated its usefulness [11]. Compared to our previous methods, which only use information regarding kinematics and force, image-processing-based methods can detect needle penetration despite the varying stiffness of organs. Furthermore, image processing can be used to detect a failure to pull the needle and to retry the pull-out operation. For efficient and accurate needle state estimation, the following two-step image processing is effective: first tracking the needle and cropping the image around it, and then estimating the state using the cropped image. Speidel et al. detected suture needles by color-based and geometry-based segmentation, and Chen et al. detected suture needles using a random forest model connected with a particle filter [12], [13]. The previous needle detection methods include sequential processing and template matching, and are computationally expensive. Considering the burden on the patient, it is desirable to perform image processing quickly so that the needle state can be used in a feedback control system.
In this study, a two-step image processing method is proposed using two different deep neural networks. A laparoscopic image around the suture needle is cropped using a convolutional neural network (CNN), and then another CNN estimates the state of the needle. The proposed method can be applied to organs of varied stiffness because the penetration and pull states are determined only by the endoscope image, in contrast to the previous method [10], which was insufficient for managing stiffness variation. In this study, YOLOv3 was used to detect a suture needle, and a standard CNN was used to estimate the state from the image area of the suture needle. YOLOv3 is a deep-learning-based object detection algorithm that processes images at high speed [14].
The remainder of this paper is organized as follows. Section II presents the devices used in this study. Section III presents the details of the state estimator using deep learning and human cooperative control. Section IV presents the experimental results regarding the learning of state estimators and the automation of suturing tasks using human cooperative control. Section V presents the improved functions compared with the previous study. Finally, the conclusions are presented in Section VI.

A. SLAVE MANIPULATOR
The slave manipulator used in this study is shown in Figure 1(a). It consists of a holder robot developed by Tadano et al. [15] and forceps developed by Haraguchi et al. [16]. The holder robot has four joints, yaw $q_1$, pitch $q_2$, linear motion $q_3$, and roll $q_4$, and operates as shown in Figure 1(a). The holder robot was designed such that the yaw and pitch axes of the holder and the axis of the forceps intersect at a fixed point O, and the forceps rotate around this fixed point. The tip of the forceps mounted on the holder robot is shown in Figure 1(b). The forceps provide a mechanism for bending the flexible joint by the push-pull of two opposing nickel titanium wires. The bending postures $\phi_1$ and $\phi_2$ of the tip were controlled using four wires arranged at equal intervals. The opening and closing movements of the gripper attached to the tip are used to grasp organs and suture needles.

B. MASTER DEVICE
The 3D Systems PHANTOM Desktop used as the master device in this study is shown in Figure 1(c). The master device is an input interface for the position and posture necessary for the control of the slave manipulator. An input interface attached to the distal end of the master device controls the opening and closing of the gripper at the tip of the forceps. The operator manipulates the gripper input interface with a finger. The master device also has a triaxial force presentation function to feed back the force estimated by the forceps to the operator.

C. ENDOSCOPE
A stereoscopic endoscope (Olympus ENDOEYE FLEX3D) was used in this study. The endoscope outputs 680 px × 540 px images at 30 Hz and offers two modes: a three-dimensional image mode and a two-dimensional image mode. Because the scope of this study is to confirm the effectiveness of the proposed method, the two-dimensional image mode was selected to keep the implementation of the suturing task state estimation simple.

III. METHOD
A. HUMAN COOPERATIVE CONTROL
A flowchart of the human cooperative control and the corresponding endoscopic image are shown in Figure 2. The system consists of an endoscope, Slave A1 operated by the master device, and Slave A2 for autonomous assistance. The state estimator with image processing is used as a switch between the Auto and Pause modes of Slave A2. To assist the suturing task, three states were defined: non-penetration, penetration, and pull. While Slave A1 grasps the needle, the state is non-penetration or penetration depending on whether the needle has penetrated the object; when Slave A2 grasps the needle, the state is pull. The suturing task is divided into four steps, and processing is performed based on the output of the state estimator. The processing at each step is as follows:
Step 1: The operator manually operates Slave A1 and inserts a needle into the training kit. For simplicity, Slave A1 already holds the needle at the beginning of the task. At this time, the state is defined as "non-penetration". While the state estimator continues to output non-penetration, Slave A2 is in Pause mode and remains at its initial position.
Step 2: The target position and posture of Slave A2 are calculated to pull out the suture needle, and the pulling operation of the suture needle is controlled. Slave A1 and Slave A2 are mechanically fixed and their positional relationship is known; the orientations of their bases are the same. The target position of Slave A2 is first given in the Slave A1 coordinate system considering the needle shape, and then transformed to the Slave A2 coordinate system using an offset vector. However, the needle may not be held in an ideal manner because it rotates freely around the grasping position of the gripper of Slave A1. To solve this problem, an algorithm was added in which Slave A2 corrects the target position, taking the rotation of the needle into account, when the pull-out of the suture needle fails. Figure 3 presents the geometry of the needle and the assumed needle rotation. Slave A2 first moves to position $P_1$, assuming that the needle is ideally grasped. Then, if the first attempt fails, the position is corrected in the order of $P_2$, $P_3$. The target positions $P_1$, $P_2$, and $P_3$ of Slave A2 are calculated in the coordinate system of the tip of Slave A1 from $P_{A1}$, the tip position of Slave A1; $d$, the diameter of the semicircular arcuate needle shown in Figure 3; and $\theta$, the rotation angle of the suture needle assumed in this study. The target posture of Slave A2 is calculated from $R_z(30)$, a matrix that rotates about the z-axis shown in Figure 3 by 30°, and $R_{A1}$, the posture of Slave A1. In this study, Slave A2 is rotated by 30° around the z-axis to make the posture suitable for grasping the suture needle. The modified target positions $P_2$ and $P_3$ depend on the needle rotation angle $\theta$, which was set to 20° in this study. Under this setting, if the rotation is within 20°, then the needle will be within the graspable range of the gripper within three approaches.
When the target position and posture of Slave A2 have been calculated, Slave A2 shifts to automatic control and pulls out the needle. Slave A2 conducts the pulling operation with the gripper closed, and the state estimator confirms the success or failure of the needle pull-out. When the state estimator does not detect a "pull" after the pulling operation, Slave A2 repeats the pulling operation with a corrected target position, as shown in Figure 3. While the state estimator continues to output penetration, Slave A2 is in Auto mode.
Step 3: The pull of the needle is detected, and Slave A2 is controlled to hand off the suture needle to Slave A1. Slave A2 hands off the needle at a position offset by 20 mm in the y direction from the position at which the state estimator detected penetration. While the state estimator outputs pull, Slave A2 is in Auto mode and moves to the needle transfer position.
Step 4: The operator manipulates Slave A1 to grasp the needle held by Slave A2. After Slave A2 moves to the needle transfer position, it shifts to Pause mode until Slave A1 grasps the needle. When the operator manually operates Slave A1 and grasps the needle, Slave A2 releases the needle and moves to the initial position. After this operation is completed, the process returns to Step 1.
The correspondence between the steps of automation, the mode of Slave A2, and the output of the state estimator is shown in Table 1.
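The step logic described above can be sketched as a small lookup table plus a retry loop. This is an illustrative sketch only, not the authors' implementation: the names `Step`, `SLAVE_A2_MODE`, and `pull_with_retries` are hypothetical, the mode of each step is read from the step descriptions above, and the target positions are passed in as opaque values because their formulas are not reproduced here.

```python
from enum import Enum
from typing import Callable, Optional, Sequence


class Step(Enum):
    INSERT = 1    # operator inserts the needle with Slave A1
    PULL_OUT = 2  # Slave A2 approaches and pulls out the needle
    MOVE = 3      # Slave A2 carries the needle to the transfer position
    HAND_OFF = 4  # Slave A2 waits until Slave A1 grasps the needle


# Mode of Slave A2 in each step, as read from the step descriptions.
SLAVE_A2_MODE = {
    Step.INSERT: "Pause",
    Step.PULL_OUT: "Auto",
    Step.MOVE: "Auto",
    Step.HAND_OFF: "Pause",
}


def pull_with_retries(
    targets: Sequence[object],
    attempt_pull: Callable[[object], None],
    pull_detected: Callable[[], bool],
) -> Optional[int]:
    """Try each target position in order (e.g. P1, P2, P3).

    Returns the 1-based index of the approach after which the state
    estimator reported "pull", or None if every approach failed.
    `attempt_pull` and `pull_detected` are hypothetical callbacks that
    stand in for the robot motion and the state estimator.
    """
    for attempt, target in enumerate(targets, start=1):
        attempt_pull(target)
        if pull_detected():
            return attempt
    return None
```

With three targets, a return value of `None` corresponds to a trial in which the pull was not confirmed after three approaches.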

B. STATE ESTIMATION
State estimation is one of the most important factors in achieving task automation because it is used as the switch of the automated control. In this study, a method to estimate the task state is proposed, as shown in Figure 4. The suturing task was performed using the suture training kit by KOTOBUKI Medical Inc., as shown in Figure 4. Two deep neural networks were combined: one for needle detection and the other for task state estimation. YOLOv3 and a standard CNN were used to estimate the three states (non-penetration, penetration, and pull) required for human cooperative control. YOLOv3 is a deep-learning-based method that detects a target object in an image and outputs its bounding box at high speed [14]. The proposed method uses YOLOv3 to obtain the bounding box of the needle from the 680 px × 540 px image and resizes the cropped region to 256 px × 256 px for input to the CNN. The resized image is annotated with one of the labels [non-penetration, penetration, pull], and the CNN classifies it into these three classes.
Images recorded while a suturing task was executed with the master-slave device in Figure 1 were used to train the state estimator. A total of 15,505 images were created for learning, and the ratio of the number of images in each state was approximately [non-penetration : penetration : pull] = [1 : 1 : 1]. For YOLOv3, 1,471 images, obtained by sampling every 10th image and excluding images in which the needle was not visible, were manually annotated. YOLOv3 was trained with a single class, the suture needle. Object detection was then conducted on all the images using the trained YOLOv3, and images in which the suture needle could not be detected were removed. The suture needle was detected in 13,386 images, and the learning of state estimation was performed using these images.
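The crop-and-resize step between the two networks can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions: the function name `crop_and_resize`, the list-of-rows image representation, the `(x_min, y_min, x_max, y_max)` box format, and nearest-neighbor interpolation are all choices made for this sketch, since the text does not specify how the resizing was implemented.

```python
from typing import List, Tuple

Image = List[List[int]]  # assumed representation: rows of pixel values


def crop_and_resize(
    image: Image, box: Tuple[int, int, int, int], size: int = 256
) -> Image:
    """Crop box = (x_min, y_min, x_max, y_max) and resize to size x size.

    Nearest-neighbor sampling is an assumption of this sketch.
    """
    height, width = len(image), len(image[0])
    # Clamp the box so a detection at the image border cannot index out of range.
    x0 = max(0, min(box[0], width - 1))
    y0 = max(0, min(box[1], height - 1))
    x1 = max(x0 + 1, min(box[2], width))
    y1 = max(y0 + 1, min(box[3], height))
    crop_w, crop_h = x1 - x0, y1 - y0
    # Map every output pixel back to its nearest source pixel inside the crop.
    return [
        [image[y0 + (r * crop_h) // size][x0 + (c * crop_w) // size]
         for c in range(size)]
        for r in range(size)
    ]


# Usage: a synthetic 680 x 540 frame and a hypothetical needle bounding box.
frame = [[row * 1000 + col for col in range(680)] for row in range(540)]
patch = crop_and_resize(frame, (100, 50, 228, 178))
```

Because the bounding box is generally not square, resizing to a fixed square input changes the aspect ratio of the needle region; whether the original system compensated for this is not stated.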
The structure of the constructed CNN is shown in Table 2. The CNN for state estimation receives the detected image of the suture needle resized to 256 px × 256 px and classifies it as a [non-penetration, penetration, or pull] state. The weights and biases of the CNN were initialized using the method of He et al. [17]. All convolution layers use the ReLU activation function, and the network was optimized using Adam [18], [19]. The feature maps extracted by the convolution layers are flattened into a vector and classified into the three states using the softmax function. Because this study focuses on human collaboration and high processing speed, a standard CNN was designed rather than using AlexNet or ResNet, which achieve high scores in image classification [20], [21]. The training data for the CNN were the 13,386 images in which the suture needle was detected by YOLOv3. To avoid control errors due to misdetection by the state estimator, the state transition was programmed to occur only when penetration or pull was estimated 20 times consecutively. In addition, the transition to pull was permitted only from penetration.
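The transition rule in the preceding paragraph can be sketched as a small filter placed between the CNN output and the controller. This is a hedged sketch, not the authors' code: the class name `StateFilter` and its methods are hypothetical, and two details are assumptions of the sketch rather than statements from the text, namely that penetration is accepted only from non-penetration and that the filter is reset explicitly when the task returns to Step 1.

```python
from typing import Optional


class StateFilter:
    """Accept a new state only after `required` consecutive identical estimates."""

    def __init__(self, required: int = 20) -> None:
        self.required = required
        self.state = "non-penetration"
        self._candidate: Optional[str] = None
        self._count = 0

    def update(self, estimate: str) -> str:
        """Feed one per-frame estimate and return the filtered state."""
        # Pull may only follow penetration (from the text); penetration may
        # only follow non-penetration (assumption of this sketch).
        allowed = (
            (self.state == "non-penetration" and estimate == "penetration")
            or (self.state == "penetration" and estimate == "pull")
        )
        if not allowed:
            # Any other estimate breaks the current streak.
            self._candidate, self._count = None, 0
            return self.state
        if estimate == self._candidate:
            self._count += 1
        else:
            self._candidate, self._count = estimate, 1
        if self._count >= self.required:
            self.state = estimate
            self._candidate, self._count = None, 0
        return self.state

    def reset(self) -> None:
        """Return to non-penetration, e.g. when the task goes back to Step 1."""
        self.state = "non-penetration"
        self._candidate, self._count = None, 0
```

A single contradicting frame resets the streak, so an isolated misclassification delays a transition but cannot trigger one on its own.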

A. HUMAN COOPERATIVE EXPERIMENTS
Verification experiments of human cooperative control in the suturing task were performed using the state estimator. The operator manipulates a single Slave A1 through a single master. A trial that completed Step 4 was considered a success. A trial in which Slave A2 failed to grip the needle and Step 4 was not reached was considered a failure. The initial condition of the task was that the system is in Step 1 and Slave A1 is grasping the needle. The termination condition was that the system is in Step 4 and Slave A1 grasps the needle held by Slave A2. We conducted 20 trials. The proposed method estimates the state of the suturing task at a sampling rate of approximately 10 Hz. The experimental results are presented in Table 3. Of the 20 trials, 12 succeeded on the first approach to the suture needle. Four trials succeeded on the second approach and two on the third. In the remaining two trials, the pull of the suture needle was not confirmed after three approaches.
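The counts reported above can be turned into summary rates with a few lines of arithmetic. The sketch below uses only numbers stated in the text; the variable names are illustrative, and the confirmation delay is a rough estimate that assumes the 20-consecutive-estimate rule from Section III-B is applied at the approximate 10 Hz estimation rate.

```python
trials = 20
successes_by_approach = {1: 12, 2: 4, 3: 2}  # approach number -> successful trials
failures = 2

successes = sum(successes_by_approach.values())
assert successes + failures == trials

success_rate = successes / trials
first_approach_rate = successes_by_approach[1] / trials
# Average number of approaches needed in the successful trials.
mean_approaches = (
    sum(k * n for k, n in successes_by_approach.items()) / successes
)

# Rough confirmation delay of the 20-consecutive-estimate rule,
# assuming estimates arrive at the approximate rate of 10 Hz.
estimator_rate_hz = 10.0
required_consecutive = 20
confirmation_delay_s = required_consecutive / estimator_rate_hz

print(f"overall success rate: {success_rate:.0%}")        # 90%
print(f"first-approach success: {first_approach_rate:.0%}")  # 60%
print(f"mean approaches per success: {mean_approaches:.2f}")  # 1.44
print(f"confirmation delay: about {confirmation_delay_s:.0f} s")  # about 2 s
```

Under these assumptions, each confirmed transition takes on the order of two seconds, which is one cost of suppressing isolated misclassifications.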

B. STATE ESTIMATION
The accuracy of needle detection by YOLOv3 and the accuracy of state estimation by the CNN are shown in Table 4, which presents both accuracies for each step. The accuracy of state estimation was verified only for images in which the suture needle was detected by YOLOv3. Because Steps 3 and 4 were not performed in the two trials in which the suturing task failed, the accuracies of Steps 3 and 4 are averages over the 18 successful trials.

A. HUMAN COOPERATIVE EXPERIMENTS
The results of the human cooperative experiments for the suturing task demonstrated that the proposed method detected the penetration and pull of the needle and automated part of the suturing task. Image-based state estimation is useful because the state is estimated regardless of the stiffness of the object to be sutured: the penetration or pull of the needle is detected from the image features around the needle. When the needle posture deviates by less than 20° from the ideal posture, the proposed method succeeds by the second or third grasping motion. Note that these trials would have failed using the previously proposed method [10], because that method assumed an ideal needle posture and had no means of detecting a pulling failure. The image-based state estimator is therefore a significant advantage of the proposed method, because it allows the suturing task to succeed even when the needle tip is not in the ideal position.
However, the experimental results recorded 2 failures in 20 trials. The reason for these failures was that the needle was rotated by more than the expected angle of 20°. In addition, there was a case in which Slave A2 grasped the simulated organ by mistake; considering the damage to a patient's organ, it is desirable for the needle to be pulled out accurately in a single grasping operation. These concerns may be resolved by measuring the tip position of the suture needle. Image-based estimation of needle posture is possible, as Kurose et al. demonstrated using a 3D model of a suture needle and template matching [22]. Furthermore, the use of information on the external force acting on the forceps will lead to safe human cooperative robot control.
In this study, when Slave A2 was controlled automatically, the control signal commanded a linear movement to the target position. However, in actual surgery, such a simple linear motion might cause undesirable contact between an organ and Slave A2. Therefore, for clinical application, it is desirable to develop a method for generating trajectories that avoid organs. The development of a trajectory generation system that uses YOLOv3 to detect organs and avoid undesirable contact during movement is planned for future studies.

B. STATE ESTIMATION
False detection by the state estimator in Steps 1 and 2 leads to the failure of the suturing task, because human cooperative robot control is performed based on the results of state estimation in these steps. Because Steps 3 and 4 execute automated control regardless of the estimated state, erroneous state detection in these steps does not cause the suturing task to fail. Table 4 shows that the state estimation accuracy in Steps 1 and 2 is high. The effect of isolated false detections within a step was removed by changing the state only after the same state had been estimated for 20 consecutive frames.
Examples of state estimation errors are shown in Figure 5. Although Figure 5(a) is an example of the pull state, the state estimator indicated non-penetration in Step 4. The error is considered to have occurred because the learning data did not include images in which both slaves are located near the needle. However, the state estimation error that occurred in Step 4 does not affect human cooperative control because Slave A2 is in Pause mode. An example of the state estimation error that occurred in Step 1 is shown in Figure 5(b). Figure 5(c) presents non-penetration, whereas the state estimator indicates penetration. Because a state estimation error in Step 1 can trigger the transition to Step 2 before the actual penetration, an improvement in accuracy is required. Expanding the learning data would be effective in improving the estimation accuracy. By automatically annotating data using previous needle detection methods, it is possible to expand the learning data without manual annotation [12], [13], [22]. In this study, the proposed state estimation method was validated using a suturing training kit. However, in actual surgery, the background contains complicated shapes and colors, such as organs; thus, the proposed method is planned to be validated in vivo in the future. In addition, tissue mechanics is an essential problem in achieving the suturing task in a real surgical situation [23]. Therefore, kinematic conditions for robot motion generation along with conditions for tissue mechanics will be added as the next step.

VI. CONCLUSION
In this study, a human cooperative suturing task control system was developed using a state estimator that combines YOLOv3 and a standard CNN. Because the proposed method estimates the penetration state from the image alone, the estimate does not depend on the stiffness of the sutured object. The effectiveness of the proposed method was confirmed in cases with an error in the position of the suture needle, in which the robot using the method of a previous study [10] failed to grasp the needle. The following will be considered in future studies. By adding an algorithm that estimates the three-dimensional posture of the suture needle to the proposed method, the needle will be pulled out in a single attempt regardless of the posture error. Because the accuracy of suture needle detection and state estimation for suturing tasks performed in an unlearned environment is insufficient, the expansion of learning data using existing suture needle detection methods is planned to improve the estimation accuracy. For clinical application, a trajectory generation method that detects organs and avoids undesirable contact during movement will be developed. In addition, tissue mechanics conditions will be added to the robot motion generation as the next step.
TAKUTO MIKADA received the B.S. degree in mechanical engineering from the Department of Systems Design Engineering, Faculty of Engineering Science, Akita University, Akita, Japan, in 2018. He is currently pursuing the master's degree in health sciences and biomedical engineering track with Tokyo Medical and Dental University, Tokyo, Japan.
His research interests include medical robotics, machine learning, and control engineering.