Breast Cancer Patient Auto-Setup Using Residual Neural Network for CT-Guided Therapy

Patient setup influences the outcome of radiation therapy for breast cancer, so improving the accuracy of tumor target localization is vital. In this study, we focus on breast patient setup and develop an accurate deep-learning-based tumor localization method for radiation therapy. The proposed method uses a double residual neural network model to achieve precise and efficient tumor localization. The model first localizes the breast and then detects the landmarks inside the localized region. After model training, we use an iterative filtering scheme to compute a transformation of the daily CT so that its gray value distribution matches that of the training images. The final landmark positions are obtained after the iteration, and the translation errors in the daily CT are determined from the detected landmarks. We evaluated the proposed method on digital CT phantom images and real patient CT images. The results of the breast patient setup were clinically acceptable: the mean ± standard deviation setup errors were 0.64 ± 1.40 mm, 0.15 ± 1.28 mm, and −0.46 ± 1.17 mm in the anterior-posterior, left-right, and superior-inferior directions, respectively. In conclusion, the proposed patient setup method is accurate and is a promising alternative for marker-free breast auto-setup.


I. INTRODUCTION
Research has demonstrated that increasing the radiation dose delivered to the breast cancer improves disease control, especially for patients with advanced disease. Image segmentation and localization techniques improve the quality of the treatment [1]-[8]. However, with current tumor localization methods, increasing the dose delivered to the breast or the heart results in a higher dose to the surrounding normal tissues. The position of the breast cannot be identified accurately across treatment fractions because of the random movement of the internal organs. To account for this uncertainty, an extensive safety margin is often added around the clinical target volume to ensure that the entire tumor always receives the expected daily radiation dose.
The associate editor coordinating the review of this manuscript and approving it for publication was Mohammad Zia Ur Rahman.
Many researchers have studied accurate tumor localization in radiation therapy. Among all tumor localization methods, X-ray imaging is the most widely used. X-ray guided tumor localization methods can be divided into two major groups: two-dimensional (2D) X-ray projection and 3D CBCT imaging. The limitation of 2D X-ray projection is that the correlation between the bony anatomy and the position of the breast tumor is weak, because independent inter-fractional motion exists between the two [9]. The limitations of the 3D CBCT image are (a) the inherently low image contrast of the breast tumor and (b) substantial daily changes in breast tumor position together with some tumor deformation [10]. The manual intervention required to correct the resulting registration errors is therefore a barrier to breast tumor localization.
In this paper, we seek an accurate translation parameter for the patient setup to achieve breast cancer targeting. Unlike registration approaches based on image intensity, we establish a patient-specific deep learning model to achieve breast auto-setup. The setup errors between the planning CT (pCT) and the daily CT are obtained by aligning the corresponding breast landmarks output by the deep learning model [8], [11]-[13]. The method addresses the drawbacks of traditional methods through the following contributions. 1) Instead of requiring a large patient dataset, the training data are generated using a patient-specific data augmentation strategy, which accounts for the changes in image content between the daily CT and the pCT. We develop a new network architecture, named the double residual network, for landmark detection on the breast image. Specifically, the double residual network consists of two deep residual convolutional neural networks (Resnet), each focusing on one specific task. In the first step, a Resnet-based regression model localizes the breast. In the second step, to detect the landmarks precisely, we enlarge the breast image within the region located in the first step and feed the enlarged image into another Resnet model. Importantly, thanks to the deep feature extraction of the Resnet, the proposed network can detect the landmarks accurately even when no fiducial markers (FMs) are implanted. In addition, the double residual network can detect a large number of landmarks simultaneously in real time because of parallel computation. 2) We propose an iterative prediction strategy for predicting (testing) the landmark positions on the daily CT image, which makes the most of the constraint on the relative distance between landmarks.
The iterative prediction allows us to incorporate prior knowledge into the landmark prediction more effectively [14]-[16]. In this sense, the breast setup is no longer a conventional image registration problem; rather, it becomes a breast detection/localization problem. This approach offers a different perspective on patient auto-setup in daily CT guided radiotherapy and provides new insight into organ localization [17]-[22].
In summary, the major contributions of our work are threefold: 1) We propose a patient-specific model for tumor localization that does not require a large patient dataset, which reduces the human labor of data annotation. 2) We propose a new network architecture based on the residual network. Thanks to the deep feature extraction of the Resnet, the proposed network can detect the landmarks accurately even when no FMs are implanted. 3) We propose an iterative prediction strategy for predicting (testing) the landmark positions on the daily CT image, which makes the most of the constraint on the relative distance between landmarks. The paper is organized as follows. In section 2, we provide a patient-specific pCT augmentation model to generate the training data and describe a double residual network suitable for breast auto-setup. In section 3, we demonstrate the technical feasibility of the proposed framework by testing 270 pCT cases and 80 clinical breast daily CT cases. Section 4 summarizes and discusses the results. Section 5 concludes the paper.

A. TRAINING DATA GENERATION
In this section, we address the problem of limited training data for detecting landmarks on breast CT images. As a typical procedure in breast external beam radiation therapy, the breast contours are first manually outlined on the pCT image. The landmarks can therefore be selected from the breast contours as the label data. Without other patients' pCT images, there is only one pair of training data. We therefore enlarge the dataset using image augmentation, a technique commonly used with deep neural networks [23]. In this study, a patient-specific image augmentation model for the pCT is proposed to generate a large amount of training data without relying on other patients' image data. To account for the different positions of the patient during treatment setup, the proposed augmentation model first applies random geometric transformations to the pCT image. Unlike conventional augmentation techniques, and to improve the testing accuracy of landmark detection on the daily CT image, we also develop a random image transformation that accounts for changes in image modality and content. The random image transformation does not require any parameter learning, so it can easily be applied to other convolutional neural network (CNN) regression tasks. The details of the random geometric and image transformations are explained as follows.

1) RANDOM GEOMETRIC TRANSFORMATIONS
The most popular current practice for data augmentation is to apply geometric transformations to the image. In this paper, the geometric transformations include rotation, translation, and rescaling. The rotation angles are randomly chosen from −9° to 9° in the yaw, pitch, and roll directions. The translation distances are randomly generated from −60 mm to 60 mm in the Anterior-Posterior (AP), Left-Right (LR), and Superior-Inferior (SI) directions. The scaling factor is randomly chosen from 0.8 to 1.2. The selected parameter ranges reflect the possible geometric changes in the patient breast setup.
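The parameter ranges above can be sketched as a small sampler. This is a minimal illustration rather than the authors' implementation: the text only says the values are chosen randomly, so uniform sampling, a single isotropic scaling factor, and the name `sample_geometric_params` are all assumptions made here.

```python
import random

# Ranges quoted in the text; the uniform distribution is an assumption.
ROTATION_RANGE_DEG = (-9.0, 9.0)      # yaw, pitch, roll
TRANSLATION_RANGE_MM = (-60.0, 60.0)  # AP, LR, SI
SCALE_RANGE = (0.8, 1.2)              # one isotropic factor (assumption)


def sample_geometric_params(rng):
    """Draw one random rotation / translation / scaling parameter set."""
    return {
        "rotation_deg": {
            axis: rng.uniform(*ROTATION_RANGE_DEG)
            for axis in ("yaw", "pitch", "roll")
        },
        "translation_mm": {
            axis: rng.uniform(*TRANSLATION_RANGE_MM)
            for axis in ("AP", "LR", "SI")
        },
        "scale": rng.uniform(*SCALE_RANGE),
    }


# Hypothetical usage: one parameter set per augmented training image.
rng = random.Random(0)
params = sample_geometric_params(rng)
```

Because the same transformation is applied to the image and to its landmark labels, each sampled parameter set also defines the ground-truth landmark positions of the augmented image.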

2) RANDOM IMAGE TRANSFORMATIONS
The image transformations include random Gaussian filtering and random erasing, and they are applied with a certain probability during training. For an image $I$ in a mini-batch, the probability that it undergoes the random image transformation is $p$, and the probability that it is kept unchanged is $1-p$. For Gaussian filtering, the standard deviation $\sigma$ of the Gaussian distribution is selected randomly in the range between the minimum $\sigma_l$ and the maximum $\sigma_u$. After that, random erasing is applied to the image [24]. A cube region $I_r$ is randomly selected to be erased, and the voxel values in the erased region are set to zero. Suppose the training image has $l \times w \times h$ voxels. We randomly initialize the edge length $r$ of the erasing cube in the range between the minimum $r_l$ and the maximum $r_u$. Then, we randomly initialize the point $P = (x_r, y_r, z_r)$ in $I$. We set the region $I_r = I(x_r, y_r, z_r, x_r + r, y_r + r, z_r + r)$ as the selected cube region if $I_r$ does not cover the breast $C$ ($I_r \cap C = \emptyset$). Otherwise, the above process is repeated until an appropriate $I_r$ is selected. The parameter settings and the procedure of the random image transformation are summarized in Table 1.
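The rejection step of the random erasing can be sketched as follows. This is a simplified illustration under stated assumptions: the breast $C$ is approximated by an axis-aligned bounding box, the cube is kept inside the image, and a retry limit replaces the open-ended repetition. The function names and the numeric ranges in the usage lines are hypothetical, since the values of Table 1 are not reproduced in the text.

```python
import random


def boxes_overlap(a, b):
    """True if two axis-aligned boxes (x0, y0, z0, x1, y1, z1) intersect."""
    return all(a[i] < b[i + 3] and b[i] < a[i + 3] for i in range(3))


def sample_erase_cube(shape, breast_box, r_min, r_max, rng, max_tries=1000):
    """Rejection-sample a cube inside the image that avoids the breast box."""
    for _ in range(max_tries):
        r = rng.randint(r_min, r_max)              # edge length in voxels
        if any(r > s for s in shape):
            continue
        corner = [rng.randint(0, s - r) for s in shape]
        cube = (*corner, *(c + r for c in corner))
        if not boxes_overlap(cube, breast_box):    # I_r and C are disjoint
            return cube
    return None  # no valid cube found; the caller may skip erasing


def random_image_transform_params(shape, breast_box, p, sigma_range,
                                  r_range, rng):
    """With probability p draw (sigma, erase cube); otherwise return None."""
    if rng.random() >= p:
        return None                                # image kept unchanged
    sigma = rng.uniform(*sigma_range)
    cube = sample_erase_cube(shape, breast_box, r_range[0], r_range[1], rng)
    return {"sigma": sigma, "erase_cube": cube}


# Hypothetical usage with made-up ranges.
rng = random.Random(0)
breast_box = (20, 20, 20, 40, 40, 40)
params = random_image_transform_params(
    (64, 64, 64), breast_box, 0.5, (0.5, 2.0), (4, 10), rng)
```

The sketch only draws the parameters; applying the Gaussian filter and zeroing the voxels inside the cube would be done by whichever image library holds the CT volume.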

B. THE ARCHITECTURE OF THE DOUBLE RESIDUAL NETWORK
In this section, we focus on accurately detecting the landmarks on the breast CT image using the training data generated as described in the previous section. As shown in Fig. 1, we develop a double residual network framework to detect the landmarks on the breast image. The task of the first-step residual network regression model is to find the 3D coordinates of the breast center and thereby localize the breast. The second step estimates the 3D positions of the landmarks on the breast, as shown in Fig. 1(b). The following sub-sections describe the architecture of the double residual network framework in detail.

1) FIRST STEP: LOCALIZING THE BREAST IN CT IMAGE
We use the local pCT image as the input data. In the first step, the entire pCT is cropped to a region of interest (ROI), as shown in Fig. 1(b). The breast lies inside the ROI because an initial setup correction is applied to the patient, using the marks on the patient's skin or the fixation device, before the daily CT is scanned. Therefore, the breast does not move far from its planning position during treatment setup. In the training stage, the label of the breast center is calculated by averaging the 3D coordinates of all the selected landmarks on the breast.
As shown in Fig. 1(a), the 1st step of the double residual network predicts the coordinates of the breast center. If the input and output of a block have the same dimensions, as indicated by the solid arrows, the identity shortcut is used directly. If the dimensions increase, the projection shortcut indicated by the dotted arrows is used to match the dimensions by means of a convolutional layer. A detailed explanation of the shortcuts can be found in Ref. [25].
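The difference between the identity and projection shortcuts can be illustrated with a toy residual block. This sketch replaces the 3D convolutions of the actual network with small dense layers on plain Python lists, purely to show where the shortcut enters; the function names and weight matrices are hypothetical.

```python
def matvec(weights, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def relu(x):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in x]


def residual_block(x, w1, w2, w_proj=None):
    """Compute relu(F(x) + shortcut(x)) with F = w2 * relu(w1 * x).

    w_proj is None  -> identity shortcut (input and output sizes match).
    w_proj is given -> projection shortcut that maps x to the output size.
    """
    residual = matvec(w2, relu(matvec(w1, x)))
    shortcut = x if w_proj is None else matvec(w_proj, x)
    if len(shortcut) != len(residual):
        raise ValueError("shortcut and residual sizes differ; use w_proj")
    return relu([a + b for a, b in zip(residual, shortcut)])


# Identity shortcut: 2 -> 2 features.
same = residual_block([1.0, -2.0],
                      [[1, 0], [0, 1]],
                      [[0, 0], [0, 0]])

# Projection shortcut: 2 -> 3 features.
wider = residual_block([1.0, 2.0],
                       [[1, 0], [0, 1], [1, 1]],
                       [[0, 0, 0], [0, 0, 0], [0, 0, 0]],
                       w_proj=[[1, 0], [0, 1], [1, 1]])
```

With the residual branch set to zero, as in these two calls, the block output is determined by the shortcut alone, which is what lets deep stacks of such blocks start from a near-identity mapping.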

2) SECOND STEP: DETECTING THE LANDMARKS IN THE LOCAL CT IMAGE
Given the 3D coordinates output by the 1st step, we can extract a 3D image that covers the breast, as shown in Fig. 1(a). The edge length of the extracted 3D image equals that of the planning target volume. The task of the 2nd step is to detect the landmarks in the extracted 3D image. In other words, given the extracted 3D image, the network in the 2nd step formulates a non-linear mapping that outputs the 3D coordinates of the landmarks on the breast [26]. To improve the accuracy of the landmark detection, we enlarge the voxel matrix of the 3D breast image by resampling it to a smaller physical voxel size before feeding it into the non-linear mapping, as seen in Fig. 1(b). The architecture of the non-linear mapping in the 2nd step can be similar to that of the 1st step, taking 3D images as input and producing 3D coordinates as output.
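The geometry that links the two steps can be sketched with a few helper functions. This is an illustration under assumptions: coordinates are in millimetres, the extracted region is a cube centred on the step-1 output, and the step-2 labels are expressed relative to the corner of that cube. The function names and the numbers in the usage lines are hypothetical.

```python
def breast_center(landmarks_mm):
    """Step-1 training label: mean of the landmark coordinates."""
    n = len(landmarks_mm)
    return tuple(sum(p[k] for p in landmarks_mm) / n for k in range(3))


def crop_origin(center_mm, edge_mm):
    """Lower corner (mm) of a cube of side edge_mm centred on center_mm."""
    return tuple(c - edge_mm / 2.0 for c in center_mm)


def enlarged_matrix_size(edge_mm, voxel_mm):
    """Voxels per side after resampling the cube to voxel_mm spacing."""
    return round(edge_mm / voxel_mm)


def to_crop_coords(landmarks_mm, origin_mm):
    """Landmarks relative to the crop origin (assumed step-2 label form)."""
    return [tuple(p[k] - origin_mm[k] for k in range(3))
            for p in landmarks_mm]


# Hypothetical usage: two landmarks and a 40 mm cube.
landmarks = [(10.0, 20.0, 30.0), (30.0, 40.0, 50.0)]
center = breast_center(landmarks)
origin = crop_origin(center, 40.0)
coarse = enlarged_matrix_size(40.0, 2.0)   # original voxel spacing
fine = enlarged_matrix_size(40.0, 1.0)     # smaller voxels, larger matrix
labels = to_crop_coords(landmarks, origin)
```

Halving the physical voxel size doubles the matrix size along each axis, which is the sense in which the breast image is enlarged before the second network sees it.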

3) PARAMETER SELECTION AND NETWORK TRAINING
The double residual network is a regression model that takes the cropped pCT images as input. The input is first passed through a 7 × 7 × 7 convolutional layer, followed by 8 residual blocks. The number of filters is doubled every two residual blocks, as shown in Fig. 1. The filter size in the residual blocks is n × n × n. Because the performance may be influenced by the number and the size of the filters, we evaluate the double residual network with different parameters in the discussion. Training the double residual network amounts to finding a non-linear mapping that detects the landmarks on the CT image. All the weights and biases in the network are estimated by optimizing a loss function. The training in the 1st step can be written as

$$\Theta_1^* = \arg\min_{\Theta_1} \mathcal{L}\left(f_1(\Theta_1, X_{p1}), Y_{p1}\right) \tag{1}$$

where $f_1$ is the architecture of the Resnet with the initial weights and biases $\Theta_1$, $X_{p1}$ is the input training set with labels $Y_{p1}$, and $\mathcal{L}$ is the mean square error (MSE). After the training in the 1st step, the 3D coordinates of the breast center $Y_{pred1} = f_1(\Theta_1^*, X_{p1})$ are output. The extracted 3D images $X_{p2}$ and their labels $Y_{p2}$ are obtained using the output $Y_{pred1}$ of the 1st step. The training in the 2nd step can be written as

$$\Theta_2^* = \arg\min_{\Theta_2} \mathcal{L}\left(f_2(\Theta_2, X_{p2}), Y_{p2}\right) \tag{2}$$

where $f_2$ is the architecture of the Resnet with the initial weights and biases $\Theta_2$, $X_{p2}$ is the input training set, and $\mathcal{L}$ is the MSE.
We set the initial learning rate at $10^{-2}$.
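For reference, the MSE loss used in both training steps, and the RMSE reported later in the discussion, can be computed as in the short sketch below. Averaging over every coordinate of every sample is an assumption; the text does not state the normalization, and the function names are hypothetical.

```python
import math


def mse(pred, target):
    """Mean squared error over all coordinates of all samples."""
    total, count = 0.0, 0
    for p, t in zip(pred, target):
        for a, b in zip(p, t):
            total += (a - b) ** 2
            count += 1
    return total / count


def rmse(pred, target):
    """Root mean squared error, in the same unit as the coordinates."""
    return math.sqrt(mse(pred, target))


# Hypothetical usage: one predicted breast center against its label (mm).
loss = mse([(0.0, 0.0, 0.0)], [(1.0, 2.0, 2.0)])
```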

C. PREDICTING (TESTING) ON THE DAILY CT IMAGE
Because the image modality differs between the pCT and the daily CT, the results are not accurate if the prediction is carried out directly on the daily CT image. To improve the prediction accuracy, an image transformation is applied to the daily CT image so that its gray value distribution is as close as possible to that of the training images. However, the prediction results are sensitive to the parameters of the image transformation. To address this problem, we propose an iterative prediction strategy to find a preferable parameter for the daily CT image transformation. Instead of directly predicting the landmarks on the daily CT image, we incorporate the relationship between the landmarks into a constraint that maintains the shape of the predicted landmarks on the daily CT image. Because the relative displacement of voxels inside the breast has been reported not to exceed 3 mm [27]-[29], the relative distance between predicted landmarks should not be far from the corresponding distance on the pCT image. With this prior knowledge from the pCT image, the landmarks in the predicted image are no longer independent. For a given daily CT image, we carry out the image transformation described in Algorithm 1. Instead of setting the parameters randomly, in this section the image transformation parameter $t = [\sigma, r, x_r, y_r, z_r]$ is determined by a proposed objective function. To find a preferable $t^*$ among the feasible solutions that maintain the shape of the predicted landmarks on the daily CT image, the objective function is given by

$$t^* = \arg\min_{t} \left\| D_{pred2}^{t} - D_p \right\|_2 \tag{3}$$

where $D_{pred2}^{t}$ and $D_p$ are the average values of the relative distance between landmarks in the daily CT and the reference pCT image, respectively. Both are computed from the landmark positions: $y_{pred2,t}(i)$ is the $i$-th predicted daily CT landmark output by the double residual network when the image transformation with parameter $t$ is applied, $y_p(i)$ is the $i$-th reference pCT landmark, and $N$ is the number of landmarks. Equation 3 is non-linear and non-differentiable because it involves the double residual network, so gradient-based optimization algorithms such as Newton's method and gradient descent cannot be applied. In this work, the objective function is solved using the mesh adaptive direct search algorithm, which belongs to the class of derivative-free direct search methods. The details of the algorithm are given in Refs. [30] and [31] and are not repeated here. Starting from the initial parameter setting $t_0$, the iterative prediction process updates $t$ in each iteration. If the difference in relative landmark distance between the daily CT and the reference pCT image satisfies $\| D_{pred2}^{t} - D_p \|_2 \geq 3$, the iteration continues using Eq. 3. Otherwise, the preferable $t^*$ is returned. The process also stops when the maximum number of iterations is reached.
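The stopping logic of the iterative prediction can be sketched as below. Two points are assumptions rather than the authors' method: the shape measure is taken to be the mean pairwise Euclidean distance between landmarks, and a plain coordinate pattern search with step halving stands in for the mesh adaptive direct search algorithm of Refs. [30] and [31]. The callable `predict` is hypothetical; it represents transforming the daily CT with parameter t and running the double residual network.

```python
import itertools
import math


def mean_pairwise_distance(landmarks):
    """Average Euclidean distance over all landmark pairs (assumed D)."""
    pairs = list(itertools.combinations(landmarks, 2))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


def iterative_prediction(predict, t0, d_ref, step=1.0, tol=3.0, max_iter=50):
    """Derivative-free search for t that keeps the landmark shape near d_ref.

    Stops when |D_pred(t) - d_ref| < tol or after max_iter iterations.
    """
    def cost(t):
        return abs(mean_pairwise_distance(predict(t)) - d_ref)

    t, best = list(t0), cost(t0)
    for _ in range(max_iter):
        if best < tol:              # shape constraint met: return t*
            break
        improved = False
        for k in range(len(t)):     # poll +/- step along each parameter
            for delta in (step, -step):
                cand = list(t)
                cand[k] += delta
                c = cost(cand)
                if c < best:
                    t, best, improved = cand, c, True
        if not improved:
            step /= 2.0             # refine the step when no poll point helps
    return t, best


# Toy stand-in for the network: the predicted shape is distorted unless t = 2.
base = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 10.0, 0.0)]


def toy_predict(t):
    scale = 1.0 + 0.5 * abs(t[0] - 2.0)
    return [tuple(scale * c for c in p) for p in base]


d_p = mean_pairwise_distance(base)
t_star, residual = iterative_prediction(toy_predict, [10.0], d_p)
```

In the toy run the search walks the single parameter from 10 towards 2 and stops as soon as the predicted landmark shape is within the 3 mm tolerance of the reference shape.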

D. EVALUATION OF THE ALGORITHM PERFORMANCE

1) DATASET
In this paper, the daily CT images are acquired at different times during the setup of the patient for treatment. To evaluate the proposed method quantitatively without relying on FMs, the reference translational parameters are determined by an experienced physician. To train and verify the proposed marker-free breast auto-setup, all the CT images are divided into three sets: the training set, the validation set, and the testing set. In the training stage, the number of landmarks is 10. The training and validation sets contain 900 and 100 pCT images, respectively. These data are augmented from the patient-specific pCT image using the method described in section 2. The testing set has two parts, the testing digital CT set and the testing daily CT set, which are explained as follows.

2) EVALUATION OF THE TESTING DIGITAL CT SET
The testing digital CT set consists of 270 digital CT (dCT) cases generated using the augmentation model. Because the images and the existing landmarks are transformed with known transformation matrices, the ground truth of the landmarks on the testing digital CT images is available, and the ground truth of the translational parameters can therefore be calculated. We can thus assess the accuracy of the translational parameters on the testing digital CT set quantitatively by comparing the output of the proposed method with the ground truth.
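How a translation can be recovered from corresponding landmarks, and compared with a known ground truth, is sketched below. It assumes a translation-only alignment, for which the least-squares solution is the mean displacement between corresponding landmarks; whether the authors compute it exactly this way is not stated, and the function names are hypothetical.

```python
def estimate_translation(ref_landmarks, detected_landmarks):
    """Least-squares translation from reference to detected landmarks.

    For a translation-only model this is the mean displacement.
    """
    n = len(ref_landmarks)
    return tuple(
        sum(d[k] - r[k] for r, d in zip(ref_landmarks, detected_landmarks)) / n
        for k in range(3)
    )


def translation_error(estimated, ground_truth):
    """Per-axis difference between estimated and true translation (mm)."""
    return tuple(e - g for e, g in zip(estimated, ground_truth))


# Hypothetical usage: landmarks shifted by a known amount, as in a dCT case.
ref = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 10.0, 0.0)]
true_shift = (2.0, -3.0, 5.0)
detected = [tuple(c + s for c, s in zip(p, true_shift)) for p in ref]
shift = estimate_translation(ref, detected)
error = translation_error(shift, true_shift)
```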

3) EVALUATION OF THE TESTING DAILY CT SET
The testing daily CT set consists of 80 daily CT cases, which were augmented from 20 daily CT scans of the patient acquired at different times. Different translation parameters are applied in the augmentation of the testing daily CT. Evaluating the translational parameters on the testing daily CT set is difficult because no ground truth is available. To address this problem, an experienced physician positioned the patient, and the resulting translational parameters were used as the ground truth.

A. THE TESTING DIGITAL CT STUDIES
The proposed method is first evaluated on the testing digital CT images. The quantitative analysis of the translational parameter errors is shown in figure 2. Figure 2(a) plots the translation errors between the ground truth and the proposed method in 3D.
The frequency histograms in figure 2(b) indicate the ensemble mean and standard deviation (SD) of the distribution of the translation errors. Figure 2(c) shows the 2D correlations between the proposed method and the reference. Rows 1 to 3 show the analysis of the translation error in the AP, LR, and SI directions, respectively.
We can make the following observations. First, the ensemble mean and SD of the error distribution are 0.06 mm (SD 0.87 mm), −0.27 mm (SD 0.70 mm), and 0.16 mm (SD 0.56 mm) in the AP, LR, and SI directions, respectively. Second, all the Pearson correlation coefficients are very close to one, which implies a positive linear correlation between the ground truth and the output of the proposed method. Third, most of the translational errors are less than 2 mm, and the largest translational errors are less than 4 mm. The results are found to be clinically acceptable.
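The summary statistics quoted above can be computed per direction as in the following sketch. The use of the sample standard deviation is an assumption, the function names are hypothetical, and the numbers in the usage lines are made up for illustration rather than taken from the study.

```python
import math
import statistics


def error_summary(errors_mm):
    """Ensemble mean and sample standard deviation of setup errors."""
    return statistics.mean(errors_mm), statistics.stdev(errors_mm)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fraction_within(errors_mm, tol_mm):
    """Fraction of cases whose absolute error is below tol_mm."""
    return sum(abs(e) < tol_mm for e in errors_mm) / len(errors_mm)


# Hypothetical usage for one direction (values are illustrative only).
reference = [-5.0, 0.0, 5.0, 10.0]
predicted = [-4.0, -1.0, 5.5, 9.5]
errors = [p - r for p, r in zip(predicted, reference)]
mean_err, sd_err = error_summary(errors)
r_value = pearson_r(reference, predicted)
share_under_2mm = fraction_within(errors, 2.0)
```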

B. THE TESTING DAILY CT STUDIES
We now present the results of the proposed method on the testing daily CT images. As mentioned in section 2, the testing daily CT set consists of 80 daily CT cases, which were augmented from 20 daily CT scans of the patient acquired at different times.
The proposed method is evaluated on the testing daily CT images in the same way as on the testing digital CT images. The quantitative analysis of the translational parameter errors is shown in figure 3. In figure 3(a), most of the translation errors are less than 2.5 mm, and the largest translational error is less than 4 mm. Although the errors on the testing daily CT images are larger than those on the testing digital CT images, the results are still found to be clinically acceptable. Figure 3(b) indicates the ensemble mean and standard deviation (SD) of the distribution of the translation errors, which are 0.44 mm (SD 1.64 mm), 0.14 mm (SD 1.34 mm), and −0.33 mm (SD 1.50 mm) in the AP, LR, and SI directions, respectively. Figure 3(c) shows the 2D correlations between the proposed method and the reference; all the Pearson correlation coefficients are again very close to one. We also compared our results with a recently published work [32], which reported errors below 1 mm in 89.1% of the daily CT cases; the proposed method improves this to 89.4% of the daily CT cases.

IV. DISCUSSION
The accuracy of the landmark detection directly influences the auto-setup. Therefore, we focus on improving the performance of the double residual network. In this section, we first explain the differences between the double residual network and a conventional network, and then analyze the influence of the parameters.

A. IMPACT OF THE NETWORK ARCHITECTURE
There are at least two differences between the proposed double residual network method and previous regression-based methods. First, instead of using conventional stacked layers, we adopt a regression model based on the Resnet. The RMSE of the proposed method reaches 2 mm, which is better than that of the network without residual blocks. Second, instead of detecting the landmarks directly, the proposed two-step network improves the accuracy. The experimental results are shown in figure 4(b). At the final epoch, the RMSE of the proposed method is smaller than that of the method with only one step.

B. IMPACT OF THE FILTER NUMBER AND THE FILTER SIZE
The filter number and the filter size of the double residual network need to be optimized. Although the proposed method may converge more slowly before the 10th epoch, its final RMSE is lower. Comparing the performance with different filter numbers, we find that the RMSE is lowest when k = 4. A possible reason is that, as the depth of the network increases, the network can extract deeper features of the image. However, as the filter number increases, more computer memory is required. Therefore, given the available computer configuration, we set k = 4. Figure 5(b) shows the influence of the filter size.

C. IMPACT OF THE ITERATIVE PREDICTION STRATEGY
As mentioned in section 2, we propose an iterative prediction strategy to find a preferable parameter for the daily CT image transformation. To evaluate the impact of this strategy on the testing daily CT images, we add an experiment in which the landmarks are predicted directly on the daily CT image. The proposed method achieves lower setup errors on the testing daily CT images than the method without iterative prediction.

D. LIMITATIONS AND FUTURE WORK
First, the residual network is built from 3D neural network layers. The training process is therefore time consuming, taking 21 hours for one case. In the future, we will focus on improving the efficiency of the training step by down-sampling the training images or applying other preprocessing techniques. Second, only 20 testing daily CT cases were evaluated in this study. In future work, we will test more cases to further assess the robustness of the algorithm. The proposed method is not limited to breast setup and could also be applied to other sites such as the spine, brain, and liver. Third, image artifacts such as scatter, ring, and metal artifacts will hamper the accuracy of the setup [33]-[44].

V. CONCLUSION
The marker-free breast setup was formulated as a landmark alignment problem, and a patient-specific image augmentation model was introduced to account for the changes in image content between the daily CT and the pCT. Instead of requiring a large patient dataset, the approach allows the network to be trained with limited data. In addition, the new network architecture can detect the landmarks simultaneously and quickly. Furthermore, an iterative prediction strategy that determines the parameters during treatment setup was proposed. In this sense, the breast setup is no longer a conventional image registration problem; rather, it becomes a breast detection/localization problem. This approach offers a different perspective on patient auto-setup in daily CT guided radiotherapy and provides new insight into image-guided radiation therapy. The results suggest that the method has strong potential to replace the current image registration process and to enable fully automatic, high-precision image-guided radiation therapy without manual intervention.