Introduction
Prostate cancer (PCa) is the most common solid-organ malignancy and is among the most common causes of cancer-related death among men in the United States [1]. Multi-parametric MRI (mpMRI) is the most widely available non-invasive and sensitive tool for the detection of clinically significant PCa (csPCa), approximately 70% and 30% of which is located in the peripheral zone (PZ) and transition zone (TZ), respectively [2], [3]. Clinical reporting of mpMRI relies on a qualitative, expert-consensus-based structured reporting scheme, the Prostate Imaging-Reporting and Data System (PI-RADS). Because csPCa lesions have different primary imaging features in each zone, interpretation is based primarily on diffusion-weighted imaging (DWI) in the PZ and T2-weighted (T2w) imaging in the TZ [2], [3].
Accurate segmentation of PZ and TZ within the 3T mpMRI is essential for localization and staging of csPCa to enable MR targeted biopsy and guide and plan further therapy such as radiation, surgery, and focal ablation [4]. Segmentation of the prostate zones on mpMRI is typically done manually, which can be time-consuming and sensitive to readers’ experience, resulting in significant intra- and inter-reader variability [5]. Automated segmentation of prostatic zones (ASPZ) is reproducible and beneficial for consistent location assignment of PCa lesions [6]. ASPZ also enables automated quantitative imaging feature extraction related to prostate zones and can be used as a pre-processing step to improve the computer-aided diagnosis (CAD) of PCa [7].
ASPZ was first addressed with atlas-based methods [8]. Later, Zabihollahy et al. [9] proposed a U-Net-based method for ASPZ. Clark et al. [10] developed a staged deep learning architecture, which incorporated a classification stage into U-Net, to segment the whole prostate gland and TZ. However, U-Net-based segmentation sometimes resulted in inconsistent performance because the anatomic structure of the prostate can be less distinguishable, and the boundaries between PZ and TZ may distort semantic features [5]. Liu et al. [5] recently improved the encoder of the U-Net by using a residual neural network, ResNet50 [11], followed by feature pyramid attention to help capture information at multiple scales. Furthermore, Rundo et al. [12] proposed an attentive deep learning network for ASPZ by incorporating squeeze-and-excitation (SE) blocks into U-Net. SE adaptively recalibrates channel-wise features, potentially reducing inconsistencies in segmentation performance.
Moreover, segmentation outcomes from ASPZ are typically deterministic; there is a lack of knowledge on the confidence of the model [13]. Providing uncertainties of the model can improve the overall segmentation workflow since it easily allows refining uncertain cases by human experts [13]. The uncertainty can be estimated by the Bayesian deep learning model, which not only produces predictions but also provides the uncertainty estimations for each pixel. This can be done by adopting probability distributions of weights rather than the deterministic weights of the model.
In this study, we propose an ASPZ with an estimation of pixel-wise uncertainties using a spatial attentive Bayesian deep learning network. Different from Rundo et al. [12], we adopt a spatial attentive module (SAM), which models the long-range spatial dependencies between PZ and TZ by calculating the pixel-level response from the image [14]. The proposed model incorporates four sub-networks: SAM, an improved ResNet50 with dropout, a multi-scale feature pyramid attention module (MFPA) [5], and a decoder. The SAM forces the entire network to focus on specific regions that have more abundant semantic information related to the prostatic zones. We use the improved ResNet50 to handle the heterogeneous prostate anatomy with semantic features. The MFPA is designed to enhance multi-scale feature capturing. Finally, the spatial resolution is recovered by the decoder. We also implement the Bayesian model by training the proposed model with dropout and drawing Monte Carlo (MC) samples of the predictions during inference, inspired by prior work by Gal and Ghahramani [15]. Dropout can be regarded as sampling the model weights with Bernoulli random variables [15].
We evaluate the proposed model's performance using internal and external testing datasets and compare it with previously developed ASPZ methods. The segmentation performance on the two MRI datasets is compared to investigate any discrepancy between them. The importance of each individual module within the proposed method is also examined. Finally, the overall prostate zonal segmentation uncertainties at the apex, middle, and base slices are computed to illustrate the uncertainty of segmentation at different positions of the prostate.
Materials
This study was carried out in compliance with the United States Health Insurance Portability and Accountability Act (HIPAA) of 1996 with approval by the local institutional review board (IRB). The MRI datasets were acquired from two sources. For model development (n = 259) and internal testing (n = 45; internal testing dataset (ITD)), we used The Cancer Imaging Archive (TCIA) data from the SPIE-AAPM-NCI PROSTATEx challenge [16]. For independent model testing, we used an external testing dataset (ETD) (n = 47; age 45 to 73 years and weight 68 to 113 kg) retrieved from our tertiary academic medical center. For the ETD, we collated pre-operative MRI scans acquired between October 2017 and December 2018 on one of three 3T MRI scanners (Skyra (n = 39), Prisma (n = 1), and Vida (n = 7); Siemens Healthineers, Erlangen, Germany).
For both the ITD and ETD, PZ and TZ were contoured using OsiriX (Pixmeo SARL, Bernex, Switzerland) by MRI research fellows. Two genitourinary radiologists (10-19 years of post-fellowship experience, each having interpreted over 10,000 prostate MRI studies) then cross-checked the contours. The axial T2 TSE (turbo spin-echo) MRI sequence was used for segmentation in both datasets (Table 1). Prior to training and testing, all images in both datasets were normalized to the interval [0, 1] and resampled to a common in-plane resolution.
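The preprocessing described above can be sketched as follows. This is a minimal NumPy/SciPy illustration, not the paper's implementation; the function name `preprocess_slice` and the target in-plane spacing are hypothetical placeholders, and min-max scaling is assumed for the [0, 1] normalization.

```python
import numpy as np
from scipy.ndimage import zoom

def preprocess_slice(img, spacing, target_spacing=0.5):
    """Min-max normalize a 2D slice to [0, 1] and resample it to a common
    in-plane resolution.

    spacing / target_spacing are in mm per pixel; the 0.5 mm default is an
    illustrative placeholder, not the paper's setting.
    """
    img = img.astype(np.float64)
    rng = img.max() - img.min()
    norm = (img - img.min()) / rng if rng > 0 else np.zeros_like(img)
    # A zoom factor > 1 upsamples when the source spacing is coarser
    # than the target spacing.
    factor = spacing / target_spacing
    return zoom(norm, factor, order=1)
```

For example, a 4x4 slice with 1.0 mm spacing resampled to 0.5 mm spacing becomes an 8x8 slice with values still in [0, 1].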
Methods
A. Proposed Model for Automatic Prostatic Zonal Segmentation
The overall workflow of the proposed network is shown in Figure 1, which consists of four sub-networks. By joining the four sub-networks together, a fully end-to-end prostatic zonal segmentation workflow was formed. Both PZ and TZ segmentations were done simultaneously using a single network.
The overall workflow of the proposed model. The input is a 2D T2w MRI slice, and the outputs are a segmentation mask containing the PZ and TZ segmentation result (gray and white indicate PZ and TZ, respectively) and a pixel-wise uncertainty map (yellow pixels indicate large uncertainty and blue pixels indicate low uncertainty). The network consists of four sub-networks: (a) spatial attention module (SAM), (b) improved ResNet50, (c) multi-scale feature pyramid attention (MFPA), and (d) decoder.
1) Spatial Attentive Module (SAM)
Inspired by Wang et al. [14], the SAM was designed to make the network intelligently attend to the regions that carried more semantic features associated with PZ and TZ (Figure 1.a).
Inside the images, there existed spatial dependencies between PZ and TZ pixels. For instance, TZ was always surrounded by PZ at the bottom of the prostate, the urinary bladder region was always above PZ and TZ in the image, and TZ was usually in the image center. SAM helped the network model such spatially dependent information through global features. Specifically, the response at each pixel was computed by considering all the pixels in the image. Higher priorities were then adaptively assigned to pixels with more informative semantic features.
Detailed processes regarding spatial attention are shown in the bottom left of Figure 1. After passing through a convolution layer and reshaping, three kinds of vectors were generated from the input $x$: a query vector $\alpha(x)$, a key vector $\beta(x)$, and a value vector $g(x)$. The attentive output $y$ was computed as \begin{equation*} y=\mathrm {softmax}\left ({\alpha \left ({x }\right)^{T} \ast \beta \left ({x }\right) }\right) \ast g\left ({x }\right)\tag{1}\end{equation*}
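As an illustration only, Eq. (1) can be sketched with plain matrix operations. NumPy matrices stand in for the 1x1 convolutions that produce the query, key, and value embeddings; the weight matrices `Wq`, `Wk`, and `Wv` and the function names are hypothetical, not the paper's implementation.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def spatial_attention(x, Wq, Wk, Wv):
    """Sketch of Eq. (1): y = softmax(alpha(x)^T * beta(x)) * g(x).

    x:  (C, N) feature map flattened over N = H*W pixels.
    Wq, Wk, Wv: stand-ins for the 1x1-convolution weights producing the
    query alpha(x), key beta(x), and value g(x) embeddings.
    """
    q = Wq @ x                        # (Cq, N) query embedding
    k = Wk @ x                        # (Cq, N) key embedding
    v = Wv @ x                        # (C,  N) value embedding
    # (N, N) map: response of every pixel computed against all pixels.
    attn = softmax(q.T @ k, axis=-1)
    return v @ attn.T                 # (C, N) attended features
```

Each row of the attention map sums to one, so every output pixel is a convex combination of the value features of all pixels, which is how the long-range spatial dependencies are modeled.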
2) Improved ResNet50 With Dropout
The improved ResNet50 (shown in Figure 1.b) served as the backbone of the network. ResNet50 was improved in three steps, following Liu et al. [5]. First, the initial max-pooling layer was removed, since it was shown to compromise segmentation performance. Second, the stride-one bottleneck block used as the first block of the 4th layer was replaced with a regular block, and a dilated bottleneck served as the second block of the 4th layer to minimize the loss of spatial information. Finally, a dropout layer was inserted after each block of the improved ResNet50 to transform the network into a Bayesian neural network [17].
3) Multi-Scaled Feature Pyramid Attention (MFPA)
The feature pyramid attention (FPA) module (shown in the bottom right of Figure 1) was applied after each layer of ResNet50 to help capture features at multiple scales. The feature maps after each FPA were then upsampled to the same size and concatenated in the decoder.
4) Decoder
The decoder (Figure 1.d) was used to recover the spatial resolution of the feature maps. In the decoder, the concatenated multi-scale features from 3) were upsampled to restore the original spatial resolution, yielding the final segmentation output.
B. Uncertainty Estimation for Prostate Zonal Segmentation
Figure 1 shows the uncertainty estimation workflow of the proposed method. Monte Carlo dropout [15] served as the method for approximate inference.
Usually, the posterior distribution over the network weights is intractable. MC dropout approximates it by keeping dropout active during inference and drawing T stochastic forward passes with sampled weights $w_{t}$. The pixel-wise uncertainty over the C classes was then quantified by the entropy of the averaged softmax predictions: \begin{equation*} -\sum \limits _{c=1}^{C} \left ({\frac {1}{T} \sum \limits _{t=1}^{T} {p\left ({{y=c}\vert {x,w_{t}}}\right)} }\right)\mathrm {log}\left ({\frac {1}{T}\sum \limits _{t=1}^{T} {p\left ({{y=c}\vert {x,w_{t}}}\right)} }\right)\tag{2}\end{equation*}
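A minimal sketch of the predictive-entropy computation in Eq. (2), assuming the T stochastic softmax outputs have already been collected; the array shapes, the `eps` safeguard, and the function name are illustrative, not the paper's implementation.

```python
import numpy as np

def predictive_entropy(probs):
    """Eq. (2): entropy of the mean of T Monte Carlo dropout softmax outputs.

    probs: (T, C, H, W) class probabilities from T stochastic forward passes.
    Returns an (H, W) pixel-wise uncertainty map.
    """
    mean_p = probs.mean(axis=0)   # (C, H, W) MC-averaged class probabilities
    eps = 1e-12                   # numerical safety for log(0)
    return -(mean_p * np.log(mean_p + eps)).sum(axis=0)
```

When every forward pass puts all mass on one class the entropy is near zero (a confident pixel), while uniform predictions over C classes give the maximum entropy log(C).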
C. Average Uncertainty Maps for the Prostate Zonal Segmentation
The average uncertainty map tells the overall zonal uncertainty in different positions on the prostate image. Figure 2 shows the processes of obtaining the average uncertainty map.
The overall workflow for the registration of the sample (one of the non-templates) uncertainty map to the template uncertainty map. A1 and A2 are a template image and its uncertainty map. B1 and B2 are a sample image and its uncertainty map, respectively. C shows the result after the zonal boundary registration between the sample and the template. Red and blue points represent the zonal boundaries on the template and the sample images, respectively. D is the warped uncertainty map based on the corresponding zonal points after the registration. E and F show the overlapping of zonal boundary points and uncertainty maps before and after registration.
In order to obtain the average uncertainty map at the prostate apex, middle portion, and base, three template prostate images at the three sections were chosen by a radiologist after inspecting all the prostate images. Next, for each prostate section, zonal boundary points on non-template prostate images (sample images) were registered to those on the template image of the section using the non-rigid coherent point drift (CPD) method [20]. In non-rigid CPD, the alignment of two point sets is treated as a probability density estimation problem in which one point set serves as the centroids of a Gaussian mixture model (GMM) and the other represents the data points. By maximizing the likelihood, the GMM centroids are fitted to the data. The GMM centroids are also forced to move coherently to preserve the topological structure, by regularizing the displacement field and using variational calculus to obtain the optimal transformation. The thin plate spline (TPS) method [21] was then used to warp the sample uncertainty maps onto the template uncertainty map based on the corresponding zonal boundary points (Figure 2). The average was then computed over all the warped sample uncertainty maps, including the template uncertainty map, yielding an average uncertainty map for the prostate section. In the end, three average uncertainty maps were obtained for the prostate apex, middle portion, and base.
In addition, the prostate zonal average uncertainty score for each prostate section was calculated by averaging all of the pixels’ uncertainties in the zone.
D. Model Development and Testing
Cross entropy (CE) served as the loss function to train the proposed model. For each pixel, with $y_{i}$ the one-hot ground-truth label and $p_{i}$ the predicted probability for class $i$ (background, PZ, and TZ), the cross entropy was formulated as \begin{equation*} \mathrm {CE}=\frac {1}{3}\sum \nolimits _{i=0}^{2} \left [{ -y_{i}\log {(p_{i})} -\left ({1-y_{i} }\right)\log {\left ({1-p_{i} }\right)} }\right]\tag{3}\end{equation*}
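Eq. (3) for a single pixel can be sketched as follows. This is a NumPy illustration with an added `eps` clip for numerical stability, not the paper's PyTorch implementation, and the function name is hypothetical.

```python
import numpy as np

def pixel_ce(y, p, eps=1e-12):
    """Eq. (3): per-pixel cross entropy averaged over the three classes
    (background, PZ, TZ).

    y: one-hot ground-truth vector of length 3.
    p: predicted class probabilities of length 3.
    """
    y = np.asarray(y, dtype=float)
    # Clip probabilities away from 0 and 1 so the logs stay finite.
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)
    return float(np.mean(-y * np.log(p) - (1.0 - y) * np.log(1.0 - p)))
```

A confident, correct prediction yields a loss near zero, while spreading probability mass onto the wrong classes increases the loss.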
Training and evaluation were performed on a desktop computer running a 64-bit Linux system with four Titan Xp GPUs (12 GB GDDR5 RAM each). PyTorch was used to implement the algorithms. The learning rate was initially set to 1e-3. The model was trained for 100 epochs with a batch size of 8. The loss was optimized by stochastic gradient descent with a momentum of 0.9 and an L2 regularizer with weight 0.0001. The central regions of the images were cropped and used as the network input.
The patient-wise Dice Similarity Coefficient (DSC) was used to evaluate the segmentation performance, where A and B denote the algorithmic and the manual segmentation masks: \begin{equation*} DSC=\frac {2\vert A\cap B\vert }{\left |{ A }\right |+\vert B\vert }\tag{4}\end{equation*}
The patient-wise Hausdorff Distance (HD) [21] was also used to evaluate the segmentation performance. It is formulated as \begin{equation*} HD\left ({X,Y }\right)=\max \left ({h\left ({X,Y }\right),h\left ({Y,X }\right) }\right)\tag{5}\end{equation*} where $h(X,Y)=\max _{x\in X}\min _{y\in Y}\left \|{ x-y }\right \|$ is the directed Hausdorff distance between the boundary point sets X and Y.
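Both evaluation metrics, Eqs. (4) and (5), can be sketched in a few lines of NumPy. The brute-force pairwise distance in `hausdorff` is for illustration only and assumes small boundary point sets; the function names are hypothetical, not the paper's code.

```python
import numpy as np

def dsc(a, b):
    """Eq. (4): Dice similarity coefficient between two binary masks."""
    a, b = a.astype(bool), b.astype(bool)
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0

def hausdorff(x_pts, y_pts):
    """Eq. (5): HD(X, Y) = max(h(X, Y), h(Y, X)), where
    h(X, Y) = max over x of (min over y of ||x - y||)."""
    # (|X|, |Y|) matrix of all pairwise Euclidean distances.
    d = np.linalg.norm(x_pts[:, None, :] - y_pts[None, :, :], axis=-1)
    return max(d.min(axis=1).max(), d.min(axis=0).max())
```

For instance, a 2x2 predicted mask against a 2x3 reference mask sharing 4 pixels gives DSC = 2*4/(4+6) = 0.8.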
E. Statistical Analysis
The distribution of DSCs was described by the mean and standard deviation. A paired-sample t-test was used to compare the performance difference between the proposed method and the baselines on both ITD and ETD. Performance differences of the proposed method were also tested with a paired-sample t-test.
Results
A. Performance Using Internal Testing Dataset (ITD) and External Testing Dataset (ETD)
Figure 3 shows two typical examples of prostate zonal segmentation results by the proposed method and the five comparison methods: Deeplab V3+ [22], USE-Net [12], U-Net [23], Attention U-Net [24], and R2U-Net [25]. USE-Net, proposed by Rundo et al. for prostate zonal segmentation, embeds the squeeze-and-excitation (SE) block into U-Net and enables adaptive channel-wise feature recalibration. Attention U-Net, proposed by Oktay et al., incorporates attention gates into the standard U-Net architecture to highlight salient features that pass through the skip connections. Deeplab V3+ [22] is one of the state-of-the-art deep neural networks for semantic image segmentation; it takes an encoder-decoder architecture to recover spatial information and exploits multi-scale features through atrous spatial pyramid pooling (ASPP), which probes convolutional features at multiple scales by applying several parallel atrous convolutions with different rates. R2U-Net extends the standard U-Net with recurrent and residual neural networks.
Two representative examples of the zonal segmentation by the proposed method, DeeplabV3+, USE-Net, and U-Net. Yellow lines are the manually annotated zonal segmentations, and red lines are the algorithmic results. The top two and bottom two rows show segmentation examples from two different subjects.
Means and standard deviations of DSCs for PZ and TZ on ITD and ETD are shown in Table 2. Mean DSCs for PZ and TZ of the proposed method were 0.80 and 0.89 on ITD and 0.79 and 0.87 on ETD, all significantly higher than the results obtained by the comparison methods.
Means and standard deviations of Hausdorff Distance (HD) are shown in Table 3. The proposed method achieved the lowest mean HD among all the methods for both PZ and TZ segmentation.
Figure 4 shows superior and inferior cases for the PZ and TZ segmentation. The superior case had DSC > 0.90 for PZ segmentation and DSC > 0.95 for TZ segmentation. DSCs of the inferior case were below 0.60 and 0.50 for the PZ and TZ segmentations, respectively.
Superior and inferior cases for PZ and TZ segmentation. Superior and inferior cases for PZ and TZ are shown in the first and second row.
B. Performance Discrepancy Between the Internal Testing Dataset (ITD) and External Testing Dataset (ETD)
For the proposed method, there was no significant difference between ITD and ETD in PZ segmentation performance (at the p < 0.05 significance level). However, there was a 2.2% difference for the TZ segmentation (Table 2).
C. Performance Investigation for Each Individual Module in the Proposed Method
We carried out the following ablation studies to investigate the importance of each module within the proposed network. TABLE 4 indicates which module was used (a checkmark) or not used (a cross) in each experiment. We showed that the best model performance is achieved when both SAM and MFPA are used in the model for the zonal segmentation.
In experiment 1, when SAM was removed from the proposed model, DSCs for both zones on ITD and ETD decreased below those of the proposed model, showing that SAM helped improve the overall segmentation performance. In experiment 2, when MFPA was removed, DSCs for PZ on ITD and for both zones on ETD decreased, indicating that MFPA was essential within the model.
D. The Overall Uncertainty for the Prostate Zonal Segmentation of the Proposed Method
Figure 5 and Table 5 show the overall uncertainties of the proposed method for the prostate zonal segmentation. The pixel-by-pixel uncertainty maps showed that the zonal boundaries had higher uncertainties than the interior areas at the three prostate locations (apex, middle, and base slices). Also, the highest uncertainties were observed at the intersection between the PZ, TZ, and the anterior fibromuscular stroma (AFS).
The pixel-by-pixel uncertainty estimation of the zonal segmentation at the apex, middle, and base slices of the prostate (top). Orange indicates high uncertainty, and blue indicates low uncertainty. Bottom: average uncertainty scores (bottom left) and average normalized DSCs (bottom right; normalized by the TZ DSC of 0.87 shown in Table 4) with standard deviations at the apex, middle, and base slices of the prostate (x-axis).
The TZ segmentation had lower overall uncertainties than the PZ segmentation, and accordingly the proposed method achieved better segmentation in TZ (DSC = 0.87) than in PZ (DSC = 0.79). We used a normalized DSC (DSCnorm, normalized by the TZ DSC of 0.87) to show relative differences at different locations of the prostate. For PZ segmentation, the highest overall uncertainty was observed at the base, consistent with the worst model performance there.
Discussion
In this study, we proposed an attentive Bayesian deep learning model that accounts for long-range spatial dependencies between TZ and PZ with an estimation of pixel-wise uncertainties of the model. The performance discrepancy between ITD and ETD of the proposed model was minimal. There was no difference in PZ segmentation between ITD and ETD, and a 2.2% discrepancy in TZ segmentation. The average uncertainty estimation showed lower overall uncertainties for TZ segmentation than PZ, consistent with the actual segmentation performance difference between TZ and PZ. We attribute this to the complicated and curved shapes of PZ. The PZ boundaries generally have bilateral crescentic shapes, while the TZ boundaries are ellipsoid in shape.
SAM aided the model to focus on certain spatial areas in the zonal segmentation. This was done by the modeling of spatial dependencies with the help of global features. Since spatial attention was inserted adjacent to the raw images, large GPU memory was required to obtain the global spatial features during the training and evaluation. The SAM can be inserted into other positions within the network, but we observed that the zonal segmentation performed the best when the SAM followed directly after the raw image.
High segmentation uncertainties exist along the zonal boundaries. This may be explained by inconsistent manual annotations, since the boundaries between TZ and PZ are hard to define precisely due to partial volume artifact. This resembles a "random error" that persists throughout the entire experiment, so we call such uncertainty "random uncertainty" in prostate zone segmentation.
The areas with the highest uncertainty are located at the junction of AFS, PZ and TZ. One possible reason is that it is hard for the MRI to distinguish the tissue around the junction. There is probably a significant reduction of signal by the more severe partial volume artifacts caused by PZ with the high pixel intensity, TZ with the intermediate pixel intensity and AFS with lower pixel intensity.
The overall uncertainties were higher at the apex slices than at the base slices for the TZ segmentation. This may be because the size of TZ gradually increases from the apex to the base slices, making the smaller apical TZ harder for the model to recognize. In contrast, the overall uncertainties for PZ were higher at the base slices than at the apex and middle slices. Similar to TZ, we attribute the low uncertainties to the large PZ structure between the apex and middle slices [26] (Figure 6).
The estimation of pixel-wise uncertainties of the prostate zonal segmentation would provide confidence and trust in an automatic segmentation workflow, which allows a simple rejection or acceptance based on a certain uncertainty level. This can be implemented as a partial or entire rejection of the automatic segmentation results when presenting to experts, and future research will be needed to determine the level of uncertainties to be acceptable to experts. We believe that this additional confidence would enable more natural adaption or acceptance of the automatic prostate segmentation than the one without it when the prostate segmentation is integrated into the downstream analysis decision.
We observed that simple incorporation of the inter-slice information by 3D U-Net was not sufficient to improve the segmentation performance. Our prostate MRI data had a lower through-plane resolution (3-3.6 mm) than in-plane resolution (0.5-0.65 mm), resulting in a conflict between the anisotropy of the 3D images and the isotropy of the 3D convolutions [27], [28]. This may be the main reason that the model's generalization was compromised. Specifically, after a 3D convolution, voxels in the x-z plane correspond to structures with different scales along the x- and z-axes [26]. Moreover, the performance difference was more significant when both ITD and ETD were used for testing, potentially due to differences in the imaging protocols. Further study may be needed to investigate advanced approaches that incorporate the inter-slice information into 3D convolutions when the in-plane and through-plane resolutions differ, while minimizing sensitivity to different imaging protocols.
The significant effect of including SAM and MFPA was investigated in the ablation study. The average DSCs of the proposed method were higher than the experiments in the ablation study for PZ and TZ in both datasets. However, there were no significant differences between DSCs obtained by the experimental methods and the proposed method for both zones in the ablation study when a paired t-test was used. Based on the power analysis, we need 100, 253, 143, and 194 cases for Experiment 1 in Table 4 (when SAM is removed) and 394, 253, 143, and 194 cases for Experiment 2 (when MFPA is removed) to achieve 80% power with alpha = 0.05.
We also compared the uncertainty of the proposed method and that of the U-Net. We found that average uncertainty scores of the proposed method for both PZ and TZ at three different prostate locations are all smaller than U-Net (Table 6).
Our study has a few limitations. First, the training time was long because extracting the global features required large GPU memory and therefore small batch sizes. Second, all MR images were acquired without an endorectal coil. This mirrors general clinical use, since the use of the endorectal coil is decreasing due to patient preference. Moreover, studies have shown no significant difference in PCa detection between MR images acquired with and without the endorectal coil [27], [28], given the increased signal-to-noise ratio (SNR) and spatial resolution of 3T MRI scanners compared to 1.5T. Pixel-to-pixel translation techniques such as cycle-GAN could be applied to handle cases with an endorectal coil, since images acquired with the coil contain large signal variations near the coil. Third, the study considered only slices that contain the prostate, which could reduce false positives from non-prostate slices and thus inflate the overall segmentation performance.
Conclusion
We proposed a spatial attentive Bayesian deep learning model for the automatic segmentation of prostatic zones with pixel-wise uncertainty estimation. The study showed that the proposed method is superior to state-of-the-art methods (including U-Net and USE-Net) for the segmentation of the two prostate zones, TZ and PZ. Both the spatial attention and the multi-scale feature pyramid attention modules contributed to the prostate zonal segmentation. Also, the overall uncertainties estimated by the Bayesian model differed between TZ and PZ at the three prostate locations (apex, middle, and base), consistent with the actual model performance evaluated using the internal and external testing datasets.