Automation of Spine Curve Assessment in Frontal Radiographs Using Deep Learning of Vertebral-Tilt Vector

In this paper, an automated and visually explainable system is proposed for scoliosis assessment from spinal radiographs. It addresses the drawbacks of manual measurement, which is known to be time-consuming, cumbersome, and operator dependent. Deep learning techniques have been successfully applied to the accurate extraction of the Cobb angle, the gold standard for scoliosis assessment. However, such deep learning methods estimate the Cobb angle directly, without providing structural information about the spine that could be used for diagnosis. Although conventional segmentation-based methods can provide the spine structure, they still have limitations in measuring the Cobb angle accurately. It would be desirable to build a clinician-friendly diagnostic system for scoliosis that provides not only an automated Cobb angle assessment but also local and global structural information about the spine. This paper addresses this need with a hierarchical method consisting of three major parts: (1) a confidence map is used to selectively localize and identify all vertebrae accurately and robustly, (2) a vertebral-tilt field is used to estimate the slope of each individual vertebra, and (3) the Cobb angle is determined by combining the vertebral centroids with the previously obtained vertebral-tilt field. The proposed method was validated, achieving a circular mean absolute error of 3.51° and a symmetric mean absolute percentage error of 7.84% for the Cobb angle.


I. INTRODUCTION
Adolescent idiopathic scoliosis is a structural spinal deformity occurring mainly in the coronal plane [7]. Because radiography is fast, inexpensive, and simple compared with computed tomography and magnetic resonance imaging, frontal radiography is commonly used to diagnose scoliosis and to monitor its progression. Radiography can image the entire spine of a standing patient while reflecting the 3D rotatory nature of the scoliotic deformity [18].
The Cobb angle is commonly used to measure the lateral curvature of the spine in the coronal plane from a frontal radiograph. It is defined as the angle between two lines parallel to the upper endplate of the superior end vertebra and the lower endplate of the inferior end vertebra [12], as shown in Fig. 1. If the endplates are not well visualized, the boundaries of the pedicles are used to compute the Cobb angle [12], [18], [19]. Manual measurement of the Cobb angle is time-consuming, cumbersome, and operator dependent, resulting in high inter- and intra-observer variation. Even the intra-observer variability of the Cobb angle, which is known to be smaller than the inter-observer variability, has been reported to be as large as 5° to 10° [20], [24].
The associate editor coordinating the review of this manuscript and approving it for publication was Qichun Zhang.
Because human measurement of the Cobb angle is variable, a spinal curve is in practice considered to have progressed on radiographs when the Cobb angle increases by 5° or more between consecutive clinic visits [29]. This somewhat arbitrary criterion can potentially mislead patient care. Hence, there is a significant need to improve the reproducibility of Cobb angle measurement through automatic estimation.
Numerous computer-aided methods for automatic estimation of the Cobb angle have been developed. They can be divided roughly into two categories: segmentation-based methods and direct estimation methods. Segmentation-based methods compute the Cobb angle from a vertebral segmentation obtained with an active contour model [1], a customized filter [2], or charged-particle models [26]. Unfortunately, these methods are not robust, because accurate segmentation of the vertebrae is extremely difficult owing to unclear vertebral boundaries in radiographs. Direct estimation methods [30], [33] attempt to extract the correlation between spine features (e.g., landmarks) in radiographs and the Cobb angle without segmentation. However, these landmark-based methods struggle to achieve accurate and robust estimation, because small errors in the landmarks can cause large errors in the Cobb angle.
Recently, Wu et al. [34] proposed a Multi-View Correction Network to achieve fully automated, comprehensive scoliosis assessment by leveraging the correlation between frontal and lateral radiographs. Wang et al. [35] proposed a Multi-View Extrapolation Network for accurate Cobb angle measurement in both the frontal and lateral views, taking advantage of multiple views [34] and high-precision calculation [6]. Chen et al. [6] developed an Adaptive Error Correction Net combined with high-precision calculation to compute the Cobb angle directly from a single frontal radiograph.
Despite their highly accurate results, these methods still lack visual interpretability, because they output only the Cobb angle without identifying the most tilted vertebrae. Designating the most tilted vertebrae is important for deciding on curve progression and for surgical planning. Therefore, it would be desirable to build a clinician-friendly diagnostic system that provides highly accurate and reproducible Cobb angle measurements in a visually interpretable manner. This paper proposes a hierarchical deep learning method for such a clinician-friendly system, one that exposes its intermediate decision process. The advantage of the proposed method is that it can directly visualize the end vertebrae by calculating the tilt angle of each individual vertebra, which the previously proposed methods [6], [34], [35] can hardly provide. Mimicking the decision-making process of a clinician, the proposed method consists of three main steps that consider both local and global information about the vertebrae: 1) localization and identification of the individual thoracic and lumbar vertebrae using a confidence map; 2) estimation of the slope of each vertebra using the vertebral-tilt field; and 3) measurement of the three Cobb angles needed for scoliosis assessment, namely the proximal-thoracic (PT), main-thoracic (MT), and thoracolumbar (TL) angles, using the confidence map and the vertebral-tilt field. To develop a visually interpretable and highly accurate system, we combined the segmentation-based and direct estimation approaches. Like a segmentation method, whose results can be visualized, the vertebral-tilt field provides a prediction at each pixel inside the vertebral region.
At each such pixel, it predicts a vector that gives the slope of the vertebra directly. This vertebral-tilt field, combined with the localization and identification of the vertebrae from the confidence map, provides an accurate slope for each individual vertebra in a visually interpretable manner.
The proposed method makes three major contributions: (1) highly accurate and robust Cobb angle measurement is achieved using the confidence map and the vertebral-tilt field; (2) a visually explainable system is developed to improve the clinician's workflow; and (3) a vertebral-tilt field is proposed for accurate estimation of the slope of the vertebrae.
The performance of our method was evaluated on 128 anterior-posterior (AP) radiographs, with the networks trained on 481 labeled radiographs confirmed by radiologists. The experimental results show that the proposed method provides accurate and robust vertebral identification and Cobb angle measurement. We achieved a circular mean absolute error (CMAE) of 3.51° and a symmetric mean absolute percentage error (SMAPE) of 7.84% for the Cobb angle.

II. METHODS
Let $I(x)$ represent the intensity of the grayscale AP X-ray image at pixel position $x \in \Omega$, where $\Omega = \{x = (x_1, x_2) : x_1 = 1, \dots, h,\ x_2 = 1, \dots, w\}$ represents the pixel grid of the image. Then, image $I$ can be viewed as a matrix $I \in \mathbb{R}^{h \times w}$. The goal is to develop a fully automated method for Cobb angle measurement from AP radiographs $I$. An automated measurement of the Cobb angle from radiographs must deal with the overlapping shadows of other thoracoabdominal bones and soft-tissue structures. In addition, it must distinguish between the cervical and thoracic vertebrae, which are adjacent and have similar shapes in frontal radiographs. Our method consists of three parts: localization and identification of the thoracic and lumbar vertebrae, estimation of the slope of the vertebrae, and Cobb angle measurement.
A schematic overview is shown in Fig. 2. A confidence map is used to localize and identify the 17 vertebrae. For accurate and robust estimation of the Cobb angle, we use the vertebral-tilt field to describe the slope of each vertebra. The Cobb angle can then be determined from 17 vertebral-tilt vectors, which are obtained from the vertebral centroids and the vertebral-tilt field.

A. LOCALIZATION AND IDENTIFICATION OF THE VERTEBRAE
In our method, we first predict the centroids of the 12 thoracic and 5 lumbar vertebrae in image $I$. The output is expressed as $P = (p_1, p_2, \dots, p_{17}) \in \mathbb{R}^{2 \times 17}$, where $p_j = (p_{j,1}, p_{j,2})$ represents the centroid of the $j$-th vertebra.
To estimate the 17 centroids, we employ a confidence map [5], [23], [31], [32] representing the belief that a centroid lies at each pixel position $x = (x_1, x_2)$ in image $I$ (see Fig. 2). To obtain the confidence map, we first generate, for $j = 1, \dots, 17$, an individual confidence map $\psi_j$ that assigns a value $\psi_j(x) \in \mathbb{R}$ to each pixel position $x$, as defined by (1), where $\sigma_j^2$ is set to 1/8 of the height of the $j$-th vertebra in image $I$. Next, these 17 confidence maps are integrated into the confidence map $\Psi$, obtained as follows:
$$\Psi(x) = \max_{j = 1, \dots, 17} \psi_j(x). \qquad (2)$$
Here, $\Psi(x)$ is the maximum of $\psi_j(x)$ over $j = 1, \dots, 17$ at pixel position $x$.
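The construction of $\Psi$ can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions: because (1) is not reproduced in this text, the Gaussian form $\psi_j(x) = \exp(-\|x - p_j\|^2 / \sigma_j^2)$ is assumed for illustration, while the choice of $\sigma_j^2$ as 1/8 of the vertebral height and the pixel-wise maximum of (2) follow the description above. The function name `confidence_map` and the (row, col) coordinate convention are hypothetical.

```python
import numpy as np

def confidence_map(centroids, heights, h, w):
    """Integrated confidence map Psi of shape (h, w) built from vertebral centroids.

    centroids: (17, 2) array of centroid positions p_j given as (row, col) in pixels.
    heights:   (17,) array of vertebral heights in pixels.
    ASSUMPTION: psi_j(x) = exp(-||x - p_j||^2 / sigma_j^2), a Gaussian form chosen
    for illustration; only sigma_j^2 = height_j / 8 and the max in (2) come from the text.
    """
    p = np.asarray(centroids, dtype=float)
    sigma2 = np.asarray(heights, dtype=float) / 8.0
    rows, cols = np.mgrid[0:h, 0:w]  # pixel grid, each array of shape (h, w)
    # Squared distance from every pixel to every centroid, shape (17, h, w)
    d2 = (rows[None] - p[:, 0, None, None]) ** 2 + (cols[None] - p[:, 1, None, None]) ** 2
    psi = np.exp(-d2 / sigma2[:, None, None])  # individual maps psi_j
    return psi.max(axis=0)  # Eq. (2): pixel-wise maximum over j
```

Taking the maximum rather than the sum keeps every peak at height 1, so closely spaced vertebrae remain distinct peaks instead of merging.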
A confidence map regression function $f_c: I \mapsto \Psi_{\mathrm{rfn}}$ will be learned using a deep learning technique with a labeled training dataset $D_c := \{(I^{(n)}, P^{(n)}, \Psi^{(n)})\}_{n=1}^{N}$. The proposed network, called Centroid-net, consists of three neural network functions: (i) a feature extraction network $f_{\mathrm{ext}}: I \mapsto I^*$, shown in Fig. 3(a); (ii) an initial prediction network $f_{\mathrm{init}}: I^* \mapsto \tilde{\Psi}_{\mathrm{init}}$, shown in Fig. 3(b); and (iii) a refinement network $f_{\mathrm{rfn}}: (I^*, \tilde{\Psi}_{\mathrm{init}}) \mapsto \Psi_{\mathrm{rfn}}$, shown in Fig. 3(c).
Here, $f_{\mathrm{ext}}$ first produces a set of feature maps $I^* = f_{\mathrm{ext}}(I) \in \mathbb{R}^{\frac{h}{4} \times \frac{w}{4} \times 512}$, which is the input of the network $f_{\mathrm{init}}$. The next two networks sequentially predict the coarse initial confidence map $\tilde{\Psi}_{\mathrm{init}} = f_{\mathrm{init}}(I^*) \in \mathbb{R}^{\frac{h}{8} \times \frac{w}{8}}$ and the final confidence map $\Psi_{\mathrm{rfn}} = f_{\mathrm{rfn}}(I^*, \tilde{\Psi}_{\mathrm{init}}) \in \mathbb{R}^{h \times w}$. The final map refines the initial prediction $\tilde{\Psi}_{\mathrm{init}}$: the initial confidence map $f_{\mathrm{init}}(I^*)$ is concatenated with the intermediate feature map $I^*$ as the input of $f_{\mathrm{rfn}}$.
Centroid-net is designed to achieve a large receptive field at each pixel of the output layer, with a sequential prediction structure that captures the long-range dependencies among the 17 vertebrae. We adopt convolutional neural networks to learn the three functions $f_{\mathrm{ext}}$, $f_{\mathrm{init}}$, and $f_{\mathrm{rfn}}$; Fig. 3 shows the architecture of Centroid-net. The networks $f_{\mathrm{ext}}$, $f_{\mathrm{init}}$, and $f_{\mathrm{rfn}}$ are learned simultaneously using the training data $D_c := \{(I^{(n)}, P^{(n)}, \Psi^{(n)})\}_{n=1}^{N}$. In Centroid-net, we use a weighted loss function to improve the prediction accuracy for T1 (the first thoracic vertebra), because T1 is difficult to distinguish from C7 (the last cervical vertebra), as shown in Fig. 4. The proposed weighted loss function (3) combines, for each training sample $n$, an intermediate loss $L_{c,1}^{(n)}$ given by (4), which compares the upsampled initial prediction $U(f_{\mathrm{init}}(f_{\mathrm{ext}}(I^{(n)})))$ with the ground truth $\Psi^{(n)}$, and a final loss $L_{c,2}^{(n)}$ given by (5), which compares the refined prediction $f_{\mathrm{rfn}}(f_{\mathrm{ext}}(I^{(n)}), f_{\mathrm{init}}(f_{\mathrm{ext}}(I^{(n)})))$ with $\Psi^{(n)}$.
Here, $U$ is an 8× upsampling operator using bicubic interpolation, and $\omega^{(n)}$ is the weight given by (6). The weight $\omega^{(n)}$ is designed so that the loss is calculated only in the region containing the thoracic and lumbar vertebrae, whereas $\omega^{(n)} = 0$ within the region containing the cervical vertebrae. In this way, training focuses on predicting the locations of the 12 thoracic vertebrae (T1-T12) and 5 lumbar vertebrae (L1-L5) while ignoring predictions in the region containing the cervical vertebrae. This weighted loss approach predicts T1 accurately and robustly while avoiding the difficulty of distinguishing between T1 and C7 (see Section III-F for further details).
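The effect of $\omega^{(n)}$ can be illustrated with a small NumPy sketch. It assumes a squared-error loss, a binary weight that is zero on the image rows above the top of T1 (a hypothetical parameter `t1_top_row`), and a simple sum of the intermediate and final terms; none of these details is stated in the text, and nearest-neighbour upsampling replaces the bicubic operator $U$ to keep the example dependency-free.

```python
import numpy as np

def cervical_weight(h, w, t1_top_row):
    """Hypothetical omega^(n): 0 on rows above the top of T1 (cervical region), 1 elsewhere."""
    omega = np.ones((h, w))
    omega[:t1_top_row, :] = 0.0
    return omega

def weighted_sq_error(pred, target, omega):
    """Squared error summed only where omega is nonzero (assumed L2 form)."""
    return float(np.sum(omega * (pred - target) ** 2))

def centroid_net_loss(psi_init, psi_rfn, psi_gt, omega, factor=8):
    """Intermediate + final loss for one sample; summing the two terms is an assumption."""
    # Nearest-neighbour upsampling via np.kron stands in for the bicubic operator U.
    up = np.kron(psi_init, np.ones((factor, factor)))
    return weighted_sq_error(up, psi_gt, omega) + weighted_sq_error(psi_rfn, psi_gt, omega)
```

Any error confined to the cervical rows contributes nothing to the loss, which is the property the weighting is meant to provide.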
The proposed Centroid-net, comprising the three networks $f_{\mathrm{ext}}$, $f_{\mathrm{init}}$, and $f_{\mathrm{rfn}}$, is obtained by minimizing the loss function in (3) over the training data $D_c = \{(I^{(n)}, P^{(n)}, \Psi^{(n)})\}_{n=1}^{N}$. Centroid-net maps $I$ (a frontal radiograph) to $\Psi_{\mathrm{rfn}}$ (a confidence map), as shown in Fig. 5(a) and (b). From the confidence map $\Psi_{\mathrm{rfn}}$, the centroids $P = (p_1, \dots, p_{17}) \in \mathbb{R}^{2 \times 17}$ of the 17 vertebrae are easily determined. First, Otsu's thresholding [21] is applied to the confidence map $\Psi_{\mathrm{rfn}}$ in Fig. 5(b) to eliminate small local perturbations, that is, local maxima distant from the vertebrae. The local maxima remaining after thresholding, shown in Fig. 5(c), are the candidate centroids. Next, 17 centroids must be selected from these candidates. To do so, we score each candidate by the value of $\Psi_{\mathrm{rfn}}$ at its local maximum and exclude candidates whose scores are less than half of the mean score (see the red boxes in Fig. 5(c) and (d)). Finally, we select 17 candidates, starting from the bottom candidate, as shown in Fig. 5(d).
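The post-processing steps above can be sketched with NumPy alone. Otsu's method is implemented directly on a histogram of $\Psi_{\mathrm{rfn}}$, local maxima are taken over 8-connected neighbourhoods, and candidates are ordered by row index with the image bottom at the largest row; the neighbourhood size, the histogram bin count, and all function names are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def otsu_threshold(values, nbins=256):
    """Otsu's threshold: the bin center maximizing the between-class variance."""
    hist, edges = np.histogram(values, bins=nbins)
    centers = (edges[:-1] + edges[1:]) / 2.0
    w0 = np.cumsum(hist)                  # pixel count at or below each bin
    w1 = w0[-1] - w0                      # pixel count above each bin
    m = np.cumsum(hist * centers)         # cumulative first moment
    mu0 = m / np.maximum(w0, 1)
    mu1 = (m[-1] - m) / np.maximum(w1, 1)
    between = w0 * w1 * (mu0 - mu1) ** 2  # unnormalized between-class variance
    return centers[np.argmax(between)]

def local_maxima(a):
    """Boolean mask of pixels strictly greater than all 8 neighbours."""
    h, w = a.shape
    p = np.pad(a, 1, mode="constant", constant_values=-np.inf)
    neigh = [p[1 + dr:1 + dr + h, 1 + dc:1 + dc + w]
             for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]
    return a > np.max(neigh, axis=0)

def extract_centroids(psi, n=17):
    """Return n centroids as an (n, 2) array of (row, col), ordered top to bottom."""
    t = otsu_threshold(psi.ravel())
    kept = np.where(psi > t, psi, 0.0)              # suppress weak responses
    rows, cols = np.nonzero(local_maxima(kept) & (kept > 0))
    scores = psi[rows, cols]
    ok = scores >= 0.5 * scores.mean()              # drop candidates below half the mean score
    rows, cols = rows[ok], cols[ok]
    sel = np.argsort(-rows)[:n]                     # n candidates starting from the bottom
    sel = sel[np.argsort(rows[sel])]                # reorder top to bottom (T1 ... L5)
    return np.stack([rows[sel], cols[sel]], axis=1)
```

The returned array is the transpose of $P \in \mathbb{R}^{2 \times 17}$. Because selection starts from the bottom, extra peaks above T1, such as cervical responses, are discarded as long as L5 is detected.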

B. LEARNING VERTEBRAL-TILT FIELD
This section describes a method for obtaining the vertebral-tilt field, which will be used to determine the 17 vertebral-tilt vectors. The vertebral-tilt field, denoted by $V$, describes the slope of each vertebra in image $I$, as shown in Fig. 7. To learn a neural network $f_{\mathrm{vec}}: I \mapsto V$, we use M-net. The architecture of M-net is based on U-net [25], with two major parts added in the input and output layers. These three parts are explained as follows.
U-net is a convolutional neural network architecture developed for biomedical image segmentation. It consists of two parts: (1) an encoding path that applies 3 × 3 convolutions, each followed by a rectified linear unit (ReLU), and max pooling; and (2) a decoding path that applies upsampling using 2 × 2 transposed convolutions and 3 × 3 convolutions followed by ReLU, in which the upsampled output is concatenated with a high-resolution feature map from the encoding path, as shown in Fig. 6. In the input layer, an image pyramid constructed from multi-scale images is used to integrate multi-level receptive fields. Here, the image is downsampled by average pooling, and a convolution with ReLU is applied to each downsampled image.
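The input-side image pyramid can be illustrated as repeated 2 × 2 average pooling. In this sketch the pooling window and the number of levels are hypothetical choices, and the convolution with ReLU applied to each downsampled image, as well as how each level enters the encoding path, are omitted.

```python
import numpy as np

def avg_pool2(x):
    """2x2 average pooling with stride 2; a trailing odd row/column is dropped."""
    h, w = (x.shape[0] // 2) * 2, (x.shape[1] // 2) * 2
    x = x[:h, :w]
    return 0.25 * (x[0::2, 0::2] + x[1::2, 0::2] + x[0::2, 1::2] + x[1::2, 1::2])

def image_pyramid(img, levels=4):
    """Multi-scale inputs [I, I/2, I/4, ...]; levels=4 is a hypothetical choice."""
    pyramid = [np.asarray(img, dtype=float)]
    for _ in range(levels - 1):
        pyramid.append(avg_pool2(pyramid[-1]))
    return pyramid
```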
In the output layer, a side-output layer is used to learn local and global information simultaneously. A multi-label loss function over the side outputs mitigates the vanishing gradient problem by replenishing the back-propagated gradients [8], [32]. At the output layer, a 1 × 1 convolution and an element-wise hyperbolic tangent activation function are applied.
Here, $f_{\mathrm{vec}}$ is expressed as (7), where $f_{\mathrm{vec},i}$ is the function producing the $i$-th side output, and $f_{\mathrm{vec}}$ is learned by minimizing the multi-label loss (8), in which $\omega^{(n)}$ is the same weight as in (6). We now describe how to generate the ground truth $V^{(n)}$. Given image $I$, we first take a rectangular domain $\Omega_j$ occupying the $j$-th vertebral region, as shown in Fig. 7. The vector field $V$ is zero outside $\bigcup_{j=1}^{17} \Omega_j$. In the $j$-th vertebral region, $V$ is determined by (9), where $m_{j,r}$ and $m_{j,l}$ are the right and left midpoints, respectively, as shown in Fig. 7.
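Because (9) is not reproduced here, the following NumPy sketch generates a ground-truth field under explicit assumptions: $\Omega_j$ is taken as the axis-aligned bounding box of the four corner landmarks of the $j$-th vertebra, $m_{j,l}$ and $m_{j,r}$ are the midpoints of the two left and the two right corners, and $V$ on $\Omega_j$ is the unit vector from $m_{j,l}$ to $m_{j,r}$. The corner ordering and the function name are hypothetical.

```python
import numpy as np

def tilt_field(corners, h, w):
    """Ground-truth vertebral-tilt field V of shape (h, w, 2), stored as (x, y) = (col, row).

    corners: (n_vertebrae, 4, 2) array of (row, col) landmarks in the ASSUMED order
             top-left, top-right, bottom-left, bottom-right.
    ASSUMPTIONS: Omega_j is the axis-aligned bounding box of the four corners, and
    V on Omega_j is the unit vector from the left midpoint m_{j,l} to the right
    midpoint m_{j,r}; V is zero elsewhere.
    """
    V = np.zeros((h, w, 2))
    for quad in np.asarray(corners, dtype=float):
        m_l = (quad[0] + quad[2]) / 2.0          # left midpoint m_{j,l}
        m_r = (quad[1] + quad[3]) / 2.0          # right midpoint m_{j,r}
        d = m_r - m_l
        u = d / np.linalg.norm(d)                # unit vector in (row, col) order
        r0, c0 = np.floor(quad.min(axis=0)).astype(int)
        r1, c1 = np.ceil(quad.max(axis=0)).astype(int)
        # Later vertebrae overwrite earlier ones where bounding boxes overlap.
        V[max(r0, 0):min(r1 + 1, h), max(c0, 0):min(c1 + 1, w)] = u[::-1]
    return V
```

Under this assumption each component of $V$ lies in $[-1, 1]$, which matches the range of the hyperbolic tangent activation used at the output of M-net.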

C. COBB ANGLE MEASUREMENT
From the neural networks described in the previous sections, we obtain a map from $I$ to $(P, V)$. It remains to determine the three Cobb angles $\Phi = (\Phi_1, \Phi_2, \Phi_3)$. We choose a disk $D_j$ centered at $p_j$ with radius 5 that is contained in the region $\Omega_j$ of the $j$-th vertebra (see Fig. 8), and compute the weighted average $v_j$ of $V$ over $D_j$ as in (10). This $v_j$ is called the $j$-th vertebral-tilt vector; it provides the slope of the $j$-th vertebra, denoted by $\theta_j$, as in (11). Using these 17 vertebral slopes $(\theta_1, \dots, \theta_{17})$, we first determine the end vertebrae in three regions: the proximal thoracic (apex between T1 and T3), the main thoracic (apex between T3 and T12), and the thoracolumbar/lumbar (apex between T12 and L4). Then, the three Cobb angles $\Phi = (\Phi_1, \Phi_2, \Phi_3)$ are given by the angles between the end vertebrae in the three regions, respectively. Here, the apex is the vertebra or disc that is most distant from the center of the vertebral column [12].
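The disk averaging and slope computation can be sketched as follows. The weighting in (10) and the exact form of (11) are not reproduced in this text, so uniform weights (with an optional weight map) and $\theta_j = \operatorname{atan2}(v_{j,2}, v_{j,1})$ are assumed; the function names are hypothetical.

```python
import numpy as np

def tilt_vector(V, p, radius=5, weights=None):
    """Weighted average of V (h, w, 2) over the disk D_j of the given radius centred at p = (row, col).

    weights: optional (h, w) array; uniform weights are used when None. The
    weighting actually used in the paper is not reproduced here (ASSUMPTION).
    """
    h, w, _ = V.shape
    rows, cols = np.mgrid[0:h, 0:w]
    disk = (rows - p[0]) ** 2 + (cols - p[1]) ** 2 <= radius ** 2
    wts = (np.ones((h, w)) if weights is None else np.asarray(weights, dtype=float)) * disk
    return (V * wts[..., None]).sum(axis=(0, 1)) / wts.sum()

def slope_deg(v):
    """Slope angle theta_j of a tilt vector v = (x, y) in degrees (y measured along image rows)."""
    return float(np.degrees(np.arctan2(v[1], v[0])))
```

Because only the direction of $v_j$ enters $\theta_j$, pixels where the predicted field is zero shrink the average without rotating it, which is one way to see why the slope survives an imperfectly predicted field.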
Here, we explain in more detail how the three Cobb angles $\Phi = (\Phi_1, \Phi_2, \Phi_3)$ are evaluated. $\Phi$ is determined by finding four end vertebrae and using the corresponding vertebral-tilt vectors, denoted by $v_{c_1}$, $v_{c_2}$, $v_{c_3}$, and $v_{c_4}$ (see Fig. 9). We now explain how the four end vertebrae are determined. Let $\theta_{j,k}$ denote the angle between $v_j$ and $v_k$. Then, $\theta_{j,k}$ satisfies
$$\cos\theta_{j,k} = \frac{v_j \cdot v_k}{\|v_j\|\,\|v_k\|}, \qquad 0 \le \theta_{j,k} \le \pi. \qquad (12)$$
Let $c_{\min}$ and $c_{\max}$ be the numbers in $\{1, 2, \dots, 17\}$ given by
$$c_{\min} = \min \operatorname*{argmax}_{(j,k) \in \{1,\dots,17\}\times\{1,\dots,17\}} \theta_{j,k}, \qquad c_{\max} = \max \operatorname*{argmax}_{(j,k) \in \{1,\dots,17\}\times\{1,\dots,17\}} \theta_{j,k}. \qquad (13)$$
Here, the $c_{\min}$-th and $c_{\max}$-th vertebrae can be viewed as the upper and lower end vertebrae, respectively, of the major curve, that is, the curve with the largest Cobb angle; either MT or TL can be the major curve [12]. For example, in Fig. 9(a), MT is the major curve, and therefore $c_2 = c_{\min}$ and $c_3 = c_{\max}$. In contrast, in Fig. 9(b), TL is the major curve, and therefore $c_3 = c_{\min}$ and $c_4 = c_{\max}$. It remains to determine the other two end vertebrae. When MT is the major curve, the remaining two vertebral-tilt vectors $v_{c_1}$ and $v_{c_4}$ are determined by (14). When TL is the major curve, the remaining two vertebral-tilt vectors $v_{c_1}$ and $v_{c_2}$ are determined by (15). Then, the three Cobb angles $\Phi = (\Phi_1, \Phi_2, \Phi_3)$ are determined by
$$\Phi_j = \theta_{c_j, c_{j+1}}, \qquad j = 1, 2, 3. \qquad (16)$$
VOLUME 8, 2020
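Equations (12), (13), and (16) translate directly into a few lines of NumPy. The sketch below uses 0-based indices (T1 is index 0) and takes the four end-vertebra indices as input, because the selection rules (14) and (15) are not reproduced here; the function names are hypothetical.

```python
import numpy as np

def pairwise_angles_deg(v):
    """theta[j, k] in degrees from cos(theta_jk) = <v_j, v_k> / (|v_j| |v_k|), Eq. (12)."""
    v = np.asarray(v, dtype=float)
    u = v / np.linalg.norm(v, axis=1, keepdims=True)
    return np.degrees(np.arccos(np.clip(u @ u.T, -1.0, 1.0)))  # clip guards round-off

def major_curve_ends(theta):
    """(c_min, c_max) as in Eq. (13), 0-based; ties resolve to the first maximizing pair."""
    j, k = np.unravel_index(np.argmax(theta), theta.shape)
    return int(min(j, k)), int(max(j, k))

def cobb_angles(theta, c):
    """Eq. (16): Phi_j = theta[c_j, c_{j+1}] for the end-vertebra indices c = (c1, c2, c3, c4)."""
    return [float(theta[c[i], c[i + 1]]) for i in range(3)]
```

For example, if the most tilted vertebrae have slopes of 15° at index 3 and −20° at index 10, `major_curve_ends` returns (3, 10) and the corresponding angle is 35°.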

III. EXPERIMENTS AND RESULTS
In these experiments, Python 3.6 and PyTorch 1.1 [22] were used to implement the proposed method. All training and evaluation were conducted on a workstation equipped with two Intel(R) Xeon(R) E5-2630 v4 CPUs @ 2.20 GHz, 128 GB of DDR4 RAM, and four NVIDIA GeForce GTX 1080 Ti 11 GB GPUs.

A. DATA
For training and evaluation, spinal AP X-ray images and their labels were provided by the Digital Imaging Group, London, ON, Canada [33]. All X-ray images were collected from individual patients. The provided images include 481 AP X-ray images for training and 128 AP X-ray images for testing; we further split the training images into 431 for training and 50 for validation. The labels include 3 Cobb angles and 68 landmarks representing the four corner points of each of the 12 thoracic and 5 lumbar vertebrae. The labels were manually annotated by two experts at London Health Sciences Center, London, ON, Canada [6], [33]. We call this dataset the internal dataset to distinguish it from the external dataset described below. For external validation, we also used a dataset provided by a different hospital, which includes 20 AP X-ray images collected from individual patients.
The data were prepared as follows.
1) Each radiograph $I^{(n)}$ was resized.
2) The intensity of the resized image $I^{(n)}$ was scaled to the range of 0 to 1.
3) The centroid of each vertebra was defined as the intersection of the midline of its width and the midline of its height in the vertebral body.
4) For training Centroid-net, we generated a ground-truth confidence map $\Psi^{(n)}$ using (1) and (2) with the ground-truth centroids $P^{(n)}$.
5) For training M-net, we generated the ground-truth vertebral-tilt field $V^{(n)}$ using (9).
6) For data augmentation, we applied random brightness adjustment, random contrast adjustment, and random rotation by an angle between −10° and 10°.

B. TRAINING OF THE PROPOSED NETWORK
We trained the proposed neural networks by minimizing the loss functions in (3) and (8) using the Adam method [11].
We chose a batch size of 4 in view of our computational resources. Batch normalization [9] was also applied, and the learning rate was set to $10^{-4}$. We trained Centroid-net and M-net for 1000 and 1500 epochs, respectively; training was finished when the validation loss stopped decreasing.

C. QUANTITATIVE ANALYSIS OF COMPUTATIONAL EFFORT
We quantitatively analyze the computational effort, including computation time and memory requirements of the proposed neural networks.

1) COMPUTATION TIME
We provide the computation time of Centroid-net and M-net for training and test processes. In the training process, we measured the average time per epoch, which includes data loading, data augmentation, forward computation, and backward computation with optimization process. In the test process, the average time per batch was recorded with a batch size of 1. The test time included data loading and forward computation. The computation time is summarized in Table 1.
Here, we used a single GeForce GTX 1080ti 11GB GPU to measure the training and test time.

2) MEMORY REQUIREMENTS
To estimate the total memory requirements, we need to count all network parameters and intermediate activations [28]. To obtain the total memory in bytes, we multiply this count by 4, because each single-precision floating-point number occupies 4 bytes.
We first computed the memory occupied by the network parameters. Centroid-net has 1,130,080 trainable parameters, which occupy 4.31 MB in single-precision floating-point format, and M-net has 10,014,888 trainable parameters, which occupy 38.20 MB.
To estimate the memory required during training, we computed the total number of intermediate activations in the forward pass and of gradients in the backward pass. The required memory can then be estimated as
$$\sum_{\text{layer}} N_{\text{batch}} \times N_{\text{activation}} \times 2 \times 4\ \text{bytes}, \qquad (17)$$
where $N_{\text{batch}}$ is the batch size and $N_{\text{activation}}$ is the number of activations in each layer. We multiply by 2 to account for the backward pass, which occupies the same memory as the forward pass. Centroid-net requires 1.46 GB and M-net requires 3.28 GB, both with a batch size of 4.
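The parameter figures above, and the form of (17), can be checked with a short calculation. The reported values match 1 MB = $2^{20}$ bytes; the per-layer activation counts needed to reproduce the 1.46 GB and 3.28 GB estimates are not given, so the activation helper is exercised only with a hypothetical input. Function names are illustrative.

```python
def param_memory_mib(n_params, bytes_per_value=4):
    """Memory for trainable parameters in MiB (single precision: 4 bytes each)."""
    return n_params * bytes_per_value / 2**20

def activation_memory_gib(activations_per_layer, batch_size, bytes_per_value=4):
    """Eq. (17): sum over layers of N_batch * N_activation * 2 * 4 bytes, in GiB.

    The factor 2 accounts for gradients stored during the backward pass.
    """
    total = sum(batch_size * n * 2 * bytes_per_value for n in activations_per_layer)
    return total / 2**30
```

For example, `param_memory_mib(1130080)` is about 4.31 and `param_memory_mib(10014888)` is about 38.20, matching the values reported above.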

D. QUANTITATIVE EVALUATION AND COMPARISON OF THE RESULTS
In this section, we provide quantitative evaluation of the proposed method on the internal testing dataset that includes 128 AP X-ray images.

1) IDENTIFICATION AND DETECTION OF VERTEBRAE
For a quantitative evaluation of Centroid-net, we used the distance error, in pixels, between the centroids predicted by the proposed method and the ground-truth centroids. Fig. 10(a) shows a box plot of the centroid detection error.
We achieved a median error of 1.11 pixels for the 17 vertebrae. Larger errors occurred when Centroid-net failed to predict the L5 vertebra. Identification of the 17 vertebrae was deemed correct when the model predicted all 17 vertebrae with a distance error of less than 20 pixels. We achieved an identification rate of 90.6%. We also computed the distance error between each predicted centroid and the closest ground-truth centroid, as shown in Fig. 10(b). This error shows how close the predicted centroids are to vertebral centroids, regardless of the vertebral level.
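The evaluation quantities in this subsection can be sketched as follows. The 20-pixel criterion is read here as requiring all 17 predicted centroids of an image to lie within 20 pixels of the ground truth at the same vertebral level; this reading and the function names are assumptions.

```python
import numpy as np

def centroid_errors(pred, gt):
    """Per-vertebra Euclidean distance (pixels); pred, gt have shape (n_images, 17, 2)."""
    return np.linalg.norm(np.asarray(pred, dtype=float) - np.asarray(gt, dtype=float), axis=-1)

def identification_rate(pred, gt, tol=20.0):
    """Fraction of images whose 17 centroids all lie within tol pixels of the same-level ground truth."""
    return float(np.mean(np.all(centroid_errors(pred, gt) < tol, axis=1)))

def nearest_gt_errors(pred, gt):
    """Distance from each predicted centroid to the closest ground-truth centroid, ignoring level."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    d = np.linalg.norm(pred[:, :, None, :] - gt[:, None, :, :], axis=-1)  # (n, 17, 17)
    return d.min(axis=2)
```

A prediction shifted by exactly one vertebral level fails the identification criterion yet has near-zero nearest-centroid error, which is the distinction between Fig. 10(a) and Fig. 10(b).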

2) COBB ANGLE MEASUREMENT
To evaluate the three Cobb angles given by the proposed method, we used the circular mean absolute error (CMAE) [4] and the symmetric mean absolute percentage error (SMAPE) [13], [17]. The CMAE between $\Phi^{(n)}$ and $\Phi^{(n)}_{\mathrm{GT}}$ is defined using the mean of the circular mean (CMEAN), as given in (18) and (19). Here, $\Phi = (\Phi_1, \Phi_2, \Phi_3)$ denotes the three Cobb angles given by the proposed method, and $\Phi_{\mathrm{GT}} = (\Phi_{\mathrm{GT},1}, \Phi_{\mathrm{GT},2}, \Phi_{\mathrm{GT},3})$ denotes the ground-truth Cobb angles labeled by experts. The CMEAN is used to evaluate angular quantities correctly. For example, the absolute difference between the two angles 358° and 2° is 356°, whereas the actual angular difference is only 4°. To be precise, we first convert the three angles from degrees into radians to compute the sine and cosine in the CMEAN, and then convert the value of the CMEAN from radians back to degrees.
FIGURE 11. Box plots of circular mean error for several methods. The box plots represent interquartile ranges of circular mean error. Red lines denote the median value and black squares denote the mean value reported in Table 2.
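The wrap-around issue in the 358°/2° example is handled by working with sines and cosines. The helpers below compute a standard circular mean and a wrapped absolute difference; how they are combined into (18) and (19) is not reproduced here, so they should be read as building blocks rather than as the exact CMAE.

```python
import numpy as np

def circular_mean_deg(angles_deg):
    """Circular mean of angles in degrees: atan2 of the mean sine and mean cosine."""
    rad = np.radians(np.asarray(angles_deg, dtype=float))  # degrees -> radians
    return float(np.degrees(np.arctan2(np.sin(rad).mean(), np.cos(rad).mean())))  # back to degrees

def wrapped_abs_diff_deg(a, b):
    """Absolute angular difference in degrees, wrapped into [0, 180]."""
    d = (np.asarray(a, dtype=float) - np.asarray(b, dtype=float) + 180.0) % 360.0 - 180.0
    return np.abs(d)
```

`wrapped_abs_diff_deg(358, 2)` gives 4.0, whereas a naive absolute difference gives 356, and `circular_mean_deg([358, 2])` is close to 0 rather than the arithmetic mean of 180.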
The SMAPE is defined by (20). SMAPE expresses prediction accuracy as a relative error and has the advantages of scale independence and robustness to outliers. We compare the proposed method with existing methods: Angle-net in [6], Boost Net in [33], and Landmark Net in [6]. We also evaluated a variant of the proposed method that uses U-net [25] instead of M-net. The quantitative evaluation results for these methods are reported in Table 2. As shown in Table 2, the M-net-based proposed method outperforms the other methods in terms of CMAE and SMAPE. Fig. 11 shows box plots of the distribution of the circular mean error (19) for several methods.
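Since (20) is not reproduced here, the sketch below assumes the per-image form that sums the absolute errors of the three Cobb angles, divides by the sum of the predicted and ground-truth angles, and averages over images as a percentage. Treat this as an assumption; the paper's exact definition may differ.

```python
import numpy as np

def smape(pred, gt):
    """SMAPE in percent; pred, gt are (n_images, 3) arrays of Cobb angles in degrees.

    ASSUMED per-image form: sum_i |pred_i - gt_i| / sum_i (pred_i + gt_i).
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    per_image = np.abs(pred - gt).sum(axis=1) / (pred + gt).sum(axis=1)
    return 100.0 * per_image.mean()
```

Scaling all angles of an image by a common factor leaves the value unchanged, which is the scale independence mentioned above; the ratio is undefined if all six angles of an image are zero.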
To show the robustness of the proposed method against noise in a radiograph, we evaluated CMAE and SMAPE after adding different levels of Gaussian noise to the radiograph. The value of a noisy radiograph at pixel position $x$ is defined by $(1 + r\epsilon)I(x)$ with $\epsilon \sim \mathcal{N}(0, 1)$, where $r$ is the noise level (e.g., $r = 0.05$ for 5% noise). As shown in Table 3, the proposed method still outperforms the existing methods in [6] and [33] when Gaussian noise is added to the input radiograph, which indicates its robustness against noise in radiographs. Note that we did not use random Gaussian noise as data augmentation when training our model.
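The noise model can be written in a few lines. Drawing $\epsilon$ independently at each pixel, using NumPy's `default_rng`, and leaving the result unclipped are assumptions, since the text specifies only $(1 + r\epsilon)I(x)$ with $\epsilon \sim \mathcal{N}(0, 1)$.

```python
import numpy as np

def add_multiplicative_noise(img, r, seed=None):
    """Noisy radiograph (1 + r * eps) * I(x), eps ~ N(0, 1) drawn independently per pixel (assumed)."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(np.shape(img))
    return (1.0 + r * eps) * img
```

Because the noise is multiplicative, zero-intensity pixels stay zero and brighter structures receive proportionally larger perturbations.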

E. QUANTITATIVE EVALUATION ON THE EXTERNAL DATASET
In the previous section, we demonstrated that the proposed method provides accurate and robust Cobb angle estimation on the internal testing dataset. To further test the generalization ability and robustness of the proposed method, we assessed its Cobb angle measurement performance on 20 frontal radiographs from a different hospital. The end vertebrae designation and Cobb angle measurements for these radiographs were performed by two experienced radiologists in consensus. The quantitative evaluation of Cobb angle measurements on the external dataset is reported in Table 4. The proposed method achieved a small error on this external dataset, indicating its robustness and generalization ability.

F. COMPARISON BETWEEN CONFIDENCE MAP RESULTS WITH AND WITHOUT USING WEIGHTED LOSS
We next compare the localization performance with and without the weighted loss function. Fig. 12 shows that Centroid-net trained with the weighted loss function outperforms the one trained without it. As shown in Fig. 12(b) and (e), Centroid-net without the weighted loss function fails to predict the T1 vertebra. We interpret these results as follows. A model trained with a conventional loss function has to predict the T1 vertebra but not the C7 vertebra, which is adjacent to T1 and has a similar appearance. This sometimes causes the model to miss the T1 vertebra (Fig. 12(e)) or to predict T1 with low confidence (Fig. 12(b)). The problem arises because, of two vertebrae with similar shapes, one must be predicted and the other must not. In contrast, with the weighted loss function, we do not calculate the loss in the cervical vertebral region, so the thoracic and lumbar vertebrae are predicted accurately. In this case, Centroid-net predicts the cervical vertebra with high probability as well, owing to its similar shape.

G. COMPARISON BETWEEN THE PROPOSED METHOD AND SEGMENTATION-BASED METHOD
We qualitatively compared the proposed method with a segmentation-based method to show the advantage of the vector field approach. M-net was used to segment the 17 vertebrae from frontal radiographs. As shown in Fig. 13(a) and (c), the segmentation method fails to delineate the vertebral regions accurately. In such cases, the segmentation-based method cannot provide the slope of the vertebra, since it requires highly accurate boundary segmentation; segmentation-based methods also require an additional edge detection method, such as the Hough transform [2], [3], to identify the vertebral endplates. In contrast, the proposed method, shown in Fig. 13(b) and (d), provides accurate estimates of the vertebral slopes (red arrows), even though the vertebral-tilt field does not encode a vector at every pixel in the vertebral region, as shown in the yellow box in Fig. 13(d).

H. QUALITATIVE EVALUATION OF THE PROPOSED METHOD
For a qualitative evaluation of the proposed method, we visualized the results for six selected subjects from the internal testing dataset, as shown in Fig. 14. The results show that the proposed method properly provides the 17 centroids (Fig. 14(b)) and the 17 vertebral-tilt vectors (Fig. 14(c)) used for Cobb angle measurement. In Fig. 14(d), the four end vertebrae given by the proposed method are presented together with the ground truth.

IV. DISCUSSION AND CONCLUSION
In this paper, we proposed a visually explainable Cobb angle measurement method using deep learning that takes the clinician's decision process into account. Accurate and visually explainable Cobb angle measurement is important for the following reasons: (1) inaccurate measurement may lead clinicians to misinterpret scoliosis curve progression; and (2) a visually explainable scoliosis assessment algorithm that not only calculates the Cobb angle but also identifies the most tilted vertebrae of the curve can improve the clinicians' workflow in real clinical practice.
However, no existing method achieves both accurate and visually explainable measurement of the Cobb angle in terms of clinical performance. Direct estimation methods lack interpretability, even though they achieve highly accurate results. Indirect estimation methods visualize the intermediate decision process using the anatomical structure of the spine, but they are inherently inaccurate because they depend on the quality of landmark estimation or of vertebral boundary segmentation.
To overcome these difficulties, we integrated the advantages of direct estimation into the proposed indirect method. First, we used confidence map regression with a fully convolutional structure [27] to localize and identify all vertebrae, whereas conventional coordinate regression methods require deep layers with a large number of network parameters and accept only input images of a fixed size [23]. Next, the vertebral-tilt field was used to describe the slope of the vertebrae by assigning a vector to each pixel inside the vertebral region; this vector gives the slope of the vertebra directly. An advantage of the vertebral-tilt field is that it can estimate the slope of a vertebra even if the vectors are not well learned over the whole vertebral region, as shown in Fig. 13(d). Finally, the Cobb angle was obtained by combining the confidence map and vertebral-tilt field results. In this study, the vertebral-tilt field was implemented with M-net, which has shown improved performance in medical image segmentation; this choice suits the task because the vertebral-tilt field, like image segmentation, is a pixel-wise dense prediction [27].
We demonstrated that the proposed method achieves highly accurate Cobb angle estimation through a visually explainable system based on the confidence map and the vertebral-tilt field. The performance evaluation on both the internal and external testing datasets shows that the proposed method is robust across frontal radiographs from different hospitals.
The proposed method leaves room for improvement. We believe that uncertainty quantification of the proposed Cobb angle measurement will be an important topic of our future study, in which the Bayesian deep learning methods in [10], [14] or the Gaussian process regression in [15], [16] could be adopted.
He has published many articles based on clinical research using MRI. His research interests include musculoskeletal radiology, analysis of medical imaging, including radiography, ultrasonography, CT, and MRI, and machine learning applications for medical image analysis and reconstruction.
JIN KEUN SEO received the Ph.D. degree from the University of Minnesota, in 1991. Since 1995, he has been a Professor with Yonsei University, South Korea. He is currently the Director of the BK21plus of Computational Science and Engineering. He wrote books entitled Nonlinear Inverse Problems in Imaging (Wiley Press) and Electro-Magnetic tissue properties MRI (Imperial College Press). His research interests include inverse problems, mathematical modeling, image processing, partial differential equations, harmonic analysis, and deep learning for medical image analysis.