OP-convNet: a patch classification based framework for CT vertebrae segmentation

Accurate vertebrae segmentation from medical images plays an important role in clinical tasks such as surgical planning, post-operative assessment, and the diagnosis of kyphosis, scoliosis, degenerative disc disease, and spondylolisthesis. Although bone structures have high contrast in medical images, vertebrae segmentation remains challenging because of the complex structure of the vertebrae, abnormal spine curves, and unclear boundaries. In recent years, deep learning has been widely applied to vertebrae image segmentation. In this paper, towards a robust and automatic segmentation system, we present an overlapping patch-based convNet (OP-convNet) model for automatic segmentation of vertebrae in CT images. Because 3D convolutional neural networks incur greater memory and processing costs and carry a risk of over-fitting, we use overlapping patches with a 2D convNet. In the proposed method, OP-convNet effectively preserves the local information contained in CT images. We divide CT image slices into equal-sized square overlapping patches and apply the RUS-function to these patches to balance the classes and reduce computational requirements. These patches are then input into the model along with their corresponding ground truth patches. The method has been evaluated on publicly available CT images from the MICCAI CSI workshop challenge. The results indicate that OP-convNet achieves a precision (PRE) of 90.1%, specificity (SPE) of 99.4%, accuracy (ACC) of 98.8%, and F-score of 90.1% in terms of patch-based classification accuracy, and a BF-score of 90.2%, sensitivity (SEN) of 90.3%, Jaccard index (JAC) of 82.3%, and Dice similarity score (DSC) of 89.9% in terms of segmentation accuracy, outperforming previous methods across all metrics.


I. INTRODUCTION
Image segmentation is a process that converts raw medical image data into meaningful, spatially structured information, which is necessary for scientific discovery [1]. Vertebrae segmentation is a prerequisite for automatic spine analysis: accurate segmentation facilitates vertebral fracture detection, assessment of spine deformities, and computer-assisted spine surgery [2]. Automated spine analysis must cope with wide diversity among tomographic scans, which include dedicated spine scans as well as abdomen, chest, and neck scans. It therefore needs a generic vertebrae segmentation method that is robust to a range of image resolutions and spine coverage. This requires that the vertebrae be clearly visible with their anatomical structure and that the spine section they belong to can be identified.
The goal of vertebra segmentation in a computed tomography (CT) volume is to support diagnosis, quantitative analysis, and surgical or therapy planning for spine diseases and disorders such as spine trauma, scoliosis, and other spinal pathologies. As with many other image-guided diagnostic tasks, vertebrae segmentation can be accelerated and improved by computer-aided diagnosis (CAD) systems, because manually segmenting vertebrae in 3D medical images is time-consuming and laborious, and its outcome depends on the subjective judgment of the operator. CAD plays a significant role in diagnostic radiology. Patients' images are processed faster because CAD systems shorten routine tasks, which results in fewer mistakes caused by physician fatigue. CAD is also relatively sensitive, repeatable, and robust. Low computing time is required for healthcare purposes [3]. One of the biggest challenges in CAD applications is the detection and segmentation of vertebrae from CT images. CT allows bone to be evaluated in detail because of its high contrast and spatial resolution. Its wide availability and the low overall cost of a patient examination make it the most widespread imaging technique.
Vertebrae segmentation is difficult because of anatomical variation. This variation includes variable spine coverage and differences in the structures that provide anatomical cues for vertebrae segmentation, especially the sacrum and ribs. Furthermore, neighboring vertebrae are often similar in appearance and shape. Many automatic segmentation methods that address these challenges have been published [4]- [7]. Model fitting has remained a challenge in vertebrae segmentation, which has often relied on statistical shape modeling, shape-constrained deformable models, and active shape modeling [8]- [10]. Other approaches are based on atlases [11], active contours [12], and level sets with shape priors [13]. Because of the registration process, atlas-based approaches have a high computational cost. While the accuracy of most deformable models is promising, they are sensitive to initialization, whether performed manually or automatically. In [14], bounding boxes of intervertebral discs were found using an interactive variant of marginal space learning and were then used for vertebrae segmentation based on graph cuts and a Markov random field. The work in [15] applied an AdaBoost-based object detection framework to find the bounding boxes of vertebral bodies, each of which was then segmented by inflating a mesh from its center. The work in [5] detected the centers of vertebral bodies using random forest regression and used them to highlight the ROIs in which the vertebrae were segmented. The method in [16] regresses the distance to the nearest vertebral body center using a multilayer perceptron (MLP), and the detected locations are then used to initialize an adaptive shape model that segments the vertebral bodies.
The work in [17] combined shape modeling and machine learning: a probabilistic boosting-tree classifier detected the vertebral boundary, and a surface mesh was adapted to each vertebra in combination with a statistical shape model (SSM). The SSM was used for mesh initialization and to impose shape constraints.
Deep learning has driven progress in medical image analysis across clinical and research fields, e.g., brain tumor detection [18], supervised segmentation of volumetric medical images with 3D fully convolutional networks [19], liver segmentation [20]- [22], prostate cancer segmentation from MRI [23], and lung nodule segmentation using multi-scale 2D+3D features. Other CNN-based methods have been proposed for airway segmentation in lung CT using leak detection [24], for breast cancer classification [25], and, combined with level sets, for left heart ventricle segmentation and detection [26]. Deep learning techniques have also been applied to vertebrae segmentation and localization, e.g., vertebrae pathology grading [27], vertebrae localization for lumbar surgery in ultrasound images [28], and vertebrae segmentation [7], [29]- [32].
Deep learning is gaining popularity in CAD for vertebra segmentation, and several approaches based on convolutional neural networks (CNNs) have been proposed. Most recently published methods replace explicit modeling of vertebral appearance and shape with CNNs. The method in [33] used a multiclass CNN for pixel labeling to segment lumbar vertebrae in 2D sagittal slices, with the bounding box of the lumbar region estimated by a simple multilayer perceptron (MLP) to identify the region of interest. In the subsequent work [34], a network operating on 3D patches classified voxels across the entire image, and a 2D framework predicted low-resolution masks of the vertebral column, which effectively suppressed false positives outside the vertebral region. A similar approach to spine segmentation was presented in [35]: first, the bounding box of the lumbar region is estimated using a regression CNN; second, a classification CNN labels the voxels within the bounding box to obtain the segmentation. A two-stage method with an interactive strategy [36] first segments the vertebrae in downsampled images, which a CNN analyzes one after the other; in the second stage, the full-resolution images are analyzed to refine the low-resolution segmentation. A method for vertebrae localization and identification in CT images is proposed in [37], which combines short- and long-range contextual information in a supervised manner in a multi-task 3D FCN. A two-stage multi-class segmentation architecture is presented in [31], based on a 3D graph convolutional segmentation network (GCSN) for coarse 3D segmentation and a 2D residual U-Net (ResUNet) for 2D refinement. However, these works have several limitations: blurry boundaries lead to inferior segmentation, and the complex network models come at a high computational cost.
While the aforementioned studies can segment vertebrae in CT images, they require many computational operations because of their large number of trainable parameters, and their segmentation accuracy is limited. The problem of segmenting the vertebrae automatically, accurately, and rapidly therefore persists. As a solution, we propose a vertebrae segmentation model for CT images that integrates patch-based deep learning with pre- and post-processing. Because the spine CT images in the dataset are grayscale and the vertebrae occupy only a small part of each slice, a convolutional neural network [38], [39] is selected as an appropriate design for the two-class classification problem. The training and validation images are input as overlapping patches containing vertebrae and background. The proposed OP-convNet is a simple, patch-based architecture that has a low computational cost because of its small number of trainable parameters and gives high classification accuracy. Vertebrae image patches contain gradients that convolutional neural networks can capture. The main contributions of this paper are as follows.
• We propose an OP-convNet model to segment vertebrae in CT images. The slices of the CT images are divided into equal-sized square overlapping patches, which enhances localization, so the trained network can emphasize local characteristics within each patch.
• The proposed patch-based convNet model avoids the difficulties associated with reduced feature map resolution and semantic feature loss.
• Class balancing of the overlapping patches is achieved by the RUS-function.
• The results on the publicly available MICCAI CSI dataset indicate that OP-convNet achieves outstanding precision, specificity, accuracy, and F-score in terms of patch-based classification accuracy, and outstanding BF-score, sensitivity, Jaccard index, and Dice similarity score in terms of segmentation accuracy, without adding to the computational load.

II. METHODOLOGY
Similar to other deep learning-based methods, our proposed method is also divided into training and testing phases. The proposed method further comprises the preprocessing (HU, Gaussian filter, data augmentation), overlapping patch generation, RUS-function for class balancing, OP-convNet framework and testing phase. Figure 1 shows the flowchart of the proposed methodology for vertebrae segmentation.

A. DATA PREPROCESSING
The primary aim of the preprocessing phase is to identify bone pixels and improve the differentiation between vertebrae and other tissues. We first suppress noise and artifacts across the CT images by thresholding: intensities outside the bone range of 100 HU to 1500 HU (Hounsfield units) are set to 0 to minimize noise, imaging artifacts, and the influence of the tissues surrounding the vertebral column. Our input spine CT data are volumetric, so they are processed slice by slice. The threshold separates the vertebrae from soft tissues because bone has higher pixel intensities in CT images than other tissues. However, vertebrae have intensities similar to those of other bones such as ribs, so we train our deep learning model to distinguish the vertebrae from the other bony structures in the CT images. We then apply a Gaussian filter with a fixed kernel size to smooth the CT images, which attenuates the effect of noisy pixels and improves the segmentation accuracy of the spine. A large Gaussian kernel is associated with large deformation of the image content, so a small kernel is used for smoothing. Figure 2 shows original CT slices of 512 × 512 pixels with their respective ground truths from the dataset.
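The thresholding step can be illustrated with a minimal NumPy sketch. It assumes the slice is already expressed in Hounsfield units and that the 100 HU and 1500 HU bounds are inclusive; the function name `threshold_bone_window` is hypothetical and is not taken from the original implementation, which was written in MATLAB. The Gaussian smoothing step is not shown.

```python
import numpy as np

def threshold_bone_window(slice_hu, low=100.0, high=1500.0):
    """Set every pixel outside the [low, high] HU window to 0.

    Assumption: both bounds are inclusive; the paper does not state this.
    """
    slice_hu = np.asarray(slice_hu, dtype=np.float32)
    inside = (slice_hu >= low) & (slice_hu <= high)
    return np.where(inside, slice_hu, 0.0).astype(np.float32)

# Toy 2x3 "slice": air, soft tissue, lower bound, bone, upper bound, metal
toy = np.array([[-1000.0, 40.0, 100.0],
                [700.0, 1500.0, 2500.0]])
print(threshold_bone_window(toy))
```

Only the three values inside the window survive; everything else, including very dense material above 1500 HU, is zeroed.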

1) DATA AUGMENTATION
ConvNet's accuracy increases with data augmentation, and overfitting is reduced [40]. Since the input images are rotationally invariant, we rotate each image (θ degrees) to maximize the number of image samples. It should be noted that while rotating an image may degrade its high-frequency content slightly, it should have no effect on the background or foreground of the image. Following that, we randomly translate each image to increase the number of training samples for the network. The augmentation step based on image rotation is essential to the network's performance [41] because of the limited number of images in the spine dataset.
In the next step, the data are normalized to the range [0,1].
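As a rough illustration of the translation and normalization steps, the sketch below shifts a slice with zero fill and rescales it to [0, 1]. The helper names, the zero-fill border handling, and the use of per-image min-max scaling are assumptions; the paper does not specify them. Rotation by an arbitrary angle θ requires interpolation and is omitted here.

```python
import numpy as np

def translate_zero_fill(image, dy, dx):
    """Shift a 2D image by (dy, dx) pixels; uncovered pixels become 0."""
    h, w = image.shape
    out = np.zeros_like(image)
    # Overlapping source and destination windows after the shift
    src_y = slice(max(0, -dy), max(0, min(h, h - dy)))
    dst_y = slice(max(0, dy), max(0, min(h, h + dy)))
    src_x = slice(max(0, -dx), max(0, min(w, w - dx)))
    dst_x = slice(max(0, dx), max(0, min(w, w + dx)))
    out[dst_y, dst_x] = image[src_y, src_x]
    return out

def min_max_normalize(image):
    """Rescale intensities to [0, 1]; a constant image maps to all zeros."""
    image = np.asarray(image, dtype=np.float32)
    lo, hi = image.min(), image.max()
    if hi == lo:
        return np.zeros_like(image)
    return (image - lo) / (hi - lo)

img = np.arange(9, dtype=np.float32).reshape(3, 3)
shifted = translate_zero_fill(img, 1, 0)   # move content one row down
normalized = min_max_normalize(img)
```

The same shift would have to be applied to the ground truth mask so that image and label stay aligned.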

B. OVERLAPPING PATCH GENERATION
The input CT images are divided into overlapping patches of size n×n. A sliding window with a fixed pixel stride creates the overlapping patches, as shown in figure 3. A 32×32 patch contains 1024 pixels in total; if 513 or more of them are vertebra pixels in the ground truth, the patch is labeled 1 (vertebra or foreground patch), otherwise it is labeled 0 (non-vertebra or background). Testing images are likewise divided into overlapping n×n patches to check the model's accuracy and then to segment the vertebrae from the CT images. Figure 4 shows randomly selected input patches of 32×32 pixels.
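The patch generation and labeling rule can be sketched as follows. This is an illustrative NumPy version, not the authors' code: the function name is hypothetical, the default stride of 12 pixels is the training stride reported in the experiments section, and windows that would extend past the image border are simply skipped, which is an assumption.

```python
import numpy as np

def extract_labeled_patches(image, mask, patch=32, stride=12):
    """Slide a patch x patch window over a slice and label each patch.

    A patch is labeled 1 when more than half of its pixels are foreground
    in the ground truth mask (513 of 1024 for 32x32), otherwise 0.
    """
    h, w = image.shape
    patches, labels = [], []
    for y in range(0, h - patch + 1, stride):
        for x in range(0, w - patch + 1, stride):
            patches.append(image[y:y + patch, x:x + patch])
            fg = int(np.count_nonzero(mask[y:y + patch, x:x + patch]))
            labels.append(1 if fg >= (patch * patch) // 2 + 1 else 0)
    return np.stack(patches), np.array(labels, dtype=np.int64)

# Synthetic 512x512 slice with a small foreground block in one corner
image = np.zeros((512, 512), dtype=np.float32)
mask = np.zeros((512, 512), dtype=np.uint8)
mask[0:32, 0:17] = 1            # 32 * 17 = 544 foreground pixels
patches, labels = extract_labeled_patches(image, mask)
print(patches.shape, int(labels.sum()))
```

With these settings a 512 × 512 slice yields 41 window positions per axis, i.e. 1681 patches, of which only the corner patch crosses the 513-pixel threshold in this toy example.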

C. RUS-FUNCTION
We used two classes for classification, and the numbers of training patches in the two classes were imbalanced, as shown in the class distribution in figure 5. Because the vertebrae occupy a much smaller area of the images than the background, most training patches are labeled 0, so the classifier can become biased towards the background. From a medical perspective [42], a high recall rate (correct vertebra patch classification) is desirable, whereas a high false-negative rate (vertebra misclassified as background) is unacceptable in practice. The proportions of positive and negative training samples must therefore be balanced [43]. To do so, we applied the random under-sampling function (RUS-function) to the negative samples and generated a balanced training set. This increases the accuracy and convergence rate of the network during training [40], [41]. Figure 6 shows the RUS-function applied to CT image patches to remove majority-class (background) patches and balance the classes before the training stage.
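Random under-sampling of the majority class can be sketched in a few lines of NumPy. The 1:1 target ratio, the fixed seed, and the function name `random_undersample` are assumptions made for illustration; the paper states only that negative samples are under-sampled to obtain a balanced training set.

```python
import numpy as np

def random_undersample(patches, labels, seed=0):
    """Keep all minority-class patches and an equally sized random
    subset of the majority class (assumed 1:1 target ratio)."""
    rng = np.random.default_rng(seed)
    pos = np.flatnonzero(labels == 1)
    neg = np.flatnonzero(labels == 0)
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    kept_majority = rng.choice(majority, size=len(minority), replace=False)
    keep = np.sort(np.concatenate([minority, kept_majority]))
    return patches[keep], labels[keep]

# Toy set: 5 vertebra patches and 95 background patches
toy_labels = np.array([1] * 5 + [0] * 95)
toy_patches = np.arange(100).reshape(100, 1, 1)
bal_patches, bal_labels = random_undersample(toy_patches, toy_labels)
print(len(bal_labels), int(bal_labels.sum()))
```

All five foreground patches are retained and five background patches are drawn without replacement, so the balanced set has ten patches.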

D. OP-CONVNET FRAMEWORK
When combined with suitably selected local features, the overlapping patch-based technique appears to be an effective solution to the vertebrae segmentation problem. However, because the vertebra to be classified lacks a precise definition, determining appropriate features is challenging. Furthermore, choosing the most important and mutually independent features from a huge number of potentially available characteristics needs a complicated statistical analysis based on large volumes of image data. This is why OP-convNet is employed for automatic feature extraction in this study. Figure 7 shows the general structure of the proposed OP-convNet: overlapping patches as input, followed by convolutions, non-linearities, and pooling layers, and then further convolutional and fully connected layers. AlexNet [40], VGG net [44], GoogLeNet [45], ResNet [46], and other standard CNN architectures are designed to classify RGB images, and all of these networks require a fixed input size of 224 × 224 pixels, which is too large for our problem. Some strategies (zero-padding, re-sampling, etc.) can help overcome this drawback, but they require an excessive amount of processing. In addition, these classification networks contain a large number of trainable parameters and require many computational operations during a single forward pass. Another possibility is to use fully convolutional networks designed specifically for image segmentation, such as U-Net [47] and SegNet [48], which can segment an entire two-dimensional image in a single forward pass. However, SegNet frequently loses neighboring information when it up-pools low-resolution feature maps, and it focuses more on central slices. U-Net also has drawbacks when it comes to preserving local features for image segmentation.
To address these shortcomings, we propose a method for vertebrae CT segmentation that uses an overlapping patch-based convolutional neural network architecture. Applying overlapping patches yields a higher segmentation accuracy while keeping the same degree of complexity as a standard CNN architecture. In this work, we propose a specific convNet architecture for segmenting vertebrae from square image patches. A convNet is a multi-stage deep learning framework consisting of convolutions, non-linearities, and pooling layers, followed by further convolutional and fully connected layers. The raw pixel intensities of the image are given to the convNet as input. Each class is represented by a unique neuron in the output layer (in our case, two neurons for the two-class classification problem). The weights (W) of the convNet are optimized with the backpropagation algorithm to minimize the classification error on the training set.

1) CONVOLUTIONAL LAYER
The convolutional layer takes square patches from the input image (for the first layer) or from feature maps (for subsequent layers), using a stride value and padding if necessary, and performs 2D convolution with a filter. The sum of the resulting convolutions is fed into a non-linearity; a rectified linear unit (ReLU) is used because it accelerates training [40]. The same filter is applied at every position of a given feature map, while different filters are used for different feature maps. This weight-sharing property of the convolutional layer enables the same pattern to be detected at multiple locations across the feature map.

2) POOLING LAYER
The pooling layer down-samples the feature map, typically by summarizing the feature responses in each (possibly overlapping) pooling window with their maximum activation (max-pooling). This results in features that are invariant to minor translations of the data.

3) FULLY CONNECTED LAYER
Convolution and pooling produce feature maps of smaller dimension than the input image, which are then processed by several fully connected (FC) layers. The first FC layers combine these feature maps into a feature vector. The final FC layer contains two neurons and uses softmax regression to calculate the probability of each class. To avoid overfitting, the fully connected layers are regularized using dropout [49].
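For the two-neuron output layer, the softmax mapping from raw scores to class probabilities can be written as a short, numerically stable NumPy function. This is a generic sketch of softmax, not code from the original implementation; the example scores are made up.

```python
import numpy as np

def softmax(logits):
    """Convert raw scores to probabilities along the last axis.

    Subtracting the row maximum first avoids overflow in exp()
    without changing the result.
    """
    logits = np.asarray(logits, dtype=np.float64)
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

# Hypothetical scores for (background, vertebra) on two patches
scores = np.array([[2.0, 0.0],
                   [0.0, 0.0]])
probs = softmax(scores)
predicted = probs.argmax(axis=-1)   # 0 = background, 1 = vertebra
```

Each row of `probs` sums to 1, and the predicted label is the class with the larger probability.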

E. TESTING
The convNet's weights are initialized from a Gaussian distribution. During training, these weights are updated iteratively using the gradients of the loss function, which are computed with stochastic gradient descent (SGD) over a mini-batch of training data. The learning rate is reduced after a given number of epochs. Momentum and weight decay are applied to accelerate learning and reduce overfitting. Training stops after a set number of epochs, and the model with the lowest validation loss is selected as the final model. Each test spine CT image receives the same preprocessing as the training images. Test patches of 32×32 pixels, extracted with a stride of 1 from unseen spine images, are fed to the trained model, which assigns a label to each patch. In our case, OP-convNet solves a two-class patch-classification problem (vertebra or background). A higher rate of false positives (FP) was observed because of the pixel-based segmentation. In most segmentation problems, background pixels outnumber foreground pixels, so a systematic false-negative (FN) error is more favorable than a systematic false-positive (FP) error. For this reason, post-processing was applied: the binary predicted image is refined with simple morphological operations [50] to achieve a fine segmentation.
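The weight update described at the start of this subsection can be written in one common form of SGD with momentum and L2 weight decay. The notation is introduced here for illustration and is an assumption: W_t are the weights at iteration t, v_t is the velocity, η_t the current learning rate, μ the momentum, λ the weight-decay coefficient, and L_B the loss averaged over mini-batch B. Deep learning toolboxes differ in the exact placement of these terms.

```latex
\begin{aligned}
v_{t+1} &= \mu\, v_t - \eta_t \left( \nabla_W L_B(W_t) + \lambda\, W_t \right), \\
W_{t+1} &= W_t + v_{t+1}.
\end{aligned}
```

With μ = 0 and λ = 0 this reduces to plain mini-batch gradient descent, W_{t+1} = W_t − η_t ∇_W L_B(W_t).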

A. DATASET
Our model's performance is evaluated on the publicly available MICCAI CSI dataset [51], gathered at a trauma hospital during routine medical examinations. The CT scans cover all thoracic and lumbar vertebrae, with an in-plane resolution of 0.31 mm to 0.45 mm. The data were compiled from patients ranging in age from 16 to 35 years. Each slice is approximately 512 × 512 pixels in size, and each case contains between 520 and 600 slices. The dataset includes fifteen cases in total, ten for training and five for testing. It was received from the medical center of the University of California, Irvine (Orange, CA, USA). Scan and reconstruction settings include a voltage of 120 kVp, intravenous contrast, and a slice thickness of 0.7 to 1.0 mm. The scans were acquired at a high spatial resolution as continuous CT data sets. Additionally, the challenge organizers provided reference segmentations that were generated semi-automatically and corrected manually.

B. EVALUATION MEASURES
The evaluation measures [52] are used to assess the patch-based classification and vertebrae segmentation performance and to compare it with existing methods. These measures are well known and widely used in medical image analysis.

1) CLASSIFICATION METRICS
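The performance of patch-based classification is evaluated quantitatively by precision (PRE), specificity (SPE), accuracy (ACC), and F-score, all derived from the confusion matrix. Let TP, TN, FP, and FN denote the numbers of true-positive, true-negative, false-positive, and false-negative patches, respectively. Recall (REC) measures the fraction of patches that are positive in the ground truth and are also detected as positive by the proposed method:
$$\mathrm{REC} = \frac{TP}{TP + FN}$$
Specificity measures the fraction of patches that are negative in the ground truth and are also classified as negative by the method. It indicates how well the method detects the correct background patches and is obtained as:
$$\mathrm{SPE} = \frac{TN}{TN + FP}$$
Precision is obtained as:
$$\mathrm{PRE} = \frac{TP}{TP + FP}$$
The overall accuracy is obtained as:
$$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN}$$
The F-score is calculated as follows:
$$F\text{-score} = \frac{2 \cdot \mathrm{PRE} \cdot \mathrm{REC}}{\mathrm{PRE} + \mathrm{REC}}$$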
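The classification measures named in this subsection follow directly from the four confusion-matrix counts. The sketch below uses the standard definitions; the function name and the example counts are hypothetical and are not results from the paper.

```python
def classification_metrics(tp, tn, fp, fn):
    """Precision, recall, specificity, accuracy and F-score from
    confusion-matrix counts (standard definitions)."""
    def ratio(num, den):
        # Return 0.0 instead of dividing by zero on degenerate inputs
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    specificity = ratio(tn, tn + fp)
    accuracy = ratio(tp + tn, tp + tn + fp + fn)
    f_score = ratio(2 * precision * recall, precision + recall)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "accuracy": accuracy,
        "f_score": f_score,
    }

# Made-up counts for 1000 patches, for illustration only
metrics = classification_metrics(tp=90, tn=890, fp=10, fn=10)
print(metrics)
```

Because background patches dominate, accuracy and specificity can be high even when precision and recall are noticeably lower, which is why the F-score is reported alongside them.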

IV. EXPERIMENTS AND RESULTS
The proposed OP-convNet was trained with mini-batch gradient descent and backpropagation with momentum. All biases were initialized to zero. The optimal convNet hyper-parameters (initial learning rate, momentum, weight decay, and mini-batch size) were found with a pyramidal approach, performing a random search within a constrained parameter space [55]. Stochastic gradient descent (SGD) is used to train the network for 30 epochs with a mini-batch size of 128. The initial learning rate is 0.01, and it is multiplied by a drop factor of 0.1 every 10 epochs. The momentum and the L2 weight-decay rate are set to 0.9 and 0.0001, respectively. The fc4 and fc5 layers are regularized using a dropout ratio of 0.5. The experiments are conducted in MATLAB 2018a on a 1.80 GHz i7 CPU with 32 GB RAM and an NVIDIA GeForce MX250 GPU. Figure 8 illustrates the architecture of OP-convNet. The convNet uses three convolution-pooling stages. Based on the study in [38], the numbers of feature maps in the three stages are 64, 128, and 256. The filter size is 3×3 in all three stages, whereas the max-pooling sizes are 3×3, 3×3, and 2×2, respectively. After preprocessing, input patches of 32×32 pixels are fed to the model. The first 2D convolutional layer has a kernel size of 3×3 and outputs 64 feature maps. Zero padding is applied to boundary pixels during convolution so that each layer's feature map is the same size as its input. This layer is followed by a rectified linear unit (ReLU) activation layer, which applies an element-wise non-linearity. A 3×3 max-pooling layer then subsamples the feature maps, giving an output of size 7×7×64 at this stage. The second and third convolutional layers use the same 3×3 kernel size and, together with their ReLU layers, extract 128 and 256 feature maps, respectively. They are followed by 3×3 and 2×2 max-pooling, resulting in outputs of size 3×3×128 and 2×2×256.
In OP-convNet, the three FC layers have 1024, 256, and 2 neurons, respectively. The values 1024 and 256 were chosen based on our empirical findings, and 2 matches the number of object categories in our two-class (vertebrae/background) classification problem. The proposed architecture is well suited to classification-based vertebrae segmentation. The flattened output is sent into a fully connected (FC) layer with 1024 nodes, followed by a dropout layer. With a probability of 0.5, the dropout layer removes a different set of nodes at every iteration, which regularizes the convNet and avoids over-fitting during training. The first FC layer produces a vector of 1024 features and the second FC layer a vector of 256 features, with dropout applied to both. Finally, a softmax layer with two nodes computes a probabilistic prediction for the two classes. The output layer computes the cross-entropy loss with L2 regularization. Table I lists the detailed configuration of the OP-convNet model parameters. In total, 895019 patches were generated, of which 716019 (80%) were used for training and 179000 (20%) for validation. Figure 9 illustrates the learned filters of the convNet's first, second, and third convolutional layers. These automatically learned filters mainly respond to gradients of varying frequencies and orientations and to color blobs, which are necessary for classification. In addition to these learned filters, figure 10 shows the activation feature maps of the pooling layers (maxpool1, maxpool2, and maxpool3). In the proposed method, overlapping patches of the same size are extracted with a stride of 12 pixels. A 12-pixel stride was adopted because a smaller stride increases the number of patches and therefore the computational complexity. We thus determined a stride of 12 pixels to be the ideal value for our experimental study. Figure 11 depicts the OP-convNet fine-tuning process over 120 training epochs. As illustrated in the figure, the validation loss reaches its smallest value (0.106) after eight epochs, corresponding to a validation accuracy of 0.987. Training OP-convNet takes 27 hours under our experimental setup.
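The piecewise learning-rate schedule reported for training (initial rate 0.01, drop factor 0.1 every 10 epochs, 30 epochs in total) can be sketched as a small function. The 1-based epoch indexing and the assumption that the first drop takes effect at epoch 11 are illustrative choices; the function name `step_decay_lr` is hypothetical.

```python
def step_decay_lr(epoch, initial_lr=0.01, drop_factor=0.1, drop_period=10):
    """Learning rate for a 1-based epoch index under step decay.

    Assumption: epochs 1-10 use the initial rate and the rate is
    multiplied by drop_factor after every drop_period epochs.
    """
    if epoch < 1:
        raise ValueError("epoch must be >= 1")
    return initial_lr * drop_factor ** ((epoch - 1) // drop_period)

# Rates at the start of each 10-epoch block of a 30-epoch run
schedule = {epoch: step_decay_lr(epoch) for epoch in (1, 11, 21)}
print(schedule)
```

Under these assumptions the 30 epochs are trained at approximately 0.01, 0.001, and 0.0001 in three blocks of ten epochs each.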

B. SEGMENTATION PERFORMANCE
Having presented the patch-based classification performance on the training images, we now use five test cases containing 3418 slices to assess the segmentation performance of our approach. Figure 13 shows the confusion matrices of the training and test cases separately. Figure 14 shows the segmentation results in the axial plane, where the original images and the ground truth segmentation maps of each slice are shown in the first and second rows, respectively. The third row shows the segmentation maps predicted by the proposed approach, and it can be seen that the proposed method produces well-segmented results. In comparison to related studies,

IV. DISCUSSION
The comparison of results in table III quantitatively verifies the advantages of the OP-convNet framework over existing methods. For example, U-Net [47] takes a whole image as input, whereas the proposed method takes overlapping patches as input to a convNet architecture. In the experiments, U-Net did not perform well in segmentation because of a lack of local information and achieved a Dice similarity coefficient (DSC) of 83.7%, whereas the proposed OP-convNet performed significantly better (89.9% DSC). Local information is preserved more effectively in the proposed method than in U-Net because the slices are divided into overlapping patches and the predictions are made independently for each patch, which improves segmentation performance. Another U-Net-based method, D-TVNet [57], obtained a mean DSC of 86.17%. The experimental findings show that D-TVNet cannot identify the essential points for measuring the angle of the spine curve when segmented bones are used. Furthermore, this method fails to recognize them when the bones are not sharp and the image noise is stronger. Although D-TVNet can filter out some noise from images, in some cases it also eliminates essential bone pixels by mistake. [59]. A classification-based PaDBN segmentation model [7] was proposed for automatic CT vertebra segmentation and achieved an 86.1% DSC. With this approach, the initial thoracic vertebrae have a lower DSC because of the influence of the ribs and intervertebral discs; it wrongly segmented certain bones that were not visible in the label annotations, resulting in misclassification and a poor DSC. The OP-convNet framework also outperformed other classical backbones in terms of DSC: compared with VGG net [44], ResNet [46], and DenseNet [56], the proposed framework is better by 8%, 5.4%, and 2.5% in average DSC, respectively.
Therefore, the proposed model has strong predictive performance and can be applied to assist planning and biomechanical analysis based on vertebrae CT image segmentation. Figure 15 shows that OP-convNet outperforms previous methods across all metrics of segmentation performance. Vertebrae follow several patterns, with certain patterns found at particular spinal levels. For example, significant morphologic differences can be seen between two vertebrae that are far apart in the spinal column, such as a lower lumbar vertebra and an upper thoracic vertebra. Proper segmentation of all vertebrae is therefore a difficult process. To achieve a robust segmentation, vertebra-specific models require anatomical knowledge in the modeling process. A limitation of our proposed approach is found at vertebrae T6-T8, where poor segmentation has been observed because of the presence of rib structures. Also, vertebra T1 typically receives lower Dice scores than the other vertebrae. Figure 16 visualizes examples of poor segmentation. The lower T1 scores probably arise because T1 is the topmost vertebra in some scans, whereas other scans also include some cervical vertebrae, which sometimes causes the procedure to stretch the T1 vertebra into the C7 vertebra. This issue does not exist at the other end of the spinal column, specifically for the L5 vertebra, because all data sets include the sacrum, and L5 is not aligned with S5 in the sacrum. These cases could be addressed by training additional CNNs specifically to identify the C7/T1 transition, which may require more 2D image or 3D image volume datasets. Data from more patients and other institutions will be needed to demonstrate the consistency, transferability, and robustness of the model.
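The DSC and Jaccard values compared in this discussion are overlap measures between a predicted binary mask and its ground truth. A minimal NumPy sketch using the standard definitions is given below; the function name is hypothetical, and returning 1.0 when both masks are empty is a convention assumed here rather than something stated in the paper.

```python
import numpy as np

def dice_and_jaccard(pred, truth):
    """Dice similarity coefficient and Jaccard index of two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    inter = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    total = pred.sum() + truth.sum()
    # Assumed convention: two empty masks count as a perfect match
    dice = 2.0 * inter / total if total else 1.0
    jaccard = inter / union if union else 1.0
    return float(dice), float(jaccard)

# Toy masks that overlap in exactly one pixel
pred = np.array([[1, 1, 0, 0]])
truth = np.array([[0, 1, 1, 0]])
dice, jaccard = dice_and_jaccard(pred, truth)
print(dice, jaccard)
```

For a single pair of masks the two measures are linked by JAC = DSC / (2 − DSC), so they rank individual segmentations identically; averages over many slices or cases need not satisfy this relation exactly.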

V. CONCLUSION
This paper investigated the use of the deep learning OP-convNet framework for CT vertebrae segmentation. On the publicly available MICCAI CSI dataset, the proposed approach achieved outstanding precision, specificity, accuracy, and F-score in terms of patch-based classification accuracy, and outstanding BF-score, sensitivity (SEN), Jaccard index, and Dice similarity score (DSC) in terms of segmentation accuracy. Our research has demonstrated that dividing the input slices into overlapping patches, balancing the classes with the RUS-function, and training the convNet framework on these overlapping patches yields superior segmentation performance over previous methods. The proposed OP-convNet architecture produces predictions for each input patch independently, retaining more local spatial information. Our framework outperforms prior methods on all evaluation metrics, with an overall DSC of 89.9% for vertebral CT image segmentation. In a clinical application, segmented vertebrae can be used to prepare for subsequent automatic planning and positioning. The risk of deviation in a patient who undergoes an increased amount of bone biting can be minimized when pedicle screw implantation is employed. Additionally, single vertebra printing enables doctors to gain a better understanding of the tissue structure, which is especially beneficial for patients with scoliosis and enables surgeons to complete planning prior to surgery. Thus, our method improves the practicality and accuracy of segmentation results in support of clinical treatment without a complicated network architecture. Our future work will include more extensive validation, further improvement, and the possibility of applying the proposed OP-convNet to additional tasks, such as multi-class classification and segmentation of 3D medical images. Additionally, future generations of GPUs with larger internal memory will enable the proposed model to examine different pathologies on a large dataset.

CONFLICTS OF INTEREST
The authors declare no conflict of interest.