Brain Vessel Segmentation Using Deep Learning—A Review

This article provides a comprehensive review of deep learning-based blood vessel segmentation of the brain. Cerebrovascular disease develops when blood vessels in the brain are compromised, resulting in severe brain injuries such as ischemic stroke and brain hemorrhage, among others. Early detection enables patients to obtain more effective treatment before becoming critically unwell. Owing to their superior efficiency and accuracy compared with manual segmentation and other computer-assisted diagnosis procedures, deep learning algorithms have been extensively deployed in brain vascular segmentation. This study reviewed current articles on deep learning-based brain vascular segmentation, examined the proposed methodologies, particularly the network architectures, and determined the model trend. We evaluated challenges and crucial factors associated with the application of deep learning to brain vascular segmentation, as well as future research prospects. This paper will assist researchers in developing more sophisticated and robust deep learning solutions in the future.


I. INTRODUCTION
Cerebrovascular disease (CVD) or stroke is an acute interruption of cerebral vasculature leading to compromised perfusion of the brain parenchyma. Over the past decades, despite an increase in global stroke prevalence, the mortality rate has been decreasing owing to a longer life expectancy [1]. CVD also represents a significant cause of disability and mortality, and stroke is recognized as the leading cause of adult disability or functional loss and cognitive decline [2], [3], [4]. Additionally, it is widely accepted that about 85% of strokes are ischemic in nature (i.e., due to blockage), whilst the remainder are hemorrhagic (i.e., due to rupture) [3]. Therefore, recognizing stroke at an early stage and treating it promptly is important to prevent or minimize mortality and/or morbidity. Of note, studies also reported that up to 45% of cases of dementia are CVD-related [4].
The associate editor coordinating the review of this manuscript and approving it for publication was Kumaradevan Punithakumar.
The etiology of ischemic stroke includes microthrombosis, embolism, and lacunar, with up to 65% of the etiologies thought to be due to cerebral small vessel disease (CSVD) [2]. There are multiple cardio-cerebrovascular risk factors of stroke, with hypertension (i.e., elevated arterial blood pressure) serving as a leading risk factor, especially in women. Other cardio-cerebrovascular risk factors include type-2 diabetes, smoking, high body mass index (or obesity), drug use, and atrial fibrillation [2]. Hypertension may afflict anyone at any age, especially someone with a family history of hypertension. Researchers have discovered specific changes in brain vasculature due to hypertension over time.
As per a clinical hypothesis, cerebral vasculature changes, such as changes in the diameter and tortuosity, are frequently evident before hypertension develops symptoms [5]. Changes in cerebral vasculature and cerebral perfusion are also important indicators of the aetiogenesis of hypertension. Moreover, chronic uncontrolled hypertension may lead to CSVD, mainly in the deep subcortical region, such as the thalamus, pons, internal capsule, and cerebellum [6].
In addition, hypertensive individuals may also have greater genetic-based cerebrovascular susceptibility than nonhypertensive people, according to Warnert et al. [7], who proposed the hypertension-induced remodeling of cerebral vasculature to maintain blood circulation balance. Other research reinforces this finding, claiming that cerebral vascular remodeling and higher cerebral perfusion pressure occur before the onset of hypertension in both animal models and humans. Predictably, chronically elevated blood pressure has been linked to changes in carotid artery diameter in rats [8], while excessive or aberrant blood vessel tortuosity has been linked to multiple manifestations of ischemic stroke due to systemic hypertension, as reflected by brain and vasculature imaging [9], [10].
Neuroimaging of biomarkers is commonly used to detect CVD. Although neuroimaging and biomarker technologies have advanced in recent years, there is still much to learn about the pathogenesis of vascular disorders. Human intervention is frequently required for diagnosis, which is tedious and error-prone. Because the stroke rate indicates the need for effective early disease diagnosis, automating such tasks is one way to ease this burden. Medical imaging is becoming a more valuable and cost-effective method for diagnosis and prognosis, attracting researchers from various domains to work together to provide reasonable solutions. Automation in medical imaging is a research field with high potential, where researchers believe that, with enough research, an intricate condition like CVD can be accurately detected [11]. The aforementioned studies show the significance of high-precision early diagnosis of the cerebral blood vessels, and image segmentation techniques can help solve the problem.
Roentgen discovered the first technique of structural imaging in 1895, termed X-ray [12]. However, it was not until 1927 that Egas Moniz conducted the first human cerebral angiography [13]. Before 1927, Haschek and Lindenthal had injected an opaque fluid into human corpses to create radiographs of blood vessels. The latest advances in science and computing have resulted in increasingly sophisticated systems for acquiring data from the brain. Computed tomography (CT), positron emission tomography (PET), and magnetic resonance imaging (MRI) are the three primary techniques that have been utilized for decades; MRI was created most recently by Nobel laureates Lauterbur and Mansfield. Magnetic resonance angiography (MRA) is a collection of techniques that leverage MRI to depict the brain's blood vessels in detail. Time-of-flight MRA (TOF-MRA) is the most frequently used modality nowadays for cerebrovascular radiography. Together with other imaging modalities, such as digital subtraction angiography (DSA), photoacoustic imaging (PAI), and transcranial Doppler (TCD), the techniques above have advanced our comprehension of the brain's vasculature, thereby improving our knowledge of the complexity of the central nervous system (CNS) [14]. Figures 1 & 2 show typical MRA image slices of the brain with and without the label.
The cerebral network of the brain is intricately connected to different brain tissues, making it difficult to physically identify the tiny arteries, let alone detect Blood Brain Barrier (BBB) leakage. Noise is an inherent component of all magnetic resonance images and degrades the image's resolution and contrast, which is critical for segmenting tiny brain vasculature. Using noise reduction to retrieve the brain's vascular network from an MR image is crucial in medical imaging. Numerous strategies for segmenting the vascular network from MR images have evolved, indicating a good chance of overcoming the problem through recent research. However, such an application is still in its infancy in the clinical setting. As medical imaging modalities advance at a breakneck pace, new application-specific segmentation challenges emerge, and novel approaches are regularly investigated and proposed [17]. Choosing the most appropriate method for a particular application is a difficult task.
Numerous studies on segmentation have been conducted, including atlas-based algorithms [18], [19], [20], active contour models [21], [22], machine learning techniques [18], [23], and statistical models [24], [25]. A previous review on blood vessel segmentation discussed these methods in detail [26]. The proposed models can be classified as manual, semi-automated, or automated. Of all the models, the active contour model (ACM) was, until recently, the most extensively used clinically; it identifies structures in images based on their edges, regions, or higher-level knowledge [27]. When it comes to microscopic features, the ACM has limitations, and its time complexity increases as data volume grows. Since the problem is well known, researchers are looking for a more robust solution, and deep learning is becoming more popular as an alternative. The first deep learning-based segmentation was performed very recently by Phellan et al. [28].
Since 2017, a large amount of deep learning-based research has been done on brain blood vessel segmentation, leading to a focus on developing Computer-Aided Diagnosis (CAD). Radiologists employ CAD tools to recognize and evaluate medical images automatically. CAD provides a crucial second opinion and reduces intra- and interobserver variability, allowing for faster, more accurate, and more consistent diagnosis. Conventional CAD systems can automatically diagnose various CVD disorders, including intracranial aneurysms (IA). However, due to low sensitivity and high false positive (FP) rates, such methods are not commonly used in medical practice. Thanks to the advancement of deep learning models and computer vision in medical imaging, CAD systems have recently evolved. In recent years, MRA has been regularly used in CAD-based systems for IA that incorporate various deep learning architectures. Current methods used in CAD systems to diagnose IA include a 2D CNN model to detect IA on maximum intensity MRA [31], the DeepMedic CNN on TOF-MRA [32] and CTA [33], an 18-layer residual CNN on MRI [34], and a 3D ResNet on TOF-MRA CTA [37], with sensitivity ranging from 70% to 94%. Recent advancements in CAD systems suggest an increase in medical research. More on the development of CAD-based systems for IA can be found in [38]. If a CAD-based system can be enhanced to the point where its sensitivity and accuracy are clinically beneficial, it will improve radiologists' capacity to diagnose from brain imaging.
There has been extensive research on segmenting the cerebrovascular system using deep learning in the past five years. To our knowledge, no review article has explored the present application of deep learning approaches to the segmentation of brain arteries. This paper will explore current trends in deep learning-based model architectures for segmenting brain images for vascular extraction. In addition, it will investigate the limitations and scope of future research in this field.
We included studies from 2017 that established cerebrovascular segmentation as the primary task. Articles discussing strategies, such as vessel wall segmentation and artery tracing approaches for cerebrovascular segmentation, were excluded from consideration.

II. RECENT GROWTH IN DEEP LEARNING IN MEDICAL IMAGING
Deep learning-based techniques for medical imaging have grown in popularity in recent years due to their robust feature extraction, accurate classification, and compatibility. The Convolutional Neural Network (CNN) architecture is the most frequently used deep learning architecture for image processing and segmentation. The features extracted using many layers (convolutional layers, pooling layers) are highly robust and impractical to produce manually. Depending on the input data, 2D, 2.5D, and 3D CNNs are utilized for medical imaging. In a 2D CNN, the input image is given in a two-dimensional format so that a two-dimensional filter can be applied for segmentation. With transfer learning, a similar architecture was used, in which 2D models pre-trained on ImageNet were used in conjunction with low-level filters [40]. The 2.5D architecture was developed because it delivers much more spatial information than the 2D design at a lower computational cost than the 3D architecture. According to some studies, the 2.5D training technique with 2D labeled data is more compatible with present technology than the 3D training technique [41], [42], [43]. However, 2.5D models cannot employ 3D filters, which require a 3D CNN, since the architecture is still limited to 2D kernels. In the 3D architecture, the voxels from 3D patches are used to predict the label, as in a 2D CNN but with more spatial information. Most medical images are in 3D format, and researchers have preferred the 3D architecture as processing capacity has become available [44].
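The difference between 2D and 2.5D inputs can be illustrated with a short sketch. The function name `make_25d_input`, the `context` parameter, and the boundary handling (repeating edge slices) are assumptions made for illustration; the reviewed studies may build their 2.5D inputs differently.

```python
import numpy as np

def make_25d_input(volume, index, context=1):
    """Stack a slice with its neighbours so a 2D network sees some depth context.

    volume  : 3D array shaped (depth, height, width)
    index   : index of the central slice
    context : number of neighbouring slices on each side (illustrative choice)
    """
    depth = volume.shape[0]
    # Clamp indices at the volume boundary, which repeats the edge slices.
    picks = np.clip(np.arange(index - context, index + context + 1), 0, depth - 1)
    # Result is (2 * context + 1, height, width): neighbouring slices as channels.
    return volume[picks]

volume = np.arange(4 * 3 * 3, dtype=np.float32).reshape(4, 3, 3)
stack = make_25d_input(volume, index=0, context=1)
print(stack.shape)  # (3, 3, 3)
```

The stacked slices act as input channels, so the network still uses 2D kernels while receiving limited information from adjacent slices.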
The fully convolutional network (FCN) is another network, proposed by Long et al. [30]. FCN substitutes the final fully connected layer with a fully convolutional layer, enabling the network to make pixel-by-pixel predictions. This layer enables dense pixel-wise prediction in a single forward pass from a full-sized image, in contrast to patch-wise prediction. High-resolution activation maps are linked with upsampled outputs and fed into the convolution layers to create a more precise result by enhancing localization performance. FCN is frequently utilized to segment organs [45], [46] using 2.5D and 3D images. There are further FCN versions, including Cascade FCN [47], Focal FCN [48], and Multi-stream FCN [49], that are widely used in medical imaging with high accuracy. One of the most commonly used architectures in medical imaging today is U-net, which was proposed by Ronneberger et al. [29]. This model employs deconvolution and FCN to create a U-shaped architecture comprising 19 layers. Two steps are included in the model: analysis and synthesis. The analysis step makes use of a CNN structure with layers for downsampling. The synthesis is accomplished by a series of upsampling layers followed by a deconvolution layer. The first structure was designed for 2D images and lacked localization capability. Later, Çiçek et al. [50] created the 3D U-net to provide additional spatial information to the network, which has been employed in vascular border identification [51]. 3D U-net is a memory-intensive algorithm. V-net, presented by Milletari et al. [52], is the most well-known adaptation of U-net. Other deep learning models are also being applied in medical imaging, including Convolutional Residual Networks (CRNs) [53], Recurrent Neural Networks (RNNs) and their variations, long short-term memory (LSTM), Contextual LSTM [54], Gated recurrent unit (GRU), and clockwork RNN (CW-RNN). More details on the models and their applications are discussed in [55].
Figure 3 provides the U-net network structure.
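The analysis and synthesis steps can be illustrated with a shape-only sketch of a single U-net level. This is a simplification made under stated assumptions: the learned convolutions are omitted, max pooling stands in for downsampling, nearest-neighbour repetition stands in for the learned upsampling, and the helper names are hypothetical.

```python
import numpy as np

def max_pool2x2(x):
    """Halve height and width of a (channels, H, W) array by 2x2 max pooling."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def upsample2x(x):
    """Double height and width by nearest-neighbour repetition."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def unet_skip_step(features):
    """One analysis/synthesis level without learned convolutions."""
    skip = features                      # high-resolution map kept for the skip connection
    bottleneck = max_pool2x2(features)   # analysis: reduce spatial resolution
    restored = upsample2x(bottleneck)    # synthesis: return to the input resolution
    # Concatenate along the channel axis, as in the U-shaped design.
    return np.concatenate([skip, restored], axis=0)

x = np.random.default_rng(0).random((4, 8, 8))
out = unet_skip_step(x)
print(out.shape)  # (8, 8, 8)
```

The concatenation is what lets later convolution layers combine coarse context with the high-resolution activation maps.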

III. DEEP LEARNING USED FOR CEREBRAL VESSEL SEGMENTATION
Recent advances in deep learning are transforming medical imaging, particularly cerebrovascular segmentation. A substantial amount of research is being conducted on this topic utilizing deep learning. Generally, a deep learning model for vessel segmentation follows the generalized pipeline shown in Figure 4, which summarizes the multiple works on the topic.

A. DATASET AND EVALUATION METRICS
The study of brain vascular segmentation relies on Magnetic Resonance Imaging (MRI), of which MRA is a particular type. Because of its short echo time and utilization of flow correction, TOF-MRA is the most widely used technology for non-contrast bright-blood imaging of the human vasculature. Because of privacy and ethical concerns, most brain vessel segmentation (BVS) research uses TOF-MRA data acquired by the research team itself. As a result, most of the datasets utilized in earlier studies were private. Table 1 gives an overview of the datasets widely utilized by academics, including the resolution and quantity of the data.
In medical imaging, image voxels are categorized as vessel voxels (positive) or non-vessel voxels (negative). The predicted label of each voxel is compared with its ground truth label to determine the outcome for that voxel. True positive (TP), true negative (TN), false positive (FP), and false negative (FN) are the four fundamental measurements. The metrics are presented in Table 2 below.
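As a minimal sketch of how the four measurements can be counted from a predicted mask and a ground truth mask (the function name and the toy arrays are illustrative, not taken from any reviewed study):

```python
import numpy as np

def confusion_counts(pred, truth):
    """Count TP, TN, FP, FN voxels from two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = int(np.sum(pred & truth))     # vessel predicted, vessel in ground truth
    tn = int(np.sum(~pred & ~truth))   # background predicted, background in ground truth
    fp = int(np.sum(pred & ~truth))    # vessel predicted, background in ground truth
    fn = int(np.sum(~pred & truth))    # background predicted, vessel in ground truth
    return tp, tn, fp, fn

truth = np.array([1, 1, 0, 0, 1, 0])
pred = np.array([1, 0, 0, 1, 1, 0])
print(confusion_counts(pred, truth))  # (2, 2, 1, 1)
```

The same function works unchanged on 3D volumes, since the counts are taken over all voxels.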
In BVS using deep learning, data annotation is a significant component of the process. As most studies follow a supervised technique, ground truth for the data is mandatory. Even though some research tries to adopt an unsupervised method, ground truth is still essential to examine the unsupervised output and measure the model's performance. In most situations, the annotation is done manually by experienced observers with several years of expertise in radiology. The observer utilizes software to segment each voxel manually. Software frequently used for ground truth segmentation includes ITK-SNAP [61] and MeVisLab [62].
Usually, the annotation process is determined by the data collection technique. Before segmenting the actual mask, image processing, active contour techniques, or statistical models are employed to identify the Region of Interest (ROI). For example, in the paper [63], the observer used ITK SNAP software to generate a pre-segmentation mask using the active contour segmentation pipeline. Later, domain experts utilized the pre-segmentation mask for post-manual enhancement. In the study [64], the grey transformation was utilized as a method of image processing to help distribute grey image values for improved annotation. In a different study [15], histogram-based thresholding on maximum image intensity was employed to select the ROI, which was then manually annotated by an observer. Manual segmentation may require post-processing to guarantee that the mask has no discontinuous regions or holes [65].
Some of the principal assessment metrics typically utilized in BVS studies are listed in Table 3. A few metrics may have distinct names but equivalent expressions; for instance, DSC and F1 score are equivalent, and the true positive rate (TPR) is equivalent to Recall and Sensitivity. The average Hausdorff distance from point set X to Y is the sum, over all points in X, of the minimum distance to any point in Y, divided by the number of points in X, where X is the ground truth and Y is the segmentation.
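The equivalence of DSC and F1 score, and the definition of the average Hausdorff distance given above, can be checked with a small sketch. The function names and the toy point sets are illustrative assumptions, and the distance shown is the directed form from X to Y as described in the text.

```python
import numpy as np

def dice_score(tp, fp, fn):
    """Dice similarity coefficient from confusion counts."""
    return 2 * tp / (2 * tp + fp + fn)

def f1_score(tp, fp, fn):
    """F1 score as the harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # also called sensitivity or TPR
    return 2 * precision * recall / (precision + recall)

def average_hausdorff(x_points, y_points):
    """Directed average Hausdorff distance from point set X to point set Y."""
    x = np.asarray(x_points, dtype=float)
    y = np.asarray(y_points, dtype=float)
    # Pairwise Euclidean distances, shape (len(X), len(Y)).
    pairwise = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)
    # For each point in X take the closest point in Y, then average over X.
    return pairwise.min(axis=1).sum() / len(x)

tp, fp, fn = 2, 1, 1
print(dice_score(tp, fp, fn), f1_score(tp, fp, fn))  # both equal 2/3 up to rounding

X = [[0, 0, 0], [0, 0, 2]]  # ground truth voxel coordinates (toy example)
Y = [[0, 0, 1]]             # segmentation voxel coordinates (toy example)
print(average_hausdorff(X, Y))  # 1.0
```

The pairwise distance matrix grows with the product of the two point counts, so for full volumes a spatial index or distance transform would likely be preferable.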

B. PREPROCESSING
Deep learning algorithms typically extract features from unprocessed data, with researchers mainly focusing on model optimization rather than data preprocessing. However, some standard preparation is required for the medical image because it contains noise. Following is a discussion of some standard approaches utilized to solve this issue.
When dealing with a model based on deep learning, the dataset must be preprocessed in a certain way before it is fed to the model. Because the models perform feature extraction automatically, some preprocessing is necessary to eliminate discrepancies in the extracted features. The primary issue with medical data is the limited size of the datasets. Data augmentation is often used to address this issue, in which data is added with low noise, rotation, blur effect, or Gaussian blur. Augmentation techniques are frequently used in preprocessing for cerebrovascular vessel identification [15], [55], [66]. Normalization is a widely used approach for reducing bias in any dataset and can be used in conjunction with any machine learning technique. Normalization means rescaling the values of the data without changing their nature. Researchers utilize various forms of normalization, although z-score normalization is favored [15], [66]. Bias field correction is another often-used technique for optimizing model performance. N4 bias correction [15], [32] and multiplicative intrinsic component optimization (MICO) [67] are two of the most frequently utilized bias correction algorithms. MRA images typically include the skull as well as the brain itself. Skull stripping is the process of removing the skull from a brain scan, and the BET2 algorithm can be used for this step [32], [67]. Resampling the image, performing a maximum intensity projection (MIP) [67], creating a three-dimensional generalized Gauss Markov random field (GGMRF) [68], and extracting three-dimensional patches are other common preprocessing methods used in cerebrovascular vessel segmentation.
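Three of the steps mentioned above (z-score normalization, maximum intensity projection, and 3D patch extraction) can be sketched as follows. The function names, the random stand-in volume, and the patch size are assumptions for illustration; actual pipelines typically also apply bias correction and skull stripping with dedicated tools.

```python
import numpy as np

def zscore_normalize(volume):
    """Rescale intensities to zero mean and unit standard deviation."""
    return (volume - volume.mean()) / volume.std()

def max_intensity_projection(volume, axis=0):
    """Collapse a 3D volume to 2D by keeping the brightest voxel along one axis."""
    return volume.max(axis=axis)

def extract_patch(volume, corner, size):
    """Cut a cubic patch of edge length `size` starting at `corner` (z, y, x)."""
    z, y, x = corner
    return volume[z:z + size, y:y + size, x:x + size]

rng = np.random.default_rng(0)
volume = rng.random((16, 32, 32))        # stand-in for a TOF-MRA volume
norm = zscore_normalize(volume)
mip = max_intensity_projection(norm, axis=0)
patch = extract_patch(norm, corner=(0, 8, 8), size=8)
print(mip.shape, patch.shape)  # (32, 32) (8, 8, 8)
```

Because vessels are bright in TOF-MRA, the projection tends to keep vessel voxels visible while discarding darker background along the chosen axis.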

C. MODEL ARCHITECTURE
After preprocessing, the creation of features is a necessary component of a deep learning algorithm. Unlike machine learning algorithms, deep learning models can automatically extract features from data. Researchers use different model architectures and algorithms to train and extract features. Following is a discussion of the current architectures used in BVS research, which we have categorized in accordance with the model architecture.

1) CUSTOM CNN MODELS
Previously, 2D and 3D CNN models with several layers were employed to segment the BVS data. We reviewed 11 2D and 3D CNN models in the following sections and summarized their performance evaluation in Table 4.
Phellan et al. [28] conducted the first study on deep CNN-based brain vessel segmentation (BVS). In the study, two convolution layers (each followed by a ReLU activation function) and two fully connected layers were employed to create a 2D CNN model. Although the accuracy was not particularly convincing (DSC: 0.764 to 0.786) compared to current models, the researchers suggested that a more complex deep CNN model could segment vessels more precisely. According to the authors, a basic CNN architecture requires a small amount of well-segmented ground truth data to get a satisfactory result appropriate for clinical use.
According to a study [31], a 2D CNN model paired with MIP can significantly reduce the false positive (FP) rate and maintain a higher sensitivity (SEN: 0.70) than human radiologists. Instead of optimizing or introducing a model, the research focused on developing a CAD-based system that can be trained and evaluated using large-scale data. This implementation is available for use on multiple platforms as a plugin.
To address the issue of insufficient labeling of medical data, Zhao et al. [69] suggested a method in which erroneous tube-level labels for vessels were created and utilized to train a Hierarchical CNN (H-CNN) architecture. The H-CNN model was verified using stopping conditions that generated six quantification indices. The ground truth was partially annotated voxel-level labels at the circle of Willis.
Kandil et al. [70] proposed using TOF-MRA data to segment brain arteries using a 3D CNN model. The TOF-MRA data was divided into two subgroups based on the Circle of Willis (CoW) location: above the CoW, and at and below the CoW. The two groups were then fed into the 3D CNN model for segmentation. The method achieved a DSC score of 0.8437.
Fan et al. [67] conducted a study in which they employed Hidden Markov Random Fields (HMRFs) to pre-segment data before passing it through deep learning models. As thick blood vessels have a greater intensity difference, the HMRF approach extracted the thick blood vessels from the image of the brain. The HMRF approach used Gaussian distributions to represent the extracted vessels, which were later used as labels to train DNN models. Manual annotation was used to verify the results. Three models were compared: HMRF alone, HMRF with 2D SegNet, and HMRF with 3D U-Net. HMRF combined with a DNN produced better results than HMRF alone. The results were compared in the axial, coronal, and sagittal orientations using maximum intensity projection (MIP) images. While the DNN performed better in vascular segmentation, its primary limitation is that it requires a large quantity of data, which led the authors to investigate an unsupervised technique that requires less data. The method attained a DSC score of 0.79.
One of the primary drawbacks of 3D models is the excessive time needed to train them. To address this problem, Tetteh et al. [71] suggested a novel deep learning architecture (DeepVesselNet) that utilizes 2D cross-hair filters in place of 3D filters. The study found that cross-hair filters considerably cut training time and memory consumption. To prevent over- and under-segmentation, a new class weighting with FP rate correction was incorporated, which improved recall and precision during training; the model achieved a DSC score of 0.87.
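The idea behind cross-hair filters can be sketched as three 2D filters applied on the three orthogonal planes through each voxel, with their responses summed. The sketch below is an illustration under stated assumptions (zero padding, odd kernel size, hypothetical function name, fixed rather than learned kernels) and is not the DeepVesselNet implementation.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def crosshair_filter(volume, kernels):
    """Sum of three 2D cross-correlations on orthogonal planes through each voxel.

    volume  : 3D array shaped (depth, height, width)
    kernels : three (k, k) arrays with odd k, one per plane
    """
    k = kernels[0].shape[0]
    pad = k // 2
    padded = np.pad(volume, pad)                  # zero padding on every axis
    out = np.zeros(volume.shape, dtype=float)
    planes = [(1, 2), (0, 2), (0, 1)]             # the three orthogonal planes
    for kernel, axes in zip(kernels, planes):
        windows = sliding_window_view(padded, (k, k), axis=axes)
        # Remove the padding along the axis that was not windowed.
        other = ({0, 1, 2} - set(axes)).pop()
        index = [slice(None)] * 3
        index[other] = slice(pad, pad + volume.shape[other])
        windows = windows[tuple(index)]
        out += np.einsum("dhwij,ij->dhw", windows, kernel)
    return out

k = 5
volume = np.ones((7, 7, 7))
kernels = [np.ones((k, k)) for _ in range(3)]
response = crosshair_filter(volume, kernels)
print(response[3, 3, 3])   # 75.0: three planes of 25 ones each
print(3 * k * k, k ** 3)   # 75 weights for cross-hair versus 125 for a full 3D kernel
```

With a kernel size of 5, the cross-hair arrangement uses 75 weights per filter instead of 125, which gives a rough sense of where the memory and time savings could come from.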
Other CNN variants are frequently employed in vascular segmentation. Joo et al. [66] used a 3D Resnet architecture to classify, followed by a pixel-wise voting technique to generate bounding boxes around the vessel. However, the study is restricted to vessels with a diameter greater than 3 mm.
DeepMedic, a generic 11-layer 3D CNN model for medical image segmentation, specifically for brain imaging across imaging modalities, was proposed in [74]. The model employs a dual pathway architecture that simultaneously integrates local and broader contexts while processing several scales. A 3D Conditional Random Field (CRF) is used for post-processing, which successfully eliminates false positives. Initially, this model was used for brain lesion segmentation using MRI datasets. It was later used in various imaging modalities and segmentation tasks. Ziegler et al. [65] used contrast-enhanced MR angiography (CE-MRA) data to train the DeepMedic model for multi-label segmentation of the carotid arteries. DeepMedic was also compared to other state-of-the-art models in several research studies to assess its performance across diverse datasets and data modalities. In one study [73], DeepMedic was compared against U-net for a cerebrovascular segmentation task on DSA data, and DeepMedic was reported to outperform (DSC: 0.80) the widely used U-net model. In a further study [16], BRAVENET was evaluated with DeepMedic for cerebrovascular segmentation on a TOF-MRA dataset, and the results (DSC: 0.91 & 0.89, respectively) showed that DeepMedic outperformed BRAVENET by a small margin.
A DenseNet model was improved by incorporating dense connection and dilated convolution [72]. By extracting high-level semantic features and detailed low-level features, the proposed DD-CNN model performed more effectively (DSC: 0.97). The segmentation task was preceded by a preprocessing step that generated data labels and implemented a Clean-Mechanism model to enhance the quality of the automatically generated labels. The model generated successful outcomes for sparsely labeled data.

2) U-NET MODEL AND ITS MODIFICATIONS
U-net architecture is one of the most well-known deep learning models for medical imaging nowadays. Both 2D and 3D U-net produce extremely accurate segmentation results when images are properly preprocessed for cerebrovascular vessel segmentation [15], [75]. However, several variants of the U-net models are being developed in medical imaging and show great promise. In this survey, we reviewed 12 model architectures that utilized the U-net model and summarized their performance evaluation in Table 5.
To segment brain vessels from TOF-MRA data, Livne et al. [76] proposed a modification to the U-net architecture. For simplicity, the number of channels in each layer of the 2D U-net model was reduced by half. As the name implies, the half-U-net model is half the size of a traditional U-net model, lighter, and faster. The half-U-net model was compared to the classic U-net and graph-cut algorithms. Even though the performance (DSC: 0.88) was not significantly greater than that of the classic U-net model, training time and the number of parameters were significantly reduced.
Hilbert et al. [77] proposed a modification of the 3D U-net model that combined multiscale context aggregation and Deep Supervision (DS). Incorporating context aggregation into the U-net model was intended to improve the segmentation of small vessels, whereas DS was used to facilitate the convergence of intermediate layers to avoid exploding or vanishing gradient issues. Even though the network had more parameters and layers than the base U-net model, its segmentation performance (DSC: 0.93) was superior.
In another modification of the U-net architecture [78], 2D MIP features were projected into the 3D volume segmentation network to incorporate the dependability of the learned features instead of using complex features created empirically. The presented JointVesselNet model uses a 3D U-net as the segmentation branch and a half 2D U-net as the 2D composite MIP segmentation branch. The model's projection increases the local vessel probability. The DSC score of the model was 0.72.
Fu et al. [79] proposed a 3D CNN model for vessel segmentation in CTA-based images of the head and neck. The 3D CNN model consisted of two components: the ResU-net model, primarily responsible for bone segmentation and vessel extraction, and the Connected growth prediction model (CGPM), which was used to maintain vessel integrity. Bottleneck-Resnet (BR) was implemented in the modified U-net model to select the optimal parameters automatically. This model was successfully tested in clinical settings, and the performance (DSC: 0.94) in terms of time and accuracy was superior to manual segmentation.
To address multiscale spatial information of vessels, a new architecture was developed that blends 3D U-net with 3D FCN [80]. The anatomy of the vessel was obtained using two parallel channels. The 3D U-net learns local details, whereas the 3D FCN learns the general spatial link between vessels and adjoining tissues and the morphological information of the bigger vessels. With the highest DSC (72.91% and 69.32% on the CTA and MRA datasets, respectively), the MDNet-Vb model surpasses Resnet, DenseNet, 3D U-net, V-net, and DeepMedic.
Using multiscale inputs and residuals, Min et al. [81] proposed a modification to the U-net architecture. Inspired by [77], the proposed method added two 1×1×1 convolution layers on the final level to achieve a fully connected layer. To restore the original image size, the max-pooling layer was replaced with upsampling during the decoding process. The method demonstrates excellent segmentation accuracy and generalizability (DSC: 0.92).
Vos et al. [15] conducted five experiments using a CNN with U-net architecture. The experiments involved various types of data augmentation and varying input patch sizes. Experiments with 2D U-net and 3D U-net architectures showed that augmentation improves performance on TOF-MRA data (DSC: 0.72 to 0.83).
Cheng et al. [64] presented a method for segmenting intracranial aneurysms using unreconstructed 3D-RA sequencing data based on a U-net model. The Spatial Information Fusion (SIF) feature was obtained by recording multiple successive image frames to create a new image sequence in which the region of interest (ROI) was used to stitch the images together. In place of binary cross-entropy, the Focal Tversky Loss function was used to reduce the class imbalance between positive and negative data. The segmentation performance with the SIF feature was tested by comparing it to traditional features with a high Dice score (Avg DSC: 0.22).
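A commonly used formulation of the Focal Tversky Loss is sketched below to show how it weights false negatives and false positives differently. The hyperparameter values (`alpha`, `beta`, `gamma`) and the function name are assumptions for illustration and are not taken from [64].

```python
import numpy as np

def focal_tversky_loss(prob, truth, alpha=0.7, beta=0.3, gamma=4 / 3, eps=1e-7):
    """Focal Tversky loss for a binary segmentation (one common formulation).

    prob  : predicted vessel probabilities in [0, 1]
    truth : binary ground truth mask
    alpha : weight on false negatives (assumed value)
    beta  : weight on false positives (assumed value)
    gamma : focusing parameter (assumed value)
    """
    prob = np.asarray(prob, dtype=float).ravel()
    truth = np.asarray(truth, dtype=float).ravel()
    tp = np.sum(prob * truth)
    fn = np.sum((1 - prob) * truth)
    fp = np.sum(prob * (1 - truth))
    tversky = (tp + eps) / (tp + alpha * fn + beta * fp + eps)
    return (1 - tversky) ** (1 / gamma)

truth = np.array([1, 1, 0, 0])
missed = np.array([1, 0, 0, 0])    # one vessel voxel missed
spurious = np.array([1, 1, 1, 0])  # one background voxel marked as vessel
print(focal_tversky_loss(truth, truth))   # 0.0 for a perfect prediction
print(focal_tversky_loss(missed, truth) > focal_tversky_loss(spurious, truth))  # True
```

With `alpha` larger than `beta`, missing a vessel voxel costs more than adding a spurious one in this example, which is the behaviour sought when vessel voxels are a small minority of the volume.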
Lee et al. [82] proposed a new model (Spider U-net) for segmenting blood vessels from various organs (including brain vessels), with U-net serving as the baseline. The Spider U-net model was a 2D model architecture that took 3D images with inter-slice connectivity into consideration for vessel segmentation using an RNN. The model was divided into two components: the warp path, in which multiple 2D U-net models are used in parallel to extract spatial features sequentially; and the weft path, in which a bidirectional convolutional LSTM is used between the encoder and decoder of the warp path to capture the inter-slice connectivity along the z-axis. A new data-feeding strategy for the Spider U-net, the striding stencil (SS), was implemented to optimize memory and training. The model achieved a DSC score of 0.79.
Liu et al. [83] proposed a CNN model based on 3D U-net and MIP that is comprised of two streams: a spatial attention-guided 3D Inception U-Net segmentation stream and a 2D composited multi-directional MIPs U-Net segmentation stream. The method considers small and large blood vessels by combining 3D features with 2D MIP features in three directions. They substituted the convolution block with the inception block and incorporated the attention block in order to boost performance (DSC: 0.94) and reduce computation.
Simon et al. [84] presented an automated segmentation technique that used a 2D U-net model to segment the brain's anatomy, including the vessels. They combined the vascular anatomical information from multiple clinical MR image modalities (MRI, MRA) into a single anatomical map to automatically segment the different parts of the brain. A slight modification was made to the 2D U-net. Three additional input channels were added to accommodate three image modalities (3 MRI images), and the output channels were increased to five to classify five distinct brain regions within a single image.

3) OTHER MODELS
Many new deep learning algorithms are employed for vessel segmentation in addition to the traditional deep learning approaches listed above. Here, we have reviewed six relatively new model architectures used for BVS and summarized their performance evaluation in Table 6.
Autoencoder (AE) is a deep learning algorithm that compresses the input into a lower dimension (called representation) in latent space and then reconstructs the output from this lower dimension to the original input dimension.
L. Chen et al. [85] introduced a convolutional autoencoder (CAE) model for cerebrovascular segmentation from 3D TOF-MRA data. The 8-layer CAE model used the structural advantages of autoencoder, which is typically used for noise reduction in images and is employed in a supervised manner. When compared, the model outperformed (DSC: 0.74) three traditional methods (Renyi entropy, Phansalkar local threshold, and Frangi vesselness filter).
The attention mechanism is a deep learning technique for image recognition that focuses on a smaller but vital portion of the image. H. Zhang et al. [86] proposed the Reverse Edge Attention Network model (RE-net), inspired by the reverse attention mechanism. The model identifies the principal features that include edge information and removes extraneous features. The data were preprocessed with the Retinex model to remove image noise and redundancy before being fed through the RE-net model. The proposed model outperformed (DSC: 0.69) the other models tested in the study.
Ni et al. [87] presented a multi-path module and attention mechanism to segment the cerebral vessels. The multi-path module ensured that the network's many convolution and pooling layers did not diminish the extracted feature information. Initially, a 1 × 1 convolution and bilinear interpolation were used to generate two features, which were then sent into the attention module (USM) to extract additional contextual information. Finally, a 1 × 1 convolution operation was performed to reduce the dimensions of the features. The model achieved a DSC score of 0.97.
Li et al. [88] suggested a novel attention-based medical image segmentation technique evaluated on various organ datasets, including intracranial arteries. The model contains a channel self-attention encoder (CSE) for calculating the similarity between pixels to learn the feature graph's long-range correlations more effectively. In the upsampling stage, the spatial attention up-sampling (SU) block was employed to restore the low-resolution information to its original state by focusing more on the critical pixels. The model's DSC score was 0.87.
The Generative Adversarial Network (GAN) is a deep learning technique that learns from the training data and generates synthetic data imitating it. The generated synthetic data inherit the characteristics of real-life data. Kossen et al. [90] proposed the creation of 2D synthetic data using a GAN to handle data augmentation and anonymization for brain vessel segmentation. A U-net model was trained on the GAN-generated synthetic data and then tested on real data, yielding a positive outcome (DSC: 0.90 on real data; 0.82 & 0.88 on synthetic data).
Similar work has been conducted by Subramaniam et al. [91] in which 3D TOF-MRA and labels were created using a variation of the GAN model - Wasserstein GAN (WGAN). Four distinct types of WGAN were employed to produce pairs of patches and labels. The data were used to build a 3D U-net model separately to assess the performance of synthetic data in comparison to actual data. SN-MP with double filters per layer (c-SN-MP) model performed (DSC: 0.84) the best among all four WGAN models.

IV. DISCUSSION
The construction of appropriate surgical designs is facilitated by the knowledge of the branching pattern and spatial interactions between different vessels. Unlike organ segmentation, vessel segmentation is challenging due to the vessels' complex, heterogeneous background and significant noise directly influencing segmentation results. The shape-based approaches to organ segmentation work well and can be easily combined with other methods to improve segmentation outcomes. However, applying shape models to segment vessels with a branched tree topology is difficult due to the vessels' detailed structure and the presence of small image components. In addition, data noise and variable intensity ranges have a direct influence on segmentation approaches [92]. The U-net model was the most investigated in brain vascular segmentation since it showed significant promise in medical imaging. Recent publications have proposed several variants of the U-net model that use various strategies to minimize training time and improve accuracy. A small number of recent studies in the domain have utilized attention mechanisms, and the recent introduction of GANs has opened a new opportunity. In the following sections, we discuss the challenges that are currently evident in BVS.

A. CHALLENGES WITH DIFFERENT DIMENSIONS
Vessels run not just in the X-Y plane but also along the Z-axis, so 2D techniques lose vital information along the Z-axis when applied to 3D images. Subtle variations in an image's intensity can also significantly affect the final segmentation results. Complex vessel geometry and topological changes, sparse vessel data in a large 3D volume, and a scarcity of available 3D vasculature datasets all pose considerable obstacles to 3D cerebrovascular segmentation. Domain scientists frequently employ maximum intensity projection (MIP) to observe and analyze the vascular structure in three dimensions for diagnosis. Its adaptability to geometric variation and scaling can improve the local vessel signal by suppressing noise. The projection of a three-dimensional volume into a two-dimensional MIP space can increase the local vessel probability and the SNR [78].
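As a brief illustration of the projection step, the sketch below computes maximum intensity projections of a 3D volume along each of its three axes with NumPy. The toy volume and the function name `mip_three_views` are hypothetical, and no anatomical orientation is assumed for the axes.

```python
import numpy as np

def mip_three_views(volume):
    """Return the maximum intensity projection along each axis of a 3D volume."""
    volume = np.asarray(volume)
    if volume.ndim != 3:
        raise ValueError("expected a 3D volume")
    # Each projection keeps the brightest voxel along the collapsed axis.
    return tuple(volume.max(axis=axis) for axis in range(3))

# Toy volume: dark background with one bright "vessel" running along the last axis
vol = np.zeros((4, 5, 6))
vol[2, 3, :] = 1.0

mip_axis0, mip_axis1, mip_axis2 = mip_three_views(vol)
```

In this toy case the vessel appears as a full bright line in the first two projections and as a single bright pixel in the third, which shows why projections from several directions are combined: a structure that is nearly invisible in one view can be prominent in another.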
The difference between 2D and 2.5D models is that a 2.5D approach trains a 2D model on 2D image slices and then post-processes the output into a 3D result. Such a model still fails to capture the spatial features that exist in the 3D image. When dealing with three-dimensional medical volumes, memory utilization and processing cost increase. Compared to 2D CNNs, optimizing and executing calculations for 3D CNNs requires an enormous amount of time. However, when a 2D CNN is applied in a slice-by-slice fashion, crucial 3D context information for tracking curvilinear structures is lost. To handle this dilemma, 3D cross-hair filters were proposed [71].

B. CHALLENGES WITH DEEP LEARNING IN BVS
While deep learning has improved accuracy in categorizing medical images, there are still certain limitations. The first issue with the Deep Learning model is data scarcity. Deep learning models perform best when sufficient data is provided from which to learn. Still, it is challenging to obtain adequate medical data for training and testing purposes. Annotating data is another significant difficulty since it involves human interaction and takes substantial time for medical specialists, making the work laborious and expensive.
In BVS research, the dataset is a significant concern because privacy regulations complicate research data sharing. This creates a deadlock for deep learning researchers. As a result, more than 80% of the datasets used in the BVS studies reviewed in this article were private. This brings up the following issue: the applicability of the presented models in clinical settings. Most research focuses on a particular data distribution, which raises the question of how well a model will function with data from a different distribution. For this issue to be resolved and for the research to flourish, a substantial public brain imaging database is necessary.
Researchers apply a range of approaches to address these problems. Data augmentation is a method for overcoming data limitations by introducing minor transformations to the data, such as rotation, blurring, and mirroring. Another option is to employ transfer learning, in which previously acquired information is applied to new data to initialize the parameters of the new model. Patch-wise training is a method that divides an image into numerous patches, which increases the number of training samples obtained from limited data. Creating synthetic data using a GAN model could be another approach to the problem. It has the potential to generate synthetic data that appear authentic, thereby resolving the privacy issue. Recently, Kossen et al. [93] attempted a similar task. However, due to the sensitivity of brain data, research is currently ongoing to determine the quality of synthetic data.
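The augmentation and patch-wise strategies described above can be sketched in a few lines of NumPy. This is a minimal 2D illustration, not the pipeline of any reviewed study: the function names, the restriction to mirroring and 90-degree rotations, and the patch and stride sizes are all assumptions.

```python
import numpy as np

def augment(image, label, rng):
    """Apply the same random mirror and 90-degree rotation to image and label."""
    if rng.random() < 0.5:
        image, label = np.flip(image, axis=1), np.flip(label, axis=1)
    k = int(rng.integers(0, 4))  # number of 90-degree rotations
    return np.rot90(image, k), np.rot90(label, k)

def extract_patches(image, patch, stride):
    """Split a 2D image into square patches of side `patch`, stepping by `stride`."""
    height, width = image.shape
    return np.stack([
        image[r:r + patch, c:c + patch]
        for r in range(0, height - patch + 1, stride)
        for c in range(0, width - patch + 1, stride)
    ])

rng = np.random.default_rng(0)
image = rng.random((8, 8))      # stand-in for one image slice
label = image > 0.5             # stand-in for its binary vessel mask

aug_image, aug_label = augment(image, label, rng)
patches = extract_patches(image, patch=4, stride=4)
```

The essential design point is that every geometric transformation is applied identically to the image and its label, so the annotation stays aligned with the voxels it describes.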
Another difficulty in deep learning is data imbalance, which occurs when target samples and healthy samples are not equally represented. Typically, in medical imaging, healthy data is abundant, while target data is sparse, resulting in a data imbalance and a significant gap in model accuracy. The issue can be mitigated by reweighting samples during training, with a larger weight assigned to foreground patches [94].
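One simple way to express such reweighting is a weighted binary cross-entropy in which foreground samples count more than background samples. The sketch below is an illustration under stated assumptions rather than the scheme of [94]: it weights individual samples instead of patches, and the weight `fg_weight=10.0` is an arbitrary example value.

```python
import numpy as np

def weighted_bce(probs, target, fg_weight=10.0, eps=1e-7):
    """Binary cross-entropy with a larger weight on foreground samples."""
    probs = np.clip(np.asarray(probs, dtype=np.float64), eps, 1.0 - eps)
    target = np.asarray(target, dtype=np.float64)
    weights = np.where(target == 1, fg_weight, 1.0)
    per_sample = -(target * np.log(probs) + (1.0 - target) * np.log(1.0 - probs))
    # Normalize by the total weight so the loss scale stays comparable
    return float(np.sum(weights * per_sample) / np.sum(weights))

# One foreground sample among four, poorly predicted as background
target = np.array([1, 0, 0, 0])
probs = np.array([0.1, 0.1, 0.1, 0.1])

plain = weighted_bce(probs, target, fg_weight=1.0)      # ordinary mean BCE
weighted = weighted_bce(probs, target, fg_weight=10.0)  # foreground emphasized
```

Here the missed foreground sample dominates the weighted loss, so the optimizer is pushed harder to correct it than it would be under the unweighted average.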
While training a deep learning model, several difficulties may arise. Overfitting is a significant issue and typically happens when sparse training data is utilized: the model does an excellent job of learning the patterns and noise in the training data but fails to generalize to similar unseen data used for testing. Dropout can be utilized to mitigate the overfitting problem. Another significant concern with deep learning is the lengthy training period required to learn and predict. This topic is under active research, and researchers are still working toward an efficient solution. To date, pooling layers have been used to lower the dimension of the feature vector and hence the processing time. Gradient vanishing is another issue in deep learning; it occurs when deep models fail to adequately backpropagate the final loss, so that model performance stops improving. Due to the enormous number of parameters and the minimal voxel variance between the target and nearby voxels, this issue is even more significant in 3D models. Reducing the search space in which the target voxels are positioned can significantly reduce the complexity of the 3D model.

C. FINDING PROPER LOSS FUNCTION
Finding a proper loss function is another critical problem for BVS. Blood vessels make up less than 3% of the voxels in a patient's image volume. This bias toward the background class is typical in medical data. Existing class-balancing loss functions for training CNNs are numerically unstable in extreme cases. When training with the current cost function under significant class imbalance, the process may be skewed toward identifying irrelevant background voxels, which typically yields high precision at the cost of low recall. An inappropriate loss function may raise two significant problems. First, there is the problem of numerical instability: because the loss takes very large values, the gradient computation becomes numerically unstable for large training sets. Second, a high false-positive rate presents difficulties; it is indicated by high recall accompanied by low precision in both the training and testing stages.
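The effect of such an imbalance on simple metrics can be illustrated with a toy example. The sketch below assumes a volume of 1000 voxels with 3% foreground (an illustrative figure matching the order of magnitude quoted above) and compares two degenerate predictions; the helper name `precision_recall` is hypothetical.

```python
import numpy as np

def precision_recall(pred, target):
    """Precision and recall of a binary prediction (0.0 when undefined)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    tp = int(np.sum(pred & target))
    fp = int(np.sum(pred & ~target))
    fn = int(np.sum(~pred & target))
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall

target = np.zeros(1000, dtype=bool)
target[:30] = True                           # 3% of voxels are vessel

all_background = np.zeros(1000, dtype=bool)  # predicts no vessel anywhere
all_foreground = np.ones(1000, dtype=bool)   # predicts vessel everywhere

accuracy_bg = float(np.mean(all_background == target))
pr_bg = precision_recall(all_background, target)
pr_fg = precision_recall(all_foreground, target)
```

The all-background prediction reaches high voxel-wise accuracy while detecting no vessel at all, whereas the all-foreground prediction attains perfect recall with very low precision. A loss function for BVS has to steer training away from both extremes.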

D. ISSUES WITH EVALUATION METRICS
Finding the proper metrics to test the model is one of the significant challenges associated with brain vascular segmentation using deep learning. This becomes clear when the evaluation criteria of the studies mentioned above are closely examined. DSC is the primary evaluation metric in all the results, followed by Hausdorff Distance (HD) and variations of HD such as 95% HD, balanced Average Hausdorff Distance (bAVD), and Average Hausdorff Distance (AVD). Sensitivity, Specificity, recall, Conformity, Sensibility, and numerous other metrics are often employed to evaluate the efficacy of a model for cerebrovascular segmentation. Although the DSC is empirically preferred, there is no scientific evidence that the Dice coefficient, or any other metric, is the best option for arterial brain vascular segmentation. As a result, reported model performance is ambiguous. Aydin et al. [95] studied the evaluation ambiguity of vascular segmentation using a manual visual score and discovered that HD and AHD have the highest average correlation with it among 22 regularly used evaluation metrics. DSC, on the other hand, ranked 7th in correlation with the visual score. DSC and other similarity-based performance metrics overlook the importance of voxel localization in cerebral vascular segmentation, whereas distance-based metrics do not. Therefore, according to the study, HD and AHD should be the fundamental evaluation metrics rather than DSC.
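The difference between overlap-based and distance-based metrics can be seen in a small constructed example. The sketch below implements DSC and a brute-force symmetric Hausdorff distance in NumPy; the function names and the toy masks are assumptions for illustration, and the pairwise-distance approach is only practical for small masks.

```python
import numpy as np

def dice(pred, target):
    """Dice similarity coefficient of two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    denom = int(pred.sum()) + int(target.sum())
    return 1.0 if denom == 0 else 2.0 * int(np.sum(pred & target)) / denom

def hausdorff(pred, target):
    """Symmetric Hausdorff distance between foreground voxels (brute force)."""
    a = np.argwhere(np.asarray(pred, dtype=bool))
    b = np.argwhere(np.asarray(target, dtype=bool))
    if len(a) == 0 or len(b) == 0:
        raise ValueError("both masks need at least one foreground voxel")
    # Pairwise Euclidean distances between every foreground voxel of a and b
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return float(max(d.min(axis=1).max(), d.min(axis=0).max()))

target = np.zeros((1, 20), dtype=bool)
target[0, 0:4] = True                 # true vessel occupies columns 0 to 3

pred_near = np.zeros((1, 20), dtype=bool)
pred_near[0, [0, 1, 2, 4]] = True     # one voxel misplaced next to the vessel

pred_far = np.zeros((1, 20), dtype=bool)
pred_far[0, [0, 1, 2, 19]] = True     # one voxel misplaced far from the vessel

dice_near, dice_far = dice(pred_near, target), dice(pred_far, target)
hd_near, hd_far = hausdorff(pred_near, target), hausdorff(pred_far, target)
```

Both predictions overlap the ground truth in the same number of voxels and therefore obtain the same DSC, yet their Hausdorff distances differ widely because only the distance-based metric registers where the misplaced voxel lies.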

V. CONCLUSION
The brain vascular network is a vital component of the human body that might exhibit life-threatening abnormalities. For specialized clinical activities requiring surgical design planning, the development of CAD-based systems, and early patient diagnosis, segmentation techniques with varying degrees of precision may be required. Automated segmentation can also help the radiologist segment the vessels more efficiently. In the past, researchers proposed a variety of supervised and unsupervised strategies, which lacked accuracy and generalizability. Deep learning is relatively new in this field, but its popularity is growing quickly due to its effectiveness. Deep learning's robust feature extraction approach surpasses machine learning's hand-crafted features. This paper examined articles from the previous five years on brain blood vessel segmentation using deep learning. Our primary contribution is analyzing existing deep learning models and the challenges faced while segmenting brain vasculature. It will assist researchers in gaining a complete understanding and developing a potent segmentation model for brain vessels.

VI. FUTURE PROSPECT
Current developments in BVS have the potential to produce a CAD-based system for precise diagnosis. To get clinical approval for such a CAD-based system, researchers should focus on specific areas. Almost all the studies centered on segmenting the vessel networks, which are already sufficiently complicated. However, more emphasis should also be placed on segmenting small vessels. Cerebral Small Vessel Disease (CSVD) is a sub-branch of CVD that occurs with blood-brain barrier (BBB) leakage of small vessels inside the brain tissue, which creates various complications. The sharp characteristics of the vascular network necessitate a more sophisticated feature extraction technique for segmenting small vessels. The selection of cost functions and evaluation metrics in BVS is subject to ambiguity. A clear rationale and guidelines are required for choosing appropriate metrics and cost functions. Generalizability is one of the critical challenges with BVS models. Most of the work is performed on datasets gathered in closed environments. Models need to be cross-validated against other data distributions to improve their generalizability. Due to the lack of publicly available data, it becomes impossible to do so. In addition to more public data, sharing the weights of prior models would help future researchers extend existing work.