A Survey of Deep Learning for Retinal Blood Vessel Segmentation Methods: Taxonomy, Trends, Challenges and Future Directions

Recent advancements in deep learning architectures have extended their application to computer vision tasks, one of which is the segmentation of retinal blood vessels from retinal fundus images, a problem that has piqued researchers' interest in recent times. This paper presents a review of the taxonomy and an analysis of the enhancement techniques used in recent works to modify and optimize the performance of deep learning retinal blood vessel segmentation methods. The objectives of this study are to critically review the taxonomies of the state-of-the-art deep learning retinal blood vessel segmentation methods, observe the trends in the enhancement techniques of recent work, identify the challenges, and suggest potential future research directions. The taxonomies focused on in this paper include optimization algorithms, regularization methods, pooling operations, activation functions, transfer learning, and ensemble learning methods. In doing this, 110 relevant papers spanning the years 2016 to 2021 are reviewed. The findings could aid future research plans, while the suggested ideas would improve the predictive accuracy of future models for automatic retinal blood vessel segmentation with good generalization ability and optimal performance.


I. INTRODUCTION
The retina is the light-sensitive tissue responsible for central and peripheral vision, while retinal blood vessels nourish and cleanse the visual system. Any variations or damage to the vascular morphology of the retinal blood vessels may lead to ophthalmological and cardiovascular issues such as Diabetic Retinopathy (DR), Macular Oedema (ME), glaucoma, cataract, and hypertension [1], [2]. DR is an eye disorder that arises from microvascular complications in people with diabetes mellitus due to a high surge of glucose in the bloodstream. Thus, diabetic individuals are more prone to DR [3], [4]. The International Diabetes Federation (IDF) reveals that globally, 425 million people have diabetes, while an estimated 693 million are likely to become diabetic by 2045 [5]. The most common DR risk factors are prolonged or poorly managed diabetes mellitus, high blood pressure, pregnancy, age, and high cholesterol, all of which affect the visual tissue [6].
The associate editor coordinating the review of this manuscript and approving it for publication was Kathiravan Srinivasan.
DR is a sight-threatening eye disease, but it need not end in vision impairment if detected and treated at the early (mild) stage. Early detection of DR abnormalities helps the ophthalmologist make the diagnoses necessary for the effective prevention of complications that could lead to untimely blindness. Unfortunately, early detection is a challenging task due to the visual complexities of retinal fundus images, such as noise, varying contrast, uneven illumination, and variations in the appearance, shape, size, and location of the blood vessels, as shown in Figure 1. DR is relatively asymptomatic and progressive, as shown in Figure 2. DR progressively damages the small blood vessels in the retina until the sudden manifestation of impaired vision at a severe, high-risk stage [7], [8].
Thus, regular screening is necessary to facilitate early detection of DR. Early detection enables proper monitoring of the DR progression rate, which can be remarkably rapid from the early to the high-risk stage. Timely detection and treatment of DR abnormalities at the early stage can forestall up to 95% of cases of untimely, irreversible vision impairment [9] and stop future recurrence [10]-[13]. Vision loss or blindness through DR is incurable, but preventable.
Currently, DR is one of the leading causes of vision loss and blindness, mainly affecting the working-age population, i.e., adults [14], [15]. According to the World Vision Report, an estimated 2.2 billion people worldwide live with vision impairment or blindness. Of this number, about one billion could have been spared untimely blindness through early detection and proper clinical management [16]-[18]. This failure of early detection can be attributed to a scarcity of experts in the field, as well as high screening costs and the inaccessibility of screening facilities. These challenges are particularly prevalent among the less privileged and those in rural areas.
Manual screening, the standard approach, requires the skill of experts, who are few in the field. The process is laborious, time-consuming, and unreliable [19]. Aside from its high cost and inaccessibility to the poor and less privileged in rural areas, the manual screening approach cannot cater for the vast number of patients with one eye problem or another. Hence, automatic deep learning retinal blood vessel segmentation methods become indispensable as a suitable screening approach for improving the diagnosis of ophthalmological diseases, enabling early detection and timely treatment.
Several review papers on deep learning retinal blood vessel segmentation methods, written from different perspectives, can be found in the literature. Soomro et al. [20] surveyed recent advances in deep learning methods for retinal image analysis. Imran et al. [21] analysed various machine learning and deep learning methods for automated blood vessel segmentation in retinal images to compare their performance. Chen et al. [22] identified trends in deep learning network architectures and models. Khan et al. [23] presented a comprehensive survey of traditional supervised, unsupervised, and neural network techniques and strategies. Jia et al. [24] focused on conventional machine-learning and deep-learning frameworks. Li et al. [25] reviewed successful deep learning methods for fundus images. Badar et al. [26] focused on deep learning application papers for retinal image analysis, while Sengupta et al. [27] surveyed ophthalmic diagnosis using deep learning with fundus images. The differences between this review article and the recent/existing review literature in this domain are presented in Table 1 and analysed below it.
This paper critically reviews state-of-the-art research on deep learning network architectures and their enhancement techniques for the segmentation of retinal blood vessels from retinal fundus images between 2016 and 2021. In addition, the study presents a taxonomy and an analysis of the trends of recent methods in existing publications in this domain. A taxonomy conveys the relationships between current works, classifies them based on identified strengths and weaknesses, and gives suggestions that can help researchers and others easily comprehend the topic [28]. A trend analysis gives a run-through of the gaps identified in current work, suggesting directions for bridging them in future research.
This review paper aims to give insight into the state of current research attainment in this domain. It also aims to motivate the development of more robust, efficient, and reliable deep learning models with optimal performance and good generalization ability for automatic retinal blood vessel segmentation. This improvement will facilitate the approval of deep learning approaches for automated retinal blood vessel segmentation systems by the Food and Drug Administration (FDA), the regulatory body [29]. More approvals will remove the barrier of inaccessibility and enhance early detection and timely treatment options, which in turn will reduce:
• the global estimate of premature vision-impaired victims whose conditions could have been prevented [16]-[18], [20]-[29];
• the enormous global financial burden of vision-related problems attributed to reduced productivity, since working-age groups are mostly affected [30]-[33];
• the burdens on the victims, family members, and the health sectors [34]-[36]; and
• the loss of Quality of Life (QoL), manifesting as job loss and reductions in other indices of wellbeing such as social involvement, psychological health, security, and self-dependence [37]-[42].
This paper aims to review the enhancement techniques of state-of-the-art deep learning retinal blood vessel segmentation methods, while the objectives are as follows:
• produce the taxonomy of the state-of-the-art deep learning methods for retinal blood vessel segmentation;
• picture the trends of the enhancement techniques used to modify and enhance the performance of the state-of-the-art methods;
• identify limitations of the existing methods; and
• suggest possible future research directions oriented towards developing improved deep learning models for automatic retinal blood vessel segmentation.
A. ANALYSIS AND DISCUSSION OF TABLE 1
According to George Box, "All models are wrong, but some can be made useful..." [43]. Although Box argued that all models are wrong, it is not because they are inefficient in all contexts but because they are limited in specific performance. This implies that no model is perfect, but every model can be enhanced towards optimal performance. The enhancement of models towards optimal performance is the motivation for this survey article. Deep learning architectures can be optimised by modifying some parameters to minimise computational complexity and obtain performance that outperforms the preliminary baseline. This modification requires the enforcement of regularization techniques. Overfitting, for example, is a popular deep learning drawback, but a checkable limitation that can be alleviated with appropriate regularization, while an efficient activation function helps mitigate vanishing gradients. Testing errors are minimised using a loss/objective function suited to the limitations of the training dataset, and an inappropriate pooling operation hampers predictive accuracy, to mention just a few. The application of these enhancement techniques contributes immensely to a model's performance and generalization ability. The review of the architectural enhancement techniques in each of the application papers forms the basis of this paper's focus, scope, and objectives. Table 1 compares this review paper with recent/existing review papers to establish their differences.
A critical analysis of Table 1 reveals that the most recent and closest review paper to this article is the excellent review work of Chen et al. [22]. Although it is similar to this article in segmentation task (retinal blood vessel segmentation) and method (deep learning), they differ in technical approach, scope, and contributions. The authors in [22] focused on the deep learning network architectures for retinal blood vessel segmentation methods and identified the trend of models. In comparison, this review paper considers the network architectures of the proposed algorithms with their usage trends and dives deeper to review and analyze the technicality behind the performance of these architectures in terms of optimization, regularization, and the other enhancement techniques used to improve model performance. This article further analyzes the usage trend and distribution using graphs and tables, as presented in Sections VI and VII and Table 7. In addition, the authors in [22] reviewed 87 papers, while this paper reviews 110 application papers. These, in conjunction with Figure 3, demonstrate the differences between this article and the most recent review paper in this domain in terms of novelty, uniqueness, and contributions.
Similarly, Soomro et al. [20] focused on deep learning retinal blood vessel segmentation with coverage of 17 review articles spanning 2015 to 2018. In contrast, Li et al. [25] focused on the application of deep learning methods to fundus images and partially covered 26 method papers on deep learning retinal blood vessel segmentation spanning 2016 to 2020.
In conclusion, the critical analysis above of the similarities and differences between this review article and the recent/existing review papers on deep learning retinal vessel segmentation methods in Table 1 clearly reveals that the recent/existing review papers majorly focused on deep learning models applied to predict retinal blood vessels. They differ significantly from this review article in objectives, scope, and contributions to knowledge, as depicted in Figure 3.

B. THE SIGNIFICANCE AND CONTRIBUTIONS OF THIS STUDY
The contributions of this study to the body of research knowledge include:
• To the best of my knowledge, this review article is the first to extensively review the enhancement techniques for achieving good generalization ability and optimal performance in deep learning retinal blood vessel segmentation methods, and it meticulously considers the enhancement techniques behind the models' performance, as detailed in Table 7 and analyzed in Section VII.
• Section III of this paper comprehensively outlines the role of each enhancement technique identified as a fundamental key for modifying deep learning models towards performance that outperforms the preliminary baseline architectures. It also summarizes, in Tables 3, 4, and 5, precautionary measures to ensure the adequate selection and appropriate application of each technique in various scenarios.
• Also unique to this review article is the graphical analysis of the distribution and usage trends of each enhancement technique identified in this paper. This helps to objectively bring out open challenges in this domain through critical analysis. In addition, it suggests novel activation techniques that have never been suggested in any recent/existing review paper as a consideration for improvement in future research in this domain.
This paper is structured as follows. Section II presents an overview of deep learning methods. Section III outlines and describes CNN architectures, components, optimization techniques, and regularization strategies. Section IV presents the CNN classification and segmentation architectures. Section V details the evaluation metrics and datasets most used in the reviewed literature. Section VI focuses on the key taxonomies in the literature. Section VII presents an analysis of the trends, challenges, and future directions of deep learning techniques for retinal blood vessel segmentation. Section VIII concludes the review.
A total of 110 journal and conference papers with informative ideas were sourced and reviewed in this study, as presented in Table 2.

II. OVERVIEW OF DEEP LEARNING ARCHITECTURES
Deep learning is a subclass of machine learning, while machine learning is a subgroup of artificial intelligence. It belongs to the Artificial Neural Networks (ANNs) family of methods, which mimic the human brain in organization and function. Deep learning has existed for decades under many names, but resurged less than a decade ago through an unprecedented breakthrough in the backpropagation algorithm, which aided the training of deep feed-forward neural networks [44]-[46]. Its ability to learn features hierarchically through its deep layers during network training resolves the challenge of hand-crafting features [47]. In addition, deep learning models show a remarkable improvement in segmentation performance compared to traditional supervised and unsupervised methods, though there is still room for improvement.

III. CONVOLUTIONAL NEURAL NETWORKS
CNN architectures have outperformed other deep learning methods in computer vision and image processing tasks. The robustness of CNNs in computer vision tasks lies in their ability to build higher-level features that extract more meaningful, rich representations for classification, object detection, and segmentation tasks. The importance of convolutions in deep learning cannot be over-emphasized.

A. CNN ARCHITECTURAL LAYERS
The main layers for the construction of CNN architectures are the convolution, activation, pooling, batch normalization, and fully connected layers. These layers are depicted in Figure 4 and discussed below.

1) CONVOLUTION LAYER
The convolution layers learn features hierarchically: the first layers learn low-level features such as edges, the middle layers build mid-level features, and the last convolution layers learn high-level features using the features learnt in the middle layers. The convolution operations, in conjunction with pooling operations, compact the resolution of an input image to contain only the salient feature maps [64].

2) ACTIVATION LAYER
Activation functions are an indispensable unit of the neural network. They introduce a non-linear attribute to the network, enabling it to solve classification, segmentation, and object recognition problems in which the mapping between input and output is not linear. Activation functions are among the most informative layer components in deep learning architectures. The predictive accuracy and optimal performance of deep learning architectures depend primarily on the choice of activation function, which is why an activation function should be selected based on its suitability, strengths, and limitations. The Sigmoid [65], Tanh, ReLU [66], LReLU [67], and ELU [68] activation functions and their attributes are outlined in Table 3. An empirical evaluation of the impact of rectified activation functions is presented in [69].
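To make the attributes in Table 3 concrete, these functions can be written in a few lines of NumPy. This is an illustrative sketch only (the function names are mine, not tied to any framework), not a recommendation of any one activation:

```python
import numpy as np

def sigmoid(x):
    # Squashes inputs into (0, 1); saturates for large |x|.
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Zero-centred counterpart of the sigmoid, output in (-1, 1).
    return np.tanh(x)

def relu(x):
    # Rectified Linear Unit: identity for positives, zero otherwise.
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # LReLU: a small negative slope avoids "dying" units.
    return np.where(x > 0, x, alpha * x)

def elu(x, alpha=1.0):
    # ELU: smooth exponential saturation for negative inputs.
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))
```

For example, relu applied to [-2, 0, 3] zeroes only the negative entry, while leaky_relu retains a scaled trace of it.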

3) POOLING LAYER
The pooling layer is one of the CNN components situated between consecutive convolution blocks. Its principal function is to down-sample the resolution of the input image by compacting the filtered feature maps from the convolutional layer while ensuring that salient features are captured. By regulating the number of parameters, it also helps prevent overfitting during training. There are different types of pooling operations, but only the most common, sum [70], [71], average [72], and max [73] pooling, are presented in Table 4.
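As an illustration of these operations, here is a minimal NumPy sketch of non-overlapping 2-D pooling (the pool2d helper is hypothetical, written only to show how each mode summarizes a block of the feature map):

```python
import numpy as np

def pool2d(x, size=2, mode="max"):
    """Non-overlapping 2-D pooling over an (H, W) feature map."""
    h, w = x.shape
    x = x[:h - h % size, :w - w % size]           # trim to a multiple of `size`
    blocks = x.reshape(h // size, size, w // size, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))            # keep the strongest response
    if mode == "avg":
        return blocks.mean(axis=(1, 3))           # keep the mean response
    if mode == "sum":
        return blocks.sum(axis=(1, 3))            # keep the total response
    raise ValueError(mode)

fmap = np.array([[1., 2., 5., 6.],
                 [3., 4., 7., 8.],
                 [0., 1., 1., 0.],
                 [1., 0., 0., 1.]])
```

With a 2 x 2 window, the 4 x 4 map above is reduced to 2 x 2, each output cell summarizing one block.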

4) FULLY CONNECTED LAYERS
Fully connected layers are important but optional components of CNNs that often feature at the end of the network, before the output (classification) layer. Their performance in computer vision tasks such as image recognition and classification is highly commendable. The main objective of a fully connected layer is to assign a given feature to a label: the output tensors from the convolution and pooling operations are flattened into a single vector, from which the probability that a particular feature belongs to a label is determined. As shown in Figure 4, the widespread practice is to use one or more fully connected layers before the soft-max layer.

5) BATCH NORMALIZATION
Training a deep neural network can involve complexities such as network instability, exploding gradients, and delayed convergence [50], [74]. Therefore, the use of batch normalization becomes pertinent, especially to step up the rate of convergence [75]. The batch normalization layer reduces the internal covariate shift in a network. It normalizes the output of a previous activation layer by subtracting the batch mean and dividing by the batch standard deviation, then applies a learnable scale and shift, as represented in the following Equation:

x_hat = (x - μ_B) / sqrt(σ_B^2 + ε),    y = γ · x_hat + β

where μ_B and σ_B^2 are the mini-batch mean and variance, ε is a small constant for numerical stability, and γ and β are learnable parameters.
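The normalize-then-scale-and-shift step of batch normalization can be sketched in NumPy as follows (a per-feature, training-mode sketch only; real layers also track running statistics for use at inference time):

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    # Normalize each feature of the mini-batch to zero mean and unit
    # variance, then apply the learnable scale (gamma) and shift (beta).
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta
```

After this transform (with the default gamma = 1, beta = 0), each feature column has approximately zero mean and unit variance regardless of its original scale.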

B. REGULARIZATION TECHNIQUES
Regularization involves a set of techniques often used to address the overfitting challenge in deep neural networks and enhance the performance of a trained model. It is implemented by modifying a learning algorithm to improve its generalization on new data that it has never seen [76].
The main goal of regularization is to reduce testing or generalization error, though this may lead to a moderate increase in the training error. In training a deep learning model, a careful effort must be made to avoid underfitting and overfitting. An illustration of underfitting, normal fitting and overfitting are depicted in Figure 5.
Underfitting manifests when the trained model underlearns the fundamental features in the train datasets and achieves a high training loss. Overfitting is one of the bottlenecks and well-identified limitations in deep neural networks [77].
CNN architectures are data-hungry and need huge data volume to achieve adequate learning of features to produce a model that generalizes well on new datasets. Unfortunately, such a high volume of data is hard to come by in this domain.
Hence, CNN architectures often suffer overfitting. Overfitting occurs when a CNN model is trained on insufficient data, overlearning the underlying details and the noise in the training dataset. This overlearning hinders the optimal performance of the trained model and its generalization ability on new datasets.

1) DROPOUT
Dropout operation is a regularization approach that disconnects the activation of some neurons from the previous layer to the next layer for each training iteration, as shown in Figure 6. This procedure, which is executed at all visible and hidden layers [78]- [80] in the network, randomly drops some nodes with their incoming and outgoing connections for each training iteration during the network training. The dropout technique was designed by [81] to alleviate overfitting and enhance the generalization ability of the trained model on new datasets. Dropout can be implemented after a convolution or max pooling operation with a minimal probability and between the fully connected layers.
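A minimal NumPy sketch of inverted dropout, the common formulation in which the surviving activations are rescaled during training so that no change is needed at test time (the helper is illustrative, not any framework's API):

```python
import numpy as np

def dropout(x, p=0.5, training=True, rng=None):
    """Inverted dropout: zero activations with probability p, rescale survivors."""
    if not training or p == 0.0:
        return x                              # identity at inference time
    rng = rng or np.random.default_rng(0)
    mask = rng.random(x.shape) >= p           # keep each unit with probability 1 - p
    return x * mask / (1.0 - p)               # rescale so the expected value is unchanged
```

Because of the rescaling, the mean activation is preserved in expectation, which is why the trained weights need no adjustment at test time.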

2) DATA AUGMENTATION
Data augmentation is a regularization technique that improves a trained model's generalization ability. It moderately transforms the original input images in the training dataset with random transformations such as rotations, cropping, left and right shifts, shearing, and horizontal and vertical flips. These transformations present the network with images that are new in appearance but not in label, giving the illusion of a large training dataset and helping the network learn the full underlying details to enhance the performance of the trained model. Other augmentation techniques, such as non-linear geometric distortions [82] and random colour transformations in a given colour space [83], [84], can also be applied in computer vision. The choice of augmentation strategy depends on the task at hand.
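The simplest label-preserving transformations can be sketched with NumPy alone (illustrative only; in practice, augmentation libraries also handle interpolation, padding, and keeping the segmentation mask aligned with the transformed image):

```python
import numpy as np

def augment(image, rng=None):
    """Yield simple label-preserving variants of a (H, W) image array."""
    rng = rng or np.random.default_rng(0)
    yield np.fliplr(image)                        # horizontal flip
    yield np.flipud(image)                        # vertical flip
    yield np.rot90(image, k=rng.integers(1, 4))   # random 90-degree rotation
    h, w = image.shape
    yield image[1:h, 1:w]                         # crop (resized back in practice)
```

For vessel segmentation, the same transformation must be applied to the ground-truth mask so that image and label stay in register.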

C. OPTIMIZATION ALGORITHMS
In deep learning, applying an optimization algorithm is challenging. Although deep learning and optimization are related, they have different goals: deep learning seeks a model with low generalization error, mitigating overfitting through data augmentation, dropout, and other regularization techniques, whereas optimization seeks to minimise the loss function so as to reduce the training error. To optimise a CNN-based architecture's parameters, different optimizers are available, including RMSProp [85], Stochastic Gradient Descent (SGD) [86], ADAM [87], ADAGRAD [88], ADADELTA [89], and NADAM [90].
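The shared skeleton of these optimizers is an iterative update of the weights against the loss gradient; the listed variants differ mainly in how they adapt the step size. A sketch of SGD with momentum on a toy quadratic loss (illustrative only, not any library's API; the step counts and learning rate are arbitrary):

```python
import numpy as np

def sgd_momentum(grad_fn, w0, lr=0.1, momentum=0.9, steps=300):
    """Plain SGD with momentum: v <- m*v - lr*grad(w); w <- w + v."""
    w = np.asarray(w0, dtype=float)
    v = np.zeros_like(w)
    for _ in range(steps):
        v = momentum * v - lr * grad_fn(w)   # accumulate a velocity term
        w = w + v                            # move along the smoothed direction
    return w

# Minimise f(w) = ||w - 3||^2, whose gradient is 2 * (w - 3).
w_star = sgd_momentum(lambda w: 2.0 * (w - 3.0), w0=[0.0, 10.0])
```

On this convex toy problem the iterates converge to the minimiser w = 3; on real losses the same update is applied to mini-batch gradients.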

D. LOSS FUNCTION
In convolutional neural network methods, the loss function quantifies the similarity between the predicted output and the ground truth; the measured error is used to correct and update the weights of the training network to achieve accurate optimization and better predictive accuracy. Hence, the choice of loss function should be carefully based on the attributes of the images in the dataset, such as boundaries, class imbalance, skewness, and distribution [91]. Nevertheless, no single loss function can yield suitably promising performance under all dataset complexities. The Binary Cross-Entropy [92], Dice [93], [94], Focal [95], [96], and Weighted Cross-Entropy [97] loss functions and their suitable data attributes are outlined in Table 5.
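Two of the losses in Table 5 written out in NumPy as a sketch. For vessel segmentation, y_true is the binary vessel mask and y_pred the per-pixel probability; the eps and smooth values are common defaults, not prescriptions:

```python
import numpy as np

def bce_loss(y_pred, y_true, eps=1e-7):
    """Binary cross-entropy over probabilities in (0, 1)."""
    p = np.clip(y_pred, eps, 1.0 - eps)   # guard against log(0)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

def dice_loss(y_pred, y_true, smooth=1.0):
    """Soft Dice loss; robust to the class imbalance typical of thin vessels."""
    inter = np.sum(y_pred * y_true)
    return 1.0 - (2.0 * inter + smooth) / (np.sum(y_pred) + np.sum(y_true) + smooth)
```

A perfect prediction drives both losses to (near) zero, while the Dice loss, being computed on overlap rather than per-pixel averages, is less dominated by the abundant background pixels.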

IV. CNN CLASSIFICATION AND SEGMENTATION NETWORKS
A. LeNet
LeNet is the first CNN architecture, developed by LeCun et al. [82] for the original purpose of Handwritten Character Recognition (HCR). Its small size makes it suitable for beginners to learn CNNs with ease and better understanding. It is a feed-forward architecture composed of convolutional, pooling, and activation layers and two fully connected layers, as depicted in Figure 7(a). At the time of its invention, tanh and sigmoid were the standard activation functions; with the availability of the ReLU activation function and its non-linear characteristics, the tanh could be replaced by ReLU. Although LeNet trains with few parameters, it learns features efficiently.
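LeNet's layer dimensions follow directly from the standard output-size formula for convolution and pooling, output = (W - F + 2P) / S + 1, where W is the input size, F the kernel size, P the padding, and S the stride. A quick check, assuming LeNet-5-style 5 x 5 convolutions and 2 x 2 pooling on a 32 x 32 input:

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size - kernel + 2 * padding) // stride + 1

# Walking a 32x32 input through LeNet-style layers:
s = conv_out(32, 5)            # conv1: 5x5 kernel -> 28
s = conv_out(s, 2, stride=2)   # pool1: 2x2, stride 2 -> 14
s = conv_out(s, 5)             # conv2: 5x5 kernel -> 10
s = conv_out(s, 2, stride=2)   # pool2: 2x2, stride 2 -> 5
```

The resulting 5 x 5 maps are then flattened and fed to the fully connected layers; the same formula governs every architecture discussed below.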

B. AlexNet
Krizhevsky et al. [83] introduced AlexNet architecture, which became the mainspring of CNN after it won the ILSVRC ImageNet competition [98] in 2012 with a remarkable performance. The AlexNet architecture in Figure 7(b) shares some similarities with the LeNet architecture, but it is a larger model with deeper layers and more filters. The idea of stacking one convolutional layer on the other against the norm of stacking a pooling layer on a convolutional layer originated from AlexNet.
The overlapping max-pooling operations in AlexNet reduce the network's size, which is its primary attribute. The ReLU activation functions at the end of every convolutional and fully connected layer mitigate the vanishing gradient problem and speed up convergence [99], [100].

C. ZFNet
ZFNet [101] implemented a visualization idea on the AlexNet architecture, modifying its hyperparameters to evaluate neuron activations in a bid to determine redundant and non-redundant neurons. This novel idea led to its first-place achievement in the 2013 ILSVRC competition. The ZFNet approach is similar to the visualization methods adopted in [102]-[105].

D. VGGNet
The Visual Geometry Group (VGG) in [106] invented the VGGNet, regarded as the deepest attainable network at the time of its invention. After it emerged as the first runner-up in the 2014 ILSVRC ImageNet competition, its design revealed that network depth, amongst other factors, is correlated with optimal performance. The VGGNet in Figure 7(c) comprises thirteen convolutional layers and three fully connected layers, with ReLU activation functions at the end of each layer. The method adopted the stacked-layer technique from AlexNet with smaller filters. A deeper variant called VGG19 was also designed.

E. GoogleNet
Inspired by LeCun et al. [82], Szegedy et al. [107] proposed the ground-breaking GoogleNet architecture in Figure 7(d), which adopted the Inception module [108]. The architecture used 1 x 1 convolutions and average pooling to minimise the number of parameters, reducing the computational cost and winning the ILSVRC competition in 2014. Root Mean Square Propagation (RMSProp), developed by Hinton et al. [85], was adopted to enhance the accuracy of the network. To mitigate the vanishing gradient problem, a gradient descent optimization algorithm for batch learning of the network was used to balance the step sizes of large and small gradients [109]. The method also adopted batch normalization and image distortion for optimal accuracy.

F. ResNet
In 2015, He et al. [110], [111] designed the ResNet architecture, which won the ILSVRC award, to resolve the vanishing gradient challenge. The ResNet architecture comprises residual modules and pooling layers (often replaced by strided convolutional layers). ResNet's performance negates the notion that networks with deeper layers than Simonyan and Zisserman's VGGNet are unrealistic. For better performance, the authors introduced a simple bottleneck extension, shown in Figure 7(e), to the residual module by replacing the two convolutions in the original residual architecture with three convolutions while maintaining the shortcut unit. In an attempt to improve accuracy, He et al. [112] investigated the impact of pre-activation on the performance of deep residual networks. The main innovations in this model were skip connections and batch normalization of the features. More variants of residual networks have been developed [113]-[115]. Zagoruyko et al. [116] introduced an extended version of the residual network in 2016.
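The core of the residual module is the identity shortcut, y = ReLU(F(x) + x). A toy NumPy sketch with two linear transformations standing in for the convolutions (purely illustrative; real residual blocks use convolutions and batch normalization):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, w1, w2):
    """y = ReLU(F(x) + x): the shortcut lets gradients bypass F entirely."""
    out = relu(x @ w1)      # first transformation
    out = out @ w2          # second transformation (pre-addition)
    return relu(out + x)    # identity shortcut, then activation

# With zero weights, F(x) = 0 and the block reduces to the identity
# (for non-negative x), which is what makes very deep stacks trainable:
# a layer that cannot help can at least do no harm.
x = np.array([1.0, 2.0, 3.0])
w_zero = np.zeros((3, 3))
```

The shortcut also gives the backward pass a direct path, which is how ResNet counters vanishing gradients in very deep networks.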

G. DenseNet
DenseNet architecture was introduced by Huang et al. [117]. The architecture comprises pre-activation batch normalization and an activation function (ReLU), followed by 3 × 3 convolutional layers. The architecture concatenates the feature maps from the previous layers with every successive layer's input in a feed-forward approach, as shown in Figure 7(f). With this approach, each layer acquires collective knowledge from the previous layers, enabling the reuse of features. This strengthens the learning of diversified features and reduces the number of channels and parameters, proving to be computationally and memory efficient.
The application of CNN architectures has proven to be a cutting-edge breakthrough in classification and other related tasks. Hence, their application to image segmentation tasks is feasible. Nevertheless, such architectures cannot handle output probabilities for more than two classes due to the fully connected layers at the last layer. In addition, predictions from classification architectures suffer from a loss of spatial information and low-resolution segmented output due to the several convolution and max-pooling operations. These drawbacks limit the performance of CNN classification architectures in image segmentation tasks and motivated the development of CNN segmentation architectures for the semantic segmentation of biomedical images.
Due to limitations associated with multiple classifications for output probabilities at the final layer in CNN classification networks, Lin et al. [108] replaced the fully connected layer with a 1 × 1 filter size convolutional layer. They used a SoftMax classifier at the final layer to resolve the challenge of only binary classification being possible at CNN output layers.

H. FCN
To resolve the low-resolution, coarse output of the CNN classification networks, Long et al. [118] drew on the AlexNet, VGGNet, and GoogLeNet architectures to design the Fully Convolutional Network (FCN), shown in Figure 8, for the semantic segmentation of images. The FCN architecture adds deconvolutional layers at the end of the convolutional (contracting) network to restore the output image to its original resolution. However, the FCN approach could not fully tackle the challenge of coarse output.

I. U-Net
Inspired by the FCN architecture, Ronneberger et al. [119] introduced the U-shaped U-Net architecture, consisting of encoder and decoder paths, as shown in Figure 9. The encoder network is composed of repeated blocks of two consecutive convolutional layers that convolve the input image. The activation layer is embedded in the convolution layer and followed by a pooling layer, where the feature maps are down-sampled to reduce the image's resolution and capture the contextual information (objects) in the input image.
On the other hand, the decoder path consists of transpose convolution layers that up-sample the down-sampled feature maps for precise localization of boundaries, restoring the input image to its original resolution. Between the encoder and decoder networks are skip connections that recover lost spatial information by concatenating the contextual and localization information to obtain a precise prediction (output). The U-Net architecture demonstrated an impressive performance superior to other architectures in a segmentation competition. This brilliant performance and its tolerance of small training datasets brought U-Net to the limelight as a reliable architecture for the semantic segmentation of biomedical images, where the availability of large annotated datasets is a challenge.
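The encoder/decoder pairing described above can be sketched as simple shape bookkeeping: each 2 x 2 pooling halves the spatial resolution, each transpose convolution doubles it, and each skip connection requires the two sides to match. This is an illustrative sketch (a depth of 4 is assumed, and the input sides must be divisible by 2^depth, which is why inputs are typically padded or cropped accordingly):

```python
def unet_shapes(h, w, depth=4):
    """Track (H, W) through a U-Net-style encoder/decoder so concatenation lines up."""
    encoder = []
    for _ in range(depth):
        encoder.append((h, w))      # feature map saved for the skip connection
        h, w = h // 2, w // 2       # 2x2 max pooling halves the resolution
    for skip in reversed(encoder):
        h, w = h * 2, w * 2         # transpose convolution doubles the resolution
        assert (h, w) == skip       # decoder map matches the stored encoder map
    return h, w
```

For a 512 x 512 fundus patch, the bottleneck sits at 32 x 32 and the decoder returns exactly to 512 x 512, which is what allows pixel-wise vessel prediction at full resolution.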

J. SegNet
Badrinarayanan et al. [120] proposed the SegNet segmentation network (shown in Figure 10), replacing the skip connections of U-Net with the transfer of pooling indices to pass information from the encoder to the decoder network. SegNet has achieved cutting-edge success in the medical domain.

V. DATABASES AND PERFORMANCE METRICS
A. DATABASES FOR RETINAL BLOOD VESSEL SEGMENTATION
There are different public databases available for interpreting and diagnosing retinal fundus images. This study focuses on the four (4) most popularly used to evaluate proposed methods: the Digital Retinal Images for Vessel Extraction (DRIVE) [137], Structured Analysis of the Retina (STARE) [121], Child Heart and Health Study (CHASE_DB1) [122], and High-Resolution Fundus (HRF) [123] databases. The characteristics of each of these databases are briefly summarized in Table 6.

B. PERFORMANCE METRICS
Many evaluation metrics are used to measure the performance of retinal blood vessel segmentation methods, but this study considers those most common among the literature reviewed in this paper: accuracy, sensitivity, specificity, AUC, precision, and F1-Score. In terms of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN), they are defined as: Accuracy = (TP + TN) / (TP + TN + FP + FN); Sensitivity = TP / (TP + FN); Specificity = TN / (TN + FP); Precision = TP / (TP + FP); F1-Score = 2 × (Precision × Sensitivity) / (Precision + Sensitivity).
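The listed metrics follow directly from the confusion-matrix counts; the plain-Python sketch below uses the standard definitions (AUC, which integrates over decision thresholds, is omitted):

```python
def segmentation_metrics(tp, tn, fp, fn):
    """Compute common retinal-vessel segmentation metrics from pixel counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)       # recall / true positive rate
    specificity = tn / (tn + fp)       # true negative rate
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "precision": precision, "f1": f1}

# Toy pixel counts for one segmented image.
m = segmentation_metrics(tp=80, tn=900, fp=20, fn=20)
```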

VI. TAXONOMY OF STATE-OF-THE-ART RETINAL BLOOD VESSEL SEGMENTATION METHODS
The semantic segmentation of retinal blood vessels from retinal images for clinical analysis and diagnosis has been approached with numerous methods. Classical methods include the Canny edge operator [124], gradient-based edge operators [125], Sobel edge operators [126], Robert and Kirsch differential edge operators [127], zero-crossing edge operators [128], Prewitt edge operators [129], classical edge detection filters, and artificial neural networks (ANN) [130]. Traditional unsupervised methods include matched filters [131], clustering [132], model-based methods [133], multi-scale methods [134], and adaptive thresholding [135], while supervised methods include GMM [136], kNN [137], and Support Vector Machines [138], [139]. The singular aim of these methods is to obtain a model with optimal predictive ability and efficiency. However, their reliance on expert domain knowledge, mathematical operations, and hand-engineered feature extraction strategies to determine the relationship of a feature to a prognosis renders them inefficient and unreliable for accurate clinical diagnosis. Hence, the development of deep learning approaches for retinal blood vessel segmentation, despite their hunger for massive training data and high computational requirements, is inevitable.
Moreover, the segmentation performance of deep learning supersedes that of the other methods. However, no model is perfect, and performance can be improved through efficient optimization and regularization techniques. In light of this, a review of 110 state-of-the-art retinal blood vessel segmentation methods is presented in this paper. Table 7 summarizes the nine important taxonomy attributes this study focuses on. These attributes are central to the optimal performance of a model in terms of predictive accuracy and good generalization ability.

A. IMAGE PRE-PROCESSING
Retinal fundus images are often used to analyse and diagnose retinal anatomical features and abnormalities, as they are the safest and cheapest acquisition means. Retinal fundus images are non-invasively acquired using fundus cameras. Due to camera limitations and environmental factors, the obtained images suffer visual complexities and quality degradations, including inconsistent contrast, noise, varying illumination, and heterogeneous background, as shown in Figure 1. Unfortunately, CNN models are vulnerable to these challenges. Therefore, to achieve precise segmentation output, the degraded retinal fundus images require improvements to boost the performance of CNN architectures.
Deep learning networks require a vast amount of data to learn relevant features to obtain a well-generalized model with optimal predictive accuracy and precise segmentation.
However, the available retinal fundus images are limited in size and costly to label. Many methods adopt data augmentation techniques, GAN, and transfer learning methods to resolve the challenge of limited data.

B. REGULARIZATION TECHNIQUES
Data augmentation is a regularization strategy conventionally used to expand the images in a dataset, as described in Section III B 2 of this study. To ascertain the effectiveness of this approach on the performance of CNN retinal vessel segmentation methods, Boudegga et al. [241] rotated each training image four (4) consecutive times, at angles of 30°, 60°, 120°, and 150°, respectively, and obtained the highest accuracy of 98.19% on the DRIVE dataset, as presented in Table 8.
Similarly, [222] adopted a patch data augmentation technique and recorded the highest performance of 99.00% specificity. In addition, Lin et al. [165] applied random crop data augmentation to the training images and achieved a brilliant performance. Various algorithms [162], [182], [184], [238], [248] have employed various strategies depending on the task at hand.
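The augmentation strategies above can be sketched with basic array transforms. This NumPy sketch is illustrative only; it uses lossless 90-degree rotations and flips, whereas the arbitrary angles of [241] additionally require interpolation (e.g. `scipy.ndimage.rotate` with `reshape=False`).

```python
import numpy as np

def augment(image):
    """Expand one training patch into several transformed copies.

    Illustrative sketch: 90-degree rotations plus horizontal/vertical flips.
    Labels (vessel masks) must be transformed identically in practice.
    """
    return ([np.rot90(image, k) for k in (1, 2, 3)]
            + [np.fliplr(image), np.flipud(image)])

patch = np.random.rand(32, 32)
augmented = augment(patch)   # one patch yields five extra training samples
```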

C. POOLING OPERATIONS
CNN architectures use the pooling layer, as detailed in section III-A 3 and Table 4, to down-sample the input image and reduce parameters considered redundant [197], [217], [218], [232]. This operation often leads to the loss of spatial information necessary for clinical analysis and diagnosis, especially in the medical domain where every minute detail boosts accurate diagnosis. For this reason, [143] designed an architecture to extract the retinal blood vessels from the retinal image without the pooling layer. The validation of the proposed method on the DRIVE and STARE datasets obtained an excellent performance.
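The trade-off described above, where down-sampling discards fine vessel responses, can be seen in a minimal sketch (illustrative only, not any surveyed implementation):

```python
import numpy as np

def pool2x2(x, mode="max"):
    """2x2 pooling over an (H, W) feature map: max keeps the strongest
    response, average dilutes it, and both halve the spatial resolution."""
    h, w = x.shape
    windows = x.reshape(h // 2, 2, w // 2, 2)
    return windows.max(axis=(1, 3)) if mode == "max" else windows.mean(axis=(1, 3))

# A thin "vessel" response: one bright pixel in a dark 2x2 neighbourhood.
fmap = np.zeros((4, 4))
fmap[1, 1] = 1.0
mx = pool2x2(fmap, "max")      # keeps the vessel response (1.0)
av = pool2x2(fmap, "average")  # dilutes it to 0.25
```

Either way, the pixel's exact location inside the window is lost, which is why some methods [143] discard the pooling layer altogether.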
D. LOSS FUNCTIONS
The studies in [149], [165], [230], [249] explored class-weight balancing loss functions. The methods in [142], [173], and [189] used formulated loss functions to optimize their networks. Fan et al. [142] optimized using an MSE loss function based on the L2-norm to quantify the robustness of the weight matrices, while Yan et al. [173] formulated joint loss functions based on the L1-norm to optimize a network for retinal blood vessel and pixel losses. In contrast, [189] used the binary cross-entropy loss function with an L2-norm penalty to obtain optimal weights and reduce the training error.
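The loss families mentioned above can be written out in a few lines. This is a hedged NumPy sketch of the generic MSE, binary cross-entropy, and L2 penalty terms, not a reproduction of the exact formulations in [142], [173], or [189]:

```python
import numpy as np

def mse_loss(pred, target):
    """Mean squared error between prediction and ground-truth maps."""
    return np.mean((pred - target) ** 2)

def bce_loss(pred, target, eps=1e-7):
    """Binary cross-entropy over vessel/non-vessel probabilities."""
    p = np.clip(pred, eps, 1 - eps)
    return -np.mean(target * np.log(p) + (1 - target) * np.log(1 - p))

def l2_penalty(weights, lam=1e-4):
    """L2-norm weight penalty, added to the data loss as a regularizer."""
    return lam * np.sum(weights ** 2)

pred = np.array([0.9, 0.2, 0.8])
target = np.array([1.0, 0.0, 1.0])
total = bce_loss(pred, target) + l2_penalty(np.ones(10))
```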

E. DEEP LEARNING MODELS 1) GAN RETINAL BLOOD VESSEL SEGMENTATION METHODS
To solve the critical limitation of the few images available for training, a GAN offers another, automatic data augmentation strategy, different from the conventional approach in Section VI B. A GAN is a deep learning network that learns in an unsupervised manner from unlabeled data. It consists of a generator, which generates realistic predictions, and a discriminator, which ascertains the authenticity of the generated output, as depicted in Figure 11. The studies in [168], [184], [199], [206], [208], [210], [215], and [216] explored the automatic augmentation potential of GANs for expanding training data.
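The adversarial objective behind the generator/discriminator pair can be sketched with the standard GAN losses. This toy NumPy sketch uses dummy discriminator scores and does not correspond to any specific surveyed architecture:

```python
import numpy as np

def bce(p, y, eps=1e-7):
    """Binary cross-entropy between probabilities p and labels y."""
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def discriminator_loss(d_real, d_fake):
    """Discriminator: label real samples 1 and generated samples 0."""
    return bce(d_real, np.ones_like(d_real)) + bce(d_fake, np.zeros_like(d_fake))

def generator_loss(d_fake):
    """Generator: fool the discriminator into labelling fakes as real (1)."""
    return bce(d_fake, np.ones_like(d_fake))

# Toy discriminator scores (probability of being a real vessel map).
d_real = np.array([0.9, 0.8])
d_fake = np.array([0.2, 0.1])
loss_d = discriminator_loss(d_real, d_fake)
loss_g = generator_loss(d_fake)
```

Training alternates minimizing `loss_d` and `loss_g` until the generated segmentations become indistinguishable from real ones.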
Dong et al. [184] proposed an efficient model with three stages: a U-Net with short connections as a generator for segmentation, a discriminator using an adversarial network, and a deep supervision module for retinal blood vessel segmentation. The proposed framework, tagged DS-LAN, simulates the distribution of retinal vessels from retinal fundus images. The method obtained a remarkable sensitivity score of 85.74% when validated on the DRIVE dataset. The authors in [216] constructed the generator by combining two (2) U-Net architectures, and the discriminator using the residual convolution module.
Implemented in [199] is a GAN-based self-supervised model that selects the hyperparameters for the optimization of the proposed network using the Particle Swarm Optimization (PSO) technique [254]. Similarly, [206] explored a semi-supervised training approach by combining annotated and unannotated images. The authors in [168] designed a U-Net based GAN model called Retina-GAN. The proposed scheme used the U-Net as the generator, while the Pixel GAN, Patch GAN-1, Patch GAN-2, and Image GAN formed the discriminator networks. The performance of the proposed GAN network on the DRIVE and STARE datasets showed better preservation of fine retinal blood vessels than the U-Net.

2) CNN RETINAL BLOOD VESSEL SEGMENTATION METHODS
Following the unprecedented breakthroughs of CNNs in classification tasks, several authors [142]-[145], [220], [249] have explored their potential for the semantic segmentation of retinal blood vessels from fundus images. Samuel and Veeramalai [249] proposed a novel CNN model, VSSC Net, for extracting retinal blood vessels from retinal fundus images and coronary angiograms. The method adopts a transfer learning approach using a pre-trained VGG-16 model as the backbone. The architecture contains the VE_1 and VE_2 blood vessel segmentation layers and is trained end-to-end on pre-processed data.
Vengali et al. [145] removed the pre-processing stage to avoid further degradation of the retinal fundus images and loss of fine detail. The method fine-tuned a DeepLab-coco.largeFOV model to extract the retinal blood vessels and enhanced the final output using a threshold in place of the last convolution layer. Validation of the proposed framework on the HRF database achieved an accuracy of 93.94% but a low AUC score of 89.4%.
Soomro et al. [157] implemented a CNN model for the segmentation of blood vessels. The authors pre-processed the retinal images to alleviate the low contrast, uneven illumination and noise visual complexities using morphology operations and Principal Component Analysis. A post-processing technique was used to eliminate noisy pixels and other unwanted background features from the CNN's predictions.
Khalaf et al. [144] constructed a seven-layer CNN architecture to extract retinal blood vessels. The method grouped the retinal image pixels into large, small, and background pixels to solve the class variation challenges. The extracted green channels were pre-processed using Adaptive Histogram Equalization (AHE) to improve the contrast [255], [257], and the morphological top-hat operator to enhance blood vessel visibility. Their evaluation of the framework on the DRIVE database obtained a good sensitivity score of 83.97%. Wu et al. [147] developed a combined vessel tracking model that is CNN-based. The method also explores PCA at the pre-processing stage to reduce the dimension of the input images. The proposed method performed well but suffered from disconnected vessels and missed tiny vessels.
To improve the predictive accuracy of CNN retinal blood vessel segmentation methods, Zhou et al. [153] proposed a four-stage CNN-based framework for the segmentation of retinal blood vessels from retinal fundus images. The authors pre-processed the images in stage one, generated discriminative features through a CNN model in stage two, used combined filters in stage three to enhance the thin blood vessels and reduce the intensity gap between the thin and thick blood vessels, and finally post-processed the discriminative features from the CNN using a dense CRF to obtain the final prediction. Validation of the proposed method on the DRIVE, STARE, CHASE_DB1 and HRF databases achieved F1-scores of 0.7942, 0.8017, 0.7644, and 0.7627, respectively. CRF as a post-processing technique improves performance and helps achieve a fine, precise final prediction. However, it introduces multiple stages and increases the computational time.

3) FCN RETINAL BLOOD VESSEL SEGMENTATION METHODS
To address the limitations of CNN architectures, [117] developed Fully Convolutional Networks (FCN) for semantic image segmentation. The authors in [146], [152], [162], [163], [195], [230], [246] adopted the same approach. Atli and Gedik [246] designed an FCN model that first up-samples and then down-samples to capture both thick and thin blood vessels. The method also added residual modules to prevent the loss of contextual information in the deeper sections of the model. The proposed model was evaluated on the DRIVE, STARE, and CHASE_DB1 databases and performed excellently overall, but recorded a low sensitivity score of 65% on the STARE dataset.
To alleviate the problem of spatial information loss, Luo et al. [146] developed a size-invariant FCN to extract retinal blood vessels from retinal images. Dasgupta and Singh [152] proposed an FCN method to handle the challenge of diversity in the morphology of the retinal blood vessels. Their approach formulated the segmentation problem as a multi-label problem and optimized the network using a joint loss function. FCN models have obtained remarkable performance on the segmentation of retinal blood vessels; however, their predictions are often coarse, with irregular boundaries.

4) U-Net RETINAL BLOOD VESSEL SEGMENTATION METHODS
The tolerance of U-Nets to small datasets, as well as their ability to produce output at the exact resolution of the input, has made them attractive to researchers in this domain. The U-Net structure has undergone significant modification owing to its extensive application in retinal blood vessel segmentation. The studies in [231] and [239] modified the U-Net structure by replacing the standard convolutions with atrous convolutions to increase the receptive field while keeping the usual number of parameters, preventing the spatial loss of information. In [204] and [187], the authors used dropout to deactivate some neurons of the previous layer, thus alleviating overfitting, minimizing the number of channels, and producing a model with good generalization ability. The modification in [193] used the attention gate technique to tackle the interference of background objects with the segmented retinal blood vessels. Similarly, [245] adopted a weighted attention gate approach to eliminate unwanted background features.
The algorithms presented in [179], [209], and [186] used dense U-Nets to avoid the learning of redundant activation maps and prevent the loss of detailed information, achieving better predictions with minimal parameters and computational cost. Adarsh et al. [236] implemented an auto-encoder network based on a U-Net and residual path. The authors in [226] and [230] used structural redundancy and active learning to eliminate redundancy, to enhance the U-Net's performance.
Dilated convolutions are explored in [185], [194], [189], and [192] to increase the size of the receptive field while maintaining the number of parameters, preventing the loss of spatial information. A new joint loss function, metric, and data augmentation are introduced in [166] and [173] to resolve data imbalance challenges and evaluate segmentation performance. To increase the depth of the network, maintain a reasonable convergence rate, and prevent gradient vanishing and loss of information, the studies in [169], [164], [193], [205], and [242] introduced residual modules. To address the diversity in the size, scale, features, and shape of retinal vessels, [202] modified the U-Net structure by introducing deformable receptive fields to reduce the number of missed vessels. The U-Net methods achieved promising results, and their performance evaluations are presented in Table 8. Nevertheless, some of the methods missed tiny vessels.
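The parameter-free receptive-field growth that makes dilated (atrous) convolutions attractive follows from the standard effective-kernel formula, sketched below for illustration:

```python
def effective_kernel(k, d):
    """Effective kernel size of a k-tap convolution with dilation rate d:
    k_eff = k + (k - 1) * (d - 1). The number of learned taps stays at k."""
    return k + (k - 1) * (d - 1)

# A 3x3 kernel with increasing dilation covers a growing receptive field
# while keeping the same 9 learned parameters per channel pair.
sizes = [effective_kernel(3, d) for d in (1, 2, 4)]   # [3, 5, 9]
```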

5) AUTO-ENCODERS RETINAL BLOOD VESSEL SEGMENTATION METHODS
An auto-encoder is an unsupervised deep neural network architecture for feature extraction and selection which is, however, prone to overfitting [257]. The network consists of distinct encoding and decoding phases as shown in Figure 12.
To resolve the problem of overfitting in auto-encoder networks, Vincent et al. [57] in 2008 designed the Denoising Auto-Encoder (DAE), a modification of the conventional auto-encoder. A DAE partially corrupts the input data with noise or a mask and trains the model to reconstruct the original input. To exploit the benefits of DAEs, Fan and Mo [142] proposed a supervised, layer-wise initialized neural network based on DAEs, aimed at segmenting the retinal blood vessels from the retinal image. The method fine-tuned the proposed network using a backpropagation algorithm and adopted an MSE loss function based on the L2-norm to quantify the potency of the weight matrices. The proposed method achieved excellent average accuracy scores of 96.12% and 96.14% on the DRIVE and STARE databases, respectively, and a remarkable sensitivity score of 97.02% on the CHASE_DB1 database. Adarsh et al. [236] designed an auto-encoder deep learning model based on the residual module and U-Net architecture to extract retinal blood vessels effectively from the retinal fundus image. The proposed framework achieved an F1-Score of 0.8227 and an AUROC of 97.95%.
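The DAE corruption-and-reconstruction objective can be sketched as follows. This NumPy sketch shows only the masking corruption and the reconstruction loss against the clean input; the encoder/decoder weights and their training are omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

def mask_corrupt(x, drop_prob=0.3):
    """DAE-style corruption: randomly zero out a fraction of input pixels."""
    mask = rng.random(x.shape) >= drop_prob
    return x * mask

def reconstruction_loss(reconstruction, original):
    """MSE between the reconstruction and the *clean* original input."""
    return np.mean((reconstruction - original) ** 2)

clean = rng.random((8, 8))
noisy = mask_corrupt(clean)
# An untrained "identity decoder" is penalized for the corrupted pixels;
# training drives this loss toward zero, forcing robust features.
loss = reconstruction_loss(noisy, clean)
```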

F. ENSEMBLE LEARNING
An ensemble learning method combines multiple classifiers into one meta-classifier with the aim of obtaining better accuracy than any single classifier alone. The outputs of the different classifiers are combined through averaging, probability scores, or voting to enhance the accuracy of the ensemble. Maji et al. [140] designed an ensemble ConvNet to alleviate the challenge of overfitting. The method averaged the outputs of the ensembled models to obtain the final output. The evaluation of the proposed ensemble method revealed a maximum accuracy of 94.7% and an AUC score of 92.8%, as presented in Table 8.
The authors in [197] used five similar models to address the diversity characteristics of retinal blood vessels and obtain a precise final prediction. The proposed framework averaged the probabilistic maps from the five sub-models and achieved an average performance that outperformed the individual sub-models. Proposed in [203] is an unsupervised ensemble method based on multiple segmentation algorithms for extracting retinal blood vessels from retinal images. The method fused M2UNet [201], LadderNet [258], and VesselNet [259].
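The probability-map averaging used by these ensembles can be sketched in a few lines (an illustrative NumPy sketch with toy sub-model outputs, not the actual models of [140] or [197]):

```python
import numpy as np

def ensemble_prediction(prob_maps, threshold=0.5):
    """Average the probabilistic maps of several sub-models, then binarize."""
    mean_map = np.mean(prob_maps, axis=0)
    return (mean_map >= threshold).astype(np.uint8), mean_map

# Three toy sub-model probability maps for a 2x2 patch.
maps = [np.array([[0.9, 0.4], [0.2, 0.6]]),
        np.array([[0.8, 0.3], [0.1, 0.7]]),
        np.array([[0.7, 0.2], [0.3, 0.8]])]
vessel_mask, mean_map = ensemble_prediction(maps)
```

Averaging smooths out the uncorrelated errors of the individual sub-models, which is where the ensemble's accuracy gain comes from.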

G. LEARNING METHOD 1) TRANSFER LEARNING U-Net RETINAL BLOOD VESSEL SEGMENTATION METHODS
The number of training images in a dataset plays a key role in obtaining a model with good generalization ability, free of overfitting and coarse prediction; it boosts the efficiency of the model and improves its overall performance. Therefore, to leverage the performance of deep learning architectures and compensate for the insufficiency of retinal fundus images for network training, many studies [141], [149], [158], [176], [177], [191], [201], [211], [229], [249] have explored the transfer learning technique.
Inspired by transfer learning, Jiang et al. [162] implemented a three-step scheme to extract retinal blood vessels from retinal images. The authors pre-processed and augmented both the training and test datasets before using an AlexNet fully convolutional network to segment the retinal vessels. The AlexNet prediction was further post-processed using Otsu thresholding.
Likewise, [249] proposed a deep learning algorithm based on VGG-16 for transfer learning to extract retinal vessels from retinal images. The proposed method obtained a significant sensitivity score of 87.38% on the STARE dataset.
To obtain a precise final output, [149] developed a CNN-CRF model for boundary detection, refining the CNN's predicted probability map with a post-processing technique called Conditional Random Fields (CRF). To maximally exploit the benefits of the transfer learning approach, it is advisable to select a model pre-trained on a dataset similar to the target dataset. On the other hand, due to the limited size of the available retinal image datasets, and medical image datasets in general, it is not easy to find a model pre-trained on a similar dataset.
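The core mechanic of fine-tuning, reusing pre-trained weights and updating only the new layers, can be sketched conceptually. This NumPy sketch uses hypothetical "encoder" and "head" parameter sets; real implementations would load, e.g., VGG-16 weights and freeze them in a framework such as PyTorch or Keras:

```python
import numpy as np

# Hypothetical parameter sets: a "pre-trained" encoder and a new task head.
params = {"encoder": np.ones(4), "head": np.zeros(4)}
frozen = {"encoder"}   # layers reused from the pre-trained model

def sgd_step(params, grads, lr=0.1):
    """Update only the trainable (non-frozen) parameters."""
    for name, g in grads.items():
        if name not in frozen:
            params[name] = params[name] - lr * g
    return params

grads = {"encoder": np.ones(4), "head": np.ones(4)}
params = sgd_step(params, grads)
# The encoder weights are untouched; only the segmentation head adapts.
```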

VII. TREND ANALYSIS, CHALLENGES, AND FUTURE DIRECTIONS
To meet the set objectives, the trend analysis, challenges, suggestions, and future research directions are detailed in this section.

A. ANALYSIS OF DEEP LEARNING RETINAL BLOOD VESSEL SEGMENTATION METHOD TRENDS
The analysis of deep learning retinal blood vessel segmentation method trends from 2016 to 2021 is presented as follows:

1) ANALYSIS OF THE TREND OF DATA AUGMENTATION USAGE
Despite the challenge of data scarcity in this domain and the need to train deep learning architectures with large datasets, data augmentation usage increased from 2016 to 2019 but then dropped from 2020 to 2021, as depicted in Figure 13(a).
Also, in Figure 13(b), only 42% of the literature reviewed used data augmentation, while 58% did not, an indication that many existing methods have not exploited the advantages of data augmentation as a regularization technique. Exposing the training network to different data transformations prevents overfitting, which in turn boosts the generalization ability of a model. However, applying data augmentation techniques may incur additional overhead cost, computational resources, and training time. More importantly, augmentation cannot tackle the biases present in limited datasets.

2) ANALYSIS OF DEEP LEARNING MODEL TRENDS
Several deep learning architectures have been used to extract retinal blood vessels from retinal fundus images. The trend in Figure 14(a) reveals a steady increase in the usage of U-Net architectures from 2017 to 2020, among others. The drop in 2021 may be because more 2021 articles were yet to be published. The trend shows that the U-Net architecture is the most preferred, while the auto-encoder is the least. In Figure 14(b), 35% of the state-of-the-art methods reviewed used a U-Net, confirming its suitability and great potential for the segmentation of bio-medical images, as mentioned in Section VI-J. The next most preferred approach is the multi-model method (19%). This technique has demonstrated significant performance because it derives its learning efficiency from the strengths of different networks; nevertheless, the multi-model method is more computationally complex to train. The usage percentages of the other architectures range from 2% to 15%.

3) ANALYSIS OF TRANSFER LEARNING USAGE TRENDS
Figure 15(a) shows an increase in the use of transfer learning in 2019 and a decrease in 2020 and 2021, while non-transfer learning maintained a steep increase from 2017 to 2021. In addition, Figure 15(b) shows that 62% of the existing models reviewed learned from scratch, while 38% used transfer learning. This may be attributed to the limitation of models pre-trained on datasets whose features differ from medical images, as mentioned in Section VI-D, or to the remarkable segmentation performance of U-Nets and the research interest they have received in this domain, given their tolerance of small datasets and low computational resource requirements.

4) ANALYSIS OF ENSEMBLE LEARNING USAGE TRENDS
A drawback of ensemble learning in this domain is the correlation in the base classifiers' errors due to the limited datasets. It is also perceived as a complex architecture that requires high computational resources.

5) ANALYSIS OF OPTIMIZER USAGE TRENDS
Figure 17(a) shows a consistent increase in the usage of the ADAM and SGD optimizers among the reviewed papers over the years.
In Figure 17(b), the distribution of each optimizer's usage is presented. Five optimizers were used in the surveyed methods: RMSprop, NADAM, ADAMW, SGD, and ADAM. The distribution shows that the ADAM optimizer is the most frequently used, followed by SGD: 40% of the methods investigated used ADAM, while 34%, 1%, 2%, and 1% used SGD, NADAM, RMSprop, and ADAMW, respectively. The remaining 22% of the papers did not specify the optimizer(s) used.

The analysis above of optimizer usage in deep learning retinal blood vessel segmentation methods reveals a high pace of adoption of the ADAM optimizer. Its increased usage may be attributed to its fast convergence, modest computational requirements, and robustness to noisy inputs such as retinal fundus images. More importantly, by combining attributes of the momentum-based SGD and RMSprop optimization algorithms, ADAM makes parameter tuning more robust and efficient [264], [265]. The next most commonly used optimizer is SGD; although it has issues with convergence speed and local minima, these limitations can be mitigated using momentum.
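ADAM's combination of the two attributes noted above is visible in its update rule: a momentum-style first-moment estimate and an RMSprop-style second-moment estimate, both bias-corrected. A minimal NumPy sketch of one update step:

```python
import numpy as np

def adam_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One ADAM update: momentum (m) and RMSprop-style scaling (v) combined."""
    m = b1 * m + (1 - b1) * g            # first-moment (momentum) estimate
    v = b2 * v + (1 - b2) * g ** 2       # second-moment (RMSprop) estimate
    m_hat = m / (1 - b1 ** t)            # bias correction for step t
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

w = np.array([1.0]); m = np.zeros(1); v = np.zeros(1)
w, m, v = adam_step(w, np.array([0.5]), m, v, t=1)
```

The per-parameter scaling by the second moment is what makes ADAM comparatively insensitive to noisy gradients.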

6) ANALYSIS OF LOSS FUNCTION USAGE TRENDS
From the trend shown in Figure 18(a), the number of deep learning retinal blood vessel segmentation methods employing the binary cross-entropy loss function rose steadily from 2016 to 2021. The distribution chart in Figure 18(b) also shows that 62% of the state-of-the-art techniques reviewed in this study used a binary cross-entropy loss function, 11% used class/weight balancing, 10% used dice loss, 1% used focal loss, and 16% used others (MSE, Jaccard, negative log-likelihood, Euclidean, Hamming, and formulated loss functions).
The analysis of the distribution and usage trend above shows that the binary cross-entropy loss function is the one most preferred by authors segmenting retinal vessels from fundus images with deep learning. These methods achieved high accuracy; however, how authentic are these accuracy scores, especially where class imbalance in the datasets was not addressed before the segmentation task? The two classes involved in this binary classification task (blood-vessel and non-blood-vessel pixels) are highly imbalanced, with the latter occupying a far larger region (pixel distribution) than the former. Table 5 clearly shows that binary cross-entropy is limited in handling non-uniform distributions among data classes, and it may bias the network's accuracy towards the class with the higher pixel distribution. Therefore, the focal and dice loss functions are more suitable for hard-to-segment retinal fundus datasets with class imbalance.
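The two imbalance-aware alternatives recommended above can be written compactly. This is a hedged NumPy sketch of the standard dice and focal loss formulations, not any particular surveyed implementation:

```python
import numpy as np

def dice_loss(pred, target, eps=1e-7):
    """Dice loss: overlap-based, so the dominant background class cannot
    drown out the sparse vessel pixels."""
    inter = np.sum(pred * target)
    return 1 - (2 * inter + eps) / (np.sum(pred) + np.sum(target) + eps)

def focal_loss(pred, target, gamma=2.0, eps=1e-7):
    """Focal loss: down-weights easy (well-classified) pixels by (1 - p_t)^gamma,
    focusing training on the hard vessel pixels."""
    p = np.clip(pred, eps, 1 - eps)
    pt = np.where(target == 1, p, 1 - p)
    return -np.mean((1 - pt) ** gamma * np.log(pt))

# Imbalanced toy patch: 1 vessel pixel, 7 background pixels.
target = np.array([1, 0, 0, 0, 0, 0, 0, 0], dtype=float)
pred = np.array([0.6, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1])
dl = dice_loss(pred, target)
fl = focal_loss(pred, target)
```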

7) ANALYSIS OF POOLING OPERATION USAGE TRENDS
The usage trend of the max-pooling operation (as seen in Figure 19(a)) increased consistently from 2016 to 2021, rising most drastically in 2020. From 2016 to 2018, no method used average pooling. On the other hand, the idea of eliminating the pooling operation to prevent the spatial loss of detailed information saw usage from 2017 to 2021, although with no recorded usage in 2020. Figure 19(b) reveals that 84% of the surveyed deep learning retinal blood vessel segmentation algorithms used max-pooling, 10% removed the pooling layer completely, while 6% used average pooling.
The usage analysis above shows that max-pooling is still predominantly used despite its known limitation of degrading accuracy through its down-sampling effect on the resolution of the input image. Nevertheless, the down-sampling operation prevents the selection of redundant features, reduces the number of parameters, and minimizes overfitting. The strength of max-pooling lies in the fact that the spatial information of a pattern is encoded over the different patches of the feature map, and more information is obtained from the maximum features than from the average features. On the other hand, strided convolutions focus on sparser windows of the input, while the average pooling operation focuses on the average feature presence; both can result in loss of information or features with diluted information. In addition, introducing a more elaborate pooling layer to reduce the input image's size may add parameters and hence training time. Nonetheless, a model's predictive accuracy is vital for accurate clinical diagnosis and interpretation in the medical domain.

8) ANALYSIS OF ACTIVATION FUNCTIONS USAGE TRENDS
Figure 20(a) shows the trend in activation function usage over the years for deep learning retinal blood vessel segmentation. The usage of ReLU has risen consistently over the years, followed by LReLU, while the PReLU activation function was used by only one method. In Figure 20(b), ReLU, LReLU, and PReLU showed 64%, 5%, and 1% usage, respectively, while the activation functions used in 30% of the methods were not specified.

From the distribution and trend analysis, the ReLU activation function has attracted the interest of many authors. This may be attributable to the fact that it is computationally economical (unlike ELU), has a remarkable effect on accuracy (unlike LReLU), minimizes overfitting, and resolves vanishing gradients. However, a ReLU neuron dies whenever it becomes stuck on the negative side without recovery and outputs zero; this is the 'dying ReLU' problem. Therefore, the Mish [262] and Swish [263] activation functions could be explored to resolve these limitations.
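The dying-ReLU behaviour and the alternatives mentioned can be compared directly. A minimal NumPy sketch of ReLU, LReLU, and Mish [262] on sample inputs:

```python
import numpy as np

def relu(x):
    """Zero for negative inputs: the neuron can 'die' (zero output and gradient)."""
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    """Keeps a small slope for negative inputs, avoiding dead neurons."""
    return np.where(x > 0, x, alpha * x)

def mish(x):
    """Mish: x * tanh(softplus(x)); smooth and non-zero for x < 0."""
    return x * np.tanh(np.log1p(np.exp(x)))

x = np.array([-2.0, -0.5, 0.0, 1.5])
# For negative inputs ReLU outputs exactly zero, while LReLU and Mish
# keep a small signal (and gradient) alive.
r, l, m = relu(x), leaky_relu(x), mish(x)
```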

9) ANALYSIS OF PRE-PROCESSING USAGE TRENDS
The trend in image pre-processing usage in Figure 21(a) shows a consistent increase over the years, indicating that researchers have realized the impact of pre-processing images before deep learning segmentation. Figure 21(b) shows that 67% of the methods surveyed pre-processed their images before the deep learning segmentation stage, while the remaining 33% eschewed the pre-processing stage. This distribution agrees with the view that adequate pre-processing can improve image quality [245], enhance a CNN's detection ability [260], and may boost the segmentation performance of deep learning architectures [261]. Another school of thought argues that image pre-processing distorts and degrades the image quality and hinders a CNN's optimal performance [145], [236], [242]. In our view, appropriately pre-processed images can improve the quality of the image and boost a CNN's performance; although some enhancement algorithms may introduce after-effects on the images, corrective measures can minimize such effects. The key issue is the appropriate application of the enhancement algorithms: a well-designed, efficient model is good, but quality data is better.
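A common pre-processing pipeline in the surveyed works extracts the green channel (which offers the best vessel contrast) and equalizes its histogram. The NumPy sketch below implements global histogram equalization for illustration; the surveyed methods typically use the adaptive variants (AHE/CLAHE), e.g. via `skimage.exposure.equalize_adapthist`:

```python
import numpy as np

def green_channel(rgb):
    """Extract the green channel, which gives the best vessel contrast."""
    return rgb[..., 1]

def hist_equalize(img, levels=256):
    """Global histogram equalization of an 8-bit grayscale image."""
    hist = np.bincount(img.ravel(), minlength=levels)
    cdf = np.cumsum(hist).astype(float)
    cdf = (cdf - cdf.min()) / (cdf.max() - cdf.min())   # normalize to [0, 1]
    lut = np.round(cdf * (levels - 1)).astype(np.uint8) # intensity lookup table
    return lut[img]

# Toy RGB "fundus" patch.
rgb = (np.arange(4 * 4 * 3).reshape(4, 4, 3) % 256).astype(np.uint8)
enhanced = hist_equalize(green_channel(rgb))
```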

B. CHALLENGES
1) DEGRADED IMAGE QUALITY
Data quality is directly correlated with the performance of deep learning models. Unfortunately, the retinal fundus images used to diagnose ophthalmological issues are often visually and qualitatively degraded due to non-invasive acquisition and environmental factors. These complexities hinder the proper learning of feature representations and reduce the predictive accuracy of the models.

2) LIMITED ANNOTATED DATA POINTS FOR NETWORK TRAINING
Deep learning models are data-hungry and require huge data volume for effective training and feature learning. However, labelled retinal fundus images are limited and are costly to annotate.

3) CLASS IMBALANCE
The retinal fundus image's area of interest, the foreground (blood vessels), is much smaller than the remaining area (the background, i.e., non-blood-vessel pixels). This uneven distribution of the pixel classes gives rise to the problem of class imbalance, which can bias the network's accuracy towards the more represented class.

4) HETEROGENEITY IN THE RETINAL BLOOD VESSEL MORPHOLOGICAL STRUCTURE
Retinal vessels vary in size, scale, width, texture, etc. These variations sometimes make it difficult for deep learning architectures to predict the class (blood vessel or non-blood vessel) a particular pixel belongs to.

5) LACK OF A STANDARD PRE-PROCESSING APPROACH
There is no uniform pre-processing module: authors use techniques of varying quality, leading to further degradation where adequate measures are not taken.

6) MULTICLASS CLASSIFICATION LIMITATION OF CNN MODELS
Many deep learning models perform better on binary classification tasks than on multi-class classification, especially when fully connected layers are included in the architectural design.

7) THE TRADE-OFF BETWEEN DEEP LEARNING MODEL DEPTH, CONVERGENCE SPEED, GRADIENT VANISHING, AND GRADIENT EXPLOSION
Deep networks usually obtain high accuracy, but at the expense of convergence speed due to the network's depth and a massive number of parameters.

8) THE TRADE-OFF BETWEEN POOLING AND ACCURACY (LOSS OF DETAILED INFORMATION)
A pooling layer discards features it deems redundant according to its aggregation rule (e.g., max or average). Some of the eliminated features are critical minute details necessary for accurate analysis and diagnosis, so their loss degrades accuracy.

C. SUGGESTED PERFORMANCE ENHANCEMENT TECHNIQUES FOR FUTURE RESEARCH
From the comprehensive and critical analytical review, the identified gaps outlined above have limited the optimal performance and predictive accuracy of some methods. This paper gathered possible techniques that can resolve the identified limitations. The following suggestions would aid researchers in this domain to develop an efficient deep learning model for automatic retinal blood vessel segmentation:

1) DATA QUALITY
Retinal fundus images suffer quality degradation during acquisition, and this requires improvement through efficient pre-processing techniques. Unfortunately, 33% of the methods reviewed in this paper eschewed the pre-processing stage, whether for fear of further degradation or for other reasons. Meanwhile, the visual complexities of medical images, such as noise and low contrast, can mask details that are critical to medical analysis and diagnosis. The obscurity of these details can hamper the learning of feature representations and hinder the optimal performance of deep learning models. Hence, it is essential to pre-process the images using adequate and efficient enhancement techniques.

2) LIMITED LABELLED DATASETS
• Data augmentation is a regularization technique that improves a trained model's generalization ability. It moderately transforms the original input images in the training dataset through random transformations such as rotations, cropping, right and left shifts, shearing, and horizontal and vertical flips. These transformations expose the network to 'new' images (new in appearance, though not in labels), giving the illusion of a large training dataset so that the network can thoroughly learn underlying details. Nevertheless, the transformation techniques should be selected carefully to minimize the additional computational cost.
• The application of GANs is an automatic data augmentation strategy that can solve the critical limitation of a small training set. GANs are deep learning networks that have demonstrated remarkable results, outperforming conventional augmentation strategies.
• Transfer learning can be used to leverage the performance of a deep learning architecture and compensate for the insufficiency of retinal fundus images for network training. However, to maximize the full benefits of transfer learning, it is advisable to select a model pre-trained on a dataset similar to the one to be trained on. Proper implementation of the above techniques would prevent overfitting, reduce training errors, and improve the generalization ability of deep learning models.
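The label-preserving transformations described in the first point above can be sketched in a few lines. `hflip`, `rot90`, and `augment` are illustrative names rather than any reviewed method's API, and in segmentation the same transform must also be applied to the ground-truth vessel mask so that image and label stay aligned.

```python
def hflip(img):
    """Horizontal flip: reverse each row of a 2-D image (list of rows)."""
    return [row[::-1] for row in img]

def rot90(img):
    """Rotate 90 degrees counter-clockwise: transpose, then reverse the rows."""
    return [list(row) for row in zip(*img)][::-1]

def augment(img):
    """Return the original image plus simple label-preserving variants,
    giving the network the 'illusion' of a larger training set."""
    return [img, hflip(img), rot90(img)]
```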

3) CLASS IMBALANCE
To address the unequal distribution between the labels to be classified (i.e., blood vessel and non-blood vessel), the Dice, focal, class-balanced weighted, and properly formulated joint loss functions are more suitable than the standard pixel-wise cross-entropy loss.
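As a sketch of one such imbalance-aware objective, the soft Dice loss over flattened binary masks can be written as follows. `dice_loss` is an illustrative implementation, with a small `eps` term assumed for numerical stability; it is computed on the foreground overlap, so the large background class does not dominate the loss the way it does in plain cross-entropy.

```python
def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss: 1 - 2|P∩T| / (|P| + |T|).

    pred and target are flattened vessel masks (probabilities or 0/1 labels).
    A perfect overlap gives ~0; a disjoint prediction gives ~1.
    """
    inter = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 1.0 - (2.0 * inter + eps) / (total + eps)
```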

4) HETEROGENEITY IN THE RETINAL BLOOD VESSEL MORPHOLOGICAL STRUCTURE
The variations in the morphological structure of the retinal blood vessels can be handled using deformable receptive fields, dense blocks, and skip connections to improve the detection of retinal blood vessels irrespective of size, shape, and scale.

5) LACK OF A STANDARD PRE-PROCESSING APPROACH
A uniform pre-processing technique can be proposed and adopted to standardize the pre-processing of retinal fundus images. This would ensure fairness in performance comparison.

6) MULTICLASS CLASSIFICATION LIMITATION OF CNN MODELS
The challenge of the multi-classification limitation in CNN models can be resolved by replacing the fully connected layers with a convolution layer and a SoftMax function at the output layer.

7) THE TRADE-OFF BETWEEN DEEP LEARNING MODELS, THE CONVERGENCE SPEED, AND VANISHING GRADIENTS
Vanishing gradients are one of the significant drawbacks of deep learning networks. This challenge hampers convergence speed and prediction accuracy. Some techniques which could be applied to alleviate this challenge are:
• Skip connections. These are most suitable for solving the above trade-off because they optimize error propagation, which aids the generalization of models and prevents gradients from shrinking to zero. These attributes aid convergence speed and feature reusability. The residual learning approach is another suitable element which can be adopted to prevent gradients from vanishing or exploding.
• Activation functions, which play a significant role in tackling vanishing gradients and poor convergence rates. However, the wrong function choice may not solve the vanishing gradient challenge. Table 3 details some activation functions and their attributes and can serve as a guide to the appropriate selection of suitable activation functions.
• Batch Normalization. This technique has also been shown to speed up convergence.
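The identity shortcut underlying the skip-connection and residual-learning points above can be sketched as follows. `residual_block` and `dead` are illustrative names under the simplifying assumption of 1-D feature vectors; the point is that the input reaches the output even when the learned transform contributes nothing, which is what keeps the backward gradient path from collapsing to zero in deep stacks.

```python
def residual_block(x, transform):
    """Identity skip connection: output = transform(x) + x.

    The '+ x' term gives gradients a direct path around the transform,
    mitigating vanishing gradients in deep networks.
    """
    fx = transform(x)
    return [a + b for a, b in zip(fx, x)]

# Even a "dead" transform that outputs all zeros cannot block
# information flow, because the shortcut carries x through unchanged.
dead = lambda v: [0.0] * len(v)
```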

8) THE TRADE-OFF BETWEEN POOLING AND ACCURACY (LOSS OF DETAILED INFORMATION)
The pooling operations at the pooling layers are responsible for feature selection and image down-sampling, which aim to remove redundant features, thus reducing the computational cost of processing millions of parameters. However, this operation can also remove the fine details necessary for accurate clinical diagnosis. To handle this:
• Skip connections could be used to ensure that every layer uses features that were not used in the previous layers. This ensures that every minute detail is captured at every layer, irrespective of the model's depth. Dilated convolution and residual learning techniques can also prevent the loss of detailed information such as tiny retinal blood vessels.
• The pooling layer could be replaced with a stride convolution layer.
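The second point can be sketched in one dimension: a stride-2 convolution halves the resolution just as a 2-pixel pooling window does, but aggregates neighbouring values with learnable weights instead of discarding them by a fixed rule. `strided_conv1d` is an assumed helper for illustration, not code from any reviewed method.

```python
def strided_conv1d(x, kernel, stride=2):
    """Valid 1-D convolution with stride.

    Downsamples like pooling, but the kernel weights are learnable, so the
    network decides which fine details survive the resolution reduction.
    """
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k))
            for i in range(0, len(x) - k + 1, stride)]
```

With an averaging-style kernel `[1, 1]` and stride 2, `strided_conv1d([1, 2, 3, 4, 5], [1, 1])` yields `[3, 7]`: the same halved resolution as pooling, but every input value contributes to the output.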

9) ENSEMBLE LEARNING
An ensemble learning method combines multiple classifiers into one meta classifier to obtain better accuracy than when a single classifier is used. This learning method has demonstrated excellent performance. However, despite its proven tendency to produce higher accuracy, only 2% of the reviewed methods adopted it. This implies that its benefits are not fully exploited in this domain, perhaps because of its architectural complexities.
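A minimal sketch of the combination step, assuming per-pixel hard labels from each base classifier and simple majority voting (one of several possible fusion rules; the reviewed ensembles may use other schemes such as weighted averaging of probabilities):

```python
from collections import Counter

def majority_vote(predictions):
    """Fuse per-pixel labels from several base classifiers by majority vote.

    predictions: a list of label sequences, one per base classifier,
    all over the same pixels. Returns the fused label sequence.
    """
    return [Counter(labels).most_common(1)[0][0]
            for labels in zip(*predictions)]
```

For instance, three classifiers voting `[1, 0, 1]`, `[1, 1, 0]`, and `[0, 1, 1]` on three pixels fuse to `[1, 1, 1]`, overriding each classifier's individual mistakes.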

VIII. CONCLUSION
This paper carried out a comprehensive and critical analytical review of 110 published articles on deep learning retinal blood vessel segmentation methods spanning 2016 to 2021. It summarizes the critical attributes of techniques central to the efficiency of state-of-the-art deep learning retinal blood vessel segmentation algorithms. It also focuses on the vital taxonomies and their characteristics, presents trend analyses of the recent literature, identifies gaps to be filled in future research, and provides possible solutions to bridge the identified gaps. This survey is significant as an aid and guide for future research towards adopting optimal techniques that enhance the predictive accuracy and generalizability of new deep learning algorithms for the automatic segmentation of retinal blood vessels from retinal fundus images. This improvement will facilitate the approval of more algorithms by the US FDA and other national and international regulatory bodies. This, in turn, would ease access to screening facilities for the early detection of diabetic retinopathy and its progression, preventing many prospective victims of impaired vision from premature vision loss and blindness. As observed from the World Vision Report cited in Section I of this study, about 1 billion cases of vision impairment could have been prevented through early detection and timely clinical intervention and management [16]-[18].