Deep Learning-Based Forearm Subcutaneous Veins Segmentation

In most medical treatments, intravenous catheterization is a crucial first step, in which health practitioners locate a superficial vein for blood sampling or medication procedures. In some patients these veins are hard to localize due to physiological characteristics such as dark skin tone, scars, and vein depth, which often results in multiple needle-insertion attempts. This causes pain, delayed treatment, bleeding, and even infections. To reduce these risks, an automated vein detection method is needed that can efficiently segment the veins and produce realistic results for cannulation purposes. Many imaging modalities are used for this purpose, such as photoacoustic, transillumination, ultrasound, and near-infrared imaging. Among these, near-infrared (NIR) imaging is considered the most suitable due to its lower cost and non-ionizing nature. Over the past few years, subcutaneous vein localization using NIR has attracted increasing attention in health care and biomedical engineering. The proposed research is therefore based on NIR images for forearm subcutaneous vein segmentation. This paper presents a deep learning approach based on Generative Adversarial Networks (GANs) for segmentation/localization of forearm veins. GANs have recently shown exciting results in medical imaging; they are used for unsupervised feature learning and image-to-image translation, and generate realistic results by learning a mapping of data from one state to another. Since GANs can produce state-of-the-art results, we propose a Pix2Pix GAN for segmentation of forearm veins. The proposed algorithm is trained and tested on a forearm subcutaneous vein image dataset. The proposed model outperforms traditional approaches, with mean accuracy and sensitivity values of 0.971 and 0.862, respectively.
The dice coefficient and Intersection over Union (IoU) scores are 0.962 and 0.936, respectively, which are better than state-of-the-art methods.


I. INTRODUCTION
It is estimated that about 90-95% of patients in hospital receive intravenous (IV) catheterization. As the IV technique is the fastest way to deliver treatment and medication to patients, it is used for blood transfusions, medications, and diagnostic tests. Around 500 million intravenous procedures are performed each year worldwide, with approximately 14 million first-attempt failures [1]. Failed intravenous insertion usually causes pain and can result in vein injuries and bruising. The main reason for failed insertion is that accurately locating the vein is difficult and relies solely on the visual inspection and experience of medical practitioners [2]. Practitioners face particular difficulty with children, people with darker skin tones, and people suffering from obesity. In addition, narrowing of veins, which occurs due to increased blood pressure, aggravates the problem of vein identification. One study estimated that, on average, each patient requiring intravenous medication needs 2.18 venipuncture attempts before a catheter is properly placed [3]. Enhancing the visualization of veins can increase the success rate of the intravenous cannulation process [4].
Human skin comprises three fundamental layers: epidermis, dermis, and hypodermis [5]. The epidermis (outer) layer absorbs a portion of incident light, and the remaining light reaches the tissue beneath; this is where light dispersion occurs [6]. Most of the scattering happens inside the dermis layer before the light spreads to the hypodermis, although some of the light is absorbed [7]. Subcutaneous fat disperses a large part of the light, while hemoglobin in the blood vessels absorbs a portion of it. The blood in subcutaneous veins has a greater concentration of deoxyhemoglobin than oxyhemoglobin [8]. These two hemoglobin types have different light absorption characteristics: in the 600 to 800 nm wavelength range, light absorption by veins is greater than absorption by arteries. Absorption decreases rapidly with wavelength for deoxyhemoglobin, while for oxyhemoglobin it rises slightly and then decreases [9].
Vein detection systems work by illuminating the patient's skin and acquiring an image with a camera sensitive to that light [10]. The acquired image is then processed for vessel segmentation. For vein localization, transillumination, photoacoustic, and near-infrared (NIR) imaging techniques are widely used. Transillumination uses high-intensity light and can cause skin burns during vein localization [11]. Photoacoustic techniques use combined optical and ultrasound systems, which are complicated and face portability issues [12]. NIR, however, is cost-effective and can be applied to patients repeatedly without harmful effects [13].
The NIR image is processed to segment the veins and enhance their visualization for catheterization procedures. The most common techniques are contrast enhancement, particularly contrast-limited adaptive histogram equalization (CLAHE) [14], thresholding [15], region of interest (ROI) identification [16], erosion and closing operations [17], edge detection [18], line detection [19], high-pass or morphological windowed filtering [18], and noise removal using high-frequency filtration and blurring [15]. In addition, deep learning algorithms are used for segmentation. Based on their enhanced performance in various medical applications, deep learning methods are quickly becoming the state of the art; they have shown substantially better results and are capable of producing higher segmentation accuracies.
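As a concrete illustration of the classical contrast-enhancement step mentioned above, the sketch below applies global histogram equalization to a synthetic low-contrast image; CLAHE extends this idea with per-tile histograms and a clip limit. This is a NumPy-only simplification, not the exact pipeline used in the cited works.

```python
import numpy as np

def hist_equalize(gray):
    """Global histogram equalization of an 8-bit grayscale image.
    CLAHE [14] extends this with per-tile histograms and a clip limit."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]
    # Map intensities so the cumulative distribution becomes uniform.
    lut = np.clip(np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255),
                  0, 255).astype(np.uint8)
    return lut[gray]

# Synthetic low-contrast 640 x 480 image (a stand-in for a real NIR frame)
img = np.clip(np.random.normal(120, 10, (480, 640)), 0, 255).astype(np.uint8)
out = hist_equalize(img)
```

After equalization the narrow intensity band is stretched over the full 0-255 range, which is what makes faint vein structure easier to see.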
Since vein detection and segmentation is a critical procedure from a medical point of view, a deep learning algorithm is needed that can efficiently detect veins for cannulation and match the high segmentation accuracies already obtained for brain, lung, and retinal segmentation. For retinal blood vessel segmentation, Generative Adversarial Networks with U-Net (U-GAN) [20] and Topology Ranking GAN (TR-GAN) [21] have achieved accuracies of 96.15 and 96.29, respectively, on the DRIVE dataset. For palm vein identification, a Multi-Scale and Multi-Direction GAN combined with a Convolutional Neural Network (MSMDGAN + CNN) [22] has produced state-of-the-art accuracy. Similarly, for finger vein verification, FV-GAN [23], based on CycleGAN, robustly extracts vein patterns and significantly improves the equal error rate (EER) and verification accuracy. These examples indicate that a GAN-based network is well suited to forearm vein segmentation, since it can produce high accuracies even on limited datasets.
The contributions of our work are: a) a novel forearm vein segmentation technique based on a Pix2Pix GAN network, implemented on an augmented forearm vein dataset; to the best of our knowledge, we are the first to use a GAN for forearm vein segmentation; and b) an augmentation technique for the forearm image dataset, detailed in section III. The remainder of this paper is organized as follows: RELATED WORK is described in section II, section III presents MATERIALS AND METHODS, section IV presents RESULTS AND DISCUSSION, and section V describes the CONCLUSION AND FUTURE WORKS.

II. RELATED WORK
Fernández and Armada [24] proposed a multi-sensor system for subcutaneous vein detection and localization. Two techniques were experimentally tested for vein extraction: an adaptive method using the maximum curvature approach, and a method based on k-means clustering. Both require several pre-processing steps to boost detection efficiency. While both methods provide adequate detection accuracy, the k-means clustering approach achieves a higher true positive rate (TPR), making it more suitable for the registration process.
Mela et al. in [25] presented a real-time venous imaging framework that combines visible-spectrum and NIR images. Optical instruments in reflectance mode and transmission mode, respectively, were used for acquiring images, and 25 different subjects were selected for the experiments. The results showed that the system can detect veins with diameters below 0.5 mm captured at any distance, and deeper veins of 0.25 mm at a fixed distance of 30 cm.
Gunawan et al. in [26] deployed an algorithm that improves image quality using a high-boost filter and performs back-projection by utilizing the intersection of the camera and projector images. NIR imaging is used to obtain the vein images. The high-boost filter is used for segmentation, and morphological closing together with contour regions is used to eliminate segmentation noise. An accuracy of 84.62% was reported with this combination. Jaemin Son et al. in [27] proposed an adversarial training method to generate an accurate map of retinal blood vessels. Their GAN framework performed retinal vessel segmentation efficiently; the results showed that, with the discriminator, blood vessel segmentation was more accurate than with a U-Net alone. Compared with the best existing methods, their method has fewer false positives on thin vessels and draws clearer lines where fine detail is needed. The dice coefficients obtained by this method are 0.829 on the DRIVE dataset and 0.834 on the STARE dataset, producing state-of-the-art results.
Neff et al. in [28] proposed a generation technique for generative adversarial networks in which the generative network is trained to produce synthetic images together with their corresponding segmentation masks. They evaluated their network on a lung segmentation task using thorax X-ray images. The results show that a U-Net trained on GAN-generated images from a reduced dataset performed better than a U-Net trained on GAN images generated from the complete dataset.
Atli et al. in [29] proposed a deep learning architecture named Sine-Net for segmentation of retinal blood vessels. It applies up-sampling followed by down-sampling, with residuals, to detect thin vessels. The reported sensitivity on the STARE dataset was low and needs improvement.
Park et al. in [30] proposed the M-GAN network to improve the vessel segmentation task. The network contains an M-generator and an M-discriminator for efficient training. Furthermore, a multi-kernel pooling block is introduced to better segment vessels of different sizes. M-GAN showed the best performance on most evaluation metrics: recall, precision, specificity, accuracy, IoU, F1 score, and Matthews correlation coefficient (MCC).
To reduce the number of needle-insertion attempts and provide clear vein visibility, the present work proposes a GAN trained on images from an NIR imaging system for subcutaneous forearm vein segmentation. The network is evaluated on various performance metrics and compared with classical approaches used for forearm vein segmentation.

III. MATERIALS AND METHODS
Deep learning methods have produced better results with higher accuracy than conventional techniques [31]. They have also improved the efficiency of data analysis because of their automated, optimized characteristics. Medical image analysis is challenging because data are often insufficient, expensive, and restricted for reasons such as patient privacy. In addition, publicly available medical image datasets are often inconsistent in annotation and size, limiting the development of efficient diagnostic systems. The generation of synthetic and segmented images therefore helps tremendously in medical image analysis. For this reason, generative adversarial networks (GANs) are considered beneficial for various applications, including vessel segmentation, which can be framed as an image-to-image translation problem. The standard GAN architecture offers no control over the generated output and does not produce realistic results on datasets whose images contain sharp edges and contours [32]. Conditional GANs, which condition the generator's output on additional input, allow control over the output. Pix2Pix GANs are conditional GANs designed for image-to-image translation. In this work, we use a Pix2Pix GAN to achieve high vessel segmentation accuracy, meeting the demands of the IV cannulation process.

A. DATASET
The experimental dataset for the proposed work initially contained 18 NIR images from different subjects with different skin tones [33]. The ground truth images of these subjects were verified by a certified radiologist. For image acquisition, a wavelength of 800 nm was used, since this wavelength is independent of oxygen variation in the blood stream [34] and the images acquired at this wavelength have higher contrast than images acquired under the visible spectrum. A 2CCD camera (JAI-AD080CL) was used for this purpose. A 64-bit processing unit with PCI slots and a 2.5 GHz processor, compatible with the Dalsa X64-CL card, was used to obtain images of size 640 × 480. Since only 18 images were acquired with this setup, we increased the dataset size for training the Pix2Pix GAN using data augmentation, applied to both training and testing images. The technique includes:
• Flipping images about the horizontal axis.
• Rotating images by 180° anticlockwise.
The augmented dataset contains 72 images. Of these, 49 images are used for training, 5 for validation, and 18 for testing. Augmentation is used to mitigate overfitting, which might otherwise be encountered during training of the Pix2Pix GAN. Sample augmented images along with their ground truth images are shown in Figure 1.
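Assuming the growth from 18 to 72 images comes from keeping, for each image, the original, its horizontal flip, its 180° rotation, and the flip-plus-rotation combination (four variants per image; the exact combination is our inference from the 4× count), the augmentation can be sketched as:

```python
import numpy as np

def augment(image):
    """Return four variants: original, horizontal flip, 180-degree
    rotation, and flip combined with rotation (assumed combination)."""
    flipped = np.fliplr(image)
    return [image, flipped, np.rot90(image, 2), np.rot90(flipped, 2)]

img = np.arange(12).reshape(3, 4)
variants = augment(img)
```

Applied to 18 source images this yields the 72-image augmented set. Note that flip followed by a 180° rotation is equivalent to a vertical flip, so all four variants are distinct for a generic image.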

B. TRAINING PIX2PIX GAN FOR FOREARM VEIN SEGMENTATION
The basic GAN architecture proposed by [35] involves two sub-models: a generator that produces new examples and a discriminator that classifies examples as real (from the domain) or fake (produced by the generator). The conditional generative adversarial network, or cGAN for short, is a type of GAN in which image generation is conditioned; generation can be conditioned on a class label, if available, allowing targeted generation of images of a given type [36]. The Pix2Pix model is a cGAN in which the output image is conditioned on an input image. The discriminator receives both the input image and a candidate target image and is trained to decide whether the target image is a plausible translation of the input. The generator is trained through an adversarial loss to produce plausible images in the target domain. It is also updated with an L1 loss computed between the expected output and the generated image; this loss encourages the generator to create plausible translations of the source images.
Goh et al. in [33] determined the optimum filter for NIR forearm vein segmentation through analysis and comparison of several vein filters, including Hessian, Gabor-wavelet-based, Frangi, Top-Hat, and Matched filters. Their experiments showed that the Matched filter outperforms all the others. They divided the dataset into three groups, G1, G2, and G3, and used 3-fold cross-validation. Similar to their approach, we also use 3-fold cross-validation, in which two groups are used for training and the third for testing. The dataset split is shown in Table 1.
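The 3-fold protocol over the fixed groups can be sketched as follows; the group contents below are placeholders, not the actual file names from Table 1:

```python
# Placeholder group contents; in practice G1-G3 hold the real image IDs.
groups = {"G1": ["img01", "img02"], "G2": ["img03", "img04"], "G3": ["img05", "img06"]}

# Each fold trains on two groups and tests on the remaining one.
folds = []
for test_name in groups:
    train = [img for name, imgs in groups.items() if name != test_name
             for img in imgs]
    folds.append({"train": train, "test": groups[test_name]})
```

Every image appears in exactly one test set across the three folds, so the reported metrics cover the whole dataset.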
In Pix2Pix for forearm vein segmentation, the discriminator uses a PatchGAN, which classifies patches of the input image as real or fake rather than the entire image. Another benefit is that the same PatchGAN model can be applied to images of different sizes. Different PatchGAN configurations exist in the literature, but for image-to-image translation the 70 × 70 PatchGAN has given better network performance and image quality [36].
The PatchGAN implementation is shown in Figure 2. As shown in Figure 3, the discriminator takes real forearm vein images together with ground truth images as input and estimates the probability that the target image is a real translation of the source image. The structure of the discriminator relies on its effective receptive field, which relates one output activation to the set of input pixels that influence it. The PatchGAN is designed so that each predicted output maps to a 70 × 70 patch of the real forearm image.
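The "70 × 70" figure is the effective receptive field of the PatchGAN discriminator. Assuming the standard configuration of five 4 × 4 convolutions with strides 2, 2, 2, 1, 1 from the original Pix2Pix work, the receptive-field arithmetic can be checked directly:

```python
def receptive_field(kernels, strides):
    """Walk backwards from one output unit to the input pixels it sees.
    At each layer the field grows by rf * stride + (kernel - stride)."""
    rf = 1
    for k, s in zip(reversed(kernels), reversed(strides)):
        rf = rf * s + (k - s)
    return rf

# Standard 70x70 PatchGAN: five 4x4 conv layers with strides 2, 2, 2, 1, 1
rf = receptive_field([4] * 5, [2, 2, 2, 1, 1])
```

Each sigmoid output of the discriminator therefore judges a 70 × 70 region of the input, regardless of the overall image size, which is why the same PatchGAN applies to images of different sizes.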
Each discriminator output gives the probability that the corresponding patch of the input image is real. These values are sometimes averaged to obtain an overall likelihood. The discriminator is trained on both real and generated images. It trains considerably faster than the generator, which is why the discriminator loss is halved to slow down its updates at each iteration.
Weighted Discriminator Loss = 0.5 × Discriminator Loss (1)

As stated earlier, a GAN consisting of a generator and a discriminator, used for image segmentation, can produce high accuracies even on limited datasets. The generator and discriminator are two neural networks. In the proposed cGAN, the generator uses a U-Net architecture [37]. The U-Net, proposed by Olaf Ronneberger et al., has been shown to produce higher-quality results than a plain encoder-decoder model [37]; such a network can be trained end-to-end from very few images and outperformed the prior best methods. The generator takes real forearm vein images and produces segmentation images resembling the ground truth. It first encodes the input image down to a bottleneck layer, then decodes the bottleneck back up to the output image size, using skip connections between corresponding encoder and decoder layers. It consists of standardized blocks of convolutional, batch normalization, dropout, and activation layers. The generator implementation is shown in Figure 4.

An image from the training dataset is given to both the discriminator and the generator, but here the generator output is connected to the discriminator input as the desired image, as shown in Figure 5. The discriminator estimates the probability that the generated image is a real segmentation of the input image. This composite model is updated with two objectives: a cross-entropy loss that labels the generated images as real, forcing substantial updates of the generator weights so that it produces more realistic images, and an L1 loss computed against the generator output after the image translation. While the discriminator is trained directly on the actual and target images, the generator is trained through the discriminator. The generator model is updated by the sum of the L1 loss and the adversarial loss, which encourages it to produce plausible images.
This update minimizes the loss under which the discriminator predicts the produced image as authentic; hence, during training the generator learns to generate more realistic images.
The generator loss is given by:

Generator Loss = Adversarial Loss + λ × L1 Loss (2)

where λ weights the L1 term (λ = 100 in the original Pix2Pix formulation [36]).
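A minimal numerical sketch of the generator objective described above, assuming sigmoid discriminator patch outputs and the λ = 100 default from the original Pix2Pix formulation (the array shapes here are illustrative, not the paper's):

```python
import numpy as np

def generator_loss(disc_pred, gen_img, target_img, lam=100.0):
    """Adversarial BCE term (discriminator should call the fake 'real',
    i.e. all labels are 1) plus lambda-weighted L1 between the generated
    image and the ground truth."""
    eps = 1e-7
    adv = -np.mean(np.log(np.clip(disc_pred, eps, 1.0)))
    l1 = np.mean(np.abs(target_img - gen_img))
    return adv + lam * l1

pred = np.full((1, 70, 70, 1), 0.5)   # discriminator patch probabilities
gen = np.zeros((1, 256, 256, 1))      # generated segmentation
tgt = np.zeros((1, 256, 256, 1))      # ground truth segmentation
loss = generator_loss(pred, gen, tgt)
```

With a perfect L1 match, the loss reduces to the adversarial term alone; in training, the large λ makes the L1 term dominate early on and pins the output to the ground truth structure.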

D. TESTING PIX2PIX GAN FOR FOREARM VEIN SEGMENTATION
After training the generator and discriminator models on the training dataset, the generator model is evaluated on the test dataset as shown in Figure 6. The inputs to the trained generator are real forearm NIR images from the test dataset, and its outputs are the segmented images. The evaluation metrics are then applied to these output images to judge the performance of the trained model.
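Before the metrics are applied, the generator output is typically binarized into a vein mask. A minimal sketch, assuming the standard Pix2Pix tanh output range of [-1, 1] and a threshold of 0 (both assumptions on our part, not stated in the text):

```python
import numpy as np

def to_binary_mask(gen_output, thresh=0.0):
    """Map a tanh-range [-1, 1] generator output to a {0, 255} vein mask."""
    return np.where(gen_output > thresh, 255, 0).astype(np.uint8)

fake = np.array([[-0.9, 0.2], [0.7, -0.1]])  # toy generator output
mask = to_binary_mask(fake)
```

The binary mask is what the Dice, IoU, accuracy, and sensitivity metrics are computed against.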

IV. RESULTS AND DISCUSSION
The training of the Pix2Pix GAN is carried out using Keras with TensorFlow as the backend on Google Colab with a 12 GB NVIDIA Tesla K80 GPU. The dataset division for training is shown in Table 2.

A. TRAINING RESULTS
Results are saved at various iterations during training, and the training process is stopped based on the quality of the generated images. The results shown in Figure 7 were saved after 45,000 training iterations. Here, the first column shows the real forearm images, the second column shows the output generated by the proposed architecture, and the third column shows the ground truth images.
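The periodic saving described above can be sketched as follows; `train_step` and the save interval are placeholders standing in for the actual Keras update logic and checkpointing calls:

```python
def train(train_step, n_iterations=45_000, save_every=5_000):
    """Run the adversarial training loop, recording the iterations at
    which the generator would be checkpointed."""
    saved_at = []
    for it in range(1, n_iterations + 1):
        train_step(it)  # one combined generator/discriminator update
        if it % save_every == 0:
            # generator.save(f"model_{it:06d}.h5") in the real Keras run
            saved_at.append(it)
    return saved_at

# Dry run with a no-op training step to show the checkpoint schedule.
checkpoints = train(train_step=lambda it: None)
```

Keeping intermediate checkpoints lets the best generator be chosen by inspecting the generated image quality, which is how training is stopped here.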

B. TESTING RESULTS
After training, the trained generator model is saved and used for testing. The test set of 18 images is fed into the trained generator model, and the resulting segmentations are shown in Figure 8. In Figure 8, the first column shows the source images, which are real forearm vein images; the second column … without augmentation. Furthermore, our proposed model showed even better accuracy when trained with data augmentation than when trained without it, as shown in Table 4.
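The Dice coefficient and IoU reported for the test set can be computed from binary masks as in the following sketch:

```python
import numpy as np

def dice_and_iou(pred, truth):
    """Dice coefficient and IoU for binary masks with values in {0, 1}."""
    inter = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    dice = 2 * inter / (pred.sum() + truth.sum())
    iou = inter / union
    return dice, iou

# Toy 2x2 masks: one overlapping pixel, one false positive.
pred = np.array([[1, 1], [0, 0]])
truth = np.array([[1, 0], [0, 0]])
dice, iou = dice_and_iou(pred, truth)
```

Dice and IoU are monotonically related (Dice = 2·IoU / (1 + IoU)), so both scores rank segmentations in the same order; Dice weights the overlap more generously.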

V. CONCLUSION AND FUTURE WORKS
To meet the critical requirements of accurate forearm vein segmentation, a Pix2Pix GAN network is proposed that can generate plausible translations of real forearm images. The proposed method shows considerable improvements over conventional approaches to subcutaneous forearm vein segmentation, with an accuracy of about 97% when trained with augmentation. Future work will incorporate CT, MR, and ultrasound images into our experiments to see how well the model performs on images from different modalities. This will help in developing a standardized model that can segment forearm veins from any type of input image, such as CT, MR, or ultrasound. Moreover, different generator and discriminator models can be used in the Pix2Pix network to determine which combination gives optimum results. Beyond Pix2Pix, we also aim to develop subcutaneous forearm vein segmentation algorithms using CycleGAN, Deep Convolutional Generative Adversarial Networks (DCGAN), and similar architectures; a comparative analysis of these GAN variants would serve as a benchmark for deep learning algorithms other than GANs. The limitations of the proposed method include the general limitations of cGANs.