Brachial Plexus Nerve Trunk Recognition From Ultrasound Images: A Comparative Study of Deep Learning Models

Brachial plexus block is a common regional anesthesia method widely used in upper limb surgery, and ultrasound-guided brachial plexus block is now used extensively in clinical anesthesia. However, an accurate block depends heavily on the physician’s experience, and a physician without extensive clinical experience may cause nerve injury when performing it. Deep learning methods can automatically identify the brachial plexus in ultrasound images and help doctors complete the block accurately and quickly. In this paper, we evaluate the performance of different deep learning models in identifying (i.e., segmenting) the brachial plexus in ultrasound images, in order to find the best models and training strategies for this task. To this end, we use a new dataset containing 340 brachial plexus ultrasound images annotated by three experienced clinicians. Among the 12 deep learning models we evaluated, U-Net achieves the best segmentation accuracy, with an intersection over union (IoU) of 68.50%. However, U-Net has a very large number of parameters and can process only 15 images per second. In comparison, LinkNet can process 142 images per second and achieves the second-best segmentation accuracy, with an IoU of 66.27%. LinkNet therefore balances segmentation accuracy against processing efficiency and shows good potential for real-time segmentation of the brachial plexus.


The associate editor coordinating the review of this manuscript and approving it for publication was Chulhong Kim.
VOLUME 10, 2022. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

I. INTRODUCTION
Brachial plexus block is a common regional nerve block method that has been widely used in upper limb surgery. Compared with general anesthesia, brachial plexus block anesthesia has minimal negative impact on the patient's physiology, causes fewer complications, and is more effective.
In clinical practice, clinicians have developed nerve block guidance techniques such as paraesthesia, nerve stimulation, and ultrasound guidance to raise the success rate of regional anesthesia. Compared with the other techniques, ultrasound guidance provides more intuitive and accurate nerve localization. It provides real-time images while the target nerve is being blocked, thus helping the physician achieve precise anesthesia [1]. However, ultrasound-guided brachial plexus block still faces several challenges. First, ultrasound images contain many artifacts and much noise, with low contrast between tissues and fuzzy boundaries [2], which makes it hard to identify nerve block areas in clinical practice. Second, ultrasound-guided localization requires anesthesiologists who have both a deep understanding of anatomical structures and extensive experience with ultrasound imaging [3]. Such anesthesiologists are in short supply, and doctors with insufficient clinical experience may cause nerve injury when performing nerve blocks. Third, for certain patients, such as those with obesity, edema, or muscle atrophy, ultrasound image quality is poorer and the nerve block area can be difficult to identify [4]. Accurately locating nerves in ultrasound images has therefore become an important issue in the development of nerve blocks. An effective way to address these problems is to introduce a computer-aided diagnosis (CAD) system, which can supplement doctors' personal experience and knowledge and improve the reliability and efficiency of the brachial plexus block.
In recent years, the progress of CAD and its great potential in medical diagnosis have received extensive attention [5]. Researchers have combined digital image processing with traditional machine learning techniques to identify nerve block areas in ultrasound images. González et al. [6] first use the graph cut method to pre-segment the ultrasound image and obtain the region of interest, then extract features from the image using a nonlinear wavelet transform, and finally classify the pixels using a Gaussian process classifier to obtain the neural region. Vashishtha and Aju [7] combine the Canny edge detection algorithm with a support vector machine (SVM) to segment nerves in ultrasound images. Jimenez et al. [8] use random undersampling (RUS) and an SVM for nerve segmentation in ultrasound images. Although these traditional methods can segment nerve block regions on smaller datasets, they all require manually designed feature extraction. Their implementation can be complex, and accurate segmentation results are hard to achieve.
In contrast to traditional machine learning, deep learning does not require a manually designed feature extraction process; it learns feature representations automatically from the data during training. It has succeeded in computer vision, bioinformatics, and other fields. Deep learning methods have now been applied to the whole process of medical image processing and analysis, and they have achieved remarkable results in classification [9], detection [10], segmentation [11], reconstruction [12], and registration [13] tasks. In brachial plexus segmentation, computer-aided diagnosis systems based on deep learning have achieved strong results and have become the mainstream direction of related research. Wei and Tong [14] propose a dual-path U-shaped network with an attention mechanism. In this network, two paths replace the ordinary convolution layer, and the attention mechanism improves the efficiency and accuracy of segmentation. Van Boxtel et al. [15] propose a hybrid model, consisting of a classification model and a segmentation model, to segment brachial plexus regions in ultrasound images. Their experimental results show that the hybrid model segments significantly better than the segmentation model alone. Zhao and Sun [16] propose an end-to-end method based on U-Net for automatic segmentation of the brachial plexus from ultrasound images, which achieves good segmentation performance. Beyond brachial plexus segmentation, deep learning is also used to identify nerve block regions in other body parts. Huang et al. [17] use U-Net to identify femoral nerve block regions in ultrasound images. Smistad et al. [18] use a convolutional neural network to segment axillary nerves. Horng et al. [19] combine U-Net with a recurrent neural network to localize and segment the median nerve.
Although many studies use deep learning to identify nerves in ultrasound images, it remains unclear how different deep learning models, training strategies, and loss functions compare. Researchers have contributed many comparative studies of machine learning and deep learning in medicine [20]-[24]. However, no such study exists for identifying brachial plexus nerves in ultrasound images with deep learning methods. This paper aims to fill that gap by comprehensively evaluating the performance of different deep learning models in brachial plexus ultrasound image segmentation.
It is worth noting that all current studies of brachial plexus identification segment a nerve block region rather than individual nerve trunks. Segmenting individual nerve trunks, however, can help doctors locate each brachial plexus trunk more accurately and thus improve the effectiveness of the brachial plexus block [25]. Therefore, in this study, we segment individual brachial plexus trunks. We use a new dataset of 340 brachial plexus ultrasound images labeled by three clinicians [25]. We then implement 12 deep learning models and comprehensively evaluate them from the following aspects to find the model and training strategy best suited to this nerve segmentation task. (1) To evaluate the nerve segmentation performance of different models (including lightweight real-time segmentation networks), we compare their segmentation accuracy and speed. (2) We explore the role of deep transfer learning in network training. (3) We evaluate the impact of different loss functions on segmentation accuracy. (4) We evaluate how well the models generalize when identifying the brachial plexus nerve in images from a new ultrasound machine.
The rest of this paper is organized as follows. Section II details the dataset, the 12 deep learning models used, the transfer learning process, and the loss function. Section III presents the experimental results and analysis, including the experimental setup and evaluation methods. In Section IV, we discuss the results. Finally, Section V concludes the paper with suggested future work.

A. DATASETS
1) DATA DESCRIPTION
To obtain brachial plexus ultrasound images, we cooperate with the Affiliated Hospital of Medicine School of Ningbo University and the Sixth Hospital of Ningbo to collect 340 brachial plexus ultrasound images [25]. To increase the heterogeneity of the data, we collect data from as many patients as possible rather than multiple images from the same patient. In addition, the data are collected from two different ultrasound machines (YGY, BK3000). The images are labeled using Labelme [26] by three professional anesthesiologists, each with more than seven years of experience in ultrasound-guided nerve block anesthesia. An example from the brachial plexus ultrasound dataset is presented in Fig. 1. Furthermore, we directly annotate the nerve trunks of the brachial plexus. To our knowledge, no publicly available brachial plexus ultrasound image dataset is annotated in this way. In addition, to validate the role of transfer learning in neural network model training, we use a publicly available dataset of femoral nerve ultrasound images, which Huang et al. [17] released in a GitHub repository. This dataset contains a total of 562 ultrasound images. Since the femoral nerve is difficult to identify, they did not label it directly but annotated the connective tissue surrounded by the iliofascial membrane and the iliopsoas, which is a crucial area for identifying the femoral nerve. An example from the femoral nerve dataset is shown in Fig. 2.

2) DATA PREPROCESSING
Raw ultrasound data need preprocessing before model training in order to obtain good nerve segmentation performance. First, the collected ultrasound images may contain ultrasound equipment and patient information, so we crop each image to ensure anonymity and remove redundant information. Second, the pixel values of the brachial plexus ultrasound images are concentrated mainly in the low gray-value range, so the images are dark and low in contrast. We therefore enhance the brachial plexus ultrasound images using the contrast limited adaptive histogram equalization (CLAHE) method to improve image quality, as illustrated in Fig. 3.
Training with large-scale labeled data is one of the main reasons for the success of deep learning in various fields. However, obtaining a sufficient amount of labeled training data remains a significant challenge in medical image analysis. Deep learning models are also prone to overfitting when data are insufficient, which makes satisfactory performance hard to obtain. Data augmentation makes small changes to existing data to increase the amount of training data. It is an important step in training deep learning models: it alleviates the problem of small data volume, thereby reducing overfitting and improving the performance of neural networks.
In this study, we use the Albumentations library [27], which efficiently implements various image transformation operations, to perform data augmentation. More specifically, we implement two data augmentation methods, random cropping and random flipping, using the Compose method of the Albumentations library. To reduce the chance that random cropping cuts the neural region out of the image, we first scale the image to 300 × 380 so that it is close to the size of the cropped target image. We then randomly crop the image to 256 × 320. Finally, we flip the image vertically (around the X-axis) and horizontally (around the Y-axis), each with a probability of 0.5. In our experiments, augmentation is applied randomly to the images as they are fed to the network during training.
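As an illustration of the augmentation steps described above (rescale, random crop, random flips), the following is a minimal sketch in plain Python rather than the Albumentations pipeline used in this study. The helper names `resize_nearest` and `augment`, the nearest-neighbour resizing, and the reading of 300 × 380 and 256 × 320 as height × width are assumptions made for the example. The point it illustrates is that an image and its label mask must receive the same random crop and the same flips.

```python
import random


def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of a 2-D list (rows of pixel values)."""
    in_h, in_w = len(img), len(img[0])
    return [[img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]


def augment(image, mask, rng, scaled=(300, 380), crop=(256, 320), p_flip=0.5):
    """Rescale, randomly crop, and randomly flip an image/mask pair together."""
    # Step 1: rescale both arrays to a size slightly larger than the crop.
    image = resize_nearest(image, *scaled)
    mask = resize_nearest(mask, *scaled)

    # Step 2: draw one crop window and apply it to both arrays.
    top = rng.randint(0, scaled[0] - crop[0])
    left = rng.randint(0, scaled[1] - crop[1])
    image = [row[left:left + crop[1]] for row in image[top:top + crop[0]]]
    mask = [row[left:left + crop[1]] for row in mask[top:top + crop[0]]]

    # Step 3: each flip is applied independently with probability p_flip.
    if rng.random() < p_flip:  # horizontal flip: mirror the columns
        image = [row[::-1] for row in image]
        mask = [row[::-1] for row in mask]
    if rng.random() < p_flip:  # vertical flip: mirror the rows
        image = image[::-1]
        mask = mask[::-1]
    return image, mask
```

With these sizes the crop window discards at most 44 rows and 60 columns of the rescaled image, which limits how much of the neural region a single crop can remove.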

B. DEEP LEARNING MODELS
Advances in deep learning have greatly promoted the development of image segmentation. This study explores and evaluates the effect of different deep learning models in brachial plexus ultrasound image segmentation. Ultrasound is widely used to guide nerve blocks during surgery because of its flexibility, convenience, and rapid real-time imaging. In fact, rapid real-time imaging is an important reason for using ultrasound in clinical nerve blocks [29]. A deep learning model therefore needs to be accurate when identifying the brachial plexus and, at the same time, needs good real-time performance. This is an essential prerequisite for clinical application of the related technologies. In this paper, we evaluate 12 deep learning models (FCN [30], SegNet [31], PsPNet [32], U-Net [33], U-Net++ [34], DeepLabv3+ [35], GCN [36], LinkNet [37], ENet [38], BiseNet [39], DFANet [40], BiseNetV2 [41]), several of which are considered real-time segmentation models (e.g., LinkNet, ENet, BiseNet, DFANet, BiseNetV2). A real-time segmentation model is faster and places lower demands on hardware, which makes it convenient to deploy on mobile or embedded devices. Table 1 summarizes some properties of the deep learning models used in our research, such as the number of parameters and the depth of the network (counting activation layers, batch normalization layers, and so on).

C. TRANSFER LEARNING
In recent years, the success of deep learning in many fields can be attributed to three factors: powerful computers, excellent algorithm models, and larger datasets [42]. However, obtaining enough training data for deep learning tasks in the medical field is still a significant challenge. Compared with traditional machine learning methods, deep learning depends heavily on training data [43]. In medical image analysis, it is difficult for deep learning models to extract image features effectively when trained on only a small number of samples. In addition, a small dataset may lead to overfitting, making it difficult for the model to achieve satisfactory performance.
A common approach to this problem is transfer learning. Studies have shown that transfer learning has achieved many results in medical image analysis and has been widely used in various research [44]. Transfer learning aims to extract knowledge from one or more tasks and use it in an intended task. We can therefore adopt transfer learning to deal with insufficient training data in medical image analysis and reduce the dependence of deep learning models on training data [45]. PyTorch provides several classic models that are trained on large-scale benchmark datasets such as ImageNet and can supply pre-trained parameters for various deep learning tasks. When implementing our segmentation models, we use such classic architectures, for example VGG [46] and ResNet [47], as feature extractors, with parameters trained on the ImageNet dataset. In addition, we pre-train our networks using the femoral nerve dataset [17]. This dataset is similar to our data and can provide better initialization parameters for our segmentation task. Fig. 4 shows the transfer learning process in our research.

D. LOSS FUNCTION
The loss function measures the difference between labels and predicted results. It is an essential part of image segmentation methods based on deep learning. Researchers have designed many loss functions for different image segmentation tasks, and a loss function that fits the characteristics of the data will improve the segmentation results. However, choosing a loss function suitable for a given task is challenging. Therefore, in this study, we evaluate the effect of several commonly used image segmentation loss functions in brachial plexus segmentation of ultrasound images. The purpose is to provide a reference for researchers selecting a loss function. The loss functions we evaluate are cross entropy (CE) loss, dice loss [48], focal loss [49], combo loss [50], cross entropy loss with focal loss, and cross entropy loss with Lovász-softmax loss [51].
Cross entropy is derived from the Kullback-Leibler (KL) divergence and measures the difference between two distributions [52]. Cross entropy loss is one of the most commonly used loss functions. Its behavior is stable, and it can be used in most semantic segmentation scenarios. Dice loss directly optimizes the dice similarity coefficient (DSC), which measures the similarity of two sets and is commonly used to evaluate segmentation performance. Dice loss is suitable for unbalanced samples. Focal loss originates from object detection and is an improvement on the standard cross entropy loss. It aims to address the imbalance between hard and easy samples and between foreground and background categories. Combo loss is a combination of cross entropy loss and dice loss. It uses dice loss to address the class imbalance problem and cross entropy loss to smooth the loss curve. Cross entropy loss with focal loss is a combination of cross entropy loss and focal loss. Cross entropy loss with Lovász-softmax loss is a combination of cross entropy loss and Lovász-softmax loss, which optimizes the Jaccard index directly. Since the Lovász-softmax loss alone does not achieve the desired results, we combine it with the cross entropy loss.
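To make the relationship between these losses concrete, the following is a minimal sketch in plain Python for the binary case, where each pixel has a predicted foreground probability and a 0/1 label. It is not the implementation used in this study: the binary formulation, the function names, the smoothing constant, the focusing parameter gamma = 2, and the equal weighting in `combo_loss` (the published combo loss [50] uses additional weighting terms) are all assumptions made for illustration.

```python
import math

EPS = 1e-7  # keeps log() away from zero


def cross_entropy(probs, labels):
    """Mean binary cross entropy over all pixels."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, EPS), 1.0 - EPS)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


def dice_loss(probs, labels, smooth=1.0):
    """One minus the (soft) dice similarity coefficient."""
    inter = sum(p * y for p, y in zip(probs, labels))
    return 1.0 - (2.0 * inter + smooth) / (sum(probs) + sum(labels) + smooth)


def focal_loss(probs, labels, gamma=2.0):
    """Cross entropy with easy pixels down-weighted by (1 - p_t) ** gamma."""
    total = 0.0
    for p, y in zip(probs, labels):
        p_t = p if y == 1 else 1.0 - p  # probability given to the true class
        p_t = min(max(p_t, EPS), 1.0 - EPS)
        total += -((1.0 - p_t) ** gamma) * math.log(p_t)
    return total / len(probs)


def combo_loss(probs, labels, weight=0.5):
    """Simplified compound loss: weighted sum of cross entropy and dice loss."""
    return (weight * cross_entropy(probs, labels)
            + (1.0 - weight) * dice_loss(probs, labels))
```

Because the focal weighting factor never exceeds 1, the focal loss of a prediction is never larger than its cross entropy, which is how it shifts the emphasis toward hard pixels.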

III. EXPERIMENTS AND ANALYSIS
A. EXPERIMENTAL SETUP
We use the PyTorch 1.10 deep learning framework to implement all deep learning models in our study. We complete all training and testing on two computers: one with an Intel i3-10100F central processing unit (CPU) and an NVIDIA GTX1050ti graphics processing unit (GPU), and one with an Intel i9-10900X CPU and an NVIDIA RTX3090 GPU. For all segmentation models, the input image size is 256 × 320 × 3. We set the initial learning rate to 0.0001, the batch size to 8, and the number of epochs to 200, and we use the Adam algorithm as the optimizer for model training. In addition, we use PyTorch's ReduceLROnPlateau method to adjust the learning rate during training: the learning rate is halved when the results have not improved for ten epochs. After training, we save the model weights that perform best on the validation set for testing. These settings are the same for all models.
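The learning rate schedule described above can be illustrated with a small stand-alone sketch. This is not PyTorch's ReduceLROnPlateau itself but a simplified plain-Python imitation of its core rule: the class name `PlateauHalver` is hypothetical, the monitored quantity is assumed to be a validation score where higher is better, and PyTorch's improvement threshold and cooldown options are omitted.

```python
class PlateauHalver:
    """Halve the learning rate after the score stalls for `patience` epochs."""

    def __init__(self, lr=1e-4, factor=0.5, patience=10):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, score):
        """Record one epoch's validation score and return the new rate."""
        if score > self.best:
            self.best = score
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        # Reduce only once the number of stalled epochs exceeds the patience.
        if self.bad_epochs > self.patience:
            self.lr *= self.factor
            self.bad_epochs = 0
        return self.lr
```

Starting from 0.0001, a validation score that stays flat leaves the rate unchanged for ten stalled epochs and halves it to 0.00005 on the next one.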
Our data contain 340 ultrasound images. To construct a reasonable dataset, we randomly select 34 images as the test dataset. To allow more data to be used for training, we divide the remaining ultrasound images into a training dataset and a validation dataset at a ratio of 9:1. We do not set aside a test dataset from the femoral nerve dataset, since its only purpose is to provide pre-training parameters for examining the effect of transfer learning.
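A split of this kind can be sketched as follows. The function name `split_dataset`, the fixed seed, and the rounding rule are assumptions for illustration; the exact training and validation counts are not stated in the text, so the 275/31 division produced here is only one way of realizing a 9:1 ratio on the 306 remaining images.

```python
import random


def split_dataset(n_images=340, n_test=34, train_fraction=0.9, seed=0):
    """Return (train, validation, test) lists of image indices."""
    indices = list(range(n_images))
    random.Random(seed).shuffle(indices)  # reproducible random order

    test = indices[:n_test]
    remaining = indices[n_test:]

    # Assumption: round the 9:1 ratio to the nearest whole image.
    n_train = round(len(remaining) * train_fraction)
    return remaining[:n_train], remaining[n_train:], test
```

Splitting by index keeps the three subsets disjoint, so no test image can leak into training or validation.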

B. EXPERIMENTAL RESULTS AND ANALYSIS
1) PERFORMANCE RESULTS OF DIFFERENT LOSS FUNCTIONS
The loss function is an important part of an image segmentation method based on deep learning. To determine a loss function suitable for the brachial plexus segmentation task, we evaluate several common image segmentation loss functions. The specific results are shown in Table 2. We use intersection over union (IoU) as the evaluation metric for segmentation accuracy. IoU measures the similarity between two sets and is one of the most commonly used metrics for the accuracy of image segmentation. It is defined as
$$\mathrm{IoU} = \frac{\mathrm{Intersection}}{\mathrm{Union}}$$
where the Intersection is the overlapping part of the predicted image and the label, and the Union is the combination of the predicted image and the label. The results in Table 2 show that no single loss function achieves the best performance for all 12 deep learning models. However, the compound loss functions (combo loss, cross entropy loss with focal loss, cross entropy loss with Lovász-softmax loss) are the most robust, and they achieve the best performance for ten deep learning models. In addition, although the cross entropy loss achieves the best result for only one model, it achieves relatively good results across all models and is the most robust of the three single losses (CE loss, dice loss, focal loss).
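The IoU metric described above can be computed with a few lines of Python. This is a minimal sketch that assumes binary masks flattened into equal-length sequences of 0/1 values; the function name `iou` and the convention of returning 1.0 when both masks are empty are choices made for the example, not taken from this study.

```python
def iou(pred, label):
    """Intersection over union of two flattened binary masks."""
    # Pixels marked as nerve in both the prediction and the label.
    intersection = sum(1 for p, l in zip(pred, label) if p and l)
    # Pixels marked as nerve in at least one of the two.
    union = sum(1 for p, l in zip(pred, label) if p or l)
    # Assumed convention: two empty masks agree perfectly.
    return intersection / union if union else 1.0
```

For example, a prediction `[1, 1, 0, 0]` against a label `[1, 0, 1, 0]` shares one foreground pixel out of three marked in either mask, giving an IoU of 1/3.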

2) PERFORMANCE RESULTS FOR TRANSFER LEARNING
Transfer learning is a common method for training neural networks. In our study, we pre-train the neural network using the femoral nerve dataset to obtain initialization parameters for brachial plexus segmentation. Fig. 5 compares the results obtained with and without transfer learning. The figure shows that segmentation accuracy improves when transfer learning is used to train the network, and the improvement is more prominent in the real-time or lightweight networks. Therefore, we suggest using transfer learning in brachial plexus recognition when the amount of data is insufficient, especially with lightweight networks.

3) PERFORMANCE RESULTS OF REAL-TIME SEGMENTATION NETWORKS
To analyze the performance of each deep learning model, we compare all models on three evaluation criteria: segmentation accuracy, inference speed, and hardware requirements. Table 3 shows the results of all comparisons. U-Net achieves the best result among all deep learning models, with an IoU of 68.50%. To show visually how well deep learning identifies the brachial plexus, we select representative results at different IoU levels from the U-Net test results and highlight them on the original ultrasound images, as shown in Fig. 6. Among all the models, PsPNet has the lowest segmentation accuracy, with an IoU of only 54.63%, well below the best result. The IoU of every real-time segmentation model (LinkNet, ENet, BiseNet, DFANet, BiseNetV2) exceeds 59%, and the IoU of LinkNet reaches 66.27%, which is the second-best result among all models. Fig. 7 shows the distribution of IoU for the different deep learning models on the test dataset.
The CLAHE algorithm improves on the Adaptive Histogram Equalization (AHE) algorithm [53]. It avoids excessive image enhancement and overcomes the excessive noise amplification of the AHE algorithm [54]. It is widely used in image enhancement. In our experiments, the ultrasound images are processed with the CLAHE implementation built into the image processing tool OpenCV-Python, with clipLimit set to 1 and tileGridSize set to (8,8). From Table 3, we observe that the segmentation accuracy of all models improves after the images are processed with CLAHE. We therefore believe that CLAHE plays a significant role in improving the performance of the models in identifying the brachial plexus nerve in ultrasound images.
Model inference speed is a critical factor in applying deep learning based brachial plexus recognition in the clinic. We therefore compare the inference speed of the different deep learning models, including the inference time for a single image and the number of frames that can be processed per second. We set the image size to 256 × 320 × 3 and perform inference speed tests on NVIDIA GTX1050ti and NVIDIA RTX3090 graphics processing units. As shown in Table 3, U-Net++ has the slowest inference speed. It can process 28 images per second on the NVIDIA RTX3090 and only 3 images per second on the NVIDIA GTX1050ti, much lower than the other models. The U-Net, SegNet, and GCN models are also slow, processing fewer than 20 images per second on the NVIDIA GTX1050ti. In contrast, LinkNet can process 142 images per second on the NVIDIA GTX1050ti. The inference speeds of ENet, BiseNet, and BiseNetV2 are also significantly higher than those of the other networks. Compared with models such as U-Net, real-time networks such as LinkNet are faster and more suitable for real-time segmentation tasks such as brachial plexus recognition.
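The two speed figures reported here, inference time per image and frames per second, are reciprocals of each other. The sketch below shows one generic way to measure them in plain Python; it is not the benchmarking code used in this study. The function name `measure_fps`, the warm-up count, and the number of timed runs are assumptions, and `infer` stands for any callable that processes one input. When timing a model on a GPU, the device would additionally need to be synchronized before each clock reading, which this CPU-only sketch does not do.

```python
import time


def measure_fps(infer, sample, warmup=3, runs=20):
    """Return (seconds per image, images per second) for a callable."""
    # Warm-up calls are excluded so one-off start-up costs do not distort timing.
    for _ in range(warmup):
        infer(sample)

    start = time.perf_counter()
    for _ in range(runs):
        infer(sample)
    elapsed = max(time.perf_counter() - start, 1e-12)  # guard against zero

    seconds_per_image = elapsed / runs
    return seconds_per_image, 1.0 / seconds_per_image
```

By this relationship, 142 images per second corresponds to about 7 ms per image, and 15 images per second to about 67 ms per image.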
To further compare the potential of these 12 deep learning models for mobile or embedded deployment, we analyze their hardware requirements. The metrics used are the number of multiply-accumulate operations (MACs) and the storage required to save the model parameters. The results show that some models, such as U-Net, are computationally intensive, have many parameters, and require large storage space, which makes them difficult to deploy effectively on various hardware platforms. Real-time segmentation models such as LinkNet are computationally simpler. Their parameters also require less storage space, which reduces the hardware storage needed for model deployment.
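As a rough illustration of how these two hardware metrics arise, the sketch below counts MACs and parameters for a single standard 2-D convolution layer and converts a parameter count into storage. The function names are hypothetical, the storage estimate assumes 32-bit floating-point parameters, and whole-model figures such as those in Table 3 would come from summing over every layer, which is not reproduced here.

```python
def conv2d_macs(in_ch, out_ch, kernel, out_h, out_w):
    """Multiply-accumulate operations of one standard convolution layer."""
    # Each output value needs kernel * kernel * in_ch multiply-accumulates.
    return kernel * kernel * in_ch * out_ch * out_h * out_w


def conv2d_params(in_ch, out_ch, kernel, bias=True):
    """Learnable parameters of one standard convolution layer."""
    return kernel * kernel * in_ch * out_ch + (out_ch if bias else 0)


def storage_mib(n_params, bytes_per_param=4):
    """Storage in MiB, assuming 4 bytes (float32) per parameter."""
    return n_params * bytes_per_param / (1024 ** 2)
```

For instance, a 3 × 3 convolution mapping 3 input channels to 64 output channels at a 256 × 320 output resolution needs 141,557,760 MACs but only 1,792 parameters, which shows that computation and storage can scale very differently.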

4) PERFORMANCE ANALYSIS OF THE MODEL ON DIFFERENT DATASETS
When deep learning models are used to identify the brachial plexus in actual clinical practice, their performance may be affected by factors such as acquisition parameters, equipment, and methods. To evaluate how well deep learning models generalize to new ultrasound images, we test their performance on different datasets built from the data of the two ultrasound machines. Specifically, we first train and test the model using ultrasound data from one machine. We then test the model using data from the other ultrasound machine. Finally, we train and test it using mixed data.
As shown in Table 4, the deep learning model works better in identifying the brachial plexus nerve in images from the same ultrasound machine as the training data. When the model is trained using data from BK3000, its best IoU for segmenting the brachial plexus nerve on BK3000 is 63.02%. In comparison, its best IoU is only 50.24% when it is used to identify data from the YGY. When the model is trained using data from YGY, its best IoU for segmenting the brachial plexus nerve on YGY is 66.03%, while its best IoU is only 45.28% when it is used to identify data from BK3000. In addition, there is no significant difference in segmentation accuracy for each model when training with mixed data versus using one type of data alone. Fig. 8 shows the performance differences of each model on different datasets. In conclusion, it is difficult to obtain the same performance when applying a trained deep learning model to identify brachial plexus nerves in ultrasound images obtained by a new ultrasound machine.

IV. DISCUSSIONS
At present, deep learning has been applied to the whole process of medical image processing and has made outstanding achievements in various medical image analysis tasks. Using deep learning technology to identify brachial plexus in ultrasound images to assist doctors in nerve blocks is of great significance for improving the safety and reliability of nerve blocks. In our study, we implement 12 deep learning models for automatic identification of brachial plexus and evaluate their performance.
Among the 12 deep learning models we used, U-Net achieves the best segmentation result with an IoU of 68.50%, as shown in Fig. 9. Models with structures similar to U-Net, such as U-Net++, FCN, and SegNet, also achieve good segmentation results. However, the inference speed of these models is low and their number of parameters is large, so it is challenging to deploy them efficiently on mobile and embedded devices. In contrast, real-time segmentation networks such as LinkNet have fast inference speed, fewer model parameters, and lower requirements for deployment platforms, while still achieving good segmentation results. Compared with models such as U-Net, LinkNet is faster and more suitable for real-time imaging methods such as ultrasound. In addition, a deep learning model obtains better segmentation results when the same data source (machine) is used for training and testing, or when mixed data are used for both. It is difficult to obtain good identification performance on images from a new ultrasound machine without training on data from that source.
Transfer learning is a common neural network training strategy. Through comparative experiments, we find that transfer learning improves the segmentation accuracy of all models, and the improvement is more prominent in lightweight networks. Therefore, we suggest employing transfer learning in identifying the brachial plexus and using data from similar tasks to assist with the intended task. In comparing loss functions, we find that no loss achieves the best performance for all 12 deep learning models, but the compound loss functions are the most robust. In addition, the cross entropy loss is the most stable of the three single losses, and it achieves a decent result with all models. Therefore, we suggest selecting the cross entropy loss first when conducting related research. A compound loss function can then be selected according to the characteristics of the dataset to optimize the results.

V. CONCLUSION
In this study, we implemented 12 deep learning models for automatic segmentation of the brachial plexus in ultrasound images and thoroughly evaluated their performance. The results show that complex models (e.g., FCN, SegNet, PsPNet, U-Net, U-Net++, DeepLabv3+, GCN) can often obtain better segmentation results, but their large storage and computing requirements limit their application on various hardware platforms. The real-time segmentation networks (e.g., LinkNet, ENet, BiseNet, DFANet, BiseNetV2) achieve good segmentation results at higher speed, which makes them more suitable for use with real-time imaging equipment such as ultrasound. In addition, we discussed the model training strategy, the selection of loss functions, and the models' generalization performance on new data, which can be considered in future research.
Our work has a number of limitations, which we hope to address in the future. First, our dataset is small, with only 340 images. Although data augmentation can partially alleviate overfitting and improve model performance, we expect better results with more data. Therefore, we will continue to collect relevant data to establish a larger dataset of brachial plexus ultrasound images. Second, the highest IoU among the 12 deep learning models we evaluated was only 68.50%. We will investigate new neural network models to improve the performance of brachial plexus recognition.