Ultrasound Image Segmentation Method for Thyroid Nodules Using ASPP Fusion Features

Ultrasound imaging plays an important role in assisting doctors in diagnosing thyroid nodules. The tissue structure around the thyroid is very complex, which makes it difficult to accurately segment and extract thyroid nodules from ultrasound images. To address this problem, this paper proposes a model algorithm for thyroid nodule ultrasound image segmentation using ASPP fusion features. First, spatial pyramid pooling and depthwise separable convolution are combined to solve the problem that the size of the mapping features changes when richer context information is captured. The resulting Atrous Spatial Pyramid Pooling (ASPP) processes the channel and spatial information of the input image separately. To appropriately reduce the dimension and size of the feature maps, a $1\times 1$ convolution is performed before each convolution calculation, which optimizes the model size. In the decoding stage, the decoder module adjusts the relatively low-resolution feature map from the previous decoder module and sets the number of output channels of the two convolutions to the same value. After this adjustment, all features have the same dimensions and can be fused by element-wise summation. Finally, Dice Similarity Coefficient (DSC), Prevent Match (PM) and Correspondence Ratio (CR) are used as evaluation criteria to compare the proposed model with other model algorithms. The experimental results show that the proposed model significantly improves the segmentation of thyroid nodule ultrasound images compared with traditional models.


I. INTRODUCTION
In recent years, artificial intelligence technology and medical imaging have become increasingly closely integrated [1]-[3]. The use of artificial intelligence to process medical images has become a main research focus, and ultrasound image segmentation is one of its hotspots [4].
Thyroid nodules are among the most common diseases of the endocrine system. Relevant studies have shown that 2% to 6% of adults in areas where iodine is not deficient have thyroid nodules [5], [6], and the incidence detected on ultrasound images is 19-35% [7], [8]. The methods generally used to segment ultrasound images of thyroid nodules include contour and shape-based segmentation methods, region-based segmentation methods, supervised and unsupervised segmentation methods, hybrid technology-based segmentation methods, threshold-based segmentation methods, segmentation methods based on Markov random fields and segmentation methods based on deep learning [9].
The associate editor coordinating the review of this manuscript and approving it for publication was Yizhang Jiang.
Contour and shape-based segmentation methods can be divided into edge-based segmentation methods, probabilistic filtering-based segmentation methods and deformable model-based segmentation methods. In image processing, various gradient filters are usually used to extract image edges. However, the extraction process is often affected by noise, and gradient filters often produce wrong edges during detection. Therefore, it is particularly important to design a suitable algorithm, even at the cost of a large number of calculations [10]-[12]. In traditional edge detection, the contrast of ultrasound images is relatively low due to the presence of speckle and noise, which causes the edges of shadow areas to be obtained inaccurately [13], [14]. To solve these problems, Kwoh et al. applied Fourier decomposition to images and reduced false edges by obtaining high-order harmonics [15]. To reduce the amount of speckle in original images, Aaraink et al. used the local standard deviation to identify homogeneous and non-homogeneous regions in images under a multi-resolution framework. The result can provide a more reliable detection method for remote detection of thyroid images [16]. Yu et al. proposed a method to determine the initial contour of an image based on the radial base embossing method and, on this basis, an algorithm that can remove false edges of images [17]. The algorithm is based on the deformation propagation of two-dimensional slices and can change the contour of each image slice. Gomez et al. enhanced image contrast with a contrast-limited histogram equalization method. On this basis, the edges of the image were enhanced by an anisotropic diffusion filter. Finally, a watershed-based image segmentation method was proposed, and the boundary extraction of breast ultrasound images was realized [18]. Based on the U-net model, Pan Peike et al. segmented MRI images of nasopharyngeal tumors. The principle was to capture surrounding context through the contracting path and, on this basis, to achieve precise localization through the expanding path [19]. Most existing studies reduce the size of the original images during segmentation and cannot obtain a full-resolution result.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
In addition, a shallower network greatly reduces the accuracy of segmentation results. To solve these problems, Chen et al. proposed an algorithm that increases the receptive field by dilated convolution. This algorithm inserts an appropriate number of zeros into the convolution kernel, which expands it. The expanded kernel obtains a larger receptive field while keeping the number of kernel parameters unchanged, and different expansion rates extract features with receptive fields of different sizes. However, when the expansion rate increases beyond a certain extent, this algorithm fails [20]. Yang M et al. spliced the result of the dilated convolution in the previous layer with the result of traditional convolution, transferred the spliced feature map to the next dilated convolutional layer, and thereby proposed the DenseASPP model. This model makes up for the defect that ordinary dilated convolution fails when the expansion rate increases to a certain extent, and it gradually increases the receptive field of each dilated convolution layer. However, this model has certain defects when applied to ultrasound image segmentation, such as unsmooth segmentation edges [21]. Based on Convolutional Neural Networks (CNN), Kumar et al. performed nuclear segmentation in digital microstructure images, obtaining the segmentation result of an entire image by predicting the category of each pixel with a sliding window [22]. Lu Qiuju et al. proposed a global segmentation method with adaptive step size for multi-threshold color images. This method improves segmentation efficiency by reducing the total number of image colors without reducing image quality.
By formulating the objective function and solving it with a swarm optimization algorithm, the optimal solution for color image threshold segmentation is obtained [23]. Anas et al. proposed a real-time prostate segmentation technique based on a deep neural network for use during biopsy, which laid the foundation for the dynamic registration of mp-MRI and ultrasound data. In addition to extracting spatial features with convolutional networks, this technique used recurrent networks to collect and exploit temporal information across a series of ultrasound images. The system used residual convolution in the recurrent network to improve optimization, and it demonstrated the usability of fully convolutional neural networks on ultrasound images [24].
Most of the methods mentioned above have difficulty coping with the low contrast, blurred boundaries and speckle echoes of ultrasound images. Therefore, it is difficult for them to achieve ideal results when applied to ultrasound image segmentation of thyroid nodules. The main contributions of this paper are:
1) Based on the DenseNet-121 network structure and combined with Atrous Spatial Pyramid Pooling (ASPP), a new segmentation model for ultrasound images of thyroid nodules is proposed. Splicing the mapping features with spatial pyramid pooling solves the problem that the size of the mapping features changes.
2) In the encoding process, hierarchical feature fusion is proposed to generate the semantic feature structure.
The experimental results show that the segmentation model proposed in this paper greatly improves the segmentation of ultrasound images of thyroid nodules, and its performance is better than that of the other methods compared.

II. BASIC NETWORK STRUCTURE
In the encoder-decoder structure, the encoder first gradually reduces the spatial dimension of the input data. The decoder then gradually restores the details and the spatial dimension of the target using network layers such as deconvolution layers.
Atrous convolution can increase receptive field and maintain the number of kernel parameters at the same time, so as to achieve the purpose of effectively maintaining image resolution.
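The effect of the expansion rate on the receptive field can be illustrated with a minimal sketch. It assumes a 1-D signal and a stride of 1 for brevity; the function names are illustrative and do not come from this paper. A kernel with k weights and rate r spans k + (k - 1)(r - 1) samples while still holding only k parameters.

```python
def effective_kernel_size(kernel_size: int, rate: int) -> int:
    """Span covered by an atrous kernel: k + (k - 1) * (rate - 1)."""
    return kernel_size + (kernel_size - 1) * (rate - 1)


def atrous_conv1d(signal, kernel, rate):
    """'Valid' 1-D atrous convolution (cross-correlation form, stride 1)."""
    span = effective_kernel_size(len(kernel), rate)
    out = []
    for start in range(len(signal) - span + 1):
        # Taps are spaced `rate` samples apart; the gaps carry no weights.
        out.append(sum(w * signal[start + i * rate] for i, w in enumerate(kernel)))
    return out


signal = list(range(10))   # toy input 0..9
kernel = [1, 1, 1]         # three weights regardless of the rate
dense = atrous_conv1d(signal, kernel, rate=1)    # each output sees 3 samples
dilated = atrous_conv1d(signal, kernel, rate=3)  # each output sees a span of 7
```

With the rates 6, 12 and 18 used later in the ASPP module, a 3 × 3 kernel would span 13, 25 and 37 positions per axis under this formula.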
In the basic network structure of this paper, the encoder-decoder structure and atrous convolution are combined. In the encoding process, Fusion Atrous Spatial Pyramid Pooling (FASPP) is proposed, and in the decoding process, Hierarchical Feature Fusion (HFF) is proposed to generate the semantic feature structure (GSM). Based on the inherent characteristics of thyroid nodule ultrasound images, a targeted network structure, Pronet, is proposed. The final network structure is shown in Fig. 1.
DenseNet follows a similar idea to ResNet in dealing with problems such as network degradation and vanishing gradients: both use short connections. The difference is that the core design of DenseNet is the Dense Block structure. The name DenseNet comes from the fact that this structure resembles a dense network, in which any two convolutional layers are connected.

III. ALGORITHM IMPLEMENTATION
The Dense Block in Fig. 2 contains 5 layers, which together form a densely connected block. As Fig. 2 shows, any two convolutional layers are directly connected, and the feature maps of each preceding layer serve as input to the layers that follow. This densely connected structure reduces the number of parameters of the entire network to a certain extent. It makes the network narrower while making full use of the features of each layer. Adjacent layers of a Dense Block are connected by merging channels through concatenation rather than by simply adding them. This is the essential difference from the ResNet network.
The basic settings of any Dense Block include the growth rate, which is the number of feature maps output by each layer in the block. At the connection point between layers in a Dense Block, a bottleneck layer can be added to reduce the number of parameters in the network and reduce its feature dimension. The DenseNet formed by adding Bottleneck and Transition layers to the Dense Block structure is named DenseNet-BC. The most common structure in the DenseNet family is DenseNet-121. The parameters and composition of the network structure are shown in Tab. 1.
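The role of the growth rate can be made concrete with a short sketch of how channel counts accumulate under concatenation. The numbers used here (64 initial channels, growth rate 32, blocks of 6, 12, 24 and 16 layers, transition layers that halve the channels) are those of the widely used DenseNet-121 configuration and are an assumption here, since Tab. 1 is not reproduced in this text; the function name is illustrative.

```python
def dense_block_channels(in_channels: int, growth_rate: int, num_layers: int):
    """Input width seen by each layer of a dense block, plus the block output width."""
    # Layer i receives the block input plus the outputs of the i layers before it.
    inputs = [in_channels + i * growth_rate for i in range(num_layers)]
    out_channels = in_channels + num_layers * growth_rate
    return inputs, out_channels


# Assumed DenseNet-121 settings (not taken from Tab. 1 of this paper).
GROWTH_RATE = 32
BLOCK_LAYERS = (6, 12, 24, 16)

channels = 64
trace = []
for index, n_layers in enumerate(BLOCK_LAYERS):
    _, channels = dense_block_channels(channels, GROWTH_RATE, n_layers)
    trace.append(channels)
    if index < len(BLOCK_LAYERS) - 1:
        channels //= 2  # transition layer with compression factor 0.5

first_block_inputs, _ = dense_block_channels(64, GROWTH_RATE, 6)
```

Because each layer contributes only `GROWTH_RATE` new feature maps, the individual layers stay narrow even though later layers see a wide concatenated input.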

2) ATROUS SPATIAL PYRAMID POOLING
To better capture the multi-scale context of the input feature map, multiple convolutions with different expansion coefficients can be used to obtain multi-scale feature maps. However, this has a side effect: the size of the mapping features changes. To solve this problem, the mapping features can be spliced using Spatial Pyramid Pooling (SPP). In addition, depthwise separable convolution is generally used when processing the channels of input images. In summary, a new ASPP can be formed by combining spatial pyramid pooling and depthwise separable convolution, so that the channel information and the spatial information of the input image are processed separately. In this paper, the operation used in the last encoding layer of the traditional U-Net network is replaced with ASPP. The network structure of ASPP is shown in Fig. 3.
In the calculation process, ASPP performs convolution operations on the feature map of the upper layer, which are divided into the following five convolution processes:
(a) The first convolution uses 256 ordinary 1 × 1 convolution kernels to perform convolution on the feature map, followed by a batch normalization layer.
(b) The second to fourth convolutions use depthwise separable convolution. The depthwise separable convolution structure in each of these convolutions is expressed by formula (1), in which DepthConv denotes a 3 × 3 dilated convolution with expansion coefficients of 6, 12 and 18, and PointConv denotes an ordinary 1 × 1 convolution.
In the second to fourth convolution operations, using the depthwise separable convolution structure of equation (1) greatly reduces the number of parameters in the model, thereby speeding up convergence.
(c) In the fifth convolution process, the original image size is first reduced by a factor equal to the output step size (16 in this paper). A global average pooling operation is then performed, and the result is sent to a 1 × 1 convolution with 256 output channels, followed by a batch normalization layer. Finally, bilinear interpolation is used to restore the image size.
Although ASPP with different sampling rates captures multi-scale information well, the number of valid filter weights, that is, weights applied to the feature map rather than to padding, decreases as the sampling rate increases. When it is reduced to a certain extent, the 3 × 3 convolution kernel can no longer fully capture the context information of images and degenerates into a simple 1 × 1 convolution kernel. The fifth convolution described above mitigates this problem to the greatest extent.
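The degeneration described above can be checked numerically with a small sketch. It assumes a square feature map with zero padding and counts how many of the nine taps of a 3 × 3 atrous kernel land inside the map; the map size of 32 and the function name are hypothetical choices for illustration, not values from this paper.

```python
def valid_tap_fraction(size: int, rate: int) -> float:
    """Average share of the 9 taps of a 3x3 atrous kernel that fall inside a size x size map."""
    valid = 0
    for y in range(size):
        for x in range(size):
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    # A tap is valid when it reads the feature map, not the zero padding.
                    if 0 <= y + dy * rate < size and 0 <= x + dx * rate < size:
                        valid += 1
    return valid / (9 * size * size)


SIZE = 32  # hypothetical feature-map side length
fractions = {rate: valid_tap_fraction(SIZE, rate) for rate in (1, 6, 12, 18, 32)}
```

Once the rate reaches the side length of the map, only the centre tap remains valid at every position, so the fraction falls to 1/9 and the kernel acts like a 1 × 1 convolution. This is why an image-level branch based on global pooling is useful alongside the atrous branches.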
After the five convolution operations are completed, the five extracted multi-scale feature maps are spliced so that the correlation between different feature maps can be captured. The spliced feature map is sent to a 1 × 1 convolution with 512 channels, followed by a batch normalization layer, and the final feature map is then sent to the decoding module.
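The bookkeeping of the five branches can be sketched as follows. The sketch assumes that every branch outputs 256 channels (stated above only for the first and fifth branches), that the atrous branches use a padding equal to their rate, and that the input has a hypothetical size of 34 × 34 with 1024 channels; biases and batch normalization parameters are ignored.

```python
def conv_output_size(size, kernel, rate=1, padding=0):
    """Spatial size after a stride-1 convolution with the given dilation rate."""
    span = kernel + (kernel - 1) * (rate - 1)
    return size + 2 * padding - span + 1


def standard_params(c_in, c_out, kernel=3):
    """Weights of an ordinary k x k convolution."""
    return c_in * c_out * kernel * kernel


def separable_params(c_in, c_out, kernel=3):
    """Depthwise k x k filters plus a pointwise 1 x 1 convolution."""
    return c_in * kernel * kernel + c_in * c_out


SIZE, C_IN, C_BRANCH = 34, 1024, 256  # hypothetical input size and widths

# Branches 2-4: padding equal to the rate keeps the spatial size unchanged,
# which is what allows the branch outputs to be spliced along the channel axis.
sizes = [conv_output_size(SIZE, 3, rate=r, padding=r) for r in (6, 12, 18)]
unpadded = conv_output_size(SIZE, 3, rate=6)  # shrinks without padding

# Five branches are concatenated and fused by a 1 x 1 convolution to 512 channels.
concat_channels = 5 * C_BRANCH
fuse_params = concat_channels * 512

# Parameter saving of one separable branch relative to an ordinary 3 x 3 convolution.
saving = standard_params(C_IN, C_BRANCH) / separable_params(C_IN, C_BRANCH)
```

Note that the expansion rate does not appear in either parameter count: it changes the span of the kernel, not the number of weights.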

3) MODEL SIZE OPTIMIZATION
After fusion, the convolutional layers may become too wide. To prevent this, the size of the model needs to be controlled. In this paper, to appropriately reduce the dimension and size of the feature maps, a 1 × 1 convolution is performed before each convolution calculation. In this way, the dimensionality of the feature maps is reduced to half of the original, which reduces the output size.
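The saving from the 1 × 1 reduction can be estimated with a rough sketch. It assumes a densely connected stack in which the k-th atrous layer receives a0 + (k - 1) · m input channels, a 1 × 1 convolution that reduces them to a0/2, and m = a0/8 output maps per layer. The dense-stack input width, the value a0 = 512, the three-layer depth and the function names are assumptions made for illustration; biases are ignored.

```python
def conv_params(c_in, c_out, kernel):
    """Weights of a k x k convolution (biases ignored)."""
    return c_in * c_out * kernel * kernel


def layer_params(c_in, a0, m, bottleneck=True):
    """Parameters of one 3x3 atrous layer, with or without the 1x1 reduction."""
    if not bottleneck:
        return conv_params(c_in, m, 3)
    reduced = a0 // 2  # the 1x1 convolution halves the initial width
    return conv_params(c_in, reduced, 1) + conv_params(reduced, m, 3)


A0 = 512      # hypothetical initial number of feature maps
M = A0 // 8   # output feature maps per layer
LAYERS = 3    # hypothetical number of atrous layers

inputs = [A0 + k * M for k in range(LAYERS)]  # assumed dense concatenation
with_reduction = sum(layer_params(c, A0, M, True) for c in inputs)
without_reduction = sum(layer_params(c, A0, M, False) for c in inputs)
```

Under these assumptions the reduction saves a growing share of parameters as the concatenated input widens, because the expensive 3 × 3 convolution always operates on a0/2 channels.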
Assume that each convolutional layer initially has $a_0$ input features and that each convolutional layer outputs $m$ feature maps. The number of input feature maps $a_k$ of the 1 × 1 convolution before the $k$-th atrous convolution layer is then given by formula (2). Before each convolution calculation, a 1 × 1 convolution is performed, which reduces the number of channels to $a_0/2$. The number of output feature maps of each convolutional layer is set to $m = a_0/8$. The total number of parameters in the network, Sum, is then given by formula (3).
The output step size of the feature map of the thyroid nodule ultrasound image after passing through the decoder is 16. In the Deeplab v3 structure, bilinear interpolation is performed on the obtained feature maps. The coefficient of bilinear interpolation is the same as the output step size of the output features, which is 16. This structure is equivalent to a simple decoding module that restores the resolution of the feature maps to that of the original images. However, it has certain defects, such as losing part of the feature information, which prevents the ultrasound image of thyroid nodules from being segmented completely.
As can be seen from the basic network structure in Fig. 1, the encoding stage of the basic network structure in this paper uses the ResNet-101 structure. It mainly includes four modules: block1, block2, block3 and block4. The encoder continuously extracts the feature information of each layer from block1 to block4. In the decoding stage of the basic network structure, this paper proposes a Hierarchical Feature Fusion (HFF) structure. The decoder also includes four modules: dblock4, dblock3, dblock2 and dblock1. The decoder provides feature maps at multiple hierarchical levels from dblock1 to dblock4.
Each of the four modules in the decoder is divided into two stages: an encoder adaptation stage and a feature generation stage. The modules dblock1, dblock2 and dblock3 each merge the output of the previous decoder module with the output of the encoder. The dblock4 module differs from these three in that it only has inputs and does not perform a fusion operation. During fusion in dblock1, dblock2 and dblock3, the output features of the previous decoder module are fused with the matching features obtained by the encoder. In the decoding stage, this paper uses the feature maps obtained in the encoder stage as the basis for adjusting the relatively low-resolution feature maps from the decoder module. The purpose of the adjustment is to give the features to be fused the same dimensions, that is, the same spatial resolution and the same number of channels.
The final operation of both the encoder and the previous decoder is a 3 × 3 convolution, which ensures the same number of channels. The number of output channels of these two convolutions is set to the same value, namely the minimum of the numbers of input channels of the two convolutions. Then, to ensure that the features have the same spatial resolution, bilinear interpolation is used to up-sample the low-resolution feature map to the maximum spatial resolution of the features to be fused. After this adjustment, all features have the same dimensions and can be fused by element-wise summation. The process is shown in Fig. 4.
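The spatial part of this fusion can be sketched in plain Python. The sketch works on single-channel maps stored as nested lists and omits the 3 × 3 convolutions that match the channel counts; the corner-aligned interpolation convention and the function names are assumptions made for illustration, not details given in this paper.

```python
def bilinear_resize(img, out_h, out_w):
    """Bilinear up-sampling of a 2-D list, mapping corners onto corners."""
    in_h, in_w = len(img), len(img[0])
    out = []
    for i in range(out_h):
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        fy = y - y0
        row = []
        for j in range(out_w):
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            fx = x - x0
            # Interpolate along x on the two neighbouring rows, then along y.
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


def fuse(encoder_map, decoder_map):
    """Up-sample the low-resolution decoder map and add it element-wise."""
    height, width = len(encoder_map), len(encoder_map[0])
    upsampled = bilinear_resize(decoder_map, height, width)
    return [[e + d for e, d in zip(e_row, d_row)]
            for e_row, d_row in zip(encoder_map, upsampled)]


encoder_map = [[1.0] * 3 for _ in range(3)]   # higher-resolution encoder feature
decoder_map = [[0.0, 2.0], [4.0, 6.0]]        # lower-resolution decoder feature
fused = fuse(encoder_map, decoder_map)
```

Element-wise summation requires identical shapes, which is why both the channel count and the spatial resolution are aligned before fusion.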

2) GENERATE SEMANTIC FEATURES
Semantic features are generated from contextual information, and the last part of each decoder module is responsible for capturing it. To capture contextual information, four convolution-pooling operations with 3 × 3 convolution are applied, followed by a 5 × 5 max pooling operation and a 3 × 3 convolution operation. In doing so, the fusion modules obtain the context of image regions from different spatial positions of the feature maps, and the input of this stage is merged with all the outputs of these operations by concatenating the mapping features.
To appropriately reduce the dimension of the feature maps from the fusion layer and the feature dimension of the cascade structure, a 3 × 3 convolution operation is applied. The structure of the semantic feature module is shown in Fig. 5. The four modules are stacked, and the final predicted segmentation result is produced in the dblock1 module. When generating semantic features, overfitting is first reduced by a dropout operation. The number of output channels of the feature maps is then adjusted to match the number of output pixel classes through a 3 × 3 convolution operation. Next, a semantic segmentation map of the thyroid is generated over all pixels using the softmax function. Finally, the low-resolution map is resized to the size of the original images by bilinear interpolation.
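The softmax step can be illustrated with a minimal per-pixel sketch. It assumes two classes, with index 0 standing for background and index 1 for nodule; this labelling and the function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over the class scores of one pixel."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def segment(logits):
    """logits[y][x] is a list of class scores; returns the most probable class per pixel."""
    mask = []
    for row in logits:
        mask_row = []
        for pixel_scores in row:
            probs = softmax(pixel_scores)
            mask_row.append(probs.index(max(probs)))
        mask.append(mask_row)
    return mask


# Toy 2 x 2 score map with two channels per pixel (assumed: 0 = background, 1 = nodule).
logits = [[[2.0, 0.5], [0.1, 1.2]],
          [[0.0, 0.0], [-1.0, 3.0]]]
mask = segment(logits)
```

The number of channels entering the softmax must equal the number of pixel classes, which is what the preceding 3 × 3 convolution ensures.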

C. LOSS FUNCTION
The thyroid ultrasound image segmentation model proposed in this paper is an encoder-decoder thyroid segmentation network. In the segmentation process, the model is trained to predict whether each pixel belongs to the background. This is a pixel-level binary classification problem.
The loss function, also called the cost function, measures the difference between the model's predictions and the true results and is used to judge the quality of the model. The smaller the value of the loss function, the better the fitting ability of the model, the richer the features it has learned, and the better its overall performance. The loss function usually used in classification problems is the Binary Cross Entropy (BCE) loss function, which can be expressed by the following formula:

$$loss = -\left[a_{tru}\log a_{pre} + (1 - a_{tru})\log(1 - a_{pre})\right] \qquad (4)$$

$a_{pre}$ -- The predicted value, a probability between 0 and 1.
$a_{tru}$ -- The true value, which is 0 or 1.
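A minimal sketch of the BCE loss averaged over a set of pixels is given here for illustration. The clamping constant `eps` and the function name are assumptions introduced to keep the logarithm finite; they are not part of the formula above.

```python
import math


def bce_loss(y_true, y_pred, eps=1e-7):
    """Mean binary cross entropy over paired true labels and predicted probabilities."""
    total = 0.0
    for truth, prob in zip(y_true, y_pred):
        prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(truth * math.log(prob) + (1 - truth) * math.log(1 - prob))
    return total / len(y_true)


confident_right = bce_loss([1, 0], [0.9, 0.1])
confident_wrong = bce_loss([1, 0], [0.1, 0.9])
```

The loss grows as the predicted probability moves away from the true label, so `confident_wrong` is much larger than `confident_right`.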
When the predicted value is equal to the true value, the loss is 0; otherwise it is greater than 0. The larger the difference between the predicted probability and the true value, the greater the loss. In practice, the BCE loss function is generally not used directly. In this paper, it is replaced by the Tversky Loss (TL) function [26]. The TL loss function allows false negatives and false positives to be balanced flexibly and is expressed by formula (5), in which:
$a^{pre}_{0j}$ -- The predicted probability that pixel j belongs to the disease class.
$a^{pre}_{1j}$ -- The predicted probability that pixel j belongs to the non-pathological class.
$a^{tru}_{0j}$ -- The true probability that pixel j belongs to the disease class.
$a^{tru}_{1j}$ -- The true probability that pixel j belongs to the non-pathological class.
Adjusting $\gamma_1$ and $\gamma_2$ redistributes the weights, which can mitigate sample imbalance and improve recall.
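A sketch of the Tversky loss in its commonly used form is given here for illustration. It assumes the loss is one minus the Tversky index TP / (TP + γ1 · FP + γ2 · FN) computed from soft predictions, with γ1 weighting false positives and γ2 weighting false negatives; this assignment of the two weights, the smoothing constant `eps` and the function name are assumptions, not details confirmed by this paper.

```python
def tversky_loss(y_true, y_pred, gamma1=0.5, gamma2=0.5, eps=1e-7):
    """1 - Tversky index for the disease class, from per-pixel labels and probabilities."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(p * t for t, p in pairs)          # predicted disease, truly disease
    fp = sum(p * (1 - t) for t, p in pairs)    # predicted disease, truly healthy
    fn = sum((1 - p) * t for t, p in pairs)    # predicted healthy, truly disease
    return 1.0 - (tp + eps) / (tp + gamma1 * fp + gamma2 * fn + eps)


truth = [1, 1, 0, 0]
missed_one = [1, 0, 0, 0]   # one diseased pixel is missed (a false negative)

balanced = tversky_loss(truth, missed_one, gamma1=0.5, gamma2=0.5)
recall_oriented = tversky_loss(truth, missed_one, gamma1=0.3, gamma2=0.7)
```

With both weights equal to 0.5 the index reduces to the Dice coefficient. Raising the weight on false negatives makes a missed diseased pixel more costly, which is how the loss can favour recall on imbalanced data.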

IV. SIMULATION EXPERIMENT
A. EXPERIMENTAL SCHEME SETTINGS
The parameters of the experimental platform used in the simulation experiment are shown in Tab. 2. Ultrasound images of thyroid nodules were taken from 30 patients in the hospital. Each patient provided 20-40 usable ultrasound images as samples, giving a final sample of 1,000 images. The size of the thyroid nodule ultrasound images is 548 × 456, and the pixel size is 0.35 mm. Because the number of effective samples in the thyroid nodule dataset is small, network training tends to overfit or the generalization ability of the network is poor. Therefore, the original data are first screened and selected. An augmentation operation is then performed on the filtered data to increase the number of samples to 4,000, thereby improving training accuracy and overall network performance.
First, the 1,000 ultrasound image samples of thyroid nodules are labeled with serial numbers; 900 images are then selected for training, and the remaining 100 images are used for the algorithm evaluation test. From the 4,000 augmented ultrasound images of thyroid nodules, 3,600 images are taken for training, and the remaining 400 images are used for the algorithm evaluation test. Finally, taking the doctors' manual segmentation of the images as the standard, the reliability of our algorithm is judged by comparing the segmentation results of different algorithms.
Y. Wu et al.: Ultrasound Image Segmentation Method for Thyroid Nodules Using ASPP Fusion Features
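The 90/10 split described above can be sketched as follows. The random shuffle, the fixed seed and the function name are assumptions made for illustration; this paper does not state how the images were assigned to the two sets.

```python
import random


def split_dataset(sample_ids, test_fraction=0.1, seed=0):
    """Shuffle the sample serial numbers and split them into training and test sets."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)  # seeded so the split is reproducible
    n_test = round(len(ids) * test_fraction)
    return ids[n_test:], ids[:n_test]


train_original, test_original = split_dataset(range(1000))
train_enhanced, test_enhanced = split_dataset(range(4000))
```

Under these assumptions the original dataset yields 900 training and 100 test images, and the enhanced dataset yields 3,600 and 400.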
The ultrasound images of thyroid nodules were trained on the basis of original dataset and enhanced dataset. The loss value and accuracy rate obtained by the network model are shown in Fig. 6. It can be seen from Fig. 6(a) that the loss value comparison curves obtained on the basis of original dataset and enhanced dataset are very close in the initial training stage. However, as the number of iterations continues to increase, the loss value obtained based on enhanced dataset is much smaller than the loss value obtained based on original dataset.
It can be seen from Fig. 6(b) that as the number of iterations increases, the accuracy obtained on the validation set with the enhanced dataset is much greater than the accuracy obtained with the original dataset. Thus, data enhancement greatly improves the accuracy of the network model during training and greatly reduces the loss value.

B. EVALUATION INDICATORS AND EXPERIMENTAL RESULTS
To measure the performance of the proposed model quantitatively, this paper uses the following three standard indicators: Dice Similarity Coefficient (DSC), Prevent Match (PM) and Correspondence Ratio (CR).
DSC measures the similarity between the labels and the predicted values, and its value range is (0, 1). The larger the DSC value, the more similar the labels and the predicted values. PM reflects how much of the ultrasound image is missed during segmentation, and its value range is (0, 1). The larger the value of PM, the less of the ultrasound image is missed during segmentation. CR reflects how much of the ultrasound image is incorrectly segmented, and its value range is (0, 1). The larger the value of CR, the less of the ultrasound image is mistakenly segmented.
DSC, PM and CR are expressed by formulas (6), (7) and (8), respectively, in which:
$Z_m$ -- The measured area manually divided by doctors.
$Z_a$ -- The measured area segmented by the model proposed in this paper.
$Z_{tru}$ -- The area that is segmented correctly.
$Z_{fau}$ -- The area that is segmented wrongly.
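The three indicators can be sketched on binary masks using the definitions commonly found in the segmentation literature: DSC as twice the overlap divided by the sum of the two areas, PM as the correctly segmented area divided by the manual area, and CR as the correctly segmented area minus half of the wrongly segmented area, divided by the manual area. These definitions are assumptions, since formulas (6) to (8) are not reproduced in this text; the function name and the flat-list mask format are also illustrative.

```python
def segmentation_metrics(manual, predicted):
    """DSC, PM and CR for two flat binary masks of equal length."""
    pairs = list(zip(manual, predicted))
    z_manual = sum(manual)                                # area drawn by doctors
    z_model = sum(predicted)                              # area segmented by the model
    z_tru = sum(1 for m, p in pairs if m and p)           # correctly segmented area
    z_fau = sum(1 for m, p in pairs if p and not m)       # wrongly segmented area
    dsc = 2 * z_tru / (z_manual + z_model)
    pm = z_tru / z_manual
    cr = (z_tru - 0.5 * z_fau) / z_manual
    return dsc, pm, cr


manual = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0]
dsc, pm, cr = segmentation_metrics(manual, predicted)
```

In this toy case one nodule pixel is missed and one background pixel is wrongly included, so PM drops because of the miss while CR is reduced further by the false inclusion.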
In this paper, models are trained on the original dataset and on the enhanced dataset, and each model is then applied to the corresponding test set. The segmentation results obtained on the test sets were evaluated with DSC, PM and CR as the standards. The evaluation results based on the contour distance and the evaluation results based on the contour area are shown in Tab. 3. The data in the table show that, for both the contour-distance evaluation and the contour-area evaluation, the model trained on the enhanced dataset is better overall than the model trained on the original dataset. This verifies that data augmentation effectively improves the generalization ability of the network model and the accuracy of tests.
Fig. 7 shows the ultrasound images of thyroid nodules of different nature (malignant or benign) in 4 different patients from the samples. The ultrasound image segmentation experiment for thyroid nodules is carried out on the selected samples. Fig. 8 shows the segmentation results of different thyroid ultrasound image samples.
The first row of images in Fig. 8 shows different thyroid ultrasound image samples, and the second row shows the corresponding contours drawn manually by doctors. The third row is the segmentation result of the network model proposed in this paper, and the fourth row is the difference map of the segmentation probability. It can be seen from Fig. 8 that the nodule segmentation model for thyroid ultrasound images proposed in this paper segments the images relatively accurately.

C. OPTIMIZER SELECTION EXPERIMENT
To select an optimizer, the SGD, RMSprop and Adam optimizers are each tested on the segmentation data of the thyroid nodule ultrasound images after coarse positioning. The appropriate optimizer is selected by comparing and analyzing the test results.
The variation of the test set intersection ratio with the number of iterations under different optimizers is shown in Fig. 9. With the SGD optimizer, the test set intersection ratio first decreases and then rises as the number of iterations increases, but it has not converged after 125 iterations. With the RMSprop optimizer, the intersection ratio rises rapidly and converges stably at about 20 iterations, at a value of about 86.6%. With the Adam optimizer, the intersection ratio rises rapidly and converges stably at about 10 iterations, at a value of about 87.1%.
The cross entropy loss function curves under different optimizers are shown in Fig. 10. With the SGD optimizer, the loss drops rapidly in the first iteration and then shows a steady, slow downward trend as the number of iterations increases. With the RMSprop optimizer, the loss also drops rapidly in the first iteration and converges smoothly by the seventh iteration. The curve for the Adam optimizer is similar to that of RMSprop, but the loss has already begun to converge smoothly by the second iteration.
In summary, the overall performance of Adam optimizer is the best. Therefore, Adam optimizer should be selected for optimization in the ultrasound image segmentation model of thyroid nodules.

D. INFLUENCE OF DIFFERENT LOSS ON EXPERIMENTAL RESULTS
Because the ultrasound image samples of thyroid nodules selected in the experiment differ greatly between individuals, many slices contain only small targets. This can upset the balance between positive and negative samples and make training difficult. By comparing the segmentation results of the Tversky loss and the BCE loss function in the network structure, this paper shows that using the Tversky loss as the loss function greatly improves segmentation performance. The results for the different loss functions are shown in Tab. 5. It can be seen from Tab. 5 that the evaluation indices of the segmentation results obtained with the Tversky loss are higher than those obtained with the BCE loss function. Thus, using the Tversky loss as the loss function effectively improves segmentation performance.

E. COMPARISON OF SEGMENTATION RESULTS OF DIFFERENT MODELS AND ALGORITHMS
Reference [27] proposed an active contour model that can segment images efficiently. References [28], [29], [30] and [31] also proposed image segmentation models with different performance. In this paper, the above segmentation models and algorithms are evaluated, and the results are shown in Tab. 6. The DSC, PM and CR of the model algorithm proposed in this paper are 0.9961, 0.9931 and 0.9874 respectively, all better than the results of the other algorithms. Therefore, the proposed segmentation model algorithm has better segmentation performance and strong generalization ability, and it improves the segmentation of ultrasound images of thyroid nodules.

V. CONCLUSION
The tissue structure around the thyroid is complex, the resolution of thyroid ultrasound images is low, and external interference makes image segmentation difficult. As a result, it is difficult to segment and extract thyroid nodules from ultrasound images accurately. To address these problems, this paper proposes an ultrasound image segmentation model algorithm for thyroid nodules based on ASPP fusion features. In the encoding process, a fusion atrous convolution pyramid structure is proposed by combining the encoder-decoder structure with atrous convolution. Furthermore, the fused convolutional layers are prevented from becoming too wide by controlling the model size. In the decoding process, hierarchical feature fusion is proposed and semantic features are generated. Low-resolution feature maps are up-sampled by bilinear interpolation to the maximum spatial resolution of the features to be fused, and the features are fused by element-wise summation. Based on the proposed network structure, DSC, PM and CR are used as evaluation criteria for comparison with other methods.
The experimental results show that the segmentation of thyroid nodule ultrasound images is greatly improved compared with traditional segmentation methods, which verifies the effectiveness of the model algorithm proposed in this paper. In future work, we will further improve the model to raise the segmentation quality, carry out detection of the corresponding thyroid nodules on this basis, and consider increasing the depth of the convolutional layers.