An Automatic Diagnosis Method of Thyroid Ultrasound Image Using Feature Fusion Network

Nowadays, the diagnosis of thyroid nodules is mainly based on clinical methods, which require considerable manpower and medical resources. Therefore, this work proposes an automated thyroid ultrasound nodule diagnosis method that combines convolutional neural networks and image texture features. The main steps are as follows. Firstly, an ultrasound thyroid nodule dataset is established by collecting positive and negative samples, standardizing the images, and segmenting the nodule areas. Secondly, through texture feature extraction, feature selection, and dimensionality reduction, a texture feature model is obtained. Thirdly, through transfer learning, a deep neural network is used to obtain a feature model of the nodules in the images. Then, the texture feature model and the convolutional neural network feature model are combined into a new nodule feature model called the Feature Fusion Network. Finally, the Feature Fusion Network is trained to achieve better performance than a single network, yielding a deep neural network diagnosis model adapted to the characteristics of thyroid nodules. To test this method, 1874 groups of clinical ultrasound thyroid nodules were collected. The F-score, the harmonic mean of Precision and Recall, is used as the evaluation indicator. Experimental results show that the Feature Fusion Network can distinguish between benign and malignant thyroid nodules with an F-score of 92.52%, outperforming both traditional machine learning methods and plain convolutional neural networks.


I. INTRODUCTION
With the increase of people's life pressure and breakthroughs in medical testing technology, the prevalence of thyroid nodules has increased year by year worldwide, and it has become one of the most important diseases threatening human health [1]. Therefore, early diagnosis of thyroid nodules is very important [2]. The diagnostic methods for thyroid nodules mainly include ultrasound examination, CT examination, needle biopsy, and pathological examination. CT examination requires nuclear scanning, which is harmful to patients and expensive. Needle biopsy and pathological examination are more commonly used and reliable, but both are traumatic to thyroid tissue; their diagnostic process is also cumbersome and occupies more medical resources. Ultrasonography is currently the most common imaging method for diagnosing thyroid diseases. It has the advantages of simplicity, good reproducibility, noninvasiveness, speed, and low price. However, doctors usually judge benignity and malignancy based only on clinical experience, which is highly subjective and easily affected. Therefore, the ability to accurately and quickly identify and diagnose the pathology of ultrasound thyroid nodules has become an increasingly urgent need.
In recent years, the application of artificial intelligence technology in medicine has gradually increased, especially in imaging [3][4][5] and signal processing [6]. How to use the information in ultrasound images to establish a computer-assisted automated thyroid diagnosis system is an important direction of current research [7,8]. A commonly applied method of assisting medical diagnosis is to use feature extraction engineering and classifiers. For example, Zheng et al. [9] used LR (Logistic Regression) to screen out indicators that have a large impact on judging benign and malignant thyroid nodules; this regression model can achieve pathological classification of images. Gayana et al. [10] extracted local texture features of thyroid nodules from the region of interest and applied the KNN (K-Nearest Neighbor) algorithm to obtain diagnosis results. Choi et al. [11] used thresholds and three-dimensional connected region labeling to assist doctors in detection with classifiers based on genetic programming. These technologies are grounded in computer-science theory and establish accurate computer diagnosis methods. However, they depend on the completeness of the texture feature information and the selection of a suitable classifier.
On the other hand, with the development of deep learning, some researchers have studied convolutional neural networks for diagnosing thyroid ultrasound nodules [12][13][14]. For example, Chen et al. [15] established S-Detect technology based on GoogLeNet and cooperated with clinical sonographers for joint diagnosis to improve diagnostic performance. Xie et al. [16] decomposed nodules into 9 views to learn 3D features; they built a multi-view knowledge-based collaborative model and input three images per view into a ResNet-50 network for training to represent appearance, voxel, and shape specificity. In summary, convolutional neural networks usually do not require much pre-processing and have the advantages of convenience and simplicity. However, for lack of sufficient prior theoretical support, they depend heavily on the feature completeness of the training data. At the same time, the direction and details of feature training are usually unknown. How to further improve diagnostic accuracy is still an urgent need.
Any single method has certain shortcomings as well as its unique advantages, so research has gradually switched to fusion methods. For example, Chi et al. [17] established an integrated model containing two different convolutional neural networks to fuse features. In order to fully integrate the texture information and image information of thyroid ultrasound, this work builds an integrated convolutional neural network that combines texture features and image features, based on clinically collected ultrasound images of thyroid nodules, to realize automated pathological diagnosis. Texture features are extracted by a feature engineering method, and the obtained feature vector is combined with the feature vector of the convolutional neural network to further improve network performance. The structure of this paper is as follows: Section 2 introduces the methods used in this work, Section 3 presents experiments and discussion, and Section 4 gives a final summary.

II. METHODS
The method used in this work includes the following steps. Firstly, an annotated dataset is constructed, including image acquisition, cropping, enhancement, and region-of-interest extraction. Secondly, based on each patient's pathology test, the texture information of the nodules is obtained by feature engineering, and feature selection is performed with a chi-square test of the relationship between each feature variable and the nodules to eliminate the influence of irrelevant variables. Finally, transfer learning is applied to establish a ResNet, and the image texture features are fused with the network features so that performance is further improved. The method steps are shown in Fig. 1.
All collected cases have undergone biopsy examination; therefore, each group of cases contains the characteristic structure of the nodules and an accurate benign or malignant diagnosis. During the scan, the left-right and anteroposterior diameters were measured transversely at the maximum diameter of the nodule. Then, the upper and lower diameters were measured at the maximum long diameter, and the images were saved. Because the ultrasound reports contain diagnostic information and redundant background areas at the borders, all data were cropped manually and the regions of interest were extracted. Table I shows that there is no significant difference between the training and the testing set.

Texture Features Extraction
This work first uses feature engineering methods to extract the texture features of nodules in ultrasound images, including the Gray-Level Size Zone Matrix (GLSZM) [19], the Gray-Level Co-occurrence Matrix (GLCM) [20], and the Gray-Level Run-Length Matrix (GLRLM) [21].
For an image with gray level g and zone size s, the GLSZM describes the joint distribution of gray levels and the sizes of connected zones of equal intensity. When the image texture is sufficiently uniform, the width of the GLSZM (its size dimension) becomes large. In this work, 32 gray levels are used as the quantization degree for texture classification.
In the GLCM, if the diagonal elements have large values, the pixels of the image have similar gray values; if the elements deviating from the diagonal are relatively large, the gray scale changes greatly in local areas. In addition, this work computes the contrast, inverse difference moment, entropy, and angular second moment to obtain more texture features; their calculations are defined in Table II. The GLRLM provides statistics of consecutive occurrences of the same gray value in the same direction. The calculation equations of the texture statistical features related to the GLRLM are defined in Table III.
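As a concrete illustration of the GLCM statistics above, the following NumPy sketch builds a symmetric co-occurrence matrix for a single pixel offset and computes contrast, inverse difference moment, angular second moment, and entropy. This is not the authors' implementation: the toy image is assumed to be pre-quantized (the paper quantizes to 32 gray levels), and only one offset direction is shown.

```python
import numpy as np

def glcm(q, levels, dx=1, dy=0):
    """Symmetric, normalized GLCM of a pre-quantized image q for one offset."""
    m = np.zeros((levels, levels))
    h, w = q.shape
    for y in range(h - dy):
        for x in range(w - dx):
            m[q[y, x], q[y + dy, x + dx]] += 1
    m = m + m.T                        # make the matrix symmetric
    return m / m.sum()                 # normalize to joint probabilities

def glcm_features(p):
    i, j = np.indices(p.shape)
    return {
        "contrast": np.sum((i - j) ** 2 * p),
        "idm": np.sum(p / (1.0 + (i - j) ** 2)),   # inverse difference moment
        "asm": np.sum(p ** 2),                      # angular second moment
        "entropy": -np.sum(p[p > 0] * np.log2(p[p > 0])),
    }

# toy 8-level image with a horizontal gradient (the paper uses 32 levels)
img = np.tile(np.arange(8), (8, 1))
feats = glcm_features(glcm(img, levels=8))
```

On this gradient image every horizontal neighbor pair differs by exactly one gray level, so the probability mass sits just off the diagonal: contrast is high relative to a flat image, and the angular second moment is small, matching the interpretation given above.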
Algorithm: PCA
1. Center the samples x1, x2, ..., xn by subtracting the mean.
2. Calculate the covariance matrix XX^T of the centered samples.
3. Perform eigenvalue decomposition of the matrix.
4. Take the eigenvectors w1, w2, ..., wn' corresponding to the largest eigenvalues. After all eigenvectors are standardized, they form the eigenvector matrix W.

Data Dimensionality Reduction
In order to eliminate redundant features and prevent the curse of dimensionality, dimensionality reduction is required to compress the data and eliminate noise. This work applies a filtering method for feature selection and uses the statistical chi-square test [22] as the feature scoring standard. First, assume that the two variables are independent, then calculate the deviation between the actual value and the theoretical value, which is defined as:

x^2 = sum( (A - T)^2 / T )

where A is the actual (observed) value and T is the theoretical (expected) value. If the deviation between the two is small enough, it is regarded as natural sample deviation and the null hypothesis is accepted. If the deviation is large enough, the null hypothesis is rejected, indicating that the two variables are not independent. The greater the value of x^2, the greater the degree of correlation, which can be used to retain the features correlated with the benign or malignant state of the nodules (the dependent variable).
Through the chi-square test, this work selected the top 150 features for the next round of processing. In addition to the correlation between the independent and dependent variables, the relationships among the independent variables themselves should also be considered during data analysis.
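The chi-square scoring step can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each continuous feature is binarized at its median so that a 2x2 contingency table of actual counts A and expected counts T can be formed, exactly as in the equation above; ranking features by this score and keeping the top 150 reproduces the selection described.

```python
import numpy as np

def chi2_score(feature, label):
    """Chi-square statistic between a binarized feature and a binary label,
    built from the 2x2 table of actual (A) vs expected (T) counts."""
    f = (feature > np.median(feature)).astype(int)   # binarize at the median
    table = np.zeros((2, 2))
    for fi, li in zip(f, label):
        table[fi, li] += 1
    total = table.sum()
    row, col = table.sum(axis=1), table.sum(axis=0)
    expected = np.outer(row, col) / total            # T under independence
    return np.sum((table - expected) ** 2 / expected)

rng = np.random.default_rng(0)
y = rng.integers(0, 2, 200)                     # toy benign/malignant labels
informative = y + 0.3 * rng.normal(size=200)    # correlated with the label
noise = rng.normal(size=200)                    # independent of the label
scores = [chi2_score(informative, y), chi2_score(noise, y)]
```

As expected, the label-correlated feature receives a much higher score than the pure-noise feature, so it would survive the top-150 cut.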
If several features are strongly correlated with one another, they can be considered to contain the same kind of information, and only one of them needs to be retained. To solve this problem, this work adopts principal component analysis (PCA) [23] to reduce the dimensionality of the data. The core idea is to find the most important directions in the feature space and remove the repeated parts through an orthogonal transformation. The specific algorithm flow is given in the Algorithm.
This work takes t = 0.999 to make the sample dimension as small as possible without losing information. After processing, the final sample dimension is reduced to 372.
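The PCA reduction with a cumulative-variance threshold t can be sketched in NumPy as below. This is an illustrative reimplementation of the Algorithm's four steps, not the authors' code; the synthetic low-rank data is an assumption chosen to make the effect visible.

```python
import numpy as np

def pca_reduce(X, t=0.999):
    """Project X onto the fewest principal components whose cumulative
    explained-variance ratio reaches the threshold t."""
    Xc = X - X.mean(axis=0)                    # 1. center the samples
    cov = Xc.T @ Xc / (len(X) - 1)             # 2. covariance matrix
    vals, vecs = np.linalg.eigh(cov)           # 3. eigendecomposition
    order = np.argsort(vals)[::-1]             # sort eigenvalues descending
    vals, vecs = vals[order], vecs[:, order]
    ratio = np.cumsum(vals) / vals.sum()       # cumulative variance ratio
    k = int(np.searchsorted(ratio, t)) + 1     # 4. smallest k reaching t
    W = vecs[:, :k]                            # eigenvector matrix W
    return Xc @ W, k

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 5)) @ rng.normal(size=(5, 40))  # rank-5 data in 40-D
Z, k = pca_reduce(X, t=0.999)   # k collapses toward the true rank of the data
```

Because the synthetic samples only span five directions, almost all variance is captured by the first few components, so the 40-dimensional input collapses to a handful of dimensions at t = 0.999.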

Feature Fusion ResNet
In order to improve diagnostic performance, this work proposes the Feature Fusion ResNet based on the idea of an integrated network. Here, ResNet-18 [24] is chosen as the basic ResNet to discuss the role of feature fusion with high efficiency and low cost. Since lung nodules and thyroid nodules have similar texture information, the ResNet was first trained on the Suspicious Nodules to Diagnosis dataset of the Kaggle Competition [25] as a pre-trained classification network, and transfer learning on our data was then applied. A schematic diagram of the Feature Fusion ResNet is shown in Fig. 4. In the network part, the input ultrasound images are first expanded to 32 channels by convolution operations with 7×7 kernels, and a max-pooling operation is applied to compress the data. Then, convolution operations with 3×3 kernels are performed repeatedly and the channels are expanded to 64. Similarly, multiple sets of convolutions are performed in sequence, and the number of channels doubles with each set. Finally, another max-pooling operation is applied and the fully connected calculation begins, generating a 1000-dimensional vector from the network.
Since the fully connected layer of ResNet is a highly abstract latent code, it mainly completes the mapping from the feature space to the label space. To retain the feature extraction capability of ResNet, we start the fusion from the fully connected layer. In order to fuse the texture features, the two vectors are each connected to a 128-dimensional vector to obtain a unified, normalized dimension. The two 128-dimensional vectors are then added bitwise (element-wise), and the result is connected to the output layer in a fully connected manner. In this way, the network incorporates more feature parameters and learns with the texture features as an additional input. Finally, a SoftMax function is applied to activate the output layer and realize classification.
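The fusion step described above can be sketched as a small PyTorch module. The 1000-dimensional CNN vector, the 372-dimensional texture vector, the 128-dimensional projection, and the element-wise addition follow the text; layer names, initialization, and other details are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class FusionHead(nn.Module):
    """Sketch of the fusion head: both feature vectors are mapped to the
    same 128-d space, added bitwise, and classified with SoftMax."""
    def __init__(self, cnn_dim=1000, tex_dim=372, fused_dim=128, classes=2):
        super().__init__()
        self.cnn_proj = nn.Linear(cnn_dim, fused_dim)   # ResNet branch
        self.tex_proj = nn.Linear(tex_dim, fused_dim)   # texture branch
        self.out = nn.Linear(fused_dim, classes)

    def forward(self, cnn_feat, tex_feat):
        fused = self.cnn_proj(cnn_feat) + self.tex_proj(tex_feat)  # bitwise add
        return torch.softmax(self.out(fused), dim=1)

head = FusionHead()
probs = head(torch.randn(4, 1000), torch.randn(4, 372))  # (4, 2) probabilities
```

Projecting both branches to a shared 128-dimensional space before the addition is what lets the texture prior and the learned image features contribute on equal footing.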

Initialization
The experiments used an 11 GB 2080Ti Graphics Processing Unit (GPU), an Intel Core i7-6700 CPU, 16 GB of memory, the Ubuntu 18.04 operating system, the PyTorch framework, and Python 3.7. The training parameters were set to a batch size of 16, a learning rate of 0.001, Adam optimization, and 50 epochs. Several indexes are introduced to assess the different methods based on TP, FP, FN, and TN, which are defined as true positives, false positives, false negatives, and true negatives [26][27]. The Accuracy index is the probability of correct diagnosis of positive and negative samples, which is defined as:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

Recall represents the probability that a malignant sample is correctly predicted:

Recall = TP / (TP + FN)

Precision represents the probability of correct prediction among the samples predicted to be malignant:

Precision = TP / (TP + FP)

Specificity shows the ability to recognize negative samples:

Specificity = TN / (TN + FP)

The F-score balances Precision and Recall:

F-score = (1 + β^2) × Precision × Recall / (β^2 × Precision + Recall)

where β is the balance factor, which can be set by operators for different ratios of Recall and Precision. Usually, more attention is paid to patients' diagnosis results in clinical practice; this work sets β = 1 based on clinical experience. FPR and FNR are important in our case because of the cost of misdiagnosis. They are defined as:

FPR = FP / (FP + TN)
FNR = FN / (FN + TP)

In addition, AUC [28] is introduced to measure the performance of the CNN models.
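The indexes above can be collected in one small helper; the confusion counts in the example are toy values, not results from the paper.

```python
def diagnosis_metrics(tp, fp, fn, tn, beta=1.0):
    """Evaluation indexes from confusion counts; beta = 1 matches this work."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "recall": recall,
        "precision": precision,
        "specificity": tn / (tn + fp),
        "fscore": ((1 + beta ** 2) * precision * recall
                   / (beta ** 2 * precision + recall)),
        "fpr": fp / (fp + tn),   # false positive rate
        "fnr": fn / (fn + tp),   # false negative rate
    }

# toy confusion counts for illustration only
m = diagnosis_metrics(tp=90, fp=10, fn=10, tn=90)
```

With these balanced toy counts every rate works out to 0.9 (and FPR = FNR = 0.1), which makes the formulas easy to sanity-check by hand.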

Performance Assessment
The pretrained model was further trained into two new models. One was a ResNet model trained on the image dataset; the other was the Feature Fusion ResNet, trained on the image dataset together with the feature vectors. Fig. 5 shows the loss curves of the ResNet model and the Feature Fusion ResNet model. It can be seen that, due to the complexity of the image information, it is difficult for the plain ResNet to converge well. This is consistent with the fact that the current clinical diagnosis of benign and malignant thyroid nodules in ultrasound images is still based on subjective judgment and sometimes depends on multi-angle observation or even multiple methods. On the other hand, by incorporating the texture information of the nodules into the ResNet, the convergence of the network is significantly improved. This is because the texture information of the nodules encodes the rich prior knowledge of clinicians, which helps the computer achieve better results.
To test the proposed method, this work added VGG-16 [29], LR [30], and KNN [31] as comparative experiments, where the training-set and testing-set IDs of all methods match exactly. LR and KNN were trained on the feature vectors obtained from the texture features. VGG-16 was applied by transfer learning and trained until convergence. KNN iterated over K values from 1 to 80 and obtained its best performance at K = 5, which was taken as the comparison result. LR set the regularization coefficient to 1 and internally used the coordinate descent method to iteratively optimize the cost.
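The KNN comparison setup, sweeping K from 1 to 80 and keeping the best cross-validated value, can be sketched with scikit-learn as below. The synthetic feature vectors are a stand-in for the real texture features, so the selected K will not necessarily match the paper's K = 5.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 10))               # stand-in texture feature vectors
y = (X[:, 0] + X[:, 1] > 0).astype(int)      # stand-in benign/malignant labels

# sweep K over the same 1-80 range described in the text
scores = {k: cross_val_score(KNeighborsClassifier(n_neighbors=k),
                             X, y, cv=5).mean()
          for k in range(1, 81)}
best_k = max(scores, key=scores.get)         # K with the best mean accuracy
```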
Table VI shows the assessment results. Comparing the Feature Fusion ResNet with the plain ResNet, the performance of the integrated network is significantly improved: Accuracy reached 88.30%, and the F-score rose from 77.48% to 92.52%. Therefore, by fusing texture features, the deep neural network can combine more prior features on top of the image information, which helps improve performance. Among the other control experiments, the KNN method attains the best comprehensive indexes; its performance is better than VGG-16 and ResNet. This is because the feature vector obtained by feature engineering carries rich texture information, which may be richer than the raw image information, especially on a small dataset. KNN can use this feature vector effectively by finding the K nearest neighbor samples, and in some cases it achieves better results than a single network trained on the images alone. LR has low accuracy due to insufficient fitting of the data distribution. We also compared with the voting strategy [10]: since KNN and ResNet perform well, we conducted voting fusion based on them. The voting strategy did not achieve better results; its performance lies between those of KNN and ResNet. Finally, we calculated the model parameters of VGG-16, ResNet, and their feature fusion models. A certain number of parameters is added by the feature fusion, caused by the additional connection weights. Nevertheless, the networks remain at the same computational magnitude, with an increase of less than 0.2%. Therefore, when the fusion weights can be learned, the proposed fusion method brings no significant extra computational burden. To visualize the statistical information, Fig. 6 shows the performance indexes of the different methods (Fig. 6a) and an intuitive comparison between ResNet and Feature Fusion ResNet (Fig. 6b) to verify the performance of the fused features.

IV. CONCLUSION
Since the clinical diagnosis of benign and malignant thyroid nodules by ultrasound is a subjective and tedious process, this work aims to assist doctors in making clinical diagnoses of thyroid nodules, thereby improving the accuracy and efficiency of diagnosis. Firstly, the clinically collected data are preprocessed, including cropping, enhancement, and extraction of regions of interest. Then, based on the nodule area, feature engineering is applied to obtain texture features of the nodules, and feature dimensionality reduction is realized through the correlation between features and nodules and the relationships among the features. Finally, a deep neural network model is established, and the texture features from the previous step are merged to further improve network performance. Under an assessment of 1874 cases with thyroid nodules, this method obtained the best performance, showing clinical potential. This work combines the advantages of feature engineering and deep neural networks and proposes a novel way of fusing features. Although this work mainly validates the diagnostic performance on ultrasound imaging of thyroid nodules, with the transfer learning and fusion feature structure it can also be applied to other domains, such as the diagnosis of breast nodules, lung nodules, and other tumors. It is worth mentioning that the purpose of fusing features is mainly to introduce more features and information into the deep neural network, so that the network converges more accurately and quickly. This is also a future direction for new fusion information. This work offers certain inspirations for computer-aided diagnosis and the application of artificial intelligence in medicine.

Fig. 1
Fig.1 Flow chart of methods

Fig. 2
Fig. 2 Equipment and data display. ((a) shows the GE LOGIQ E9 Ultrasound Machine; (b) reveals some preprocessing results.)

Fig. 4
Fig. 4 Schematic diagram of the overall structure

The regions of interest were extracted using MITK Workbench 2018.4.0 [18]. The data were then divided into a training dataset and a testing dataset at a ratio of 9:1, ensuring that the age and gender ratios are similar. The training dataset contains 1686 images with 1152 malignant and 534 benign nodules, and the testing dataset contains 188 images with 143 malignant and 45 benign nodules (Fig. 2b). Both sets are guaranteed to contain a certain number of benign and malignant nodules. A chi-square test and Student's t test were performed on the gender and age factors of the training and testing sets; the statistical results in Table I show that there is no significant difference between them.

In Step 4, the number of retained eigenvectors n' is determined by the cumulative principal component proportion threshold t.

Fig. 5
Fig. 5 Loss graphs for different structures. ((a) presents the result of ResNet; (b) shows the result of Feature Fusion ResNet.)

Fig. 6a explores the performance of the fused features. The comparison between ResNet and Feature Fusion ResNet visually shows the pros and cons of feature fusion: all four indicators are significantly improved, and Recall rose from 81.82% to 95.10%. Therefore, the Feature Fusion ResNet can greatly reduce missed diagnoses and misdiagnoses in the clinic, which is in line with actual clinical needs. Table VI also records the chi-square tests between ResNet and Feature Fusion ResNet; the significant results show that there are significant differences between them, so the proposed method is a clear improvement. Comparing the five methods, the Feature Fusion ResNet reaches the highest value on most indicators, while some models also achieve good results on individual indicators. For example, the VGG-16 model obtained 99.19% Recall and an 87.54% F-score, but did not perform well on Accuracy and Precision. The ResNet model reached the lowest performance, with four indicators significantly lower than those of many other methods. The performance of VGG-Net and ResNet thus differ considerably from each other.

Table V
Table V presents the calculation results of the CNN models in terms of FPR, FNR, and AUC. The proposed method achieves an AUC value of 90.72%, outperforming the second-place ResNet by about 33%. In addition, the proposed method has low values on both FNR and FPR; in particular, its FPR is much lower than those of the other three CNN models. Therefore, it can reduce cases of misdiagnosis in clinical applications.