Research on Recognition of Medical Image Detection Based on Neural Network



I. INTRODUCTION
Colorectal cancer (CRC) is a collective term for colon and rectal adenocarcinoma. It has a high incidence worldwide and is a leading cause of tumor-related death. CRC is more heterogeneous than other tumors and can be divided into subtypes with different characteristics based on clinical or molecular features [1]. The causes of CRC are complex, but both primary CRC (∼70%) and hereditary CRC (10-25%) show specific characteristics at the molecular level, including chromosomal instability (CIN) [2], loss of heterozygosity, and copy number variation. Epigenetic changes, such as CpG island methylation, have also been shown to drive the progression from adenoma to cancer in both sporadic and hereditary forms of CRC [3].
Because CRC develops slowly from precancerous lesions, patients usually become aware of it only in the middle or late stages, which has a great impact on prognosis; early detection can therefore reduce the incidence and mortality of CRC [4]. Given its high diagnostic performance, optical colonoscopy (OC) is the gold standard for early detection of CRC [5]. It can be combined with biopsy for a definitive diagnosis and with therapeutic polypectomy, thus preventing long-term CRC death [6]. However, patients with tumor-related stenosis, older patients, and those with comorbidities are more likely to have an incomplete or difficult optical colonoscopy [7]. Haan et al. [8] found that computed tomographic colonography (CTC) was comparable to colonoscopy in detecting larger colon polyps. Two meta-analyses have shown that CTC has high sensitivity (100%) for detecting colon cancer, with a sensitivity of 87.9% reported for adenomas relative to a 10 mm size threshold. Despite such encouraging data, there is currently no cross-continental consensus on whether CTC should be used as a screening method for asymptomatic patients. Detection based on genomic mutations [9], [10] greatly helps accurate diagnosis and targeted therapy of CRC, but the high heterogeneity of CRC limits its use. MRI [11] is the recommended method for initial staging, because it determines with high accuracy the location and overall extent of the tumor and its relationship to the peritoneal reflection. According to Dorudi et al. [12], tumor growth on the initial MRI is best described in relation to anatomical structures such as the mesorectal fascia.
The associate editor coordinating the review of this manuscript and approving it for publication was Zhihan Lv.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Most MRI staging failures occur in differentiating T2 from borderline T3 tumors, and overstaging is the main cause of error [13], [14]. Although previous studies have not shown a great advantage for dedicated phased-array coils [15], our clinical experience is positive, and at our institution we use phased-array coils as the standard for the initial diagnosis of colorectal cancer [16], [17]. With high spatial resolution over a large field of view, phased-array MRI is suitable for staging both superficial and advanced rectal tumors. Endorectal ultrasonography (ERUS) is now the established modality for assessing rectal wall integrity [18]. The reported accuracy of T staging is between 69% and 97%. Endorectal ultrasound (US) is currently the most accurate imaging method for assessing T1 tumors [19]. ERUS and endorectal MRI have similar accuracy in distinguishing superficial (T1 and T2) from T3 tumors [20]. However, endorectal MRI is associated with high costs, limited availability, and patient discomfort [21], [22]. Therefore, the European Medical Association guidelines do not recommend endorectal MRI as the preferred imaging method for clinical T staging of colorectal cancer [23], [24].
Conventionally, the fecal occult blood test (FOBT) and colonoscopy are also used to diagnose CRC. Although both have advantages, FOBT is affected by diet and drugs, and colonoscopy is limited by its high cost and inconvenience [25], [26]. In this study, a back-propagation neural network (BPNN) algorithm was used to construct a CRC diagnosis model based on gene expression profiles. We explored the relationship between gene expression changes and CRC and the feasibility of expression-based CRC diagnosis. Molecular detection of CRC from expression profiles provides a possible alternative to FOBT.
The specific contributions of this paper are as follows:
• This paper comprehensively analyzes the algorithms related to intestinal cancer detection, together with their advantages and disadvantages.
• This paper presents an optimized neural network-based model for medical image recognition, supplemented by expression profiling to enable early detection of intestinal cancer.
• The accuracy of testing is greatly improved and the cost of testing is reduced. The accuracy of the proposed algorithm meets the needs of existing bowel cancer detection and reduces the errors caused by manual detection.
The rest of the article is organized as follows. Section II reviews the current state of computer-aided applications in intestinal cancer at home and abroad. Section III discusses and compares the analysis methods. Section IV designs and simulates the neural network-based detection and analysis model. Section V draws conclusions from the simulation results.

II. RELATED WORKS
To date, there has been little research on computer-aided diagnosis of cancer, either domestically or internationally, and a complete theoretical system has not yet been formed. This section briefly reviews the existing achievements.
Prof. Cui et al. used syntactic pattern recognition on X-ray images of the pancreas to diagnose pancreatic cancer [27]. Specifically, the method aims to diagnose pancreatic cancer (PC) and chronic pancreatitis (CP) [28]. X-ray images obtained by endoscopic retrograde cholangiopancreatography are analyzed, and morphological changes of the pancreas are used for identification. A pancreas with pancreatic cancer becomes dilated or narrowed, and cysts or spongy projections of the pancreaticobiliary ducts develop in the lateral branches [29], [30]. A pancreas with chronic pancreatitis is characterized by abnormal lateral branching of the duct. An attributed context-free grammar enables rapid detection of pathological shape changes, and syntactic pattern methods are then used for identification and diagnosis. Based on pancreas MRI images, Jennifer A. Flexman obtained indexes of blood vessel size and blood volume fraction, which are used to evaluate changes in tumor blood vessels during tumor staging. Specifically, the vessel size image and blood volume fraction metric in this model of human pancreatic cancer are used to represent the cross-sectional area of the tumor, and monitoring changes in blood vessels is a feasible way to determine the tumor stage [31], [32].
Zhijun Chen et al. proposed a subtle anomaly detection method based on CT pancreas images [33]. It is a simple cascade filter detection method. First, a squared logarithm of the gray level is applied to enhance low-gray-level edges; the gray levels are then shifted to remove blurred areas, and numerical operations enhance the outline of details [33], [34]. In tests on CT images of two pancreases with tumors, the algorithm was able to indicate small abnormalities.
International research on computer-aided diagnosis of pancreatic cancer began only a little more than ten years ago and is still at an exploratory stage, so it offers limited material for reference [35], [36]. Cai Zheyuan began publishing articles on texture extraction and classification of pancreatic endoscopic ultrasound images in 2008. For these images, 69 texture features were extracted using image processing-related algorithms, between-class feature distance was used for initial feature selection, a sequential forward search algorithm was used for further feature optimization, and a support vector machine was finally used for classification [37]. Testing showed that this algorithm is feasible and can be applied to computer-aided diagnosis of pancreatic cancer from endoscopic ultrasound images, to identify the presence or absence of pancreatic cancer and to provide doctors with a valuable reference [38].
In theory, imaging examination of any part of the human body can use computer-aided diagnosis to improve diagnostic accuracy. The more mature research on computer-aided cancer diagnosis concerns breast cancer, lung cancer, and liver cancer. Applying computer-aided diagnosis to intestinal cancer is a relatively new topic both domestically and internationally. The intestine is concealed, has a long and narrow tubular structure, and has complicated adjacency relationships and strong adhesion to other organs, all of which make early diagnosis of intestinal cancer very difficult. Currently there is no general digital image processing method [39]. The purpose of this study is to find a universal, rapid, and accurate method for the detection of intestinal cancer.
So far, no one has developed a universal and accurate method for intestinal cancer detection based on CT images. Therefore, it is necessary to comprehensively study the more mature liver segmentation and liver tumor detection methods, combined with digital image processing and pattern classification methods, to find a more suitable detection method for intestinal cancer. In general, the steps of bowel cancer detection include: pancreas segmentation, feature extraction and selection, and bowel cancer recognition [40]. A universal and rapid bowel cancer detection method can determine whether there is bowel cancer, and create objective and quantitative diagnostic indicators, which can improve the accuracy of early diagnosis of bowel cancer and improve the overall medical level.

III. ANALYTICAL METHOD
A. DATA SOURCE AND PREPROCESSING
The CRC datasets we used came from TCGA (The Cancer Genome Atlas) and GEO (Gene Expression Omnibus). First, we used the GDC Data Transfer Tool to download the RNA-seq data (read counts) of colon adenocarcinoma (COAD) and rectal adenocarcinoma (READ), together with the clinical data corresponding to the samples, from the TCGA database (https://portal.gdc.cancer.gov/). According to the COAD and READ information recorded by TCGA, 41 pairs of COAD normal and tumor samples and 10 pairs of READ normal and tumor samples were obtained.
We then downloaded the CRC gene expression profile data from the GEO database (https://www.ncbi.nlm.nih.gov/geo/), including GSE39582 (19 normal samples, 443 tumor samples), GSE41258 (54 normal samples, 186 tumor samples), and GSE44076 (98 normal samples, 50 mucosa samples, 98 tumor samples). To ensure consistent differential expression analysis across data sets, we downloaded the raw data of GSE39582, GSE41258, and GSE44076 and normalized each with the RMA (robust multi-array average) method. The pre-processed sample information is shown in Table 1; in total, 1049 samples were retained for subsequent analysis.

B. DIFFERENTIALLY EXPRESSED GENE SCREENING
Differentially expressed gene analysis is mainly based on the GSE39582 and GSE41258 data, because both come from Affymetrix platforms and both use tissue samples. The limma (version 3.8) tool was used to identify differentially expressed genes (DEGs) between normal and tumor samples of GSE39582 and GSE41258 after normalization. Genes with a fold change greater than 2 and FDR (BH-adjusted P-value) < 0.05 were taken as DEGs. TCGA's COAD and READ data are of the NGS type; the transcript read counts were analyzed with DESeq2, and the same FDR < 0.05 threshold was used for DEGs. Before the differential expression analysis, the similarity of the samples in the GEO and TCGA datasets was evaluated by calculating the correlation coefficients between samples. The results show that tumor and normal samples from different sources have high internal consistency. Both the heatmap and the volcano plot of the DEGs were constructed using R software.
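The thresholding step described above can be illustrated with a minimal Python sketch. It is not the authors' limma or DESeq2 pipeline: the function names `bh_adjust` and `select_degs` are hypothetical, the input statistics are toy values, and a fold change greater than 2 is assumed to mean an absolute log2 fold change greater than 1.

```python
from typing import Dict, List, Tuple


def bh_adjust(pvalues: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_degs(stats: Dict[str, Tuple[float, float]],
                lfc_cutoff: float = 1.0,
                fdr_cutoff: float = 0.05) -> Dict[str, str]:
    """stats maps gene -> (log2 fold change, raw p-value).

    Assumption: "fold change more than 2 times" is read as |log2FC| > 1.
    Returns gene -> "up" or "down" for genes passing both thresholds.
    """
    genes = list(stats)
    fdr = bh_adjust([stats[g][1] for g in genes])
    return {
        g: ("up" if stats[g][0] > 0 else "down")
        for g, q in zip(genes, fdr)
        if abs(stats[g][0]) > lfc_cutoff and q < fdr_cutoff
    }


# Toy statistics for four hypothetical genes (not data from the study).
toy_stats = {
    "G1": (2.5, 0.001),
    "G2": (-1.8, 0.004),
    "G3": (0.4, 0.0005),
    "G4": (1.5, 0.6),
}
toy_degs = select_degs(toy_stats)
```

In this toy input, G3 fails the fold change cutoff and G4 fails the FDR cutoff, so only G1 and G2 are retained.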

C. FUNCTIONAL ENRICHMENT ANALYSIS
We used the clusterProfiler (version 3.8) package to perform enrichment analysis of Gene Ontology (GO) biological process (BP), cellular component (CC), and molecular function (MF) terms and of KEGG pathways. A q value < 0.05 was taken as the threshold for significant enrichment. The enrichment results were displayed with the dotplot function of clusterProfiler.
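Over-representation analysis of this kind is commonly based on a hypergeometric test. The following Python sketch shows that test for a single term; it is an illustration under that assumption rather than the clusterProfiler implementation, and the function name and toy counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_universe: int, n_term: int,
                           n_degs: int, n_overlap: int) -> float:
    """Upper-tail hypergeometric p-value P(X >= n_overlap).

    n_universe: number of background genes
    n_term:     background genes annotated to the term
    n_degs:     number of DEGs tested
    n_overlap:  DEGs annotated to the term
    """
    if not (0 <= n_term <= n_universe and 0 <= n_degs <= n_universe):
        raise ValueError("term and DEG counts must fit in the universe")
    upper = min(n_term, n_degs)
    total = comb(n_universe, n_degs)
    # math.comb returns 0 when k > n, so impossible draws contribute nothing.
    tail = sum(
        comb(n_term, k) * comb(n_universe - n_term, n_degs - k)
        for k in range(max(n_overlap, 0), upper + 1)
    )
    return tail / total


# Toy example: 10 background genes, 4 in the term, 3 DEGs, 2 of them in the term.
p_toy = hypergeom_enrichment_p(10, 4, 3, 2)
```

In practice the p-values of all tested terms would then be corrected for multiple testing before the q value < 0.05 threshold is applied.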

D. PROTEIN INTERACTION ANALYSIS
Using the protein-protein interaction (PPI) information provided by the STRING database (https://string-db.org/), we built a PPI network of the DEGs, retaining interactions with a confidence score > 0.9, and visualized the network with Cytoscape (version 3.7.1). Hub gene analysis was performed with Cytoscape's Network Analyzer plug-in, which calculates the connectivity degree of each gene (node); genes were then ranked by degree. Separate networks were constructed for the significantly up-regulated and the significantly down-regulated genes, and in each network the gene with the largest degree was taken as the hub gene, giving 2 hub genes in total. PPI network module analysis used the MCODE tool with default parameters.
GO and KEGG analysis of the genes in each module was also performed with clusterProfiler.
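The degree-based hub selection can be sketched in a few lines of Python. This is a simplified stand-in for the Cytoscape Network Analyzer step, assuming the network is given as a list of scored edges; the function names, the alphabetical tie-break, and the toy edge list are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Edge = Tuple[str, str, float]  # (gene A, gene B, confidence score)


def node_degrees(edges: Iterable[Edge], min_score: float = 0.9) -> Dict[str, int]:
    """Count distinct neighbours per gene, keeping edges with score > min_score."""
    neighbours = defaultdict(set)
    for a, b, score in edges:
        if score > min_score and a != b:
            neighbours[a].add(b)
            neighbours[b].add(a)
    return {gene: len(adj) for gene, adj in neighbours.items()}


def hub_gene(edges: Iterable[Edge], min_score: float = 0.9) -> str:
    """Return the gene with the largest degree (ties broken alphabetically)."""
    degrees = node_degrees(edges, min_score)
    if not degrees:
        raise ValueError("no edges pass the confidence threshold")
    return max(sorted(degrees), key=degrees.get)


# Toy network with hypothetical gene labels (not data from the study).
toy_edges = [
    ("A", "B", 0.95),
    ("A", "C", 0.99),
    ("A", "D", 0.92),
    ("B", "C", 0.91),
    ("C", "D", 0.50),  # dropped: below the confidence threshold
]
toy_hub = hub_gene(toy_edges)
```

Running the same function once on the up-regulated network and once on the down-regulated network would yield one hub gene per network, as in the procedure above.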

E. CONSTRUCTION OF A NEURAL NETWORK-BASED DIAGNOSTIC MODEL
Using the error back-propagation neural network (BPNN) algorithm, we constructed a CRC diagnosis model based on the hub genes. First, the Normal (healthy), Mucosa, and CRC samples of the GSE44076 dataset were randomly divided (seed = 12345) so that the 246 samples were split evenly into a training set and a testing set. The main parameters of the BPNN algorithm are the learning rate, the coefficient lambda of the regularization term, the number of hidden layers, and the number of neurons in each hidden layer. To find the optimal parameters, a grid search was used to evaluate the performance of the model under different parameter combinations. Since the target value is a categorical variable, prediction accuracy is used as the evaluation index of the model. Accuracy is calculated as follows:
$$\text{Accuracy} = \frac{\text{number of correctly classified samples}}{\text{total number of samples}}$$
Finally, the model with the largest combined training and testing accuracy (training set accuracy + testing set accuracy - 1) was taken as the optimal model. The selected model parameters were learning rate = 0.006, lambda = 6e-04, and hidden layer = 10 neurons. To avoid biasing the model through random grouping, we used the bootstrap method to calculate the training set and testing set accuracy of the model under 100 random samples.
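The split and the model selection criterion can be sketched as follows. This is a minimal illustration, not the authors' R code: the helper names are hypothetical, the grid results are made-up placeholders, and a Python seed of 12345 will not reproduce the partition obtained in R.

```python
import random
from typing import Dict, List, Sequence, Tuple


def split_evenly(sample_ids: Sequence[str], seed: int = 12345) -> Tuple[List[str], List[str]]:
    """Shuffle sample IDs with a fixed seed and cut them into two halves."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    half = len(ids) // 2
    return ids[:half], ids[half:]


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of samples whose predicted class equals the true class."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def selection_score(train_acc: float, test_acc: float) -> float:
    """Criterion from the text: training accuracy + testing accuracy - 1."""
    return train_acc + test_acc - 1.0


def best_params(results: List[Tuple[Dict[str, float], float, float]]) -> Dict[str, float]:
    """Pick the parameter set with the largest selection score.

    Each entry is (parameters, training accuracy, testing accuracy).
    """
    return max(results, key=lambda r: selection_score(r[1], r[2]))[0]


# 246 placeholder sample IDs, matching the size of GSE44076.
train_ids, test_ids = split_evenly([f"S{i}" for i in range(246)])

# Placeholder grid-search results (accuracies are invented for illustration).
toy_grid = [
    ({"learning_rate": 0.001, "lambda": 1e-3, "hidden": 5}, 0.90, 0.88),
    ({"learning_rate": 0.006, "lambda": 6e-4, "hidden": 10}, 0.94, 0.93),
    ({"learning_rate": 0.010, "lambda": 1e-4, "hidden": 20}, 0.97, 0.85),
]
chosen = best_params(toy_grid)
```

Because the score adds the two accuracies, a configuration that overfits the training set (the third toy entry) loses to one that is balanced across both sets.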

F. STATISTICAL ANALYSIS
Statistical analyses were performed using R (version 3.5.2). Student's t-test was used to test the significance of differences in gene expression levels between paired samples, and the Wilcoxon rank-sum test was used for two-group comparisons of gene expression levels in unpaired samples. The Kruskal-Wallis rank test was used for significance tests of two or more groups, and the FDR was calculated using the BH method. In this study, unless otherwise specified, *** indicates p < 1e-5, ** indicates p < 0.01, and * indicates p < 0.05.
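The star notation can be expressed as a small lookup, shown here as a Python sketch. The function name is hypothetical, and returning an empty string for non-significant values is an assumption, since the text does not define a marker for that case.

```python
def significance_stars(p: float) -> str:
    """Map a p-value to the star notation defined in the text."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p-value must lie in [0, 1]")
    if p < 1e-5:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""  # assumption: no marker for non-significant results
```

Note that the top tier uses 1e-5 rather than the more common 0.001, so a p-value of 0.0005 receives two stars under this convention.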

IV. NUMERICAL SIMULATION
A. ANALYSIS PROCESS
In this study, the gene expression profile data of normal and tumor samples provided by the GSE39582 and GSE41258 datasets were first used to calculate the differentially expressed genes (DEGs) of each dataset with the limma tool. The DEGs common to both were used for subsequent verification. Using TCGA's COAD and READ data sets, we verified the identified DEGs to further establish their reliability. We then annotated the possible functions of the DEGs, including the biological processes (BP) and pathways in which they participate. Analysis of protein interactions gives a deeper understanding of the changes in cellular pathways (signaling pathways, metabolic pathways) that may be involved in the transition from normal to tumor.
The principle of the error back-propagation neural network is as follows. Suppose there are $a$ neurons in the input layer, $b$ neurons in the hidden layer, and $c$ neurons in the output layer; $w_{ij}$ is the connection weight from the $i$-th neuron to the $j$-th neuron, and the input vector of the input layer is $x = (x_1, \dots, x_a)$. The weighted input sum received by the $j$-th neuron from the input layer is:
$$net_j = \sum_{i=1}^{a} w_{ij} x_i$$
Its output is $o_j = F(net_j)$, where $F$ is the excitation function; this output is then passed as input to the next layer of the neural network. After the same transformation, the output of the neural network is obtained.
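The forward pass described above can be sketched in plain Python. This is a minimal illustration under stated assumptions: the excitation function is taken to be the logistic sigmoid (the text does not name it), bias terms are omitted because the text mentions none, and the function names and weight layout (`weights[i][j]` connects input `i` to neuron `j`) are hypothetical.

```python
import math
from typing import Callable, List, Sequence


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def layer_forward(inputs: Sequence[float],
                  weights: Sequence[Sequence[float]],
                  activation: Callable[[float], float] = sigmoid) -> List[float]:
    """Weighted sum of the inputs per neuron, followed by the activation."""
    if len(weights) != len(inputs):
        raise ValueError("need one weight row per input")
    n_out = len(weights[0])
    return [
        activation(sum(inputs[i] * weights[i][j] for i in range(len(inputs))))
        for j in range(n_out)
    ]


def forward(x: Sequence[float],
            w_hidden: Sequence[Sequence[float]],
            w_output: Sequence[Sequence[float]]) -> List[float]:
    """Input layer -> hidden layer -> output layer."""
    hidden = layer_forward(x, w_hidden)
    return layer_forward(hidden, w_output)


# Two inputs (for example two gene expression values), three hidden
# neurons, three outputs; all-zero weights are used only as a sanity check.
zero_hidden = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
zero_output = [[0.0, 0.0, 0.0] for _ in range(3)]
toy_out = forward([1.0, 2.0], zero_hidden, zero_output)
```

With all weights at zero every weighted sum is zero, so each sigmoid output is 0.5, which makes a convenient check that the layers are wired as described.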
Let the expected output vector of the neural network be $d$ and the actual output vector be $y$. For the $p$-th input sample, the error signal of the $k$-th neuron in the output layer is:
$$e_{pk} = d_{pk} - y_{pk}$$
The total square error of the output layer is:
$$E_p = \sum_{k=1}^{c} e_{pk}^2$$
where $c$ is the number of neurons in the output layer. If the total number of input vectors is $n$, the average value of the square error is:
$$\bar{E} = \frac{1}{n} \sum_{p=1}^{n} E_p$$

B. DIFFERENTIALLY EXPRESSED GENE SCREENING
Using the limma tool to analyze differential expression between the Tumor and Normal sample groups, 414 differentially expressed genes (up / down: 111/303) were obtained. After removing genes with inconsistent expression patterns in the two data sets, a total of 270 DEGs remained: 90 genes were significantly up-regulated in the tumor group, and 180 genes were significantly down-regulated in the tumor group. This suggests that the activation and inhibition of certain biological processes may be involved in the transition of CRC from the normal to the tumor state.

C. DEGS FUNCTIONAL ANALYSIS
To further study the functions of these DEGs, we performed GO functional annotation on the 270 DEGs. Because the expression patterns of up- and down-regulated genes differ markedly, we analyzed the two groups separately. Genes up-regulated in the tumor are mainly involved in biological processes related to the extracellular matrix and extracellular structure, whereas the down-regulated genes are involved in clearly different biological processes, mainly ion detoxification and stress response. KEGG pathway analysis found that genes significantly up-regulated in the tumor were significantly enriched in ECM-receptor interaction, focal adhesion, and the PI3K-Akt signaling pathway, all of which are important for tumor formation and progression. The down-regulated genes were significantly enriched in pathways such as fatty acid degradation and glycolysis/gluconeogenesis, which indicates that genes involved in fatty acid and glucose metabolism are inhibited in tumor cells.

D. DEGS INTERACTION NETWORK AND HUB GENE ANALYSIS
The 90 up-regulated and 180 down-regulated DEGs yielded 715 and 1089 PPI network edges, respectively, all with a confidence score > 0.9. Hub gene analysis found that CCND1 and FOS had the highest degree in the up-regulated and down-regulated DEG networks, respectively, and that their degrees were significantly higher than those of other genes (43/54). Module analysis divides the up-regulated DEG network into 3 subclusters, which contain 25, 12, and 5 genes, respectively. Among them, subcluster1 is closely related to cancer occurrence, and the functions of subcluster2 and subcluster3 are unknown. The down-regulated DEG network can be divided into 5 sub-networks, among which subcluster1 is related to glycolysis/gluconeogenesis metabolism, subcluster2 is related to ion metabolism, subcluster3 is related to bile secretion, and subcluster4 is related to nitric biosynthesis process. Figure 4 shows the analysis of the DEG interaction network.

E. DEGS EXPRESSION ANALYSIS IN INDEPENDENT VALIDATION SET
Using the expression data of intestinal cancer (colon and rectal adenocarcinoma) provided by TCGA, we verified the expression of the 270 DEGs. Because the TCGA CRC expression data are RNA-seq data, we used the read count values of these DEGs to compare the overall expression of up- and down-regulated genes between normal and tumor samples. In both COAD and READ samples, the expression of the up-regulated genes is significantly higher in tumor than in normal samples, and the expression of the down-regulated genes is significantly lower. These results are highly consistent with our results based on the GEO chip expression data. Testing each up- and down-regulated gene individually showed that 99% (261/263) of the genes in COAD samples and 93% (244/263) of the genes in READ samples were significantly differentially expressed (FDR < 0.05), which further supports the reliability of the DEGs we identified. Figure 5 shows the expression of the differentially expressed genes in the TCGA CRC dataset.

F. CONSTRUCTING A NEURAL NETWORK-BASED DIAGNOSTIC MODEL
The neural network model built on the expression values of the two hub genes, FOS and CCND1, has an accuracy > 0.9 on both the training set and the testing set, and the median accuracies over 100 random samples reached 0.943 and 0.927, respectively, indicating that random grouping has little impact on the model. The AUC of the model for Normal (healthy), Mucosa, and CRC samples also exceeded 0.97 in each case, indicating that the prediction model based on the FOS and CCND1 genes performs well. Furthermore, we compared the expression levels of the two genes across all samples and found no strong correlation between them (cor = 0.16, p = 0.013). The expression levels of both genes in Mucosa samples were significantly lower than in Normal (healthy) and CRC samples (p < 1e-5). FOS expression was significantly higher in Normal (healthy) samples than in CRC samples, whereas CCND1 showed the opposite pattern, with significantly higher expression in CRC than in Normal samples. Figure 6 shows the CRC diagnostic model based on the FOS and CCND1 genes.
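For a three-class problem, a per-class AUC is often computed in a one-vs-rest fashion. The sketch below shows that computation through the rank (pairwise comparison) formulation; the text does not state how its AUC values were obtained, so the one-vs-rest scheme, the function name, and the toy scores are all assumptions.

```python
from typing import Sequence


def auc_one_vs_rest(labels: Sequence[str], scores: Sequence[float], positive: str) -> float:
    """AUC for one class against all others.

    scores[i] is the model's score for the `positive` class on sample i.
    The AUC equals the probability that a random positive sample scores
    higher than a random negative one, counting ties as one half.
    """
    pos = [s for lab, s in zip(labels, scores) if lab == positive]
    neg = [s for lab, s in zip(labels, scores) if lab != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy labels and CRC-class scores (invented for illustration).
toy_labels = ["CRC", "CRC", "Normal", "Mucosa"]
toy_scores = [0.9, 0.4, 0.5, 0.1]
toy_auc = auc_one_vs_rest(toy_labels, toy_scores, positive="CRC")
```

Repeating the call with `positive` set to each of the three classes gives one AUC per class, which is the form in which the results above are reported.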

V. CONCLUSION
This study used the GSE39582 and GSE41258 data sets from GEO to identify a group of genes differentially expressed between normal and CRC samples. A total of 270 DEGs were obtained from the two data sets from different sources, of which 90 were up-regulated and 180 were down-regulated in CRC samples.
The independent CRC data set (colon adenocarcinoma + rectal adenocarcinoma) from the TCGA database was used to verify the 270 DEGs. More than 90% of the genes showed differential expression between normal and tumor samples, and the up- and down-regulated expression patterns were also consistent. This shows that the DEGs filtered from the GEO datasets are reliable.
Functional annotation of the differentially expressed genes found that genes up-regulated in the tumor are mainly involved in biological processes related to the extracellular matrix and extracellular structure, while down-regulated genes are mainly related to ion detoxification and stress response. Pathway enrichment analysis shows that the pathways involving up-regulated genes are mainly related to tumor formation and development, while the pathways involving down-regulated genes are mainly related to fatty acid and sugar metabolism.
Analysis of the protein interaction network based on the DEGs identified two hub genes with a degree significantly higher than that of other genes: FOS, the hub gene of the down-regulated DEG network, and CCND1, the hub gene of the up-regulated DEG network. Module analysis of the interaction network divides the up- and down-regulated DEG networks into 3 and 5 sub-networks, respectively. The functions of the sub-clusters are significantly different.
Using the two hub genes, FOS and CCND1, we constructed a CRC diagnostic model based on the neural network algorithm. The accuracy of the model on the training set and the test set was 0.943 and 0.935, respectively, and the AUC reached above 0.95, reflecting the good performance of our model.
GEWEN HE is currently pursuing the Ph.D. degree in computer science with Florida State University. He is good at statistical analysis and chemical structures analysis.