Application of Deep Learning Algorithm in Feature Mining and Rapid Identification of Colorectal Image

Based on deep learning, this paper proposes a two-stage model for colorectal image feature mining and fast recognition that achieves fully automatic pathological discrimination of medical images. Drawing on multi-factor meta-regression analysis, which is widely used in the medical field, and on a model aggregation framework based on Bayesian prior probability theory, we construct a prognostic model of colorectal tumors suitable for a variety of situations and scenarios. Using a combination of public and real data sets, we design two sets of experiments to verify these models from different angles. The algorithm was used to select one, four, and five related features from three sequences to construct three sets of prediction models. The application of the six algorithms failed to obtain a better predictive model (AUC range 0.439 to 0.640). The algorithm (AUC 0.750 ± 0.137) and the algorithm (AUC 0.764 ± 0.128) can be used to obtain models with better predictive performance, and the four models are less effective (AUC < 0.7). In the joint model, the algorithm (AUC 0.742 ± 0.101) and the algorithm (AUC 0.718 ± 0.069) can also be used to obtain a model with better prediction performance. Image-based imaging histology tags can serve as a non-invasive auxiliary tool for the preoperative evaluation of the histological grade of colorectal adenocarcinoma (CRAC), and are expected to be applied in clinical practice to assist in the development of individualized treatment plans.


I. INTRODUCTION
Colorectal cancer is one of the most common malignant tumors of the digestive system, and its morbidity and mortality are increasing year by year. The survival time of patients with colorectal cancer is significantly related to the tumor stage [1]-[5]. The 5-year survival rate of patients with lesions confined to the primary site is about 90%, whereas the 5-year survival rate of those with local and distant metastases drops to 71% and 14%, respectively [6]. The liver is the most important metastatic organ for colorectal cancer: about 35% to 55% of patients with colorectal cancer develop liver metastases during the course of the disease [7]. Surgical resection is currently the most important treatment for liver metastasis of colorectal cancer, and it is also the main way for patients to achieve long-term survival. Whether or not liver metastases can be surgically removed affects the prognosis of patients: the median survival time with untreated liver metastases is only 6.9 months, and the 5-year survival rate of unresectable patients is almost 0, while the median survival of those whose liver metastases are completely resected is 35 months, with a 5-year survival rate of 30% to 50% [8]. Therefore, detecting metastases as early as possible, so as to raise the rate of radical surgical resection, is the key to improving the prognosis of colorectal cancer patients, and imaging diagnostic techniques and methods can promote the early detection of colorectal cancer and its liver metastases [9]. To this end, this article reviews the current status and application of traditional imaging methods [56]-[66] and emerging radiomics methods in the diagnosis and treatment of colorectal cancer liver metastases [10].
The associate editor coordinating the review of this manuscript and approving it for publication was Yizhang Jiang.
The following results have been reported on predicting the occurrence of liver metastasis from the liver parenchymal background [11]-[15]. Palangi et al. divided patients with colorectal cancer into three groups: no liver metastasis, synchronous liver metastasis, and metachronous liver metastasis within 18 months [16], and performed texture analysis of the disease-free liver parenchyma in the portal phase [17]. The entropy (filter values 1.5 and 2.5) of the synchronous liver metastasis group was significantly higher than that of the non-metastasis group (p = 0.02, p = 0.011), and the uniformity was significantly lower than that of the non-metastasis group (p = 0.04, p = 0.02), while the texture parameters of the metachronous liver metastasis group did not differ statistically from those of the other two groups. ROC curves built on the entropy and uniformity parameters gave an area under the curve (AUC) of 0.73 to 0.78 for the diagnosis of liver metastasis [18]. Mathews et al. carried out a multi-center retrospective analysis of 165 patients with colorectal cancer, divided into a non-metastasis group, a synchronous liver metastasis group, and a group with liver metastasis within 24 months [19], and assessed the value of whole-liver texture parameters (average gray intensity, entropy, and uniformity; filtration coefficient 0.5-2.5) in predicting liver metastasis [20]. Multivariate analysis showed that only uniformity (filter value 0.5) was an indicator for predicting early metastasis (OR = 0.56), and no texture parameter was found that could further predict mid-term (7-12 months) or late (13-24 months) metastasis [21]. In addition to studies of enhanced CT texture features, Acharya et al. analyzed liver CT texture features of patients with colorectal cancer, including average gray intensity, entropy, and uniformity [22]. The results showed that the entropy values of the liver metastasis group and the non-hepatic tumor group differ, and that their uniformity differs from that of the extrahepatic disease group [23]. Beyond the above texture analyses based on CT images, especially portal vein phase images, Wang et al. analyzed histogram parameters of the whole-liver MRI portal vein enhancement rate and found differences in these parameters between patients with and without short-term relapse after treatment [24]. This also confirmed the heterogeneity of the liver parenchyma background in patients with liver metastases [25].
Poon et al. used electronic medical record data on colonoscopy at Samsung Medical Center to estimate the risk of advanced colorectal tumors at screening more accurately. With age, gender, smoking duration [26], drinking frequency, and aspirin use as training features, logistic regression was used to develop a predictive model of colorectal tumor risk [27]. Because these patient characteristics can be obtained with questionnaires alone, the model can be used widely by people undergoing colonoscopy screening and can raise risk awareness among patients in the high-risk group [28]. Poon et al. used retrospective cohort data on breast cancer [29], colorectal cancer [30], lung cancer, lymphoma, and ovarian cancer from the EHR system of a cancer diagnosis and treatment institution as training data [31] to develop a model for predicting the risk of neutropenia in cancer patients, verified it with external data [32], and evaluated it in terms of discrimination and calibration to provide a basis for determining the patient's chemotherapy regimen [33]. Poon et al. aggregated data from EHR and medical insurance records into a cancer research network virtual data warehouse [34] and developed a scalable algorithm for predicting the presence and timing of breast tumor recurrence [35]; the algorithm was selected by maximizing the area under the ROC curve and minimizing the mean absolute error, and was verified against a third-party gold standard for recurrence [36]. Compared with previously published results, the mean absolute error was significantly reduced [37]. Because patient data from the same data center are highly homogeneous, no other special processing of the data is generally needed to train the model, so building a model from such similar data can be very convenient [38]. On the other hand, if such a model is used to predict data with a different distribution, its performance will fall short of expectations [39].
In this paper, two separate models are used for lesion segmentation and pathological diagnosis, instead of training a single model to complete both tasks simultaneously in a multi-output manner [40]. An integrated model is susceptible to the type and quality of the medical images, so its scope of application is relatively narrow and it is difficult to adapt to the diagnostic needs of multiple diseases. Moreover, segmentation and classification place different requirements on the extracted features: segmentation benefits from the guidance of local detail features, whereas classification prefers the guidance of abstract semantic features, so using the same feature extraction module may affect the performance of both. Because of the two-level task processing mechanism adopted in this paper, the deep lesion segmentation network and the deep pathological diagnosis network are trained in multiple stages: task-related losses are set for the two models and each is trained separately, yielding two high-precision, high-availability models. Considering that doctors pay more attention to the areas where abnormal tissue is located when reading medical images, we add an attention mechanism to the feature extraction process of both the segmentation model and the diagnostic model, so that each model raises the activation of key areas and pays more attention to the lesion itself and to the image information of the area adjacent to the lesion.
VOLUME 8, 2020
Figure 1 shows the overall structure of the two-stage medical image disease diagnosis model, which consists of the two sub-models mentioned above, namely the lesion segmentation model and the pathological diagnosis model [41].
In practical application, the segmentation mask output by the segmentation model is fed into the diagnosis model together with the original image as the basis for disease diagnosis, which reduces the overall misdiagnosis rate. The mask from the deep segmentation model takes part, together with the original medical image, in the multi-channel image construction step that preprocesses the input of the pathological diagnosis model, providing more pathological features for the final diagnosis [42]. In this way, the two models are unified into a whole, realizing fully automatic medical image disease diagnosis. It is worth noting that each sub-model can also be used independently in different tasks: the output of the segmentation model can be used directly by experts to help customize treatment or surgical plans, while the diagnostic model can be used alone to help expert physicians make more accurate judgments [43]. To evaluate the performance of the two-stage model effectively, this article uses multiple indicators to evaluate each sub-model. At the same time, considering the particular nature of medical image analysis, the segmentation quality and diagnostic capability of the model are also shown intuitively through visualization results. In addition, the two-stage model undergoes integrated disease diagnostic tests to illustrate its relatively high degree of automation. The aggregation model used here, which is based on the Bayesian hypothesis and a prior-knowledge calculation method, can be obtained by multi-factor meta-analysis [44]-[48].
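The multi-channel image construction step described above can be illustrated with a minimal sketch. The text does not specify the channel layout, so the three channels used here (normalized image, binary mask, and masked image) and the function name `build_multichannel_input` are assumptions made for illustration only.

```python
import numpy as np


def build_multichannel_input(image, mask):
    """Stack an image and its lesion mask into a (3, H, W) array.

    Hypothetical channel layout (not taken from the paper):
    channel 0 = normalized image, channel 1 = binary mask,
    channel 2 = image restricted to the masked lesion area.
    """
    image = np.asarray(image, dtype=np.float32)
    mask = (np.asarray(mask) > 0).astype(np.float32)
    if image.shape != mask.shape:
        raise ValueError("image and mask must have the same shape")
    # Min-max normalize intensities to [0, 1]; a constant image maps to zeros.
    span = image.max() - image.min()
    norm = (image - image.min()) / span if span > 0 else np.zeros_like(image)
    return np.stack([norm, mask, norm * mask], axis=0)
```

With this layout the diagnosis network would see both the full image context and the location of the predicted lesion; other layouts (for example, image plus mask only) would fit the description equally well.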

II. IDENTIFICATION AND ANALYSIS OF COLORECTAL IMAGE FEATURE MINING
A. AGGREGATION MODEL BASED ON BAYES HYPOTHESIS
The calculation of the above two formulas does not include models from interest groups [49]. We assume that the model from the interest group also follows a multivariate normal distribution, where the mean of the distribution is and the covariance matrix is Bpi [50]. Based on the above prior knowledge, the mean and covariance of interest groups are:
First, the models from different sources are validated in the interest group. It is assumed here that all of the models used are logistic regression models [51]. In this way, models that are defective in the interest group can be discarded during aggregation. In this step, strategies such as intercept update, model calibration, or model revision can be considered to improve model performance. The updated model is then applied to the sample to calculate and verify the predicted probabilities.
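Of the updating strategies mentioned above, the intercept update is the simplest to illustrate. The following is a minimal sketch, not the authors' procedure: it assumes a logistic regression model whose linear predictor is kept fixed, and it finds the intercept shift at which the mean predicted probability equals the observed event rate in the interest group (this is the maximum-likelihood solution for an intercept-only update). The function name and the bisection bounds are illustrative choices.

```python
import numpy as np


def update_intercept(linear_predictor, outcomes, tol=1e-10):
    """Return the shift to add to a logistic model's intercept so that the
    mean predicted probability matches the observed event rate."""
    lp = np.asarray(linear_predictor, dtype=np.float64)
    y = np.asarray(outcomes, dtype=np.float64)
    rate = y.mean()
    if not 0.0 < rate < 1.0:
        raise ValueError("outcomes must contain both events and non-events")
    lo, hi = -50.0, 50.0  # assumed search range for the shift
    # The mean predicted probability increases with the shift, so bisect.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        mean_prob = np.mean(1.0 / (1.0 + np.exp(-(lp + mid))))
        if mean_prob < rate:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For example, a model that predicts 0.5 for every patient in a group whose observed event rate is 0.25 receives a shift of log(0.25 / 0.75), which is about -1.10.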
The weighted average of the final aggregation model can be written as:
The evolution of the distance regularization level set curve is the process of continuously reducing ε in equation (6) until it eventually approaches zero [52]. This process is expressed by a differential equation, as shown in (7): Among them:
The C-V model is a region-based segmentation method [53]. For images with a large difference between the target region and the background, the C-V model achieves a good segmentation result. The principle of the C-V model is to simplify the Mumford-Shah (M-S) energy function into equation (9).
Suppose the original image is f(x, y), where (x, y) are the pixel coordinates. The mathematical expression of bilateral filtering is given by equation (10), whose right side is the weighted average of the pixel values in the neighborhood of the pixel. The weight w consists of two parts, as shown in equations (11) and (12): a spatial proximity factor and a brightness similarity factor. The spatial proximity factor decreases as the Euclidean distance between (i, j) and (x, y) increases, and the brightness similarity factor decreases as the difference between the brightness values of the two pixels increases. Where the image changes slowly, the pixel values do not differ greatly, and bilateral filtering behaves like Gaussian filtering; where the image changes sharply, bilateral filtering replaces the original brightness value with an average of the brightness values of nearby points of similar brightness near the edge [54]. The bilateral filter therefore not only smooths the original image but also better preserves its edge information. The bilateral filter is controlled by three parameters: the filter half-width N and two parameters δ. The greater N is, the stronger the smoothing effect; the two δ parameters respectively control the degree of attenuation of the spatial proximity factor and of the brightness similarity factor.
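The bilateral filter described above can be sketched as follows. This is a minimal, unoptimized illustration of the standard Gaussian form of the filter, which is assumed here because the original equations are not available; the parameter names `half_width`, `sigma_s`, and `sigma_r` are illustrative stand-ins for the half-width N and the two δ parameters.

```python
import numpy as np


def bilateral_filter(f, half_width=2, sigma_s=2.0, sigma_r=0.1):
    """Bilateral filter of a 2-D image f with a (2N+1) x (2N+1) window.

    sigma_s controls the decay of the spatial proximity factor and
    sigma_r the decay of the brightness similarity factor.
    """
    f = np.asarray(f, dtype=np.float64)
    n = half_width
    padded = np.pad(f, n, mode="reflect")
    # The spatial factor depends only on the pixel offset: compute it once.
    dy, dx = np.mgrid[-n:n + 1, -n:n + 1]
    w_s = np.exp(-(dx ** 2 + dy ** 2) / (2.0 * sigma_s ** 2))
    out = np.empty_like(f)
    rows, cols = f.shape
    for i in range(rows):
        for j in range(cols):
            patch = padded[i:i + 2 * n + 1, j:j + 2 * n + 1]
            # The brightness factor compares each neighbor with the center.
            w_r = np.exp(-((patch - f[i, j]) ** 2) / (2.0 * sigma_r ** 2))
            w = w_s * w_r
            out[i, j] = np.sum(w * patch) / np.sum(w)
    return out
```

In flat regions the brightness factor is close to 1 everywhere, so the result reduces to Gaussian smoothing; across a sharp edge the brightness factor suppresses pixels from the other side, which is why the edge is preserved.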

B. MULTITASK FEATURE MINING WITH AN ATTENTION MECHANISM
The fully automatic medical image disease diagnosis model in this paper consists of two sub-models: lesion segmentation and pathological diagnosis [55]. We first focus on the lesion segmentation sub-model, which provides the segmentation mask for the pathological diagnosis model. Like U-net, our deep learning lesion segmentation network uses a symmetric fully convolutional structure; that is, the number of convolutional layers in the feature extraction process is the same as the number of convolutional layers in the detail restoration process. Figure 2 shows the overall structure of the lesion segmentation sub-model. In addition to following the proven strategies of the symmetric FCN (i.e., phased upsampling, skip connections, and an integrated segmentation loss), our multi-task feature supplementary lesion segmentation network with attention mechanism (AMTFSLSN) incorporates a supervised feature-map region attention mechanism that improves the upsampling process and forces the predicted segmentation mask to focus on the key lesion area. It also integrates key-feature supplementation, through transfer learning from natural-object semantic segmentation, and weakly supervised feature filtering based on position detection, to compensate for insufficient training caused by the scarcity of medical images and to remove the interference of irrelevant segmentation patterns on the pixel-level prediction of lesions.
The input medical image is first preprocessed by image enhancement, which highlights the lesion information and reduces the interference of noise on the segmentation network. The segmentation network then extracts features at different levels from the preprocessed image and merges them with the filtered natural semantic features, while lesion-area focused attention is combined with the fused features to obtain a preliminary segmentation result. Finally, the preliminary segmentation result is post-processed according to the application scenario, and a refined segmentation mask is output.
To make the attention modules located deep in the network fully effective, AMTFSLSN uses a multi-level loss based on multi-scale segmentation masks to guide the training of the attention weights. The multi-level loss is added to the segmentation loss in the form of regularization terms and is optimized together with the conventional segmentation loss. In addition, because medical images tend to have low contrast, a large number of ordinary cells and tissues can easily be mistaken for lesions. Multi-size area detection with weakly supervised learning is therefore used to train the feature-filtering convolutional layers, which prevents the transferred weights, learned on a different task, from extracting too many extraneous features and reduces the false segmentation predictions caused by these features.
Because the symmetric FCN structure is generally deep, AMTFSLSN uses multi-scale supervised learning to guide the attention mapping at each scale and so avoid insufficient training of the shallow attention modules. Figure 3 shows the training principle of the multi-scale supervised attention module. AMTFSLSN uses an attention module at each scale level of the expansion path. The network weights involved in the attention modules are not updated using conventional gradient backpropagation; instead, each module is connected to the final output through a "step-by-step connection", so that the total loss updates the module parameters directly. To let the training of the segmentation network correct the attention weights at different scales, a series of ground-truth segmentation masks of different sizes are used, together with the attention weight matrix at each scale, to calculate the dice loss of each module. These dice losses represent how close the attention weight matrix is to the content that the real segmentation mask focuses on: the smaller the value, the closer the region importance reflected by the attention weight matrix is to the actual important position. By minimizing the multi-scale dice loss during network training, the area of interest of the attention matrix can be corrected effectively.
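The multi-scale supervision described above can be sketched in a few lines. This is an illustrative reading rather than the authors' implementation: it assumes a soft (differentiable) dice loss, ground-truth masks downsampled by average pooling followed by thresholding, and attention losses added to the segmentation loss as a weighted regularization term. The names `soft_dice_loss`, `downsample_mask`, `total_loss`, and `reg_weight` are hypothetical.

```python
import numpy as np


def soft_dice_loss(attention, mask, eps=1e-6):
    """1 - dice overlap between a weight map in [0, 1] and a binary mask."""
    attention = np.asarray(attention, dtype=np.float64)
    mask = np.asarray(mask, dtype=np.float64)
    intersection = np.sum(attention * mask)
    total = np.sum(attention) + np.sum(mask)
    return 1.0 - (2.0 * intersection + eps) / (total + eps)


def downsample_mask(mask, factor):
    """Shrink a binary mask by average pooling and thresholding at 0.5."""
    mask = np.asarray(mask, dtype=np.float64)
    h, w = mask.shape
    if h % factor or w % factor:
        raise ValueError("mask size must be divisible by factor")
    pooled = mask.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))
    return (pooled >= 0.5).astype(np.float64)


def total_loss(seg_pred, mask, attention_maps, reg_weight=1.0):
    """Segmentation dice loss plus per-scale attention dice losses."""
    seg_loss = soft_dice_loss(seg_pred, mask)
    att_loss = 0.0
    for att in attention_maps:
        factor = mask.shape[0] // att.shape[0]
        att_loss += soft_dice_loss(att, downsample_mask(mask, factor))
    return seg_loss + reg_weight * att_loss
```

In a real training loop these functions would be written with the tensor operations of the chosen framework so that gradients can flow; NumPy is used here only to make the arithmetic explicit.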
AMTFSLSN contains a total of 5 attention modules; the area term in the loss represents the number of pixels contained in an image region. The numerator of the main BDL term reflects the degree of overlap between the focus area of the true segmentation mask and that of the attention weight matrix, and the denominator reflects the total area occupied by the two attention areas. Their ratio expresses how close the pixel importance output by the attention module is to the actual pixel importance. In this form it cannot be used directly to train the segmentation network, because it is not differentiable (counting overlapping pixels and total pixels is a non-differentiable operation) and therefore cannot be propagated directly during network optimization. We find that lower-level attention weight matrices focus on more details and can even find more precise lesion edges, although these edges show more burrs; higher-level attention weight matrices localize the area where the lesion is located more reliably. For attention module 5 of sample 2, for example, two candidate lesion areas are located, but the main candidate area clearly receives more attention (it holds most of the larger attention weights). The reason for this phenomenon is that the high-level attention weight matrix uses image-level high-level semantic features (such as whether a region is a lesion, whether there are burrs on the edge of the lesion, or whether the lesion is benign or malignant), whose receptive fields are usually larger and therefore better suited to locating the lesion area, whereas the low-level attention weight matrix combines many unrefined details and therefore pays more attention to the local lesion edge and lesion shape. Figure 3 also shows that the multi-scale supervised attention correction mechanism is effective, because the attention weight matrix at every level shows no large deviation from the real lesion segmentation mask.

III. COLORECTAL IMAGING AND METHOD ANALYSIS
A. COLORECTAL IMAGE ACQUISITIONS
Preparation before the rectal MRI examination was as follows: a low-residue diet 1 day before the examination and fasting for 4-6 hours before the examination. Two hours before the examination, a glycerin enema was used to clean the intestine and reduce the impact of feces and gas in the intestinal cavity. A routine intramuscular injection of 20 mg was given 15 to 20 minutes before the examination to suppress intestinal peristalsis, except in patients with glaucoma, intestinal obstruction, benign prostatic hyperplasia, or severe heart disease. After the patient was positioned on the examination table, an appropriate amount of ultrasound coupling agent was injected through the anus to distend the rectum and make the lesion easier to display. 3.0 T MR scanners (Signa HDx, GE Healthcare; GE Discovery MR750, GE Healthcare, USA) with an 8-channel body phased-array coil were used for high-resolution rectal MR scanning for initial staging and feature extraction. The MRI scan sequences include (1) high-resolution oblique axial (perpendicular to the long axis of the tumor) T2-weighted images (T2WI), (2) sagittal and coronal T2WI, (3) axial T1-weighted images (T1WI), (4) axial T2WI with fat suppression, (5) axial diffusion-weighted imaging (DWI) (b = 0, 800 s/mm2), and (6) the axial three-dimensional liver acquisition with volume acceleration sequence (liver acquisition with volume acceleration-extended volume, LAVA). For the LAVA sequence, a gadolinium contrast agent (gadopentetate dimeglumine, Gd-DTPA, Bayer, Germany) was injected through the antecubital vein with a power injector at 0.1 mmol/kg body weight and a rate of 2.0 ml/s. A mask scan was performed before contrast injection, and a 9-phase enhancement scan was performed after contrast injection. The MR images obtained from sequences (1) and (6) are further used in this study to delineate the lesions, and the remaining sequences contribute to clinical staging. The scanning parameters are shown in Table 1.
The treatment methods for patients with rectal cancer in this study mainly include (1) total mesorectal excision (TME), (2) concurrent chemoradiotherapy (CRT), (3) neoadjuvant chemoradiotherapy (CRT) followed by TME surgery, and (4) adjuvant chemotherapy after TME. In accordance with the clinical guidelines for rectal cancer, patients were followed up through outpatient or inpatient medical records. The clinical end-point event was defined as liver metastasis found on abdominal imaging studies, with the liver as the initial metastatic organ, regardless of whether there were metastases elsewhere. The follow-up time is calculated from the first day after the start of the first treatment. Figure 4 shows the overall architecture of the diagnostic sub-model.
The data set used to train and test the lesion segmentation sub-model consists of three-dimensional CT scan sequences covering the entire colon and rectum, obtained by scanning cross-sections of the abdominal cavity with a 64-slice multi-row detector scanner. Preliminary enhanced CT sequences were collected from a total of 366 patients with colorectal adenocarcinoma (CRAC). The first examinations of these patients span the 10 years from 2009 to 2019. The CT sequences were taken about a month before surgery. The patients did undergo preoperative radiotherapy or chemotherapy before the scan. Each patient's CT sequence includes approximately 400-500 abdominal slices at different cross-sections. In each sequence, three consecutive slices containing the lesion were masked by the doctor using the CAD system; the masks indicate lesions located in specific parts of the rectum or colon. Each slice marked with a mask is regarded as one CT scan of the abdominal cavity, so our data set contains a total of 366 × 3 = 1098 CT images. Each image records information about the abdominal pelvis, abdominal tissue, colon, rectum, and other parts. We divide the data into a training set and a test set at a ratio of about 3:1, so that the training set contains 823 CT images and the test set contains 275 CT images. Note that the morphology of the same tumor lesion differs between cut planes, so in our experiment each CT image is regarded as an independent sample. No distinction was made between lesions with different degrees of differentiation.
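The data set arithmetic above can be reproduced with a short sketch. The shuffling procedure and the seed are assumptions (the text only gives the resulting counts); the split is done at the image level because the text treats each CT image as an independent sample.

```python
import random


def split_slices(n_patients=366, slices_per_patient=3, n_test=275, seed=0):
    """Return (train, test) lists of (patient_id, slice_id) pairs."""
    samples = [(p, s) for p in range(n_patients) for s in range(slices_per_patient)]
    rng = random.Random(seed)  # assumed seed, used only for reproducibility
    rng.shuffle(samples)
    return samples[n_test:], samples[:n_test]


train, test = split_slices()
print(len(train), len(test))  # 823 275
```

The resulting ratio is 823 / 275, or roughly 2.99, which matches the "about 3:1" stated in the text.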

B. DESCRIPTION OF EXPERIMENTAL EQUIPMENT AND EVALUATION STANDARDS
To keep the experiments fair, all of the algorithms and models involved in the comparison are run on the same hardware configuration. This article uses a 32-core Intel(R) Core(TM) i7-6850K CPU (main frequency 3.60 GHz) for conventional calculations and four GeForce GTX1080P GPUs for image processing and deep learning operations. The memory is 64 GB with a frequency of 1600 MHz. The operating system is Ubuntu 16.04.1. All model code is written in Python. The lesion segmentation network is built on the Keras deep learning platform with TensorFlow as the backend. The pathological diagnosis network is built on the PyTorch deep learning platform. Both store model parameters and structure in a common model format. During the experiments, each algorithm or model uses only one GPU.
For the segmentation prediction results, we use 5 commonly used segmentation evaluation indicators to conduct a comprehensive performance evaluation: pixel precision, pixel recall, the dice coefficient, the Hausdorff distance, and IoU. They are defined by formulas (13) to (18). In addition, to measure the stability of the pathological diagnosis results, the standard deviation (STD) is calculated when comparing with state-of-the-art methods, and a significance analysis is also carried out in the comparison between DLDPPF and these methods. Table 2 lists the AUC, ECI, and 95% confidence intervals for each model.
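Since the defining formulas are not reproduced here, the following sketch shows the standard definitions of the five segmentation indicators for binary masks. It is an illustration under the usual conventions, not necessarily the exact form of formulas (13) to (18); in particular, the Hausdorff distance is computed over all foreground pixels rather than over boundary pixels only.

```python
import numpy as np


def segmentation_metrics(pred, truth):
    """Pixel precision, recall, dice coefficient and IoU of two binary masks."""
    pred = np.asarray(pred).astype(bool)
    truth = np.asarray(truth).astype(bool)
    tp = np.sum(pred & truth)    # lesion pixels predicted as lesion
    fp = np.sum(pred & ~truth)   # normal pixels predicted as lesion
    fn = np.sum(~pred & truth)   # lesion pixels that were missed
    union = tp + fp + fn
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "dice": 2 * tp / (2 * tp + fp + fn) if union else 1.0,
        "iou": tp / union if union else 1.0,
    }


def hausdorff_distance(pred, truth):
    """Symmetric Hausdorff distance between the foreground pixel sets."""
    a = np.argwhere(np.asarray(pred).astype(bool)).astype(np.float64)
    b = np.argwhere(np.asarray(truth).astype(bool)).astype(np.float64)
    if len(a) == 0 or len(b) == 0:
        return float("inf")
    # Pairwise Euclidean distances between every pixel of a and of b.
    d = np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2))
    return float(max(d.min(axis=1).max(), d.min(axis=0).max()))
```

The pairwise distance matrix makes the Hausdorff computation quadratic in the number of foreground pixels, which is acceptable for single lesion masks but would need a faster method for large regions.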
Finally, the area under the receiver operating characteristic (ROC) curve, the AUC, is recorded to evaluate the generalization ability of the methods under comparison. Training the pathological diagnosis model requires the segmentation mask of the lesion as an input. To keep segmentation errors from affecting the training of the diagnostic model, the segmentation masks used for multi-channel image construction during training are the lesion markers obtained by experts through "Double Reading". During testing, the pathological diagnosis model uses the segmentation model trained by enhanced sequence breast cancer MRI to provide the segmentation mask input. In other words, the pathological diagnosis model is tested within the two-stage medical image disease diagnosis framework proposed in this paper. Figure 5 shows the effects of using each strategy in AMTFSLSN individually or in combination. The optimal value under each test indicator is marked in bold. Baseline refers to the basic network structure (backbone) of AMTFSLSN, here a fully symmetric FCN similar to U-net, which does not contain any of the strategies proposed in Chapter 3. Figure 5 shows that the full version of AMTFSLSN has the best overall performance: on three of the five indicators it is far superior to the other five strategy combinations, especially on the two more difficult region-overlap indicators, the dice coefficient and IoU, where it achieves 0.72 and 0.58. This is mainly because, once supplemented with the filtered features, AMTFSLSN can accurately restore the detailed information of the lesion, such as the burrs at the edge of the lesion and the connectivity of the lesion tissue block. At the same time, the supervised attention mechanism corrects the location of the lesion at different scales, so that the full version of the model achieves high lesion coverage even for colorectal cancer, where tumor location and shape are variable.
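The AUC recorded for the comparison methods can be computed directly from predicted scores and labels. The sketch below uses the rank-based (Mann-Whitney) formulation, which equals the area under the ROC curve; it is a generic illustration, and the function name `roc_auc` is not taken from the paper's code.

```python
import numpy as np


def roc_auc(scores, labels):
    """Area under the ROC curve: the probability that a random positive
    sample scores higher than a random negative one (ties count as 0.5)."""
    scores = np.asarray(scores, dtype=np.float64)
    labels = np.asarray(labels).astype(bool)
    pos, neg = scores[labels], scores[~labels]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("both classes must be present")
    diff = pos[:, None] - neg[None, :]
    wins = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
    return float(wins / (len(pos) * len(neg)))


print(roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

An AUC of 0.5 corresponds to random ranking, which is a useful reference point when reading values such as the 0.439 to 0.640 range reported in the abstract.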

IV. RESULTS ANALYSIS
A. COMPARATIVE ANALYSIS OF AMTFSLSN STRATEGY
The "Baseline + Expansion Path Attention Module" strategy combination also achieves good results on the dice coefficient and IoU. That a multi-scale attention module alone can reach such high segmentation accuracy is enough to demonstrate its position correction effect. This strategy combination also beats the full version of AMTFSLSN on the Hausdorff distance, which may be due to the more coherent lesion blocks it predicts; AMTFSLSN additionally uses many supplementary semantic segmentation features, so it is more stringent in its pixel predictions. Another strategy combination that beats AMTFSLSN is "Baseline + Feature Supplement", which is slightly better than AMTFSLSN in precision. This may be because the supplementary features it obtains have not passed through the filter layer, so slightly more pixels are judged to be lesion pixels and potential lesion pixels are easier to find; however, this also affects the recall and increases the misjudgment of pixels in normal tissue. A similar situation appears in the "Baseline + Shrink Path Pre-training" strategy combination, because the convolution module of the shrinking path is pre-trained on rich everyday semantic segmentation data, which also increases the richness of the extracted features to some extent. After the weakly supervised learning filter layer is added, the strategy combination alleviates this situation and improves the recall. Interestingly, although the full version of AMTFSLSN is not optimal on two indicators, it still achieves considerable results on them, only slightly worse than the winning strategy combination. Figure 6 shows the changes in the training set loss and test set loss of each strategy combination during training (we used the dice loss in the experiment).
During training, the training set loss changes smoothly, while the test set loss fluctuates greatly in the early stage. This is in line with the usual behavior of deep learning models: the segmentation network is optimized on the feature information provided by the training set, whereas the test set is new data, so early in training, before the model has learned the underlying patterns, the test set loss fluctuates greatly. Another clear phenomenon is that the strategy combinations that use more everyday semantic segmentation data to assist training converge faster (for example, "Baseline + feature supplement" and "Baseline + contraction path pre-training", green line) and stabilize earlier; as can be seen from Figure 6 (b), their training loss at stabilization is also lower. However, such strategy combinations make the model more likely to fall into a local optimum, because according to Figure 6 (b) the strategy combination that ultimately reaches the lowest test set loss is the full version of AMTFSLSN (black line in the figure). The convergence behavior of the remaining strategy combinations is almost the same as that of AMTFSLSN, but none of them reaches the segmentation accuracy of AMTFSLSN. Because the attention modules at the various scales in AMTFSLSN use a "stride connection" to connect directly to the total loss for supervised training, each attention module has its own loss. Figure 7 is used to observe the impact of the loss of each level of attention module on the total loss. Note that Figure 7 uses a dual Y-axis format. Only the black "breakpoint line" in the figure represents the change in the total loss; the remaining colored curves represent the loss changes of the attention modules at different scales.
Only the strategy combinations that use the attention module with the position correction mechanism are discussed here (that is, the full version of AMTFSLSN and ''Baseline + Expansion Path Attention Module''). Although the five attention modules are located at different scales of the segmentation model, their loss trends are very close, and they also dominate the trend of the total loss: because the losses of all attention modules are used as regularization terms to constrain the optimization of the total loss, they naturally steer the direction of optimization. As before, the loss curve of the training set is smoother than that of the test set. During training, the test set loss even experienced large oscillations, such as epochs 10 ∼ 12 in Figure 7 and 6 ∼ in Figure 7. This suggests that the optimizer is escaping a local optimum and moving toward a better solution. Looking at Figure 7, we can also find that the attention modules at the middle levels are optimized better (yellow, blue and green lines in the figure), which may be because the feature maps output by the middle-scale network modules better balance pixel-level features and image-level features and are not overly affected by either kind of feature. For the test set, however, the optimization of the middle-level attention modules also produces greater fluctuations (such as the loss curve of attention module 2 in Figure 7).
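The way per-module attention losses act as regularization terms on the total loss can be sketched as a weighted sum. This is a hedged illustration only: the text does not give the weighting scheme, so the function name `total_loss`, the `weights` parameter, and the default of equal weights are assumptions.

```python
def total_loss(main_loss, attention_losses, weights=None):
    """Combine the main segmentation loss with per-scale attention losses.

    `attention_losses` holds one loss value per attention module (five in
    the paper). `weights` is a hypothetical per-module coefficient list;
    equal weights of 1.0 are assumed when none are given.
    """
    if weights is None:
        weights = [1.0] * len(attention_losses)
    if len(weights) != len(attention_losses):
        raise ValueError("one weight is required per attention loss")
    regularizer = sum(w * l for w, l in zip(weights, attention_losses))
    return main_loss + regularizer
```

Because every attention loss enters the sum directly, a large oscillation in any single module shows up in the total loss, which matches the closely tracking curves described for Figure 7.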

B. LAYERED ANALYSIS OF THE EFFICACY OF COLORECTAL IMAGING LABELING
As shown in Figure 8, stratified analysis by rectal adenocarcinoma and colon adenocarcinoma further verified the effectiveness of the imaging histology label in evaluating the histological grade of adenocarcinoma before surgery. The AUCs of the rectal adenocarcinoma group and the colon adenocarcinoma group were 0.895 (95% CI: 0.838-0.952) and 0.725 (95% CI: 0.653-0.797), respectively. When the cut-off value of the Rad-score is again taken as -0.284, the SEN, SPE, PPV, NPV, and ACC of the imaging histology label for preoperative evaluation of adenocarcinoma histological grade are 0.789, 0.821, 0.600, 0.920, and 0.813, respectively, in rectal adenocarcinoma, and 0.500, 0.871, 0.712, 0.732, and 0.727, respectively, in colon adenocarcinoma. The calibration curves of the rectal adenocarcinoma group and the colon adenocarcinoma group showed that the imaging histology label was well calibrated, and the P values of the H-L test were 0.119 and 0.752, respectively.
Five-fold cross-validation and two machine learning algorithms (SVM and LR) were used to build four sets of models: the T2WI model (constructed from 5 optimal T2WI features), the VP model (constructed from 8 optimal venous phase features), the T2WI + VP model (constructed from the total of 13 features selected from the two sequences), and the T2WI / VP model (constructed from the 22 optimal features selected from the 2058 features combined from the two sequences). The ROC curves and average predicted values of the four sets of models under five-fold cross-validation are shown in Figure 9 and Figure 10. Both LR and SVM can be used to build predictive models, and choosing an appropriate machine learning algorithm may help to improve the stability and predictive performance of the model. Some studies have reported the advantages of the LR and SVM algorithms in tumor research.
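The five diagnostic indicators reported for the stratified analysis (SEN, SPE, PPV, NPV, ACC) all follow from the confusion matrix at a chosen Rad-score cut-off. The sketch below shows that calculation; it assumes that a score at or above the cut-off is called positive, which the text does not state, and the function name and toy data are hypothetical.

```python
def diagnostic_metrics(scores, labels, cutoff):
    """Compute SEN, SPE, PPV, NPV and ACC at a given score cut-off.

    `labels` are 1 for the positive class and 0 for the negative class.
    Assumption: a score >= cutoff is predicted positive.
    """
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= cutoff
        if predicted_positive and label:
            tp += 1
        elif predicted_positive and not label:
            fp += 1
        elif not predicted_positive and label:
            fn += 1
        else:
            tn += 1

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "SEN": ratio(tp, tp + fn),            # sensitivity (recall)
        "SPE": ratio(tn, tn + fp),            # specificity
        "PPV": ratio(tp, tp + fp),            # positive predictive value
        "NPV": ratio(tn, tn + fn),            # negative predictive value
        "ACC": ratio(tp + tn, tp + fp + tn + fn),
    }


# Toy example with made-up scores, evaluated at the cut-off from the text.
example = diagnostic_metrics(
    scores=[-1.0, -0.5, -0.2, 0.1, 0.4, 0.9],
    labels=[0, 0, 1, 0, 1, 1],
    cutoff=-0.284,
)
```

The toy scores are illustrative only and are not taken from the study data.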
Some studies suggest that the SVM algorithm is slightly better than the LR algorithm. In our study, the LR algorithm is superior to the SVM algorithm for the VP model, while there is no obvious difference between the two algorithms in the other three models. It is generally believed that the LR and SVM algorithms are well suited to model construction with small sample sizes and binary variables, but for contrast-enhanced MRI sequences, our small-sample results favor the LR algorithm. Besides, in terms of the algorithm used, our prediction model is also stable. In this study, five-fold cross-validation, which is commonly used in machine learning research, is used to build the prediction models in order to avoid selection bias as much as possible; this method is also well suited to building models from small samples. In addition, we conducted 100 rounds of cross-validation to test the stability and repeatability of the results of a single round. The results show that the single round of cross-validation is reliable and representative, and that the results were not obtained by chance.
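The repeated five-fold cross-validation described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `k_fold_indices` and `repeated_cv`, the seeding scheme, and the `evaluate` callback (which would fit an LR or SVM model on the training indices and return, for example, the AUC on the held-out fold) are all assumptions.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) for one shuffled k-fold split."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Every k-th shuffled index goes to the same fold, so fold sizes
    # differ by at most one sample.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def repeated_cv(n_samples, evaluate, k=5, rounds=100):
    """Repeat k-fold cross-validation and summarize the per-round scores.

    `evaluate(train, test)` is a hypothetical callback returning one score
    per fold. Returns the mean and standard deviation of the round means.
    """
    round_means = []
    for r in range(rounds):
        scores = [evaluate(train, test)
                  for train, test in k_fold_indices(n_samples, k, seed=r)]
        round_means.append(sum(scores) / len(scores))
    mean = sum(round_means) / len(round_means)
    variance = sum((m - mean) ** 2 for m in round_means) / len(round_means)
    return mean, variance ** 0.5
```

A small spread across the 100 round means is what supports the claim that a single round is representative.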
The T2WI and VP sequences were screened using the LASSO algorithm to obtain 5 and 8 optimal radiomics features, respectively. This is consistent with previous studies, in which enhancement sequences yielded more usable related features. A possible reason is that enhanced-sequence images contain more blood supply information and better reflect tumor heterogeneity.
The T2WI / VP model, constructed from 22 features selected from the 2058 features combined from the T2WI and VP sequences, is superior to the T2WI + VP model, constructed by combining the 13 radiomics features selected separately from the two sequences, which may be related to the interaction between the features of different sequences. Combining multiple MRI sequences may help to extract more valuable features and thus to build a more stable and effective model. However, using too many sequences will hinder clinical application, because image segmentation is time-consuming and labor-intensive. Therefore, it is more important to choose the most valuable sequences.
In this study, the T2WI / VP model constructed by the LR algorithm is superior to the other three groups in predicting MLM. Therefore, for patients with rectal cancer, the multisequence MR imaging histology prediction model is helpful for early prediction and follow-up examination. Early detection of metastases may change treatment strategies, and more patients at high risk of MLM may have the opportunity to receive individualized treatment to improve prognosis. Machine learning-based radiomics models may suggest the presence of occult metastatic lesions, which are difficult to find with existing imaging examination methods. This retrospective study included few clinical features, and these showed no significant differences between groups, so they were not used to construct a predictive model. Future prospective studies that include more relevant clinical parameters may improve the predictive power of the model and thus increase its clinical value.

C. EVALUATION EFFECT ANALYSIS
VGG16, with 13 convolutional layers and 3 fully connected layers and pre-trained on ImageNet, is used to initialize the feature extraction network. All new layers are randomly initialized by drawing weights from a zero-mean Gaussian distribution with a standard deviation of 0.01. Training proceeds in two stages, each of which includes 80,000 iterations of RPN candidate region training (learning rate 0.001 for the first 60,000 iterations and 0.0001 for the subsequent 20,000) and 40,000 iterations of candidate region-based feature vector classification and regression (learning rate 0.001 for the first 30,000 iterations and 0.0001 for the last 10,000). The momentum is 0.9 and the weight decay is 0.0005. The anchor scales of the region proposal network are set to 128², 256², and 512², and the anchor aspect ratios are set to 0.5, 1, and 2. During training, the error between the predicted value and the true value is calculated, and end-to-end error back-propagation with SGD (Stochastic Gradient Descent) is used to adjust the weights and other deep learning network parameters, iterating to reduce the value of the loss function until the network converges. The loss curve of the training process is shown in Figure 11.
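The anchor configuration and the two-step learning rate schedule above can be made concrete with a short sketch. It is illustrative only: the function names are hypothetical, and the code assumes that each scale is the square root of the anchor area and that the aspect ratio is height divided by width, neither of which is stated in the text.

```python
import math


def anchor_shapes(scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return (width, height) for every scale and aspect-ratio pair.

    Assumptions: an anchor of scale s has area s * s, and ratio = h / w.
    Three scales and three ratios give nine anchor shapes per location.
    """
    shapes = []
    for s in scales:
        for r in ratios:
            width = s / math.sqrt(r)
            height = s * math.sqrt(r)
            shapes.append((width, height))
    return shapes


def learning_rate(iteration, switch_at, base_lr=0.001, low_lr=0.0001):
    """Step schedule: `base_lr` before `switch_at`, `low_lr` afterwards.

    With the values in the text, `switch_at` would be 60,000 for RPN
    training and 30,000 for classification and regression training.
    """
    return base_lr if iteration < switch_at else low_lr
```

For example, `learning_rate(59_999, 60_000)` gives the higher rate and `learning_rate(60_000, 60_000)` gives the lower one, while every shape returned by `anchor_shapes()` keeps the area of its scale.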
The validation group included 6030 images from 100 patients. Manual diagnosis of a single case takes about 600 s, and a total of 912 images were diagnosed with lymph node metastasis. The FRCNN automatic lymph node detection platform took 1071.81 s in total to diagnose the target images, that is, about 0.18 s per image and 10.72 s per case on average, and diagnosed a total of 987 images with lymph node metastasis. A total of 772 images had the same diagnosis results (that is, the same location of lymph nodes and the same number of metastases). First, to fully reflect the training effect during training, we recorded the precision and recall of the nodule categories in the training set and validation set and plotted the data as a PR curve, as shown in Figure 12. The area under the curve is 0.3949.
VOLUME 8, 2020
FIGURE 11. Loss curve of the training process of colorectal lymph node assisted diagnosis system based on deep neural network.
Although the two sub-models of the two-stage medical image diagnostic method have achieved good results on some difficult data sets, they still use a non-heuristic optimizer during training, so their final stable results are not necessarily optimal, and performance may decline when the data set is replaced. Here, we consider using heuristic multimodal optimization strategies to replace the optimization algorithm used in training, so that the final optimization results can be applied more widely. Our laboratory has achieved some results on these strategies, which can be adapted appropriately. Also, because both the lesion segmentation sub-model and the pathological diagnosis sub-model adopt a multi-channel CNN architecture, operations can be performed in parallel. Therefore, we will also consider using the laboratory's achievements in distributed deep learning to improve the specific implementation of the model. The specific method is as follows: during training, several batches of image data are input into several different graphics processing units (GPUs), each of which completes one forward propagation and one backpropagation at the same time. Finally, the gradients obtained from all of these operations are used to update the model weights synchronously. The lesion segmentation model and the pathological diagnosis model use different computing resources and labels to complete their training, respectively. During inference, the two-stage parallel architecture is used to reduce the calculation time: the main and auxiliary networks of the segmentation model and the ''static'' and ''dynamic'' two-way networks of the diagnostic model are run simultaneously on different GPUs, and the calculation modules of the two sub-models at different scales also complete their calculations in parallel.
Of course, the specific implementation of parallelization should also consider the degree of support of the underlying deep learning framework, such as the need to set a reasonable number of convolution kernels and parallel CUDA programming to achieve device control.
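The synchronous update outlined above, in which each device processes its own batch and the resulting gradients are combined before one shared weight update, can be simulated in a few lines. This is a device-free sketch under stated assumptions: gradients are averaged rather than summed, plain SGD without momentum is used, and the function names are hypothetical. A real implementation would rely on the collective operations of the chosen deep learning framework.

```python
def average_gradients(per_device_grads):
    """Average gradients element-wise across devices.

    `per_device_grads` is a list with one gradient vector per device,
    each computed from that device's own batch (assumed equal length).
    """
    n_devices = len(per_device_grads)
    return [sum(component) / n_devices for component in zip(*per_device_grads)]


def sync_sgd_step(weights, per_device_grads, lr=0.001):
    """Apply one synchronous SGD step using the averaged gradient.

    Every device would receive the same updated weights afterwards,
    which keeps the model replicas identical.
    """
    avg = average_gradients(per_device_grads)
    return [w - lr * g for w, g in zip(weights, avg)]


# Two simulated devices, each contributing a gradient for two weights.
updated = sync_sgd_step([1.0, 1.0], [[1.0, 2.0], [3.0, 4.0]], lr=0.1)
```

Averaging keeps the effective step size independent of the number of devices, which is one common design choice; summing the gradients and rescaling the learning rate would be an equivalent alternative.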
At the same time, to reflect the regression and classification results during testing more intuitively and comprehensively, we counted the true positives and false positives among all nodule regions marked in the test set, calculated the true positive rate (TPR) and false-positive rate (FPR) under different probability thresholds, and plotted them as the ROC curve shown in Figure 13. The area under the ROC curve (AUC) is 0.8862, which comprehensively reflects the performance of the trained model on the test data set.
To address the problem that pathological diagnosis is prone to insufficient medical features, a two-way CNN feature extractor with a ''static-dynamic'' structure is used in combination with a feature redundancy penalty loss to increase the richness of the extracted features. The ''static'' CNN is mainly used to extract general multi-scale medical features, while the ''dynamic'' CNN uses context-aware mechanisms and channel attention units to extract pathological semantic features that are more relevant to medical scenes. To properly retain the required features at different scales, the pathological diagnosis sub-model also uses a multi-level feature filter to achieve controllable retention of the features at each level. In addition, given the lack of training data and to reduce the computational overhead of training, the ''static'' CNN is trained with transfer learning, which gives it a more general feature extraction capability.
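The ROC construction described earlier in this subsection (true positive rate against false-positive rate over a sweep of probability thresholds, followed by the area under the curve) can be sketched as below. The function names and the toy scores are hypothetical, and the trapezoidal rule is one common way to compute the area rather than a method confirmed by the text.

```python
def roc_points(scores, labels):
    """Return ROC points (FPR, TPR) obtained by lowering the threshold.

    `labels` are 1 for true nodules and 0 for false detections. Tied
    scores are processed together so they produce a single point.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes are required to draw a ROC curve")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under a piecewise-linear curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy detections: three true nodules and three false detections.
toy_auc = auc(roc_points([0.9, 0.8, 0.7, 0.6, 0.55, 0.4], [1, 1, 0, 1, 0, 0]))
```

On the toy data, eight of the nine positive-negative pairs are ranked correctly, so the area is 8/9; the data are made up for illustration and unrelated to the reported value of 0.8862.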
The colorectal high-resolution MRI automatic lymph node recognition system based on the deep neural network has high accuracy and high efficiency and has clinical significance in assisting diagnosis. The two-stage medical image disease diagnosis method proposed in this paper has achieved excellent performance on the colorectal cancer CT dataset. To address the problem that lesion pixels are easily misjudged during segmentation, a multi-scale attention mechanism for position correction and a feature supplement mechanism with regional feature filtering are proposed. During training, the attention module learns the pattern of lesion position correction from lesion masks of different sizes and influences the segmentation of the actual lesion in the form of an attention weight matrix. The feature filter layer learns the corresponding filter activation through multi-stage weakly supervised training that uses only the coordinates of the lesion area, so as to preserve the details of important lesions. Also, considering the serious shortage of training data, the auxiliary network of the lesion segmentation sub-model is trained with multi-stage transfer learning, ensuring that it can rely on rich feature extraction modes to provide sufficient supplementary features. In both lesion segmentation and pathological diagnosis, the method outperforms the cutting-edge methods in the corresponding field in terms of overall performance. These experimental results also show that the method in this paper has a wide range of applications and contributes to fully automatic intelligent disease diagnosis.

V. CONCLUSION
In this paper, the pre-treatment rectal MR radiomics feature model built with the deep learning algorithm for colorectal image feature mining and rapid identification shows high performance in predicting 24-month MLM, and the model built with the LR algorithm performs best. For the VP model, the LR algorithm is superior to the SVM algorithm; there are no significant differences between the two algorithms in the other three sets of models. The lesion segmentation network integrates a supervised attention mechanism for correcting the location of the lesion and multi-scale feature supplements that include regional feature filtering to mitigate the misjudgment of pixels in the mass. The pathological diagnosis network uses a dual-channel convolutional neural network feature extractor with feature redundancy control and multi-layer feature filtering based on maximum correlation to mine more diverse and useful pathological features and improve diagnostic accuracy. The lesions obtained from the segmentation sub-network are passed to the pathological diagnosis network to assist in the final diagnosis of the disease. Also, the two models reduce training overhead through transfer learning and use a multi-channel network structure to improve the richness of the extracted features, which also provides the basis for parallelizing the model. Our work uses two challenging medical image datasets to test the two sub-networks separately. One of these datasets is a colorectal cancer computed tomography (CT) dataset. The experimental results show that both the segmentation model and the diagnostic model achieve excellent results and are significantly better than the cutting-edge methods. They also verify the effectiveness and reliability of the two-stage medical image disease diagnosis model proposed in this paper.
JIAN HONG is currently pursuing the master's degree. He is also a Senior Engineer with the Information Center, The First Affiliated Hospital of Anhui Medical University, China. His research interests include medical informatics, hospital informatization, and data mining.