Chinese Named Entity Recognition of Epidemiological Investigation of Information on COVID-19 Based on BERT

Named entity recognition based on the epidemiological investigation information on COVID-19 can help analyze the source and transmission routes of the epidemic and thus better control its spread. Therefore, this paper proposes a Chinese named entity recognition model, BERT-BiLSTM-IDCNN-ELU-CRF (BBIEC), for the epidemiological investigation information on COVID-19 based on the BERT pre-training model. The model first processes the unlabeled epidemiological investigation information into a character-level corpus and manually annotates the entities according to the BIOES character-level labeling scheme; it then uses the BERT pre-training model to obtain character vectors with position information. Next, a bidirectional long short-term memory network (BiLSTM) and an improved iterated dilated convolutional neural network (IDCNN) extract global contextual and local features, respectively, from the generated vectors and fuse them; all possible label sequences are scored by a conditional random field (CRF); finally, the CRF decodes and generates the entity tag sequence. The experimental results show that the model outperforms other traditional models in recognizing entities in the epidemiological investigation information on COVID-19.


I. INTRODUCTION
Since the outbreak in late 2019, COVID-19 has spread rapidly around the world and has become a global threat [1]. The epidemiological investigation information on COVID-19 released by the national and provincial health commissions contains vital information, such as the locations along a patient's route, the times of movement, and the means of transportation, that plays an essential role in controlling the current outbreak and preventing future outbreaks in China. Therefore, how to enable computers to find this critical information in the epidemiological investigation information quickly and accurately has become an urgent problem.
The core task of named entity recognition (NER) is to extract entities from natural language text [2]. NER is not only a core task of information extraction [3], but also essential for other natural language processing (NLP) tasks [4], such as machine translation [5], text understanding [6], and knowledge graph construction [7]. Therefore, building a named entity recognition model for the epidemiological investigation information on COVID-19 can lay the foundation for subsequent entity relationship extraction. (The associate editor coordinating the review of this manuscript and approving it for publication was Sotirios Goudos.)
Most Chinese named entity recognition focuses on three entity types: person name, place name, and organization name. The epidemiological investigation texts contain these three types as well as more specialized entities, such as the patient's means of transportation, vehicle number, and body temperature, which also need to be recognized as entities.
To solve the above problems, this paper collects epidemiological investigation information on COVID-19, constructs a dataset from it, and proposes the BBIEC model. The experiments show that the BBIEC model captures entity boundary information better than traditional named entity models. Compared with a single neural network, the dual-channel combination of BiLSTM and IDCNN better captures global contextual and local features and improves the named entity recognition metrics. The BBIEC model proposed in this paper can also provide ideas, references, and solutions for subsequent research in other COVID-19-related fields.
Therefore, we developed a NER model called BBIEC based on the characteristics of the epidemiological investigation of information on COVID-19. The main contributions of this work are summarized as follows.
(1) Because the features of text entities in the epidemiological investigation information on COVID-19 differ from those in general named entity recognition, a BERT pre-training model is used to dynamically generate character vectors for the COVID-19 text based on the input context, which is more suitable for prediction on the COVID-19 corpus.
(2) Because the texts in the epidemiological investigation information on COVID-19 are too long for features to be extracted effectively, the text information is captured from both global and local aspects by the BiLSTM and IDCNN models, and the recognition effects of serial and parallel fusion of the two kinds of features are compared to select the fusion method with the better recognition effect.
(3) To address the high text similarity in the epidemiological investigation information on COVID-19 and the neuron-death ("dying ReLU") problem of the ReLU activation function, we replace ReLU with the ELU activation function to activate some dormant neurons, which solves both problems and further improves the feature extraction ability of the model.
The experimental results on the COVID-19 corpus show that the recall and F1 score of the model surpass those of most models, reaching 0.9561 and 0.9521, respectively.
The rest of this paper is structured as follows. We review the development of NER models in Section 2. We then develop the BBIEC model in Section 3 and conduct comparative experiments on named entity recognition in Section 4. Finally, concluding remarks are presented in Section 5.

II. RELATED WORKS
There are three main approaches for Chinese named entity recognition: rule-based approach, statistical machine learning-based approach, and deep learning-based approach.

A. RULE-BASED APPROACH
Rule-based methods construct rules or dictionaries by hand, with rule or dictionary matching and string matching as the primary means. This approach demands highly skilled personnel to construct the rules or dictionaries; it is not only time-consuming and laborious but also prone to errors from subjective factors, requires different rules or dictionaries for different domains, and therefore has poor portability.

B. STATISTICAL MACHINE LEARNING-BASED APPROACH
The two main approaches based on statistical machine learning are the classification model approach and the sequence model approach. For example, Ju et al. [8] used a Support Vector Machine (SVM) to implement named entity recognition for biomedical texts; Ekbal et al. [9] used the maximum entropy (ME) framework to construct many classifiers based on different representations of a set of features, applied to named entity recognition in the biomedical field; Niu et al. [10] compared HMM and ME models in the same environment, analyzed the characteristics of the two models and their applications in named entity recognition, and pointed out their respective advantages and disadvantages; Hu et al. [11] investigated CRF-based recognition of Chinese named entities, implementing recognition of three main entity types (person, location, and organization) at the word level and character level, respectively, and experimentally compared the performance of the two levels of models.

C. DEEP LEARNING-BASED APPROACH
In recent years, with the continuous development of hardware, deep learning methods with high computational requirements have become mainstream. In NER tasks, deep learning-based methods outperform rule-based, dictionary-based, and statistical machine learning-based methods: they require no hand-crafted features, and neural networks can learn features automatically from the dataset, so various neural network-based models have been applied to NER. For example, Huang et al. [12] used a BiLSTM-CRF model, combining a bidirectional long short-term memory network with a CRF layer, for the named entity recognition task. Strubell et al. [13] proposed IDCNN, a CNN with better large-context modeling and structured prediction than traditional CNNs, as a faster NER alternative to BiLSTM; Yang et al. [14] proposed an improved Transformer-BiLSTM-CRF model applied to the text domain of substation knowledge to recognize substation entities more effectively; An et al. [15] proposed a bidirectional long short-term memory conditional random field model based on multi-head self-attention (MUSA-BiLSTM-CRF), which captures the weighting relationships between Chinese characters and multi-level semantic feature information more effectively by introducing multi-head self-attention and incorporating a medical lexicon; Jiang et al. [16] proposed a BiLSTM-IDCNN-CRF model based on word embeddings, combining the BiLSTM and IDCNN networks to obtain features of different granularities.
The research focus of named entity recognition differs considerably between Chinese and English. Chinese has no spaces to segment words, so the choice between character and word models has a large effect on entity recognition in Chinese contexts. Li et al. [17] concluded, by comparing the effects of words and characters on language modeling, machine translation, sentence matching, and text classification models, that character-based models consistently outperform word-based models. Liu et al. [18] concluded that character vector-based entity recognition is more accurate by comparing character vector-based and word vector-based entity recognition. However, static word vectors cannot solve the problem of polysemy. Therefore, Devlin et al. [19] at Google proposed the BERT pre-trained language model, a deep learning model based on the Transformer bidirectional encoder, which pre-trains a bidirectional language model on a large text corpus to capture bidirectional relationships in utterances and generate contextually relevant word vectors, effectively solving the above problem. Models combined with BERT have achieved excellent results in named entity recognition across domains. For example, Li et al. [20] proposed an identification method combining the BERT model with BiLSTM-CRF, applied to a study of thyroid-related literature abstracts, obtaining a high identification rate; Tang et al. [21] proposed a BERT-LCRF model for named entity recognition, using the pre-trained BERT model for feature extraction of clock-domain text and then a linear-chain conditional random field (Linear-CRF) for the NER task; Gao et al. [22] added an attention mechanism to the BERT-BiLSTM-CRF model to increase its local extraction capability; Gan et al. [23] used a Chinese named entity recognition method based on the BERT-Transformer-BiLSTM-CRF model to address the large number of pronouns and polysemous words; Li et al. [24] proposed a BERT-IDCNN-CRF model to address the problems of too many BERT training parameters and overly long training time, which is faster and more responsive; Wu et al. [25] proposed an improved NER model that uses BERT as a pre-training layer and a BiLSTM network as a coding layer, exploiting the feature extraction capability of CNNs to improve NER performance in railroad construction; Chang et al. [26] proposed a BERT-based named entity recognition method and built a BERT-BiLSTM-IDCNN-CRF model, in which the trained word vectors are fed into a BiLSTM and an IDCNN for feature extraction.
At present, several problems remain in the field of epidemiological investigation information on COVID-19: the formats of the information published by national and local health commissions are not uniform; the trajectory information requires the definition of new entity types; no mature, large-scale annotated dataset for this information yet exists, and manual annotation is costly; and named entity models from other fields are difficult to apply directly to this field. To this end, this paper proposes the BBIEC model by embedding the improved IDCNN into the BiLSTM-CRF network and coupling it with the BERT pre-training model.

III. PROPOSED METHOD
The core of the proposed BERT-based Chinese named entity recognition method for the epidemiological investigation information on COVID-19 is the BBIEC neural network model, whose overall structure is shown in Figure 1. Its function can be divided into three layers: the BERT pre-training layer, the BiLSTM-IDCNN-ELU neural network layer, and the CRF inference layer.
To represent our model more clearly and intuitively, the detailed procedures of our proposed BBIEC model are given in visualized algorithm format, as shown in Table 1.

A. BERT PRE-TRAINING LAYER
1) CLASSIFICATION LEVEL OF THE CORPUS
Before inputting the original corpus into the BERT pretraining model, we need to select the division level of the corpus, which is divided into two levels: word level and character level.
For example, '' '' means a high-speed train with the number G96; it contains two entities in six characters: '' '' is a NUM entity, and '' '' is a VEH entity. At the word level, only the entity type can be marked, and the recognition boundary cannot be determined. In contrast, at the character level, both the entity type and a position mark can be attached to each character; for example, the entity marks of '' '' are ''G, B-NUM'', ''9, I-NUM'', ''6, I-NUM'', and '' , E-NUM'', i.e., the BIOES tagging method. Including position tags distinguishes the boundaries of entities more effectively; meanwhile, a comparison of entity recognition based on character vectors and word vectors concludes that character-vector-based recognition is more accurate [18]. Therefore, the character-level division of the corpus is chosen in this paper.
Let the sequence input to the model be X = {X_1, X_2, ..., X_n}, where X_i is a single Chinese character of the processed corpus.

2) BERT EMBEDDING LAYER
In the COVID-19 corpus, the same characters in different positions of a sequence can represent different meanings, i.e., the problem of character polysemy. For example, '' '' means calling the 120 emergency hotline, while '' '' means the patient was taken to the hospital by a 120 ambulance for treatment. Here ''120'' appears in different positions with entirely different meanings: the former represents the contact information of the emergency organization, and the latter represents the emergency organization itself.
To better distinguish entities in this type of data, this paper adopts the idea of position encoding, adding different codes to the same character at different positions to solve the character polysemy problem. Accordingly, the BERT embedding layer is chosen to add position information to the characters; its structure consists of Token Embeddings, Segment Embeddings, and Position Embeddings. The Position Embeddings layer encodes the position information of characters into a feature matrix using sine and cosine functions. The same character can express different semantics under the feature matrices of different positions, solving the character polysemy problem.
The character-level corpus is input into the BERT embedding layer; the sequences are processed and summed by the three embedding layers, converted into embedding vectors E = {E_1, E_2, ..., E_n}, and output to the Trm layer of BERT, which denotes a Transformer [27] encoder block. The BERT embedding layer is shown in Figure 2.
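As a concrete illustration, the sine/cosine position encoding described above can be sketched as follows. This is a minimal pure-Python version of the original Transformer formulation; note that released BERT checkpoints actually learn their position embeddings, so the sinusoidal form here only mirrors the description in this section.

```python
import math

def positional_encoding(seq_len, d_model):
    """Sinusoidal position encoding from the original Transformer:
    PE[pos, 2i]   = sin(pos / 10000^(2i/d_model))
    PE[pos, 2i+1] = cos(pos / 10000^(2i/d_model))
    The same character at two positions receives two different codes."""
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d_model, 2):  # i is the even dimension index
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe
```

The resulting matrix is added element-wise to the token and segment embeddings before entering the Trm layers.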

3) BERT OVERALL STRUCTURE
The BERT used in this paper consists of 12 Trm layers. Its input is the embedding vector E = {E_1, E_2, ..., E_n} output by the embedding layer, and its output is T = {T_1, T_2, ..., T_n}. Each Trm layer consists of a multi-head self-attention layer, a residual connection and normalization layer, and a fully connected feed-forward layer. After encoding by the 12 Trm layers, the result is normalized by a Softmax layer into the output vector T. The structure of BERT is shown in Figure 3.
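The multi-head self-attention at the heart of each Trm layer reduces, per head, to scaled dot-product attention. The following is a minimal single-head sketch over plain Python lists; real implementations are batched matrix operations with learned Q/K/V projection matrices, which are omitted here.

```python
import math

def attention(q, k, v):
    """Scaled dot-product attention, softmax(QK^T / sqrt(d)) V:
    each query position produces a weighted average of the value
    vectors, with weights given by query-key similarity."""
    d = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        mx = max(scores)                       # subtract max for stability
        exps = [math.exp(s - mx) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        out.append([sum(w * vj[t] for w, vj in zip(weights, v))
                    for t in range(len(v[0]))])
    return out
```

With identical keys the weights are uniform, so the output is simply the average of the value vectors.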

B. BiLSTM-IDCNN-ELU NEURAL NETWORK LAYERS
1) BiLSTM NEURAL NETWORK LAYERS
A Recurrent Neural Network (RNN) [28] can take the output of the previous time step as the input of the next time step, so it handles temporal sequences well. The epidemiological investigation information in the COVID-19 corpus is a temporal sequence. However, its texts are long, with some sentences exceeding 200 characters, so an RNN is likely to suffer from the long-distance dependency problem. This leaves the model unable to learn global contextual features effectively, resulting in poor entity recognition.
The Long Short-Term Memory network (LSTM) [29] is a special kind of RNN. Each of its neurons has three gates: an input gate, a forget gate, and an output gate. The input gate controls which information is input, the forget gate controls which information the neuron forgets, and the output gate controls which information is output. The formulas of the three gating units are shown in Eqs. (1)-(3).
where input_i, forget_i, and output_i are the states of the input gate, forget gate, and output gate, respectively; σ is the sigmoid activation function; W is the weight matrix of the corresponding gating unit; b is the bias term; T_i is the input vector at time i; T_{i-1} is the hidden-layer state of the LSTM unit at the previous moment; and C_{i-1} is the memory information in the LSTM unit at the previous moment. The memory information is updated as shown in Equation (4).
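The gate equations above can be sketched as a single LSTM time step. This is a toy scalar version under illustrative assumptions: the weight names (`wi`, `ui`, etc.) are not from the paper, and real layers use weight matrices over vector inputs.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x_t, h_prev, c_prev, w):
    """One LSTM time step (scalar toy version). Each gate applies a
    sigmoid to a weighted sum of the current input and the previous
    hidden state, as in Eqs. (1)-(3)."""
    i = sigmoid(w["wi"] * x_t + w["ui"] * h_prev + w["bi"])  # input gate
    f = sigmoid(w["wf"] * x_t + w["uf"] * h_prev + w["bf"])  # forget gate
    o = sigmoid(w["wo"] * x_t + w["uo"] * h_prev + w["bo"])  # output gate
    c_tilde = math.tanh(w["wc"] * x_t + w["uc"] * h_prev + w["bc"])  # candidate memory
    c_t = f * c_prev + i * c_tilde   # memory update, as in Eq. (4)
    h_t = o * math.tanh(c_t)         # new hidden state
    return h_t, c_t
```

A BiLSTM simply runs two such recurrences, one left-to-right and one right-to-left, and concatenates the two hidden states per position.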
LSTM neurons pass memory information and hidden-layer state information between time steps. By controlling feature circulation and loss through the gates mentioned above, the LSTM effectively alleviates the long-distance dependency problem, so it is chosen as the base neural network in this paper.
However, a unidirectional LSTM can only learn historical features, not future ones. This paper therefore adopts the BiLSTM, a bidirectional LSTM that splices features from both directions, which avoids losing contextual features in long sentences, learns contextual features better, and mitigates the long-distance dependency problem.
The BiLSTM neural network layer accepts the output of the BERT pre-training layer; its output L = {L_1, L_2, ..., L_n} is passed to the next neural network. The structure of the BiLSTM is shown in Figure 4.

2) IDCNN-ELU NEURAL NETWORK LAYERS
In the COVID-19 epidemiological investigation corpus with its long sentences, the BiLSTM model extracts global contextual features well but may ignore essential local features within a sentence. Convolutional neural networks can extract local features through convolution operations.
The Dilated Convolutional Neural Network (DCNN) [30] is a special type of CNN whose convolution kernel adds a dilation distance d, which enlarges the receptive field and learns local features better. The internal implementation of the convolution operation is shown in Equation (5).
where K is the weight matrix of the convolution kernel, l is the window length of the convolution kernel, T is the vector input from BERT, and b is the bias term. The structure of the DCNN is shown in Figure 5. The Iterated Dilated Convolutional Neural Network (IDCNN) is composed of multiple DCNN layers with different dilation widths, where each layer computes its feature vector from the output of the previous dilated convolution layer. The calculation is shown in Equation (6).
where elu is the Exponential Linear Unit activation function, H_i is the i-th dilated convolutional layer, and D_i is the feature vector learned by the i-th convolutional layer.
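A minimal one-dimensional sketch of the dilated convolution in Eq. (5): with dilation d, each output combines inputs spaced d steps apart, so stacking layers with growing dilation widths (as the IDCNN does) widens the receptive field rapidly without adding parameters. The helper name and scalar-sequence setting are illustrative, not from the paper.

```python
def dilated_conv1d(seq, kernel, dilation, bias=0.0):
    """1-D dilated convolution over a scalar sequence: output position t
    combines seq[t], seq[t+dilation], seq[t+2*dilation], ..."""
    k = len(kernel)
    span = (k - 1) * dilation            # total input span covered
    out = []
    for t in range(len(seq) - span):
        out.append(sum(kernel[j] * seq[t + j * dilation] for j in range(k)) + bias)
    return out
```

With dilation 1 this is an ordinary convolution; with dilation 2 the same 3-tap kernel already sees a window of 5 inputs.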
The ELU activation function [31] improves on the negative part of ReLU [32], as shown in Equation (7).
For x < 0, the ELU activation function produces an exponential-style output, which solves the problem that some neurons under ReLU can never be activated. The curves of the ReLU and ELU activation functions are shown in Figure 6.
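A direct sketch of the two activation functions being compared: for x < 0, ELU returns α(e^x − 1) instead of ReLU's hard zero, so a small gradient survives and "dormant" neurons can still be activated.

```python
import math

def relu(x):
    """ReLU: zero for all negative inputs (source of the dying-ReLU problem)."""
    return max(0.0, x)

def elu(x, alpha=1.0):
    """ELU (Eq. (7)): identity for x >= 0; a saturating exponential
    alpha * (exp(x) - 1), bounded below by -alpha, for x < 0."""
    return x if x >= 0 else alpha * (math.exp(x) - 1.0)
```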
Relative to the DCNN, the IDCNN further strengthens the acquisition of local features. The IDCNN model compensates for the local features that the BiLSTM lacks, extracting the essential features in the corpus through convolution operations.

C. CRF LAYERS
The trained BERT-BiLSTM-IDCNN model outputs a score for each label, and the label with the largest score is selected as the output. However, the top-scoring labels may not form a valid sequence; there may be label position errors. For example, take ''Handan, Hebei Province'' as a location entity: its first character should be labeled as the beginning of the location, its last character as the end, and the remaining characters as the middle. If an end label is predicted at the beginning or middle of the entity, the predicted sequence does not conform to the BIOES annotation system.
The CRF can correct such errors by imposing constraints directly on the labels. Given the input sequence X = {X_1, X_2, ..., X_n} and its corresponding predicted sequence Y = {Y_1, Y_2, ..., Y_n}, the score of a label sequence is shown in Equation (8).
where W is the transition score matrix and W_{Y_i, Y_{i+1}} is the score of transitioning from Y_i to Y_{i+1}; P is the emission score matrix output by the upper layer, and P_{i+1, Y_{i+1}} is the score of label Y_{i+1} for the (i+1)-th character of the sequence. The probability of generating the tag sequence Y is shown in Equation (9).
where Ỹ denotes all possible label sequences. Finally, in the decoding stage, the optimal path is solved using the Viterbi algorithm, as shown in Equation (10).
Through the above process, the CRF effectively checks the label sequences and improves recognition accuracy. Therefore, the CRF is used as the inference layer to avoid label position errors. As a result, the input sequence X is predicted by the BBIEC model to obtain the sequence Y*.
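A minimal sketch of the Viterbi decoding step of Eq. (10) over toy score matrices: `emissions` stands for the per-label scores output by the upper BiLSTM-IDCNN layers (the matrix P) and `transitions` for the CRF transition matrix W; start/end transitions are omitted for brevity.

```python
def viterbi(emissions, transitions):
    """Viterbi decoding: return the label index sequence maximizing
    sum of emission scores plus transition scores between labels.
    emissions[i][y]: score of label y at position i;
    transitions[p][y]: score of moving from label p to label y."""
    n, m = len(emissions), len(emissions[0])
    score = list(emissions[0])       # best score ending in each label
    back = []                        # backpointers per position
    for i in range(1, n):
        new_score, ptr = [], []
        for y in range(m):
            best_prev = max(range(m), key=lambda p: score[p] + transitions[p][y])
            new_score.append(score[best_prev] + transitions[best_prev][y]
                             + emissions[i][y])
            ptr.append(best_prev)
        score, back = new_score, back + [ptr]
    y = max(range(m), key=lambda p: score[p])
    path = [y]
    for ptr in reversed(back):       # follow backpointers to recover path
        y = ptr[y]
        path.append(y)
    return list(reversed(path))
```

With zero transition scores the decoder reduces to per-position argmax; a strong penalty on switching labels makes it prefer a consistent sequence, which is exactly how the CRF suppresses illegal BIOES transitions.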

IV. EXPERIMENT DESIGN
A. DATASET
1) DATASET LABELING
In this paper, the dataset was built from the epidemiological investigation information on COVID-19 released by the national and local health commissions and major news portals, together with the structured patient-trajectory data [33] released by the Beijing Advanced Innovation Center for Big Data and Brain Computing; about 200,000 characters of raw data were taken and manually annotated.
The design principle of the entities in a named entity recognition task is to represent the key information in the original text effectively. The purpose of named entity recognition on the epidemiological investigation information is to control the spread of the epidemic, which is mainly caused by direct or indirect human-to-human contact [34]; to control the spread, we need to know the movement trajectories of patients. The entity design of this paper is therefore centered on the COVID-19 patient. Because the original trajectory texts in [33] roughly follow the format ''someone did something at a certain time and place'' and also contain some modifying components (body temperature, means of transportation, etc.), we designed nine entity types with the COVID-19 patient as the core entity: ''Patient name (PER)''; ''Location on the patient's route or of residence (LOC)''; ''Organization (ORG)''; ''Vehicle used by the patient (VEH)''; ''Telephone number (TEL)''; ''Patient temperature (TEMP)''; ''Vehicle number (NUM)''; ''Time (TIME)''; and ''Date (DATE).'' A closed-source named entity recognition dataset for the epidemiological investigation information on COVID-19 was constructed based on these nine entity types [35]. The specific label meanings and examples are shown in Table 2.

2) DATA PRE-PROCESSING
Data that were manually collected from the national and local health commissions and major news portals were converted to an expression similar to that of [33]. Counting the lengths of all the original data, 82% of the sentences fall in the interval [150, 250]; sentences shorter than 50 or longer than 500 characters were removed. Using the format of the classic Chinese named entity recognition dataset, the People's Daily dataset [36], as the construction standard, the original COVID-19 text is processed into the form ''character LPH'', called single-column unlabeled entity data, where LPH is a label placeholder and a space separates the character from the LPH. After slicing the original data into this format, the entity information in the text is found according to the design rules declared in Section IV.A.1, and each LPH is replaced with the entity label corresponding to the character. This completes the pre-processing of the dataset.
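The character-label conversion step can be sketched as follows under the assumptions above; the helper name `to_char_lines` and the span-keyed `entities` argument are illustrative, not from the paper.

```python
def to_char_lines(text, entities, placeholder="O"):
    """Split a sentence into one 'character label' pair per line
    (People's Daily format). `entities` maps (start, end) character
    spans to entity types, which are expanded into BIOES tags;
    unlabeled characters keep the placeholder tag."""
    labels = [placeholder] * len(text)
    for (start, end), etype in entities.items():
        if end - start == 1:
            labels[start] = "S-" + etype          # single-character entity
        else:
            labels[start] = "B-" + etype          # beginning
            for i in range(start + 1, end - 1):
                labels[i] = "I-" + etype          # inside
            labels[end - 1] = "E-" + etype        # end
    return [f"{ch} {lab}" for ch, lab in zip(text, labels)]
```

For instance, tagging the vehicle number "G96" as a NUM entity yields one "character label" line per character.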

3) DATASET COMPOSITION
The original epidemiological investigation data on COVID-19 were collated and filtered by the preprocessing operations above to obtain 1026 records. The total was divided into a training set, a validation set, and a test set in the ratio 8:1:1. Figure 8 shows the number and percentage of each entity type in the dataset.

B. LABELING RULES
In named entity recognition, there are two main methods for character-level entity labeling: the BIO tagging method and the BIOES tagging method [37]. This paper uses the BIOES tagging method. It differs from the BIO method, which has only three tag types, by introducing an end tag for each entity longer than one character and a single tag for one-character entities, so that entity boundaries can be distinguished better. Here B-Entity denotes the first character of an entity, I-Entity a character in the middle of an entity, E-Entity the last character of an entity, S-Entity an entity composed of a single character, and O a character that is not part of any entity. Examples of sentence labels are shown in Table 3.
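For evaluation, BIOES tag sequences must be mapped back to entity spans. The following illustrative helper (not from the paper) accepts a span only when its boundary tags are consistent, which is exactly the constraint the tagging scheme encodes.

```python
def bioes_to_spans(tags):
    """Recover (start, end, type) entity spans from a BIOES tag sequence.
    A span is accepted only if it opens with B- and closes with a
    matching E- of the same type, or is a single S- tag."""
    spans, start, etype = [], None, None
    for i, tag in enumerate(tags):
        if tag.startswith("S-"):
            spans.append((i, i + 1, tag[2:]))
            start, etype = None, None
        elif tag.startswith("B-"):
            start, etype = i, tag[2:]
        elif tag.startswith("I-"):
            if etype != tag[2:]:
                start, etype = None, None        # illegal continuation
        elif tag.startswith("E-"):
            if start is not None and etype == tag[2:]:
                spans.append((start, i + 1, etype))
            start, etype = None, None
        else:                                    # "O"
            start, etype = None, None
    return spans
```

Sequences that violate the scheme (e.g. an E- tag with no matching B-) yield no span, which is why the CRF's transition constraints matter for downstream evaluation.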

C. EVALUATION INDICATORS
In this paper, precision (P), recall (R), and F1 score are used as evaluation metrics, calculated as shown in Equations (11)-(13).
where TP is the number of correctly recognized entities, FP is the number of incorrectly recognized entities, and FN is the number of unrecognized entities.
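Equations (11)-(13) reduce to a few lines (with guards against empty denominators):

```python
def prf1(tp, fp, fn):
    """Precision, recall, and F1 from entity-level counts:
    TP = correctly recognized, FP = spurious, FN = missed entities."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```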

D. EXPERIMENTAL ENVIRONMENT
The specific environment for all experiments of this Chinese named entity recognition study on the epidemiological investigation information on COVID-19 is shown in Table 4.

E. EXPERIMENTAL PARAMETERS
In the COVID-19 epidemiological investigation dataset, the longest sentence consists of 195 characters and the average sentence length is about 170 characters, so the maximum sentence length (max_len) is set to 200 characters. The other relevant settings, determined by hyperparameter ablation experiments, are shown in Table 5.

V. RESULTS AND ANALYSIS
A. COMPARATIVE EXPERIMENTAL DESIGN
The following sets of comparison experiments are set up in the Chinese named entity recognition experiments for the epidemiological investigation information on COVID-19.
(1) Word2Vec-BiLSTM-CRF [12]: Word2Vec generates character vectors, the BiLSTM neural network extracts semantic features, and the CRF inference layer classifies the entities.
(5) BERT-BiLSTM-Attention-CRF [22]: The BERT pre-trained model generates character vectors, the BiLSTM neural network extracts semantic features, the attention mechanism layer reinforces the extracted semantic features, and the CRF inference layer classifies the entities.
(6) BERT-Transformer-BiLSTM-CRF [23]: The BERT pre-trained model generates character vectors, the Transformer encoding area constructs contextual long-range semantic features of text, the BiLSTM neural network extracts semantic features, and the CRF inference layer classifies different entities.
(7) BERT-BiLSTM-IDCNN-CRF (Serial) [16]: The BERT pre-trained model generates character vectors, the BiLSTM and IDCNN neural networks extract semantic features and then fuse the features serially, and the CRF inference layer classifies different entities.
(8) BERT-BiLSTM-IDCNN-CRF (Parallel) [25]: The BERT pre-training model generates character vectors, the BiLSTM and IDCNN neural networks extract semantic features and then fuse the features in parallel, and the CRF inference layer classifies different entities.
(9) BBIEC: The BERT pre-trained model generates character vectors, the BiLSTM and improved IDCNN (replacing the ReLU activation function with ELU in the IDCNN) neural networks extract semantic features and fuse them in parallel, and the CRF inference layer classifies the entities.

B. EXPERIMENTAL PROCEDURE
The original epidemiological investigation information on COVID-19 is collected and preprocessed (including de-duplication, filtering of unqualified data, etc.) to construct the unannotated COVID-19 text dataset, which is then segmented into a character-level corpus. Each character is labeled according to the rules in Section IV.A.1 (Dataset Labeling), with 37 label types, using a space between each character and its label and a line break between patients' records, to construct the labeled epidemiological investigation dataset. This dataset is then fed into the comparison models and the BBIEC model for prediction. The experimental flow is shown in Figure 9.

C. COMPARISON EXPERIMENTS RESULTS
Following the procedure described in Section V.B (Experimental Procedure), comparative experiments were conducted on three evaluation metrics: precision, recall, and F1 score. The results are shown in Table 6.
As shown in Table 6, without the BERT pre-training model, the BiLSTM-CRF model improves on all metrics compared with the IDCNN-CRF model, because the BiLSTM has a stronger global context extraction ability than the IDCNN, although it takes more time. After adding BERT pre-training to the BiLSTM-CRF and IDCNN-CRF models, the metrics of both improve significantly. This is because BERT adds position encoding and a multi-head self-attention mechanism, giving it stronger semantic recognition ability and making the downstream model perform better; however, it does not change the fundamental strengths and weaknesses of the downstream model, so the metrics of the BERT-BiLSTM-CRF model remain higher than those of BERT-IDCNN-CRF. The advantage of the IDCNN over the BiLSTM is its stronger extraction of local semantic features, which follows from the principle of the IDCNN. Inserting the attention mechanism into the BERT-BiLSTM-CRF model over-emphasizes local semantic features and decreases the model's extraction ability, whereas inserting a Transformer encoder improves its construction of contextual semantic vectors and thus its extraction ability. Serially inserting the IDCNN into the BERT-BiLSTM-CRF model improves the extraction of local features, and connecting the BiLSTM and IDCNN in parallel as dual channels improves the extraction ability further.
The BBIEC model proposed in this paper is based on the dual-channel parallel model and improves the IDCNN component by changing its activation function from ReLU to ELU. The three evaluation metrics reach 0.9492, 0.9561, and 0.9521, respectively, a clear improvement in recall and F1-score over the BERT-BiLSTM-IDCNN-CRF (dual-channel) model, which performed best among the comparison experiments. The reason is that ELU retains most of the advantages of ReLU while avoiding the dying-ReLU problem; by reducing the effect of bias shift, the ELU function brings the gradient closer to the unit gradient and the output mean closer to 0. However, the precision of the BBIEC model decreases: the ELU activation involves exponential operations and is thus more computationally expensive, and because ELU produces nonzero outputs for x < 0, it activates some neurons that ReLU would silence, which increases the FP term in the precision formula and lowers precision.

D. SENTENCE-LEVEL RECOGNITION RESULTS
The sentence-level entity recognition performance of the BBIEC model is shown in Table 7. Without the BERT pre-training model, the whole-sentence recognition accuracy of both the IDCNN-CRF and BiLSTM-CRF models is below 0.5; after using the BERT pre-training model, both models improve considerably in whole-sentence recognition accuracy, because BERT brings stronger semantic extraction ability to the model. Adding the attention mechanism over-emphasizes local features and instead decreases extraction ability, whereas adding a Transformer encoder yields stronger semantic extraction. When the two neural networks are combined, the dual-channel parallel connection recognizes better than the serial connection, because in the serial arrangement the downstream network is vulnerable to errors propagated from the upstream network, degrading the serial model's recognition. The BBIEC model proposed in this paper further improves the activation function used in the IDCNN network, which activates some otherwise dormant neurons and improves the semantic extraction ability of the overall model.
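Whole-sentence accuracy as used in Table 7 counts a sentence as correct only when every character tag in it matches the gold annotation. A minimal sketch (function and variable names are ours):

```python
def sentence_accuracy(gold_sents, pred_sents):
    """Fraction of sentences whose predicted tag sequence matches the gold
    sequence exactly (one mismatched character fails the whole sentence)."""
    correct = sum(1 for g, p in zip(gold_sents, pred_sents) if g == p)
    return correct / len(gold_sents)

# Two hypothetical sentences: the second has one wrong tag, so it fails entirely.
gold = [["B-LOC", "E-LOC", "O"], ["S-LOC", "O"]]
pred = [["B-LOC", "E-LOC", "O"], ["O", "O"]]
```

This is why sentence-level numbers are much lower than per-character metrics: a single tag error anywhere in a sentence invalidates it.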

E. ENTITY-LEVEL RECOGNITION RESULTS
In some application areas of the epidemiological investigation of information on COVID-19, entity-level evaluation metrics are required; the metrics for each entity of the BBIEC model are shown in Table 8. From the numerical analysis we observe: (1) the TEL and TEMP entities reach 1.0000 on all three metrics, because these two entity types, denoting the phone number and the patient's body temperature, have very fixed formats, so the model can recognize all of them; (2) four entities, VEH, NUM, TIME, and DATE, exceed 0.9000 on all three metrics, because these entity types are strongly regular and mostly composed of English letters, digits, and Chinese characters, making them easy to recognize; (3) the remaining entity types score lower. Analyzing these by entity type: (1) PER entities: I-PER is recognized worse than the boundary tags B-PER and E-PER because, in the epidemiological investigation of information on COVID-19 corpus, a patient's name is written as a surname plus a placeholder character, or patients are numbered and their real names are not published, so I-PER training examples are scarce and the model cannot learn their semantic features effectively, resulting in low metrics.
(2) LOC entities: the epidemiological investigation information of the COVID-19 corpus is published by institutions in each region and contains many single-character place-name abbreviations, i.e., S-LOC entities; these single-character entities lack clear upper and lower boundaries and cannot be error-corrected by the CRF layer, so their metrics are relatively low compared with other entities. (3) ORG entities: in natural language expressions, LOC and ORG entities are closely related, and some words can be represented as either, so the more complex semantic disambiguation leads to lower precision.
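Entity-level scoring of the kind reported in Table 8 is typically computed by extracting (type, start, end) spans from the BIOES tag sequences and counting exact span matches. A sketch under that assumption (helper names are ours, not the authors' code):

```python
def extract_entities(tags):
    """Collect (type, start, end) spans from a BIOES tag sequence."""
    entities, start, etype = set(), None, None
    for i, tag in enumerate(tags):
        if tag.startswith("S-"):                  # single-character entity
            entities.add((tag[2:], i, i))
        elif tag.startswith("B-"):                # open a multi-character span
            start, etype = i, tag[2:]
        elif tag.startswith("E-") and etype == tag[2:]:
            entities.add((etype, start, i))       # close a well-formed span
            start, etype = None, None
    return entities

def entity_f1(gold_tags, pred_tags):
    """F1 over exact (type, start, end) span matches."""
    gold, pred = extract_entities(gold_tags), extract_entities(pred_tags)
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```

Note how a stray S-LOC prediction, having no B-/E- boundary context, either matches exactly or counts as a full false positive, which is consistent with the lower S-LOC metrics discussed above.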

F. EFFECT OF HYPERPARAMETERS ON MODEL PERFORMANCE
In this paper, we conduct experiments on the effects of three model hyperparameters, dropout rate (dropout), maximum sentence length (max_len), and learning rate (lr), on model performance, and the results are shown in Figure 10.
As shown in Figure 10(a), dropout is used to prevent overfitting, and the best fit is achieved at dropout = 0.4. At dropout = 0.5, the three metrics are close to those at 0.4; at dropout = 0.3, all metrics decrease because the dropout rate is too low, the model overfits, and validation accuracy drops; at dropout > 0.5, the metrics decrease significantly because too much semantic information is discarded, degrading model performance. As shown in Figure 10(b), max_len = 200 is the best choice. With max_len = 100, part of each sentence is truncated, and since entities in the epidemiological investigation of information on COVID-19 corpus are densely distributed, a large amount of semantic information is lost, leaving all three metrics far below the optimum. With max_len = 250, because the longest sentence is 195 characters and the average length is about 170, too much useless padding is filled in, lowering the prediction results. With max_len = 150, the truncated portion is small and most of the semantics is retained, so the metrics are close to optimal. As shown in Figure 10(c), the three metrics are highest at lr = 1e-4; when lr > 5e-4, the learning rate is too large for the model to converge and the metrics drop to 0; overall, the curves rise toward lr = 1e-4 and fall beyond it.
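The max_len trade-off discussed above comes down to the usual pad-or-truncate preprocessing step. A minimal sketch (we assume pad token id 0; the function name is ours):

```python
def pad_or_truncate(ids, max_len, pad_id=0):
    """Fix a token-id sequence to exactly max_len elements.

    Too small a max_len discards the tail of entity-dense sentences
    (lost semantics); too large a max_len adds useless padding.
    """
    if len(ids) >= max_len:
        return ids[:max_len]                       # truncate the tail
    return ids + [pad_id] * (max_len - len(ids))   # right-pad with pad_id
```

With the corpus statistics given above (longest sentence 195, average about 170), max_len = 200 pads lightly and truncates nothing, which matches its position as the optimal setting.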

VI. CONCLUSION
In this paper, we address problems in the field of epidemiological investigation of information on COVID-19. By analyzing the original corpus, we design the entity definitions for a named entity recognition dataset of epidemiological investigation of information on COVID-19, select an appropriate annotation scheme to build the dataset, reducing the labor and time costs of the annotation process, and propose the BBIEC model with dual parallel neural networks.
The model can fully learn both global context and local features and improves entity recognition by fusing the two. It achieves better results on the epidemiological investigation of information on COVID-19 dataset at both the sentence and entity levels, with the three metrics reaching 0.9492, 0.9561, and 0.9521, respectively.
The Chinese named entity recognition of epidemiological investigation of information on COVID-19 achieved in this paper lays the foundation for constructing a knowledge graph in this field, which is a useful aid to better control of the COVID-19 epidemic. The next research direction is to extract the relationships between the epidemiological investigation of information on COVID-19 entities.
WEI WANG received the Ph.D. degree in control science and engineering from the School of Information Engineering, University of Science and Technology Beijing, in 2012. He is currently an Associate Professor with the Hebei University of Engineering. His research interests include the public safety Internet of Things and implicit human-computer interaction learning.