Recognition of Agricultural Named Entities With Multifeature Fusion Based on ALBERT

A high-quality agricultural named entity recognition (NER) model can provide effective support for agricultural information extraction, semantic retrieval, and other tasks. However, existing models ignore the latent characteristics of Chinese characters, so internal semantics are lost; moreover, agricultural text sequences are long, so these models struggle to capture long-distance dependencies. To solve these problems, this paper proposes RSA-CANER, a self-attention-based agricultural named entity recognition model that incorporates the latent characteristics of Chinese characters. First, the model takes character features and latent features of Chinese characters as input to enrich semantic information: character features are obtained with the ALBERT pre-training model, radical features are extracted with a convolutional neural network (CNN), and stroke features are extracted with a bidirectional long short-term memory network (BiLSTM). Then a BiLSTM layer produces the sequence feature matrix, and a self-attention mechanism further strengthens the model's ability to capture long-distance dependencies. Finally, the globally optimal label sequence is generated by a conditional random field (CRF). The model obtains an F-score of 95.56%. The experimental results show that by learning semantic information at the fine-grained levels of radicals and strokes, the model enriches the vector representation of target words, achieves higher recognition precision than the compared models, and improves generalization ability.

question and answer data, how to quickly and accurately locate keywords and mine deep semantic relationships is an urgent problem for the agricultural intelligent question-and-answer system [1], [2]. The main task of agricultural named entity recognition is to identify different types of entities from unstructured question-and-answer data, such as crop diseases and pests, crop varieties, and pesticide names. It is the key technical link in building an intelligent question-and-answer system, which ultimately provides professional and personalized decision-making information services for grassroots agricultural technicians [3], [4], [5].

Research on named entity recognition in the agricultural field started late, and standardized open corpora are lacking. Moreover, the diversity of agricultural entities and the large number of nested entities, abbreviations, and alias entities directly hinder the further development of natural language processing tasks in the agricultural field.

In the NER task, entity recognition is treated as a sequence labeling task based on statistical machine learning [6], [7]. Common models include the hidden Markov model [8], the maximum entropy model [9], and the conditional random field [10]. In the field of agriculture, Li et al. [11] proposed a named entity recognition method based on conditional random fields; through feature combination and adjustment of the context window size, the precision of identifying named entities was improved.

… mechanism to make up for the shortcomings of BiLSTM and improve the model's attention to key nodes. Wu et al. [30] proposed the Att-BiLSTM-CRF model based on the self-attention mechanism.
Based on the self-attention mechanism, the model establishes a direct connection between characters to learn long-term dependencies, and it achieved good results on the CCKS-2017 shared task 2 data set. Wei et al. [31] proposed an attention mechanism to improve the vector representation in BiLSTM, designed different attention-weight redistribution methods, and fused them; the F1-score of the model on the JNLPBA corpus is 73.50%. Jin et al. [32] proposed GCRA, a new character-based gated convolutional recurrent neural network, in which an additional gated self-attention mechanism captures global dependencies from different subspaces and between any adjacent characters. Zheng et al. [33] proposed a new model, Att-CNN-BiLSTM-CRF, whose convolutional attention layer combines a local attention mechanism with a CNN to capture relationships in the local context, while a global multi-head attention layer optimizes the processing of sentence-level information; the recall of the model is 88.16% and the precision is 89.33%. In order to extract the characteristics of agricultural texts from multiple perspectives and at multiple levels, this paper uses the self-attention mechanism to adjust the weights of the output matrix of the BiLSTM model, obtain richer correlation information, and improve the model's attention to key nodes.
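The weight-adjustment step described above can be sketched as a single scaled dot-product self-attention pass over a BiLSTM output matrix. The sketch below is a minimal NumPy illustration with toy dimensions; the projection matrices `W_q`, `W_k`, `W_v` and all sizes are assumptions for demonstration, not the paper's actual configuration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(H, W_q, W_k, W_v):
    """Re-weight a BiLSTM output matrix H (n_tokens x d) with scaled
    dot-product self-attention, so every position attends directly to
    every other position regardless of distance."""
    Q, K, V = H @ W_q, H @ W_k, H @ W_v
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (n, n) pairwise relevance
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V                   # context-enriched features

rng = np.random.default_rng(0)
n, d = 5, 8                              # 5 tokens, 8-dim BiLSTM features
H = rng.normal(size=(n, d))
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))
Z = self_attention(H, W_q, W_k, W_v)
print(Z.shape)  # (5, 8)
```

Because the attention weights connect every pair of positions in one step, the path length between distant characters is constant, which is how the mechanism compensates for the BiLSTM's weakening long-range signal.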

A model based on the attention mechanism can better capture the internal correlations of data or features, obtain rich context information, and improve performance. However, such a model ignores the semantic information inside Chinese characters. Chinese characters are ideographic, with compact structure and rich internal characteristics; compared with English word-embedding vectors, Chinese radicals, strokes, pinyin, and so on contain a great deal of valuable semantic and morphological information. Finally, a CRF layer decodes the sentence to obtain the globally optimal tag sequence.

In agricultural texts, entities have different meanings in different contexts, and polysemy is common. For example, ''(gryllus testaceus)'' can denote an insect pest that endangers cotton, peanuts, and other crops, but in other contexts it can denote oil hyacinth, a plant of the sandalwood genus in the sandalwood family. In order to make full use of sentence context information and obtain rich character-level semantic representations, this paper introduces the ALBERT pre-training model to produce the character-level feature vector representation of the corpus.

ALBERT shares the parameters of each layer of the Transformer encoder, so the stack of multiple attention layers becomes repeated application of one attention layer with the same parameters; this greatly reduces the number of parameters and also improves training stability. For an input sequence X = {x_1, x_2, x_3, · · · , x_n}, the character-level semantic representation E_C = {e^c_1, e^c_2, e^c_3, · · · , e^c_n} is obtained through the ALBERT model.

LSTM is a special recurrent network model that mitigates the gradient vanishing and exploding problems of the RNN during training. In order to accurately identify agricultural named entities, a bidirectional long short-term memory model is constructed to represent the text in two directions, forward and reverse, so as to fully capture the past and future feature information of each target word.
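ALBERT's cross-layer parameter sharing can be illustrated with a toy stand-in encoder: one set of layer weights is reused at every depth, so parameter count stays constant while the stack still runs L layers deep. The layer internals, sizes, and depth below are illustrative assumptions, not ALBERT's real architecture.

```python
import numpy as np

def make_encoder_layer(rng, d):
    # Toy "encoder layer": one attention-like projection and one FFN matrix.
    return (rng.normal(size=(d, d)), rng.normal(size=(d, 4 * d)))

def count_params(layers):
    return sum(W.size for layer in layers for W in layer)

def encode(x, layers, d):
    # Depth-L stack; with ALBERT-style sharing every element of `layers`
    # is literally the same pair of weight matrices.
    for W_attn, W_ffn in layers:
        x = np.tanh(x @ W_attn)                      # stand-in for self-attention
        x = np.tanh(x @ W_ffn @ W_ffn.T / (4 * d))   # stand-in for the FFN block
    return x

rng = np.random.default_rng(0)
d, L = 64, 12

# BERT-style: L independently parameterized layers.
bert_layers = [make_encoder_layer(rng, d) for _ in range(L)]
# ALBERT-style: one set of parameters reused at every depth.
shared = make_encoder_layer(rng, d)
albert_layers = [shared] * L

bert_params = count_params(bert_layers)   # grows linearly with depth
albert_params = count_params([shared])    # constant in depth
print(bert_params // albert_params)       # 12

x = rng.normal(size=(5, d))               # 5 toy token vectors
y = encode(x, albert_layers, d)           # depth-12 stack, one weight set
print(y.shape)                            # (5, 64)
```

The 12x storage reduction is exactly the factor L, which is why sharing lets ALBERT keep a deep attention stack with a small parameter budget.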

The main structure of the LSTM network can be formally expressed by the standard gate equations:

i_t = σ(W_i x_t + U_i h_(t-1) + b_i)
f_t = σ(W_f x_t + U_f h_(t-1) + b_f)
o_t = σ(W_o x_t + U_o h_(t-1) + b_o)
c̃_t = tanh(W_c x_t + U_c h_(t-1) + b_c)
c_t = f_t ⊙ c_(t-1) + i_t ⊙ c̃_t
h_t = o_t ⊙ tanh(c_t)

Define the model input sequence X = (x_1, x_2, x_3, · · · , x_n) and feed it to the BiLSTM layer. The BiLSTM network represents the text in the forward and reverse directions to fully capture the past and future feature information of each target word. The vector X from the character embedding layer is used as the input of the BiLSTM layer at time t; the forward LSTM produces the feature sequence h→ and the reverse LSTM produces h←, the hidden-layer vectors of the two directions are concatenated, and the final output h_t is obtained by weighting with the tanh activation function.
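The gate equations and the two-direction pass can be sketched in NumPy. This is a toy illustration under simplifying assumptions: the dimensions and the 0.1 weight scale are arbitrary, and the final tanh output weighting is omitted; it is not the paper's implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_pass(X, W, U, b, reverse=False):
    """One-direction LSTM over X (n_steps x d_in) with the standard
    i/f/o gates and candidate g; returns hidden states (n_steps x d_h)."""
    n = X.shape[0]
    d_h = U.shape[1] // 4
    h, c = np.zeros(d_h), np.zeros(d_h)
    H = np.zeros((n, d_h))
    steps = range(n - 1, -1, -1) if reverse else range(n)
    for t in steps:
        z = X[t] @ W + h @ U + b                        # all four gates at once
        i, f, o = (sigmoid(z[k * d_h:(k + 1) * d_h]) for k in range(3))
        g = np.tanh(z[3 * d_h:])
        c = f * c + i * g                               # cell state: additive path
        h = o * np.tanh(c)
        H[t] = h
    return H

rng = np.random.default_rng(0)
n, d_in, d_h = 6, 8, 4                                  # toy sizes

def init_params():
    return (0.1 * rng.normal(size=(d_in, 4 * d_h)),
            0.1 * rng.normal(size=(d_h, 4 * d_h)),
            np.zeros(4 * d_h))

X = rng.normal(size=(n, d_in))
H_fwd = lstm_pass(X, *init_params())                    # past context
H_bwd = lstm_pass(X, *init_params(), reverse=True)      # future context
H = np.concatenate([H_fwd, H_bwd], axis=1)              # (n, 2*d_h) BiLSTM features
print(H.shape)  # (6, 8)
```

The additive cell-state update `c = f * c + i * g` is the mechanism that keeps gradients from vanishing over long agricultural text sequences, and the concatenated forward/backward states give each character both its left and right context.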
The multi-head attention mechanism maps Q, K, and V into multiple subspaces through linear projections, computes scaled dot-product attention in each subspace in parallel, and concatenates the results. The output sequence Z is taken as the input of the CRF layer, and the score of the corresponding output tag sequence y is

s(Z, y) = Σ_(i=1)^n P_(i, y_i) + Σ_(i=1)^(n-1) A_(y_i, y_(i+1)),

where P_(i, y_i) is the emission score of tag y_i at position i and A is the tag transition matrix.
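The CRF scoring and its softmax normalization can be sketched as follows. The brute-force partition function below is only for illustration at toy sizes (a real CRF uses the forward algorithm), and the tag count, sequence length, and random scores are assumptions, not the paper's setup.

```python
import itertools
import numpy as np

def crf_score(emissions, transitions, tags):
    """s(Z, y): emission score of each chosen tag plus transition
    scores between adjacent tags."""
    s = emissions[0, tags[0]]
    for t in range(1, len(tags)):
        s += transitions[tags[t - 1], tags[t]] + emissions[t, tags[t]]
    return s

def crf_log_prob(emissions, transitions, tags):
    """log P(y|Z): the sequence score normalized over every possible tag
    sequence (brute force here; real CRFs use the forward algorithm)."""
    n, k = emissions.shape
    all_scores = np.array([crf_score(emissions, transitions, seq)
                           for seq in itertools.product(range(k), repeat=n)])
    m = all_scores.max()
    log_Z = m + np.log(np.exp(all_scores - m).sum())   # stable log-sum-exp
    return crf_score(emissions, transitions, tags) - log_Z

rng = np.random.default_rng(0)
emissions = rng.normal(size=(4, 3))      # 4 tokens, 3 tags (e.g. B, I, O)
transitions = rng.normal(size=(3, 3))
lp = crf_log_prob(emissions, transitions, (0, 1, 1, 2))
print(lp <= 0.0)  # True: a log-probability is never positive
```

Normalizing over whole sequences rather than per token is what lets the CRF penalize invalid tag transitions and pick a globally consistent labeling.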

Then the conditional probability of sequence y is obtained with the softmax function over the sequence scores. Finally, the Viterbi algorithm is used to select the highest-scoring sequence y* as the final annotation result of the model.

In the prediction process, the tag sequence that maximizes the overall output probability is

y* = argmax_ỹ s(Z, ỹ).

In this section, Att-ALBERT-BiLSTM-CRF is used as the benchmark model, and the two additional feature types, radicals and stroke sequences, are integrated into the model for comparative experiments; the results are shown in Table 3. The benchmark model Att-ALBERT-BiLSTM-CRF takes the character-level feature E_C as input and achieves a recognition precision of 94.25%, a recall of 94.18%, and an F-score of 94.21%. Incorporating the radical feature E_R, the precision of the model is 94.71% and the F-score is 94.64%. With the stroke feature E_S, the precision is 94.93% and the F-score is 94.99%. To guarantee the uniqueness of each Chinese character, the fusion E_R + E_S is used as an additional feature and concatenated with the character-level feature E_C; the precision of the model is then 95.48% and the F-score is 95.56%. The analysis shows that by integrating stroke and radical features, the model not only captures the stroke dependencies of text words but also enhances the semantic representation of words through the embedding of radical features, improving the recognition ability of the model.
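The maximization above can be carried out efficiently with Viterbi dynamic programming rather than enumerating all k^n sequences. The sketch below uses NumPy with toy emission and transition scores; the sizes and random values are illustrative assumptions.

```python
import numpy as np

def viterbi(emissions, transitions):
    """Decode the tag sequence y* with the highest CRF score
    s(Z, y) = sum of emissions plus sum of adjacent-tag transitions."""
    n, k = emissions.shape
    score = emissions[0].copy()          # best score ending in each tag
    back = np.zeros((n, k), dtype=int)   # back-pointers for path recovery
    for t in range(1, n):
        # cand[p, c] = best score so far ending in p, then p -> c, then emit c
        cand = score[:, None] + transitions + emissions[t]
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    # Follow back-pointers from the best final tag.
    y = [int(score.argmax())]
    for t in range(n - 1, 0, -1):
        y.append(int(back[t, y[-1]]))
    return y[::-1], float(score.max())

rng = np.random.default_rng(0)
emissions = rng.normal(size=(5, 4))      # 5 tokens, 4 tags
transitions = rng.normal(size=(4, 4))
path, best = viterbi(emissions, transitions)
print(path, best)
```

Each step keeps only the best-scoring path into each tag, so decoding costs O(n·k²) instead of O(kⁿ) while still returning the exact global optimum.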