Chinese Address Recognition Method Based on Multi-Feature Fusion

A place name is a textual identification of a specific spatial location by people and is an important carrier of geographical information. The recognition of Chinese place names is of great importance in information retrieval and event extraction. The traditional approach is to transform the recognition of Chinese place names into a sequential annotation problem, with commonly used classification models such as support vector machines and conditional random fields. In this paper, Chinese address recognition is converted into a sequential annotation task, and a multi-feature fusion approach to Chinese address recognition is proposed. A deep learning network architecture model based on the fusion of character, word, and address features is constructed to convert characters, words, and their features into vector representations; finally, the sequential annotation of sentences is performed by CRF to achieve the recognition and extraction of address information. On the autonomously constructed dataset, the proposed method MFBL (Multi-Feature-BiLSTM) improves in accuracy by 4 to 10 percentage points compared to other methods, demonstrating that the MFBL model has better performance in the address recognition task.


I. INTRODUCTION
A place name is a textual identification of a specific spatial location by people and is an essential carrier of geographic information. Identifying Chinese place names is significant in information retrieval and event extraction. However, Chinese addresses come from diverse sources and are described in different ways. The critical factor for subsequent tasks is how to accurately parse Chinese addresses and identify address entities. Current research on Chinese address recognition mainly includes three kinds of methods. The first is the rule-based address recognition method, which extracts Chinese addresses according to the feature words of address elements and then recognizes the address elements. However, it has difficulty parsing and extracting non-standard or complex addresses effectively, and thus lacks adaptability. The second, based on statistics and machine learning, builds a statistical model of place name recognition from a large-scale prediction database. The model combines the lexical information of place-name phrases and the context information of sentences, which can solve the problem of semantic ambiguity to a certain extent. The third is based on deep learning, which recognizes addresses by mining the latent regular features in the data. A method that has been used more recently combines BiLSTM with a conditional random field (CRF) to build an address recognition model.
The associate editor coordinating the review of this manuscript and approving it for publication was Claudio Zunino.
VOLUME 10, 2022 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
The methods based on rules and statistics have certain limitations. They rely on the construction of the standard address library and are ineffective in dealing with disordered and missing addresses. Moreover, these methods lack an understanding of address semantics and cannot extract address semantic information effectively. The methods based on deep learning, in turn, still have much room for improvement. In response to this problem, this paper proposes a BiLSTM address semantic recognition model based on multi-feature fusion. First, the Trie syntax tree structure is used to construct a standard address tree. Then a deep learning network architecture model based on character, word, and address feature fusion converts characters, word tokens, and their features into vectors, and the resulting sentence representation vectors are input into bidirectional recurrent network layers to obtain sentence semantic information. Finally, CRF performs sequence labeling of sentences to identify and extract address information.
The contributions of this paper are as follows:
1) We propose an address feature recognition method based on deep learning combined with syntax tree rules. Based on the characteristics of Chinese addresses and the unique properties of address feature words, we extract address feature words as the key factors of address semantic representation, and then construct an address syntax tree for Chinese addresses to realize the matching and recognition of addresses.
2) This paper proposes a BiLSTM address semantic recognition model based on multi-feature fusion, that is, an address semantic representation model based on the fusion of character, word, and address features. The first important component of this architecture converts character and word tokens and their features into vector representations. The resulting sentence representation vectors are then input to two bidirectional recurrent network layers to obtain address semantic information, and finally the sequence annotation of addresses is performed by CRF to realize the recognition and extraction of address information.

II. RELATED WORK
The current difficulties in identifying Chinese addresses are mainly reflected in the following points. Chinese address descriptions are diverse, and there is no unified specification or coding. Chinese address descriptions are also fairly arbitrary: they follow no unified rules, and they may contain redundant or missing elements. Finally, Chinese address names are updated relatively often, so it is difficult to form a dictionary database with comprehensive coverage and high accuracy, and Chinese addresses cannot be identified reliably by matching alone.
Chinese addresses have specific rules and characteristics. Based on the analysis of address rules, each element of the address can be identified. Relevant scholars analyzed and summarized the characteristic words and parsing rules of addresses and realized address recognition based on specific terms and rules. However, the problem of ambiguity in addresses cannot be solved this way. For this reason, Tan proposed a mechanism that combines a rule tree and ambiguity storage, which partially solves the problem of incompleteness and ambiguity in Chinese addresses [1]. In general, although address recognition based on address rules is effective to some extent, the natural complexity of addresses makes it difficult to collect and cover all rules. The address matching algorithm based on rules and dictionaries proposed by Zhang et al. [2] uses address element feature words and an address feature dictionary to extract the most effective matching elements from non-standard addresses and thereby identify address elements. However, it relies on the completeness of address elements, which makes the rules challenging to formulate. Relevant scholars have also used the Chinese address tree model for research [3], [4], [5]. The Chinese address extraction method based on the address tree model proposed by Kang et al. uses the topological relationship as a spatial constraint to extract standard addresses from non-standard addresses. However, it cannot list all non-standard address and place name element sets. To overcome this limitation, Wu et al. [6] proposed a multi-strategy address matching algorithm, which combines character similarity with an address element extraction strategy and uses a feature word dictionary and sequence annotation to match against a standard address database.
The statistics-based word segmentation method assumes that the more often adjacent characters appear in the same order, the more likely they are to form a word. Based on this idea, related scholars take the address string as the observation sequence and the address element type as the label sequence [7], obtain an address parsing model by training on the labeled dataset, and use the parsing model to tag the address and identify the address element types. Statistics-based models include the Hidden Markov Model, the Conditional Random Field model, the Support Vector Machine model, etc. [8], [9], [10]. Zhu et al. [11] summarized the characteristics of Chinese addresses and trained a Chinese address parsing model based on conditional random fields. Based on the conditional random field model, Xu et al. [12] used fuzzy matching of the local information of Chinese addresses to standardize and parse Chinese addresses. Yuan et al. [13] combined statistical methods and rules to design a method based on statistics and rules, which has a clear effect on disambiguation and improves the efficiency of Chinese address recognition. Related scholars have also proposed similar methods [14], [15], [16], [17].
Zang et al. [18] proposed a construction method based on an address semantic model, which avoids the ambiguity of address semantics by introducing SVM and effectively improves the accuracy of address recognition. Song et al. [10] proposed a Chinese address matching algorithm that identifies unstructured addresses by establishing a spatial relationship address model and an address library logic model. However, its experiments covered only 1,000 residential addresses, which is too small a sample. Luo and Huang [19] proposed a standardization method based on a finite state machine and an address hierarchy model, which is simple in principle and easy to operate. However, its shortcoming is that, when applied to specific practical problems, the model generalizes poorly and has significant limitations. Duan et al. [20] proposed an algorithm for extracting administrative divisions of Chinese addresses based on conditional random fields, in which an expression model of administrative divisions is obtained by constructing a feature corpus template. Li et al. [21] proposed a rule-based method for the extraction of administrative divisions, which obtains the complete administrative divisions corresponding to address elements by establishing a set of address element rules. Liu et al. [22] proposed a method based on distinct characters, which recognizes Chinese addresses by segmenting the address field. It is simple in principle and easy to use, but it also suffers from low accuracy. Other scholars have also conducted in-depth research on address semantics [23], [24], [25], [26].
However, the rule-based and statistics-based methods mentioned earlier do not consider semantic information, so they perform poorly when the focus of some non-standard address information is irregularly distributed. To address such problems, many studies have begun to apply neural networks to address semantic representation tasks, using CNN [27], [28], [29], Recurrent Neural Networks (RNN) [30], [31], [32], Long Short-Term Memory networks (LSTM) and their bidirectional variant (BiLSTM) [33], [34], or some fusion methods [23], [35], [36]. Specifically, there are many studies on the semantic parsing and recognition of Chinese addresses. Zhang et al. [37] used BiLSTM and a four-lemma tagging method to tag the address dataset, which improved the Chinese word segmentation effect. Cheng et al. [38] proposed combining BiLSTM with CRF to build a Chinese address parsing model, which improved the accuracy of Chinese address recognition. Wang et al. [39] realized the parsing and recognition of Chinese addresses on a small annotated dataset by combining an improved Transformer with CRF. Some scholars have also done other CRF-based Chinese entity recognition research.
In conclusion, the current identification methods for Chinese addresses all rely on the integrity of the standard address library or use standard methods to identify Chinese addresses, and do not start from an understanding of address semantics. The neural network-based method can effectively solve the problem of the lack of semantic information in address matching and the poor effect of traditional methods on differences between address elements. However, for such models, how to effectively integrate the contextual information of global and local contexts is an important issue. By analyzing the characteristics of Chinese address structure, this paper proposes a Chinese address recognition method based on the fusion of characters, words, and their semantics. This method does not rely on the address feature library and realizes accurate identification of Chinese addresses from the perspective of address semantic understanding.

A. PREREQUISITES
Chinese addresses contain three features: address elements, part of speech, and syntax. Chinese geographical names are usually composed of multiple elements, and each address element is an independent part of the geographical name entity. Address elements are composed of ordinary characters and characteristic words. The characteristic words reflect the actual semantics and location information of the address, and therefore best reflect the essential difference between address elements. The characteristic words thus serve as markers for distinguishing address elements and dividing address levels. In this section, we propose a deep learning network architecture model based on the fusion of character, word, and address features according to the characteristics of the Chinese address structure. The first critical component of the architecture converts characters, word tokens, and their features into vectors and then inputs the resulting sentence representation vectors into two bidirectional recurrent network layers to obtain address semantic information. Finally, the sequence labeling of the address is carried out through CRF to realize the identification and extraction of the address. A Chinese address is composed of multiple address elements. Valid address elements include the name of the administrative division, the name of the street, the name of the community, the door address, the name of the landmark, and the name of the point of interest. The description of Chinese addresses usually takes one of the following forms. One describes the administrative area from wide to narrow, such as "No. 88 Ruixiang Road, Jiujiang District, Wuhu City, Anhui Province", which is a standardized address. Its address elements are distinguished by address element feature words such as "province", "city", "district", "road", and "number". An address tree model can be constructed through these characteristic words.
The other uses the building address as an abbreviation of the address, such as "Wuhu City Wanjiang Fortune Center Building", or uses a relative description, such as "Next to Zhongjiang Avenue opposite Wuhu Municipal Government". Such non-normalized addresses need to be normalized before identification. Chinese address elements include multiple levels: provinces and municipalities directly under the Central Government form the first level, provincial capitals and prefecture-level cities the second level, districts and counties the third level, and streets and towns the fourth level. Taking streets and townships as an example, the feature words that may mark the corresponding address elements include towns, townships, offices, neighborhood committees, communities, and streets. Based on the unique attributes of the address feature word, this paper extracts the address feature word as the key factor of address semantic representation and constructs the address syntax tree shown in Figure 1, where the nodes correspond to the address elements in the address. In this paper, the syntax tree is used to match Chinese addresses, and for the standard addresses that conform to the syntax tree, the address semantic model described below is used for entity recognition.
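As a rough illustration of how a standard address tree can support matching, the following Python sketch stores addresses as sequences of address elements ordered from wide to narrow. The class name `AddressTrie`, its methods, and the assumption that addresses have already been split into elements at their feature words are illustrative choices, not the structure used in the paper.

```python
class AddressTrie:
    """Trie whose edges are address elements, ordered from wide to narrow.

    Hypothetical helper for illustration; elements are assumed to be
    pre-split at feature words such as "Province", "City", "District".
    """

    END = "$end"  # sentinel key marking a complete standard address

    def __init__(self):
        self.root = {}

    def insert(self, elements):
        node = self.root
        for element in elements:
            node = node.setdefault(element, {})
        node[self.END] = True

    def contains(self, elements):
        """True only if the full element sequence is a stored address."""
        node = self.root
        for element in elements:
            if element not in node:
                return False
            node = node[element]
        return self.END in node

    def match_prefix(self, elements):
        """Return the longest leading run of elements found in the tree."""
        node, matched = self.root, []
        for element in elements:
            if element not in node:
                break
            node = node[element]
            matched.append(element)
        return matched


standard = ["Anhui Province", "Wuhu City", "Jiujiang District",
            "Ruixiang Road", "No. 88"]
trie = AddressTrie()
trie.insert(standard)

abbreviated = ["Anhui Province", "Wuhu City",
               "Wanjiang Fortune Center Building"]
print(trie.contains(standard))         # True
print(trie.match_prefix(abbreviated))  # ['Anhui Province', 'Wuhu City']
```

In this sketch, an address that only partially matches the tree (such as the abbreviated building address) is a candidate for normalization before it is passed on to the recognition model.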

B. CHINESE ADDRESS RECOGNITION MODEL
In this paper, we propose a novel Chinese address recognition model, Multi-Feature-BiLSTM (MFBL). According to the characteristics of Chinese addresses, the proposed Chinese address semantic recognition model is established by fusing the character and word attributes of Chinese addresses. Specifically, address semantic recognition is divided into three stages. First, character-level information is extracted by converting the address character information into vector representations that incorporate the local and global features of characters in the address text. Second, word-level information is extracted by obtaining the forward and backward context dependencies of words in the address text. The word-based address semantic information can be extracted in this stage, and the address semantic representation is synthesized by leveraging the association between address indicator words appearing in the address text. Finally, the CRF module synthesizes the context feature vectors output by the preceding structure to obtain the category probability value of each character, and the annotation sequence with the maximum probability is selected as the recognition and annotation result of address elements. The main algorithm is as follows: the address semantic recognition model takes the address text as input, generates the semantic fusion representation vector of the address text at the character level and the word level respectively, and then uses BiLSTM to obtain the address semantic representation. In the end, CRF is used to complete the recognition of address elements. The overall construction of the model is shown in Figure 2. The MFBL model is composed of an embedding module, a semantic feature fusion module, and a CRF module. The details of each module are explained in the following.

C. FUSION REPRESENTATION OF CHINESE ADDRESS SEMANTICS
In this paper, we use an address semantic representation that fuses character and word features. In detail, the character-level embedding vector representation is first obtained from the input address text, and the local semantic information is extracted through the convolutional network. The generated character-based token sequence representation is then passed to the input layer of BiLSTM. Second, the address text is segmented, and a pre-trained language model encodes the segmented sequence into a representation vector that is fed to the word embedding layer. The overall structure is shown in Figure 3.

1) CHARACTER-LEVEL REPRESENTATION
In order to learn more latent semantic connections in the address text, we take the Chinese character features as an input dimension and encode them globally and locally to learn richer semantic information. Specifically, the BiLSTM structure first encodes the characters in the address text bidirectionally, and the self-attention mechanism then captures the correlation between the characters, which serves as the global semantic information at the character level. Next, a convolutional neural network extracts the local features of the characters, and a max-pooling layer removes redundant local semantic information to give the character-level local features. For the character $w_t$ at position $t$ in the address text, the pre-trained language model BERT is leveraged to convert it into an embedding vector $x_t$. The embedding vectors of the characters are fed into the BiLSTM network, whose semantic representation output $h_t$ combines $\overrightarrow{h}_t$ and $\overleftarrow{h}_t$, the forward and backward outputs of the BiLSTM network. Based on $h_t$, the self-attention mechanism captures the relationship weight between every two characters in the address text. In the attention formulas, $c_t$ is the context vector, $w_a$, $w_b$, $w_c$ are the weight matrices, and $\chi$ is a randomly initialized parameter vector. The convolutional neural network is then used to extract the local features of the characters, and its output is passed through the max-pooling layer to retain the most important of the learned features. In the formula for local feature extraction using CNN, $x_t$ is the embedded input representation of the character, $K$ is the convolution kernel size, and mask denotes padding the input sequence with zeros to unify the input dimension.
After that, the output results are compressed by a max-pooling layer to retain the most relevant information for subsequent predictions, which gives the character-level local features at time $t$.
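To make the local feature extraction step more concrete, here is a minimal Python sketch of a zero-padded one-dimensional convolution followed by max-pooling. It rests on several assumptions that the text does not confirm: each character is reduced to a single scalar feature rather than an embedding vector, the kernels are hand-picked rather than learned, and the pooling takes the maximum across filters at each position. The function names are hypothetical.

```python
def conv1d_same(xs, kernel, pad_value=0.0):
    """Slide `kernel` over `xs`, zero-padding both ends so that the
    output has the same length as the input (the "mask" step)."""
    k = len(kernel)
    left = (k - 1) // 2
    right = k - 1 - left
    padded = [pad_value] * left + list(xs) + [pad_value] * right
    return [
        sum(w * padded[t + j] for j, w in enumerate(kernel))
        for t in range(len(xs))
    ]


def max_pool_over_filters(feature_maps):
    """At each position t, keep the largest response across all filters."""
    return [max(column) for column in zip(*feature_maps)]


# Toy scalar "embeddings" for a four-character address fragment.
xs = [1.0, 2.0, 3.0, 4.0]
# Two illustrative kernels of size K = 3.
kernels = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]

feature_maps = [conv1d_same(xs, kernel) for kernel in kernels]
local_features = max_pool_over_filters(feature_maps)
print(feature_maps)    # [[-2.0, -2.0, -2.0, 3.0], [1.5, 3.0, 4.5, 3.5]]
print(local_features)  # [1.5, 3.0, 4.5, 3.5]
```

Because of the zero padding, every character position keeps one local feature, which is what allows the local features to be fused with the per-character global features later on.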

2) WORD-LEVEL REPRESENTATION
In this paper, we use word-level features in addition to character-level features. By introducing both character-level and word-level features, we can make full use of the boundary and semantic information of the input text. Specifically, this method assigns four word sets to each character, labeled B, M, E, and S: B is the set of latent words that begin with the current character, M is the set of latent words that contain the current character in their interior, E is the set of latent words that end with the current character, and S is the single-character word formed by the current character itself.
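The B, M, E, and S word sets can be sketched in a few lines of Python. The function name `bmes_word_sets` and the small lexicon below are assumptions made for illustration; the paper does not specify which lexicon or matching procedure is used.

```python
def bmes_word_sets(text, lexicon):
    """For each character, collect the lexicon words that begin with it (B),
    contain it in their interior (M), end with it (E), or consist of that
    single character (S)."""
    sets = [{"B": set(), "M": set(), "E": set(), "S": set()} for _ in text]
    for start in range(len(text)):
        for end in range(start + 1, len(text) + 1):
            word = text[start:end]
            if word not in lexicon:
                continue
            if len(word) == 1:
                sets[start]["S"].add(word)
                continue
            sets[start]["B"].add(word)
            sets[end - 1]["E"].add(word)
            for mid in range(start + 1, end - 1):
                sets[mid]["M"].add(word)
    return sets


# Illustrative lexicon: Anhui (Province), Wuhu (City), and two feature words.
lexicon = {"安徽", "安徽省", "芜湖", "芜湖市", "省", "市"}
word_sets = bmes_word_sets("安徽省芜湖市", lexicon)

for char, groups in zip("安徽省芜湖市", word_sets):
    print(char, {label: sorted(words) for label, words in groups.items()})
```

For the second character, for example, the E set holds the two-character word that ends there and the M set holds the three-character word that passes through it, so the character carries boundary evidence from both candidate segmentations.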

D. FEATURE FUSION
A multi-feature fusion strategy is used to represent the character-level features, which contain both global and local features. Multi-feature fusion is a robust and efficient strategy that makes full use of the most significant features to achieve better results. Character-level feature fusion can combine multiple related features into a global information representation of the original input sequence. In the feature fusion stage, an adaptive connectivity strategy is used to fuse global and local features. In the fusion formula, $h^A_t$ and $h^C_t$ are the features obtained from Section 3.2.1, and $u_1$ is the parameter used to adjust the relative importance of these two features.
Finally, the fused character-level representation vector $h_t$ and the enhanced representation vector Emb(B, M, E, S) are concatenated to obtain the representation vector $I$ of the final input layer, and the concatenated vector is then fed into the BiLSTM network.
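One simple way to realize such a weighted fusion is a convex combination of the two character-level features, followed by concatenation with the word-set embedding. The Python sketch below assumes the form h = u1 * hA + (1 - u1) * hC with a fixed scalar u1 in [0, 1]; this form, the function names, and the toy vectors are assumptions for illustration, and the paper's actual fusion formula may differ.

```python
def fuse(h_global, h_local, u1):
    """Assumed fusion rule: u1 * h_global + (1 - u1) * h_local, elementwise."""
    if not 0.0 <= u1 <= 1.0:
        raise ValueError("u1 is assumed to lie in [0, 1]")
    if len(h_global) != len(h_local):
        raise ValueError("global and local features must have equal length")
    return [u1 * a + (1.0 - u1) * c for a, c in zip(h_global, h_local)]


def build_input(h_global, h_local, word_set_emb, u1):
    """Concatenate the fused character vector with the word-set embedding."""
    return fuse(h_global, h_local, u1) + list(word_set_emb)


h_global = [1.0, 2.0]          # stands in for the attention-based feature
h_local = [3.0, 0.0]           # stands in for the CNN-based feature
word_set_emb = [0.25, 0.75]    # stands in for Emb(B, M, E, S)

fused = fuse(h_global, h_local, u1=0.5)
model_input = build_input(h_global, h_local, word_set_emb, u1=0.5)
print(fused)        # [2.0, 1.0]
print(model_input)  # [2.0, 1.0, 0.25, 0.75]
```

In a trained model the weight would typically be learned rather than fixed; the constant used here only shows how the parameter shifts emphasis between global and local information.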

E. CRF
The fusion representation vector of multi-level features is fed into the BiLSTM network to mine richer semantic information. Assume that the output sequence of the BiLSTM network is $X = (x_1, x_2, \ldots, x_n)$ and the corresponding label sequence is $Y = (y_1, y_2, \ldots, y_n)$. The Conditional Random Field (CRF) is a discriminative probabilistic model that combines the advantages of the Hidden Markov Model (HMM) and the Maximum Entropy Model (MEM), and is suitable for sequence labeling tasks. When the labeling sequence of $X$ is $Y$, its probability is calculated as
$$p(Y \mid X) = \frac{e^{S(X, Y)}}{\sum_{\tilde{y}} e^{S(X, \tilde{y})}},$$
where $e^{S(X, Y)}$ is the score of the true path, the sum in the denominator runs over all candidate label sequences $\tilde{y}$, and
$$S(X, y) = \sum_{i} \left( u_{x_i, y_i} + P[y_i, y_{i-1}] \right).$$
The solution of the maximum likelihood function is then transformed into maximizing
$$\log p(Y \mid X) = S(X, Y) - \log \sum_{\tilde{y}} e^{S(X, \tilde{y})},$$
where $u_{x_i, y_i}$ denotes the score of labeling element $x_i$ as $y_i$, and $P$ denotes the label transition matrix.
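The path score S(X, y) and the search for the highest-scoring label sequence can be sketched in plain Python as follows. This is a generic Viterbi decoder, not the authors' code; the toy emission and transition scores are made up, the transition table is indexed as `transitions[current][previous]` to mirror P[y_i, y_{i-1}], and the first position is assumed to carry no transition term.

```python
def path_score(emissions, transitions, labels):
    """S(X, y): sum of emission scores plus transition scores.
    Assumption: the first position has no transition term."""
    score = emissions[0][labels[0]]
    for i in range(1, len(labels)):
        score += emissions[i][labels[i]] + transitions[labels[i]][labels[i - 1]]
    return score


def viterbi(emissions, transitions):
    """Return the label sequence with the maximum path score and that score."""
    n_labels = len(emissions[0])
    best = list(emissions[0])  # best score of any path ending in each label
    back = []                  # back-pointers, one list per later position
    for i in range(1, len(emissions)):
        new_best, pointers = [], []
        for cur in range(n_labels):
            prev = max(range(n_labels),
                       key=lambda p: best[p] + transitions[cur][p])
            new_best.append(best[prev] + transitions[cur][prev] + emissions[i][cur])
            pointers.append(prev)
        best = new_best
        back.append(pointers)
    last = max(range(n_labels), key=lambda k: best[k])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]


# Three positions, two labels (0 and 1); scores are illustrative only.
emissions = [[2.0, 0.0], [0.0, 1.0], [0.0, 3.0]]
transitions = [[0.0, -1.0],   # into label 0 from label 0 / label 1
               [1.0, 0.0]]    # into label 1 from label 0 / label 1

best_path, best_score = viterbi(emissions, transitions)
print(best_path, best_score)                            # [0, 1, 1] 7.0
print(path_score(emissions, transitions, [0, 1, 1]))    # 7.0
```

During training, the same path score appears in the numerator of the likelihood, while decoding at test time only needs the maximization shown here.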

IV. EXPERIMENT

A. SETTING
In this paper, the deep learning framework Keras 2.3.0 based on CUDA 10.0 was used to build the network model. The experiment was carried out on Ubuntu 18.04 LTS system with memory DDR4 32G, 3.6GHz i7-7700 Intel(R) Core(TM) CPU, NVIDIA GeForce GTX 1080 Ti.

B. DATASETS
In order to evaluate the stability of the model proposed in this paper, we used the standard address library to construct datasets containing 268973 Wuhu city address information.
Then we selected 90% of the datasets, about 242076 data as the training set, and the remaining 26897 data as the test set.
In addition, the ratio of positive to negative samples in both the training set and the test set is about 3:1. Before model training, the address data needs to be annotated. In this paper, the "BIO" annotation system is used to annotate address data. First, the Chinese processing tool jieba was used to segment the address data before annotation. Considering that an address, as a kind of short text with a special structure, may contain a large number of specific place-name words, we used a self-defined stop word set during segmentation. Then we annotated each word according to the rules of the "BIOE" annotation system. As shown in Table 1, the tags are B (begin), I (inside), O (outside), and E (end). Labels were annotated directly at the end of each address element and then automatically converted to BIOE format.
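The conversion from element-level labels to BIOE tags can be sketched as below. The sketch assumes character-level tags, hypothetical label names (`PROV`, `CITY`), and that a single-character element receives only a B tag; none of these details are specified in the text, and Table 1 may define the scheme differently.

```python
def bioe_tags(elements):
    """Convert (text, label) address elements to per-character BIOE tags.

    A label of None marks text outside any address element (tag "O").
    Assumption: a one-character element is tagged "B-<label>" only.
    """
    tags = []
    for text, label in elements:
        if label is None:
            tags.extend((ch, "O") for ch in text)
        elif len(text) == 1:
            tags.append((text, "B-" + label))
        else:
            tags.append((text[0], "B-" + label))
            tags.extend((ch, "I-" + label) for ch in text[1:-1])
            tags.append((text[-1], "E-" + label))
    return tags


# Segmented address with hypothetical element labels.
elements = [("安徽省", "PROV"), ("芜湖市", "CITY"), ("附近", None)]
tagged = bioe_tags(elements)
for ch, tag in tagged:
    print(ch, tag)
```

Labeling only once per element and deriving the per-character tags automatically keeps the manual annotation effort proportional to the number of address elements rather than the number of characters.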

C. EXPERIMENTAL SETUP
In this paper, the dimension of the character features of Chinese characters is set to 20, and the word2vec model is used to vectorize each Chinese character. Address data with fewer than 20 dimensions is padded with 0 to complete the 20-dimensional coding, and each word in the address data is then represented as the corresponding word vector, which is fused into the vector representation of the whole address data. In terms of hyperparameters, the output dimension of each word is set to 768 according to the possible length of address data, and the semantic representation of the output address data has 100 dimensions. After the semantic representation is completed, the two semantic vectors are each input into the network structure of the next layer.
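The zero-padding step can be illustrated with a short Python helper. The function name `pad_or_truncate` is hypothetical, and truncating inputs longer than 20 is an added assumption, since the text only describes padding shorter inputs with 0.

```python
def pad_or_truncate(ids, length=20, pad_id=0):
    """Right-pad with `pad_id` up to `length`.
    Assumption: inputs longer than `length` are truncated."""
    ids = list(ids)[:length]
    return ids + [pad_id] * (length - len(ids))


short = pad_or_truncate([5, 3, 8], length=6)
long_input = pad_or_truncate(range(1, 30))
print(short)            # [5, 3, 8, 0, 0, 0]
print(len(long_input))  # 20
```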
In the training process, the batch size is set to 1024, and a two-layer BiLSTM network is used to obtain the global context information. A CNN is combined with it to obtain local context information, with dropout set to 0.5. The outputs of BiLSTM and CNN are then concatenated and fed into the self-attention network as a feature matrix. Finally, a 100-dimensional representation vector is output as the semantic representation of address data. The model is optimized with the Adam optimizer, using 25 epochs, a learning rate of 0.01, dropout of 0.5, beta1 of 0.9, beta2 of 0.999, and a decay rate of 0.1. The specific parameters of the model are shown in Table 2.

D. EXPERIMENTAL RESULTS AND ANALYSIS
In terms of evaluation metrics, in order to effectively evaluate the prediction results, we select several reference metrics to measure the final results: accuracy, precision, recall, and F1-score. The higher the accuracy, the more accurate the model's sequence annotation of address data. The higher the F1-score, the better the overall performance of the model.
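For reference, precision, recall, and F1-score can be computed from sets of predicted and gold address elements as in the sketch below. It assumes entity-level exact matching on hypothetical `(start, end, label)` spans; the paper does not state whether its metrics are computed per entity or per token.

```python
def precision_recall_f1(gold, predicted):
    """Entity-level scores; a prediction counts only on an exact span match."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Spans are (start, end, label); labels are hypothetical.
gold = {(0, 3, "PROV"), (3, 6, "CITY"), (6, 9, "DIST"), (9, 12, "ROAD")}
predicted = {(0, 3, "PROV"), (3, 6, "CITY"), (6, 8, "DIST")}

precision, recall, f1 = precision_recall_f1(gold, predicted)
print(round(precision, 4), round(recall, 4), round(f1, 4))
```

Here two of the three predicted spans are correct and two of the four gold spans are recovered, so precision is 2/3, recall is 1/2, and F1 is their harmonic mean, 4/7.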

1) ABLATION EXPERIMENTAL ANALYSIS
In order to verify the effectiveness of the MFBL model, the following ablation experiments were performed:
a. The first group only obtains global features at the character level, using BiLSTM and the attention mechanism to extract global features from the input character representation;
b. The second group only obtains character-level local features through the CNN model and the max-pooling operation;
c. The third group only obtains the features encoded by the word and then obtains the corresponding word set for each character, which is finally concatenated with the single-character representation;
d. The fourth group fuses the global and local feature representations at the character level;
e. The fifth group is the multi-feature fusion representation method based on character and word coding proposed in this paper.
From Table 3, it can be found that relying on a single feature degrades the performance of the model. The overall accuracy, recall, and F1-score are all poor whether only character-based global features, only local features, or only word-level features are used as the input of the neural network. The reason is that the model receives less input information and is not fully trained, which leads to poor performance on the test corpus. In the fourth group of experiments, the global and local features at the character level are used together as input. The performance of entity recognition improves, and the results are slightly better than those of the third group of experiments, which uses word-level features as the input. The reason is that extracting both the global and local features of characters fully mines the latent information contained in the characters, which helps the model learn from the corpus.
From the fifth group of experiments, it can be seen that the proposed model tends to be more effective with the increase of fused features, which shows that the fused features are helpful to improve the performance of entity recognition from a multilevel perspective.

2) COMPARISON WITH BASELINE MODEL EXPERIMENT
In order to verify the effectiveness of the MFBL model proposed in this paper, it is compared with classical models. The following groups of comparison experiments are set up:
a. The first group only uses the BiLSTM-CRF model for address sequence annotation;
b. The second group uses the BiLSTM-CRF model with an added attention mechanism;
c. The third group adds a CNN network to obtain local context information and runs the experiments with the BiLSTM-CNN-CRF model;
d. The fourth group is the MFBL model proposed in this paper, in which the attention mechanism is added to BiLSTM-CRF and the CNN network is combined for joint training.
The results of the comparison experiment are shown in Table 4, from which it can be concluded that the MFBL model proposed in this paper achieves the best results in terms of accuracy, recall, and F1-score. The result indicates the effectiveness of the proposed method in Chinese address recognition. According to the table, the second group, which adopts the entity recognition method combined with the attention mechanism, improves the overall effect of the model. This shows that adding the attention mechanism helps the model learn effective features from a global perspective. The results of the third group show that using CNN to obtain effective local features can also improve the performance of the model. Meanwhile, comparing the results of the fourth group with those of the second and third groups shows that the model proposed in this paper outperforms the other models by about 4-5 percentage points in terms of F1-score. This indicates that a model cannot effectively capture some key information in the address when it considers only the attention mechanism or only the local information obtained by CNN.
At the same time, the F1-score suggests that the accuracy improvement of the MFBL model is not an artifact of the proportion of positive and negative samples in the dataset; rather, the overall learning ability of the model is enhanced compared with the other ablation models.

3) EXPERIMENTAL SUMMARY
To address the problem that the redundant information of Chinese addresses cannot be recognized, a Chinese address recognition model based on multi-feature fusion is proposed in this paper. The experimental results show that this method performs well on many metrics. Compared with the simple BiLSTM-CRF model, using a CNN network and an attention mechanism to extract the local and global features of address data effectively improves the performance of the model. In this paper, the particularity of addresses is considered in Chinese address recognition, and the semantic features of addresses are extracted from multiple dimensions. However, the association between an address and the geographical entity is not studied. In the next step, we will try to introduce information such as a geographic information graph to enhance the accuracy of recognition, and the ability to generalize to unseen address datasets also needs further study.

V. CONCLUSION
This paper proposes a BiLSTM address semantic recognition model based on multi-feature fusion. It uses a syntax tree structure to construct a standard address tree, followed by a deep learning network architecture model based on the fusion of character, word, and address features, which converts character and word tokens and their features into vector representations to obtain sentence semantic information, and finally performs sequence annotation of sentences through CRF to achieve address information recognition and extraction. The experiments show that the MFBL model proposed in this paper has better performance in the address recognition task. However, with the development of urbanization, the descriptions of Chinese addresses vary greatly and change frequently. In the face of the complexity and irregularity of Chinese addresses, our method has some limitations. To solve this problem, we need to collect a wider range of datasets, optimize the data with appropriate methods, and further improve the experimental methods, so that we can more accurately understand the semantic information of Chinese addresses and complete the semantic identification of addresses.
MENG WANG was born in 1993. He received the master's degree from Anhui Normal University, in 2020. He has been working as an Engineer with Chery HuiYin Motor Finance Service Company, Ltd., since 2021. His research interest includes machine learning.
CHAOLING DING was born in 1988. He received the master's degree from Huangshan University, in 2011. He has been working as an Engineer with Chery HuiYin Motor Finance Service Company Ltd., since 2020. His research interest includes data mining.
XINGHUA YANG was born in 1996. He received the master's degree from the Anhui University of Finance and Economics, in 2022. He has been working as an Engineer with Chery HuiYin Motor Finance Service Company Ltd., since 2022. His research interest includes NLP.
JIAN CHEN was born in 1989. He received the master's degree from Sichuan University, China, in 2014. He has been an Assistant Engineer with the Big Data Laboratories, Yangtze River Delta Information Intelligence Innovation Research Institute, since 2020. His research interests include natural language processing and data mining.