An Automatic Mapping Method of Intelligent Recorder Configuration Datasets Based on Chinese Semantic Deep Learning

Mapping the output interface address description datasets of Intelligent Electronic Devices (IEDs) to the intelligent recorder is the groundwork for the recorder to accurately collect IED operation information. These datasets, which are also the intelligent recorder configuration datasets, are included in the Substation Configuration Description (SCD) file of an intelligent substation. The mainstream approach is to map these datasets manually based on the Chinese description texts of the output interfaces. When the number of IEDs is very large, manual mapping takes a huge amount of time and incurs high labor cost. In addition, the Chinese description texts are somewhat irregular, which complicates automatic mapping of the datasets. To address this problem, this paper proposes an automatic mapping method for IED configuration datasets based on a deep learning framework, the Dynamic Convolutional Neural Network (DCNN). First, the word representation model Word2vec is used to vectorize the words in the Chinese description texts together with their semantic relationships. The word vectors are then fed into the DCNN, which, owing to its multilayer abstract learning of typical sample features, can mine semantic regularities and classify the texts automatically. The configuration datasets of the intelligent recorder are then mapped automatically according to the classification result of the Chinese descriptions. A practical example shows that the Chinese description text classification method based on the DCNN model has strong semantic analysis ability and high classification accuracy, which effectively improves the accuracy of automatic mapping of intelligent recorder configuration data.


I. INTRODUCTION
The intelligent recorder, which integrates the functions of transient signal recording, network message recording, on-line monitoring, diagnosis of secondary equipment, and fault information management for relay protection, is an important device for the operation and maintenance of the intelligent substation. The substation configuration description (SCD) file, which is written according to the IEC 61850 protocol, is the key input file for an intelligent recorder to accurately monitor the operation information of Intelligent Electronic Devices (IEDs). When the recorder is put into operation, it collects the IED operation information and divides it into three information groups for classified monitoring: the strap information group, the alarming information group and the status monitoring information group. Each group contains sub information groups, such as the hard strap information group, the SV receiving strap information group and the function strap information group. Mapping the address of each IED data output interface in the SCD file to the corresponding information group of the recorder is the basic step to ensure that the recorder can monitor different types of IED operation information accurately and in real time. Therefore, the IED data output interface address description datasets are also the intelligent recorder configuration datasets. At present, the mainstream method is to map each interface address description dataset to the right information group manually, according to the corresponding interface Chinese description text in the SCD file. The classification accuracy of the Chinese description texts therefore directly determines the mapping accuracy of these configuration datasets. High-voltage, large-scale substations contain many IEDs, so the workload of manual classification is extremely large. For example, the SCD file of a 500kV substation contains information on nearly 300 intelligent secondary devices and tens of thousands of Chinese description texts, which makes configuration take more than a month. Besides, automatic mapping of the datasets is hampered by differences among the Chinese description texts. Relevant standards impose semi-structured constraints on the Chinese description texts of different IEDs, but recognizing their semantic regularities remains a problem.
The associate editor coordinating the review of this manuscript and approving it for publication was Shuaihu Li.
In order to meet the high-precision requirements of the intelligent recorder automatic mapping task and strengthen its ability to analyze the semantics of semi-structured text, it is necessary to conduct deep semantic learning on the Chinese description texts of a large number of IED data output interfaces, and then design the automatic classification mapping unit in the recorder.
Classification of Chinese description texts consists of four steps: word segmentation, numerical representation of the words, feature extraction and classification. Word segmentation usually relies on a dictionary constructed with more than 10000 words and uses toolkits to divide the texts. Traditional numerical representation methods usually rely on relevant algorithms to map text words to the numerical space [1], [2]. Traditional feature extraction methods rely on an eigenvalue function to screen eigenvalues as features [3]-[5]. Traditional classification models include the Decision Tree [6], [7], the Bayesian classifier [8], [9], the Support Vector Machine [10], [11], etc. However, the traditional numerical representation methods have two major problems: the semantic gap and dimension explosion; the traditional feature extraction methods have poor ability to identify typical features; and the traditional classification models rely heavily on specific tasks and process text association relationships only roughly [12].
The Convolutional Neural Network (CNN) is a typical deep learning structure which integrates feature extraction and classification into a single step. It can use multiple convolution kernels to collect feature values in different regions of the input matrices. Feature vectors representing key semantics are output at the top layer of the network and then classified. F. Xu et al. introduce the CNN to Chinese text sentiment analysis, and the semantic recognition performance of the model is better than that of traditional machine learning models [13]. B. Gu et al. introduce the CNN model to the human action recognition task, which effectively improves the accuracy of the recognition result under different conditions and the efficiency of the classification process [14]. B. Zhao and Z. Yang introduce CNN models optimized by the attention mechanism and the crisscross algorithm to power grid load forecasting. The models have small prediction error and great application potential [15], [16]. In short, the Convolutional Neural Network performs well in feature extraction and classification prediction tasks, and can therefore be used for deep learning of Chinese texts.
This paper presents an automatic mapping method of intelligent recorder configuration datasets based on the DCNN (Dynamic Convolutional Neural Network). Firstly, the SCD file is used to obtain the address description datasets of the IED data output interfaces and the corresponding Chinese description texts. Then, the text representation model Word2vec, which is built on a BP neural network, is introduced to generate word vectors that reflect the association relationships between words. Next, DCNN models with different key hyper-parameters are constructed and used to classify the datasets, in order to find the optimal model with the best ability of feature recognition and text classification. Finally, the intelligent recorder configuration datasets are automatically mapped based on the classification result of the description texts.

II. PREPROCESSING OF INTELLIGENT RECORDER CONFIGURATION DATASETS
Preprocessing of intelligent recorder configuration datasets includes obtaining datasets about IEDs output interface address description information from SCD file, matching the interface address with its Chinese description texts, and representing Chinese texts.

A. OBTAINING CONFIGURATION DATASETS
The configuration datasets of the intelligent recorder are the address description datasets of the IED data output interfaces. They are obtained by analyzing the SCD file, which is written in the extensible markup language (XML) and consists of nodes, sub nodes, node attributes and their values. Under each IED node, there are sub nodes including LDevice (logical device), LN0 (logical node zero), and FCDA (function constraint data attribute), which contains the index information of the Chinese description text of the IED interface. A single output interface address can be described by combining a set of FCDA attribute values. At the same time, the FCDA attribute values can also be used as the basis for obtaining the desc attribute value of the sub node DOI (object instance) in the sub node LN (logical node), which is the Chinese description text of the data output interface. The xml.etree.ElementTree [17] module is a commonly used XML file analyzing tool in Python. It builds an ''Element Tree'' to describe the different nodes and their affiliation in the original XML file, and the topmost node is called ''Root''. All IED output interface address description datasets and Chinese description texts can be obtained by traversing all nodes and attribute values, and these two kinds of information can be matched one by one to ensure that the address information mapping result is the same as the text classification result. Pseudocodes of the obtainment algorithm are shown below.
VOLUME 8, 2020

Algorithm 1 Obtainment of IEDs Output Interfaces Address Description Datasets and Chinese Descriptions (for One Interface)
Parameters: W_ij: 1 × 7 variable that stores the address description dataset of the j-th output interface of the i-th IED and its Chinese description text
Inputs: The SCD file of an intelligent substation
BEGIN
Initialize W_ij with random values
Use the xml.etree.ElementTree module to analyze the SCD file
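The traversal described above can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the tag and attribute names (IED, LDevice, LN, DOI, FCDA, ldInst, lnClass, and so on) follow the IEC 61850-6 SCL schema as commonly used, while the miniature SCD fragment, the function name extract_interfaces and the English desc value standing in for a Chinese description are all hypothetical. It also assumes that each FCDA refers to a logical node in the same LDevice.

```python
import xml.etree.ElementTree as ET

# Namespace of SCL files (IEC 61850-6).
SCL_NS = {"scl": "http://www.iec.ch/61850/2003/SCL"}
# FCDA attributes that together describe one output interface address.
FCDA_ATTRS = ("ldInst", "prefix", "lnClass", "lnInst", "doName", "daName", "fc")

# Hypothetical miniature SCD fragment, for illustration only.
SAMPLE_SCD = """<SCL xmlns="http://www.iec.ch/61850/2003/SCL">
  <IED name="P_L1101A">
    <AccessPoint name="S1"><Server>
      <LDevice inst="PROT">
        <LN0 lnClass="LLN0" inst="">
          <DataSet name="dsAlarm">
            <FCDA ldInst="PROT" prefix="" lnClass="GGIO" lnInst="1"
                  doName="Alm1" daName="stVal" fc="ST"/>
          </DataSet>
        </LN0>
        <LN prefix="" lnClass="GGIO" inst="1">
          <DOI name="Alm1" desc="GOOSE link 3 network A disconnected"/>
        </LN>
      </LDevice>
    </Server></AccessPoint>
  </IED>
</SCL>"""


def extract_interfaces(scd_text):
    """Return one record per FCDA: IED name, FCDA attribute tuple, DOI desc."""
    root = ET.fromstring(scd_text)
    records = []
    for ied in root.iterfind("scl:IED", SCL_NS):
        for ldevice in ied.iterfind(".//scl:LDevice", SCL_NS):
            # Index DOI descriptions by (prefix, lnClass, inst, DO name).
            desc_index = {}
            for ln in ldevice.iterfind("scl:LN", SCL_NS):
                for doi in ln.iterfind("scl:DOI", SCL_NS):
                    key = (ln.get("prefix", ""), ln.get("lnClass"),
                           ln.get("inst", ""), doi.get("name"))
                    desc_index[key] = doi.get("desc", "")
            # Assumption: every FCDA points into this same LDevice.
            for fcda in ldevice.iterfind(".//scl:FCDA", SCL_NS):
                key = (fcda.get("prefix", ""), fcda.get("lnClass"),
                       fcda.get("lnInst", ""), fcda.get("doName"))
                records.append({
                    "ied": ied.get("name"),
                    "address": tuple(fcda.get(a, "") for a in FCDA_ATTRS),
                    "desc": desc_index.get(key, ""),
                })
    return records


records = extract_interfaces(SAMPLE_SCD)
```

Each record pairs an address with its description text, so a later classification of the text can be carried back to the address it belongs to.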

B. REPRESENTATION OF CHINESE DESCRIPTION TEXTS
1) FEATURES OF CHINESE DESCRIPTION TEXTS
Compared with common texts, the Chinese description texts of IED data output interfaces usually involve electrical terminology, such as '' (remote modification of fixed value soft strap)'', '' (remote operation hard strap)'' and so on. In the word segmentation stage, such terms are easily segmented incorrectly, which leads to incorrect word vectors. Therefore, the word representation model in this paper introduces user-defined terms into the Python word segmentation toolkit ''jieba'' [18] to improve the accuracy of word vector representation. At the same time, the texts are often a mixture of Chinese and English, such as '' (the switch position of PT switch is invalid)'', and texts that convey similar information may be worded differently, such as '' 3GOOSE A (GOOSE receiving network A of link 3 is disconnected)'' and '' (the GOOSE receiving process of GOCB1 within the process layer network A is interrupted)''. When an expert system is used for automatic mapping, it is difficult to guarantee the completeness of the reasoning rules, so rules end up missing or redundant. In this model, the above characteristics are considered in the text representation stage to ensure the accuracy of the numerical representation vectors of the Chinese texts.

2) THE CHINESE DESCRIPTION TEXTS REPRESENTATION MODEL BASED ON Word2vec
Before the description texts are input into the classification model, it needs to be encoded as a word vector matrix. The traditional word representation model BOW (Bag of Words) assigns number '1' to a single dimension in the word vector according to the index value of the sample word in the dictionary and assigns number '0' to other dimensions. Because different word vectors are orthogonal to each other, each word exists in isolation. It is difficult for vectors to represent the semantic relationship between words. In addition, a BOW word vector is high-dimensional and sparse, which makes the classification process time-consuming and inefficient.
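The orthogonality argument can be checked with a small sketch. The vocabulary below is a hypothetical four-word dictionary and the helper names one_hot and dot are illustrative only; the point is that the dot product of the vectors of any two different words is zero, so the representation carries no similarity information, and the vector length grows with the dictionary size.

```python
def one_hot(word, vocabulary):
    """Assign 1 to the dimension given by the word's dictionary index, 0 elsewhere."""
    vector = [0] * len(vocabulary)
    vector[vocabulary.index(word)] = 1
    return vector


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


# Hypothetical dictionary; a real one would hold thousands of words.
vocabulary = ["remote", "operation", "hard", "strap"]
hard = one_hot("hard", vocabulary)
strap = one_hot("strap", vocabulary)
similarity = dot(hard, strap)  # zero for any two different words
```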
The Word2vec text representation model [19], [20], which comes in two forms (Continuous BOW and Skip-Gram) and is built on a BP neural network, can effectively reduce the dimension and sparsity of word vectors and improve their ability to represent the semantic relationships of the original words. The model sets a local text analysis window and assumes that the context word vectors in the window are known. It then takes maximizing the occurrence probability of the central word as the training target, so as to map words into the semantic space and obtain dense, static word vectors of reduced dimension.
In this paper, the Continuous BOW model is used to represent the words in the Chinese description texts of IED data output interfaces, and the negative sampling method is used to optimize the weight-updating process. As the text analysis window slides, each word can appear both as the central word and as a context word. Therefore, the Continuous BOW model contains a context vector matrix and a central word vector matrix, both of which are randomly initialized. When a word appears as a context word, the model inputs its BOW word vector to obtain its context vector. When the word appears as the central word, the input value of the output layer is obtained by multiplying the central word vector of the word by the context word vectors of the nearby words. Training the Continuous BOW model updates the parameters of both matrices. Finally, the word vector of each word is the average of its context word vector and central word vector. The final numerical representation of a single text is a matrix constructed by concatenating its word vectors in sequence. The schematic diagram of Continuous BOW is shown in Figure 1, in which w_1, ..., w_c are the words of a sample text, w_i is the central word, W_1 is the context vector matrix and W_2 is the central word vector matrix. The calculation from the hidden layer to the output layer involves computing the occurrence probability of every word and finding the maximum value, which makes the calculation complex and time-consuming.
In order to improve the efficiency, the model introduces the negative sampling method [20], which takes one context word in the analysis window as the positive sample and draws five other words not included in the analysis window as negative samples according to a selection probability, so that the number of matrix element values updated at a time is reduced to 5% of the original number. Thus, the calculation time is greatly reduced. The selection probability is shown in (2), where w_j is the positive sample and f(w_j) is the occurrence frequency of the negative sample word w_j. The loss function is shown in (3), in which W_neg is a set of words not to be solved and σ is the Sigmoid function.
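Equation (2) is not reproduced in this text. Assuming it takes the standard Word2vec form, in which a word is drawn with probability proportional to its frequency raised to the power 3/4, the selection of negative samples can be sketched as below. The frequency table and the function names are hypothetical, and sampling is done with replacement for simplicity.

```python
import random


def negative_sampling_probs(freq, power=0.75):
    """Selection probability proportional to f(w) ** power (assumed standard form)."""
    weights = {word: count ** power for word, count in freq.items()}
    total = sum(weights.values())
    return {word: weight / total for word, weight in weights.items()}


def draw_negatives(freq, excluded, k=5, seed=0):
    """Draw k negative samples from the words outside the analysis window."""
    candidates = {w: f for w, f in freq.items() if w not in excluded}
    probs = negative_sampling_probs(candidates)
    words = list(probs)
    rng = random.Random(seed)
    return rng.choices(words, weights=[probs[w] for w in words], k=k)


# Hypothetical word frequencies counted over a corpus.
freq = {"strap": 10, "alarm": 5, "link": 2, "invalid": 1}
probs = negative_sampling_probs(freq)
negatives = draw_negatives(freq, excluded={"strap"}, k=5)
```

The 3/4 exponent flattens the distribution, so rare words are drawn somewhat more often than their raw frequency would suggest.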
The Convolutional Neural Network (CNN) is the most powerful deep learning network framework for local feature mining in computer image processing tasks. In the field of text classification, when the text is represented as word vectors, the vector values can be seen as image gray values, and the word vector matrix can be processed as a one-dimensional gray image, which is input to the CNN for local semantic feature analysis, key feature value selection and classification. A traditional text classification CNN includes a convolutional layer, a pooling layer and a full connection layer. The pooling layer usually adopts the maximum pooling method to reduce the dimension of the input vectors, so the output uses only a single key feature value to describe each input vector and ignores the correlation between the omitted values and the selected feature value. Thus, the semantic learning ability of the traditional CNN is weak. In view of this, this paper introduces the DCNN (Dynamic Convolutional Neural Network) as the Chinese text classifier. It adopts a dynamic K-max pooling layer to reduce the dimension of the input vectors, and the number of key feature values extracted depends on the length of each text sentence, so that the relative position of the words and the word order information of the original text are retained. In addition, the wide convolutional layer in the DCNN model captures not only the local feature values but also the edge information of the text more comprehensively. Generally speaking, the DCNN structure includes four kinds of network layers: the wide convolutional layer, the K-max pooling layer and dynamic K-max pooling layer, the folding layer and the full connection layer. The structure schematic diagram is shown in Figure 2. The input text is '' the GOOSE receiving process of GOCB1 within the process layer network A is interrupted '', which has seven words in Chinese.
In the wide convolutional layer, a 1×m-dimension wide convolution kernel is used to convolve the vectors from different dimensions of the word vector matrix, where m is the convolution dimension, w is a single dimension value of the convolution kernel, x is a single dimension value of a word vector, o is the output of a single dimension value after convolution, and b is the model offset. The calculation process is shown in (5). Unlike common narrow convolution kernels, wide convolution kernels process vectors whose edges are padded with zeros, so that the semantic information at the edges is retained.
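Equation (5) is not reproduced in this text. Assuming the usual form, in which each output o is the weighted sum of m neighboring values x with the kernel weights w plus the offset b, wide convolution over one dimension of the word vector matrix can be sketched as follows. The function name and the toy numbers are illustrative only; the kernel is applied without flipping, as is common in neural network libraries.

```python
def wide_convolution(x, w, b=0.0):
    """One-dimensional wide convolution of sequence x with kernel w.

    The sequence is padded with m - 1 zeros on each side, so an input of
    length n yields n + m - 1 outputs and the edge values take part in as
    many windows as the interior values.
    """
    m = len(w)
    padded = [0.0] * (m - 1) + list(x) + [0.0] * (m - 1)
    return [
        sum(w[j] * padded[i + j] for j in range(m)) + b
        for i in range(len(x) + m - 1)
    ]


# Toy example: a sequence of three values and a kernel of width two.
output = wide_convolution([1, 2, 3], [1, 1])
```

A narrow convolution of the same input would produce only n - m + 1 = 2 outputs, and the first and last values would each be seen by a single window.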
The K-max pooling layer and the dynamic K-max pooling layer are used to filter the key feature values. The K-max pooling layer selects the K largest feature values from each dimension of the input vector, and the selected values retain the typical semantic and order information of the input vector to the greatest extent. When the order of feature values increases, pooling with a fixed number of maximum values easily leads to redundancy in the selected semantic information. To ensure that the number of values kept by the pooling layer adapts to the length of the input text sentence, the dynamic K-max pooling layer sets K as a function of the sentence length and the network depth, as shown in (6), where n is the input sentence length, l is the index of the current layer among all convolutional layers, L is the number of convolutional layers, and K_top is the K value of the top-level pooling operation.
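Equation (6) is not reproduced in this text. In the original DCNN formulation, K for layer l is the larger of K_top and the ceiling of (L - l) / L times n, and the sketch below assumes that form. The function names are illustrative; the pooling keeps the selected values in their original order, which is what preserves the word order information.

```python
def dynamic_k(n, l, L, k_top):
    """K for the l-th of L convolutional layers, given sentence length n."""
    scaled = -(-((L - l) * n) // L)  # integer ceiling of (L - l) * n / L
    return max(k_top, scaled)


def k_max_pooling(values, k):
    """Keep the k largest values of one dimension, in their original order."""
    top = sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]
    return [values[i] for i in sorted(top)]


# A three-layer network on an 18-word sentence with K_top = 3.
ks = [dynamic_k(n=18, l=layer, L=3, k_top=3) for layer in (1, 2, 3)]
pooled = k_max_pooling([1, 5, 2, 9, 3], k=3)
```

Lower layers therefore keep more values for longer sentences, while the top layer always outputs K_top values, so the full connection layer receives a fixed-size input.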
In the folding layer, the adjacent two-dimensional vector values of the input matrix are summed and spliced. In the previous calculation process, the convolution operation is only carried out for each dimension of the input vector and different dimensions are independent of each other. The folding layer can map the relationship between adjacent dimensions, and the input vector dimension can be reduced by half. Finally, the full connection layer realizes the text classification.
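The folding operation can be sketched as a sum over each pair of adjacent rows of the feature matrix, where each row corresponds to one dimension of the word vectors. The function name and the toy matrix are illustrative, and an even number of rows is assumed.

```python
def fold(matrix):
    """Sum every two adjacent rows, halving the number of dimensions."""
    if len(matrix) % 2 != 0:
        raise ValueError("folding needs an even number of rows")
    return [
        [a + b for a, b in zip(matrix[i], matrix[i + 1])]
        for i in range(0, len(matrix), 2)
    ]


# Four dimensions, two positions per dimension.
folded = fold([[1, 2], [3, 4], [5, 6], [7, 8]])
```

Folding adds no parameters; it only lets values from neighboring dimensions interact before the next layer.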
The DCNN classification prediction results will be imported into the Python dictionary that matches the IEDs output interface Chinese description texts and their address description datasets, which makes preparation for the datasets mapping into the configuration file of the intelligent recorder.

2) INPUTTING CONFIGURATION DATASETS INTO THE INTELLIGENT RECORDER CONFIGURATION FILE
The configuration file of the intelligent recorder is also written in XML. Each IED node contains a node set constructed from the three major information groups and their sub information groups. These nodes also contain a blank address description sub node to be filled with the address description datasets of the IED output interfaces. This paper uses the xml.etree.ElementTree module in Python to read the classification results of the Chinese description texts before inputting the address information, and loops over the information groups of each IED node to find the target category according to the classification results. Then the model automatically inputs the interface address description datasets into the address description sub node, completing the automatic mapping of the datasets. When the recorder is put into use, it can automatically find the address of each IED data output interface by analyzing the address description sub nodes contained in each information group of its configuration file, so as to accurately monitor IED operation information. Pseudocodes of the algorithm for inputting datasets into the configuration file are shown below.
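The filling step can be sketched with xml.etree.ElementTree as follows. The layout of the recorder configuration file is not given in this text, so the tag names RecorderConfig, Group and Address, the group names and the address string are all hypothetical placeholders; only the ElementTree calls themselves are real.

```python
import xml.etree.ElementTree as ET

# Hypothetical recorder configuration with empty information groups.
SAMPLE_CONFIG = """<RecorderConfig>
  <IED name="P_L1101A">
    <Group name="function_strap"/>
    <Group name="optical_alarm"/>
    <Group name="online_monitoring"/>
  </IED>
</RecorderConfig>"""


def map_to_config(config_root, ied_name, classified):
    """Write each address under the information group predicted for it.

    classified is a list of (address, group_name) pairs produced by the
    text classifier. Pairs whose group is not found are returned unmapped.
    """
    ied = config_root.find(f"IED[@name='{ied_name}']")
    if ied is None:
        raise KeyError(ied_name)
    unmapped = []
    for address, group_name in classified:
        group = ied.find(f".//Group[@name='{group_name}']")
        if group is None:
            unmapped.append((address, group_name))
            continue
        node = ET.SubElement(group, "Address")
        node.text = address
    return unmapped


root = ET.fromstring(SAMPLE_CONFIG)
unmapped = map_to_config(
    root,
    "P_L1101A",
    [("PROT/GGIO1.Alm1.stVal", "optical_alarm"), ("PROT/X", "unknown_group")],
)
filled = root.find(".//Group[@name='optical_alarm']/Address").text
```

Returning the unmapped pairs instead of dropping them silently makes it easy to review any interface whose predicted group is missing from the configuration file.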

A. INFORMATION ABOUT SAMPLES
This paper selects 2500 classified Chinese description texts of IED data output interfaces recorded in 110kV and 500kV substations in Hebei Province as analysis samples. The description texts and the information groups to which the datasets belong are complete. The samples are divided into the train set, the validation set and the test set in a ratio of 6:2:2. The model uses the train set and validation set to update model parameters and the test set to evaluate model performance in each epoch. Three typical subcategories are selected as the target classes to which the sample datasets belong: the functional strap information group in the strap information group, the optical brand alarming information group in the alarming information group, and the online monitoring information group in the status monitoring information group. Some text samples are shown in Table 1. The model is programmed in Python 3.6.8; the word vectors are constructed with the gensim module, and the DCNN classifier is built with the TensorFlow toolkit. The CPU of the computer is an Intel Core i7-8565U with a main frequency of 1.8 GHz, the memory is 8 GB, and the capacity of the solid-state disk is 256 GB.
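A 6:2:2 division of this kind can be sketched as below. The shuffling, the fixed seed and the function name are assumptions made for illustration; the text does not say how the samples were assigned to the three sets.

```python
import random


def split_samples(samples, ratios=(6, 2, 2), seed=0):
    """Shuffle the samples and split them into train, validation and test sets."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    total = sum(ratios)
    n_train = len(items) * ratios[0] // total
    n_val = len(items) * ratios[1] // total
    train = items[:n_train]
    validation = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, validation, test


# Stand-in for the 2500 labeled description texts.
train, validation, test = split_samples(range(2500))
```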
F1 is the most widely used evaluation metric in binary classification. Table 2 shows the confusion matrix, which is the basis for defining F1. P, R, and F1 can be defined as:

$$P = \frac{TP}{TP + FP}, \quad R = \frac{TP}{TP + FN}, \quad F_1 = \frac{2PR}{P + R}$$

where TP, FP and FN are the numbers of true positives, false positives and false negatives in the confusion matrix. However, the texts can be divided into three groups (i.e. the functional strap information group / the optical brand alarming information group / the online monitoring information group), so it is a three-class problem. Hence, macro-F1 (MF1) is adopted to measure the average performance over all groups. MF1 is calculated by averaging F1 of all groups.
where MP and MR are macro-P and macro-R, respectively. They are defined in (9).
where P_m and R_m are the calculation results corresponding to the class l_m (m = 0, 1, 2).
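The macro metrics can be computed as sketched below. The equations for MF1, MP and MR are not reproduced in this text, so the sketch assumes that MP and MR are the averages of the per-class precision and recall and that MF1 is their harmonic mean. This differs slightly from averaging the per-class F1 values, which is the other common definition. The function name and the toy labels are illustrative.

```python
def macro_scores(y_true, y_pred, classes=(0, 1, 2)):
    """Return (MP, MR, MF1) under the assumed macro-averaging definitions."""
    precisions, recalls = [], []
    pairs = list(zip(y_true, y_pred))
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    mp = sum(precisions) / len(classes)
    mr = sum(recalls) / len(classes)
    mf1 = 2 * mp * mr / (mp + mr) if mp + mr else 0.0
    return mp, mr, mf1


# Toy labels: one sample of class 0 is misclassified as class 1.
mp, mr, mf1 = macro_scores([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 2])
```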

B. CALCULATION AND ANALYSIS OF THE EXAMPLE
1) EVALUATION OF Word2vec REPRESENTATION ABILITY
The Word2vec model can represent the semantic relationships between words in the original text. To demonstrate this, this paper introduces the principal component analysis (PCA) method, a common algorithm that reduces the dimension of high-dimensional vectors and converts them to low-dimensional coordinates in another vector space. After the Word2vec vectors of different words are processed by the PCA method, the semantic relationships between words can be measured by the distances between their coordinates in the three-dimensional semantic space, and these distances are used to evaluate the Word2vec representation ability. If the distances between words with close semantic relationships are much shorter than those between words with weak semantic relationships, the Word2vec representation ability is shown to be strong. (high voltage side)'' and '' (process layer)'' coming from different description texts as central words. To make the relationships between different words more obvious, the five words most closely related to each central word are also selected as background words. In total, 18 Word2vec word vectors are processed by the PCA method and mapped into the three-dimensional semantic space. The results are shown in Figure 3. Since each central word appears in a single description text together with its own background words only, and not with the other 12 words, the distances between each central word and its background words are significantly shorter than the distances to the other words, which shows that each central word has a closer semantic relationship with its background words. As a result, three clusters centered on the three central words can be clearly observed in the three-dimensional semantic space. Equation (10) defines the relativity between word coordinates w_l (x_l, y_l, z_l) and w_m (x_m, y_m, z_m), in which L_max is the largest distance between words in the three-dimensional semantic space.
The bar charts of the relativity calculation results are shown in Figure 4, in which three groups of bars represent the relativities between the three central words and their closely related background words.

2) EVALUATION AND OPTIMIZATION OF DCNN CLASSIFICATION ABILITY
The hyper-parameters are set to fixed values before model training and usually need to be tuned to improve the performance and effectiveness of learning. Four key hyper-parameters are closely related to the ability of the model. The learning rate and batch size directly determine the speed and range of model parameter updates within a single batch. The convolution kernel size decides the size of the key semantic feature value: the larger the kernel, the more features are considered at a time. The dropout ratio of the full connection layer directly affects the generalization ability of the model. This paper adjusts the hyper-parameters with reference to the sentiment prediction project on movie reviews [21]. Firstly, the convolution kernel size is set to 3 and the dropout ratio dp to 0.5, and contrast experiments between models with different learning rates and batch sizes show that the model with learning rate lr = 0.01 and batch size bs = 64 has the highest test set MF1. The experiment results are shown in Figure 5. Secondly, contrast experiments between models with different kernel sizes and dropout ratios are conducted to choose the kernel size and dropout ratio. Three groups of convolution kernel sizes are set according to the sample text length and the kernel sizes commonly used in text convolution networks. Three groups of dropout ratios are also set up, in which 0.8 means that few values are retained, which may lead to the loss of key semantics, and 0.2 is close to complete retention, which results in poor generalization performance and excessive dependence on local features. The experiment results are shown in Figure 6. According to the results, when the kernel size equals 4 and the dropout ratio equals 0.5, the MF1 of the test set reaches the highest value (Table 3).
In order to reflect the superior performance of the automatic mapping model DCNN, this paper selects four typical traditional classification models and two kinds of shallow neural network classifiers for comparison. The training set is used to update model parameters, and the evaluation standard is the test set MF1. Four kinds of traditional
It can be seen that the MF1 of the classification model based on Word2vec and DCNN is significantly higher than that of the other models, which proves that the classification ability of the DCNN is excellent.

V. CONCLUSION AND FUTURE WORK
When the IED data output interface address description datasets are mapped to the different information groups according to their Chinese description texts, the manual mapping workload is large, and automatic mapping can have a high error rate because of the complexity of the data and the semi-structured nature of the description texts. This paper proposes an automatic mapping method of address description datasets based on the Dynamic Convolutional Neural Network. After the Chinese description texts are obtained and represented through the BOW model, Word2vec is used to reduce the dimension of the BOW word vectors, map the semantic relationships between different words and thereby calculate the final word vectors. Then, the DCNN model is used to select the local semantic feature values of the word vectors and to identify and extract key semantics for classification, which greatly improves the classification accuracy. The classification time is far less than the manual configuration time, and the comprehensive evaluation index macro-F1 of the test set samples exceeds 97%, which is significantly higher than that of the traditional classification models and shallow neural networks. By providing the mapping result for the configuration datasets input into the intelligent recorder configuration file, the DCNN model effectively improves the accuracy of automatic mapping of recorder configuration datasets.