MetaGNN-Based Medical Records Unstructured Specialized Vocabulary Few-Shot Representation Learning

With continuous breakthroughs in artificial intelligence, extracting general-purpose knowledge with machine learning has become easier, but extracting and learning medical expertise from small samples remains challenging. On the one hand, medical expertise entities are difficult to represent; on the other hand, training samples for such expertise are few, while deep learning methods typically require large numbers of samples to complete a learning task. To this end, we propose a graph network learning method for specialized vocabulary representation. Specifically, we propose a contextual knowledge representation model based on graph meta-learning, which combines text, phrase, vocabulary, and other information to address the data sparsity that prevents entities in electronic medical records from being extracted and learned. The method combines a text-independent lexical representation learning method, a context-aware graph neural network, and an LSTM language model to model information from different perspectives and thereby learn semantic representations of specialized vocabulary entities. The experimental results show that the method outperforms other similar methods in accuracy, demonstrating its effectiveness.


I. INTRODUCTION
In the past two years, significant progress has been made in graph neural network research. Several graph neural network aggregation models have emerged, each suited to specific scenarios. In the text relation extraction task, combinations of graph and LSTM models, and of graph networks with meta-learning, have achieved good results [2], [3]. According to related research, graph neural networks fall under three general frameworks: the first, the message passing neural network (MPNN) [4], unifies graph neural network and graph convolutional network (GCN) methods [5]; the second, the non-local neural network (NLNN) [6], combines various attention mechanisms; the last, the graph network (GN), unifies MPNN and NLNN and includes the GGNN model, the Relation Network [7], Interaction Networks, and others.
However, although these network models perform well in large-sample learning, they still perform poorly in small-sample specialized vocabulary representation learning, such as electronic medical record representation, Chinese bioinformatics relation extraction, electronic health record information extraction, and medical patient diagnosis classification [8], [9], [10], [11]. The text corpus of medical professional vocabulary is generally small and beyond the expertise of most people; a certain medical background is required to understand it. The current mainstream approach for extracting medical expertise and textual representations is to extract expertise directly from the text corpus in a data-driven way. However, this extraction approach expresses only a single relationship, which is insufficient to accurately model conceptual expert knowledge in the schema. In addition, such specialized lexical knowledge contains many polysemous and ambiguous words. These ambiguities seriously affect representation learning for text tasks, causing semantic bias in the representation vectors. Models from the RNN to the LSTM work well on contextual text data [12] but struggle to capture the grammatical structure of words. Some scholars have studied the syntactic information of sentences, for example with tree-structured LSTM methods [13]. However, obtaining syntactic information requires a large amount of data; with little data, it is difficult to recover the dependency structure among words. Accurately modeling the dependency structure expressed by the contextual relations of specialized lexical knowledge is therefore a challenging task. The contributions of this paper are as follows.
1) To address medical specialized vocabulary representation learning, we propose a graph meta-learning classification method over network representations. Specifically, we use graph networks to represent feature information of medical vocabularies, such as the semantics of word context and the grammatical structure of words, and then combine them with meta-learning methods to address the difficulty of learning from few medical samples.
2) To address the difficulty of extracting professional medical vocabulary, phrases, and text information, we propose combining a graph neural network with an LSTM language model to learn unstructured, polysemous medical vocabulary.
This study is organized into five parts. Section I outlines the research background and contributions of this paper. Section II introduces work related to graph networks, meta-learning, LSTM fusion, and graph-network-based knowledge representation learning. Section III presents the proposed methods. Section IV reports comparison experiments on public and real datasets, together with ablation experiments and a case analysis. Section V summarizes the work.

II. RELATED WORK

A. GRAPH NEURAL NETWORKS AND KNOWLEDGE REPRESENTATION
Graph networks and graph embeddings are already quite mature in the image field [14]. Commonly used methods include DeepWalk [15], node2vec [16], SDNE [17], and graph2vec [18]. These methods have achieved great success in network representation learning. In the text field, however, context relationships are asymmetric, and polysemous vocabulary degrades learning. To reduce these negative effects, more semantic information must be learned, so a combined network model is a good choice.
In general, GNN applications are classified into three categories: structured scenarios, unstructured scenarios, and others. Graph neural networks themselves are divided into five categories [26], [27]; see Table 1 for details. Previous work generally used one-hot encoding to represent entity knowledge, but using graph neural networks to encode structural knowledge and attribute entities for fine-grained modeling of entities, vocabulary, and so on allows the introduction of multi-angle feature knowledge [19]. Knowledge representation based on graph networks can flexibly use local, spatial, and other structural information and can effectively address polysemy [20]. Using graph networks for knowledge representation has therefore become a hot research topic in recent years; for example, Ebisu, Balazevic, Zhang, and other scholars have used graphs to accomplish different tasks [21], [22], [23].

B. META-KNOWLEDGE REPRESENTATION LEARNING OF SPECIALIZED VOCABULARY
Meta-learning is based on learning over tasks: the goal is to learn prior knowledge from sample data so that the model acquires the ability to learn and can then quickly learn new tasks of the same kind. A meta-learning task is defined as

T = {L(x_1, a_1, ..., x_H, a_H), q(x_1), q(x_{t+1} | x_t, a_t), H}. (1)

In equation (1), L is the loss function of the task; x_1 and x_H are the 1st and H-th observations, and a_1 and a_H the 1st and H-th state outputs; q(x_1) is the distribution over the initial observation; q(x_{t+1} | x_t, a_t) is the distribution over the next observation x_{t+1} given observation x_t and output a_t.
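As an illustration of this task formulation, an N-way K-shot episode can be sampled by drawing a support set (playing the role of the initial observations) and a query set (the follow-up observations) for each task. The sketch below is illustrative only; the function and parameter names are hypothetical, not part of the paper's method:

```python
import random

def sample_task(data_by_class, n_way=5, k_shot=1, q_query=3, rng=None):
    """Sample one N-way K-shot episode from a dict {label: [examples]}.

    The support set corresponds to the initial observations q(x_1),
    the query set to the follow-up observations of the task tuple T.
    """
    rng = rng or random.Random(0)
    classes = rng.sample(sorted(data_by_class), n_way)
    support, query = [], []
    for label in classes:
        examples = rng.sample(data_by_class[label], k_shot + q_query)
        support += [(x, label) for x in examples[:k_shot]]
        query += [(x, label) for x in examples[k_shot:]]
    return support, query
```

A meta-learner is then trained episode by episode, computing the task loss L on the query set after adapting on the support set.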
The complementarity of few-shot learning and knowledge graphs has attracted extensive interest in recent years; Yuan X. and other scholars have completed missing facts of long-tail relations using few-shot support instances in knowledge graphs [24]. Previous graph models encode few-shot relations by aggregating information about each entity's neighbors but do not make fine-grained distinctions between neighbor relations and entities. Inspired by the way humans organize knowledge into hierarchical features, some scholars have explicitly unpacked the meta-learner as a meta-hierarchical graph with different blocks of expertise [25]. When encountering a new task, it constructs meta-knowledge paths by exploiting the most relevant knowledge blocks or exploring new knowledge.

C. INTEGRATION OF GRAPHS AND LSTM MODELS
Relation extraction with supervised learning requires a huge quantity of labels, and labeling consumes significant manpower and material resources. A development goal of graph networks is to reduce these costs by handling settings with few or even no sample annotations. In recent years, training graph neural networks with minimal samples has become a research hotspot [28]. The goal of few-shot relation extraction is to train a graph neural network model with a small quantity of labeled data and then use an LSTM model to learn the sample's context [29]. This paper explains how to accurately classify categories in the very-few-shot setting using graph neural network and LSTM model approaches.
There are two recursive propagation units in the Graph-LSTM model: a linear-chain LSTM and a graph model. For each connection in the phrase, the linear-chain LSTM generates a vector, equation (2).
The LSTM model is a time-recurrent neural network designed to address the long-term dependency problem of RNNs. Compared with an RNN, it has one extra value representing cell memory and three gates: a forget gate, an input gate, and an output gate. The forget gate decides which information to discard and which to retain. The input gate updates the cell state. The output gate first passes the previous hidden state and the current input through the sigmoid function; the newly obtained cell state is passed through the tanh function; the tanh and sigmoid outputs are then multiplied to determine the information the hidden state should carry, and finally the hidden state is output for the current unit. The LSTM model is given in equation (3).
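The gate computations described above can be sketched as a single NumPy step; this is a standard LSTM cell, not the paper's exact implementation, and the stacked weight layout is one common convention among several:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(x, h_prev, c_prev, W, b):
    """One LSTM step. W has shape (4H, H+D), b shape (4H,);
    gate pre-activations are stacked in the order [i, f, o, g]."""
    z = W @ np.concatenate([h_prev, x]) + b
    H = h_prev.shape[0]
    i, f, o = sigmoid(z[:H]), sigmoid(z[H:2*H]), sigmoid(z[2*H:3*H])
    g = np.tanh(z[3*H:])          # candidate cell state
    c = f * c_prev + i * g        # forget old memory, write new memory
    h = o * np.tanh(c)            # output gate filters the cell state
    return h, c
```

The returned hidden state h is what the Graph-LSTM model later couples with graph-structured dependency information.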
The model of Graph LSTM is as follows equation (4).
The difference between equations (4) and (3) is that Graph LSTM is flexible and versatile: it can perform LSTM-based extraction of binary entity relations and LSTM-based entity recognition while combining the two seamlessly.
In addition, Graph LSTM obtains the dependency vector of the sentence and multiplies it pointwise with the output state of the hidden-layer unit, whose length must match the output vector h_t. This can be handled with a tensor product, as illustrated in equation (5).
where e_j is the vector of the dependency relation, and the final result is the matrix form of the tensor product of an L-dimensional vector and a D-dimensional vector. On the other hand, in research on specialized vocabulary contexts, the model-based method of 2001 proved useful for meta-learning: a neural network with memory is used for meta-learning, where updating the weights adjusts the output. Embedding the training set of a matching network is also useful in metric-based meta-learning methods. When gradient optimization is not possible with small samples, the meta-learner can be represented by an LSTM, which learns the few-shot training set using gradient-descent parameter updates. From equations (3) and (4), the graph model can also be used in few-shot learning. This paper builds on this idea, proposes combining the graph model with few-shot meta-learning, and verifies it in the scenario of understanding contextual semantic relations of specialized vocabulary. For the meta-learning pseudo-code, see Algorithm 1.
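The tensor product of the L-dimensional dependency vector e_j with the D-dimensional hidden state h_t is simply an outer product; a minimal sketch (the function name is illustrative):

```python
import numpy as np

def dependency_weighted_state(h_t, e_j):
    """Tensor (outer) product of an L-dim dependency embedding e_j
    with a D-dim hidden state h_t, giving the L x D matrix
    described in the text."""
    return np.outer(e_j, h_t)
```

Each row of the result scales the hidden state by one component of the dependency embedding, which is how the dependency type modulates the hidden representation.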

Algorithm 1 Meta-Learner Training
In natural language processing, for simple binary relation extraction, the main method uses the shortest dependency path between entities. In multivariate relation extraction, an n-ary relation can be divided into n binary relations, but inter-sentence dependencies cannot then be captured; the Graph and LSTM model solves this problem well. Graph and LSTM is a kind of multi-task learning over sub-relations: in each sub-task, a meta-learner accumulates knowledge about the similarities and differences between tasks.
When dealing with relationships between the sentences of a document, the Graph and LSTM skeleton can process inter-sentence relationships through graphs. The nodes of the graph represent the vocabulary, and the edges represent the relationships between different words: whether the utterances are related, whether a syntactic dependency exists, whether there is a contextual correlation, and so on.

D. INTEGRATION OF GRAPHS AND META-LEARNING
Combining Meta-GNN learning with text applications, relevant scholars have already conducted research on graph meta-learning [30]. The literature proposes a learning framework called Meta-GNN to solve node classification in the graph meta-learning setting. By learning and training on multiple similar few-shot learning tasks, and using the classifier's prior knowledge, nodes of a new class with few labeled samples can be classified. The cross-entropy loss function for Meta-GNN training is equation (7).
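The cross-entropy loss over node-classification logits can be sketched as follows; this is the standard formulation, shown as a numerically stable NumPy implementation rather than the paper's exact code:

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean cross-entropy loss for node classification.
    logits: (N, C) scores for N nodes over C classes;
    labels: length-N integer class indices."""
    z = logits - logits.max(axis=1, keepdims=True)   # stable softmax
    log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_p[np.arange(len(labels)), labels].mean()
```

Uniform logits give loss ln C, and the loss approaches zero as the correct class dominates.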
When updating the parameters, each step (or every few steps) in task T_i uses a gradient descent algorithm, and the gradient-descent parameter update is given in equation (8).
Here α_1 is the learning rate of the learning task and θ denotes the model parameters; the optimized performance after meta-training, f_θ, is given in equation (9).
In essence, the goal of Meta-GNN is to optimize the model parameters to achieve the best node-classification performance on new tasks. A new task is adapted with only a small number of gradient-descent steps. Stochastic gradient descent (SGD) then performs cross-task meta-optimization, and the model parameters θ are updated accordingly; see equation (10).
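One meta-step combining the inner task update (equation (8)) with the outer SGD meta-update (equation (10)) can be sketched as follows. This is a simplified illustration: a toy linear model stands in for the GNN, and the meta-gradient is estimated by finite differences instead of the exact second-order gradient; all names are hypothetical:

```python
import numpy as np

def inner_update(theta, X, y, alpha1):
    """Task-level step theta_i' = theta - a1 * grad L_Ti(theta),
    with squared-error loss on a linear model (stand-in for the GNN)."""
    grad = 2 * X.T @ (X @ theta - y) / len(y)
    return theta - alpha1 * grad

def meta_update(theta, tasks, alpha1, alpha2, eps=1e-5):
    """Outer step theta <- theta - a2 * grad sum_i L_Ti(f_theta_i'),
    meta-gradient estimated by central finite differences."""
    def meta_loss(th):
        total = 0.0
        for Xs, ys, Xq, yq in tasks:          # support / query per task
            th_i = inner_update(th, Xs, ys, alpha1)
            total += np.mean((Xq @ th_i - yq) ** 2)
        return total
    grad = np.zeros_like(theta)
    for k in range(len(theta)):
        e = np.zeros_like(theta); e[k] = eps
        grad[k] = (meta_loss(theta + e) - meta_loss(theta - e)) / (2 * eps)
    return theta - alpha2 * grad
```

Repeating `meta_update` drives θ toward an initialization from which each task is solvable in a few inner steps.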
Summary: in recent years, significant progress has been made in graph-based representation learning. However, these methods are difficult to apply effectively to representation learning in specialist domains, for two reasons: first, specialist-domain knowledge bases are often sparse and difficult to model accurately; second, some specialist knowledge bases contain many polysemous words, which can seriously harm the representation learning of their neighbors, causing a semantic shift in the neighbors' representation vectors.
Different from data-driven acquisition methods, meta-knowledge-based representation learning can collect a large amount of tacit knowledge. In addition, researchers can ''query'' a large amount of tacit knowledge from it without designing pattern engineering; however, efficient ''query'' methods are often difficult to design. Second, although much tacit common sense can be obtained with meta-knowledge-based acquisition, it is difficult to acquire specialized knowledge beyond the meta-knowledge itself because of the learning-to-learn reasoning process involved.
Among current related works, text-mining-based acquisition is the most efficient, but knowledge acquired by this single method is less accurate. In addition, although this is a widely used knowledge acquisition method, it can usually extract only the professional knowledge explicitly mentioned in the text corpus; implicit professional knowledge is difficult to obtain.

III. METHODOLOGY
In this study, we propose a deep representation learning model for meta-learning based on contextual semantic information. The model fuses information from different sources about specialized lexical triples, including vocabulary, phrases, text, and graph information, to address low data volume. In this method, a word representation learning method and a GNN are first used to model sentence information, so as to learn semantic representations of the subject and object. In particular, during textual sentence modeling, a long short-term memory network is introduced to learn the contextual information of different sentences. The overall architecture is shown in Figure 1.

A. LEARNING THE REPRESENTATION OF GRAPHS
Given a triple (s, p, o), graph information refers to subgraphs g1 and g2 centered on subject s and object o, respectively, both of which are part of a specialized knowledge base G. g1 and g2 provide rich structured information about the subject and object, which is useful for determining the relationship between them.
To obtain representations of subject s and object o from the graph structure, a GNN is used to model the graph information. A GNN is a multilayer neural network that operates directly on the graph and learns semantic representations of nodes through information transfer from neighbors. Note that representation learning of relations is not considered in the graph information, for two reasons: first, the sparsity of the specialized knowledge base; second, omitting it improves computational efficiency. Formally, the expertise base is defined as a graph G = (V, E), where V and E denote the nodes and edges of the graph, respectively. In the GNN modeling process, every node v ∈ V has a self-edge, i.e., (v, v) ∈ E. The GNN introduces the adjacency matrix A and degree matrix D of graph G, where D_ii = Σ_j A_ij.
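One GNN propagation step built from these quantities can be sketched as follows; this is the common symmetric-normalized GCN form, shown as a minimal NumPy illustration rather than the paper's exact layer:

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN propagation step on node features H.

    Adds self-loops (v, v) as in the text, forms the degree matrix
    D_ii = sum_j A_ij of the self-looped graph, and applies the
    symmetric normalization D^{-1/2} (A + I) D^{-1/2} H W with ReLU."""
    A_hat = A + np.eye(A.shape[0])          # self-loop for every node
    d = A_hat.sum(axis=1)                   # D_ii = sum_j A_ij
    D_inv_sqrt = np.diag(d ** -0.5)
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0.0)
```

Stacking such layers lets a node's representation absorb information from increasingly distant neighbors of the subject and object subgraphs g1 and g2.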

B. META-GNN-BASED CONTEXT CLASSIFICATION
Traditional CNNs struggle with dense classification tasks that assign a unique category to each node in a graph structure, with text sentence classification, and with word-level sequence labeling. To solve these problems, some scholars have proposed a deep learning model based on graph CNNs [31], which first converts text into word graphs and then applies graph convolution operations to them. The method treats documents or sentences as graphs of word nodes, constructing the graph from document reference relations.
The combination of GNN and meta-learning [32] solves node classification in the graph meta-learning setting. The method trains on multiple similar few-shot learning tasks; using the classifier's prior knowledge, nodes of a new class with few labeled samples are classified. The meta-learner can be divided into a multi-task meta-learner and a meta-tester, where the meta-tester continuously updates the learning parameters of each task through testing. The algorithm is shown in Algorithm 2.
The pseudo-code for graph meta-learning is given in Algorithm 2; the meta-learning loss function is the cross-entropy loss.

Algorithm 2 Graph Meta-Learning Algorithm
Require: meta-learning task distribution p(T), meta-learning task T_mt, task learning rate α_1, meta-learning rate α_2
Ensure: the labels of the nodes in the query set of T_mt
1: randomly initialize θ
2: while not converged do
3:   sample a batch of meta-learning tasks T_i ∼ p(T)
4:   for each task in T_i do
5:     use S_t to evaluate L_τ(f_θ) through equation (11)
6:     calculate and update the parameters θ_i through equation (9)
7:     use Q_t to evaluate L_τi(f_θi)
8:   end for
9:   update θ by equation (10)
10: end while
11: use the support set of T_mt to compute the updated parameters θ_mt through equation (8)
12: apply model f_θmt to the query set of T_mt

Deep learning is a mapping from one data distribution X to another distribution Y. Meta-learning is a mapping from a set of tasks D to the optimal function f(x) for each task. Graph meta-optimization learning maps each task from Euclidean space to non-Euclidean space, and fast learning on new tasks is achieved by training the model parameters so that only a small number of gradient updates are required: θ is continuously optimized according to the loss gradient of the new task until the optimal parameters are obtained. The full definition and algorithm of meta-learning are not expanded in this section.
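The support/query loop of this graph meta-learning procedure can be sketched as a first-order training skeleton. This is an illustrative simplification: the model, loss, and gradient are passed in as plain functions (a toy regression stands in for the GNN in the test), and the meta-update uses the first-order approximation rather than the full second-order gradient:

```python
import numpy as np

def meta_train(theta, sample_task, grad, n_epochs=40,
               batch=4, alpha1=0.1, alpha2=0.05):
    """Skeleton of the graph meta-learning loop: for each sampled task,
    adapt on the support set S_t, take the gradient on the query set Q_t
    at the adapted parameters, and update the shared theta with the
    averaged query gradients (first-order approximation)."""
    for _ in range(n_epochs):
        meta_grad = np.zeros_like(theta)
        for _ in range(batch):
            S, Q = sample_task()
            theta_i = theta - alpha1 * grad(theta, S)   # inner step on support
            meta_grad += grad(theta_i, Q)               # query gradient at theta_i
        theta = theta - alpha2 * meta_grad / batch      # meta-update
    return theta
```

The returned θ serves as the initialization that is finally adapted on the support set of the target task T_mt.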
Meta-learning uses prior information to recognize new, previously unseen categories. Meta-learning methods either build a new network model on empirical knowledge, let the network learn that knowledge, and then complete learning using what the model has learned, or build an external memory network to introduce prior knowledge, which can be an LSTM, GRU, GNN, or other model. Metric-based methods such as Siamese networks and relation networks form another branch of meta-learning. In the context of specialized vocabulary, with the prototypical network as the benchmark and a data-augmentation process, the cross-domain metric is calculated as in equation (12).
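The prototypical-network metric mentioned here can be sketched minimally: a class prototype is the mean embedding of its support examples, and queries are assigned to the nearest prototype. This is the standard prototypical-network computation, shown as an illustration rather than the paper's exact metric:

```python
import numpy as np

def prototypes(embeddings, labels):
    """Class prototype = mean embedding of that class's support examples."""
    classes = sorted(set(labels))
    mask = np.array(labels)
    return classes, np.stack([embeddings[mask == c].mean(axis=0)
                              for c in classes])

def classify(query, classes, protos):
    """Assign each query embedding to the nearest prototype
    (squared Euclidean distance as the metric)."""
    d = ((query[:, None, :] - protos[None, :, :]) ** 2).sum(-1)
    return [classes[i] for i in d.argmin(axis=1)]
```

Because only means and distances are computed, this classifier needs no gradient steps at test time, which suits the few-shot setting.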

C. EXTRACTION OF SPECIALIZED VOCABULARY
Graph meta-learning is applied to few-shot tasks such as text processing. Unlike Graph-LSTM, the former focuses on text node classification while the latter focuses on text relation extraction; both are currently based on the GNN model. The theoretical basis of the GNN is Banach's fixed-point theorem; see the literature [35]. For a graph, all nodes converge to a certain point after repeated iteration, called the fixed point. The points on the graph are mapped by a function: two points x and y in the plane are compressed into another space to become f(x) and f(y).
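The fixed-point iteration underlying this convergence argument can be sketched in a few lines; for a contraction mapping f, Banach's theorem guarantees the iteration converges to a unique fixed point (the function name and tolerances are illustrative):

```python
def fixed_point(f, x0, tol=1e-9, max_iter=1000):
    """Iterate x <- f(x) until |f(x) - x| < tol. For a contraction
    mapping f, Banach's fixed-point theorem guarantees convergence
    to the unique point x* with f(x*) = x*."""
    x = x0
    for _ in range(max_iter):
        x_next = f(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("iteration did not converge")
```

In the original GNN formulation, node states are updated by such a contraction until they stabilize, and the stabilized states are the node representations.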
In this study, graph long short-term memory network meta-learning is applied to real hospital electronic medical record data for few-shot detection. The preprocessing for few-shot electronic medical record extraction in hospital departments is shown in Figure 2. The input is the first layer, followed by a context vector obtained by the LSTM for each word and fine-tuned with a meta-learning technique for the specialized-vocabulary context, in which the context vectors of multiple entities are coupled as input to a relational graph meta-classifier. For multi-word entities, the average of each word is taken, together with the meta-knowledge of the contextual phrases, and fed into the graph network structure. The relational classifier and the graph part of the model are unconnected.

D. GraphLSTM-BASED KNOWLEDGE REPRESENTATION
This section first builds a graph-network vocabulary and phrase representation learning model to learn contextual semantic representations of subjects and objects from the graph perspective. The relationships between sentences are processed through the graph: the nodes represent the vocabulary, and the edges represent the relationships between different words, such as whether the language is related, whether a syntactic dependency exists, whether there is a contextual connection, and so on. The application architecture in Figure 3 is also suitable for multi-task learning. For each sub-task, an LSTM can be used as a classifier to learn simple binary relations. Under the meta-learning framework, an end-to-end LSTM model with a few-shot context-learning classifier is constructed. The LSTM model maps the data in each category to an embedding space and takes the mean of these features as the prototype. The specific algorithm is detailed in the next section.

IV. EVALUATION AND EXPERIMENT
To evaluate the effectiveness of the method, this section conducts extensive experiments on part-of-speech and word-segmentation knowledge representation tasks for text sentences and compares the model's accuracy with the corresponding baseline models on public datasets. In addition, this section presents case-analysis experiments to further verify the effectiveness of each module of this study.

A. DATASETS
This section first validates the Meta-GNN method on the Reddit dataset, which consists of posts from the Reddit forum: two posts are considered related if the same user commented on both, and each post carries a corresponding label. The public dataset GDKD is also used to validate and evaluate the effectiveness of our method. To validate practical applicability, we used 110 actual hospital electronic medical records. In addition, we compare different knowledge representation models on the accuracy of entity extraction from hospital electronic medical records, and the experimental results show that our method works better.

B. EXPERIMENTAL RESULTS
The accuracy, recall, and F1 of the graph network meta-learner LSTM model on the public datasets are shown in Table 2.
On the public dataset ATR, the Graph-LSTM model achieves an accuracy of 97.60%, a recall of 83.28%, and an F1 score of 83.76%, while the deep learning CNN reaches only 95.23%, 74.42%, and 76.95% on the same three indicators. The combination of the graph and the meta-learner LSTM thus achieves a state-of-the-art (SOTA) effect.
The ''OURS'' model uses a 3 × 3 convolution kernel, a Meta-GNN classifier, and the softmax activation function. For sentence relation extraction, the experiment uses the public dataset GDKD, an open-source drug-gene dataset mainly used to explore whether associations exist in the data before treatment decisions. This experiment does not aim to mine genetic relationships but to verify the effectiveness and accuracy of the model in sentence relation extraction, as shown in Table 3. Compared with the deep learning CNN model, the bidirectional LSTM model, and the Tree LSTM model, the OURS model reaches 75.9% and 76.7% on single-sentence and cross-sentence relation extraction, higher than the best Tree LSTM. The ''OURS'' method thus has clear advantages in the few-shot setting. For practical applications, being able to extract more spatial information from the sample set is quite beneficial.
We selected 110 medical record samples for annotation; the results are shown in Figure 4. From Figure 4, the average accuracy of lexical annotation is 82.35%, which basically meets practical application requirements. The entity-extraction performance of the different knowledge representation methods is shown in Table 4, from which our fusion model reaches 85.7% accuracy in expertise entity extraction, somewhat higher than all other methods.

C. ABLATION ANALYSIS
To evaluate the effectiveness of each feature and method proposed in this study, we conduct a series of 5-way 1-shot and 3-shot ablation experiments on the public Reddit dataset and compare against the results of current state-of-the-art models on the same dataset. These related works include 2019-2022 graph representation learning models such as Meta SGC [40], GCN [41], SemiGNN [42], and CGNP [43]. For these models, we modified the dataset to fit meta-learning knowledge representation. The experimental results are shown in Table 5: our Meta-GNN fusion method achieves better 3-shot accuracy on the small-sample dataset than all other models, although CGNP performs better in the 1-shot setting; the advantage of our method grows as the number of samples increases, indicating that the Meta-GNN fusion method is better than the others. As Table 5 shows, the meta-learning classifier based on graph knowledge representation can achieve 20.71% accuracy on 5-way 3-shot compared with the other classifiers.
From the comparative results, graph representation learning models such as GCN, SGC, and CGNP are not very effective with few samples, but when combined with meta-learning they gain considerable accuracy; this fused approach therefore adapts better to new tasks, and more node representations can be learned through meta-learning methods.

D. CASE ANALYSIS
The relation extraction experiment uses the real dataset ''Hospital Electronic Medical Record''. The data collection includes patient rescue records, daily medical records, admission records, surgical records, preoperative discussions, preoperative summaries, and other few-shot datasets. Figure 5 illustrates the specific data format.
The experiment uses the patient's rescue information, such as name, gender, age, rescue time, and the doctors and nurses who participated in the rescue, as well as rescue-process information, such as state of consciousness, respiratory status, and blood pressure. Crucial information is extracted from these data and categorized according to contextual relationship and content. The results of the text relation extraction are depicted in Figure 6.

V. CONCLUSION
For the task of extracting text relations from small samples, we propose a MetaGNN-based feature representation learning method for medical professional vocabulary, addressing the difficulty of obtaining accurate representations of such vocabulary. Specifically, text information, sentence label information, and syntactic structure information are first extracted from the medical record text. Second, the contextual spatial information of the sentences is obtained. In addition, we propose a joint model based on LSTM and meta-learning for professional medical vocabulary and phrase extraction. The experimental results demonstrate the effectiveness of our method, which outperforms other text information extraction methods, such as TreeLSTM, in accuracy and precision.
HONGXING LING received the Ph.D. degree in computer applications. His research interests include information system processing and machine learning.
GUANGSHENG LUO was born in Huangshi, Hubei, China, in 1982. He received the Ph.D. degree in computer application from Fudan University. He is currently a Postdoctoral Fellow at the Management Science and Engineering Research Station, Jiangxi University of Finance and Economics. His research interests include text processing and machine learning.
YU YANG received the bachelor's degree. He is currently the Vice President at AISHU Information Technology Company Ltd., Shanghai, China. His research interests include machine learning, deep learning, and smart finance.