Entity Thematic Similarity Measurement for Personal Explainable Searching Services in the Edge Environment

Currently, search engines are widely used to address the information overload problem. Different from the existing client-and-server-based frameworks, edge computing (EC) technology can provide a new architecture for personalized searching services. The issue of how to measure the similarities among entities by using the context information generated by user behavior in the edge environment is vital in the task of entity-related personal searching. To analyze and measure the similarities among entities, existing methods are mainly based on either the textual content or relationships unilaterally, and the results usually have a fixed degree of similarity. However, the similarities among entities depend on the set of properties that belong to the entities, and these properties should be used in determining the similarity or dissimilarity associated with the surrounding context. To address this limitation, we propose a novel semantic augmentation method with a double attention mechanism. The method refers to a dynamic representation learning process that maps an entity to a real number vector in semantic space. In this article, different from the existing similarity measurement methods, we propose a thematic similarity measure approach to analyze the connotation and denotation similarities among entities. The experimental results show that the double attention mechanism leads to a significant improvement in the entity thematic similarity measurement tasks. The model can effectively separate entities from different domains. In addition, it can bring similar entities in the same domain closer together. It also shows excellent performance on the task of entity thematic similarity, which makes the recommendation results more explainable.


I. INTRODUCTION
Edge computing technology is gaining increasing attention because of its special advantages, and it has been applied in various domains [1]-[3]. In addition, it provides a new architecture for personalized recommendation systems. Different from the existing client-and-server-based framework, personal searching services in the edge environment call for processing the user's intents or search results data at the edge of the network. By sinking operations such as intent identification and search results filtering into the EC servers (also called the edge cloud), the data generated by user interaction behavior is used to realize personalized computing in an edge environment. It has the potential to address the common concerns of bandwidth cost-savings as well as data safety and privacy.
The associate editor coordinating the review of this manuscript and approving it for publication was Lu Liu.
The issue of how to measure the similarities among entities by using the context information generated by user behavior in the edge environment is vital in the task of entity-related personal searching.
As a branch of word semantics similarity computing, entity semantic similarity computing or similarity measuring is the essential work of entity-related personal searching in the edge environment. With the rapid growth of information retrieval [4], recommendations [5] and knowledge graph (KG) research in artificial intelligence [6], [7], estimating the similarity between entity names also plays an essential role in entity resolution [8] and entity linking [9] tasks.
As correlational research, word semantic similarity computing has long been an established research area in natural language processing and has attracted much research attention in natural language understanding, as well as in the information retrieval communities [10]. There are many ways to quantify the similarity of a pair of words. According to a survey, the methods can be categorized into property similarity measures [11] and relation similarity measures [12]. Property similarity measures compare the properties of each word. For example, the two words missile and flying bomb share many properties (e.g., both of them are weapons of destruction). In recent years, context words in a pretraining corpus have been used instead of properties, and the state-of-the-art methods compute the similarity between two distributional word vectors [13]; this approach has been shown to perform well on semantic similarity and relatedness tasks [14], [15]. Following the distributional hypothesis, two words are assumed to be more similar if their surrounding contexts are more similar or they appear together more frequently. Thus, words with similar meanings will have vector representations that are close together in the embedding space. Accordingly, cosine similarity, which is the cosine of the angle between two vectors, is a popular measure for assessing the similarity between words.
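As a minimal sketch of the cosine measure discussed above, the following snippet computes the cosine of the angle between two word vectors. The three-dimensional vectors are hypothetical values chosen only for illustration; they are not taken from any pretrained model.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

# Hypothetical embeddings (assumption: illustrative values only).
missile = [0.9, 0.8, 0.1]
rocket = [0.8, 0.9, 0.2]
banana = [0.1, 0.0, 0.9]

print(cosine_similarity(missile, rocket))
print(cosine_similarity(missile, banana))
```

Because each word has exactly one vector here, the score for a given pair never changes with the surrounding sentence.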
Since distributional word vectors are pretrained once, with a unique vector for each word [16], the result is a fixed degree of similarity between two words. However, a similarity relation is a binary relation between objects that is reflexive and symmetric. The similarity among entities depends on their properties; these properties should be used in determining the entities' similarities or dissimilarities that are associated with the surrounding context [17]. In practical applications, moreover, we note that the entities' semantic similarity is related to the contextual scenario and depends on the properties that they have expressed in common. For example, when we talk about propulsion systems, a missile is more similar to a rocket than to a bomb. (In general, missiles have their own propulsion systems, whereas bombs rely on gravity.) However, if we talk about a damage model, the opposite is true: the missile is more similar to a bomb than to a rocket. In the current general word embedding methods, the vectors are built for individual words, and each word is represented by a single vector in the semantic space and has a single meaning regardless of its practical application context. This is a significant limitation when differentiating entities' similarities in different context scenarios, and it would often produce inaccurate semantic similarity results for the subsequent tasks.
The main contributions of this work can be summarized as below. We propose a new semantic embedding method for entities' semantic similarities; in addition, we transform a point-to-point similarity measurement problem to a sequence-to-sequence problem, and then, we propose a novel semantic augmentation method by utilizing a double attention mechanism. For measuring the semantic similarity among entities, the knowledge graph embedding and transformer networks were introduced in the model. Different from the previous state-of-the-art similarity measurement methods, in this article, we propose a thematic similarity measure approach to analyze the connotation and denotation similarities among entities.
The remainder of this article is organized as follows. Section II presents related work and models, where we review the conventional semantic similarity measurement methods and semantic augmentation related methods. Section III gives the formal definition of the entity thematic similarity problem, and our proposed method is illustrated in Section IV. Section V describes the evaluation metrics, corpus construction, and experiments in detail. Finally, we draw conclusions and outline aspects of future work in Section VI.

II. RELATED WORK AND MODELS
In this section, we review previous work that is relevant to aspects of our proposed method. This review mainly involves the conventional semantic similarity measuring approaches, the state-of-the-art results for popular evaluation datasets, knowledge graph embedding, transformer networks and Siamese neural networks for sequence similarity measurement.

A. WORD SIMILARITY MEASUREMENT
Measuring the similarity between words is the core of, and roughly equivalent to, the entity similarity problem. In the literature, there are many metrics for measuring the similarity between words [18], [19]. The approaches to measuring words' semantic similarities can be divided into the following groups: corpus-based approaches [20]-[24] and knowledge-based approaches [25]-[27].
VOLUME 8, 2020
A large number of the proposed approaches could be categorized as corpus based. These semantic similarity approaches are based on word associations learned from large text collections following the distributional hypothesis. Two entities are assumed to be more similar if their surrounding contexts are more similar or if they appear together more frequently. Therefore, by analyzing a large corpus, valuable information can be extracted and used to measure the similarity between words. According to the computational model, there are count-based approaches and predictive-based approaches.
Count-based methods [28] use conventional statistical analysis, and the co-occurrence statistics are directly applied with probabilistic models, matrix factorization and dimension reduction. Web-based word similarity uses web content as a corpus. In general, the similarity between words is calculated as the ratio between the number of web pages or snippets that contain both words and the number of pages or snippets that contain only one of them.
Predictive-based approaches use word embeddings to find word similarity [29]-[32]; deep learning is used to represent words semantically. The generated word representation depends on the co-occurrence of words in the corpus. Usually, such an approach directly learns dense vectors by predicting a word from its surrounding context. It has been reported to have good performance in many applications. Since the continuous bag of words (CBOW) model is more computationally efficient and more suitable for a larger corpus than the skip-gram model, the CBOW model is used to train word vectors in a neural network architecture that consists of an input layer, a projection layer, and an output layer, to predict a word given its surrounding words with a certain context window size. Given the trained word vectors, the word similarity is computed using standard cosine similarity. However, the training of word vectors uses only word sequences, and a wide variety of word relations are considered to be equally related according to their co-occurrences, which makes the similarity between trained word vectors coarse and unable to address synonymous words and hierarchical relations accurately. Consequently, knowledge-based semantic similarity methods are considered in order to enrich the commonsense knowledge of words.
Knowledge-based semantic similarity methods are used to measure the semantic similarity among entities based on semantic knowledge bases (e.g., HowNet [33] and WordNet [34]), text knowledge bases (e.g., Wikipedia) and triple-based knowledge bases (e.g., DBpedia [35] and YAGO [36]). Two words are considered to be more similar if they are located closer in the given knowledge bases. Many knowledge-based methods have been proposed in the literature. For example, measuring similarity in WordNet exploits various information, such as the shortest path length, depth, and information content of concepts. Two concepts are assumed to be more similar if they are closer to each other in WordNet. Another common piece of information that is used to compute the semantic similarity is the depth, which is defined as the shortest path length between the root concept and a given concept through hierarchical relations. However, the structural knowledge of the taxonomy has a common drawback of having a uniform distance between concepts. Some methods overcome this drawback by computing the similarity from the information content of the concepts. However, as information content-based methods lack important information on the path and depth, they are not able to represent the concepts' distance and specificity accurately. Apart from semantic similarity methods for specific ontologies such as WordNet, recent works have also started to propose semantic distances and similarity methods for linked open data or knowledge graphs [7], where a wide coverage of semantic relations between semantic resources is provided. These similarity methods are proposed for more general semantic networks and are focused on entity level resources.
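The path-length and depth notions described above can be sketched on a toy taxonomy. The is-a hierarchy below is hypothetical and is not taken from WordNet, and the mapping 1 / (1 + path length) is one common choice that is assumed here for illustration rather than a measure prescribed by this article.

```python
# Toy is-a taxonomy (child -> parent); hypothetical, not taken from WordNet.
PARENT = {
    "weapon": "entity",
    "vehicle": "entity",
    "missile": "weapon",
    "bomb": "weapon",
    "rocket": "vehicle",
}

def ancestors(concept):
    """Return the chain [concept, parent, ..., root]."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def depth(concept):
    """Number of is-a edges between the root and the concept."""
    return len(ancestors(concept)) - 1

def shortest_path_length(a, b):
    """Edges on the path from a to b through their lowest common ancestor."""
    chain_a, chain_b = ancestors(a), ancestors(b)
    steps_from_b = {c: i for i, c in enumerate(chain_b)}
    for steps_from_a, c in enumerate(chain_a):
        if c in steps_from_b:
            return steps_from_a + steps_from_b[c]
    raise ValueError("concepts do not share a root")

def path_similarity(a, b):
    """Map a path length to (0, 1]; shorter paths give higher similarity."""
    return 1.0 / (1.0 + shortest_path_length(a, b))

print(path_similarity("missile", "bomb"), path_similarity("missile", "rocket"))
```

Every edge counts as one step in this sketch, which is exactly the uniform-distance drawback noted above.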

B. KNOWLEDGE GRAPH EMBEDDING
Knowledge graph embedding [37] projects the components of a KG (such as entities, relations and property concepts) into a continuous, low-dimensional, real-valued vector space, to simplify the manipulation while preserving the inherent structure of the KG. It can benefit a variety of downstream tasks, such as KG completion and relation extraction, and hence, it has quickly gained massive attention. Access to the embeddings of entities in a large knowledge base is highly beneficial for solving various natural language tasks that involve real world knowledge.
Recently, knowledge graphs (KGs) such as Wikidata, DBpedia, YAGO, and Microsoft Concept Graph have been published as noteworthy large, cross-domain, and freely available resources; they have also been broadly used in improving the transparency of learning methods, as well as in making them explainable for many artificial intelligence tasks. These graph structured knowledge bases contain a large number of entities and property information. Hence, KGs could be an ideal resource for obtaining the property set of an entity and its corresponding representation.
Most of the open knowledge bases, such as Wikipedia and YAGO, have items that are organized in a graphical structure. Graph analysis has been attracting increased attention in recent years due to the ubiquity of networks in the real world. There have been several solutions that use graph matching to compute the similarity of two entities based on their neighborhood graphs. Recently, methods based on representing networks and each individual node in a vector space, while preserving their properties, have become widely popular.
Formally, given a graph $G = (V, E)$, a graph embedding is a mapping $f: v_i \to y_i \in \mathbb{R}^d$, $\forall i \in [1, n]$, such that $d \ll |V|$ and the function $f$ preserves some proximity measure defined on graph $G$. An embedding therefore maps each node to a low-dimensional feature vector and attempts to preserve the connection strengths between vertices. Therefore, each entity is treated as a point in a vector space, and each relation is viewed as an operation over entity embeddings.
Each entity in the Wikipedia knowledge base is associated with an undirected graph whose nodes are entities and whose edges represent links among entities. Therefore, the entity embeddings could be learned by predicting neighboring entities in this link graph. Wikipedia2Vec [38], a Python-based open-source tool for learning the embeddings of entities from Wikipedia, learns embeddings by jointly optimizing word-based skip-gram, anchor context, and link graph models. It implements the conventional skip-gram model to map words and entities into a vector space. The skip-gram model [31] is a neural network model with a training objective to find embeddings that are useful for predicting context items given each item.
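For reference, the skip-gram training objective can be written in its standard form as follows. The symbols are introduced here for illustration and do not appear elsewhere in this article: $T$ is the number of items in the training sequence, $c$ is the context window size, $W$ is the vocabulary size, and $v_w$ and $v'_w$ are the input and output vectors of item $w$.

```latex
\mathcal{L} = \frac{1}{T} \sum_{t=1}^{T} \sum_{\substack{-c \le j \le c \\ j \ne 0}} \log p(w_{t+j} \mid w_t),
\qquad
p(w_O \mid w_I) = \frac{\exp\left( {v'_{w_O}}^{\top} v_{w_I} \right)}{\sum_{w=1}^{W} \exp\left( {v'_{w}}^{\top} v_{w_I} \right)}
```

Maximizing $\mathcal{L}$ pushes items that share contexts toward nearby vectors. For entities, the context items would be the neighboring entities in the link graph.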
A typical knowledge graph such as YAGO usually depicts knowledge as multirelational data, and triple facts (head entity, relation, tail entity) are generally used to express the relation between two entities. An adjacency matrix is commonly used to represent the topology of a network, where each column and each row represent a node, and the matrix entries indicate the relationships among the nodes. To describe the local structural characteristics of a node, the neighborhood structure is important for network embedding. Although the adjacency vector of a node encodes the first-order neighborhood structure of the node, it is usually a sparse, discrete, and high-dimensional vector due to the sparseness of large-scale networks. Such a representation is not friendly to subsequent applications. In the field of natural language processing, the word representation also suffers from similar drawbacks. The development of Word2Vec significantly improved the effectiveness of the word representation by transforming sparse, discrete and high-dimensional vectors into dense, continuous and low-dimensional vectors. By analogy with Word2Vec, random walk models are exploited to generate random paths over a network. Some representative methods include DeepWalk [39] and Node2Vec [40]. Similar to DeepWalk, Node2Vec preserves higher-order proximity between nodes by maximizing the probability of occurrence of subsequent nodes in fixed length random walks. Moreover, some methods, such as SDNE [41], SDAE [42], and SiNE [43], propose deep learning models for network embedding to make deep models fit network data and to impose network structure and property-level constraints on deep models.
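The random-walk idea can be sketched as follows: triple facts are turned into an adjacency list, and fixed-length walks over it play the role of "sentences" for a Word2Vec-style model. This is a minimal sketch with uniform transition probabilities in the style of DeepWalk; Node2Vec additionally biases the transitions, which is omitted here. Treating the graph as undirected and the example triples themselves are assumptions made for illustration.

```python
import random

def build_adjacency(triples):
    """Turn (head, relation, tail) facts into an undirected adjacency list."""
    adjacency = {}
    for head, _relation, tail in triples:
        adjacency.setdefault(head, set()).add(tail)
        adjacency.setdefault(tail, set()).add(head)
    return adjacency

def random_walk(adjacency, start, length, rng):
    """Uniform random walk with at most `length` nodes, starting at `start`."""
    walk = [start]
    while len(walk) < length:
        neighbors = sorted(adjacency[walk[-1]])
        if not neighbors:
            break
        walk.append(rng.choice(neighbors))
    return walk

def generate_walks(adjacency, walks_per_node, length, seed=0):
    """Start `walks_per_node` walks from every node, in shuffled order."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        nodes = sorted(adjacency)
        rng.shuffle(nodes)
        for node in nodes:
            walks.append(random_walk(adjacency, node, length, rng))
    return walks

# Hypothetical triples (assumption: illustrative facts only).
TRIPLES = [
    ("Missile", "instanceOf", "Weapon"),
    ("Bomb", "instanceOf", "Weapon"),
    ("Missile", "hasPart", "Propulsion_system"),
    ("Rocket", "hasPart", "Propulsion_system"),
]

adjacency = build_adjacency(TRIPLES)
walks = generate_walks(adjacency, walks_per_node=2, length=5)
print(walks[0])
```

The resulting walks could then be passed to any skip-gram implementation to obtain dense node vectors.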
An entity is embedded into a low-dimensional continuous vector space while certain properties of the graph are preserved. The embedding vectors are usually obtained by minimizing a global loss function with regard to all entities and relations in such a way that each entity vector captures both global and local structural patterns of the original knowledge graph. Thus, we can utilize entity embeddings to encode prior knowledge for measuring the entities' similarities.
For obtaining the degree of attention for each property in a given thematic sentence, it is necessary to identify a semantic space and the corresponding representations. In this article, knowledge graph embedding methods were used to obtain the representation of each individual entity, property and other concept.

C. TRANSFORMER NETWORKS
Recurrent neural networks (RNNs) [44], such as long short-term memory networks (LSTMs) [45], have been successfully employed for many tasks that require the modeling of sequential data; such tasks include language modeling [46], speech recognition [47], and machine translation [48]. In RNNs, the output predictions are made by computing a hidden state vector $h_t$ based on the current input token and the previous states. This recurrence underlies their ability to map arbitrary input-output sequence pairs; however, because the previous hidden states must be computed before the current time step, RNNs cannot benefit from parallelization. The transformer network [49] avoids the recurrence completely and uses only self-attention. The feed-forward network in each layer of the transformer network is a two-layered network with a ReLU activation. The sublayer is defined as follows:
$$\mathrm{FFN}(x) = \max(0, xW_1 + b_1)W_2 + b_2$$
In this article, we propose a modified encoder part of a transformer network wherein the first component of each layer is a Siamese architecture and all parametric variables are shared.
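To illustrate why self-attention avoids recurrence, the following sketch implements scaled dot-product attention as defined for the transformer network [49]. For brevity, the learned query, key, and value projections are omitted and the input is used directly for all three, which is a simplification assumed here; the input vectors are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(queries, keys, values):
    """softmax(Q K^T / sqrt(d_k)) V for lists of row vectors."""
    d_k = len(keys[0])
    dim_v = len(values[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in keys]
        row = softmax(scores)
        weights.append(row)
        outputs.append([sum(w * v[j] for w, v in zip(row, values)) for j in range(dim_v)])
    return outputs, weights

def self_attention(x):
    """Self-attention with Q = K = V = x (learned projections omitted)."""
    return scaled_dot_product_attention(x, x, x)

# Hypothetical token vectors (assumption: illustrative values only).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens)
print(weights[0])
```

Each output row depends only on the full input sequence and not on a previously computed hidden state, so all positions can be processed in parallel.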

D. SIAMESE NEURAL NETWORKS
Siamese nets were first introduced in the early 1990s to solve signature verification as an image matching problem [50]. A Siamese neural network consists of twin networks, and the parameters between the twin networks are tied. An energy function that computes some metric between the highest-level feature representations on each side joins the networks at the top. For entity similarity measuring, these Siamese architectures guarantee that two extremely similar property sequences could not possibly be mapped by their respective networks to very different locations in semantic space.
The skip-thoughts model [51], which extends the skip-gram approach of Word2Vec from the word level to the sentence level, feeds each sentence into an RNN encoder-decoder. To measure the similarities of sentences, the RNN encoder is used to obtain a skip-thought vector. Subsequently, a separate classifier is trained by using the skip-thought vectors for the pair of sentences that appear in each training example.
Benefiting from their order-sensitive chain structure, standard LSTMs have become the state-of-the-art models for a variety of machine learning problems, such as text classification and language translation. The Manhattan LSTM model [52] is a Siamese adaptation of the LSTM network, which is proposed for labeled data comprised of pairs of variable-length sequences. It could be applied to assess the semantic similarity between sentences. This model produces a mapping from a general space of variable-length sequences into an interpretably structured metric space of fixed dimensionality. By restricting the subsequent operations to rely on a simple Manhattan metric, a highly structured space whose geometry reflects complex semantic relationships is formed for representing the sentences.
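The Manhattan metric used by the Manhattan LSTM model [52] maps the L1 distance between the two final sentence representations to a score in (0, 1]. A minimal sketch, with hypothetical representation vectors standing in for the LSTM outputs:

```python
import math

def manhattan_similarity(h_a, h_b):
    """exp(-||h_a - h_b||_1), which lies in (0, 1]."""
    if len(h_a) != len(h_b):
        raise ValueError("representations must have the same dimension")
    l1_distance = sum(abs(a - b) for a, b in zip(h_a, h_b))
    return math.exp(-l1_distance)

# Hypothetical final hidden states of the twin networks (illustrative values).
print(manhattan_similarity([0.2, 0.7], [0.1, 0.9]))
```

Identical representations score exactly 1, and the score decays exponentially as the representations move apart.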
BERT [53] and RoBERTa [54] have set a new state-of-the-art performance on sequence-pair regression tasks. However, the construction of BERT makes it unsuitable for semantic similarity search; it requires that both sentences be fed into the network, which causes a massive computational overhead. Sentence-BERT (SBERT) [55] is a modification of the pretrained BERT network that uses Siamese and triplet network structures to derive semantically meaningful sentence embeddings that can be compared using cosine similarity. This approach substantially reduces the computational overhead.

III. PROBLEM FORMULATION
Existing similarity computing methods cannot be applied directly in this case because, in reality, the similarity is a dynamic phenomenon: it varies with the context scenario.
To address this limitation, a measurement for reasoning about relative similarities is presented. The relative similarity measurement of entities is defined as assessing the comparison of similarities between two pairs of entities' connotation and denotation information. Therefore, let $C$ be a function of relative similarity measuring; it consists of two subfunctions that represent the connotation and denotation similarity of entities, respectively. Let $Sim_C$ be the function of connotation similarity of entities, and let $Sim_D$ be the function of denotation similarity of entities; then, the semantic similarity $C(E_A, E_B)$ between entity $E_A$ and entity $E_B$ is defined as a combination of $Sim_C(E_A, E_B)$ and $Sim_D(E_A, E_B)$ that is weighted by a compound factor $a \in [0, 1]$. Usually, we consider that the set of properties always implies the connotation information of the entity, while the denotation information of the entity is contained in the context scenarios. In the field of natural language processing, the context scenarios of an entity are always implicit in the sentence associated with the entity.
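One plausible reading of the compound factor is a convex combination of the two similarity scores. The sketch below assumes that form; both the assignment of $a$ to the connotation term and the default value of 0.5 are assumptions made for illustration, not values stated in this article.

```python
def relative_similarity(sim_c, sim_d, a=0.5):
    """Blend connotation (sim_c) and denotation (sim_d) similarity.

    Assumption: C = a * Sim_C + (1 - a) * Sim_D with a in [0, 1].
    """
    if not 0.0 <= a <= 1.0:
        raise ValueError("compound factor a must lie in [0, 1]")
    return a * sim_c + (1.0 - a) * sim_d

# Hypothetical scores for one entity pair.
print(relative_similarity(0.8, 0.2, a=0.5))
```

Under this reading, a = 1 would rely on the property sets alone and a = 0 on the context scenarios alone.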
In this article, we propose a novel semantic augmentation method with a double attention mechanism. The method refers to a dynamic representation learning process that maps an entity to a real number vector in semantic space based on its properties and the context scenario.
For measuring the connotation similarity $Sim_C$, we construct a property attention matrix according to the thematic sentence and the properties of the entity. Thus, let $P_r$ be the property set of an entity; the connotation similarity between entity $E_A$ and entity $E_B$ is then defined over their property sets with respect to the thematic sentence. On the other hand, we assume that similar entities have similar context scenarios. This means that there is a degree of interchangeability between two similar entities associated with the context scenarios. Thus, we define the denotation similarity between entity $E_A$ and entity $E_B$ in terms of $S_x^{(y)}$, which denotes a batch of sentences in which item $x$ has been replaced by $y$.
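The interchanged batch $S_x^{(y)}$ can be built by a simple token substitution. The following sketch assumes pre-tokenized sentences and single-token entity names; the example sentence is hypothetical.

```python
def substitute_entity(sentences, x, y):
    """Return S_x^(y): the tokenized sentences with every token x replaced by y."""
    return [[y if token == x else token for token in sentence] for sentence in sentences]

# Hypothetical context sentence for the entity "missile".
sentences_a = [["the", "missile", "has", "its", "own", "propulsion", "system"]]
swapped = substitute_entity(sentences_a, "missile", "rocket")
print(swapped[0])
```

If two entities are interchangeable in a context, the original and substituted sentences should receive similar representations from the sentence encoder.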
Through the above analysis, we can transform the point-to-point similarity measurement problem into a sequence-to-sequence problem. State-of-the-art results on sequence embedding often use attentional models with some form of convolution or recursion [56], [57]. Instead, the research in [49] introduces the transformer network, which uses only self-attention and feed-forward layers to avoid the recurrence equation and maps the input sequences into hidden states. Specifically, the authors use positional encodings in conjunction with a multihead attention mechanism. This approach allows for increased parallel computation, reduces the time to convergence, and achieves state-of-the-art results on several natural language processing tasks [58]-[61].
Drawing on the research approach of transformer networks, we employ an attention mechanism in this work for comparing the similarity among entities. Our proposed semantic augmentation model represents the property sequences and the context scenario sentences of an entity by using self-attention and Siamese self-attention networks, respectively. Consequently, a semantically structured representation space can be learned.

IV. SEMANTIC AUGMENTATION
In this section, we present a semantic augmentation method for measuring the similarity of entity names. The processes consist of semantic feature selection and the double attention mechanism embedding refining phase.

A. SEMANTIC FEATURE SELECTION
Through the embedding process, we obtain a concept/property embedding index, denoted as $I_p$, which would be used as a basic look-up table for converting a property to its vector representation in subsequent processes. To identify and rank the properties of an entity according to the degree of attention in a thematic sentence, we present a semantic feature selection method that consists of an attention calculating phase, a selecting phase and a sorting phase for the properties.
The goal of feature selection is to obtain a sequence of the ideal attention ranking of the properties. Let $f$ be a connotation attention function and let $P_r$ be the property set of an entity, as mentioned above. Let $pr_i^f$ be the degree of attention for a property $pr_i$ ($pr_i \in P_r$); then, $pr_1^f \to \dots \to pr_n^f$ is the property sequence ordered by the degree of relevancy associated with the thematic sentence and ranked by the attention function $f$.
Let $W = \{w_1, w_2, \dots, w_{i\_entity}, \dots, w_m\}$ be a sentence that implies the context scenario of the entity $w_{i\_entity}$, and then, the entity $w_{i\_entity}$ is represented by its property set. In this article, we use the entity's adjacency neighbors in the knowledge graph instead, denoted as $pr = \{pr_1, pr_2, \dots, pr_j, \dots, pr_n\}$, and obviously, $pr_i \in I_p$.
Here, $\mathbb{1}(\cdot)$ is an indicator function, and $\mathbb{1}(w_i) = 1$ if the word-embedding pair $\langle w_i, w_{i\_embedding} \rangle$ is contained in dictionary $I_p$; otherwise, $\mathbb{1}(w_i) = 0$.
By applying max-pooling to each row of the attention matrix $A$, we obtain the maximum attention value of each property, and then, the property sequence $pr_1^f \to \dots \to pr_n^f$ is identified according to the descending order of the attention values. This property sequence could express the semantic information of an entity combined with the context scenario.
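The selection and sorting phases can be sketched as follows. The scoring of each property-word pair by a dot product is an assumption made for illustration, since the attention function is not restated here; words missing from the embedding index contribute zero, in line with the indicator function. The two-dimensional embeddings and names are hypothetical.

```python
def attention_matrix(property_vectors, sentence, index):
    """A[i][j] = dot(property_i, word_j), or 0 when word_j is missing from the index."""
    matrix = []
    for p in property_vectors:
        row = []
        for word in sentence:
            if word in index:
                row.append(sum(a * b for a, b in zip(p, index[word])))
            else:
                row.append(0.0)  # indicator function evaluates to 0
        matrix.append(row)
    return matrix

def rank_properties(property_names, matrix):
    """Max-pool each row, then sort properties by descending attention."""
    pooled = [max(row) for row in matrix]
    order = sorted(range(len(property_names)), key=lambda i: pooled[i], reverse=True)
    return [(property_names[i], pooled[i]) for i in order]

# Hypothetical embedding index (assumption: illustrative values only).
INDEX = {
    "propulsion": [1.0, 0.0],
    "damage": [0.0, 1.0],
    "engine": [0.9, 0.1],
    "warhead": [0.1, 0.9],
}
properties = ["engine", "warhead"]
vectors = [INDEX[name] for name in properties]

ranking = rank_properties(properties, attention_matrix(vectors, ["the", "propulsion", "system"], INDEX))
print([name for name, _ in ranking])
```

Changing the thematic sentence changes the ranking, which is how the same entity can yield different property sequences in different context scenarios.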

B. ATTENTION MECHANISM
The connotation and denotation information could be utilized to enhance the degree of similarity among entities. The self-attention and Siamese self-attention mechanisms were used to compile that information in our work.
For encoding the semantic feature sequence which was selected from the knowledge graph, as mentioned in the transformer networks, our model follows the architecture that uses stacked self-attention and pointwise, fully connected layers for the encoder shown in the left half of Figure 4.
To encode the interchanged context scenarios with comparable entities, the proposed Siamese transformer networks model is outlined in the right half of Figure 4.
For the source tokens of each of the sentence pairs, we add "positional encodings" to the pre-trained input embeddings at the bottom of the encoder. Since the positional encodings have the same dimension as the embeddings, an additive operation generates embeddings of dimension $d_{model}$. The encoder consists of $N$ layers, and each layer contains two sublayers. Different from the original transformer network, the proposed attention layer has two parallel networks of multihead attention, each of which processes one of the sentences in a given pair $(S_A^{(\sim)}, S_B^{(\sim)})$; we solely focus on Siamese architectures with tied weights in this work.
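The additive step can be sketched with the sinusoidal positional encodings of the original transformer network [49]. Whether the proposed model uses exactly this sinusoidal form is an assumption; the zero-valued embeddings below are placeholders for pre-trained input embeddings.

```python
import math

def positional_encoding(num_positions, d_model):
    """Sinusoidal encodings: sin on even dimensions, cos on odd dimensions."""
    table = []
    for pos in range(num_positions):
        row = []
        for i in range(d_model):
            angle = pos / (10000 ** ((2 * (i // 2)) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table

def add_positional_encoding(embeddings):
    """Element-wise sum of token embeddings and their positional encodings."""
    d_model = len(embeddings[0])
    encodings = positional_encoding(len(embeddings), d_model)
    return [[e + p for e, p in zip(emb, enc)] for emb, enc in zip(embeddings, encodings)]

# Placeholder embeddings for a two-token sentence with d_model = 4.
encoded = add_positional_encoding([[0.0] * 4, [0.0] * 4])
print(encoded[0])
```

Because the sum is element-wise, the output keeps the dimension $d_{model}$ of the input embeddings.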
We employ $h = 4$ heads for each of the parallel multihead attention networks. For each of these, we use $d_k = d_v = d_{model}/h$. Due to the reduced dimension of each head, the total computational cost is similar to that of single-head attention with full dimensionality. More concretely, each pair of sentences (each represented as a sequence of word vectors) is passed to the Siamese multihead attention, which updates its hidden state at each sequence position. The final representation of the sentence pair is encoded by a feed-forward network, which is a two-layered network with a ReLU activation.
The max and mean pooling methods are applied to the output of the fully connected layers of the encoders shown in the left and right halves of Figure 4. Then, we concatenate the last hidden states of the models, and the refined embedding of the entity is normalized via a softmax layer. Since the dimension of the embedding vector depends on the output dimension $d_{model}$, all entities have embeddings with the same dimension for similarity measuring.
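The pooling, concatenation, and normalization steps can be sketched as follows. Which pooling method is applied to which encoder is not fully specified in the description, so the pairing below (max pooling on the connotation encoder, mean pooling on the denotation encoder) is an assumption, as are the example hidden states.

```python
import math

def max_pool(states):
    """Column-wise maximum over a sequence of hidden states."""
    return [max(column) for column in zip(*states)]

def mean_pool(states):
    """Column-wise mean over a sequence of hidden states."""
    return [sum(column) / len(column) for column in zip(*states)]

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def refine_embedding(connotation_states, denotation_states):
    """Pool both encoder outputs, concatenate them, and softmax-normalize."""
    pooled = max_pool(connotation_states) + mean_pool(denotation_states)
    return softmax(pooled)

# Hypothetical encoder outputs: two positions, two dimensions each.
embedding = refine_embedding([[1.0, 2.0], [3.0, 0.0]], [[1.0, 1.0], [3.0, 5.0]])
print(embedding)
```

The output length depends only on the hidden dimensions of the two encoders and not on the sequence lengths, so any two entities can be compared directly.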

V. EXPERIMENTS
This section presents the datasets, implementation, and evaluation and provides a brief discussion about the obtained experimental results. To the best of our knowledge, there is currently no standard method or dataset to evaluate the performance of entity semantic similarity methods. Therefore, we construct an entity similarity dataset that is based on popular datasets for word similarity evaluation and datasets for measuring entity relatedness.

A. DATASETS AND IMPLEMENTATION
Attributional similarity is the degree to which two words are synonymous. We collected several publicly available standard datasets for evaluating semantic similarity models, and these data sets are conventionally the most commonly used for word similarity evaluation. Some typical evaluation data sets are described as follows.
The WordSimilarity-353 (WS353) Test Collection is perhaps the most commonly used gold-labeled dataset for semantic similarity at present. This dataset contains 353 pairs of English words and is a concatenation of two smaller sets. The first set contains 153 word pairs along with their similarity scores assigned by 13 subjects. The second set contains 200 word pairs, with their similarity assessed by 16 subjects. Each set provides the raw scores assigned by each subject, as well as the mean score for each word pair.
The Rubenstein & Goodenough (RG-65) dataset is the first and most used data set that contains human assessments of word similarities. The dataset resulted from experiments conducted in 1965 in which a group of 51 subjects (all native English speakers) assessed the similarity of 65 pairs of words selected from ordinary English nouns. The similarity of each pair is scored according to a scale from 0 to 4 (the higher the value is, the higher the similarity).
SimLex-999 [62] is a recently released dataset that consists of 999 word pairs for evaluating semantic similarity specifically. Pairs of words were chosen to represent different ranges of similarity, with either a high or low association. Each pair of words was rated by more than 35 subjects (native English speakers) with similarity scores on a scale from 0 to 10, and the average was assigned as the final judgment. The MEN Test Collection contains 3,000-word pairs, which were randomly selected from words that occur at least 700 times in the freely available ukWaC and Wikipedia corpora combined and at least 50 times in the open-sourced subset of the ESP game dataset. Two sets of English word pairs (one for training and one for testing) together with humanassigned similarity judgments were obtained by crowdsourcing using Amazon Mechanical Turk via the CrowdFlower interface.
All of the datasets described above contain a list of word pairs along with human-assigned similarity judgments. The semantic ratings of those word pairs have been proven to be highly correlated and could be reliably used for evaluating semantic similarity methods. The state-of-the-art results for each dataset are listed in table 1. The semantic similarity metrics presented in this article are used for entities, which are associated with the text around them. For training the Siamese network, more datasets for evaluating the methods of word similarity and entity relatedness were employed.
We used the January 2020 version of the English Wikipedia dump and an open-source tool WikiExtractor 1 to create the training corpus. For each word/entity in the word pairs, we removed duplicate data and overlapping definitions in Wikipedia. A sentence that contains the word/entity itself was selected as the context scenario sentence. There are a total of 28,068 training data instances, and the format of the data is explained in Figure 5.
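The context selection step can be illustrated with a minimal sketch. It assumes a naive punctuation-based sentence splitter and a case-insensitive whole-word match; the function name `select_context_sentence` and the sample text are hypothetical, and the paper does not state which splitter or matching rule was actually used.

```python
import re


def select_context_sentence(term, text):
    """Return the first sentence of `text` that mentions `term`, or None."""
    # Naive split on ., ! or ? followed by whitespace (an assumption; the
    # paper does not say how sentences were segmented).
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    # Whole-word, case-insensitive match of the word/entity name.
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    for sentence in sentences:
        if pattern.search(sentence):
            return sentence
    return None


# Hypothetical article text used only for illustration.
article = (
    "Computing machinery changed in the 1930s. "
    "Alan Turing was an English mathematician and computer scientist. "
    "He worked at Bletchley Park."
)
context = select_context_sentence("Alan Turing", article)
print(context)
```

A production pipeline would likely need a more robust sentence segmenter, since abbreviations such as "Inc." break the naive split.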
To evaluate the similarity among the entities in a certain context, the testing data set was constructed based on 300 unique pairs of entities, such as persons, cities, organizations, and so on. Of these, 60 pairs were assigned a second similarity value according to a different context attention of their entities; we treat these as separate testing data. Finally, the testing data set was composed of 360 entity pairs, and the similarity of each pair is scored on a scale from 0 to 1 (the higher the ''similarity of meaning'' is, the higher the number). The similarity values in the dataset are the means of judgments made by 6 subjects.
To obtain the embeddings for the semantic feature selection, we used an open-source Python library, Wikipedia2Vec, 2 which provides a unified interface to the implementations of entity embedding methods. We used the source Wikipedia dump file enwiki-20200101-pages-articles.xml.bz2 (15.6 GB) from Wikimedia Downloads (https://dumps.wikimedia.org/). Then, the embeddings were trained from the Wikipedia dump using the train command of wikipedia2vec with the following options:
To obtain the embedding sequence for the input embedding process of the connotation information encoder, we employed a large semantic knowledge base, YAGO, which was derived from Wikipedia, WordNet, WikiData, GeoNames, and other data sources. Currently, YAGO knows more than 17 million entities (such as persons, organizations, and cities), and it contains more than 150 million facts about these entities. We employed node2vec as the method for knowledge graph embedding. The reason is that choosing the right balance between breadth-first and depth-first exploration in its random walks enables node2vec to preserve the community structure as well as the structural equivalence between the nodes.
We utilized an open-source Python library, GEM 3 (graph embedding methods), which provides a unified interface to the implementations of graph embedding methods, including node2vec [41]. Through the knowledge graph embedding process, for each entity we concatenated the embeddings of the relation and the tail entity of every triple fact associated with that entity. These steps constructed a basic dictionary for converting a property to its vector representation for subsequent processes. For the input embedding process of the denotation information encoder, we used the pretrained BERT model with the huggingface PyTorch library 4 to fine-tune our model.
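The construction of the property dictionary can be sketched as follows. This is only an illustration under stated assumptions: the triples, the relation and entity names, and the two-dimensional toy vectors are hypothetical, and the helper `build_property_dictionary` is not part of GEM or node2vec. It only shows the concatenation of a relation embedding with a tail-entity embedding for each triple fact of a head entity.

```python
def build_property_dictionary(triples, relation_vecs, entity_vecs):
    """Map each head entity to a list of ((relation, tail), vector) pairs."""
    properties = {}
    for head, relation, tail in triples:
        # A property vector is the relation embedding followed by the
        # tail-entity embedding (list concatenation, not element-wise sum).
        vector = relation_vecs[relation] + entity_vecs[tail]
        properties.setdefault(head, []).append(((relation, tail), vector))
    return properties


# Hypothetical toy triples and pre-computed embeddings.
triples = [
    ("Alan_Turing", "isCitizenOf", "United_Kingdom"),
    ("Alan_Turing", "hasOccupation", "Computer_scientist"),
]
relation_vecs = {"isCitizenOf": [0.1, 0.2], "hasOccupation": [0.3, 0.4]}
entity_vecs = {"United_Kingdom": [0.5, 0.6], "Computer_scientist": [0.7, 0.8]}

property_dict = build_property_dictionary(triples, relation_vecs, entity_vecs)
for prop, vec in property_dict["Alan_Turing"]:
    print(prop, vec)
```

Each entity is then represented by a sequence of property vectors, which matches the sequence input expected by an encoder.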
Finally, for a pair of entities E A and E B , their similarity score is defined by the cosine value of the corresponding refined embedding vectors. We used the training set to fine-tune our models using the regression objective function. At prediction time, we computed the cosine similarity between the entity embeddings. All of the models were trained with 10 random seeds to reduce the effect of variance.
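The scoring step can be sketched in a few lines. The cosine computation follows directly from the definition above; the squared-error loss is an assumption, since the text names a regression objective without giving its form, and the toy vectors are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Convention chosen here: a zero vector is similar to nothing.
        return 0.0
    return dot / (norm_u * norm_v)


def squared_error(predicted, gold):
    """Assumed regression objective for a single training pair."""
    return (predicted - gold) ** 2


# Hypothetical refined embeddings for two entities.
e_a = [1.0, 0.0]
e_b = [1.0, 1.0]
score = cosine_similarity(e_a, e_b)
loss = squared_error(score, 1.0)
print(round(score, 4), round(loss, 4))
```

Because the gold labels of the testing data lie in the range from 0 to 1, a cosine score can be compared with them directly when the embeddings do not point in opposing directions.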

B. EVALUATION METRIC AND RESULTS
Spearman's rank correlation coefficient is a nonparametric rank statistic that was proposed by Charles Spearman as a measure of the strength of an association between two variables. In this article, we utilize Spearman's rank correlation coefficient to evaluate the strength of the association between the similarities assessed by the compared models and the gold labels.
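As a minimal sketch of the metric, Spearman's coefficient can be computed as the Pearson correlation of the rank-transformed scores, with tied values sharing the mean of their ranks. The gold and predicted values below are hypothetical and serve only to show the calculation; a library routine such as the one in SciPy would normally be used instead.

```python
def average_ranks(values):
    """Rank values from 1 to n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical gold labels and model scores for four entity pairs.
gold = [0.9, 0.1, 0.5, 0.7]
predicted = [0.8, 0.2, 0.6, 0.4]
rho = spearman(gold, predicted)
print(round(rho, 3))
```

Only the ordering of the scores matters, so a model is not penalized for producing similarities on a different scale from the human judgments.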
To evaluate the model's context adaptation, we use the testing data set DataSet360 and its subset DataSet240 (obtained from the full testing data set by removing the test data that have two similarity scores). To present a reliable comparison, we tested multiple state-of-the-art models in such a way that the same input data were used for all of those models.
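The relation between the two test sets can be checked with a small sketch. It assumes that each test record is a tuple of two entities, a context label, and a score, and that an entity pair with two similarity scores appears as two records; the record layout, the helper name, and the synthetic data are hypothetical. With 300 unique pairs of which 60 carry a second score, this reading gives 360 records in total and 240 after removal, which is consistent with the data set names.

```python
from collections import Counter


def remove_multi_score_pairs(records):
    """Keep only records whose entity pair occurs exactly once."""
    counts = Counter((a, b) for a, b, _context, _score in records)
    return [r for r in records if counts[(r[0], r[1])] == 1]


# Synthetic stand-in for the testing data: 300 unique pairs, the first 60 of
# which are scored a second time under a different context.
records = []
for i in range(300):
    records.append((f"A{i}", f"B{i}", "context-1", 0.5))
    if i < 60:
        records.append((f"A{i}", f"B{i}", "context-2", 0.2))

subset = remove_multi_score_pairs(records)
print(len(records), len(subset))
```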
The results are shown in Table 4. We experimented with three semantic augmentation models to measure the similarity among entities: one using only the connotation information (SiameseNet(C)), one using only the denotation information (SiameseNet(D)), and the combined model (Double Attention).
The results show that directly using the output of BERT leads to rather poor performance: the best of the three semantic augmentation models achieves an average correlation of only 0.633 and 0.581 on DataSet240 and DataSet360, respectively. All are worse than the existing state-of-the-art methods.
We show that using the described Siamese network structure and fine-tuning mechanism substantially improves the correlation, and the performance of the semantic augmentation models is comparable with existing state-of-the-art methods in the traditional entity similarity measurement task. Moreover, we observe that the combined strategy leads to a significant improvement in the entity thematic similarity measuring task.

C. CASE STUDY
For the case study, we show our method's performance on the Related Entities Dataset KORE, which contains 20 seed entities from 4 domains (IT companies, Hollywood celebrities, video games, television series). For each of these entities, 20 entities linked from their Wikipedia article were selected, and they were ranked by human annotators on Mechanical Turk.
Our model shows good performance on the task of entity thematic similarity, as shown in Figure 9 (a). Taking the person entities Alan Turing, Isaac Newton, and Steve Wozniak as an example, if the context concerns computer science, then Alan Turing is closer to Steve Wozniak than to Isaac Newton. Our model can also give an explainable cause: both Alan Turing and Steve Wozniak are computer designers. If the context concerns a professional's nationality, then Alan Turing is closer to Isaac Newton than to Steve Wozniak; an explainable cause is that both Alan Turing and Isaac Newton are English inventors and mathematicians.
In Figure 9 (b), taking the company entities Microsoft, IBM, Apple Inc., Hewlett-Packard, Google, and Yahoo! as an example, if the context concerns hardware manufacture, then Microsoft is closer to IBM, Apple Inc., and Hewlett-Packard; an explainable cause is that those companies are all computer hardware companies and notebook manufacturers. If the context concerns web services, then Microsoft is closer to Google and Yahoo! because all of them are web service providers and have products that are internet search engines.

VI. CONCLUSION
Estimating the similarity between entity names plays an important role in personalized searching services in the edge environment. Considering that the semantic similarity of entities is related to the context scenario and depends on the properties they express in common, we propose a novel semantic augmentation method with a double attention mechanism. The method refers to a dynamic representation learning process that maps an entity to a real number vector in semantic space. Different from previous state-of-the-art similarity measurement methods, in this article, we propose a thematic similarity measure approach to analyzing the connotation and denotation similarities among entities. By utilizing a double attention mechanism, we transform a point-to-point similarity measurement problem into a sequence-to-sequence problem, and knowledge graph embedding and transformer networks are introduced into the model.
The experimental results show that directly using the output of BERT leads to rather poor performance, but the combined strategy leads to a significant improvement in the entity thematic similarity measurement task. The model can effectively separate entities from different domains. In addition, it places similar entities closer together. It also shows good performance on the task of entity thematic similarity.
YU BAI is currently pursuing the Ph.D. degree with the Nanjing University of Aeronautics and Astronautics and is a Lecturer with Shenyang Aerospace University, China. He has published more than 30 articles. His research interests include AI, NLP, IR, and KG. Since 2017, he has sat on the Language and Knowledge Computing Profession Committee of the Chinese Information Processing Society of China. He is currently serving as the CKO at Global Envoy Software Company Ltd.
ZHIGUANG WANG is currently a Graduate Student with the School of Computing, Shenyang Aerospace University, China. His research interests include information retrieval and knowledge graph. He is currently developing machine learning algorithms and human-machine cooperation solutions in information services in the field of IIoT.
JIANJUN CHEN is currently an Assistant Professor with the Shenyang Northern Software College of Information Technology, China. He joined the Global Envoy Software Company Ltd., in 2002. His research interests include knowledge engineering and knowledge services. As the Director of software engineering, he is currently organizing the research and development of machine learning algorithms and human-machine cooperation solutions in information services in the field of IIoT.
PENG LIAN is currently an Assistant Professor with the Shenyang Northern Software College of Information Technology, China. He joined the Global Envoy Software Company Ltd., in 2002. His research interests include knowledge engineering and knowledge services. Since 2012, as the Chief Technology Officer, he has presided over more than 30 development projects of software systems in the field of IIoT.