Link Prediction of Weighted Triples for Knowledge Graph Completion Within the Scholarly Domain

Knowledge graphs (KGs) are widely used for modeling scholarly communication, performing scientometric analyses, and supporting a variety of intelligent services to explore the literature and predict research dynamics. However, they often suffer from incompleteness.


I. INTRODUCTION
Science of Science is a rapidly emerging research field that studies the interactions among scientific agents in order to develop tools and policies for accelerating the scientific process [12]. The large increase in the volume of scholarly outputs, such as articles, data sets, and software packages, yields unprecedented opportunities for this field, but also results in many challenges. This mass of available information has the potential to support a new generation of intelligent systems for exploring and improving research efforts, but at the same time risks drastically reducing the effectiveness of previous approaches for analysing the available information. For instance, a recent article in Science [5] reported that the reaction to the COVID-19 pandemic is being slowed down by the fact that ''scientists are drowning in COVID-19 papers'' and need new solutions to efficiently analyse the scientific literature.
The associate editor coordinating the review of this manuscript and approving it for publication was Ahmed A. Zaki Diab.
In order to address this challenge, we urge for structured, interlinked, and machine-readable representations of scholarly outputs. Knowledge Graphs (KGs) are becoming a standard solution for describing the actors (e.g., authors, organizations), the documents (e.g., publications, patents), and the research knowledge (e.g., research topics, tasks, technologies) in this space [17], [57]. One of the main limitations of most KGs is the incompleteness problem, i.e., a large number of relevant facts are not present in the KG. Scholarly KGs are typically incomplete regarding crucial relations such as affiliations, references, research topics, conferences, and many others. This issue is usually tackled by producing a representation of the nodes and edges based on knowledge graph embeddings (KGEs) [19] and applying link prediction techniques [10] to this representation. Embedding models have been successfully applied to KGs in different domains, including digital libraries [64], biomedicine [23], and social media [50]. However, several KGs also contain facts with numerical weights, in which the relationship is characterized by a numeric value, typically a confidence value, an intensity, or a value that further qualifies the information in the triple. Such a representation has already been described, analyzed, and verified through a formal declarative semantics [55], [56]. The resulting model theory is known as Annotated RDF (aRDF) and builds upon annotated logic. In aRDF, any partially ordered set with a bottom element can be employed: for a given partially ordered set (A, ⪯), an element φ is the bottom element iff φ ⪯ x for all x ∈ A. A might capture temporal, pedigree, possibilistic, or fuzzy values.
In the scholarly domain, the uncertainty typically stems from automatic approaches for disambiguating the actors in this space [20] (e.g., authors, organizations, countries) or for classifying articles according to specific categories [16], [41], [58] (e.g., topics, technologies, industrial sectors), as well as from the limited coverage of complementary knowledge bases, such as GRID 1 (Global Research Identifier Database) and ORCID 2 (Open Researcher and Contributor ID). Since most existing KGE models can only handle triples that are either true or false, they perform quite poorly on KGs that contain weighted triples. There has been limited work on KGEs able to consider weighted triples. The main solution in this space is the Uncertain Knowledge Graph Embeddings (UKGE) model [9], which however cannot properly handle erroneous or approximated weights in the graph. This situation, in which the weights are potentially inaccurate, is common in the case of data incompleteness.
In this paper, we propose the Weighted Triple Loss, a new loss function for KGE models that can effectively incorporate the numerical weights on facts and is tolerant to incorrect or approximated weights. This loss is very general and can be used with different interaction models, e.g., DistMult [63], TransE [4], ComplEx [54]. We also introduce the Weighted Rule Loss, a loss function that extends the Rule Loss [34] in order to work with weighted triples. This solution exploits a set of automatically extracted logical rules to further improve performance.
1 GRID - https://www.grid.ac/
2 ORCID - https://orcid.org/
We implemented a KGE model based on DistMult which combines these two solutions and applied it on several knowledge graphs, obtaining significant performance improvements with respect to the state of the art.
The motivating scenario for this work concerns the Academia/Industry DynAmics (AIDA) Knowledge Graph [1], which was created for supporting the analysis of the flow of knowledge between academia and industry and systems for the prediction of research dynamics. The current version of AIDA integrates the metadata of about 21M research articles from the Microsoft Academic Graph (MAG) and 8M patents from the Dimensions dataset 3 in the field of Computer Science. AIDA classifies these documents according to the research topics from the Computer Science Ontology (CSO) 4 [42] and to the authors' affiliation types from the Global Research Identifier Database (GRID) (e.g., 'education', 'company', 'government', 'healthcare'). This knowledge base enables tracking the evolution of research topics across academia, industry, government institutions, and other organizations. For instance, it was recently used for predicting the impact of specific research efforts on the industrial sector [40]. However, out of the 21M articles, only 5.1M were linked with GRID IDs in the source data and thus could be associated with their affiliation types and countries. Completing these data is thus crucial in order to broaden the scope of different kinds of analyses about geopolitical factors [27], researcher migrations [29], collaboration patterns between academia and industry [2], and many others.
In more detail, our main contributions are: • The Weighted Triple Loss, a loss function for weighted triples which is agnostic with respect to their meaning and tolerant to incorrect weights.
• The Weighted Rule Loss, a second loss function for weighted triples that takes advantage of a set of automatically extracted logical rules.
• AIDA35k, a new dataset describing 35K entities in the scholarly domain by means of weighted triples.
• An evaluation showing that a KGE model based on DistMult that incorporates these loss functions outperforms several state-of-the-art alternatives (UKGE, TransE, DistMult, and ComplEx) on AIDA35k, NL27k, and CN15k, and obtains competitive results on PPI5k.

The rest of the paper is organised as follows. In Section II, we review the literature on current embedding models for data completion and scholarly knowledge graphs. In Section III, we present a motivating scenario involving the completion of the AIDA knowledge graph. In Section IV, we describe the architecture of the new optimization technique. Section V reports the evaluation of the model against alternative solutions. Finally, in Section VI we summarise the main conclusions and outline future directions of research.

II. PRELIMINARIES AND RELATED WORK
In this section, we provide the background for knowledge graph embedding models (Section II-A) and then review the related work. Specifically, in Section II-B we give an overview of state-of-the-art KGE models that will be used in the evaluation of our work. In Section II-C we present some alternative methods for link prediction. In Section II-D we illustrate loss functions for KGEs. Finally, in Section II-E we discuss the knowledge graphs covering the scholarly domain.

A. KNOWLEDGE GRAPH EMBEDDING MODELS
A KGE model includes several components: embeddings (e.g., vector, matrix, tensor), a score function, and a loss function.

1) TRAINING SAMPLES
Each KGE model requires a set of samples used for training. The training set should contain both positive and negative samples, where positive means true triples and negative means false triples. However, each KG T = {(h, r, t)} used for training contains only positive samples; negative samples are not given explicitly. Therefore, for each positive sample, a set of negative samples N_{h,r,t} = {(h′, r, t′)} is randomly generated. This is performed by corrupting either the head h or the tail t. While most KGs contain triples of the form (h, r, t), recent works focus on learning over KG facts of the form (h, r, t, w_{h,r,t}), where w_{h,r,t} represents the weight of the triple (h, r, t) (e.g., the degree of uncertainty of the triple).
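As an illustration, uniform negative sampling by head/tail corruption can be sketched as follows. This is a minimal sketch, not the implementation used in the paper; real implementations batch this operation and may also filter out corruptions that happen to be known positives.

```python
import random

def corrupt(triple, entities, num_negatives=2, seed=0):
    """Generate negative samples for a positive triple (h, r, t) by
    replacing either the head or the tail with a random entity."""
    rng = random.Random(seed)
    h, r, t = triple
    negatives = []
    while len(negatives) < num_negatives:
        e = rng.choice(entities)
        # Corrupt the head or the tail with equal probability.
        cand = (e, r, t) if rng.random() < 0.5 else (h, r, e)
        if cand != triple:  # avoid regenerating the positive sample
            negatives.append(cand)
    return negatives
```

Each generated sample keeps the relation and one of the original entities, so only one side of the triple is corrupted at a time, as described above.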

2) EMBEDDINGS
For a given triple (h, r, t), a KGE model mainly aims at obtaining vector representations for the entities (shown in bold as h, t) and the relation r involved in each triple. The embedding space can be real (R^d), complex (C^d) [51], [54], or quaternion (H^d) [65]; these are generalized by geometric algebra (G^d) [62].

3) SCORE FUNCTION
A score function f (h, r, t) takes the embeddings of a triple (h, r, t) as input and returns a value reflecting its plausibility in the context of the KG. It is typically used by link prediction methods for assessing new candidate triples. Most modern KGE models either use distance functions (e.g., TransE [4], Trans4E [30], RotatE [51]) or inner product functions based on semantic matching (e.g., DistMult [63], ComplEx [54], QuatE [65], RESCAL [36], 5*E [31]) to compute scores of triples. The embeddings and consequently the scores of triples depend on the optimization of the loss function, which is introduced below.

4) LOSS FUNCTION
Typically, KGE models employ loss functions that can be applied only to positive and negative triples, such as the Margin Ranking Loss (MRL) [4], the negative log-likelihood of the logistic model [54], the self-adversarial loss [51], the Soft Margin Loss [32], and the full multiclass log loss [22]. An exception is RESCAL [36], which adopts the Mean Square Error (MSE) loss and can thus also be trained on weighted triples, where the weight reflects the uncertainty of a triple. A more recent work [9] combines the MSE loss with a rule-based loss in order to improve the ability to learn from weighted triples. In Section II-D we review these loss functions in detail.

B. OVERVIEW OF KGE MODELS
In this section, we provide a summary of the state-of-the-art KGE models that are also considered in the evaluation of this work. We consider two main classes of approaches: 1) distance-based models, which use distance functions (e.g. L2 norm) for score computation, and 2) semantic matching-based models, which use inner product.

1) DISTANCE-BASED KGE MODELS
TransE [4] is one of the earliest translation-based KGE models and is still considered a very competitive approach due to its simplicity and high performance. The score function of this model is:

f(h, r, t) = ‖h + r − t‖

Despite this simple score function, TransE is widely used as a baseline and still outperforms many recent, more complex models.
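As a sketch, the TransE score can be computed in a few lines of NumPy. The choice of the L2 norm here is an assumption of this sketch; TransE can equally be instantiated with the L1 norm.

```python
import numpy as np

def transe_score(h, r, t):
    """TransE distance score: lower values indicate more plausible triples.
    Computes the L2 norm of h + r - t."""
    return float(np.linalg.norm(h + r - t))
```

A triple is considered plausible when the translated head h + r lands close to the tail t in the embedding space.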
RotatE [51] applies a rotation-based mechanism that transforms the head entity into the tail entity via a relation-specific transformation. It embeds entities and relations in complex space, and its score function is:

f(h, r, t) = ‖h ∘ r − t‖

where ∘ is the element-wise product and each element of r has unit modulus. This model is currently one of the top-performing KGE models for link prediction.

2) SEMANTIC MATCHING-BASED KGE MODELS
ComplEx [54] uses the similarity of latent representations for scoring positive and negative triples. The name of this model also reflects the space in which it is designed, i.e., complex space. The underlying score function is:

f(h, r, t) = Re(h⊤ diag(r) t̄)

where t̄ is the conjugate of the vector t and Re(·) returns the real part of a complex number. diag(r) is the diagonal matrix whose diagonal elements are the values of r and whose non-diagonal elements are zero.
QuatE [65] embeds entities and relations as quaternions, e.g., h = a_h + b_h i + c_h j + d_h k. For the computation of h′, the relation embedding r = p_r + q_r i + u_r j + v_r k is normalized to a unit quaternion:

r^◁ = r / ‖r‖ = (p_r + q_r i + u_r j + v_r k) / √(p_r² + q_r² + u_r² + v_r²)

Then the Hamiltonian product (denoted ⊗) between r^◁ and h is computed as h′ = h ⊗ r^◁, and the score is the inner product between h′ and the tail embedding t.

DistMult [63] extends the RESCAL [36] model. The score function of RESCAL is:

f(h, r, t) = h⊤ M_r t

where M_r is a relation-specific matrix. DistMult simplifies RESCAL by restricting M_r to a diagonal matrix, capturing the relational semantics through a pairwise interaction of the latent features:

f(h, r, t) = h⊤ diag(r) t
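As an illustration, the two semantic-matching score functions can be sketched in a few lines of NumPy. This is a sketch under the definitions above, not the authors' implementation.

```python
import numpy as np

def distmult_score(h, r, t):
    """DistMult: pairwise interaction of latent features, sum_i h_i * r_i * t_i.
    Note that this score is symmetric in h and t."""
    return float(np.sum(h * r * t))

def complex_score(h, r, t):
    """ComplEx: real part of the trilinear product <h, r, conj(t)> over
    complex-valued embeddings; the conjugate makes the score asymmetric."""
    return float(np.real(np.sum(h * r * np.conj(t))))
```

The test of asymmetry is the key design difference: DistMult cannot distinguish (h, r, t) from (t, r, h), whereas ComplEx can, thanks to the conjugation of the tail embedding.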

C. ALTERNATIVE METHODS FOR LINK PREDICTION
Besides shallow embedding models (e.g., QuatE, ComplEx, TransE, and RotatE), which can be classified as dimensionality-reduction-based techniques, many other approaches could be considered, based on information theory, clustering, and perturbation [26]. Graph neural networks (GNNs) [61] are one of the main techniques for link prediction: they compute a d-dimensional embedding for each node based on its local neighborhood [66]. However, GNNs generally have high computational costs, which are problematic on large-scale knowledge graphs. Another family of approaches that has recently gained attention is few-shot learning (FSL) [60]. FSL does not deal with the structure of the data but with its quantity: in contrast with other machine learning approaches, which need massive amounts of data to produce accurate predictions, FSL approaches can achieve high performance from much smaller datasets.
In terms of weighted triples, there are a few works from the Semantic Web community [6], [7], [24], [53] in which an RDF format is specified for weighted triples. However, in these works the weight is not treated as a fourth element that extends the triple representation to a quadruple, but as part of the tail of a triple. In [8], the time factor and the uncertainty of triples are represented as weighted triples; however, this work does not use an embedding model for prediction, relying instead on Markov Logic Networks.
Some other non-embedding link prediction methods, such as [25], [46], [47], are also able to take into consideration weighted triples. However, they can only process simple undirected graphs, since they do not support multiple relation types and self-loops. Therefore, they are not applicable to most KGs, which are typically multi-relational directed graphs allowing self-loops for some relation types.

D. LOSS FUNCTIONS FOR KGEs
In this section, we review several loss functions used by state-of-the-art KGE models.

1) MARGIN RANKING LOSS
The margin ranking loss (MRL) [4], inspired by general margin-based approaches [3], aims at enforcing a margin between each positive sample (h, r, t) and its corresponding negative samples (h′, r, t′). The negative samples are generated by replacing either the head or the tail of positive samples with a random entity from the KG. The formulation of the MRL is:

L = Σ_{(h,r,t) ∈ T} Σ_{(h′,r,t′) ∈ N_{h,r,t}} [γ + f(h, r, t) − f(h′, r, t′)]_+

where [x]_+ = max(0, x), f is a distance-based score function (lower values indicate more plausible triples), T is the set of all positive samples, N_{h,r,t} is the set of all negative samples generated from the triple (h, r, t), and γ is the length of the margin between positive and negative samples. MRL has been widely used for training TransE and its variants.
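The margin ranking loss can be sketched as follows; this is a minimal per-pair sketch assuming distance-based scores (lower is more plausible) and one negative per positive.

```python
def margin_ranking_loss(pos_scores, neg_scores, gamma=1.0):
    """Margin ranking loss for distance-based scores: penalizes each positive
    sample whose score is not at least `gamma` below its negative's score."""
    loss = 0.0
    for fp, fn in zip(pos_scores, neg_scores):
        loss += max(0.0, gamma + fp - fn)
    return loss
```

The loss is zero whenever every negative sample is already separated from its positive counterpart by at least the margin gamma.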

2) LIMIT-BASED SCORING LOSS
When a KGE model is trained using the MRL, the scores of positive triples may be unbounded. In the case of translation-based KGE models (e.g., TransE), this lack of a bound would prevent the model from fulfilling the translation in the vector space, resulting in poor performance [67]. The limit-based scoring loss [67] avoids this issue by adding a bound on the scores of positive triples to the margin ranking loss:

L = Σ_{(h,r,t) ∈ T} Σ_{(h′,r,t′) ∈ N_{h,r,t}} ( [γ + f(h, r, t) − f(h′, r, t′)]_+ + λ [f(h, r, t) − γ₁]_+ )
The term [f(h, r, t) − γ₁]_+ enforces the constraint f(h, r, t) ≤ γ₁. Therefore, the scores of positive triples are bounded by γ₁ and cannot grow arbitrarily large. λ is a multiplier of the regularization term that determines the importance of the term [f(h, r, t) − γ₁]_+ in the optimization. This loss function improved the performance of translation-based embedding models (TransE, TransH, TransR, etc.).

3) SOFT MARGIN LOSS (SML)
The soft margin loss [33] aims at handling noisy negative samples. It adds slack variables η_{h,r,t} to the optimization of the negative samples in order to mitigate the effect of false negatives:

min_{θ,η} Σ_{(h,r,t) ∈ T} ( λ η²_{h,r,t} + λ⁺ [f(h, r, t) − γ₁]_+ + λ⁻ Σ_{(h′,r,t′) ∈ N_{h,r,t}} [γ₂ − f(h′, r, t′) − η_{h,r,t}]_+ )

where λ, λ⁺, λ⁻ are regularization weights used as hyperparameters. This loss function has been used for training TransE and RotatE.

4) SlidE LOSS
The length of the margin is an important factor affecting the performance of KGE models. In the Limit-based Scoring Loss and the Soft Margin Loss, the length of the margin is determined by setting two hyper-parameters through trial and error. The SlidE loss [35] addresses this issue by fixing the center of the margin and automatically adjusting its length by means of a trainable variable η: in its formulation, γ is the center of the margin, 2η is the length of the margin, and σ is the variance of a Gaussian function that shapes the margin.

5) SELF ADVERSARIAL LOSS (SAL)
Self-Adversarial Loss [51] obtained state-of-the-art performance with distance-based KGE models. For a distance-based score f (lower values indicate more plausible triples), its formulation is:

L = − log σ(γ − f(h, r, t)) − Σ_{i=1}^{n} p(h′_i, r, t′_i) log σ(f(h′_i, r, t′_i) − γ)

where the weight of each negative sample is computed by a softmax over the sampled negatives:

p(h′_j, r, t′_j) = exp(−α f(h′_j, r, t′_j)) / Σ_{i=1}^{n} exp(−α f(h′_i, r, t′_i))

Here, α is the adversarial temperature, which controls the extent of the attention given to the scores of the negative samples, and σ is the sigmoid function. The loss assigns higher weights to the hardest negative samples, i.e., those that the model currently scores as most plausible, in order to push them away as much as possible.
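The adversarial weighting of the negative samples can be sketched as follows. This is a sketch of the softmax weighting only, assuming distance-based scores; the full loss additionally applies the sigmoid terms shown above.

```python
import math

def self_adversarial_weights(neg_distances, alpha=1.0):
    """Softmax weighting of negative samples: negatives with smaller
    distances (harder negatives) receive larger weights. alpha is the
    adversarial temperature; alpha = 0 yields uniform weights."""
    exps = [math.exp(-alpha * d) for d in neg_distances]
    z = sum(exps)
    return [e / z for e in exps]
```

With alpha = 0 this degenerates to plain uniform negative sampling, which is why the temperature is described as controlling the "extent of attention" on hard negatives.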

6) NEGATIVE LOG LIKELIHOOD LOSS (NLL)
The negative log-likelihood [54] of the logistic model with regularization is:

L = Σ_{(h,r,t)} log(1 + exp(−y_{h,r,t} f(h, r, t))) + λ ‖θ‖²₂

where θ represents, for simplicity, all the adjustable parameters and y_{h,r,t} is the label of the triple (positive triples are labeled with 1 and negative triples with −1).

7) FULL MULTICLASS LOG LOSS (FMLL)
The ComplEx model was originally trained using the negative log-likelihood loss. However, it has recently been shown that the model obtains state-of-the-art performance using the full multiclass log-loss [22]. This loss applies full negative sampling and is defined as:

L = Σ_{(h,r,t) ∈ T} l(f(h, r, t))

where l(f(h, r, t)) = l₁(f(h, r, t)) + l₂(f(h, r, t)), with

l₁(f(h, r, t)) = −f(h, r, t) + log( Σ_{h′ ∈ E} exp(f(h′, r, t)) )
l₂(f(h, r, t)) = −f(h, r, t) + log( Σ_{t′ ∈ E} exp(f(h, r, t′)) )

where log and exp are the logarithmic and exponential functions, respectively. The loss encourages high scores for positive triples and low scores for negative ones.

8) UKGE LOSS
The previously discussed loss functions are suitable for learning over triples that are either positive or negative; they cannot handle weighted triples.
Recently, the UKGE model combined the MSE loss (already adopted by the RESCAL model) with a probabilistic soft logic loss for training on KGs with uncertain triples, i.e., triples associated with a weight that reflects the confidence in their correctness [9]. This loss is model-independent and is formulated as:

L = Σ_{(h,r,t) ∈ T_w} ( f(h, r, t) − w_{h,r,t} )² + Σ_g ψ_g²

where w_{h,r,t} is the weight of a triple (h, r, t), T_w is the set of all the weighted triples, g refers to a rule, and ψ_g is the weighted distance to satisfaction of the rule g obtained by probabilistic soft logic.
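The MSE component of this loss can be sketched as follows; this is a minimal sketch assuming the model's scores have already been mapped into the same [0, 1] range as the confidence weights.

```python
def ukge_mse_term(scores, weights):
    """MSE term of the UKGE-style loss: ties each learned score
    directly to the observed confidence weight of its triple."""
    return sum((f - w) ** 2 for f, w in zip(scores, weights))
```

Because each score is pulled toward its observed weight, an erroneous input weight is reproduced rather than corrected, which is the limitation discussed next.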
An important limitation of this loss is that it constrains the KGE model to learn scores that are very close to the input ones. This can be problematic when dealing with approximated or incorrect values that are then incorporated in the model without any correction. We will discuss this issue further in Section IV-A.

E. SCHOLARLY KNOWLEDGE GRAPHS
Knowledge graphs about research outputs typically either focus on the metadata (e.g., titles, abstracts, authors, organizations) or they offer a machine-readable representation of the knowledge contained in research articles.
A good example of the first category is Microsoft Academic Graph (MAG) [59], which is a heterogeneous knowledge graph containing the metadata of more than 242M scientific publications, including citations, authors, institutions, journals, conferences, and fields of study. Similarly, the Semantic Scholar Open Research Corpus 6 is a dataset of about 185M publications released by Semantic Scholar, an academic search engine provided by the Allen Institute for Artificial Intelligence. Another knowledge graph is the OpenCitations Corpus [39], which includes 55M publications and 655M citations. Scopus is a well-known dataset curated by Elsevier, which includes about 70M publications and is often used by governments and funding bodies to compute performance metrics. The Open Academic Graph (OAG) 7 is a large knowledge graph integrating 208M papers from MAG and 172M from AMiner.
All these resources suffer from data incompleteness to different degrees. For instance, it is still challenging to identify and disambiguate affiliations. This hinders our ability to categorize the articles according to their affiliation type or country [27]. Similarly, references are usually incomplete, and the citation count of the same paper tends to vary dramatically on different datasets [39].
A second category of knowledge graphs focuses instead on representing the content of scientific publications. This objective was traditionally pursued by the semantic web community, e.g., by creating bibliographic repositories in the Linked Data Cloud [37], encouraging the Semantic Publishing paradigm [48], implementing systems for managing nano-publications [14], [21] and micropublications [44], and developing a variety of ontologies to describe scholarly data, e.g., SWRC, 8 BIBO, 9 BiDO, 10 SPAR [38], 11 CSO [43]. A recent project is the Open Research Knowledge Graph (ORKG) [18], which aims at describing research papers in a structured manner to make them easier to find and compare. Similarly, the Artificial Intelligence Knowledge Graph (AI-KG) 12 describes 1.2M statements extracted from 333K research publications in the field of AI. Since extracting the scientific knowledge from research articles is still a very challenging task, these resources tend also to suffer from data incompleteness.

III. MOTIVATING SCENARIO: THE AIDA KNOWLEDGE GRAPH
New scientific knowledge is continuously produced by the collective effort of a variety of actors, such as universities, commercial companies, government institutions, non-profit, and many others. Analysing how these organizations collaborate in different research areas and exchange ideas and persons is crucial for harmonising their efforts as well as for understanding, monitoring, and anticipating research dynamics [2].
In order to support such analysis, we recently released the Academia/Industry DynAmics (AIDA) Knowledge Graph [1], a resource that includes more than one billion triples and describes 21M publications from Microsoft Academic Graph (MAG) 13 [59] and 8M patents from Dimensions. AIDA is available at http://aida.kmi.open.ac.uk and can be downloaded as a dump or queried via a Virtuoso triplestore (http://aida.kmi.open.ac.uk/sparql/). All the articles and patents in AIDA are associated with a distribution of topics from the Computer Science Ontology (CSO) [42], which is the largest taxonomy in the field, counting more than 14K topics. 5.1M publications and 5.6M patents are also categorized according to the types of the authors' affiliations from the Global Research Identifier Database (GRID), an openly accessible database of research institution identifiers. The classification is composed of eight mutually exclusive categories: Education, Healthcare, Company, Archive, Nonprofit, Government, Facility, and Other.
The combination of organization types and topics in AIDA allows researchers to produce several kinds of advanced analysis. For instance, it was recently used to improve the state of the art in predicting the impact of research on the industrial sectors [40]. Table 1 shows, as an example, the number of publications in three well-known research topics classified according to the percentage of authors per organization type (we show just five for space constraints). For instance, about 15.7K Neural Networks articles have at least one author from a company, 11.7K have at least half of their authors in this category, and only 8.6K have all of their authors from a company. Overall, these data show that the organization types differ substantially in terms of collaboration patterns. Authors from academia tend mostly to collaborate among themselves, and the same is true, to a lesser degree, for authors from companies. Conversely, the other categories tend to collaborate more with different types.
For instance, in Computer Science only 8.1% of the articles involving authors from Universities (Education) include at least one collaborator from the other categories 14 (e.g., Company, Government). Conversely, authors from companies collaborate with at least another category (mostly Education) in 14.6% of the cases. This number raises to 46.1% for Government Institutions, 46.6% for Nonprofit, and 69.0% for Healthcare.
However, these dynamics can vary drastically in different research areas. For instance, companies tend to collaborate much more with other categories (mostly universities under Education) in the fields of Neural Networks (45.7% of collaborations) and Semantic Web (49.8%).
The main shortcoming of the current version of AIDA is that only about 25% of the articles (5.1M out of 21M) and 70% of the patents (5.6M out of 8M) are associated with a GRID affiliation type. The missing data are due to the fact that some affiliations are not present on GRID or were not correctly mapped to the relevant GRID IDs in the original data. In order to broaden the scope of the analyses supported by AIDA, it is thus critical to address this issue by mapping articles to the correct organization type.
This scenario motivated us to investigate different models for link prediction that could be applied to AIDA and to other knowledge graphs that suffer from the same issues. However, as previously mentioned, several pieces of information regarding the documents in AIDA are best represented as weighted triples. For instance, since the authors of a paper can have different affiliation types, each category is associated with a weight equal to the fraction of authors associated with that type. Therefore, a paper that has three authors associated with the type 'Education' and one with the type 'Company' would be assigned the category 'Education' with a weight of 0.75 and the category 'Company' with a weight of 0.25. This can be represented as two weighted triples: < paperID, hasGridType, Education, 0.75 > and < paperID, hasGridType, Company, 0.25 >. The same mechanism is also used to associate articles with countries: a paper with half of its authors from the UK will be associated with the weighted triple < paperID, hasCountry, UK, 0.5 >. The same solution is also used to quantify the number of citations received by a paper in a specific year. When representing these data in the Resource Description Framework (RDF), we need to reify these triples, as shown in Figure 1.
14 This percentage is computed as the difference between the number of articles in Computer Science with at least one author from Education (Computer Science (>0) in Table 1, 3,969,096) and the number of articles in Computer Science with only authors from Education (Computer Science (=1.0), 3,648,629).
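The derivation of weighted hasGridType quadruples from the authors' affiliation types can be sketched as follows; the function name and the list-of-tuples output format are ours, introduced only for illustration.

```python
from collections import Counter

def affiliation_weighted_triples(paper_id, author_types):
    """Derive weighted (paper, hasGridType, type, weight) quadruples,
    with the weight equal to the fraction of authors having each
    affiliation type."""
    counts = Counter(author_types)
    n = len(author_types)
    return [(paper_id, "hasGridType", typ, c / n)
            for typ, c in sorted(counts.items())]
```

For the example above (three 'Education' authors and one 'Company' author), this yields weights of 0.75 and 0.25 respectively.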
These considerations led to the design of the loss functions presented in this paper. In order to complete the AIDA KG, we implemented a version of DistMult that incorporates the Weighted Triple Loss function, referred to in the following as Weighted Graph Embedding (WGE).
In order to empirically evaluate the effectiveness of our loss functions, we apply them to a subset of the AIDA knowledge graph, which we named AIDA35k: a new dataset including 35K entities from AIDA associated with triples with numerical weights. AIDA35k is a weighted knowledge graph K = {E, R, T_w}, where T_w = {(h, r, t, w_{h,r,t})} ⊂ E × R × E × ℝ and w_{h,r,t} is the weight of the fact (h, r, t).

IV. OPTIMISING KGE MODELS FOR WEIGHTED TRIPLES
In this section we propose two loss functions: Weighted Triple Loss and Rule Loss for Weighted Triples. These loss functions optimise the weighted triples of the form (h, r, t, w h,r,t ) where h and t are the head and tail entities, r is a relation between them, and w h,r,t is the weight assigned to the triple (h, r, t).

A. WEIGHTED TRIPLE LOSS
The loss function is agnostic with respect to the kind of weight. Conceptually, we consider two main types of weights. The first is related to the correctness of the triple and indicates its degree of plausibility. The second refers to the intensity of the relation and reflects the degree of association between the head and the tail.
The main intuition behind the Weighted Triple Loss (WTL) presented in this paper is that, in many practical cases, the weight w_{h,r,t} is estimated on the basis of potentially incomplete data and possibly biased computational methods. For instance, the weights associated with the organization types in AIDA depend on many factors, such as the coverage of the GRID database at a particular moment in time and the performance of the disambiguation approaches applied by MAG. Therefore, these weights are typically approximations and some of them may be simply incorrect. This limitation needs to be taken into account during the learning phase. Therefore, WTL allows the model to learn the score f(h, r, t) of a triple (h, r, t) in the range:

[ w_{h,r,t} − (η⁻_{h,r,t})², w_{h,r,t} + (η⁺_{h,r,t})² ]

where η⁻_{h,r,t} and η⁺_{h,r,t} are trainable variables that allow the score f(h, r, t) not to be exactly equal to w_{h,r,t}, but rather a number bounded between w_{h,r,t} − (η⁻_{h,r,t})² and w_{h,r,t} + (η⁺_{h,r,t})². In order to optimize the embedding vectors of entities and relations as well as to adjust η⁻_{h,r,t} and η⁺_{h,r,t}, the following optimization framework is proposed:

min_θ Σ_{(h,r,t) ∈ T ∪ N} ( [f(h, r, t) − w_{h,r,t} − (η⁺_{h,r,t})²]_+ + [w_{h,r,t} − (η⁻_{h,r,t})² − f(h, r, t)]_+ ) + λ₁ Σ (η⁻_{h,r,t})² + λ₂ Σ (η⁺_{h,r,t})² + λ₃ L

where λ₁, λ₂ are hyper-parameters that control the degree to which η⁻_{h,r,t}, η⁺_{h,r,t} are minimized, λ₃ is the multiplier of the regularization term over the embeddings of entities and relations, and θ contains all the adjustable parameters, including the embeddings of entities and relations and the slack variables, i.e., θ = {(h, r, t)} ∪ {η⁻_{h,r,t}, η⁺_{h,r,t} | (h, r, t) ∈ T}. L is the regularization over the entity and relation embeddings, i.e., L = ‖E‖² + ‖R‖², where E and R are the embeddings of all the entities and relations in the KG. For each quadruple (h, r, t, w_{h,r,t}) in the training set, we generate a corrupted sample using the uniform negative sampling technique [4], in which either h or t is replaced by a random entity e ∈ E, i.e., the resulting triples are (h′ = e, r, t, w_{h′,r,t}) or (h, r, t′ = e, w_{h,r,t′}).
For the corrupted samples, we set their weights w to zero. We indicate the set of all corrupted samples by N .
This solution results in a high tolerance to incorrect weights. Indeed, the UKGE [9] loss forces the KGE model to learn scores that are very close to the input weights, so incorrect values are preserved and incorporated into the model. Conversely, WTL allows more flexibility in the learning process by permitting a wider range of scores to be learnt; as a result, incorrect weights can be corrected using contextual information from other triples.
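The per-sample behaviour of WTL can be sketched as follows. This is an illustration only: the hinge form of the interval penalty and the hyper-parameter names lam1/lam2 are assumptions of this sketch, matching the slack-interval description above rather than the exact published formulation.

```python
def weighted_triple_loss(score, weight, eta_minus, eta_plus,
                         lam1=0.1, lam2=0.1):
    """Sketch of the WTL for one sample: the score is only penalized when
    it leaves the slack interval [weight - eta_minus**2, weight + eta_plus**2]."""
    lower = weight - eta_minus ** 2
    upper = weight + eta_plus ** 2
    fit = max(0.0, score - upper) + max(0.0, lower - score)
    # Regularization keeps the slack variables small, so the tolerated
    # interval cannot grow arbitrarily wide.
    slack = lam1 * eta_minus ** 2 + lam2 * eta_plus ** 2
    return fit + slack
```

A score anywhere inside the slack interval incurs no fitting penalty, which is precisely what lets the model deviate from an approximated or incorrect input weight.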

B. WEIGHTED RULE LOSS
1) EXTRACTION OF RULES
In order to include additional logical rules as complementary knowledge, we used the AMIE rule extractor [13], which is specifically designed for rule extraction on KGs. A logical rule is generally of the form q₁ ∧ q₂ ∧ … ∧ qₙ → qₙ₊₁, where the body atoms q₁, …, qₙ jointly imply the head atom qₙ₊₁.

2) DEFINITION OF THE RULE LOSS FOR WEIGHTED TRIPLES
In order to apply rules to weighted triples we extend the approach presented in Nayyeri et al. [34]. For a given rule of the form rule : q 1 ∧ q 2 ∧, . . . , ∧q n − → q n+1 where q i , i = 1, . . . , n + 1 are atoms (weighted triples where relations are fixed, but entities are variable). To model rule loss for the above-mentioned rule, we use the following formula.
where w_{q_i} is the weight of the weighted triple q_i, i = 1, …, n, after grounding of the entities (i.e., replacing the variables with entities in E), and f(q_{n+1}) is the score of the triple (h, r, t) in the weighted triple (h, r, t, w_{h,r,t}), where w_{h,r,t} is not given in the training set but is approximated by the score of the adopted KGE model, i.e., f(q_{n+1}) = f(h, r, t). For each rule_i, i = 1, …, l, in the rule set, we provide the corresponding rule loss R_i. The rule loss can either be added to the optimization framework as an additional term, or injected as additional weighted triples T_w = {(h, r, t, w_{h,r,t} = w_{q_1} * … * w_{q_n})}, where (h, r, t) is the head of a rule q_1 ∧ q_2 ∧ … ∧ q_n → (h, r, t). Therefore, the following optimization problem is suggested.
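The second option above, injecting rules as additional weighted triples, can be sketched as follows. The data representation is an illustrative simplification: a grounded rule's head becomes an extra training quadruple whose weight is the product of the body weights:

```python
# Turn a grounded rule q1 ∧ ... ∧ qn → (h, r, t) into an additional
# weighted training quadruple, with weight w_q1 * ... * w_qn.
from functools import reduce
from operator import mul

def rule_to_quadruple(body_weights, head_triple):
    w = reduce(mul, body_weights, 1.0)   # product of the body weights
    h, r, t = head_triple
    return (h, r, t, w)
```

The resulting set T_w is simply appended to the training set T, so the same WTL optimization framework applies unchanged.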

V. EVALUATION
In this section, we compare the performance of i) Weighted Graph Embedding (WGE), the version of DistMult that incorporates the Weighted Triple Loss function (see Section IV), ii) the Uncertain KG Embedding (UKGE), which uses the loss function presented in Chen et al. [9] (see Section II-D), iii) DistMult [63], iv) TransE [4], and v) ComplEx [54] on several datasets. In addition to AIDA35k, which was introduced in Section III, we used three other datasets that include weighted triples: CN15k, NL27k, and PPI5k. These were used in the evaluation of UKGE [9], which is one of the baselines. CN15k is a subgraph of ConceptNet [49] that covers 15,000 entities and 241,158 uncertain relation facts in English. NL27k was obtained from NELL [28], an uncertain KG extracted from web pages containing information about cities, companies, emotions, and sports teams. It covers 27,221 entities, 404 relations, and 175,412 uncertain relation facts. Finally, PPI5k was extracted from the protein-protein interaction knowledge base STRING [52] and contains 271,666 uncertain relation facts for 4,999 proteins and 7 interactions.
We considered two different versions of the WGE and UKGE models using two score functions, respectively the logistic function (WGE_logi and UKGE_logi) and the bounded rectifier (WGE_rect and UKGE_rect) [9]. These are defined as follows:
• Logistic: φ(x) = 1/(1 + e^(−(wx + b)))
• Bounded rectifier: φ(x) = min(max(wx + b, 0), 1)
where w and b are learned parameters. In addition, since the original article about UKGE also presented an alternative version that injects probabilistic soft logic (PSL) rules to derive weights between 0 and 1 for unobserved triples, we also considered two further versions of UKGE that make use of PSL (UKGE_rect_psl and UKGE_logi_psl). However, these models could only be used on the three datasets released with the original paper about UKGE (PPI5k, CN15k, NL27k) [9], since the article does not give enough details to reproduce them on a new dataset.
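The two mapping functions can be written down directly. This sketch assumes the standard logistic form used by UKGE [9]; w and b are learned scalars in the actual models, passed here as plain arguments for illustration:

```python
# The two score-to-confidence mappings behind the _logi and _rect model
# variants: both squash a raw plausibility score x into [0, 1].
import math

def logistic(x, w=1.0, b=0.0):
    # Smooth sigmoid mapping: 1 / (1 + e^-(wx + b))
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

def bounded_rectifier(x, w=1.0, b=0.0):
    # Piecewise-linear mapping clipped to [0, 1]
    return min(max(w * x + b, 0.0), 1.0)
```

The bounded rectifier is linear inside the unit interval and saturates outside it, whereas the logistic mapping never reaches 0 or 1 exactly.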
On AIDA35k, we further tested two versions of WGE (WGE_rect_rules and WGE_logi_rules) and two versions of UKGE (UKGE_rect_rules and UKGE_logi_rules) that use the Weighted Rule Loss as defined in Section IV-B. In order to extract the rules from AIDA35k, we used AMIE+ [13]. This produced an initial set of about 40 rules, from which we retained only the 18 strongest and grounded them to obtain around 126k grounded triples. When we ran AMIE+ on the other three datasets, it produced an unfeasibly large number of rules (more than 10k) to integrate into the model. Therefore, we limited the evaluation of the Weighted Rule Loss to the AIDA35k dataset.

A. EXPERIMENTAL SETUP 1) ENVIRONMENT
We implemented our model WGE using Python 3.7 and the PyTorch library (version 1.7.1). We used the scikit-learn library (version 0.22) to implement the evaluation metrics, and adopted Adam as the optimizer for training our model. We employed AMIE+ [13] to automatically extract logical rules from the KG, and ran the code on Google Colab servers using a Tesla K80 GPU and 13 GB of RAM.
The code of our approach can be freely accessed at https://github.com/gokcemuge/WeightedGraphEmbedding, while the data used for training and evaluation are publicly available at http://aida.kmi.open.ac.uk/aida35k/.

2) RULE EXTRACTION
We set a probability threshold of 0.4 for the rules extracted by AMIE. When binding rule variables to entities, these rules generate grounded triples, which we filtered using a threshold of 0.1. Overall, this process produced 18 rules and 126,031 grounded triples, of which 20,450 belong to the hasGridType relation.
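The two-step filtering above can be sketched as follows; the rule and grounding representations are hypothetical simplifications of whatever AMIE+ actually emits:

```python
# Step 1: keep only rules whose confidence meets the 0.4 threshold.
# Step 2: keep only grounded quadruples whose weight meets the 0.1
# threshold. Rules are dicts and groundings are (h, r, t, w) tuples
# purely for illustration.

def filter_rules(rules, min_conf=0.4):
    return [rule for rule in rules if rule["confidence"] >= min_conf]

def filter_groundings(groundings, min_weight=0.1):
    return [g for g in groundings if g[3] >= min_weight]
```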

3) METRICS AND HYPERPARAMETERS
We adopted the Mean Square Error (MSE), the Mean Absolute Error (MAE), the Area Under the Curve (AUC) [45], and the F1 measure as evaluation metrics. We also evaluated the time complexity of our approach at the granularity of seconds per epoch. Since space is limited and these are standard metrics used by most works in this field [15], we do not describe them further in this paper.
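As a minimal illustration of how the regression-style metrics above are computed over predicted versus gold triple weights (the paper uses scikit-learn; this is an equivalent plain-Python sketch):

```python
# MSE and MAE between predicted triple weights and gold weights,
# averaged over the evaluation set.

def mse(pred, gold):
    return sum((p - g) ** 2 for p, g in zip(pred, gold)) / len(pred)

def mae(pred, gold):
    return sum(abs(p - g) for p, g in zip(pred, gold)) / len(pred)
```

AUC and F1 additionally require binarizing the weights against a threshold, which is why they capture ranking and classification quality rather than weight-regression quality.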
Table 3 reports the performance of the approaches on PPI5k, CN15k, and NL27k. On the NL27k dataset, the rectifier version of WGE (WGE_rect) outperforms the other approaches in MSE and MAE, while the logistic version (WGE_logi) achieves the best results in F1, AUC, and accuracy. On PPI5k, WGE obtains competitive results, outperforming TransE, DistMult, and ComplEx in all the metrics and UKGE in MAE. However, UKGE performs better in F1 and MSE and yields comparable accuracy. This is due to the limited size of PPI5k, which includes only 5K entities and 7 relations. On CN15k, the rectifier version of WGE (WGE_rect) obtains the highest performance in all the metrics.

Table 4 reports the performance and running time on the AIDA35k dataset. WGE_rect_rules obtains the best results in terms of AUC (0.864), F1 (0.847), and accuracy (0.871), while WGE_logi_rules yields the best MAE and WGE_logi the best MSE. While our model outperforms the other competitors, its running time (reported in seconds per epoch) remains close to that of the other models.

Table 6 focuses on two specific relations from AIDA35k: hasGridType and hasCountry. For hasGridType, WGE_rect achieves the best results in MAE, F1, and accuracy. For hasCountry, the rectifier version of WGE with rules (WGE_rect_rules) outperforms all the other models in MSE, F1, and accuracy. This suggests that incorporating the Weighted Rule Loss can significantly enhance performance, especially for certain types of relations.
Overall, WGE, our solution based on WTL, outperformed UKGE on AIDA35k, CN15k, and NL27k and obtained competitive results on PPI5k. This seems to be due to the ability of WTL to better tolerate incorrect weights. WGE also outperformed by a large margin the other models, which are based on loss functions that do not handle weighted triples. In particular, the difference in F1 score and accuracy between the standard DistMult model and the DistMult interaction model with the best variant of our proposed loss function is higher than 10% on average across all datasets. This empirically confirms our hypothesis that there is a substantial benefit in using triple weight information, when available, in the loss function.

VI. CONCLUSION AND FUTURE WORK
In this paper we proposed the Weighted Triple Loss (WTL), a new loss function for KGE models that can effectively handle weighted triples and is tolerant to incorrect or approximated weights. We also introduced the Weighted Rule Loss, a loss function that extends the Rule Loss in order to work with weighted triples.
In order to test these solutions, we developed the Weighted Graph Embedding (WGE), a new KGE model which uses the interaction model of DistMult with the two loss functions.
The evaluation showed that this approach outperforms all the baselines (UKGE, TransE, DistMult, and ComplEx) on AIDA35k (metadata of research articles), NL27k (data from web pages), and CN15k (concepts from ConceptNet), and obtains competitive results on PPI5k (proteins from STRING).
WGE was originally designed to address the real-world scenario of completing the AIDA Knowledge Graph, in order to enable more comprehensive quantitative analyses of science regarding geopolitical factors [27] and the flow of knowledge between different types of organizations [2] (e.g., university, industry, non-profit). However, the loss functions presented in this paper are general solutions that can be used in many different domains to take weighted triples into account. They can also support different interaction models, such as DistMult [63], TransE [4], and ComplEx [54].
The approach presented in this paper opens up several interesting directions for future work. First, we aim to apply WGE to other KGs in this space to improve their coverage of the research landscape. Specifically, we plan to run it on recent KGs describing scientific concepts (e.g., tasks, methods, materials) and their relationships, such as AI-KG [11] and ORKG [18], where the numerical weights could represent the consensus of the research community on the relevant statements. We also plan to apply model selection techniques in order to investigate the best set of parameters and evaluation methods in this space. Finally, we would also like to apply our approach to a range of KGs in other domains, to investigate how the results and performance might be affected by the underlying domain.
GÖKCE MÜGE CIL received the B.Sc. degree in computer engineering from Bilkent University, Turkey, in 2017. She is currently pursuing the master's degree in computer science with the University of Bonn. She is working on her master's thesis with the Smart Data Analysis Group.
SAHAR VAHDATI received the M.Sc. and Ph.D. degrees in computer science from the University of Bonn. She has been a Senior Researcher and holds a postdoctoral position with Oxford University, U.K. She is currently leading the Nature-Inspired Machine Intelligence Research Group, Institute of Applied Informatics (InfAI), University Leipzig. Her research interests include using knowledge representation, analyzing knowledge graphs, and artificial intelligence (AI).

FRANCESCO OSBORNE is currently a Research Fellow with the Knowledge Media Institute, The Open University, U.K., where he leads the Scholarly Data Mining Team. His research interests include artificial intelligence, information extraction, knowledge graphs, the science of science, and the semantic web. He has authored more than 80 peer-reviewed publications in top journals and conferences in these fields. He collaborates with major publishers, universities, and companies in the space of innovation to produce a variety of innovative services for supporting researchers, editors, and research policy makers. He recently released the Computer Science Ontology, which is currently the largest taxonomy of research areas in the field.
ANDREY KRAVCHENKO is currently a Researcher with the University of Oxford and with the Skolkovo Institute of Science and Technology. His Ph.D. research was at the intersection of machine learning and unstructured data extraction. He also played a significant role in the DIADEM project, which produced state-of-the-art research in the field of large-scale fully automated web data extraction. His current research interests include the theory and application of anomaly detection in big data using sequences and graphs, and in particular the development of efficient machine learning algorithms based on vector embeddings. He works on exploring the broader connection between black-box machine learning models and knowledge-based systems, with a particular focus on knowledge graphs.

SIMONE ANGIONI received the B.S. and M.S. degrees in computer science from the University of Cagliari, Italy, where he is currently pursuing the Ph.D. degree. His research interests include the science of science, scientometrics, information extraction, the semantic web, and robotics. He is the main developer of the academia/industry dynamics (AIDA) knowledge graph, an innovative resource for studying the relationship between academia and industry.
ANGELO SALATINO received the Ph.D. degree in early detection of research trends. He is currently a Research Associate with the Intelligence Systems and Data Science (ISDS) Group, Knowledge Media Institute (KMi), The Open University. In particular, his project aimed at identifying the emergence of new research topics at their embryonic stage. His research interests include the semantic web, network science, and knowledge discovery technologies, with a focus on the structures and evolution of science.
DIEGO REFORGIATO RECUPERO received the Ph.D. degree in computer science from the University of Naples Federico II, Italy, in 2004. He has been an Associate Professor with the Department of Mathematics and Computer Science, University of Cagliari, Italy, since December 2016. From 2005 to 2008, he has been a Postdoctoral Researcher with the University of Maryland, College Park, USA. He is the author of more than 140 conference and journal papers in these research fields, with more than 1600 citations. His current research interests include sentiment analysis, semantic web, natural language processing, human-robot interaction, financial technology, and smart grid. He won different awards in his career (such as Marie Curie International Reintegration Grant, Marie Curie Innovative Training Network, Best Research Award from the University of Catania, Computer World Horizon Award, Telecom Working Capital, and Startup Weekend). He co-founded six companies within the ICT sector and is actively involved in European projects and research (with one of his companies, he won more than 40 FP7 and H2020 projects). In all of them, machine learning, deep learning, big data are key technologies employed to effectively solve several tasks.
ENRICO MOTTA received the Laurea degree in computer science from the University of Pisa, Italy, and the Ph.D. degree in artificial intelligence from The Open University. He is currently a Professor in knowledge technologies with The Open University, U.K., where he was the Director of the Knowledge Media Institute (KMi) from 2000 to 2007. His research interests include the intersection of large-scale data integration and modeling, semantic and language technologies, intelligent systems, and human-computer interaction. Over the years, he has led KMi's contribution to numerous high-profile projects, receiving over £10.4M in external funding since 2000 from a variety of institutional funding bodies and commercial organizations.
JENS LEHMANN received the Ph.D. degree (summa cum laude) from the University of Leipzig and the joint master's degree in computer science from the Technical University of Dresden and the University of Bristol. He is currently the Head of the Smart Data Analysis Research Group, a Full Professor with the University of Bonn, and a Lead Scientist with Fraunhofer IAIS. He authored more than 100 publications, which were cited more than 18000 times and have won 12 international awards. His research interests include semantic web technologies, question answering, machine learning, and knowledge graph analysis. He contributed to various open-source projects such as DL-Learner, SANSA, LinkedGeo-Data, and DBpedia.