Biomedical Relation Extraction With Knowledge Graph-Based Recommendations

Biomedical Relation Extraction (RE) systems identify and classify relations between biomedical entities to enhance our knowledge of biological and medical processes. Most state-of-the-art systems use deep learning approaches, mainly to target relations between entities of the same type, such as proteins or pharmacological substances. However, these systems are mostly restricted to what they directly identify in the text and ignore specialized domain knowledge bases, such as ontologies, that formalize and integrate biomedical information typically structured as directed acyclic graphs. On the other hand, Knowledge Graph (KG)-based recommendation systems have already shown the importance of integrating KGs to add features to items. Typical systems have people as users and items that can range from movies to books, which people saw or read and rated according to their satisfaction. This work proposes to integrate KGs into biomedical RE through a recommendation model to further improve their range of action. We developed a new RE system, named K-BiOnt, by integrating a baseline state-of-the-art deep biomedical RE system with an existing state-of-the-art KG-based recommendation system. Our results show that adding KG-based recommendations improves the system's ability to identify true relations that the baseline deep RE model could not extract from the text. The code supporting this system is available at https://github.com/lasigeBioTM/K-BiOnt.


I. INTRODUCTION
THE exponential growth in scientific literature does not allow researchers to keep up with all recent advances in their respective fields and in adjacent areas that could be of interest to their research [1]. To this end, the field of Natural Language Processing (NLP) is mostly focused on automatic means to identify and extract relevant information from unstructured text [2], [3]. One of the prominent tasks of the NLP field is Relation Extraction (RE), which aims at extracting and classifying relations between entities of interest. Most Biomedical RE studies focus on extracting relations between entities of the same or different types, such as diseases, genes, phenotypes, and pharmacological substances. Recently, there have been several advances regarding this task, mostly using deep learning techniques [4]. However, few make use of openly available external sources of knowledge, such as domain-specific ontologies, which are highly popular in the biomedical domain.
Using additional sources of knowledge can also result in more system explainability by facilitating the re-traceability of AI decisions to specific components of the models [5]. An ontology is a structured way of providing a common vocabulary in which shared knowledge is represented [6]. Biomedical ontologies are usually structured as directed acyclic graphs. Each node corresponds to an entity and the edges correspond to known relations between those entities of type is-a. Some of the most prominent ever-evolving biomedical ontologies are the Gene Ontology (GO) [7], the Chemical Entities of Biological Interest (ChEBI) [8], the Disease Ontology (DO) [9], and the Human Phenotype Ontology (HPO) [10]. Yet, researchers do not incorporate this structured information in most biomedical RE deep learning models.
On the other hand, Knowledge Graph (KG)-based Recommendation systems have already shown the importance of external sources of knowledge to add features to items when using deep learning models [11]-[13]. This focus on external sources of knowledge could be highly relevant for the biomedical RE field since most researchers focus their work on already known relations between biomedical entities, which implies that a large volume of relations are not explicitly described in the literature. Consequently, systems that rely solely on available literature to identify these relations do not have enough information to establish more complex interactions. Therefore, we need to go beyond the text and uncover how to integrate knowledge from entity annotations into our systems, as in most recent recommendation systems.
In this work, we propose to integrate a KG-based recommendation model into biomedical RE to answer the following question: Can recommendations add value to the biomedical RE task, enhancing their range of action?
The first step towards biomedical RE based on recommendation was to adapt three publicly available RE datasets into the standardized recommender systems format of <user-item-rating>. We chose the PGR-crowd dataset [14], which describes relations between human phenotypes and genes (Fig. 1), the DDI Corpus [15], which describes relations between drugs/chemicals, and the BC5CDR Corpus [16], which describes drugs/chemicals interactions with diseases, to demonstrate the range of our approach.
Fig. 1. An example of an annotated sentence retrieved from the PGR-crowd dataset from article PMID:29307790. The phenotype entities are linked to the HPO ontology (HP:0005321 and HP:0000252), and the gene entity is linked to the NCBI gene database (9343). We simplified the sentence to facilitate reading comprehension.
Fig. 2. An example of a relation recommendation retrieved from the adapted PGR-crowd dataset [14]. This example illustrates how a graph connection can contribute to a new relation recommendation.
To adapt the biomedical RE datasets to the <user-item-rating> recommendation format, we first had to decide which entities would be considered the users (user entities) and the items (item entities) for the datasets that had entities of different types (PGR-crowd and BC5CDR). Our choices for item entities, described in detail in Section III under the A. Datasets sub-section, were to give priority to entities that were covered by ontological KGs (phenotypes in the PGR-crowd by the HPO) and to diversify the type of ontological KGs chosen (the DO for diseases in the BC5CDR Corpus). Therefore, as KGs, we used the ontologies HPO [10], ChEBI [8], and DO [9], linked to the item entities when possible. Fig. 2 illustrates an example of a relation recommendation where the user EFTUD2 is related to Microcephaly and Mandibulofacial dysostosis in our RE dataset of reference (Fig. 1). Using the KG, we can determine that one of the ancestor connections for Microcephaly and Mandibulofacial dysostosis is Abnormality of the skull. By sharing an ancestor connection, these two items reinforce the connection between other descendants and the user EFTUD2. Thus, we can recommend a relation between our user EFTUD2 and our item Cephalocele (i.e., the green dashed line).
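The intuition behind Fig. 2 can be illustrated with a small sketch. It is not the K-BiOnt or TUP implementation: the `IS_A` dictionary is a simplified toy hierarchy (only the terms named in the text, plus two hypothetical unrelated terms), and the counting heuristic in `candidate_scores` is an assumption used only to show how a shared ancestor can support a new user-item pair.

```python
# Toy is-a edges (child -> parents). Simplified for illustration; this is
# NOT the real HPO structure, and the "Seizure" branch is hypothetical.
IS_A = {
    "Microcephaly": ["Abnormality of the skull"],
    "Mandibulofacial dysostosis": ["Abnormality of the skull"],
    "Cephalocele": ["Abnormality of the skull"],
    "Abnormality of the skull": [],
    "Seizure": ["Abnormality of the nervous system"],
    "Abnormality of the nervous system": [],
}


def ancestors(term):
    """Return every term reachable from `term` by following is-a edges."""
    seen = set()
    stack = list(IS_A.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(IS_A.get(parent, []))
    return seen


def candidate_scores(known_items):
    """Count, for each unseen item, how many known items share an ancestor with it."""
    scores = {}
    for candidate in IS_A:
        if candidate in known_items:
            continue
        candidate_anc = ancestors(candidate)
        scores[candidate] = sum(
            1 for item in known_items if ancestors(item) & candidate_anc
        )
    return scores


# Items already related to the user EFTUD2 in the reference dataset (Fig. 1).
eftud2_items = {"Microcephaly", "Mandibulofacial dysostosis"}
scores = candidate_scores(eftud2_items)
```

Under this toy hierarchy, Cephalocele is supported by both known items through Abnormality of the skull, while the unrelated term receives no support.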
In real-case scenarios, our approach would imply adapting existing and future RE biomedical datasets so that all entities are linked to a KG identifier, which by itself would enhance the applicability of such datasets. Moreover, linking identified entities to KGs is already a widely disseminated NLP task called Named-Entity Linking (NEL) or Concept/Entity Normalization, and most of the time, a natural precedent to RE [17]. As stated previously, a large volume of biomedical ontologies covers most types of studied entities. Therefore, the biggest hurdle would be to guarantee high KG coverage for the original datasets, while our adaptation of the datasets to a recommendation format is highly generalizable. For instance, given a biomedical sentence within a dataset, where the offsets of the entities of interest are identified and linked to KGs (Fig. 1), the next step would be to give a rating to the possible relations considering the whole training dataset and identifying if there are other entities within the KG ancestry line that exhibit similar relations that can further support a true relation. Thus, we expect that the degree of coverage of each KG over each dataset dictates the effectiveness of the approach as well as other factors that we will discuss throughout this manuscript.
After dataset re-formation, we adapted a state-of-the-art recommendation system to recommend relations between biomedical entities considering the specificities of the domain and evaluated the system's added value to a standard state-of-the-art deep learning biomedical RE system.
In this paper, we present the results for relation recommendation on its own, and then the added value of these recommendations to a deep learning biomedical RE system. Our results show that adding KG-based recommendations improves RE systems' ability to identify true relations in high KG coverage settings that baseline deep RE models could not extract from text, indicating that the recommendation model adds value to the RE task. This work resulted in the following main contributions:
• Pipeline for integrating a KG-based recommendation model in a biomedical RE system, including the adaptation of RE datasets for training.
• Biomedical RE deep learning system with added knowledge in the form of KG-based recommendations (K-BiOnt).
The following section will present the related work for RE and recommender systems that specifically target the biomedical domain. We will then proceed to methodology, where we describe the dataset construction and the different training stages on their own and their joint evaluation. Next, we present results and discuss the effects of the KG-based recommendation on biomedical RE. Finally, we finish with the main conclusions and future work.

II. RELATED WORK
Biomedical RE is an NLP task that usually follows Named-Entity Recognition (NER) and Named-Entity Linking (NEL) or Concept/Entity Normalization [18], the identification and mapping of entities in unstructured text, respectively [19]. This information extraction task mostly focuses on relations within the same sentence [20], with approaches that range from co-occurrence to a variety of Machine Learning methods. In recent years, deep learning approaches have become state-of-the-art for most domains, achieving high levels of precision. Nevertheless, many biomedical relations are still hard to extract even when exploring the full documents [21]. This can be explained by the complexity of the domain, which requires extensive domain knowledge to be correctly perceived. One deep learning model, inspired by the BO-LSTM and BI-LSTM models [22], [23], is the BiOnt model [24]. The BiOnt model uses deep learning methods along with domain-specific ontologies, achieving state-of-the-art results. Thus, it is a model that already explores external sources of knowledge to perform biomedical RE, even if with less KG depth than our proposal. Therefore, in this work, we propose a more in-depth use of KGs with recommendation approaches to detect true annotations missed by BiOnt, which we detail in the methodology section. We chose BiOnt over other biomedical or even non-biomedical [25], [26] RE systems [27] due to its use of similar external entity knowledge. If we can effectively improve the BiOnt model, it shows that its use of knowledge is not sufficient to fully grasp less frequent relations. Nevertheless, we also use the state-of-the-art BioBERT system [27] on the same datasets to provide an extra comparison baseline. The BioBERT system is a BERT-based contextualized word representation model based on a masked language model and pre-trained using bidirectional transformers on large-scale biomedical corpora.
Item recommendation initially focused on similarity-based methods that aimed at extracting features of users and items, computing their similarity, and recommending similar users or items to a target user. Similarity-based methods using Neural Network (NN) models effectively extract latent features of users and items for recommendations [28]-[30]. However, they deal with issues such as data sparsity [31], cold-start [32], and lack of explainability [33] (i.e., a user understanding why an item is being recommended). Content-based methods introduce additional information, such as relational data [34] and knowledge graphs [35], [36], and help relieve those issues. Therefore, researchers have recently focused their attention on generating recommendations using knowledge graphs as additional information [13], [37], such as the TUP system created by Cao et al. [38].
Current recommender systems that deal with biomedical data target different tasks, such as recommending ontologies to annotate biomedical text [39], modeling biological processes [40], recommending drugs to target SARS-CoV-2 regarding the COVID-19 pandemic [41], recommending entities of potential interest to specific researchers [12], or even recommending articles to expand existing biomedical datasets [42]. There is also a focus on recommending articles and venues to researchers to limit their search space [43], [44], for instance, by performing keyword-based recommendation [45]. Further, there is a significant amount of work on biomedical KG completion [46]-[48], including attempts to depend less on domain-specific labeling by following a minimum supervision route that can scale with the volume of literature available [49].

III. METHODOLOGY
To demonstrate the benefits of allying KG-based Recommendation to deep Biomedical RE, we used three publicly available datasets describing relations between different types of biomedical entities: the PGR-crowd [14], [50], the DDI Corpus [15], and the BC5CDR Corpus [16]. The first step was to convert these RE datasets into a format compatible with KG-based Recommendation, which required several adjustments and rating assessment described in detail in the following section. Moreover, in this section, we will provide the training and evaluation details for each component system: deep biomedical RE and KG-based Recommendation on their own, and as added features in a deep biomedical RE system plus recommendations. All baseline systems used or adapted throughout our work are openly available through their respective authors, including original configuration details.

A. Datasets
To take advantage of KG-based Recommendation in RE, we had to create standard <user-item-rating> datasets using the PGR-crowd, DDI Corpus, and BC5CDR Corpus original RE datasets. These datasets describe relations between human phenotypes and genes, using NCBI gene database and HPO identifiers (PGR-crowd), between drugs/chemicals, that can be linked to ChEBI ontology identifiers (DDI Corpus), and interactions between drugs/chemicals and diseases, that can be linked to the ChEBI and DO ontologies (BC5CDR Corpus). The PGR-crowd and DDI datasets are available in the same XML format, and the BC5CDR Corpus is available in a text format, for standard RE applications. Table I presents the RE datasets' general statistics, including counts for the total number of entity annotations and the distribution of true and false relations. A relation is considered true if semantically there is an implication of an association between two entities considered in the same sentence, and false if there is no semantic relation or a semantic connection negates the relation between the entities. For this work, we did not consider the different relation labels of the DDI Corpus, only the binary classification of true/false.
Although the protocols were similar for the three datasets, we had to consider that in the PGR-crowd and BC5CDR datasets, each relation had two distinct entities (genes and human phenotypes, and drugs/chemicals and diseases, respectively). To cast our item roles for the PGR dataset, we chose the phenotype entities since these were already mapped to an ontological knowledge graph (HPO). To diversify our approach, for the BC5CDR Corpus, we decided to map disease entities to the DO ontology as item entities. On the DDI Corpus dataset, in contrast, we were dealing with relations between the same type of entities (i.e., drugs/chemicals) that could both be mapped to a knowledge graph (ChEBI).
In the PGR-crowd dataset, we considered our users as genes and our items as human phenotypes. Each relation can appear more than once in an RE dataset since different sentences/articles can describe the same relation. We can have multiple instances of the same relation with the same or different labels. Therefore, we attributed 1 to true relations and −1 to false relations and took the rating to be the sum over all occurrences of the same relation within the training dataset. This process allowed us to have only one occurrence of each relation, as expected in recommender system datasets, where the user only rates an item once. Fig. 3 further elucidates the process by direct comparison with an example of the MovieLens-1m dataset [11]. We followed the same procedure for the BC5CDR Corpus, where we considered our users drugs/chemicals and our items diseases.
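The rating computation described above can be sketched as follows. The function name `aggregate_ratings` and the tuple layout of the input are illustrative assumptions; the identifiers reuse the gene and phenotype from Fig. 1, but the three labeled instances are made up for the example.

```python
from collections import defaultdict


def aggregate_ratings(instances):
    """Collapse labeled relation instances into one <user-item-rating> per pair.

    `instances` is an iterable of (user, item, is_true) tuples, one per
    sentence-level relation. Each true instance adds 1 and each false
    instance adds -1, so the rating is the signed sum of all occurrences.
    """
    ratings = defaultdict(int)
    for user, item, is_true in instances:
        ratings[(user, item)] += 1 if is_true else -1
    return dict(ratings)


# Hypothetical training instances: the same gene-phenotype pair appears
# three times with different labels.
example = [
    ("9343", "HP:0000252", True),
    ("9343", "HP:0000252", True),
    ("9343", "HP:0000252", False),
    ("9343", "HP:0005321", True),
]
ratings = aggregate_ratings(example)
```

Each pair now appears exactly once, which matches the expectation in recommender datasets that a user rates an item only once.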
For the DDI Corpus, assigning user and item roles to the drug/chemical entities was not as straightforward. As each relation has two entities of the same type, we had to verify the symmetry between relations (e.g., is entity one an effect of entity two and vice-versa, or is it just a one-sided relationship?). For this, we considered the classification done by the creators of the DDI Corpus dataset, where each true relation could be of type effect (asymmetric), mechanism (asymmetric), advice (symmetric), or int (symmetric). While the other types are intuitive, the int type refers to the default positive interaction for which there is no additional information. So, we disregarded the entities' order for false and symmetric relations and maintained the order assigned for true asymmetric relations when adapting the RE dataset for recommendation. The process of calculating the ratings was identical to that previously described for the PGR-crowd and BC5CDR datasets.
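One possible way to encode this ordering rule is sketched below. The text does not state how the order was disregarded in practice; sorting the two entity names into a canonical order is an assumption made here, and `to_user_item` is a hypothetical helper name.

```python
# Relation types as classified by the DDI Corpus creators (from the text).
ASYMMETRIC_TYPES = {"effect", "mechanism"}
SYMMETRIC_TYPES = {"advice", "int"}


def to_user_item(entity_1, entity_2, relation_type):
    """Return the (user, item) pair for a DDI relation.

    `relation_type` is one of the four true types, or None for a false
    relation. True asymmetric relations keep the annotated order; false
    and symmetric relations are put in a canonical (sorted) order so that
    (a, b) and (b, a) collapse into the same pair. Sorting is an
    assumption of this sketch.
    """
    if relation_type in ASYMMETRIC_TYPES:
        return (entity_1, entity_2)
    return tuple(sorted((entity_1, entity_2)))
```

With this encoding, the two orderings of a symmetric or false relation contribute to the same rating, while the two directions of an asymmetric relation are rated separately.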
For model training, we converted all entities to an internal identifier. Also, the existing ratings were treated as positive interactions while negative interactions were generated randomly by corrupting items following other models that target implicit feedback [51]. In the work done by Cao et al. [38] the negative sampling was done by corrupting items that were less commonly used by users, which could not be applied to datasets with low average ratings.
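A minimal sketch of random item corruption is given below. It assumes uniform sampling among the items a user has not interacted with and one negative per positive pair; neither detail is specified in the text, and `sample_negatives` is a hypothetical name.

```python
import random


def sample_negatives(positives, all_items, rng):
    """Generate one negative pair per positive pair by corrupting the item.

    For each (user, item) in `positives`, the item is replaced with a
    uniformly sampled item that the user has no interaction with.
    Users who interacted with every item yield no negative.
    """
    items_by_user = {}
    for user, item in positives:
        items_by_user.setdefault(user, set()).add(item)

    negatives = []
    for user, _ in positives:
        # Sorting keeps the sampling reproducible for a fixed seed.
        candidates = sorted(set(all_items) - items_by_user[user])
        if candidates:
            negatives.append((user, rng.choice(candidates)))
    return negatives


positives = [("g1", "p1"), ("g1", "p2"), ("g2", "p1")]
negatives = sample_negatives(positives, ["p1", "p2", "p3"], random.Random(0))
```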
While previous works [52], [53] mapped publicly available datasets such as MovieLens-1m [11] and DBbook2014 to DBPedia [54] entities whenever a mapping was available, we mapped our datasets to three publicly available biomedical ontologies (HPO for PGR-crowd, ChEBI for DDI Corpus, and DO for BC5CDR Corpus). For the PGR-crowd dataset, since the preexisting entity identifiers were already linked to HPO, our coverage was 100%. However, for the other two datasets (BC5CDR and DDI), the coverage was 26.0% and 32.1%, respectively, which is expected since the creators did not rely on the DO or ChEBI ontology to identify the original entities. The mapping was done automatically by exact matching, allowing for a Levenshtein distance of 1. Thus, particularly in the DDI and BC5CDR Corpora, we did not match possible synonym entities. Doing a more detailed mapping would require either the usage of external normalization tools [18] or domain expertise to review all entities, which would be time- and cost-intensive. We recognize this limitation in our entity normalization stage, which could be improved in future work. In contrast with previous work [38], we did not preprocess the datasets to filter low-frequency users and items, nor did we edit the type of entities or relations in triples, because our universe is considerably smaller and because domain-specific ontologies offer more reassurance than the generic domain of DBPedia.
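The matching step can be sketched as follows, assuming case-insensitive comparison against ontology labels (the text does not state how case was handled). The label dictionary and its identifiers are placeholders, not real ontology identifiers.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution
            ))
        previous = current
    return previous[-1]


def map_to_ontology(mention, labels, max_distance=1):
    """Return the identifier of the closest label within `max_distance`, else None.

    `labels` maps ontology labels to identifiers. Lowercasing both sides
    is an assumption of this sketch.
    """
    text = mention.lower()
    best = None
    for label, identifier in labels.items():
        distance = levenshtein(text, label.lower())
        if distance <= max_distance and (best is None or distance < best[0]):
            best = (distance, identifier)
    return best[1] if best else None


# Placeholder labels and identifiers, for illustration only.
labels = {"aspirin": "ID:1", "warfarin": "ID:2"}
```

A mention such as "Aspirine" is one edit away from a label and is mapped, whereas a synonym such as "acetylsalicylic acid" is not, which illustrates why coverage stays low when synonyms are not handled.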
Table II describes the statistics for the three datasets (PGR-crowd, DDI Corpus, and BC5CDR Corpus) regarding the KG-based recommendation format. The data sparsity issue is prevalent in all datasets due to the low number of average ratings.

B. Training
The deep biomedical RE system BiOnt [24] worked as our baseline since it achieved state-of-the-art performance in the datasets used in this work and also uses knowledge graphs as added information layers. We designed experiments regarding relation recommendation, following the work of Cao et al. [38], and the incorporation of those recommendations into BiOnt (K-BiOnt). These systems were chosen due to their state-of-the-art results but also their availability and in-depth documentation. For all our experiments, we divided the datasets into a 6:1:3 ratio, corresponding to the training set, the validation set, and the test set, respectively. For the Deep Learning component, we used the original datasets as provided, parsing them as required by the system specifications. For the Recommendation component, we used the re-formatted datasets, as described in the previous sub-section.
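For illustration, a 6:1:3 partition can be produced as in the sketch below. The shuffling, the fixed seed, and the rounding of partition sizes are assumptions; the text only specifies the ratio.

```python
import random


def split_6_1_3(records, seed=0):
    """Shuffle `records` and split them into train/validation/test at 6:1:3."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 6 // 10
    n_valid = len(shuffled) // 10
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]  # remainder, roughly 30%
    return train, valid, test


train, valid, test = split_6_1_3(range(100))
```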

1) Baseline Deep Learning Model:
The BiOnt model uses ontologies as external sources of knowledge to add information layers to a baseline deep learning model, following the work of Lamurias et al. [22]. An ontology is a formal definition of concepts related to a specific subject. It can be represented by a tuple $\langle C, R \rangle$, where $C$ represents the set of concepts in an ontology and $R$ the set of relations between the same ontology concepts. Similar to our dataset construction, the type of ontology relations considered by Sousa and Couto [24] is the subsumption relation, is-a, due to its transitive aspect. For instance, with $(c_1, c_2) \in R$ and $(c_2, c_3) \in R$, the authors assume that $(c_1, c_3)$ is a valid relation within the ontology. The ancestors of each concept $c$ are given by:
$$Anc(c) = \{a : (c, a) \in T\}$$
where $T$ is the transitive closure of $R$. The authors define the common ancestors between the concepts $c_1$ and $c_2$ as:
$$ComAnc(c_1, c_2) = Anc(c_1) \cap Anc(c_2)$$
A relation between different ontology concepts can be represented by $(x_1, y_1)$, where $x_1 \in X$ and $X$ represents the set of concepts in the first ontology, and $y_1 \in Y$ and $Y$ represents the set of concepts in the second ontology. For instance, with $(x_2, y_2) \in RA$, where $RA$ is the set of relations between ancestors, and ($x_2$ is-a $x_1$), and ($y_2$ is-a $y_1$), their model assumes that $(x_1, y_1)$ is a valid relation. The concatenation of the relations between the ancestors of concepts $x_2$ and $y_2$ is defined using:
$$ConcAnc(x_2, y_2) = Anc(x_2) \oplus Anc(y_2)$$
where $\oplus$ denotes sequence concatenation. Since the common ancestors' channel could only be used for relations between the same type of biomedical entities (i.e., DDI Corpus), we only use the concatenation of ancestors channel for the relations between different biomedical entities (i.e., PGR-crowd).
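The ancestor definitions in this sub-section can be made concrete with a short sketch. It is an illustration under simplifying assumptions, not the BiOnt code: the relation set is a toy example, the function names are hypothetical, and the closure is computed naively, which is adequate only for small graphs.

```python
def transitive_closure(relations):
    """Return the transitive closure T of a set R of (child, parent) is-a pairs."""
    closure = set(relations)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def ancestors_of(concept, closure):
    """All concepts a such that (concept, a) belongs to the closure."""
    return {a for (c, a) in closure if c == concept}


def common_ancestors(c1, c2, closure):
    """Ancestors shared by two concepts of the same ontology."""
    return ancestors_of(c1, closure) & ancestors_of(c2, closure)


# Toy ontology: c1 is-a c2, c2 is-a c3, c4 is-a c3.
R = {("c1", "c2"), ("c2", "c3"), ("c4", "c3")}
T = transitive_closure(R)
```

Here the pair `("c1", "c3")` is inferred by transitivity, and `c1` and `c4` share the ancestor `c3`.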
Each ontology concept corresponds to a one-hot vector $v_c$, a vector of zeros except for the position corresponding to the concept's ID. An embedding matrix $M \in \mathbb{R}^{D \times C}$ transforms these sparse vectors into dense vectors, where $D$ is the dimensionality of the embedding layer and $C$ is the number of concepts of the ontologies. Then, the output of the embedding layer is given by:
$$f(c) = M \cdot v_c$$
Next, the ontology embedding layer, with a dimensionality of 50 (as suggested by [22]), initializes its values randomly to be later tuned through back-propagation. Then, the vectors' sequence representing the relations between the ancestors of the terms is fed into a Long Short-Term Memory (LSTM) layer, ordered from the more general concepts to the terms themselves. Finally, the system uses a max pool layer fed into a dense layer through a sigmoid activation function, and a softmax layer outputs the probability for each class.
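The embedding step amounts to selecting one column of the embedding matrix, as the following sketch shows. The matrix values and sizes (D = 2, C = 3) are arbitrary toy numbers, whereas the model described above uses a dimensionality of 50.

```python
def one_hot(index, size):
    """Vector of zeros with a single 1.0 at `index`."""
    return [1.0 if k == index else 0.0 for k in range(size)]


def embed(matrix, vector):
    """Matrix-vector product: `matrix` has D rows and C columns, `vector` has C entries."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


# Toy embedding matrix with D = 2 and C = 3 (values are arbitrary).
M = [
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6],
]
dense = embed(M, one_hot(1, 3))  # equals the second column of M
```

In practice, deep learning libraries implement this product as an index lookup rather than an explicit multiplication, since the result is always a single column.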
The model was trained using a stochastic gradient descent optimization algorithm where weights were updated using the back-propagation of error algorithm. At each iteration, the model with a given set of weights creates predictions and computes the error for those predictions. The optimization algorithm seeks to alter the weights to reduce that error in the next evaluation. The relevant hyperparameters of this model tuned for our experiments were mini-batch gradient descent optimization algorithm (RMSprop), learning rate (0.0001), loss function (categorical cross-entropy), and dropout rate (0.500) for every layer except the penultimate and output layers.
We used the three standard evaluation metrics for RE models:
Precision: the proportion of predicted relations that are correct;
Recall: the proportion of true relations that the system identifies;
F1 score: the harmonic mean of precision and recall, expressing overall performance.
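These metrics can be computed from binary gold and predicted labels as in the sketch below; the function name and the convention of returning 0.0 for undefined ratios are choices made for this example.

```python
def precision_recall_f1(gold, predicted):
    """Compute precision, recall, and F1 for binary (True/False) relation labels."""
    pairs = list(zip(gold, predicted))
    true_pos = sum(1 for g, p in pairs if g and p)
    false_pos = sum(1 for g, p in pairs if not g and p)
    false_neg = sum(1 for g, p in pairs if g and not p)

    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


gold = [True, True, True, False, False]
predicted = [True, True, False, True, False]
precision, recall, f1 = precision_recall_f1(gold, predicted)
```

In this toy case there are two true positives, one false positive, and one false negative, so all three metrics equal 2/3.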
2) Item Recommendation: The TUP model created by Cao et al. [38] takes a list of user-item pairs $Y = \{(u, i)\}$ as input, and outputs a relevance score $g(u, i; p)$, indicating the likelihood that $u$ likes $i$, given the preference $p \in P$. In this work, instead of the likelihood that $u$ likes $i$, the score expresses the likelihood that $u$, as a biomedical entity, is related to $i$, another biomedical entity. For each user-item pair, the TUP model induces a preference. The authors designed two strategies for preference induction: a hard approach that selects one out of the $P$ preferences and a soft approach that combines all preferences with attention weights. The soft strategy yielded a better performance for both of the authors' datasets. Therefore, we opted for this strategy to create our models. The preferences constitute the motives for which a user entity may be connected to an item entity. For traditional recommendation set-ups, these usually lack depth in explainability. To avoid that, in this work, we gave them explicit semantics by aligning them with the KG relations, capturing the intuition that the types of item attributes play a crucial role in user assignment. Considering that an entity might be related to another entity according to various factors, which cannot be restricted by a firm boundary, instead of selecting the most prominent preference, the soft strategy combines multiple preferences via an attention mechanism:
$$\hat{p} = \sum_{p' \in P} \alpha_{p'} \, p'$$
where $\hat{p}$ is the induced preference and $\alpha_{p'}$ is the attention weight of preference $p'$, defined as proportional to the similarity score:
$$\alpha_{p'} \propto \phi(u, i, p')$$
To deal with the issue where one entity (i.e., user) might be associated with multiple entities (i.e., items), and several entities (i.e., users) may be associated with a single entity (i.e., item) (1-to-N and N-to-1 issues), the authors introduce preference hyperplanes, assigning each preference two vectors (inspired by TransH [55]): $w_p$ for the projection to a hyperplane, and $p$ for the translation between users and items.
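The authors define the hyperplane-based translation function as follows:
$$g(u, i; p) = \lVert u^{\perp} + \hat{p} - i^{\perp} \rVert$$
where $\hat{p}$ is the induced preference, and $u^{\perp}$ and $i^{\perp}$ are the projected vectors of the user and the item, obtained through the induced preference:
$$u^{\perp} = u - \hat{w}_p^{\top} u \, \hat{w}_p, \qquad i^{\perp} = i - \hat{w}_p^{\top} i \, \hat{w}_p$$
$\hat{w}_p$ is the projection vector that is obtained along with the induction process of preferences through attentive addition of all projection vectors based on the induced attention weights $\alpha_{p'}$ in the soft strategy:
$$\hat{w}_p = \sum_{p' \in P} \alpha_{p'} \, w_{p'}$$
Then, the authors encourage the translation distances of the interacted items to be smaller than random ones for each user through the BPR loss function:
$$\mathcal{L}_p = \sum_{(u, i) \in Y} \sum_{(u, i') \in Y'} -\log \sigma \left[ g(u, i'; p') - g(u, i; p) \right]$$
where $Y'$ contains negative interactions obtained by randomly corrupting an interacted item to a non-interacted one for each user.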
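The soft strategy, the hyperplane projection, and the BPR objective can be sketched in plain Python as below. This is a simplified illustration rather than the TUP implementation: the similarity score is assumed to be the dot product of (u + p) and i, the attention weights are assumed to be normalized with a softmax, the distance is assumed to be the Euclidean norm, and all vectors are toy values.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def add(a, b):
    return [x + y for x, y in zip(a, b)]


def sub(a, b):
    return [x - y for x, y in zip(a, b)]


def scale(a, s):
    return [x * s for x in a]


def soft_preference(u, i, prefs, normals):
    """Attention-weighted sum of preference and projection vectors."""
    # Assumed similarity score: dot product of (u + p') with i.
    scores = [dot(add(u, p), i) for p in prefs]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # softmax (assumed normalization)
    alphas = [e / sum(exps) for e in exps]
    dim = len(u)
    p_hat = [sum(a * p[k] for a, p in zip(alphas, prefs)) for k in range(dim)]
    w_hat = [sum(a * w[k] for a, w in zip(alphas, normals)) for k in range(dim)]
    return p_hat, w_hat


def project(x, w):
    """Project x onto the hyperplane whose normal vector is w."""
    return sub(x, scale(w, dot(w, x)))


def translation_distance(u, i, prefs, normals):
    """Distance between the projected, translated user and the projected item."""
    p_hat, w_hat = soft_preference(u, i, prefs, normals)
    diff = sub(add(project(u, w_hat), p_hat), project(i, w_hat))
    return math.sqrt(dot(diff, diff))


def bpr_loss(d_pos, d_neg):
    """-log(sigmoid(d_neg - d_pos)): small when the positive item is closer."""
    return -math.log(1.0 / (1.0 + math.exp(-(d_neg - d_pos))))


# Toy setting with a single preference, as in our experiments.
prefs, normals = [[1.0, 0.0]], [[0.0, 1.0]]
user, pos_item, neg_item = [0.0, 5.0], [1.0, -3.0], [4.0, 2.0]
d_pos = translation_distance(user, pos_item, prefs, normals)
d_neg = translation_distance(user, neg_item, prefs, normals)
```

With a single preference, the attention weight is 1 and the soft strategy reduces to using that preference directly, which is the situation described for our datasets.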
The relevant hyperparameters for TUP tuned for our experiments were learning rate (0.005), L2 coefficient ($10^{-5}$), optimization method (Adagrad), batch size (512), embedding size (64), and the number of preferences (1). The number of preferences corresponds to the number of different relations within the KGs attributed to each dataset (Table II). Since this model is inspired by the TransH model described previously, some functionalities could not be explored due to the lack of relation diversity.
Among the evaluation metrics used by the TUP model, Normalized Discounted Cumulative Gain (nDCG) is a standard measure of ranking quality that considers the graded relevance among positive and negative item entities within the top 10 of the ranked list.
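As a reference for how nDCG at a cutoff of 10 behaves, the sketch below uses the common formulation with a base-2 logarithmic discount. It is a generic illustration and may differ in details from the evaluation script of the TUP model.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain: relevance at rank r is divided by log2(r + 1)."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))


def ndcg_at_k(ranked_relevances, k=10):
    """DCG of the top-k ranking divided by the DCG of the ideal ordering."""
    top = ranked_relevances[:k]
    ideal = sorted(ranked_relevances, reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(top) / ideal_dcg if ideal_dcg > 0 else 0.0
```

A ranking that places all relevant items first scores 1.0, and the score decreases as relevant items move down the list.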

3) K-BiOnt:
The general approach pipeline that joins BiOnt to a TUP adaptation for RE is presented in Fig. 4.
First, we proceed with the standard training process of the BiOnt model, which can be divided into three main stages after sentence tokenization: WordNet classes [56], word embeddings, and ontology embeddings. The ontology embedding stage represents the relations between the ancestors for each ontology concept corresponding to an entity. For instance, for the PGR-crowd dataset, the system links the entities to the HPO and GO biomedical ontologies, the DDI Corpus to the ChEBI ontology, and the BC5CDR Corpus to the ChEBI and DO ontologies with different coverage degrees, as discussed previously. The system uses a max pool layer fed into a dense layer through a sigmoid activation function, and a softmax layer outputs the probability for each class. The BiOnt model adds external entity knowledge through two channels: common ancestry and concatenation of ancestors. These knowledge channels aim to answer the questions: i) Do the entities in question share ancestors? (only applicable to relations between the same type of entities); and ii) Do the entities in question have ancestors that have established relations? The BiOnt model uses the answer to these two questions to support or discard a relation. However, our K-BiOnt knowledge layer goes deeper into the inferences that can support or discard a relation by answering: Do we have entities outside of the ones considered in the relation that we know establish relations with one of the entities' ancestors? If yes, how many? In what capacity (true/false)? And to what degree (e.g., 2, 3, or −5)?
Thus, going back to the example in Fig. 4, our goal is to support or discard a relation between a gene, HDAC4, and a human phenotype, Mandibular prognathism. Yet, since their ancestors do not have known relations (excluding the BiOnt concatenation of ancestors' channel) and the entities are of different types (excluding the usage of the common ancestors' channel), the BiOnt knowledge layer does not provide information to support or discard this relation. However, since we know from the training set that the gene HDAC4 shares a true relation with both the Jaw ankylosis and the Trismus phenotypes, and both of these entities have the same ancestor as the Mandibular prognathism phenotype, we can support a true relation between HDAC4 and Mandibular prognathism. Our knowledge layer considers translational relationships between users and items. In our example, this means that for each human phenotype (item) related to a gene (user), we consider the whole subsequent ancestry of the phenotype up to the root to provide information on our gene.

C. Joint Evaluation
To evaluate our approach, we created a confusion matrix table to compare the BiOnt model's output and the adjusted TUP model results against our gold standard test sets. This table served as a way for us to detect the main contributions of adding KG-based recommendation to a deep RE model (K-BiOnt) to all datasets.
One caveat is that an extracted relation between two entities is specific to the text where they are mentioned; however, in KG-based recommendation, the relation tag is specific to the entities it refers to. To overcome this issue, we primarily considered the label generated through the application of the baseline system to the normal text-bound RE dataset. We only changed the output if the modules (Deep Learning and Recommendation) disagreed on the label. Upon disagreement, we only altered the label if the Recommendation module assigned true and the baseline system false. Thus, we only considered the Recommendation module's input to capture an undetected connection. Table III presents an example of a confusion matrix table for five distinct scenarios using relations from the PGR-crowd dataset. All true relations captured by KG-based recommendation were attributed the final judgment of true independently of the Top@N. We only considered false relations in the final judgment if all model components agreed on the label false. Therefore, the BiOnt module component was preferred for the attributed label since it is based on the linguistic context of the relationship. The Recommendation module, based exclusively on knowledge regarding the target entities, was only considered for potential true labels that we hypothesize could not be retrieved solely from the linguistic context or were less frequent in the training data.
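The label combination rule described in this sub-section can be written as a small function. The name `joint_label` and the boolean encoding of labels are assumptions of this sketch; the branches mirror the rules stated in the text.

```python
def joint_label(baseline_true, recommended_true):
    """Combine the baseline RE label with the recommendation label.

    The baseline label is kept unless the two modules disagree and the
    Recommendation module says true while the baseline says false.
    """
    if baseline_true == recommended_true:
        return baseline_true  # agreement: nothing to change
    if recommended_true and not baseline_true:
        return True           # recommendation recovers an undetected relation
    return baseline_true      # baseline true, recommendation false: keep baseline


# All four combinations of (baseline, recommendation) labels.
outcomes = {
    (b, r): joint_label(b, r)
    for b in (True, False)
    for r in (True, False)
}
```

The rule is equivalent to a logical OR of the two labels: a relation is judged false only when both components agree on false.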

IV. RESULTS AND DISCUSSION
This section presents our assessment of the benefits of using KG-based recommendation as an added resource for RE systems in the biomedical domain. We compared the results of the baseline deep learning models for RE, BiOnt [24] and BioBERT [27], with the adjusted TUP model [38] for item recommendation and with the integration of BiOnt with TUP (K-BiOnt).

A. Deep Learning Model
Table IV presents the results of applying the deep learning models BiOnt and BioBERT to our three datasets.
For BiOnt, the results were slightly different from the performance reported in the original work for the DDI Corpus [24], but almost identical for the PGR-crowd dataset [14]. We attribute the DDI Corpus performance difference (≈ 4% in F1) to our use of the updated ChEBI version, which has fewer alignments with the entities in the original dataset. The PGR-crowd dataset has a significant imbalance of true/false relations, with the majority of relations being true, while the DDI and BC5CDR Corpora are similarly imbalanced but in favour of false relations (Table I). This can affect performance on these datasets differently, despite the BiOnt model's ability to assign class weights.
As for the BioBERT system, the results for the PGR-crowd dataset and the BC5CDR Corpus were very similar to the BiOnt model's performance. Given the class imbalances of all three datasets, it would be desirable to alter class weights; however, BioBERT's loss function does not allow this flexibility, possibly undermining its results. Nevertheless, the BioBERT system significantly outperformed the BiOnt model on the DDI Corpus, despite the class imbalance.
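As an illustration of how class weights can offset such imbalances, the sketch below computes inverse-frequency ("balanced") weights. The text does not state which weighting scheme BiOnt uses, so this heuristic and the toy labels are assumptions.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def balanced_class_weights(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Hypothetical toy label distribution: three true relations, one false.
toy_labels = [True, True, True, False]
weights = balanced_class_weights(toy_labels)
```

The minority class receives the larger weight, so its errors contribute more to a weighted loss.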

B. Knowledge Graph-Based Recommendation
Table V presents the results for the adapted TUP model using the soft item recommendation strategy mentioned in Section III-B3. The TUP authors [38] state that their model reaches peak performance when the average number of ratings per user ranges from 100 to 200. This range is far from the average number of ratings in our three datasets (2 for PGR-crowd, 10 for the DDI Corpus, and 4 for the BC5CDR Corpus). Thus, we believe that more training data combined with less sparsity would enhance our results further. Also, the higher overall results for the PGR-crowd dataset demonstrate the importance of item-entity alignments, since all items (i.e., human phenotypes) were linked to the HPO [10]. In contrast, on the DDI and BC5CDR Corpora, only 47.8% and 26.0% of the items could be linked to the ChEBI [15] and DO [9] ontologies, respectively, leading to a drop in performance.
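The sparsity statistic discussed above, the average number of ratings per user, can be computed from user-item-rating triples as in the following sketch. The triples and identifiers are hypothetical toy data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Triple = Tuple[str, str, int]  # (user, item, rating)


def average_ratings_per_user(triples: Iterable[Triple]) -> float:
    """Mean number of rated items per user; 0.0 if there are no users."""
    per_user: Dict[str, int] = defaultdict(int)
    for user, _item, _rating in triples:
        per_user[user] += 1
    if not per_user:
        return 0.0
    return sum(per_user.values()) / len(per_user)


# Hypothetical toy data: user_a rated three items, user_b rated one.
toy_triples = [
    ("user_a", "item_1", 1),
    ("user_a", "item_2", 1),
    ("user_a", "item_3", -1),
    ("user_b", "item_1", 1),
]
avg = average_ratings_per_user(toy_triples)
```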

C. Joint Evaluation
Table VI presents the final results of adding the adjusted TUP model recommendations to the BiOnt model (K-BiOnt), considering Top@3, Top@5, and Top@10 recommendations. These results reflect the confusion matrix tables created as described in the example of Table III. For the PGR-crowd dataset, the average number of ratings per user entity is 2. A false relation usually appears only once and is therefore rated −1 rather than a lower number, which is not sufficient to indicate to the model that the user entity is unrelated to an item entity. An approach that we could study in the future is drawing negative samples from the false relations, instead of the traditional random sampling of negative observations associated with implicit feedback. The added performance of TUP over the BiOnt model (K-BiOnt) holds for all Top@N. However, the number of false positive relations increases, with a subsequent decrease in performance, for Top@5 and Top@10. On closer inspection of the added false positives, for the majority of them the item entity (human phenotype) is under the Mode of inheritance category of HPO, not under Phenotypic abnormality. The latter is the most developed branch within the HPO and of more interest to researchers. Likewise, the BC5CDR Corpus also showed increased performance compared to the BiOnt baseline for Top@5 and Top@10 despite the low TUP performance, which indicates potential for a more impactful approach following improvements in linking the entities to KG identifiers.
For the DDI Corpus, the results are identical across all Top@N (the BioBERT baseline) since we could not capture a true positive through item recommendation within the first ten recommendations.
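The trade-off described in this section, where recommendations add true positives but may also add false positives, can be made concrete with a small sketch that scores baseline and joint labels against a gold standard. The label vectors are hypothetical toy data and do not correspond to any result in Table VI.

```python
from typing import Sequence, Tuple


def precision_recall_f1(gold: Sequence[bool], pred: Sequence[bool]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 for the true class."""
    tp = sum(1 for g, p in zip(gold, pred) if g and p)
    fp = sum(1 for g, p in zip(gold, pred) if not g and p)
    fn = sum(1 for g, p in zip(gold, pred) if g and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical toy labels for four candidate relations.
gold = [True, True, False, False]
baseline = [True, False, False, False]
recommended = [False, True, True, False]

# Joint label: true if either module says true.
joint = [b or r for b, r in zip(baseline, recommended)]

baseline_scores = precision_recall_f1(gold, baseline)
joint_scores = precision_recall_f1(gold, joint)
```

In this toy case the joint labels recover the missed true relation, raising recall, while the extra false positive lowers precision.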

D. Ablation Study
To study the impact of knowledge graph coverage, we chose the dataset with the lowest coverage (BC5CDR Corpus). We built the recommendation module taking into account only the 156 items linked to DO ontological concepts. Table VII presents the results of this study. From Table VII, we can verify that restricting the data to ontologically covered items, even if few in number, is enough to impact the performance of item recommendation, almost doubling our previous results. Even though there was no significant impact on the K-BiOnt model due to the small number of items, we expect that increasing the number of covered items through a more robust concept normalization step can improve the K-BiOnt performance.
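The construction of the KG-covered subset can be sketched as a filter over user-item-rating triples. The function names, item identifiers, and the set of linked items are hypothetical and are not the actual BC5CDR data.

```python
from typing import List, Sequence, Set, Tuple

Triple = Tuple[str, str, int]  # (user, item, rating)


def item_coverage(triples: Sequence[Triple], linked_items: Set[str]) -> float:
    """Fraction of distinct items that are linked to an ontology concept."""
    items = {item for _user, item, _rating in triples}
    if not items:
        return 0.0
    return len(items & linked_items) / len(items)


def keep_covered(triples: Sequence[Triple], linked_items: Set[str]) -> List[Triple]:
    """Keep only the triples whose item is linked to an ontology concept."""
    return [t for t in triples if t[1] in linked_items]


# Hypothetical toy data: only item_1 has an ontology identifier.
toy_triples = [
    ("user_a", "item_1", 1),
    ("user_a", "item_2", 1),
    ("user_b", "item_3", -1),
    ("user_b", "item_4", 1),
    ("user_b", "item_1", 1),
]
toy_linked = {"item_1"}

coverage = item_coverage(toy_triples, toy_linked)
covered_subset = keep_covered(toy_triples, toy_linked)
```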

E. Impact on RE
We performed an error analysis on the PGR-crowd dataset, comparing the baselines BioBERT and BiOnt to our approach, K-BiOnt, to measure their actual impact on the RE task. We found that more true relations were identified when considering the relations recommended by the KG-based recommendation model.
In the PGR-crowd dataset, the item entities (i.e., human phenotypes) are all linked to the HPO, so the KG entities fully cover the item entities. This full coverage translated into a higher contribution of the recommendation module to the RE task. Fig. 5 illustrates one of the true relations detected by the recommendation module and missed by the BiOnt model. Note that our models added only true relations when using the adjusted TUP recommendations at Top@3. All other experiments also recommended false positives, undermining the benefits of the recommendation module.
These results show the advantage of adding recommendations to RE, mainly to populate knowledge bases of gold standard relations, where the goal is not only to identify the relation that is explicitly mentioned in the text but to find every true relation that we can derive from it. The success of the recommendation module is explained by its exploration of KGs, which allows the RE process to consider the connections within the associated KG.
However, existing KGs are far from complete, limiting the knowledge we can transfer into RE systems. Considering this limitation, Cao et al. [38] aligned item recommendation (TUP) with KG completion (TransH). KG completion is a field of rapidly growing popularity given its relevance for question answering tasks [57], [58], and also for searching entities and their relations in text [59]. This field should be considered in future work on adding knowledge to biomedical RE, to further enhance the recommendation of less frequent relations.

V. CONCLUSION AND FUTURE WORK
This paper proposed a new recommendation-based complementary approach to deep learning biomedical RE that considers biomedical ontologies as additional sources of information. The KG-based recommendation pipeline presented in this work takes advantage of user entity-item entity interactions as well as knowledge graphs that can be linked to item entities. In our case study, the biomedical KGs were HPO, ChEBI, and DO. We performed experiments using the item recommendation algorithm both on its own and as an added module to a deep biomedical RE system. We presented the benefits of using both methods simultaneously and the added value for the RE task. Our results show that KG-based recommendation can be a valuable asset to biomedical RE by detecting previously undiscovered true relations between biomedical entities. However, low coverage of the associated KGs damages performance. Additionally, we produced three recommendation datasets in the format <user-item-rating> for human phenotype and gene relations, drug/chemical interactions, and drug/chemical and disease relations, attributing a rating to each user-item pair. We also presented a comprehensive pipeline for creating a biomedical RE system using KG-based recommendation (K-BiOnt). Ultimately, we demonstrated that adding recommendations can increase the performance of deep biomedical RE models by considering external sources of knowledge when they have sufficient coverage of the domain.

TABLE VII: ABLATION STUDY RESULTS REGARDING THE TUP MODEL FOR ITEM RECOMMENDATION FOR THE FULL DATASET AND THE KG COVERED SUBSET OF THE BC5CDR CORPUS

Biomedical RE datasets usually do not describe more than one type of relation. However, should a dataset describing more types of labeled relations become available, a multi-graph approach could be employed, linking each item entity to its respective ontological identifier. Even though one could argue that our representation of the ratings between user-item pairs is not representative of the real world, this is a cross-approach problem: current deep learning approaches to biomedical RE also rely on labeled data to create models where the distribution does not represent real-world data and where many less frequent associations are missed. In the future, the approach could be expanded by considering other types of relations between biomedical entities and by applying it to different baseline systems (e.g., BioBERT). Another angle to explore is adding more biomedical ontologies, including possible interconnections between multiple ontologies, which could expand our KGs even further by increasing the number of preferences. Also, upon availability within biomedical ontologies, another complementary route could be adding informative axioms, such as disjointness, and studying the effect of ontological depth.