High-Precision Biomedical Relation Extraction for Reducing Human Curation Efforts in Industrial Applications

The body of biomedical literature is growing at an unprecedented rate, exceeding the ability of researchers to make effective use of this knowledge-rich information. This growth has created interest in biomedical relation extraction approaches to extract domain-specific knowledge for diverse applications. Despite great progress in the techniques, the retrieved evidence still needs to undergo a time-consuming manual curation process to be truly useful. Most relation extraction systems have been conceived in the context of Shared Tasks, with the goal of maximizing the F1 score on restricted, domain-specific test sets. However, in industrial applications relations typically serve as input to a pipeline of biologically driven analyses; as a result, highly precise extractions are central to cutting down the manual curation effort, and thus to translating research evidence into practice smoothly and reliably. In this paper, we present a highly precise relation extraction system designed to reduce human curation efforts. The engine is made up of sophisticated rules that leverage linguistic aspects of the texts rather than relying on application-specific training data. As a result, the system can be applied to diverse needs. Experiments on gold-standard corpora show that the system achieves the highest precision compared with previous rule-based, kernel-based, and neural approaches, while maintaining an F1 score comparable or superior to other methods. To show the usefulness of our approach in industrial scenarios, we finally present a case study on the mTOR pathway, showing how it could be applied on a large scale.


I. INTRODUCTION
In the last 30 years we have observed a rapidly growing body of biomedical literature. As a consequence, it is increasingly difficult for researchers to keep pace with the advances in their fields. Indeed, it has recently been shown that one would have to examine 27 papers per day from 130 previously scanned journals to stay up to date with the literature about a single, specific disease [1]. Such a large volume of written biomedical knowledge is becoming increasingly available in the form of electronic data resources such as digital libraries and biomedical databases. The largest bibliographic archives such as PubMed [2] and PubMed Central (PMC) [3] give access to a total of over 31 million abstracts and 6.3 million full text documents that are currently growing at a double-exponential rate. Since researchers struggle to cope with this amount of data, the development of effective biomedical text mining systems has become increasingly important to allow them to dig through undiscovered knowledge.
The associate editor coordinating the review of this manuscript and approving it for publication was Arianna Dulizia.
A variety of text mining tools have been developed over the last two decades. Efforts by the US National Library of Medicine have led to the well-known PubMed search service, which allows users to browse research publications filtered according to user queries, including concepts specified with manually curated MeSH terms [4]. Systems such as FACTA [5] and Polysearch2 [6], [7] have also been conceived to retrieve relevant information by exploiting the co-occurrence of the concepts of interest. However, assuming that concepts mentioned together are related typically leads to many irrelevant results. Natural Language Processing (NLP) techniques have begun to be explored in the last two decades in order to derive meaning from human language in a deeper, more useful way. In particular, Relation Extraction (RE) has attracted a lot of interest as a valuable tool for tasks ranging from the population of knowledge bases to the construction of biochemical pathways [8]. To encourage the development of highly performing relation extraction systems, several community challenges (i.e., Shared Tasks) [9] have been designed. For instance, biomedical relation extraction has been employed to answer research questions ranging from the identification of protein-protein interactions [10], [11], gene-disease associations [12]-[15], and adverse drug events [16], [17] to protein subcellular localization [18]. Because of the great complexity and diversity of topics included in the biomedical literature, many systems have been designed to deal with the peculiarities of a specific application domain, and fall short when dealing with different research questions.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
As a matter of fact, most systems have been proposed in the context of Shared Tasks, in which the focus is on improving the harmonic mean of precision and recall (i.e., the F1 score) on specific test data, rather than on providing a highly precise extraction of information that cuts down the need for manual curation. Indeed, the results of relation extraction systems still need to undergo manual scrutiny by field experts in order to make the information ready to be exploited. This resource-demanding manual scrutiny should ideally be avoided in real-world contexts, where biomedical relation extraction is the first step of a complex pipeline of biologically driven analyses which requires highly precise relations in order to produce reliable insights. This is even more important because of the rapidly growing body of biomedical literature, which calls for frequent updates of the extracted evidence during a project life cycle. Highly precise relation extraction results, with a satisfactory recall, are thus crucial in real-world scenarios to smoothly translate the extracted information into actionable knowledge.
In this paper, we present a relation extraction system designed to extract highly precise semantic relationships from biomedical texts without the need for training data. Our approach is based on a sequence of NLP syntactic modules and a novel dependency tree based relation engine which captures relations by means of syntactic rules based on common linguistic patterns. As a result, our system can be applied to different corpora without relying on application-specific biomedical relation instances. The highly precise results largely limit the need for human manual curation, allowing scientists to quickly keep abreast of novel discoveries and thus to drive effective research.
The paper is organized as follows. Section II lists related work in the field. Section III describes the methods of our system, going through the natural language processing analysis and the relation extraction engine. Section IV presents the results of our system, showing the quality of the method with respect to well-established gold-standard corpora and recent approaches. A detailed error analysis, current limitations, and room for improvement are also discussed. Section V outlines a case study to show how our system could be effectively applied to large-scale industrial scenarios, whereas conclusions are in Section VI.

II. RELATED WORK
A variety of methods has been adopted for biomedical relation extraction. These approaches can be mainly divided into three categories: rule-based methods, feature- and kernel-based methods, and neural methods. Rule-based approaches typically make use of linguistically motivated patterns on dependency parse trees or surface words in order to capture semantic relationships. Fundel et al. [19] showed how a small number of carefully designed rules based on the shortest dependency path (SDP) between two examined entities produces fairly good results. Yu et al. [20] exploited dependency parse trees and a flexible pattern matching scheme, enriching the system with a decision tree classifier. Diverse syntactic and orthographic features have been extensively used in feature- and kernel-based methods. Phan et al. [21] proposed an automatic feature selection method based on the contribution levels of different feature groups, followed by a k-nearest neighbor (k-NN) classifier. A variety of kernel-based methods have been proposed too, ranging from the walk-weighted subsequence kernel [22] to a combination of kernels based on different parsers [23]. Other kernel-based approaches for biomedical relation extraction include a linguistic pattern-aware dependency tree kernel combined with a tree kernel [24], a convolution tree kernel [25], and a distributed smoothed tree kernel combined with a feature kernel [26]. In the rising wave of deep learning, Zhao et al. [27] proposed a deep multi-layer neural network for the task. More recent neural methods use Recurrent Neural Networks (RNNs), including Bidirectional Long Short-Term Memory (LSTM) and tree LSTM networks, and Convolutional Neural Networks (CNNs). Zhang et al. [28] showed how leveraging the complementary advantages of RNNs and CNNs in a combined hybrid model improves biomedical relation extraction. Yadav et al. [29] experimented with a bidirectional LSTM network with an attention mechanism, exploiting word sequences and the shortest dependency path between the entities, whereas Zhang et al. [30] introduced a residual CNN to tackle the task. Ahmed et al. [31] exploited a tree LSTM network using a structured attention architecture, showing how the attention mechanism improves the performance in relation extraction. A recent research line in NLP includes the Transformer, an encoder-decoder architecture which dispenses with convolutions and recurrence, being based solely on an attention mechanism [32]. This architecture is the core of pre-trained language models such as BERT (Bidirectional Encoder Representations from Transformers) [33], and of its adaptively pre-trained variants for biomedical texts, namely BioBERT [34] and SciBERT [35]. Despite the recent advances in deep learning based techniques, we rely on carefully designed syntactic rules on dependency parse trees in order to avoid being dependent on labeled data, and to be able to reuse our system in diverse industrial scenarios. The approach most similar to ours is thus the work by Fundel et al. [19].
FIGURE 1. In our biomedical relation extraction approach each input document is first analyzed by syntactic preprocessing modules (i.e., tokenizer*, POS tagger, chunker*, dependency parser, and syntactic corrector*). The resulting syntactic dependency parse tree and token annotations, along with candidate entity pairs, are analyzed by a relation router to detect candidate relations. Actual relations are finally identified by a relation classifier, powered by pattern matching rules on the dependency tree. *Custom implementation of preprocessing components.

III. METHODS
The system is designed to extract highly precise relational information from input texts. In Fig. 1 we schematically show our approach to relation extraction, which includes two stages: (i) text preprocessing, in which a sequence of natural language processing modules is applied to texts (Section III-A), and (ii) relation extraction, in which relationships between entities are identified and classified (Section III-B).

A. SYNTACTIC PREPROCESSING
A pipeline of natural language processing modules is needed in order to provide the relation extraction engine with the information required to extract highly precise semantic associations. We present the modules in the following.

1) TOKENIZATION
The raw text is separated into tokens using the spaCy² tokenizer. We customize it to also segment text units on punctuation (e.g., hyphens and slashes) by means of regular expressions. This fine-grained approach to tokenization originates from the observation that not all the symbol-separated tokens are the smallest units of information to work with. For instance, ''IL6-induced atrophy'' is typically divided into two tokens (''IL6-induced'' and ''atrophy''). However, ''IL6-induced'' implicitly encodes relational information that is desirable to analyze.
² www.spacy.io
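The fine-grained splitting described above can be illustrated with a minimal sketch. It uses only the Python standard library rather than the customized spaCy tokenizer, and the regular expression `TOKEN_RE` and the function name `fine_grained_tokenize` are assumptions made for illustration, not the patterns used by the system.

```python
import re

# Hypothetical pattern (not the system's own): a token is either a run of
# word characters or a single non-space symbol such as a hyphen or a slash.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def fine_grained_tokenize(text):
    """Split text on whitespace and, additionally, on every punctuation symbol."""
    return TOKEN_RE.findall(text)


print(fine_grained_tokenize("IL6-induced atrophy"))
# ['IL6', '-', 'induced', 'atrophy']
```

With this splitting, ''IL6'' and ''induced'' become separate units, so a later stage can inspect the relational cue carried by ''induced''.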

2) PART-OF-SPEECH TAGGING
Each token is assigned a label describing its part-of-speech (POS) at two different granularities: a coarse-grained one (from Universal POS tags³) and a fine-grained one (from Penn Treebank tags⁴). We use the spaCy en_core_web_lg neural model for POS tagging. These labels serve both the chunking and the syntactic dependency parsing steps.

3) CHUNKING
Our fine-grained approach to tokenization allows the system to ultimately merge only the tokens that together form a self-contained chunk of information. For instance, the tokens in T = {(, AKT, ), -, 1} are part of the same concept ''(AKT)-1'', hence it is desirable to merge them into a single token. To this end, we designed sequence patterns based on the orthography and on the fine- and coarse-grained POS tags of the tokens. The resulting token stores all attributes of its original constituents (POS tags, lemmas, surface text, etc.). The list of patterns is in Table 1, together with common examples. We designed this module to: 1) reduce potential errors in syntactic parsing in the case of long and articulated texts; 2) easily process multi-token words (e.g., ''Interleukin 6''), which are frequent in biomedical texts.
TABLE 1. The chunking patterns used by the system. Each token T_i in a candidate sequence T_1 ... T_n must satisfy some pattern restrictions on the orthography and part-of-speech levels in order to be merged. Underlined tokens are the triggers of a pattern, which have to meet their restrictions in order to proceed. The remaining tokens of the sequence are then checked for restrictions. If all the pattern restrictions are satisfied, the merging rule is applied.
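The merging step can be sketched as a sliding window of per-token checks. The sketch below is only an illustration under strong assumptions: the single pattern `BRACKETED_ID` is hypothetical and covers just the ''(AKT)-1'' example, it tests orthography only (no POS restrictions and no trigger tokens), and it does not reproduce the patterns of Table 1.

```python
# Hypothetical orthographic pattern for sequences such as "(", "AKT", ")", "-", "1".
BRACKETED_ID = [
    lambda t: t == "(",
    str.isalnum,
    lambda t: t == ")",
    lambda t: t in {"-", "\u2212"},  # ASCII hyphen or Unicode minus sign
    str.isdigit,
]


def merge_chunks(tokens, pattern):
    """Merge every window of tokens that satisfies all checks of the pattern."""
    merged, i, size = [], 0, len(pattern)
    while i < len(tokens):
        window = tokens[i:i + size]
        if len(window) == size and all(check(tok) for check, tok in zip(pattern, window)):
            # The merged unit keeps its constituents, mirroring the idea that
            # the resulting token stores the attributes of the original ones.
            merged.append({"text": "".join(window), "parts": window})
            i += size
        else:
            merged.append({"text": tokens[i], "parts": [tokens[i]]})
            i += 1
    return merged


chunks = merge_chunks(["(", "AKT", ")", "-", "1", "kinase"], BRACKETED_ID)
print([c["text"] for c in chunks])
# ['(AKT)-1', 'kinase']
```

Tokens that match no pattern, such as those of ''IL6-induced atrophy'', are left as separate units.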
For instance, the units the chunker produces for the following sentence are presented inside brackets:

4) SYNTACTIC DEPENDENCY PARSING
A syntactic dependency parse tree of the text is built using the spaCy non-monotonic transition-based parser. We chose to rely on the spaCy parser since it has been benchmarked to be the fastest to date,⁵ and thus it fully meets industrial requirements. The grammatical dependencies (hereafter, edges) of the tokens or chunks (hereafter, nodes) are drawn from the CLEAR tag set for dependency parsing.⁶

5) SYNTACTIC CORRECTOR
The predicted POS tags and grammatical dependencies that are assigned to tokens are not always correct. Correcting POS tags or parse trees as a whole is a hard problem; however, some wrong labels can be easily detected. As a consequence, for the most trivial errors we automatically correct the labels, whereas in more complex cases we label the sentence as unreliable to avoid false positives in the relation extraction phase. The following corrections are applied:
• tokens that are heads of a direct object (dobj) or a nominal subject (nsubj) having a coarse-grained POS tag different from verb are assigned verb as a POS tag;
• tokens that are heads of an adjectival modifier (amod) having verb as a coarse-grained POS tag are assigned adj as a POS tag, since they are in most cases past participles used as adjectives.
⁵ https://spacy.io/usage/facts-figures#benchmarks
⁶ https://github.com/clir/clearnlp-guidelines/
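The two corrections can be sketched on a toy token representation. The dictionary layout (`text`, `pos`, `dep`, `head`) and the function name `correct_pos` are assumptions for illustration. The second rule is read here as applying to the token that carries the amod label (the participle itself), which is an interpretation of the wording above rather than a confirmed detail of the system.

```python
def correct_pos(tokens):
    """Apply the two POS corrections in place and return the token list.

    Each token is a dict with 'text', 'pos' (coarse-grained tag), 'dep'
    (label of the edge to its governor) and 'head' (index of the governor).
    """
    for tok in tokens:
        governor = tokens[tok["head"]]
        # Rule 1: the governor of a dobj or nsubj edge must be a verb.
        if tok["dep"] in {"dobj", "nsubj"} and governor["pos"] != "VERB":
            governor["pos"] = "VERB"
        # Rule 2 (assumed reading): a verb attached as amod is treated as an adjective.
        if tok["dep"] == "amod" and tok["pos"] == "VERB":
            tok["pos"] = "ADJ"
    return tokens


sentence = [
    {"text": "GerE", "pos": "PROPN", "dep": "nsubj", "head": 1},
    {"text": "represses", "pos": "NOUN", "dep": "ROOT", "head": 1},  # mis-tagged verb
    {"text": "cotD", "pos": "PROPN", "dep": "dobj", "head": 1},
]
print([t["pos"] for t in correct_pos(sentence)])
# ['PROPN', 'VERB', 'PROPN']
```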

B. RELATION EXTRACTION
The syntactic preprocessing provides the information needed to extract biomedical relationships between entities from text. Following previous work in biomedical relation extraction, we assume entities are given. We rely on syntactic rules in order to have a single system that can be applied to diverse corpora, completely removing the need for training data. We thus fully exploit the dependency parse tree and the syntactic information encoded in each token. Our strategy involves a routing phase to detect candidate relations (Section III-B.1), and a classification phase to assess the relations, assigning the entities the effector and effectee roles (Section III-B.2).

1) RELATION ROUTER
We analyze the minimum path of the dependency parse tree between entities to assess whether the path is eligible to represent a candidate relation pair. To this end, we devise several rules based on common linguistic constructs. The process of routing a syntactic path involves the analysis of both crossed edges (i.e., dependency relations) and node attributes (e.g., lemma, POS tags, etc.). Fig. 2 summarizes the workflow. The router stops immediately, labeling the candidate relation pair as negative, if one of the following conditions is met:
1) Same entities. If the lemmas of the entities are the same, the candidate pair is labeled as negative;
2) Ungrammatical text. If the input text has no verb outside of subordinate clauses, the pair is considered unreliable and thus labeled as a negative instance;
3) Unrouteable conjunctions. If conjunctions introducing subordinate or coordinate clauses are met (i.e., but, whereas, if, therefore, and while), the entities are unlikely to be related, thus the pair is negative;
4) Unrouteable prepositions. The prepositions if, therefore, during, despite, from, and at typically introduce phrases that specify where (or when) a specific event occurs (or has occurred). If one of these prepositions is found in the path, the candidate pair is unlikely to be a relation and is thus discarded (i.e., labeled as negative);
5) Clause routing constraint. Sentences in the biomedical literature are complex and articulated, with one or more coordinate and subordinate clauses. Entities in different clauses could be in a relation, but only under some conditions. We allow the router to cross a clause only if the target clause has no explicit subject dependency, and if the final path has exactly one subject. Otherwise, we consider the pair a negative instance;
6) More than one subject crossed. If more than one subject dependency relation is crossed we label the relation pair as negative, because the minimum path is typically crossing semantically independent phrases or clauses. For instance, in the sentence ''A causes B and C triggers a D-reaction'', the entity A is not related to the entity D;
7) Purpose-description statements. Some sentences express a broad research purpose (e.g., ''In this paper we aim to demonstrate that tuberculosis could be prevented by vaccines.''), instead of actual relations. When crossing the path between entities, the lemmas of the tokens are thus compared to a list of purpose-related words (Supplementary File 1, Section ''purpose_words''). If a match is found, the pair is labeled as negative.
FIGURE 2. The logic of the relation router. Rhombus shapes indicate tested conditions, while arrows indicate the router flow. If all the conditions are negative, the entity pair is considered a relation candidate. Otherwise, the entity pair is labeled as a negative relation instance.
While crossing the path, the relation router also checks whether the relation is affirmed or negated. This is particularly useful to detect actual associations for real-world use. Negations are detected using the following rules:
1) Negative auxiliary. A crossed token node is incident to an edge having a negation modifier dependency tag (neg), or is adjacent to a token node whose lemma is ''no'';
2) Negative verb. One of the crossed verbs belongs to a negative meaning verb list (Supplementary File 1, Section ''negation_verbs'');
3) Negative adverb. A crossed token node is incident to an edge having a negation adverb as target (Supplementary File 1, Section ''negation_adverbs'');
4) Negative noun. One of the crossed nouns belongs to a negative meaning noun list (Supplementary File 1, Section ''negation_nouns'');
5) Negative adjective. One of the crossed adjectives belongs to a negative meaning adjective list (Supplementary File 1, Section ''negation_adjectives'').
If the relation router navigates the whole path between the two entities without any of the routing conditions being met, the pair is considered a relation candidate and is analyzed by the relation classifier (Section III-B.2).
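The routing logic above can be sketched as a single pass over the nodes of the path. This is a simplified illustration, not the system's implementation: the path is assumed to be already computed and given as a list of dictionaries with `lemma` and `dep` keys, conditions 2 and 5 are omitted, entity lemmas are approximated by lowercased strings, and `PURPOSE_WORDS` and `NEGATION_LEMMAS` are hypothetical stand-ins for the lists in Supplementary File 1.

```python
UNROUTEABLE_CONJUNCTIONS = {"but", "whereas", "if", "therefore", "while"}
UNROUTEABLE_PREPOSITIONS = {"if", "therefore", "during", "despite", "from", "at"}
PURPOSE_WORDS = {"aim", "investigate"}       # hypothetical stand-in list
NEGATION_LEMMAS = {"fail", "lack", "unable"}  # hypothetical stand-in list
SUBJECT_EDGES = {"nsubj", "nsubjpass"}


def route(entity_a, entity_b, path):
    """Return (is_candidate, is_negated) for the path between two entities."""
    if entity_a.lower() == entity_b.lower():  # condition 1: same entities
        return False, False
    blocked = UNROUTEABLE_CONJUNCTIONS | UNROUTEABLE_PREPOSITIONS | PURPOSE_WORDS
    subjects, negated = 0, False
    for node in path:
        if node["lemma"] in blocked:          # conditions 3, 4 and 7
            return False, False
        if node["dep"] in SUBJECT_EDGES:      # condition 6: more than one subject
            subjects += 1
            if subjects > 1:
                return False, False
        if node["dep"] == "neg" or node["lemma"] in NEGATION_LEMMAS:
            negated = True                    # negation is recorded, not rejected
    return True, negated


path = [
    {"lemma": "do", "dep": "aux"},
    {"lemma": "not", "dep": "neg"},
    {"lemma": "activate", "dep": "ROOT"},
]
print(route("A", "B", path))
# (True, True)
```

Returning the negation flag separately from the candidate flag reflects the distinction drawn above between discarding a pair and detecting that a relation is negated.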

2) RELATION CLASSIFIER
The relation classifier analyzes the relation candidates the router identified, assigning the entities the effector and the effectee roles. We identified three categories of linguistic constructs that are typically used to express semantic relations in the English language. The categories are the following:
• Relation expressed by a verb (R_V). A generalized version of the effector-relation-effectee rule proposed in [19] that we enhanced to capture constructs of the form: where a phrase can appear zero, one, or multiple times. As a result, the rule matches elaborate statements with interleaved phrases such as ''A plays a big role in B assimilation'' or ''abundance of A causes B degradation'', and not only triples of the form entity A - verb - entity B.
• Relation expressed by a nominalization or a participle (R_N). Associations in the biomedical literature are often expressed by nominalizations or participles. We thus employ the following rules. Rules (1) and (2) are inspired by the relation-of-effectee-by-effector and relation-between-effector-and-effectee rules proposed in [19], rule (3) widens the scope of rule (1), and rule (4) allows the system to effectively handle nominalized adjectives expressing relations.
• Relation expressed by a conjunction (R_C). This category is designed to capture relations of entities that act together to do something, which are typically both subjects of a statement. We use the following pattern:
entity A - conjunction - entity B - verb
Example: ''A and B form a complex''
As a result, if the path between the entities contains a verb, we consider the candidate relation pair an R_V relation. The verb found in the path is considered the verb for the relation, and if multiple verbs are found, we take the last one in text order. To assign the roles to the entities, we look at the verb voice. If the voice is active, the entity that appears first in the sentence is labeled as the effector, while the second one is labeled as the effectee. Otherwise, the first entity is labeled as the effectee, and the second entity as the effector.
In case no verb is found in the crossed path, but the path contains (a) a past participle,⁷ (b) an adjective ending in ''ent'' (e.g., A-dependent B), or (c) a nominalized verb, we consider the candidate pair an R_N candidate. Additionally, we have to focus on the types of the edges crossed. During the routing we allow many edge types to be crossed, but many of them only exist in verb-expressed relations. Since an R_N relation represents a more compact connection between the entities, it should not contain verbs (other than the participle form), nor both subjects and objects. To model this additional restriction in terms of edge types and paths, we check whether the minimum path between the two entities only contains certain types of grammatical dependencies. Beyond links expressed by conjunctions and prepositions, only modifiers, compounds, and appositions should exist (i.e., npadvmod, amod, compound, appos, punct, prep, pobj, or conj). If condition (a) or (b) is satisfied, the effector and effectee roles are assigned according to the text order, whereas if condition (c) is met, roles are assigned by analyzing the preposition connecting the nominalization and the entities. Specifically, the effector is the entity that has either no preposition or the preposition by as ancestor, whereas the effectee is the entity that has a preposition amongst on, of, or with as ancestor.
⁷ This holds under some limitations: it should be incident to an edge with an npadvmod or an amod dependency relation.
If the crossed path only contains a conjunction, the remaining part of the text is analyzed for R_C relations. We check whether the top-level node of the path is incident to a verb node. In such a case, we check if the verb lemma is interact or form, and if so, we consider the relation to be of the R_C type.⁸ Note that in the R_C category the effector and effectee roles are not needed since both entities are interacting as effectors.
Lastly, if none of the R_V, R_N, and R_C categories is satisfied, the candidate relation pair is labeled as negative.
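The decision order described above (verb first, then participle or ''ent'' adjective, then conjunction, otherwise negative) can be sketched as follows. The node layout (`lemma`, `pos`, optional `tag` and `voice`), the `governing_verb` argument, and the function name `classify` are assumptions for illustration. The path is assumed to hold only the nodes between the two entities, listed in text order. Nominalized verbs (condition c), the footnoted limitation on past participles, and the edge-type restriction on R_N paths are left out.

```python
def classify(path, first, second, governing_verb=None):
    """Classify a routed candidate; `first` and `second` are in text order."""
    verbs = [node for node in path if node["pos"] == "VERB"]
    if verbs:
        verb = verbs[-1]  # with several verbs, the last one in text order is used
        active = verb.get("voice", "active") == "active"
        effector, effectee = (first, second) if active else (second, first)
        return {"category": "R_V", "trigger": verb["lemma"],
                "effector": effector, "effectee": effectee}
    for node in path:
        is_participle = node.get("tag") == "VBN"
        is_ent_adjective = node["pos"] == "ADJ" and node["lemma"].endswith("ent")
        if is_participle or is_ent_adjective:
            # Conditions (a) and (b): roles follow the text order.
            return {"category": "R_N", "trigger": node["lemma"],
                    "effector": first, "effectee": second}
    only_conjunctions = bool(path) and all(node["pos"] == "CCONJ" for node in path)
    if only_conjunctions and governing_verb in {"interact", "form"}:
        # Both entities act as effectors, so no effectee is assigned.
        return {"category": "R_C", "trigger": governing_verb,
                "effectors": (first, second)}
    return None  # negative instance


# "B is activated by A": the passive voice swaps the roles.
result = classify([{"lemma": "activate", "pos": "VERB", "voice": "passive"}], "B", "A")
print(result["effector"], result["effectee"])
# A B
```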

IV. RESULTS AND DISCUSSION
We evaluate our relation extraction method on different benchmark corpora annotated for biomedical relations: LLL [36], IEPA [37], and HPRD50 [19]. The corpora cover different topics in biomedicine, and thus represent a good evaluation benchmark for our system for diverse real-world applications. In particular, LLL is a corpus about the model bacterium Bacillus subtilis, focused on gene transcription and sporulation; HPRD50 is about regulatory relations, direct physical interactions and modifications on documents from the Human Protein Reference Database [38]; and IEPA is a corpus focused on interactions between a restricted set of biochemicals (e.g., insulin, oxytocin, leptin, etc.). Relations between entities are annotated within sentence boundaries, and entity offsets are provided with the raw texts. Given a set of entities $\{e_1, e_2, \ldots, e_n\} \in E$ belonging to an input sentence $S$, we generate $\binom{n}{2}$ candidate relation instances (if $n \geq 2$) for the sentence $S$. Following previous work, negative instances are represented by pairs that are not annotated as relations in the corpora. The statistics of the corpora are summarized in Table 2.
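As a small illustration of the candidate generation step, the sketch below enumerates all unordered entity pairs of a sentence with the Python standard library. The function name `candidate_pairs` and the example entity names are chosen for illustration only.

```python
from itertools import combinations
from math import comb


def candidate_pairs(entities):
    """Return every unordered pair of entities found in one sentence."""
    return list(combinations(entities, 2))


entities = ["SpoIIID", "cotC", "cotX", "cotB"]
pairs = candidate_pairs(entities)
print(len(pairs), comb(len(entities), 2))
# 6 6
```

A sentence with fewer than two entities yields an empty list, which matches the condition n ≥ 2 stated above.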
For the sake of comparison to previous work, we evaluate our relation extraction method using precision (1), recall (2), and F1 score (3):

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{1}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{2}$$

$$F1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{3}$$

where TP is the number of true positives, FP is the number of false positives, and FN is the number of false negatives. However, we are mainly interested in the precision metric, motivated by a real-world application of the system. We compared our method to existing methods in the literature, including rule-based approaches [19], [20], feature- and kernel-based approaches [21]-[26], and neural network approaches [27]-[31]. Additionally, we compared our system to recent transformer-based methods pre-trained on biomedical texts, namely BioBERT [34] and SciBERT [35]. We fine-tuned both BioBERT and SciBERT on each corpus, reporting the average performance using 10-fold cross validation. We used the official implementation and optimal hyperparameters provided by the respective authors [34], [35].
Table 3 shows the performance of our system across corpora compared to other methods. Our system achieves the highest precision on all the corpora (93.2%, 90.7%, and 91.7% on LLL, HPRD50, and IEPA, respectively), outperforming the BERT-based approaches by a large margin in the precision metric while maintaining an F1 score comparable to other methods. The only exception is the LLL corpus, where transformer-based methods and the ''DSTK & feature kernel'' approach achieve a very high F1 score. It is worth noting that our relation extraction approach also achieves the highest F1 score (81.1%) on the IEPA corpus and, unlike machine learning based systems, it neither needs nor relies on training data. Additionally, since our approach is a single system for all the corpora, it can be used as is on new data (see Section V), a typical requirement in industrial scenarios. These results meet our expectations, since our goal was to develop a high-precision system that allows researchers, and in particular biologists, to obtain reliable information without having to manually review the results and discard all the false positive instances. The relation extraction results on benchmark corpora can be further explored at: https://apps.cosbi.eu/high-precision-nlp-benchmark/.
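For readers who want to reproduce the three metrics from raw counts, a minimal sketch follows. The function name and the counts in the example are hypothetical and are not taken from Table 3; returning 0.0 for an empty denominator is a convention assumed here.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts, for illustration only.
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(f"{p:.3f} {r:.3f} {f1:.3f}")
# 0.900 0.750 0.818
```

The example shows the tradeoff discussed in this section: a system can reach high precision while its F1 score is pulled down by a lower recall.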

A. ERROR ANALYSIS
To gain additional insights into our approach, we analyzed both the false positives and the false negatives the system produces, in order to identify directions for future work. A complete list of all the errors is provided in Supplementary File 2. We identified three sources of false positives, also summarized in Fig. 3:
• Annotation inconsistencies. Most false positive results (i.e., 58.62%) are caused by annotation inconsistencies in the corpora. We found sentences in which a relation, on a grammatical basis, actually exists, but has not been annotated. For instance, in the following sentence⁹:
Several distinct mutations in exon2 of VHL disrupt binding of pVHL to TBP-1.
a relation between ''VHL'' and ''pVHL'' has not been annotated even though it is stated in the text. This could be due to the complex mutation statement that is described, which may be considered a biomedical event;
• Dependency parsing errors. 20.69% of the false positives are due to errors in the dependency parse tree. For instance, in the sentence¹⁰:
A low level of GerE activated transcription of cotD by sigmaK RNA polymerase in vitro, but a higher level of GerE repressed cotD transcription.
the verb ''activated'' has amod as the head relation label, denoting that it is the adjectival modifier of ''transcription'';
• Algorithm errors. Other sources of errors account for 20.69% of the total, and are mainly due to articulated syntactic structures that our algorithm wrongly navigates. For example, in the following sentence¹¹:
These results clearly demonstrate that UCP3 gene expression is upregulated by TZDs in the WAT and BAT in Wistar fatty rats, an obese model with leptin receptor defect, and that adipose UCP3 gene expression is increased in response to TZDs in vitro.
our system incorrectly identifies a relation between the biomedical entities ''UCP3'' and ''leptin''.
False negatives can instead be classified depending on both the relation category they have been tested on and their cause. Fig. 4 summarizes the distribution of false negatives according to this classification. In particular, 87.50% of the false negatives belong to the R_V category, 11.03% belong to the R_N category, and 0.74% fall into the R_C category. The remaining 0.74% are cases that do not belong to any of the previous categories. For each category, the causes of false negatives we identified are the following:
• Dependency parsing errors. In 64.71% of the total cases, false negatives are caused by errors in the dependency tree of the sentence being analyzed. For example, in the following sentence¹²:
We have shown previously that the transcription of degR is driven by an alternative sigma factor, sigmaD.
''sigmaD'' is labeled as an appositional modifier (i.e., appos) of the verb ''shown''; however, its head should instead be ''factor''. This results in a wrong structure that prevents our algorithm from correctly navigating the tree. We found this kind of error particularly prominent within the R_V category (i.e., 65.55%) and the R_N category (i.e., 66.67%). No errors of this kind are found in R_C;
• Complex or uncovered grammatical structure. In 25.00% of the cases, the grammatical structure of the sentence has more than one subordinate or coordinate clause, and it is not easy to route. To give an example, we can look at the following sentence¹³:
SpoIIID at low concentration repressed cotC transcription, whereas a higher concentration only partially repressed cotX transcription and had little effect on cotB transcription.
where, to identify the actual relation between ''SpoIIID'' and ''cotX'', the system should be able to figure out that ''higher concentration'' is actually referring to ''SpoIIID''. However, this is far beyond the capabilities of our algorithm. While this false negative cause accounts for all the errors within the R_C category, it only accounts for 23.53% and 33.33% within the R_V and R_N categories, respectively;
• Annotation inconsistencies. Similarly to the false positive analysis, false negatives could also be due to annotation inconsistencies. These errors account for 8.09% of the total false negatives, and an example of this error type is the following sentence¹⁴:
The aim of this study was to investigate the effects of hCG, hCG plus oxytocin and oxytocin on [3H] inositol phosphate (IP) formations in porcine myometrial cells obtained from ovariectomized and cyclic gilts.
where ''oxytocin'' and ''inositol phosphate'', following the annotation standards of the corpora, are not in an actual relation, but are instead part of a statement about the purpose of the study. Fortunately, these errors are not common, representing only 8.40% of the total errors in R_V;
• Negation errors. The remaining false negatives (i.e., 2.21%) are due to errors by our negation detector. For instance, in the sentence¹⁵:
From these results we conclude that ComK negatively regulates degR expression by preventing sigmaD-driven transcription of degR, possibly through interaction with the control region.
our system misses the relation between ''ComK'' and ''degR''. This is due to difficulties in discerning negated relations from negative relations. This error type is only present within the R_V category, accounting for 2.52% of its errors.
¹⁰ Corpus: LLL, sentence ID: d18.
¹¹ Corpus: IEPA, sentence ID: d88.
¹² Corpus: LLL, sentence ID: d26.
¹³ Corpus: LLL, sentence ID: d27.
¹⁴ Corpus: IEPA, sentence ID: d17.

B. ABLATION STUDY
In order to provide additional insights into our method, we investigate the contribution of each rule category to the final performance of the system. In Table 4 we report precision, recall, and F1 score on all corpora when the R_V, R_N, R_C, and negation rule components are individually removed. As expected, the negation rules are crucial to the precision of the relation extraction system: when they are removed, the precision decreases on all the corpora (-3.1%, -8.3%, and -6.6% on LLL, HPRD50, and IEPA, respectively). We also notice a small increase in the F1 score on the LLL corpus (+2.0%). This is due to the characteristics of LLL, which exhibits few negated relations with respect to the other corpora.

Removing R_V, R_N, and R_C gives deeper insights into the importance of each relation category. The rule set for relations expressed by a verb (R_V) is by far the most important one. When it is removed, the precision increases on all the corpora (+6.8%, +3.7%, and +2.2% on LLL, HPRD50, and IEPA, respectively), while the recall drops markedly (-58.0%, -58.4%, and -49.1% on LLL, HPRD50, and IEPA, respectively), and with it the F1 score. This behaviour confirms that the R_V category is the primary source of errors of our system, but also the means of achieving a trade-off between a very high precision and a satisfactory recall. We notice a similar but less pronounced trend when removing relations expressed by nominalizations or participles (R_N). On the other hand, the category of relations expressed by conjunctions (R_C) contributes little on all corpora. In particular, it improves the precision (+0.1%), the recall (+0.7%), and the F1 score (+0.5%) on HPRD50, whereas it decreases the precision (-0.4%) and the F1 score (-0.2%) on IEPA.

¹⁵ Corpus: LLL, sentence ID: d26.
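The reason why a large loss in recall outweighs a gain in precision follows directly from the definition of the F1 score as the harmonic mean of the two metrics. The sketch below makes this explicit; the function name f1_score is an assumption, and the input values are hypothetical numbers chosen for illustration, not figures from Table 4.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical values, NOT taken from Table 4.
full_system = f1_score(precision=0.90, recall=0.70)
without_rule_set = f1_score(precision=0.95, recall=0.20)

print(f"full system:      F1 = {full_system:.4f}")
print(f"rule set removed: F1 = {without_rule_set:.4f}")
```

Because the harmonic mean is dominated by the smaller of its two arguments, the second configuration scores far lower despite its higher precision.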

TABLE 4.
Ablation study on the contribution of each rule type. We report precision, recall, and F1 score of the relation extraction system on all the corpora when each rule category is removed.

C. LIMITATIONS AND OUTLOOK
Despite the good results, we identified some limitations that could be tackled in future work. Our system is able to extract highly precise binary relations. However, there are use cases in which it would be useful to extract higher-order associations (i.e., relations of relations), making a relation the argument of another relation, or to model relations with more than two arguments. These requirements go beyond the purpose of this paper, since we have focused on relation extraction and on the gold-standard annotations proposed in the literature. We thus plan to enrich our system with this enhanced representation in future work, following the recent trends in event extraction [8]. Another limitation concerns the errors of the algorithm, and in particular some of the difficult cases we presented in Section IV-A. We decided to rely on a rule-based method instead of a machine learning approach in order to have a high degree of control over the behavior of the system, and to avoid depending on application-specific training data. We designed the rules to be as general as possible, relying only on syntactic information and thus avoiding overfitting to words or corpus-specific constructs. This is a strong point in favour of our approach, since we are able to use the same system with the same rules across multiple corpora, obtaining high performance on all of them without retraining on new target data. However, even with such a general approach, there are cases that the system still does not capture and for which a machine learning system can be complementary. We thus plan to combine the complementary power of rules and machine learning methods in future work. An interesting research direction is to exploit our flexible rule sets in a postprocessing stage to refine the results of a neural relation extraction method.
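To make the notion of higher-order associations concrete, the following minimal sketch shows one possible data structure in which a relation can take another relation as an argument and can have more than two arguments. The class names Entity and Relation are hypothetical, and the nested reading of the ComK sentence from Section IV-A is only an illustrative interpretation, not a gold-standard annotation nor a representation our system currently produces.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Entity:
    name: str


@dataclass(frozen=True)
class Relation:
    predicate: str
    # Arguments may be entities or other relations; the tuple may hold
    # more than two elements to model n-ary relations.
    arguments: Tuple[Union[Entity, "Relation"], ...]

    def depth(self) -> int:
        """1 for a flat relation, plus one for each level of nesting."""
        nested = [a.depth() for a in self.arguments if isinstance(a, Relation)]
        return 1 + max(nested, default=0)


# Flat binary relation, the kind of output discussed in this paper.
binary = Relation("regulates", (Entity("ComK"), Entity("degR")))

# Illustrative higher-order reading: ComK prevents (sigmaD drives degR).
inner = Relation("drives", (Entity("sigmaD"), Entity("degR")))
higher_order = Relation("prevents", (Entity("ComK"), inner))

print(binary.depth(), higher_order.depth())
```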

V. CASE STUDY
We present a case study on the mTOR signaling pathway [39] in order to show how our relation extraction system can be used in an industrial scenario. We queried PubMed and PMC to get all the relevant documents about the mTOR signaling pathway. The search was performed ensuring that the documents contain "mTOR pathway" in the title, abstract, MeSH (Medical Subject Headings) terms, or keywords, and mention at least two proteins or genes belonging to the pathway according to KEGG¹⁶.

In order to find the semantic relationships between the actors of the pathway, we first searched for the proteins and genes within the documents using a dictionary-based approach. The entities were looked up using the Aho-Corasick algorithm [40] together with their common textual variants: (i) hyphenation: search the entity also without hyphens; (ii) Greek symbols or words: search the entity also with the corresponding uppercase and lowercase Greek symbols (e.g., "α", "β") and words (e.g., "alpha", "beta"); (iii) case: search the entity regardless of its letter case; and (iv) lemma: search the entity in its lemma form to abstract away from both grammatical person and verb tense.

Then, our relation extraction system was used to find relevant associations among those concepts, resulting in 22,379 evidence sentences from the literature. We also assigned each relation (i.e., R_V, R_N, or R_C) a label indicating its semantic category by taking its lemma and looking it up in a manually curated biomedical lexicon comprising 4,600 verbs in lemma form together with their categories (e.g., changed → AFFECTS). This resource has been manually curated by field experts [41] and refined by biologists in our R&D team (Supplementary File 4).

Fig. 5 shows the resulting relation network, where nodes are the proteins and genes of the pathway, and edges represent evidence relations supported by more than 75 sentences (orange: ASSOCIATED_WITH, blue: AFFECTS, brown: MEASURES). It is worth noting that the knowledge base is not intended to be a biological "network pathway": a biomedical relation can in fact be stated even if two actors interact in the long run rather than directly. This is of particular interest to biologists, since the network is not restricted to evidence sentences about direct interactions. As a proof of concept, we hereafter present some associations identified by our relation extraction system:

¹⁶ https://www.genome.jp/kegg-bin/show_pathway?hsa04150
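The dictionary-based lookup with textual variants described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not our production code: the Greek mapping is deliberately partial, the helper names (variants, build_automaton, find_entities) and the example entity list are hypothetical, the lemma variant (iv) is omitted because it requires a lemmatizer, and no word-boundary check is performed. The matcher is a compact textbook version of the Aho-Corasick automaton.

```python
from collections import deque

# Partial, illustrative mapping between Greek words and symbols.
GREEK = {"alpha": "α", "beta": "β", "gamma": "γ", "delta": "δ"}


def variants(name):
    """Lowercased variants: with/without hyphens, Greek words <-> symbols."""
    base = name.lower()                      # (iii) case-insensitive
    out = {base, base.replace("-", "")}      # (i) hyphenation
    for form in list(out):
        for word, symbol in GREEK.items():   # (ii) Greek words and symbols
            out.add(form.replace(word, symbol))
            out.add(form.replace(symbol, word))
    return out


def build_automaton(patterns):
    """Build an Aho-Corasick automaton from {pattern: canonical entity}."""
    goto, fail, output = [{}], [0], [[]]
    for pat, entity in patterns.items():     # phase 1: build the trie
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                output.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].append((pat, entity))
    queue = deque(goto[0].values())          # phase 2: failure links (BFS)
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            output[nxt] = output[nxt] + output[fail[nxt]]
    return goto, fail, output


def find_entities(text, automaton):
    """Return (start offset in lowercased text, matched variant, entity)."""
    goto, fail, output = automaton
    state, hits = 0, []
    for i, ch in enumerate(text.lower()):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat, entity in output[state]:
            hits.append((i - len(pat) + 1, pat, entity))
    return hits


# Hypothetical entity dictionary and input sentence.
entities = ["mTOR", "PKC-alpha", "AKT1"]
patterns = {v: e for e in entities for v in variants(e)}
automaton = build_automaton(patterns)
hits = find_entities("PKCα phosphorylates MTOR.", automaton)
print(hits)
```

Expanding the dictionary into variants once, before building the automaton, keeps the scan over each document a single linear pass regardless of how many variants are searched.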

VI. CONCLUSION
We presented a high-precision relation extraction system that aims to speed up the time-consuming manual curation of semantic biomedical associations. Its rule-based design, built on the syntactic dependency structures of texts, makes the system independent of specific training data and therefore a one-for-all solution for industrial applications. Experimental results on gold-standard corpora showed that our method outperforms existing rule-based, feature- and kernel-based, and neural biomedical relation extraction approaches on the precision metric, while reaching a comparable or superior F1 score. Importantly, the results indicated that the high precision of our method is complementary to the high recall of transformer-based approaches, highlighting the need for more research on traditional linguistics-based methods. As a result, we met the requirement of limiting the expensive curation of the extracted semantic biomedical relationships, so that the extracted information can be smoothly and reliably translated into actionable knowledge. We plan to improve our method by adopting the richer representation of event extraction and by exploiting the complementarity of our rule sets and recent deep learning approaches, blending them into a single system in which one acts as a corrector of the other.

From 2008 to 2010, he was a Researcher and a Developer with the Embedded System Unit at FBK (Fondazione Bruno Kessler), Trento, Italy, in the field of hardware formal verification. From 2012 to 2014, he was a Researcher and a Developer with the Research and Development unit of a simulation-based engineering company in the field of artificial intelligence. Since 2016, he has been a Researcher in the field of systems biology, data analysis, and modeling and simulation of biological systems at Fondazione The Microsoft Research-University of Trento Centre for Computational and Systems Biology, Rovereto, Italy.
His research interests include artificial intelligence applied to natural language processing, systems biology, model checking, and cyber security. He also holds the CEH (Certified Ethical Hacker) certification from EC-Council.
CORRADO PRIAMI received the M.Sc. and Ph.D. degrees in computer science from the University of Pisa.
He held a postdoctoral position with a competitive EU Marie Curie Grant at the École Normale Supérieure, Paris, from 1996 to 1997. He was a Researcher and an Associate Professor with the University of Verona, from 1997 to 2001, a Visiting Scholar with Microsoft Corporation, in 2004, a Visiting Professor at Stanford University, from 2016 to 2017, and a Professor with the University of Trento, from 2001 to 2017. He is currently a Professor of computer science with the University of Pisa and the Director of the Pisa node of the Stanford SPARK Global initiative, and has more than 20 years of academic and industrial experience in the application of computational technology for pharma and food companies. The results of his Ph.D. thesis on stochastic pi-calculus were the basis for the foundation of COSBI, which he led for more than 12 years as the Founder, President, and CEO. He is currently the Founder and CSO of COSBI. In early 2018, he also joined Vydiant, a California-based health tech company, as CTO. He served in the Senate of the University of Verona, in the BoD of the University of Trento, in the BoD of the Trento School of Management, and in the BoD of COSBI as Chairman of the board. He has published over 200 scientific articles, given more than 100 invited talks and lectures, and regularly serves on advisory and scientific boards (including the Stanford SPARK program) as well as on reviewing panels for international funding agencies and institutions. He has taught for more than 2000 hours in the fields of programming languages and bioinformatics at undergraduate and graduate level. He has supervised more than 100 people (students, Ph.D. students, and Post-Docs), of whom about 40 are now in senior or research positions in academia and industry.
ROSARIO LOMBARDO was born in Phoenix, AZ, USA. He received the B.Sc. and M.Sc. degrees (both cum laude) in computer science from the University of Pisa, the M.B.A. degree in consulting management from the SP Jain School of Global Management (Dubai, Sydney, Singapore), and the Ph.D. degree in bioinformatics from the University of Verona.
He is currently the Head of Bioinformatics at Fondazione The Microsoft Research-University of Trento, where he has been supervising several research projects in collaboration with pharma and nutrition companies, driving innovation towards industrial scientific research. His research interests include deep learning and text mining, as well as enabling technologies for complex quantitative systems pharmacology models, such as visual modeling and high-performance simulations. He was a Lecturer with the Universities of Pisa and Trento and has been mentoring several intern/B.Sc./M.Sc./Ph.D. students. In over 15 years of experience in business, scientific, and technological consulting, he has coached a number of colleagues and managed cross-functional, international projects in pharma, nutrition, academia, and banking. He has entrepreneurial experience in tech-enabled ventures.
VOLUME 8, 2020