Named Entity Extraction for Knowledge Graphs: A Literature Overview

An enormous amount of digital information is expressed as natural-language (NL) text that is not easily processable by computers. Knowledge Graphs (KG) offer a widely used format for representing information in computer-processable form. Natural Language Processing (NLP) is therefore needed for mining (or lifting) knowledge graphs from NL texts. A central part of the problem is to extract the named entities in the text. The paper presents an overview of recent advances in this area, covering: Named Entity Recognition (NER), Named Entity Disambiguation (NED), and Named Entity Linking (NEL). We comment that many approaches to NED and NEL are based on older approaches to NER and need to leverage the outputs of state-of-the-art NER systems. There is also a need for standard methods to evaluate and compare named-entity extraction approaches. We observe that NEL has recently moved from being stepwise and isolated into an integrated process along two dimensions: the first is that previously sequential steps are now being integrated into end-to-end processes, and the second is that entities that were previously analysed in isolation are now being lifted in each other's context. The current culmination of these trends is the deep-learning approaches that have recently reported promising results.


I. INTRODUCTION
Knowledge Graphs (KG) [1]- [3] were introduced to wider use by Google in 2012 to precisely interlink data that could be used, in their case, to assist search queries [4]. In a KG, the nodes represent either concrete objects, concepts, information resources, or data about them, and the edges represent semantic relations between the nodes [5]. Knowledge graphs thus offer a widely used format for representing information in computer-processable form. They build on, and are heavily inspired by, Tim Berners-Lee's vision of the semantic web, a machine-processable web of data that augments the original web of human-readable documents [6]. KGs can therefore leverage existing standards such as RDF, RDFS, and OWL. In practice, however, KGs tend to be less formal than the ontology-based semantic web promoted in [6].
At the same time, an abundance of information on the internet is being expressed as natural language (NL) text that is not easily processable by computers. Making all this text available to computers requires Natural Language Processing (NLP) and other types of information extraction techniques. One of the central challenges is to identify the entities mentioned in the text. These entities can be either named entities, which refer to individuals, or abstract concepts.
The associate editor coordinating the review of this manuscript and approving it for publication was Victor Hugo Albuquerque.
Because entities can be represented as nodes in KGs and relations as edges, KGs are a natural way of representing NL text in computer-processable form. This paper therefore presents a literature overview of recent advances in one research area that is central for lifting NL texts to KGs: that of extracting named entities from texts, or Named Entity Extraction (NEE) [7]. NEE is a task that involves recognising the mention of the named entity in the text (NER), disambiguating its possible references (NED), and linking the named entity to an object in a knowledge base (NEL).
In this fast-moving area, the paper presents an overview of how the research front has moved over the last 5 years.
To identify suitable articles we have searched six digital libraries: ACM, IEEE, Science Direct, Springer, WoS, and Google Scholar. Papers that investigated approaches to named-entity extraction, were published in 2014-2019, and had sufficient quality were then considered further, whereas low-quality or out-of-scope papers were excluded.
We searched for papers that explored at least one of the main tasks (NER, NED, and NEL) on the path from NL text to KG. A total of 362 articles were identified based on abstract filtering, and 89 papers remained after closer screening. These remaining main papers were categorized as either NER, NED, or NEL core papers according to the task they focused on. We used snowballing to identify a few additional papers referenced by other main papers.
The rest of the paper is organized as follows: Section II presents the background for our work. Section III constitutes the bulk of the paper and reviews recent approaches to lifting named entities in NL texts to KGs. Section IV discusses the current state of the art, before Section V concludes the paper and suggests further work.

II. BACKGROUND
This section explains the most central concepts used in the paper.

A. NATURAL LANGUAGE PROCESSING
Natural-language processing (NLP) attempts to enable computers to process human language in meaningful ways [8]. For example, [9] describes NLP as ''an area of research and application that explores how computers can be used to understand and manipulate natural-language text or speech to do useful things''. NLP is commonly used to derive semantics from text or speech and to encode it in a structured format that is suitable for semantic search [7] and other types of computer processing. Many well-established techniques and tools are already available for this purpose, such as GATE [10] and other NLP pipelines; IBM Watson 1 and other NL analysis and lifting services; NLTK [11], Stanford CoreNLP [12], DBpedia Spotlight [13], and other NL programming APIs; OpenIE 2 , MinIE [14], and other information extraction tools.

B. KNOWLEDGE GRAPHS
A knowledge graph (KG) represents semantic data as triples (i.e., as ordered sets of terms) composed as (s, p, o): subject s, predicate p, object o, that can be either IRIs (Internationalized Resource Identifiers) [15] i ∈ I, blank nodes b ∈ B or literals l ∈ L, so that s ∈ I ∪ B, p ∈ I and o ∈ I ∪ B ∪ L [16].
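The positional constraints on triples can be made concrete with a small sketch. The class and function names below (IRI, BlankNode, Literal, is_valid_triple) are hypothetical and chosen only for illustration; they do not correspond to any particular RDF library, and the rdfs:label triple is an assumed example.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class BlankNode:
    label: str


@dataclass(frozen=True)
class Literal:
    lexical: str
    # Default datatype is an assumption for this sketch (xsd:string).
    datatype: str = "http://www.w3.org/2001/XMLSchema#string"


Term = Union[IRI, BlankNode, Literal]


def is_valid_triple(s: Term, p: Term, o: Term) -> bool:
    """Check that s is in I ∪ B, p is in I, and o is in I ∪ B ∪ L."""
    return (
        isinstance(s, (IRI, BlankNode))
        and isinstance(p, IRI)
        and isinstance(o, (IRI, BlankNode, Literal))
    )


trump = IRI("https://www.wikidata.org/entity/Q22686")
label = IRI("http://www.w3.org/2000/01/rdf-schema#label")
name = Literal("Donald Trump")

print(is_valid_triple(trump, label, name))  # True
print(is_valid_triple(name, label, trump))  # False: a literal cannot be a subject
```

The sketch only checks term positions; it says nothing about whether the vocabulary terms are used in a semantically meaningful way.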
The IRIs used as subjects, predicates, and objects can be taken from well-defined vocabularies or ontologies in the Linked Open Data (LOD) cloud [5], [17], [18] that attempt to define their meaning as precisely as possible. The literal values used as objects can be represented using well-defined data types, such as the ones defined by XML Schema Definition (XSD). Through the use of terms with well-defined meanings for types, instances, and relations, the knowledge graph aims to describe the semantics of real-world entities and their relations precisely and to link the descriptions to further information in semantic LOD repositories [18].
1 https://www.ibm.com/watson
2 https://nlp.stanford.edu/software/openie.html

III. REPRESENTING NAMED ENTITIES IN KGs
A named entity is an individual such as a person, organization, location, or event. A mention is a piece of text that refers to an entity.
As already mentioned, extracting named entities from texts and representing them as nodes in KGs involves three main tasks: Named Entity Recognition (NER) attempts to find every segment in a text that mentions a named entity. Named Entity Disambiguation (NED) attempts to determine which named entity a mention refers to; for example, the mention ''Trump'' can refer to either a person, a corporation or a building. Named Entity Linking (NEL) attempts to provide a standard IRI for each disambiguated entity; for example, Trump-the-president can be linked to the IRI that represents him in Wikidata: https://www.wikidata.org/entity/Q22686. NED and NEL are closely entwined, because an ambiguous entity must be disambiguated before it can be linked and because an IRI is a good way to represent the result of disambiguation. Therefore, we will often discuss NED and NEL together. Figure 2 shows the resulting sequence of tasks from NL to KGs. The figure also shows the more recent end-to-end (also called zero-shot) approaches, which lump together all three tasks, typically using deep neural networks. These approaches usually rely on standard pre-processing steps, which we will review first, before we go on to the other tasks.
The appropriate choice of pre-processing technique depends on the lifting technique to be used. For example, removing stop words might be good for a Bag-of-Words (BOW) based approach or for a model that does not consider word order, but deep-learning approaches might leverage stop words to disambiguate entities that have different meanings. Recent contributions indicate that robust NED and NEL systems require accurate tuning of several prior steps, especially tokenization and semantic similarity [27]. Recently, deep neural networks, in particular the end-to-end approaches, have reduced the need for pre-processing steps. Using deep neural networks for pre-processing tasks such as tokenization has also produced promising results [28].
Data quality also plays a key role in selecting the most suitable pre-processing technique. For example, most gold-standard datasets do not require the same pre-processing as raw-web or real-time streaming data, for which cleaning and normalization are needed to remove unnecessary or noisy terms (like emojis, currency symbols, hashtags, and so forth).
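The kind of cleaning discussed above can be illustrated with a minimal sketch that uses only the Python standard library. The function names, the toy stop-word list, and the decision to keep the word inside a hashtag (dropping only the ''#'') are assumptions made for illustration. The Unicode categories So and Sc cover most, but not all, emoji and currency symbols.

```python
import re
import unicodedata

# Toy stop-word list, assumed for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "in", "is", "to", "and"}


def normalize(text: str) -> str:
    """Remove URLs, hashtag signs, and symbol characters from noisy text."""
    text = re.sub(r"https?://\S+", " ", text)   # drop URLs
    text = re.sub(r"#(\w+)", r"\1", text)       # "#Iraq" becomes "Iraq"
    # Drop characters in the "Symbol, other" (So) and "Symbol, currency" (Sc)
    # categories, which include most emojis and currency signs.
    text = "".join(
        ch for ch in text if unicodedata.category(ch) not in {"So", "Sc"}
    )
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text: str, remove_stop_words: bool = False) -> list:
    """Split into word tokens; optionally drop stop words (e.g., for BOW)."""
    tokens = re.findall(r"\w+", text)
    if remove_stop_words:
        tokens = [t for t in tokens if t.lower() not in STOP_WORDS]
    return tokens


noisy = "Bush visits #Iraq \U0001F680 $5 http://t.co/x"
print(normalize(noisy))  # Bush visits Iraq 5
print(tokenize("The president of the USA", remove_stop_words=True))
```

Stop-word removal is kept optional because, as noted above, it may help BOW-based models but can remove context that deep-learning approaches exploit.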

B. NAMED ENTITY RECOGNITION
NER was first introduced in [29]. According to [30], the purpose of NER is to identify named entities contained in a text like persons, locations, organizations, time, clinical procedures, money, biological proteins, etc.
NER is a rapidly evolving research field [31]. Most of the proposed approaches have been domain-specific, limiting themselves to, for example, news or reviews. We divide them into three main categories following [31], as shown in Figure 3:
• The first, and earliest, category is the rule- or knowledge-based approaches [32], [33]. Most studies in this category are based on hand-crafted rules [31], [33]. The advantage of such methods is that they do not require annotated training data since they rely on lexical resources. Another advantage is that the precision of hand-crafted methods can become high because of the lexicons and domain-specific knowledge. The disadvantage is that this also makes them domain dependent [30], that lexicon resources may be unavailable, and that constructing and maintaining such resources for many languages is costly [34].
• The second category of NER systems is the learning-based approaches [35]- [37]. These models are used to replace the human-curated rules needed by the first category. The methods in this category can be divided into three types: supervised, semi-supervised, and unsupervised. In supervised and semi-supervised methods, a machine-learning model is trained on input examples together with their targeted outputs. Support Vector Machines (SVM), Hidden Markov Models (HMM), Conditional Random Fields (CRF), and decision trees are common in this category [30]. NER accuracy is sometimes limited by the classifier used. For example, when HMM and SVM are employed, the dependencies among words are not considered. The unsupervised and bootstrapped methods [32], [38], [39] are more automated, although they need a minimal training dataset (seeds). Although these methods do not require as much effort as the first category, seed data is still needed for training. Moreover, they do not benefit from the feature inference of the third category below [31].
• The last, and most recent, category is the feature-inferring neural-network approaches [40]- [45]. They rely on machine learning like the previous category, but they differ from the rule- and learning-based approaches by automatically inferring features through deep learning. Recent research reports that they thus outperform earlier methods [30], [31], [46]. Unlike the above-mentioned approaches, they do not require seeds, ontologies, or domain-specific lexicons and are thus more domain-independent. They also benefit from the precision of their inferred features. On the other hand, big datasets are needed to build robust models.
Although numerous NER studies have been reported recently, few have been used for semantic lifting or received attention in KG research. There may be at least two reasons. The first reason is that NER is an initial step for many other tasks (such as sentiment analysis, concept extraction, event identification and so forth) that have received more attention than semantic lifting. The second reason is that semantic lifting research has focused on the KG-construction side, leaving the NER task to off-the-shelf APIs and tools. However, using standard systems suffers from configuration restrictions and makes combining or switching between different NER solutions more difficult [47].
This paper will not review NER approaches in greater detail, because they are already well summarized in recent surveys [30], [31]. However, to the best of our knowledge, these state-of-the-art NER methods [40]- [44] have not yet been used in pipelines that lift NL to KGs. Hence, there is a need to exploit their outputs to achieve more precise and complete lifting.
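To make the first (rule- or knowledge-based) category concrete, the following is a minimal gazetteer-lookup sketch. The lexicon entries and the function name rule_based_ner are hypothetical; real systems in this category use far richer rules and lexical resources.

```python
import re

# Hypothetical hand-built lexicon (gazetteer), assumed for illustration.
GAZETTEER = {
    "PER": ["George Bush", "Trump"],
    "LOC": ["Washington", "Iraq"],
    "ORG": ["Apple"],
}


def rule_based_ner(text: str) -> list:
    """Return (start, end, surface form, type) for every lexicon match."""
    mentions = []
    for label, names in GAZETTEER.items():
        for name in names:
            pattern = r"\b" + re.escape(name) + r"\b"
            for match in re.finditer(pattern, text):
                mentions.append(
                    (match.start(), match.end(), match.group(), label)
                )
    return sorted(mentions)


for mention in rule_based_ner("Trump met Apple executives in Washington."):
    print(mention)
```

The sketch needs no annotated training data, but it also shows the domain dependence noted above: ''Washington'' is always tagged as a location because the lexicon says so, regardless of context.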
The rest of this paper will use a running example of the main lifting tasks on three input texts. We assume that the text has already been normalized and pre-processed. In the first stage, the NER task extracts all the entity candidates in the text as shown in Figure 4.

C. NAMED ENTITY DISAMBIGUATION
Based on how they rank candidate entities, NED approaches can be classified into three categories:

1) TRADITIONAL NED APPROACHES
Traditional NED studies typically use hand-designed features to calculate the similarity between the mention and its candidate entities [48], [49]. These studies can be further subdivided into independent (or individual) and collective approaches.
• Independent approaches [13], [50] use semantic similarity techniques to rank candidate entities solely according to their lexical similarity and/or empirical co-occurrence with mentions. In these methods, each mention is independently disambiguated, and they consider entity disambiguation as a ranking problem that picks the entity with the highest confidence. The confidence value is scored by combining hand-designed features extracted from the mention's context (surrounding words) with textual descriptions of the candidate entity, for example using text about it from Wikipedia. Different similarity measures have been proposed, such as BOW-based cosine similarity. Because hand-designed features tend to contain little textual information, the accuracy of such approaches decreases in complex cases [49]. Moreover, these methods fail to catch interactions between mentions in the same document [51].
• Collective approaches [19], [52]- [54] also rely on semantic similarity, but they take into account that what is mentioned in the same (part of a) text tends to be about the same topic and that co-occurring entities should therefore often be semantically related [19], [27].
Most commonly used collective approaches thus establish the associations between candidate entities and build the mention-entity pairs using a probabilistic graph approach. AGDISTIS [52], Babelfy [53] and TagME [55] use graph connectedness to exploit semantic relations between disambiguation candidates described in a knowledge base (KB). In fact, feature representations of entities and mentions are the key factors for most of these NED approaches. Most of them rely on BOW [52], [53], which, in general, has some shortcomings, such as ignoring word meanings and expensive computation [50]. The methods thus do not exploit latent features in mention contexts and candidate descriptions [51]. A well-matched entity group can be identified using either random walks [56], PageRank [54], [57], or dense sub-graph computations. Although the collective approaches perform more robustly than the independent ones, computation costs grow rapidly as the numbers of mentions and the lengths of documents increase. [58], [59] propose a simplified collective pair-linking approach, which resolves the candidate entities pair-by-pair in order to decrease computational cost and complexity. In hand-crafted feature-based approaches, it is difficult for NED systems to fully leverage the inherent semantics of mention contexts and entity descriptions, which is pivotal for NED accuracy. This is due to the limitations of hand-designed features, which capture lexical information based on the surface form of the text [49].
VOLUME 8, 2020
FIGURE 5. Example of a NED task.
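The independent, BOW-based ranking described above can be sketched as follows. This is a minimal illustration under stated assumptions: the candidate identifiers and their one-line descriptions are invented stand-ins for, e.g., Wikipedia text, and the function names are hypothetical.

```python
import math
import re
from collections import Counter


def bow(text: str) -> Counter:
    """Bag-of-words vector: lower-cased token counts, word order ignored."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_candidates_by_bow(context: str, candidates: dict) -> list:
    """Rank candidate entities by similarity of description to context."""
    ctx = bow(context)
    scored = [(cosine(ctx, bow(desc)), ent) for ent, desc in candidates.items()]
    return sorted(scored, reverse=True)


# Hypothetical candidates and descriptions for the mention "Apple".
candidates = {
    "Apple_Inc": "technology company that makes phone and laptop products",
    "Apple_(fruit)": "edible fruit of the apple tree",
}
ranking = rank_candidates_by_bow("Apple released a new phone and laptop", candidates)
print(ranking[0][1])  # Apple_Inc
```

Each mention is scored in isolation here, so the sketch exhibits exactly the weakness of independent approaches: it cannot use the other mentions in the document, and it matches surface tokens only.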

2) NEURAL-NETWORK (NN) APPROACHES
In recent years, neural-network (NN) based methods [51], [58], [60], [61] have become more common and achieved competitive results. One family of approaches maps words into continuous vector spaces using word2vec or similar models [62] that comprise more semantic information than traditional BOW representations. Another family uses deep NNs to learn the latent semantic features automatically. NED accuracy is thereby enhanced along with the model's generalization ability. The majority of the earlier NN-based approaches [63], [64] grant all the words in the mention context equal importance, which is inadequate for many practical cases [49], [58], [65]. More recently, attention mechanisms [58], [65]- [67] have been introduced to assign graded importance. However, as mentioned in [49], most of these methods only apply attention to mention contexts and omit the entity side. Also, they only apply attention to a single aspect of knowledge, which may not be sufficient in complex circumstances, for example with high-noise contexts or less popular entities. Recently, [49] proposed a multi-perspective attention NN model that enriches mention and entity representations from different perspectives, in order to capture more informative features and improve accuracy. Empirical evaluations show that NN-based approaches are effective in NED systems [49], [51], [61], [65].
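The difference between equal and graded importance of context words can be illustrated with a toy sketch. It is not the model of any cited paper: the two-dimensional vectors are invented, and the attention score is assumed to be a plain dot product with the candidate entity vector, whereas real models learn their scoring functions.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def uniform_context(word_vecs):
    """Earlier style: every context word contributes equally."""
    n = len(word_vecs)
    return [sum(v[i] for v in word_vecs) / n for i in range(len(word_vecs[0]))]


def attended_context(word_vecs, entity_vec):
    """Attention style: words similar to the entity get larger weights."""
    weights = softmax([dot(w, entity_vec) for w in word_vecs])
    dim = len(entity_vec)
    ctx = [sum(wt * v[i] for wt, v in zip(weights, word_vecs)) for i in range(dim)]
    return ctx, weights


# Invented embeddings: the second word is unrelated to the entity.
word_vecs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
entity_vec = [2.0, 0.0]

ctx, weights = attended_context(word_vecs, entity_vec)
print("uniform :", uniform_context(word_vecs))
print("attended:", ctx)
print("weights :", weights)
```

In this toy setting the unrelated second word receives the smallest weight, so the attended context vector lies closer to the entity vector than the uniform average does.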
In Figure 5, the running example text contains disambiguation candidates such as ''Washington'', ''Trump'', ''Apple'', ''Hussein'' and ''Bush'', calling for NED to be applied. The task becomes more challenging because mentions must often be classified on several levels. An example is ''Bush'' where the NED system must both recognize whether it refers to a person or organization and, in the first case, recognize whether it refers to Bush Jr. or Sr. Another challenging mention is ''Hussein'', which the NED system has to recognize in a context that mentions ''IRAQ'', ''Bush Sr.'', and ''Missiles''.
The accuracy of existing NED systems is still far from perfect, especially when dealing with short texts that have little context [19], [27]. Pangloss [27] is a system for entity disambiguation and linking of noisy text. It uses a semantic similarity engine that depends on context-dependent document embeddings. It also leverages a local database to store its metadata and statistics, which enables fast disambiguation when processing streams and on low-memory devices such as mobile phones.
Few papers have employed KGs for NED tasks [19], [68]. [19] proposes to exploit existing KGs with associated texts based on unsupervised semantic similarity. The study has compared graph-based and corpus-based approaches to unsupervised similarity-based NED on real-world datasets. The study uses WordNet [69] for graph-based semantic similarity [70] and word embeddings from word2vec [62] for corpus-based semantic similarity. The category2vec model [19] is proposed to learn vector representations of words and their categories jointly in a shared vector space without depending on labeled datasets. The study reports that graph-based semantic similarity approaches are better than corpus-based approaches when the contextual information is limited. Hence, semantic similarity features can be used to complement existing approaches.
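As an illustration of what graph-based semantic similarity can mean, the sketch below computes a simple path-based score, 1 / (1 + shortest path length), over a tiny invented taxonomy. Both the taxonomy and the choice of measure are assumptions for illustration; the exact measure used in the cited study is not specified here.

```python
from collections import deque

# Invented toy taxonomy (parent -> children), standing in for a resource
# such as WordNet.
TAXONOMY = {
    "entity": ["person", "organization", "location"],
    "person": ["politician"],
    "organization": ["company"],
    "location": ["city"],
}


def build_graph(taxonomy):
    """Turn the parent/child table into an undirected adjacency map."""
    graph = {}
    for parent, children in taxonomy.items():
        for child in children:
            graph.setdefault(parent, set()).add(child)
            graph.setdefault(child, set()).add(parent)
    return graph


def shortest_path_length(graph, a, b):
    """Breadth-first search; returns None if no path exists."""
    if a == b:
        return 0
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == b:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def path_similarity(graph, a, b):
    dist = shortest_path_length(graph, a, b)
    return None if dist is None else 1.0 / (1.0 + dist)


graph = build_graph(TAXONOMY)
print(path_similarity(graph, "politician", "person"))   # 0.5
print(path_similarity(graph, "politician", "company"))  # 0.2
```

Unlike corpus-based embeddings, such a score needs no surrounding text, which is consistent with the observation that graph-based similarity can help when contextual information is limited.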
Most previous studies have focused on unstructured text to learn representations of entities and neglect the useful structured data provided by the KB itself. However, [68] proposes a method that leverages the structured data in the KB for NED, using graph embeddings to integrate structured data from the KB with unstructured texts. The results suggest that graph embeddings learned from a graph of hyperlinks between Wikipedia articles can improve NN-based NED systems.

3) JOINT NER AND NED
Many previous studies have dealt with NER and NED in two separate steps. The consequence is that NED may not use all the information provided by NER [71], [72]. Hence, information about entity types and confidence [72] that might be useful for both tasks is not shared. Also, weak NER precision may decrease subsequent NED accuracy. Named Entity Recognition and Disambiguation (NERD) therefore deals with NER and NED jointly [24], [71], [72].
The JERL model [72] utilizes the mutual dependency between NER and NED. If NER's confidence is high for both entity types and boundaries (contexts), it will encourage NED to link the consistent entity with NER's outputs, and vice versa. The results show improvement of both tasks.
TwitterNEED [24] supports NERD of short and informal tweets. The extraction method focuses on high recall. An SVM is then used to filter out false positives using features from the disambiguation phase and the KB. For each extracted mention, a list of entity candidates is obtained from the YAGO KB along with the top-ranked pages from a Google query. An SVM is then used to rank the candidates based on context similarity features and a set of URLs. This increases accuracy without substantial harm to recall. Results are better compared to DBpedia Spotlight [13], Stanford NER, and the AIDA 3 disambiguation system.
[73] presents an empirical study of NER and NED on short texts like tweets.
J-NERD [71] uses a probabilistic graph model to perform joint NER and NED. It captures mention types, spans, and the linking of mentions to their entities in a KB. It then captures dependency parse trees for each sentence and derives both non-standard features about domains and novel features from the parse trees. In contrast to [24], information about uncertainty is retained for later steps.
Earlier joint methods, such as NERD-ML [74] (more below), relied on a dedicated extractor for each specific language. More recently, [75] has proposed a multilingual ensemble that combines multiple extractors for joint NER and NED, where the ensemble idea is to combine the output of several alternative components into a single and presumably better result, for example by averaging or voting. The ensemble produces a list of entities with their types and disambiguation links (called ground truths). The outputs of the NERD extractors are represented as real-valued vectors, which are then used as input to two ensemble neural networks (one for NER and one for NED).
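The general ensemble idea can be sketched with simple majority voting. Note that this is a deliberately simplified stand-in: the approach in [75] feeds real-valued vectors into ensemble neural networks rather than counting votes. The extractor outputs and the ''ex:'' identifiers below are hypothetical.

```python
from collections import Counter


def majority_vote(outputs, min_votes=2):
    """Combine per-extractor annotations of the form mention -> (type, link).

    A mention is kept only if at least `min_votes` extractors agree on
    exactly the same (type, link) pair.
    """
    merged = {}
    mentions = sorted(set().union(*outputs))  # all mentions seen by anyone
    for mention in mentions:
        votes = Counter(out[mention] for out in outputs if mention in out)
        annotation, count = votes.most_common(1)[0]
        if count >= min_votes:
            merged[mention] = annotation
    return merged


# Hypothetical outputs of three extractors for the same text.
extractor_a = {"Bush": ("PER", "ex:Bush_Jr"), "Iraq": ("LOC", "ex:Iraq")}
extractor_b = {
    "Bush": ("PER", "ex:Bush_Jr"),
    "Iraq": ("LOC", "ex:Iraq"),
    "Missiles": ("ORG", "ex:Missiles"),
}
extractor_c = {"Bush": ("ORG", "ex:Bush_band"), "Iraq": ("LOC", "ex:Iraq")}

print(majority_vote([extractor_a, extractor_b, extractor_c]))
```

Here ''Bush'' is resolved by two agreeing extractors, while ''Missiles'' is dropped because only one extractor proposed it.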

D. NAMED ENTITY LINKING
NEL annotates each mention in a text with the IRI of its corresponding entity as described in a KB in the LOD cloud. Although some NER approaches also annotate entity mentions, they are restricted to the type level, using a predefined set of a handful or a few dozen types such as persons, locations, organizations, and their subtypes. The number of entities available in a KG, on the other hand, can be in the millions. For example, DBpedia describes more than 14 million entities, and Wikidata more than 59 million. Figure 6 illustrates the main lifting tasks on three input texts. First, NER extracts all the mentions in the text. Then NED ranks the disambiguation candidates. Finally, NEL maps each mention to an entity in the LOD cloud.
NEL's mention-entity mapping can be considered a ranking problem that reduces the number of candidates by assigning a weight to each possible entity. The NED and NEL tasks are closely related: the former finds which entity a mention like ''Bush'' refers to, and the latter provides the LOD IRI for that entity. Differences are that NED does not have to deal with unambiguous mentions that refer to a unique entity and that NEL also has to deal with NIL entities (also called dark or emerging entities) that have no entry in the KB. Nevertheless, many studies make no distinction between NED and NEL, and recent contributions have proposed joint-learning and end-to-end methods that perform NER, NED, and NEL together [19], [48].
Regardless of type, NEL has three main sub-tasks: candidate-entity generation, candidate-entity ranking, and NIL clustering, where the middle step, candidate-entity ranking, resembles disambiguation. First, candidate-entity generation aims to retrieve all possible entities in the KB that an entity mention may refer to. Then, candidate-entity ranking aims to rank the candidate entities and return the most likely one for each targeted mention. Finally, NIL clustering deals with those mentions that cannot be matched with an entity in the KB [48]. We can group the majority of proposed approaches according to the NEL sub-tasks they cover, as shown in Figure 7. The figure also summarizes the main techniques proposed to tackle each step. As can be seen, NIL clustering has so far received less attention than the other sub-tasks.
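The three sub-tasks can be sketched as a small pipeline. Everything in it is an illustrative assumption: the toy KB, the ''ex:'' identifiers, ranking by a prior popularity score alone (real systems also use context), and NIL clustering by lower-cased surface form.

```python
# Hypothetical toy KB: identifier -> (known surface forms, prior popularity).
KB = {
    "ex:George_W_Bush": ({"bush", "george w. bush"}, 0.6),
    "ex:George_H_W_Bush": ({"bush", "george h. w. bush"}, 0.3),
    "ex:Iraq": ({"iraq"}, 0.9),
}


def generate_candidates(mention):
    """Sub-task 1: retrieve every KB entity the mention may refer to."""
    key = mention.lower()
    return [iri for iri, (names, _prior) in KB.items() if key in names]


def rank_by_prior(candidates):
    """Sub-task 2: rank candidates (here by prior popularity only)."""
    return sorted(candidates, key=lambda iri: KB[iri][1], reverse=True)


def link(mentions):
    """Run all three sub-tasks; returns (links, NIL clusters)."""
    links, nil_clusters = {}, {}
    for mention in mentions:
        ranked = rank_by_prior(generate_candidates(mention))
        if ranked:
            links[mention] = ranked[0]
        else:
            # Sub-task 3: group unlinkable mentions by normalised surface form.
            nil_clusters.setdefault(mention.lower(), []).append(mention)
    return links, nil_clusters


links, nil_clusters = link(["Bush", "Iraq", "Zorblax", "zorblax"])
print(links)
print(nil_clusters)
```

In the example, the invented mention ''Zorblax'' has no KB entry, so both of its spellings end up in the same NIL cluster instead of being forced onto an existing entity.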
Although Figure 7 shows that many NEL approaches [84] have been proposed, few of them have dealt with all three sub-tasks. NEL approaches can therefore also be classified into: disambiguation-only methods and end-to-end methods. The disambiguation-only approaches focus on the second NEL step. They assume that all mentions and candidates have already been generated and link mentions to their corresponding entities in the KB. End-to-end and joint approaches deal directly with an input text and aim to extract all candidate mentions and link them to their corresponding entities in the KB. Most early contributions were disambiguation-only approaches, with end-to-end approaches being proposed more recently along with the emergence of deep neural networks. Based on the literature we have reviewed [84], Figure 8 proposes a general model of NEL with an overview of the central sub-tasks and techniques.

3) DISAMBIGUATION-ONLY NEL METHODS
Many empirical evaluations have been performed on state-of-the-art (SOTA) NEL methods using a variety of datasets. Figure 10 and Table 1 summarize the evaluation results for the most important methods on the most commonly used dataset: AIDA-CoNLL. [93], followed by [94]- [96], achieved the best results for disambiguation only, whereas [97] achieved the best end-to-end results followed by [98]. Table 2 shows that there is no perfect NEL model for all datasets. The best model for one dataset may perform poorly on others. An example is the SGTB-BiBSG model [99], which performed well on the WNED-CWEB dataset but not on the others. Only a small number of models performed best on more than one dataset. One example is WSRM [100], which was the best performer on two datasets (Reuters128 and RSS-500). Another is [94], which outperforms the others on the AQUAINT and ACE2004 datasets, but not on any others. Models that perform well across many different types of datasets are called for.
Early NEL methods as evaluated by [76].
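Comparisons like these are typically reported as precision, recall, and F-measure. The following is a minimal sketch of a micro-averaged, exact-match variant, assuming that each annotation is a (document, start, end, identifier) tuple. Matching conventions differ between benchmarks and tools, so this is one possible scoring scheme rather than the one used in the cited evaluations. The annotations are invented.

```python
def micro_prf(gold, predicted):
    """Micro precision, recall, and F1 over sets of exact-match annotations."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Invented annotations: (doc id, start offset, end offset, entity id).
gold = {
    ("d1", 0, 4, "ex:Bush_Sr"),
    ("d1", 15, 19, "ex:Iraq"),
    ("d2", 0, 5, "ex:Apple_Inc"),
    ("d2", 20, 30, "ex:Washington_DC"),
}
predicted = {
    ("d1", 0, 4, "ex:Bush_Sr"),
    ("d1", 15, 19, "ex:Iraq"),
    ("d2", 0, 5, "ex:Apple_fruit"),  # right span, wrong entity
}

precision, recall, f1 = micro_prf(gold, predicted)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")  # P=0.667 R=0.500 F1=0.571
```

Under exact matching, a correctly detected span that is linked to the wrong entity counts as both a false positive and a false negative, which is why the third prediction lowers precision and recall at the same time.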
The majority of studies neglect the relations between entity types and entity context. Joint mention and entity embeddings have therefore been proposed to take them into account [19]. An overview of early approaches to joint NER and NEL of long and short texts is provided in [101], [102]. Recognizing that NER and NEL were the focus of different research communities (i.e., of the NLP and the semantic-web communities, respectively), [101] proposes the NERD-ML approach to combine them. NERD-ML combines the strength of crowd-entity extractors with web-entity recognizers and machine learning.
Some previous contributions used frame semantics along with semantic role labelling, such as PIKES [103] and FRED [104]. The PIKES system [103] for extracting knowledge from NL text has two phases. First, several NLP techniques are combined to integrate entity mentions into a single RDF graph. Then, the resulting mentions graph is processed with SPARQL-like mapping rules to produce a KG organized around semantic frames.
FRED [104] is a frame-based machine reader that produces OWL/RDF representations of texts. It integrates the outputs of existing NLP tools and LOD resources such as Boxer [105], BabelNet 4 , and DBpedia to expand extracted tacit knowledge. FRED uses TagME 5 , which uses deep sentence parsing for NER and NEL, combined with Wikipedia context for NED. Although FRED [106] is reported to be flexible and usable without specific tuning, the reported precision is not high. Reference [107] proposes a hybrid approach for NER and NEL that outperforms FRED [106] by 20% in F-measure. The approach combines linguistic and semantic features and uses them to recognize and index IRIs from DBpedia. The aim is to increase recall in the recognition task and then prune candidates later.
Entity Extraction and Linking (EEL) [20] is an OpenIE-based approach that employs thematic roles to link relation phrases with known properties used on the semantic web by integrating an ensemble of alternative NER and NEL systems. EEL handles both cases where entities are duplicated because they have the same text fragment and IRI and cases where two or more entities are overlapping because they share the same text fragment [20]. The authors demonstrate that using multiple entity-linking systems improves extraction accuracy. Moreover, the named entities associated with noun phrases maintain the coherency of RDF data.

4) END-TO-END NEL METHODS
The majority of previous studies assumed that mentions and entities were already available and focused only on the disambiguation process, neglecting mutual dependencies between mentions and their entities. But a practical solution must cover all extraction phases. To overcome this, end-to-end entity linking has received increasing attention recently, inputting raw text and aiming to extract all its mentions and link them to their entities in a KB. Although few end-to-end studies have been published so far [53], [66], [71], [108], interesting recent examples include NN-based end-to-end linking models such as [97], [98], [109], [110].
Reference [98] proposed the first NN-based end-to-end linking system to do joint mention detection and entity disambiguation, in order to capture the dependency between both tasks and reduce propagation of errors from NER to NEL. They consider all word spans that might mention an entity and use word and entity embeddings to compute a context-aware compatibility score for ranking the candidates. The authors demonstrate that engineered features are almost unnecessary when using end-to-end approaches. Their model reaches SOTA results on the AIDA/CoNLL dataset and, when combined with Stanford NER, it generalizes well to other datasets with different characteristics.
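The idea of considering all word spans and scoring their candidates against the context can be sketched as follows. This is a toy illustration, not the model of [98]: the candidate table, the two-dimensional vectors, the fixed threshold, and the dot-product score are all assumptions, whereas the actual system learns its embeddings and scoring jointly.

```python
def enumerate_spans(tokens, max_len=3):
    """Yield (start, end, surface) for every span of up to max_len tokens."""
    for start in range(len(tokens)):
        for end in range(start + 1, min(start + max_len, len(tokens)) + 1):
            yield start, end, " ".join(tokens[start:end])


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Hypothetical candidate table: surface form -> [(entity id, entity vector)].
CANDIDATES = {
    "george bush": [
        ("ex:George_W_Bush", [1.0, 0.0]),
        ("ex:George_H_W_Bush", [0.2, 0.8]),
    ],
    "bush": [("ex:Bush_plant", [0.0, 0.1])],
    "iraq": [("ex:Iraq", [0.9, 0.3])],
}


def link_spans(tokens, context_vec, threshold=0.5):
    """Keep, per span, the best candidate whose score reaches the threshold."""
    results = []
    for start, end, surface in enumerate_spans(tokens):
        best = None
        for entity_id, entity_vec in CANDIDATES.get(surface.lower(), []):
            score = dot(context_vec, entity_vec)
            if score >= threshold and (best is None or score > best[0]):
                best = (score, entity_id)
        if best is not None:
            results.append((start, end, best[1]))
    return results


tokens = ["George", "Bush", "visited", "Iraq"]
print(link_spans(tokens, context_vec=[1.0, 0.0]))
```

Because mention detection and disambiguation share one score, the single-token span ''Bush'' is rejected here in favour of the longer span ''George Bush''. The sketch does not resolve overlapping spans that both pass the threshold, which a complete system would have to handle.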
The end-to-end, trainable Neural Collective Entity Linking (NCEL) model [109] addresses the data sparsity of local contextual features when resolving entities independently. It applies Graph Convolutional Networks on subgraphs of the entity graph. NCEL thus learns features from both local and global information. It is trained on Wikipedia hyperlinks using an attention mechanism to deal with noisy data.
Ment-norm [97] is an end-to-end system for NEL that considers relations as latent. Representation learning was used to learn relation embeddings, eliminating the need for extensive feature engineering.
[110] proposes a Stack-LSTM network model for joint NER and NEL. Sharing information between the NER and NEL tasks, NER suggests the entity type (e.g., ORG, PER, LOC) which is then used to better disambiguate the entity during NEL. The results of using the joint model are compared to using NER and NEL separately, showing that the joint model outperforms the individual ones.
Tables 3 and 11 summarize the most relevant end-to-end NEL approaches and results for the most commonly used dataset (AIDA-CoNLL). The best end-to-end results were produced by the approach presented in [97], followed by [98]. Many other evaluations have been done using other datasets, as shown in Table 4. As for the disambiguation-only approaches, there is no perfect method for all datasets, and approaches that produce good results on a particular dataset do not perform well on others. Examples are the NCEL model [109] that outperforms others on the WNED-WIKI dataset, and Deep-ed [66] that reports good results for AQUAINT. Moreover, Ment-norm [97] is the best end-to-end model for the MSNBC and ACE2004 datasets, as shown in Table 4. Possible reasons include the nature of the datasets and the variation of the type and amount of data used to train each model. Developing models that perform well across different types of datasets remains a challenging task.
Recent NN-based approaches for NEL using joint embedding and end-to-end learning have performed best so far. For real-world applications with more open domains and dynamic environments, end-to-end approaches look promising, although there are issues that need to be considered. One issue is that their accuracy drops dramatically when trained on or applied to small datasets [98], so that extensive datasets are required for robust models. Other issues are the training of contextual embeddings for entities and extending the approaches to become cross-lingual [110]. Furthermore, the end-to-end approaches have not yet taken the NIL problem sufficiently into account.
Several popular LOD KBs have been used as targets of NEL systems. The most popular ones are DBpedia, Freebase, Wikidata, and Wikipedia. DBpedia is a semantic (RDF) extract of Wikipedia and the two contain almost the same entities [76], [84], [111]. Wikidata is the Wikimedia Foundation's knowledge graph, intended as a central, crowd-sourced knowledge base for feeding Wikipedia's fact boxes and other Wikimedia projects. Freebase is an older and now defunct large, crowd-sourced knowledge graph.

IV. DISCUSSION
Many approaches have been proposed for all or some of the tasks involved in lifting NL texts to KGs. In the early stages, NLP has played a prominent role, providing techniques for dealing with NL text, including various pre-processing tasks. NLP techniques have also been used for NER, which recognizes and classifies mentions in the text. In the later stages, AI, ML, KG and LOD techniques have become important. They are used by NED and NEL methods to disambiguate the mentions and link them to knowledge-base resources that represent entities. Recent deep-NN techniques have covered both the early and late stages.
VOLUME 8, 2020

A. LANGUAGES
We have only found a few studies that investigate more than one language: NewsReader [112], which considers English, Dutch, Spanish and Italian; [118], which considers English and Dutch; and [119], which considers morphologically rich languages such as Turkish and Russian. Hence, dealing with multilingual texts in general has so far received little attention [75], perhaps in part due to the limited availability of training datasets for languages other than English. Recently proposed multi- and cross-lingual approaches such as [112], [119] can train a model on a richly-resourced language and then apply it to a more sparsely-resourced one. Other approaches to multi- and cross-lingual NEL [93], [120]-[122] need to be re-trained for every new language and, compared to the single-language approaches, accuracy is still weak [121]-[123]. One reason may be that environment and structure change from one language to another. Another possible reason is that some of these studies, such as [123], use automatic translation from the source to the target language in order to benefit from its available resources, making their accuracy dependent on translation quality. Most of the proposed studies focus on either NER, NED, or NEL [121]. A cross-lingual approach that covers all parts of named-entity extraction is called for [124].

B. DOMAINS
The majority of previous studies have focused on general news due to the abundance of datasets [23], [112], [125], and a few of them focus on news for specific domains such as IT [20] and sports [126]. Besides news, there is research dedicated to domains such as software and source code [127]-[129], cyber security [130], and reviews [23]. Despite the abundance of training data available (albeit mainly for English), the datasets are restricted to certain domains. Hence, establishing a stable NL lifting system for a new domain is effortful and expensive. Only a few papers [124], [131]-[133] have investigated transfer learning and domain adaptation, in which the model makes use of labelled training datasets available in a richly-resourced domain (such as news) and transfers the learning to recognize entities in another, more poorly-resourced domain (such as biomedicine). The few systems proposed so far suffer from low accuracy.

C. TEXT TYPES
In general, most of the proposed methods are optimized for either short or long texts. Few studies have considered both. The limited set of papers that discuss short texts like tweets [19], [23], [24], [134], microposts [125], [135], SMS texts [22], and chats [27] have reported that dealing with shorter texts is more challenging [22].
Only a very limited number of published studies have dealt with real-time data and even fewer have been run in a real-time environment [21].

D. EVALUATION
The lack of standard evaluation methods makes it difficult to improve the field through systematic comparison of the strengths and weaknesses of the different approaches. Although comparing approaches across tasks, languages and domains will always be difficult, repeatable and reliable methods are required to evaluate and compare methods that have similar aims and scope.
Concerning evaluation tasks, four aspects need to be considered: the ability of the system to recognize entity mentions; the ability of the system to correctly assign types to mentions; the ability of the system to identify the intended entity referred to by a mention; and the ability of the system to link mentions to entities in a KB.
Concerning experimental setup, only a few studies have used the GERBIL [125], [136] framework for benchmarking NER and NEL systems. GERBIL outputs comparable evaluation results that can help researchers and developers discover the strengths and weaknesses of their approaches compared to the state of the art. It tackles the main issue of NEL evaluation and clarifies how two IRIs could be compared to each other and evaluated without being limited to a particular KB. However, GERBIL does not offer any additional explanations (such as error analysis) for each mention [120].
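One way to compare two IRIs without tying the evaluation to a single KB is to expand each IRI into the set of IRIs known to denote the same entity and then test whether the two sets overlap. The sketch below illustrates that idea only; it is not GERBIL's implementation. The equivalence links are a small hand-written sample, and the function names are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple

# Toy equivalence links (e.g. as could be obtained from owl:sameAs statements).
LINKS = [
    ("http://dbpedia.org/resource/Berlin", "http://www.wikidata.org/entity/Q64"),
    ("http://www.wikidata.org/entity/Q64", "https://en.wikipedia.org/wiki/Berlin"),
    ("http://dbpedia.org/resource/Paris", "http://www.wikidata.org/entity/Q90"),
]


def build_index(links: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Store every link in both directions, since equivalence is symmetric."""
    index: Dict[str, Set[str]] = {}
    for a, b in links:
        index.setdefault(a, set()).add(b)
        index.setdefault(b, set()).add(a)
    return index


def equivalent_iris(iri: str, index: Dict[str, Set[str]]) -> Set[str]:
    """Collect all IRIs reachable from `iri` through equivalence links."""
    seen = {iri}
    queue = deque([iri])
    while queue:
        current = queue.popleft()
        for neighbour in index.get(current, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


def same_entity(a: str, b: str, index: Dict[str, Set[str]]) -> bool:
    """Two IRIs match if their equivalence sets share at least one IRI."""
    return not equivalent_iris(a, index).isdisjoint(equivalent_iris(b, index))
```

Under this scheme a system that links to DBpedia and a gold standard that links to Wikipedia can still be scored against each other, provided the equivalence links between the two KBs are available.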
Concerning gold standards, the majority of NER and NEL models are evaluated using manually-created gold standards, which has some drawbacks according to [137]. First, the current gold standards do not share a common set of rules concerning what should be recognized and linked as an entity. Second, most gold standards frequently include mistakes because they have not been verified by other scholars. Third, many gold standards become outdated when the reference KBs that entities have been linked to evolve over time but the gold standards are not updated to reflect the most recent version of the reference KB [137]. Reference [84] points out that, to evaluate NEL on non-popular entities or on specific domains, benchmark datasets that link to domain-specific resources and long-tail entities are needed.
Concerning validation, although tenfold cross-validation is most commonly used in NLP applications, some studies have used different cross-validation settings [24]. This makes comparison more difficult and potentially unfair.
Concerning metrics, most of the selected studies have used the standard evaluation metrics for NLP tasks: precision, recall, F-measure, and accuracy.
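The standard metrics can be computed separately for the evaluation aspects listed above by projecting each annotation onto the fields that matter for that aspect. The following is a minimal sketch under simplifying assumptions: annotations are exact-match tuples of (start, end, type, entity IRI), disambiguation and linking are collapsed into a single "linking" view, and the names `prf`, `VIEWS`, and `evaluate` are hypothetical.

```python
from typing import Callable, Dict, Iterable, Set, Tuple

Annotation = Tuple[int, int, str, str]  # (start, end, type, entity IRI)


def prf(gold: Set, predicted: Set) -> Tuple[float, float, float]:
    """Precision, recall and F1 over two sets of exactly matching items."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Each view keeps only the fields that the corresponding aspect evaluates.
VIEWS: Dict[str, Callable[[Annotation], tuple]] = {
    "recognition": lambda a: (a[0], a[1]),        # mention boundaries only
    "typing": lambda a: (a[0], a[1], a[2]),       # boundaries plus type
    "linking": lambda a: (a[0], a[1], a[3]),      # boundaries plus entity
}


def evaluate(gold: Iterable[Annotation],
             predicted: Iterable[Annotation]) -> Dict[str, Tuple[float, float, float]]:
    gold, predicted = list(gold), list(predicted)
    return {
        name: prf({view(a) for a in gold}, {view(a) for a in predicted})
        for name, view in VIEWS.items()
    }
```

Reporting the views side by side makes it visible when a system finds the right mentions but assigns the wrong type or entity, which a single aggregate score would hide.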

E. NLP TOOLS AND APIs
Many NLP tools and APIs have been used in NL-to-KG pipelines. For pre-processing, Stanford CoreNLP [12] is the most commonly used [20]. Other tools include spaCy, TextBlob, NLTK, OpenNLP [138], the Pangloss NLP pipeline [27], TEXTPRO for the Italian language [25], GATE ANNIE [21], and NLP-Ce [28]. Most of them are restricted to a single or a limited number of languages.
In view of the research published on NER, NED, and NEL, the available tools have not kept pace with the most recent developments. They also mostly focus on a single task, such as NER or NED, with only a few covering more than one. Moreover, most of the tools are built for only one or a limited number of languages, and they are hard to adapt to new settings, among other things due to: dependence on commercial tools; choice of programming language; code that is hard to understand and extend; and tight coupling with a particular KB. Hence, it can be challenging to adapt existing systems to build domain-specific applications [21]. According to [120], each tool constructs its KG differently. Consequently, it is challenging to utilize those tools for linking from different KBs. Table 6 lists the most commonly used tools along with the tasks they support.
Recently, the authors of [145] evaluated three of the most commonly used tools (Ambiverse, previously known as AIDA; Babelfy; and TagMe) and reported that Babelfy and Ambiverse achieved the best results, with slightly lower recall for Ambiverse. Nevertheless, their precision was at most 81%, while their recall did not exceed 68%. Although the study did not include a popular system like DBpedia Spotlight, the results suggest that these systems still perform far from the state of the art. Figures 12 and 13 show the results of evaluating 15 common tools over a wide selection of available datasets using GERBIL. Figure 12 shows the results with the strong annotation match setting, while Figure 13 depicts the results with weak annotation match.
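The difference between the two match settings can be sketched as follows, assuming that a strong match requires identical mention boundaries and a weak match only requires the boundaries to overlap, with the same entity in both cases. The `Annotation` type, the one-to-one greedy pairing, and the function names are illustrative assumptions, not GERBIL code.

```python
from typing import Callable, List, NamedTuple


class Annotation(NamedTuple):
    start: int   # character offset, inclusive
    end: int     # character offset, exclusive
    entity: str  # entity identifier


def strong_match(gold: Annotation, pred: Annotation) -> bool:
    """Exact boundaries and same entity."""
    return (gold.start, gold.end, gold.entity) == (pred.start, pred.end, pred.entity)


def weak_match(gold: Annotation, pred: Annotation) -> bool:
    """Overlapping boundaries and same entity."""
    overlaps = gold.start < pred.end and pred.start < gold.end
    return overlaps and gold.entity == pred.entity


def true_positives(gold: List[Annotation], predicted: List[Annotation],
                   match: Callable[[Annotation, Annotation], bool]) -> int:
    """Greedily pair each prediction with at most one unused gold annotation."""
    unused = list(gold)
    count = 0
    for pred in predicted:
        for candidate in unused:
            if match(candidate, pred):
                unused.remove(candidate)
                count += 1
                break
    return count
```

A prediction that covers only "York" inside a gold mention "New York" counts as correct under the weak setting but not under the strong one, so weak-match scores are never lower than strong-match scores for the same output.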

F. MAIN FINDINGS AND OPEN DIRECTIONS
NER results impact later tasks such as NED and NEL [146], [147], which in turn are highly affected by the type and quality of the documents used for training [146]. KG researchers have been focusing more on the KG-construction side and have paid less attention to the NER task, which is usually done by off-the-shelf systems or established tools. They integrate such NER tools mostly internally using the available APIs in their code. With such integration, re-configuring those external tools or switching between them becomes difficult [47]. Despite recent deep NN-based approaches improving the state-of-the-art NER results, they have not yet been leveraged to lift NL to KG as mentioned earlier.
Most NER systems are designed for a limited set of entity types (usually people, organizations and locations) and cannot be easily reconfigured to support other types of entities [20], [34]. Transfer-learning approaches that tackle this shortcoming are a promising direction for future research [148], [149].
Approaches to NED can be classified into individual, collective, and recent NN-based approaches. Of these, the NN-based approaches appear to be most effective [51], [58].
Our paper has defined NEL as a wider task that includes NED as one of its sub-tasks, but this is not a generally accepted distinction, and many researchers are using the two initialisms synonymously. Many NEL approaches use off-the-shelf systems for NER. However, choosing the best NER model to use for NEL is still a challenging task due to the difficulty of comparing the dataset used to train the system and the dataset that needs to be processed [47]. Unless end-to-end methods are used, obtaining good NEL results requires improving and making use of the most recent NER approaches [120].
Reference [120] discusses several NEL issues. One is that gold standards can contain wrong annotations (such as U.K instead of UK) and missing annotation links. Another concern is that KBs can contain bad redirects. Moreover, most NEL tools confuse regions and cities that have the same names (e.g., New York State or New York City, and Valencia the region, province, or city).
Recent studies indicate that the accurate tuning of several steps, especially tokenization and semantic similarity, is necessary for robust NED and NEL systems [27]. Deep NN-based NLP methods [40]-[44] currently outperform the earlier approaches [28], [147]. The authors of [150] argue that all recent NEL models neglect three essential points. The first one is that mentions and entities are correlated, particularly in semantic features. The second one is that entities and relations are equally important and tightly connected basic units in the KB. The third one is that the attention mechanisms in recent neural-network approaches, which can potentially filter out irrelevant information from contexts, have been neglected in entity-embedding construction [150].
NEL difficulty varies not only between datasets but also between mentions, where datasets and mentions that are difficult to link often share characteristics like short documents, many entities per document, many salient entity types, and many entities in total [145].
Recent contributions that lift NL into KGs demonstrate that the integration of entity-extraction and -linking systems improves extraction accuracy [20]. Moreover, associating named entities with more descriptive noun phrases preserves meaning when combining entities from multiple sources [20]. However, although the disambiguation-only NEL methods tend to beat end-to-end approaches today, the latter may be a more promising direction with potential for further improvement. End-to-end NEL is particularly promising for open domains where there are no gold standards available.
Reference [151] calls for a genuine semantic-web reference evaluation framework to assist the research community. Lifting approaches are usually assessed using tasks that do not focus on specific semantic web and KG aims. Of course, general tasks such as NER, NED, NEL, and relation extraction are important, but they are usually designed without evaluating the output as knowledge graphs, Linked Data or OWL ontologies [151].

V. CONCLUSION
The paper has presented an overview of recent advances in a central task for lifting NL texts to KGs: that of named-entity extraction. We see this as an important step towards making the abundance of NL information on the internet available as computer-processable KGs. We conclude with a few general observations: Many approaches for lifting NL to KG are based on previous-generation NER methods, and new lifting approaches are needed that add disambiguation and linking to best-of-breed NER techniques. There is also a lack of standards for comparing extraction approaches. This can partly be attributed to a lack of commonly accepted evaluation methods, but it is also a consequence of the recognition-disambiguation-linking pipeline. For example, it is hard to fairly compare pure NER with combined NER-NED-NEL approaches when the latter are restricted to identifying named entities in the KB that is used for disambiguation and linking.
NEL has moved from being a pipeline of clear and isolated steps into an integrated process in two main ways. The first is that traditional sequential steps are now being integrated by joint learning and end-to-end processes. The second is that mentions and entities that were previously analysed in isolation are now being lifted in each other's context. The current culmination of these trends is the set of deep-learning approaches that have reported promising results in recent years.
In our future work we plan to develop complementary overviews that also cover lifting of general concepts and of relations between entities. Many recent approaches also lift relations jointly with entities (both named entities and concepts), emphasising the need for a unified lifting framework that is not restricted to named entities. A comparative evaluation of state-of-the-art methods in the same environment using the same settings and datasets is another research direction.
TAREQ AL-MOSLMI received the B.Sc. degree (Hons.) in computer science from the University of Science and Technology, Yemen, in 2009, the master's degree in computer science and the Ph.D. degree from the Universiti Kebangsaan Malaysia (UKM), Malaysia, in 2014 and 2018, respectively. He is currently a Postdoctoral Fellow with the University of Bergen, Norway. His research interests fall under natural-language processing, machine learning, semantic and knowledge graphs, text and web mining, and sentiment analysis.
MARC GALLOFRÉ OCAÑA received the B.Sc. degree in informatics engineering and the M.Sc. degree in innovation and research in informatics from the Polytechnic University of Catalonia-BarcelonaTech. He is currently pursuing the Ph.D. degree with the University of Bergen. His current research includes knowledge graphs and big-data software architectures to support journalistic knowledge platforms. His past research falls under information systems strategic planning and informational analysis of university rankings. He previously worked as a business intelligence consultant in different sectors like sports, finance, and emergency services.
ANDREAS L. OPDAHL received the Ph.D. degree from the Norwegian University of Science and Technology, in 1992. He is currently a Professor of information systems development with the University of Bergen, Norway. His research interests include ontologies and knowledge graphs, enterprise, IS modeling, and safety and security requirements. He is the author, a coauthor or the Co-Editor of more than a hundred peer-reviewed research articles that have been cited more than 3700 times. He is also a member of IFIP WG5.8 on enterprise interoperability and WG8.1 on design and evaluation of information systems. He serves regularly as a Reviewer for premier international journals and on the program committees and as an Organizer of the most renowned international conferences and workshops in his fields of interest.
CSABA VERES received the Ph.D. degree in cognitive science from the University of Arizona, but migrated to computer science, from 1999 to 2000. Since then, he has worked as a Computer Scientist. He is currently an Associate Professor with the University of Bergen, Norway. His main interest has always been semantics, and he carried this into computer science. His areas of expertise include NLP, machine learning, and semantic web. He has experience as an Academic and Practitioner. He has published original research articles and consulted with technology startups. He has popular linked data apps on the Apple store, called MapXplore and AuotoMind.