Analyzing the Scientific Evolution of Face Recognition Research and Its Prominent Subfields

This paper presents a science mapping approach to analyze the thematic evolution of face recognition research. To this end, several bibliometric tools (performance analysis, science mapping and co-word analysis) are combined in order to identify the most important, most productive and highest-impact subfields. Moreover, different visualization tools are used to give a graphical view of the face recognition field and to determine its thematic domains and their evolutionary behavior. Finally, this study proposes the most relevant lines of research for the face recognition field. Findings indicate a huge increase in face recognition research since 2014. Mixed approaches have attracted greater interest than local and global approaches. In terms of algorithms, the use of deep learning methods is the new trend. On the other hand, the impact of illumination variation on the performance of face recognition algorithms is currently the most important and influential challenge in the face recognition field.


I. INTRODUCTION
Face recognition (FR) and its applications have become part of our daily lives. When using a biometric passport to cross a border, when using social networks (especially those based on the use of photos), when shopping in certain stores in China, automatic FR is used.
FR is one of the most active research themes in the computer vision field, as shown by the numerous scientific articles published each year in this domain (for example, in the Web of Science database, 316 documents were published in 2010 versus 862 documents in 2019). This interest is due to the high number of applications using this technology and the wide availability of cameras and photos containing faces. FR can be applied in several areas: 1) Access control: the human face can be considered as a biometric signature; hence FR can be used to validate a person's identity. This technology has an advantage over other access control techniques, since it does not require any physical contact with the device, unlike, for example, fingerprint-based access control [1]-[3].
The associate editor coordinating the review of this manuscript and approving it for publication was Zahid Akhtar .
2) Criminal investigations: FR can be used to find and validate a suspect's identity at the crime scene, using images from surveillance cameras or sketches based on witness descriptions [4]-[6]. 3) Wanted persons identification support: real-time FR using surveillance cameras increases the security level in public spaces. It can be used to detect wanted persons and has proved to be a very practical tool for law enforcement to neutralize suspects [7]-[9]. This increase in scientific research activity has led to an overall improvement in algorithm performance in recent years. Indeed, recognition accuracy exceeds 95% in certain cases [10]-[13]. However, the number of published papers in this research area keeps increasing each year because of unresolved challenges.
Various FR algorithms have been proposed to achieve high accuracy rates and to solve known problems in this research area. Most of these algorithms are composed of a series of sub-algorithms, which offers researchers several possibilities to improve them. Due to this diversification of choices and challenges, the FR domain is fragmented, which makes it difficult to find a single point of entry to this research topic.
Furthermore, it is difficult to have a global vision of this field. Recommendations and analyses for future researchers are not complete. For this reason, exhaustive reviews are needed to integrate contributions and provide a critical perspective in this area. In fact, several works [14]-[16] have been published in this direction to analyze this research field by addressing the major challenges, classifying the approaches, providing comparisons between these methods, and giving some recommendations for future work in the field of FR. However, no quantitative analysis was used in any of these studies. For example, these studies do not make it possible to partition a research field exhaustively, nor to find links between the challenges, methods and techniques used. Also, they do not make it possible to evaluate the efforts made by researchers on each of the axes of this field, nor the impact of these studies on future works. Accordingly, science mapping, co-word analysis and performance analysis (quantitative analysis) are necessary to examine the sets of terms shared by the documents, to map the literature from the interaction of key terms, and to show the evolution of the FR field. In this regard, the science mapping approach presents the structural and dynamic aspects of scientific research [19] and is a spatial representation of how disciplines, fields, specialties and individual papers or authors are related to one another. These methods focus on domain monitoring and research area delimitation to establish the cognitive structure of the research and its evolution. This is done by determining the continuity of these fields over consecutive subperiods and by analyzing the evolution of their performance [20]. A longitudinal study based on co-words [21] makes it possible to analyze the evolution of research subjects, and a longitudinal study based on co-citations makes it possible to analyze the continuity of the intellectual base.
Additionally, this approach detects the most important, productive and impactful sub-areas.
Consequently, this article presents a general approach to analyze quantitatively the FR research field, by combining performance analysis and science mapping to detect and visualize conceptual subdomains [19], [20], in order to examine the concept's evolution and the impact of the research themes in the FR domain.
The rest of the paper is organized as follows: Section 2 provides a general overview of the latest reviews covering the FR theme. Section 3 presents the methodology and data collection. The analysis and the results are then presented and discussed in Section 4. Finally, a conclusion is given in Section 5.

II. FACE RECOGNITION: A LITERATURE REVIEW
Several studies have been performed to compare FR approaches and methods to solve major challenges or improve the performances of these systems. These studies can be divided into several categories according to the authors' concern.
A. GENERAL CONTEXT
Some works have focused on FR from a general standpoint. [14] Zahid et al. provide a state-of-the-art analysis of FR algorithms, focusing on their performance on public databases. The work highlights the impact of image database conditions on the recognition rate of each approach. Finally, it helps researchers choose more easily the algorithm for a specific FR application. [22] Chihaoui et al. divided the 2D FR methods into three categories: global approaches, local approaches, and hybrid approaches. The authors present an overview of some well-known methods in each of these categories. A comparison between FR techniques is provided. In addition, the databases used in FR are listed, and some results of the application of these methods on FR databases are presented. [23] Hassabalah and Aly reviewed current FR achievements and examined many challenges and key factors that can significantly affect the performance of FR systems. The use of FR technology in other scientific applications and daily life was proposed. Several research directions to improve the performance of advanced FR systems are also recommended for future work.

B. SPECIFIC FR CHALLENGE
Other works have focused on a specific FR challenge to give it more attention. [24] Ochoa-Villegas et al. deal with the challenge of FR under uncontrolled illumination. To this end, they classify the algorithms into two categories: relighting and unlighting. Relighting methods attempt to match the probe's illumination conditions using a subset of representative gallery images, while unlighting methods try to eliminate variations. The authors present the best methods for both categories, which can be useful to determine research directions. [25] Abdurrahim et al. present an extensive and focused survey that covers recent research on the effect of demographic covariates (i.e., race, age, and gender) on FR performance. The authors examine and summarize the effect of age, gender, and racial covariates on FR. In addition, suggestions on the future direction of the field are made to fully understand these effects individually and their interactions with each other.
[16] Dagnes et al. provide a review of 3D FR methods that handle the problem of partial occlusions. The datasets used to evaluate the various techniques are presented. Moreover, a comparison of recent approaches is presented, and some conclusions and recommendations are suggested. The most studied and tested occlusions are those caused by free hands in front of the face and by eyeglasses. The occlusions caused by scarves, caps and other accessories remain the major challenges to be solved for this category. [26] Wang et al. provide a comprehensive overview of the methods used to recognize faces from low-resolution (LR) images with varying pose, expression and illumination. In their work the authors classify the methods into two categories: super-resolution for LR FR and resolution-robust feature representation for LR FR. The concepts behind each approach are described, and the advantages and disadvantages of their strategies are also highlighted. [15] Ouyang et al. provide an exhaustive review of techniques to solve Heterogeneous Face Recognition (HFR) problems. The authors present the state of the art, methodology and datasets in HFR across multiple modalities including Photo-Sketch, Visible-NIR, 2D-3D and High-Low Resolution. Different methods are listed and analyzed to identify the best approaches for each modality. Common themes are also identified to establish links between the different research communities on HFR, and to identify challenges in this area and orientations for future research.
Some other studies related to FR have focused on age and gender estimation. [27] Panis et al. present an overview of the research works on facial ageing impact on FR using the FG-NET database. An analysis of published articles using the FG-NET database is performed and the benchmark results are presented. The authors summarize the obtained results to provide roadmaps for future trends and an orientation for future research in facial ageing. [28] Choon-Boon et al. present a review of facial gender recognition, focusing on 2D computer vision approaches. The challenges involved are highlighted, which can be divided into human factors, image conditions and qualities. The authors examine the approaches and the dataset used for evaluation of gender classification performance.

C. REVIEWS INTERESTED IN TECHNIQUES OR APPROACHES
Other works have focused more on techniques or approaches to improve the performance of FR systems. [29] Kasar et al. present a review of the studies published in the literature on FR using neural network approaches. To this end, the authors explore various architectures, algorithms and databases for training or testing images. In addition, they measured the performance of the FR systems used in each study. [30] Sharma and Patterh provide an extensive survey of feature extraction and recognition methods for FR applications. They evaluate feature extraction techniques for various FR methods to draw a summary diagram and to select the technique with the best accuracy. [31] Hongjun Wang et al. present a review of feature extraction frameworks for robust FR. More than 300 papers regarding face feature extraction are collected, analyzed and categorized into four components: filtering, local features, feature encoding, spatial pooling and holistic feature processing. Each component is analyzed and applied in a task with multiple levels. They also provide a brief review of methods using deep learning networks. Finally, a detailed performance comparison of various features on the LFW and FERET face databases is provided. [32] Tian and Wu present a review of compressive sensing (CS) methods employed for FR. These methods are grouped into four categories: sparse representation classification (SRC), the method using the sparsity idea in CS theory, the combination of the kernel trick and SRC, and the method based on sparse preserving techniques. The results are summarized and analyzed to identify the advantages and drawbacks of each approach. [33] Plichoski et al. present a survey of Swarm Intelligence and Evolutionary Computation applied in 2D FR systems. The authors analyze the key techniques and approaches used and summarize the obtained results to suggest an orientation for future research.

D. SPECIFIC SUBJECTS
Finally, various works cover more specific subjects. [34] Blanco-Gonzalo et al. analyze the usability and the accessibility of FR systems used by visually impaired people. A comparison between the FR algorithms is provided in terms of performance and time spent in the process, which are critical aspects in this case. [35] Phillips and O'Toole provide a comparison between human and computer performance in FR. The cross-modal performance analysis (CMPA) framework is used to analyze performance across methods. The results of the analysis can be divided into two categories: 1) frontal faces in still images; 2) video and difficult still face pairs. For the first category, the algorithms are always more efficient than humans. However, for the second, humans are better. [36] Sepas-Moghaddam et al. address the issue of the vulnerability of FR systems to presentation attacks. To this end, they review the methods presented in the literature on light-field-based face presentation attack detection solutions. Finally, the approaches are assessed in terms of accuracy and complexity.

III. METHODOLOGY
In this work, we mainly employed the software SciMAT (Science Mapping Analysis software Tool) [37]. It is an open source software tool used to perform science mapping analysis on a research topic, based on a longitudinal approach. This tool has a graphical user interface and integrates algorithms, methods and measurements for all stages of the science mapping process, from pre-processing to the visualization of results [19]. This software combines performance analysis and science mapping tools to analyze a research domain and to detect and visualize its conceptual sub-domains (specific themes or general thematic domains), as well as their evolution through the different subperiods studied [20]. SciMAT combines the modules needed to carry out the science mapping workflow. From downloaded files, this software loads article information to build a database and automatically detects duplicate elements. Furthermore, this tool offers several techniques for data normalization, such as Jaccard's index, the inclusion index, the equivalence index, association strength and Salton's cosine. Different bibliometric measures based on citations are employed to build enriched science maps, such as the h-index, g-index, hg-index and q2-index. Finally, the tool visualizes the results in the form of a strategic diagram, cluster network, overlapping map and evolution map.
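The citation-based measures named above can be illustrated with a short sketch. It assumes the usual textbook definitions (h-index; g-index limited to the number of papers; hg-index as the geometric mean of h and g; q2-index as the geometric mean of h and the median citation count of the h-core). The function names and the citation counts are hypothetical and are not taken from SciMAT.

```python
from math import sqrt
from statistics import median


def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)


def g_index(citations):
    """Largest g such that the top g papers total at least g^2 citations."""
    ranked = sorted(citations, reverse=True)
    total, g = 0, 0
    for rank, c in enumerate(ranked, start=1):
        total += c
        if total >= rank * rank:
            g = rank
    return g


def hg_index(citations):
    """Geometric mean of the h-index and the g-index."""
    return sqrt(h_index(citations) * g_index(citations))


def q2_index(citations):
    """Geometric mean of h and the median citation count of the h-core."""
    h = h_index(citations)
    if h == 0:
        return 0.0
    h_core = sorted(citations, reverse=True)[:h]
    return sqrt(h * median(h_core))


# Hypothetical citation counts for the documents of one theme.
theme_citations = [10, 8, 5, 4, 3]
print(h_index(theme_citations), g_index(theme_citations))
print(hg_index(theme_citations), q2_index(theme_citations))
```

Computing these values per theme and per subperiod gives the kind of impact figures reported alongside each strategic diagram.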
Figure 1 shows the workflow used in this work. The first step is to recover the data. The raw data is collected from the Web of Science database.
VOLUME 10, 2022
The second step consists in preprocessing the data using the SciMAT tool. The goal of this operation is to detect duplicate documents, to group similar keywords, and to define the time-slicing periods (1991-2003, 2004-2006, 2007-2009, 2010-2012, 2013-2015, 2016-2018 and 2019-2021). In this study, three-year periods were chosen: not too long, so as to capture variability, and not too short, so as to retain stability. In general, in literature studies, the last 3 or 5 years are often used. In this study we used the keywords (author keywords and journal keywords) present in the documents as the basic elements of analysis. These keywords are used to extract a direct link between documents and references; this relationship is illustrated in a strategic diagram and an evolution graph.
The third step consists in conducting a normalization process to build the network of relationships. Similarities between the items are calculated using the frequency of keyword co-occurrences. Different measures can be used; in our case, Salton's cosine and the Jaccard index are applied. The fourth step consists in identifying research issues or areas of interest for the research community by applying clustering algorithms to the keyword data resulting from the previous step. Many clustering algorithms can be used to build the science map; in this work, principal component analysis is used. The fifth step is dedicated to extracting useful knowledge and measuring the relationships among the detected clusters of keywords by carrying out a network analysis.
The sixth step is the visualization of the results. Each detected cluster (considered as a research theme) is characterized by two parameters: centrality and density (Eqs. 1 and 2). These measures are used to build two different visualization instruments: the strategic diagram and the thematic network.
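The normalization step can be sketched as follows. This is a minimal illustration, assuming the standard co-word definitions in which c_i is the number of documents containing keyword i and c_ij the number containing both i and j. The toy documents and the function names are hypothetical; only the keyword labels are borrowed from this study.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cooccurrence(docs):
    """Count keyword occurrences and pairwise co-occurrences over documents."""
    occ, co = Counter(), Counter()
    for keywords in docs:
        unique = sorted(set(keywords))          # one count per document
        occ.update(unique)
        co.update(combinations(unique, 2))      # pairs in alphabetical order
    return occ, co


def similarities(ci, cj, cij):
    """Common normalization measures for a keyword pair."""
    return {
        "jaccard": cij / (ci + cj - cij),
        "salton_cosine": cij / sqrt(ci * cj),
        "equivalence": cij ** 2 / (ci * cj),
        "inclusion": cij / min(ci, cj),
        "association_strength": cij / (ci * cj),
    }


# Hypothetical keyword lists, one per document.
docs = [
    ["PCA", "EIGENFACES", "ILLUMINATION"],
    ["PCA", "EIGENFACES"],
    ["PCA", "LDA"],
    ["ILLUMINATION", "LDA"],
]
occ, co = cooccurrence(docs)
pair = ("EIGENFACES", "PCA")
scores = similarities(occ[pair[0]], occ[pair[1]], co[pair])
print(scores)
```

Each measure rescales the raw co-occurrence count by the individual keyword frequencies, so that frequent keywords do not dominate the network simply because they appear in many documents.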
A strategic diagram, as shown in Fig. 2, is a 2D graph built by plotting research themes according to their centrality and density values; the x-axis represents centrality and the y-axis represents density.
Centrality (given in Eq. 1) measures the strength of external links with other themes. It gives an indication of the importance of the theme in the development of the entire research field analyzed. In Eq. 1, k is a keyword belonging to the cluster and h is a keyword belonging to other clusters.
Density (given in Eq. 2) measures the strength of internal links between all keywords defining the research theme. It can be considered as a measure of the development of the theme. In Eq. 2, i and j are keywords belonging to the cluster, and n is the number of keywords in the cluster.
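As an illustration of these two measures, the sketch below assumes the Callon-style forms commonly used in co-word analysis, namely centrality c = 10 × Σ e_kh and density d = 100 × (Σ e_ij / n), where e denotes the link weight (for example, the equivalence index) between two keywords. The scaling constants, the function names and the toy weights are assumptions made for this example.

```python
def centrality(cluster, weights):
    """10 * sum of link weights between a cluster keyword k and an outside keyword h."""
    external = sum(
        w for (a, b), w in weights.items()
        if (a in cluster) != (b in cluster)     # exactly one endpoint inside
    )
    return 10 * external


def density(cluster, weights):
    """100 * (sum of link weights between cluster keywords i and j) / n."""
    n = len(cluster)
    internal = sum(
        w for (a, b), w in weights.items()
        if a in cluster and b in cluster        # both endpoints inside
    )
    return 100 * internal / n


# Hypothetical link weights; each unordered keyword pair is stored once.
weights = {
    ("PCA", "EIGENFACES"): 0.5,
    ("PCA", "LDA"): 0.2,
    ("EIGENFACES", "ILLUMINATION"): 0.1,
    ("LDA", "ILLUMINATION"): 0.3,
}
theme = {"PCA", "EIGENFACES"}
print(centrality(theme, weights), density(theme, weights))
```

In this toy network the theme has one strong internal link and two weaker external links, so it scores higher on density than on centrality.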
Once the visualization is performed, it is necessary to interpret the graphs and the generated results. The themes presented in the strategic diagram [19] are classified into four groups: 1) Themes in quadrant 1 (Fig. 2) are well developed and also the most important for structuring the research field. They are known as the motor themes of the specialty, given that they present strong centrality and high density. 2) Themes in quadrant 2 (Fig. 2) have well-developed internal links but unimportant external links, and are therefore only marginally important to the field. These themes are highly specialized and peripheral in nature. 3) Themes in quadrant 3 (Fig. 2) are important for a research field, although they are not developed. Thus, this quadrant contains transversal and general basic themes. 4) Themes in quadrant 4 (Fig. 2) are both underdeveloped and marginal. They have low density and low centrality, and mainly represent emerging or disappearing themes.
As shown in Fig. 3, thematic evolution can be considered as a bipartite graph that visualizes the evolution of the themes through the subperiods. The periods are presented in columns, and the different thematic areas are linked from one column to the next. Thus, a thematic area is defined as a group of themes evolving across different subperiods. Each theme is plotted as a sphere and labeled with the name of the most significant keyword in the cluster. These themes are linked together across subperiods by lines.
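Returning to the strategic diagram, the four-quadrant reading can be sketched in a few lines. The sketch assumes that the quadrant boundaries are the medians of the centrality and density values of the themes in a subperiod; this cut-off is an assumption made for illustration, and the numeric values below are hypothetical.

```python
from statistics import median

# (high centrality?, high density?) -> quadrant label
QUADRANTS = {
    (True, True): "motor theme",
    (False, True): "highly specialized, peripheral theme",
    (True, False): "basic and transversal theme",
    (False, False): "emerging or disappearing theme",
}


def classify_themes(themes):
    """Map each theme name to a quadrant, given (centrality, density) pairs."""
    c_cut = median(c for c, _ in themes.values())
    d_cut = median(d for _, d in themes.values())
    return {
        name: QUADRANTS[(c > c_cut, d > d_cut)]
        for name, (c, d) in themes.items()
    }


# Hypothetical (centrality, density) values for four themes.
themes = {
    "PCA": (40.0, 30.0),
    "TRACKING": (5.0, 25.0),
    "FEATURES": (35.0, 8.0),
    "3D": (4.0, 6.0),
}
labels = classify_themes(themes)
for name, label in labels.items():
    print(name, "->", label)
```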
In the thematic evolution map (Fig. 3), the interconnection between two themes, called a "conceptual nexus", indicates the relationship between them. A solid line means that the linked themes share the same name. A dotted line means that the themes share elements other than the name of the themes. The thickness of the edges is proportional to the inclusion index [19], and the volume of the spheres is proportional to the number of published documents of the themes.
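A conceptual nexus between two themes of consecutive subperiods can be sketched as below. The sketch assumes that the inclusion index is the number of shared keywords divided by the size of the smaller theme, and that a link is solid when the name of one theme appears among the keywords of the other. The helper names and the keyword sets are hypothetical.

```python
def inclusion_index(u, v):
    """Shared keywords divided by the size of the smaller keyword set."""
    return len(u & v) / min(len(u), len(v))


def nexus(name_u, keywords_u, name_v, keywords_v):
    """Describe the link between two themes, or return None if they share nothing."""
    if not (keywords_u & keywords_v):
        return None
    # Each theme is labeled with one of its own keywords, so two themes with
    # the same name are also caught by this membership test.
    shares_name = name_u in keywords_v or name_v in keywords_u
    return {
        "weight": inclusion_index(keywords_u, keywords_v),
        "style": "solid" if shares_name else "dotted",
    }


# Hypothetical keyword sets for themes in two consecutive subperiods.
link_a = nexus("HYPERSPECTRAL", {"HYPERSPECTRAL", "FUSION"},
               "FUSION", {"FUSION", "MULTIMODAL"})
link_b = nexus("PCA", {"PCA", "EIGENFACES", "SUBSPACE"},
               "LDA", {"LDA", "SUBSPACE"})
link_c = nexus("TRACKING", {"TRACKING", "VIDEO"},
               "3D", {"3D", "DEPTH"})
print(link_a, link_b, link_c)
```

The returned weight would drive the edge thickness in the evolution map, while the style distinguishes name-sharing links from links based on other shared keywords.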

IV. RESULTS ANALYSIS
In this section, the results obtained from this bibliometric study are analyzed. It should be noted that the extraction of the information was done at the end of 2021; therefore, all the bibliometric information (number of citations, h-index) is current as of that date. Research activity in the field of FR experienced growth in the early 2000s to reach a peak [...] (Fig. 4), with 8898 papers published. This growth in interest in the field is due to the high number of applications of this technology, the increasing availability of images containing faces, and the challenges that have remained unsolved until now. The first growth period, until 2005, is explained by the availability of the largest face databases during this period (for example: FERET, ORL, AT&T, Yale Face, UMIST, AR, PIE, MIT-CBCL, CMU; for more details refer to [38]). Around 2010, many countries decided to boost the security of their cities by installing surveillance cameras. In China, this vision is clearly illustrated by the Skynet project, launched in 2011 to equip cities with video surveillance systems (1.1 million cameras were installed in 2012, reaching 200 million cameras installed in 2018), which explains the second growth period. Also, the emergence of smartphones and social networks in the same period pushed the scientific community to make more effort to meet this huge need.
This global growth in research activity in the field of FR is not uniform from a geographic standpoint, as can be seen in Figure 5. Within 15 years, China has become an overwhelming actor in this technological field, generating more than 50% of all articles since 2015, while it only produced [...] and 282 from India (11%). China's activity in terms of publications has grown by 575% over this period, while the USA's activity only increased by 65%. Over the same period, England and South Korea have experienced growth in research activity of the same order: +147% for England (from 45 to 111 articles) and +205% for Korea (from 37 to 113 articles). Another key fact is the emergence of India as a leading actor in the FR field. In the 1990s, India was responsible for only 1% of the total research publications, whereas it generated 11% in 2019-2021, becoming the second most active country in this field of research. It experienced an increase of +755% between the 2008-2010 and 2019-2021 periods. Research in the FR field is now an Asian-led activity, since this continent generated more than 69% of total publications in the 2019-2021 period. Across all periods, the countries most involved in scientific research in this field are China, the United States and India (see Fig. 7). Together, these countries account for 75% of the articles published worldwide.
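The growth rates quoted above follow the usual relative-change formula, (after - before) / before × 100. A small sketch, with a hypothetical helper name, reproduces the two figures for which both article counts are given:

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100


england = percent_change(45, 111)   # articles for England, as quoted above
korea = percent_change(37, 113)     # articles for South Korea, as quoted above
print(f"England: {england:+.0f}%  Korea: {korea:+.0f}%")
```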

A. FR THEMES VISUALIZATION
In the following, the results of the subperiods analysis are presented as a strategy diagram and tables containing quantitative and impact measures for each subperiod.
In the subperiod 1991-2003, a total of 491 documents (Fig. 6) in the FR theme are considered.
Looking at both the strategic diagram (Fig. 8) and the quantitative measures (Table 1), we can observe that (i) the motor themes, PCA and CLASSIFICATION, received many citations and had the highest impact (high h-index scores); (ii) the basic and transversal theme, FEATURES, received many citations and had a great impact later; (iii) the specific topic, TRACKING, received few citations and had the lowest impact; (iv) the emerging theme, 3D, received few citations and did not have a big impact. Table 2 lists the three most important documents of this subperiod by theme.
In the subperiod 2004-2006, the number of documents published in the FR theme increased, reaching a total of 808 documents (Fig. 6). According to Fig. 9 and Table 3, it can be observed that (i) the motor themes, PCA and FEATURES, are the most cited and present the highest impact; (ii) the basic and transversal theme, CLASSIFICATION, had many citations and a great impact; (iii) the specific topics, TRACKING and POSE, present the lowest impact; (iv) the emerging theme, SVM, received few citations and had limited impact. Table 9 lists the three most important documents of this subperiod by theme.
In the subperiod 2007-2009, the FR theme saw a slight decrease in terms of published documents. A total of 665 papers were published, a decrease of 17% compared to the previous subperiod (Fig. 6). In this subperiod, according to Fig. 10 and Table 4, as with the previous subperiod, the motor themes are the most cited and have the greatest impact. Also, the emerging and specific topics have [...]
[...] subperiod (Fig. 7). In this subperiod, according to Fig. 11 and Table 5, the themes of the FR field were increasingly developing and motor themes were becoming more and more numerous. As in the previous subperiod, the motor themes are the most cited and had the greatest impact, especially the LDA and PCA themes, while the specific topics have the lowest impact (LOW-RESOLUTION).
Table 11 lists the three most important documents of this subperiod by theme.
In the subperiod 2013-2015, the FR theme received even more attention from the scientific community. A total of 1436 documents were published, an increase of 51% compared to the previous subperiod (Fig. 6). In this subperiod, according to Fig. 12 and Table 6, as with the previous subperiod, the motor themes were the most cited and had the biggest impact, whereas the emerging and specific topics had the lowest.
Table 12 lists the three most important documents of this subperiod by theme.
In the subperiod 2016-2018, the FR topic kept attracting the scientists' focus. A total of 2013 documents were published (Fig. 6). In this subperiod, according to Fig. 13 and Table 7, we can observe that (i) the motor themes, SPARSE-REPRESENTATION, DIMENSIONAL-REDUCTION, EIGENFACES, ILLUMINATION, PCA, 3D and GABOR, received many citations and had the highest impact; (ii) the basic and transversal themes, PCA, HYPERSPECTRAL and REGULARIZATION, received a lot of citations and had a powerful impact. The emerging and specific themes had the least impact. Table 13 lists the three most important documents of this subperiod by theme.
In the subperiod 2019-2021, the scientific community's interest in the FR theme continued to grow. A total of 2538 documents were published, an increase of 26% compared to the previous subperiod (Fig. 6). In this subperiod, according to Fig. 14 and Table 8, as with the previous subperiod, the motor themes were the most cited and had the biggest impact, especially the FEATURES, SPARSE-REPRESENTATION and ANN themes. Also, the emerging and specific themes had the least impact.
Table 14 lists the three most important documents of this subperiod by theme.
In general, it can be observed that in all the subperiods studied, the motor themes achieved the highest number of documents, citation scores and impact. It is logical to find CLASSIFICATION, EIGENFACES, LDA, FEATURES and PCA as the motor themes: they receive more attention and citations because they form the basis of FR algorithms, in contrast to TRACKING, HETEROGENEOUS, FUSION and 3D, which represent very specific challenges or new techniques.

B. EVOLUTION OF THE FR THEMES
In this sub-section, the thematic evolution of the FR research field is analyzed, using the components of each theme and their evolution through the subperiods. According to the tables of themes for each subperiod (Tables 1, 3 [...], the FR research field evolves over time. There are some themes that appear from one subperiod to the next, others that disappear, and others that remain in all the subperiods. For example, the keywords CLASSIFICATION and ILLUMINATION appear in all the subperiods. However, the keywords HYPERSPECTRAL, HETEROGENEOUS, FLD and DEEP-LEARNING only appear in the last three subperiods.
The thematic evolution of the FR research field is shown in Fig. 15. As previously mentioned, a solid line signifies that the related themes share the same name. A dashed line signifies that the themes share elements other than the theme name.
The edge thickness is proportional to the inclusion index, and the spheres' volume is proportional to the number of published documents in each theme. The thematic network, as shown in Figs. 16-22, presents the composition and evolution of each of the themes across the subperiods. Each theme is formed by the assembly of several keywords with a strong link between them, and is labeled with the name of the most significant keyword in the group. The volume of a sphere is proportional to the number of documents published using the keyword, and the thickness of the link between two spheres is proportional to the number of documents using both keywords. According to [...] was a basic and transversal theme; it then became an emerging theme and, after that, a motor theme in "2013-2015". It maintained its importance during the last two subperiods by associating first with the 3D theme and then with FEATURES.
• The LOW-RESOLUTION theme appeared during the subperiod "2010-2012" in the specific theme category, composed of the keywords LOW-RESOLUTION and SUPERESOLUTION. It remained in the same category for the following subperiods, with a slight increase in published documents. However, the h-index did not show any progress.
• The HYPERSPECTRAL theme appeared in the subperiod "2013-2015", after hyperspectral cameras became widely available to the general public in this period, but this use did not involve a large part of the scientific community. This theme was composed of the keywords HYPERSPECTRAL and FUSION; it was divided into two themes, HYPERSPECTRAL and FUSION, for the next subperiod, which merged again into a single theme in the last subperiod.
• The HETEROGENEOUS theme appeared in the subperiod "2013-2015". It is composed of HETEROGENEOUS and FORENSIC-SKETCH, which shows a strong interest in using forensic sketches to identify persons in real image databases, compared to other heterogeneous modes of use such as infrared or multispectral images. This theme appeared in the category of specific topics, with a very low number of published papers and citations. The interest of the scientific community in this theme is not consistent over time: the theme disappeared from the active themes of FR during the subperiod 2016-2018 and re-appeared in the subperiod 2019-2021 with the same status as in the subperiod 2013-2015.
• The DEEP-LEARNING theme, formed by the keywords DEEP-LEARNING, ANN and CNN, appeared in the subperiod "2014-2016". The scientific community started to use and apply this technique in the field of FR with the emergence in this subperiod of new frameworks that ease the development and deployment of this technology, such as Caffe in 2013 [270], and TensorFlow [271] and Keras in 2015 [272]. This theme saw huge growth in terms of the number of documents published in the following subperiod; it is considered a specific theme, but it has high potential for the coming years.

V. CONCLUSION
In this paper, the topic of Face Recognition (FR) was analyzed using bibliometric tools, including science mapping analysis, co-word analysis, and performance analysis techniques. In a first step, we retrieved all the documents published in the FR theme through the Web of Science database. Then we used the SciMAT tool to detect, quantify, and visualize the evolution of the research field. Finally, we analyzed and interpreted the results. The results show a growth of interest in the FR field, reflected in a gradual increase in the number of studies published in international journals since 2010, in particular an increase of 59% over the period 2014-2016 and of 31% over the last period, 2017-2019. Therefore, it can be concluded that the theme is still relevant and increasingly attracts the attention of the scientific community. Researchers used local and global approaches equally in the first periods of study; local approaches then received more attention, and finally mixed methods became the trend. Deep learning methods have been applied in the FR field since the 2014-2016 period, and they have enormous potential: they represent the largest growth in number of documents among all FR themes over all the periods of our study, with an increase of 938% between the last two periods (2014-2016 and 2017-2019). On the other hand, the illumination challenge has retained the same interest throughout the periods of study and still has a great impact on the theme. It can be considered the most interesting challenge to solve, in addition to the pose and heterogeneous-source challenges. Finally, we can also highlight a very large increase in China's activity in this field since 2008; China is responsible for more than 50% of all articles published since 2014.