Evolution of Publications, Subjects, and Co-Authorships in Network-on-Chip Research From a Complex Network Perspective

The academia and industry have been pursuing network-on-chip (NoC

the scalability and bandwidth requirements essential for the then system-on-chip (SoC) designs featuring tens or even hundreds of cores. In response, a few new interconnection architectures and technologies were considered and emerged. Notably, the on-chip packet-switched micro-network of interconnects, later coined by the academic and industry communities as the network-on-chip (NoC) architecture, stood out and has been accepted as a viable solution to on-chip interconnection after 20 years of active research and development. NoC plays an even more important role in today's CPU- and GPU-based many-core chips (e.g., Epiphany-V has 1024 64-bit RISC cores), and particularly so in emerging neural network accelerators with tens, hundreds, or even thousands of cores (e.g., 400,000 AI cores exist in Cerebras) that need to communicate with each other.
Set to provide scalable, high-bandwidth, and low-latency interconnection for various SoC systems, NoC actually encompasses a wide array of research topics and paradigms, ranging from those inherited from traditional computer networking (e.g., topology, routing, etc.) to the latest ones that are aligned with machine learning applications and emerging technologies. Over its twenty-year timespan, the entire area of NoC has gone through three phases, as shown in Figure 1: the start-up (2000-2007), the growth and shakeout (2008-2013), and the mature and stable (2014-2020), broadly measured by the numbers of publications and new authors per year.
Over the past twenty years, a number of good survey papers have been published, each covering one or a few specific aspects of NoC, including application mapping [1]- [3], topology design and routing algorithms [4], fault tolerant NoC [5], optical NoC [6], [7], low power design [8], [9], NoC for QoS [10], [11], testing [12], simulators [13], reconfigurable network [14], and many others. Since these survey papers focus only on specific research subjects/topics and are written by human experts based on the authors' own knowledge of the subjects, they do not provide an overarching and objective view of NoC. Nor do they follow a data-driven analysis method to quantify the dynamics of NoC research and make a sound prediction of what lies ahead.
Based on a methodology that marks a significant departure from that adopted by the existing NoC survey papers, this paper instead conducts a data-driven analysis of the evolution of publications, subjects, and co-authorships of the NoC area from the perspective of complex networks [15]. To this end, we construct three networks, the citation network, subject citation network, and co-authorship network, that can represent three key elements of NoC research, namely publications, subjects, and authors. Furthermore, we examine the network structures (e.g., network diameter, clustering coefficient, average degree, etc.) and the dynamics of these networks. This exercise shall not only help us quantify the progress made in NoC, the shift of research topics over time, and the career development of both new and established NoC researchers, but also create an opportunity to dig out the reasons behind these accomplishments and changes that have happened in the past 20 years. One specific goal of this study is to track the knowledge flow among different subjects and how NoC research was shaped by other factors, which include the relationship between subjects, the collaboration patterns among authors, and the authors' preferences in choosing specific NoC subjects.
FIGURE 1. (a) Numbers of publications in NoC area from Y2000 to Y2020. (b) Numbers of new authors entering the NoC area from Y2000 to Y2020. An author is recognized as a new author in the year when his/her first publication got published or made publicly accessible. Name ambiguity is handled by checking the names and the affiliations altogether.
The rest of the paper is organized as follows. Section 2 takes a brief look at the 20 years of research on NoC. Section 3 surveys the related work. Section 4 presents the analysis methodology followed in this paper. The network structure and evolution of NoC publications, NoC subjects, and co-authorship are detailed in Sections 5, 6, and 7, respectively. In these sections, we also analyze the reasons that drive the formation and evolution of the networks/data. The data and the analyses in Sections 5 through 7 lay the groundwork for the prediction of the evolution of publications, subjects, and co-authorship presented in Section 8. Finally, Section 9 summarizes the paper.
The contributions of the paper are as follows.
1) A data-driven complex network analysis is performed to study the evolution of NoC research over its entire 20 years of existence. A longitudinal study by nature, this study is based on the construction of three complex networks, namely the citation network, subject citation network, and co-authorship network.
2) Network structures and dynamics of the three networks are analyzed to reveal the evolution of publications, subjects, and author behaviors. A few interesting results about the NoC area have thus been unveiled.
(i) Six subjects, i.e., topology, routing, flow control, router design, mapping, and emerging technology, have been identified as the most influential subjects in NoC research, and they are found to be highly related to each other. The knowledge flow among them can be clearly measured with the metrics defined in this study.
(ii) A few subjects, notably reliability, are becoming increasingly influential as time goes by, while some others, such as physical design, are gradually losing their momentum.
(iii) The collaboration among the authors shows interesting motifs (sub-graphs that frequently appear in a network), indicating different collaboration patterns.
3) Researchers, not limited to those working in the NoC area, can adopt this methodology, along with the analysis tools developed in this study, to understand and gauge the scientific research areas of their interest and to make predictions about future directions.

II. TWENTY YEARS OF RESEARCH ON NoC: A REVIEW
The start-up of the NoC area was marked by a few highly influential early works (e.g., [16], [17]), which rightfully recognized that interconnection infrastructure based on conventional buses, crossbars, or point-to-point connections could no longer keep up with the scalability, bandwidth, and efficiency requirements as more processing cores (elements) were integrated into the same silicon. As a viable alternative, NoC enables efficient communication among the processors, memory units, and IO units with data packetization and transmission in a manner similar to that in computer networks, and was thus poised to provide high bandwidth, low latency, and scalability. But since there are physical limits imposed by material and layout, there are unique requirements for NoC in terms of size, system performance, and power consumption. A generic NoC architecture, shown in Figure 2, is made of blocks like tiles and links that allow packetized data to flow through. Each tile is composed of a core, a memory unit, a network interface (NI), and a router. The routers are connected to form a network topology. In a real setting, this generic NoC architecture often needs to include additional features and/or be tailored to handle diversified traffic patterns and meet the traffic demands. NoC research has been driven by three main factors: 1) demands of emerging applications, 2) emergence of various architectures, and 3) emerging technologies.
(1) In the past 20 years, a myriad of new applications have emerged, ranging from embedded systems to cloud computing. These emerging applications can also be broadly classified into multimedia processing, machine learning and AI, graph-based computing, telecommunications, etc., and they exhibit vastly different needs in terms of bandwidth and/or latency.
FIGURE 2. Illustration of a generic NoC system that has multiple routers and links. Each processing element interfaces with its attached router through a network interface (NI). A router typically has a number of components like input/output buffers, routing computation unit, and switching fabric. Core is the processor core. Mem is the local memory or L1 cache or L2 cache bank. R refers to router.
(2) Different NoC architectures have different levels of support for different traffic patterns and demands. For example, in a cache coherent many-core system, the data traffic relies on coherence protocols, including read request, write invalidate, etc., and such traffic is one-to-many in nature. GPU, on the other hand, sees a great deal of many-to-few-to-many (i.e., from tiles to memory controllers and back to tiles) type of traffic. In addition, the rise of FPGA-based AI accelerators calls for more efficient and scalable NoC to support broadcasting, multicasting, and many-to-one types of traffic.
(3) The last couple of decades have witnessed great improvements in semiconductor material and frontier technologies, including silicon photonics, wireless technologies, transmission line, interposer, and 3D integration, which have opened up new opportunities to build non-electrical, wireless, or true 3D NoCs. There has been both hype and hope that 3D integration using through silicon vias (TSV) or monolithic inter-layer vias (MIV) for vertical interconnection can significantly reduce the lengths of the global wires. As a more radical approach, wireless NoC can provide a multicast-based interconnection without the need for wire links, which is poised to connect large systems with extremely low latency. Silicon photonics, either on chip or in package, provides low-power and high-bandwidth communications for cores or dies. Interposer-level networks have become one of the most popular research topics in the last few years, as the chiplet design paradigm has dominated the design landscape. The new chiplet-based processors rolled out by Intel and AMD use SERDES (serializer/deserializer) or transmission lines sitting on an interposer for data communications.
The above three research drivers actually impacted NoC research quite differently over time. From 2000 to 2007, research and development of NoC were in their start-up stage, and the focus was mostly on the network structure design (topology, routing, flow control, etc.), application mapping, design space exploration, and tools. All the major early works were capped by a survey paper published in 2006 [18]. From 2008 to 2013, as NoC research hit its peak, the dominant subjects shifted to emerging technologies, new applications, simulators, new NoC architectures, power management, etc. Various important NoC architectures, including the dimension decomposed router [19], virtual channel (VC) design and express VC design [20], 3D NoC [21], WiNoC [22], [23], and optical NoC [6], all came to light during this time. Since NoC research entered its next phase (2014-2020), NoC for AI accelerators and many new subjects like approximate NoC, security, and network on interposer in chiplet-based systems have dominated the publication and development landscape. The timeline (history) of NoC research and the changing research topics will be quantitatively analyzed in later sections.

III. RELATED WORK
A. SURVEY PAPERS IN THE NoC AREA
Several survey papers have been published, covering many aspects of NoC research, ranging from application mapping [1]- [3], topology design and routing algorithms [4], [18], [24], [25], fault tolerant NoC [5], optical NoC [6], [7], low power design [8], [9], NoC for QoS [10], [11], testing [12], and simulators [13], to reconfigurable network [14], etc. However, all these survey papers are based on the domain knowledge of the human experts (authors). They do not follow a quantitative approach; rather, they classify and assess the existing research merely based on the survey contributors' own understanding of the subjects at hand. This paper differs from those survey papers in that we aim to provide a comprehensive view of the NoC publications from a complex network perspective and thus, in a more objective manner, to reveal how NoC evolves, what relationships hold among different NoC subjects, and what collaboration features we can perceive within the NoC community, by analysing the network structure and dynamics quantitatively.

B. SCIENCE OF SCIENCE, IN THE EYES OF COMPLEX NETWORK
Science of science studies what science is really about or what constitutes a specific discipline. Specifically, it identifies the features or dynamics of a specific discipline of science or of science as a whole, focusing on how knowledge, publications, and scientist behaviors evolve [15]. In the literature, complex networks or complex systems, as a powerful mathematical tool, have been applied to empower data-driven analysis of science of science [26]- [30]. Of the many complex networks that can be built, the citation network and the co-authorship network are the two most commonly used [31]- [34], while other kinds of networks, such as networks of subjects and networks of keyword occurrence, can also provide a useful perspective in studying the flow of knowledge and the evolution of a scientific area. For instance, the authors of [35] used the conference publications in bioinformatics to study the evolution of that area. The authors of [36] built a network of academic conferences to track the changes of research topics and the dissemination of scientific ideas. The work in [37] modeled the co-authorship network evolution, while [38] modeled the population of the computer science community using the DBLP dataset.

IV. ANALYSIS METHODOLOGY
Following the methods established in science of science, we investigate how the overall properties of publications, research subjects, and author behaviors in the NoC area have changed over time. Figure 3 shows the analysis methodology adopted in this paper.
• First, preparing the dataset. All the NoC-related publications presented in the five top conferences (DAC, DATE, HPCA, ISCA, and NoCS), as well as high-impact NoC papers (cited at least 100 times according to Microsoft Academic Search, including both journal and conference papers), are collected from the IEEE Xplore and ACM digital libraries, and the total number of papers reached 997. As conference proceedings are a timelier medium for publishing new ideas than journals, the proceedings of the flagship conferences are chosen to allow us to grapple with the evolution of NoC research thoroughly and in a timely manner. DAC, DATE, HPCA, ISCA, and NoCS are generally agreed by NoC researchers to be the most important conferences, and all the major contributions in NoC research over the past 20 years appeared in these conferences. We adopted 100 as the citation count threshold for considering a paper a high-impact paper. The highest citation count of an NoC paper is found to be around 5000, and most of the papers understandably receive far fewer citations. We set the threshold to 100 such that the high-impact papers account for around 20% of the total dataset, which follows the Pareto Principle or the 20/80 rule, with the assumption that the top 20% most cited papers are the most influential ones. Microsoft has a team working on science of science with many publications. They also released an open source dataset for academic relationships; see https://www.microsoft.com/en-us/research/project/microsoft-academic-graph/. We thus use Microsoft Academic Search for convenience. This does not exclude people from using Google Scholar or any other similar service for the same analysis and prediction purposes.
• Second, building an NoC subject tree. We build a subject tree that presents the NoC subjects in a hierarchical manner. The tree takes NoC as the root node, and each topic in NoC can be divided into several smaller subtopics at the next levels, until there is no point in dividing a subtopic further. Figure 4 shows a snapshot of the NoC subject tree, where the level 1 branches are wireline NoC and emerging technology (including optical, wireless, etc.). Under wireline NoC there are the application, system, networking, micro-architecture, and circuit and physical layers. Note that we usually use the tags at the bottom of the subject tree (leaf nodes) to annotate articles, but emerging technology itself serves as a tag instead of its subtopics. The full subject tree is available at https://github.com/FCAS-SCUT/Science_of_Science_NoC. Since this subject tree has been checked by multiple NoC researchers, there is a high degree of confidence in its accuracy and completeness. Terms in this subject tree are the subjects in NoC, and together they serve as a dictionary for subject labeling. That is, the text processing algorithm in the Appendix uses this dictionary to automatically label each paper with its subject(s). Note that a paper may carry multiple labels as it may fall into multiple subject areas.
• Third, building three complex networks and analysing them.
1) The citation network is built based on the citation relations between the NoC publications. Both the network structure and dynamics are studied.
2) To capture the evolution and interrelationship of subjects in NoC, it is inadequate to just examine the citations between publications. Rather, the subject citation network needs to be built by aggregating the citation relations to the subjects at different levels. Again, its network structure and dynamics are analyzed.
3) The co-authorship network is further built to understand the behaviors of the NoC researchers, which also contributes to our knowledge about the evolution of NoC research.
FIGURE 3. The analysis methodology that is followed in this paper.
We have developed corresponding software tools, including a data processing program and analysis algorithms, for the whole process shown in Figure 3. These tools can be easily extended to analyze the evolution of other research areas, and the source code can be found at https://github.com/FCAS-SCUT/Science_of_Science_NoC.
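The first two steps of the methodology can be illustrated with a minimal Python sketch. It is not the authors' tool: the `Paper` record, the function names, the tiny tree excerpt (including where each leaf is placed), and the plain substring matching are all assumptions made for illustration. The labeling algorithm actually used is the one described in the Appendix.

```python
from dataclasses import dataclass

TOP_VENUES = {"DAC", "DATE", "HPCA", "ISCA", "NoCS"}
CITATION_THRESHOLD = 100  # high-impact cut-off stated in the text

# Hypothetical excerpt of the subject tree; leaf placement is assumed.
SUBJECT_TREE = {
    "NoC": {
        "wireline NoC": {
            "networking": {"topology": {}, "routing": {}, "flow control": {}},
            "micro-architecture": {"router design": {}},
            "application": {"mapping": {}},
        },
        "emerging technology": {},  # used as a tag itself, per the text
    }
}


@dataclass(frozen=True)
class Paper:
    title: str
    venue: str
    citations: int


def select_papers(papers):
    """Keep papers from a top venue or with enough citations."""
    return [
        p for p in papers
        if p.venue in TOP_VENUES or p.citations >= CITATION_THRESHOLD
    ]


def leaf_subjects(tree):
    """Collect the leaf names of a nested-dict tree, in insertion order."""
    leaves = []
    for name, children in tree.items():
        if children:
            leaves.extend(leaf_subjects(children))
        else:
            leaves.append(name)
    return leaves


def label_paper(paper, subjects):
    """Assign every subject whose name occurs in the title (assumed rule)."""
    text = paper.title.lower()
    return sorted(s for s in subjects if s in text)
```

A paper can receive several labels, which matches the remark that a paper may fall into multiple subject areas.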

V. EVOLUTION OF PUBLICATIONS BASED ON THE NoC TEMPORAL CITATION NETWORK
The NoC citation network is represented as a directed graph, where each node is an NoC paper and a directed link is inserted from node A to node B when paper A cites paper B. The size of a node corresponds to the number of times the paper has been cited after publication, i.e., the in-degree of the node. The links are unweighted (i.e., the weight is always set to 1) due to the nature of a citation relationship.
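As a minimal sketch of this definition, the following Python code builds the directed, unweighted citation graph from (citing, cited) pairs and derives the in-degree used for node size. The function names and the pair-based input format are assumptions for illustration only.

```python
from collections import defaultdict


def build_citation_network(citations):
    """Return {paper: set of papers it cites} from (citing, cited) pairs.

    Using a set keeps links unweighted: a repeated pair adds nothing.
    """
    out_links = defaultdict(set)
    for citing, cited in citations:
        out_links[citing].add(cited)
        out_links[cited]  # make sure cited-only papers appear as nodes
    return dict(out_links)


def in_degrees(out_links):
    """Number of times each paper is cited within the dataset."""
    counts = {paper: 0 for paper in out_links}
    for cited_set in out_links.values():
        for cited in cited_set:
            counts[cited] += 1
    return counts
```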

A. NETWORK STRUCTURE AND DYNAMICS OF THE NoC CITATION NETWORK
Evolution of the network structures can be visualized from the four citation networks drawn at four milestone years, namely Y2005, Y2010, Y2015, and Y2020 (Figure 5). A community detection algorithm [39] is performed to examine the relationship among the NoC publications. To reflect the structural strength of these communities, they are color-coded. The vertices painted with the same color belong to the same community. Notice that there are still a few papers in our dataset that neither cite nor are cited by other papers; such papers are hereinafter referred to as isolated papers. Table 1 summarizes the number of nodes, clustering coefficient, network diameter, average degree, modularity (with resolution = 1.0), and percentage of the giant connected branch for the NoC citation networks in the four milestone years. One can see that the number of nodes in the NoC citation network increases rapidly (rising from 107 to 997), especially from 2005 to 2015, indicating that the NoC field is evolving and knowledge is accumulating at a fast speed. Since the number of papers in the dataset is much smaller than the total number of citations they have, the average degree of the network is low (smaller than 3), and the network diameter grows over the years (rising from 3 to 9). On the other hand, the increase of average degree indicates that the knowledge base in the NoC field is maturing, and more publications in the NoC area are citing articles within NoC itself. The modularity and clustering coefficient of the network remain low, indicating that citation is a weak relationship with no clear preference. In other words, it is difficult to form an obvious and stable community in NoC. Another interesting phenomenon is that the giant connected branch of the NoC citation network grows larger each year, suggesting that there are increasingly fewer isolated nodes in the citation network.
Table 18 in the Appendix further shows that, as time goes by, small communities tend to merge into large ones, corresponding to the growth of the NoC knowledge base.
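Some of the structural quantities discussed above can be computed with a few lines of Python. The sketch below is illustrative only and rests on two assumptions: the average degree is taken as the number of links divided by the number of nodes, and the giant connected branch is taken as the largest weakly connected component. The conventions behind Table 1 may differ.

```python
def average_degree(out_links):
    """Links per node for a graph given as {node: set of cited nodes}."""
    if not out_links:
        return 0.0
    edges = sum(len(targets) for targets in out_links.values())
    return edges / len(out_links)


def giant_component_share(out_links):
    """Fraction of nodes in the largest weakly connected component."""
    # Build an undirected view so link direction is ignored.
    neighbors = {node: set() for node in out_links}
    for source, targets in out_links.items():
        for target in targets:
            neighbors[source].add(target)
            neighbors.setdefault(target, set()).add(source)
    seen, largest = set(), 0
    for start in neighbors:
        if start in seen:
            continue
        stack, size = [start], 0
        seen.add(start)
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            size += 1
            for nb in neighbors[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        largest = max(largest, size)
    return largest / len(neighbors) if neighbors else 0.0


def isolated_papers(out_links):
    """Papers that neither cite nor are cited within the dataset."""
    cited = {t for targets in out_links.values() for t in targets}
    return sorted(n for n, targets in out_links.items()
                  if not targets and n not in cited)
```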
A close look at the early papers reveals that they mostly cited works outside the NoC area, which means that before 2010, NoC research benefited from the inward knowledge flow from other areas. For instance, the references of [17] (the largest node in the citation network in 2005, at the center of the graph), a paper published as early as 2002, were drawn from traditional areas like computer networks, digital system design, IC design, and power optimization. In sharp contrast, the papers published in 2015 or later had a lot of internal citations (i.e., the cited papers were from the NoC area). Such an increase in internal citations is a clear indicator that the NoC area became more mature and that a significant amount of knowledge was created and spread within the NoC circle. Correspondingly, the NoC citation network became more densely connected. Table 16 and Table 17 in the Appendix further tabulate the top 5 most cited publications in each calendar year. One can see that, in the early years, [17], as one of the earliest papers in the NoC area, ranked first in terms of the number of citations received. Later on, the two papers describing NoC simulators, [40] and [41], got more citations, as these simulators were adopted by the mainstream of researchers in the NoC community. By the same token, another two papers, [42] and [43], were recognized as milestones in NoC design, featuring high-speed and low-delay router/network designs.

B. RELATIONSHIP BETWEEN COMMUNITIES IN THE CITATION NETWORK AND NoC SUBJECTS
W. Chen et al.: Evolution of Publications, Subjects, and Co-Authorships in Network-on-Chip Research
A paper is more likely to be cited by those that fall into the same subject areas, and such citation relationships are thus generated and captured in the citation network. Correspondingly, a paper and all the papers that cite it appear to be ''close'' to each other in the citation network, and together they are likely to form a community. Based on these observations, we test whether all NoC publications in a community belong to a single subject or not. In other words, we try to figure out if there is a one-to-one mapping from a community to a particular subject. Table 15 shows the subject distribution in selected large communities in the NoC citation network. Each of these communities has an ID generated by the community detection algorithm. From Table 15, we have the following observations.
1) The communities do not show a strong correspondence with single subjects. That is, publications in a community often do not belong to only one subject, but several main subjects can broadly cover the community.
2) In the communities, subjects like flow control, routing, router design, topology, emerging technology, and mapping are closely correlated. These subjects appear together frequently and always show similar (high) distributions, and the reason is provided in the next section.
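The test described above amounts to tallying subject labels per community. A minimal Python sketch follows, assuming that a community assignment per paper and a list of subject labels per paper are already available. Both input dictionaries and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def subject_distribution(community_of, subjects_of):
    """Count subject labels inside each community.

    community_of: {paper: community id}
    subjects_of:  {paper: list of subject labels}
    """
    counts = defaultdict(Counter)
    for paper, community in community_of.items():
        counts[community].update(subjects_of.get(paper, ()))
    return {community: dict(c) for community, c in counts.items()}


def dominant_share(distribution):
    """Share of the most frequent subject label in one community."""
    total = sum(distribution.values())
    return max(distribution.values()) / total if total else 0.0
```

A dominant share close to 1 would point to a one-to-one mapping between a community and a subject; lower values correspond to the mixed communities reported in the observations.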

VI. EVOLUTION OF SUBJECTS BASED ON THE NoC SUBJECT CITATION NETWORK
The NoC subject citation network is defined as a directed graph, where each node represents an NoC subject, and a directed link from node A to node B is established if a paper that falls into subject A cites a paper in subject B. The links of the NoC subject citation network are weighted, and the weight of the link between nodes A and B corresponds to the number of citations by the papers in subject A citing those in subject B. Note that the NoC subject citation network includes self-loops to account for self-citations. The node size is proportional to the in-degree of each node, which corresponds to the total number of papers being cited in that subject.
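The aggregation from paper-level citations to subject-level links can be sketched in Python as follows. One point is an assumption rather than something stated in the text: when a paper carries several labels, each (citing subject, cited subject) pair is counted once per citation. The function names are illustrative.

```python
from collections import Counter


def build_subject_network(citations, subjects_of):
    """Aggregate (citing, cited) paper pairs into weighted subject links.

    Returns a Counter mapping (citing subject, cited subject) to a weight.
    Pairs with identical subjects are kept and act as self-loops.
    """
    weights = Counter()
    for citing, cited in citations:
        for src in subjects_of.get(citing, ()):
            for dst in subjects_of.get(cited, ()):
                weights[(src, dst)] += 1
    return weights


def weighted_in_degree(weights):
    """Total citations received by the papers of each subject."""
    totals = Counter()
    for (_, cited_subject), weight in weights.items():
        totals[cited_subject] += weight
    return totals
```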
A. NETWORK STRUCTURE AND DYNAMICS OF THE NoC SUBJECT CITATION NETWORK
Figure 6 shows the subject citation networks created at Y2005, Y2010, Y2015, and Y2020. Table 2 summarizes the number of nodes, network diameter, average degree (weighted and unweighted), and clustering coefficient for the NoC subject citation networks in the four milestone years. One can see that the number of nodes grows at a fast pace over time (starting with 26 and rising to 81), indicating that many new subjects emerge in the NoC field, particularly during the early days of the NoC era. The network diameter of the subject networks shortened (from 7 to 4), while the average degree (rising from 3.308 to 13.272) and the clustering coefficient (from 0.174 to 0.645) grew rapidly. All these observations point to the fact that the NoC subjects tend to increasingly cluster together and the knowledge flows among them are becoming more frequent over the years.
VOLUME 9, 2021

1) DYNAMICS OF THREE REPRESENTATIVE SUBJECTS
The evolution of the influence of specific subjects in the NoC area over time is also studied, with the aim of identifying the trends and measuring them quantitatively. We plot the changes of the influence for three typical subjects, namely physical design, reliability, and topology, in Figures 7-9. The networks are presented in a ''Fruchterman Reingold'' style layout, in which nodes/subjects with higher weighted in-degrees (i.e., nodes/subjects whose publications are cited more frequently) are placed close to the center. The influence of a subject is quantified by the percentile rank score of its weighted in-degree. For example, the weighted in-degree of physical design ranked in the top 10% among all the NoC subjects up to 2005, so its influence score is 90% (meaning it is more popular than 90% of the NoC subjects). The greater the weighted in-degree of an NoC subject, the greater its influence score and the closer it is to the center of the network. Figure 10 shows the change of influence scores of the three NoC subjects over time. One can see that: (1) the influence score of physical design drops from 90% in 2005 to only 50% in 2020; (2) the influence score of reliability jumps from lower than 30% to higher than 80%; and (3) the influence score of topology stays relatively flat, at around 80% or slightly higher.
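The influence score can be sketched as a percentile rank in Python. The tie-handling rule below (counting only subjects with a strictly lower weighted in-degree) is an assumption, chosen to match the reading "more popular than 90% of the NoC subjects". The function name is illustrative.

```python
def influence_scores(weighted_in_degree):
    """Percentile rank (0-100) of each subject's weighted in-degree.

    The score is the percentage of subjects whose weighted in-degree
    is strictly lower than that of the subject in question.
    """
    values = list(weighted_in_degree.values())
    n = len(values)
    return {
        subject: 100.0 * sum(v < w for v in values) / n
        for subject, w in weighted_in_degree.items()
    }
```

Applying the function to the weighted in-degrees accumulated up to each milestone year would give one point per subject per year, which is the kind of curve plotted in Figure 10.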
The reasons behind these trends can be explained as follows.
(1) Physical design was recognized as a critical issue in the early phase of NoC development, and as such, it was at the center of NoC research before 2005. Once it became more mature, the research interest faded away. As a result, this subject moved towards the periphery of the NoC subject citation network. We believe that this topic will draw some attention again in the future, as physical design may require early floorplanning for chiplets sitting on an interposer or substrate.
(2) Moving in a direction quite opposite to physical design, reliability in NoC has gained more attention and traction in the research community. Performance, rather than reliability, was the primary optimization objective in the early years of NoC. But due to the relentless increase of chip power density and continued device miniaturization and wire shrinkage, which together contribute to higher failure rates, more soft errors, or faster aging, there have been growing concerns about NoC reliability, which mandates more research to address these problems. Reliability and DFT (Design for Test) will become more important for the interconnect, circuits, and logic in the interposer or substrate.
(3) Topology holds almost the same location (near the center) in the NoC subject citation network, indicating that its popularity as a main research subject has stood the test of time. This phenomenon may be explained by the fact that, with the emergence of new technologies and new architecture designs, there is more space to explore when it comes to topological design. Correspondingly, the research community still maintains a high level of enthusiasm for this subject.
With the aforementioned analysis tools and method, a quick and objective data-driven survey on any NoC subject can be established. For instance, a brief survey of NoC reliability is done by solely reviewing the papers from our dataset.
According to the subject tree, the subject reliability can be further divided into 3 sub-subjects that appear as the children of the reliability node:
• First sub-subject: Fault-tolerant approaches using fault tolerant routing or reconfigurable topologies to bypass faulty components, or using remapping or migration to move tasks to fault-free cores [5], [44]- [56]. Collectively, the total citation number received by these papers is 45 in our database.
• Second sub-subject: Modeling of soft errors in routers/links or approaches to handle soft errors using error correction link/router designs [57]- [65]. The total citation number received by these articles is 11.
• Third sub-subject: Modeling or mitigating aging through adaptive routing or remapping/mapping [66]- [71]. The total citation number received by these articles is 7.
From the number of citations, it can be seen that the research concerning fault-tolerance in NoC remains the most popular one under reliability, as permanent faults are becoming a serious concern these days.

B. DYNAMICS OF SIX MOST INFLUENTIAL AND HIGHLY CORRELATED SUBJECTS IN NoC
From Figure 6, one can see that topology, routing, flow control, router design, mapping, and emerging technology are the most influential and strongly connected subjects in NoC. Since these six subjects also happen to have the highest eigenvector centrality values (>0.95), as reported in Table 3, their importance in NoC design is further confirmed. For instance, topology has always been the most fundamental element of the network infrastructure, and routers are the primitive building blocks of an NoC network. Flow control and routing are the two key elements to improve network performance. Application mapping has a key implication on the system performance in NoC. In recent years, emerging technology has been a driver for NoC research. In the following, we will examine the relationships of these six subjects through the lens of the subject citation network. When paper a cites paper b, we say that knowledge flows from b to a. Note that the direction of knowledge flow is opposite to that of the citation direction (as shown in Figure 11 (a)). Because the number of papers falling in one subject can vary significantly from that of papers in another subject, it can be misleading to simply use the number of citations between subjects to represent the knowledge flows. For example, suppose there are 100 articles in subject A and 10 in subject B, and articles in A reference 9 papers in B, while articles in B reference 10 papers in A. In this case, it could be wrong to conclude that the knowledge flow from A to B is greater than that from B to A. In order to eliminate factors that might lead to inaccurate knowledge flow assessments (for example, each subject has a different number of papers), the statistical significance of each citation relationship has to be verified with respect to a null model.
In Figure 11 (b), we use a simple subject network with three subjects A, B, and C to illustrate the null model and the statistically significant network. Assume the papers in subjects A, B, and C reference $a_1$, $a_2$, and $a_3$ papers, respectively. $a_{13}$ through $a_{15}$ respectively represent the numbers of citations received by papers in subjects A, B, and C. $a_4$ through $a_{12}$ are the numbers of citations between each pair of subjects A, B, and C; for example, $a_5$ is the number of citations from papers in subject A to those in subject B. The total citation count $T_c$ is the total number of references, given as $a_1 + a_2 + a_3$, which is also equal to the total number of citations received by papers in these three subjects, counted as $a_{13} + a_{14} + a_{15}$. We thus have

$$T_c = a_1 + a_2 + a_3 = a_4 + a_5 + a_6 + a_7 + a_8 + a_9 + a_{10} + a_{11} + a_{12} = a_{13} + a_{14} + a_{15} \tag{1}$$

We also define the probability that a paper from subject A cites one from subject B. So, in the statistically significant network, we have:
We then consider a null model in which the papers published in subject X randomly select papers as their references, regardless of which subject they belong to. Let $X_{citing}$ be the subject whose papers cite those in other subject(s) and $Y_{cited}$ be the subject whose papers are cited by other subject(s). Hence the probability in the null model can be written as follows:
The knowledge flow metric F [73] is defined as follows.
B → A indicates that knowledge flows from B to A. F = 1 is adopted as the critical threshold to decide whether the knowledge flow from subject B to subject A is statistically significant. F > 1 means that subject A is more likely to have drawn knowledge from subject B than would be expected at random. The F values of the subjects in 2020 are reported in Table 4. Figure 12 shows a sub-graph of the subject citation network that contains the six most influential subjects, namely, topology, emerging technology, mapping, flow control, router design, and routing. In the figure, each node represents a subject, and a directed link from node A to node B is established if knowledge flows from A to B. All nodes are drawn at the same size, and the weight of an edge corresponds to the value of F. One can see that:
1) For each subject, self-citation is much stronger than citation across different subjects. For example, emerging technology shows a very strong self-citation (F = 2.4), indicating that this subject tends to rely on the knowledge contributed by publications from the same subject.
2) The knowledge flows between a pair of subjects are nearly symmetric, with a maximum difference of only 0.2. That is, the contribution of subject A to B and that of B to A are almost equal.
3) By removing the edges whose F values are below 1, the six subjects can be further divided into two communities, as in Figure 12 (b). That is, emerging technology, mapping, and topology form one community, and routing, router design, and flow control form the other.
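The comparison between observed and random citation behavior can be illustrated with a minimal sketch. It assumes, for illustration only, that the observed probability of subject X citing subject Y is the share of all citations that go from X to Y, that the null-model probability is the product of X's share of all references and Y's share of all received citations, and that F is the ratio of the two. These expressions, the function name `knowledge_flow`, and the citation counts are assumptions of this sketch and are not necessarily the exact definitions of [73].

```python
def knowledge_flow(citations):
    """Return F[citing][cited] for a nested dict of citation counts.

    citations[x][y] is the number of citations from papers in subject x
    to papers in subject y. Assumed forms (illustrative, not from the paper):
      observed = citations[x][y] / T_c
      null     = (references made by x / T_c) * (citations received by y / T_c)
      F        = observed / null
    """
    subjects = list(citations)
    refs_made = {x: sum(citations[x].values()) for x in subjects}
    cites_received = {y: sum(citations[x][y] for x in subjects) for y in subjects}
    total = sum(refs_made.values())  # T_c, the total citation count
    flow = {}
    for x in subjects:
        flow[x] = {}
        for y in subjects:
            observed = citations[x][y] / total
            null = (refs_made[x] / total) * (cites_received[y] / total)
            flow[x][y] = observed / null
    return flow


# Hypothetical counts: A is a large subject, B is a small one.
counts = {
    "A": {"A": 50, "B": 9},
    "B": {"A": 10, "B": 20},
}
F = knowledge_flow(counts)
for citing in counts:
    for cited in counts:
        # Knowledge flows from the cited subject to the citing subject.
        print(f"{cited} -> {citing}: F = {F[citing][cited]:.2f}")
```

With these hypothetical counts, both cross-subject values fall below 1 and both self-citation values exceed 1, which the raw counts of 9 and 10 alone would not reveal.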

C. NUMBER OF NoC PUBLICATIONS OF THE SUBJECTS EACH YEAR
All the NoC publications are grouped into three time periods, i.e., 2000-2007 (period 1), 2008-2013 (period 2), and 2014-2020 (period 3). A subject may evolve following one of three trends:
• Rising: the number of publications in a subject increases from period i to period i + 1.
• Declining: the number of publications in a subject decreases from period i to period i + 1.
• Stable: the number of publications in a subject remains flat over two consecutive periods.
Figure 13 (a) shows the number of publications falling into the subjects at the top level (level 1) of the NoC subject tree (see Figure 4). In terms of the number of publications, a subject at any level follows a similar pathway: starting from a low number, the publication count climbs to a peak and then goes down. As the networking layer has always been the center of the NoC architecture, the publications in this subject have consistently outnumbered those in other NoC subjects in each period. It is worth noting that emerging technology has soared and remained strong in recent years, making it an important driving force that pushes the envelope of NoC research. Figure 13 (b) shows the numbers of publications of the subjects under the parent node of networking layer (level 2) in the NoC subject tree. In the networking layer, deadlock, modeling, flow control, and topology are the main research subjects. Most of these subjects saw an uptick from period 1 to period 2 and a decline from period 2 to period 3, with their percentages virtually unchanged. Figure 13 (c) shows the number of publications of the subjects under the subject topology (level 3) in the NoC subject tree. Direct network is the subject with the most publications under topology, and its publication number is stable across all three periods, while the subjects application-specific and bus declined rapidly from period 2 to period 3. 3D NoC appeared in period 2 and has remained stable up to 2020.
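The three trend labels can be expressed as a short sketch. The text does not state how flat a count must be to qualify as stable, so the 5% relative tolerance, the function name `classify_trend`, and the sample counts below are all hypothetical.

```python
def classify_trend(count_prev, count_next, tolerance=0.05):
    """Label the change between two consecutive periods.

    tolerance is a hypothetical relative band within which the
    publication count is treated as flat (stable).
    """
    if count_prev == 0:
        # A subject that first appears in the later period is rising.
        return "rising" if count_next > 0 else "stable"
    change = (count_next - count_prev) / count_prev
    if change > tolerance:
        return "rising"
    if change < -tolerance:
        return "declining"
    return "stable"


# Hypothetical publication counts of one subject in periods 1, 2, and 3.
counts = [40, 95, 60]
labels = [classify_trend(a, b) for a, b in zip(counts, counts[1:])]
print(labels)
```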
A closer look at the history of the NoCS conference helps explain why NoC research reached its peak in 2008-2013 and started to decline from 2014. The inaugural NoCS conference was held in 2007, right before NoC research hit its peak. Between 2008 and 2013, NoCS benefited from the uptick of NoC research and was considered one of the main vehicles for publishing and presenting NoC research, with a good number of publications. However, in line with the decline of NoC research after its peak in 2013, NoCS received a smaller number of submissions, and eventually this shortfall forced NoCS to become part of the ESWEEK conference in 2017. Except for a small rebound in the number of publications in 2020 (Figure 1), the overall trend is that the conventional NoC subjects continue to decline in significance and popularity among researchers, while there is sustained interest in emerging technology and new subjects like chiplet and security.

D. CONTRIBUTIONS OF THE EMERGING TECHNOLOGY
NoCs built upon emerging technologies, e.g., silicon photonics, wireless, carbon-nanotube-based antennas, inductive coupling between vertical layers in 3D ICs, and transmission lines, are being investigated with noticeable progress. These new technologies are largely driven by a pressing demand for much greater bandwidth, extremely low power consumption, and superb scalability, which current interconnect technologies can hardly deliver. For instance, a chiplet/2.5D/wafer-scale system can integrate a sea of cores or memory vaults (up to 850,000 AI cores in the Cerebras wafer-scale system) into one single package. Many AI applications exhibit heavy one-to-many traffic, and photonic and wireless NoCs can natively support multicast or broadcast traffic. By combining an optical NoC with components that perform optical computing, it is even possible to see the emergence of all-optical chip systems with extremely low power consumption. Many of these new technologies can benefit more than just intra-chip networked communications. There is a growing trend to build hierarchical networks that link the on-chip, inter-chiplet, and inter-node levels together for HPC systems. In fact, quite a few HPC systems already use optical connections between compute nodes. Silicon photonics can pave the way for an all-optical interconnection network that spans the on-chip to inter-node levels. In addition, wireless NoC enables short-distance communication among HPC compute cards, which can reduce the wiring complexity and/or cost in HPC networks.

VII. EVOLUTION OF THE NoC CO-AUTHORSHIP NETWORKS AND AUTHOR BEHAVIORS
In this section, we study both the NoC co-authorship network and the author behaviors. The co-authorship network is an undirected network, where each node represents an author, and an edge is established between two authors if they have collaborated on at least one publication. The weight of a node reflects the contribution of an author to the NoC field and is defined as the total weighted number of articles published by this author. The contribution of an author to a paper is inversely proportional to his/her position in the author list, i.e., the i-th author has a contribution weight of 1/i. The weight of an edge corresponds to the number of joint publications between the two authors. Figure 14 shows the NoC co-authorship networks created for Y2005, Y2010, Y2015, and Y2020. The different communities in each of these networks are color-coded. Table 5 shows the clustering coefficient, network diameter, average (weighted) degree, modularity, and the percentage of the giant connected component of the co-authorship network. The modularity of the network is always very high (close to 0.9), indicating that the NoC co-authorship network has a very significant community structure. That is, NoC authors tend to collaborate with peers from the same research community. In contrast to the weak citation relationship seen in the citation network, the NoC co-authorship networks have much higher clustering coefficients (almost ten times those of the citation network), which means that NoC researchers have relatively dense connections with each other. Note that many of the authors in our dataset were students at the time they published their papers. The majority of them left academia after graduation and stopped collaborating with other authors, leading to a high network diameter (mostly greater than 10) and a low average (weighted) degree (smaller than 5) in the NoC co-authorship network.
Both the increased average degree and clustering coefficient and the merging of small communities into larger ones over the four milestone years (further detailed in Table 18 in the Appendix) suggest that there is frequent collaboration among the authors.
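The construction of the weighted co-authorship network described above can be sketched as follows. This is a minimal illustration rather than the released tooling: the function name `build_coauthorship` and the author lists are hypothetical, while the 1/i node weighting and the joint-publication edge weighting follow the definitions in the text.

```python
from collections import defaultdict
from itertools import combinations


def build_coauthorship(papers):
    """Build node and edge weights from a list of ordered author lists.

    Node weight: sum of 1/i over the papers of an author, where i is the
    author's position in the author list.
    Edge weight: number of joint publications of the two authors.
    """
    node_weight = defaultdict(float)
    edge_weight = defaultdict(int)
    for authors in papers:
        for position, author in enumerate(authors, start=1):
            node_weight[author] += 1.0 / position
        # The network is undirected, so each pair is stored in sorted order.
        for a, b in combinations(sorted(set(authors)), 2):
            edge_weight[(a, b)] += 1
    return dict(node_weight), dict(edge_weight)


# Hypothetical author lists, one per paper, in author order.
papers = [
    ["alice", "bob"],
    ["bob", "alice", "carol"],
]
node_weight, edge_weight = build_coauthorship(papers)
print(node_weight)
print(edge_weight)
```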

A. NETWORK STRUCTURE AND DYNAMICS OF THE NoC CO-AUTHORSHIP NETWORK
Furthermore, we apply the method in [74] to check whether the NoC co-authorship network is a small world network. Let N, k, and d be the number of nodes, the average degree, and the average network diameter, respectively. If a network satisfies d ≈ ln(N)/ln(k), it can be regarded as a small world network [74]. In our case, N, k, and d are 2184, 3.106, and 5.85, respectively, in 2020, and ln(N)/ln(k) is approximately 6.8, so d is close to ln(N)/ln(k). We hence conclude that the NoC co-authorship network is indeed a small world network, in agreement with most research findings. In a small world network, most nodes are not adjacent to each other, but the neighbors of any given node are likely to be neighbors of each other, and most nodes can be reached from any other node within very few steps. This indicates that the distance between two randomly selected nodes (authors) in this co-authorship network is very short.
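The small world check reduces to a one-line computation, sketched below with the 2020 values quoted in the text. The function name `small_world_reference` is hypothetical, and the text does not specify how close d must be to ln(N)/ln(k), so the sketch only reports the relative gap.

```python
import math


def small_world_reference(num_nodes, avg_degree):
    """Return ln(N) / ln(k), the reference distance used in the check."""
    return math.log(num_nodes) / math.log(avg_degree)


# 2020 values quoted in the text: N, k, and d.
n_nodes, avg_degree, avg_distance = 2184, 3.106, 5.85
reference = small_world_reference(n_nodes, avg_degree)
relative_gap = abs(avg_distance - reference) / reference
print(f"ln(N)/ln(k) = {reference:.2f}, relative gap to d = {relative_gap:.1%}")
```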

B. COLLABORATION PATTERNS AND CROSS-COMMUNITY COLLABORATION PHENOMENON
In this section, the collaborative patterns and interesting cross-community collaborations in the co-authorship network are investigated, as shown in Figure 15 and Tables 6-8.
The NoC co-authorship network shows salient collaborative patterns, as in Figure 15. A node is referred to as a small, medium, or big node if its weight is less than 5, between 5 and 10, or greater than 10, respectively. Three collaborative patterns are found: 1) collaboration pattern A, which consists of a big node (a person with a large number of publications), a medium node (a person with a moderate number of publications), and several small nodes (persons with few publications); 2) collaboration pattern B, which consists of a medium node and small nodes; and 3) collaboration pattern C, which involves a mixture of several medium nodes and small nodes. Collaboration pattern A corresponds to an academic group with one prolific leader, one rising young scholar, and several students or postdoctoral fellows. Collaboration pattern B is drawn from a smaller group with one faculty member and some students, while pattern C indicates collaboration between research groups with a similar composition of faculty and students. Table 6 reports the numbers of the three collaborative patterns. Note that there are 9 big nodes and 38 medium nodes in our dataset, representing the most experienced and influential authors in the NoC field.
The cross-community collaboration frequencies of different types of nodes and edges are also measured. The cross-community collaboration frequency of an edge type is defined as the ratio of the number of collaborations between the end nodes of the cross-community edges of that type to the total number of cross-community edges of that type; the six edge types are listed in Table 7. The cross-community collaboration frequency of a node type is defined as the ratio of the number of participations of nodes of that type in cross-community collaborations to the total number of nodes of that type, where the node types are big, medium, and small. In Table 7, the frequency of cross-community collaboration between big nodes is 1.57, meaning that two big nodes belonging to two different communities collaborate 1.57 times on average. On the other hand, as seen from Table 8, the cross-community collaboration frequency of big nodes is 10.67, meaning that a big node collaborates 10.67 times on average with authors in other communities. One can see that: (1) The cross-community collaboration frequencies of the edges are greater (>1.5) if their node types are medium or big, indicating that if a big/medium node collaborates with another big/medium node, there is a high probability that they will cooperate multiple times. (2) The collaboration frequency increases rapidly as the node size grows, indicating that a bigger node finds it easier to collaborate with others and plays a more important role in cross-community collaboration than a small one.
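One possible reading of the two frequency definitions is sketched below. The node-size thresholds come from the text, with the boundaries 5 and 10 assumed to be inclusive for medium nodes. Counting a node's participation as the summed weight of its cross-community edges is an assumption of this sketch, and the function names, community labels, and data are hypothetical.

```python
from collections import defaultdict


def node_type(weight):
    """Size class of a node; boundaries at 5 and 10 assumed inclusive for medium."""
    if weight < 5:
        return "small"
    if weight <= 10:
        return "medium"
    return "big"


def cross_community_frequencies(node_weight, community, edge_weight):
    """Return (edge_freq, node_freq) computed over cross-community edges only."""
    types = {author: node_type(w) for author, w in node_weight.items()}
    edge_sum, edge_count = defaultdict(int), defaultdict(int)
    node_sum, type_count = defaultdict(int), defaultdict(int)
    for (a, b), w in edge_weight.items():
        if community[a] == community[b]:
            continue  # skip collaborations inside one community
        key = tuple(sorted((types[a], types[b])))
        edge_sum[key] += w
        edge_count[key] += 1
        node_sum[types[a]] += w
        node_sum[types[b]] += w
    for t in types.values():
        type_count[t] += 1
    edge_freq = {k: edge_sum[k] / edge_count[k] for k in edge_count}
    node_freq = {t: node_sum[t] / type_count[t] for t in type_count}
    return edge_freq, node_freq


# Hypothetical network: four authors in two communities (0 and 1).
node_weight = {"p": 12, "q": 11, "r": 6, "s": 1}
community = {"p": 0, "q": 1, "r": 0, "s": 1}
edge_weight = {("p", "q"): 2, ("p", "r"): 3, ("q", "r"): 1, ("r", "s"): 1}
edge_freq, node_freq = cross_community_frequencies(node_weight, community, edge_weight)
print(edge_freq)
print(node_freq)
```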

C. AUTHOR BEHAVIORS
We study the career lifetime of the authors, i.e., the interval between the year an author published his/her very first NoC paper and the year he/she published his/her most recent paper. Figure 16 (a) shows the distribution of the career lifetime of all the authors, which follows a power law distribution (with R^2 of 0.9961). The maximum career lifetime of NoC authors is set to be 20 years, which is the same as the life cycle of NoC. Note that only a handful of authors have such a long career lifetime, and most of the authors actually have a career lifetime of only 1 or 2 years. The reason is that these authors were students who did not remain in academia after publishing a few papers and graduating. Figure 16 (b) shows the average number of articles published by each author per career year. One can see that, for each author, the number of publications is 0.9 per year on average. These authors are most productive in their first career year and least productive in their 19th career year.
Figure 1 (b) shows the number of new authors per year. One can see that NoC reached its peak in attracting new authors around 2008 to 2010. After that, fewer authors entered this area, which can be mainly attributed to the fact that NoC was becoming mature and fewer authors felt they could contribute to the study of NoC. Another possible reason is that new topics like neural network architecture or security were becoming more appealing, drawing researchers' attention toward more rewarding research. Figure 17 and Table 9 further show the subjects chosen by the new authors for their first papers. In 2015-2016, new authors were more likely to choose emerging technology and flow control as the starting points for their careers. More recently, however, fewer people are choosing these subjects. Instead, network in interposer (a.k.a. chiplet systems), topology, and router design are being chosen by more new authors, and these subjects are widely accepted as the frontiers of the NoC area.
To investigate whether new authors are more likely to track the "hot" research subjects, we use Kendall's tau rank correlation coefficient to estimate how the hotness rank of the subjects selected by new authors in a given year correlates with the hotness rank of the subjects over all the papers published in the same year (Table 10). Here, the hotness (or popularity) of a subject is measured by the total number of publications in that subject, or by the number of new authors entering that subject. One can see that the correlation coefficients are very high. However, when choosing their research subjects, new authors can only be drawn to the subjects that were hot in terms of the articles published in the previous year; for example, an author who had a publication in 2010 could only track the hot subjects of 2009, as he/she could not foresee the hot subjects of 2010. Table 11 therefore shows the Kendall's tau rank correlation coefficient obtained by comparing the ranks of the subjects selected by the new authors with the hot subject ranking of all publications in the previous years. One can see that this correlation is very low, especially in recent years. Therefore, we conclude that new authors do not show obvious behaviors of tracking hot subjects; instead, they are contributing to the hot subjects each year.
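The same-year and previous-year comparisons can be sketched with a small pure-Python implementation of Kendall's tau. This sketch uses the tau-a variant, in which tied pairs count as neither concordant nor discordant; the text does not state which variant was used, and the subject counts below are hypothetical.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a between two equally long sequences of scores."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / pairs


# Hypothetical per-subject counts, listed in the same subject order.
subjects = ["topology", "routing", "flow control", "router design"]
new_authors_2010 = [12, 9, 7, 3]    # first papers of new authors in 2010
all_papers_2010 = [40, 31, 22, 15]  # all papers published in 2010
all_papers_2009 = [20, 35, 30, 10]  # all papers published in 2009

same_year = kendall_tau(new_authors_2010, all_papers_2010)
previous_year = kendall_tau(new_authors_2010, all_papers_2009)
print(f"same year: {same_year:.2f}, previous year: {previous_year:.2f}")
```

Because the coefficient depends only on the ordering of the subjects, passing raw counts gives the same result as passing their ranks.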

VIII. ENVISIONED EVOLUTION OF PUBLICATIONS, SUBJECTS, AND CO-AUTHORSHIPS BEYOND 2020
From the analysis presented in previous sections, we are ready to make a prediction of the evolution of publications, subjects, and co-authorship in NoC research going forward.
1) Evolution of publications: As discussed earlier, NoC has been considered a mature research area since 2013, and the citation network is becoming more densely connected. Our analysis has also indicated that communities do not show a strong correspondence with a single subject. These results can be extrapolated to make the following predictions of the NoC evolution: i) Citations among the publications will be more frequent due to the maturity of the topic. ii) The communities will continue to span more than one subject, and publications will likely touch upon more than one NoC subject. iii) Self-referencing in NoC will become more pronounced as the NoC area grows mature and diversified.
2) Evolution of subjects: Based on the observations presented in Section VI, we envision the following: i) The NoC area will mature further, but new topics will continue to emerge and tend to expand the breadth of the NoC area. ii) The "hot" NoC research subjects will have a much shorter lifespan, typically lasting only 2-3 years. The six most influential subjects (topology, routing, flow control, router design, mapping, and emerging technology), however, will continue to dominate the research map and keep their popularity. iii) New topics, such as package-level networks, security, NoC for neural network (NN) accelerators, and NoC empowered by emerging technologies (e.g., silicon photonics), will likely gain traction. Many future research works are expected to span multiple subject areas. For instance, we are already witnessing photonic interconnection networks tailored for connecting chiplets to build large-scale NN accelerators. iv) Going forward, the evolution of the six most influential subjects and the emerging topics needs to take many practical issues into consideration. For example, the topology and router of a network on an interposer should consider the pin count constraint of each chiplet, and the network topology for applications like an NN accelerator should be tailored to its unique data flow patterns and requirements.
3) Evolution of co-authorships: Based on our analysis in Section VII, we envision the following: i) The willingness of authors to collaborate will increase, and authors who previously collaborated will be more likely to continue their collaboration in the near future. ii) As the collaboration network becomes denser, there will be fewer isolated nodes (i.e., authors who have not collaborated with others). iii) New and more collaboration patterns will emerge due to the diversified interests of the researchers. iv) New authors will contribute more to NoC publications, especially in the hot topics.

FIGURE 17. The subjects chosen by the new authors.

IX. CONCLUSION
Between 2000 and 2020, the NoC area has gone from startup, to growth and shakeout, to maturity. In this paper, we used the complex network approach to visualize and quantify NoC evolution over these 20 years. Specifically, we built the citation network, subject citation network, and co-authorship network, and for each of them, we analyzed the network structure and dynamics (evolution). The main findings of this paper are summarized as follows: 1) As time goes by, the citation network, subject citation network, and co-authorship network gain more nodes and links, and their community structure becomes more salient.
Will NoC ever come back strong in the near future and reach a second or even a third peak? This remains an open question that can only be answered with continued analysis of the trends, following the proposed methodology and using the tools created for this study. As one can readily expand our dataset with more publications from other conferences and leading journals, and bring in additional research subjects like memory system design, accelerators, etc., for analysis and prediction purposes, more meaningful and influential results can be obtained to meet specific research and development needs.
Last but not least, the data analytical tools are released at https://github.com/FCAS-SCUT/Science_of_Science_NoC, and the website for all the data plots is available at https://www.sci-sci.com. Researchers, not limited to those working in the NoC area, can adopt this methodology to understand and gauge all scientific research areas of their interest and make sound predictions about the future.

A. THE PROPOSED SUBJECT LABELING ALGORITHM
In order to automatically select a subject or subjects for each paper, we have developed an algorithm based on supervised learning. Part of speech, word frequency, location (where a word first appears in an article), an external feature (word vectors trained with GloVe [75]), and tf-idf are selected as the features. SVM is used as the classification algorithm. Half of the labeled publications are used as the training dataset, and the rest are used for inference. To improve the algorithm performance, a synonym table is used to record synonyms (e.g., NoC, networks-on-chip, and network-on-chip are included in the table as having the same meaning). A dictionary composed of the leaf nodes in the subject tree is also built. The proposed algorithm works as follows. It first extracts the candidate subject set by scanning the title, abstract, keywords, and full text of a paper and matching them against the entries in the synonym table and the dictionary. Next, SVM is used to choose the final subjects from the candidate set.
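The first stage of the algorithm, candidate extraction by string matching, can be sketched as follows. Only the three NoC spellings come from the text; the canonical form "noc", the three dictionary entries, and the function names are hypothetical, and the SVM stage that selects the final subjects is not shown.

```python
import re

# Synonym table mapping spelling variants to one canonical form.
SYNONYMS = {
    "networks-on-chip": "noc",
    "network-on-chip": "noc",
    "noc": "noc",
}
# Hypothetical subset of the leaf nodes of the subject tree.
DICTIONARY = {"router design", "routing", "topology"}


def normalize(text, synonyms):
    """Lowercase the text and replace every synonym by its canonical form."""
    text = text.lower()
    # Longer variants are replaced first so that they are not split up.
    for variant in sorted(synonyms, key=len, reverse=True):
        pattern = r"\b" + re.escape(variant) + r"\b"
        text = re.sub(pattern, synonyms[variant], text)
    return text


def candidate_subjects(text, dictionary, synonyms):
    """Return the dictionary entries that occur as whole phrases in the text."""
    normalized = normalize(text, synonyms)
    return {
        subject
        for subject in dictionary
        if re.search(r"\b" + re.escape(subject) + r"\b", normalized)
    }


title = "A low-latency router design with XY routing for Networks-on-Chip"
candidates = candidate_subjects(title, DICTIONARY, SYNONYMS)
print(sorted(candidates))
```

For this hypothetical title, both router design and routing end up in the candidate set, which is why a second stage is needed to pick the final subject.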
Three unsupervised learning algorithms (KNN-based clustering [76], LDA [77], and TextRank [78]) and a supervised learning algorithm are compared against the proposed subject labeling algorithm. KNN-based clustering, LDA, and TextRank can extract keywords from a document without going through training. The supervised learning algorithm applies SVM directly to the title, abstract, keywords, and full text of each paper, without the help of the dictionary and the synonym table. Half of the labeled publications are reserved for training, and the rest for inference, in the supervised learning algorithm. Tables 12 and 13 show the results of the unsupervised and supervised learning algorithms. Generally, these algorithms lead to unsatisfactory results for the reasons below.

[17] is a pioneering work in NoC. [42] is a milestone NoC design. [43] reports an NoC used in a commercial processor chip from Intel. Two NoC simulators widely used in academic research are detailed in [40] and [41], respectively.
• Unsupervised learning is biased toward extracting the keyphrases of each paper, which are not necessarily the same as the research subject. For example, in the case of the paper [79], TextRank outputs "SMART, SMART++, and multi-hop". The reason is that this article coins the name SMART for the proposed router architecture, and the algorithm mistakenly takes SMART as an NoC subject instead of "router design" as expected.
• Supervised learning also has a low precision, since there are many sources of noise. For example, a paper on router design that uses XY routing may be incorrectly labeled as a paper on NoC routing.
• Our proposed scheme achieves the best performance, as it takes advantage of both the NoC subject tree and supervised learning. The candidate subjects are first generated by scanning the title, abstract, keywords, and full text with string matching. Supervised learning is then applied on top of the candidate subjects to minimize the effects of noise.

B. EFFICIENCY TEST OF PREDICTION MODEL
We use an autoregressive model of order n to predict the number of papers to be published in year t, denoted as $X_t$. $X_t$ is estimated from the publication numbers over the preceding n years ($X_{t-n}, X_{t-n+1}, \ldots, X_{t-1}$):

$$X_t = \sum_{i=1}^{n} \beta_i X_{t-i} + \epsilon_t$$
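As a rough illustration of this kind of autoregressive prediction, the sketch below fits an order-n model without an intercept term and forecasts the next value. It uses ordinary least squares solved by Gaussian elimination in place of maximum likelihood estimation, and the publication counts are synthetic; both choices, as well as all function names, are assumptions of this sketch.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ar(series, n):
    """Least-squares coefficients beta_1..beta_n of an order-n AR model."""
    rows = [[series[t - i] for i in range(1, n + 1)] for t in range(n, len(series))]
    targets = [series[t] for t in range(n, len(series))]
    # Normal equations: (A^T A) beta = A^T y
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    aty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(n)]
    return solve(ata, aty)


def predict_next(series, betas):
    """One-step forecast: sum of beta_i times the value i steps back."""
    return sum(b * series[-i] for i, b in enumerate(betas, start=1))


def relative_error(predicted, actual):
    return abs(predicted - actual) / actual


# Synthetic yearly publication counts (not the paper's data).
history = [5, 12, 20, 35, 50, 64, 80, 95, 110, 118,
           121, 115, 108, 99, 92, 88, 85, 83, 80, 82]
train, actual = history[:-1], history[-1]
betas = fit_ar(train, n=2)
forecast = predict_next(train, betas)
print(f"forecast = {forecast:.1f}, relative error = {relative_error(forecast, actual):.3f}")
```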
In this autoregressive model, the $\beta_i$ are the regression coefficients and $\epsilon_t$ is the random error in year t. The model is trained and fitted using the maximum likelihood estimation method [80] to minimize the regression error. To verify the accuracy of the model, we use the numbers of publications in years 2001-2019 to train the model and measure the error of forecasting the number of publications in 2020. The relative error is defined as $\delta = |\hat{X}_{2020} - X_{2020}| / X_{2020}$, where $\hat{X}_{2020}$ is the predicted number of publications and $X_{2020}$ is the actual number. Table 14 shows the relative errors of the prediction model. One can see that the model produces fairly accurate predictions.

AMIT KUMAR SINGH (Member, IEEE) received the B.Tech. degree in electronics engineering from the Indian Institute of Technology (Indian School of Mines), Dhanbad, India, in 2006, and the Ph.D. degree from the School of Computer Engineering, Nanyang Technological University (NTU), Singapore, in 2013. He was with HCL Technologies, India, for a year and a half before starting his Ph.D. at NTU in 2008. He worked as a Postdoctoral Researcher with the National University of Singapore (NUS), from 2012 to 2014, and at the University of York, U.K., from 2014 to 2016. He is currently working with the University of Essex. His current research interests include system-level design-time and run-time optimizations of 2D and 3D multi-core systems with a focus on performance, energy, temperature, and reliability. He has published over 45 papers in the above areas in leading international journals/conferences.

TERRENCE MAK is currently an Associate Professor of electronics and computer science with the University of Southampton. Previously, he worked with Turing Award holder Prof. Ivan Sutherland at Sun Lab, El Monte, CA, USA, and was named a Croucher Foundation Scholar. His newly proposed approaches, using runtime optimization and adaptation, strengthened network reliability, reduced power dissipation, and significantly improved overall on-chip communication performance. A spectrum of novel methodologies, including regulating traffic dynamics using networks-on-chip, enabling unprecedented MTBF, providing better on-chip efficiencies, and a novel garbage collection method (defragmentation), together led to three prestigious best paper awards at DATE 2011, IEEE/ACM VLSI-SoC 2014, and IEEE PDP 2015, respectively. He has published more than 100 papers in both conferences and journals and jointly published four books.

MEI YANG received the Ph.D. degree in computer science from The University of Texas at Dallas, in August 2003. Since 2016, she has been a Full Professor with the Department of Electrical and Computer Engineering, University of Nevada, Las Vegas. Her research interests include computer architectures, networking, and embedded systems.