A Hypergraph Approach for Estimating Growth Mechanisms of Complex Networks

Temporal datasets that describe complex interactions between individuals over time are increasingly common in various domains. Conventional graph representations of such datasets may lead to information loss since higher-order relationships between more than two individuals must be broken into multiple pairwise relationships in graph representations. In those cases, a hypergraph representation is preferable since it can preserve higher-order relationships by using hyperedges. However, existing hypergraph models of temporal complex networks often employ some data-independent growth mechanism, which is the linear preferential attachment in most cases. In principle, this pre-specification is undesirable since it completely ignores the data at hand. Our work proposes a new hypergraph growth model with a data-driven preferential attachment mechanism estimated from observed data. A key component of our method is a recursive formula that allows us to overcome a bottleneck in computing the normalizing factors in our model. We also treat an often-neglected selection bias in modeling the emergence of new edges with new nodes. Fitting the proposed hypergraph model to 13 real-world datasets from diverse domains, we found that all estimated preferential attachment functions deviates substantially from the linear form. This demonstrates the need of doing away with the linear preferential attachment assumption and adopting a data-driven approach. We also showed that our model outperformed conventional models in replicating the observed first-order and second-order structures in these real-world datasets.


I. INTRODUCTION
Network modeling, a notable application of graph theory, can reveal static and dynamic natures of interactions between individuals in various real-world complex systems [1]- [3]. However, in some data domains, there is an information loss in simplifying the interaction of complex systems with graphs: we implicitly break group interactions of three or more individuals into independent pairwise interactions. For example, in scientific co-authorship data, papers may be written by more than two researchers. The co-authorship of such papers is decomposed into pairwise co-authorship when the data is represented by graphs. This inability to preserve higher-order interactions is a serious limitation of graph representations. We can address this problem by replacing graphs with hypergraphs [4]. Using hypergraphs, the co-authorship of each paper can be represented by one hyperedge, regardless of how many authors the paper has. This preserves the collectivity of co-authorship [5]. This study aims to examine hypergraph growth models that can capture the dynamics of higher-order interactions in temporal data.
A hypergraph consists of a set of nodes and a set of hyperedges. A hyperedge contains an arbitrary number of nodes, whereas an edge in ordinary graphs contains only two nodes. The number of nodes that a hyperedge contains is referred to as the size of the hyperedge. We note that this size can take different values for each hyperedge. For a node in a hypergraph, the number of hyperedges containing it is called its "hyperdegree", whereas the number of edges connected to a node in ordinary graphs is called its "degree". In hypergraph representations of co-authorship data, each node and each hyperedge represents one researcher and the coauthorship of one paper, respectively. The size of a hyperedge indicates the number of co-authors of the corresponding paper, and the hyperdegree of a node corresponds to the number of papers the corresponding researcher has written in the past. Hypergraphs have been applied successfully in a wide variety of domains, including recommender systems [6], bioinformatics [7], classification [8], clustering [9], and document retrieval [10].
Although there have been many attempts in complex network theory to model the growth of interactions in temporal data using graph representations, there is little existing research on hypergraph-based growth models. One of the most well-known growth mechanisms is preferential attachment (PA) [11]. PA is a "rich-get-richer" mechanism that can provide a compelling explanation of the heavy-tailed degree distributions appeared in many real-world networks. In this mechanism, the probability a node will get new edges at some time-step is proportional to its degree, i.e., the number of edges connected to the node up to that time-step. In case of temporal graph-based models, several models and estimation methods have been proposed for various growth mechanisms, including PA [12]- [16].
Of the few existing works on hypergraph-based growth models, most employ a data-independent growth mechanism which is the linear preferential attachment [17]- [19]. This prespecification of the growth mechanism risks over-simplifying the potentially complex interactions in real-world datasets. In fact, existing works that employed graph-based models suggested that the PA mechanism in real-world temporal graphs is hardly linear [13].
In this work, we propose a hypergraph growth model with a data-driven PA mechanism that can be estimated from observed data. Whereas graph-based PA mechanisms are defined on the degree of a node, our hyper-graph based PA mechanism is defined on the hyperdegree of a node. In our PA mechanism, a node with hyperdegree , i.e., the node is contained in hyperedges, will belong to a new hyperedge with probability proportional to , the preferential attachment kernel. For example, the linear model is specified as = . The exact form of is estimated from observed data.
The contributions of this paper can be summarized as follows: • We propose a novel hypergraph-based growth model with a non-parametric PA kernel. What we mean by "non-parametric" is that the tunable parameter is without specific functional forms. Since the model is invariant to the scale of , we may set 1 = 1 without loss of generality. In most existing works on hypergraph-growth models, the linear PA kernel = is assumed. Such unfounded pre-specification of the growth mechanism completely ignores the data at hand. In contrast, in our model, the PA kernel is entirely free of assumptions. We stress that our non-parametric PA kernel is more flexible than the one-parameter kernel = , which is often employed in graph-based growth models but not used in any existing hypergraphbased models. We provide a method to estimate from the data each value of for each observed hyperdegree . Specifically, we employ maximum likelihood estimation of for this task and derive a recursive formula that significantly reduces the computation cost of the likelihood function of our model. We are currently preparing a publicly available software of the proposed method. • We provide a new approach to treat a selection bias that arises in modeling the emergence of new hyperedges with new nodes. Since parameter estimation in hypergraph-based growth models has not been considered, there is no existing work on this bias in hypergraph settings. In conventional graph-based growth models, in order to remove this bias, new edges are often removed from calculations of the log-likelihood function. However, a similar approach of removing hyperedges from calculations of the log-likelihood function would discard too much information, since the typical number of hyperedges with new nodes can be high in real-world datasets. In our method we use conditional probabilities in order to treat the selection bias in a principle way. We note that this approach can also be applied to graph-based growth models. • We fit our proposed model to 13 real-world datasets that can be divided into five categories: scientific coauthorship datasets, online thread participants datasets, online tagging datasets, national drug code directory datasets, and email datasets. We show that our proposed hypergraph PA model was better in replicating the observed data comparing with conventional graphbased models. When one considers replications of the observed distributions of local clustering coefficients, the proposed hypergraph outperformed conventional models in all 13 datasets. When one considers replications of the observed distributions of the number of triplets, the proposed model provided the best fit in seven datasets, including all co-authorship networks. These findings confirm the importance of considering the collectivity of edges in modeling temporal complex data.
Some words are needed to bound the scope of our proposed hypergraph growth model. While our model does not allow the deletion of nodes and edges in the temporal network, it is indeed natural for nodes or edges to disappear in some network types. For example, an author may become inactive in co-authorship networks, while in relationship networks a relationship edge may be dissolved after some years. Our model also assumes that the PA function does not change with time. However, in co-authorship networks it is not unreasonable to expect that yearly advancements in communication and transportation technologies, which potentially ease how collaborations are formed and maintained, may make the PA function change with time. Even for those types of networks, the proposed growth model can still be a viable approximation for the growth of the network in a short time span, e.g., five to ten years, where one can reasonably assume that the disappearances of nodes and edges, as well as any temporal change in the PA function, are negligible.
Our approach can potentially be used in predicting properties of a temporal network in the future. Some common network properties that are of potential interest are local properties such as degree and betweenness centrality of a node or global properties such as the diameter of a network [20]. In principle, by using a probabilistic generative model such as our proposed hypergraph-based model, one can get information about any network property at a specific time in the future in the form of probability distributions. In order to do this, one uses the fitted model to generate multiple simulated networks at that specific time in the future, and calculates the empirical probability distribution of the property of interest in these simulated networks.
The rest of this paper is organized as follows: Section II describes temporal hypergraphs comparing with temporal graphs. Section III introduces related graph-based PA models and our hypergraph-based approach and presents illustrative results of our proposed model. Section IV provides our estimation methodology and the hypergraph generation algorithm for our growth model. Section V explores the performance of the proposed method in 13 real-world datasets by comparing it with the conventional method. Finally, Section VI concludes this work and outlines future work.

II. TEMPORAL HYPERGRAPH MODELS
In this section, we explain some properties of discrete-time hypergraph-based growth models, comparing it with conventional growth models of ordinary graphs. Let = ( , ) be the hypergraph at time-step = 0, . . . , . The hypergraph consists of the node set and the hyperedge set existing at time-step . The temporal hypergraph grows from 0 = ( 0 , 0 ), the initial hypergraph, with the emergence of new hyperedges and nodes at each time-step. Although a hypergraph where the set of nodes in a hyperedge does not have any order is called an "undirected hypergraph" [21], in this paper we simply refer to it as a "hypergraph". Similarly, we refer to "undirected graph without self-loops" as "graph". Fig. 1 illustrates a discrete-time temporal hypergraph and its graph representation. To handle some features of the real-world hypergraph data collected by discrete-time observations, we explicitly allow the following two points in the temporal hypergraph model.
1) The hyperedge added at each time-step can include both existing nodes and new nodes. This is illustrated in the hypergraph at time-step = 1 in Fig. 1. We call these new nodes newcomer nodes.
2) The number of hyperedges added at each time-step can be more than one. An example is given in the hypergraph at time-step = 3 in Fig. 1. Guimerà et al. [22] proposed a probabilistic model of temporal hypergraphs that controls the proportion of newcomer nodes that appear with hyperedges, and analyzed the relationship between this proportion and the success of collaboration. In this paper, both our proposed estimation method and generator for hypergraphs address the above two points.
Next we explain some information losses that can occur with graph representations of complex temporal data. The growth of complex network data such as co-authorships of papers is conventionally modeled by ordinary graphs, where each motif (e.g., paper) occurring among nodes (e.g., authors) is represented by edges (e.g., pairwise co-authorships). When a motif contains more than two nodes (e.g., a paper with more than two authors), the group interaction created by the motif is decomposed into multiple edges. On the other hand, in hypergraphs, one motif is represented by one hyperedge. An example is given in the hypergraph at time-step = 0 in Fig. 1. In order to present a motif on four nodes, in hypergraph representations, a hyperedge containing these four nodes is used, whereas in ordinary graph representations, six ordinary edges, which constitute a clique on the four nodes, are used instead. Generally, when a motif occurs among nodes, in ordinary graphs, ( − 1)/2 edges are added collectively, whereas one hyperedge of size is added in hypergraph representations. As can be seen from the hypergraph at timestep = 3, given only one graph representation at one timestep, in general, we cannot identify which edges were added together in the past without hyperedge information. Thus, the information about motifs is not perfectly preserved when one employs a graph representation of the data.
Furthermore, there is another information loss when = 1, i.e., when a motif contains only one node (e.g., a single-author paper) is added. In such cases, a hyperedge of size = 1 is added to the hypergraph, while the edges of the graph remain unchanged, as illustrated at time-step = 2 of Fig. 1.
As mentioned above, graphs do not preserve the information about which edges appeared jointly. In other words, the conventional graph-based growth models assume implicitly that all edges are independent. Let denote the size of a hyperedge. Although data with huge exist in the real world, for example in multinational projects [23], [24], few studies have examined whether this independence assumption is appropriate under such large values of . The datasets used in the experiments in Sections III and V of this paper contain hyperedges whose are greater than 100. Using these datasets, we will investigate the performances of some conventional graph-based growth models in the case of large , which has not been examined much in existing studies. In addition, it is reported that the modern science collaboration has shown a trend that hyperedge sizes are becoming larger. From the beginning of the 20th century to present, the average number of co-authors per paper has increased in almost all disciplines [25]. Such changes in data over time may also lead to unforeseen problems for graph-based growth models.
The main motivation for considering hypergraph models in this study is that it does not require the above independence assumption throughout the growth process. In the next section, we describe our proposed approach to capture the characteristics of temporal hypergraphs.

III. PROPOSED APPROACH
In this section, we first review the conventional PA growth model for graphs and describe our proposed PA growth model for hypergraphs. We then present illustrative results showing the effectiveness of the proposed model.

A. GRAPH-BASED GROWTH MODELS WITH PA
We describe an undirected graph version of the General Temporal (GT) model with PA [13]. This model is a generalization of various classical models such as Barabási-Albert model [26] and Price's model [27]. In the GT model, the probability that a node pair , will acquire an edge at time is expressed as: where ( ), ( ) are the degree of node , at time-step , and is the PA value of degree . The function of is often called the attachment function or attachment kernel. Note that is assumed to be time-invariant. This graph PA model is hereinafter referred to as "Edge PA". In Edge PA, no functional form is assumed on the PA function . Several methods have been proposed to estimate the PA function from the temporal network data. Parametric estimation methods for estimating based on the assumption that the PA function has the functional form = with a tunable parameter include regression-based methods [28], maximum likelihood estimation methods [29], and methods based on Markov chain Monte Carlo [30]. There are also nonparametric estimation methods that do not make assumptions on the functional form of the PA function including methods using histograms [31], [32] and maximum likelihood estimation [13].

B. PROPOSED HYPERGRAPH GROWTH MODEL
We propose a hypergraph version of the PA model based on the GT model. Instead of defining PA growth for edges, we define the probability of PA growth for hyperedges, i.e., sets of nodes. Let = ( , ) be the hypergraph at time-step and C ( ) be a family of sets whose elements are the sets that satisfy ⊂ , | | = . We define the probability that a node set = { 1 , 2 , . . . , } ∈ C ( ) acquires a hyperedge of size at time-step as follows: where ( ) is the hyperdegree of node at time-step , and is the PA value of hyperdegree . We refer to the above proposed growth model as "Hyper PA". As in Edge PA, we assume no functional form for the PA function . Do et al. [17] proposed a generative model with linear PA ∝ for hypergraphs in which is defined as the number of hyperedges that contain the set. However, when the size of hyperedges is large, i.e., | | ≫ 1, the value of becomes sparse, which is not suitable for estimating the functional form of . Therefore, in our model we define the PA growth on the hyperdegrees for ∈ .
Edge PA and Hyper PA are equivalent only for data where the size of all hyperedges is two. The difference between Hyper PA and Edge PA emerges when we consider the probability of a hyperedge whose size is greater than two. As an example, let us consider the Hyper PA and Edge PA for a group interaction occurring on the set of three nodes = { 1 , 2 , 3 } at time-step . In Hyper PA, this interaction is considered as a single hyperedge and the probability of this event is On the other hand, in Edge PA, the joint probability for all pairs of nodes is In addition to the difference between using hyperdegree and degree , the exponent − 1, which is equal to 2 for the case of = 3 above, in Edge PA makes the event of large very rare.
The value of in Hyper PA can be estimated from observed data by maximum likelihood estimation. More details of the proposed model, including a treatment of a selection bias that arises when the observed node set contains some newcomer nodes, and estimation method are described in Sections IV-A and IV-B. We can also generate hypergraphs with a given PA function by a procedure provided in Section IV-C.

C. ILLUSTRATIVE RESULTS
This section illustrates that our proposed Hyper PA model is better than the conventional Edge PA model and some other baseline models in terms of goodness-of-fit in two real-world co-authorship temporal networks: STA-coauthor from the statistics field [33] and HEP-coauthor from the high energy physics field [34], [35]. The details of these datasets are provided in Section V. We first fit the models to these data by estimating the PA function for Hyper PA by our proposed method and for Edge PA by the method in [13]. We then compare some statistics of simulated graphs generated from the fitted models with those of the real-world data. The closer the simulated statistics of a model are to the realworld statistics, the better the model is in term of goodnessof-fit. To compare with the graphs generated by Edge PA, the hypergraphs generated by Hyper PA were converted into graphs. The detail of proposed estimation method and the procedure for generating hypergraph is described in Section IV. Fig. 2(a) visualizes the final portion in the growth of STA-coauthor and those of the simulated temporal graphs generated by Hyper PA and Edge PA. Specifically, we plot the final 8% increments of the temporal graphs, which correspond to the last 260 papers that appear in STA-coauthor. To visualize the collectivity of edges around each node, we colored each node according to the size of the largest clique that contains . In the graphs generated by Edge PA, there are fewer red nodes than in the observed data, which means that Edge PA did not capture enough higher-order information and failed to replicate large cliques. On the other hand, Hyper PA generated large cliques similarly to those observed in the real data. This observation can also be supported quantitatively by looking at the average¯of over the whole graphs in 10 simulations. To the observed value¯= 3.26, Hyper PA gave a close match of¯= 3.29, which is much closer than the value¯= 2.83 given by Edge PA.
The reason why Edge PA failed to replicate the collective nature of edge increments in STA-coauthor is that Edge PA adds each edge independently. The poor fit of Edge PA is also confirmed for second-order structures of graphs such as triangles in not only STA-coauthor but also other real-world co-authorship datasets. See Section V-C for more results.
As noted earlier, although conventional graph-based models such as Edge PA may be a reasonable modeling choice if the typical size of hyperedges in the data is small, this number can be enormous in some datasets. Fig. 2(b) shows the distributions of the numbers of co-authors per paper in STA-coauthor and HEP-coauthor. In hypergraph expression of scientific co-authorship, the number of co-authors of a paper is equal to the size of the corresponding hyperedge. As can be seen in Fig. 2(b), HEP-coauthor contains many relatively large hyperedges. The maximum hyperedge size is 201 for the HEP-coauthor and 10 in for the STA-coauthor. More detailed data descriptions for all datasets used in this paper are provided in Section V-A. For a dataset that has a tendency for edge collectivity as strong as HEP-coauthor, one would expect clear differences between Hyper PA and Edge PA. The following experiment in Fig. 2(c) illustrates this point.
In Fig. 2(c), we demonstrate that both hypergraph-based growth and PA mechanism are needed to provide a reasonably good fit to HEP-coauthor. To this end, we add another baseline model, namely Hyper Uniform, that is a special case of Hyper PA in which = 1 for every hyperdegree , i.e., there is no PA effect in Hyper Uniform. Fig. 2(c) shows the observed and simulated probability distributions of degrees and local clustering coefficients. The local clustering coefficient, whose mathematical definition is given in Section V-C, is a popular way to express the density of triangles around a node. The distribution of local clustering coefficients can be used to express the degree of collectivity of edges in the data.
From Fig. 2(c), the following observations can be made. 1) Hyper PA outperformed both Edge PA and Hyper Uniform in reproducing the overall degree distribution, thanks to both the hypergraph-based growth and the PA mechanism. Although Edge PA captured better the right tail of the degree distribution comparing with Hyper Uniform, both Edge PA and Hyper Uniform underestimated the portion of low degrees comparing with Hyper PA. This implies that the PA mechanism may be responsible for reproducing the right tail of the degree distribution, whereas the hypergraph-based growth is potentially responsible for replicating the left tail. 2) In replicating the distribution of local clustering coefficients, Hyper PA also outperformed both Edge PA and Hyper Uniform. Edge PA significantly underestimated the local clustering coefficients, which implies that it could not capture the collectivity of edges in the data. This is expected, since Edge PA adds edges independently. To summarize, both hypergraph growth and PA mechanism are crucial in capturing first-order and second-order structures of the data. Hyper PA employs both ingredients and thus was able to provide good fits to STA-coauthor and HEP-coauthor comparing with conventional models. Further experiments are provided in Section V.

IV. METHOD
In this section, we describe the maximum likelihood estimation of the PA function in our model and pseudo codes for generating temporal hypergraphs from our model. In addition to the derivation of the likelihood function, we provide a recursive formula that enables a fast calculation of the likelihood function. We also provide a principle approach based on conditional probabilities for handling a selection bias that arises in modeling the emergence of new hyperedges with newcomer nodes.

1) Likelihood Function
We first derive the likelihood function of for Hyper PA model. As we described in Section III, our growth model (2) is based on the undirected GT model. Therefore, the derivation of the likelihood function in Hyper PA model here is also based on previous works [13], [14], [16] where maximum likelihood estimation of the PA function is derived for the GT model.
We define some notations needed in the exposition. Let = ( , ) be the hypergraph at time-step . and are the node set and the hyperedge set, respectively. Let { } =0 be the hypergraph sequence, and {ℎ } =1 be the sequence of the number of hyperedges added to the hypergraph at each time-step. We denote the size of each hyperedge at time-step as = ,1 , . . . , ,ℎ and the number of newcomer nodes that appear with each hyperedge as = ,1 , . . . , ,ℎ . For the -th (1 ≤ ≤ ℎ ) hyperedge at time-step , its size is given by , , and we have 0 ≤ , ≤ , ; the number of newcomer nodes is , and the number of existing nodes is , − , . This hyperedge contains solely existing nodes if , = 0, and contains solely newcomer nodes if , = , . Now we consider the probability that some node set , whose size is , acquires a new hyperedge of size , . If we assume that , contains only existing nodes, the acquisition probability is: where C , ( ) is the family of sets such that a set belongs to C , ( ) if and only if ⊂ and | | = , . The case that , contains some newcomer nodes needs some special care, since there is a selection bias. This problem is treated in Section IV-B.
Suppose that the joint distribution of ℎ , , and is governed by the parameter vector , and that the initial hypergraph 0 is determined by init . As in previous works [13], [14], [16], we assume that and init are independent of in growth process. This assumption is interpreted as follows: the increments of hypergraph (i.e. the number of additional nodes and hyperedges) at each time-step are independent of the PA growth. With this assumption, the probability of the observed data can be written as: Taking the logarithm of both sides, the log-likelihood function of can be expressed as: Since on the right-hand side only the first term includes the PA function , we can omit the other terms related to the nuisance parameters init and . The log-likelihood function can then be rewritten as follows: Note that we substituted (3) into (4). The maximum likelihood method estimates the value of by maximizing ( | 0 , . . . , ). The parameter vector in (1) actually includes only elements with the observed values of in the dataset. In addition, to reduce the number of parameters, we employ the "logarithmic binning" [13], [14], [16] of , where we set +1 = in groups of values. Since (5) is computed numerically, we need its efficient evaluation. The term ′ ∈ C , ( ) ∈ ′ ( ) is the normalization of the probability (3), which is the summation of the probabilities of every node set in C , ( ). When the hyperedge size , is large, the computational cost of this term becomes intractable in a naive calculation. This is because of the combinatorial explosion of the number of possible node sets. Specifically, when the number of nodes in the entire hypergraph at time-step is ( ), the computational complexity of a naive calculation is O ( ) , = O ( )! , !( ( )− , ))! , which scales exponentially in ( ) if ( ) is much larger than , . Next we describe a fast computation which scales linearly in ( ).

2) A Recursive Formula for Fast Computation of the Normalizing Factor
A fast computation of the normalizing factor is possible if one can find a way to reduce the numbers of summations needed by exploiting its recursive structures.
We also define the -th power sum as where and are positive integers. According to Newton's identities [36], we have: for all positive integers and satisfying ≤ . We finally arrive at the key recursive formula: Note that ≤ ( ) always holds in hypergraphs. Our approach is to use this formula recursively to calculate the normalizing factor ( ). Since the calculation of the power sums dominating in (7), the time complexity of calculating ( ) can be reduced from O ( )! !( ( )− ))! to O 2 ( ) . By utilizing (7), the log-likelihood function given in (5) can be optimized by standard numerical methods such as quasi-Newton methods or MM algorithms [13].

B. A SELECTION BIAS IN MODELING THE EMERGENCE OF NEW HYPEREDGES WITH NEWCOMER NODES
We consider the case that the node set , contains some newcomer nodes. Denote such , simply as . Naively treating the hyperdegree of newcomer nodes as = 0 causes a selection bias, since in that way such newcomer nodes with hyperdegree = 0 acquire new hyperedges a priori. Here we assume for our dataset that newcomer nodes are included in only when they got a hyperedge. This may lead to overestimation of the true value of 0 . We are going to solve this problem by considering conditional probabilities given that newcomer nodes acquire new hyperedges.
Whereas one can use an existing remedy of a similar bias occurred in graph-based models [13], [14], this conventional approach is sub-optimal in hypergraph settings. Specifically, this approach excludes any new hyperedge that contains some newcomer nodes in calculating the log-likelihood in (4). Since the proportion of new hyperedges with newcomer nodes can be high in many real-world data [22], this leads to throwing away too much data and risks destabilizing the estimation of .
Assume that consists of 1 and 2 , where 1 is the set of newcomer nodes and 2 is the set of existing nodes. Instead of throwing away all the information contained in as in the existing remedy approach, our approach is to salvage the portion of information contained in the event that a new hyperedge emerges on the set 2 of existing nodes, given that the hyperedge also contains the set 1 of newcomer nodes. We will include a term for 2 in the log-likelihood function, thus it contributes to the estimation of . However, we do not estimate 0 , because we assume that existing nodes have at least one hyperedge.
This intuition can be formalized as follows. For convenience, we denote new = \ −1 , exist = −1 . With these new notations, note that = 1 ∪ 2 , 1 ⊂ new , and 2 ⊂ exist . Let and ′ be the sizes of 1 and 2 , respectively, and thus the size of is = + ′ . Now we consider the conditional probability that gets a new hyperedge, conditioned on the event that the set of newcomer nodes is equal to some pre-specified set * with * ⊂ new . This conditional probability can be written as: which is essentially equivalent to (3) but applied to 2 part. In other words, we simply ignore 1 part. In calculating (4), if the observed node set , contains only existing nodes, one uses (3), whereas if it contains some newcomer nodes, one uses (8). Note that in (8), all calculations occur solely on existing nodes. Therefore, we can remove the selection bias and obtain a stable estimate of ( > 0) at the same time. The calculation of the denominator of (8) can also be accelerated by the recursive formula (7). See Appendix for a derivation of (8).

C. ALGORITHM FOR GENERATING HYPERGRAPHS
In this section, we describe the procedure for generating hypergraphs in simulations. The pseudocode for our proposed hypergraph generator is provided in Algorithm 1.
First, we describe how to determine the input of the generator. By using real-world observations, we can set reasonable values in our simulations; inputs (ii) to (vi) can be given directly as descriptive statistics of a dataset, whereas input (i) needs to be estimated from the data.
can be given either as a estimated nonparametric sequence or a function with estimated parameters.
At each iteration of in the procedure, the set of nodes that acquire a new hyperedge ( exist at line 5) is sampled from the Hyper PA model with the HyperPA procedure, which is described at the bottom of Algorithm 1 as a subroutine. We use the conditional probability given in (8) of Hyper PA to separate the effects of the node birth process from the hyperedge acquisition process. In our experiments, the set of newcomer nodes ( new at line 4) is simply taken from the history of a dataset; this is not explicitly described in Input though.

V. EXPERIMENTS
In this section, we first describe the real-world datasets and then present the estimation results for the PA function in these datasets. We then perform simulations to evaluate goodness-of-fit of our proposed hypergraph model.

A. REAL-WORLD DATASETS
We use 13 real-world datasets as temporal hypergraphs in experiments. The datasets can be divided into five categories.
• Scientific co-authorship datasets: Each node is an author and each hyperedge is a set of authors who have written a paper collaboratively. We use the following four datasets: Complex Network Theory (CMP-coauthor) [37], High Energy Physics (HEPcoauthor) [34], [35], Strategic Management Journal (SMJ-coauthor) [38], and Statistics (STA-coauthor) [33]. • Online thread participants datasets: Each node represents a user answering questions on threads and each hyperedge describes a set of users in a thread in which questions are posted. We use three datasets created from sub-forums of the online Stack Exchange forum: askubuntu-user, math-sx-user, and stack-overflow-user. • Online tagging datasets: Each node is a tag and each hyperedge is a set of tags associated with a question. We use three datasets created from sub-forums of the Stack Exchange forum: ask-ubuntu-tags, math-sx-tags, and stack-overflow-tags. • National drug code (NDC) directory datasets: We use two datasets: NDC-classes and NDC-substances. Each node is a class label of drugs (NDC-classes) or a substance in drugs (NDC-substances) and each hyperedge is a set of class labels of a drug or a set of substances in a drug. • Email network datasets: Each node is an email address and each hyperedge is a set of email addresses of the sender and all recipients contained in an email. There is one dataset in this category: Eu-email. Except for the four scientific co-authorship datasets, the remaining datasets are from the hypergraph collection of Benson et al. [39].
Some preprocessing is needed before one can perform model fitting. For each dataset in Benson et al. [39], we extracted the latest 5000 hyperedges for analysis. In addition, several datasets with too few or too many nodes extracted from Benson et al. [39] are excluded from the analysis and not listed here. In each dataset, we set the initial state 0 as the first 50% of each data in terms of the number of edges. In co-authorship datasets, except for STA-coauthor, the original datasets only have temporal graphs and do not contain hyperedges. For this reason, we heuristically reconstructed the hyperedges from the increments of edges at each time-step. Specifically, at each time-step, we repeatedly replaced the new edges that constitute the largest clique with a new hyperedge until there was no more new edges. We tested this procedure on the STAcoauthor, and confirmed that all hyperedges were successfully reconstructed from its graph representation.   are generally increasing, which implies the existence of preferential attachment in hypergraph growth. PA exponent calculated with the estimated PA values for all datasets including the above four datasets are given in Table 1. Table 1 shows some summary statistics of the datasets. It is important to note that HEP-coauthor contains large collaborative research projects such as accelerator physics [40], and hence the average number of co-authors per paper (i.e., the average size of hyperedges) is larger than the other datasets.

B. PREFERENTIAL ATTACHMENT IN 13 DATASETS
We first demonstrate that our estimation method works in some hypergraphs generated from the Hyper PA model. We generated one hypergraph (Hyper PA 1) using the PA function = 3 (log ) 2 +1, and two other hypergraphs, namely, Hyper PA 2 and Hyper PA 3, using the PA function = with = 0.5 and = 1.5, respectively. The former functional form is also used in previous works [13], [16] to verify the nonparametric estimation of the PA function for graph growth models. The latter log-linear form is a widely-used form for the PA function of Edge PA with degree , as described in Section III. When applying the hypergraph TABLE 1. Summary statistics and experiment results for 13 datasets. Summary statistics of the datasets described in Section V-A; is the number of time-steps, is the number of nodes, is the number of edges, is the number of hyperedges,¯is the average size of the hyperedges, is the power-law exponent of the degree distribution, and is the clustering coefficient. Here and are counting repetitions in duplicate. is the estimated PA exponent by fitting the proposed hypergraph-based growth model in Section V-B. Each of triplet is the mean value of the error between the probability distributions of the numbers of triplets in observed and simulated data with Edge PA (EP), Hyper Uniform (HU), and Hyper PA (HP) in Section V-C. Each of local is the mean value of the error between the distributions of local clustering coefficients averaged over nodes with each degree in observed and simulated data with EP, HU, and HP in Section V-C. For each of triplet and local , the values are expressed in percentages, and the best values are in bold (lower is better).  In each panel, the data points are colored according to the type of network. Datasets belonging to the same network type show similar trends in relationships between and and between and . The PA mechanisms successfully captures first-order structures in the networks, as can be inferred from the high correlation between and . However, PA alone is not enough to explain second-order structures, as can be seen from the low correlation between and .
generator of Algorithm 1 in Section IV-C, input parameters other than were determined from STA-coauthor in order to generate realistic hypergraphs. Fig. 3 shows the estimation results for each of the generated hypergraphs. Without making any assumptions on the functional form of the PA function, each estimation result captured reasonably the shape of the corresponding true PA function.
We next estimated the PA function by our proposed method in all datasets. Fig. 4 illustrates the nonparametrically estimated values of the PA functions of Hyper PA model in HEP-coauthor, STA-coauthor, math-sx-user, and NDCsubstances. The nonparametrically estimated in these datasets increase in average, which indicates the existence of the preferential attachment. Furthermore, they are substantially linear in log-log scale. Therefore, we also fitted the log-linear form = with hyperdegree to the estimated values and calculated the exponent by the least-squares method. The values of PA exponent of for all datasets are given in Table 1. Since the estimated attachment exponents are greater than 1 in all co-authorship datasets and thread participants datasets, the PA effect is superlinear in those datasets. And the values of for the tagging datasets and NDC datasets were all in the range of 0.9 to 1.2, and 0.77 for Eu-Email. This result suggests the existence of the PA effect, particularly the strong PA effect in the co-authorship datasets and thread participants datasets. For example, in coauthorship data, the PA effect is that authors who have written more papers in the past are more likely to write new papers in the future.
In the 13 real-world datasets, we found that, while PA successfully captures first-order structures, it alone cannot explain the observed second-order structures. Fig. 5(a) shows a remarkably high correlation between the estimated attachment exponent with the power-law exponent in the realworld datasets. There is a theoretical reason for this high correlation. In PA trees with = , has been shown to be highly correlated with when ≤ 1 [41]. Extrapolating this result to our hypergraph-based growth model, it is reasonable to expect that when the average size of hyperedges is not large, the degree distribution of our model behaves similarly to that of a PA tree. This explains the observed high correlation between and when is around 1. However, given , one cannot infer too much about the clustering coefficient , as can be seen from the high variation of in Fig. 5(b). This implies that PA alone cannot explain second-order structures, which is expected since PA is only a first-order mechanism. It is reasonable to expect that second-order structures, such as the clustering coefficient , also depend on various higherorder growth factors, such as the distributions of sizes and numbers of hyperedges at each time-step.

C. EVALUATION OF GOODNESS-OF-FIT FOR SECOND-ORDER STRUCTURES
In this section we perform additional experiments to compare the proposed Hyper PA model with some baseline models in reproducing second-order structures in the observed data. As in Section III-C, in addition to Edge PA and Hyper PA models, we also consider the Hyper Uniform model. This special case of the Hyper PA model uniformly adds hyperedges, i.e., the PA function in Hyper Uniform is = 1 for all hyperdegree . As already described in Section III-C, we will adopt a simulation-based approach to investigate the goodness-of-fit of the models. Specifically, we will generate networks from each model and compare several important statistics of the generated data to those of the real-world data. In each dataset, Hyper PA incorporates estimated in the previous section, and Edge PA incorporates obtained from a nonparametric estimation method [13]. Since Edge PA generates graphs, we converted hypergraphs generated by Hyper PA and Hyper Uniform into graphs for comparison.
One of the most important graph properties often found in real-world networks is triangle-rich, which is manifested as a high value of the clustering coefficient [31], [42]. We here examine the distribution of the number of triangles that a node   has. We denote the number of triplets of node as: where , = 1 indicates the presence of edges between and , whereas , = 0 indicates the absence of edges. Fig. 6 shows the observed and simulated cumulative probabilities of the numbers of triplets in representative cases when Hyper PA succeeded and when it failed. When Hyper PA succeeded, Edge PA often over-estimated the region of small number of triplets, which may be caused by the edge independence assumption in Edge PA. When Hyper PA failed, it often underestimated the region of small number of triplets. Table 1 shows a quantitative comparison for all datasets using local , which is calculated as follows: where bin is the number of logarithmic bins, and obs and sim are the probability distributions of the numbers of triplets in real-world data and simulation data, respectively. We note that Δ ′ 1 , . . . , Δ ′ bin is the logarithmic binning of Δ, and we describe the result with bin = 10 in Table 1 provided the best fit in 10 datasets, whereas Edge PA and Hyper Uniform prevailed in the remaining three. For closer inspection, we investigate the density of triangles around nodes, i.e., the local clustering coefficient. The high density of triangles in low-degree nodes is a signature property of many real-world networks. This property is also important since it may make practical tasks such as low-dimensional embedding more difficult [43]. The local clustering coefficient of node with degree is The average of over all nodes in a graph is called the clustering coefficient. We analyze the distribution of averaged over nodes that has the same degree : where the set is all nodes with degrees in a graph, and is the number of nodes in . Fig. 7 shows the observed and simulated distributions ( ) for some representative datasets. Hyper PA succeeded in reproducing the signature decreasing of ( ) when increases, while Edge PA under-estimated ( ), especially in the region of low . Table 1 shows a quantitative comparison in all the 13 datasets using local , which is calculated as follows: where bin is the number of logarithmic bins, and obs and sim are (9) of observed and simulated data, respectively. We note that ′ 1 , . . . , ′ bin is the logarithmic binning of , and we describe the result with bin = 10 in Table 1. Out of the 13 datasets, Hyper PA provided the best fit in all. Hyper Uniform prevailed over Edge PA in 11 datasets, in spite of the fact that Hyper Uniform uses a constant linear PA function, while Edge PA estimates the PA function from data. This highlights the importance of incorporating hyperedge information. Taking into accounts the results in Section III-C, Hyper PA replicates well various first-order and second-order structures in all datasets.

VI. CONCLUSION
Investigating the trade-offs of graph models, such as simplification by pairwise relationships, can provide valuable insight when considering which structures to choose for realworld complex systems: graphs, hypergraphs, or others. In this paper, we have proposed a statistical method for estimating the preferential attachment in temporal hypergraphs. We also derived the conditional probability and the recursive formula that stabilize and accelerate the estimation on the hypergraph model. The analysis of the real-world datasets showed that the PA function of the hypergraph model had a similar form to that of the graph model in previous works. Furthermore, we demonstrated that our hypergraph PA growth model has advantages over conventional graph-based models in that it can better capture the first-order and second-order structures around each node.
Future work includes more scrutiny of growth mechanisms in hypergraph models. In the case of functions using node features such as hyperdegree in our model, the computational complexity of the likelihood function can be similarly reduced by utilizing the proposed recursive formula. For example, the log-likelihood function can be efficiently computed when (2) is modified to where is the "fitness" parameter of node [14]. However, in the case of features that use dyadic relations, such as common neighbor nodes between a node pair, or features defined for a node set, the recursive formula can not be directly applied. Therefore, when extending the method to higherorder features, it will be necessary to solve the combinatorial computation problem, which hypergraph models often face. .

APPENDIX A DERIVATION OF THE CONDITIONAL PROBABILITY TO EXCLUDE EFFECTS OF NEWCOMER NODES
We here derive the conditional probability (8). Let = ( , ) be the hypergraph at time-step . and are the node set and the hyperedge set, respectively. For convenience, we denote new = \ −1 and exist = −1 . Recall that Hyper PA determines the probability that a set of nodes will acquire a hyperedge of size . Let 1 and 2 be the sets of the nodes satisfying = 1 ∪ 2 , 1 ⊂ new , 2 ⊂ exist , | 1 | = , and | 2 | = − = ′ . We here decompose (2) into the conditional probability given that 1 = * for a prespecified set * ⊂ new and the probability of observing * : ( ) The term 1 ∪ 2 | 1 = * ( ) corresponds to the desired conditional probability in (8). Note that we denote * ⊂ new and not * = new because we allow the temporal hypergraphs to add multiple hyperedges at each time-step . With (3) and (10), we obtain: thus showing (8). So far we have assumed that 1 = * is pre-specified, but we can change the setting so that 1 is randomly sampled from new with ( 1 ) = 1/ | new | . Then log ( 1 ) terms may be added to the log-likelihood function ( | 0 , . . . , ). However, since log ( 1 ) does not involve , it does not change the maximum likelihood estimation of .