Randomizing hypergraphs preserving degree correlation and local clustering

Many complex systems involve direct interactions among more than two entities and can be represented by hypergraphs, in which hyperedges encode higher-order interactions among an arbitrary number of nodes. To analyze structures and dynamics of given hypergraphs, a solid practice is to compare them with those for randomized hypergraphs that preserve some specific properties of the original hypergraphs. In the present study, we propose a family of such reference models for hypergraphs, called the hyper dK-series, by extending the so-called dK-series for dyadic networks to the case of hypergraphs. The hyper dK-series preserves up to the individual node's degree, node's degree correlation, node's redundancy coefficient, and/or the hyperedge's size depending on the parameter values. We also apply the hyper dK-series to numerical simulations of epidemic spreading and evolutionary game dynamics on empirical hypergraphs.

For hypergraphs, the properties of hyperedges as well as those of nodes are considered to affect their structure and dynamics. The existing reference models for hypergraphs preserve only up to the degree of each node and the size of each hyperedge (i.e., number of nodes that belong to each hyperedge) of a given hypergraph 23,29,[44][45][46] . In the present study, we propose a family of reference models for hypergraphs, called the hyper -series. The original -series is a nested family of reference models that preserve local properties of nodes of the given dyadic network 37,41,43 . The hyper -series extends the -series to the case of hypergraphs. The hyper -series preserves local properties of nodes and hyperedges of the given hypergraph to tunable extents. Then, we showcase its use in investigating epidemic spreading 47 and evolutionary game dynamics 48 models on hypergraphs. Our code for the hyper -series is available at https://github.com/kazuibasou/hyper-dk-series.

Hypergraph and Bipartite Graph
We represent a network including higher-order interactions among two or more entities as an unweighted hypergraph that consists of a set of nodes = { 1 , . . . , } and a set of hyperedges = { 1 , . . . , }, where is the number of nodes, and is the number of hyperedges. We assume that the original hypergraph, for which we generate sample hypergraphs using reference models, contains no multiple edges. Each hyperedge ∈ is a subset of with arbitrary cardinality | |.
We denote by = ( , , E) the bipartite graph that corresponds to the given hypergraph, where E is a set of edges in the bipartite graph. An edge ( , ) exists between each node and each hyperedge if and only if belongs to the hyperedge in the hypergraph. We denote by M = |E | the number of edges in . We show in Fig. 1 a hypergraph and its bipartite-graph representation.

Local Properties of Nodes and Hyperedges
In this section, we describe local properties of bipartite graph some of which our reference models preserve. We denote the incidence matrix of by = ( ), where = 1, . . . , , = 1, . . . , , = 1 if ( , ) ∈ E, and = 0 otherwise. Let = =1 be the degree of node . We denote the size of hyperedge , i.e., the number of nodes that belong to hyperedge , by = =1 . We define the joint degree distribution of two nodes that share at least one hyperedge, which extends the joint degree distribution for dyadic networks defined in Refs. 37,43 . Let ( , ) denote the number of hyperedges that nodes with degree = 1, . . . , and nodes with degree = , . . . , share. For example, in a bipartite graph shown in Fig. 1(b), one obtains (1, 2) = 2 because node 1 with degree 1 = 2 and node 3 with degree 3 = 1 share a hyperedge 2 , and node 2 with degree .

=1 =
( , ) = 1. We also define the average degree of the nearest neighbors of nodes with degree , which extends the definition for dyadic networks defined in Refs. 1,49 , by .
Parameter value Properties to be preserved = 0 Average degree of the node = 1 Degree of each node = 2 Pairwise joint degree distribution of the node = 2. 5 Degree-dependent redundancy coefficient of the node = 0 Average size of the hyperedge = 1 Size of each hyperedge Equations (1) and (2) are consistent with the corresponding definitions for dyadic networks when = 2 for each hyperedge ∈ .
We also examine quadruple relationships around a node in a bipartite graph, which is similar to local clustering (i.e., abundance of triangles) in dyadic networks. The redundancy coefficient of node , denoted by , quantifies the amount of quadratic relationships around the node in a bipartite graph 50 . It is the fraction of pairs of hyperedges to which belongs such that at least one different node also belongs to both hyperedges. Formally, if > 1, we define where we define Γ = { ∈ \{ } | , , > 0} and 1 { } denotes an indicator function that returns 1 if a condition holds, and it returns 0 otherwise. We define = 0 if ∈ {0, 1}. The degree-dependent redundancy coefficient of the node is the average of the redundancy coefficient over the nodes with degree , i.e., where ( ) is the number of nodes with degree . One can also define the pairwise joint size distribution of the hyperedge and the redundancy coefficient of the hyperedge in the same way as for the node. However, we do not introduce them because we construct reference models that preserve up to the size distribution of the hyperedge. This choice stands on our observation that it is practically difficult to generate randomized bipartite graphs preserving up to pairwise correlation and quadratic relationships for both nodes' degrees and hyperedges' sizes. If one is interested in preserving the size correlation and redundancy for hyperedges instead of the corresponding quantities for nodes, one can apply our algorithm described in the following text after interchanging the nodes and hyperedges in the bipartite-graph representation of the hypergraph.

Reference models for hypergraphs -hyper -series
In this section, we propose a family of reference models for hypergraphs that preserve local properties of nodes and hyperedges in the given hypergraph to different extents. We extend a class of reference models for dyadic networks called the series 37,41,43 to the case of hypergraphs. The -series preserves some local properties of nodes (i.e., degree distribution, joint degree distribution, or degree-dependent clustering coefficient) of a given dyadic network.
The proposed model, which we refer to as hyper -series, produces a bipartite graph that preserves the joint degree distributions of the node in the subgraphs of size ∈ {0, 1, 2, 2.5} or less and the size distributions of the hyperedge in the subgraphs of size ∈ {0, 1} or less in the given bipartite graph . We list the quantity corresponding to each and value in Table S1. By definition, the hyper -series with = 0 preserves the numbers of edges in , or equivalently, the average degree of the node. The hyper -series with = 1 preserves the degree of each node. With = 0 and = 1, the hyper -series similarly preserves the average size of hyperedges and the size of each hyperedge, respectively. With = 2, it preserves the degree of each node and aims to preserve the pairwise joint degree distribution of the node. With = 2.5, it intends to preserve the joint degree distributions of nodes in the subgraphs of size between = 2 and = 3. By definition, this means that the hyper -series preserves the degree of each node, approximately preserves the pairwise joint degree distribution of the node, and approximately preserves the degree-dependent redundancy coefficient of the node. Like the -series for dyadic networks 37,41,43 , the hyper -series have an inclusiveness property. In other words, the hyper -series with given values of and preserve quantities that any hyper -series with ( , ), where ≤ and ≤ , preserve.

∈ {0, 1}
In this section, we describe generation of bipartite graphs using the hyper -series with ∈ {0, 1} and ∈ {0, 1}. We distinguish between the original bipartite graph, denoted by = ( , , E), and the bipartite graph produced by the hyper -series, denoted by˜= ( , ,Ẽ). We allow˜to have multiple edges between nodes and hyperedges and to have multiple connected components, which are allowed in previous studies as well 29,50 . We define a component of a bipartite graph as any of its maximal subgraphs in which any two nodes are connected to each other by a path within the subgraph. Our algorithm of the hyper -series starts with a bipartite graph with nodes, hyperedges, and no edge. When ( , ) = (0, 0), we sequentially add edges to construct˜as follows. We select a node uniformly randomly, i.e., with probability 1/ and a hyperedge uniformly at random, i.e., with probability 1/ , and connect them ( Fig. 2(b)). We repeat this procedure M times. The generated bipartite graph has nodes, hyperedges, and M edges, and hence preserves the average nodal degree and the average size of the hyperedge. When ( , ) = (1, 0), we first attach half-edges to each node ( Fig. 2(c)). Then, we connect each of the M half-edges to a hyperedge chosen uniformly at random, i.e., with probability 1/ . The case of ( , ) = (0, 1) is parallel to that of ( , ) = (1, 0). Specifically, we first attach half-edges to each hyperedge ( Fig. 2(d)) and then connect each of the M half-edges to a node chosen uniformly at random, i.e., with probability 1/ . When ( , ) = (1, 1), we first attach half-edges to each node and half-edges to each hyperedge ( Fig. 2(e)). Then, we select a free (i.e., yet available) half-edge attached to a node and a free half-edge attached to a hyperedge uniformly at random and connect them to create a hyperedge. We repeat these steps until we exhaust all the free half-edges.

3.2
∈ {2, 2.5} The hyper -series with ≤ 1 and ≤ 1 exactly preserves up to the degree of each node and the size of each hyperedge. However, it is practically difficult to construct a bipartite graph that exactly preserves the pairwise joint degree distribution of the node by starting from the empty network and adding edges. The intuitive explanation for this difficulty is as follows. Consider an edge, of which one end has already been attached to a node with degree . Suppose that we connect the other end of this edge to hyperedge of size . If ≥ 3, then ( , ), i.e., the number of hyperedges that a node with degree and a node with degree share simultaneously changes for multiple values of in general. This fact makes it difficult to connect edges between nodes and hyperedges one by one while exactly preserving the node's pairwise joint degree distribution, i.e., Targeting-rewiring process for ! ! = 2.5 (a) Hyper !"-series with ! ! , ! " = (2, 0) (c) Hyper !"-series with ! ! , ! " = (2.5, 0) Targeting-rewiring process for ! ! = 2.5 (d) Hyper !"-series with ! ! , ! " = (2.5, 1) Hyper !"-series with ! ! , ! " = (1, 1) Targeting-rewiring process for ! ! = 2 Targeting-rewiring process for ! ! = 2 Targeting-rewiring process for ! ! = 2 Figure 3. Workflow of the hyper -series with ∈ {2, 2.5} and ∈ {0, 1}. represents the number of hyperedges; ( ) represents the degree distribution of the node; ( ) represents the size distribution of the hyperedge; ( , ) represents the joint degree distribution of the node; ( ) represents the degree-dependent redundancy coefficient of the node.
This problem is similar to the one for dyadic networks; it is difficult to construct dyadic networks that exactly preserve higher-order structures than the pairwise joint degree distribution of the node 37,41,43 . To mitigate this problem, the algorithm of the -series for dyadic networks uses the so-called targeting-rewiring process with the aim of preserving the pairwise joint degree distribution and the triadic relationships, i.e., the degree-dependent clustering coefficient of the node. In the targeting-rewiring process, one repeatedly rewires edges in the generated network such that the final network exactly preserves the pairwise joint degree distribution and approximately preserves the degree-dependent clustering coefficient of the input network.
We extend the targeting-rewiring process for -series to the case of bipartite graphs to realize the algorithm of hyper -series with ∈ {2, 2.5}. We show the composition of the hyper -series with ∈ {2, 2.5}, which involves the targetingrewiring process, in Fig. 3. Specifically, the hyper -series with = 2 starts by generating a bipartite graph using the hyper -series with = 1 and the given ∈ {0, 1} (see Fig. 3(a) and 3(b)). The generated network preserves the degree of each node and either the average size of hyperedges or the size of each hyperedge depending on whether = 0 or = 1, respectively. Then, we run the targeting-rewiring process for = 2, which amounts to repeatedly rewiring edges such that the randomized hypergraph approximately restores the joint degree distribution of the original hypergraph while exactly preserving 5/28 the degree of each node.
The targeting-rewiring process for = 2 runs as follows. We first select a pair of edges ( , ) and ( , ) such that ≠ and ≠ uniformly at random (see Fig. 4(a)). Then, we replace ( , ) and ( , ) by ( , ) and ( , ) if and only if a distance between the original and present pairwise joint degree distribution, denoted by 2 , decreases if we rewire the edges. Using the normalized 1 distance, we define 2 by where ( , ), ( , ), and represent the pairwise joint degree distribution of the node, the number of hyperedges that nodes with degree and nodes with degree share, and the size of hyperedge , respectively, for the rewired hypergraph.
To derive the second line in Eq. (5), we have used =1 = ( , ) = 1. We repeat the rewiring attempts times until 2 becomes sufficiently small and hardly decreases by further rewiring. We set = 500M.
The rewiring preserves the normalization factor, =1 ( − 1), because the rewiring does not alter for any = 1, . . . , . This property makes it easy to calculate 2 . In other words, for each edge ( , ) to be added or removed by the rewiring, it is sufficient to calculate how the number of hyperedges, ( , ), where and are the degrees of two nodes belonging to hyperedge , changes (see Eq. (5)).
It is also difficult to construct bipartite graphs that exactly preserve the degree-dependent redundancy coefficient of the node, ( ), over the values of . This is because the redundancy coefficients of multiple nodes simultaneously change if one adds or removes an edge in general. Therefore, for = 2.5, we further repeatedly rewire edges of the hypergraph generated by the hyper -series with = 2 as follows. (We call this procedure targeting-rewriting for = 2.5. See also Figs. 3(c) and 3(d).) We first select a pair of edges ( , ) and ( , ) such that ≠ , ≠ , and = uniformly at random (see Fig.  4(b)). Then, we replace ( , ) and ( , ) by ( , ) and ( , ) if and only if the distance defined by where ( ) represents the degree-dependent redundancy coefficient of the node for the rewired hypergraph, decreases after the rewiring. We repeat the rewiring attempts = 500M times. It is easy to calculate 2.5 upon a rewiring attempt. To explain this, we rewrite Eq. (6) as

6/28
where represents the redundancy coefficient of node for the rewired hypergraph. To derive Eq. (7), we have used the fact that the rewiring exactly preserves the degree of each node. Equation (7) implies that it is sufficient to only calculate the change in for the nodes that are involved in the rewiring (i.e., and ) and those that share at least one hyperedge with either or . The first subprocess comprising the hyper -series with ∈ {2, 2.5} is to generate a randomized hypergraph using the hyper -series with = 1 (see Fig. 3). This process preserves the node's degree distribution and destroys the degree correlation and redundancy of the node. The second subprocess comprising the hyper -series with ∈ {2, 2.5} is the targeting-rewiring process. This process also preserves the node's degree distribution. Therefore, the entire procedure of the hyper -series with ∈ {2, 2.5} preserves the node's degree. Furthermore, the targeting-rewriting with = 2 and = 2.5 makes the degree correlation and redundancy, respectively, approach those of the original hypergraph, which has been lost in the course of the first subprocess. Therefore, the entire hyper -series with = 2 and = 2.5 approximately preserves the degree correlation and the redundancy, respectively.
The targeting-rewiring process for = 2.5 also preserves the degree correlation, i.e., ( , ) for each and , for the following two reasons. First, owing to the constraint that = , the rewiring preserves ( , ), i.e., the number of hyperedges that nodes with degree and nodes with degree share, for any and . Second, the rewiring preserves the normalization factor =1 ( − 1) as in the case of = 2. The targeting-rewiring process for = 2 or 2.5 preserves the size of each hyperedge of the randomized hypergraph. However, with ( , ) = (2, 0) or (2.5, 0), the hyper -series does not preserve the size of each hyperedge of the input hypergraph. This is because we first generate a bipartite graph with ( , ) = (1, 0), which destroys the size distribution of hyperedges, prior to the targeting-rewiring (see Figs. 3(a) and 3(c)).

An alternative algorithm for
, we have an alternative to the targeting-rewiring process, which is an extension of the so-called randomizingrewiring process in -series for dyadic networks 37,43 to the case of bipartite graphs. The randomizing rewiring produces bipartite graphs that exactly preserve both nodal degree distribution and ( , ). In randomizing rewiring, the initial bipartite graph is a replica of the original bipartite graph . Then, we select a pair of edges, ( , ) and ( , ), such that ≠ , ≠ , and = uniformly at random, and then replace ( , ) and ( , ) by ( , ) and ( , ). The rewiring preserves the degree of each node, ( , ), and the size of each hyperedge. We repeat this rewiring procedure times, where is sufficiently large, and use the final result as˜. We set = 100M because we have found up to our numerical efforts that the overlap of edges of and those of the rewired hypergraph converges sufficiently before = 100M.
The randomizing rewiring has an advantage over the targeting rewiring in that it exactly preserves both the degree of each node and ( , ) of the input bipartite graph; the targeting rewiring only approximately preserves ( , ). However, in contrast to the case of dyadic networks for which the randomizing rewiring is efficient 37,43 , the randomizing rewiring for the hyper -series has two drawbacks. First, it is only practical with ( , ) = (2, 1). On one hand, although we can easily extend the randomizing rewiring to the case of ≤ 1 and ≤ 1, efficient algorithms for generating bipartite graphs exactly preserving the quantities with ≤ 1 and ≤ 1 already exist, as we described in Section 3.1. On the other hand, it is practically difficult to apply the randomizing rewiring in the case of ( , ) = (2, 0), (2.5, 0), and (2.5, 1) because a proposed random rewiring that respects the constraints imposed by the given ( , ) rarely preserves ( , ). Second, the overlap of the edges in and those in the rewired hypergraph converges to a nonnegligibly large value with the randomizing rewiring with ( , ) = (2, 1). In other words, the randomizing rewiring does not sufficiently randomly shuffle the edges of the original bipartite graph even if one carries out the rewiring many times. We show numerical evidence of this phenomenon in Appendix A. Therefore, we use the targeting rewiring in the following analyses when ( , ) = (2, 1).

Data
In this section, we apply the hyper -series to four empirical hypergraphs. The NDC-classes hypergraph, which we refer to as the drug hypergraph in the following text, is a drug network constructed from the National Drug Code Directory 13 . Its nodes are class labels, such as serotonin reuptake inhibitor, and a hyperedge is a set of class labels associated with a single drug. The Enron hypergraph is an email communication network 7,13 , in which a node is an email address, and a hyperedge is a set of all recipient addresses of an email. The primary-school hypergraph is a social contact network, where nodes are individuals (i.e., students or teachers), and a hyperedge represents an event in which a set of individuals are in face-to-face contact event with each other 5,13 . The high-school hypergraph is also a social contact network, where nodes are students, and a hyperedge is a face-to-face contact event among a set of students 6,13 . We preprocessed each data set by first removing multiple hyperedges in the original hypergraph, and then by extracting the largest connected component. Table S2 shows properties of the largest connected component, which we use in the following analysis, for the four data sets.

Structural Properties
For each empirical hypergraph, we compare six structural properties among the given hypergraph and hypergraphs generated by the hyper -series with ∈ {0, 1, 2, 2.5} and ∈ {0, 1}. We also analyze an existing reference model for bipartite graphs, the B2K 45 , as a benchmark. In terms of the terminology of hypergraphs, the B2K model preserves the degree of each node, the size of each hyperedge, and the number of hyperedges with size to which nodes with degree belong for each and . Figure 5 compares the six structural properties between the drug hypergraph, the hyper -series, and the B2K model. The results for the hyper -series with = 0 and various values of together with those for the original drug hypergraph and the B2K model are shown in Fig. 5(a)-5(f). We make the following observations. First, Fig. 5(a) indicates that the hyper -series with ∈ {1, 2, 2.5} but not = 0 exactly preserves the degree of each node (and therefore the degree distribution) of the drug hypergraph, which is expected. Second, Fig. 5(b) indicates that the hyper -series with ∈ {2, 2.5} but not ∈ {0, 1} approximately preserves the average degree of the nearest neighbors of nodes with degree , denoted by nn ( ), in the input hypergraph. Because nn ( ) is a derivative of the pairwise joint distribution of the node's degree, ( , ), which the hyper -series with ≥ 2 intends to preserve, this result is also expected. The hyper -series with ∈ {0, 1} produces networks without noticeable degree correlation of the node (see Fig. 5(b)). Third, as expected, the hyper -series with = 2.5 but not with smaller values approximately preserves the degree-dependent redundancy coefficient of the node, ( ), of the empirical hypergraph (see Fig. 5(c)). Fourth, as expected, the hyper -series with any ∈ {0, 1, 2, 2.5} and = 0 does not preserve the distribution of the size of the hyperedge of the original hypergraph; it only preserves the average size of the hyperedge (see Fig. 5(d)). Fifth, the hyper -series with a larger value of better approximates the distribution of the shortest path length between node pairs although the hyper -series is not designed to preserve this quantity (see Fig. 5(e)). Finally, we show in Fig. 5(f) the cumulative degree distribution of the one-mode projection, where each pair of nodes in the projected network are adjacent if they belong to at least one common hyperedge, and the multiplicity of the edge is equal to the number of hyperedges that the two nodes share 57,58 . The hyper -series progressively better approximates the cumulative degree distribution of the one-mode projection when is larger, whereas the results are similar between = 2 and = 2.5. Note that the hyper -series is not designed to preserve the degree distribution of the one-mode projection. We show in Fig. 5(g)-5(l) the results for the hyper -series with = 1 and various values of together with those for the B2K model. The results for the empirical hypergraph and the B2K model shown in these figures are the same as those shown in Fig. 5(a)-5(f). We make the following observations. First, as expected, the results shown in Fig. 5(g)-5(i) are similar to those shown in Fig. 5(a)-5(c). In other words, the hyper -series with ≥ 1 preserves the degree distribution of the node, that with ≥ 2 additionally preserves nn ( ), and that with = 2.5 additionally preserves ( ). Second, Fig. 5(j) indicates that the hyper -series preserves the distribution of the size of hyperedge, which is because we set = 1. Third, similar to the case of = 0, the hyper -series with a larger value better approximates the distribution of the shortest path length between nodes (see Fig. 5(k)). A comparison between Figs. 5(e) and 5(k) suggests that the approximation accuracy is not notably different between = 0 and = 1. Finally, a comparison between Figs. 5(f) and 5(l) suggests that the hyper -series with ≥ 2 and = 1 more accurately approximates the cumulative degree distribution of the one-mode projection than the hyper -series with the same value and = 0 and than that with ≤ 1 and = 1. This is presumably because the node's degree in the one-mode projection depends not only on the degree of the node in the original hypergraph but also on the size of each hyperedge to which the node belongs.
The B2K model exactly preserves the distributions of node's degree and hyperedge's size, as expected (see Figs. 5(a) and 5(d)). However, it little preserves the node's degree correlation and the redundancy coefficient of the empirical network (see Figs. 5(b) and 5(c)). Therefore, roughly speaking, the complexity of the B2K model is somewhere between that of the hyper -series with ( , ) = (1, 1) and that with ( , ) = (2, 1). We also find that the B2K model accurately preserves the degree distribution of the one-mode projection (see Fig. 5(f)) although the B2K model does not intend to preserve it.
To be quantitative, we measure the distance in the distribution of each of the six quantities between the empirical hypergraph 8/28  Average of neighbor's degree     Table 3. Distance between the empirical hypergraphs and those generated by the reference models (i.e., hyper -series and B2K model). In the table, ( ) represents the cumulative degree distribution of the node; nn ( ) represents the average degree of the nearest neighbors of nodes with degree ; ( ) represents the degree-dependent redundancy coefficient of the node; ( ) represents the cumulative size distribution of the hyperedge; ( ) represents the distribution of the shortest path length between nodes; (˘) represents the cumulative degree distribution of the one-mode projection.

Data
Model  and each type of synthetic hypergraph for each data set. For the degree distribution of the node, the size distribution of the hyperedge, and the degree distribution of one-mode projection, we calculate the Kolmogorov-Smirnov distance between the cumulative distribution for the original bipartite graph and that for the generated bipartite graph. The Kolmogorov-Smirnov distance between two cumulative distributions, denoted by 1 ( ) and 2 ( ), is given by sup | 1 ( ) − 2 ( )|. For nn ( ), ( ), and the distribution of the shortest path length between nodes (which we denote by (ℓ) for the shortest path length ℓ), we calculate the normalized 1 distance between the vector corresponding to the original bipartite graph, denoted by = ( 1 , 2 , . . . , ), and that corresponding to the synthetic bipartite graph, denoted by˜= (˜1,˜2, . . . ,˜). Specifically, we set = nn ( ) with = 1, . . . , , = ( ) with = 1, . . . , , or = (ℓ) with = 1, . . . , − 1, and similar for˜. The distance between and˜is defined by =1 |˜− |/ =1 | |. We calculate the distance average of each property over the independent 100 runs for each model. In each model, we generate an independent bipartite graph for each run.
We show the distance measurement results in Table S3. The following observations apply to all the data sets unless we state otherwise. First, we verify that the degree distribution of the node is the same between the empirical data and the hyper -series with ≥ 1 and the B2K model. Second, the hyper -series with = 2 realize a considerably small distance to the empirical data in terms of nn ( ). Third, the hyper -series with = 2.5 yields a small distance to the empirical data in terms of ( ). Fourth, the distribution of hyperedge's size is the same between the empirical data, any hyper -series with = 1, and the B2K model. Fifth, for both = 0 and = 1, the hyper -series is more similar to the empirical data in terms of the distribution of shortest path length between nodes (i.e., (ℓ)) when is larger. However, with the exception of primary-school hypergraph, the relative error between the hyper -series and the empirical hypergraph in terms of (ℓ) is large (i.e., > 30%) even with ( , ) = (2.5, 1). Finally, the hyper -series with ( , ) = (2, 1), (2.5, 1) and the B2K model are close to the empirical data in terms of the degree distribution of the one-mode projection. All these results are consistent with those shown in Fig. 5. We also statistically tested how significantly the hyper -series changes each structural property of a given hypergraph (see section S2 in the supplementary material).
To examine if the targeting rewiring introduces sufficient randomization, we measure the distance measures 2 and 2.5 , which are defined in Eqs. (5) and (6), as a function of the number of rewiring attempts, , for the hyper -series with ∈ {2, 2.5} and ∈ {0, 1}. The results for the drug data set are shown in Fig. 6. For both = 0 and = 1, 2 rapidly decreased to values that are ≈ 15% larger than the final value in the first ≈ 100M targeting rewiring attempts. Then, 2 continued to decrease slowly towards the final value. Similarly, 2.5 in the case of both = 0 and = 1 rapidly decreased to values that are ≈ 10% larger than the final values in the first 100M targeting rewiring attempts and then slowly decayed towards the final values. We confirmed that the trajectories of 2 and 2.5 were similar for the other three data sets.

Epidemic Spreading
A primary application of the hyper -series is to simulations of dynamical or other processes on hypergraphs. Specifically, comparisons between the results on the original and synthetic hypergraphs generally help us to understand particular structural properties of the hypergraph that impact the processes on hypergraphs. For example, comparisons between a dynamical process on networks generated by the hyper -series with = 0 and with = 1 will reveal the effect of the node's degree distribution. This is because the hyper -series with = 0 destroys the degree distribution of the original hypergraph, whereas that with = 1 preserves it. Likewise, comparisons between = 1 and = 2 will reveal the effects of degree correlation; comparisons between = 2 and = 2.5 will reveal the effects of redundancy; comparisons between = 0 and 1 will reveal the effects of the hyperedge's size distribution. We showcase the application of the hyper -series with epidemic spreading and evolutionary game dynamics models.  In this section, we examine a susceptible-infected-susceptible (SIS) model on hypergraphs in continuous time 47 . Each node is in either the susceptible state or the infectious state at any time . Each infectious node recovers and becomes susceptible according to a Poisson process with rate . A fundamental assumption underlying the present model, which distinguishes it from other SIS models on hypergraphs [59][60][61] , is that the contagion process is critical-mass dynamics, which generalizes a previous model 62 . Let denote the fraction of infectious nodes in hyperedge ∈ . For each hyperedge , each susceptible node in becomes infected at rate if and only if ≥ , where is a parameter. We set = 1 and = log 2 | |, where is a parameter 47 .

11/28
We assume that all the nodes are initially infectious and run the SIS model on the primary-school and high-school hypergraphs until = 100. We confirmed that the fraction of infected nodes converges to an approximate stationary value before = 100. For the given and values, we average the fraction of infected nodes over 95 ≤ ≤ 100 and over 100 runs. In the case of the hyper -series, we generate an independent bipartite graph for each run.
In Figs. 7(a) and 7(b), we set = 0.1 and compare the fraction of infected nodes among the primary-school hypergraph and hypergraphs generated by the corresponding hyper -series. We set = 0 in Fig. 7(a) and = 1 in Fig. 7(b). The results for the empirical hypergraph shown in Figs. 7(a) and 7(b) are the same. We make the following observations. First, the hyper -series with = 0 considerably overestimates the fraction of infected nodes and underestimates the epidemic threshold for the empirical hypergraph for any . Second, the fraction of infected nodes in the hyper -series with = 1 is closer to that in the empirical hypergraph than with = 0. Third, the hyper -series with ( , ) = (1, 1), (2, 1), and (2.5, 1) accurately estimate the fraction of infected nodes and the epidemic threshold in the empirical hypergraph and almost to the same extent. In other words, the hyper -series with ( , ) = (1, 1) is necessary and sufficient for reproducing the fraction of infected nodes as a function of the infection rate. These results indicate that the size of each hyperedge, or equivalently, its distribution, is a main determinant of the epidemic spreading more than are the node's local properties with > 1, such as the degree correlation and redundancy coefficient, and mesoscopic or macroscopic structure of the hypergraph. These results qualitatively remain the same for a different threshold value, i.e., = 0.5 (see Figs. 7(c) and 7(d)) and for the high-school hypergraph (see Figs. 7(e)-7(h)).

Evolutionary Dynamics
Next, we compare evolutionary dynamics on the empirical hypergraphs and the hyper -series. We use a previously proposed model of public goods game on hypergraphs, which proceeds as follows 48 . Each node selects either to cooperate or defect in each round of evolutionary dynamics. A cooperator transfers an asset to the public goods of hyperedge , where | | ≥ 2. A defector does not contribute to the public goods. The total investment in is C , where C is the number of cooperators in . Then, one multiplies the total investment by the synergy factor , where > 1, and then equally distributes the multiplied total investment among all the nodes in . The payoff that a cooperator and defector receives from hyperedge is equal to C = C /| | − and D = C /| |, respectively. As in the previous study 48 , we assume = | | , where > 0 and ≥ 0.
We numerically simulate the evolutionary public goods game on the given hypergraph as follows. Initially, each node is independently cooperator or defector with a probability of 0.5 each. In each round, we first uniformly randomly select a node , whose strategy (i.e., cooperation or defection) may be updated, with probability 1/ and then select a hyperedge to which belongs with probability 1/ uniformly at random. We continue this selection procedure until we select a hyperedge with | | ≥ 2. We have confirmed that each node belongs to at least one hyperedge with | | ≥ 2 in all cases. Then, all the nodes that belong to play the public goods game just once in each of the hyperedges to which they belong. Each node accumulates the payoffs from all the games that the node plays. Then, we divide the accumulated payoff by the number of games that the node has played. We denote by the payoff of node . Node adopts the strategy of the node that has gained the largest payoff in hyperedge , denoted by , with probability ( − )/Δ. When < 1, we set where˜m in = max{ min , 2}, and max and min are the largest and smallest sizes of the hyperedge, respectively. When ≥ 1, we set

13/28
Equations (8) and (9) guarantees that the probability ( − )/Δ is normalized (see Ref. 48 for details). If ≤ , node does not adopt the strategy of . For the given and values, we measure the fraction of cooperators as the average over the (10 6 + 1)st and (10 6 + 10 3 )th rounds in a single run and over 100 runs. In the case of the hyper -series, we generate an independent bipartite graph for each run.
In Figs. 8(a) and 8(b), we set = 0 and compare the fraction of cooperators on the primary-school hypergraph and the hyper -series. We set = 0 in Fig. 8(a) and = 1 in Fig. 8(b). The results for the empirical hypergraph shown in Figs. 8(a) and 8(b) are the same. We make the following observations. First, at both values, the node's pairwise degree correlation present in the empirical hypergraph promotes the cooperation but the node's degree distribution or the profile of the redundancy coefficient does not. Second, the fraction of cooperators in the hyper -series with any and = 0 is considerably smaller than that in the empirical hypergraph. In contrast, the fraction of cooperators in the hyper -series with = 1 is generally close to that in the empirical hypergraph. Therefore, destroying the distribution of the hyperedge's size in the original hypergraph suppresses cooperation. In fact, the size distribution of the hyperedge is a stronger determinant of the amount of cooperation than any of the node's local properties investigated (i.e., the degree distribution, pairwise degree correlation, and redundancy coefficient). Figures 8(c) and 8(d) show the results for = 1. We make the following observations. First, when = 0 (see Fig.  8(c)), preserving the node's degree correlation and redundancy of the original hypergraph individually enhances cooperation. However, when one destroys the degree correlation (i.e., = 0 or 1), there is less cooperation than in the original hypergraph. Furthermore, intriguingly, the hyper -series with ( , ) = (2, 0) and (2.5, 0) realize more cooperation than on the original hypergraph, suggesting that destroying the network structure that is higher-order than the degree-correlation and redundancy promotes cooperation. Second, there is less cooperation when the distribution of the hyperedge's size is preserved (i.e., = 1; Fig. 8(d)) than destroyed (i.e., = 0; Fig. 8(c)). This result is opposite to that for = 0 (see Figs. 8(a) and 8(b)). Third, similarly to the case of = 0, the preservation of the node's degree correlation and redundancy (but not higher-order structure) of the original hypergraph individually increases cooperation in the case of = 1. In particular, hyper -series with ( , ) = (2.5, 1) realizes more cooperation than on the original hypergraph (see the purple line in Fig. 8(d)). A comparison between Figs. 8(c) and 8(d) suggests that, no matter whether the distribution of the hyperedge's size is destroyed or preserved, destroying the structure that is higher-order than the node's redundancy by randomization yields more cooperation than in the original hypergraph. All these results qualitatively remain the same for the high-school hypergraph (see Figs. 8(e)-8(h)).
The critical point = ( ) separating the defection and cooperation phases is analytically calculated as follows 48 : where˜( ) = ( )/ max =˜m in ( ), and ( ) represents the fraction of hyperedges of size . Note that it holds that max =˜m in˜( ) = 1. In the infinite well-mixed population, the evolutionary dynamics converge to full defection and full cooperation when < ( ) and > ( ), respectively. When = 0, the primary-school hypergraph yields (0) ≈ 2.31. Roughly consistent with this, the fraction of cooperators on the empirical hypergraph reaches ≈ 1.0 at ≈ 2.5 in our simulations (see the red lines in Figs. 8(a) and 8(b)). The corresponding hyper -series with any and = 0 leads to (0) ≈ 2.77, which underestimates the threshold obtained from the numerical simulations, i.e., ≈ 3.3 (see Fig. 8(a)). However, Eq. (10) and our numerical results are consistent in the sense that the critical point in terms of for the hyper -series with = 0 is larger than that for the empirical hypergraph. The hyper -series with any and = 1 has the same analytically determined threshold, (0) ≈ 2.31, as the empirical hypergraph because these hypergraphs have the same distribution of the hyperedge's size. This result is also consistent with our numerical result that the fraction of cooperators reaches ≈ 1.0 at ≈ 2.5 in the hyper -series with any and = 1 (see Fig.  8(b)). When = 1, Eq. (10) predicts that (1) = 1.0 regardless of and the size distribution of the hyperedge (therefore, regardless of ). This result is consistent with our numerical results shown in Figs. 8(c) and 8(d).

Conclusion
We proposed a family of reference models for hypergraphs called the hyper -series. The hyper -series preserves the local properties of nodes and hyperedges in the given hypergraph to different extents. We empirically showed that the hyper -series 14/28  Figure 9. Comparison between the targeting-rewiring and randomizing-rewiring processes for the drug hypergraph. We set ( , ) = (2, 1). (a) Cumulative degree distribution of the node, (b) average degree of the nearest neighbors of nodes with degree , (c) degree-dependent redundancy coefficient of the node, (d) cumulative size distribution of the hyperedge, (e) distribution of the shortest path length between nodes, and (f) cumulative degree distribution of the one-mode projection. We indicate the curves behind other curves by the arrow and label wherever multiple curves completely or almost overlap each other.
preserves the properties of nodes and hyperedges, as intended, across different hypergraph data sets. We also showcased its use as reference models in investigating epidemic spreading and evolution of cooperation on hypergraphs. Models of dynamical processes on hypergraphs, such as the epidemic spreading [59][60][61]63 , evolutionary dynamics 48,64 , opinion dynamics [65][66][67] , and synchronization [68][69][70][71] , have been proposed. Deploying the hyper -series to studies of various models of dynamics is expected to better reveal how the dynamics depend on the specific structural properties of the given hypergraphs.
Up to our numerical efforts, we found that the hyper -series with a larger value better approximates the distribution of the shortest path length between nodes for the empirical hypergraphs. However, as expected, even the hyper -series with the largest value (i.e., = 2.5) does not accurately approximate the distribution of the shortest path length. In particular, we found that the average shortest path length for the hypergraphs generated by the hyper -series with = 2.5 is smaller than that for the empirical hypergraph for all the four data sets (e.g., the drug hypergraph has the average shortest length of 3.53, whereas the hyper -series has 3.03 for ( , ) = (2.5, 0) and 2.77 for ( , ) = (2.5, 1)). The community structure is one of network structures that is higher-order than the redundancy coefficient of the node and likely increases the shortest path length between nodes. Extending the hyper -series to reference models that additionally preserve the community structure warrants future work. To this end, it may be useful to employ a family of stochastic block models with the community structure for bipartite graphs [72][73][74][75] or hypergraphs [76][77][78][79] . In this section we compare the targeting-rewiring and randomizing-rewiring processes with ( , ) = (2, 1). We show the distributions of the six quantities for the two rewiring processes for the drug hypergraph in Fig. 9. Both rewiring processes exactly preserve the degree distribution of the node and the size distribution of the hyperedge of the original bipartite graph (see Figs. 9(a) and 9(d)). The randomizing-rewiring process exactly preserves nn ( ), whereas the targeting-rewiring process only approximately preserves it (see Fig. 9(b)). The two rewiring methods produce similar networks in terms of the degree-dependent redundancy coefficient, the distribution of the shortest path length between nodes, and the degree distribution of the one-mode projection, as shown in Figs. 9(c), 9(e), and 9(f), respectively.
We also compare the two rewiring processes in terms of the overlap of the edges of the empirical hypergraph and those of the synthetic hypergraphs. Figure 10(a) shows the Jaccard index between sets of edges in the drug hypergraph and the hypergraph generated by the randomizing rewiring as a function of the number of rewiring attempts. The figure indicates that the Jaccard index steadily decreases as the randomizing rewiring proceeds. However, it plateaus at ≈ 0.32, which implies that a set of edges in the synthetic bipartite graph is not sufficiently shuffled due to the constraints that each edge rewiring step has to preserve ( , ) in addition to the degree of each node. The Jaccard index similarly plateaus at ≈ 0.45, ≈ 0.45, and ≈ 0.21 for the Enron, primary-school, and high-school hypergraphs, respectively (see Figs. 10(b), 10(c), and 10(d), respectively). In contrast, the Jaccard index is ≈ 0.036, ≈ 0.016, ≈ 0.006, ≈ 0.005 under the targeting rewiring for the drug, Enron, primary-school, and high-school hypergraphs, respectively. Therefore, we conclude that the randomizing rewiring does not sufficiently shuffle the edges of the input hypergraph.

Supplementary Materials for:
Randomizing hypergraphs preserving degree correlation and local clustering Kazuki Nakajima, Kazuyuki Shudo, Naoki Masuda

S1 Size of the largest connected component of hypergraphs generated by hyper series
We measured how the size (i.e., number of nodes) of the largest connected component of the empirical hypergraphs changes by randomization using the hyper -series. We show in Table S1 the size of the largest connected component of hypergraphs the hyper -series generates, divided by that of the original hypergraph. The table indicates that we barely lose nodes in the largest connected component by the randomization.

S2 Statistical test for the structural properties of hypergraphs generated by hyper series
In this section, we statistically test whether the hyper -series changes each structural property of a given hypergraph. Consider a combination of any of the four empirical hypergraphs, any ( , ) pair, and any of the six structural properties. To carry out a -test, we first generate 100 pairs of independent hypergraphs using the hyper -series. Second, we measure the distance between the two hypergraphs in each pair in terms of the distance measure for the selected structural property. We denote by rand and rand the mean and standard deviation, respectively, of the distance calculated on the basis of the 100 pairs of randomized hypergraphs. Third, we generate another 100 hypergraphs using the hyper -series with the same ( , ). Fourth, we measure the distance between the empirical hypergraph and each of the 100 hypergraphs in terms of the selected structural property. We denote by emp and emp the mean and standard deviation, respectively, of the distance between a randomized hypergraph and the empirical hypergraph calculated on the basis of the 100 pairs. Finally, we calculate the effect size for the -test, called the Cohen's 1 , as We define = 0 if both emp − rand and ( emp ) 2 + ( rand ) 2 are equal to zero. We regard the effect size to be very small ( = ±0.01), small ( = ±0.2), medium ( = ±0.5), large ( = ±0.8), very large ( = ±1.2), and huge ( = ±2.0) 1, 2 . Table S2 shows rand , rand , emp , emp , and Cohen's for the cumulative degree distribution of the node. The effect size is huge when ( , ) = (0, 0) and (0, 1) for all the four empirical hypergraphs because the hyper -series with ( , ) = (0, 0) and (0, 1) destroys the degree of each node. For the other ( , ) values, the effect size is zero because the hyper -series with ∈ {1, 2, 2.5} exactly preserves the degree of each node. Table S3 shows the results for the average degree of the nearest neighbors of nodes with degree . The effect size is huge when is 0 or 1 because the hyper -series with these values destroys the degree correlation. When is 2 or 2.5, the hyper -series intends to preserve the degree correlation of the node. However, Table S3 indicates that the effect size ranges from medium to huge values, depending on the empirical network and the value. This is because the rand and emp are small. Nevertheless, the Cohen's values in these cases are much smaller than those for = 0 and 1. Table S4 shows the results for the degree-dependent redundancy coefficient of the node. The effect size is huge when is 0, 1, or 2 because the hyper -series with these values destroys the redundancy of the node of a given hypergraph. When is 2.5, the hyper -series intends to preserve the redundancy of the node. However, the effect size is huge for all the four hypergraphs and values. As in the case of the degree correlation, this is because rand and emp are small. Nevertheless, similar to Table S3, is much smaller with = 2.5 than with ≤ 2. Table S5 shows the results for the cumulative size distribution of the hyperedge. The effect size is huge when = 0 for all the four empirical hypergraphs because the hyper -series with = 0 destroys the size of each hyperedge. When = 1, the effect size is equal to zero because the hyper -series with = 1 exactly preserves the size of each hyperedge.

20/28
Tables S6 and S7 show the results for the distribution of the shortest path length between nodes and those for the cumulative degree distribution of the one-mode projection, respectively. For both properties, the effect size is huge in almost all cases. This result is consistent with the fact that the hyper -series does not intend to preserve these two properties. However, the value is considerably smaller when or is larger in most cases.