Spectrum Analysis of Contact Network for Public Policy in a Pandemic

In a pandemic, countries around the world formulate public policies that limit the number of citizens who may gather, in order to slow the spread of the virus, protect national health, and maintain the normal operation of economic activities. Our research focuses on how to achieve optimal public policy under different conditions. Traditional SIR and SEIR models reflect the transmission process well and give credible predictions from a macro perspective, but they lack sensitivity to micro-level data and cannot assess the transmission risk posed by close contacts and sub-close contacts. Based on the Barabási-Albert scale-free network and the Random spanning tree algorithm, we generate a simulated spread network for non-specific infectious diseases. We also generate group networks under different gathering constraints. The superposition of the two forms a composite contact network. Our study of the contact network shows that, once close contacts and sub-close contacts are considered, the public policy optimization problem of slowing the spread of the epidemic can be answered by spectrum analysis of the contact network. We perform computer simulations and theoretical proofs of this model, and conduct a transmission analysis of its process.


I. INTRODUCTION
In the digital age, governments and scientific research institutions can easily obtain some accurate group contact network data in society, such as public transportation passenger lists, company employee lists, and student registration information in educational institutions. The contact network within each group constitutes a sub-network in the giant relational network of a community or city. From the relationship between the sub-networks, we can extract a lot of useful information. In a pandemic, it has become a challenge for government departments to reasonably limit the connections between these groups and formulate optimal public policies. Combining known epidemic data, analyzing the relationship between these sub-networks to provide a scientific theoretical basis for the government's public policy formulation has become one of the primary goals of our work.
In a pandemic, countries around the world will take various countermeasures and formulate different public policies in order to slow down the spread of the virus, protect national health, and maintain the normal operation of economic activities. Among them, one of the most extensive measures is to limit the number of people gathered in public places, such as limiting the number of passengers on public transport, limiting the number of people in restaurants, limiting the size of gatherings, etc. Our work focuses on researching and evaluating the social infection risk level under different aggregation restriction levels. This evaluation result can take into account the impact of close contacts and sub-close contacts on the risk level during the spread of the epidemic.
In the past, the main mathematical models used to study the spread of infectious diseases were the SIR and SEIR models. An SIR model is an epidemiological model that computes the theoretical number of people infected with a contagious illness in a closed population over time. The name of this class of models derives from the fact that they involve coupled equations relating the number of susceptible people S(t), the number of people infected I(t), and the number of people who have recovered R(t) [1]. An SEIR model additionally considers the number of people exposed to a source of infection, E(t). The SIR model consists of a system of three coupled nonlinear ordinary differential equations,
$$\frac{dS}{dt} = -\beta S I, \qquad \frac{dI}{dt} = \beta S I - \gamma I, \qquad \frac{dR}{dt} = \gamma I,$$
where t is time, β is the infection rate, and γ is the recovery rate [2]. Spectrum analysis is a mathematical method for evaluating the connection properties between vertices by studying the eigenvalues of the Laplacian matrix of the contact network. For infectious diseases, common models based on analytical mathematics, such as the SIR and SEIR models, reflect the propagation process well and give reasonable predictions from a macro perspective, but they lack sensitivity to accurate micro-level data. That is, these models cannot take into account the close contacts and sub-close contacts of an infected person. Therefore, traditional models cannot assess the impact of public policies that restrict crowd gathering.
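As a purely illustrative sketch, the SIR system dS/dt = -βSI, dI/dt = βSI - γI, dR/dt = γI can be integrated with a forward-Euler step. The function name `simulate_sir`, the parameter values, and the step size are assumptions chosen for this example and are not taken from our simulations; S, I, and R are expressed as fractions of the population.

```python
def simulate_sir(beta, gamma, s0, i0, r0, dt=0.01, steps=10000):
    """Forward-Euler integration of the SIR equations (illustrative only)."""
    s, i, r = s0, i0, r0
    history = [(s, i, r)]
    for _ in range(steps):
        new_infections = beta * s * i * dt   # flow from S to I
        new_recoveries = gamma * i * dt      # flow from I to R
        s = s - new_infections
        i = i + new_infections - new_recoveries
        r = r + new_recoveries
        history.append((s, i, r))
    return history


# Hypothetical parameters: beta = 0.3, gamma = 0.1, 1% initially infected.
history = simulate_sir(beta=0.3, gamma=0.1, s0=0.99, i0=0.01, r0=0.0)
peak_infected = max(i for _, i, _ in history)
```

The output is a single aggregate curve per compartment, which is why such a model cannot distinguish between different gathering limits.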
Because the Laplacian matrix is sensitive to individual edges, studying its eigenvalues makes it possible to focus on the microscopic connections between vertices. This means that we can analyze the close contacts of infected nodes as well as their sub-close contacts. The overall infection risk level of the society can then be assessed, so that governments can formulate reasonable public policies for epidemic prevention and control based on their own social conditions and characteristics.
Governments around the world have imposed different policies and undertaken preventive public health measures, such as social distancing orders, travel restrictions, local or national lockdowns, and partial or complete border closures, to control the spread of the pandemic [3]. A better understanding of the underlying processes affecting pandemic dynamics and related infection patterns will help to improve the effectiveness of public health interventions worldwide [4]. Consequently, we propose a method of spectrum analysis that can assess the impact of close and sub-close contacts on risk levels across the community.
The eigenvalues of a matrix A are called its spectrum [5]. In the past, research on the spectrum of a matrix has mainly focused on the algebraic connectivity of a graph and on the relationship between the largest eigenvalue and the domination number. The algebraic connectivity (also known as the Fiedler value or Fiedler eigenvalue) of a graph G is the second-smallest eigenvalue (counting multiple eigenvalues separately) of the Laplacian matrix of G [6]. Clemens Brand and Norbert Seifter proved that there is a close connection between the domination number and the largest eigenvalue [7]. With the growth of computing performance and the spread of digitalization, the use of spectral analysis methods to evaluate large-scale complex networks has become an acceptable option.
Governments all over the world were and are still looking for ways to contain the disease and alleviate its disastrous consequences for their people's health and economy [8]. With rapid advances in epidemic surveillance systems, abundant infection data are being collected. The availability of these data provides unprecedented opportunities to investigate the relationships between different risk factors [9]. Human mobility patterns are primarily attributed to the structure of the spatial network [10]. In a small-scale spatial environment, individuals often follow the road network to visit a set of locations (e.g., workplaces, shopping malls, grocery stores). In a large spatial environment, people tend to travel through a spatial network of airports, railroads, and highways [11]. As such, spatial network characteristics can also be related to the spread of infectious disease in the spatial environment. The concrete realization of the spatial network is the contact network. Fast, reliable, and easily accessible clinical assessment of the severity of the disease can help in allocating and prioritizing resources to reduce mortality [3]. The analysis of the contact network can help us understand the transmission network of the disease, and thus obtain valuable information.
To achieve our research goals, we simulated a relational network with at least 10,000 nodes, generated by superimposing the simulated propagation network and the group relational network. We hope that our model can cope with different types of infectious diseases. To adapt to diseases with different spreading characteristics, we used the Barabási-Albert scale-free network and the Random spanning tree algorithm to generate our simulated spreading network, so that the degree distribution of the nodes follows a power-law distribution and a Poisson distribution, respectively. By studying the properties of the Laplacian matrix spectrum, we hope to reduce the time complexity of analyzing the connections between sub-networks from $O(n^n)$ (considering all possible connections and propagation paths) to $O(n^3)$ (the time complexity of computing matrix eigenvalues).
Based on a series of simulation operations and theoretical analysis of complex networks, we believe that after considering close contacts and sub-close contacts, the public policy optimization problem of slowing the spread of the epidemic can be solved by the spectrum analysis of the contact network. In the following text, we give the analysis process and proof, and conduct a diffusion analysis of its process.

II. NETWORK BUILDING AND SIMULATION
Before the theoretical analysis, we established a simulation model of infectious disease transmission and a group generation model. We generated our simulated propagation network using the Barabási-Albert scale-free network and the Random spanning tree algorithm, respectively. All networks are finally recorded as the adjacency matrix of an undirected graph. In the implementation, we divide the nodes into two categories, namely infected vertices and safe vertices. Whenever a new node is infected, we randomly select a node from the safe vertices and move it to the infected vertices. Then, according to the infection network generation algorithm in use, we select a node from the infected vertices as the infection source and connect it to the newly infected vertex. For the Barabási-Albert scale-free network, we record the degree of each node and the sum of the degrees of all nodes. For each newly infected vertex, node k becomes the source of infection with probability
$$p_k = \frac{d_k}{\sum_i d_i},$$
where $d_i$ is the degree of vertex i. For networks generated by the Random spanning tree algorithm, all infected vertices are equally likely to be selected as the infection source. Finally, the adjacency matrix of the network is updated to reflect the changed network structure.
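The generation procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not our simulation program: the function name `grow_transmission_network`, the `mode` flag, and the use of an edge set instead of a full adjacency matrix are choices made for the example only.

```python
import random


def grow_transmission_network(n_nodes, n_infected, mode="ba", seed=0):
    """Return (infected, edges) for a simulated transmission tree.

    mode="ba":  the source is chosen with probability d_k / sum_i d_i
    mode="rst": the source is chosen uniformly among infected vertices
    """
    rng = random.Random(seed)
    safe = list(range(n_nodes))
    rng.shuffle(safe)
    infected = [safe.pop()]            # index case
    degree = {infected[0]: 0}
    edges = set()
    while len(infected) < n_infected:
        new = safe.pop()               # a random safe vertex becomes infected
        if mode == "ba" and edges:
            # Preferential attachment: weight each candidate by its degree.
            weights = [degree[v] for v in infected]
            source = rng.choices(infected, weights=weights, k=1)[0]
        else:
            # Uniform choice (also used for the very first edge in "ba" mode,
            # when all degrees are still zero).
            source = rng.choice(infected)
        edges.add((min(source, new), max(source, new)))
        degree[source] += 1
        degree[new] = 1
        infected.append(new)
    return infected, edges


# Hypothetical sizes: 10,000 nodes, 3% infected.
ba_infected, ba_edges = grow_transmission_network(10_000, 300, mode="ba")
rst_infected, rst_edges = grow_transmission_network(10_000, 300, mode="rst")
```

In both modes every newly infected vertex is attached to exactly one earlier infected vertex, so the result is a tree over the infected vertices.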

FIGURE 1. Transmission network (left) and group network (right).
In the entire model we built, there is no time index. The reason is that our study focuses on the impact of the current infection status on the overall structure of the network. Different infectious diseases spread at different rates, and we only need to focus on the current state of the network. In addition, to make our simulated network realistic, our contact network model is built on the following two assumptions:
1) Stochastic consistency assumption. Although in the real world each node represents a different individual, all nodes in our contact network are considered to be stochastically consistent. This means that although each individual in reality has specific contact information, in our simulated large-scale contact network this information is always represented by a randomly generated edge. The stochastic consistency assumption provides a reasonable theoretical premise for our subsequent simulation and analysis.
2) Homogeneity assumption. Our research focuses on the impact of an infectious disease on the overall network after it has spread across groups. To make it easier to identify the changes in the group network after the infection information is superimposed, we consider all group sub-networks to be homogeneous. This means that in our simulations the group sub-networks we generate are of equal size and the internal non-infectious contact information is ignored. The homogeneity assumption makes our final theoretical result correspond to the worst case in practice, because the group size is chosen as the maximum value allowed under the current public policy.
Based on the homogeneity assumption of the group network model, we generate the group network required in the data analysis process. The final step is the overlay. To simplify the implementation, we generate the group network directly on top of the already generated transmission network. If an edge of the group network already exists in the transmission network, the edge remains unchanged. Finally, after making sure that the simulation program ran without errors, we wrote a script to simulate the transmission process. With 10,000 nodes per network, we obtained about 60 GB of data for a total of 300 networks, covering the two transmission network models, infection rates from 1% to 30%, and group limits from 5 to 25 (in steps of 5). The follow-up theoretical analysis was carried out on this basis.
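A minimal sketch of the overlay step is given below. It assumes, for illustration only, that groups are consecutive blocks of `group_size` node indices and that every group is fully connected internally; the function name `overlay_group_network` and this clique structure are assumptions of the example, not a description of our program.

```python
from itertools import combinations


def overlay_group_network(n_nodes, group_size, transmission_edges):
    """Superimpose group edges on an existing set of transmission edges.

    Edges are stored as (u, v) tuples with u < v, so an edge that already
    exists in the transmission network is simply kept once.
    """
    edges = set(transmission_edges)
    for start in range(0, n_nodes, group_size):
        members = range(start, min(start + group_size, n_nodes))
        edges.update(combinations(members, 2))   # all pairs inside the group
    return edges


# Two groups of three nodes and one cross-group transmission edge (2, 3).
contact_edges = overlay_group_network(6, 3, {(2, 3)})
```

Using a set makes the rule "an existing edge remains unchanged" automatic, because inserting a duplicate edge has no effect.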

III. DIFFUSION ANALYSIS OF COMPLEX NETWORK
Epidemiological research has a long history, and a variety of epidemic transmission models have been proposed [12]-[14].
In a typical propagation model, individuals in a population are abstracted into several classes, each of which is in a typical state. Its basic states include S (susceptible), usually healthy state; I (infected), infected state; R (removed, refractory or recovered), removed state (also known as immune state or recovery state). The different contagion models are often named after the transitions between these states. For example, susceptible populations become infected and then recover and become immune, which is called the SIR model. If the susceptible population is infected and returns to the susceptible state, it is called the SIS model. Based on these node dynamics models, the propagation properties of viruses in complex networks have been studied.
In our model, the diffusion process between groups is explained by a modified NW small-world model. Newman and Watts first described the diffusion process on a d-dimensional NW small-world network [15]. Moukarzel made a more specific analysis [16]. We construct the diffusion equation by adapting Moukarzel's ideas to our diffusion model.
Suppose that, starting from the initial infected node, the virus spreads at a constant velocity v = 1. The density of shortcut endpoints in the NW small-world network is ρ = 2p, where p is the probability parameter for adding a new shortcut in the small-world network model. To fit our model, we treat the probability of spread across groups as the parameter ρ.
Suppose that this diffusion process is continuous. The infection volume V(t) is then a sphere of radius t centered on the initial infected node, with surface $\Gamma_d t^{d-1}$, where $\Gamma_d$ is the hypersphere constant in the d-dimensional small-world network. The probability of the infection source spreading across groups is ρ, and the resulting new infection spheres appear at the rate $\rho\Gamma_d t^{d-1}$. Therefore, the average total infection volume satisfies
$$V(t) = \Gamma_d \int_0^t \tau^{d-1}\left[1 + \rho V(t-\tau)\right] d\tau. \tag{5}$$
With scaling and differentiation of (5), the linear diffusion equation is obtained:
$$\frac{\partial^d V(t)}{\partial t^d} = 1 + V(t). \tag{6}$$
Yang argues that there is a time delay δ in the diffusion process [17]-[18]. Equation (6) is then changed as follows:
$$\frac{\partial^d V(t)}{\partial t^d} = 1 + V(t-\delta).$$
In our model, the size of δ reflects the length of the incubation period of different infectious diseases, and the corresponding solution is
$$V(t) = \sum_{k=1}^{\lfloor t/\delta \rfloor + 1} \frac{\left(t-(k-1)\delta\right)^{dk}}{(dk)!}.$$
This theoretical result matches well with the network model we studied. In our model, most propagation process analyses are not applicable because there is no time index. In the model of Moukarzel and Yang, however, the transmission rate is the constant velocity v = 1, which corresponds to the step-by-step execution of a computer program. In addition, we treat the probability of spread across groups in our model as the probability parameter ρ, which fits our model well.
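For intuition, the rescaled linear diffusion equation without delay, ∂^d V/∂t^d = 1 + V(t) with zero initial conditions, is solved by the series V(t) = Σ_{k≥1} t^{dk}/(dk)!. The sketch below evaluates a truncated version of this series. The function name `infection_volume` and the truncation length are illustrative assumptions, and the time delay δ is deliberately left out.

```python
import math


def infection_volume(t, d, terms=40):
    """Truncated series sum_{k=1}^{terms} t^(d*k) / (d*k)!."""
    total, term = 0.0, 1.0
    for k in range(1, terms + 1):
        # Extend t^(d(k-1)) / (d(k-1))! to t^(dk) / (dk)! factor by factor,
        # which avoids forming very large factorials explicitly.
        for j in range(d * (k - 1) + 1, d * k + 1):
            term *= t / j
        total += term
    return total


v_1d = infection_volume(1.5, d=1)   # equals exp(t) - 1 for d = 1
v_2d = infection_volume(1.5, d=2)   # equals cosh(t) - 1 for d = 2
check_1d = math.expm1(1.5)
check_2d = math.cosh(1.5) - 1.0
```

For d = 1 the series reduces to exponential growth, which illustrates how quickly the infected volume grows once spread across groups is possible.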

IV. EIGENVALUES OF LAPLACIAN MATRIX
The Laplacian matrix of an undirected graph G with n vertices is a real symmetric matrix $L = (l_{ij})$, where
$$l_{ij} = \begin{cases} -1, & i \neq j \text{ and vertices } i \text{ and } j \text{ are adjacent}, \\ 0, & i \neq j \text{ and vertices } i \text{ and } j \text{ are not adjacent}, \end{cases}$$
and
$$l_{ii} = -\sum_{j \neq i} l_{ij}.$$
That is, the degree of vertex i is denoted by $l_{ii}$. The eigenvalues of L are listed in ascending order,
$$\lambda_1 \le \lambda_2 \le \cdots \le \lambda_n.$$
L is positive semidefinite and $\lambda_1 = 0$. Let $W = \{x \mid x^T x = 1,\ x^T e = 0\}$, where $e = (1, 1, \ldots, 1)^T$. According to the Courant-Fischer min-max theorem [19], we have
$$\lambda_2 = \min_{x \in W} x^T L x.$$
The value of $\lambda_2$, also known as the algebraic connectivity, is critical. If $\lambda_2 = 0$, then graph G is disconnected. Furthermore, if a graph G has s connected subgraphs, then we have (proof in appendix)
$$\lambda_1 = \lambda_2 = \cdots = \lambda_s = 0.$$
Therefore, it is necessary to consider the number of zero eigenvalues first. In theory, the number of zero eigenvalues equals the number of connected subgraphs. In the real world, when a transmission chain forms between two different groups, the number of connected subgraphs decreases. The number of zero eigenvalues is denoted by z. From a macro perspective, if we consider all nodes to be equivalent, then we can treat the transmission process as random. Theoretically, we can calculate the expected value of z under different epidemic spread scenarios. If every k nodes form a group, then the proportion of safe nodes is k(z − 1)/n. Consider 10,000 nodes, set the group limit to 10, and use the random spanning tree model to obtain the simulation results. The theoretical expectation of the ratio of safe nodes can be calculated using probability. As shown in Figure 3, we compare the simulation results with the theoretical expectations. In fact, when the number of safe nodes is judged by the number of connected subgraphs, the expected value curves obtained with the Barabási-Albert scale-free network model and the random spanning tree model are theoretically the same, because the probability of spread across groups is the same in both cases. We simulated different group limits.
The transmission processes of the Barabási-Albert scale-free network model and the random spanning tree model are shown in Figure 4.
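The relation between zero eigenvalues and safe nodes can be checked numerically on a small network. The sketch below is an illustration under stated assumptions: groups are modeled as cliques of k consecutive nodes, infected nodes are drawn uniformly at random, and the expected safe ratio is computed as the hypergeometric probability C(n−k, m)/C(n, m) that a given group contains none of the m infected nodes. The helper names and the small sizes are hypothetical; NumPy is assumed to be available.

```python
import math
import random

import numpy as np


def laplacian(n, edges):
    """Dense graph Laplacian built from a set of undirected edges (u, v)."""
    L = np.zeros((n, n))
    for u, v in edges:
        L[u, v] -= 1
        L[v, u] -= 1
        L[u, u] += 1
        L[v, v] += 1
    return L


def count_zero_eigenvalues(L, tol=1e-8):
    return int(np.sum(np.linalg.eigvalsh(L) < tol))


n, k, m = 60, 5, 6                      # nodes, group size, infected nodes
rng = random.Random(1)

# Group network: each block of k consecutive nodes is a clique (assumption).
edges = {(u, v) for s in range(0, n, k)
         for u in range(s, s + k) for v in range(u + 1, s + k)}

# Transmission tree: each new case attaches to a uniformly chosen earlier case.
infected = rng.sample(range(n), m)
for idx in range(1, m):
    source, new = rng.choice(infected[:idx]), infected[idx]
    edges.add((min(source, new), max(source, new)))

z = count_zero_eigenvalues(laplacian(n, edges))
safe_ratio_spectral = k * (z - 1) / n
safe_ratio_direct = k * (n // k - len({v // k for v in infected})) / n
expected_safe_ratio = math.comb(n - k, m) / math.comb(n, m)
```

Under these assumptions all groups containing an infected node merge into one connected subgraph, so the z − 1 remaining components are exactly the safe groups and the two ratios coincide.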
After considering the zero eigenvalues, we plot all eigenvalues of the Laplacian matrix of the relationship network generated using the random spanning tree as a mesh plot. For the largest eigenvalue $\lambda_n$, there is a close connection between the domination number γ(G) and $\lambda_n$. It has been proved that, if γ(G) ≥ 3, $\lambda_n$ is bounded above in terms of n and γ(G) [7]. Since our transmission process is randomly generated by the model, it does not make sense to consider a specific dominating set or the eigenvector of the largest eigenvalue. Observing the remaining eigenvalues, a remarkable pattern is that these eigenvalues exhibit a ladder feature.
To examine how this feature changes, we selected the smallest 90% of the eigenvalues of the Laplacian matrix of the corresponding network during the epidemic spread simulated by the random spanning tree and drew a profile. Figure 5 demonstrates this property intuitively.
The eigenvalues are in ascending order, and the ladder feature gradually disappears. Let $L_s$ be the Laplacian submatrix of a group s in network G. The eigenvalues of $L_s$ remain among the eigenvalues of $L_G$ as long as there is no edge between group s and the other groups. That is, if
$$L_G = \begin{pmatrix} L_s & A \\ B & L_r \end{pmatrix},$$
where $L_r$ is the block of the remaining nodes, then every eigenvalue of $L_s$ is an eigenvalue of $L_G$ while the elements of the matrices A and B are all zero. The disappearance of the eigenvalue ladder feature is caused by the increase in the number of infected persons. In the complex network this means that the number of edges between groups increases, which changes the eigenvalues of the original groups. The edges between different groups affect the eigenvalues, so the changes in the eigenvalues reflect the changes in the contact network to a certain extent. By evaluating the ladder feature appearing in the eigenvalues, the spread of the epidemic can be understood indirectly. As the number of close contacts and sub-close contacts gradually increases, the eigenvalue curve gradually becomes smooth. Therefore, our next task is to evaluate the ladder feature of the eigenvalue curve. We know the number of zero eigenvalues z. Now we can indirectly evaluate the contact relationship between all nodes by evaluating the ladder feature. It should be noted that this idea relies on the homogeneity assumption of the model. This means that we ignore the contacts that exist inside a safe group. This assumption is reasonable because, in the real case, we only need to track the impact of infected individuals on the group. How to evaluate the ladder feature is therefore the main subject of the next section.
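The invariance property above can be illustrated on a toy example. The sketch assumes two groups modeled as 4-node cliques (an assumption made only for this example) and uses NumPy: while the off-diagonal blocks are zero, the spectrum of the whole network contains the spectrum of each group; adding a single cross-group edge removes one zero eigenvalue and shifts other eigenvalues.

```python
import numpy as np


def clique_laplacian(k):
    """Laplacian of a fully connected group of k nodes."""
    return k * np.eye(k) - np.ones((k, k))


L_s = clique_laplacian(4)
zero_block = np.zeros((4, 4))
L_G = np.block([[L_s, zero_block], [zero_block, L_s]])   # no cross-group edge

spec_s = np.linalg.eigvalsh(L_s)
spec_G = np.linalg.eigvalsh(L_G)
contained = all(np.isclose(spec_G, lam).any() for lam in spec_s)

# One transmission edge between vertex 0 (group 1) and vertex 4 (group 2).
L_cross = L_G.copy()
L_cross[0, 4] = L_cross[4, 0] = -1
L_cross[0, 0] += 1
L_cross[4, 4] += 1
spec_cross = np.linalg.eigvalsh(L_cross)

zeros_before = int(np.sum(np.abs(spec_G) < 1e-9))
zeros_after = int(np.sum(np.abs(spec_cross) < 1e-9))
```

Before the cross edge is added, the sorted eigenvalues form two flat levels (a ladder); afterwards the levels start to break up, which is the effect the ladder feature is meant to capture.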

V. EVALUATION OF RISK
Evaluating the ladder feature and obtaining quantifiable results is one of the goals of our research. Given the features of the data, a natural first idea is to use a classification algorithm, such as a naive Bayes classifier, logistic regression, a decision tree, k-nearest neighbors, or an artificial neural network.
To solve the problem with an efficient and suitable algorithm, the following requirements are necessary:
1) The algorithm logic is simple, easy to implement, and highly interpretable.
2) The algorithm needs to estimate few parameters.
3) The algorithm has a small misclassification rate.
4) When the number of attributes is relatively large, the classification effect is not affected or is only slightly affected.
5) For a huge amount of data, the time complexity and space complexity of the calculation are small enough.
The numerical distribution of the eigenvalues of a specific Laplacian matrix is deterministic and complex, and each eigenvalue is related to all nodes in a specific connected subgraph, so our challenges are as follows:
1) Eigenvalues are difficult to trace. For a large-scale complex network, it is difficult to determine which connected subgraph's submatrix a particular eigenvalue comes from. Because the ladder feature is essentially generated by the invariance of the connected subgraphs, it gradually disappears as the connected subgraphs change.
2) The number of eigenvalues at each level of the ladder feature is uncertain. As the connected subgraphs change, the eigenvalues of the corresponding submatrices change. The eigenvalues of the changed matrix may migrate to other levels, so the number of eigenvalues at each level keeps changing.
The above phenomena affect the subsequent analysis. Combining the data characteristics and the purpose of the analysis, we use an algorithm that does not perform classification to evaluate different Laplacian matrices. Here we use an improved linear regression model to analyze the data. Before the calculation, we first analyze the numerical distribution of the eigenvalues. The model calculations show that the number of eigenvalues presenting the ladder feature is limited. For the network models generated by the Barabási-Albert scale-free network and the Random spanning tree, the sizes of the larger eigenvalues of the Laplacian matrices are related to the degree distribution of the nodes. To unify the evaluation interval and avoid the influence of larger eigenvalues on the evaluation results, our evaluation interval covers the smallest 70% of the eigenvalues. The reason for choosing this interval is that the larger eigenvalues reflect the nature of the dominating set, whereas we focus on the information of close contacts and sub-close contacts in the real world, i.e., the nodes within a group that are close to infected nodes.
To quantify the ladder feature reasonably, an intuitive method is to use linear regression and calculate the residual sum of squares (RSS). The results of such calculations are intuitive and interpretable, and we demonstrate this using actual calculation results. The computed results are close to the ideal results, so we apply the analysis to the transmission networks generated by both the Barabási-Albert scale-free network and the Random spanning tree algorithm. Since the difference between the eigenvalues of the two transmission networks is mainly reflected in the larger eigenvalues, the two give similar results in the linear regression model calculated with the smallest 70% of the eigenvalues. Finally, we decided to use the RSS of the linear regression model as an indicator for evaluating close and sub-close contacts. Intuitively, the more obvious the ladder feature, the fewer the connections between the connected subgraphs, and the larger the RSS. As the ladder feature fades away, the RSS decreases, reflecting the decrease in the number of safe vertices. Figure 7 and Table 1 show the RSS values at different infection rates and at different aggregation limits.
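The RSS indicator can be sketched as follows. This is a minimal illustration, not our improved regression model: it fits an ordinary least-squares line to the smallest 70% of the sorted eigenvalues, using the eigenvalue index as the regressor. The function name `ladder_rss`, the choice of regressor, and the synthetic spectra are assumptions of the example; NumPy is assumed to be available.

```python
import numpy as np


def ladder_rss(eigenvalues, fraction=0.7):
    """RSS of a straight line fitted to the smallest `fraction` of eigenvalues."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))
    m = max(2, int(round(fraction * lam.size)))   # number of eigenvalues kept
    y = lam[:m]
    x = np.arange(m, dtype=float)                 # eigenvalue index as regressor
    slope, intercept = np.polyfit(x, y, 1)        # ordinary least squares
    residuals = y - (slope * x + intercept)
    return float(np.sum(residuals ** 2))


# Synthetic spectra: a pronounced ladder versus a smooth curve.
ladder_spectrum = np.repeat([0, 1, 2, 3], 25)
smooth_spectrum = np.linspace(0.0, 3.0, 100)
rss_ladder = ladder_rss(ladder_spectrum)
rss_smooth = ladder_rss(smooth_spectrum)
```

A stepped spectrum deviates systematically from any straight line and gives a large RSS, while a smooth spectrum gives an RSS close to zero, in line with the interpretation given above.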
At this point, we can make a reasonable evaluation based on the RSS table. For example, with 10,000 nodes and an infection rate of 1%, the RSS value of the simulated network corresponding to a group limit of 15 people is 216.97. This result is similar to that for a 10-person group limit at infection rates of 5% to 6% (RSS values of 208.36 and 222.31, respectively). This means that the risk level for the 10,000-person community is similar in both cases. Using this method, a local government can formulate the optimal public policy according to the medical capacity and the economic and trade situation of its own society. On this basis, the infectious disease model has also been extended. The results of the evaluation method for close contacts and sub-close contacts proposed in this paper can also serve as one of the bases for subsequent policy formulation and situation prediction.
In fact, in a pandemic, the challenges facing governments and residents go beyond infectious disease and economic loss. Around the world, logistics disruptions and shortages of medical resources and daily necessities caused by the pandemic are also real. Therefore, when faced with a pandemic, the government's public policy must be commensurate with the actual scale of disease transmission and the corresponding level of risk. In addition, using simulation analysis in advance to understand the scale of disease transmission and the social risk level correctly will also help the government allocate daily necessities and medical resources rationally.
In conclusion, we hope our work can contribute to the fight against infectious diseases for mankind.

VI. CONCLUSION
In this paper, we propose a method that can assess the impact of close and sub-close contacts on risk levels across the community. Based on the Barabási-Albert scale-free network and the Random spanning tree algorithm, we generate a simulated spread network for infectious diseases. After superimposing the group network, we performed spectral analysis on the Laplacian matrix of the generated network. Our work shows that, after considering close contacts and sub-close contacts, the public policy optimization problem of slowing the spread of the epidemic can be answered by profiling the contact network. We perform computer simulations and theoretical proofs of this model, and conduct a propagation analysis of its process. We believe that the conclusions obtained in this paper can provide a basis for making public policy and predicting the state of disease transmission.
Our model also has some shortcomings and needs to be improved in follow-up work. In the process of simulation and analysis, as the model scale increases, the space needed to store the Laplacian matrix of the transmission network keeps growing, and so does the time required to calculate the eigenvalues. Regarding storage space, the matrix that needs to be stored in the model is a sparse matrix. Therefore, we can introduce a compressed storage scheme for sparse matrices to improve the utilization of storage space, i.e., store the Laplacian matrix in a compressed format such as the Compressed Sparse Row (CSR) format or the Block Sparse Row (BSR) format. Regarding computing time, in follow-up work we can parallelize the serial code and use a GPU or a multi-core CPU for the calculation, thereby improving efficiency.
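To make the storage argument concrete, the sketch below builds the three CSR arrays of a graph Laplacian directly from an edge list in plain Python. The function name `laplacian_csr` is hypothetical and the code is an illustration of the format, not part of our program; the array names `data`, `indices`, and `indptr` follow the convention used by `scipy.sparse.csr_matrix`.

```python
def laplacian_csr(n, edges):
    """Return (data, indices, indptr) of the graph Laplacian in CSR layout."""
    neighbors = [set() for _ in range(n)]
    for u, v in edges:
        if u != v:
            neighbors[u].add(v)
            neighbors[v].add(u)
    data, indices, indptr = [], [], [0]
    for i in range(n):
        row = {j: -1 for j in neighbors[i]}     # off-diagonal entries
        if neighbors[i]:
            row[i] = len(neighbors[i])          # diagonal entry = degree
        for j in sorted(row):                   # column indices in order
            data.append(row[j])
            indices.append(j)
        indptr.append(len(data))                # end of row i in `data`
    return data, indices, indptr


# Path graph 0 - 1 - 2.
data, indices, indptr = laplacian_csr(3, [(0, 1), (1, 2)])
```

Only the nonzero entries are stored, so the memory needed grows with the number of edges rather than with the square of the number of nodes.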

APPENDIX: PROOF OF SPECTRUM ANALYSIS

A. MIN-MAX THEOREM
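Let M be a Hermitian matrix of order n. Then M has n real eigenvalues with corresponding linearly independent eigenvectors $v_1, v_2, v_3, \ldots, v_n$. The eigenvalues are listed in ascending order,
$$\lambda_1 \le \lambda_2 \le \cdots \le \lambda_n,$$
and $v_1, v_2, v_3, \ldots, v_n$ constitute an orthonormal basis of the n-dimensional space ω. Take any k-dimensional subspace U of ω and the (n − k + 1)-dimensional subspace $\mathrm{span}\{v_k, \ldots, v_n\}$. The intersection of U and $\mathrm{span}\{v_k, \ldots, v_n\}$ contains a nonzero vector, since the sum of their dimensions is n + 1, which exceeds n. We can therefore take a vector x from the intersection such that
$$x = \sum_{i=k}^{n} \alpha_i v_i$$
and
$$\sum_{i=k}^{n} \alpha_i^2 = 1.$$
Then,
$$x^T M x = \sum_{i=k}^{n} \lambda_i \alpha_i^2 \ge \lambda_k, \tag{23}$$
by the orthogonality of the eigenvectors. Similarly, we can take a vector x from the intersection of any (n − k + 1)-dimensional subspace V and the k-dimensional subspace $\mathrm{span}\{v_1, \ldots, v_k\}$ such that
$$x = \sum_{i=1}^{k} \alpha_i v_i, \qquad \sum_{i=1}^{k} \alpha_i^2 = 1.$$
Then,
$$x^T M x = \sum_{i=1}^{k} \lambda_i \alpha_i^2 \le \lambda_k. \tag{28}$$
Both bounds are attained, by $U = \mathrm{span}\{v_1, \ldots, v_k\}$ and $V = \mathrm{span}\{v_k, \ldots, v_n\}$ respectively. From (23) and (28), we have
$$\lambda_k = \min_{\dim U = k}\ \max_{x \in U,\ x^T x = 1} x^T M x = \max_{\dim V = n-k+1}\ \min_{x \in V,\ x^T x = 1} x^T M x.$$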

B. PROPERTIES OF LAPLACIAN MATRIX
Let L be the Laplacian matrix of a graph G with edge set E. L is positive semidefinite, since the quadratic form satisfies
$$x^T L x = \sum_{(i,j) \in E} (x_i - x_j)^2 \ge 0.$$
Consequently, all eigenvalues of L are at least 0. Furthermore, 0 is an eigenvalue, with corresponding eigenvector
$$e = (1, 1, 1, \ldots, 1)^T. \tag{31}$$

C. CALCULATION OF ALGEBRAIC CONNECTIVITY
The Rayleigh quotient of the matrix L is
$$R_L(x) = \frac{x^T L x}{x^T x}.$$
The eigenvectors of L are the critical points of the function $R_L(x)$, and the eigenvalue corresponding to each eigenvector is the value of $R_L$ at that critical point. We already know the smallest eigenvalue 0 and its eigenvector e, so we consider the set
$$W = \{x \mid x^T x = 1,\ x^T e = 0\}.$$
W consists of the unit vectors in the orthogonal complement of the eigenvector corresponding to the eigenvalue 0 of L. Thus, by Courant's theorem, we obtain
$$\lambda_2 = \min_{x \in W} x^T L x.$$

D. PROOF OF ALGEBRAIC CONNECTIVITY
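For the matrix L, suppose
$$\lambda_2 = 0;$$
then, since $\lambda_1 = 0$, the multiplicity of the eigenvalue 0 of the Laplacian matrix L is at least 2. Without loss of generality, let G be an oriented graph with n vertices and m edges. The incidence matrix $D_{n \times m} = (d_{ij})$ of the oriented graph G is defined by
$$d_{ij} = \begin{cases} 1, & \text{vertex } i \text{ is the head of edge } j, \\ -1, & \text{vertex } i \text{ is the tail of edge } j, \\ 0, & \text{otherwise}. \end{cases}$$
After suitably reordering the rows and columns of D, we obtain
$$D = \begin{pmatrix} C_1 & & \\ & \ddots & \\ & & C_s \end{pmatrix},$$
where $C_i$ is the incidence matrix of the connected subgraph $G_i$, $G_i$ has $n_i$ vertices, and s is the number of connected subgraphs. We have
$$\operatorname{rank}(C_i) = n_i - 1.$$
The edge sets of different connected subgraphs do not intersect. Therefore, the rows of different blocks $C_i$ are linearly independent of one another. Then,
$$\operatorname{rank}(D) = \sum_{i=1}^{s} (n_i - 1) = n - s.$$
The Laplacian matrix L of graph G is given by
$$L = D D^T.$$
Then, we have
$$\operatorname{rank}(L) = \operatorname{rank}(D) = n - s,$$
so the multiplicity of the eigenvalue 0 equals s. Since this multiplicity is at least 2, we have s ≥ 2, which means that graph G is disconnected. The same approach also works for the converse and the corollary. Since $L = D D^T$ does not depend on the chosen orientation, the conclusion is also true for undirected graphs.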
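The key steps of this proof, L = D Dᵀ and rank(L) = n − s for a graph with s connected subgraphs, can be checked numerically on a small example. The sketch below uses NumPy; the helper name `incidence_matrix`, the arbitrary edge orientation, and the six-vertex example graph are assumptions made for illustration.

```python
import numpy as np


def incidence_matrix(n, edges):
    """Oriented incidence matrix: +1 at the head and -1 at the tail of each edge."""
    D = np.zeros((n, len(edges)))
    for j, (u, v) in enumerate(edges):
        D[u, j] = 1.0
        D[v, j] = -1.0
    return D


# Three connected subgraphs: a triangle {0, 1, 2}, an edge {3, 4},
# and the isolated vertex 5.
n = 6
edges = [(0, 1), (1, 2), (0, 2), (3, 4)]
D = incidence_matrix(n, edges)
L = D @ D.T                                   # graph Laplacian
rank_L = int(np.linalg.matrix_rank(L))
zero_multiplicity = n - rank_L                # equals the number of components
```

Reversing the orientation of any edge flips the sign of one column of D and leaves D Dᵀ unchanged, which is why the result carries over to undirected graphs.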