Modeling Information Diffusion With Sequential Interactive Hypergraphs

In online social networks, numerous users generate and distribute a tremendous amount of content simultaneously, and how information spreads in online society has attracted considerable attention. Recent studies introduce deep learning to mine complex diffusion patterns and forecast diffusion trends. However, state-of-the-art methods have two limitations. On the one hand, existing models focus only on the internal influence within a diffusion flow and ignore the external influence from the dissemination of other content. On the other hand, they barely consider the dynamics of user interest, although changes in users' content preferences also have a considerable impact on future diffusion. To address these issues, we introduce the hypergraph structure and a sequential framework to model complex interactions in social networks. We then propose dual-channel hypergraph neural networks, named HyperINF, to tackle the diffusion prediction problem. Specifically, in the user channel, we build sequential user interactive hypergraphs and learn dynamic user representations; in the diffusion channel, we construct a diffusion interactive graph to capture cross-diffusion relations. Finally, we incorporate social relations to make the prediction. Experimental results on three datasets suggest the effectiveness and practicability of the proposed framework.


INTRODUCTION
The emergence of social networks enriches real-time communication among individuals, and a large amount of content is spreading through them. Understanding information diffusion is therefore critical for social marketing and even for misinformation detection. Researchers have explored the factors of information diffusion and proposed various models that extract diffusion patterns for future prediction.
Previous studies have shown that the factors of information diffusion are diverse, including topics [1], social relationships [2], and time sensitivity [3], and that their influences are mixed and complex. Recent models address the diffusion prediction problem with deep learning techniques: instead of manually extracting features, they widely use representation learning [4], [5], [6], [7]. Researchers have further adopted Recurrent Neural Networks (RNNs) to model diffusion cascades and improve prediction accuracy. For example, some models [8] directly apply RNNs to the diffusion sequence in time order, while other works incorporate structural features of social networks by modifying the basic RNN [9] or adding customized structural modules [10], [11].
Even though state-of-the-art models achieve promising results in diffusion prediction, they have two limitations. First, they ignore the external influence from the dissemination of other content. Existing models focus only on the internal influence within the diffusion sequence, such as temporal and social relations. As shown in Fig. 1, multiple information diffusion flows coexist in a social network, and complex interactions exist not only within a single diffusion but also across the dissemination of different contents. For example, in the dissemination of $p_1$ and $p_2$, a similar diffusion pattern occurs among $u_3$, $u_2$, and $u_4$ at $t_5$. Meanwhile, $u_5$ retweets $p_2$. In this case, if we use previous models to predict who will retweet $p_1$ at $t_6$, the most likely user is $u_5$. However, this prediction does not account for the concurrent spread of $p_4$: in fact, $u_5$ retweets $p_4$, not $p_1$, at $t_6$. This shows that simultaneous diffusion flows may interfere with each other, so the external influence from the dissemination of other posts needs to be considered. It is therefore necessary to model diffusion by considering the complex interactions from a global view.
Second, existing methods barely consider the dynamics of user interest. In social networks, users' interests change over time, and diffusion patterns may change accordingly. As Fig. 1 shows, $p_3$ is retweeted by $u_5$ at $t_1$ and by $u_6$ at $t_3$, while $p_4$ is retweeted by $u_5$ at $t_5$ and by $u_6$ at $t_6$. We also observe that $u_3$ retweets $p_4$ at $t_4$. Given the similar patterns of $p_3$ and $p_4$, a model would likely predict that $p_3$ will be retweeted by $u_3$ in the future. However, if $u_5$ and $u_6$ changed their content preferences at $t_4$, the contents of $p_3$ and $p_4$ are likely unrelated. In that case, $u_3$ is probably not interested in $p_3$ and is unlikely to retweet it. Hence, it is important to consider the dynamics of user interest and adopt dynamic user modeling for more accurate prediction. Although some works [12], [13] have paid attention to the dynamics of user preference, they still fail to model diffusion flows in more realistic scenarios.
To address these limitations, we introduce a hypergraph structure and dynamic user modeling to capture the complex interactions. As shown in Fig. 1, the global interactions include various many-to-many connections between users and posts. Modeling such interactions among users therefore fits the concept of a hypergraph, in which an edge can link any number of nodes [14]. Accordingly, we formulate each diffusion flow as a hyperedge and construct a user interactive hypergraph. As shown in Figs. 2a and 2b, each hyperedge links the users who spread the same post, so the hypergraph preserves the high-order relations among users. To further capture the connections between different diffusion flows, we construct a diffusion interactive graph. As Fig. 2c shows, each vertex represents a hyperedge of the user interactive hypergraph; two vertices are linked if at least one user spreads both posts, and the edge weight reflects the co-occurrence of users. Furthermore, to model the dynamics of user preference, we construct sequential user interactive hypergraphs by dividing the period into consecutive time windows, as shown in Fig. 2d.
Based on these structures, we propose dual-channel hypergraph neural networks, named HyperINF, to tackle the information diffusion prediction problem. As Fig. 3 shows, in the user channel, we apply hypergraph convolution networks to each user interactive hypergraph to exploit high-order relations, and we establish a sequential framework with a residual layer between consecutive periods to model the dynamics of user interest. In the diffusion channel, weighted graph neural networks are applied to the diffusion interactive graph to capture cross-diffusion relations. A fusion layer then combines the static user embedding, the dynamic user embedding from the user channel, and the diffusion embedding from the diffusion channel. Finally, we incorporate social relations and adopt diffusion graph neural networks to forecast the next activated users. The evaluation results suggest the validity of HyperINF. Additionally, Section 4.6 compares the time and memory efficiency of the proposed method with state-of-the-art methods; the results show that HyperINF is a practical framework that consumes much less GPU memory at the cost of slightly more running time.
In summary, the main contributions of this paper are threefold: (1) we study the external influence from the dissemination of other content and further consider the dynamics of user interest when modeling information diffusion; (2) we propose dual-channel hypergraph neural networks and establish a sequential framework for information diffusion prediction, named HyperINF; (3) we evaluate HyperINF on three real-world datasets, and the experimental results suggest the effectiveness and practicability of the proposed method.
The rest of the paper is organized as follows. Section 2 briefly reviews related work. Section 3 introduces the proposed model in detail. Section 4 describes the experimental settings and analyzes the evaluation results. The final section summarizes the proposed model and discusses future work.

Information Diffusion Prediction
Many studies have investigated information diffusion in online social media, aiming to discover latent diffusion patterns for future diffusion prediction. Generally, information diffusion prediction falls into two categories: macro-prediction and micro-prediction. Macro-prediction seeks to forecast future popularity, while micro-prediction focuses on predicting the next activated users.
Recently, researchers have adopted emerging deep learning techniques and proposed many diffusion models for prediction. Representation learning is widely used to learn user embeddings for diffusion modeling, as in EmbeddingIC [4], inf2Vec [5], and HID [7]. Some works [6] also consider content features and project users and contents into the same vector space. However, these works do not incorporate representation learning into the diffusion model for prediction, so a large number of diffusion models instead build end-to-end prediction frameworks. DeepCas [15] and DeepHawkes [16] are early RNN-based models that address the macro-prediction problem while also considering time features. In addition, [17] introduces uncertainty into the diffusion process with a variational autoencoder, and the models in [18], [19] introduce content features when modeling the temporal process. Graph neural networks are also utilized in recent studies [20], [21], [22], [23], [24]. For micro-level diffusion prediction, some models [8], [25] consider temporal characteristics and enrich the basic RNN-based framework with the attention mechanism. NDM [26] adopts a convolutional neural network and the attention mechanism to make predictions. Other models [9], [10], [27] introduce structural information from social networks. For example, TopoLSTM [9] augments the basic LSTM with social topology, while SNIDSA [10] adds a structure attention module before the recurrent neural networks. Some diffusion models, such as infVAE [28], utilize both temporal and structural features. Moreover, FOREST [11] proposes a multi-scale diffusion model to solve the micro-prediction problem and further realizes macro-prediction with reinforcement learning.
In contrast, previous studies pay more attention to the internal influence within a diffusion flow and ignore the mutual influence between simultaneous information disseminations. Furthermore, they barely consider the dynamics of user interest. To address these issues, we propose HyperINF and focus mainly on micro-level diffusion prediction in this paper.

Hypergraph Neural Networks
A hypergraph generalizes the graph: an edge (hyperedge) can link more than two nodes [14]. Therefore, a hypergraph can represent complex relations among nodes, and hypergraphs have been applied in various fields, such as image recognition [29], sentiment classification [30], and knowledge representation [31]. With the emergence of graph neural networks, an increasing number of researchers have begun to explore graph neural networks on hypergraphs. Gao et al. first proposed hypergraph neural networks [32] and later considered dynamic modifications of the graph structure [33]. Besides, HyperGCN [34] extends graph convolutional networks to hypergraphs. The study [35] proposes hypergraph neural networks with the self-attention mechanism. Similarly, the model in [36] applies an attention module after performing convolution on a hypergraph. Meanwhile, many customized hypergraph neural networks have been designed for specific tasks. For example, HGC-RNN [37] is proposed for time series prediction, and the works [38], [39] apply hypergraph neural networks to recommendation tasks.
In this paper, we attempt to tackle the diffusion prediction problem with hypergraph neural networks. Inspired by previous works, we propose a dual-channel hypergraph convolution module and establish a sequential framework to model the dynamics of user preference for the content on a social network.

METHODS
In this section, we present the overall framework shown in Fig. 3. We first give the problem definition and then detail the proposed method. To model the complex interactions in social networks, we construct the user interactive hypergraph and the diffusion interactive graph. We then propose sequential hypergraph neural networks with two channels, a user channel and a diffusion channel. In the user channel, we build a sequential hypergraph convolution framework to model the dynamics of user preference. In the diffusion channel, we adopt graph neural networks to extract cross-diffusion relations. Finally, a prediction module introduces the social structure for the final diffusion prediction. For ease of reading, Table 1 summarizes the main symbols used in the paper.

Problem Definition
We study information diffusion on a social network $G = (U, E)$, where $U$ denotes the set of users and $E$ represents the following relationships. Generally, we can collect the users' action logs $L = \{(u, p, t) \mid u \in U, p \in P\}$, where $(u, p, t)$ denotes that user $u$ takes an action on post $p \in P$ at time $t$. In this paper, we focus on micro-level diffusion prediction; for example, we aim to predict who will like, retweet, or comment on a post $p$ on Twitter. The overall objective is thus to predict the next activated users, and the diffusion prediction problem can be formulated as follows.
Problem Definition. Given the users' action logs $L$ in $G$, we obtain the diffusion sequences $S = \{s_j \mid p_j \in P\}$. For each $p \in P$, $s = \{(u_1, t_1), \ldots, (u_n, t_n)\}$, where $(u_k, p, t_k) \in L$ and $t_{k-1} < t_k$. The objective is to predict the next activated user at $t_{n+1}$ in $s$, formulated as
$$u_{n+1} = \arg\max_{u \in U} P(u \mid s, G, L).$$
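As a minimal sketch of this setup, the Python snippet below groups the action logs $L$ into time-ordered diffusion sequences $S$ and splits each sequence into (observed prefix, next user) pairs, one common way to form samples for next-user prediction. The tuple layout `(user, post, time)` and the function names are illustrative assumptions, not part of the paper.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple

Action = Tuple[Hashable, Hashable, float]   # one entry (u, p, t) of the action logs L
Seq = List[Tuple[Hashable, float]]          # a diffusion sequence [(u_1, t_1), ...]


def build_diffusion_sequences(logs: Iterable[Action]) -> Dict[Hashable, Seq]:
    """Group the logs by post and sort each group by time, giving S = {s_j}."""
    by_post: Dict[Hashable, Seq] = defaultdict(list)
    for user, post, t in logs:
        by_post[post].append((user, t))
    # sorted() is stable, so actions with equal timestamps keep their log order
    return {post: sorted(seq, key=lambda ut: ut[1]) for post, seq in by_post.items()}


def next_user_samples(seq: Seq) -> List[Tuple[Seq, Hashable]]:
    """Turn s into (observed prefix, next activated user) pairs."""
    return [(seq[:k], seq[k][0]) for k in range(1, len(seq))]


if __name__ == "__main__":
    logs = [("u2", "p1", 3.0), ("u1", "p1", 1.0), ("u3", "p2", 2.0), ("u4", "p1", 5.0)]
    S = build_diffusion_sequences(logs)
    print(S["p1"])  # [('u1', 1.0), ('u2', 3.0), ('u4', 5.0)]
```

Each prefix plays the role of $s$ and the paired user is the target $u_{n+1}$.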

Modeling the Interaction in Social Networks
Generally, numerous contents spread through a social network simultaneously. Users take actions on various posts, so complex interactions exist between users and posts. As shown in Fig. 1, the global interactions include various many-to-many relations: each diffusion flow involves many users, and a single user can spread multiple posts. A simple graph structure cannot adequately model the many-to-many relations among users triggered by information diffusion. Thus, we introduce the hypergraph structure to model these complex interactions. We formulate each diffusion flow as a hyperedge and construct the user interactive hypergraph, as shown in Fig. 2b. The connections between different diffusion flows are then implicit in the user interactive hypergraph, but we also want to model the cross-diffusion relations explicitly. Hence, we construct the diffusion interactive graph, as shown in Fig. 2c. Meanwhile, as mentioned before, it is also important to model the dynamics of user preference, so we implement dynamic user modeling. We first model the interactions among users in multiple time windows to extract the current user preference, which yields the sequential user interactive hypergraphs in chronological order, as shown in Fig. 2d. Since user preferences in consecutive time windows are related, we design a sequential framework with a residual layer between two consecutive time windows to model the dynamic changes of user interest.
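We illustrate the construction of these graphs in Fig. 2. Specifically, Fig. 2a shows the hypergraph construction, and Fig. 2b presents an example of the user interactive hypergraph, defined as follows.
Definition 1. User Interactive Hypergraph.
Let $\mathcal{H} = (U, \mathcal{E}, W)$ denote the user interactive hypergraph, where $U$ is the set of users, $\mathcal{E} = \{\epsilon_i \mid \epsilon_i = \{u_q, u_k, \ldots\}, \text{ where } (u_q, p_i, t_q), (u_k, p_i, t_k), \ldots \in L\}$ is the set of hyperedges, and $W \in \mathbb{R}^{|\mathcal{E}| \times |\mathcal{E}|}$ is the weight matrix for hyperedges.
$W$ is a diagonal matrix that assigns a positive weight $W_{\epsilon\epsilon}$ to each hyperedge $\epsilon$; in this paper, we set all hyperedge weights to 1. The connection between users and hyperedges is described by the incidence matrix $H \in \{0, 1\}^{|U| \times |\mathcal{E}|}$, where $H_{u\epsilon} = 1$ if $u \in \epsilon$ and $H_{u\epsilon} = 0$ otherwise. Moreover, we obtain the diagonal degree matrices of users and hyperedges, $D_u \in \mathbb{N}^{|U| \times |U|}$ and $D_\epsilon \in \mathbb{N}^{|\mathcal{E}| \times |\mathcal{E}|}$, in which $(D_u)_{uu} = \sum_{\epsilon \in \mathcal{E}} W_{\epsilon\epsilon} H_{u\epsilon}$ and $(D_\epsilon)_{\epsilon\epsilon} = \sum_{u \in U} H_{u\epsilon}$.
Meanwhile, we notice that the line graph derived from a hypergraph presents the connections among hyperedges: each vertex of the line graph denotes a hyperedge, and two vertices are adjacent if the corresponding hyperedges share at least one node in the hypergraph [40]. In the user interactive hypergraph, each hyperedge represents a diffusion flow, so the connections among different diffusion flows can be represented by the line graph derived from it. In this way, we construct the diffusion interactive graph to capture the cross-diffusion relations, as shown in Fig. 2c. For each $p \in P$, we use $p$ to represent the diffusion of post $p$ for simplicity.
Definition 2. Diffusion Interactive Graph.
Let $G_D$ denote the diffusion interactive graph, i.e., the line graph of $\mathcal{H}$. Each vertex of $G_D$ is a diffusion $p \in P$; two diffusions $p_i$ and $p_j$ are linked if their hyperedges share at least one user, and the edge weight $w^d_{i,j}$ grows with the number of users they share.
Moreover, users' interests are not static and change over time, so capturing the dynamics of user preference is also important for accurate prediction. To model users at different times, we construct the sequential user interactive hypergraphs in chronological order, as shown in Fig. 2d. Specifically, we divide the period into consecutive time windows and generate a user interactive hypergraph for each window, as formulated in Definition 3.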
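The sketch below illustrates Definitions 1 and 2 on plain Python lists: it builds the incidence matrix $H$ with unit hyperedge weights, the diagonals of $D_u$ and $D_\epsilon$, and the line graph that serves as the diffusion interactive graph. Using the raw count of shared users as the edge weight is an assumed choice that matches "more shared users, larger weight"; the paper may normalize it differently. All function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, Hashable, List, Sequence, Tuple

Action = Tuple[Hashable, Hashable, float]  # (user, post, time); ids must be sortable


def build_user_hypergraph(logs: Sequence[Action]):
    """Definition 1: one hyperedge per post, linking every user who acted on it.

    Returns (users, posts, H, w): H[i][j] = 1 iff user i is in hyperedge j,
    and w is the diagonal of W (all ones, as in the paper)."""
    users = sorted({u for u, _, _ in logs})
    posts = sorted({p for _, p, _ in logs})
    u_idx = {u: i for i, u in enumerate(users)}
    p_idx = {p: j for j, p in enumerate(posts)}
    H = [[0] * len(posts) for _ in users]
    for u, p, _ in logs:
        H[u_idx[u]][p_idx[p]] = 1
    w = [1.0] * len(posts)
    return users, posts, H, w


def degree_diagonals(H: List[List[int]], w: List[float]):
    """Diagonals of D_u (weighted hyperedges per user) and D_eps (users per hyperedge)."""
    d_u = [sum(w[j] * row[j] for j in range(len(w))) for row in H]
    d_eps = [sum(row[j] for row in H) for j in range(len(w))]
    return d_u, d_eps


def diffusion_interactive_graph(H: List[List[int]]) -> Dict[Tuple[int, int], int]:
    """Definition 2 as a line graph: posts a < b are linked when their hyperedges
    share at least one user; the weight here (assumed) is the number of shared users."""
    n_posts = len(H[0]) if H else 0
    members = [{i for i, row in enumerate(H) if row[j]} for j in range(n_posts)]
    edges = {}
    for a, b in combinations(range(n_posts), 2):
        shared = len(members[a] & members[b])
        if shared > 0:
            edges[(a, b)] = shared
    return edges
```

For example, with logs in which `u1` spreads `p1` and `p3` and `u2` spreads `p1` and `p2`, the posts `p1`/`p2` and `p1`/`p3` become adjacent in the line graph, while `p2`/`p3` do not.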

Definition 3. Sequential User Interactive Hypergraphs.
Let $\mathcal{H}^{t_k} = (U^{t_k}, \mathcal{E}^{t_k}, W^{t_k})$ denote the user interactive hypergraph in the $k$th time window; then $\mathcal{H}^T = \{\mathcal{H}^{t_1}, \mathcal{H}^{t_2}, \ldots, \mathcal{H}^{t_n}\}$ is the sequence of user interactive hypergraphs.
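A hypothetical sketch of Definition 3: it splits the action logs into `n_windows` equal-length consecutive windows and lists, for each window, the hyperedges of $\mathcal{H}^{t_k}$ as a mapping from post to the set of users who acted on it. Equal-length windows are an assumption made for illustration; this part of the paper does not state how window boundaries are chosen.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Set, Tuple

Action = Tuple[Hashable, Hashable, float]  # (user, post, time)


def split_into_windows(logs: List[Action], n_windows: int) -> List[List[Action]]:
    """Assign each action to one of n_windows equal-length, consecutive time windows."""
    times = [t for _, _, t in logs]
    start, end = min(times), max(times)
    width = (end - start) / n_windows or 1.0  # guard against a zero-length period
    windows: List[List[Action]] = [[] for _ in range(n_windows)]
    for u, p, t in logs:
        # the last timestamp would index n_windows, so clamp it into the final window
        k = min(int((t - start) / width), n_windows - 1)
        windows[k].append((u, p, t))
    return windows


def window_hyperedges(window_logs: List[Action]) -> Dict[Hashable, Set[Hashable]]:
    """Hyperedges of H^{t_k}: for each post, the users who acted on it inside the window."""
    edges: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    for u, p, _ in window_logs:
        edges[p].add(u)
    return dict(edges)


def sequential_hypergraphs(logs: List[Action], n_windows: int) -> List[Dict[Hashable, Set[Hashable]]]:
    """H^T = [H^{t_1}, ..., H^{t_n}], each represented by its hyperedges."""
    return [window_hyperedges(w) for w in split_into_windows(logs, n_windows)]
```

A post active in several windows yields one hyperedge per window, so the same diffusion can connect different user groups over time.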
Based on these structures, we propose dual-channel hypergraph neural networks consisting of a user channel and a diffusion channel. The user channel extracts the high-order relations between users and learns dynamic user embeddings, while the diffusion channel exploits the connections between diffusion flows to embed cross-diffusion relations into diffusion embeddings.

The User Channel
In the user channel, the objective is to exploit the high-order relations in the user interactive hypergraph and learn robust user representations for the prediction task. We therefore expect users who share more posts to have closer representations. Recent studies of graph neural networks (GNNs) on hypergraphs, such as [32], [36], have obtained promising results. Hypergraph convolution networks update the hidden state of each user by aggregating the features of its hyperedges, and each hyperedge's features are in turn generated by gathering the features of its linked users. Thus, when the convolution is applied to the user interactive hypergraph, a user receives more information from neighbors with which it shares more hyperedges. Since the sequential user interactive hypergraphs are constructed in consecutive time windows, we apply hypergraph convolution networks to each hypergraph.
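The two-step aggregation just described can be sketched without any library: hyperedges first collect features from their users, and users then collect the weighted features of their hyperedges, which equals $H W H^\top X$ before any transformation or normalization. The dictionary-based representation and the function name are illustrative assumptions.

```python
from typing import Dict, Hashable, List, Set

Vec = List[float]


def two_stage_aggregate(features: Dict[Hashable, Vec],
                        hyperedges: Dict[Hashable, Set[Hashable]],
                        weights: Dict[Hashable, float]) -> Dict[Hashable, Vec]:
    """Unnormalized hypergraph message passing without Theta or activation.

    Step 1: every hyperedge sums the features of its member users.
    Step 2: every user sums the weighted features of the hyperedges containing it.
    Row by row this equals H W H^T X."""
    dim = len(next(iter(features.values())))
    edge_feat = {e: [sum(features[u][c] for u in members) for c in range(dim)]
                 for e, members in hyperedges.items()}
    out = {u: [0.0] * dim for u in features}
    for e, members in hyperedges.items():
        for u in members:
            out[u] = [o + weights[e] * f for o, f in zip(out[u], edge_feat[e])]
    return out
```

With hyperedges `p1 = {u1, u2}` and `p2 = {u2, u3}`, user `u2` belongs to both and therefore receives the largest aggregate, which is the intuition behind "users who share more posts get closer representations".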
Let $X$ denote the static user embedding matrix and $X^{t_k,0} = [x^{t_k,0}_1, \ldots, x^{t_k,0}_{|U^{t_k}|}]$ denote the initial user embedding matrix in the $k$th time window. The updated user representations $X^{t_k,1}$ are derived as
$$X^{t_k,1} = f\big(X^{t_k,0}, H^{t_k}, W^{t_k}; \Theta^0\big),$$
where $f(\cdot)$ denotes the hypergraph convolution operation, $H^{t_k}$ and $W^{t_k}$ are the incidence matrix and the weight matrix of $\mathcal{H}^{t_k}$, and $\Theta^0$ is the transformation matrix that extracts user features for propagation in the convolution layer. For each user $u$ in $\mathcal{H}^{t_k}$, the update is formulated as
$$x^{t_k,1}_u = \sum_{\epsilon \in \mathcal{E}^{t_k}} \sum_{v \in U^{t_k}} H^{t_k}_{u\epsilon} H^{t_k}_{v\epsilon} W^{t_k}_{\epsilon\epsilon}\, x^{t_k,0}_v \Theta^0,$$
and for all users of $\mathcal{H}^{t_k}$ the matrix form is
$$X^{t_k,1} = H^{t_k} W^{t_k} (H^{t_k})^{\top} X^{t_k,0} \Theta^0. \tag{4}$$
When $l$ layers are stacked, the computation in Eq. (4) changes the scale of $X^{t_k,l}$, which could make the gradients explode or vanish during feature propagation. We therefore add symmetric normalization to keep the convolution layer numerically stable, together with an activation function $\sigma(\cdot)$ before the output. Overall, the hypergraph convolution layer is
$$X^{t_k,l+1} = \sigma\Big((D^{t_k}_u)^{-1/2} H^{t_k} W^{t_k} (D^{t_k}_\epsilon)^{-1} (H^{t_k})^{\top} (D^{t_k}_u)^{-1/2} X^{t_k,l} \Theta^{l}\Big),$$
where $D^{t_k}_u$ and $D^{t_k}_\epsilon$ are the degree matrices of users and hyperedges in $\mathcal{H}^{t_k}$. After $l$ layers, we obtain the dynamic user representations $X^{t_k,l}$ of the $k$th time window.
Residual Layer. Since the dynamic user embedding is learned independently in each time window, it captures the user preference at the present time but still fails to model the dynamics of user interest. Hence, we add a residual layer between two consecutive time windows, which is expected to keep the static information of users while modeling the dynamics of user interest. The layer is designed as a linear weighted sum of the static user embedding and the dynamic user embedding from previous time windows. Therefore, for each user $u \in U^{t_k}$, the initial embedding in $X^{t_k,0}$ of the $k$th hypergraph convolution module is derived as
$$x^{t_k,0}_u = \alpha\, x^{t_{<k},l}_u + (1-\alpha)\, x_u,$$
where $x_u$ is the static user embedding and $x^{t_{<k},l}_u$ is the latest dynamic embedding of $u$ before the $k$th time window.
The momentum term $\alpha$ controls the proportion of the feature carried over from the previous time window; as a result, the longer the time interval, the smaller the impact of an earlier window on the current one. Meanwhile, the residual layer keeps the gradient from vanishing, especially when there are many time windows: because taking the derivative always yields the constant factor $1-\alpha$, the loss can still be propagated back effectively even if $\partial \mathcal{L} / \partial x$ is extremely small. In this way, we learn the dynamic user embedding in each time window.
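The normalized layer $\sigma\big(D_u^{-1/2} H W D_\epsilon^{-1} H^\top D_u^{-1/2} X \Theta\big)$ and a residual initialization of the form $x^{t_k,0}_u = \alpha\, x^{t_{<k},l}_u + (1-\alpha)\, x_u$ can be sketched together in dependency-free Python. The sketch assumes `tanh` as $\sigma$, dense list-of-lists matrices, and that users without an earlier dynamic embedding start from their static one; these choices and all function names are illustrative, not the paper's implementation. `simulate_decay` shows the decay of older windows: with $\alpha = 0.5$ and the convolution treated as identity, a feature is carried with weight 0.5, 0.25, and 0.125 over the next three windows.

```python
import math
from typing import Callable, Dict, Hashable, List

Matrix = List[List[float]]
Vec = List[float]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    """Dense product A @ B on lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def hypergraph_conv(X: Matrix, H: List[List[int]], w: Vec, theta: Matrix,
                    act: Callable[[float], float] = math.tanh) -> Matrix:
    """act(D_u^-1/2 H W D_eps^-1 H^T D_u^-1/2 X Theta) for one time window.

    X: |U| x d, H: |U| x |E| incidence matrix, w: diagonal of W, theta: d x d'."""
    n_u, n_e = len(H), len(w)
    d_u = [sum(w[j] * H[i][j] for j in range(n_e)) for i in range(n_u)]
    d_eps = [sum(H[i][j] for i in range(n_u)) for j in range(n_e)]
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in d_u]
    # P[i][k]: normalized weight with which user k's feature reaches user i
    P = [[inv_sqrt[i] * inv_sqrt[k] *
          sum(H[i][j] * w[j] * H[k][j] / d_eps[j] for j in range(n_e) if d_eps[j] > 0)
          for k in range(n_u)] for i in range(n_u)]
    Z = matmul(matmul(P, X), theta)
    return [[act(z) for z in row] for row in Z]


def residual_init(static: Dict[Hashable, Vec], latest: Dict[Hashable, Vec],
                  alpha: float) -> Dict[Hashable, Vec]:
    """x_u^{t_k,0} = alpha * x_u^{t_<k,l} + (1 - alpha) * x_u.
    Users without an earlier dynamic embedding start from x_u (assumption)."""
    out = {}
    for u, x_u in static.items():
        prev = latest.get(u)
        out[u] = list(x_u) if prev is None else [
            alpha * p + (1.0 - alpha) * s for p, s in zip(prev, x_u)]
    return out


def simulate_decay(alpha: float, n_windows: int) -> Vec:
    """Weight of a feature seen once, with x_u = 0 and the convolution treated as identity."""
    latest = {"u": [1.0]}
    trace = []
    for _ in range(n_windows):
        latest = residual_init({"u": [0.0]}, latest, alpha)
        trace.append(latest["u"][0])
    return trace
```

The gradient of `residual_init` with respect to `x_u` is the constant `1 - alpha`, which is the property the residual layer relies on to keep gradients flowing across many windows.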

The Diffusion Channel
In the diffusion channel, we extract cross-diffusion relations and learn diffusion embeddings from the diffusion interactive graph. In this graph, two diffusions are linked if at least one user shares both posts, and the more users they share, the larger the edge weight. We therefore expect diffusions connected by higher-weight edges to have closer representations. To this end, we adopt weighted graph neural networks, and the diffusion embedding matrix $X^{l}_d$ at the $l$th layer is updated by
$$X^{l+1}_d = \hat{D}_d^{-1} \hat{A}_d X^{l}_d \Theta^{l}_d,$$
where $\Theta^{l}_d$ is the transformation matrix of the $l$th layer. $A_d \in \mathbb{R}^{|P| \times |P|}$ is the weight matrix of $G_D$, where $A_{d,(i,j)} = w^d_{i,j}$, and $\hat{A}_d = A_d + I$, where $I$ is the identity matrix. The diagonal degree matrix $\hat{D}_d \in \mathbb{R}^{|P| \times |P|}$ is given by $\hat{D}_{d,(i,i)} = \sum_{j=1}^{|P|} \hat{A}_{d,(i,j)}$. In this way, the weighted graph neural networks aggregate information from neighbors in proportion to the edge weights.
Fusion Layer. To integrate the information in the learned embeddings, we combine the static user embedding, the dynamic user embedding from the user channel, and the diffusion embedding from the diffusion channel for further diffusion prediction through a fusion operation $g(\cdot)$. In this paper, we choose concatenation to generate the embedding of each user. The embedding of user $u$ in the diffusion of $p$ in the $k$th time window is computed by
$$e_{p,u} = g\big(x_u, x^{t_k}_u, x_{d,p}\big) = W_F\,\big[x_u \,\|\, x^{t_k}_u \,\|\, x_{d,p}\big],$$
where $x_u$ is the static user embedding, $x^{t_k}_u$ denotes the latest dynamic user embedding at $t_k$, $x_{d,p}$ is the embedding of the diffusion flow of post $p$, $W_F$ is the learnable transformation matrix of the fusion layer, and $\|$ denotes concatenation.
Table 1. Summary of symbols.
Symbol | Description
$G$ | the social network
$U$ | the users in $G$
$P$ | the posts in $G$
$L$ | the users' action logs
$S$ | the diffusion sequences of $P$
$\mathcal{H}$ | the user interactive hypergraph
$\mathcal{E}$ | the interaction hyperedges in $\mathcal{H}$
$W$ | the weight matrix of hyperedges $\mathcal{E}$
$H$ | the incidence matrix of $\mathcal{H}$
$G_D$ | the diffusion interactive graph
$\mathcal{H}^{t_k}$ | the user interactive hypergraph in the $k$th time window
$\mathcal{H}^T$ | the sequential user interactive hypergraphs
$G_p$ | the diffusion graph of the post $p$
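Below is a small, dependency-free sketch of one weighted GNN layer on the diffusion interactive graph, assuming the row-normalized update $X^{l+1}_d = \hat{D}_d^{-1}\hat{A}_d X^{l}_d \Theta^{l}_d$ with $\hat{A}_d = A_d + I$, followed by concatenation-based fusion $W_F [x_u \| x^{t_k}_u \| x_{d,p}]$. No nonlinearity is applied in the GNN layer, and the function names and list-of-lists layout are illustrative assumptions.

```python
from typing import List

Matrix = List[List[float]]


def weighted_gnn_layer(X: Matrix, A: Matrix, theta: Matrix) -> Matrix:
    """X_d^{l+1} = D^-1 (A + I) X_d^l Theta_d^l on the diffusion interactive graph.

    A: |P| x |P| edge weights w^d_ij, X: |P| x d diffusion embeddings, theta: d x d'.
    Row normalization makes each post average itself and its neighbours in
    proportion to edge weight."""
    n, d = len(A), len(X[0])
    A_hat = [[A[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in A_hat]  # diagonal of D_hat
    agg = [[sum(A_hat[i][j] * X[j][c] for j in range(n)) / deg[i] for c in range(d)]
           for i in range(n)]
    cols = list(zip(*theta))
    return [[sum(a * t for a, t in zip(row, col)) for col in cols] for row in agg]


def fuse(x_static: List[float], x_dynamic: List[float], x_diffusion: List[float],
         W_F: Matrix) -> List[float]:
    """e_{p,u} = W_F [x_u || x_u^{t_k} || x_{d,p}]; W_F has one row per output dimension."""
    z = list(x_static) + list(x_dynamic) + list(x_diffusion)
    return [sum(wv * zv for wv, zv in zip(row, z)) for row in W_F]
```

With two posts joined by an edge of weight 2 and one-dimensional embeddings 1 and 4, the first post becomes $(1 \cdot 1 + 2 \cdot 4)/3 = 3$, so the heavily connected neighbor dominates, as intended by the weighted aggregation.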

Information Diffusion Prediction
After obtaining the user representations from the fusion layer, we next address the diffusion prediction problem. To predict the future flow of a specific post, we need to consider the internal influence within the diffusion cascade; for example, a GRU can model the diffusion flow in time order.
Since social relations between users are important factors in triggering information diffusion, we further introduce the social structure into diffusion prediction. Specifically, we adopt diffusion graph neural networks with a gating mechanism to capture the influence of social relations, formulating each diffusion flow as a diffusion graph. For $p \in P$, the diffusion sequence is $s = \{(u_1, t_1), \ldots, (u_n, t_n)\}$, where $(u_k, p, t_k) \in L$ and $t_{k-1} < t_k$. Let $G_p = (U_p, E_p)$ denote the diffusion graph of $p$, where $U_p = \{u_k \mid (u_k, t_k) \in s\}$ and $E_p = \{(u_a, u_b) \mid (u_b, u_a) \in E,\ t_a < t_b,\ u_a, u_b \in U_p\}$.
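The diffusion graph $G_p$ can be built directly from a sequence and the social edges. In this hypothetical sketch, `social_edges` holds pairs from $E$, an edge $(u_a, u_b)$ is added when $(u_b, u_a) \in E$ and $u_a$ was activated before $u_b$, and repeated activations of a user are collapsed to the first one (an assumption for illustration). A helper also derives the incoming and outgoing adjacency matrices used later as $A_p = [A^{In}_p, A^{Out}_p]$.

```python
from typing import Hashable, List, Set, Tuple


def build_diffusion_graph(seq: List[Tuple[Hashable, float]],
                          social_edges: Set[Tuple[Hashable, Hashable]]):
    """Return (U_p, E_p) for one post.

    seq: [(u_1, t_1), ..., (u_n, t_n)]; social_edges: pairs in E of G = (U, E).
    (u_a, u_b) is in E_p when (u_b, u_a) is in E and t_a < t_b."""
    first_time = {}
    for u, t in seq:
        first_time.setdefault(u, t)   # keep each user's first activation time
    users = list(first_time)          # dicts preserve insertion (activation) order
    edges = {(ua, ub)
             for ua in users for ub in users
             if (ub, ua) in social_edges and first_time[ua] < first_time[ub]}
    return users, edges


def adjacency(users: List[Hashable], edges: Set[Tuple[Hashable, Hashable]]):
    """Incoming and outgoing 0/1 adjacency matrices of the directed diffusion graph."""
    idx = {u: i for i, u in enumerate(users)}
    n = len(users)
    A_out = [[0] * n for _ in range(n)]
    for ua, ub in edges:
        A_out[idx[ua]][idx[ub]] = 1
    A_in = [list(col) for col in zip(*A_out)]  # transpose: A_in[i][j] = 1 if j -> i
    return A_in, A_out
```

Restricting edges to $t_a < t_b$ keeps the graph consistent with the direction in which information actually flowed.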
Similar to the model in [27], we learn each user's embedding on the diffusion graph by alternately aggregating the embeddings of neighboring users and of previously activated related users. The initial user embedding is $x^{(0)}_{p,u} = e_{p,u}$, and the update function $c(\cdot)$ computes $x^{(t)}_{p,u}$ from $X^{(t-1)}_p$ and $A_p$, where $x^{(t)}_{p,u}$ is the hidden representation of user $u \in U_p$ at time step $t$ in the dissemination of $p$, and $X^{(t-1)}_p = [x^{(t-1)}_{p,u_1}, \ldots, x^{(t-1)}_{p,u_n}]$ is the user embedding matrix at the previous time step $t-1$. Since the diffusion graph is directed, the adjacency matrix $A_p$ combines the incoming and outgoing adjacency matrices of $G_p$, denoted by $A_p = [A^{In}_p, A^{Out}_p]$. Specifically, we first gather features from the spatial neighbors in the diffusion graph through $A_p$ and a linear transformation function $f(\cdot)$. We then use the gating mechanism of GRU to model the diffusion flow in the temporal dimension, aggregating the states of related users at previous time steps, where $z^{(t)}_{p,u}$ and $r^{(t)}_{p,u}$ are the update gate and the reset gate and $\sigma(\cdot)$ is the activation function. Finally, we obtain the user hidden representations $X_p = [x_{p,u_1}, \ldots, x_{p,u_n}]$ on the diffusion graph $G_p$.
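The exact update equations are not reproduced in this section, so the following is only a sketch assuming a standard gated graph neural network step (in the style of GGNN): messages are gathered from incoming and outgoing neighbors through $f(\cdot)$, then a GRU-style update gate $z$ and reset gate $r$ blend the previous state with a candidate state. All parameter names (`F`, `Wz`, `Uz`, `Wr`, `Ur`, `Wh`, `Uh`) are hypothetical.

```python
import math
from typing import Dict, List

Matrix = List[List[float]]


def mv(M: Matrix, v: List[float]) -> List[float]:
    """Matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def gated_step(X: Matrix, A_in: Matrix, A_out: Matrix, params: Dict[str, Matrix]) -> Matrix:
    """One step x^{(t-1)} -> x^{(t)} for every user on a diffusion graph.

    X: n x d previous states; A_in / A_out: n x n incoming / outgoing adjacency.
    params (hypothetical names): F is d x d (the linear map f); Wz, Wr, Wh are d x 2d
    and act on the [incoming || outgoing] message; Uz, Ur, Uh are d x d."""
    n, d = len(X), len(X[0])
    fX = [mv(params["F"], x) for x in X]                 # f(.) on each previous state
    new_X = []
    for u in range(n):
        # spatial aggregation with A_p = [A_in, A_out]
        m_in = [sum(A_in[u][v] * fX[v][c] for v in range(n)) for c in range(d)]
        m_out = [sum(A_out[u][v] * fX[v][c] for v in range(n)) for c in range(d)]
        a, x = m_in + m_out, X[u]
        # temporal aggregation with GRU-style update gate z and reset gate r
        z = [sigmoid(p + q) for p, q in zip(mv(params["Wz"], a), mv(params["Uz"], x))]
        r = [sigmoid(p + q) for p, q in zip(mv(params["Wr"], a), mv(params["Ur"], x))]
        rx = [ri * xi for ri, xi in zip(r, x)]
        h = [math.tanh(p + q) for p, q in zip(mv(params["Wh"], a), mv(params["Uh"], rx))]
        new_X.append([(1 - zi) * xi + zi * hi for zi, xi, hi in zip(z, x, h)])
    return new_X
```

Repeating `gated_step` for a fixed number of steps, starting from the fused embeddings $e_{p,u}$, would yield $X_p$ under these assumptions.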
In some cases, social relations are not available. We therefore also explore variants of the proposed framework that make predictions with recurrent neural networks and self-attention mechanisms in Section 4.4.
Diffusion Prediction. For the final prediction, we fuse the final user embeddings into a diffusion graph representation. In this paper, we use the soft attention mechanism to calculate the graph representation $e_p$, where $W_1$ and $W_2$ are transformation matrices, $w_a$ is the vector that learns the attention score of the user at each time step, and $w_b$ is the bias vector. Since the latest activated users have a significant influence on future diffusion, we combine the diffusion graph representation with the representation $x_{p,u_n}$ of the most recently activated user for the final prediction, which yields $\hat{y}_p \in \mathbb{R}^{|U|}$, the probability distribution of the next activation.
Model Optimization. The objective of diffusion prediction is to predict the next activated users in the diffusion flow. Hence, we choose cross-entropy as the loss function:
$$\mathcal{L} = -\sum_{p \in P_t} \sum_{i=1}^{|U|} y_{p,i} \log \hat{y}_{p,i},$$
where $P_t$ is the set of training samples and $y_p \in \mathbb{N}^{|U|}$ is the ground truth. We train the proposed model with the Adam optimizer.
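As an illustration only, the sketch below assumes an SR-GNN-style soft attention, $\alpha_i = w_a^\top \sigma(W_1 x_{p,u_n} + W_2 x_{p,u_i} + w_b)$ and $e_p = \sum_i \alpha_i x_{p,u_i}$, then scores all users from $[e_p \,\|\, x_{p,u_n}]$ with a hypothetical output matrix `W_out` followed by a softmax, and computes the cross-entropy of one training sample with a one-hot target. The exact combination used in HyperINF may differ.

```python
import math
from typing import List

Matrix = List[List[float]]


def mv(M: Matrix, v: List[float]) -> List[float]:
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def softmax(z: List[float]) -> List[float]:
    m = max(z)                                   # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def predict_next(Xp: Matrix, W1: Matrix, W2: Matrix, w_a: List[float], w_b: List[float],
                 W_out: Matrix) -> List[float]:
    """Xp: states x_{p,u_1}, ..., x_{p,u_n} in activation order (last = most recent user).
    Returns y_hat, a distribution over all |U| users (one row of W_out per user)."""
    last = Xp[-1]
    q = mv(W1, last)
    e_p = [0.0] * len(last)
    for x in Xp:
        hidden = [sigmoid(a + b + c) for a, b, c in zip(q, mv(W2, x), w_b)]
        score = sum(wa * h for wa, h in zip(w_a, hidden))   # attention score of this user
        e_p = [acc + score * xc for acc, xc in zip(e_p, x)]
    return softmax(mv(W_out, e_p + last))                   # [e_p || x_{p,u_n}] -> logits


def cross_entropy(y_hat: List[float], target: int) -> float:
    """-sum_i y_i log y_hat_i for a one-hot ground truth y at index target."""
    return -math.log(y_hat[target])
```

Summing `cross_entropy` over the samples in $P_t$ gives the loss above; in practice the gradients would come from an autodiff library rather than this plain-Python sketch.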

EVALUATIONS
In this section, we conduct experiments to evaluate the proposed model, denoted by HyperINF. First, we test the model on three datasets and explore the function of different modules. Then we analyze different settings of the hyper-parameters. At last, we compare the time and memory efficiency between HyperINF and baselines.
Digg is a news-sharing platform where users vote to select the most valuable articles. The dataset includes 3,553 reports and the related users' activity logs over one month in 2009. Flixster is a social movie site where users share or comment on movies. We filter out diffusion cascades with fewer than five participants from the original dataset and, for evaluation, randomly select 10,000 movies from 2006 to 2009. Weibo is a social networking site based on a microblogging service. We select cascades that have at least five participants and in which more than half of the users take actions at least three times, and then collect 13,808 posts and their retweet logs from 2009 to 2012.
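A sketch of the cascade filtering described above, on a hypothetical `{post: [(user, time), ...]}` mapping. The Weibo rule is ambiguous as stated; this version reads it as: keep a cascade if it has at least five distinct participants and more than half of them have at least three actions in the whole dataset. Both the reading and the function names are assumptions.

```python
from collections import Counter
from typing import Dict, Hashable, List, Tuple

Cascades = Dict[Hashable, List[Tuple[Hashable, float]]]   # post -> [(user, time), ...]


def filter_min_participants(cascades: Cascades, min_participants: int = 5) -> Cascades:
    """Drop cascades with fewer than min_participants distinct users (Flixster-style)."""
    return {p: seq for p, seq in cascades.items()
            if len({u for u, _ in seq}) >= min_participants}


def filter_active_majority(cascades: Cascades, min_participants: int = 5,
                           min_actions: int = 3) -> Cascades:
    """Weibo-style rule, one possible reading: at least min_participants distinct users,
    and more than half of them have min_actions or more actions in the whole dataset."""
    activity = Counter(u for seq in cascades.values() for u, _ in seq)
    kept = {}
    for p, seq in cascades.items():
        users = {u for u, _ in seq}
        active = sum(1 for u in users if activity[u] >= min_actions)
        if len(users) >= min_participants and active > len(users) / 2:
            kept[p] = seq
    return kept
```

Counting distinct users rather than raw actions keeps a single user who acts repeatedly from inflating a cascade's size.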

Baselines
To compare HyperINF with state-of-the-art models, we choose the following baselines. Deepdiffuse [8] models diffusion with RNNs and predicts the next users together with their timestamps; in the experiments, the predicted timestamps are ignored. TopoLSTM [9] improves the diffusion model by considering the social relations between users and proposes a novel LSTM-based framework with the social structure. SNIDSA [10] defines a structure attention module to introduce structural features and then adopts RNNs to model the diffusion for prediction. FOREST [11] incorporates users' neighbor information into a GRU-based diffusion model and addresses the macroscopic diffusion prediction problem with reinforcement learning; in the experiments, we use FOREST only for micro-level diffusion prediction. infGNN [27] proposes personalized graph neural networks that model the diffusion in the spatial and temporal dimensions alternately for prediction. DyHGCN [12] builds a heterogeneous graph including following and reposting relations between users and considers the evolution of the heterogeneous graph while modeling the diffusion with an attention mechanism.

Overall Results
We evaluate the proposed model on three datasets and compare it with the baselines above. The results are presented in Figs. 4 and 5. In general, HyperINF performs best in most cases. The accuracy of Top-k prediction improves significantly when k is small, e.g., Hits@1 and MRR@5 on all three datasets, and HyperINF outperforms all baselines when k is smaller than 10. For example, in terms of Hits@1, HyperINF's score is nearly double that of the baselines on Digg and more than double on Flixster, and on Weibo it improves by about half in most cases. This shows that HyperINF greatly improves prediction accuracy. Notably, the performance gains on the Flixster dataset exceed 10% in terms of Hits and 20% in terms of MRR. Since Flixster is a social movie site, users gather around shared interests in films, and their preferences are relatively stable over a certain period; the diffusion pattern is therefore highly correlated with the type of movie. As a result, the cross-diffusion relations captured by the diffusion interactive graph are particularly effective, and the proposed method performs much better. On Digg, although FOREST and DyHGCN perform slightly better than HyperINF on certain metrics, HyperINF always achieves the highest MRR. On Weibo, all baselines and HyperINF perform worse than on Digg and Flixster, as Weibo has more users and shorter cascades on average. Even in this case, HyperINF still achieves relatively higher performance, which shows that modeling diffusion from a global view extracts more information about user preferences and provides enough information for prediction even in the early stages of diffusion.
Among the baselines, models incorporating the social structure, such as FOREST and infGNN, outperform models that only model the diffusion sequentially in the temporal dimension, such as Deepdiffuse. In addition, the results illustrate that dynamic user modeling, introduced in both DyHGCN and HyperINF, improves the accuracy of diffusion prediction, and our model outperforms DyHGCN in most cases. Overall, the proposed method proves effective, and the sequential hypergraph neural network framework is well suited to modeling diffusion from a global view.
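Hits@k and MRR@k are not defined in this part of the paper; for reference, here is a sketch under their usual definitions. Hits@k is the fraction of samples whose true next user appears among the top-k ranked candidates, and MRR@k averages the reciprocal rank of the true user, counting 0 when it falls outside the top k. The input format (one ranked candidate list per sample) is an assumption.

```python
from typing import Hashable, List, Tuple


def hits_and_mrr_at_k(ranked_lists: List[List[Hashable]], targets: List[Hashable],
                      k: int) -> Tuple[float, float]:
    """ranked_lists[i]: candidate users sorted by predicted probability (best first);
    targets[i]: the true next user. Returns (Hits@k, MRR@k) averaged over samples."""
    hits, rr = 0.0, 0.0
    for ranking, target in zip(ranked_lists, targets):
        top = ranking[:k]
        if target in top:
            hits += 1
            rr += 1.0 / (top.index(target) + 1)   # ranks are 1-based
    n = len(targets)
    return hits / n, rr / n
```

For three samples whose true user is ranked first, second, and third, `k = 2` gives Hits@2 = 2/3 and MRR@2 = (1 + 1/2 + 0) / 3 = 0.5, which illustrates why MRR rewards placing the correct user near the top rather than merely inside the cutoff.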

The Analysis of the Dual-Channel Module
To further explore the roles of the two channels in HyperINF, we conduct an ablation study with several variants of the proposed model. Specifically, HyperINF-U removes all modules of the user channel, while HyperINF-D removes the diffusion channel. HyperINF-S removes the sequential framework from the user channel and constructs a single user interactive hypergraph over the whole period. HyperINF-R removes only the residual layer between the hypergraph convolution modules of consecutive time windows. In addition, we study the effect of the static user preference on prediction: HyperINF-d preserves only the dynamic user embedding from the user channel for further prediction. The evaluation results on the Digg and Flixster datasets are presented in Table 3.
The Analysis of User and Diffusion Channel. HyperINF-U and HyperINF-D remove the user channel and the diffusion channel, respectively. Overall, both variants perform worse than HyperINF. The only exception is HyperINF-U on Digg in terms of Hits@50. Even there, HyperINF-U still degrades clearly on Hits@1, and its MRR drops by around 0.7%. Moreover, HyperINF-D outperforms HyperINF-U on Hits@1 on Digg but performs worse than HyperINF-U on the remaining metrics. This shows that the two channels play different roles. The user channel mainly improves prediction accuracy when only a few top-k candidate users are considered. The diffusion channel plays an important role in accurately selecting the right users overall. On Flixster, HyperINF-D, which lacks the diffusion channel, performs noticeably worse than HyperINF-U. The Hits@1 of HyperINF-D drops by over 10% and its MRR@5 by 6%, whereas the performance of HyperINF-U falls by only around 1%. This indicates that cross-diffusion relations have a more profound impact on Flixster. Flixster is a content-sensitive community, and users share movies largely according to individual interests. The post embedding learned by the diffusion channel therefore encodes content features and plays an important role in the prediction. Furthermore, HyperINF-S, which lacks the sequential framework, still outperforms HyperINF-U in some cases, which again confirms the contribution of the user channel. Thus, the proposed dual-channel model benefits from each channel, and combining them improves performance.
The Analysis of Sequential Framework. We further explore the function of the sequential hypergraph framework in the user channel.
As shown in Table 3, HyperINF-R, without the residual layer, outperforms HyperINF-S, without the sequential framework. This confirms that the sequential hypergraph neural networks in the user channel are effective, and that dynamic user modeling with the sequential framework is beneficial. Interestingly, the model without the residual layer outperforms HyperINF in some cases. On Digg, the residual layers cause a slight performance degradation in terms of Hits@50. A possible reason is that some users' interests shift noticeably, and in that case the residual layer may carry noise into the next time window. We also observe that HyperINF performs worse than HyperINF-R in terms of MRR@5 on Flixster. The Flixster dataset spans three years of cascades, so the short-lived interests of some users may not persist. The residual layer then introduces noise into the following periods, which hurts performance. However, HyperINF-R performs worse than HyperINF on the remaining metrics. Overall, HyperINF ranks the correct users higher than HyperINF-R does. Finally, HyperINF-d removes the static user embedding from the fusion layer in addition to the diffusion channel. The degraded performance of HyperINF-d shows that the static user features embedded in the user representation are useful. It is therefore essential to preserve static user characteristics while modeling the dynamics of user preference.
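The following NumPy sketch makes the role of the residual layer concrete. It runs one hypergraph convolution per time window and can add each window's output to the embeddings carried over from the previous window. Several details are assumptions for illustration and are not HyperINF's exact formulation. Each hyperedge joins the users of one cascade within a window. The convolution uses the common HGNN-style normalization D_v^{-1/2} H D_e^{-1} H^T D_v^{-1/2} with unit hyperedge weights. The residual is a plain addition. With residual=False, a user who is absent from a window gets a zero embedding. With residual=True, that user's previous embedding passes through unchanged, and this is also how an outdated interest could be carried forward as noise.

```python
import numpy as np


def incidence_matrix(cascades, num_users):
    """Users x hyperedges matrix for one time window.

    Assumption: each cascade observed in the window is one hyperedge that
    links every user who took part in it during that window.
    """
    H = np.zeros((num_users, len(cascades)))
    for e, users in enumerate(cascades):
        H[list(users), e] = 1.0
    return H


def hypergraph_conv(X, H, theta):
    """HGNN-style layer: relu(Dv^-1/2 H De^-1 H^T Dv^-1/2 X theta), unit edge weights."""
    dv = H.sum(axis=1)  # vertex (user) degrees
    de = H.sum(axis=0)  # hyperedge sizes
    # Users absent from the window have degree 0: give them zero weight
    # instead of dividing by zero.
    dv_inv_sqrt = np.where(dv > 0, 1.0 / np.sqrt(np.maximum(dv, 1e-12)), 0.0)
    de_inv = np.where(de > 0, 1.0 / np.maximum(de, 1e-12), 0.0)
    A = ((dv_inv_sqrt[:, None] * H) @ (de_inv[:, None] * H.T)) * dv_inv_sqrt[None, :]
    return np.maximum(A @ X @ theta, 0.0)


def sequential_user_embeddings(windows, X0, thetas, residual=True):
    """One hypergraph convolution per time window, in time order.

    X0 (num_users x d) initializes the embeddings; each theta is d x d so the
    residual sum is well defined. residual=True adds each window's output to
    the embeddings carried in from the previous window (hypothetical form);
    residual=False keeps only the new output, mirroring the HyperINF-R ablation.
    Returns the list of per-window dynamic user embeddings.
    """
    X = X0
    outputs = []
    for H, theta in zip(windows, thetas):
        out = hypergraph_conv(X, H, theta)
        X = X + out if residual else out
        outputs.append(X)
    return outputs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    num_users, dim = 5, 4
    # Two toy windows; each cascade is the set of user ids seen in that window.
    windows = [
        incidence_matrix([{0, 1, 2}, {2, 3}], num_users),
        incidence_matrix([{1, 4}], num_users),
    ]
    X0 = rng.normal(size=(num_users, dim))
    thetas = [rng.normal(size=(dim, dim)) for _ in windows]
    with_res = sequential_user_embeddings(windows, X0, thetas, residual=True)
    no_res = sequential_user_embeddings(windows, X0, thetas, residual=False)
    # User 0 is absent from the second window: it keeps its earlier embedding
    # with the residual and is zeroed out without it.
    print(with_res[-1][0], no_res[-1][0])
```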
In summary, the high-order relations between users and the cross-diffusion relations, extracted by the dual-channel modules, improve the prediction ability of HyperINF. Moreover, the sequential framework is useful and necessary for capturing the dynamics of user interest.

The Analysis of the Prediction Module
In this part, we explore the role of the social structure in diffusion prediction by testing the proposed framework without the social relation. Specifically, HyperINF-LSTM and HyperINF-GRU replace the diffusion graph neural networks with an LSTM and a GRU, respectively. HyperINF-SF instead uses the self-attention mechanism as the sequence model, without the pooling layer, and uses the last output for prediction. The results are shown in Table 4. HyperINF achieves the best performance on the Digg dataset, while HyperINF-GRU and HyperINF-LSTM perform better than HyperINF on Flixster. This shows that introducing social structural information does not always help. For example, social relations have little influence on Flixster, because it is a content-sensitive community in which the topic of the content is the dominant factor in information diffusion. Hence, structural information is unnecessary in that case. In addition, HyperINF-SF, which relies only on the attention layer, performs the worst, whereas the RNN-based diffusion sequence models help capture the diffusion pattern. In general, HyperINF is a flexible and practical framework that accommodates different methods of modeling the diffusion flow. In some cases, it is beneficial to introduce social relations for prediction. In other cases, an RNN can model the diffusion flow in time order.
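The HyperINF-SF variant can be pictured with the following NumPy sketch. It applies single-head scaled dot-product self-attention over the users of a cascade in the order they joined, followed by a readout that keeps only the last output instead of pooling. The causal mask, the single head, the weight names Wq, Wk and Wv, and the mean-pooling alternative are all assumptions for illustration. The text does not specify the attention configuration used in HyperINF-SF.

```python
import numpy as np


def causal_self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over one cascade.

    X: (seq_len, d) embeddings of the users in the order they joined.
    The causal mask (an assumption) stops position t from attending to
    users who joined later.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])
    mask = np.triu(np.ones(scores.shape), k=1).astype(bool)  # True above the diagonal
    scores = np.where(mask, -np.inf, scores)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)  # masked entries become exactly 0
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ V


def readout(outputs, mode="last"):
    """'last' keeps only the final position, as described for HyperINF-SF;
    'mean' is an illustrative average-pooling alternative."""
    return outputs[-1] if mode == "last" else outputs.mean(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    seq_len, d = 4, 8
    X = rng.normal(size=(seq_len, d))  # toy fused user embeddings of one cascade
    Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
    out = causal_self_attention(X, Wq, Wk, Wv)
    print(readout(out, "last").shape, readout(out, "mean").shape)
```

Unlike a GRU or LSTM, attention carries no explicit notion of time order beyond the mask, which is one plausible reading of why the RNN-based variants capture the diffusion pattern better here.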

Hyper-Parameter Sensitivity Analysis
This section explores the sensitivity of HyperINF to the hyper-parameters batch size, embedding size, learning rate, and α in the

CONCLUSION AND FUTURE WORKS
For information diffusion prediction, we propose dual-channel hypergraph neural networks with a sequential framework, denoted by HyperINF. We construct the user interactive hypergraph and the diffusion interactive graph to model users' interactions in a more realistic and complex scenario. In the user channel, we build sequential user interactive hypergraphs and apply hypergraph convolution networks on each hypergraph to learn the dynamic user representation. In the diffusion channel, we capture the cross-diffusion relations by applying weighted graph neural networks to the diffusion interactive graph. We then combine the static user embedding, the dynamic user embedding from the user channel, and the diffusion embedding from the diffusion channel with a fusion layer. For prediction, we introduce social structure information with the diffusion graph neural networks. Finally, the model outputs the forecast through a softmax layer. Extensive experiments demonstrate the effectiveness of the proposed model. We also evaluate its time and memory efficiency. The results show that the proposed model is a practical framework that consumes much less memory at the cost of slightly more computation time.
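The fusion and output steps summarized above can be sketched as follows. The text does not specify the form of the fusion layer, so this NumPy sketch assumes an attention-weighted sum over the static user embedding, the dynamic user embedding, and the diffusion embedding. The parameters W, v and W_out are hypothetical. The fused vector stands in for the output of the sequence model (the diffusion graph neural network or an RNN), which is omitted here. A softmax over all users then gives the predicted distribution of the next user to join the cascade.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def attention_fusion(embeddings, W, v):
    """Fuse several d-dim embeddings with attention weights (hypothetical form).

    embeddings: (num_sources, d), e.g. rows [static user, dynamic user, diffusion].
    W: (d, d) projection and v: (d,) scoring vector, both illustrative parameters.
    Returns the fused (d,) vector and the per-source weights.
    """
    scores = np.tanh(embeddings @ W) @ v  # one scalar score per source
    weights = softmax(scores)  # non-negative, sums to 1
    return weights @ embeddings, weights


def predict_next_user(hidden, W_out):
    """Probability distribution over all users from a (d,) hidden state."""
    return softmax(hidden @ W_out)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, num_users = 8, 6
    static_u, dynamic_u, diffusion = rng.normal(size=(3, d))
    fused, w = attention_fusion(
        np.stack([static_u, dynamic_u, diffusion]),
        rng.normal(size=(d, d)),
        rng.normal(size=d),
    )
    # The sequence model over the cascade is skipped in this sketch; the fused
    # vector stands in for its output.
    probs = predict_next_user(fused, rng.normal(size=(d, num_users)))
    print(w.round(3), probs.argmax())
```

Dropping the static row from the stacked embeddings (and the diffusion row) corresponds loosely to the HyperINF-d ablation discussed earlier.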
In this paper, we mainly focus on diffusion prediction at the micro level. Extending the proposed framework to popularity prediction at the macro level is worthy of further study. Meanwhile, if content features of the posts are available, they could be incorporated as input to the diffusion channel to further improve the model's generalization ability. Developing a more explainable diffusion prediction model is also among the most challenging and significant directions. Last but not least, since information diffusion models serve many complicated application scenarios, energy-efficient frameworks also deserve serious consideration.
Hai Jin (Fellow, IEEE) received the PhD degree in computer engineering from HUST in 1994. He is a chair professor of computer science and engineering at Huazhong University of Science and Technology (HUST) in China. In 1996, he was awarded a German Academic Exchange Service fellowship to visit the Technical University of Chemnitz in Germany. He worked at The University of Hong Kong between 1998 and 2000, and as a visiting scholar at the University of Southern California between 1999 and 2000. He was awarded the Excellent Youth Award from the National Science Foundation of China in 2001. He is a Fellow of CCF and a life member of the ACM. He has coauthored more than 20 books and published more than 900 research papers. His research interests include computer architecture, parallel and distributed computing, big data processing, data storage, and system security.
Yao Wu received the BS degree in communication from the Huazhong University of Science and Technology (HUST), Wuhan, China, in 2016. She is currently working toward the PhD degree with HUST. Her research interests include data mining and social computing.
Hong Huang received the PhD degree in computer science from the University of Göttingen, Germany, in 2016, and the ME degree in electronic engineering from Tsinghua University, Beijing, China, in 2012. She is an associate professor with the Huazhong University of Science and Technology, China. Her research interests include social network analysis, data mining, and knowledge graph.
Yu Song received the BS degree in electronic information engineering and the MS degree in computer science and technology from the Huazhong University of Science and Technology, Wuhan, China, in 2018 and 2021, respectively. His research interests include data mining and social network analysis.
Haohui Wei received the BS degree in computer science and technology from Dalian Maritime University, Dalian, China in 2020. He is currently working toward the master's degree with the Huazhong University of Science and Technology. His research interests include knowledge graph and collaborative filtering.
Xuanhua Shi is a professor with the School of Computer Science and Technology, Huazhong University of Science and Technology, China. He is the deputy director of the National Engineering Research Center for Big Data Technology and System (NERC-BDTS). His current research interests focus on cloud computing, big data processing, and AI systems. He has published more than 100 peer-reviewed papers in venues such as ASPLOS, VLDB, ACM Transactions on Computer Systems, and IEEE Transactions on Parallel and Distributed Systems. He has received research support from a variety of governmental and industrial organizations, such as the National Science Foundation of China, the Ministry of Science and Technology, the Ministry of Education, and the European Union.