Joint Learning of Embedding-Based Parent Components and Information Diffusion for Social Networks

Diffusion on social networks refers to the phenomenon whereby opinions spread via connected nodes. Given a set of observed cascades, the underlying diffusion process can be inferred for social network analysis. Earlier studies modeling the diffusion process often assume that the activation of a node depends independently on the activations of its neighbors (also called parent nodes). Nevertheless, the activation of a node also depends on the connectivity of its neighbors. For instance, the opinions from neighbors of the same closely connected social group are often similar, and thus those neighbors exhibit similar influence. Some recent studies incorporate the structural dependency of neighbors as connected components, which allows more accurate diffusion models to be inferred. However, the effectiveness of such component-based models often depends on how the components are identified, and existing methods are not designed to directly preserve the local connectivity of neighbors. In this paper, we propose to incorporate network embedding to enhance the performance of component-based diffusion models in social networks. In particular, we embed the nodes of a social network in a latent vector space with their local connectivity preserved. Parent component identification then becomes a clustering task in the embedding space. A unified probabilistic framework is proposed so that the parent components and the component-based diffusion model can be inferred simultaneously using a two-level EM algorithm based on observed information cascades. For performance evaluation, we apply the proposed model to both synthetic and real-world data sets with promising results obtained. The empirical results also show how the use of the embedding-based framework can enhance both the component identification and the diffusion model.


I. INTRODUCTION
Diffusion on social networks refers to the phenomenon whereby actions or information spread among connected nodes, resulting in information cascades. Given a set of observed information cascades, the underlying diffusion process can be inferred [1], [2] for social network analysis, such as influence maximization [3]-[5], personalized recommendation [6], [7], and authoritative user identification [8].
The associate editor coordinating the review of this manuscript and approving it for publication was Jeonghwan Gwak .
How actions or information spread is well known to be highly related to the interactions of connected nodes. Various diffusion models have been proposed [1], [2], [9], [10] in the literature. Among them, the independent cascade (IC) model [1] and the linear threshold (LT) model [2] are the two most widely used. The IC model assumes that a node can be influenced by any of its neighbors independently with some chosen probability, while the LT model assumes that whether a node will be influenced requires social affirmation from multiple neighbors. Here we focus on the IC model and its extensions for more accurate modeling of diffusion processes. Since the basic IC model was first proposed, different variants have been developed to uncover temporal dynamics [11], take continuous time into account [12], consider topic-aware [13] and role-aware [14], [15] diffusion, incorporate node embeddings [16], and so on.
The long-standing framework associates the probability of adopting a behavior with the number of network neighbors who have already adopted it [17], [18]. Nevertheless, in social networks, the network neighbors can exhibit different forms of influence depending on their connectivity. For example, neighbors in the same social group are closely connected and communicate through their connections, so those neighbors often have similar opinions and exhibit similar influence. In particular, Ugander et al. observed that user engagement behavior on Facebook was affected by the connected components of Facebook users in the contact neighborhood rather than by the individual users [19]. In addition, Zhang et al. proposed a related notion called social influence locality for modeling retweeting behaviors [20]. The structure of neighbors has been regarded as the resources they hold, termed social capital in [21]. Related perspectives have also been explored for mobile communication networks and social network analysis [22], [23]. Based on these observations, Bao et al. [24] first proposed a component-based diffusion model for social networks, which assumes that the influence of the neighbors is exerted not individually but by connected components of the neighbors (called parent components in [24]). However, the effectiveness of component-based models often depends on how the components of the neighbors are identified. In [24], the components in the contact neighborhood are assumed to form different communities and are detected with community detection algorithms by optimizing quantities such as modularity [25]. As those quantities are not designed to directly preserve the local connectivity information among neighbor nodes, not all nodes in the detected components are closely connected.
In this paper, we propose a component-based diffusion model with the local connectivity information of neighbors preserved. Recently, network embedding approaches have been proposed to encode individual nodes as low-dimensional embedding vectors that summarize rich network properties [26]-[28], such as network structure [29]-[31], node content [32], [33], heterogeneous information networks [34], [35], temporal networks [27], [36], dynamics on the network [28], and so on. The learnt node embeddings can be used as feature inputs for downstream machine learning tasks such as node classification, node clustering, and community detection. Among them, LINE [37], DeepWalk [29] and node2vec [30] are commonly adopted to preserve the neighborhood structure of nodes by optimizing local connectivity information, so that the resulting embeddings place connected nodes close to each other in the embedding space. In the context of component-based diffusion models, given the learnt node embeddings, parent component detection then becomes a clustering task in the embedding space, and probabilistic clustering algorithms such as Gaussian mixture models (GMM) can be naturally adopted. This further suggests that the component detection and diffusion network inference problems can both be integrated under a unified probabilistic framework. We here formulate the component identification and diffusion network inference problems based on an overall likelihood function so that both can be solved at the same time in a disciplined manner. The corresponding learning algorithm is derived and the effectiveness of our proposed model is evaluated using both synthetic and real data sets. The results show that the use of the embedding-based framework can enhance both component identification and diffusion prediction, and that our model can be applied to support dependency analysis of different online news media.
The contributions of this paper are summarized as follows. 1) We model the local connectivity information of neighbors using network embedding to enhance both component identification and diffusion network inference. In particular, we propose a component-based diffusion model with embedding-based parent components, and formulate the component identification and diffusion network inference problems based on an overall likelihood function. To the best of our knowledge, this is the first work to solve both problems simultaneously within a unified framework. 2) We make use of the expectation maximization (EM) algorithm [38] and derive a corresponding two-level EM algorithm for obtaining maximum likelihood (ML) estimates of the model parameters. Both the parent components and the underlying diffusion network can be inferred simultaneously based on the observed information cascades.

The remainder of this paper is organized as follows. Section II gives the detailed formulation of our proposed model, followed by the EM algorithm for learning the model. Experimental results and related discussion can be found in Section III. Section IV concludes the paper and provides pointers for future work.

II. FORMULATION
In this section, we propose a novel component-based diffusion model where the parent components are identified with local connectivity information well preserved, with the objective of better modeling the underlying information spreading process. We first apply node embedding to the neighborhood network of each node to obtain the parent node representations. We then model the communities of parents of each node based on their node representations using Gaussian mixture models. Each mixture component tries to capture a community, and the component-based diffusion model is then inferred accordingly. We formulate this as a maximum likelihood estimation problem so that the components and the diffusion model can be inferred simultaneously given sufficient observed cascades.

A. PRELIMINARY
We represent a social network as a directed graph G = (V, E) where V is the set of nodes and E is the set of edges. Let e = (v, w) be an edge from node v to node w, and let f(w) and b(w) be the sets of child nodes and parent nodes of node w respectively, given as f(w) = {u : (w, u) ∈ E} and b(w) = {v : (v, w) ∈ E}. Also, we denote by D_s = {D_s(0), D_s(1), ..., D_s(T_s)} the s-th observed information cascade, where D_s(t) is the set of nodes activated at time step t and T_s is the final time step of the cascade D_s.
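As a concrete illustration, the notation above maps onto plain data structures as follows; the toy graph and cascade are invented purely for illustration.

```python
# Sketch of the notation in Sec. II-A using plain Python structures.
# The edges, node names and the toy cascade are illustrative only.
E = {("a", "c"), ("b", "c"), ("c", "d")}   # directed edges (v, w)
V = {v for e in E for v in e}

def f(w):
    # Child nodes of w: f(w) = {u : (w, u) in E}.
    return {u for (x, u) in E if x == w}

def b(w):
    # Parent nodes of w: b(w) = {v : (v, w) in E}.
    return {v for (v, x) in E if x == w}

# A cascade D_s is a list indexed by time step t; D_s[t] is the set of
# nodes activated at step t, and T_s = len(D_s) - 1 is the final step.
D_s = [{"a"}, {"c"}, {"d"}]
```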

B. PARENT COMPONENTS BASED ON LOCAL CONNECTIVITY INFORMATION
For learning the embeddings of parent nodes, we consider the neighborhood network of each child node w and adopt the widely used node2vec algorithm [30]. The node2vec algorithm introduces biased random walks on a graph to optimize the sampling strategy so that the nonlinear graph structure can be turned into linear sequences for learning embeddings under the Skip-gram architecture [39], [40]. We assign each parent node v ∈ b(w) a continuous real-valued K-dimensional vector v_w. For each node w ∈ V, we define N_S(w) ⊂ V as the network neighborhood of node w consisting of the nodes that appear nearby in the sequences generated by the sampling strategy S. Given the sequences generated by the random walks, we seek to optimize the following objective function, which maximizes the log-probability of observing the network neighborhood N_S(w) of a node w conditioned on its representation given by the mapping f: max_f Σ_{w∈V} log P(N_S(w) | f(w)). Under the Skip-gram architecture, we can approximate this objective using negative sampling, and we optimize it using stochastic gradient ascent over the model parameters defining the embedding. Based on the learnt parent node embeddings for each child node w, parent component identification essentially becomes a clustering task in the embedding space, for which we adopt the Gaussian mixture model. Note, however, that we propose to do the ''clustering'' and the diffusion network inference together under a unified probabilistic framework, to be detailed in the sequel. For each node w ∈ V, we assume the presence of N_z(w) different latent components consisting of w's parent nodes b(w), which will make node w likely to be activated. We denote by z_w ∈ {1, ..., N_z(w)} the index of the latent components, and model the set of parent components of a child node w as a mixture of Gaussians. We use a Gaussian distribution N(v_w | μ_{z_w}, Σ_{z_w}) to model each component z_w, and each component z_w has a mixing proportion α_{z_w}, where Σ_{z_w=1}^{N_z(w)} α_{z_w} = 1 and α_{z_w} ≥ 0 for all z_w.
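To illustrate the biased walks described above, the following is a minimal pure-Python sketch of a single second-order node2vec-style walk. The helper name `node2vec_walk`, the adjacency dictionary, and the toy graph are assumptions for illustration; the unnormalized weights (1/p for returning to the previous node, 1 for a neighbor of the previous node, 1/q otherwise) follow the usual node2vec formulation, while alias-table sampling and the Skip-gram training step are omitted for brevity.

```python
import random

def node2vec_walk(adj, start, length, p=0.125, q=16.0, rng=random):
    """One second-order biased random walk in the style of node2vec.

    adj: dict mapping node -> set of neighbours (undirected view).
    Returning to the previous node is weighted 1/p, moving to a
    neighbour of the previous node 1, and moving farther away 1/q.
    """
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = sorted(adj[cur])
        if not nbrs:
            break                       # dead end: stop the walk early
        if len(walk) == 1:
            walk.append(rng.choice(nbrs))   # first step is unbiased
            continue
        prev = walk[-2]
        weights = [(1.0 / p if n == prev else
                    1.0 if n in adj[prev] else
                    1.0 / q) for n in nbrs]
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk
```

With the small p and large q used in the experiments (Sec. III-A), the walk is biased toward revisiting and staying near the previous node, i.e., a BFS-like exploration of the local community.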
Thus, the probability of observing a parent node v ∈ b(w) of child node w in the embedding space is given as P(v_w | Θ) = Σ_{z_w=1}^{N_z(w)} α_{z_w} N(v_w | μ_{z_w}, Σ_{z_w}), and thus the likelihood of observing the parent components for all the nodes w ∈ V is given as L_G(Θ) = Π_{w∈V} Π_{v∈b(w)} P(v_w | Θ), where Θ represents the model parameters defined above.
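The mixture density above can be sketched as follows, assuming diagonal covariances for brevity (an assumption made only in this sketch; the model itself places no such restriction), with function names invented for illustration.

```python
import math

def diag_gauss_pdf(x, mu, var):
    """Density of a Gaussian with diagonal covariance (illustrative)."""
    norm = math.prod(2.0 * math.pi * v for v in var) ** -0.5
    expo = -0.5 * sum((xi - mi) ** 2 / v for xi, mi, v in zip(x, mu, var))
    return norm * math.exp(expo)

def parent_density(v_w, alphas, mus, vars_):
    """P(v_w) = sum_{z_w} alpha_{z_w} * N(v_w | mu_{z_w}, Sigma_{z_w})."""
    return sum(a * diag_gauss_pdf(v_w, m, s)
               for a, m, s in zip(alphas, mus, vars_))
```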

C. COMPONENT-BASED DIFFUSION PROCESS
Component-based diffusion models assume that nodes within a parent component have similar influence on their common child. We here associate each parent component z_w of the child node w with a diffusion probability p(w|z_w) = τ_{z_w,w}. When a parent node in component z_w is activated at time t, there will be a probability τ_{z_w,w} that node w will be activated due to the component z_w. Note that we allow overlapping components. Thus p(w|v), the probability of a parent node v activating a child node w, becomes an expected value of the diffusion probabilities {τ_{z_w,w}} over all the latent components, that is, p(w|v) = Σ_{z_w=1}^{N_z(w)} P(z_w | v_w) τ_{z_w,w}, where P(z_w | v_w) is the posterior probability that parent node v belongs to component z_w. Given the proposed diffusion model, the diffusion process of a particular cascade proceeds as follows.
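This expectation can be sketched in a few lines; `gamma` stands for the posterior component memberships P(z_w | v_w) and `tau` for the component-based diffusion probabilities τ_{z_w,w} (both names are illustrative).

```python
def diffusion_prob(gamma, tau):
    """p(w|v) as the expectation of the component-based probabilities
    tau under the component memberships gamma[z] = P(z_w | v_w)."""
    assert abs(sum(gamma) - 1.0) < 1e-9   # memberships must normalize
    return sum(g * t for g, t in zip(gamma, tau))
```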
Recall that D_s(t) is the set of nodes activated at time step t and T_s is the final time step of the cascade D_s. Given the initial set of activated nodes of the s-th cascade, D_s(0), we assume that each of them will try to activate its child nodes. Note that we allow a parent node to activate its child node not just at the next immediate time step but also at subsequent time steps up to a limit. To model this, we denote by C_s(w, t) the set of nodes with at least one activation in the interval between the latest activation of the child node w in the s-th cascade, denoted as L_w^(s)(t + 1), and the time step t. This assumes that we only pay attention to recent news and that posts earlier than our latest post have little influence on our future posting behavior. b(w) ∩ C_s(w, t) then gives the subset of C_s(w, t) consisting of parents of w. The probability that the child node w will be activated at time t + 1 is then given as P_s(w, t + 1) = 1 − Π_{v ∈ b(w)∩C_s(w,t)} (1 − p(w|v)), and whether node w will be activated is determined accordingly. The process proceeds until no more nodes are activated, and then the cascade stops.
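The activation step above can be sketched as follows, where `active_parents` plays the role of b(w) ∩ C_s(w, t) and `p` maps each parent v to p(w|v) (both names are illustrative).

```python
def activation_prob(active_parents, p):
    """Probability that child w is activated at t+1, given the recently
    active parents in b(w) ∩ C_s(w, t) and per-parent probabilities p[v]:
    1 - prod_v (1 - p[v])."""
    prob_none = 1.0
    for v in active_parents:
        prob_none *= 1.0 - p[v]
    return 1.0 - prob_none
```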
We formulate the likelihood function under the delay-agnostic setting as in [41], which counts a successful case for a parent node regardless of how many time steps it had tried before its child node got activated. We denote the set of nodes that were not activated by parent nodes as U_s. The likelihood function of an observed cascade D_s can then be formulated as L(D_s | Θ) = Π_{w activated in D_s} [1 − Π_{v ∈ b(w)∩C_s(w,T_s−1)} (1 − p(w|v))] × Π_{w ∈ U_s} Π_{v ∈ b(w)∩C_s(w,T_s−1)} (1 − p(w|v)). To solve the component detection and diffusion network inference problems under one probabilistic framework, we optimize the overall likelihood function, given as L(Θ) = L_G(Θ) × Π_s L(D_s | Θ).
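Under the delay-agnostic counting described above, a simplified sketch of the cascade log-likelihood might look as follows; the dictionaries `activated` and `unactivated` (mapping each child to the parents that tried to activate it) are illustrative simplifications of the cascade bookkeeping.

```python
import math

def cascade_log_likelihood(activated, unactivated, p):
    """Delay-agnostic log-likelihood sketch: each activated child w
    contributes log(1 - prod_v (1 - p[(v, w)])) over the parents that
    tried, and each never-activated child contributes the log of all
    of its parents' attempts failing."""
    ll = 0.0
    for w, parents in activated.items():
        fail_all = 1.0
        for v in parents:
            fail_all *= 1.0 - p[(v, w)]
        ll += math.log(1.0 - fail_all)
    for w, parents in unactivated.items():
        for v in parents:
            ll += math.log(1.0 - p[(v, w)])
    return ll
```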

D. LEARNING ALGORITHM
We first adopt the widely used network embedding algorithm node2vec [30] to learn the embeddings of parent nodes of each node in the network. Then, a two-level expectation maximization (EM) algorithm is derived to obtain the ML estimates of the model parameters based on the observed cascades. The framework of our learning algorithm is shown in Fig. 2.

1) EM ALGORITHM
We make use of the expectation maximization (EM) algorithm [38] and derive a two-level EM algorithm to infer the latent parent components and the component-based diffusion probabilities simultaneously. We maximize the likelihood function L(Θ) with respect to the parameters Θ. The inferred components will be those making the observed information diffusion most likely to happen, as well as grouping similar parent nodes.

a: FIRST LEVEL EM
Let I_{v,z_w} be a latent variable that takes the value 1 when a parent node v belongs to the latent component z_w, and 0 otherwise, subject to the constraint Σ_{z_w} I_{v,z_w} = 1, and denote by I the whole set of such latent variables. We then derive the corresponding Q-function and obtain ML estimates via EM iterations consisting of an E-step and an M-step.
If we assume that I is known, the complete likelihood function can be written as P(D, I | Θ) = P(D | I, Θ) P(I | Θ). (a) E-step: As I is unobserved in most cases, we perform the E-step by first computing the posterior probabilities of I with the current parameter estimates τ̂_{z_w,w}, α̂_{z_w}, μ̂_{z_w} and Σ̂_{z_w}. Here, I_{v∈b(w)∩C_s(w,T_s−1)} is an indicator function which equals 1 if v ∈ b(w) ∩ C_s(w, T_s − 1). We also define T^+_{w,s} as the set of time steps t in the s-th cascade satisfying w ∈ D_s(t + 1) and b(w) ∩ C_s(w, T_s − 1) ≠ ∅. Then, we take the expectation over all possible assignments of I which can explain the observed cascades; the corresponding expected likelihood function, i.e., the Q-function, is denoted Q(Θ | Θ̂). (b) M-step: We maximize Q(Θ | Θ̂) by setting the derivative of Q with respect to Θ to zero to obtain the updating rules of the model parameters. To update {α_{z_w}}, we apply the Lagrange multiplier method to maximize Q(Θ | Θ̂) subject to the constraint Σ_{z_w} α_{z_w} = 1. To update {τ_{z_w,w}}, setting to zero the derivative of the first term E_I[log P(D | I, Θ)] in Q(Θ | Θ̂) does not yield a simple closed-form solution. So, within this M-step, we introduce another level of the EM algorithm.
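For concreteness, a single EM iteration for the Gaussian-mixture part alone can be written as below; the diffusion terms of the paper's Q-function (and the covariance update) are deliberately omitted, so this is only a partial sketch of the first-level update, with all names invented for illustration.

```python
import math

def diag_pdf(x, mu, var):
    # Diagonal-covariance Gaussian density (illustrative helper).
    norm = math.prod(2.0 * math.pi * v for v in var) ** -0.5
    return norm * math.exp(-0.5 * sum((a - b) ** 2 / v
                                      for a, b, v in zip(x, mu, var)))

def em_step(points, alphas, mus, vars_):
    """One EM iteration for the Gaussian-mixture part of the model."""
    # E-step: responsibilities gamma[i][k] = P(z = k | x_i).
    gamma = []
    for x in points:
        w = [a * diag_pdf(x, m, s) for a, m, s in zip(alphas, mus, vars_)]
        tot = sum(w)
        gamma.append([wk / tot for wk in w])
    # M-step: re-estimate mixing proportions and component means.
    n, d, k = len(points), len(points[0]), len(alphas)
    nk = [sum(g[j] for g in gamma) for j in range(k)]
    new_alphas = [nkj / n for nkj in nk]
    new_mus = [[sum(g[j] * x[i] for g, x in zip(gamma, points)) / nk[j]
                for i in range(d)] for j in range(k)]
    return new_alphas, new_mus, gamma
```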

b: SECOND LEVEL EM
Let Y^(s)_{v,w}(t) denote a latent variable indicating whether the activation of a node w at time step t in the s-th cascade is due to w's parent node v or not. We further denote by Y_s(t) = {Y^(s)_{v,w}(t)} the set of latent variables corresponding to the activations at time step t in the s-th cascade, and by Y the whole set. Then, we derive the corresponding Q-function and infer the parameters {τ_{z_w,w}} via second-level EM iterations.
(a) E-step: First, we compute the posterior probability of Y^(s)_{v,w}(t), given as P(Y^(s)_{v,w}(t) = 1) = p̂(w|v) / [1 − Π_{u ∈ b(w)∩C_s(w,t−1)} (1 − p̂(w|u))], where p̂(w|v) = Σ_{z_w} η_{v,z_w} τ̂_{z_w,w}, τ̂_{z_w,w} stands for the current estimate of τ_{z_w,w}, and η_{v,z_w} is the current estimate of the membership of parent v in component z_w.
Then, we take the expectation over all possible assignments of Y, and the corresponding Q-function Q'(Θ | Θ̂) can be defined accordingly. (b) M-step: We maximize Q'(Θ | Θ̂) by taking the derivative of Q' with respect to {τ_{z_w,w}}; setting ∂Q'/∂τ_{z_w,w} = 0 yields the updating rule of {τ_{z_w,w}}. The E-step and M-step repeat until convergence. The detailed steps are summarized in Algorithm 1.
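The flavor of the second-level update can be sketched as follows. This is a much-simplified stand-in for the paper's derivation, assuming per-edge attempt and success counts have already been extracted from the cascades and using responsibility-weighted success fractions; the function and argument names are invented for illustration.

```python
def second_level_em_step(activations, attempts, eta, tau):
    """One sketched second-level EM iteration for tau_{z_w,w}.

    activations: {(v, w): number of times v's attempt was followed by
                  an activation of w in the observed cascades}
    attempts:    {(v, w): total number of attempts of v on w}
    eta[(v, w)][z]: first-level membership of v in component z of w.
    tau[(z, w)]: current component-based diffusion probabilities.
    """
    num, den = {}, {}
    for (v, w), n_try in attempts.items():
        # Current estimate of p(w|v) = sum_z eta * tau.
        p_vw = sum(g * tau[(z, w)] for z, g in enumerate(eta[(v, w)]))
        for z, g in enumerate(eta[(v, w)]):
            # E-step: responsibility of component z for v's successes;
            # M-step: membership-weighted successes over attempts.
            resp = g * tau[(z, w)] / p_vw
            num[(z, w)] = num.get((z, w), 0.0) + resp * activations.get((v, w), 0)
            den[(z, w)] = den.get((z, w), 0.0) + g * n_try
    return {k: num.get(k, 0.0) / den[k] for k in den}
```

With a single parent and a single component, the update reduces to the familiar ML estimate "successes over attempts".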

2) COMPUTATIONAL COMPLEXITY
Implementing the learning algorithm involves two main steps: 1) load the network data, the per-node neighbor embeddings, and the cascade-related data, and 2) carry out the EM iterations. For step 2), we calculate the cost of each EM iteration. The E-step (lines 5-8) visits the neighbors v ∈ b(w) of each node w ∈ V, which is equivalent to visiting all edges. Denoting by I_max the maximum number of components set among all nodes, the worst-case complexity becomes O(I_max × |E|).

III. EXPERIMENT
We compare our model with the basic models as well as the recently proposed models. We use both synthetic and real world data sets to evaluate the model accuracy and discuss how the use of the embedding-based framework can enhance both the component identification and diffusion prediction, and how our model can be applied to support dependency analysis of different online news media.

A. EXPERIMENTAL SETTINGS
In all our experiments, the number of parent components N_z(w) of each child node w is set to the number of communities in the child node w's neighborhood network as estimated by the CNM community detection algorithm [25], for all the component-based models. Also, for all the experiments, the initial values of τ_{z_w,w} are within [0, 1], and the initial values of α_{z_w} are generated within [0, 1] satisfying Σ_{z_w} α_{z_w} = 1. For learning the embeddings, the nodes within two hops are considered instead of just the immediate neighbors for more precise learning of the embeddings of parent nodes, and only the embeddings of parent nodes are kept. The widely used random-walk based network embedding method node2vec [30] is adopted, which introduces biased random walks to balance between BFS and DFS sampling strategies using hyperparameters p and q. The parameters are set as p = 0.125 and q = 16, which biases the walks toward BFS-like sampling of nearby nodes, mostly within the same communities. We compare our model with the basic IC model, a component-based diffusion model, and an embedding-based IC model, given as: 1) ICM: The basic IC model. We extend the original IC model by considering the influence of all the parent nodes activated after the child node's latest activation instead of only those activated at the previous time step. The main reason for using this modified IC model is to make sure that the comparison is based only on whether the component structure is adopted or not. 2) Comp: The component-based model proposed in [24]. The components of each child node are directly detected by the CNM community detection algorithm [25]. Parent nodes within two hops are considered to enhance the detection accuracy. We did not model redundancy in Comp for a fair comparison of whether a particular way of identifying components is useful.
Also, the parent nodes belonging to the same component share the same component-based diffusion probability instead of independent values as in the IC model.

3) IC_Emb:
The embedding-based IC model in [16]. A set of sender and receiver embeddings is learnt for each node, and the diffusion probabilities are modeled as a function of the sender and receiver embeddings instead of as independent values. However, neither local connectivity information nor the notion of parent components is considered. 4) Comp_Emb: The proposed component-based model with embedding-based parent components.

B. PERFORMANCE EVALUATION
As the ground truth is unknown for real data, we adopt perplexity for evaluating the performance of various models on predicting unseen data. Perplexity is widely used for the evaluation of language models [42], where it is computed from the average probability of each word being generated by the trained model. For our case, the perplexity over the observed cascades is defined as PPL = −(1/W) Σ_s ln P(D_s), where P(D_s) is the probability of the s-th cascade being generated, and W is the normalization term representing the number of activations due to the influence of the corresponding nodes' parents. The average probability is negatively correlated with perplexity: a lower perplexity score indicates that the inferred model is more probable, and thus indicates better performance. In particular, denote the average per-activation probability of the first model as P_1 and that of the second as P_2, and the corresponding perplexities as PPL_1 and PPL_2. Then the relation between the improvement in average probability and the decrease in perplexity becomes ln(P_2/P_1) = PPL_1 − PPL_2. When the decrease in perplexity equals 0, P_2/P_1 equals 1, meaning the models give the same performance. When the decrease in perplexity takes values of 0.05, 0.1, 0.15 and 0.5, P_2/P_1 becomes approximately 1.05, 1.11, 1.16 and 1.65, which means there is approximately 5%, 10%, 16% and 65% improvement in average probability respectively. Also, we divide the cascades into five folds and obtain the average performance using cross-validation.
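The perplexity definition and the perplexity-to-improvement relation above can be sketched as follows (function names are illustrative).

```python
import math

def perplexity(cascade_log_probs, W):
    """PPL = -(1/W) * sum_s ln P(D_s): the average negative
    log-probability per parent-driven activation."""
    return -sum(cascade_log_probs) / W

def improvement(ppl_drop):
    """Relative improvement in average per-activation probability
    implied by a perplexity decrease: P2/P1 - 1 = exp(PPL1 - PPL2) - 1."""
    return math.exp(ppl_drop) - 1.0
```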

C. EXPERIMENTS ON SYNTHETIC DATA
We generate synthetic cascades based on our model and anticipate that the inferred model with the same assumption for cascade generation should perform the best.

1) EXPERIMENTAL SETUP
We first generate two scale-free networks of 1000 nodes using the SNAP platform [43], as real networks are mostly scale-free. One network is generated with 5000 edges and the other with 10000 edges. For each network, 100 cascades are generated based on our model, where the diffusion probabilities are randomly assigned within [0, 1] and the embeddings are learnt from the network structure in advance. Note that the network with 10000 edges is denser and thus yields more activations in the generated cascades, providing more data for model training. We then investigate the connectivity of the nodes in the parent components detected by the CNM community detection algorithm in the basic component-based diffusion model Comp. Each component is measured by its clustering coefficient, which is defined as the network density of the corresponding subgraph of the component. Fig. 3 shows the probability density function of the clustering coefficients of the parent components in the two networks. The high percentage of components with low clustering coefficients indicates that in scale-free networks, which are common among real networks, not all nodes in the detected components are closely connected, and thus we should exploit the local connectivity information of neighbors for better identification of components.
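The per-component measure used above (the density of the induced subgraph) can be sketched as follows, treating each edge as an unordered pair listed once; the function name is illustrative.

```python
def component_density(nodes, edges):
    """Density of the subgraph induced by a parent component:
    internal edges over the maximum possible n*(n-1)/2, where
    edges is an iterable of unordered pairs, each listed once."""
    n = len(nodes)
    if n < 2:
        return 0.0
    internal = sum(1 for (u, v) in edges if u in nodes and v in nodes)
    return internal / (n * (n - 1) / 2)
```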

2) GENERATIVE ABILITY
We apply ICM, Comp, IC_Emb and Comp_Emb to the synthetic networks. The performance comparison results in terms of perplexity are shown in Fig. 4. We observe that all the models perform better on the network with 10000 edges than on that with 5000 edges due to more training data. Also, for the network with 5000 edges, while Comp and IC_Emb result in comparable perplexity decreases of 0.74 and 0.79 relative to ICM respectively, our model Comp_Emb clearly achieves further improvement with a perplexity decrease of 1.37 relative to ICM, and is closest to the ground-truth value. A similar phenomenon can be observed for the network with 10000 edges: while both Comp and IC_Emb result in a perplexity decrease of 0.05 relative to ICM, our model achieves a decrease of 0.08.

D. EXPERIMENTS ON REAL DATA
To validate whether our model assumptions indeed hold in online social and information networks, we apply the proposed model to real data sets.

1) DATA SET
Two real data sets are used for the evaluation, namely MemeTracker [44] and Digg [45], where both the network structure and the information cascades are available, given as:
MemeTracker: The spreading of memes (frequently quoted phrases) among online news media websites, where the websites are modeled as nodes and each cascade is defined based on a particular meme. We filter out the websites publishing fewer than 50 articles and get 11,457 nodes and 71,460 cascades.
Digg: The story voting process under a directed friendship network over one month in 2009. Users are modeled as nodes and following relations are modeled as edges. Each cascade is defined based on a particular frequently voted story. We apply the same kind of filtering and extract the corresponding cascades, resulting in 8,954 nodes and 3,553 cascades.
We also investigate the connectivity of the nodes in the parent components detected by the CNM community detection algorithm in the basic component-based diffusion model Comp for the two real networks. Fig. 5 shows the probability density function of the clustering coefficients of the parent components in both the MemeTracker and Digg data sets. The high percentage of components with low clustering coefficients indicates that not all nodes in the detected components are closely connected, and thus we should exploit the local connectivity information of neighbors for better identification of components instead of relying on global quantities such as modularity.
2) GENERATIVE ABILITY
Figure 6 shows the experimental results. The results consistently show that our model Comp_Emb outperforms the basic IC model ICM, the basic component-based model Comp, and the embedding-based IC model IC_Emb. In particular, our model outperforms ICM by a perplexity decrease of 0.41 and 0.40 on the MemeTracker and Digg data sets respectively, and outperforms Comp by a decrease of 0.14 and 0.04 respectively. This indicates that incorporating local connectivity information in component-based models can result in more accurate models being inferred. Our model achieves a more apparent improvement on the MemeTracker data set, as the information sources of news media are of a variety of types and form components with a wide range of influence abilities; thus, the identification of components plays a vital role there. Also, our model outperforms IC_Emb by a decrease of 0.26 and 0.09 respectively on the two data sets. Therefore, the node connectivity information as well as the components considered in our model are important factors when learning node embeddings for modeling information diffusion. In addition, we tested different numbers of dimensions for the node embeddings as shown in Fig. 7, and the settings giving the best performance are adopted.

3) QUALITATIVE EVALUATION
To illustrate how well the neighborhood network embeddings and the components are learnt, we visualize the neighbors' embeddings of the websites New York Times (abbreviated as NYTimes) and Seattle Local News (abbreviated as SEANews). Figure 8 shows (1) the learnt embeddings projected into two-dimensional space using the t-SNE [46] algorithm and (2) the mean of each component represented by a red triangle. For comparison, the embeddings are colored based on the result of the CNM community detection algorithm, with one color representing one cluster. Our model detects three major components for NYTimes and two for SEANews, while CNM can only detect two major components for NYTimes. It is noted that the nodes in the two major parent components of NYTimes and SEANews identified by the CNM algorithm are also close in our embedding space. However, for NYTimes, near the boundary of the two components lie mass media websites such as news.bbc.co.uk (BBC News) and foxnews.com (Fox News). Although they are in the same color as *.blogs.nytimes.com, the two sets of websites are relatively far apart and indeed should form another category. They are recognized as another component by our model. In addition, the diffusion data helps learn more meaningful components. There is a red triangle around the websites reporting international news, including iht.com (International Herald Tribune), ap.google.com (Google News), and upi.com (United Press International). They should have different degrees of influence on NYTimes as compared to other mass media, even though they are near in the embedding space. This can also be identified by our model as a separate component.

FIGURE 9. The neighborhood node embeddings learnt from the baseline embedding-based IC model IC_Emb for the websites New York Times (abbreviated as NYTimes) and Seattle Local News (abbreviated as SEANews). The embeddings are projected into two-dimensional space using the t-SNE [46] algorithm, and are colored based on the result of the CNM community detection algorithm for comparison (each color represents one cluster).
Similar phenomena can be found for SEANews. In the right corner, our model identifies the entertainment news websites such as etonline.com (Entertainment Tonight) and community.tvguide.com (TV Guide), and the entertainment news websites specifically about celebrities such as people.com (People) and tmz.com (Thirty Mile Zone), as two separate components.
For comparison, Figure 9 shows the neighborhood node embeddings of NYTimes and SEANews learnt from the baseline embedding-based IC model IC_Emb, projected into two-dimensional space. The embeddings are also colored based on the result of the CNM community detection algorithm for comparison. As can be seen, the node embeddings learnt by IC_Emb are not well clustered. Nevertheless, some of the major nodes in the major components identified by our model are also relatively close in the embedding space learnt from IC_Emb. For NYTimes, there are some mass media websites on the left (e.g., online.wsj.com (Wall Street Journal), cnn.com (CNN News), latimes.com (Los Angeles Times), foxnews.com (Fox News), and news.bbc.co.uk (BBC News)), information technology websites beside them (e.g., bits.blogs.nytimes.com, valleywag.com (Valleywag), computerworld.com (Computerworld) and pcmag.com (PCMag)), and weblogs of NYTimes itself in the right corner (*.blogs.nytimes.com). For SEANews, there are some mass media websites (e.g., cnn.com (CNN News), news.bbc.co.uk (BBC News), and portfolio.com (Portfolio)). Also, people.com (People) and tmz.com (Thirty Mile Zone) are still relatively closer to each other than to other nodes, and the same holds for etonline.com (Entertainment Tonight) and community.tvguide.com (TV Guide). The main reason behind this is that IC_Emb learns parent node embeddings based on the observed cascades, and parent nodes with similar influence end up close in the embedding space. As the major nodes within each parent component identified by our model have similar influence on their child node, the learnt embeddings are also close under IC_Emb. However, only part of the nodes are clustered under IC_Emb as the observed cascades are not sufficient. Meanwhile, our model learns node embeddings based on local connectivity information and directly models closely connected nodes as components based on those embeddings, and thus can achieve better performance.
Figure 10 lists the major components identified for NYTimes and SEANews with the associated component-based diffusion probabilities under our model. It shows that the parent nodes in each component are well associated with NYTimes and SEANews respectively. For NYTimes, Component I is the most influential and is composed of websites of international news agencies. Component II consists of the weblogs of NYTimes itself; they contain featured articles from different sections of NYTimes' own blogs, which are written by well-known experts. Component III consists of websites of well-known mass media such as cnn.com (CNN News) and news.bbc.co.uk (BBC News). Component IV consists mainly of information technology websites such as engadget.com (Engadget) and pcworld.com (PC World), and some related weblogs. For SEANews, Component I and Component II are websites about entertainment news, but Component I is specifically about entertainment news concerning celebrities and is less influential than Component II. Component III contains mass media websites such as cnn.com (CNN News) and news.bbc.co.uk (BBC News).

IV. CONCLUSION
In this paper, we proposed to model the local connectivity information of neighbors to enhance both component identification and diffusion network inference in social networks. In particular, we embedded nodes in a latent vector space using network embedding and proposed a component-based diffusion model with embedding-based parent components. A unified probabilistic framework was proposed so that the parent components and the component-based diffusion model can be inferred jointly. Also, a two-level EM algorithm was derived for model inference based on the observed information cascades. For performance evaluation, we applied the proposed model to both synthetic and real-world data sets with promising results obtained. We also discussed how the use of the embedding-based framework can enhance both the component identification and the diffusion model. This paper has some limitations. For instance, unlike some related work where the network structure is unknown [47], our work applies to situations where the network structure is known. Also, for simplicity, the diffusion rate is assumed to be static and topic-independent, and the activations only occur at discrete time steps. For future work, the above assumptions can be relaxed. The proposed model can also be applied to other network analysis tasks such as influence maximization.