Analyzing Effects of Social Media User’s Influence on Contents Caching in ICN

In recent years, with the rapid growth of Social Networking Services (SNS), Information-Centric Networking (ICN) has emerged as a promising network architecture for mitigating the traffic generated by SNS users. To fully exploit the potential of ICN as a cache network, the caching strategy and the cache-replacement policy must be designed carefully, considering the features of both the social network among SNS users and the cache network. Therefore, in this paper, we assume that ICN is introduced as a content distribution infrastructure for SNS, and extensively analyze the characteristics of the socially-aware ICN. In particular, we focus on influential users, called influencers, whose existence is a typical feature of social networks, and investigate how the selection of influential users and their ratio affect the content caching in ICNs. Our key findings are summarized as follows. First, the effectiveness of the content caching is dominated by the centrality measure used to determine influential users. Second, to determine influential users, it is enough to sample at most 30% of the social network rather than its entire structure. Third, to gain the benefit of the content caching, the ratio of influential users must be selected carefully on social networks whose degree distribution follows a power law.


I. INTRODUCTION
In recent years, with the rapid growth of Social Networking Services (SNS), e.g., Twitter and Facebook, the amount of contents produced by users has dramatically increased. In addition, the amount of text and video traffic is increasing due to the active communication among users on SNS.
One promising technique for efficient content distribution in SNS is a cache network, which enables us to re-use contents in a network. Content Delivery Network (CDN) [1], which supports today's Internet, enables users to retrieve contents not only from the server storing the original, i.e., the origin server, but also from geographically neighboring cache servers. Also, Information-Centric Networking (ICN) [2] has been drawing attention as the next-generation Internet architecture. In ICNs, routers in the network can temporarily maintain replicas of contents, so users can obtain contents from caching routers as well as from the origin server. Utilizing caches in networks offers preferable properties; for instance, the content delivery delay is shortened and the content availability is improved.
The associate editor coordinating the review of this manuscript and approving it for publication was Barbara Guidi.
The key for cache networks to function efficiently is to appropriately design the caching strategy and the cache-replacement policy that operate at caching nodes, e.g., cache servers in CDN and routers in ICNs. The caching strategy determines, when a caching node receives a content, whether the node inserts the content into its own cache. In contrast, the cache-replacement policy determines which content is discarded from the caching node when the cache is fully occupied. Because the communication performance of cache networks is dominated by diverse factors, e.g., the content popularity and the routing strategy for content requests, we have to carefully design the caching strategy and the cache-replacement policy while considering the complex interaction among these factors [3], [4].
Furthermore, in the case of content distribution for SNS using cache networks, it is essential to take into account specific features of the OSN (Online Social Network), which represents social ties among users in SNS. Representative OSN-specific features are the existence of influential users, called influencers, and the community structure [5]. Here, let us focus on the relationship between these two OSN-specific features and the content caching. First, we can naturally assume that contents produced by influential users are requested by a large number of users, which implies that these contents should be preferentially cached at caching nodes for efficiency. In addition, it is known that communication among users in the same community occurs more frequently than among users in different communities (see, for instance, [6]). This phenomenon means that content produced within one community is likely to be consumed by users in the same community, so it is desirable to consider such locality when caching contents. As with previous works, e.g., [7], that incorporate features of OSN into the design of communication networks, it is a challenging task to elaborately design cache networks that incorporate specific features of OSN as well as those of the cache network itself.
In the literature, several studies have been devoted to the design of cache networks, e.g., CDN and ICN, considering OSN-specific features; however, to the best of our knowledge, the interaction between OSN-specific features and the content caching in cache networks has not been fully understood. The study most relevant to this paper [8] proposed a caching strategy that incorporates the existence of influential users on OSN. Although Ref. [8] revealed the effectiveness of the proposed caching strategy, it does not provide detailed criteria for determining influential users.
The fundamental differences from Ref. [8] can be summarized as follows. First, we discuss how the centrality measures that determine influential users affect the content caching. Reference [8] focuses on only two centrality measures, eigenvector centrality and PageRank, whereas we examine eight centrality measures to clarify the effect of selecting influential users. Second, we also discuss the effect of the ratio of influential users on the content caching. In our understanding, it is essential to appropriately select the ratio of influential users according to network resources, e.g., the number of caching nodes and the cache size, because it affects the performance of content caching. For this reason, we aim to reveal the relationship between the ratio of influential users and the content caching.
Therefore, in this paper, we assume that ICNs, particularly CCN (Content-Centric Networking) [9] and NDN (Named Data Networking) [10], are introduced as a content distribution platform for SNS, and then analyze the characteristics of the socially-aware ICN. Specifically, we focus on influential users, whose existence is one of the fundamental features of OSN, and extensively investigate the effect of the selection of influential users and their ratio on the content caching. Through our quantitative analysis of the interaction between influential users on OSN and the content caching, we contribute to realizing efficient socially-aware ICNs.
The contributions of this paper are summarized as follows.
• Utilizing both simulation and mathematical analysis, we extensively analyze how the selection of influential users on OSN and their ratio affect the content caching in ICNs.
• Consequently, we reveal that, regardless of the network topology of OSN, the cache hit ratio, which is defined as the probability that a user can retrieve contents from any caching router, differs significantly according to the centrality measure that determines influential users.
• We also reveal that estimating influential users from a partial structure of an OSN, which is obtained with a sampling strategy, is effective; namely, the cache hit ratio obtained from limited knowledge of the OSN is almost the same as that obtained from the complete topology of the OSN.
• Moreover, we analytically reveal that determining the appropriate ratio of influential users can dramatically improve the cache hit probability when the degree distribution of OSN follows a power law. On the other hand, when the degree distribution does not follow a power law, the benefit of selecting influential users for the content caching is limited.
This paper is an extended version of our previous work [11]. In [11], we presented a preliminary evaluation of the socially-aware ICN through simulations. In this paper, we aim to extensively analyze the characteristics of the socially-aware ICN by combining simulation and mathematical analysis.
The rest of this paper is organized as follows. First, Section II reviews previous works on cache networks incorporating specific features of OSN. Section III presents a conceptual ICN-based platform for SNS content distribution. Section IV investigates, through simulations, how the selection of influential users affects the content caching in ICNs. Furthermore, Section V investigates the effect of the ratio of influential users on the content caching in ICNs by using mathematical analysis. Finally, Section VI summarizes this paper and addresses future works.

II. RELATED WORK
Pioneering works that developed caching strategies while taking OSN-specific features into account are References [12] and [13]. Wang et al. [12] conducted a large-scale measurement of OSNs to clarify the characteristics of content distribution on SNS. Based on their observations, they proposed a cache-replacement policy at an edge server that achieves better performance than typical cache-replacement policies, e.g., LRU (Least-Recently Used) and LFU (Least-Frequently Used). Also, Hu et al. [13] proposed an efficient CDN-based platform for delivering video contents produced by SNS users. The proposed method is twofold: (i) classifying users into communities based on the social relationship among users, geographical locality, and users' interests, and (ii) designing a cache-replacement policy and a cache-server selection such that users belonging to the same community can share their video contents.
While [12], [13] focused on edge caching such as CDN, a few studies [8], [14] have been devoted to designing socially-aware ICNs. Truong et al. [14] investigated the benefits of introducing NDN [10], which is one of the promising network architectures realizing ICNs, as a content distribution platform for SNS, e.g., Twitter. Specifically, using mathematical analysis, they focused on a network with a hierarchical topology, and derived the number of hops required for content delivery and the amount of traffic transferred through the network. Also, through several numerical examples, they showed that the geographical locality of users, which is one of the OSN-specific features, has a positive impact on the contents caching in NDN. Bernardini et al. [8] proposed a caching strategy called SACS (Socially-Aware Caching Strategy) for socially-aware ICNs. The concept of SACS is that a router does not cache all contents uniformly, but caches only contents published by influential users. Simulations and experiments on a testbed revealed the effectiveness of SACS; for instance, the cache hit ratio can be dramatically improved.
In addition to the aforementioned studies, recent advancements in the contents caching for wireless environments have been remarkable [15], [16]. Zheng et al. [15] analyzed throughput and delay on information-centric wireless networks using asymptotic analysis. Naeem et al. [16] performed a comparative study that investigates the effectiveness of in-network caching in the IoT (Internet of Things) environment. Furthermore, [17], [18], [19] incorporated users' mobility and studied the contents caching in wireless environments such as MANETs (Mobile Ad-hoc NETworks) and VANETs (Vehicular Ad-hoc NETworks). Do et al. [17] analyzed information-centric IoT networks comprising mobile devices and access points with a finite cache, and presented the optimal caching strategy. Liu et al. [18] analyzed the impact of mobility speed on the throughput and delay performance of information-centric MANETs. Gupta et al. [19] proposed a cooperative caching strategy for ICN-based vehicular networks using hierarchical clustering. However, to the best of our knowledge, these studies do not consider users' sociality; for instance, the content request is simply given by a probability density function, which ignores users' social activity.

III. ICN-BASED CONTENT DISTRIBUTION FOR SNS
In this section, we present a conceptual architecture realizing ICN-based content distribution for SNS and its components.
The key ideas of our proposed architecture are summarized as follows.
• We realize content delivery among users who have a social relationship on a physical network comprising ICN routers.
• We provide a way for users to learn the names of the contents that they seek by introducing two types of requests: pre-request and actual-request.
• We incorporate a mechanism for sharing a list of influential users among ICN routers. This enables ICN routers to operate a caching strategy that considers influential users.

A. PRELIMINARY
As an SNS application, we assume a closed user-oriented application such as Twitter and Facebook. In the SNS application, a user can have a social relationship with other users; for instance, in Twitter, this relationship corresponds to the follower and the followee. In addition, we assume that contents produced by a user can be shared among social neighbors of the producing user. The rationale behind this assumption is to focus on the effect of the social relationship among users on the content caching as an initial step to evaluate the socially-aware ICNs. Also, this assumption makes the implementation of our conceptual architecture easy; specifically, to achieve such a content retrieval, a user participating in the SNS application simply manages and updates his/her own friendship list. A substantial benefit of this mechanism is that a user does not need to grasp the entire structure of the social relationship, which makes it feasible for a social network evolving with time.
The content in an SNS application is delivered over an ICN network comprising multiple ICN routers with finite caches. Note that by an ICN router we mean a router used in CCN [9] and NDN [10]. In the following explanation, we assume that the forwarding table at each ICN router is appropriately configured according to the routing protocol. For this reason, the ICN router can forward a request packet and a response packet, i.e., a content request and a content, to the desired destination. It is worth noting that this assumption is realized by using a routing protocol for ICNs, the NLSR (Named-data Link State Routing) protocol [20]. More specifically, each router advertises the users it accommodates as content prefixes in accordance with the NLSR protocol; as a consequence of repeated advertisements among routers, each router can construct its own forwarding table.

B. NAMING
Because a user must uniquely identify a content by its name when requesting it, a unique content identifier must be assigned to each content in ICNs. In our conceptual architecture, we suppose that the content identifier includes the identifier of the user who published the content, e.g., the user's account name [14]. Specifically, we define the content identifier as the following notation:

C. CONTENT RETRIEVAL
A user participating in the SNS application, who is directly connected to the nearest ICN router, can perform two activities: publish and retrieve [8]. "Publish" indicates that a user newly produces a content, and "retrieve" indicates that a user obtains contents from its social neighbors. For instance, in the context of Twitter, "publish" and "retrieve" correspond to posting a new tweet and obtaining new tweets by followees, respectively.
In general ICNs, the content retrieval is realized by a simple procedure: a user simply issues a request packet including the identifier of the requested content, which is given as a priori knowledge. In contrast, in our network architecture, the content retrieval is realized by two types of requests, pre-request and actual-request. First, a user issues pre-requests to its social neighbors to retrieve lists of contents newly published by each social neighbor. Specifically, for every social neighbor, i.e., target user, the requesting user injects a single request packet that includes the timestamp at which the requesting user last issued a pre-request for the target user. Then, the target user receiving the pre-request compiles a list of contents published after the timestamp embedded in the pre-request, and returns the list of identifiers of these contents as a response packet to the requesting user. After that, based on the returned list, the requesting user sequentially issues actual-request packets to the target user. Each request packet is relayed through ICN routers, and an intermediate ICN router immediately sends back the corresponding content if it caches the content. Otherwise, the request packet arrives at the target user, which simply returns the requested content.
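The pre-request and actual-request exchange can be sketched as follows. This is only an illustration under simplifying assumptions, not the authors' implementation: the `Publisher` class, the `retrieve` function, and the dictionary standing in for the caches of all on-path routers are hypothetical names introduced here, and routing is collapsed into a single cache lookup.

```python
from dataclasses import dataclass, field


@dataclass
class Publisher:
    """Target user holding its own published contents (hypothetical model)."""
    name: str
    # Maps content identifier -> (publication timestamp, payload).
    contents: dict = field(default_factory=dict)

    def publish(self, content_id, timestamp, payload):
        self.contents[content_id] = (timestamp, payload)

    def handle_pre_request(self, last_timestamp):
        """Return identifiers of contents published after last_timestamp."""
        return sorted(cid for cid, (ts, _) in self.contents.items()
                      if ts > last_timestamp)

    def handle_actual_request(self, content_id):
        """Return the payload of the requested content."""
        return self.contents[content_id][1]


def retrieve(target, last_timestamp, router_cache):
    """Two-phase retrieval: one pre-request, then one actual-request per item.

    router_cache (identifier -> payload) stands in for the caches of all
    on-path ICN routers. Returns (payloads, number of cache hits).
    """
    new_ids = target.handle_pre_request(last_timestamp)  # phase 1: pre-request
    payloads, cache_hits = [], 0
    for cid in new_ids:                                  # phase 2: actual-requests
        if cid in router_cache:
            payloads.append(router_cache[cid])           # answered by a router
            cache_hits += 1
        else:
            payloads.append(target.handle_actual_request(cid))
    return payloads, cache_hits
```

In this sketch the pre-request always reaches the target user, whereas only actual-requests can be answered by a cache; whether pre-request responses are cacheable is not stated in the text.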
The advantage of using two types of requests is that the user can retrieve contents published by social neighbors even though the user does not know the contents' identifiers. In contrast, its intuitive disadvantage is twofold: (i) an increase in the time for a user to retrieve contents and (ii) an increase in the traffic volume. In particular, regarding (i), using two types of requests adds a delay due to the pre-request, which corresponds to one RTT (Round-Trip Time) between the ICN routers accommodating the users, to the actual content delivery delay.
An example of content retrieval on the ICN-based SNS application is illustrated in Fig. 1. Figure 1(a) depicts a physical network comprising multiple ICN routers and users, each of which is accommodated at an ICN router. Also, Fig. 1(b) depicts the social relationship among users; for instance, user A has a relationship with user C and user F. As mentioned above, we assume that a request destined for a content published by a user passes through ICN routers. In this example, user A retrieves contents published by user C and user F. In this case, a request from user A for the content published by user C passes through routers 1 and 3; that for the content published by user F passes through routers 1, 2, 4, and 5.

D. CONTENTS CACHING
The ICN router controls its own cache according to the caching strategy and the cache-replacement policy when forwarding contents produced by SNS users. For instance, an ICN router following LCE (Leave Copy Everywhere) [9], which is a typical caching strategy, uniformly caches contents. Of course, other caching strategies, e.g., LCD (Leave Copy Down) [21], can operate at an ICN router.
Because we will use SACS [8] as the caching strategy in Sections IV and V, we have to explain how SACS operates on our conceptual architecture. With SACS, an ICN router needs to judge whether the content publisher is an influential user in order to determine whether to cache the content. To operate SACS, an ICN router must maintain a list of influential users in advance. This is realized as follows: an administrator of the ICN network observes the social relationship among users on the SNS application, i.e., the OSN, and then distributes a list of estimated influential users to each ICN router. Consequently, when an ICN router receives a response packet, it can determine whether to cache the packet by performing the following operations: (i) extracting the user account contained in the content identifier of the response packet, and (ii) checking the user account against the list of influential users. We should note that routers in ICN networks do not need to inspect the content itself of a response packet when making their cache decision because we assume that the identifier of the influential user is embedded in the content identifier of the response packet as specified in [22].
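The two operations (i) and (ii) amount to a name parse followed by a set lookup. The sketch below illustrates this under a loudly stated assumption: the identifier layout "/<user account>/<content name>" is hypothetical and chosen only for the example, and the function names `publisher_of` and `should_cache` are likewise introduced here.

```python
def publisher_of(content_id):
    """Operation (i): extract the publisher account from a content identifier.

    ASSUMPTION: identifiers look like "/<user account>/<content name>".
    This layout is hypothetical and only serves the illustration.
    """
    parts = content_id.strip("/").split("/")
    return parts[0]


def should_cache(content_id, influential_users):
    """Operation (ii): cache only if the publisher is on the distributed list."""
    return publisher_of(content_id) in influential_users
```

Because the decision uses only the identifier, the router never has to open the payload of the response packet.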

IV. SIMULATION
In this section, through simulation, we extensively investigate how the selection of influential users affects the content caching in socially-aware ICNs. Our experiment comprises two parts. The first experiment aims to reveal which centrality measure should be used when selecting influential users. To accomplish this aim, we measure the cache hit ratio while varying the centrality measure. In the first experiment, we assume that the complete structure of a social network is given to calculate the centrality measure of nodes; however, it is generally known to be difficult to completely grasp the entire structure of a social network due to its dynamics, i.e., the structure of the social network dynamically changes. Therefore, the second experiment aims to verify that selecting influential users is still effective when the entire structure of the social network is unknown. We investigate the effectiveness of selecting influential users from a partial structure of the social network, which is obtained with a sampling strategy.

A. METHOD
1) EFFECT OF SELECTION OF INFLUENTIAL USERS WITH CENTRALITY MEASURES
We generated a random graph, which represents the connection relationships among ICN routers, with the ER (Erdős-Rényi) model [23]. The numbers of nodes and links in the random graph were set to 100 and 200, respectively. From two datasets, Last.fm and Facebook, both of which are typical SNS, we obtained graphs that represent the social relationship among OSN users. The numbers of nodes and links in Last.fm are 1,843 and 12,268, respectively. Also, the numbers of nodes and links in Facebook are 4,039 and 88,234, respectively. We should note that because the graph contained in the Last.fm dataset is disconnected, i.e., it is composed of multiple fragmented graphs, we used the largest connected component as the graph for our simulation. In our simulation, each OSN user is randomly accommodated at an ICN router. According to the users' interaction model in OSN [8], a user repeatedly produces a content and issues requests for contents provided by users who have a social relationship with the requesting user, i.e., nodes adjacent to the requesting node in OSN. The parameter settings of the users' interaction model were the same as those in [8]. Please refer to [8] and [24] for the details of the users' interaction model and its parameter settings.
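A minimal sketch of this setup is given below, assuming the G(n, M) variant of the ER model, in which a fixed number of distinct links is drawn uniformly at random. The text does not state which ER variant was used or how connectivity of the router graph was handled, so both are assumptions here; the function names `er_graph` and `assign_users` are hypothetical.

```python
import itertools
import random


def er_graph(num_nodes, num_links, rng):
    """Draw num_links distinct undirected links uniformly at random (G(n, M)).

    ASSUMPTION: the result may be disconnected; no repair step is applied.
    """
    all_pairs = list(itertools.combinations(range(num_nodes), 2))
    return rng.sample(all_pairs, num_links)


def assign_users(num_users, num_routers, rng):
    """Accommodate each OSN user at a uniformly chosen ICN router."""
    return {user: rng.randrange(num_routers) for user in range(num_users)}
```

For the setting in the text, `er_graph(100, 200, random.Random(seed))` would give the router topology, with `seed` being any fixed integer chosen for reproducibility.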
As the caching strategy at a router, we used SACS proposed in [8]. Namely, a router following SACS caches only contents published by influential users. The cache size at an ICN router was set to 10 [contents], and the cache-replacement policy was LRU.
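The combination of SACS admission and LRU replacement at one router can be illustrated as below. This is a sketch, not the simulator's code: the class name `SacsLruCache` and its methods are hypothetical, and the publisher is passed in explicitly rather than parsed from the content identifier.

```python
from collections import OrderedDict


class SacsLruCache:
    """Router cache with SACS admission and LRU replacement (illustrative)."""

    def __init__(self, capacity, influential_users):
        self.capacity = capacity
        self.influential_users = set(influential_users)
        self._store = OrderedDict()  # least-recently used entry comes first

    def lookup(self, content_id):
        """Return the cached payload and refresh its recency, or None on a miss."""
        if content_id not in self._store:
            return None
        self._store.move_to_end(content_id)
        return self._store[content_id]

    def offer(self, publisher, content_id, payload):
        """Called when a response passes through; returns True if it was cached."""
        if publisher not in self.influential_users:
            return False                     # SACS: skip non-influential publishers
        self._store[content_id] = payload
        self._store.move_to_end(content_id)
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evict the least-recently used entry
        return True
```

With the parameters in the text, each router would hold a `SacsLruCache(10, influential_users)`.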
In this paper, we calculated the centrality of each user using a given centrality measure, and we regarded the top p (0 ≤ p ≤ 1) fraction of users with the highest centrality as influential users. We used eight centrality measures: degree centrality, betweenness centrality, closeness centrality, eigenvector centrality, PageRank [25], k-core index [26], VoteRank [27], and CI (Collective Influence) [28]. Degree centrality, betweenness centrality, closeness centrality, eigenvector centrality, PageRank, and k-core index are typical centrality measures used in the field of complex networks. In contrast, VoteRank and CI were proposed recently, and these centrality measures are known to be superior in terms of efficient information diffusion. In our simulation, we set the ratio of influential users p to 0.1. As described in Section II, p is one of the important parameters affecting our simulation results, but it is difficult for us to investigate the effect of p through simulation due to its computational cost. For this reason, Section V will investigate the effect of the ratio of influential users p through mathematical analysis.
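The selection rule can be sketched for the simplest measure, degree centrality. The function names `degree_centrality` and `top_fraction` are hypothetical, and two details are assumptions because the text does not specify them: the number of influential users is rounded to the nearest integer, and ties are broken by node label. Any other centrality measure could be substituted by supplying a different score dictionary.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Count the links incident to each node of an undirected edge list."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return dict(degree)


def top_fraction(scores, p):
    """Return the top p fraction of nodes by score.

    ASSUMPTION: the count is round(p * N) and ties are broken by node label;
    neither rule is given in the text.
    """
    count = round(p * len(scores))
    ranked = sorted(scores, key=lambda node: (-scores[node], node))
    return set(ranked[:count])
```

For example, on a ten-node star with hub 0, `top_fraction(scores, 0.1)` selects only the hub.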
We used our ICNSIM (ICN SIMulator) for the simulation, and measured the cache hit ratio, which is defined as the ratio of the number of contents returned from intermediate ICN routers along a path to the number of requests issued by users. We repeated a single simulation of 100,000 [slot] 10 times and computed the average cache hit ratio over the entire network.
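The metric itself is simple to state in code. The sketch below is not taken from ICNSIM; the function names are hypothetical, and each request is represented, as an assumption, by its content identifier together with the cache contents of the routers on its path.

```python
def serve_request(content_id, path_caches):
    """True if any intermediate router on the path holds the content."""
    return any(content_id in cache for cache in path_caches)


def cache_hit_ratio(requests):
    """requests: iterable of (content_id, path_caches) pairs.

    Returns hits at intermediate routers divided by the number of requests.
    """
    total = hits = 0
    for content_id, path_caches in requests:
        total += 1
        hits += serve_request(content_id, path_caches)
    return hits / total if total else 0.0


def average_over_runs(ratios):
    """Average the per-run cache hit ratios (10 runs in the text)."""
    return sum(ratios) / len(ratios)
```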

2) EFFECT OF SELECTION OF INFLUENTIAL USERS FROM UNKNOWN SOCIAL NETWORK
The overview of this experiment is as follows: (i) we obtain a partial structure of the social network, which corresponds to a subnetwork comprising the sampled nodes, with a sampling strategy; (ii) we select influential users from the subnetwork based on a given centrality measure; (iii) we measure the cache hit ratio for the given influential users through a simulation. Except for the process of selecting influential users from a social network, the experiment methodology is the same as that described in Section IV-A1; hence, we shall explain only the selection of influential users from an unknown social network.
To obtain a partial structure of a social network, we used the following four types of sampling strategies [29], [30]:
• RW (Random Walk) sampling This sampling strategy obtains a partial structure of a network topology with a random walk on a graph [31]. Specifically, a walker, called an agent, sequentially visits an adjacent node, which is chosen uniformly at random from the neighbor nodes of the currently visited node. This procedure is repeated until the number of nodes visited by the walker reaches a given number of sampled nodes.
• DFS (Depth-First Search) From a randomly chosen starting node, DFS iteratively expands from one of the visited nodes. DFS visits and explores unvisited neighbor nodes of the most-recently visited node [29]. Similar to RW sampling, the exploration with DFS terminates when the number of explored nodes reaches a given number of sampled nodes.
• BFS (Breadth-First Search) Sampling with BFS is the same as with DFS except for the selection of the node to be visited. BFS visits and explores unvisited neighbor nodes of the earliest visited node [29].
• Random sampling This sampling strategy simply obtains a set of nodes chosen at random from the entire network. We should note that this sampling strategy is not intended to obtain a partial structure of the network; we simply used this strategy to provide a baseline result.
In RW sampling, DFS, and BFS, we randomly selected the source node from which the exploration starts. We identified the top pN nodes with the highest degree in the sampled network as influential users, where p and N are the ratio of influential users and the number of nodes of the (complete) social network, respectively. Unlike the first experiment (see Section IV-A1), we used only degree centrality as the centrality measure. This is because, in this experiment, we focus on the effectiveness of the sampling technique for identifying influential users rather than on the effect of centrality measures.
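The crawl-based strategies and the subsequent selection step can be sketched with the standard queue and stack formulation of graph traversal. This is an illustration under stated assumptions: the graph is an adjacency dictionary of sets, it is connected, and the target number of sampled nodes does not exceed the number of nodes. The function names are hypothetical, and the count of influential users is rounded to the nearest integer, which the text does not specify.

```python
import random
from collections import deque


def random_walk_sample(adj, start, target, rng):
    """Move to a uniformly chosen neighbor until `target` distinct nodes are seen.

    ASSUMPTION: the graph is connected and target <= number of nodes.
    """
    visited, current = {start}, start
    while len(visited) < target:
        current = rng.choice(sorted(adj[current]))
        visited.add(current)
    return visited


def crawl_sample(adj, start, target, depth_first):
    """Crawl with a stack (depth_first=True) or a queue (depth_first=False)."""
    frontier, visited = deque([start]), set()
    while frontier and len(visited) < target:
        node = frontier.pop() if depth_first else frontier.popleft()
        if node in visited:
            continue
        visited.add(node)
        frontier.extend(n for n in sorted(adj[node]) if n not in visited)
    return visited


def influential_from_sample(adj, sampled, p):
    """Top round(p * N) sampled nodes by degree within the sampled subnetwork."""
    count = round(p * len(adj))  # N is the size of the complete OSN
    degree = {u: len(adj[u] & sampled) for u in sampled}
    ranked = sorted(sampled, key=lambda u: (-degree[u], u))
    return set(ranked[:count])
```

The random-sampling baseline needs no traversal; `set(rng.sample(sorted(adj), target))` would play the same role.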

B. RESULTS
1) EFFECT OF SELECTION OF INFLUENTIAL USERS WITH CENTRALITY MEASURES
First, we discuss the effect of the centrality measure on the cache hit ratio. Figure 2 shows the average cache hit ratio with different centrality measures. Results for Last.fm and Facebook are shown in Figs. 2(a) and 2(b), respectively. For the sake of detailed analysis, we additionally plotted the results when influential users are selected randomly as "random".
From Fig. 2(a), we can find that the selection of influential users based on centrality measures significantly improves the cache hit ratio. Specifically, among the eight centrality measures, the highest cache hit ratio is achieved with betweenness centrality, PageRank, and VoteRank. In contrast, the lowest one is obtained with eigenvector centrality and k-core index. This tendency indicates the importance of appropriately selecting influential users; that is, the centrality measure is an important factor that affects the communication performance of socially-aware ICNs.
From Fig. 2(b), we confirm that the above observation holds in a different graph. Comparison between Figs. 2(a) and 2(b) implies that the observed tendency is maintained in the case of Facebook. Namely, the cache hit ratio is dominated by the centrality measure regardless of the network topology of OSN. It is worth mentioning that the cache hit ratio in the case of Facebook is overall smaller than that of Last.fm. This difference is caused by the fact that the number of nodes in Facebook is larger than that in Last.fm. Hence, there exist more contents generated by influential users, which become caching candidates at ICN routers, in the network. This results in degradation of the cache hit ratio.
We now discuss which centrality measure should be used while considering the computational cost of calculating centrality measures. Through our simulation, we confirmed that the selection of influential users with betweenness centrality and PageRank significantly improves the average cache hit ratio compared to other centrality measures. However, we cannot ignore the computational cost required to calculate betweenness centrality and PageRank. Because the size of OSN is tremendously large in general, these two centrality measures, whose computation depends on the network size, might be unsuitable. From this viewpoint, degree centrality becomes an alternative approach for the following reasons: (i) the calculation of degree centrality requires almost no computational overhead, and (ii) the cache hit ratio with degree centrality is close to that with betweenness centrality and PageRank.

2) EFFECT OF SELECTION OF INFLUENTIAL USERS FROM UNKNOWN SOCIAL NETWORK
Next, we investigate whether estimating influential users from a partial structure of an OSN is effective. In particular, we focus on how many sampled nodes are required to achieve the cache hit ratio obtained when the OSN is completely given.
Figure 3 depicts the relationship between the sampling ratio, which is defined as the ratio of the number of sampled nodes to the number of nodes in the complete OSN, and the cache hit ratio. In these figures, results with different sampling strategies are plotted.
Figure 3 reveals that selecting influential users from limited knowledge of an OSN is effective in socially-aware ICNs. In particular, the cache hit ratio with the sampling ratio of 0.3 almost reaches that of 1.0, i.e., the cache hit ratio when the complete structure of the OSN is known. This achievement stems from the fact that the structural property of the OSN is (moderately) maintained in the sampled network; namely, the sampled network contains influential users corresponding to nodes with high degree. In particular, in the case of RW sampling, the degree distribution of the sampled network is biased towards high degree (see, for instance, [32]) due to RW's characteristics, i.e., the RW agent tends to visit nodes with high degree.
127684 VOLUME 11, 2023 Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.
We shall also comment on the effect of the sampling strategy, although our experiments focus on how effectively a sampling strategy works for estimating influential users for socially-aware ICNs rather than on which sampling strategy is the best. The results for Last.fm (Fig. 3(a)) indicate that DFS achieves the best performance among the four sampling strategies used in this experiment; on the other hand, those for Facebook (Fig. 3(b)) indicate that the difference among crawl-based sampling strategies, i.e., RW sampling, DFS, and BFS, is marginal. This means that we can freely choose a sampling strategy among these crawl-based sampling strategies while considering the sampling cost.

V. ANALYSIS
In this section, using mathematical analysis, we approximately derive the cache hit probability for a given degree distribution of OSN, and then analyze the relationship between the ratio of influential users and the cache hit probability. Section IV focuses on the selection of influential users using centrality measures; in contrast, this section focuses on the ratio of influential users on a simplified analytic model.

A. ANALYTIC MODEL
In this paper, we consider content delivery between OSN users with a social relationship on a cache network comprising multiple ICN routers.
We denote the number of users and the degree distribution of OSN representing social relationship among users as N and P(k), respectively.
Each user is attached to a randomly chosen ICN router and repeatedly issues requests to retrieve contents produced by its adjacent users in the OSN. A content request issued by a user is forwarded through ICN routers, and the content discovered in the network is returned to the requesting user.
We assume that the caching strategy at an ICN router is SACS [8], so an ICN router caches only contents produced by influential users. In this paper, we select influential users based on a centrality measure and on the ratio of influential users in the OSN. Unlike Section IV, we focus on degree centrality as the centrality measure. Hence, we regard the top pN users with the highest degree in the OSN as influential users, where p (0 ≤ p ≤ 1) is the ratio of influential users. It is worth noting that SACS includes the conventional caching strategy LCE (Leave Copy Everywhere) [9] as a special case: SACS with p = 1 is equivalent to LCE, i.e., ICN routers cache all contents uniformly without considering influential users.
The cache size at an ICN router is denoted as B [content]. The cache-replacement policy is assumed to be LRU.
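A minimal sketch of the router behavior assumed by this model is given below: an LRU cache of size B that admits only contents produced by influential users, with the top pN users by degree taken as influential. The class and function names are hypothetical, and the sketch assumes that each user produces a single content identified by the producer's id; it is an illustration of the model, not the implementation of SACS in [8].

```python
from collections import OrderedDict


class SacsLruCache:
    """LRU cache that admits only content produced by influential users."""

    def __init__(self, capacity, influencers):
        self.capacity = capacity             # B: cache size in contents
        self.influencers = set(influencers)  # producers whose content may be cached
        self._store = OrderedDict()          # producer id -> content, oldest first

    def lookup(self, producer):
        """Return the cached content or None; a hit refreshes its recency."""
        if producer not in self._store:
            return None
        self._store.move_to_end(producer)
        return self._store[producer]

    def offer(self, producer, content):
        """Cache content passing through the router if SACS admits it."""
        if self.capacity <= 0 or producer not in self.influencers:
            return False
        self._store[producer] = content
        self._store.move_to_end(producer)
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evict the least recently used entry
        return True


def top_degree_users(degree, p):
    """Top p*N users by degree (ties broken by user id)."""
    count = round(p * len(degree))
    ranked = sorted(degree, key=lambda u: (-degree[u], u))
    return set(ranked[:count])


# Illustrative usage with made-up degrees.
degree = {"a": 5, "b": 3, "c": 1, "d": 1}
influencers = top_degree_users(degree, p=0.5)
cache = SacsLruCache(capacity=1, influencers=influencers)
admitted_c = cache.offer("c", "content-c")  # rejected: "c" is not influential
admitted_a = cache.offer("a", "content-a")  # admitted
```

With p = 1 every user is treated as influential, so the same cache behaves like LCE with LRU replacement.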

B. DERIVATION OF CACHE HIT PROBABILITY
In this paper, we derive the expected value of the cache hit probability, i.e., the probability that, when a user requests a content, the content is returned from any ICN router along the path. We denote the expected value of the cache hit probability as H.
Because a user issues requests to all of its adjacent users, including both influential and non-influential users, the cache hit probability H consists of two components: the cache hit probability $H^{+}$ for contents produced by influential users and the cache hit probability $H^{-}$ for contents produced by non-influential users. Letting q be the fraction of influential neighbors, i.e., the fraction of a user's adjacent users that are influential, the cache hit probability H is given by
$$H = q H^{+} + (1 - q) H^{-}. \tag{1}$$
Recall that we assume SACS as the caching strategy at an ICN router. Hence, ICN routers cache only contents produced by influential users and never cache contents produced by non-influential users. Therefore, we have $H^{-} = 0$, and we can rewrite Eq. (1) as
$$H = q H^{+}. \tag{2}$$
In the following, we first derive the fraction of influential neighbors q, which corresponds to the probability that an adjacent user is an influential user. An adjacent user belongs to the influential users if its degree is larger than or equal to the minimum degree of influential users. Hence, by denoting the degree distribution of an adjacent user and the minimum degree of influential users as Q(k) and $k^{+}_{\min}$, respectively, the fraction of influential neighbors q is given by
$$q = \sum_{k = k^{+}_{\min}}^{\infty} Q(k). \tag{3}$$
In the above equation, the degree distribution Q(k) is known as an excess degree distribution, and Newman [25] formulates Q(k) as
$$Q(k) = \frac{k P(k)}{\langle k \rangle}. \tag{4}$$
Here, ⟨k⟩ is the average degree of the OSN, i.e., $\langle k \rangle = \sum_{k=1}^{\infty} k P(k)$. To derive the minimum degree of influential users, we use the cumulative distribution function F(k) of the degree distribution P(k). From the inverse function of F(k), we have
$$k^{+}_{\min} = F^{-1}(1 - p). \tag{5}$$
Next, we derive the cache hit probability $H^{+}$ for contents produced by influential users. The cache hit probability $H^{+}$ is the probability that, when a user issues a content request, a cache hit occurs at any ICN router along the path. By denoting the cache hit ratio at a single ICN router as h, $H^{+}$ is given by
$$H^{+} = 1 - (1 - h)^{\langle \ell \rangle}, \tag{6}$$
where ⟨ℓ⟩ is the average path length of the cache network comprising ICN routers.
In the case of SACS, only pN types of contents are candidates for caching within the cache network.
Hence, using the cache size B at an ICN router, the cache hit ratio h at a single ICN router is approximately given by
$$h \simeq \min\left(\frac{B}{pN},\, 1\right). \tag{7}$$
In the above equation, we implicitly assume that the cache size is sufficiently small compared to the number of users in the OSN, i.e., $B \ll N$.
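The derivation above can be traced numerically with a short sketch. The function names, the dictionary representation of P(k), and the way the minimum degree of influential users is discretized (the smallest degree whose upper tail does not exceed p) are assumptions made for illustration; Q(k) is taken as kP(k)/⟨k⟩, and the path-level hit probability as 1 − (1 − h)^⟨ℓ⟩.

```python
def neighbor_degree_pmf(pmf):
    """Degree distribution of an adjacent user, assumed Q(k) = k P(k) / <k>."""
    mean_degree = sum(k * pk for k, pk in pmf.items())
    return {k: k * pk / mean_degree for k, pk in pmf.items()}


def min_influencer_degree(pmf, p):
    """Smallest degree k with P(K >= k) <= p (a discretized stand-in for k+min)."""
    k_min = max(pmf) + 1  # no degree qualifies if even the largest one is too common
    tail = 0.0
    for k in sorted(pmf, reverse=True):
        tail += pmf[k]
        if tail > p + 1e-12:  # tolerance guards against floating-point drift
            break
        k_min = k
    return k_min


def cache_hit_probability(pmf, p, n_users, cache_size, avg_path_len):
    """Expected cache hit probability H = q * H+ under SACS (H- = 0)."""
    if p <= 0:
        return 0.0  # no influential users, so nothing is ever cached
    k_min = min_influencer_degree(pmf, p)
    q = sum(qk for k, qk in neighbor_degree_pmf(pmf).items() if k >= k_min)
    h = min(cache_size / (p * n_users), 1.0)   # per-router hit ratio
    h_plus = 1.0 - (1.0 - h) ** avg_path_len   # hit somewhere along the path
    return q * h_plus


# Toy degree distribution (illustrative values, not taken from the paper).
toy_pmf = {1: 0.5, 2: 0.3, 4: 0.2}
H_small_p = cache_hit_probability(toy_pmf, p=0.2, n_users=1000, cache_size=100, avg_path_len=3)
H_lce = cache_hit_probability(toy_pmf, p=1.0, n_users=1000, cache_size=100, avg_path_len=3)
```

In this toy setting, restricting caching to the top 20% of users gives a higher H than caching everything (p = 1), because the per-router hit ratio h rises faster than the fraction of influential neighbors q falls.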

C. NUMERICAL EXAMPLE
Through several numerical examples, we investigate the relationship between the ratio of influential users and the cache hit probability. The degree distribution of the OSN is given by one of three probability mass functions: the Poisson, exponential, and power-law distributions [34].
• Poisson distribution
• Exponential distribution
• Power-law distribution
In Eqs. (9) and (10), µ and α are parameters, and ζ(α) is Riemann's zeta function. We determined the parameters µ and α such that the mean of the probability mass function is equal to the average degree ⟨k⟩. In our numerical examples, the number of users N in the OSN was 10,000 and the average degree ⟨k⟩ was 4.
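For reference, the three probability mass functions can be written in their standard textbook forms. The parameterization below is an assumption (in particular, the support k ≥ 0 for the Poisson and exponential cases and the rate form of µ); the order mirrors the list above, but the exact expressions used in Eqs. (8)-(10) may differ.

```latex
% Assumed standard forms; not necessarily the exact expressions of Eqs. (8)-(10).
\begin{align*}
  \text{Poisson:}     \quad & P(k) = \frac{\langle k \rangle^{k} e^{-\langle k \rangle}}{k!}, & k &= 0, 1, 2, \dots \\
  \text{Exponential:} \quad & P(k) = \left(1 - e^{-\mu}\right) e^{-\mu k},                    & k &= 0, 1, 2, \dots \\
  \text{Power-law:}   \quad & P(k) = \frac{k^{-\alpha}}{\zeta(\alpha)},                       & k &= 1, 2, 3, \dots
\end{align*}
```

Under these assumed forms, the mean is 1/(e^µ − 1) for the exponential case and ζ(α − 1)/ζ(α) for the power-law case (finite only for α > 2), so matching ⟨k⟩ = 4 would give µ = ln(5/4), while α has to be found numerically.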
We show the degree distributions for the three probability mass functions in Fig. 4. Note that both the x-axis and the y-axis of this figure are logarithmic. This figure clarifies the differences among the three probability mass functions. In particular, "power-law" exhibits a different tendency from "poisson" and "exponential"; namely, on the log-log scale, the probability P(k) that nodes with degree k exist decreases linearly as the degree k increases. This is because, as Eq. (10) implies, the degree distribution is proportional to a power of the degree. Unless explicitly stated, we used the following parameter settings: cache size at an ICN router B = 100 [content] and average path length of the cache network ⟨ℓ⟩ = 3.
Before discussing the relationship between the ratio of influential users and the cache hit probability, we confirm how the fraction of influential neighbors (Eq. (3)) varies with the degree distribution. Figure 5 shows the relationship between the ratio of influential users p and the fraction of influential neighbors q. In this figure, the results for the three degree distributions are plotted as "poisson", "exponential", and "power-law". From this figure, we find that when the degree distribution follows a power law, the fraction of influential neighbors increases rapidly as the ratio of influential users increases, because of the existence of a few users with extremely high degree. Let us explain the rationale behind this tendency more specifically, focusing on the region where the ratio of influential users p is small. Recall that the fraction of influential neighbors q is determined by two factors: $k^{+}_{\min}$ and the partial sum of Q(k), i.e., $\sum_{k = k^{+}_{\min}}^{\infty} Q(k)$, as shown in Eq. (3). In the region of small p, the minimum degree of influential users $k^{+}_{\min}$ becomes relatively large; this indicates that q is determined by the partial sum of Q(k) in the high-degree region. Because Q(k) is derived from the degree distribution P(k) and P(k) of "power-law" is biased towards high degree, the partial sum of Q(k) for "power-law" is large compared to those for "poisson" and "exponential". As a consequence, only "power-law" exhibits a rapid increase in q even when the ratio of influential users p is small.
Finally, we present the relationship between the ratio of influential users and the cache hit probability in Fig. 6. Similar to Fig. 5, the results for the three degree distributions are plotted in Fig. 6. From this figure, we can easily observe that the cache hit probability varies significantly according to the degree distribution of the OSN. In the case of "poisson" and "exponential", there is almost no difference between the peak cache hit probability and the cache hit probability at p = 1. This is mainly because, in the regime of small p, the fraction of influential neighbors q does not increase sufficiently; consequently, as Eq. (2) implies, the cache hit probability H remains low with respect to p. These results imply that the benefit of selecting influential users for content caching is limited. In contrast, in the case of "power-law", the cache hit probability is convex upward (i.e., it has a peak) around p = 0.02 with respect to the ratio of influential users p. This is mainly due to the following two reasons: (i) even if p is small, the fraction of influential neighbors becomes large, as shown in Fig. 5, and (ii) restricting the number of influential users improves the cache hit ratio at an ICN router (Eq. (7)). From these observations, we conclude that by appropriately selecting the ratio of influential users in the OSN, we can expect a significant improvement in the cache hit probability compared to the conventional caching strategy.
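Reason (ii) can be made concrete with the parameter settings of the numerical examples (N = 10,000, B = 100, ⟨ℓ⟩ = 3). The worked evaluation below is an illustration added for clarity; it assumes the per-router hit ratio h = min(B/(pN), 1) and the path-level form 1 − (1 − h)^⟨ℓ⟩, and the three values of p are chosen only as examples.

```latex
% Worked evaluation assuming H^{+} = 1 - (1 - h)^{\langle \ell \rangle} and h = \min(B / (pN), 1).
\begin{align*}
  p = 0.01: \quad & h = \min\!\left(\tfrac{100}{100}, 1\right) = 1,      & H^{+} &= 1 - 0^{3} = 1, \\
  p = 0.02: \quad & h = \min\!\left(\tfrac{100}{200}, 1\right) = 0.5,    & H^{+} &= 1 - 0.5^{3} = 0.875, \\
  p = 1:    \quad & h = \min\!\left(\tfrac{100}{10000}, 1\right) = 0.01, & H^{+} &= 1 - 0.99^{3} \approx 0.0297.
\end{align*}
```

Since H is the product of q and this path-level probability, a small p keeps the second factor close to 1, and the overall cache hit probability is then limited mainly by how quickly q grows with p.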

VI. CONCLUSION
In this paper, we presented a conceptual architecture realizing ICN-based content distribution for SNS and extensively analyzed its characteristics using both simulation and mathematical analysis. Specifically, we focused on influential users, one of the typical features of OSNs, and investigated the impact of the selection of influential users and their ratio on content caching in ICN. Below, we remark on three important results obtained from the simulation and the mathematical analysis. First, in selecting influential users for content caching, PageRank and betweenness centrality achieved the best performance among the eight centrality measures. However, from the perspective of computational complexity, these two centrality measures might be unsuitable for large-scale OSNs. Hence, we emphasize that it is effective to use degree centrality, which achieved near-best performance. Second, in determining influential users, we need not obtain the complete structure of the OSN; it is enough to obtain a partial network covering at most 30% of the OSN by a sampling technique. Third, the importance of determining the ratio of influential users differs according to the degree distribution of the network; the importance is particularly noticeable when the degree distribution follows a power law. Because the degree distribution of an OSN typically follows a power law, we have to carefully select influential users on the OSN to gain the benefit of content caching.
As future work, we plan to perform an evaluation that takes into account the geographical locality of SNS users and to design a caching strategy for socially-aware ICN using our approximate analysis.

FIGURE 1. An example of content retrieval on ICN-based SNS application.

FIGURE 2. Average cache hit ratio with eight types of centrality measures.

FIGURE 3. Relationship between sampling ratio and cache hit ratio.

FIGURE 4. Degree distributions used in our numerical examples.

FIGURE 5. Relationship between ratio of influential users and ratio of influential neighbors.

FIGURE 6. Relationship between ratio of influential users and cache hit probability.