Caching and Multicasting for Fog Radio Access Networks

The cache-enabled fog radio access network (F-RAN) is a promising technology for alleviating traffic congestion and boosting the content delivery success rate. The efficiency of disseminating the cached contents in F-RAN can be further improved by enabling multicast service at the fog access points (F-APs). This paper proposes a joint random caching and multicasting optimization scheme for wireless backhauled F-RAN. Using tools from stochastic geometry, an expression of the successful transmission probability (STP) is derived by carefully analyzing the different types of serving F-APs and interferers. Then, a closed-form expression of the asymptotic STP in the high signal-to-noise ratio (SNR) region is derived to reduce the complexity. The joint caching and multicasting optimization problem is formulated to maximize the STP; this problem is complex and non-convex in general. A novel projected cuckoo search algorithm (PCSA) is proposed to obtain the optimal content placement that maximizes the STP. The numerical results show that PCSA outperforms the original cuckoo search algorithm (CSA) and that the proposed asymptotic joint caching and multicasting scheme achieves STPs up to 15% higher than those of the benchmark caching schemes.

strategy that maximizes the hit probability in self-backhauled millimeter-wave F-RAN. The optimal cache design presented in [10] maximises the fractional offloaded traffic (FOT) and successful transmission probability (STP) for joint transmission and parallel transmission strategies. In [11], an edge caching scheme was proposed to improve the cache hit rate by predicting contents popularity and learning the endusers preferences. In [12], the authors proposed a proactive cache placement strategy to optimize the STP in wireless backhauled F-RAN, where the optimal caching design was obtained using projection gradient method. In [13], a dynamic distributed edge caching strategy was proposed to reduce the fronthaul traffic load and request service delay in ultra-dense F-RAN. In [14], the authors used actor-critic reinforcement learning to optimize the service delay in a fog-enabled IoT network by jointly considering the computing, caching, and radio resources. In [15], the association of end-user, prediction of content popularity, and placement of content were optimized using machine learning based algorithms. In [16], in order to mitigate the inter-F-AP interference, the F-APs were self-organized into multiple clusters and the joint radio and cache resource management optimization problem was formulated as Stackelberg game. A graph based cooperative caching scheme to maximize the offload traffic was presented in [17]. In [18], a hybrid caching scheme was proposed to balance the delay and energy efficiency. In [19], the authors proposed a delay-aware cache update scheme that takes into account the mobility of the end-users in F-RAN, wherein dueling deep-Q-network framework was utilized to minimize the average transmission delay. The alternating direction method of multipliers algorithm was used to optimize the cache placement of the capacity-aware edge caching strategy proposed in [20]. 
In [21], the authors formulated the edge caching problem to optimize the FOT, STP, and delay, where an improved fruit fly optimization algorithm was used to find the optimal content cache placement. In [22], the authors proposed a joint power allocation and proactive cooperative caching scheme that minimizes the latency in F-RAN. In [23], [24], the authors proposed multi-objective caching schemes for wireless backhauled F-RAN. Nevertheless, none of the aforementioned contributions considers the multicast transmission scheme.
The joint caching and multicasting in F-RAN was investigated in [25]- [27]. In [25], the authors tackled the optimization of non-orthogonal multiple access (NOMA) multicast transmission in F-RAN, wherein the problem was formulated to reduce the delivery latency of the downlinks from the content provider to end-users via the fog nodes. However, the problem of optimizing the cache placement was not addressed. In [26], the authors proposed a joint caching and node association strategy to maximize the energy efficiency in a multicast-enabled heterogeneous F-RAN. However, the wireless backhauling of the fog nodes was not taken into consideration. In the joint caching and multicasting scheme in [27], the caching optimization problem was formulated as a mixed-integer non-linear programming (MINLP) problem aiming at minimizing the transmission time of cloud processor. Then, the problem was relaxed and transformed into a tractable one. However, the fronthaul links from the F-APs to the end-users and end-users preferences were not taken into account. The optimization of joint caching and multicasting in traditional cellular network was addressed in [28]- [31], wherein the problem of caching and multicasting design was formulated to improve the STP. It is noted that [28]- [31] did not take into account the wireless backhauling of the base stations.
Motivated by the previous discussions, this paper proposes a joint random caching and multicasting scheme to maximize the STP in F-RAN. The proposed scheme is based on content combinations and takes into consideration the stochastic nature of the channel, the multicast transmission, and the wireless backhauling of the F-APs. The main contributions of this paper are summarized as follows:
• First, expressions of the association probabilities with the direct and transit F-APs with respect to the requested content are derived. Then, an expression of the STP in the general SNR region is derived using stochastic geometry tools and by carefully analyzing the different types of serving F-APs and interferers.
• A closed-form expression of the asymptotic STP in the high SNR region is derived to reduce the complexity.
• The optimization problem is formulated to obtain the optimal cache placement that maximizes the STP.
• A novel projected cuckoo search algorithm (PCSA) is developed to obtain the optimal caching placement.
• Finally, the numerical simulations show that the proposed PCSA outperforms the original cuckoo search algorithm (CSA) and converges faster. The results also show that the proposed asymptotic caching scheme outperforms the traditional caching schemes.
The rest of this paper is organized as follows. In Section II, we present the system model, including the network, caching, multicasting, and association models. The performance analysis and the problem optimization are presented in Section III. The numerical results and discussions are provided in Section IV. The concluding remarks are delivered in Section V. A list of the key notations used throughout the paper is provided in Table 1.

II. SYSTEM MODEL
A. NETWORK MODEL
Consider a downlink cache-enabled F-RAN consisting of a tier of F-APs connected through wireless backhaul links with a tier of cloud access points (C-APs). The locations of the limited cache capacity F-APs in $\mathbb{R}^2$ are modeled as an independent homogeneous Poisson point process (PPP) $\Phi_F$ with density $\lambda_F$, whereas the C-APs are modeled as another independent homogeneous PPP $\Phi_C$ with density $\lambda_C$, such that $\lambda_F > \lambda_C$. The locations of the end-users are modeled as an independent homogeneous PPP $\Phi_U$ with density $\lambda_U$. Without loss of generality, we focus on the typical end-user $u_0$ located at the origin [32]. It is assumed that all the F-APs and C-APs are equipped with a single antenna with transmission power $P$. The total transmission bandwidths of each F-AP and each C-AP are $W_F$ and $W_C$, respectively. Each end-user has one receive antenna. For the propagation model, both large-scale and small-scale fading are considered. For large-scale fading, a signal that propagates over a distance $D$ is attenuated by a factor of $D^{-\alpha}$, where $\alpha$ is the path loss exponent. Rayleigh fading is assumed to model the small-scale fading, wherein each small-scale fading coefficient $h$ satisfies $|h|^2 \overset{d}{\sim} \exp(1)$.

B. CACHING AND MULTICASTING
Let $\mathcal{M} = \{1, 2, \ldots, M\}$ stand for the set of $M$ contents in the network. For the sake of tractability, all the contents are assumed to be of the same size and to have an identical, a priori known popularity distribution among all the end-users. Let $\mathbf{a} = (a_m)_{m \in \mathcal{M}}$ represent the content popularity distribution, where $a_m \in (0, 1)$, with $\sum_{m=1}^{M} a_m = 1$, denotes the probability that an end-user randomly requests content $m$. It is assumed that $a_m$ follows the Zipf distribution, i.e., $a_m = m^{-\gamma} / \sum_{n \in \mathcal{M}} n^{-\gamma}$, where $\gamma$ is the skew parameter. Without loss of generality, the contents are indexed in descending order of popularity, from the most popular content to the least popular one (i.e., $a_1 \geq a_2 \geq \cdots \geq a_M$).
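The Zipf popularity model above can be sketched in a few lines of Python. This is only an illustration: the function name `zipf_popularity` and the values of `M` and `gamma` are assumptions chosen for the example, not parameters taken from the paper.

```python
def zipf_popularity(M, gamma):
    """Return [a_1, ..., a_M] with a_m proportional to m**(-gamma)."""
    weights = [m ** (-gamma) for m in range(1, M + 1)]
    total = sum(weights)
    return [w / total for w in weights]


# Illustrative values only (not taken from the paper).
a = zipf_popularity(M=5, gamma=0.8)
```

The returned list sums to one and is non-increasing in the content index, which matches the ranking convention $a_1 \geq a_2 \geq \cdots \geq a_M$; with `gamma = 0` the distribution degenerates to a uniform one.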
It is assumed that all the contents are cached at each C-AP. In the tier of limited cache capacity F-APs, on the other hand, each F-AP is provided with a cache of size $K \leq M$. Accordingly, each F-AP caches a combination of $K$ different contents out of the $M$ contents. Therefore, the total number of combinations in the F-AP tier is $J = \binom{M}{K}$. Let $\mathcal{J} = \{1, 2, \cdots, J\}$ stand for the set of the $J$ combinations. Combination $j \in \mathcal{J}$ can be represented as a vector $\boldsymbol{\chi}_j = (\chi_{j,m})_{m \in \mathcal{M}}$ of dimension $M$, where $\chi_{j,m} = 1$ if combination $j$ contains content $m$, and $\chi_{j,m} = 0$ otherwise. Hence, the set of the $K$ contents in combination $j$ can be represented as $\mathcal{M}_j = \{m : \chi_{j,m} = 1\}$.
The considered random content placement strategy is based on the combination of contents as in [28]-[31]. Let $\mathbf{p} = (p_j)_{j \in \mathcal{J}}$ denote the caching distribution of combinations, where $p_j$ is the probability of caching combination $j$ at each F-AP, such that $p_j$ satisfies
$$0 \leq p_j \leq 1, \quad \forall j \in \mathcal{J}, \tag{1}$$
$$\sum_{j \in \mathcal{J}} p_j = 1. \tag{2}$$
Based on the caching distribution of combinations $\mathbf{p}$, the caching distribution of contents can be defined as $\mathbf{T} = (T_m)_{m \in \mathcal{M}}$, where $T_m$ denotes the probability of caching content $m$ at each F-AP and is given by
$$T_m = \sum_{j \in \mathcal{J}_m} p_j, \quad \forall m \in \mathcal{M}, \tag{3}$$
where $\mathcal{J}_m = \{j \in \mathcal{J} : \chi_{j,m} = 1\}$ is the set of the $J_m = \binom{M-1}{K-1}$ combinations that contain content $m$.
For efficient content dissemination over the fronthaul links from the F-APs to the end-users, this paper adopts multicast transmission service at the F-APs. Multicast is a content-centric transmission scheme. Consider an F-AP with $L_0$ associated end-users requesting $K_0$ different contents cached by it (i.e., $L_0 \geq K_0$). Under multicast, the F-AP serves all the users requesting the same content using a single transmission, and each requested content is transmitted once using frequency division multiple access (FDMA) over $1/K_0$ of the total bandwidth [28]-[31]. This utilizes the total bandwidth more efficiently than the connection-based transmission scheme known as unicast, in which the F-AP transmits a single content to each one of the $L_0$ requesters over $1/L_0$ of the total bandwidth. Note that, by adopting multicast transmission service, the transmission rate of the F-AP is improved by a factor of $L_0/K_0$ compared with unicast transmission service. For the sake of tractability, broadcast transmission service is adopted by the C-APs, such that the transmission bandwidth of a C-AP is evenly shared by the content library, i.e., each content is transmitted over $1/M$ of the total bandwidth.
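To make the relation between the combination distribution and the per-content caching probabilities concrete, the following Python sketch enumerates all combinations of $K$ out of $M$ contents and sums the probabilities of the combinations that contain each content. The function name `content_caching_distribution`, the lexicographic ordering of the combinations, and the toy values of `M` and `K` are assumptions made for this example.

```python
from itertools import combinations
from math import comb


def content_caching_distribution(M, K, p):
    """Map a combination distribution p to per-content probabilities T."""
    combos = list(combinations(range(1, M + 1), K))  # lexicographic order
    assert len(combos) == comb(M, K) == len(p)
    T = [0.0] * M
    for p_j, combo in zip(p, combos):
        for m in combo:
            T[m - 1] += p_j  # content m is cached whenever combo j is cached
    return combos, T


# Toy example (illustrative values): 4 contents, cache size 2.
M, K = 4, 2
J = comb(M, K)

# 'Uniform' placement: every combination is equally likely.
p_uniform = [1.0 / J] * J
combos, T_uniform = content_caching_distribution(M, K, p_uniform)

# 'Popular' placement: only the combination {1, 2} is ever cached.
p_popular = [1.0 if c == (1, 2) else 0.0 for c in combos]
_, T_popular = content_caching_distribution(M, K, p_popular)
```

In both cases the entries of `T` sum to `K`, because every F-AP always stores exactly $K$ contents.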

C. ASSOCIATION MODEL
This paper considers the content-centric association mechanism illustrated in Fig. 1, under which the serving F-AP is statistically determined by the caching distribution $\mathbf{p}$ and may not be the geographically nearest F-AP to the requester. By taking into account both the content-centric and the physical layer properties, the considered association mechanism is more flexible and has a higher probability of association than the traditional connection-based association mechanism, which only considers the physical layer properties and under which the requester is always associated with the geographically nearest F-AP. It is assumed that the C-APs can only be accessed by the F-APs, i.e., there is no direct communication between the C-APs and the end-users. Let $R$ denote the discovery range (i.e., the maximum communication range) of the typical end-user $u_0$. When $u_0$ randomly requests content $m$, the considered association mechanism operates as follows:
1) If content $m$ is cached by an F-AP within $R$, $u_0$ is associated with the nearest F-AP that caches content $m$, which serves it directly, e.g., end-users A, B, and C in Fig. 1.
Here, the serving F-AP is denoted as $F_{m,0}$ and called the 'direct F-AP'. Note that any F-AP that caches content $m$ can be a direct F-AP. Thus, the point process of the direct F-APs is $\Phi_m \subseteq \Phi_F$, which is a thinned PPP of density $T_m \lambda_F$. Then, the probability of serving $u_0$ by a direct F-AP within $R$ when content $m$ is requested can be obtained as in (6). Please refer to Appendix A for the proof of (6).
2) If no F-AP within $R$ caches content $m$, the typical end-user $u_0$ searches within $R$ for the nearest available F-AP $F_{a,0}$ to fetch content $m$ from the nearest C-AP $C_0$, e.g., end-user D in Fig. 1. Due to the 2-hop transmission, $F_{a,0}$ is called the 'transit F-AP'. Note that an available F-AP is an F-AP that caches at least one inactive content (i.e., a content not requested by any end-user associated with it). Let the random variable $Y_\mu \in \{0, 1\}$ indicate whether content $\mu$ cached at an F-AP is being requested by the end-users associated with it, where $Y_\mu = 1$ represents the event that content $\mu$ is requested by any associated end-user, and $Y_\mu = 0$ otherwise. The probability mass function (PMF) of $Y_\mu$ is influenced by the probability density function (PDF) of the Voronoi cell size of the F-AP caching content $\mu$. Then, using Proposition 1 of [33], the probability $b_\mu$ of content $\mu$ being inactive can be obtained as in (7), where the notation $\mathbb{P}[\cdot]$ stands for probability.
Accordingly, the probability of the transit F-APs with respect to content $m$ can be obtained as in (8). Please refer to Appendix B for the proof of (8). Let $\Phi_{a,m} \subseteq \Phi_F$ denote the point process of the available F-APs with respect to content $m$. Then, by noting that $\Phi_{a,m}$ is a thinned PPP whose density is $\lambda_F$ scaled by the probability in (8), the probability that $u_0$ is served by a transit F-AP within $R$ when it requests content $m$ can be obtained as in (9). Please refer to Appendix C for the proof of (9).
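As an illustration of how the thinned-PPP argument leads to an association probability, the sketch below uses the standard void probability of a homogeneous PPP: a process of density $T_m \lambda_F$ has at least one point inside a disc of radius $R$ with probability $1 - \exp(-\pi T_m \lambda_F R^2)$. It is assumed here, not confirmed by this text, that this is the form taken by the direct-association probability; the function names and the numerical values are purely illustrative.

```python
import math
import random


def direct_association_probability(T_m, lam_F, R):
    """P[at least one F-AP caching content m lies within distance R]."""
    return 1.0 - math.exp(-math.pi * T_m * lam_F * R ** 2)


def poisson_sample(mean, rng):
    """Draw a Poisson variate with Knuth's method (fine for small means)."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate(T_m, lam_F, R, trials=20000, seed=0):
    """Monte Carlo estimate: count the caching F-APs that fall in the disc."""
    rng = random.Random(seed)
    mean = math.pi * T_m * lam_F * R ** 2  # expected number of points
    hits = sum(poisson_sample(mean, rng) >= 1 for _ in range(trials))
    return hits / trials


# Illustrative values (assumptions): T_m = 0.3, 0.01 F-APs/m^2, R = 10 m.
analytic = direct_association_probability(0.3, 0.01, 10.0)
estimate = simulate(0.3, 0.01, 10.0)
```

The Monte Carlo estimate only needs the number of points in the disc, which is Poisson distributed with mean $\pi T_m \lambda_F R^2$, so no point coordinates have to be generated.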

III. PERFORMANCE ANALYSIS AND PROBLEM OPTIMIZATION
A. PERFORMANCE ANALYSIS
The STP, defined as the probability that the end-users successfully receive their desired contents, is the performance metric considered in this paper. When the typical user $u_0$ randomly requests content $m$, the content can be successfully received and decoded if the channel capacity of $u_0$, which depends on the signal-to-interference-plus-noise ratio (SINR) and the transmission bandwidth, exceeds the threshold transmission rate $\tau$. However, under the adopted caching and multicasting strategy, both the SINR of $u_0$ and the bandwidth dedicated to transmitting content $m$ are influenced by the combinations caching distribution $\mathbf{p}$. Thus, to derive the expression of the STP, it is necessary to carefully analyze the impact of the caching distribution on both quantities for each type of serving F-AP.
When a direct F-AP $F_{m,0}$ within $R$ serves the request of $u_0$ for content $m$, the SINR of $u_0$ can be expressed as in (10), as shown at the bottom of the next page, where $D_{0,0}$ denotes the distance between $F_{m,0}$ and $u_0$, $h_{0,0}$ denotes the small-scale channel coefficient between $F_{m,0}$ and $u_0$, $N_0$ is the noise power, $\ell \in \Phi_m \setminus F_{m,0}$ indexes all the F-APs caching content $m$ except the serving F-AP $F_{m,0}$, $\ell \in \Phi_{-m}$ indexes all the F-APs that do not cache content $m$, $\ell \in \Phi_C$ indexes all the C-APs, $D_{\ell,0}$ stands for the distance between access point $\ell$ and $u_0$, and $h_{\ell,0}$ denotes the channel coefficient between access point $\ell$ and $u_0$.
Let the discrete random variable $K_0 \in \{1, \cdots, K\}$ represent the content load of the serving F-AP (i.e., the number of distinct contents requested by the end-users associated with $F_{m,0}$). Due to the adopted multicasting scheme, $F_{m,0}$ disseminates each one of the $K_0$ contents over a bandwidth of $W_F / K_0$. Thus, the STP of content $m$ when $u_0$ is being served by $F_{m,0}$ is given as in (11), where $C_{m,0}$ represents the channel capacity of the link between $F_{m,0}$ and $u_0$. Generally, the content load $K_0$ is correlated with the SINR in a complex manner: a larger association region of an F-AP results in a higher content load, owing to the higher number of associated end-users, and in a lower SINR, owing to the larger distance between the F-AP and the end-user [34].
As in [28]- [31], the correlation and dependence are ignored for analytical tractability. Hence, (11) can be rewritten as in (12), as shown at the bottom of the next page, where P [K 0 = κ] represents the PMF of K 0 , which is given in (13), as shown at the bottom of the next page, and proved in Appendix D, and q κ,m,0 (p) is the STP of content m when it is served by F m,0 with a content load of κ.
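The effect of the content load on the success condition can be illustrated with a short calculation. Assuming the channel capacity takes the Shannon form $(W_F/\kappa)\log_2(1 + \mathrm{SINR})$ (an assumption, since (11) is not reproduced in this text), requiring the capacity to reach $\tau$ is equivalent to requiring the SINR to reach $2^{\kappa\tau/W_F} - 1$. The function names and numerical values below are hypothetical.

```python
import math


def sinr_threshold(tau, W_F, kappa):
    """Smallest SINR for which (W_F / kappa) * log2(1 + SINR) >= tau."""
    return 2.0 ** (kappa * tau / W_F) - 1.0


def capacity(W_F, kappa, sinr):
    """Shannon capacity of one multicast sub-band of width W_F / kappa."""
    return (W_F / kappa) * math.log2(1.0 + sinr)


# Illustrative values (assumptions): tau = 0.1 Mbps, W_F = 1 MHz.
tau, W_F = 1e5, 1e6
thresholds = [sinr_threshold(tau, W_F, kappa) for kappa in (1, 2, 3)]
```

The threshold grows with the content load $\kappa$, which is why a heavily loaded F-AP is less likely to deliver a given content successfully even when the SINR distribution is unchanged.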
Next, stochastic geometry tools are used to obtain q κ,m,0 (p) as in the following theorem.
Theorem 1: The STP of content $m$ when $u_0$ is served by the direct F-AP $F_{m,0}$ with a content load of $\kappa$ can be expressed as in (14), as shown at the bottom of the next page, with the auxiliary terms defined in (15).
Proof: See Appendix E.
A 2-hop transmission is required to successfully deliver content $m$ to $u_0$ when it is served by a transit F-AP $F_{a,0}$ within $R$. Thus, the STP of content $m$ can be expressed as in (16), where $C_{a,0}$ is the channel capacity of the link between $F_{a,0}$ and $u_0$, $C_{C,a}$ is the channel capacity of the link between $C_0$ and $F_{a,0}$, $\mathrm{SINR}_{a,0}$ represents the SINR of $u_0$, and $\mathrm{SINR}_{C,a}$ is the SINR of $F_{a,0}$. By carefully analyzing the different types of interfering access points, $\mathrm{SINR}_{a,0}$ and $\mathrm{SINR}_{C,a}$ can be obtained as in (17) and (18), respectively, as shown at the bottom of the next page, where $D_{a,0}$ and $h_{0,0}$ are the distance and the channel coefficient between $F_{a,0}$ and $u_0$, respectively, $\ell \in \Phi_{a,m} \setminus F_{a,0}$ indexes all the available F-APs with respect to content $m$ except $F_{a,0}$, $\ell \in \Phi_{-a,m}$ indexes all the unavailable F-APs, i.e., $\Phi_{-a,m} \triangleq \Phi_F \setminus \Phi_{a,m}$, $D_{C_0,a}$ is the distance between $C_0$ and $F_{a,0}$, $h_{C_0,a}$ is the channel coefficient between $C_0$ and $F_{a,0}$, $\ell \in \Phi_F \setminus F_{a,0}$ indexes all the F-APs except $F_{a,0}$, $D_{\ell,a}$ is the distance between access point $\ell$ and $F_{a,0}$, and $h_{\ell,a}$ is the channel coefficient between access point $\ell$ and $F_{a,0}$.
Taking into consideration the random nature of the content load of the serving transit F-AP $F_{a,0}$, (16) can be rewritten as in (19), as shown at the bottom of the page, where $q_{\kappa,a,0}(\mathbf{p})$ is the STP of content $m$ over the link from $F_{a,0}$ to $u_0$ when the content load of $F_{a,0}$ is $\kappa$, $q_{C,a}(\mathbf{p})$ is the STP of content $m$ over the link from $C_0$ to $F_{a,0}$, and $q_{\kappa,C,0}(\mathbf{p})$ represents the STP of content $m$ over the link from $C_0$ to $u_0$ via $F_{a,0}$, conditioned on the content load of $F_{a,0}$. Utilizing tools from stochastic geometry, $q_{\kappa,C,0}(\mathbf{p})$ can be obtained as in Theorem 2.
Theorem 2: The STP of content $m$ when $u_0$ is served by the transit F-AP $F_{a,0}$ with a content load of $\kappa$ can be obtained as in (20), as shown at the bottom of the next page, with the auxiliary terms defined in (21) and (22).
Proof: See Appendix F.
Bearing in mind that there are $M$ distinct contents in the system, the STP of $u_0$, denoted as $q(\mathbf{p})$, can be expressed as in the following theorem.
Theorem 3: The STP of the typical end-user u 0 can be obtained as in (23), as shown at the bottom of the next page.
Proof: The proof is straightforward using total probability theorem and noting that the system contains M different contents and each one of them can be delivered to u 0 by either a transit or direct F-AP.
To reduce the complexity of the STP, the following corollary considers the asymptotic scenario of $P/N_0 \to \infty$, which represents an interference-limited network.
Proof: As $P/N_0 \to \infty$, the terms containing it in the exponential functions in $q_{\kappa,m,0}$, $q_{\kappa,a,0}$, and $q_{C,a}$ approach zero. Thus, $q_{\kappa,m,0}$, $q_{\kappa,a,0}$, and $q_{C,a}$ can be written in the form $\int_0^a b\,d\,\exp(-c d^2)\,\mathrm{d}d = \frac{b}{2c}\left(1 - \exp(-c a^2)\right)$, where $a$, $b$, and $c$ are constants. This completes the proof.
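The closed-form integral used in the proof can be checked numerically. The sketch below compares the closed form with a midpoint-rule approximation of $\int_0^a b\,x\,\exp(-c x^2)\,\mathrm{d}x$; the constants are arbitrary test values, not quantities from the paper.

```python
import math


def closed_form(a, b, c):
    """b / (2c) * (1 - exp(-c a^2))."""
    return b / (2.0 * c) * (1.0 - math.exp(-c * a * a))


def midpoint_integral(a, b, c, n=100000):
    """Midpoint-rule approximation of the integral of b*x*exp(-c*x^2) on [0, a]."""
    h = a / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        total += b * x * math.exp(-c * x * x)
    return h * total


# Arbitrary test constants (assumptions for illustration).
exact = closed_form(2.0, 3.0, 0.7)
approx = midpoint_integral(2.0, 3.0, 0.7)
```

The identity follows from the substitution $s = x^2$, which turns the integrand into $\frac{b}{2}\exp(-cs)$ over $[0, a^2]$.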

B. PROBLEM OPTIMIZATION
Since the STP is fundamentally influenced by the caching and multicasting model via the combinations caching distribution $\mathbf{p}$, this paper aims at maximizing the STP by optimizing $\mathbf{p}$. Accordingly, the caching optimization problem is formulated as Problem 1 (Caching Optimization), in which $q(\mathbf{p})$ is maximized over $\mathbf{p}$ subject to (1) and (2). Solving Problem 1 involves very high computational complexity as a result of the very complex form of $q(\mathbf{p})$. However, the complexity can be reduced by considering $q_\infty(\mathbf{p})$ as the objective function, as in Problem 2.

Problem 2 (Asymptotic Optimization when
The convexity of q (p) and q ∞ (p) cannot be ensured due to their complex form. Thus, Problems 1 and 2 are non-convex in general.
This paper proposes PCSA to obtain the optimal solution of the caching optimization problem. PCSA is a novel modified version of the original CSA. CSA is a meta-heuristic algorithm that shows high efficiency in solving non-convex optimization problems [35]. Inspired by nature, the original CSA was proposed by Yang and Deb in 2009 to mimic the obligate brood parasitism of cuckoo birds by generating new nests by Lévy flights and random walks in the first and second generation of each iteration, respectively, where a nest represents a solution [36]. CSA has attracted considerable attention lately because it outperforms other nature-inspired optimization methods in terms of simplicity, solution quality, success rate, number of evaluations of the objective function, and execution time [37], [38]. However, the fast convergence of CSA cannot be assured due to the random walk-based search for the best nest. Moreover, the original CSA utilizes the penalty method to treat constrained problems, and it is not easy to guarantee the feasibility of the optimal solution using the penalty method when there are equality constraints. Also, the selection of the user-defined penalty factors significantly affects the quality of the optimization. Furthermore, when the optimization problem has multiple constraints, it is difficult to determine the penalty factor for each constraint, as each one might have a different scale. To overcome these limitations, we develop PCSA, outlined in Algorithm 1, where $t$ is the iteration index, $N_C$ is the maximum number of iterations, $N_P$ is the population size, and $P_a \in (0, 1)$ is the fraction of nests to be abandoned.
In PCSA, the feasibility of the randomly generated nests is assured by the projection optimization performed at steps 3 and 7. The projection optimization searches the set of feasible nests, i.e., the nests satisfying the problem's constraints, for the nest nearest to the nest $\tilde{\mathbf{p}}_i$ being projected. Therefore, the projection optimization problem is formulated as Problem 3 (Projection Optimization), where $\|\cdot\|$ stands for the Euclidean norm. The projection optimization problem is convex, and its optimal solution can be obtained using the Matlab Optimization Toolbox, wherein the optimal solution is computed by
Algorithm 1 Projected Cuckoo Search Algorithm (PCSA)
1: initialization: set $t$, $N_C$, $N_P$, and $P_a$
2: generate a random initial population of $N_P$ nests $\{\tilde{\mathbf{p}}_i : i \in \{1, 2, \cdots, N_P\}\}$
3: obtain a feasible population $\{\mathbf{p}_i : i \in \{1, 2, \cdots, N_P\}\}$ by projecting each $\tilde{\mathbf{p}}_i$ onto the set of the variables satisfying (1) and (2)
4: calculate the fitness value $q_\infty(\mathbf{p}_i)$, $\forall i \in \{1, 2, \cdots, N_P\}$
5: while $t \leq N_C$ do
6:   generate new nests $\tilde{\mathbf{p}}_i^{\mathrm{new}}$, $\forall i \in \{1, 2, \cdots, N_P\}$, by Lévy flights using (27)
7:   obtain feasible new nests $\mathbf{p}_i^{\mathrm{new}}$ by projecting each $\tilde{\mathbf{p}}_i^{\mathrm{new}}$ onto the set of the variables satisfying (1) and (2)
8:   choose a nest $j$ from the population at random
9:   if $q_\infty(\mathbf{p}_j) < q_\infty(\mathbf{p}_i^{\mathrm{new}})$ then
10:    $\mathbf{p}_j \leftarrow \mathbf{p}_i^{\mathrm{new}}$
11:  end if
12:  abandon randomly a fraction $P_a$ of the worse nests and obtain new ones by sorting the components of each abandoned nest in descending order
13:  find the best nest
14: end while
In step 6 of Algorithm 1, the new nests in the first generation of each iteration are generated via Lévy flights as in (27), where $\tilde{\mathbf{p}}_i^{\mathrm{new}}$ and $\mathbf{p}_i$ are the locations of the $i$-th new and current nests, respectively, $\delta$ is the scaling factor of the step size, which depends on the scale of the problem and should be chosen carefully to avoid flying too far, $\otimes$ denotes the entry-wise multiplication of two vectors, $\vartheta \in [0.3, 1.99]$ represents the Lévy distribution index, and $\mathbf{L}(\vartheta) = (L_j(\vartheta))_{j \in \mathcal{J}}$ denotes the Lévy random vector.
According to Mantegna's algorithm [39], the components of $\mathbf{L}(\vartheta)$ are calculated from $\omega_d \sim \mathcal{N}(0, \sigma_\omega^2)$ and $\nu_d \sim \mathcal{N}(0, \sigma_\nu^2)$, which are random numbers drawn from zero-mean normal distributions with variances $\sigma_\omega^2$ and $\sigma_\nu^2$, respectively, whose expressions involve the gamma function $\Gamma(\cdot)$.
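For readers unfamiliar with Mantegna's algorithm, a minimal Python sketch of its commonly used form is given below: each component is $\omega / |\nu|^{1/\vartheta}$ with $\nu \sim \mathcal{N}(0, 1)$ and $\omega \sim \mathcal{N}(0, \sigma^2)$, where $\sigma = \left[\frac{\Gamma(1+\vartheta)\sin(\pi\vartheta/2)}{\Gamma((1+\vartheta)/2)\,\vartheta\,2^{(\vartheta-1)/2}}\right]^{1/\vartheta}$. This is the textbook parameterization, in which $\sigma$ is a standard deviation; whether the paper's expressions for $\sigma_\omega^2$ and $\sigma_\nu^2$ match it exactly is an assumption. The function names are hypothetical.

```python
import math
import random


def mantegna_sigma(theta):
    """Standard deviation of the numerator Gaussian in Mantegna's algorithm."""
    num = math.gamma(1.0 + theta) * math.sin(math.pi * theta / 2.0)
    den = (math.gamma((1.0 + theta) / 2.0) * theta
           * 2.0 ** ((theta - 1.0) / 2.0))
    return (num / den) ** (1.0 / theta)


def levy_vector(J, theta, rng):
    """Draw J independent Levy-distributed step components."""
    sigma = mantegna_sigma(theta)
    steps = []
    for _ in range(J):
        omega = rng.gauss(0.0, sigma)  # numerator, N(0, sigma^2)
        nu = rng.gauss(0.0, 1.0)       # denominator, N(0, 1)
        steps.append(omega / abs(nu) ** (1.0 / theta))
    return steps


# Illustrative draw with the index theta = 1.5 used in the numerical section.
rng = random.Random(1)
L = levy_vector(J=6, theta=1.5, rng=rng)
```

The heavy tail of the resulting steps produces occasional long jumps, which is what lets the first generation of nests escape local optima.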
The scaling factor $\delta$ is assumed to decrease with the iteration index to encourage the localization of the Lévy search. Since the most popular contents are more likely to be cached, in step 12 we encourage PCSA to converge faster by generating the second generation of nests by sorting the components of each abandoned nest in descending order, instead of generating it via a random walk as in the original CSA.
In each iteration of PCSA, the time complexity of the projection optimization in steps 3 and 7 is at most $O(N_P \times I)$, where $I$ is the maximum number of function evaluations in the projection optimization operation. The time complexity of the sort operation in step 12 is $O(N_P \times J^2)$, whereas the time complexity of the rest of the algorithm is $O(N_P \times J)$. However, it is essential that $I > J^2$ for the projection optimization to efficiently obtain a feasible nest. Thus, the time complexity of PCSA in each iteration, after neglecting the low-order terms, is $O(N_P \times I)$. As there are $N_C$ iterations, the overall time complexity of PCSA becomes $O(N_C \times N_P \times I)$.
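The projection used in steps 3 and 7 can also be illustrated without a general-purpose solver. Because constraints (1) and (2) describe the probability simplex, the nearest feasible point in the Euclidean sense can be computed with the well-known sorting-based simplex projection. The sketch below uses that technique as a stand-in for the Matlab Optimization Toolbox mentioned in the text; it is not the authors' implementation, and the function name is hypothetical.

```python
def project_onto_simplex(v):
    """Euclidean projection of v onto {p : p_j >= 0, sum_j p_j = 1}."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    shift = 0.0
    for k, u_k in enumerate(u, start=1):
        cumulative += u_k
        candidate = (cumulative - 1.0) / k
        # Keep the shift from the largest k whose entry stays positive.
        if u_k - candidate > 0.0:
            shift = candidate
    return [max(x - shift, 0.0) for x in v]


# An infeasible nest (entries do not sum to one, one entry is negative).
p_feasible = project_onto_simplex([0.5, 0.9, -0.2])
```

The upper bound $p_j \leq 1$ in (1) does not need separate handling, since non-negative entries that sum to one cannot exceed one. The dominant cost is the sort, so this variant runs in $O(J \log J)$ per nest.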

IV. NUMERICAL RESULTS
In this section, we first evaluate the performance of proposed PCSA. Then, we verify the asymptotic approximation of the STP presented in the previous section. Finally, the performance of the proposed joint random caching and multicasting scheme is evaluated.
The parameters of PCSA used in this work are $N_C = 500$, $N_P = 100$, $\vartheta = 1.5$, and $P_a = 0.25$. To evaluate its performance, PCSA is compared with the CSA presented in [40] with a penalty factor of 100, and with the closest feasible solution to the solution obtained using CSA. The parameters of CSA were set to be the same as those of PCSA. Fig. 2 plots the average fitness value (i.e., STP) of the asymptotic caching and multicasting scheme, obtained over 20 trials, versus the iteration index. The figure shows that the average fitness values of CSA are infeasible (i.e., STP > 1), whereas the closest feasible nests to the nests obtained by CSA perform poorly compared with the proposed PCSA. The figure also shows that PCSA with sorting of the components of the abandoned nests in descending order converges faster than, and outperforms, PCSA with random walk-based search. The statistical results on the performance of PCSA and CSA provided in Table 1 indicate that PCSA with descending-order sorting of the abandoned nests outperforms the other algorithms.
Fig. 3 plots the STP versus the transmission SNR $P/N_0$. The simulation curves, obtained from 50000 Monte Carlo runs, show that the proposed caching strategy with multicast transmission service always achieves higher STPs than the unicast transmission scheme. Moreover, the figure shows that the analytical and simulation curves of the proposed caching and multicasting scheme with general transmit SNR are close. It also shows that the analytical curve for the general transmit SNR approaches the analytical asymptotic curve, which verifies Theorem 3 and Corollary 1 and reveals that $q_\infty(\mathbf{p})$ provides a good approximation of $q(\mathbf{p})$ when the transmission SNR is high, i.e., $P/N_0 \geq 40$ dB. Accordingly, we focus on the asymptotic caching optimization as $P/N_0 \to \infty$ in the rest of this section.
In Figs. 4-13, the performance of the proposed asymptotic caching and multicasting strategy for F-RAN using PCSA is compared with two benchmark strategies. The first benchmark is 'Uniform', which refers to the caching strategy proposed in [41], in which the F-APs randomly cache a combination according to the uniform distribution, i.e., the probability of caching each combination is $p_j = 1/J$, $\forall j \in \mathcal{J}$. The second benchmark is the 'Popular' caching strategy, in which only the $K$ most popular contents are cached at each F-AP [42]. It is assumed that the benchmark schemes adopt the same association, multicasting, and wireless backhauling models as the proposed scheme. Note that, in the Popular scheme, all requested contents except the $K$ top-ranked contents are served by a transit F-AP, whereas in the Uniform scheme the contents are often served by the direct F-APs.
Fig. 4 illustrates the relationship between the STP and the threshold transmission rate $\tau$. The figure shows that the STP decreases with the threshold transmission rate for all the considered schemes. It is observed that the Uniform scheme achieves lower STPs than the Popular scheme when $\tau \geq 0.2$ Mbps. This is because a content served by a direct F-AP that is far from the requester, as in the Uniform scheme, has a lower STP than the same content served over a 2-hop transmission via a closer transit F-AP, as in the Popular scheme. It is also observed that the proposed asymptotic caching scheme outperforms the benchmark schemes as a result of optimizing the cache placement.
Fig. 5 illustrates the relationship between the STP and the discovery range $R$. We can observe a logistic growth in the STP with the discovery range for all schemes, which is due to the higher probability of association with an F-AP to serve the requested contents as more F-APs reside within the discovery range.
Moreover, the figure shows that optimizing the cache placement in the proposed scheme improves the STP by up to 15% over the benchmark schemes when the discovery range is higher than 20 m.
Fig. 6 illustrates the relationship between the STP and the Zipf exponent $\gamma$. The figure shows that the STP of all schemes increases with the Zipf exponent, which is due to the higher probabilities of requesting the high-ranked contents as the Zipf exponent increases. However, the Uniform scheme shows a weaker increasing trend than the other schemes, since the number of cached high-ranked contents within $R$ is lower as a result of caching the contents evenly at the F-APs. Fig. 6 also shows that the proposed scheme outperforms the benchmark schemes due to optimizing the cache placement.
Fig. 7 plots the STP versus the F-APs' transmission bandwidth $W_F$. An increase in the STP with the F-APs' transmission bandwidth is observed for all the considered schemes, which is due to the improvement in disseminating the contents by the F-APs as their transmission bandwidth increases. However, the Uniform scheme is affected more than the Popular scheme by the F-APs' transmission bandwidth, as its contents are often served by the direct F-APs, i.e., a single-hop transmission is often required to deliver the requested content. It is also observed that the proposed scheme performs better than the benchmark schemes, which is due to the optimum utilization of the F-APs as direct or transit F-APs as a result of optimizing the cache placement.
Fig. 8 illustrates the relationship between the STP and the C-APs' transmission bandwidth $W_C$. Fig. 8 shows an improvement in the STP with the C-APs' transmission bandwidth for the proposed and Popular schemes, which is due to the higher performance of fetching the contents from the C-APs by the transit F-APs. In contrast, Fig. 8 shows that the C-APs' transmission bandwidth has almost no impact on the performance of the Uniform scheme, which is due to the extremely low probability of operating the F-APs in the transit mode. Fig. 8 also shows that the proposed caching scheme outperforms the benchmark schemes as a result of optimizing the operation modes of the F-APs by optimizing the cache placement.
Fig. 9 plots the STP versus the F-APs' cache size $K$. It can be seen that the proposed scheme performs better than the benchmark schemes and that the STP increases with $K$ for all schemes. Here, the improvement in the STP is due to caching more contents at each F-AP, which increases the probability of finding a direct F-AP that caches the requested content in the vicinity of the requester. Also, caching more contents at each F-AP leads to an increase in the probability of inactive contents, and thus an increase in the probability of the transit F-APs. It can also be seen that the differences in the STP between the caching schemes shrink as the cache size increases. This is mainly because the combinations of the different caching schemes are more likely to contain a higher number of common contents.
Fig. 10 plots the STP versus the total number of contents $M$. We can see that the STP of all schemes degrades with the total number of contents. This is due to three factors: the degradation in the popularity of each content; the lower number of contents that are cached within the discovery range, which results in lower probabilities of the direct and transit F-APs; and the poor performance of disseminating the contents by the C-APs, since their transmission bandwidth is evenly divided over a higher number of contents. We can also see that the impact of the total number of contents on the Uniform scheme is higher than on the other schemes, which is due to the extremely low probability of serving the contents by the transit F-APs. Fig. 10 shows that the proposed caching and multicasting scheme achieves higher STPs than the benchmark schemes as a result of optimizing the cache placement, which in turn results in optimizing the operation modes of the F-APs.
Fig. 11 illustrates the STP versus the end-user density $\lambda_U$. The figure demonstrates an exponential decay in the STP for all schemes as the end-user density increases. This is due to the higher probability of requesting the contents at higher end-user densities, which in turn decreases the probability of the inactive contents. As a result, the probability of finding a transit F-AP in the vicinity of the requester decreases, and the bandwidth assigned per content is lower due to the adopted multicast scheme. Fig. 11 shows that the proposed scheme outperforms the benchmark schemes by up to 15% at high end-user densities (i.e., $\lambda_U > 0.2$ end-users/m$^2$).
Fig. 12 plots the STP versus the F-AP density $\lambda_F$. The figure shows that the proposed scheme outperforms the benchmark schemes and that the STP of the proposed and Uniform schemes improves with the F-AP density. This is owing to the decrease in the distance between the end-user requesting a content and its serving F-AP, and to the higher probability of the transit F-APs as a higher number of F-APs reside within the discovery range of the requester. In contrast, the STP of the Popular scheme first increases for the same reasons and then decreases due to the high interference generated by the F-APs, which degrades the performance of fetching the contents, except the $K$ most popular ones, from the C-APs.
Fig. 13 illustrates the relationship between the STP and the C-AP density $\lambda_C$. Fig. 13 shows that the STP of the proposed and Popular schemes first increases with the C-AP density as a result of the improvement in the performance of fetching the contents from the C-APs by the transit F-APs.
Then, it decreases when the gained improvement in the backhaul links cannot compensate the performance degradation on the links between the F-APs and the end-users caused by the higher generated interference from the C-APs. This is also the same reason of the observed decrease in the STP of Uniform scheme as the contents have an extremely low probability to be the transit F-APs. Finally, the figure shows that the proposed scheme performs better than the benchmark caching schemes due to the optimal utilization of the F-APs gained by the optimal cache placement.
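The trends discussed for the Zipf exponent can be illustrated with a minimal Python sketch. It assumes a standard Zipf popularity profile, a Uniform placement with caching probability $K/M$ for every content, and a Popular placement that caches the $K$ highest ranked contents with probability one. It evaluates only the popularity-averaged probability of finding a direct F-AP within $R$, not the full STP. All function names and the example parameter values are hypothetical and serve illustration only.

```python
import math


def zipf_popularity(num_contents, gamma):
    """Request probabilities of contents ranked 1..M under a Zipf law."""
    weights = [rank ** (-gamma) for rank in range(1, num_contents + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def uniform_placement(num_contents, cache_size):
    """Assumed Uniform benchmark: every content cached with probability K/M."""
    return [cache_size / num_contents] * num_contents


def popular_placement(num_contents, cache_size):
    """Assumed Popular benchmark: only the K highest ranked contents cached."""
    return [1.0 if m < cache_size else 0.0 for m in range(num_contents)]


def direct_service_probability(popularity, placement, fap_density, radius):
    """Popularity-averaged probability that at least one F-AP within the
    discovery range caches the requested content (void probability of a PPP)."""
    area = math.pi * radius ** 2
    return sum(
        a * (1.0 - math.exp(-t * fap_density * area))
        for a, t in zip(popularity, placement)
    )


if __name__ == "__main__":
    M, K = 50, 5                 # hypothetical catalogue and cache sizes
    lam_f, R = 0.01, 20.0        # hypothetical F-AP density (per m^2) and range (m)
    for gamma in (0.4, 0.8, 1.2):
        a = zipf_popularity(M, gamma)
        print(gamma,
              direct_service_probability(a, uniform_placement(M, K), lam_f, R),
              direct_service_probability(a, popular_placement(M, K), lam_f, R))
```

Under these assumptions, the direct-service probability of the Uniform placement does not depend on $\gamma$ at all, whereas that of the Popular placement grows with $\gamma$, which is in line with the slower increase observed for the Uniform scheme.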

V. CONCLUSION
In this paper, a probabilistic caching and multicasting scheme for F-RANs was proposed. The proposed scheme is based on content combinations and takes into account the wireless backhauling of the F-APs. To derive the STP, we first derived the probability of an F-AP being a transit or direct F-AP with respect to the requested content, together with the association probabilities with those types of F-APs. Then, an expression of the STP in the general SNR region was derived using tools from stochastic geometry and by carefully analyzing the different types of serving F-APs and interferers. The complexity was reduced by deriving a closed-form expression of the asymptotic STP in the high SNR region. The optimization problem was formulated to maximize the asymptotic STP of the typical end-user. We developed the novel PCSA to solve the optimization problem and obtain the optimal cache placement. The numerical simulation results showed that PCSA converges faster than the original CSA and outperforms it.
The performance of the asymptotically random caching and multicasting scheme was also analyzed, and the results showed that the proposed scheme achieves higher STPs than the well-known benchmark caching schemes across all the studied network parameters.

APPENDIX A PROOF OF (6)
The probability $A_{m,0}$ of serving $u_0$ by a direct F-AP when it requests content $m$ is defined as the probability that at least one F-AP within $R$ caches content $m$. Then, $A_{m,0}$ can be expressed as
$$A_{m,0} = \mathbb{P}[N(F_m) \geq 1] = 1 - \mathbb{P}[N(F_m) = 0] \overset{(a)}{=} 1 - \exp\left(-\pi T_m \lambda_F R^2\right),$$
where $N(F_m)$ denotes the number of F-APs caching content $m$ within $R$, and equality (a) is obtained by the null property of PPPs, i.e., $\mathbb{P}[N(F_m) = 0] = \exp\left(-\pi T_m \lambda_F R^2\right)$.
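The null property used above can be checked numerically. The following minimal Python sketch compares the closed form $1 - \exp(-\pi T_m \lambda_F R^2)$ with a Monte Carlo estimate in which the number of F-APs within $R$ is Poisson distributed and each F-AP independently caches content $m$ with probability $T_m$. The function names, the sampler, and the example values are assumptions made for illustration.

```python
import math
import random


def direct_association_probability(t_m, fap_density, radius):
    """Closed form: 1 - P[no F-AP within the range caches content m]."""
    return 1.0 - math.exp(-math.pi * t_m * fap_density * radius ** 2)


def sample_poisson(mean, rng):
    """Knuth's multiplication method; adequate for small to moderate means."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def simulate_direct_association(t_m, fap_density, radius, trials=20000, seed=0):
    """Monte Carlo estimate using independent thinning of the F-AP process."""
    rng = random.Random(seed)
    mean_faps = math.pi * fap_density * radius ** 2
    hits = 0
    for _ in range(trials):
        n_faps = sample_poisson(mean_faps, rng)
        # Each F-AP caches content m independently with probability t_m.
        if any(rng.random() < t_m for _ in range(n_faps)):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # Hypothetical values: T_m = 0.2, 0.01 F-APs per m^2, 20 m discovery range.
    print(direct_association_probability(0.2, 0.01, 20.0))
    print(simulate_direct_association(0.2, 0.01, 20.0))
```

The two printed values should agree up to the Monte Carlo sampling error.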

APPENDIX B PROOF OF (8)
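A transit F-AP is an F-AP that does not cache the requested content $m$ but caches inactive contents. Thus, the probability of an F-AP being a transit one with respect to content $m$ can be formulated as
$$\mathbb{P}[\text{F-AP does not cache } m,\ \text{F-AP caches inactive content} \neq m] \overset{(b)}{=} \mathbb{P}[\text{F-AP does not cache } m]\; \mathbb{P}[\text{F-AP caches inactive content} \neq m] \overset{(c)}{=} (1 - T_m)\, \frac{\sum_{\mu \in \mathcal{M} \setminus m} T_\mu b_\mu}{K - T_m}.$$
Equality (b) is due to the fact that the two events are independent, whereas (c) is obtained by noting that the probability that an F-AP caches the inactive content $\mu \in \mathcal{M} \setminus m$ is $T_\mu b_\mu$. Thus, $\mathbb{P}[\text{F-AP caches inactive content} \neq m]$ can be calculated by summing over the set $\mathcal{M} \setminus m$, then normalizing the result by the sum of the caching probabilities of all contents in the set $\mathcal{M} \setminus m$ (i.e., $\sum_{\mu \in \mathcal{M} \setminus m} T_\mu = K - T_m$).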
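The calculation described above can be sketched in a few lines of Python. The sketch assumes that the caching probabilities $T_\mu$ sum to the cache size $K$ and that $b_\mu$ is the probability that content $\mu$ is inactive, supplied as an input rather than computed from (7). The function name and the example values are hypothetical.

```python
def transit_probability(m, caching_probs, inactive_probs, cache_size):
    """Probability that an F-AP is a transit F-AP with respect to content m.

    caching_probs[mu]  : T_mu, probability that an F-AP caches content mu
    inactive_probs[mu] : b_mu, probability that content mu is inactive (input)
    cache_size         : K, assumed equal to sum(caching_probs)
    """
    t_m = caching_probs[m]
    if t_m >= 1.0:
        # Content m is always cached, so the F-AP can never be a transit one.
        return 0.0
    # Sum of T_mu * b_mu over all contents other than m.
    inactive_mass = sum(
        t * b
        for mu, (t, b) in enumerate(zip(caching_probs, inactive_probs))
        if mu != m
    )
    # Normalize by sum over mu != m of T_mu, which equals K - T_m.
    return (1.0 - t_m) * inactive_mass / (cache_size - t_m)


if __name__ == "__main__":
    T = [0.5, 0.5, 0.5, 0.5]     # hypothetical placement with K = 2
    b = [0.1, 0.3, 0.6, 0.9]     # hypothetical inactivity probabilities
    print(transit_probability(0, T, b, cache_size=2))
```

As a sanity check, if every other content is inactive with probability one, the result reduces to $1 - T_m$, and if no other content is ever inactive, it reduces to zero.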

APPENDIX C PROOF OF (9)
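The probability $A_{a,0}$ of serving $u_0$ by a transit F-AP when content $m$ is requested is defined as the probability that no F-AP within $R$ caches content $m$ and at least one F-AP within $R$ caches an inactive content. Thus, $A_{a,0}$ can be expressed as
$$A_{a,0} = \mathbb{P}[N(F_m) = 0,\ N(F_a) \geq 1] \overset{(d)}{=} \mathbb{P}[N(F_m) = 0]\; \mathbb{P}[N(F_a) \geq 1] \overset{(e)}{=} \exp\left(-\pi T_m \lambda_F R^2\right) \left(1 - \mathbb{P}[N(F_a) = 0]\right),$$
where $N(F_a)$ is the number of F-APs within $R$ that cache inactive contents, and $\mathbb{P}[N(F_a) = 0]$ is likewise the void probability of the transit F-APs within $R$.
Here, equalities (d) and (e) are obtained by noting that $N(F_m)$ and $N(F_a)$ are independent, and by the null property of PPPs, respectively.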

APPENDIX D PROOF OF (13)
Given that $F_{m,0}$ caches combination $j \in \mathcal{J}_m$, the content load of $F_{m,0}$ can be expressed as
$$K_0 = 1 + \sum_{\mu \in j \setminus m} Y_\mu,$$
where $Y_\mu$ is defined as in Section IV. Note that content $m$ is already requested by $u_0$. Here, the conditional PMF of $K_0$ follows a Poisson binomial distribution [43], and can thus be written in terms of the probabilities $b_\mu$ given by (7). The conditional probability of caching combination $j$ at $F_{m,0}$ is $p_j / T_m$. Thus, by summing over $\mathcal{J}_m$ (i.e., the set of all possible combinations containing content $m$) we can prove (13).
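A Poisson binomial PMF can be evaluated with a simple dynamic program, as in the minimal Python sketch below. The sketch assumes that the other contents of the cached combination are requested independently, each with its own probability, and that content $m$ always contributes one unit to the load. Taking the request probability of content $\mu$ as $1 - b_\mu$ is an assumption of this sketch, and the function names are hypothetical.

```python
def poisson_binomial_pmf(success_probs):
    """PMF of the number of successes among independent, non-identical
    Bernoulli trials, built up one trial at a time."""
    pmf = [1.0]  # with zero trials, zero successes occur with probability 1
    for p in success_probs:
        updated = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            updated[k] += mass * (1.0 - p)   # this trial fails
            updated[k + 1] += mass * p       # this trial succeeds
        pmf = updated
    return pmf


def content_load_pmf(other_request_probs):
    """PMF of the content load: one unit for the already requested content m
    plus the number of other cached contents that are requested."""
    counts = poisson_binomial_pmf(other_request_probs)
    return {k + 1: mass for k, mass in enumerate(counts)}


if __name__ == "__main__":
    # Hypothetical inactivity probabilities b_mu of the other cached contents.
    b = [0.8, 0.3]
    print(content_load_pmf([1.0 - b_mu for b_mu in b]))
```

The dynamic program needs a number of operations quadratic in the number of cached contents, which avoids enumerating all subsets of requested contents.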

APPENDIX E PROOF OF THEOREM 1
Let $I_m \triangleq \sum_{l \in \Phi_m \setminus F_{m,0}} D_{l,0}^{-\alpha} |h_{l,0}|^2$ represent the interference originating from the F-APs storing content $m$, $I_{-m} \triangleq \sum_{l \in \Phi_{-m}} D_{l,0}^{-\alpha} |h_{l,0}|^2$ denote the interference originating from the F-APs that do not cache content $m$, and $I_C \triangleq \sum_{l \in \Phi_C} D_{l,0}^{-\alpha} |h_{l,0}|^2$ denote the interference originating from the C-APs. Note that $\Phi_m$ and $\Phi_{-m}$ are thinned PPPs with densities $T_m \lambda_F$ and $(1 - T_m)\lambda_F$, respectively. Next, we calculate $q_{\kappa,m,0}$ conditioned on $D_{0,0} = d$ as in (37), where s = 2 [...]. To proceed, $\mathcal{L}_{I_m}(s, d)$ can be calculated as in (38), where the probability generating functional of the PPP is utilized to obtain (h) [32], whereas equality (i) is obtained by the change of variables in the integral, i.e., $sr^{-1/\alpha}$ to $t$, then $1/(1 + t^{-\alpha})$ to $w$. In the same manner, $\mathcal{L}_{I_{-m}}(s, d)$ and $\mathcal{L}_{I_C}(s, d)$ can be obtained as in (39) and (40), respectively.

APPENDIX F PROOF OF THEOREM 2
Following the same steps of Appendix E, $q_{\kappa,a,0,D_{a,0}}(\mathbf{p}, d)$ and $q_{C,a,D_{C,a}}(\mathbf{p}, d)$ can be obtained as in (42) and (43), respectively, where s = [...]. Next, we remove the conditions on the distance to obtain $q_{\kappa,a,0}(\mathbf{p})$ and $q_{C,a}(\mathbf{p})$ by averaging over the PDFs of the distances $D_{a,0}$ and $D_{C,a}$. Here, $f_{D_{C,a}}(d) = 2\pi \lambda_C d \exp\left(-\pi \lambda_C d^2\right)$, and $f_{D_{a,0}}(d)$ has the same form with $\lambda_C$ replaced by the density of the transit F-APs with respect to content $m$. Thus, we can prove Theorem 2.
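Removing the condition on the distance amounts to averaging a conditional probability over the nearest-point distance PDF $2\pi \lambda d \exp(-\pi \lambda d^2)$ of a PPP with density $\lambda$. The minimal Python sketch below does this numerically with a midpoint rule. The conditional probability used in the example is a hypothetical stand-in, not the expression in (42) or (43), and the upper integration limit is an assumed truncation point.

```python
import math


def nearest_distance_pdf(d, density):
    """PDF of the distance from a point to the nearest point of a PPP."""
    return 2.0 * math.pi * density * d * math.exp(-math.pi * density * d ** 2)


def decondition(conditional_prob, density, upper_limit, steps=20000):
    """Average conditional_prob(d) over the nearest-distance PDF on
    [0, upper_limit] using the midpoint rule."""
    width = upper_limit / steps
    total = 0.0
    for i in range(steps):
        d = (i + 0.5) * width  # midpoint of the i-th sub-interval
        total += conditional_prob(d) * nearest_distance_pdf(d, density)
    return total * width


if __name__ == "__main__":
    lam_c = 0.01   # hypothetical density (per m^2)
    c = 0.005      # hypothetical decay constant of the stand-in probability

    def stand_in(d):
        # Hypothetical conditional success probability, chosen only because
        # its average has the closed form pi*lam / (pi*lam + c).
        return math.exp(-c * d ** 2)

    print(decondition(stand_in, lam_c, upper_limit=200.0))
    print(math.pi * lam_c / (math.pi * lam_c + c))
```

For the stand-in above, the numerical average should match the closed form, which gives a quick check of the integration routine before it is applied to a more involved conditional expression.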