Edge Caching in Dense Heterogeneous Cellular Networks with Massive MIMO Aided Self-backhaul

This paper studies edge caching in dense heterogeneous cellular networks, in which small base stations (SBSs) with limited cache size store popular contents, and massive multiple-input multiple-output (MIMO) aided macro base stations provide wireless self-backhaul when SBSs require non-cached contents. We address the effects of cell load and hit probability on successful content delivery (SCD), and evaluate the minimum base station density required to avoid access overload in a small cell and backhaul overload in a macrocell. We demonstrate that the hit probability needs to be appropriately selected in order to achieve SCD. We derive the massive MIMO backhaul achievable rate without downlink channel estimation, to calculate the backhaul time. We provide important insights into the interplay between cache size and SCD, and analyze the latency in such networks. We demonstrate that when non-cached contents are requested, the average delay of non-cached content delivery can be comparable to that of cached content delivery with the help of massive MIMO aided self-backhaul, provided that the average access rate of cached content delivery is lower than that of self-backhauled content delivery. Simulation results are presented to validate our analysis.


A. Motivation and Background
Recent findings from Cisco [1] indicate that mobile video traffic accounts for the majority of mobile data traffic. To offload traffic from the core networks and reduce backhaul cost and latency, caching popular contents at the edge of wireless networks has become a promising solution [2][3][4]. The latest 3GPP standard requires that the fifth generation (5G) system support content caching applications and that operators be able to place content caches close to mobile terminals [5]. In addition, emerging radio-access technologies and wireless network architectures provide edge caching with new opportunities [6].
Recent works have focused on caching design and analysis in various scenarios. In [7], a probabilistic caching model was considered in single-tier cellular networks and the optimal content placement was designed to maximize the total hit probability. In [8], a stochastic content multicast scheduling problem was formulated to jointly minimize the average network delay and power costs in heterogeneous cellular networks (HetNets), and a structure-aware optimal algorithm was proposed to solve this problem. Caching cooperation in multi-tier HetNets was studied in [9], where a low-complexity suboptimal solution was developed to maximize the capacity of such networks. Caching in device-to-device (D2D) networks was investigated in the literature such as [10,11]. In [10], a holistic design of D2D caching across multiple frequency bands, including sub-6 GHz and millimeter wave (mmWave), was presented. In [11], the performance difference between maximizing the hit probability and maximizing the cache-aided throughput in D2D caching networks was evaluated. The work of [12] showed that in multi-hop relaying systems, the efficiency of caching can be further improved by using collaborative cache-enabled relaying. The joint design of cloud and edge caching in fog radio access networks was introduced in [13,14], where popular contents are cached at the remote radio heads. However, the prior works [7][8][9][10][11][12][13] did not present designs and insights for edge caching in future dense/ultra-dense cellular networks (e.g., 5G) with backhaul concerns, where wireless self-backhauling shall be supported [4].
Cache-enabled small cell networks with stochastic models have been investigated in the literature such as [15][16][17][18][19]. Cluster-centric caching with base station (BS) cooperation was studied in [15], where the tradeoff between transmission diversity and content diversity was revealed. In [16], two cache-enabled BS modes were considered, i.e., always-on and dynamic on-off, and it was assumed that the intensity of BSs is much larger than the intensity of mobile terminals. The work of [17][18][19] concentrated on the cache-enabled multi-tier HetNets.
Specifically, [17] and [18] studied optimal content placement under the probabilistic caching strategy, and [19] considered joint BS caching and cooperation, in contrast to the single-tier case in [15]. However, [15][16][17][18][19] aimed only to maximize the probability that the requested content is both cached and successfully delivered. In realistic networks, when users' requested contents are not cached at their associated BSs, the users obtain the requested contents from the core networks via wired/wireless backhaul, which also needs to be studied in cache-enabled cellular networks.
In fact, existing contributions such as [20][21][22] have studied the effects of backhaul on content delivery in cache-enabled networks. The work of [20] considered that non-cached contents are obtained via backhaul, and designed downlink content-centric sparse multicast beamforming in the cache-enabled cloud radio access network (Cloud-RAN) architecture, to minimize the weighted sum of the backhaul cost and transmit power. In [21], the network successful content delivery, consisting of cached content delivery and backhauled content delivery, was studied, and an optimization problem was formulated to minimize the cache size under a quality-of-service constraint. The work of [22] analyzed the capacity scaling law when there is a limited number of wired backhaul links in single-tier networks, and showed that the cache size needs to be large enough to achieve linear capacity scaling. However, none of [20][21][22] studied cache-enabled cellular networks with a specified wireless backhaul transmission scheme, such as massive multiple-input multiple-output (MIMO) aided self-backhaul.

B. Novelty and Contributions
In this paper, we focus on edge caching in dense HetNets with massive MIMO aided self-backhaul, which is not yet well understood. Considering massive MIMO aided self-backhaul is motivated by the fact that it is challenging to make every backhaul link fiber-optic in such networks, while massive MIMO can support high-speed transmissions thanks to large array gains [4]. Our contributions are summarized as follows: • In contrast to prior works such as [15][16][17][18][19][20][21][22], we consider cache-enabled HetNets in which randomly located small BSs (SBSs) cache finite popular contents, and macro BSs (MBSs) equipped with massive MIMO antennas provide wireless backhaul to deliver the non-cached requested contents to the SBSs. Moreover, we also consider the resource allocation when multiple users request contents from the same SBS, which has not been studied in a cache-enabled stochastic model. • We first derive the successful content delivery probability when the requested content is cached at the SBS. The maximum small cell load is calculated, and the minimum required density of SBSs for avoiding access overload is obtained. We show that the hit probability needs to be lower than a certain value, to guarantee successful cached content delivery.
• We derive the successful content delivery probability when the requested content is not cached and has to be obtained via massive MIMO backhaul. We analyze the massive MIMO backhaul achievable rate when downlink channel estimation is not available, to evaluate the backhaul time. The minimum required density of MBSs for avoiding backhaul overload is obtained. We show that hit probability needs to be higher than a certain value, to guarantee successful self-backhauled content delivery.
• We analyze the effects of cache size on the successful content delivery, and provide important insights on the interplay between time-frequency resource allocation and cache size from the perspective of successful content delivery probability. We characterize the latency in terms of average delay in such networks, and confirm that when the requested contents are not cached, the average delay of the non-cached content delivery could be comparable to the cached content delivery with the assistance of massive MIMO backhaul, if the average access rate of cached content delivery is lower than that of self-backhauled content delivery.

II. NETWORK MODEL
As shown in Fig. 1, we consider a two-tier self-backhauled HetNet, in which each single-antenna SBS with finite cache size can store popular contents to serve user equipment (UEs), and each massive MIMO aided MBS equipped with N antennas accesses the core network via optical fiber and delivers non-cached contents to the SBSs via wireless backhaul. UEs, SBSs, and MBSs are assumed to be distributed following independent homogeneous Poisson point processes (HPPPs), denoted by Φ_U with density λ_U, Φ_S with density λ_S, and Φ_M with density λ_M, respectively. It is assumed that UEs are associated with the SBSs that provide the maximum average received power, as is also done in 4G networks [6]. In addition, each channel undergoes independent and identically distributed (i.i.d.) quasi-static Rayleigh fading.
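As an illustration of this network model, the HPPPs and the max-average-received-power association can be simulated as follows. This is a minimal sketch: the window size and the densities λ_U, λ_S are hypothetical values, and with a common transmit power and path-loss law the association reduces to nearest-SBS association.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_hppp(density, side):
    """Sample a homogeneous PPP of the given density on a side x side square:
    a Poisson number of points, placed uniformly at random."""
    n = rng.poisson(density * side ** 2)
    return rng.uniform(0.0, side, size=(n, 2))

side = 100.0                # region edge length (illustrative units)
lam_U, lam_S = 0.05, 0.01   # hypothetical UE and SBS densities
ues = sample_hppp(lam_U, side)
sbss = sample_hppp(lam_S, side)

# Max-average-received-power association with a common transmit power and
# path-loss law is equivalent to nearest-SBS association.
d = np.linalg.norm(ues[:, None, :] - sbss[None, :, :], axis=2)
serving = d.argmin(axis=1)
load = np.bincount(serving, minlength=len(sbss))  # cell load K per SBS
print(round(load.mean(), 2), lam_U / lam_S)       # mean load vs density ratio
```

The mean cell load across SBSs tracks the density ratio λ_U/λ_S, which is the quantity the load analysis in Section III is built on.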

A. Content Placement
The content placement mechanism is mainly designed based on content popularity [4]. We assume a finite content library denoted as F := {f_1, . . . , f_j, . . . , f_J}, where f_j is the j-th most popular content and the number of contents is J. The request probability a_j for the j-th most popular content is commonly modeled by the Zipf distribution [23], i.e., a_j = j^{−ς} / Σ_{i=1}^{J} i^{−ς}, where ς is the Zipf exponent representing the popularity skewness [23]. Each content is assumed to be of unit size and each SBS can only cache L (L ≪ J) contents. We employ the probabilistic caching strategy [7], i.e., the probability that content j is cached at an arbitrary SBS is q_j (0 ≤ q_j ≤ 1), with Σ_{j=1}^{J} q_j ≤ L.
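As an illustration, the Zipf popularities and two feasible probabilistic placements satisfying Σ_j q_j ≤ L can be computed as follows; the parameter values (J, L, ς) and the two placements, most-popular-contents and uniform, are hypothetical examples rather than the paper's optimized design.

```python
import numpy as np

def zipf_popularity(J, varsigma):
    """Request probabilities a_j proportional to j^(-varsigma), j = 1..J."""
    w = np.arange(1, J + 1, dtype=float) ** (-varsigma)
    return w / w.sum()

J, L, varsigma = 1000, 20, 0.8
a = zipf_popularity(J, varsigma)

# Two feasible probabilistic placements with sum(q_j) <= L:
q_mpc = np.r_[np.ones(L), np.zeros(J - L)]  # cache the L most popular contents
q_uni = np.full(J, L / J)                   # cache each content w.p. L/J
hit_mpc = float((a * q_mpc).sum())
hit_uni = float((a * q_uni).sum())
print(round(hit_mpc, 3), round(hit_uni, 3))
```

Because the Zipf law is skewed, caching the most popular contents yields a much higher hit probability than uniform placement for the same cache budget L.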

B. Self-backhaul Load
We assume that the access and backhaul links orthogonally share the sub-6 GHz spectrum, and that the bandwidths allocated to the access and backhaul links are ηW and (1 − η)W, respectively, where η is the fraction factor and W is the system bandwidth. The number of UEs associated with an SBS is denoted by K, and UEs in the same small cell are served via equal time sharing. Thus, the fraction of time-frequency resources allocated to each access link is ηW/K during cached content delivery. When an associated SBS does not cache the requested content, it connects to the MBS that provides the strongest wireless backhaul link, so that the requested content can be obtained from the core network. Let S_j (N ≫ S_j) denote the number of SBSs served by the j-th MBS (j ∈ Φ_M) for wireless backhaul.
Since the hit probability that a UE's requested content is stored at an SBS is q_hit = Σ_{j=1}^{J} a_j q_j, the set of SBSs can be partitioned into two independent HPPPs Φ_S^a and Φ_S^b based on the thinning theorem [24], where Φ_S^a with density λ_S q_hit denotes the point process of SBSs with access links, and Φ_S^b with density λ_S (1 − q_hit) denotes the point process of SBSs with backhaul links. Let ω_b = λ_S (1 − q_hit)/λ_M represent the average number of SBSs served by an MBS for wireless backhaul.
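The thinning step above can be checked with a small Monte Carlo experiment: each SBS independently misses its cache with probability 1 − q_hit, and the empirical backhaul load per MBS concentrates around ω_b. The densities, window size, and q_hit below are hypothetical values.

```python
import numpy as np

rng = np.random.default_rng(1)

lam_S, lam_M = 0.01, 0.001  # hypothetical SBS and MBS densities
q_hit = 0.6                 # hypothetical hit probability
area = 1000.0 * 1000.0      # observation window

# Independent thinning: each SBS requires backhaul w.p. (1 - q_hit).
n_S = rng.poisson(lam_S * area)
needs_backhaul = rng.random(n_S) < (1.0 - q_hit)

omega_b = lam_S * (1.0 - q_hit) / lam_M             # analytical average load
omega_b_mc = needs_backhaul.sum() / (lam_M * area)  # Monte Carlo estimate
print(round(omega_b, 2), round(omega_b_mc, 2))
```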

C. Resource Allocation Model
We consider the saturated traffic condition, i.e., all the SBSs keep active to serve their associated UEs.

1) Access:
When the requested content is stored at the typical SBS, the rate of a typical access link is given by

R_a = (ηW/K) log_2 (1 + P_a h_o L(|X_o|) / (I_a + σ_a^2)),    (2)

where I_a = Σ_{i ∈ Φ_S^a \ {o}} P_a h_i L(|X_{o,i}|) denotes the total interference power from the other SBSs, P_a is the SBS transmit power, L(|X|) = β(|X|)^{−α_a} denotes the path loss with frequency-dependent constant β, distance |X|, and path loss exponent α_a; h_o ∼ exp(1) and |X_o| are the small-scale fading channel power gain and the distance between the typical UE and its associated SBS, respectively; h_i ∼ exp(1) and |X_{o,i}| are the small-scale fading interfering channel power gain and the distance between the typical UE and the interfering SBS i ∈ Φ_S^a \ {o} (excluding the typical SBS o), respectively; and σ_a^2 is the noise power at the typical UE.

2) Self-Backhaul: When the requested content is not stored at the SBS, it is obtained through massive MIMO backhaul. For the massive MIMO backhaul link, the massive MIMO enabled MBS adopts zero-forcing beamforming with equal power allocation [25]. In such a massive MIMO self-backhauled network, the SBSs do not perform any channel estimation, and we adopt an achievable backhaul transmission rate as confirmed in [26,27]. Therefore, given a typical distance |Y_o| between the typical SBS and its associated MBS, the rate of a typical massive MIMO backhaul link is given by

R_b = (1 − η) W C_b(|Y_o|),    (3)

with

C_b(y) = log_2 (1 + (P_b/S_o) Ξ_1(y) / ((P_b/S_o) Ξ_2(y) + Ξ_3(y) + σ_b^2)),

where Ξ_1(y) = L(y) (E{√g_o})^2, Ξ_2(y) = L(y) var(√g_o), Ξ_3(y) = E_{|Y_o|=y}{I_b}, E{·} is the expectation operator, I_b denotes the total interference power from the other MBSs, g_o ∼ Γ(N − S_o + 1, 1) is the small-scale fading channel power gain between the typical SBS and its associated MBS, g_j ∼ Γ(S_j, 1) and |Y_{o,j}| are the small-scale fading interfering channel power gain and the distance between the typical SBS and the interfering MBS j, respectively, and σ_b^2 is the noise power at the typical SBS.
After obtaining the requested content via backhaul, the associated SBS delivers it to the corresponding UE. In this case, the corresponding access-link rate is expressed as

R_{a′} = ((1 − η)W/K) log_2 (1 + P_a h_o L(|X_o|) / (I_{a′} + σ_{a′}^2)),    (4)

where I_{a′} = Σ_{i′ ∈ Φ_S^b \ {o}} P_a h_{i′} L(|X_{o,i′}|) is the total interference power, h_{i′} ∼ exp(1) and L(|X_{o,i′}|) are the small-scale fading channel power gain and the path loss between the typical UE and the interfering SBS i′ ∈ Φ_S^b \ {o}, respectively, and σ_{a′}^2 is the noise power at the typical UE. From (3) and (4), we see that to cut latency, the massive MIMO backhaul link needs to be high-speed, which can be achieved by using large array gains and large bandwidths via carrier aggregation (CA). In the following section, we further examine how much backhaul time is needed at an achievable backhaul rate.

III. CONTENT DELIVERY EFFICIENCY
In this paper, there are two cases for successful content delivery (SCD): 1) when the associated SBS has cached the requested content, SCD occurs if the time for successfully delivering Q bits does not exceed the threshold T_th; and 2) when the requested content is not cached at the associated SBS and needs to be obtained via massive MIMO backhaul, SCD occurs if the total time for successfully delivering Q bits to the UE does not exceed T_th.

A. Cached Content Delivery
Different from [15,16,18], where each small cell is assumed to have only one active UE, we evaluate the SCD probability by considering multiple UEs served by an SBS, as in practice, and analyze the effect of resource allocation on the SCD probability. We first have the following important theorem.
Theorem 1: When a requested content is stored at the typical SBS, the SCD probability is derived as (5), where P_{λ_U/λ_S}(k) is the probability mass function (PMF) of the event that k − 1 other UEs (besides the typical UE) are served by the typical SBS, and is given by [29]. In (5), K = K_max^a is the maximum load of a typical small cell, and can be quickly obtained by using Algorithm 1 to solve equation (6), where ₂F₁(·) is the Gauss hypergeometric function [28, (9.142)], and ε is the predefined threshold, i.e., SCD occurs when the probability that R_a is larger than Q/T_th is at least ε.
Theorem 1 implies that in dense (i.e., interference-limited) small cell networks, the SCD probability depends on the ratio of the UE density to the SBS density and on the hit probability, given the time-frequency resource allocation. Based on Theorem 1, we have Corollary 1: From (6), we see that to achieve a load K = K_max^a ≥ 1 in a small cell, the hit probability should satisfy the upper bound in (7). It is indicated by (7) that there is an upper bound on the hit probability, which can be explained by the fact that when more UEs obtain their requested contents from their associated SBSs in dense cellular networks with a large hit probability, there is also more interference from nearby SBSs, which degrades the cached content delivery.
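The one-dimension search of Algorithm 1 can be sketched as follows. Since the conditional SCD probability is decreasing in the cell load k, finding the maximum load reduces to locating the largest k that still meets the threshold ε; the exponential-Λ below is a hypothetical stand-in for the paper's closed form in (A.5).

```python
import math

def max_load(scd_prob, eps, k_hi=10_000):
    """Largest integer k with scd_prob(k) >= eps, assuming scd_prob is
    non-increasing in k (the premise of Algorithm 1)."""
    if scd_prob(1) < eps:
        return 0                       # even one UE cannot meet the SCD target
    lo, hi = 1, 2
    while hi <= k_hi and scd_prob(hi) >= eps:
        lo, hi = hi, 2 * hi            # exponential bracketing
    while hi - lo > 1:                 # binary search on the bracket
        mid = (lo + hi) // 2
        if scd_prob(mid) >= eps:
            lo = mid
        else:
            hi = mid
    return lo

# Hypothetical conditional SCD probability, decreasing in the cell load k;
# the closed form from (A.5) would be used here instead.
Lambda = lambda k: math.exp(-0.35 * k)
print(max_load(Lambda, eps=0.1))  # -> 6
```

The same monotone search applies wherever the paper invokes a "one-dimension search", e.g., for the maximum backhaul load S_max in Section III-B.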
Algorithm 1 outputs the optimal k*, i.e., K_max^a = round(k*). In realistic networks, there may be overload issues when the scale of small cells is not adequate to support a high level of connectivity, which needs to be addressed. Therefore, given a specified UE density λ_U, we evaluate the minimum required scale of small cells as follows.
Corollary 2: To mitigate the harm of overloading, the minimum required SBS density needs to satisfy (8), where μ_a is the solution of P_{λ_U/λ_S = μ_a}(k = K_max^a + 1) = ρ for arbitrarily small ρ > 0, and can be easily obtained via a one-dimension search similar to Algorithm 1. From (8), we see that the minimum required density of SBSs depends only on the maximum load of a small cell and the density of UEs in dense cache-enabled cellular networks.

B. Self-backhauled Content Delivery

1) Massive MIMO Backhaul:
When the required content is not stored at the typical SBS, the SBS has to obtain it from the core network via massive MIMO backhaul. Therefore, we need to evaluate the backhaul time for delivering the requested content to the typical SBS. Given the load S_o of a typical macrocell, the achievable transmission rate of a typical backhaul link is given by (9), where r_b is the minimum distance between the typical MBS and its associated SBS. A detailed derivation of (9) is provided in Appendix C. Therefore, the time for delivering Q bits to the typical SBS via wireless backhaul is T_1 = Q/R_b. When the number of antennas at the MBS grows large, we have the following corollary.
Corollary 3: For large N, the achievable backhaul rate can be approximated as (10), where Ei(·) is the exponential integral [28]. Based on (10), the typical MBS's required time for delivering Q bits to its associated SBS satisfies (11). Proof: See Appendix D.
Corollary 3 explicitly shows that a large number of antennas and a large bandwidth are required in order to significantly cut the wireless backhaul time. From (11), we see that the backhaul time can be cut at least proportionally to 1/ln N.
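This scaling can be sketched numerically with the hardening lower bound, using E{√g_o} = Γ(m + 1/2)/Γ(m) for g_o ∼ Γ(m, 1) with m = N − S_o + 1. The bandwidth W_b, reference SNR snr0, content size Q, and macrocell load S_o below are hypothetical values chosen only to illustrate the 1/ln N trend of the backhaul time.

```python
import math

def backhaul_rate(N, S_o, W_b=20e6, snr0=1.0):
    """Hardening lower bound on the backhaul rate (sketch): the useful power
    scales with (E{sqrt(g_o)})^2, where g_o ~ Gamma(N - S_o + 1, 1)."""
    m = N - S_o + 1
    e_sqrt_g = math.exp(math.lgamma(m + 0.5) - math.lgamma(m))  # E{sqrt(g_o)}
    return W_b * math.log2(1.0 + (snr0 / S_o) * e_sqrt_g ** 2)

Q = 8e6  # deliver 1 MB (8e6 bits) over the backhaul
T1 = [Q / backhaul_rate(N, S_o=10) for N in (64, 128, 256, 512)]
print([round(t, 4) for t in T1])  # backhaul time shrinks roughly like 1/ln N
```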
In the self-backhauled networks, the number of SBSs being simultaneously served by an MBS for wireless backhaul should not exceed the maximum value denoted by S max , i.e., S o ≤ S max ; otherwise high-speed massive MIMO aided backhaul transmission cannot be guaranteed. Hence, given the minimum required backhaul transmission rate R min b , the maximum backhaul load of a typical massive MIMO MBS is the solution of R b (S max ) = R min b , which can be quickly obtained by using one-dimension search since R b (S o ) is a decreasing function of S o for large N, as suggested in Appendix D. After obtaining S max , we can obtain the minimum number of massive MIMO aided MBSs that needs to be deployed, in order to mitigate the backhaul overload.
Corollary 4: Similar to Corollary 2, the minimum required density of MBSs is given by (12), where the corresponding load threshold is the solution of the overload-probability equation with arbitrarily small ρ > 0, and can be easily obtained via a one-dimension search.
It is explicitly shown in (12) that higher hit probability can significantly reduce the scale of MBSs because of less backhaul.
2) Access: After obtaining the required content via backhaul, the typical SBS transmits it to the associated UE. Thus, we have the following important theorem.
Theorem 2: When the required content is not stored at the typical SBS and has to be obtained via massive MIMO self-backhaul, the SCD probability is derived as (13), where K_max^b is the maximum number of UEs that a typical small cell can serve when the typical UE's content needs to be obtained via backhaul, and K_max^b can be obtained by solving equation (14). Corollary 5: From (14), we see that to achieve a load K = K_max^b ≥ 1 in a small cell, the hit probability should satisfy the lower bound in (15). From (15), we see that there is a lower bound on the hit probability, i.e., a minimum cache capacity is demanded at the SBS, since more backhaul results in more interference, which degrades the self-backhauled content delivery.
Corollary 6: After obtaining the maximum load K_max^b, we can calculate the minimum required SBS density from (8) by interchanging K_max^a → K_max^b, to overcome overload. Based on Theorem 1 and Theorem 2, the SCD probability in dense cellular networks with massive MIMO self-backhaul for a typical UE is calculated as (16), where K_max^a and K_max^b are given by (6) and (14), respectively. The SCD probability in (16) can be intuitively understood from the fact that when the small cell load is light, UEs' requested contents can be successfully delivered whether they are cached or obtained from the core network via massive MIMO backhaul.
However, beyond a critical value of the cell load, UEs can only obtain their requested contents either from the SBS caches or via backhaul, depending on the maximum cell load in the cached and self-backhauled content delivery cases.
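A minimal sketch of the combination of the two delivery cases, assuming (as the total-probability structure of (16) suggests) that the overall SCD probability weights the per-case probabilities by the hit probability; p_cached and p_backhaul are hypothetical stand-ins for the results of Theorem 1 and Theorem 2:

```python
def overall_scd(q_hit, p_cached, p_backhaul):
    """Total-probability sketch: a request hits the cache w.p. q_hit and is
    served as in Theorem 1, otherwise via backhaul as in Theorem 2."""
    return q_hit * p_cached + (1.0 - q_hit) * p_backhaul

print(round(overall_scd(0.6, 0.9, 0.7), 2))  # -> 0.82
```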

IV. CONTENT PLACEMENT, CACHE SIZE AND LATENCY
In this section, we study the effects of content placement and cache size on the content delivery performance. Then, we evaluate the latency in such networks.

A. Content Placement and Cache Size
As shown in (16), the hit probability plays an important role in content delivery. Since the hit probability depends on the cache size and content placement, SBSs with large storage capacity can cache more popular contents, avoiding frequent backhaul and cutting backhaul cost and latency. Therefore, a higher hit probability is meaningful for cutting the network's operational and capital expenditures (OPEX, CAPEX). Given the SBS's cache size, different content placement strategies may result in different hit probabilities, and caching the most popular contents (MPC) achieves the highest hit probability, which is commonly considered in the literature on edge caching such as [13,30]. Therefore, we consider MPC caching and analyze the appropriate cache size in such networks. Considering that for large J with MPC caching, q_hit = Σ_{j=1}^{L} a_j ≈ (L/J)^{1−ς}, we have Corollary 7: When more time-frequency resources are allocated to the cached content delivery, the SCD probability reduces to (17), and it is an increasing function of the cache size within the range characterized in (18). The above corollary provides important insights into the interplay between time-frequency resource allocation and cache size in cache-enabled dense cellular networks with massive MIMO backhaul, which plays a key role in the content delivery performance.
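The approximation q_hit ≈ (L/J)^{1−ς} under MPC caching can be checked numerically; the library size J, exponent ς, and cache sizes below are arbitrary illustrative values.

```python
import numpy as np

def hit_mpc(J, L, varsigma):
    """Exact MPC hit probability: the sum of the L largest Zipf probabilities."""
    w = np.arange(1, J + 1, dtype=float) ** (-varsigma)
    return float((w[:L] / w.sum()).sum())

J, varsigma = 10_000, 0.6
results = [(L, hit_mpc(J, L, varsigma), (L / J) ** (1 - varsigma))
           for L in (100, 500, 1000)]
for L, exact, approx in results:
    print(L, round(exact, 3), round(approx, 3))
```

For ς < 1 the approximation tracks the exact sum closely, and both are increasing in the cache size L, consistent with the discussion above.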

B. Latency
To evaluate the latency of such networks, we consider the average delay for successfully obtaining the requested content. It should be noted that when the small cells are overloaded, UEs may suffer a longer delay. There are many approaches to the overload issue, such as deploying enough small cells following Corollary 2 and Corollary 6, or advanced multi-antenna techniques. Moreover, realistic small cell networks may be more lightly loaded [31]. For tractability, we assume that the load of a small cell does not exceed its maximum load K_max. As suggested in [32], the average delay for requesting a content from a typical small cell can be expressed as

D = q_hit Q / E{R_a} + (1 − q_hit) (T_1 + Q / E{R_{a′}}),    (19)

where T_1 is the massive MIMO backhaul time detailed in Section III-B, and E{R_a} and E{R_{a′}} are the average access rates of the cached and self-backhauled content delivery, respectively, which are given by (20), in which the complementary cumulative distribution functions of R_a and R_{a′} are obtained by using the approach in Appendix A.
Given the hit probability (i.e., for a fixed cache size), the spectrum fraction η = η_o satisfying E{R_a} = E{R_{a′}} can be easily obtained by a one-dimension search, since E{R_a} − E{R_{a′}} is an increasing function of η.
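The search for η_o, together with an average-delay evaluation, can be sketched as follows. The linear rate gap is a hypothetical stand-in for E{R_a} − E{R_{a′}} from (20), and the delay function assumes the cache-hit weighting of the two delivery cases described above.

```python
def avg_delay(q_hit, Q, rate_a, rate_ap, T1):
    """Average delay sketch: cached requests take Q/E{Ra}; non-cached
    requests take the backhaul time T1 plus Q/E{Ra'} (assumed weighting)."""
    return q_hit * Q / rate_a + (1.0 - q_hit) * (T1 + Q / rate_ap)

def find_eta_o(rate_gap, lo=0.0, hi=1.0, iters=60):
    """Bisection for E{Ra}(eta) = E{Ra'}(eta), valid because the gap
    E{Ra} - E{Ra'} is increasing in eta."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if rate_gap(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

gap = lambda eta: eta - 0.35  # hypothetical increasing rate gap in eta
eta_o = find_eta_o(gap)
print(round(eta_o, 3))  # -> 0.35
```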
Corollary 8: When η < η_o, the average delay of self-backhauled content delivery could be lower than that of cached content delivery if the number of massive MIMO antennas meets the corresponding threshold.

V. NUMERICAL RESULTS

A. Cached Content Delivery
In this subsection, we illustrate the cell load, the SCD probability, and the minimum required SBS density when the requested content is cached at the associated SBS. Fig. 2 shows the complementary cumulative distribution function (CCDF) of the small cell load based on (6), which precisely matches the Monte Carlo simulations. The CCDF is a decreasing function of the number of UEs served in a small cell, since fewer resources are allocated to each UE when more UEs are served. Fig. 3 shows the SCD probability when the requested content is cached at the associated SBS, based on Theorem 1 and Fig. 2. We see that for a fixed cache size, the SCD probability decreases when the system requires a higher SCD threshold ε, since higher ε reduces the maximum allowable cell load, as suggested in Fig. 2. Moreover, given ε, the SCD probability decreases with increasing cache size. The reason is that the hit probability increases with the cache size, i.e., UEs are more likely to obtain the requested contents cached by their associated SBSs, which results in more interference in the same frequency band and reduces the maximum allowable cell load. Fig. 4 shows the minimum required SBS density to avoid the overload issue given the UE density λ_U. Without loss of generality, we assume that the maximum allowable load of a small cell is K_max^a = 5 in this figure (note that for a specified system performance requirement, the maximum small cell load is obtained from (6), as illustrated in Fig. 2). The numerical result precisely matches the analysis in Corollary 2. We see that when the probability that more than K_max^a UEs need to be served in a small cell is not larger than ρ = 0.1, the minimum required SBS density satisfies λ_U/λ_S = K_max^a + 1 = 6, as confirmed in (8). When the system requires a lower ρ (i.e., a lower overload probability), the density ratio λ_U/λ_S decreases, which means that more SBSs need to be deployed.

B. Massive MIMO Backhaul Transmission
In this subsection, we focus on the massive MIMO backhaul achievable rate, which determines the backhaul time when an SBS obtains the requested content from its associated MBS. Note that the macrocell load and the minimum required MBS density have been studied in Section III-B; the results are similar to Theorem 1 and Corollary 2, and numerical results can be easily obtained by following Figs. 2 and 4. Fig. 5 shows the backhaul achievable rate based on (9) and (10), respectively, which tightly match the simulated exact curves. We see that the backhaul achievable rate decreases as the macrocell load increases, since each SBS obtains less transmit power and smaller array gains. Adding more massive MIMO antennas improves the achievable rate because of larger array gains.

C. Latency
In this subsection, we evaluate the average delay in two scenarios: 1) the requested content is cached at the associated SBS; and 2) the requested content is not cached and needs to be obtained via massive MIMO backhaul. Fig. 6 shows the average delay for different cache sizes. The analytical curves are obtained based on the average rates given by (20). We see that the average delay for cached content delivery is lower than that for self-backhauled content delivery. The average delay for cached content delivery increases with the cache size. In contrast, the average delay for self-backhauled content delivery decreases with the cache size. The reason is that a larger cache size results in a higher hit probability, so more SBSs provide cached content delivery, which results in more inter-SBS interference in the frequency band allocated to the cached content delivery and less inter-SBS interference in the frequency band allocated to the self-backhauled content delivery. In addition, the backhaul time T_1 is much lower than the access time when using massive MIMO backhaul.

VI. CONCLUSION
In this paper, we have studied content delivery in cache-enabled HetNets with massive MIMO backhaul. We derived the successful content delivery probability for both cached and self-backhauled content delivery, and obtained the minimum required base station densities for avoiding overloading. We demonstrated that the hit probability needs to be properly determined in order to achieve successful content delivery. We showed the interplay between cache size and time-frequency resource allocation from the perspective of successful content delivery probability.
We characterized the latency in terms of the average delay in such networks, and showed that when UEs request non-cached contents, the average delay of the non-cached content delivery could be comparable to that of the cached content delivery with the help of massive MIMO aided self-backhaul in some cases.

APPENDIX A: PROOF OF THEOREM 1

Based on (2), the SCD probability is calculated as (A.1), where P_{λ_U/λ_S}(k) is the probability mass function (PMF) of the event that k − 1 other UEs (besides the typical UE) are served by the typical SBS, and Λ(k) is the conditional SCD probability given K = k. According to [33], P_{λ_U/λ_S}(k) can be calculated as (A.2), where γ = 3.5 [29]. Given K = k, Λ(k) is calculated as (A.3), where f_{|X_o|}(x) = 2πλ_S x exp(−πλ_S x²) is the probability density function (PDF) of the distance between the typical UE and its associated SBS, and Υ_1(x) is the conditional SCD probability given K = k and |X_o| = x. Considering that dense cellular networks are interference-limited in practice, the effect of the noise power on the performance is negligible.
As such, we can evaluate Υ_1(x) as (A.4), where step (a) follows from the probability generating functional of the PPP [34]. By substituting (A.4) into (A.3), Λ(k) can be derived in closed form as (A.5). Based on (A.5), the maximum load K_max^a of a typical small cell is given by (A.6), where ε is the threshold such that SCD occurs when Λ(k) ≥ ε. Although a closed-form solution of (A.6) with respect to (w.r.t.) k = K_max^a is infeasible, it can be quickly obtained by a one-dimension search, as detailed in Algorithm 1, because Λ(k) is a decreasing function of k. The SCD probability in (A.1) is then rewritten as (A.7), where P_{λ_U/λ_S}(k) and K_max^a are defined by (A.2) and (A.6), respectively, which completes the proof of Theorem 1.

APPENDIX B: PROOF OF COROLLARY 2
After obtaining K_max^a, we can determine how many small cells are sufficient to serve a specified UE density λ_U, since serving more than K_max^a UEs in a small cell cannot achieve SCD. Setting P_{λ_U/λ_S}(k = K_max^a + 1) = ρ with arbitrarily small ρ > 0, we need to guarantee P_{λ_U/λ_S}(k) ≤ ρ for all k > K_max^a, in order to avoid content delivery failure resulting from overloading, as stated in (B.1). We can intuitively interpret (B.1) as follows: given the maximum load K_max^a, the probability of serving more than K_max^a UEs should decrease as more UEs are added. From (B.2), the minimum required density of SBSs satisfies the bound in Corollary 2, where μ_a ∈ (0, K_max^a γ/(γ+1)] is the solution of P_{λ_U/λ_S = μ_a}(k = K_max^a + 1) = ρ, and can be easily obtained by a one-dimension search, since P_{λ_U/λ_S = μ_a}(k = K_max^a + 1) is an increasing function of μ_a for μ_a ∈ (0, K_max^a γ/(γ+1)]. Thus, we obtain the minimum required SBS density for avoiding overloading.
APPENDIX C: DETAILED DERIVATION OF (9)

Since the typical SBS is associated with the nearest MBS, the PDF of the typical communication distance |Y_o| is f_{|Y_o|}(y) = 2πλ_M y exp(−πλ_M (y² − r_b²)), y ≥ r_b, where r_b is the minimum distance between the typical MBS and its associated SBS. According to (3) and [26,27], the achievable transmission rate can be written as (C.2), where C_b(y) = log_2 (1 + (P_b/S_o) Ξ_1(y) / ((P_b/S_o) Ξ_2(y) + Ξ_3(y) + σ_b²)) with Ξ_1(y) = L(y) (E{√g_o})², Ξ_2(y) = L(y) var(√g_o), and Ξ_3(y) = E_{|Y_o|=y}{I_b}.
APPENDIX D: PROOF OF COROLLARY 3

According to Stirling's formula, i.e., Γ(x + 1) ≈ √(2πx)(x/e)^x as x → ∞, we have E{√g_o} = Γ(N − S_o + 3/2)/Γ(N − S_o + 1) ≈ √(N − S_o) as N → ∞, and hence var(√g_o) ≈ 1/4; thus Ξ_2(y) ≈ L(y)/4. By using Jensen's inequality [35], we derive a tight lower bound on the achievable transmission rate (C.2) as (D.1). For large N, based on (D.1), Δ_1 can be asymptotically derived as (D.2), where Ei(z) = −∫_{−z}^{∞} (e^{−t}/t) dt is the exponential integral. Then, Δ_2 can be asymptotically calculated as (D.5). Combining these results confirms Corollary 3.

APPENDIX E: PROOF OF THEOREM 2
Based on (4), the SCD probability is given by (E.1), where P_{λ_U/λ_S}(k) is given by (A.2), and the maximum load K_max^b of a typical small cell is the solution of Λ(k)|_{k=K_max^b} = ε. Then, the SCD probability is obtained as (13).

APPENDIX F: PROOF OF COROLLARY 7
Based on (6) and (14), we see that K_max^a ≥ K_max^b if (T_th − T_1)/T_th ≤ η/(1 − η) and q_hit ≤ 1/2. In this case, the SCD probability in (16) reduces to (17), and the corresponding cache size is obtained by considering Corollary 5 and q_hit ≤ 1/2. Likewise, K_max^a < K_max^b if (T_th − T_1)/T_th > η/(1 − η) and q_hit > 1/2, and we can obtain (18) accordingly.