Content Caching and Channel Allocation in D2D-Assisted Wireless HetNets

The fifth generation cellular networks are required to provide very fast and reliable communications while dealing with the increase in user traffic. In heterogeneous networks (HetNets) assisted with device-to-device (D2D) communications, traffic can be offloaded to small base stations or to users to make content delivery faster and alleviate the traffic burden on the core network. In this paper, we aim to maximize the probability of successfully delivering files to users, referred to as the successful delivery probability, by jointly optimizing the caching placement and channel allocation in cache-enabled D2D-assisted HetNets. First, an analytical expression of the average content delivery delay is derived. The latter is subsequently used to formulate the joint optimization problem of cache placement and channel allocation. Due to the problem's non-convexity, a linearization transformation is proposed, which allows the optimal solution to be found. However, given the high complexity of the problem, we propose a low-complexity heuristic approach for channel allocation and caching. Numerical results illustrate the efficacy of the proposed solutions and compare them to the conventional HetNet. Finally, the impact of several key parameters, e.g., transmit power, caching capacity, and QoS requirements, is investigated, which provides design guidelines for D2D-assisted HetNets.


I. INTRODUCTION
With the fast growth of new heavy mobile applications such as live streaming, language processing, and augmented reality, future wireless networks are required to focus not only on communication and connectivity, but also on high caching in order to satisfy stringent application demands [1].
The proposal of heterogeneous networks (HetNets) using small cells and different radio technologies improves the energy efficiency (EE) and spectral efficiency (SE) performances [2]. However, the ultra-dense small cell deployment and its coexistence/interaction with existing macro base stations (MBSs) create interference issues, which require efficient radio resource allocation and interference management techniques [3]. Moreover, device-to-device (D2D) communications have been envisioned as an essential component of future wireless networks. By allowing two users (UEs) to communicate directly without going through the cellular network, radio resources can be further exploited to enhance SE, reduce transmission delay, and offload the backhaul's traffic [4].
The associate editor coordinating the review of this manuscript and approving it for publication was Ivan Wang-Hei Ho.
On one hand, many works leveraged the advantages of D2D in the HetNet architecture. For instance, Malandrino et al. solved in [5] the uplink and downlink traffic scheduling problem of a long term evolution (LTE) two-tier HetNet with support for D2D communications. The authors of [6] studied the power allocation and cell selection problems to maximize EE for an LTE advanced HetNet with support for D2D communications and relaying. They showed that a higher number of UEs can perform D2D communications even for stringent quality-of-service (QoS) requirements. In [7], Huang et al. proposed a centrally controlled framework for D2D communication underlaying a two-tier cellular network. The objective was to maximize the sum rate of the network under individual transmit power and rate constraints. The formulated problem encompasses cellular, dedicated and shared D2D modes, frequency sharing and power control.
The authors in [8] designed an offloading scheme based on the usage of small base stations (SBSs) to offload traffic from the MBS, and also to use some UEs as relays to reach users not in the coverage areas of the SBSs. Similarly, Tsiropoulos et al. proposed in [9] a cooperation framework for heterogeneous devices to improve the spectrum access. Finally, the authors of [10] optimized the EE of D2D communications in a hybrid cellular network composed of millimeter wave and microwave cells, using power control and radio resource allocation, whereas Hao et al. investigated in [11] the EE-SE tradeoff for D2D communications underlying a HetNet. They proposed a two-stage solution to solve the power and spectrum allocation problems.
On the other hand, caching has attracted attention due to its ability to reduce the backhaul traffic and eliminate duplicate transmissions of popular content [1], [12]. Caching has been investigated for use in different network types, namely with macro cells, small cells, or D2D [13]. Peng et al. proposed in [14] content placement in MBSs to minimize the average download delay of files by users. Also, the authors of [15] [20] the D2D link scheduling and power allocation problems to maximize the system's throughput. In addition, authors of [21] maximized the offloading gain of cache-enabled D2D networks by jointly optimizing caching and scheduling policies. Yi et al. proposed in [22] a traffic offloading framework with social-aware D2D content sharing and caching. In [23], the authors presented a caching algorithm to minimize the average transmission delay in a D2D-enabled macro-cell. The proposed algorithm performed better than popularity-based naive caching. This work was extended in [24] to unknown system files popularity and varying system parameters over time, where a non-parametric learner is used to estimate the intensity function of file requests. Authors of [25] proposed a caching policy for millimeter-wave (mmWave) D2D-based cellular networks, where content is strategically disseminated and exchanged among D2D users for maximized offloading gain, while Amer et al. designed in [26] a D2D caching framework with inter-cluster cooperation, where nodes of the same cluster cooperate through D2D communications, while nodes of different clusters exchange data through cellular transmissions. The formulated problem aimed to minimize the network's average delay under caching, queuing and energy constraints. Sun et al. proposed in [27] mobility and delay-aware caching methods for D2D-enabled cellular networks. The aim is to minimize the average transmission and cache leasing costs. 
They showed that their successive convex approximation (SCA)-based and greedy algorithms incur lower cost than benchmark methods. In [28], Zhang et al. studied user preference and transmission coverage region aware D2D caching deployment methods, which aimed to maximize a cache utility function, while in [29], the authors investigated multi-winner auction based cache placement in order to minimize the traffic load and average content access delay of D2D-assisted cellular networks. Finally, the authors of [30] investigated the segmented caching problem in D2D overlay networks, aiming to minimize the average download latency. They proposed a dynamic caching algorithm that outperforms random and popularity-based caching.
Caching has also been studied in integrated D2D-assisted multi-tier networks. For instance, Yang et al. integrated in [31] caching at the UEs, where each user can obtain the content from its own cache or from the cache of an SBS. They demonstrated significant energy savings compared to UEs without caching, whereas the authors of [32] proposed and analyzed cache-based content delivery in a three-tier HetNet, where popular contents are disseminated in SBSs and in a part of the UEs. Li et al. developed in [33] a multi-tier cache-enabled HetNet framework, where MBS, SBSs and pico BSs coexist. They proposed an optimal distributed caching scheme to maximize the successful delivery probability and proved that the optimal solution depends on the cache sizes and BS densities. In [34], the authors proposed a similar framework, but they aimed to minimize the average file transmission delay through caching and bandwidth allocation. Also, Quer et al. proposed in [35] a proactive caching strategy that takes into account user mobility and user interest classes for content. In our previous work [36], we optimized caching and bandwidth allocation to minimize the average transmission delay of a D2D-assisted HetNet. We divided the problem into bandwidth allocation and caching problems. The first problem was optimally solved. Then, given a bandwidth allocation strategy, a random search algorithm and a greedy algorithm were proposed to determine the caching policies. Also, we investigated in [37] the caching placement problem in a D2D-assisted HetNet, where we successfully derived the optimal solution that minimizes the average content delivery delay. In [38], the authors aimed to minimize the EE of D2D-aided HetNets through SBSs and D2D data offloading and SBSs power optimization. Finally, Fu et al. proposed in [39] cooperative content caching and delivery based on multicast.
They solved the problem using a hybrid genetic algorithm (HGA) and demonstrated its superiority compared to conventional algorithms. The related works are summarized in Table 1.
The aforementioned works studied caching placement and delivery in D2D-aided HetNets in different ways. However, most of them relied on the assumption of slow fading channels during content delivery. Such an assumption is valid in a controlled fixed environment; in practice, however, the channel varies rapidly during transmissions due to shadowing and/or mobility. Moreover, many of these works assumed orthogonal transmissions, thus ignoring the interference management issue. In an attempt to overcome these shortcomings, we propose in this paper to investigate a multi-tier D2D-assisted HetNet, taking into account caching at all tiers, interference among simultaneous transmissions, and the effect of fast varying channels on the content delivery delay.
The main objective is to maximize the successful delivery probability (SDP), defined as the probability of successfully delivering files to users, under channel resources, caching, and delay constraints. To the best of our knowledge, this is the first work that investigates the joint caching placement and channel allocation problem for SDP maximization, given the effect of fast varying channels. The contributions of this paper are summarized as follows:
1) We analytically derive a Chernoff-based upper bound of the average content delivery delay. The obtained novel expression is used in the formulation of the joint caching and channel allocation problem, aiming to maximize the SDP performance of the system. Since the formulated problem is non-convex and non-linear, we propose to transform it into an integer linear program (ILP) and subsequently solve it optimally using the IBM CPLEX solver.
2) Due to the high convergence time of the optimal solution, we propose instead a low-complexity heuristic algorithm to solve the formulated problem. Specifically, we start by solving the channel allocation problem based on a greedy geometrical method. Then, given a channel allocation strategy, the caching placement problem is solved using an iterative approach.
3) Through numerical results, the superiority of the proposed solutions over the conventional cache-enabled HetNet is demonstrated. Moreover, the impact of key parameters is presented, and related design guidelines are identified.
The rest of the paper is organized as follows. In Section II, the system model is presented. Section III derives the content delivery delay and formulates the optimization problem. Section IV details the proposed solutions, while in Section V, numerical results are presented. Finally, Section VI closes the paper.

II. SYSTEM MODEL
In this section, we describe the communication model and the content caching and delivery model. Symbols used in the remainder of the paper are summarized in Table 2.

A. COMMUNICATION MODEL
The network consists of one MBS, S SBSs and U UEs, where SBS $s \in \mathcal{S} \equiv \{s_1, \ldots, s_S\}$ and UE $u \in \mathcal{U} \equiv \{u_1, \ldots, u_U\}$, such that $U > S > 1$. Both SBSs and UEs are deployed randomly within the MBS's coverage area, as illustrated in Fig. 1.
We assume that the MBS and SBSs can share the same frequency channels. Moreover, D2D communications reuse the same cellular channels as the BSs [40]. We assume that intra-cell interference occurs when channels are reused at the same time. The number of orthogonal frequency channels is set to W, each with bandwidth B, such that $w \in \mathcal{W} \equiv \{w_1, \ldots, w_W\}$. We define by $r_{u,w}$ the binary variable indicating whether channel w is allocated to the transmission having user u as its receiver, such that $\sum_{w=1}^{W} r_{u,w} = 1$, i.e., a transmission is allocated a single channel. Moreover, we assume that time is divided into time slots (TSs), and in each TS $t \in \mathbb{N}$, transmissions may occur for one of the two following purposes: 1) off-peak content placement, where caching memories of the nodes are updated during low-traffic periods, and 2) content delivery to satisfy users' requests. In our system, the MBS, SBS s and user u transmit signals using powers $P_m$, $P_S$ and $P_U$ respectively, such that $P_m > P_S > P_U$. Also, we assume that wireless channels are modeled as Rayleigh channels that are constant within a TS of duration $\tau$, but vary from one TS to another. Finally, the communications experience additive white Gaussian noise (AWGN) with zero mean and variance $\sigma_0^2$.

B. CONTENT CACHING AND DELIVERY MODELS

1) CONTENT CACHING
We assume that F files of the same size L kilobits (kb) belong to a library $\mathcal{F} \equiv \{1, \ldots, F\}$, which is fully stored in the core network [32]. We assume that the MBS, SBS s and UE u have caching capacities $C_m$ Mb, $C_s$ Mb and $C_u$ Mb respectively, such that $FL > C_m > C_s > C_u$, $\forall s \in \mathcal{S}$ and $u \in \mathcal{U}$. We assume that user u belongs to a specific class of interest $k \in \mathcal{K} = \{1, \ldots, K\}$ that provides a specific ranking order of the file popularity. We introduce the probability that user u belongs to class k, defined as $p_u^k \in [0, 1]$, where $\sum_{k=1}^{K} p_u^k = 1$, $\forall u \in \mathcal{U}$. The definition of $p_u^k$ is motivated by the fact that the user's interests may vary in time due to its environment, behavior or events [35]. For each class k, the file popularity follows a Zipf distribution, and the probability that user u in class k requests file f can be given by

$$q_f^k = \frac{\eta(f,k)^{-\beta}}{\sum_{f' \in \mathcal{F}} \eta(f',k)^{-\beta}}, \quad (1)$$

where $\eta(f,k)$ (resp. $\eta(f',k)$) is the rank of file f (resp. of file $f'$) for a user in class k, and $\beta \geq 0$ reflects how skewed the popularity distribution is: larger exponents $\beta$ correspond to higher content reuse, i.e., the first few popular files account for the majority of requests.
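The Zipf popularity model above can be illustrated with a short numerical sketch. The function names, the toy rankings and the class probabilities below are hypothetical and chosen only for illustration; the second helper mixes the per-class popularities with the class-membership probabilities $p_u^k$, which is the class-averaged popularity used later by the heuristic.

```python
def zipf_popularity(ranks, beta):
    """Request probabilities for one class, given the rank eta(f, k) of each file.

    ranks[f] is the 1-based popularity rank of file f in the class.
    """
    weights = [rank ** (-beta) for rank in ranks]
    total = sum(weights)
    return [w / total for w in weights]


def average_popularity(class_probs, class_rankings, beta):
    """Mix per-class Zipf popularities with class-membership probabilities p_u^k."""
    num_files = len(class_rankings[0])
    mixed = [0.0] * num_files
    for p_k, ranks in zip(class_probs, class_rankings):
        q_k = zipf_popularity(ranks, beta)
        for f in range(num_files):
            mixed[f] += p_k * q_k[f]
    return mixed


# Hypothetical toy instance: 4 files, 2 classes with opposite rankings.
rankings = [[1, 2, 3, 4], [4, 3, 2, 1]]
q_class0 = zipf_popularity(rankings[0], beta=2.0)
q_user = average_popularity([0.7, 0.3], rankings, beta=2.0)
```

With $\beta = 2$, the top-ranked file of a class already attracts most of that class's requests, which is the "higher content reuse" effect described above.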
For the sake of simplicity, we assume that a file cannot be cached at more than one node within the network. Indeed, redundant caching may lead to a fast saturation of storage resources, hence slowing down caching updates when old files need to be replaced by new ones. Moreover, caching at node $i \in \mathcal{C} = \mathcal{U} \cup \mathcal{S} \cup \{\mathrm{MBS}\}$ is represented by a binary vector $\mathbf{c}_i = [c_{i,1}, \ldots, c_{i,F}]$, where $c_{i,f}$ indicates whether node i caches file f or not [13].

2) CONTENT DELIVERY
When user u requests file f, it starts by checking its own cache memory. If the content is available locally, it is obtained directly without any delay. Otherwise, the request is forwarded to the system controller (typically located within the MBS), which decides which device (among MBS, SBSs, UEs, or core network) will serve user u's request. For proper operation, it is assumed that the controller, e.g., the MBS, has knowledge of the nodes' caching status and of the statistics of the channel states within its cell. Also, we assume that a file delivery occupies one channel resource only, can span several TSs with varying channel states, and is realized by a single source node only. Finally, only the MBS and SBSs can simultaneously deliver different contents to UEs using orthogonal channels, while a D2D transmitter can serve only one user.
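The request flow described above can be sketched as a simple lookup. This is only an illustration under the single-copy caching assumption: because a file is stored at no more than one node, the serving node is unique once the caching status is known. The dictionary layout, node names and the `"core"` label are hypothetical, and the sketch does not model channel assignment or the one-receiver limit of D2D transmitters.

```python
def select_source(user, file_id, cache):
    """Return the node that serves `file_id` to `user` under single-copy caching.

    cache maps a node name to the set of file ids it stores (assumed layout).
    """
    if file_id in cache.get(user, set()):
        return user  # local hit: no transmission, zero delay
    for node, files in cache.items():
        if node != user and file_id in files:
            return node  # unique holder, since a file is cached at most once
    return "core"  # not cached anywhere: fetched over the backhaul


# Hypothetical caching status known to the controller.
cache = {"MBS": {0, 1}, "SBS1": {2}, "UE1": {3}, "UE2": set()}
```

For example, a request from `UE2` for file 3 resolves to the D2D transmitter `UE1`, whereas a request for an uncached file falls back to the core network.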

III. CONTENT DELIVERY DELAY EXPRESSION AND PROBLEM FORMULATION
In this section, we define the system's average content delivery delay and derive its upper-bound expression. The latter is then used to formulate the SDP optimization problem.

A. CONTENT DELIVERY DELAY AND UPPER-BOUND DERIVATION
The transmission delay between two nodes i and j is defined as the minimum number of TSs needed to transmit a given file f from i to j. For a time-varying channel, this delay can be written as

$$T_{ij} = \min\left\{ T \in \mathbb{N} : \tau \sum_{t=1}^{T} R_{ij}(t) \geq L \right\}, \quad (2)$$

where $R_{ij}(t)$ is the channel's data rate in TS t. Without loss of generality, the data rate can be expressed by

$$R_{ij}(t) = B \log_2\left(1 + \mathrm{SINR}_{ij}(t)\right), \quad \mathrm{SINR}_{ij}(t) = \frac{P_i |h_{ij}(t)|^2}{\sigma_0^2 + \sum_{i' \in I_j} P_{i'} |h_{i'j}(t)|^2}, \quad (3)$$

where $P_i$ is the transmit power of node $i \in \mathcal{C}$, $h_{ij}(t)$ is the channel coefficient capturing both small-scale fading and large-scale path loss $(d_{ij}/d_0)^{-\alpha}$, with $d_0$ a reference distance and $\alpha$ the path-loss exponent, $|h_{ij}(t)|^2$ is the channel gain, which follows an exponential distribution with mean $(d_{ij}/d_0)^{-\alpha}$ (i.e., $h_{ij}(t)$ has zero mean and variance $(d_{ij}/d_0)^{-\alpha}$), $\sum_{i' \in I_j} P_{i'} |h_{i'j}(t)|^2$ is the interference signal, and $I_j$ is the set of interferers.

The average content delivery delay of link i − j, denoted by $\bar{T}_{ij}$, is defined by [14]

$$\bar{T}_{ij} = \mathbb{E}\left[T_{ij}\right] = \sum_{T=0}^{\infty} P\left(T_{ij} > T\right), \quad (4)$$

where $\mathbb{E}$ is the expectation operator and $P(T_{ij} > T)$ is the probability that $T_{ij}$ is above T. This probability is given by

$$P\left(T_{ij} > T\right) = P\left(\tau \sum_{t=1}^{T} R_{ij}(t) < L\right) = P\left(Z_T < \frac{L}{B\tau}\right), \quad (5)$$

where $Y_t = \log_2\left(1 + \mathrm{SINR}_{ij}(t)\right)$ and $Z_T = \sum_{t=1}^{T} Y_t$.

Theorem 1: $P(T_{ij} > T)$ is upper-bounded by $\zeta_0(T)$ when $I_j = \emptyset$, and by $\zeta_1(T, I_j)$ when $I_j \neq \emptyset$, where $g_{ij}(T, \emptyset) = \zeta_0(T)$ and $g_{ij}(T, I_j) = \zeta_1(T, I_j)$, $\forall I_j \neq \emptyset$.

We define by $\bar{D}_{u,f}$ the average delay of delivering file f to user u. For the sake of simplicity and expression tractability, we assume that when interference occurs, it is dominated by the strongest interferer. Consequently, $\bar{D}_{u,f}$ can be upper-bounded by (9), where $\mathcal{T}_{u,f}$ is the set of potential transmitters of f to user u, $\bar{T}_0$ is the average transmission delay on the backhaul link, $P[\mathcal{T}_{u,f} = \emptyset]$ is the probability that no node is a potential transmitter of file f to user u, $P[\mathcal{T}_{u,f} = \{x\}]$ is the probability that node x is a potential transmitter of file f to user u, $P[I_u = \emptyset]$ is the probability that no node is interfering on user u's communication, and $P[I_u = \{y\}]$ is the probability that the transmission of node y interferes with user u's reception.
Specifically, in the expressions of these probabilities, the first two sums in (12) correspond to the channels allocated to the users, and the next two sub-sums correspond to the class and the file requested by user $u' \neq u$.
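The definition of the delay as the minimum number of TSs needed to deliver a file can be checked numerically with a small Monte Carlo sketch. It assumes an interference-free Rayleigh block-fading link, so the per-slot gain is drawn from a unit-mean exponential and scaled by a mean SNR; the function names and the 0 dB mean SNR are hypothetical choices, while L = 100 kb, B = 10 MHz and a 10 ms slot follow the values used in Section V. The sketch also verifies that the sample mean of the delay equals the sum of the empirical tail probabilities $P(T_{ij} > T)$, which is the identity behind the average-delay expression.

```python
import math
import random


def delivery_delay(file_bits, bandwidth_hz, slot_s, mean_snr, rng, max_slots=10_000):
    """Number of time slots needed to deliver one file over a Rayleigh block-fading link."""
    delivered = 0.0
    for slot in range(1, max_slots + 1):
        gain = rng.expovariate(1.0)  # |h|^2 ~ Exp(1), redrawn in every slot
        rate = bandwidth_hz * math.log2(1.0 + mean_snr * gain)  # bits/s in this slot
        delivered += slot_s * rate
        if delivered >= file_bits:
            return slot
    return max_slots


def average_delay(samples):
    """Sample mean of the delay, in slots."""
    return sum(samples) / len(samples)


def tail_sum(samples):
    """Sum over T >= 0 of the empirical probability that the delay exceeds T."""
    n = len(samples)
    return sum(sum(1 for s in samples if s > t) for t in range(max(samples))) / n


rng = random.Random(0)
# Assumed link: L = 100 kb, B = 10 MHz, tau = 10 ms, mean SNR = 0 dB (hypothetical).
samples = [delivery_delay(100e3, 10e6, 10e-3, 1.0, rng) for _ in range(2000)]
```

Calling `average_delay(samples)` and `tail_sum(samples)` returns the same value, and raising `mean_snr` shortens the average delay, as expected from the rate expression.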

B. PROBLEM FORMULATION
We formulate the joint caching and channel allocation problem aiming to maximize the SDP performance as problem (P1), where $\mathbf{X} = [x_{u,f}]_{U \times F}$ is the binary matrix indicating the successful deliveries in the network, $\mathbf{C} = [c_{i,f}]_{|\mathcal{C}| \times F}$ is the caching placement matrix, $\mathbf{R} = [r_{u,w}]_{U \times W}$ is the channel allocation matrix, and G is a large constant. Constraint (P1.a) imposes that, on average, the delivery delay of file f to user u cannot exceed a threshold $D_{th}$, where we opted for the use of the upper bound $G_{u,f}$. Constraints (P1.b)-(P1.c) express caching placement with memory limitation at node i and no caching redundancy. Constraints (P1.d)-(P1.e) indicate that one and only one channel w is allocated for transmissions towards a given user, and that a channel can be reused R times at most. Without loss of generality, R is set to satisfy $R \geq \lceil U/W \rceil$ for fair channel allocation, where $\lceil \cdot \rceil$ is the ceiling function. Finally, (P1.f)-(P1.h) reflect the binary nature of the optimization variables.
The defined optimization problem is non-linear due to the several product terms within the average content delivery delay expression in (P1.a), and is non-convex due to the binary constraints. Nevertheless, in what follows, we propose a method to transform it into an ILP and solve it optimally.
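The channel constraints described for (P1.d)-(P1.e) can be illustrated with a small feasibility check. This is a sketch based only on the verbal description above (one channel per user, at most R users per channel); the function names and the toy allocation matrix are hypothetical.

```python
def min_reuse_factor(num_users, num_channels):
    """Smallest reuse factor R that lets every user hold a channel: ceil(U / W)."""
    return -(-num_users // num_channels)  # integer ceiling division


def is_feasible_allocation(r, reuse_limit):
    """Check a U x W binary allocation matrix given as a list of rows.

    Each user (row) must hold exactly one channel, and each channel (column)
    may serve at most `reuse_limit` users.
    """
    one_channel_each = all(sum(row) == 1 for row in r)
    loads = [sum(col) for col in zip(*r)]
    return one_channel_each and all(load <= reuse_limit for load in loads)


# Hypothetical toy case: U = 4 users sharing W = 3 channels.
allocation = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 0, 0],
]
reuse = min_reuse_factor(4, 3)
```

Here `reuse` equals 2, and `is_feasible_allocation(allocation, reuse)` holds because the first channel is shared by exactly two users. For the default setting of Section V (U = 8, W = 3), the same rule gives a reuse factor of at least 3.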

IV. PROPOSED SOLUTIONS
We start in this section by proposing an approach to solve the problem optimally. Then, we develop a heuristic method that provides sub-optimal solutions but with significantly lower complexity.

A. OPTIMAL SOLUTION DESIGN
In order to make problem (P1) tractable, we propose to iteratively substitute the products of the binary variables in function $G_{u,f}$ by new single variables [35], [41]. The resulting new problem is (P2), where in (P2.a) $G_{u,f}$ is expressed by (14), and in (P2.i)-(P2.r) the auxiliary channel variables are built recursively from the factors $\delta^{u}_{u',w}$: the first one equals $\delta^{u}_{1,w}$, and each subsequent one is the product of its predecessor (index $u'-1$) and $\delta^{u}_{u',w}$, $\forall u' \in \mathcal{U}\setminus\{1\}$, with $\delta^{u}_{u,w} = r_{u,w}$ and $\delta^{u}_{u',w} = 1 - r_{u',w}$, $\forall u' \in \mathcal{U}\setminus\{u\}$. Hence, problem (P2) is an ILP, which we model using the AMPL language [42] and then solve using the IBM CPLEX Optimizer [43].
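The substitution of a product of binary variables by a single variable relies on the standard three-inequality linearization, the same form that appears in constraint (P2.r). The brute-force sketch below, with hypothetical function names, checks that the inequalities leave exactly one feasible value, equal to the product, and that chaining the substitution one factor at a time reproduces a longer product.

```python
from itertools import product


def linearized_values(a, b):
    """Binary values z that satisfy z <= a, z <= b and z >= a + b - 1."""
    return [z for z in (0, 1) if z <= a and z <= b and z >= a + b - 1]


def chain_product(bits):
    """Product of several binaries, built one factor at a time via the linearization."""
    acc = bits[0]
    for bit in bits[1:]:
        (acc,) = linearized_values(acc, bit)  # exactly one feasible value exists
    return acc


# Enumerate all binary pairs and record the feasible values of the new variable.
checks = {(a, b): linearized_values(a, b) for a, b in product((0, 1), repeat=2)}
```

Each entry of `checks` contains the single value `a * b`, which is why the transformed problem has the same feasible set as the original one while being linear in its variables.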
To study the solving complexity of problem (P2), we determine the total number of variables involved. For the variables in $\mathbf{X}$, the expected number of nonzero $x_{u,f}$ for each file f can be defined by N. In (14), the number of variables is dominant in the last term. Indeed, we can rewrite it $\forall u \in \mathcal{U}$, $u' \in \mathcal{U}\setminus\{u\}$, $w \in \mathcal{W}$, $f \in \mathcal{F}$, and $f' \in \mathcal{F}\setminus\{f\}$. Since the maximum number of nonzero $r_{u,w}$ and $r_{u',w}$ is limited by R, $\forall w \in \mathcal{W}$, and that of $\phi^{x}_{|\mathcal{C}|,f}$ (resp. $\phi^{y}_{|\mathcal{C}|,f'}$) is bounded by 1, $\forall f \in \mathcal{F}$ and $\forall x \in \mathcal{C}\setminus\{u,u'\}$ (resp. $\forall f' \in \mathcal{F}\setminus\{f\}$ and $\forall y \in \mathcal{C}\setminus\{u,u',x\}$), then, the total number of variables in the optimization problem (P2) is in the order of

B. HEURISTIC SOLUTION DESIGN
Due to the high complexity of solving (P2) optimally, we propose the following heuristic channel allocation and caching algorithms. First, following a geometrical approach, channel allocation is realized to minimize the overall interference. Then, caching is performed to maximize the number of popularity-wise file-user pairs.

1) CHANNEL ALLOCATION
This process depends on the available number of channels and on the number of users. Indeed, if $W \geq U$, it is optimal to dedicate one channel per user. However, if $W < U$, channel allocation has to be adequately designed in order to minimize interference between simultaneous transmissions. To do so, we define the following parameters. Let $X = \mathrm{div}(U, W)$ and $R = \mathrm{mod}(U, W)$ be the quotient and remainder of the Euclidean division of U by W, respectively. For channel usage fairness, we impose that for $W - R$ channels, each channel has to be shared by X users, while for the remaining R channels, each one is shared by $X + 1$ users. Finally, we set a metric $\nu_i$; the key behind using it is that we aim to define similar polygons of widely spaced users. Subsequently, the channel allocation procedure can be described by Algorithm 1.

Constraint (P2.r) of problem (P2) reads: $r_{u,u',w} \leq r_{u,w}$; $r_{u,u',w} \leq r_{u',w}$; $r_{u,u',w} \geq r_{u,w} + r_{u',w} - 1$.

In the caching step, $\mathrm{SDP}_f$ denotes the successful delivery probability of file f and $Q_{u,f} = \sum_{k=1}^{K} p_u^k q_f^k$ is the average popularity of file f for user u, with respect to the K user classes.
We define by $Q_f = \sum_{u=1}^{U} Q_{u,f}$ the popularity metric of file f. By ranking $Q_f$ from the largest to the smallest value in a set $\mathcal{Q}$, an iterative file placement approach can be implemented. The latter is described by the following steps: 1) Starting with the most popular file in $\mathcal{Q}$, we find its best location within the network that achieves the maximal $\mathrm{SDP}_f$, with respect to the constraints; 2) we update the caching memories; 3) we move to the next file in $\mathcal{Q}$ and repeat the previous two steps; 4) we repeat the previous step until all files are processed. This approach is summarized in Algorithm 2.
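The iterative placement steps above can be sketched as a greedy loop. This is a simplified illustration rather than a transcription of Algorithm 2: the `score` callable is a hypothetical stand-in for the evaluation of $\mathrm{SDP}_f$ at a candidate node, capacities are expressed as a number of files, and only the memory and single-copy constraints are enforced.

```python
def greedy_placement(popularity, capacity, score):
    """Place files one by one, most popular first, at the best node with free space.

    popularity: dict file -> popularity metric Q_f
    capacity:   dict node -> number of files the node can store
    score:      callable (node, file) -> value standing in for SDP_f (assumption)
    """
    free = dict(capacity)
    placement = {}
    for f in sorted(popularity, key=popularity.get, reverse=True):
        candidates = [n for n in free if free[n] > 0]
        if not candidates:
            break  # all caches are full; remaining files stay in the core network
        best = max(candidates, key=lambda n: score(n, f))
        placement[f] = best  # single copy per file
        free[best] -= 1  # update the caching memory of the chosen node
    return placement


# Hypothetical toy instance: three files, two nodes with room for one file each.
popularity = {"a": 0.5, "b": 0.3, "c": 0.2}
capacity = {"MBS": 1, "UE1": 1}
placement = greedy_placement(
    popularity, capacity, lambda node, f: {"MBS": 0.9, "UE1": 0.6}[node]
)
```

In this toy instance the most popular file takes the best-scoring node, the second file takes the remaining node, and the least popular file is left to the core network. Each file requires one pass over the candidate nodes, in line with the per-file cost of O(U + S + 1) discussed next.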
The complexity of the heuristic approach is dominated by the larger of the complexities of Algorithms 1 and 2. The complexity of Algorithm 1 is straightforwardly O(UW), whereas in Algorithm 2, at each $Q_f$ loop, the cache placement complexity is O(U + S + 1). Hence, the overall complexity of Algorithm 2 is O((U + S + 1)F). Finally, the complexity of the heuristic approach is O(max(UW, (U + S + 1)F)), which is significantly lower than that of the optimal solution.

V. NUMERICAL RESULTS
Algorithm 2 (excerpt)
6: Calculate $\mathrm{SDP}_f$ when f is placed in node i, $\forall i \in \mathcal{C}$
7: ... and (P1.f)-(P1.g)
8: Set $c_{i_0,f} = 1$
10: else
11: Return to step 6
13: end if
14: end for
15: Return $\mathbf{C}$

In the following simulations, we assume a network of one MBS that occupies the center of a circular region with radius equal to 100 meters, four uniformly located SBSs within $10\sqrt{50} \approx 71$ meters from the MBS, and U = 8 randomly located users within the MBS's coverage area. Unless stated otherwise, the system's key parameters are set as follows. We set W = 3 channels, using bandwidth B = 10 MHz [34]. Moreover, we assume that the path-loss exponent for the wireless channels follows from the urban micro street canyon model and is equal to $\alpha = 3.1$ [44]. The duration of a TS is fixed to 10 ms [45], the delay threshold is chosen stringent as $D_{th}$ = 5 TS = 50 ms, which may correspond for instance to an infotainment service [46], while the average backhaul delay is $\bar{T}_0$ = 10 TS = 100 ms [47]. The transmit signal-to-noise ratios (SNRs) are defined and set to $\gamma_m = P_m/\sigma_0^2 = 20$ dB, $\gamma_S = P_S/\sigma_0^2 = 17$ dB, and $\gamma_U = P_U/\sigma_0^2 = 10$ dB. For caching, we assume K = 3 user classes, for which we arbitrarily assign the probabilities $p_u^k$ of belonging to specific user classes, while $\beta = 2$ reflects more centralized file requests [34]. For the sake of simplicity, we assume a small library F = 100 files, short files L = 100 kb, and small caching capacities $C_m$ = 500 kb, $C_s$ = 200 kb, and $C_u$ = 100 kb. In Fig. 2 we present the SDP performance as a function of $\gamma_U$ for the ''Optimal'' solution, the proposed ''Heuristic'', and the conventional heterogeneous network, called ''No D2D'' [34], given different $D_{th}$ and W values. The $\gamma_U$ values have been selected to reflect typical SNRs obtained for a low-power communication device (e.g., mobile phone), a medium-power communication device (e.g., laptop), and a high-power communication device (e.g., connected vehicle), respectively.
The optimal solution achieves SDP performances close to 1 in most cases. Also, the proposed heuristic achieves acceptable performances in most scenarios, which outperform those of the conventional ''No D2D'' method. In Figs. 2(a)-2(b), we notice that the SDP of the optimal and of our proposed schemes improves with $\gamma_U$. Indeed, with higher transmit power, several UEs become interesting candidates to cache files and deliver them to others in an interference-free environment (since W = U). In contrast, ''No D2D'' achieves the same SDP performance for any $\gamma_U$ since it is independent of D2D caching. In Figs. 2(c)-2(d), due to interference among several communications, the SDP degrades compared to the previous cases. This degradation is pronounced for the proposed heuristic and ''No D2D'' schemes in stringent QoS conditions, i.e., $D_{th} = 5$. Indeed, this case demonstrates the limits of the proposed heuristic approach, as it leads to sub-optimal performances. Nevertheless, the complexity is dramatically reduced compared to the optimal solution, as illustrated in Table 3.
For the remainder of this section, unless stated otherwise, only the SDP performances of the proposed heuristic solution are shown, where $\gamma_U$ = 10 dB, L = 100 kb, U = 22 users, and W = 3 channels. Fig. 3 illustrates the SDP as a function of $C_U$, for different numbers of users. $C_U$ values have been chosen to correspond to the caching of a single or several contents at the devices. When $C_U$ increases, the SDP performance improves. Indeed, higher $C_U$ favors more caching in users and leverages low-interference D2D communications. Moreover, as U grows, the SDP significantly improves due to a better caching policy, which places the files within better located users, compared to the scenario with a lower number of users. Hence, incentivizing users to participate by caching and delivering files to other users is advantageous in terms of SDP performance.

[...] significantly degrades the SDP performance. Hence, content segmentation may be beneficial when sufficient caching capacity is available within the devices. In addition, the SDP decreases when the number of SBSs is reduced. Indeed, a small S means that less caching capacity and fewer reliable wireless links are available to deliver files to users within the network. Consequently, network densification provides an SDP gain if the incurred interference can be efficiently mitigated.

In Fig. 5, the impact of the threshold $D_{th}$ on the SDP is studied for different $\bar{T}_0$ values. $D_{th}$ values are selected to reflect delay-tolerant, moderate, and critical services. When $D_{th}$ is small, the SDP performance decreases. This is expected since a stricter delay threshold would inevitably penalize file delivery. In general, a better backhaul link (small $\bar{T}_0$) is expected to improve the SDP since more file deliveries can be conducted through the core network. However, we see that this does not apply for $D_{th} = 11$.
Indeed, even with a better backhaul link, the caching policy prefers to rely on the devices within the network (i.e., MBS, SBSs, and UEs), enabled by the larger D th value, rather than on the core network that has a shorter backhaul delay.

VI. CONCLUSION
In this paper, we studied the joint caching and channel allocation for D2D-assisted wireless HetNets. First, we derived the expression of the average content delivery delay under fast varying channel conditions. Then, the latter was used to formulate the SDP maximization problem with respect to the communication and caching constraints. The joint problem was identified as non-linear and non-convex. To solve it optimally, we proposed a linearization approach that transforms it into an ILP. Since the optimal solution incurs a high computational complexity, we proposed a low-complexity two-step heuristic method. In the first step, the channel allocation strategy is determined using a geometrical approach, while in the second step, an iterative caching policy is developed. Through numerical results, we showed that our proposed heuristic algorithm is able to achieve between 60% and 90% of the optimal solution's SDP performance, in a very small execution time. Besides, it outperformed the conventional cache-enabled HetNet. Finally, the impact of key parameters was investigated. The obtained results provide the following design guidelines: 1) incentivizing users to participate in the caching process is beneficial to the network; 2) network densification may bring an additional performance gain; and 3) a better backhaul link is generally advantageous for stringent QoS requirements; however, loosened QoS favors more caching within the network devices and less use of the backhaul link.

APPENDIX A PROOF OF THEOREM 1
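By applying the Chernoff bound to (5), and assuming independence between the random variables (RVs) $Y_n$, $\forall n = 1, \ldots, T$, we obtain [49]

$$P\left(Z_T < \frac{L}{B\tau}\right) \leq e^{tL/(B\tau)} \prod_{n=1}^{T} \mathbb{E}\left[e^{-tY_n}\right], \quad \forall t > 0. \quad (18)$$

Let $Z_n = e^{-tY_n}$ and $X_n = \mathrm{SINR}_{ij}(n)$, $\forall n = 1, \ldots, T$. Then, the cumulative distribution function (cdf) of $Z_n$ is given by

$$F_{Z_n}(z) = P\left(e^{-t\log_2(1+X_n)} < z\right) = P\left(X_n \geq 2^{-\ln(z)/t} - 1\right) = 1 - F_{X_n}\left(2^{-\ln(z)/t} - 1\right), \quad (19)$$

where $F_{X_n}(x) = 1 - e^{-x/\vartheta_{ij}}$, $\forall x \geq 0$, is the cdf of RV $X_n$ in an interference-free environment. From (19), we derive the probability density function (pdf) of $Z_n$ in (20). Consequently, the mean of $Z_n$ is calculated in (21). By combining (21) into (18), $\zeta_0(T)$ is obtained, given by (6).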
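The step from the cdf of $X_n$ to the mean of $Z_n$ in the interference-free case can be sketched as follows. This is one possible way to carry out the computation under the exponential model for $X_n$ stated above, and it is not necessarily the form used in (20)-(21); the shorthand $s = t/\ln 2$ is introduced here only for brevity, and $\Gamma(\cdot,\cdot)$ denotes the upper incomplete gamma function.

```latex
\begin{align*}
\mathbb{E}[Z_n]
  &= \mathbb{E}\!\left[e^{-t\log_2(1+X_n)}\right]
   = \mathbb{E}\!\left[(1+X_n)^{-s}\right], \qquad s = \frac{t}{\ln 2}, \\
  &= \int_0^{\infty} (1+x)^{-s}\,\frac{1}{\vartheta_{ij}}\,e^{-x/\vartheta_{ij}}\,dx
   = \frac{e^{1/\vartheta_{ij}}}{\vartheta_{ij}} \int_1^{\infty} v^{-s}\,e^{-v/\vartheta_{ij}}\,dv \\
  &= e^{1/\vartheta_{ij}}\,\vartheta_{ij}^{-s}\,
     \Gamma\!\left(1-s,\ \tfrac{1}{\vartheta_{ij}}\right).
\end{align*}
```

The second line uses the substitution $v = 1 + x$, and the last line uses $v = \vartheta_{ij}\,w$ to match the definition of the incomplete gamma function. Since the $Y_n$ are assumed independent and identically distributed, the product in the Chernoff bound reduces to the $T$-th power of this mean.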
In an interfered environment, the cdf of X n can be obtained from (Eq. A.4, [34]) as