Online Learning to Cache and Recommend in the Next Generation Cellular Networks

Efficient caching can be achieved by accurately predicting the popularity of files. It is well known that the popularity of a file can be nudged through recommendation, and hence it can be estimated accurately, leading to an efficient caching strategy. Motivated by this, in this paper, we consider the problem of joint caching and recommendation in a 5G and beyond heterogeneous network. We model the influence of recommendation on demands by a Probability Transition Matrix (PTM). The proposed framework consists of estimating the PTMs and using them to jointly recommend and cache files. In particular, this paper considers two estimation methods, namely a) <monospace>Bayesian estimation</monospace> and b) a genie-aided <monospace>Point estimation</monospace>. An approximate high probability bound on the regret of both estimation methods is provided. Using this result, we show that the approximate regret achieved by the genie-aided <monospace>Point estimation</monospace> approach is <inline-formula> <tex-math notation="LaTeX">$\mathcal {O}(T^{2/3} \sqrt {\log T})$ </tex-math></inline-formula>, while the <monospace>Bayesian estimation</monospace> method achieves a much better scaling of <inline-formula> <tex-math notation="LaTeX">$\mathcal {O}(\sqrt {T})$ </tex-math></inline-formula>. These results are extended to a heterogeneous network consisting of M small base stations (SBSs) with a central macro base station. The estimates are available at multiple SBSs and are combined using appropriate weights. Insights on the choice of these weights are provided using the derived approximate regret bound in the multiple SBS case. Finally, simulation results confirm the superiority of the proposed algorithms in terms of average cache hit rate, delay, and throughput.


I. INTRODUCTION
Due to the rapid development of communication-based applications, it is expected that there will be 5.3 billion total Internet users (66 percent of the global population) by 2023, up from 3.9 billion (51 percent of the global population) in 2018 [2]. To enhance the users' Quality of Experience (QoE), several architectures have been proposed, such as fog networks, mobile edge computing (MEC), etc. These solutions enable the network to proactively predict future content requests and store popular files closer to the edge devices. This reduces the delay and alleviates backhaul congestion [3]-[7]. On the other hand, rapidly growing file sizes, reduced cache sizes (compared to traditional content delivery networks), and unpredictable user demands make the task of caching algorithms even more difficult. For example, the total data generated by Google per day is on the order of petabytes, while installing 1 TB of memory in every small cell of a heterogeneous network would shift less than 1% of the data for even one content provider. To overcome these issues, it has been observed that user demands are increasingly driven by recommendation systems. Recommendations based on an individual's preferences have become an integral part of e-commerce, entertainment and other applications. The success of recommender systems at Netflix and YouTube shows their impact: 80% of hours streamed at Netflix and 30% of the overall videos viewed at YouTube are attributed to recommender systems [8], [9]. With recommendation, a user's request can be nudged towards locally cached contents, resulting in lower access cost and latency [10]. This core concept was further expanded to cache-assisted small cell networks in [11]. Subsequently, numerous studies examined joint recommendation and cache optimization using diverse performance metrics [12]-[15].
The recent success of integrating artificial intelligence into wireless communications has further led to a better understanding of users' behavior and the characteristics of the network [16].
In particular, edge networks can now predict the content popularity profile, thereby increasing the average cache hit. The high prediction accuracy of neural networks has resulted in many content popularity prediction models, such as collaborative filtering with recurrent neural networks [17], the stacked auto-encoder [18], deep neural networks [19], and others. However, the local content popularity profile need not match the global prediction by the central server. Many recent works have proposed edge caching strategies that learn the user preferences and content popularity [20], [21]. Context awareness helps in classifying the environment, enabling intelligent decisions at the edge to select the appropriate content; for instance, Chen et al. [22] presented an edge cooperation strategy based on neural collaborative filtering.
In [7], a context-aware caching policy based on a cooperative Deep Reinforcement Learning algorithm is proposed. In [23], the authors jointly optimize content placement and content delivery in vehicular edge computing and networks. The authors in [24] devised a novel integrated framework enabling the dynamic orchestration of networking, caching, and computing resources, thereby enhancing the performance of future vehicular networks. Jiang et al. [25] used offline user preference data and statistical traffic patterns and proposed an online content popularity tracking algorithm. Nevertheless, the availability of offline data cannot always be guaranteed. Moreover, these studies assume that users possess identical preferences and that there is no correlation among the data, even though this may not hold in real-world scenarios.
Based on the observation that users' demands can be nudged by recommendations (and hence predicted better), in this work we consider the problem of jointly optimizing recommendation and caching in a 5G and beyond heterogeneous network consisting of Macro Base Stations (MBSs), small base stations (sBSs) and users. The influence of recommendation on caching is modelled using a Probability Transition Matrix (PTM). Thus, one can optimize the recommendation to steer the requests in a way that results in good cache hit performance.
This can be done if each sBS has access to the PTM. Unfortunately, the PTMs are unknown and hence need to be estimated. Towards this, we propose two estimation methods, namely Point estimation and Bayesian estimation. The Point estimation method assumes that a random set of files is recommended in the first t time slots, and the estimation is done using the frequency of occurrence of requests for each file conditioned on the recommended files. Being a naive method, the Point estimation method is used to benchmark the Bayesian estimation method. In the Bayesian estimation scheme, the probability is estimated using a Bayesian approach, i.e., each row of the PTM is sampled from a Dirichlet distribution whose parameters are the naive estimates of the conditional probabilities (similar to the Point estimation method). This enables a nice balance between exploration and exploitation of caching and recommendation. For the above two methods, we provide the following results:
• For both methods, a high probability guarantee on the estimated caching and recommendation strategies is provided. Irrespective of the estimation method, it is shown that, with a probability of at least 1 − δ, the performance of the proposed caching and recommendation strategy is ϵ-close to the optimal solution.
• An approximate high probability bound on the regret of the Bayesian estimation method is provided. To compare and contrast the obtained regret bound, we also derive an approximate regret bound for a more powerful genie-aided scenario using the Point estimation method. These regret bounds are shown to be data dependent. Hence, in order to get better insights, we carry out experiments to determine the scaling of the data dependent term of the regret. Using this result, we show that the approximate regret achieved by the genie-aided Point estimation approach is O(T^{2/3} √log T), while the Bayesian estimation method achieves a much better scaling of O(√T).
• The above results are extended to a heterogeneous network consisting of M sBSs with a central MBS. Since the estimates are available at multiple sBSs, it is possible to combine them at each sBS separately to obtain a better estimate. However, it is important to figure out the right weights to be used for the different sBSs. Towards this, we prove a regret bound for both estimation methods. As time T increases, each sBS collects more samples.
Intuitively, as T increases, the weights allocated to the other sBSs' estimates should go down to zero. We confirm this intuition using the regret bound that we derive. Further, we also show a scaling similar to the single sBS case, and show that as M increases, the regret increases. However, for large T, the effect of M is minimal, as the weights allocated to the other sBSs go to zero.
• We conduct extensive simulations to corroborate our theoretical findings. In addition, we also show that the proposed Bayesian estimation method achieves a better performance.

II. SYSTEM MODEL AND PROBLEM STATEMENT
The system model consists of a wireless distributed content storage network with M sBSs serving multiple users and one central MBS, as shown in Fig. 1. Each sBS can store up to F contents/files of equal sizes from a catalog of contents denoted by C := {1, 2, . . . , F}. The requests are assumed to be independent and identically distributed (i.i.d.) across time. As we know, recommending a file influences the users' request process, and hence recommendation can provide "side information" about future requests. In this paper, we consider the problem of jointly optimizing recommendation and caching policies in a cellular network. We model the influence of recommendation on the request via a conditional probability distribution denoted by p_{ij,k}, which represents the probability that a user requested a file i from the sBS k given that the content j was recommended [26]. Without loss of generality, we assume that the time is slotted, and the PTM for the k-th sBS, denoted by (P_k)_{ij} := p_{ij,k}, i, j = 1, 2, . . . , F, is assumed to be fixed across time slots. For the sake of simplicity, it is assumed that at least one file is requested in every slot by each of the N users in the network. Let us use u_i and v_j to represent the probabilities with which a file i is cached and a file j is recommended at any sBS, respectively. This induces a set of caching and recommendation strategies denoted by
$\mathcal{C}_{c,r} := \left\{(u, v) : \textstyle\sum_{i} u_i \le c,\ \sum_{j} v_j \le r,\ u_i, v_j \in [0, 1]\right\},$
where r and c are the recommendation and cache constraints, respectively. In the sequel, a strategy is defined by the pair (u, v). For a given strategy (u, v) ∈ C_{c,r}, the average cache hit at the sBS k is given by u^T P_k v. If the matrix P_k is known a priori at the sBS k, the optimal strategy can be found by solving max_{(u,v)∈C_{c,r}} u^T P_k v.
However, the matrix P_k is unknown, and therefore it needs to be estimated from the demands. Let the variable d_{k,i} denote the demand at the sBS k, defined as the total number of requests in the time slot t for the file i. Since the demands arrive sequentially, the PTMs need to be estimated and updated in an online manner.
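As a concrete illustration (not part of the paper's algorithms), the bilinear objective u^T P_k v over the constraint set can be attacked by alternating best responses: for a fixed v, maximizing u^T (P v) subject to 0 ≤ u_i ≤ 1 and Σ u_i ≤ c is a linear program whose optimum (for non-negative scores) places weight 1 on the c largest coordinates of P v, and symmetrically for v. The sketch below is ours: the function names are hypothetical, budgets are assumed integer, and alternating maximization is only a heuristic for the bilinear problem.

```python
import numpy as np

def best_response(score, budget):
    """Maximize w^T score s.t. 0 <= w_i <= 1, sum(w) <= budget by placing
    weight 1 on the `budget` largest-score coordinates (scores are
    non-negative here, since they come from probabilities)."""
    w = np.zeros_like(score)
    w[np.argsort(score)[::-1][:budget]] = 1.0
    return w

def alternating_max(P, c, r, iters=50):
    """Heuristic for max_{(u,v) in C_{c,r}} u^T P v: alternate the two
    linear best responses until (hopefully) a fixed point is reached."""
    F = P.shape[0]
    v = best_response(np.ones(F), r)      # start from an arbitrary feasible v
    u = best_response(P @ v, c)
    for _ in range(iters):
        u = best_response(P @ v, c)       # best caching for fixed recommendation
        v = best_response(P.T @ u, r)     # best recommendation for fixed caching
    return u, v, u @ P @ v                # strategy pair and its average cache hit
```

With P equal to the identity (each recommended file is requested with probability one), recommending and caching the same c = r files yields a cache hit equal to the budget.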
The performance of such algorithms is measured in terms of regret. As opposed to the adversarial setting of online learning, here we assume that there is an underlying distribution from which the requests are generated, namely the PTM. Accordingly, the following definition of the regret depends on the PTM.
Definition 1 (Regret): The regret at the sBS k after T time slots with respect to any sequence of strategies (u_{k,t}, v_{k,t}), t = 1, 2, . . . , T, is defined as
$\mathrm{Reg}_{k,T} := \sum_{t=1}^{T} \left( u_{k,*}^T P_k v_{k,*} - u_{k,t}^T P_k v_{k,t} \right),$
where (u_{k,*}, v_{k,*}) := arg max_{(u,v)∈C_{c,r}} u^T P_k v is the optimal strategy at the sBS k.
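Given the PTM and the strategy sequence, the regret of Definition 1 is a straightforward sum of per-slot gaps. A minimal sketch (the function name is ours, not the paper's):

```python
import numpy as np

def regret(P, strategies, u_opt, v_opt):
    """Regret after T slots: sum over slots of the gap between the optimal
    average cache hit and the hit achieved by the played strategy."""
    opt = u_opt @ P @ v_opt
    return sum(opt - u @ P @ v for (u, v) in strategies)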
In this work, we provide answers to the following two questions: (i) how should one cache and recommend files in an online fashion so as to achieve a sub-linear regret? and (ii) how should an sBS use the caching and recommendation solutions of the neighboring sBSs to improve its own performance? Towards answering the first question, we propose two strategies at each sBS that result in a minimum regret. In particular, we consider two approaches that estimate the PTMs in an online fashion, namely (i) the Point estimation and (ii) the Bayesian estimation methods, and solve the caching/recommendation problem. The first method is a naive method which acts as a benchmark, while the second balances the exploration-exploitation trade-off that is typical of any regret minimization algorithm.
Towards answering the second question, we consider a linear combination of the estimates of the PTMs from the neighboring sBSs and find the coefficients that result in a smaller regret. In the following section, we provide caching and recommendation algorithms for the single sBS scenario along with theoretical guarantees. In the later sections, we extend the analysis to multiple sBSs.

III. JOINT CACHING AND RECOMMENDATION FOR SINGLE SBS SCENARIO
In this section, we consider a single sBS, i.e., M = 1. As mentioned above, using the demands observed at the sBS, an estimate of the PTM is computed using either the Point estimation or the Bayesian estimation method. Given an estimate P^{(t)}_k, the caching and recommendation strategies are found by solving
$(u_t, v_t) := \arg\max_{(u,v)\in\mathcal{C}_{c,r}} u^T P^{(t)}_k v. \quad (3)$
Now, we present the two estimation procedures used in this paper.
• Point estimation: Given any sBS k, in this method, the demands observed up to t time slots are used to compute an estimate of the matrix P_k. During the first T time slots, recommendation and caching are done in an i.i.d. fashion with probabilities q and p, respectively. Let v^{t}_{jk} = 1 if file j was recommended in the slot t − 1, and zero otherwise. The recommendation and caching constraints in (1) are satisfied by choosing q := r/F and p := c/F. As the value of t increases, the estimate becomes better, and hence results in a better performance. The estimate of the ij-th entry of the matrix P_k is given by
$\hat p^{(t)}_{ij,k} = \frac{\sum_{s=1}^{t} d_{k,i}(s)\, v^{s-1}_{jk}}{N \sum_{s=1}^{t} v^{s-1}_{jk}}, \quad (4)$
which is a naive estimate of the probabilities obtained by simple counting of events. Let the corresponding estimate of the matrix P_k be denoted by P̂_k. Since E{p̂_{ij,k}} = p_{ij,k}, the point estimator is unbiased. Note that the regret incurred in the first T slots is maximal (O(T)), as the caching and recommendations are done in a random fashion. However, to use this scheme as a benchmark, we assume an identical genie-aided system where the estimate P̂_k is available at time slot t + 1 for caching and recommendation. Using this estimate, caching and recommendation are done by solving the optimization problem in (3), with P^{(t)}_k taken as the estimate in (4), for all time slots t = 1, 2, . . . , T.
As expected, the corresponding regret is small, since the estimate at each time slot is good, and hence this scheme acts as a benchmark.
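The counting estimator above can be sketched as follows. The exact normalization of (4) is garbled in the extracted text, so dividing the accumulated demand by N times the number of slots in which the file was recommended is our assumption (consistent with the Nm i.i.d. samples used in the Hoeffding step of the appendix); the function name is ours.

```python
import numpy as np

def point_estimate(demands, recommended, F, N):
    """Naive frequency estimate of p_ij = P(file i requested | file j recommended).

    demands[t, i]     : number of requests for file i in slot t (out of N users)
    recommended[t, j] : 1 if file j was recommended for slot t, else 0
    """
    P_hat = np.zeros((F, F))
    for j in range(F):
        slots = recommended[:, j] == 1          # slots in which j was recommended
        total = N * slots.sum()                 # number of user-request samples
        if total > 0:
            P_hat[:, j] = demands[slots].sum(axis=0) / total
    return P_hat
```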
• Bayesian estimation: In this method, for a given time slot, the rows of the matrix P^{(t)}_k are sampled from a prior distribution, which is updated based on the past demands. This trades off exploration versus exploitation while solving for the optimal recommendation and caching strategies. Here, the Dirichlet distribution is chosen as the prior. The Dirichlet pdf is a multivariate generalization of the Beta distribution, and is given by
$f(x_1, \ldots, x_M; \alpha_1, \ldots, \alpha_M) = \frac{\Gamma\!\left(\sum_{j=1}^{M} \alpha_j\right)}{\prod_{j=1}^{M} \Gamma(\alpha_j)} \prod_{j=1}^{M} x_j^{\alpha_j - 1}, \quad (5)$
with α_j > 0 ∀ j. The Dirichlet distribution is used as a conjugate prior in Bayesian analysis, and the shape of the distribution is determined by the parameters α_j. If α_j = 1 ∀ j, it reduces to the uniform distribution. The higher the value of α_j, the greater the probability of occurrence of x_j. The notation (x_1, x_2, . . . , x_M) ∼ Dirch(α_1, α_2, . . . , α_M) indicates that (x_1, x_2, . . . , x_M) is sampled from the Dirichlet distribution in (5). An estimate at the beginning of the time slot t of the i-th row of the matrix P^{(t)}_k is obtained by sampling from this distribution with parameters driven by the past demands, where v^{s−1}_{jk} is as defined earlier, with v^{0}_{jk} sampled from {0, 1} with probability q := r/F.
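The sampling step can be sketched in a Thompson-sampling style. The paper's exact Dirichlet parameterization is not shown in the extracted text, so using one plus the accumulated demand counts as the parameters is our assumption; the function name is ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_ptm(counts, prior=1.0):
    """Sample a PTM estimate row by row: row i is drawn from
    Dirichlet(prior + counts[i]), where counts[i, j] accumulates the demand
    for file i observed in slots where file j was recommended.
    With all-zero counts this reduces to a uniform Dirichlet prior."""
    return np.vstack([rng.dirichlet(prior + row) for row in counts])
```

Rows of the sampled matrix are probability vectors, matching the paper's statement that each row of the PTM is drawn from a Dirichlet distribution.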
After every time slot t, the recommendation and caching probabilities are selected by solving (3). The overall procedure is summarized in Algorithm 1 (caching and recommendation algorithm for the single sBS case, run at any sBS k).
In the following subsection, we provide theoretical guarantees of the above algorithm.

A. Theoretical Guarantees
In this section, we provide a high probability bound on the regret for both the genie-aided Point estimation and the Bayesian estimation. For the Point estimation case, we start by providing a lower bound on the waiting time that results in a performance ϵ-close to the optimal one. The result will be of the following form: with a probability of at least 1 − δ, the performance gap is at most ϵ provided t is larger than a constant, where (u_t, v_t) is the caching and recommendation strategy obtained by the algorithm. The constant depends on ϵ, δ and various system parameters, as explained next. This result will be used to derive a genie-aided regret bound for the Point estimation method. Towards stating the theoretical guarantees, the following definition is useful.
The following theorem provides a bound that is useful to provide the final result.
Theorem 3.1: For a given estimate P^{(t)}_k of the PTM obtained using Point estimation or Bayesian estimation, the stated high probability bound holds, where (u_t, v_t) is the output of Algorithm 1 at time t and ∆P^{(t)} := P_k − P^{(t)}_k.
Using the above result, in the following, we provide our first main result on the performance of the Point estimation scheme.
Theorem 3.2: Using (3) for caching and recommendation in slot t, for any ϵ > 0, with a probability of at least 1 − δ, the regret satisfies Reg_k(t) ≤ ϵ provided t satisfies the bound in (9). Proof: See Appendix B.
As noted, the regret achieved by the Point estimation method is O(T), as it incurs a non-zero constant average error for all the slots t satisfying (9). In this method, the estimation of the PTM is done using the samples obtained from the first t slots, and the caching strategy is decided based on this estimate. An improvement over this is to continuously update the estimates and the caching/recommendation strategies. Instead of analyzing the regret of this scheme directly, we assume that at any time slot t, a genie provides an estimate of the PTM as in (4) to compute the caching/recommendation strategies, and we provide the corresponding approximate regret bound. In particular, in Appendix C, we show the following bound on the regret for the genie-aided Point estimation method.
Theorem 3.3: With a probability of at least 1 − 1/T, a regret of O(T^{2/3} √log T) can be achieved by the genie-aided Point estimation method.
It can be observed that this regret scales faster than √T. In the following, we present the result for the Bayesian estimation method and contrast it with the genie-aided case.

B. Bayesian Estimation: Single sBS Scenario
Note that unlike the analysis for Point estimation, in this case the strategies are correlated across time, which makes the analysis non-trivial. The approach we take is to convert a sequence of random variables (functions of caching and recommendation across time) into a martingale difference sequence. This enables us to use Azuma's inequality, which provides a high probability result on the regret. In the following, we provide the result.
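For reference, the standard Azuma-Hoeffding inequality used in this argument states that for a martingale difference sequence X_1, …, X_T with |X_t| ≤ c_t almost surely:

```latex
% Azuma–Hoeffding inequality: if E[X_t | X_1, ..., X_{t-1}] = 0 and
% |X_t| <= c_t almost surely, then for any epsilon > 0,
\Pr\left\{\left|\sum_{t=1}^{T} X_t\right| \ge \epsilon\right\}
  \le 2\exp\!\left(-\frac{\epsilon^2}{2\sum_{t=1}^{T} c_t^2}\right).
```

Applied to the per-slot cache-hit deviations, this yields the high probability regret bound below.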
Theorem 3.4: For the Bayesian estimation in Algorithm 1, for any ϵ > 0, with a probability of at least 1 − 1/T, the stated bound on the regret holds, where α_2 and ψ_t > 0 are algorithm parameters. Proof: See Appendix F.
Remark: Note that the above result is an algorithm- and data-dependent bound, as it depends on the recommendation strategy and the demands. As a consequence, the choice of ψ_t that yields a better regret is not clear. In order to provide more insight, we plot σ̂²_t versus the time slot in Fig. 2. In the same plot, we also show that O(1/t) is a good fit for σ̂²_t. Furthermore, the cardinality of the ϵ-cover does not scale with T. Thus, by choosing ψ_t = 1/t^a, the regret becomes O(T^a) + O(T^{1−a}) + O(T^{1/2}), where a ≤ 1/2; choosing a = 1/2 results in a √T regret. Recall that an approximate regret of O(T^{2/3} √log T) was shown for the genie-aided case, while the Bayesian estimation method achieves a regret of the order √T. In other words, the Bayesian performance is better than the genie-aided regret in the Point estimation case by a factor of T^{1/6}. In the next section, we extend our results to the two sBS scenario.

IV. PROPOSED CACHING AND RECOMMENDATION STRATEGIES WITH MULTIPLE SBSS
In this section, we present caching and recommendation algorithms for the case of multiple sBSs. In particular, we provide insights on how to use the neighboring sBSs' estimates to further improve the overall caching and recommendation performance of the network. First, we present the results for the two sBS scenario; a similar analysis is then used to extend the results to multiple sBSs.

A. Two Small Base Station Scenario
In this subsection, we consider two sBSs connected to the same MBS. As described in Section II, P_1 and P_2 represent the PTMs for sBS 1 and sBS 2, respectively. The two estimates are combined at each sBS k as
$Q_k^{(t)} := \lambda_k \hat P_k^{(t)} + (1 - \lambda_k) \hat P_{\bar k}^{(t)}, \quad k = 1, 2, \quad (11)$
where $\bar k$ denotes the other sBS and λ_k ∈ [0, 1], k = 1, 2, strikes a balance between the two estimates. The above estimate is used to compute the respective caching and recommendation strategies for the two sBSs, which are communicated to the respective sBSs. This results in a better estimate, for example, when P_1 = P_2 or when the two matrices are close to each other. The corresponding algorithm is shown in Algorithm 2. First, we prove the following guarantee for the Point estimation method.
Theorem 4.1: For Algorithm 2 with Point estimation, for any sBS k and for any ϵ > 0, with a probability of at least 1 − δ, δ > 0, the regret satisfies Reg_{k,T} < ϵ provided t is sufficiently large.
As in the single sBS case, to benchmark the performance of the Bayesian estimation method, we consider a genie-aided scenario, and in Appendix D, we show that it achieves an approximate regret consisting of two terms: a first term (involving Θ) that scales as T^{2/3}, and a second term, proportional to V_12, that scales linearly with T, where V_12 := sup_{(u,v)∈C_{c,r}} u^T (P_2 − P_1)v and $\Theta = \sqrt[3]{\frac{8\kappa^2 F^2 c^2 r^2(\log 4F^2T^2 + F)}{qN}}$. Note that when the second term is non-zero, i.e., P_1 ≠ P_2, the above clearly shows the trade-off between the two terms. This trade-off can be balanced by choosing λ_k as a function of T; the resulting choice reveals that as time progresses, i.e., as the sBS k collects more samples, the weight allocated to the neighboring sBS should go down to zero, as expected. Furthermore, by appropriately choosing λ_k in this manner, the regret obtained is of the order T^{2/3}. At the other extreme, when P_1 = P_2, the second term is zero, and the optimal choice is λ_k = 1/2, as expected. Next, we present the guarantees for Algorithm 2 with Bayesian estimation.
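The convex combination used by the two sBS scheme can be sketched as follows (a plausible reading of (11); the function name is ours):

```python
import numpy as np

def combine_estimates(P_hat_own, P_hat_other, lam):
    """Combined PTM estimate at an sBS: lam weights the sBS's own estimate
    against the neighbouring sBS's estimate (lam in [0, 1])."""
    assert 0.0 <= lam <= 1.0
    return lam * P_hat_own + (1.0 - lam) * P_hat_other
```

When the two PTMs are identical, λ = 1/2 averages the two estimates and halves the estimation variance, matching the remark above that λ_k = 1/2 is optimal for P_1 = P_2.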
Proof: See Appendix H.

Remark: The result shows the trade-off exhibited by λ_k. In particular, a larger λ_k makes the first term inside the max in (16) larger, while a smaller λ_k ensures that the second term inside the max dominates. Similar to the single sBS scenario, an appropriate choice of λ_k results in an O(√T) scaling of the regret. Further, the above result is an algorithm-dependent bound, as it depends on the recommendation strategy, which is determined by the algorithm. Following the single sBS case, we can show a similar regret of O(√T), which is superior to the Point estimation method. In the simulation results section, we present more details on this trade-off in the finite T regime. In the next section, we extend the analysis to multiple sBSs.

B. Multiple Small Base Station Scenario
In this section, we extend the analysis and algorithms of the previous section to a heterogeneous network with M sBSs connected to a central MBS. The requests at each sBS are assumed to be i.i.d. with PTMs P_1, P_2, . . . , P_M, as described in Section II. Similar to the two sBS model, each sBS k computes a combined estimate of the PTM as
$Q_k^{(t)} := \sum_{i=1}^{M} \lambda_{k,i} \hat P_i^{(t)}, \quad (17)$
where λ_{k,i}, i = 1, 2, . . . , M, are non-zero coefficients, to be determined later, that satisfy $\sum_{i=1}^{M} \lambda_{k,i} = 1$. The following theorem is a generalization of the two sBS model, and provides a guarantee on the minimum time required to achieve a certain level of accuracy with high probability.
Theorem 4.3: Using (17) for Point estimation, for any ϵ > 0, with a probability of at least 1 − δ, δ > 0, for any sBS k, the regret satisfies Reg_{k,T} < ϵ provided t is sufficiently large. Proof: See Appendix I.
In Appendix E, we show the regret for the genie-aided case after an appropriate choice of the weights. Remark: Note that the value of the regret depends on the values of the λ_{k,i}'s and the term D_k. The first term scales as T^{2/3} while the second term scales linearly with T. Choosing λ_{k,i}, i ≠ k, inversely proportional to √T results in a balance between the two terms; in particular, this leads to a regret that scales as O(T^{2/3}). Similar to the single sBS case, this choice reveals that as time progresses, i.e., as the sBS k collects more samples, the weights allocated to the neighboring sBSs should go down to zero, as expected. For finite T, one can optimize the above regret with respect to the λ_{k,i}'s and find the optimal choice; this is relegated to our future work. Next, we present the regret bound for the Bayesian estimation method.
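The multi-sBS combination in (17) can be sketched analogously (weights assumed to sum to one; the function name is ours):

```python
import numpy as np

def combine_multi(estimates, weights):
    """Weighted combination of the M PTM estimates available at an sBS,
    as in (17). `weights` must be a convex combination (sum to one)."""
    w = np.asarray(weights, dtype=float)
    assert np.isclose(w.sum(), 1.0), "weights must sum to one"
    return sum(wi * Pi for wi, Pi in zip(w, estimates))
```

Shrinking the off-diagonal weights towards zero as T grows recovers each sBS's own estimate, which is the behavior the remark above predicts for large T.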
Theorem 4.4: Using (17) for Bayesian estimation, for any ϵ > 0, with a probability of at least 1 − δ, δ > 0, for any sBS k ∈ {1, 2, . . . , M}, the stated bound on the regret holds, where R_k(ϵ, δ) is as defined in Theorem 3.4 for the single sBS case. Proof: See Appendix J.
Remark: As in the single and two sBS scenarios, the above result is an algorithm-dependent bound, as it depends on the recommendation strategy. An appropriate choice of the weights results in an O(√T) scaling of the regret. Following the single sBS analysis, one can show that in the multiple sBS scenario also, the regret is superior to the Point estimation method and scales as O(√T). Clearly, this is better than the genie-aided scenario, whose regret scales as O(T^{2/3}). In the next section, we present experimental results that corroborate our theoretical observations.

V. SIMULATION RESULTS
In this section, simulation results are presented to highlight the performance of the proposed caching and recommendation model. The simulation setup consists of multiple sBSs with multiple users, and we assume a time-slotted system. For the heterogeneous model, the simulation consists of the following two scenarios:
• Fixed Link Scenario: In this case, the links between the sBSs and the users are uniformly and independently distributed in {0, 1} with probability 1/2.
• SINR based Scenario: In this case, the sBSs and users are assumed to be distributed uniformly in a geographical area of radius 500 m. It is assumed that an sBS and a user can communicate only if the corresponding SINR is greater than a threshold. This SINR takes into account the fading channel, the path loss, the power used, and the distance between the user and the sBS. The minimum rate at which a file can be transferred from the sBS to a user is given by the threshold, and hence the reciprocal of the rate indicates the delay. In the simulation, we have used τ := 1/log(1 + SINR) as a measure of the delay between a user and an sBS. However, when the requested file is absent, a backhaul fetching delay of α × τ is counted in addition to the downlink delay of τ, i.e., the overall delay when the file is absent is (α + 1)τ, with α = 10. Also, if the threshold is R, then at least R bits can be sent in a time duration of at most 1/log(1 + SINR) seconds, and hence the throughput is roughly R log(1 + SINR) bits/second.
Fig. 3 shows the throughput plot for the considered heterogeneous system with 5 sBSs and 30 users. In Fig. 3, the total number of files and the SINR threshold are 100 and 12 dB, respectively. The throughput of the proposed algorithm with recommendation is 225 bits/s for a cache size of 24, while the LRFU, LRU and LFU algorithms have throughputs of 100 bits/s, 85 bits/s and 70 bits/s, respectively, for the same cache size. Thus, from Fig. 3, we can see that the proposed algorithm has a higher throughput compared to the existing algorithms. In Fig. 4, the number of sBSs, the number of users, the total number of files, and the SINR threshold are 2, 25, 100 and 12 dB, respectively. From Fig. 4, we can observe that the delay of both the proposed algorithms is lower compared to the other benchmark algorithms, since pre-fetching files according to the estimation methods results in lower fetching costs from the backhaul and hence less delay.
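The delay and throughput model used in the SINR scenario can be sketched as follows (the function name, the natural logarithm, and the linear, non-dB SINR input convention are our assumptions; α = 10 and R = 12 follow the simulation setup):

```python
import numpy as np

def delay_and_throughput(sinr, cached, alpha=10.0, rate_threshold=12.0):
    """Per-request delay and throughput in the SINR scenario:
    tau = 1/log(1+SINR) is the downlink delay; a cache miss adds a
    backhaul delay of alpha*tau, giving (alpha+1)*tau in total.
    Throughput is roughly R*log(1+SINR) bits/second."""
    tau = 1.0 / np.log(1.0 + sinr)
    delay = tau if cached else (alpha + 1.0) * tau
    throughput = rate_threshold * np.log(1.0 + sinr)
    return delay, throughput
```

For example, at an SINR of e − 1 (so log(1 + SINR) = 1) a cache hit costs one unit of delay, a miss costs eleven, and the throughput equals the threshold R.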
Fig. 5 shows the plot for the two sBS model, where the value of λ_1 is varied between 0.1 and 1. From Fig. 5, we can observe that the average cache hit increases as λ_1 approaches 0.5. This is because, for λ_1 = 0.5 and P_1 = P_2, the combined popularity profile matrix Q at the MBS has maximum similarity to the individual sBS popularity profile matrices; hence the cache hit is maximum at λ_1 = 0.5 and gradually decreases as λ_1 increases further.
Fig. 6 shows the average cache hit versus λ for 2 sBSs when P_1 ≠ P_2. From Fig. 6, we can observe that for larger T, the optimal λ value is close to 1. Also, for smaller values of T, depending on the value of Θ, the optimal value of λ is less than 1; as shown in Fig. 6, the optimal value of λ is 0.3. Thus, the simulation results show that recommendation helps in increasing the average cache hit compared to the algorithm without recommendation, and that the proposed approach also performs better than the popular LRFU, LRU and LFU algorithms.

VI. CONCLUSION
In this paper, we have proposed a novel joint caching and recommendation scheme for next generation cellular networks. We leverage the influence of recommendation on user requests to improve the overall average cache hit. Two estimation methods, Bayesian estimation and Point estimation, are used to determine the user request pattern, and an algorithm is proposed to jointly optimize caching and recommendation. A multi-tier heterogeneous model consisting of an MBS and sBSs is also presented, and an approximate high probability bound on the regret of both estimation methods is provided. Finally, simulation results and theoretical proofs support the superior performance of the proposed method over the existing algorithms.

APPENDIX A
PROOF OF THEOREM 3.1 (CONCLUDING STEPS)
Substituting y_i, we get from (21) and (22) a bound in terms of ∥∆P∥_F; this can be further bounded to obtain N_ϵ Pr{∥∆P∥_F ≥ ϵ/(4κrc)}. This completes the proof.

APPENDIX B
PROOF OF THEOREM 3.2
Consider the following, where γ := ϵ/(4κrc), and the second inequality follows from the union bound. Conditioning on V_ls, and since V_ls is a binomial random variable with parameter q, the above average with respect to V_ls can be computed. The resulting bound on the left hand side of (8) is obtained by using the above in (23) and substituting it in (8). An upper bound can then be obtained by using 1 − x ≤ e^{−x}. Using the resulting bound, where ϵ_t > 0, it follows from Theorem 3.2 that for any ϵ_t > 0, we have Pr{Reg_k(t) ≤ ϵ_t} ≥ 1 − δ provided (9) holds. We choose δ = 1/T², use the approximation e^{−x} ≈ 1 − x for small x, and denote by ⪆ "approximately greater than or equal to". Assuming ϵ_t < 1 and using log x ≈ x for small x, we can use (27) to write ϵ_t in terms of t. Note that by choosing t large enough, the resulting ϵ_t can be made less than one; since we are looking for an order result, this does not change the final conclusion. Using this result in (26), we get the following: with a probability of at least 1 − 1/T, the regret grows sub-linearly with time, and hence the genie-aided scheme achieves a zero asymptotic average regret.

APPENDIX D REGRET ANALYSIS FOR TWO SBS: HEURISTICS
The analysis here is very similar to that of the single sBS case. We repeat some of it for the sake of clarity and completeness.
By rearranging and summing over t, the error can be written as follows, where $\Theta = \sqrt[3]{\frac{8\kappa^2 F^2 c^2 r^2(\log 4F^2T^2 + F)}{qN}}$. Using this in place of ϵ_{k,t} in the above theorem and summing over t, we get, with a probability of at least 1 − 1/T, the stated bound for the sBS k, where V_12 := sup_{(u,v)∈C_{c,r}} u^T (P_2 − P_1)v. This completes the approximate analysis.

APPENDIX E REGRET ANALYSIS FOR MULTIPLE SBS: HEURISTICS
The analysis here is again very similar to the analysis of the single sBS case.

Fig. 2: Plot of σ̂²_t versus time. The plot also shows that O(1/t) is a good fit for σ̂²_t.

Algorithm 2: Caching and recommendation algorithm (two sBS case), run as procedure POINTESTIMATION/BAYESIANESTIMATION at each sBS.

Fig. 4: Average delay for the SINR scenario with a two sBS model.
Fig. 5: Cache hit v/s cache size for 2 sBS and P_1 = P_2.
Let x* and y* be solutions to sup_{(u,v)∈C_{c,r}} u^T ∆P v. Since x* and y* belong to C_{c,r}, for some i = 1, 2, . . . , N_ϵ, there exist x_i and y_i in A_ϵ such that ∥x* − x_i∥₂ ≤ ϵ/8 and ∥y* − y_i∥₂ ≤ ϵ/8. Conditioned on v, there are Nm i.i.d. samples available to estimate p_kl. Using Hoeffding's inequality, the deviation p_kl − p̂^{(t)}_kl can be bounded with high probability, provided t satisfies the bound in the theorem.