Estimating Video Popularity From Past Request Arrival Times in a VoD System

Efficient provision of Video-on-Demand (VoD) services requires that popular videos are stored in a cache close to users. Prediction of video popularity (defined by request count) is, therefore, important for an optimal choice of videos to be cached. The popularity of a video depends on many factors and, as a result, changes dynamically with time. Accurate video popularity estimation that can promptly respond to these variations then becomes crucial. In this paper, we analyze a method, called Minimal Inverted Pyramid Distance (MIPD), to estimate a video popularity measure called the Inverted Pyramid Distance (IPD). MIPD requires the choice of a parameter, $k$, representing the number of past requests for each video used to calculate its IPD. We derive, analytically, expressions to determine an optimal value for $k$, given the requirement of ranking a certain number of videos with specified confidence. To assess the prediction efficiency of MIPD, we have compared it by simulation against four other prediction methods: Least Recently Used (LRU), Least Frequently Used (LFU), Least Recently/Frequently Used (LRFU), and Exponential Weighted Moving Average (EWMA). Lacking real data, we have, based on an extensive literature review of real-life VoD systems, designed a VoD system model that provides a realistic simulation of videos with different patterns of popularity variation, using the Zipf (heavy-tailed) distribution of popularity and a non-homogeneous Poisson process for requests. From a large number of simulations, we conclude that the performance of MIPD is, in general, superior to all four of the other methods.


I. INTRODUCTION
In recent years, the popularity of Video-on-Demand (VoD) services has increased tremendously. A growing number of people have begun to use on-demand video services because of the unique flexibility of VoD services to provide what users want, when they want it. With this rapid increase in popularity, the traffic generated has also dramatically increased. According to the Cisco Visual Networking Index [1], the amount of VoD traffic will nearly double by 2022 and will be equivalent to 10 billion DVDs per month. IP video as a whole will continue to account for 80% to 90% of total IP traffic; globally, IP video traffic will represent 82% of total traffic by 2022.
The associate editor coordinating the review of this manuscript and approving it for publication was Jenny Mahoney.
Service providers are exploiting several methods to keep up with the ever-increasing demand. One widely considered method to enhance the performance of video distribution networks is to pre-place popular objects in strategic locations within the network, based on the anticipated demand [2]-[6]. Given the enormous sizes of video objects and the constrained capacities of storage, it is impossible to replicate all videos in all locations. In particular, we can store only a few videos in content storage located towards the edge of the network. Moreover, unlike other objects in web systems, transferring and deleting video objects generally takes time because of their enormous sizes [7]. The set of items stored in a cache that serves a group of users needs to be just the right set, taking account of the requirements of that group of users. (We use the terms objects, items, and videos to refer to any content offered under a VoD service, including full-length videos, drama series, and news items.) It is therefore crucial to decide which videos are worth caching, i.e., which will be requested sufficiently often in the future. Effective cache management can be achieved by finding the most popular videos. Here, the popularity of a video is measured by the request count from the group of users under consideration. Popularity prediction is the process of estimating the number of future requests for videos.
The actual popularity of videos can be affected by other factors, such as ratings, reviews, recommendations, advertising, and discounted pricing, which might be hard to quantify [8]-[11]. The study of video popularity estimation is complex and has many directions. One of the main research areas is based on the features (characteristics) of the videos. Trzciński and Rokita [12] use video features (such as visual and social features) to predict the popularity of online videos. By this method, the VoD system can have prior information about the popularity of the videos before they are introduced to the system.
Another crucial research area is an in-depth analysis of the dynamics of the video request arrival process [9]-[11], [13]. In this area, many methods have been proposed to predict popularity based on request statistics. These methods need to overcome prediction challenges associated with the fact that video popularity is highly dynamic and time-varying, and request patterns from users are usually not uniform. To capture those variations, changes in video popularity need to be monitored continuously. Popularity prediction based on video requests has a crucial role in overall video popularity prediction, as this method can quickly adjust to newly available data on video demands.
Our paper focuses on the second area, namely dynamic popularity prediction based on historical request statistics of users. In particular, we provide new analyses and extensive simulation studies based on realistic traffic models to evaluate the performance of a range of methods that address the challenges of this research area. Other aspects of caching are beyond the scope of this paper.
We provide an in-depth performance comparison with four other dynamic popularity estimation methods, namely, Least Recently Used (LRU), Least Frequently Used (LFU), Least Recently/Frequently Used (LRFU), and Exponential Weighted Moving Average (EWMA).
The Last-k Algorithm was described in [14] and demonstrated to respond faster than LFU to newly introduced videos. In addition, a formula was derived to approximate the number of videos that should be stored in the cache to achieve a given hit ratio (HR), defined as the proportion of requests served by the local server (cache) to the total number of requests. The Last-k Algorithm estimates the popularity ranking of videos using their most recent request statistics. The parameter k is the number of past request arrivals used for each of the videos. Increasing the value of k improves the accuracy of the algorithm at the expense of response time, computational complexity, and overhead. This suggests the need for a method to optimize the value of k for a given set of requirements, and this was not provided in [14].
The contribution of this paper beyond the contribution of [14] is as follows. Firstly, we provide two methods to optimize k to meet the accuracy requirement through an analysis of the Last-k Algorithm, leading to a new popularity estimation algorithm that we call the Minimal Inverted Pyramid Distance (MIPD) algorithm. MIPD is based on a novel entity called the Inverted Pyramid Distance (IPD), which is used to quickly identify popular videos given the arrival times of a recent sequence of video requests. MIPD emphasizes the importance of recent requests using a pyramid-like weight. Like its predecessor, the Last-k Algorithm, MIPD uses the parameter k for popularity prediction. Again, in MIPD we have a trade-off: accuracy (confidence level) improves with increased k, but at the cost of increased overhead and computational complexity. Since the IPDs of videos vary frequently, IPD updates are needed. Such updates are time-consuming, especially if the value of k is large. MIPD uses an efficient way to update the IPD of each video.
Secondly, based on extensive new simulation results using realistic traffic models, we compare MIPD to four other methods, namely, LRU, LFU, LRFU, and EWMA. These methods are all described in the next section. We show that MIPD outperforms the other four popularity estimation methods in terms of Hit Ratio (HR) and the accuracy of estimating the rank of videos of given actual popularity in our designed VoD model.
Thirdly, in addition to the straightforward implementation of the MIPD algorithm, which is suitable for scenarios where the popularity of the objects does not need to be estimated very frequently, we describe an optimized MIPD algorithm for cache replacement: a faster, leaner method that can handle situations where cache servers are updated whenever a user requests a video not already stored in the cache.
While the MIPD method proposed here can potentially be used in other systems, such as web systems, we mainly focus on VoD systems, limiting our discussion and evaluation to them. Content management in web cache systems is often considered less challenging, as steadily falling storage costs have enabled distributed web servers (e.g., proxy cache servers) to store an enormous amount of web content [15], [16]. Moreover, it is known that in VoD systems, higher performance can potentially be achieved by storing partial videos (partial caching) instead of full videos [3], [17], [18]. We do not consider such issues of VoD workload here, as the focus of our work is on video popularity estimation based on the request arrival process and not on content analysis. Irrespective of whether partial or full caching is used, an accurate video popularity estimation method is crucial to achieving better performance.
The remainder of the paper is organized as follows. In Section II, we discuss the existing popularity estimation methods and their applications, including the IPD-based popularity estimation method (the Last-k algorithm). In Section III, we describe the MIPD algorithm for video popularity estimation, which is also based on IPD, the optimization of k to achieve a required confidence level, and an efficient way to update IPDs. In Section IV, we undertake a deep analysis of real-life VoD systems and build a VoD system model. In Section V, simulations based on this VoD model are used to analyze the influence of the parameter k in the MIPD algorithm and to compare the performance of MIPD against the other four methods. Concluding remarks are presented in Section VI.

II. RELATED WORK
There are many different research directions in this field, and research papers on optimization in VoD systems are found widely in the literature. As discussed in Section I, we focus on popularity estimation in VoD systems based on request statistics of videos. The underlying principles behind most cache replacement algorithms are popularity estimation methods that use the request record [3], [17], [19], [20]. That is, in every cache replacement algorithm, the objects least likely to be requested are identified using past access patterns, so that those objects can be removed from the cache when required. Instead of analyzing the viewing behaviors of individual users, we consider VoD request arrivals generated by many users for the purpose of popularity estimation, with the aim of achieving optimal caching.
The LFU popularity estimation method and its variants use the frequency of requests to identify the least popular object, i.e., the object least likely to be requested by users in the future. Consequently, under the LFU cache replacement policy, the item requested the least number of times in the past is removed from the cache. The LFU cache replacement algorithm is often deployed in web cache servers [15], [21].
While the LFU method captures the long-term popularity of objects, it responds poorly to changes in user demand, as it does not emphasize recent history over earlier records. Since LFU cannot distinguish requests that occurred recently from requests that occurred significantly earlier, it can incorrectly identify many ''has-beens'' as popular objects because of their high request counts in the past [20], [22]. For example, when the LFU method is used, an earlier episode of a drama series that is no longer popular might remain in the cache for a long time.
The LRU popularity estimation method and its extensions use the time distance, that is, the time from the last request to the current time. Specifically, under LRU, the object that was requested least recently is evicted from the cache when needed. While the LRU popularity estimation method responds promptly to changes in object popularity, it does not take the frequency of requests into consideration. Consequently, LRU does not capture the long-term popularity of objects. An uneven request arrival pattern might result in less popular objects being identified as popular ones [20], [23]. For example, when LRU is used, the cache might replace all the popular videos with content that has recent but short-lived popularity.
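As a toy illustration of the two statistics (the request log and video identifiers below are hypothetical), LFU and LRU can pick different eviction victims from the same history:

```python
from collections import Counter

# Hypothetical request log: (time, video id) pairs.
log = [(1, "a"), (2, "a"), (3, "b"), (4, "a"), (5, "c"), (6, "b")]

counts = Counter(v for _, v in log)             # LFU statistic: total request count
last_seen = {v: t for t, v in log}              # LRU statistic: time of last request

lfu_victim = min(counts, key=counts.get)        # fewest requests overall
lru_victim = min(last_seen, key=last_seen.get)  # least recently requested
```

Here LFU evicts "c" (only one request), while LRU evicts "a" (three requests, but none since time 4), illustrating the frequency/recency trade-off discussed above.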
The LFU popularity estimation method is more effective when the popularity of the videos is stable, whereas LRU is more effective when popularity is rapidly changing. Also, when carrying out content pre-placement or cache replacement, content providers may have certain preferences over short-term versus long-term popularity, depending on the schedule with which they introduce new videos into the library and the sizes of their servers. For example, if small caches are used to handle errors in demand prediction, then a cache replacement policy such as LRU is preferred [22]. On the other hand, the LFU policy has been shown to work well for proxy caching environments that aim to offload the demand on central VoD servers [22], [24]. Both LFU and LRU popularity estimation methods have limited flexibility to adjust to varying requirements. Accordingly, a straightforward solution is to implement different algorithms depending on the requirements of different circumstances.
LRFU popularity estimation is a combination of LFU and LRU popularity estimation, and has been demonstrated to achieve better performance than LFU or LRU [21], [25]. More specifically, the LRFU algorithm de-emphasizes the frequencies of old references by using a combined recency and frequency (CRF) value, which quantifies the likelihood that the object will be requested in the near future [25]. When a cache replacement is required, the algorithm replaces the object with the least CRF value, regarded as the least popular object. The CRF value of object b after the k-th reference at time t_k is given by

C_{t_k}(b) = 1 + τ^δ · C_{t_b}(b),

where δ = t_k − t_b, and τ is a parameter ranging from 0 to 1 that controls the decay. The parameter t_b is the time of the most recent reference to the object b before the current reference.
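With this update rule, the CRF can be maintained with one line of state per object. The sketch below uses illustrative values for τ and the reference spacing:

```python
def crf_update(c_prev, delta, tau):
    # delta: time since the previous reference to this object;
    # tau in (0, 1] controls how quickly old references are forgotten.
    return 1.0 + (tau ** delta) * c_prev

c = 0.0                       # no references yet
for _ in range(3):            # three references, 10 time units apart, tau = 0.5
    c = crf_update(c, 10, 0.5)
```

With τ close to 0 the value is dominated by recency (LRU-like behavior); with τ = 1 it simply counts references (LFU-like behavior).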
On the other hand, in the frequency computation of the GDS algorithm, a similar decay function is used to de-emphasize the significance of past accesses. That is, the frequency of object p is defined iteratively for the (i + 1)-th reference to this object as

f_{i+1}(p) = 1 + f_i(p) · 2^{−t/T},

where t is the time elapsed since the last reference and T is a constant that controls the rate of decay [26].
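The same decay idea as a sketch, with illustrative numbers (exactly one half-life elapsed between references):

```python
def gds_frequency(f_prev, t, T):
    # t: time since the last reference; T: decay time constant.
    return 1.0 + f_prev * 2 ** (-t / T)

f = 1.0                           # assumed frequency after the first reference
f = gds_frequency(f, t=30, T=30)  # old contribution halved, then incremented
```

After one decay interval the stored frequency is 1.5 rather than 2, de-emphasizing the older access.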
The EWMA popularity estimation method calculates the average access interval of each video, where historical information is forgotten exponentially over time. When a request for a specific video arrives, the average request time interval of that video is updated as

T_interval ← (1 − p) · T_interval + p · (t − t_last),

where T_interval represents the average access interval of a video, t_last is the time when that video was most recently requested, t is the current time, (t − t_last) is the latest request interval, and p is a forgetting rate which ranges between 0 and 1. The newest access interval (t − t_last) is assigned a weight (probability) p ∈ (0, 1] in determining the overall average request interval. If p is fixed at p = 1, this algorithm becomes LRU [27].
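The EWMA update is a one-line computation per request; the numbers below are illustrative:

```python
def ewma_interval(avg, t, t_last, p):
    # Forgetting rate p in (0, 1]; p = 1 reduces to LRU (only the last interval counts).
    return (1 - p) * avg + p * (t - t_last)

avg = 60.0                                       # current average inter-request time
avg = ewma_interval(avg, t=130.0, t_last=100.0, p=0.25)
```

The new 30-second interval pulls the 60-second average down to 52.5; a smaller p forgets history more slowly.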

III. MIPD
In this section, we introduce the IPD-based object popularity estimation concept and describe our proposed MIPD method, which estimates the popularity of videos using IPD. MIPD is an extended version of the Last-k popularity estimation method proposed in [14]. The Last-k method is IPD-based, combining the good features of the LFU and LRU methods in a very different way than LRFU and EWMA do, whilst avoiding their drawbacks. The Last-k method performs as accurately as the LFU method when the popularity of the videos is stable, while responding faster than LFU to newly introduced videos.
In this paper, we undertake a deep analysis of the IPD-based popularity estimation method. We show that there is an optimal value of the parameter k that achieves high accuracy in the popularity ranking of video objects using the Last-k ranking via IPD. We also introduce two methods that can be used to find suitable values for k. In addition, we discuss how to reduce the overhead and memory requirements of MIPD for cache replacement.

A. WHAT IS IPD
In MIPD popularity estimation, we compare the popularity of items by their IPD at a given point in time. We describe here how to calculate the IPD of each video.
Fig. 1 illustrates the arrival of requests at the server for different objects with varying popularity. Each arrow represents a request arrival, and the horizontal axis represents time. We define the j-th backward distance in time (BDT) of object i as the time since the j-th last arrival for object i, and we denote this by t_{i,j}. For example, in Fig. 1 the 3rd BDT of object 1, denoted by t_{1,3}, is the time since the 3rd last arrival of video 1. We define the inverted pyramid distance (IPD) of video i as the sum of the last k BDTs of video i, and denote it by T_i. That is:

T_i = Σ_{j=1}^{k} t_{i,j}. (4)

The request arrival rate for a popular video is high, and the gaps between requests are small. When we calculate the IPD of a video, the time since the last request of the video is added k times, the time duration between the 2nd last request and the last request of that video is added (k − 1) times, the time duration between the 3rd last request and the 2nd last request is added (k − 2) times, and so on. Consequently, when IPDs are calculated, the importance of a request is de-emphasized in a pyramid-like fashion, depending on how old the request is. As we will show, the IPD of a video increases when the popularity of the video decreases.
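As a minimal sketch (the arrival times below are hypothetical), the IPD of one video follows directly from its request history:

```python
def ipd(arrivals, k, now):
    # Sum of the last k backward distances in time (BDTs).
    # arrivals: request times of the video, oldest first; needs len(arrivals) >= k.
    last_k = arrivals[-k:]
    return sum(now - a for a in last_k)

T = ipd([10, 40, 70, 90], k=3, now=100)   # BDTs are 10, 30, 60
```

Equivalently, the gap since the last request (10) is counted three times, the gap between the last two requests (20) twice, and the next gap (30) once: 30 + 40 + 30 = 100, showing the pyramid-like weighting.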
If the video lacks popularity, or is newly introduced, and the total number of requests is not sufficiently large, Equation (4) cannot be used to calculate its T_i value. While another formula with different weighting could be used to calculate a popularity estimate in this case, there are several reasons not to do so: 1. We could have less confidence in the estimate; 2. Choosing to shift an object to the cache should be a conservative decision, as it consumes resources and so should be done with high confidence in the performance improvement; 3. Small-sample estimation might be subject to gaming by, for instance, video companies wishing to increase the perceived popularity of their products. These issues of conservative estimation, small sample size, and the potential for gaming are related in a complex way and are not discussed further here. We will return to them in future work.
MIPD is applicable in scenarios where we need to identify the C most popular objects from a total of N objects and rank those C objects in order of their popularity. MIPD calculates the most recent k BDTs of each of the objects and uses them to calculate their IPDs. After that, MIPD sorts the objects in increasing order of their IPDs, corresponding to the decreasing order of their popularity. The value of k is selected based on the value of C, so that the estimated popularity rankings meet a pre-defined confidence level, at least for the C most popular objects.
The pseudo-code of the MIPD method is given in Algorithm 1. As described there, the inputs to the MIPD algorithm are k, C, the arrival time matrix t, and the current time t_0. The arrival time matrix t is derived from the request arrival history as follows:

t(i, j) = 0, if there are fewer than j past requests for object i; otherwise, t(i, j) is the j-th last request time of video i. (5)

The vector T stores the IPDs of the videos. It is initialized with all of its elements set to 0. It can be seen from the pseudo-code that the complexity of the MIPD algorithm is linear in k. While Algorithm 1 shows a straightforward version of the MIPD algorithm, we can use a different implementation with lower overhead and memory requirements if the popularity estimation needs to be carried out repeatedly on the arrival of each new request, such as in cache replacement algorithms. We will discuss this implementation in Section III-C.
Algorithm 1 The MIPD Algorithm
Input: k, C, arrival time matrix t, current time t_0
1: i ← 1
2: while i ≤ N do
3:   j ← 1
4:   m ← 0
5:   while j ≤ k and t(i, j) > 0 do
6:     t_{i,j} ← t_0 − t(i, j)
7:     m ← j
8:     j ← j + 1
9:   end while
10:  T(i) ← Σ_{j=1}^{m} t_{i,j} + (k − m) t_{i,m}
11:  (objects with m = 0 are left with T(i) = 0 and excluded from ranking)
12:  i ← i + 1
13: end while
14: Sort T and index top C associated objects
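Algorithm 1 can be sketched in Python as follows (the arrival histories are hypothetical; objects with only m < k recorded requests reuse their m-th BDT for the missing terms, as in the update T(i) ← Σ t_{i,j} + (k − m)t_{i,m} of the pseudo-code):

```python
def mipd_rank(t, k, C, t0):
    # t: object id -> list of past arrival times, most recent first,
    # so t[i][j-1] is the j-th last request time of object i.
    T = {}
    for i, times in t.items():
        if not times:
            continue                              # no history: object is not ranked
        m = min(len(times), k)
        bdt = [t0 - times[j] for j in range(m)]   # backward distances in time
        T[i] = sum(bdt) + (k - m) * bdt[m - 1]    # pad with the m-th BDT if m < k
    return sorted(T, key=T.get)[:C]               # smallest IPD = most popular

top = mipd_rank({"a": [95, 90, 80], "b": [70, 30], "c": [99, 98, 97]},
                k=3, C=2, t0=100)
```

With these numbers the IPDs are 35 ("a"), 170 ("b", padded), and 6 ("c"), so the two most popular objects are ranked ["c", "a"].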

B. CHOOSING THE VALUE K FOR MIPD
We show here that MIPD can accurately rank objects in order of popularity for an appropriate value of k, and discuss two methods that can be used to find suitable values for k: the first gives an exact minimal value for k, bearing in mind the level of confidence we require, whereas the other allows us to calculate rates of growth of upper bounds for k as the confidence level and the size of the cache increase. As discussed before, when IPDs are calculated, the importance of a request is de-emphasized in a pyramid-like fashion depending on how old the request is. This allows a quick response to changes in video popularity. Even if the popularity of video objects changes frequently with time, such changes still occur on the order of hours, and such periods typically contain thousands of request arrivals. In the following analysis, we consider a small time window, during which the popularity of the objects can be assumed to remain stable. We assume, also, that the request arrival process is approximated by a pseudo-stationary Poisson process during this time. This is justifiable because VoD requests are normally generated by a large number of independent users [28], [29]; we will discuss this further in Section IV.
Let λ_i be the mean request arrival rate of video i. Note that the j-th BDT of object i is equivalent to a summation of j request inter-arrival times of object i. Since the request arrivals for video i follow a Poisson process with rate λ_i, the inter-arrival times follow an exponential distribution with mean 1/λ_i. As a result, the j-th BDT of video i is Erlang distributed with rate parameter λ_i and shape parameter j. In turn, the IPD is a summation of k such Erlang distributed BDTs, so that the IPD of video i is also Erlang with rate parameter λ_i and shape parameter K, where

K = k(k + 1)/2. (6)

Recall that, in our proposed popularity estimation method, objects are ranked in decreasing order of popularity by indexing them in increasing order of IPD. Consider two objects i and i + 1, such that the popularity of object i is greater than the popularity of object i + 1. Let T_{i,k} and T_{i+1,k} be the IPDs of the objects i and i + 1, respectively. We can use these IPDs to correctly identify the more popular of these two objects only if T_{i,k} < T_{i+1,k} with high probability. This probability is [30]:

P(T_{i,k} < T_{i+1,k}) = ∫_0^∞ f_{T_{i,k}}(x) [1 − F_{T_{i+1,k}}(x)] dx, (7)

where f(·) and F(·) are the PDF and Cumulative Distribution Function (CDF) of the Erlang distributed random variables T_{i,k} and T_{i+1,k}, respectively.
Theorem 1: The probability of the IPD of object i being less than the IPD of object i + 1 is given by:

P(T_{i,k} < T_{i+1,k}) = I_{p_i}(K, K),

where p_i = λ_i/(λ_i + λ_{i+1}), K = k(k + 1)/2, and I_{p_i}(·, ·) is the regularized incomplete Beta function.
Proof: See Appendix A.
Now, let i and i + 1 be two videos with consecutive popularity ranks, where the popularity of i is greater than the popularity of i + 1. For this scenario, we plot the variation of P(T_{i,k} < T_{i+1,k}) as a function of k in Fig. 2, for a range of values of i. As can be seen from Fig. 2, the larger k is, the closer the value of P(T_{i,k} < T_{i+1,k}) is to 1. This indicates that, for k sufficiently large, the more popular of objects i and i + 1 can be identified with confidence based on their IPDs. Moreover, as can be seen from Fig. 2, for a given k, P(T_{i,k} < T_{i+1,k}) is closer to unity for popular objects (e.g., i = 5) than for less popular objects (e.g., i = 150). In other words, for a given k, IPD-based popularity estimation gives better performance for popular objects.
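The trend in Fig. 2 can also be checked by simulation. The sketch below assumes Zipf request rates λ_i = i^(−α) with α = 0.8 (illustrative values, not the paper's fitted parameters) and estimates P(T_{i,k} < T_{i+1,k}) by Monte Carlo, sampling each Erlang-distributed IPD as a sum of exponentials:

```python
import random

random.seed(7)

def erlang_sample(shape, rate):
    # Erlang(shape, rate) as a sum of independent exponential inter-arrival times.
    return sum(random.expovariate(rate) for _ in range(shape))

def p_correct(i, k, alpha=0.8, trials=2000):
    # Monte Carlo estimate of P(T_{i,k} < T_{i+1,k}).
    K = k * (k + 1) // 2                      # shape of the Erlang-distributed IPD
    li, lj = i ** -alpha, (i + 1) ** -alpha   # Zipf-like arrival rates
    wins = sum(erlang_sample(K, li) < erlang_sample(K, lj) for _ in range(trials))
    return wins / trials

pa, pb = p_correct(5, k=2), p_correct(5, k=10)
```

With these settings pa is only slightly above 0.5 while pb is noticeably closer to 1, matching the behavior of I_{p_i}(K, K) as k grows.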
Intuitively, the value of k, that is, the number of request arrival times from each of the videos used for the popularity estimation, should depend on the number of objects that we need to rank with a given accuracy. Suppose we need to identify and rank the C most popular objects from a total of N. As discussed above, for a given k, IPD-based popularity estimation is more robust for popular objects than for less popular objects. More specifically, for a given k, P(T_{C,k} < T_{C+1,k}) ≥ 1 − δ implies that P(T_{i,k} < T_{i+1,k}) ≥ 1 − δ for all 1 ≤ i ≤ C. As a result, a value of k large enough to identify the more popular of the two objects with popularity ranks C and C + 1 with a certain level of confidence is large enough to rank all the other objects with popularity ranks less than C with the same, if not higher, confidence.
Our exact method to find a suitable value for k is based on (24). We define a confidence bound δ and find the value of k that satisfies:

I_{p_C}(K, K) ≥ 1 − δ, (8)

where p_C = λ_C/(λ_C + λ_{C+1}). The formula in (8), while relatively easy to compute using, say, Matlab, gives little or no insight into the analytical form of the value of k as a function of the confidence level (1 − δ) and the size C of the cache, or at least of λ_C. Such an expression is available provided we accept an approximate method for computing k, which, at least in principle, gives such an analytical expression. Our approach is as follows. The IPDs are random variables, each taking different values with varying probabilities, and scattered around their means. This statistical noise of IPDs can affect popularity estimation and result in errors in the ranking. Popularity estimation becomes more robust if the gaps between IPDs are increased. Our proposed approximate method to determine the value of k is based on this observation. A lower bound L_C and an upper bound U_C for the IPD of the C-th most popular object are defined in Eq. (9), and we select k for the algorithm such that this IPD satisfies these bounds with high probability. By doing so, we ensure that the gaps between the IPD of the C-th most popular object and the two nearest IPDs, i.e., the IPDs of the (C + 1)-th and (C − 1)-th most popular objects, are sufficiently large. This is graphically illustrated in Fig. 3(a) and Fig. 3(b).
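For integer K, the condition in (8) can be evaluated without special-function libraries via the identity I_p(K, K) = P(Bin(2K − 1, p) ≥ K), computed in log space to avoid underflow. The sketch below (the rates and δ are illustrative) finds the minimal k by direct search:

```python
from math import lgamma, log, exp

def log_binom_pmf(n, j, p):
    return (lgamma(n + 1) - lgamma(j + 1) - lgamma(n - j + 1)
            + j * log(p) + (n - j) * log(1 - p))

def reg_beta(p, K):
    # Regularized incomplete Beta I_p(K, K) for integer K,
    # via the binomial-tail identity I_p(K, K) = P(Bin(2K-1, p) >= K).
    n = 2 * K - 1
    return sum(exp(log_binom_pmf(n, j, p)) for j in range(K, n + 1))

def min_k(lam_c, lam_c1, delta):
    # Smallest k with P(T_{C,k} < T_{C+1,k}) >= 1 - delta (Theorem 1).
    p = lam_c / (lam_c + lam_c1)
    k = 1
    while reg_beta(p, k * (k + 1) // 2) < 1 - delta:
        k += 1
    return k

k_star = min_k(5 ** -0.8, 6 ** -0.8, delta=0.1)  # ranks C = 5 and C + 1, 90% confidence
```

The direct search is practical because reg_beta is cheap for the K values involved; for large C and small δ, the approximate bound of this subsection becomes the more convenient tool.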
Recall that the IPD of each object is Erlang distributed. In the first subfigure, Fig. 3(a), we have selected k = 50. The gap between the IPD of the C-th most popular object and the (C − 1)-th most popular object is sufficiently large, while the gap between the IPD of the C-th most popular object and the (C + 1)-th most popular object is not. In the second subfigure, Fig. 3(b), we have selected k = 200. The gaps between the IPD of the C-th most popular object and the two nearest IPDs are both sufficiently large in this case. These two graphs demonstrate that larger k achieves better popularity estimation performance.
Definition 1: The IPD-based popularity estimation method ranks the C most popular objects in order of their popularity with a confidence of 1 − δ if

P(L_C ≤ T_{C,k} ≤ U_C) ≥ 1 − δ. (10)

Note that 1 − δ measures the confidence of popularity estimation with respect to the least popular video of interest. For example, if we need to identify the 100 most popular videos to store in a cache server, 1 − δ measures how good the popularity estimate of the 100th most popular video is. The confidence level of the popularity ranking of each video more popular than this one will be higher when its popularity is estimated using IPDs. We see then that 1 − δ provides a lower bound for the confidence of the overall popularity estimation. It is enough, then, to consider the popularity of higher-ranked videos to achieve good performance, though overall performance can be further improved by decreasing δ.
As the IPD T_{C,k} of the C-th most popular object is Erlang distributed with rate λ_C and shape K = k(k + 1)/2, its expected value is:

E[T_{C,k}] = K/λ_C. (11)

We use Chernoff bounds for T_{C,k} to further simplify Eq. (10).
As proved in Appendix B, the Chernoff bounds for the random variable T_{C,k} are given by:

P(T_{C,k} ≤ L_C) ≤ (λ_C L_C / K)^K e^{K − λ_C L_C}, (12)
P(T_{C,k} ≥ U_C) ≤ (λ_C U_C / K)^K e^{K − λ_C U_C}. (13)

Now, substituting (12) and (13) in (10), we see that the values of k that satisfy

(λ_C L_C / K)^K e^{K − λ_C L_C} + (λ_C U_C / K)^K e^{K − λ_C U_C} ≤ δ (14)

also satisfy Eq. (10). Solving Eq. (14) for K gives the expression in (15). We note that, for the Zipf distribution and C > 1, this, combined with a simple derivative argument for the function φ, shows that the second expression in the max in (15) is always larger, so that the correct choice of k is the least value satisfying (16). We observe that k found in this way increases as the square root of ln(1/δ). Numerical simulations suggest that, as a function of C, the value of k calculated by this method increases by less than C log(C) and more than C √log(C), so slightly greater than linearly.

C. OPTIMISATION OF MIPD FOR CACHE REPLACEMENT
In Section III-A, we presented a straightforward implementation of MIPD in Algorithm 1. This implementation is suitable for scenarios where popularity does not need frequent re-estimation. However, some cache servers are updated whenever a user requests a video not already stored in the cache. When these cache updates take place, the least popular objects are identified and removed from the cache if the remaining cache storage space is insufficient to accommodate the newly requested object. Hence, content management algorithms for such cache servers need to estimate popularity at every new request arrival. For rapid cache replacement, the IPD algorithm needs to be implemented with low computational overhead.
We now discuss how to reduce the overhead and memory requirements of MIPD when popularity estimation must be carried out on the arrival of each new request. Let T^{(τ)}_{i,k} denote the IPD of object i at time τ. Here, τ is the time at which the previous request arrived at the cache server, which coincides with the time of the previous popularity estimation. Suppose that a new request for object h arrives at the cache server at time τ + Δτ. We need to re-estimate the popularity of the objects at time τ + Δτ by recalculating the IPD of each of the objects. When the popularity of the objects is estimated at time τ + Δτ, the IPD update for each of the objects is:

T^{(τ+Δτ)}_{i,k} = Σ_{j=1}^{k−1} (τ + Δτ − τ^{(i)}_j), for i = h,
T^{(τ+Δτ)}_{i,k} = Σ_{j=1}^{k} (τ + Δτ − τ^{(i)}_j), for i ≠ h, (17)

where τ^{(i)}_j is the j-th last request arrival time of object i with respect to time τ. In Eq. (17), only the last (k − 1) request arrival times are considered when calculating the IPD of object h, since the most recent request for object h arrived at time τ + Δτ and contributes a zero BDT. On the other hand, the last k arrival times of each of the other objects are considered when their IPDs are calculated. Rewriting (17), we see that:

T^{(τ+Δτ)}_{h,k} = T^{(τ)}_{h,k} + (k − 1)Δτ − (τ − τ^{(h)}_k),
T^{(τ+Δτ)}_{i,k} = T^{(τ)}_{i,k} + kΔτ, for i ≠ h. (18)

By (18), we need only the current IPDs and the k-th last arrival time of each object to re-calculate the new IPDs on receiving a new request. The k-th last arrival times can be maintained in the form of first-in first-out (FIFO) queues, one per object, each with a length of k. When a request arrives, the relevant FIFO queue is updated with the current time, while removing the last entry in the queue. The ejected entry is the τ^{(h)}_k in Eq. (18). It should be clear, then, that IPD-based object popularity estimation can be carried out using Eq. (18) with low overhead, making it suitable for cache replacement.
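A minimal sketch of this FIFO-based bookkeeping (the identifiers are ours; it assumes every tracked object already has at least k recorded arrivals):

```python
from collections import deque

class IPDTracker:
    def __init__(self, k, history, t0):
        # history: object -> past arrival times, oldest first, each with >= k entries.
        self.k = k
        self.last = t0                                 # time of the previous estimation
        self.fifo = {i: deque(h[-k:], maxlen=k) for i, h in history.items()}
        self.ipd = {i: sum(t0 - a for a in q) for i, q in self.fifo.items()}

    def request(self, h, now):
        dt = now - self.last
        for i in self.ipd:
            self.ipd[i] += self.k * dt                 # every IPD grows by k * dt ...
        self.ipd[h] -= now - self.fifo[h][0]           # ... minus the correction for h
        self.fifo[h].append(now)                       # evicts the k-th last arrival
        self.last = now

tr = IPDTracker(k=2, history={"a": [1, 3], "b": [2, 4]}, t0=4)
tr.request("a", now=6)
```

After the request, tr.ipd matches a from-scratch recalculation: object "a" has BDTs 0 and 3 (IPD 3) and object "b" has BDTs 2 and 4 (IPD 6).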

IV. VOD SYSTEM MODELING

A. POPULARITY DISTRIBUTION IN VOD SYSTEM
We now consider a wide range of real-life issues. We discuss many recent references and data sets, and, in particular, we design our experiments based on what we learn from studies that use real-life time-dependent video arrival processes, so that our experiments follow realistic traffic models. According to [11], [31], the distribution of video popularity follows Pareto's Law; namely, this distribution is heavy-tailed, which implies that a few popular movies dominate most of the requested videos. This assertion is also supported by Choi et al. [32], who model the popularity of videos by the Zipf distribution (which is also heavy-tailed). They propose that, among N videos, video i, where i is its popularity rank in the VoD system, has a request frequency of W/i^α, where W is a normalization constant chosen so that the frequencies of all the videos satisfy

Σ_{i=1}^{N} W/i^α = 1,

where α is the parameter of the distribution. Fig. 4 shows the relationship between the request frequency and the popularity rank of videos, as modeled by the Zipf distribution.
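A sketch of the normalized Zipf popularity weights (N and α below are illustrative choices; the share of traffic captured by the top-ranked videos depends on α):

```python
def zipf_weights(N, alpha):
    # Rank i receives probability W / i**alpha, with W normalizing the sum to 1.
    raw = [i ** -alpha for i in range(1, N + 1)]
    W = 1.0 / sum(raw)
    return [W * r for r in raw]

p = zipf_weights(5000, alpha=0.8)
top20_share = sum(p[:1000])   # fraction of requests going to the top 20% of videos
```

For α = 0.8 the top 20% of 5000 videos draw roughly 70% of the requests; a larger α concentrates demand further onto the most popular titles.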

B. REQUEST ARRIVAL PROCESS IN VOD SYSTEM
Many research results from experiments on real VoD systems (e.g., [28], [29], [33]) show that video request traffic tends to follow a Poisson process over a sufficiently short period. Liu et al. [28] observed VoD user behavior over a period of five months, and their results provide a reliable basis for simulating VoD systems. They indicate that the request traffic of a VoD system can be approximated by a pseudo-stationary Poisson process over a time interval of about thirty minutes. Tanzil et al. [33], in order to generate user request data based on real-world YouTube reports, also assume that the request arrival process at a VoD server follows a Poisson process during a specified time slot (around twenty minutes). A pseudo-stationary Poisson process is defined by
$$P(N(t) = k) = \frac{(\lambda t)^k e^{-\lambda t}}{k!},$$
where $k$ is the request count in a time period of length $t$, and $\lambda$ is the arrival rate which, in our case, changes every half hour. The time between two requests is exponentially distributed with probability density function
$$f(x) = \lambda e^{-\lambda x}, \quad x \geq 0,$$
where $x$ is the time elapsed between two requests. For this (continuous) random variable, $\lambda$ is the mean arrival rate and $1/\lambda$ is the mean time between two requests. Accordingly, we use a time-dependent Poisson process and rely on the studies in [29], [31], [34] for parameter fitting.
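A pseudo-stationary request stream of this kind can be generated by drawing exponential inter-arrival gaps whose rate changes at each slot boundary. The sketch below samples each slot independently from its own start, a standard simplification of the exact non-homogeneous process; the 48-slot rate profile is illustrative:

```python
import random

def piecewise_poisson_arrivals(rates, slot_len, rng=random):
    """Arrival times of a pseudo-stationary Poisson process: within slot j
    (of length slot_len seconds) the rate is rates[j] requests/second.
    Each slot is sampled independently from its own start, a simple
    approximation to the exact non-homogeneous process."""
    arrivals = []
    for j, lam in enumerate(rates):
        t = j * slot_len
        while True:
            t += rng.expovariate(lam)  # exponential gap at the slot's rate
            if t >= (j + 1) * slot_len:
                break
            arrivals.append(t)
    return arrivals

# 48 half-hour slots with an illustrative diurnal rate profile.
rng = random.Random(7)
rates = [0.5 + 0.4 * (j % 24) / 24 for j in range(48)]
times = piecewise_poisson_arrivals(rates, slot_len=1800, rng=rng)
```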

C. PARAMETER SETTINGS IN THE SIMULATION
A single-level VoD system with one VoD server and one local server is considered in our evaluation. Fig. 5 graphically illustrates the logical architecture of the simulated network.
In our simulations, we estimate the popularity of the videos at the VoD server and pre-place the full videos at the local server based on their estimated popularity.As shown in Fig. 5, when a user requests a video, the local server can send out the video to the user if the requested video is pre-placed in it.
Otherwise, the request is served by the VoD server. Moreover, we assume that video sessions run for the full length of the videos. This is, however, not the case in real VoD systems, as shown by recent research [11], [17], [18]. Nevertheless, this assumption does not affect the validity of our evaluation, as we are concerned only with the popularity estimation aspect of content placement. Of course, better performance could be achieved by pre-placing partial videos, but this is not the focus of our work. In our VoD system model, the total number of videos is N = 5000. According to the Zipf distribution, around 20% of the 5000 videos account for 80% of the requests in a day, while the rest of the videos receive far fewer requests. The cache size was set to 200 videos. Since the video request process is pseudo-Poisson, we divide a day into 48 half-hour periods, and the simulation runs for 24 hours. According to [28], the arrival rate of requests varies over a day, and the daily pattern of the request traffic remains similar across days. Diurnal request patterns also vary across movies. More specifically, some videos are more popular in the morning (e.g., cartoons and morning news), some are more popular during the noon break (e.g., news), and others are more popular in the evening (e.g., movies and drama). This behavior of the request traffic in VoD systems is also reported in [35], [36]. For this reason, the request counts of popularity-ranked movies within a single hour may not follow a Zipf distribution; we therefore assume that the request counts of popularity-ranked movies over a whole day follow the Zipf distribution.
We classify these 5000 videos into three classes (Class M , Class A , Class E ) according to which part of the day (morning, afternoon, evening, respectively) their popularity is highest.
Let $\lambda_i$ be the request arrival rate for video $i$ averaged over a day, $i = 1, 2, \ldots, N$, and $\lambda_{i,j}$ the request arrival rate for movie $i$ in the $j$th half hour, $j = 1, 2, \ldots, 48$. Then $\lambda_i = \sum_{j=1}^{48} \lambda_{i,j}$, where the $\lambda_i$ follow a Zipf distribution. According to [28], the average number of hourly requests is around $10^6$ for a VoD system of around 250,000 videos, so in this case $\sum_{i=1}^{N} \lambda_i = 10{,}000{,}000$. The parameters of the VoD system, with their meanings and default values, are provided in Table 1.
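To obtain the per-slot rates $\lambda_{i,j}$ from the daily totals $\lambda_i$, one simple approach (a sketch with illustrative diurnal profiles, not the paper's exact parameterization) is to split each video's daily rate across the 48 half-hour slots in proportion to a class-dependent weight profile:

```python
def rate_matrix(daily_rates, classes, profiles):
    """Split each video's daily total daily_rates[i] across 48 half-hour
    slots using its class's diurnal weight profile, so that
    sum_j lam[i][j] == daily_rates[i] (cf. lambda_i = sum_j lambda_{i,j})."""
    lam = []
    for total, cls in zip(daily_rates, classes):
        prof = profiles[cls]
        s = sum(prof)
        lam.append([total * w / s for w in prof])
    return lam

# Illustrative profiles: class M peaks in the morning (slots 12-23),
# class A around midday, class E in the evening.
profiles = {
    "M": [3 if 12 <= j < 24 else 1 for j in range(48)],
    "A": [3 if 22 <= j < 30 else 1 for j in range(48)],
    "E": [3 if 36 <= j < 46 else 1 for j in range(48)],
}
lam = rate_matrix([960.0, 480.0], ["M", "E"], profiles)
```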

V. PERFORMANCE EVALUATION
We describe here the simulation experiments used to evaluate the performance of our proposed MIPD popularity estimation algorithm. To understand how $k$ influences the accuracy and efficiency of MIPD, we run the experiments with different values of $k$. We also compare the performance of MIPD against other common popularity estimation methods: LFU, LRU, LRFU, and EWMA. The popularity of the videos is re-evaluated, and the pre-placed content updated accordingly, every half hour. By default, we use $k = 456$ for comparison. This value is determined via the approximate method described in Section III-B, so as to rank the 5000 objects with confidence greater than $1 - \delta = 0.9$. For the LRFU popularity estimation method, $\tau = 0.001$ in Eq. (1) gave the best performance in our setting, so this value of the smoothing parameter is used for LRFU in our simulations. For the LFU method, we assume that the entire history is available for popularity estimation, namely 24 hours in the current scenario. For the LRU method, the recency of the most recent request of each video is used.
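The paper's exact EWMA formulation is not reproduced in this section, but a common per-slot variant (sketched below with an illustrative smoothing weight `beta`) updates each video's score from its request count in the latest half hour:

```python
def ewma_update(scores, counts, beta=0.2):
    """One half-hour EWMA update of per-video popularity scores:
    score_new = (1 - beta) * score_old + beta * count_this_slot.
    A larger beta reacts faster to popularity changes but is noisier."""
    videos = set(scores) | set(counts)
    return {v: (1 - beta) * scores.get(v, 0.0) + beta * counts.get(v, 0)
            for v in videos}

scores = {}
scores = ewma_update(scores, {"v1": 10, "v2": 2})
scores = ewma_update(scores, {"v2": 12})   # v2 surges while v1 goes quiet
ranking = sorted(scores, key=scores.get, reverse=True)
```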

A. THE EFFECT OF K ON MIPD PERFORMANCE
We use the hit ratio (HR) and the rank bias (RB) as the performance metrics in our evaluation. We consider a scenario where video popularity ranks estimated at the VoD server are used to determine the videos to be pre-placed at the local server, so as to maximize HR. An efficient popularity estimation method should accurately identify the latest trends in user demand, and should therefore yield a better HR at the local server. The hit ratio is defined as the number of requests served by the local server divided by the total number of requests. For a large sample of $N$ video requests, the number of requests for the $i$th most popular video is approximately $p_i N$, where $p_i$ is the request probability of the $i$th most popular video. The HR expected at the local server is
$$\mathrm{HR} = \sum_{i \in S} p_i,$$
where $S$ is the set of popularity ranks of the videos that are pre-placed. Given a local server with constrained storage capacity, we should therefore pre-place the most popular videos in order to achieve the maximum HR. As shown in Fig. 6, the HRs for different $k$ fluctuate through the day because the popularity of each of the various classes of videos changes according to a different pattern. Additionally, as $k$ increases, the speed of response of MIPD to changing popularity decreases, because a larger $k$ means MIPD takes account of more of the historical requests. More specifically, as shown in Fig. 7, with increasing $k$ the accuracy of MIPD first increases gradually and then shows a downward trend: a larger $k$ does not guarantee better performance, especially when the popularity rank of videos changes frequently. We also use RB as a performance metric in this paper; it is the absolute deviation between the actual rank and the estimated rank of a specific video. In this simulation, we assume that the diurnal request pattern of each of the popularity types remains constant, so that their ranking does not change during the day. In order to compare the different values of $k$ through RB, we introduce three videos with known popularity ranks of 2, 20, and 200 into the system at the 20th half hour. As mentioned before, the popularity of videos is estimated every half hour.
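The expected hit ratio $\mathrm{HR} = \sum_{i \in S} p_i$ can be computed directly from the Zipf probabilities; the sketch below (with an illustrative $\alpha$) shows that caching the top-ranked videos yields a higher expected HR than a lower-ranked placement of the same size:

```python
def zipf_probs(n, alpha):
    """Normalized Zipf request probabilities for ranks 1..n."""
    w = [1.0 / i ** alpha for i in range(1, n + 1)]
    s = sum(w)
    return [x / s for x in w]

def expected_hit_ratio(probs, cached_ranks):
    """Expected HR at the local server: the sum of request probabilities
    of the pre-placed videos, HR = sum_{i in S} p_i (ranks are 1-based)."""
    return sum(probs[i - 1] for i in cached_ranks)

p = zipf_probs(5000, 0.8)
hr_top = expected_hit_ratio(p, range(1, 201))     # top-200 placement
hr_mid = expected_hit_ratio(p, range(201, 401))   # a worse placement
```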
Fig. 8 shows that when a new video (whose true popularity rank is around 200) is introduced, it initially ranks very low in popularity, as it has no record of requests. As the new video is subsequently requested by users, the accuracy of the popularity estimate increases gradually. The black dashed lines in the figures mark the actual popularity rank of the video. For the different values of $k$, the estimated popularity fluctuates around the true popularity over time. Varying the value of $k$ changes the responsiveness and accuracy of MIPD: as $k$ increases, the bias of MIPD decreases gradually, indicating improved accuracy, while the speed of response of MIPD decreases.
To investigate the accuracy of MIPD for videos of different popularity ranks, we accumulate the absolute deviation over time after new videos are introduced into the system, and compare the variation in the ranking of the videos. As Fig. 9 shows, the higher the popularity of a video, the more accurate the MIPD rank estimate. The average absolute deviation for a video with popularity rank 2 is smaller than that for a video with popularity rank 20; for a video with popularity rank 200, the average absolute deviation is much larger.

B. COMPARISON BETWEEN DIFFERENT METHODS
In this subsection, we again use HR and RB as the performance metrics. We use $k = 456$, noting that the speed of response to variation of popularity rank with time is not a major issue here, because the 200 most popular videos to be placed in the cache have a sufficiently high request rate. We compare the HRs achieved by the local server under each of the considered popularity estimation methods when there are three classes of videos with popularity patterns that vary over the day, and we compare the accuracy of the five methods in ranking the newly introduced videos. Fig. 10 shows that the HRs of the five methods fluctuate throughout the day. Note that we record the HR from t = 5 (half hours) because, before that time, there are insufficient request records to calculate the HR. It is clear that MIPD achieves the best performance: it has a higher HR than the other four methods most of the time, especially when the popularity of videos is changing. This suggests that MIPD responds very effectively to popularity changes. LRU has the lowest HR, despite a rapid response to popularity change. The HRs of the LRFU and LFU methods are also high, at times even exceeding that of MIPD, but their response to popularity change is relatively slow, because LRFU and LFU take more account of historical requests than the other methods. The EWMA method also performs well in the simulation: it has a higher HR than LRU, and responds faster than LFU and LRFU when the popularity of videos changes.
We also use RB to compare the different popularity estimation methods, as shown in Figs. 11(a)-11(c). The performance of LRU is the worst, consistent with its poor HR. The performance of both LFU and LRFU is good, and as the request statistics become sufficient, their accuracy increases; this indicates that LFU and LRFU can perform much better when the popularity of videos is stable. EWMA also predicts the popularity rank of videos relatively accurately, and its fluctuations, though larger than those of MIPD, are relatively small.

VI. CONCLUSION
In this paper, we have described a significant extension (called MIPD) of the IPD method for object popularity estimation reported in [14], where it was called the Last-k method. MIPD, computed from the arrival times of requests for each of the objects, de-emphasizes the importance of a request in a pyramid-like fashion according to the age of the request. The popularity of a video reflected by its IPD is a seamless blend of the long-term and short-term popularity of that video. We analyzed the relationship between the value of the parameter $k$ in MIPD and the confidence in the estimate, and demonstrated how to find a $k$ large enough to achieve a given confidence requirement. We also discussed how to reduce the overhead and memory requirements of the algorithm: the implementation can refresh the popularity ranks efficiently, making it suitable for scenarios where the popularity of the objects must be estimated very frequently. We used a range of simulations to evaluate MIPD against four other popularity estimation methods proposed in the literature. The simulation results reveal that the MIPD popularity estimation method outperforms the other four methods in terms of both hit ratio (HR) and rank bias (RB).

APPENDIXES APPENDIX A PROOF OF THEOREM I
We restate Theorem 1 here: The probability that the IPD of object $i$ is less than the IPD of object $i+1$ is given by
$$P(T_{i,k} < T_{i+1,k}) = I_{p_i}(K, K),$$
where $p_i = \frac{\lambda_i}{\lambda_i + \lambda_{i+1}}$, $K = \frac{k(k+1)}{2}$, and $I_{p_i}(\cdot, \cdot)$ is the regularized incomplete Beta function.
Substituting the relevant expressions in the equation above and simplifying, we have
$$P(T_{i,k} < T_{i+1,k}) = \int_0^{\infty} \frac{\lambda_{i+1}^{K} t^{K-1} e^{-\lambda_{i+1} t}}{(K-1)!} \left(1 - \sum_{j=0}^{K-1} e^{-\lambda_i t} \frac{(\lambda_i t)^j}{j!}\right) dt. \quad (24)$$
Using the integral identities from [37] yields a further simplification of Eq. (24):
$$P(T_{i,k} < T_{i+1,k}) = \sum_{x=0}^{K-1} \binom{K + x - 1}{x} p_i^{K} (1 - p_i)^{x}, \quad (25)$$
where $p_i = \frac{\lambda_i}{\lambda_i + \lambda_{i+1}}$. Let $X$ be a negative binomial random variable with parameters $K$ and $p_i$, i.e., $X \sim \mathrm{NB}(K, p_i)$, and let $F_{K,p_i}(x)$ denote the cumulative distribution function (CDF) of $X$. Then, Eq. (25) can be reduced to
$$P(T_{i,k} < T_{i+1,k}) = F_{K,p_i}(K - 1). \quad (26)$$
Using the expression for the CDF of the negative binomial distribution [38], we can write
$$F_{K,p_i}(x) = I_{p_i}(K, x + 1), \quad (27)$$
where $I_{p_i}(\cdot, \cdot)$ is the regularized incomplete Beta function. Finally, substituting Eq. (27) with $x = K - 1$ in Eq. (26), we have
$$P(T_{i,k} < T_{i+1,k}) = I_{p_i}(K, K),$$
which completes the proof of Theorem 1.
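Theorem 1 can be sanity-checked by Monte Carlo simulation, assuming (as in the appendices) that $T_{i,k}$ is Erlang with shape $K = k(k+1)/2$ and rate $\lambda_i$; the rates and $k$ below are illustrative. $I_p(K, K)$ is evaluated via the standard binomial-sum identity rather than a special-function library:

```python
import math
import random

def erlang_sample(shape, rate, rng):
    """Erlang(shape, rate) sample as a sum of `shape` exponential gaps."""
    return sum(rng.expovariate(rate) for _ in range(shape))

def reg_beta_KK(p, K):
    """I_p(K, K) via the identity
    I_p(K, K) = sum_{m=K}^{2K-1} C(2K-1, m) p^m (1-p)^(2K-1-m)."""
    n = 2 * K - 1
    return sum(math.comb(n, m) * p**m * (1 - p)**(n - m)
               for m in range(K, n + 1))

rng = random.Random(42)
k = 3                      # number of past requests used by MIPD
K = k * (k + 1) // 2       # Erlang shape, K = k(k+1)/2
lam_i, lam_j = 1.2, 1.0    # illustrative rates of objects i and i+1
trials = 20000
hits = sum(erlang_sample(K, lam_i, rng) < erlang_sample(K, lam_j, rng)
           for _ in range(trials))
mc = hits / trials
theory = reg_beta_KK(lam_i / (lam_i + lam_j), K)  # Theorem 1 prediction
```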

APPENDIX B CHERNOFF BOUNDS FOR ERLANG RANDOM VARIABLES
The Chernoff bounds for a random variable $X$ are given by
$$P(X \geq a) \leq \min_{t > 0} e^{-ta} M(t) \quad (29)$$
and
$$P(X \leq a) \leq \min_{t < 0} e^{-ta} M(t), \quad (30)$$
where $M(t) = E[e^{tX}]$ is the moment generating function of $X$. The moment generating function of an Erlang distributed random variable with rate parameter $\lambda$ and shape parameter $K$ is [39]
$$M(t) = \left(\frac{\lambda}{\lambda - t}\right)^{K}, \quad t < \lambda. \quad (31)$$
Substituting (31) in (29) and (30) gives the Chernoff bounds for the Erlang distributed random variable $T_{C,K}$ as
$$P(T_{C,K} \geq a) \leq e^{-ta} \left(\frac{\lambda}{\lambda - t}\right)^{K}, \quad 0 < t < \lambda, \quad (32)$$
and
$$P(T_{C,K} \leq a) \leq e^{-ta} \left(\frac{\lambda}{\lambda - t}\right)^{K}, \quad t < 0. \quad (33)$$
By differentiating (32) and (33) with respect to $t$ and setting the result to 0, we see that the value of $t$ that minimizes the right-hand sides of (32) and (33) is, in both cases,
$$t^{*} = \lambda - \frac{K}{a}, \quad (34),(35)$$
which is positive when $a > K/\lambda$ and negative when $a < K/\lambda$. Finally, substituting (34) and (35) in (32) and (33), respectively, we obtain
$$P(T_{C,K} \geq a) \leq \left(\frac{\lambda a}{K}\right)^{K} e^{K - \lambda a}, \quad a \geq \frac{K}{\lambda}, \quad (36)$$
and
$$P(T_{C,K} \leq a) \leq \left(\frac{\lambda a}{K}\right)^{K} e^{K - \lambda a}, \quad a \leq \frac{K}{\lambda}. \quad (37)$$
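These bounds can be checked numerically against the exact Erlang tail probability; the parameters below are illustrative:

```python
import math

def erlang_tail(K, lam, a):
    """Exact P(T >= a) for Erlang(shape K, rate lam):
    sum_{j=0}^{K-1} e^(-lam*a) * (lam*a)^j / j!."""
    return sum(math.exp(-lam * a) * (lam * a)**j / math.factorial(j)
               for j in range(K))

def chernoff_upper(K, lam, a):
    """Chernoff bound (lam*a/K)^K * e^(K - lam*a), valid for a >= K/lam."""
    return (lam * a / K)**K * math.exp(K - lam * a)

K, lam = 6, 1.5
for a in (K / lam, 6.0, 10.0, 20.0):
    # The bound must dominate the exact tail on its region of validity.
    assert erlang_tail(K, lam, a) <= chernoff_upper(K, lam, a) + 1e-12
```

At $a = K/\lambda$ the bound equals 1 and is vacuous; it tightens exponentially as $a$ grows, which is what makes it useful for choosing $k$ in Section III-B.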

FIGURE 1. Request arrivals at the server.

FIGURE 2. Variation of $P(T_{i,k} < T_{i+1,k})$ with $k$.

FIGURE 5. VoD system architecture considered for the evaluation.
$$\mathrm{HR} = \frac{\text{Number of requests served by the local server}}{\text{Total number of requested videos}}.$$

FIGURE 6. Variation of hit ratio with time for different values of $k$.

FIGURE 7. Average hit ratio with different $k$.

FIGURE 8. Variation of Rank-200 bias with time.

FIGURE 9. Average bias with different $k$.

FIGURE 10. Variation of hit ratio with time for different methods.

FIGURE 11. Variation of rank bias with time for different methods.
When new videos are introduced into the library at t = 20 (half hours), none of the methods can identify the popularity ranks of those new videos promptly because of insufficient data. Figs. 11(a)-11(c) show the variation of the estimated video popularity ranking with time for three videos of different popularities. The statistical variability of the popularity ranking under MIPD appears smaller than under the other methods, regardless of whether the true rank is 2, 20, or 200.

TABLE 1. The parameters of the VoD model.