Real-Time Video Content Popularity Detection Based on Mean Change Point Analysis

Video content is responsible for more than 70% of the global IP traffic. Consequently, it is important for content delivery infrastructures to rapidly detect and respond to changes in content popularity dynamics. In this paper, we propose the employment of on-line change point (CP) analysis to implement real-time, autonomous and low-complexity video content popularity detection. Our proposal, denoted as real-time change point detector (RCPD), estimates the existence, the number and the direction of changes on the average number of video visits by combining: (i) off-line and on-line CP detection algorithms; (ii) an improved time-series segmentation heuristic for the reliable detection of multiple CPs; and (iii) two algorithms for the identification of the direction of changes. The proposed detector is validated against synthetic data, as well as a large database of real YouTube video visits. It is demonstrated that the RCPD can accurately identify changes in the average content popularity and the direction of change. In particular, the success rate of the RCPD over synthetic data is shown to exceed 94% for medium and large changes in content popularity. Additionally, the dynamic time warping distance between the actual and the estimated changes has been found to range from 20 samples on average, over synthetic data, to 52 samples, in real data. The rapid responsiveness of the RCPD is instrumental in the deployment of real-time, lightweight load balancing solutions, as shown in a real example.


I. INTRODUCTION
Video content is projected to account for 82% of the global Internet traffic by 2020, a significant increase from 72% in 2016 [1]. In parallel, novel networking, cloud and edge computing paradigms with significant elasticity capabilities have recently appeared, e.g., software-defined networks (SDN) [2], cloud orchestration proposals [3] and content distribution networks (CDNs) [4]. These advances offer the means to respond quickly to changes in content popularity dynamics with appropriate adaptations, e.g., in terms of efficient server resource allocation schemes, load balancing or content caching. As a result, the early detection of changes in content popularity [5], [6] is proving a highly important topic and can have a significant impact on the network traffic and the utilization of servers.
So far, the vast majority of research efforts have focused on the prediction of content popularity dynamics, as opposed to their real-time detection, which is the focus of this study.
There is a multitude of reasons as to why the precision of even state-of-the-art prediction algorithms can be impaired. A variety of factors, both from the digital and the physical world, can influence the users' Internet surfing behavior, e.g., [5]: (i) the quality, type (e.g., commercial or user-provided) and lifetime of content; (ii) its relevance to users and physical events; (iii) the social interactions between users; and (iv) the content promotion strategies involved. Importantly, mid-term and long-term content popularity prediction [7], and corresponding adaptations in the network or cloud environment, can prove highly inaccurate [8] and thus result in sub-optimal service planning, provisioning, and utilization of resources or violation of service level agreements.
In this work, to address the aforementioned shortcomings of the commonly employed prediction algorithms, we propose a corresponding detector, referred to as the "real-time change point detector" (RCPD). The RCPD is compatible with modern, flexible networking and cloud approaches, that are highly adaptive and can respond to short-term network dynamics. With accurate, on-line content popularity detection, discrepancies between inaccurate predictions and actual changes can be alleviated. The RCPD is real-time, lightweight, accurate and is parameterized autonomously by analyzing historical data.
In the RCPD, we employ the change point (CP) detection theory and algorithms; their suitability is confirmed against a large number of synthetic as well as real YouTube video datasets. In this contribution, the early detection of changes in the average content popularity is addressed with a novel CP detection methodology, consisting of a training phase, using historical data, and, an on-line phase. In the training phase, we employ a modified off-line CP detection scheme to configure the on-line (sequential) algorithm's parameters. This approach is shown to greatly improve the accuracy of the online detector, as in essence, the algorithm parameterization is not arbitrary but rather extracted from corresponding historical data. To the best of our knowledge, it is the first time in the literature that retrospective (off-line) and sequential (online) CP detection schemes are combined in a single algorithm operating autonomously (i.e., without manual configuration of parameters).
Besides that, our approach complements the off-line scheme with an improved time-series segmentation heuristic for the detection of multiple CPs. Furthermore, we propose two possible variations for the on-line CP algorithm, the first based on the standard cumulative sum (CUSUM) procedure [9] and the second on the ratio-type CUSUM procedure [10]. Additionally, we introduce two alternative indicators to detect the direction of changes: the first one is directly derived from the statistical test of the on-line CP procedure, while the second is based on a modified exponential moving average filter, extensively used in econometrics. As discussed in Sections III and IV, the RCPD combines all the above mentioned algorithmic elements, and is based on sufficiently general and convenient assumptions. Moreover, unlike other approaches, e.g., [11], we employ methods that allow dependence between observations (in the form of t-dependence), leading to more realistic assumptions for the statistical structure of the content visits.
We evaluate the proposed detector and its individual algorithmic components (i.e., the off-line / on-line test statistics, the time-series segmentation algorithm and the trend indicator), over synthetic and real YouTube content views data. Our experiments using synthetic data, generated by an autoregressive moving average (ARMA) filter, demonstrate:
• The superior performance of the proposed time-series segmentation heuristic over the standard approach, improving the true alarm rates by up to 43%.
• The ability of the two proposed trend indicators to identify the direction of estimated changes, with successful identification rates exceeding 99%, in all cases.
• The RCPD performance; the true alarm rates surpass 94% for medium / large changes in the mean number of content views, while the corresponding CP identification lag ranges from 10 to 20 instances, confirming the real-time operation of the detector. On the other hand, the RCPD achieves very small false alarm rates, well within the limits of the statistical error specified by the chosen significance level of the CP algorithms.
Furthermore, our tests on real YouTube content views datasets show that:
• YouTube video views match the underlying assumptions of the RCPD, i.e., the content popularity time-series datasets can be modeled as t-dependent.
• The RCPD can detect CPs in more than 70% of the videos in our dataset, implying a sufficiently high number of content popularity changes and the suitability of the CP theory framework for content popularity detection.
• The successful CP direction identifications exceed 91%, i.e., the proposed trend indicators work for real data.
• The average dynamic time warping (DTW) distance [12], [13] between the identified CPs and a benchmark off-line algorithm was estimated to be 52 time instances, showcasing the rapid responsiveness of the RCPD.
• The overall processing cost of the RCPD is very low; notably, it took less than one second to process 882 videos on a typical personal computer (PC).
Finally, as a proof-of-concept, we demonstrate the applicability of the proposed algorithm in a real load balancing scenario. We provide a set of measurements showcasing improvements in terms of the clients' connectivity time to download specific content, without a significant impact on the utilization of the content servers. This is achieved due to the deployment of additional content caches, an event triggered by the output of the proposed RCPD detector.
The rest of the paper is organized as follows. In Section II, we discuss our approach with respect to related works. In Section III, we present the training phase of the RCPD algorithm, while the on-line phase is discussed in Section IV. In Section V, we present four experiments over synthetic data, providing an extensive validation of the RCPD and its subroutines, while in Section VI, we discuss corresponding experiments using a database of real YouTube video views. In Section VII, we demonstrate the load balancing gains achieved through the use of the RCPD, in a realistic content provisioning scenario. Our conclusions and directions for future work are presented in Section VIII.

II. RELATED WORKS
In this Section, we discuss how this work relates to the literature of video content popularity prediction, on one hand, and, anomaly detection (i.e., CP analysis), on the other hand.
The topic of content popularity attracted a lot of attention in recent years, because of its importance in a number of applications, such as network dimensioning (e.g., capacity planning or scaling of resources), on-line marketing (e.g., advertising, recommendation systems) or real-world outcome prediction (e.g., analysis of economical trends) [5]. The main approaches used for content popularity estimation can be categorized as: (i) cumulative growth studies, estimating the "amount of attention" from the publication instance to the prediction moment [6]; (ii) temporal analysis approaches, i.e., how content visits evolve over time [14]; and (iii) clustering methods of content with similar popularity trends [7]. We note that many content popularity studies consider the aggregate behavior of a particular content, e.g., [6], [14], whereas we study the real-time behavior of video views time-series. In addition, studies using clustering methods [7] are based on content popularity prediction and adopt parametric models, unlike the RCPD algorithm that is non-parametric.
To the best of our knowledge, our earlier conference paper [15] is the first in the literature proposing CP techniques [16] for content popularity detection. The RCPD algorithm falls into the general category of anomaly detection [17]; in essence, we assume that the absence of changes in popularity constitutes the normal behavior of video content and search for deviations from this behavior. Non-parametric anomaly detection has typically been considered for the detection of abnormalities in the network traffic. As an example, in [18] an algorithm based on the Shiryaev-Roberts procedure was proposed for anomaly detection in computer network traffic. In [19] and [20], CUSUM-based approaches were introduced for the detection of SYN attacks.
Further examples of parametric anomaly detection methods include [21], in which a bivariate sequential generalized likelihood ratio test (LRT) was proposed, accounting for the packet rate (assumed to follow a Poisson distribution) and the packet size (assumed to follow a normal distribution). Other parametric anomaly detection approaches assume a particular underlying process for the normal behavior and search for anomalies on the residuals of the process. For example, in [22], Kalman filtering is combined with several CP methods, such as CUSUM and LRT, to detect anomalies in origin-destination flows. In [23], traffic flows (in the form of TCP's finite state machine) are modeled using Markov chains and an anomaly detection mechanism based on the generalized LRT algorithm is developed.
As opposed to previous content popularity prediction works, in this paper we introduce a novel CP detection methodology that provides accurate, lightweight, autonomous and on-line CP detection of content popularity. We formulate the detection of a change in the average content popularity as a statistical hypothesis test and employ non-parametric procedures to avoid a particular distribution assumption (such as a specific copula model). This context ensures low convergence time, since it avoids both the estimation of a large number of model parameters and restrictive assumptions that may not match the structure of the time-series. Furthermore, we avoid problems of parametric models that require parameter fitting and selection, which become challenging as new data become available. In the proposed RCPD algorithm, an off-line phase specifies important parameters for the on-line phase; these parameters are re-evaluated dynamically after a detected CP. Our load balancing experiments, elaborated in [4], demonstrate the RCPD's behavior in a real test-bed deployment.
Up to now there are only a handful of proposals addressing the challenges of new flexible networking and cloud architectures accounting for content popularity. Exceptions include [24], in which a logistic-loss machine learning approach to content popularity prediction is applied for a Fog RAN environment, and our recent papers [4] and [15]. In [4], the algorithm (outlined in [15] and presented extensively in the present paper) is integrated into an elastic CDN framework based on lightweight cloud capabilities using Unikernels. Reference [4] focuses on the platform details rather than on the CP algorithm; it confirms experimentally the suitability of the latter for relevant flexible network and cloud architectures. The first detailed description of the proposed CP detection algorithm is presented in the following Sections, along with a rich set of validation results. We elaborate on the two phases of the RCPD in Sections III and IV respectively and provide the corresponding pseudo-code.

III. TRAINING (OFF-LINE) PHASE
In this Section, the training phase of the algorithm is discussed and the fundamental components of the off-line scheme are presented. We note that standard off-line CP schemes can only detect a single CP. To address the issue of detection of multiple CPs, we modify the basic algorithm with a novel time-series segmentation heuristic, that belongs to the family of binary segmentation algorithms.

A. Basic Off-line Approach
Let $\{X_n : n \in \mathbb{N}\}$ be a sequence of $r$-dimensional random vectors (r.v.). The first dimension represents the number of views for a specific video content within a time period $n \in \{1, \ldots, N\}$, while the other dimensions could be optionally used to represent other content popularity features, such as likes, comments, etc. We assume that $X_1, \ldots, X_N$ can be written as,

$$X_n = \mu_n + Y_n, \quad 1 \le n \le N, \tag{1}$$

where $\{\mu_n : n \in \mathbb{N}\}$ is the mean value of video visits, $\{Y_n : n \in \mathbb{N}\}$ a random component with zero mean $E[Y_n] = 0$ and positive definite covariance matrix, $E[Y_n Y_n^T] = \Sigma$, while $E[\cdot]$ denotes expectation. We further assume that the time-series is t-dependent, implying that, for some $t \in \mathbb{N}$, the vectors $X_n$ and $X_{n+k}$ are independent whenever $k > t$. The model in (1) and the underlying assumption of t-dependence are in agreement with statistical characterizations of the distribution of visits, which have been shown in numerous analyses to follow either a Zipf [25] or a Zipf-Mandelbrot [26] distribution for both commercial and user-generated content. Furthermore, it is confirmed in the real YouTube datasets used in the present work through the evaluation of the time-series' Hurst exponents, as will be discussed in Section VI-A.
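The off-line analysis tests the constancy (or not) of the mean values up to the current time $N$. Hence, we define the following null hypothesis of constant mean,

$$H_0: \mu_1 = \cdots = \mu_N,$$

against the alternative,

$$H_1: \mu_1 = \cdots = \mu_{k^*_{off}} \neq \mu_{k^*_{off}+1} = \cdots = \mu_N,$$

indicating that the mean value changed at the unknown (time) point $k^*_{off} \in \{1, \ldots, N\}$. Considering (1) and the corresponding assumptions for the stochastic process $X_n$, we develop a non-parametric CUSUM test statistic following [27]. The test statistic $TS_{off}$ can be viewed as a max-type procedure,

$$TS_{off} = \max_{1 \le n \le N} C_n^T \hat{\Omega}_N^{-1} C_n, \tag{2}$$

where the parameter $C_n$ is the retrospective CUSUM detector,

$$C_n = \frac{1}{\sqrt{N}} \sum_{i=1}^{n} \left( X_i - \bar{X}_N \right), \tag{3}$$

and $\bar{X}_N = \frac{1}{N} \sum_{i=1}^{N} X_i$ denotes the sample mean. $\hat{\Omega}_N$ represents a suitable estimator of the long-run covariance $\Omega$, where

$$\Omega = \sum_{s=-\infty}^{\infty} \mathrm{Cov}\left( X_n, X_{n-s} \right). \tag{4}$$

The estimator should satisfy,

$$\hat{\Omega}_N \xrightarrow{P} \Omega, \tag{5}$$

where $\xrightarrow{P}$ denotes convergence in probability. Several estimators have been proposed in the literature that satisfy (5), including kernel-based [28], bootstrap-based [29], etc. Considering our requirement for real-time detection (low computational time), a kernel-based estimator is more suitable; in this context, we employ the Bartlett estimator, so that

$$\hat{\Omega}_N = \hat{\Sigma}_0 + \sum_{w=1}^{W} k_{BT}\!\left( \frac{w}{W+1} \right) \left( \hat{\Sigma}_w + \hat{\Sigma}_w^T \right), \tag{6}$$

which satisfies (5), while the function $k_{BT}(\cdot)$ corresponds to the Bartlett weight,

$$k_{BT}(x) = \begin{cases} 1 - |x|, & |x| \le 1, \\ 0, & \text{otherwise}, \end{cases} \tag{7}$$

and $\hat{\Sigma}_w$ denotes the empirical auto-covariance matrix for lag $w$,

$$\hat{\Sigma}_w = \frac{1}{N} \sum_{n=w+1}^{N} \left( X_n - \bar{X}_N \right) \left( X_{n-w} - \bar{X}_N \right)^T. \tag{8}$$

Finally, we chose $W = \log_{10}(N)$ as in [28]. The long-run covariance is involved in the test statistic to incorporate the dependence structure of the r.v. into the statistical analysis, through the integration of second order statistical properties. This approach is suitable for the targeted context since we avoid a restrictive assumption for the dependence structure of the observations.

Going back to the basic question of rejecting or not $H_0$, we need to obtain critical values, denoted by $cv_{off}$, for the test statistic. We approach this issue by considering the asymptotic distribution of the test statistic under $H_0$,

$$TS_{off} \xrightarrow{D} \sup_{0 \le t \le 1} \sum_{j=1}^{r} B_j^2(t), \tag{9}$$

where $B_j(t)$, $1 \le j \le r$, are independent Brownian bridges, $B(t) = W(t) - tW(1)$, and $W(t)$ denotes the standard Brownian motion with mean 0 and variance $t$.
The critical values for several significance levels $\alpha$ can be computed using Monte Carlo simulations that approximate the paths of the Brownian bridge on a fine grid. The last step is to estimate the unknown CP, defined previously as $k^*_{off}$, under $H_1$, given by:

$$\hat{k}^*_{off} = \arg\max_{1 \le n \le N} C_n^T \hat{\Omega}_N^{-1} C_n. \tag{10}$$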
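The single-CP off-line test described above can be sketched as follows for the univariate case (r = 1, video views only). This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the bandwidth W = log10(N) is truncated to an integer, and the Monte Carlo grid size and number of paths are arbitrary choices.

```python
import numpy as np


def bartlett_long_run_variance(x, W=None):
    """Bartlett-kernel estimate of the long-run variance of a 1-D series."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    if W is None:
        W = int(np.log10(N))  # W = log10(N); integer truncation is an assumption
    xc = x - x.mean()
    omega = xc @ xc / N  # lag-0 empirical auto-covariance
    for w in range(1, W + 1):
        gamma_w = xc[w:] @ xc[:-w] / N  # empirical auto-covariance at lag w
        omega += 2.0 * (1.0 - w / (W + 1.0)) * gamma_w  # Bartlett weight
    return omega


def offline_cusum(x):
    """Return (test statistic, estimated CP) for one mean change; CP is 1-based."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    C = np.cumsum(x - x.mean()) / np.sqrt(N)  # retrospective CUSUM detector
    stat = C ** 2 / bartlett_long_run_variance(x)
    return float(stat.max()), int(np.argmax(stat)) + 1


def offline_critical_value(alpha=0.05, n_paths=5000, grid=500, seed=0):
    """Monte Carlo (1 - alpha)-quantile of sup_t B(t)^2 for a Brownian bridge B."""
    rng = np.random.default_rng(seed)
    steps = rng.standard_normal((n_paths, grid)) / np.sqrt(grid)
    W = np.cumsum(steps, axis=1)  # Brownian motion sampled on the grid
    t = np.arange(1, grid + 1) / grid
    B = W - t * W[:, -1:]  # B(t) = W(t) - t W(1)
    return float(np.quantile((B ** 2).max(axis=1), 1.0 - alpha))
```

The null hypothesis is rejected when the statistic returned by `offline_cusum` exceeds the value returned by `offline_critical_value`; the second return value of `offline_cusum` is then the estimated change point.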

B. Extended Off-line Approach
The above hypothesis test identifies the existence of at most one CP and does not ensure that the sample remains statistically stationary in either direction of the detection. In particular, by construction (see (2)), the off-line test statistic detects the CP with the highest magnitude. Therefore, for the detection of multiple CPs, we need to rephrase the alternative hypothesis $H_1$ so that it allows the mean to change at several unknown time points. A greedy technique to identify multiple CPs is the binary segmentation (BS) algorithm. The standard BS algorithm relies on the general concept of binary segmentation and is an extension of the single CP estimator. First, a single CP is searched for in the time-series. In case of no change, the procedure stops and $H_0$ is accepted. Otherwise, the detected CP is used to divide the time-series into two segments in which new searches are performed. The procedure is iterated until no more CPs are detected. The BS algorithm is lightweight (computational time $O(N \log N)$), while its conceptual simplicity leads to efficient implementations. On the other hand, it has been shown in the literature [30], [31] that the standard BS algorithm tends to overestimate the number of CPs, as it does not cross-validate them after their detection.

Algorithm 1: Modified BS algorithm
    ...
    ; TS_off: the off-line test statistic (eq. 2)
    ; cv_off: the critical value (eq. 9)
    ; k̂*_off: the identified CP (eq. 10)
    calculate TS_off(start, end) and cv_off
    if TS_off(start, end) > cv_off then
        calculate k̂*_off and store it in array s
        ...
    end if
end procedure
In the extended off-line approach, we propose the modification of the standard BS with a cross-validation step of the estimated CPs. The cross-validation step is similar to that used in the iterative cumulative sum of squares (ICSS) segmentation algorithm [32], which is used to search for CPs on the marginal variance of independent and identically distributed (i.i.d.) r.v. In the extended off-line algorithm we consider the CPs estimated from the standard BS in pairs and check if $H_0$ is rejected in the segment delimited by each pair. If $H_0$ is not rejected in a particular segment, then no change can be detected in it; as a result, all CPs that fall in the respective segment are eliminated. The improvement, in terms of accuracy, is shown through simulation results in Section V. The pseudo-code of the modified BS algorithm is given in Algorithm 1; note that we integrate the algorithm with the test statistic $TS_{off}$, given in equation (2), and the corresponding critical value ($cv_{off}$), given in (9).
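The following sketch illustrates one possible reading of the modified BS procedure for a univariate series. It is not a reproduction of Algorithm 1: the function names, the minimum segment length `min_len`, and the ICSS-style loop (each CP is re-tested on the segment delimited by its two neighbours and dropped if H0 is not rejected there) are assumptions made for illustration. The constant 1.84 approximates the 5% critical value of the supremum of a squared Brownian bridge for r = 1.

```python
import numpy as np

CV_OFF = 1.84  # approx. 5% critical value of sup_t B(t)^2 for r = 1


def cusum_test(x):
    """Single-CP CUSUM test on a 1-D segment: (statistic, 1-based CP offset)."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    xc = x - x.mean()
    omega = xc @ xc / N
    for w in range(1, int(np.log10(N)) + 1):  # Bartlett long-run variance
        omega += 2.0 * (1.0 - w / (int(np.log10(N)) + 1.0)) * (xc[w:] @ xc[:-w]) / N
    if omega <= 0.0:  # degenerate (constant) segment: report no change
        return 0.0, 1
    stat = np.cumsum(xc) ** 2 / (N * omega)
    return float(stat.max()), int(np.argmax(stat)) + 1


def standard_bs(x, start, end, min_len=20):
    """Standard binary segmentation on x[start:end]; returns sorted CP indices."""
    if end - start < 2 * min_len:
        return []
    stat, k = cusum_test(x[start:end])
    if stat <= CV_OFF:
        return []  # H0 accepted on this segment
    cp = start + k
    return standard_bs(x, start, cp, min_len) + [cp] + standard_bs(x, cp, end, min_len)


def modified_bs(x, min_len=20):
    """Standard BS followed by cross-validation of each CP between its neighbours."""
    cps = standard_bs(x, 0, len(x), min_len)
    changed = True
    while changed:
        changed = False
        bounds = [0] + cps + [len(x)]
        for i in range(1, len(bounds) - 1):
            stat, _ = cusum_test(x[bounds[i - 1]:bounds[i + 1]])
            if stat <= CV_OFF:  # no change confirmed around this CP: drop it
                cps.pop(i - 1)
                changed = True
                break
    return cps
```

By construction, `modified_bs` can only remove change points returned by `standard_bs`, which mirrors the intent of reducing over-segmentation.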

IV. ON-LINE PHASE
In this Section, we describe the on-line scheme that includes: (i) two alternative CUSUM-type approaches for the detection of a change in the mean; and (ii) two alternative approaches to estimate the direction of a change.

A. On-line Analysis
We rewrite equation (1) in the form,

$$X_n = \begin{cases} \mu + Y_n, & n < m + k^*, \\ \mu + I + Y_n, & n \ge m + k^*, \end{cases}$$

where $\mu, I \in \mathbb{R}^r$ represent the mean parameters before and after the unknown time of possible change $k^* \in \mathbb{N}^*$, respectively. As a reminder, the first dimension of the time-series represents the video views; the rest could be likes, comments, etc., and $\{Y_n : n \in \mathbb{N}\}$ is a random component. The term $m \in \mathbb{N}$ denotes the length of the training period, i.e., an interval of length $m$ over the historical period during which the mean is assumed to remain unchanged, so that,

$$\mu_1 = \cdots = \mu_m. \tag{12}$$

To satisfy this assumption, the modified off-line CP test previously presented is run in order to identify a suitable $m$. With $m$ determined, the on-line procedure can be used to check whether (12) holds as new data become available.
In the form of a statistical hypothesis test, the on-line problem becomes,

$$H_0: I = 0, \qquad H_1: I \neq 0. \tag{13}$$

The on-line sequential analysis belongs to the category of stopping time stochastic processes. In general, a chosen on-line test statistic $TS_{on}(m, l)$ and a given threshold $F(m, l)$ define the stopping time $\tau(m)$:

$$\tau(m) = \min\{ l \in \mathbb{N} : TS_{on}(m, l) > F(m, l) \}, \tag{14}$$

with $\tau(m) = \infty$ if the threshold is never exceeded, implying that $TS_{on}(m, l)$ is calculated on-line for every $l$ in the monitoring period. The procedure stops if the test statistic exceeds the value of the threshold function $F(m, l)$. As soon as this happens, the null hypothesis is rejected and a CP is detected. The following properties should hold for $\tau(m)$,

$$\lim_{m \to \infty} \Pr\{ \tau(m) < \infty \mid H_0 \} = \alpha,$$

ensuring that the probability of false alarm is asymptotically bounded by $\alpha \in (0, 1)$, and,

$$\lim_{m \to \infty} \Pr\{ \tau(m) < \infty \mid H_1 \} = 1,$$

ensuring that under $H_1$ the asymptotic power of the statistical test is unity. The threshold $F(m, l)$ is given by,

$$F(m, l) = cv_{on,\alpha} \, g_\gamma(m, l),$$

where: (i) the critical value $cv_{on,\alpha}$ is determined from the asymptotic behavior of the stopping time procedure under $H_0$ by letting $m \to \infty$; and (ii) the weight function,

$$g_\gamma(m, l) = \sqrt{m} \left( 1 + \frac{l}{m} \right) \left( \frac{l}{l + m} \right)^{\gamma},$$

depends on the sensitivity parameter $\gamma \in [0, 1/2)$.
We use two different CUSUM approaches: the standard CUSUM [9], with test statistic denoted by $TS^{ct}_{on}$, and the ratio-type CUSUM [10], with test statistic denoted by $TS^{rt}_{on}$. Their critical values are denoted by $cv^{ct}_{on,\alpha}$ and $cv^{rt}_{on,\alpha}$, and their stopping rules by $\tau^{ct}(m)$ and $\tau^{rt}(m)$, respectively. Both tests are based on the sequential CUSUM detector $E(m, l)$ in (17). The standard CUSUM test statistic in (18) combines $E(m, l)$ with $\hat{\Omega}_m$, the estimated long-run covariance, defined as in (4), that captures the dependence between observations. Then, the stopping rule $\tau^{ct}(m)$ is defined as:

$$\tau^{ct}(m) = \min\{ l \in \mathbb{N} : \| TS^{ct}_{on}(m, l) \|_1 > F(m, l) \},$$

where the $\ell_1$ norm is involved to modify $TS^{ct}_{on}$ so that it can be compared to a one dimensional threshold function. The critical value, $cv^{ct}_{on,\alpha}$, is derived from the asymptotic behavior of the stopping rule under $H_0$. Unlike standard CUSUM tests, ratio-type statistics do not require the estimation of the long-run covariance and are also considered for this reason in this analysis. The chosen ratio-type statistic $TS^{rt}_{on}$ in (21) takes a quadratic form and has an equivalent stopping rule $\tau^{rt}(m)$; similarly to the standard CUSUM, its critical value, $cv^{rt}_{on,\alpha}$, is obtained from the asymptotic behavior of the stopping rule under $H_0$. As in the off-line case, the on-line critical values for both test statistics, in (24) and (25), can be computed using Monte Carlo simulations. The estimated on-line CP, $\hat{k}^*_{on}$ in (26), is derived directly from the value of the stopping time $\tau(m)$.
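To make the stopping-rule mechanics concrete, the sketch below monitors a univariate stream with a standard CUSUM-type detector. Several elements are assumptions made for illustration rather than the paper's exact definitions: the detector is taken as the cumulative sum of new observations minus the training mean, the long-run covariance is replaced by the training sample standard deviation (an i.i.d. simplification), the weight function is the commonly used form sqrt(m)(1 + l/m)(l/(l + m))^gamma, and the critical value is approximated by Monte Carlo as the quantile of sup |W(t)| / t^gamma over (0, 1]. All function names are hypothetical.

```python
import numpy as np


def online_critical_value(alpha=0.05, gamma=0.25, n_paths=2000, grid=500, seed=0):
    """Monte Carlo (1 - alpha)-quantile of sup_{0<t<=1} |W(t)| / t**gamma."""
    rng = np.random.default_rng(seed)
    t = np.arange(1, grid + 1) / grid
    W = np.cumsum(rng.standard_normal((n_paths, grid)), axis=1) / np.sqrt(grid)
    sup = (np.abs(W) / t ** gamma).max(axis=1)
    return float(np.quantile(sup, 1.0 - alpha))


def monitor(history, stream, cv, gamma=0.25):
    """Return (l, sign) at the first threshold crossing, or None if none occurs.

    l is the 1-based monitoring index; sign is +1 (upward) or -1 (downward),
    read from the sign of the detector at the stopping time.
    """
    history = np.asarray(history, dtype=float)
    m = len(history)
    mean_m = history.mean()
    sigma = history.std(ddof=1)  # assumption: i.i.d. proxy for the long-run std
    detector = 0.0
    for l, x in enumerate(stream, start=1):
        detector += x - mean_m  # sum of new samples minus l times training mean
        g = np.sqrt(m) * (1.0 + l / m) * (l / (l + m)) ** gamma  # weight function
        if abs(detector) / sigma > cv * g:  # statistic exceeds threshold F(m, l)
            return l, (1 if detector > 0 else -1)
    return None
```

A larger `gamma` lowers the threshold for small `l`, which favours the quick detection of changes occurring early in the monitoring period at the cost of more early false alarms.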

B. Trend Indicator
Considering the on-line procedure, the hypothesis $H_1$ in (13) is two-tailed because the test statistics $TS^{rt}_{on}$ and $TS^{ct}_{on}$ are formulated in a quadratic form and an $\ell_1$ norm, respectively. This means that the stopping time rule $\tau^{ct}(m)$ (or $\tau^{rt}(m)$) cannot be an indicator of the direction of a detected change. Thus, to estimate the direction of a change we introduce two indicators: i) based on the CUSUM detector in (17), denoted by $TI_{ts}$; and ii) based on the moving average convergence divergence (MACD) filter [33], denoted by $TI_f$. Focusing on $TI_{ts}$, the indicator is directly derived from the form of the sequential CUSUM detector $E(m, l)$. As shown in (17), the detector compares the mean value of the observations that are collected on-line for a chosen monitoring period $l$, with the mean value of a subsample of the historical data over the predetermined training sample. Hence, for a detected CP, we have that,

$$\begin{cases} E(m, l) > 0, & \text{denotes an upward change}, \\ E(m, l) < 0, & \text{denotes a downward change}. \end{cases} \tag{27}$$
However, in certain cases, limiting the window over which the direction of a change is estimated to the immediate neighbourhood of a detected CP can be unreliable due to the continuous variability of the time-series. In such cases, we have to estimate the direction of a change by incorporating more elaborate filters; in this context, we estimate the direction of detected changes by applying the MACD indicator. The MACD is based on an exponential moving average (EMA) filter, $\mathrm{EMA}_p(\cdot)$, with $p$ denoting the lag parameter. The MACD series can be derived by subtracting a longer $p_3$-lag EMA (blunt filter) from a short $p_2$-lag EMA (sensitive filter), as described below:

$$\mathrm{MACD}(n) = \mathrm{EMA}_{p_2}(X_n) - \mathrm{EMA}_{p_3}(X_n).$$

The trend indicator $TI_f$ is then obtained by subtracting a short $p_1$-lag EMA filter of the MACD series from the raw MACD series, as described below:

$$TI_f(n) = \mathrm{MACD}(n) - \mathrm{EMA}_{p_1}(\mathrm{MACD}(n)).$$

In the evaluation of $TI_f$ three exponential filters are involved. In essence, $TI_f$ is an estimation of the second derivative over an interval around the change (considering that the subtraction of a filtered variable from the variable generates an estimate of its time derivative). In contrast to other works [33], we only adopt $TI_f$ to characterize the direction from the specific value of $TI_f$ at the estimated time of change. We announce an upward change if $TI_f(\hat{k}^*_{on}) > 0$; otherwise, if $TI_f(\hat{k}^*_{on}) < 0$, a downward change.

Algorithm 2: RCPD algorithm
    ...
    4: ; m: training period
    5: ; l: monitoring time frame
    6: ; d: period assuming no change
    7: ; TS_on: on-line test statistic (eq. 18 or 21)
    8: ; cv_on: critical value (eq. 24 or 25)
    9: ; k̂*_on: the estimated on-line CP (eq. 26)
    10: ; TI: trend indicator (TI_ts or TI_f)
    11: for n in X_n do
    12:     if n = m_s then
    13:         s = MBS(1, m_s, 1) ; calculate off-line CPs
    14:         if array_length(s) > 0 then
    ...
    29: end for
    30: end procedure
Finally, we propose a modification of the trend indicator $TI_f$, converting it from a point estimator to an interval estimator; instead of evaluating $TI_f(\hat{k}^*_{on})$, we propose to evaluate the trend indicator over a time interval $(\hat{k}^*_{on}, \hat{k}^*_{on} + h)$, where $h$ is a threshold parameter:

$$TI_f(\hat{k}^*_{on}, h) = \sum_{n=\hat{k}^*_{on}}^{\hat{k}^*_{on}+h} TI_f(n). \tag{31}$$

The proposed $TI_f(\hat{k}^*_{on}, h)$ modification improves the estimator's accuracy; the calculation of the sum of a multitude of observations, after a CP, can smooth out a potential false one-point estimation, especially in the case of small changes.
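The MACD-based trend indicator and its interval variant can be sketched as below. The EMA recursion with smoothing factor 2/(p + 1) and the default lags (9, 12, 26) are the conventional MACD settings and are assumptions here, as are the function names; the paper's own lag values may differ.

```python
import numpy as np


def ema(x, p):
    """Exponential moving average with lag p (smoothing 2/(p+1), an assumption)."""
    x = np.asarray(x, dtype=float)
    a = 2.0 / (p + 1.0)
    out = np.empty_like(x)
    out[0] = x[0]
    for n in range(1, len(x)):
        out[n] = a * x[n] + (1.0 - a) * out[n - 1]
    return out


def trend_indicator(x, p1=9, p2=12, p3=26):
    """TI_f series: raw MACD minus its own short-lag (p1) EMA."""
    macd = ema(x, p2) - ema(x, p3)  # sensitive filter minus blunt filter
    return macd - ema(macd, p1)


def direction(x, k_hat, h=0, p1=9, p2=12, p3=26):
    """Direction of change at 0-based index k_hat, summing TI_f over h extra samples."""
    ti = trend_indicator(x, p1, p2, p3)
    score = ti[k_hat:k_hat + h + 1].sum()  # h = 0 gives the point estimate
    if score > 0:
        return "up"
    if score < 0:
        return "down"
    return "none"
```

With `h = 0` the function reduces to the point estimator; a positive `h` sums the indicator over the samples following the detected CP, which is the smoothing effect described above.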

C. Overall Algorithm
We outline in Algorithm 2 the RCPD algorithm, as a combination of the off-line and the on-line phase, in the form of pseudo-code. Beginning from the initial value set for the monitoring starting period, denoted by $m_s$, the modified off-line algorithm is applied over the whole historical period; the training period $m$ is then defined as the interval elapsed from the last detected off-line CP (if one exists) to $m_s$. In pseudo-code this step is described in lines 14-18. As a second step, the on-line test statistic, $TS_{on}(m, l)$ in (14), is applied for a specified monitoring time frame $l$. If a content popularity change is detected at time instance $\hat{k}^*_{on}$, the trend indicator subroutine is called to reveal the direction of change. At this point the procedure stops and a new starting point for the monitoring window is defined as $m_s = \hat{k}^*_{on} + d$, where $d$ is a constant value specifying a period assuming no change. This step is described in lines 19-29. Otherwise, if no change is detected after a maximum of $l$ instances, the procedure restarts from the last time point, $m_s = m_s + l$.

V. VALIDATION OF THE RCPD USING SYNTHETIC DATA
In this Section, we validate the performance of the overall algorithm by performing a series of four different experiments on synthetic data. The use of synthetic data allows us to regulate the parameters of the time-series in terms of mean changes and thus obtain quantitative metrics for the performance of the proposed algorithms.
The choice of the time-series model for the generation of the synthetic data is based on the fact that several studies have shown that ARMA models capture content popularity evolution very well. For example, in [7] it has been concluded that an ARMA model can efficiently describe the daily access patterns of YouTube content, based on an extensive analysis of 100,000 videos. Similarly, in [34] an ARMA model has been proposed for the estimation of the popularity of video content. Motivated by these findings, for the validation of the proposed algorithm we use an ARMA(1, 1) time-series. We generate 1,000 time-series of length $N = 600$ samples. Without loss of generality, we assume an initial mean value $\mu_0 = 0$, noting that the performance of the RCPD is independent of the initial mean value and only depends on the magnitude of the variation of the mean value before and after a CP.
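A synthetic series of this kind can be generated along the following lines. The ARMA coefficients `phi` and `theta`, the burn-in length and the function name are hypothetical choices for illustration, since the specific ARMA(1, 1) parameters are not stated here; the initial mean is zero and each change point adds a shift to all subsequent samples.

```python
import numpy as np


def arma11_with_mean_shifts(N=600, phi=0.5, theta=0.3, cps=(300,),
                            shifts=(1.0,), burn_in=100, seed=0):
    """Zero-mean ARMA(1,1) noise plus a piecewise-constant mean.

    Returns (x, mean): the observed series and the underlying mean profile.
    phi/theta are illustrative values, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    e = rng.standard_normal(N + burn_in)
    y = np.empty(N + burn_in)
    y[0] = e[0]
    for n in range(1, N + burn_in):
        # ARMA(1,1): y_n = phi * y_{n-1} + e_n + theta * e_{n-1}
        y[n] = phi * y[n - 1] + e[n] + theta * e[n - 1]
    mean = np.zeros(N)  # initial mean value mu_0 = 0
    for cp, shift in zip(cps, shifts):
        mean[cp:] += shift  # the mean changes from index cp onwards
    return y[burn_in:] + mean, mean
```

For instance, `arma11_with_mean_shifts(cps=(200, 400), shifts=(1.5, -1.5))` produces a series whose mean rises by 1.5 at index 200 and returns to zero at index 400.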
In the first experiment, we begin with a comparison of the standard BS to the proposed modified BS algorithm described in Section III-B. We perform two tests; in the first test we introduce two CPs at the instances $k^*_i = iN/3$, $i = 1, 2$, while in the second test we introduce four CPs at $k^*_i = iN/5$, $i = 1, \ldots, 4$. The two tests are repeated for three different values of the magnitude of a change, $\mu_1 = 1$, $\mu_2 = 1.5$, $\mu_3 = 2$, i.e., we randomly increase or decrease the mean value by $\mu_j$, $j = 1, \ldots, 3$, at the time of change. Table I summarizes our findings regarding the true and false alarm rates of the two algorithms.
Both the standard and the modified BS algorithms provide similar true alarm rates, exceeding 94%, in the first test. On the contrary, in the more challenging second test, the superiority of the modified BS over the standard BS algorithm is clear. The modified BS algorithm achieves true alarm rates in excess of 70%, even in the demanding scenario of a relatively small change in the mean µ 1 = 1. On the other hand, the standard BS algorithm has in all cases a true alarm rate of less than 50%, rendering any CP detection highly questionable. The second test confirms that the standard BS algorithm is prone to an overestimation of the number of CPs as shown by the high false alarm rates (in excess of 25% in all cases), an issue that can be effectively addressed by the modified BS algorithm which scores false alarm rates below 10%.
Next, in the second experiment, using the same test sets as above, we measure the success rates achieved by the proposed trend indicators T I ts in (27) and T I f in (31) for h = 0 (larger thresholds provided the same true identification rates). The results are summarized in Table II. The two trend indicators successfully identify the direction of a change in more than 99% of the cases, which shows that they can be interchangeably employed. In the assessment of the performance using real datasets in Section VI, we solely employ the T I f trend indicator.
We proceed by assessing the proposed RCPD algorithm using both the standard and the ratio-type CUSUM. In this third experiment, we measure the number of CPs detected, averaged over 1,000 simulations, when a single CP is introduced in the ARMA time-series at the time instance $N/2 = 300$. We consider different values for the magnitude of change $\mu \in \{0, 0.5, 0.7, 1, 1.2, 1.5, 2\}$ and the monitoring window length $l \in \{25, 50, 100\}$. We note that we included the case $\mu = 0$, which corresponds to the absence of a change, to evaluate the false alarm rate of the overall algorithm. We omit results with true alarm rates lower than 50% as they are statistically unreliable. In terms of the remaining algorithmic parameters, we have set the minimum distance between two successive CPs to $d = 50$, the sensitivity parameter to $\gamma = 0.25$ [35] (we choose a neutral value as the behaviour of $\gamma$ is well studied), and the significance level to $\alpha = 0.05$. In each test of the third experiment we measure the exact number of CPs detected, tabulated as one of the following three values: i) 0 when (falsely) no CP is detected; ii) 1 when (correctly) a single CP is detected; and iii) > 1 when (falsely) multiple CPs are detected. Finally, we measure the median of the time instance of the single CP detection, denoted by $\hat{k}^*$. The results of this experiment are presented in Table III and are discussed below. Firstly, we observe that both the standard and the ratio-type CUSUM achieve very small false alarm rates, below 6% when no CP is inserted, irrespective of the choice of $l$. On the contrary, the choice of $l$ readily affects the algorithm's success rate for $\mu > 0$; for small changes in the mean value, $\mu = 0.5, 0.7$, a larger monitoring window $l$ increases the algorithm's true alarm rates in identifying correctly the existence of the CP.
For medium and large changes, µ = 1, 1.2, 1.5, 2, a high true alarm rate is achieved (in excess of 93% for the standard CUSUM), while choosing a smaller l can slightly increase the true alarm rates. As a result, depending on the application, a larger l can be appropriate if the algorithm is to be employed as a universal CP detector. Alternatively, a smaller l can be chosen when the focus is on the identification of large changes in the mean value, i.e., when we are interested primarily in detecting CPs of larger magnitude.
Secondly, we observe that the ratio type CUSUM is outperformed by the standard CUSUM in all tests. Consequently, the standard CUSUM based detector can be considered an efficient universal choice. Finally, we observe that the lag between $\hat{k}^*$ and the actual instance of change at the point 300 decreases with increasing µ, with the median detection instance ranging from 343 to 307, while it appears less sensitive to changes in l. This confirms the intuition that larger magnitude changes can be detected faster. This result is important for load balancing applications as it provides us with the means to respond quickly to significant changes in the network traffic.
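The tabulation used in this experiment (too few, exact, or too many detected CPs, plus the median detection instance) can be sketched as follows. This is a minimal illustration and not the authors' code: the function name `tabulate_detections` and the input layout (one list of detected instances per simulation run) are assumptions made here, and the toy runs are invented purely to show the mechanics.

```python
from statistics import median

def tabulate_detections(runs, true_count=1):
    """Summarize simulation outcomes.

    runs: list of lists; each inner list holds the CP instances detected
          in one simulation run (hypothetical input layout).
    true_count: number of CPs actually inserted in the series.
    """
    n = len(runs)
    exact = [r for r in runs if len(r) == true_count]
    rates = {
        "fewer": sum(1 for r in runs if len(r) < true_count) / n,
        "exact": len(exact) / n,
        "more": sum(1 for r in runs if len(r) > true_count) / n,
    }
    # Median detection instance per CP, over correctly counted runs only.
    medians = [median(r[i] for r in exact) for i in range(true_count)] if exact else []
    return rates, medians

# Toy example: five runs with a single true CP at instance 300.
rates, medians = tabulate_detections([[310], [305], [], [320, 410], [308]])
lag = medians[0] - 300  # detection lag relative to the true CP
```

Restricting the median to runs with the correct number of detections keeps the lag estimate from being distorted by spurious or missed CPs; whether the original tables apply the same restriction is an assumption.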
Subsequently, in Table IV, we present the outputs of the fourth experiment, in which we assess the performance, averaged over 1,000 simulations, of the RCPD algorithm when two CPs are inserted in the ARMA time-series. We introduce a change at the time instance $k^*_1 = N/3 = 200$ and a second CP at the time instance $k^*_2 = 2N/3 = 400$. We investigate the true and false alarm rates for µ ∈ {0.5, 0.7, 1, 1.2, 1.5, 2} and l ∈ {25, 50, 100}, while the rest of the parameters retain the values of the third experiment. In each test of the fourth experiment we measure the exact number of CPs detected, tabulated as one of the following three values: i) < 2 when (falsely) fewer than two CPs are detected, ii) 2 when (correctly) two CPs are detected, and iii) > 2 when (falsely) more than two CPs are detected. Finally, we measure the median of the detection instances of the two CPs, denoted by $\hat{k}^*_1$ and $\hat{k}^*_2$, respectively (we omit the results with true detection rate lower than 50%).
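A test series of this kind can be generated along the following lines. This is only a sketch under stated assumptions: the ARMA(1,1) coefficients `phi` and `theta`, the unit-variance Gaussian innovations, the zero-based indexing, and the choice of two successive upward shifts are all hypothetical, since the section does not restate the ARMA model or the sign of the second change. The series length 600 follows from N/3 = 200.

```python
import numpy as np

def arma_with_mean_shifts(n, shifts, phi=0.3, theta=0.2, seed=0):
    """ARMA(1,1) noise plus a piecewise-constant mean.

    shifts: list of (instance, delta) pairs; the mean changes by `delta`
            from `instance` (zero-based) onwards.
    phi, theta: hypothetical AR and MA coefficients (not from the paper).
    """
    rng = np.random.default_rng(seed)
    e = rng.standard_normal(n + 1)           # innovations e_0, ..., e_n
    noise = np.zeros(n)
    prev = 0.0
    for t in range(n):
        # x_t = phi * x_{t-1} + e_t + theta * e_{t-1}
        prev = phi * prev + e[t + 1] + theta * e[t]
        noise[t] = prev
    mean = np.zeros(n)
    for instance, delta in shifts:
        mean[instance:] += delta
    return mean + noise

# Two changes of magnitude mu = 1 at N/3 and 2N/3, with N = 600.
series = arma_with_mean_shifts(600, [(200, 1.0), (400, 1.0)])
```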
Similarly to the third experiment, we observe that increasing l increases the true alarm rates for small magnitudes in the mean changes µ = 0.5, 0.7, while this trend is reversed in high magnitudes µ = 1.5, 2. For medium values µ = 1, 1.2 the effect of l on the true alarm rates is less than 2%. Furthermore, in agreement with the outputs of the third experiment, with increasing µ the algorithms achieve increasingly high success rates, over 93% for the standard CUSUM when µ ≥ 1.
In addition, the superior performance of the standard CUSUM is re-confirmed in all the tests of the fourth experiment. Finally, with respect to the lag in the estimation of the time instances of the CPs, we observe that, as in the third experiment, larger magnitude changes can be detected faster, e.g., for µ = 2 a lag below 11 instances is observed for both CPs with the standard CUSUM, irrespective of l.
Concluding this Section, we have presented an extensive set of experiments that provide strong evidence for the efficiency of the proposed algorithms. We have explicitly demonstrated the superiority of the modified BS over the standard BS algorithm and confirmed the validity of the proposed trend indicators. Subsequently, we evaluated the performance of the overall algorithm for various values of µ and l. We have shown that the RCPD algorithm achieves true alarm rates above 93% with the standard CUSUM for µ ≥ 1, while increasing the length of the monitoring window l significantly improves the performance for small values of µ. Finally, the standard type CUSUM outperforms the ratio type CUSUM overall and should be preferred.
VI. PERFORMANCE EVALUATION USING REAL DATA
In this Section we investigate the performance of the proposed algorithms using a real dataset provided within the framework of the CONGAS project [36]; the dataset consists of the number of views of 882 YouTube videos, observed over N = 1,000 instances.

A. Statistical Properties of the Real Dataset
First, we evaluate the validity of the most important underlying assumption of this analysis, that the content popularity can be modelled as the sum of a constant mean and a weakly dependent (t-dependent) stochastic process, as given in (1). A first intuitive method to test whether the time-series is short-range dependent (SRD) is through its autocorrelation function (ACF). The ACF for a weakly-stationary process {X_t : t ∈ N} with mean value µ is given by, Note that if $\sum_{k=-\infty}^{\infty} \rho(k) \to \infty$ the process has long-range dependence (LRD), while if $\sum_{k=-\infty}^{\infty} |\rho(k)| < \infty$ it exhibits SRD. To distinguish between these two phenomena, we use the following functional form of the ACF, where $C_i > 0$ and H ∈ (0, 1) is the Hurst exponent characterizing the LRD, i.e., H ∈ (1/2, 1) indicates the presence of LRD. It is challenging to accurately estimate the Hurst exponent from real data [37] and several methods have been proposed in the literature [38]. In this work, we apply two semi-parametric tests, identified as accurate options among others presented in the survey paper [38]. The first method uses the discrete second order derivative in the time domain while the second uses the discrete second order derivative in the wavelet domain. Both methods estimate an H ≤ 0.5 for   (1).
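For reference, the textbook form of the ACF of a weakly-stationary process and the two summability conditions mentioned above can be written as follows. The variance symbol $\sigma^2$ is introduced here for illustration, and the last line is only the commonly used power-law characterization of LRD; whether it coincides with the functional form used in this analysis is an assumption.

```latex
\rho(k) = \frac{\mathbb{E}\big[(X_t - \mu)(X_{t+k} - \mu)\big]}{\sigma^2},
\qquad \sigma^2 = \mathbb{E}\big[(X_t - \mu)^2\big],

\text{SRD: } \sum_{k=-\infty}^{\infty} |\rho(k)| < \infty,
\qquad
\text{LRD: } \sum_{k=-\infty}^{\infty} \rho(k) \to \infty,

\rho(k) \sim C\,|k|^{2H-2} \ \text{ as } |k| \to \infty,
\quad C > 0,\ H \in (1/2, 1).
```

Under the power-law form, the exponent $2H - 2$ lies in $(-1, 0)$ for $H \in (1/2, 1)$, so the autocorrelations decay too slowly to be summable, which is the LRD case.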

B. Performance of the Off-line Training Phase
First, we test the hypothesis H_0 of no change in the mean structure on our dataset. H_0 is rejected in approximately 70% of the cases, for a significance level of α = 0.05. This outcome indicates that CP algorithms can identify changing content dynamics in real time-series.
Next, we estimate the number of CPs, by applying the extended off-line algorithm. The corresponding results are illustrated in Fig. 1 and indicate a sufficiently high number of content popularity anomalies (i.e., mean changes). Hence, a CP analysis is indeed a suitable tool for content popularity detection.
To evaluate the performance of the proposed trend indicator $TI_f$, we need a baseline independent assessment of the direction of change. We declare that a real increase in the mean value of content visits exists if or, that a real decrease in the number of visits exists if where $i = 1, \dots, \mathrm{card}(\hat{k}^*_{off})$, $\hat{k}^*_0 = 1$, $\hat{k}^*_{s+1} = N$ and E[·] denotes the numerical average. We test the modified MACD $TI_f$ on two sets of videos. The first set, Video Set 1, comprises the whole dataset, while the second set, Video Set 2, comprises only the videos with a considerable average number of visits (> 10), i.e., for which E[X(1) : X(1000)] > 10.
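One plausible reading of this baseline is a comparison of the numerical averages of the segments on either side of each off-line CP. The sketch below implements that reading only; the exact inequalities are an assumption, and the function names `baseline_directions` and `is_popular` are hypothetical. Indexing is zero-based, with the series boundaries playing the role of the first and last segment limits.

```python
import numpy as np

def baseline_directions(x, cps):
    """Label each CP 'up', 'down' or 'none' by comparing adjacent segment means.

    x: sequence of visit counts.
    cps: zero-based CP instances estimated off-line.
    Assumption: a change is 'up' when the mean after the CP (up to the next
    CP) exceeds the mean before it (back to the previous CP).
    """
    x = np.asarray(x, dtype=float)
    bounds = [0] + sorted(cps) + [len(x)]
    directions = []
    for i in range(1, len(bounds) - 1):
        before = x[bounds[i - 1]:bounds[i]].mean()
        after = x[bounds[i]:bounds[i + 1]].mean()
        if after > before:
            directions.append("up")
        elif after < before:
            directions.append("down")
        else:
            directions.append("none")
    return directions

def is_popular(x, threshold=10):
    """Video Set 2 membership: average number of visits above the threshold."""
    return float(np.mean(x)) > threshold

views = [1] * 50 + [5] * 50 + [2] * 50
labels = baseline_directions(views, [50, 100])
```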
The percentages of successful $TI_f$ identifications are tabulated in Table V. As expected, as h increases the procedure works better. More specifically, an h ≥ 5 parameter choice yields a success rate of 95%, while if a more agile estimation is needed then an h ≥ 3 still maintains a 91% accuracy. Considering the interim time between consecutive changes, we deduce that an h ≤ 7 is preferable. Regarding Video Set 2, we see that the results are considerably improved, indicating that the procedure works even better for the most popular videos. In practice, this represents the more interesting scenario as it will have a greater impact in terms of the applied load balancing mechanism. Furthermore, in Fig. 2, the time instances of upward and downward changes are shown in the form of a boxplot. As intuitively expected, upward changes occur earlier than downward changes. Moreover, Fig. 2 demonstrates that the number of upward changes is greater than that of downward changes, indicating that decreases in popularity are sharper than increases. In particular, we estimated that out of the total number of changes, 67% are upward.
Finally, we analyze the interim time between consecutive CPs. The results presented in Fig. 3 illustrate the existence of a sufficiently large gap between consecutive potential changes. 90% of the intervals corresponding to consecutive CPs exceed 70 time instances and only 5% of them are shorter than 50 time instances, ensuring that a sufficiently large training window can be applied. The results depicted in Fig. 3 allow adjusting parameters of the on-line phase, in particular the minimum time interval between consecutive changes, denoted by the parameter d.
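The interval statistics above can be obtained from the off-line CP estimates roughly as follows. This is a minimal sketch: the helper names are hypothetical, and the toy CP lists are invented only to show the computation, not taken from the dataset.

```python
import numpy as np

def inter_cp_gaps(cp_lists):
    """Collect the intervals between consecutive CPs over all videos.

    cp_lists: one list of CP instances per video (hypothetical layout).
    """
    gaps = []
    for cps in cp_lists:
        gaps.extend(np.diff(sorted(cps)).tolist())
    return np.array(gaps, dtype=float)

def share_exceeding(gaps, d):
    """Fraction of intervals strictly longer than d time instances."""
    if gaps.size == 0:
        return 0.0
    return float(np.mean(gaps > d))

gaps = inter_cp_gaps([[100, 180, 400], [50, 90], [300]])
coverage = share_exceeding(gaps, 50)  # share of gaps compatible with d = 50
```

A candidate value of d can then be checked against the empirical distribution, e.g., by requiring that only a small share of intervals falls below it.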

C. Evaluation of the RCPD Algorithm
In the previous subsection we have evaluated the performance of the off-line algorithm and demonstrated its efficiency, as well as how it is employed in determining parameters of the on-line phase, such as the minimum interval d between consecutive changes and the threshold parameter h of $TI_f$.
We further employ the off-line algorithm as a benchmark against which the performance of the RCPD algorithm will be evaluated. We note that the off-line analysis provides the best possible statistical detection of the actual mean changes, as off-line algorithms operate retrospectively over the entirety of each of the time-series. Thus, in the absence of a priori knowledge of the actual CPs in the real data (as opposed to the synthetic data in which the CPs were controlled), we evaluate the performance of the RCPD procedure by measuring the "similarity" of its outputs (detected CPs, instances of detection and trends) to the corresponding outputs of the off-line version.
As the number of detected CPs and / or their exact positions are likely to differ at the output of the retrospective (off-line) and of the RCPD algorithm, in order to obtain a measure of their similarity, we estimate their dynamic time warping (DTW) distance. The DTW is a dynamic programming tool that measures distances between asynchronous sequences and is widely used by the speech processing community [12].
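The classic dynamic programming recursion behind the DTW distance can be sketched as below for two sequences of CP instances of possibly different lengths. The absolute difference as local cost, the absence of any windowing or normalization, and the representation of each output as a plain list of CP instances are assumptions made for illustration.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences.

    Local cost: absolute difference (an assumption). The recursion allows
    one element of a sequence to be matched to several of the other, so the
    sequences may differ in length.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # D[i][j]: minimal cumulative cost of aligning a[:i] with b[:j].
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

offline_cps = [200, 400]        # hypothetical off-line estimates
online_cps = [205, 300, 410]    # hypothetical on-line detections
distance = dtw_distance(offline_cps, online_cps)
```

In this toy case the extra on-line detection at 300 has to be matched to one of the off-line CPs, so it dominates the distance; this is how a spurious or missed CP is penalized.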
The results are presented in Fig. 4, where the estimated DTW distances are depicted for several values of the monitoring window length l ∈ [40, 150], to investigate how consistent the results are across different choices of l. In the RCPD algorithm we use d = 50 (minimum distance between two changes) and have set the sensitivity parameter to γ = 0.25. The estimated mean DTW distance for the standard CUSUM is 52 and for the ratio-type CUSUM is 73. For comparison purposes, we note that the corresponding DTW distance over the synthetic data is 20 for medium / large changes, while the true CP detection rate is around 95%. As a result, we can infer that the outputs of the on-line algorithm, using the standard CUSUM, are "very close" to the outputs of the benchmark off-line algorithm. In agreement with our observations over the synthetic data, the DTW distance using the ratio-type CUSUM is clearly larger.
We also study the magnitude of the detected CPs. We define as the CP magnitude the percentage-wise change in the  Additionally, for illustration purposes, we depict the RCPD algorithm's outputs for four different time-series. We set the beginning of the monitoring period at $m_s = 200$, the monitoring horizon to l = 50, the on-line sensitivity parameter to γ = 0.25 and the significance level to α = 0.05. The corresponding results are depicted in Fig. 5, showing the estimated CPs by applying the standard CUSUM and the ratio type CUSUM procedures, respectively. In both cases, the estimated changes correspond to the real content popularity changes; visual inspection suggests that the standard CUSUM yields the more plausible estimates (e.g., Fig. 5d). The RCPD, as illustrated in Fig. 5b, seems to adapt to "fast" changes without getting "confused" by random peaks in the time-series, such as those in Fig. 5a or in Fig. 5c.

D. Time Dependencies of Piecewise time-series
We also measure the autocorrelation function of the piecewise time-series, i.e., of the segments obtained by dividing each time-series at the detected CPs. Results are tabulated in Table VII and verify the short dependence structure of the dataset; significant time dependencies at lags higher than 30 instances can be found in less than 5% of the time-series. Furthermore, the fact that the ACF of the piecewise time-series drops to zero quickly indicates that the detected CPs split the time-series into stationary segments, which additionally provides indirect confirmation of the accuracy of the off-line CP estimations over the changes in the real data.
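A per-segment check of this kind can be sketched as follows. The code computes the sample ACF of each segment and reports the largest lag whose autocorrelation exceeds the usual 95% white-noise bound of 1.96/√n. That significance criterion is an assumption adopted here for illustration, as are the function names; segments are assumed to be non-constant so that the normalization is well defined.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelation at lags 0..max_lag (biased estimator)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)  # assumes a non-constant segment (denom > 0)
    return np.array([1.0] + [np.dot(x[:-k], x[k:]) / denom
                             for k in range(1, max_lag + 1)])

def largest_significant_lag(x, max_lag):
    """Largest lag with |ACF| above 1.96/sqrt(n); 0 if there is none."""
    acf = sample_acf(x, max_lag)
    bound = 1.96 / np.sqrt(len(x))
    significant = np.nonzero(np.abs(acf[1:]) > bound)[0]
    return int(significant[-1]) + 1 if significant.size else 0

def segment_lags(x, cps, max_lag):
    """Apply the check to each segment delimited by the zero-based CPs."""
    return [largest_significant_lag(seg, max_lag)
            for seg in np.split(np.asarray(x, dtype=float), sorted(cps))]
```

For example, `segment_lags(views, [200, 400], max_lag=30)` would return one value per segment for a hypothetical series `views` with CPs at 200 and 400.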

E. Computational Complexity and Scalability
Finally, we run a MATLAB implementation of the overall algorithm over a large number of time-series (882 in this experiment) to quantify its performance in terms of processing cost. The computational time is measured on a Lenovo IdeaPad 510-15IKB laptop, with an Intel Core i7-7500U @ 2.70 GHz processor and 12 GB RAM. In Fig. 6, we show the aggregate processing cost per time instance for the two on-line methods and the total number of time-series. During the first 100 time instances the algorithm bootstraps, collecting the initial data. The peaks indicate the off-line part of the algorithm, which is more computationally demanding, mainly due to the segmentation algorithms running in parallel. The on-line part of the standard algorithm exhibits linear complexity, since it is based on (18), while the equivalent quantity in (21) of the ratio-type algorithm is more CPU intensive, which explains the comparatively higher processing cost of the latter. In both cases, the aggregate processing cost is typically much less than a second, which demonstrates the lightweight nature of the proposed scheme. Such results could be further improved with a distributed deployment of scheme replicas, since each of the time-series can be processed independently.

VII. THE RCPD ALGORITHM IN A LOAD BALANCING SCENARIO
In this Section, we demonstrate our proposal in a real content distribution scenario, balancing the traffic between web clients and content caches with a bespoke DNS-based load-balancer. We implement the RCPD algorithm as a client-server MATLAB application. The RCPD engine receives periodic content popularity measurements; if a CP is detected, the corresponding upward or downward changes are signalled to the load balancer. The load balancer: (i) distributes the load between the deployed content caches, in a round-robin fashion; (ii) tracks content visits and communicates them to the RCPD engine; and (iii) deploys or removes content caches based on the RCPD outputs.
We implement the web clients using the httperf tool (https://github.com/httperf/httperf). The number of clients at each time instance is based on a real time-series of YouTube content views, illustrated in Fig. 7a. In practice, an experimental run without the RCPD mechanism uses three content caches constantly, while a run with the RCPD mechanism enabled initially uses two content caches and then three, four and five, after each of the three detected change points, respectively. As we show in Fig. 7b, the web clients improve their connectivity times to download the content, while, as demonstrated in Fig. 7c, the CPU utilization in the servers hosting the content remains almost the same. A relevant experimental platform is presented in [4].
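The interaction between the RCPD outputs and the load balancer described above can be illustrated with the following toy model. It is not the bespoke DNS-based implementation: the class name, the cache naming scheme and the policy of adding or removing exactly one cache per signalled change are assumptions, the last one inferred from the two-to-five cache progression of the experiment.

```python
class RoundRobinBalancer:
    """Toy round-robin balancer that scales its cache pool on CP signals."""

    def __init__(self, caches, min_caches=1):
        self.caches = list(caches)
        self.min_caches = min_caches
        self._next = 0                    # index of the next cache to serve
        self._spawned = len(self.caches)  # counter used to name new caches

    def route(self):
        """Return the cache that serves the next request."""
        cache = self.caches[self._next % len(self.caches)]
        self._next += 1
        return cache

    def on_change_point(self, direction):
        """React to an RCPD signal: 'up' deploys a cache, 'down' removes one."""
        if direction == "up":
            self._spawned += 1
            self.caches.append(f"cache-{self._spawned}")
        elif direction == "down" and len(self.caches) > self.min_caches:
            self.caches.pop()

balancer = RoundRobinBalancer(["cache-1", "cache-2"])
for signal in ["up", "up", "up"]:   # three detected upward changes
    balancer.on_change_point(signal)
```

After the three upward signals the pool holds five caches, mirroring the progression reported for the run with the RCPD mechanism enabled.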

VIII. CONCLUSIONS AND FUTURE WORK
In this paper, we developed the RCPD, a novel algorithm for the real-time detection of changes in the mean value of content popularity. Approaching the problem statistically, we efficiently combined off-line and on-line non-parametric CUSUM procedures to avoid restrictive assumptions for content popularity behavior and to reduce the overall computational cost. We divided the algorithm into two phases. The first phase is an extended retrospective (off-line) procedure with a modified BS algorithm and is used to adjust on-line parameters, based on historical data of the particular video. The second phase integrates one of two alternative trend indicators into the sequential (on-line) procedure, to reveal the direction of a detected change. We provided extensive simulations, using synthetic and real data, that demonstrated the performance of the proposed algorithm for the successful identification of content popularity changes in real-time. We also demonstrated through experimental measurements that the RCPD's processing cost is almost imperceptible. Finally, we provided a proof-of-concept by applying the algorithm in a load balancing application, highlighting its efficiency in a realistic setting.
In future work, we will evaluate the proposed scheme using multi-dimensional time-series to capture the dynamics of content popularity more accurately (e.g., incorporating additional dimensions with the number of likes, comments, etc.) and in different contexts, such as the real-time resource utilization of servers. We will also investigate and further extend the algorithm's scalability properties, theoretically and experimentally, i.e., estimate the number of videos that can be analyzed in parallel. Our aspiration is to conduct real large-scale CDN experiments utilizing a distributed architecture with multiple content popularity analyzers, monitoring clusters of videos in real-time at a minimum overall processing cost.