Scalable Video Caching for Information Centric Wireless Networks

Recently, information centric wireless networks (ICWN) have attracted considerable attention due to their flexible network structure and efficient content delivery. Meanwhile, scalable video coding (SVC) is a promising solution for providing high-quality video services. Combining ICWN and SVC is expected to improve the performance of wireless video delivery. Since the caching strategy plays a significant role in ICWN, in this paper we address the caching problem for scalable videos over ICWN in mobile scenarios. By jointly considering the layered structure of SVC and the hierarchical architecture of ICWN, we formulate an optimization problem to minimize the average download delay. A novel layered hierarchical caching method is proposed to solve the problem. Furthermore, we focus on a special but common case in which the delay minimization problem can be equivalently transformed into a cache hit ratio maximization problem, and provide a simplified algorithm with a 1/2 approximation ratio. Finally, simulation results show that our proposed caching schemes outperform baseline methods in both cache hit rate and delay.


I. INTRODUCTION
With the development of wireless communication technologies and the proliferation of smart devices, we have witnessed an explosive increase in mobile traffic. According to [1], the total amount of mobile traffic is predicted to reach 136 exabytes (EB) per month by the end of 2024, and over 74% of this traffic will be contributed by videos. Meanwhile, it has been observed that the majority of the traffic is generated by repeated requests for a small set of popular video contents [2].
To effectively cope with such traffic growth, network operators aim not only at expanding network capacity, but also at changing the network structure to prevent unnecessary traffic forwarding and large delays. The information centric wireless network (ICWN) is regarded as a promising technology to achieve these goals. Different from a content delivery network (CDN), which caches contents at the application layer outside the access networks, an ICWN provides in-network caching capabilities near mobile users, and reduces the average download delay when the requested contents are locally cached. Moreover,
benefiting from edge computing, ICWN has the potential to learn users' behavior, such as mobility patterns and request distributions. Compared with traditional information centric networking (ICN), ICWN offers unique opportunities, since proactive caching can be performed at wireless nodes before user demands arrive.
In addition, with respect to adaptive video streaming services, the same content may be downloaded at various bit rates due to users' inhomogeneous screen resolutions and channel conditions. As part of the H.264/H.265 standards, scalable video coding (SVC) can adapt to various network conditions and user requirements while guaranteeing acceptable video quality. Specifically, SVC encodes a video into a base layer and multiple enhancement layers [3]. The base layer carries essential information and provides a fundamental video quality, and the enhancement layers provide successively improved video qualities. Before a higher enhancement layer can be decoded, the base layer and all lower enhancement layers must be correctly decoded [4]. The caching of SVC-based contents can be more efficient if this layered video structure is considered.
Given that the scalability provided by SVC is beneficial for video delivery and that ICWN offers a flexible network structure, SVC-based video over ICWN is expected to improve the performance of wireless content delivery services. Therefore, in this paper, we consider such a video streaming scenario and focus on caching strategies for better video delivery.
Caching in wireless networks has been widely studied [5]-[10]. The concept of wireless caching is introduced in [5] to alleviate the explosive increase in video-on-demand transmissions: popular video files are cached in devices with high storage capacity to assist the macro base stations by handling local requests. Since then, many researchers have studied wireless caching from different aspects. For example, in [6], [7], the authors focus on the hierarchical structure of networks and provide insights into the design of cooperative caching algorithms based on a hierarchical tree topology. In [8], the authors aim at minimizing the average delay-cost of content delivery in cloud radio access networks through cooperative hierarchical caching. The work in [9] extends terrestrial wireless caching into the sky, and addresses the problem of proactive content deployment in cache-enabled unmanned aerial vehicles. The authors in [10] consider dynamic video streaming in device-to-device assisted wireless networks, and design a method to cache video files of varying quality levels to enhance the quality of user experience (QoE). However, none of the above studies considers caching SVC-based contents, and thus they cannot fully exploit the benefits of layered videos.
To utilize the scalability provided by SVC, wireless caching dedicated to SVC-based contents has also been actively studied [11]-[18]. The authors in [11] investigate cache-enabled wireless networks that provide scalable video services with multiple perceptual qualities; based on stochastic geometry, they derive expressions for the local serving probability, ergodic service rate, and service delay. To improve energy efficiency (EE), the authors in [12] propose energy-efficient caching schemes for SVC-based videos over heterogeneous networks. In [13], the problem of joint power allocation and SVC-based content caching is addressed, and the QoE is improved by emphasizing users' reception capabilities. To improve the successful transmission probability, the authors in [14] investigate a layer placement scheme for SVC videos, where multiple video layers are stored in the cache devices of small-cell base stations (SBSs). In [15], caching strategies in large-scale wireless networks are analyzed and optimized, revealing the relationship between the layers of SVC-based videos. Moreover, some recent studies address the transmission latency of SVC-based video caching. The policies proposed in [16], [17] aim to reduce the average delivery delay of SVC videos in content delivery networks and heterogeneous wireless networks. In [18], the authors provide an analytical characterization of the video delivery delay in a cache-enabled network, where the available video layers are stored based on their popularity. However, none of the above SVC-based caching schemes considers user mobility in wireless networks, and they may lose efficiency when users move from one access point to another.
In addition, caching strategies play a significant role in traditional ICNs or content centric networks (CCNs), where routers are cache-enabled. Several studies have addressed the caching issue for SVC-based videos over ICN. For instance, the authors in [19] propose cache management and request forwarding policies for scalable video streaming in ICN, which deliver video to users faster, especially the mandatory base layer. In [20], to improve the QoE of adaptive scalable video streaming services, a layered cooperative cache management (LCC-VCCN) scheme is proposed: neighbor nodes within broadcast range are selected to cache one or several SVC layers, which reduces content retrieval time and prevents stalls of the video playback. However, wireless factors such as channel fading, cell association, and user mobility involved in ICWN have not been jointly considered in the aforementioned caching strategies.
Motivated by the above discussion, we design proactive caching schemes for SVC-based videos over ICWN, taking into account the following specific characteristics. Firstly, caching more layers results in higher scalability for video delivery, but the cost increases accordingly; therefore, the number of cached video layers should be chosen properly. Secondly, a mobile user requesting adaptive video streaming services often selects an appropriate video bitrate according to its channel bandwidth [21], so the factors with an important impact on the download rate, such as channel fading, cell association, and user mobility, should be taken into consideration. Thirdly, the hierarchical network architecture of ICWN and the layered structure of SVC videos have to be jointly considered. In summary, the novelty and technical contributions of this work are as follows:
• We address the problem of scalable video caching over ICWN, and aim to minimize the average transmission delay. Both the hierarchical caching architecture of ICWN and the layered feature of SVC-based videos are taken into account. An optimization problem is formulated and proved to be NP-hard. By simplifying it into a special knapsack problem and solving it with the aid of machine learning, we propose a layered hierarchical caching scheme.
• We consider a special case of the above delay minimization problem, which can be equivalently transformed into a simple cache hit rate maximization problem. A heuristic algorithm is proposed which achieves at least a 1/2 approximation ratio with respect to the optimal solution.
• Simulation results show that our proposed caching strategies achieve improvements in both transmission latency and cache hit rate compared to baseline caching strategies.
The rest of the paper is organized as follows. In Section II, we present the system model and formulate a delay minimization problem for wireless SVC-based videos over ICWN. In Section III, the hardness of the problem is analyzed, and a layered hierarchical caching scheme is designed by combining machine learning and optimization methods. A special case of the delay minimization problem is addressed in Section IV, where an approximate solution with a 1/2 approximation ratio is derived. In Section V, the effectiveness of our proposed caching algorithms is verified through numerical simulations. Section VI concludes the paper.
Notations of some important symbols are summarized in Table 1.

II. SYSTEM MODEL AND PROBLEM FORMULATION

A. NETWORK MODEL
As shown in Fig. 1, we consider a set U of mobile users, each of which requests SVC videos and moves in an area covered by an ICWN. A set K of radio access points (APs) is deployed in the area and connected to a centralized edge controller (or edge router). The controller aggregates SVC video traffic between the video server and the APs. Both the edge controller and the APs are equipped with caches. This architecture enables centralized optimization of content caching and cooperation in the wireless access network.
If a user in the coverage of AP k requests a video that is cached neither at AP k nor at the edge controller, the video will be downloaded through the Internet. Generally, the transmission delays from different network nodes are unequal. We denote by d_R, d_0, and d_k the average latency per bit incurred for a user associated with AP k to download videos from the Internet, from the edge controller, and from AP k, respectively.
We adopt the Gauss-Markov process to characterize user mobility [22]. Let the vectors u(t) = (u_x(t), u_y(t)) and z_u(t) = (z_{u,x}(t), z_{u,y}(t)) denote the location and velocity of user u at time t, respectively, where the subscripts x and y represent two orthogonal components of a two-dimensional (2-D) area. The velocity in the next time slot is given by

z_u(t+1) = ζ z_u(t) + (1 − ζ)μ + β√(1 − ζ²) w(t),   (1)

where ζ is the memory parameter reflecting how the current velocity affects the future velocity, μ = (μ_x, μ_y) and β represent the central tendency and the dispersion of the velocity, and w(t) is an independent 2-D Gaussian process with zero mean and unit variance. The parameters μ, β, and ζ are assumed to be known a priori. The location of user u in the next time slot can be expressed as

u(t+1) = u(t) + z_u(t+1) T,   (2)

unless this update would move the user out of bounds; when a user would move outside the bounds, its location in the next time slot remains unchanged. Based on the above Gauss-Markov process, the edge controller is aware of user mobility and determines the association between users and APs in the next time slot. We assume that each user is associated with its closest AP. Let the binary decision variable a_ku indicate the association result, which is determined according to the following strategy:

a_ku = 1 if k = arg max_{k∈K} ∏_{i∈K, i≠k} P(r²_ku < r²_iu), and a_ku = 0 otherwise,   (3)

where r_ku denotes the distance between user u and AP k, and the term ∏_{i∈K, i≠k} P(r²_ku < r²_iu) is the probability that AP k is the nearest AP of user u.
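The mobility update above can be sketched as follows. This is a minimal simulation of the Gauss-Markov model of (1)-(2), not the paper's simulator; the memory parameter ζ, mean velocity μ, and dispersion β use the values from the simulation section, while the bounded square area and slot duration are illustrative assumptions.

```python
import math
import random

def gauss_markov_step(pos, vel, zeta=0.8, mu=(0.6, 3.8), beta=2.0,
                      slot=1.0, bounds=(0.0, 500.0), rng=random):
    """One slot of the Gauss-Markov mobility model (sketch of eqs. (1)-(2)).

    zeta: memory parameter; mu: central tendency of velocity; beta: dispersion.
    If the tentative position would leave the square area, the user stays put.
    """
    new_vel = tuple(
        zeta * v + (1.0 - zeta) * m
        + beta * math.sqrt(1.0 - zeta ** 2) * rng.gauss(0.0, 1.0)
        for v, m in zip(vel, mu)
    )
    tentative = tuple(p + v * slot for p, v in zip(pos, new_vel))
    lo, hi = bounds
    if all(lo <= c <= hi for c in tentative):
        return tentative, new_vel
    return pos, new_vel  # out of bounds: location unchanged

random.seed(1)
pos, vel = (250.0, 250.0), (0.0, 0.0)
for _ in range(10):
    pos, vel = gauss_markov_step(pos, vel)
```

Because ζ = 0.8 is close to 1, the velocity changes smoothly across slots, which is what makes next-slot association prediction at the controller feasible.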
Let U_k = {u | u ∈ U, a_ku = 1} denote the set of users that will be connected to AP k. Assume that each AP transmits video data with fixed power, and that the bandwidth of AP k is equally allocated to the users in U_k. The average download rate of user u ∈ U_k is calculated by

R_ku = (W_k / |U_k|) log₂(1 + P_k h_ku / σ₀²),   (4)

where W_k and P_k denote the total available bandwidth and the transmit power of AP k, and h_ku and σ₀² denote the channel fading and the noise power, respectively. Note that although inter-AP interference may exist, it can be well coordinated via different techniques [23]. Thus, for analytical simplicity, we assume that a user experiences roughly static interference, similar to the model settings in [24], [25]. In addition, the average channel fading is given by h_ku = A_r (r_ku)^(−τ), where A_r is a constant coefficient of the large-scale fading model and τ is the path loss exponent.
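A minimal sketch of the rate model in (4), assuming illustrative values for the fading coefficient A_r and path loss exponent τ (neither is specified in the paper); the bandwidth, power, and noise values are those used later in the simulation section.

```python
import math

def download_rate(num_users_in_cell, r_ku, W_k=20e6, P_k_dbm=35.0,
                  noise_dbm=-105.0, A_r=1e-3, tau=3.5):
    """Average download rate of user u in cell k (sketch of eq. (4)).

    The bandwidth W_k is shared equally among the users of AP k, and the
    large-scale fading follows h_ku = A_r * r_ku^(-tau). A_r and tau are
    illustrative assumptions, not values from the paper.
    """
    p_lin = 10 ** (P_k_dbm / 10.0) * 1e-3   # dBm -> watts
    n_lin = 10 ** (noise_dbm / 10.0) * 1e-3
    h_ku = A_r * r_ku ** (-tau)
    snr = p_lin * h_ku / n_lin
    return (W_k / num_users_in_cell) * math.log2(1.0 + snr)

# A user closer to the AP gets a higher rate; more users per cell means
# each user gets a smaller bandwidth share.
r_near = download_rate(10, 50.0)
r_far = download_rate(10, 200.0)
r_crowded = download_rate(20, 50.0)
```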
Next, we derive the cumulative distribution function (CDF) of R_ku. According to (1), in time slot t+1, the velocity z_u(t+1) of user u is a Gaussian random variable with mean ξ_{z_u} = ζ z_u(t) + (1 − ζ)μ and variance δ_z² = β²(1 − ζ²). Based on (2), the squared distance r²_ku is the sum of the squares of two independent Gaussian variables, and thus follows a non-central chi-square distribution. Therefore, the probability density function (PDF) of r²_ku is given by

f_{r²_ku}(x) = (1/(2δ²)) exp(−(x + ξ_ku)/(2δ²)) I₀(√(x ξ_ku)/δ²),   (5)

where I₀(·) is the zero-order modified Bessel function of the first kind, δ² = δ_z² T² denotes the per-coordinate variance of the next-slot location, and ξ_ku is the non-centrality parameter given by

ξ_ku = (u_x(t) + ξ_{z_u,x} T − k_x)² + (u_y(t) + ξ_{z_u,y} T − k_y)²,

where k_x and k_y are the horizontal and vertical coordinates of AP k. Finally, since R_ku in (4) is a monotonically decreasing function of r_ku, the CDF of R_ku can be derived from (5).
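The non-central chi-square characterization of r²_ku can be checked numerically. The sketch below simulates the next-slot squared distance directly from the Gaussian coordinates and compares its empirical mean against the closed-form mean ξ_ku + 2δ² of a two-degree-of-freedom non-central chi-square variable; the user position, velocity, and AP location are illustrative assumptions.

```python
import math
import random

# Monte Carlo check that the next-slot squared user-AP distance behaves as a
# (scaled) non-central chi-square variable: the sum of squares of two
# independent Gaussian coordinates. All scenario parameters are illustrative.
zeta, mu, beta, T = 0.8, (0.6, 3.8), 2.0, 1.0
pos, vel, ap = (100.0, 100.0), (1.0, 1.0), (120.0, 110.0)

delta_z = beta * math.sqrt(1.0 - zeta ** 2)            # std of next-slot velocity
mean_v = tuple(zeta * v + (1.0 - zeta) * m for v, m in zip(vel, mu))
delta2 = (delta_z * T) ** 2                             # per-coordinate variance
# non-centrality: squared distance between the mean next position and the AP
xi = sum((p + mv * T - a) ** 2 for p, mv, a in zip(pos, mean_v, ap))
theo_mean = xi + 2.0 * delta2                           # E[r^2] for 2 degrees of freedom

rng = random.Random(0)
samples = []
for _ in range(20000):
    nxt = [p + rng.gauss(mv, delta_z) * T for p, mv in zip(pos, mean_v)]
    samples.append(sum((c - a) ** 2 for c, a in zip(nxt, ap)))
emp_mean = sum(samples) / len(samples)
```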

B. VIDEO DELIVERY MODEL
A set V of videos is stored in the remote server. Each video is encoded into L layers according to the SVC protocol; therefore, the video server can provide at most L versions of a requested video, each with a different bitrate. Video v is divided into a set M_v of segments, each of which lasts T_v seconds. Without loss of generality, we assume that T_v is equal to the duration of a time slot T for analytical simplicity. The average size of the l-th layer of a segment of video v is denoted by o_vl. Segment m of video v transmitted or cached in the ICWN is referred to as an object and expressed by (v, m) in the following. If a user requests segment (v, m) with version l, the first l layers will be downloaded, and the transmit rate should be greater than or equal to ∑_{i=1}^{l} o_vi / T_v. Moreover, each user has a buffer to store pre-fetched video data for subsequent use, and is capable of combining multiple layers into an integrated video. For example, suppose a user requests a video with version l₂, but only layers 1 to l₁ are cached locally, where l₂ > l₁. The user will obtain the first l₁ layers from local caches, download the remaining layers through the backhaul link, and combine them to form the requested video.
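The two rules of this delivery model, the rate needed to sustain a version and the split between locally cached and backhaul layers, can be sketched as below. The function names and layer sizes are hypothetical; the prefix rule follows from the SVC layer dependency, since a layer is only decodable when all lower layers are available.

```python
def required_rate(layer_sizes_bits, version, slot=1.0):
    """Minimum rate (bits/s) to stream a segment at a given SVC version:
    layers 1..version must all arrive within one segment duration."""
    return sum(layer_sizes_bits[:version]) / slot

def split_layers(version, locally_cached):
    """Split the layers of the requested version into those served from the
    local cache and those fetched over the backhaul. Only a contiguous
    prefix 1..l1 of cached layers is usable, by SVC layer dependency."""
    prefix = 0
    while prefix < version and (prefix + 1) in locally_cached:
        prefix += 1
    local = list(range(1, prefix + 1))
    remote = list(range(prefix + 1, version + 1))
    return local, remote

# Example: layers 1-2 cached locally, the user requests version 4,
# so layers 3-4 must come over the backhaul.
sizes = [2e6, 1e6, 1e6, 0.5e6]   # illustrative bits per layer per segment
local, remote = split_layers(4, {1, 2})
```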
Moreover, let p^u_{kvl} denote the probability that the current channel condition of user u can afford the streaming of video v with version l, which is expressed by

p^u_{kvl} = P(R_ku ≥ ∑_{i=1}^{l} o_vi / T_v) = 1 − F_{R_ku}(∑_{i=1}^{l} o_vi / T_v),   (7)

where F_{R_ku}(·) is the CDF derived in Section II-A. Notice that if the transmit rate cannot afford even the lowest video version (i.e., R_ku < o_v1/T_v), the base layer of the SVC-encoded video cannot be downloaded before playback, which causes interruptions and rebuffering. However, the base layer will still be requested if the video session has not ended. Similar to the video request model proposed in [26], we assume that each request starts from the beginning of the video file and proceeds sequentially. The viewing process is roughly divided into two phases: a browsing phase with a high departure rate p_F and a viewing phase with a low departure rate p_B, with a viewing ratio of 15% as the boundary between the two phases. Meanwhile, since the Zipf distribution has been established as a proper approximation of video popularity [27], we assume that if a user decides to switch to another video, the first segment of video v is requested with probability

P_v = v^{−α} / ∑_{j∈V} j^{−α},   (8)

where α is a parameter determining the skewness of the distribution.
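The Zipf popularity model of (8) is a one-liner to compute; this sketch just builds the probability vector over a ranked library (function name and library size are illustrative).

```python
def zipf_popularity(num_videos, alpha=0.8):
    """Zipf request probabilities over a popularity-ranked library (eq. (8)).

    Video of rank v gets probability v^(-alpha) / sum_j j^(-alpha); larger
    alpha concentrates requests on the most popular videos.
    """
    weights = [rank ** (-alpha) for rank in range(1, num_videos + 1)]
    total = sum(weights)
    return [w / total for w in weights]

pop = zipf_popularity(1000, alpha=0.8)
```

The heavy head of this distribution is what makes edge caching pay off: a small cache holding the top-ranked videos already absorbs a large fraction of requests.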
Based on the aforementioned model, if segment (v, m) is currently being downloaded, segment (ν, n) will be requested in the next time slot with a probability given in (9).

C. PROBLEM FORMULATION
In this paper, we aim to minimize the average download delay of segments through effective caching. We use a binary decision variable b^m_{kvl} to indicate the caching decision for the l-th layer of segment (v, m), where b^m_{kvl} = 1 means that layer l is cached at the edge controller (k = 0) or at AP k (k > 0), and b^m_{kvl} = 0 otherwise. Any requested content that is not cached at the mobile edge has to be downloaded through the Internet. For a user u ∈ U_k requesting segment (v, m) with version l, the average delay to obtain all layers from 1 to l is given by

D^m_{uvl} = ∑_{i=1}^{l} o_vi [ b^m_{kvi} d_k + (1 − b^m_{kvi}) b^m_{0vi} d_0 + (1 − b^m_{kvi})(1 − b^m_{0vi}) d_R ].   (10)

Moreover, let λ^{um}_{kv} denote the probability that user u ∈ U_k requests video segment (v, m) in the next time slot, which can be predicted according to (9). Based on (7) and (10), the total average delay of all mobile users can be expressed as

D_total = ∑_{k∈K} ∑_{u∈U_k} ∑_{v∈V} ∑_{m∈M_v} λ^{um}_{kv} ∑_{l=1}^{L} p^u_{kvl} D^m_{uvl}.   (11)

To minimize D_total, we formulate the following optimization problem:

min_{b} D_total
s.t. ∑_{v∈V} ∑_{m∈M_v} ∑_{l=1}^{L} b^m_{kvl} o_vl ≤ C_k, ∀k ∈ K ∪ {0},
b^m_{kvl} ∈ {0, 1},   (12)

where C_k and C_0 denote the cache capacities of AP k and the edge controller, respectively. The first constraint in (12) indicates that the total amount of content in each cache device should not exceed the corresponding cache capacity.
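The per-request delay of (10) can be sketched as below: each layer of the requested version is billed at the per-bit latency of the nearest node holding it (AP, then edge controller, then Internet). Function names are hypothetical; the per-bit latencies are those of the simulation section.

```python
def segment_delay(layer_bits, version, cached_ap, cached_ctrl,
                  d_k=20e-9, d_0=30e-9, d_R=80e-9):
    """Download delay of one segment at a given version (sketch of eq. (10)).

    Each layer i in 1..version is served from the AP if cached there, else
    from the edge controller if cached there, else from the Internet.
    d_k, d_0, d_R are per-bit latencies; layer_bits[i-1] is the layer size.
    """
    total = 0.0
    for i in range(1, version + 1):
        bits = layer_bits[i - 1]
        if i in cached_ap:
            total += bits * d_k
        elif i in cached_ctrl:
            total += bits * d_0
        else:
            total += bits * d_R
    return total

bits = [2e6, 1e6, 1e6]                               # illustrative layer sizes
d_all_remote = segment_delay(bits, 3, set(), set())  # nothing cached at the edge
d_edge = segment_delay(bits, 3, {1}, {2, 3})         # layer 1 at AP, 2-3 at controller
```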

III. LAYERED HIERARCHICAL CACHING FOR SVC-BASED VIDEO STREAMING
In this section, we first prove the NP-hardness of problem (12), then simplify the problem and propose a layered hierarchical caching algorithm for SVC-based video streaming over ICWN.
Theorem 1: Problem (12) is NP-hard.
Proof: The knapsack problem (KP) is a well-known NP-hard problem. We prove Theorem 1 by reducing KP to Problem (12). In KP, we are given a set of items and a knapsack; the items have different weights and values, and can be packed into the knapsack, which has a limited weight capacity. The objective is to select a subset of the items such that their total value is maximized while their total weight does not exceed the knapsack capacity. Note that KP corresponds to a special case of problem (12) with a single cache device, where the weight and the value of an item are given by the layer size o_vl and the delay reduction obtained by caching that layer, respectively. The above reduction can be finished within polynomial time, which proves the NP-hardness of problem (12).

A. PROBLEM REFORMULATION
According to Theorem 1, it is hard to obtain the optimal solution of problem (12) within polynomial time. Therefore, we first simplify the problem according to the theory of KP, and then propose a heuristic algorithm.
We rewrite (11) as

D_total = D̄ − ∑_{k∈K∪{0}} ∑_{v∈V} ∑_{m∈M_v} ∑_{l=1}^{L} b^m_{kvl} D^m_{kvl},   (13)

where D̄ is a constant given by (14), equal to the total delay when no content is cached at the edge, and D^m_{kvl}, defined in (15), denotes the expected delay reduction obtained by caching the l-th layer of segment (v, m) at node k. Minimizing D_total is therefore equivalent to solving the following optimization problem:

max_{b} ∑_{k∈K∪{0}} ∑_{v∈V} ∑_{m∈M_v} ∑_{l=1}^{L} b^m_{kvl} D^m_{kvl}
s.t. the cache capacity constraints in (12).   (16)

Problem (16) can be interpreted as a knapsack problem. The cache devices at the edge controller and the APs play the role of knapsacks: there are |K| + 1 knapsacks with capacities C_0, C_1, ..., C_K, respectively. Each layer of a video segment corresponds to an item, so all ∑_{v∈V} L |M_v| layers form an item set O. An item can be put into multiple knapsacks. For the l-th layer of video segment (v, m), its weight is o_vl, but its value D^m_{kvl} is determined only after it has been put into a knapsack. The objective is to select |K| + 1 sets of items that maximize the total value.

The main loop of Algorithm 1 is as follows: update the values of the items in O for knapsack 0; call Algorithm 2 for knapsack 0 to obtain S_0; then, for each k ∈ K, update the values of the items in O for knapsack k and call Algorithm 2 for knapsack k.

Layered video contents can be cached in both the edge controller and an AP simultaneously. According to (15), the value D^m_{kvl} of an item may vary when the item is put into different knapsacks. Specifically, the value of an item at the edge controller depends on whether it has been cached at the APs; similarly, the value of an item at an AP depends on whether it has been placed at the edge controller.
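The iterative fill-and-revalue structure described above can be sketched as follows. This is not the paper's Algorithm 1: the item values v0/vk and the rule that a duplicated layer is worth half elsewhere are illustrative assumptions standing in for the value updates of (15), and the greedy single-knapsack filler plays the role of Algorithm 2.

```python
def greedy_fill(items, capacity, value_of):
    """Fill one knapsack greedily in decreasing unit-value order
    (the role played by Algorithm 2, sketched)."""
    chosen, used = [], 0.0
    for it in sorted(items, key=lambda it: value_of(it) / it["w"], reverse=True):
        if value_of(it) > 0 and used + it["w"] <= capacity:
            chosen.append(it)
            used += it["w"]
    return chosen

def layered_hierarchical_caching(items, C0, C_aps, rounds=3):
    """Iterative sketch of the layered hierarchical scheme: the controller
    (knapsack 0) and the AP knapsacks are filled in turn, each time
    revaluing items so that a layer already cached at the other tier is
    worth less. The 0.5 revaluation factor is an illustrative assumption."""
    S0, S_aps = [], [[] for _ in C_aps]
    for _ in range(rounds):
        in_aps = {id(it) for S in S_aps for it in S}
        S0 = greedy_fill(items, C0,
                         lambda it: it["v0"] * (0.5 if id(it) in in_aps else 1.0))
        in_ctrl = {id(it) for it in S0}
        for k, cap in enumerate(C_aps):
            S_aps[k] = greedy_fill(items, cap,
                                   lambda it: it["vk"] * (0.5 if id(it) in in_ctrl else 1.0))
    return S0, S_aps

# Five unit-weight layers; the controller holds two, one AP holds one.
items = [{"w": 1.0, "v0": v, "vk": v} for v in (5.0, 4.0, 3.0, 2.0, 1.0)]
S0, S_aps = layered_hierarchical_caching(items, C0=2.0, C_aps=[1.0])
```

In this toy run the controller keeps the two most valuable layers, and the AP, seeing their value halved, caches the next-best layer instead of duplicating them.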
Therefore, the caching priority has a significant impact on the performance, which should be determined by considering both network parameters and popularity of video segments. However, it is hard to obtain an exact priority directly for each item. Existing solutions such as branch-and-bound algorithms may suffer from forbidding computational complexity or unsatisfactory optimality [28]. To overcome the above disadvantages, some machine learning (ML) based approaches have been widely adopted and proved to be effective [29]. Therefore, in this paper, we determine the priority based on ML, which will be detailed in Section III-C.
Based on the caching priority, we propose a heuristic layered hierarchical caching scheme, given as Algorithm 1. The details of each step are as follows:
Step 1: Initialize the selected item set S_k for each knapsack k, where k ∈ K ∪ {0}. Let g_v(·) and g_w(·) denote the value function and the weight function of an item or a set, respectively. We have g_v(∅) = 0 and g_w(∅) = 0, where ∅ denotes the empty set.
Step 2: During the online decision-making phase, the instantaneous parameters of the items and the network within the considered region are readily available at the controller. The caching priority of each item is obtained in real time based on the well-trained SVM model.
Step 3: For knapsack 0, the value D^m_{0vl} of each item in set O is updated, the items are sorted in decreasing order of unit value, and Algorithm 2 is called to fill knapsack 0 and obtain S_0. The same procedure is then applied to each knapsack k ∈ K with the updated values, yielding the sets S_k.
Step 4: Based on the S_0 and S_k obtained in Step 3, the sum value ∑_{k∈K∪{0}} g_v(S_k) is calculated. If the increment of this value is less than a constant tolerance, or the number of iterations exceeds the maximal threshold I, the item sets of all knapsacks are finalized. Otherwise, the next iteration of the algorithm starts from Step 2.
The single-knapsack filling procedure applied in Step 3 is shown in Algorithm 2. Notice that our proposed layered hierarchical caching scheme incurs low computational overhead, finishes within polynomial time, and can be implemented in the centralized edge controller.

C. CACHING PRIORITY DETERMINATION BASED ON MACHINE LEARNING
First, a sufficiently large training data set has to be generated. To this end, we randomly simulate multiple scenarios with various user requests and network parameters. For each scenario, a specific instance of problem (12) is obtained. Ideally, if these instances could be solved optimally, an accurately labeled training data set could be built from the solutions. Specifically, if b^m_{0vl} = 1, the controller gets the priority to cache the l-th layer of segment (v, m), and the corresponding item in O is labeled as 1; otherwise, the item is labeled as 0. After collecting the training labels from multiple scenarios, the training set becomes large enough to train a machine learning model well.
However, problem (12) is hard to solve optimally within a reasonable time, since the complexity of searching b^m_{0vl} exhaustively is as high as 2^{|O|}, which is unaffordable in practice when |O| is large. Therefore, in the following, we design an exhaustive search method with sublinear reduction [30] to speed up the generation of training data.
The essential idea is to effectively reduce the size of O in each scenario. We first remove tail items with extremely low or zero unit values, and sample a subset of the residual items to obtain a reduced subset O′. Then, we search for the optimal b^m_{0vl} over the set O′ exhaustively. An item that is not included in O′ shares the priority of its most similar item in O′. Specifically, we sort the items in O in decreasing order of unit value D^m_{kvl}/o_vl, and obtain the rank κ_ik of each item i ∈ O at each AP k ∈ K and at the controller. These ranks are further normalized for similarity comparison, and the caching capacities C_k (k ∈ K ∪ {0}) are reduced by the same sampling ratio. Each record of the training set includes the normalized unit-value ranks, the weight, d_R, d_0, d_k, C_0, C_k, α, and the corresponding caching-priority label. Although the exhaustive algorithm cannot be executed in real time, this does not impair the effectiveness of our method, because the training data are generated offline.
In addition, we design a binary classifier based on a support vector machine (SVM), which takes the training data as input and outputs the caching priority for each item. Intuitively, the unit-value ranks of an item in different caches are closely related to its request characteristics, and are thus suitable training features. Moreover, we observe that their mean reflects the average popularity of an item, while their variance indicates whether an item is uniformly requested by the users of different APs. Therefore, we adopt all the above elements as training features, i.e., d_R, d_0, d_k, C_0, C_k, α, and κ_ik, together with the mean and variance of the κ_ik.
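The feature vector fed to the classifier can be sketched as below. The exact normalization is not specified in the paper, so dividing each rank by the number of caches is an illustrative assumption; the function name is hypothetical.

```python
def item_features(unit_value_ranks, weight, net_params):
    """Build the SVM feature vector for one item (sketch of Section III-C):
    normalized unit-value ranks across the controller and each AP, their
    mean and variance, the item weight, and the network parameters
    (d_R, d_0, d_k, C_0, C_k, alpha). Normalizing ranks by the number of
    caches is an assumption for illustration."""
    n = len(unit_value_ranks)
    norm = [r / n for r in unit_value_ranks]
    mean = sum(norm) / n                            # reflects average popularity
    var = sum((x - mean) ** 2 for x in norm) / n    # uniformity across APs
    return norm + [mean, var, weight] + list(net_params)

# One item ranked 1st at the controller, 3rd at AP 1, 2nd at AP 2.
feat = item_features([1, 3, 2], weight=0.5,
                     net_params=(80e-9, 30e-9, 20e-9, 5e9, 1e9, 0.8))
```

A low mean rank flags a globally popular item, while a high variance flags an item popular only in some cells, which the classifier can learn to push toward an AP cache rather than the controller.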
To reduce the computational complexity, we choose the radial basis function (RBF) as the kernel of the SVM [31], which maps the samples nonlinearly into a higher-dimensional space. Note that the offline training process does not take up online decision-making time, and the well-trained model can be used directly during the service time.

D. COMPLEXITY OF ALGORITHM 1
We now analyze the complexity of the proposed method. Since the training process is performed offline, we only consider the online decision-making phase of Algorithm 1. First, the items in O are sorted in decreasing unit-value order for each knapsack, with complexity O((K+1)|O| log |O|). Then, for each item, the complexity of judging its priority with the SVM model is O(N_c d_in) [32], where N_c is the number of output categories (2 in our model) and d_in is the dimension of the input vector. The complexity of Algorithm 2 is O(|O| log |O|). Overall, the maximal computational complexity of Algorithm 1 is O(N_c d_in |O| + I(K+1)|O| log |O|).

IV. A SPECIAL CASE: REMOTE DOWNLOAD DELAY IS MANY-FOLD HIGHER THAN LOCAL DOWNLOAD DELAY
In practice, the delay of downloading a video from the remote server is generally many-fold higher than that from the local caches at the network edge [33]. In this section, we consider a special case of Problem (12) in which d_R ≫ d_0 and d_R ≫ d_k hold, so that the local download delays are negligible compared with the remote one. Under this approximation, the average delay given in (10) can be simplified as

D^m_{uvl} ≈ d_R q^m_{kvl},   (17)

where

q^m_{kvl} = ∑_{i=1}^{l} (1 − b^m_{kvi})(1 − b^m_{0vi}) o_vi   (18)

denotes the amount of data of layers 1 to l of video v that is cached neither at AP k nor at the edge controller. Correspondingly, the delay minimization problem (12) can be equivalently expressed as

min_{b} R_miss = ∑_{k∈K} ∑_{u∈U_k} ∑_{v∈V} ∑_{m∈M_v} λ^{um}_{kv} ∑_{l=1}^{L} p^u_{kvl} q^m_{kvl}
s.t. the cache capacity constraints in (12),   (19)

where R_miss denotes the expected amount of video data downloaded from the remote server due to cache misses. We define the cache hit ratio of the considered ICWN as the percentage of requested data that can be retrieved from the edge controller or an AP. Obviously, solving Problem (19) is equivalent to maximizing the cache hit ratio; therefore, Problem (19) is also referred to as the cache hit ratio maximization problem in this paper.
Although we can adopt Algorithm 1 to obtain a heuristic solution since Problem (19) is a special case of Problem (12), we would like to design a more efficient algorithm with low complexity and guaranteed approximation ratio by utilizing the special structure of Problem (19).

A. SIMPLIFIED LAYERED HIERARCHICAL CACHING ALGORITHM
Lemma 1: In the optimal solution of problem (19), no layer of an SVC video is cached simultaneously at both the edge controller and an AP, i.e., b^m_{kvl} b^m_{0vl} = 0, or equivalently b^m_{kvl} + b^m_{0vl} ≤ 1, ∀k ∈ K, ∀v, ∀l, ∀m.
Proof: We prove Lemma 1 by contradiction. Suppose there is an optimal solution that caches the l-th layer of video segment (v, m) both at the edge controller and at AP k, i.e., b^m_{kvl} = 1 and b^m_{0vl} = 1. The objective value of (19) is unchanged if we set b^m_{kvl} = 0; in other words, removing the cached copy at AP k has no impact on the result. We can then fill the freed caching space of AP k with other uncached contents, which strictly improves the solution. This contradicts the assumed optimality.
Based on Lemma 1, we can further rewrite R_miss as

R_miss = B̄ − ∑_{k∈K∪{0}} ∑_{v∈V} ∑_{m∈M_v} ∑_{l=1}^{L} b^m_{kvl} R^m_{kvl},   (20)

where B̄ is a constant given by (21), equal to the expected amount of remotely downloaded data when no content is cached at the edge, and R^m_{kvl}, defined in (22), denotes the expected reduction of remotely downloaded data obtained by caching the l-th layer of segment (v, m) at node k. Therefore, Problem (19) can be equivalently transformed into the following problem:

max_{b} ∑_{k∈K∪{0}} ∑_{v∈V} ∑_{m∈M_v} ∑_{l=1}^{L} b^m_{kvl} R^m_{kvl}
s.t. the cache capacity constraints in (12) and b^m_{kvl} + b^m_{0vl} ≤ 1, ∀k ∈ K.   (23)

As with Problem (16) in Section III, Problem (23) can be interpreted as a knapsack problem. Here, for the l-th layer of video segment (v, m), the weight and the value are o_vl and R^m_{kvl}, respectively, when it is put into knapsack k, with the additional constraint that an item cannot be placed in both knapsack 0 and a knapsack k ∈ K.
Since the edge controller and the APs have similar download delays compared with the remote video server, the edge controller is given higher caching priority than the APs. Thus, all items are first sorted according to D^m_{0vl} (which is not essentially different from R^m_{0vl} here) and packed into knapsack 0 to obtain the set S_0. Then, the values D^m_{kvl} of the items for each AP (not essentially different from R^m_{kvl}) are updated: for the items in S_0, their value at every AP is set to 0. That is, in this special case, knapsack 0 always has the highest priority to cache video contents.
Based on the above analysis, we propose an approximate method, shown as Algorithm 3. First, we sort the items of O in decreasing order of unit value, i.e., g_v(a_i)/g_w(a_i) ≥ g_v(a_j)/g_w(a_j) for i < j, where a_i and a_j denote the i-th and j-th items of the sorted O. Then, we put as many items as possible in this order into knapsack 0, obtaining the item set S_0. Second, for each knapsack k (k ∈ K), we sort the items in O \ S_0 in the same decreasing unit-value order and fill knapsack k with as many items as possible according to this order.
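The two-pass greedy just described can be sketched compactly. This is a simplified stand-in for Algorithm 3 with a flat item list (no layer structure); each item carries an illustrative weight w and value v, and the controller-first pass guarantees that no item is duplicated across the two tiers, as Lemma 1 requires of an optimal solution.

```python
def simplified_caching(items, C0, C_aps):
    """Sketch of the simplified scheme: sort items by unit value, fill the
    controller (knapsack 0) first, then fill each AP from the remaining
    items, so no item appears at both tiers."""
    order = sorted(items, key=lambda it: it["v"] / it["w"], reverse=True)
    S0, used = [], 0.0
    for it in order:
        if used + it["w"] <= C0:
            S0.append(it)
            used += it["w"]
    rest = [it for it in order if it not in S0]
    S_aps = []
    for cap in C_aps:
        Sk, used_k = [], 0.0
        for it in rest:
            if used_k + it["w"] <= cap:
                Sk.append(it)
                used_k += it["w"]
        S_aps.append(Sk)
    return S0, S_aps

items = [{"v": 6.0, "w": 2.0}, {"v": 4.0, "w": 1.0},
         {"v": 3.0, "w": 1.0}, {"v": 1.0, "w": 1.0}]
S0, S_aps = simplified_caching(items, C0=2.0, C_aps=[2.0])
```

Note how the heavy high-value item (v = 6, w = 2) is skipped by the controller because two lighter items have better unit value, and is then picked up by the AP: the greedy order is on value per byte, not raw value.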

B. APPROXIMATION RATIO
To show the performance of our proposed scheme, we estimate its approximation ratio compared with the optimal solution.

Algorithm 3 Simplified Layered Hierarchical Caching
Input: C_k, λ^{um}_{kv}, o_vl, p^u_{kvl}
Output: S_k, ∀k ∈ K ∪ {0}
1: Map all layers of the videos into an item set O with weights o_vl. Calculate R^m_{kvl} according to (22). Set S_k = ∅, ∀k ∈ K ∪ {0}, where ∅ denotes the empty set.
2: Sort the items of O in decreasing unit-value order; fill knapsack 0 with as many items as possible in this order to obtain S_0.
3: for k ∈ K do
4:   Sort the items of O_k = O \ S_0 in decreasing unit-value order.
5:   for each item a_j in O_k do
6:     if g_w(S_k ∪ {a_j}) ≤ C_k then
7:       S_k ← S_k ∪ {a_j}
8:     end if
9:   end for
10: end for
11: return S_k, ∀k ∈ K ∪ {0}

Theorem 2: Algorithm 3 provides at least a 1/2 approximation to the optimal solution.
Proof: Assume there is an optimal algorithm for solving Problem (23). Let S^opt_k and S_k denote the item sets finally selected for knapsack k by the optimal algorithm and by Algorithm 3, respectively. Obviously, if the local cache capacities of the considered ICWN are large enough to store all versions of all videos in the library, our proposed scheme achieves the optimal performance. In general, however, the local caches cannot store all videos; in this case, the proof proceeds as follows.
It is assumed that the items in O have been sorted in decreasing unit-value order. Let a i be the first excluded item when filling S 0 , i.e.
If the binary constraint b^m_{kvl} ∈ {0, 1} is relaxed to b^m_{kvl} ∈ [0, 1], (23) becomes a linear programming problem whose optimal objective value is an upper bound of Σ_{k∈K∪{0}} g_v(S^opt_k). If we expand the capacity of knapsack k to g_w(S‡_k) when solving the linear programming problem, we can easily see that S‡_k is the optimal solution. Let a^max_0 ∈ O be an item with maximum value for knapsack 0 such that g_w(a^max_0) ≤ C_0 is satisfied, and let a^max_k ∈ O \ {a^max_0} be an item with maximum value for knapsack k, constrained by g_w(a^max_k) ≤ C_k, k ∈ K. With the above analysis, we can derive that our proposed algorithm achieves at least 1/2 of the optimal total value.
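The 1/2 bound of Theorem 2 can be sanity-checked empirically on a toy instance. The sketch below uses hypothetical item values, weights, and capacities, and assumes (as in the special case) that each item is cached at most once, with knapsack 0 filled first; the brute-force search enumerates every feasible assignment.

```python
from itertools import product

def greedy_value(items, caps):
    """Two-stage greedy total value.
    items: list of (value, weight); caps[0] is knapsack 0, the rest are APs."""
    order = sorted(range(len(items)),
                   key=lambda i: items[i][0] / items[i][1], reverse=True)
    taken, total = set(), 0
    for cap in caps:  # knapsack 0 first, then each AP
        load = 0
        for i in order:
            if i not in taken and load + items[i][1] <= cap:
                taken.add(i)
                load += items[i][1]
                total += items[i][0]
    return total

def optimal_value(items, caps):
    """Exhaustive search: assign each item to one knapsack or leave it uncached."""
    best, k = 0, len(caps)
    for assign in product(range(k + 1), repeat=len(items)):
        loads, val = [0] * k, 0
        for i, a in enumerate(assign):
            if a < k:  # a == k means "not cached"
                loads[a] += items[i][1]
                val += items[i][0]
        if all(loads[j] <= caps[j] for j in range(k)):
            best = max(best, val)
    return best
```

On the instance items = [(6, 5), (5, 4), (5, 4)] with caps = [8, 4], the greedy collects a value of 10 while the optimum is 11, consistent with the 1/2 guarantee.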

C. COMPLEXITY OF ALGORITHM 3
In Algorithm 3, all items in O are first sorted for the edge controller, and then K similar sorting processes are performed over O_k for the APs. Therefore, the complexity of the entire algorithm is O(|O| log |O| + Σ_{k∈K} |O_k| log |O_k|). Compared with Algorithm 1, Algorithm 3 has low complexity and is easy to implement.

V. NUMERICAL RESULTS
In this section, the performance of our proposed caching schemes is evaluated. Our simulation platform is built on MATLAB. In the platform, user movements and video requests are simulated and predicted according to the models in Section II, using the SVM classifier tool integrated in MATLAB. The average transmission delay of downloading a video segment, the cache hit rate, and the QoE of users are adopted as the main metrics.
Network Setting: We consider an ICWN with 10 APs, which are connected to and controlled by an edge controller. The controller and each AP are equipped with caching devices of size 5 GB and 1 GB, respectively. We set the available bandwidth, the transmit power, and the noise power to W_k = 20 MHz, P_k = 35 dBm, and σ^2_0 = −105 dBm, respectively. The number of mobile users in the ICWN is 100 by default. User mobility is characterized by a Gauss-Markov process, where the average velocity is µ = (0.6, 3.8) m/s, and the remaining mobility parameters are set to β = 2 and 0.8, respectively. The latency parameters are d_R = 80 ns, d_0 = 30 ns, and d_k = 20 ns for each AP.
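For readers unfamiliar with the mobility model, one step of a standard Gauss-Markov update can be sketched as follows. The memory level 0.8 matches the simulation setting above; the noise standard deviations (sigma_v, sigma_t) and the function name are assumptions for illustration, not parameters stated in the paper.

```python
import math
import random

def gauss_markov_step(v, theta, v_mean, theta_mean,
                      alpha=0.8, sigma_v=0.5, sigma_t=0.2):
    """One step of the Gauss-Markov mobility model.
    alpha is the memory level (0 = random walk, 1 = constant velocity);
    sigma_v / sigma_t are assumed noise standard deviations.
    Returns the new (speed, direction) pair."""
    w = math.sqrt(1 - alpha ** 2)
    v_new = alpha * v + (1 - alpha) * v_mean + w * random.gauss(0, sigma_v)
    t_new = alpha * theta + (1 - alpha) * theta_mean + w * random.gauss(0, sigma_t)
    return v_new, t_new
```

Over a long run, the speed process fluctuates around its configured mean v_mean, which is the property exploited when predicting user trajectories.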
Application Setting: Each video in the server is encoded into 1 base layer and 4 enhancement layers. In general, the bitrate of a video layer depends on the encoding parameters and the content of the video. To simplify the simulation, it is assumed that different videos have similar bitrates and that each layer has a bitrate of 500 Kbps. Therefore, in steps of 500 Kbps, the video server offers 5 versions of each video, ranging from 500 Kbps to 2500 Kbps. The departure rates p_F and p_B are set to 0.7 and 0.3, respectively. The duration of a video follows a uniform distribution from 2 minutes to 10 minutes. Each video is divided into multiple segments, each of which lasts 2 seconds.
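Video requests in the simulations follow a Zipf popularity law, whose shape parameter α appears throughout the results below. A minimal sketch of generating Zipf-distributed requests (function names are ours, for illustration):

```python
import random

def zipf_popularity(n_videos, alpha):
    """Zipf popularity: p_i proportional to 1 / i^alpha for rank i = 1..n."""
    weights = [1.0 / (i ** alpha) for i in range(1, n_videos + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def sample_request(probs, rng=random):
    """Draw one video index (0 = most popular) according to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

A larger α concentrates the probability mass on the top-ranked videos, which is why the curves for α = 0.9 consistently outperform those for α = 0.6 in the figures discussed next.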
Comparison Baseline: We compare the performance of our proposed schemes with the following two baselines.
• Baseline 1: The first baseline is a non-layered caching scheme. Different from our proposed methods, non-layered caching approaches use traditional non-scalable coding protocols to encode videos. Typically, SVC consumes approximately 20% more bits to achieve the same video quality, which is the additional overhead of adopting layered coding schemes [34].
• Baseline 2: The caching method proposed in [10] is also simulated as a comparison baseline, which caches each video as a whole file including all video segments.
A. AVERAGE TRANSMISSION DELAY
Fig. 2 depicts the delay performance of the different caching schemes as the number of video segments increases, with the parameter α of the Zipf distribution set to 0.6 and 0.9, respectively. As shown in Fig. 2, for all caching schemes, the larger the number of video segments, the higher the average delay of downloading a single video segment. Our proposed schemes outperform the baselines because the layered caching schemes reuse low-layer video data, which fully exploits the benefits of layered video encoding. Fig. 2 shows that this content-reuse gain outweighs the 20% overhead of layered coding and leads to better performance. Moreover, by caching each video as a whole file, baseline 2 is inferior to the other schemes, which cache video contents at the segment level. This is because baseline 2 is not mobility-aware: when a user moves from one AP to another, a large number of video segments pre-cached in the user's previous AP may become useless. In addition, when α increases from 0.6 to 0.9, i.e., video requests become more concentrated, the average delays of all schemes are reduced, which is consistent with the characteristics of the Zipf distribution. The more concentrated the video requests are, the better the performance.

B. CACHE HIT RATE
Fig. 3 shows the trend of the cache hit rate under different caching schemes, α values, and numbers of video segments. As the number of video segments increases, the cache hit rates of all caching schemes decrease. However, our proposed layered caching scheme outperforms the baselines. It is noteworthy that, benefiting from the data-reuse feature of layered video, the cache hit rate of the layered caching scheme with α = 0.6 is higher than that of the non-layered caching scheme with α = 0.9. Moreover, both our proposed caching scheme and baseline 1 outperform baseline 2, since the former two use the cache space more efficiently.

D. AVERAGE NUMBER OF PAUSES
Since viewing interruptions largely affect the QoE of users, we simulate the average number of pauses for the different schemes. The numerical results are averaged over a large number of independent runs, each with a duration of 5 minutes. As shown in Fig. 5, the average number of pauses under all caching schemes increases with the number of video segments. Compared with the baselines, users experience fewer pauses when adopting our proposed caching scheme. In addition, when user requests are more concentrated, i.e., the parameter α of the Zipf distribution is changed from 0.6 to 0.9, the number of pauses decreases accordingly.

E. IMPACT OF THE NUMBER OF USERS
Fig. 6 depicts the impact of the user scale on our proposed caching schemes. The total number of users is set to 60, 80, 100, and 120, respectively, and the parameter α of the Zipf distribution is set to 0.6. In our design, most performance metrics depend on the total number of users. According to (6) and (7), as the number of users increases, the download rate tends to decrease, resulting in more requests for lower versions of videos. Although the average video quality is thus impaired, the average delay, the cache hit rate, the backhaul load, and the number of pauses all improve. This is because more users are likely to request videos of low quality, so the distribution of requests becomes more concentrated and more requested videos can be found in the local caches. Therefore, a higher cache hit rate and a lower backhaul load are observed. Furthermore, since lower versions of a video contain fewer bits and the average delay per bit is constant in the simulation, downloading fewer bits leads to a lower average delay and fewer potential pauses.

F. SIMULATION RESULTS OF THE SPECIAL CASE
In the special case introduced in Section IV, where d_R − d_k ≈ d_R − d_0 ≈ d_R holds, minimizing the total system latency is equivalent to maximizing the cache hit rate. As shown in Fig. 7, the curves for this special case follow trends similar to those of the general delay minimization case. The analyses of the proposed layered caching scheme and the two baseline schemes, already presented for Figs. 2-6, are not repeated here. Fig. 8 shows the cache hit rate of the three schemes in different APs, where the number of video segments is set to 5000 and the parameter α of the Zipf distribution is set to 0.6. The cache hit rate varies across APs due to the diverse interests in video contents of the users in different APs. By adopting our proposed caching scheme, users obtain a higher cache hit rate in each AP.
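The equivalence in the special case can be illustrated numerically with the latency parameters of our simulation setting (d_R = 80 ns, d_0 = 30 ns, d_k = 20 ns). The helper below is a hypothetical sketch: it models the expected per-bit delay as a mixture of AP hits, controller hits, and misses served by the remote server.

```python
def avg_delay(hit_controller, hit_ap, d_R=80.0, d_0=30.0, d_k=20.0):
    """Expected per-bit delay (ns): an AP hit costs d_k, a controller hit
    costs d_0, and a miss is fetched from the remote server at d_R."""
    miss = 1.0 - hit_controller - hit_ap
    return hit_ap * d_k + hit_controller * d_0 + miss * d_R
```

Because d_R dominates d_0 and d_k, the delay is driven almost entirely by the miss probability: raising the total hit rate from 0.5 to 0.7 lowers the expected delay from 52 ns to 41 ns in this example, so maximizing the cache hit rate is effectively minimizing the delay.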

VI. CONCLUSION
In this paper, we address the caching problem for SVC-based video streaming over ICWN, which is characterized by both layered video contents and a hierarchical network structure. We formulate a 0-1 programming problem to minimize the average video transmission latency. The NP-hardness of the problem is proved, and we simplify it into a special KP for ease of solution. A layered hierarchical caching scheme is proposed to solve the problem in polynomial time. In addition, we consider a special but common case where the remote download delay is many-fold higher than the local download delay. In this case, the original problem can be equivalently simplified into a cache hit rate maximization problem, and an algorithm is proposed to solve the simplified problem with a 1/2 approximation ratio. Simulation results demonstrate the effectiveness of our proposed schemes in achieving low average delay and high cache hit rate.