Cooperative Multi-Bitrate Video Caching and Transcoding in Multicarrier NOMA-Assisted Heterogeneous Virtualized MEC Networks

Cooperative video caching and transcoding in mobile edge computing (MEC) networks is a new paradigm for future wireless networks, e.g., 5G and beyond, that reduces the usage of scarce and expensive backhaul resources by prefetching video files within radio access networks (RANs). Integrating this technique with emerging technologies such as wireless network virtualization and multicarrier non-orthogonal multiple access (MC-NOMA) provides more flexible video delivery opportunities, which enhances both the network's revenue and the end-users' service experience. In this regard, we propose a two-phase resource allocation framework (RAF) for parallel cooperative joint multi-bitrate video caching and transcoding in heterogeneous virtualized MEC (HV-MEC) networks. In the cache placement phase, we propose novel proactive delivery-aware cache placement strategies (DACPSs) that jointly allocate physical and radio resources based on stochastic network information to exploit flexible delivery opportunities. For the delivery phase, we propose a delivery policy based on user requests and network channel conditions. The optimization problems corresponding to both phases aim to maximize the total revenue of network slices, i.e., virtual networks. Both problems are non-convex and suffer from high computational complexity. For each phase, we show how the problem can be solved efficiently. We also propose a low-complexity RAF in which the complexity of the delivery algorithm is significantly reduced. A delivery-aware cache refreshment strategy (DACRS) in the delivery phase is also proposed to tackle dynamic changes of the stochastic network information. Extensive numerical assessments demonstrate a performance improvement of up to 30% for our proposed DACPSs and DACRS over traditional approaches.

Index Terms-Mobile edge computing, cooperative caching, multi-bitrate video transcoding, adaptive bitrate streaming, wireless network virtualization, multicarrier NOMA.

A. Background & Motivations
Wireless edge caching has been developed as a candidate solution for next-generation wireless networks, e.g., 5G, to support high-data-rate and/or low-latency multimedia services by proactively storing contents at the edge of wireless networks, thereby offloading scarce and expensive backhaul links [1]-[3]. Among various mobile services, mobile video services and applications are expected to account for a major share of global mobile data traffic in the coming years [4], [5]. For this reason, video caching at the network edge has drawn considerable attention recently [2], [4]-[9].
In practical scenarios, due to the multiple bitrate variants of each unique video file, service providers often need to transcode video files into multiple bitrates [5]-[9]. To this end, adaptive bitrate (ABR) streaming techniques have been developed to enhance the quality of delivered video in radio access networks (RANs), where each video file is adjusted to users' requests based on their display sizes and network channel conditions [5]-[7].
Recently, mobile edge computing (MEC) networks have emerged as a promising technology for next-generation wireless networks, providing cloud caching and computing capabilities within the RAN [8], [10]-[13]. Thanks to this paradigm, video files can be prefetched and/or transcoded in close proximity to end-users, leading to enormous latency and backhaul traffic reductions in wireless networks. One problem, however, is that duplicated video caching and transcoding across multiple resource-constrained MEC servers wastes both storage and processing resources. To tackle this issue, cooperative joint multi-bitrate video caching and transcoding (CVCT) has been proposed, in which each MEC server is able to receive requested video files from neighboring MEC servers via fronthaul links [7]. In this architecture, each MEC server is deployed side-by-side with a base station (BS) using generic computing platforms that provide caching and computation capabilities in heterogeneous networks (HetNets) [7], [8].
By sharing both the storage and processing resources among multiple MEC servers, more video files can be served within the RAN, reducing the backhaul traffic. However, transferring and transcoding video files sequentially wastes time and physical resources in the CVCT system, which is detrimental to delay-sensitive services. To cope with this challenge, parallel video transmission and transcoding [9], [14] can be deployed. In the parallel CVCT system, video transcoding runs in parallel with video transmission, and all the multi-hop video transmissions (over backhaul, fronthaul, and wireless access links) also run in parallel.
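As a toy numerical illustration of this point (the function names and delay values below are ours, not the paper's), the end-to-end delivery time of a sequential multi-hop chain is the sum of the per-hop delays, whereas a fully pipelined chain is limited by its slowest hop:

```python
# Hypothetical per-hop delays (seconds) for one video served over
# backhaul -> transcoding -> wireless access; values are illustrative.

def serial_delivery_time(stage_delays):
    # Sequential CVCT: each hop must finish before the next one starts.
    return sum(stage_delays)

def parallel_delivery_time(stage_delays):
    # Parallel CVCT: all hops stream segment-by-segment concurrently,
    # so the end-to-end time is governed by the slowest hop.
    return max(stage_delays)

delays = [0.8, 0.5, 1.2]  # backhaul, transcoding, access
assert parallel_delivery_time(delays) <= serial_delivery_time(delays)
```

Here the pipelined time equals the access-link delay, matching the intuition that the wireless access hop dominates whenever the other hops keep up.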
Non-orthogonal multiple access (NOMA) has recently emerged as a key enabling technology to improve the spectral efficiency of 5G wireless networks [15], [16]. Unlike conventional orthogonal multiple access (OMA) techniques, NOMA can significantly improve the system throughput and support massive connectivity by using successive interference cancellation (SIC) at the receivers and superposition of multiple messages at the transmitter [15], [16]. The spectral efficiency can be further improved by combining NOMA with multicarrier systems, called multicarrier NOMA (MC-NOMA), which exploits multicarrier diversity [17]. To reduce the capital expenditures (CapEx) and operating expenditures (OpEx) of RANs, wireless network virtualization has been developed, in which the wireless network infrastructure is abstracted and sliced based on different services [18]-[20]. Exploiting MC-NOMA in virtualized networks can further reduce the wireless bandwidth cost of slices by reusing each subcarrier for multiple users of each slice. In this way, the integration of the aforementioned technologies enables CVCT at MEC servers together with network infrastructure abstraction for cost-efficient wireless servicing.
Transcoding a large number of videos simultaneously at each resource-constrained MEC server poses another challenge for delay-sensitive services [5]-[7]. The significant performance gain of the CVCT system can only be achieved when a joint distributed video caching and transcoding strategy is designed [7]. Accordingly, the major question is: which bitrate variant of a video file should be cached or transcoded to another, lower, bitrate variant? An efficient design of joint power and subcarrier allocation is required to achieve the benefits of MC-NOMA in virtualized wireless networks as well as the improved throughput. Additionally, the scheduler should be fast enough to readapt the video delivery policy to the arrival requests of users and the channel state information (CSI), specifically in realistic ultra-dense 5G wireless networks with a large number of videos. To this end, the video delivery policy needs to be lightweight.
virtualization [18]- [20], [30]. However, none of these works have utilized the benefits of video transcoding in their systems.
In the context of cloud-based video transcoding, some research efforts investigate the advantages of cloud computing and devise joint processing resource allocation and scheduling policies to reduce transcoding delays in the delivery phase [9], [31]. In addition, [5]-[7] investigate joint multi-bitrate video caching and transcoding utilizing ABR streaming in C-RANs. In [6], a transmission-aware joint multi-bitrate video caching and transcoding policy is devised to maximize the number and quality of concurrent video requests in each time slot in a single-cell scenario. In [5], the benefits of a joint caching and radio resource allocation policy are investigated for a multi-cell MEC network without any cooperation between MEC servers.
Additionally, [7] investigates the design of a transcoding-aware cache replacement strategy in the online delivery phase of a non-parallel CVCT system based on the arrival video requests.
Accordingly, designing an efficient proactive DACPS for CVCT systems is still an open problem. Furthermore, the parallel transmission and transcoding capability, which avoids wasting time and physical resources, is not applied to the CVCT system in [7]. Besides, prior works do not exploit the benefits of jointly allocating physical resources when designing an efficient DACPS. In addition, to the best of our knowledge, the impact of applying MC-NOMA in virtualized wireless networks in terms of bandwidth cost reduction has not yet been addressed in the related works. In this paper, we address these challenges.

C. Our Contributions
In this paper, we consider a parallel CVCT system in an MC-NOMA-assisted heterogeneous virtualized MEC (HV-MEC) network. This network consists of multiple remote radio systems (RRSs), each equipped with a BS and an MEC server that enable the CVCT capability at the network edge. For this setup, we propose a virtualization model and a pricing scheme based on the revenues of slices with a specific QoS. In contrast to [7], where the main goal was only to decrease the network cost, we aim to maximize the revenue of slices by jointly increasing slice incomes (obtained by providing data rates to their own users) and decreasing slice costs. In this network, consuming more physical resources provides higher data rates for users, which increases slice incomes; however, it also increases the network cost, which degrades slice revenues. Accordingly, this trade-off should be handled carefully.
To address this, we propose a resource allocation framework (RAF) in which the network operational time is divided into two phases: a cache placement phase (Phase 1) and a delivery phase (Phase 2). To the best of our knowledge, this paper is the first in the literature to propose efficient DACPSs in a parallel CVCT system for an MC-NOMA-assisted HV-MEC network based on the available stochastic wireless channel distribution information (CDI) and video popularity distribution (VPD). This novel strategy is designed by jointly allocating the available physical resources, such as storage, processing, and transmission resources (transmit power of RRSs, subcarriers, and backhaul and fronthaul capacities), together with user association and request scheduling. This paper also provides a novel solution to reduce the computational complexity of our delivery algorithm for on-demand and real-time cloud services. We show that our proposed low-complexity RAF (LC-RAF) can be efficiently utilized in dense environments with higher levels of path loss.

D. Paper Organization
The rest of this paper is organized as follows. Section II presents the network architecture and formulates the cache placement and delivery optimization problems. Section III contains the solution of the problems and the proposed LC-RAF. The numerical results are presented in Section IV. Our concluding remarks are provided in Section V. The abbreviations used in the paper are summarized in Table I.

Each user is subscribed to only one slice. Fig. 1 shows an illustration of this network.
Assume that there exist $V$ unique transcodable videos, each having $L$ bitrate variants (see footnote 1), in the origin cloud server with unlimited storage capacity [5], [7], [22]. The video library is denoted by $\mathcal{V}$, where $v_l \in \mathcal{V}$ is the $v$th video type with the $l$th bitrate variant and size $s_{v_l}$. Video $v_h$ can be transcoded to $v_l$ if $l < h$. This network operates in two phases: Phase 1, where the scheduled video files are proactively stored in the caches of RRSs during off-peak times [6], [7]; and Phase 2, where the requested videos are sent to the end-users according to the adopted delivery policy [7], [9], [14], [31]. In Phase 1, we aim to design an efficient DACPS based on the available VPD and CDI. Phase 2, which follows Phase 1, is divided into multiple finite time slots; in each time slot, we propose a delivery policy based on the arrival requests of users, the CSI, and the caching status.
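As a minimal sketch of the library model above (the sizes, dictionary layout, and helper name are illustrative assumptions, not the paper's notation):

```python
# V unique videos, each with L bitrate variants indexed 1 (lowest) .. L
# (highest); sizes grow with the bitrate index and are purely illustrative.
V, L = 3, 4
library = {(v, l): 100 * l for v in range(1, V + 1) for l in range(1, L + 1)}  # size in MB

def can_transcode(src, dst):
    """A cached variant (v, h) can be transcoded to (v, l) only for the
    same video and a strictly lower bitrate index, i.e., l < h."""
    (v_src, h), (v_dst, l) = src, dst
    return v_src == v_dst and l < h
```

For instance, `can_transcode((1, 4), (1, 2))` holds, while upscaling `(1, 2)` to `(1, 4)` or transcoding across different videos does not.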
The proposed RAF is illustrated in Fig. 2. To design a DACPS in Phase 1 that utilizes the delivery opportunities in the system, we first need to describe the delivery model of the system in Phase 2. Then, we investigate the idea of developing DACPSs in Phase 1.

Footnote 1: The lowest bitrate of each video type is denoted by 1 and the highest by $L$.

B. Phase 2
In this phase, we assume a time-slotted system where, at the beginning of each time slot, each user requests one video file. Similar to related works [3]-[6], [18], [19], [30], [32]-[34], the CSI and requests of users remain fixed throughout a time slot and are completely independent across time slots (see footnote 2). All requests of users in each time slot should also be served within that time slot [3], [18], [33]. Hence, the adopted delivery policy for each time slot is completely independent of other time slots, and we therefore focus on a single time slot in Phase 2. The request of user $u$ for video $v_l$ is indicated by a binary variable $\delta_u^{v_l} \in \{0,1\}$ such that $\delta_u^{v_l} = 1$ if user $u$ requests video $v_l$, and $\delta_u^{v_l} = 0$ otherwise. Thus, we have $\sum_{v_l \in \mathcal{V}} \delta_u^{v_l} = 1, \forall u \in \mathcal{U}$. The binary parameter $\theta_{b,u} \in \{0,1\}$ is the user association indicator, where $\theta_{b,u} = 1$ if user $u$ is associated with RRS $b$, and $\theta_{b,u} = 0$ otherwise. In this system, we assume that each user can be connected to at most one RRS, which is represented by [3], [6], [7], [18], [30]
$$\sum_{b \in \mathcal{B}} \theta_{b,u} \le 1, \quad \forall u \in \mathcal{U}. \qquad (1)$$
The requests of users associated with RRS $b$ for video $v_l$ can be served by one of the following binary events [7]: 1) $x_b^{v_l} = 1$ represents that video $v_l$ is sent directly from the cache of RRS $b$; 2) $y_b^{v_h,v_l} = 1$ indicates that video $v_l$ is served directly by RRS $b$ after being transcoded from a higher bitrate variant $v_h$ at RRS $b$.

Footnote 2: Similar to previous works, the main motivation for considering this model is to simplify the video transmission model. Dynamic video requesting and changes of CSI during each data transmission time would cause a more complex delivery model, which has not yet been investigated in the joint radio resource allocation and content placement context [3]. Assuming dynamic video requests with changes of CSI during a time slot can be considered as future work.
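A quick feasibility check of the request and association indicators described above can be sketched as follows (the data layout and function name are our assumptions):

```python
def feasible_indicators(delta, theta):
    """delta[u][v]: binary request indicators, one row per user; each user
    must request exactly one video. theta[b][u]: binary association
    indicators; each user may be associated with at most one RRS
    (constraint (1) in the text)."""
    users = range(len(delta))
    one_request_each = all(sum(delta[u]) == 1 for u in users)
    at_most_one_rrs = all(sum(theta[b][u] for b in range(len(theta))) <= 1
                          for u in users)
    return one_request_each and at_most_one_rrs
```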
and then sent to RRS $b$ via a fronthaul link. To avoid duplicated video provisioning at each RRS, we assume that all the requests for each video file $v_l$ from users associated with RRS $b$ are served by only one type of event (see footnote 3) [7]. In practical terms, video transcoding can only be performed from a higher bitrate variant of a transcodable video file to a lower one; accordingly, transcoding is restricted to $l < h$ [9]. Note that each event can be chosen only if the required video exists in the target storage [6], [7], [9]. Let $\rho_b^{v_l}$ be the binary cache placement indicator, where $\rho_b^{v_l} = 1$ if video $v_l$ is cached at RRS $b$, and $\rho_b^{v_l} = 0$ otherwise. In this parallel CVCT system, the MC-NOMA technology is deployed at each RRS such that the total frequency bandwidth $W$ is divided into a set $\mathcal{N} = \{1, 2, \ldots, N\}$ of orthogonal subcarriers, each of bandwidth $W_s$. In this scheme, we assume that users aim to download the video files; online video servicing based on the playback rate of videos in HV-MEC networks is left for future work. MC-NOMA allows each orthogonal subcarrier $n$ to be shared among multiple users in each RRS by applying superposition coding at the transmitter side and SIC at the receiver side [17], [35], [36]. The binary subcarrier assignment indicator is denoted by $\tau_{b,u}^n$, where $\tau_{b,u}^n = 1$ if subcarrier $n$ is assigned to the channel from RRS $b$ to user $u \in \mathcal{U}$, and $\tau_{b,u}^n = 0$ otherwise. Note that each user can take subcarriers from RRS $b$ only if the user is associated with that RRS.

Footnote 3: This parallel CVCT system can also be extended to a coordinated multi-point-enabled one in which each user is able to access more than one transmitter. Despite the significant potential, the coordinated multi-point system increases the complexity of the delivery model; we therefore leave this scheme as future work.
Therefore, we should have [3], [5], [18]. We denote by $p_{b,u}^n$ the transmit power of RRS $b$ to user $u \in \mathcal{U}$ on subcarrier $n$, and by $h_{b,u}^n$ the instantaneous channel power gain between RRS $b$ and user $u \in \mathcal{U}$ on subcarrier $n$. After performing SIC, the instantaneous signal-to-interference-plus-noise ratio (SINR) at user $u \in \mathcal{U}$ associated with RRS $b$ on subcarrier $n$ is [17], [35]
$$\gamma_{b,u}^n = \frac{p_{b,u}^n h_{b,u}^n}{I_{b,u}^{\text{Intra},n} + I_{b,u}^{\text{Inter},n} + \sigma_{b,u}^n},$$
where $I_{b,u}^{\text{Intra},n} = \sum_{u' \in \mathcal{U}, u' \ne u, \, h_{b,u'}^n > h_{b,u}^n} p_{b,u'}^n h_{b,u}^n$ represents the intra-cell interference induced on user $u \in \mathcal{U}$ over subcarrier $n$, $I_{b,u}^{\text{Inter},n}$ is the inter-cell interference received at user $u \in \mathcal{U}$ over subcarrier $n$, and $\sigma_{b,u}^n = W_s N_0$ is the additive white Gaussian noise (AWGN) power, in which $N_0$ is the noise power spectral density (PSD). Therefore, the instantaneous data rate of user $u \in \mathcal{U}$ from RRS $b$ on subcarrier $n$ is $r_{b,u}^n = \tau_{b,u}^n W_s \log_2(1 + \gamma_{b,u}^n)$. To apply the SIC technique in MC-NOMA, the corresponding SIC ordering constraint should be satisfied [17]. Accordingly, the instantaneous data rate of user $u \in \mathcal{U}$ assigned to RRS $b$ is $r_{b,u}^{\text{Ac}} = \sum_{n=1}^N r_{b,u}^n$. Under the assumption that each RRS $b$ has a maximum transmit power $P_b^{\max}$, we have $\sum_{u \in \mathcal{U}} \sum_{n=1}^N p_{b,u}^n \le P_b^{\max}$. Consequently, the instantaneous latency of user $u$ to receive video $v_l$ from RRS $b$ is $D_{b,u}^{\text{Ac},v_l} = s_{v_l}/r_{b,u}^{\text{Ac}}$. Transcoding video $v_h$ to $v_l$, $\forall l < h$, is performed via the ABR streaming technique, where the transcoding operation is mapped to an $\eta^{v_h,v_l}$-bit computation-intensive task (see footnote 5) [7], [8]. Let Cycle be the number of central processing unit (CPU) cycles required to compute 1 bit of this computation-intensive task [10], [11]. Each RRS performs all scheduled computation tasks in parallel by efficiently allocating its computation resources [12], [13]. The number of CPU cycles per second allocated at RRS $b$ for transcoding video $v_h$ to $v_l$ is denoted by $\phi_b^{v_h,v_l} \in \{0, 1, 2, \ldots\}$ [9], [10], [31]. Let $\chi_b^{\max}$ be the maximum processing capacity of RRS $b$. Therefore, the per-RRS maximum processing capacity constraint is $\sum_{v_h, v_l} \phi_b^{v_h,v_l} \le \chi_b^{\max}$. The speed of the transcoding process is determined by the video transrating, i.e., the transcoding bitrate, which is the number of bits transcoded by the processor per second [14]. Therefore, the delay of transcoding video $v_h$ to $v_l$ at RRS $b$ is $D_b^{\text{Proc},v_h,v_l} = \eta^{v_h,v_l}\,\text{Cycle}/\phi_b^{v_h,v_l}$ [11]-[13].

Footnote 5: $\eta^{v_h,v_l}\,\text{Cycle}$ is referred to as the workload of the task of transcoding video $v_h$ to $v_l$ in the ABR technique.
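The per-subcarrier NOMA rate model above can be sketched numerically as follows: after SIC, user u is interfered only by co-scheduled users with stronger channel gains. The bandwidth, noise, power, and gain values below are illustrative assumptions:

```python
import math

def noma_rates(p, h, Ws=180e3, N0=1e-17):
    """Downlink NOMA rates of users sharing one subcarrier of one RRS.
    p[u]: transmit power to user u (W); h[u]: channel power gain of user u.
    After SIC, user u cannot cancel the signals of users with stronger
    gains, so only those contribute to its intra-cell interference."""
    sigma = Ws * N0  # AWGN power on the subcarrier (sigma = Ws * N0)
    rates = []
    for u in range(len(p)):
        intra = sum(p[j] * h[u] for j in range(len(p))
                    if j != u and h[j] > h[u])
        gamma = p[u] * h[u] / (intra + sigma)        # SINR after SIC
        rates.append(Ws * math.log2(1 + gamma))      # bits per second
    return rates
```

For two users with gains 1e-6 and 1e-7 and powers 0.1 W and 0.9 W, the stronger user cancels the weaker user's signal and obtains the higher rate, even with less allocated power.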
For this setup, we consider $R_{0,b}^{\max}$ and $R_{b',b}^{\max}$ as the maximum capacities of the backhaul link to RRS $b$ and of the fronthaul link from RRS $b'$ to RRS $b$, respectively [32]. We also denote by $r_{0,b}^{v_l}$ and $r_{b',b}^{v_l}$ the adopted data rates for RRS $b$ to receive video $v_l$ from the origin cloud server and from the neighboring RRS $b'$, respectively. Hence, the following maximum channel capacity constraints should be satisfied: $\sum_{v_l \in \mathcal{V}} r_{0,b}^{v_l} \le R_{0,b}^{\max}$ and $\sum_{v_l \in \mathcal{V}} r_{b',b}^{v_l} \le R_{b',b}^{\max}$. Accordingly, the delays of receiving video $v_l$ from the origin cloud server and from RRS $b'$ are $D_{0,b}^{\text{BH},v_l} = s_{v_l}/r_{0,b}^{v_l}$ and $D_{b',b}^{\text{FH},v_l} = s_{v_l}/r_{b',b}^{v_l}$, respectively.
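Under the model above, every transmission delay is a file size divided by an adopted rate, and the transcoding delay is the task workload divided by the allocated CPU speed; a small sketch (argument names are ours):

```python
def hop_delays(size_bits, r_access, r_fronthaul, r_backhaul,
               workload_bits=None, cycles_per_bit=None, cpu_cycles_per_s=None):
    """Per-hop delays for one video of size_bits: each transmission delay
    is size / adopted rate, and the transcoding delay (if requested) is
    workload * cycles-per-bit / allocated CPU cycles per second."""
    d = {
        "access": size_bits / r_access,
        "fronthaul": size_bits / r_fronthaul,
        "backhaul": size_bits / r_backhaul,
    }
    if workload_bits is not None:
        d["transcode"] = workload_bits * cycles_per_bit / cpu_cycles_per_s
    return d
```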
From the isolation perspective in the slicing context, to guarantee the QoS of users owned by each slice $m$, i.e., $\mathcal{U}_m$, we apply a minimum data rate constraint
$$\sum_{b \in \mathcal{B}} \theta_{b,u} r_{b,u}^{\text{Ac}} \ge R_m^{\min}, \quad \forall u \in \mathcal{U}_m, \qquad (11)$$
where $R_m^{\min}$ represents the minimum required access data rate of users in $\mathcal{U}_m$ [30]. Satisfying (11) for a non-zero $R_m^{\min}$ requires allocating at least one subcarrier to each user in $\mathcal{U}_m$. According to (4), this condition can only be met if each user is associated with at least one RRS. Therefore, for each non-zero $R_m^{\min}$, all users in $\mathcal{U}_m$ should be associated with at least one RRS. Accordingly, based on constraints (1), (4), and (11), if $R_m^{\min} > 0$, the inequality in (1) turns into an equality for all users in $\mathcal{U}_m$. It is noteworthy that $R_m^{\min} = 0$ means that slice $m$ does not guarantee any QoS for its own users; in this case, the slice provides a best-effort service in which the requests of its users are not guaranteed to be served.
The video transcoding process runs in parallel with the video transmission, where the delay of each transcoding operation in the system is measured by transcoding the first several segments of a video file [9], [14]; this delay is negligible compared to the corresponding wireless transmission delay.
To efficiently allocate physical resources in this parallel system, transmission/transcoding delay constraints should hold for each multi-hop scheduling event. For instance, in Event 2, the delay of transcoding video $v_h$ to $v_l$ at RRS $b$ should not exceed the access latency of user $u$ receiving video $v_l$ from RRS $b$ [6], [14], i.e.,
$$D_b^{\text{Proc},v_h,v_l} \le D_{b,u}^{\text{Ac},v_l}. \qquad (12)$$
For Event 3, the fronthaul transmission delay should not exceed the access delay [30], [32], [33]. Therefore, we have
$$D_{b',b}^{\text{FH},v_l} \le D_{b,u}^{\text{Ac},v_l}. \qquad (13)$$
For Event 4, the delays of transcoding and fronthaul transmission should not exceed the fronthaul and access delays, respectively. These practical constraints can be represented as
$$D_{b'}^{\text{Proc},v_h,v_l} \le D_{b',b}^{\text{FH},v_l}, \qquad (14)$$
$$D_{b',b}^{\text{FH},v_l} \le D_{b,u}^{\text{Ac},v_l}. \qquad (15)$$
For Event 5, the fronthaul and transcoding delays should not exceed the transcoding and access delays, respectively. Hence, we have
$$D_{b',b}^{\text{FH},v_h} \le D_b^{\text{Proc},v_h,v_l}, \qquad (16)$$
$$D_b^{\text{Proc},v_h,v_l} \le D_{b,u}^{\text{Ac},v_l}. \qquad (17)$$
Finally, for Event 6, the backhaul delay should be less than or equal to the access delay for each video transmission. Accordingly, we have
$$D_{0,b}^{\text{BH},v_l} \le D_{b,u}^{\text{Ac},v_l}. \qquad (18)$$
If all conditions in (12)-(18) hold, the total latency of each user is its access delay (wireless transmission delay) [14], [22], [30]. This parallel system thus avoids extra fronthaul/backhaul transmission and video transcoding delays in the network.
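The per-event pipelining conditions (12)-(18) can be checked mechanically; the sketch below encodes them for a single user and video, with hop delays passed in a dictionary (the key names are our assumptions):

```python
def event_feasible(event, d):
    """Pipelining delay constraints per scheduling event, following the
    text: event 2 -> (12), event 3 -> (13), event 4 -> (14)-(15),
    event 5 -> (16)-(17), event 6 -> (18). d maps hop names to delays."""
    if event == 2:   # transcode locally, then transmit over the access link
        return d["transcode"] <= d["access"]
    if event == 3:   # fetch from a neighboring RRS, then transmit
        return d["fronthaul"] <= d["access"]
    if event == 4:   # transcode at the neighbor, then fronthaul, then access
        return d["transcode_remote"] <= d["fronthaul"] <= d["access"]
    if event == 5:   # fronthaul the higher bitrate, then transcode locally
        return d["fronthaul"] <= d["transcode"] <= d["access"]
    if event == 6:   # fetch from the origin cloud server over the backhaul
        return d["backhaul"] <= d["access"]
    return True      # event 1: served directly from the local cache
```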
Since slices lease the physical resources of the InP, we propose a new pricing model. The unit prices of the access transmit power and subcarrier usage are defined per Watt and per Hz, respectively [19], [30]. Moreover, the unit prices of the backhaul and fronthaul rates are defined as $\alpha^{\text{BH}}$ and $\alpha^{\text{FH}}$ per bps [30]. For the storage resources, each slice pays $\mu_b^{\text{Cache}}$ per bit to utilize the memory of RRS $b$ [19]. The price of processing resource usage at RRS $b$ is defined as $\mu_b^{\text{Proc}}$ per CPU cycle.
On the other hand, each slice $m$ collects rewards from its own users for providing their access data rates [18], [19], [30]. We define $\psi_m$ as the reward of slice $m$ from each user $u \in \mathcal{U}_m$ per unit of received data rate (bit/s). We consider $\psi_m$ to be an increasing function of $R_m^{\min}$, i.e., for $m \ne m' \in \mathcal{M}$ with $R_m^{\min} \ge R_{m'}^{\min}$, we have $\psi_m \ge \psi_{m'}$. In this scheme, we aim to maximize the revenue of slices, defined as the reward minus the cost of each slice. The reward of slice $m$ is $\sum_{u \in \mathcal{U}_m} \sum_{b \in \mathcal{B}} r_{b,u}^{\text{Ac}} \psi_m$. To define the cost of each slice, we first formulate the cost of provisioning video $v_l$ to RRS $b$ incurred by one of the scheduling events. Furthermore, the cost of the access transmission resource usage for transferring the requested video file to user $u$ is denoted by $\$^{\text{Cost,Ac}}$; the revenue of each slice for serving video $v_l$ to a user then follows. Based on (2), each slice pays the cost of the storage, processing, backhaul, and fronthaul resource usage for each video provisioned to each cell only once. Moreover, each slice $m$ pays the usage cost of each subcarrier only once per cell, even when the subcarrier is shared among users in $\mathcal{U}_m$ in that cell via the MC-NOMA technology. Accordingly, the revenue of slice $m$ in Phase 2, denoted by $\$_m^{\text{slice}}$, is its total reward minus its total cost. In Phase 2, with the objective of maximizing the total delivery revenue of slices, denoted by $\$^{\text{tot}} = \sum_{m \in \mathcal{M}} \$_m^{\text{slice}}$, under the QoS requirements of users, we jointly optimize the user association, access transmit power and subcarrier allocation, fronthaul and backhaul rate adaptation, processing resource allocation, and request scheduling to achieve an efficient delivery performance. The delivery optimization problem (21) is formulated subject to (1)-(4), (6), and (7)-(18), where (21c) represents that in each cell $b$, each subcarrier $n$ can be assigned to at most $\Psi_b$ users [17].
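The reward-minus-cost structure of the slice revenue can be sketched as follows (a simplified scalar model with illustrative inputs; the paper's per-event cost terms are aggregated here into two lists):

```python
def slice_revenue(access_rates, psi, provisioning_costs, access_costs):
    """Revenue of one slice: reward psi (price per bit/s) times the sum of
    the access data rates delivered to its users, minus the one-off
    storage/processing/backhaul/fronthaul provisioning costs and the
    access-transmission costs."""
    reward = psi * sum(access_rates)
    cost = sum(provisioning_costs) + sum(access_costs)
    return reward - cost
```

Raising the delivered rates increases the reward but, through the resources needed to provide them, also the cost; this is the trade-off the delivery problem balances.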

C. Phase 1
In this phase, we aim to design a DACPS by utilizing the delivery opportunities of Phase 2. Note that the users' requests and the CSI of Phase 2 are not available in this phase. To utilize the delivery model in the DACPS design, we need stochastic information about video popularity and wireless channel conditions. Similar to [3], [22], [32], [34], we assume that the VPD changes slowly compared to the instantaneous requests of users and remains fixed for the entire operational time of the network. This parameter can be estimated by the operators by collecting prior sets of user requests [3], [32], [34]. Accordingly, we assume that the VPD is available at the scheduler in Phase 1 and remains valid throughout Phase 2. Similarly, the CDI can be obtained by averaging over the various CSI realizations in the time slots of a prior Phase 2 [3]. Assume that the VPD follows the Zipf distribution with Zipf parameter $\lambda$ and is the same for all users in the network [3], [22], [27]. Therefore, the popularity of requesting video $v_l$ with rank $\Lambda^{v_l}$ is given by
$$\Delta^{v_l} = \frac{(\Lambda^{v_l})^{-\lambda}}{\sum_{v'_{l'} \in \mathcal{V}} (\Lambda^{v'_{l'}})^{-\lambda}}.$$
To design a DACPS that covers the whole of Phase 2, we propose an average-based joint cache placement and ergodic resource allocation based on the VPD and CDI. Our DACPS remains valid until the VPD and/or the CDI changes [3].
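The Zipf-distributed VPD can be generated as follows (the function name is ours; rank 1 is the most popular item):

```python
def zipf_popularity(num_items, lam):
    """Zipf VPD over ranked items: the popularity of the item with rank k
    is k**(-lam), normalized so that the probabilities sum to one."""
    weights = [k ** -lam for k in range(1, num_items + 1)]
    total = sum(weights)
    return [w / total for w in weights]
```

Larger Zipf parameters concentrate the probability mass on the top-ranked videos, which is precisely what makes caching a small set of popular files effective.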
In Phase 1, we aim to formulate the stochastic problem of maximizing the total revenue of slices based on the available VPD and CDI so as to achieve an efficient delivery performance in Phase 2. In other words, since the requests of users and the CSI are not available in Phase 1, (21) should be reformulated based on the VPD and CDI. In this regard, the average, or ergodic, data rate of the wireless access link between user $u$ and RRS $b$ is $\bar{r}_{b,u}^{\text{Ac}} = \mathbb{E}_h\{r_{b,u}^{\text{Ac}}\}$, where $\mathbb{E}_h\{\cdot\}$ is the expectation operator over the channel power gains [3]. This expectation is necessary even in slow-fading scenarios, since the whole of Phase 2 has a much longer duration than Phase 1. In this phase, in contrast to the instantaneous access delays formulated in Subsection II-B, the average access delay for receiving video $v_l$ from RRS $b$ at user $u$, given by $\bar{D}_{b,u}^{\text{Ac},v_l} = s_{v_l}/\bar{r}_{b,u}^{\text{Ac}}$, is considered. Moreover, in order to apply SIC, the average SIC constraint should be satisfied (see footnote 7). Furthermore, since the requests of users ($\delta_u^{v_l} \in \{0,1\}$) are unknown in Phase 1, constraint (2) would have to be reformulated based on the VPD ($\Delta^{v_l} \in [0,1]$), which is not achievable. To tackle this challenge and cover all possible situations in Phase 2, we assume that every video should be servable by each RRS through one of the events described in Fig. 3 if at least one user is associated with that RRS. Although this constraint consumes more physical resources, it covers all possible situations; in other words, it guarantees that the various sets of arrival requests in different time slots of Phase 2 can be served [7]. With this assumption and the available CDI, constraints (12), (13), (15), (17), and (18) of Phase 2 are reformulated accordingly. Let $C_b^{\max}$ be the maximum storage capacity of RRS $b$. In contrast to Phase 2, in this phase we add a cache size constraint for each RRS:
$$\sum_{v_l \in \mathcal{V}} \rho_b^{v_l} s_{v_l} \le C_b^{\max}, \quad \forall b \in \mathcal{B}.$$

Footnote 7: In contrast to Phase 2, where the SIC of MC-NOMA is applied based on the CSI, in Phase 1 we apply SIC based on the available CDI.
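The ergodic access rate can be estimated by Monte Carlo averaging over channel draws; the sketch below does this for a single interference-free subcarrier with Rayleigh fading (exponentially distributed channel power gain), with all numeric parameters being illustrative assumptions:

```python
import math
import random

def ergodic_rate(p, Ws=180e3, noise=1.8e-12, mean_gain=1e-7,
                 draws=20000, seed=0):
    """Monte Carlo estimate of E_h{ Ws * log2(1 + p*h/noise) } for one
    interference-free subcarrier, with h an exponential (Rayleigh-fading)
    channel power gain of mean mean_gain."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        h = rng.expovariate(1.0 / mean_gain)  # exponential power gain
        total += Ws * math.log2(1.0 + p * h / noise)
    return total / draws
```

By Jensen's inequality, the ergodic rate lies below the rate evaluated at the mean gain, which is why Phase 1 must plan with the expectation rather than with a single nominal channel.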
In Phase 1, our pricing model presented in Subsection II-B is also averaged based on both the VPD and the CDI. In this line, the average provisioning cost of a video file at RRS $b$ can be formulated accordingly. Moreover, the average reward of each slice $m$ for providing a video file to user $u \in \mathcal{U}_m$, based on the considered average data rate $\bar{r}_{b,u}^{\text{Ac}}$, is $\sum_{b \in \mathcal{B}} \bar{r}_{b,u}^{\text{Ac}} \psi_m$. To estimate the resulting average revenue of each slice in (26), we propose two schemes: low diversity (LD) and high diversity (HD). In the LD scheme, each slice assumes that all of its users make the same requests according to the VPD; hence, this scheme considers the best requesting situation, which yields the maximum achievable revenue of slices. Conversely, in the HD scheme, each slice assumes that all of its users make different requests, i.e., the worst requesting situation is considered. To handle the different request-diversity situations in the CPS design, we propose two baseline diversity CPSs, namely LD and HD. In the LD strategy, we consider the upper-bound value of the average revenue of each slice, which is compatible with the LD scheme. For the HD strategy, we consider the lower-bound average revenue of each slice, which is compatible with the HD scheme. In this phase, we design DACPSs in the LD and HD schemes to maximize the total estimated average revenue of slices: the cache placement optimization problem in the LD scheme is (29), subject to (7)-(10), (14), and (16), and the cache placement optimization problem in the HD scheme is (30). The mixed-integer nonlinear programming (MINLP) problems (21), (29), and (30) are NP-hard, as mathematically proved in [37] for the transmit power and subcarrier allocation problem in MC-NOMA with the objective of maximizing the downlink data rate of users.
In addition, [38] notes that data rate maximization problems with interference are NP-hard for OFDMA-based wireless networks and that no existing method can find the globally optimal joint transmit power and subcarrier allocation policy. Since transmit power and subcarrier allocation in OFDMA is a special case of MC-NOMA [37], the result carries over. An exhaustive search would require examining every combination of the discretized variables, where $S^{\text{Power}}$, $S^{\text{FH}}$, $S^{\text{BH}}$, and $S^{\text{Process}}$ are the numbers of values that each variable in $\mathbf{p}$, $\mathbf{r}^{\text{FH}}$, $\mathbf{r}^{\text{BH}}$, and $\boldsymbol{\phi}$ can take, respectively. Accordingly, it is impractical to find the globally optimal solution for such large-scale NP-hard optimization problems, since the number of optimization variables and constraints grows exponentially [22], [24]-[26], [37].
The main structures of optimization problems (21) and (29) are the same for a fixed placement variable $\boldsymbol{\rho}$. Therefore, we provide a locally optimal resource allocation algorithm that can be adopted for both problems (21) and (29). However, (30) and (29) differ in their objective functions; accordingly, we modify the solution algorithm proposed for (29) so that it can be applied to (30).

A. DACPSs and Delivery Algorithm
Here, we propose an efficient solution algorithm for (29) by utilizing the alternate optimization method [4], [36], [38]. This algorithm consists of two main steps: 1) finding $\mathbf{p}$, $\boldsymbol{\phi}$, $\mathbf{r}^{\text{FH}}$, and $\mathbf{r}^{\text{BH}}$ jointly; and 2) finding $\boldsymbol{\Upsilon}$, $\boldsymbol{\rho}$, $\boldsymbol{\tau}$, and $\boldsymbol{\theta}$ jointly. We repeat these steps until the improvement of the objective value falls below a small positive value $\varepsilon_1$, where $\kappa_1$ is the main iteration index, or until the number of main iterations exceeds a pre-defined threshold $\Gamma_1$. The proposed approach is summarized in Algorithm 1.
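The two-step alternation can be expressed as a generic skeleton, with the two block updates passed in as callables (the toy objective in the usage below is ours; in the paper's setting, step one would update (p, φ, r^FH, r^BH) and step two (Υ, ρ, τ, θ)):

```python
def alternate_maximize(f, step_block1, step_block2, x0,
                       eps=1e-4, max_iters=50):
    """Two-block alternate optimization: repeatedly optimize one block of
    variables with the other block fixed, and stop once the objective
    improves by no more than eps or the iteration budget is exhausted."""
    x = x0
    prev = f(x)
    for _ in range(max_iters):
        x = step_block1(x)  # update the first block, second block fixed
        x = step_block2(x)  # update the second block, first block fixed
        cur = f(x)
        if abs(cur - prev) <= eps:
            break
        prev = cur
    return x
```

For instance, with f(x, y) = -(x-1)^2 - (y-2)^2 and exact per-block updates, the scheme reaches the maximizer (1, 2) in a single pass.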

1) Step 1: In the first step, we obtain $\mathbf{p}$, $\boldsymbol{\phi}$, $\mathbf{r}^{\text{FH}}$, and $\mathbf{r}^{\text{BH}}$ jointly by solving subproblem (31) for fixed $\boldsymbol{\Upsilon}$, $\boldsymbol{\rho}$, $\boldsymbol{\tau}$, and $\boldsymbol{\theta}$. Problem (31) is still an MINLP, which is NP-hard, due to the non-concavity of objective function (31a) and the non-convexity of constraints (14), (16), (21d), (22), (24), and (29b). In order to deal with these challenges, we first relax $\phi_b^{v_h,v_l}$ in (21d) to be a non-negative real value [10], [13], which is an acceptable approach in this context since the maximum number of CPU cycles in each processor is on the order of $10^9$ [11]-[13]. For the non-convex constraints, we apply transformation methods to tackle their non-convexity (please see Appendix A). Then, to tackle the non-concavity of $\bar{r}_{b,u}^{\text{Ac}}$ in (31a) and (29b), and its non-convexity in constraints (34e)-(34i) of the transformed problem (presented in Appendix A), we use the successive convex approximation (SCA) approach based on the difference-of-two-concave-functions (D.C.) approximation method [32], [38], [39]. In this regard, we first initialize the approximation parameters and then iteratively solve the approximated problem; the procedure is summarized in Algorithm 2 (the proposed SCA algorithm with the D.C. approximation method).
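To illustrate the D.C.-based SCA idea on a one-dimensional toy problem (not the paper's problem (34); all constants are illustrative): maximize log2(σ + a·p) − log2(σ + b·p) − c·p over 0 ≤ p ≤ P, a difference of two concave terms plus a linear cost, the same structure the interference terms induce. Each SCA iteration replaces the subtracted concave term with its first-order (linear) upper bound at the current point and maximizes the resulting concave surrogate in closed form:

```python
import math

def sca_dc_maximize(a, b, c, sigma, P, iters=30, p0=0.0):
    """SCA with the D.C. approximation for
        maximize_{0 <= p <= P}  log2(sigma + a*p) - log2(sigma + b*p) - c*p.
    The subtracted term log2(sigma + b*p) is linearized at the current
    iterate, and the surrogate's stationary point is taken (projected
    onto [0, P])."""
    ln2 = math.log(2.0)
    p = p0
    for _ in range(iters):
        slope = b / ((sigma + b * p) * ln2)       # gradient of subtracted term
        p_new = (a / (ln2 * (c + slope)) - sigma) / a
        p = min(max(p_new, 0.0), P)               # projection onto [0, P]
    return p
```

Each iterate improves the true objective because the surrogate lower-bounds it and is tight at the current point, mirroring the convergence argument of Proposition 3.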
Proposition 1: The INLP problem (32) can be equivalently transformed into an IDCP form, which is presented in Appendix C. To solve (21) for Phase 2, we can also apply Algorithm 1, since when ρ is fixed, (21) and (29) have a similar structure. Due to space limitations, the solution for the problem of Phase 2 is not included here. To solve (30) in the HD scheme, we again apply a method similar to Algorithm 1 (see Appendix D).

B. Convergence of The Proposed Algorithms
Here, we discuss the convergence of our proposed Algorithm 1 for solving (29). This discussion is presented in the format of the following two propositions.

Proposition 2:
The objective function (29a) is upper-bounded by the total average reward of slices, obtained as Σ m∈M Σ u∈U m Σ b∈B r Ac b,u ψ m, which is a nonnegative finite term due to the limited access bandwidth and constraint (7). Therefore, for a feasible problem (29), the proposed alternate Algorithm 1 converges to a locally optimal solution.
Proof. Please see Appendix E.
Proposition 3: The SCA approach with the D.C. approximation method generates a sequence of improved feasible solutions. Therefore, the proposed algorithm for solving (31) converges to a locally optimal solution when the number of SCA iterations is large enough.
Proof. Please see Appendix F.
Since all proposed algorithms have similar structures, the convergence of the other algorithms can be proved in the same way as that of Algorithm 1 for solving (29).

C. Computational Complexity of the Proposed Algorithm
Here, we aim to obtain the computational complexity of the proposed solution algorithms for problems (21), (29), and (30). Since the proposed alternate algorithms for solving (21), (29), and (30) have basically the same structure, we only present the details of obtaining the computational complexity of solving (29) using Algorithm 1. After that, the computational complexity of solving (21) and (30) is investigated.
In the first step of Algorithm 1, we first use the transformation method presented in Appendix A to solve (31). We then solve the resulting equivalent problem (34) using the iterative SCA approach based on the D.C. approximation method. In each iteration κ 2 of SCA, the approximated disciplined convex programming (DCP) problem of (34) is solved by CVX, which employs geometric programming with the interior-point method (IPM) [39], [40]. Therefore, the computational complexity of solving the approximated problem of (34) is on the order of Ω 1 LD = log(T 1 LD /(t 0 ε))/log(ξ), where T 1 LD = 3B + B 2 + U + M + BU 2 N + BV LU (1 + B + L) + B 2 V L 2 (2 + 2U ) is the total number of constraints in the approximated problem of (34), t 0 is the initial point for approximating the accuracy of the IPM, 0 < ε ≪ 1 is the stopping criterion for the IPM, and ξ is used for updating the accuracy of the IPM [39], [40]. Note that Ω 1 LD is only for one iteration of the SCA approach; the total complexity of solving (34) by the SCA method mainly depends on the number of optimization variables, the number of constraints, and the accuracy of the algorithm. In the second step of Algorithm 1, we find (Υ, ρ, θ, τ ) by solving (32). In fact, we transform (32) into an equivalent IDCP problem (44) using the epigraph technique.
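As a numerical illustration, the constraint count T 1 LD and an interior-point iteration bound can be evaluated for the simulation parameters used in Section IV (B = 5, U = 30, M = 2, N = 64, V = 10, L = 4). The log(T/(t 0 ε))/log(ξ) form is the standard IPM bound from the cited analyses, reconstructed here as an assumption rather than quoted from the paper.

```python
import math

def T1_LD(B, U, M, N, V, L):
    # Total number of constraints in the approximated problem of (34).
    return (3 * B + B ** 2 + U + M + B * U ** 2 * N
            + B * V * L * U * (1 + B + L)
            + B ** 2 * V * L ** 2 * (2 + 2 * U))

def ipm_iterations(T, t0=1.0, eps=1e-3, xi=10.0):
    # Standard IPM iteration bound: log(T/(t0*eps)) / log(xi);
    # t0, eps, xi are illustrative accuracy parameters.
    return math.log(T / (t0 * eps)) / math.log(xi)
```

For these parameters T 1 LD is about 6 × 10 5, yet the iteration-count factor grows only logarithmically in the constraint count, which is what makes the per-SCA-iteration IPM solve tractable.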
The resulting IDCP problem (44) is also solved by utilizing CVX with the MOSEK solver. The computational complexity of solving the cache placement optimization problem (30) can be obtained using the same method as for (29). In this line, the complexity of finding joint p, φ, r FH, and r BH at each iteration of the SCA approach follows the same IPM order with the corresponding number of constraints.

D. Designing a Low-Complexity Resource Allocation Framework
It is very important that the central scheduler be fast enough to re-adapt the delivery policy in each time slot of Phase 2 based on the instantaneous user requests and CSI, especially in realistic ultra-dense 5G wireless networks with a large number of unique videos.
For this reason, we propose here another RAF that has a lower computational complexity in each time slot of Phase 2 than our proposed RAF in Fig. 2. The main part of the complexity of the framework in Fig. 2 is caused by reallocating the radio resources and re-associating users to RRSs at each time slot of Phase 2 to re-adapt the access transmission strategy based on the arriving requests and CSI. However, environments with higher path loss heavily limit the flexibility of the user association process and reduce the impact of wireless small-scale fading on the SINR of users. At the same time, we aim to retain the benefits of the radio resource allocation and user association policies in the MC-NOMA system to improve user data rates and, correspondingly, the revenue of slices. Thus, in this novel framework, we adopt the radio resource allocation (i.e., transmit power and subcarrier allocation) and user association policies obtained in Phase 1 for all time slots of Phase 2. Hence, the system only reallocates the processing, fronthaul, and backhaul resources as well as the request scheduling at the beginning of each time slot of Phase 2 based on the arriving user requests. Accordingly, in this framework, the LD and HD cache placement problems are exactly the same as (29) and (30), whereas the delivery optimization problem at each time slot of Phase 2 is formulated as (33). The optimization problem (33) is an MINLP which can be efficiently solved by utilizing an alternate algorithm in which (33) is divided into two subproblems: 1) finding joint φ, r BH, and r FH; 2) finding Υ. The problem of finding joint φ, r BH, and r FH is a linear program (LP), and thus the globally optimal solution can be found by utilizing the CVX software or the Lagrange dual method. On the other hand, the IDCP problem of finding Υ is solved by using the MOSEK solver.
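To make the LP subproblem concrete, a toy single-capacity instance of allocating a shared backhaul rate across user requests to maximize slice revenue can even be solved greedily by price ordering, since an LP with one coupling constraint and box bounds admits this closed-form solution. The prices, caps, and function name below are illustrative assumptions, not the paper's formulation.

```python
def allocate_backhaul(capacity, requests):
    """requests: list of (price_per_mbps, max_rate_mbps) tuples.
    Returns (allocation aligned with the input order, total revenue)."""
    order = sorted(range(len(requests)),
                   key=lambda i: requests[i][0], reverse=True)
    alloc = [0.0] * len(requests)
    revenue, remaining = 0.0, capacity
    for i in order:                      # serve highest-paying requests first
        price, cap = requests[i]
        r = min(cap, remaining)          # respect the per-request rate cap
        alloc[i] = r
        revenue += price * r
        remaining -= r
        if remaining <= 0:
            break
    return alloc, revenue
```

The full per-slot LP in (33) couples backhaul, fronthaul, and processing constraints and is therefore solved by a general LP solver, but the toy case shows why this step is cheap compared with re-solving the radio resource allocation.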
The computational complexity of solving (33) can be obtained in the same way as that of solving (21). In this way, the complexity of finding φ, r BH, and r FH is on the order of B + L + 4B 2 V L 2. Moreover, the complexity of finding Υ is obtained similarly.

IV. SIMULATION RESULTS
In this section, we present simulation results that demonstrate the performance of our proposed CPSs via MATLAB Monte Carlo simulations over 500 network realizations [19]. The network topology and user placement (with distances in km) are shown in Fig. 4 [5], [36], [42]. The small-scale fading of the wireless channel is modeled as independent and identically distributed Rayleigh fading with unit variance [36]. The CDI is averaged over 1000 CSI samples for a fixed channel power loss, since the locations of users and RRSs are fixed in our numerical examples. The PSD of the AWGN is set to N 0 = −174 dBm/Hz with a noise figure of 9 dB at each user [23]. For the MC-NOMA technology, we set Ψ b = 2, ∀b ∈ B [17], [35], [36]. For the HP-RRS, the maximum transmit power P max (in dBm) is set as in [38].
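The stated noise parameters translate into a per-subcarrier noise power as follows; the 180 kHz subcarrier bandwidth is an illustrative assumption, not a value taken from the paper.

```python
import math

def noise_power_dbm(n0_dbm_hz=-174.0, noise_figure_db=9.0, bandwidth_hz=180e3):
    """Effective noise power (dBm) over a band: PSD + noise figure + 10*log10(BW)."""
    return n0_dbm_hz + noise_figure_db + 10 * math.log10(bandwidth_hz)
```

For a 180 kHz subcarrier this gives roughly −112.4 dBm of effective noise power at each user, the quantity that enters the SINR denominators.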
We assume that there exist V = 10 unique videos, each having L = 4 bitrate variants. In our simulations, we set the relative bitrates of the four variants to 0.45, 0.55, 0.67, and 0.82 of the original video bitrate of 2 Mbps (HD quality) [6], [7]. Besides, all video variants have an equal length of 10 minutes [7]. The skew parameter of the Zipf distribution is set to λ = 0.8 [6], [7].
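The Zipf request model can be sketched as follows: the request probability of the i-th most popular item decays as i^(−λ). Whether the distribution ranges over the V videos or all V·L = 40 variants is an assumption here; the helper is generic in the number of items.

```python
def zipf_popularity(num_items, skew):
    """Zipf request probabilities for items ranked 1..num_items."""
    weights = [i ** -skew for i in range(1, num_items + 1)]
    total = sum(weights)
    return [w / total for w in weights]
```

With λ = 0.8 over 40 items, the top-ranked variant is requested roughly 40^0.8 ≈ 19 times more often than the least popular one, which is why popularity-aware placement matters.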
Similar to [6], [7], we assume that the processing workload η v h ,v l is proportional to s v l . For the proposed virtualization model, we assume that there exist M = 2 slices in the infrastructure with R min 1 = 1 Mbps and R min 2 = 2 Mbps. Each slice also has U/2 = 15 users, such that each user subscribes to each slice with a probability of 1/M = 50% [18], [19]. In our proposed pricing scheme, we take µ Cache units/Mbps and ψ 2 = 9 units/Mbps [18], [19], [30].
For processing capacities, we set χ max 1 = 50 GHz (the maximum number of CPU cycles per second in the HP-RRS is 50 × 10 9 ) and χ max b = 25 GHz, ∀b ∈ B\{1}. Moreover, the required number of CPU cycles per byte at each RRS b is set to N v h ,v l Cycle = 5900 [11], [43]. The storage capacities C max are set according to Table III. To investigate the benefits of each technology in the system, we compare the CVCT system with the NC, NoCoop, and CCNT schemes. We also compare the performance of our proposed DACPSs in each scheme with two conventional baseline popular/bitrate CPSs: 1) Most Popular Video (MPV), where each RRS caches the most popular videos until its storage is full [23], [32]; 2) High-Bitrate Video (HBV), where each RRS caches high-bitrate variants of video files randomly until its storage is full.
As noted previously, obtaining the globally optimal solution of each cache placement and delivery optimization problem via the exhaustive search method would take an unrealistically long time for U = 30, N = 64, B = 5, and V L = 40 [32]. Thus, we limit our simulations to evaluating the performance of our proposed solution algorithms. We also investigate the delivery performance gain achieved by each of our proposed RAFs in terms of the total delivery revenue of slices and the computational complexity of the delivery algorithm. Our proposed DACPSs exploit the flexible delivery opportunities; therefore, backhaul resource usage is significantly reduced, which decreases the total provisioning cost of slices (see Fig. 6(a)) and correspondingly improves the total revenue shown in Fig. 6(b).

A. Convergence of the Delivery Algorithm
From Fig. 6(b), it is observed that the NC scheme provides the lower bound on the total revenue of slices. In addition, the CVCT system with the LD strategy reduces the total provisioning cost of slices by nearly 70% compared to the NC scheme, which correspondingly improves the total revenue of slices around 17-fold. On the other hand, the cooperation between RRSs improves the total revenue of slices by around 69.9% compared to the NoCoop scheme.
In these schemes, when the storage capacities are low, the HBV and MPV strategies perform worse than our proposed LD and HD strategies, since they do not consider the flexible delivery opportunities. For instance, when the cache size percentage is 10%, the LD strategy improves the performance of the CVCT system by nearly 24.3% and 40.8% compared to HBV and MPV, respectively.
2) Impact of the processing capacity of RRSs: Fig. 7 shows the impact of the processing capacity limitation at LP-RRSs on the performance of CPSs in different schemes. Obviously, larger processing capacities provide more video transcoding opportunities, which alleviates the backhaul resource usage. It is noted that two transcoding types exist in the system: self-transcoding and cooperative transcoding. Self-transcoding in the parallel transmission and transcoding system mainly depends on the wireless channel, storage, and processing capacities. Hence, at larger processing capacities, increasing them alone cannot significantly improve the self-transcoding opportunities, since higher-bitrate variants would need to be stored and the wireless channel capacities are limited. Cooperative transcoding, in turn, mainly depends on the wireless channel, storage, processing, and fronthaul link capacities; cooperative transcoding operations cannot be successfully performed if the fronthaul capacities between RRSs are insufficient. Accordingly, based on the available storage and fronthaul capacities, it can be seen in Fig. 7 that the performance of all CPSs changes slowly once the processing capacity of LP-RRSs exceeds 35 GHz and 15 GHz for the CVCT and NoCoop schemes, respectively.
In Fig. 7(b), it is shown that the LD strategy in the CCNT scheme performs 10 times better than NC. Moreover, the cooperative transcoding technology alone improves the system performance by close to 56.4% in the cooperative schemes when the processing capacity of LP-RRSs is 25 GHz.
As shown in Fig. 7, the performance of HBV is the most affected by the amount of processing capacity, since HBV randomly selects the highest-bitrate variants in order to increase the self- and cooperative-transcoding opportunities. Interestingly, when the relative processing capacities increase, the performance gap between the HBV and LD strategies decreases from 37.9% to 21.8%, as shown in Fig. 7(b). Therefore, the HBV strategy can be a good candidate when processing capacities are large enough. Besides, the performance of MPV changes slowly with variations in processing capacity, since it does not consider the transcoding of video files, specifically in the event that low-bitrate variants turn out to be more popular. MPV also stores the same popular videos at every RRS, which significantly degrades the cooperative transcoding opportunity in the system.
3) Impact of the fronthaul capacity of RRSs: Fig. 8 shows the impact of the fronthaul capacity limitation between RRSs on the performance of CPSs in the CVCT and CCNT schemes.
Generally, larger fronthaul capacities increase the cooperative communication capability. In this way, the system's performance is significantly improved (see Fig. 8(b)).
According to Fig. 8(b), for the LD strategy, the CVCT technology with R max b ,b = 70 Mbps improves the total revenue of slices by nearly 75.3% compared to the NoCoop scheme, where R max b ,b = 0. In this regard, the system's performance is improved by nearly 112.9% in the CCNT scheme, which is caused only by the cooperative caching technology. Besides, the caching capability without any cooperation yields a nearly 459.8% improvement in the total revenue of slices. In addition, the joint caching and transcoding capability at RRSs without any cooperation improves the system performance by nearly 10-fold compared to the NC scheme. From this result, it can be concluded that the self-transcoding capability alone improves the total revenue of slices by up to 5-fold.
C. Effect of the Zipf Parameter

Fig. 9 shows the effect of the Zipf parameter λ on the performance of different CPSs. As shown in Fig. 9(a), for λ = 1.5, U = 30, and V L = 40, there are nearly 10 unique requests. For this setting, Fig. 9(b) shows that nearly 57.9% and 36.22% of these requests are for the 20% and 10% most popular videos, respectively. These results are averaged over 10000 sets of requests. Accordingly, Figs. 9(a) and 9(b) show that when λ increases, the diversity of requests in the system decreases. In this line, the request percentage of the top-ranked videos increases exponentially. Consequently, the backhaul, fronthaul, and processing resource usages are reduced, which decreases the total provisioning cost of slices (shown in Fig. 9(c)). As a result, the total revenue of slices is improved, as shown in Fig. 9(d).
Based on Figs. 9(c) and 9(d), it can be seen that the HD strategy is more suitable than LD when λ tends to zero. Besides, MPV is the most affected by the Zipf parameter, since it is a purely popularity-based baseline strategy. Specifically, when λ varies from 0 to 2, the performance of MPV improves by about 152.4%. For λ = 0, i.e., when users request videos uniformly, HBV is more suitable than MPV. This is because, in this situation, the VPD factor is not dominant at all. In addition, the performance gaps in terms of the total revenue of slices between the LD and HBV strategies and between the LD and MPV strategies are nearly 18.4% and 57.8%, respectively. Interestingly, these results show that MPV is not suitable for high-diversity situations, whereas HBV can be a good solution given its very low-complexity structure. On the other hand, when λ is large enough, i.e., when only a few videos are frequently requested by users, the performance gaps between our proposed LD and HD strategies and MPV decrease significantly. In fact, MPV is close to an optimal strategy when λ is very large. It is noted that there still exists a performance gap (nearly 7.38% in terms of the total revenue of slices) between HBV and the other strategies when λ is very large, since HBV does not consider the VPD.

D. Impact of the Number of Subcarriers

As shown in Fig. 10(a), increasing the total number of subcarriers N improves the average access data rate of users, which improves the total reward of slices (see Fig. 10(b)). However, increasing the data rate of users requires more backhaul, fronthaul, and processing resources due to the parallel delay constraints (12), (13), (15), (17), and (18). In this regard, improving the users' access data rate leads to an increase in the provisioning cost of slices, which is shown in Fig. 10(c). Interestingly, by increasing N, the expensive wireless bandwidth usage is significantly reduced, which corresponds to a decrease in the total bandwidth cost (see Fig. 10(d)).
This is because increasing N improves the flexibility of bandwidth allocation in multicarrier systems. Hence, more opportunities are provided for slices to assign the available subcarriers to the users and satisfy their QoS requirements. The bandwidth cost reduction and slice reward increments combined have a greater effect on the revenue of slices than the increase in provisioning costs. Hence, increasing N improves the total revenue of slices. Moreover, MC-NOMA is expected to achieve higher spectral efficiency than OMA by performing SIC at the receivers, as shown in Fig. 10(a). In this regard, the total reward of slices is improved in the system (see Fig. 10(b)). Although MC-NOMA incurs more provisioning costs than OMA because of the increased access data rates in parallel systems (see Fig. 10(c)), it reduces the total bandwidth cost of slices much more than OMA. This is because each slice can use the same subcarriers for its own users in each cell, while the price of each subcarrier's bandwidth is paid only once per cell by that slice (see (20)). This opportunity is only instantiated by MC-NOMA, since it provides the reuse of orthogonal subcarriers at each RRS. As shown in Fig. 10(a), for N = 12, the average data rate of each user in MC-NOMA is improved by nearly 33.86% compared to OMA. This is because when OMA is applied to a system with a critically low number of subcarriers, the system performance is degraded, owing to the flexibility of bandwidth reuse at each RRS being completely eliminated. Finally, MC-NOMA outperforms OMA by nearly 220.2% when N = 12; when N = 64, this gain is reduced to 21.91%.

Fig. 11 compares the performance of our proposed LC-RAF to that of the RAF in Fig. 2 in terms of the total revenue of slices and the computational complexity order of the delivery algorithm. In Fig. 11(a), we assume that the path loss is modeled as 128.1 + 10α log 10 (d b,u ) + z b,u in dB, where the path loss exponent α varies from 2 to 4.
Obviously, when α increases, the data rate of users decreases in the system. Accordingly, the total reward of slices degrades, which reduces the total revenue of slices. Interestingly, as seen in Fig. 11(a), the performance gap between the main RAF and the LC-RAF shrinks when the path loss exponent α and the number of unique videos V are large enough, respectively. Moreover, as shown in Fig. 11(b), the complexity of the delivery algorithm in the LC-RAF does not depend on the number of users, which means this framework can be a good choice for dense environments.

V. SUMMARY AND CONCLUDING REMARKS
In this paper, we investigated the idea of developing DACPSs for a parallel CVCT system in an MC-NOMA-assisted HV-MEC with limited backhaul and fronthaul capacities. For this network, we first proposed a RAF based on the network operational timeline. In Phase 1, we designed two diversity-based schemes, where in each scheme we maximized the estimated average revenue of slices subject to the minimum required access data rate of each user owned by each slice and some fundamental system constraints. In order to find an efficient solution for these large-scale NP-hard problems, we proposed a solution algorithm based on the alternating optimization method. To reduce the computational complexity of the delivery algorithm, we proposed an LC-RAF where the radio resource allocation policy obtained in Phase 1 is adopted for all time slots of Phase 2. Numerical assessments showed that our proposed DACPSs improve the average system delivery cost by between 25% and 40% compared to the conventional baseline bitrate/popular-video CPSs. It was also observed that the integration of cooperative caching and cooperative transcoding capabilities can improve the system revenue by up to 17-fold, which is notable considering that each technology alone improves the system revenue by only 6- to 10-fold. Finally, we showed that our proposed LC-RAF can be a good choice for dense environments with high levels of path loss, where the distance of users to RRSs is more pivotal for the user association decision than the placement of videos in order to satisfy the QoS of users.

APPENDIX B
THE PROPOSED SCA ALGORITHM FOR SOLVING (34)

To tackle the non-concavity of (34a) and (29b) and the non-convexity of constraints (34e)-(34i), the access data rate function r Ac b,u should be transformed into concave and convex forms, respectively. To approximate the access data rate function r Ac b,u in (34a) and (29b) to a concave form at each iteration κ 2 of the SCA algorithm, we first define r Ac b,u as the D.C. function in (35), i.e., the difference of f n b,u and g n b,u , where f n b,u and g n b,u are concave functions with respect to p. Then, we approximate g n b,u (p κ 2 ) at each iteration κ 2 by its first-order Taylor series approximation around p κ 2 −1 as g n b,u (p κ 2 ) ≈ g n b,u (p κ 2 −1 ) + ∇g n b,u (p κ 2 −1 ) T (p κ 2 − p κ 2 −1 ) [38], [39], where ∇g n b,u (p κ 2 −1 ) is a vector of length U B whose entries are obtained accordingly, ∀i ≠ b, u ∈ U \{u}.
Therefore, the concave approximated function of r n b,u at each iteration κ 2 is obtained by replacing g n b,u with this linearization. Besides, in order to approximate r Ac b,u in (34e)-(34i) to a convex form at each iteration κ 2 of the SCA algorithm, we again start from the D.C. form of r Ac b,u in (35). Then, we approximate f n b,u (p κ 2 ) at each iteration κ 2 by its first-order Taylor series approximation around p κ 2 −1 as f n b,u (p κ 2 ) ≈ f n b,u (p κ 2 −1 ) + ∇f n b,u (p κ 2 −1 ) T (p κ 2 − p κ 2 −1 ), where ∇f n b,u (p κ 2 −1 ) is a vector of length U B whose entries are obtained accordingly, ∀i ≠ b, u ∈ U \{u}.
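The key property behind these approximations is that the first-order Taylor expansion of a concave function lies above it, so the surrogate obtained by linearizing g globally lower-bounds the true D.C. objective f − g and is tight at the expansion point. A scalar numerical check, using toy concave stand-ins rather than the paper's rate expressions:

```python
import math

def dc_surrogate_gap(p, p_prev):
    """Gap between the true D.C. objective f - g and its SCA surrogate
    f - g_lin, where g_lin is the tangent of g at p_prev."""
    f = lambda x: math.log(1 + 2 * x)
    g = lambda x: math.log(1 + x)
    g_lin = g(p_prev) + (1 / (1 + p_prev)) * (p - p_prev)  # tangent of concave g
    true_obj = f(p) - g(p)
    surrogate = f(p) - g_lin
    return true_obj - surrogate   # = g_lin - g(p) >= 0; zero at p == p_prev
```

Because the surrogate never exceeds the true objective and matches it at the current iterate, maximizing the surrogate can only improve the true objective, which is the mechanism behind the improved feasible sequence in Proposition 3.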

APPENDIX C
EQUIVALENT TRANSFORMATION OF (32)

The resulting IDCP form of (32) is formulated as the minimization over (Υ, ρ, θ, τ , ϑ, ν) of Σ m∈M $ slice,Epi,2 m , subject to constraints including ν n b,m ≥ τ n b,u , ∀b ∈ B, n ∈ N , u ∈ U m , ν n b,m ∈ {0, 1}. (44k) The remaining constraints contain products of binary variables. To address this challenge, we utilize the following lemma to linearize each binary bilinear product.

Lemma 1: For any two binary variables x, y ∈ {0, 1}, the bilinear product xy can be equivalently replaced by a new binary variable z ∈ {0, 1} together with the linear constraints z ≤ x, z ≤ y, and z ≥ x + y − 1.

Proof. For any two binary variables x and y, the equality xy = min{x, y} is always satisfied. By utilizing the epigraph technique and introducing a new binary variable z ∈ {0, 1} such that z ≤ x, z ≤ y, and z ≥ x + y − 1, the integer nonlinear term xy can be replaced with the integer linear term z.
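Lemma 1 can be verified exhaustively: the sketch below (with a hypothetical helper name) enumerates all binary pairs and confirms that the three linear constraints pin z to exactly the product xy = min{x, y}.

```python
from itertools import product

def feasible_z(x, y):
    """All binary z satisfying the Lemma 1 constraints for given binary x, y."""
    return [z for z in (0, 1) if z <= x and z <= y and z >= x + y - 1]

# Exhaustive check over the four binary pairs: the feasible set is the
# singleton {x*y}, so the bilinear product is linearized without relaxation.
for x, y in product((0, 1), repeat=2):
    assert feasible_z(x, y) == [x * y]
```

This is why adding constraints of the form (44l)-(44q) turns each bilinear term into a linear one while preserving the feasible set exactly.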
Based on Lemma 1, by adding constraints (44l)-(44q), the binary bilinear products are replaced by auxiliary binary variables in (46), which turns (46) into a linear integer form. To cope with the nonlinearity of the delay functions in (14) and (16) and the average access delay functions in (24), we utilize the transformation method presented in Appendix A, first transforming (14) and (16) accordingly.

APPENDIX D
The first step of Algorithm 1, i.e., finding joint p, φ, r FH, and r BH, is similar to the first step of solving (29), except that in the second step, i.e., finding joint Υ, ρ, τ , and θ, the variable θ b,u is directly multiplied by all scheduling variables in Υ. Therefore, by using Lemma 1, (30) is transformed into an equivalent IDCP problem.