CMU-VP: Cooperative Multicast and Unicast With Viewport Prediction for VR Video Streaming in 5G H-CRAN

Virtual reality (VR) is commonly regarded as one of the killer applications of 5G. Transmission efficiency and quality of experience (QoE) are the primary concerns for VR video streaming in 5G networks. Several multicast approaches have been proposed to address these issues, but they disregard the variance of personal viewports. In this paper, we explore a novel scheme combining multicast and unicast sessions in heterogeneous cloud-radio access networks (H-CRAN), in which a basic version of the video is transmitted to all users through the g-NB in a multicast session, and tiles of an enhanced version are transmitted to each viewer in a unicast session through its associated remote radio head (RRH). To ensure real-time content delivery, a user's viewport is predicted using a method based on historical trajectories and similarity of motion behavior, and the tiles of the predicted viewport, in a version that depends on the channel quality, are then sent to the user in the unicast session. The scheme is formulated as a mixed-integer nonlinear problem (MINLP), and two near-optimal solutions are proposed to solve it by applying a greedy approach and an approximate approach, respectively. The simulation results show that the proposed scheme ensures better QoE under constrained bandwidth, and that the proposed near-optimal solutions solve the problem efficiently, with low complexity and comparable performance.


I. INTRODUCTION
Recently, with the popularization of virtual reality (VR), an increasing number of people are able to experience VR on affordable head-mounted displays (HMDs) (e.g., HTC VIVE). VR makes use of 360-degree panoramic videos with high resolution (higher than 4K), high frame rate (60-90 fps) and low delay (less than 20 ms) to provide an immersive environment in which the user interacts with the virtual world [1]. However, streaming bandwidth-intensive VR video over current networks is quite challenging. First, compared to traditional video, its larger size and higher resolution lead to intensive bandwidth demand. Second, unlike traditional video viewing, where screens are often shared (e.g., a family watching TV together), HMD screens cannot be shared, so the same content needs to be delivered to each HMD separately [2]. Third, compared to the communication delay (source provider to user equipment) of current networks (e.g., 10∼200 ms for LTE), the motion-to-photon (MTP) latency requirement for VR is quite strict (i.e., less than 20 ms) [2]. Therefore, an increase in end-to-end latency degrades the QoE significantly [3]. Moreover, only a portion of the panoramic VR video is watched at any given time due to the limitation of the viewer's field of view (FOV), which causes significant waste of bandwidth resources. Consequently, streaming high-quality 360-degree videos efficiently is one of the most critical issues for current VR applications.
The associate editor coordinating the review of this manuscript and approving it for publication was Xiaodong Xu.
Optimization of VR video transmission is under active exploration. In order to reduce bandwidth demand, tile-based adaptive streaming was proposed [4], which partitions the panorama into independent tiles so that the content rate can be adapted to the desired viewport of each viewer. Based on the tiling scheme, multiple approaches have been proposed to improve the quality of experience (QoE) [5], [6]. However, these approaches always transmit panoramic videos to viewers in a unicast manner, which is inefficient when, e.g., a VR video is streamed to millions of viewers, even though tiling helps with bandwidth reduction. Meanwhile, latency is another critical issue for adaptive streaming, since the HMD updates the viewport data at a frequency between 60 Hz and 120 Hz [8] (which means the latency for data input should be less than 20 ms). Hence, a prefetching strategy based on viewport prediction is generally adopted, which significantly reduces the latency as well as the transmission bandwidth. However, the accuracy of prediction based on historical motion [10] decreases sharply over time. Once the prefetched quality-foveated tiles mismatch the real viewport, the QoE of the viewer is degraded noticeably. Recently, a few attempts have been made to utilize multicast to serve a huge number of viewers consuming the same VR content simultaneously [7]-[9]. The multicast solutions significantly improve the efficiency of VR video transmission. However, they fail to fulfill the individual demands of viewers with respect to viewport (i.e., a viewer has to compromise and watch a low-quality VR video because of the user grouping used to improve spectral efficiency).
As for the underlying technology, the 5G heterogeneous cloud-radio access network (H-CRAN) has been shown to be a promising solution for VR video transmission in the near future [11]. In the early stage of VR applications, most VR videos were downloaded in advance and viewed locally by moving the viewport in real time. Recently, an increasing number of people prefer to experience VR at any time and any place through portable HMDs integrated with a powerful graphics processing unit (GPU), which is also referred to as Mobile VR. Unsurprisingly, Mobile VR is predicted to be one of the killer applications of 5G [3]. H-CRAN facilitated with mobile-edge computing (MEC) [12], together with a dual radio architecture [22], is envisioned to be typical in 5G systems. Heterogeneous networks with massive densification of small cells and C-RANs are combined in one network structure to improve spectral efficiency, resource management and energy efficiency. Naturally, these features make bandwidth-intensive VR video streaming over H-CRAN feasible.
In 5G H-CRAN, dual connectivity (DC) allows users to be simultaneously served by a macro cell and a small cell operating at different carrier frequencies [10]. By exploiting DC, a multimedia multicast transmission scheme through macro/small BS cooperation is presented in [10], in which the multicast stream and the patching stream are delivered by the macrocell base station (MBS) and the small-cell BS (SBS), respectively. Notice that multicast can significantly improve the transmission efficiency, but it degrades the viewers' QoE. By contrast, the QoE can be guaranteed by unicast, but this results in inefficient transmission, especially when a considerable number of people request the same VR content simultaneously (e.g., a popular VR film). Is it possible to transmit VR video by taking advantage of multicast for transmission efficiency and unicast for QoE improvement? Motivated by this idea, in this paper we explore VR video streaming in 5G H-CRAN through cooperative multicast and unicast (CMU) by the macro cell (i.e., g-NB) and the small cells (i.e., RRHs), respectively. More specifically, a certain version of the panoramic tiled VR video is transmitted to all viewers through the g-NB in a multicast manner. Meanwhile, based on cross-user viewport prediction, a portion of enhanced tiles in the predicted viewport is transmitted to the respective viewers in a unicast manner through the corresponding RRH. In order to improve the accuracy of viewport prediction for unicasting enhanced tiles, we explore viewport prediction based on historical trajectories and the similarity of cross-viewer motion behavior. Furthermore, with the proactive caching and real-time transcoding paradigms in MEC, the end-to-end latency is reduced significantly, as the data is pushed close to the user equipments (UEs).
Our main contributions are described as follows:
• In order to achieve better transmission efficiency and QoE improvement, we propose the cooperative multicast and unicast with viewport prediction (CMU-VP) scheme for VR video streaming in H-CRAN.
• Viewport prediction based on historical trajectories and the similarity of cross-viewer motion behavior is proposed for better accuracy.
• To solve the mixed-integer nonlinear problem (MINLP), two near-optimal solutions with low complexity and comparable performance are presented by exploring a greedy approach and an approximate approach, respectively.
• Extensive simulations are conducted to verify that the CMU-VP scheme ensures better QoE under limited bandwidth, and that the near-optimal solutions solve the MINLP efficiently, with low complexity and comparable performance.
The remainder of the paper is organized as follows. Section II provides an overview of related work. The system model is presented in Section III. Cross-user viewport prediction and the proposed CMU-VP scheme are described in Section IV. The problem formulation and solutions are presented in Section V. Evaluations are performed in Section VI. Finally, Section VII concludes the paper.

II. RELATED WORK
A. ADAPTIVE VR STREAMING
Aiming at reducing bandwidth consumption, video coding solutions are widely adopted for VR video streaming in the literature. Among these, tile-based and viewport-based schemes are the most common approaches to adaptive streaming; their common strategy is to transmit a portion of the panoramic video in high quality and the rest in lower quality. Tile-based adaptive streaming was first proposed by Patrice et al. [4], in which a greedy approach is adopted to solve the tile selection optimization. Furthermore, the impact of the tiling scheme on transmission quality is evaluated. Based on the tiling scheme, multiple approaches have been proposed to improve QoE [5], [6].
Viewport-based adaptive streaming is typified by Facebook's approach, in which a VR video is transformed and encoded into multiple versions oriented towards different perspectives, and a viewer requests one of the video versions according to the viewing orientation. However, such a scheme produces highly redundant content (e.g., Facebook creates 150 versions of one VR video), which requires a huge amount of storage and bandwidth.

B. MOBILE VR STREAMING
The previous studies focused on wireline VR video transmission or offline delivery for local viewing on an HMD. However, with the growing capacity of mobile networks and the increasing popularity of Mobile VR, the demand for transmitting high-quality VR video efficiently over mobile networks is becoming ever more urgent [15]. To cater for data-intensive VR video delivery over mobile networks, Qian et al. [16] proposed a cellular-friendly VR video streaming scheme that transmits only the visible portion of the video based on head movement prediction. A similar approach is also presented for mobile VR [14], whose improvement is to complement the viewport-based method by adding foveated provision within each view. Because these approaches depend heavily on viewport prediction, the QoE degrades dramatically once the transmitted data mismatch the real field of view (FOV). Furthermore, the chunks of the data stream have to be kept relatively short to keep pace with the viewer's frequent viewport changes. Consequently, transmission with short chunks is inefficient because of ineffective compression and the heavy cost of data synchronization. An efficient user-generated content (UGC) VR video transmission scheme over cellular networks is presented in our previous work [25], in which only one representation of each tile is generated for uploading, based on an optimization of uplink resource allocation that considers the quality of content (QoC) contribution, and is then transmitted directly to the viewer without transcoding.
However, transmitting a bandwidth-intensive VR video to multiple users over a resource-limited mobile network in a unicast manner is inefficient. Hence, a few studies explore VR video multicast with the objective of transmission efficiency. Bao et al. [8] designed a bandwidth-efficient scheme that combines multicast and unicast by sending part of the VR video based on motion prediction, and decides whether to use one multicast or multiple unicasts when a certain viewing area on the sphere is required by multiple viewers. In addition, margins are added to the predicted FOV in order to handle prediction errors. In order to improve the spectral efficiency of VR video multicast, Jounsup et al. [7] proposed a multi-session multicast approach for VR video, in which user grouping, wireless resource allocation and tile rate selection are jointly optimized to maximize spectral efficiency. Inspired by a similar motivation, a multicast DASH-based tiled streaming solution was presented in [2], and a heuristic algorithm was proposed to solve the tile rate adaptation problem. In addition, a QoE-aware deep-learning-aided VR multicast framework was presented by Cristina Perfecto et al. [17], in which the future FOVs of VR users are predicted by a deep neural network. Based on the predicted FOV, proactive multicast resource scheduling is performed. The problem was formulated as a request admission maximization problem and solved by a low-complexity matching algorithm. Furthermore, Athul Prasad et al. [18] discussed the challenges of VR broadcast using 5G small cell networks, and considered solutions in terms of single frequency network (SFN) deployments and unlicensed millimeter wave (mmW) bands. Recently, they proposed a D2D-assisted VR broadcast scheme that enables radio-resource-efficient delivery of VR video using broadcast transmission [19].

III. SYSTEM MODEL
The system considered in this work is depicted in Fig. 1 and consists of three parts: the VR content server, the MEC server and the H-CRAN. The details of the system are described as follows. The VR content server is responsible for storing the encoded VR video. With motion-constrained tile sets (MCTS) coding [5], the VR video is spatially split, after equirectangular projection (ERP) [8], into rectangular, independently decodable, non-overlapping tiles to facilitate viewport cropping; the tiles are denoted by $t \in T$. The stream of each tile is then segmented temporally into chunks. Generally, multiple representations of each tile are generated at the content server to facilitate adaptive streaming, and the bitrate representations are denoted as $\{R_t^1, \ldots, R_t^Q\}$, where $t$ is the tile index, and $R_t^1$ and $R_t^Q$ represent the predefined minimum and maximum encoding rates of the $t$-th tile, respectively. In addition, a cross-user viewport prediction module is deployed at the VR content server to exploit the similarity of user viewport motion; the prediction result is then fed back to the MEC server for CMU scheduling.
The MEC server is located at the edge of the H-CRAN and enables proactive caching and real-time transcoding of the requested content. When users within the coverage of the H-CRAN request VR content from the content server, the requested data is retrieved from the MEC server. We assume that only one representation of a VR video, the one with the highest quality, is proactively cached at the MEC server. To provide appropriate tile representations for CMU, a real-time transcoding module is designed in this paper, as shown in Fig. 2. The viewport tiles for each unicast link and the transcoded entire VR video for the multicast link are re-encoded from the proactively cached representation, and the appropriate quantization parameters (QPs) for the unicast and multicast streams are selected according to the optimization of CMU scheduling.
A two-tier H-CRAN is considered in this work, where a macro 5G base station (i.e., g-NB) is underlaid with small cells (i.e., RRHs); the g-NB and the small-cell RRHs are connected to a centralized baseband-unit (BBU) pool through backhaul links and fronthaul links, respectively [11]. The BBU pool executes upper-layer functions and baseband signal processing, whereas the RRHs act as radio-frequency (RF) transceivers and perform only basic RF functions. A software defined network (SDN) is adopted to support the separation of data and control information, which assigns the control information to the g-NB and the data information to the RRHs [12]. The g-NB is mainly responsible for the delivery of control information and the multicast service. For simplicity of indexing, the RRHs and the g-NB are uniformly denoted by $n \in N$. To protect the control information and the multicast signal, the g-NB and the RRHs are assigned spectrum at different frequencies. In addition, the spectrum assigned to the RRHs is reused at each RRH. Moreover, cyclic prefix OFDM (CP-OFDM) [11] is considered as the multiple access technique in the presented architecture.
Furthermore, consider $V$ randomly distributed viewers consuming one VR video simultaneously. DC is supported for all user equipments (UEs) (i.e., viewer HMDs in our system). The signal-to-interference-plus-noise ratios (SINRs) of the unicast link and the multicast link are estimated from the uplink sounding reference signals that are periodically transmitted by the UE. Based on the link quality and the predicted viewport information, the optimization of CMU scheduling is performed at the MEC server. The optimization results are sent to the UE, and the UE requests the appropriate quality of viewport tiles and of the entire transcoded VR video according to the optimization result. Finally, the MEC server performs the transcoding task and sends the proper chunks of tiles by CMU through the g-NB and the RRH, respectively.

IV. CROSS USER VIEWPORT PREDICTION AND PROPOSED CMU-VP SCHEME
A. CROSS USER VIEWPORT PREDICTION
Cross-user viewport prediction is proposed, in which the K most similar users are first selected based on the similarity of historical motion, and the fixations of these K users are then used to amend the prediction result obtained by a linear regression approach.
Tile-based adaptive streaming has been shown to be a promising way to transmit VR video efficiently [4]. However, due to the delay-sensitive nature of VR applications (e.g., motion-to-photon latency should be less than 20 ms), it is highly impractical to transmit high-quality video for the current viewport area immediately. Thus, viewers have to keep a relatively large buffer for proactively prefetching tiles to ensure continuous playback, and it is necessary to predict the viewport in order to enhance the quality of a portion of the prefetched tiles. Motion-based [8] and content-based [13] approaches are the most widely adopted for viewport prediction. However, the prediction accuracy of the motion-based approach drops quickly over time, and the content-based approach is probably not reliable, since how the texture and motion of video content influence user motion has not been fully investigated [20]. Recently, Ban et al. [14] exploited cross-user behavior to predict the viewport with a K-Nearest-Neighbors (KNN) method, which achieved considerable performance. However, the KNN method fails to consider the similarity between cross-user motion behaviors, since the K nearest fixations of other users are used to amend the prediction result based on historical trajectories. The motion behavior of the predicted user may be quite different from that of the K users (i.e., each user can potentially view a VR video in a unique fashion, in which case using the K nearest fixations to amend the prediction result leads to larger errors). On the other hand, some research has already shown that users in a virtual environment have certain similar viewing patterns when watching the same VR video [21].
Based on the aforementioned analysis, we utilize the similarity of user motion behaviors to achieve higher viewport prediction accuracy. The proposed prediction approach is illustrated in Fig. 3, where the viewport is first predicted based on historical trajectories, and the prediction result is then amended by exploiting the similarity of motion behaviors across users. The similarity of motion behaviors between two users is defined as (1), where $t_0$ represents the system time, $\delta$ is the sampling index of the historical trajectories, $D$ is the time window of the similarity measurement, and the term indexed by $t_\delta\{v, j\}$ represents the number of overlapping tiles between the two users' viewports at time $t_\delta$. Here $v$ and $j$ are the indices of two different users, with $v, j \in V$. The term indexed by $t_\delta\{v\}$ represents the number of tiles in viewer $v$'s viewport at time $t_\delta$.
As in the KNN approach, the top K users with the most similar motion behaviors are selected to amend the prediction based on historical trajectories.
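As an illustration of the similarity measurement and top-K selection described above, the following is a minimal sketch rather than the authors' exact definition in (1). It assumes that the similarity is the overlap ratio (overlapping tiles divided by the tiles in viewer v's viewport) averaged over the D samples of the window; the function names and the toy trajectories are hypothetical.

```python
from typing import Dict, List, Set


def similarity(traj_v: List[Set[int]], traj_j: List[Set[int]]) -> float:
    """Average share of user v's viewport tiles that user j also viewed.

    Each trajectory holds one set of viewport tile indices per sample in the
    similarity window (assumed representation, not taken from the paper).
    """
    ratios = [len(a & b) / len(a) for a, b in zip(traj_v, traj_j) if a]
    return sum(ratios) / len(ratios) if ratios else 0.0


def top_k_similar(target: str, history: Dict[str, List[Set[int]]], k: int) -> List[str]:
    """Return the k users whose past viewports overlap most with the target's."""
    others = [u for u in history if u != target]
    others.sort(key=lambda u: similarity(history[target], history[u]), reverse=True)
    return others[:k]


# Toy window with D = 2 samples per user (hypothetical data).
history = {
    "v": [{1, 2, 3, 4}, {2, 3, 4, 5}],
    "a": [{1, 2, 3, 4}, {2, 3, 4, 5}],        # same viewports as v
    "b": [{3, 4, 5, 6}, {4, 5, 6, 7}],        # half of v's tiles overlap
    "c": [{9, 10, 11, 12}, {9, 10, 11, 12}],  # no overlap with v
}
neighbours = top_k_similar("v", history, k=2)
```

With this toy data, users "a" and "b" are selected as the two most similar users to "v".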
For the prediction based on historical trajectories, a linear regression (LR) model is trained in the window $(S_0 - 1, S_0)$ to predict the viewport at a future time, one chunk duration after $t_0$; this can be formulated as (2), where $S_0$ denotes the chunk index at $t_0$. $O_{LR}$ represents the estimated fixation predicted by LR, and $\omega_{LR}$ is the regression coefficient of LR.
We adopt a voting mechanism that considers the fixation weights and the number of times each tile is viewed by the K most similar users. The votes for a certain tile, $\xi_t$, can be written as (3), where $W_{LR}$ and $W_{SM}$ represent the weight of the LR prediction and the weight of the similar users' fixations, respectively. $O^k_{SM}$ denotes the fixations of the K most similar cross-users. $F(O)$ indicates the field covered by each fixation, represented by a $T$-dimensional vector, where $T$ is the number of tiles indexed in raster-scan order; an entry of $F(O)$ equal to 1 means the tile is viewed, and 0 otherwise.
Finally, the viewing probability of each tile for the user, $p_t$, can be written as (4), i.e., by normalizing the votes for each tile.
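The voting and normalization steps can be sketched as follows. This is a hedged illustration, not a reproduction of (3) and (4): it assumes that the votes are a weighted sum of the 0/1 coverage vectors and that normalization divides by the total number of votes. The function names and the example weights are hypothetical.

```python
from typing import List


def tile_votes(f_lr: List[int], f_sm: List[List[int]], w_lr: float, w_sm: float) -> List[float]:
    """Votes per tile: weighted LR coverage plus weighted coverage of each similar user.

    f_lr is the 0/1 coverage vector of the LR-predicted fixation, and f_sm holds
    one 0/1 coverage vector per similar user (tiles in raster-scan order).
    """
    votes = [w_lr * c for c in f_lr]
    for cover in f_sm:
        for t, c in enumerate(cover):
            votes[t] += w_sm * c
    return votes


def viewing_probability(votes: List[float]) -> List[float]:
    """Normalize the votes so that they sum to 1 (assumed normalization)."""
    total = sum(votes)
    return [v / total for v in votes] if total > 0 else [0.0] * len(votes)


# Four tiles, one LR prediction and K = 2 similar users (toy data).
f_lr = [1, 1, 0, 0]
f_sm = [[0, 1, 1, 0], [0, 1, 0, 0]]
votes = tile_votes(f_lr, f_sm, w_lr=1.0, w_sm=1.0)
probs = viewing_probability(votes)
```

In this toy case the second tile is covered by the LR prediction and by both similar users, so it collects the most votes and the highest viewing probability.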

B. PROPOSED CMU-VP SCHEME
A scheme with a reasonable tradeoff between transmission efficiency and QoE is proposed for efficient VR video streaming. Streaming the entire tiled video to users in a unicast manner wastes significant network bandwidth, because users can only watch a portion of the video at a time through their HMDs, and the transmitted video outside the user viewport is wasted. Multicasting VR content to a large number of users who consume the same VR video simultaneously is efficient. However, it leaves some users with lower QoE, since it fails to enhance the content quality within the respective viewports of users. To utilize the DC feature of H-CRAN, we combine the advantages of unicast in QoE improvement and of multicast in transmission efficiency, i.e., cooperative multicast and unicast with viewport prediction, which is referred to as CMU-VP in this paper. Specifically, a certain version of the entire tiled VR content is transmitted to all viewers through the g-NB in a multicast manner, while a portion of enhancement tiles is transmitted to each viewer, based on the predicted viewport, through the corresponding RRH in a unicast manner. After receiving the multicast and unicast data, each viewer merges them into a new version of the VR content for display.
The details of CMU-VP are summarized as follows:
1) The tiled VR video with the highest representation is proactively cached at the MEC server. When users within the coverage of the H-CRAN request the content from the content server, the requested data is retrieved from the MEC server.
2) Meanwhile, viewport prediction is performed at the VR content server according to our proposed cross-user viewport prediction method, and the prediction result is fed back to the MEC server for CMU scheduling.
3) Once the VR content server receives the viewing requests for a VR video from viewers, it creates a multicast service at the Broadcast Multicast Service Center (BM-SC), which is responsible for multicast session management [23]. Meanwhile, a unicast stream is also maintained with each UE for transmitting enhancement tiles.
4) The optimization of CMU scheduling is performed by jointly considering the tile viewing probabilities and the channel quality of the multicast link and the unicast links. Then a certain version of the entire tiled video and the enhancement tiles corresponding to each predicted viewport are generated through real-time transcoding.
5) With the assistance of SDN, the selected version of the entire tiled video and the enhancement tiles are distributed to the g-NB for multicast and to the RRHs for unicast, respectively.
6) With DC supported, UEs receive the multicast stream from the g-NB and the unicast stream from the corresponding RRH. Finally, the multicast and unicast streams are merged in the buffer of the UE (i.e., for tiles received from both the unicast and multicast streams, the lower representation is discarded) to obtain high-quality VR content.
VOLUME 7, 2019
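The merge performed at the UE in step 6 can be sketched as follows, under the assumption that each received stream is summarized as a mapping from tile index to the bitrate (in Mbps) of the received representation. The function name and the toy values are hypothetical.

```python
from typing import Dict


def merge_streams(multicast: Dict[int, float], unicast: Dict[int, float]) -> Dict[int, float]:
    """For every tile, keep the higher-bitrate representation that was received."""
    merged = dict(multicast)
    for tile, rate in unicast.items():
        if rate > merged.get(tile, 0.0):
            merged[tile] = rate
    return merged


# The multicast stream carries every tile at a basic rate; the unicast stream
# carries enhancement tiles for the predicted viewport only (toy values).
multicast = {0: 0.3, 1: 0.3, 2: 0.3, 3: 0.3}
unicast = {1: 1.2, 2: 0.1}
merged = merge_streams(multicast, unicast)
```

Tile 1 is upgraded to the unicast representation, while tile 2 keeps the multicast representation because the unicast copy has the lower bitrate and is discarded.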
Note that the objective of CMU-VP is to optimize the QoE of all viewers. For that purpose, the version of the entire tiled video for multicast (i.e., which representation of each tile) and the unicast enhancement tiles for each viewer (i.e., which tiles, and in which representations) need to be determined. Thus, the problem of VR video streaming through CMU-VP is formulated as a QoE optimization problem; the details of the problem formulation and solutions are given in the next section.

V. PROBLEM FORMULATION AND SOLUTIONS
A. PROBLEM FORMULATION
In this paper, we adopt the QoE metric for VR video streaming in [7], and the utility of a tile is defined as (5), where $R_{v,t}$ and $R_t^Q$ represent the encoding rate of the $t$-th tile transmitted to the $v$-th UE and the predefined maximum tile encoding rate, respectively, and $\alpha$ and $\beta$ are the coefficients of the utility model. The utility is a strictly concave function of the received video rate, so its marginal gain decreases as the video rate increases. These features allow the utility function to model the quality of user experience well as the received video rate increases. For tile-based VR video, the utility of a tile models the contribution of a tile with a certain bitrate to the overall quality of user experience.
According to the conventional multicast scheme [7], the transmission rate of a group is determined by the user with the worst channel condition in that group. Thus, the multicast rate can be written as follows, where $B_m$ represents the bandwidth for multicast, $P_0$ represents the transmission power of the g-NB, $E[|h_{0,v}|^2]$ is the average channel gain of UE $v$ from the g-NB, and $\sigma^2$ is the power spectral density of the additive white Gaussian noise. Note that, in order to ensure transmission efficiency, only one user group is considered in our scheme.
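The worst-user rule for the multicast rate can be sketched as follows. This is a minimal illustration that assumes a Shannon-capacity rate with thermal noise only (no interference term, since the g-NB and the RRHs use different frequencies); the function names and the channel gains are hypothetical, and the exact expression in the paper may differ.

```python
import math
from typing import Sequence


def shannon_rate(bandwidth_hz: float, tx_power_w: float,
                 channel_gain: float, noise_psd_w_per_hz: float) -> float:
    """Achievable rate in bit/s for one link (assumed Shannon form, no interference)."""
    snr = tx_power_w * channel_gain / (noise_psd_w_per_hz * bandwidth_hz)
    return bandwidth_hz * math.log2(1.0 + snr)


def multicast_rate(bandwidth_hz: float, tx_power_w: float,
                   gains: Sequence[float], noise_psd_w_per_hz: float) -> float:
    """Group rate is limited by the user with the worst channel gain."""
    return min(shannon_rate(bandwidth_hz, tx_power_w, g, noise_psd_w_per_hz) for g in gains)


# Three viewers with different average channel gains (hypothetical values).
gains = [1e-9, 1e-10, 1e-11]
noise_psd = 10 ** ((-173.0 - 30.0) / 10.0)   # -173 dBm/Hz expressed in W/Hz
tx_power = 10 ** ((46.0 - 30.0) / 10.0)      # 46 dBm expressed in W
rate = multicast_rate(6e6, tx_power, gains, noise_psd)
```

Because the rate increases monotonically with the channel gain, the minimum over the group is always set by the viewer with the smallest gain.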
The unicast rate of UE $v$ associated with RRH $n$ can be written as follows. Here, $R^m_{v,t}$ represents the multicast representation of tile $t$. Note that each viewer always chooses the better representation of a tile if the tile is received from both the unicast and multicast streams. We adopt a weighted utility function [4] to model the expected QoE of VR video streaming in this paper: the utility of a tile models the contribution of a tile with a certain bitrate to the overall quality of user experience, and the weight $p_{v,t}$ is the probability that tile $t$ is visible in the viewport of viewer $v$. In order to maximize the overall QoE of all users under limited bandwidth, CMU-VP scheduling needs to determine the version of the entire tiled video for multicast (i.e., $R^m_{v,t}$), and which enhancement tiles are unicast to each viewer and at which bitrate (i.e., $x_{v,t}$, $R^u_{v,t}$). Consequently, the CMU-VP scheduling problem, with the objective of maximizing the QoE of all viewers, can be formulated as problem P1. Constraint (8) ensures that the sum of the multicast tile rates, in bits/sec, does not exceed the rate achieved from the g-NB for multicasting. Constraint (9) indicates that the total rate of the tiles transmitted to each viewer in a unicast manner should not exceed the achievable rate of that UE from its associated RRH. Constraint (10) requires the multicast and unicast tile rates to be selected from the predefined tile representations. Constraint (11) is a binary constraint that indicates whether tile $t$ is transmitted to viewer $v$ in a unicast manner.
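For orientation, one way to write a problem with the structure described by constraints (8)-(11) is sketched below. This is a hedged reconstruction from the prose, not the authors' exact formulation: the symbols $U(\cdot)$ for the tile utility of (5), $C^{m}$ for the multicast rate and $C^{u}_{v,n}$ for the unicast rate of UE $v$ from RRH $n$ are assumed names, and the received rate is assumed to be $R_{v,t} = \max\{x_{v,t} R^{u}_{v,t},\, R^{m}_{v,t}\}$, reflecting that the viewer keeps the better representation of each tile.

```latex
\begin{aligned}
\textbf{P1:}\quad \max_{x_{v,t},\, R^{u}_{v,t},\, R^{m}_{v,t}} \quad
  & \sum_{v \in V} \sum_{t \in T} p_{v,t}\, U\!\left(R_{v,t}\right) \\
\text{s.t.}\quad
  & \sum_{t \in T} R^{m}_{v,t} \le C^{m}, && (8) \\
  & \sum_{t \in T} x_{v,t}\, R^{u}_{v,t} \le C^{u}_{v,n}, \quad \forall v \in V, && (9) \\
  & R^{m}_{v,t},\, R^{u}_{v,t} \in \{R^{1}_{t}, \ldots, R^{Q}_{t}\}, && (10) \\
  & x_{v,t} \in \{0, 1\}. && (11)
\end{aligned}
```

The binary variable $x_{v,t}$ together with the discrete rate sets in (10) and the concave utility is what makes the problem a mixed-integer nonlinear program.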

B. SOLUTIONS
The problem P1 is an MINLP, and has been proved NP-hard in [24]. Due to the high computational complexity of exhaustive search and heuristic algorithms, we derive two near-optimal solutions for the problem; the details are given as follows.
1) GREEDY ALGORITHM
The greedy algorithm solves the problem by adopting a greedy approach after one variable, $x_{v,t}$, is first determined according to the viewport prediction. Note that there are three variables (i.e., $x_{v,t}$, $R^u_{v,t}$, $R^m_{v,t}$) to be determined in P1, and it is practically infeasible to find their optimal values simultaneously. However, we notice that once $x_{v,t}$ is determined, $R^u_{v,t}$ and $R^m_{v,t}$ can be selected by solving P2 and P3, respectively. Fortunately, $x_{v,t}$ for each viewer can be determined first based on the viewport prediction, as we assume that the FOV consists of a constant number of tiles for each viewer. Using the index of the tile with the highest viewing probability, the tiles within the predicted viewport are selected for unicast to the viewer.
With $x_{v,t}$ determined, for each viewer, the unicast and multicast scheduling problem can be split into two independent sub-problems, P2 and P3. A similar problem has been studied in our previous work [25], in which we adopted the greedy approach to solve this type of problem. Therefore, the problem P1 is finally solved by applying the greedy approach, and Algorithm 1 summarizes the details of our approach.
[Algorithm 1: determine $x_{v,t}$ according to the predicted viewport (set $x_{v,t} = 1$ for the selected tiles), then apply the greedy approach to solve P2 subject to (9), (10) and P3 subject to (8), (10).]
It should be noted that Algorithm 1 achieves sub-optimal solutions, as we roughly split P1 into two independent problems, P2 and P3, after $x_{v,t}$ is determined. In addition, using the greedy approach to solve P2 and P3 separately also leads to local optima. Meanwhile, the computational complexity of the greedy approach is relatively high.
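A greedy rate selection for one sub-problem (for example, choosing multicast tile rates under the multicast rate budget) can be sketched as follows. This is not Algorithm 1 itself: the logarithmic utility is an assumed stand-in for (5), and the function names, the budget and the probabilities are hypothetical. The sketch starts every tile at the lowest representation and repeatedly applies the one-step upgrade with the largest probability-weighted utility gain per extra Mbps that still fits the budget.

```python
import math
from typing import List, Sequence


def utility(rate: float, max_rate: float, alpha: float = 1.0, beta: float = 10.0) -> float:
    """Assumed concave utility, alpha * log(1 + beta * rate / max_rate)."""
    return alpha * math.log(1.0 + beta * rate / max_rate)


def greedy_rates(probs: Sequence[float], levels: Sequence[float], budget: float) -> List[float]:
    """Pick one bitrate level per tile so that the total rate stays within the budget."""
    idx = [0] * len(probs)
    used = levels[0] * len(probs)
    if used > budget:
        raise ValueError("budget cannot carry even the lowest representation")
    max_rate = levels[-1]
    while True:
        best_tile, best_gain = -1, 0.0
        for t, p in enumerate(probs):
            if idx[t] + 1 >= len(levels):
                continue  # tile already at the highest representation
            extra = levels[idx[t] + 1] - levels[idx[t]]
            if used + extra > budget:
                continue  # upgrade does not fit the remaining budget
            gain = p * (utility(levels[idx[t] + 1], max_rate)
                        - utility(levels[idx[t]], max_rate)) / extra
            if gain > best_gain:
                best_tile, best_gain = t, gain
        if best_tile < 0:
            return [levels[i] for i in idx]
        used += levels[idx[best_tile] + 1] - levels[idx[best_tile]]
        idx[best_tile] += 1


# Tile representations (Mbps) from the simulation setup; probabilities are toy values.
levels = [0.1, 0.3, 0.5, 0.7, 0.9, 1.0, 1.2, 1.5, 1.7, 2.0]
rates = greedy_rates([0.5, 0.3, 0.2, 0.0], levels, budget=2.0)
```

Tiles with a higher viewing probability receive upgrades first, and a tile with zero viewing probability stays at the lowest representation.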

2) DECOMPOSING ALGORITHM
The decomposing algorithm solves the problem by decomposing P1 into two sub-problems. Note that scheduling multicast and unicast jointly makes the problem P1 prohibitively difficult to solve. However, if the multicast schedule is obtained first, the unicast scheduling problem given that multicast schedule becomes relatively easy to solve. Inspired by this idea, we decompose the problem P1 into two sub-problems, P4 and P5, which can be described as the multicast scheduling problem and the unicast scheduling problem given the multicast schedule, respectively.
Here, $p_t$ is the average viewing probability of tile $t$ over all multicast viewers, and $R^m_t = R^m_{v,t}$ indicates that all the viewers receive the same tile representation through the multicast stream.
After P4 is solved, P5 becomes a resource-constrained knapsack problem. P5 can be solved by relaxing it into a fractional knapsack problem and then applying the greedy approach [4]. Because it is computationally fast and uses an elitism strategy, we adopt the APMonitor Modeling Language [26] to solve P5; it provides real-time optimization and is freely available through MATLAB or Python interfaces. The APOPT solver provided by APMonitor efficiently solves P5 given the obtained solution of P4.
Note that more sophisticated algorithms can be applied to solve P1 optimally, but at a high computational cost. In contrast, with our solution strategy the complexity is greatly reduced, and the solutions are easy to implement using existing algorithms or optimization toolboxes. Furthermore, simulations show that the gap between the proposed near-optimal solutions and the optimal solution is relatively small, and that the near-optimal solutions solve the problem efficiently with comparable performance. The details are explained in the next section.
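The fractional knapsack relaxation mentioned above can be sketched with the classic greedy rule of sorting by value density. This illustrates the relaxation only and is not the APOPT-based solve used in the paper; the mapping of items to P5 (value as probability-weighted utility gain of a unicast upgrade, weight as its extra rate, capacity as the unicast rate of the UE) is an assumption, and the function name and numbers are hypothetical.

```python
from typing import List, Tuple


def fractional_knapsack(items: List[Tuple[float, float]], capacity: float) -> Tuple[float, List[float]]:
    """Greedy solution of the fractional knapsack problem.

    items holds (value, weight) pairs with positive weights. Returns the total
    value and the fraction taken of each item, in the original item order.
    """
    order = sorted(range(len(items)), key=lambda i: items[i][0] / items[i][1], reverse=True)
    fractions = [0.0] * len(items)
    total = 0.0
    remaining = capacity
    for i in order:
        if remaining <= 0:
            break
        value, weight = items[i]
        take = min(1.0, remaining / weight)  # whole item if it fits, else a fraction
        fractions[i] = take
        total += take * value
        remaining -= take * weight
    return total, fractions


# Textbook instance: three items and a capacity of 50.
total, fractions = fractional_knapsack([(60.0, 10.0), (100.0, 20.0), (120.0, 30.0)], 50.0)
```

For this instance the two densest items are taken whole and two thirds of the last item fills the remaining capacity. In a tile scheduling setting, a fractional result would still have to be rounded to one of the discrete representations.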

VI. SIMULATIONS
In this section, we conduct extensive simulations to evaluate the performance of our proposed solutions to the problem.

A. SETUP
A g-NB cell with a radius of 0.5 km is considered in the simulation, where three RRHs are uniformly distributed within the g-NB cell. The users' locations are randomly generated, and the UEs follow a Poisson point process (PPP) with a given density under the g-NB coverage. The transmit powers of the g-NB and the RRHs are 46 dBm and 30 dBm, respectively, and the system bandwidth is set to 10 MHz, 15 MHz and 20 MHz in different simulation conditions. Note that the bandwidth is divided into two parts: one part is for multicast through the g-NB, and the remaining part is for unicast through the RRHs (i.e., 6, 9, 12 MHz and 4, 6, 8 MHz, respectively, in our simulation). That is, 60% of the bandwidth is used for the multicast service [7]. Lognormal shadowing with a standard deviation of 8 dB is implemented. The noise power spectral density is assumed to be -173 dBm/Hz, and more details of the simulation parameters are given in Table 2.
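The arithmetic behind these settings can be checked with a short sketch. The helper names are hypothetical; the conversions are the standard dBm-to-watt relation and the integration of a flat noise power spectral density over the bandwidth.

```python
import math
from typing import Tuple


def dbm_to_watt(dbm: float) -> float:
    """Convert a power level from dBm to watts."""
    return 10 ** ((dbm - 30.0) / 10.0)


def noise_power_dbm(psd_dbm_per_hz: float, bandwidth_hz: float) -> float:
    """Total noise power in dBm for a flat PSD over the given bandwidth."""
    return psd_dbm_per_hz + 10.0 * math.log10(bandwidth_hz)


def split_bandwidth(total_mhz: float, multicast_share: float = 0.6) -> Tuple[float, float]:
    """Split the system bandwidth into multicast and unicast parts (MHz)."""
    multicast = multicast_share * total_mhz
    return multicast, total_mhz - multicast


# Reproduce the three bandwidth configurations of the setup.
splits = [split_bandwidth(b) for b in (10.0, 15.0, 20.0)]
gnb_power_w = dbm_to_watt(46.0)
multicast_noise_dbm = noise_power_dbm(-173.0, 6e6)
```

With a 60% multicast share, the 10, 15 and 20 MHz configurations yield 6/4, 9/6 and 12/8 MHz, matching the values quoted in the setup; the noise power over the 6 MHz multicast band comes to about -105.2 dBm.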
In order to fairly evaluate the proposed CMU-VP scheme with near-optimal solutions (CMU Greedy, CMU Decomposing), few existing schemes are adopted for comparison: 1) CMU optimal solution with exhaustive search (CMU optimal): CMU-VP scheme is adopted, and solved by exhaustive search to obtain optimal solution.2) CMU-VP with equal tile bit rate (CMU equal): CMU-VP scheme is adopted, while tile bit rate is distributed equally without considering probability of viewing.3) Multicast only [2]: Tile based multicast scheme according to [2], which jointly considers bandwidth constraint and tile weight.4) Unicast only [5]: Tile based unicast scheme according to [5], where tile selection is based on viewport prediction.4K resolution VR video (i.e., Freestyle Skiing) in [27] with 32 tiles is used for the simulation, and each tile is encoded by open-source HEVC encoder Kvazaar with 10 different representations: {0.1, 0.3, 0.5, 0.7, 0.9, 1, 1.2, 1.5, 1.7, 2.0} Mbps.And viewport trajectories of 48 users in dataset [27] are divided into two part, the first part are used for viewport prediction, while the other are adopted for performance evaluation.For one certain user, K most similar users is selected through the similarity measurement based on the first part of viewport trajectories.meanwhile, viewport prediction for next chunk is performed by LR based on historical trajectory.viewport prediction is performed by our cross user viewport prediction method.Then different streaming schemes are performed with viewing probability of each tile p t for each user.Finally, based on the second part of viewport trajectories, we evaluates the QoE objectively according to tile bitrate within each user's viewport.
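The two ingredients of the prediction step described above can be sketched in a few lines. This is a hedged illustration, not the authors' implementation: `lr_predict` extrapolates a single viewing angle with ordinary least squares (angle wrap-around at 360 degrees is ignored), and `tile_view_probability` estimates p_t as the fraction of the K similar users whose viewport covers tile t. Both function names, the input layout, and the way the probability is estimated are assumptions, since this section does not specify how the two predictions are combined.

```python
def lr_predict(times, angles, t_next):
    """Least-squares line through (time, angle) samples, evaluated at t_next.

    Requires at least two distinct time stamps.
    """
    n = len(times)
    mean_t = sum(times) / n
    mean_a = sum(angles) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (a - mean_a) for t, a in zip(times, angles))
    slope = sxy / sxx
    return mean_a + slope * (t_next - mean_t)


def tile_view_probability(neighbor_tiles, num_tiles):
    """Estimate per-tile viewing probability from K similar users.

    neighbor_tiles: one iterable of tile indices per similar user, listing
    the tiles inside that user's viewport for the chunk of interest.
    """
    k = len(neighbor_tiles)
    counts = [0] * num_tiles
    for tiles in neighbor_tiles:
        for t in set(tiles):  # count each tile once per user
            counts[t] += 1
    return [c / k for c in counts]


# Hypothetical data: yaw samples of one user and viewports of K = 3 neighbors.
yaw_next = lr_predict([0, 1, 2, 3], [10.0, 12.0, 14.0, 16.0], 4)
p = tile_view_probability([{0, 1}, {1, 2}, {1}], num_tiles=4)
```

A streaming scheme can then weight each tile's bit rate by its estimated probability, which is the role p_t plays in the evaluation described above.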

B. RESULTS
First, the proposed viewport prediction method, based on historical trajectories and the similarity of cross-user motion behavior, is evaluated. The conventional linear regression (LR) prediction method and the KNN-based Viewport Prediction algorithm (KVP) in [14] are considered for comparison. For a fair comparison, the value of K in our proposed method is set to the same value as in [14], i.e., K = 5. Fig. 4 shows the accuracy of viewport prediction over time for the different methods. The LR method performs the worst, especially for long-term prediction, because it predicts only from historical data, and a viewer's motion behavior is likely to differ considerably from the preceding moments as the content changes. Compared to KVP, our proposed method achieves at least 6% higher accuracy and maintains an accuracy above 80% even for long-term prediction.
Fig. 5 shows the normalized utility of the different schemes under different bandwidth conditions. We observe that schemes using CMU-VP always perform better than those without it (e.g., the Multicast only and Unicast only approaches) in all bandwidth conditions. The Multicast only approach performs worst among all the approaches, because only a low tile bit rate can be transmitted to viewers with the limited multicast bandwidth. On average, the utility of the Multicast only approach is 25% less than that achieved by CMU-VP optimal. Compared to the CMU-VP optimal approach, the CMU-VP proposed approach achieves comparable performance, whereas CMU-VP equal performs much worse, with about 13% less utility than CMU-VP optimal. This is because its transmission resources are distributed equally among the tiles without considering the probability of viewing, which leaves the tiles within the viewport at low quality (i.e., a low V-utility value). Unicast only performs steadily as the bandwidth increases; however, much of the transmission bandwidth is wasted on transmitting the entire VR video to each viewer in a unicast manner, which leads to the worst efficiency.
Fig. 6 shows the total achieved data rate of the UEs over the total transmission bandwidth, which reflects the efficiency of the transmission schemes. Clearly, the schemes with CMU (i.e., CMU optimal, CMU Greedy, CMU Decomposing, CMU equal) outperform the schemes without CMU, since the bandwidth resources are fully utilized by the schemes with CMU, and Unicast only performs the worst. Note that the only difference among the schemes with CMU is the tile rate allocation strategy. However, tile rate allocation is based on the achieved data rate of the UEs, which is the same for all schemes with CMU. Therefore, the schemes with CMU have the same transmission efficiency; in fact, their curves overlap in Fig. 6 (CMU optimal in red, CMU Greedy in green, CMU Decomposing in yellow, CMU equal in blue). Next, the cumulative distribution function (CDF) of the tile utility in the corresponding viewers' viewports (V-utility) is measured to evaluate the QoE objectively. As can be seen from Fig. 7, the approaches with CMU perform better than those without CMU. Furthermore, the gap between CMU optimal and CMU proposed indicates that the proposed near-optimal solutions are promising solutions with low complexity. In contrast, owing to the lack of transmission bandwidth, 30% of the V-utility values of the Multicast only approach are below 0.1.
Finally, the complexities of the proposed algorithms are analyzed and presented in Table 3. The complexity of the greedy algorithm consists of three parts: the first is the complexity of determining x_{v,t}, and the second and third are the complexities of the greedy approach for P2 and P3, respectively. The complexity of the Decomposing algorithm is, in the worst case, the sum of the complexities of the greedy approach for P4 and P5, and the complexity of the approximate approach (i.e., APM) for P5 should be less than that of the greedy approach. Exhaustive search solves the problem by searching all possible options; therefore, its complexity is on the order of the number of multicast tile rate options times the number of unicast enhancement tile options.
Table 3 compares the complexities. CPU time is also measured using MATLAB under the condition of 10 MHz system bandwidth. The greedy algorithm and the Decomposing algorithm take less time to complete the optimization than the exhaustive search approach. The Decomposing algorithm takes the shortest time to obtain a solution; however, its achieved utility is lower than that of the greedy algorithm.
Overall, the proposed scheme can significantly improve QoE by exploiting the heterogeneous nature of H-CRAN. The proposed near-optimal algorithms (i.e., the greedy algorithm and the Decomposing algorithm) are promising solutions with low complexity and comparable performance.
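Reading a statement such as "30% of V-utility is below 0.1" off a CDF curve amounts to evaluating the empirical CDF at 0.1. The following minimal sketch shows that computation; the sample values are hypothetical and are not taken from the simulation results.

```python
def empirical_cdf(samples, x):
    """Fraction of samples that are less than or equal to x."""
    return sum(1 for s in samples if s <= x) / len(samples)


# Hypothetical V-utility samples for ten viewer/tile observations.
v_utility = [0.05, 0.08, 0.09, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.9]
share_below = empirical_cdf(v_utility, 0.1)  # three of ten samples are <= 0.1
```

A curve that lies further to the right (a lower CDF value at the same utility) therefore indicates that more viewport tiles are delivered at high utility.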

VII. CONCLUSION
In this paper, we explore viewport prediction based on historical trajectories and the similarity of cross-user motion behavior. To achieve better transmission efficiency and improved QoE, we propose the CMU-VP scheme for VR video streaming in H-CRAN. Furthermore, two near-optimal solutions for the CMU-VP scheduling problem are presented. The simulations show that the proposed CMU-VP scheme with our viewport prediction method achieves at least 25% and 17% higher QoE than the multicast-only and unicast-only schemes, respectively. Note that the time window of the similarity measurement in our prediction method is a very important factor that affects prediction accuracy, and it deserves careful study for long-term viewport prediction. CMU-VP might be a promising way to stream VR video to a group of people in some specific scenarios. For CMU scheduling, more sophisticated algorithms with low complexity will be explored in our future work.

FIGURE 6. Total UEs data rate over bandwidth with B = 10.

TABLE 1. List of symbols.

TABLE 2. Parameters of simulation.