360Cast+: Viewport Adaptive Soft Delivery for 360-Degree Videos

Existing viewport-adaptive 360-degree video streaming schemes encode tiled 360-degree videos with digital-based compression. However, these schemes cause a cliff effect wherein the headset video quality drops significantly when the channel signal-to-noise ratio (SNR) falls below a certain threshold. To realize high-quality wireless 360-degree video streaming, we propose a novel viewport-adaptive soft delivery scheme for 360-degree videos, called 360Cast+. 360Cast+ skips the non-linear operations in digital-based streaming and adopts power allocation and analog modulation to achieve graceful video quality improvement in unstable wireless links. In particular, 360Cast+ integrates the human vision system (HVS) and projection distortion as a perceptual weight in power allocation operations. A near-optimal low-complexity subcarrier matching algorithm is also adopted to extend 360Cast+ to fading channel environments. To reduce the effect of the prediction error, 360Cast+ uses dynamic linear regression (DLR) to predict both the future orientation and the future prediction error, and uses them to extract an extension area around the viewport. The evaluation results demonstrate that the proposed 360Cast+ provides high video quality irrespective of the prediction error and channel conditions in orthogonal frequency-division multiplexing (OFDM)-based systems.


I. INTRODUCTION
Virtual reality (VR) is a multimedia technology with a high potential for growth. All types of VR content build a synthetic virtual environment to mimic the real world for participants to interact with. VR can be applied to various applications such as automotive video streaming, virtual live concerts, and six-degrees-of-freedom content streaming for remote operation. However, the poor user experience provided by traditional computer-supported VR headsets or all-in-one headsets (e.g., Oculus Go) limits the imagination and potential of virtual worlds.
The restricted mobility of wired VR headsets and the lack of real-time high-quality content in all-in-one VR headsets are the main problems encountered by VR applications. If these limitations are eliminated, wireless VR headsets can provide immersive experiences for users at any time or place, thereby producing a plethora of novel VR application opportunities.
The associate editor coordinating the review of this manuscript and approving it for publication was Jiachen Yang.
To display VR content on wireless VR headsets, the delivery of high-quality 360-degree videos over wireless links is a challenging issue. A 360-degree video, also called an immersive or spherical video, is a new video format for VR content. Each 360-degree video consists of spherical images captured by an omnidirectional camera or camera array. The captured 360-degree video is then projected onto a 2D plane via projection methods, including equirectangular and cubemap projections. Although the full resolution of 360-degree video delivery can immerse users in realistic virtual worlds, the required transmission rate is significantly high; therefore, the transmission rate may not meet the bandwidth requirement in wireless links. To meet the bandwidth requirement in wireless links, viewport-adaptive 360-degree video delivery has been proposed in recent years. A viewport represents a part of each 360-degree video frame that is displayed on the user headset. While playing VR content on a headset, each headset user can freely switch their viewing orientation by moving their head, and only 20% of the pixels of the entire 360-degree video frame need to be rendered and displayed on the headset [1]. Based on this information, each headset user requests a new viewport from the sender; subsequently, the sender sends back the requested viewport with enhanced quality in viewport-adaptive 360-degree video delivery for traffic reduction.
In conventional viewport-adaptive 360-degree video delivery schemes over wireless links, digital video compression and digital wireless transmission are performed in sequence [2]-[7] for the 360-degree video frames. For example, the video compression part uses the H.265/High-Efficiency Video Coding (HEVC) [8] standard and generates a compressed bitstream by implementing the nonlinear quantization and entropy coding operations defined in the MPEG-I (MPEG Immersive media) standard. The wireless transmission part uses channel coding and a digital modulation scheme to reliably transmit the encoded bitstream.
However, the conventional schemes have the following problems due to unreliable wireless channels. First, encoded bitstreams are highly vulnerable to bit errors that occur in low-quality wireless channels. Even when the conventional schemes can assign a large number of bits to the viewport that the user is viewing for quality enhancement, the video quality on the headset drops significantly when the channel signal-to-noise ratio (SNR) falls below a certain threshold. This phenomenon is referred to as the cliff effect. Second, the video quality of the viewed viewport is constant even in high-quality wireless channels. This is because quantization is a lossy process whose distortion cannot be recovered at the receiver. Although some studies [9], [10] have attempted to mitigate the cliff effect in digital-based video delivery by introducing layered source coding and layered channel coding, they only convert the cliff effect into the so-called staircase effect [11]. In the staircase effect, the video quality improves discontinuously as the wireless channel quality improves. Third, the experienced quality of viewport-adaptive delivery schemes is highly dependent on the viewport prediction accuracy. Even when a high-quality viewport is successfully delivered to a headset user, the experienced quality will drop when the distance between the displayed viewport and predicted viewport is large.
In this study, we propose a novel delivery scheme for 360-degree videos, called 360Cast+, to overcome the aforementioned issues. The proposed 360Cast+ is inspired by soft video delivery schemes proposed in [12], [13]. 360Cast+ integrates viewport prediction, analog modulation, and optimal power allocation to provide better video quality on the wireless headsets of users. In the proposed scheme, we first predict the future viewport and prediction error of each user based on their past head movements via dynamic linear regression (DLR). 360Cast+ then transforms the pixel values within the predicted viewport, whose size is adaptive to the prediction error, into frequency components, and then directly maps the frequency components to transmission symbols, i.e., analog modulation, after transmission power allocation. In particular, the power allocation process assigns transmission power to the frequency components within the predicted viewport considering the distortions in the human vision system (HVS) and sphere-to-plane projections. The evaluation results demonstrated that the proposed 360Cast+ yields better video quality in comparison to the existing digital-based and soft video delivery schemes even in bandlimited environments.
In contrast to 360Cast, proposed in [14], 360Cast+ overcomes two major issues of wireless 360-degree video delivery, namely the viewport prediction error and frequency-dependent wireless channel variations. In [14], a future viewport was predicted based on the recent head movements of a user. Although it achieved better video quality in the predicted viewport, headset users do not always conform to the predicted head movements. The prediction error causes significant degradation in the video quality on the user headset, especially in high-quality wireless channels. In 360Cast+, we predict a potential prediction error from the recent head movements and the predicted head movement; then, based on the potential prediction error, the size of the viewport is enlarged. We confirmed that an enlarged viewport can decrease the effect of the viewport prediction error, i.e., it yields less quality degradation, with a limited increase in video traffic.
In addition, 360Cast was designed for broadcast transmission with additive white Gaussian noise; thus, it does not consider the frequency-dependent channel variations in modern wireless systems. In the widely used orthogonal frequency-division multiplexing (OFDM) systems, which decompose a wideband channel into multiple subcarriers, the channel SNRs across subcarriers are usually different, often by approximately 20 dB [15]. To accomplish high-quality 360-degree video delivery in modern wireless systems, we utilized diversity for channel selectivity. 360Cast+ designs a subcarrier matching algorithm and joint source-channel power allocation technique to minimize the total end-to-end distortion by utilizing the frequency-dependent channel variations.
The contributions of our study are as follows:
• The power allocation process in 360Cast+ integrates the distortions of human perception in HVS and 2D-plane projection for better visual quality on the user headset.
• The joint source-channel power allocation process with a subcarrier matching algorithm utilizes the frequency-dependent differences across the available subcarriers for high-quality 360-degree video delivery in modern OFDM systems.
• We used the head movements of ten headset users for the evaluations. We demonstrated that the prediction of viewports in 360Cast+ reduces the prediction error effect and realizes better video quality on each user headset more efficiently than the existing digital-based and soft video delivery schemes.

VOLUME 9, 2021

II. RELATED WORK
A. MPEG-I STANDARD ACTIVITIES
MPEG-I groups have discussed various 360-degree video streaming standards for VR systems. They defined the common test conditions (CTC) and evaluation procedures for the video coding of 360-degree videos. In the CTC, the digital-based video coding standard of H.265/HEVC, i.e., HEVC Test Model (HM) [16], is used for the video coding. The HM uses test 360-degree video sequences such as ClassroomVideo and TechnicolorPainter for video encoding to evaluate the coding efficiency. Since the resolution of 360-degree videos is very high, the user's viewport is one of the key factors in reducing the required bandwidth for VR streaming. MPEG-I groups have discussed the possibility of motion-constrained tile sets (MCTS) [17] for traffic reduction considering the user's viewport. The MCTS limits the temporal prediction range within the same region, enabling the independent extraction and decoding of each encoded tile at the receiver. In this case, the server selectively determines whether each encoded tile will be delivered to the headset user over networks based on the user's viewport, i.e., viewport-based streaming, using dynamic adaptive streaming over HTTP (DASH) [18].
In addition, the weighted to spherically uniform peak SNR (WS-PSNR) [19] and immersive video PSNR (IV-PSNR) [20] are defined as the quality metrics to evaluate the perceptual quality in VR systems. WS-PSNR weights the pixel errors based on the projection distortion from the sphere to the 2D plane, while IV-PSNR incorporates the pixel shift and global color shift into the conventional PSNR to deal with rounding errors and color characteristics in VR systems.
In this paper, the proposed 360Cast+ skips the nonlinear operations of quantization and entropy coding used in digital-based video coding to prevent both the cliff effect and the constant quality that persists irrespective of wireless channel fluctuation. In addition, we used both WS-PSNR and IV-PSNR as the quality metrics to discuss the perceptual quality of the proposed 360Cast+.

B. VIEWPORT-ADAPTIVE 360-DEGREE VIDEO STREAMING
The streaming schemes of viewport-adaptive 360-degree videos have been widely studied for traffic reduction. The main approach to viewport-adaptive 360-degree video streaming is the tile-based scheme [2]-[7], [21]-[34]. Tile-based schemes divide each 360-degree video frame into sub-frames called tiles. To deliver the optimal quality of 360-degree video over wired/wireless links, each tile is encoded at different quality levels. References [2], [3] encoded the tiles into two layers, i.e., the base layer and the enhancement layer. The base layer provided the entire 360-degree video frame, while the enhancement layer improved the quality of the tiles corresponding to the predicted field of view (FoV). Reference [22] determined an optimal bit allocation technique for each tile based on the user viewport, and [24] used quantization parameters for bit allocation across the tiles. Reference [23] proposed an optimal hypertext transfer protocol (HTTP) streaming scheme for the encoded tiles, and OpTile [4] adaptively determined the size of the tiles to reduce the storage requirement of HTTP streaming. Reference [21] evaluated the existing tile-based methods to discuss their benefits over non-tile-based methods. Some studies compensated for the viewport prediction error by extending the predicted viewport [5]-[7]. Reference [6] extended k-tiles around the predicted viewport, while [5] extended the viewport area based on the user head movements. Other studies considered the characteristics of 360-degree videos for bit allocation. Reference [28] used 2D projection distortion as the weighted distortion of the bit allocation algorithm, and [29] developed a spherical bit-rate equalization technique for rate-distortion optimization.
The existing tile-based methods optimize the quality of each tile to meet the requirements of the wired/wireless bandwidth. However, in a realistic wireless channel, digital-based methods induce the cliff effect owing to the fluctuations in the wireless channel quality. The proposed 360Cast+ adopts analog modulation for viewport-adaptive 360-degree video delivery so that the viewport quality improves gracefully as the instantaneous wireless channel quality improves. In addition, 360Cast+ considers the characteristics of 360-degree videos, i.e., joint distortions of 2D projection and human perception, for power allocation to ensure better viewport quality on the user headsets.

C. VIEWPORT PREDICTION
The quality of viewport-based streaming highly depends on the accuracy of the viewport prediction. For accurate viewport prediction, there are non-learning-based [1], [2], [35]-[37] and learning-based [33], [38]-[40] prediction methods. Most existing studies on non-learning-based viewport prediction utilize a regression curve obtained from the past head movement, such as linear regression (LR) [1], [35], [37] and weighted LR (WLR) [2], [36]. Among the learning-based prediction methods, [38] used saliency map prediction based on the orientation data of multiple users, and [39] further adopted the recurrent neural network (RNN)-based long short-term memory (LSTM) model to predict future viewport movement from both the saliency map and past head orientation. Some studies [33], [40] discuss the effect of the LR-based and deep-learning-based methods on the viewport prediction. They found that the difference in prediction accuracy between the two methods is slight.
Our study designs a non-learning-based viewport prediction method. In contrast to the existing LR-based viewport prediction, we utilize DLR to predict both the future head orientation and the potential prediction error to reduce the effect of the prediction error. We found that a viewport that accounts for the potential prediction error can decrease the quality degradation owing to the prediction error with a slight increase in the bandwidth requirement.

D. SOFT IMAGE/VIDEO DELIVERY
Soft image/video delivery schemes [12], [13], [41]-[55] have been recently proposed to ensure that the received video quality is proportional to the instantaneous wireless channel quality. For example, SoftCast [12] skips quantization and entropy coding and uses analog modulation, which maps the discrete cosine transform (DCT) coefficients directly to the transmission signals. Some researchers, motivated by the concept that both the source and channel components have non-uniform energy distributions, utilized the distributions for joint source-channel coding. ParCast [13] extended SoftCast to a multiple-input multiple-output OFDM link and assigned high-energy source components to high-gain subchannels that could be utilized based on the non-uniform energy distributions. To accommodate multiple users with diverse channel conditions, ECast [50] proposed a joint source-subcarrier matching and power-allocation scheme to minimize the mean square error (MSE). Additionally, spatial scalability-enabled robust video broadcast (SSRVB) was used to address the multi-user scenario of both heterogeneous device resolutions and channel conditions by iterative joint subcarrier matching and power-allocation methods [42]. Other studies extended SoftCast to immersive video contents. FoveaCast [53] utilized the perceptual distortions in the HVS for soft image delivery to achieve a higher visual quality for users. FreeCast [45] adopted 5D-DCT for multi-view plus depth and exploited a fitting function based on a Gaussian Markov random field model for metadata overhead reduction. Reference [51] first considered soft delivery for 360-degree videos by using a combination of 1D-DCT and spherical wavelet transform. OmniCast [41] proposed two algorithms to find the block partition with the minimum 2D projection distortion for different sphere-to-plane projections.
Although the existing studies [41], [51] have designed soft delivery schemes for 360-degree videos, they require a large amount of video traffic because they send the full resolution of the 360-degree videos. To reduce the traffic in the soft delivery of 360-degree videos, 360Cast+ only sends the predicted and extended viewports considering the potential viewport prediction error. In addition, we designed our 360Cast+ for modern OFDM-based systems, i.e., frequency-dependent channel variations, by utilizing joint subcarrier matching and power allocation to minimize the end-to-end distortions in wireless 360-degree video delivery.

III. 360Cast+
A. OVERVIEW
This study proposes a novel soft and viewport-adaptive delivery scheme for 360-degree videos. Fig. 1 shows an end-to-end system of the proposed 360Cast+. We consider the past orientations of the user head, i.e., the pitch, yaw, and roll, that are transmitted from the user headset. Based on the past orientations and DLR, 360Cast+ predicts the future orientation and potential prediction error. Here, we consider the foveation point of the headset user as the predicted future orientation and extract the viewport from the full resolution of the 360-degree video based on the foveation point. The extracted viewport is then transformed into discrete wavelet transform (DWT) coefficients using 2D-DWT through the Daubechies 9-tap/7-tap filter. The DWT coefficients are divided into multiple chunks and scaled by chunk-wise power allocation before transmission. In this case, 360Cast+ finds an optimal match between the chunks and subcarriers for optimal power allocation. This optimal match is based on the variance of each chunk and the channel gain of each subcarrier. 360Cast+ then assigns a transmission power to each chunk to optimize the perceptual quality of the headset user by jointly considering the sphere-to-2D-plane projection distortion and the perceptual distortion in the HVS. The power-assigned DWT coefficients in each chunk are sequentially mapped to the I and Q components of the transmission symbols and transmitted over each subcarrier based on the matching result.

FIGURE 2. Viewport prediction based on the predicted head orientation and potential viewport prediction error.
At the decoder, the minimum MSE (MMSE) filter can provide an optimal linear estimate for the received DWT coefficients.

B. VIEWPORT PREDICTION
360Cast+ first predicts the future head orientation of the headset user based on the past head orientation received from the headset to estimate the user foveation point. Considering the 2D-projected 360-degree video frames, the user viewport can be determined from two attributes of the orientation, i.e., the yaw and pitch. Let $P = \{P_0, \ldots, P_t\}$ be a set of pitch/yaw attributes from an initial time, $P_0$, to the present time, $P_t$. The predicted pitch/yaw attribute, $P_{t+t_p}$, at the future time, $t + t_p$, can be obtained using DLR as follows:

where $w$ is the dynamic window size for LR and $f_w(*)$ is an LR function that uses the past pitch/yaw attributes from $P_{t-w}$ to $P_t$. In contrast to the standard LR model, DLR adaptively sets the window size, $w$, based on the inflection point, $\hat{j}$, obtained from the past head orientations. To find the inflection point from past orientations, 360Cast+ uses the three previous orientations for an arbitrary index, $j$, i.e., $P_{t-j-1}$, $P_{t-j-2}$, and $P_{t-j-3}$, and decides the inflection point, $\hat{j}$, satisfying

360Cast+ also predicts the potential prediction error of the head orientation to reduce the effect of the prediction error on the displayed video quality on the user headset. The past prediction error, $E_{P_t}$, at time $t$ can be derived between the predicted and actual head orientations as follows:

where $\{\ldots, E_{P_t}\}$ denotes the set of the past prediction errors. The potential prediction error corresponding to the predicted pitch/yaw attribute $P_{t+t_p}$ can be obtained from the DLR-based prediction as follows:

The predicted head orientation and potential prediction error are used to obtain the size of the transmission viewport. Fig. 2 displays an overview of viewport prediction based on the predicted head orientation and potential viewport prediction error. Let the predicted head orientation and potential prediction errors transformed to the 2D plane be denoted by $X_0 = (x_0, y_0)$ and $R = (r_x, r_y)$, respectively. In addition, we assume that the half-width and half-height of the user headset viewport corresponding to the FoV are $v_w$ and $v_h$, respectively. We first consider a tentative viewport centered at $(x_0, y_0)$ with half-width $v_w + r_x$ and half-height $v_h + r_y$. We then consider the chunks included in the tentative viewport as the extended viewport. 360Cast+ only sends extended viewports to the user headsets for traffic reduction.
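The prediction and viewport-extension steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the inflection rule (a sign change between consecutive sample differences), the half-open pixel rectangle, the omission of yaw wrap-around, and all function names are hypothetical choices made for this example.

```python
import math
from typing import List, Sequence, Tuple


def linear_fit_predict(values: Sequence[float], steps_ahead: int) -> float:
    """Fit a least-squares line to equally spaced samples and extrapolate it."""
    n = len(values)
    if n < 2:
        return values[-1]
    mean_t = (n - 1) / 2.0
    mean_v = sum(values) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in enumerate(values))
    den = sum((t - mean_t) ** 2 for t in range(n))
    slope = num / den
    return mean_v + slope * ((n - 1 + steps_ahead) - mean_t)


def dynamic_window(values: Sequence[float]) -> int:
    """ASSUMED inflection rule: walk back from the newest sample until the
    sample-to-sample difference changes sign; return how many samples to keep."""
    n = len(values)
    if n < 3:
        return n
    newest_diff = values[-1] - values[-2]
    window = 2
    for k in range(n - 2, 0, -1):
        if (values[k] - values[k - 1]) * newest_diff < 0:
            break  # the direction of head motion flipped here
        window += 1
    return window


def dlr_predict(values: Sequence[float], steps_ahead: int) -> float:
    """Dynamic LR: ordinary LR restricted to the samples after the inflection."""
    w = dynamic_window(values)
    return linear_fit_predict(values[-w:], steps_ahead)


def extended_viewport_chunks(x0: float, y0: float, vw: float, vh: float,
                             rx: float, ry: float, chunk: int,
                             frame_w: int, frame_h: int) -> List[Tuple[int, int]]:
    """(row, col) indices of chunks touched by the tentative viewport
    [x0-(vw+rx), x0+(vw+rx)) x [y0-(vh+ry), y0+(vh+ry)), clipped to the frame."""
    left = max(0, math.floor(x0 - (vw + rx)))
    right = min(frame_w, math.ceil(x0 + (vw + rx))) - 1
    top = max(0, math.floor(y0 - (vh + ry)))
    bottom = min(frame_h, math.ceil(y0 + (vh + ry))) - 1
    return [(r, c)
            for r in range(top // chunk, bottom // chunk + 1)
            for c in range(left // chunk, right // chunk + 1)]
```

The same `dlr_predict` helper could be applied to a sequence of past prediction errors (for instance absolute differences between predicted and observed angles, which is again an assumption) to obtain the margins `rx` and `ry` that enlarge the viewport.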

C. ENCODING
1) PROBLEM FORMULATION
The extracted viewport is then transformed into frequency components via 2D-DWT and divided into $N$ chunks with a resolution of $c_h \times c_w$ pixels. Let $x_i[j]$ denote the $i$-th analog-modulated symbol, which is the $i$-th chunk, $c_i$, scaled by a factor of $g_i$ for noise reduction, as follows:

During a transmission time slot, $N$ chunks are assigned to $N$ subcarriers in the OFDM systems. The receiver obtains the received symbol over wireless OFDM links, which is modeled as follows:

where $y_{i,j}$ is the received symbol of the $i$-th chunk over the $j$-th subcarrier, $h_j$ is the channel fading coefficient of the $j$-th subcarrier, and $n_i$ is the effective noise with a variance of $\sigma^2$. The transmitter performs optimal power control by selecting $g_i$ to obtain the best 360-degree video quality for the headset user. Accordingly, we define a weighted metric called weighted MSE (WMSE) as follows:

WMSE denotes the weighted mean square error between the original and reconstructed 360-degree video frames considering the sphere-to-2D mapping distortion, $D_s(\theta_i, \phi_i)$, and perceptual distortion, $S(v, X_i)$, in the HVS. Both $D_s(\theta_i, \phi_i)$ and $S(v, X_i)$ can be calculated based on the current location of the given point in the spherical and pixel domains, i.e., $(\theta_i, \phi_i)$ and $X_i = (x_i, y_i)$, respectively. The best $g_i$ should be obtained by minimizing the WMSE under the power constraint with the total power budget, $P$. When the transmission symbols of the $i$-th chunk are assigned to the $j$-th subcarrier, the square error, $\mathrm{MSE}_{i,j}$, in Eq. 3 can be obtained as follows:

Here, the total end-to-end distortion can be calculated when $N$ chunks are assigned to $N$ subcarriers as follows:

where $b_{i,j}$ is a binary value denoting whether the $i$-th chunk is assigned to the $j$-th subcarrier. It should be noted that $u_i = g_i^2 \lambda_i$ and $W_i = D_s(\theta_i, \phi_i) S(v, X_i)$. In this case, the optimization problem can be expressed as follows:

The optimization problem is a mixed binary programming problem, which is an NP-hard problem.
We divide the problem into two sub-problems (power allocation and subcarrier matching) to find a near-optimal solution. The power allocation problem can be optimally solved, and the subcarrier matching problem can be reformulated as an assignment problem.
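As a worked illustration of the formulation above, the per-chunk error of a linear MMSE estimate can be derived from the stated channel model. This is a standard textbook derivation given as a sketch, not a reproduction of the paper's numbered equations. It assumes a real-valued gain $h_j$, zero-mean and mutually uncorrelated $c_i$ and $n_i$, and that $\lambda_i$ denotes the variance of chunk $i$ (consistent with $u_i = g_i^2 \lambda_i$, but an assumption here).

```latex
\begin{align}
y_{i,j} &= h_j\, g_i\, c_i + n_i,
  \qquad \mathbb{E}[c_i^2] = \lambda_i, \quad \mathbb{E}[n_i^2] = \sigma^2 \\
\hat{c}_i &= \frac{h_j\, g_i\, \lambda_i}{h_j^2\, g_i^2\, \lambda_i + \sigma^2}\; y_{i,j} \\
\mathrm{MSE}_{i,j} &= \mathbb{E}\!\left[(c_i - \hat{c}_i)^2\right]
  = \lambda_i - \frac{h_j^2\, g_i^2\, \lambda_i^2}{h_j^2\, g_i^2\, \lambda_i + \sigma^2}
  = \frac{\sigma^2 \lambda_i}{h_j^2\, u_i + \sigma^2},
  \qquad u_i = g_i^2 \lambda_i \\
D &= \sum_{i=1}^{N} \sum_{j=1}^{N} b_{i,j}\, W_i\,
  \frac{\sigma^2 \lambda_i}{h_j^2\, u_i + \sigma^2}
\end{align}
```

Under these assumptions, the weighted distortion $D$ depends on the transmitter only through the per-chunk powers $u_i$ and the binary assignment $b_{i,j}$, which is why the problem separates naturally into a power allocation step and a matching step.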

2) POWER ALLOCATION
If the subcarrier matching table, $\{b_{i,j}\}$, is given, the optimization problem (5) can be reformulated as

From [56], this optimization problem is convex and can be solved by the Lagrange multiplier as follows:

By differentiating with respect to $u_i$ and $\gamma$, we obtain:

When $\sigma$ is small, $u_i$ can be simplified as follows:

Thus, an optimal scaling factor $g_i$ can be calculated:

The weight matrix, $W_i$, which consists of the 2D mapping distortions and perceptual distortions of the HVS, will be introduced in the following subsections.
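A small numerical sketch may help make the power allocation step concrete. It assumes the weighted distortion takes the form $\sum_i W_i \sigma^2 \lambda_i / (h_i^2 u_i + \sigma^2)$ with $\sum_i u_i = P$, where $h_i$ is the gain of the subcarrier already matched to chunk $i$ and $\lambda_i$ is the chunk variance. Setting the Lagrangian derivative to zero and dropping the $\sigma^2 / h_i^2$ term for small $\sigma$ gives $u_i \propto \sqrt{W_i \lambda_i} / |h_i|$. The distortion form, the function names, and the non-zero-gain requirement are assumptions of this example, not statements from the paper.

```python
import math
from typing import List, Sequence


def allocate_power(weights: Sequence[float], variances: Sequence[float],
                   gains: Sequence[float], total_power: float) -> List[float]:
    """High-SNR closed form: u_i proportional to sqrt(W_i * lambda_i) / |h_i|.
    Gains are assumed to be non-zero."""
    shares = [math.sqrt(w * lam) / abs(h)
              for w, lam, h in zip(weights, variances, gains)]
    total = sum(shares)
    return [total_power * s / total for s in shares]


def scaling_factors(powers: Sequence[float], variances: Sequence[float]) -> List[float]:
    """g_i = sqrt(u_i / lambda_i); a chunk with zero variance gets g_i = 0."""
    return [math.sqrt(u / lam) if lam > 0 else 0.0
            for u, lam in zip(powers, variances)]


def weighted_distortion(weights: Sequence[float], variances: Sequence[float],
                        gains: Sequence[float], powers: Sequence[float],
                        noise_var: float) -> float:
    """Assumed end-to-end distortion: sum of W_i * sigma^2 * lambda_i / (h_i^2 u_i + sigma^2)."""
    return sum(w * noise_var * lam / (h * h * u + noise_var)
               for w, lam, h, u in zip(weights, variances, gains, powers))
```

For two equally weighted chunks with variances 4 and 1 on unit-gain subcarriers and a budget of 3, this sketch assigns powers 2 and 1, which yields a lower assumed distortion than splitting the budget evenly.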

3) 2D MAPPING DISTORTION
In contrast to the conventional 2D videos, a 360-degree video is captured by an omnidirectional camera and mapped onto a sphere domain. The spherical 360-degree videos are then mapped onto the 2D plane using a linear projection technique.
Thus, the spherical distortions create unequal weights among the pixels of the 2D-projected 360-degree videos. Specifically, the pixels, $(\theta, \phi)$, in the spherical domain are projected to the pixels, $(x, y)$, in the 2D-plane domain. In this case, $d_p(x, y)$ represents the distortion between the original and reconstructed pixel values at the location $(x, y)$ in the 2D-plane domain. The spherical distortion can be defined as follows [44]:

where $J(x, y)$ is the Jacobian determinant, that is:
$$J(x, y) = \begin{vmatrix} \dfrac{\partial \theta}{\partial x} & \dfrac{\partial \theta}{\partial y} \\[2mm] \dfrac{\partial \phi}{\partial x} & \dfrac{\partial \phi}{\partial y} \end{vmatrix}.$$
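To illustrate how a projection-dependent weight behaves, the sketch below computes the row weights that WS-PSNR uses for the equirectangular projection, namely the cosine of the latitude at each pixel-row centre. It is offered as a stand-in for the spherical distortion term and is not claimed to be the exact expression used by 360Cast+; the function names and the per-chunk averaging are hypothetical.

```python
import math
from typing import List, Sequence


def erp_row_weights(height: int) -> List[float]:
    """WS-PSNR-style weight for each row of an equirectangular frame:
    cos of the latitude at the row centre (about 1 at the equator, about 0 at the poles)."""
    return [math.cos((row + 0.5 - height / 2.0) * math.pi / height)
            for row in range(height)]


def chunk_weight(row_weights: Sequence[float], top: int, chunk_h: int) -> float:
    """Average weight over the rows covered by one chunk (an illustrative choice)."""
    rows = row_weights[top:top + chunk_h]
    return sum(rows) / len(rows)
```

With a 2048-row frame, rows near the equator receive a weight close to 1 while rows near the poles receive a weight close to 0, so chunks near the poles would be given less transmission power under such a weighting.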

4) HUMAN PERCEPTUAL DISTORTION
Based on the predicted head orientation of the headset user, 360Cast+ determines the user viewport region that will be displayed on the headset from the full resolution of 360-degree videos. We consider that the user foveation point of the viewport is the same as the predicted head orientation. In this case, the user error sensitivity for the pixels within the viewport decreases as the distance between the foveation point and target pixel increases. 360Cast+ adopts the error sensitivity features [56], [57] of the pixel and wavelet domains into the transmission power allocation process to realize better visual quality in the HVS. To make our description more concise, the values of the variables are listed in Table 1. We first introduce the error sensitivity in the pixel domain. The empirical model of contrast sensitivity as a function of retinal eccentricity can be defined as:

where $CT_0$, $\alpha$, and $e_2$ denote the minimal contrast threshold, spatial frequency decay constant, and half-resolution eccentricity constant, respectively. The retinal eccentricity at location $X$ is calculated as follows:

where $N$ and $v$ denote the resolution of the transmission area and viewing distance, respectively. $d(X)$ is the distance between point $X = (x, y)$ and the foveation point, $(x_f, y_f)$.
In this case, the error sensitivity in the pixel domain is defined as the normalization of contrast sensitivity as follows:

where $\delta$ is the visual sensitivity when the spatial frequency, $f$, exceeds the threshold. The cutoff frequency is the minimum value of the critical invisible frequency, $f_c$, and the display Nyquist frequency, $f_d$, i.e., $\min\left(f_c, f_d\right)$ with $f_d = \frac{\pi N v}{360}$.
The error sensitivity defined in Eq. (6) is then extended to the wavelet domain [57]. The wavelet coefficients provide different perceptual distortions in the four sub-bands, i.e., LL, HL, LH, and HH. In the wavelet domain, the spatial frequency, $f$, is determined by the wavelet decomposition level, $l$, i.e., $f = r 2^{-l}$, where $r$ is the display resolution. The weight of the error sensitivity, $S_w(l, m)$, in the wavelet domain related to the sub-band $(l, m)$ is presented in Table 2. Finally, the visual sensitivity in the wavelet domain is defined as follows:

where $\beta_1$ and $\beta_2$ denote the weights of $s_w$ and $s_f$, respectively.
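The pixel-domain sensitivity described above can be sketched with the widely used foveation model of Geisler and Perry as applied by Wang and Bovik. The constants below (1/64, 0.106, and 2.3) are commonly cited values for that model and are assumptions here, since Table 1 is not reproduced in this section. The expression for the critical frequency, the unit of the viewing distance (multiples of the image width), the default value of `delta`, and the function names are likewise assumptions of this example. The wavelet-domain extension is not covered.

```python
import math

CT0 = 1.0 / 64.0  # minimal contrast threshold (assumed value)
ALPHA = 0.106     # spatial frequency decay constant (assumed value)
E2 = 2.3          # half-resolution eccentricity in degrees (assumed value)


def eccentricity_deg(dist_px: float, n_px: float, v: float) -> float:
    """Retinal eccentricity in degrees of a pixel dist_px away from the
    foveation point; v is the viewing distance in multiples of n_px."""
    return math.degrees(math.atan(dist_px / (n_px * v)))


def cutoff_frequency(ecc_deg: float, n_px: float, v: float) -> float:
    """Minimum of the critical frequency f_c and the display Nyquist frequency f_d."""
    f_c = E2 * math.log(1.0 / CT0) / (ALPHA * (ecc_deg + E2))
    f_d = math.pi * n_px * v / 360.0
    return min(f_c, f_d)


def pixel_sensitivity(f: float, dist_px: float, n_px: float, v: float,
                      delta: float = 0.0) -> float:
    """Normalized contrast sensitivity exp(-(alpha / e2) * f * e) below the
    cutoff frequency, and the floor value delta above it."""
    e = eccentricity_deg(dist_px, n_px, v)
    if f > cutoff_frequency(e, n_px, v):
        return delta
    return math.exp(-(ALPHA / E2) * f * e)
```

In this sketch, sensitivity equals 1 at the foveation point and decays with distance from it, which matches the qualitative behaviour described in the text: errors far from the predicted head orientation matter less and can be given less transmission power.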

5) SUBCARRIER MATCHING
We obtained the optimal power scaling factor by assuming an optimal subcarrier and chunk matching. To determine the optimal subcarrier matching table, $\{b_{i,j}\}$, between chunk $i$ and subcarrier $j$, the subcarrier assignment problem must minimize the end-to-end distortions using a matching table, which can be formulated as follows:

Thus, according to Eq. (5), a chunk should be appropriately assigned to a subcarrier based on the variance and channel gain to decrease the WMSE. Specifically, the chunks with larger variance should be allocated to subcarriers with higher channel gains. 360Cast+ sorts the chunks and subcarriers in descending order before power allocation; subsequently, it assigns the chunks to the subcarriers in this order. Fig. 3 illustrates an overview of the subcarrier matching operation. 360Cast+ uses a matrix whose columns and rows correspond to the number of transmission symbols and subcarriers, respectively. The rows are sorted in descending order based on the channel gain, $h_j$. The encoder also uses vectors of each chunk, $y_i$, and sorts the vectors in descending order based on the variance. Each vector includes $c_h \times c_w$ elements. The elements of the chunk with higher variance are assigned to OFDM channels with higher channel gain by the encoder in a sequential manner to maximize diversity gain. After the assignment, 360Cast+ allocates the frequency representation of each chunk to OFDM subcarriers based on the matrix. The algorithm has an average computational complexity of $O(n^2)$, compared with the existing subcarrier assignment algorithm, i.e., an auction algorithm with a computational complexity of $O(n^2 \log n)$.
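The sort-and-pair rule described above can be sketched as follows. One way to see why it works: under a high-SNR power allocation (an assumption of this example), the total distortion grows with $\sum_i \sqrt{W_i \lambda_i} / |h_{j(i)}|$, and by the rearrangement inequality this sum is smallest when the largest numerators meet the largest gains. The sketch sorts by variance only, as in the text, which corresponds to equal weights $W_i$; the function names and the brute-force checker are hypothetical helpers for small inputs.

```python
import math
from itertools import permutations
from typing import Dict, Sequence


def match_chunks_to_subcarriers(variances: Sequence[float],
                                gains: Sequence[float]) -> Dict[int, int]:
    """Pair chunk indices with subcarrier indices: largest variance with largest |gain|."""
    chunk_order = sorted(range(len(variances)), key=lambda i: variances[i], reverse=True)
    carrier_order = sorted(range(len(gains)), key=lambda j: abs(gains[j]), reverse=True)
    return dict(zip(chunk_order, carrier_order))


def high_snr_cost(mapping: Dict[int, int], variances: Sequence[float],
                  gains: Sequence[float]) -> float:
    """Assumed proxy for distortion with equal weights: sum of sqrt(lambda_i) / |h_j|."""
    return sum(math.sqrt(variances[i]) / abs(gains[j]) for i, j in mapping.items())


def brute_force_best_cost(variances: Sequence[float], gains: Sequence[float]) -> float:
    """Exhaustive search over all assignments; only feasible for tiny inputs."""
    n = len(variances)
    return min(high_snr_cost(dict(enumerate(p)), variances, gains)
               for p in permutations(range(n)))
```

For variances `[1, 9, 4]` and gains `[0.5, 0.1, 2.0]`, the chunk with variance 9 is sent on the subcarrier with gain 2.0, and the resulting proxy cost matches the exhaustive optimum.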

D. DECODING
At the receiver side, the received symbols, $y_i$, of chunk $i$ are filtered via an MMSE filter [12] as follows:

360Cast+ utilizes inverse 2D-DWT operations for the filtered symbols to reconstruct the pixel values of the transmitted viewport. Finally, the user headset renders the displayed viewport based on the real head orientation of the user.
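A linear MMSE filter for this kind of scaled analog transmission can be sketched as below. The sketch assumes a real-valued gain, the received model $y = h g c + n$, and that the receiver knows the chunk variance, the scaling factor, the channel gain, and the noise variance (for example from metadata and channel estimation). The function name and argument order are hypothetical.

```python
from typing import List, Sequence


def mmse_decode(received: Sequence[float], gain: float, scale: float,
                variance: float, noise_var: float) -> List[float]:
    """Linear MMSE estimate of the chunk coefficients from y = h * g * c + n:
    c_hat = (h * g * lambda) / (h^2 * g^2 * lambda + sigma^2) * y."""
    denom = gain * gain * scale * scale * variance + noise_var
    if denom == 0.0:
        # No signal power and no noise: nothing can be recovered.
        return [0.0 for _ in received]
    coeff = gain * scale * variance / denom
    return [coeff * y for y in received]
```

Without noise the filter simply inverts the scaling and the channel gain; as the noise variance grows, the coefficient shrinks and the estimate is pulled toward zero, which is the behaviour that lets quality degrade gradually rather than abruptly.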

E. ANALOG COMPRESSION
In the above designs, we consider that the available baud rate, i.e., bandwidth, is enough to send all the analog-modulated symbols within the viewport. If the available bandwidth and/or time resources are restricted for wireless channel use, the proposed 360Cast+ has to selectively transmit the DWT coefficients to fit the available bandwidth. For such cases, the proposed 360Cast+ discards the chunks in high-frequency components to fit the bandwidth. When the sender discards a chunk, the receiver regards all coefficients in the chunk as zeros. As a result, data compression can be accomplished even for the proposed 360Cast+.
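The discard-and-zero-fill behaviour can be sketched as follows. The text does not specify how chunks are ranked by frequency, so the `freq_rank` input (a lower value meaning a lower-frequency chunk) and both function names are hypothetical; the budget is expressed as the number of chunks that fit in the available bandwidth.

```python
from typing import List, Sequence


def select_chunks(freq_rank: Sequence[int], budget: int) -> List[int]:
    """Indices of the chunks to transmit: the `budget` lowest-frequency chunks,
    returned in ascending index order."""
    order = sorted(range(len(freq_rank)), key=lambda i: freq_rank[i])
    return sorted(order[:max(0, budget)])


def restore_chunks(sent: Sequence[Sequence[float]], kept: Sequence[int],
                   n_chunks: int, chunk_len: int) -> List[List[float]]:
    """Receiver side: place the received chunks back and treat every
    discarded chunk as all zeros."""
    restored = [[0.0] * chunk_len for _ in range(n_chunks)]
    for idx, coeffs in zip(kept, sent):
        restored[idx] = list(coeffs)
    return restored
```

The receiver only needs to know which chunk indices were kept in order to rebuild the full coefficient array before the inverse transform.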

IV. EVALUATION
A. SIMULATION SETTINGS
1) PERFORMANCE METRIC
We evaluated the performance in terms of the PSNR, structural similarity index measure (SSIM) [58], WS-PSNR [19], IV-PSNR [20], and the proposed weighted PSNR (WPSNR). PSNR is defined as follows:
$$\mathrm{PSNR} = 10 \log_{10} \frac{(2^L - 1)^2}{\varepsilon_{\mathrm{MSE}}}$$
where $L$ is the number of bits used to encode the pixel luminance (typically, 8 bits) and $\varepsilon_{\mathrm{MSE}}$ is the MSE between all pixels of the decoded and original videos. SSIM can predict the perceived quality of video streaming. A larger SSIM value, close to 1, indicates higher perceptual similarity between the original and decoded 360-degree video frames. As mentioned in Sec. II-A, WS-PSNR and IV-PSNR represent perceptual quality metrics for 360-degree video defined in CTC. WPSNR represents the 360-degree video quality considering the 2D projection distortions and human perceptual distortions in the HVS as follows:
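For reference, the conventional PSNR and one possible weighted variant can be computed as in the sketch below. The PSNR function follows the standard definition with peak value $2^L - 1$. The weighted version, which normalizes the weighted squared error by the sum of the weights, is an assumption made for illustration and is not claimed to match the paper's WPSNR; the function names are hypothetical.

```python
import math
from typing import Sequence


def psnr(original: Sequence[float], decoded: Sequence[float], bits: int = 8) -> float:
    """Standard PSNR in dB for pixel values with the given bit depth."""
    mse = sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)
    if mse == 0:
        return math.inf
    peak = 2 ** bits - 1
    return 10.0 * math.log10(peak * peak / mse)


def weighted_psnr(original: Sequence[float], decoded: Sequence[float],
                  weights: Sequence[float], bits: int = 8) -> float:
    """PSNR computed from a weighted MSE (ASSUMED normalization: sum of weights)."""
    wmse = sum(w * (a - b) ** 2
               for w, a, b in zip(weights, original, decoded)) / sum(weights)
    if wmse == 0:
        return math.inf
    peak = 2 ** bits - 1
    return 10.0 * math.log10(peak * peak / wmse)
```

With uniform weights the two functions agree, so the weighted metric differs from PSNR only to the extent that errors fall on pixels with larger or smaller weights.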

2) TEST DATASET
We used three different types of standard reference 360-degree videos, namely, Mega Coaster, Shark, and Pacman with a frame rate of 30 fps, 4:2:0 chroma sampling, and resolution of 3840 × 2048 pixels, along with 50 user head orientations derived from the headset sensors provided in [59]. First, we used the Mega Coaster reference 360-degree video and the head orientations of 10 users for comparison; subsequently, the other 360-degree videos in Section IV-E were used to discuss the effect of the 360-degree video categories. We assumed the FoV of the user headset to be 90 degrees × 90 degrees. In this case, the resolution of the viewport was set to 960 × 1024 pixels. We set the chunk size to 32 × 32 pixels for all comparative schemes.
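The viewport resolution and chunk count above follow from simple arithmetic, sketched below in Python. The sketch assumes an equirectangular frame spanning 360 degrees horizontally and 180 degrees vertically, with angles mapped linearly to pixels; this mapping is an assumption made here for illustration.

```python
FRAME_W, FRAME_H = 3840, 2048    # full 360-degree frame, in pixels
FOV_H_DEG, FOV_V_DEG = 90, 90    # headset field of view, in degrees
CHUNK = 32                       # chunk side length, in pixels

# Assumed linear angle-to-pixel mapping of an equirectangular frame.
viewport_w = FRAME_W * FOV_H_DEG // 360
viewport_h = FRAME_H * FOV_V_DEG // 180
n_chunks = (viewport_w // CHUNK) * (viewport_h // CHUNK)

print(viewport_w, viewport_h, n_chunks)  # 960 1024 960
```

Under these assumptions, a viewport is covered by 30 × 32 = 960 chunks of 32 × 32 pixels.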

3) WIRELESS SETTINGS
We simulated OFDM channels with 128 subcarriers, whose channel gains followed i.i.d. Rayleigh fading, i.e., $h_i \sim \mathcal{N}(0, 1)$. Here, $\sim$ means "distributed as" and $\mathcal{N}(a, b)$ is a Gaussian distribution with a mean of $a$ and variance of $b$. The effective noise, $n_i$, follows a white Gaussian distribution with a variance of $\sigma^2$, i.e., $n_i \sim \mathcal{N}(0, \sigma^2)$. We first set the available bandwidth to 24.3 MHz (= 1084 (coefficients in width) × 998 (coefficients in height) × 1.5 (color channels) × 30 (Hz) × 0.5 (symbols/coefficient)). This bandwidth is almost sufficient to send all the analog-modulated symbols within the predicted viewport in 360Cast+ because the average size of the predicted viewport is 998 × 1084 pixels. In Sec. IV-C, we discuss the effect of the available bandwidth on the video quality.
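As a quick check of the bandwidth figure, the product stated above can be evaluated directly. The variable names in this Python sketch are illustrative only; the factors are those given in the text.

```python
coeffs_wide = 1084          # coefficients in width
coeffs_high = 998           # coefficients in height
color_factor = 1.5          # color channels factor (4:2:0 sampling)
frame_rate_hz = 30          # frames per second
symbols_per_coeff = 0.5     # symbols per coefficient

bandwidth_hz = (coeffs_wide * coeffs_high * color_factor
                * frame_rate_hz * symbols_per_coeff)
bandwidth_mhz = round(bandwidth_hz / 1e6, 1)
print(f"{bandwidth_mhz} MHz")  # 24.3 MHz
```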

B. VIDEO QUALITY
To clarify the benefits of the proposed 360Cast+ over the existing video delivery schemes, we compared it with the existing digital-based and soft delivery schemes. The digital-based schemes use HM 16.20 for video compression and binary phase-shift keying (BPSK) modulation with either 1/2-rate or 1/4-rate convolutional coding. We prepared three existing soft delivery schemes: 360Cast, ParCast, and SoftCast. In particular, 360Cast+ and 360Cast deliver only a part of each 360-degree video frame, whereas the other schemes transmit the full resolution of the 360-degree video frames under the same transmission power budget. In addition, 360Cast+ and ParCast implement the proposed subcarrier matching algorithm, whereas 360Cast and SoftCast implement random subcarrier assignment.
Figs. 4 (a) through (e) show the video quality in terms of PSNR, SSIM, WS-PSNR, IV-PSNR, and WPSNR, respectively, as a function of the wireless channel SNR. We can see the following points:
• 360Cast+ prevents the cliff effect in low-SNR regimes and gracefully improves the received video quality as the wireless channel quality improves.
• 360Cast+ yields better performance than the existing ParCast and SoftCast schemes because it reduces traffic by delivering only the predicted viewport.
• 360Cast+ also yields better video quality than 360Cast because it allocates the power budget within the predicted viewport while accounting for the viewport prediction error and matches chunks to subcarriers appropriately.
• The digital-based schemes suffer from the cliff effect at a certain wireless channel SNR. This is because a large number of bit errors occur in the received bitstream even with low-rate channel coding, and these errors cause decoding failures at the receiver.
For example, the average IV-PSNR and WS-PSNR improvements of the proposed 360Cast+ over the scheme using BPSK with 1/4-rate convolutional coding are 13.1 dB and 12.0 dB, respectively, across wireless channel SNRs of 0 dB through 16 dB.

C. EFFECT OF AVAILABLE BANDWIDTH
In this section, we discuss the efficiency of the proposed 360Cast+ considering the available bandwidth. We compare the effect of the available bandwidth with that on the digital-based 360-degree video delivery schemes. We prepared two digital-based schemes: full resolution-based and viewport-based schemes. The full resolution-based scheme encodes and delivers the full resolution of the 360-degree video to the headset user, whereas the viewport-based scheme encodes and delivers only the user's viewport predicted by the proposed viewport prediction in 360Cast. Here, we use HM 16.20 for video compression and BPSK modulation with 1/4-rate convolutional coding. To discuss the effect of the available bandwidth, the digital-based schemes set QP parameters of 5, 9, 16, 28, and 50 for video coding.
Figs. 5 (a) and (b) show the video quality as a function of the available bandwidth at wireless channel SNRs of 5 and 15 dB, respectively. We can see the following points:
• The proposed 360Cast+ achieves the best video quality in band-limited environments irrespective of the wireless channel quality.
• At a wireless channel SNR of 15 dB, the digital-based schemes achieve better video quality in broadband environments.
• At a low wireless channel SNR of 5 dB, the digital-based schemes cannot reconstruct the 360-degree video from the received bitstream owing to a large number of bit errors.

D. EFFECT OF SUBCARRIER ASSIGNMENT
The previous evaluations demonstrated that 360Cast+ yielded better performance than the comparative schemes owing to subcarrier matching. To further discuss the effect of subcarrier matching, we evaluated the viewport quality under different subcarrier matching algorithms, i.e., random, SSRVB, and the proposed subcarrier matching algorithm. The random scheme assigns each chunk within a predicted viewport to a subcarrier uniformly at random. The SSRVB scheme iteratively assigns a chunk to a subcarrier based on the auction algorithm proposed in [41]. Each scheme delivers the same-sized predicted viewport to the headset user according to DLR-based viewport prediction. Fig. 6 shows the video quality of the proposed 360Cast+ under different subcarrier matching algorithms as a function of the wireless channel SNR. The proposed subcarrier matching algorithm outperforms the other algorithms irrespective of the wireless channel SNR. For example, at a wireless channel SNR of 10 dB, the video quality of the proposed subcarrier matching algorithm is 7.6 and 1.3 dB higher than that of random matching and SSRVB, respectively.

E. EFFECTS OF USER HEAD MOVEMENT AND 360-DEGREE VIDEO SEQUENCES
In this section, we evaluate the performance improvement of the proposed 360Cast+ under the effects of weighted power allocation and viewport prediction error. We first evaluated the video quality of the proposed 360Cast+ and the comparative schemes for 10 different headset users as a function of the wireless channel SNR. Here, 360Cast+ (viewport only) represents our 360Cast+ scheme without considering the viewport prediction error. 360Cast+ (ParCast Power Allocation) performs power allocation without considering the perceptual distortions in the HVS and 2D projection distortions. To compare the effect of user head movements in the comparative schemes, we calculated the average DLR-based viewport prediction error for each headset user during 360-degree video playback. User 4 had the lowest prediction error, 0.875 pixels, whereas user 1 had the highest, 33.375 pixels.
Fig. 7 (a) shows the WPSNR performance of headset user 1 as a function of the wireless channel SNR. The proposed 360Cast+ outperformed the 360Cast+ (viewport only) and 360Cast+ (ParCast Power Allocation) schemes by up to 1.8 and 0.9 dB, respectively, at a wireless channel SNR of 5 dB. In addition, the performance difference between the proposed 360Cast+ and 360Cast+ (viewport only) increased as the wireless channel SNR improved. This is because the viewport prediction error has a larger effect on quality degradation in 360Cast, especially in high-SNR regimes. Fig. 7 (b) shows the WPSNR performance of headset user 4, and Fig. 7 (c) shows the average WPSNR performance across the 10 headset users, both as a function of the wireless channel SNR. In Fig. 7 (b), it can be observed that 360Cast+ also yields better WPSNR performance than the comparative schemes with DLR-based viewport prediction, including 360Cast+ (viewport only) and 360Cast+ (ParCast Power Allocation).
This indicates that the proposed 360Cast+ can realize better viewport quality irrespective of the viewport prediction accuracy by extending viewports based on the potential viewport prediction error and scaling them with the perceptual weight. Fig. 8 shows the average video quality of the comparative schemes across 10 headset users for the test 360-degree video sequences of Mega Coaster, Shark, and PacMan at a wireless channel SNR of 5 dB. The videos belong to three categories, respectively: (i) natural image, fast-paced head movement; (ii) natural image, slow-paced head movement; and (iii) computer graphics (CG), fast-paced head movement. Accordingly, the following two observations were obtained:
• The proposed 360Cast+ achieves the best performance irrespective of the 360-degree video categories.
• As the speed of the user head movements increases, the received video quality decreases in all comparative schemes owing to the increase in the viewport prediction error.

F. VISUAL QUALITY
Finally, we discuss the visual quality of the comparative schemes. Fig. 9 shows a received frame of each comparative scheme using the test 360-degree video sequence of Mega Coaster at a wireless channel SNR of 10 dB. The area within the red rectangle represents the displayed viewport on the user headset. The conventional SoftCast scheme yielded the worst visual quality with considerable blurring; the ParCast scheme achieved better visual quality by utilizing the channel diversity gains. In comparison to the ParCast scheme, 360Cast delivered only the DWT coefficients within the predicted viewport. It showed less visual degradation due to the channel noise by assigning more transmission power to the limited coefficients. However, in this case, the displayed viewport contained a large untransmitted area, i.e., the black rectangle caused by the viewport prediction error. Because the proposed 360Cast+ delivers an extended viewport to reduce the effect of the viewport prediction error, the transmitted area covers the red rectangle while only the extended viewport is sent.

G. DISCUSSION ON DELAY ISSUE IN VR SYSTEMS
Finally, we discuss the delay issue in the proposed 360Cast+ and the existing digital-based schemes. The delay issue in VR video streaming is known as event-to-eye delay [60].
The event-to-eye delay mainly consists of the video capturing and stitching delays on the camera, encoding and transmission delays on the server, and decoding and rendering delays on the headset. The main differences in the event-to-eye delay between the proposed 360Cast+ and the digital-based 360-degree video delivery are the encoding and decoding delays. We note that, as discussed in Sec. IV-C, the proposed 360Cast+ can realize compression according to the available bandwidth by discarding high-frequency coefficients. In this case, the transmission delays of the proposed 360Cast+ and the digital-based schemes can be regarded as the same. We evaluated the encoding and decoding delays in the proposed 360Cast+ and the digital-based schemes to discuss the effect on the event-to-eye delay. Specifically, we used the Mega Coaster 360-degree video with a resolution of 3840 × 2048 pixels. Here, we set the resolution of the viewport to 960 × 1024 pixels. The operating environment was a Windows 10 64-bit operating system with an Intel(R) Core(TM) i7-8750H CPU and 16 GB of memory. The proposed 360Cast+ used a MATLAB encoder and decoder, whereas the digital-based schemes used HM 16.20 for 360-degree video encoding and decoding. Table 3 shows the total encoding and decoding delay for eight 360-degree video frames at full resolution and at viewport resolution using HM 16.20 with different QP parameters. Here, the full resolution-based and viewport-based schemes are the same as those in Sec. IV-C. The digital-based schemes integrate nonlinear operations, including intra and inter prediction, quantization, and entropy coding, for efficient coding; however, such integration may cause long encoding and decoding delays. On the other hand, the total encoding and decoding delay for eight 360-degree video frames in the proposed 360Cast+ is approximately 44.5 s.
The proposed 360Cast+ skips such nonlinear operations and instead performs only 2D-DWT with power allocation for coding. The proposed 360Cast+ realizes at least a hundredfold and a twentyfold improvement compared with the full resolution-based and viewport-based digital schemes, respectively, and may contribute to reducing the event-to-eye delay in wireless VR video delivery.

V. CONCLUSION
This study proposed 360Cast+ to realize viewport-adaptive and efficient delivery of 360-degree videos. 360Cast+ overcame the issues involved in digital-based wireless 360-degree video delivery, i.e., the cliff effect, constant quality, and large perceptual redundancy, by integrating analog modulation, DLR-based viewport prediction, and optimal power allocation considering the joint distortions between 2D projection and human perception. In addition, 360Cast+ reduced the effect of the viewport prediction error and frequency-dependent channel environment in modern OFDM systems by integrating viewport extraction considering the potential prediction error and a chunk-subcarrier matching algorithm. The evaluations demonstrated that the proposed 360Cast+ can yield better viewport quality in comparison to the existing digital-based and soft delivery schemes, irrespective of the viewport prediction error, through modern OFDM channels.
One potential issue with the proposed 360Cast+ is that it requires a sophisticated modulator and demodulator to realize the analog modulation. To carry out the analog modulation in practical scenarios, we will discuss the integration of the proposed 360Cast+ with a System on Chip (SoC) [61], [62] as future work.