Subjective QoE of 360-Degree Virtual Reality Videos and Machine Learning Predictions

360-degree video provides an immersive experience to end-users through Virtual Reality (VR) Head-Mounted Displays (HMDs). However, understanding the Quality of Experience (QoE) of 360-degree video is not trivial, since many factors influence the user experience when watching a 360-degree video in VR. This manuscript presents a machine learning-based QoE prediction of 360-degree video in VR that considers two key QoE aspects: perceptual quality and cybersickness. In addition, we propose two new QoE-affecting factors for the QoE evaluation: the user's familiarity with VR and the user's interest in the 360-degree video. To this end, we first conduct a subjective experiment on 96 video samples and collect datasets from 29 users for perceptual quality and cybersickness. We design a new Logistic Regression (LR) based model for QoE prediction in terms of perceptual quality. The prediction accuracy of the proposed model is compared against well-known supervised machine-learning algorithms, namely k-Nearest Neighbors (kNN), Support Vector Machine (SVM), and Decision Tree (DT), with respect to accuracy rate, recall, f1-score, precision, and mean absolute error (MAE). LR performs well with 86% accuracy, which is in close agreement with subjective opinion. The prediction accuracy of the proposed model is then compared with existing QoE models in terms of perceptual quality. Finally, we build a Neural Network-based model for QoE prediction in terms of cybersickness. The proposed model performs well against state-of-the-art QoE prediction methods in terms of cybersickness.


I. INTRODUCTION
360-degree videos (360°), also known as panoramic, omnidirectional, or spherical videos, are an emerging multimedia technology that offers an immersive viewing experience to end-users. Unlike traditional 2D videos, 360° videos cover a wide viewing area and therefore require additional bandwidth for a satisfactory viewing experience. Head-Mounted Displays (HMDs) from different manufacturers, also known as VR headsets, can be used to enjoy the extra immersion that 360° videos offer. These videos require much higher bandwidth [1], [2] and may result in poor user QoE when transmitted over a bandwidth-constrained network. To offer excellent end-user QoE, these videos should have high resolution and quality, which results in large file sizes. Therefore, for efficient streaming and satisfactory QoE of 360° video, it is important to gain an in-depth understanding of QoE-affecting factors.
The associate editor coordinating the review of this manuscript and approving it for publication was Yiming Tang.
QoE reflects how users perceive the overall value of a service, and therefore it depends on subjective measures. Research on QoE-affecting factors is thus of great importance for QoE evaluation. Two significant QoE aspects are perceptual quality and cybersickness. Perceptual quality indicates how satisfied a user is with the quality of a video watched with an HMD. Cybersickness is the feeling of nausea and dizziness when watching 360° video in a virtual environment. Since cybersickness does not occur while watching traditional videos on a regular screen (as opposed to VR glasses or goggles), it is one of the critical differences in viewer experience between traditional videos and 360° video. Previous studies show that, for traditional video, blurring and blocking artifacts caused by compression and the rendering device may degrade QoE [3]-[7]. Existing studies on the QoE-affecting factors and QoE aspects of 360° video are still limited [8]-[10]. Most existing research focuses on 1) perceptual quality, considering affecting factors such as gender, usage history, and rendering device, and 2) cybersickness, considering the effect of content motion [9]-[12]. Besides these factors, researchers should consider the user's interest in the 360° video being watched. Another affecting factor is the user's familiarity with VR, which can influence the user's judgement of video quality. Users with regular VR experience may have a different opinion about the video quality than first-time viewers or viewers who rarely use VR. To the best of the authors' knowledge, no previous study has taken into account the effect of the user's interest in a video and the user's familiarity with VR on cybersickness and perceptual quality. Besides, the collective impact on perceptual quality of the six factors considered in this manuscript (QP, resolution, rendering device, gender, user's interest, and user's familiarity with VR) has not been investigated.
There are many studies and approaches in the literature for the QoE prediction of videos [13]-[16], which mainly fall into three categories: subjective QoE assessment, objective assessment, and data-driven QoE methods. Subjective QoE assessment is the most promising method because it directly reflects the viewers' judgment: participants are asked to judge the QoE while watching videos, and each subject gives a score after watching a video in a subjective experiment. Objective assessment uses mathematical tools and is mostly based on reference-based classification methods or the human visual system. Data-driven methods realize QoE modelling and prediction by analyzing a collected dataset. Based on the subjective scores, researchers can use the recorded data as a training set for QoE prediction with various Machine Learning (ML) algorithms. In recent years, ML-based QoE prediction for multimedia streaming has emerged. The intent of applying ML is to model an unknown target concept from observations. ML algorithms have been used to predict the QoE for traditional videos [13], IPTV services [14], online video service provisioning [15], mobile video transmission [16], and 3D-immersive media streaming [17]; however, little is known in the area of 360° video. This manuscript intends to investigate the impact of six key QoE-affecting factors on users' perceptual quality and the effect of three QoE-affecting factors (gender, user's interest, and user's familiarity with VR videos) on users' cybersickness level. Besides, four supervised machine-learning algorithms (LR, kNN, DT, and SVM) are applied to the subjective dataset to predict the QoE in terms of perceptual quality; these algorithms have not previously been used collectively for the QoE prediction of 360° video in VR. We propose an LR-based model for the prediction of QoE in terms of perceptual quality. Moreover, for the prediction of QoE in terms of cybersickness, we propose an ANN-based model. To this end, we focus on the QoE evaluation of 360° videos in terms of two aspects and six QoE-affecting factors. Our main contributions are fourfold.
• First, we subjectively investigate the impact of six QoE-affecting factors, namely quantization parameter (QP), resolution, rendering device, gender, user's interest, and user's familiarity with VR, on perceptual quality. Besides, we subjectively evaluate the impact of three QoE-affecting factors (gender, user's interest, and user's familiarity with VR) on cybersickness.
To this end, we conducted a subjective experiment on 96 video samples and obtained a dataset from 29 subjects to investigate the QoE. The statistical results from the subjective experiment provide various findings on QoE impact factors in terms of perceptual quality and cybersickness.
• Second, we present two new QoE-affecting factors: the user's familiarity with VR and the user's interest in the 360° video. The impact of these two influence factors on users' QoE is evaluated in a subjective experiment.
• Third, we propose an LR-based model for QoE prediction in terms of perceptual quality. The proposed model is trained on the perceptual quality (MOS) dataset obtained from the subjective experiment, and QoE is then predicted in terms of perceptual quality. The prediction accuracy of the LR model is compared against well-known supervised ML algorithms (kNN, SVM, and DT) and existing QoE prediction models. The prediction accuracy of the proposed model is higher than that of kNN, SVM, DT, and the existing QoE models.
To the best of the authors' knowledge, this is the first study that collectively applies the four supervised ML methods (LR, kNN, DT, and SVM) for the QoE prediction of 360° video in VR.
• Fourth, we use an ANN-based model for QoE prediction in terms of cybersickness. The ANN model is trained and tested on cybersickness data obtained from the subjective test. The prediction accuracy of the proposed model is then compared against existing cybersickness prediction models. The proposed ANN-based model achieves higher QoE prediction accuracy in terms of cybersickness.
The remainder of this manuscript is structured as follows. Section II discusses the related work relevant to the subject of this manuscript. Section III-A describes the QoE influence factors of 360° video, and Section III-B describes the subjective experiment and test video generation. The subjective results and analysis are presented in Section IV, and the ML-based QoE prediction is presented in Section V. Section VI covers the experimental evaluation and performance comparison of the ML algorithms. Section VII presents the ANN-based QoE prediction in terms of cybersickness. Finally, Section VIII concludes the manuscript.
VOLUME 8, 2020

II. RELATED WORK
In this section, we review the related work relevant to the scope of our study. First, we present the related literature on the factors that affect the QoE in various aspects. Then, we discuss the machine learning-based QoE prediction methods found in the literature on traditional videos and 360° videos.
A. 360-DEGREE VIDEOS QoE-AFFECTING FACTORS AND SUBJECTIVE EVALUATIONS
360° videos contain considerably more information than traditional ones, and thus they have higher resolutions and are encoded at higher bitrates. Besides, to fully immerse the viewer in the virtual environment and cover the wide viewing area, an HMD is used to watch such videos, which makes it challenging for researchers to analyze the QoE and to consider the different factors that may affect it. A substantial number of approaches have been applied to evaluate the QoE of traditional 2D videos, and many QoE-affecting factors and aspects have been considered. Although a considerable number of studies have been published on the QoE of traditional video [18]-[23], only a few studies on 360° videos [24]-[30] have been published recently, and much more attention is needed to investigate QoE-affecting factors and aspects. In [12], the authors evaluate the QoE of omnidirectional videos in terms of simulator sickness; their study emphasizes the impact of different HMD devices. The authors in [31] investigate the direction and velocity of head motion and of object motion in VR. Their study also analyzes content features such as object motion and background complexity to understand how viewers feel under diverse conditions, collecting sickness level ratings from 80 subjects. Regarding the perceptual quality of 360-degree video, the authors in [32] subjectively evaluate the effect of encoding parameters, content type, and rendering device on QoE, taking the user's profile into account. Their study concludes that viewers are more tolerant of the encoding parameters when they watch an interesting 360-degree video in VR than when they watch a non-interesting one. They also reveal that viewers are sensitive to the device type and give higher MOS scores when watching with HTC Vive than with Google Cardboard.
The authors in [33] collected subjective scores considering three influence factors: projection scheme, QP, and video characteristics. The study builds a QoE model using linear regression based on these subjective scores. To minimize the cybersickness level in 360° video, the authors in [34] proposed a method that applies image processing with a wearable 360 camera. Their study reveals that a sudden speed change in translational and vibrational camera motion can result in severe cybersickness. Most of the existing studies investigate the QoE in terms of perceptual quality by considering head motion, object motion in the video, background complexity, and projection scheme. Most importantly, in the previous literature, the cybersickness aspect is evaluated by taking into account content motion, such as low-, medium-, and fast-motion videos.
The most comprehensive study on QoE-affecting factors and QoE aspects for 360° videos in VR, and the closest to our work, is presented by Huyen T.T et al. in [35]. Their study investigated the subjective QoE in terms of presence, acceptability, perceptual quality, and cybersickness. The authors considered the effect of four factors on QoE: encoding parameters, content characteristics, rendering device, and mode. Their results show that, compared to perceptual quality, the presence score is lower and strongly affected by content characteristics. Furthermore, the study analyzed the impact of low-, medium-, and fast-motion content on the user's cybersickness level. The authors concluded that cybersickness is a significant problem when watching fast-motion videos in VR. The authors extend their work in a recent subjective study [11], which examines the effect of content motion in more detail by analyzing content motion types such as object, camera, and background motion. That study concluded that the impact of content motion on cybersickness is serious, while the impact of the rendering device is insignificant. In terms of the cybersickness aspect, these two studies focused on content motion and device type. In contrast, we investigate the impact of all six QoE-affecting factors on perceptual quality, including two new QoE factors (user's interest and user's familiarity with VR). Besides, three influence factors that induce cybersickness in a virtual environment (gender, user's interest, and user's familiarity with VR) are also investigated in the subjective experiment. These three factors are shown to have a substantial influence on the user's cybersickness level in a virtual environment.

B. ML-BASED QoE PREDICTION METHODS
A review of the literature shows that ML techniques have been used extensively for the QoE prediction of VR videos [36]-[38]. Previous research in [39]-[42] presents the capability of various ML algorithms to classify the values of QoE-affecting factors, such as video quality and stalling. A Deep Neural Network (DNN) model is proposed in [16] to interpret the relation between subjective QoE scores and network parameters. The viewers were shown pre-installed traditional videos from a dataset of 80K samples with 89 features, and the subjects were asked to give their opinion about video quality, stalling, loading, and the overall quality of the video. For YouTube QoE in cellular networks, a dataset of distributed network and QoE measurements on real users' smartphones is collected in [43]. The authors proposed ML models to predict four QoE-affecting factors: initial delays, video quality switches, number of stalling events, and stalling frequency. The study also applies multiple ML techniques to the prediction of MOS scores and user engagement. These techniques include random forest, DT, SVM, kNN, and Naive Bayes (NB); the random forest classifier performs better than the other ML algorithms. In a recent study [44], a data analysis model is proposed to analyze video based on MOS collected from various datasets. The dataset includes the MOS values of three QoE factors: initial buffering, stalling time, and stalling ratio. The QoE is predicted by combining two ML methods, LR and K-means clustering. The study concludes that the proposed technique achieves a 97% performance improvement compared to existing prediction methods. The above-mentioned machine learning techniques for QoE estimation and prediction are applied only to traditional videos, not to 360° videos. Unfortunately, there is still limited work on ML-based QoE prediction of 360° video in VR.
Regarding 360° videos, the authors of a recent study [45] subjectively evaluate the effect of various stalling events on end-user QoE under three different bitrate levels (1 Mbps, 5 Mbps, and 15 Mbps). They also investigate the interaction between stalling events and bitrate level, and propose a Bayesian Inference Method (BIM) to predict the end-user QoE of 360-degree video in VR. Their study concludes that the adverse effect of multiple stalling events in a single video sequence is more pronounced when the presentation quality level approaches the high and low ends. Another study [46] proposes a Head Movement (HM) saliency prediction method based on a deep reinforcement learning approach. The study proposes offline-DHP to predict the HM map of 360° videos and online-DHP to predict the HM position of one subject at the next frame, based on the 360° video and the HM scanpath up to the current frame. The authors claim that their proposed method is effective in both offline and online prediction of head movement.
Recent work in [47] applies deep reinforcement learning and proposes a DRL360 model for 360° videos that can adapt to dynamically varying features of the environment and optimize different QoE objectives. The proposed technique uses a DNN to predict the future viewport and bandwidth, and assigns the rate for each tile with an actor-critic algorithm. The model outperforms existing methods by 20%-30%. The aforementioned machine-learning methods mostly focus on head movement and future viewport prediction. In this manuscript, we collect a dataset from a subjective test and fit various ML algorithms to it. Four supervised ML techniques (LR, kNN, SVM, and DT) are trained on the perceptual quality dataset, while an ANN is trained on the cybersickness data obtained from the subjective experiment, and the QoE is predicted. The subjective QoE evaluation and machine learning-based prediction framework used in this manuscript is shown in Figure 1.

III. DESCRIPTION AND EXPERIMENTAL SETUP
A. 360-DEGREE VIDEO QoE INFLUENCE FACTORS
To investigate the influence of various factors on two QoE aspects, the user's perceptual quality and cybersickness, we choose six factors that affect the end-user QoE: QP, resolution, rendering device, gender, user's interest, and user's familiarity with VR. The impact of these factors on the user's perception and cybersickness is recorded during a subjective experiment. The QoE-affecting factors considered in our study are shown in Table 1, together with feature descriptions and QoE aspects.

1) QP AND RESOLUTIONS
Both encoding parameters (QP and resolution) play a vital role in video quality. Changing the QP and resolution of a video affects the end-users' QoE. In this study, we evaluate the impact of various QPs and resolutions on users' QoE in terms of perceptual quality during a subjective test. To investigate the influence of these factors, we choose four QPs (22, 28, 34, and 40) and four resolutions (HD, fHD, 2.5K, and 4K).

2) RENDERING DEVICE
Different HMD devices have distinct effects on viewers watching 360° video in VR. HMDs from many manufacturers are easily accessible to users, which makes QoE evaluation challenging for researchers because a user's opinion with one device may differ from that with another type of HMD. Therefore, in this work, we use two different types of rendering devices (HTC Vive and Google Cardboard) during the subjective experiment. The two devices render 360° video differently: HTC Vive is directly connected to a desktop PC, while Google Cardboard is used with a smartphone. The impact of both HMDs on the user's perceptual quality is recorded.

3) GENDER
In our subjective study, we include both male and female users. The effect of cybersickness (dizziness, nausea, eye strain, headache, disorientation, and vomiting) on different genders is recorded on a five-point scale ranging from ''1'' to ''5'', where ''5'' means very dizzy and ''1'' means not dizzy.

4) USER's INTEREST
The user's interest in a video plays a vital role when investigating perceptual QoE in a virtual environment, and it significantly affects the user's QoE. Since viewers immerse themselves in a virtual environment, their opinion differs depending on whether or not they are interested in the video they are watching. A user's tolerance of perceptual video quality varies according to their interest. A user may be less sensitive and more tolerant when watching an interesting video, so that even lower video quality influences the viewer's opinion less than it would for a non-interesting video. Similarly, in the case of cybersickness, the less interested the user is in the content, the more bored he or she becomes, which also results in greater dizziness or nausea. Therefore, it is essential to record the user's interest in each 360° video during a subjective experiment to evaluate the QoE.

5) USER's FAMILIARITY WITH VR
The viewer's prior experience with VR affects the end-user's opinion. Viewers watching 360° video in a virtual environment for the first time judge video content differently from those who watch regularly or rarely. In this study, we include users with different prior VR experience and familiarity (users watching for the first time, rarely, and weekly).

B. SUBJECTIVE USER STUDY
In total, twenty-nine subjects participated in the subjective experiment, including nineteen males and ten females. To assess the QoE of 360° videos on different HMDs, we select two devices, HTC Vive and Google Cardboard, for the subjective test in a virtual environment. The HTC Vive has a resolution of 2160 × 1200 and a field of view (FoV) of 110 degrees, and is directly connected to a desktop PC. Virtual Desktop software installed on the PC is used as the 360° video player. Google Cardboard, with a 90-degree FoV that covers 75% of the human FoV, is used with a Samsung Galaxy A7 mobile phone with a 5.7-inch display and 1920 × 1080 resolution. Six different 360° videos are downloaded from YouTube. Figure 2 shows example frames, and the details of the source videos are given in Table 2. The source videos have diverse content covering a wide range of spatial information (SI) and temporal information (TI) indexes, which are shown in Figure 3. Each source video is cut to a short clip of 1-minute duration. Each clip is encoded with four different QPs (22, 28, 34, and 40) at four resolutions: 3480 × 1920 (4K), 2560 × 1400 (2.5K), 1920 × 1080 (fHD), and 1280 × 720 (HD). We use the FFmpeg software tool with H.264 (libx264) to create the 16 different video streams from the four QPs and four resolutions. Moreover, the audio track of the videos is discarded during the encoding process to exclude the impact of acoustic information. In total, 96 videos are used in our experiment. Before the actual test, Snellen (20/20) and Ishihara charts were used to check the visual acuity and color vision of all subjects. All subjects were reported as normal. Participants completed a training session before the test to familiarize them with the actual test procedure and to help them adjust the HMD to their head size [12]. The subjects were seated on a swivel chair so that they could move their heads freely and cover the wide region of the 360° video.
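The encoding grid described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' actual script: the source file names, the output naming scheme, and the helper `build_commands` are hypothetical, and the exact FFmpeg command line used in the study is not given in the text. The resolutions are copied as printed. The sketch only builds the argument lists and does not run FFmpeg.

```python
from itertools import product

# Encoding grid taken from the text: four QP values and four resolutions.
QPS = [22, 28, 34, 40]
RESOLUTIONS = {
    "4K": (3480, 1920),
    "2.5K": (2560, 1400),
    "fHD": (1920, 1080),
    "HD": (1280, 720),
}
# Hypothetical file names standing in for the six source videos.
SOURCES = [f"source_{i}.mp4" for i in range(1, 7)]


def build_commands(sources, qps, resolutions, clip_seconds=60):
    """Return one FFmpeg argument list per (source, QP, resolution) combination."""
    commands = []
    for src, qp, (label, (width, height)) in product(sources, qps, resolutions.items()):
        out = f"{src.rsplit('.', 1)[0]}_{label}_qp{qp}.mp4"
        commands.append([
            "ffmpeg", "-i", src,
            "-t", str(clip_seconds),             # keep a 1-minute clip
            "-vf", f"scale={width}:{height}",    # target resolution
            "-c:v", "libx264", "-qp", str(qp),   # H.264 with a fixed QP
            "-an",                               # drop the audio track
            out,
        ])
    return commands


commands = build_commands(SOURCES, QPS, RESOLUTIONS)
print(len(commands))  # 6 sources x 4 QPs x 4 resolutions = 96
```

Each argument list could be passed to `subprocess.run` once real source files are available; the count of 96 matches the number of test videos reported in the text.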
A five-point ACR scale was used according to ITU [48], corresponding to the experience levels ''excellent 5'', ''good 4'', ''fair 3'', ''poor 2'', and ''bad 1''.
During the test, subjects were randomly divided into two groups. Each user in a group watches 48 video samples. The purpose of dividing the subjects into two groups is to avoid the negative impact of a long experiment. Each subject takes 1.5 hours, including a 15-minute break after every 16 samples (48 samples in each group). Therefore, the total test time was almost 40 h. After watching each sample, the subjects were asked four questions. The questions included in our subjective test are listed below. During the subjective experiment, we recorded five subjects with first-time experience of 360° VR videos, eighteen subjects who rarely watch them, and six subjects who watch them weekly. Each subject spoke their score aloud so that the test organizer could note it down on paper to save time. The average rating of each user for each question is used as the MOS of that question. After conducting the subjective experiment, we applied the outlier removal scheme [49], and no subject was identified as an outlier.
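As a rough illustration of how ACR ratings turn into a MOS, the sketch below averages the scores given to each video sample. The rating tuples, sample identifiers, and the function `mean_opinion_scores` are hypothetical, and the sketch assumes that the MOS of a sample is the plain arithmetic mean of the collected scores; the study's exact aggregation is not spelled out beyond the sentence above.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical ratings: (subject_id, sample_id, ACR score on the 1-5 scale).
ratings = [
    ("s01", "v1_4K_qp22", 5),
    ("s02", "v1_4K_qp22", 4),
    ("s03", "v1_4K_qp22", 5),
    ("s01", "v1_HD_qp40", 2),
    ("s02", "v1_HD_qp40", 1),
    ("s03", "v1_HD_qp40", 3),
]


def mean_opinion_scores(rows):
    """Average the ACR scores given to each sample across subjects."""
    by_sample = defaultdict(list)
    for _subject, sample, score in rows:
        if not 1 <= score <= 5:
            raise ValueError(f"score {score} is outside the 1-5 ACR scale")
        by_sample[sample].append(score)
    return {sample: mean(scores) for sample, scores in by_sample.items()}


mos = mean_opinion_scores(ratings)
for sample, value in mos.items():
    print(sample, round(value, 2))
```

The same grouping could be keyed on any factor of interest (for example rendering device or familiarity group) instead of the sample identifier.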

IV. SUBJECTIVE RESULTS AND ANALYSIS
The impact of six factors (QP, resolution, rendering device, gender, user's interest, and user's familiarity with VR) on perceptual quality was investigated, and the influence of three factors (gender, user's interest, and user's familiarity with VR videos) on cybersickness in VR was also evaluated. The results of the subjective experiment and their description are presented below.

A. IMPACT ON PERCEPTUAL QUALITY
1) USER's INTEREST
The subjective results show interesting outcomes on how the user's interest influences the perceptual quality and cybersickness while watching 360° video in VR. For users interested in a video, Figures 4(a), 5(a), and 6(a) show the subjects' perceptual quality scores against QP and resolution. Figures 4(b), 5(b), and 6(b) depict the perceptual quality scores of users watching a non-interesting video. In all three figures (Figures 4, 5, and 6), it can be seen that users are less sensitive to the resolution and QP of the content when watching interesting content and more sensitive to them in the non-interesting case. Therefore, the results suggest that the user's interest plays an essential role in evaluating the QoE of 360° videos.

2) USER's FAMILIARITY WITH VR
The subject's prior experience of watching 360° video in a virtual environment has a significant effect on perceptual quality. Figure 4 depicts the perceptual quality scores of subjects watching 360° video in VR for the first time, plotted against the encoding parameters. Figure 5 shows the perceptual quality scores of users who rarely watch with an HMD, and Figure 6 shows those of users who watch 360° video in VR weekly. Among these three figures, the result in Figure 4 for viewers experiencing a virtual environment for the first time is notable: the MOS of first-time users is higher in both the interesting and non-interesting scenarios than that of rarely and weekly users, shown in Figures 5 and 6, respectively. One possible reason is that users watching in VR for the first time are more enthusiastic than rarely and weekly users, as there is a clear difference between the users' MOS scores. Therefore, the results suggest that the user's prior experience of watching 360° video in VR has a notable impact on perceptual quality.

3) QP AND RESOLUTIONS
The impact of QP and resolution on perceptual quality is evident in Figures 4, 5, and 6. In all three figures, the MOS values for 4K are higher than for the other resolutions in both the interesting and non-interesting cases. Similarly, the MOS score for QP22 is higher than for all other QPs in all three figures. In most cases, the MOS score for HD resolution is lower than 3, which suggests that content providers should avoid HD resolution when encoding 360° video. Besides, the MOS score for QP22 and QP28 is higher than 3 in all cases. Therefore, viewers are evidently comfortable at QP22 and QP28 when watching both interesting and non-interesting videos.
Another notable fact from all three figures (Figures 4, 5, and 6) is that users with weekly experience are more tolerant of, and less sensitive to, the given QP and resolution when watching interesting 360° video. At the same time, these weekly users are very conscious of resolution and QP. Since weekly users are regular viewers of VR videos, their MOS results are significant. Interestingly, these viewers are also sensitive to video resolution in both interesting and non-interesting cases when they watch fHD and HD virtual reality videos. Based on these observations, content providers should take the requirements and interests of regular VR viewers into account in order to meet end-users' satisfaction and expectations.

4) RENDERING DEVICE
Figure 7 shows the impact of the two rendering devices on perceptual quality for the four resolutions. In the subjective test, the MOS recorded for HTC Vive is higher than that for Google Cardboard. For 4K resolution, the MOS is 4.8 with HTC Vive and 4.3 with Google Cardboard, a difference of 0.5. The average MOS difference between the two devices is only 0.1 when subjects watch 360° video at HD resolution, which is negligible; that is, viewers' QoE is almost the same on both devices at HD resolution. From Figure 7, we can conclude that the subjects' MOS is higher at all four resolutions when they watch through HTC Vive. These results suggest that the rendering device also has an impact on viewers' QoE. Hence, the HMD device type should be taken into account when investigating the QoE of 360° videos.

5) GENDER
Figure 8 depicts the average perceptual quality scores of males and females based on the user's familiarity with VR. One can see that the perceptual quality scores of males are higher than those of females for first-time and rarely users. Interestingly, for weekly users, the MOS value for the perceptual quality of females is higher. This suggests that female users who watch 360° video in VR regularly (weekly) rate perceptual quality higher than males do.

B. IMPACT ON CYBERSICKNESS
1) USER's FAMILIARITY WITH VR
Figure 9 depicts the impact of the user's familiarity with VR videos on cybersickness. Each bar represents the cybersickness level (SSQ score) of each user for each sample video based on their familiarity with VR: (a) first-time users, (b) weekly users, and (c) rarely users. Figure 9(a) shows that users watching in VR for the first time feel more cybersickness than the rarely and weekly users shown in Figure 9(b) and Figure 9(c). Weekly users are less affected by cybersickness than rarely users and first-time users. Interestingly, the perceptual quality score of first-time users is also higher, as seen earlier in Figure 4; this might be because subjects experiencing VR for the first time are more excited than other users.

2) USER's INTEREST
Similarly, the impact of the user's interest on cybersickness is shown in Figure 10. Each bar represents the cybersickness level of each user for each sample video based on their interest: (a) interested and (b) not interested. It can be seen that the subjects' level of cybersickness is higher while watching non-interesting 360° video (content they are not interested in watching). Hence, it is crucial to consider the user's interest when evaluating the level of cybersickness in a virtual environment.

3) GENDER
The subjective results reveal that different genders experience distinct levels of cybersickness while watching videos in a virtual environment. Females report a higher level of cybersickness and feel dizzier than males. During the subjective test, subjects among the 10 female participants felt very dizzy and recorded higher cybersickness, as shown in Figure 11. As before, each bar represents the cybersickness level of each user for each sample video based on their gender: (a) male and (b) female. This suggests that females are more prone to cybersickness than males, which affects QoE. Therefore, keeping in mind the impact of gender on cybersickness, researchers and content providers should take gender into account when evaluating QoE for 360-degree videos. To make the effect of all three factors on cybersickness easier to understand, Figure 12 depicts the average impact of the user's familiarity, user's interest, and gender on cybersickness. Each bar represents the average SSQ score of users based on their familiarity, interest, and gender.
FIGURE 9. Impact of user's familiarity on cybersickness (SSQ score from 1 to 5; 5 = very dizzy, 1 = not dizzy).

V. MACHINE LEARNING QoE PREDICTION
To predict the QoE of 360° video in VR, we choose supervised machine learning techniques. A supervised machine learning technique is an algorithm that predicts an output from given input (independent) variables. It takes a known dataset as input, together with its known responses (outputs), to learn a classification model. The trained model is then used to predict the response for new data. We collect the training dataset from a subjective experiment. To assess how well machine learning predicts the QoE, we implemented four classification methods in Python: LR, kNN, SVM, and DT.

A. LOGISTIC REGRESSION (LR)
LR is a classification algorithm used when the target variable is dichotomous (1 or 0). We use binary LR to predict whether the end-user's QoE is ''satisfactory'' or ''unsatisfactory.'' Because the dependent (output) variable in binary classification can take only two values, we label each sample in the dataset with ''0'' or ''1''. For classification, the six factors shown in Table 1 serve as the independent variables, and the MOS values are mapped to the dependent variable: MOS >= 3 is labelled ''1'' (satisfactory QoE), and MOS < 3 is labelled ''0'' (unsatisfactory QoE). Figure 13 shows the MOS distribution of all factors. Note that only the MOS values of perceptual quality are considered in this section; the subjective results for cybersickness are not included. The cybersickness prediction is explained in detail in Section VII.
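The labelling rule can be sketched in a few lines of Python. This is only an illustration: the helper name `label_qoe` and the sample MOS values are hypothetical, and only the threshold of 3 comes from the text.

```python
def label_qoe(mos, threshold=3.0):
    """Map a MOS value to a binary QoE label.

    Returns 1 (satisfactory) when mos >= threshold, else 0 (unsatisfactory).
    """
    return 1 if mos >= threshold else 0


# Made-up MOS values for illustration only (not from the subjective dataset).
sample_mos = [4.2, 2.9, 3.0, 1.5]
labels = [label_qoe(m) for m in sample_mos]
print(labels)  # [1, 0, 1, 0]
```

Note that a MOS of exactly 3 falls on the satisfactory side of the rule.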

VI. EXPERIMENTS EVALUATIONS AND PERFORMANCE COMPARISON
We used the Python programming language to implement the machine learning algorithms. The classification and prediction task consists of three steps: dataset collection, training and testing of the machine learning model, and evaluation of the predictions. The dataset is obtained from the subjective experiment, and the binary LR model is trained on the labelled data using the scikit-learn machine-learning library. The algorithm learns from the training data and predicts the QoE on the testing data. We used 70% of the subjective dataset for training and 30% for testing. The estimated regression coefficients and intercept are shown in Table 4. To evaluate the binary LR model, we use two essential evaluation measures, the Receiver Operating Characteristic (ROC) curve and the Area Under the Curve (AUC), as shown in Figure 14. The solid line is the ROC curve of the binary logistic model, and the AUC is 0.94, well above the 0.5 expected of a random classifier. The larger the area under the curve, the better the model performs, so this indicates that the model performs well.
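The workflow described here (70/30 split, binary LR, ROC and AUC) might look roughly as follows in scikit-learn. This is a minimal sketch, not the authors' code: the data are synthetic, the six numeric feature columns are a hypothetical encoding of the six factors, and the random seeds are arbitrary.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: six numeric features per sample and a binary
# QoE label. This is NOT the paper's subjective dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=300) > 0).astype(int)

# 70% of the samples for training, 30% for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

model = LogisticRegression()
model.fit(X_train, y_train)

# Fitted regression coefficients and intercept.
print("coefficients:", model.coef_[0])
print("intercept:", model.intercept_[0])

# ROC curve and AUC are computed from the predicted probability of class 1.
scores = model.predict_proba(X_test)[:, 1]
fpr, tpr, _ = roc_curve(y_test, scores)
auc = roc_auc_score(y_test, scores)
print("AUC:", round(auc, 3))
```

Plotting `tpr` against `fpr` gives the ROC curve; an AUC of 0.5 corresponds to the diagonal of that plot.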

A. K-NEAREST NEIGHBORS (kNN)
The k-nearest-neighbors algorithm is a supervised machine learning classification technique that works on a set of labelled data. Based on these labelled data, the algorithm learns how to label other points. To label a new data point, the kNN algorithm finds the labelled points closest to it (its neighbours) and lets those neighbours vote. The label held by the majority of the nearest neighbours becomes the label of the new data point. K is the number of neighbours the algorithm checks.
We used the scikit-learn library in Python to implement the kNN algorithm. To find the best value of K, we run a loop over K from 1 to 40, allowing the algorithm to check up to 40 neighbours. In each iteration, the mean error of the predicted outcomes on the test set is calculated and appended to a list of errors. The mean error for K ranging from 1 to 40 is shown in Figure 15. The mean error is zero at K = 19, 29, 30, 31, 35, and 37, so these are the best values of K for the given dataset.
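The search over K can be sketched as below. The loop bounds (1 to 40) and the 70/30 split follow the text; the data are synthetic and the variable names are illustrative, so the resulting best K has no relation to the values reported for the paper's dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (NOT the paper's subjective dataset).
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 6))
y = (X[:, 0] - X[:, 2] > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1
)

errors = []
k_values = range(1, 41)  # K = 1, ..., 40
for k in k_values:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    predictions = knn.predict(X_test)
    # Mean error = fraction of test samples that were mislabelled.
    errors.append(float(np.mean(predictions != y_test)))

# Several values of K can tie for the lowest error, so collect all of them.
min_error = min(errors)
best_k = [k for k, e in zip(k_values, errors) if e == min_error]
print("lowest mean error:", min_error, "at K =", best_k)
```

Plotting `errors` against `k_values` yields a curve of the same kind as Figure 15.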

B. SUPPORT VECTOR MACHINE (SVM)
Support Vector Machine (SVM) classifies data points by separating them with a linear decision boundary called a hyperplane. SVM separates the data points of one class from those of the other and identifies the best separating hyperplane, the one that maximizes the margin between the classes of training data points. The SVM algorithm takes labelled training data and determines the optimal hyperplane, i.e., the one with the largest possible margin, which minimizes an upper bound on the generalization error. In SVM, the support vectors are the data points closest to the separating hyperplane; they carry all the information needed for SVM classification. The prediction performance of the SVM algorithm also depends on the kernel parameters. For optimization, we tune the hyper-parameters using grid search. Two kernel functions, linear and radial basis function (rbf), are applied, and different values of the parameter C are tested. We achieved the highest accuracy with C = 1000 and the rbf kernel. The algorithm achieved 79% accuracy with the rbf kernel and 72% with the linear kernel.
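A grid search of this kind can be sketched with scikit-learn's `GridSearchCV`. The two kernels and the value C = 1000 come from the text; the other C values, the 5-fold cross-validation, and the synthetic data are assumptions made only for illustration.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in data with a nonlinear class boundary
# (NOT the paper's subjective dataset).
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 6))
y = ((X[:, 0] ** 2 + X[:, 1] ** 2) > 1.4).astype(int)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=2, stratify=y
)

# Candidate hyper-parameters: both kernels and several values of C.
# Only C = 1000 is mentioned in the text; the other values are assumed.
param_grid = {"kernel": ["linear", "rbf"], "C": [1, 10, 100, 1000]}

# 5-fold cross-validation on the training set (the fold count is assumed).
search = GridSearchCV(SVC(), param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

best_params = search.best_params_
test_accuracy = search.score(X_test, y_test)
print("best parameters:", best_params)
print("test accuracy:", round(test_accuracy, 3))
```

`GridSearchCV` refits the best combination on the whole training set by default, so `search.score` evaluates that refitted model on the held-out 30%.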

C. DECISION TREE (DT)
Decision tree (DT) is a machine-learning technique used for classification problems. DT splits the dataset into smaller subsets and incrementally constructs a binary tree, choosing at each node the feature and threshold that yield the maximum information gain. The information gain of a split is the decrease in entropy it produces; the more impure the data, the higher the entropy. The Gini index is another prominent measure for evaluating splits. To compute the entropy, the algorithm weights the contribution of each QoE outcome by the probability of that outcome. The result is a tree with decision nodes and leaf nodes. Figure 16 shows the DT graph generated on the subjective experiment dataset. The final decision is the prediction of the algorithm, on the basis of the training data, that classifies the users' QoE into class 1 (satisfactory) or class 0 (unsatisfactory).
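To make the split criterion concrete, the following sketch computes entropy, the Gini index, and the information gain of one candidate split. The function names and the toy label lists are illustrative and are not taken from the paper's dataset.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted


# Toy node with four satisfactory (1) and four unsatisfactory (0) samples,
# and one candidate split of it into two children.
parent = [1, 1, 1, 1, 0, 0, 0, 0]
left, right = [1, 1, 1, 0], [1, 0, 0, 0]

print(entropy(parent))                                  # 1.0
print(gini(parent))                                     # 0.5
print(round(information_gain(parent, left, right), 3))  # 0.189
```

A split that separates the two classes perfectly would have an information gain of 1.0 here, which is the full entropy of the parent node.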
We used 70% of subjective data as a training set and 30% as a test set to check the accuracy and performance of machine-learning QoE prediction algorithms. We compared the four ML classification algorithms' performance with respect to confusion matrix, accuracy rate, precision, recall, f1-score, and MAE.

1) CONFUSION MATRIX
The confusion matrix is a performance measurement for ML classification that tabulates the predicted classes against the actual classes. It gives the four combinations of actual and predicted values: true positives, true negatives, false positives, and false negatives. True positives are positive samples that are predicted correctly, while false negatives are positive samples that the model misses. LR recorded 86% accuracy, while kNN and DT recorded 82% and 79%, respectively. SVM accuracy is 79% with the rbf kernel and 72% with the linear kernel. The confusion matrix is the basis for computing precision, recall, and F1-score.

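2) PRECISION
Precision is measured as the fraction of samples predicted as positive that are actually positive. That is,
$$\text{Precision} = \frac{TP}{TP + FP}$$

3) RECALL
Recall is the measure of how many of the actual positives the model predicted correctly, i.e., labelled as true positives. That is,
$$\text{Recall} = \frac{TP}{TP + FN}$$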

4) F1-SCORE
The F1-score is a measure of a test's accuracy, defined as the harmonic mean of precision and recall. Its value lies between 0 and 1; the closer the F1-score is to 1, the better the accuracy of the test. The F1-score is calculated as
$$F1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$

5) OVERALL ACCURACY
The overall accuracy is the percentage of correct predictions, that is,
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
where TP is true positive, TN is true negative, FP is false positive, and FN is false negative.

6) MEAN ABSOLUTE ERROR (MAE)
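Mean absolute error is the average absolute difference between the actual values and the predicted values. Each sample contributes to the error in proportion to the absolute difference between its predicted and actual values. MAE can be computed as
$$\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|$$
where $n$ is the number of test samples, $y_i$ is the actual value, and $\hat{y}_i$ is the predicted value of the $i$-th sample.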
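The measures in this section can all be derived from the actual and predicted labels. The sketch below is a plain-Python illustration with a hypothetical helper `classification_metrics` and made-up labels; it uses the standard definitions in terms of true/false positives and negatives.

```python
def classification_metrics(y_true, y_pred):
    """Compute confusion-matrix counts and derived measures for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / len(pairs)
    mae = sum(abs(t - p) for t, p in pairs) / len(pairs)
    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy, "mae": mae}


# Made-up labels for illustration: 1 = satisfactory, 0 = unsatisfactory.
y_true = [1, 1, 1, 1, 0, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 1, 0, 1, 0, 1, 0, 1]
metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(name, round(value, 3))
```

For binary 0/1 labels every misclassification contributes an absolute error of 1, so the MAE equals one minus the accuracy.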
The calculated values of precision, recall, f1-score, MAE, and overall accuracy are shown in detail in Table 5.
For validation, we choose the linear regression-based model [33], the Back Propagation Neural Network (BPNN) [50], and the Bayesian Inference Method (BIM) [45] as comparison methods. We compare these methods on the test set in terms of the Pearson Linear Correlation Coefficient (PLCC) and Spearman's Rank-order Correlation Coefficient (SRCC). Furthermore, the proposed QoE method is compared against objective quality metrics designed for 360° videos, namely S-PSNR, WS-PSNR, and CPP-PSNR [51]. Figure 17 shows the PLCC and SRCC of the proposed model and the compared methods. It can be seen that our proposed LR-based model delivers the best performance in predicting the subject's QoE.
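PLCC and SRCC can be computed directly from paired subjective and predicted scores. The following plain-Python sketch uses the standard definitions (SRCC as the Pearson correlation of average ranks); the function names and the score lists are made up for illustration and do not reproduce the values in Figure 17.

```python
from math import sqrt


def plcc(x, y):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def srcc(x, y):
    """Spearman rank-order correlation: PLCC applied to the ranks."""
    return plcc(average_ranks(x), average_ranks(y))


# Made-up subjective MOS values and model predictions for illustration.
subjective = [1.8, 2.5, 3.1, 3.9, 4.6]
predicted = [2.0, 2.4, 3.5, 3.6, 4.9]
print("PLCC:", round(plcc(subjective, predicted), 3))
print("SRCC:", round(srcc(subjective, predicted), 3))
```

In practice, `scipy.stats.pearsonr` and `scipy.stats.spearmanr` compute the same two coefficients.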

VII. CYBERSICKNESS PREDICTION BASED ON ANN
After subjectively analyzing the impact of user's interest, user's familiarity with VR, and gender on QoE in terms of cybersickness, we build an ANN-based model that can help content providers take the viewer's preferences and interests into account. It can also help researchers consider the mentioned QoE-affecting factors when evaluating and predicting the cybersickness level in virtual reality. Our ANN model has four layers and is trained with Stochastic Gradient Descent (SGD), as shown in Figure 18. It comprises one input layer, two hidden layers, and one output layer. The input layer has seven neurons, $X_1, \ldots, X_7$: we use seven features of the three QoE-affecting factors (user's interest, user's familiarity with VR, and gender) as input nodes. The five output nodes, $Y_1, \ldots, Y_5$, represent QoE in terms of cybersickness, which takes values in {1, 2, 3, 4, 5}. In the two hidden layers, $h^1_1, h^1_2, \ldots, h^1_n$ denote the neurons of the first hidden layer and $h^2_1, h^2_2, \ldots, h^2_n$ the neurons of the second. The aim of using an ANN for cybersickness prediction is to map these influence factors to a score from 1 to 5.
We use Keras, a high-level neural network library that runs on top of TensorFlow. Keras offers an SGD optimizer with a configurable learning rate. For optimization, we first fix the SGD learning rate at 0.2 and then adapt the number of neurons in the hidden layers. The final configuration, with 64 neurons in the first hidden layer and 32 neurons in the second, performs well, with a prediction accuracy of 85%. We trained the model for 1000 epochs (iterations). We also noticed that the final accuracy of the network varies with the learning rate. Therefore, we test the prediction accuracy of the proposed model at different learning rates. Figure 19 depicts the change in prediction accuracy with the learning rate.
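The layer sizes described above (7 inputs, hidden layers of 64 and 32 neurons, 5 outputs) can be illustrated with a plain NumPy forward pass. This is only a structural sketch under several assumptions not stated in the text: the seven inputs are taken to be a one-hot encoding of interest (2), familiarity (3), and gender (2); ReLU and softmax activations are assumed; and the weights are random and untrained. The authors' model was built in Keras and trained with SGD, which is not reproduced here.

```python
import numpy as np

# Assumed categories; 2 + 3 + 2 = 7 one-hot input features.
INTEREST = ["interested", "not_interested"]
FAMILIARITY = ["first_time", "weekly", "rarely"]
GENDER = ["male", "female"]


def encode_user(interest, familiarity, gender):
    """One-hot encode the three factors into a 7-dimensional input vector."""
    x = np.zeros(7)
    x[INTEREST.index(interest)] = 1.0
    x[2 + FAMILIARITY.index(familiarity)] = 1.0
    x[5 + GENDER.index(gender)] = 1.0
    return x


def init_params(sizes, seed=0):
    """Random (untrained) weight matrices and zero biases for each layer."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(scale=0.1, size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def forward(x, params):
    """Forward pass: ReLU hidden layers, softmax output (both assumed)."""
    a = x
    for W, b in params[:-1]:
        a = np.maximum(0.0, a @ W + b)
    W, b = params[-1]
    z = a @ W + b
    z = z - z.max()  # subtract the maximum for numerical stability
    p = np.exp(z)
    return p / p.sum()


params = init_params([7, 64, 32, 5])
x = encode_user("not_interested", "first_time", "female")
probs = forward(x, params)

# Output node index 0..4 corresponds to cybersickness score 1..5.
score = int(np.argmax(probs)) + 1
n_params = sum(W.size + b.size for W, b in params)
print("predicted score (untrained network):", score)
print("trainable parameters:", n_params)  # 2757
```

With these layer sizes the network has 7·64 + 64 + 64·32 + 32 + 32·5 + 5 = 2757 trainable parameters, which is small enough that 1000 training epochs on a dataset of this scale is inexpensive.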
To check the validity of the proposed ANN-based QoE prediction model, we compare its prediction accuracy against well-known QoE prediction methods in terms of cybersickness. The PLCC and SRCC are computed to compare the prediction results of the proposed method and existing QoE models. As comparison methods, we choose existing QoE prediction models for cybersickness: the VR sickness predictor (VRSP) [31], the VR sickness assessment (VRSA) network [52], and the deep learning visual comfort assessment (VCA) method [53]. Figure 20 shows the computed PLCC and SRCC of the proposed and existing QoE models in terms of cybersickness. It can be seen that our proposed ANN-based model delivers the best performance in predicting the subject's QoE in terms of cybersickness.

VIII. CONCLUSION
This manuscript has used machine learning algorithms for the QoE prediction of 360° videos in virtual reality. Six QoE influence factors are considered: QPs, resolution, rendering device, gender, user's familiarity with VR videos, and user's interest in 360° video. The QoE is investigated subjectively on 96 video samples in two important QoE aspects, perceptual quality and cybersickness. The results suggest that the user's interest plays a vital role in evaluating the QoE: the subject's cybersickness level is higher while watching a non-interesting 360° video. Besides, the user's prior experience of watching 360° video in VR has a notable impact on perceptual quality. Regarding the effect of gender on cybersickness, female users reported more severe cybersickness than males. The results of the subjective experiment are used as training data for the prediction of QoE. Four supervised machine-learning algorithms are trained on the subjective data: LR, kNN, DT, and SVM. We proposed a binary LR technique for the QoE prediction of 360° video and compared its prediction results against kNN, SVM, and DT with respect to accuracy rate, recall, f1-score, precision, and mean absolute error (MAE). The proposed method outperformed the other supervised machine learning techniques with an 86% accuracy rate, which is in close agreement with subjective opinion. We also built a neural network model for QoE prediction in terms of cybersickness and observed that the ANN performs well at a learning rate of 0.2. Finally, the prediction accuracy of the proposed LR-based model is compared against state-of-the-art QoE methods in terms of perceptual quality, and the accuracy of the ANN model is compared against well-known QoE models in terms of cybersickness. Both proposed models perform well against existing QoE methods.
In our future work, we will use deep learning methods to predict QoE for more influencing factors, such as content type (fast, slow, medium), number of moving targets in the content, camera motion, and exposure time. In addition, a few other QoE aspects will be considered, such as presence, acceptability, usability, and immersion.

SADIQUE AHMAD received the Ph.D. degree in computer science and technology from the Beijing Institute of Technology, China. He is currently working as an Assistant Professor with the School of Computer Sciences and Technology, Iqra University, Karachi. He has published 22 research articles in peer-reviewed journals and conferences. His research interests include deep learning and image processing. He has worked on developing new measurement techniques for the prediction of students' cognitive skills during cognitive tasks (i.e., measurement of student's performance during the interview, any written examination, and class activities) (transfer learning). The main focus of his work is to recognize emotions (e.g., frustration, stress, and anxiety) using videos of students' specific activities, such as interviews, written examinations, and final year project presentations.