QoE Prediction for Gaming Video Streaming in O-RAN Using Convolutional Neural Networks

The growing popularity of online and cloud gaming applications is reshaping the landscape of the entertainment industry and acting as a key driver of market growth. However, the dependency of these applications on network resources poses significant challenges to the communication infrastructure. This is particularly critical as network performance plays a key role in influencing user satisfaction during gameplay. Inevitably, these inherently interactive applications are also closely linked to the concept of quality of experience (QoE), which expresses the perceived quality of a service by end-users. In this paper, we leverage deep learning methodologies to develop an objective QoE prediction model. Specifically, the proposed prediction model investigates the effect of wireless network operation on the QoE of gaming video streaming. Employing a tailored multi-headed convolutional neural network (multi-headed CNN), the model can predict in real-time the transmission-related QoE value using measurable quality of service (QoS) parameters. To validate the effectiveness of the model, tests and evaluations were conducted in an open radio access network testbed environment equipped with O-RAN-compatible interfaces.


I. INTRODUCTION
The video game industry has grown at an astonishing rate in recent years, and gaming has transformed from a niche hobby into one of the largest markets in the entertainment industry [1]. Gaming reached an all-time high in terms of revenue and user engagement during the COVID-19 global pandemic. In 2022, the global video game industry's revenues surpassed the USD 200 billion mark for the first time, while the number of players worldwide was estimated at 3 billion. Currently, the video game industry exceeds the combined revenues of the music and film industries and is only trailed by television [2]. Therefore, online and cloud gaming can be seen as an increasingly important research area, attracting the interest of both academia and industry. Consequently, the development of wireless networks capable of supporting the demanding service requirements of gaming video streaming is a necessary condition for the further growth of gaming applications.
Thus, it has become vital for Internet service providers (ISPs) to provide the best possible gaming experience to end-users, making efficient use of the available physical network infrastructure. However, understanding the mechanisms that determine quality as experienced by end-users is a highly complex process. This is because these mechanisms include technical parameters such as terminal equipment, network state and video compression, the gaming environment, as well as the subjective perception of the players [3]. To this end, quality of experience (QoE) is the metric that indicates the perceived quality of a service by the end-user, taking into account all the factors involved. In the context of wireless communication networks, QoE has become a key tool for evaluating the quality of their operation, because it allows a more in-depth understanding of how the quality of service (QoS) factors of a network affect the quality of communication services as perceived by end-users.
Legacy wireless networks, however, are unable to meet the network traffic and radio resource requirements of today's gaming video streaming applications, as they exhibit limited bandwidth, high latency, and inefficient network resource management. These challenges require new wireless network designs that offer customized communication solutions based on programmable, flexible, and scalable architectures. A significant opportunity for transforming wireless and mobile networks is offered by open radio access network (Open RAN) technology, as suggested by the O-RAN Alliance [4]. Based on two key concepts, openness and intelligence, Open RAN can be seen as a disruptive technology in the wireless communications ecosystem, capable of enabling new markets and encouraging innovation.
In addition, the development of QoE prediction models can make a decisive contribution to optimizing the operation of wireless networks, complementing the innovative design of the Open RAN network architecture. The benefits of accurate QoE prediction for system performance arise from the ability to predictively analyze the patterns and dynamic characteristics of QoE. This analysis can serve as the basis for the implementation of a network optimization policy. Consequently, accurate QoE prediction facilitates effective planning for the allocation of the finite radio and computing resources, enhances network energy efficiency, mitigates network congestion, and leads to reduced capital expenditure (CapEx) and operational expenditure (OpEx), ultimately enhancing profitability. Real-time QoE prediction should be regarded as an important element in the design of next-generation wireless communication networks, particularly when transmitting services with intrinsic interactivity, such as gaming applications.
In this paper, we present a deep learning-based objective QoE prediction model for gaming video streaming applications. The prediction model is based on a customized multi-headed convolutional neural network (multi-headed CNN) applied to an O-RAN-compliant wireless communication network. The goal of the prediction model is to measure, in real time, the influence of wireless network operation on gaming video quality as perceived by end-users. Since the prediction model can forecast the network/transmission-related QoE value using purely quantifiable QoS factors, the QoE evaluation can be categorized as objective. The QoS parameters of bandwidth, latency, packet loss, and jitter are captured and recorded by a QoS monitoring system designed with open-source monitoring tools. The prediction model first converts the collected QoS parameters into mean opinion score (MOS) values, and then the multi-headed CNN analyzes the interdependencies among them to derive the overall QoE value for the wireless gaming video streaming service. To the authors' knowledge, this is the first study to measure and predict the QoE of gaming video streaming over Open RAN. Furthermore, it provides for the first time a benchmark of deep neural network (DNN) algorithms for this kind of application.
The paper is organized as follows: Section II provides a review of state-of-the-art models for gaming video QoE prediction. Section III analyzes the QoE assessment methodologies in connection with gaming video applications. Moreover, it explores the gaming video QoE influencing factors (IFs) and evaluates the network/transmission-related IFs. In addition, it examines the QoS/QoE mapping approaches. Section IV introduces the suggested QoE prediction model. First, the Open RAN testbed is described, then the QoS monitoring system and dataset creation process, the QoS/QoE mapping model, and finally the deep learning-based QoE prediction model. Section V presents a comparative study of deep learning methodologies and provides the performance assessment findings for the suggested prediction model. Section VI concludes with closing observations.

II. RELATED WORK
Early work on QoE prediction for wireless video streaming focuses on conventional video formats without considering gaming video applications, which due to their highly interactive nature exhibit particular features and network requirements.
In [5], a DNN method for QoE prediction in mobile video transmission is presented. This approach uses a mobile phone application to collect data related to user QoE. The process involves the collection of a significant amount of data that includes both subjective ratings and network parameters. A DNN is then constructed to capture the links between these network parameters and the subjective QoE ratings.
In [6], a real-world dataset derived from a mobile operator is employed to establish a correlation between network-side parameters and user QoE in video streaming applications. A deep learning model is first used to forecast the channel path loss. Subsequently, this prediction is applied to forecast the MOS for mobile video streaming. It is important to note that the trained model cannot be directly applied to a different geographical area, which prevents the generalization of this approach.
In [7], a model integrating mobile edge computing (MEC) and software-defined networking (SDN) is proposed to allocate resources and reduce latency for 3D high-definition video. The model uses an actor-critic based deep reinforcement learning algorithm for viewport prediction and QoE optimization and a long short-term memory (LSTM) network for bandwidth and viewport prediction. The model can adaptively assign the best transmission throughput based on observations to maximize QoE.
In [8], an end-to-end framework for video QoE prediction is presented. Initially, a mixture of deep learning methodologies, including word embedding and a 3D convolutional neural network (C3D), is used to extract generalized features. These features are then fused and fed into a neural network to learn representations. The acquired representation is then used as input for tasks related to classification and regression.
In [9], an explainable artificial intelligence (XAI) model is used to address the need for high levels of explainability in AI models. The study contrasts fuzzy decision tree models with classical decision tree models and random forest (RF) classifiers on a QoE classification dataset. The findings of the comparison demonstrate that fuzzy decision trees are easier to interpret and exhibit comparable performance, particularly in detecting stalling events in video streaming applications.
In [10], a Bayesian network (BN) model aiming to predict the re-buffering ratio by concentrating on a selected set of QoE parameters is introduced. Specifically, a neural network-based approach is suggested to validate the BN's accurate representation of stalling data patterns. The study concludes by demonstrating that models incorporating hidden variables and contextual information exhibit enhanced performance on QoE-related metrics.
In [11], a tutorial focused on implementing QoE measurement and prediction in video streaming applications using supervised machine learning (ML) algorithms is presented. The tutorial is structured in three parts. Initially, an approach for applying video streaming QoE prediction models based on supervised learning is described. Next, the development of ML-based models for QoE prediction and measurement in 5G/6G networks is examined. Finally, a benchmark analysis is provided, which evaluates the performance of state-of-the-art supervised learning ML models.
In recent years, there has been increasing interest in developing QoE prediction models adapted to the particular characteristics of wireless gaming video streaming applications.
In [12], a no-reference frame-based measurement system for gaming video quality performance is created using the support vector regression (SVR) algorithm. The SVR training includes nine frame-level indices as input features and video multi-method assessment fusion (VMAF) scores as reference data. This prediction model is characterized by low complexity, as it is based on features that are available in real time.
In [13], two no-reference lightweight ML methods for predicting QoE in gaming video streaming are introduced. These models are constructed using the algorithms of SVR, Gaussian process regression (GPR), random forest (RF), and artificial neural network (ANN). Due to their simplicity, both models can serve as the initial phase of a real-time optimized online gaming QoE management framework.
In [14], a method is proposed to improve the quality of compressed gaming content by using super-resolution generative adversarial networks (SRGAN). This involves using a DNN in tandem with an adversarial network to produce higher-resolution images. The proposed approach incorporates adaptations to the generative network, including modifications to the skip connections and loss function. These modifications improve the information flow within the network, leading to increased perceived quality.
In [15], a no-reference lightweight module for assessing video quality in gaming content is created, utilizing the random forest regression (RFR) algorithm. This method forecasts video quality scores solely based on the recorded video, prioritizing features that are straightforward to calculate. Additionally, it incorporates a minimal set of features in order to establish a model capable of making real-time QoE predictions.
In [16], a technique is put forward to develop a CNN-derived metric for assessing the quality of gaming videos. The CNN is trained with the VMAF objective quality model as a reference, and further refinement is achieved through subjective image quality evaluations. Additionally, a novel temporal pooling approach based on frame-level predictions is introduced for the prediction of gaming video QoE.
In [17], a methodology for real-time reduced-reference evaluation of gaming video quality is introduced. This approach is built upon a psychometric curve-fitting method characterized by low complexity. The model employs ML techniques, including decision tree regression (DTR) and ANN. The proposed solution selects the most pertinent objective features while minimizing complexity. Subsequently, it models the relationship between these features and the reference quality by incorporating human visual system (HVS) psychometric perception.
In [18], a QoE estimation model is introduced, incorporating both gaming and non-gaming videos and relying on the algorithms of CNN and RF. The CNN is trained using an objective metric, enabling it to capture video artifacts, and is fine-tuned using blockiness and blurriness scores derived from a small image quality dataset. The temporal and frame-level information of videos is employed by an RF model to estimate video QoE. The model's low complexity renders it well-suited for real-time applications.
In [19], a streamlined quality prediction model for gaming video streaming is suggested, utilizing CNN. This prediction model incorporates a hard pairwise ranking loss, enabling it to prioritize the discrimination of similar pairs. Additionally, an efficient adapted distillation model is integrated, resulting in minimal performance loss.
The first set of state-of-the-art QoE prediction models mentioned above focuses on the wireless streaming of conventional video formats and cannot be generalized for use in gaming video streaming applications, as these models do not take into account the specific aspects and characteristics of this type of video. The second set of QoE prediction models mainly takes into account frame-level IFs of the gaming video format, such as blurriness, naturalness, blockiness and complexity, as well as the influence of spatiotemporal features and psychometric parameters. Our approach differs from these methods by concentrating on the influence of the transmission channel on gaming video QoE. Across various applications, the majority of existing literature on QoE assessment and prediction neglects the thorough examination of network/transmission-related IFs. In instances where the impact of the wireless transmission channel is considered, it is typically done using network simulators and emulators. Our study stands out as the first to utilize a real 4G/5G wireless network, thoroughly investigating the impact of network/transmission-related IFs on gaming video streaming QoE under real-world conditions. Moreover, our work is centered on developing an entirely objective QoE prediction model through the application of QoS/QoE mapping functions. Specifically, we consider real-world network design scenarios based on Open RAN technology, where the use of subjective QoE evaluation and prediction models is impractical due to their limitations, as discussed in Section III-A. Finally, this is the first paper that provides a comprehensive study on deep learning techniques for QoE prediction in wireless gaming video streaming applications. The choice to develop deep learning-based prediction models is due to the effectiveness of their operation when trained with huge amounts of data, such as those encountered in real mobile networks, where deep learning tends to replace statistical analysis and conventional ML models.

III. GAMING VIDEO QOE ASSESSMENT
The idea behind the development of the QoE metric is to reflect the quality of a service as perceived by end-users. As a result, in a wireless network ecosystem, QoE indicates how the network operation affects the perceived quality of communication services. The QoE evaluation is conditional on the comparison between the anticipated quality characteristics that define the user's expectations and the perceived characteristics derived from the physical stimulus.
A contributing factor to the difficulty in comprehending gaming video QoE mechanisms is that gaming, as opposed to conventional media applications, can be viewed as a human-machine interaction. Therefore, typical approaches to assessing the influence of transmission on media distribution are not applicable. Furthermore, parameters such as the content, the backend platform on which the game was developed, the user interface, the wireless communication channel, and the user features can all significantly affect QoE [20].

A. PREDICTION METHODOLOGIES
The literature reports two ways of evaluating QoE: subjective and objective evaluation [21]. In subjective methods, participants evaluate the quality of a service after being subjected to a set of tests or stimuli. These techniques use psychophysical and psychometric methodologies to quantify evaluators' perception of service quality, as well as qualitative methodologies to determine which IFs affect QoE and to what extent [22]. Evaluators, as a general rule, rate a variety of perceived quality attributes on a MOS scale that spans from 1 to 5 (i.e., poor to excellent), reflecting the degree of contentment with a service [23]. The main advantage of subjective evaluations lies in the fact that, because data is received directly from end-users, they achieve precise results. These results can subsequently be used as a reference for training and validating QoE prediction models [24].
Because subjective methodologies have limitations (they are laborious, costly, unsuitable for real-time use, and non-reproducible), there has been substantial interest in creating objective models that predict QoE using exclusively measurable quality characteristics of communication networks. The idea behind objective methods is to predict QoE values that are close to the evaluations of subjective techniques. The fundamental benefit of objective approaches is their ease of use and adjustment, because the assessment procedure requires only quantifiable QoS parameters and mathematical models that connect these parameters to QoE values. The drawback of these methodologies is their limited accuracy: because they calculate the QoE value, their results are an estimate and not an actual representation of the quality perceived by end-users [25].

B. GAMING VIDEO QOE NETWORK/TRANSMISSION INFLUENCING FACTORS
Any characteristic of a user, system, service, or application that can have an impact on the perceived quality of a service by the end-user can be referred to as a QoE IF [26]. The purpose of this paper is to investigate the effect of the operation of the wireless communication network on QoE. For this reason, our work focuses on the study of network/transmission-related IFs. These IFs are affected by transmission channel losses and refer to network QoS parameters, including latency, jitter, bandwidth and packet loss [27].
Wireless gaming video streaming and cloud gaming applications require extensive Internet resources when connecting between client and server. Due to the need to exchange a huge amount of multimedia data with a server, a user's wireless connection must be able to send this data with the lowest possible delay [28]. Therefore, these types of applications are susceptible to variations in network QoS parameters, which makes it necessary to ensure adequate values for latency, jitter, bandwidth, and packet loss to achieve optimal QoE:
• The latency perceived by an end-user is associated with the time interval from the execution of the user's commands to the occurrence of the subsequent game event on the screen. Therefore, the impact of delay on QoE is greatly affected by the game's characteristics [29].
• Jitter has a discernible influence on the QoE of online and cloud gaming applications [30]. The presence of jitter induces an unsmooth visual impression of the game because video frames are displayed with a fluctuating latency [31].
• The impact of bandwidth limitation on QoE has proven to be particularly important for cloud gaming applications, as they rely on real-time gaming content streaming. This requires a consistent and high-rate data transfer to ensure smooth gameplay. Bandwidth constraints can lead to buffering, lower resolution and increased latency, negatively impacting QoE [32].
• Packet loss has a significant impact on QoE in gaming applications, with levels as low as 1% resulting in a notable decline in the end-user experience. Excessive packet loss reduces visual quality, leading to lower frame rates and a poor gaming experience [33].

C. QOS/QOE MAPPING
The operating principle of QoS/QoE mapping is to calculate QoE using only measurable values of the network QoS parameters. To specify the association between these parameters and the QoE level, it is essential to create a correlation model. The purpose of this model is to compute QoE values using fitted mathematical models [34]. The prediction model we have developed is based on the logistic mapping function, the IQX hypothesis, the Weber-Fechner law, and Stevens' power law, as follows:
1) To convert the measurable QoS parameters into MOS scores, an appropriate mapping function is required. Mapping functions can be either linear or nonlinear [35]. However, since objective quality metrics are seldom uniform, linear mapping functions tend to understate the outcome. Therefore, nonlinear mapping functions are typically employed, as they obtain more accurate correlations. One of the most commonly used mapping functions is the logistic function of formula (1) [22], whose coefficients a, b and c are adjustable parameters.
2) The IQX hypothesis is an exponential method that describes QoE as a parameterized negative exponential function of a QoS impairment attribute. QoE can be defined as a function of n influence factors I_j, 1 ≤ j ≤ n:
$QoE = f(I_1, I_2, \ldots, I_n)$ (2)
The IQX hypothesis focuses on a single influence factor, I = QoS, to obtain the primary correlation QoE = f(QoS) [36]. Assuming a linear dependence on the QoE level, we fit differential equation (3), in which the coefficient γ defines the sensitivity of the observed QoE to changes in QoS. The solution of this equation is the exponential function (4), which reflects the fundamental relationship of the IQX hypothesis and whose coefficients a, b and c are adjustable parameters.
3) The Weber-Fechner law (WFL) is a logarithmic approximation that relates the perceptual capacities of the human sensory system to the awareness of barely perceptible changes of a salient stimulus [37]. It is described by differential equation (5). The resulting expression (6) is logarithmic and expresses the coupling between stimulus and perception; its coefficients a and b are adjustable parameters.
4) Stevens' power law (SPL) is a psychophysical law that describes how the intensity of a physical stimulus affects human perception [38]. SPL can be described by the following equation:
$P = K S^{b}$ (7)
where P stands for human perception as a function of the stimulus strength S, K is a constant that varies with the measurement setting, and the exponent b depends on the kind of stimulus and defines the curvature of the power function.
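The four mapping families above can be sketched in a few lines of Python. The functional forms below are the ones commonly used in the literature for each family and are assumptions here, since only Stevens' power law is fully specified by the surrounding text; the coefficient values, the helper `clip_mos`, and the sample QoS values are hypothetical and chosen only to show a decreasing mapping (packet loss) and an increasing one (bandwidth).

```python
import math

def logistic_map(x, a, b, c):
    """Logistic mapping, assumed form: a / (1 + exp(-b * (x - c)))."""
    return a / (1.0 + math.exp(-b * (x - c)))

def iqx_map(x, a, b, c):
    """IQX-style negative exponential, assumed form: a * exp(-b * x) + c."""
    return a * math.exp(-b * x) + c

def wfl_map(x, a, b):
    """Weber-Fechner-style logarithm, assumed form: a * ln(x) + b."""
    return a * math.log(x) + b

def spl_map(x, k, b):
    """Stevens' power law: k * x ** b."""
    return k * x ** b

def clip_mos(value, low=1.0, high=5.0):
    """Keep a mapped value inside the 1-5 MOS scale (hypothetical helper)."""
    return max(low, min(high, value))

# Hypothetical coefficients: MOS falls as packet loss (%) rises.
packet_loss_percent = [0.0, 0.5, 1.0, 2.0, 5.0]
mos_from_loss = [clip_mos(iqx_map(p, a=4.0, b=1.2, c=1.0))
                 for p in packet_loss_percent]

# Hypothetical coefficients: MOS rises with bandwidth (Mbit/s).
bandwidth_mbps = [2.0, 5.0, 10.0, 20.0, 40.0]
mos_from_bw = [clip_mos(wfl_map(bw, a=1.1, b=0.8)) for bw in bandwidth_mbps]
```

The sign of the slope-like coefficient decides whether a mapping decreases or increases with the QoS value, which is why the same function family can serve both impairment-type parameters and bandwidth.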

IV. QOE PREDICTION MODEL
The suggested QoE prediction model is based on a multi-headed CNN and applied to a small-scale experimental wireless network created using Open RAN technology. The user-side QoS monitoring system has been built with open-source monitoring tools. The collected QoS attributes of bandwidth, delay, jitter and packet loss are provided to the QoS/QoE mapping model, which maps them to MOS values. Finally, to calculate the overall QoE value, the multi-headed CNN prediction model analyzes the interdependencies between the mapped input values.

A. OPEN RAN TESTBED
Open RAN technology has the potential to revolutionize mobile RAN and is expected to be a key element in the creation of next-generation wireless networks. In line with the O-RAN Alliance specifications, the Open RAN architecture will be able to provide customized communication services to meet the demanding and diverse service requirements of innovative usage scenarios, including online and cloud gaming applications [39]. The Open RAN implementation operates in a small-scale testbed environment, supporting both 4G and non-standalone
The following are the main characteristics of the testbed, as illustrated in Fig. 1 [42]:
• The remote radio head (RRH) is implemented for all RAN architectures via a universal software radio peripheral (USRP) B210 in single-input single-output (SISO) mode, connected to the PC with a universal serial bus (USB) 3.0 cable.
• For split option 7.2, the distributed unit (DU) implements the upper physical layer (high-PHY) functions together with the medium access control (MAC) and radio link control (RLC) layers, while the central unit (CU) deploys radio resource control (RRC), the packet data convergence protocol (PDCP) and the service data adaptation protocol (SDAP). The two units are connected to each other via an F1 interface.
• The virtual baseband unit (vBBU) is present in the cases of 4G and 5G monolithic architectures, performing the functions of the CU and DU units.
• The EPC consists of the 3GPP modules based on the home subscriber server (HSS), the mobility management entity (MME), the serving gateway (SGW) and the packet data network gateway (PGW). However, the serving and packet gateways (SPGWs) have separate user (SPGW-U) and control (SPGW-C) planes.

B. QOS MONITORING AND DATASET GENERATION
The QoS monitoring system uses the open-source monitoring tools Prometheus [43], Telegraf [44] and Grafana [45] for the collection of the network/transmission metrics of bandwidth, jitter, latency and packet loss. It can be classified as a user-centric QoS monitoring system, since network/transmission parameters are collected from the user side using end-user device probes to provide application-level metrics [46]. The dataset for training the neural network was generated by monitoring the QoS parameters during the wireless transmission of YouTube gaming videos. Data was collected using a variety of video streaming loads from November 2022 to January 2023. The monitoring period was 8 weeks in total, and data was collected with a measurement interval of 1 min, resulting in a total of 80640 data samples for each QoS parameter. Videos with a resolution of 2560x1440 pixels (2K) and a frame rate of 60 FPS were used. The five-number summary of the dataset (i.e., minimum, first quartile, median, third quartile and maximum) is shown in Fig. 2, and the collected samples of QoS parameters are shown in Fig. 3.
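As a quick check of the dataset size and an illustration of the five-number summary, the following sketch uses only the Python standard library. The latency values are made up for illustration, and the "inclusive" quartile convention is an assumption; the convention behind Fig. 2 is not stated.

```python
import statistics

# 8 weeks of monitoring at one sample per minute, per QoS parameter.
SAMPLES_PER_PARAMETER = 8 * 7 * 24 * 60  # weeks * days * hours * minutes

def five_number_summary(values):
    """Return (minimum, first quartile, median, third quartile, maximum)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

# Hypothetical latency measurements in milliseconds.
latency_ms = [12.0, 15.0, 11.0, 30.0, 18.0, 14.0, 22.0, 16.0, 13.0]
summary = five_number_summary(latency_ms)

print(SAMPLES_PER_PARAMETER)  # 80640
print(summary)                # (11.0, 13.0, 15.0, 18.0, 30.0)
```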

C. QOS/QOE MAPPING MODEL
In order to construct the QoS/QoE mapping model, we used the logistic, IQX, WFL and SPL mapping methods by applying formulas (1), (4), (6) and (7), respectively, to the measurements of the network QoS parameters. To obtain an approximate curve y = f(x) that best fits the discrete set of measurement points (x_i, y_i), where i = 1, 2, 3, ..., n, we used the curve fitting method. In particular, we adopted the method of least squares, which is one of the most commonly used methods to find the curve that best fits a given dataset [47]. In the formula f(x), the coefficients a, b, and c of the mapping functions are adjustable parameters. The aim of the least squares method is to determine these parameters so as to minimize the fitting error, i.e., the difference between the data values y_i and the values f(x_i) on the fitted curve. The residuals are defined as the differences between the observed y-values and those given by the fitted curve at the x-values where the data were originally collected.
Let the data points be (x_1, y_1), (x_2, y_2), ..., (x_n, y_n), where x is the independent variable and y is the dependent variable. The deviation error e_i of the fitted curve f(x) from each data point is determined as follows:
$e_i = y_i - f(x_i), \quad i = 1, 2, \ldots, n$ (8)
According to the principle of least squares, the best-fitting curve has the property that the sum of the squares of the errors in formula (8) is minimum, and hence the parameters a, b, and c are chosen to minimize
$\sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left[ y_i - f(x_i) \right]^2$ (9)
To implement the least squares curve fitting, we used Python's SciPy, NumPy, and Pandas open-source libraries [48]. To evaluate the curve fitting accuracy, we used the metrics of R^2 and mean squared error (MSE) [34].
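The least squares principle and the two accuracy metrics can be illustrated without external dependencies for a mapping that is linear in its coefficients. The sketch below fits an assumed logarithmic mapping y = a ln(x) + b in closed form; it is a dependency-free stand-in rather than the SciPy-based procedure used in this work, and the bandwidth and MOS values are hypothetical. Mappings that are nonlinear in their coefficients require an iterative solver instead.

```python
import math

def fit_log_mapping(x_values, y_values):
    """Closed-form least-squares fit of y = a * ln(x) + b.

    The model is linear in (a, b) after the substitution t = ln(x),
    so the minimizer of the sum of squared residuals is the ordinary
    simple-linear-regression solution.
    """
    t = [math.log(x) for x in x_values]
    n = len(t)
    mean_t = sum(t) / n
    mean_y = sum(y_values) / n
    s_tt = sum((ti - mean_t) ** 2 for ti in t)
    s_ty = sum((ti - mean_t) * (yi - mean_y) for ti, yi in zip(t, y_values))
    a = s_ty / s_tt
    b = mean_y - a * mean_t
    return a, b

def fit_quality(x_values, y_values, a, b):
    """Return (MSE, R^2) of the fitted curve on the given points."""
    residuals = [y - (a * math.log(x) + b) for x, y in zip(x_values, y_values)]
    sse = sum(e * e for e in residuals)           # sum of squared residuals
    mean_y = sum(y_values) / len(y_values)
    sst = sum((y - mean_y) ** 2 for y in y_values)  # total sum of squares
    return sse / len(residuals), 1.0 - sse / sst

# Hypothetical (bandwidth in Mbit/s, MOS) pairs.
bandwidth = [2.0, 4.0, 8.0, 16.0, 32.0]
mos = [1.6, 2.3, 3.1, 3.8, 4.6]

a, b = fit_log_mapping(bandwidth, mos)
mse, r2 = fit_quality(bandwidth, mos, a, b)
```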
Table 1 shows the curve fitting accuracy results for the IQX, WFL, SPL and logistic mapping functions, whereas Fig. 4 shows the associated curves. The QoS/QoE mapping mechanism, as shown in Fig. 4, transforms the measured values of bandwidth, jitter, delay and packet loss into MOS values. It is worth mentioning that latency, jitter and packet loss are inversely related to perceived service quality, i.e., the higher the QoS value, the lower the objective quality. In the case of bandwidth, however, the higher the QoS value, the higher the objective quality. For this reason, the algebraic signs in equations (1), (4), (6), and (7) have been adjusted accordingly.

D. DEEP LEARNING PREDICTION MODEL
The development of the QoE prediction model is based on deep learning techniques, namely CNN. Originally designed for two-dimensional image data, CNN can also be used to model time series prediction problems, as it can automatically learn features from sequence data, handle multivariate data, and directly output a vector for multistep prediction. One of the key advantages of CNN is that it generates representations for fixed-size frames, and the effective frame size can be expanded by stacking multiple layers. This gives exact control over the maximum length of the dependencies to be modeled. Furthermore, the convolutions can be applied in parallel, allowing an entire input sequence to be processed as a whole during both training and evaluation, which results in faster training and testing procedures [49].
As a rule, the architectural design of CNN consists of an input layer, an output layer, and multiple hidden layers. The hidden layers comprise convolutional layers, pooling (aggregation) layers, and fully connected layers. More specifically, the convolutional layers are responsible for the convolution or cross-correlation process, which maps multidimensional parameters using kernels, i.e., locally connected filters. For each position p of the output y, the convolution performs the following operation [50]:
$$y(p) = \sum_{p_G \in G} W(p_G)\, x(p + p_G)$$
where x denotes the input and p_G stands for the positions in the receptive field G of the convolutional filter W, which corresponds to the receptive field of the neurons in the inputs at a convolutional layer. W denotes the weights, which are shared across the input representation. Typically, when the inputs and outputs contain M and N feature maps, respectively, the convolutional layer needs M × N filters to perform the convolution process. The CNN architecture is based on three key principles [51]: 1) sparse interactions, 2) parameter sharing, and 3) equivariant representations, which reduce the volume of model features and the number of required parameters, lowering the risk of overfitting during model training.
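As an illustration of the per-position operation described above, the following minimal sketch computes a one-dimensional cross-correlation in plain Python. It is not the authors' implementation: the function name `conv1d_valid`, the input symbol `x`, the unit stride, and the absence of padding are assumptions made for the example.

```python
def conv1d_valid(x, w):
    """Cross-correlate sequence x with kernel w (stride 1, no padding).

    Each output position p is the weighted sum of the inputs inside the
    receptive field: y[p] = sum_g w[g] * x[p + g].
    """
    k = len(w)
    return [sum(w[g] * x[p + g] for g in range(k))
            for p in range(len(x) - k + 1)]


# Hypothetical input sequence and a first-difference kernel.
x = [1.0, 2.0, 4.0, 7.0, 11.0]
w = [-1.0, 1.0]

# The same two weights are reused at every position (parameter sharing),
# and each output depends on only two neighbouring inputs (sparse interactions).
y = conv1d_valid(x, w)
print(y)  # [1.0, 2.0, 3.0, 4.0]
```

The kernel holds two weights regardless of the sequence length, which is why the number of trainable parameters stays small compared with a fully connected mapping.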
A subclass of the CNN architecture is the multi-headed CNN, in which a separate CNN sub-model, or head, is used for each input variable. A multi-headed architecture requires adaptations both in the design of the model and in the preprocessing of the training and test datasets. In terms of model design, a separate CNN sub-model must be defined for each input variable, with the number of layers and the hyperparameters tuned accordingly. Each sub-model accepts a one-dimensional sequence of input data and outputs a flat vector that summarizes the features learned from that sequence. These vectors are merged by concatenation into one long vector that is interpreted by a number of fully connected layers before the forecast is produced.
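The data flow of a multi-headed design can be sketched in a few lines of plain Python. This is a simplified forward pass under stated assumptions, not the model used in this work: each head has a single convolution with ReLU followed by max pooling, the kernels are fixed by hand rather than learned, and a single linear unit stands in for the fully connected layers. All function names are hypothetical.

```python
def conv1d_relu(x, w, b=0.0):
    """1D cross-correlation (stride 1, no padding) followed by ReLU."""
    k = len(w)
    return [max(0.0, b + sum(w[g] * x[p + g] for g in range(k)))
            for p in range(len(x) - k + 1)]


def max_pool1d(x, size=2):
    """Non-overlapping max pooling."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, size)]


def head(x, w):
    """One sub-model: convolution -> pooling -> flat feature vector."""
    return max_pool1d(conv1d_relu(x, w))


def multi_head_forward(inputs, kernels, dense_w, dense_b=0.0):
    """Run one head per input variable, concatenate, then apply a linear unit."""
    merged = []
    for x, w in zip(inputs, kernels):
        merged.extend(head(x, w))  # concatenation of the flat vectors
    output = dense_b + sum(dw * m for dw, m in zip(dense_w, merged))
    return merged, output


# Two hypothetical input variables, each with its own kernel.
inputs = [[1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
          [6.0, 5.0, 4.0, 3.0, 2.0, 1.0]]
kernels = [[1.0, 1.0], [1.0, -1.0]]
merged, output = multi_head_forward(inputs, kernels, dense_w=[1.0, 0.0, 1.0, 0.0])
print(merged, output)  # [5.0, 9.0, 1.0, 1.0] 6.0
```

Because each variable is processed by its own head, the filters of one head never mix values from different variables; the variables interact only after concatenation.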
To develop the deep learning prediction model, we used TensorFlow [52] and Keras [53]. The proposed QoE prediction model is built using a multistep, multi-headed CNN with multivariate input. The multivariate input, as illustrated in Fig. 5, comprises the mapped values of bandwidth, jitter, latency, and packet loss. The hidden layer is based on a multi-headed CNN designed to solve sequence problems. Sequence prediction is particularly challenging because the number of elements in the input and output sequences may differ and because prediction entails providing the next value in an actual sequence. These kinds of applications are typically framed as prediction problems involving one or more input time steps and one output time step. In our case, we use the QoS measurements of the previous 2 days to forecast the QoE of the next day.
For data preparation, we resampled the dataset into non-overlapping 60-minute sequences. Consequently, the 80,640 one-minute input samples were resampled into hourly samples, yielding a total of 1,344 samples. The multi-headed CNN is designed as a multistep multivariate time series prediction model that predicts the next day's QoE values based on the previous two days' data. In particular, to predict the QoE of 24 forward time steps (1 day × 24 hours), it uses 48 hourly samples (2 days × 24 hours) as backward time steps. The prediction model has 16 input sequences corresponding to the mapped values of the QoS parameters and 1 output that yields the overall QoE value.
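The resampling and windowing arithmetic can be checked with a short sketch. Several details are assumptions made for illustration only: the hourly value is taken as the mean of each 60-minute block, consecutive windows are shifted by one day (24 samples), the series is a synthetic univariate placeholder, and the names `resample_hourly` and `make_windows` are hypothetical.

```python
def resample_hourly(minute_values, block=60):
    """Average non-overlapping blocks of `block` one-minute samples."""
    n_blocks = len(minute_values) // block
    return [sum(minute_values[i * block:(i + 1) * block]) / block
            for i in range(n_blocks)]


def make_windows(series, n_in=48, n_out=24, step=24):
    """Pair n_in backward time steps with the n_out forward steps that follow."""
    X, y = [], []
    for start in range(0, len(series) - n_in - n_out + 1, step):
        X.append(series[start:start + n_in])
        y.append(series[start + n_in:start + n_in + n_out])
    return X, y


# Synthetic placeholder standing in for 56 days of one-minute measurements.
minutes = [float(i % 1440) for i in range(80640)]

hourly = resample_hourly(minutes)
X, y = make_windows(hourly)
print(len(hourly), len(X), len(X[0]), len(y[0]))  # 1344 54 48 24
```

With a one-day shift, the 1,344 hourly samples yield 54 input/output pairs; a smaller shift would produce more, overlapping, pairs.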
The structure of the multi-headed CNN model is shown in Fig. 6. The model does not consider the data to have time steps; instead, it treats it as a sequence on which convolutional layers may apply filters, similar to a 1D image. During model development, the input and flatten layers are kept in lists in order to determine the model inputs and the function of the max pooling layer. The model expects 16 arrays as input, one for each of the sub-models. This is required at all stages of the process, including training, evaluation, and prediction with a final model. For this reason, a list of 16 3D arrays of shape [samples, time steps, features] has been created, where the features dimension is equal to one, as each array represents a one-dimensional input sequence.
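The input layout described above can be illustrated with nested lists. This is a minimal sketch, assuming the windowed data are first held as one structure indexed by [samples][time steps][variables]; the helper names and the sample count of 54 are hypothetical.

```python
def split_into_heads(windows):
    """Convert [samples][time steps][variables] into one
    [samples][time steps][1] structure per variable."""
    n_vars = len(windows[0][0])
    return [[[[row[v]] for row in window] for window in windows]
            for v in range(n_vars)]


def shape(nested):
    """Return the dimensions of a regular nested list."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


# Placeholder data: 54 samples, 48 time steps, 16 variables,
# where variable v simply holds the value float(v).
windows = [[[float(v) for v in range(16)] for _ in range(48)]
           for _ in range(54)]

heads = split_into_heads(windows)
print(len(heads), shape(heads[0]))  # 16 (54, 48, 1)
```

The same split has to be applied to every dataset passed to the model, since a multi-input model receives one array per head during training, evaluation, and prediction alike.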

V. PREDICTION MODEL EVALUATION
The aim of this paper is to introduce a model designed to predict the QoE of wireless gaming video streaming in real-world conditions. The development of the proposed prediction model is based on deep learning techniques, while its effectiveness is evaluated through comparison with state-of-the-art deep learning models. This choice is based on the superior performance demonstrated by deep learning models when trained with large amounts of data, such as those encountered in real-world mobile networks. Therefore, statistical analysis and traditional ML models are not considered; instead, the performance of the proposed model is evaluated against deep learning models belonging to the CNN and RNN classes.

A. DEEP LEARNING MODELS FOR COMPARISON
The multi-headed CNN is compared with the most widely used models of the CNN class, including CNN, multi-channel CNN, temporal convolutional network (TCN), and residual CNN-LSTM (ResCNN-LSTM), as well as with well-established recurrent models, namely the recurrent neural network (RNN), LSTM, and gated recurrent unit (GRU):
• CNN is suitable for multistep time series forecasting problems. It can be used in two ways: either in a recursive forecasting approach, in which the model produces one-step forecasts and the outputs are provided as inputs for subsequent forecasts, or in a direct forecasting approach, in which a model is constructed for each time step to be forecast. A major advantage of CNN is that it can provide forecasts using numerous 1D inputs, which is useful in problems where the output is a function of a multivariate input sequence.
• Multi-channel CNN is an enhanced version of CNN designed to predict the value of the next time step using each of the variables in the time series input sequence. This is achieved by feeding each one-dimensional time series into the model as a separate input channel. The model then reads each input sequence into a separate set of filter maps, thus learning features from each input time series variable. This method is useful for problems in which the output sequence is a function of observations at previous time steps of various independent features.
• TCN is a variant of CNN for sequence modelling problems that incorporates elements of RNN and CNN design. The key difference between TCN and other CNNs is that it uses causal and dilated convolutions. Causal convolutions prompt the model to learn the relationship between time steps while maintaining the natural time sequence. This differs from other CNNs, whose algorithms use all available data in a sequence. The dilation approach allows TCN to cover more steps of the time series as it progresses deeper into the layers. Typically, these networks are trained faster than RNN.
• LSTM, a dedicated type of RNN, is specifically developed for applications where a common RNN proves to be ineffective. It excels at capturing long-term dependencies, having an architectural design tailored to address the primary shortcomings of RNN. These shortcomings include the difficulty of preserving past state information and the problems of exploding and vanishing gradients commonly associated with conventional RNN.
• ResCNN-LSTM is a CNN framework that addresses the vanishing gradient problem, allowing for the construction of networks with a very large number of convolutional layers. This is accomplished by the use of skip connections, in which some layers are skipped and the output of the preceding layer is passed on to the current position, enhancing the model's operation. Another purpose of skip connections is to allow for better gradient flow and to ensure that critical attributes are conveyed to the network's final layers without increasing the computational load.
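The causal and dilated convolutions mentioned for TCN in the list above can be illustrated with a small sketch. It is an assumption-laden example rather than any of the compared implementations: the sequence is left-padded with zeros, each layer is taken to contain a single convolution, and the function names are hypothetical.

```python
def causal_dilated_conv1d(x, w, dilation=1):
    """y[t] = sum_k w[k] * x[t - k * dilation], treating x as zero before t = 0.

    The output at time t never reads inputs later than t (causality), and
    the dilation spaces the taps apart to reach further into the past.
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, wk in enumerate(w):
            idx = t - k * dilation
            if idx >= 0:
                acc += wk * x[idx]
        y.append(acc)
    return y


def receptive_field(kernel_size, dilations):
    """Time steps seen by the last layer of a stack with one convolution per layer."""
    return 1 + (kernel_size - 1) * sum(dilations)


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(causal_dilated_conv1d(x, [1.0, 1.0], dilation=2))
# [1.0, 2.0, 4.0, 6.0, 8.0, 10.0]

# Doubling the dilation per layer grows the receptive field quickly.
print(receptive_field(2, [1, 2, 4, 8]))  # 16
```

Changing the last element of `x` leaves every earlier output untouched, which is the property that preserves the natural time order.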

B. COMPARATIVE EVALUATION OF PREDICTION MODELS
The prediction models use each of the 16 time series input variables to predict the total value of QoE for the next 24 time steps. This is achieved by feeding each one-dimensional time series to the model as a separate input sequence, from which an internal representation is learned and then interpreted to produce the output. Instead of predicting from a single feature, our approach requires multivariate inputs, as the output sequence is a function of observations at previous time steps affected by multiple independent features. The 8 weeks of QoS monitoring resulted in 56 days of data collection. These 56 days were distributed as follows: the training dataset consists of 42 days, the test dataset consists of 7 days, and the validation dataset consists of 7 days. We employed Keras callbacks to increase the efficiency of training the models. In particular, we utilized ModelCheckpoint to store model weights at key points during training, EarlyStopping to terminate the process when the monitored evaluation metric no longer improves, and ReduceLROnPlateau to reduce the learning rate when the monitored metric no longer improves.
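The 42/7/7-day partition translates into hourly sample counts as sketched below. The chronological ordering of the three subsets (training first, then validation, then test) is an assumption made for the example, since only the subset sizes are stated; the function name is hypothetical.

```python
HOURS_PER_DAY = 24


def chronological_split(samples, train_days=42, val_days=7, test_days=7):
    """Split hourly samples into consecutive training, validation, and test blocks."""
    train_end = train_days * HOURS_PER_DAY
    val_end = train_end + val_days * HOURS_PER_DAY
    test_end = val_end + test_days * HOURS_PER_DAY
    if test_end > len(samples):
        raise ValueError("not enough samples for the requested split")
    return samples[:train_end], samples[train_end:val_end], samples[val_end:test_end]


# Placeholder indices standing in for 56 days of hourly samples.
hourly = list(range(56 * HOURS_PER_DAY))
train, val, test = chronological_split(hourly)
print(len(train), len(val), len(test))  # 1008 168 168
```

Keeping the blocks contiguous avoids leaking future observations into the training data, which matters for time series even when the exact ordering differs from the one assumed here.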
We utilized the MSE, root-mean-square error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), and median absolute error (MedAE) metrics to assess the accuracy of the prediction models [34]. Table 2 presents the prediction accuracy metrics. The proposed multi-headed CNN model outperforms the other models in all accuracy scores. It is important to emphasize that the differences among the prediction models are moderate, which validates the suitability of the CNN and RNN class models for time series data modelling problems. The results of the QoE prediction of the next day over the validation dataset values of the previous 2 days are depicted in Fig. 7. The next day's prediction corresponds to 24 forward time steps, and the validation basis of the previous 2 days corresponds to 48 backward time steps. Table 3 includes the mean QoE values as derived from the considered prediction models. We can observe that the proposed model approximates the actual QoE value with greater accuracy than the rest of the prediction models.
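For reference, the five accuracy metrics can be computed as in the following sketch, which uses their standard definitions and only the Python standard library. The sample values are invented for illustration and are not results from this study; MAPE is expressed in percent and assumes that no actual value is zero.

```python
import math
from statistics import median


def regression_metrics(actual, predicted):
    """Return MSE, RMSE, MAE, MAPE (in percent), and MedAE."""
    errors = [p - a for a, p in zip(actual, predicted)]
    abs_errors = [abs(e) for e in errors]
    n = len(errors)
    mse = sum(e * e for e in errors) / n
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs_errors) / n,
        # Assumes every actual value is non-zero.
        "MAPE": 100.0 * sum(ae / abs(a) for ae, a in zip(abs_errors, actual)) / n,
        "MedAE": median(abs_errors),
    }


# Hypothetical actual and predicted QoE values.
actual = [4.0, 2.0, 5.0, 4.0]
predicted = [3.0, 2.5, 5.0, 2.0]
metrics = regression_metrics(actual, predicted)
print(metrics["MSE"], metrics["MAE"], metrics["MAPE"], metrics["MedAE"])
# 1.3125 0.875 25.0 0.75
```

MedAE is less sensitive than MSE to a few large errors, so reporting both helps distinguish a model that is slightly off everywhere from one that is occasionally far off.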
The reason for using mean QoE values is that this aggregate metric summarizes the performance of deep learning models across the entire dataset and observation time. This facilitates comparison between models and provides a strong indication of the expected level of QoE predicted by the models. In addition, the deep learning models exhibit stochastic behavior during the training phase due to the use of mini-batch sampling and the dropout regularization technique. Utilizing average QoE values helps mitigate the effects of this stochasticity by providing a summary statistic that reflects overall performance by smoothing out the noise introduced during training.

C. REAL-TIME QOE PREDICTION
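The trained models are used to predict QoE in real time, as shown in Fig. 8, where QoE values are predicted for 24 time steps forward. Predicted versus actual QoE values are presented, based on real-time QoS monitoring and QoS/QoE mapping. The experimental data collection was performed over a 4-hour period and was based on QoS monitoring of the gaming video streaming via the Open RAN testbed, providing a total of 240 one-minute samples. These samples are used to evaluate the accuracy of the real-time QoE prediction in backward time steps and are the dataset against which the ability of the model to generalize well to new data is validated.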
The proposed multi-headed CNN model outperforms the CNN, multi-channel CNN, TCN, LSTM, ResCNN-LSTM, RNN, and GRU prediction models in all accuracy measures, as shown in Table 4. We can also see that, for all models, the real-time QoE prediction accuracy is lower than the accuracy obtained in the comparative evaluation. This is because the experimental real-time dataset is limited in size, which affects the intrinsic variance of the input data, leading to a larger intrinsic variance of the training dataset compared to the validation dataset. Due to its greater ability to handle small datasets, the multi-headed CNN model achieves higher accuracy than the rest of the prediction models. The prediction accuracy improves when the volume of the dataset is increased through long hours of data collection; however, we are interested in verifying the usefulness of the prediction model in real-world situations, where the model should be able to provide predictions in a limited amount of time.
Table 5 presents the real-time average QoE value obtained from the considered forecasting models. This value indicates the overall performance of the Open RAN testbed as determined by the composition of the dependencies among the 16 input variables. As in the comparative evaluation, the multi-headed CNN model approximates the actual QoE value more accurately than the state-of-the-art deep learning prediction models. This demonstrates that the multi-headed CNN is highly effective in handling small-sequence problems, making it suitable for real-time applications, where, due to time constraints, wireless network performance evaluation relies on feeding a limited amount of new data into the QoE prediction model.

VI. CONCLUSION
Real-time QoE prediction should be thought of as a key component in the design of next-generation wireless communication networks, especially with regard to the transmission of inherently interactive and increasingly demanding gaming video streaming applications. In this work, we provide a QoE prediction model based on deep learning techniques. More specifically, we present a multi-headed CNN-based objective QoE prediction model for gaming video streaming applications. The prediction model is evaluated on an Open RAN testbed and is capable of quantifying the impact of wireless network operation on gaming video quality using the QoS parameters of bandwidth, latency, packet loss, and jitter.
The proposed prediction model outperforms state-of-the-art deep learning models, including CNN, multi-channel CNN, TCN, LSTM, ResCNN-LSTM, RNN, and GRU. The multi-headed CNN model has greater prediction accuracy and can generalize to new data more successfully. Furthermore, it is suitable for time series forecasting problems with multivariate inputs, since a distinct CNN sub-model is employed for each input variable and produces a flat output vector that summarizes the learned features. In addition, our findings show that the multi-headed CNN model is well suited to real-time forecasting applications, since it operates effectively with small data sequences.
This is the first study to present a comparative analysis of deep learning algorithms from the CNN and RNN classes for QoE prediction in wireless gaming video streaming applications. Furthermore, this is the first time a QoE prediction model for gaming video streaming has been developed and evaluated on an Open RAN testbed.

FIGURE 2. Five number summary of QoS parameters.

FIGURE 6. Structure of the multi-headed CNN.

FIGURE 8. Representation of real-time QoE prediction versus actual values.

TABLE 1. Curve fitting accuracy.
• RNN is a modified version of feedforward neural networks (FNN), specifically designed for processing sequential or time series data. Unlike a conventional FNN, which is suitable for handling unrelated data, RNN excels in scenarios where the values in a sequence depend on each other. It can be trained to retain historical information from previous inputs, allowing it to predict future values within the sequence. This ability to capture the dependencies between values makes RNN an effective choice for modeling data with sequential relationships.
• GRU is an improved variant in the RNN category that addresses the vanishing gradient problem faced by traditional RNN. Similar to LSTM, GRU addresses this issue with gating, using two gates (the update gate and the reset gate) to regulate the flow of information. What distinguishes GRU is its simpler architecture compared to LSTM: it does not have a separate cell state and has only one hidden state. This streamlined design simplifies the model and also improves training efficiency.
, where QoE values are predicted for 24 time steps forward. Predicted versus actual QoE values are presented, based on real-time QoS monitoring and QoS/QoE mapping. The experimental data collection was performed over a 4-hour period and was based on QoS monitoring