Quality of Experience for Streaming Services: Measurements, Challenges and Insights

Over the last few years, the evolution of network and user handset technologies has challenged the telecom industry and the Internet ecosystem. In particular, the unprecedented progress of multimedia streaming services such as YouTube, Vimeo and DailyMotion has resulted in impressive demand growth and a significant need for Quality of Service (QoS) (e.g., high data rate, low latency and jitter). Numerous difficulties must be considered while delivering a specific service, such as strict QoS requirements, human-centric features, a massive number of devices, heterogeneous devices and networks, and uncontrollable environments. Consequently, the concept of Quality of Experience (QoE) is gaining visibility, and tremendous research efforts have been spent on improving and delivering reliable, added-value services with a high user experience. In this paper, we present the importance of QoE in wireless and mobile networks (4G, 5G, and beyond) by providing standard definitions and the most important measurement methods developed. Moreover, we present notable enhancements and control approaches proposed by researchers to meet user expectations in terms of service experience.

research and explore it in depth. The rest of this paper is structured as follows. We provide an overview of the factors influencing the users' experience in Section II. We introduce different models and approaches used to measure the QoE in Section III. Then, in Section IV, we discuss control methods proposed by various researchers to improve the QoE. Section V presents the challenges and enhancements aiming to bring the content closer to the end-user. In Section VI, we discuss some recent technologies and open problems related to QoE. Finally, a few concluding observations are drawn in Section VII.

II. FACTORS INFLUENCING THE QUALITY OF EXPERIENCE
Since QoE is still a new concept, content providers, service and network providers, and researchers are facing new challenges related to delivering, measuring, and controlling QoE. Investigating and analyzing the QoE influencing factors (IFs) [10] is therefore a first step. QoE is hard to predict because of its subjective nature (see Figure 1). Therefore, in order to evaluate the overall service quality, the factors that influence the users' perception should be determined beforehand [11]. Qualinet [9] has defined the IFs of QoE as follows: "Any characteristic of a user, system, service, application, or context whose actual state or setting may have influence on the Quality of Experience for the user." The IFs can interrelate, so they should not be treated as isolated entities. From this perspective, they are classified into three categories:
• Human-related Influencing Factors: any variant or invariant property or characteristic of a human user. The characteristic can describe the demographic and socioeconomic background, the physical and mental constitution, or the user's emotional state.
• System-related Influencing Factors: properties and characteristics that define the technically generated quality of a service or an application. They are associated with media capture, transmission, coding, storage, rendering, and reproduction/display, as well as with the communication of information itself from content production to the user.
• Context-related Influencing Factors: factors that embrace any situational property describing the user's environment, in terms of physical (location and space, including movements within and transitions), temporal, social (people present or involved in the experience), economic (costs, subscription type, or brand of the service/system), task, and technical characteristics.
These factors can occur on different levels. In addition to the three previous IFs (i.e., context level, system level and user level), Juluri et al. [12] introduced a fourth IF for video delivery (see Figure 2):

[Figure: QoE Influencing Factors]
• Content-related Influencing Factors: the information regarding the content offered by the service or application under study. In the case of video, they are associated with the video format, encoding rate, resolution, duration, motion patterns, type and content of the video, etc.
Several works have identified other external factors, such as the importance of the application, the user's terminal hardware, and mobility [11]. Five standard video quality metrics (i.e., the join time, the buffer ratio, the rate of buffer events, the average bit-rate, and the rendering quality) were presented in [13]. Other factors include the prefetching process, source coding [14], and the effect of packet reordering [15], [16], which was studied in [17].
From another perspective, a comparison of the influence of several metrics showed that packet loss and bandwidth have a more significant impact than jitter and delay [18]. In short, it is worth noting that specific IFs are relevant for different types of services and applications.

III. MEASUREMENTS APPROACHES
To account for user satisfaction in the context of real-time video streaming applications, QoS alone is no longer sufficient to evaluate quality. Therefore, research has been conducted to assess the QoE [19]. In this section, we address the techniques developed to measure the QoE [20]. The use of subjective methods, objective methods, or a combination of both is discussed in [11] as follows: "Subjective methods are conducted to obtain information on the quality of multimedia services using opinion scores, while objective methods are used to estimate the network performance using models that approximate the results of subjective quality evaluation."

A. Subjective Assessment
In [21], subjective assessment is considered the most accurate approach to measure the QoE perceived by the end-user. This method gathers human observers in a laboratory to watch video sequences and score them according to their own point of view and perception. The average of the scores obtained for each test sequence is known as the Mean Opinion Score (MOS) [22], which is often used to quantify these factors. It is commonly rated on a five-point discrete scale (1: bad; 2: poor; 3: fair; 4: good; 5: excellent). Although MOS is the best-known and most precise assessment, scoring is slow because it requires thinking and interpretation; moreover, observers are limited by finite memory, so a single score cannot capture the user's perception over time.
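As a small illustration of how a MOS is obtained from individual ratings, the following Python sketch averages the scores of one test sequence and reports an approximate 95% confidence interval. The function name, the sample ratings, and the use of a normal approximation (factor 1.96) are assumptions made for this example; they are not taken from [21] or [22].

```python
import math
import statistics

# Five-point scale as listed in the text.
SCALE = {1: "bad", 2: "poor", 3: "fair", 4: "good", 5: "excellent"}

def mean_opinion_score(ratings):
    """Return (MOS, half-width of an approximate 95% confidence interval)."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r not in SCALE for r in ratings):
        raise ValueError("ratings must be integers from 1 to 5")
    mos = statistics.mean(ratings)
    if len(ratings) < 2:
        return mos, float("nan")
    # Normal approximation (assumption); small panels may prefer Student's t.
    half_width = 1.96 * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width

# Hypothetical ratings from eight observers for one test sequence.
ratings = [4, 5, 3, 4, 4, 2, 5, 4]
mos, ci = mean_opinion_score(ratings)
print(f"MOS = {mos:.2f} +/- {ci:.2f}")
```

Reporting the interval next to the mean makes it visible when a panel is too small for the MOS to be trusted.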
In addition, the authors of a recent study [23] examined the impact of using young students (9-17 years old) as viewers to subjectively evaluate video quality (MOS). The results suggested that they are suitable viewers and can notice different quality issues compared to adults. However, more studies should be performed.
To conduct a subjective test of video quality [24], some of the widely known standard methods are the following:
• Double Stimulus Continuous Quality Scale (DSCQS) [25]: The reference and the processed video sequences are presented twice to the evaluator in alternating fashion. At the end of the presentation, the evaluator rates the quality on a scale from 0 (lowest value) to 100 (highest value), and the difference between the assessment values is calculated. A small difference means that the quality of the processed video is close to that of the reference video; otherwise, the quality is low. For a large number of video scenes, DSCQS quality assessment takes a very long time.
• Single Stimulus Continuous Quality Evaluation (SSCQE) [25] (ITU-R recommendation): The user continuously rates the quality of a video, usually 20 to 30 minutes long. This method allows observing the variation of quality over time by averaging the quality ratings of the subjects. SSCQE requires well-trained observers to attain stable assessment results.
• Absolute Category Rating (ACR) [26]: ACR is a single stimulus method. The video is watched for about 10 seconds, and during the following interval of up to 10 seconds, the subjects rate the video on the five-grade quality scale, expressed as MOS.
• Absolute Category Rating-Hidden Reference (ACR-HR) [26]: This approach is similar to ACR, except that the reference version of each distorted test sequence is also shown to the participants. Afterward, they give their scores in the form of MOS, and a final quality evaluation is computed using a differential quality score.
• Pair Comparison [26]: Pairs of videos are presented to the subjects, who compare them and indicate which one of the pair has superior quality. The results may vary depending on which video was shown first, and the assessments take longer than with the ACR method.
Other standards, such as Simultaneous Double Stimulus for Continuous Evaluation, Subjective Assessment Methodology for Video Quality, Degradation Category Rating or Double-Stimulus Impairment Scale, and Comparison Category Rating, are discussed in [27].
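To make the difference-based scoring used by DSCQS concrete, the sketch below computes the mean per-viewer difference between reference and processed ratings on the 0-100 scale. It is a minimal illustration under stated assumptions: the function name and the sample ratings are hypothetical, and the sign convention (reference minus processed) is chosen here for readability rather than taken from [25].

```python
import statistics

def dscqs_difference(reference_scores, processed_scores):
    """Mean per-viewer difference (reference - processed) on the 0-100 scale."""
    if not reference_scores or len(reference_scores) != len(processed_scores):
        raise ValueError("both lists must be non-empty and of equal length")
    for score in (*reference_scores, *processed_scores):
        if not 0 <= score <= 100:
            raise ValueError("scores must lie between 0 and 100")
    differences = [r - p for r, p in zip(reference_scores, processed_scores)]
    return statistics.mean(differences)

# Hypothetical ratings from three viewers for one scene.
reference = [80, 75, 90]
processed = [70, 72, 60]
diff = dscqs_difference(reference, processed)
print(f"mean difference = {diff:.2f}")  # small value: close to the reference
```

A value near zero indicates that viewers could hardly distinguish the processed sequence from the reference.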
Subjective assessments are very expensive in terms of human resources, cost and time. Moreover, such techniques cannot be used for automatic measurement or monitoring of real-time applications like video streaming. Fortunately, there exists another form of subjective QoE evaluation, named crowdsourcing [28], [29], which makes it possible to conduct web-based tests. It is more flexible, offers a diverse population of participants, and is cost- and time-effective. Besides, it creates a realistic test environment.
Here, we cite some platforms and web-based frameworks:
- Aggregator platforms (e.g., Crowdflower, Crowdsource): These platforms often delegate the task to different channels that provide workers. Such a system focuses on a limited set of predefined tasks only. It might also suffer from a significant drawback, as some aspects of the experiment might not be directly controllable;
- Specialized platforms (e.g., Microtask, TaskRabbit): These platforms focus on a limited set of tasks or a specific class of workers, and they maintain their own workers;
- Crowd providers (e.g., Amazon Mechanical Turk, Microworkers, TaskCN): Acknowledged as the most flexible type, these self-organizing services maintain a large crowd of workers and offer unfiltered access to the recruited participants;
- Quadrant of Euphoria: Permits a pairwise comparison of two different stimuli, so that the worker can judge which of the two has the higher QoE. A test uncovers fake users and rejects them, but at the cost of also exposing reliable users to rejection.
On the other hand, a crowdsourcing system that is still under development is proposed in [30] to evaluate the QoE of video-on-demand streaming. This system differs from other crowdsourcing platforms, as it can monitor network traffic and bandwidth, as well as measure the central processing unit (CPU) usage, the Random Access Memory (RAM) utilization, the number of video freezes and the MOS (i.e., users fill in a questionnaire). According to the authors' tests, it is about 100% accurate at High Definition (HD) display resolution and about 81 to 91% accurate at other qualities.
Most of these crowdsourcing techniques only allow testers to conduct the test on their computers or laptops. However, Seufert et al. [31] introduced a new application, "CroQoE". It runs on mobile devices to evaluate the QoE of streaming videos and is connected to a Linux back-end server that dynamically prepares and evaluates the test. The authors also allowed users to choose the content of the videos they would like to watch. The results showed that this added feature (i.e., choosing the content) could slightly enhance the QoE ratings. Still, their tests used only high-definition videos with a duration of a few minutes.
Crowdsourcing has some drawbacks, as there is little control over the environment, which may give participants a chance to cheat in order to increase their income. Also, as stated in [32], the QoE assessment is affected by crowd diversity and expectations, the context, the type of equipment (workers typically use their own devices, which can differ in hardware, software, and connectivity), and the duration and design of the test (a short duration encourages the workers, while a long duration may yield unreliable results).

B. Objective Assessment
A considerable number of objective quality measurements have been developed using mathematical formulas or algorithms to estimate the QoE based on QoS metrics (parameters collected from the network). Depending on the accessibility of the source signal, they are organized into three approaches:
• Full reference (FR): A reference video is compared frame-by-frame (e.g., color processing, spatial and temporal features, contrast features) with a distorted video sequence to obtain the quality (commonly used in lab-testing environments, e.g., ITU-T J.247).
• Reduced reference (RR): Only some features of the reference signal are extracted and employed to evaluate the quality of the distorted signal (e.g., ITU-T J.246).
• No reference (NR): The reference video is not needed to evaluate the quality of the distorted video sequences (commonly used for real-time quality assessment of videos, e.g., ITU-T P.1201).
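As an illustration of the full-reference principle, the following Python sketch compares one reference frame with its distorted counterpart pixel by pixel and derives the peak signal-to-noise ratio (PSNR), a common FR metric. It is only a sketch under stated assumptions: frames are represented as plain 2-D lists of single-channel pixel values, the peak value defaults to 255 (8-bit content), and the function names are chosen for this example.

```python
import math

def mse(reference, distorted):
    """Mean square error between two equally sized frames (2-D lists of pixels)."""
    if len(reference) != len(distorted) or not reference:
        raise ValueError("frames must be non-empty and have the same height")
    total, count = 0.0, 0
    for ref_row, dist_row in zip(reference, distorted):
        if len(ref_row) != len(dist_row):
            raise ValueError("frames must have the same width")
        for ref_pixel, dist_pixel in zip(ref_row, dist_row):
            total += (ref_pixel - dist_pixel) ** 2
            count += 1
    return total / count

def psnr(reference, distorted, max_value=255):
    """PSNR in dB; max_value=255 assumes 8-bit pixel values."""
    error = mse(reference, distorted)
    if error == 0:
        return math.inf  # identical frames
    return 10 * math.log10(max_value ** 2 / error)

# Hypothetical 2x2 single-channel frames.
reference_frame = [[100, 100], [100, 100]]
distorted_frame = [[100, 110], [100, 90]]
print(f"MSE  = {mse(reference_frame, distorted_frame):.1f}")
print(f"PSNR = {psnr(reference_frame, distorted_frame):.2f} dB")
```

Because the metric needs the undistorted frame, it fits lab testing but not monitoring at the client, where only the received video is available.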
Some of the best-known objective quality assessment approaches are Peak Signal to Noise Ratio (PSNR), Structural Similarity Metric (SSIM) [33], Multi-Scale Structural SiMilarity [34], SSIMplus [35] (which supports cross-frame-rate and cross-resolution comparison), Video Quality Model (VQM) [36], and Natural Image Quality Evaluator (NIQE) [37]. Although these models outperform PSNR, most researchers still use PSNR [38], the logarithmic ratio between the maximum value of a signal and the background noise, to assess video quality because of its simplicity. However, it is not appropriate for use in a real-time mechanism. A heuristic mapping of PSNR to MOS exists (see Table I); however, the research in [39] revealed that the correlation between PSNR and subjective quality decreases if the codec type of the video content changes; otherwise, PSNR is a qualified indicator of video quality. Here we present a few PSNR-to-MOS mapping models:
• The relation between PSNR and MOS for the time-variant quality of video streams on mobile terminals [40]: the per-frame $\mathrm{PSNR}(n)$ is computed from the mean square error $\mathrm{MSE}(n)$, and $\mathrm{MOS}_{PSNR}(n)$ is then captured using a linear law,
$$\mathrm{MOS}_{PSNR}(n) = a \cdot \mathrm{PSNR}(n) + b, \qquad a = \frac{c_{MOS,PSNR}}{\sigma^2_{PSNR}}, \qquad b = \mu_{MOS} - a\,\mu_{PSNR},$$
where $\mathrm{MSE}(n)$ denotes the mean square error of the n-th frame $F_n$ compared to the frame $R_n$ of the reference sequence; $i$ and $j$ address particular pixel values within the frame; $C$ is the number of color components and $c$ is an index to address them; $c_{MOS,PSNR}$ represents the sample covariance between $\mathrm{PSNR}(n)$ and $\mathrm{MOS}(n)$; $\mu_{PSNR}$ and $\mu_{MOS}$ are the sample means of PSNR and MOS, respectively; $\sigma^2_{PSNR}$ is the sample variance of PSNR; and $a$ and $b$ are the scaling and shift factors, respectively.
• The nonlinear PSNR-MOS mapping model for video services over wireless mobile networks [41], in which $a$ and $b$ are model parameters associated with measured data, $R_p$ is the transmitted rate of the video service, $PLR$ is the packet loss rate, and $\alpha$, $\beta$, $\xi$ and $\gamma$ are parameters that vary with the content and structure of the video sequences.
• PSNR-to-MOS mapping using an S-type (sigmoidal) mapping function [42], in which $\alpha$, $\beta$, $\gamma$ and $\lambda$ are parameters that can be determined through many experiments.
Moreover, the authors in [43], based on [44], evaluated a relationship between MOS and the bit-rate $R$, in which $\alpha$ and $\beta$ are parameters obtained from the upper and lower limits of the MOS values. Based on [45], $\alpha = 2.3473$ and $\beta = 0.2667$.
Beyond PSNR, other frameworks have been proposed to measure QoE and predict future QoE collapses, such as:
• The bit-rate switching mechanism is executed at the user's side in a wireless network to improve the quality perceived by the user and determine the QoE metrics. Xu et al. [46] propose a framework for dynamic adaptive streaming that, given the bit-rate switching logic, computes the starvation probability of the playout buffer, the continuous playback time and the mean video quality. It can be used to predict the QoE metrics of dynamic adaptive streaming.
• YoMoApp [47], a passive Android application, was employed in a field study of mobile YouTube video conducted in [48] to monitor the application-level key performance indicators (i.e., the buffer and the video resolution) of YouTube on the user's mobile device. This monitoring application relies on JavaScript, which might introduce some errors; however, it is accurate to within approximately 1 second.
• Pytomo [49] evaluates the playback of a YouTube [50] video as experienced by users. It collects download statistics such as the ping, the downloaded playback statistics, the number of stalling events and the total buffer duration, and then estimates the playout buffer level. Moreover, Pytomo allows studying the impact of DNS resolution. This tool could be complementary to YoMo. However, it is not always feasible, because it requires access to the user's device.
• An application for mobile service [51] was proposed to measure the QoE directly from the user's device, in order to transmit the results to the service provider while preserving the user's privacy.
• QMON [52] is a network-based approach that monitors and estimates the QoE of transmitted video streams. It focuses on the occurrence and duration of playback stalls, and it supports a wide range of encodings (MP4, FLV and WebM). The study confirmed that streaming parameters (i.e., stalling times, times on quality layers) are the most appropriate for QoE monitoring and for building an accurate QoE estimation model.
• The authors in [14] studied the quality of streaming from the perspective of flow dynamics. They developed an analytical framework that computes QoE metrics such as the dynamics of the playout buffer, the scheduling duration, and the video playback variation in a streaming service over wireless networks. The framework is reported to predict precisely the distribution of the prefetching delay and the probability generating function of buffer starvation. The results showed that the flow dynamics have more influence on the QoE metrics. The framework is also assumed to be suitable for scenarios such as hyper-exponential video length distribution, heterogeneous channel gains, and mixed data and streaming flows.
• Network operators may handle long and short views with different priorities. Thus, the authors of [53] built a model of starvation behavior in a bandwidth-sharing wireless network using a two-dimensional continuous-time Markov process and ordinary differential equations, and determined that progressive downloading considerably increases the starvation probability. Further, based on their results, they observed that the history of a time-independent streaming traffic pattern can predict future traffic, and that the viewing time follows a hyper-exponential distribution, which was validated to be more accurate than some existing models (i.e., exponential and Pareto distributions).
• The paper [54] proposes a real-time video QoE software assessment system. It evaluates network errors in the video transmission path by testing the service quality, the transmission quality, and encoded videos of various contents and sizes. The authors indicate that this platform is deployable on a real network.
• A QoE Index for Streaming Video (SQI) model was proposed by Duanmu et al. [55] to predict the QoE instantly. To build their model, they first constructed a video database (covering the effects of initial buffering, stalling, and video compression) and then investigated the interactions between video quality and playback stalling. The SQI seems well suited to the optimization of media streaming systems, as it is simple in expression and effective. However, it does not support reporting on QoE degradation and has limited monitoring parameters.
• YOUQMON [56] estimates the QoE of YouTube videos in real time in 3G networks. It combines passive traffic analysis and a QoE model to detect stalling events and map them to MOS. Every minute, the monitoring system computes the number of stalling events as well as the fraction of stalling time for every detected video. It supports two video formats used by YouTube, Adobe Flash and Moving Picture Experts Group (MPEG-4). The results appear to be accurate and close to MOS values, indicating the potential of this system. Still, it cannot identify the point of the network that impacts the quality.
• QoE Doctor [57] is an Android tool that can analyze QoE across different layers (application, transport, and network), from the app user interface (UI) to the network. The tool employs UI automation to replay user behavior and to measure the user-perceived latency (i.e., by identifying changes on the screen), mobile data consumption, and network energy consumption. QoE Doctor can quantify the factors that impact the app QoE and detect the causes of QoE degradation. However, it is not suited to supervising or controlling the mobile network, and the component responsible for detecting UI changes has to be adjusted for each specific app.
• Zabrovskiy et al. [58] presented AdViSE, an Adaptive Video Streaming Evaluation framework for web-based media players and adaptation algorithms. It supports different media formats, various networking parameters and implementations of adaptation algorithms. AdViSE provides a set of QoS and QoE metrics gathered and assessed during the adaptive streaming evaluation, as well as a log of segment requests, which is used to generate the impaired media sequences employed for subjective evaluation. Still, it provides neither a source-code-level analysis of familiar Dynamic Adaptive Streaming over HTTP (DASH) players nor support for popular commercial streaming players. In [59], the same authors proposed an end-to-end QoE evaluation to collect and analyze streaming performance metrics (e.g., start-up time, stalls, quality) objectively (AdViSE) and subjectively (Web-based subjective evaluation platform (WESP) [60]). The framework is flexible and can also determine when players/algorithms compete for bandwidth in different configurations, although it does not consider Content Delivery Networks (CDNs), Software-Defined Networking (SDN), or 5G networks.
• VideoNOC [61] is a prototype video QoE monitoring platform for Mobile Network Operators that considers video QoE metrics (e.g., bit-rate, rebuffering). VideoNOC makes it possible to analyze the impact of network conditions on video QoE and reveals video demand across the entire network, in order to develop and build better networks and streaming services. However, the platform disregards transport-layer data and relevant RAN KPIs, as well as QoE inference on encrypted video traffic.
• In the same vein, an online Machine Learning (ML) approach named ViCrypt is introduced in [62] to anticipate rebuffering events from encrypted video streaming traffic in real time. This approach subdivides the video streaming session into a series of time slots of equal length. It employs a fine-grained time slot length of 1 second (for a proper tradeoff between precision and stalling detection delay), from which features are extracted. Afterward, these features are used as input to the ML model to predict the occurrence of stalling. It should be mentioned that the initial delay and the length of stalling events can also be obtained. As an extension of the latter work, the authors demonstrated in [63] that ViCrypt can additionally predict the video resolution and the average video bit-rate accurately. Also, Vasilev et al. [64] built an ML model that anticipates the rebuffering ratio based on hidden and context information, to enhance the precision of prediction through logistic regression.
• Lin et al. [65] applied supervised ML with a support vector machine to anticipate users' QoE by considering the number of active users and the channel conditions experienced by a user. They classify a session into two categories (i.e., with or without stall events) based on cell-related information collected at the start of a video session. Regarding starvation events, mobile users experience them more often than adaptive streaming and static users, while the starvation events of the latter are easier to predict accurately. Similarly, a multistage ML cognitive method was developed by Grazia et al. [66]. This model, however, combines unsupervised learning of video characteristics with a supervised classifier trained to extract the quality-rate features automatically. Their model is reported to outperform other offline video analysis approaches.
• Orsolic et al. [67] propose YouQ, an Android application to predict the QoE (i.e., stalls, playout quality and its variations) using ML on objective metrics such as throughput and packet sizes extracted from the stream of encrypted packets. Despite the promising results, the majority of the features depend on TCP, meaning that these techniques will probably fail for UDP-based traffic.
• Similarly, the authors of [68] suggested a QoE detector based on data extracted from network packets, employing a deep learning model. The model combines an RNN, a Convolutional Neural Network, and a Gaussian Process (GP) classifier. This classifier can recognize video abnormalities (e.g., black pixels, ghosting, blockiness, columns, chrominance, color bleeding, and blur) in the current time interval (of 1 second) and predict them. The model is intended to predict video QoE in a real-time environment; however, it could encounter a few issues, such as having a small amount of training data.
• The ECT-QoE framework [69] instantly predicts the QoE of streaming over DASH, based on the expectation-confirmation theory and a video database that the authors have built. The model is reported to outperform several models, especially when combined with the SSIMplus model. Nevertheless, ECT-QoE can be applied only to videos consisting of view segments.
• Wu's model [70], contrary to other proposals, examines the global intensity and local texture metrics extracted from a decoded video to predict stall events and assess the user's quality. The algorithm maps the normalized number and duration of stalls using linear combinations. When compared to other models (e.g., [71], [72], [73]), Wu's proposal appears more consistent with subjective perception.
• A cost-constrained video quality satisfaction (CVQS) framework is proposed in [74] to predict the expected quality, considering metrics such as the high cost of data. Although it shows satisfactory results, the accuracy of CVQS could be affected by the video encoder; moreover, in the authors' tests the client can only obtain the next video segment after two seconds.
A large number of standards offer guidance on good and established practices for certain test applications. Standards do not provide the best or most advanced method available, but they give a solid, common basis that is accessible to all, such as those of the International Telecommunication Union (ITU) [75] (Table II). Furthermore, a survey [76] summarized various ITU measurement methods to evaluate video streaming quality.
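Returning to the linear PSNR-to-MOS law of [40] discussed earlier in this subsection, the following Python sketch shows how the scaling factor a and the shift factor b could be estimated from paired per-frame PSNR and MOS samples, using the sample covariance, means and variance named in the text. The function names and the sample values are hypothetical, and clamping the prediction to the 1-5 range is an assumption added for this illustration.

```python
import statistics

def fit_linear_law(psnr_values, mos_values):
    """Estimate a (scaling) and b (shift) so that MOS ~ a * PSNR + b."""
    n = len(psnr_values)
    if n != len(mos_values) or n < 2:
        raise ValueError("need at least two paired samples")
    mu_psnr = statistics.mean(psnr_values)
    mu_mos = statistics.mean(mos_values)
    # Sample covariance between PSNR and MOS.
    cov = sum((p - mu_psnr) * (m - mu_mos)
              for p, m in zip(psnr_values, mos_values)) / (n - 1)
    var_psnr = statistics.variance(psnr_values)  # sample variance of PSNR
    if var_psnr == 0:
        raise ValueError("PSNR samples must not all be equal")
    a = cov / var_psnr
    b = mu_mos - a * mu_psnr
    return a, b

def predict_mos(psnr_value, a, b):
    """Linear law, clamped to the five-point MOS range (assumption)."""
    return min(5.0, max(1.0, a * psnr_value + b))

# Made-up illustrative samples, not measurements from [40].
psnr_samples = [20, 25, 30, 35, 40]
mos_samples = [1.5, 2.0, 2.5, 3.0, 3.5]
a, b = fit_linear_law(psnr_samples, mos_samples)
print(f"a = {a:.3f}, b = {b:.3f}, MOS(32 dB) = {predict_mos(32, a, b):.2f}")
```

As noted for [39], such a fit should only be trusted for the codec and content on which it was estimated.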

C. Hybrid Assessment
According to [135], the QoE of a user can be estimated based on objective and subjective psychological measures taken while using a service or product. Moreover, another approach exists that combines subjective and objective assessment, referred to as the hybrid approach, which relies on ML algorithms [136], [137], statistics, and other fields. It can be employed in real time, and it is categorized as the most accurate approach since it mitigates the weaknesses of the previous approaches [19].
For instance, the Pseudo Subjective Quality Assessment (PSQA) was created to give results similar to human perception in real time, providing an accurate QoE measurement [138], [139]. PSQA is based on training a particular type of statistical learning tool, the Random Neural Network (RNN).
To evaluate the quality of the video, the IFs on the quality are selected and used to generate several distorted video samples. Afterward, these samples are subjectively assessed. Then the results of the observations are employed to train the RNN in order to capture the relation between the factors that cause the distortion and the quality perceived by real humans. The training is performed once; after that, the trained network can be used in real time. A comparison study in [138] showed that PSQA is more effective than subjective (MOS) and objective (PSNR) assessment in terms of time consumption and manpower; moreover, it runs in real time. Likewise, a further investigation [139] in the context of Multiple Description Coding (MDC) video streaming over multiple overlay paths in video distribution networks confirmed the results of [138]: after training an MDC-compatible version of PSQA, PSNR could not evaluate the quality adequately, and its results did not change much with the Group of Pictures (GOP) size. On the contrary, the PSQA module took the GOP size into account and differentiated whether MDC was used or not. Nevertheless, this approach has not been applied in wireless mesh networks.
Fortunately, another tool called Hybrid Quality of Experience (HyQoE) can predict the QoE of real-time video streaming applications [140]. It takes into account six parameters: the percentage of losses in I frames, P frames and B frames, the general loss, the complexity, and the motion. Comparing HyQoE to other tools, the authors demonstrated that the PSNR algorithm does not take into consideration the human visual system and the MPEG structure during the assessment process. Also, SSIM is inadequate to reflect the user opinion when different patterns of loss, motion, and complexity are analyzed, and the Video Quality Model (VQM) generates low scores.
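The SSIM index between a block $i$ of the original image and the corresponding block $j$ of the distorted image is computed as
$$\mathrm{SSIM}(i,j) = \frac{(2\mu_i\mu_j + c_1)(2\sigma_{ij} + c_2)}{(\mu_i^2 + \mu_j^2 + c_1)(\sigma_i^2 + \sigma_j^2 + c_2)},$$
where $\mu_i$ and $\mu_j$ are, respectively, the average values in the block of the original and the distorted image; $c_1$ and $c_2$ are the variables that stabilize the division when the denominator is weak; $\sigma_i^2$ and $\sigma_j^2$ are, respectively, the variances in the block of the original and the distorted image; and $\sigma_{ij}$ denotes the covariance between the block of the original and the block of the distorted image.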
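The SSIM terms defined above can be turned into a short computation for a single pair of blocks. The Python sketch below is an illustration under stated assumptions: blocks are given as flat lists of pixel values, population statistics are used for the variance and covariance, and the stabilizing constants follow the commonly used choice $c_1 = (0.01 L)^2$ and $c_2 = (0.03 L)^2$ with $L = 255$ for 8-bit content. The function name and sample blocks are hypothetical.

```python
import statistics

def ssim_block(original, distorted, max_value=255, k1=0.01, k2=0.03):
    """SSIM index for one pair of equally sized blocks (flat lists of pixels)."""
    n = len(original)
    if n == 0 or n != len(distorted):
        raise ValueError("blocks must be non-empty and of equal size")
    mu_i = statistics.mean(original)
    mu_j = statistics.mean(distorted)
    var_i = statistics.pvariance(original, mu_i)
    var_j = statistics.pvariance(distorted, mu_j)
    cov_ij = sum((x - mu_i) * (y - mu_j)
                 for x, y in zip(original, distorted)) / n
    # Constants that stabilize the division when the denominator is weak.
    c1 = (k1 * max_value) ** 2
    c2 = (k2 * max_value) ** 2
    numerator = (2 * mu_i * mu_j + c1) * (2 * cov_ij + c2)
    denominator = (mu_i ** 2 + mu_j ** 2 + c1) * (var_i + var_j + c2)
    return numerator / denominator

# Hypothetical 2x2 blocks flattened to lists.
original_block = [52, 55, 61, 59]
distorted_block = [50, 57, 60, 62]
score = ssim_block(original_block, distorted_block)
print(f"SSIM = {score:.4f}")  # equals 1 only for identical blocks
```

In practice the index is computed over many local windows and averaged over the frame; the single-block form is kept here for brevity.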
HyQoE gives results quite similar to those given by MOS. The authors believe that it can be used to optimize the QoE by improving the usage of the network's resources. Likewise, Chen et al. [141] proposed Oneclick, a framework that captures users' perception while they use network applications. Whenever the user is displeased, he can click a button to give feedback. The collected data is then analyzed to determine the user's perception under variable network conditions. The tool is intended to be intuitive, lightweight, and time-aware, and it is convenient for multi-modal QoE assessment and management studies thanks to its application-independent nature. The framework is considered to give the same results as MOS, but faster. Furthermore, the authors in [142] employed four ML algorithms (i.e., decision tree, neural network, kNN, and random forest) to estimate the MOS value based on VQM and SSIM values (i.e., the effect of video distortion and structural similarity). To assess the performance of these algorithms, the Pearson correlation coefficient and the Root Mean Square Error are employed. According to the results, the random forest algorithm was the best at anticipating user perception. However, network parameters such as transmission delay and response time are not taken into account.
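The two performance measures mentioned above, the Pearson correlation coefficient and the Root Mean Square Error, can be computed between predicted and subjective MOS values as in the following Python sketch. The function names and the sample values are hypothetical and do not reproduce the data of [142].

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired samples")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)

def rmse(predicted, actual):
    """Root mean square error between predictions and ground truth."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

# Made-up subjective MOS values and model predictions.
subjective = [1, 2, 3, 4, 5]
predicted = [1.2, 1.9, 3.1, 3.8, 5.0]
print(f"Pearson = {pearson(predicted, subjective):.4f}")
print(f"RMSE    = {rmse(predicted, subjective):.4f}")
```

The two measures are complementary: the correlation captures whether the model ranks sequences like the viewers do, while the RMSE captures how far its scores are from theirs.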
MLQoE is a modular user-centric algorithm developed by Charonyktakis et al. [143], based on supervised learning, to correlate the QoE with network parameters such as average delay, packet loss, and average jitter. The framework uses multiple ML algorithms (i.e., Artificial Neural Networks, Support Vector Regression Machines, Decision Trees, and Gaussian Naive Bayes). The algorithm that outperforms the others, together with its parameters, is selected automatically based on the dataset used as input. According to their results, MLQoE can predict the QoE score more precisely than other existing ML models. Likewise, the authors of [144] suggested a trained ML model that predicts the MOS value in SDN based on network parameters (e.g., bandwidth, jitter, and delay); their proposal seems to be efficient.
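The idea of automatically selecting the best-performing algorithm for a given dataset can be sketched as a cross-validated comparison of candidate models. The Python example below is not the MLQoE implementation: it replaces the algorithms listed above with three simple stand-in predictors (a mean baseline, a one-feature linear regression, and a nearest-neighbour predictor), uses a single hypothetical network feature (packet loss in percent), and works on made-up data.

```python
import statistics

def mean_model(train_x, train_y):
    avg = statistics.mean(train_y)
    return lambda x: avg

def linear_model(train_x, train_y):
    mx, my = statistics.mean(train_x), statistics.mean(train_y)
    sxx = sum((x - mx) ** 2 for x in train_x)
    slope = sum((x - mx) * (y - my) for x, y in zip(train_x, train_y)) / sxx if sxx else 0.0
    return lambda x: my + slope * (x - mx)

def nearest_neighbour_model(train_x, train_y):
    pairs = list(zip(train_x, train_y))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

# Stand-in candidates (assumption); MLQoE itself uses other algorithms.
CANDIDATES = {
    "mean": mean_model,
    "linear": linear_model,
    "nearest_neighbour": nearest_neighbour_model,
}

def cross_validated_error(fit, xs, ys, folds=5):
    """Mean absolute error over interleaved folds."""
    if len(xs) != len(ys) or len(xs) < 2 * folds:
        raise ValueError("need paired data with at least two samples per fold")
    errors = []
    for k in range(folds):
        test_idx = set(range(k, len(xs), folds))
        train_x = [x for i, x in enumerate(xs) if i not in test_idx]
        train_y = [y for i, y in enumerate(ys) if i not in test_idx]
        model = fit(train_x, train_y)
        errors.extend(abs(model(xs[i]) - ys[i]) for i in sorted(test_idx))
    return statistics.mean(errors)

def select_best(xs, ys, candidates=CANDIDATES, folds=5):
    scores = {name: cross_validated_error(fit, xs, ys, folds)
              for name, fit in candidates.items()}
    return min(scores, key=scores.get), scores

# Made-up data: packet loss (%) versus MOS.
loss = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
mos = [4.8 - 0.4 * x for x in loss]
best, scores = select_best(loss, mos)
print(best, {name: round(err, 3) for name, err in scores.items()})
```

Selecting on held-out error rather than training error avoids favouring the candidate that merely memorizes the dataset.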
YoMoApp (YouTube Monitoring App) [145] is a tool that is still under development. It monitors streaming parameters at the application and network layers (e.g., the total amount of uploaded and downloaded data is logged periodically) for both mobile and WiFi networks, and it also collects subjective QoE ratings (MOS) from end-users. The data is uploaded anonymously to an external database. A map is then generated from the data uploaded by all users to reveal how each network operator performs, which can be used to benchmark them. YoMoApp performs accurate measurements on an adequately small time scale (1 second). The authors recommend that QoE measurements consider longer video clips. However, the tool uses JavaScript, which can occasionally cause inconsistencies and errors. YoMoApp was also employed alongside another Android-based passive monitoring tool to investigate the precision of different approaches. The streaming parameters showed a higher correlation with the subjectively experienced quality than with the objectively experienced quality, which indicates that it is better suited for QoE monitoring [48]. Also, the authors in [146] used YoMoApp to monitor video sessions and obtain several features from end-user smartphones (e.g., the signal strength and the number of incoming and outgoing bytes). Using ML, they introduce a lightweight approach to predict video streaming QoE metrics such as the initial delay, the number and ratio of stalling events, and user engagement. According to their evaluation, network-layer features are enough to get accurate results. Recently, the authors of [147] proposed an ML model called Video Assessment of Temporal Artifacts and Stalls (ATLAS). It uses an objective video quality assessment (VQA) method that combines QoE-related features and memory features as sources of information to predict the QoE. They also used a subjective dataset, the LIVE-Netflix Video QoE Database [148], to evaluate their model.
However, the model can only deliver overall QoE scores and cannot be used for real-time bit-rate decisions. To sum up, the hybrid approach can collect metrics simultaneously from both the network and the user end. Such methods help to correlate QoS metrics with the QoE and to build better MOS prediction tools. Hybrid studies also make it possible to study the impact of variations in network performance on the users' QoE [12]. Nevertheless, little research has been conducted in this area. For example, the authors of [149] examined the effect of user behavior (e.g., seeking, pausing, and video skipping) on the accuracy of trained QoE/KPI estimation models. They concluded that much better results are obtained when the user's various interactions are included. However, more studies are needed.
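The application-layer metrics mentioned above (initial delay, number of stalling events, and stalling ratio) can be derived from a simple playback event log. The sketch below is illustrative only: the event format, the function name, and the definition of the stalling ratio (stalling time divided by the time elapsed since playback first started) are assumptions, not the format used by YoMoApp.

```python
def streaming_metrics(events, session_end):
    """Compute basic QoE metrics from (timestamp_s, state) events.

    state is one of "request", "play" or "stall" (hypothetical log format).
    """
    events = sorted(events)
    request_t = next(t for t, state in events if state == "request")
    first_play_t = next(t for t, state in events if state == "play")

    stall_count = 0
    stall_time = 0.0
    stall_start = None
    for t, state in events:
        if state == "stall" and stall_start is None:
            stall_start = t          # a new stalling event begins
            stall_count += 1
        elif state == "play" and stall_start is not None:
            stall_time += t - stall_start
            stall_start = None       # playback resumed
    if stall_start is not None:      # the session ended during a stall
        stall_time += session_end - stall_start

    watched = session_end - first_play_t   # playback time plus stalling time
    return {
        "initial_delay": first_play_t - request_t,
        "stall_count": stall_count,
        "stall_time": stall_time,
        "stalling_ratio": stall_time / watched if watched > 0 else 0.0,
    }

log = [(0.0, "request"), (2.0, "play"), (10.0, "stall"),
       (13.0, "play"), (30.0, "stall"), (31.0, "play")]
print(streaming_metrics(log, session_end=42.0))
```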
In Table III, we summarize a few measurement approaches (i.e., subjective, objective, and hybrid), outlining the methods, techniques, and challenges for each one of them.

Table III (excerpt):
Both CBR and VBR streaming were considered under static and fast fading channels. Ordinary differential equations were constructed over a Markov process to determine the prefetching delay distribution and the starvation probability. | High | Yes | Computational complexity
[…] | First, they extract the playout timestamp, and then the algorithm calculates the actual buffer fill level and the duration of the stalling event. | High | Yes | Access to the user's device; algorithm limited to YouTube
[54] | User and Network | After collecting the viewers' experience, QoE scores are evaluated through a mathematical model. | High | Yes | Computational cost; user's fairness
[140] | User | HyQoE evaluates the quality of the video based on static and trained learning (random neural network) over a wireless mesh. | High | Yes | Computational complexity

IV. CONTROLLING QUALITY OF EXPERIENCE
As previously presented, various metrics influence the QoE. In this section, several approaches and observations for enhancing and controlling the QoE of video streaming services are discussed. Some may presume that increasing the QoS necessarily yields a higher QoE, as stated in [21]. However, users may be content as long as their expectations and requirements are fulfilled, especially if the context of the video is interesting. The previous findings were confirmed by [151], whose results indicated that even frame freezes and shorter playbacks are acceptable to viewers.
However, it has been shown that the experience degrades as the number of starvation events increases, to the point where the user can no longer endure them and finally abandons the video [51]. Hence, to avoid starvation and to limit the prefetching/start-up and re-buffering delays, a model was proposed in [152] to optimize the QoE by computing the optimum start-up threshold, which influences the number of starvation events. This allows the content provider to meet its QoE requirements by choosing the right QoE metrics. Likewise, the authors in [153], when analyzing buffer starvation, suggested that service providers should configure different start-up thresholds for different categories of media files. Furthermore, based on the observations in [53], network operators are advised to give short views a higher scheduling priority: this significantly reduces their starvation and start-up delays, whereas the probability of starvation increases only slightly for long views. However, content providers are unwilling to share view statistics with network providers.
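The trade-off between the start-up threshold and the starvation probability can be illustrated with a deliberately simplified model. The sketch below is not the model of [152] or [153]: it assumes that the playout buffer behaves as a random walk that gains one packet with probability p_arrival and loses one otherwise, and that the file is long enough for the classical gambler's-ruin formula to apply. Under these assumptions, starting playback with x buffered packets leads to starvation with probability ((1 - p) / p)^x when p > 1/2.

```python
def starvation_probability(threshold, p_arrival):
    """Probability that a buffer starting at `threshold` packets ever empties.

    Assumes a +1/-1 random walk with upward probability p_arrival > 0.5
    (gambler's ruin with an unbounded upper barrier).
    """
    if not 0.5 < p_arrival < 1.0:
        raise ValueError("the formula requires 0.5 < p_arrival < 1")
    return ((1.0 - p_arrival) / p_arrival) ** threshold

def min_startup_threshold(p_arrival, target):
    """Smallest start-up threshold whose starvation probability is <= target."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must lie strictly between 0 and 1")
    threshold = 1
    while starvation_probability(threshold, p_arrival) > target:
        threshold += 1
    return threshold

# Hypothetical setting: packets arrive slightly faster than they are played.
for target in (0.1, 0.01, 0.001):
    print(target, min_startup_threshold(0.6, target))
```

A larger threshold lowers the starvation probability geometrically but lengthens the start-up delay, which is the tension that the cited works optimize.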
The authors in [154] adopted the Lagrange multiplier method, after studying the starvation probability (p_s) for different file distributions, to exploit the trade-off between p_s and the start-up delay. They were able to improve the start-up delay by 40%.
However, dynamic adaptive bit-rate streaming was not considered in their scenario. In the same manner, another work [155] used the KKT conditions, based on a resource allocation algorithm [156], to solve the optimization problem (i.e., to reduce the occurrence of stalling events and to ensure fairness among users, whether or not they use dynamic adaptive streaming). Compared to other proposals (e.g., Proportional Fair Resource Allocation [157] and Base Station Optimization [158]), theirs shows better performance. In addition, for a network with disturbed traffic, the authors of [17] proposed keeping the packet reordering percentage below 20% to maintain an acceptable level of QoE. However, they streamed the video over UDP in their study.
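To make the 20% guideline concrete, the following sketch estimates the packet reordering percentage from the sequence numbers observed at the receiver. It uses one common convention, in which a packet counts as reordered when its sequence number is smaller than the next expected one; the exact metric used in [17] may differ, and the function name and sample trace are hypothetical.

```python
def reordering_percentage(arrival_sequence):
    """Percentage of packets that arrive later than a higher-numbered packet."""
    if not arrival_sequence:
        return 0.0
    next_expected = None
    reordered = 0
    for seq in arrival_sequence:
        if next_expected is not None and seq < next_expected:
            reordered += 1            # arrived after a later packet
        else:
            next_expected = seq + 1   # in order: advance the expectation
    return 100.0 * reordered / len(arrival_sequence)

# Hypothetical arrival order at the receiver.
trace = [1, 2, 4, 3, 5, 7, 6, 8, 9, 10]
pct = reordering_percentage(trace)
print(f"{pct:.1f}% reordered, below the 20% guideline: {pct < 20.0}")
```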
Streaming services have adopted DASH [159], [160], a protocol that responds to the massive demand on network resources such as bandwidth. It adapts the quality of the requested video to the current bandwidth and the device capabilities. However, according to [161], [162], it is affected by many factors: the initial delay, stalling, and level variation (frame rate, bit-rate, and resolution), besides other factors such as the video length and the amount of motion in the video. Consequently, to derive an effective trade-off between network variations and the dynamic behavior of video streaming, the authors of [163] introduce a queue-based model that analyzes the video buffer (GI/GI/1 queue) with a pq-policy (pausing or continuing the video download) using discrete-time analysis. They suggest adjusting the buffering thresholds according to the bandwidth fluctuations to reduce stalling events. In the same vein, the authors of [164], after studying the impact of variable and fixed segment durations (HAS streaming services commonly use segments of equal duration) on the stalling probability, proposed a variable segmentation approach that effectively increases the content encoding efficiency (i.e., reduces the bit-rate per video clip). However, the segment duration can affect the QoE of DASH streaming. Besides, the authors in [74] suggested a trade-off between profit and service to network operators and mobile providers: based on several metrics such as the cost of data and encoding, they can decide on the suitable quality level at which to transfer data to the end-user, thereby reducing video storage and optimizing resource allocation. In the same context, the framework named QUVE [165] is intended to increase the QoE of video streaming services. It comprises two principal parts. The first, the QoE estimation model, considers encoding parameters, re-buffering conditions, and content time to assess the QoE for Constant Bit Rate (CBR) video streaming.
The second, the QoE parameter estimation approach, predicts the network quality as well as the re-buffering time and count for the proposed model. The results attest that QUVE can improve the QoE by choosing the appropriate encoding based on the user's network conditions.
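The effect of the buffering thresholds discussed for the pq-policy of [163] can be illustrated with a small deterministic simulation. This is a sketch under stated assumptions rather than the discrete-time GI/GI/1 analysis of the paper: time advances in one-second steps, the download pauses when the buffer reaches p seconds of video and resumes when it drops to q seconds, and the bandwidth trace, bit-rate, and start-up threshold are hypothetical.

```python
def simulate_pq_buffer(bandwidth_kbps, bitrate_kbps, p, q, startup_s=2.0):
    """Return (stalled seconds, final buffer level in seconds of video)."""
    buffer_s = 0.0
    downloading = True
    started = False
    stall_s = 0
    for bw in bandwidth_kbps:
        if downloading:
            buffer_s += bw / bitrate_kbps   # video seconds fetched in 1 s
        if not started:
            started = buffer_s >= startup_s  # initial buffering, not a stall
        elif buffer_s >= 1.0:
            buffer_s -= 1.0                  # one second of video is played
        else:
            stall_s += 1                     # buffer empty: playback stalls
        if buffer_s >= p:
            downloading = False              # pause the download
        elif buffer_s <= q:
            downloading = True               # resume the download
    return stall_s, buffer_s

# Hypothetical trace: 5 s of good bandwidth followed by a 6 s outage.
trace = [2000] * 5 + [0] * 6
print(simulate_pq_buffer(trace, 1000, p=4, q=2))    # small pause threshold
print(simulate_pq_buffer(trace, 1000, p=10, q=2))   # larger pause threshold
```

With this trace, the smaller pause threshold wastes available bandwidth just before the outage and the player stalls, whereas the larger threshold lets the buffer absorb the outage, which is in line with the suggestion to adapt the thresholds to the bandwidth fluctuations.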
In another context, users usually find it difficult to decide on the quality level of the next segments so as to maintain a high QoE. Thus, an extension of the DASH player, called MDP-based DASH, is presented in [166]; it makes this decision based on a Markov Decision Process (MDP). It requires a bandwidth model and a learning process, so that, after adequate training, the player parameters are tuned for use. It is shown that adopting an MDP to adapt the video quality notably reduces video freezing and buffering events.
There also exists a bit-rate switching mechanism that permits users to choose among different switching algorithms to control the starvation probability. Its behavior is difficult to characterize, and a wrong choice affects the QoE. In [46], a framework is proposed to assist the user in finding the optimal bit-rate to optimize the QoE, taking into consideration all future occurrences. Also, to provide the QoE expected from video streaming, HTTP Adaptive Bitrate (ABR) streaming was adopted, caching many streaming files to meet the QoE requirements. ABR thus faces a storage problem; to control it, an optimal subset of playback rates to be cached is chosen. As a solution to this problem, the authors in [54] developed a model for QoE-driven cache management that offers the best QoE while preventing the content storage from filling up rapidly.
Regarding the increase in energy consumption in cellular networks and mobile devices, the authors in [167] conducted a study on the subject. They assert that, to maintain a good balance between QoE and energy consumption while watching a video on a mobile phone over Long-Term Evolution (LTE) networks, a new design of the video streaming service can decrease the energy consumption by 30%. However, some points should be taken into consideration (increasing the length of video segments, increasing the buffer size, the signal strength, and using appropriate DASH settings). In another paper, Song et al. [168] propose an Energy-aware DASH (EDASH) framework over LTE to optimize the network throughput and to balance the energy consumption of the users' devices against the QoE; their experiments indicate that it is efficient. The authors in [169] derived a mathematical formula, expressed in terms of two QoE metrics (the video rate and the probability of timely delivery of video packets), to compute the probability of timely delivery of DASH over a wireless access cell (LTE) and thereby determine the bandwidth to assign to a mobile user to maintain a satisfactory QoE. Moving mobile phones between wireless access networks makes it hard to maintain a good QoE. Thus, the authors of [170] proposed an adaptive streaming protocol consisting of a network adaptation block and a buffer management block, which dynamically adapts the bit-rate to fluctuations in network conditions in order to provide a stable QoE over 5G. The protocol is designed to be independent of the operating system (OS) version and the CPU performance of the mobile device. The results indicate that the proposed protocol enhances the users' QoE, and it has been deployed commercially in South Korea for more than five years over commercial LTE/3G and WiFi networks. In addition, the problem of network delays for CBR and Variable Bit-Rate (VBR) traffic over 5G mobile networks has been addressed.
The authors of [173] describe an analytic method that addresses this challenge. They also present a method to compute the users' QoE based on an exponential hypothesis for streaming traffic, using the delay and the packet loss rate as metrics. This approach decreases the network delays of traffic by less than 1 ms, thereby improving the QoE.
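An exponential QoS-to-QoE relationship of the kind mentioned above is often written as QoE = alpha * exp(-beta * x) + gamma, where x is a QoS impairment such as the packet loss rate or the delay. The sketch below only illustrates this functional form; the coefficients are hypothetical placeholders chosen so that the score spans the 1-5 MOS range, and they are not the values fitted in [173].

```python
import math

def exponential_qoe(impairment, alpha=4.0, beta=0.3, gamma=1.0):
    """Map a QoS impairment (e.g., packet loss in %) to a MOS-like score.

    alpha, beta and gamma are illustrative; in practice they are fitted
    to subjective test data for each service and each impairment.
    """
    if impairment < 0:
        raise ValueError("impairment must be non-negative")
    return alpha * math.exp(-beta * impairment) + gamma

for loss_pct in (0.0, 1.0, 5.0, 10.0):
    print(f"loss = {loss_pct:4.1f}%  ->  QoE = {exponential_qoe(loss_pct):.2f}")
```

The exponential form captures the observation that the same increase in impairment hurts more when the current quality is high than when it is already poor.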
Furthermore, in some bidirectional streaming services, the uplink capacity may be required as much as the downlink capacity. For instance, the authors of [174] propose a piggyback mechanism for audio-video IP transmission over the uplink channel to enhance the QoE, which seems to perform well. The results show that the mechanism is more effective under adaptive allocation schemes than under static allocation schemes. However, it seems Nunome and Tasaka have tested their proposition on other classes of contents.
Dutta et al. [175], to face the challenges encountered in 5G networks (i.e., providing high-data-rate connectivity to an expanding mobile data traffic), suggest an approach that allows the cloud infrastructure to change the resources of a virtual environment dynamically and automatically, so as to use the resources efficiently and provide an adequate QoE. The approach seems able to ensure a truly elastic infrastructure and is promising for handling unexpected load surges while reducing resource usage, although it requires real-time PSQA values.
Other research efforts suggest that a better quality perception can be achieved when the quality is controlled. In [176], [177], the authors apply the provisioning-delivery hysteresis for QoE to the video streaming case in order to predict the behavior of the throughput and the QoE and to control the quality, using the SSIM. Another mechanism [178] is proposed to control the quality, since congestion degrades the QoS, which in turn impacts the users' QoE. The authors introduce an Admission Control (AC) mechanism based on QoS and QoE metrics, using a joint QoS/QoE metric that is predicted by a QoS/QoE mapper. Based on these metrics, the AC decides whether or not the user should be accepted within the small-cell network.
Though the results obtained are encouraging, the AC has only been simulated and, as far as we know, has not been implemented in a realistic network. In addition, the SELFNET 5G project [179] brings self-organizing capabilities into 5G networks, achieving autonomic management of the network infrastructure. It designs and implements an adaptable network management framework that provides scalability, extensibility, and smart network management, detecting and reducing some of the network problems. The framework also improves the QoE and reduces the operational expenditure (OPEX).
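The admission control idea of [178] discussed above, in which a QoS/QoE mapper predicts the experience before a user is accepted, can be sketched as follows. The mapper, the equal-share capacity model, and the acceptance threshold are hypothetical simplifications and do not reproduce the mechanism evaluated in [178].

```python
import math

def predict_mos(throughput_mbps):
    """Hypothetical QoS/QoE mapper: saturating mapping from throughput to MOS."""
    return 1.0 + 4.0 * (1.0 - math.exp(-0.5 * throughput_mbps))

def admit(cell_capacity_mbps, active_users, min_mos=3.5):
    """Accept a new user only if the predicted MOS stays acceptable.

    Assumes the cell capacity is shared equally among all users,
    including the candidate.
    """
    share = cell_capacity_mbps / (active_users + 1)
    return predict_mos(share) >= min_mos

print(admit(20.0, active_users=3))    # lightly loaded small cell
print(admit(20.0, active_users=19))   # heavily loaded small cell
```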

V. BRINGING QOE AT THE EDGE
In a typical scenario, when a mobile device requests video content, the content is served from the CDN servers and then crosses the mobile carrier's Core Network (CN) and Radio Access Network (RAN). Clearly, a massive number of simultaneous streams generates a colossal demand on the backhaul side. Moreover, the uncontrollable conditions of the wireless channel (e.g., fading, multi-user interference, peak traffic loads, etc.) make monitoring the user's QoE challenging and place an additional load on the cellular network. Delivering streaming content is also difficult, given that the path between the servers providing the desired content and the users can introduce delays when transporting data, which impacts the user's experience. Bringing the content closer to the user via caching promises to overcome several obstacles, such as the network load and delays, resulting in an enhanced QoE [180].
To improve users' QoE when using dynamic rate adaptation control over information-centric networks, StreamCache [181] is proposed. It periodically collects statistics (i.e., video requests) from edge routers to make video caching decisions. The results indicate that this approach offers a near-optimal solution for real-time caching, as it enhances the QoE by increasing the average throughput. However, the cache size at the router level might influence the performance. Also, a Mobile Edge Computing (MEC) scheme was suggested in [182] to enable network-edge-assisted video adaptation based on DASH. The MEC server locally caches the most popular segments at an appropriate quality, based on data collected from the network edge (i.e., throughput, latency, error rate, etc.). To solve the problem of cache storage, a context-aware cache replacement algorithm replaces old segments with new popular ones, which maximizes the users' QoE by ensuring steady playback and minimizing frequent switching. Proactive service replication is a promising technique to decrease the handover time between different edge nodes and to meet the desired QoE. However, the distribution of replicas inflates the resource consumption of constrained edge nodes as well as the deployment costs. In [183], the authors proposed two integer linear programming optimization schemes. The first scheme aims to reduce the QoE degradation during the handover, whereas the second aims to reduce the cost of service replication. After evaluating these schemes in the MEC setting mentioned above, the authors believe in the effectiveness of their solutions, which could also provide more information about the network (i.e., predict the user's mobility pattern).
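The cache replacement step described above for the MEC server can be illustrated with a simple popularity-based policy. This sketch is not the context-aware algorithm of [182]: it uses the request count as the only notion of popularity, and the class name, capacity, and segment identifiers are hypothetical.

```python
class SegmentCache:
    """Fixed-size cache that keeps the most requested video segments."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.requests = {}    # request count of every segment seen so far
        self.cached = set()   # segments currently stored at the edge

    def request(self, segment):
        """Serve a request; return True on a cache hit, False on a miss."""
        self.requests[segment] = self.requests.get(segment, 0) + 1
        if segment in self.cached:
            return True
        if len(self.cached) < self.capacity:
            self.cached.add(segment)
        else:
            # Evict the least popular cached segment, but only if the new
            # segment has become strictly more popular than it.
            victim = min(sorted(self.cached), key=lambda s: self.requests[s])
            if self.requests[segment] > self.requests[victim]:
                self.cached.remove(victim)
                self.cached.add(segment)
        return False

cache = SegmentCache(capacity=2)
for seg in ["a", "a", "b", "c", "c"]:
    cache.request(seg)
print(sorted(cache.cached))
```

A context-aware variant would additionally weigh factors such as the segment quality level or the measured throughput when choosing the victim.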
Furthermore, to manage call handover in wireless mesh networks, a testbed technique [184] combines the RSSI (a measure of the strength of the received signal) and the RTR (an indicator of transmission rate quality) to compute the quality of a wireless link every second. This procedure allows the link to be monitored and handover decisions to be made (selecting the access point with the highest quality level). On the one hand, this scheme is reported to improve the QoE by 70%. On the other hand, it might increase the number of updates and the delays, and it disregards variable bit-rate (VBR) traffic.
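The handover decision described above can be sketched by combining the two indicators into a single link quality score and selecting the best access point. The way [184] combines RSSI and RTR is not detailed here, so the normalization range, the equal weighting, and the assumption that RTR is expressed as a value between 0 and 1 are all hypothetical.

```python
def link_quality(rssi_dbm, rtr, rssi_min=-90.0, rssi_max=-30.0, weight=0.5):
    """Combine RSSI and RTR into a score in [0, 1] (illustrative weighting)."""
    rssi_norm = (rssi_dbm - rssi_min) / (rssi_max - rssi_min)
    rssi_norm = min(1.0, max(0.0, rssi_norm))   # clamp to [0, 1]
    return weight * rssi_norm + (1.0 - weight) * rtr

def select_access_point(measurements):
    """Return the access point with the highest link quality.

    measurements maps an access point name to its (rssi_dbm, rtr) pair,
    refreshed periodically (every second in the cited testbed).
    """
    return max(measurements, key=lambda ap: link_quality(*measurements[ap]))

measurements = {"ap1": (-60.0, 0.5), "ap2": (-75.0, 0.9)}
print(select_access_point(measurements))
```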
In another research work [185], the authors propose a cloud encoding service and an adaptive streaming player based on Hypertext Markup Language revision 5 (HTML5). The player is a client framework that can be integrated into any browser, while the server side is implemented within a public cloud infrastructure. This scheme is claimed to offer the elasticity and scalability needed to suit the clients, although the approach is specifically designed for MPEG-DASH. Due to the rapid growth of mobile data traffic (e.g., mobile videos), the authors of [186] develop optimal storage schemes and dynamic streaming policies to optimize the video quality, combining on-device caching and D2D communication to offload traffic from the cellular network and to exploit the storage available on mobile devices. They introduce a framework called reactive Mobile device Caching (rMDC). In a D2D caching network, instead of requesting a video from the base station, the user can request it from neighboring users and might be served over an unlicensed band. D2D candidates are detected before communication sessions between devices start, using beacon (synchronization or reference signal sequence) resources assigned by the network. This beacon is broadcast in the cell area to allow devices to advertise their presence and identify each other [187]. Thus, when a video is requested, the device first searches its own cache and then explores the neighbors' caches locally to retrieve the desired video. If the video is not found, the cache agent at the e-NodeB attempts to locate another mobile device, in another group belonging to the same area, that holds it. Finally, if the video is not located in any other neighboring device, the cache agent arranges to get the missing chunks of the video from the cache of the e-NodeB if they exist there; otherwise, they are downloaded from the CDN.
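The lookup order described above for rMDC can be summarized as a simple fallback chain. The sketch below only mirrors the order of the steps in the text; the function name, the use of plain sets as caches, and the returned labels are hypothetical.

```python
def locate_video(video, own_cache, neighbour_caches, other_group_caches,
                 enodeb_cache):
    """Return where a requested video would be served from."""
    if video in own_cache:
        return "own cache"
    if any(video in cache for cache in neighbour_caches):
        return "neighbouring device (D2D)"
    if any(video in cache for cache in other_group_caches):
        return "device in another group (via the e-NodeB cache agent)"
    if video in enodeb_cache:
        return "e-NodeB cache"
    return "CDN"

print(locate_video("v7", own_cache={"v1"}, neighbour_caches=[{"v2"}, {"v7"}],
                   other_group_caches=[{"v9"}], enodeb_cache={"v3"}))
```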
Figure 3 indicates the different transitions that the mobile device might go through before obtaining the desired video. The authors show that, using rMDC along with user-preference-profile-based caching, their framework performs well and achieves high network capacity and better video QoE for mobile devices. Besides, the distance between the mobile device and the server hosting the video might be long and could impact the QoE. In [188], the authors propose two mechanisms for file duplication: 1) caching (keeping duplicate copies of a file in different places); and 2) fetching (moving the video to another place or zone), simulated separately in different scenarios. Based on the observed demand for a given file, the file is selected and the duplication algorithm is activated to duplicate it at the operators' sharing server, so that it is closer and more accessible to the user with good quality and minimum cost. Content fetching seems to be more efficient than caching, and combining these mechanisms might produce even better results. To efficiently bring given content to end-users with a satisfactory QoE level, the CDN administrator should ensure that this content is strategically stored/cached across the Web [189], [190], as this profoundly impacts the user experience. The storage policy also influences the cost, both in terms of CAPEX and OPEX, to be paid by the CDN owner. It also plays a crucial role in offering CDN as a Service (CDNaaS) [191]. CDNaaS is a platform that can establish virtual machines (VMs) over a network of data centers and provide a customized CDN slice to end-users. Moreover, it can handle a significant number of videos through caches and streamers hosted at different VMs. The authors formulate two integer linear programming solutions for the VM placement problem, implemented using the Gurobi optimization tool: Efficient Cost Solution (ECS) and Efficient QoE Solution (EQS).
In terms of maximizing the QoE, the EQS algorithm shows the best performance. However, regarding time, the ECS algorithm exhibits better performance, regardless of the number of data centers and the number of flavors per location.
In order to deliver virtual server resources in a CDNaaS architecture, [192] presents a QoE estimation solution that can be employed as part of a QoE-aware system. The developed system determines how many users can be handled simultaneously by a server while guaranteeing a satisfactory service quality level. It aims to capture how the QoE of a video stream is affected by different factors. The results, based on PSQA, reveal that the stream segment duration is an influential factor that needs to be taken into account throughout resource optimization. The system might be used as part of a QoE-driven resource optimization process. However, the authors seem to have overlooked the effect of network bandwidth.
From a different perspective, an optimal rate allocation scheme was designed in [193] to limit co-channel interference and manage resources between D2D and cellular users. It uses a joint encoding rate allocation and description distribution optimization, which is forwarded to the BS and to the D2D users (predefined candidates that have already cached the content and that are selected based on their available storage and battery level) before the video segments are transmitted to the requester. The authors believe that the scheme improves the QoE of video streaming delivery. However, they did not consider the additional delays that the optimization process would generate at the BS level. Also, a dynamic allocation method is adopted in [43], implementing the shortest path tree, to allocate joint resources (i.e., video streaming, files, etc.). The results show that selecting the appropriate transmission rate, together with dynamic allocation, can enhance the QoE. Still, the authors assume that the content chunks have the same size and that the transmission rate is the same for all active nodes, which does not hold in real network scenarios. End-to-end communications in Next-Generation Networks (NGN) between users and application servers may cross different networks that belong to different operators and implement different technologies, which makes measuring, monitoring, and managing the QoE challenging.
According to [194], optimizing the QoE requires considering factors such as application-level QoS, allocated capacity, customer premises factors, and subjective user factors. These factors are hard to determine because subjective factors are difficult to measure, and some of the elements degrading the QoE may not be available for diagnosis. Moreover, crossing several heterogeneous networks/links makes it hard to identify the element that induces a poor QoS level. In this regard, the authors build a framework that can be implemented in NGN, where the user is able to report the perceived QoE and QoS via software, which allows the operator to allocate and reconfigure the resources accordingly. Nevertheless, the reporting cost and the parameter changes might affect the performance significantly. Moreover, some networks might refuse to join and prefer to manage their QoE independently. A new dynamic and reconfigurable Machine-to-Machine (M2M) network is proposed in [195], in which two metrics are introduced to manage the wireless network: the operational quality of applications and the efficiency of wireless resource utilization. These metrics allow the network to support more applications running with a higher QoS level and enhanced QoE metrics. The authors add multi-layer sensing to the proposed system, so that the platform collects information from each wireless node in the wide area and then forwards the resulting control information to the network management entity. The network management entity then decides how to optimize the network topology, among other things.
Mobile network operators have limited spectrum/bandwidth, and they pay billions of dollars to obtain time-limited licenses. Hence, using the spectrum efficiently to obtain the required capacity is of great interest to both operators and end-users. Communication networks therefore need to increase their capacity to cope with the growing demand for data transmission. The authors of [196] have described and clearly formulated this problem, along with new areas of research on infrastructure-less communication (e.g., D2D, M2M, etc.) and small cells. They also emphasize some innovative spectrum management options that permit a more flexible use of the spectrum, with D2D communication and small-cell deployment as candidates to ease such flexible usage.
Long Term Evolution-Advanced (LTE-A) has significantly enhanced the spectral efficiency. Yet, the imbalanced traffic distribution among cells and the severe congestion of some of them are still challenging issues. Techniques like smart-cells [197] and biasing [198] seem promising and might partially solve this problem, but they cannot deal with the real-time traffic distribution. The authors of [196] therefore propose a D2D communication-based load balancing algorithm to increase the ratio of user equipment (UE) that can access the Internet at the same time. This mainly helps to offload the traffic of macro cells via small cells. Unfortunately, this algorithm can only be used for network applications/services and is not suited to streaming services, as it suffers from drawbacks such as security issues and interference management. A resource management algorithm called Media Service Resource Allocation (MSRA) is presented in [199]. This scheme schedules limited cellular network resources based on content popularity, while considering the channel conditions and the packet loss rate of D2D direct links. It also achieves an interesting trade-off between the amount of video service delivered and the available cellular resources. Compared to other schemes, MSRA benefits from a rapid adjustment of the users' service distribution, reduces the impact of D2D underlying interference, and enhances the QoE level. For better QoE fairness across services in LTE/LTE-A, a self-tuning algorithm [200] is proposed. The key idea is to repeatedly change the service priority parameters to (re)prioritize services and to guarantee that all users achieve the same average QoE regardless of the type of running service. Depending on whether the objective is to improve the average service QoE or the individual QoE, the authors present two algorithms: 1) a QoE unweighted approach, and 2) a QoE weighted approach. This way, the appropriate algorithm is selected according to the preferred objective function.
Thus, if fairness between services is desired regardless of the number of users per service, the unweighted algorithm is used. Otherwise, the weighted algorithm prioritizes the popular services to enhance the users' QoE by around 15%.
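The difference between the two objectives can be illustrated by computing the QoE target that each approach steers the services toward, followed by one priority update. This is a sketch of the general idea only: reading the weighted approach as a user-weighted average, the proportional update rule, and the step size are assumptions and do not reproduce the algorithm of [200].

```python
def qoe_targets(services):
    """Return (unweighted, user-weighted) average QoE.

    services maps a service name to a (number_of_users, average_qoe) pair.
    """
    unweighted = sum(qoe for _, qoe in services.values()) / len(services)
    total_users = sum(users for users, _ in services.values())
    weighted = sum(users * qoe for users, qoe in services.values()) / total_users
    return unweighted, weighted

def tune_priorities(priorities, services, target, step=0.5):
    """One self-tuning step: raise the priority of services below the target."""
    return {name: priorities[name] + step * (target - services[name][1])
            for name in services}

# Hypothetical cell with a popular video service and a less popular web service.
services = {"video": (80, 3.0), "web": (20, 4.0)}
unweighted, weighted = qoe_targets(services)
print(unweighted, weighted)
print(tune_priorities({"video": 1.0, "web": 1.0}, services, unweighted))
```

With the unweighted target every service counts equally, whereas the user-weighted target is pulled toward the QoE of the most popular service.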
LTE wireless networks support most M2M communication classes. Yet, they face many challenges, such as dealing with a massive number of M2M devices without affecting the users' QoE. While the LTE scheduler plays an important role, it does not distinguish between M2M terminals and legacy UEs. As a result, the radio resource scheduler may end up favouring M2M terminals over user equipment. As a solution, [201] suggests an M2M-aware hybrid uplink scheduler to balance the radio resource allocation, which provides adequate scheduling of M2M terminals without affecting standard UEs and the perceived QoE. Machine Type Communication (MTC) allows machines or devices to communicate with other machines over mobile networks. The number of M2M connections is expected to exceed billions, which might overload the system when a massive number of MTC devices attempt to connect simultaneously to the mobile network. This problem is addressed in [202] with a Lightweight Evolved Packet Core (LightEPC), which organizes the on-demand creation of cloud-based lightweight mobile core networks dedicated to MTC and simplifies the network attach procedure by creating an NFV MTC function that implements all the conventional procedures. This scheme is shown to exhibit good efficiency and scalability.

VI. OPEN ISSUES
Although QoE modeling has gained tremendous attention recently, it is still a challenging topic due to its multidisciplinary and subjective nature. For instance, it is hard to get access to operators' network data and traces, which makes it hard to experiment in realistic environments. Also, the lack of open-source video databases for testing quality metrics is a significant barrier to understanding, assessing, improving, and controlling the QoE.

A. Need to develop robust and realistic models
Most existing QoE models consider only a few parameters rather than all the factors that impact the QoE. Many IFs have been identified, and those related to the user and the context (e.g., habits, cultural background, environment, etc.) should be taken into account to design a robust and holistic model. Moreover, most of the reviewed articles do not offer a full study of the complexity of the proposed models from a resource perspective (e.g., computing capacity, storage, energy consumption, etc.), specifically for handsets such as smartphones, where this complexity reduces the performance of the suggested assessment applications.
B. Need to consider powerful tools to predict and assess QoE
Throughout this article, we have surveyed a long list of methods aiming to assess and control the QoE. Unfortunately, none of them seems to perform well under general realistic settings. Namely, most of the schemes suggested in the related literature are only valid for specific cases, under strong assumptions in terms of content, user profile, handset, environment, etc. Artificial intelligence and machine learning algorithms have recently been used to measure the QoE objectively or to improve it. For instance, DASH players can use machine learning to set the appropriate resolution and/or bit-rate according to the channel state, which makes it possible to continuously track the QoE and proactively take appropriate actions to maintain a good user experience. Unfortunately, few works have used machine learning for hybrid assessment, even though it gives results similar to the subjective measurement approach. This gap is probably due to the massive amount of required data, computation, and verification, and to the complexity of the training model. We believe this research direction is still in its infancy and needs to be explored in depth. Furthermore, other powerful tools could be used to provide a better understanding of the QoE evolution over space and time. For instance, we believe mean-field game theory is a promising framework that may make it possible to model and track the QoE variation while capturing the interaction among active users. More precisely, mean-field game theory turns out to be very efficient in analyzing the behavior of a massive number of actors under uncertainty (e.g., random channel, random number of active users, unknown locations of attractive contents, etc.): by averaging over degrees of freedom, it reduces the original complex problem to a much simpler equivalent one.
C. Need to consider human at the center of service design process
Recently, new applications and technologies have emerged [203] that impose unprecedented requirements in terms of high data rate and extremely low latency. Consequently, promising the best possible experience is non-trivial due to diverse factors. Future applications such as Virtual Reality (VR), Augmented Reality (AR), Mission Critical (MC) services, Tactile Internet (TI) and teleportation will require a colossal amount of resources, and end-users will keep asking for high QoE while using these apps [204]. The International Telecommunication Union (ITU) [205] has highlighted numerous requirements as part of the developing agreement on the usage scenarios and needs of emerging services (e.g., e-health, remote tactile control, etc.). Additionally, the technical infrastructure developments of 5G communication systems have been evaluated in the context of recent system requirements (e.g., high bandwidth, low latency, and high-resolution content) and new user experiences such as 4K video streaming, TI, and AR/VR. AR allows people to add digital elements to their existing environment (e.g., Snapchat, Instagram, PokemonGO, etc.). It is already heavily used by billions of mobile users, and many companies such as Apple, Google (Google Glass), and Microsoft (HoloLens) are encouraging developers to build AR apps. Conversely, VR replaces the real world with a virtual one and requires dedicated hardware such as the Oculus Rift (expensive and not portable), which is slowing down its adoption by end-users. Moreover, TI [206] will combine many technologies such as mobile edge computing, AR/VR, automation, robotics, telepresence, etc. It will also permit the control of the Internet of Things (IoT) in real time, while moving and within a particular communication range.
Further, a new dimension will be added to human-to-machine interaction by enabling tactile and haptic sensations (the sense of touch, in essence, the manipulation and perception of objects through touch and proprioception), while at the same time transforming the interaction among machines. Therefore, assessing the QoE of such applications would need to consider all of these new parameters and will require extremely specific QoS (e.g., ultra-reliability and low latency) [207].
Inevitably, these emerging applications are changing our daily life and surrounding environment (e.g., home, work, etc.), which impacts our perception and understanding of space and time. Indeed, numerous studies such as [208] have shown that AR increases learning ability. Nevertheless, more research must be conducted across various demographic groups and geographic areas. To incentivize users to experience and interact with immersive environments, it is fundamental to provide seamless services with excellent audio/video data processing capabilities. The most critical performance issues of these applications are typically high energy consumption and long processing delay [209]. To overcome the shortage of computational resources on mobile devices, novel techniques such as mobile cloud computing and mobile edge computing should be examined, as they allow users to offload intensive computation tasks to several powerful cloud servers. However, for greater efficiency, a suitable edge-to-cloud architecture should be constructed. In this respect, machine learning techniques can be applied to address these difficulties, possibly by using available traces. For example, computational requirements could be anticipated and computational resources scheduled proactively, so that devices can minimize latency [210].
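The offloading trade-off discussed above can be illustrated with a simple latency comparison. The following is a minimal sketch under stated assumptions, not a method from the cited works: a task is described only by its CPU cycles and upload size, the uplink rate and CPU frequencies are known and constant, and energy consumption is ignored. The names `Task`, `local_latency`, `offload_latency`, and `should_offload` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:
    cycles: float      # CPU cycles needed to complete the task
    input_bits: float  # amount of data to upload if the task is offloaded


def local_latency(task, f_local_hz):
    """Time in seconds to run the task on the device CPU."""
    return task.cycles / f_local_hz


def offload_latency(task, uplink_bps, f_edge_hz, rtt_s=0.0):
    """Time in seconds to upload the input, run on the edge server, and return."""
    return task.input_bits / uplink_bps + task.cycles / f_edge_hz + rtt_s


def should_offload(task, f_local_hz, uplink_bps, f_edge_hz, rtt_s=0.0):
    """Offload only when it is strictly faster than local execution."""
    return offload_latency(task, uplink_bps, f_edge_hz, rtt_s) < local_latency(task, f_local_hz)
```

For instance, a task of 2e9 cycles with 8e6 input bits takes 2 s on a 1 GHz device, whereas offloading it over a 20 Mbit/s uplink to a 10 GHz edge server with a 20 ms round-trip time takes about 0.62 s, so offloading is preferable. A proactive scheduler would feed predicted values of the uplink rate and task size into the same comparison before the task arrives.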
As mentioned earlier, TI, MC, VR and AR are new classes of applications that completely change the way we interact with reality. It is essential to keep in mind that they can strongly impact the brain and directly affect its perception and reasoning in obvious ways (e.g., motion sickness, addiction, discomfort, eyestrain, nausea, migraine, etc.) [211]. Thus, more studies have to consider these critical issues.

D. Economics of QoE
The economics of telecom services has reached maturity, as a tremendous research effort has been spent on developing joint QoS and pricing models. Most of these models capture the interaction among competing operators over a shared market, under both homogeneous and inhomogeneous services. However, all these models only consider strategic pricing for the delivered QoS and only deal with optimizing CAPEX and OPEX. Thus, interactive models considering QoE and its influencing parameters are still to be built. More precisely, charging end-users according to the QoE they receive is of great importance. Of course, the pricing is assumed to depend not only on the delivered QoS but also on the end-users' satisfaction level and context. A deep analysis of the interaction among content provider, service provider, network provider, broker and end-users is becoming of great importance. This interesting research direction is highly interdisciplinary, as it involves economics, logistics and demand-supply optimization, flow theory, cognitive science, and psychological and behavioral science.

VII. CONCLUSION
In this article, we provide a comprehensive literature review on QoE by presenting standard definitions as well as the influencing factors of QoE, which depend mostly on the type of network, the type of device, the content, the services and the users. Next, we list major tools and techniques for monitoring and measuring/estimating the QoE of a given service. We also discuss the challenges encountered in wireless and mobile networks (e.g., LTE, LTE-A and 5G), such as network capacity and varying channel conditions. Then, we present the most impactful solutions from the literature. Many improvement mechanisms and controlling approaches that are promising, or have already proven effective, are also cited and analyzed.
With 5G being deployed around the world, providing responsive networks able to deliver high throughput and low latency is no longer a major challenge. However, supporting applications with extremely demanding latency and reliability requirements, such as VR/AR and the tactile Internet, is still to be addressed. Thus, we believe considerable research effort needs to be devoted to developing efficient mechanisms that meet these requirements.