Network Capacity Estimators Predicting QoE in HTTP Adaptive Streaming

The aim of adaptive HTTP streaming technology is to preserve the best possible video streaming quality for viewers in heterogeneous network conditions. This is achieved by making multiple quality versions of the video available. Switching between versions during playback should be imperceptible and fluent. The decision about quality-level switching is typically based on network capacity estimation and buffer occupancy, which predict the risk of stalling. Since quality-level switching and stalling are directly evident to the user, they are often classified as influence factors of quality of experience (QoE). In this paper, we observe different network capacity estimators and buffer behavior in limited network conditions and study how well the estimators predict QoE. The challenges of variable bitrate (VBR)-encoded video are considered. We also propose two new estimators to predict QoE: one compares the segment fetch time to the segment playback time, while the other explores the difference between the throughput and the average download rate. As segment duration may influence HTTP adaptive streaming (HAS) playback in unstable conditions, the findings are tested with four segment lengths. Moreover, streaming quality is analyzed in a testbed using two popular web players to reveal possible effects of the players' features.


I. INTRODUCTION
The high global Internet penetration rate has enabled the massive growth of streaming video on demand (SVoD) services. Cisco [1] predicted that Internet video would account for 82 percent of global Internet traffic by 2022. The main actors in the video delivery chain are content providers, content delivery network (CDN) operators, Internet service providers (ISPs), and application designers [2]. Although these parties have their own criteria for developing services, the quality of experience (QoE) of the end user is their common interest for customer satisfaction. ITU-T FG IPTV [3] defines QoE as "the overall acceptability of an application or service, as perceived subjectively by the end user". QoE is subjective and depends on a user's experience and context. QoE data can be collected in test environments involving human participants. There are also standardized models, such as the ITU-T P.1203 recommendation, for assessing the QoE of HAS.
Before delivering a video, the content provider makes decisions about encoding and compression that affect video quality. The aim of on-demand streaming is to transmit and display the stored video with the best possible quality from the user's point of view. The transmission channel has characteristics such as the available bandwidth, packet loss, delay, and jitter, which can affect QoE. Video transmission in the network is controlled by the rules defined by streaming and transmission protocols. Since the 2010s, HAS technology has come to dominate the streaming protocol field. In HAS, all requests are made over HTTP on port 80 (or port 443 for HTTPS), as in plain web browsing. Thus, the streaming traffic can traverse firewalls and proxy servers. HAS does not need a persistent connection between the server and player, and it can utilize existing content delivery networks; no special streaming servers are needed. In practice, the following four HAS technologies currently share the market: Apple's HTTP Live Streaming (HLS), Microsoft Smooth Streaming (MSS), Adobe's HTTP Dynamic Streaming (HDS), and MPEG's Dynamic Adaptive Streaming over HTTP (DASH or MPEG-DASH). DASH is the first HTTP-based adaptive bitrate streaming solution to become an international standard. In this paper, HLS is used, since it is the most widely used streaming protocol.

The associate editor coordinating the review of this manuscript and approving it for publication was Nishant Unnikrishnan.
In HAS technology, the video is encoded into multiple quality versions. Furthermore, each quality version is divided into sections of a few seconds called segments. Using HTTP GET messages, the client requests each segment separately. Segmentation reduces network wastage compared with traditional progressive download, where the entire video is downloaded with a single request. Switching the quality level is possible at segment boundaries because the segments of all quality levels are aligned. If the streaming conditions change, the client can request the next segment in a different quality than the previous one. While downloading segments, the client collects and monitors the data needed to evaluate the streaming conditions. Collected data can include the observed throughput for each segment, buffer occupancy information, and possibly the central processing unit (CPU) load. By combining and analyzing the collected data, the network conditions and system capacity are assessed. The goal is to detect the limits at which the current bitrate should be switched higher or lower to better match the prevailing streaming conditions. The choice between bitrate options is based on the estimated streaming conditions and the adaptation logic. The adaptation logic includes the rules that define how conservatively or aggressively the bitrate is changed. In addition to the adaptation logic, buffering strategies and segment duration may affect the streaming quality in fluctuating network conditions. Buffering strategies help optimize playback performance by trying to prevent stalling while enabling fast startup and minimizing data wastage. A typical segment duration is between 2 and 10 seconds. Apple recommends a target segment duration of 6 seconds [4].

VOLUME 10, 2022. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Short segments enable faster reactions to changing streaming conditions because the segment duration defines the bitrate switching interval. However, every segment must start with an I-frame, which lowers the encoding efficiency compared with videos of the same bitrate that contain fewer I-frames. Requesting each segment separately also causes higher overhead than traditional streaming protocols. While the overhead can be decreased by lengthening the segment duration, longer segments may increase the initial delay.
Like other streaming techniques, HAS is vulnerable to network interference. Various factors affect delivery efficiency in packet-switched networks, including the available bandwidth; network congestion; bit errors; capacity restrictions on the client; data processing on the media server, routers, or switches; and interference in the transmission medium. HAS faces extra challenges because it uses TCP as the transmission protocol. TCP performs best with a steady stream of data packets. Thus, the sequential HTTP requests, which create an on-off pattern, present a challenge for TCP. The relationship between HAS and TCP performance is studied, for example, by Hu et al. [5] and Huang et al. [6].
This study explores the relationship between selected quality of service (QoS) estimators and the QoE influence factors of HAS playback. In HAS, the most common QoE influence factors at the application level are a long initial buffering time, interruptions due to rebuffering, and decreased segment quality (and switching between quality levels) [7]. Startup delay and rebuffering are directly visible to the user, and HAS tries to minimize them by decreasing the segment quality. Hence, quality changes, even noticeable ones, may be unavoidable; however, overly frequent quality switching or distracting bitrate oscillation can be avoided with a decent adaptation algorithm. Garcia et al. [8] also found that changes between high-quality versions are less noticeable than changes between low-quality ones. They suggested preparing more video versions at low quality to make quality changes gradual and less noticeable to the user. Typically, adaptation methods use network capacity estimation and/or buffer occupancy information to choose a suitable bitrate. Even many model-based approaches, such as the DASH rate adaptation algorithm QUETRA developed by Yadav et al. [9], are built around these QoS metrics.
As network capacity and buffer occupancy change before the user observes an improvement or reduction in quality, they predict the QoE influence factors of the playback. Often, predictive metrics are modified to be more applicable in adaptation algorithms. In this study, we monitor these metrics under changing network conditions to reveal their features and possible weaknesses in bitrate adaptation. To cause changes in HLS streaming quality, a variable bitrate (VBR)-encoded test video was streamed in altered network conditions by varying the available bandwidth and inducing packet loss. Two popular video players designed for web playback were chosen for analysis.
This study makes the following contributions in the HLS streaming context:
• It shows the effects of network impairments on bandwidth estimators and buffer occupancy (i.e., on typical metrics used in adaptation algorithms);
• It examines how well common bandwidth estimators, our two proposed estimators, and buffer occupancy can predict QoE influence factors;
• It examines the effects of lengthening the segment duration on QoE influence factors in unstable streaming conditions; and
• It uses two players in our test environment to examine the possible effects of the players' features.
The paper is organized as follows: Section II discusses related work, and Section III introduces the streaming condition estimators used in this paper. Section IV presents our testbed, while Section V describes HLS streaming behavior in the test environment, where network conditions are altered; the effect of segment duration is also examined there. Section VI discusses the findings of this study, and Section VII concludes the paper.

II. RELATED WORK
The performance of media players depends largely on rate adaptation methods. In addition to throughput-, buffer-, and hybrid-based approaches, Yadav et al. [9] divide adaptation approaches into QoE-centric, queuing-model-based, and non-normative approaches. The simplest throughput-based methods estimate future throughput from the most recently arrived segment. Buffer-based methods use different thresholds to prevent buffer underflow or overflow. When throughput metrics and buffer occupancy are used together to assess streaming conditions, the bitrate version can be selected by assessing the TCP throughput first and then fine-tuning the selection based on the buffered media time. Karn et al. [10] developed an algorithm that predicts throughput but makes the final decision about level switching based on buffer occupancy. In particular, when the buffer occupancy is between two threshold values, the algorithm keeps the current quality level regardless of the throughput estimate. This avoids unnecessary quality switching when two clients compete. Tian and Liu [11] developed a rate adaptation algorithm that uses TCP throughput estimation together with an adjustment factor, which is the product of the buffer size adjustment, buffer trend adjustment, and video segment size adjustment functions. QoE-centric methods consider throughput and buffer occupancy in order to avoid radical or frequent quality level changes, as well as other factors known to influence QoE. In queuing-model-based approaches, the HAS client is modeled as a queuing system whose queue length is the buffer occupancy. QUETRA [9] selects the quality level of each segment so that the buffer occupancy converges to the ideal value given the estimated network throughput. Thus, QUETRA also combines buffer-based and throughput-based approaches. Yadav et al. [9] place in the non-normative category those methods that use less general goals or means when selecting the quality level.
These include, for example, server-side quality selection aiming for fairness among clients. To conclude, even the more sophisticated adaptation approaches rely on network throughput estimation or buffer occupancy monitoring.
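To make the hybrid idea concrete, the following Python sketch shows a throughput-first selection whose outcome is overridden by a buffer band, in the spirit of the threshold behavior described for Karn et al. [10]. It is a simplified illustration, not their algorithm: the function name `select_level_hybrid` and the threshold values `low_s` and `high_s` are hypothetical.

```python
from typing import Sequence


def select_level_hybrid(
    bitrates: Sequence[float],  # available bitrates in bit/s, sorted ascending
    current: int,               # index of the quality level in use
    throughput_est: float,      # throughput estimate in bit/s
    buffer_s: float,            # buffer occupancy in seconds of media
    low_s: float = 10.0,        # hypothetical lower buffer threshold (s)
    high_s: float = 20.0,       # hypothetical upper buffer threshold (s)
) -> int:
    """Pick a quality level by throughput, then let the buffer band veto switches."""
    # Throughput step: highest level the estimate can sustain (level 0 as fallback).
    candidate = 0
    for idx, rate in enumerate(bitrates):
        if rate <= throughput_est:
            candidate = idx
    # Buffer step: inside the band, keep the current level regardless of throughput.
    if low_s <= buffer_s <= high_s:
        return current
    return candidate


levels = [1e6, 2e6, 4e6]
print(select_level_hybrid(levels, current=1, throughput_est=5e6, buffer_s=15.0))   # 1 (held)
print(select_level_hybrid(levels, current=1, throughput_est=5e6, buffer_s=25.0))   # 2
print(select_level_hybrid(levels, current=1, throughput_est=1.5e6, buffer_s=5.0))  # 0
```

The band acts as hysteresis: short throughput swings are ignored as long as the buffer stays within it.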
VBR videos bring more challenges to bitrate selection. The more a VBR video's bitrate deviates from the target encoding bitrate, the more leeway is needed to prevent buffer underflows. It is not uncommon for videos on the Internet to occasionally reach even double the advertised bitrate. In addition to buffer size, various smoothing techniques are applied to improve playback quality when streaming VBR videos. Le et al. [12] used video bitrate estimation with a moving average to evaluate capacity sufficiency more precisely for VBR videos. The adaptation algorithm of Dubin et al. [13] estimated the median bandwidth over previous segments instead of the average to obtain a more stable estimate. These researchers also suggested that playlist files should include each segment's rate in addition to the average bitrate of the entire quality level.
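The difference between a mean-based and a median-based estimate can be illustrated with a short Python sketch. It assumes a sliding window over the last `k` per-segment throughput samples. The window length and the sample values are illustrative and are not taken from Dubin et al. [13]. A single dip pulls the mean down but leaves the median unchanged.

```python
from statistics import mean, median


def window_estimates(samples, k=5):
    """Return (mean, median) of the last k per-segment throughput samples."""
    recent = samples[-k:]
    return mean(recent), median(recent)


# Illustrative per-segment throughputs in Mbit/s; the last segment hit a short dip.
history = [4.0, 4.2, 3.9, 4.1, 0.8]
avg, med = window_estimates(history)
print(f"mean={avg:.2f} Mbit/s, median={med:.2f} Mbit/s")  # mean=3.40, median=4.00
```

The robustness cuts both ways. The median also reacts more slowly when a throughput drop is persistent rather than momentary.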
The effects of segment duration have been considered in various studies. Sideris et al. [14] observed in their experiment that a longer segment duration achieved a better QoE level. They deduced that downloading shorter segments prevents TCP's sending window from reaching high values, which causes the adaptation logic to remain at lower quality levels. Islam and Khan [15] observed that downloading one large segment is faster than downloading multiple smaller segments; they considered the option of varying the segment duration, instead of switching to a lower bitrate version, in insufficient network conditions. In addition, Liu et al. [16] studied the possibility of using segment duration in rate adaptation. They developed a rate adaptation method that estimates the minimum segment duration for producing a smoothed HTTP/TCP rate representing the current network capacity.
Nguyen et al. [17] compared streaming with fixed segment lengths of 2, 5, and 10 seconds in networks with different round-trip times (RTTs) using instant and smoothed capacity estimation methods. They found that the advantages of longer segments emerge as the RTT increases. Videos delivered with longer segments reach a higher average bitrate during streaming, especially with the instant throughput-based adaptation method. They also discovered that a shorter segment duration reduces the frequency of buffer underflows. In another study, Mondal et al. [18] explored YouTube's bitrate and quality adaptation algorithm. They found that YouTube uses parallel segment downloading and varying segment lengths to offer the best possible quality with minimum data wastage.
In our research, we bring together and compare some common estimators, introduced in Section III, and evaluate how well QoE influence factors can be predicted with them. These estimators can further be used in adaptation algorithms. By modifying the introduced metrics, we form two new streaming quality estimators. The analysis is performed using a VBR-encoded video. Although HTML5 tags enable embedding videos directly in a webpage, adding adaptive bitrate streaming, live streaming, and other functionality requires the HTML5 Media API and JavaScript. For that reason, a ready-made HTML5 player is often the most straightforward solution. Unlike the papers mentioned previously, this study uses two common ready-made web players as test players.

III. NETWORK CAPACITY AND BUFFER OCCUPANCY ESTIMATION
The traditional method for determining the most suitable bitrate version is to assess the network capacity and/or monitor the client buffer occupancy. To avoid reacting to short-term throughput variation caused by TCP congestion control, a smoothed throughput estimate can be used to detect more persistent bandwidth changes [19]. The simplest method for assessing the network capacity $T$ is to measure the segment fetch time (SFT) and divide the segment size $l_{size}$ by it, that is,

$$T(i) = \frac{l_{size}(i)}{SFT(i)}. \tag{1}$$

Here, $SFT(i)$ denotes the period from the time instant $t_i$ of sending a GET request for the $i$th media segment to the instant of receiving the last bit of that segment, that is, the time consumed downloading a segment of size $l_{size}(i)$. The network capacity for the next segment request interval $i+1$ can be estimated as

$$\hat{T}(i+1) = T(i).$$

The longer the media segment duration, the smoother the throughput estimate in equation (1). Liu et al. [19] estimated the segment size $l_{size}$ by multiplying the segment duration $l_{dur}$ by the segment bitrate $l_{br}$ obtained from the playlist file. Hence,

$$T(i) = \frac{l_{dur} \cdot l_{br}}{SFT(i)}.$$

Method (1) takes only the previous segment fetch time into account when estimating the network throughput for the next segment interval. To smooth the estimate further, the exponentially weighted moving average (EWMA) of the segment throughput can be used. Parameter $\delta \in (0, 1)$ is a weighting value (smoothing factor). Following [20], the smoothed throughput estimate for the download interval $i+1$ can now be formulated as

$$T_e(i+1) = \delta \, T(i) + (1 - \delta) \, T_e(i).$$

A smoothed bandwidth estimate may react late to a large throughput decrease. In these situations, the buffer should be large enough to prevent stalling. In HAS streaming, the data are transferred periodically. A segment fetch period, the time between two consecutive GET requests, may include long idle periods. Thus, the average download rate can be much lower than the throughput.
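Here is a minimal Python sketch of equation (1) and the EWMA update. The names `segment_throughput` and `ewma` are hypothetical. Times are in seconds and sizes in bits. Seeding the EWMA with the first sample is an assumption, since the initial value of the smoothed estimate is not specified here.

```python
def segment_throughput(get_time: float, last_bit_time: float, size_bits: float) -> float:
    """Equation (1): T(i) = l_size(i) / SFT(i), with SFT(i) measured from the GET request."""
    sft = last_bit_time - get_time
    return size_bits / sft


def ewma(samples, delta=0.2):
    """x_e(i+1) = delta * x(i) + (1 - delta) * x_e(i); seeded with the first sample (assumption)."""
    estimates = []
    x_e = samples[0]
    for x in samples:
        x_e = delta * x + (1 - delta) * x_e
        estimates.append(x_e)  # estimates[i] is the prediction for interval i + 1
    return estimates


# Four 2-s segments of 4 Mbit each; the third fetch is slowed by a throughput dip.
fetches = [(0.0, 0.5), (2.0, 2.5), (4.0, 6.0), (6.0, 6.5)]
T = [segment_throughput(t, h, 4e6) for t, h in fetches]
print(T)                                   # [8000000.0, 8000000.0, 2000000.0, 8000000.0]
print([round(x / 1e6, 2) for x in ewma(T)])  # [8.0, 8.0, 6.8, 7.04] (Mbit/s)
```

With $\delta = 0.2$, a single slow segment moves the estimate only a fifth of the way toward the dip. This damping is what protects against TCP's short-term variation, and it is also the source of the late reaction mentioned above.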
The average download speed in a segment fetch period is

$$A(i) = \frac{l_{size}(i)}{t_{i+1} - t_i}. \tag{4}$$

This represents the average speed at which the client can receive a segment given the restrictions set by the throughput or the playback buffer. Here, $t_i$ is the time point of the GET request for the $i$th segment. Akhshabi et al. [21] used EWMA smoothing on the average download speed over a constant 2-second period to explore adaptation algorithms. In that case, $A(i) = m_i / (2\,\mathrm{s})$, where $m_i$ represents all media data downloaded in the $i$th 2-second period. In the following, we use EWMA smoothing on equation (4) to estimate the average download rate for the next segment interval, that is,

$$A_e(i+1) = \delta \, A(i) + (1 - \delta) \, A_e(i).$$

In this paper, $A_e$ is applied to form a new estimator. The difference $\Delta(i) = T(i) - A(i)$ gives information about bandwidth utilization. When playback is in a steady state, that is, when the playback buffer is full, $A(i)$ follows the average video bitrate and the difference $\Delta(i)$ increases. The less time a player spends in idle periods, the closer $A(i)$ gets to $T(i)$; that is, $\Delta(i)$ decreases. A decreasing difference means that the client strives to maximize its use of the bandwidth because the player is in the buffering state. In addition to the startup phase, the buffering state is entered every time the buffer occupancy decreases because the streaming conditions are insufficient for the current video bitrate, that is, when media data are removed from the playback buffer faster than they are received. Due to these features, the EWMA-smoothed difference

$$\Delta_e(i+1) = \delta \, \Delta(i) + (1 - \delta) \, \Delta_e(i)$$

is the other of our two proposed new streaming quality estimators. One option for estimating a threshold value for $\Delta_e$ is to monitor the initial buffer filling phase. When the buffer is filled for the first time, the client uses the maximum capacity and $\Delta_e$ is at its smallest. The closer to zero the difference becomes during the filling phase, the more efficiently the client can utilize the available bandwidth.
If $\Delta_e$ later approaches the value observed during the initial buffer filling, the playback is closer to transitioning into the buffering state.
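The following Python sketch computes $A(i)$, the difference $\Delta(i) = T(i) - A(i)$, and their EWMA-smoothed counterparts from GET request and arrival timestamps. It is a minimal illustration with a hypothetical function name (`download_estimators`) and made-up timestamps. The first two segments are requested back to back (buffering state), and the last two after idle periods (steady state). Because the EWMA is linear, $\Delta_e$ equals $T_e - A_e$ whenever all three are seeded consistently.

```python
def ewma(samples, delta=0.2):
    """x_e(i+1) = delta * x(i) + (1 - delta) * x_e(i); seeded with the first sample (assumption)."""
    estimates, x_e = [], samples[0]
    for x in samples:
        x_e = delta * x + (1 - delta) * x_e
        estimates.append(x_e)
    return estimates


def download_estimators(get_times, arrival_times, sizes_bits, delta=0.2):
    """Return (T_e, A_e, Delta_e) lists.

    get_times holds one more entry than sizes_bits: t_{i+1} closes fetch period i.
    """
    T = [s / (h - t) for s, t, h in zip(sizes_bits, get_times, arrival_times)]
    A = [s / (t_next - t) for s, t, t_next in zip(sizes_bits, get_times, get_times[1:])]  # eq. (4)
    D = [t_val - a_val for t_val, a_val in zip(T, A)]  # Delta(i) = T(i) - A(i)
    return ewma(T, delta), ewma(A, delta), ewma(D, delta)


# Four 2-s segments of 4 Mbit each; every fetch takes 0.8 s (T(i) = 5 Mbit/s).
gets = [0.0, 0.8, 1.6, 3.6, 5.6]
arrivals = [0.8, 1.6, 2.4, 4.4]
T_e, A_e, D_e = download_estimators(gets, arrivals, [4e6] * 4)
print([round(d / 1e6, 2) for d in D_e])  # [0.0, 0.0, 0.6, 1.08] (Mbit/s)
```

While requests are back to back, $A(i) = T(i)$ and $\Delta_e$ stays at zero. Once idle periods appear, $A(i)$ falls to 2 Mbit/s while $T(i)$ stays at 5 Mbit/s, and $\Delta_e$ starts to grow.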
If the video bitrate occasionally shows high variation, even within the same HAS quality level, comparing only the throughput to the bitrate may not be enough for adaptation methods. To avoid relying on absolute throughput measurements, we examined the relationship between the segment fetch time and the segment playback time, that is,

$$S(i) = \frac{SFT(i)}{l_{dur}}.$$

If $S(i)$ exceeds a threshold value $\lambda = 1$, segment $i$ is received more slowly than it is played out, which causes the buffer occupancy to fall. Conversely, a value of $S(i)$ below 1 indicates that the current throughput is sufficient for the video and that the buffer occupancy can grow if the buffer is not yet full. In practice, a value of $\lambda$ below 1 should be selected to leave time, for example, for decoding. Metric $S(i)$ indicates the direction in which the buffer occupancy develops. It is not independent of $A(i)$ and $T(i)$, as the following equation shows:

$$S(i) = \frac{A(i)}{T(i)} \cdot \frac{t_{i+1} - t_i}{l_{dur}}.$$

To form a streaming quality estimator from $S(i)$, we used its EWMA-smoothed version

$$S_e(i+1) = \delta \, S(i) + (1 - \delta) \, S_e(i),$$

with the same smoothing factor $\delta$ as above. While monitoring $S_e$, the buffer occupancy can be estimated by comparing the received media time to the time elapsed since the initial buffer filling. Basically, all the information needed comprises the timestamps of the sent GET requests, available at the network level. If application-level information on the video timeline position is available, an accurate buffer occupancy can be deduced. As the segment duration $l_{dur}$ is constant and the player requests a segment only after all the previous ones have been received, the buffer occupancy $b$ at time point $t$ is

$$b(t) = l_{dur} \cdot \left|\{\, i : h_i \le t \,\}\right| - p(t). \tag{7}$$

Here, $h_i$ is the time when the $i$th segment arrived and $p(t)$ is the video timeline position at time $t$. Using $p(t)$ gives a more accurate estimate than using real time when evaluating the buffer alongside $S_e$ monitoring. In particular, equation (7) takes stalling occurrences into account. As segments are moved into the playout buffer as a whole, $b(t)$ forms a sawtooth line.
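The ratio $S(i) = SFT(i)/l_{dur}$ and its smoothed version $S_e$ can be sketched in a few lines of Python. The sketch assumes a constant segment duration and uses made-up timestamps in which the fetch time of 2-s segments jumps from 1.5 s to 2.28 s. The function name `fetch_ratio_estimator` is hypothetical. The example also shows a side effect of smoothing: with $\delta = 0.2$, two consecutive segments with $S(i) = 1.14$ are not yet enough to push $S_e$ over $\lambda = 1$.

```python
def fetch_ratio_estimator(get_times, arrival_times, seg_dur, delta=0.2, lam=1.0):
    """Return S(i) = SFT(i) / l_dur, the EWMA S_e, and a flag for S_e > lambda."""
    S = [(h - t) / seg_dur for t, h in zip(get_times, arrival_times)]
    s_e = S[0]  # seeding with the first sample is an assumption
    smoothed, alarms = [], []
    for s in S:
        s_e = delta * s + (1 - delta) * s_e
        smoothed.append(s_e)
        alarms.append(s_e > lam)  # True: buffer expected to shrink
    return S, smoothed, alarms


gets = [0.0, 1.5, 3.0, 5.28]
arrivals = [1.5, 3.0, 5.28, 7.56]
S, S_e, alarms = fetch_ratio_estimator(gets, arrivals, seg_dur=2.0)
print([round(s, 2) for s in S])    # [0.75, 0.75, 1.14, 1.14]
print([round(s, 3) for s in S_e])  # [0.75, 0.75, 0.828, 0.89]
print(alarms)                      # [False, False, False, False]
```

Unlike $T(i)$, the ratio needs no bitrate information. A value of 1.14 says that this segment took 14 percent longer to fetch than to play, whatever its actual size.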
In the buffer occupancy figures given below, we depict the buffer occupancy only at the GET request points, $b(t_i)$, in which case the buffer always contains at least one segment.
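Equation (7) can be evaluated at the GET request points with a few lines of Python. This is a sketch under stated assumptions: the arrival times $h_i$ are sorted, $p(t)$ comes from the player's timeline position (in the testbed it was read through the players' JavaScript APIs), and the helper names are hypothetical. The second function is a network-level approximation that replaces $p(t)$ with the time elapsed since the first segment arrived (an assumed stand-in for the end of the initial buffer filling). It cannot see stalls, which is why $p(t)$ gives the more accurate value.

```python
from bisect import bisect_right


def buffer_occupancy(t, arrival_times, position, seg_dur):
    """Equation (7): b(t) = l_dur * |{i : h_i <= t}| - p(t)."""
    n_arrived = bisect_right(arrival_times, t)  # segments fully received by time t
    return seg_dur * n_arrived - position


def buffer_from_network_only(t, arrival_times, seg_dur):
    """Approximation without p(t): received media time minus time since the first arrival."""
    n_arrived = bisect_right(arrival_times, t)
    return seg_dur * n_arrived - (t - arrival_times[0])


arrivals = [1.0, 2.0, 3.0, 4.0]  # h_i for 2-s segments
# Playback started at h_1 = 1.0 without stalling, so p(4.0) = 3.0 and both agree.
print(buffer_occupancy(4.0, arrivals, 3.0, 2.0), buffer_from_network_only(4.0, arrivals, 2.0))  # 5.0 5.0
# Had playback stalled for 1 s, p(4.0) = 2.0; only equation (7) sees the extra buffered second.
print(buffer_occupancy(4.0, arrivals, 2.0, 2.0))  # 6.0
```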

IV. THE TEST ENVIRONMENT
During the tests, we used two HTML5 video players built for web playback: the commercial JW Player [22] and the open-source Video.js player [23]. Both are commonly used in professional online video deployments. In these players, adaptive streaming is possible with the HLS, MPEG-DASH, and RTMP protocols. In our tests, HLS streaming and the current market-leading web browser, Google Chrome, were used. Video was streamed from a standard web server running the Windows Server 2008 R2 operating system, which uses the Compound TCP (CTCP) variant [24]. The media data were sent at the network layer in 1,500-byte packets. CTCP uses both delay- and loss-based congestion avoidance. We also observed that the client used delayed acknowledgments, acknowledging only every other data packet. Changing network conditions were generated during the streaming tests with the Linktropy 5500 WAN emulator [25] by adjusting the available bandwidth and packet loss rate. During the playbacks, the media data traffic was monitored with the Wireshark network analysis tool [26]. The playback progress information was tracked with JavaScript using the players' API methods. This information was combined with the timestamps of the GET requests recorded by Wireshark to estimate the buffer occupancy at the GET requests using equation (7). The test environment proved to be isolated enough to produce very little variation in the monitored metrics across playbacks with the same settings. The test setup is illustrated in Fig. 1.
As is common in video quality research, the VBR-encoded Big Buck Bunny [27] animation was chosen as the test video. To verify some of the results, another test video, Elephants Dream [28], was also used. The videos were re-encoded for HLS transmission. The specifications of the videos are shown in Table 1, and the client machine information is given in Table 2. Fig. 2 presents the bitrate profiles of the test videos with 2-second segments. Each video and audio frame has a display timestamp that defines when the frame should be rendered. The figure shows the size of the frames in each display second. The blue dashed line depicts the overall average bitrate of the video, and the blue solid line is the 20-second moving average of the bitrate profile. The content dictates the variation in the bitrate profile.

V. RESULTS
In this section, the effects of different bandwidth conditions and packet loss rates on the network capacity estimators and buffer behavior are examined. For HDS, the recommended client playback buffer size is roughly at least three times the segment duration [29]. The specification states that the buffer length should minimize playback disruptions while considering factors such as network conditions, desired latency, desired start times, and effects on server scalability. The DASH and HLS specifications do not stipulate the buffer length. In our test setup, the maximum buffer size in media time was 25 s for JW Player. The default buffer size of Video.js is 60 s, but it was also set to 25 s in the test environment. By default, both players gathered only one or two segments into the buffer before starting playback when the segment length was 2 seconds. With the longer segments tested, the players started playback after receiving one segment. This led to short initial delays, even at the expense of smooth playback: a small initial buffer occupancy may cause stalling right at the beginning of the playback.

A. BANDWIDTH LIMITATION
For years, service providers have relied on network over-provisioning as a solution to traffic fluctuations. Reserving more bandwidth than the expected traffic load provides readiness to serve future customers, although it is not energy efficient. However, over-provisioning does not solve all situations in a network. Not all routers prioritize real-time applications, and UDP-based flows lacking congestion control may flood the network [30]. When multiple flows compete for their fair share of the link, the throughput decreases. In particular, if the link is shared with other adaptive streaming flows whose on-off periods overlap in time, the fair share may be estimated incorrectly. This causes instability in video quality, unfairness, and bandwidth underutilization [31].

FIGURE 1. Test environment. Video segments are fetched from a web server and transported through a campus network. Before delivering packets to the client, streaming conditions are altered with a physical WAN emulator. Network traffic is monitored with the Wireshark packet capture program.

FIGURE 2. Encoding bitrate profiles with moving averages of Test video 1 a) and Test video 2 b) in bits/display time.
Larger video player buffer sizes, better playback strategies, and improvements in TCP have decreased network throughput requirements [32]. For constant bitrate videos, Biernacki and Tutschku [32] assessed that the network throughput should exceed the video bitrate by about 15 percent for smooth transmission. In the simplest case, an adaptation algorithm chooses the maximum of the bitrates that meet the condition $\gamma \cdot l_{br} \le T_e$, where $\gamma = 1.15$ is a coefficient chosen to guarantee enough bandwidth for overhead traffic. With VBR videos, a throughput exceeding the average bitrate by 15 percent may not be enough.
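In code, the simplest rule above is a filter over the advertised bitrates. The Python sketch below is an illustration under assumptions. The bitrate ladder is made up. Falling back to the lowest level when no bitrate satisfies the condition is our choice, not part of the rule. The second call uses $\gamma = 1.2$, the margin that the measurements reported later in this section suggest for the VBR test video.

```python
def select_bitrate(bitrates, t_e, gamma=1.15):
    """Return the highest bitrate l_br with gamma * l_br <= T_e (lowest one if none fits)."""
    feasible = [b for b in bitrates if gamma * b <= t_e]
    return max(feasible) if feasible else min(bitrates)


ladder = [0.5e6, 1e6, 2e6, 4e6]  # hypothetical bitrate ladder in bit/s
print(select_bitrate(ladder, t_e=2.35e6))             # 2000000.0 with gamma = 1.15
print(select_bitrate(ladder, t_e=2.35e6, gamma=1.2))  # 1000000.0 with gamma = 1.2
```

A larger $\gamma$ trades average quality for robustness. With the same estimate, the stricter margin drops one level.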
To examine the estimators' behavior and the players' performance, we regulated the available bandwidth with the WAN emulator while using the two players. In the test case shown in Fig. 3, the bandwidth is first set to 1.7 times the average bitrate of test video 1, then reduced to 1.2 times, then to 1.0 times, and finally increased back to 1.2 times the average bitrate. Such sudden bandwidth changes may be caused by other clients connecting to share the link. The figure shows the received media data per second (gray line) and the estimators $T_e$ (green line), $A_e$ (blue line), and $\Delta_e$ (brown line) for both players. The initial value of the smoothing factor $\delta$ is chosen to be 0.2, following Akhshabi et al. [21]. Time is measured from the timestamp of the GET request for the first segment. During the tests, a segment length of 2 seconds was used.
In the first bandwidth period (bw = 1.7 · average(video bitrate)), the players fill their playback buffers as fast as possible. This causes $\Delta_e$ to decrease to near zero as all the available bandwidth is utilized. After that, $A_e$ and $T_e$ diverge, and the variation of $A_e$ increases. An increasing $\Delta_e$ value indicates that the network connection would allow higher bandwidth utilization than is used for video streaming. At its highest, $\Delta_e$ rises to 4.4 Mbps; that is, at this point, the client is using over 4 Mbps less bandwidth than offered. The decrease of $A_e$ results from the idle periods in the data transmission. However, $T_e$ stays high since packets are received as quickly as before, although segments are requested less frequently. The playback is in a steady state; that is, the buffer is full and the next segments are requested less frequently to prevent buffer overflow.
In the second period, the available bandwidth is decreased to 1.2 times the average video bitrate. Estimates $A_e$ and $T_e$ still keep their distance from each other, meaning that the clients can have idle periods and playback is not vulnerable to stalling. In the third and fourth periods, $\Delta_e$ approaches zero. Both $A_e$ and $T_e$ are near the emulated bandwidth; that is, the players request each segment as soon as they have received the previous one. This may indicate decreasing buffer occupancy. The situation does not improve much when the available bandwidth is raised back to the second-period level (bw = 1.2 · average(video bitrate)). The behavior of the estimators is very similar in both players.

In Fig. 4, the new estimator $S_e$ is applied, and the buffer occupancy of both players during the same test runs is depicted at the time instances of the GET requests using equation (7). In the $S_e$ chart, the threshold value is $\lambda = 1$ and the smoothing factor is $\delta = 0.2$. In Fig. 4a, $S_e$ stays below the threshold value $\lambda$ in the first two periods, indicating sufficient streaming conditions with both players. In the first period, the maximum of $S_e$ is about 0.75; that is, downloading a segment takes at most roughly 75 percent of the time it takes to play it back. This is enough to build up the buffer occupancy. Both players also maintain the buffer fullness well in the second bandwidth period.
In the third period, $S_e$ rises above the threshold value. On average, $S_e \approx 1.14$ for both players in the third period. This means that streaming at the current bitrate would need approximately 14 percent more bandwidth than offered. With JW Player, the time elapsed since the arrival of the first segment reaches the received media time of about 360 seconds. Thus, it can be assumed that all the received media time has been played out and a buffer underrun will take place. The buffer decrease is confirmed by Fig. 4b, where a more accurate buffer approximation based on the video timeline position is used. The user will not see any change in playback quality until the buffer has drained or reached the threshold defined by the player, causing stalling. Both playbacks stall twice in the third period. The last test period looks very different from the second one, although the available bandwidth is the same in both periods. This is explained by the variable bitrate of the test video: the last 2 minutes of the content require a higher bandwidth than the first part (see Fig. 2a). $S_e$ exceeds the threshold value and stays above it for over a minute.
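The link between $S_e$ and the buffer can be made explicit with a back-of-the-envelope calculation. Assume, as an idealization, that segments are requested back to back and every fetch takes $S \cdot l_{dur}$ seconds for a constant $S$. Each fetch then adds $l_{dur}$ seconds of media while $S \cdot l_{dur}$ seconds are played out. The buffer therefore changes by $(1 - S)/S$ seconds per second, and matching the playout rate would require $S$ times the current throughput. The Python sketch below applies this to $S = 1.14$ and the 25 s maximum buffer used in the tests. Its function name is hypothetical, and it ignores player thresholds, stalling, and bitrate variation within the period.

```python
def drain_from_ratio(s_ratio, buffer_s):
    """Idealized buffer dynamics for a constant fetch-time ratio S = SFT / l_dur."""
    rate = (1.0 - s_ratio) / s_ratio  # buffer change in s per s of wall-clock time
    extra_bw = s_ratio - 1.0          # fraction of extra bandwidth needed
    t_empty = buffer_s / -rate if rate < 0 else float("inf")
    return rate, extra_bw, t_empty


rate, extra, t_empty = drain_from_ratio(1.14, 25.0)
print(f"rate={rate:.3f} s/s, extra bandwidth={extra:.0%}, empty after {t_empty:.0f} s")
# rate=-0.123 s/s, extra bandwidth=14%, empty after 204 s
```

The measured playbacks need not follow this idealization. $S_e$ varied during the period, and the buffer state at its start depends on the preceding periods.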
QoE influence factors (total stalling times and occurrences) were observed in conditions where the available bandwidth was not changed during playback. The test video was played five times at each of six bandwidth levels with both players. Fig. 5 shows that when the available bandwidth decreases below 1.2 · average(video bitrate), stalling starts to appear; that is, QoE starts to decrease. Thus, the threshold for estimator $T_e$ should be 20 percent above the average video bitrate for this video; in other words, selecting segments such that $T_e > 1.2 \cdot l_{br}$ should prevent stalling. Our previous observations with the estimators $\Delta_e$ and $S_e$ and the buffer occupancy values also suggest switching the quality level when the bandwidth drops below 1.2 · average(video bitrate) because of the bitrate variation of the test video. This result is in line with Fig. 5. In the setup above, all the estimators considered work fairly reliably. However, a constant bandwidth rarely reflects reality. In the next subsection, packet loss is added to the channel to cause more bandwidth fluctuation.

B. PACKET LOSS
The most common cause of packet loss in wired networks is congestion. Another cause is transmission errors that corrupt packets, which are then discarded. Device-related causes include routers or switches that cannot handle all traversing traffic, and damaged cables. Wireless networks are more vulnerable to packet loss, as the signal strength weakens due to multipath fading. The popularity of mobile devices therefore makes packet loss common. As HAS protocols run on top of TCP, all lost or corrupted packets are retransmitted. If the available bandwidth is high enough, the increased retransmissions may not be noticeable to the user. Frequent retransmissions, and especially TCP retransmission timeouts, can cause delay, throughput fluctuation, and, ultimately, buffer underflow. Video image artifacts, such as blockiness or blurring, are not typical in TCP-based streaming.
The influence of packet loss was tested while keeping the available bandwidth constant. From the previous tests, we concluded that bandwidth exceeding the average encoding bitrate by 20 percent is just enough to play the video back flawlessly, although the buffer occupancy may decrease in the final part of the video. To ensure that any insufficient throughput is caused by packet loss, the available bandwidth was set to 1.3 · average (video bitrate) in the test setup. The loss rate was first set to 1%, then increased to 3%, and finally to 5%. The final part was played without packet loss to see how quickly the estimators react to improved network conditions. The WAN emulator discards packets randomly according to the specified packet loss rate. Dropped packets also consume link bandwidth.
The received media data per second and estimators $T_e$, $A_e$, and $\Delta_e$ are shown in Fig. 6 for both players. Packet loss reduces the throughput and causes variation in the estimators. Although $\delta$ is chosen to smooth out $T_e$, the remaining variation may still lead to a suboptimal bitrate selection when $T_e$ is used alone. Estimators $T_e$ and $A_e$ follow each other while the player is trying to fill the buffer; that is, $\Delta_e$ stays close to zero with only slight variation. In the second period (3% loss), $\Delta_e$ rises, indicating that the buffer momentarily fills up. When the packet loss is removed in the final period, $T_e$ and $A_e$ rise again toward the emulated bandwidth level. It takes almost a minute for $T_e$ to reach its maximum, but most of the rise takes place within a few seconds. Fig. 7 depicts the behavior of estimator $S_e$ and the buffer occupancy during the same test runs. With 1% and 3% packet loss, $S_e$ stays below the threshold value, indicating that the channel delivers packets fast enough to render the video. In the 5% packet loss period, $S_e$ rises above the threshold value $\lambda$. In this period, the average of $S_e$ is 1.04 with JW Player and 1.01 with Video.js. Stalling is nevertheless avoided. In the third period, the buffer occupancy falls to about 11 s at its lowest with JW Player and to about 12 s with Video.js.
Packet loss naturally slows down buffer filling and lengthens the initial delay. However, neither player gathers more than 1–2 segments into the buffer before starting playback. This already causes difficulties at the beginning, as both players can raise their buffer level only to about 10 seconds during the first minute. As could be expected from estimator $\Delta_e$, the players manage to fill their playback buffers in the second packet loss period. From the behavior of estimator $S_e$, it can be deduced that in the 5% packet loss period, at the bandwidth used, the amount of media data in both players' playback buffers decreased. Fig. 8 shows the average total stalling time and number of stalling occurrences when the packet loss ratio is kept constant during the whole playback and the available bandwidth is limited to 1.3 · average (video bitrate). The video was played back five times with each packet loss ratio. The stalling time starts to increase when the loss ratio exceeds 4%.
With packet loss, the throughput conditions during the tests in this section were more realistic. Naturally, the fluctuating throughput affected the behavior of $T_e$ the most. Estimator $\Delta_e$ behaved similarly to the case with constant available bandwidth. Estimator $S_e$ varied somewhat more than without packet loss, but as it only indicates the incoming data relative to the played data, it remains easy to interpret.

C. SEGMENT LENGTH
Segment length is usually decided on the server side, depending on the terminal device and video content. Short segments enable quick adaptation to changing network conditions. Because every segment starts with a key frame, long segments allow more efficient encoding. In addition, fewer requests are needed, which reduces overhead. For the HLS protocol, a segment length of 6 to 10 s is often recommended. However, Bitmovin [33], for example, suggested HLS segment sizes of around 2 to 4 s as a good compromise between encoding efficiency and flexibility in adapting the stream to bandwidth changes. This section examines both the behavior of the estimators with longer segments and the effects of segment length on QoE influence factors.
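To make the overhead argument concrete, the number of segment requests for the 10-minute test video can be counted for each segment length tested. The per-request cost is a hypothetical placeholder rather than a measurement from this study, and summing it assumes that requests are not overlapped.

```python
import math

VIDEO_DURATION_S = 600     # the 10-minute test video
REQUEST_OVERHEAD_S = 0.05  # hypothetical fixed cost per HTTP request, e.g. one round trip


def segment_requests(video_duration_s: float, segment_len_s: float) -> int:
    """Number of segment requests needed to fetch the whole video once."""
    return math.ceil(video_duration_s / segment_len_s)


for seg_len_s in (2, 4, 6, 10):
    n = segment_requests(VIDEO_DURATION_S, seg_len_s)
    print(f"{seg_len_s:>2} s segments: {n:>3} requests, "
          f"about {n * REQUEST_OVERHEAD_S:.1f} s of request overhead")
```

With these assumptions, 2-second segments need five times as many requests (300) as 10-second segments (60).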
The changing-bandwidth and packet loss conditions of Sections V-A and V-B were repeated with a 10-second segment length. Since there were no significant differences in buffer handling between the players, and switching algorithms were not examined, only JW Player was used in these tests. Fig. 9a shows that lengthening the segment duration increases the smoothing of estimates $T_e$ and $A_e$; that is, they react more slowly to throughput changes, and their variation decreases. The time it takes $T_e$ to reach the actual throughput may be too long for most real-life use cases, which should be considered when choosing the smoothing factor $\delta$. In contrast, when packet loss is induced (Fig. 9b), $T_e$ smooths the variation down to a level that would prevent bitrate oscillation. Fig. 10 shows estimator $S_e$ and the buffer occupancy for playback with 10-second segments at the varying bandwidth levels. In this setup, estimator $S_e$ is formed with two different smoothing factor values, $\delta = 0.2$ and $\delta = 0.6$. JW Player starts playback right after the first segment has arrived and uses the same maximum buffer size (25 s) as with 2-second segments. Estimator $S_e$ reveals that data are received more slowly than they are removed from the playback buffer in the third and fourth bandwidth periods. The accurate buffer occupancy monitoring reports six stalling occurrences. Lengthening the segments from 2 to 10 seconds thus could not prevent buffer underrun. The corresponding figures of $S_e$ and buffer occupancy for the packet loss test are shown in Fig. 11.
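Why longer segments slow the estimators down can be shown with a per-segment exponentially weighted moving average (EWMA). The sketch assumes the update $x \leftarrow (1-\delta)\,x + \delta\,\text{sample}$, applied once per segment with requests roughly one segment duration apart, as in the steady state. The paper's own smoothing formula may be parameterized differently; under this assumed form a smaller $\delta$ smooths more. The function name is hypothetical.

```python
def updates_to_converge(delta: float, tolerance: float = 0.05) -> int:
    """EWMA updates needed to track a unit step change within `tolerance`.

    Assumes the update estimate <- (1 - delta) * estimate + delta * sample.
    """
    estimate, updates = 0.0, 0
    while 1.0 - estimate > tolerance:
        estimate = (1.0 - delta) * estimate + delta * 1.0
        updates += 1
    return updates


for delta in (0.2, 0.6):
    n = updates_to_converge(delta)
    for seg_len_s in (2, 10):
        # One update per segment, requests roughly one segment duration apart.
        print(f"delta={delta}, {seg_len_s:>2} s segments: {n} updates, ~{n * seg_len_s} s")
```

The number of updates depends only on $\delta$ (14 for 0.2 and 4 for 0.6 with a 5 percent tolerance), but the wall-clock time scales with the segment duration. Moving from 2-second to 10-second segments therefore makes the estimate about five times slower in real time unless $\delta$ is adjusted.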
Finally, the effect of bandwidth limitation on two QoE influence factors with four different segment lengths (2, 4, 6, and 10 s) is depicted in Fig. 12. In these four video variants, key frames appear only at the start of each segment, so longer segments result in a lower average bitrate. At each relative bandwidth, the video was played five times, and the stalling occurrences and the total time spent stalling were monitored. Fig. 12a shows the total stalling durations during the 10-minute video for JW Player at bandwidths from 1.05 · average (video bitrate) to 1.3 · average (video bitrate). At 1.05 · average (video bitrate), the stalls lasted on average 47 seconds in total with 2-second segments and 42 seconds with 10-second segments. In a 10-minute video, the difference is hardly significant to the user. However, it is interesting that the longest and shortest segment lengths tested produced the fewest stalling events. Closer inspection revealed that, unlike with longer segments, JW Player collected more than one 2-second segment before resuming playback after stalling. It is also possible that some stalling occurrences with 2-second segments were shorter than 0.5 seconds, the minimum duration in our stalling criterion. The most stalling occurrences on average (34) were observed with the 4-second segment length in the lowest available bandwidth conditions tested. The results were also checked with Test video 2, which has a lower average bitrate, varying from 1.83 to 1.91 Mbit/s between the segment length versions. Streaming this video clip gave less consistent results, which are depicted in Fig. 13. When the bandwidth was decreased with the longer segment variants (6 s and 10 s), playback showed stalls that were not caused by buffer underrun. These may result from, for example, browser settings, device glitches, or faulty video software. The phenomenon is demonstrated in Fig. 14, where the dark red lines depict the stalling times caused by buffer underrun and the lighter ones depict stalling times caused by other reasons. Adding more key frames did not resolve this buffering behavior. However, the Video.js player gave consistent results with Test video 2, as shown in Fig. 15.
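The two QoE influence factors used throughout this section can be computed from a list of stall intervals: the number of stalling occurrences and the total stalling time, ignoring stalls shorter than 0.5 s, then averaged over repeated runs. The (start, end) interval format is a hypothetical log structure, not an event API of JW Player or Video.js.

```python
MIN_STALL_S = 0.5  # stalls shorter than this are not counted (criterion used here)


def stall_metrics(stalls: list[tuple[float, float]]) -> tuple[int, float]:
    """Return (stalling occurrences, total stalling time in s) for one run.

    `stalls` holds (start_s, end_s) pairs on the session timeline,
    e.g. taken from a hypothetical player event log.
    """
    durations = [end - start for start, end in stalls if end - start >= MIN_STALL_S]
    return len(durations), sum(durations)


def average_over_runs(runs: list[list[tuple[float, float]]]) -> tuple[float, float]:
    """Average occurrences and total stalling time over repeated playbacks."""
    metrics = [stall_metrics(run) for run in runs]
    n = len(metrics)
    return sum(m[0] for m in metrics) / n, sum(m[1] for m in metrics) / n


# The 0.3-s stall is below the criterion and is ignored.
print(stall_metrics([(10.0, 12.5), (30.0, 30.3), (50.0, 51.0)]))  # (2, 3.5)
```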

VI. DISCUSSION
Above, the features of four estimators were examined to evaluate their applicability in HAS algorithms. The smoothed estimator $T_e$ and buffer occupancy are possibly the most commonly used (alone or together). Estimator $T_e$ estimates the absolute throughput for the next segment request interval. To select the most suitable bitrate, the value of $T_e$ is compared with the encoding rates listed in the playlist file or, for example, with the measured average bitrate of a given number of previous segments. When $T_e$ varies a lot, it is not optimal for the adaptation algorithm. This may happen, for instance, in a congested channel when $T_e$ has a constant smoothing factor. On the other hand, video sections whose bitrate differs considerably from the announced target bitrate may lead to incorrect segment quality decisions if only $T_e$ is used.
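The second comparison mentioned above, against the measured average bitrate of recent segments, can be sketched with a sliding window. The helper class, the window size, and the segment sizes are hypothetical. The example shows how a VBR section can pass a check against the nominal playlist rate while failing a check against what the recent segments actually required.

```python
from collections import deque


class RecentBitrate:
    """Measured average bitrate over the last `window` segments (hypothetical helper)."""

    def __init__(self, window: int = 5) -> None:
        self.sizes_bits: deque = deque(maxlen=window)  # oldest entries drop out
        self.durations_s: deque = deque(maxlen=window)

    def add_segment(self, size_bytes: int, duration_s: float) -> None:
        self.sizes_bits.append(8 * size_bytes)
        self.durations_s.append(duration_s)

    def average_bps(self) -> float:
        return sum(self.sizes_bits) / sum(self.durations_s)


recent = RecentBitrate(window=3)
for size in (650_000, 640_000, 660_000):  # hypothetical 2-s segment sizes (bytes)
    recent.add_segment(size, 2.0)

t_e = 2.4e6          # current throughput estimate (bit/s)
nominal_bps = 2.0e6  # rate announced in the playlist
print(t_e > nominal_bps)           # True: the nominal check passes
print(t_e > recent.average_bps())  # False: recent segments averaged 2.6 Mbit/s
```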
Often, the throughput estimator is complemented with other metrics. To make $T_e$ more applicable to VBR videos, it was observed in relation to the download estimator $A_e$ through the difference $\Delta_e = T_e - A_e$. Estimator $\Delta_e$ reveals how much of the maximum available bandwidth capacity is unused. When the player fills its playback buffer, the capacity is fully utilized. This happens right at the beginning of a streaming session during initial buffering. In the steady state, the player has idle periods, as the playback buffer only has room for a new segment once data have been removed from it for playback. Together, $T_e$ and $\Delta_e$ can tell whether the estimated throughput is sufficient for the current bitrate and, in theory, estimate by how much the video bitrate can be raised or reduced.
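A minimal sketch of $\Delta_e$, under two assumptions that may differ from the exact definitions given earlier in the paper. First, the $T_e$ sample is taken as segment bits divided by the download time, while the $A_e$ sample is taken as segment bits divided by the interval between consecutive request starts, so that idle time lowers only $A_e$. Second, both are smoothed with the same EWMA form. The class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RateEstimators:
    delta: float = 0.2           # smoothing factor (assumed EWMA form)
    t_e: Optional[float] = None  # smoothed throughput while downloading (bit/s)
    a_e: Optional[float] = None  # smoothed average rate including idle time (bit/s)

    def _smooth(self, prev: Optional[float], sample: float) -> float:
        return sample if prev is None else (1.0 - self.delta) * prev + self.delta * sample

    def update(self, size_bits: float, download_s: float, request_interval_s: float) -> float:
        """Feed one segment and return Delta_e = T_e - A_e (unused capacity estimate).

        request_interval_s is the time between the starts of consecutive
        requests, so it is never shorter than download_s.
        """
        self.t_e = self._smooth(self.t_e, size_bits / download_s)
        self.a_e = self._smooth(self.a_e, size_bits / request_interval_s)
        return self.t_e - self.a_e


# Back-to-back downloads (buffer filling): no idle time, so Delta_e is zero.
print(RateEstimators().update(4e6, 1.0, 1.0))  # 0.0
# Steady state: the link is idle for half of each 2-s request interval.
print(RateEstimators().update(4e6, 1.0, 2.0))  # 2000000.0
```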
Alongside throughput estimators, buffer occupancy is often used in adaptation algorithms by setting different threshold values for it. These values define when to start playback and when to switch the quality level. When buffer occupancy is used alone, the segment bitrate selection is not based on any throughput measurement and has to be done, for example, one quality level at a time. This can make the algorithm react too slowly, for example, when the buffer is full.
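For contrast, a purely buffer-based rule of the kind described above can be sketched with two hypothetical thresholds. It moves at most one quality level per decision and uses no throughput measurement, so climbing from the lowest level with a full buffer takes several segment intervals.

```python
def buffer_based_level(current: int, buffer_s: float, n_levels: int,
                       low_s: float = 8.0, high_s: float = 20.0) -> int:
    """Step one level down below `low_s`, one level up above `high_s`.

    The thresholds are hypothetical; no throughput estimate is used.
    """
    if buffer_s < low_s:
        return max(current - 1, 0)
    if buffer_s > high_s:
        return min(current + 1, n_levels - 1)
    return current


level = 0
for _ in range(3):  # three consecutive decisions with a full 25-s buffer
    level = buffer_based_level(level, buffer_s=25.0, n_levels=4)
print(level)  # 3: the top level is reached only after three segments
```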
To observe the relative throughput, we simply compared the segment download time with its assumed playback time (segment duration). Estimator $S_e$ is a smoothed version of this metric. The approach combines features of buffer occupancy and throughput estimation. The variation of $S_e$ results from changes in both throughput and segment size. A simple goal for an adaptation algorithm using $S_e$ may be to select segments that keep $S_e$ just below the set threshold value $\lambda$.
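The simple goal described above can be sketched as follows, assuming an EWMA form for $S_e$ and a threshold of $\lambda = 1$, which corresponds to a segment fetched exactly in its own duration. The paper's smoothing formula and threshold value may differ. The candidate prediction scales $S_e$ by the bitrate ratio, which assumes unchanged throughput and segment sizes proportional to the nominal bitrate, an assumption that VBR content does not guarantee. All names are hypothetical.

```python
from typing import Optional


class SegmentRatioEstimator:
    """S_e: smoothed ratio of segment fetch time to segment duration (assumed EWMA)."""

    def __init__(self, delta: float = 0.2) -> None:
        self.delta = delta
        self.s_e: Optional[float] = None

    def update(self, fetch_time_s: float, segment_duration_s: float) -> float:
        sample = fetch_time_s / segment_duration_s  # > 1: slower than real time
        if self.s_e is None:
            self.s_e = sample
        else:
            self.s_e = (1.0 - self.delta) * self.s_e + self.delta * sample
        return self.s_e


def pick_bitrate(s_e: float, current_bps: float, ladder_bps: list[float], lam: float) -> float:
    """Highest bitrate whose predicted S_e stays below the threshold lam.

    Predicted S_e = s_e * candidate / current, i.e. fetch time is assumed to
    scale linearly with bitrate at unchanged throughput.
    """
    feasible = [br for br in ladder_bps if s_e * br / current_bps < lam]
    return max(feasible) if feasible else min(ladder_bps)


est = SegmentRatioEstimator(delta=0.2)
est.update(1.0, 2.0)        # first 2-s segment fetched in 1.0 s: S_e = 0.5
s_e = est.update(1.4, 2.0)  # next one fetched in 1.4 s: S_e is about 0.54
ladder = [1.0e6, 2.0e6, 3.0e6, 4.5e6]  # hypothetical bitrate ladder (bit/s)
print(pick_bitrate(s_e, current_bps=2.0e6, ladder_bps=ladder, lam=1.0))  # 3000000.0
```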
A more ambitious algorithm would allow $\lambda$ to be exceeded occasionally. The time that can be spent above the threshold value depends on how the buffer occupancy develops. Buffer occupancy can be obtained by recording incoming and outgoing media seconds while tracking $S_e$. Estimator $S_e$ also tells whether the video bitrate should be raised or reduced.
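How long $S_e$ may stay above 1 can be estimated from the buffer level with a steady-state argument. It assumes that $S_e$ stays constant and that the player downloads back to back, which it must do once fetching is slower than real time. In that case, each wall-clock second delivers $1/S_e$ media seconds while one second is played, so a buffer of $B$ seconds drains at $1 - 1/S_e$ per second and empties after $t = B\,S_e/(S_e - 1)$. The function name is hypothetical.

```python
import math


def time_until_underrun_s(buffer_s: float, s_e: float) -> float:
    """Wall-clock seconds until the buffer empties if S_e stays constant.

    With S_e > 1 the player receives 1/S_e media seconds per wall-clock second
    while playing 1, so the buffer drains at (1 - 1/S_e) per second:
    t = B * S_e / (S_e - 1).
    """
    if s_e <= 1.0:
        return math.inf  # data arrive at least as fast as they are played
    return buffer_s * s_e / (s_e - 1.0)


print(round(time_until_underrun_s(12.0, 1.04)))  # 312
print(round(time_until_underrun_s(12.0, 1.14)))  # 98
```

Under these assumptions, a small excess such as $S_e = 1.04$ leaves several minutes of headroom with a 12-second buffer. The headroom shrinks quickly as $S_e$ grows, which is the trade-off an algorithm that allows exceeding $\lambda$ would have to manage.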
The behavior of the estimators was tested with two players: the commercial JW Player and the open-source Video.js. Since the HTML5 versions of these players did not exhibit any resource-dependent adaptive buffering methods, the players performed very similarly. The only difference was observed when long segments (6 s and 10 s) were played back in insufficient bandwidth conditions, which caused JW Player to pause playback occasionally before a buffer underrun. In addition, by default, both players collected only one segment before starting playback, which resulted in similar initial buffering behavior. As the estimators in this paper predict the metrics for the next request interval, lengthening the segment duration smooths out the estimates, which requires adjusting their parameters. Otherwise, changing the segment length hardly affected the QoE metrics observed with JW Player.

VII. CONCLUSION
This paper presented metrics that can be used as building blocks for HAS adaptation algorithms. We explored four different approaches to evaluating streaming conditions on the client side. A commonly used test video was played under different bandwidth and throughput conditions, both changing them in the middle of playback and keeping the same channel conditions throughout the video. This study also highlighted the challenges that VBR encoding poses for streaming condition estimators. When the encoding bitrate varies, the metrics evaluating streaming conditions often vary as well. Thus, adaptation algorithms should be designed to handle and interpret fluctuating metrics.
The ability of the estimators to predict QoE influence factors was evaluated, with stalling frequency and total stalling time as response variables. Estimator $S_e$ may be the most applicable for predicting streaming conditions on its own, whereas throughput estimator $T_e$ usually needs additional information. The work in this paper also demonstrated that lengthening segments is not a straightforward way to increase QoE, even when the initial delay is not considered.
The adaptation algorithms of the players tested in this paper do not include specific buffering methods. Thus, the true differences between the players will probably manifest only in a multi-bitrate environment. Future work will consider testing the $S_e$ estimator as part of an adaptation algorithm. In addition, more precise information on video complexity should be used to better optimize bitrate switching.

He is currently a Professor in computer science with Kokkola University Consortium Chydenius (KUC), University of Jyväskylä. He is also the Vice-Director of KUC and the Head of the Information Technology Unit, KUC. His research has focused on intelligent and autonomous sensor systems, and he leads the research group in this field. His current research interests include wireless sensor networks with emphasis on data processing, programming frameworks and middleware, localization, and the design and performance evaluation of wireless sensor systems.