Congestion Control for Cloud Gaming Over UDP Based on Round-Trip Video Latency

We describe a network congestion control mechanism for cloud gaming (CG) platforms based on the user datagram protocol (UDP). To minimize the contribution of the downstream transmission delay to the total end-to-end latency in the interaction–perception loop, we first define the round-trip video latency (RTVL) and develop a congestion model. Based on them, we design and implement an adaptation strategy that detects the early stages of congestion to prevent high values of RTVL and network bufferbloat, thus avoiding packet losses. Using data measured from the network, our strategy modifies the target output bitrate of the video encoder to throttle the data flow sent by the server to the client down or up. In the presence of sudden downstream channel capacity drops of over 40%, our algorithm reactively manages to satisfy the key CG requirements for interactive games by entirely avoiding packet losses and keeping the RTVL below 100 ms. In reasonably stable network conditions, our algorithm proactively keeps exploring higher bitrates and building a “network state dictionary,” thanks to which it achieves an effective downstream channel capacity use of ~95%.


I. INTRODUCTION
The improvement of network infrastructures in recent years, mainly due to the proliferation of FTTX (Fiber To The X) deployments which provide high-speed and symmetric bandwidth connections, has gone hand in hand with the growth of cloud services. Among the variety of services flowing through the Internet, some are more demanding than others in terms of the QoS (Quality of Service) parameters that can be measured in the network. Even more important than the objective QoS is the QoE (Quality of Experience) perceived by the user, especially for interactive audio-visual services, whose QoE depends heavily on the total latency and on the quality of the video displayed to the user.
CG (Cloud Gaming), a.k.a. GaaS (Games as a Service), is one such service, which has been possible since powerful GPUs (Graphics Processing Units) and FTTX enabled the (almost) real-time execution of the ''game play and rendering, plus video coding, transmission, decoding and display'' chain. But it is among the interactive services with the strictest requirements on transmission delays and packet losses. Jarschel et al. [1], [2] carried out a study of the factors that degrade the QoE for CG, which sets some boundaries in objective terms of QoS, based on subjective tests. They concluded that downstream packet loss is the most important parameter, followed by downstream delay and jitter. This makes it very challenging to design a user-friendly CG platform.
The associate editor coordinating the review of this manuscript and approving it for publication was Nizar Zorba.
Indeed, guaranteeing no losses and a low latency throughout a game session is not always possible: there are relatively static limitations, such as the network technologies/capabilities by region [3], but also intrinsically dynamic ones, such as network congestion. Some Internet services are delivered to the client over managed network infrastructures, but CG must be understood as an OTT (Over-The-Top) service competing with other OTT services over unmanaged networks. In the OTT framework, there exist some solutions called ''DiffServ vs. IntServ'' (Differentiated vs. Integrated Services), but their deployment depends on the underlying architecture of the Internet service provider, so they are not a short-term solution for us.
Our research has focused on designing an algorithm for C3G (Congestion Control for Cloud Gaming) to meet the strict CG requirements of having minimal losses and keeping latency within playability limits. To do so, we first worked on understanding the network congestion process to find in latency good predictors of network ''bufferbloat'' [4] (which ultimately leads to losses), and this prompted us to analyze the measurement of both losses and latency.
Video packet losses yield different decoding errors depending on the type of packets which are lost, e.g., the video compression picture type (I, P or B) they belong to. In a non-interactive video streaming scenario, these different errors have a varying impact on the perceived quality of the decoded video, but in a CG scenario they may also disturb the user interaction, i.e., the game play itself, in completely different ways. However, defining and measuring losses is trivial.
On the other hand, defining interactive latency is more complex, because many delays contribute to it, so several studies related to our work have started from different definitions of the end-to-end delay. We highlight Wen and Hsiao's RTRD (Round-Trip Reaction Delay) [5], which includes all delays in the interaction-perception loop. Other researchers related (parts of) RTRD to playability [6]-[10] and concluded that latency thresholds depend on the game type, but one can generally assume that keeping the RTRD below 100 ms guarantees good playability, because it does so even for the ''fastest'' games, such as racing games or first-person shooters.
In the general context of video streaming, and regarding packet loss reduction, a long-term buffer could be included at the client side to allow for more retransmissions, and thus ensure the arrival of complete frames before they must be decoded. But the price is adding an arbitrary delay at the decoder, thus increasing latency. Several widely used video streaming techniques help reduce latency and mitigate the image quality degradation due to packet losses. For instance, in scenarios like ours where low latency is critical, it is good encoding practice to avoid using B frames altogether, or set the VBV (Video Buffering Verifier) size to the target frame size. These common techniques to reduce latency logically increase the downstream bitrate, or limit the adaptation capability of the platform to the channel capacity.
As for the more specific context of CG [11], congestion control is still a relatively unexplored field. In Section II, we summarize our findings on channel-adaptive algorithms developed to meet the strict QoS requirements mentioned above, both in terms of latency and losses, and at the same time try and maximize the effective use of the downstream channel capacity.
Our contributions start by modeling the behavior of the network at different stages of the congestion process. As we explain in Section III, we immediately found it essential to determine the contribution of the downstream delay to the RTRD. We therefore defined the RTVL (Round-Trip Video Latency) before designing our congestion model, which is based on that of BBR (Bottleneck Bandwidth and Round-trip propagation time) [12], a congestion control algorithm recently developed by Google. Our model allows us to establish a relationship between the maximum channel capacity and the buffer size of the potential network bottlenecks, the amount of data sent to the network (by setting a video encoder target bitrate), and of course the RTVL itself.
As a result of our congestion model, we also designed a generic adaptation strategy explained in Section IV, and implemented an algorithm for C3G based on UDP at the application layer. This implementation was adapted according to the limitations of the commercial CG platform we used for testing, as described in Section V.
Section VI reports on the tests we carried out with our algorithm, in particular to compare it with BBR. The experimental results show how our algorithm reacts quickly to channel capacity drops by reducing the target video encoding bitrate, and manages to completely avoid losses for sudden drops of up to ∼42%. The results also show that our algorithm keeps the RTVL below 100 ms for steady bandwidth limitations, while continuously exploring higher target bitrates. This proactive exploration strategy results in an effective use of 93-96% of the available channel capacity.

II. STATE OF THE ART
In this Section, we review existing congestion control algorithms designed to work at either the transport or the application layer, and based on TCP (Transmission Control Protocol), UDP (User Datagram Protocol), or neither of them.

A. TRANSPORT LAYER POTENTIAL SOLUTIONS
Theoretically, flow and congestion control is a service that belongs in the transport layer; indeed, the ubiquitous TCP provides applications with transparent flow control and ordered reliable delivery, and has been a cornerstone of the Internet over the past few decades because it excels at delivering non-latency-sensitive loads (up to now the bulk of the Internet traffic).
On the other side of the spectrum, UDP is the trivial transport protocol, which provides no service beyond multiplexing, thereby leaving all flow control, error checking, and message ordering to the applications themselves. In between these two extremes, there are other protocols that provide subsets of the functionality provided by TCP, such as SCTP (Stream Control Transmission Protocol) [13] and DCCP (Datagram Congestion Control Protocol) [14]. However, they see very little use because most commercial routers or firewalls do not support them, leaving application developers with TCP and UDP as the only practical choices.
It is worth mentioning that the design of TCP is generic enough to enable interoperation between implementations using different congestion algorithms, but no matter the specific algorithm, the latency is potentially unbounded on account of TCP's reliable nature; therefore, TCP is generally not appropriate for real-time, low-latency traffic. Still, some of the congestion control algorithms that have been proposed for TCP provided us with a good starting point. Most of them use packet losses as congestion signal (Tahoe, [New] Reno, [CU]BIC, etc.) and are therefore not acceptable for CG; some others, such as FAST TCP [15] and the already mentioned, more recent BBR [12], [16], use delay measurements to detect congestion before it actually happens. We will cover in more depth different aspects of the latter, which is the most similar algorithm to ours, in Subsections III-A and IV-B.
VOLUME 7, 2019

B. APPLICATION LAYER POTENTIAL SOLUTIONS
Given the problems described above to implement real-time streaming at the transport layer, multiple application-layer solutions have been designed.
VoD (Video-on-Demand) is a service superficially similar to CG in the sense that it needs to react to network conditions in real time, and adapt the video quality accordingly. Popular techniques like ABR (Adaptive BitRate) [17] rely on the (typically HTTP-based) server offering the same content in a variety of coding presets that the client autonomously switches between, depending on the evolution of its own reception buffer. However, these techniques are only possible because the content is not user-dependent and users can easily tolerate delays in the order of seconds, allowing time for the server to pre-generate all preset qualities.
Video conference, on the other hand, is a service much closer to CG: its content is session-specific and produced in real time, and it has stricter delay requirements [18]. Still, video conference can tolerate significantly higher delays than CG, and also drastic video bitrate reductions because its essential source of information is (low-bandwidth) audio, whereas video information is essential to CG. Hence, most proposals in the literature relevant to CG have been purpose-designed.
Jarvinen et al. [19] used TCP's RTT jitter to detect network congestion, and then decide upon the triggering of the bitrate adaptation. In their solution, a video adaptation module constantly monitors the network status, and dynamically adjusts the encoder target bitrate using an AIMD (Additive-Increase/Multiplicative-Decrease) scheme, just like TCP. They consider RTT jitter to be a binary congestion signal, which causes too much oscillation in the target bitrate when there is a persistent bandwidth limitation. Furthermore, their proposal is TCP-based and does not offer a mechanism to cope with losses and subsequent retransmissions.
Wang and Dey [20] proposed a procedure that requests one of several preset- and game-specific bitrates depending on downlink delay thresholding, as well as a method, based on the same metric, for reducing the play-out buffer delay. Another proposal of theirs [21], also based on delay thresholding, dynamically modifies game rendering parameters to modulate video complexity. Both proposals use a previous network probing mechanism [22] to measure delays and losses, and share the same shortcomings: they require a deep analysis of the characteristics of each game, and their use of discrete bitrate presets does not provide sufficient adaptation granularity.
More recently, Hong et al. [23] proposed a MOS (Mean Opinion Score) based model for dynamic frame rate and bitrate adaptation on the open-source CG platform GamingAnywhere [24]. The model implicitly considers the available bandwidth, whose estimation is inspired by WBest [25]; it keeps track of the dispersion time of the video packets but does not consider a threshold on the latency.
Finally, other proposals for CG [24] simply leverage RT[C]P (Real-time Transport [Control] Protocol) over UDP, a protocol designed for audio and video streaming that provides QoS feedback, temporal reconstruction and loss detection. However, they do not implement bandwidth management, guarantee a given QoS, or provide any means to address congestion control.

III. CONGESTION MODEL
From the different approaches reviewed in Section II, we based our work on BBR, whose network congestion model and adaptation strategy are explained in more detail in Subsection III-A. Subsection III-B introduces the congestion model of our C3G algorithm. Before explaining in Section IV our control strategy to adapt the server output bitrate to a varying channel capacity, we describe our C3G discrete-time congestion detection algorithm in Subsection III-C.

A. BBR's CONGESTION MODEL AND ADAPTATION STRATEGY
BBR's network congestion model is based on the one defined by Kleinrock [26], which considers that an arbitrarily complex path formed by many links behaves as a single one, whose bandwidth is the minimum of those of all individual links. Kleinrock defined the OOP (Optimal Operating Point) of a global path or individual link as the transmission bitrate that allows using that path/link at its maximum channel capacity while keeping a minimum transmission delay. If the sender transmits at a bitrate lower than the OOP, the channel is underused; but if it tries to transmit at a higher one, the delay increases while the effective delivered bitrate does not.
All these bitrates, delays, and therefore OOPs may of course vary over time in a real network, and BBR relies on TCP to define and estimate two time-dependent variables: BtlBW, the bottleneck bandwidth, and RTprop, the round-trip propagation time. The latter is the minimum of all RTTs reported by TCP over some time window (typically tens of seconds to minutes), and the former is the maximum of the delivery rates (ratios of delivered data to elapsed time) calculated over some other time window (typically 6-10 RTTs). Based on these two variables, BBR also defines the BDP (Bandwidth Delay Product), which is simply
$$\mathrm{BDP} = \mathrm{BtlBW} \times \mathrm{RTprop}.$$
The key premise of BBR's congestion model is that the OOP is found when BDP = inflight, a native TCP parameter representing the number of unacknowledged sent packets. If inflight > BDP, a packet queue starts to grow at some bottleneck link, and RTT increases linearly with inflight.
FIGURE 1. We measure RTVL(t) at the server, for each video frame, as the time elapsed from the instant the raw frame output by the game engine was sent to the video encoder (S_0) until its decoding ACK is received from the client (S_ACK).
BBR's adaptation strategy consists in continuously estimating BtlBW and RTprop, hence BDP, and pacing the packet transmission to have inflight match or remain just below BDP.It also aims higher periodically by tentatively increasing the transmission bitrate, and then immediately decreasing it, according to what is known as a MIMD (Multiplicative Increase / Multiplicative Decrease) scheme based on a cycle of fixed gains: see Subsection IV-B.
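As a minimal illustration of the two windowed estimates and of the BDP described above, the following Python sketch keeps a sliding window of samples and reports their minimum (for RTprop) or maximum (for BtlBW). It is not BBR's actual implementation: the class name `WindowedFilter`, the helper `bdp_bytes`, the window lengths and the sample values are all assumptions made for the example.

```python
from collections import deque


class WindowedFilter:
    """Keep (timestamp, value) samples and report their min or max over a sliding window."""

    def __init__(self, window_s, pick):
        self.window_s = window_s  # window length in seconds (hypothetical values below)
        self.pick = pick          # the built-in min or max
        self.samples = deque()

    def update(self, now_s, value):
        self.samples.append((now_s, value))
        # Drop samples that have fallen out of the window.
        while self.samples and now_s - self.samples[0][0] > self.window_s:
            self.samples.popleft()
        return self.pick(v for _, v in self.samples)


def bdp_bytes(btl_bw_bps, rt_prop_s):
    """Bandwidth-delay product in bytes: BDP = BtlBW * RTprop."""
    return btl_bw_bps * rt_prop_s / 8.0


rt_prop = WindowedFilter(window_s=10.0, pick=min)  # minimum RTT over a long window
btl_bw = WindowedFilter(window_s=0.5, pick=max)    # maximum delivery rate over a short window

# (time in s, measured RTT in s, measured delivery rate in bit/s): made-up samples.
for t, rtt, rate in [(0.0, 0.050, 8e6), (0.1, 0.040, 10e6), (0.2, 0.060, 9e6)]:
    rt_prop_est = rt_prop.update(t, rtt)
    btl_bw_est = btl_bw.update(t, rate)

bdp = bdp_bytes(btl_bw_est, rt_prop_est)
```

With these samples the estimates settle on the smallest RTT and the largest delivery rate seen so far, and a sender pacing towards the OOP would try to keep its unacknowledged data at or just below `bdp`.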

B. C3G's CONGESTION MODEL
BBR's congestion model cannot be implemented over UDP since it requires the native TCP parameters mentioned in the previous Subsection. Besides, in our application layer context of video transmission and CG in particular, it makes more sense to have the client send ACKs per video frame, instead of per UDP packet (which anyway would completely defeat the purpose of using UDP vs. TCP).
This led us to define the RTVL (Round-Trip Video Latency) illustrated by Figure 1. C3G's congestion model assumes that the client sends an explicit ACK after having received all the UDP packets of a video frame and decoded it. Upon reception of that ACK at instant S_ACK, the server computes RTVL(t) for that particular frame as the time elapsed since its raw version output by the game engine was sent to the video encoder at instant S_0.
Our application-oriented, per-frame RTVL(t) is meant to be analogous to BBR's (in fact, TCP's) RTT. To continue with this analogy, and be able to seek an OOP defined similarly to BBR's, we need a way to estimate an equivalent to TCP's inflight, which is inherently impossible in UDP. This is why we measure the server-sent and client-received bitrates, λ(t) and µ(t), which are the accumulated sizes of the UDP packets sent and received during a certain time window, divided by the duration of that window. Note that RTVL(t) includes, as explicitly shown in Figure 1, the video frame encoding and decoding delays, which we will assume constant for the moment, although they do depend (weakly) on λ(t) and µ(t).
C3G's ideal congestion model also assumes that the time windows of both server and client are properly aligned, so when there is no congestion, µ(t) = λ(t). However, as shown in Figure 2, a link in the network may turn into a bottleneck if its maximum capacity c_btl(t), which is equivalent to BBR's BtlBW, is lower than λ(t): in such a case, its buffer of size btlBuffer will start holding an amount of queued data qdData(t) > 0 waiting to be delivered, and µ(t) < λ(t).
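To make the three measured quantities concrete, the following Python sketch computes a per-window bitrate the way λ(t) and µ(t) are described above (accumulated packet sizes over a window, divided by its duration) and a per-frame RTVL from the two instants S_0 and S_ACK. The names `WindowRateMeter` and `rtvl_ms`, the 0.5 s window and the packet sizes are illustrative assumptions, not part of the platform described in this paper.

```python
class WindowRateMeter:
    """Accumulate UDP payload sizes and report one bitrate per time window."""

    def __init__(self, window_s):
        self.window_s = window_s
        self.bytes_in_window = 0

    def on_packet(self, size_bytes):
        self.bytes_in_window += size_bytes

    def close_window(self):
        """Return the bitrate (bit/s) of the window that just ended, then reset."""
        rate_bps = 8 * self.bytes_in_window / self.window_s
        self.bytes_in_window = 0
        return rate_bps


def rtvl_ms(s0_ms, s_ack_ms):
    """Per-frame RTVL: time from handing the raw frame to the encoder (S_0)
    until the client's decoding ACK reaches the server (S_ACK)."""
    return s_ack_ms - s0_ms


# Hypothetical window: the server sends 50 packets, the client receives only 40.
server_meter = WindowRateMeter(window_s=0.5)
client_meter = WindowRateMeter(window_s=0.5)
for _ in range(50):
    server_meter.on_packet(1250)
for _ in range(40):
    client_meter.on_packet(1250)

lam = server_meter.close_window()  # server-sent bitrate for this window
mu = client_meter.close_window()   # client-received bitrate for this window
frame_rtvl = rtvl_ms(s0_ms=1000.0, s_ack_ms=1042.5)
```

In this made-up window `mu` is lower than `lam`, which under the ideal model (aligned windows, no losses) would mean that the difference is sitting in a bottleneck buffer.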
Figure 3 illustrates how C3G's congestion model works by showing how µ(t), qdData(t), RTVL(t), and packet losses l(t) would evolve over time in a rather academic example scenario where λ(t) would increase linearly with time, while c_btl(t) would remain constant. Three phases can be identified:
1. t < t_0: λ(t) < c_btl(t), hence µ(t) = λ(t) and qdData(t) = 0, so there is no congestion. Besides, RTVL(t) = RTVL_min, so the latency is minimal, but this phase is nevertheless sub-optimal because the channel is underused.
2. t_0 ≤ t ≤ t_1: The OOP is reached at t_0, when and because λ(t_0) = c_btl(t), but then, as λ(t) keeps growing, UDP packets start to accumulate in the buffer because µ(t) remains equal to c_btl(t). Both qdData(t) and RTVL(t) increase. During this congestion phase, it is still possible to react before incurring losses, because the buffer is not full until t = t_1, when RTVL reaches its maximum value RTVL_max, which is imposed by the buffer size.
3. t > t_1: The buffer is full (bufferbloat), so packets start to be lost, i.e., l(t) > 0.
As for qdData(t_now), it is calculated by adding all differences λ(t) − µ(t) since the execution instant t_cong at which congestion was detected for the first time.
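The academic scenario of Figure 3 can be reproduced numerically with a simple fluid model of a single bottleneck. The Python sketch below is only an illustration under stated assumptions: the function name `simulate_ramp`, the discrete time step, the parameter values and the relation RTVL = RTVL_min + qdData / c_btl (queueing delay added to a constant floor) are choices made for this example rather than expressions taken from the text.

```python
def simulate_ramp(c_btl_bps, buffer_bits, rtvl_min_s, lam0_bps, slope_bps_per_s,
                  dt_s=0.01, duration_s=10.0):
    """Fluid model of one bottleneck fed by a linearly increasing rate lambda(t)."""
    queued_bits = 0.0
    trace = []
    for k in range(int(round(duration_s / dt_s))):
        t = k * dt_s
        lam = lam0_bps + slope_bps_per_s * t
        arriving = lam * dt_s                    # bits entering the bottleneck
        capacity = c_btl_bps * dt_s              # bits the link can deliver
        available = queued_bits + arriving
        delivered = min(available, capacity)
        backlog = available - delivered
        lost = max(0.0, backlog - buffer_bits)   # whatever overflows the buffer is dropped
        queued_bits = backlog - lost
        mu = delivered / dt_s
        rtvl = rtvl_min_s + queued_bits / c_btl_bps  # assumed queueing-delay relation
        if queued_bits == 0.0:
            phase = 1   # sub-OOP: no queue, minimal latency
        elif lost == 0.0:
            phase = 2   # congestion: queue and latency grow, no losses yet
        else:
            phase = 3   # bufferbloat: buffer full, packets lost
        trace.append((t, lam, mu, queued_bits, rtvl, lost, phase))
    return trace


# Hypothetical numbers: 10 Mb/s bottleneck, 1 Mb buffer, 30 ms latency floor.
trace = simulate_ramp(c_btl_bps=10e6, buffer_bits=1e6, rtvl_min_s=0.030,
                      lam0_bps=8e6, slope_bps_per_s=1e6)
phases = [row[-1] for row in trace]
```

With these values the trace goes through phases 1, 2 and 3 in that order, µ(t) saturates at the bottleneck capacity, and the latency plateaus at a maximum fixed by the buffer size (here 30 ms + 1 Mb / 10 Mb/s).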
We will elaborate on t_cong in Subsection IV-A, and in Section V on other implementation details such as the misalignment of the λ(t) and µ(t) time windows, but we already state precisely here how our C3G algorithm should ideally calculate its Boolean variable congestion(t_now), to be able to better explain its adaptation strategy:
$$\mathrm{congestion}(t_{now}) = \big(\mathrm{qdData}(t_{now}) > 0\big) \wedge \big(\mathrm{RTVL}(t_{now}) > \mathrm{RTVL}(t_{prev})\big) \qquad (1)$$
Note that both conditions must be met for congestion(t_now) to become true: the mere fact that qdData(t_now) > 0 is not too worrying in itself, because btlBuffer might suffice to mitigate the variations of λ(t) and c_btl(t) (this is precisely why the buffer is there!). In fact, when we detect that qdData(t_now) > 0, we take advantage of it to estimate c_btl(t_now) = μ(t_now). But if we see that, on top of having queued data, RTVL(t) has increased, then we do declare congestion(t_now) true.
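The two-condition test described above can be sketched in a few lines of Python. This is a simplified illustration, not the authors' implementation: the class name `CongestionDetector` is hypothetical, the queued data is accumulated from the first call rather than from t_cong, and clamping the running sum at zero is an assumption of the sketch.

```python
class CongestionDetector:
    """Declare congestion only when data is queued AND the latency has grown."""

    def __init__(self, dt_s):
        self.dt_s = dt_s          # control loop period
        self.qd_bits = 0.0        # running estimate of qdData
        self.prev_rtvl_s = None   # RTVL seen at the previous run

    def update(self, lam_bps, mu_bps, rtvl_s):
        # Accumulate the sent-minus-received difference (clamped at zero: assumption).
        self.qd_bits = max(0.0, self.qd_bits + (lam_bps - mu_bps) * self.dt_s)
        rtvl_grew = self.prev_rtvl_s is not None and rtvl_s > self.prev_rtvl_s
        self.prev_rtvl_s = rtvl_s
        # While data is queued, the received rate is taken as the capacity estimate.
        c_btl_est = mu_bps if self.qd_bits > 0 else None
        congestion = self.qd_bits > 0 and rtvl_grew
        return congestion, c_btl_est


detector = CongestionDetector(dt_s=0.1)
first = detector.update(lam_bps=8e6, mu_bps=8e6, rtvl_s=0.030)    # no queue
second = detector.update(lam_bps=10e6, mu_bps=9e6, rtvl_s=0.030)  # queue, flat latency
third = detector.update(lam_bps=10e6, mu_bps=9e6, rtvl_s=0.050)   # queue, rising latency
```

Only the third call reports congestion: in the second one data is already queued (and a capacity estimate is available), but the latency has not increased yet, so the buffer is assumed to be doing its job.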

IV. ADAPTATION STRATEGY
As hinted above, our C3G algorithm estimates the channel capacity and tries to reach the OOP by adapting λ(t) accordingly. Slivar et al. [27] carried out tests to derive video encoding adaptation strategies for CG, and concluded that the game type must be taken into account when evaluating QoE, but for both games analyzed in their paper, fluidity (framerate) had a more significant impact on the QoE than video quality (bitrate). The framerate being a very sensitive parameter, we decided not to act on it to modify λ(t), but only on the target bitrate for the video encoder, v(t). In Subsection V-C we elaborate on the difficulties involved in commanding λ(t) through v(t) but, for the purposes of this Section, we will assume that our control algorithm is indeed able to set λ(t) directly.
There are two distinct situations for λ(t) adaptation, respectively analyzed in the two Subsections below:
A. When our algorithm detects a channel capacity drop, it reactively decreases the target λ(t), which is challenging: if it is set too low, qdData(t) = 0 will indeed be reached soon, thus exiting the congestion phase 2 of Figure 3, but video quality, hence QoE, will be poor while the buffer is drained; on the other hand, if λ(t) is set too high, i.e., too close to its pre-congestion value, video quality might remain acceptable, but at the risk of reaching the bufferbloat phase 3.
B. At all times during a period of apparently stable channel capacity, our algorithm proactively explores higher acceptable values of λ(t) to escape the sub-OOP phase 1. Again, this must be done carefully to avoid stepping too deep into the congestion phase 2 and increasing the latency too much or, even worse, ending up in the bufferbloat phase 3.

A. REACTIVE BITRATE DECREASE
Once we detect congestion (see Equation 1), we know we might have to launch a λ(t) adaptation sequence like the one shown in Figure 4, which starts by lowering it to λ_drain to drain the buffer and reach qdData(t) = 0 again as soon as possible. However, to avoid over-reacting, we do not necessarily lower λ(t) whenever we detect congestion(t_now). Instead, we tentatively set t_cong = t_now (to mark the starting point for the qdData(t) sum, as explained in Subsection III-C), and then try and predict in two ways whether RTVL(t) will exceed the maximum acceptable threshold RTVL_th (e.g., 100 ms) during the next control loop cycle, i.e., for some t ∈ [t_now, t_next = t_now + Δt]. We calculate the following two predicted values:
1. RTVL_1(t_next) is linearly extrapolated from all RTVL(t) samples recorded during the last control cycle, [t_prev, t_now];
2. RTVL_2(t_next) assumes that qdData(t_now) is still far below btlBuffer (i.e., that t_now is still close to t_0 in Figure 3), and that the last known value RTVL(t_last) will keep increasing at a rate given by μ(t_now), which we found to be a reasonable estimate of the bottleneck capacity c_btl(t_now), as explained at the very end of Subsection III-C.
If either of these two predicted values is larger than RTVL_th, we declare drainNeeded(t_now) true, set t_drain = t_now, and trigger the adaptation sequence, as explained below.
But if neither is, we keep calm and carry on... This means that congestion(t) might be true at several successive runs of the C3G algorithm, and then turn false without drainNeeded(t) ever becoming true. Another couple of good things that might happen ''naturally'', after one or more successive runs in which congestion(t) is still true, are that λ(t_now) < μ(t_now) (qdData(t_now) is still positive, which is the first condition for congestion(t_now) to be true, but smaller than qdData(t_prev), so the buffer is draining) and that the RTVL(t) samples recorded during the last control cycle show a decreasing tendency (although RTVL(t_now) > RTVL(t_prev), which is the second condition for congestion(t_now) to be true). If those two things do happen, we reset t_cong = t_now, again tentatively.
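The two-predictor check described above can be sketched as follows in Python. Both formulas are assumptions of this sketch: RTVL_1 is obtained with an ordinary least-squares line through the samples of the last cycle (the text only says ''linearly extrapolated''), and RTVL_2 adds to the last sample the extra queueing delay that an excess rate λ − μ would build up over one cycle when drained at rate μ. The function names are hypothetical.

```python
def extrapolate_rtvl(samples, t_next):
    """RTVL_1: least-squares line through (t, rtvl) samples, evaluated at t_next."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_r = sum(r for _, r in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0.0:
        return mean_r  # a single instant: no trend can be estimated
    slope = sum((t - mean_t) * (r - mean_r) for t, r in samples) / var_t
    return mean_r + slope * (t_next - mean_t)


def queue_growth_rtvl(rtvl_last_s, lam_bps, mu_bps, dt_s):
    """RTVL_2 (assumed form): last sample plus the delay of the data queued in one cycle."""
    return rtvl_last_s + max(0.0, lam_bps - mu_bps) * dt_s / mu_bps


def drain_needed(samples, t_next, lam_bps, mu_bps, dt_s, rtvl_th_s=0.100):
    """True if either prediction exceeds the latency threshold."""
    rtvl_1 = extrapolate_rtvl(samples, t_next)
    rtvl_2 = queue_growth_rtvl(samples[-1][1], lam_bps, mu_bps, dt_s)
    return rtvl_1 > rtvl_th_s or rtvl_2 > rtvl_th_s


# Hypothetical samples (time in s, RTVL in s) showing a steady 10 ms rise per 100 ms.
samples = [(0.0, 0.050), (0.1, 0.060), (0.2, 0.070)]
calm = drain_needed(samples, t_next=0.3, lam_bps=10e6, mu_bps=10e6, dt_s=0.1)
trend_alarm = drain_needed(samples, t_next=0.6, lam_bps=10e6, mu_bps=10e6, dt_s=0.4)
queue_alarm = drain_needed(samples, t_next=0.3, lam_bps=12e6, mu_bps=10e6, dt_s=0.4)
```

The first call stays below the 100 ms threshold with both predictors, the second trips on the extrapolated trend, and the third trips on the queue-growth estimate even though the trend alone would not.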
Once drainNeeded(t_now) is true, so a server-sent rate adaptation sequence like the one of Figure 4 must be launched, we start by calculating λ_drain to set it as the new target for λ(t). This value must of course be lower than the estimated bottleneck capacity to help drain the buffer and reach qdData(t) = 0 again as soon as possible. In principle, it could be desirable to completely drain the buffer in a single control loop cycle, which would require a λ_drain far enough below the estimated bottleneck capacity to absorb qdData(t_now) within Δt. But this could imply setting v(t) below the minimum acceptable target video encoding bitrate, v_min. This is why we impose a drain period T_drain, possibly much larger than Δt, during which the target server-sent rate is kept equal to λ_drain. Finally, as suggested as well by Figure 4, once T_drain is over and the buffer is completely empty again, our C3G algorithm increases the target server-sent rate to achieve the OOP, and sets it to λ_OOP = μ(t_drain), which is the channel capacity estimated just before entering the drain period.
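One way to turn this reasoning into numbers is sketched below in Python. The formulas are assumptions of the sketch and not necessarily those used by the authors: draining qdData in a single cycle Δt requires sending qdData/Δt below the estimated capacity, that rate is floored at a hypothetical minimum `lam_min_bps` (standing in for the rate obtained with v_min), and the drain period is whatever time the resulting deficit needs to empty the buffer.

```python
def plan_drain(qd_bits, c_btl_est_bps, dt_s, lam_min_bps):
    """Return (lambda_drain, T_drain) for a given amount of queued data."""
    # Rate that would empty the buffer within one control cycle.
    one_cycle_rate = c_btl_est_bps - qd_bits / dt_s
    # Never go below the minimum acceptable sending rate.
    lam_drain = max(one_cycle_rate, lam_min_bps)
    if lam_drain >= c_btl_est_bps:
        raise ValueError("the drain rate must stay below the estimated capacity")
    # Time needed for the deficit (capacity minus drain rate) to absorb the queue.
    t_drain = qd_bits / (c_btl_est_bps - lam_drain)
    return lam_drain, t_drain


# Hypothetical values: 10 Mb/s estimated capacity, 100 ms cycle, 4 Mb/s floor.
small_queue = plan_drain(qd_bits=2e5, c_btl_est_bps=10e6, dt_s=0.1, lam_min_bps=4e6)
large_queue = plan_drain(qd_bits=2e6, c_btl_est_bps=10e6, dt_s=0.1, lam_min_bps=4e6)
```

With the small queue, one cycle at 8 Mb/s suffices; with the large one, the floor is hit and the drain period stretches to about a third of a second, after which the target rate would be raised back to the capacity estimated just before draining.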

B. PROACTIVE BITRATE INCREASE (EXPLORATION)
In times of apparently stable channel capacity, any OOP-seeking λ(t) adaptation algorithm must explore higher acceptable rates, but must be very cautious in doing so, to avoid exceeding c_btl(t) so much that RTVL(t) > RTVL_th, or that l(t) > 0 due to bufferbloat. This is especially true in the CG context, where a radical increase of λ(t), when it is already close to c_btl(t), may cause a fast and vast data accumulation in the bottleneck buffer, hence either of the two undesired consequences above, both of which lead to an unacceptable QoE.
As summarized in Subsection III-A, in terms of exploration, BBR uses a MIMD scheme based on a periodic cycle of gains applied to the server-sent rate to try and reach the OOP (BDP = inflight) of a channel of potentially increasing capacity. Figure 5a shows (from left to right) how:
1. The first two attempts to apply 1.25 factors (i.e., 25% gains) to the server-sent rate are not successful, because c_btl(t) is exceeded (which is detected because inflight > BDP), so the target rate remains the same for the following cycles.
2. On the contrary, the third and fourth attempts are successful, because the channel capacity c_btl(t) has indeed increased, and the corresponding 1.25 factors for the target rate are both consolidated.
3. The last two attempts to increase the target rate are unsuccessful, like the first two.
BBR's exploration strategy is inadequate for our CG over UDP context for two main reasons, the most obvious being that the ×1.25 (i.e., +25%) gain is too greedy, which often impacts the QoE badly. The straightforward solution to this problem would be lowering the gain factor (to, say, ×1.2 or ×1.15). However, this would not only cause a delay in reaching the new (higher) channel capacity, but also an undesirable oscillation in λ(t). Indeed, the second reason why BBR's exploration strategy is unsuitable in our context is that its fixed pattern of gains ignores the history of the already explored target rates, along with the QoS parameters (BDP and inflight). Unlike BBR's, our strategy does take exploration history into account to dynamically modify its gain factor and, by doing so, yields: i) a smoother λ(t), thus avoiding too many oscillations in the video encoding bitrate, and ultimately in the QoE; and ii) a more efficient channel use. We manage to be conservative when a recent tentative increase of λ(t) over the last known c_btl(t) has caused drainNeeded(t), but more aggressive if this has not happened so recently, which usually means that c_btl(t), hence λ_OOP, have themselves increased.
Like BBR's, our exploration strategy is based on cycles, or exploration periods, T_exp(t), but their duration is not fixed. Our steps for λ(t), called λ_exp(t), are not fixed either, and do not depend exclusively on λ(t). In fact, our exploration period and step depend on our exploration gain g(t), which is an integer between 1 and g_max. [NB: in our tests, we set g_max = 50 and λ_exp,min = 10 kb/s.] At the end of each exploration period, we increase, maintain or decrease g(t), and then set T_exp(t) and λ_exp(t) for the new period. Before explaining our criteria for this decision, we want to stress that, as shown in Figure 5b, our steps of variable width and height allow us to be cautious in the vicinity of a constant channel capacity, whatever the current target rate. On the other hand, when there is indeed a newer, higher channel capacity to be discovered and matched, the simultaneous decrease of T_exp(t) and increase of λ_exp(t), both due to an increase of g(t), yield an ''exponentialish'' increase of λ(t), which quickly matches the new c_btl(t).
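The qualitative effect of a growing gain (shorter periods and taller steps at the same time) can be illustrated with the Python sketch below. The two mappings it uses are purely hypothetical: the step is taken as g times λ_exp,min and the period as an assumed constant `T_EXP_MAX_S` divided by g. Only g_max = 50 and λ_exp,min = 10 kb/s come from the text.

```python
G_MAX = 50                    # maximum exploration gain (value used in the tests)
LAMBDA_EXP_MIN_BPS = 10_000   # minimum exploration step, 10 kb/s
T_EXP_MAX_S = 5.0             # hypothetical longest exploration period


def exploration_step(g):
    """Return (step in bit/s, period in s) for a gain g, clamped to [1, G_MAX]."""
    g = max(1, min(G_MAX, g))
    return g * LAMBDA_EXP_MIN_BPS, T_EXP_MAX_S / g


def ramp(lam_bps, gains):
    """Apply one exploration step per gain and return the (time, rate) staircase."""
    t = 0.0
    staircase = []
    for g in gains:
        step_bps, period_s = exploration_step(g)
        t += period_s
        lam_bps += step_bps
        staircase.append((t, lam_bps))
    return staircase


# Gain doubling after each successful period: steps get taller and closer together.
staircase = ramp(8_000_000, [1, 2, 4, 8])
```

Under these assumed mappings a gain of 1 produces slow, small steps suited to probing near a constant capacity, while successive gain increases compress the staircase in time and stretch it in height, which gives the faster-than-linear climb described above.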
We decide to increase, maintain or decrease g(t) by comparing the network QoS behavior during the just-finished period with its previous, saved/logged behaviors since congestion was last detected (and g(t) was initialized to g_min). We define ''network QoS behavior'' by means of a channel rating function R(λ̄) which combines four QoS parameters measured or estimated during an exploration period, namely the average, standard deviation, and maximum value of RTVL(t), and the amount of qdData(t). The four coefficients a, b, c and d that weight them are meant to balance the contribution of these different QoS parameters to the rating function, i.e., to give more relative importance to any of them. In any case, they must all be positive since large R(λ̄) values represent undesirable tendencies, because all its four QoS parameters do: large or highly variable/unpredictable latency, or a lot of undelivered data. [NB: in our tests, we used (a, b, c, d) = (1, 10, 1, 2).]
Based on our channel rating function, what we save/log is what we call ''states'', defined as couples s(λ̄) = (λ̄, R(λ̄)), where λ̄ is the average value of λ(t) during a just-finished exploration period. As time goes by, our C3G algorithm gradually builds a ''network state dictionary'', S = {s(λ̄)}, by storing each new couple s(λ̄) if λ̄ had not been explored yet, or by possibly updating its R(λ̄) if it had, as explained below. Note that the discretization effected by λ_exp,min helps accelerate searches in this state dictionary, which are necessary to choose between three candidates for the new target λ(t): λ+, λ= and λ−.
After calculating these three candidates, our algorithm traverses the following decision tree, which we designed to favor increases in λ(t): note that the second sub-case of B.b.2 is the only situation where λ− is chosen, and bear in mind that R(λ_1) < R(λ_2) means that the network behaves better for λ_1 than for λ_2.
A. If λ= is not found in S:
   a. If λ+ is not found in S: λ+ is chosen because the main mission of our exploration strategy is precisely to aim higher.
   b. If λ+ is found in S: if R(λ+) > R(λ̄), λ+ is chosen because it seems that the network behaves better now for the just-explored rate than in the past for a higher one; otherwise, λ= is chosen, hoping that things will improve during the next exploration period.
B. If λ= is found in S:
   a. If λ+ is not found in S: if R(λ=) > R(λ̄), λ+ is chosen because it seems that the network behaves better now for the just-explored rate than in the past; otherwise, λ= is chosen, hoping as in case A.b that...
   b. If λ+ is found in S:
      1. If R(λ+) > R(λ̄), λ+ is chosen because it seems that the network behaves better now for the just-explored rate than in the past for a higher one (see case A.b).
      2. If R(λ+) ≤ R(λ̄): if R(λ=) > R(λ̄), λ= is chosen because it seems that the network behaves better now for the just-explored rate than in the past, but not as much better as in case B.a; otherwise, λ− is chosen because it is the only sensible option.
Once λ+, λ= or λ− is chosen, R(λ̄) is updated if needed.
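The rating function, the state dictionary and the decision tree above can be put together in a short Python sketch. Several points are assumptions of the sketch rather than statements of the paper: the rating is taken as a plain weighted sum of the four parameters in the order they are listed, the units of its inputs are left to the caller, the dictionary is a plain `dict` keyed by the rate rounded to a multiple of λ_exp,min, and the test used for B.b.2 (stay at λ= if R(λ=) > R(λ̄), else fall back to λ−) is this sketch's reading of that case. All function names are hypothetical.

```python
from statistics import mean, pstdev

COEFFS = (1, 10, 1, 2)     # (a, b, c, d) as used in the tests reported above
LAMBDA_EXP_MIN_KBPS = 10   # discretization step of the explored rates


def channel_rating(rtvl_samples, qd_data, coeffs=COEFFS):
    """Assumed weighted sum: larger values mean worse network behavior."""
    a, b, c, d = coeffs
    return (a * mean(rtvl_samples) + b * pstdev(rtvl_samples)
            + c * max(rtvl_samples) + d * qd_data)


def quantize(lam_kbps):
    """Round a rate to the grid used as dictionary key."""
    return LAMBDA_EXP_MIN_KBPS * round(lam_kbps / LAMBDA_EXP_MIN_KBPS)


def choose_next_rate(states, r_now, lam_minus, lam_eq, lam_plus):
    """Walk the decision tree; `states` maps explored rates to their saved ratings."""
    r_eq = states.get(lam_eq)
    r_plus = states.get(lam_plus)
    if r_eq is None:
        if r_plus is None:
            return lam_plus                              # A.a: aim higher
        return lam_plus if r_plus > r_now else lam_eq    # A.b
    if r_plus is None:
        return lam_plus if r_eq > r_now else lam_eq      # B.a
    if r_plus > r_now:
        return lam_plus                                  # B.b.1
    return lam_eq if r_eq > r_now else lam_minus         # B.b.2 (assumed test)


# Hypothetical period: flat 40 ms latency, nothing queued, average rate ~8 Mb/s.
r_now = channel_rating([40, 40, 40], qd_data=0)
lam_eq = quantize(8004)
never_explored = choose_next_rate({}, r_now, lam_eq - 10, lam_eq, lam_eq + 10)
both_were_better = choose_next_rate({lam_eq: 50, lam_eq + 10: 50}, r_now,
                                    lam_eq - 10, lam_eq, lam_eq + 10)
```

In the first call nothing is known about the neighboring rates, so the higher candidate is chosen; in the second, both saved ratings are lower (better) than the current one, which is the single path of the tree that leads to a decrease.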

V. IMPLEMENTATION DETAILS
In this Section we address the implementation of our C3G algorithm explained in the previous two Sections, and how we had to tailor it to PlayGiga's CG platform [28]. In particular, we give some details on how we monitor the network parameters, deal with packet losses and retransmissions, and set λ(t) through v(t), to achieve effective congestion control in a real environment.
But before doing so, we want to highlight the importance of running our C3G algorithm at the server (vs. at the client) to detect congestion and adapt λ(t). We believe this has at least the following three advantages: 1. it implies less action-reaction time, since the decisions are taken where the video encoder operates, thus saving the transmission time needed to report any decision from the client to the server; 2. from the deployment viewpoint, it is easier to update one server than multiple (types of) clients; 3. a centralized algorithm helps manage multi-player services which can share the same streaming channels.

A. MISALIGNMENT OF THE SERVER AND CLIENT TIME WINDOWS
Our C3G algorithm takes its decisions based on µ(t), which must be periodically measured by the client and reported to the server. It is therefore the server's responsibility to calculate λ(t) for the same time window used by the client for µ(t), as we assumed in Subsections III-B and III-C. In PlayGiga's CG platform, the client measures µ(t) and sends it to the server aboard TCP KA (KeepAlive) packets at regular time intervals. [NB: TCP's KA packets are typically meant to carry no meaningful data, and are just used by one peer to check as needed that the other peer and the link between the two are still "alive", which is confirmed (or not) by the reception (or not) of replies to the KA probes.] Although this is not required by our C3G algorithm, the period used at the client for these KA reports is the same Δt used at the server for the congestion control loops: see Subsection III-C. This is shown by the right side of Figure 6: when KA packet #2 is due, Δt after #1, the client adds up the amount of data carried by all UDP packets received during that µ(t) time window (from #1 of video frame A until #1 of video frame F), divides it by Δt, and reports the resulting µ(t) via KA packet #2.
The main goal of Figure 6, however, is to illustrate the misalignment of the µ(t) time window described above with the corresponding λ(t) window on the left, which is not only shifted ''down'' in time, but also of a different size, due to both the uplink and downlink propagation delays: the first affects the KA packets, which leads to the time span difference ±δt; more importantly, the downlink delay of the UDP packets, and their potential losses, affect the number of them taking part in the computation of µ(t).Both delays yield a noisy function of λ(t) − µ(t) differences, thus a noisy estimation of qdData(t) for our congestion detection and adaptation strategy.
To filter this noisy function, we consider only its positive values above a certain threshold, which is periodically updated to hold the maximum value of λ(t) − µ(t) reported without signs of an increasing RTVL. In our implementation, the first condition of Equation 1, qdData(t_now) > 0, was replaced with the requirement that λ̄(t_now) − μ̄(t_now) exceed this threshold.
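A minimal Python sketch of this filtering step is given next, under stated assumptions: the threshold is kept as a running maximum (the text only says it is "periodically updated"), the decision that a period showed no sign of increasing RTVL is taken elsewhere and passed in by the caller, and the class and method names are hypothetical.

```python
class QueueNoiseFilter:
    """Threshold on the average lambda - mu difference (e.g., in kb/s)."""

    def __init__(self):
        self.threshold = 0  # largest difference seen without RTVL growth

    def observe_calm_period(self, lam_avg, mu_avg):
        """Call for periods reported without signs of an increasing RTVL."""
        self.threshold = max(self.threshold, lam_avg - mu_avg)

    def first_condition(self, lam_avg, mu_avg):
        """Replacement for the qdData(t_now) > 0 test of Equation 1."""
        return (lam_avg - mu_avg) > self.threshold
```

With a threshold learned from a calm period where λ̄ − μ̄ was 200 kb/s, a later difference of 100 kb/s is treated as measurement noise, whereas a difference of 1000 kb/s satisfies the condition.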
Note that all this does not affect the computation of RTVL(t) at the server, for which we used exactly the procedure illustrated by Figure 1:
0. the server takes a timestamp S_0 before sending each raw frame output by its game engine to its video encoder, and, once it is encoded and fragmented into n UDP packets,
1... the server sends these UDP packets to the client, which might not receive some of them;
n. when all n UDP packets have been received, or after the presentation timeout described in Subsection V-B, the client sends the re-assembled frame to its video decoder, and, just before displaying it, sends a frame ACK to the server;
A. upon reception of that ACK, the server takes a second timestamp S_ACK and computes RTVL(t) for that frame.
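The server-side bookkeeping implied by this procedure can be sketched as follows. This is an illustrative Python sketch, not PlayGiga's code: the class, its method names and the use of a frame identifier to match ACKs to frames are assumptions.

```python
import time


class RtvlTracker:
    """Per-frame RTVL measurement at the server: S_ACK - S_0."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable clock, in seconds
        self._s0 = {}         # frame id -> timestamp S_0

    def frame_to_encoder(self, frame_id):
        """Take S_0 just before the raw frame is handed to the encoder."""
        self._s0[frame_id] = self._clock()

    def ack_received(self, frame_id):
        """Take S_ACK and return RTVL in ms, or None for an unknown frame."""
        s0 = self._s0.pop(frame_id, None)
        if s0 is None:
            return None
        return (self._clock() - s0) * 1000.0
```

A monotonic clock is used so that wall-clock adjustments at the server cannot produce negative or inflated latencies; since both timestamps are taken at the server, no clock synchronization with the client is needed.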

B. PRESENTATION AND RETRANSMISSION TIMEOUTS
Contrary to what Figure 1 implies, a real CG client must consider UDP packet losses and might therefore have to stop waiting to receive all packets of a video frame before decoding and displaying it. A "presentation timeout" is typically imposed on the difference C_n − C_1, and if it is reached before all packets of a video frame have been received, the frame is nevertheless decoded and displayed, obviously with decoding errors. In PlayGiga's CG client, the presentation timeout was set to 100 ms. Note that, regardless of the value chosen for the maximum acceptable latency, RTVL_th (see Subsection IV-A), its actual saturation value, RTVL_max (see Figure 3), should be equal, in the general case of a frame with packet losses, to the transmission delay of its first packet plus its presentation timeout. But in the particular case of a completely lost frame for which no UDP packet is ever received, RTVL_max would be unbounded. PlayGiga's CG client handles this particular case by sending to the server an ACK for the completely lost frame after a later frame has been received, and with a low priority (i.e., at the end of the control algorithm execution), so RTVL_max may in fact reach values as high as 350 ms.
Besides the presentation timeout, real CG clients also have a "retransmission timeout" associated with an application buffer, which allows the client to request from the server the retransmission of not-yet-received (and thus potentially lost) UDP packets. In PlayGiga's CG client, the retransmission timeout was set to 30 ms.
Dealing with packet losses and retransmissions requires two modifications in our adaptation strategy:
1. In the event of network congestion, the client requests the retransmission of UDP packets which are lost or delayed for too long, so the server-sent rate does not only depend on the video encoding parameters, but also on the retransmission percentage. To allow for the bottleneck buffer to drain, this must be taken into account when computing λ_drain: Equation 2 must be modified to account for r(t), the fraction of retransmissions with respect to λ(t) for the considered analysis time window.
2. Since retransmissions and losses also have an impact on the network QoS, the rating function of Equation 3 must also be modified to include them.
[NB: packet losses l(t) are calculated by the client, for each frame, as the fraction of its non-received UDP packets at the time it is sent to the decoder.]
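The client-side rules described in this Subsection can be summarized by the following Python sketch. It only covers the presentation timeout and the per-frame loss fraction l(t); the retransmission logic is left out because its details are not given here. Function names and the millisecond time base are assumptions.

```python
PRESENTATION_TIMEOUT_MS = 100.0  # value used in PlayGiga's CG client


def frame_ready(first_arrival_ms, now_ms, received, total):
    """True when the frame must be sent to the decoder.

    Either all `total` UDP packets have arrived, or the presentation
    timeout, counted from the arrival of the first packet, has expired.
    """
    timed_out = (now_ms - first_arrival_ms) >= PRESENTATION_TIMEOUT_MS
    return received == total or timed_out


def loss_fraction(received, total):
    """l(t): fraction of non-received packets when the frame is decoded."""
    return (total - received) / total
```

For example, a frame of 10 packets with only 8 received 100 ms after its first packet is decoded anyway, with l(t) = 0.2.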

C. SERVER-SENT VS. VIDEO ENCODER BITRATES
Setting a new target server-sent bitrate, λ(t), by setting a new target video encoder bitrate, v(t), is not straightforward, since different factors are involved, such as the CG platform used (there are several proprietary solutions, notably from NVIDIA, Intel and AMD), the video encoder provided by it and its API, and the nature and complexity of the video sequences to be encoded, i.e., the output of the game engine. Besides, specifying a particular v(t) value does not necessarily mean that the real output bitrate of the video encoder will match it exactly, and even less instantly, so our C3G algorithm always modifies v(t) by taking into account the λ(t)/v(t) ratio of previous iterations.
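One possible way of using the λ(t)/v(t) ratio is sketched below in Python. The text does not specify how the ratio of previous iterations is combined, so the plain average over a short window, the window length and the function name are all assumptions made for illustration.

```python
def next_encoder_target(lam_target, history, window=5):
    """Translate a target server-sent bitrate into an encoder target v.

    history holds (lam_measured, v_set) pairs from previous iterations;
    the mean lam/v ratio of the last `window` pairs corrects the target.
    """
    ratios = [lam / v for lam, v in history[-window:] if v > 0]
    if not ratios:
        return lam_target          # no history yet: assume lam == v
    return lam_target / (sum(ratios) / len(ratios))
```

If the encoder has recently produced 10% more than requested (ratio 1.1), a target λ of 6.6 Mb/s translates into an encoder target of about 6 Mb/s.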

VI. EXPERIMENTAL RESULTS
In the experimental tests we carried out to compare C3G's performance with BBR's in a real-world CG platform, we focused in particular on its capabilities to: i) rapidly adapt to network congestion while minimizing its negative effects on the user's QoE; and ii) use the maximum channel capacity in stable channel conditions. We tested both algorithms using PlayGiga's CG platform [28], which provides realistic conditions in an end-to-end system, including transmission over a real WAN (Wide Area Network) with realistic QoS degradation, e.g., propagation delay, spurious losses, jitter, etc. Additionally, we implemented limitations on the channel capacity on the client side to test the response of both BBR and C3G under different bandwidth conditions.
Subsection VI-B describes the design of our test benchmark, and Subsections VI-C and VI-D report and discuss the results obtained by C3G vs. BBR in the two scenarios described in Section IV. But first, Subsection VI-A explains how we had to adapt BBR, which was originally designed to operate at the transport layer over TCP. Recent works have used it at the application layer [29] in the GamingAnywhere [24] CG platform; by following the same approach, we could compare both methods in a common UDP-based CG platform.

A. BBR's ADAPTATION TO PLAYGIGA's CG PLATFORM
Given that video transmission is based on UDP in PlayGiga's CG platform, we had to adapt BBR to operate in the absence of TCP parameters. Therefore, we had to derive BBR's parameters RTprop, BtlBW and inflight, described in Subsection III-A, from C3G's QoS parameters RTVL(t), µ(t) and λ(t), described in Subsection III-B.
This called for changes in the implementation of BBR's rate control strategies, which were aimed at keeping intact its two cornerstones: i) a gain-based adaptation scheme to react to channel capacity drops; and ii) a probe cycle algorithm to explore higher channel bandwidth limits.We based our re-implementation of BBR on its implementations at the transport layer for ns-3 [16], [30], and on its adaptations for the application layer [29].
1) DERIVATION OF BBR's PARAMETERS FROM C3G's
1. We replaced TCP's RTT by RTVL(t), which is meant to be analogous to RTT, only "frame-wise" (i.e., at the application level), instead of "packet-wise". We thus defined BBR's RTprop as the minimum value of all RTVL(t) samples in a time window w (we used w = 2Δt).
2. We obtained BBR's BtlBW from µ(t), which represents a valid estimation of c_btl(t) in the case of congestion, as explained at the very end of Subsection III-C.
3. We estimated the amount of unacknowledged sent bits as inflight = (λ(t) − µ(t)) · w (we used w = 1 s).
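These three derivations translate almost directly into code. The following Python sketch assumes that the caller has already collected the RTVL(t) samples of the RTprop window and the current λ(t) and µ(t) values in bits per second; the function and parameter names are hypothetical.

```python
def derive_bbr_parameters(rtvl_window_ms, lam_bps, mu_bps, w_inflight_s=1.0):
    """Map C3G's QoS parameters onto BBR's RTprop, BtlBW and inflight.

    rtvl_window_ms: RTVL(t) samples (ms) recorded in the RTprop window.
    """
    rt_prop = min(rtvl_window_ms)                   # frame-wise RTT minimum
    btl_bw = mu_bps                                 # client-received rate
    inflight = (lam_bps - mu_bps) * w_inflight_s    # unacknowledged bits
    return rt_prop, btl_bw, inflight
```

For example, with RTVL samples of 48, 52 and 45 ms, λ = 5.4 Mb/s and µ = 5.0 Mb/s, the sketch yields RTprop = 45 ms, BtlBW = 5.0 Mb/s and inflight = 400 000 bits.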

2) IMPLEMENTATION OF BBR NETWORK ADAPTATION STRATEGY
BBR follows a gain-based strategy for rate control with two phases, Startup and ProbeBW, which apply different strategies to derive increasing/decreasing data rate gains. Figure 7 shows the scheme of these two phases as implemented in our test platform, following the guidelines of [29]. In BBR's original implementation, these gains modify the channel bandwidth estimation BtlBw, which ultimately modifies the transmission rate. In the same spirit, in our implementation, these gains are applied directly to the target encoding bitrate λ(t).
1. Startup: In this initial phase, the bitrate is iteratively doubled (x2 gain). In each iteration, congestion is checked and, if detected, the bitrate is halved (x0.5) to allow the bottleneck queue to drain. After bitrate reduction, or when the bitrate reaches a predefined upper limit called "plateau", BBR moves to the ProbeBW phase.
2. ProbeBW: In this phase, BBR uses an approach called "gain cycling" to reach a higher throughput: the bitrate is moderately increased (x1.25) and, whenever congestion is detected, reduced (x0.75) and then stabilized (x1) for six iterations.
Note that BBR also includes an additional phase, the ProbeRTT cycle, which is invoked if RTprop has not decreased in the last ten seconds. In that phase, BBR reduces CWND to a minimum value (four packets) to estimate RTprop. This strategy is not valid in a CG application, since reducing the bitrate to a minimum necessarily results in either a degradation of video quality or an increase of RTVL beyond acceptable QoE limits. Therefore, in our BBR re-implementation we omitted the ProbeRTT phase.
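The two phases can be pictured with the following Python sketch. It is one possible reading of the description above, not the code of our test platform: the congestion test is left to the caller, the bitrate is capped at the plateau when Startup ends (an assumption), and the class and attribute names are hypothetical.

```python
class BbrLikeRateController:
    """Gain-based rate control with Startup and ProbeBW (no ProbeRTT)."""

    HOLD_ITERATIONS = 6  # x1 iterations following a x0.75 reduction

    def __init__(self, initial_bitrate, plateau):
        self.bitrate = initial_bitrate
        self.plateau = plateau      # predefined upper limit ending Startup
        self.phase = "startup"
        self.hold_left = 0

    def step(self, congested):
        """Run one iteration and return the new target bitrate."""
        if self.phase == "startup":
            if congested:
                self.bitrate *= 0.5             # let the bottleneck drain
                self.phase = "probe_bw"
            else:
                self.bitrate = min(self.bitrate * 2.0, self.plateau)
                if self.bitrate >= self.plateau:
                    self.phase = "probe_bw"
        elif congested:
            self.bitrate *= 0.75
            self.hold_left = self.HOLD_ITERATIONS
        elif self.hold_left > 0:
            self.hold_left -= 1                 # x1 gain: keep the bitrate
        else:
            self.bitrate *= 1.25
        return self.bitrate
```

Starting from 1 Mb/s with no congestion, the target doubles at each iteration; a congestion event during Startup halves it and switches to ProbeBW, where each congestion event costs a x0.75 reduction followed by six iterations at constant bitrate before the x1.25 probing resumes.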

B. DESCRIPTION OF THE TEST BENCHMARK
PlayGiga's end-to-end CG platform, which we used for our experimental tests, is depicted in Figure 8 and described below.
1. Server: its video encoder ran on an AMD Radeon™ RX 480 GPU and generated a 720p@30fps video bitstream compressed according to the AVC/H.264 standard. In all tests, we used a peak-constrained bitrate control, with a moderate range of QP (Quantization Parameter) values, namely [22, 40], to generate a stable bitrate output λ(t) by acting on v(t), as described in Subsection V-C.
2. Network: the client was nine WAN hops (and ∼30 km) away from the server. In its local network, a separate PC implemented the TBF (Token Bucket Filter) within the traffic control queueing disciplines [31], acting as a limiter for the channel capacity. The TBF simulated a configurable bandwidth limitation c_btl(t) and the typical queuing delay of 100 ms [12], [29]. [NB: this implies that packets with a delay over 100 ms are discarded.] To avoid interference from other connections in the results, only the incoming traffic from the game server was shaped by the TBF.
3. Client: it was implemented on a PC equipped with an Intel Core i7-6500U @ 2.5-3.1 GHz CPU with 16 GB of RAM, and an integrated Intel HD Graphics 520 GPU.
4. Video content: all tests were performed using the game "Sonic & All-Stars Racing Transformed", and the same "race" and game stage, for the sake of fair comparison. This game is very demanding for the video encoder given its high-frequency textures and fast motion (see Figure 9). The encoder was therefore able to produce a high range of bitrate values as commanded by C3G.
We compare the performance of C3G and BBR in the case of network congestion. The adaptation procedures to congestion of both methods are described in Subsections IV-A and VI-A. In our tests, the channel suffers a "step-shaped bandwidth drop" because its capacity is instantly reduced from an initial value, c_btl,init, to a limited one, c_btl,lim. This same method has been used in previous works to test the resilience of channel-adaptive techniques for interactive real-time applications [32]. To characterize the resilience of C3G and BBR to such channel capacity drops, we measured three QoS parameters:
1. AP (Adaptation Period): the lapse of time, after the channel capacity drop, during which RTVL(t) exceeds the 100 ms playability threshold.
2. RTVL_peak: the maximum RTVL(t) value during AP.
3. L_AP: the number of frames suffering losses during AP.
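As an illustration of how these three parameters can be obtained from a per-frame log, consider the Python sketch below. The log format, the function name and the choice of measuring AP as the span between the first and last frame above the threshold after the drop are assumptions; other definitions (e.g., accumulated time above the threshold) are equally possible.

```python
def adaptation_metrics(frames, t_drop, threshold_ms=100.0):
    """Return (AP in s, RTVL_peak in ms, L_AP in frames).

    frames: time-ordered (timestamp_s, rtvl_ms, had_losses) tuples.
    """
    over = [f for f in frames if f[0] >= t_drop and f[1] > threshold_ms]
    if not over:
        return 0.0, None, 0                 # threshold never exceeded
    start, end = over[0][0], over[-1][0]
    in_ap = [f for f in frames if start <= f[0] <= end]
    rtvl_peak = max(f[1] for f in in_ap)
    l_ap = sum(1 for f in in_ap if f[2])
    return end - start, rtvl_peak, l_ap
```

Applied to a log where RTVL(t) stays above 100 ms from t = 1.0 s to t = 2.0 s, peaks at 350 ms and one frame is lossy, the sketch returns (1.0, 350.0, 1).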
In our tests, we used a fixed set of values for c_btl,lim, namely {5, 7, 9} Mb/s, and an initial capacity c_btl,init proportional to c_btl,lim: c_btl,init = α_c · c_btl,lim. To cover a wide enough range of capacity drops, for each value of c_btl,lim, we tested α_c ∈ {1.25, 1.5, 1.75, 2, 2.5}, and tried each (c_btl,lim, α_c) combination four times, for a total of sixty network congestion tests.
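The test matrix can be enumerated with a few lines of Python, which also make explicit the relative capacity drop implied by each α_c, namely 1 − 1/α_c. Variable names are illustrative only.

```python
from itertools import product

C_LIM_MBPS = (5, 7, 9)                  # limited capacities c_btl,lim
ALPHAS = (1.25, 1.5, 1.75, 2, 2.5)      # ratios c_btl,init / c_btl,lim
REPEATS = 4

# One (c_lim, alpha, c_init, relative drop) tuple per congestion test.
tests = [(c_lim, alpha, alpha * c_lim, 1 - 1 / alpha)
         for c_lim, alpha in product(C_LIM_MBPS, ALPHAS)
         for _ in range(REPEATS)]

# Capacity drop in percent for each nominal alpha.
drops_percent = {alpha: round(100 * (1 - 1 / alpha)) for alpha in ALPHAS}
```

The list holds 3 × 5 × 4 = 60 tests, and the nominal drops range from 20% (α_c = 1.25) to 60% (α_c = 2.5); the nominal α_c = 1.75 corresponds to a drop of about 43%.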
Figure 10 shows the results of this test set. All results are given with respect to the ratio of channel capacity reduction α_c, and for different values of c_btl,lim. [NB: in practice, the initial channel capacity was defined by an initial target encoder bitrate v_init = α_c · c_btl,lim; as the output bitrate λ_init does not exactly match v_init, α_c values in Figure 10 slightly deviate from the set {1.25, 1.5, 1.75, 2, 2.5}, but this does not affect our conclusions on the results.] In addition, Figure 11 shows an example of the evolution in time of the bitrates and QoS parameters for one specific test case, which illustrates the behavior of each method.
The results for the three QoS parameters in Figure 10 show that C3G outperforms BBR in terms of resiliency to network congestion, as we explain in the rest of this Subsection, by analyzing its three sub-Figures one by one. Note that there seems to be no dependence on c_btl,lim of any of these three parameters, for either BBR or C3G.
Figure 10a shows that BBR's AP values are consistently higher than C3G's, for all values of α_c and c_btl,lim. C3G's APs are between 0.4 and 0.8 seconds, while most of BBR's exceed 1 s, even for moderate capacity drops (α_c < 2). This difference in AP lengths is illustrated by the RTVL(t) graphs in Figures 11b and 11d. Furthermore, BBR's AP values are more variable and frequently over 2 s throughout the whole range of tested values for α_c. Long APs have a highly negative impact on the QoE, as they are (by definition) long periods during which the end-to-end latency exceeds the playability threshold, and, besides, they increase the probability of incurring packet losses (see Figure 3).
In particular for C3G, and less so for BBR, there is a moderate correlation between AP and α_c. This stems from the fact that qdData(t), i.e., the amount of data to be drained, which is measured instantly after the channel capacity drop, is proportional to c_btl,init − c_btl,lim, which increases with α_c. C3G achieves lower AP values thanks to its quicker adaptation to congestion by means of the draining mechanism described in Subsection IV-A, which derives λ_drain and T_drain from an estimate of qdData(t). Instead, BBR's fixed bitrate reduction ratios (x0.75 or x0.5) result in longer APs when there is a lot of queued data, as several iterations are needed.
Figure 10b shows RTVL_peak values for both algorithms, which are comparable and highly correlated with α_c. The lack of apparent correlation between RTVL_peak and c_btl,lim could be explained by how the congestion control mechanisms react to a sudden channel capacity drop: RTVL_peak is reached very shortly after the drop, when the draining procedure has not yet started; as a consequence, RTVL_peak solely depends on the channel capacity reduction ratio, and this holds for both rate control algorithms. The RTVL(t) graphs in Figures 11b and 11d (for BBR and C3G respectively) show that RTVL_peak reaches ∼350 ms, which corresponds to the RTVL_max value of the system, as described in Subsection V-B.
Figure 10c illustrates how C3G outperforms BBR as well in terms of L_AP, since the number of lossy frames is consistently higher for BBR for all values of α_c > 1.75 (for lower channel capacity drops, i.e., below ∼42%, there are simply no losses). C3G always manages to keep L_AP below ten frames, but BBR exceeds this value for moderate α_c values, and reaches L_AP > 20 for quite a few experiments. While for both algorithms L_AP increases with α_c, this tendency is much slower for C3G, again due to its faster adaptation to network congestion (BBR's longer draining periods are more likely to cause bufferbloat, and therefore losses): see for example the RTVL(t) and l(t) graphs in Figures 11b and 11d.
Finally, by considering Figures 10b and 10c together, it can be seen that losses kick in when RTVL = RTVL_max: see Subsection III-B. This condition is met for α_c > 1.75, but this is true for both algorithms.
What does differentiate C3G from BBR is the draining strategy, which in our case is adaptive, and based on an estimate of qdData(t), as already mentioned in the analysis of AP and L_AP. This allows C3G to have shorter APs and fewer lossy frames, thus limiting much better than BBR the negative impact of channel capacity drops on QoE.

... over fast bitrate increase when channel conditions improve. C3G's bitrate increase strategy described in Subsection IV-B generates a "cold" start that accelerates if no congestion is detected (see Figure 11c). By contrast, BBR's "greedy" bitrate increase strategy is faster (see Figure 11a). This is the only arguable advantage of BBR over C3G, but in a CG application it hardly compensates for all its flaws discussed above.

VII. CONCLUSIONS
We have presented our C3G algorithm, designed to help a UDP-based CG platform suffer minimal packet losses and keep latency within playability limits, even in the presence of severe downstream channel capacity drops. The strategy of our algorithm is twofold, since it does not only react when the channel capacity decreases, but also seeks proactively to use it as efficiently as possible when it increases.
Our network congestion model is inspired by that of Google's BBR, which uses TCP's RTT parameter to detect congestion. We propose a novel round-trip latency measure, RTVL, defined at the application layer and better suited than RTT, which is defined at the transport layer, to drive rate control decisions in real-time video streaming applications such as CG. RTVL proves to be a much better congestion predictor than losses, which occur when it is already too late to react. When C3G detects congestion, unlike BBR, it decreases the target bitrate of the video encoder by an amount and for a drain period that both depend on the new estimated channel capacity. Conversely, in reasonably stable network conditions, it proactively explores for higher acceptable downstream bitrates, and gradually builds a network state dictionary to characterize the channel capacity behavior. Both these reactive decreases and proactive increases of the server-sent bitrate may happen within a given game session, and without breaking the playability limits.
Indeed, the experimental results show how C3G is clearly better suited than BBR to the CG context, since it manages to completely avoid losses for sudden downstream channel capacity drops of up to ∼42%, and to keep RTVL below 100 ms, while continuously exploring for higher target bitrates, thus achieving an effective use of 93-96% of the available channel capacity.
Nevertheless, we already foresee some desirable improvements to our C3G algorithm. For instance, it would be desirable to better align the λ(t) and µ(t) time windows to achieve a better characterization of the network state. This could help us make more accurate decisions in the early stages of the congestion process, and reduce the exploration times, thus allowing us to be greedier when increasing the target bitrate. Another avenue for improvement may be the use of other encoding parameters along with the final target video encoding bitrate. This could help C3G produce smoother target bitrate transitions in its reactive phase, and have a deeper control over bitrate bursts during its proactive phase.

FIGURE 1. We measure RTVL(t) at the server, for each video frame, as the time elapsed since the raw frame output by the game engine was sent to the video encoder (S_0) until its decoding ACK is received from the client (S_ACK).

FIGURE 5. Behavior of BBR vs. C3G during their typical target server-sent bitrate exploration sequences. (a) BBR. (b) C3G.

FIGURE 6. Misalignment of the time windows used to measure λ(t) at the server and µ(t) at the client.

FIGURE 7. Scheme of the BBR algorithm for the adaptation to the network channel capacity implemented in PlayGiga's CG platform. [NB: i represents an iteration of the algorithm.]

FIGURE 8. Block scheme of our test CG platform, whose server has a game engine, a video encoder and a rate-control module, and whose client includes a video decoder and a TBF (Token Bucket Filter) to shape the incoming data rate.

FIGURE 9. Screen captures of the "Sonic & All-Stars Racing Transformed" game that was used for the experimental tests.

FIGURE 10. Performance results of BBR vs. C3G in the reactive bitrate decrease tests, for different α_c and c_btl,lim values. (a) AP (Adaptation Period). (b) RTVL_peak: Maximum RTVL(t) during AP. (c) L_AP: Number of lossy frames during AP.

TABLE 1. Effective channel use and QoS results for the channel capacity stability test. The results for each value of c_btl,lim have been averaged over all tests (different values of α_c).
3. t > t_1: Once the buffer is full, if λ(t) keeps increasing, RTVL(t) remains equal to RTVL_max, but packets are dropped and l(t) > 0. This phase must be avoided.
C. C3G's DISCRETE-TIME CONGESTION DETECTION ALGORITHM
Figure 3 is obviously an over-simplified version of what can be observed and done in a real system. Real congestion detection and control algorithms, like ours, operate in discrete time and typically in an iterative way, because they are invoked periodically. In such real systems, it might be hard to align the server and client time windows used to measure λ(t) and µ(t), hence to accurately estimate qdData(t). Besides, these time windows, even if properly aligned, might be of a different size from the one used to measure RTVL(t) at the server. This is why our C3G algorithm, indeed invoked with period Δt, works with average rates between two successive executions at instants t_prev and t_now = t_prev + Δt: λ̄(t_now), μ̄(t_now) and RTVL(t_now) are the respective averages of all λ(t), µ(t) and RTVL(t) samples recorded for t ∈ [t_prev, t_now].
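The per-period averaging can be sketched in Python as follows; the sample format and the function name are assumptions made for illustration.

```python
def window_average(samples, t_prev, t_now):
    """Average of the values recorded for t in [t_prev, t_now].

    samples: (timestamp, value) pairs for lambda(t), mu(t) or RTVL(t).
    Returns None when no sample falls inside the window.
    """
    values = [v for t, v in samples if t_prev <= t <= t_now]
    return sum(values) / len(values) if values else None
```

The same helper would be called once per invocation for each of the three series, with t_now = t_prev + Δt.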

TABLE 2. Average EP by algorithm.