Real-Life Implementation and Evaluation of Coupled Congestion Control for WebRTC Media and Data Flows

WebRTC enables users to simultaneously transfer media (over the Real-Time Transport Protocol (RTP)) and data (over the Stream Control Transmission Protocol (SCTP)) between web browsers, multiplexed onto a single UDP port pair. This design choice of using two different transport protocols, each with their own congestion control mechanism, can lead to competition between the flows, resulting in undesirable spikes in queuing delay and packet loss. In this paper, we investigate solutions to the harmful effects WebRTC flows cause on each other by having the different congestion controllers of the flows collaborate. Using implementations in the Chromium browser, we show that our mechanism can combine a set of heterogeneous congestion control mechanisms, fairly allocate the available bandwidth between the flows, and reduce overall delay and losses.

network elements.

The separate congestion control (CC) mechanisms within the two different transport protocols in WebRTC can lead to competition between the flows, resulting in undesirable spikes in queuing delay and packet loss. Such competition can be eliminated by using a coupled CC mechanism which combines the congestion control mechanisms of all the flows sharing a common path. In [1] and [2], we have shown that our coupling scheme called ''Flow State Exchange'' (FSE) can significantly improve the overall performance of multiple congestion-controlled RTP sessions in terms of delay and packet loss, and that it allows a precise allocation of the available bandwidth. However, this mechanism only combines a set of homogeneous congestion control mechanisms and therefore cannot readily be applied to combine the data and video flows in WebRTC, since they use two different CC mechanisms: a delay-based CC mechanism for media and a loss-based CC mechanism for arbitrary data.

Because loss-based CC mechanisms fill the queue until packets are dropped, the competition between the flows leads to undesirable spikes in queuing delay and packet loss for the RTP flow. Combining a heterogeneous set of CC mechanisms can therefore yield several performance benefits, especially when one of the mechanisms reacts to a congestion event earlier than the others. This has been shown by Flohr et al. in [1], [3] with an extension of the FSE called ''FSE Next Generation'' (FSE-NG).

The associate editor coordinating the review of this manuscript and approving it for publication was Alba Amato.

A. WebRTC

WebRTC [7] is a standard that comprises an extensive collection of protocols and Application Programming Interfaces (APIs), providing real-time peer-to-peer communication and data transfer between web browsers.
Historically, there was a tendency for real-time communication software to rely on proprietary protocols and third-party plugins. WebRTC presents a break from this pattern, letting applications communicate unconstrained in the browser.

The WebRTC W3C Working Group 1 is responsible for defining the APIs that applications can use to control the communication via JavaScript. The IETF Working Group named Real-Time Communication in Web-Browsers (RTCWEB) 2 is responsible for defining the protocols, data formats and other essential facets needed to enable real-time peer-to-peer communication in the browser.

A handful of protocols and technologies are imposed by what WebRTC needs to offer in terms of services and functionality. WebRTC uses the Real-time Transport Protocol (RTP) [8] for media transmission and the Stream Control Transmission Protocol (SCTP) [9] to transmit arbitrary application data. These protocols are multiplexed over a single User Datagram Protocol (UDP) [10] connection. While WebRTC requires that all data be encrypted, vanilla RTP and SCTP are not encrypted. Therefore, WebRTC uses SRTP [11] (a secure version of RTP) and encrypts SCTP. Datagram Transport Layer Security (DTLS) [12] is used for key management.

SCTP's CC is based on TCP's CC [9], [13], and is always applied to the entire SCTP association, not to individual SCTP streams. The transmission rate is determined by the receiver window (RWND) and the congestion window (CWND), of which the minimum is used. RWND is the amount of data the destination side can receive. CWND is the amount of data the SCTP sender can transmit into the network before receiving an acknowledgement (ACK). As in TCP, the four central algorithms of SCTP's CC mechanism, which determine the value of CWND, are Slow Start, Congestion Avoidance, Fast Retransmit and Fast Recovery.

2) VIDEO CHANNEL

RTP alone provides simple end-to-end delivery services for multimedia. Therefore, WebRTC must also incorporate a CC mechanism for RTP. Currently, three different congestion control mechanisms are being considered for RTP flows in WebRTC: Google Congestion Control (GCC) [6], Network-Assisted Dynamic Adaptation (NADA) [14] and Self-Clocked Rate Adaptation for Multimedia (SCReAM) [15]. In this paper, we focus only on GCC because it is used by two prominent web browsers: Chrome (with its open-source counterpart Chromium) and Firefox.
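The window rule from the SCTP description above (the sender is limited by the minimum of CWND and RWND, less the data already in flight) can be sketched as follows. This is a minimal illustration; the function and parameter names are ours, not those of any SCTP implementation.

```python
def sctp_can_send(cwnd: int, rwnd: int, bytes_in_flight: int) -> int:
    """Return how many more bytes the SCTP sender may transmit.

    The effective window is the minimum of the congestion window
    (sender-side limit) and the receiver window (destination-side
    limit); outstanding unacknowledged data counts against it.
    """
    window = min(cwnd, rwnd)
    return max(0, window - bytes_in_flight)

# A small receiver window caps the rate even if CWND has grown large:
assert sctp_can_send(cwnd=64_000, rwnd=16_000, bytes_in_flight=10_000) == 6_000
# Conversely, CWND is the binding limit right after a loss event shrinks it:
assert sctp_can_send(cwnd=8_000, rwnd=64_000, bytes_in_flight=8_000) == 0
```

Which window binds depends on the situation: a slow receiver caps the rate through RWND, while congestion events shrink CWND.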
the same bottleneck while at the same time being easier to implement than the CM. As opposed to the CM, the FSE utilizes the flows' congestion controllers by having them share information amongst each other instead of removing them. The mechanism has already shown promise in [1] and [2] when implemented with homogeneous CC mechanisms, but so far it has not been tested on heterogeneous CC mechanisms.

Two other mechanisms stem from the original FSE implementation and try to couple NADA and SCTP flows. ''Reduction of Self Inflicted Queuing Delay in WebRTC'' (ROSIEEE) [3] is a mechanism that limits queuing delay in WebRTC by coupling NADA and the SCTP congestion control. As opposed to other mechanisms like [4] and [17] that control the congestion window explicitly, the authors of [3] propose to only calculate a maximum congestion window CWND_max for SCTP based on the rate calculated by NADA. The algorithm itself uses the change in send rate R_i and RTT_i (the RTT received from NADA every time an RTCP message i is received) to gradually converge to a maximum allowed SCTP sending rate that is later converted to CWND_max.

While this mechanism does, in fact, couple the WebRTC congestion controllers, it does not provide the possibility to prioritize the different flows, which is an essential requirement for WebRTC. Accordingly, FSE-NG [5] combines the active FSE from [4] with the ROSIEEE algorithm to support the prioritization of flows while still being able to couple and manage both loss-based and delay-based flows. As with the original FSE, FSE-NG also calculates a sum of rates S_CR and assigns it based on the priority of the flows in the FG. The mechanism does not use information from the loss-based flows when calculating S_CR.
To calculate the upper limits for the SCTP flows, it shares S_CR and splits it amongst the SCTP flows in the FG.
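The priority-based split of S_CR among the SCTP flows can be sketched as below. This is our own simplified reading of the idea, with hypothetical names; the exact weighting used in the FSE-NG paper [5] may differ.

```python
def sctp_upper_limits(s_cr: float, sctp_priorities: dict) -> dict:
    """Split the aggregate rate S_CR among the SCTP flows of a flow
    group in proportion to their priorities; each flow's share becomes
    its upper sending-rate limit (later converted to CWND_max)."""
    total_p = sum(sctp_priorities.values())
    return {fid: s_cr * p / total_p for fid, p in sctp_priorities.items()}

# Two SCTP flows with priorities 2:1 splitting a 3 Mbps aggregate:
limits = sctp_upper_limits(3_000_000, {"sctp-1": 2, "sctp-2": 1})
assert limits["sctp-1"] == 2_000_000
assert limits["sctp-2"] == 1_000_000
```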

In this section, we introduce our testbed (Section III-A), which we use in all our tests, and then present a GCC vs. SCTP fairness issue by exploring how these two mechanisms compete under different network settings in Section III-B. This problem further motivates the use of a coupled congestion control mechanism, on top of the earlier mentioned benefits attainable with congestion control coupling (lower delay and packet loss, and precise control over the per-flow rate share).

A. TESTBED

Figure 1 shows the topology used in our experiments. It consists of three physical machines: a WebRTC sender, a receiver and a router. The sender and receiver are both connected to the router with Ethernet cables. The three nodes run Linux version 5.11.0 (router), 5.13.0 (sender) and 5.15.18 (receiver). Two of the nodes are running one session each of the Chromium browser (Linux 64-bit 100.0.4896.12) with an instance of a WebRTC test application, acting as sender and receiver.

FIGURE 1. The testbed topology. Three nodes (one router performing traffic shaping and two nodes running Chromium) are connected through Ethernet cables. The sender node sends commands to the traffic shaper via ssh and hosts the signalling server, connecting sender and receiver applications.

The receiver is only there to passively receive any streams coming from the sender. … is disabled for the media streams. We also set the video codec to VP8 [23], which yields a maximum possible bitrate of …; however, it does eventually adapt and stay at around 2 Mbps. We can see that, in general, GCC achieves a reasonably high throughput of around 2 Mbps regardless of which flow starts first, while the SCTP flow utilizes the rest of the link's capacity.

On the other hand, when limiting the bottleneck capacity to 5 Mbps, GCC is not able to compete with SCTP at all and is starved, as the plots in Figure 3 show. While most users in Western countries usually have much more bandwidth than 5 Mbps and may therefore rarely notice this behaviour, it may be problematic for users in countries with poor internet service.

Recent performance evaluations of GCC [24], [25] show that GCC can aggressively compete against TCP-like congestion controls, which implies it should also be able to compete with SCTP. However, we could not replicate this behavior in our testbed. As a sanity check, we tried to use the same topology and settings as described in [25], but GCC was still starved when competing with long-lived SCTP or TCP flows over the same bottleneck.

This problem motivates us to investigate whether a coupled CC mechanism can fairly allocate the rates between SCTP and RTP on low-capacity links.

We started our endeavor with an implementation and evaluation of the FSE in the Chromium browser. Being restricted to media flows, the direct benefits that can be attained with this first algorithm only apply to rather limited use cases, e.g. when simultaneously transferring video from a mobile phone's front and back cameras. Use cases become more realistic (e.g., screen/data and video sharing) with the extensions of FSE that couple the media and data channels.

A. ALGORITHM

We briefly introduce the FSE algorithm since it serves as the basis for the other coupling solutions in this paper. The FSE can be described as a manager that receives information from the different flows and calculates a new send rate for each flow based on all the information. When a flow starts, it registers itself with the FSE and a Shared Bottleneck Detection (SBD) element (in our case, simply the use of the same 5-tuple), and when it stops, it deregisters from the FSE. When a flow registers itself, the SBD will assign it to a Flow Group (FG) by giving it a Flow Group Identifier (FGI). A flow group is defined as a set of flows that share the same bottleneck and thus should exchange information with each other. Whenever a flow's congestion controller calculates a new rate, the flow executes an UPDATE call to the FSE with the newly calculated rate as a parameter.

When a flow f starts, FSE_R is initialized with the initial rate determined by f's congestion controller. After the SBD assigns the flow to an FG, its FSE_R is added to S_CR.
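The UPDATE step can be sketched as below. This is our own simplified rendering of the FSE idea (a priority-weighted split of S_CR, where flows capped at their desired rate DR donate their leftover to the others), not the paper's exact pseudo-code; all names are ours.

```python
def fse_allocate(s_cr: float, flows: dict) -> dict:
    """Split the aggregate rate S_CR among flows by priority.

    `flows` maps flow id -> (priority, desired_rate or None).
    A flow whose priority share exceeds its DR is capped at the DR;
    its surplus returns to the pool and is re-shared among the rest.
    """
    alloc, remaining, pool = {}, dict(flows), s_cr
    capped_someone = True
    while capped_someone and remaining:
        capped_someone = False
        total_p = sum(p for p, _ in remaining.values())
        for fid, (p, dr) in list(remaining.items()):
            if dr is not None and pool * p / total_p >= dr:
                alloc[fid] = dr          # DR-limited: keep only the DR
                pool -= dr
                del remaining[fid]
                capped_someone = True    # re-share the surplus next pass
    total_p = sum(p for p, _ in remaining.values())
    for fid, (p, _) in remaining.items():
        alloc[fid] = pool * p / total_p  # uncapped flows split the rest
    return alloc

# Flow a is limited to 0.75 Mbps; flow b receives the leftover:
alloc = fse_allocate(2_000_000, {"a": (1, 750_000), "b": (1, None)})
assert alloc["a"] == 750_000 and alloc["b"] == 1_250_000
```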

The FSE can allocate rates based on the flows' priorities without requiring any modification in the congestion controller. This is shown in Figure 5, where the priorities of the two RTP flows are set to 1 and 2, respectively.

To show that flows limited by the desired rate share their leftovers with other flows, we ran an experiment with two flows, one with the desired rate configured to 0.75 Mbps and the other without a limited desired rate. Figure 6 shows that the first flow never exceeds 0.75 Mbps. The FSE allocates the leftover bandwidth to the second flow.

As we have shown, the FSE works well for media flows: it improves fairness and offers possibilities for sharing leftover rates and prioritizing flows. However, the main problem plaguing WebRTC congestion control is the way SCTP affects GCC, which necessitates a mechanism that also incorporates SCTP flows. The most glaring limitation of the FSE is that it is designed only for media flows and therefore cannot directly be used to couple RTP and SCTP flows.

…

We try to stay as faithful to the pseudo-code and explanations in the original paper [5] as possible. However, we found some parts of the algorithm description in [5] to be ambiguous or lacking in detail; in those cases, we chose the approach that seemed to work best in practice. Table 2 provides an overview of the variables used in this section.

… when it needs to update f. FseNg adds the initial rate to S_CR upon registration and creates a new RateFlow object to store P(f), Update_CC and DR(f). The set of RateFlows also stores a pointer to the RateFlow object. The callback function is simply a function that receives a rate and sets the value of the current estimate inside the GCC class that interacts with FseNg. The flows are also assigned a unique flow id by FseNg.
2) REGISTERING SCTP FLOWS

The WebRTC library uses a class called UsrsctpTransport to interact with the usrsctp library. In this class, there is a method called Connect which is called when a new SCTP association is being made. Accordingly, we chose to register SCTP flows in that method. Upon registration, the UsrsctpTransport object of flow f passes in the initial CWND_max(f), a callback function Update_CC(f) that is later called by FseNg to set CWND_max(f), and lastly a flow priority P(f). The initial CWND_max(f) is stored so that FseNg can reset CWND_max(f) in case all RateFlows deregister. In such a case, there is no CC information being reported to FseNg and it should let SCTP flows use their default CWND_max. The SCTP flows are also assigned a unique flow id by FseNg.

Here, we detail our extensions that deviate from the algorithm description in [5]. The final version of the update algorithm is shown in Algorithm 3.

The original paper [5] does not specify how to handle situations where S_CR is larger than the sum of desired rates. …

The original FSE-NG algorithm uses the same DR for all the RTP flows; however, the maximum bit rate of RTP flows may vary. For instance, the WebRTC JavaScript API offers an RTCRtpEncodingParameters object which lets the application set the maximum bit rate of the underlying RTP transmission of a MediaStreamTrack. Consequently, we extend the original algorithm by requiring each update call to also provide the flow's current DR. FseNg uses the individual flow's last reported DR instead of a shared global DR value when allocating bandwidth to the RTP flows.

3) CHOOSING CC_R

We make some adjustments when implementing the FSE-NG updates because of an inherent difference in how NADA and GCC work. NADA combines loss, delay and ECN into a single aggregated value called a ''composite congestion signal'' [14]. When coupling NADA flows in the FSE-NG, each NADA flow updates FSE-NG with the aggregated value as CC_R, which FSE-NG then sets as FSE_R.

GCC, on the other hand, maintains two separate estimates, one based on loss (As_hat) and one based on delay (A_hat). The final rate used is min(As_hat, A_hat). In the GCC implementation, two classes are responsible for maintaining the estimates: SendSideBandwidthEstimation and AimdRateControl. AimdRateControl maintains the delay-based estimate. SendSideBandwidthEstimation is responsible both for maintaining the loss-based estimate and for setting the final target based on the more conservative of the two values.

In our tests, we found that A_hat is always the most conservative value, and we hence use A_hat as the rate that is reported to FSE-NG.
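In code form, the selection described above is simply a minimum over the two estimates; a trivial sketch (the function name is ours, the variable names follow the GCC implementation):

```python
def gcc_target_rate(as_hat: float, a_hat: float) -> float:
    """GCC's final target rate is the more conservative (lower) of the
    loss-based estimate As_hat and the delay-based estimate A_hat."""
    return min(as_hat, a_hat)

# The delay-based estimate binds when it is the lower of the two:
assert gcc_target_rate(as_hat=2_500_000, a_hat=2_000_000) == 2_000_000
```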

This section presents results from experiments performed with the evaluation testbed (see Section III-A). Firstly, we look at simple scenarios where the mechanism works as intended; then, we highlight some issues. We can trace some problems back to design flaws in the mechanism, while others arise because we implemented it with GCC while FSE-NG was originally designed to work with NADA. The IETF RMCAT Working Group developed test cases to evaluate real-time media flows in [26]. In accordance with [26], we use a bottleneck queue length of 300 ms in all our tests (with the exception of Figure 11, as we will explain in Section V-C2.a). We have also run tests with different queue lengths, which yielded similar results.

We start with the simplest case of two RTP flows to test the efficacy of the FSE-NG mechanism. Figure 7 shows that the effect of coupling two RTP flows with the FSE-NG is similar …

VOLUME 10, 2022

To show that FSE-NG correctly handles and enforces priorities, Figure 10 presents sending rate plots of 2 RTP and 1 SCTP flows with different priority configurations. It can be seen from Figures 10a to 10c that FSE-NG allocates rates based on the flows' priorities when the flow group is heterogeneous.

… has not yet measured an RTT and initializes the RTT to a default of 200 ms. We have also identified that it may take several seconds before an actual RTT is registered. In scenarios where there are registered GCC flows in the FSE-NG before any SCTP flows, this is not a problem, since it gives GCC ample time to find approximately the base RTT value. On the other hand, in cases where SCTP flows are registered before or simultaneously with any GCC flows, and the real base RTT is lower than 200 ms, the SCTP flow gets a much higher rate allocated by the FSE-NG than it should.
This problem leads to GCC being out-competed by SCTP in the first few seconds of the transmission. The phenomenon is shown in Figure 11, with a high initial SCTP rate spike even though both flows are coupled and should each be getting their fair share. The RTT used in the figure is 50 ms, and, deviating from the common 300 ms configuration, we configured the router queue to 150 ms to ensure that the total measured RTT stays low enough. We also ran an experiment with a 100 ms RTT and found the same issue.

Since FSE-NG only uses the congestion signals generated by GCC, S_CR stays at 0 as long as no RTP flows are registered, even though SCTP flows might be running and using a large share of the capacity. One side effect of this design is that when RTP flows start later than SCTP flows, the SCTP flow gets drastically pulled down when an RTP flow registers. The phenomenon is illustrated in Figure 12: when the RTP flow starts, SCTP gets dragged down to around 750 Kbps, while it should be close to 1.5 Mbps. The issue is that S_CR will start at the initial RTP rate, and both flows are limited for some time until S_CR has grown enough for them to utilise the bandwidth. The impact of this problem can be slightly mitigated by the aforementioned RTT problem, because it accidentally gives SCTP a much higher CWND_max than it is supposed to get. However, this is not something that the FSE-NG mechanism is in control of, and it should therefore not rely on the default reported RTT being much higher than RTT_base.

… depending on how close to convergence the rate appears to be.

In Figure 13, GCC is carrying out a multiplicative increase from … back to its DR. Naturally, this makes convergence for SCTP very slow; in the scenario of Figure 13, it takes approximately 30 seconds from SCTP's start until the total capacity of 6 Mbps is utilized.

D. DERIVED IMPLICATIONS OF THE FSE-NG MECHANISM
This section summarizes the design issues and limitations of FSE-NG that we found in our implementation using GCC.

• In the beginning, GCC has not yet received a realistic RTT report; therefore, it reports the default RTT of 200 ms to FSE-NG. This leads to unfair bandwidth allocation between RTP flows and SCTP in the first couple of seconds (Figure 11).

• When SCTP flows start before any RTP flows, the FSE-NG will significantly throttle them once any RTP flow begins (Figure 12).

• Because GCC is limited to a maximum rate change of 8% no matter the conditions, using only GCC's rate as input leads to very slow SCTP convergence when the link has a high capacity (Figure 13).

• Since FSE-NG was originally designed under the assumption that all RTP flows have the same desired rate, it does not share leftovers between RTP flows in cases where one flow is limited to a given desired rate and another one is not (fixed by our extension described in Section V-B2). … it exacerbates the problem explained in Section V-C2.a. We decided not to change this part of the algorithm for our implementation, since the advantages seem to outweigh the disadvantages; it is also more faithful to the original algorithm to leave it as is.

We also introduce fixes for the issues of SCTP being dragged down upon GCC registration (see Figure 12) and the slow SCTP convergence (see Figure 13): these issues stem from the fact that only GCC is responsible for the rate growth of both mechanisms. Consequently, both of these issues can be solved by also letting SCTP report a rate and contribute to the growth of S_CR. Firstly, this fixes the issue of SCTP getting dragged down when it starts before any RTP flows, because SCTP will already have converged to a rate reasonably close to the link's capacity, which can then be shared with the newly registered RTP flow. Secondly, the very slow SCTP convergence when the RTP flow is application limited is fixed because SCTP contributes to S_CR alongside RTP.

When both controllers contribute to S_CR, it grows much quicker. However, FSE-NG's reduction of delay is based on only letting GCC control rate increases. To ensure GCC is allowed to control any rate increase and keep the delay low when necessary, we therefore compromise by only adding SCTP's relative rate change under two conditions: 1) if there are no RTP flows registered, or 2) if S_CR is large enough to give all registered RTP flows their DRs. This fix also improves the initial start-up of the GCC flow because SCTP has already contributed to the aggregate S_CR.
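The two conditions above can be sketched as a small predicate; a simplified illustration with our own names, where `rate_flows` maps each registered RTP flow to its desired rate DR:

```python
def should_add_sctp_change(rate_flows: dict, s_cr: float) -> bool:
    """Decide whether an SCTP flow's relative rate change may be added
    to S_CR: either no RTP flows are registered, or S_CR already
    covers the sum of all RTP desired rates."""
    if not rate_flows:
        return True
    return s_cr >= sum(rate_flows.values())

assert should_add_sctp_change({}, 0)                               # no RTP flows yet
assert not should_add_sctp_change({"rtp-1": 1_500_000}, 1_000_000) # DRs not yet covered
assert should_add_sctp_change({"rtp-1": 1_500_000}, 2_000_000)     # DRs covered
```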

In the original algorithm, we discovered a bug which led to the rate change reported by GCC flows being added twice when all GCC flows are application limited or when there are no registered SCTP flows. As we have discussed, this bug mitigated the slow SCTP convergence problem arising after a later-joining GCC flow; but since this problem is now fixed, it is also safe to eliminate the bug and ensure that the rate change is added only once.

The implementation of the Extended FSE-NG algorithm is very similar to FSE-NG's implementation; in this section, we explain the extensions and changes. … This implements the Extended FSE-NG as a singleton class. It stores the same state as FseNg, but has a slightly changed update algorithm and a new method allowing SCTP to also send rate updates.

The original FSE-NG solely relies on the delay-based flow to drive the rate calculation and leaves SCTP passive. As we have discussed, a delay-based flow can also benefit from receiving information from a loss-based flow. However, FSE-NG is designed around the core concept that only the delay-based flow should lead, and therefore the possibilities for incorporating SCTP updates are limited. This motivates us to investigate a different avenue where both types of flows are treated equally from the get-go.

VII. COUPLING, PART 4: FSEv2

We design and implement Flow State Exchange v2 (FSEv2), a new coupling mechanism for heterogeneous congestion control mechanisms that is based on the lessons learned in the preceding sections.

Our previously discussed extensions to FSE-NG mitigate some of FSE-NG's problems by using SCTP's rate changes under certain conditions. However, this works only in cases where the capacity is large enough to accommodate all desired rates anyway. Furthermore, one potential problem with the FSE-NG mechanism is that it makes both types of coupled flows rely solely on GCC's ability to compete against other flows. To try a different approach, we base our new mechanism on the idea that the loss-based mechanism should be more active in the coupling process. That is, it should be allowed to contribute to rate changes, while still expecting that the delay-based mechanism will keep queuing delay down. The design is primarily based on the FSE mechanism, but with support for loss-based mechanisms added. The reason for basing the update algorithm on FSE rather than FSE-NG is, firstly, that FSE-NG assumes all GCC flows have the same DR, thus not supporting sharing of leftover rate between GCC flows. Secondly, our mechanism couples SCTP in an inherently different way from FSE-NG by using the actual CWND of the flows, as opposed to FSE-NG, which only sets the CWND_max values for SCTP flows. Thus, FSE-NG concepts like keeping track of RTT_base are no longer relevant. Loss-based flows are treated similarly to delay-based flows, except that we do not take any DR into consideration for them.

We now delve into more concrete details about the new mechanism and its implementation. The RateFlow class is also reused for the new mechanism to represent GCC flows. Here is an overview of the new classes that we created: … Table 4 …

When a flow is performing the update, we assume that it has already converted the reported CWND value to CC_R(flow). Firstly, S_CR is updated based on CC_R(f) by adding the difference between CC_R(f) and the flow's previously allocated rate FSE_R(flow). Then, in lines 3-11, the algorithm calculates the total sum of priorities S_P by adding all priorities of both types of flows; it also initializes all allocated rates to zero. Next, in lines 14-32, the algorithm simultaneously allocates rates to all GCC and SCTP flows while ensuring that application-limited GCC flows do not get more than their desired rate. The leftover rate is shared fairly between all the other flows. The allocation for the GCC flows is the same as in the original FSE algorithm (see Algorithm 1); however, an extra loop is added in lines 28-33. Finally, in lines 33-39, when all flows have been allocated a rate, the rates are distributed to the flows.

When FseV2 sends updates to GCC flows, both the delay-based estimate and the loss-based estimate are updated with FSE_R. The loss-based estimate is calculated in a different GCC class, SendSideBandwidthEstimation, than the delay-based estimate, which is calculated in AimdRateControl. Both of these classes are controlled by yet another class called GoogCc, which is responsible for tying the various GCC components together. Accordingly, when FseV2 sends updates to GCC, they are sent to GoogCc, which then relays the information to AimdRateControl and SendSideBandwidthEstimation so that both estimates are updated to FSE_R.
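The update just described can be sketched as follows. This is a simplified, single-pass rendering with our own names, not the paper's exact Algorithm 6: S_CR follows the reporting flow's rate change, both flow types get a priority share, DR-limited GCC flows donate their surplus, and an extra loop hands the leftover to the SCTP flows (the full algorithm also re-shares it among unlimited GCC flows).

```python
def fsev2_update(s_cr, cc_r, prev_fse_r, rate_flows, cwnd_flows):
    """One FSEv2 update round (simplified sketch).

    rate_flows: GCC flows, id -> (priority, desired_rate or None)
    cwnd_flows: SCTP flows, id -> priority (no DR is considered)
    """
    s_cr += cc_r - prev_fse_r                    # aggregate tracks the reporter's change
    s_p = sum(p for p, _ in rate_flows.values()) + sum(cwnd_flows.values())
    alloc, leftover = {}, 0.0
    for fid, (p, dr) in rate_flows.items():
        share = s_cr * p / s_p
        alloc[fid] = min(share, dr) if dr is not None else share
        leftover += share - alloc[fid]           # DR-limited flows donate their surplus
    cwnd_p = sum(cwnd_flows.values())
    for fid, p in cwnd_flows.items():            # extra loop: SCTP gets share plus leftover
        alloc[fid] = s_cr * p / s_p + (leftover * p / cwnd_p if cwnd_p else 0.0)
    return s_cr, alloc

# An SCTP flow reports growth from 0.5 to 1.5 Mbps; the DR-limited GCC
# flow keeps 0.5 Mbps and SCTP receives its share plus the leftover:
s_cr, alloc = fsev2_update(1_000_000, cc_r=1_500_000, prev_fse_r=500_000,
                           rate_flows={"rtp": (1, 500_000)},
                           cwnd_flows={"sctp": 1})
assert s_cr == 2_000_000
assert alloc["rtp"] == 500_000 and alloc["sctp"] == 1_500_000
```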
When an SCTP flow f changes its CWND, it sends an update to FseV2 containing CC_CWND(f) and last_rtt(f). CC_CWND(f) is converted to CC_R(f), and Algorithm 6 is executed with CC_R(f) as input. FseV2 sets the actual CWND when distributing rate updates. To accommodate this, we added another function called set_cwnd to the usrsctp library; set_cwnd is called from UsrsctpTransport.

Because the mechanism sets the actual CWND, which is tied to the state of SCTP's CC mechanism, we consider the following before setting CWND to FSE_CWND. … is stored in bytes in the usrsctp library; it only increases and decreases by the segment size in bytes. Therefore, before setting the CWND, we round FSE_CWND down to the closest number of whole segments.
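The rate-to-CWND conversion with segment rounding can be sketched as below; a minimal illustration under our own naming, using the bandwidth-delay product for the conversion and an example segment size:

```python
def rate_to_cwnd(rate_bps: float, rtt_s: float, mss: int) -> int:
    """Convert an allocated rate into a congestion window in bytes and
    round it down to whole segments, since usrsctp only moves CWND in
    segment-size steps."""
    cwnd_bytes = int(rate_bps / 8 * rtt_s)       # bandwidth-delay product
    return (cwnd_bytes // mss) * mss             # round down to whole segments

# 2 Mbps over a 100 ms RTT gives a 25,000-byte BDP, rounded down to
# 17 whole segments of 1,452 bytes (an example MSS):
assert rate_to_cwnd(2_000_000, 0.1, 1_452) == 24_684
```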

We carry out experiments to evaluate the three heterogeneous CC coupling mechanisms that we presented in Sections V to VII. We present results from experiments performed with the evaluation testbed described in Section III-A. The IETF RMCAT Group developed test cases to evaluate congestion control mechanisms for real-time media flows in [26]. Our test cases are inspired by [26]; some cases are extended or modified to accommodate the fact that we are coupling heterogeneous flows. This section describes the general conditions surrounding the experiments. In accordance with [26], we use a bottleneck queue length of 300 ms in all our tests. The total run time of the experiments is 120 seconds. FIFO is used as the bottleneck queue type, and no artificial packet loss or jitter is added along the path.

We consider the following evaluation metrics:

• Sending rate, as observed by capturing the packets being sent on the sender node's interface with tcpdump.

• Throughput, as observed by capturing the packets arriving on the receiver node's interface with tcpdump.

• Bandwidth utilization, the ratio between the average throughput and the available capacity.

• Delay, gathered by logging the measured RTT in the GCC and SCTP Chromium code.

• Jain's fairness index [29]. When there are n flows, where x_i is the throughput of the ith flow, fairness is rated with the following formula:

J(x_1, …, x_n) = (Σ_{i=1}^{n} x_i)² / (n · Σ_{i=1}^{n} x_i²)

The result ranges from 1/n to 1, with the former being the worst result and the latter being the best. A result of 1 means that all flows receive the same allocation, while a result of 1/n means that one flow receives all the allocation.

• RTP (GCC) packet loss. The number of RTP packets dropped during an interval of 500 ms is gathered through the WebRTC JavaScript API in the test application.

We begin by examining the behavior of two heterogeneous flows having equal priorities. This experiment aims to assert that the given coupling mechanism can solve the essential issue of assuring fairness between data and media flows in WebRTC. We expect the link to be shared fairly between the two flows in this test case. Specifically, Jain's fairness index should stay close to 1 whenever both flows are registered in the coupling mechanism and transmitting. We also expect the coupling mechanism to prevent SCTP from filling the queue, to make sure queuing delay stays within acceptable levels.

Table 5 shows metrics based on the average results. …
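The fairness metric used throughout this evaluation can be computed directly from the per-flow throughputs; a minimal sketch:

```python
def jain_fairness(throughputs):
    """Jain's fairness index: (sum x_i)^2 / (n * sum x_i^2).
    1 means perfectly equal shares; 1/n means one flow gets everything."""
    n = len(throughputs)
    return sum(throughputs) ** 2 / (n * sum(x * x for x in throughputs))

assert jain_fairness([1.0, 1.0]) == 1.0   # equal shares: best case
assert jain_fairness([1.0, 0.0]) == 0.5   # one flow starved: 1/n
```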

The delay box plots in fig. 16 [...]

[Table 5 caption: Average results based on 10 runs of the test case when GCC has P = 2 and SCTP has P = 1; the bottleneck has a capacity of 2 Mbps, 100 ms RTT, and a 300 ms queue. We only consider the time intervals when both flows are running at the same time.]

Figure 17 shows sending rates and RTT from one experiment. All three mechanisms seem to experience some rate oscillations. For FSE-NG and Extended FSE-NG, the oscillations follow a pattern of being stretched over longer periods, though FSE-NG's oscillations are more extreme. In the case of FSE-NG, this is due to the bug which adds GCC's rate increases and decreases twice (see Section V-D). Because rate decreases become twice as large, FSE-NG has a much lower bandwidth utilization than the other mechanisms. However, we can see that FSE-NG keeps the delay lowest; Extended FSE-NG experiences a bit more delay, while FSEv2 has the most delay. Tests with larger queues also provided similar or very close results.
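To make the effect of the double-counting bug concrete, the following toy sketch (our own illustration, not the actual FSE-NG code) applies a flow's rate delta to the aggregate either once or twice:

```python
def apply_update(aggregate, old_rate, new_rate, double_count=False):
    """Add a flow's rate change (in bps) to the FSE aggregate rate.

    With double_count=True the delta is applied twice, mimicking the
    bug described in the text: a 200 kbps decrease from GCC then pulls
    the aggregate down by 400 kbps, halving throughput recovery.
    """
    delta = new_rate - old_rate
    return aggregate + (2 * delta if double_count else delta)
```

Since decreases dominate when the queue fills, doubling the delta biases the aggregate downward, which is consistent with the lower bandwidth utilization (and lower delay) observed for FSE-NG.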

FSEv2's extra delay can be traced back to the fact that SCTP is also allowed to send rate updates to the manager; however, GCC seems to prevent SCTP from increasing the delay too much, keeping the delay within an acceptable range. As fig. 17c shows, this also leads to quite a large initial delay increase when the SCTP flow is in its slow start phase.

Figure 18 shows the throughput for both flows when GCC's priority is set to 1 and SCTP's priority varies from 1 to 0.1. The throughput values for each priority configuration in the plot are based on the average of 10 different runs to ensure statistical significance. The mechanisms are able to distribute the rate according to varying priorities. [...] honored. This is because the media encoder is not able to change the video quality as quickly as the target rate changes. The FSE-NG based mechanisms do not experience this because they set the upper limit of the CWND and do not receive the rapid rate updates from SCTP. Avoiding this behavior with a mechanism that receives updates from SCTP, for instance by skipping SCTP's slow start mode when it registers after GCC flows, would likely lead to slow convergence on higher-capacity links; it is therefore a necessary trade-off.

This test case aims to evaluate how well the mechanisms allow SCTP to utilize available bandwidth when there is enough capacity to satisfy the RTP flows. In this test case, the GCC flow is configured to have a DR of 1.5 Mbps, and both flows are given equal priority. It is expected that GCC's throughput will converge to a stable rate of 1.5 Mbps. SCTP should be able to quickly converge to around 3.5 Mbps since the total capacity is 5 Mbps. The coupling mechanism should also ensure that delay is kept low despite SCTP sending at a higher throughput than GCC.
The bottleneck capacity in this scenario is 5 Mbps and the one-way propagation delay is 50 ms.

Figure 20 shows the throughput and delay for the mechanisms when the SCTP flow is started before the GCC flow. In this case, some difference between the mechanisms is visible. First, in fig. 20a, FSE-NG has two problems: 1) when the GCC flow starts, the SCTP flow's sending rate gets dragged all the way down to 1 Mbps (see Section V-C2b); 2) SCTP's subsequent recovery of convergence is very slow, taking approximately 20 seconds (see Section V-C2c). Our Extended [...]

[...] fig. 21b shows FSE-NG keeping delay lower than the other two mechanisms. We can trace FSE-NG's lower delay back to the fact that the mechanism limits SCTP to a higher degree than the other mechanisms by not allowing it to send rate updates.
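The priority-weighted sharing evaluated with fig. 18 can be sketched in a few lines; `split_by_priority` is a hypothetical helper of ours, not part of the Chromium implementation:

```python
def split_by_priority(total_rate, priorities):
    """Divide the FSE aggregate rate (bps) among registered flows
    in proportion to their priorities P.

    priorities maps a flow id to its priority weight; each flow
    receives total_rate * P_i / sum(P).
    """
    weight_sum = sum(priorities.values())
    return {fid: total_rate * p / weight_sum
            for fid, p in priorities.items()}
```

With P = 2 for GCC and P = 1 for SCTP on a 2 Mbps bottleneck, this yields the 2:1 split that Table 5 reports when both flows are running.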

When the desired rate limits the RTP flow and the link's capacity is large, the convergence time for SCTP can become very long. Neither FSE-NG nor our Extended FSE-NG is able to solve this problem (see fig. 13 for FSE-NG and fig. 14 for Extended FSE-NG). Because FSEv2 takes rates from both GCC and SCTP flows, fig. 22 shows that FSEv2 fixes this problem.
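To illustrate why honoring the desired rate matters in this scenario, the following sketch (our own simplification, not the FSEv2 code) caps DR-limited flows and redistributes the freed capacity to the remaining flows by priority:

```python
def allocate_with_dr(total_rate, flows):
    """Split an aggregate rate (bps) among flows, honoring desired rates.

    flows maps a flow id to (priority, desired_rate). A flow whose
    priority-proportional share meets or exceeds its desired rate (DR)
    is capped at the DR, and the freed capacity is redistributed among
    the remaining flows.
    """
    alloc = {}
    remaining = dict(flows)
    budget = total_rate
    while remaining:
        weight_sum = sum(p for p, _ in remaining.values())
        # Flows whose fair share would exceed their DR are capped at it.
        capped = {fid: dr for fid, (p, dr) in remaining.items()
                  if budget * p / weight_sum >= dr}
        if not capped:
            # No flow is DR-limited: split the rest by priority.
            for fid, (p, _) in remaining.items():
                alloc[fid] = budget * p / weight_sum
            break
        for fid, dr in capped.items():
            alloc[fid] = dr
            budget -= dr
            del remaining[fid]
    return alloc
```

In the 5 Mbps scenario, capping GCC at its 1.5 Mbps DR leaves 3.5 Mbps for SCTP immediately, rather than forcing SCTP to grow into the unused capacity over many round trips.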

IX. CONCLUSION
In this paper, we have shown how the design of using two different congestion control mechanisms within two different [...] Our code is open source and freely available from [28]; we believe that these implementations should serve as a good basis for code in widely-used WebRTC-capable browsers.

Regarding the choice of algorithm, we recommend FSEv2.

While our results have shown that, due to its heavier reliance on SCTP rate updates, this algorithm does not always consistently perform best, e.g., in terms of delay, the differences are minuscule. FSEv2 is, however, the only algorithm that [...]

The present work has only focused on coupling flows running between two peers. One possible avenue for further research is to explore a scenario with several peers, e.g., conference call applications where all flows sent from a given peer or several peers to one or more destinations may share the same bottleneck. As future work, we plan to investigate such scenarios using a shared bottleneck detection method [30], [31] to infer which flows share a common path. Such an extension could greatly amplify the benefits attained with these coupling mechanisms, since they would then operate on a much larger number of flows.

[...] respectively. He has been a Full Professor at the University of Oslo, Norway, since 2009. He has been active in the IETF and IRTF for many years, such as by chairing the Internet Congestion Control Research Group (ICCRG) and leading the effort to form the Transport Services (TAPS) Working Group. He has also participated in several European research projects, including roles such as coordinator and technical manager. His main research interests include the transport layer.

TOBIAS FLADBY received the M.Sc. degree in computer science from the University of Oslo, Norway. He is currently a Software Engineer at Cisco, Norway. His research interests include performance analysis of transport protocols and WebRTC.