Coupled Multipath BBR (C-MPBBR): An Efficient Congestion Control Algorithm for Multipath TCP

Multipath transmission control protocol (MPTCP) is a promising transport-layer protocol that enables a device to use multiple communication interfaces simultaneously, thereby achieving high throughput. The congestion control algorithm (CCA) employed in MPTCP is a key component that controls the data flow through the different subflows (SFs). MPTCP CCAs face two fundamental challenges. First, MPTCP flows should have an advantage over single-path flows; second, MPTCP flows should be fair, meaning that SFs sharing a common bottleneck should together occupy a capacity close to that occupied by a single-path flow. Several MPTCP CCAs have been developed; however, none of them meets both challenges in all scenarios. Recently, Google introduced bottleneck bandwidth and round-trip time (BBR), a new CCA for single-path TCP that achieves high throughput with minimal delay by building a model of the network. In this paper, we propose a novel BBR-based MPTCP CCA, named coupled multipath BBR (C-MPBBR), that meets both challenges by exploiting BBR's network model. C-MPBBR addresses the first challenge by tracking the delivery rate and the bottleneck bandwidth (BtlBW) and closing low-bandwidth SFs when multipath transmission provides no benefit. It addresses the second challenge by identifying the SFs that share a common bottleneck and dividing among them the BtlBW share that a single SF would obtain. We implemented C-MPBBR in the Linux kernel, tested it in a wide range of scenarios through Mininet emulation experiments and on the real-world Internet, and confirmed that C-MPBBR outperforms existing MPTCP CCAs in meeting both fundamental challenges, ensuring both throughput and fairness.


I. INTRODUCTION
The techniques that enable devices with multiple communication interfaces, such as 4G/5G and WiFi, have advanced greatly in recent years. The simultaneous use of multiple interfaces is expected to substantially improve the Internet experience, specifically, the quality of service (QoS) [1]. Although conventional transmission control protocol (TCP) does not support the concurrent use of multiple interfaces [2], multipath TCP (MPTCP) [3], an extension of TCP, has attracted increasing attention because it enables multipath support through small modifications to the transport layer. In MPTCP, the path between a pair of interfaces belonging to the two MPTCP endpoints is called a subflow (SF).

The associate editor coordinating the review of this manuscript and approving it for publication was Yulei Wu.
Congestion control (CC) is one of the key mechanisms in MPTCP because it avoids network congestion by controlling the amount of traffic transmitted over each SF according to the network condition. MPTCP CC faces two fundamental challenges [4]-[6]:
• Goal 1: Improve throughput. MPTCP should perform at least as well as a single-path flow on the best available path; in other words, MPTCP should always provide an incentive over single-path flows.
• Goal 2: Provide fairness in terms of bandwidth (BW) usage. When MPTCP SFs share a common bottleneck, the total capacity they occupy should be close to that of a single-path flow; in other words, MPTCP flows should be fair.

VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

Several MPTCP CC algorithms (CCAs), such as LIA [6], OLIA [7], BALIA [8], D-LIA [9], and Couple+ [10], have been proposed to address these challenges. However, they focus mainly on the second goal and consequently fail to satisfy the first goal in most scenarios [11]. Moreover, their strategies mainly rely on adapting a legacy TCP CCA (for example, Reno [12]) to MPTCP. As a result, they inherit the problems of conventional loss-based CCAs, and these problems are multiplied by the multipath management of MPTCP. For example, a packet loss in one SF can severely degrade the performance of the other SFs [13], [14].
Recently, Google proposed the bottleneck bandwidth and round-trip time (BBR) algorithm for single-path TCP, a new CCA that aims to avoid congestion and packet losses by actively preventing persistent queue formation at the bottleneck [15]. BBR is designed to fully utilize the available BW with low delay. BBR controls not only the congestion window (CWND) but also the sending rate, which keeps data transmission stable even in the presence of packet losses. BBR builds a network model by continuously measuring the bottleneck BW (BtlBW), the minimum round-trip time (minRTT), and the delivery rate (DelRt), and it sets the sending rate based on this model. Google reported [15] that BBR reduced YouTube's median RTT by 80%, improved the BW of B4 [16] by 133 times, and operates at Kleinrock's optimal operating point [17]. Therefore, a proper implementation of BBR in MPTCP is expected to considerably enhance the performance of MPTCP.
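As a concrete illustration of this network model, the Python sketch below shows one way the three quantities can be derived from per-ACK samples, assuming the windows described for BBR v1: each DelRt sample is the data delivered divided by the sampling interval, BtlBW is the maximum DelRt over roughly the last ten round trips, and minRTT is the minimum RTT over the last ten seconds. The class and method names are hypothetical, and the Linux kernel uses a more compact running-max filter rather than storing every sample.

```python
from collections import deque


class BBRModel:
    """Hypothetical sketch of BBR v1's network model (not kernel code)."""

    def __init__(self, bw_window_rounds=10, rtt_window_s=10.0):
        self.bw_window_rounds = bw_window_rounds
        self.rtt_window_s = rtt_window_s
        self.bw_samples = deque()   # (round_count, delivery_rate_bps)
        self.rtt_samples = deque()  # (timestamp_s, rtt_s)

    def on_ack(self, round_count, now_s, delivered_bytes, interval_s, rtt_s):
        # DelRt sample: data delivered over the sampling interval, in bit/s.
        del_rt = delivered_bytes * 8 / interval_s
        self.bw_samples.append((round_count, del_rt))
        self.rtt_samples.append((now_s, rtt_s))
        # Expire samples that fell out of the sliding windows; the sample
        # just appended is always inside its window, so neither deque empties.
        while self.bw_samples[0][0] <= round_count - self.bw_window_rounds:
            self.bw_samples.popleft()
        while self.rtt_samples[0][0] < now_s - self.rtt_window_s:
            self.rtt_samples.popleft()
        return del_rt

    @property
    def btlbw(self):
        # BtlBW: windowed maximum of the DelRt samples (bit/s).
        return max(rate for _, rate in self.bw_samples)

    @property
    def min_rtt(self):
        # minRTT: windowed minimum of the RTT samples (s).
        return min(rtt for _, rtt in self.rtt_samples)

    @property
    def bdp_bytes(self):
        # Bandwidth-delay product implied by the model.
        return self.btlbw * self.min_rtt / 8
```

Using a windowed maximum rather than an average lets the BtlBW estimate ignore transient dips in DelRt, while the sliding window still lets it fall when the bottleneck rate genuinely decreases.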
Inspired by the promising performance of BBR in single-path TCP, several research groups have developed multipath BBR implementations [18]-[21]. However, [18] and [19] introduced uncoupled approaches that focus mainly on Goal 1 (throughput), whereas [20] and [21] implemented coupled methods that concentrate on Goal 2 (fairness). Moreover, the coupled algorithms in [20]-[22] are mainly grounded in the existing coupled MPTCP CCAs and therefore partially inherit their disadvantages. Consequently, none of them satisfies both fundamental goals of MPTCP. Although BBR's estimates of BtlBW and minRTT provide a compact model of the network, none of these approaches exploits that model to address the two goals.
Concerning Goal 2, MPTCP SFs that share the same bottleneck should together occupy a capacity close to that of a single-path TCP flow. A simple way to achieve this is to identify the SFs that share a common bottleneck and divide the available capacity among them. The key step is therefore to identify a common bottleneck reliably. BBR has a unique mechanism for continuously measuring BtlBW, i.e., the BW of the bottleneck of a flow. If applied properly, this measurement can help identify a common bottleneck.
Concerning Goal 1, MPTCP must always provide a benefit over single-path TCP or, at the very least, not perform worse. BBR provides a key advantage here as well: it continuously measures DelRt, which reflects how much of the underlying network's capacity is actually being used. Combining this measurement with the measured BtlBW can help address Goal 1.
In this work, we propose coupled multipath BBR (C-MPBBR), a novel MPTCP CCA that aims to satisfy the two fundamental challenges of MPTCP by exploiting the properties of BBR. The main contributions of this study are summarized as follows:
1. Based on the continuously measured BtlBW in BBR, C-MPBBR introduces a novel method to identify the SFs that share a common bottleneck.
2. C-MPBBR measures the BW available to a single SF traversing a common bottleneck and then distributes that BW among all SFs sharing the bottleneck, thereby ensuring fairness toward single-path flows.
3. By continuously observing DelRt and BtlBW, C-MPBBR regularly checks whether the multipath connection provides any benefit. If the multipath flows lag behind single-path ones, it closes the SF with the lowest BW. If the lag persists, it closes the SF with the next lowest BW, and so on, until only the SF with the highest BW remains. In this way, C-MPBBR concentrates on the best path and behaves like a single-path flow, thereby satisfying Goal 1.
4. Finally, we implement C-MPBBR in the Linux kernel and, through extensive real-world and Mininet emulation experiments in a wide range of scenarios, demonstrate that this BBR-based CCA satisfies the two fundamental challenges of MPTCP CCAs. We also compare its performance with that of other MPTCP CCAs, namely LIA, OLIA, and BALIA, as well as the coupled multipath BBR proposed by Han et al. [21] and the uncoupled multipath BBR proposed by Nguyen et al. [18]. C-MPBBR not only outperforms its competitors in terms of throughput in most cases but also ensures better fairness toward single-path flows.

The rest of the paper is organized as follows. Section II describes related work. Section III briefly reviews the basic principles of BBR and describes the C-MPBBR algorithm in detail. Section IV compares the performance of C-MPBBR with that of the considered CCAs. Finally, Section V concludes the paper.

II. RELATED WORKS
In this section, we briefly describe several previously proposed multipath BBR implementations and their limitations.
Nguyen et al. implemented an uncoupled multipath BBR in the Linux kernel [18]. They used single-path BBR in multipath scenarios without modifying the algorithm and evaluated its performance in various scenarios with different packet loss rates. They concluded that their multipath BBR performed acceptably in all scenarios compared with BALIA. However, they did not address the second fundamental challenge of MPTCP, namely, fairness toward single-path flows.
Zhang et al. introduced another uncoupled multipath BBR [19]. They argued that the original BBR relies on aggressive pacing to measure BtlBW, which causes delays in real-time video streaming. They used different values to set the pacing rate in BBR, added a mechanism that decreases the pacing rate further when the bottleneck buffer becomes full, and implemented a packet scheduling mechanism. However, because they focused mainly on improving real-time video streaming, they did not consider the two fundamental challenges of MPTCP.
Zhu et al. proposed a coupled multipath BBR named wBBR [20]. They reported that their method outperformed other MPTCP CCAs; however, they focused mainly on satisfying the ''congestion equality principle'' [23], which does not correspond to the fundamental challenges for coupled MPTCP CCAs defined by the IETF [6].
Han et al. proposed a coupled multipath BBR focused on improving fairness [21]. They introduced a multiplication factor related to the BW of each SF. They evaluated their coupled BBR in a simple lossy environment and obtained better results than LIA, OLIA, and BALIA. However, it remains unclear how their coupled BBR would perform in more complex scenarios and how it would always ensure throughput at least equal to that of single-path flows, i.e., how it would realize Goal 1.

III. COUPLED MULTIPATH BBR (C-MPBBR)
In this section, we briefly describe the key mechanisms of BBR, analyze its behavior through a simple experiment, and then describe the proposed C-MPBBR CCA. To address the two fundamental challenges of MPTCP, we first describe the approach proposed to accomplish Goal 2 and then explain the method to fulfill Goal 1.

A. BRIEF OVERVIEW OF BBR
BBR aims to fully utilize the underlying network while avoiding persistent queue formation at the bottleneck; accordingly, it provides high throughput with minimal delay. The bottleneck is the link with the lowest BW on the path between a sender and a receiver. In every connection, the bottleneck determines the available BW and hence the achievable throughput, and it is where a persistent queue builds up. BBR periodically measures BtlBW and minRTT to actively control the data flow. BBR operates in the following four states:
• Startup: BBR searches for BtlBW exponentially until DelRt stops increasing.
• Drain: BBR drains the excess data that it injected into the network during Startup.
• Probe Bandwidth (ProbeBW): this state consists of eight cycles, each lasting about one minRTT. In the first cycle, BBR paces above the estimated BtlBW (the amount of extra data is determined by the pacing gain) to check whether more BW has become available; in the second cycle, it drains the extra data; and in the remaining six cycles, it sends at a rate based on the newly measured BtlBW.
• Probe RTT (ProbeRTT): BBR refreshes minRTT by emptying the bottleneck buffer, limiting CWND to only four packets. BBR enters this state only if minRTT has not been updated during the past 10 seconds.
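The four states can be summarized by the gains and exit conditions that BBR v1 uses. The Python sketch below is illustrative only: the constant and function names are hypothetical, while the numeric values follow the published BBR v1 design (a Startup gain of 2/ln 2, a ProbeBW cycle of gains 1.25, 0.75, and six cycles at 1.0, a full-pipe test that ends Startup once BtlBW has failed to grow by 25% for three rounds, and a 10-second minRTT expiry). Details such as the randomized starting cycle of ProbeBW are omitted.

```python
import math

# Hypothetical constants following the published BBR v1 design.
HIGH_GAIN = 2 / math.log(2)           # Startup pacing/cwnd gain (~2.885)
DRAIN_GAIN = 1 / HIGH_GAIN            # Drain pacing gain
PROBE_BW_GAINS = [1.25, 0.75, 1, 1, 1, 1, 1, 1]  # eight ProbeBW cycles
FULL_BW_GROWTH = 1.25                 # Startup expects >= 25% BtlBW growth...
FULL_BW_ROUNDS = 3                    # ...and gives up after 3 stalled rounds
PROBE_RTT_INTERVAL_S = 10.0           # minRTT age that triggers ProbeRTT
PROBE_RTT_CWND_PKTS = 4               # CWND cap while in ProbeRTT


class StartupExitDetector:
    """Detects when Startup has filled the pipe (BtlBW stopped growing)."""

    def __init__(self):
        self.full_bw = 0.0
        self.stalled_rounds = 0

    def on_round(self, btlbw):
        """Call once per round trip; returns True when Startup should end."""
        if btlbw >= self.full_bw * FULL_BW_GROWTH:
            # Still growing quickly: record the new level, reset the count.
            self.full_bw = btlbw
            self.stalled_rounds = 0
            return False
        self.stalled_rounds += 1
        return self.stalled_rounds >= FULL_BW_ROUNDS


def probe_bw_pacing_gain(cycle_index):
    """Pacing gain of a ProbeBW cycle (the index advances once per minRTT)."""
    return PROBE_BW_GAINS[cycle_index % len(PROBE_BW_GAINS)]


def should_enter_probe_rtt(now_s, min_rtt_stamp_s):
    """Enter ProbeRTT if minRTT has not been refreshed for 10 seconds."""
    return now_s - min_rtt_stamp_s > PROBE_RTT_INTERVAL_S
```

Because the gains of one ProbeBW cycle average exactly 1.0, the extra data sent while probing is drained in the following cycle, and the long-run sending rate stays at the estimated BtlBW.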
BBR uses three main control parameters to regulate the data flow: the pacing rate, CWND, and the send quantum. Based on the estimated BtlBW and minRTT, BBR sets these three parameters to use the underlying network efficiently.
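The following Python sketch shows how the three control parameters can be derived from the model. It assumes the BBR v1 relations (pacing rate = pacing gain × BtlBW; CWND = 2 × BDP with a floor of four packets) and the send-quantum thresholds given in the BBR Internet-Draft. The 1448-byte MSS and the function names are illustrative assumptions, and kernel refinements such as ACK-aggregation compensation are omitted.

```python
MSS_BYTES = 1448      # assumed MSS, for illustration only
CWND_GAIN = 2.0       # BBR v1 CWND gain outside Startup/Drain
MIN_CWND_PKTS = 4     # BBR never lets CWND fall below four packets


def pacing_rate_bps(btlbw_bps, pacing_gain):
    """Pacing rate: the current pacing gain times the BtlBW estimate."""
    return pacing_gain * btlbw_bps


def cwnd_bytes(btlbw_bps, min_rtt_s, cwnd_gain=CWND_GAIN):
    """CWND: cwnd_gain times the estimated BDP, floored at four packets."""
    bdp_bytes = btlbw_bps * min_rtt_s / 8
    return max(cwnd_gain * bdp_bytes, MIN_CWND_PKTS * MSS_BYTES)


def send_quantum_bytes(pacing_rate_bps):
    """Send quantum: allow larger bursts only at higher pacing rates."""
    if pacing_rate_bps < 1.2e6:
        return MSS_BYTES
    if pacing_rate_bps < 24e6:
        return 2 * MSS_BYTES
    # About 1 ms worth of data at the pacing rate, capped at 64 KB.
    return min(pacing_rate_bps / 8 * 1e-3, 64 * 1024)
```

For example, with BtlBW = 10 Mbps and minRTT = 40 ms, the BDP is 50,000 bytes, so CWND is 100,000 bytes; pacing at gain 1.25 gives 12.5 Mbps and a send quantum of two MSS. All three values follow from BtlBW and minRTT, which is why changing BtlBW alone is enough to steer a BBR flow.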
Recently, extensive research has been conducted to further improve the performance of BBR [24]-[30]. Moreover, Google has been actively developing BBR v2, an improved version of the current BBR v1 [31]. In this work, we build C-MPBBR on BBR v1. Recent improvements to BBR v1 can easily be incorporated into C-MPBBR, and C-MPBBR can serve as a basis for implementing the upcoming BBR v2 in multipath scenarios.

Fig. 1(a) illustrates a simple experimental setup used to test BBR. In this experiment, there were two senders and two receivers; each sender initiated one data flow to a receiver, giving two flows in total. Both flows used BBR, started at the same time, and transmitted data for 120 seconds. The flows shared a common bottleneck with a BW of 10 Mbps, a queue length of 2 BDP, and a random packet loss rate of 0.1%, and both had the same RTT. The experiment was run in the Mininet [32] emulator. Fig. 1(b) shows the CWND of BBR in this scenario. BBR started with exponential growth, drained the excess queue created while measuring BtlBW, and thus completed the Startup and Drain states. It then continued in the ProbeBW state and periodically entered the ProbeRTT state. Fig. 1(c) shows the BtlBW estimated by both BBR flows, which was accumulated and updated during the ProbeBW states. Apart from small glitches, both flows estimated almost the same BtlBW, which explains their almost equal CWNDs. Notably, the estimated BtlBWs of both flows were considerably higher than their actual shares of the bottleneck: according to Fig. 1(a), the bottleneck link had a BW of 10 Mbps, which, divided between the two flows, gives 5 Mbps per flow. However, the average BtlBW estimates of Flow #1 and Flow #2 were 6.93 Mbps and 7.23 Mbps, respectively.

B. SIMPLE EXPERIMENTAL ANALYSIS OF BBR
Notable results were observed in the throughput curves shown in Fig. 1(d). The average throughputs of Flow #1 and Flow #2 were 3.6 Mbps and 3.62 Mbps, respectively, i.e., approximately 48% and 50% lower than the respective measured BtlBW. This gap may be caused by the induced packet losses and the bottleneck queue.
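The percentages above follow directly from the reported averages. The short Python check below recomputes them, assuming the 5 Mbps per-flow fair share of the 10 Mbps bottleneck; the helper name is hypothetical.

```python
FAIR_SHARE_MBPS = 10 / 2  # 10 Mbps bottleneck shared by two BBR flows

# (average BtlBW estimate, average throughput) in Mbps, as reported above
FLOWS = {"Flow #1": (6.93, 3.60), "Flow #2": (7.23, 3.62)}


def gaps(btlbw, thpt, fair_share=FAIR_SHARE_MBPS):
    """Return (% BtlBW above fair share, % throughput below BtlBW)."""
    overestimate = (btlbw / fair_share - 1) * 100
    shortfall = (1 - thpt / btlbw) * 100
    return overestimate, shortfall


for name, (btlbw, thpt) in FLOWS.items():
    over, short = gaps(btlbw, thpt)
    print(f"{name}: BtlBW {over:.0f}% above fair share, "
          f"throughput {short:.0f}% below BtlBW")
```

It prints BtlBW overestimates of about 39% and 45% and throughput shortfalls of about 48% and 50% for Flow #1 and Flow #2, respectively. A gap of this size between BtlBW and throughput (and hence DelRt) is what later motivates the margin used for Goal 1.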
In summary, the key findings are as follows:
• BBR flows sharing a bottleneck estimated nearly equal BtlBW.
• The estimated BtlBW was directly related to the throughput. Because DelRt is closely related to throughput, a relation between DelRt and BtlBW can also be established.
C. C-MPBBR: FULFILLING GOAL 2

As stated in [15] and confirmed by the simple experiment in the previous section, BBR flows estimate the available BtlBW equally. We exploit this feature of BBR to resolve the unfairness issue in MPTCP, i.e., to fulfill Goal 2. Goal 2 requires that two or more SFs traveling through the same bottleneck together take a capacity close to that of a single-path TCP flow. Because capacity is closely related to the available BW, Goal 2 can be restated as follows: the total BW occupied by all SFs sharing a common bottleneck should be close to that of a single-path TCP flow. BBR can estimate the available BW, and multiple BBR flows sharing a common bottleneck detect nearly the same BtlBW. Therefore, if we can identify the SFs sharing a common bottleneck, we can divide the BtlBW estimated by one of these SFs among all of them, so that together they take nearly the same capacity as a single-path BBR flow.
To achieve this, the first challenge is to identify the SFs sharing a common bottleneck. We hypothesize that the key to this lies in the equal measurement of BtlBW. As observed previously, flows traveling through a common bottleneck measure almost equal BtlBW. Therefore, if two or more SFs have close BtlBW estimates, we consider that they share a common bottleneck. To account for the small glitches observed in Fig. 1(c), we allow a tolerance of ±α%. In addition, to avoid false-positive detections, we require the SFs to show similar BtlBW for at least three consecutive ProbeBW states before making the final decision.
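A minimal Python sketch of this detection rule follows. It assumes that the ±α% band is measured relative to the larger of the two BtlBW estimates, that similarity is tracked per pair of SFs, and that set A is the union of all confirmed pairs, which presumes a single common bottleneck. It applies the same three-state hysteresis to removal, mirroring the rule for leaving set A. The class and function names are hypothetical and do not reproduce the kernel code in [33].

```python
from itertools import combinations

ALPHA_PCT = 20        # tolerance used in the paper (±20%)
CONFIRM_STATES = 3    # consecutive complete ProbeBW states needed to decide


def similar_btlbw(bw_a, bw_b, alpha_pct=ALPHA_PCT):
    """Assumption: the ±alpha% band is taken relative to the larger value."""
    return abs(bw_a - bw_b) <= alpha_pct / 100 * max(bw_a, bw_b)


class SharedBottleneckDetector:
    """Hypothetical sketch of the set-A bookkeeping (not the kernel code)."""

    def __init__(self):
        self.match = {}         # (sf_a, sf_b) -> consecutive similar states
        self.mismatch = {}      # (sf_a, sf_b) -> consecutive dissimilar states
        self.confirmed = set()  # pairs currently judged to share a bottleneck

    def on_probe_bw_complete(self, btlbw_by_sf):
        """Call once per complete ProbeBW state with {sf_id: BtlBW}."""
        for pair in combinations(sorted(btlbw_by_sf), 2):
            a, b = pair
            if similar_btlbw(btlbw_by_sf[a], btlbw_by_sf[b]):
                self.match[pair] = self.match.get(pair, 0) + 1
                self.mismatch[pair] = 0
            else:
                self.mismatch[pair] = self.mismatch.get(pair, 0) + 1
                self.match[pair] = 0
            if self.match[pair] >= CONFIRM_STATES:
                self.confirmed.add(pair)       # pair joins set A
            elif self.mismatch[pair] >= CONFIRM_STATES:
                self.confirmed.discard(pair)   # pair leaves set A

    def shared_set(self):
        """Set A: every SF that belongs to at least one confirmed pair."""
        return {sf for pair in self.confirmed for sf in pair}
```

Requiring three consecutive agreeing (or disagreeing) states means a single glitchy BtlBW sample can neither add an SF to set A nor remove it.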
In summary, if two or more SFs have similar BtlBW, within ±α%, during three successive complete ProbeBW states, those SFs are considered to share a common bottleneck, and they are placed in set A, the set of SFs traversing the common bottleneck. For each SF i ∈ A, the measured BtlBW_i is divided by the number of SFs passing through the common bottleneck:

$$\mathrm{BtlBW}_i \leftarrow \frac{\mathrm{BtlBW}_i}{|A|}, \quad \forall\, i \in A \tag{1}$$

The first and second cycles of each ProbeBW state still proceed normally with the full measured BtlBW, which enables C-MPBBR to keep measuring the correct BtlBW of every SF throughout the transmission. We divide BtlBW rather than CWND because BBR derives CWND, the send quantum, and the pacing rate from BtlBW.
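The division rule can be sketched in Python as follows. The helper names are hypothetical; the sketch assumes the BBR v1 ProbeBW gain cycle (1.25, 0.75, then six cycles at 1.0) and shows that only the last six cycles use BtlBW divided by the size of set A, while the two probing cycles keep the full estimate.

```python
PROBE_BW_GAINS = [1.25, 0.75, 1, 1, 1, 1, 1, 1]  # BBR v1 ProbeBW cycle


def effective_btlbw(sf_id, btlbw, shared_set, cycle_index):
    """BtlBW that drives pacing rate, CWND, and send quantum for one SF.

    Hypothetical helper: SFs outside set A, and the first two ProbeBW
    cycles (probe up, then drain), use the full estimate so that the true
    BtlBW keeps being measured; otherwise BtlBW is divided by |A|.
    """
    if sf_id not in shared_set or cycle_index in (0, 1):
        return btlbw
    return btlbw / len(shared_set)


def pacing_rates_over_cycle(sf_id, btlbw, shared_set):
    """Pacing rate of one SF in each of the eight ProbeBW cycles."""
    return [gain * effective_btlbw(sf_id, btlbw, shared_set, i)
            for i, gain in enumerate(PROBE_BW_GAINS)]
```

For an SF in a two-member set A with a 6 Mbps BtlBW estimate, the pacing rate is 7.5 and 4.5 Mbps in the two probing cycles and 3 Mbps in the remaining six, so the two SFs together pace at roughly one flow's BtlBW outside the probing cycles.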
Conversely, if the BtlBW_k of an SF k ∈ A is not similar to that of the other SFs in A during three successive complete ProbeBW states, SF k is removed from set A and continues to send data based on its own estimate BtlBW_k. By continuously observing BtlBW in this way, C-MPBBR keeps set A, i.e., the set of SFs sharing the common bottleneck, up to date. The entire method is summarized in Algorithm 1, which first describes the algorithm in words and then presents the pseudo-code following the Linux kernel implementation. In our implementation, α is set to 20. Interested readers are encouraged to access and test the Linux kernel code of C-MPBBR in the GitHub repository given in [33].

D. C-MPBBR: FULFILLING GOAL 1

Goal 1 states that MPTCP flows should provide an incentive over single-path flows or at least perform comparably. Therefore, if MPTCP flows can neither provide an incentive nor perform at least as well as single-path flows, it is preferable to stop multipath transmission and use a single path only. As discussed in Section III.B, BBR flows exhibit a relation between BtlBW and DelRt. We consider this relation to be the key to fulfilling Goal 1.
The BW of a network represents its available capacity, and by estimating BtlBW, BBR estimates this capacity. Therefore, DelRt is expected to be close to the measured BtlBW when the network is used efficiently. We propose to achieve Goal 1 by monitoring these two parameters.
To fulfill Goal 1, we propose converting the multipath flows into a single-path flow whenever they provide no benefit over a single-path one. However, rather than stopping all SFs at once, we follow a step-by-step procedure. Let Total_Del_Rt be the total DelRt obtained by all SFs and highest_bw_among_all_SFs be the highest BW among all available SFs. C-MPBBR continuously observes BtlBW and DelRt. If the condition in Eq. (2), which compares Total_Del_Rt with highest_bw_among_all_SFs using the margin β, holds for five consecutive complete ProbeBW states (full cycles), C-MPBBR stops the SF with the lowest BW and continues to monitor BtlBW and DelRt. If DelRt does not improve and Eq. (2) remains satisfied for another five consecutive ProbeBW states, it closes the SF with the next lowest BW. This is repeated until only the SF with the highest BW is left. Thereafter, C-MPBBR operates as a single-path BBR flow. Here, β is set to 40, based on the gap between the estimated BW and DelRt observed in Fig. 1. C-MPBBR waits for five consecutive complete ProbeBW states before closing an SF to eliminate false-positive results. Algorithm 2 summarizes the procedure to achieve Goal 1.
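The step-by-step closing procedure can be sketched in Python as follows. The exact form of Eq. (2) is not reproduced in this section, so the sketch ASSUMES one plausible reading: multipath provides no benefit when Total_Del_Rt < (1 − β/100) × highest_bw_among_all_SFs, i.e., when all SFs together deliver less than the best SF's BtlBW discounted by the β% gap between BtlBW and DelRt. The class and function names are hypothetical.

```python
BETA_PCT = 40       # margin used in the paper
CHECK_STATES = 5    # consecutive complete ProbeBW states before acting


def no_multipath_benefit(total_del_rt, highest_bw, beta_pct=BETA_PCT):
    """ASSUMED form of Eq. (2), not taken from the paper: the aggregate
    DelRt is below the best SF's BtlBW discounted by beta percent."""
    return total_del_rt < (1 - beta_pct / 100) * highest_bw


class SubflowPruner:
    """Hypothetical sketch of the step-by-step SF closing procedure."""

    def __init__(self):
        self.strikes = 0  # consecutive ProbeBW states satisfying the test

    def on_probe_bw_complete(self, btlbw_by_sf, del_rt_by_sf):
        """Return the id of the SF to close now, or None.

        Both dicts map the ids of the currently open SFs to their values.
        """
        if len(btlbw_by_sf) <= 1:
            return None  # already a single-path BBR flow
        total_del_rt = sum(del_rt_by_sf.values())
        highest_bw = max(btlbw_by_sf.values())
        if no_multipath_benefit(total_del_rt, highest_bw):
            self.strikes += 1
        else:
            self.strikes = 0
        if self.strikes < CHECK_STATES:
            return None
        self.strikes = 0  # start counting afresh before the next closure
        return min(btlbw_by_sf, key=btlbw_by_sf.get)  # lowest-BW SF
```

For example, with BtlBW estimates of 50 and 1 Mbps and a total DelRt of 20.5 Mbps, the assumed condition holds (20.5 < 30), so after five consecutive ProbeBW states the sketch closes the 1 Mbps SF; resetting the counter after each closure gives the remaining SFs another five states to show an improvement.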

IV. PERFORMANCE EVALUATION OF C-MPBBR
In this section, we evaluate the performance of C-MPBBR in a wide range of scenarios designed to test specific properties of MPTCP CCAs. We compare C-MPBBR with conventional MPTCP CCAs, namely LIA [6], OLIA [7], and BALIA [8], as well as with recent multipath BBR implementations, namely those of Han et al. (referred to as Han's MPBBR) [21] and Nguyen et al. (referred to as U-MPBBR) [18]. We start with a detailed description of the experimental setup and scenarios and then compare the performance of the considered CCAs in the studied scenarios. Subsequently, we evaluate the considered CCAs in a complex network scenario to observe how they perform when several challenges must be addressed simultaneously.

A. EXPERIMENTAL SETUP
We evaluated the performance through emulation experiments based on Linux network namespaces in the Mininet [32] emulator. ''fq'' [34] was enabled as the queueing discipline; ''NetEm'' [35] and ''ethtool'' [36] were used to configure RTT and BW, respectively; ''iperf3'' [37] was used to transmit data between a client and a server and to measure the total throughput; ''ifstat'' [38] was used to measure the per-flow throughput; and ''tcpprobe'' [39] was used to record CWND and other internal parameters of BBR. MPTCP v0.93.4 on Linux kernel v4.9.169 was used for the experiments.

Fig. 2 shows the experimental scenarios considered to evaluate and compare the performance of C-MPBBR. In Scenario #1 (Fig. 2(a)), the client and server were connected via two links with different properties, and no background traffic was introduced. This scenario was designed to determine what fraction of the total capacity of the underlying network C-MPBBR and the other MPTCP CCAs could utilize. Scenario #2 (Fig. 2(b)) was designed to observe the performance of multipath flows when one SF shared its bottleneck with a single-path flow. Scenario #3 (Fig. 2(c)) evaluated how well the MPTCP CCAs addressed Goal 2, namely, how they performed when two SFs shared a common bottleneck with a competing single-path flow. Scenario #4 (Fig. 2(d)) represented an interesting case in which SF-1 traversed a high-BW path with small delay and low loss, whereas SF-2 traversed a comparatively narrow-BW path with long delay and high loss. Moreover, SF-2 shared its path with a single-path flow. This scenario showed how differently the MPTCP CCAs handled a tricky condition in which a single-path flow had the upper hand over the multipath flow. Finally, Scenario #5 (Fig. 2(e)) was specifically designed to expose the shortcomings of C-MPBBR: the client and server were connected through two identical paths. Because C-MPBBR considers SFs with equal BtlBW to be sharing the same bottleneck, this scenario could force C-MPBBR to treat two SFs on different paths as SFs sharing a common bottleneck.

Fig. 3 shows the CWND and per-flow throughput of the considered MPTCP CCAs in Scenario #1. In this scenario, all BBR-based MPTCP CCAs outperformed all loss-based MPTCP CCAs. The loss-based MPTCP CCAs, i.e., LIA, OLIA, and BALIA, were unable to properly utilize the underlying network on either SF. During the 120-second emulation, the average throughput of C-MPBBR was 11.7 Mbps, whereas that of LIA was only 8.8 Mbps. The induced packet losses in Scenario #1 caused LIA to reduce its sending rate, and the aggressive fairness algorithm of LIA suppressed the throughput further, resulting in poor performance [11]. The same applied to OLIA and BALIA. In contrast, because the BtlBW estimates of the two SFs differed, C-MPBBR recognized that the SFs did not share a common bottleneck, so both SFs behaved as two separate BBR flows and utilized the BW fully. U-MPBBR performed similarly. Although Han's MPBBR lacks such a mechanism for identifying SFs that share a bottleneck, it also performed considerably well owing to the congestion-avoidance nature of BBR.

Fig. 4 shows the CWND and throughput of the considered CCAs in Scenario #2. Note that the background traffic on the path of SF-1 was BBR for C-MPBBR, Han's MPBBR, and U-MPBBR, and Reno for LIA, OLIA, and BALIA. For clarity, we denote these cases as C-MPBBR vs. BBR, LIA vs. Reno, and so on. In this scenario, the paths of SF-1 and SF-2 had BWs of 10 Mbps and 5 Mbps, respectively.
Because there was background traffic on the path of SF-1, fair BW sharing would give SF-1 approximately 5 Mbps. In this scenario, because the C-MPBBR SFs traversed two different paths, they behaved like two separate BBR flows. Therefore, SF-1 shared the BW with the single-path BBR flow as any BBR flow would, achieving an average throughput of 3.7 Mbps, whereas the single-path BBR flow received an average of 4.1 Mbps. The total average throughput of C-MPBBR was 7.5 Mbps. Han's MPBBR and U-MPBBR exhibited a similar trend. In contrast, with LIA, SF-1 achieved an average throughput of only 1.4 Mbps, and the total average throughput was 4.9 Mbps, whereas the single-path Reno flow achieved an average throughput of 7.4 Mbps, much higher than that of LIA. OLIA and BALIA showed a similar trend. These results indicate that C-MPBBR and the other BBR-based MPTCP variants could efficiently utilize and share the underlying network, in contrast with LIA, OLIA, and BALIA. Following [11], we attribute this to the aggressive fairness mechanism used in LIA, OLIA, and BALIA. Fig. 5 shows the performance of the considered CCAs when SF-1 competes with CUBIC in Scenario #2; the results follow the same trend as when SF-1 competes with BBR/Reno.

Fig. 6 and Fig. 7 show the performance of the considered MPTCP CCAs in Scenario #3, in which the background traffic is BBR/Reno and CUBIC, respectively. Here, both SFs traveled through a common bottleneck, which they shared with a single-path TCP flow. According to Goal 2, the two SFs should together occupy a capacity close to that of a single-path TCP flow. Among all BBR-based CCAs, only C-MPBBR satisfied this criterion: the average throughput of C-MPBBR was 4.3 Mbps, whereas that of the single-path BBR flow was 3.8 Mbps, i.e., fairly close.
Goal 1, which requires multipath flows to always provide an incentive over a single-path flow, was also satisfied. In contrast, the average throughputs of Han's MPBBR and U-MPBBR were 6.1 and 5.7 Mbps, respectively, whereas those of their rival single-path BBR flows were 2.5 and 2.6 Mbps, respectively. The same trend was observed when competing with CUBIC. Thus, these algorithms violated Goal 2 because they were excessively greedy in absorbing BW. Conversely, LIA achieved an average throughput of 2.8 Mbps, whereas single-path Reno reached 5.4 Mbps, meaning that LIA failed to obtain its fair share. The same trend was observed for LIA against CUBIC and for OLIA and BALIA against both Reno and CUBIC.

Fig. 8 and Fig. 9 illustrate the performance of the considered MPTCP CCAs in Scenario #4 when competing with BBR/Reno and CUBIC, respectively. Here, SF-1 traversed a path with a BW of 50 Mbps, 1 ms delay, and 0.01% loss, whereas SF-2 traversed a path with 1 Mbps BW, 200 ms delay, and 0.2% loss. Moreover, SF-2 shared its path with a single-path flow. In this scenario, a single-path flow through the path of SF-1 would have the upper hand over the multipath flow because the properties of the path […] [11].

Fig. 10 shows the performance of the different CCAs in Scenario #5. Although such scenarios are rather uncommon in the modern Internet, we designed this one to test C-MPBBR in the worst case. Although C-MPBBR was affected by the identical BW of the two SFs, it still achieved higher throughput than the loss-based CCAs: the average throughput of C-MPBBR was 14.5 Mbps, comparable to the 13.0 and 15.5 Mbps of Han's MPBBR and U-MPBBR, respectively, whereas LIA, OLIA, and BALIA achieved 10.5, 11.4, and 12.0 Mbps, respectively.
Therefore, we concluded that C-MPBBR could perform sufficiently well even in the worst-case scenario.

D. PERFORMANCE EVALUATION IN TERMS OF AGGREGATE BENEFIT
Following [40], to thoroughly investigate how multipath flows utilize the network in terms of goodput and available BW, we define the aggregate benefit (Ag_bft) as follows:

$$\mathrm{Ag\_bft} = \begin{cases} \dfrac{G_t - BW_{max}}{\sum_{m} BW_m - BW_{max}}, & G_t \ge BW_{max} \\[2ex] \dfrac{G_t - BW_{max}}{BW_{max}}, & G_t < BW_{max} \end{cases}$$

where G_t, BW_m, and BW_max denote the total goodput of the multipath flows, the BW of path m, and the largest BW among all paths, respectively. A higher Ag_bft indicates better performance. Fig. 11 shows the Ag_bft of the different CCAs. Here, ''vs. BBR/Reno'' indicates that the background traffic is BBR for C-MPBBR, Han's MPBBR, and U-MPBBR and Reno for LIA, OLIA, and BALIA, whereas ''vs. CUBIC'' indicates that the background traffic is CUBIC. This convention applies to all scenarios unless stated otherwise. For Scenarios #1-2, in the absence of background traffic, the BBR-based MPTCP CCAs outperformed LIA, OLIA, and BALIA owing to BBR's network modeling, and all BBR-based MPTCP CCAs achieved almost equal Ag_bft. For Scenarios #2-4, the same trend was observed for both ''vs. BBR/Reno'' and ''vs. CUBIC''. Furthermore, the BBR-based CCAs again outperformed LIA, OLIA, and BALIA, except in Scenario #4. In Scenario #4, C-MPBBR surpassed all other CCAs because its unique mechanism recognizes situations in which single-path flows perform better than multipath ones and handles them by converting itself into a single-path flow, thereby addressing Goal 1. In this scenario, LIA, OLIA, and BALIA also performed better than Han's MPBBR and U-MPBBR.
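The metric can be sketched in Python as follows, assuming Ag_bft takes the piecewise form of the aggregation benefit commonly used in MPTCP evaluations: (Gt − BW_max)/(Σ_m BW_m − BW_max) when Gt ≥ BW_max, and (Gt − BW_max)/BW_max otherwise. Under this form, 0 means the multipath transfer merely matches the best single path, 1 means it aggregates all path capacity, and negative values mean it does worse than the best path. The function name and the example path BWs of 10 and 5 Mbps are hypothetical.

```python
def aggregate_benefit(goodput, path_bws):
    """Aggregation benefit of a multipath transfer (assumed piecewise form).

    goodput:  total goodput Gt of the multipath flows.
    path_bws: BW of each path; max(path_bws) is BW_max.
    """
    if len(path_bws) < 2:
        raise ValueError("aggregation benefit needs at least two paths")
    bw_max = max(path_bws)
    if goodput >= bw_max:
        # Share of the extra capacity (beyond the best path) actually used.
        return (goodput - bw_max) / (sum(path_bws) - bw_max)
    # Shortfall relative to simply using the best path alone.
    return (goodput - bw_max) / bw_max


for gt in (15.0, 12.5, 10.0, 5.0):
    print(gt, aggregate_benefit(gt, [10.0, 5.0]))
```

With paths of 10 and 5 Mbps, a goodput of 12.5 Mbps gives 0.5 because half of the second path's capacity is being added on top of the best path.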
However, in Scenarios #2-3, the Ag_bft of C-MPBBR was slightly lower than those of Han's MPBBR and U-MPBBR. This was because the SFs traversed a shared bottleneck; to satisfy Goal 2, C-MPBBR reduced its BW share so that the single-path flows could obtain their fair share. This is discussed in detail in the next section.
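Only the variables of Ag_bft (Gt, BW_m, BW_max) are given above, so the sketch below assumes the piecewise aggregate-benefit definition commonly used in multipath transport studies: (Gt - BW_max) / (sum of BW_m - BW_max) when Gt >= BW_max, and (Gt - BW_max) / BW_max otherwise. Treat this as an assumption rather than the paper's exact formula. The function name `aggregate_benefit` is hypothetical, and the goodput values in the example are illustrative; the path BWs are the 10 Mbps Ethernet and 2 Mbps wireless links of the real-world experiment.

```python
from typing import Sequence


def aggregate_benefit(goodput_mbps: float, path_bws_mbps: Sequence[float]) -> float:
    """Ag_bft under an assumed piecewise definition.

    Gt >= BW_max: (Gt - BW_max) / (sum(BW_m) - BW_max)
    Gt <  BW_max: (Gt - BW_max) / BW_max

    The value is 0 when the multipath goodput equals the best single path,
    1 when it equals the sum of all path BWs, and negative when the
    multipath flow does worse than the best single path alone.
    """
    bw_max = max(path_bws_mbps)
    extra_capacity = sum(path_bws_mbps) - bw_max  # BW beyond the best path
    if bw_max <= 0 or extra_capacity <= 0:
        raise ValueError("need at least two paths with positive BW")
    if goodput_mbps >= bw_max:
        return (goodput_mbps - bw_max) / extra_capacity
    return (goodput_mbps - bw_max) / bw_max


paths = [10.0, 2.0]  # Ethernet and wireless link BWs (Mbps)
print(aggregate_benefit(11.0, paths))  # 0.5: uses half of the extra capacity
print(aggregate_benefit(12.0, paths))  # 1.0: full aggregation
print(aggregate_benefit(8.0, paths))   # -0.2: worse than the best path alone
```

The two branches use different denominators so that "gaining extra capacity" and "losing capacity relative to the best path" are each normalized to a comparable scale.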

E. PERFORMANCE EVALUATION IN TERMS OF FAIRNESS TO SINGLE-PATH FLOWS
In the previous section, we observed how multipath flows utilized the underlying network. In this section, we discuss how fairly they behaved toward single-path flows.
First, we analyzed what portion of the network capacity single-path flows could exploit. To estimate this clearly, we calculated the normalized throughput (Norm_Thpt) as follows:

$$\mathrm{Norm\_Thpt} = \frac{\mathrm{Thpt}}{\mathrm{Available\_BW}}$$

where Thpt and Available_BW denote the throughput obtained by a single-path flow and the actual BW available to that flow, respectively. Notably, Available_BW did not correspond to the available physical BW but to the fair BW available to each flow traversing the bottleneck. A Norm_Thpt equal to, less than, or greater than one indicates that a single-path flow fully utilized the available BW, underutilized it, or obtained more than its fair share, respectively. Fig. 12 presents Norm_Thpt for single-path flows in Scenarios #2-4; the actual available BW for single-path flows in these scenarios was 5, 3.3, and 0.5 Mbps, respectively. In Scenarios #2-3, when competing with C-MPBBR, both the BBR and CUBIC flows achieved Norm_Thpt values near one, significantly better than when competing with Han's MPBBR and U-MPBBR. However, the single-path Reno and CUBIC flows occupied more BW than their fair share while competing with LIA, OLIA, and BALIA. In Scenario #4, the single-path BBR and CUBIC flows achieved their highest Norm_Thpt values when competing with C-MPBBR, receiving approximately twice their fair share. This is because, in Scenario #4, C-MPBBR stopped SF-2 after some time to ensure better throughput, which released the BW of that path and enabled the single-path flows to utilize its total BW. This resulted in better throughput for both the multipath and single-path flows. When competing with the other MPTCP CCAs, the single-path flows showed a trend similar to that observed in Scenarios #2-3.
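The Norm_Thpt ratio and its three-way reading can be sketched as follows. The function names `norm_thpt` and `interpret` are hypothetical, and the `tol` tolerance for treating a measured value as "equal to one" is an assumption, since measured throughput never lands exactly on the fair share. The example reproduces the Scenario #4 reasoning: the fair share is 0.5 Mbps on the 1 Mbps path, so a single-path flow that takes the whole path after SF-2 stops has a Norm_Thpt of 2.

```python
def norm_thpt(thpt_mbps: float, available_bw_mbps: float) -> float:
    """Norm_Thpt = Thpt / Available_BW, where Available_BW is the flow's fair share."""
    if available_bw_mbps <= 0:
        raise ValueError("Available_BW must be positive")
    return thpt_mbps / available_bw_mbps


def interpret(value: float, tol: float = 0.05) -> str:
    # tol is a hypothetical tolerance for treating a measurement as "equal to one"
    if abs(value - 1.0) <= tol:
        return "fully utilized its fair share"
    return "underutilized its fair share" if value < 1.0 else "exceeded its fair share"


# Scenario #4: fair share of 0.5 Mbps on the 1 Mbps path shared with SF-2.
# If SF-2 is stopped and the single-path flow takes the whole 1 Mbps path:
n = norm_thpt(1.0, 0.5)
print(n, interpret(n))  # 2.0 exceeded its fair share
```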
Finally, for Scenarios #2-3, we calculated Jain's fairness index [41], [42], as shown in Fig. 13. We did not calculate it for Scenario #4 because, by design, C-MPBBR stopped SF-2 in that scenario to ensure better throughput, which made a fairness comparison inapplicable.
Moreover, in Scenario #3, SF-1 and SF-2 shared a common bottleneck with a single-path TCP flow. Here, SF-1 and SF-2 together should obtain a BW fairly close to that of the single-path TCP flow; i.e., to fulfill both Goals, the application using MPTCP should occupy approximately the same BW as, or slightly more BW than, the single-path TCP application. Therefore, in the fairness calculation for Scenario #3, we considered the total capacity used by the MPTCP application and compared it with the capacity used by the single-path TCP application.
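Jain's fairness index is (sum of x_i)^2 / (n * sum of x_i^2), which equals 1 when all entities receive equal throughput. The sketch below computes it after aggregating the MPTCP subflows into a single application-level entity, as described for Scenario #3. The helper names `jain_index` and `app_level` are hypothetical, and the throughput values are illustrative, not measurements from Fig. 13.

```python
from typing import List, Sequence


def jain_index(throughputs: Sequence[float]) -> float:
    """Jain's fairness index: (sum x)^2 / (n * sum x^2); 1.0 means equal shares."""
    n = len(throughputs)
    sum_sq = sum(x * x for x in throughputs)
    if n == 0 or sum_sq == 0:
        raise ValueError("need at least one nonzero throughput")
    return sum(throughputs) ** 2 / (n * sum_sq)


def app_level(mptcp_subflows: Sequence[float], single_path: Sequence[float]) -> List[float]:
    """Treat the whole MPTCP application as one entity competing with single-path flows."""
    return [sum(mptcp_subflows)] + list(single_path)


# Illustrative throughputs in Mbps.
fair = app_level([1.5, 1.5], [3.0])    # MPTCP app total 3.0 vs single-path 3.0
greedy = app_level([3.0, 3.0], [2.0])  # MPTCP app total 6.0 vs single-path 2.0
print(jain_index(fair))    # 1.0
print(jain_index(greedy))  # 0.8
```

Treating the two subflows as separate flows ([1.5, 1.5, 3.0]) would instead give about 0.89, penalizing a connection that is perfectly fair at the application level, which is why the subflows are aggregated first.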
In addition, we conducted experiments to observe how fairly C-MPBBR behaves when it competes with another C-MPBBR flow. We considered Scenarios #2-3 without background traffic. The multipath client started two ''iperf3'' [37] data flows, with the multipath server using C-MPBBR as the CCA. We measured the total capacity achieved by each of the two applications and calculated Jain's fairness index.
From Fig. 13, it is clear that C-MPBBR achieved the best fairness index among all considered CCAs when competing with BBR, CUBIC, and another C-MPBBR flow in both scenarios. As observed previously, Han's MPBBR and U-MPBBR acquired a higher Ag_bft owing to their greedy BW absorption; however, this ultimately blocked other flows at a shared bottleneck and violated Goal 2. Conversely, owing to their aggressive bias toward fairness, LIA, OLIA, and BALIA could not obtain a fair share for themselves, which violated Goal 1 and resulted in a poor fairness index.
We attribute the superior fairness of C-MPBBR to its mechanism for recognizing a shared bottleneck and to its fair BW allocation in such situations. These enabled C-MPBBR to ensure an appropriate BW share not only for itself but also for the competing flows.
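The allocation step can be illustrated with a small sketch. C-MPBBR is described as identifying SFs that share a common bottleneck and dividing the BtlBW share corresponding to one SF among them. The function `allocate_pacing_rates` below is hypothetical: it assumes the shared-bottleneck groups are already known (the detection step is not shown) and takes a group's share to be the largest BtlBW estimate among its members, which is an illustrative choice rather than the paper's rule. The estimates in the example are illustrative values.

```python
from typing import Dict, List


def allocate_pacing_rates(btlbw_mbps: Dict[str, float],
                          shared_groups: List[List[str]]) -> Dict[str, float]:
    """Divide one subflow's BtlBW share among subflows that share a bottleneck.

    Hypothetical sketch: shared_groups is assumed to come from some
    shared-bottleneck detector (not shown), and the group's share is assumed
    to be the largest BtlBW estimate among its members.
    """
    rates = dict(btlbw_mbps)  # subflows on disjoint paths keep their own estimate
    for group in shared_groups:
        share = max(btlbw_mbps[sf] for sf in group)
        for sf in group:
            rates[sf] = share / len(group)  # k subflows jointly take one share
    return rates


rates = allocate_pacing_rates(
    {"SF-1": 10.0, "SF-4": 10.0, "SF-5": 10.0},
    [["SF-4", "SF-5"]],
)
print(rates)  # {'SF-1': 10.0, 'SF-4': 5.0, 'SF-5': 5.0}
```

Splitting a single share rather than summing per-subflow estimates is what keeps the coupled subflows, taken together, near the BW of one single-path flow at the shared bottleneck.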

F. PERFORMANCE EVALUATION IN A COMPLEX NETWORK SCENARIO
In this section, we evaluated the performance of the considered CCAs in a complex network scenario, as shown in Fig. 14(a). In this scenario, the multipath client and server were connected via five different paths, denoted as SF-1 to SF-5. SF-2 and SF-3, as well as SF-4 and SF-5, traversed two separate common bottlenecks, and background traffic was present in all bottlenecks. The path of SF-1 had the highest BW and the lowest delay. It was also shared with a single-path flow; thus, the fair BW available to SF-1 was 10 Mbps. The key challenge for SF-1 was to fully utilize this BW while sharing it fairly with the single-path flow. SF-4 and SF-5 shared a common bottleneck with a BW of 20 Mbps. This bottleneck was also shared with a single-path flow, leaving a fair share of 5 Mbps for each SF; therefore, SF-4 and SF-5 should together take a BW equal to that of the single-path flow, i.e., approximately 10 Mbps. Finally, the common bottleneck shared between SF-2 and SF-3 had a very narrow BW of 1 Mbps, a high delay of 200 ms, and a packet loss rate of 0.3%. The fair share for each of these SFs was only 0.25 Mbps. Compared with the other SFs, packets traveling through SF-2 and SF-3 would often cause head-of-line blocking at the receiver. Therefore, it would be better to terminate these SFs and leave this bottleneck to the single-path flow. Fig. 14(b)-(c) show the Ag_bft when the background traffic was BBR/Reno and CUBIC, respectively. Notably, C-MPBBR achieved the highest Ag_bft because its algorithm stops the SFs with the lowest BW, always ensuring an incentive over single-path flows. Han's MPBBR and U-MPBBR could not achieve such performance because they continued sending packets through SF-2 and SF-3, which caused head-of-line blocking at the receiver. The poor performance of LIA, OLIA, and BALIA can be attributed to their aggressive bias toward fairness.

Fig. 14(d) presents Jain's fairness index for the complex network scenario. Here, we also measured the performance of C-MPBBR while it competed with another C-MPBBR flow in the absence of background traffic, following the procedure described in Section IV(E). Again, C-MPBBR achieved the best fairness index among the considered CCAs owing to its shared-bottleneck recognition and fair BW allocation. Han's MPBBR and U-MPBBR showed poor fairness indices owing to their greedy BW absorption, whereas the poor performance of LIA, OLIA, and BALIA stemmed from their aggressive bias toward fairness.
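The per-SF fair shares quoted for the complex scenario follow from treating the whole MPTCP connection as one flow at each bottleneck and splitting that flow's share evenly across its SFs. The function `fair_share_per_subflow` is a hypothetical helper, and the sketch assumes exactly one competing single-path flow at each of the two coupled bottlenecks, as the scenario description indicates.

```python
def fair_share_per_subflow(bottleneck_bw_mbps: float,
                           n_single_path_flows: int,
                           n_subflows: int) -> float:
    """Fair share of each MPTCP subflow at a shared bottleneck.

    The MPTCP connection as a whole counts as one flow, so it is entitled to
    bottleneck_bw / (n_single_path_flows + 1), split evenly across its subflows.
    """
    if n_subflows < 1 or n_single_path_flows < 0:
        raise ValueError("invalid flow counts")
    per_connection = bottleneck_bw_mbps / (n_single_path_flows + 1)
    return per_connection / n_subflows


print(fair_share_per_subflow(20.0, 1, 2))  # SF-4/SF-5 bottleneck: 5.0
print(fair_share_per_subflow(1.0, 1, 2))   # SF-2/SF-3 bottleneck: 0.25
```

These match the 5 Mbps and 0.25 Mbps values given for the scenario. The 0.25 Mbps share on a 200 ms, lossy path is small relative to the other SFs, which is consistent with the argument for terminating SF-2 and SF-3.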

G. PERFORMANCE EVALUATION IN THE REAL-WORLD INTERNET
Finally, we evaluated the performance of the considered CCAs in the real-world Internet. We set up a multipath client and server at two ends of the Daegu campus of Kyungpook National University. The multipath server computer was equipped with an Intel Core i5-9600K 3.7 GHz processor, 32 GB RAM (random-access memory), one Ethernet NIC (network interface controller), and one wireless NIC. The multipath client computer was equipped with an Intel Core i7-8750H 2.20 GHz processor, 20 GB RAM, one Ethernet NIC, and one wireless NIC. The Ethernet and wireless links had BWs of 10 Mbps and 2 Mbps, respectively. Fig. 15(a) illustrates the network scenario. The experimental setup process described in Section IV(A) was followed, and each experiment lasted 120 seconds.
Fig. 15(b) shows the achieved throughput. C-MPBBR achieved the highest throughput among the considered CCAs. Han's MPBBR and U-MPBBR showed a tendency similar to that of C-MPBBR owing to their reliance on BBR and their greedy BW absorption. However, LIA, OLIA, and BALIA exhibited comparatively low throughput owing to their aggressive bias toward fairness to single-path flows.

V. CONCLUSION
In the present paper, we considered the problem of developing an MPTCP CCA that can satisfy the fundamental challenges of MPTCP. Several existing MPTCP CCAs are so aggressively fair that they fail to utilize the underlying network properly, whereas others are so aggressive that they block other flows at a shared bottleneck. To address this issue, we proposed C-MPBBR, a novel BBR-based CCA for multipath scenarios. C-MPBBR was designed to be easily implemented on top of BBR v1. We deployed it in the Linux kernel and made the code available online.
We conducted extensive emulation and real-world experiments covering several critical scenarios and found that C-MPBBR successfully satisfied the fundamental challenges of MPTCP. It avoided being either excessively greedy for BW or aggressive in terms of fairness by striking an appropriate trade-off between the two. C-MPBBR fully utilized the underlying network while ensuring high throughput for both itself and the competing flows. Moreover, it provided an incentive over single-path flows while leaving a fair share for its rivals.
In future work, C-MPBBR can be further improved and extended by exploiting recent developments in BBR v1. Specifically, we expect that the proposed algorithm can serve as a basis for implementing the upcoming BBR v2 in multipath scenarios in the near future.