Fast Data Recovery for Improved Mobility Support in Multiradio Dual Connectivity

Data aggregation is one of the crucial features of the 3GPP Multi-Radio Dual Connectivity (MR-DC) technology. However, mobility events and radio link failures that occur during data aggregation may pose challenges in meeting the latency, reliability, and throughput key performance indicators (KPIs). Unlike single connectivity, a user equipment (UE) in MR-DC operation can experience such events in either of the two base stations (BSs) serving it. In typical MR-DC deployments, these events occur more frequently in the BS acting as the secondary node (SN), since the SN operates at a higher frequency band. In this paper, we show that handovers (HOs) and signal blockage events that occur at the SN can create out-of-order data reception or losses at the UE's Packet Data Convergence Protocol (PDCP) layer, making the application stop receiving data for up to hundreds of milliseconds and thus making it challenging to meet the KPIs defined for such applications. To mitigate this effect, we propose an intelligent and efficient mechanism that operates in the transmitting PDCP layer and significantly reduces the data interruption periods suffered by the application when the UE aggregates data and HOs or failures of the SN occur. Using LTE/NR testbed experiments, we show that the proposed mechanism achieves a high and stable aggregate throughput with near-zero interruption time and a data reliability of at least 99.999% without transport-layer retransmissions. The experiments are conducted with saturated TCP traffic and under link quality variations based on traces extracted from a Nokia-proprietary system-level simulator.

duplication, and/or provide mobility robustness through data offloading [2], [3], [4]. In MR-DC, the two BSs can use the same or different 3GPP radio access technologies (RATs) of Long Term Evolution (LTE) and 5G New Radio (NR). For ultra-reliable low-latency communications (URLLC) and enhanced mobile broadband (eMBB) use cases, MR-DC plays an important role in meeting certain KPIs defined for such use cases. For instance, MR-DC can help achieve a given reliability target without using time-consuming retransmission mechanisms such as hybrid automatic repeat request (HARQ) or automatic repeat request (ARQ), through data duplication. Alternatively, aggregating data from both BSs can improve the user data rate without significant hardware complexities.

The associate editor coordinating the review of this manuscript and approving it for publication was Tiankui Zhang.

In a typical MR-DC deployment, as illustrated in Fig. 1, one BS has macro cell coverage using frequencies in range 1 (FR1), i.e., below 7.125 GHz, while the other BS has small cell coverage and may use frequencies in range 2 (FR2), i.e., 24.25 GHz to 52.6 GHz [4], [5], [6]. In such a common scenario, user mobility causes the link using the FR2

time [2], [3]. In both cases, they are likely to trigger the upper-layer retransmission mechanisms. For instance, unlike UDP, the TCP receiver requests data retransmission when it detects sequence gaps, a.k.a. fast retransmission, or when already transmitted data has not been acknowledged within a given period at the TCP sender, a.k.a. retransmission by timeout. In these cases, the TCP receiver stops delivering new information to the application layer until the missing data is correctly received, increasing the data interruption time to several hundreds of milliseconds.
On top of that, the aggregate throughput is seriously affected since TCP reduces its congestion window. In this situation, meeting the KPIs defined for reliability- and latency-constrained applications such as low-latency eMBB is challenging [10]. Accordingly, in this paper, we quantitatively show the TCP performance degradation in such scenarios.

Despite the challenges that the data interruption time represents for the performance of MR-DC operation, the 3GPP has not defined any solution to tackle this problem. Indeed, the SN change procedure [1], which is specified to manage the frequent changes of the SN, makes the UE stop communicating via the SN link until the change from the serving SN (S-SN) to the target SN (T-SN) is completed, as depicted in Fig. 1(a). Moreover, the typical way to recover network connectivity from a radio link failure (RLF), i.e., the blockage of the radio link, in single connectivity (SC) operation is via a cell re-establishment procedure [11], [12]. However, when the SN link fails in MR-DC operation, a.k.a. secondary cell group (SCG) failure, such a solution is not specified by the 3GPP for MR-DC [8], [12], [13]. Therefore, the data buffered in the failing SN may be considered lost unless it can be transmitted to the UE using a new BS, as shown in Fig. 1(b). Unfortunately, this requires a new data forwarding procedure, which the 3GPP has not considered. Note that this scenario can also occur if the SN change procedure fails.

Most of the relevant research studies in MR-DC, such as [3], [14], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24], [25], have proposed, on the one hand, methods to aggregate data considering that the UE is static and no failures occur. On the other hand, studies have proposed methods to provide mobility robustness.
Yet, they do not consider the data aggregation.

The rest of this paper is structured as follows. Section II introduces the main technical aspects and the challenges that

In the following, we present the most important aspects related to signaling mobility management for SC and MR-DC, considering downlink traffic.

In SC operation, the UE is connected to a single BS. Hence, the mobility events that may trigger a HO are handled by the BS the UE is connected to and the core network (CN). In the typical break-before-make HO used in LTE, the UE can experience, at the radio level, typical data interruption times of 15-50 ms, but delays of hundreds of milliseconds can also occur [26].

or out-of-order data reception, the data interruption time experienced by the upper layers will be much higher than the one experienced by the physical layer [10].

Additionally, when an SCG failure occurs due to an RLF or SN change failure, the UE will no longer receive PDUs via the SN link. In this case, the PDCP PDUs buffered in the SN's RLC buffer, i.e., the ones transmitted but not received at the UE, and the ones in flight, i.e., the PDUs already split by the MN but which have not yet arrived at the SN, can be considered lost unless they are transmitted by a new BS.
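The amplification of the interruption time by in-order delivery can be sketched with a toy reordering buffer. This is an illustrative simplification, not the actual 3GPP PDCP reordering procedure (no t-Reordering timer or SN wrap-around is modeled): out-of-order PDUs are held back until the sequence gap is filled, so the upper layers see a longer stall than the radio link itself.

```python
# Toy sketch of PDCP-style in-order delivery (hypothetical, simplified):
# out-of-order PDUs are buffered until the sequence gap is filled, so the
# interruption seen by upper layers exceeds the radio-level interruption.

class ReorderingBuffer:
    def __init__(self):
        self.next_sn = 0      # next sequence number expected in order
        self.buffered = {}    # out-of-order PDUs awaiting the gap

    def receive(self, sn, payload):
        """Return the PDUs deliverable in order after receiving `sn`."""
        if sn < self.next_sn:
            return []         # stale/duplicate PDU: discard
        self.buffered[sn] = payload
        delivered = []
        while self.next_sn in self.buffered:
            delivered.append(self.buffered.pop(self.next_sn))
            self.next_sn += 1
        return delivered

buf = ReorderingBuffer()
assert buf.receive(0, "p0") == ["p0"]
assert buf.receive(2, "p2") == []                  # gap at SN 1: stalled
assert buf.receive(3, "p3") == []                  # still blocked by the gap
assert buf.receive(1, "p1") == ["p1", "p2", "p3"]  # gap filled: burst delivery
```

During the stall (SNs 2 and 3 buffered), the application receives nothing even though the radio link keeps delivering data, which is the effect quantified above.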

Most of the available research efforts on MR-DC have focused on developing flow control solutions for data aggregation, or methods to provide mobility robustness without data aggregation, instead of reducing the application's data interruption experienced during HOs or signal blockages. In the former case, flow control mechanisms mainly aim to maximize the user's aggregate throughput, reduce the end-to-end latency, maximize the throughput in one of the BSs, or achieve a minimum throughput for all users in both BSs [3], [14], [15], [16], [17], [18], [19], [20]. All theoretical and practical models presented in these works, and their evaluations, consider UEs without mobility and without service interruptions. In the latter case, there have been studies that explore the use of MR-DC as an alternative to the legacy HO. In these studies, the authors state that MR-DC can reduce the HO failure probability, the signaling exchange with the CN, the HO computational complexity, and the HO completion delay [21], [22], [23], [24], [25]. Furthermore, these studies consider that before the HO, the UE already has CP connectivity via both BSs, i.e., the split bearer is configured in the MN and SN. However, the user's data is always transmitted via only one BS, i.e., the SN, upon triggering the HO. For this, the traffic is forwarded from the MN to the SN.

Furthermore, several studies show the capability of MR-DC to reduce the negative impact of link blockages on the user application's KPIs. For instance, in [34], the impact of various system parameters on the user's ergodic capacity for dense mmWave deployments is studied. The authors demonstrate that using multiple degrees of multi-connectivity, i.e., multiple radio link connections, helps to increase the achievable capacity by enabling backup connections.
Additionally, in [35], the authors indicate that equipping the UE with multi-connectivity reduces the denial of service by up to seven times and the dropping probability by up to ten times when static and dynamic blockages appear at a density of one blocker per square meter. Moreover, the theoretical framework presented in [36] suggests that under a high-density BS deployment, extensive UE coverage, and short HO execution times, dual connectivity is sufficient to achieve the reliability target required by URLLC services in the presence of signal blockers. However, the multi-connectivity degree needed to support VR/AR services may be higher, especially in ultra-dense deployments. The authors in [37] state that blockages reduce the line-of-sight probability between the UE and BSs, implying that the UE has fewer available BSs to connect with in the area. This, in turn, increases the HO likelihood in BSs that use mmWaves. As a result, having multiple radio link

However, this procedure is time-consuming and can take hundreds of milliseconds, possibly making the PDCP reordering mechanism discard the PDCP PDUs pending at the SN, since they will likely arrive at the UE within a different reordering window. In other words, the sequence number of the received PDU(s) will be lower than that of the last PDU delivered to the upper layers, as illustrated in Fig. 4. Moreover, since the non-delivered data is probably present in the failing SN's RLC buffer, it can be re-routed and transmitted by a new BS, e.g., via the MN. However, the data must be re-routed through a backhaul link with a non-zero delay, which makes it challenging to meet the latency requirements of some applications. For instance, according to [10], low-latency eMBB applications require that the data re-routing be completed in less than 10 ms.
In this regard, our proposed fast data recovery mechanism for MR-DC addresses the abovementioned challenges.

Our FaRe mechanism aims to minimize the data interruption time that the application experiences during SN change or SCG failure events. To achieve this, the FaRe locally and temporarily stores the PDCP PDUs split via the SN. Therefore, the MN can timely retransmit the missing PDUs when one of the aforementioned events occurs. In this regard, the FaRe avoids the time-consuming higher-layer data retransmissions, e.g., at the TCP level, which can be required when an SCG failure occurs. Likewise, the FaRe avoids the slow data forwarding procedure required during an SN change. The FaRe works along with a flow control algorithm to facilitate the data splitting management, i.e., stop/pause/resume the splitting, during the SN change or failure events. The FaRe has three main functional stages: the buffering, the fast retransmission, and the splitting activation stages, which are depicted in Fig. 5

forward the data present in its RLC buffer to a new BS.

    if DDDS Type 1 is received then
        while FaRe-PDU ≤ ACK_SN do
            Delete the FaRe-PDU
    else
        Continue

sends either the SN Release Request or SN Change Confirm messages. Actually, the MN can even stop the data splitting earlier if the radio link conditions experienced between the UE and S-SN are not favorable for maintaining connectivity. Upon receiving the SN Release Request or SN Change Confirm message, the S-SN stops communicating with the UE and releases the radio resources assigned to the corresponding UE [1]. For this reason, the S-SN prepares and sends through the X2/Xn interface a DDDS report that includes the latest delivered/transmitted PDCP PDUs and the indication that this report is the final one, i.e., the Final Frame Indication flag is activated.
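The buffering-stage behaviour above can be sketched in Python. The class and method names are illustrative assumptions, not the testbed implementation: the MN stores a copy of every PDU split via the SN and prunes the copies up to the ACK_SN reported in each DDDS.

```python
from collections import OrderedDict

class FaReBuffer:
    """Sketch (hypothetical API) of the FaRe buffering stage: the MN keeps
    a copy of every PDCP PDU split via the SN until a DDDS report confirms
    delivery, so the copies can be retransmitted on SN change/failure."""

    def __init__(self):
        self.pdus = OrderedDict()   # sn -> PDU copy, in splitting order

    def store(self, sn, pdu):
        """Buffering stage: copy each PDU handed to the SN leg."""
        self.pdus[sn] = pdu

    def on_ddds(self, ack_sn):
        """Delete every buffered copy with SN <= ack_sn (acknowledged)."""
        for sn in [s for s in self.pdus if s <= ack_sn]:
            del self.pdus[sn]

fare = FaReBuffer()
for sn in range(10):
    fare.store(sn, f"pdu-{sn}")
fare.on_ddds(6)                     # DDDS acknowledges SNs 0..6
assert sorted(fare.pdus) == [7, 8, 9]
```

With DDDS reports arriving periodically, only the PDUs split since the last acknowledged SN remain buffered, which keeps the memory footprint at the MN small.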
    if Head's RLC SDU is segmented then
        Place the FaRe-PDUs after the segmented RLC SDU
    else
        Place the FaRe-PDUs before the RLC SDUs
    Flush the FaRe-Buffer
else if SCGFailureInformation received then
    Notify the flow control to stop data splitting via the SN
    Read the PDCP Status Report
    ACK_SN = First Missing PDU
    while FaRe-PDU < ACK_SN do
        Acknowledge the FaRe-PDUs except the non-received PDUs included in the bitmap
    Update the FaRe-Buffer
    if Head's RLC SDU is segmented then
        Place the FaRe-PDUs after the segmented RLC SDU
    else
        Place the FaRe-PDUs before the RLC SDUs

is the last MAC SDU size statistic received from the S-SN just before the MN sends the SN Change Confirm or SN Release Request message, and RLC_delay is the initial value to use for the RLC buffering delay. It is worth mentioning that during a HO, the quality of the radio link conditions does not allow the UE to achieve high data rates. However, in MR-DC operation, the BSs must assure a minimum of radio resources for the UE to achieve a given minimum data rate. For this reason, the FaRe uses the minimum MAC SDU size statistic received during the period mentioned above as the initial value for the MAC SDU size variable. Likewise, since the T-SN's RLC buffer is empty after the RA procedure, the FaRe indicates the CCW to use 0 ms as the initial value for the RLC buffering delay variable.

where PDU_delay is the PDCP PDU transmission delay. Since during and right after the completion of the RA procedure the T-SN has no data in its RLC buffer, the FaRe indicates the Delay-based to use 0 ms as the initial value for the PDCP PDU transmission delay.
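The selection of PDUs to re-route in the fast retransmission stage can be sketched as follows. This is one simplified interpretation of the status-report handling, with hypothetical helper names: everything below the first missing PDU is taken as delivered, and above it the report's bitmap tells the MN which buffered copies were already received.

```python
def pdus_to_retransmit(buffered_sns, first_missing_sn, received_after_fmc):
    """Sketch (hypothetical helper) of the fast retransmission stage: given
    the FaRe-Buffer contents, the first missing PDU from the UE's PDCP
    Status Report, and the set of SNs above it that the report's bitmap
    marks as received, return the SNs the MN must retransmit via its own
    Uu link after an SCG failure."""
    retransmit = []
    for sn in sorted(buffered_sns):
        if sn < first_missing_sn or sn in received_after_fmc:
            continue                # delivered to the UE: acknowledge and drop
        retransmit.append(sn)       # missing at the UE: re-route via the MN
    return retransmit

# SNs 10..19 are buffered; the UE reports SN 15 as first missing while the
# bitmap confirms that SNs 16 and 18 did arrive out of order.
assert pdus_to_retransmit(set(range(10, 20)), 15, {16, 18}) == [15, 17, 19]
```

Only the genuinely missing PDUs (15, 17, 19) are handed to the MN's RLC layer, which is what keeps the recovery within a few milliseconds instead of waiting for TCP.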

Regardless of the flow control algorithm used by the MN to split the incoming data, the FaRe indicates to the flow control algorithm to use the initial values for its variables until the MN receives up-to-date statistics from the T-SN. Likewise, the FaRe's Buffering Stage is initiated as soon as the incoming traffic is split via the SN link. Note that after an SCG failure event, the UE switches to SC operation. However, the UE may recover the MR-DC operation, and thus the data aggregation, if the MN initiates the procedure to re-establish the connection with the failing SN [12]. On the contrary, if the MN decides to add a new SN, the data aggregation is started from scratch; thus, the Traffic Activation Stage is not applicable in this case. The splitting activation stage is presented in Algorithm 3.
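The initialization described above can be sketched with a small helper (the variable names and dictionary layout are illustrative assumptions): until fresh T-SN statistics arrive, the flow control starts from the minimum MAC SDU size observed at the S-SN just before the change, and a 0 ms buffering delay since the T-SN's RLC buffer is empty after the RA procedure.

```python
def initial_flow_control_state(s_sn_sdu_sizes):
    """Sketch (hypothetical structure) of the splitting activation stage:
    conservative initial flow-control values used by the MN until the
    first up-to-date DDDS statistics arrive from the T-SN."""
    return {
        # HO-time link quality is poor, so start from the minimum MAC SDU
        # size seen at the S-SN before the SN Change Confirm / Release.
        "mac_sdu_size": min(s_sn_sdu_sizes),
        # The T-SN's RLC buffer is empty right after the RA procedure.
        "rlc_buffering_delay_ms": 0.0,
        # Marked stale: replaced as soon as a T-SN DDDS report is received.
        "stale": True,
    }

state = initial_flow_control_state([1466, 980, 1200])
assert state["mac_sdu_size"] == 980
assert state["rlc_buffering_delay_ms"] == 0.0
```

Starting from conservative values avoids overfilling the T-SN's RLC buffer before its first DDDS report replaces them.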

To validate our proposed FaRe mechanism, we implemented a Dual Connectivity (DC) [27] solution on an LTE/NR testbed developed using the OpenAirInterface (OAI) software [44]. The testbed is based on the split DRB architecture and implements the user plane functionalities of DC detailed in [9].

    Set the flow control variables accordingly.
    if initial DDDS report received then
        if CCW enabled then
            Set SDU_DC = SDU_initial

If the S-SN has no data to forward to the T-SN, the latter has no data to transmit to the UE once the RA procedure is completed. Thus, the MN will not resume the data splitting.

To evince and compare the variability of the throughput caused by SN change and SCG failure events in the entire data session, we use the Variance Ratio (R_var) [21], which is defined as

    R_var = δ_{T_DC} / T̄_DC,

where T̄_DC is the average aggregate throughput obtained by the application at the end of an experiment and δ_{T_DC} is its standard deviation. Note that high values of R_var indicate significant throughput instability, such as long periods of zero throughput or short periods with very high throughput peaks.

3) DATA RELIABILITY

When a UE aggregates data, the main goal is to maximize the obtained throughput. However, achieving a given reliability target while maximizing the throughput may be challenging for some applications during SN change or SCG failure events. In this regard, we evaluate the reliability obtained at the PDCP level with and without the use of the FaRe mechanism. For this, we compare the number of PDCP PDUs that leave the MN's PDCP layer with the PDCP PDUs received in the mirroring layer at the UE during the entire data session. The PDCP reliability (R_PDCP) is defined as

    R_PDCP = PDUs_RX / PDUs_TX,

where PDUs_RX is the number of PDCP PDUs that are successfully received at the UE, and PDUs_TX is the number of PDCP PDUs that are split by the MN and leave the PDCP layer to be transmitted via either BS.

When the UE aggregates data, the interruption time experienced at the transport and/or application layers is influenced by the out-of-order arrival of PDCP PDUs or by PDCP PDU losses. In this regard, the interruption time increases the longer the PDUs spend in the PDCP reordering buffer. Unlike UDP, TCP is a reliability-oriented protocol, so it must provide in-sequence delivery to the application. Therefore, if TCP sequence gaps are detected, the application will not receive data until the lost packet is correctly recovered by TCP. For this reason, we measure the elapsed time during which the transport layer stops receiving data in the data session. Note that iperf3 measures the throughput at the transport layer, but these results also represent the data interruption time at the application level.
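The reliability metric is a direct ratio of counters, which can be sketched as:

```python
def pdcp_reliability(pdus_rx, pdus_tx):
    """R_PDCP = PDUs_RX / PDUs_TX: fraction of the split PDCP PDUs leaving
    the MN's PDCP layer that are successfully received at the UE's PDCP
    (sketch of the metric as defined in the text)."""
    return pdus_rx / pdus_tx

# Five-nines reliability allows at most 1 lost PDU per 100,000 transmitted.
assert pdcp_reliability(99_999, 100_000) >= 0.99999
assert pdcp_reliability(1_000_000, 1_000_000) == 1.0
```

Counting at the PDCP level (rather than at TCP) isolates the losses caused by the SN events from any transport-layer recovery.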

To recreate a mobility scenario on the DC testbed described in Section V-A, we use the signal-to-interference-plus-noise ratio (SINR), channel quality indicator (CQI), and reference signal received power (RSRP) traces extracted from the Nokia-proprietary system-level simulator for the MN and SN, using a 3GPP-defined DC scenario detailed in [47].

In this subsection, we evaluate the performance of the FaRe mechanism against the Baseline strategies described in Section V-B. Fig. 7 illustrates the CDF of the aggregate throughputs obtained using two t-Reordering values, i.e., 100 and 300 ms, which help us to visualize the behaviour of the aggregate throughput under two different configurations.

The Baseline_1 strategy achieves the worst performance among the compared methods. In this case, Fig. 7 shows that the probability of having zero throughput values is more significant than in the other cases. Indeed, the probability is around 1% only when the t-Reordering is 100 ms. In the

during the events mentioned above. Some applications, such as LL-eMBB or real-time applications, may not tolerate extended periods of zero throughput. Hence, it may be more beneficial to have a continuous data flow rather than short periods of zero and very high throughputs. Contrary to the results obtained with the Baseline strategies, the FaRe achieves a stable throughput with an average of 25.5 Mbps regardless of the reordering timeout value. As a significant advantage, the FaRe's instantaneous throughputs never reach abnormal zero or very high peak values, as illustrated in Fig. 7.

To study the data interruption time, we rely on the periodic throughput reports delivered by the iperf3 tool. In this regard, we measure the periods of zero throughput during the entire data session for each experiment. For this, the throughput reports are collected in our experiments every 100 ms for both scenarios. The results shown in Fig. 11 represent the average data interruption time experienced by the transport layer after running 20 experiments for the FaRe and every benchmarking strategy.
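The interruption-time measurement from the periodic reports can be sketched as follows (a hypothetical helper, assuming the 100 ms reporting interval used in the experiments):

```python
def interruption_time_ms(reports_mbps, interval_ms=100):
    """Sketch of the interruption-time measurement: sum the duration of
    zero-throughput intervals in the periodic iperf3 throughput reports
    (collected every 100 ms in the experiments)."""
    return sum(interval_ms for r in reports_mbps if r == 0.0)

# One SN event causing three consecutive zero-throughput reports.
reports = [25.3, 25.6, 0.0, 0.0, 0.0, 24.9, 25.4]
assert interruption_time_ms(reports) == 300
```

The 100 ms reporting interval bounds the resolution of the measurement: an interruption shorter than one interval may be invisible, which is why near-zero measured interruption for the FaRe is the strongest result this method can show.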

The results depicted in Fig. 11(a), for the SN change scenario, show the effectiveness of the FaRe in avoiding data interruption periods. On the other hand, it can be seen that Baseline_2 yields the lowest average interruption time among the Baseline strategies. At a glance, the obtained interruption time may not represent a significant problem in scenarios where throughput stability and data reliability are not the primary concern. However, for latency- and reliability-constrained applications, Baseline_2 is not an option to consider. Even though the Baseline strategies and the FaRe achieve, on average, a similar aggregate throughput, as shown in Fig. 7, the SN change

do not arrive at the UE in time to fill the PDCP sequence gap, the transport layer will not receive the expected data.

For reliability-oriented protocols such as TCP, the transport layer will retransmit the missing data, significantly reducing the throughput and increasing the data interruption periods.

Since FaRe-PDUs are acknowledged by the MN every 5 ms, the memory allocation requirements at the MN are negligible. In fact, the FaRe-Buffer usage results, depicted as Buffer_Size in Fig. 12, show that during a data session of 20 seconds, on average, 20 PDUs are present in the FaRe-Buffer. In our LTE/NR testbed, the iperf3 tool generates packets of fixed size, creating PDCP PDUs of 1466 bytes. Hence, the average FaRe-Buffer size corresponds to 29.3 KBytes, which has a negligible impact on the overall MN's performance. It is worth mentioning that the buffer demand may slightly increase in the case of higher throughput demands and larger bandwidth sizes.
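The buffer footprint above is a simple product, which can be checked directly:

```python
def fare_buffer_kbytes(avg_pdus, pdu_size_bytes):
    """Check of the FaRe-Buffer footprint reported in the text: with PDU
    copies acknowledged every 5 ms, on average ~20 PDUs of 1466 bytes sit
    in the buffer at the MN."""
    return avg_pdus * pdu_size_bytes / 1e3  # decimal kilobytes

# 20 PDUs x 1466 bytes = 29,320 bytes, i.e., about 29.3 KBytes.
assert round(fare_buffer_kbytes(20, 1466), 1) == 29.3
```

At higher data rates, the average occupancy scales with the SN-leg throughput times the 5 ms acknowledgment period, so the footprint stays small even for larger bandwidths.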

Moreover, during an SCG failure event, the FaRe retransmits via the MN's Uu interface, on average, 15 FaRe-PDUs, as can be seen with the variable Re-routed in Fig. 12. The first of these PDUs arrives at the UE's PDCP in approximately 5-8 ms. This delay, shown as R_Delay in Fig. 12, matches the theoretical delay computed with (2). Note that the measurement of this delay starts once the MN receives the SCGFailureInformation message and ends when the UE's PDCP layer receives the first FaRe-PDU.

Additionally, we noted in our experiments that in some SN change events, some PDUs that were initially transmitted via the SN arrive at the UE's PDCP after the FaRe-PDUs. This happens when several HARQ retransmissions were required in the UE-SN path to decode the transport block correctly. During the evaluated period, the UE received, on average, 6 duplicated PDUs, as depicted with the variable name Duplicated in Fig. 12. This random event has a negligible impact on the performance of the FaRe, since the throughput, reliability, and data interruption are not affected.

A fast data recovery mechanism that minimizes the data interruption time experienced by the application in MR-DC scenarios with mobility is presented in this article.

and support of mission-critical applications. She has coauthored more than 30 peer-reviewed publications and four book chapters, and she is the inventor of numerous patents on a wide range of topics.