$\mathsf{HxL3}$: Optimized Delivery Architecture for HTTP Low-Latency Live Streaming

While most HTTP adaptive streaming (HAS) traffic continues to be video-on-demand (VoD), more users have started generating and delivering high-quality live streams through popular online streaming platforms. Typically, the video content is generated by streamers and watched by large audiences that are geographically distributed far from the streamers' locations. This distribution of streamers and audiences creates a significant challenge in delivering HAS-based live streams with low latency and high quality; any problem in the delivery paths results in a reduced viewer experience. In this paper, we propose $\mathsf{HxL3}$, a novel architecture for low-latency live streaming. $\mathsf{HxL3}$ is protocol and codec agnostic and can work equally well with existing HAS-based approaches. By holding the minimum number of live media segments at the edge through efficient caching and prefetching policies, improved transmissions, and transcoding capabilities, $\mathsf{HxL3}$ is able to achieve a high viewer experience across the Internet by alleviating rebuffering and substantially reducing initial startup delay and live stream latency. $\mathsf{HxL3}$ can be easily deployed and used. Its performance has been evaluated using real live stream sources and entities that are distributed worldwide. Experimental results show the superiority of the proposed architecture and give good insights into how low-latency live streaming works.

Experience (QoE) [2]. HAS is currently standardized in two main formats: (i) MPEG Dynamic Adaptive Streaming over HTTP (DASH) [3] and (ii) Apple HTTP Live Streaming (HLS) [4]. HAS live video traffic generated by various online streaming platforms continues to increase exponentially; it currently accounts for more than 17% of all Internet traffic, with a projected three-fold growth from 2018 to 2023 [5]. Advances in video streaming technologies and delivery networks, and the proliferation of online video platforms (e.g., YouTube, Twitch, Facebook), have enabled users to stream and watch live content on a plethora of devices, including, but not limited to, mobile devices [6]. Additionally, more users are now generating high-quality live streams directly from their devices through these online platforms to reach many viewers distributed across different locations worldwide [7]. For instance, a streamer can distribute a live concert to multiple audiences that are geographically distributed across various locations. These online platforms try to satisfy the increasing demand for more interactivity and higher quality, which in turn attracts even more users [7].

A. Problem Description and Motivation
With the emergence of a new class of live streaming applications, it is very important to provide a high user experience while maintaining low latency. Currently, one popular area of development in HAS is low-latency live (L3) streaming [8], where the goal is to achieve an end-to-end (E2E) latency in the range of three to five seconds when delivering live content from a live source to multiple viewers. Ensuring a short E2E latency, i.e., the time difference between source video capture and player video playback (also referred to as glass-to-glass latency), is not straightforward, as this time spans several processes in the delivery workflow: the video has to be captured, encoded, packaged, and transferred from a server to a client, which may be far away from each other and subject to unpredictable network conditions. Thus, to satisfy the main requirements of L3 streaming applications, four key challenges need to be addressed: 1) Conflicting QoE Objectives: Latency is often not the only consideration in viewers' desired QoE. Other conflicting factors, such as high video quality, a low rebuffering rate, and few quality switches, also need to be considered. For example, playing the video at the highest possible quality may lead to frequent stalls, which increase the latency and conflict with the goal of ensuring high video stability. 2) Challenging TCP Transmission: HAS typically uses the Transmission Control Protocol (TCP) as its transport layer protocol. In the common scenario where users (streamers and viewers) are far apart and distributed worldwide, it is well known that TCP achieves poor bandwidth over network paths with a long round-trip time (RTT) [9]. This is mainly caused by the slow-start and congestion control mechanisms, degrading viewers' experience. The issue becomes worse when the E2E content delivery path involves a combination of wired (i.e., long-distance core) and wireless (i.e., short-distance fronthaul) networks with heterogeneous characteristics. Even if the fronthaul network provides high bandwidth capacity, the L3 application requirements still cannot be guaranteed due to a potential TCP bottleneck in the core network or packet losses in the wireless network. 3) Streamlining and Optimizing the Delivery Path: In addition to long paths, E2E latency in HAS also includes a number of specific delays for capturing, ingesting, encoding, packaging (including possible encryption), publishing to the origin, delivering through a content delivery network (CDN), player buffering, decoding, and rendering, most of which are proportional to the segment duration. To achieve low latency, all these processes have to be streamlined and optimized.

4) Achieving Fair Latency and Reducing Load: The experienced latency should be fair among all the players, with much tighter synchronization among them at scale. Under heavy loads, the origin server gets overloaded by multiple requests from a large number of viewers watching the same live stream, which typically results in replicated requests and thus undesirable performance. In addition, traversing all the way to the origin, as well as responding to each request, creates significant overhead in the network and causes congestion.
One way to achieve low latency is to leverage the additional storage and bandwidth resources at the edge, near the viewers, to compensate for poor TCP performance via optimized caching and/or prefetching policies [10]. While this solution achieves good performance for on-demand video streaming, it shows many limitations in L3 streaming, as media segments are still being produced at the encoder [11]. Another way is to use very short segment durations (e.g., ≤ 0.5 s), which however has its own drawbacks [12]; for instance, the overhead grows as the numbers of HTTP requests and responses increase, the player experiences frequent quality switches, and the encoder efficiency is reduced. To address these issues while still using reasonable segment durations, recent advances in L3 streaming technologies combine MPEG's Common Media Application Format (CMAF) standard [13] and HTTP/1.1 Chunked Transfer Encoding (CTE) (RFC 7230). In this case, long segments are generated but transported in small non-overlapping chunks, significantly reducing the serial delays between the video source and viewers and, thus, the E2E latency [14]. While this solution was a significant step forward in L3 streaming, recent experimentation has shown that if one entity in the delivery workflow does not support CTE and is not optimized for L3 delivery, it will undo the benefits of CMAF and CTE [15].
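To make the CMAF/CTE mechanism concrete, the following minimal sketch (not taken from the paper's implementation) shows how an HTTP/1.1 server can push CMAF chunks of a segment that is still being encoded, using Chunked Transfer Encoding; the function name and the chunk iterator are illustrative assumptions.

```python
import socket

def serve_segment_chunked(conn: socket.socket, chunk_iter) -> None:
    """Send CMAF chunks over HTTP/1.1 CTE as soon as the packager emits them."""
    conn.sendall(
        b"HTTP/1.1 200 OK\r\n"
        b"Content-Type: video/mp4\r\n"
        b"Transfer-Encoding: chunked\r\n\r\n"
    )
    for cmaf_chunk in chunk_iter:                   # e.g., one 33 ms chunk at a time
        size_line = f"{len(cmaf_chunk):x}\r\n".encode()   # chunk size in hex
        conn.sendall(size_line + cmaf_chunk + b"\r\n")
    conn.sendall(b"0\r\n\r\n")                      # zero-length chunk closes the body
```

Because the response body length is unknown while the segment is still being encoded, the zero-length terminating chunk is what finally closes the segment; any intermediary that buffers the full response instead of forwarding chunks as they arrive reintroduces the serial delay.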

B. Overview of HTTP-Based Low Latency Live Streaming
In order to address the issues identified in the previous subsection, we design a novel E2E video delivery architecture for HAS-enabled L3 streaming, termed HxL3 (HTTP/x-based Low-Latency Live streaming architecture). HxL3 is protocol and codec agnostic: it can be deployed with any application protocol (HTTP/1.1 or HTTP/2.0), streaming format (DASH, HLS, or CMAF), and transport protocol (TCP or UDP), and it is compliant with standard HAS-based encoders, servers, and CDNs. It addresses the above-mentioned challenges as follows: (Problem 1) It considers various QoE objectives in the form of an optimization model (i.e., binary integer linear programming) whose objective is to ensure the best balance between the QoE factors without impacting the user experience under various network conditions. (Problems 2-3) It introduces virtual reverse proxies located at the origin side (denoted by VRP_o) and at the edge (denoted by VRP_e) for efficient transmission between the endpoints (streamers and viewers). (Problem 4) It uses the VRP_e, which manages all the player requests by aggregating them based on the requested live channel, sends only one request back to the origin side via the VRP_o, and then delivers the requested segments to the viewers at the same time for fair latency and synchronized playback. Moreover, it uses the transcoding capabilities of the VRP_e (which holds all the segment representations sent by VRP_o) to alleviate the impact of variable network conditions.
HxL3 strives to fulfill the requirements of L3 applications, including high quality, low latency, and minimal rebuffering, ensuring the best possible viewing QoE. In a nutshell, HxL3 is depicted in Fig. 1.

II. RELATED WORK

The authors of [17] proposed a framework for real-time streaming and transcoding at the edge of the network. They aggregate clients' requests at the edge servers and then transfer the highest requested bitrate from the origin server to the edge servers. Ge et al. [11] designed the Edge-based Transient Holding of Live sEgment (ETHLE) architecture for 5G-enabled networks. ETHLE aims to address the impact of TCP slow-start on interactive live video streaming applications by performing context-aware transient holding of video segments at the mobile edge with virtualized content caching capabilities. Similarly, Drago et al. [18] presented a reliable video streaming architecture for mmWave 5G networks based on multi-connectivity and network coding. Multi-connectivity provides continuous coverage with LTE and high capacity with mmWave, while network coding simplifies the management of multi-path video delivery and thus provides additional robustness. Yang et al. [19] leverage the multi-access edge computing (MEC) architecture to implement an MEC-enhanced mobile video streaming service. The MEC server implements three main functions: popular video caching, radio analytics, and optimized media quality adaptation. Following the same architectural design, Xu et al. [20] designed an MEC-enabled architecture that combines content caching and ABR streaming technology. The proposed solution formulates joint cache and radio resource allocation as a matching problem and solves it using a game-theoretic framework. Tuysuz et al. [21] designed a collaborative QoE-based mobility-aware video streaming scheme deployed at MEC servers.
Solutions for Unconstrained Latency: Bagci et al. [22] proposed centralized and distributed architectures for collaboration between the Internet service provider (ISP), the video service provider (VSP), and DASH video players to provide ISP-managed or VSP-managed DASH services over software-defined networks (SDN) with quality-of-service (QoS) reserved network slices. Bhat et al. [23] implemented an SDN-based video streaming architecture that leverages SDN to build an assisted control plane that guides video players in their ABR decisions and CDNs in better caching. Bentaleb et al. [24] designed an SDN-based dynamic resource allocation and management architecture for HAS systems that aims to alleviate QoE degradation when multiple video players share the same network conditions. With the same objectives, Bentaleb et al. [25] proposed a QoE-aware SDN-based bandwidth broker and management solution for HAS traffic in a hybrid fiber-coax network, which dynamically selects the optimal joint representation and the respective bandwidth allocation decisions to meet per-session and per-group QoE objectives. For QoE optimization, two interesting data-driven ABR schemes, termed CFA [26] and Pytheas [27], were designed and deployed over a central manager. CFA automatically learns and determines a set of critical features (e.g., CDN, client geographical region, video content, representation list, etc.) for different streaming sessions, and then it tries to find the best CDN and bitrate decisions by accurately predicting the video quality of each client. In contrast, Pytheas was proposed to eliminate CFA's limitations, such as its inability to respond quickly to sudden bandwidth variations and its susceptibility to known biases (i.e., incomplete visibility) across streaming sessions. Similarly, Sun et al. [28] proposed CS2P, a data-driven throughput prediction framework. CS2P uses a hidden Markov model that takes the bandwidth history and state transitions into account to predict the available bandwidth. Mehrabi et al. [29] developed a network-assisted adaptation solution for DASH systems. The proposed solution facilitates the access of multiple mobile clients to a set of replicated video contents over multiple edge servers, considering the joint weighted maximization of viewer QoE and group fairness.
Most of these solutions made specific assumptions that do not reflect real-world deployments and did not systematically investigate the impact of the live video delivery workflow. Hence, their performance for L3 streaming in volatile real-world deployments with variable network conditions may be suboptimal.

III. HxL3 ARCHITECTURE
The delivery workflow of HxL3 is highlighted in Fig. 1, which comprises two essential parts: video contribution and video distribution. Each part contains a set of core entities: live source, ABR encoder, origin, VRP origin (VRP_o), standard legacy CDN cloud, multiple VRP edges (VRP_e), and HAS players. First, video contribution is the process of capturing the live video content via a live source (e.g., a mobile phone), ingesting it for encoding and packaging in one of the HAS formats via an ABR encoder, publishing it to the origin server for ingestion, and then to the VRP_o for distribution over the Internet. Second, video distribution is the process of transporting the packaged media segments to the CDN cloud and then to the various VRP_e's located at different last-mile edge networks one hop from the players. Each VRP_e caches the minimum number of segments after receiving requests from the players to join various live sessions. Then, the corresponding VRP_e distributes the media segments at scale across many concurrent HAS viewers per live stream. We note that the default communication from VRP_o to the VRP_e's is performed using standard UDP. In this case, packet loss may increase, which may lead to cache misses. To avoid this issue, we include a pull-based technique that is triggered in case of segment loss or cache miss. The details of both VRPs are described below.

Fig. 2 illustrates the detailed internal architectures of both types of VRPs. For ease of explanation, let us describe them in a top-down approach. To meet some important and desirable quality of service constraints (e.g., agility, optimized resource allocation, accessibility, and availability), as well as to provide comprehensive management capabilities, we propose three local management layers and one global management layer for both VRP_o and VRP_e as follows (Fig. 2): 1) Global Management Layer (GML): GML provides a holistic view of the system. It monitors and extracts useful information from both types of VRPs and also advertises the available live services in the VRP_o's. Moreover, the GML controls the three local layers created underneath it, e.g., in the case of using the software-defined networking (SDN) concept, where each instance can be considered as a VRP_e or VRP_o. 2) Local Fault Management (LFM) Layer: Upon a possible VRP failure, the LFM redirects incoming requests for that VRP to the origin server until another instance is started. This function is exemplary, however, and LFM has the potential to be improved by adding more fault-tolerance mechanisms, which is not the focus of this work.

3) Local Resource Management (LRM) Layer: To optimize resource consumption by VRPs in order to cope with unexpected traffic loads (e.g., flash crowds [30]), we introduce the LRM layer. Since both types of VRPs are designed based on the virtualization concept, LRM can apply useful methods such as scale-up/down to improve the efficiency of resources. 4) Local Request Management (LQM) Layer: In the proposed architecture, all requests from the players (received by VRP_e) or from the VRP_e's (received by VRP_o) are collected by the interfaces defined in the VRPs (these interfaces are explained in the following section). The LQM layer manages the incoming requests to the VRPs in order to maintain the performance of the VRP instances in cases such as a lack of VRP resources or the existence of a fault in a VRP.

A. VRP_e Architecture
As depicted in Fig. 2, all VRP_e instances are executed under the supervision of the four aforementioned management layers. Each VRP_e instance consists of the following main components: two virtual east-west interfaces for communicating with HAS players and VRP_o's, a players' requests analyzer, a service optimizer, and a cache agent. In the following, we describe them in detail:

1) Virtual East-West Interfaces (V_e2P and V_e2V_o):
In the design of HxL3, each VRP_e serves as a one-hop edge server for players. It gathers requests, processes them, and then sends a subset (or all) of the requests to a VRP_o through a CDN cloud to receive the requested media segments. The details of request analysis are described under the Requests Analyzer component. These operations are accomplished in cooperation with the other internal components. Here, we introduce two virtual interfaces, namely VRP_e-to-Player (V_e2P) and VRP_e-to-VRP_o (V_e2V_o), for communication between the various entities in HxL3. Note that all the communication and traffic between VRPs goes through a CDN cloud with full CDN capabilities.
2) Requests Analyzer (RA): After the defined east-west interfaces gather requests, VRP_e extracts useful data (i.e., the requested video channel, segment number, and quality) and processes them to make appropriate decisions. In HxL3, VRP_e needs to send the following requests/commands to the VRP_o: (i) launching a push process, (ii) changing the quality of media segments, and (iii) terminating the push process. As an initial operation, the extracted data enables VRP_e to group similar requests, which provides many advantages, such as optimizing bandwidth consumption by sending only one request for each group. The first request received by the VRP_o is considered the command to launch a push process. Then, VRP_o uninterruptedly sends the segments to the requesting VRP_e. Note that VRP_e should modify the URL in the HTTP request message so that it is redirected to the designated origin or CDN server. VRP_o does not change the quality of segments unless it receives command (ii). Eventually, the segment transmission is terminated when VRP_e sends the third command. Given these operations, we define RA's main responsibilities as follows: gathering players' requests, performing an initial processing step, and updating the meta-data tables. Suppose that we start VRP_e with no players at the beginning. The RA first creates two tables: (1) AllReqTBL: [PlayerID, ChannelID, Bitrate], and (2) GroupedReqTBL: [ChannelID, Bitrate, Quantity].
Then, after receiving the first HTTP request, it fills the AllReqTBL table using data extracted from the HTTP message. Moreover, it updates the second table, which groups the requests by ChannelID and Bitrate; the third column, Quantity, shows the number of requests with identical ChannelID and Bitrate.
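The grouping logic of the RA can be summarized by the following sketch; the table layouts follow the text, while the function and variable names are assumptions for illustration.

```python
from collections import defaultdict

all_req_tbl = []                       # rows: [PlayerID, ChannelID, Bitrate]
grouped_req_tbl = defaultdict(int)     # key: (ChannelID, Bitrate) -> Quantity

def on_player_request(player_id: str, channel_id: str, bitrate: int) -> bool:
    """Record a request; return True if the group is new (one upstream request
    to VRP_o is needed), False if an existing group already covers it and the
    segments are already being pushed."""
    all_req_tbl.append([player_id, channel_id, bitrate])
    key = (channel_id, bitrate)
    grouped_req_tbl[key] += 1          # update the Quantity column
    return grouped_req_tbl[key] == 1   # the first request launches the push

# Example: three players, two of which end up in the same group.
assert on_player_request("p1", "ch1", 600) is True
assert on_player_request("p2", "ch1", 600) is False   # grouped with p1
assert on_player_request("p3", "ch2", 1000) is True
```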
3) VRP_e Service Optimizer (V_eSO): V_eSO is the core component in VRP_e and must answer the following questions: (i) Which VRP_o must be selected for each row in GroupedReqTBL? (ii) How should each row in GroupedReqTBL be served? To do so, we first present four practical rules and their corresponding cost functions. Then, we propose a binary integer linear programming (BILP) model to determine an optimal rule for each request (denoted by r) in GroupedReqTBL by minimizing the total cost function. Let us first define rules E1 to E4 as follows: E1: Buffering the requested segment from VRP_o. Under this rule, VRP_e must send request r to an optimal VRP_o with index v if it has not been sent yet; otherwise, it should continue buffering segments from that VRP_o. In other words, if the request is not the first one and a group with a similar demand already exists, then after updating the AllReqTBL table and the Quantity column of the corresponding row in the second table, VRP_e should continue to buffer segments for the group. The cost of this rule is related to the amount of bandwidth consumed for receiving segments.
E2: Transcoding a higher-quality bitrate to the requested quality. This rule can be applied when V_eSO prefers to transcode a segment of higher quality (that is already being downloaded) to the requested one instead of downloading it from the VRP_o. For applying E2, the computation cost should be considered.
E3: Serving with a lower-quality segment. Following the CTA-5004 recommendation [31], under this rule, instead of downloading the requested segment from the VRP_o (E1) or transcoding the segment (E2), V_eSO decides to serve the player's request by sending a segment of lower quality. Rule E3 imposes a penalty related to the viewer's QoE degradation.
E4: The last rule lets a player or a group of players with similar requests experience a rebuffering event because none of the rules E1-E3 is applicable. Note that the cost of rebuffering is higher than the cost of the other rules.
In this study, we only consider these rules; however, more rules can easily be added by defining their cost functions.

1) Mathematical Formulation:
We introduce a binary integer linear programming (BILP) optimization model to determine an optimal rule for each request in GroupedReqTBL while minimizing the total cost. We first define the set of variables and input parameters required in our optimization problem. Let R denote the set of requests, which consists of all records in the GroupedReqTBL table, where r.channel, r.bitrate, and r.quantity indicate the properties of request r in GroupedReqTBL. Moreover, let V_r denote the set of VRP_o's that can serve request r ∈ R.
To select an optimal rule for each request r, we define four binary decision variables: $x_r^v$, $t_r$, $l_r$, and $s_r$, where $x_r^v = 1$ indicates that request r must be sent to the VRP_o with index v and rule E1 should be applied; otherwise $x_r^v = 0$. Likewise, $t_r$, $l_r$, and $s_r$ indicate whether rules E2, E3, and E4 are applied, respectively. We can now define the constraints of our optimization problem. For each request r ∈ R, exactly one rule must be selected at a time; thus, the first constraint is:

$$\sum_{v \in V_r} x_r^v + t_r + l_r + s_r = 1, \quad \forall r \in R. \quad (1)$$

To apply rule E2, V_eSO needs to download a segment with a higher bitrate than the requested one, depending on the network condition between VRP_e and VRP_o. Thus, the second constraint is:

$$t_r \le \sum_{\tilde{r} \in R:\; \tilde{r}.bitrate > r.bitrate,\; \tilde{r}.channel = r.channel} \;\sum_{v \in V_{\tilde{r}}} x_{\tilde{r}}^v, \quad \forall r \in R. \quad (2)$$

In other words, constraint (2) states that VRP_e can apply the second rule to a requested segment only when it downloads that segment at a higher bitrate. To apply E3 and serve request r with a lower quality, the following constraint guarantees that the lower quality is available, either through rule E1 or E2:

$$l_r \le \sum_{\tilde{r} \in R:\; \tilde{r}.bitrate < r.bitrate,\; \tilde{r}.channel = r.channel} \left( \sum_{v \in V_{\tilde{r}}} x_{\tilde{r}}^v + t_{\tilde{r}} \right), \quad \forall r \in R, \quad (3)$$

where $\tilde{r}$ is a request from another player for the same live channel with a lower bitrate than request r. The next two constraints check the availability of the resources required for rules E1 and E2. As shown in Fig. 2, VRP_e measures the available bandwidth from itself to all connected VRP_o's during each segment download. Accordingly, the following constraint states that the total bandwidth required for downloading segments from the VRP_o with index v must not exceed the measured available bandwidth (denoted by $\omega_v$):

$$\sum_{r \in R} r.bitrate \times x_r^v \le \omega_v, \quad \forall v. \quad (4)$$

Finally, the last constraint states that the amount of required computational resources of type k ∈ {CPU, RAM} does not exceed the available amount on the VRP_e when rule E2 is selected for request r:

$$\sum_{r \in R} r_k \times t_r \le \Omega_k, \quad \forall k \in \{CPU, RAM\}, \quad (5)$$

where $r_k$ and $\Omega_k$ are the amount of computational resource of type k required to run the transcoding function for request r and the total amount of resource of type k at VRP_e, respectively. It is worth mentioning that HxL3 is designed for Internet service providers (ISPs) to enhance bandwidth utilization and the QoE perceived by clients. Without loss of generality, suppose that an ISP rents at least one virtual machine (VM) for a given time duration (e.g., a month) to run transcoding functions, and also purchases a bandwidth plan to communicate with a set of VRP_o servers. In addition to satisfying the five constraints (1)-(5), the BILP model should minimize the overall serving cost. Therefore, we introduce three cost functions, $F_{E2}(r)$ to $F_{E4}(r)$, to measure the cost of applying rules E2-E4, respectively. Although serving requests by E1 and E2 does not impose extra cost (since the ISP has already paid for the resources over the given time frame, e.g., a month), for E2, we should consider the transcoding time and apply a penalty if the request cannot be served within the given time. The three cost functions are defined as follows:

$$F_{E2}(r) = \begin{cases} C_{E2} \times t_r & \text{if } r.time > r_\theta \\ 0 & \text{otherwise} \end{cases} \quad (6)$$
$$F_{E3}(r) = C_{E3} \times r.quantity \times l_r \quad (7)$$
$$F_{E4}(r) = C_{E4} \times r.quantity \times s_r \quad (8)$$

In (6), r.time and $r_\theta$ are the average transcoding time of the segments in r (which depends on the segment duration and the target bitrate) and the average deadline to perform the transcoding function, respectively. Therefore, if rule E2 is applied to request r (i.e., $t_r = 1$) and r.time is greater than $r_\theta$, then the cost function $F_{E2}$ imposes the penalty $C_{E2}$. For rule E3, when V_eSO decides to serve a request with a lower bitrate, we define $C_{E3}$ as its penalty; to obtain $F_{E3}$, we multiply it by the number of similar requests r.quantity that would be served with the lower-quality segment. Similarly, by considering the rebuffering penalty $C_{E4}$ and the number of similar requests r.quantity, we formulate the last cost function $F_{E4}$. By adjusting the penalty values $C_{E2}$, $C_{E3}$, and $C_{E4}$, we can prioritize the defined rules. Since HxL3 should prefer E3 over E4, we can simply set $C_{E4} = \Delta$ and $C_{E3} = \delta$, where $\Delta$ is a large constant value and $0 < \delta < \Delta$; moreover, by selecting $B \times \Delta < C_{E2}$ where $B = \max\{r.quantity \,|\, \forall r \in R\}$, HxL3 does not apply E2 when the given deadline $r_\theta$ is less than r.time. Therefore, with these penalty values, HxL3 selects E1 if there is sufficient bandwidth; otherwise, E2 is chosen if the required computational resources at VRP_e are available; and, in case of insufficient bandwidth and computational resources, HxL3 serves request r with a lower quality if constraint (3) allows $l_r = 1$; otherwise, E4 is the only applicable rule. Finally, the BILP model can be introduced as follows:

$$\min \sum_{r \in R} \left( F_{E2}(r) + F_{E3}(r) + F_{E4}(r) \right) \quad \text{s.t. constraints (1)-(5)}. \quad (9)$$

The proposed BILP model (9) runs periodically, but the main question is at which times it must be run to avoid any impact on the QoE. The BILP can be triggered in a time-slot manner: assume that VRP_e requests segments of various durations for different live streaming sessions; we then set the time-slot duration to the shortest segment duration and run the BILP in the middle of each time-slot. However, in a real scenario, since the BILP model is NP-hard and may take more time than a segment duration to produce a solution, we propose a simple greedy-based heuristic algorithm.
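As an illustration of model (9), the following sketch expresses the BILP with the PuLP solver; the request data, the penalty values, and the omission of computational constraint (5) are simplifying assumptions, not the paper's exact instantiation.

```python
import pulp

# Each request r: (channel, bitrate_kbps, quantity, transcode_time_s, deadline_s)
R = [("ch1", 200, 3, 0.4, 1.0), ("ch1", 600, 2, 0.6, 1.0), ("ch1", 1000, 1, 0.9, 1.0)]
V = ["vrp_o1"]                        # candidate VRP_o servers
omega = {"vrp_o1": 1500}              # measured bandwidth toward each VRP_o (kbps)
C_E2, C_E3, C_E4 = 50001, 10, 100     # penalties chosen so that B x Delta < C_E2

prob = pulp.LpProblem("VeSO_BILP", pulp.LpMinimize)
x = {(r, v): pulp.LpVariable(f"x_{i}_{v}", cat="Binary")
     for i, r in enumerate(R) for v in V}
t = {r: pulp.LpVariable(f"t_{i}", cat="Binary") for i, r in enumerate(R)}
l = {r: pulp.LpVariable(f"l_{i}", cat="Binary") for i, r in enumerate(R)}
s = {r: pulp.LpVariable(f"s_{i}", cat="Binary") for i, r in enumerate(R)}

for r in R:  # (1) exactly one rule per request
    prob += pulp.lpSum(x[r, v] for v in V) + t[r] + l[r] + s[r] == 1
for r in R:  # (2) E2 needs a higher-bitrate download on the same channel
    higher = [q for q in R if q[0] == r[0] and q[1] > r[1]]
    prob += t[r] <= pulp.lpSum(x[q, v] for q in higher for v in V)
for r in R:  # (3) E3 needs a lower quality obtained via E1 or E2
    lower = [q for q in R if q[0] == r[0] and q[1] < r[1]]
    prob += l[r] <= (pulp.lpSum(x[q, v] for q in lower for v in V)
                     + pulp.lpSum(t[q] for q in lower))
for v in V:  # (4) bandwidth budget per VRP_o; constraint (5) omitted for brevity
    prob += pulp.lpSum(r[1] * x[r, v] for r in R) <= omega[v]

# (9) objective: deadline-violating transcodes plus the E3/E4 penalties (6)-(8)
prob += pulp.lpSum((C_E2 if r[3] > r[4] else 0) * t[r]
                   + C_E3 * r[2] * l[r] + C_E4 * r[2] * s[r] for r in R)
prob.solve(pulp.PULP_CBC_CMD(msg=False))
for r in R:
    rule = ("E1" if any(x[r, v].value() for v in V) else
            "E2" if t[r].value() else "E3" if l[r].value() else "E4")
    print(r[:2], "->", rule)
```

With the assumed 1500 kbps budget, the solver fetches the 200 and 1000 kbps renditions via E1 and serves the 600 kbps group by transcoding the 1000 kbps download (E2), which keeps the objective at zero.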
2) Heuristic Algorithm: We propose a greedy-based heuristic algorithm (Algorithm 1) that consists of three main phases: (i) initializing (lines 1-4), (ii) optimizing (lines 5-15), and (iii) finalizing (line 16). The proposed greedy-based heuristic algorithm aims to determine the appropriate rule for each request. Lines 1-4 of Algorithm 1 define six properties for each request r ∈ R as follows: (1) VRP_o: the optimal selected VRP_o for r ∈ R; (2) selectedRule: the finally selected rule; (3) savedCost: the amount of cost saved by applying one of the rules E1-E3 to request r in comparison with tempRule; (4, 5) Hbr and Lbr: two subsets of R for each r that contain the requests with higher and lower bitrates, respectively, but with an identical ChannelID; and (6) tempRule: a temporary rule, which is initially set to E4. As an example, let us consider six requests received by a VRP_e for three live channels v1-v3 with different bitrates and quantities, as illustrated in Fig. 3. Here, for example, r_5 and r_3 constitute the Hbr and Lbr sets of r_4, respectively.
In line 6 of Algorithm 1, r′ refers to the request r with the maximum savedCost, v′ is the selected appropriate VRP_o for r′, and c′ is the savedCost value of r′. In the while loop, we first call the SavedCostEvaluation algorithm (Algorithm 2) for each r whose selectedRule is null (lines 8-9). By executing the SavedCostEvaluation algorithm, we measure the amount of cost saved by applying E1. It is worth mentioning that assigning E1 to a request r can impact the other requests in its Hbr and Lbr sets. Therefore, in the SavedCostEvaluation algorithm, we first measure the saved cost, which is equal to the difference between the cost of applying tempRule and the cost of applying E1 for the given r (lines 1-3 of Algorithm 2). Note that CostFunction(a, b) in lines 1 and 2 returns the cost of applying rule b to request a based on the equations in (6)-(8), and that the cost of applying CostFunction(a, b) for b = E1 is 0. Afterward, if the selected VRP_o has sufficient bandwidth, we continue the algorithm by investigating the impact on Hbr and Lbr (lines 7-15 of Algorithm 2); otherwise, we set its savedCost to −1 and return to Algorithm 1 (lines 5-6 of Algorithm 2). Let us come back to Algorithm 1. Line 13 is executed if r′ is null, which means that either E1 has been selected as the selectedRule for all r ∈ R or there is not enough bandwidth to apply rule E1 to at least one request. In the case of r′ ≠ null, we first set E1 as the selected rule for r′ and then update the tempRule of the other requests in its Hbr and Lbr sets. In the last phase of Algorithm 1, we only need to copy tempRule into selectedRule for those requests with selectedRule = null. The table in Fig. 3 shows the proposed heuristic algorithm's results in terms of tempRule and selectedRule for the given requests r_1-r_6 with different required bitrates and quantities. For ease of explanation, we set C_E4 = Δ = 100, C_E3 = δ = 10, and B = 500, where B = max{r.quantity | ∀r ∈ R}. Thus, the minimum integer value of C_E2 is 50001, since B × Δ < C_E2. Moreover, in this example, we consider one VRP_o with an available bandwidth of 4000 kbps (ω_o = 4000 kbps) and one VRP_e with unlimited computational resources. The first column of the table shows the algorithm's phases and the four rounds of the optimization phase. In all phases, we show the values of tempRule and selectedRule and the available bandwidth of the VRP_o. In the initialization phase, E4 is set as the tempRule for all requests. Then, in the first round of the while loop, the proposed heuristic algorithm sets E1 for r_3, since this decision achieves the maximum saved cost. Furthermore, the temp rules of r_4 and r_5, as well as the available bandwidth of the VRP_o, are updated accordingly. In the next round, rule E1 is selected for r_1. Although assigning E1 to r_5 would provide more saved cost, it has to be ignored due to the lack of VRP_o bandwidth. Thus, we select E1 for r_4 and r_6 in the next two rounds. As we can see, in each round, tempRule is updated according to the changes in selectedRule. Finally, all values of tempRule are copied into selectedRule.
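The following compact sketch captures the greedy idea of Algorithm 1; the cost table and the simplified saved-cost evaluation (which inspects all same-channel requests instead of precomputed Hbr/Lbr sets) are assumptions, not a line-for-line port of Algorithms 1 and 2.

```python
C_E3, C_E4 = 10, 100          # penalties; E2 assumed within its deadline (cost 0)
COST = {"E1": 0, "E2": 0, "E3": C_E3, "E4": C_E4}

def greedy(requests, bandwidth):
    """requests: dicts with channel, bitrate (kbps), quantity. Returns rules."""
    for r in requests:
        r["tempRule"], r["selectedRule"] = "E4", None      # initializing phase
    while True:                                            # optimizing phase
        best, best_saved = None, 0
        for r in (q for q in requests if q["selectedRule"] is None):
            if r["bitrate"] > bandwidth:                   # E1 infeasible here
                continue
            saved = (COST[r["tempRule"]] - COST["E1"]) * r["quantity"]
            for q in requests:                             # effect on Hbr/Lbr
                if (q["channel"] == r["channel"] and q is not r
                        and q["selectedRule"] is None):
                    new = "E2" if q["bitrate"] < r["bitrate"] else "E3"
                    saved += max(0, COST[q["tempRule"]] - COST[new]) * q["quantity"]
            if saved > best_saved:
                best, best_saved = r, saved
        if best is None:                                   # no positive saving left
            break
        best["selectedRule"] = "E1"                        # fetch from VRP_o
        bandwidth -= best["bitrate"]
        for q in requests:                                 # update temp rules
            if q["channel"] == best["channel"] and q["selectedRule"] is None:
                new = "E2" if q["bitrate"] < best["bitrate"] else "E3"
                if COST[new] < COST[q["tempRule"]]:
                    q["tempRule"] = new
    for r in requests:                                     # finalizing phase
        if r["selectedRule"] is None:
            r["selectedRule"] = r["tempRule"]
    return [(r["channel"], r["bitrate"], r["selectedRule"]) for r in requests]

reqs = [{"channel": "v1", "bitrate": 1000, "quantity": 3},
        {"channel": "v1", "bitrate": 600, "quantity": 5},
        {"channel": "v2", "bitrate": 2000, "quantity": 1}]
print(greedy(reqs, bandwidth=1600))   # -> E1 for 1000k, E2 for 600k, E4 for v2
```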
4) VRP_e Cache Agent (VCA): VCA is an essential component in HxL3's architecture. The main responsibility of VCA is to apply V_eSO's decisions and serve the players. For example, to apply E2, VCA needs to prepare the requested segment by performing a transcoding function on a selected cached segment. Furthermore, to serve a client under E1, VCA should first wait for the requested segment to be buffered in the partial cache and then forward it to the destination player.

B. VRP_o Architecture
Now, let us clarify the internal architecture of the VRP_o instance. Each instance of VRP_o consists of the following main components: two virtual east-west interfaces (V_o2O and V_o2V_e) and the VRP_o Service Optimizer (V_oSO). The V_o2O and V_o2V_e interfaces are designed to enable each VRP_o instance to communicate with the origin server and the VRP_e's (through a CDN cloud), respectively. During the ABR encoding process, the origin server must push the produced segments to the VRP_o. To this end, we can launch a VRP_o instance that buffers segments in its local partial cache by creating a simple socket connection. The main responsibility of the V_oSO component is to handle the following four types of control messages that are sent from VRP_e to VRP_o: 1) Stream Service Initialization: To request the launch of a streaming service, VRP_e sends the INIT command to the VRP_o with the following parameters: (CMD='INIT,' ChannelID, QualityID[], MaxE2E). Upon receiving a message with CMD='INIT,' V_oSO first assigns a ServiceID to the request, with a default value equal to the incoming socketID. Then, VRP_o sends the selected ServiceID to the VRP_e for its subsequent related requests. Moreover, V_oSO asks the cache agent to push a copy of the segments to the requesting VRP_e; and, given the upper bound on the E2E latency in the message (denoted by MaxE2E), V_oSO selects the proper transport protocol (i.e., TCP or UDP) to carry the data. Note that MaxE2E can be used as r_θ in (6). It is worth mentioning that advertising MaxE2E to the VRP_o enables content providers to offer different levels of QoE as well as various business plans to their customers. Note that we assume the CDN supports both transport protocols and the desired latency. Finally, for each Q ∈ QualityID, V_oSO records the following meta-data in its database (DB) module: [ServiceID, ChannelID, Q, Protocol], where Protocol is the transport layer protocol selected to serve the VRP_e.
2) Stream Modification: While users are being served by the VRP_e, players may join or leave the stream. This can change the quality list previously requested by the VRP_e. To handle such events and optimize bandwidth consumption, a VRP_e can modify the stream in terms of segment quality by sending a command to the VRP_o with the following inputs: (CMD='MOD,' ServiceID, ChannelID, Q[]). ServiceID is the ID assigned to the streaming service, returned by the VRP_o in response to the INIT command. Here, V_oSO first updates the meta-data for the unique ID ServiceID and then instructs the cache agent to serve the VRP_e with the new requested quality list Q[]. As mentioned earlier, VRP_o buffers different qualities for each live stream in its partial cache. To avoid sending frequent MOD commands and consequently overwhelming the VRP_o, we can enable the VRP_e to perform transcoding functions, serve higher requested qualities with lower ones, or buffer different popular qualities.
3) Stream Termination: By sending the command (CMD='FIN,' ServiceID), VRP_e explicitly requests the termination of the stream with the given ServiceID. VRP_o then closes the connection after updating the meta-data.

4) Packet Loss:
As mentioned earlier, VRP_o uses two main transport layer protocols: TCP and UDP. In the next section, we will show that TCP is the best candidate for serving applications that can tolerate a higher E2E delay. In this case, V_oSO decides to send segments with a larger duration, where the upper bound of the segment duration is limited by the given E2E delay. However, V_oSO selects UDP to serve VRP_e's that require a lower E2E latency. Packet loss is one of the main issues of UDP, while TCP has an elaborate strategy for dealing with lost packets. Therefore, when using UDP, we delegate the responsibility of recovering from packet loss to the application layer by designing sophisticated procedures for both VRP_o and the VRP_e's. To this end, we define the Loss Packet message to cope with losses, which are common with connectionless UDP. This message is used by VRP_e to inform VRP_o about a missed UDP packet or segment, with the following parameters: (CMD='LOS,' ServiceID, UniqueID, Flag={0,1}). In other words, VRP_e requests the re-transmission of a UDP packet with unique identifier UniqueID (Flag=0) or of the segment with number UniqueID (Flag=1). Note that VRP_o sends numerous UDP packets for each segment, each with a unique identifier. For instance, assume that during data transmission for ServiceID S1, the UDP packet with packet id '0x12' is lost. Then, VRP_e sends a packet loss message with the following arguments: (CMD='LOS,' ServiceID='S1,' UniqueID='0x12,' Flag=0). However, sometimes, due to adverse network conditions, it is possible that VRP_e misses an entire segment before its deadline. In this case, VRP_e should start buffering the next segment and retry buffering the previous one by sending the following packet loss message: (CMD='LOS,' ServiceID='S1,' UniqueID='100,' Flag=1), where '100' is the segment number and Flag=1 indicates that VRP_e is requesting a segment. The details of the proposed application-layer loss recovery mechanism are described in the performance evaluation section.
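To summarize the control plane, the sketch below encodes the four V_oSO messages exactly as parameterized in the text; the comma-separated ASCII wire format and the helper names are assumptions made for illustration.

```python
def init_msg(channel_id: str, quality_ids: list, max_e2e: float) -> str:
    # VRP_e -> VRP_o: launch a streaming service; VRP_o replies with a ServiceID
    return f"CMD=INIT,ChannelID={channel_id},QualityID={quality_ids},MaxE2E={max_e2e}"

def mod_msg(service_id: str, channel_id: str, qualities: list) -> str:
    # VRP_e -> VRP_o: players joined or left, so change the requested quality list
    return f"CMD=MOD,ServiceID={service_id},ChannelID={channel_id},Q={qualities}"

def fin_msg(service_id: str) -> str:
    # VRP_e -> VRP_o: explicitly terminate the stream with the given ServiceID
    return f"CMD=FIN,ServiceID={service_id}"

def los_msg(service_id: str, unique_id: str, flag: int) -> str:
    # Flag=0: re-send the UDP packet UniqueID; Flag=1: re-send segment UniqueID
    return f"CMD=LOS,ServiceID={service_id},UniqueID={unique_id},Flag={flag}"

# The two examples from the text: a lost packet, then a whole missed segment.
print(los_msg("S1", "0x12", 0))
print(los_msg("S1", "100", 1))
```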

C. Video Players
We used our optimized player for L3 streaming [8], which is developed on top of dash.js [16].

V. PERFORMANCE EVALUATION
In this section, we present the results of a real-world experimental evaluation of HxL3. We start by describing the practical implementation of HxL3, and then present the methodology, results, and a detailed analysis. Our goal is to answer the following questions: 1) Can HxL3 provide a high viewer experience with less rebuffering at a low latency? How does it compare against an existing solution that fetches the live segments directly from the origin server with CMAF and CTE? 2) What is the performance of the HxL3 heuristic-based algorithm, and how effective is HxL3 in different real-world scenarios: one player, multiple players, different VRP_e locations, and encoding parameters (i.e., segment duration)?
3) What is the performance of HxL3 under various real-world network conditions, and what is the impact of network conditions and transport protocols on HxL3's performance?

A. Implementation
HxL3 includes five main entities: (i) an FFmpeg-based encoder (with DASH packaging), (ii) a Python-based origin server, (iii) a Python-based VRP origin, (iv) a Python-based VRP edge, and (v) a JavaScript-based dash.js player (v3.0.1). For the set of VRPs, we created two variants: one uses UDP and the other TCP for data transmission, while both use HTTP/1.1 at the application layer. For the player, we used the dash.js-based player proposed in [8].

B. Methodology and Evaluation Setup

1) Experimental Scenarios:
We evaluate the effectiveness of HxL3 using four main scenarios: (S1) scenario 1 (one player), (S2) scenario 2 (multiple players), (S3) scenario 3 (VRP edge locations), and (S4) scenario 4 (segment duration). In all scenarios, we fixed the locations of the encoder, origin server, and VRP origin in Singapore, while the players are located in Austria. Furthermore, we used HTTP/1.1 as the delivery protocol at the application layer. We found that the results follow the same trend for the considered solutions in configurations with and without a CDN in the delivery path; for this reason, and for simplicity, we omitted the CDN during the experiments. The remaining description of each scenario (S1-S4) is given in Table I.
We compared HxL3 against a CMAF-based live streaming solution [14] that streams directly from the origin using CMAF, CTE, and TCP as the transport protocol (origin-based). We also investigated the performance of HxL3 using TCP or UDP as the transport protocol for communication between VRP_o and VRP_e. To this end, we created two variants of our solution: HxL3-UDP (default) and HxL3-TCP.
2) Video and Encoding Parameters: Consistent with the setup of the Twitch grand challenge [32], we used the animated video sample Big Buck Bunny (BBB, available at https://peach.blender.org/). We encoded the video sample using the H.264 codec of FFmpeg with an ABR ladder of three representations {360p@200 kbps, 480p@600 kbps, 720p@1000 kbps} and a segment duration (denoted by τ) ranging between half a second and 4 seconds, depending on the scenario. The target latency is set to 3 seconds, and each live streaming session is 600 seconds long. We note that for the origin-based solution, we fixed the CMAF chunk duration to 33 milliseconds (1 frame at 30 fps).
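For reference, an encoder invocation along these lines could look as follows; the exact FFmpeg flags used in the paper are not given, so this Python wrapper with the dash muxer's low-latency options is a plausible approximation, not the authors' command.

```python
import subprocess

SEG_DUR = 2  # seconds; the experiments vary tau between 0.5 and 4

cmd = [
    "ffmpeg", "-re", "-i", "bbb.mp4",
    "-map", "0:v", "-map", "0:v", "-map", "0:v",   # three renditions
    "-c:v", "libx264",
    "-b:v:0", "200k",  "-s:v:0", "640x360",        # 360p@200 kbps
    "-b:v:1", "600k",  "-s:v:1", "854x480",        # 480p@600 kbps
    "-b:v:2", "1000k", "-s:v:2", "1280x720",       # 720p@1000 kbps
    "-f", "dash", "-seg_duration", str(SEG_DUR),
    "-streaming", "1", "-ldash", "1",              # emit CMAF chunks early
    "manifest.mpd",
]
subprocess.run(cmd, check=True)
```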

3) VRP Parameters:
For developing HxL3-UDP and HxL3-TCP, we used different strategies. To enable the application layer to retrieve UDP packets that have been lost, VRP_o adds the following metadata to the header of each UDP packet of the segment s to be downloaded: the checksum of the segment's data (40 bytes), the sequence number of the packet (4 bytes), the total number of UDP packets of s (4 bytes), the length of the UDP packet (4 bytes), and the name of the requested segment (24 bytes). These metadata help the application layer to receive segments appropriately and to re-request them in case of packet loss. The UDP packet size is set to 4000 bytes. The VRP_o in HxL3-UDP consists of the following main threads: (a) NewSegment(): responsible for sending a newly received segment (from the origin server) to the VRP_e by running a UDPSend() thread that sends the whole segment in UDP packets. Note that to reduce the E2E latency, UDPSend() sends packets through different destination port numbers based on the number of representations in the ABR ladder (i.e., one port per representation; we used three ports in our experiments); (b) PktLoss(): waits to receive a UDP packet from the VRP_e requesting the retrieval of a lost packet. Upon receiving such a request, PktLoss() runs a UDPSend() thread to re-send the lost UDP packet. Since in our implementation we configure the encoder to produce segments with an ABR ladder of three representations, the VRP_e in HxL3-UDP hosts three main functions to simultaneously receive the UDP packets of segments in the three different qualities. Moreover, VRP_e employs another thread function to check the received packets and request lost ones. To boost the efficiency of VRP_e's recovery function, we set a threshold on the maximum waiting time for a segment, which should be less than the segment duration. The design of HxL3-TCP is less complex, since TCP recovers from packet loss itself. Therefore, for the three segment qualities, VRP_o uses three functions that concurrently send the generated segments to the VRP_e over three different sockets.
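The per-packet metadata described above adds up to a 76-byte header in front of each UDP payload. A sketch of one possible layout is shown below; the field order and the use of a 40-character SHA-1 hex digest as the checksum are assumptions, only the field sizes come from the text.

```python
import hashlib
import struct

HEADER_FMT = "!40s I I I 24s"   # checksum, seq number, total packets, length, name
HEADER_LEN = struct.calcsize(HEADER_FMT)    # 76 bytes
PACKET_SIZE = 4000              # as configured in the experiments
PAYLOAD = PACKET_SIZE - HEADER_LEN

def packetize(segment: bytes, name: str):
    """Split one media segment into UDP payloads carrying recovery metadata."""
    checksum = hashlib.sha1(segment).hexdigest().encode()  # 40 ASCII bytes
    chunks = [segment[i:i + PAYLOAD] for i in range(0, len(segment), PAYLOAD)]
    for seq, chunk in enumerate(chunks):
        header = struct.pack(HEADER_FMT, checksum, seq, len(chunks),
                             len(chunk), name.encode()[:24])
        yield header + chunk

def parse(packet: bytes):
    """Recover the metadata so the receiver can detect and re-request losses."""
    checksum, seq, total, length, name = struct.unpack(HEADER_FMT,
                                                       packet[:HEADER_LEN])
    return seq, total, packet[HEADER_LEN:HEADER_LEN + length], name.rstrip(b"\0")
```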

4) Evaluation Metrics:
To evaluate the performance of HxL3, we used the following metrics: Live Latency: the live (E2E) latency is calculated as the time difference between video capture at the live source and its rendering at the player.
QoE Model: We used the same linear QoE model presented in the original paper [8], which conforms to the requirements of L3 streaming. At each segment download, the QoE model uses five main metrics: selected bitrate, rebuffering duration, live latency, bitrate switches, and playback speed.
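For intuition, a linear per-segment QoE of this shape can be sketched as below; the weights are placeholders chosen for illustration, not the calibrated coefficients of the model in [8].

```python
def qoe_segment(bitrate_kbps, rebuffer_s, latency_s, switch_kbps, speed):
    """Illustrative linear combination of the five metrics (assumed weights)."""
    w_bit, w_reb, w_lat, w_sw, w_spd = 1.0, 50.0, 10.0, 1.0, 100.0
    return (w_bit * bitrate_kbps         # reward the selected bitrate
            - w_reb * rebuffer_s         # penalize rebuffering duration
            - w_lat * latency_s          # penalize live latency
            - w_sw * switch_kbps         # penalize bitrate switch magnitude
            - w_spd * abs(1.0 - speed))  # penalize playback-speed deviation
```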

5) Experimental Setup:
Our setup is depicted in Fig. 1 (without the CDN) and consisted of one physical workstation, two AWS EC2 medium instances, and one laptop. All machines were running Ubuntu 19.10 and were connected through the public Internet. The physical workstation ran two virtual machines (VMs). The first VM ran the FFmpeg encoder with DASH packaging enabled, which received the live source (the BBB video) and fed it into the origin server. The second VM ran a Python-based server with origin functionalities. To run both VRPs, we used the two AWS EC2 instances: the first, located in Singapore, ran the VRP origin, while the second was placed at the different locations shown in Table I and ran the VRP edge.

C. Results and Analysis

1) Offline Experiments:
To evaluate the efficiency of the proposed BILP model (9) in comparison with the heuristic-based algorithm, we ran three sets of offline experiments by varying (i) the available resources (i.e., bandwidth and computational capacity), (ii) the number of parallel live channels (CH), and (iii) the number of simultaneous requests from the players. We developed a Python-based simulator of our live streaming scenarios (Table I) that considers the total number of requests, viewers, total network capacity (bandwidth), and live streaming channels. We ran our offline experiments in a time-slot manner. The duration of each time-slot and each segment are set to one and two seconds, respectively. In each time-slot, after collecting requests, we run both the heuristic-based algorithm and the BILP model to find a solution. It is worth mentioning that the time-slot duration can implicitly impact the QoE, since requests are held until a decision is made in each time-slot. However, the minimum duration is determined by the time complexity of the proposed approaches. To cope with this issue, a potential solution is to develop a machine learning based online algorithm, which is subject to future work.
The simulation time is set to 100 time-slots. During the simulation, a random number of players (up to a threshold Req.#) join channels and stay for random intervals between 5 and 20 seconds. Regarding cost function (6), we need to simulate r_θ as the average deadline of transcoding request r. To do this, we define θ as the percentage of requests in each time-slot that would violate the deadline if VRP_e transcoded them. Moreover, we limit the number of requests that can be transcoded simultaneously on VRP_e in each time-slot to Ω, to simulate the computational resource constraint. The average results of these experiments over ten runs are shown in Fig. 4. In the first experiment (Fig. 4(a)) with Req.#=500, we investigate the impact of increasing the available bandwidth (denoted by BW) for different numbers of channels (CH#) on the performance of the BILP model in terms of the normalized objective function (Obj), the percentage of bandwidth usage (BW), and the selection of rules E1-E4. As illustrated, the BILP model efficiently uses the available bandwidth in all scenarios by employing rule E1. Moreover, for each value of CH#, increasing the bandwidth decreases the objective function due to fetching more segments from VRP_o. It might be interesting to answer the following question: Why does the BILP model rarely use E2 (it happened only for CH#10 with BW×4)? Note that for transcoding, we first need to download a segment with a higher quality and then transcode it to the requested one, so more bandwidth is required. Therefore, if the BILP model selects E2 for some requests, many requests demanding segments with lower quality would have to be served by E4, which can dramatically increase the objective function.
We extend our investigation by running the BILP model over five scenarios (s1-s5) with different numbers of live channels (CH#={5, 10, 15}), maximum allowable concurrent transcodings per time-slot (Ω = {5, 20}), and percentages of requests with a deadline lower than the required transcoding time (θ = {0%, 60%}) (see Fig. 4(b)). As we can see, in all scenarios, almost all of the available bandwidth is used by serving an efficient number of requests via rule E1. Moreover, by increasing Ω from 5 to 20 (in s1 and s2), the objective function slightly decreases, since more requests can be served by E2 instead of E3. To achieve the minimum objective function, the BILP model consumes the maximum computational capacity (denoted by Cmp%). The percentage of requests served by E1 does not depend on the number of channels, because of the aggregation function that runs over the received requests at VRP_e. For example, s2 and s3 consume equal bandwidth, but the BILP model serves more requests via E1 in s2 and via E3 in s3. This phenomenon can also be seen in the last scenario with 15 channels. With θ = 60 in scenario s4, the probability of employing E2 diminishes due to insufficient time for transcoding segments (see E2 and Cmp% in s4); therefore, the BILP model fetches the lowest requested bitrates (rule E1) and serves the remaining requests with lower qualities (rule E3). In Fig. 4(c), we compare the performance of the BILP model and the heuristic algorithm for different numbers of requests. In general, the proposed heuristic algorithm achieves near-optimal performance compared to the BILP model in all scenarios with different numbers of requests. However, as the number of requests (Req.#) increases with 100 channels, the objective function increases considerably. This is because we set the bandwidth to a constant value, so many of the requests are served at a lower quality (rule E3) or even experience rebuffering (rule E4). Fig. 4(d) highlights the execution time comparison between the heuristic algorithm and the BILP model for different numbers of live streaming channels and simultaneous requests. The results show that the proposed heuristic algorithm performs better in terms of execution time and can be employed in online scenarios.
2) Online Experiments: For each of the following scenarios, we executed two runs and took the average results.
Scenario 1: In this scenario, we aim to study the effectiveness of HxL3 in an Internet deployment and to analyze the impact on viewer QoE of the E2E path between endpoints under different network conditions, transport protocols, and delivery architectures. The player results are shown in Table II and Fig. 5(a). In the origin-based architecture, the player is served directly from the origin, where the long E2E content delivery path does not involve any intermediate entity between the endpoints (player and origin server). For both runs, Fig. 5(a) highlights the available bandwidth measured at the player. The player that uses the origin-based delivery experiences low bandwidth with significant variation, which is caused by the long E2E path. Similar variations are experienced by the player that uses HxL3-UDP, but with higher bandwidth (on average 41% better across both runs). This is because the player (in Austria) is served from a close point, the VRP edge located nearby in Germany.
In the origin-based delivery, we observe that when the origin is located far away from the player, the E2E latency fluctuates between 1.5 s and 3 s (with an average of 2.4 s). This latency fluctuation is introduced by the high RTT along the delivery path, which significantly impacts TCP slow-start performance by causing the congestion window to grow at a slower and unpredictable pace. Further, even when a persistent HTTP connection is used and the TCP connection is kept alive, the origin server restarts the slow-start phase for each requested segment due to the time interval between requests, which makes the origin consider the connection idle. This phenomenon impacts the download process of all media segments and causes low bandwidth with significant variations.
It is also observed that the player results of HxL3-UDP and the origin-based solution are comparable, with the highest possible viewer QoE (high selected bitrate, low rebuffering duration, few bitrate switches, and low latency), and the latency fluctuates within its target, as shown in Table II. This is because the origin-based solution uses CMAF with CTE, allowing the chunks of the segment currently being prepared at the encoder to be delivered to the player at the same time, which helps the player achieve good performance.
Scenario 2: In this scenario, we aim to analyze the impact of multiple players located in the same network simultaneously joining the same live session; the average results are highlighted in Table III and Fig. 5(c). Note that the segment duration is fixed to 2 s for HxL3-TCP and 0.5 s for HxL3-UDP and the origin-based solution.
First, Fig. 5(c) shows that each player in the origin-based solution and in HxL3-UDP experiences similar bandwidth variations. For example, for all the players, the average bandwidth ranges between 5 Mbps and 8 Mbps for HxL3-UDP and between 2 Mbps and 4 Mbps for the origin-based solution. Only in HxL3-TCP do the players experience different bandwidth variations in each run, because of the congestion control mechanism: as the RTT and latency increase (an average of 9 seconds for all the players, as shown in Table III), the bandwidth decreases. For example, for PL1, the average bandwidth in run 1 (11 Mbps) is 36% higher than in run 2 (7 Mbps). This slightly degrades the player performance by increasing the rebuffering duration, the number of bitrate switches, and the E2E latency, and by reducing the bitrate compared to the other solutions. As we are streaming over the Internet, there are occasional fluctuations in the bandwidth for all the solutions. On the other hand, we can see that HxL3-TCP with a 2 s segment duration works well, confirming our finding from Scenario 1 that TCP (without CMAF and CTE) performs poorly with small segment durations (<1 s). With larger segment durations, the player performance improves; however, this benefit comes at the expense of higher latency.
Second, HxL3-UDP generally works better with a small segment duration. As it uses UDP, the media segments are transmitted between VRPs faster and without any loss (at slightly reduced bandwidth compared to HxL3-TCP), enabling high player performance with an average QoE of 4.75 and a latency of 2.72 s among all the players. Also, all the players experience mostly similar performance, resulting in an acceptable level of fairness. These results are similar to what the players achieve in the origin-based delivery, mainly because of the CMAF and CTE capabilities as well as the fair and stable (same variation among all the players) E2E bandwidth, RTT, and latency, which are generally dominated by the number of players and the delivery path length; this is validated by our measurement results in Table III and Fig. 5(b). In addition, we noticed that the startup delay in HxL3-UDP and HxL3-TCP largely depends on the segment duration and the network conditions. Thanks to CMAF and CTE, the startup delay in the origin-based solution is decoupled from the segment duration and depends only on the network conditions, as the player can start rendering as chunks arrive, without requiring the full segment to be present in the playback buffer. For instance, players that use the origin-based solution achieve a slightly better average startup delay of 0.8 s among all players, compared to HxL3-UDP with 0.94 s and HxL3-TCP with 2.03 s.
Third, when the number of players increases, the origin-based solution is expected to suffer from poor performance; a similar outcome is highlighted in [10], [11]. Specifically, the origin server may become overloaded by a large number of requests, and the long delivery path between endpoints might be exposed to severe congestion. Such issues are very detrimental to the viewer experience. This problem is exacerbated in the case of flash crowds [30], where the number of players increases from a few to many within a few seconds. Based on our initial investigations, in an origin-based test with only 10 players, the players already start performing poorly, with low QoE and high latency. These results are not shown, as we were not able to collect the logs: some of the players could not start the streaming, while others stopped at the beginning or in the middle of the streaming session. In contrast, HxL3 is designed to address these issues, allowing players to achieve the best performance in L3 scenarios. Fourth, given the aforementioned TCP problems over long paths, when HxL3 is deployed over TCP, the players sometimes achieve only acceptable (not the best) performance compared to HxL3 over UDP. In HxL3-TCP, the VRP edge experiences several cache misses, which increase with the distance between VRPs and impact the player performance. In contrast, HxL3-UDP avoids these issues and achieves the best results. Based on this outcome, we recommend using HxL3 with UDP for communication between VRPs in order to guarantee L3 requirements at scale. In the future, we are also planning to integrate HxL3-UDP with the SRT protocol (https://www.srtalliance.org/) for robust delivery with minimal packet losses, as well as CMAF/CTE with HxL3-TCP.
In general, when the number of players increases, player performance degrades, mainly due to long E2E delivery paths that exhibit low bandwidth, highly fluctuating network conditions, and large RTTs. For example, we tried to run 10 players simultaneously on the same network in Austria, but some of them ran with low performance and a few could not run at all, which confirms our observation that the origin-based solution does not scale well. Although the origin-based solution achieves results comparable to HxL3-UDP in a small setup with few players (e.g., 1-6), it is still far from real-world deployment, as highlighted in [33]. Hence, HxL3 represents the best alternative to satisfy L3 streaming requirements with high QoE at scale. Another observation is that as the RTT of the delivery path increases, the overall E2E available bandwidth decreases and varies more. Taking origin-based delivery as an example, it experiences an average E2E bandwidth of 3.3 Mbps, compared to 5.7 Mbps for HxL3-UDP. This is because a longer RTT means the TCP congestion window takes longer to grow during the slow-start phase, resulting in lower overall bandwidth. This problem is also the main cause preventing players in HxL3-TCP from even starting the live session. It can be alleviated by (i) using a longer segment duration or (ii) integrating CMAF and CTE: with larger segments, TCP performance becomes more resilient to fluctuating latency since the slow-start phase dominates a smaller fraction of each transfer. This outcome is shown in Figs. 5(b) and 5(d) for Scenarios 3 and 4, respectively. We note that the number of switches is higher in HxL3 than in the origin-based solution because of the player's ABR. In the origin-based case, the player's throughput measurement module considers the whole path from the origin to the player, while in HxL3 it considers only the path between VRP_e and the player. As the last-mile path experiences more variability, the ABR switches more often between the available bitrate levels.
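This switching behavior can be traced to what the ABR's throughput estimator observes. The sketch below shows a common harmonic-mean estimator and rate picker (illustrative only, not the specific ABR used by our player; the sample values, safety margin, and bitrate ladder are made up); fed with the more variable last-mile samples seen behind VRP_e, the estimate, and hence the chosen rung, moves more often than when fed with whole-path samples:

    def harmonic_mean_throughput(samples_bps):
        """Harmonic-mean throughput estimate over recent segment downloads,
        a common ABR building block (not the player's exact ABR)."""
        return len(samples_bps) / sum(1.0 / s for s in samples_bps)

    def select_bitrate(est_bps, ladder_bps, safety=0.8):
        """Pick the highest rung below a safety margin of the estimate."""
        feasible = [b for b in sorted(ladder_bps) if b <= safety * est_bps]
        return feasible[-1] if feasible else min(ladder_bps)

    # Two consecutive measurement windows on a variable last mile can
    # select different rungs, i.e., cause a bitrate switch:
    ladder = [400_000, 1_000_000, 2_500_000, 5_000_000]
    window1 = [6.0e6, 7.0e6, 6.5e6]   # hypothetical last-mile samples
    window2 = [6.0e6, 2.0e6, 3.0e6]   # next window, after a bandwidth dip
    print(select_bitrate(harmonic_mean_throughput(window1), ladder))  # 5000000
    print(select_bitrate(harmonic_mean_throughput(window2), ladder))  # 1000000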
Scenario 3: The main objective of this scenario is to investigate whether the location of the VRP edge matters for achieving better user experience. To do so, we deployed the VRP edge on AWS instances located in London and the USA (Ohio), while preserving the locations of the other entities in the delivery architecture, as highlighted in Table I. Average results of each run for this scenario with different segment durations (0.5 s and 4 s) are shown in Table IV and Fig. 5(b). The first observation is that HxL3-UDP and HxL3-TCP players experience low performance, with low QoE and high latency (above the target latency), for both segment durations. This is mainly because the VRP edge is located far from the players; the performance of the players therefore depends largely on the VRP location. As the distance increases, the player suffers from low video quality, high video instability, and high latency. For example, the player in HxL3-UDP (τ = 0.5 s, location = London) achieves an average bitrate of 633 kbps, 39 switches, and a latency of 3.1 s, leading to a QoE of 2.4. Here, even a small segment duration does not help, as the number of packet losses and cache misses increases and the bandwidth decreases with high fluctuation (between 1.9 Mbps and 3.2 Mbps), as shown in Fig. 5(b) (left plot). As the distance between the VRP edge and the player increases, the bandwidth experienced by the player becomes very low. This outcome is clearly visible in Fig. 5(b), where the average bandwidth with the VRP edge located in Ohio decreases by 64% (from 2.5 Mbps to 0.9 Mbps) compared to locating it in London. Similar results are experienced by the player in HxL3-TCP, due to slow-start problems caused by the long delivery path. Meanwhile, although caching more segments produces higher overall bandwidth, it yields no further QoE benefit when the VRP edge is far; instead, it introduces more latency and rebuffering. Note that the player in HxL3-TCP (τ = 4 s, location = London) in Run 1 selects a high bitrate, as the bandwidth was good (around 8 Mbps on average) with minimal congestion and a small RTT.
Scenario 4: Having shown the importance of placing the VRP edge near the players for assured QoE, in this scenario we aim to investigate the impact of segment duration on user experience; the results are presented in Table V and Fig. 5(d). For short segment durations (τ between 0.5 s and 2 s), the player in HxL3-UDP outperforms both HxL3-TCP and origin-based, confirming our finding that HxL3-UDP works better with short segment durations. Using short segment durations helps HxL3 achieve low latency, with averages of 0.46 s and 1.96 s for HxL3-UDP and HxL3-TCP, respectively. For both runs, the player experiences high bitrate, minimal rebuffering, low latency, and few switches, with averages of 993 kbps, 0.46 s, 2.95 s, and 3.74, respectively. It also experiences an average bandwidth of 6 Mbps with minimal variation. HxL3-TCP suffers from slow-start problems, which increase cache misses and result in low player performance; it also experiences slightly lower bandwidth, 4 Mbps on average. Because of the long distance between player and origin, the player in origin-based delivery experiences low bandwidth (3.3 Mbps) but still achieves good results thanks to short chunks (each encoding one frame) that the origin pushes to the player in bursts. On the other hand, for long segment durations (τ of 4 s and above), both HxL3 variants achieve good results compared to origin-based. The latency in this case is high, as expected, with averages of 7.27 s and 12.75 s for HxL3-UDP and HxL3-TCP, respectively, since larger segments take more time to download. As shown in Fig. 5(d), the bandwidth is higher than with short segment durations, as the larger segment sizes alleviate the TCP slow-start problem and allow selecting higher bitrates. In contrast, even with CMAF and CTE capabilities, the player in origin-based delivery suffers from very low and highly variable bandwidth (0.9 Mbps on average) because of the long delivery path and TCP slow-start problems, which significantly impacts player performance. The four scenarios above answer the questions posed about HxL3 performance and provide general guidelines on how the proposed solution is affected by factors such as RTT, segment duration, TCP slow start, VRP edge location, and the number of players. Thanks to the heuristic Algorithm 1 and the robust design and formulation of HxL3 (see Section III), and by following the recommendations drawn from each scenario, HxL3 is able to provide a high viewer experience with low latency at scale (as shown in offline experiments), satisfying L3 streaming requirements. In addition to these recommendations, selecting the suitable HxL3 implementation will also depend on the L3 use case and application requirements. Although the distance between endpoints is long, streaming from the origin using CMAF and CTE with a small set of players shows good performance, which raises the question, left for future investigation, of a hybrid solution combining CMAF/CTE capabilities with HxL3. We note that the selection of the best VRP_e for a set of players can be performed dynamically (e.g., at segment boundaries), either on the player side or by a third-party analytics service, based on metrics such as geographical location, latency, quality, RTT, and packet loss.
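Such dynamic selection could be realized with a simple scoring rule like the sketch below (the metric fields, weights, and candidate structure are our assumptions, not part of HxL3); the player, or an analytics service, re-evaluates the score at each segment boundary using fresh measurements:

    from dataclasses import dataclass

    @dataclass
    class EdgeCandidate:
        name: str
        rtt_ms: float           # measured round-trip time to the player
        loss_pct: float         # recent packet-loss rate (percent)
        throughput_mbps: float  # recent measured throughput

    def score(c: EdgeCandidate) -> float:
        """Lower is better; the weights are illustrative assumptions."""
        return 1.0 * c.rtt_ms + 50.0 * c.loss_pct - 10.0 * c.throughput_mbps

    def pick_edge(candidates):
        # Re-run at each segment boundary with fresh measurements.
        return min(candidates, key=score)

    edges = [EdgeCandidate("london", 35, 0.2, 5.7),
             EdgeCandidate("ohio", 120, 0.5, 2.5)]
    print(pick_edge(edges).name)  # -> "london" under these sample numbers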

VI. CONCLUSIONS
This paper proposes HxL3, a novel architecture for low latency and QoE guarantees in low-latency live (L3) streaming at global Internet scale. HxL3 is protocol- and codec-agnostic and can work with existing HTTP-based video streaming implementations. It successfully addresses the issues that arise when transport protocols such as TCP experience poor bandwidth over long E2E delivery paths between endpoints. Through a robust design and formulation that includes caching and prefetching policies as well as transcoding capabilities, HxL3 fetches and stores a minimum number of live media segments at the edge. The entities between the origin server and the players can be easily deployed as VNFs in first-mile (video contribution) and last-mile (video distribution) networks, splitting the E2E delivery path into three parts. Such splitting boosts player performance while significantly reducing the load and overhead that burden origin-based solutions, and it enables caching a few segments at the edge, which increases the available bandwidth. We have provided a practical implementation of the proposed solution, and real-world experiments over the Internet show the advantage of HxL3 over its competitors in improving QoE for a given target latency. In the future, we plan to add support for more protocols to HxL3, as well as to adopt various AI-based techniques for better caching and prefetching policies.