FSW: Fulcrum Sliding Window Coding for Low-Latency Communication

Fulcrum Random Linear Network Coding (RLNC) combines outer coding in a large Galois Field, e.g., GF(2^8), with inner coding in GF(2) to flexibly trade off the strong protection (low probability of linearly dependent coding coefficients) of GF(2^8) with the low computational complexity of GF(2). However, the existing Fulcrum RLNC approaches are generation based, leading to large packet delays due to the joint processing of all packets in a generation in the encoder and decoder. In order to avoid these delays, we introduce Fulcrum Sliding Window (FSW) coding. We introduce two flavors of FSW: Fulcrum Non-systematic Sliding Window (FNSW), which divides a given generation into multiple partially overlapping blocks, and Fulcrum Systematic Sliding Window (FSSW), which intersperses coded packets among the uncoded (systematic) transmission of the source packets in a generation. Our extensive evaluations indicate that FSSW substantially reduces the in-order packet delay (to less than one fourth for moderately large generation and window sizes) and more than doubles the encoding and decoding (computation) throughput compared to generation-based Fulcrum.


I. INTRODUCTION
Random Linear Network Coding (RLNC) is a Forward Error Correction (FEC) mechanism that linearly combines source packets according to random coding coefficients in a Galois Field (GF) to create coded packets. The coded packets carry the random coding coefficients in the packet header. A destination can decode the coded packets from the coding coefficients without any need for additional signaling or coordination between the encoder and decoder, making RLNC attractive for a wide range of communication scenarios in general wireless networks [1]-[4] and wireless sensor networks [5]-[7], as well as backhaul networks [8], [9] and content distribution networks [10]-[13].
In particular, with RLNC, the decoder can decode N source packets from any set of N received coded packets as long as the coded packets (i.e., the corresponding coding coefficients) are linearly independent. A large Galois Field, e.g., GF(2^8), ensures that the coding coefficients are linearly independent with a very high probability, but requires complex computations for the decoding. Therefore, GF(2^8) coded packets can reasonably only be decoded on computationally powerful destination nodes. In contrast, the small Galois Field GF(2) suffers from a relatively high probability of linearly dependent coding coefficients, but can be decoded with elementary Exclusive OR (XOR) operations with low computational complexity. Thus, GF(2) coded packets can be decoded on nodes with little computational power, e.g., Internet of Things (IoT) actuator nodes.
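As a rough quantitative illustration of this field-size trade-off (a standard counting result for uniformly random coefficients, not a formula taken from this article), the probability that N coding vectors drawn uniformly at random from GF(q)^N are linearly independent can be written as:

```latex
P_{\mathrm{indep}}(N, q) \;=\; \prod_{i=1}^{N} \left(1 - q^{-i}\right)
```

For q = 2, this product approaches approximately 0.289 as N grows, whereas for q = 2^8 it remains at approximately 0.996. The symbol P_indep is introduced here only for illustration.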
Conventional RLNC is restricted to operating only in a single Galois Field and thus can cater either to destination nodes with high computation power or to destination nodes with low computation power. In contrast, Fulcrum RLNC combines a large-field outer coding, e.g., in GF(2^8), with inner coding in the small field GF(2), so as to flexibly cater to both high-powered and low-powered destination nodes [14], [15]. Thus, Fulcrum RLNC enables the use of RLNC as FEC for scenarios with sets of diverse destinations that differ widely in their computing capabilities, e.g., IoT systems with high-powered gateways and low-powered actuators. However, existing Fulcrum RLNC schemes are restricted to operate only on the basis of so-called generations, which are groups of N source packets. With generation-based Fulcrum RLNC, all N source packets in a given generation are randomly combined during encoding; hence, all coded packets for a given generation need to be received before the coded packets can be decoded to obtain the source packets. The generation-based operation introduces delays that are proportional to the generation size N.
Finite sliding window RLNC seeks to reduce the delay due to the coding structure by combining the source packets in a relatively small coding window covering w, w < N, source packets so as to reduce the RLNC latencies [16]-[18]. Additionally, source packets can be sent in uncoded (so-called "systematic") form to bypass the decoding, and to avoid the delays associated with encoding and decoding [19]-[23]. The systematic source packet transmissions can be FEC protected by coded packets that are combinations of the source packets in a window and are interspersed among the source packet transmissions [19]-[21], [24], [25]. To the best of our knowledge, all such existing schemes for reducing the latency in RLNC packet communication operate only with a single Galois Field size and thus can cater only to destinations with either high or low computational capabilities.

A. CONTRIBUTION AND STRUCTURE
Based on the review of the related literature in Section I-B, we develop and evaluate the first sliding window approach to Fulcrum RLNC coding so as to enable RLNC FEC for diverse sets of destinations while mitigating latencies through systematic source packet transmissions and sliding window RLNC coding. The Fulcrum Sliding Window (FSW) coding developed in Section II incorporates the sliding window mechanism in the Fulcrum inner coding. In particular, Fulcrum Non-systematic Sliding Window (FNSW) conducts the inner coding for a generation of N source packets (and associated outer coded packets) by coding multiple partially overlapping blocks and sends only coded packets. In contrast, Fulcrum Systematic Sliding Window (FSSW) transmits the source packets and outer coded packets in systematic form, interspersed by sliding window inner coded packets. In summary, our main original contributions towards developing novel network coding methodologies are:
• We design the Fulcrum Sliding Window (FSW) methodology for RLNC, the first RLNC approach that permits the flexibility of utilizing different Galois fields (catering to receivers with different computing capabilities) with a low-latency sliding window.
• We design the Non-systematic FSW flavor (called FNSW), which codes all transmitted packets.
• We design the Systematic FSW flavor (called FSSW), which further reduces latencies by transmitting source packets in systematic form.

TABLE 1.
Approach                            | Coding structure | Galois Fields (GFs)
[14], [15], [26]-[30]               | Generation       | 2 different GFs
Sliding window [16]-[18], [31]-[33] | Sliding window   | Single GF
FSW (this article)                  | Sliding window   | 2 different GFs

Our performance evaluations in Section III compare the two proposed FSW flavors with the low-latency single Galois Field PACE RLNC approach [25], which intersperses coded packets among the systematic source packet transmissions of a given generation, and with a small-generation variation of the original generation-based Fulcrum RLNC [14].
We find that FSSW attains the short in-order packet latencies of PACE while achieving higher encoding and decoding throughput levels than PACE and conventional Fulcrum, and nearly as high decoding probabilities as PACE. We also find that FSW coding effectively differentiates the decoding complexity between inner decoding in GF(2) and outer decoding in GF(2^8). Section IV summarizes the conclusions from this first FSW study and outlines future FSW research directions.

B. BACKGROUND AND RELATED WORK

1) Generation-Based Fulcrum RLNC
Fulcrum RLNC [14], [15] provides a convenient RLNC coding mechanism to provide the performance characteristics of multiple Galois Fields [34] to heterogeneous receiver nodes. Recent Fulcrum RLNC research has examined the reduction of the encoding and decoding computational complexity through sparse coding with a low density of non-zero coding coefficients [26] as well as the adaptivity and energy efficiency in fog computing and smart city scenarios [27]- [30]. To the best of our knowledge, as summarized in Table 1, all existing Fulcrum RLNC related research has been limited to generation-based coding, which introduces relatively long packet delays. In contrast, we introduce Fulcrum Sliding Window (FSW) coding in this study to reduce the packet delays.

2) Single Galois Field Sliding Window RLNC
Sliding window RLNC [16]-[18] avoids the delays of processing full generations of packets; related research has explored the mixing of generations [35] and batched network coding [36], [37]. The low-latency characteristics of sliding window RLNC are well suited for media streaming, which has been studied in [31]-[33]. Generally, sliding window RLNC can operate in a non-systematic mode or a systematic mode. Before delving into the review of the two modes of sliding window RLNC, we briefly note that an alternative approach to reduce packet delays is to focus on systematic source packet transmissions and to enable the recovery of erased systematic source packets through interspersed coded packets, e.g., through the so-called PACE approach [25].

a: Non-systematic Sliding Window
Non-systematic sliding window coding partitions the original source packets into overlapping windows (also referred to as blocks) consisting of w source packets, whereby the window size w is smaller than the generation size N. The w source packets in a given block are coded together. After generating enough coded packets for a given block, the window slides forward by at most w source packets; the more overlap, the more reliable the coding.
Non-systematic sliding window coding has been studied for different types of coding, such as digital fountain codes for streaming multimedia [38], Raptor codes for efficient video broadcasting [39], and BATS codes [40]. An analytical model for non-systematic sliding window coding has been developed in [41], while the specific application context of mobile ad hoc networks has been studied in [42], [43] and cooperative communication has been explored in [44].

b: Systematic Sliding Window
Systematic sliding window coding transmits a subset of u source packets in uncoded (systematic) form, followed by the transmission of coded packets. These coded packets are random linear combinations of the w source packets in the (coding) window. Research has explored mechanisms for advancing the systematic sliding window with feedback from the destination to the sender [21], [45] and without such feedback [46], [47]. Throughout this study, we consider sliding window coding without feedback.
To the best of our knowledge, all existing sliding window research has considered RLNC with a single fixed Galois Field, limiting the existing sliding window RLNC approaches, e.g., to either high-powered or low-powered destinations. In contrast, we introduce Fulcrum sliding window RLNC coding to support diverse sets of destinations.

A. BACKGROUND: GENERATION-BASED FULCRUM RLNC
The general Fulcrum coding principle is that a generation of N source packets is expanded by r expansion packets that contain redundant information to a total of N + r coded packets. More specifically, the original source packets P = {P_0, P_1, ..., P_{N−1}} are first multiplied with outer coding coefficients c_{ℓ,j} that are randomly selected by the encoder from GF(2^h) to create the r expansion packets [14]:
$$ o_\ell = \sum_{j=0}^{N-1} c_{\ell,j} \cdot P_j, \quad \ell = 1, 2, \ldots, r. \tag{1} $$
These expansion packets are concatenated with the original source packets P to create the set of outer coded packets
$$ O = \{P_0, P_1, \ldots, P_{N-1}, o_1, o_2, \ldots, o_r\}. \tag{2} $$
Then, these outer coded packets are multiplied with the inner coding coefficients λ_{ℓ,j} that are randomly selected by the encoder from GF(2) to create the inner coded packets:
$$ I_\ell = \sum_{j=0}^{N+r-1} \lambda_{\ell,j} \cdot O_j, \tag{3} $$
where O_j denotes the j-th element of O. The inner coded packets I_ℓ can be recoded in the intermediate nodes following the general RLNC recoding principles. Decoders can decode the received packets with inner, outer, or combined decoding, depending on their computing power.
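The two coding stages described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the implementation used in this article: the GF(2^8) reduction polynomial 0x11D, the function names `gf256_mul`, `outer_expand`, and `inner_encode`, and the toy packet contents are all hypothetical choices made for the example.

```python
import random

# Assumption: GF(2^8) with reduction polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11D);
# the article does not state which polynomial is used.
GF256_POLY = 0x11D

def gf256_mul(a: int, b: int) -> int:
    """Multiply two GF(2^8) elements (shift-and-add with polynomial reduction)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= GF256_POLY
        b >>= 1
    return result

def outer_expand(source, r, rng):
    """Append r expansion packets, each a dense GF(2^8) combination of the source."""
    size = len(source[0])
    expansion = []
    for _ in range(r):
        coeffs = [rng.randrange(256) for _ in source]
        packet = bytearray(size)
        for c, p in zip(coeffs, source):
            for k in range(size):
                packet[k] ^= gf256_mul(c, p[k])
        expansion.append(bytes(packet))
    return list(source) + expansion  # the N + r outer coded packets

def inner_encode(outer, rng):
    """Create one inner coded packet as a random GF(2) (XOR) combination."""
    coeffs = [rng.randrange(2) for _ in outer]
    packet = bytearray(len(outer[0]))
    for lam, o in zip(coeffs, outer):
        if lam:
            for k in range(len(packet)):
                packet[k] ^= o[k]
    return coeffs, bytes(packet)

rng = random.Random(1)
source = [bytes([j] * 4) for j in range(1, 10)]  # N = 9 toy packets of 4 bytes each
outer = outer_expand(source, r=2, rng=rng)
coeffs, coded = inner_encode(outer, rng)
```

The inner stage only needs XOR operations, which is why a low-powered receiver can restrict itself to inner decoding.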

B. PRINCIPLES OF FULCRUM SLIDING WINDOW (FSW)
This section explains how non-systematic and systematic sliding window schemes can be integrated into the Fulcrum coding. The integration is applied to the Fulcrum inner coding coefficients since the number (typically at least N + r) of packets generated by the inner encoder is much larger than the number (r) of packets generated by the outer encoder. Also, all three types of Fulcrum decoders require the coded packets generated by the inner encoder. Hence, the structure of the inner encoding has a substantial impact on the complexity and latency of the decoding. In contrast, the outer expansion packets generated by the outer encoder only help the outer and combined decoders to increase the packet decoding probability; they do not lower the packet latency.
In the proposed FSW coding, the outer encoder generates outer coded expansion packets in the conventional way. Specifically, r outer coded expansion packets are generated with dense outer coding coefficients from a large field size, e.g., GF(2^8), see Section II-A. In FSW coding, the inner encoder randomly selects the coding coefficients in the window from GF(2) and sets the other coefficients to zero.

C. FULCRUM NON-SYSTEMATIC SLIDING WINDOW

1) Encoding
In order to keep the encoding and decoding computation time as well as the complexity low, Fulcrum Non-Systematic Sliding Window (FNSW) divides a data stream into blocks, whereby the block length equals the window length w. In the encoding, only the w packets in a given block are combined together. Figure 1(a) illustrates the coding coefficient matrix of the inner encoder for a generation of N = 9 source packets and r = 2 outer coded packets. The block length (equivalently, the window size) is w ≤ N; in this example, w = 5, while the moving step is m = 3 packets. As illustrated in Figure 1(a), the first c_FNSW × w = 5 × 5 block of inner coding coefficients, which is marked with gray shading in the upper left of Figure 1(a), consists of the coding coefficients corresponding to the packets P_0, P_1, P_2, P_3, and P_4. According to the example in Figure 1(a), the encoder has to generate at least 5 coded packets out of these w = 5 source packets in order to enable the decoding of this block. In the example in Figure 1(a), the coding coefficient row for creating the first coded packet is 1, 0, 1, 1, 0, i.e., the first inner coded packet I_0 to be sent in time slot 0 is a combination of the source packets P_0, P_2, and P_3 (since the corresponding coding coefficients are 1), while the coding coefficients corresponding to P_1 and P_4 are zero (i.e., P_1 and P_4 are not included in I_0). After the encoder generates c_FNSW coded packets from a block, the coding window is moved forward by m = 3 packets, as summarized in the flowchart in Figure 2. The second block consists of the inner coding coefficients corresponding to the source packets P_3, P_4, P_5, P_6, and P_7 (see the gray shaded 5 × 5 block in the center of Figure 1(a)), whereby P_3 and P_4 are common to the first block and the second block. As a result, packets P_3 and P_4 are better protected than packets P_0, P_1, and P_2 since packets P_3 and P_4 are covered by both the first and the second block.
In an erasure-free scenario, there is no need for overlap, i.e., the window can move by the block size, m = w. In erasure scenarios, either a large overlap or a high number c_FNSW of coded packets should be set to protect against packet erasures. Note that since the inner encoder operates in GF(2), the probability of creating linearly dependent packets is high. Thus, the number of coded packets generated for one block should be at least equal to the window size, i.e., c_FNSW ≥ w.
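The block structure of the FNSW inner coefficient matrix can be sketched as follows. This is a minimal sketch, assuming that windows start at multiples of m and that the last window is clamped to end at the last outer coded packet; the function name `fnsw_coefficients` and the clamping rule are assumptions for illustration, not taken from the flowchart in Figure 2.

```python
import random

def fnsw_coefficients(total, w, m, c, rng):
    """Return sparse GF(2) coefficient rows for a sliding-window schedule.

    total: number of outer coded packets (N + r); w: window size;
    m: moving step; c: coded packets per block (c_FNSW).
    """
    if w > total:
        raise ValueError("window size must not exceed N + r")
    rows = []
    start = 0
    while True:
        # Assumption: clamp the last window so that it ends at packet total - 1.
        start = min(start, total - w)
        for _ in range(c):
            row = [0] * total
            for j in range(start, start + w):
                row[j] = rng.randrange(2)  # random coefficient inside the window
            rows.append(row)               # coefficients outside the window stay 0
        if start + w >= total:
            break
        start += m
    return rows

# Parameters of the Figure 1(a) example: N = 9, r = 2, w = 5, m = 3, c_FNSW = 5.
rows = fnsw_coefficients(total=11, w=5, m=3, c=5, rng=random.Random(0))
```

With these parameters, the sketch produces three blocks of five rows each, covering packets 0 to 4, 3 to 7, and 6 to 10.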

2) Decoding
For progressive decoding, which commences elimination with the first received packet, a modified version of the Gauss-Jordan elimination is applied [48].
a: Inner decoding
The decoding process of one block can be finalized with inner decoding when the decoder has received w linearly independent coded packets. Then, all w packets can be decoded together and sent to the upper layer. If the number of received linearly independent coded packets for one block is less than w, then the inner decoding process of this block is not successful, and additional coded packets are required. These additional coded packets can be coded packets generated by the inner encoder as random linear combinations of all N + r packets. Since these additional coded packets are denser than the coded packets that are generated from a single block, they can (provided they are linearly independent) recover packet erasures or linear dependencies that occur in any block.
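Progressive inner decoding can be illustrated with a small on-the-fly Gauss-Jordan eliminator over GF(2). The following is a generic sketch, not the modified elimination of [48] and not the Kodo decoder; the class name `GF2ProgressiveDecoder` and its methods are hypothetical.

```python
class GF2ProgressiveDecoder:
    """On-the-fly Gauss-Jordan elimination over GF(2) (coefficients and payload)."""

    def __init__(self):
        self.pivots = {}  # pivot column -> (coefficient bitmask, payload as int)

    def add(self, coeffs, payload: bytes) -> bool:
        """Insert one coded packet; return True if it was linearly independent."""
        mask = sum(bit << j for j, bit in enumerate(coeffs))
        data = int.from_bytes(payload, "big")
        # Forward step: remove all known pivot columns from the new row.
        for col, (pmask, pdata) in self.pivots.items():
            if (mask >> col) & 1:
                mask ^= pmask
                data ^= pdata
        if mask == 0:
            return False  # linearly dependent: no new information
        col = (mask & -mask).bit_length() - 1  # lowest set bit becomes the pivot
        # Backward step: clear the new pivot column from all stored rows.
        for other, (pmask, pdata) in list(self.pivots.items()):
            if (pmask >> col) & 1:
                self.pivots[other] = (pmask ^ mask, pdata ^ data)
        self.pivots[col] = (mask, data)
        return True

    def rank(self) -> int:
        return len(self.pivots)

    def decoded(self, size: int):
        """Return {index: payload} for every packet that is fully resolved."""
        return {col: pdata.to_bytes(size, "big")
                for col, (pmask, pdata) in self.pivots.items()
                if pmask == (1 << col)}

decoder = GF2ProgressiveDecoder()
decoder.add([1, 1, 0], b"\x03")              # P_0 xor P_1
decoder.add([0, 1, 1], b"\x06")              # P_1 xor P_2
dependent = decoder.add([1, 0, 1], b"\x05")  # sum of the two rows above
decoder.add([0, 0, 1], b"\x04")              # P_2 alone completes the block
```

In this toy run, the third packet is rejected as linearly dependent, and the block of w = 3 packets is decoded once three independent packets have arrived.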

b: Outer and combined decoding
For outer or combined decoding, w linearly independent coded packets are required to decode blocks that do not include outer expansion packets (i.e., with the coefficients corresponding to the outer expansion packets set to zero). For instance, in Figure 1(a), five linearly independent coded packets are required to decode each of the first two blocks (since the coding coefficients corresponding to the outer expansion packets o_1 and o_2 are zero). However, the outer decoder only needs w − r = 3 linearly independent coded packets to decode the last block in Figure 1(a) since the dimension of the decoding matrix is N × N and the outer expansion packets are used to map the coding coefficients back to the higher field (see [14] for details). Moreover, these outer expansion packets are random linear combinations of the N source packets, i.e., they are fully dense.

3) Parameter settings
The number c_FNSW of inner coded packets per block and the moving step m control the trade-off between low delay and low computational complexity on the one hand, and protection against channel erasures on the other hand. For instance, a larger number c_FNSW of coded packets for a block will increase the chance of decoding the block; on the downside, a larger c_FNSW will delay the transmission of the next block. We propose to set the number c_FNSW of coded packets for a block as a function of the channel packet erasure probability ϵ:

A smaller moving step m implies more overlap, i.e., the window will slide forward more slowly, resulting in more blocks. Hence, a smaller m will increase the total number of generated packets and the computational time. We propose to set:

i.e., a larger erasure probability ϵ implies a smaller moving step m, and thus more overlap. The number n_w of windows in one generation [40, Eq. (1)] is

We define the number t_FNSW of packets that are inner GF(2) coded according to the sliding window scheme, i.e., as combinations of w packets (sparse coded packets), for a given generation as

For instance, in Figure 1(a), the source transmits t_FNSW = 15 packets that are coded according to the FNSW scheme. After sending these t_FNSW = 15 sparse coded packets, the source can send dense coded packets that are random linear combinations of the N + r packets, until the N source packets can be decoded. Based on [49, Eq. (7)], the probability of receiving w linearly independent coded packets out of c_FNSW transmitted inner GF(2) coded packets is:
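As a rough stand-in that is not the expression from [49, Eq. (7)], the following sketch estimates the block decoding probability under two simplifying assumptions: all inner coefficients within a window are drawn uniformly from GF(2), and each transmitted packet is erased independently with probability ϵ. It uses the standard result that a random k × w binary matrix with k ≥ w has full column rank with probability ∏_{i=0}^{w−1} (1 − 2^{i−k}). The function names are hypothetical.

```python
from math import comb

def full_rank_probability(received: int, w: int) -> float:
    """P(a uniformly random received x w binary matrix has rank w)."""
    if received < w:
        return 0.0
    p = 1.0
    for i in range(w):
        p *= 1.0 - 2.0 ** (i - received)
    return p

def block_decoding_probability(c: int, w: int, eps: float) -> float:
    """Average the full-rank probability over the binomial number of received packets."""
    return sum(comb(c, k) * (1 - eps) ** k * eps ** (c - k)
               * full_rank_probability(k, w)
               for k in range(w, c + 1))

# Sending more coded packets per block raises the decoding probability.
p_few = block_decoding_probability(c=6, w=5, eps=0.1)
p_many = block_decoding_probability(c=10, w=5, eps=0.1)
```

Under these assumptions, even without erasures, c_FNSW = w coded packets decode a block of w = 5 packets only with a probability of about 0.3, which illustrates why additional coded packets or dense packets at the end of the generation are needed.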

D. FULCRUM SYSTEMATIC SLIDING WINDOW (FSSW)

1) Encoding
Fulcrum Systematic Sliding Window (FSSW) first transmits a subset of u uncoded source packets, followed by c_FSSW coded packets, as summarized in the flowchart in Figure 3. Figure 1(b) illustrates the FSSW inner coding coefficient matrix for a generation of N = 9 source packets, r = 2 outer coded packets, subset size u = 3, and window length w = 5. As marked in Figure 1(b) with gray shading, the coding coefficient that corresponds to a systematic packet is set to one (while all other coding coefficients in the row are set to zero). For instance, in the first row of Figure 1(b), the coefficient that corresponds to packet P_0 is one and all other coding coefficients are zero, which means that P_0 is sent uncoded, i.e., systematically. After sending u systematic packets, c_FSSW = 1 coded packet is transmitted. The coding coefficients of the coded packets are selected randomly from GF(2). Since w > u, the window also includes w − u source packets from the previous subset of uncoded packets. For instance, the coverage of the second coded packet (which corresponds to time slot 7 in Figure 1(b)) includes P_1 and P_2, which were sent in the first subset. We propose to set the window length to
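The interleaving of systematic and coded transmissions described in this subsection can be sketched as a coefficient-matrix generator. This is a minimal sketch, assuming that each coded packet covers the w most recently sent packets; the function name `fssw_coefficients` is hypothetical, and the window-length rule proposed in the article is not reproduced here.

```python
import random

def fssw_coefficients(total, u, w, c, rng):
    """Return (kind, row) pairs: u systematic rows, then c coded rows, repeated."""
    rows = []
    sent = 0  # number of packets already sent in systematic form
    while sent < total:
        subset_end = min(sent + u, total)
        for j in range(sent, subset_end):
            row = [0] * total
            row[j] = 1  # systematic packet: exactly one coefficient is one
            rows.append(("systematic", row))
        sent = subset_end
        # Assumption: the window covers the w most recently sent packets.
        lo = max(0, sent - w)
        for _ in range(c):
            row = [0] * total
            for j in range(lo, sent):
                row[j] = rng.randrange(2)
            rows.append(("coded", row))
    return rows

# Parameters of the Figure 1(b) example: N = 9, r = 2, u = 3, w = 5, c_FSSW = 1.
schedule = fssw_coefficients(total=11, u=3, w=5, c=1, rng=random.Random(0))
```

In this sketch, the row index equals the time slot, so the coded packets fall into time slots 3, 7, 11, and 14, and the coded packet in time slot 7 covers P_1 through P_5.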

2) Decoding
The decoder only needs to decode coded packets to recover packet erasures since received systematic packets can immediately be sent to the upper layer. The decoder can store systematic packets to utilize in the decoding process of coded packets so as to recover packet erasures. For instance, suppose that P_5 in Figure 1(b) is erased. If the decoder receives the second coded packet, which is a combination of P_1, P_3, and P_5, the decoder needs P_1 and P_3 in uncoded form to recover P_5. If a packet is erased and cannot be recovered through redundant packets, then either the sender needs to send coded packets at the end of the generation, or this packet will be counted as a loss.
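The recovery step in the P_5 example amounts to XORing the coded packet with the stored systematic packets. The snippet below is a minimal sketch of that single-erasure case with hypothetical helper names and toy payloads; recovering several erasures at once would require elimination over multiple coded packets.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR, i.e., addition in GF(2)."""
    return bytes(x ^ y for x, y in zip(a, b))

def recover_erased(coded: bytes, members: dict) -> bytes:
    """Recover the single erased packet combined in a GF(2) coded packet.

    members maps each packet index combined in `coded` to its stored payload,
    or to None for the one erased packet.
    """
    missing = [j for j, payload in members.items() if payload is None]
    if len(missing) != 1:
        raise ValueError("exactly one combined packet must be missing")
    result = coded
    for payload in members.values():
        if payload is not None:
            result = xor_bytes(result, payload)
    return result

p1, p3, p5 = b"\x11\x22", b"\x0f\xf0", b"\xaa\x55"       # toy payloads
coded = xor_bytes(xor_bytes(p1, p3), p5)                  # combination of P_1, P_3, P_5
recovered = recover_erased(coded, {1: p1, 3: p3, 5: None})
```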

3) Parameter setting
For a given packet erasure probability ϵ, the corresponding code rate c ≤ 1 − ϵ [50] defines the number c_FSSW of coded packets that need to be transmitted for a subset of u uncoded packets:

We define the number t_FSSW of packets that are coded according to the FSSW sliding window scheme for a given generation as
$$ t_{\mathrm{FSSW}} = N + r + \left\lceil \frac{c_{\mathrm{FSSW}} \cdot (N + r)}{u} \right\rceil, $$
whereby N + r packets (i.e., the N source packets and the r outer coded packets) are sent systematically by the inner encoder, and ⌈c_FSSW · (N + r)/u⌉ packets are sent as inner GF(2) coded packets, i.e., as combinations of w packets (sparse coded packets).
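The packet count stated in the text (N + r systematic packets plus ⌈c_FSSW · (N + r)/u⌉ sparse coded packets) can be checked with a short helper. The function name `t_fssw` is hypothetical, and the parameter values are taken from the Figure 1(b) example and from the setting u = 9, c_FSSW = 1 used for the throughput comparison with N = 64 and r = 4 assumed here.

```python
def t_fssw(n: int, r: int, u: int, c_fssw: int) -> int:
    """Number of FSSW packets per generation: systematic plus sparse coded packets."""
    systematic = n + r
    coded = (c_fssw * (n + r) + u - 1) // u  # integer ceiling of c_fssw * (n + r) / u
    return systematic + coded

small = t_fssw(n=9, r=2, u=3, c_fssw=1)    # 11 systematic + 4 coded packets
large = t_fssw(n=64, r=4, u=9, c_fssw=1)   # 68 systematic + 8 coded packets
```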

III. PERFORMANCE EVALUATION
This section first introduces our simulation setup in Section III-A and then examines the delay of individual packets in Section III-B, the encoding and decoding throughput for different generation sizes in Section III-C, the impact of different window lengths on the delay, linear dependency, and throughput in Section III-D, and the decoding probability in Section III-E.

1) Overall Setup
We implemented FSSW and FNSW with the Kodo library (kodo-fulcrum version 7.0) [51]. We measured the encoding and decoding throughput with the standard benchmarks in the library. We considered a coding scenario consisting of one sender and one receiver, without intermediate recoding nodes. A progressive decoder is applied, which is an on-the-fly version of the Gauss-Jordan elimination [48] and starts decoding from the first received coded packet. We performed the measurements in a virtual machine with two vCPUs and roughly 3 GB RAM. The host PC has an Intel(R) Core(TM) i7-7600U CPU at 2.90 GHz with 20 GB RAM. The size of the data packets was 1500 bytes, which is the maximum size of Ethernet packets. We use the Galois Fields GF(2) for the inner coding and GF(2^8) for the outer coding. We conducted over 2000 independent replications for each scenario, resulting in 95% confidence intervals that are less than 1% of the corresponding sample means. The confidence intervals are omitted from the plots to avoid visual clutter. The channel erases a transmitted packet independently with probability ϵ.
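The independent erasure channel can be mimicked with a one-line Bernoulli filter. This is only an illustrative sketch with a hypothetical function name; it is unrelated to the Kodo benchmark code used for the measurements.

```python
import random

def erasure_channel(packets, eps, rng):
    """Erase each packet independently with probability eps (None marks an erasure)."""
    return [p if rng.random() >= eps else None for p in packets]

rng = random.Random(42)
received = erasure_channel(list(range(1000)), eps=0.1, rng=rng)
erased_fraction = received.count(None) / len(received)
```

Over many transmitted packets, `erased_fraction` should be close to the configured ϵ.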

2) Throughput Metrics
With N denoting the generation size in number of source packets and σ denoting the packet size in bytes, the encoding throughput is defined as the generation size in bytes N · σ, divided by the encoding computation time for a given generation. Similarly, we define the decoding throughput as N · σ divided by the decoding computation time for recovering the N original data packets.
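The throughput definition translates directly into code. The sketch below uses hypothetical names and a placeholder workload; it only illustrates the metric N · σ divided by the computation time and does not reproduce the library benchmarks.

```python
import time

def throughput(n_packets: int, packet_size: int, seconds: float) -> float:
    """Generation size in bytes (N * sigma) divided by the computation time."""
    return n_packets * packet_size / seconds

def measure_throughput(process_generation, n_packets: int, packet_size: int) -> float:
    """Time one encoding (or decoding) run of a generation and return bytes per second."""
    start = time.perf_counter()
    process_generation()
    elapsed = time.perf_counter() - start
    return throughput(n_packets, packet_size, elapsed)

# Placeholder workload standing in for encoding one generation of N = 64 packets.
measured = measure_throughput(lambda: sum(i * i for i in range(200_000)), 64, 1500)
```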

3) Delay Metrics
We assume that time is slotted between the sender and receiver, and that one packet is transmitted in each time slot. We define the packet in-order delay δ(i) for packet i, i = 0, 1, ..., N − 1, as the index of the time slot in which the packet is decoded (while neglecting the decoding computation time, which is evaluated as decoding throughput) minus the packet order number i. For instance, suppose in Figure 1(b) that P_1 is erased when it is sent uncoded in time slot 1. Then, suppose the receiver receives the first coded packet in time slot 3 and recovers P_1. The in-order delay of P_1 is then two time slots, since it was supposed to be received in time slot 1, while it was actually decoded in time slot 3. We define the mean in-order packet delay D as the mean of the in-order packet delays δ(i) experienced by the source packets i, i = 0, 1, ..., N − 1, of a generation, i.e.,
$$ D = \frac{1}{N} \sum_{i=0}^{N-1} \delta(i). $$
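The delay metrics can be computed from the time slots in which the packets are decoded. The helper names below are hypothetical; the example values follow the P_1 scenario from the text, restricted to the first three packets of Figure 1(b).

```python
def in_order_delays(decode_slots):
    """decode_slots[i] is the time slot in which source packet i is decoded."""
    return [slot - i for i, slot in enumerate(decode_slots)]

def mean_in_order_delay(decode_slots):
    """Mean of the in-order packet delays over the given packets."""
    delays = in_order_delays(decode_slots)
    return sum(delays) / len(delays)

# P_0 and P_2 arrive in their own slots; P_1 is erased and recovered in slot 3.
slots = [0, 3, 2]
delays = in_order_delays(slots)
mean_delay = mean_in_order_delay(slots)
```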

4) Benchmarks
We compare the performance of the proposed FNSW and FSSW for a given generation size N and number r of Fulcrum outer coded packets against the original generation-based Fulcrum [14] operating with a range of "small-generation" sizes η, η ≤ N, and corresponding numbers ρ, ρ ≤ r, of outer coded packets, as manipulating the generation size is a possible strategy to tune the RLNC performance [52]-[54]. Specifically, for FSW scenarios with fixed generation size N = 64 and r = 4 outer coded packets in Section III-D, we consider the following "small-generation" original generation-based Fulcrum benchmarks: two small-generations, each with generation size η = 32 and ρ = 2 outer coded packets; four small-generations with η = 16 and ρ = 1; eight small-generations with η = 8 and ρ = 1 (so as to maintain the Fulcrum principle); and 16 small-generations with η = 4 and ρ = 1.
We further compare FSSW with u systematic packets in a subset and c_FSSW coded packets after a subset with PACE-Burst encoding [25] with u systematic packets followed by c_FSSW coded packets per sub-generation. The PACE benchmark focuses on low latency by interspersing GF(2^8) coded packets that combine all source packets in a generation that have been transmitted so far.

Table 3 lists the FSSW and FNSW parameters for the two considered packet erasure probabilities ϵ = 0.05 and ϵ = 0.15. For both FSSW and FNSW, the code rate c was set according to Eqn. [...] After sending t_FSSW and t_FNSW packets, respectively, according to the sliding window approaches, the source sent dense GF(2) coded packets until the N source packets could be decoded. These FSSW and FNSW parameters were identical for all Fulcrum decoder types, i.e., for the inner, outer, and combined Fulcrum decoders.

FIGURE 4. [...] Table 3 for ϵ = 0.05, but actual ϵ = 0.15 in (c), while estimated ϵ = 0.15 and coding parameters from Table 3 for ϵ = 0.15, but actual ϵ = 0.05 in (d); fixed parameters: generation size N = 64, r = 4 outer coded packets.

2) Results and Discussion
Figure 4 shows the packet in-order delay δ(i) for the inner decoder as well as for the outer/combined decoder. We observe that the delays for the inner and outer/combined decoders exhibit generally similar behaviors, whereby the outer/combined decoder can slightly reduce the delays compared to the inner decoder. In particular, with original Fulcrum RLNC, the delay of the first packets (small i) is very high since all packets are decoded at the end of the generation, namely after the outer/combined decoder has received N = 64 linearly independent coded packets, or the inner decoder has received N + r = 68 linearly independent coded packets [14]. The decoded packets are sent to the upper layer. For inner decoding, the in-order delay δ(0) of the first packet is around 72 time slots for ϵ = 0.05 in Figure 4.
We observe from Figure 4 that the FNSW in-order packet delay resembles a sawtooth wave with four peaks. The four peaks are due to the n_w = 4 windows in a generation, whereby each peak corresponds to the beginning of a block. The delay of the first packet in a block is the highest among the packets in the block because in FNSW, the receiver has to have w linearly independent packets to decode the block. Thus, the first packet in a block has to wait until w linearly independent coded packets have been received, causing the decreasing slope of a given sawtooth, whereby the first (left-most) sawtooth consists of w packets and subsequent sawtooth waves consist of m ≤ w packets (due to the w − m packets of overlap with the preceding window). If the c_FNSW coded packets per window of w packets are not enough to recover the erased packets of a block, then some packets in the block have to wait until dense coded packets arrive at the end of the generation, causing an underlying decreasing delay trend for increasing packet order number i, see the mild trend in Figure 4(a) and the pronounced trend in Figure 4(c). In contrast, too many coded packets delay the transmission of the next window, causing an underlying increasing delay trend, see Figure 4(d).
For FSSW, Figure 4 indicates that the in-order packet delays are consistently below 12 time slots for all considered scenarios. FSSW achieves the short in-order packet delays due to the transmission of the systematic packets. More specifically, a received systematic packet i can immediately be passed to the upper layer. If there are no more than c_FSSW packet erasures within a subset of u packets, then the erasures can be recovered by the c_FSSW coded packets at the end of the subset (provided these c_FSSW coded packets are linearly independent). If the packet erasure probability ϵ on the channel has been correctly estimated, then the c_FSSW coded packets are typically sufficient for the recovery, resulting in the nearly flat delay curves in Figures 4(a) and 4(b). On the other hand, if ϵ is underestimated and thus c_FSSW is too small, then some erased packets can only be recovered with the dense coded packets sent at the end of the generation, resulting in a tendency towards increased delays for the packets early in the generation and thus the decreasing delay trend with increasing packet number i in Figure 4(c).
A large number c_FSSW of coded packets at the end of a subset of u packets is helpful for ensuring the recovery of the erased packets of the subset; however, the c_FSSW coded packets delay the transmission of the subsequent systematic source packets by c_FSSW time slots. These dynamics lead to the four barely visible sawtooth waves in Figure 4(b), where the c_FSSW = 3 coded packets protect against the high erasure probability ϵ = 0.15, while increasing the in-order delays of the packets at the beginning of the next subset. The downward trend within a given sawtooth (subset) is caused by the packet recovery with the coded packets at the end of the subset. An excessive number c_FSSW of coded packets leads to the "step-up" curve of the in-order packet delays in Figure 4(d).
We also observe from Figure 4 that PACE, with its structure of u systematic packets followed by c_FSSW dense GF(2^8) coded packets, achieves the same low delays as FSSW; however, PACE does not provide the flexibility of utilizing the inner, outer, or combined Fulcrum decoder.

C. IMPACTS OF GENERATION SIZE ON THROUGHPUT
This section examines the encoding and decoding (computation) throughput as a function of the generation size N. The channel erasure probability was set to ϵ = 0.1. The window size was set to w = 10 packets so that FNSW can have a low coding complexity and low computation time and correspondingly high throughput levels. The effect of the window size w on the throughput will be examined in Section III-D.

1) Encoding Throughput
We observe from Figure 5(a) that FSSW and FNSW substantially increase the encoding throughput compared to the original generation-based Fulcrum, whereby the encoding throughput gap widens with increasing generation size N . In particular, FSSW more than doubles the original Fulcrum encoding throughput for N = 50; while for N = 400, FSSW achieves more than five times the original Fulcrum encoding throughput.
We also observe from Figure 5(a) that PACE achieves an encoding throughput between those of FSSW and FNSW for the small generation size N = 50, while the PACE encoding throughput drops to just slightly above the original Fulcrum encoding throughput for large N. PACE sends u = 9 systematic packets followed by c_FSSW = 1 dense GF(2^8) coded packet that combines all previously transmitted source packets of the generation, whereas FSSW and FNSW utilize GF(2) for the inner encoding and GF(2^8) only for the outer encoding.

2) Decoding Throughput
Turning to the decoding throughput in Figure 5(b), we observe similar trends as for the encoding throughput. In comparison to the original Fulcrum, FNSW almost doubles the decoding throughput, while FSSW increases the decoding throughput approximately 2.5 times relative to the original Fulcrum. We observe from Figure 5(b) that for FSSW, the outer decoder achieves higher decoding throughput than the combined decoder for the small generation sizes N = 50 and 100; while for the large N = 400 generation size, the combined decoder achieves a higher throughput than the outer decoder, as is common for Fulcrum decoding [14]. The combined decoder applies two stages of inner decoding up to N received packets and then maps the resulting decoder coefficient matrix back to the GF (2 8 ) outer decoding [14, Sec. II.D. 3.]. In contrast, the outer decoder maps back to GF (2 8 ) outer decoding right away. The large proportion of systematic source packets in FSSW simplifies the outer decoding (as the systematic packets are an extreme form of sparse coding [56]- [61]). Moreover, for small generation sizes N , the cubed computational complexity of decoding in N [62] is low. The additional computation overhead from executing the inner decoding stages of the combined decoder does not pay off for the combination of extremely sparse (FSSW) coding and small generations. On the other hand, for large generations of extremely sparse (FSSW) coding or any generation size of moderately sparse (FNSW) coding, the inner decoding stages of the combined decoder reduce the decoding complexity compared to pure outer decoding.
The PACE decoding throughput in Figure 5(b) follows a similar trend as the PACE encoding throughput in Figure 5(a) due to the exclusive operation of PACE in GF(2^8).

D. IMPACT OF WINDOW SIZE
This section examines the impact of the window size w on the mean packet delay D, the number of linearly dependent coded packets per generation, as well as the encoding and decoding throughput of FSSW and FNSW. Throughout this section, we consider the fixed generation size of N = 64 packets with r = 4 outer coded packets and a packet erasure probability of ϵ = 0.1 on the channel. Table 4 summarizes the FSSW and FNSW parameters for the range of considered window sizes w. Figure 6 shows the mean in-order packet delay D and the number of received linearly dependent coded packets per generation, while Figure 7 shows the encoding and decoding throughput as a function of the window size w. Both figures also show small-generation original Fulcrum as a function of the small-generation size η, see Section III-A4.

TABLE 4. FSSW and FNSW parameters for the considered window sizes w.
 5   17    6    4   102
10    8   11    9    88
18    4   20   16    80
27    3   30   24    90
36    2   40   32    80
45    2   50   40   100
64    2   70   57    83

1) Mean In-order Packet Delay
We observe from Figure 6(a) that the FNSW mean in-order packet delay D generally increases as the window size w increases. This mean delay increase is mainly due to the sawtooth FNSW delay dynamics in Figure 4. As the window size w increases, there will be fewer, but larger sawtooth waves, which increase the mean value of the individual in-order packet delays δ(i). For the extreme case of w = N + r, there would be only a single sawtooth (window) and FNSW would degenerate to the original generation-based Fulcrum.
In contrast, we observe from Figure 6(a) that the FSSW mean in-order packet delay D remains essentially unchanged as the window size w increases. There is only a very slight delay increase for the small window sizes w = 5 and 10 due to the slightly increased numbers of FSSW linearly dependent packets for these small window sizes, see Figure 6(b).
We observe from Figure 6(a) that the mean in-order packet delay D of small-generation original Fulcrum generally decreases as the small-generation size η is reduced; however, the delay D reaches a minimum for η = 16 for inner decoding and η = 8 for outer decoding and then increases for further reductions of the small-generation size η. In order to maintain the Fulcrum principle of at least one outer coded packet, the η = 4 small-generation coding requires a total of N/η = 16 outer coded packets per generation of N = 64 source packets, see Section III-A4. This large coding overhead negates the delay reduction effect of the small generation size. Moreover, the inner decoder cannot utilize the outer GF(2^8) coded packets and requires the transmission of numerous inner coded packets to compensate for the high number of received linearly dependent inner coded packets that occur for small η, see Figure 6(b).

2) Received Linearly Dependent Coded Packets
The number of received linearly dependent coded packets per generation in Figure 6(b) is evaluated by subtracting N + r from the total number of packets received by the inner decoder (up to the point when enough packets have been received to decode all N source packets) and by subtracting N from the total number of packets received by the outer and combined decoder. We observe from Figure 6(b) that the number of linearly dependent packets generally decreases with increasing window size w. This is because the number of possible random combinations in GF(2) of the w source packets in the coding window increases as 2^w, reducing the probability of the occurrence of linearly dependent coded packets. Specifically, the probability of receiving linearly independent packets in Eqn. (7) increases with w while c_FNSW and w satisfy Eqn. (3).
We further observe from Figure 6(b) that for a very small window size w (or small-generation size η), FNSW and original Fulcrum with inner decoding have more than double the number of linearly dependent packets of FSSW and original Fulcrum with outer decoding. On the other hand, for large window sizes w ≥ 36, FNSW and original Fulcrum have slightly lower numbers of linearly dependent packets than FSSW. As illustrated in Figure 1(a), FNSW transmits all packets in inner GF(2) coded form, while FSSW transmits only c_FSSW inner GF(2) coded packets per u uncoded (systematic) source packets. For small window sizes w, and correspondingly few (2^w) possible random combinations of the source packets in the coding window, it becomes quite likely that FNSW generates linearly dependent coded packets. For large window sizes w (and correspondingly large c_FNSW that satisfy Eqn. (3)), the probability of receiving w linearly independent packets among c_FNSW coded packets increases, see Eqn. (7), and correspondingly the probability of linearly independent packets increases for original Fulcrum with large values of the small-generation size η approaching the actual generation size N. On the other hand, for FSSW with large window sizes w, w ≫ u, the first few coded packets in FSSW combine fewer than w packets; specifically, the first coded packet combines u source packets, as illustrated in the upper left corner of the coding coefficient matrix in Figure 1(b). Thus, for large window sizes w, w ≫ u, FSSW slightly increases the chance of linearly dependent packets compared to FNSW, which always combines w source packets.
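The notion of counting linearly dependent GF(2) coded packets can be made concrete with a small rank tracker. The following is a minimal sketch, not the evaluation code behind Figure 6(b): it assumes coefficient vectors drawn uniformly at random from GF(2)^w (including the all-zero vector), no channel erasures, and a single isolated window. The function names `gf2_insert` and `count_dependent` are hypothetical helpers introduced for illustration.

```python
import random

def gf2_insert(basis, vec):
    """Try to add a GF(2) coefficient vector (stored as an int bitmask) to a basis.

    basis maps the index of a leading 1-bit to a stored vector with that leading bit.
    Returns True if vec was linearly independent of the basis (and was inserted),
    False if vec reduced to zero, i.e., was linearly dependent.
    """
    while vec:
        pivot = vec.bit_length() - 1      # position of the leading 1-bit
        if pivot not in basis:
            basis[pivot] = vec            # new pivot: rank grows by one
            return True
        vec ^= basis[pivot]               # XOR is addition in GF(2)
    return False

def count_dependent(w, rng):
    """Draw random length-w GF(2) vectors until rank w; count dependent draws."""
    basis = {}
    dependent = 0
    while len(basis) < w:
        if not gf2_insert(basis, rng.getrandbits(w)):
            dependent += 1
    return dependent

if __name__ == "__main__":
    rng = random.Random(1)                # fixed seed (assumed) for repeatability
    for w in (5, 10, 36):                 # illustrative window sizes
        trials = [count_dependent(w, rng) for _ in range(1000)]
        print(w, sum(trials) / len(trials))
```

Averaging `count_dependent` over many trials gives a per-window estimate under these simplified assumptions. The per-generation counts in Figure 6(b) additionally depend on the number of windows, on c_FNSW, and on the erasure channel, none of which this sketch models.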

3) Encoding Throughput
We observe from Figure 7(a) that FSSW achieves a high encoding throughput that decreases only slightly for increasing window size w, whereas the FNSW encoding throughput decreases approximately quadratically with increasing window size w. FSSW generates only ⌈c_FSSW · (N + r)/u⌉ inner GF(2) coded packets that are combinations of w packets, see Eqn. (10), leading to a linear increase in the encoding complexity with increasing w. In contrast, FNSW generates t_FNSW, t_FNSW ≥ N + r, inner coded packets as specified in Eqn. (6) in conjunction with Eqns. (3)-(5). Each inner coded packet is a combination of w packets in GF(2), and there are c_FNSW, c_FNSW ≥ w [see Eqn. (3)], coded packets per window, resulting in a computational encoding complexity that is quadratic in w [62]. Figure 7(a) also indicates that the encoding throughput of the small-generation original Fulcrum strategy is generally in the vicinity of the FNSW encoding throughput. The decrease of the original Fulcrum encoding throughput when decreasing the small-generation size from η = 8 to η = 4 is mainly due to the increase of the total number of outer GF(2^8) coded packets from 8 to 16 for the full generation of N = 64 source packets, see Section III-A4.

4) Decoding Throughput
Figure 7(b) shows the decoding throughput levels for the three types of Fulcrum decoders. We observe from Figure 7(b) that for all decoder types, FSSW achieves higher decoding throughput than FNSW, mainly due to the systematic source packet transmissions in FSSW. We also observe from Figure 7(b) that the FSSW decoding throughput is nearly constant as the window size w increases, whereas FNSW exhibits decreasing decoding throughput for increasing w. Two main opposing effects are at work: (TH_Dec ↘) larger window sizes w increase the computational effort for the decoding of the sparse coded packets, i.e., the packets that are combinations of w source (and outer coding) packets; and (TH_Dec ↗) larger window sizes w reduce the number of linearly dependent packets (see Figure 6(b)), thus reducing the need to decode dense coded packets (that are combinations of all N + r source and outer coded packets) at the end of the generation. For FSSW with only few sparse coded packets, these two effects approximately compensate each other. On the other hand, for FNSW, which requires the decoding of blocks of c_FNSW ≥ w sparse coded packets (which involves computationally expensive matrix inversion with cubic complexity in c_FNSW [62]), the decoding throughput decreasing effect (TH_Dec ↘) dominates for increasing w despite the pronounced drop in the FNSW number of linearly dependent packets with growing w (see Figure 6(b)).
Importantly, the throughput results in Figure 7 indicate that the FSW coding design provides a strong differentiation between the computational complexity of the different types of Fulcrum decoders. For the higher-performing FSSW, the inner decoder achieves about 3/2 times the throughput of the outer and combined decoders, i.e., inner decoding has only about 2/3 of the computational complexity of outer or combined decoding.
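The linear versus quadratic growth of the encoding effort in w can be illustrated with a simple operation count. The sketch below evaluates the FSSW coded-packet count ⌈c_FSSW · (N + r)/u⌉ and a worst-case number of packet additions. It assumes a cost model in which every coded packet adds up all w packets in its window (with random GF(2) coefficients the actual number is lower). The function names are hypothetical, and the FNSW function covers only a single window, since Eqns. (3)-(6) are not reproduced here.

```python
import math

def fssw_coded_packets(n_source, r_outer, u, c_fssw):
    """Number of FSSW inner GF(2) coded packets: ceil(c_FSSW * (N + r) / u)."""
    return math.ceil(c_fssw * (n_source + r_outer) / u)

def fssw_combine_ops(n_source, r_outer, u, c_fssw, w):
    """Worst-case packet additions for FSSW: each coded packet combines up to w packets."""
    return fssw_coded_packets(n_source, r_outer, u, c_fssw) * w

def fnsw_window_combine_ops(w, c_fnsw):
    """Worst-case packet additions for one FNSW window of c_FNSW >= w coded packets."""
    if c_fnsw < w:
        raise ValueError("expected c_FNSW >= w")
    return c_fnsw * w

if __name__ == "__main__":
    # N = 64, r = 4, u = 9, c_FSSW = 1 as in the evaluation; window sizes are illustrative.
    print("FSSW coded packets:", fssw_coded_packets(64, 4, 9, 1))
    for w in (10, 20, 40):
        print(w, fssw_combine_ops(64, 4, 9, 1, w), fnsw_window_combine_ops(w, w))
```

With the coded-packet count fixed by N, r, u, and c_FSSW, the FSSW count grows in proportion to w, whereas the per-window FNSW count grows as w^2 when c_FNSW is at its lower bound w.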

E. DECODING PROBABILITY
This section examines the decoding probability of the N source packets of a generation as a function of the number of extra received packets (beyond N packets) by the inner Fulcrum decoder and the outer/combined Fulcrum decoder. We observe from Figure 8 that small-generation original Fulcrum with η = 32, PACE, and FSSW with outer/combined decoder, as well as FNSW with outer/combined decoder require the fewest received packets, followed by the inner decoder for original Fulcrum with η = 32, FSSW, and FNSW. The original Fulcrum with η = 4 and outer decoding requires even more received packets. The outer/combined decoder maps back to the large-size GF(2^8), where linearly dependent coded packets and the omission of source packets in the outer coded packets due to a zero coding coefficient are very rare. Thus, in FSSW, the r = 4 outer coded packets can almost surely recover any erased systematically transmitted source packet. On the other hand, the inner decoder has to rely on the c_FSSW = 1 inner GF(2) sliding window coded packet that follows after u = 9 systematic source packets and the inner GF(2) dense coded packets that follow after the t_FSSW packet transmissions. The sliding window inner coded packets cover at most w = 30 source packets (while the dense coded packets cover all N source packets); in addition, linearly dependent coded packets (as examined in Figure 6(b)) and source packet omissions in a GF(2) coded packet (when the corresponding coding coefficient is zero) limit the recovery capabilities of the inner coded packets.
We note that generation-based Fulcrum (for the full generation size N) requires at least N + r received packets for inner decoding [14]. In contrast, FSSW permits, in principle, inner decoding with N received packets: Suppose that the channel erases c_FSSW systematic packets from each subset of u systematic source packets and that the c_FSSW coded packets can recover these packet erasures (or none of the u source packets, but all following c_FSSW coded packets are erased). Then, all N source packets can be decoded (or are systematically received) from N packets received at the destination. FNSW transmits only coded packets; therefore, the N source packets need to be obtained through outer decoding of the N × N coding coefficient matrix or inner decoding of the (N + r) × (N + r) coding coefficient matrix.
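The erasure scenario described above, with one erased systematic packet per subset of u source packets and one GF(2) coded packet repairing it, can be sketched as follows. This is an illustrative toy, not the FSSW decoder: it assumes a single coded packet whose window covers exactly the u packets shown, and the helper names and payloads are hypothetical.

```python
def xor_packets(packets, size):
    """XOR (GF(2) addition) of equally sized byte strings; zero packet if empty."""
    out = bytearray(size)
    for packet in packets:
        for i, byte in enumerate(packet):
            out[i] ^= byte
    return bytes(out)

def encode_window(source, coeffs, size):
    """Coded packet = XOR of the source packets whose GF(2) coefficient is 1."""
    return xor_packets([p for p, c in zip(source, coeffs) if c], size)

def recover_erasure(received, coeffs, coded, size):
    """Recover a single erased packet (marked None) from one coded packet.

    Returns (index, payload), or None if recovery is impossible because more
    than one packet is missing or the erased packet has coefficient 0.
    """
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) != 1 or not coeffs[missing[0]]:
        return None
    known = [p for p, c in zip(received, coeffs) if c and p is not None]
    return missing[0], xor_packets([coded] + known, size)

if __name__ == "__main__":
    size, u = 4, 9
    source = [bytes([k] * size) for k in range(1, u + 1)]   # toy payloads
    coeffs = [1] * u                   # assumed: all-ones coefficient vector
    coded = encode_window(source, coeffs, size)
    received = list(source)
    received[3] = None                 # the channel erases one systematic packet
    print(recover_erasure(received, coeffs, coded, size))
```

In this toy case the destination receives u − 1 systematic packets plus one coded packet, i.e., u packets for u source packets, which mirrors the N-received-packets argument. If the random coefficient of the erased packet happens to be zero, the coded packet cannot repair it, which is the source packet omission effect discussed for the inner decoder.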
The PACE benchmark employs only GF(2^8) coding, which has a negligible probability of linearly dependent coded packets, thus reducing the number of required received packets compared to the FSW schemes that employ GF(2) coding for all inner coding. The small-generation η = 32 original Fulcrum benchmark with outer decoding has the smallest number of linearly dependent packets in Figure 6(b) and correspondingly requires the smallest number of extra received packets for decoding. However, for the small-generation size η = 4, original Fulcrum with outer decoding has a mean number of close to ten linearly dependent packets per generation of N = 64 source packets in Figure 6(b). Accordingly, small-generation original Fulcrum with η = 4 requires around ten extra packets to approach successful decoding, and over 18 extra packets are required to reduce the decoding failure probability below 1%, see Figure 8.
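The effect of the field size on the number of extra packets can be illustrated with the standard full-rank probability for dense RLNC with uniformly random coefficients. The sketch below is a textbook idealization, not a model of the curves in Figure 8: it ignores sliding windows, systematic packets, and the Fulcrum inner/outer mapping. The function name `decode_probability` is introduced here for illustration.

```python
def decode_probability(n, extra, q):
    """P(n + extra uniformly random length-n vectors over GF(q) have rank n).

    Uses the product over j = extra+1, ..., n+extra of (1 - q^(-j)).
    """
    p = 1.0
    for j in range(extra + 1, n + extra + 1):
        p *= 1.0 - float(q) ** (-j)
    return p

if __name__ == "__main__":
    n = 64                                 # generation size used in this section
    for extra in (0, 2, 4, 10):
        print(extra,
              round(decode_probability(n, extra, 2), 4),     # GF(2)
              round(decode_probability(n, extra, 256), 6))   # GF(2^8)
```

Under this idealization, GF(2^8) decodes with probability above 0.996 already with zero extra packets, while GF(2) needs several extra packets to reach a comparable level. This is consistent in direction with the gap between the outer/combined and inner decoders described above.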

IV. CONCLUSION
We introduced Fulcrum Sliding Window (FSW) coding to reap the benefits of both the Fulcrum RLNC coding and the sliding window RLNC coding. Based on the Fulcrum coding, FSW coding flexibly reaches diverse sets of destinations that need to decode with low-complexity XOR operations in the small Galois Field GF(2) or with computationally demanding operations in a large Galois Field, e.g., GF(2^8). Based on the sliding window coding, FSW achieves low in-order packet latencies as well as high encoding and decoding (computation) throughput. More specifically, we developed Fulcrum Nonsystematic Sliding Window (FNSW) coding and Fulcrum Systematic Sliding Window (FSSW) coding.
Our extensive evaluations indicate that the introduced FSSW coding achieves short in-order packet delays in conjunction with high encoding and decoding throughput. We also observed that FSSW preserves the Fulcrum RLNC feature of effective differentiation between the outer and combined decoding (high reliability, i.e., low decoding failure probability, at the expense of complex GF(2^8) decoding) versus inner decoding (reduced reliability or slightly increased delay for low-complexity GF(2) decoding).
The PACE benchmark [25] achieves similarly short in-order packet delays and high reliability as FSW, but gives lower encoding and decoding throughput and, importantly, is limited to operating in a single Galois Field size. A small-generation strategy of the original generation-based Fulcrum RLNC coding [14] incurs a high coding overhead as the small-generation size is reduced in an effort to achieve short in-order packet delays. The high overhead negates the delay reductions of the small-generation size. Also, small-generation Fulcrum coding creates a relatively high proportion of linearly dependent coded packets, sharply reducing the reliability, i.e., increasing the decoding failure probability.
Future research can build on FSW in several directions. One interesting direction is to further accelerate the encoding and decoding through hardware acceleration modules [63], [64]. FSW allows the flexibility to support hardware modules that are limited to XOR operations as well as hardware modules that can execute operations in large Galois Fields. Another interesting future research direction is to explore deep learning techniques [65]- [68] to estimate the channel packet erasure probabilities and to adapt the coding parameters.