SpaRec: Sparse Systematic RLNC Recoding in Multi-hop Networks

Sparse Random Linear Network Coding (RLNC) reduces the computational complexity of RLNC decoding through a low density of non-zero coding coefficients, which can be achieved by sending uncoded (systematic) packets. However, conventional recoding of sparse RLNC coded packets at an intermediate node in a multi-hop network increases the density of non-zero coding coefficients. We develop and evaluate sparsity-preserving recoding (SpaRec) strategies that maintain the low density of non-zero coding coefficients of sparse RLNC with systematic packets. We develop SpaRec strategies with and without decoding at the intermediate nodes, with and without a specified coding rate, and with finite and infinite recoding window lengths. We evaluate the SpaRec strategies in multi-hop networks in terms of packet loss, packet delivery delay, and recoding and decoding (computation) throughput. We find that the SpaRec strategies substantially improve the RLNC performance compared to conventional recoding.


A. MOTIVATION
Random Linear Network Coding (RLNC) linearly combines a block (also referred to as generation) of G data packets with random coding coefficients from a Galois field to create coded packets. The original G data packets can be recovered from any set of G received linearly independent coded packets through RLNC decoding. The decoding involves the multiplication and inversion of a G × G matrix of coding coefficients. The RLNC packet recovery from any set of G linearly independent coded packets does not require any signaling or coordination. Each coded packet only needs to carry its own coding coefficients, making RLNC well-suited for error-prone communication networks, e.g., wireless networks [3]-[8] and content distribution networks [9]-[11]. However, the high computational complexity of RLNC encoding and decoding has hampered the adoption of RLNC in practical networks [12], [13].
One promising strategy to reduce the computational complexity while retaining the packet recovery property is to employ sparse coding coefficients, i.e., a low density of non-zero coding coefficients [14]. As reviewed in Section I-B, a low coding-coefficient density can be achieved with various strategies [14]. We focus on the systematic RLNC approach that intersperses only a few coded packets with non-zero coding coefficients among the uncoded (so-called systematic) packets in a generation, i.e., sparse systematic RLNC [15]-[19]. To date, sparse systematic RLNC has been mainly studied for end-to-end coding, i.e., sparse systematic RLNC encoding at the sending (source) node and decoding at the ultimate destination node. However, an important performance-enhancing aspect of RLNC is recoding at intermediate network nodes, which can substantially increase the overall network performance [20]-[29]. With recoding,

Symbol   Definition
S        Source node
Nn       Intermediate node n
G        Generation length (size) in # of packets
Pn       Source packet n, n = 0, 1, 2, ...
Cc       Coded packet c, c = 0, 1, 2, ..., generated by source
w        Coding window size, w ≤ G
ϵ        Packet erasure probability
m        Number of source packets in a subset, m ≤ G
k        Number of coded packets transmitted after a subset
c        Code rate = m/(m + k)
δ        Delay of individual packets

with a packet buffer that can hold the recoding window size in each intermediate node and one strategy that operates with smaller packet buffers in the intermediate nodes. Also, Section II specifies a rateless (adaptive) sparse recoding approach that can be combined with any of the proposed SpaRec strategies. Section III describes the multi-hop network evaluation set-up and defines the metrics for evaluating the packet loss, in-order packet delay, as well as recoding and decoding (computation) throughput. Section IV-A first evaluates the utilization of idle slots that arise in the intermediate nodes due to packet erasures on the incoming links. The evaluation demonstrates that sending coded packets in such idle slots can substantially reduce the packet delay. Section IV-B evaluates the impact of the code rate and the recoding window size. We find that rateless recoding with a finite recoding window size reduces the packet delay and increases the recoding and decoding throughput compared to recoding with a prescribed code rate or an infinite recoding window size. Section IV-C compares the three specified SpaRec strategies, operating with idle slot utilization, rateless recoding, and a finite window size, against conventional recoding [67], [68], systematic RLNC [60]-[66], and the conventional recoding with small buffer (CRSB) benchmark [14]. We find that the proposed SpaRec strategies substantially reduce the packet losses while reducing the mean in-order packet delays to approximately half of those of the benchmarks. Also, the SpaRec approaches can double the recoding throughput and substantially increase the decoding throughput.

1) Overview
The general components of the Decode-Recode (Dec-Rec) recoding technique are: each intermediate node has a decoder that decodes received coded packets on the fly; and each intermediate node has an encoder that generates new coded packets, i.e., recoded packets. Intermediate nodes do not need to receive an entire generation before recoding. With our Dec-Rec approach, recoding occurs while the packets of a generation traverse the network. Intermediate nodes may receive packets out of order due to packet erasures (or packet reordering). In order to reduce the in-order packet delay, the intermediate nodes follow an algorithm similar to the code design at the source. That is, the intermediate nodes first forward a subset of the received systematic packets and interleave coded packets between successive subsets of systematic packets.

The source node encodes the original packets using sliding window network coding [69] with a prescribed finite-length encoding window [37], [65], [70]-[75], set to w_fin = 5. The first packets received by N1, as displayed in the receiv. column in Fig 1, namely the Pi, i ∈ {0, 1, ..., G − 1}, are the original (systematic) source packets in a given generation, while the Cj, j ∈ {0, 1, ...}, are the coded packets generated by the source (see Table 1 for a summary of the notation). We consider slotted time; in each time slot, either a systematic packet or a coded packet is transmitted over the erasure channel. After intermediate node N1 receives packets, it sends them in the same sliding window pattern. Specifically, N1 first sends a subset of the earlier arrived uncoded or decoded packets; then N1 generates coded packets for Forward Error Correction (FEC). In the example in Fig 1(a), the code rate of N1 is c = m/(m + k) = 4/5. Therefore, after sending m = 4 uncoded packets (P0, P1, P2, and P3), N1 generates a coded packet as a combination of P0, P1, P2, and P3 and sends it to the next node. Then, N1 continues with sending P4. N2 also follows the sliding window pattern for recoding the packets.
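The subset-based transmission pattern described above can be sketched as a per-slot schedule. This is a minimal illustration only, assuming that each coded packet combines the most recent w uncoded packets (only packets in the window are coded together) and ignoring erasures and on-the-fly decoding; the function name `sliding_window_schedule` is hypothetical.

```python
from typing import Iterator, List, Tuple


def sliding_window_schedule(G: int, m: int, k: int, w: int) -> Iterator[Tuple[str, List[int]]]:
    """Yield one (kind, packet_indices) tuple per time slot.

    Hypothetical sketch: m uncoded packets are followed by k coded packets,
    and each coded packet covers the last w uncoded packets sent so far.
    """
    sent = 0
    while sent < G:
        batch = min(m, G - sent)
        for i in range(sent, sent + batch):
            yield ("uncoded", [i])  # systematic packet P_i
        sent += batch
        window = list(range(max(0, sent - w), sent))
        for _ in range(k):
            yield ("coded", window)  # FEC packet over the coding window
```

With G = 8, m = 4, k = 1, and w = 5, the first coded packet combines P0 to P3 (only four packets are available yet), as in the Fig 1(a) example, and the second coded packet combines P3 to P7.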

3) Packet Decoding and Seen Packets
Note that N1 may be able to recover packets that were erased on the S-N1 link by decoding received coded packets on the fly. An example of on-the-fly decoding is shown in the dec. column of N1: after receiving coded packet C0, N1 is able to decode P3 and then sends P3 to N2. Another example is observed for N2: N2 is able to decode P2 by receiving the coded packet that is a combination of P0, P1, P2, and P3 from N1.
VOLUME 4, 2016
If a burst of packet erasures erases more than k packets, then the k coded FEC packets are not sufficient to recover all erased packets. In this case, intermediate nodes hold seen packets, which are partially decoded packets that carry information about future packets [76]. For the sake of clarity, we give an example of the coefficient vector of a seen packet. Assume that packets P2 and P3 are both erased on the S-N1 link. Then, the coded packet C0 is not sufficient to decode P2 and P3. In the coding coefficient row [0 0 1 F 0 0 0], the seen-packet coding coefficient of 1 corresponds to P2, and F, which is a random value from the considered Galois field, corresponds to P3. Note that since P0 and P1 are available uncoded, they are subtracted from C0, and only the combination of P2 and P3 remains undecoded; this combination is referred to as a seen packet [76]. When seen packets appear, intermediate nodes operating with Dec-Rec also send the seen packets to the destination. Receiving the seen coded packets can aid the decoding at the destination.
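The reduction of C0 to a seen packet can be reproduced with GF(2^8) arithmetic. In this sketch, the reducing polynomial 0x11D, the helper names, and the example coefficients are assumptions for illustration, since the text does not specify them. Addition in GF(2^8) is bitwise XOR, so subtracting the known packets P0 and P1 is an XOR of their scaled payloads; the first remaining coefficient is then normalized to 1.

```python
from typing import Dict, List, Tuple

PRIM_POLY = 0x11D  # assumed reducing polynomial x^8 + x^4 + x^3 + x^2 + 1


def gf_mul(a: int, b: int) -> int:
    """Multiply two GF(2^8) elements (shift-and-add with modular reduction)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= PRIM_POLY
        b >>= 1
    return result


def gf_inv(a: int) -> int:
    """Multiplicative inverse by exhaustive search (fine for 255 non-zero elements)."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse in GF(2^8)")
    for x in range(1, 256):
        if gf_mul(a, x) == 1:
            return x
    raise ValueError("no inverse found; polynomial is not irreducible")


def scale(coef: int, payload: bytes) -> bytes:
    return bytes(gf_mul(coef, byte) for byte in payload)


def add(x: bytes, y: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(x, y))  # GF(2^8) addition == subtraction == XOR


def reduce_to_seen(coeffs: List[int], payload: bytes,
                   known: Dict[int, bytes]) -> Tuple[List[int], bytes]:
    """Subtract uncoded packets that are already available, then normalize the
    first remaining non-zero coefficient to 1 (the 'seen' packet position)."""
    coeffs = list(coeffs)
    for idx, pkt in known.items():
        if coeffs[idx]:
            payload = add(payload, scale(coeffs[idx], pkt))
            coeffs[idx] = 0
    lead = next((i for i, c in enumerate(coeffs) if c), None)
    if lead is None:
        return coeffs, payload  # nothing new: the packet was fully redundant
    inv = gf_inv(coeffs[lead])
    return [gf_mul(inv, c) for c in coeffs], scale(inv, payload)
```

For a hypothetical C0 = 5·P0 + 7·P1 + 9·P2 + 11·P3 with P0 and P1 known, the result carries the coefficient row [0, 0, 1, F, 0, 0, 0] with F = 11/9 in GF(2^8), matching the structure of the seen packet described above.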
When a full generation is decoded by an intermediate node, then the node ignores any further packets arriving from an upstream intermediate node or the source. Then the node first checks if there are newly decoded packets that have not been previously transmitted in uncoded form and transmits all such packets in uncoded form. Subsequently, the node newly generates and transmits fully dense coded packets that are combinations of all packets in the generation until the end of the generation transmission is determined through conventional signaling, e.g., header signaling [77].

4) Idle Slot Utilization
In a time slot with a packet erasure, there may not be new information to send. More specifically, when a packet erasure occurs, an idle time slot appears, as illustrated in Fig 1(a) for several examples (see X in the receiv. column followed by idle in the sent column). Conventionally, nodes do not send any information in such an idle time slot.
For the SpaRec algorithms, including Dec-Rec, we propose to send coded packets instead of staying idle. Examples of such transmissions of coded packets in conventionally idle slots are shown in Fig 1(b). We observe from Fig 1(b) that N2 recovers P2 one time slot earlier than with the conventional approach in Fig 1(a), when it receives the coded packet that is a combination of P0, P1, and P2. Similar to N2, the destination also recovers packets earlier when idle time slots are filled with coded packet transmissions. For instance, P1 is recovered in time slot 6 with the conventional approach in Fig 1(a), but already in time slot 2 in Fig 1(b).
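The difference between leaving erasure slots idle and filling them can be illustrated with a toy per-slot model. This sketch assumes that each successfully received packet is forwarded in the same slot and that a filler coded packet combines the last w buffered packets; it deliberately omits the code-rate-driven FEC packets, and `fill_idle_slots` is a hypothetical name.

```python
from typing import List, Optional


def fill_idle_slots(arrivals: List[Optional[str]], w: int, utilize_idle: bool) -> List[str]:
    """arrivals[t] is the packet received in slot t, or None if it was erased."""
    buffer: List[str] = []
    sent: List[str] = []
    for pkt in arrivals:
        if pkt is not None:
            buffer.append(pkt)
            sent.append(pkt)  # forward the received packet
        elif utilize_idle and buffer:
            sent.append("C(" + "+".join(buffer[-w:]) + ")")  # coded packet in the idle slot
        else:
            sent.append("idle")  # conventional behavior: nothing is sent
    return sent
```

For arrivals P0, P1, (erasure), P3, the conventional node outputs an idle slot in the third position, whereas the idle-slot-utilizing node sends a combination of P0 and P1 there.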

5) Analysis of Recoding Window Size
For a given packet erasure probability ϵ of the outgoing link of a given source node or intermediate node and a given numerator m of the code rate c = m/(m + k) (which should satisfy c ≤ 1 − ϵ [69]), we propose to set the finite coding window size (length) of the source, or the finite recoding window size (length) of an intermediate node, according to Eqn. (1). If w yields a fractional number, we round it to the closest integer. Only packets within the window are coded together. Based on the evaluations in [65], which demonstrated low packet loss probability and low computational complexity for a finite coding window length at the source, we consider a finite encoding window length at the source throughout this study. However, infinite and finite length windows have not been studied for recoding in the intermediate nodes. Therefore, we evaluate the performance characteristics of the different recoding window lengths in Section IV-B.

Since the code rate c of each node can be different, the numbers m and k of uncoded and coded packets, respectively, can differ from node to node. Intermediate nodes can distinguish uncoded packets from coded packets by examining the coefficient vector of received packets: uncoded packets have only one non-zero element in the coefficient vector, while coded packets have more than one non-zero element in their coefficient vector.

Figure 2 shows an example of the Rec-woDec technique. C0 and C1 are coded packets generated by the source, S0, S1, and S2 are coded packets generated by N1, while Z0, Z1, Z2, and Z3 are coded packets generated by N2. The code rates of the source, N1, and N2 are c_S = 3/4, c_N1 = 2/3, and c_N2 = 3/4, respectively. Figure 2 resembles Figure 1; however, with Rec-woDec, received coded packets are also combined with received uncoded packets to generate new coded packets. In Figure 2, we can consider either finite length or infinite length recoding windows at N1 and N2.
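The coefficient-vector test that intermediate nodes use to tell uncoded from coded packets amounts to counting non-zero entries. A minimal sketch (the function name `classify_packet` is hypothetical):

```python
from typing import Sequence


def classify_packet(coeffs: Sequence[int]) -> str:
    """Return 'uncoded' if exactly one coefficient is non-zero, 'coded' if several are."""
    nonzero = sum(1 for c in coeffs if c != 0)
    if nonzero == 0:
        raise ValueError("all-zero coefficient vector carries no information")
    return "uncoded" if nonzero == 1 else "coded"
```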

3) Finite Recoding Window Length
Suppose the window lengths of the source, N1, and N2 are 4, 3, and 4, respectively. As shown in Figure 2, N1 sends k = 1 coded packet for FEC after every m = 2 uncoded packets since its code rate is c = 2/3. After sending P0 and P1, N1 generates the coded packet S0 in the third time step. Since P0, P1, and P2 are in the buffer in this time step, S0 becomes a combination of these three packets. When P0, P1, and P2 are combined, each of them is multiplied with its own random coding coefficient, following the general principles of RLNC recoding [77], [78]. The coding coefficients and the payloads are summed separately. Then, the summed coding coefficients are appended to the summed payload to be sent to the next node.
Similar to S0, coded packet S1 is also a combination of the last three received packets, which are P4, P3, and C0. The coded packet C0 includes P0, P1, and P2; as a result, S1 becomes a combination of P0, P1, P2, P3, and P4. Although the recoding window length of N1 is w = 3, coded packet S1 carries information of 5 packets. This property differs from Dec-Rec, where coded packets are not combined with uncoded packets. Thus, the number of non-zero elements in the coefficient vector of generated coded packets in Rec-woDec can exceed the recoding window length w_fin, whereas in Dec-Rec the number of non-zero elements in the coefficient vector of the generated coded packets is bounded by the window length w_fin.
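The growth of the coefficient density in Rec-woDec follows directly from how recoding acts on the (global) coefficient vectors. The sketch below is a hypothetical illustration over GF(2^8), assuming the reducing polynomial 0x11D, random non-zero coefficients, and made-up source coefficients for C0; it tracks only the coefficient vectors, since the payloads are combined with the same coefficients. Recoding the last w = 3 received packets C0, P3, and P4 at N1 yields an S1 with five non-zero coefficients.

```python
import random
from typing import List, Optional

PRIM_POLY = 0x11D  # assumed reducing polynomial for GF(2^8)


def gf_mul(a: int, b: int) -> int:
    """Multiply two GF(2^8) elements (shift-and-add with modular reduction)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= PRIM_POLY
        b >>= 1
    return result


def recode(received: List[List[int]], w: Optional[int], rng: random.Random) -> List[int]:
    """Randomly combine the last w received coefficient vectors (all of them if w is None)."""
    window = received if w is None else received[-w:]
    out = [0] * len(window[0])
    for vec in window:
        coef = rng.randrange(1, 256)  # random non-zero coding coefficient
        out = [o ^ gf_mul(coef, v) for o, v in zip(out, vec)]
    return out


def unit(i: int, G: int) -> List[int]:
    """Coefficient vector of the uncoded packet P_i."""
    return [1 if j == i else 0 for j in range(G)]


G = 6
c0 = [3, 5, 7, 0, 0, 0]  # hypothetical source coefficients: C0 = 3*P0 + 5*P1 + 7*P2
received_at_n1 = [unit(0, G), unit(1, G), unit(2, G), c0, unit(3, G), unit(4, G)]
s1 = recode(received_at_n1, w=3, rng=random.Random(1))
print(sum(1 for c in s1 if c), "non-zero coefficients")  # prints "5 non-zero coefficients"
```

Passing `w=None` combines all received packets of the generation, which corresponds to the infinite recoding window.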
The coded packet S2 is generated at N1 since there is no uncoded packet available to send; S2 is a combination of C1, P4, and P3. In turn, C1 is a combination of P2, P3, P4, and P5; as a result, S2 is a combination of P2, P3, P4, and P5. Although P5 is erased and not available uncoded at N1, S2 still carries information of P5 thanks to C1.
Examining N2 in Figure 2, we see that only Z1 is generated for FEC, while Z0, Z2, and Z3 are generated to fill the idle slots. After D receives Z0, it is able to decode P1 because Z0 is a combination of P0 and P1. Although the recoding window size of N2 is w = 4, only P0 and P1 are available for recoding in the third time step. Afterwards, P2, Z1, and P3 are sent by N2.

4) Infinite Recoding Window Length
If an infinite-length recoding window is used, then the S and Z coded packets will be generated by using all received packets of a given generation, i.e., at most G packets. As an example, Z3 will be a combination of P0, P1, P2, P3, S1, and S2.

C. PROPOSED SPARSIFIED PURE RECODING (SPARSEPR)
Similar to Rec-woDec, sparsified pure recoding (SparsePR) does not have a decoder in the intermediate nodes. Also, similar to the Rec-woDec algorithm, with SparsePR, intermediate nodes send m uncoded packets followed by k coded packets as shown in Figure 2. However, the SparsePR operations in the recoder buffer are different from the Rec-woDec operations.
In Rec-woDec, every packet is stored in a separate buffer, as illustrated on the left-hand side of Figure 3. In order to store each packet individually, the number of packet buffers in an intermediate Rec-woDec node needs to equal the recoding window size, i.e., w_fin for a finite recoding window size or the generation size G for an infinite recoding window size. Transmitted packets that fall outside the recoding window can be discarded to free packet buffers for newly arrived and untransmitted packets. However, the deletion of old packets causes a loss of information. SparsePR addresses this issue by retaining more information with a limited number of packet buffers.
In SparsePR, every received packet is immediately combined with the previously received packets in the packet buffers. An example of this combination at N1 (of Figure 2) for SparsePR with three packet buffers (3B) is shown for the first three received packets on the right-hand side of Figure 3. When packet P0 is received, it is first sent uncoded and then stored in the recoder packet buffer Q0. Then, packet P1 is received and sent uncoded. Afterwards, packet P1 is combined with P0 in Q0, and P1 is placed in buffer Q1. When P2 is received, P2 is combined with the packets in buffers Q0 and Q1, and then P2 is placed in buffer Q2. In this third time step, the coded packet S0 should be sent, as illustrated in Figure 2. Thus, the three buffers Q0, Q1, and Q2 are combined to generate coded packet S0 (for the N1 window size of w = 3). Effectively, S0 becomes a combination of P0, P1, and P2.
Note that an alternative way to create coded packet S0 could be to simply take the packet in Q0, since the packet in Q0 contains information of P0, P1, and P2. However, simply taking the packet in Q0 for the transmission of coded packet S0 would cause linear dependencies in case multiple coded packets need to be sent, i.e., sending the packet in Q0 more than once would not help the destination. Instead, creating different combinations from the last w packet buffers will aid the decoder in decoding erased packets.
The analogous combination steps are applied to the other received packets. Similar to S0, when coded packets S1 and S2 are generated, the last w = 3 buffers are combined. When the window size is not limited, then all buffers are combined to generate coded packets; whereby the number of packet buffers is not limited for an infinite window size.
There are advantages and disadvantages to combining the received packet with the packets in the packet buffers. The advantage is that intermediate nodes can operate with a limited number of packet buffers and still retain the information of previous packets. For instance, packet buffer Q0 progressively holds information of more packets, although its capacity is only one packet. As a result, the generated coded packets include the information of more packets, which reduces the delay until the full generation is decoded at the decoder. One disadvantage is that this continual combining increases the computational effort in the intermediate nodes.
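The SparsePR buffer operations can be sketched as follows, again tracking only coefficient vectors over GF(2^8). Several details are assumptions not stated in the text: the reducing polynomial 0x11D, the use of random non-zero coefficients when folding a new packet into an existing buffer, and, in particular, the eviction rule once all buffers are occupied (here, the oldest combination is dropped). The class name `SparsePRBuffers` is hypothetical.

```python
import random
from collections import deque
from typing import Deque, List, Optional

PRIM_POLY = 0x11D  # assumed reducing polynomial for GF(2^8)


def gf_mul(a: int, b: int) -> int:
    """Multiply two GF(2^8) elements (shift-and-add with modular reduction)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= PRIM_POLY
        b >>= 1
    return result


class SparsePRBuffers:
    def __init__(self, num_buffers: int, gen_size: int, seed: int = 0) -> None:
        # Assumption: when all buffers are full, deque(maxlen=...) drops the oldest one.
        self.buffers: Deque[List[int]] = deque(maxlen=num_buffers)
        self.gen_size = gen_size
        self.rng = random.Random(seed)

    def _combine(self, vectors: List[List[int]]) -> List[int]:
        """Random linear combination of coefficient vectors with non-zero coefficients."""
        out = [0] * self.gen_size
        for vec in vectors:
            coef = self.rng.randrange(1, 256)
            out = [o ^ gf_mul(coef, v) for o, v in zip(out, vec)]
        return out

    def receive(self, vec: List[int]) -> None:
        """Fold the new packet into every stored combination, then store it on its own."""
        for idx in range(len(self.buffers)):
            self.buffers[idx] = self._combine([self.buffers[idx], vec])
        self.buffers.append(list(vec))

    def coded_packet(self, w: Optional[int] = None) -> List[int]:
        """Combine the last w buffers (all buffers if w is None) into a new coded packet."""
        stored = list(self.buffers)
        window = stored if w is None else stored[-w:]
        return self._combine(window)
```

Feeding P0, P1, and P2 into a three-buffer instance leaves Q0 with non-zero coefficients for P0, P1, and P2, Q1 with P1 and P2, and Q2 with P2 alone; `coded_packet(w=3)` then combines Q0, Q1, and Q2 into S0.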

D. RATELESS (ADAPTIVE) SPARSE RECODING
In this section, we remove the notion of a prescribed code rate c for the SpaRec strategies in the intermediate nodes.
With a prescribed code rate c = m/(m + k), a node sends m uncoded packets followed by k coded packets for FEC. In a single-hop network, these k redundant coded packets are very important to recover erased packets. However, it may not be prudent to obey a prescribed code rate c in a multi-hop network. As observed in the description of Rec-woDec in Section II-B, in order to utilize the idle slots, an intermediate node can send a coded packet when there are no uncoded packets to send (even when the code rate does not yet require the transmission of a coded packet). When intermediate nodes send coded packets to comply with a prescribed code rate c and to utilize idle slots, then the destination may receive unneeded coded packets. In order to avoid this problem of excessive coded packets we propose not to obey a prescribed code rate for recoding in the intermediate nodes. Instead, we propose to forward uncoded packets whenever they arrive, and to send a coded packet when an erasure occurs (creating an idle slot) or when a coded packet arrives.
For example, consider N1 in the third time slot in Figure 2: after sending P0 and P1, N1 generates and sends S0 although P2 is available in the buffer. With rateless (adaptive) recoding, N1 instead sends P2 in the third time slot. The coded packet S0 is then sent in the fourth time slot, when C0 is received.
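The rateless rule reduces to a per-slot decision. A minimal sketch, assuming one arrival per slot, labels starting with "P" for uncoded and "C" for coded arrivals, and None for an erasure; on-the-fly decoding and the recoding window contents are omitted, and `rateless_schedule` is a hypothetical name. Applied to the arrivals at N1 in Figure 2, it sends P2 in the third slot and the first coded packet in the fourth slot, when C0 arrives.

```python
from typing import List, Optional


def rateless_schedule(arrivals: List[Optional[str]]) -> List[str]:
    """Per-slot output of a rateless intermediate node.

    arrivals[t]: label of the packet received in slot t ('P...' uncoded,
    'C...' coded), or None if the packet was erased on the incoming link.
    """
    out: List[str] = []
    have_data = False
    for pkt in arrivals:
        if pkt is not None and pkt.startswith("P"):
            out.append(pkt)  # forward uncoded packets as soon as they arrive
            have_data = True
        elif have_data or pkt is not None:
            out.append("coded")  # erasure (idle slot) or coded arrival: send a recoded packet
            have_data = True
        else:
            out.append("idle")  # nothing received yet, so nothing to recode
    return out
```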

III. PERFORMANCE EVALUATION SETUP
This section first introduces the simulation scenario in Section III-A and the evaluation metrics in Section III-B. Figure 4 shows our experimental multi-hop network model for evaluating the packet loss and the packet delay; for the throughput evaluations, we consider one to four intermediate nodes. We model time in slots, whereby a time slot corresponds to the transmission time of one packet (as further elaborated in Section III-B2). We assume that all packets are available at the source without delay. Packet erasures are independent and occur on each link with probability ϵ. We conducted 10 000 independent replications for each simulation scenario. The resulting 95% confidence intervals on the performance metrics are less than 1% of the corresponding sample means (unless noted otherwise) and are omitted from the plots to avoid visual clutter. The packet size was set to σ = 1500 bytes, which mimics the typical maximum transmission unit (MTU) of Ethernet links. Unless otherwise noted, all evaluations were conducted for a generation size of G = 60 packets, which is common in network coding studies [32], [79], [80]. We considered the Galois field GF(2^8), which ensures a negligible probability of linear dependency with RLNC.

B. EVALUATION METRICS
1) Packet Loss
The packet loss probability L in percent is the difference between the generation size G and the number µ of packets that are recovered at the destination (either by being received in uncoded form or by being decoded at the destination), normalized by the generation size G, i.e.,

$$L = \frac{G - \mu}{G} \times 100. \qquad (2)$$

Note that all seen packets are counted as packet losses if they are not fully decoded by the destination. For the loss evaluation, the source sends the number of packets that corresponds to the minimum code rate c_min = m_min/(m_min + k_min) = 6/10 among the code rates in the caption of Figure 4. Specifically, the source sends a total of G + ⌈G/m_min⌉k_min packets for a given generation. The source follows the transmission pattern that corresponds to its source code rate c_S = m_S/(m_S + k_S) = 8/10 for the ⌊G/m_S⌋ full subsets in the generation. Then, the source sends fully coded dense packets (i.e., combinations of all G source packets) until reaching a total of G + ⌈G/m_min⌉k_min transmitted packets. In particular, the source sends a transmission pattern of m_S = 8 uncoded packets followed by k_S = 2 coded packets ⌊G/m_S⌋ = ⌊60/8⌋ = 7 times, i.e., the source sends seven full subsets, each consisting of m_S = 8 uncoded packets followed by k_S = 2 coded packets [that are combinations of w_fin = 9 (see Eqn. 1) uncoded packets]. Then, the source sends the remaining 4 uncoded packets, followed by 26 fully dense coded packets, for a total of ⌈G/m_min⌉k_min = ⌈60/6⌉·4 = 40 coded packets for the generation of G = 60 uncoded packets. Afterwards, the source stops. Then, the destination counts the number µ of received uncoded and decoded original source packets to evaluate the packet loss according to Eqn. (2). Subsequently, the source proceeds to the next generation.
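The packet counts in this example, and Eqn. (2), can be checked with a few lines of arithmetic. This is a sketch with hypothetical function names; the parameter values are those given above.

```python
from math import ceil
from typing import Tuple


def loss_eval_counts(G: int, m_s: int, k_s: int, m_min: int, k_min: int) -> Tuple[int, int, int, int]:
    """Return (total packets, full subsets, leftover uncoded packets, dense coded packets)."""
    total = G + ceil(G / m_min) * k_min
    full_subsets = G // m_s
    leftover_uncoded = G - full_subsets * m_s
    dense_coded = total - G - full_subsets * k_s
    return total, full_subsets, leftover_uncoded, dense_coded


def packet_loss_percent(G: int, mu: int) -> float:
    """Eqn. (2): share of source packets not recovered at the destination, in percent."""
    return 100.0 * (G - mu) / G


print(loss_eval_counts(G=60, m_s=8, k_s=2, m_min=6, k_min=4))  # (100, 7, 4, 26)
```

The 40 coded packets thus split into 7 × 2 = 14 packets from the subset pattern and 26 fully dense packets.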

2) Delay
For a generation of G packets, the mean in-order packet delay D at the decoder is evaluated as

$$D = \frac{1}{G} \sum_{i=0}^{G-1} \delta(i),$$

where δ(i), i = 0, 1, 2, ..., G − 1, denotes the in-order delay of packet Pi in elapsed time slots. That is, δ(i) is the integer difference between the time slot in which Pi is received uncoded or decoded by the destination (whereby we neglect the computation delay for the decoding; the decoding computation delay is accounted for in the decoding throughput) and the order number i of packet Pi in the considered generation at the source. More specifically, one integer time slot models the delivery of one packet from the source via multiple intermediate recoding nodes to the destination without erasures, as illustrated for P0 in the first line in Figs. 1(a) and (b), as well as Figure 2. We note that in real physical networks, each intermediate node incurs one time slot delay for the packet transmission in the store-compute-forward process, i.e., the actual packet delay in a real physical network corresponds to the in-order packet delay metric δ(i), plus one time slot for the transmission delay for each intermediate node, plus applicable nodal processing delays, nodal queueing delays, and link propagation delays.
Effectively, our δ(i) delay metric counts the extra time slots that are incurred due to the coding and link packet erasures. Specifically, the delay metric δ(i) counts the extra time slots due to the coding at the source, the recoding in the intermediate nodes, as well as the packet erasures on the links and the corresponding packet recovery through coded packet transmissions so as to deliver the packet in order position i of a generation to the destination (and neglects the source and in-network processing, queueing, transmission, and propagation delays). We adopted this delay metric to focus on the delay components that are directly affected by the coding and recoding mechanisms and the link packet erasures, while excluding ancillary delay components that are unrelated (constant with respect) to the coding and link packet erasure dynamics.
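A direct transcription of the delay metric, as a sketch: it assumes recovery slots are indexed such that erasure-free delivery of Pi occurs in slot i, so that δ(i) is the recovery slot minus i. The function name is hypothetical.

```python
from typing import Sequence


def mean_in_order_delay(recovery_slot: Sequence[int]) -> float:
    """recovery_slot[i]: slot in which P_i is received uncoded or decoded at the destination."""
    G = len(recovery_slot)
    delays = [t - i for i, t in enumerate(recovery_slot)]  # delta(i), in time slots
    if any(d < 0 for d in delays):
        raise ValueError("a packet cannot be recovered before its erasure-free slot")
    return sum(delays) / G
```

For example, if P1 is erased and only recovered in slot 3 together with P2 and P3 (recovery slots [0, 3, 3, 3]), the per-packet delays are [0, 2, 1, 0] and D = 0.75 time slots.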
For the delay evaluation and the throughput evaluation, the source follows the transmission pattern from Section III-B1, and then sends additional fully coded dense packets until the destination has recovered all G original source packets in the generation.

3) Throughput
a: Decoder
For a generation of G packets, the decoder throughput T_d is evaluated as

$$T_d = \frac{G\,\sigma}{\tau},$$

where σ is the packet size (in bytes) and τ is the total decoding (computation) time of a generation. The decoder throughput is evaluated in Megabits per second. For the decoding computation time evaluation, all packets received at the destination for a given generation were available to the destination decoder when the decoding commenced.
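The throughput definition in code, as a sketch with a hypothetical function name; it assumes τ is measured in seconds and converts bytes to bits (×8) and bits per second to Megabits per second (÷10^6). For instance, a hypothetical decoding time of τ = 1 ms for a generation of G = 60 packets of σ = 1500 bytes would correspond to 720 Mbit/s.

```python
def throughput_mbps(G: int, packet_size_bytes: int, tau_seconds: float) -> float:
    """T = G * sigma / tau, converted from bytes per second to Megabits per second."""
    return G * packet_size_bytes * 8 / tau_seconds / 1e6
```

The same function applies to the recoder throughput T_r, with τ replaced by the total recoding computation time of a generation.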

b: Recoder
The recoder throughput T_r is evaluated analogously to the decoder throughput T_d. For evaluating the recoding throughput, τ represents the total computation time incurred for recoding the packets of a generation at an intermediate node. Intermediate nodes have two packet processing stages. The first stage is the packet reading stage: when a packet is received, a time delay is incurred to "read" this packet. Here, reading a packet refers to the packet decoding in Dec-Rec and to the examination of the coding coefficients to detect uncoded packets in SparsePR and Rec-woDec. Therefore, the reading time differs depending on the recoding approach. After reading a packet, the packet writing stage commences. In the writing stage, either a coded packet is generated and transmitted to the next node, or an uncoded packet is transmitted to the next node. The total computation (processing) time for the packet reading and writing stages for a generation is represented as τ.
For the recoding computation time evaluation, all packets received at an intermediate node for a given generation were available to the recoder when the recoding commenced. For scenarios with multiple intermediate nodes, the recoding throughputs of the individual intermediate nodes were averaged.

1) Conventional Recoding
In conventional recoding [67], [68], an intermediate node sends only recoded packets. For recoding, all received packets in the buffer for a given generation are combined, i.e., the recoding window size is limited to the generation size G.

2) Systematic RLNC
With the systematic Random Linear Network Coding (RLNC) coding scheme for a prescribed code rate c [60]- [66], the source first sends the entire generation of G uncoded packets, followed by p coded packets, whereby c = G/(G+p). Intermediate nodes follow the same scheme for the recoding. In particular, an intermediate node sends G uncoded packets, and then recovers any erased packets after receiving redundant packets using its decoder. Recovered packets are sent according to the priority indicated by their order number. After sending G uncoded packets, the intermediate node generates and sends p coded packets, whereby the window size for generating coded packets at an intermediate node equals the generation size G.
For an example of systematic RLNC, consider Figure 1(a) with a generation size of G = 4 and a code rate of c = 4/5 at the source and the intermediate nodes. The source sends the coded packet C0 after sending G = 4 uncoded packets; the intermediate nodes N1 and N2 follow the same packet transmission pattern. First, N1 sends the received P0, P1, and P2 to N2, and N2 sends the received packets P0 and P1 to the destination D. After receiving C0, N1 recovers P3, which was erased on the link from S to N1. After recovering P3, N1 sends P3 to N2, and N2 receives P3 and forwards it to D. After N1 sends an entire generation of G = 4 uncoded packets, N1 generates p = 1 coded packet from P0, P1, P2, and P3. By receiving this coded packet, N2 decodes P2 and sends P2 to D. After sending the generation, N2 also generates p = 1 coded packet from P0, P1, P2, and P3.
Note that in this illustrative example with generation size G = 4 and p = 1, the systematic RLNC dynamics equal the Dec-Rec dynamics in Figure 1(a). However, for larger generation sizes, e.g., G = 60, the dynamics differ, since the systematic RLNC source first sends the G = 60 uncoded packets and then sends p = G(1/c − 1) redundant packets.
If an intermediate node cannot recover all G packets in a generation, then the intermediate node only forwards the received uncoded packets and the packets that it was able to decode. Then, the node stays idle and does not send coded packets. Sending recoded packets without having recovered the full generation would require a specific protocol to signal the end of the packet transmissions from the preceding node; such a protocol is out of the scope of this study. Cases in which an intermediate node cannot decode the entire generation arise only in the packet loss evaluation scenario (see Section III-B1), which limits the number of transmitted coded packets per generation.
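The number of redundant packets of systematic RLNC follows from c = G/(G + p). A small sketch using exact fractions (the function name is hypothetical):

```python
from fractions import Fraction


def redundant_packets(G: int, c: Fraction) -> int:
    """p = G(1/c - 1), from c = G/(G + p); raises if p is not an integer."""
    p = G * (1 / c - 1)
    if p.denominator != 1:
        raise ValueError(f"code rate {c} gives a non-integer number of redundant packets: {p}")
    return int(p)
```

For the illustrative G = 4 and c = 4/5 this gives p = 1, as in Figure 1(a); for G = 60 at the same rate, p = 15 redundant packets follow the 60 uncoded packets.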

3) Conventional Recoding with Small Buffer (CRSB)
Another way of recoding is proposed in [14], namely a conventional recoding with a small buffer size (CRSB). With CRSB, intermediate nodes have a small buffer, e.g., a buffer holding up to 4 packets. When packets are received, they are stored in the buffer. When recoded packets are generated, all the received packets in the buffer are combined. After generating a fixed number of recoded packets, the packets in the buffer are discarded. Then, the intermediate node includes the next packet into the buffer for recoding; whereby the next packets may be held in a different "holding buffer" before entering the recoding buffer, see [14] for details.
An example of CRSB is illustrated in Figure 2 at N1 with a buffer size of 4. When P0 arrives, it is stored in the buffer and sent to N2 (irrespective of whether it is an uncoded or coded packet). The recoding process starts when the second packet is received. Then, when P1 arrives, P1 is combined with P0, and a combination of P0 and P1 is sent in the second time step. In the third time step, P2 arrives and a combination of P0, P1, and P2 is sent. In the fourth time step, C0 arrives and a combination of P0, P1, P2, and C0 is sent. Thus, consistent with the underlying conventional recoding, CRSB sends mainly coded packets; specifically, CRSB in an intermediate node recodes all packets, except for the first packet in a set of buffered packets, which is transmitted as it was received at the intermediate node (it could be received as a systematic packet or as a coded packet).
Since the buffer of size four packets is full at the end of the fourth time step, the buffer is cleared (purged). In the fifth time step, P3 is received, and P3 alone is sent to N2, since P3 is the only packet in the buffer after clearing. Then, P4 is received, and a recoded packet that is a combination of P3 and P4 is generated and transmitted. Then, packet P5 is sent by the source but is erased on the link. The intermediate node sends another combination of P3 and P4 to fill the idle slot. In the last time step in Figure 2, C1 is received at N1. Then, another recoded packet is generated from P3, P4, and C1 and sent to N2.
According to this CRSB approach, nodes generate at least 3 coded packets, which is equal to the buffer size minus one, before discarding the packets in the buffer. If idle time slots appear, the intermediate node generates additional recoded packets from the packets that are currently in the buffer to fill the gap. Therefore, the number of generated recoded packets for each set of buffered packets can be different.
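The buffer dynamics of the Figure 2 example can be traced with the following minimal Python sketch. It is a simplification under stated assumptions: the holding buffer of [14] is ignored, an erased incoming packet (represented by `None`) triggers an idle-slot recoding over the current buffer contents, the buffer is purged as soon as it holds `buffer_size` packets, and coding coefficients are omitted, so each output entry only lists which packets are combined. The function name `crsb_schedule` is hypothetical.

```python
from typing import List, Optional, Tuple


def crsb_schedule(arrivals: List[Optional[str]], buffer_size: int) -> List[Tuple[str, ...]]:
    """Return, per time slot, the packets combined by a CRSB intermediate node.

    arrivals[t] is the packet received in slot t, or None if the packet was
    erased on the incoming link (an idle slot). A one-element tuple means the
    packet is forwarded as received.
    """
    buffer: List[str] = []
    sent: List[Tuple[str, ...]] = []
    for packet in arrivals:
        if packet is not None:
            buffer.append(packet)
        # Send a combination of everything buffered; a lone packet is forwarded
        # as received, and an empty tuple means there is nothing to send.
        sent.append(tuple(buffer))
        if len(buffer) == buffer_size:
            # Buffer full: purge it so that later packets start a new recoding set.
            buffer.clear()
    return sent


# Figure 2 example at N1 with a buffer of 4 packets; P5 is erased on the incoming link.
slots = crsb_schedule(["P0", "P1", "P2", "C0", "P3", "P4", None, "C1"], buffer_size=4)
for t, combo in enumerate(slots):
    print(t, "+".join(combo))
```

The printed schedule reproduces the eight transmissions described above, including the repeated P3+P4 combination that fills the idle slot caused by the erasure of P5.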
In summary, by recoding only a small number of packets, the CRSB approach can reduce the computational effort of intermediate nodes and the destination compared to conventional recoding. However, the CRSB approach destroys the systematic sparse structure created by the source.

For conventional recoding, the regular pure-recoder of the Kodo library was used, while for SparsePR, the Kodo pure-recoder was adapted to implement the principle in Figure 2 with five packet buffers (5B, approx. half the packet buffers of the other SpaRec approaches), i.e., one packet buffer for holding the incoming packet and four packet buffers for holding packet combinations.
The throughput evaluations were performed on a computer with an Intel Core i5-6500 3.2 GHz processor and 8 Gbyte RAM operating with Ubuntu 20.04 with Linux 5.4.0.

IV. EVALUATION RESULTS
Section IV-A examines the delay implications of the idle time slot utilization. The delay and throughput of the proposed SpaRec algorithms with and without a prescribed code rate in combination with different recoding window sizes are evaluated in Section IV-B. Finally, the packet loss, in-order packet delay, and throughput performance of the best-performing SpaRec approaches are compared against the benchmarks in Section IV-C.

FIGURE 5. Boxplots of mean in-order packet delay D for a generation of G = 60 packets without utilization of idle slots, see Figure 1(a) (denoted by "idle"), and with utilization of idle slots, see Figure 1(b) (denoted by "non-idle"), for the three-hop network in Figure 4 with the code rates specified in the caption of Figure 4 and unlimited recoding windows. (Approaches shown: Dec-Rec-R Inf, Rec-woDec-R Inf, SparsePR-R-5B.)

A. EVALUATION OF UTILIZATION OF IDLE TIME SLOTS
The coding window sizes of N 1 and N 2 were unlimited to observe the maximum impact of the idle case. We observe from Figure 5 that the idle slot utilization (corresponding to the non-idle results in Figure 5) reduces the packet delays. We also note that, irrespective of the idle slot utilization, the decoding in Dec-Rec lowers the packet delays, which is further examined in Section IV-C2. For now, we conclude that sending coded packets in the idle time slots decreases the delay. Therefore, we utilize idle slots throughout the remainder of this study.

B. IMPACT OF THE CODE RATE AND WINDOW SIZE
We observe from Figure 6 that the finite versus infinite recoding window size does not significantly affect the delay distribution. This is mainly because the finite window length w fin proposed in Eqn. (1) is long enough to ensure that sufficient numbers of packets are combined in the recoded packets to support the timely recovery of erased packets. In particular, Eqn. (1) specifies that the finite recoding window w fin covers the last subset of m packets plus m × ϵ packets (which correspond to the expected number of packets that are erased on the outgoing link of a given recoding node). Generally, the earlier the destination recovers erased packets, the lower the delay of packets that were erased on any of the links. The essentially equivalent delays for finite and infinite recoding windows demonstrate that the finite recoding window w fin supports packet loss recovery essentially as well as an infinite window length. We also observe from Figure 6 that rateless recoding achieves substantially shorter delays than recoding with a prescribed code rate for the Dec-Rec and Rec-woDec approaches, while the SparsePR delays are independent of the code rate. For Dec-Rec and Rec-woDec, the shorter packet delays with rateless recoding arise mainly because rateless recoding avoids the extra delays that are introduced when intermediate nodes enforce a prescribed code rate c = m/(m + k) for each subset of m uncoded packets. When an intermediate node enforces a prescribed code rate c, any uncoded packets that arrive immediately after m uncoded packets have been transmitted by the intermediate node need to wait until the intermediate node has transmitted k coded packets to fulfill the code rate c. In contrast, the rateless approach allows an intermediate node to transmit uncoded packets immediately after their arrival, lowering the packet delays. On the other hand, in SparsePR, each received packet is immediately combined with the contents of the packet buffers, see Section II-C.
Then, when a coded packet is generated by combining the packet buffers, the coded packet includes information about the latest received packet. Thus, a received uncoded packet is effectively not delayed by transmitting a coded packet.
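The finite recoding window described above can be illustrated with a short sketch. Eqn. (1) itself is not reproduced in this section, so the rounding is an assumption: the sketch takes w fin = m + ⌈m·ϵ⌉, i.e., the last subset of m packets plus the expected number of erasures on the outgoing link, rounded up. The helper names `finite_window` and `recoding_window` are hypothetical, and the per-link erasure probabilities 0.15, 0.2, and 0.25 are inferred from the per-link success probabilities 0.85, 0.8, and 0.75 stated for the network in Figure 4 later in this section.

```python
import math
from typing import List, Sequence


def finite_window(m: int, eps: float) -> int:
    """Window length: last subset of m packets plus ceil(m * eps) extra packets.

    The ceiling is an assumption; Eqn. (1) may round differently.
    """
    return m + math.ceil(m * eps)


def recoding_window(received: Sequence[int], m: int, eps: float) -> List[int]:
    """Order numbers of the most recent packets a recoder would combine
    under the finite window (all of them if fewer have arrived)."""
    w = finite_window(m, eps)
    return list(received[-w:])


for eps in (0.15, 0.2, 0.25):
    print(eps, finite_window(8, eps))

print(recoding_window(list(range(20)), m=8, eps=0.25))
```

With m = 8, all three links yield a window of 10 packets under this assumption, which is consistent with the statement that typically on the order of ten packets are combined in a finite-window coded packet.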
We observe from Table 2 that, for all approaches, rateless recoding achieves higher recoder and decoder throughput than recoding with a prescribed code rate c. This is mainly due to an excessive generation of coded packets when intermediate nodes enforce a prescribed code rate. Specifically, coded packets are generated both to fulfill the code rate c and to fill the idle slots. Generating coded packets for these two purposes tends to result in superfluous coded packets that are linearly dependent on the packets already received at the destination. More specifically, the decoder evaluates the coding coefficient vector of each received packet. If a coded packet is superfluous, i.e., is not useful to recover an erased packet (because the coded packet is a linear combination of the already received packets), then the coded packet is discarded. The decoder detects a superfluous coded packet from its coding coefficient vector through a modified version of the Gauss-Jordan algorithm that detects linearly dependent packets [82]. Thus, some coding vector operations are required in the recoding nodes for Dec-Rec and in the destination decoder for all SpaRec approaches to detect superfluous packets. These coding vector operations slightly reduce the recoding throughput and the decoding throughput.
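The detection of superfluous packets can be sketched as an incremental rank test on the coding coefficient vectors. The sketch below is not the modified Gauss-Jordan algorithm of [82] and does not use the Kodo API; for brevity, it assumes GF(2) coefficients, represents each coding vector as a Python integer bit mask over the G packets of a generation, and applies plain Gaussian elimination. The names `RankTracker` and `coding_vector` are hypothetical.

```python
from typing import Dict, Iterable


class RankTracker:
    """Incremental linear-dependency test for coding vectors over GF(2).

    Bit j of a coding vector is set if packet Pj contributes to the packet.
    """

    def __init__(self) -> None:
        # Maps a pivot bit position to a stored row whose highest set bit is that pivot.
        self.rows: Dict[int, int] = {}

    def add(self, vector: int) -> bool:
        """Return True if the vector is innovative, False if it is superfluous."""
        while vector:
            pivot = vector.bit_length() - 1  # highest non-zero coefficient
            row = self.rows.get(pivot)
            if row is None:
                self.rows[pivot] = vector  # new pivot: the rank grows by one
                return True
            vector ^= row  # GF(2) subtraction eliminates the pivot coefficient
        return False  # reduced to zero: a linear combination of earlier packets

    @property
    def rank(self) -> int:
        return len(self.rows)


def coding_vector(packets: Iterable[int]) -> int:
    """GF(2) coding vector combining the given packet indices (repeats cancel)."""
    vector = 0
    for j in packets:
        vector ^= 1 << j
    return vector


tracker = RankTracker()
for combo in ([0], [1], [0, 1], [2]):
    print(combo, tracker.add(coding_vector(combo)))
print("rank:", tracker.rank)
```

The third packet, P0 + P1, is flagged as superfluous and would be discarded. Over a larger field such as GF(2^8), the XOR step would become a scaled row subtraction, while the structure of the test stays the same.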
We also observe from Table 2 that the finite recoding window length tends to give slightly higher throughput levels than the infinite window length. This is mainly because decoding coded packets that are combinations of a high number of packets has a slightly higher computational complexity. With the finite window length w fin, see Eqn. (1), typically on the order of ten packets are combined in a coded packet, whereas, with the infinite window, up to G = 60 packets are combined. Based on the results in this section, we select rateless (RL) recoding with a finite (Fin) window length for the remaining evaluations in this article.

C. COMPARISON WITH BENCHMARKS
This section compares the performance of the SpaRec approaches Dec-Rec, Rec-woDec, and SparsePR, all operating with rateless recoding with a finite window w fin , see Eqn. (1), with the benchmarks conventional recoding, systematic RLNC, and CRSB in terms of packet loss, delay, and throughput.

1) Packet Loss Performance
a: Packet Loss Dynamics of Benchmarks
We observe from Figure 7 that the medians and upper whiskers of the packet loss boxplots of all algorithms are zero, except for systematic RLNC. Systematic RLNC has the first packet loss quartile at around 23%, which means that 75% of the generations experienced packet losses at or above 23%. With systematic RLNC, the source sends the G = 60 uncoded packets followed by 40 coded packets. If these 40 coded packets are not enough to recover the packet erasures that occur between the source and N 1, then N 1 cannot recover the entire generation and therefore (following the systematic RLNC benchmark operation, see Section III-C2) does not send coded packets to the next node N 2. As a result, N 2 cannot recover packet erasures that occurred between N 1 and N 2. Thus, the destination tends to experience many missing packets with systematic RLNC, as observed in Figure 7.
We observe from Figure 7 that conventional recoding results in rare outliers in the 70-100% packet loss probability range (with most of these outliers clustering in the 90-100% range); in contrast, CRSB results in frequent outliers in the 60-100% packet loss probability range. The main reason for the outliers at these high packet loss probabilities is the relatively high number of seen packets (see Section II-A3) in conventional recoding. Following the recoding study in [14], CRSB applies conventional recoding with a small buffer of 10 packets. In particular, with CRSB, an intermediate node transmits only coded packets (unless the buffer holds only one uncoded packet, which can occur in N 1). However, the CRSB-recoded packets are combinations of fewer packets than with conventional recoding, i.e., the CRSB-recoded packets provide only a restricted (weaker) protection against packet erasures compared to conventionally recoded packets, which are combinations of all packets of a generation that an intermediate node has received. Accordingly, CRSB tends to require more packet transmissions overall than conventional RLNC recoding in order to complete the decoding at the destination, as previously examined in [14, Figure 4(a)], which considers a recoding principle similar to CRSB. In our loss evaluation methodology, see Section III-B1, the source stops sending packets after a prescribed number of packet transmissions. Consequently, the event of not completing the decoding of a generation tends to occur more frequently with CRSB than with conventional recoding.

b: SpaRec Packet Loss Dynamics
The SpaRec approaches also combine only a few packets, similar to CRSB. However, in contrast to CRSB, which transmits mostly coded packets, the SpaRec approaches send mostly systematic packets. Thus, the SpaRec approaches avoid the CRSB drawback of requiring a large number of coded packet transmissions for decoding. The SpaRec systematic packet transmissions also aid the on-the-fly decoding of seen packets, thus mitigating the inability of conventional recoding to decode the seen packets.
We observe from Figure 7 that compared to the benchmarks, the three proposed SpaRec algorithms result in outliers at lower packet loss probability levels, typically less than 60% with Rec-woDec and SparsePR, as well as typically less than 40% (with most outliers clustering below 35%) with Dec-Rec. The low packet loss probabilities of the Dec-Rec outliers are mainly due to the on-the-fly decoding at each intermediate node. Hence, Dec-Rec gives intermediate nodes a chance to recover erased packets and send them to the next node.
We observe from Figure 7 that Rec-woDec and SparsePR achieve the smallest numbers of outliers; specifically, only ten and nine of the 10 000 generations, respectively, had packet losses, and these losses were around 40%. To explain the low number of packet loss outliers of Rec-woDec and SparsePR, it is instructive to compare the dynamics of Rec-woDec and SparsePR with the dynamics of Dec-Rec towards the end of the transmission of the packets of a generation, when the source sends fully dense coded packets (see Section III-B1). Dec-Rec independently decodes the entire generation at each intermediate node. Typically, the first intermediate node N 1 is the first to fully decode the generation (usually when the source starts to send fully dense coded packets or a few time slots thereafter).
VOLUME 4, 2016
After N 1 has decoded the full generation, N 1 ignores any further packets arriving from the source. Then, N 1 first checks whether there are any newly decoded packets that have not previously been transmitted and transmits all such packets. Subsequently, N 1 generates fully dense coded packets that are combinations of all G packets in the generation. The successive intermediate node N 2 typically decodes the full generation a few time slots after N 1 , and then follows the same process of transmitting packets that were not transmitted previously and then transmitting newly generated fully dense coded packets.
In contrast, when the fully dense coded packets from the source arrive at N 1, the rateless Rec-woDec and SparsePR include the fully dense coded packets in the combination of the w fin packets in Rec-woDec (resp. five packets in SparsePR) that are combined to generate recoded packets. Thus, the recoded packets become fully dense coded packets. Effectively, the fully dense coded packets from the source are immediately forwarded (in the sense of being included in the recoded packets) by N 1 to N 2, and similarly N 2 immediately forwards the fully dense coded packets received from N 1 to the destination. Thus, with Rec-woDec and SparsePR, the intermediate nodes begin forwarding the fully dense coded packets essentially in lock-step, i.e., in a "synchronized" fashion. This synchronized forwarding by the intermediate nodes gets the fully dense coded packets to the destination sooner than with Dec-Rec, where the intermediate nodes are not synchronized. Rather, with Dec-Rec, the intermediate nodes independently (asynchronously) decode the full generation. Thus, with Dec-Rec, the fully dense coded packets tend to arrive later at the destination, resulting in a higher tendency of not finishing the decoding at the destination when the source stops transmitting.

It is instructive to first consider these ripples in an erasure-free network: The first m = 8 systematic packets Pi, i = 0, 1, . . . , 7, have a delay metric δ(i) = 0. After the transmission of these first m = 8 packets, the source sends k = 2 coded packets to complete the transmission of the first subset of packets. These k = 2 coded packets delay the subsequent m = 8 systematic packets by 2 time slots, resulting in delays δ(i) = 2 for systematic packets with order numbers i = 8, 9, . . . , 15 in the erasure-free scenario.
The systematic packets in each subsequent subset are delayed by an additional two time slots, resulting towards the end of the generation in δ(i) = 12 for i = 48, . . . , 55, and $\delta(i) = 14 = \lfloor G/m_S \rfloor k_S = \lfloor 60/8 \rfloor \cdot 2$ for i = 56, . . . , 59. Accordingly, the mean packet delay D across the generation of G = 60 packets is D = 392/60 ≈ 6.5 in the erasure-free scenario.
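The erasure-free delay pattern can be reproduced with a few lines of Python. The sketch assumes, as in the text, that the source sends subsets of m = 8 systematic packets, each followed by k = 2 coded packets, and that no erasures occur, so systematic packet Pi is delayed by k coded transmissions for every complete subset sent before it, i.e., δ(i) = k⌊i/m⌋. The function name `erasure_free_delays` is hypothetical.

```python
from typing import List


def erasure_free_delays(G: int, m: int, k: int) -> List[int]:
    """In-order delay of each systematic packet P0..P(G-1) without erasures.

    Packet Pi waits for the k coded packets of every complete subset
    of m systematic packets sent before it.
    """
    return [k * (i // m) for i in range(G)]


delays = erasure_free_delays(G=60, m=8, k=2)
print(delays[:8], delays[8:16], delays[56:])
mean_delay = sum(delays) / len(delays)
print(f"mean delay D = {mean_delay:.2f}")
```

The delays sum to 392 time slots, so the exact mean is 392/60 ≈ 6.53, which rounds to the value D ≈ 6.5 given above.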
It is also instructive to consider the other extreme scenario in the delay dynamics, namely the scenario in which a packet Pi is erased on one of the links and only recovered at the destination when the fully dense coded packets arrive. For source code rate $c_S = m_S/(m_S + k_S)$, there are $\lfloor G/m_S \rfloor$ full subsets at the source, i.e., a total of $\lfloor G/m_S \rfloor k_S$ coded packets with finite coding window w fin are transmitted by the source as part of the $\lfloor G/m_S \rfloor$ full subsets. Thus, the fully dense coded packets begin to arrive at the destination at the earliest in time slot $G + \lfloor G/m_S \rfloor k_S$. Therefore, a packet Pi that is only recovered when the fully dense coded packets start to arrive at the destination generally experiences a delay $\delta(i) \geq G + \lfloor G/m_S \rfloor k_S - i = 74 - i$. (Dec-Rec is an exception, since the fully dense coded packets in Dec-Rec may arrive later at the destination due to the asynchronous decoding in the individual intermediate nodes, see the end of Section IV-C1.)

We observe from Figure 8(a) that for all approaches except systematic RLNC, the mean in-order packet delay δ(i) is relatively short for the first packet (conventional recoding and CRSB) or for the first subset of m = 8 packets (SpaRec approaches), grows with increasing packet order number i, and becomes shorter again as i approaches the end of a generation of G packets. This general behavior is mainly caused by averaging over 10 000 independent replications of the transmission of the G packets (i = 0, 1, . . . , G − 1) in a generation to obtain the plotted mean in-order packet delay δ(i). More specifically, packets typically experience one of two different delay dynamics: (a) erasure-free multi-hop transmission or "local" recovery with the coded packets for the subset that the packet belongs to, or (b) "global" recovery with the fully dense coded packets that the destination receives at the end of the generation.
The erasure-free transmission or local recovery results in short delays that are on the order of the number m of packets in a subset and the total number $\lfloor G/m_S \rfloor k_S$ of coded packets with a finite coding window w fin. On the other hand, the global recovery results in packet delays on the order of the "distance" $G + \lfloor G/m_S \rfloor k_S - i$ of packet Pi from the end of the generation, when fully dense coded packets arrive at the destination.

can recover these two erased packets when N 1 receives the k = 2 redundant coded packets of the first subset. With the other SpaRec approaches, the destination can similarly recover the two erased packets through on-the-fly decoding, provided no erasures occur on the other links. If one of these coded packets is erased, or a third systematic packet is erased, then these erased systematic packets require global recovery at the destination at the end of the generation.
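The two delay regimes can be contrasted numerically. The sketch below keeps the assumptions of the erasure-free analysis (G = 60, m_S = 8, k_S = 2, no additional delay from the intermediate nodes) and computes the earliest slot G + ⌊G/m_S⌋k_S at which fully dense coded packets can reach the destination, the resulting lower bound 74 − i on the delay of a globally recovered packet Pi, and the erasure-free delay k_S⌊i/m_S⌋ for comparison. The function names are hypothetical.

```python
def earliest_dense_slot(G: int, m_s: int, k_s: int) -> int:
    """First time slot in which fully dense coded packets can reach the destination."""
    return G + (G // m_s) * k_s


def global_delay_bound(i: int, G: int, m_s: int, k_s: int) -> int:
    """Lower bound on the delay of packet Pi if it needs global recovery."""
    return earliest_dense_slot(G, m_s, k_s) - i


G, M_S, K_S = 60, 8, 2
print("earliest dense slot:", earliest_dense_slot(G, M_S, K_S))
for i in (0, 7, 30, 59):
    local = K_S * (i // M_S)  # erasure-free / local-recovery delay
    print(i, local, global_delay_bound(i, G, M_S, K_S))
```

The gap between the two regimes is largest for early packets and shrinks towards the end of the generation, which helps explain why the mean δ(i) becomes shorter again as i approaches G.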

b: Packet Delay Dynamics of Benchmarks
We observe from Figure 8(a) that the in-order packet delays δ(i) of conventional recoding and CRSB increase steeply over the first m = 8 packets, i.e., packets i = 0, 1, . . . , m − 1, in a generation. This is mainly because conventional recoding at N 1 creates increasingly dense recoded packets as the source sends uncoded systematic packets; thus, N 1 destroys the structure of the sliding window coding scheme that the source uses. More specifically, the source first sends a subset of m uncoded packets to allow the destination to utilize these packets immediately, if they are not erased by the links. However, conventional recoding always creates coded packets, and decoding the coded packets received so far requires the portion of the decoding matrix at the destination that corresponds to these packets to reach full rank. Suppose that packets P0, P1, . . . , Pi−1 (with i < m) are transmitted erasure-free and Pi suffers an erasure on the S-N 1 link, or its recoded version suffers an erasure on the N 1-N 2 link or the N 2-D link; then, packet Pi cannot immediately be decoded at the destination. If at most two packet erasures occur during the transmission of the first m = 8 packets and the subsequent k = 2 coded packets transmitted by the source traverse the network erasure-free, then these packet erasures can be recovered following the RLNC principles, resulting in the downward "notch" for packet delay δ(7) (the last packet in the first subset of m = 8 packets) for conventional recoding in Figure 8(a). Packet erasures that cannot be recovered with the k coded packets that the source sends per subset result in seen packets at the destination that can only be decoded with the additional fully dense coded packets at the end of the generation. In the considered multi-hop network in Figure 4, a given packet is transmitted erasure-free over all three hops with probability 0.85 · 0.8 · 0.75 = 0.51.
Hence, packet erasures are quite common, resulting in the steeply growing delays δ(i) for conventional recoding in Figure 8(a), as the additional fully dense coded packets at the end of the generation are required to recover from packet erasures. Accordingly, the mean packet delays D across the G packets in a generation are very high for conventional recoding in Figure 8(b).
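A rough back-of-the-envelope model illustrates why local recovery within the first subset is unlikely under conventional recoding. The model is a simplifying assumption, not part of the evaluation: each of the m = 8 systematic packets of the first subset independently reaches the destination with the end-to-end probability 0.51 given above, any additional redundancy from recoding at the intermediate nodes is ignored, and local recovery requires at most k = 2 erasures among the eight packets plus erasure-free delivery of the k = 2 coded packets. The function name `prob_at_most` is hypothetical.

```python
from math import comb


def prob_at_most(max_erasures: int, n: int, p_success: float) -> float:
    """P(at most max_erasures of n independent packets are erased end to end)."""
    q = 1.0 - p_success
    return sum(comb(n, j) * q**j * p_success ** (n - j) for j in range(max_erasures + 1))


P_E2E = 0.51  # erasure-free three-hop probability 0.85 * 0.8 * 0.75
m, k = 8, 2
expected_erasures = m * (1.0 - P_E2E)
p_few = prob_at_most(k, m, P_E2E)
p_local = p_few * P_E2E**k  # the k coded packets must also arrive
print(f"expected erasures per subset: {expected_erasures:.2f}")
print(f"P(at most {k} erasures): {p_few:.3f}, with coded packets intact: {p_local:.3f}")
```

Under these assumptions, the expected number of end-to-end erasures per subset (about 3.9) exceeds k = 2, and only roughly 16% of subsets have at most two erasures, so most erasures have to wait for the fully dense coded packets at the end of the generation.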
We observe from Figs. 8(a) and 8(b) that CRSB gives somewhat higher packet delays than conventional recoding. This is mainly due to the relatively weak protection of CRSB, which combines only a few packets and therefore requires more packet transmissions at the end of a generation, confirming the results in [14, Figure 4(a)]. More specifically, in conventional recoding, all packets of a generation in the buffer of an intermediate node are combined when recoded packets are generated, whereas CRSB only combines the packets in a small buffer. By periodically purging the buffer (see Section III-C3), CRSB intermediate nodes lose the information of previous packets. Accordingly, CRSB requires more fully dense coded packets at the end of a generation.
With systematic RLNC, any packet erasure during the transmission of the G systematic packets in the first G time slots can only be recovered with the fully dense coded packets transmitted at the end of the generation. Accordingly, the mean in-order packet delay δ(i) decreases as the "distance" of packet Pi to the end of the generation shrinks, as observed in Figure 8(a). Nevertheless, the systematic packets that arrive erasure-free have an in-order packet delay δ(i) = 0, reducing the upper quartile of the mean packet delay D in Figure 8(b).

c: SpaRec Packet Delay Dynamics
We observe from Figure 8(a) that the in-order packet delays δ(i) with the SpaRec schemes are substantially lower than for the benchmarks, whereby Dec-Rec generally achieves the lowest packet delays. The Rec-woDec and SparsePR packet delays are typically a few time slots higher than the Dec-Rec delays. The somewhat higher packet delays δ(i) of Rec-woDec and SparsePR compared to Dec-Rec are primarily due to the inclusion of coded packets in the packet combinations created by Rec-woDec and SparsePR, which increases the coding density of the created recoded packets. These denser recoded packets tend to cause more seen packets at the destination, which require more fully dense coded packets at the end of the generation for decoding. In contrast, Dec-Rec only creates coded packets from systematic or decoded packets. Therefore, the Dec-Rec recoded packets have a relatively lower (sparser) coding density (as examined in more detail in Table 3), causing fewer seen packets at the destination and thus requiring fewer fully dense coded packets at the end of the generation. Also, due to the decoding at each intermediate node, Dec-Rec tends to deliver more systematic packets to the destination than Rec-woDec and SparsePR.
Overall, we observe from Figure 8(b) that the SpaRec approaches significantly reduce the mean delay D of the packets in a generation compared to the existing recoding benchmarks. The third quartiles of the mean packet delay D with the SpaRec approaches are consistently below the first quartiles of the mean packet delays D of the existing benchmarks. The mean packet delays D of the proposed Dec-Rec approach are approximately half or less of the corresponding mean packet delays of the existing benchmarks.
In additional evaluations, which we cannot include in detail due to space constraints, we examined the sensitivity of the packet loss and delay performance to an underestimation of the erasure probability ϵ by increasing ϵ by 0.1 for each link in Fig. 4 while initially keeping the source code rate unchanged at $c_S = 8/10$ (with rateless recoding in the intermediate nodes). We found that the increased ϵ: (i) increased the mean in-order packet delays D by a factor of 1.5 to 1.8, and (ii) increased the upper quartiles of the conventional recoding and CRSB packet loss probabilities to approx. 87%, while the SpaRec upper quartiles remained below 57%. Decreasing $c_S$ to 6/10: (i) reduced D to levels that are 1.2 to 1.6 times the original levels, and (ii) returned the packet loss probabilities to their original levels.
3) Throughput Performance
a: Impact of Generation Size G on Recoding Throughput
Figure 9 presents the recoder (computation) throughput T r and decoder (computation) throughput T d as a function of the generation size G. A larger generation size G increases the recoder computations for a recoding approach that processes all packets that have been received for a given generation, namely conventional recoding. In particular, conventional recoding combines all (up to G = 60) packets that have been received so far for a given generation [77]. The systematic RLNC and Dec-Rec recoders strive to decode the complete generation in each intermediate node and thus incur increasing computational complexity as G increases. More specifically, the systematic RLNC and Dec-Rec recoders check in each time slot whether a new packet has been decoded.
We observe from Figure 9(a) that the CRSB, SparsePR, and Rec-woDec recoding throughput levels tend to increase slightly with increasing generation size G. These T r increases are mainly due to the sublinear growth of the number of superfluous packets that are generated per generation as G grows. Specifically, additional evaluations revealed for CRSB a mean of 28 superfluous packets for G = 60, while the four times larger G = 240 had only an approximately 2.6 times larger mean of 74 superfluous packets. A proportionally smaller number of superfluous packets implies a proportionally lower processing burden for intermediate nodes (and the destination) from superfluous packets.
b: Impact of Generation Size G on Decoding Throughput
We observe from Figure 9(b) that conventional recoding gives the lowest decoding throughput. With conventional recoding, the destination receives only dense coded packets. Decoding dense coded packets requires substantial computational effort that grows with the generation size G on the order of $G^3$, since the entire G × G coding coefficient matrix and the corresponding packet payloads need to be processed [79], [80]. Accordingly, the decoding throughput T d [see Eqn. (4)] drops quadratically with G, as observed in Figure 9(b). CRSB, which also delivers essentially only coded packets to the destination, achieves higher decoding throughput, as only the coding coefficients corresponding to the up to 10 packets in the recoding buffer are non-zero. Thus, the CRSB recoded packets have a lower coding density than the dense coded packets of conventional recoding. We observe from Figure 9(b) that the SpaRec approaches are clustered together and achieve nearly the same decoding throughput levels, which are in the general vicinity of the systematic RLNC decoding throughput. The SpaRec approaches and systematic RLNC achieve high decoding throughput levels mainly because of the systematic packets received at the destination.
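The quadratic drop for conventional recoding can be made explicit with a simple scaling argument. It assumes, as a simplification, that the decoding throughput T d in Eqn. (4) measures decoded payload per unit of computation time, that each packet carries B bytes, and that decoding dense coded packets costs on the order of G^3 operations, as stated above, with α a hypothetical proportionality constant:

```latex
T_d \;\approx\; \frac{G\,B}{\alpha\,G^{3}} \;=\; \frac{B}{\alpha}\,G^{-2},
\qquad\text{so that}\qquad
\frac{T_d(2G)}{T_d(G)} \;\approx\; \frac{1}{4}.
```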

c: Impact of Number of Intermediate Nodes on Superfluous and Systematic Packets at Destination
Before examining the throughput as a function of the number of intermediate nodes in Section IV-C3d, it is instructive to consider the mean number of superfluous received packets per generation at the destination, as displayed in Figure 10. For a given generation, the number of superfluous received packets was evaluated by subtracting the generation size G from the total number of received packets until the generation could be decoded at the destination. Thus, effectively, Figure 10 shows the numbers of superfluous coded packets that are generated by the recoding algorithms per generation. These superfluous received coded packets are useless to the decoder at the destination in that they do not increase the rank of the decoder coefficient matrix, i.e., these superfluous received coded packets are linear combinations of previously received packets. Determining the linear dependency takes some computational effort [82]. We observe from Figure 10 that systematic RLNC has essentially no superfluous coded packets. With systematic RLNC, the source node and each intermediate node send each of the G packets in the generation once in systematic, i.e., uncoded, form, and then send fully dense coded packets, which can be utilized to recover any erased packet of the generation at the next intermediate node or the destination.
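The counting procedure behind Figure 10 can be sketched as follows: received packets are fed to a rank test in arrival order until the rank reaches G, and G is subtracted from the number of packets consumed. This is an illustrative sketch, not the [82] algorithm or the Kodo decoder; it assumes GF(2) coding vectors encoded as integer bit masks that only use bits 0 to G − 1, and the function name is hypothetical.

```python
from typing import Dict, Iterable, Optional


def superfluous_until_decodable(vectors: Iterable[int], G: int) -> Optional[int]:
    """Number of received packets beyond G needed to reach full rank.

    Each vector is a GF(2) coding vector (bit j set if packet Pj is included,
    with j < G). Returns None if the received packets never reach rank G.
    """
    pivots: Dict[int, int] = {}  # pivot bit -> stored row with that highest set bit
    received = 0
    for v in vectors:
        received += 1
        while v:
            p = v.bit_length() - 1
            if p not in pivots:
                pivots[p] = v  # innovative packet: the rank grows by one
                break
            v ^= pivots[p]  # eliminate the pivot coefficient
        if len(pivots) == G:
            return received - G
    return None


# G = 4: the third packet P0 + P1 is a linear combination of P0 and P1.
print(superfluous_until_decodable([0b0001, 0b0010, 0b0011, 0b0100, 0b1000], G=4))
```

For this toy sequence, five packets are needed to decode four, so one received packet is counted as superfluous.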
In contrast, we observe from Figure 10 that Dec-Rec has the highest numbers of superfluous received coded packets. This is mainly due to the relatively high number of linearly dependent packets that Dec-Rec generates by recoding a limited set of w fin systematic and decoded packets in the intermediate nodes. Rec-woDec and SparsePR create fewer superfluous packets by including received coded packets when combining packets to create recoded packets. These Rec-woDec and SparsePR recoded packets therefore tend to have a higher coding density (see Table 3) and to include information from a wider set of packets, reducing the probability of linear dependency. More specifically, SparsePR has a higher coding density than Rec-woDec due to the continuous combination of each incoming packet with the packet buffers in SparsePR (see Fig. 3); therefore, SparsePR has a lower number of superfluous packets than Rec-woDec.
We observe from Table 3 that the recoding approaches that strive to build up the full generation of systematic (uncoded) packets at each intermediate node, i.e., systematic RLNC and Dec-Rec, deliver a high number of uncoded packets to the destination that is independent of the number of intermediate nodes. With systematic RLNC, the last intermediate node transmits all packets in uncoded form to the destination, followed by fully dense coded packets to recover the packet erasures on the last link. In contrast, Dec-Rec intersperses coded packets that are combinations over the finite coding window w fin (and thus have a low coding density) among the uncoded packet transmissions. Rec-woDec also combines packets in the w fin window, achieving a low coding density, whereas SparsePR codes with the packet buffer structure in Fig. 3, resulting in a moderately high coding density of around 0.56. Both Rec-woDec and SparsePR deliver more coded packets to the destination as the number of links increases, since the resulting additional erasures create more idle slots. Similarly, with more links, the mean coding density of the delivered coded CRSB packets increases, mainly due to the increasing number of fully dense coded packets at the end of a generation.

d: Impact of Number of Intermediate Nodes on Throughput
Figure 11 presents the recoder throughput T r and decoder throughput T d as a function of the number of intermediate nodes. We observe from Figure 11(a) that all approaches exhibit decreasing recoding throughput as the number of intermediate recoding nodes increases, whereby this decrease is most noticeable for CRSB and systematic RLNC.
As the number of links that the packets have to traverse increases, the probability of packet erasures on the set of links increases, as each link erases a packet independently with probability ϵ. Accordingly, Dec-Rec, which strives to decode the entire generation in each intermediate node, requires more fully dense coded packets at the end of the transmission of a generation to decode the generation. Thus, when averaging over a full generation, the intermediate nodes (especially those towards the end of the multi-hop path) have to decode on average more fully dense coded packets, which requires more computational effort for decoding. Therefore, the plotted Dec-Rec average recoding throughput (across the given set of intermediate nodes) decreases as the number of intermediate nodes on the multi-hop path increases.
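The growth of the end-to-end erasure probability with the path length follows directly from the independent per-link erasures. The sketch reproduces the three-hop value for Figure 4 (per-link erasure probabilities 0.15, 0.2, and 0.25, i.e., an erasure-free probability of 0.51) and, for longer paths, assumes a hypothetical uniform ϵ = 0.2 per link, since the per-link values for the longer paths are not listed here. The function name is hypothetical.

```python
from math import prod
from typing import Sequence


def end_to_end_erasure(link_eps: Sequence[float]) -> float:
    """Probability that a packet is erased on at least one of the links."""
    return 1.0 - prod(1.0 - eps for eps in link_eps)


print(f"Figure 4 path: {end_to_end_erasure([0.15, 0.2, 0.25]):.2f}")
for intermediate_nodes in range(1, 5):
    links = intermediate_nodes + 1  # source -> N1 -> ... -> destination
    # Hypothetical uniform per-link erasure probability of 0.2.
    print(intermediate_nodes, f"{end_to_end_erasure([0.2] * links):.3f}")
```

With the assumed uniform ϵ = 0.2, the probability that a packet is erased somewhere on the path grows from 0.36 with one intermediate node to about 0.67 with four, which illustrates why more recovery work shifts to the fully dense coded packets at the end of a generation.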
The mechanism is subtly different in systematic RLNC, where each intermediate node builds up the full generation from the systematic packets and coded packets received from the immediately preceding node along the path. Thus, the number of erased packets that need to be recovered in a given intermediate node is only affected by the erasures on the one link from the immediately preceding intermediate node (or source). However, the order in which the packets arrive at the individual intermediate nodes is more strongly reshuffled as more systematic packets are erased on the successive links and each intermediate node immediately forwards the received systematic packets (one packet per time slot, in the packet order number from the generation at the source, see Section III-C2). Packets that have been erased on the incoming link are recovered with the fully dense coded packets at the end of the generation. The transmission order according to the packet order number, which has been adopted to reduce the in-order packet delays, requires more extensive searching for the first in-order decoded-not-yet-transmitted packet as the number of intermediate nodes increases.
The CRSB recoder throughput in Figure 11(a) and decoder throughput in Figure 11(b) decrease mainly due to the relatively large number of fully dense coded packets that CRSB has to recode in the intermediate nodes at the end of a generation. CRSB offers weak protection against packet erasures, as examined in Section IV-C1a. Commensurately, it requires a high number of fully dense coded packets at the end of the generation (approx. 11 and 28 such packets for one and four intermediate nodes, respectively). This high number of fully dense coded packets can be inferred from the high number of superfluous received coded packets with CRSB in Figure 10. In CRSB, the intermediate nodes generate recoded packets by combining the buffered packets while the source transmits the finite-window coded packets. Many of these recoded packets are linearly dependent on the packets already held downstream, because they do not include erased packets from outside the range of currently buffered packets. Moreover, the buffer is periodically purged, so on average only approximately 5 packets are combined for the considered 10-packet buffer. The probability that packets outside the buffered range are erased also increases as packets are independently erased over a larger set of links. Therefore, recovery at the destination requires an increasing number of fully dense coded packets at the end of a generation as the number of intermediate nodes increases. This results in the increasing mean coding density observed in Table 3. The increasing number of fully dense coded packets imposes a relatively high computational load on the intermediate nodes for recoding, leading to the declining CRSB recoding throughput over a full generation in Figure 11(a).
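A toy example can check the linear dependence argument. For brevity, the Python sketch below works over GF(2); the evaluated schemes may use a larger Galois field. It represents each coefficient vector as an integer bitmask and uses a hypothetical generation of G = 20 packets in which packet 3 was erased upstream. Combinations drawn only from a buffer that holds packets 10 to 19 never raise the receiver's rank above G − 1. Combinations over the full window include packet 3 with probability 1/2 each and can therefore complete the generation. All names are illustrative assumptions.

```python
import random


def gf2_rank(vectors):
    """Rank over GF(2) of coefficient vectors stored as int bitmasks."""
    basis = []  # reduced vectors with pairwise distinct leading bits
    for v in vectors:
        for b in basis:
            if (v ^ b) < v:  # b's leading bit is set in v, so clear it
                v ^= b
        if v:
            basis.append(v)
    return len(basis)


def random_combination(packet_ids, rng):
    """Random GF(2) combination: each listed packet is included with probability 1/2."""
    mask = 0
    for i in packet_ids:
        if rng.random() < 0.5:
            mask |= 1 << i
    return mask


G = 20       # hypothetical generation size
erased = 3   # packet erased on an earlier link
received = [1 << i for i in range(G) if i != erased]  # uncoded packets that arrived

rng = random.Random(1)
buffer_only = [random_combination(range(10, 20), rng) for _ in range(64)]  # purged buffer
full_window = [random_combination(range(G), rng) for _ in range(64)]       # whole window

print(gf2_rank(received + buffer_only))  # 19: packet 3 is never covered
print(gf2_rank(received + full_window))  # packet 3 lies inside the window, so rank can reach G
```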
The computational burden from the increasing number of fully dense coded packets is even higher for the decoding at the destination. This leads to the steep drop-off of the CRSB decoding throughput to near the level of conventional recoding in Figure 11(b).
On the other hand, Rec-woDec, SparsePR, and conventional recoding do not periodically purge the recoding buffer. Rather, they combine all received uncoded and coded packets within the full recoding window w_fin (or the set of five packet buffers in SparsePR). This reduces the probability of creating linearly dependent coded packets, leading to the moderate numbers of superfluous packets in Figure 10, which increase only slightly with the number of intermediate nodes.
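A standard linear-algebra argument makes the benefit of combining over the full window explicit. It assumes that the recoding coefficients are drawn uniformly at random from GF(q), including zero, which simplifies the sparse coefficient choices of the SpaRec strategies. Let V denote the subspace spanned by the coefficient vectors already held by the receiving node, and let W denote the subspace spanned by the packets that the recoder combines. A uniformly random combination of a spanning set of W is uniformly distributed over W, so the probability that the recoded packet is innovative is

```latex
\Pr[\text{innovative}] = 1 - \frac{|V \cap W|}{|W|} = 1 - q^{-\left(\dim W - \dim (V \cap W)\right)} .
```

If the recoder's buffer contains none of the packets missing downstream, then W ⊆ V and the probability is zero, which is the situation of the purged CRSB buffer. If W contains at least one degree of freedom that the receiver lacks, the probability is at least 1 − 1/q.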
Thus, Rec-woDec, SparsePR, and conventional recoding avoid the large numbers of fully dense coded packets that CRSB requires at the end of a generation (e.g., Rec-woDec requires approx. 7.6 and 13 such packets for one and four intermediate nodes, respectively). Correspondingly, Rec-woDec, SparsePR, and conventional recoding do not suffer from a pronounced decrease of the recoding throughput in Figure 11(a) as the number of intermediate nodes increases.
However, with an increasing number of links, the probability that a systematic packet traverses all links to the destination erasure-free decreases. Thus, the SpaRec intermediate nodes need to fill more idle slots with coded packets, which increases the computational burden and, in turn, reduces the recoding throughput. This reduction is least pronounced for SparsePR, which combines each arriving packet with the packet buffers, so creating a coded packet adds only a very small computational load. The proportion of coded packets arriving at the destination increases as Rec-woDec and SparsePR fill more idle slots with coded packets, see Table 3. Correspondingly, the decoding throughput levels of Rec-woDec and SparsePR decrease in Figure 11(b) as the number of intermediate nodes increases. Dec-Rec has high numbers of superfluous received coded packets, see Figure 10, yet it still achieves a relatively high decoding throughput, see Figure 11(b). This is mainly because Dec-Rec delivers a relatively high number of systematic packets to the destination decoder and the delivered coded packets have low coding density, see Table 3.
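Here, the coding density of a coded packet is taken as the fraction of non-zero entries in its length-G coefficient vector. The Python sketch below contrasts a packet that combines a finite window of w_fin packets with a fully dense packet. The window length w_fin = 16, the window positions, and the restriction to non-zero GF(2^8) coefficients (values 1 to 255) are hypothetical illustration choices.

```python
import random


def coding_density(coeffs):
    """Fraction of non-zero coding coefficients in a coefficient vector."""
    return sum(1 for c in coeffs if c != 0) / len(coeffs)


def window_coefficients(G, start, w, rng):
    """Coefficient vector of a packet that combines the w packets starting at `start`.

    Non-zero coefficients (1..255) are drawn inside the window, which is clipped at
    the end of the generation; all other positions stay zero."""
    coeffs = [0] * G
    for i in range(start, min(start + w, G)):
        coeffs[i] = rng.randrange(1, 256)
    return coeffs


rng = random.Random(0)
G, w_fin = 240, 16  # w_fin = 16 is a hypothetical window length
print(coding_density(window_coefficients(G, 100, w_fin, rng)))  # 16/240, about 0.067
print(coding_density(window_coefficients(G, 0, G, rng)))        # 1.0: fully dense
```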

e: Summary of Throughput Results
When excluding CRSB due to its poor packet loss and delay characteristics, see Figures 7 and 8, we observe from Figures 9(a) and 11(a) that Rec-woDec and SparsePR achieve the highest recoding throughput levels. Also, Rec-woDec achieves the highest decoding throughput levels, see Figures 9(b) and 11(b). As we observe from Figures 9 and 11, the recoding throughput is generally lower than the decoding throughput. That is, for the same computational capabilities in the intermediate nodes and the destination, the recoding in the intermediate nodes is the bottleneck. The higher computational complexity of recoding compared to decoding is mainly due to the effort for creating recoded packets to fill the idle slots that arise from packet erasures on the links, including the generation of the pseudo-random numbers for the recoding.
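To make the recoding cost concrete, the Python sketch below forms one recoded packet as a GF(2^8) linear combination of buffered payloads and draws one pseudo-random coefficient per packet. The reduction polynomial 0x11D, the 1024-byte payload size, and the helper names are assumptions for illustration. Optimized RLNC implementations typically replace this bitwise multiplication with table-based or vectorized arithmetic. Only the payload is shown; in RLNC recoding, the coefficient vectors carried in the packet headers are combined in the same way. The counter shows that the work grows with the number of combined packets times the payload length, on top of the random-number draws.

```python
import random


def gf256_mul(a, b, poly=0x11D):
    """Multiply two bytes in GF(2^8) by shift-and-add; poly is an assumed modulus."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:  # reduce once the degree reaches 8
            a ^= poly
    return result


def combine(payloads, coeffs):
    """Return (sum_i coeffs[i] * payloads[i] over GF(2^8), number of multiplications)."""
    out = bytearray(len(payloads[0]))
    mults = 0
    for c, p in zip(coeffs, payloads):
        if c == 0:
            continue  # zero coefficients (sparse packets) cost nothing
        for j, byte in enumerate(p):
            out[j] ^= gf256_mul(c, byte)
            mults += 1
    return bytes(out), mults


def recode(payloads, rng):
    """Draw one non-zero pseudo-random coefficient per buffered packet and combine."""
    coeffs = [rng.randrange(1, 256) for _ in payloads]
    payload, mults = combine(payloads, coeffs)
    return coeffs, payload, mults


rng = random.Random(0)
window = [bytes([k + 1]) * 1024 for k in range(8)]  # 8 hypothetical 1024-byte payloads
coeffs, payload, mults = recode(window, rng)
print(len(coeffs), mults)  # 8 8192
```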
The SpaRec approaches with the highest recoding throughput levels, namely Rec-woDec and SparsePR, achieve substantially higher recoding throughput levels than the benchmark with the highest recoding throughput. For instance, for a generation size of G = 240, Rec-woDec achieves a recoding throughput of approximately 463 Mbit/s compared to about 270 Mbit/s with systematic RLNC in Figure 9(a). For four intermediate nodes in Figure 11(a), Rec-woDec recodes at approximately 450 Mbit/s compared to 218 Mbit/s with conventional recoding. Thus, Rec-woDec can double the recoding throughput compared to conventional recoding.
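The speed-up factors behind these statements follow directly from the quoted throughput values:

```latex
\frac{463~\text{Mbit/s}}{270~\text{Mbit/s}} \approx 1.71 \quad (G = 240,\ \text{Figure 9(a)}), \qquad
\frac{450~\text{Mbit/s}}{218~\text{Mbit/s}} \approx 2.06 \quad (\text{four intermediate nodes, Figure 11(a)}).
```

The doubling therefore refers to the four-intermediate-node comparison with conventional recoding.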

V. SUMMARY AND FUTURE WORK
Within the context of sparse systematic RLNC, we have proposed and evaluated a set of three distinct sparsity-preserving recoding (SpaRec) strategies: Dec-Rec for recoding with a decoder at each intermediate network node, Rec-woDec for recoding without a decoder and with sufficient buffers to hold all packets in the recoding window, as well as SparsePR for recoding without a decoder and with limited buffers. These three SpaRec strategies can operate with a finite recoding window size or with an infinite recoding window (which then extends to all the packets in a generation). Also, the SpaRec strategies can operate with a prescribed code rate or conduct adaptive (rateless) recoding without a prescribed code rate.
VOLUME 4, 2016
Our extensive discrete-event simulation-based evaluations indicate that the practical finite-length recoding window enhances the recoding and decoding (computation) throughput. Also, rateless recoding reduces the in-order packet delays. We have compared the SpaRec strategies with a finite-length recoding window and rateless recoding against several benchmarks, namely conventional recoding, systematic RLNC recoding, and conventional recoding with small buffers. The benchmark comparisons indicate that the SpaRec approaches substantially reduce the packet loss probability and the in-order packet delays (to nearly half of the benchmark delays). They also enhance the recoding throughput (which can be doubled) and the decoding throughput.
Among the SpaRec approaches, Rec-woDec and SparsePR achieve the lowest packet loss probabilities, nearly the lowest packet delays, and the highest recoding throughput levels in the intermediate nodes. We also find that Dec-Rec is highly competitive, achieving the lowest in-order packet delays. Interestingly, the decoding in the intermediate nodes in Dec-Rec only moderately reduces the recoding throughput while achieving nearly the same decoding throughput compared to Rec-woDec and SparsePR.
There are several important directions for future research on sparsity-preserving recoding (SpaRec) for sparse systematic RLNC. The present initial SpaRec study has focused on single-path multi-hop networks. Future research should extend the SpaRec strategies to multi-path multi-hop networks [83]- [85]. Multi-path networks pose several new challenges, such as different delays and erasure probabilities on the different paths, which need to be accounted for in the coding of the packet transmissions for the different paths. Generally, in order to avoid the complications of decoding in the intermediate nodes on the multiple paths, recoding strategies without a decoder may be a good starting point for researching recoding in multi-path settings.
Another important future research direction is to examine the energy consumption of the recoding algorithms. Commonly, hardware-based solutions substantially reduce the energy consumption compared to software-based solutions [86]- [88]. Therefore, it will be important to develop and evaluate efficient SpaRec hardware modules or accelerators for intermediate network nodes.