Sliding-window forward error correction based on reference order for real-time video streaming

In real-time video streaming, data packets are transported over the network from a transmitter to a receiver. The quality of the received video fluctuates as the network conditions change, and it can degrade substantially when there is considerable packet loss. Forward error correction (FEC) techniques can be used to recover lost packets by incorporating redundant data. Conventional FEC schemes do not work well when scalable video coding (SVC) is adopted. In this paper, we propose a novel FEC scheme that overcomes the drawbacks of these schemes by considering the reference picture structure of SVC and weighting the reference pictures more when FEC redundancy is applied. The experimental results show that the proposed FEC scheme outperforms conventional FEC schemes.


I. INTRODUCTION
Real-time video streaming has been widely used in online education, e-commerce, live broadcasting, and video conferencing [1]. Factors such as weak signal strength in wireless connections and congestion at network nodes can cause packet loss during streaming, which degrades the user's quality of experience (QoE). Several FEC schemes can be adopted to mitigate this problem [1]-[5].
In real-time video streaming, FEC schemes can be divided into the following common schemes: frame-level FEC, group of pictures (GOP)-level FEC, expanding-window FEC, and sliding-window FEC. One common approach is to perform Reed-Solomon (RS) coding at the frame level, where the RS coding block contains video packets from the same video frame (i.e., each coding window contains a single video frame and its corresponding FEC redundancies) [6]. Under this constraint, the RS decoder does not need to collect source packets of many frames to recover lost packets; therefore, there is no decoding delay. However, when the number of source video packets generated in each frame is small, FEC is inefficient. In addition, the recovered packets of the current frame cannot help to recover the lost packets of previous frames, and the distortion of the previous frames may propagate to the current and following frames. In GOP-level FEC, the FEC coding window contains all the video frames in a GOP to produce the corresponding FEC redundancies [7], [8]. By utilizing a large coding window, GOP-level FEC is capable of handling burst loss. However, GOP-level FEC brings additional decoding delay when all the video frames in the GOP must be collected for FEC decoding, which is unacceptable in real-time video streaming.
Frame-level FEC allocates repair packets for each frame of the GOP, whereas GOP-level FEC allocates repair packets once for the entire GOP, which introduces an additional decoding delay. Expanding-window and sliding-window FECs both eliminate the additional delay by allocating repair packets for each frame. The difference is that expanding-window FEC uses the current video frame and all previous frames to construct the coding window, so the window expands continuously until the entire GOP is covered [10], [11], whereas sliding-window FEC adopts a fixed-size window that contains only a portion of the data of the previous encoding window [13], [14]. In expanding-window FEC, the coding window size increases linearly within one GOP; for example, the f-th group of FEC packets is generated after all source packets from frame 1 to frame f are collected. In practical implementations, the computational cost and decoding delay become quite high when the GOP is large [12]. Thus, the application of expanding-window FEC in real-time video streaming is restricted. By contrast, sliding-window FEC bounds the window size, so it maintains good recovery performance while keeping delay and complexity low. Sliding-window FEC has a sliding coding window that typically consists of a set of temporally consecutive frames in the same GOP. In this paper, we call this kind of coding scheme "time-order sliding window FEC" or "continuous sliding window FEC".
In video coding, low-priority frames (i.e., P-frames or B-frames) are encoded based on the preceding high-priority frames (i.e., I-frames or P-frames) in a video sequence [9]. Therefore, if the reconstruction of a high-priority frame fails, the subsequent low-priority frame is regarded as a reconstruction failure, even if it is successfully recovered by the FEC decoder.
In general video coding, the FEC process is performed on a frame-by-frame basis. However, in the SVC case, the dependency between frames is not a simple one-dimensional frame-by-frame dependency in time order [15]. The frames are grouped into layers: the base layer and one or more enhancement layers. The frame in one layer may not reference the previous frame in time order, which is why time-order sliding-window FEC is not optimal in this case.
In this paper, we propose a sliding-window FEC scheme based on video coding dependencies, namely, "reference-order sliding window FEC" or "discrete sliding window FEC". The proposed scheme uses the reference order as a constraint of FEC coding such that there must be a dependency between the frames in the encoding window. The proposed scheme automatically realizes unequal error protection (UEP) and does not need to allocate different amounts of redundancy to frames of different priority, because the scheme itself automatically obtains a higher recovery probability for higher-priority frames. Compared with conventional frame-level FEC and time-order sliding-window FEC, the proposed FEC scheme performs better in terms of the playable frame rate (PFR) and peak signal-to-noise ratio (PSNR).
The remainder of this paper is organized as follows. Section II describes the proposed FEC scheme. Section III provides a theoretical analysis of the proposed FEC scheme in the SVC mode. In Section IV, we present our experimental results for PFR and PSNR. Finally, conclusions are given in Section V.

II. SLIDING-WINDOW FEC BASED ON REFERENCE ORDER
A. CODING WINDOW MANAGEMENT MECHANISM BASED ON TIME ORDER
The window management mechanism of time-order sliding window FEC is based on the frame timestamps [16], [17]; that is, the video frames in the FEC coding window are arranged according to the time when they are generated by the video encoder. In general video coding, the frame dependence is continuous (i.e., frame-by-frame reference), as shown in Fig. 1. Thus, it is feasible to use the time order as the rule for coding window management. As shown in Fig. 3, Window-1, the first FEC coding window, contains the first video frame, which is the intra frame I, and the second FEC coding window (Window-2) contains frame I and frame P1. The third coding window (Window-3) contains frames I, P1, and P2. Considering the general case, let T be the maximum number of frames in the coding window, let frame X be the X-th frame produced by the video encoder, and let the FEC coding window corresponding to frame X be Window-X, containing the preceding frames X − T to X.

B. THE PROBLEM WITH TIME-ORDER CODING WINDOW MANAGEMENT
SVC adopts a hierarchical structure, in which frames in a GOP are allocated to the base layer or enhancement layer. This approach causes the reference order between frames to change; that is, the previous frame may not necessarily be the reference frame for the current frame. For example, as shown in Fig. 2, frame P2 references frame I instead of frame P1.
If we apply the time-order mechanism in the SVC case, the coding window management will be the same as that in Fig. 3, whereas the coding dependency will follow Fig. 2. Consequently, this mechanism may reduce the recovery probability, particularly under high packet loss rate (PLR) conditions. Suppose that frames I, P1, and P2 correspond to 8, 4, and 4 packets, respectively (the frames are divided into packets for transmission based on the maximum transmission unit (MTU)). With RS coding, we use RS(16,8) to encode Window-1 to generate 8 repair packets. For frame P1, RS(16,12) is used on Window-2 to generate 4 repair packets; for frame P2, RS(20,16) is used on Window-3 to generate 4 repair packets. Suppose that the PLR is 50% and the decoder side is lucky in receiving 8 packets from frame I; then Window-1 (containing only frame I) can be successfully decoded. Suppose further that the decoder side receives 3 packets from frame P1; then 11 packets in total have been received for Window-2, which is fewer than the 12 needed for decoding. As a consequence, at least 5 packets from frame P2 are required to decode Window-3, whereas only 4 packets are required to decode frame P2 in frame-level RS coding. Thus, the recovery probability of the time-order sliding-window scheme is lower than that of frame-level FEC. Because frame I is decodable and frame P2 references frame I in the SVC case, a recovered frame P2 can be rendered (which is not possible in frame-by-frame reference patterns, where P2 depends on the lost P1); therefore, the PFR of the time-order scheme is also lower than that of frame-level FEC.
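The arithmetic of this example can be checked with a short sketch. It assumes the simplified counting used in the text: a window with k source packets is decodable once at least k packets belonging to it have arrived. The variable and function names are illustrative and do not come from the paper.

```python
# Source packets per frame, as in the example (I, P1, P2 -> 8, 4, 4).
k = {"I": 8, "P1": 4, "P2": 4}
# Packets assumed to arrive for frames I and P1 under 50% loss.
received = {"I": 8, "P1": 3}


def window_decodable(received_total, source_total):
    """Simplified RS erasure rule: at least as many packets as source packets."""
    return received_total >= source_total


# Window-1 holds only frame I: 8 received, 8 needed.
w1_ok = window_decodable(received["I"], k["I"])
# Window-2 holds I and P1: 11 received, 12 needed.
w2_ok = window_decodable(received["I"] + received["P1"], k["I"] + k["P1"])

# Time-order Window-3 holds I, P1 and P2: 16 packets needed in total,
# so frame P2 must contribute whatever is still missing.
need_time_order = (k["I"] + k["P1"] + k["P2"]) - (received["I"] + received["P1"])
# Frame-level FEC protects P2 alone: only its own 4 source packets count.
need_frame_level = k["P2"]

print(w1_ok, w2_ok, need_time_order, need_frame_level)
```

Under these assumptions, frame P2 needs one more received packet in the time-order window than under frame-level coding, which is the gap the text describes.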
Thus, the window management mechanism based on time order is no longer appropriate in the SVC mode. The proposed coding window management mechanism based on reference order adapts to this scenario by considering the reference picture structure and works well for both SVC and non-SVC encoders.

C. CODING WINDOW MANAGEMENT MECHANISM BASED ON REFERENCE ORDER
Under the coding window management mechanism based on reference order, the frames in the coding window are arranged in accordance with the reference structure. When a new frame is generated and passed to the FEC encoder, the FEC encoder caches the frame and obtains its reference frame. During the FEC coding process, a backward search is performed to find the reference frames of the current frame, and all reference frames are added to the coding window. The backward search stops when the encoding window exceeds the maximum window size or when an intra frame is found. As shown in Fig. 4, Window-1 contains the first frame (i.e., frame I), and Window-2 contains frames I and P1. Because the reference frame of frame P2 is frame I rather than frame P1, Window-3 contains only frames I and P2, whereas under the coding window management mechanism based on time order, Window-3 contains frames I, P1, and P2. Compared with the coding window management mechanism based on time-order sliding window FEC, the new method guarantees that all frames in the FEC coding window have a coding dependency; therefore, whenever an FEC coding block is decodable, the recovered frames are ensured to be decodable by the video decoders in the subsequent video pipeline and can then be successfully rendered. For non-SVC encoders, reference-order coding window management works similarly to the time-order scheme.
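The two window-management rules can be compared with a minimal sketch. Several things here are assumptions rather than the authors' implementation: the function names, the reference map (P1 and P2 reference I, P3 and P4 reference P2, a guess at the 2-layer temporal structure of Fig. 2), the reading of the time-order window as the last T frames ending at the current one, and capping the reference-order window by keeping the T − 1 most recent references, which follows the selection rule the paper gives for more complex structures.

```python
def time_order_window(frame, order, max_size):
    """Last `max_size` frames in generation order, ending at `frame`."""
    i = order.index(frame)
    return order[max(0, i - max_size + 1): i + 1]


def reference_order_window(frame, refs, order, max_size):
    """`frame` plus its most recent direct and indirect reference frames."""
    seen = {frame}
    stack = [frame]
    # Backward search: follow references until intra frames (no references).
    while stack:
        current = stack.pop()
        for ref in refs.get(current, []):
            if ref not in seen:
                seen.add(ref)
                stack.append(ref)
    # Sort the collected references by generation order.
    ancestors = sorted(seen - {frame}, key=order.index)
    # Keep at most max_size - 1 references, then append the current frame.
    keep = ancestors[-(max_size - 1):] if max_size > 1 else []
    return keep + [frame]


# Hypothetical 2-layer temporal structure: base layer I, P2, P4.
order = ["I", "P1", "P2", "P3", "P4"]
refs = {"I": [], "P1": ["I"], "P2": ["I"], "P3": ["P2"], "P4": ["P2"]}

print(time_order_window("P2", order, 3))             # ['I', 'P1', 'P2']
print(reference_order_window("P2", refs, order, 3))  # ['I', 'P2']
```

With this reference map, frame P1 drops out of Window-3 under the reference-order rule, so every frame in the window lies on the decoding path of P2.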
The proposed reference-order-based window management method implements an efficient UEP because the higher the priority of a video frame is, the more frames reference it (either directly or indirectly) and the more times it is included in FEC coding windows; thus, its recovery probability is also higher. In the example of coding window management illustrated in Fig. 4, frame I has the highest priority, followed by frame P2. Frames P1, P3 and P4 have the lowest priority because these three frames are not referenced by any other frame.
VOLUME 4, 2016
Fig. 2 shows an example unidirectional prediction structure with 2-layer temporal scalability encoding. This section explores the coding window management based on reference order for more complex SVC structures, e.g., bidirectional prediction structures and prediction structures with spatial layers.
The coding window management mechanism for complex SVC structures follows the same principle; that is, the FEC coding window should contain only the reference frames of the current frame. When FEC coding is performed, the FEC encoder searches backward to find the reference frames of the current frame (namely, the direct reference frames, of which there may be one or more) and the reference frames of the direct reference frames (namely, the indirect reference frames). Then, all reference frames (direct and indirect) are arranged in accordance with their generation timestamps. For the case of spatial SVC, frames with the same timestamp are arranged in accordance with their spatial resolution. Suppose that the maximum coding window size is $T$. Then, to perform FEC coding, the FEC encoder selects the last $T - 1$ frames from the sorted queue to construct the encoding window with the current frame.
Fig. 5 shows an example of a bidirectional prediction structure with 1-layer spatial and 2-layer temporal scalability encoding. Following the above principles, when $T$ is set to 4, the coding window for frame P5 is composed of frames P2, P3, P4 and P5, and the coding window for frame P6 is composed of frames I, P2, P4 and P6.
Fig. 6 shows an example prediction structure with 2-layer temporal and 2-layer spatial scalability encoding. Following the same principles, when $T$ is set to 4, the coding window for frame $P_{21}$ consists of frames $I_{00}$, $I_{01}$, $P_{20}$, and $P_{21}$, and the coding window for frame $P_{31}$ consists of frames $P_{20}$, $P_{21}$, $P_{30}$, and $P_{31}$. When $T$ is set to 6, the coding window for frame $P_{31}$ is composed of frames $I_{00}$, $I_{01}$, $P_{20}$, $P_{21}$, $P_{30}$, and

III. THEORETICAL ANALYSIS
A. RECOVERY PROBABILITY CALCULATION
In this section, we use the systematic RS erasure code as an example to study the decoding probability of the aforementioned FEC schemes. For simplicity, an independent Bernoulli process is used to model the packet loss. Suppose that there are $k$ source packets in the RS coding window and that these $k$ source packets are encoded via RS coding to generate $h$ redundant packets ($k + h = n$). When $n$ packets are sent over the transmission channel, the probability that exactly $b$ packets will be received is as follows:
$$P(b) = \binom{n}{b} (1 - p)^{b} \, p^{\,n - b},$$
where $p$ denotes the packet loss probability. As long as the receiver receives at least $k$ of the $n$ packets, the lost source packets can be recovered by the RS decoder; thus, the probability of the FEC coding window being successfully reconstructed is as follows:
$$Q = \sum_{b = k}^{n} \binom{n}{b} (1 - p)^{b} \, p^{\,n - b}.$$
To understand the data recovery processes of frame-level FEC, sliding-window FEC, and the proposed sliding-window FEC scheme based on reference order, this section explores the recovery probability for a group of frames consisting of three given frames in the SVC mode (e.g., the first three frames in Fig. 2). Let these three frames be frames I, P1, and P2.
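The two probabilities described above (receiving exactly b of n packets, and receiving at least k of them) follow the binomial distribution under the independent-loss model and can be evaluated with a short sketch. The function names are illustrative, not taken from the paper.

```python
from math import comb


def prob_received(n, b, p):
    """Probability that exactly b of n packets arrive when each is lost with probability p."""
    return comb(n, b) * (1 - p) ** b * p ** (n - b)


def recovery_probability(k, h, p):
    """Probability that at least k of the n = k + h packets of one RS window arrive."""
    n = k + h
    return sum(prob_received(n, b, p) for b in range(k, n + 1))


# Window-1 of the earlier example: 8 source and 8 repair packets at 50% loss.
print(recovery_probability(8, 8, 0.5))
```

This covers a single, isolated RS block only; the sliding-window cases, where windows share packets, need the joint conditions that the paper derives for Window-2 and Window-3.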
We use $k_1$, $k_2$, and $k_3$ to denote the numbers of original packets from frames I, P1 and P2, respectively, and $h_1$, $h_2$, and $h_3$ to denote the corresponding numbers of redundant packets for each frame. Let $X$, $Y$, and $Z$ similarly represent the numbers of packets received from each frame. Thus, we have $0 \le X \le k_1 + h_1$, $0 \le Y \le k_2 + h_2$, and $0 \le Z \le k_3 + h_3$. The recovery probability for Window-$i$ is denoted by $Q_i$. If Window-1 contains only frame I, then the recovery probability for Window-1 is given by the following equation:
$$Q_1 = P(X \ge k_1) = \sum_{x = k_1}^{k_1 + h_1} \binom{k_1 + h_1}{x} (1 - p)^{x} \, p^{\,k_1 + h_1 - x}.$$
For Window-2, the FEC encoding window contains two frames: frame I and frame P1. To successfully recover Window-2, the following two conditions must be met:
Because $X$, $Y$, and $Z$ are independent, the recovery probability for Window-2 can be further expressed as follows:
Under the sliding-window FEC scheme based on time order, the frames contained in Window-3 are frames I, P1 and P2, and the recovery probability can be expressed as follows:
where $a = k_2 + k_3$, $b = k_1 + k_2 + k_3$, $c = a - m$, and $d = b - m - j$.
In contrast to the conventional sliding-window FEC scheme, the proposed FEC scheme takes the reference order as the criterion for moving and expanding the encoding window. Thus, frame P1 is excluded from Window-3, and the frames contained in Window-3 are frames I and P2. Therefore, the recovery probability for Window-3 is given by the following equation:
Given the recovery probabilities for all three windows, the PFR can be calculated. Notably, when Window-1 can be reconstructed, the number of playable frames should be one; when Window-2 or Window-3 can be reconstructed, the number of playable frames should be two; and when Window-2 and Window-3 can both be reconstructed, the number of playable frames should be three instead of four because frame I can be rendered only once, although it is recovered by both windows. Therefore, the expected PFR under the proposed FEC scheme is given by Equation 8.
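The counting rule for playable frames can be illustrated with a small sketch. It assumes the three-frame structure in which P1 and P2 both reference I, and the proposed scheme's windows {I}, {I, P1} and {I, P2}; a frame counts as playable only if it is recovered and all of its reference frames are playable. The names below are hypothetical, and the sketch only counts frames for a given set of decoded windows; it does not reproduce the expected-PFR expressions.

```python
# Assumed reference structure for the first three frames in the SVC mode.
REFS = {"I": [], "P1": ["I"], "P2": ["I"]}
# Frames covered by each coding window under the proposed scheme.
WINDOWS = {1: {"I"}, 2: {"I", "P1"}, 3: {"I", "P2"}}


def playable_frames(recovered, refs):
    """Frames that are recovered and whose whole reference chain is recovered."""
    playable = set()

    def is_playable(frame):
        if frame in playable:
            return True
        if frame not in recovered:
            return False
        if all(is_playable(ref) for ref in refs.get(frame, [])):
            playable.add(frame)
            return True
        return False

    for frame in recovered:
        is_playable(frame)
    return playable


def count_playable(decoded_windows):
    """Number of distinct playable frames given the set of decoded windows."""
    recovered = set()
    for w in decoded_windows:
        recovered |= WINDOWS[w]
    return len(playable_frames(recovered, REFS))


for decoded in ({1}, {2}, {3}, {2, 3}):
    print(sorted(decoded), count_playable(decoded))
```

Taking the union of recovered frames before counting is what prevents frame I from being counted twice when both Window-2 and Window-3 decode.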
Similarly, the expressions for the expected PFRs of the frame-level FEC scheme and the time-order sliding-window FEC scheme can be deduced, as shown in the following equations:
Based on the above-derived expected PFR expressions, we draw curves to compare the expected PFRs of the different FEC schemes under different PLRs in the SVC mode, as shown in Fig. 7. The proposed FEC scheme outperforms both the time-order sliding-window FEC scheme and the frame-level FEC scheme.

B. DELAY ANALYSIS COMBINED WITH ARQ TECHNOLOGY
In practical real-time video streaming applications, FEC and automatic repeat request (ARQ) technologies are often used together to minimize the impact of network packet loss.
In frame-level FEC, when the number of packets lost in a video frame exceeds the number of corresponding redundant packets, the receiver can only wait for retransmission to obtain the lost source packets. The retransmission delay is related to the round-trip time (RTT) and the PLR. In high-RTT networks, the retransmission delay can be too large for real-time applications to wait for the retransmitted packets. In the proposed FEC scheme, the redundant packets of subsequent frames can help recover packets lost from previous frames. As long as the recovered packet becomes available earlier than the retransmitted packet arrives, the receiver can use that packet and cancel the repeat request, which helps reduce retransmission delay and bandwidth consumption.

IV. PERFORMANCE EVALUATION
The video sequences used in this experiment are commonly used video test sequences in the 4:2:0 YUV format (e.g., "Foreman", "Flower", and "Silent"). The video codec used is OpenH264. Each GOP contains 15 frames; the frame sequence in each GOP is "$I P_e P_b P_e P_b P_e P_b P_e P_b P_e P_b P_e P_b P_e P_b$", where $P_e$ is a P frame in the enhancement layer and $P_b$ is a P frame in the base layer. The spatial resolution of the test sequences is 352 × 288. At the encoder, we set the frame rate to 30 fps. The target bitrate was set to 500000 bps. The OpenH264 encoder's usage type was "CAMERA_VIDEO_REAL_TIME". To observe the error correction performance of the different FEC methods in the SVC mode, we set the temporal layer number to 2. The remaining settings of the OpenH264 encoder followed the default configuration. For the decoder, the default configuration of OpenH264 was adopted.
The PFRs and PSNRs under different PLRs were experimentally analyzed using (1) the frame-level FEC scheme [7], (2) the conventional sliding-window FEC scheme based on time order (referred to simply as the sliding-window scheme for brevity) [18], and (3) the sliding-window FEC scheme based on reference order (the proposed scheme).
Thus, the proposed FEC scheme outperforms both the time-order sliding-window FEC scheme [18] and the frame-level FEC scheme [7], particularly at high PLRs.

B. PSNR
The PSNR is an indicator for evaluating video quality that represents the ratio of the maximum possible signal power to the destructive noise power. The higher the PSNR is, the higher the video quality [20]. In other words, the PSNR reflects the extent to which video communication is affected by packet loss. Fig. 9 shows the PSNRs of the different FEC schemes under various PLRs in SVC mode. When the PLR is less than or equal to 4%, the three FEC schemes exhibit similar performance because it is not easy for sudden and continuous packet loss to occur under the condition of a low PLR. When the PLR is 6%, 8%, 10%, 12%, 14%, 16%, 18%, and 20%, the PSNR is respectively 35.22 dB, 33.21 dB, 32. 17 [18]. When the proposed FEC scheme is used, the PSNR is further increased to 36.58 dB, 34.90 dB, 33.45 dB, 32.41 dB, 31.99 dB, 31.37 dB, 30.12 dB, and 29.49 dB, respectively.
The above experimental data illustrate that the proposed method shows good performance in maintaining video quality under a high PLR in SVC mode.
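For reference, the PSNR between a reference frame and a decoded frame is commonly computed from the mean squared error (MSE) as 10·log10(MAX² / MSE). The sketch below assumes 8-bit samples (MAX = 255) given as flat lists of equal length; the function name and the per-frame treatment are illustrative and are not taken from the paper's measurement setup.

```python
import math


def psnr(reference, distorted, max_value=255):
    """PSNR in dB between two equally sized sequences of sample values."""
    if len(reference) != len(distorted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error between the two sample sequences.
    mse = sum((a - b) ** 2 for a, b in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames: no distortion
    return 10 * math.log10(max_value ** 2 / mse)


# Four samples, each off by one level: MSE = 1.
print(round(psnr([100, 100, 100, 100], [101, 99, 101, 99]), 2))
```

A frame that cannot be decoded is usually replaced by a concealed or repeated frame before this comparison, which is why lost frames pull the average PSNR down.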

V. CONCLUSION
In this paper, we have studied sliding-window FEC in real-time video streaming. A sliding-window FEC scheme based on reference order is proposed to improve the error correction performance for SVC encoders. The proposed scheme considers the frame dependency characteristics of the video encoder (source coding) in coding window management for sliding-window FEC (channel coding) and achieves an optimal UEP solution. Experimental results show that compared with frame-level FEC and time-order sliding-window FEC, the proposed FEC scheme achieves notable improvements in terms of the PFR for SVC encoders.