Abstract

Broadcast of Scalable Video Over Wireless Networks

Scalable video coding has long been sought as a means of transmitting video reliably over heterogeneous networks. In this paper, we present a new system for broadcasting scalable video over wireless networks. For each group of pictures (GOP), a number of quality layers are produced using scalable video coding. We design separate channel protection schemes, based on forward error correction codes, for the base layer and for the enhancement layers. Given the clients' distribution, we propose a novel algorithm that determines both the source coding bit-rate and the channel coding bit-rate of each layer so as to maximize a system-defined utility function. Experimental results demonstrate the superiority of the proposed scheme over other schemes, with an improvement of up to 2 dB.

SECTION I

INTRODUCTION

As the bandwidth of mobile networks increases, visual communication over wireless channels has become popular and has received much attention. Wireless broadcasting enables mobile users with different platforms to access multimedia information simultaneously. A distinctive feature of a wireless broadcast system is that the receivers are highly heterogeneous in terms of their bandwidths and processing capabilities. It is therefore desirable to use multirate transmission, in which receivers receive video streams at different rates depending on their bandwidths. Scalable video coding (SVC) [1] has been shown to be a very attractive solution to this problem. It encodes raw video data into a number of layers of different priorities. The layer with the highest priority, called the base layer, contains the most important data. The enhancement layers, with lower priorities, may be encoded progressively to further refine the quality of the base layer stream.

Layered transmission has been studied by many researchers [2], [3], [4], [5], [6]. McCanne et al. first proposed the receiver-driven approach for layered video multicast [2]. This approach uses a layered video encoder to generate multiple layers from a single video sequence and transmits each layer over a separate multicast group. The number of layers and their bandwidths are pre-determined. Adaptation is performed at the receiver's end, where a receiver periodically tries to subscribe to more groups. Hsu and Hefeeda introduced a sender-driven mechanism [3], in which the sender uses feedback information to adjust the coding parameters dynamically, improving both the network utilization and the quality of the video obtained by the end users.

Another major challenge in video communication is channel errors. Forward error correction (FEC) techniques are widely used to reduce the effects of errors on the decoded video quality [4], [5], [6]. Schierl et al. presented an approach for wireless video broadcasting using SVC with an unequal erasure protection scheme [6]. Different amounts of FEC code are allocated to different layers according to their priority to achieve graceful degradation. However, this approach pre-determines the source coding bit-rate and the channel coding bit-rate without considering the receivers' statistics.

In this paper, we address the problem of wireless broadcasting of scalable video. To cope with the heterogeneity of the receivers, SVC is applied to encode the raw video data into multiple quality layers. An FEC scheme is adopted to generate protection codes. The base layer is highly protected to guarantee that it is correctly received by all clients with a very low error rate. For the enhancement layers, unequal error protection (UEP) is employed so that a lower layer always receives a higher protection priority. Given the receivers' statistics, we design an algorithm that determines the source bit-rate and the channel bit-rate of each layer so as to maximize a system-defined utility function. We implement the algorithm to verify its advantage, and we show how various allocation structures affect the overall utility.

The rest of the paper is organized as follows. Section II gives an overview of the system. Section III formulates the system-level optimization problem and describes the proposed algorithm. Section IV presents and analyzes the experimental results. Section V concludes the paper.

SECTION II

SYSTEM OVERVIEW

The basic framework of the system is depicted in Fig. 1. The most important component of the system is the video server. Given the input video and the clients' statistics, the server dynamically encodes the raw video into multiple layers using scalable video coding and then allocates channel protection codes to the layers so as to maximize the system-wide utility over all clients. The layered streams, together with the protection codes, are packetized for transmission over an error-prone network to the clients. Each receiver subscribes to all or part of the packets, depending on its bandwidth limitation. The receivers also send sparse feedback on their network conditions back to the video server.

Fig. 1. Basic framework of the system.

In this paper, we use SVC for source coding to produce the scalable bit-stream. SVC [1] is the extension of H.264/AVC that provides a wide range of spatio-temporal and quality scalability. In this work, quality scalability is used for source coding because it makes the bit-rate of each layer convenient to adjust. To achieve coarse-grain scalability (CGS), an inter-layer prediction mechanism is employed. The texture of a picture is encoded using a larger quantization parameter to produce a quality base layer, which provides a minimum quality for the decoded video. A refinement of the texture information is typically achieved by re-quantizing the residual signal with a smaller quantization step size.

Channel errors are a major challenge in video communication. Scalable video coding makes it easy to split the bit-stream into multiple layers, so that these sub-streams can be transmitted with different protection classes. FEC schemes based on Reed-Solomon (R-S) codes are often used for channel protection. An (n, k) code provides symbol erasure protection, where n is the length of the code-word in symbols and k is the number of information symbols per code-word. In this paper, each symbol is one byte. Once any k of the n symbols are received, all the information symbols can be correctly decoded; that is, up to n − k erased symbols per code-word can be tolerated.
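The erasure property described above can be illustrated with the simplest possible erasure code. The sketch below is not Reed-Solomon: it uses a single XOR parity packet, i.e. a (k + 1, k) code that tolerates exactly one erasure, chosen only because it fits in a few lines. The function names are illustrative and do not come from the paper.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length packets."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(source_packets):
    """Append one parity packet, giving an (n, k) = (k + 1, k) code."""
    parity = reduce(xor_bytes, source_packets)
    return list(source_packets) + [parity]


def recover(received):
    """Return the k source packets; `received` holds n entries, None = erased."""
    missing = [i for i, pkt in enumerate(received) if pkt is None]
    if len(missing) > 1:
        raise ValueError("one parity packet can repair only one erasure")
    repaired = list(received)
    if missing:
        # The XOR of all n packets is zero, so the erased packet equals
        # the XOR of the n - 1 packets that did arrive.
        present = [pkt for pkt in received if pkt is not None]
        repaired[missing[0]] = reduce(xor_bytes, present)
    return repaired[:-1]


source = [b"ab", b"cd", b"ef"]
coded = encode(source)
coded[1] = None                      # erase one packet in transit
assert recover(coded) == source
```

An R-S (n, k) code generalizes this behavior from one erasure to any n − k erasures, which is what allows the protection level to be tuned per layer.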

SECTION III

PROPOSED SCHEME

A. Bit-Rate Allocation for the Base Layer

The base layer provides a minimum video quality and should be received by every client with a very low loss rate. Thus, the bit-rate allocated to the base layer is determined as

$$r_0 = \min_{1 \leq i \leq C}b(i)\eqno{\hbox{(1)}}$$

with $C$ being the total number of clients and $b(i)$ being the available bandwidth of the $i$th client.

The channel coding scheme for the base layer is illustrated in Fig. 2(a). Each rectangle denotes a packet. A block consists of $N_0$ packets, of which $k_0$ are source packets. A packet-level FEC is applied to the source packets to generate $N_0 - k_0$ parity packets. The video source bit-rate $r_{0\_S}$ and the channel coding bit-rate $r_{0\_C}$ are computed as

$$\eqalignno{r_{0\_S} &= r_0\left({k_0\over N_0}\right)&\hbox{(2)}\cr r_{0\_C} &= r_0\left({N_0 - k_0 \over N_0}\right)&\hbox{(3)}}$$

Given each client's end-to-end bit-rate and packet loss rate, the server must choose the source coding bit-rate and the channel coding bit-rate so that every client receives the base layer with a residual loss rate of no more than $\varepsilon$, where the residual loss rate is the loss rate that remains after error correction.

Fig. 2. Proposed channel protection scheme: (a) Channel coding for the base layer and (b) Channel coding for the enhancement layers.

We use $P_b(i)$ to denote the probability that the base layer is lost for client $i$. This probability depends on the packet loss rate of the wireless packet-erasure channel. In this paper, we use a two-state Markov model to approximate the packet loss behavior of the wireless channel [7]. From the Markov model we obtain $p(m,N)$, the probability of losing $m$ packets out of $N$ packets. As long as the number of lost packets does not exceed the number of protection packets, the original data can be reconstructed. Therefore, $P_b(i)$ is formulated as

$$P_b(i) = \sum^{N_0}_{m=N_0-k_0+1}p_i(m,N_0)\eqno{\hbox{(4)}}$$

where $p_i(m,N_0)$ is the probability for client $i$ to lose $m$ packets out of $N_0$ packets. $N_0$ is calculated as

$$N_0 = \left\lfloor r_0\cdot \left.\left({T\over F}\right)\right/8 M\right\rfloor\eqno{\hbox{(5)}}$$

with $T$ being the number of frames in a GOP, $F$ being the frame rate of the video, and $M$ being the packet size in bytes. Let $P_{b\_\max} = \max_{1 \leq i \leq C} P_b(i)$ be the maximum probability, over all clients, that the base layer is lost. The problem can now be formulated as

$$k_0^* = \max k_0, {\rm sub\ to}\ k_0 \leq N_0, P_{b\_\max}\leq \varepsilon.\eqno{\hbox{(6)}}$$

We search for the largest $k_0$ in order to maximize the amount of source data and thus achieve better video quality. Once $k_0^*$ is computed, the source bit-rate and the channel bit-rate of the base layer can be calculated using (2) and (3).
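The search in (6) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the two-state model is taken in its simplified Gilbert form (every packet is delivered in the Good state and lost in the Bad state, with the chain started in its stationary distribution), bit-rates are in bits per second, and the names `loss_count_distribution`, `allocate_base_layer`, `p_gb` and `p_bg` are hypothetical.

```python
from math import floor


def loss_count_distribution(n, p_gb, p_bg):
    """Return [p(0, n), ..., p(n, n)] for a simplified Gilbert channel.

    Assumption: a packet is delivered in the Good state and lost in the Bad
    state; p_gb and p_bg are the Good->Bad and Bad->Good transition
    probabilities, and the chain starts in its stationary distribution.
    """
    if n == 0:
        return [1.0]
    pi_bad = p_gb / (p_gb + p_bg)
    # good[m] / bad[m]: probability of m losses so far, ending in that state.
    good = [0.0] * (n + 1)
    bad = [0.0] * (n + 1)
    good[0], bad[1] = 1.0 - pi_bad, pi_bad
    for _ in range(n - 1):
        new_good = [g * (1.0 - p_gb) + b * p_bg for g, b in zip(good, bad)]
        new_bad = [0.0] + [g * p_gb + b * (1.0 - p_bg)
                           for g, b in zip(good[:-1], bad[:-1])]
        good, bad = new_good, new_bad
    return [g + b for g, b in zip(good, bad)]


def allocate_base_layer(r0, gop_frames, frame_rate, packet_bytes, clients, eps):
    """Largest k0 meeting Eq. (6); `clients` is a list of (p_gb, p_bg) pairs."""
    n0 = floor(r0 * (gop_frames / frame_rate) / (8 * packet_bytes))  # Eq. (5)
    dists = [loss_count_distribution(n0, p_gb, p_bg) for p_gb, p_bg in clients]
    for k0 in range(n0, 0, -1):
        # Eq. (4): the layer is lost when more than N0 - k0 packets are lost.
        p_b_max = max(sum(d[n0 - k0 + 1:]) for d in dists)
        if p_b_max <= eps:
            return n0, k0, r0 * k0 / n0, r0 * (n0 - k0) / n0  # Eqs. (2), (3)
    return n0, 0, 0.0, float(r0)  # no feasible k0 under this model


clients = [(0.01, 0.30), (0.02, 0.40)]  # illustrative channel parameters only
n0, k0, r_src, r_chan = allocate_base_layer(64_000, 16, 15, 128, clients, 0.008)
print(n0, k0, round(r_src), round(r_chan))
```

Scanning $k_0$ downward from $N_0$ returns the first, and therefore largest, value that satisfies the loss constraint for the worst client.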

B. Bit-Rate Allocation for the Enhancement Layers

All clients can access the base layer, and clients with higher bandwidths may subscribe to more layers to improve the perceived video quality. Once a layer is lost, all higher layers are useless. Therefore, we apply UEP to the enhancement layers of a GOP, as illustrated in Fig. 2(b). There are $L$ enhancement layers in total, where $S_i$ denotes the source data of the $i$th layer ($i = 1,2,3,\ldots,L$). FEC codes are added as redundancy to protect the source data. More protection is allocated to lower layers and less to higher ones. All the data are split vertically into $N$ packets.

The length and the height of the source data in $S_i$ are denoted by $k(i)$ and $h(i)$, respectively. Let ${\bf R}_{\bf S} = [r_{S,1}\ r_{S,2}\ \cdots\ r_{S,L}]$ and ${\bf R}_{\bf C} = [r_{C,1}\ r_{C,2}\ \cdots\ r_{C,L}]$, where $r_{S,i}$ and $r_{C,i}$ are the source bit-rate and the channel bit-rate of the $i$th enhancement layer. They are calculated as

$$\eqalignno{r_{S,i} &= 8k(i)\cdot h(i)\left/\left({T\over F}\right)\right.&\hbox{(7)}\cr r_{C,i} &= 8(N-k(i))\cdot h(i)\left/\left({T\over F}\right)\right.&\hbox{(8)}}$$

Given the clients' statistics, the server has to determine ${\bf R}_{\bf S}$ and ${\bf R}_{\bf C}$ in order to maximize the system-wide utility function. The proposed scheme can work with any form of user-defined utility function. In this paper, the PSNR of the perceived video is used to measure the utility.

Given the server's total sending bit-rate $R$, the bit-rate available for the enhancement layers, $r_e$, is computed as

$$r_e = R - r_0\eqno{\hbox{(9)}}$$

The problem is formally stated as

$$\eqalignno{&U^*_{avg} = \max_{{\bf R}_{\bf S},{\bf R}_{\bf C}}U_{avg} = {1\over C}\sum^C_{i=1}u(i)&\hbox{(10a)}\cr&{\rm sub\ to}\ \sum^L_{i=1}(r_{S,i}+r_{C,i}) \leq r_e,\ k(1)\leq k(2)\le\cdots \leq k(L)&\hbox{(10b)}}$$

where $U_{avg}$ is the average utility and $u(i)$ is the utility received by client $i$. The first constraint keeps the total bit-rate of the enhancement layers within the bit-rate budget. The second requires the protection level to be non-increasing from the lowest layer to the highest: since every layer is spread over the same $N$ packets, a larger $k(i)$ leaves fewer parity packets for that layer.
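Equations (7) to (9) and the constraints in (10b) amount to a few lines of arithmetic. The sketch below is illustrative only; the function names and the sample numbers are assumptions, and bit-rates are taken in bits per second.

```python
def enhancement_rates(k, h, n, gop_frames, frame_rate):
    """Source and channel bit-rates (bits/s) of each layer, Eqs. (7)-(8).

    k[i]: number of source packets (columns) of enhancement layer i + 1
    h[i]: height of enhancement layer i + 1, in bytes per packet
    n:    total number of packets N in the block
    """
    gop_seconds = gop_frames / frame_rate
    r_s = [8 * ki * hi / gop_seconds for ki, hi in zip(k, h)]
    r_c = [8 * (n - ki) * hi / gop_seconds for ki, hi in zip(k, h)]
    return r_s, r_c


def is_feasible(k, h, n, gop_frames, frame_rate, total_rate, r0):
    """Check both constraints of (10b) against the budget r_e = R - r0."""
    r_s, r_c = enhancement_rates(k, h, n, gop_frames, frame_rate)
    within_budget = sum(r_s) + sum(r_c) <= total_rate - r0  # Eq. (9)
    ordered = all(a <= b for a, b in zip(k, k[1:]))          # k(1) <= ... <= k(L)
    return within_budget and ordered


# Toy numbers: two layers, N = 30 packets, GOP of 16 frames at 15 frames/s.
r_s, r_c = enhancement_rates([10, 20], [64, 64], 30, 16, 15)
print(r_s, r_c)
print(is_feasible([10, 20], [64, 64], 30, 16, 15, total_rate=100_000, r0=64_000))
```

Adding (7) and (8) gives $r_{S,i} + r_{C,i} = 8N\cdot h(i)/(T/F)$, so for a fixed $N$ the budget constraint depends only on the sum of the heights $h(i)$.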

To simplify the problem, we partition the clients into several groups based on their available bandwidths. Given $L$ enhancement layers, the $C$ clients are clustered into $L+1$ groups. Here, we assume that $C$ is much larger than $L$. We use $b_{\min}(i)$ to denote the lowest bit-rate among the clients in the $i$th group, where $i = 0,1,2,\ldots,L$; $b_{\min}(i)$ serves as the data rate of group $i$. We assume that a client in a group with a higher index can access more packets. As illustrated in Fig. 2(b), if a client can receive up to $k(i)$ packets, it can decode the $i$th enhancement layer and all lower layers. We prescribe the relationship between $b_{\min}(i)$ and $k(i)$ as

$$k(i) = \left\lfloor b_{\min} (i)\cdot \left.\left({T\over F}\right)\right/8 M\right\rfloor\eqno{\hbox{(11)}}$$

where $i = 1,2,\ldots,L$.
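The grouping step and (11) can be sketched as below. Section IV states that K-means is used for the classification, but the paper gives no further detail, so the one-dimensional Lloyd iteration, its deterministic initialization, the toy bandwidths and the function names here are all assumptions. Bit-rates are in bits per second.

```python
from math import floor


def kmeans_1d(values, n_groups, max_iter=100):
    """Plain Lloyd iterations on scalars; returns groups sorted by bandwidth."""
    pts = sorted(values)
    # Deterministic start: centroids spread evenly over the sorted data.
    centroids = [pts[(2 * g + 1) * len(pts) // (2 * n_groups)]
                 for g in range(n_groups)]
    groups = []
    for _ in range(max_iter):
        groups = [[] for _ in range(n_groups)]
        for v in pts:
            nearest = min(range(n_groups), key=lambda g: abs(v - centroids[g]))
            groups[nearest].append(v)
        updated = [sum(g) / len(g) if g else c
                   for g, c in zip(groups, centroids)]
        if updated == centroids:
            break
        centroids = updated
    return sorted((g for g in groups if g), key=min)


def packets_per_gop(b_min, gop_frames, frame_rate, packet_bytes):
    """Eq. (11): packets a client with bit-rate b_min (bits/s) gets per GOP."""
    return floor(b_min * (gop_frames / frame_rate) / (8 * packet_bytes))


bandwidths = [64e3, 70e3, 80e3, 1000e3, 1010e3, 2000e3, 2040e3]  # toy data
groups = kmeans_1d(bandwidths, n_groups=3)       # L = 2, hence L + 1 groups
b_min = [min(g) for g in groups]                 # b_min(0), ..., b_min(L)
k = [packets_per_gop(b, 16, 15, 128) for b in b_min[1:]]  # k(1), ..., k(L)
print(b_min, k)
```

Because each $k(i)$ is derived from the slowest client of its group, every client in group $i$ can receive at least $k(i)$ packets per GOP.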

We use $U'_{avg}$ to denote the average utility of all clients except those in the first group (group 0), which can access only the base layer. Client $(i,j)$ denotes the $j$th client in the $i$th group. Let $g(i)$ be the number of clients in the $i$th group. Then $U'_{avg}$ is formulated as

$$U'_{avg} = \left.\sum^L_{i=1}U_G(i)\right/\sum^L_{i=1}g(i)=\left.\sum^L_{i=1}\sum^{g(i)}_{j=1}u(i,j)\right/\sum^L_{i=1}g(i)\eqno{\hbox{(12)}}$$

with $U_G(i)$ being the overall utility of the $i$th group and $u(i,j)$ being the utility of client $(i,j)$, which is computed as

$$u(i,j) = \delta (0)\cdot P(i,j,0) + \left(\sum^i_{l=1}\delta (l)\cdot P(i,j,l)\right)\cdot P(i,j,0)\eqno{\hbox{(13)}}$$

where $\delta(0)$ and $\delta(l)$ are the utility contributions of the base layer and the $l$th enhancement layer, respectively. We assume that the rate-distortion function of the scalable video codec in each layer is known. $P(i,j,0)$ and $P(i,j,l)$ are the probabilities that client $(i,j)$ correctly receives the base layer and the $l$th enhancement layer, respectively. They are calculated as

$$\eqalignno{P(i,j,0) &= \sum^{N_0-k_0}_{m=0}p_{i,j}(m,N_0)&\hbox{(14)}\cr P(i,j,l) &= \sum^{N_{i,j}-k_l}_{m=0}p_{i,j}(m,N_{i,j})&\hbox{(15)}}$$

Both $p_{i,j}(m,N_0)$ and $p_{i,j}(m,N_{i,j})$ are obtained from the two-state Markov model. $N_{i,j}$ is the maximum number of packets that client $(i,j)$ can receive, computed as

$$N_{i,j} = \left\lfloor b(i,j) \cdot \left.\left({T\over F}\right)\right/8 M\right\rfloor\eqno{\hbox{(16)}}$$

with $b(i,j)$ being the available bandwidth of client $(i,j)$. The above derivation shows that changing $h(i)$ varies the source coding bit-rate of each layer and consequently affects the overall utility of the system. Therefore, the problem is re-formulated as

$$\eqalignno{&U'^*_{avg} = \max U'_{avg}&\hbox{(17a)}\cr&{\rm sub\ to}\ \sum^L_{i=1}h(i) \leq M,0\leq h(i)\leq M.&\hbox{(17b)}}$$
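The utility model of (12) to (15) can be sketched as follows. The sketch takes the loss-count distribution $p(m, N)$ as an input list rather than computing it from the Markov model, reads $k_l$ in (15) as the number of source packets $k(l)$ of layer $l$ (an interpretation), and uses made-up utility values; all function names are hypothetical.

```python
def prob_decodable(loss_dist, n_received, k_source):
    """Eqs. (14)-(15): probability of at most n_received - k_source losses.

    loss_dist[m] is p(m, n_received), supplied by the channel model.
    """
    if k_source > n_received:
        return 0.0
    return sum(loss_dist[: n_received - k_source + 1])


def client_utility(delta, p_base, p_layers):
    """Eq. (13): delta[0] is the base-layer term, delta[l] that of layer l.

    p_layers holds P(i, j, l) for l = 1..i, so its length sets the number of
    enhancement layers that count for this client.
    """
    enhancement = sum(d * p for d, p in zip(delta[1:], p_layers))
    return delta[0] * p_base + enhancement * p_base


def average_utility(utilities_by_group):
    """Eq. (12): mean utility over the clients of groups 1..L."""
    total = sum(sum(group) for group in utilities_by_group)
    count = sum(len(group) for group in utilities_by_group)
    return total / count


# Toy numbers: a group-2 client, base layer worth 30 dB, layers worth 3 and 2 dB.
u = client_utility([30.0, 3.0, 2.0], p_base=0.99, p_layers=[0.9, 0.8])
print(round(u, 3))
```

As written in (13), the enhancement-layer contributions are weighted by the base-layer probability $P(i,j,0)$, reflecting that no enhancement layer is useful without the base layer.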

To reduce the computational cost, we employ a dynamic programming algorithm [5] to solve the problem.
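The paper does not spell out the dynamic program, so the following is only a generic sketch of how a height allocation under (17b) could be solved. It rests on a simplifying assumption that is not made in the paper: the objective is treated as separable, with a precomputed table `gain[i][h]` (hypothetical) giving the utility obtained when enhancement layer $i+1$ is assigned height $h$.

```python
def allocate_heights(gain, budget):
    """Maximize sum_i gain[i][h_i] subject to sum_i h_i <= budget.

    gain[i][h] is the (assumed separable) utility of giving height h to
    enhancement layer i + 1, for h = 0..budget.
    """
    n_layers = len(gain)
    best = [0.0] * (budget + 1)   # best[b]: optimum with total height <= b
    choice = []                   # choice[i][b]: height of layer i at that optimum
    for i in range(n_layers):
        new_best = [0.0] * (budget + 1)
        picks = [0] * (budget + 1)
        for b in range(budget + 1):
            candidates = [(best[b - h] + gain[i][h], h) for h in range(b + 1)]
            new_best[b], picks[b] = max(candidates, key=lambda c: c[0])
        best = new_best
        choice.append(picks)
    # Backtrack from the full budget to read off the chosen heights.
    heights, b = [0] * n_layers, budget
    for i in reversed(range(n_layers)):
        heights[i] = choice[i][b]
        b -= heights[i]
    return best[budget], heights


# Toy table: two layers, total height budget of 3.
gain = [[0, 5, 6, 6.5],
        [0, 4, 7, 9]]
print(allocate_heights(gain, 3))
```

The table has $L(M+1)$ entries and each is filled in at most $M+1$ steps, so the cost grows as $O(LM^2)$ rather than exponentially in $L$ as an exhaustive search over all height vectors would.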

SECTION IV

EXPERIMENTAL RESULTS

We assume that there are 10000 clients in the system, with bandwidths ranging from 64 to 2048 kbps. The bandwidths follow a multimodal distribution: 50% of the clients have a mean bandwidth of 40 kbps with a standard deviation of 25 kbps; 35% follow a normal distribution with a mean of 1000 kbps and a standard deviation of 100 kbps; and 15% have a mean bandwidth of 2000 kbps with a standard deviation of 200 kbps. The packet loss rate of each client is randomly generated with a mean of 0.03 and a variance of 0.015. The clients are categorized into groups using the K-means classification method. The server's sending data rate is 3000 kbps. The Foreman video sequence in CIF format, encoded at 15 frames/s, is used for testing. The GOP size is fixed at 16. The video sequence is transmitted over a two-state Markov channel with a packet size of 128 bytes. Because of the random nature of such a channel, 50 different runs of the experiments are conducted.

The bandwidth available for the base layer is the minimum receiving bit-rate among all clients, which is 64 kbps in the experiment. It covers both the source coding bit-rate and the channel coding bit-rate. The desired residual loss rate $\varepsilon$ is set to 0.008. We vary the number of enhancement layers $L$ from 1 to 4. For each $L$, the proposed algorithm determines the coding structure that maximizes the average utility of the system. The source bit-rate of each layer and the average PSNR of the system are given in Table I. The overall utility increases with $L$ because a larger number of layers can adapt better to the heterogeneous bandwidth distribution.

Fig. 3 compares the proposed method with two other schemes. In all three schemes, the base layer is encoded using the same structure to guarantee that it is received by all clients with a very low error rate. In scheme 1, UEP is also applied to the enhancement layers, but the height of the source data is allocated equally among the layers. Scheme 2 is the traditional method, in which equal error protection with a fixed coding rate is applied to all layers. As the number of layers increases, the system utility of scheme 2 decreases slightly because of the growing overhead information across the layers. Fig. 3 shows that the proposed method clearly outperforms the other two schemes. With only one enhancement layer, all three schemes perform identically because they use the same coding structure. As the number of layers increases, the advantage of the proposed method over the other two grows, reaching an improvement of up to 2 dB.

Fig. 3. Comparison of the proposed method against the other two schemes.

TABLE I. Coding Structure of the Source Data and the Average PSNR of the Reconstructed Video

SECTION V

CONCLUSION

In this paper, we present a new system for wireless broadcasting of scalable video. To cope with the heterogeneity of the receivers, SVC is applied to encode the raw video data into multiple layers, comprising a quality base layer and several enhancement layers. An FEC scheme is adopted to generate error protection codes. We design separate channel protection schemes for the base layer and for the enhancement layers. Given the receivers' statistics, we propose a novel algorithm that determines the source bit-rate and the channel bit-rate of each layer so as to maximize the system-wide utility. We implement the algorithm to verify its performance and compare it with two other schemes to show how various coding structures affect the overall utility. The proposed method outperforms the other methods, with an improvement of up to 2 dB.

Footnotes

Yu Wang, Lap-Pui Chau, and Kim-Hui Yap are with the School of Electrical & Electronics Engineering, Nanyang Technological University, Singapore 639798.

References

1. H. Schwarz, D. Marpe, and T. Wiegand, "Overview of the scalable video coding extension of the H.264/AVC standard," IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 9, pp. 1103–1120, Sep. 2007.

2. S. McCanne, V. Jacobson, and M. Vetterli, Proc. ACM SIGCOMM, 1996, pp. 117–130.

3. C.-H. Hsu and M. Hefeeda, "Optimal coding of multilayer and multiversion video streams," IEEE Trans. Multimedia, vol. 10, no. 1, pp. 121–131, Jan. 2008.

4. W.-T. Tan and A. Zakhor, "Video multicast using layered FEC and scalable compression," IEEE Trans. Circuits Syst. Video Technol., vol. 11, no. 3, pp. 373–386, Mar. 2001.

5. T.-W. A. Lee, S.-H. G. Chan, Q. Zhang, W.-W. Zhu, and Y.-Q. Zhang, "Allocation of layer bandwidths and FECs for video multicast over wired and wireless networks," IEEE Trans. Circuits Syst. Video Technol., vol. 12, no. 12, Dec. 2002.

6. T. Schierl, H. Schwarz, D. Marpe, and T. Wiegand, "Wireless broadcasting using the scalable extension of H.264/AVC," Proc. IEEE Int. Conf. Multimedia and Expo, Jul. 2005, pp. 884–887.

7. E. O. Elliott, "A model of the switched telephone network for data communications," Bell Syst. Tech. J., pp. 89–109, Jan. 1965.

This paper appears in: International Symposium on Circuits and Systems
Issue Date: 2009
On page(s): 117 - 120
Print ISBN: 978-1-4244-3827-3
INSPEC Accession Number: 10760360
Digital Object Identifier: 10.1109/ISCAS.2009.5117699
Date of Current Version: 26 Jun, 2009