Abstract:
Adaptive Bit Rate (ABR) assignment plays a crucial role in ensuring satisfactory quality of experience (QoE) in video streaming applications. Recently, the authors of [1] proposed Pensieve, which uses asynchronous advantage actor-critic (A3C), an on-policy reinforcement learning (RL) method, to improve ABR algorithms. Pensieve has been shown to achieve a higher QoE than traditional ABR methods. However, Pensieve is sample-inefficient and fragile to different random seeds and hyperparameters. In this paper, we present soft actor-critic based deep reinforcement learning for adaptive bitrate streaming (SAC-ABR), an off-policy method, which improves QoE over existing state-of-the-art ABR algorithms under a wide variety of network conditions. Based on the maximum entropy RL framework, SAC-ABR maximizes entropy alongside the expected reward, achieving a better exploration-exploitation tradeoff than on-policy ABR methods. We present the overall design together with the training and testing results of SAC-ABR, and evaluate its performance against other state-of-the-art ABR algorithms. Our results show that SAC-ABR provides up to 27.42% higher average QoE than Pensieve, and a much higher QoE than traditional fixed-rule-based ABR algorithms.
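The maximum-entropy objective described in the abstract can be sketched numerically. The snippet below is a minimal illustration, not the paper's implementation: `alpha` (the entropy temperature) and `gamma` (the discount factor) are illustrative values, and the discrete distribution stands in for a policy over candidate bitrate levels.

```python
import math

def policy_entropy(probs):
    """Shannon entropy of a discrete action distribution
    (e.g. a policy over the available bitrate levels)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def entropy_regularized_return(rewards, action_dists, alpha=0.2, gamma=0.99):
    """Discounted return augmented with the policy-entropy bonus, following
    the maximum-entropy RL objective:
        J = E[ sum_t gamma^t * (r_t + alpha * H(pi(.|s_t))) ]."""
    return sum(
        (gamma ** t) * (r + alpha * policy_entropy(dist))
        for t, (r, dist) in enumerate(zip(rewards, action_dists))
    )

# A uniform policy over six bitrate levels earns the maximum entropy bonus
# (ln 6 per step); a deterministic policy earns none.
uniform = [1.0 / 6] * 6
deterministic = [1.0]
print(policy_entropy(uniform))        # ln 6, about 1.7918
print(policy_entropy(deterministic))  # 0.0
```

Because the entropy term adds reward for stochastic behavior, a high-entropy policy is encouraged to keep exploring bitrate choices early in training; as the policy sharpens, the bonus shrinks and exploitation of the learned QoE signal dominates.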
Date of Conference: 04-08 January 2022
Date Added to IEEE Xplore: 13 January 2022
References:
1.
H. Mao, R. Netravali, and M. Alizadeh, “Neural adaptive video streaming with Pensieve,” in Proceedings of the Conference of the ACM Special Interest Group on Data Communication, ser. SIGCOMM '17. New York, NY, USA: Association for Computing Machinery, 2017, pp. 197–210.
2.
Sandvine, “Global Internet Phenomena Report,” 2021. [Online]. Available: https://www.sandvine.com/phenomena
3.
A. Bentaleb, B. Taani, A. C. Begen, C. Timmerer, and R. Zimmermann, “A survey on bitrate adaptation schemes for streaming media over http,” IEEE Communications Surveys Tutorials, vol. 21, no. 1, pp. 562–585, 2019.
4.
“ISO/IEC 23009-1:2014: Dynamic adaptive streaming over HTTP (DASH) - Part 1: Media presentation description and segment formats,” May 2014.
5.
T. Haarnoja, A. Zhou, P. Abbeel, and S. Levine, “Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor,” 2018.
6.
R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. Cambridge, MA, USA: A Bradford Book, 2018.
7.
T.-Y. Huang, R. Johari, N. McKeown, M. Trunnell, and M. Watson, “A buffer-based approach to rate adaptation: Evidence from a large video streaming service,” in Proceedings of the 2014 ACM Conference on SIGCOMM, ser. SIGCOMM '14. New York, NY, USA: Association for Computing Machinery, 2014, pp. 187–198. [Online]. Available: https://doi.org/10.1145/2619239.2626296
8.
K. Spiteri, R. Urgaonkar, and R. K. Sitaraman, “Bola: Near-optimal bitrate adaptation for online videos,” in IEEE INFOCOM 2016 - The 35th Annual IEEE International Conference on Computer Communications, 2016, pp. 1–9.
9.
Y. Sun, X. Yin, J. Jiang, V. Sekar, F. Lin, N. Wang, T. Liu, and B. Sinopoli, “CS2P: Improving video bitrate selection and adaptation with data-driven throughput prediction,” in Proceedings of the 2016 ACM SIGCOMM Conference, ser. SIGCOMM '16. New York, NY, USA: Association for Computing Machinery, 2016, pp. 272–285. [Online]. Available: https://doi.org/10.1145/2934872.2934898
10.
X. Yin, A. Jindal, V. Sekar, and B. Sinopoli, “A control-theoretic approach for dynamic adaptive video streaming over HTTP,” in Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication, ser. SIGCOMM '15. New York, NY, USA: Association for Computing Machinery, 2015, pp. 325–338. [Online]. Available: https://doi.org/10.1145/2785956.2787486
11.
M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viegas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng, “TensorFlow: Large-scale machine learning on heterogeneous systems,” 2015, software available from tensorflow.org. [Online]. Available: https://www.tensorflow.org/
12.
G. Yi, “The ACM Multimedia 2019 live video streaming grand challenge,” October 21–25, 2019, Nice, France.
13.
Z. Akhtar, “Oboe: Auto-tuning video ABR algorithms to network conditions,” August 20–25, 2018, Budapest, Hungary.
14.
W. Li, “QTCP: Adaptive congestion control with reinforcement learning,” in GLOBECOM 2020 - 2020 IEEE Global Communications Conference.
15.
J. Jiang, V. Sekar, and H. Zhang, “Improving fairness, efficiency, and stability in HTTP-based adaptive video streaming with FESTIVE,” in Proceedings of the 8th International Conference on Emerging Networking Experiments and Technologies, ser. CoNEXT '12. New York, NY, USA: Association for Computing Machinery, 2012, pp. 97–108. [Online]. Available: https://doi.org/10.1145/2413176.2413189
16.
K. Spiteri, R. Sitaraman, and D. Sparacio, “From theory to practice: Improving bitrate adaptation in the dash reference player,” ACM Trans. Multimedia Comput. Commun. Appl., vol. 15, no. 2s, Jul. 2019. [Online]. Available: https://doi.org/10.1145/3336497
17.
V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. P. Lillicrap, T. Harley, D. Silver, and K. Kavukcuoglu, “Asynchronous methods for deep reinforcement learning,” 2016.
18.
P. Saxena, M. Naresh, M. Gupta, A. Achanta, S. Kota, and S. Gupta, “NANCY: Neural adaptive network coding methodology for video distribution over wireless networks,” in GLOBECOM 2020 - 2020 IEEE Global Communications Conference, 2020, pp. 1–6.
19.
H. Jin, Q. Wang, S. Li, and J. Chen, “Joint QoS control and bitrate selection for video streaming based on multi-agent reinforcement learning,” in 2020 IEEE 16th International Conference on Control Automation (ICCA), 2020, pp. 1360–1365.
20.
V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller, “Playing Atari with deep reinforcement learning,” 2013.
21.
X. Jiang and Y. Ji, “HD3: Distributed dueling DQN with discrete-continuous hybrid action spaces for live video streaming,” in Proceedings of the 27th ACM International Conference on Multimedia, ser. MM '19. New York, NY, USA: Association for Computing Machinery, 2019, pp. 2632–2636. [Online]. Available: https://doi.org/10.1145/3343031.3356052
22.
T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra, “Continuous control with deep reinforcement learning,” 2019.
23.
S. Fujimoto, H. van Hoof, and D. Meger, “Addressing function approximation error in actor-critic methods,” 2018.
24.
V. R. Konda and J. N. Tsitsiklis, “Actor-critic algorithms,” in Advances in neural information processing systems, 2000, pp. 1008–1014.
25.
S. Filippi, O. Cappé, and A. Garivier, “Optimism in reinforcement learning and Kullback-Leibler divergence,” September 25, 2011.
26.
R. Netravali, A. Sivaraman, S. Das, A. Goyal, K. Winstein, J. Mickens, and H. Balakrishnan, “Mahimahi: Accurate record-and-replay for HTTP,” ser. USENIX ATC '15. USA: USENIX Association, 2015, pp. 417–429.