
Online Reinforcement Learning for Dynamic Multimedia Systems

Authors: Nicholas Mastronarde and Mihaela van der Schaar, Department of Electrical and Computer Engineering, University of California at Los Angeles (UCLA), CA, USA

In our previous work, we proposed a systematic cross-layer framework for dynamic multimedia systems, which allows each layer to make autonomous and foresighted decisions that maximize the system's long-term performance, while meeting the application's real-time delay constraints. The proposed solution solved the cross-layer optimization offline, under the assumption that the multimedia system's probabilistic dynamics were known a priori, by modeling the system as a layered Markov decision process. In practice, however, these dynamics are unknown a priori and, therefore, must be learned online. In this paper, we address this problem by allowing the multimedia system layers to learn, through repeated interactions with each other, to autonomously optimize the system's long-term performance at run-time. The two key challenges in this layered learning setting are: (i) each layer's learning performance is directly impacted by not only its own dynamics, but also by the learning processes of the other layers with which it interacts; and (ii) selecting a learning model that appropriately balances time-complexity (i.e., learning speed) with the multimedia system's limited memory and the multimedia application's real-time delay constraints. We propose two reinforcement learning algorithms for optimizing the system under different design constraints: the first algorithm solves the cross-layer optimization in a centralized manner and the second solves it in a decentralized manner. We analyze both algorithms in terms of their required computation, memory, and interlayer communication overheads. After noting that the proposed reinforcement learning algorithms learn too slowly, we introduce a complementary accelerated learning algorithm that exploits partial knowledge about the system's dynamics in order to dramatically improve the system's performance. 
In our experiments, we demonstrate that decentralized learning can perform as well as centralized learning, while enabling the layers to act autonomously. Additionally, we show that existing application-independent reinforcement learning algorithms, and existing myopic learning algorithms deployed in multimedia systems, perform significantly worse than our proposed application-aware and foresighted learning methods.

Published in:

IEEE Transactions on Image Processing (Volume: 19, Issue: 2)