We consider the problem of maximizing the average throughput per total consumed energy in packetized point-to-point wireless sensor communications. Our study yields an optimal transmission strategy that chooses the modulation level and transmit power while adapting to the incoming traffic rate, the buffer condition, and the channel condition. We formulate the optimization problem as a Markov decision process (MDP). When the state transition probabilities of the MDP are available, the optimal policy can be obtained using dynamic programming (DP). Since in practical situations the state transition probabilities may not be available when the optimization is carried out, we propose to learn a near-optimal policy through a reinforcement learning (RL) algorithm. We show that the RL algorithm learns a policy that achieves almost the same throughput as the optimal one, and that the learned policy obtains more than twice the average throughput of the simple constant signal-to-noise ratio (CSNR) policy, particularly at high packet arrival rates. Moreover, the learning algorithm is robust in tracking variations of the governing probabilities.
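The contrast drawn above between DP (transition probabilities known) and RL (transition probabilities unknown) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy model `toy_buffer_mdp` tracks only the buffer occupancy (no channel state), uses a hypothetical energy cost of `2**a - 1` for sending at rate `a`, and uses a reward of delivered packets minus weighted energy as a stand-in for throughput per consumed energy. The learner is plain discounted tabular Q-learning, which may differ from the RL algorithm the authors use.

```python
import numpy as np


def toy_buffer_mdp(buffer_size=3, max_rate=2, arrival_prob=0.5, energy_weight=0.2):
    """Hypothetical toy model: state = buffer occupancy, action = packets to send."""
    n_states, n_actions = buffer_size + 1, max_rate + 1
    P = np.zeros((n_states, n_actions, n_states))  # P[s, a, s2]
    R = np.zeros((n_states, n_actions))            # R[s, a]
    for s in range(n_states):
        for a in range(n_actions):
            sent = min(a, s)                 # cannot send more than is buffered
            energy = 2.0 ** a - 1.0          # assumed convex energy cost
            R[s, a] = sent - energy_weight * energy
            # At most one packet arrives per slot (assumption).
            for arrivals, prob in ((0, 1.0 - arrival_prob), (1, arrival_prob)):
                s2 = min(buffer_size, s - sent + arrivals)
                P[s, a, s2] += prob
    return P, R


def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """DP solution when the transition probabilities P are known."""
    V = np.zeros(P.shape[0])
    while True:
        Q = R + gamma * (P @ V)              # (S, A, S) @ (S,) -> (S, A)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new


def q_learning(P, R, gamma=0.9, alpha=0.1, epsilon=0.1, steps=20_000, seed=0):
    """Tabular Q-learning; P is used only to simulate the environment."""
    rng = np.random.default_rng(seed)
    n_states, n_actions = R.shape
    Q = np.zeros((n_states, n_actions))
    s = 0
    for _ in range(steps):
        # Epsilon-greedy action selection.
        if rng.random() < epsilon:
            a = int(rng.integers(n_actions))
        else:
            a = int(Q[s].argmax())
        # Sample the next state; the learner never reads P directly.
        s2 = int(rng.choice(n_states, p=P[s, a]))
        # Temporal-difference update toward the one-step target.
        Q[s, a] += alpha * (R[s, a] + gamma * Q[s2].max() - Q[s, a])
        s = s2
    return Q, Q.argmax(axis=1)


if __name__ == "__main__":
    P, R = toy_buffer_mdp()
    _, dp_policy = value_iteration(P, R)
    _, rl_policy = q_learning(P, R)
    print("DP policy:", dp_policy)
    print("RL policy:", rl_policy)
```

Running the script prints the two policies side by side, which gives a rough feel for how closely a learned policy can follow the DP solution on this toy model; it says nothing about the numerical results reported in the paper.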
Published in:
Global Telecommunications Conference, 2004. GLOBECOM '04. IEEE
(Volume: 2)
Date of Conference: 29 Nov.-3 Dec. 2004