Optimal Energy Supplementary and Data Transmission Schedule for Energy Harvesting Transmitter With Reliable Energy Backup

For an energy harvesting (EH) transmitter equipped with a Reliable Energy Backup (REB), the following problem is both challenging and important: how to minimize the amount of energy supplied by the REB while the harvested energy is used efficiently to transmit a given amount of data within a fixed delay constraint. In this paper, we first develop a stochastic model for this problem. We then address the optimization of delay-constrained data transmission over a fading wireless channel. We transform the delay constraint into a penalty function, so that the energy-constrained control problem can be modeled as an unconstrained Markov decision process (MDP), from which we obtain the optimal energy supplementary policy and the minimum expected energy consumption from the REB. In the special case where the transmitter is powered by the REB alone, we show that the optimal energy supplementary policy is non-decreasing in the elapsed transmission time and in the remaining amount of the data task. This structure substantially reduces the computational complexity required to implement the optimal energy supplementary policy for a general EH wireless device. Numerical studies validate the theoretical findings, and observations are outlined to illustrate the characteristics of the optimal energy supplementary policy and the minimum expected energy expenditure from the REB.


I. INTRODUCTION
To sustain uninterrupted long-term or perpetual communication, an energy harvesting (EH) transmitter can harvest energy from natural sources such as solar power, wind, magnetic fields, mechanical vibrations, and temperature variations. In general, however, the energy harvested by a typical EH transmitter is very limited. Moreover, because of the randomness and instability of the EH process and the stochastic nature of the communication process, the harvested energy may not meet the energy consumption of the EH transmitter at all times. That is, an EH transmitter powered only by harvested energy may not guarantee the quality-of-service (QoS) of the communications. In this situation, a Reliable Energy Backup (REB) is used to support the communication of the transmitter whenever its energy harvesting process falls short. In practice, the REB is a paid-use power supply unit; it can be any reliable energy source, either traditional or newly developed, for example a constant battery, a diesel generator, the power grid, or constant energy transfer. The EH transmitter is encouraged to use the free and replenishable harvested energy whenever it is available; when the residual energy cannot meet the required energy consumption of the system, the REB is triggered to supply energy to the transmitter. In this way, the efficient mixed use of harvested energy and reliable backup energy provides a key solution for robust green wireless communications of the EH transmitter.
The associate editor coordinating the review of this manuscript and approving it for publication was Xujie Li.
Although the coexistence of harvested energy and backup energy is a promising way to tackle energy shortage in the communication system, it gives rise to many new challenges in managing the energy supplement from the REB. Since the backup energy is a paid resource, the EH transmitter should make the best use of the harvested energy and minimize the backup energy consumption while satisfying the QoS requirements. Therefore, energy-efficient mechanisms for the REB should be developed in the protocol stack of the EH transmitter. Such mechanisms are both theoretically and practically important for the design, development, and optimization of EH transmitters that must achieve uninterrupted long-term or perpetual operation.
There is a significant amount of recent literature on efficient optimal transmission and energy allocation policies for EH communications based on various communication models [1]-[4]; however, these works assume that the energy is supplied by a constant energy source. Transmission strategies and power allocation policies for EH nodes in wireless communication systems have been studied in [5]-[9]; these works assume that EH is the only source of energy for the nodes. In [10], the authors propose an enhanced data transmission control strategy to minimize the energy delivered from the primary battery while keeping the overflow probability of the data buffer below a desired threshold. In [11], the authors assume that the base stations (BSs) are powered by both on-grid energy and green energy, and develop algorithms to solve the green energy optimization problem. Reference [12] investigates the collaboration between multiple operators to improve energy utilization, and proposes an energy-saving management mechanism to reduce energy consumption and optimize energy utilization. Reference [13] proposes a model of flexible energy configuration through energy cooperation between base stations and the smart grid. Reference [14] presents green energy optimization schemes to maximize energy utilization in cellular networks with hybrid energy supplies, and proposes algorithms that solve the green energy provisioning problem in BSs while satisfying the QoS requirements.
In this paper, we consider point-to-point communication of an EH transmitter with an REB. Our goal is to minimize the energy supplied by the REB for transmitting a given amount of data within a deadline over a fading wireless channel. Compared with the literature mentioned above, the main differences of our work are as follows.
• Firstly, we propose a new stochastic system model to characterize point-to-point data transmission of an EH transmitter with an REB. In this model, we consider delay-constrained data transmission, that is, the data task must be completely transmitted before a given deadline. We use two stochastic processes to model the energy harvesting process and the channel fading process, respectively, based on which the dynamics of the energy level of the transmitter and the evolution of the remaining data task process are characterized.
• Secondly, the control problem under study is modelled as a Markov decision process (MDP), which enables us to obtain the optimal energy supplementary policy of the REB and the minimum expected energy supplied by the REB. In the case where the transmitter is supplied only by the REB, using the concept of supermodularity, we prove that the optimal energy supplementary policy is a monotonically non-decreasing function of the elapsed transmission time and of the remaining amount of the data task. These properties substantially reduce the computational complexity required to implement the optimal transmission scheduling policy.
The rest of the paper is organized as follows. Section II gives the analytical model for the EH transmitter with REB. Section III outlines the main results. Section IV provides numerical studies to support the theoretical results. Finally, conclusions are given in Section V.

II. SYSTEM MODEL AND CONTROL PROBLEM
It is assumed that the time duration is partitioned into consecutive slots and the length of each slot is denoted by τ . They are indexed by a set T = {0, 1, . . .}, where the n-th time slot is denoted by [n, n + 1).
We consider point-to-point delay-sensitive data transmission of an EH transmitter with an REB. More specifically, there is a given data task denoted by D[ζ, N_d], where ζ is the amount of data in the task and N_d is the execution deadline; the transmitter must transmit the data task D[ζ, N_d] to the receiver over a time-varying channel strictly within N_d time slots. We assume that transmission consumes most of the energy in the transmitter and ignore other causes of energy consumption.
As shown in Fig. 1, the transmitter under consideration has two types of energy source to meet its energy consumption: the harvested energy extracted from the ambient environment, and the backup energy supplied by the REB. The harvested energy is clean green energy and is free to use, while the backup energy supplied by the REB is a non-renewable, paid-use resource. The transmitter should make the best use of the harvested energy and minimize the supplementary energy from the REB while satisfying the delay constraint.

FIGURE 1. The architecture and working mechanism of the EH transmitter with an REB.
VOLUME 8, 2020

A. ENERGY HARVESTING MODEL
Similar to [15], [16], we use a Markov energy harvesting model, denoted by $X^h = \{X^h_n, n \in T\}$, to characterize the energy harvesting process. Specifically, we assume that the energy harvesting process has $n_h$ states, denoted by $\Omega_h = \{h_1, h_2, \cdots, h_{n_h}\}$; the state of the energy harvesting process remains static within each time slot, but may vary at the end of each slot. We let $p^h_{hh'}$, $h, h' \in \Omega_h$, be the transition probability of $X^h$ from state $h$ to $h'$, and denote the transition probability matrix by $P_H = [p^h_{hh'}]_{n_h \times n_h}$. When the state is $h \in \Omega_h$, $\beta[h]$ Joules of energy can be drawn from the ambient environment in each slot. The harvested energy is stored in an energy buffer with capacity $B$. When the accumulated energy reaches $B$, the excess energy is lost. Due to the delay of the energy transformation, we assume that the energy harvested in the $n$-th slot can only be used for transmission from the $(n + 1)$-th slot onward.
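The buffer behaviour described in this subsection can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' formulation: the function name `buffer_update` is hypothetical, and consumption and harvest are both assumed to be expressed as energy per slot (Joules), so that the energy harvested in slot n becomes available at the start of slot n + 1 and anything above the capacity B is discarded.

```python
def buffer_update(level, consumed, harvested, capacity):
    """Energy level at the start of the next slot (all arguments in Joules).

    Assumed update rule: energy left after this slot's consumption, plus the
    energy harvested during this slot, truncated at the buffer capacity B.
    """
    remaining = max(0.0, level - consumed)       # the buffer cannot go negative
    return min(capacity, remaining + harvested)  # energy above B is lost


# Illustrative values only: a nearly full buffer overflows and is clipped at B.
next_level = buffer_update(level=1.5, consumed=0.5, harvested=2.0, capacity=2.0)
```

The clipping in the last line of the function is what makes an oversized supplement wasteful: energy that would push the buffer above B is simply lost.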

B. CHANNEL FADING MODEL
The channel fading process is denoted by $X^c = \{X^c_n, n \in T\}$ and is modelled by a finite-state Markov chain [17], [18]. Specifically, the range of the channel gain is partitioned into $n_c$ non-overlapping ranges, and each range is mapped to one channel state. Let the channel state space be $\Omega_c = \{c_1, c_2, \ldots, c_{n_c}\}$; in channel state $c \in \Omega_c$, we assume that the corresponding channel gain is $\alpha[c]$. The channel state is assumed to remain static within each time slot, but may change at the slot boundary. We let the transition probability matrix of $X^c$ be $P_C = [p^c_{cc'}]_{n_c \times n_c}$, where $p^c_{cc'}$, $c, c' \in \Omega_c$, is the transition probability of $X^c$ from state $c$ to $c'$.
When $n_c = 2$, this model reduces to the Gilbert-Elliott channel model.
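To make the finite-state Markov channel concrete, the following minimal sketch samples a path of a row-stochastic chain and computes the stationary distribution in the two-state case. The helper names `step` and `two_state_stationary` are hypothetical, and the matrix entries are the two-state values quoted in the numerical section, used here purely as an illustration.

```python
import random


def step(state, P, rng):
    """Sample the next state of a finite-state Markov chain with row-stochastic matrix P."""
    u = rng.random()
    acc = 0.0
    for nxt, p in enumerate(P[state]):
        acc += p
        if u < acc:
            return nxt
    return len(P) - 1  # guard against floating-point round-off in the row sum


def two_state_stationary(P):
    """Stationary distribution (state 0, state 1) of a two-state chain."""
    p01, p10 = P[0][1], P[1][0]
    return p10 / (p01 + p10), p01 / (p01 + p10)


P_C = [[0.5, 0.5], [0.4, 0.6]]  # state 0 = "bad", state 1 = "good"
rng = random.Random(0)          # fixed seed so the sampled path is reproducible
path = [0]
for _ in range(10):
    path.append(step(path[-1], P_C, rng))

pi_bad, pi_good = two_state_stationary(P_C)  # long-run fractions of time per state
```

For these entries the chain spends 4/9 of the time in the bad state and 5/9 in the good state in the long run, which gives a rough sense of how often transmission is possible at all.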

C. COMMUNICATION MODEL
We assume that the transmitter has a finite number of discrete transmit power rate levels, denoted by
\[ A = \{a_1, a_2, \ldots, a_{n_a}\}, \]
where $0 = a_1 < a_2 < \ldots < a_{n_a} < \infty$. We also refer to $A$ as the action space and to $a_i \in A$ as an action. At the beginning of slot $n$, $n \in T$, the transmitter chooses a power rate $\pi_n \in A$ to transmit the data task. Let $\pi = \{\pi_0, \pi_1, \ldots\}$; we refer to $\pi$ as the transmit power control policy of the transmitter. Given the transmit power $\pi_n$ and the channel state $c \in \Omega_c$ in the $n$-th slot, the amount of data transmitted in this slot is denoted by $\mu(\alpha[c], \pi_n)$, which depends on the power spectral density $N_0$ of the Gaussian noise and the bandwidth $W$ of the channel. For a given value of $\alpha[c]$, $\mu(\alpha[c], \pi_n)$ is a monotonically non-decreasing and concave function of $\pi_n$. Thus, at the beginning of slot $n$, the remaining amount of the (uncompleted) data task can be defined as
\[ X^d_n = \Big[\zeta - \sum_{k=0}^{n-1} \mu(\alpha[X^c_k], \pi_k)\Big]^+, \]
where $[x]^+ = \max\{0, x\}$ and $\sum_{k=0}^{-1}(\cdot) = 0$. We refer to $X^d = \{X^d_n, n = 0, 1, \ldots\}$ as the remaining data task process of the transmitter, and denote its state space by $\Omega_d = [0, \zeta]$. Let $X^e_n$, $n \in T$, be the energy level in the energy buffer at the beginning of slot $n$, which evolves with the energy consumed and harvested in each slot and is bounded by the capacity $B$; we refer to $X^e = \{X^e_n, n \in T\}$ as the energy level process of the transmitter, with state space $\Omega_e = [0, B]$. We now combine the four stochastic processes $X^c$, $X^h$, $X^d$, and $X^e$ to build our system model, denoted by
\[ X = \{X_n = (X^c_n, X^h_n, X^d_n, X^e_n),\ n \in T\}. \]
We call $X_n$ the state of the system in the $n$-th slot and let the state space be $\Omega_x = \Omega_c \times \Omega_h \times \Omega_d \times \Omega_e$.
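The remaining-data recursion can be illustrated with a short sketch. The closed form of µ used here is an assumption: a Shannon-type expression τ W log2(1 + α p / (N_0 W)), chosen only because it depends on N_0 and W and is non-decreasing and concave in the power, as the text requires. The function names and all numerical values are hypothetical and are not the paper's settings.

```python
import math


def mu(alpha, power, bandwidth, noise_psd, tau):
    """Assumed Shannon-type amount of data (bits) sent in one slot of length tau."""
    snr = alpha * power / (noise_psd * bandwidth)
    return tau * bandwidth * math.log2(1.0 + snr)


def remaining_data(zeta, gains, powers, bandwidth, noise_psd, tau):
    """Remaining task [zeta - sum_k mu(alpha_k, pi_k)]^+ at the start of slots 0, 1, ..."""
    levels = [zeta]
    sent = 0.0
    for alpha, power in zip(gains, powers):
        sent += mu(alpha, power, bandwidth, noise_psd, tau)
        levels.append(max(0.0, zeta - sent))  # the [x]^+ truncation
    return levels


# Illustrative numbers: the second slot has zero channel gain, so nothing is sent.
W, N0, TAU = 1e6, 1e-9, 0.1
levels = remaining_data(2.5e5, [1e-3, 0.0, 1e-3, 1e-3], [1.0, 1.0, 1.0, 1.0], W, N0, TAU)
```

With a zero-gain slot the remaining task stays flat even though power is spent, which is the situation in which transmitting is pure waste.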

D. ENERGY CONSUMPTION MINIMIZATION PROBLEM
In our energy supply mechanism, the transmitter first uses the replenishable harvested energy; when the harvested energy cannot meet the required energy consumption, the REB is triggered to supply energy to the system. Given the energy level $x^e_n$ and the transmit power rate $\pi_n$ in the $n$-th slot, let $s_n(x^e_n, \pi_n)$ be the supplementary energy rate supplied by the REB in this slot. Then $s_n$ is defined as
\[ s_n(x^e_n, \pi_n) = [\pi_n - x^e_n]^+. \quad (4) \]
Let $s = \{s_0, s_1, \ldots\}$; we refer to $s$ as the energy supplementary policy of the REB. In practice, determining the amount of backup energy $s_n$ from the REB is very important. If $s_n$ is too small, the system may remain in a state of energy shortage, the data task is transmitted at a low rate, and the delay constraint is violated, which lowers the QoS. On the other hand, if $s_n$ is too large, more backup energy is supplied and the cost of the communication increases. Furthermore, energy may be lost because of the finite capacity of the energy buffer, which leads to an inefficient use of the harvested energy. Thus, developing an optimal energy supplementary policy $s$ of the REB plays a critical role in the efficient use of energy by the EH transmitter with REB.
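The harvested-energy-first rule can be written down directly from the expression s_n(e, a) = [a − e]^+ used for the per-slot cost. The sketch below assumes, as that expression does, that the energy level e and the transmit power rate a are given in the same units; the function names are hypothetical.

```python
def reb_supplement(energy_level, power):
    """s_n(e, a) = [a - e]^+ : the part of the request that the buffer cannot cover."""
    return max(0.0, power - energy_level)


def split_supply(energy_level, power):
    """Return (amount drawn from the buffer, amount drawn from the REB)."""
    from_reb = reb_supplement(energy_level, power)
    return power - from_reb, from_reb


# Illustrative values: the buffer covers 1.0 of a request of 3.0, the REB the rest.
from_buffer, from_reb = split_supply(1.0, 3.0)
```

When the buffer already covers the request, the second component is zero and the slot incurs no cost from the REB.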
Let $N_c$ be the actual transmission time of the task D[ζ, N_d]. At the beginning of slot $n$, given the system state $X_n = (c, h, d, e)$ and the transmit power policy $\pi$, the expected supplementary energy from the REB until the data task is completely transmitted is defined by
\[ V^{\pi}_n(c, h, d, e) = E^{\pi}\Big[\sum_{k=n}^{N_c - 1} s_k(X^e_k, \pi_k) \,\Big|\, c, h, d, e\Big], \]
where $E^{\pi}[\cdot \,|\, c, h, d, e]$ represents the conditional expectation given the initial state $(c, h, d, e)$ under the policy $\pi$. We call $V^{\pi}_n$ the cost function at the beginning of slot $n$.
In this paper, the task is to find an energy supplementary policy $s$ that minimizes the total expected supplementary energy supplied by the REB while ensuring that the data task is transmitted within a strict delay constraint. From (4), once the optimal transmit power $\pi$ is determined, the optimal energy supplementary policy follows. Thus, we formulate the constrained control problem in the slotted system as follows: given a data task D[ζ, N_d], the initial channel state $c$, the EH state $h$, and the energy level $e$, find a transmit power control policy $\pi^*$ that solves
\[ \min_{\pi} V^{\pi}_0(c, h, \zeta, e) \quad \text{s.t.} \quad N_c \le N_d, \ P\text{-a.s.} \quad (6) \]
We call $\pi^* = \{\pi^*_0, \pi^*_1, \ldots\}$ the optimal transmit power control policy, and we then obtain the optimal energy replenishment policy $s^*$ by
\[ s^*_n = [\pi^*_n - x^e_n]^+. \]
In the following, we reformulate the control problem (6) into a mathematical framework for analytical tractability.
To simplify the notation, we define $x = (c, h, d, e) \in \Omega_x$ and $x' = (c', h', d', e') \in \Omega_x$. We use $x$ and $(c, h, d, e)$ interchangeably, whichever is more convenient for the exposition.
Remark 1: In contrast to the expected-delay constraint used in traditional control problems, we use a P-a.s. (''almost sure'') delay constraint in (6) to characterize the hard deadline. Because the convex analytic approach is problematic in our system model, we need to transform this constraint into a mathematical form that is easier to handle. To address this challenge, similar to [19], [20] and [21], we transform the P-a.s. delay constraint into a penalty function and ultimately reformulate the control problem (6) as a finite-horizon MDP with no constraint; the transmit power control policy and the minimum expected supplementary energy of the REB are obtained by solving this MDP.

III. MDP REFORMULATION
The MDP consists of the decision epochs $T_d$, the state space $\Omega_x$, the action space $A$, and the following three elements.
4) Transition probabilities: Let $p^{[n;a]}_{xx'}$ be the probability that the state at time $(n + 1)$ is $x'$ if the state at time $n$ is $x$ and the action $a \in A$ is taken. Then we have
\[ p^{[n;a]}_{xx'} = P\big(X_{n+1} = x' \,\big|\, X_n = (c, h, d, e),\ \pi_n = a\big). \]
5) Cost: Let $g^{[a]}_n(e)$ be the energy consumption from the REB in slot $n$ when the current energy level is $e$ and the action $a \in A$ is taken. Then we have
\[ g^{[a]}_n(e) = s_n(e, a) = [a - e]^+, \quad n \in T_d. \]
6) Terminal cost: To exclude the infeasible control policies that violate the delay constraint, as mentioned in Remark 1, a terminal penalty cost is added at the end of the horizon as
\[ g_{N_d}(x) = \Phi(x^d_{N_d}), \]
where $\Phi(\cdot)$ is a convex increasing function of the uncompleted data task $x^d_{N_d}$ at the end of the horizon. We denote the above MDP by $\{T_d, \Omega_x, A, P^{[n;a]}, g^{[a]}_n, g_{N_d}\}$. In this reformulation, we replace the constraint with a terminal data loss cost. The terminal cost means that, at the deadline, the larger the residual data task is, the larger the penalty is, and the increment of the penalty grows with the amount of the residual data task. In practice, $\Phi$ can simply be set to an arbitrary, sufficiently large number. For example, let $\Phi = +\infty$ for the control policies that lead to a violation of the time constraint. Thus, by our assumption, the data task should be completed by the execution deadline; otherwise, a large cost is incurred for violating the constraint.
Since the MDP $\{T_d, \Omega_x, A, P^{[n;a]}, g^{[a]}_n, g_{N_d}\}$ is a finite-horizon MDP with a finite action set, the minimum cost function $V$ of the control problem (6) exists and is the unique solution of the following dynamic programming equation.
For all $x = (c, h, d, e) \in \Omega_x$, the minimum cost $V_n(x)$ and the optimal transmit power policy $\pi^*$ can be obtained by the following backward iteration. Set $V_{N_d}(x) = g_{N_d}(x)$, and for $n = N_d - 1, \ldots, 1, 0$, compute
\[ V_n(x) = \min_{a \in A} W^{[a]}_n(x), \quad (9) \]
with the state-action cost function
\[ W^{[a]}_n(c, h, d, e) = g^{[a]}_n(e) + \sum_{x' \in \Omega_x} p^{[n;a]}_{xx'} V_{n+1}(x'). \]
The optimal energy supplementary policy of the REB is an offline power control policy. That is, the policy can be computed offline first and then compiled into a simple look-up table in the transmitter. At each decision epoch, the transmitter only needs to look up the table and choose the optimal energy supplement that matches the current state. Therefore, the energy supplementary policy is suitable for transmitters with limited computational capability.
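The backward iteration and the resulting look-up table can be sketched generically as follows. This is a minimal illustration of finite-horizon value iteration on a finite state set, not the authors' implementation: the function `backward_induction`, its callback signatures, and the toy instance (integer data units, deterministic transmission of `a` units per slot, per-slot cost `a`, terminal penalty of 100 per unsent unit) are all assumptions made for the example. A large finite penalty is used instead of +∞ to avoid 0 · ∞ in the expectation.

```python
def backward_induction(states, actions, horizon, stage_cost, transitions, terminal_cost):
    """Finite-horizon value iteration.

    transitions(n, x, a) returns a list of (next_state, probability) pairs.
    Returns (V_0, policy), where policy[n][x] is the look-up table entry for slot n.
    """
    V = {x: terminal_cost(x) for x in states}
    policy = []
    for n in range(horizon - 1, -1, -1):
        V_new, pi_n = {}, {}
        for x in states:
            best_a, best_w = None, float("inf")
            for a in actions:
                # State-action cost: immediate cost plus expected cost-to-go.
                w = stage_cost(n, x, a) + sum(p * V[y] for y, p in transitions(n, x, a))
                if w < best_w:
                    best_a, best_w = a, w
            V_new[x], pi_n[x] = best_w, best_a
        V = V_new
        policy.insert(0, pi_n)  # keep the tables ordered by slot index
    return V, policy


# Toy instance: state = remaining data units, action = units sent (and paid for).
states = [0, 1, 2, 3]
actions = [0, 1, 2]
V0, policy = backward_induction(
    states, actions, horizon=2,
    stage_cost=lambda n, d, a: a,
    transitions=lambda n, d, a: [(max(0, d - a), 1.0)],
    terminal_cost=lambda d: 100 * d,
)
```

At run time the transmitter would only index `policy[n][x]`; all the minimization work happens offline.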
The following property is intuitive. If a larger amount of the data task remains, i.e., $d$ is larger, more backup energy from the REB must be spent on the transmission, so $V$ increases. Similarly, if more transmission time has elapsed, i.e., $n$ is larger, more energy must be allocated for transmission, which leads to a larger total energy consumption $V$.
Property 1: For the MDP $\{T_d, \Omega_x, A, P^{[n;a]}, g^{[a]}_n, g_{N_d}\}$ described above, the following two statements hold for $n = 0, 1, \ldots, N_d$: (a) $V_n(c, h, d, e)$ is a non-decreasing function of $d$ when the other parameters are fixed; (b) $V_n(c, h, d, e)$ is a non-decreasing function of $n$ when the other parameters are fixed.
The proof can be found in Appendix A.

B. SPECIAL CASE: TRANSMITTER WITH REB
In the following, we consider a special case where the energy consumption of the transmitter is supported only by the REB.
In this case, we investigate some structural results of the optimal energy supplementary policy, which can be used to reduce the computational complexity of the value iteration algorithm. When the energy of the transmitter is supplied by the REB alone, the system state is left with two elements, the channel state and the residual data task, and the system model is denoted by
\[ Y = \{Y_n = (X^c_n, X^d_n),\ n \in T\}. \]
We define the notation $y = (c, d) \in \Omega_y$ and $y' = (c', d') \in \Omega_y$. The transition probability of the system state reduces to
\[ p^{[n;a]}_{yy'} = \begin{cases} p^c_{cc'}, & d' = [d - \mu(\alpha[c], a)]^+, \\ 0, & \text{otherwise}. \end{cases} \]
It should be noted that, for the transmitter with REB only, there is no difference between the transmit power policy and the energy supplementary policy. Thus, we also refer to $\pi$ as the energy supplementary policy.
For all $(c, d) \in \Omega_y$, we denote the minimum expected cost function at decision epoch $n$ by $\tilde{V}_n(c, d)$, which satisfies the following Bellman equation of optimality:
\[ \tilde{V}_n(c, d) = \min_{a \in A} \tilde{W}^{[a]}_n(c, d), \quad (14) \]
with
\[ \tilde{W}^{[a]}_n(c, d) = g^{[a]}_n + \sum_{c' \in \Omega_c} p^c_{cc'} \tilde{V}_{n+1}\big(c', [d - \mu(\alpha[c], a)]^+\big). \]
Equation (14) can be solved using the value iteration algorithm described in Section III-A. Before doing so, however, we discuss some properties of $\tilde{V}_n(c, d)$, as outlined in Lemmas 1 and 2. These results pave the way to the monotone policy in Theorem 1. The proofs can be found in Appendix A.
The following result states that the optimal energy supplementary policy is non-decreasing in the elapsed transmission time and in the remaining amount of the data task, respectively.
Theorem 1: For the transmitter with REB only, the optimal energy supplementary policy $\pi^*_n(c, d)$ is non-decreasing in $d$ and in $n$, respectively. That is, for $d' \ge d$ and $n' \ge n$,
\[ \pi^*_n(c, d') \ge \pi^*_n(c, d), \qquad \pi^*_{n'}(c, d) \ge \pi^*_n(c, d). \]
With the monotonic structure established in Theorem 1, we can significantly reduce the computational complexity of the value iteration algorithm. We summarize the procedure in Algorithm 1.
The main difference between Algorithm 1 and the traditional iterative algorithm lies in the computation of $\tilde{W}_n$ in (20), which has lower complexity than in the traditional iterative algorithm. In particular, when we compute $\tilde{W}_n$, the monotonic structure of the optimal policy means that we only need to search for the optimal solution in the action set $\{a_1, a_2, \ldots, \pi^*_{n+1}(c, d)\}$ instead of the larger set $A = \{a_1, a_2, \ldots, a_{n_a}\}$, so the computational complexity of Algorithm 1 is further reduced.
Step 3 of Algorithm 1 reads: if $n = 0$, output the cost function $\tilde{V}_0(c, d)$ and the optimal policy $\pi^* = \{\pi^*_0, \pi^*_1, \ldots, \pi^*_{N_d - 1}\}$; otherwise, go to step 2.
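The restricted search can be sketched as a small change to plain backward induction over states (c, d): at slot n, only actions up to the one chosen at slot n + 1 for the same state are tried. The sketch below is an illustration under assumptions, not Algorithm 1 itself: the function `solve_reb_only`, the integer data grid, the toy rate µ(c, a) = a, the quadratic cost, and the finite penalty are all hypothetical, and the restriction is only valid when the monotonicity in n actually holds for the instance at hand.

```python
def solve_reb_only(P_C, mu, actions, d_grid, horizon, cost, penalty, monotone=True):
    """Backward induction over states (c, d); `actions` must be sorted increasingly.

    mu(c, a) must map grid points to grid points (no interpolation is done here).
    With monotone=True, slot n only searches actions up to pi*_{n+1}(c, d).
    """
    n_c = len(P_C)
    V = {(c, d): penalty(d) for c in range(n_c) for d in d_grid}
    policy = [None] * horizon
    upper = None  # action index chosen at slot n + 1, used as the search bound
    for n in range(horizon - 1, -1, -1):
        V_new, idx_n = {}, {}
        for c in range(n_c):
            for d in d_grid:
                last = len(actions) - 1
                if monotone and upper is not None:
                    last = upper[(c, d)]
                best_i, best_w = 0, float("inf")
                for i in range(last + 1):
                    a = actions[i]
                    d_next = max(0, d - mu(c, a))
                    w = cost(a) + sum(P_C[c][c2] * V[(c2, d_next)] for c2 in range(n_c))
                    if w < best_w:  # strict "<" keeps the smallest minimizer
                        best_i, best_w = i, w
                V_new[(c, d)], idx_n[(c, d)] = best_w, best_i
        V, upper = V_new, idx_n
        policy[n] = {s: actions[i] for s, i in idx_n.items()}
    return V, policy


# Toy instance with a degenerate one-state channel, small enough to check by hand.
grid = [0, 1, 2, 3, 4]
args = ([[1.0]], lambda c, a: a, [0, 1, 2], grid, 3, lambda a: a * a, lambda d: 100 * d)
V_fast, pi_fast = solve_reb_only(*args, monotone=True)
V_full, pi_full = solve_reb_only(*args, monotone=False)
```

On this toy instance the restricted and the exhaustive search return the same values while the restricted one evaluates fewer actions; the saving grows with the size of the action grid.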

IV. NUMERICAL RESULTS
In this section, we give a numerical example to illustrate the computation of the optimal control policy, and we investigate the characteristics of the energy supplementary policy and the minimum expected total energy consumed from the REB.

A. NUMERICAL SETTING
Let the length of each time slot be τ = 0.1 second. We assume that the amount of data that the transmitter must transmit is ζ = 5 M, with execution deadline $N_d$. We consider a bandlimited Gilbert-Elliott channel with two states {0, 1}, representing the ''bad'' and ''good'' states of the channel, with corresponding channel gains 0 and $1.0 \times 10^{-12}$. The transition matrix of the channel fading process is assumed to be
\[ P_C = \begin{pmatrix} 0.5 & 0.5 \\ 0.4 & 0.6 \end{pmatrix}. \]
We assume that the channel bandwidth is W = 20 MHz and the noise power spectral density is $N_0 = 2.0 \times 10^{-18}$ W/Hz. We use {0, 1, 2} to represent the ''low'', ''normal'' and ''high'' states of the energy harvesting process, with energy harvesting rates of 0 mW, 0.10 mW and 0.30 mW, respectively; the process evolves according to a 3 × 3 transition matrix $P_H$. The capacity of the energy buffer is set to B = 2 mJ. That is, in each slot the maximum transmit power rate that the buffer can supply is 2/0.1 mJ/s = 20 mW. The admissible transmit power rate set of the transmitter is denoted by $A = \{a_k = k\delta_a,\ k = 0, 1, \ldots, n_a\}$, where $\delta_a = 0.01$ mW is the transmit power rate (TPR) grid step, and we let $n_a = 400$. Since the state space $\Omega_x$ is a hybrid state space, in which $\Omega_c$ and $\Omega_h$ are discrete and $\Omega_d$ and $\Omega_e$ are continuous, we use a state-space discretization to obtain recursive approximations to the MDP; the analysis of the convergence and the complexity can be found in [24]. With a power rate step size of 0.01 mW, we obtain a finite-horizon MDP with 756 states and 400 actions based on the method developed in this paper. The optimal energy supplementary policy and the minimum expected energy consumption from the REB can then be computed. We list part of the look-up table in Table 1, which demonstrates the applicability of our algorithm.
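The unit conversion and the action grid quoted in this subsection can be checked with a few lines. This is only a sanity-check sketch using the values stated in the text; the variable names are hypothetical. Note that the index range k = 0, ..., n_a as written yields n_a + 1 grid points.

```python
TAU_S = 0.1         # slot length in seconds
B_MJ = 2.0          # buffer capacity in millijoules
DELTA_A_MW = 0.01   # transmit power rate grid step in milliwatts
N_A = 400           # largest grid index n_a

# mJ per second is mW: the highest rate a full buffer can sustain for one slot.
max_buffer_rate_mw = B_MJ / TAU_S

# Action grid a_k = k * delta_a for k = 0, ..., n_a (rounded to suppress float noise).
power_grid_mw = [round(k * DELTA_A_MW, 2) for k in range(N_A + 1)]
```

The top of this grid (4 mW) lies well below the 20 mW that a full buffer could sustain for one slot.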

B. OBSERVATIONS
Several observations can be made from Table 1. When the state is (0, *, *, *), where * denotes an arbitrary state, the energy supplement is zero. This is because the channel gain is 0 when the channel state is 0, so no data can be transmitted successfully and not transmitting is the best decision. The energy supplement under state (1, 1, 1, 0.1) is also zero, because the residual energy in the buffer can meet the optimal transmit power rate and the REB does not need to supply energy for the transmission. In Table 1, the optimal energy supplement under state (1, 0, 2, 0.1) is the same as under state (1, 2, 2, 0.1); this is explained by the fact that the energy harvested in the current slot cannot be used in that slot because of the delay of the energy transformation.
We also plot some of our numerical results in Figs. 2-5 to highlight characteristics of the optimal energy supplementary policy and the minimum expected energy supplied by the REB when the transmitter is powered only by the REB. Figs. 2-5 show the optimal energy supplementary policy and the minimum expected energy expenditure from the REB with respect to the remaining amount of the data task and the elapsed transmission time. From Fig. 2, we can see that when the channel state is 0, the optimal energy supplement is zero regardless of the residual data task and the remaining time. The explanation of this observation is the same as for the state (0, *, *, *) in Table 1.
From Figs. 4 and 5, we can see that, for a given residual data task, the energy supplement from the REB increases with the transmission time that has been used, and, for a given residual transmission time, it increases with the remaining amount of the data task, which is consistent with Theorem 1. This observation means that an aggressive transmit power strategy must be adopted when the residual transmission time is shorter or the uncompleted data task is larger. This characteristic can also be seen in Fig. 3.
We should note that when the residual transmission time is too small and the remaining data task is too large, the transmitter cannot transmit the remaining data task before the deadline even with the largest transmit power rate, and a high penalty is incurred for violating the delay constraint. This explains why the energy consumption shown in Fig. 4 is much higher when the elapsed transmission time is large.

V. CONCLUSION
In this paper, we consider wireless communication of an EH transmitter with an REB over a fading wireless channel. We optimize the energy supplementary policy to minimize the total expected energy supplied by the REB while the data task must be transmitted strictly within a deadline. We reformulate the control problem as a finite-horizon MDP, and obtain the optimal energy supplementary policy and the minimum total expected energy supplement from the REB by solving the Bellman equation. When the transmitter is powered only by the REB, we prove that the optimal energy supplementary policy is non-decreasing in the elapsed transmission time and in the remaining amount of the data task. Finally, numerical experiments are used to illustrate the theoretical findings. Our model can provide guidance on determining the optimal system parameters. One direction for future research is the evaluation of performance metrics of the EH transmitter with REB under the optimal energy management strategy, such as the average energy consumption rate of the EH transmitter, the residual energy distribution, the data queue length in the data buffer, and the packet blocking probability due to the delay constraint.
The first term of the right-hand side is $g^{[a]}_n(e)$, which is independent of $d$. By the induction hypothesis, the second term of the right-hand side of (21), being a linear combination of non-decreasing functions, is non-decreasing in $d$. Thus, from (21), $W^{[a]}_n(c, h, d, e)$ is a non-decreasing function of $d$. By (9), the cost function $V_n(c, h, d, e)$ is non-decreasing in $d$.
To prove (b), we let $a = 0$ and apply (9).
The first term $g^{[a]}_n$ of the right-hand side of (22) is independent of $d$. By the induction hypothesis, $\tilde{V}_{n+1}(c, d)$ is a concave function of $d$; in addition, $[d - \mu(\alpha[c], a)]^+$ is concave in $d$, so $\tilde{V}_{n+1}(c', [d - \mu(\alpha[c], a)]^+)$ is concave in $d$. Therefore the second term of the right-hand side of (22) is concave in $d$, since it is a weighted sum of concave functions. Thus $\tilde{W}^{[a]}_n(c, d)$ is concave in $d$. We then show that $\tilde{V}_n(c, d)$ is concave in $d$.
By mathematical induction, we have shown that $\tilde{V}_n(c, d)$ is a concave function of $d$.

C. PROOF OF THEOREM 1
Proof: To prove that the optimal energy supplementary policy $\pi^*_n(c, d)$ is non-decreasing in $d$, we only need to prove that $\tilde{W}^{[a]}_n(c, d)$ is a submodular function in $(a, d)$ [23]. That is, for $a' > a$ and $d' > d$, where $a', a \in A$ and $d', d \in [0, \zeta]$, we need to prove
\[ \tilde{W}^{[a']}_n(c, d') - \tilde{W}^{[a]}_n(c, d') \le \tilde{W}^{[a']}_n(c, d) - \tilde{W}^{[a]}_n(c, d). \quad (23) \]
To obtain this inequality, we substitute (22) into the left-hand side of (23) and rearrange, which leads to (23). Hence the optimal energy supplementary policy $\pi^*_n(c, d)$ is non-decreasing in $d$. By a similar method, we can prove that $\tilde{W}^{[a]}_n(c, d)$ is a submodular function in $(a, n)$, so the optimal energy supplementary policy $\pi^*_n(c, d)$ is non-decreasing in $n$. This completes the proof.