Analysis of Buffer-Aided Energy Harvesting Device-to-Device Communications With Complex Boundary Behavior

In this paper, we investigate the performance analysis of the buffer-aided energy harvesting device-to-device (D2D) communication system. A joint data and energy resource scheduling problem is proposed considering the data packet and energy packet Poisson arrival but interactive departure. Considering the coupling relationship between data and energy, the joint resource scheduling problem is modeled as a data and energy coupled queuing model. We raise the expressions of carrier-to-interference and noise-ratio (CINR) interval division, which contains the dynamic interferences from cellular users and the other D2D users. To compromise between data delay and energy consumption, we propose a novel adaptive power transfer (APT) strategy that allows more energy consumption to improve data delay. Studying user mobility and APT strategies will lead to complex boundary behavior of the system transition state matrix. By random geometry theory and quasi-birth and death method to obtain steady-state transition probability, the expressions of throughput, delay and packet drop rate for both data queue and energy queue are derived. Simulations are demonstrated to verify the accuracy of the theoretical derivation results.


I. INTRODUCTION
On June 6, 2019, the Ministry of Industry and Information Technology officially issued 5G commercial licenses to China Telecom, China Mobile, China Unicom and China Radio and Television in Beijing, which marks the official entry of China's 5G into the first year of commercialization. In the 5G era of the Internet of Everything, the explosive growth in the number of terminal equipment and low-latency, low-power user requirements has made the contradiction between the massive data services of emerging communications services and the shortage of wireless spectrum resources increasingly apparent. As one of the key technologies of 5G, device-to-device (D2D) communication The associate editor coordinating the review of this manuscript and approving it for publication was Hayder Al-Hraishawi .
technology has the characteristics of providing spectrum utilization, significantly improving user experience, flexible service expansion, etc., and has attracted widespread attention [1], [2].
The buffer-aided D2D mechanism allows base station (BS) and helpers to temporarily store data in their local buffers and then transmit them in subsequent frames, which raises system throughput, saves transmission power and improves communication quality. Although the use of buffer assist mechanisms in D2D wireless communication can significantly improve system performance, a series of opportunities and challenges are introduced.
[3]- [7] discussed practical challenges in buffer-aided D2D communication system performance, such as transmission delay and optimal bandwidth allocation. [3] analyzed the basic effects of the buffer-aided D2D mechanism on D2D communications, which improves D2D system bandwidth conservation performance. The authors noted the correlation between the buffer sizes of the communication terminal and the average system performance. [4] studied the problem of resource allocation with dynamic data arrival and limited buffers in D2D communication networks and modeled it as a constrained Markov decision process (CMDP). [5] studied a D2D mode selection strategy based on dynamic mode selection. The optimization goal was determined by the buffer size and buffer status of the user equipment (UE). [6] introduced a spatiotemporal mathematical model to analyze the performance of priority data transmission in D2D cellular networks. The data stored in the cache and the location of D2D users were modeled through the thin Poisson point process and the priority queuing model. [7] maximized the efficiency of offloading potential D2D users while satisfying users' minimum throughput and limited cache capacity. Considering a three-node decode-and-forward buffer-aided relaying scenario, [8] proposed a new relaying protocol employing adaptive link selection. For both delayconstrained and delay-unconstrained transmission, an optimal power allocation and link selection policy was proposed. [9] investigated the fading two-hop full-duplex (FD) relay channel with self-interference. Three buffer-aided relaying schemes with adaptive reception-transmission at the FD relay were proposed, which enabled the FD relay to adaptively select to either receive, transmit, or simultaneously receive and transmit efficiently.
Energy harvesting (EH) technology [10] can collect from ambient energy sources and potentially reduce the dependence on the supply of grid or battery energy; it is especially suitable for future green wireless communication scenarios. [11] conducted research to maximize the average energy efficiency of all D2D links. Under the guarantee of the quality of service of the cellular user equipment (CUE) and the EH constraints of the D2D link, the EH slot allocation, power and spectrum resource block allocation of the D2D user equipment (DUE) were jointly optimized. [12] studied the optimization strategy of EE and delay tradeoff in an EH-based D2D communication cellular network, which includes a cellular user and multiple EH-powered D2D user pairs. Considering the limited energy and data storage space, the weighted EE and delay utility problems were modeled, and an experience sharing distributed cooperative learning optimization algorithm was proposed. [13] studied the D2D heterogeneous cellular network (DHCN) multiplexing CUE uplink spectrum resource DUE mode selection and resource allocation. A mode selection and resource allocation algorithm that aims to maximize the system throughput with harvesting and quality of service constraints was proposed. [14] focused on how to maximize the energy harvesting relayaided D2D communication sum rate without degrading the quality of service requirements of all users. A low complexity joint power allocation and relay selection algorithm was proposed.
[15]- [18] studied the effects of D2D user position dynamics by random geometric modeling. For both the random spectrum access strategy and priority spectrum access strategy, [15] modeled and analyzed the performance of EH D2D communication cellular networks by random geometry theory. [16] addressed how EH D2D communication system performance is subject to spatial randomness, time correlation, transmit power control, and channel uncertainty. For four different EH schemes, the D2D coverage rates were derived using the Markov chain model. [17] provided an analysis model of radio frequency EH multichannel downlink D2D communication based on random geometry. When the transmit power of the D2D transmitter was fixed, the probability of D2D communication collecting sufficient energy was derived. Considering the D2D communication scenario combining environmental backscattering and wireless power communication, [18] designed two mode selection protocols for hybrid D2D transmitters. Additionally, the environmental factors influencing the hybrid D2D communication performance were considered. However, the effects of mutual coupling of data and energy were not considered in [15]- [18]. [11], [19]- [21] jointly considered the influence of the coupling relationship between data and energy by using Markov theory on the performance of EH-DHCN communication scenarios. [19] studied long-term energy efficiency and delay tradeoff problems through discrete-time and finite-state Markov decision processes. [20] developed a framework for the design and analysis of DHCN by introducing the EH region (EHR) and modeling the status of harvested energy using a Markov chain. Using the derived UER distribution, a transmission mode selection scheme including the efficient UER selection method was proposed. Considering the joint data-energy beamforming technology, [21] proposed a synchronous wireless information and power transmission-based traffic offloading scheme for EH D2D communication cloud wireless access networks. [11] focused on the DHCN downlink spectrum resource allocation problem in which the goal was to maximize the average energy efficiency of all D2D links. Considering the quality of service guarantee of CUEs and the EH constraints of the D2D links, an average energy efficiency joint resource allocation problem was formulated. After the transformation of the original problem, an EH time slot allocation, power and spectrum resource block allocation iterative algorithm was proposed.
In this paper, we concentrate on the performance analysis of the data and energy coupling queuing model based on random geometry for buffer-aided EH D2D communications networks. First, a joint data and energy resource scheduling problem was proposed considering data arrival and energy harvesting, respectively, which obeys a Poisson distribution. The resources in this article mainly refer to data packets and energy packets. Specifically, D2D users need to make full use of energy resources to send data resources efficiently under complex channel conditions. In this buffer-aided EH D2D underlaying cellular networks, the interference of typical D2D users comes from the other D2D users and cellular users that obey PPPs. Second, the continuously changing channel state process is divided into several intervals to analyze and VOLUME 8, 2020 adaptively adjust the transmission power. We further consider the impact of user mobility on interference by using random geometry theory. Considering the tradeoff between data delay and energy exhaust, we propose a novel adaptive power transfer (APT) strategy that allows more energy depletion to improve data delay. The resource scheduling in this article mainly refers to D2D users considering the current changing channel state, as well as the state of data buffering and energy buffering, and selecting the best APT strategy. The expressions of carrier-to-interference and noise-ratio (CINR) interval division, which contains the interferences from CU and the other D2D users, are raised. Third, the APT-based joint resource scheduling problem is ascribed as a data and energy coupled queuing model (DECQM) that investigates the interdependence of data and energy. Using the quasi-birth and death (QBD) method, the steady-state distribution of the DECQM can be obtained by solving the transition probability matrix and transition probability submatrix. The expressions of the system performance parameter, such as throughput, delay and packet drop rate for both the data queue and energy queue, are also derived.
The main contributions of this work can be summarized below: • We proposed a novel APT strategy that allows more energy depletion to improve data delay in the EH D2D communication scenario. Considering the mobility of end users with random geometry theory, we raise the expressions of CINR interval division, which contains the interferences from CU and the other D2D users.
• Compared with traditional performance analysis of EH D2D communication networks, we focus on the more general data and energy Poisson arrival process. With the help of the data buffer and energy buffer, a joint data and energy resource scheduling problem is considered. We also focus on the tradeoff between energy consumption and data packet transmission and discuss how to achieve better system performance in time-varying channel surroundings.
• Different from the conventional buffer-aided D2D communication system, we jointly investigate the typical D2D transmission pair with independent arrival but interactive departure data packets and energy packets. The cooperative APT-based joint resource scheduling problem is ascribed as a DECQM that investigates the interdependence of data and energy. Applying the QBD process method, we can derive the expressions of system performance parameters such as throughput, delay and packet drop rate for both the data queue and energy queue. Some conclusions about using redundant energy expenditure to improve system performance are discussed. The rest of this article is organized as follows. The second section gives the corresponding assumptions and builds the system model and queue model. Section III proposes an adaptive power transmission strategy. In section IV, the quasibirth process method is used to solve the coupled processor queuing model. Section V derives some key performance indicators of the system. In section VI, the validity and rationality of the theoretical analysis are verified by Monte Carlo simulation. Finally, section VII briefly summarizes the conclusions and future research directions.

II. SYSTEM MODEL AND QUEUE MODEL A. SYSTEM MODEL
We consider an energy harvesting D2D communication heterogeneous cellular network, which allows a set of D2D users to multiplex the uplink channel of the cellular user, as shown in Figure 1. To simplify the expression, we assume that the base stations are arranged in a hexagonal grid in which its coverage radius is approximately R = 1 √ πλ B and the density is λ B [6]. Cellular users and a group of D2D user pairs share uplink channels with bandwidth B 0 . The data at the D2D transmitter come from the transmitter or from the communication network; for example, it is obtained from the BS or another D2D as the receiver. Due to the complexity and diversification of data sources, we assume that the arrival of data obeys a Poisson distribution.
Random geometry theory is a powerful mathematical tool for modeling and analyzing randomly distributed devices in the network and uses a random point process to simulate the location of the device. Based on the relevant theory of the random point process, the average performance of the network is derived. The locations of cellular users and D2D users are subject to the Poisson point process (PPP) C and D with parameters λ C and λ D , respectively. The distance between the cellular user and the serving base station is r, r ∈ (0, R) with probability density function 2r R 2 . For convenient analysis, we assume that the distance of each D2D user pair is fixed. Different from [5], each D2D transmitter is equipped with a data buffer and an energy buffer. The information scheduling process of the D2D terminal is as follows. The D2D terminal simultaneously performs energy collection and data transmission processes. Before each data transmission, the transmission power is determined according to the current channel state and energy storage state.
The D2D terminal adopts an adaptive strategy for adjusting the transmission power, which achieves a better information transmission effect. However, it is not easy to implement an adaptive power transmission strategy that considers both channel state and energy state. Therefore, a continuously changing channel is divided into M + 1 intervals, and their transmission powers are expressed as P D k,i , i = 0, 1 . . . , M ., where k represents the typical D2D transmission pair to be studied. P D k,0 = 0 indicates the transmit power of the cut-off area.
This article uses the underlying D2D communication mode, in which D2D users multiplex the channel frequency of CU users. Interference comes from CU users and D2D users that use the same frequency in this cell and neighboring cells. We can write the Shannon capacity at a typical D2D k receiver as where d k,k , h k,k and α D represent the transmission distance between typical D2D user pairs, small-scale fading coefficients and path loss factors, respectively. I C and I D represent interference from cellular users and other D2D users on the same channel, respectively. σ 2 is the noise power. The carrierto-interference and noise-ratio CIN R k of a typical D2D user k is defined as follows:

B. QUEUE MODEL AND ASSUMPTION
This article uses EH technology in [8][9][10][11][12], assuming that the energy at the D2D transmitter is derived entirely from the energy harvesting process. The energy harvesting source can be solar or radio frequency energy. Different from [5], we focus on the joint resource scheduling problem with data arrival and energy harvesting coupling. Then, the DECQM model is modeled, which is illustrated in Figure 2. The typical D2D transmitter equips a data buffer and an energy buffer, where the max buffer dimensions of the data buffer and energy buffer are N d and N e , respectively. We define a two-state Markov process (i, j) , i ∈ {0, 1, 2, . . . , N d } , j ∈ {0, 1, 2, . . . , N e } to describe the DECQM, where i and j denote the data queue length and energy queue length, respectively. The D2D transmitter is equipped with an energy collection device, which can perform energy collection and data transmission at the same time. This article assumes that the energy required for data transmission comes entirely from the energy collection process. To ensure the smooth and efficient transmission of data, the collected energy must be fully utilized. This is also the meaning of the formula derivation in the following sections. Since energy harvesting is a continuous process that is not easy to analyze, we discretize this continuous energy harvesting process. We define the size of an energy pack to E 0 and fix the D2D communication transmission slot length to T 0 . The M transmit powers satisfy the following relationship: According to the research results in [22], if the energy consumed is fixed, the stable transmit power in the whole transmission slot is the highest energy efficiency type.
The unit data packet size is fixed to L 0 , and we have . . , M . (5) Thus, we can divide the intervals of different transmit powers as follows: As mentioned above, unlike the cache configuration in [5], each D2D transmitter is equipped with a data buffer and an energy buffer. The buffer size of the data queue is N d , and the buffer size of the energy queue is N e . We consider the more general case, where data arrival and energy arrival obey Poisson distributions with parameters λ d and λ e , respectively. p d x and p e x represent the probability of x data packets or energy packets arriving in the current transmission slot.
The system state includes the data cache state and the energy cache state, as well as the D2D terminal transmit power. We adaptively adjust the system state according to the changing current channel state, which mainly refers to adaptively adjusting the transmission power of the D2D terminal. The strategy of adaptively adjusting the transmission power of the D2D terminal according to the current channel state is explained in the next section. VOLUME 8, 2020

III. ADAPTIVE POWER TRANSFER STRATEGY
The D2D transmitter adaptively adjusts the transmitting power according to the channel state through the collected energy and transmits a fixed-size data packet L 0 in a fixed time slot T 0 . Considering the Poisson point process distribution of D2D users and CU users, we divide the M + 1 channel state intervals according to channel gain as p ch x , x = 0, 1, 2 . . . , M . p ch x indicates the probability that the current transmission slot channel is in the first state x.
Case 0: the typical D2D user k in the current transmission time slot satisfies CIN R k < CIN R k,M . The current channel status is lower than the cut-off transmission threshold CIN R k,M . The channel status is so poor that data transmission cannot be performed. At this time, more energy packets are consumed, and it is too costly to transmit data with larger transmission power.
Case 1: the typical D2D user k in the current transmission time slot satisfies CIN R k > CIN R k,1 , which indicates that the current channel conditions are very good and exceeds the set fixed power threshold. Thus, a fixed transmit power can be used, and a single energy packet can send a data packet.
Case i: the typical D2D user k in the current transmission time slot satisfies CIN R k,i−1 > CIN R k > CIN R k,i , which indicates that the current channel conditions are average, which is lower than the set fixed power threshold. The adaptive power transmission strategy proposed in this paper needs to be adopted. Different transmission powers can be adjusted according to changes in current channel conditions, and a unit of data packets can be sent with different numbers of energy packets.
Proposition 1: The probability p ch x , x = 0, 1, 2 . . . , M in the current transmission slot with state x is expressed as follows: (11) where (12), as shown at the bottom of the page, Proof: See Appendix.

IV. QBD METHOD FOR SOLVING DECQM
The QBD process is the latest development of the classic birth-death process, and it has become an important tool for the analysis of complex stochastic models in the past three decades. In section III, we conclude the system model and queuing model, which becomes very complicated after considering the energy and data coupling relationship. Therefore, we introduce the QBD process method to solve it in this section. First, we define a two-dimensional Markov process (i, j) , i ∈ {0, 1, 2 . . . , N d } , j ∈ {0, 1, 2 . . . , N e } that contains (N d + 1) 2 (N e + 1) 2 states. Since the Markov process (i, j) becomes more complicated with the increase in the data buffer N d and the energy buffer N e , we can use the QBD method to solve it step by step [6]. A subset of states where i and j are horizontal and phase, respectively. The transition probability matrix of DECQM is expressed as where the elements on the secondary diagonal A i , i = 1, 2 . . . , N d and the elements on the main diagonal B i , i = 0, 1 . . . , N d are submatrices of the transition probability. The complex boundary in this article is reflected in the fact that the N e triangular elements on the entire C J I , First, the expression of the transition probability submatrix where P A i m,n with m, n = 0, 1 . . . , N d denotes the transition probability from state subset L i to state subset L i+1 . Different from the Bernoulli arrival hypothesis of data and energy in the literature [6], this paper considers the more general Poisson arrival, so the boundary conditions of the transfer concept matrix are more complicated.
After careful deduction, we find that the elements in the principal diagonal, the N e diagonals of the upper triangle, and the M diagonals of the lower triangle are not zero. The other elements are zero. Jointly considering the APT protocol and the data and energy Poisson arrival, the nonzero elements in A i , i = 0, 1 . . . , N d − 1 are given as where P A i j,j , j = 0, 1 . . . , N e is the expression of the principal diagonal element in A i , i = 0, 1 . . . , N d − 1. P A i j+m,j , m = 1, 2 . . . , M − 1 is the expression of the secondary diagonal element in the upper triangle, and P A i j+M ,j is the boundary condition. P A i j,j+n , n = 1, 2 . . . , N e − 2 is the expression of the secondary diagonal element in the upper triangle, and P A i j,j+n , n = N e − 1, N e is the boundary condition.

VOLUME 8, 2020
A N d is the boundary condition of submatrix A i in the transition probability matrix of P, which is given as where P B i m,n with m, n = 0, 1 . . . , N d denotes the transition probability from state subset L i to state subset L i . Similar to A i , i = 1, 2 . . . , N d − 1, the elements in the principal diagonal, the N e diagonals of the upper triangle, and the M diagonals of the lower triangle are not zero. The other elements are zero. Jointly considering the APT protocol and the data and energy Poisson arrival, the nonzero elements in B i , i = 1, 2 . . . , N d − 1 are given as where P B i j,j , j = 0, 1 . . . , N e is the expression of the principal diagonal element in B i , i = 1, 2 . . . , N d − 1. P B i j+m,j , m = 1, 2 . . . , M − 1 is the expression of the secondary diagonal element in the upper triangle, and P B i j+M ,j is the boundary condition. P B i j,j+n , n = 1, 2 . . . , N e − 2 is the expression of the secondary diagonal element in the upper triangle, and P B i j,j+n , n = N e − 1, N e is the boundary condition.  In section III, we propose a novel adaptive power transfer strategy that allows more energy depletion to improve data delay. To depict the tradeoff between the data delay and VOLUME 8, 2020 energy depletion, the relationship between N e and M needs to be discussed.
When N e < M and n = 1, 2 . . . , N e − 2, we have As seen from the transition probability matrix P, the entire upper triangular element is not zero, which leads to a more complicated formula derivation. Since the upper triangular elements are transition probability submatrices, and the number depends on N e , it is impossible to discuss their expressions one by one. After careful deduction, we give the general expression of the upper triangular elements C J I , I = 0, 1 . . . , N d − 1, J = 1, 2 . . . , N e as follows.
We can adopt the same method to acquire the submatrix of transition probability C J I , and the nonzero elements in them are given as follows.
Adopting the same analytical method as previously, we find that the elements in the principal diagonal, the N e diagonals of the upper triangle, and the M diagonals of the lower triangle are also not zero. To derive the expression of the principal diagonal element P C J I j,j , the relationship between N e and M should be discussed.
When N e > 2M and n = 1, 2 . . . , N e − 1, we have When N e = 2M and n = 1, 2 . . . , N e − 1, we have When N e < 2M and n = 1, 2 . . . , N e − 1, we have the expression of P C J I j,j in formula (47), as shown at the bottom of the next page.
The elements in the diagonals of the lower triangle are given as in formulas (48), as shown at the bottom of the next page, and (49).  C N e I is the boundary condition of submatrix C J I in the transition probability matrix of P, which is given as When I = 0, 1 . . . , N d −1, J = N e , we have the elements in the principal diagonal and in the upper triangle as follows: To derive the expression of the upper triangle element, the relationship between n and M should be discussed.
When n < M , we have We introduce the QBD process method to solve DECQM, which makes the problem description and analysis easy. Through deduction, we summarized all the transition states in detail. Thus, we can use a long-running process to obtain the steady-state distribution of QECQM, which further obtains a series of system performance parameters. Some conclusions about the DECQM model are described in the next section.

V. QUEUING STATIONARY ANALYSIS AND PERFORMANCE PARAMETER A. CAUSALITY RELATIONSHIP AND THE ITERATIVE SOLUTION
The queuing system of the steady-state distribution (SSD) of the DECQM can be obtained after a long-running process. Therefore, the SSD of the DECQM is defined as = [π(i, j)] , i = 0, 1 . . . , N d , j = 0, 1 . . . , N e when the time shaft is infinity. The SSD of the data and energy queue length are represented as follows: , π e (0) . . . , π e (N e )] π e (j) = where d and e are the SSD of the data and energy queue length, respectively. According to normalization to the sum of probability, the SSD of the DECQM is solved by classic queuing theory as follows: To gain the steady-state distribution and transition probability matrix P, we provide a fixed point iteration algorithm to solve (64) as Algorithm 1 in Table 1.

B. PERFORMANCE METRICS
After obtaining the SSD of the DECQM by the fixed point iteration algorithm, we can obtain some system performance parameters, such as the average data queue length, average energy queue length, average data throughput, average energy consumption, average data delay, average energy consumption delay, data drop probability and energy drop probability.
The expressions of average data queue length E [Q data ] and average energy queue length E Q energy can be acquired as follows: Given the channel state x probability of current transmission slot p ch x , x = 0, 1 . . . , M , the average data throughput E [Thr data ] and average energy consumption E Thr energy can be acquired as Then, the average data delay E [Del data ] and average energy consumption delay E Del energy are obtained by Little's formula as follows: We can also write the closed-form expressions of data drop probability p Drop data and energy drop probability p Drop energy as follows:

VI. SIMULATION RESULTS
In the previous sections, we used stochastic geometry theory and quasi-birth-and-death process methods to derive various performance indicators for EH D2D underlying cellular networks. For the performance of the average queue length, average throughput, average delay, and average packet loss rate of data and energy packets in DECQM, this section  uses the Monte Carlo method to verify the correctness of the formula derivation. Then, practical guidance for real system performance with physical significance is given. For the adaptive power transmission selection factor M in section III, a larger value of M indicates a more flexible power selection strategy. The setting of the M value indicates that for changing channel conditions, the maximum allows M energy packets to send one data packet, thereby offsetting the delay caused by poor channel conditions. That is, even under poor channel conditions, we can sacrifice more energy in exchange for improvement in system delay performance. We compare the performance difference between the fixed power transmission (CPT) strategy in the literature [6] and our proposed APT strategy. For a clearer comparison, the channel condition dividing value is set as M = 2, 4, 6. For the Poisson arrival process, the data arrival rate parameter λ d and energy arrival rate parameter λ e satisfy 0 ≤ λ d ≤ 1 and 0 ≤ λ e ≤ 1, respectively. The size of the data packet remains fixed, and one data packet contains 50 bits. The same simulation setting is used as in [6], and its specific simulation parameters are shown in Table 2. Figure 3 uses a 3D graph to describe the performance curve where the average queue length changes with average data arrival and energy arrival when DECQM is fixed at M = 2. It can be seen from the figure that as the average data arrival rate increases, and the average energy arrival rate decreases, Queue data gradually increases and tends to saturation. In particular, when the average energy arrival rate in the system is low, as the average data arrival rate λ d increases, the data queue quickly saturates because it cannot be sent. Conversely, if the average data arrival rate of a fixed system increases as the average energy arrival rate λ e increases, the data queue slowly decreases and then stabilizes. This is because, with the abundance of energy in the system, the APT strategy allows us to use more energy to send data under poor channel conditions, thereby alleviating the pressure on the data queue. Figure 4 depicts the average data throughput as a function of the average data arrival rate when the fixed average energy reaches λ e = 1.6. From the figure, it is not difficult to find that with the change in the average data arrival rate, the theoretical expression of average data throughput and the computer simulation value tend to be basically consistent, further verifying the correctness of the theoretical derivation. The average energy reaches a fixed level, which means that the energy supply in the middle system is relatively stable. As shown in Figure 4, as the average data arrival increases, increasingly more data are sent in the system. In general, all the average data throughput curves increase with increasing data arrival and then become saturated. The throughput performance of the APT strategy proposed in this paper is better than the CPT strategy in [6]. When M = 2, 4, 6, M = 2 is the best choice. Specifically, after the energy of the fixed system arrives, when the data reach a lower level, the performance advantage of M = 2 is not obvious. However, with the gradual increase in the average data arrival rate, the performance gap gradually becomes prominent because it can provide maximum data throughput and moderate energy consumption.
As shown in Figure 5, the theoretical derivation and the computer simulation curve basically coincide, which verifies the correctness of the theoretical derivation. Under different M values, as the average data arrival rate increases, the average data delay of DECQM begins to increase and then remains basically stable. The use of a large M indicates that more energy packets are needed to send the data packets,  and as a result, the data delay performance is degraded. Excessive consumption of energy packets will cause the delay of subsequent data packets due to insufficient energy. Figure 5 illustrates the average data delay as a function of the average data arrival when λ e = 1.6. In general, all the average data delay curves increase with increasing data arrival and then become saturated. The data delay performance of the APT strategy proposed in this paper is better than the CPT strategy in [6]. As mentioned earlier, when the channel status is not good, using a larger M value and consuming more energy packets to send a packet may reduce the data delay in the buffer, but it does not always achieve the best data throughput performance. For the proposed APT strategy, all the average data delay curves with different M values increase with increasing average data arrival. The M =2 has the lowest delay, M = 4 is second, and M = 6 has the worst delay. Figure 6 uses a 3D graph to describe the performance curve where the average data delay varies with average data arrival and energy arrival when DECQM is fixed at M = 2. It can be seen from the figure that as the average data arrival rate increases and the average energy arrival rate decreases, Delay data gradually increases. In particular, when the average energy arrival rate in the system is low, as the average data arrival rate increases, the data delay increases rapidly because the data cannot be sent in time. Conversely, if the average data arrival rate of a fixed system increases, as the average energy arrival rate increases, the data delay will gradually decrease and then reach stability. This is because, with the abundance of energy in the system, the APT strategy allows us to use more energy to send data under poor channel conditions, thereby alleviating the pressure of data transmission. Figure 7 depicts the performance curve of the average data packet loss rate as the average data arrives. It can be seen from the figure that the theoretical derivation value of the average data packet loss rate is basically consistent with the computer simulation value, further verifying the correctness of the numerical derivation. When λ e = 1.6, all the average data drop rate curves increase with the increase in data arrival and then finally become saturated. The data drop rate of the APT strategy proposed in this paper is better than the CPT strategy in [6]. When the energy resources are more abundant than the data resources, such as λ d < 0.1, the data packet loss performance curve of M = 2, 4, 6 is relatively close. When λ d > 0.1, as the data resources gradually increase, the performance difference in the data packet loss rate with different M values is only highlighted. Strategy M = 2 has the best performance, and strategy M = 4 and strategy M = 6 have the worst performance.

VII. CONCLUSION
In this paper, we focus on buffer-aided energy harvesting device-to-device communication networks. We describe the packet and energy packet Poisson arrival and the interactive leaving scenario. Then, the system model is described as the DECQM model. Applying the quasi-birth and death method, we solve the steady-state distribution of the proposed coupling processor queuing model and derive a few system performance parameters. The simulation verifies the accuracy of the theoretical derivation results.

APPENDICES PROOF OF THEOREM 1
We take p ch 0 as an example, which represents the probability of the CINR of typical D2D user k in current transmission time slot satisfy CIN R k < CIN R k,M .
Taking into account the properties of random geometry, we have where f r (r) is the distance from the typical D2D user to the served BS, which is equal to 2r R 2 , r ∈ (0, R). Because of g k,k ∼ exp (1), By further simplification and conversion, we can obtain By following the same derivation given in [6], the coverage probability is expressed as |r , I C , I D rdr  YING WANG received the bachelor's degree in telecommunication engineering from the Nanjing University of Posts and Telecommunications (NUPT), China, in 2014, where she is currently pursuing the Ph.D. degree in communication and information system. Her research interests include optimization for wireless communication and physical resource allocation for D2D Networking. VOLUME 8, 2020