Machine Learning-Based Energy-Spectrum Two-Dimensional Cognition in Energy Harvesting CRNs

Energy harvesting cognitive radio network (EH-CRN) is a promising approach to address the shortage of spectrum resources and the increase of energy consumption simultaneously in wireless networks. In this article, we propose a novel machine learning (ML)-based energy-spectrum two-dimensional (2D) cognition technology to improve the sensing accuracy as well as the network throughput in EH-CRNs, which consists of sensing, prediction and decision modules. More specifically, we first study the 2D sensing module which is achieved by a carefully constructed dynamic Bayesian network (DBN) to effectively exploit the coupling between spectrum usage and energy harvesting in EH-CRNs. Then we propose a deep neural network (DNN) based 2D transmission decision module to optimize the transmission energy of secondary users (SUs). With our proposed novel 2D cognition scheme, SUs can characterize the energy-spectrum correlation and transmit data with optimal transmission energy. The proposed ML-based 2D cognition is evaluated via extensive simulations in terms of sensing accuracy, prediction accuracy, and network throughput, and simulation results indicate that our proposed scheme significantly outperforms the conventional one-dimensional (1D) cognition scheme working in spectrum or energy dimension only.


I. INTRODUCTION
With the rapid increase of service demands in wireless communications, the limited spectrum resource has become a bottleneck. However, studies reveal that the utilization efficiency of most licensed bands is considerably low [1]. To resolve this apparent contradiction between limited spectrum resources and low spectrum utilization, researchers have proposed cognitive radio networks (CRNs) [2], which allow unlicensed secondary users (SUs) to sense the occupancy status of the licensed spectrum and to access spectrum left idle by the primary users (PUs), thereby improving spectrum efficiency. Meanwhile, along with the fast growth of wireless services, energy consumption has also become a critical issue for wireless communication networks because of increasing concerns about energy cost and carbon emission [3].
The associate editor coordinating the review of this manuscript and approving it for publication was Waleed Ejaz.
To meet the dual challenges of improving spectrum efficiency and reducing energy consumption, the energy harvesting CRN (EH-CRN) driven by renewable energy has received increasing attention from both academia and industry [3]-[8]. EH-CRNs are usually defined as CRNs wherein SUs harvest energy from electromagnetic radiation, light, thermal gradients, or fluid flow in the environment. For some EH-CRNs deployed in harsh environments, where a grid-based power supply to PUs and SUs is unavailable, PUs also harvest energy from the environment to transmit data [6]. Hence both PUs and SUs can perform energy harvesting in EH-CRNs, which may find interesting and critical applications in many practical scenarios. In this case, the licensed spectrum status (occupied or idle) is determined not only by whether PUs have data to transmit, but also by whether PUs have harvested enough energy. To this end, SUs shall jointly utilize the information in the spectrum and energy dimensions to estimate the licensed spectrum status and thereby use spectrum and energy more efficiently. This new problem is very challenging and is the focus of our paper.
There are some traditional methods focusing on sensing the spectrum status in EH-CRNs [4], [5]. For example, Chung et al. consider a fixed-length temporal sequential sensing system, which takes advantage of temporal correlation within the fixed duration in EH-CRNs, and the average throughput of the secondary network is maximized by balancing the sensing duration and the sensing threshold [4]. In a paper by Olawole et al. [5], SUs at different locations implement a cooperative spectrum sensing scheme, which takes into account the spatial correlation among different SUs, and the network throughput is enhanced by employing a near-optimal mixed integer nonlinear sensing. However, traditional methods that use the temporal correlation in the data transmission of PUs and SUs separately [4] cannot fully exploit the spatial correlations between PUs and SUs, while methods exploiting spatial correlation [5] have difficulty adapting to continuous changes that are correlated in the time domain. A lack of consideration for the temporal-spatial correlations between PUs and SUs may lead to inaccurate sensing results. Therefore, novel schemes that can characterize the temporal-spatial correlation are called for to achieve more efficient EH-CRNs.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Recently, machine learning (ML) techniques have been applied to conventional CRNs and EH-CRNs to exploit the temporal-spatial correlations between PUs and SUs. Some ML-based temporal-spatial sensing methods have been discussed for conventional CRNs to improve the sensing performance. For example, Qiu et al. apply a recurrent neural network (RNN) to a temporal-spatial model to improve the cognition accuracy and traffic prediction performance of CRNs [9]. In a paper by Han et al. [10], the licensed spectrum status is obtained from both spatial and temporal domains by using a Bayesian network (BN). Moreover, there are some emerging works focusing on temporal-spatial sensing in EH-CRNs. Specifically, some reinforcement learning (RL) based energy harvesting resource allocation schemes are proposed to enhance the capacity of EH-CRNs [7], [8]. However, the works in [7] and [8] do not make use of the potential correlation between PUs and SUs in the energy dimension. Note that such correlation is an inherent but possibly hidden behavior in the process of energy harvesting, and in-depth mining is required to grasp its essential mechanism. It is worth pointing out that the licensed spectrum access of PUs is constrained by their harvested energy, and hence the spectrum cognition capability and spectrum utilization can be improved by finely mining the correlation of harvested energy between PUs and SUs. However, most existing ML-based sensing methods [7]-[10] are not capable of jointly characterizing temporal-spatial correlations in both the energy and spectrum dimensions. To overcome this difficulty, the dynamic Bayesian network (DBN) has been introduced in EH-CRNs, as it has the potential to jointly characterize the information from the spectrum and energy dimensions [11]. In [11], Zhang et al. are the first to integrate the correlations between PUs and SUs in the energy and spectrum dimensions, applying DBN to improve the sensing accuracy in EH-CRNs.
However, that work does not consider spectrum status prediction or the transmission decision for future slots. In fact, the spectrum prediction and the transmission decision for the future slot are critical for efficient spectrum utilization. To elaborate further, if the SUs transmit data based only on the sensing results, the transmission time is limited to what remains of the current slot after sensing, and a large amount of sensing overhead is not fully utilized. In addition, it is assumed in [11] that SUs can only use the energy harvested at the moment to transmit data, without an energy buffer to store energy, which is not practical and may waste energy as compared to energy storage scenarios. To sum up, the existing works applying ML techniques in EH-CRNs cannot achieve comprehensive energy-spectrum two-dimensional (2D) cognition, which should consider the energy storage and the spectrum access in the forthcoming transmission slots to enhance spectrum utilization through smart and proactive decisions. To this end, there is an urgent need to investigate a comprehensive ML-based energy-spectrum 2D cognition scheme that integrates spectrum status sensing and prediction for the transmission decision of SU.
In this article, we consider an EH-CRN scenario where both PUs and SUs perform energy harvesting and are equipped with energy buffers for practical energy storage. In particular, we apply ML techniques to energy-spectrum 2D cognition, which can effectively exploit the correlation between spectrum usage and energy harvesting in EH-CRNs. Our proposed energy-spectrum 2D cognition scheme integrates spectrum status sensing and prediction for the transmission decision of SU. It consists of sensing, prediction, and decision modules, which improve the sensing accuracy, the prediction accuracy, and the network throughput. To be specific, the contributions of this article are as follows.
1) We design a new energy-spectrum 2D cognition scheme, which contains not only the spectrum sensing module but also the state prediction module and the transmission energy decision module. Different from the existing cognition schemes that only contain a sensing module [7], [8], [10], [11], our proposed 2D cognition scheme is more comprehensive and practical. In particular, the prediction module and the decision module complement spectrum sensing by predicting the licensed spectrum status from the observations of SUs and the estimations of PUs, thus making a smart and proactive decision on the transmission energy of SUs for enhanced performance.
2) We propose a DBN-based energy-spectrum 2D sensing module, which exploits 2D correlations between PUs and SUs, and estimates the harvested energy and the transmission energy of PUs at the current slot. Note that these correlations are not mined in traditional 1D sensing [4], [5] or existing ML-based temporal-spatial sensing [7]-[10] schemes. Compared with these existing works, our proposed DBN-based 2D sensing characterizes the energy-spectrum 2D correlations by using the hidden Markov model (HMM), a graphical realization of DBN, which effectively reveals the inherent behaviors of these complicated correlations. Empowered by DBN-based 2D sensing, the sensing accuracy can be significantly improved.
3) To further enhance network throughput, we propose a 2D prediction module to predict the transmission energy of PUs, the sensing signal of SUs, and the harvested energy of PUs and SUs in the forthcoming slots, and further propose an ML-based 2D decision module to determine the transmission energy of SUs by using the prediction results. Different from previous works, in which SUs always transmit data with a greedy strategy that exhausts the harvested energy in each slot [11], SUs in our scheme can dynamically optimize energy storage and transmission in the forthcoming slots to achieve better spectrum utilization.
The remainder of this article is organized as follows. Section II describes the system model of EH-CRN. In Section III, problem formulation in an energy-spectrum 2D cognition scheme of EH-CRN is analyzed, which contains the sensing, prediction, and decision modules. In Section IV, the proposed ML-based energy-spectrum 2D cognition algorithm is studied. Simulation results are shown in Section V. In Section VI, conclusions and future research directions are discussed.

II. SYSTEM MODEL
As illustrated in Fig. 1, we consider an energy-spectrum joint cognitive system in an EH-CRN. In the following part of this section, the network structure, spectrum sensing model, energy harvesting correlation, prediction and decision for transmission of SU are presented.

A. NETWORK STRUCTURE
As shown in Fig. 1, there is a primary network composed of a primary base station (BS) and an energy harvesting PU, and a secondary network composed of a secondary BS and an energy harvesting SU. The primary BS and secondary BS receive the data transmitted by PU and SU, respectively. Both PU and SU perform energy harvesting to collect energy for transmission. The spectrum is licensed to the PU, thus SU needs to detect the spectrum status in order to access the idle spectrum. Besides, both PU and SU are equipped with batteries that serve as energy buffers to store the harvested energy. Within our considered scenario, whether the licensed spectrum is occupied depends not only on whether the PU has data to transmit, but also on whether the harvested energy of PU is enough for transmission. In this case, the SU has to sense both the transmitted signal of PU in the spectrum dimension and the harvested energy of PU in the energy dimension, which motivates the energy-spectrum 2D cognition of SU. Figure 1 illustrates the working process of an EH-CRN with energy-spectrum 2D cognition. Both PU and SU harvest energy from the environment for data transmission. If there is a transmission mission in the primary network, the PU will transmit data as soon as enough energy is harvested. For SU, when there is a transmission mission in the secondary network, it will transmit data only if the licensed spectrum status is idle. Therefore, the SU performs 2D sensing to jointly estimate the spectrum utilization and the harvested energy of PU at the current slot. Armed with the sensing results, the SU can carry out prediction and decision for the future slot, which facilitates its transmission.
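The coupling between spectrum occupancy and harvested energy described above can be illustrated with a toy simulation. The following is a minimal sketch under stated assumptions, not the system model of this article: the per-slot data arrival probability, the discrete harvest levels, the fixed per-transmission energy cost, and the unbounded buffer are all hypothetical choices made for illustration.

```python
import random

def simulate_pu_occupancy(num_slots, data_prob, harvest_levels, tx_cost, seed=0):
    """Return a list of 0/1 flags, 1 meaning the PU occupies the spectrum.

    Assumptions (hypothetical): one pending packet at most, a fixed energy
    cost per transmission, and an energy buffer with unlimited capacity.
    """
    rng = random.Random(seed)
    buffer_energy = 0.0
    pending = False
    occupancy = []
    for _ in range(num_slots):
        # Harvest energy first; the amount is drawn from the given levels.
        buffer_energy += rng.choice(harvest_levels)
        # A transmission mission stays pending until it is served.
        pending = pending or rng.random() < data_prob
        # The PU transmits as soon as it has data AND enough stored energy.
        if pending and buffer_energy >= tx_cost:
            buffer_energy -= tx_cost
            pending = False
            occupancy.append(1)
        else:
            occupancy.append(0)
    return occupancy

# Same traffic statistics, different energy environments.
occ_rich = simulate_pu_occupancy(1000, 0.8, [1.0, 2.0], tx_cost=1.0)
occ_poor = simulate_pu_occupancy(1000, 0.8, [0.0, 0.5], tx_cost=1.0)
print(sum(occ_rich), sum(occ_poor))
```

With identical traffic, the energy-poor PU occupies the spectrum far less often, which is why an SU that also tracks the energy dimension has more information about the spectrum status than one that senses the spectrum alone.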

B. SPECTRUM SENSING MODEL
Spectrum sensing is based on detecting the signals transmitted by the PU through local observations at the SU; matched filter detection, energy detection, and cyclostationary feature detection are among the popular solutions [2]. Energy detection, which combines simple operation with high detection accuracy [12], is adopted in the spectrum sensing module of SU. The energy of the detected signal is compared with a given detection threshold to judge whether the PU transmits data at the current slot. Specifically, let $s_t$ denote the transmitted signal of PU and $r_t(n)$ denote the received signal, observed over $N$ samples at slot $t$, where $n = 1, 2, \ldots, N$. The signal detected at SU is given by (1), where $n_t(n)$ is the additive white Gaussian noise (AWGN) at the $n$th sampling point in slot $t$, $u_t$ is the transmitted signal of SU, which is known to SU, and $h_t$ is the channel coefficient from PU to SU, which does not vary across the $N$ sampling points. Assuming the number of samples $N$ is sufficiently large, the total energy of the detected signal at SU, $Y_t$, follows a Gaussian distribution whose mean depends on the transmitted signal of PU $s_t$. It is worth noting that the detection energy is different from the harvested energy: the amount of detection energy is very small and cannot be used for the transmission of SU [12].
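As an illustration of energy detection, the sketch below sums the squared samples of one slot and compares the result with a threshold. It is a simplified example and not the exact model in (1): the samples are real-valued, the SU's own known signal is assumed to have been removed already, and the values of `h`, `s`, `noise_std`, and `THRESHOLD` are hypothetical.

```python
import random

def energy_statistic(samples):
    """Total energy of the samples collected in one slot."""
    return sum(x * x for x in samples)

def sense_slot(pu_active, num_samples=500, h=1.0, s=1.0, noise_std=1.0, rng=None):
    """Simulate one sensing slot and return the energy statistic.

    Assumption: each sample is h*s plus Gaussian noise when the PU is
    active, and noise only when it is idle.
    """
    if rng is None:
        rng = random.Random()
    signal = h * s if pu_active else 0.0
    received = [signal + rng.gauss(0.0, noise_std) for _ in range(num_samples)]
    return energy_statistic(received)

# Expected statistic: N*sigma^2 = 500 when idle, N*(sigma^2 + (h*s)^2) = 1000
# when occupied, so a threshold midway between the two means is used here.
THRESHOLD = 750.0
rng = random.Random(1)
y_idle = sense_slot(False, rng=rng)
y_busy = sense_slot(True, rng=rng)
print(y_idle > THRESHOLD, y_busy > THRESHOLD)
```

Because the statistic is a sum over many samples, it concentrates around its mean, which is the Gaussian behavior for large sample counts noted in the text.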

C. ENERGY HARVESTING CORRELATION
Both PU and SU need to harvest energy from the environment for data transmission. Therefore, both PU and SU are equipped with energy harvesting devices. The energy harvested at a geographically adjacent PU and SU is strongly correlated, since the environment presents a great deal of continuity both in time and space.
In the time domain, given that the amount of energy harvested is closely related to the ambient environment, e.g., solar energy depends on weather conditions and wind energy depends on wind speed, the amount of energy harvested varies across multiple timescales. The temporal correlation in the energy dimension can be exploited by using time-series data from the previous slots. For example, the harvested energy may follow a Markov process over time horizons from several milliseconds up to a few minutes [13].
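A Markov model of the harvested energy can be sketched as a small discrete-state chain. The energy levels and transition probabilities below are hypothetical values chosen for illustration; they are not taken from [13] or from the dataset used in this article.

```python
import random

# Hypothetical harvested-energy levels per slot (arbitrary units).
ENERGY_LEVELS = [0.0, 1.0, 2.0]
# Hypothetical transition matrix: row = current level, column = next level.
# Zero entries forbid direct jumps between the lowest and highest level.
TRANSITION = [
    [0.8, 0.2, 0.0],
    [0.1, 0.8, 0.1],
    [0.0, 0.2, 0.8],
]

def simulate_harvest(num_slots, start_state=1, seed=0):
    """Sample a harvested-energy trace from the Markov chain above."""
    rng = random.Random(seed)
    state = start_state
    trace = []
    for _ in range(num_slots):
        trace.append(ENERGY_LEVELS[state])
        # Draw the next state according to the current row of TRANSITION.
        state = rng.choices(range(len(ENERGY_LEVELS)), weights=TRANSITION[state])[0]
    return trace

trace = simulate_harvest(200)
print(trace[:10])
```

The large diagonal entries make consecutive slots similar, which is the temporal correlation that time-series data from previous slots can exploit.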
The spatial correlation between PU and SU in the energy dimension can be further observed by using existing datasets. For example, one dataset [14] from four photovoltaic (PV) sites is utilized here for illustrative purposes. Like the energy harvesting PUs and SUs, the PV sites are used for energy harvesting, so the spatial correlation patterns among these PV sites may be similar to those between the PUs and SUs, thus offering valuable insights to our work. More specifically, we have analyzed the relationship between the harvested energy of each pair of geographically neighboring PV sites. Enumerating all the pair-wise relationships among the four PV sites yields six pairs, and approximately linear correlations are observable, as shown in Fig. 2. These observations indicate that the temporal and spatial correlations of energy harvesting can be exploited in EH-CRNs, since PU and SU are adjacent to each other. The correlations provide additional information about PU activities, enabling SU to utilize the licensed spectrum efficiently.
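The strength of such an approximately linear relationship can be quantified with the Pearson correlation coefficient. The sketch below uses synthetic data in place of the PV dataset [14]; the daily profile, the scaling factor 0.9, and the noise level are assumptions made purely for illustration.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(0)
# Synthetic "site A": a clipped sinusoid that mimics day and night output.
site_a = [max(0.0, math.sin(math.pi * k / 24)) * 5.0 for k in range(48)]
# Synthetic "site B": a scaled copy of site A with small independent noise.
site_b = [0.9 * a + rng.gauss(0.0, 0.2) for a in site_a]

rho = pearson(site_a, site_b)
print(round(rho, 3))
```

A coefficient close to 1 corresponds to the near-linear scatter described for Fig. 2; applying the same function to each of the six site pairs would give one coefficient per pair.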

D. PREDICTION FOR TRANSMISSION OF SU
If the SU has data to transmit and has harvested enough energy, it must ensure that the PU is not accessing the licensed spectrum. The SU can estimate the status of the licensed spectrum by using the sensing results; if the spectrum status is idle, SU can transmit data. It is worth noting that if only the sensing module is considered, the SU cannot transmit data until it obtains the sensing results in every time slot, which wastes spectrum resources. In contrast, in cognition schemes that include both sensing and prediction modules, the SU can predict the status of the licensed spectrum in the forthcoming slots based on the information at the current slot. In this way, the status of the licensed spectrum at the current slot has already been predicted in the previous slot, so the SU can transmit data at the beginning of the slot, which improves spectrum utilization efficiency.

E. DECISION FOR TRANSMISSION OF SU
The Shannon formula provides a general and useful metric for relating the transmission rate to the energy utilization of an EH-CRN over an AWGN channel. The instantaneous rates of PU, $R^{PU}_t$, and SU, $R^{SU}_t$ (in bits per second), over a licensed bandwidth are given by (2) and (3), where $E^{T,PU}_t$ and $E^{T,SU}_t$ are the transmission energies of PU and SU, neither of which exceeds the residual available energy, $h^{PU}_t$ and $h^{SU}_t$ are the channel coefficients from PU and SU to the BS, $B$ is the system bandwidth, $N_0$ is the thermal noise power density, and $t$ is the duration of data transmission. Note that when PU and SU transmit data at the same time, their spectrum accesses conflict and the signals sent by PU and SU interfere with each other. The network throughput is further defined in (4), where $d^{PU}_t$ and $d^{SU}_t$ are the spectrum access indexes of PU and SU, which indicate whether the PU or SU accesses the licensed spectrum and can only take the value 0 or 1.
From (3), the instantaneous rate and the transmission energy of SU show a logarithmic relationship, which means that the increase of the instantaneous rate will gradually slow down as the transmission energy keeps increasing. Meanwhile, reserving the energy of SU that is not used for transmission at the current slot can increase the available energy for the forthcoming transmission of SU to improve the network throughput. Note that greedily increasing the transmission energy of SU can maximize the instantaneous rate of SU but may decrease the network throughput. Therefore, a smart and proactive decision on the transmission energy of SU is necessary to balance the increase of the transmission energy and the reservation of more available energy for the future transmission.
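The diminishing return of the logarithmic rate can be checked numerically. The sketch below assumes an interference-free rate of the form B log2(1 + (E/t)|h|^2 / (N0 B)); this form and all parameter values are assumptions for illustration and are not necessarily identical to (2) and (3).

```python
import math

def rate_bps(tx_energy, h=1.0, bandwidth=1e6, n0=1e-9, duration=1.0):
    """Assumed Shannon-type rate for a given transmission energy.

    The energy is spread evenly over the transmission duration, so the
    transmit power is tx_energy / duration (hypothetical model).
    """
    power = tx_energy / duration
    snr = power * abs(h) ** 2 / (n0 * bandwidth)
    return bandwidth * math.log2(1.0 + snr)

total_energy = 2e-2  # hypothetical energy available over two slots

# Greedy: spend everything in the first slot, nothing in the second.
greedy = rate_bps(total_energy) + rate_bps(0.0)
# Balanced: reserve half of the energy for the second slot.
balanced = 2 * rate_bps(total_energy / 2)

print(greedy, balanced)
```

Under these assumptions the balanced split delivers more bits over the two slots than the greedy one, which is the concavity argument behind reserving energy for future transmission.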
To achieve the transmission decision of SU, the spectrum status should be accurately sensed and predicted, and the transmission energy should be optimized. The problems are formulated in a comprehensive cognition model in the next section.

III. PROBLEM FORMULATION IN 2D COGNITION SCHEME
In this section, we formulate the problem in our new energy-spectrum 2D cognition scheme of EH-CRN, which contains the sensing, prediction, and decision modules. In particular, the sensing and prediction accuracy, evaluated by the detection probability and the false alarm probability, is enhanced in our proposed sensing and prediction modules, and the network throughput of EH-CRN is improved by optimizing the transmission energy of SU in the decision module.

A. 2D SENSING MODULE
The sensing module detects whether the licensed spectrum is accessed by PU, which can be expressed as a binary hypothesis test between $H_0$ (idle) and $H_1$ (occupied). In order to evaluate the sensing accuracy of the spectrum status, the detection probability $P_d$ and the false alarm probability $P_f$ are introduced. The detection probability $P_d$ represents the probability that PU accesses the licensed spectrum and meanwhile SU detects the existence of the signal of PU in the spectrum. The false alarm probability $P_f$ represents the probability that PU does not access the licensed spectrum while SU detects the existence of the signal of PU in the spectrum. In a sensing problem, it is expected that $P_d$ is as large as possible when $P_f$ is constant, and $P_f$ is as small as possible when $P_d$ is constant.
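Empirically, the two metrics can be estimated from labeled slots. The sketch below uses the conventional conditional definitions (decisions of "occupied" counted separately over truly occupied and truly idle slots), which is an assumption about how the probabilities are normalized; the label sequences are made up for illustration.

```python
def detection_and_false_alarm(truth, decisions):
    """Estimate (P_d, P_f) from 0/1 sequences.

    truth[i] = 1 if the PU really occupied slot i, else 0.
    decisions[i] = 1 if the SU declared slot i occupied, else 0.
    """
    on_occupied = [d for t, d in zip(truth, decisions) if t == 1]
    on_idle = [d for t, d in zip(truth, decisions) if t == 0]
    p_d = sum(on_occupied) / len(on_occupied) if on_occupied else float("nan")
    p_f = sum(on_idle) / len(on_idle) if on_idle else float("nan")
    return p_d, p_f

truth = [1, 1, 1, 1, 0, 0, 0, 0]
decisions = [1, 1, 1, 0, 1, 0, 0, 0]
p_d, p_f = detection_and_false_alarm(truth, decisions)
print(p_d, p_f)  # 0.75 0.25
```

Sweeping the detection threshold and recording the resulting pairs gives the usual trade-off curve, on which a better scheme shows a higher detection probability at the same false alarm probability.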
According to (1), the sensing signal in the licensed spectrum $Y_t$ is obtained. The 1D spectrum sensing problem can be formulated as maximizing the conditional probability of the transmitted signal of PU $s_t$ given the sensing signals, i.e., $\max P(s_t \mid Y_0, \ldots, Y_t)$, as in (5). Considering the 1D spectrum sensing accuracy, the detection probability $P_d$ and the false alarm probability $P_f$ can be expressed as in (6) and (7), where $\lambda_1$ is the threshold of the spectrum sensing. According to the temporal and spatial correlations in the energy dimension, there is a linear relationship between the harvested energy of PU $E^{PU}_t$ and that of SU $E^{SU}_t$, so the 1D energy sensing problem can be formulated as maximizing $P(E^{PU}_t \mid E^{SU}_0, \ldots, E^{SU}_t)$, as in (8). Considering the 1D energy sensing accuracy, the detection probability $P_d$ and the false alarm probability $P_f$ can be expressed as in (9) and (10), where $\lambda_2$ is the threshold of the energy sensing.
To sum up, the above 1D spectrum sensing and 1D energy sensing can be achieved by using the observable information at SU in the spectrum or energy dimension. However, neither of them considers the correlation information from the other dimension. If the spectrum sensing and energy sensing are jointly considered, the SU can detect the licensed spectrum more accurately and comprehensively. Therefore, the energy-spectrum 2D sensing is proposed to jointly exploit the observable information at SU in the spectrum and energy dimensions. More specifically, the transmitted signal of PU $s_t$ and the harvested energy of PU $E^{PU}_t$ are jointly estimated by utilizing the sensing signals of SU $Y_0, \ldots, Y_t$ and the harvested energy of SU $E^{SU}_0, \ldots, E^{SU}_t$. The energy-spectrum 2D sensing problem can be formulated as maximizing the joint conditional probability $P(s_t, E^{PU}_t \mid Y_0, \ldots, Y_t, E^{SU}_0, \ldots, E^{SU}_t)$, as in (11). Furthermore, the detection probability $P_d$ and the false alarm probability $P_f$ of 2D sensing can be expressed as in (12) and (13). It is noted that the traditional optimization methods used in 1D spectrum or energy sensing can hardly solve the 2D sensing problem without prior information on the spectrum-energy 2D correlation.
Specifically, the 1D sensing methods can estimate the transmitted signal of PU $s_t$ and the harvested energy of PU $E^{PU}_t$ according to (5) and (8), respectively. However, the transmitted signal of PU $s_t$ in the spectrum dimension and the harvested energy of PU $E^{PU}_t$ in the energy dimension are not independent of each other. The harvested energy at SU $E^{SU}_t$ therefore also carries information about the transmitted signal of PU $s_t$. Moreover, the correlation coefficient between the conditional probabilities $P(s_t \mid Y_0, \ldots, Y_t)$ and $P(E^{PU}_t \mid E^{SU}_0, \ldots, E^{SU}_t)$ is not a constant, but a multi-dimensional function of $s_t$ and $E^{PU}_t$. Without prior information on this coefficient, the traditional methods can only assume a specific form for the correlation function, which lacks generalization ability.
Thanks to the ML method DBN, the 2D correlation can be characterized automatically and intelligently without any prior information on the correlation. DBN is based on the probabilistic graphical model and uses the transition probabilities as parameters to characterize the relationships among the nodes in the graph; these parameters are trained from the statistics of historical data. On the basis of this feature, we utilize DBN to connect the observation information of $E^{SU}_t$ and $Y_t$ to the detected information of $E^{PU}_t$ and $s_t$ through the state transition matrices. The matrices consist of the conditional probabilities between the observation information at SU and the estimated information of PU. The conditional probabilities can be obtained by training the DBN without prior information about the correlation. In this way, the transmitted signal of PU $s_t$ and the harvested energy of PU $E^{PU}_t$ can be jointly estimated by DBN in a more accurate and generalized way, which will be discussed in detail in Section IV. With the 2D sensing results of the unobservable information, all information of PU and SU at the current slot can be obtained. By using the known information at the current slot and the energy-spectrum correlation characterized in DBN, the information of PU and SU in the forthcoming slots can be predicted in the 2D prediction module. The licensed spectrum status can be estimated as well through further operations.
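The idea of learning conditional probabilities from historical statistics and then estimating hidden PU states from SU observations can be sketched with a plain discrete HMM. This is a simplified stand-in and not the DBN constructed in this article: hidden and observed quantities are collapsed into integer indexes, parameters are obtained by counting with additive smoothing, and estimation uses the standard forward filter. All names and data are hypothetical.

```python
def normalize(row):
    """Scale a list of non-negative numbers so that it sums to one."""
    total = sum(row)
    return [v / total for v in row] if total > 0 else [1.0 / len(row)] * len(row)

def fit_hmm(hidden_seq, obs_seq, n_hidden, n_obs, smoothing=1.0):
    """Estimate transition and emission matrices by smoothed counting."""
    trans = [[smoothing] * n_hidden for _ in range(n_hidden)]
    emit = [[smoothing] * n_obs for _ in range(n_hidden)]
    for prev, cur in zip(hidden_seq, hidden_seq[1:]):
        trans[prev][cur] += 1
    for state, obs in zip(hidden_seq, obs_seq):
        emit[state][obs] += 1
    return [normalize(r) for r in trans], [normalize(r) for r in emit]

def forward_filter(obs_seq, trans, emit, prior):
    """Posterior over the current hidden state given all observations so far."""
    belief = list(prior)
    n = len(belief)
    for step, obs in enumerate(obs_seq):
        if step > 0:
            # Time update: propagate the belief through the transition matrix.
            belief = [sum(belief[i] * trans[i][j] for i in range(n)) for j in range(n)]
        # Measurement update: weight by the likelihood of the observation.
        belief = normalize([belief[j] * emit[j][obs] for j in range(n)])
    return belief

# Hypothetical history: hidden PU state (0 = idle, 1 = active) recorded
# offline, and the quantized SU observation recorded in the same slots.
history_hidden = [0, 0, 0, 1, 1, 1, 0, 0, 1, 1]
history_obs = [0, 0, 0, 1, 1, 1, 0, 0, 1, 1]
trans, emit = fit_hmm(history_hidden, history_obs, n_hidden=2, n_obs=2)

belief = forward_filter([1, 1], trans, emit, prior=[0.5, 0.5])
print(belief)
```

In a 2D setting the hidden index would encode a pair of PU quantities and the observation index a pair of SU quantities, so that a single set of matrices captures the coupling between the two dimensions.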

B. 2D PREDICTION MODULE
In the 2D cognition scheme of EH-CRN, the 2D prediction module is cascaded with the 2D sensing module to predict the states of PU and SU in the spectrum and energy dimensions in the forthcoming slots. Specifically, the sensing signal of SU $Y_t$ and the harvested energy of SU $E^{SU}_t$ are observed at SU as the observation information, and the transmitted signal of PU $s_t$ and the harvested energy of PU $E^{PU}_t$ are estimated in the 2D sensing module as the sensing results. By using the observation information and the sensing results at the current slot, the 2D prediction module aims to predict the harvested energy at PU $E^{PU}_{t+n}$, the transmitted signal of PU $s_{t+n}$, the spectrum sensing signal at SU $Y_{t+n}$, the harvested energy at SU $E^{SU}_{t+n}$, and the residual energy in the buffer of SU $E^{R}_{t+n}$, where $n > 0$ indexes the forthcoming time slot. In particular, one-slot look-ahead prediction is applied when $n = 1$.
On the other hand, the energy-spectrum 2D correlation has already been characterized by the conditional transition probabilities in the sensing module. The forthcoming states of PU and SU can be predicted by maximizing the conditional transition probabilities given the states in the current slot. The problem of 2D prediction based on the conditional transition probabilities can be formulated as the maximization problems P3 and P4, given in (14)-(18). Note that all of the conditional transition probabilities in problems (14)-(18) are obtained by DBN in the sensing module, which will be detailed in Section IV. As in the sensing module, the detection probability $P_d$ and the false alarm probability $P_f$ can be used as the evaluation criteria of the prediction module to represent the prediction accuracy. In particular, the detection probability $P_d$ represents the probability that the prediction is correct when the licensed spectrum status is occupied, while the false alarm probability $P_f$ represents the probability that the prediction is wrong when the licensed spectrum status is idle; they can be expressed as in (19) and (20). It should be pointed out that the estimation of the future spectrum status depends only on the prediction results of PU, $s_{t+n}$ and $E^{PU}_{t+n}$, while all of the prediction results of PU and SU are used as the input for the spectrum-energy 2D transmission decision of SU in the 2D decision module that follows.
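Prediction by maximizing a transition probability can be illustrated on a small discrete chain: the current belief is propagated n steps through the transition matrix and the most probable state is selected. The two-state matrix below is a hypothetical example, not a matrix learned in this article.

```python
def predict_state(belief, trans, n_ahead=1):
    """Propagate a state distribution n_ahead slots and pick the likeliest state.

    belief: probability of each state at the current slot.
    trans:  trans[i][j] = P(next state = j | current state = i).
    """
    dist = list(belief)
    for _ in range(n_ahead):
        dist = [
            sum(dist[i] * trans[i][j] for i in range(len(dist)))
            for j in range(len(trans[0]))
        ]
    best = max(range(len(dist)), key=dist.__getitem__)
    return best, dist

# Hypothetical chain: state 0 = spectrum idle, state 1 = spectrum occupied.
trans = [[0.9, 0.1],
         [0.3, 0.7]]

state_1, dist_1 = predict_state([0.0, 1.0], trans, n_ahead=1)
state_3, dist_3 = predict_state([0.0, 1.0], trans, n_ahead=3)
print(state_1, dist_1)
print(state_3, dist_3)
```

Starting from an occupied slot, the one-slot look-ahead still predicts "occupied", whereas three slots ahead the distribution has drifted toward "idle"; this shows why the prediction horizon n matters for the accuracy metrics above.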

C. 2D DECISION MODULE
Through the 2D sensing and prediction modules, the states of PU and SU in the spectrum and energy dimensions are obtained. The 2D decision module is the last step in the energy-spectrum 2D cognition scheme; it uses the prediction results to decide how much energy the SU allocates to accessing the spectrum. In a conventional CRN, SU is usually assumed to be stably powered, and thus SU transmits data with the maximum transmission energy when a spectrum hole is detected. In an EH-CRN, the transmission energy of SU is limited by the amount of energy harvested. The 2D transmission decision can improve the network throughput in two respects. First, the better sensing performance of 2D sensing provides more opportunities for SUs to access the idle spectrum while protecting PUs. Second, the 2D transmission decision can take advantage of and adapt to the random energy arrivals more effectively, because it can make full use of historical observations of both spectrum and energy.
Our objective is to maximize the network throughput, which drives SU to increase the transmission energy to improve the instantaneous throughput, and meanwhile to reserve more available energy in the buffer for future transmission. Therefore, the optimal transmission energy of SU needs to balance the improvement of the instantaneous throughput against future transmission. Assume that the size of the data packet to be transmitted by SU at slot $(t+n)$ is $K_{t+n}$. When the instantaneous transmission rate is sufficient to complete the data packet transmission, the transmission rate does not need to be increased. The optimal energy allocation problem at each forthcoming slot can be formulated as the maximization in (21), subject to constraints C1 in (22) and C2 in (23). Here, $E^{T,PU}_{t+n}$ and $E^{T,SU}_{t+n}$ are the transmission energies of PU and SU, $d^{PU}_{t+n}$ and $d^{SU}_{t+n}$ are the spectrum access indexes of PU and SU, which indicate whether the PU or SU accesses the licensed spectrum and can only take the value 0 or 1, $R^{PU}_{t+n}$ and $R^{SU}_{t+n}$ are the instantaneous rates of PU and SU, and $E^{R}_{t+n}$ is the residual energy at the beginning of the slot. Constraint (22) states that SU will not increase the transmission energy once the rate is sufficient for data packet transmission in the slot. Constraint (23) states that the transmission energy cannot exceed the sum of the energy harvested in the slot and the residual energy at the beginning of the slot.
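To make the trade-off concrete, the sketch below solves a two-slot toy version of the allocation problem by grid search. It is not the formulation in (21)-(23): it assumes perfectly known spectrum status, packet sizes, and channel gains, an interference-free rate of the form B log2(1 + (E/t)|h|^2 / (N0 B)), and hypothetical parameter values. It does reflect the two constraints described above, namely the packet-size cap and the energy budget.

```python
import math

def su_bits(tx_energy, packet_bits, h=1.0, bandwidth=1e6, n0=1e-9, duration=1.0):
    """Bits delivered in one slot under the assumed rate, capped by packet size."""
    snr = (tx_energy / duration) * abs(h) ** 2 / (n0 * bandwidth)
    return min(bandwidth * math.log2(1.0 + snr) * duration, packet_bits)

def best_two_slot_allocation(residual, harvest, packets, idle, steps=100):
    """Grid search over the first-slot energy; returns (total_bits, e1, e2).

    residual: energy in the buffer before slot 1.
    harvest:  energy harvested in slots 1 and 2.
    packets:  packet sizes (bits) for slots 1 and 2.
    idle:     whether the spectrum is idle in slots 1 and 2.
    """
    budget1 = residual + harvest[0]          # energy budget of slot 1
    best = (-1.0, 0.0, 0.0)
    for k in range(steps + 1):
        e1 = budget1 * k / steps if idle[0] else 0.0
        budget2 = budget1 - e1 + harvest[1]  # unused energy carries over
        e2 = budget2 if idle[1] else 0.0     # last slot: nothing left to save for
        total = su_bits(e1, packets[0]) + su_bits(e2, packets[1])
        if total > best[0]:
            best = (total, e1, e2)
    return best

large = best_two_slot_allocation(0.0, [0.02, 0.0], [1e7, 1e7], [True, True])
small_first = best_two_slot_allocation(0.0, [0.02, 0.0], [1e6, 1e7], [True, True])
print(large)
print(small_first)
```

With large packets the search splits the energy evenly between the slots, while a small first packet leads it to spend only what that packet needs and to carry the rest forward. In the setting of this article these inputs are uncertain, which motivates replacing such a search with a learned decision rule.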
In the 2D transmission decision, the spectrum access index of PU $d^{PU}_{t+n}$ cannot be known accurately and can only be inferred from the sensing and prediction results. Furthermore, although the size of transmitted data packets $K_{t+n}$ and the channel coefficients $h^{PU}_{t+n}$, $h^{SU}_{t+n}$ can be measured and recorded at slot $(t+n)$, they are difficult to obtain accurately in advance for making the future decision at the current slot. Because $h^{PU}_{t+n}$, $h^{SU}_{t+n}$, and $K_{t+n}$ cannot be accurately obtained as prior information, the traditional convex optimization method cannot solve the optimal energy allocation problem. ML methods are able to connect input and output without knowing the prior information that relates them, using the neural network parameters to fit the mapping instead. To reduce the impact of the uncertainty in the spectrum access index $d^{PU}_{t+n}$, the size of data packets $K_{t+n}$, and the channel coefficients $h^{PU}_{t+n}$, $h^{SU}_{t+n}$, a DNN-based transmission decision module is proposed, which establishes the relationship between the prediction results and the optimal transmission energy without additional prior information about $d^{PU}_{t+n}$, $K_{t+n}$, $h^{PU}_{t+n}$, $h^{SU}_{t+n}$. The DNN-based transmission decision, which aims to maximize the network throughput, is detailed in the following section.
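The notion of fitting a mapping from prediction results to a transmission energy can be sketched with a very small neural network. The example below is an illustrative toy and not the DNN of this article: it uses a single hidden layer written in plain Python, two hypothetical input features (a predicted idle probability and the available energy), and synthetic labels generated by an arbitrary rule that stands in for the optimal energies computed offline.

```python
import math
import random

def init_mlp(n_in, n_hidden, rng):
    """Random weights for a one-hidden-layer network with scalar output."""
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    return [w1, b1, w2, 0.0]

def forward(params, x):
    """Return the network output and the hidden activations."""
    w1, b1, w2, b2 = params
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    return sum(w * h for w, h in zip(w2, hidden)) + b2, hidden

def sgd_step(params, x, target, lr=0.05):
    """One stochastic gradient step on the squared error 0.5 * (y - target)^2."""
    w1, b1, w2, b2 = params
    y, hidden = forward(params, x)
    err = y - target
    for j, h in enumerate(hidden):
        grad_pre = err * w2[j] * (1.0 - h * h)  # gradient through tanh
        w2[j] -= lr * err * h
        for i, xi in enumerate(x):
            w1[j][i] -= lr * grad_pre * xi
        b1[j] -= lr * grad_pre
    params[3] = b2 - lr * err

def mean_loss(params, data):
    return sum(0.5 * (forward(params, x)[0] - t) ** 2 for x, t in data) / len(data)

rng = random.Random(0)
# Synthetic training set: label = spend half of the energy, scaled by the
# predicted idle probability (an arbitrary rule used only as a stand-in).
data = []
for _ in range(200):
    p_idle, energy = rng.random(), rng.random()
    data.append(([p_idle, energy], 0.5 * p_idle * energy))

params = init_mlp(2, 8, rng)
initial_loss = mean_loss(params, data)
for _ in range(100):
    for x, t in data:
        sgd_step(params, x, t)
final_loss = mean_loss(params, data)
print(initial_loss, final_loss)
```

Once trained, such a network is evaluated with a single forward pass per slot, so no explicit knowledge of the packet size or channel coefficients is needed at decision time.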

IV. ML-BASED ENERGY-SPECTRUM 2D COGNITION ALGORITHM
To solve the problems of each module in 2D cognition, an ML-based energy-spectrum 2D cognition algorithm is proposed. Specifically, the energy-spectrum 2D correlations are characterized by DBN, the transmitted signal of PU $s_t$ and the harvested energy of PU $E^{PU}_t$ are jointly estimated by DBN in the sensing module, the states of PU and SU in the forthcoming slots are predicted in the prediction module, and the transmission energy of SU is optimized by DNN in the decision module.

A. OVERVIEW ON ML-BASED ENERGY-SPECTRUM 2D COGNITION ALGORITHM
In this subsection, the ML-based energy-spectrum 2D cognition is described in detail. Fig. 3 illustrates the structure of energy-spectrum 2D cognition, which contains the interactive relationship among the offline training modules and the online cognition modules. Historical information, consisting of the PU and SU states, the sizes of data packets, and the channel conditions in the EH-CRN, is used for the training of DBN in the sensing module and DNN in the decision module. The historical information is recorded at PU, SU and BS with real data at consecutive time slots. To elaborate further, the harvested energy and the transmission energy of PU are recorded at PU; the spectrum sensing signal, the harvested energy, and the size of data packets of SU are recorded at SU; and the channel conditions of PU and SU are recorded at BS. It is noted that the recorded information of PU cannot be used as live information at SU in the online cognition, because the shortage of spectrum resources and the lack of a stable energy supply limit the information exchange between PU and SU.
Both offline training and online cognition based on DBN and DNN are carried out at SU. On the offline side, DBN is trained on the historical data of the PU and SU states, and DNN is trained on all of the historical data, consisting of the states of PU and SU, the sizes of the transmitted data packets, and the channel coefficients of PU and SU in the EH-CRN. With the trained DBN and DNN, the energy-spectrum 2D correlations and the transmission decision policy of SU are obtained, which are then used in the online cognition. On the online side, the spectrum sensing signal and the harvested energy of SU at the current slot are detected by SU as the current observation. The DBN-based 2D sensing module estimates the hidden states, i.e., the harvested energy and the transmission energy of PU, from the current observation. The future states of PU and SU are predicted in the 2D prediction module based on the current observation and the current estimate of the hidden states. The future states are input into the 2D decision module, and the transmission energy of SU is output. Finally, the SU transmits the data packet with the optimal transmission energy output by the DNN.

B. 2D CORRELATIONS CHARACTERIZED BY DBN
In this subsection, we adopt DBN to exploit the energy-spectrum 2D correlations during offline training and to achieve 2D sensing during online computing. We first adopt an HMM structure, which is the probabilistic graphical model of DBN, to model the process of energy-spectrum 2D cognition. The 2D correlations are represented by the state transition matrices in the HMM structure. Note that the transition probabilities in the state transition matrices are the parameters of DBN, which are trained on historical data. Based on the trained DBN, we propose an online algorithm to solve the 2D sensing problem.

1) THE PROBABILISTIC GRAPHICAL MODEL OF DBN
We adopt DBN to model the energy harvesting and spectrum access of PU, the correlation between the harvested energy of SU and PU, and the correlation between the transmitted signals of PU and the spectrum detection of SU. Markovian behavior of PU is widely assumed in sensing the spectrum hole (e.g., in [11], [13]), and the energy arrivals of both PU and SU are modeled as finite-state Markov chains, since finite-state Markov chains strike a good balance between model complexity and accuracy for renewable energy harvesting. The HMM structure is adopted as the probabilistic graphical model of DBN to divide the states in the finite-state Markov chains into hidden states and observed states, as shown in Fig. 4(a).
In EH-CRN, the SU occupies the licensed spectrum only when the PU does not access it. However, the harvested energy and the transmission energy of PU cannot be observed directly at SU to confirm the spectrum status. Furthermore, the shortage of spectrum resources and the lack of a stable energy supply limit the information exchange between PU and SU. Therefore, the SU can only estimate the hidden states of PU from its own observations. Specifically, in the proposed HMM structure, the harvested energy and the transmission energy at PU are hidden from SU, while the spectrum sensing signal, the harvested energy and the residual energy at SU are the observations. The detailed modeling is discussed as follows.
The proposed HMM structure is composed of the pair of distributions $(B_0, B_{\rightarrow})$, where $B_0$ is an initial distribution as shown in Fig. 4, and the residual energy at SU is assumed to be empty at the beginning. $B_{\rightarrow}$ is a two-slice temporal Markovian chain as shown in Fig. 4(b), which defines the transition probabilities $P(X_t | X_{t-1})$ by means of a directed acyclic graph, with the corresponding transition probabilities factorized over the states and their parents as

$$P(X_t | X_{t-1}) = \prod_{s} P\big(X^s_t \,\big|\, Pa(X^s_t)\big),$$

where the hidden states $H^E_t$ and $H^T_t$ denote the harvested energy and the transmission energy at PU, and $O^S_t$, $O^E_t$, $O^R_t$ denote the spectrum sensing at SU, the harvested energy at SU, and the residual energy at SU, which are the observed states of SU. To ensure that the number of states is finite, the values of each state $X^s_t$ are discretized into $N_s$ different levels from 0 to $(N_s - 1)$. HMM can well represent the correlations between PU and SU in both energy and spectrum dimensions by using the transition probabilities among the states. $Pa(X^s_t)$ is the parent of $X^s_t$ in Fig. 4(b). In particular, both the state of the transmission energy at PU and the state of the harvested energy at SU have two parents, and hence their current states depend not only on their own states at the previous slot but also on the parent states at the current slot.
The transition matrices contain the spatial and temporal correlation information in the energy and spectrum dimensions. The transition matrix of the harvested energy state at PUs $A \in \mathbb{R}^{N_1 \times N_1}$, the transition matrix of the transmission energy state at PUs $B(H^E_t) \in \mathbb{R}^{N_1 \times N_2}$, the transition matrix of the spectrum sensing power state at SUs $C \in \mathbb{R}^{N_2 \times N_3}$, and the transition matrix of the harvested energy state at SUs $D(H^E_t) \in \mathbb{R}^{N_4 \times N_4}$ represent the temporal correlation in the energy dimension, the correlation between the energy and spectrum dimensions, the spatial correlation in the spectrum dimension, and the temporal-spatial correlation in the energy dimension, respectively. It is noted that $B(H^E_t)$ and $D(H^E_t)$ are functions of $H^E_t$, because the state of the transmission energy at PU $H^T_t$ and the state of the harvested energy at SU $O^E_t$ depend on multiple parent nodes and cannot be represented by one constant matrix. Here, $a_{i,j}$, $b_{i,j,m}$, $c_{i,j}$, $d_{i,j,m}$ are the specific conditional transition probabilities in the transition matrices, which are also the parameters of the DBN, $W_{DBN} = \{a_{i,j}, b_{i,j,m}, c_{i,j}, d_{i,j,m}\}$. Next, these parameters are calculated by the offline training algorithm of the DBN.
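To make the difference between a constant transition matrix and a parent-dependent one concrete, the following minimal sketch stores $A$ as a single row-stochastic matrix and $B(H^E_t)$ as one matrix per value of the parent state. The index convention (rows for the previous state, columns for the next state, a leading index `m` for the parent), the state counts, and the random entries are assumptions made only for illustration.

```python
import random

def random_stochastic_matrix(rows, cols, rng):
    """Return a rows x cols matrix whose rows are probability distributions."""
    matrix = []
    for _ in range(rows):
        weights = [rng.random() + 1e-9 for _ in range(cols)]
        total = sum(weights)
        matrix.append([w / total for w in weights])
    return matrix

def sample_next_state(row, rng):
    """Draw the next state index from one row of a transition matrix."""
    return rng.choices(range(len(row)), weights=row, k=1)[0]

rng = random.Random(0)
N_E, N_T = 4, 3   # hypothetical numbers of energy and transmission levels

# A[i][j]: P(H^E_t = j | H^E_{t-1} = i), a single constant matrix.
A = random_stochastic_matrix(N_E, N_E, rng)

# B[m][i][j]: P(H^T_t = j | H^T_{t-1} = i, H^E_t = m), one matrix per parent state m.
B = [random_stochastic_matrix(N_T, N_T, rng) for _ in range(N_E)]

h_e_prev, h_t_prev = 0, 0
h_e = sample_next_state(A[h_e_prev], rng)        # harvested energy evolves first
h_t = sample_next_state(B[h_e][h_t_prev], rng)   # then the state with two parents
```

The extra leading index is what the text means by a state that cannot be represented by one constant matrix: the row used for $H^T_t$ is selected by both its previous value and the current $H^E_t$.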

2) OFFLINE TRAINING OF DBN
After establishing the probabilistic graph of DBN, we propose an offline training algorithm to calculate the parameters of DBN. Note that the parameters of DBN $W_{DBN} = \{a_{i,j}, b_{i,j,m}, c_{i,j}, d_{i,j,m}\}$ contain the transition probabilities of each state in the probabilistic graphical model HMM, and the values of the parameters all range from 0 to 1. The parameters can be trained by using the iterative parameter estimation [11]. To be more specific, let $W^k_{DBN}$ denote the estimated parameters in the $k$th iteration. The initial parameters $W^0_{DBN}$ in the iteration $k = 0$ are randomly initialized. Then, in each iteration, the newly estimated parameters $W^k_{DBN}$ are calculated from the historical states and the previous parameters $W^{k-1}_{DBN}$. The iteration process continues until the parameters converge.
In particular, the expectation of the log-likelihood function $L(W^k_{DBN}; W^{k-1}_{DBN})$ is used to evaluate the gap between the parameters of the $k$th and the $(k-1)$th iterations, and the expected log-likelihood is non-decreasing over the iterations. The estimated parameters in $W^k_{DBN}$ are obtained by maximizing $L(W^k_{DBN}; W^{k-1}_{DBN})$ in each iteration $k \geq 1$, as expressed in (31) and (32), where $a^k_{i,j}$, $b^k_{i,j,m}$, $c^k_{i,j}$, $d^k_{i,j,m}$ are the conditional probabilities in the state transition matrices $A^k$, $B^k(H^E_t)$, $C^k$, $D^k(H^E_t)$, respectively, each ranging from 0 to 1. From (32), the probability term $P(\cdot)$ is a constant for slot $t$, and the maximization problem can be separated into four subproblems, one for each of the last additive terms, to optimize the parameters $a^k_{i,j}$, $b^k_{i,j,m}$, $c^k_{i,j}$, $d^k_{i,j,m}$, respectively. Following [11], the optimized values of the parameters in each transition matrix at iteration $k$ can then be calculated in closed form.

Note that the iterative algorithm is only guaranteed to converge to a stationary point, not to a global maximum, and the point reached depends on the initialization of the estimated parameters. This is because problem (31) is solved in each iteration conditioned on the previous parameters, so the parameters that are globally optimal within one iteration are optimal only with respect to the parameters of the previous iteration. Since the parameters in the initial iteration are determined randomly, the convergence point of the iterative algorithm is related to the initial parameters. For this reason, the iterative algorithm needs to be repeated several times with different initial values, and the parameters corresponding to the maximum value of the log-likelihood function are then selected. Assume that the number of iterations is $R$. In summary, the offline training algorithm of DBN is given as Algorithm 1.

Algorithm 1 (offline training of DBN), final steps:
...
Record the expectation of log-likelihood value and the parameters as $L(W^r_{DBN}; W^{r-1}_{DBN})$ and $W^r_{DBN}$, respectively.
10: end for
11: Output the DBN parameters with the maximum expectation of log-likelihood value from the record, $W^m_{DBN}$.
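The restart strategy described above, i.e., repeating an iterative estimation from several random initial points and keeping the run with the largest log-likelihood, can be sketched on a much simpler model than the DBN. The example below is only an illustration under stated assumptions: it uses expectation-maximization for an equal-weight mixture of two biased coins as a stand-in, and all function names and the toy data are hypothetical.

```python
import math
import random

def log_likelihood(data, n, p_a, p_b):
    """Observed-data log-likelihood of an equal-weight mixture of two coins."""
    total = 0.0
    for h in data:
        like_a = p_a ** h * (1 - p_a) ** (n - h)
        like_b = p_b ** h * (1 - p_b) ** (n - h)
        total += math.log(0.5 * like_a + 0.5 * like_b)
    return total

def em_once(data, n, p_a, p_b, tol=1e-9, max_iter=500):
    """Run EM from one initial point until the log-likelihood stops improving."""
    history = [log_likelihood(data, n, p_a, p_b)]
    for _ in range(max_iter):
        # E-step: posterior probability that each session used coin A.
        resp = []
        for h in data:
            like_a = p_a ** h * (1 - p_a) ** (n - h)
            like_b = p_b ** h * (1 - p_b) ** (n - h)
            resp.append(like_a / (like_a + like_b))
        # M-step: re-estimate each bias from the weighted head counts.
        weight_a = sum(resp)
        weight_b = len(data) - weight_a
        p_a = sum(r * h for r, h in zip(resp, data)) / (weight_a * n)
        p_b = sum((1 - r) * h for r, h in zip(resp, data)) / (weight_b * n)
        history.append(log_likelihood(data, n, p_a, p_b))
        if history[-1] - history[-2] < tol:
            break
    return (p_a, p_b), history

def em_with_restarts(data, n, restarts, seed=0):
    """Repeat EM from random initial points and keep the best converged run."""
    rng = random.Random(seed)
    best_params, best_ll = None, -math.inf
    for _ in range(restarts):
        p_a, p_b = rng.uniform(0.05, 0.95), rng.uniform(0.05, 0.95)
        params, history = em_once(data, n, p_a, p_b)
        if history[-1] > best_ll:
            best_params, best_ll = params, history[-1]
    return best_params, best_ll

heads = [9, 8, 2, 1, 9, 2, 8, 1]   # toy head counts out of n = 10 flips
params, ll = em_with_restarts(heads, n=10, restarts=5)
```

The `history` list returned by `em_once` is non-decreasing, which mirrors the non-decreasing expected log-likelihood noted in the text, while the outer loop plays the role of the repeated runs with different initial values.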
After the training process of DBN in Algorithm 1, the conditional transition probabilities $a_{i,j}$, $b_{i,j,m}$, $c_{i,j}$, $d_{i,j,m}$, which are the parameters of DBN, are calculated. In other words, the energy-spectrum correlations characterized by the optimal state transition matrices $A$, $B(H^E_t)$, $C$, $D(H^E_t)$ are obtained. With these correlations, the hidden states of PU can be estimated in the 2D sensing module.

3) ONLINE 2D SENSING BY TRAINED DBN
Based on the HMM structure established for EH-CRN, SU can use the observed information, i.e., the spectrum sensing of received signals $O^S_t$ and SU's harvested energy $O^E_t$, to infer the hidden information, i.e., PU's harvested energy $H^E_t$ and transmission energy $H^T_t$. The 2D sensing process can be achieved by maximizing the conditional probability of the hidden states according to the problem formulation in (11). Hence, the estimates of the hidden states under 2D sensing, $(\hat{H}^E_t, \hat{H}^T_t)$, are obtained as the maximizers of this conditional probability.

As shown in Fig. 5, a DNN is proposed to build the connection from the future states of PU and SU, $\hat{H}^E$, $\hat{H}^T$, $\hat{O}^S$, $\hat{O}^E$, $\hat{O}^R$, to the transmission energy of SU, $D^T$, which performs data interaction with the communication environment to enhance the network throughput. The proposed DNN has three layers, namely the input layer, the hidden layer and the output layer. The prediction results are fed to the input layer, and the transmission energy of SU is produced at the output layer after the computation of the DNN. We define $W_{DNN} = \{w_{in}, w_{out}\}$ as the parameters of the DNN. The parameters need to be trained on the historical dataset, which drives the DNN to output the optimal transmission energy.
Here, $f_D$ denotes the mapping from the prediction results to the transmission energy, which is realized by the trained DNN, i.e., $D^T = f_D(\hat{H}^E, \hat{H}^T, \hat{O}^S, \hat{O}^E, \hat{O}^R)$.
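As a minimal sketch of such a mapping, the following forward pass takes the five predicted states as input and returns one transmission energy, using the two weight sets $w_{in}$ and $w_{out}$. The ReLU activation, the absence of bias terms, the final clipping to the available energy, and all function names are assumptions introduced for illustration; the trained weights themselves are not reproduced here.

```python
def relu(values):
    """Element-wise rectifier used as the hidden-layer activation (an assumption)."""
    return [max(0.0, v) for v in values]

def dense(weights, inputs):
    """Multiply a weight matrix (list of rows) by an input vector."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]

def decide_transmission_energy(w_in, w_out, predicted_states, available_energy):
    """Map predicted states to a transmission energy with a three-layer network."""
    hidden = relu(dense(w_in, predicted_states))    # input layer -> hidden layer
    raw = dense(w_out, hidden)[0]                   # hidden layer -> single output
    return min(max(raw, 0.0), available_energy)     # keep the output physically feasible

# Hand-picked toy weights: two hidden nodes, five inputs.
w_in = [[1.0, 0.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0, 0.0]]
w_out = [[1.0, 1.0]]
states = [1.0, 2.0, 0.0, 0.0, 0.0]   # stands for the five predicted states
energy = decide_transmission_energy(w_in, w_out, states, available_energy=2.5)
```

With these toy weights the raw network output is 3.0, which the last step limits to the 2.5 units of energy assumed to be available.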

D. COMPLEXITY ANALYSIS
Our proposed ML-based energy-spectrum 2D cognition algorithm is summarized in Algorithm 5, which consists of the offline training processes of DBN and DNN and the online cognition process. The complexity of each process is analyzed as follows.

1) TRAINING OF DBN
DBN needs to repeat the iterative algorithm multiple times to find a maximum convergence value of the objective log-likelihood function. In each iteration, the joint conditional probabilities of the states of PU and SU are calculated with an enumeration method from slot 0 to slot $t$ in Algorithm 1. Therefore, the time complexity of the training of DBN is $O(RN^2)$, where $R$ denotes the number of iterations until a maximum convergence value is found. Compared with the training algorithm in [11], the time complexity is not increased.

2) TRAINING OF DNN
Assume that the numbers of neural nodes in the three layers of DNN are $K_1$, $K_2$, $K_3$. Neural nodes in adjacent layers are connected to each other, which leads to the complexity $O((K_1 K_2 + K_2 K_3)N)$ in the back-propagation training in Algorithm 4.
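The factor $K_1 K_2 + K_2 K_3$ is simply the number of connections between adjacent layers. The following sketch counts the multiply-accumulate operations of one pass through a fully connected network; interpreting $N$ as the number of samples (or slots) processed is an assumption, and the function name and layer sizes are hypothetical.

```python
def mac_count(layer_sizes, num_samples):
    """Multiply-accumulate operations for one pass over num_samples inputs."""
    per_sample = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
    return per_sample * num_samples

# Hypothetical sizes: K1 = 5 inputs, K2 = 16 hidden nodes, K3 = 1 output.
ops = mac_count([5, 16, 1], num_samples=1000)
```

For these sizes the count is $(5 \cdot 16 + 16 \cdot 1) \cdot 1000 = 96000$, which grows linearly in $N$ and in the number of connections, in line with the stated order.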

3) ONLINE 2D COGNITION
Algorithm 5 (ML-based energy-spectrum 2D cognition), final steps:
...
Input the prediction results into DNN to obtain the transmission energy at SU $D^T_{t+n}$ in the decision module.
9: if $t \geq n$ then
10: Transmit data with the transmission energy $D^T_t$ calculated at slot $(t - n)$.
11: end if
12: end for

With the trained DBN and DNN, 2D cognition is realized by 2D sensing, 2D prediction and 2D decision in order. 2D sensing performs a recursive calculation of the joint probability according to (37)-(39) in Algorithm 2, which leads to the complexity $O(N^2)$. Since 2D prediction in Algorithm 3 and 2D decision by DNN are both forward sequential operations, the complexities of the two modules are $O(K_1 N)$ and $O((K_1 K_2 + K_2 K_3)N)$, respectively.
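The quadratic cost of the recursive sensing step can be seen in a generic forward-filtering update for a hidden Markov chain. The sketch below is a simplified stand-in with a single hidden variable and a single observation, not the recursion of (37)-(39); the matrix names, their index conventions, and the numerical values are assumptions for illustration.

```python
def forward_step(belief, transition, emission, observation):
    """One recursive update of P(hidden state | observations so far).

    belief[i]        : P(hidden = i | past observations)
    transition[i][j] : P(hidden_t = j | hidden_{t-1} = i)
    emission[j][o]   : P(observation = o | hidden_t = j)
    """
    n = len(belief)
    # Prediction: sum over previous states, which costs N * N multiplications.
    predicted = [sum(belief[i] * transition[i][j] for i in range(n)) for j in range(n)]
    # Correction: weight each state by the likelihood of the new observation.
    unnormalized = [predicted[j] * emission[j][observation] for j in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    posterior = [u / total for u in unnormalized]
    estimate = max(range(n), key=posterior.__getitem__)   # most probable hidden state
    return posterior, estimate

transition = [[0.9, 0.1], [0.2, 0.8]]
emission = [[0.8, 0.2], [0.3, 0.7]]
posterior, estimate = forward_step([0.5, 0.5], transition, emission, observation=1)
```

Each call reuses the previous posterior as the new `belief`, so the cost per slot stays at the order of $N^2$ regardless of how many slots have elapsed.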

V. PERFORMANCE EVALUATION
In this section, we present the performance evaluation of an EH-CRN to demonstrate the effectiveness of our proposed ML-based energy-spectrum 2D cognition.

A. SYSTEM SETUP
As described in Section II, an EH-CRN with one PU and one SU is considered, and the primary and secondary transmitters are assumed to be powered by energy harvested from the same renewable source, e.g., solar or wind energy.
We illustrate the performance superiority of our proposed ML-based 2D cognition in sensing, prediction, and decision, respectively. Following the parameter setting in [11], the values of the harvested energy of both PU and SU are quantized as $\{0, 1, 2, 3\}$, the channel noise level is set as $\sigma^2 = 3$ dBW, and the channel path loss between the primary and secondary transmitters is $-4$ dB.
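For readers who wish to reproduce a comparable setup, the following sketch converts the decibel quantities to linear scale and maps a continuous energy value to one of four levels. The uniform quantizer and its `max_value` argument are assumptions, since the quantization rule is not specified here, and the function names are hypothetical.

```python
def db_to_linear(value_db):
    """Convert a decibel ratio (dB) or a dBW level to linear scale."""
    return 10.0 ** (value_db / 10.0)

def quantize(value, max_value, levels):
    """Map a value in [0, max_value] to an integer level in {0, ..., levels - 1}."""
    if max_value <= 0 or levels < 2:
        raise ValueError("max_value must be positive and levels at least 2")
    clipped = min(max(value, 0.0), max_value)
    return min(int(clipped / max_value * levels), levels - 1)

noise_power_w = db_to_linear(3.0)    # 3 dBW is about 2 W
path_gain = db_to_linear(-4.0)       # -4 dB is about 0.4 in linear scale
level = quantize(3.9, max_value=8.0, levels=4)
```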

B. 2D SENSING
We use the receiver operating characteristic (ROC) curve to present the sensing performance under different sensing schemes. The ordinate of the ROC curve is the miss detection probability $P_m = 1 - P_d$, and the abscissa is the false alarm probability $P_f$. A smaller area under the curve denotes better sensing performance. Using two detection SNR values, 0 dB and 5 dB, Fig. 6 shows the relationship between the false alarm probability $P_f$ and the miss detection probability $P_m$ for both 1D sensing and 2D sensing. It is observed that 2D sensing achieves much better performance than 1D sensing, because in our proposed 2D sensing scheme the SU takes full advantage of the spatial and temporal correlations in the energy and spectrum dimensions and exploits the correlation between the two dimensions. We also compare the sensing performance of 2D sensing with 1D energy sensing, in which the SU senses the spectrum hole only through its own harvested energy. As shown in Fig. 7, the 2D cognition scheme generates fewer false alarms at a fixed miss detection probability ($P_m = 0.1$). It is worth noting that there is a U-shaped relationship between the false alarm probability and the PU loading under 2D cognition, where the PU loading means the probability that the PU has data to transmit when enough energy is harvested. This is because, when the PU loading is low, the extra information about the harvested energy interferes with the inference of PU behavior. As the PU loading increases, the importance of energy sensing in 2D cognition increases gradually, while the role of spectrum sensing decreases. Continuously increasing the PU loading therefore brings imperfections in 2D cognition. As a result, there is a compromise between the role of PU energy sensing and the role of spectrum detection, which produces the U-shaped 2D curves. On the other hand, 1D energy sensing does not include PU spectrum detection, so there is no such compromise and its performance is monotonic.
Note that 1D energy sensing may deliver nearly the same performance as 2D cognition when both the spatial correlation between the harvested energy of PUs/SUs and the PU loading are very high.
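The two probabilities on the axes of such a curve, and the area used to compare schemes, can be computed from labeled decisions as in the following minimal sketch. The detector scores, labels, and thresholds are toy values invented for illustration, and the function names are hypothetical.

```python
def miss_and_false_alarm(scores, occupied, threshold):
    """Return (P_m, P_f) when the channel is declared occupied if score >= threshold."""
    n_occupied = sum(1 for o in occupied if o)
    n_idle = len(occupied) - n_occupied
    if n_occupied == 0 or n_idle == 0:
        raise ValueError("need at least one occupied and one idle sample")
    misses = sum(1 for s, o in zip(scores, occupied) if o and s < threshold)
    false_alarms = sum(1 for s, o in zip(scores, occupied) if not o and s >= threshold)
    return misses / n_occupied, false_alarms / n_idle

def curve_area(points):
    """Trapezoidal area under a curve given as (P_f, P_m) points."""
    pts = sorted(points)
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))

scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.2]
occupied = [False, False, True, True, True, False]
thresholds = [0.0, 0.3, 0.5, 1.0]
# Reverse each (P_m, P_f) pair so that P_f is the abscissa.
curve = [miss_and_false_alarm(scores, occupied, th)[::-1] for th in thresholds]
area = curve_area(curve)
```

Sweeping the threshold trades misses against false alarms, and a scheme whose curve encloses a smaller area is the better detector under the convention used in this subsection.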

C. 2D PREDICTION
We further compare the prediction performance of 2D cognition and 1D cognition. HMM is used to predict the spectrum hole in the 1D cognition scheme. Spectrum prediction is a classification problem, and the consequences of prediction errors differ depending on whether the licensed spectrum is idle or occupied. The miss detection probability $P_m$ and the false alarm probability $P_f$ can therefore separately represent the prediction quality for the two kinds of spectrum status. In Fig. 8, we again use the ROC curve to present the one-slot look-ahead prediction performance under different cognition schemes. It is observed that the areas under the 2D cognition curves are smaller, which indicates that 2D prediction renders much higher prediction accuracy than 1D prediction.
The historical sensing information is further exploited to predict the spectrum status of the future $N_p$ slots. For instance, $N_p = 0$ means the case of spectrum inference with sensing, and $N_p = 1$ means the case of one-slot look-ahead prediction. As shown in Fig. 9, 2D cognition achieves a considerably higher detection probability at a fixed false alarm probability ($P_f = 0.1$), which means that the PU is better protected from the harmful interference caused by the SU's transmissions.
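For a plain Markov chain, an $N_p$-slot look-ahead amounts to pushing the current state distribution through the transition matrix $N_p$ times. The sketch below illustrates this on a two-state toy chain; it is a generic illustration rather than the prediction module of this article, and the transition values and function name are assumptions.

```python
def predict_ahead(belief, transition, steps):
    """Propagate a state distribution `steps` slots ahead through a Markov chain."""
    if steps < 0:
        raise ValueError("steps must be non-negative")
    current = list(belief)
    n = len(current)
    for _ in range(steps):
        current = [sum(current[i] * transition[i][j] for i in range(n)) for j in range(n)]
    return current

transition = [[0.9, 0.1], [0.2, 0.8]]   # toy idle/occupied chain
now = [1.0, 0.0]                         # the channel is known to be idle now
one_ahead = predict_ahead(now, transition, steps=1)
two_ahead = predict_ahead(now, transition, steps=2)
```

With `steps=0` the function returns the current distribution unchanged, which corresponds to the $N_p = 0$ case of inference from sensing alone. For this toy chain the predicted distribution moves from $[0.9, 0.1]$ to $[0.83, 0.17]$ and approaches the stationary distribution $[2/3, 1/3]$ as the horizon grows.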

D. TRANSMISSION DECISION BASED ON 2D COGNITION
Due to the high correlation of the harvested energy between PU and SU, the PU has a high probability of occupying the spectrum when the SU harvests enough energy. Thus, a small energy buffer is equipped at the SU to help the SU stagger its transmission and improve the energy and spectrum efficiency. In the simulations for Fig. 10, the energy buffer of SU is set as 10 J and the maximum transmit power of PU is 6 W. Two baselines are introduced for comparison: the greedy energy allocation policy, whose philosophy is to exhaust all the stored energy in the current slot if the current slot is detected as idle in order to obtain the maximal instantaneous throughput, and the case without buffering [11]. It can be seen from Fig. 10 that 2D decision obtains a much better average throughput per slot than 1D decision at the same protection level for PU ($P_f = 0.8$). When the detection SNR is 5 dB, the performance gains are up to 30% and 132% for the 2D cognition combined with the greedy energy allocation policy and the optimal energy allocation policy, respectively. In addition, extra performance gains of 24% and 52% are obtained by the optimal 2D decision compared with the greedy policy and the scheme with no energy buffer, respectively.
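The greedy baseline can be sketched as a simple buffer simulation. The example below assumes that energy harvested in a slot is usable in the same slot and that energy beyond the buffer capacity is lost; the function name and the toy energy arrivals are hypothetical.

```python
def greedy_allocation(harvested, idle_detected, buffer_capacity):
    """Spend all stored energy in every slot that is detected as idle.

    harvested     : energy harvested in each slot (J)
    idle_detected : True where the slot is detected as idle
    Returns the energy spent per slot and the final stored energy.
    """
    stored = 0.0
    spent = []
    for energy, idle in zip(harvested, idle_detected):
        stored = min(stored + energy, buffer_capacity)   # overflow is lost
        if idle:
            spent.append(stored)   # greedy: exhaust the buffer at once
            stored = 0.0
        else:
            spent.append(0.0)      # stay silent and keep the energy
    return spent, stored

spent, left = greedy_allocation([3, 4, 5, 2], [False, False, True, False], 10.0)
```

In this toy run the SU accumulates energy over two occupied slots, loses 2 J to overflow in the third slot, and then spends the full 10 J as soon as an idle slot is detected, leaving nothing in reserve for a later idle slot. This is the behavior that an allocation balancing instantaneous and future throughput is meant to avoid.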

VI. CONCLUSION AND FUTURE RESEARCH DIRECTIONS
In this article, we have proposed an ML-based energy-spectrum 2D cognition scheme and discussed its potential applications to EH-CRNs. DBN is used to model the 2D cognition for EH-CRNs, representing the correlations between SUs and PUs in both energy and spectrum dimensions. In addition, the performance of the ML-based 2D cognition scheme is evaluated in three aspects, namely the sensing, prediction, and decision modules. The evaluation results illustrate the superiority of 2D cognition compared to 1D cognition. To encourage more research efforts on ML-based 2D cognition for EH-CRNs, the following research directions are envisioned.

A. MULTI-DIMENSIONAL COGNITION
There are other dimensions in EH-CRNs that may provide extra information to improve spectrum cognition performance in different scenarios. For example, different spectrum bands may jointly provide service to a PU, which causes the occupation probabilities of different spectrum bands to be correlated in multi-band CRNs. Moreover, when there are several SUs that need to use the licensed bands, the location distribution of SUs directly affects spectrum availability and spectrum sensing. Spectrum cognition that takes the spatial domain into consideration can obtain more information to improve the spectrum efficiency through multi-user sensing, which requires further study.

B. MACHINE LEARNING-BASED 2D COGNITION WITH MISSING DATA
In practical systems, it is difficult for energy harvesting SUs to obtain the complete energy and spectrum dimensional historical data. For example, SUs cannot detect the licensed spectrum without enough energy, and thus the information about spectrum status is missing when the SUs do not harvest enough energy. The incompleteness of historical data may degrade the performance of spectrum prediction. Therefore, developing efficient 2D cognitive schemes to handle missing data is an interesting topic for future study.