A Two-Phase Transfer Learning-Based Power Spectrum Maps Reconstruction Algorithm for Underlay Cognitive Radio Networks

In underlay cognitive radio networks, estimating the power spectrum maps (PSMs) is the main challenge in sensing idle wireless radio resources. Traditional deep learning-based algorithms achieve good estimation performance under the hypothesis that the training data are independent and identically distributed (i.i.d.) with the PSMs in the target region. However, collecting PSMs training data is difficult: it is time-consuming and requires a large number of sensing devices. For this reason, we propose a two-phase transfer learning generative adversarial network (TPTL-GAN) for the PSMs reconstruction task. The proposed algorithm relaxes the i.i.d. assumption of traditional deep learning-based algorithms, allowing us to estimate the PSMs from simulated or previously collected training data whose distribution is similar, rather than strictly identical, to that of the target data. In the first phase of the TPTL-GAN algorithm, we design a domain projecting (DP) framework to project the source domain to the adjacent domain. In the second phase, we propose a domain completing (DC) framework, which extracts helpful radio environment features from the adjacent domain and reconstructs the PSMs in the target domain. Through these two phases, the proposed algorithm reconstructs the PSMs more accurately than the traditional methods, as verified by the simulation results.


I. INTRODUCTION
With the explosion of wireless communication devices, the demand for spectrum resources far exceeds the supply. Cognitive radio (CR) is a promising idea to realize better exploitation of the spectrum resources [1]. A fundamental application of CR technology is the cognitive radio network (CRN). To overcome spectrum scarcity in wireless communications, the CR transceivers in the CRNs intelligently change their working parameters by detecting and utilizing the unused radio resources in the spectral domain, the spatial domain, etc. [2], [3].
The primary users (PUs) and the secondary users (SUs) are the two main components in the CRNs [4]. The primary users are licensed users and initially own the right to use spectrum resources. However, the secondary users network is an unlicensed network that aims to dynamically access the licensed spectrum. By allowing SUs to opportunistically access a licensed band, the CRNs can achieve better exploitation of the radio resources.
The associate editor coordinating the review of this manuscript and approving it for publication was Francesco Benedetto.
There are two main modes of the radio resources allocation scheme: the overlay mode (Fig. 1) and the underlay mode (Fig. 2). In the overlay mode, secondary users can access the licensed bands that are not occupied by primary users [5]. By identifying and exploiting the spectrum ''white holes'', the SUs work in a non-intrusive manner [6]. In the overlay mode, the ''white holes'' are mainly the unused radio resources in the frequency domain. In order to avoid harmful interference to PUs, if several secondary users find that a licensed user is transmitting in a certain band, all SUs will avoid using that band. To further enhance the utilization of the wireless radio resources, the underlay cognitive radio is regarded as a promising mode.
In the underlay CRNs, which are also known as the spatial reuse CRNs, the SUs are allowed to access the licensed spectrum if, because of the attenuation along the radio propagation path, the interference from secondary users does not degrade the quality of service of the primary users [7]. The SUs exploit the wireless ''white space'' to enhance the utilization of the radio resources. The ''white space'' in the underlay cognitive radio is the unused spectrum resources in the spatial domain.
The underlay mode imposes strict restrictions on the SUs' transmitting power. To meet the above restrictions, there are mainly two methods. As for the first method, the SUs spread their signals over a wide frequency band [5]. The interference to the primary users is under a pre-selected threshold. This method is adopted in the underlay cognitive radio networks using ultra-wide band (UWB) technology, and it is primarily for short-range high data rate communications [5], [8]. Fig. 2 shows that the secondary users share a wide range of spectrum with the primary users in the underlay cognitive radio networks. Regarding the second method, i.e., the interference temperature method, the secondary users are allowed to transmit a higher power in the PUs' bands, provided that the total interference from secondary users is under the threshold [8].
In underlay CR networks, we use the power spectrum maps (PSMs) to enable rapid frequency deconfliction and maximize the radio resources utilization in the target region. The power spectrum map is a powerful tool to show the power spectrum (PS) of the PUs' signals across a finite geographical CRNs region. The PSM is a visible map of the radio environment, which visually overlays the power spectrum information on a geographical map.
The PSMs portray the PUs' power distribution in spectral domain and spatial domain [9], [10]. A powerful central controller (e.g. a base station) is commonly used to collect the spectrum sensing results from SUs, and estimate the power spectrum maps of the target region [8]. Normally, the PUs' transmitting power is stronger than that of SUs, and the SUs have to comply with the power limitation. On the basis of the estimated PSMs as well as the communication requirements of the PUs and SUs, the SUs adapt their transmitting power to minimize the interference to PUs, enabling the SUs to reuse the licensed bands dynamically [11].
As shown in Fig. 3, the general setting of estimating the PSMs includes several transmitting PUs and receiving SUs. We suppose that they are both uniformly distributed in the target region. The secondary users cooperate in estimating the PSMs in an additive white Gaussian noise (AWGN) environment. The variance of the Gaussian noise is assumed to be known, i.e., the noise floor in Fig. 3. We suppose that the number of the secondary users, their receiving power spectrum and the locations are known. However, the number of primary users, their transmitting power spectrum and their locations are unknown. Our goal is to estimate the power spectrum at any position in a particular region by using the known SUs' parameters, i.e., the PSMs of the target region.
As a significant method of the artificial intelligence (AI), deep learning technology has unique advantages in the PSMs estimation task [12], [13]. By learning from the training data set, we can obtain the prior knowledge or helpful information of the radio environment in the target region on the basis of the deep neural network (DNN).
VOLUME 8, 2020
In the supervised deep learning, the training data should be independent and identically distributed (i.i.d.) as the testing data [12]. As for the power spectrum maps estimation task, the incomplete PSMs in the target region are viewed as the testing data, and our goal is to estimate the complete PSMs on the basis of the incomplete PSMs. To meet the i.i.d. requirement in the supervised deep learning, we have to collect the complete PSMs training data in advance in the same region as the testing data [13], as shown in Fig. 4.
To collect the complete PSMs training data, we have to deploy power spectrum sensing devices in advance at every position in the target region. However, collecting sufficient training data is not an easy task: it is time-consuming and requires a large number of sensing devices. Existing deep learning-based PSMs estimation methods are all based on the complete PSMs, which are hard to collect [13]. The hard-to-collect i.i.d. training data have become the most challenging obstacle for the deep learning methods in the PSMs estimation task.
Deep transfer learning is a promising technology to solve the above problem, which focuses on transferring the helpful information across different domains [14]. It relaxes the assumption that the training data should be i.i.d. with the testing data. For example, to estimate the power spectrum maps of a residential zone, we have to collect the complete PSMs training data in this zone to meet the i.i.d. constraint, required by the traditional deep learning-based methods [13]. However, as for the transfer learning-based algorithms, if we have already collected the training data from similar residential zones before, which share similar rather than strictly identical distribution with the target region, we can directly use them to estimate the PSMs in the target residential zone. The i.i.d. constraint between the training data and the testing data is relaxed in the transfer learning technology.
In this paper, we propose a novel transfer learning-based PSMs reconstruction algorithm named two-phase transfer learning generative adversarial network (TPTL-GAN) for underlay CRNs. The proposed transfer learning algorithm relaxes the i.i.d. assumption between the training data and the testing data, allowing us to estimate the PSMs based on the simulated or previously collected PSMs, which share similar distribution with the target data. First, we construct the PSMs reconstruction model as a regression model. We solve the regression task as an image reconstruction task through the color mapping process. Then, we propose a two-phase transfer learning GAN algorithm to extract helpful radio environment features from the training data and reconstruct the power spectrum maps of the target region. Finally, as verified by several simulations, our proposed TPTL-GAN algorithm provides more accurate PSMs estimation results than the traditional methods.
The original contributions of this work are as follows:
1) We propose a deep transfer learning-based TPTL-GAN algorithm to estimate the PSMs for underlay CRNs. The proposed algorithm extracts helpful information of the wireless radio environment from the simulated or previously collected training data, which share similar rather than strictly identical distribution with the PSMs of the target region. Then, we utilize the extracted knowledge as the prior information to estimate the PSMs in the target region. Using deep transfer learning algorithm to estimate the PSMs for underlay cognitive radio networks has not been reported until now.
2) We design a domain projecting (DP) framework in the first phase of the TPTL-GAN algorithm. The DP framework searches and projects the source domain to the adjacent domain, which contains the helpful information to reconstruct the PSMs. In the second phase, we propose a domain completing (DC) framework, which extracts helpful radio environment features from the adjacent domain and estimates the PSMs in the target region.
3) The TPTL-GAN algorithm retains the latent features of the target data and solves the feature losing problem in traditional mapping-based transfer learning algorithms. Compared with the conventional methods, the proposed algorithm has better PSMs estimation performance, as verified by simulations.
The rest of our paper is organized as follows. In Section II, we introduce the related works and the basic theory of deep transfer learning. In Section III, we analyze and build the PSMs regression model. In Section IV, we propose a deep transfer learning-based TPTL-GAN algorithm for underlay CRNs. In Section V, we conduct several simulations and analyze the results. Finally, we conclude our research findings in Section VI.

II. RELATED WORKS
On the basis of the known SUs' parameters, estimating the power spectrum at any position in a particular region is an underdetermined problem: infinitely many functions are consistent with the SUs' parameters. To shrink the solution space of the PSMs estimation, traditional methods usually utilize prior radio environment knowledge of the target area. Conventional PSMs estimation methods include spatial interpolation algorithms and deep learning-based algorithms.
The traditional spatial interpolation algorithms include the inverse distance weighted (IDW) interpolation algorithm [15], [16] and the Kriging spatial interpolation algorithm [17], [18]. The inverse distance weighted interpolation supposes that the power spectrum maps depend on the distance $d_{idw}$ between receiving SUs and interpolation locations [15], [16]. Let $(1/d_{idw})^{p_{id}}$ denote the inverse distance raised to the power $p_{id}$, which controls the weights of the known power spectrum towards the unknown locations. In fact, the IDW algorithm is not related to any actual physical process, so it is hard to determine whether a specific $p_{id}$ is appropriate or not.
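The IDW weighting described above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation of [15], [16]: the function name `idw_estimate`, the default exponent and the sample values in dBm are assumptions chosen for the example.

```python
import math

def idw_estimate(samples, query, p_id=2.0):
    """Inverse distance weighted estimate of the power at `query`.

    samples: list of ((x, y), power) pairs reported by the receiving SUs.
    query:   (x, y) location to interpolate.
    p_id:    exponent applied to the inverse distance (assumed default).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), power in samples:
        d_idw = math.hypot(query[0] - x, query[1] - y)
        if d_idw == 0.0:
            return power  # the query coincides with a sensing node
        weight = (1.0 / d_idw) ** p_id
        weighted_sum += weight * power
        weight_total += weight
    return weighted_sum / weight_total

# Hypothetical measurements: two SUs, 4 distance units apart.
su_samples = [((0.0, 0.0), -60.0), ((4.0, 0.0), -80.0)]
midpoint = idw_estimate(su_samples, (2.0, 0.0))    # equal weights
near_first = idw_estimate(su_samples, (1.0, 0.0))  # weights 1 and 1/9
```

The estimate depends only on distances and on the chosen $p_{id}$, which is why no physical propagation effect such as shadow fading can be captured by this scheme.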
The Kriging interpolation method estimates the PSMs with linear combinations of the known power spectrum parameters. The combination weights are derived from a semi-variogram function, which can be regarded as an assumption about the spatial characteristics of the radio environment in the target region [17], [18]. The semi-variogram function (e.g., linear semi-variogram, exponential semi-variogram, etc.) reflects the relationship between the average power value differences of different nodes and the distances separating them [19].
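As a rough illustration of how a semi-variogram assumption turns into interpolation weights, the sketch below implements ordinary Kriging with an exponential semi-variogram in plain Python. It is not the method of [17], [18]: the sill and range values, the function names and the small Gaussian elimination solver are assumptions made to keep the example self-contained.

```python
import math

def exp_semivariogram(h, sill=1.0, corr_range=5.0):
    """Exponential semi-variogram without nugget (parameter values assumed)."""
    return sill * (1.0 - math.exp(-h / corr_range))

def distance(u, v):
    return math.hypot(u[0] - v[0], u[1] - v[1])

def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def ordinary_kriging(points, values, query):
    """Return (estimate, weights) of ordinary Kriging at `query`."""
    n = len(points)
    # Semi-variogram matrix bordered by the unbiasedness constraint.
    a = [[exp_semivariogram(distance(points[i], points[j])) for j in range(n)]
         + [1.0] for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [exp_semivariogram(distance(p, query)) for p in points] + [1.0]
    solution = solve_linear(a, b)
    weights = solution[:n]  # the last entry is the Lagrange multiplier
    return sum(w * v for w, v in zip(weights, values)), weights

# Hypothetical SU locations and received powers (dBm).
su_points = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
su_powers = [-60.0, -80.0, -75.0]
estimate, weights = ordinary_kriging(su_points, su_powers, (1.0, 1.0))
```

The weights always sum to one, and without a nugget term the estimate reproduces the measured value at each sensing node; everything else is dictated by the assumed semi-variogram.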
The above traditional interpolation methods perform well in some simple environments. In the real wireless environment, the signals attenuate randomly. The signal attenuation is mainly caused by the superposition of the radio propagation loss, the shadow fading effect, and the multi-path effect, etc. The above algorithms, which directly make inappropriate or biased assumptions about the radio propagation features, may lead to inaccurate PSMs estimation results for the practical complicated radio environment. For example, the IDW interpolation method assumes that the power spectrum only depends on the distance between secondary users and primary users [15], [16]. The IDW method has a poor power spectrum maps estimation performance in the urban area, where there is severe shadow fading effect.
By learning and adjusting the parameters of deep neural networks, deep learning can approximate complicated mappings or functions arbitrarily closely [20], [21]. By learning the radio propagation features from the training data set, the deep learning-based PSMs estimation algorithm has achieved more accurate PSMs estimation results than the traditional methods [13]. However, the hard-to-collect complete i.i.d. training data are the most challenging obstacle for the deep learning-based PSMs estimation algorithms.
As introduced in the previous section, transfer learning is a promising technology to solve the above problem in traditional deep learning methods [22]. It transfers the knowledge from the source domain to the target domain by relaxing the hypothesis that the training data and the testing data must be i.i.d. [14]. The similarities between the target and the source domains affect the performance of the transfer learning algorithm. The more similar the target and the source domains are, the more characteristics are shared between the target PSMs data and the source PSMs data. The transfer learning algorithms estimate the power spectrum maps of the target region on the basis of the common characteristics. Thus, to improve the PSMs estimation performance in the target domain, we need to choose or construct the training data carefully, and ensure that the two domains share as many common features as possible.
Mapping-based algorithms are a significant category of transfer learning and have been successfully applied in many applications. Traditional mapping-based deep transfer learning algorithms include Domain Adaptive Neural Network (DaNN) [23], Deep Domain Confusion (DDC) [24], Domain-Adversarial Neural Network (DANN) [25], etc. The basic idea of the above algorithms is to project data from the source domain and the target domain into a new common feature domain. They are domain adaptation or domain alignment algorithms. The sketch map of mapping-based deep transfer learning is shown in Fig. 5. In the common feature domain, data from the source and target domains share the same label predictor. Although the data are different between the two origin domains, they can be similar in the elaborate new data space.
The mapping-based algorithms use the data from two origin domains to train the two projectors, and use the labeled data from the source domain to train the labels predictor. However, because of the lack of the labeled data from the target domain, the algorithms inevitably lose some latent features of the target domain during the projecting process. It hurts the label-predicting performance for the data from the target domain.
To solve the feature losing problem in traditional mapping-based transfer learning algorithms, we propose the TPTL-GAN in Section IV. The proposed algorithm projects the source domain to the adjacent space of the target domain and retains the latent features of the target data, which improves the PSMs estimation performance.

III. POWER SPECTRUM MAPS REGRESSION MODEL
We suppose that $N_P$ transmitting PUs and $N_S$ receiving SUs are uniformly distributed in a target area T. Under the AWGN with a known variance, the receiving secondary users try to estimate the PSMs in T. The SUs and PUs are located at positions (x, y) and (p, q), respectively. The signals received and sampled by each secondary user are parsed in blocks, and each block contains $N_{blk}$ samples. In the practical wireless environment, the signals usually vary in the temporal domain with the channels. We assume that the channels of the radio environment in the target region are slow fading channels, and that $N_{blk}$ is equal to the coherence interval. Thus, the channels can be viewed as unvaried in each time block [26]. The receiving power spectrum of the secondary users is then computed per time block. On the basis of the power spectrum, we estimate the PSMs per time block through the proposed TPTL-GAN algorithm. The transmitting power spectrum of the primary users is unknown, as is the wireless radio attenuation function from the PU's location (p, q) to the SU's location (x, y). The known relations about the power spectrum and the corresponding locations are shown in (1). $\sigma^2$ denotes the variance of the AWGN in the target region T.
Let $P_S(f; x, y)$ denote the power spectrum at location (x, y), which reflects the total distribution of the PUs' power across the spatial domain and the spectral domain. The PSMs model is shown in (2). Our task is to estimate the PSMs model (2) on the basis of the known PS relations (1), i.e., to estimate the power spectrum maps of the target region T.
In fact, the above goal is an underdetermined problem: an infinite number of PSMs functions can satisfy (1). To shrink the solution space of the PSMs estimation, we need to make full use of the prior knowledge of the wireless spectrum environment in the testing area T.
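Relations (1) and (2) are referenced above but are not reproduced in this text. A form consistent with the stated definitions is sketched below. The symbol $P_P$ for the PUs' transmitting power spectrum, the symbol $g$ for the attenuation function and the indices $i$, $j$ are assumptions introduced here for illustration and may differ from the authors' notation.

```latex
% (1): known relations at the N_S sensing SUs (assumed notation)
P_S(f; x_i, y_i) = \sum_{j=1}^{N_P} g(f; p_j, q_j, x_i, y_i)\, P_P(f; p_j, q_j) + \sigma^2,
\qquad i = 1, \dots, N_S

% (2): PSMs model at an arbitrary location (x, y) in T (assumed notation)
P_S(f; x, y) = \sum_{j=1}^{N_P} g(f; p_j, q_j, x, y)\, P_P(f; p_j, q_j) + \sigma^2,
\qquad (x, y) \in T
```

Under this reading, (1) supplies $N_S$ constraints per frequency point, while $N_P$, $P_P$, the PU locations and $g$ are all unknown, which is why many candidate maps satisfy the same measurements.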

IV. TPTL-GAN-BASED PSMs ESTIMATION ALGORITHM

A. COLOR MAPPING PROCESS
We divide the testing area T into N × N grids, and assume that there is at most one user (one PU or one SU) in each grid. Then we normalize the PS of the secondary users and color the grids in the target area T according to the power components at different frequency points, i.e., we uniformly map the power values of different frequency points to different colors (Fig. 6).
The grids' colors in PSMs images indicate the power components of different frequency points in different locations. The brighter the color is, the larger the power component is. The complete PSMs are the maps where the power spectra of all grids are known. The incomplete PSMs are the maps where the power spectra of only some grids are known. As shown in Fig. 6, the white squares in the incomplete PSMs represent the grids where there are no secondary users, which are exactly the grids that need to be estimated.
Through the color mapping process, we convert the PSMs estimation task into an image reconstruction task. Therefore, we can adopt a powerful image reconstruction method, the generative adversarial network (GAN), to solve the PSMs reconstruction problem, i.e., we train the proposed TPTL-GAN to regress the missing grids in the incomplete PSMs of the target region T.
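The color mapping step can be sketched as follows. The text does not give the exact mapping, so this example assumes three frequency points mapped to the R, G and B channels with min-max normalization; the function name `color_map`, the data layout and the separate mask of known grids are illustrative choices.

```python
def color_map(su_measurements, n_grid):
    """Build an incomplete PSM image from SU measurements.

    su_measurements: dict mapping a grid cell (row, col) to a list
                     [p_f1, p_f2, p_f3] of received powers (dBm) at three
                     frequency points (assumed layout).
    Returns (image, known): image[row][col] is an RGB triple in [0, 1],
    known[row][col] is True where an SU reported a measurement.
    """
    all_powers = [p for powers in su_measurements.values() for p in powers]
    p_min, p_max = min(all_powers), max(all_powers)
    span = (p_max - p_min) or 1.0  # avoid division by zero for constant data
    white = (1.0, 1.0, 1.0)        # grids without an SU stay white
    image = [[white for _ in range(n_grid)] for _ in range(n_grid)]
    known = [[False] * n_grid for _ in range(n_grid)]
    for (row, col), powers in su_measurements.items():
        image[row][col] = tuple((p - p_min) / span for p in powers)
        known[row][col] = True
    return image, known

# Hypothetical 3 x 3 region with two sensing SUs.
measurements = {(0, 0): [-90.0, -70.0, -50.0], (1, 2): [-60.0, -80.0, -100.0]}
image, known = color_map(measurements, 3)
```

The mask is returned alongside the image because a white pixel alone cannot be told apart from a grid whose normalized power is maximal in every channel.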

B. TPTL-GAN ESTIMATION ALGORITHM
The sketch map of the TPTL-GAN algorithm is shown in Fig. 7, which includes two phases: the domain projecting phase and the domain completing phase. The above two phases are both based on deep neural networks. Their detailed frameworks are shown in Fig. 8.
Throughout, we use ''C'' to denote the complete PSMs images and ''I'' for incomplete PSMs images. The superscript ''r'' denotes the true or real PS distribution and ''g'' denotes the estimated results by the deep neural network. In addition, we use the subscript to denote different domains, i.e., S for the source domain, T for the target domain and A for the adjacent domain. For example, $C^r_S$ denotes the true complete PSMs images from the source domain, and $I^g_A$ denotes the generated incomplete PSMs images from the adjacent domain. Thus, the inputs of the TPTL-GAN are the complete PSMs training images from the source domain and the incomplete PSMs from the target domain, i.e., $C^r_S$ and $I^r_T$, as shown in Fig. 8.
The data $I^r_T$ are the known incomplete PSMs in the target domain, which are collected in the target region. However, our goal, the complete PSMs in the target domain $C^r_T$, is unknown and needs to be estimated based on $I^r_T$. As for the source domain, our proposed algorithm can extract helpful information from data whose distribution is similar, rather than strictly identical, to that of the PSMs in the target region. Thus, we can use the simulated or previously collected PSMs $C^r_S$ as the data in the source domain. It should be noted that the similarities between the target and the source domains affect the performance of the transfer learning algorithm. Therefore, we should select one or more proper wireless radio propagation models (e.g., the inverse polynomial law model, the Okumura-Hata model, etc.) to generate the simulated source dataset according to the target region. For example, if region T is a city environment, i.e., an urban area with quasi-smooth terrain, we can choose the Okumura-Hata model and/or other similar propagation functions to generate the PSMs data in the source domain. In addition, we should use several sets of different parameters of the radio propagation model to generate the source dataset, which helps to improve the generalization performance of the TPTL-GAN. Furthermore, if we have already collected real PSMs from a similar region in advance, it is also a good choice to extend the source-domain data by adding the previously collected data to the simulated data set.
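As an illustration of how simulated source-domain PSMs could be generated, the sketch below uses the standard Okumura-Hata urban formula for a small or medium city. The grid layout, antenna heights, noise floor, the clamping of short distances to 1 km and the function names are assumptions for this example and are not taken from the simulations of this paper.

```python
import math

def hata_urban_loss_db(f_mhz, d_km, h_base_m=30.0, h_mobile_m=1.5):
    """Okumura-Hata median path loss (dB), urban, small/medium city.

    Nominal validity: 150-1500 MHz, base height 30-200 m,
    mobile height 1-10 m, distance 1-20 km.
    """
    log_f = math.log10(f_mhz)
    a_hm = (1.1 * log_f - 0.7) * h_mobile_m - (1.56 * log_f - 0.8)
    return (69.55 + 26.16 * log_f - 13.82 * math.log10(h_base_m) - a_hm
            + (44.9 - 6.55 * math.log10(h_base_m)) * math.log10(d_km))

def simulate_source_psm(pu_list, n_grid, cell_km, f_mhz, noise_dbm=-110.0):
    """Complete PSM (received power in dBm per grid) at one frequency point.

    pu_list: list of (row, col, tx_power_dbm) for the simulated PUs.
    """
    psm = []
    for row in range(n_grid):
        line = []
        for col in range(n_grid):
            total_mw = 10 ** (noise_dbm / 10.0)  # noise floor
            for pu_row, pu_col, tx_dbm in pu_list:
                d_km = cell_km * math.hypot(row - pu_row, col - pu_col)
                d_km = max(d_km, 1.0)  # assumed clamp to the model's range
                rx_dbm = tx_dbm - hata_urban_loss_db(f_mhz, d_km)
                total_mw += 10 ** (rx_dbm / 10.0)
            line.append(10.0 * math.log10(total_mw))
        psm.append(line)
    return psm

# One hypothetical PU with 43 dBm transmit power on a 5 x 5 grid of 1 km cells.
source_psm = simulate_source_psm([(0, 0, 43.0)], 5, 1.0, 900.0)
```

Repeating the call with different PU placements, transmit powers and antenna heights yields the several parameter sets recommended above for the source dataset.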

1) THE DOMAIN PROJECTING PHASE OF TPTL-GAN
In the first phase of the TPTL-GAN algorithm, we design a domain projecting (DP) framework, which searches and projects the source domain to the adjacent domain. The adjacent domain contains the helpful information to reconstruct the PSMs. Unlike the traditional mapping-based transfer learning algorithms, which project both the source and target domains to a common domain, we do not project the target domain, so that no latent features of the target domain are lost during the projecting process.
As shown in Fig. 8(a), the inputs of the DP framework are the complete PSMs in the source domain and the incomplete PSMs in the target domain, i.e., $C^r_S$ and $I^r_T$. There are three deep neural networks in the proposed DP framework: the projector (P), the discriminator (D) and the generator (G).
As for the projector, the input $I^r_S$ is the source domain's incomplete PSM image and the output of the projector is the adjacent domain's incomplete PSM image $I^g_A$. $I^r_S$ is generated by the measurement function $M_\theta(\cdot)$, which samples lossy measurements from $C^r_S$. We design $M_\theta(\cdot)$ according to the actual geographical environment and the distribution of the receiving nodes in the target region T. There are many types of $M_\theta(\cdot)$ which can sample incomplete PSMs images from $C^r_S$. Some of them are listed as follows.
1) Random-block-pixels $M_\theta(\cdot)$: Each pixel of the PSMs image is set to 0 independently with the probability $\theta$. In addition, $\theta$ is assumed to be uniformly distributed, i.e., $p_\theta \sim U(\alpha_0, 1)$. $\alpha_0$ should be set equal to or less than the white squares' proportion in the incomplete PSMs in the target domain.
2) Random-block-patches $M_\theta(\cdot)$: Set the pixels inside of $\theta$ randomly chosen patches to 0. The size of each patch is set according to the target region T.
3) Random-block-patches-pixels $M_\theta(\cdot)$: This function is the superposition of the above two functions.
We should design the measurement function $M_\theta(\cdot)$ according to the shape of the white squares in the colored incomplete power spectrum maps, i.e., the distribution of the secondary users in the testing area T. For example, we use the Random-block-pixels $M_\theta(\cdot)$ if the sensing nodes are uniformly distributed in area T, and we use the Random-block-patches-pixels $M_\theta(\cdot)$ if there are streets and buildings in T. In the latter case, the power spectrum receiving nodes are uniformly distributed in the streets, the white patches represent buildings where there are no sensing nodes, and the colored pixels represent the PS receiving SUs.
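The three measurement functions listed above can be sketched in plain Python as follows. For brevity the example works on a single-channel image stored as a list of lists; the function names, the square patch shape and the value of $\alpha_0$ are assumptions for illustration.

```python
import random

def random_block_pixels(image, theta, rng):
    """Set each pixel to 0 independently with probability theta."""
    return [[0.0 if rng.random() < theta else v for v in row] for row in image]

def random_block_patches(image, theta, patch, rng):
    """Set the pixels inside theta randomly placed patch x patch squares to 0."""
    n_rows, n_cols = len(image), len(image[0])
    out = [row[:] for row in image]
    for _ in range(theta):
        top = rng.randrange(n_rows - patch + 1)
        left = rng.randrange(n_cols - patch + 1)
        for r in range(top, top + patch):
            for c in range(left, left + patch):
                out[r][c] = 0.0
    return out

def random_block_patches_pixels(image, theta_pixels, theta_patches, patch, rng):
    """Superposition of the two measurement functions above."""
    patched = random_block_patches(image, theta_patches, patch, rng)
    return random_block_pixels(patched, theta_pixels, rng)

rng = random.Random(0)
complete_psm = [[1.0] * 8 for _ in range(8)]   # stand-in for a complete source PSM
alpha_0 = 0.6                                  # hypothetical lower bound for theta
theta = rng.uniform(alpha_0, 1.0)              # theta drawn from U(alpha_0, 1)
incomplete_psm = random_block_pixels(complete_psm, theta, rng)
```

Each call returns a new image and leaves the complete PSM untouched, so the same source image can be sampled repeatedly with different $\theta$ during training.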
Regarding the discriminator, we design it in the light of the generative adversarial network [20]. We define an adversarial game between the discriminator and the projector. The projector is trained to project $I^r_S$ to the adjacent domain and generate a high-accuracy estimation of the adjacent domain, i.e., the projector is trained to fool the discriminator. In contrast, the discriminator is trained to distinguish the PSMs of the target domain from those of its adjacent domain, which helps to improve the performance of the projector. As the DP framework is trained, the identification ability of the discriminator and the projection ability of the projector are enhanced continually until they reach a balance, where the discriminator cannot tell $I^g_A$ from $I^r_T$. At this time, the target domain and the adjacent domain share similar latent features, i.e., we successfully find the adjacent domain of the target domain.
As for the generator, we use the projected result $I^g_A$ to generate the complete PSMs images $C^g_S$ in the source domain. As the DP framework is trained, the generation ability of the generator and the projection ability of the projector are enhanced continually until they reach a balance, where $C^g_S$ is an extremely close match to $C^r_S$, i.e., the real PSMs in the source domain. At this time, the adjacent domain retains the latent features from the source domain.
In the light of the Wasserstein GAN with gradient penalty (WGAN-GP) [27], we propose (3) and (4) as the objective function in the training process for the DP framework, according to the intentions of the above framework designs. We use (3) and (4) to train the DP framework alternately.
According to (3), we train the discriminator and the projector alternately. The third term in (3) is the gradient penalty in WGAN-GP [27], which improves the training stability of the DP framework. The coefficient of the gradient penalty is $\beta_1$. $I^l_1$ is the random linear interpolation between the true sample $I^r_T$ and the generated sample $I^g_A$. According to (4), we train the projector and the generator alternately. $\beta_2$ is the training coefficient.
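Objectives (3) and (4) are referenced above but are not reproduced in this text. The block below is a hedged reconstruction that follows the verbal description and the usual WGAN-GP form [27]; the exact norms, the sampling of $\epsilon$ and the placement of $\beta_2$ are assumptions, not the authors' formulas.

```latex
% (3): adversarial game between projector P and discriminator D (assumed form)
\min_{P} \max_{D}\;
  \mathbb{E}\big[ D(I^r_T) \big] - \mathbb{E}\big[ D(I^g_A) \big]
  - \beta_1\, \mathbb{E}\Big[ \big( \lVert \nabla_{I^l_1} D(I^l_1) \rVert_2 - 1 \big)^2 \Big],
\quad I^g_A = P(I^r_S),\;\; I^r_S = M_\theta(C^r_S),
\quad I^l_1 = \epsilon\, I^r_T + (1 - \epsilon)\, I^g_A,\;\; \epsilon \sim U(0, 1)

% (4): reconstruction of the source PSMs through P and G (assumed form)
\min_{P,\, G}\; \beta_2\, \mathbb{E}\big[ \lVert C^r_S - C^g_S \rVert_2 \big],
\quad C^g_S = G(I^g_A)
```

In this reading, (3) pulls the projected images toward the distribution of the target's incomplete PSMs, while (4) keeps enough source information in $I^g_A$ for the generator to recover $C^r_S$.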

2) THE DOMAIN COMPLETING PHASE OF TPTL-GAN
As for the second phase, we propose a domain completing (DC) framework, which extracts helpful radio environment features from the adjacent domain and estimates the PSMs in the target domain. Compared with the traditional mapping-based transfer learning algorithms which use common domain features to estimate the PSMs, the DC framework employs features from both the adjacent and the target domain. The knowledge from the adjacent domain compresses the solution space of the PSMs estimation problem and improves the estimation performance.
As shown in Fig. 8(b), the inputs of the DC framework are the complete PSMs in the source domain and the incomplete PSMs in the target domain, i.e., $C^r_S$ and $I^r_T$. There are also three deep neural networks in the proposed DC framework: the projector (P), the discriminator (D) and the generator (G). We reuse the projector trained in the first phase and freeze its parameters in the second phase. We also reuse the discriminator and the generator trained in the first phase and fine-tune them in the DC framework.
Regarding the discriminator, we define an adversarial game between the discriminator and the generator. The generator is trained to reconstruct a high-accuracy estimation of $C^r_T$. Then the reconstructed results are sampled by $M_\theta(\cdot)$ to generate the incomplete PSMs $I^g_T$. The generator is trained to fool the discriminator through the above two processes. In contrast, the discriminator is trained to distinguish the true incomplete PSMs $I^r_T$ from the generated incomplete PSMs $I^g_T$, which helps to improve the performance of the generator. As the DC framework is trained, the identification ability of the discriminator and the PSMs estimation ability of the generator are enhanced continually until they reach a balance, where the discriminator cannot tell $I^g_T$ from $I^r_T$. At this time, we suppose that the reconstructed result $C^g_T$ is an extremely close match to our goal $C^r_T$, i.e., we successfully estimate the power spectrum maps in the target region.
As for the generator, we fine-tune it to complete the PSMs from the target domain. In addition, we also train the generator to complete the PSMs from the adjacent domain, which shrinks the solution space and improves the estimation performance of $C^r_T$. According to the intentions of the above framework designs, we propose (5) and (6) as the objective function in the training process for the DC framework. We use (5) and (6) to train the DC framework alternately.
According to (5), we train the discriminator and the generator alternately. The third term in (5) is the gradient penalty term. $I^l_2$ is the random linear interpolation between the true sample $I^r_T$ and the generated sample $I^g_T$. According to (6), we train the generator to minimize the Euclidean distance between the true and the estimated PSMs.
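Objectives (5) and (6) are likewise not reproduced in this text. A hedged sketch consistent with the description is given below. The penalty coefficient $\beta_3$ is a hypothetical name, and reading ''the true and the estimated PSMs'' in (6) as the source-domain pair $C^r_S$ and $G(P(I^r_S))$ is an assumption based on the statement that the generator is also trained to complete the PSMs from the adjacent domain.

```latex
% (5): adversarial game between generator G and discriminator D (assumed form)
\min_{G} \max_{D}\;
  \mathbb{E}\big[ D(I^r_T) \big] - \mathbb{E}\big[ D(I^g_T) \big]
  - \beta_3\, \mathbb{E}\Big[ \big( \lVert \nabla_{I^l_2} D(I^l_2) \rVert_2 - 1 \big)^2 \Big],
\quad I^g_T = M_\theta\big( G(I^r_T) \big),
\quad I^l_2 = \epsilon\, I^r_T + (1 - \epsilon)\, I^g_T,\;\; \epsilon \sim U(0, 1)

% (6): completion of the adjacent-domain PSMs with the frozen projector P (assumed form)
\min_{G}\; \mathbb{E}\big[ \lVert C^r_S - G\big( P(I^r_S) \big) \rVert_2 \big],
\quad I^r_S = M_\theta(C^r_S)
```

Because $C^r_T$ is never observed, (5) compares only re-sampled incomplete images, and (6) supplies the supervised signal that is available from the source data.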
The estimation performance of the transfer learning algorithms depends on the similarities between the target PSMs data and the training data. In addition, the number of the training images is another important factor. The more complicated distribution the target data have, the more training data we need. Furthermore, the stronger mapping ability the neural network has, the more training data we need. The neural network may not obtain the true latent features of the radio environment from a small amount of training data, and the overfitting problem may occur. Thus, collecting as many data as possible to enlarge the training data set is recommended, under the constraint of the data similarities. In addition, the data augmentation methods are also recommended, e.g., images flipping, etc.
To avoid overfitting, we use two methods in the training process. The first method is the data augmentation. The input PSMs images are randomly transformed in turn through 3 operations: image transposing, vertically flipping and horizontally flipping, which make the data set 8 times larger than the original set. As for the second method, besides producing images for domain projecting, the measurement function M θ (·) used in the TPTL-GAN algorithm (Fig.8) also brings positive effect on avoiding overfitting, which acts as the Cutout technique [28]. In [28], the authors prove that the regularization method of masking out areas of the input images randomly in the training process, which they call Cutout, can improve the overall performance and robustness of the convolutional neural network.
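The factor of 8 comes from the two choices (apply or skip) for each of the three operations. The sketch below enumerates all eight combinations for a square image stored as a list of lists, whereas the training procedure described above applies the operations at random; the function names are illustrative.

```python
from itertools import product

def transpose(img):
    """Swap rows and columns."""
    return [list(col) for col in zip(*img)]

def flip_vertical(img):
    """Reverse the row order (top <-> bottom)."""
    return [row[:] for row in reversed(img)]

def flip_horizontal(img):
    """Reverse every row (left <-> right)."""
    return [row[::-1] for row in img]

def augment(img):
    """Return the 8 variants obtained by optionally applying, in turn,
    transposing, vertical flipping and horizontal flipping."""
    variants = []
    for do_t, do_v, do_h in product((False, True), repeat=3):
        out = [row[:] for row in img]
        if do_t:
            out = transpose(out)
        if do_v:
            out = flip_vertical(out)
        if do_h:
            out = flip_horizontal(out)
        variants.append(out)
    return variants

psm = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # toy image with distinct pixels
variants = augment(psm)
```

For an image without internal symmetry the eight variants are all different; they correspond to the eight symmetries of a square.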
In addition, we do not use pooling layers (e.g., max-pooling, average-pooling, etc.) to counter overfitting in the proposed algorithm. The reason is that in the PSMs estimation task, the pixels of the PSMs images denote the power values at their corresponding positions. To sense the ''white space'' of the radio resources in the target region, every pixel matters. Pooling may discard helpful information and detail features, so we do not employ it. The two methods mentioned above (the data augmentation and the measurement function M θ (·)) are enough to avoid overfitting, as verified in the simulations.

C. THE NEURAL NETWORK STRUCTURE OF TPTL-GAN
There are three deep neural networks in the proposed TPTL-GAN: the projector, the generator and the discriminator. Their neural structures are shown in Fig.9 and Fig.10, which mainly include 4 widely used modules in the deep learning technology: the convolutional layer [29], the activation function [30], the fully connected layer [30] and the residual module [31]. Regarding the generator and the projector, their structures are designed in the light of the auto-encoders [32], as shown in Fig.9. The architecture shares a similar encoder-decoder structure, which exploits the input data and learns abstract characteristics. The encoder part captures the features of the input PSMs and converts them into the latent characteristics representations. Meanwhile, the decoder part uses the latent representations to solve the domain completing and the PSMs reconstruction tasks.
As for the discriminator, a deep convolutional neural structure is utilized to distinguish the true incomplete PSMs from the generated images. The deep convolutional structure plays an important role in deep learning-based computer vision, which has unique advantages in feature extracting of the images [29], [30]. The convolution process in the discriminator exploits the essential features of the PSMs images from the target domain and enhances the identification ability for the estimated PSMs.
In this section, we propose the two-phase transfer learning-based GAN algorithm to estimate the PSMs for underlay CRNs. We design a domain projecting framework in the first phase of the TPTL-GAN algorithm. The framework searches and projects the source domain to the adjacent domain, which contains the helpful information to reconstruct the PSMs. In the second phase, we propose a domain completing framework, which extracts helpful radio environment features from the adjacent domain and estimates the PSMs in the target domain.

V. SIMULATIONS
In this section, we test the PSMs reconstruction performance of the TPTL-GAN algorithm. The simulations run on Windows 10 in Visual Studio Code, and we use the PyTorch framework to conduct them.

A. SIMULATION SETTINGS

1) SETTINGS FOR THE WIRELESS RADIO ENVIRONMENT
In a practical wireless radio environment, the signal usually attenuates in a random fashion, which is mainly caused by 3 factors: the radio propagation loss, the multi-path effects and the shadow fading.
The radio propagation loss is usually modeled deterministically, e.g., by the inverse polynomial law model, the Okumura-Hata model, etc. The shadow fading effect is usually modeled by the log-normal distribution model, the knife-edge effect model, etc. The radio propagation loss and the shadow fading belong to the large-scale fading category. In contrast, multi-path effects are usually modeled by the Rayleigh distribution, which belongs to the small-scale fading category. To intuitively display the propagation characteristics extracted by TPTL-GAN, we adopt the large-scale fading models to represent the radio propagation features in our simulations, i.e., we use the radio propagation loss model and the log-normal shadow fading model together with the knife-edge effect model to construct the wireless radio environment. As for the small-scale fading environment, the TPTL-GAN can also obtain good PSMs estimation results if the source and target domains share a similar small-scale fading distribution.
We divide the target region T into 48 × 48 grids and adopt the inverse polynomial law model $\gamma_{pr} = \min\{1, (d/d_c)^{-\alpha}\}$ as the wireless radio propagation model [26]. Here $\gamma_{pr}$ is the radio propagation gain from the transmitting PU to the receiving SU, $d$ is the distance between the PU and the SU, $d_c$ is a preselected constant, and $\alpha$ is the path loss coefficient, which depends on the wireless radio environment, e.g., $\alpha = 2$ for the free space propagation loss. As for the knife-edge effect, we set a wall represented by the black segment in Fig. 15 and Fig. 16. The i.i.d. log-normal shadowing model is employed with zero mean and variance $\sigma_{sd}^2$.
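To make the inverse polynomial law concrete, the sketch below evaluates the gain min{1, (d/d_c)^(-α)} on a 48 × 48 grid for a single PU. It covers only the deterministic propagation loss; shadowing and the knife-edge wall are left out. Measuring distance in grid units between cell indices, the PU location (10, 10) and the function names are assumptions made for illustration.

```python
import math


def propagation_gain(d, d_c, alpha):
    """Inverse polynomial law gain: min(1, (d / d_c) ** -alpha)."""
    if d == 0.0:
        return 1.0  # at the transmitter itself the gain is capped at 1
    return min(1.0, (d / d_c) ** (-alpha))


def gain_map(pu, size, d_c, alpha):
    """Gain from a PU at grid cell `pu` to every cell of a size x size grid."""
    px, py = pu
    return [
        [propagation_gain(math.hypot(x - px, y - py), d_c, alpha) for y in range(size)]
        for x in range(size)
    ]


# Free-space example from the text (alpha = 2) with d_c = 1 and a hypothetical PU cell.
gmap = gain_map(pu=(10, 10), size=48, d_c=1.0, alpha=2.0)
```

With these values the gain stays at 1 within one grid unit of the PU and falls to 0.25 at a distance of two grid units, which shows how the min{1, ·} cap bounds the gain near the transmitter.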

2) SETTINGS FOR THE TARGET DOMAIN
As for the target domain, we assume $\alpha = 6$, $d_c = 1$ and $\sigma_{sd}^2 = 1$ for the target region T. We suppose there are two transmitting primary users working under the AWGN with known variance $\sigma_0^2$, which are located at the grids (13, 21) and (33, 34). The receiving secondary users are uniformly distributed in the target region T. The number of SUs is about 15% of all grids. The PUs transmit random signals. By sampling the PUs' signals, their power spectra can be obtained with, e.g., the periodogram algorithm. In our simulation settings, the power spectra of PU1 and PU2 are set directly, as shown in Fig. 11, and are centered at 25 MHz and 75 MHz, respectively.
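The SU placement described above can be sketched as drawing roughly 15% of the 48 × 48 cells uniformly at random and keeping only the map values observed there. Sampling without replacement, rounding to an exact count, the use of `None` for unobserved cells and the function names are all assumptions; the text only states "about 15%".

```python
import random


def sample_su_positions(size, fraction, rng):
    """Pick round(fraction * size^2) distinct grid cells uniformly at random."""
    cells = [(x, y) for x in range(size) for y in range(size)]
    count = round(fraction * len(cells))
    return rng.sample(cells, count)


def measure(psm, positions):
    """Keep the PSM values at SU cells; mark every other cell as unobserved."""
    observed = [[None] * len(row) for row in psm]
    for x, y in positions:
        observed[x][y] = psm[x][y]
    return observed


rng = random.Random(0)
size = 48
positions = sample_su_positions(size, 0.15, rng)

# Hypothetical complete map, used here only to exercise the masking step.
psm = [[0.5 for _ in range(size)] for _ in range(size)]
incomplete = measure(psm, positions)
n_observed = sum(v is not None for row in incomplete for v in row)
```

With 2304 cells and a 15% fraction this yields 346 measured cells; the remaining cells are what the reconstruction has to fill in.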

3) SETTINGS FOR THE SOURCE DOMAIN
As introduced in the previous sections, our proposed algorithm relaxes the i.i.d. assumption in traditional deep learning-based algorithms, allowing us to estimate the PSMs based on simulated or previously collected complete PSMs from similar regions. Thus, in our simulations, we simulate a radio environment whose distribution is similar to that of the above target domain. We generate 40,000 PSMs images from two sets of propagation constants: 1) $\alpha = 1$, $d_c = 2$, $\sigma_{sd}^2 = 0.5$; 2) $\alpha = 2$, $d_c = 1$, $\sigma_{sd}^2 = 2$. Each set includes 20,000 power spectrum maps under the AWGN with known variance $\sigma_t^2$. The number of active PUs in the simulation region is randomly and independently selected from 1 to 4. The power value of the PU in each PSM is uniformly distributed between 0 and 1. In the training process, we use the random-block-pixels measurement function $M_\theta(\cdot)$ with $\alpha_0 = 0.15$, according to the distribution of SUs in the target region. In the above training data set from the source domain, we assume that the working bands of the PUs do not overlap at the same moment. Thus, there is only one PU in each power spectrum map image in the training set.
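The source-domain settings can be written down as a small sampling routine. The sketch below only draws scenario descriptors (propagation constants, number of PUs, PU powers); it does not render the maps. The uniform PU placement, the dictionary layout and the function names are assumptions, and the way scenarios map onto the 20,000 images per parameter set is not specified here.

```python
import random

# The two sets of propagation constants quoted in the text.
PARAMETER_SETS = [
    {"alpha": 1.0, "d_c": 2.0, "sigma2_sd": 0.5},
    {"alpha": 2.0, "d_c": 1.0, "sigma2_sd": 2.0},
]


def sample_scenario(params, size, rng):
    """Draw one hypothetical source-domain scenario descriptor."""
    n_pu = rng.randint(1, 4)  # 1 to 4 active PUs, both ends inclusive
    pus = [
        {
            "position": (rng.randrange(size), rng.randrange(size)),  # assumed uniform
            "power": rng.uniform(0.0, 1.0),  # PU power between 0 and 1
        }
        for _ in range(n_pu)
    ]
    return {"params": params, "pus": pus}


def build_scenarios(per_set, size, seed=0):
    """Draw `per_set` scenarios for each of the two parameter sets."""
    rng = random.Random(seed)
    return [
        sample_scenario(params, size, rng)
        for params in PARAMETER_SETS
        for _ in range(per_set)
    ]


scenarios = build_scenarios(per_set=5, size=48)
```

Because the PUs' working bands are assumed not to overlap, each PU in a scenario would contribute its own single-PU map at its own frequency.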

4) SETTINGS FOR THE TPTL-GAN
In the domain projecting phase of the TPTL-GAN, the learning parameters are as follows: the Adam algorithm is used for the DP framework training; the learning rates for the projector, the generator and the discriminator are 0.0001, 0.0001 and 0.0004, respectively; the gradient penalty coefficient $\beta_1$ is set to 10 [27]; $\beta_2$ is set to 10; the batch size is 24.
We use two indicators to monitor the convergence during the training of the DP framework: 1) the Euclidean distance between C g S and C r S ; 2) the absolute value of the training loss of the discriminator. Fig. 12 and Fig. 13 show the convergence curves in the first phase training of the TPTL-GAN.
In the domain completing phase of the TPTL-GAN, the learning parameters are as follows: the Adam algorithm is used for the DC framework training; the learning rates for the generator and the discriminator are 0.00002 and 0.00008, respectively; the gradient penalty coefficient $\beta_1$ is set to 10 [27]; the batch size is 24.
We use one indicator to monitor the convergence during the training of the DC framework: the Euclidean distance between I g and I r . Fig. 14 shows the convergence curves of the data from the target and source domains during the second phase training of the TPTL-GAN.
The training of the DC framework converges faster than that of the DP framework, because we train the DC framework by fine-tuning the well-trained discriminator and generator from the DP framework. In addition, the curve of the target data declines gradually until it converges. This verifies that although the distribution of the source data is different from that of the target data, the proposed TPTL-GAN learns from the source data set to estimate the PSMs for the target data during the training process. The curve of the source data, by contrast, increases initially and then decreases until it converges. At the beginning of the training process, the neural network estimates the PSMs of the source data well because it has been trained in the DP framework. The curve increases because the neural network adjusts its parameters to estimate the PSMs from the target data. Then the curve decreases and converges because the neural network learns to balance the PSMs estimation for the target and the source domains.

B. THE PSMs RECONSTRUCTION PERFORMANCE OF THE TPTL-GAN ALGORITHM
To test the estimation performance of the TPTL-GAN, we compare the proposed algorithm with the IDW interpolation algorithm, the Kriging interpolation algorithm and a traditional mapping-based transfer learning algorithm, the domain-adversarial neural network (DANN) algorithm [25]. We use Kriging with the exponential semi-variogram. The power value of the inverse distance is set to $p_{id} = 3$. We select three indicators to test the TPTL-GAN algorithm:
1) The visual observation of the reconstructed power spectrum maps.
2) The estimated power spectrum of primary users.
3) The deviation of the estimated PSMs against different numbers of secondary users.

1) THE VISUAL OBSERVATION OF THE RECONSTRUCTED POWER SPECTRUM MAPS
The test by direct visual observation of the power spectrum maps is relatively simple. We input the incomplete PSMs into the well-trained generator in the DC framework and observe the estimation performance. It is a qualitative and intuitive test. The PSMs reconstruction results for PU1 and PU2 at 25 MHz and 75 MHz are shown in Fig. 15 and Fig. 16. Compared against the true, complete power spectrum maps, the TPTL-GAN algorithm visibly outperforms the Kriging interpolation and the IDW interpolation for the PSMs of PU1 and PU2, especially in the region near the source of radiation and the area behind the wall.
In addition, the TPTL-GAN algorithm also visibly outperforms the DANN algorithm, which demonstrates that the proposed algorithm retains the latent characteristics of the target domain and solves the feature-loss problem of the DANN algorithm.

2) THE ESTIMATED POWER SPECTRUM OF PRIMARY USERS
Regarding the performance of the estimated power spectrum, we compare the TPTL-GAN reconstruction results with the true primary users' power spectrum. The testing performance demonstrates the estimation ability of the TPTL-GAN for the unused bands.
Compared with the true power spectrum, Fig. 17 and Fig. 18 demonstrate that the TPTL-GAN algorithm has a better power spectrum reconstruction performance than Kriging interpolation, IDW interpolation and DANN algorithm.
The proposed TPTL-GAN outperforms the DANN algorithm because DANN projects the target domain, and the loss of latent features of the target data is inevitable in that projection.
Regarding the Kriging interpolation algorithm in Fig. 17 and Fig. 18, the gap between the true and the reconstructed power spectrum can be explained by the fact that the assumed spatial features of the radio environment in the target region (i.e., the semi-variogram function assumptions) are always biased or inaccurate to some degree. These spatial feature assumptions are, however, the core factor for the power spectrum maps reconstruction.
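For reference, one common parameterization of the exponential semi-variogram used with Kriging is γ(h) = c0 + c (1 − exp(−h/a)) for h > 0, with γ(0) = 0. The sketch below evaluates it; the nugget c0, partial sill c and range parameter a are illustrative values rather than the settings of the simulations, and some references scale the exponent by a factor of 3.

```python
import math


def exponential_semivariogram(h, nugget, partial_sill, range_param):
    """Exponential semi-variogram: nugget + partial_sill * (1 - exp(-h / range_param))."""
    if h == 0.0:
        return 0.0  # by definition the semi-variance at zero lag is zero
    return nugget + partial_sill * (1.0 - math.exp(-h / range_param))


# Illustrative parameters only (assumed, not from the simulations).
lags = [1.0, 2.0, 5.0, 10.0, 20.0]
values = [exponential_semivariogram(h, 0.1, 1.0, 5.0) for h in lags]
```

The curve rises monotonically toward the sill c0 + c, so it assumes that spatial correlation decays smoothly with distance. A wall or other obstruction violates this assumption, which is one way the mismatch described above can arise.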
As for the IDW algorithm, it assumes that the PSMs depend only on the distance between the estimated nodes and the available nodes. The influence of the available nodes on the estimated nodes is controlled by the power value of the inverse distance, i.e., $(1/d_{idw})^{p_{id}}$. An inaccurate power value $p_{id}$ leads to imprecise power spectrum maps, as shown in Fig. 17 and Fig. 18. In fact, it is quite hard to examine whether a specific $p_{id}$ is appropriate or not.
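The IDW baseline can be sketched in a few lines: each available node contributes with weight (1/d)^p, and the estimate is the weighted average. The default p_id = 3 follows the simulation setting; the function name, the sample layout and the handling of a zero distance are assumptions.

```python
import math


def idw_estimate(target, samples, p_id=3.0):
    """Inverse distance weighted estimate at `target` from ((x, y), value) samples."""
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), value in samples:
        d = math.hypot(target[0] - x, target[1] - y)
        if d == 0.0:
            return value  # the target coincides with a measured node
        weight = (1.0 / d) ** p_id
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Two hypothetical measurements on a line.
samples = [((0, 0), 1.0), ((4, 0), 0.0)]
mid = idw_estimate((2, 0), samples)   # equidistant from both nodes
near = idw_estimate((1, 0), samples)  # three times closer to the first node
```

At the midpoint both nodes weigh equally, while one grid unit from the first node its weight is 27 times that of the second. This shows how strongly the choice of p_id shapes the interpolated map regardless of walls or other propagation effects.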

3) THE DEVIATION OF THE ESTIMATED PSMs AGAINST DIFFERENT NUMBERS OF SECONDARY USERS
is from the imprecise power value setting in IDW, which controls the influence of the receiving users on the interpolation points. In Fig. 19 and Fig. 20, as the number of PS measurements increases, the deviation barely changes from 45% to 95% secondary users' grids because of the gap between the real and the assumed radio environment features in the IDW algorithm.
As for the Kriging algorithm, the deviation increases after an initial decrease in Fig. 19 and Fig. 20 for two reasons: (1) In the beginning, the PS measurements contain little information for Kriging. The deviation decreases from 15% to 55% because of the increasing information in the PS measurements from secondary users. (2) The true distribution of the PSMs shows a relatively big gap from the inaccurate semi-variogram assumption as the PS measurements increase from 65% to 95% secondary users' grids. The deviation increases because the more PS measurements the secondary users provide, the larger the deviation between the characteristics of the true complex wireless radio environment and the Kriging semi-variogram assumptions.

VI. CONCLUSIONS
In this paper, we propose a novel transfer learning-based power spectrum maps estimation algorithm named two-phase transfer learning GAN for underlay CRNs. Based on the domain projecting and the domain completing frameworks, the proposed algorithm relaxes the i.i.d. assumption in traditional deep learning-based algorithms, allowing us to estimate the PSMs based on the simulated or previously collected PSMs, which share similar rather than strictly identical distribution with the target data. Simulation results demonstrate that the TPTL-GAN provides a more accurate PSMs reconstruction performance than the traditional methods.
As introduced in previous sections, the estimation performance of the transfer learning algorithms depends on the similarity between the target PSMs data and the training data. In addition, practical wireless radio environments are generally considered to be complicated and dynamic. In our future work, we will focus on extending the TPTL-GAN to complicated or even unseen wireless radio environments, which may be achieved by constructing a larger and more general training data set and by enhancing the feature selection ability of the original TPTL-GAN algorithm.