Recovering Missing Values From Corrupted Historical Observations: Approaching the Limit of Predictability in Spectrum Prediction Tasks

Spectrum prediction is a key enabler of a range of emerging applications for dynamic spectrum access, from adaptive spectrum sensing to agile and proactive decision making. Because spectrum sensors are prone to missing readings and anomaly pollution, spectrum prediction with incomplete and corrupted historical observations has attracted extensive concern. In this paper, we tackle the challenging problem of how to accurately and efficiently recover the missing values from corrupted historical spectrum observations, so as to approach the limit of predictability in spectrum prediction tasks. Specifically, we first formulate a hankelized time-structured spectrum tensor that naturally preserves both the spectral and temporal dependencies among the historical spectrum observations. We then model the spectrum data recovery problem as tensor completion, exploiting the latent low-rank structure of the signal and the sparsity of the anomalies. To solve the resulting optimization problem efficiently, we design a robust online spectrum data recovery algorithm based on the alternating direction method. In addition, we introduce the concept of maximum predictability to reveal the harmful effects of missing data and anomalies, as well as to evaluate the effectiveness of the proposed algorithm from an information-theoretic perspective. Extensive experimental evaluations on a real-world spectrum observation dataset show that the proposed algorithm achieves significantly better spectrum data recovery performance in terms of both recovery accuracy and computational efficiency. The results also indicate that the proposed algorithm improves not only the predictability of the frequency bands but also the prediction accuracy of any predictive algorithm.


I. INTRODUCTION
A. BACKGROUND
Over the past decade, the contradiction between spectrum shortage and spectrum under-utilization has motivated active research on dynamic spectrum access (DSA). As a key enabling technique for DSA, cognitive radio (CR) provides the capability of harnessing under-utilized spectrum resources in an opportunistic manner. The fundamental step in implementing CR technology is to capture the relevant information about the spectrum state evolution, which relies on spectrum data analytics such as spectrum sensing and spectrum prediction. Briefly speaking, spectrum sensing detects the current spectrum state through signal detection, whereas spectrum prediction infers the future spectrum state evolution trajectory by fully exploiting the inherent statistical dependencies and regularities among historical observations. With the ability to foresee the state evolution of the radio spectrum, spectrum prediction has attracted much attention during the past few years [1], [2], and has been widely applied in CR networks to tasks such as adaptive sensing scheduling [3], agile decision making [4], and proactive channel switching [5]. So far, a diverse group of spectrum prediction methods has been proposed, e.g., Markov model-based spectrum prediction [6], machine learning-based spectrum prediction [7], Bayesian inference-based spectrum prediction [8], and deep learning-based spectrum prediction [9].

B. CHALLENGES
Missing values and anomalies are common in real-world spectrum data. In a typical CR network, large numbers of smart sensors are deployed and serve as secondary users (SUs) to monitor the frequency bands in real time. In practical environments, however, the sensors easily suffer from missing readings due to unexpected hardware failures or communication interruptions [10]. Besides data loss, the observed sensory data are also easily polluted by anomalies that arise from malicious operations, or from misconfigurations and failures of network equipment [11]. Previous studies [12]-[14] have demonstrated that the existence of missing values and anomalies in historical spectrum observations compromises the prediction accuracy of several spectrum prediction algorithms. However, such algorithm-specific evaluations cannot reveal the inherent deficiency caused by missing values and anomalies, because the degradation could be incurred either by the structure of the algorithms or by the missing values and anomalies themselves. It is therefore indispensable to introduce an algorithm-free metric that explicitly illustrates the effect of missing values and anomalies on spectrum prediction. The maximum predictability, which refers to the degree to which a correct prediction of the system's future state can be made independent of any specific predictive algorithm, is a good option. It is derived from information theory by considering both the randomness and the regularity of the system. Maximum predictability was first introduced by Song et al. for analyzing the mobility of mobile phone users [15].
Recently, it has also been introduced into CR networks for analyzing the TV-band and GSM-band dynamics underlying real-world spectrum measurements [16], [17]. Within the scope of spectrum prediction in this paper, we define the maximum predictability as the highest accuracy that could potentially be achieved by any spectrum prediction algorithm. In Section II, we analyze how missing values and anomalies affect the maximum predictability. Our results indicate that the accuracy of any spectrum prediction algorithm is limited by missing values and anomalies, owing to their destructive effects on the quality of historical spectrum observations in terms of predictability. Accurately yet efficiently recovering missing values from corrupted historical spectrum observations is therefore a non-trivial and challenging task in spectrum prediction.
The key to tackling this challenge lies in accurately modeling the quantitative dependencies of the missing values on the known ones. Since historical spectrum observations are typically expressed as one-dimensional time series, the majority of existing studies have focused on spectrum data recovery from the vector view, exploiting dependencies in the time domain. Filtering algorithms of various kinds, such as the MAF (Moving Average Filter) [18], SES (Single Exponential Smoothing) [19], and Kriging [20], have been developed for spectrum data recovery. In [21], a minimum Bayesian risk-based scheme is developed for robust spectrum prediction in the presence of sensing errors. However, such approaches can only learn temporal dependencies; they inevitably ignore the spectral dependencies among different frequency bands, leading to inaccurate estimation in many circumstances. To address this issue, some researchers have considered multiple frequency bands at the same time. They treated the historical spectrum observations as two-dimensional spectral-temporal matrices, whose rows correspond to frequency bands and columns correspond to time slots, and proposed various matrix-based methods that estimate the missing values by capturing the inherent low-rank structure of the spectrum matrices. To the authors' best knowledge, [22] is among the first works that introduce nonnegative matrix factorization to estimate missing spectrum data in the presence of noise and fading. However, [22] assumes that the data are anomaly-free and only presents a batch algorithm. In [23], the authors present a robust online scheme based on matrix completion and matrix recovery for spectrum data reconstruction and spectrum prediction.
Although the matrix-based methods can take good advantage of the spectral and temporal dependencies, they are limited to capturing local dependencies and ignore global dependencies such as periodicity and seasonality. Recently, tensor models have also been introduced to jointly capture dependencies in the time and frequency domains as well as the daily periodicity of the spectrum data. Based on such models, tensor completion algorithms [24], [25] have been developed for long-term spectrum prediction in the presence of missing data and anomalies. Despite their good performance in recovering missing values and separating anomalies, the computational complexity of these batch algorithms can easily become large. This inevitably compromises the efficiency of spectrum prediction algorithms, since an accurate but outdated prediction is useless.

C. CONTRIBUTIONS
In this paper, we propose to formulate a hankelized time-structured spectrum tensor that simultaneously captures the spectral-temporal dependencies and uncovers the latent time-directional structure of the spectrum data. Nevertheless, applying this high-level idea still requires addressing several challenges. The first is that the historical spectrum observations are easily corrupted by anomalies, so formulating an optimization problem for low-rank tensor-based data recovery that accounts for the effects of anomalies requires careful design. The second is that, as the spectrum tensor grows unboundedly over time, efficiently solving the convex optimization problem with a convergence guarantee deserves elaborate consideration. To this end, the main contributions of this paper are summarized as follows: 1) We introduce the concept of maximum predictability to represent the upper bound of the accuracy that any spectrum prediction algorithm can reach, and to capture the effects of missing values and anomalies on spectrum prediction from an information-theoretic perspective. We show that the maximum predictability of real-world frequency bands can reach up to 97% on average and degrades significantly with missing data and anomalies. Our findings indicate that the real-world frequency bands are highly predictable, but this predictability is limited by missing data and anomalies. 2) We propose to transform the historical spectrum observations into a hankelized time-structured spectrum tensor, and introduce the CANDECOMP/PARAFAC (CP) decomposition [26] to decouple the interdependency of the different tensor modes. We design a robust online tensor completion algorithm by drawing upon recent advances in the alternating direction method of multipliers (ADMM) [27] to accurately and efficiently recover the missing values under anomaly pollution.
3) We conduct extensive experiments with a real-world spectrum observation dataset to evaluate our proposed spectrum data recovery algorithm based on hankelized tensor completion (SDR-HTC). Our results show that the SDR-HTC algorithm outperforms the state-of-the-art spectrum data recovery algorithms in terms of both data recovery accuracy and computational efficiency. It is also shown that not only the maximum predictability of the frequency bands but also the prediction accuracy of various spectrum prediction algorithms are improved by our SDR-HTC algorithm. This indicates the capability of our SDR-HTC algorithm to approach the limit of predictability in spectrum prediction tasks.

D. NOTATIONS
An $N$th-order tensor is denoted by $\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$, whose elements are represented by its symbolic name with indices as subscripts, $x_{i_1, i_2, \ldots, i_N}$, $i_n \in \{1, 2, \ldots, I_n\}$, $n \in \{1, 2, \ldots, N\}$. Accordingly, when $N = 2$ we obtain a matrix $X \in \mathbb{R}^{I_1 \times I_2}$, each element of which is denoted by $x_{i_1, i_2}$. Similarly, when $N = 1$ we obtain a vector $x \in \mathbb{R}^{I_1}$ with elements $x_{i_1}$. The key variables used in this paper are summarized in Table 1. Some essential operations will be applied in this paper: the symbol "$\circ$" represents the vector outer product, while the symbol "$\otimes$" represents the matrix Hadamard product.
$(\cdot)^{\mathrm{T}}$ represents the transposition operation. $\|\cdot\|_0$ stands for the $\ell_0$-norm, which counts the number of non-zero elements of a tensor. $\|\cdot\|_1$ represents the $\ell_1$-norm, which calculates the sum of the magnitudes of the elements of a tensor. $\|\cdot\|_2$ stands for the $\ell_2$-norm, which calculates the square root of the sum of the squares of all the elements of a vector. $\|\cdot\|_F$ denotes the Frobenius norm, which calculates the square root of the sum of the absolute squares of the elements of a tensor. $\|\cdot\|_*$ represents the nuclear norm, which calculates the sum of the singular values (SVs) of a tensor.
Tensors, matrices and vectors written with square brackets, e.g., $\mathcal{A}[t]$, $A[t]$ and $a[t]$, represent the computed values after performing $t$ updates in the online subspace learning algorithm described in Section III.
Definition 1 (Slices of a tensor [28]): A tensor can be represented by a set of two-dimensional slices, which are defined by fixing all but two indices of the tensor. For a 3rd-order tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times I_3}$, three kinds of slices, i.e., the horizontal, lateral and frontal slices, can be obtained, denoted by $(\mathcal{X})_{i_1,:,:}$, $(\mathcal{X})_{:,i_2,:}$ and $(\mathcal{X})_{:,:,i_3}$, respectively. In this paper, we typically use $X_f$ and $X_\tau$ to represent the $f$th horizontal slice and the $\tau$th frontal slice of a tensor $\mathcal{X}$, respectively.
Definition 2 (Rank-1 tensor): A rank-1 tensor is also called a simple or decomposable tensor. The $N$th-order tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ is a rank-1 tensor as long as it can be written as the outer product of $N$ vectors, $\mathcal{X} = v^{(1)} \circ v^{(2)} \circ \cdots \circ v^{(N)}$.
Definition 3 (CANDECOMP/PARAFAC (CP) Decomposition): The idea of the CP decomposition is to express a tensor as the sum of a finite number of rank-1 tensors. In this way, the $N$th-order tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ can be written as $\mathcal{X} = \sum_{r=1}^{R} v_r^{(1)} \circ v_r^{(2)} \circ \cdots \circ v_r^{(N)}$. Collecting the vectors of each mode into factor matrices $V^{(n)} = [v_1^{(n)}, v_2^{(n)}, \ldots, v_R^{(n)}] \in \mathbb{R}^{I_n \times R}$, the CP decomposition of the tensor $\mathcal{X}$ can also be expressed as $\mathcal{X} = [\![ V^{(1)}, V^{(2)}, \ldots, V^{(N)} ]\!]$.
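As a concrete illustration, the sum-of-rank-1 form in Definition 3 can be sketched for a 3rd-order tensor as follows (a minimal NumPy sketch; the function name and shapes are illustrative, not from the paper):

```python
import numpy as np

# Reconstruct a 3rd-order tensor from CP factor matrices V1 (I1 x R),
# V2 (I2 x R), V3 (I3 x R) as a sum of R rank-1 tensors (Definition 3).
def cp_reconstruct(V1, V2, V3):
    R = V1.shape[1]
    X = np.zeros((V1.shape[0], V2.shape[0], V3.shape[0]))
    for r in range(R):
        # the outer product of the r-th columns gives one rank-1 component
        X += np.einsum('i,j,k->ijk', V1[:, r], V2[:, r], V3[:, r])
    return X
```

The loop makes the sum-of-rank-1-terms structure explicit; in practice the whole reconstruction collapses to a single `np.einsum('ir,jr,kr->ijk', V1, V2, V3)` call.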

E. ORGANIZATION
The rest of the paper is organized as follows. In Section II, we present predictability analysis of real-world spectrum observations and evaluate the effect of missing values and anomalies on spectrum prediction. In Section III, we present our model and optimization algorithm. Section IV presents experimental results and analysis, and Section V offers some concluding remarks.

II. PREDICTABILITY ANALYSIS
In this section, we detail the methodology of using statistical entropy measures and Fano's inequality to quantify the degree of predictability, and we explore the impact of missing data and anomalies on predicting the state evolution of the frequency bands.
Entropy is probably one of the most fundamental measures to characterize the degree of predictability of a stochastic process. In general, lower entropy corresponds to higher predictability, and vice versa. Let $y_f = \{y_{f,1}, y_{f,2}, \cdots, y_{f,T}\}$ denote the spectrum observation sequence of a given frequency band $f$ over $T$ consecutive time slots, where $y_{f,t}$, $t \in \{1, \cdots, T\}$, is a random variable representing the spectrum observation of frequency band $f$ at time slot $t$. Considering both the probabilities of the spectrum observations and their temporal order, the entropy of $y_f$ is given by
$$S_f = -\sum_{s_f \subset y_f} P(s_f) \log_2 P(s_f),$$
where $P(s_f)$ represents the probability of finding a particular time-ordered subsequence $s_f$ in the spectrum observation sequence $y_f$. However, the problem of finding all the subsequences of a given sequence has exponential computational complexity. In practice, we use the estimator proposed by Lempel and Ziv [29], which rapidly converges to the value of the entropy. For the $T$-length spectrum observation sequence $y_f$, the Lempel-Ziv entropy estimator is defined as
$$S_f^{\mathrm{est}} = \left( \frac{1}{T} \sum_{t=1}^{T} \Lambda_t \right)^{-1} \log_2 T,$$
where $\Lambda_t$ represents the length of the shortest subsequence that starts at the $t$th time slot and does not appear from time slot $1$ to time slot $t-1$.
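The Lempel-Ziv estimator above can be sketched as follows (a hedged, brute-force illustration; $\Lambda_t$ is computed by direct substring search, so this sketch is only suitable for short sequences):

```python
import numpy as np

# Lempel-Ziv entropy estimate (in bits) of a symbol sequence: Lambda_t is the
# length of the shortest subsequence starting at slot t that does not appear
# earlier in the sequence. Names are illustrative, not from the paper.
def lz_entropy(seq):
    T = len(seq)
    lambdas = []
    for t in range(T):
        l = 1
        # grow the window until the substring is new (or runs off the end)
        while t + l <= T and _contains(seq[:t], seq[t:t + l]):
            l += 1
        lambdas.append(l)
    return (T / sum(lambdas)) * np.log2(T)

def _contains(prefix, sub):
    n, m = len(prefix), len(sub)
    return any(prefix[i:i + m] == sub for i in range(n - m + 1))
```

A highly regular sequence yields long matches (large $\Lambda_t$) and hence low estimated entropy, while an erratic one yields entropy close to $\log_2 Q$.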
We define the predictability $\Pi$ as the probability that an appropriate predictive algorithm can correctly predict the future spectrum state of a frequency band. By invoking Fano's inequality as in [15], this quantity is upper bounded by the maximum predictability $\Pi^{\max}$, i.e., $\Pi \le \Pi^{\max}$. Moreover, we can obtain the maximum predictability $\Pi_f^{\max}$ of frequency band $f$ through numerical calculation from
$$S_f = H(\Pi_f^{\max}) + (1 - \Pi_f^{\max}) \log_2 (Q_f - 1),$$
where $H(p) = -p \log_2 p - (1-p) \log_2 (1-p)$ is the binary entropy function and $Q_f$ represents the number of distinct observation values.
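Since the right-hand side of the Fano relation is monotonically decreasing in $\Pi$ on $[1/Q, 1]$, the maximum predictability can be obtained numerically by bisection. A minimal sketch, assuming the entropy estimate $S$ (in bits) and the alphabet size $Q$ are given (names are our own):

```python
import math

# Solve S = H(Pi) + (1 - Pi) * log2(Q - 1) for the maximum predictability
# Pi_max by bisection on [1/Q, 1], where H is the binary entropy in bits.
def max_predictability(S, Q, tol=1e-9):
    def h(p):
        if p <= 0.0 or p >= 1.0:
            return 0.0
        return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

    def f(pi):  # monotonically decreasing in pi on [1/Q, 1]
        return h(pi) + (1 - pi) * math.log2(Q - 1) - S

    lo, hi = 1.0 / Q, 1.0
    if f(lo) <= 0:  # entropy at (or above) its maximum value log2(Q)
        return lo
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

For a maximally random band ($S = \log_2 Q$) this returns the chance level $1/Q$; as $S \to 0$ it approaches 1.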
As an illustrative example, Fig. 1 shows the distribution of the maximum predictability obtained from a real-world spectrum dataset. The dataset comes from the RWTH Aachen University spectrum measurement campaign [30]. It contains one-week-long power spectral density (PSD) data sequences of a total of 14746 sampling frequency points in the 300∼3000 MHz frequency region. The size of the dataset is (339380, 14746). In the following, we first perform data preprocessing, then determine the entropy and maximum predictability of each frequency point, and finally obtain the distribution of the maximum predictability over all frequency points. Some data preprocessing operations, including resampling and quantization, are essential to facilitate the analysis. In order to speed up the calculation process, we resample the PSD data sequences in the original dataset with a 3-minute inter-sample time; that is, we obtain a new dataset D1 of size (3393, 14746) by resampling the original dataset. Due to the high variability of the measured PSD values, we quantize the measured PSD values of each individual frequency point and replace them with corresponding symbols to facilitate further processing. Specifically, we take the maximum and minimum values of the original dataset as the upper and lower bounds, respectively, and quantize the PSD values in D1 uniformly into 8 levels. The measured PSD values inside each quantization interval are represented by a corresponding symbol, i.e., 1, 2, 3, 4, 5, 6, 7, 8. For comparison, we also obtain the maximum predictability of independent identically distributed (i.i.d.) Gaussian noise data sequences of the same size as D1 and plot their distribution. From Fig. 1, we observe that the maximum predictability of real-world spectrum observations is much higher than that of the i.i.d. Gaussian noise data.
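The quantization step described above can be sketched as follows (an illustrative NumPy version; the function name is our own, while the uniform 8-level scheme between global bounds follows the description in the text):

```python
import numpy as np

# Uniformly quantize PSD values into `levels` intervals between the global
# minimum and maximum, mapping each value to a symbol in {1, ..., levels}.
def quantize_psd(psd, levels=8):
    lo, hi = psd.min(), psd.max()
    edges = np.linspace(lo, hi, levels + 1)
    # digitize against the interior edges so the maximum lands in the top level
    symbols = np.digitize(psd, edges[1:-1]) + 1
    return symbols
```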
This demonstrates that the evolution of the frequency bands is highly predictable owing to their distinct temporal correlations and regularities. On the other hand, we also see that the maximum predictability $\Pi^{\max}$ of each frequency point is lower than 1, which means that regardless of how good our predictive algorithms are, the evolution of the frequency bands cannot be predicted perfectly without error. Besides, we observe that the average value of $\Pi^{\max}$ over all sampling frequency points is 0.97. This indicates that the evolution of the frequency bands can potentially be predicted correctly from D1 with an accuracy of 0.97.
To characterize the impact of missing data and anomalies on predicting the evolution of the frequency bands, we construct another dataset D2 by artificially injecting missing data and anomalies into the original real-world spectrum dataset from the RWTH Aachen University spectrum measurement campaign. We first inject anomalies into a portion of randomly selected measured PSD values following the standard anomaly injection method used in existing works [23], [31]. To be specific, we use the exponentially weighted moving average method to predict the future entries based on their past values (i.e., $y_t = \alpha x_t + (1-\alpha) y_{t-1}$, where $\alpha = 0.8$ in our experiments) and use the maximum difference between the actual and predicted values as the anomaly size to be injected. We vary the fraction of entries into which anomalies are injected from 5% to 30%. Then we randomly drop a portion of the PSD values to simulate missing data, varying the fraction of deleted entries from 10% to 90%. Note that the same data preprocessing is also performed on the new dataset D2 when determining the maximum predictability $\Pi^{\max}$. Fig. 2 presents the average maximum predictability over all sampling frequency points for different percentages of injected anomalies and missing data. From the figure, we observe that the average maximum predictability decreases significantly as the percentages of missing data and injected anomalies increase. This indicates that the existence of missing data and anomalies degrades the predictability of the frequency bands due to their destructive effects on the temporal correlations and regularities of the frequency bands.
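The injection procedure can be sketched as follows (a hedged illustration for a single one-dimensional sequence; the function name and the NaN convention for missing entries are our own):

```python
import numpy as np

# Sketch of the injection procedure: EWMA-predict each entry from its past
# (alpha = 0.8 in the text), take the maximum |actual - predicted| gap as the
# anomaly size, add it to a random fraction of entries, then drop a random
# fraction of entries (marked NaN here) to simulate missing data.
def inject(data, p_anomaly=0.1, p_miss=0.3, alpha=0.8, seed=0):
    rng = np.random.default_rng(seed)
    x = data.astype(float).copy()
    # EWMA prediction along time: y_t = alpha * x_t + (1 - alpha) * y_{t-1}
    y = np.zeros_like(x)
    y[0] = x[0]
    for t in range(1, len(x)):
        y[t] = alpha * x[t] + (1 - alpha) * y[t - 1]
    size = np.max(np.abs(x - y))            # injected anomaly magnitude
    mask_a = rng.random(x.shape) < p_anomaly
    x[mask_a] += size
    mask_m = rng.random(x.shape) < p_miss   # missing entries flagged as NaN
    x[mask_m] = np.nan
    return x, mask_a, mask_m
```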
Considering that predictive algorithms can achieve high prediction accuracy only when the evolution of a frequency band is highly predictable (i.e., has a large maximum predictability), we can also conclude that the performance of spectrum prediction algorithms is limited by missing data and anomalies.
So far, we have shown how highly predictable the real-world frequency bands are and the negative effects of missing data and anomalies on spectrum prediction. These findings motivate us in the next section to develop a robust online spectrum data recovery framework to tackle the challenges introduced by missing data and anomalies for spectrum prediction.

III. ROBUST ONLINE ALGORITHM DESIGN
In this section, we first mathematically formulate the optimization problem of our proposed robust online spectrum data recovery framework. Then, we provide detailed solutions and an algorithm to efficiently solve the optimization problem in an online manner.

A. PROBLEM FORMULATION
The objective of spectrum data recovery is to infer the missing or anomalous spectrum data from the obtained incomplete and corrupted spectrum observations. To achieve this objective, we must robustly model the underlying latent structure of the spectrum data. Even if we keep a matrix form of the spectrum observations, similar to [23], to capture the spectral and temporal dependencies among multiple frequency bands, we cannot exploit the similarity among partial temporal variations inside each frequency band. Considering that the evolution of frequency bands usually exhibits periodic and seasonal characteristics accompanied by relatively large noise and fluctuations, a novel analysis method that uncovers the latent time-directional structure inside each frequency band is required. A Hankel matrix representation combined with the tensor structure plays a crucial role here, providing a three-dimensional model that is robust against noisy and fluctuating signals. To this end, we propose to transform the spectrum observations into a three-dimensional hankelized time-structured spectrum tensor and analyze it. As shown in Fig. 3, we first transform the spectrum observation sequence of each frequency band into a Hankel matrix by embedding the one-dimensional time series into a multi-dimensional series. More concretely, we use a one-step sliding window of length $W$ to embed the $T$-length spectrum observation sequence $y_f$ (Fig. 3-i) into a spectrum matrix with the Hankel structure (Fig. 3-ii), denoted by [32]
$$Y_f = \begin{bmatrix} y_{f,1} & y_{f,2} & \cdots & y_{f,K} \\ y_{f,2} & y_{f,3} & \cdots & y_{f,K+1} \\ \vdots & \vdots & \ddots & \vdots \\ y_{f,W} & y_{f,W+1} & \cdots & y_{f,T} \end{bmatrix} \in \mathbb{R}^{W \times K},$$
where $K = T - W + 1$. Then, we generate the hankelized time-structured spectrum tensor $\mathcal{Y} \in \mathbb{R}^{F \times W \times K}$ by plugging the Hankel matrix $Y_f$ into the $f$th horizontal slice of $\mathcal{Y}$, that is, $(\mathcal{Y})_{f,:,:}$ (Fig. 3-iii). Through this tensorization, we can efficiently describe the spectral-temporal dependencies among multiple frequency bands.
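The tensorization step can be sketched as follows (an illustrative NumPy version operating on an $F \times T$ observation matrix; names are our own):

```python
import numpy as np

# Embed each band's T-length sequence into a W x K Hankel matrix with a
# one-step sliding window (K = T - W + 1), then stack the F Hankel matrices
# as horizontal slices of an F x W x K tensor.
def hankelize(Y2d, W):
    F, T = Y2d.shape
    K = T - W + 1
    tensor = np.empty((F, W, K))
    for f in range(F):
        for k in range(K):
            # column k of the f-th slice is the window starting at slot k,
            # i.e. (Y_f)_{w,k} = y_{f, w+k-1} in 1-based indexing
            tensor[f, :, k] = Y2d[f, k:k + W]
    return tensor
```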
There are many factors that contribute to the observations in the spectrum tensor, including signals, anomalies and inherent noise. It is natural to consider the spectrum tensor as a mixture of all these effects. In the following, we further model the spectrum tensor as Y = X + V + E, where X represents the signal of interest, V corresponds to the sparsely distributed anomalies, and E denotes the additive noise.
Missing spectrum observations are common due to reasons such as the mobility of the spectrum sensors, limited spectrum sensing capacities, and transmission losses in the data collection process. To characterize the missing observations in the spectrum tensor $\mathcal{Y}$, we use the set $\Omega$ to denote the tuples of available entries, with a sampling operator $P_\Omega(\cdot)$ that keeps the entries of $\mathcal{Y}$ indexed by $\Omega$ unchanged and sets the rest to zero. With this notation, the observed spectrum tensor can be expressed as $P_\Omega(\mathcal{Y})$.
To efficiently model the signal subspace $\mathcal{X}$ from the incomplete and corrupted spectrum tensor $P_\Omega(\mathcal{Y})$, an optimization problem can be formulated by leveraging the inherent low-rank structure of $\mathcal{X}$ and the sparsity of $\mathcal{V}$ as
$$\min_{\mathcal{X}, \mathcal{V}} \ \mathrm{rank}(\mathcal{X}) + \rho \|\mathcal{V}\|_0 \quad \mathrm{s.t.} \ \big\| P_\Omega(\mathcal{Y} - \mathcal{X} - \mathcal{V}) \big\|_F \le \varepsilon, \quad (\mathcal{X})_{f,:,:} \in S_H, \ \forall f, \tag{5}$$
where $\mathrm{rank}(\cdot)$ stands for the rank of a tensor, $\rho$ is a positive rank-sparsity controlling parameter, and $\varepsilon$ is a noise-related parameter. The constraint $(\mathcal{X})_{f,:,:} \in S_H$ requires each horizontal slice of $\mathcal{X}$ to have the Hankel structure.
It should be noted that both the rank and the $\ell_0$-norm criteria are in general NP-hard to optimize. To make the optimization problem in (5) tractable, one efficient way is to replace $\mathrm{rank}(\cdot)$ with its tightest convex surrogate, the nuclear norm $\|\cdot\|_*$, and to replace $\|\cdot\|_0$ with its tightest convex surrogate, the $\ell_1$-norm $\|\cdot\|_1$. However, these replacements result in a non-smooth optimization problem, since neither $\|\cdot\|_*$ nor $\|\cdot\|_1$ is differentiable at the origin, and the size of the optimization problem can become quite large with time. To tackle these challenges, the CP decomposition is introduced in this paper to decompose the signal tensor $\mathcal{X}$ by leveraging its low-rank property as [33]
$$\mathcal{X} = \sum_{r=1}^{R} a_r \circ c_r \circ b_r, \tag{6}$$
where $R$ is the rank of $\mathcal{X}$, and $a_r \in \mathbb{R}^{F}$, $b_r \in \mathbb{R}^{K}$ and $c_r \in \mathbb{R}^{W}$ denote the rank-one vectors. Collecting these vectors into the factor matrices $A = [a_1, a_2, \ldots, a_R] \in \mathbb{R}^{F \times R}$, $B = [b_1, b_2, \ldots, b_R] \in \mathbb{R}^{K \times R}$ and $C = [c_1, c_2, \ldots, c_R] \in \mathbb{R}^{W \times R}$, and replacing the nuclear norm with Frobenius-norm regularizers on the factors, the batch estimator becomes
$$\min_{A,B,C,\mathcal{V}} \ \big\| P_\Omega(\mathcal{Y} - \mathcal{X} - \mathcal{V}) \big\|_F^2 + \mu_r \big( \|A\|_F^2 + \|B\|_F^2 + \|C\|_F^2 \big) + \mu_s \|\mathcal{V}\|_1 \quad \mathrm{s.t.} \ (\mathcal{X})_{f,:,:} \in S_H, \ \forall f, \tag{7}$$
where $\mu_r, \mu_s \ge 0$ are rank- and sparsity-controlling parameters. Finally, we consider an online setting of subspace learning so that all past measurements and model parameters need not be stored. In practice, spectrum observations are acquired sequentially in time, and the online setting also helps to adapt dynamically to new patterns in the spectrum observations. To this end, we tackle an online tensor completion problem that estimates the CP factor matrices $\{A, b, C\}$ and the anomaly component $V$ at each time slot $t$ by minimizing the exponentially weighted least-squares cost function with Frobenius- and $\ell_2$-norm regularizers,
$$\min_{A, C, \{b[\tau]\}, \{V_\tau\}} \ \sum_{\tau=1}^{t} \lambda^{t-\tau} \Big[ \big\| P_{\omega_\tau}\big(Y_\tau - A\,\mathrm{diag}(b[\tau])\,C^{\mathrm{T}} - V_\tau\big) \big\|_F^2 + \mu_s \|V_\tau\|_1 + \mu_h \big\| (X_\tau)_{:,1:W-1} - (X_{\tau-1})_{:,2:W} \big\|_F^2 + \mu_r \|b[\tau]\|_2^2 \Big] + \mu_r \big( \|A\|_F^2 + \|C\|_F^2 \big), \tag{8}$$
where $X_\tau = A\,\mathrm{diag}(b[\tau])\,C^{\mathrm{T}}$ is the reconstructed $\tau$th frontal slice, $P_{\omega_\tau}(\cdot)$ represents the sampling operator for the $\tau$th frontal slice of the spectrum tensor, and $\mu_h \ge 0$ is a Hankel structure error-controlling parameter. $\{Y_\tau, V_\tau\} \in \mathbb{R}^{F \times W}$ are the $\tau$th frontal slices of $\mathcal{Y}$ and $\mathcal{V}$, respectively, and $0 < \lambda < 1$ is the so-termed forgetting factor, which lets observations in the distant past be forgotten with exponential decay and assigns more importance to recent observations.
In the case of infinite memory ($\lambda = 1$), the formulation (8) coincides with the batch estimator (7). It should be noted that, for the Hankel structure constraint $S_H$, we consider only two successive frontal slices, which avoids model reconstruction and diagonal averaging over all past $\{b[\tau]\}_{\tau=1}^{t-1}$.

B. SOLUTIONS AND ALGORITHM DESIGN
In this subsection, we give the detailed solutions of the minimization problem in (8), where the optimization variables are $\{A, b, C\}$ and $V$. The problem is not convex in the joint variables, but it is convex in each individual variable when the others are fixed. Thus, we adopt an alternating direction minimization procedure that successively optimizes one variable with the others fixed, under the framework of ADMM [27].
1) UPDATE b[t]
Fixing $\{A, C, V\}$, the minimization in (8) with respect to $b[t]$ reduces to a regularized least-squares problem over the observed entries of the current slice,
$$b[t] = \arg\min_{b} \ \big\| \omega_t \otimes \big(Y_t - V_t - A\,\mathrm{diag}(b)\,C^{\mathrm{T}}\big) \big\|_F^2 + \mu_r \|b\|_2^2, \tag{9}$$
where $\omega_t \in \mathbb{R}^{F \times W}$ denotes a binary $\{0,1\}$-matrix whose entries indicate the sampled positions of the $t$th slice. Taking the inner objective in (9) and setting its derivative with respect to $b$ to zero, the optimal $b[t]$ is obtained in closed form as
$$b[t] = \big( M_t^{\mathrm{T}}\,\mathrm{Diag}(\mathrm{vec}(\omega_t))\,M_t + \mu_r I_R \big)^{-1} M_t^{\mathrm{T}}\,\mathrm{vec}\big( \omega_t \otimes (Y_t - V_t) \big), \tag{10}$$
where $M_t \in \mathbb{R}^{FW \times R}$ stacks the products $a_{f,r} c_{w,r}$ over the index pairs $(f, w)$, and $I_R$ is an identity matrix of size $R$.
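A minimal sketch of this closed-form update (assuming the masked least-squares form above; names are illustrative):

```python
import numpy as np

# Closed-form ridge update of b[t] with A, C, V fixed: the observed entries of
# the slice X_t = A diag(b) C^T are linear in b, so b solves a small RxR system.
def update_b(A, C, Y_t, V_t, omega, mu_r):
    F, R = A.shape
    W = C.shape[0]
    # design matrix: M[(f,w), r] = A[f,r] * C[w,r]
    M = np.einsum('fr,wr->fwr', A, C).reshape(F * W, R)
    obs = omega.reshape(-1).astype(bool)
    r = (Y_t - V_t).reshape(-1)[obs]   # observed residual target
    Mo = M[obs]
    return np.linalg.solve(Mo.T @ Mo + mu_r * np.eye(R), Mo.T @ r)
```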

2) UPDATE V[t]
Fixing $\{A, b, C\}$, we further carry out the minimization in (8) with respect to $V[t]$ as
$$V[t] = \arg\min_{V} \ \big\| \omega_t \otimes \big(Y_t - A\,\mathrm{diag}(b[t])\,C^{\mathrm{T}} - V\big) \big\|_F^2 + \mu_s \|V\|_1. \tag{11}$$
The inner objective in (11) decomposes into a parallel set of smaller sub-objectives, one for each observed entry. These subproblems are typical lasso regression problems, and an efficient solver can be found in [34]; each entry in fact admits a closed-form soft-thresholding solution applied to the residual $Y_t - A\,\mathrm{diag}(b[t])\,C^{\mathrm{T}}$.
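A minimal sketch of the entrywise soft-thresholding solution (assuming the residual is precomputed; the $\mu_s/2$ threshold follows from the scalar lasso optimality condition $\min_v (r-v)^2 + \mu_s |v|$):

```python
import numpy as np

# Entrywise soft-thresholding update of the anomaly slice V[t]: observed
# entries shrink the residual toward zero, unobserved entries stay at zero.
def update_v(residual, omega, mu_s):
    thresh = mu_s / 2.0
    V = np.sign(residual) * np.maximum(np.abs(residual) - thresh, 0.0)
    return V * omega   # zero out unobserved entries
```

Small residuals (below the threshold) are absorbed as noise; only large deviations survive as anomalies, which is exactly the sparsity behavior the $\ell_1$ penalty is meant to induce.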

3) UPDATE A[t] AND C[t]
In the next step of the alternating direction minimization procedure, we update $A[t]$ and $C[t]$ alternately via second-order stochastic gradient steps based on the recursive least-squares method with a forgetting parameter. We first fix $\{b, C, V\}$ and carry out the minimization in (8) with respect to $A$ as
$$A[t] = \arg\min_{A} \ \sum_{\tau=1}^{t} \lambda^{t-\tau} \big\| \omega_\tau \otimes \big(Y_\tau - V_\tau - A\,\mathrm{diag}(b[\tau])\,C^{\mathrm{T}}\big) \big\|_F^2 + \mu_r \|A\|_F^2. \tag{13}$$
The objective function in (13) can be decomposed into a parallel set of smaller problems, one for each row $a_f$ of $A$, as [33]
$$\min_{a_f} \ \sum_{\tau=1}^{t} \lambda^{t-\tau} \sum_{w=1}^{W} (\omega_\tau)_{f,w} \big( \beta_w[\tau] - a_f^{\mathrm{T}} \alpha_w[\tau] \big)^2 + \mu_r \|a_f\|_2^2, \tag{14}$$
where we denote the regression vector and the residual target in (14) as $\alpha_w[\tau] = \mathrm{diag}(b[\tau])\,c_w$ and $\beta_w[\tau] = (Y_\tau - V_\tau)_{f,w}$, respectively. By setting the derivative of (14) with respect to $a_f$ equal to zero, the optimal $a_f[t]$ can be obtained as
$$a_f[t] = \big( M_f[t] + \mu_r I_R \big)^{-1} q_f[t], \tag{15}$$
where the sufficient statistics are updated recursively as
$$M_f[t] = \lambda M_f[t-1] + \sum_{w=1}^{W} (\omega_t)_{f,w}\, \alpha_w[t]\, \alpha_w[t]^{\mathrm{T}}, \tag{16}$$
$$q_f[t] = \lambda q_f[t-1] + \sum_{w=1}^{W} (\omega_t)_{f,w}\, \beta_w[t]\, \alpha_w[t]. \tag{17}$$
The update of $C[t]$ follows the same pattern. Fixing $\{A, b, V\}$, the minimization in (8) with respect to $C$ in (18) decomposes into a set of smaller problems, one for each row $c_w$ of $C$, as given in (19), with the regression vector $\mathrm{diag}(b[\tau])\,a_f$. The optimal $c_w[t]$ is then obtained as
$$c_w[t] = \big( J_w[t] + \mu_r I_R \big)^{-1} U_w[t], \tag{20}$$
where $J_w[t]$ and $U_w[t]$ are calculated by updating their previous values analogously to (16) and (17), as shown in (21) and (22). After the above derivations, the overall algorithm to solve the minimization problem in (8) is summarized in Algorithm 1, where the estimated signal subspace $\hat{\mathcal{X}}$ is finally obtained by eliminating the negative effects of the missing data and the anomalies $\hat{\mathcal{V}}$.
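The recursive flavor of the per-row update can be sketched as follows (a hedged illustration for a single row $a_f$; the statistic names `M_f`, `q_f` are our own):

```python
import numpy as np

# Recursive least-squares style update for one row a_f of A: a pair of
# sufficient statistics (Gram matrix M_f, cross term q_f) is decayed by the
# forgetting factor lam and refreshed with the slice observed at time t.
def update_row(M_f, q_f, b_t, C, y_f, v_f, omega_f, lam, mu_r):
    G = C * b_t                           # rows g_w = diag(b_t) c_w
    obs = omega_f.astype(bool)
    Go = G[obs]
    ro = (y_f - v_f)[obs]                 # observed residual targets
    M_f = lam * M_f + Go.T @ Go           # decayed Gram matrix
    q_f = lam * q_f + Go.T @ ro           # decayed cross term
    a_f = np.linalg.solve(M_f + mu_r * np.eye(len(b_t)), q_f)
    return a_f, M_f, q_f
```

Only the $R \times R$ statistics are carried between slots, which is what keeps the memory footprint of the online algorithm independent of $t$.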
Consequently, the total computational complexity at each iteration of Algorithm 1 is dominated by a term cubic in the sliding-window length $W$. This indicates that the length of the sliding window $W$ is the dominant factor, since the rank $R$ is assumed to be small.

IV. EXPERIMENTAL EVALUATIONS
Data preparation: In this section, we use the real-world spectrum dataset from the RWTH Aachen University spectrum measurement campaign [30] to evaluate the proposed SDR-HTC algorithm. Specifically, we use the 'IN Subband 1' dataset for all the following experiments. This dataset records the measured PSD values over 8192 frequency points for a one-week duration. Some fundamental measurement parameters are shown in Table 2. To characterize the impact of anomalies and missing data, we artificially inject anomalies into a portion of the entries of the original dataset and drop elements uniformly and independently, in the same way as described in Section II. We use $p_{\mathrm{miss}}$ and $p_{\mathrm{anomaly}}$ to denote the missing data percentage and the anomaly data percentage, respectively.
Parameter setup: For the proposed SDR-HTC algorithm, the ranges of the rank-, sparsity-, and Hankel structure error-controlling parameters are derived from the optimality conditions for (8), which guarantee that for $\mu_r$, $\mu_s$ and $\mu_h$ selected from the corresponding ranges, an optimal solution can always be found. Along the lines of [23], [33], we empirically set the penalty parameters as $\mu_r = \mu_h = 10^{-3}$ and $\mu_s = 6 \times 10^{-2}$. We also set the forgetting factor $\lambda = 0.7$ so that past observations are forgotten with exponential decay. The estimated rank is set as $R = 10$.
Evaluation metric: To quantitatively evaluate the spectrum data recovery performance of our proposed SDR-HTC algorithm, we adopt one of the most widely recognized evaluation metrics, the normalized root squared error (RMSE) in dB, to quantify the magnitude of the error as
$$\mathrm{RMSE\,(dB)} = 20 \log_{10} \frac{\sqrt{\sum_n (x_n - \hat{x}_n)^2}}{\sqrt{\sum_n x_n^2}},$$
where $x_n$ denotes the ground truth corresponding to the estimated value $\hat{x}_n$. In the following, we first conduct an experiment to compare our proposed SDR-HTC algorithm with the state-of-the-art spectrum data recovery methods under different missing data and anomaly data percentages. Then, we perform predictability analysis of the recovered spectrum data obtained by our proposed SDR-HTC algorithm, compared with the unrecovered spectrum data. Finally, we perform spectrum prediction with various predictive algorithms on the recovered spectrum data to verify the effectiveness of our proposed SDR-HTC algorithm for spectrum prediction. We run all the experiments on a laptop computer with a 2.20 GHz Intel Core i7 CPU and 16 GB RAM using MATLAB R2018b.
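The metric can be sketched as follows (assuming the $20\log_{10}$ normalized form given above; the function name is our own):

```python
import numpy as np

# Normalized recovery error in dB: 20 * log10(||x_hat - x|| / ||x||).
# More negative values indicate more accurate recovery.
def rmse_db(x_hat, x):
    return 20.0 * np.log10(np.linalg.norm(x_hat - x) / np.linalg.norm(x))
```

For instance, an estimate whose entries are uniformly off by 10% of the signal energy evaluates to -20 dB.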

A. COMPARISON WITH THE STATE-OF-THE-ART SPECTRUM DATA RECOVERY METHODS
In this subsection, we compare our proposed SDR-HTC algorithm with several candidate baseline spectrum data recovery methods in the literature, which can be broadly grouped into three classes: 1-D time-series-based spectrum data recovery methods operating in the time domain only, 2-D matrix-based spectral-temporal spectrum data recovery methods, and 3-D tensor-based spectral-temporal spectrum data recovery methods. For the 1-D baseline methods, we select the well-known MAF (Moving Average Filter) interpolation method [18], the SES (Single Exponential Smoothing) method [19] and the CS (Compressive Sensing) method [35]. They are among the earliest yet very effective spectrum data recovery methods, but they cannot deal with corrupted data. For the 2-D baseline methods, we select the MCMR (matrix completion and matrix recovery) algorithm in [23]. MCMR is a robust online spectrum data recovery method, which also utilizes ADMM for solving the matrix completion problem; it provides valuable intuition for the optimization part of this paper. For the 3-D baseline methods, we select the most recent work in the literature, i.e., the HaLRTC (high accuracy low rank tensor completion) algorithm in [24]. Notably, the proposed SDR-HTC framework differs from HaLRTC in at least two aspects: i) SDR-HTC is robust to anomalies, while HaLRTC is designed without considering the impact of anomalies; ii) SDR-HTC works in an online manner, while HaLRTC is a batch algorithm. Fig. 4 compares the spectrum data recovery error of the different methods, in terms of RMSE, under missing data percentages from 0.1 to 0.9. We evaluate the case where the fraction of entries injected with anomalies is p anomaly = 0.1 and the length of the sliding window for our framework is W = 100. We can see that our proposed SDR-HTC algorithm achieves higher recovery accuracy than the other five algorithms. In particular, for p miss between 0.3 and 0.7, our framework exhibits significantly higher recovery capability.
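To illustrate how a 1-D baseline such as SES fills missing readings along a single frequency point, here is a minimal sketch; the smoothing constant alpha and the NaN-marks-missing convention are assumptions for illustration, not details from [19]:

```python
import numpy as np

def ses_impute(series, alpha=0.5):
    """Single Exponential Smoothing imputation along one frequency point.

    Missing samples (NaN) are replaced by the current smoothed estimate;
    observed samples update the estimate as s = alpha * x + (1 - alpha) * s.
    """
    out = np.array(series, dtype=float)
    s = None  # smoothed estimate, undefined until the first observation
    for i, x in enumerate(out):
        if np.isnan(x):
            out[i] = s if s is not None else 0.0
        else:
            s = x if s is None else alpha * x + (1 - alpha) * s
    return out

y = ses_impute([-90.0, np.nan, -92.0, np.nan, np.nan])
print(y)  # gaps filled with the running smoothed PSD estimate
```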
We also observe that the recovery error of all methods increases with the percentage of missing data, while SDR-HTC always remains at a relatively lower level.

1) RECOVERY ACCURACY
Next, we set the missing data percentage to p miss = 0.2 and investigate the recovery performance under different corruption percentages (from 0.05 to 0.3). The results are shown in Fig. 5. Similar to the observations in Fig. 4, the proposed SDR-HTC always outperforms the other five algorithms. Moreover, the recovery RMSE of SDR-HTC is not very sensitive to an increasing percentage of injected anomaly data. This result demonstrates the robustness of the proposed framework against sparsely-distributed anomalies.

2) COMPUTATION TIME
Fig. 6 compares the computation time of the different spectrum data recovery methods. We evaluate the case where the percentage of missing data is p miss = 0.2 and the percentage of injected anomalies is p anomaly = 0.1. Fig. 6 reveals that the proposed SDR-HTC algorithm is substantially faster than CS and HaLRTC. In summary, the above experimental results suggest that our algorithm can achieve better recovery performance and computation efficiency with both partial and corrupted observations.

3) WINDOW LENGTH
The impact of different window lengths W ∈ [10, 200] for the Hankel structure of our proposed SDR-HTC algorithm is evaluated under different missing data percentages and different anomaly data percentages in Fig. 7. More concretely, we evaluate the case where p anomaly = 0.1 and p miss = {0.1, 0.3, 0.5, 0.7, 0.9} in Fig. 7(a), and the case where p miss = 0.2 and p anomaly = {0.1, 0.2, 0.3} in Fig. 7(b). Fig. 7(a) shows that our method exhibits an obvious ''thresholding phenomenon'': the recovery RMSE decreases continuously as the window length increases, up to a certain threshold (e.g. W = 50 for the case of p miss = 0.1). This indicates that there exists an optimal window length for the proposed SDR-HTC algorithm to achieve the highest spectrum data recovery accuracy. We can also see that the threshold becomes larger when more spectrum observations are lost. A similar ''thresholding phenomenon'' is observed in Fig. 7(b); however, the threshold only slightly increases when more corrupted data are added. Additionally, since the analysis in Section III-D shows that the window length W dominates the computation complexity of our algorithm, and a larger window length W is required to achieve the optimal performance under more missing data and anomalies, we can conclude that the computation time of our algorithm must increase with the percentage of missing data and anomalies in order to maintain the optimal performance.
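The hankelization with window length W can be sketched for a single frequency point as follows; the toy series is an illustrative assumption, and stacking such matrices over all frequency points yields the hankelized time-structured spectrum tensor:

```python
import numpy as np

def hankelize(series, W):
    """Build the Hankel matrix of a 1-D series with window length W.

    Each column is a length-W sliding window of consecutive samples, so
    every anti-diagonal holds a single sample of the original series.
    """
    x = np.asarray(series, dtype=float)
    if W > x.size:
        raise ValueError("window length exceeds series length")
    # sliding_window_view returns shape (T - W + 1, W); transpose to (W, T - W + 1)
    return np.lib.stride_tricks.sliding_window_view(x, W).T

H = hankelize(np.arange(6.0), W=3)
print(H.shape)  # (3, 4): columns are [0,1,2], [1,2,3], [2,3,4], [3,4,5]
```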

B. COMPARISON OF MAXIMUM PREDICTABILITY
The maximum predictability Π max quantifies the degree to which any predictive algorithm can correctly predict the evolution of a frequency point. In this subsection, we compare the maximum predictability of the unrecovered spectrum data sequences with that of the recovered spectrum data sequences obtained by our proposed SDR-HTC algorithm. A noteworthy point is that we ignore the first 10% of the recovery results when calculating the maximum predictability, because these results cannot reflect the real effectiveness of SDR-HTC due to the instability of the online subspace learning algorithm at the very beginning. Fig. 8 illustrates the comparison for a special case where p miss = 0.5 and p anomaly = 0.2. We can see that for any frequency point in the 20 MHz ∼ 1520 MHz frequency region, the maximum predictability of the recovered spectrum data sequence is always higher than that of the unrecovered spectrum data sequence. This indicates that our SDR-HTC algorithm can effectively alleviate the effects of missing values and anomalies for spectrum prediction and improve the predictability of the frequency points. Fig. 9 and Fig. 10 further compare the maximum predictability distributions of the unrecovered and the recovered spectrum data sequences. We first determine the maximum predictability Π max of each spectrum data sequence and then obtain the distribution of Π max over all spectrum data sequences in each case. Fig. 9 depicts the two distributions under different missing data percentages, whereas Fig. 10 presents them under different anomaly data percentages.
The conclusions drawn from Fig. 8 are strengthened in Fig. 9, where we also observe higher maximum predictabilities for the recovered spectrum data sequences than for the unrecovered ones. This indicates that a higher potential prediction accuracy can be reached when the recovered spectrum data sequences are used with predictive algorithms for spectrum prediction. While the maximum predictability of the unrecovered spectrum data sequences decreases as the percentage of missing data increases, our recovered spectrum data sequences maintain a stable and high maximum predictability, especially when p miss < 0.5. A similar result applies to Fig. 10, where higher and stable maximum predictabilities of the recovered spectrum data sequences are observed for any percentage of injected anomalies. Combining Fig. 8, Fig. 9 and Fig. 10, our proposed SDR-HTC algorithm demonstrates its effectiveness for approaching the limit of predictability in spectrum prediction tasks.
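Although equation (3) is not reproduced in this excerpt, a common formulation of the maximum predictability bound (due to Song et al.) solves S = H(p) + (1 − p) log2(N − 1) for its largest root. A minimal numerical sketch follows, where quantizing each spectrum sequence into N states with entropy rate S bits is our assumption:

```python
import math

def max_predictability(S, N):
    """Largest root p of S = H(p) + (1 - p) * log2(N - 1).

    S: entropy rate (bits) of the quantized spectrum-state sequence,
    N: number of distinct states. The left-hand side g(p) decreases from
    log2(N) at p = 1/N down to 0 at p = 1, so bisection on (1/N, 1) works.
    """
    if N < 2 or S <= 0:
        return 1.0
    S = min(S, math.log2(N) - 1e-9)  # the entropy rate cannot exceed log2(N)

    def g(p):
        h = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
        return h + (1 - p) * math.log2(N - 1)

    lo, hi = 1.0 / N, 1.0 - 1e-12
    for _ in range(100):  # bisection for g(p) = S on the decreasing branch
        mid = 0.5 * (lo + hi)
        if g(mid) > S:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(round(max_predictability(S=1.0, N=2), 4))  # ≈ 0.5: a fully random binary state
```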

C. COMPARISON OF PREDICTIVE ALGORITHMS
In this last subsection, we compare the prediction accuracy of several predictive algorithms on both the unrecovered spectrum data sequences and the recovered spectrum data sequences obtained via our proposed SDR-HTC algorithm. We employ five predictors, including 1) a simple forecasting strategy, EWMA (Exponential Weighted Moving Average) [23], for predicting the frequency point evolution, and compare their performance given the value of the maximum predictability. We evaluate the case where p miss = 0.5 and p anomaly = 0.2, and randomly select a frequency point within the 20 MHz ∼ 1520 MHz frequency region for the following evaluation. For the selected frequency point, the maximum predictability of its unrecovered spectrum data sequence and of the recovered spectrum data sequence obtained by our proposed SDR-HTC algorithm can be easily calculated by (3) as 0.9306 and 0.9563, respectively. Fig. 11 depicts the prediction accuracy of the five predictors. We can see that LSTM reaches the highest prediction accuracy among the five predictors by capturing multiple features or hidden patterns of the frequency point. While LSTM achieves a spectrum prediction accuracy of 0.8689 with the unrecovered spectrum data sequence and a spectrum prediction accuracy as high as 0.9024 with the recovered spectrum data sequence, its performance is always bounded by the maximum predictability Π max , which corresponds to the definition of maximum predictability in Section II. We also find that, for any predictor, a higher prediction accuracy is obtained when performing spectrum prediction with the recovered spectrum data sequence. This finding, in line with the observations in Section IV-B, indicates that our proposed SDR-HTC algorithm can effectively recover the missing values from the corrupted observations so as to approach the limit of predictability as well as provide higher potential prediction accuracy for spectrum prediction algorithms.
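A minimal sketch of the EWMA predictor on one frequency point; the smoothing constant, initialization, and toy history are illustrative assumptions rather than the configuration used in [23]:

```python
def ewma_predict(series, alpha=0.3):
    """One-step-ahead EWMA prediction.

    The smoothed state is updated as s = alpha * x[t] + (1 - alpha) * s,
    and the final state serves as the forecast x_hat[t + 1].
    """
    s = float(series[0])  # initialize with the first observation
    for x in series[1:]:
        s = alpha * float(x) + (1 - alpha) * s
    return s

hist = [-90.0, -91.0, -89.0, -90.5]  # hypothetical PSD history in dBm
print(ewma_predict(hist, alpha=0.5))  # -90.125
```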

V. CONCLUSION
In this paper, we focus on recovering missing values from corrupted historical spectrum observations to approach the limit of predictability in spectrum prediction tasks. Through analyzing a real-world spectrum observation dataset, we observe that frequency bands exhibit high predictability. We also find that this high predictability is degraded when part of the spectrum observations is missing or corrupted by anomalies. Based on these findings, we propose to formulate a hankelized spectrum tensor to take advantage of the latent time-directional structure of the spectrum data, and design a robust online SDR-HTC algorithm to efficiently recover missing values and separate anomalies. We have conducted extensive experiments on real-world spectrum observations to evaluate the performance of our proposed SDR-HTC algorithm. We show that SDR-HTC achieves higher data recovery accuracy at lower computation cost compared with the baseline algorithms. Besides, for spectrum prediction, SDR-HTC makes it possible to retrieve the high predictability of frequency bands and to improve the performance of any predictor.

ACKNOWLEDGMENT
This article was presented in part at the IEEE/CIC International Conference on Communications, China, in 2020.

DERIVATION OF A[t] BY RECURSIVE LEAST SQUARES
This appendix describes the derivation of A[t].
Defining F a f [t] as the inner objective to be minimized in (14), its derivative with respect to a f [t] is calculated as in (25). Then, setting this derivative equal to zero and introducing (26) and (27) into (25) yields the recursive least squares update of a f [t].