Deep Recurrent Neural Networks for Ionospheric Variations Estimation Using GNSS Measurements

Modeling ionospheric variability throughout a proper total electron content (TEC) parameter estimation is a demanding, however, crucial, process for achieving better accuracy and rapid convergence in precise point positioning (PPP). In particular, the single-frequency PPP (SF-PPP) method lacks accuracy due to the difficulty of dealing adequately with the ionospheric error sources. In order to apply ionosphere corrections in techniques, such as SF-PPP, external information of global ionosphere maps (GIMs) is crucial. In this article, we propose a deep learning model to efficiently predict TEC values and to replace the GIM-derived data that inherently have a global character, with equal or better in accuracy regional ones. The proposed model is suitable for predicting the ionosphere delay at different locations of receiver stations. The model is tested during different periods of time, under different solar and geomagnetic conditions and for stations in various latitudes, providing robust estimations of the ionospheric activity at the regional level. Our proposed model is a hybrid model comprising of a 1-D convolutional layer used for the optimal feature extraction and stacked recurrent layers used for temporal time series modeling. Thus, the model achieves good performance in TEC modeling compared to other state-of-the-art methods.

Since radio wave signals pass through the electrons of the ionosphere, the signal velocity and the ray path change [5].
Consequently, the transmitted signals from the Global Navigation Satellite Systems (GNSS) are directly affected by the ionospheric variations, causing delays [6]. These delays depend on the signal frequency and the electron density along the transmission path. Hence, ionospheric variability introduces an additional error source in GNSS positioning [7]. The use of multiple navigation signals of distinct center frequency transmitted from the same GNSS satellite allows direct estimation of these ionospheric delays. Exploiting the fact that different signal frequencies are affected differently by the ionosphere, an appropriate processing strategy of multiple-frequency GNSS signals, eliminates the ionospheric error [7]. Contrary to multifrequency GNSS receivers, real-time (RT) single-frequency (SF) positioning with a low-cost receiver has received increasing attention in recent years due to its large amount of possible applications. However, in this case, one major challenge is the effective mitigation of these ionospheric delays [8]. RT-SF-standard point positioning (SPP)/precise point positioning (PPP) techniques use ionospheric vertical TEC (VTEC) products released by the International GNSS Service (IGS) RT service [9] to eliminate the ionospheric error and apply corrections to the model as external parameters [1]. However, these ionospheric VTEC products have global coverage.
Many researchers investigate the ionospheric variation, applying regional models. Linear or nonlinear filters are employed to model the TEC values at the regional level. As far as linear modeling is concerned, autoregressive (AR) or AR moving average (ARMA) filters derived from the time series signal processing research community are applied [10], [11]. These filters are linear regressions in the time domain, and thus, they cannot capture the complex and highly nonlinear phenomena that occurred in the ionosphere.
There are various useful indicators for TEC modeling. These parameters (indicators) related to geomagnetic and solar activity conditions are given as follows.
Geomagnetic Indices: Here, the planetary Kp-index and the Ap-index are used, while the disturbance storm-time Dst-index is related with ionosphere short-term changes from several hours to several days [12].
Solar Indices: The sunspot number (SSN) and the solar radio flux F10.7 better suit long-term variations of the order of months (27-day solar rotation) [13].
In this article, we overcome the aforementioned limitation of the linear filers by constructing robust nonlinear regional models for ionospheric variability modeling, applying as external ionosphere corrections. In particular, we propose a spatiotemporal deep learning architecture for electron density modeling, combining a convolutional neural network (CNN) to capture the spatial variability of TEC, and a gated recurrent unit (GRU) for temporal variability modeling. The proposed deep recurrent architecture successfully models ionosphere conditions and estimate the ionospheric delays on the GNSS satellite signals, treating ionosphere variations as a nonlinear time series regression problem.

A. Related Work
An AR model is a common linear regression approach for modeling of TEC values at a regional level [14], [15]. Such architectures model the TEC values based on previously measured electron density data. The limitation of an AR filter is that no external measurable parameters are allowed to be utilized by the model. Thus, ARMA filters have been investigated for TEC modeling [10], [16]. The main difference between an AR and ARMA process is that the first (i.e., AR) models a time series in terms of its own lags (previous observable measurements), while an ARMA filter includes two parts: an AR where previous observations affect the output and the moving average (MA) part where external observations trigger the output. In this context, the work of [11] proposes an adaptive AR process to capture the time variations of the electron density. In addition, the work of [17] combines wavelet transform with AR models for ionosphere variations estimation. However, the main common drawback of these linear models is that they fail to capture the abrupt changes in electron density values due to the linear assumptions made. In addition, linear regressions are not adequate to model abrupt changes existing in TEC signals.
Recently, and mainly due to the advances in artificial intelligence (AI) research, many efforts are utilized to develop TEC models using nonlinear regression architectures based on machine learning algorithms. In this context, a widely used model is the feedforward neural network, consisting of interconnected artificial neurons capable of modeling nonlinear input-output relationships [18]. The feedforward neural network examples for modeling electron density signals are the works of [19], [20], and [21]. In fact, feedforward neural networks are capable of approximating nonlinear ARMA relationships, therefore improving the performance in TEC modeling. Nonlinear ARMA filters with recursive capabilities have been also proposed in [22] and [23]. Other works in this field apply radial basic function (RBF) models [24], with advance nonlinear interpolation capabilities, or support vector machines (SVMs) [25]. The latter are supervised learning paradigms for data classification and regression.
The main limitations of the aforementioned nonlinear regression models are that they present convergence instabilities, especially when a large number of neurons are employed in the network. In addition, there are no recurrent feedback mechanisms among the artificial neurons. Therefore, such filters fail to approximate temporal dependencies and high abrupt changes in TEC time series values with high precision accuracy.
For this reason, recently, deep machine learning has been proposed as an alternative paradigm for regression and classification [26]. Deep learning incorporates multiple hidden neurons and applies advanced learning algorithms, such as input compression and dimensionality data reduction, to handle the computational issues arising when a large number of neurons are considered. Recurrent neural networks (RNNs) are a class of networks allowing connections (feedback) between the nodes (neurons) in order to model the temporal behaviors of a time series signal [27]. Thus, RNNs are capable of handling the time dependencies in TEC modeling. However, RNNs fail to approximate more complex temporal dependencies, presenting also computational issues in computing the gradient during the learning process, especially when a large number of neurons are employed (the so-called vanishing gradient problem [28]).
To address these limitations and simultaneously to retain the advantages of deep learning in approximating temporal dependencies, long short-term memory (LSTM) architectures have been recently proposed [29]. In the context of TEC modeling, LSTM networks memorize temporal correlations of the TEC signals, therefore providing better modeling capabilities [30]- [32].
Although the aforementioned architectures are good modeling approximators of TEC, they mainly focus on the temporal modeling dimension, which is of estimating TEC changes between different temporal epochs. However, a TEC time series signal also depends on the geographic location, and therefore, different weights should be assigned to the model for different station latitudes. This so-called spatial variability cannot be modeled through the traditional LSTM deep learning architectures. In addition, LSTMs have more trainable parameters (e.g., weights) compared to the traditional RNN models. Consequently, they give us, on the one hand, more controllability and, thus, better results but with the cost, on the other hand, of more complexity and the need for large amount of training (annotated) data.

B. Contribution
To overcome the aforementioned issues, in this article, we introduce a new a spatiotemporal deep learning paradigm for TEC modeling. The proposed method combines two deep learning structures: a CNN and a GRU. These two structures are combined together, forming a common trainable model.
In particular, the Earth's ionosphere shows marked variations with latitude, longitude, universal time, season, solar, and geomagnetic activity. Thus, our proposed model feeds all these variables as inputs to the model and relates them under a nonlinear relationship. TEC is characterized by high complexity, and it is space-and time-varying. In order to capture the spatiotemporal behavior of TEC, in our proposed model, measurements of every satellite visible from the observing location (ground station) are taken into account. This means that the model's input data form a spatial-temporal 3-D tensor.
The tensor consists of several time series data corresponding to different single satellite visible from the station over a time window period.
The CNN captures the spatial variability of the TEC values assigning different learnable weights to the following GRU, taking into account all the visible time series data from a ground station. The main operational unit of a CNN architecture is the convolutional kernels, units able to model with high-efficiency spatial signal properties under complex nonlinear relationships [26]. In other words, the purpose of the CNN is to appropriately adjust the model weights to better capture the particularities of each station.
On the other hand, the GRUs model the temporal variability of the ionosphere variability. GRUs are similar deep learning structures with LSTM, however requiring fewer parameters. Therefore, they retain the temporal-dependent capabilities of the LSTM, but they advance in terms of complexity (and, therefore, convergence) and the need for smaller training (annotated) datasets. For this reason, GRUs have been selected in this article for modeling the temporal ionospheric variability. In other words, our model leverages the capability of the CNNs to optimally approximate the spatial particularities of each station, based on a set of weights, along with GRUs able to learn the temporal dependencies of the TEC values.
Another contribution of this article is that our nonlinear model is based on the so-called sequence-to-sequence (Seq2seq) modeling framework; a family of machine learning algorithms to transform one sequence into another sequence. Seq2seq methods have been originated by Google engine for text-/speech-language translation [33], but they have recently been adopted in solving complex time series problems in the power energy domain [34]. The Seq2seq structure permits modeling of the ionosphere values for all satellites in view at every epoch [35]. Given that there are multiple input and output time steps, this problem is referred to as "many-to-many" sequence modeling problem. The Seq2seq modeling approach means that we are able to model not only the electron density values at the next epoch but a sequence of successive epochs.
The remainder of this article is structured as follows. Section II presents a brief background on GNSS and ionospheric variability before providing a formulation of the problem discussed. Section III describes the proposed spatiotemporal deep learning TEC model. Section IV describes the proposed model architecture that combines convolutional and recurrent layers together in an optimal structure. In Section V, an extensive experimental evaluation of the discussed methods is provided, while Section VI closes this article with a summary of findings.

A. GNSS Observations Preprocessing
As we have previously stated, our goal is to construct robust regional models of the ionosphere variability, applied in SF-PPP methods as external ionosphere correction information. During the training of these models, the ground-truth TEC values are taken into consideration. The workflow is given as follows: we first apply a dual-frequency undifferenced and unconstrained PPP model to estimate slant TEC (STEC) values. These values are then separated from satellite and receiver differential code biases (DCBs). Then, having pure STEC values, we convert them to VTEC ones. These VTEC ground-truth (labeled) values are combined with solar and geomagnetic indicators to construct the TEC model using a supervised learning framework [26], as explained in Section II-B. For this reason, in the following, we describe the methodology adopted for VTEC values estimation.
The code P and phase φ observations in a given frequency band f i between a receiver r and a GNSS satellite s are written as [36] P s where P s ( f i) r and φ s ( f i) r denote pseudorange and carrier phase observables, respectively; ρ s r is the geometric distance between the receiver to the satellite; dt r and dt s are the receiver and satellite clock offsets, respectively; d ( f i) r is the frequency-dependent receiver uncalibrated code delay (UCD), while d s f i is the frequency dependent satellite UCD; T s is troposphere delay; I s ( f i) is the line of sight (LOS) ionospheric delay on the frequency f i; δ ( f i) r and δ s f i are the frequency-dependent receiver and satellite uncalibrated phase delay, respectively; N s ( f i) is the phase ambiguity; and s P ( f i) and s φ ( f i) are the sum of measurement noise and multipath error for pseudorange and carrier phase observations.
In the case of dual-frequency GPS observations and assuming the frequencies f 1 and f 2 noted as "1" and "2," respectively, (1) is written as The code biases are commonly referred as Then, The termĨ 1 is computed as The uncombined PPP (UPPP) model computes the ionosphere delay as unknown parameter, in contrast to the traditional ionosphere-free (IF) model that combines multiple frequency observations to eliminate the ionospheric error. Our model is ionosphere-constrained dual-frequency PPP model that estimates the STEC values. These values are then separated from satellite and receiver DCBs, using the products of IGS service [37]. Finally, the VTEC values, called with the variable vtec in this article, are estimated as The above equation evaluates the ground-truth VTEC values. Then, the STEC, denoted with the variable stec, is converted into vtec through the mapping function MF I [38] where R e is the mean Earth's radius, θ is the satellite's elevation angle, and h is the height of the ionospheric layer and usually has been taken about 350 km. The GNSS preprocessing is implemented using the GAMP [36] GNSS processing software. Further details about the experimental setup are provided in Table I.

B. Nonlinear Ionosphere TEC Modeling
In Section II-A, we have estimated the ground-truth values of VTEC through (9). A typical model assumes that the ionosphere is a thin shell above the Earth, located near the mean altitude of maximum TEC (approximately 350 km). The intersection between a signal's LOS and this shell is called the ionospheric pierce point (IPP). In our model, we use as input the IPP points coordinates, noted by x φ and x λ . In addition, the information of daily time hour x dt has been incorporated into the model to boost its performance. Let us denote as vtec s (t) the value of the VTEC at a time instance t for a visible satellite s. Then, we have that In (11) f (·) refers to the nonlinear relationship between the VTEC values and the inputs of solar and geomagnetic indices. It is clear that this nonlinear relationship is actually unknown, and therefore, it is approximated by the proposed spatiotemporal deep learning model. The model parameters (weights) of the model are used to approximate f (·). These weights are estimated through the application of a supervised learning methodology [26]. In (11), variables x SSN and x F10.7 refer to the input solar indices, and x DST , x K p , and x A p are the geomagnetic indices used as input variables of the proposed nonlinear model (see Section I).

III. SEQUENCE-TO-SEQUENCE SPATIOTEMPORAL AI FOR TEC MODELING
As we have previously stated, a spatiotemporal deep learning model is used, in this article, to model the ionospheric variability. The proposed deep learning model consists of a convolutional layer (see Section III-A) and stacked GRU recurrent layers (see Section III-B).

A. Spatial Variability Modeling: The Convolutional Neural Network Layer
The purpose of the convolutional layer is to encode the spatial variability of the input signals by assigning different weights to the model depending on the station latitude. In other words, the output of the convolutional layer is to transform the input signals into a vector representation more suitable for TEC modeling. The weights of the convolutional layer are learnable during the training phase and each input contributes in a different way to the model, for different ground stations. Let us denote as I input the input sequence of data which are feeding to the convolutional layer. Then, the CNN transforms these inputs I input to an encoded vector f M more suitable for TEC modeling Then, the encoded vector f M feeds the GRU. The convolutional layer architecture and its position with respect to the whole deep learning model are shown in Fig. 2.

B. Temporal Variability Modeling: The Gated Recurrent Unit Layer
As we have previously stated, GRUs are simpler forms of LSTM models having fewer gates than the LSTM memory cell [39]. In addition, the LSTM networks are deep learning extensions of the RNN model, better modeling the nonlinear attributes of a time series signal [31]. Therefore, before describing the GRU used for modeling the temporal variability of TEC, we discuss the structure of the RNNs and their stacked version and LSTM networks A stacked recurrent model is an extension to the traditional one consisting of several recurrent layers one stacked over the other [40]. This is presented in Fig. 1. A stacked model has two main types of operation; the in-depth and the temporal operation. The in-depth operation implies that the response of one layer is propagated as input to the next layer. Instead, the temporal operation assumes that inputs at previous time instances trigger the current unit. The stacked approach adopted in this article is applied for all recurrent structures, that is, the RNN, the LSTM and its bidirectional mode, and the GRU.
1) Stacked Recurrent Neural Networks: Recurrent models are powerful tools for time series modeling. The main operational unit of an RNN is an artificial neuron, approximating a nonlinear operation of an inner product of the network weights (parameters) and output responses of other neurons or model input vectors. The difference of an RNN with a traditional neural network model is that, in an RNN, each neuron is also triggered from the response of other neurons at previous time instances, allowing the modeling of temporal dependencies. These neurons are also called hidden states since they are located between the input vector and the output of the model. In particular, the response of an artificial neuron is given by where tanh(·) is the hyperbolic tangent function, referring to the nonlinear operational unit of a single neuron of the RNN. Variables W and U are the learnable weight parameters of the model. The h l (t) is the response of a neuron at the lth level of the network at a time instance t. It should be mentioned h 0 (t) ≡ f M coincides with the output response of the convolutional layer, that is, the vector f M of (12), instead of the previous hidden state h l−1 (t).
Once the top-level hidden state is computed, the estimate of the output vtec s is obtained using (see Fig. 1) where V is a matrix corresponding to the output learnable parameters (weights) of the model. 2) LSTM Structure: LSTM is a special kind of the traditional RNN structure, where each node in the hidden layer is replaced by a more complex structure, called memory cell, instead of RNN's single neuron [41]. The core structure of a single memory cell is presented in Fig. 1. The memory cell contains three different components: 1) the forget gate f (t); 2) the input gate i (t); 3) the cell candidate g(t); and 4) the output gate o(t). For each component, a nonlinear relation to the inner product between the input vectors and respective weights is applied during the training process. In some of the components, the sigmoid function σ (·) is applied, while, in others, the hyperbolic tangent function tanh(·) is used. The forget gate f (t) keeps unnecessary information out of the memory cell, thus separating the worth-remembering information from the useless one [42]. The input gate i (t) regulates whether the information is relevant enough to be applied in future steps for the accurate estimation of TEC values. The cell candidate g(t) activates appropriately the respective state (true or false output from the tanh activation). The output gate o(t) decides if the response of the current memory cell is "significant enough" to contribute to the next cell.
Regarding stacked LSTMs, the additional LSTM layers can recombine the learned representation from prior layers and create new representations at high levels of abstraction. Stacked LSTMs or deep LSTMs were introduced by Graves [43] in their application of LSTMs to speech recognition, beating a benchmark on a challenging standard problem ⎡ When l = 1, the state is computed using the encoded vector f M (see Section III-A) instead of h l−1 (t).
The cell state at time step t is given by where denotes the Hadamard product (elementwise multiplication of vectors).
The hidden state at time step t is given by where σ denotes the state activation function.
3) Bidirectional LSTM: The stacked bidirectional LSTM (BILSTM) has a similar structure with the LSTM model with the difference that it allows bidirectional data processing. Therefore, the operational units of a bidirectional LSTM model are defined by two parts: the forward and backward parts. As far as the forward part is concerned, we have the following equations: and backward

4) Stacked GRU Architecture:
The GRU is simpler form of the LSTM unit, having two control gates: the reset gate r l (t) and the update gate u l (t) (see Fig. 1). The reset gate r l (t) is responsible for determining how much of information to forget. The update gate u l (t) is responsible for determining the worth-remembering information of the previous states that should be forwarded to the next state. Therefore, the gates r l (t) and u l (t) are related with the hidden states h l (t) and h l (t − 1) as follows: In (22), σ is the sigmoid function, and b 1 u and b 1 r are the respective biases of each component for the GRU. Variables W and U are the transition matrices of the lth GRU.
In stacked GRU configuration, a recursive approach is considered as regards the operation of each GRU. A new memory state, denoted ash l (t), acts as the consolidation of the hidden state of the previous layer h l−1 (t) and the previous hidden state h l (t − 1) of the current layer. The consolidated hidden stateh l (t) is given bỹ In (23), function tanh(·) refers to the hyperbolic tangent relationship. Equation (23) means that the consolidated state is related with the output of the hidden state h l (t − 1) at the time instance t − 1 and the output of the previous hidden layer h l−1 (t) at the time instance t. Using the values of the consolidated stateh l (t) and the values of the update gate u l (t), the value of the hidden state of the lth GRU element is estimated In more detail, the GRU-related abovementioned operations are illustrated in Fig. 1.

IV. PROPOSED CNN-GRU ARCHITECTURE FOR VTEC MODELING
A. Implementation Details Fig. 2 illustrates the CNN-GRU architecture. The network configuration setup consists of: 1) the input layer; 2) a 1-D convolutional layer; 3) two stacked recurrent GRUs; and 4) two dense layers (Fig. 2). The GRU accepts a 3-D tensor input. The number of kernels of the convolutional layer is 60 with a size of 5. The first GRU layer consists of 90 filters with a size of 1 × 5, while the second layer consists of 180 filters of the same size. The output is the sequence of VTEC values.

B. Evaluation Metrics
Here, we present the metrics used for model evaluation and comparisons with other linear and nonlinear approximators.
For each satellite s i , visible in ground station, a sequence of VTEC values is produced using the proposed CNN-GRU deep learning model. Then, we compute the absolute difference between the ground-truth vtec s i (t) value, as obtained from the GAMP software, and the estimated from our model vtec t. Based on the values of dvtec s i (t), the following metrics are considered: In the aforementioned equations: 1) mae refers to mean absolute error (mae) for all satellite s i per ground station; 2) mse to the respective mean square error; and 3) min (max) to the average minimum (maximum) of dvtec s i (t) for all visible satellites. Variable T denotes the time period over which the evaluation takes place.
Another evaluation metric used in this article is the percentiles values of (50th, 68th, and 95th) of the dvtec(t)  error distribution per ground station. The error dvtec(t) = |vtec(t) − vtec(t)| is defined as the absolute different between the estimated VTEC value and the ground truth for a station. A 95th percentile quantity means that the values 95% of the VTEC errors are contained, after having sorted all errors in ascending order. Reflecting the error by percentiles is better than by simple mae.

V. EXPERIMENTAL EVALUATION
The experimental setup covers various aspects of the ionosphere phenomenon in order to accurately evaluate the performance of our proposed model. Thus, in our experimental evaluation, we have included stations installed in various latitudes and longitudes at a worldwide level (see Table II). Also, we have included data from different years ranging from 2014 to 2018 to create robust training, validation, and test sets. We have chosen different years under different solar and geomagnetic activities, and also, we have tested different months in order to evaluate our algorithm under different weather and seasonal conditions.
We highlight that our proposed model is not a "pure" GRU model but a hybrid model comprising of convolutional and stacked recurrent GRU layers: the proposed spatiotemporal deep learning paradigm. We experimentally validate the superiority of the proposed CNN-GRU model for TEC modeling in comparison to other deep learning networks (LSTM [31], [44], BILSTM [30], and RNN [45] models) and state-of-the-art techniques for TEC modeling (AR [11] and ARMA [11] models). Also, the proposed method is compared with the conventional global ionosphere map (GIM) and IRI2016 global ionosphere models.

A. Data Preprocessing and Experiment Setup
This section describes the preprocessing approach used to extract the recurrent model inputs and outputs based on observables from the global IGS network of permanent GNSS stations. The necessary observation, navigation, precise orbit and clock, IGS ANTEX (igs14.atx), IGS SINEX, ocean tide loading coefficients, and DCBs are fed into the GAMP software for static PPP processing (see Table I). As described in Section II-A, the GNSS observations are preprocessed to estimate the VTEC values. GNSS observables for the years 2014 to 2018 were processed with 1-min data granularity for the selected ground stations (see Table II).
As far as the deep learning models are concerned, they have been trained and deployed using the Python Tensorflow and Keras libraries. The proposed CNN-GRU and the compared deep learning models have been trained using the adaptive moment estimation optimization algorithm (ADAM) [46] with a learning rate of 10 −4 . The model weights are updated using a minibatch size of 28 samples at each training iteration. We set the maximum number of epochs for training to 200, that is, the maximum number of training cycles of the network. In our experiments, we use a dataset of the years 2014-2018 (a five-year dataset). This dataset is divided into three subsets: the Training, the Validation, and the Test Set. The training set is used for the estimation of the model parameters (weights) during the learning process. The validation set assesses the performance of the model during the learning process on a set different than the one used for training to avoid overfitting. Finally, the test set evaluates the model on data that have not been used during training. In our experiments, the training set consists of 70% of the total available data, while both the validation and the test sets consist of the remaining 15% of the total data. Both training and validation sets cover the period between the years 2014-mid-2018. Data from the second half of 2018 are used as test data.
The AR and ARMA models [11] (see Section I-A) are implemented in Python using the statsmodels library. The order of the AR part for both AR and ARMA is selected to be 60, while the order of the MA part equals 5 in the case of the ARMA model.  SIX SELECTED STATIONS (GRAZ, IQAL, MAL2, QAQ1, RAMO, AND  TIXI) AMONG THE RNN -BASED METHODS AND TWO OTHER COMMONLY USED METHODS (ARMA AND AR) FOR OCTOBER 9, 2018. MAE AND  MSE METRICS ARE THE AVERAGE VALUES OF THE INDIVIDUAL PRN METRICS IN EVERY STATION, WHEREAS MIN AND MAX ARE THE  MINIMUM AND MAXIMUM MAE VALUES OBSERVED FOR THE INDIVIDUAL PRNS. THE BEST VALUES ARE PRESENTED IN BOLD

B. Performance Evaluation and Comparison
In this section, we compare the various learning architectures to model the temporal dynamics of the ionospheric TEC. In particular, we implement and evaluate the traditional RNN, the unidirectional LSTM, the BILSTM, and the proposed CNN-GRU deep learning model (see Section III). We also compare our model performance against linear regressors of AR and ARMA. The comparison is carried out in terms of performance accuracy and computational efficiency. Table III shows the performance evaluation metrics, as described in Section IV-B, of: 1) mae in TEC units (TECU); 2) mean squared error (mse) in TECU 2 ; and 3) minimum (min) and maximum (max) TEC values again in TECU. The results in Table III have been calculated for the six stations (see  Table II) and a randomly selected testing period of a whole day (October 9, 2018). In this table, we have also depicted the performance of the four compared nonlinear architectures (i.e., the proposed CNN-GRU, the BILSTM, the LSTM, and the RNN) along with the traditional AR and ARMA models. As is observed, the proposed CNN-GRU model has the best performance, in most cases, with a mae value between the interval [0.7 − 1.8]. On the other hand, the worst performance is for the AR and ARMA linear models with a mae value between  Fig. 3 illustrates the Quantile-Quantile (Q-Q) plots for the "iqal" station between the ground truth and the evaluated deep learning architectures. In the case of a perfect performance, the points of the Q-Q plot will lie on the line y = x (the red line in Fig. 3). The closer the data points are to the red line, the better is the model accuracy. Thus, the proposed CNN-GRU method has a better performance compared to the other ones since the blue points are closer to the red line. We also observe that higher vtec values are more challenging to be accurately modeled. However, again, the proposed CNN-GRU model achieves better performance for these high vtec values compared to the other models.  Table IV illustrates the 50%, 65%, and 98% percentile scores for the proposed and the compared nonlinear models. The proposed models of the "graz," "iqal," and "tixi" ground stations have the best performance, whereas the model of the "mal2" ground station presents the worst one. Again, the proposed CNN-GRU model has a better performance against the compared architectures. Based on the values of Table IV, we have concluded that the proposed CNN-GRU model, in the case of 98% percentile, achieves an improvement of 16.77% with respect to the RNN architecture and 12.71% and 9.49% with respect to the LSTM and BILSTM models, respectively. This means that the proposed CNN-GRU modeling error is not as large as in the case of the compared models. Table V illustrates the average mae error in TECU over the six stations for the training and validation sets. The results have been obtained for all the nonlinear models. In most cases, the proposed CNN-GRU model has a better performance than the other methods. In particular, from Table V, CNN-GRU Fig. 3. Q-Q plots for the "iqal" station between the ground truth and the other deep learning techniques used for comparison. It should be mentioned that the improvement in the validation set is more significant than one of the training sets since the latter refers to data that they have not to be used during training. The processing time in minutes and the number of the required trainable parameters per method are listed in Table VI. As is observed, the most efficient is the RNN architecture, which requires 23 min (for 200 training epochs), while the heaviest is the BILSTM model needing 280 min (for the same 200 epochs). The proposed CNN-GRU architecture requires 67 min for its training (again for 200 epochs).
The abovementioned processing time refers to the training phase of the models. It should be mentioned that training is carried out once. After model training is completed and the model parameters (weights) have been estimated, the time needed for modeling the TEC values is negligible. In particular, our trained CNN-GRU model requires 0.96 s to model the TEC values of over a period of a half year (test dataset: the second half of 2018). Fig. 4 illustrates the training and validation learning curves, as computed for four evaluated models, for the "ramo" station. As is observed, all models converge to an acceptable mae error after 200 training epochs.  we also depict the ground-truth data as a dashed line. The satellites selected in this figure are GNSS visible from the sites: "graz" (mid-latitude), "mal2" (near the equator), and "tixi" (aurora region). Also, the satellites depicted in this figure have been selected in various time intervals during the day from each site, focusing more where the highest values of the ionospheric variability are observed. For each station/site, two days are presented in this figure: 1) September the 3rd, which is a day with low ionospheric activity and 2) September the 11th, in which high ionosphere activity is observed. For these two days, the values of Dst and F10.7 parameters are illustrated in the top-left diagram of Fig. 6. As observed from this figure, September 11 is a day of increased solar activity. On the contrary, September 3 is a day with normal solar and geomagnetic conditions. As expected, the model performance on 3/9 is slightly better in contrast to estimations during the of 11/9, in which abnormal ionosphere variability is observed [see Fig. 5]. However, the model catches the   5. VTEC for individual PRNs (unique pseudorandom noise that each GPS satellite transmits) between the GPS ground-truth values and GRU, BILSTM, LSTM, and RNN methods in various hours two different days: a day of low ionospheric activity (3/9) and a day of high ionospheric activity (11/9). VTEC values estimated per satellite from a low-latitude station ("mal2") and a mid-latitude station ("graz") and a high-latitude station ("tixi").
trend of the line in both cases. Also, the model for the "graz" mid-latitude station shows better performance than the respective model of the "tixi" station. Overall, what is immediately noticeable in this figure is the CNN-GRU model's ability to adequately estimate VTEC values for every satellite separately. Fig. 6 shows the predicted VTEC values per station for a time period of a month (September 2018). The predicted values of the proposed CNN-GRU model are illustrated with green color, whereas the red line is the ground-truth data.
The top-left figure shows the F10.7 and DST values for the days of September. As observed, CNN-GRU predicted values achieve good performance. Also, the proposed CNN-GRU model catches the local maximum VTEC values during the days of increased ionospheric activity as mentioned above (i.e., September 11).   Fig. 7 shows the estimated mean VTEC values, as illustrated with a red line, for each station, for four different days between September 9 and 13, 2018, compared to ground-truth values with a black line. Here, we can observe the diurnal changes, as the VTEC values are varying during each day, depending on the electron density in the ionosphere. As observed, the ionospheric delay changes slowly through a daily cycle. It has its minimum values (2−4 TECU) between midnight and early morning and reaches its peak during the daylight hours. The "mal2" stations reach the bigger mean VTEC values   (∼ 30 TECU), whereas the "tixi" station (located in the aurora region) has the minimum observed values. However, the "tixi" station appears with various fluctuations. The mean standard deviation values are 1.20 for "graz," 0.80 for "iqal," 2.81 for "mal2," 0.70 for "qaq1," 1.78 for "ramo," and 1.04 for "tixi" ground station. This is illustrated in Fig. 7 where we have depicted with gray the standard deviation values.

D. Performance Evaluation for Different Months
At every time instance (point), the difference between the ground-truth VTEC values and the estimated ones from the various nonlinear learning models is computed. Thus, for the test time period of half a year (second half of 2018), we have an error time series for every station, deriving from the timewise differences of VTEC values (see Section IV-B). Fig. 8 depicts the distribution of these errors per month for the testing period (second half of the year 2018). Each box plot graphically depicts groups of vtec error values through their quartiles. Outliers are plotted as individual points, with the red cross symbol. The horizontal red line is the median value of the vtec error (50%), whereas the first and third quartiles, that is the 25th and 75th percentiles, respectively, are illustrated in a black dashed line. As observed, the performance evaluation per month is slightly different; however, these differences are not significant. In most of the cases, the proposed CNN-GRU method has a better performance as it concerns the outlier values, which are slightly fewer compared to other methods outliers (difference errors > 75%). In August, there are 18% fewer outliers in the proposed CNN-GRU compared to the RNN method. In the other months, this percentage is 5% for September, 13% for October, 17% for November, and 9% for December. Fig. 9 shows the mean VTEC values at every station, for IRI2016 and GIM TEC estimates compared to the ground-truth values and CNN-GRU TEC estimations, during four days between September 9 and 12. The VTEC maxima are shown for all stations between 8:00 and 12:00 A.M. In most cases, the CNN-GRU predicted values are similar to the ground truth. GIM-aided TEC values are also close to CNN-GRU TEC values. IRI2016 values seem to underestimate the ionospheric variability compared to the ground-truth TEC and TEC values from the GIM model. Fig. 10 shows the correlation coefficient between the proposed CNN-GRU, the compared nonlinear learning methods, the ground-truth GPS TEC data, and the values derived from IRI2016 and GIM models. Yellow blocks indicate high correlation, whereas blue blocks show smaller values for the correlation index r . Overall, the stations "iqal," "mal2," and "'ramo" attain higher correlation values (r > 0.75), whereas "tixi" (located near the equator) performs the worst values of correlation.

VI. CONCLUSION
In this article, we have proposed a combined CNN-GRU deep learning architecture for TEC modeling. The proposed model has been compared with other linear (e.g., AR and ARMA) and nonlinear (RNN, LSTM, and BILSTM) regression methods. Our main conclusions are given as follows.
1) The proposed CNN-GRU model achieves an improvement of 16.77% with respect to the RNN architecture and 12.71% and 9.49% with respect to the LSTM and BILSTM models, respectively, in the case of 98% percentile. This implies the capability of the proposed CNN-GRU network to capture TEC values both in normal and high solar activity periods. 2) The average mae improvement of the proposed model compared to the traditional RNN architecture is 23.9% with respect to the validation set, a set that it has not been used during training. In the case of the BILSTM model, the mae improvement is 12.65%. 3) The computational complexity of the proposed model is retained low although its performance accuracy is higher.
In particular, as far as model testing is concerned, our CNN-GRU architecture requires less than 1 s (0.96 s) to model the TEC values over a period of a half year. Instead, regarding training, our model requires about 67 min to be trained for 200 epochs in contrast to the BILSTM architecture where training is executed within 280 min for the same number of epochs. It should be mentioned that the training of the model is carried out once. 4) The mean standard deviation value of mae of our model is 1.38 over all the six examined ground stations. The worst performance is for the "mal2" station, which is located near the equator, while the best is for the "graz" station (mid-latitude). As future work, we intend to assess the accuracy of the TECs computed by the model in the position domain, with precise coordinates of the IGS stations. Also, we will further investigate model reliability, given as inputs GNSS observational data from a regional network of stations close to each other, to examine the extent to which such models capture the regional anomalies. In addition, more complicated deep learning architectures, such as semisupervised learning [47], [48] and/or generative adversarial networks (GANs) [39], can be examined to see if they can improve the results while retaining small computational cost.