Bidirectional Recurrent Imputation and Abundance Estimation of LULC Classes With MODIS Multispectral Time-Series and Geo-Topographic and Climatic Data

Remotely sensed data are dominated by mixed land use and land cover (LULC) types. Spectral unmixing (SU) is a key technique that disentangles mixed pixels into constituent LULC types and their abundance fractions. While existing studies on deep learning (DL) for SU typically focus on single time-step hyperspectral or multispectral data, our work pioneers SU using MODIS MS time series, addressing missing data with end-to-end DL models. Our approach enhances a long-short-term-memory-based model by incorporating geographic, topographic (geo-topographic), and climatic ancillary information. Notably, our method eliminates the need for explicit endmember extraction, instead learning the input–output relationship between mixed spectra and LULC abundances through supervised learning. Experimental results demonstrate that integrating spectral-temporal input data with geo-topographic and climatic information significantly improves the estimation of LULC abundances in mixed pixels. To facilitate this study, we curated a novel labeled dataset for Andalusia (Spain) with monthly MODIS MS time series at 460-m resolution for 2013. Named Andalusia MultiSpectral MultiTemporal Unmixing, this dataset provides pixel-level annotations of LULC abundances along with ancillary information.


Introduction
LULC mapping is normally addressed by classifying each pixel in a satellite image into a LULC class, also known as semantic segmentation (SS) in RS images.Frequently, the spatial resolution of an image and the thematic resolution of its LULC legend do not match, which leads to the mixed pixel problem, where pixels are not pure but contain several LULC classes.Accordingly, many methods have tried to estimate the relative abundances of each LULC class in a pixel from the combined spectral signature [1].Such estimation of the spectrum and the abundance of the LULC classes present within each pixel is known as Spectral Unmixing (SU) and is one of the most challenging areas of research in Remote Sensing (RS) [2].Various unmixing approaches, including linear and nonlinear methods, have been developed [3,4].Many of these approaches require the use of the pure spectral signature (the endmember) of each LULC class.However, the acquisition of endmembers might be hard in areas dominated by mixed pixels [5].To overcome this limitation, several methods have been introduced to avoid the need of endmembers extraction [6,2,5,7] as depicted in Figure 1.In the last years, modern DL models have been increasingly employed for addressing SU by directly learning the input-output mapping from the spectra of mixed pixels to their corresponding class abundances.Several studies explored the potential of DL methods for SU in LULC mapping using either single time-step HS data [2,8,9] or single time-step MS data [10].Including temporal information could be a great opportunity to improve SU methods [4] and a few works (see Table 1) have started exploring approaches with MS time series data.However, to the best of our knowledge, none have explored an end-to-end DL solution, where recurrent neural networks (RNNs) and LSTM networks are a perfect fit.
In contrast to traditional methods, the application of DL in SU facilitates the exploitation of ancillary information such as geographic location, topography, and climate.For example, in the field of computer vision, ancillary data has been successfully used by DL models to improve the performance during image classification [11,12,13].However, the introduction of ancillary information remains unexplored in spectral unmixing methods.We hypothesize that injecting such ancillary information could boost the performance of the predictive model in spectral unmixing.This information may help the model understand the spatial distribution and variations in climate of the different LULC types.
The primary problem addressed in this study is the spectral unmixing of LULC classes using MS time series data and ancillary information, and it faces several challenges: • Public labeled datasets with MS multitemporal data for spectral unmixing of LULC classes are not available.
• Creating a new dataset of MS time series plus ancillary information together with LULC abundances annotations is complex, costly and time consuming.
• Remote sensing data usually contains missing values due to atmospheric conditions or sensors' errors, which requires applying robust processing techniques.
Bidirectional recurrent imputation and abundance estimation of LULC classes with MODIS multispectral time series and geo-topographic and climatic data • Feeding ancillary information to spectral unmixing models is a promising direction but can be complex.
Ensuring that these data improve the model robustness is a challenge and it is not explored yet.
Given the above mentioned challenges, the main objective of this study is twofold: (1) to create a regional-scale dataset of more than 500, 000 MODIS 460m resolution pixels from Andalusia, Spain, and (2) to develop DL-based approach for SU, without the need of endmembers extraction, that estimates the LULC abundances per pixel using MS time series and ancillary data.This dataset provides for each individual pixel: (a) a MS time series of monthly observations during the year 2013 of the seven spectral bands of the MODIS sensor, (b) ancillary information containing geographic, topographic and climatic variables, and (c) their corresponding LULC class abundances at two different levels of the classes hierarchy, extracted Andalusia's official LULC map (SIPNA [14]).Furthermore, the DL-based method consists of a two branch neural network (NN) where the first branch process the MS time series using a LSTM-based model capable of handling missing values, and the second branch process the ancillary information.A graphical illustration of the followed workflow in this study is shown in Figure 2.
Two assumptions are made in this study: (1) LULC changes within a one-year timeframe are limited at a 460m pixel resolution, so our LULC abundance annotations are assumed to be static.(2) The selected MODIS time series data, ancillary information, and LULC annotations adequately represent the land dynamics of Andalusia, since more than 500, 000 pixels from Andalusia are collected, representing almost the whole region.The constraint includes the challenge of dealing with missing values in remote sensing data, which we solve by proposing a DL method capable of handling missing values.
The motivation behind this research is rooted in the need for improved methods to perform SU in complex, heterogeneous landscapes with MS time series data.The absence of accessible labeled datasets, combined with the complexity of creating new datasets, underscores the significance of developing innovative approaches to advance the field.
The primary contributions of this research can be summarized as follows: • We built Andalusia-MSMTU dataset: a novel MS multitemporal labeled dataset with mixed pixels from Andalusia, a highly heterogeneous region in Spain.Each pixel is annotated with LULC abundances.In addition to the MS multitemporal information, each mixed pixel has its corresponding geo-topographic and climatic information.Such dataset will open the possibility for new explorations.
• We designed and analyzed a DL-based approach that estimates the LULC abundances per pixel of LULC classes from MS time series data with and without ancillary information.

Related work
Firstly, general DL in RS methods are reviewed.Subsequently, related works on spectral unmixing overall, with a specific focus on employing deep learning methodologies, are introduced.Finally, works that build labeled datasets designed for the unmixing approaches are reviewed and comprehensively summarized in Table 1.

DL in RS
Thanks to the recent success of DL methods in many learning tasks, tons of efforts have been made to bring DL to RS field [15].Concretely, LULC classification task is of paramount importance since many environmental applications rely on LULC maps, such as urban planning, forest monitoring, change detection...
Traditionally, only one source of input data was used to perform the classification task, that is only using HS [16], MS [17], LiDAR [18] or synthetic aperture radar (SAR) [19].Recently, multimodal models has emerged with the promise to improve the LULC classification by combinaning the different input data types.[20] addresses challenges in LULC classification using a multimodal deep learning (MDL) framework.It tackles limitations of traditional deep learning in complex scenes, introducing five fusion architectures and emphasizing applicability beyond pixelwise classification to spatial information modeling.Also, [21] introduced MUNet, a multimodal unmixing network for HS images, leveraging LiDAR data to enhance discrimination in complex scenes.MUNet uses a SE-driven attention mechanism, incorporating height differences from LiDAR for improved performance.[22] presented IISU, an illumination invariant spectral unmixing model addressing spectral variability caused by variable incident illuminations.Utilizing radiance HS data and a LiDAR-derived digital surface model, IISU provides explicit explanations for endmember variability, outperforming existing models, particularly in shaded pixels.The proposed model yields more accurate abundances and shadow-compensated reflectance.In [23], authors built the C2Seg dataset for cross-city LULC classification, addressing limitations of DL models across diverse urban environments.Their proposed HighDAN network, employing high-resolution domain adaptation and adversarial learning, demonstrates superior segmentation performance and generalization abilities compared to existing methods.Following the modern self-supervised learning (SSL) paradigm, SpectralGPT [24] is proposed as a novel universal foundation model tailored for spectral remote sensing data, utilizing a 3D generative pretrained transformer.Trained on one million spectral RS images, it accommodates varied inputs, leverages 3D token generation for spatial-spectral coupling, and achieves substantial performance gains across geoscience tasks like scene classification and semantic segmentation.Finally, [25] introduces a subpixel-level HS super-resolution framework, DC-Net, addressing the distribution gap between HS and high spatial resolution MS images.The novel decoupled-and-coupled network progressively fuses information from pixel to subpixel-level, mitigating spatial and spectral resolution differences.Employing a SSL module ensures material consistency for enhanced HS restoration.

Spectral unmixing
The existing spectral unmixing methods can be broadly categorized as linear mixture models (LMM) and nonlinear mixture models (NLMM) according to the formulation describing the underlying mixing process of endmembers [26].
LMM consider that the spectral signature of a mixed pixel is a weighted sum of the endmember spectra and that the weights associated with the endmembers are given by their corresponding relative area abundance in the pixel.LMM-based methods have been widely developed in last decades including linear, geometrical, nonnegative matrix factorization, bayesian and fuzzy models among others [27,3,28,29,30,31].LMM typically assumes that the spectrum of each LULC class is characterized by a single fixed endmember.However, pure pixels from the same LULC class may have different spectra, which is called intra-class variability [32].To overcome this limitation, several multiple endmember spectral mixture analysis (MESMA) models have been developed [33,34,35,36].
Since the extraction of a large number of pure endmembers is still a great challenge in areas dominated by mixed pixels, several works without assuming any prior knowledge about the mixing process were introduced.These methods, also known as blind spectral unmixing (BSU) methods, include independent component analysis [37,38,39], non-negative matrix factorization [40,41,42,43], sparse component analysis [44] or wavelet-based [45] methods.
Given the nonlinear mixing effects of endmembers, NLMM have been proposed through the years to overcome LMM limitations and enhance the spectral unmixing performance.These include bilinear models [46], radial basis function networks [47], kernel-based models [48], neural networks and low-rank tensor [49] methods among others.
Bidirectional recurrent imputation and abundance estimation of LULC classes with MODIS multispectral time series and geo-topographic and climatic data

DL in spectral unmixing
Spectral unmixing have also met DL models, which fall under the category of NLMM.One of the first DL approaches for SU was proposed by [50], where they introduced three spectral bands values and the neural network (NN) predicts the abundances of three LULC classes.[51] compared NN, Linear Mixture Models (LMM), and fuzzy c-means for spectral unmixing of LULC classes, being the NN the best model given sufficient training samples.Then, [6] proposed a two-stage NN architecture that first reduces the dimension of the input vector using an auto-associative NN, and performs abundance estimation out of the reduced input using a MLP.Recently, [8] evaluated autoencoders with different hyperparameters.[52] introduced MSNet, a multi-stage convolutional autoencoder network designed for linear HU, achieving this by capturing contextual relationships between pixels.[53] introduced CyCU-Net for HU, enhancing performance by incorporating cycle consistency and self-perception loss.The network, leveraging cascaded autoencoders, preserves detailed material information and achieves high-level semantic preservation during unmixing.[54] introduced SeCoDe, a novel blind HS unmixing model designed for airborne and spaceborne HS imagery.
Leveraging sparsity-enhanced convolutional decomposition, SeCoDe effectively addresses spectral variabilities and maintains continuous spectral components.Going beyond autoencoder-like architectures, [55] introduced Deep HSNet, a novel siamese network for HU that considers diverse endmember properties from different extraction algorithms.
Deep HSNet incorporates a subnetwork to effectively learn endmember information, enhancing the accuracy of the unmixing process.Following the success of transformers architecture [56], [9] and [57] introduced NN architectures with the attention mechanism for abundance estimation.Regarding SSL for spectral unmixing works, [58] proposed a two-stage fully connected SSL network for BSU, addressing challenges of limited supervision and data requirements.
The network jointly estimates endmembers and abundances in the first stage, and learns HS image acquisition physics in the second stage.Also, AutoNAS [59] explored neural architecture search (NAS) for determining optimal network architecture in HU.Utilizing SSL and an affine parameter sharing strategy, it achieves optimal channel configuration.
Further, an evolutionary algorithm enables flexible convolution kernel search.
Regarding RNN-based works, the only work on SU using a LSTM-based network was introduced by [60].They proposed a nonsymmetric autoencoder network with a LSTM component to capture spectral correlation together with an attention mechanism to further enhance the unmixing performance.For a more detailed review of DL methods in spectral unmixing see [26] and [4].
In parallel, there exist few works that incorporate ancillary data to improve the performance of DL models.Most of these studies occur in the field of computer vision, such as high inter-class similarity classification problems [11], plankton image classification [12] or crop type mapping [13].Bidirectional recurrent imputation and abundance estimation of LULC classes with MODIS multispectral time series and geo-topographic and climatic data

Labeled datasets based on LULC products
Supervised learning requires high amounts of ground truth data to achieve better generalization.One of the biggest limitations in spectral unmixing is the limited availability of ground truth LULC maps [4,66].Some areas or regions, especially in western countries, have LULC ground truth based on visual interpretation for specific fields of study.For example, SIPNA [14] was intended for territorial planing in Spain.Its annotation was carried out by experts during several years.This dataset can be used to annotate RS data.
In parallel, there exist several annotated MS multemporal datasets prepared for supervised spectral unmixing (see Table 1).However, all of them are private.
Our work is the first in providing a public good quality MS multitemporal mixed pixel labeled dataset, named Andalusia-MSMTU, that includes not only spectro-temporal information but also geo-topographic and climatic ancillary data.Andalucía-MSMTU is organized into two hierarchical levels of classes with four and ten LULC types and it is especially suitable for building umimixing DL-based models for LULC abudance estimation.The proposed methodology constitutes the state-of-the art in Andalusia-MSMTU.

Preliminaries and background
We define a multivariate time series as a sequence of observations X = (x 1 , x 2 , ..., x T ), where T is the number of observations or time steps.Each observation x t ∈ R C where t ∈ {1, ..., T } consists of C variables, such that

Recurrent Neural Network
RNN [67] is a NN architecture specifically designed for handling sequential data.RNN consider the sequential relationship of inputs by using a shared function f to process each input.RNN process the time series using a recurrence approach at every time step t, computing a hidden state h t by considering the previous hidden state h t−1 and the current input x t : where h 0 is normally, at the beginning, the zero vector, i.e., h 0 = 0.
There are several choices on how to process sequential information.In this work, we focus on the LSTM network, which is an improvement of normal RNN solving some of its biggest limitations [68]

BRITS
In time series data and specifically in RS data, it is common to find missing values due to sensor errors, cloud cover and more [69].To handle this situation, there exists a type of RNN architecture that can learn to solve two tasks simultaneously: imputing missing values and classifying the input sequence data.This model is called Recurrent Imputation for time series (RITS) [70].The RITS model perform the imputation algorithm to assist the classification task and obtain the final classification as: where ŷ is the final classification, f out is the classification function, and h T is the last hidden state.
In practice, considering only unidirectional forward dynamic is problematic due to slow convergence, inefficiency in training and bias exploding problem [70].To overcome these issues a bidirectional version named Bidirectional RITS (BRITS) model is proposed also in [69] to learn forward and backward patterns by accessing information from past and future at any given time step.The final scheme of BRITS can be seen in Figure 3.

Study area and data construction
This section describes the study area and provides full details on how the used dataset was built and processed.
Bidirectional recurrent imputation and abundance estimation of LULC classes with MODIS multispectral time series and geo-topographic and climatic data

Study area
Andalusia is the second-largest, most populous, and southernmost autonomous community in Peninsular Spain (Figure 4).Andalusia is one of the most biodiverse and heterogeneous regions of Europe.It contains steep altitudinal gradients, and it has a wide variety of landscapes and climatic conditions which results in a vast variety of vegetation types that hold the greatest diversity of plant and animal species in Europe.The long and dynamic history of human activities has also led to a complex landscape configuration with frequent mosaics of small patches of different types of natural, semi-natural land covers and human land uses.Hence, Andalusia offers an ideal field laboratory to test the creation of detailed and fine scale LULC maps containing the abundance of each LULC class per pixel to monitor the socioeconomic and environmental dynamics in complex landscapes using DL and MS time series of satellite imagery.

MODIS pixel time series extraction
The time series data were extracted from two satellites Terra and Aqua using MODIS sensor at 460m spatial resolution and at monthly temporal resolution.As LULC changes during one year are very limited in a 460m pixel, we assume that the LULC abundances are representative of the full year.
Spatio-temporal filtering was applied using MODIS 'Quality Assessment' (QA) flags and the "State QA" flags.Moreover, as the process of Terra and Aqua data filtering generates many missing values, to further reduce the amount of noise in the data, two solutions were employed: (1) the 8-days time series data were transformed into monthly composites by computing the monthly mean from the individual observations, then (2) the monthly data from the Terra and Aqua satellites were combined to generate a merged Terra+Aqua monthly dataset.All this process was performed in Google Earth Engine (GEE) [73] and inspired by [74].

Ancillary data extraction for each MODIS pixel
In addition to the MODIS data, for every pixel we included geographic, topographic, and climatic ancillary information.Pixel longitude and latitude were directly extracted from MODIS metadata.Pixel altitude was obtained using the SRTM 30m/pixel digital elevation model [71].MODIS pixel slopes were calculated using GEE slope calculation algorithm on the same 30m elevation model.Finally, climatic data were downloaded from REDIAM's environmental Bidirectional recurrent imputation and abundance estimation of LULC classes with MODIS multispectral time series and geo-topographic and climatic data information [72], including potential evapotranspiration, precipitation, mean annual temperature, mean of the maximum temperatures, and mean of the minimum temperatures.All types of ancillary data came in different resolutions or scale, so to match the resolution of our MODIS pixels we computed the average across all finer resolution pixels inside each MODIS pixels to obtain the value at 460m resolution.

Pixels' LULC abundances annotation from SIPNA
To annotate each 460m MODIS pixel with the abundance of each LULC class, the official LULC map of Andalusia for the year 2013 (SIPNA) [14] was used.Given the coarse resolution of MODIS pixels we only considered level 1 (four classes) and an adapted version of level 2 (ten classes) of the classification hierarchy of SIPNA (Figure 5).Given that SIPNA provides information at sub-pixel level, we calculated the exact abundances of all the LULC classes existing in each MODIS 460m resolution pixel, as illustrated in Figure 6, using QGIS software [75] as follows: the SIPNA polygons were first converted to raster format providing a LULC map at 10m resolution.The rasterized map was then converted to match the spatial resolution of MODIS by counting the number of 10m resolution pixels of each LULC class and dividing them by the total number of 10m resolution pixels inside each 460m resolution, resulting in the class proportions for each 460m pixel of Andalusia.Finally, the MODIS pixels abundances annotations were Bidirectional recurrent imputation and abundance estimation of LULC classes with MODIS multispectral time series and geo-topographic and climatic data coupled with their corresponding time series and ancillary data to obtain the Andalusia-MSMTU dataset.With the help of several RS expert, we visually assess that the 10m resolution was suitable for the rasterization.The proposed values were 100m, 50m, 10m and 5m.The 100m and 50m resolution pixels were too coarse to maintain the quality of the different polygon annotations.The 10m and 5m resolution pixels were great to maintain the information and we finally decided to rasterize the polygons to 10m resolution because of computational and time convenience, since the 5m raster was 4 times more expensive than the 10m raster.
In Figure 7 it is showed an example of the calculation of class abundances for a given pixel.An illustrative example of the distribution of abundances of LULC classes in level 1 of the classification hierarchy over the Andalusia territory is displayed in Figure 8, being "agricultural lands" and "terrestrial lands" the classes that dominate the Andalusian territory.Andalusia-MSMTU dataset [76] is available in a public data repository hosted by Zenodo at: https://zenodo.org/records/7752348Formally, we have a set of n MS time series pixels {X 1 , X 2 , ..., X n } with their corresponding class abundances {y 1 , y 2 , ..., y n } where y i ∈ S C , i ∈ [1, n].S C is the sample space of class abundances commonly referred to as the simplex [77].In our case C is equal to 4 and 10 for level 1 and 2 of the hierarchy, respectively.
To enhance class abundance estimation further, in addition to using the MS multitemporal data we also include ancillary information from two types: • Geo-topographic data: Geographical coordinates (longitude and latitude), altitude and slope.Incorporating geographic coordinates can help the model understand the spatial distribution of land cover types, which can be valuable in guiding the spectral unmixing process and making it more contextually accurate.Similarly, adding topographic data (altitude and slope) provides useful information that complements the spectral characteristics of a pixel.In fact, terrain slope is known to influence surface reflectance, so incorporating it into the model can allow slope-related changes in reflectance to be taken into account, making its predictions more robust.• Climatic data: Precipitation, potential evapotranspiration, mean temperature, maximum temperature and minimum temperature.Some land cover classes, such as agricultural lands, forests, and wetlands, respond differently to variations in climate.By using climatic variables, the DL model can distinguish between these climate-dependent classes more effectively.
Below, we describe the architecture of the used model and the evaluation metrics.

Model architecture
Our BRITS-based approach to estimate the class abundances using MS multitemporal data and ancillary information for each mixed pixel is depicted in Figure 9.The proposed approach includes three components: Bidirectional recurrent imputation and abundance estimation of LULC classes with MODIS multispectral time series and geo-topographic and climatic data  1.Spectro-temporal feature extraction: We use BRITS model [70] to extract the spectro-temporal patterns in presence of missing values from our dataset.

Ancillary data feature extraction:
To incorporate ancillary information, we process the external information using a linear layer with ReLU non-linearity.

Concatenation and features combination:
The output features of part ( 1) and ( 2) are concatenated and processed by a final dense layer that outputs C (the number of classes) scores.
The final dense layer generates an unbounded outputs o where o ∈ R C with C being the number of classes.Following the work of [78], we applied the softmax transformation to obtain the final abundances predictions a ∈ S C : where a j denotes the abundance prediction for the jth class, o j denotes the final layer's output associated with the jth class, and e denotes the exponential function.
Finally, the NN is optimized by minimizing the mean-square error (M SE) between the abundances predictions and the reference abundances: where r ic and a ic are the reference abundance and the predicted abundance, respectively, for the cth class in the ith sample, and N is the number of training samples.
Bidirectional recurrent imputation and abundance estimation of LULC classes with MODIS multispectral time series and geo-topographic and climatic data

Evaluation criteria
To assess the effectiveness of the proposed unmixing model, four regression metrics are examined: • Pearson's Correlation Coefficient (CC): • Root Mean Squared Error (RMSE): • Relative Root Mean Squared Error (RRMSE): • Mean Absolute Error (MAE): where r i is the reference abundance, a i the predicted abundance, and r and a are the mean of both variables.Finally, we also considered F1-score (Formula 9) to evaluate how good is the model in predicting the majoritarian class in each mixed pixel.

Experimental design
To analyse the effect of introducing ancillary data and using different levels of the LULC legend on the performance of our DL approach for spectro-temporal unmixing, we considered different input data combinations, that is: using (1) and geo-topographic and climatic data In order to avoid spatial autocorrelation of neighbouring pixels we used a block train test splitting [79,80].Firstly, we divided the entire Andalusian territory in areas of equal size using blocks of 18x15 kilometers, which means that each block contains 1250 of 460m pixels approximately.Subsequently, 80% of the pixel blocks were assigned randomly to the training set, with the remaining 20% allocated to the test set.Figure 10 illustrates the areas of pixels designated for the training and testing sets.The source code to run these experiments will be available after acceptance at https://github.com/jrodriguezortega/MSMTU.
Implementation details: Our models undergo training using the Adam optimizer [81] for a total of 200 epochs with a batch size of 2048.We initialize the learning rate at 0.003 and progressively reduce it via the cosine learning rate decay scheduler.All experiments were conducted utilizing the PyTorch deep learning framework [82].
TimeSpec4LULC [74] pre-training: TimeSpec4LULC is an open-source dataset comprising MS time series data for 29 LULC classes, designed for training machine learning models.This dataset is constructed using the seven spectral bands from MODIS sensors, providing data at a 460m resolution, spanning the time period from 2000 to 2021.We found that pre-training BRITS model on TimeSpec4LULC dataset and fine-tuning it on Andalusia-MSMTU provides better results that training it from scratch, mainly because of the similarity between both datasets.

Experimental results
This section provides the experimental results of the proposed model at SIPNA level 1 and level 2.
Bidirectional recurrent imputation and abundance estimation of LULC classes with MODIS multispectral time series and geo-topographic and climatic data

SIPNA level 1
We evaluated the proposed model in Section 5 on different combinations of spectro-temporal data and ancillary data.
In particular, we considered these combinations: (spectro-temporal data), (spectro-temporal data + geo-topographic data), (spectro-temporal data + climatic data) and (spectro-temporal data + geo-topographic and climatic data).Besides, we also include the results of a a baseline model trained from scratch on spectro-temporal data only to show how the pretraining on TimeSpec4LULC dataset is highly beneficial.The results of these five models in terms of the average MAE, RMSE, RRMSE, CC, F1-score, RRMSE gain, CC gain across the four classes of level 1 are provided in Table 2.
Aditionally, the computational complexity of each model is expressed in terms of MFLOPs in the last column.Firstly, we can see in the first two rows that by just pretraining our model in TimeSpec4LULC dataset improves the results in every metric, proving the value of pretraining DL models in similar tasks to achieve better performance.Secondly, it can be seen that including ancillary information always improves the spectral unmixing performance with respect to the baseline model (using MS time series only and trained from scratch).The highest performance is achieved when including both, geo-topographic and climatic data together with the MS time series showing the lowest MAE, RMSE and RRMSE, with 1.10%, 1.17% and 3.39% of improvement respectively and highest CC and F1-score with 0.0276 and 0.0216 of improvement respectively, with respect to the baseline model.A further analysis of the five metrics for each class is depicted in Figure 11.In general, including the geo-topographic and climatic information improves the abundance predictions of all the classes of level 1.The "terrestrial lands" and "agricultural lands" achieve better performance in terms of CC, F1-score and RRMSE.However, the classes that benefit the most from adding the ancillary data are "artificial" and "wetlands" since the relative improvement is greater in these classes.
It is worth noting that the RMSE and MAE metrics are not fair for comparisons between classes as they do not take into account the range of abundance values within each class.The most appropriate metric for these comparisons in this case is the RRMSE.
To better illustrate the reasons behind these differences in performance between observed and predicted abundances in each of the four LULC classes, Figure 12 shows a density scatter plot for each class.The scatter plots of "artificial" and "wetlands" pixels showed a less aligned distribution along the 1:1 straight line than terrestrial and agricultural lands.In artificial and wetlands plots, most points are concentrated in the lowest abundances, while in terrestrial and agricultural lands points tend to concentrate in both the extremes of the abundance gradient but also along the 1:1 line.This proofs that the model works reasonably good for both abundant (terrestrial and agricultural lands) and scarce (artificial and wetlands) classes.
Finally, Figure 13 shows the results achieved by the best model on three test areas (top row) with their corresponding RMSE (middle row) and RRMSE (bottom row) per pixel maps.As we can observe, most of the pixels are in blue tones, which indicates a low RMSE and RRMSE and a great LULC abundances predictions.A reduced number of pixels with red tones in the RRMSE maps indicates an important prediction error relative to the scale of the reference abundance.These pixels mainly correspond to small heterogeneous rural areas with a large diversity of urban, crop and even forest areas, which makes the task of correctly predicting each and every LULC class abundances very difficult.

SIPNA level 2
Similarly, we evaluated the proposed model on these input combinations: (spectro-temporal data), (spectro-temporal data + geo-topographic data), (spectro-temporal data + climatic data) and (spectro-temporal data + geo-topographic and climatic data) considering SIPNA level 2. We also include the results of a baseline model trained from scratch on Bidirectional recurrent imputation and abundance estimation of LULC classes with MODIS multispectral time series and geo-topographic and climatic data spectro-temporal data only to show how the pretraining on TimeSpec4LULC dataset is beneficial for level 2 as well.The results of these five models in terms of the average MAE, RMSE, RRMSE, CC, F1-score, RRMSE gain, CC gain across the ten classes of level 2 are provided in Table 3. Aditionally, the computational complexity of each model is expressed in terms of MFLOPs in the last column.Again, we can see in the first two rows that by just pretraining our model in TimeSpec4LULC dataset improves the results in every metric, proving the value of pretraining DL models in similar tasks to achieve better performance.Similarly, including ancillary information improves the spectral unmixing task even in a much more difficult spectral unmixing setting (Table 3).Compared to the baseline, the best performing model (including all the ancillary data) decreases the MAE, RMSE and RRMSE by 0.56%, 0.65% and 2.80%, respectively and increases CC and F1-score by 0.0320 and 0.0332, respectively.
In the same way as in level 1, Figure 14 shows a comparison between the baseline model and the model including geo-topographic and climatic data for every class in each of the five metrics used for evaluation.In general, adding ancillary information improves the abundances predictions of all the classes.The best performance is achieved in Bidirectional recurrent imputation and abundance estimation of LULC classes with MODIS multispectral time series and geo-topographic and climatic data  "woody crops" and "annual crops" classes in terms of CC, F1-score and RRMSE.Besides, adding the ancillary information to the model achieves a greater improvement for the classes with the worst results like "combinations of croplands and vegetation", "barelands" and "artificial".
Looking at the density scatter plot for each level 2 class in Figure 15, we see that the correlation between the reference and the predicted abundances is generally good, except for "combinations of croplands and vegetation" and "barelands" classes since they show a large dispersion.It is worth emphasizing the strong performance of the model for the class Bidirectional recurrent imputation and abundance estimation of LULC classes with MODIS multispectral time series and geo-topographic and climatic data "greenhouses".Despite of having so few representation of middle range values of abundance in the pixels of Andalusia, the correlation in this class between the reference and predicted abundances is similar to the classes with a good representation.We argue that the reason for that could be due to their very high albedo, i.e., high reflectance in all bands.Finally, the worst performance metrics were obtained for "combinations of croplands and vegetation" class, which may be due to the mixed-nature of this class definition itself.By combining crop and vegetation this class is a mixture of some of the other classes and hence it is complicated for the model to predict the correct abundances.
Lastly, Figure 16 shows the results achieved by the best model on three test areas (top row) with their corresponding RMSE (middle row) and RRMSE (bottom row) per pixel maps.In general, most pixels are in dark blue tones (low error) in the RMSE maps, which at first glance may seem better than the results achieved for level 1.However, when looking at the RRMSE maps we can notice a slightly higher number of pixels with red tones than in the level 1 RRMSE maps, located mainly in heterogeneous rural areas.Given that in level 2 we have 12 LULC classes, there are more Bidirectional recurrent imputation and abundance estimation of LULC classes with MODIS multispectral time series and geo-topographic and climatic data heterogeneous pixels and consequently the unmixing task is harder.It is important to note that although the error at level 1 is lower in absolute terms, when it is relativized by the scale of the reference abundances it becomes higher than in level 1, indicating only moderate results compared to the good results obtained at level 1.For this reason, it is recommended to evaluate not only the RMSE but also the RRMSE to get better conclusions.

Discussions
Spectral unmixing of LULC classes is a challenging problem commonly addressed by physical models with the need for endmember extraction [3,4].Moreover, the variability naturally present in the spectral signature for a given LULC class (spectral variability) makes this problem even more difficult [83,84].DL methods represent a great solution to eliminate the need for endmember extraction and they are known for their robustness against noise given sufficient amounts of training data.Most previous works focus on HS or MS data and do not exploit temporal information to estimate the abundance of mixed pixels.Obtaining MS time series of large territories can be prohibitive due to the cost and time required to acquire them [85,86].In addition, no previous work has explored the possibility of adding  ancillary data to enhance the spectral unmixing results, which are used successfully in other computer vision tasks [11,12,13].
In this work, we tried to solve the mentioned constraints of previous works by: • developing Andalusia-MSMTU, a high-quality MS time series dataset of mixed pixels labeled with LULC class abundances at two classification levels and making it publicly available so other researchers can develop Bidirectional recurrent imputation and abundance estimation of LULC classes with MODIS multispectral time series and geo-topographic and climatic data new approaches in the field of spectral unmixing.This dataset followed several data pre-processing steps as explained in Section 4.2.1 in order to smooth spectral varibilities associated with the imaging process.
• proposing and analysing DL-based approaches for SU without the need of endmembers extraction.Moreover, we intentionally included pixels well distributed around our study area in models' training, which implies a high number of diverse pixels with different spectral variations.This way, the DL models will be robust against the spectral variations of pixels in the test areas.
Our results showed that our DL-based method achieved good results for spectral unmixing of LULC classes by using MS data and it can be used in areas with similar features such as the rest of Spain and mediterranean countries.Besides, by including ancillary information the model improved in terms of every metric used for evaluation, showing that adding external data is an interesting avenue to explore in spectral unmixing problems.
Bidirectional recurrent imputation and abundance estimation of LULC classes with MODIS multispectral time series and geo-topographic and climatic data Finally, one significant limitation still exists in our work.Although DL models have shown great performance in mapping complex input-output relationships and have demonstrated promising results for SU of LULC classes, they lack physical interpretability.This means that it is difficult to understand how the model arrived at its decision, and it may not be clear why certain input features were given more weight than others [87].In the context of spectral unmixing, physical interpretation may be desirable [88] because it allows us to understand the underlying physical processes that govern the interaction of electromagnetic radiation with land surface materials.

Conclusions
In this work, we introduced and made publicly available Andalusia-MSMTU dataset, a new DL-ready dataset to explore SU approaches on MS time series data.Furthermore, we introduced ancillary information to improve the spectral unmixing performance consisting on two geographic, two topographic and five climatic variables.Our experiments show that the use of MS time series data for LULC abundance estimation achieves good results, which are further improved by including ancillary information.
For future work, we would like to explore taking advantage of spatial autocorrelation between neighbouring pixels, which provides useful information for the spectral unmixing task [89], by arranging the MODIS pixels in images and using a Convolutional-LTSM network with a BRITS-like approach to deal with missing values.Moreover, given the recent availability of higher spatial resolution sensors like Sentinel-2, data fusion between MODIS long-term data and Sentinel-2 higher resolution data is another avenue to improve spectral unmixing performance.Finally, since common DL-based models lack physical interpretation and it is sometimes important in the context of spectral unmixing, an effort to make DL-based methods physically aware is worthwhile.

Figure 2 :
Figure 2: Flowchart of our proposed method.First, Andalusia-MSMTU dataset is built using MODIS MS multitemporal data plus geo-topographic and climatic data together with the corresponding LULC abundances annotations extracted from SIPNA.Subsequently, the deep learning based model is designed to use both, multi-spectral times series and geo-topographic and climatic data to estimate the LULC abundances.

Figure 5 :
Figure 5: Hierarchical structure of the SIPNA-based LULC classes.The blue boxes represent the level 1 (L1) classes.The green boxes represents the Level 2 (L2) classes.

Figure 6 : 5 Methodology
Figure 6: The used scheme for extracting class abundances in every MODIS pixel of Andalusia.(1) The original SIPNA polygons were converted to a 10m raster, then (2) the LULC abundances were computed for each 460m pixel.

Figure 7 :Figure 8 :
Figure 7: Example of how class abundances are obtained for each MODIS pixel.(a) shows the satellite image of the Google Satellite corresponding to one MODIS pixel.(b) shows the annotated SIPNA polygons.(c) the rasterized LULC map at 10m resolution.(d) the obtained abundance of level 1 classes for that MODIS pixel.

Figure 9 :
Figure 9: Our proposed neural network.The green box denotes the input data for a given pixel, i.e., MS time series data + ancillary data.The yellow box denotes the BRITS model for MS time series feature extraction, the red box denotes the ancillary data feature extraction layer and the blue boxes denote the final layers for features combination and softmax transformation of neural network's output.

Figure 11 :
Figure 11: Test results for the four SIPNA level 1 classes obtained by including all ancillary information (green) and without ancillary information (blue): CC values (top left), F1-score values (top right), RMSE values (middle left), RRMSE values (middle right), and MAE values (bottom).

Figure 12 :
Figure 12: Density scatter plots of every level 1 class abundances (predicted vs reference) for the best model (including all ancillary data).(a) Artificial, (b) Agricultural lands, (c) Terrestrial lands, and (d) Wetlands.

Figure 14 :
Figure 14: Test results for the ten SIPNA level 2 classes obtained by including all ancillary information (green) and without ancillary information (blue): CC values (top left), F1-score values (top right), RMSE values (middle left), RRMSE values (middle right), and MAE values (bottom).
Bidirectional recurrent imputation and abundance estimation of LULC classes with MODIS multispectral time series and geo-topographic and climatic data (

Table 2 :
Performance comparison of our model trained from scratch (first row) and finetuned from TimeSpec4LULC (second row) using only multi-spectral multi-temporal input data, by adding geo-topographic data only (third row), by adding climatic data only (fourth row) and by adding both geo-topographic and climatic (fifth row).The performance is expressed in terms of average Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), Relative Root Mean Squared Error (RRMSE), Correlation Coefficient (CC), F1-score, RMSE gain and CC gain with respect to baseline model for SIPNA level 1 classes.Last column, "MFLOPs", indicates the model's computational complexity in terms of Mega FLOPs.

Table 3 :
Performance comparison of our model trained from scratch (first row) and finetuned from TimeSpec4LULC (second row) using only multi-spectral multi-temporal input data, by adding geo-topographic data only (third row), by adding climatic data only (fourth row) and by adding both geo-topographic and climatic (fifth row).The performance is expressed in terms of average Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), Relative Root Mean Squared Error (RRMSE), Correlation Coefficient (CC), F1-score, RMSE gain and CC gain with respect to baseline model for SIPNA level 2 classes.Last column, "MFLOPs", indicates the model's computational complexity in terms of Mega FLOPs.