Deep Mixture Model-Based Land Surface Temperature Retrieval for Hyperspectral Thermal IASI Sensor

A deep mixture model was developed to retrieve land surface temperatures (LSTs) from infrared atmospheric sounding interferometer (IASI) observations. The IASI brightness temperature (Tb) data and the Advanced Very High Resolution Radiometer onboard MetOp (AVHRR/MetOp) LST data were randomly divided into training and test datasets, and a deep mixture model was constructed to simulate radiation transmission in order to invert the LST. The constructed model could evaluate dataset characteristics that included global features, local features, and time-domain predictions, covering most of the features of the satellite dataset. For the test datasets, the root mean square error (RMSE) indicated that the LST in Algeria and South Africa could be retrieved with errors of less than 2 K and 2.5 K, respectively. Compared with the AVHRR/MetOp LST product in March and December 2019 for Algeria and South Africa, the LST could be retrieved with a maximum RMSE of 2.5 K. For Algeria, the nighttime LST retrievals had an RMSE of less than 2.0 K, which was better than that of the daytime retrievals. This deep mixture model can be applied to time-series temperature prediction.


I. INTRODUCTION
The main objective in thermal infrared remote sensing is the accurate determination of the land surface temperature (LST) at scales that vary from regional to global. With the rapid development of hyperspectral thermal infrared remote sensing, satellite-borne sensors such as the atmospheric infrared sounder (AIRS) [1], the infrared atmospheric sounding interferometer (IASI) [2], and the cross-track infrared sounder (CrIS) [3] are able to provide data that allow land surface information to be analyzed at fine spectral resolution. However, LST is coupled with land surface emissivity (LSE). According to the radiative transfer equation (RTE), there are always N+1 unknowns (N LSEs and one LST) when radiances are observed in N bands (N equations), even if accurate atmospheric correction has been performed [4]. Therefore, the separation of temperature and emissivity is an important topic for consideration.
In recent decades, great efforts have been made to estimate surface temperatures from thermal infrared (TIR) data. When accurate atmospheric correction is available, some methods make use of physical constraints to obtain an accurate LST. For example, the iterative spectrally smooth (ISSTES) method [5]-[8] defines a smoothness index because an inaccurate LST will cause the calculated LSE to retain characteristics such as atmospheric emission lines. The downwelling radiance residual index (DRRI) [9], [10] describes the direction and magnitude of the downwelling radiance residual feature associated with some commonly used channel groups. In addition, the correlation-based temperature and emissivity separation (CBTES) algorithm [11] describes the relationship between the surface emissivity and atmospheric downward radiance to optimize surface temperature. Other methods, such as the linear spectral emissivity constraint method [4], [12] and the wavelet transform method [13], have adopted data dimensionality reduction to reduce the number of LSEs in order to solve the underdetermined problem.
The associate editor coordinating the review of this manuscript and approving it for publication was Qingli Li.
In fact, accurate atmospheric profiles are usually unavailable synchronously with TIR measurements. Therefore, some methods are dedicated to providing LST from hyperspectral thermal infrared data without performing accurate atmospheric correction. Examples include the regression retrieval method [14]-[16], multi-channel method [17], [18], artificial neural network (ANN) method [19]-[21], and two-step physical simultaneous retrieval methods [22]-[26]. The ANN method allows learning and recognition of complex non-linear patterns and can establish complex relationships between independent and dependent variables without exact knowledge of the underlying physical mechanisms [27]. However, currently proposed ANN methods for hyperspectral LST inversion mostly use networks containing one or two hidden layers that are trained on a limited simulated dataset. This results in low accuracy (5 K bias compared with the IASI LST product) when they are applied to satellite data, because the simulated dataset does not contain all the actual land and atmospheric conditions present in satellite data [21]. Deep learning is characterized by a neural network that usually involves more than two layers and can learn representative and discriminative features from the dataset in a hierarchical manner. It is highly effective for object detection [28], image recognition [29], and semantic segmentation [30] in remote sensing. The accuracy of LST inversion may be improved by using deep learning to learn from actual satellite data. However, few studies have addressed LST retrieval from hyperspectral thermal infrared data using deep learning. Therefore, in this study, a deep mixture model was proposed and applied to IASI hyperspectral data in order to retrieve LST.
The remainder of this paper is organized as follows: A description of the satellite datasets is presented in Section II. The methodology is presented in Section III. The retrieval results are presented and analyzed in Section IV. The new method is applied and validated in Section V using satellite data. Finally, a discussion and conclusions are presented in Section VI.

II. STUDY AREA AND DATASETS
The chosen study areas were located in northern and southern Africa, which contain a variety of land cover types and are subject to many clear-sky days. The latitude and longitude ranges of the selected areas were predominantly determined by the scanning trajectory of the IASI sensor. The selected northern and southern areas were located in Algeria and South Africa, respectively. Fig. 1 shows the research area.
IASI observations with different sensing start/stop times and different viewing zenith angles (VZAs) were collected to train the neural network. IASI is a Fourier transform spectrometer based on the Michelson interferometer and is associated with an integrated imaging system (IIS). The Fourier transform spectrometer provides high-resolution infrared spectra, and the IIS imager is a high-spatial-resolution broadband radiometer [2]. The IASI sensor onboard the MetOp-A satellite observes the land surface and atmosphere using 8461 channels that cover the range between 645 cm⁻¹ and 2760 cm⁻¹ at a resolution (unapodized) of 0.25 cm⁻¹ for each sounder pixel, with a ground spatial resolution of approximately 12 km at nadir. IASI provides high-resolution atmospheric emission spectra that allow the derivation of temperature and humidity profiles with high spectral and vertical resolution and accuracy. Additionally, the spectra allow the determination of trace gases, land and sea surface temperatures, emissivity, and cloud properties. In this study, the IASI Level 1C product (brightness temperature, Tb) covering all spectral samples was obtained from the European Organization for the Exploitation of Meteorological Satellites (EUMETSAT) and adopted as the input training data for the convolutional neural networks (CNNs). Additionally, the Advanced Very High Resolution Radiometer (AVHRR) LST product generated from the EUMETSAT Polar System (EPS) satellites (MetOp) was utilized as the training target for the CNN. The EPS Daily Land Surface Temperature product provides daytime LST retrievals based on clear-sky measurements. The satellite application facility (SAF) on land surface analysis (LSA) provides the AVHRR/MetOp Daily LST product (LSA-002). It has been available since January 2015 on a daily basis in a sinusoidal grid centered at (0° N, 0° W), with a resolution of 0.01° by 0.01°. The AVHRR/MetOp LST product was reprojected according to the mathematical construction of the sinusoidal projection to obtain its latitude and longitude [31]. It was then resampled to the pixel size of each IASI Level 1C product using the spatial-average value.
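The reprojection and spatial averaging described above can be illustrated with a minimal sketch. It assumes a spherical Earth of radius 6371 km, a central meridian at 0°, and a square averaging box; these values, the function names, and the `half_width_deg` parameter are illustrative assumptions and are not taken from the LSA-002 product specification or from the authors' processing chain.

```python
import math

EARTH_RADIUS_M = 6371000.0  # assumed spherical radius, not the product's documented value


def latlon_to_sinusoidal(lat_deg, lon_deg, radius=EARTH_RADIUS_M, lon0_deg=0.0):
    """Forward sinusoidal projection: degrees -> (x, y) in metres."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg - lon0_deg)
    return radius * lon * math.cos(lat), radius * lat


def sinusoidal_to_latlon(x, y, radius=EARTH_RADIUS_M, lon0_deg=0.0):
    """Inverse sinusoidal projection: (x, y) in metres -> degrees."""
    lat = y / radius
    lon = x / (radius * math.cos(lat))
    return math.degrees(lat), math.degrees(lon) + lon0_deg


def footprint_mean_lst(lst_pixels, centre_lat, centre_lon, half_width_deg):
    """Average the valid LST pixels (lat, lon, lst_K) inside a square box
    around one IASI pixel centre. Returns None when no valid pixel is found."""
    values = [
        t for lat, lon, t in lst_pixels
        if abs(lat - centre_lat) <= half_width_deg
        and abs(lon - centre_lon) <= half_width_deg
        and t is not None
    ]
    return sum(values) / len(values) if values else None


# Hypothetical AVHRR pixels: two inside the box, one outside, one cloud-masked.
pixels = [(30.00, 5.00, 300.0), (30.01, 5.01, 302.0),
          (31.00, 5.00, 310.0), (30.00, 5.00, None)]
mean_lst = footprint_mean_lst(pixels, 30.0, 5.0, half_width_deg=0.05)
```

A real IASI footprint is roughly circular at nadir and grows with the viewing zenith angle, so the fixed square box here is only a simplification.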
The time span of the IASI brightness temperature product and AVHRR/MetOp Daily LST product for the training dataset was between January 2016 and December 2018. The datasets from these three years were used to train the constructed deep mixture model.

III. METHODOLOGY
The RTE can be expressed as follows [32]:
$$L(\lambda) = \left[\varepsilon(\lambda) B(\lambda, T_s) + \left(1 - \varepsilon(\lambda)\right) L_{down}(\lambda)\right] \tau(\lambda) + L_{up}(\lambda) \tag{1}$$
where L(λ) is the at-sensor radiance at wavelength λ, ε is the surface emissivity, τ is the atmospheric transmittance, L_up is the atmospheric upwelling radiance, L_down is the atmospheric downwelling radiance, and B(λ, T_s) is the Planck function at surface temperature T_s at wavelength λ, as described in (2):
$$B(\lambda, T_s) = \frac{C_1}{\lambda^5 \left[\exp\left(\frac{C_2}{\lambda T_s}\right) - 1\right]} \tag{2}$$
where C_1 = 1.191 × 10^8 W·µm^4/(sr·m^2) and C_2 = 1.439 × 10^4 µm·K. L(λ) can be expressed as B(λ, T_b), where B(λ, T_b) is the Planck function at brightness temperature T_b. It is clear that there is a complex nonlinear relationship between the brightness temperature and surface temperature. Consequently, a deep mixture model that can learn complex patterns was employed to retrieve LST directly.
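To make the nonlinear link between surface temperature and brightness temperature concrete, the following sketch evaluates the Planck function with the constants quoted above and inverts it to obtain T_b. The `at_sensor_radiance` helper assumes the commonly used single-channel form of the RTE, and the emissivity, transmittance, and atmospheric radiance values are hypothetical numbers chosen only for illustration.

```python
import math

C1 = 1.191e8   # W·µm^4/(sr·m^2), as quoted in the text
C2 = 1.439e4   # µm·K, as quoted in the text


def planck(wavelength_um, temp_k):
    """Planck radiance B(λ, T) in W/(m^2·sr·µm)."""
    return C1 / (wavelength_um ** 5 * (math.exp(C2 / (wavelength_um * temp_k)) - 1.0))


def brightness_temperature(wavelength_um, radiance):
    """Invert the Planck function: radiance -> brightness temperature in K."""
    return C2 / (wavelength_um * math.log(1.0 + C1 / (wavelength_um ** 5 * radiance)))


def at_sensor_radiance(wavelength_um, ts, emissivity, tau, l_up, l_down):
    """Assumed RTE form: surface emission plus reflected downwelling radiance,
    attenuated by the atmosphere, plus the upwelling path radiance."""
    surface = emissivity * planck(wavelength_um, ts)
    reflected = (1.0 - emissivity) * l_down
    return (surface + reflected) * tau + l_up


# Hypothetical channel at 11 µm over a 300 K surface.
radiance = at_sensor_radiance(11.0, 300.0, emissivity=0.97, tau=0.85,
                              l_up=1.0, l_down=1.5)
tb = brightness_temperature(11.0, radiance)
```

With an emissivity and transmittance of 1 and no path radiance, T_b equals T_s; any other combination makes the two differ in a way that depends on wavelength, which is the relationship the network has to learn.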

A. CHANNEL SELECTIONS
Owing to the large amount of data required for deep learning training and the strong correlation between channels associated with hyperspectral data, the selection of an optimal subset of IASI channels is one of the most important factors affecting computation efficiency and retrieval accuracy in actual inversion. Many channel selection schemes that consider IASI sensors have been proposed for different purposes [33]- [37]. Examples of these purposes include the extraction of atmospheric profiles and cloud properties.
In this study, the smallest number of channels that could convey essential information on target surface parameters was identified. Taking into consideration that some mid-infrared channels are greatly affected by solar radiation, we only considered the channels in the 645-1600 cm⁻¹ range of the TIR region. Aires [20] analyzed wavelength sensitivity to Ts variations using a channel selection process that defined sensitivity as the mean variation in L(λ) for a 1 K change in Ts. Following this procedure, for the 645-1600 cm⁻¹ spectral region, 245 channels sensitive to the surface characteristics were selected with a threshold of 75%, which represented a good compromise according to Aires' selection procedure (Fig. 2).
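The channel arithmetic behind this selection can be sketched as follows. The wavenumber grid follows the 645-2760 cm⁻¹ range and 0.25 cm⁻¹ spacing given for IASI. The `select_sensitive_channels` function is only one possible reading of the 75% threshold (keep channels whose sensitivity reaches 75% of the in-band maximum); the source does not spell out the exact criterion, so this rule and the toy sensitivity values are assumptions.

```python
def iasi_wavenumbers(start=645.0, stop=2760.0, step=0.25):
    """Regular IASI wavenumber grid in cm^-1."""
    n = int(round((stop - start) / step)) + 1
    return [start + i * step for i in range(n)]


def wavenumber_to_wavelength_um(nu_cm1):
    """Convert a wavenumber in cm^-1 to a wavelength in µm."""
    return 1.0e4 / nu_cm1


def select_sensitive_channels(wavenumbers, sensitivity, lo=645.0, hi=1600.0,
                              threshold=0.75):
    """Keep channels in [lo, hi] whose sensitivity to a 1 K change in Ts is at
    least `threshold` times the largest in-band sensitivity (assumed rule)."""
    in_band = [(nu, s) for nu, s in zip(wavenumbers, sensitivity) if lo <= nu <= hi]
    s_max = max(s for _, s in in_band)
    return [nu for nu, s in in_band if s >= threshold * s_max]


grid = iasi_wavenumbers()
n_tir = sum(1 for nu in grid if 645.0 <= nu <= 1600.0)

# Toy sensitivities for five channels; the last one lies outside the TIR band.
toy_nu = [700.0, 900.0, 1100.0, 1500.0, 2000.0]
toy_sens = [0.10, 0.95, 1.00, 0.40, 2.00]
selected = select_sensitive_channels(toy_nu, toy_sens)
```

The grid reproduces the 8461 channels quoted for IASI, of which 3821 fall in the 645-1600 cm⁻¹ region before any sensitivity screening.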

B. DEEP MIXTURE MODEL
Deep neural networks (DNNs) [38], [39] and CNNs [40] are effective for nonlinear regression problems with large amounts of data. Similarly, long short-term memory (LSTM) network models [41] can be applied to the nonlinear regression of data with temporal relationships, although their effect is less clear. A combination of DNN, CNN, and LSTM can be used to evaluate the characteristics of datasets that include global features, local features, and time-domain predictions, covering most of the dataset features. Therefore, this study uses a combination of these three types of models to invert the surface temperature. Fig. 3 shows the architecture of the constructed deep mixture model.
The input layer ensured a uniform sample size of 1 × 245, i.e., each satellite brightness temperature pixel incorporated 245 bands.
A DNN is an ANN incorporating multiple layers (≥3). The DNN model used a three-layer structure in which the layers contained 128, 128, and 1 neuron, respectively. The sigmoid function, tanh function, and rectified linear unit (ReLU) function [42] are typically used as activation functions. The ReLU function was adopted as the activation function f(·) because it can train a neural network rapidly without a significant penalty to generalization accuracy [43]. The output of the ReLU function does not tend to saturate as the input gradually increases. The DNN model discarded the spatial arrangement of the data, compressed the input into a single vector, obtained the global feature information through calculation of the weights, forwarded it, and finally obtained a global feature. DNNs could thus be used to obtain large-scale global feature information from satellite image data in a single area.
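A forward pass through a fully connected branch with these layer widths can be sketched in plain Python. The uniform weight initialization, its scale, and the linear (non-ReLU) output neuron are assumptions made for illustration; the source states the layer sizes and the ReLU activation but not these details.

```python
import random


def relu(values):
    """Rectified linear unit applied element-wise."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """Fully connected layer; weights[j] holds the input weights of neuron j."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def init_layer(n_in, n_out, rng, scale=0.05):
    """Uniform random weights in [-scale, scale] and zero biases (assumed)."""
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def build_dnn(rng, sizes=(245, 128, 128, 1)):
    """One (weights, biases) pair per layer for a 245-128-128-1 network."""
    return [init_layer(n_in, n_out, rng)
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]


def dnn_forward(layers, tb_spectrum):
    """ReLU on the two 128-neuron layers, linear output neuron (assumption)."""
    h = tb_spectrum
    for weights, biases in layers[:-1]:
        h = relu(dense(h, weights, biases))
    weights, biases = layers[-1]
    return dense(h, weights, biases)[0]


def count_parameters(layers):
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)


rng = random.Random(0)
layers = build_dnn(rng)
sample = [rng.uniform(-1.0, 1.0) for _ in range(245)]  # stand-in for scaled Tb values
output = dnn_forward(layers, sample)
n_params = count_parameters(layers)
```

Under these assumptions the branch has 48129 trainable parameters, most of them in the first layer that maps the 245 channels onto 128 neurons.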
As feed-forward artificial neural networks, CNNs are multilayer structures that incorporate convolutional layers, pooling layers, and fully connected layers [44]. They use local connections to effectively extract information from data and reduce the number of parameters by sharing the weights. The number of alternately arranged convolutional layers and pooling layers can be adjusted based on the retrieval target. A core function of the convolutional layer is learning the feature representations of the input data. The feature for the l-th layer (convolutional layer) is obtained using convolution and activation operations with trainable parameters (i.e., the weight term W_k and bias term b_k) and the activation function f(·). The trainable parameters are initialized randomly, subject to a uniform distribution. Mathematically, the feature value of the k-th feature of the l-th layer, X_k^l, is calculated using the following equation:
$$X_k^l = f\left(\sum_{i=1}^{K} X_i^{l-1} * W_k^l + b_k^l\right)$$
Here, X_i^{l-1} (i = 1, 2, ..., K) are the K features of the (l−1)-th layer, and * denotes the convolution operation. The convolution kernel W_k^l is a weight matrix whose size and number are specified manually. The three convolutional layers contained 245, 128, and 64 convolution kernels, respectively, and the corresponding convolution window widths were 10 × 1, 6 × 1, and 3 × 1. The ReLU function was adopted as the activation function f(·). The pooling layer, usually placed between two convolutional layers, performed a down-sampling operation to achieve shift-invariance. Here, max-pooling operations [45] were adopted. The pooling kernel size was set to 2. Finally, the high-level reasoning of the neural network was performed via a fully connected layer placed after the three sets of convolutional and pooling layers in the architecture. Neurons in a fully connected layer share connections with all activations in the previous layer. To maintain the stability of the CNN, the ReLU function was also used as the node activation function in the fully connected layer.
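The convolution, activation, and max-pooling sequence described for the CNN can be illustrated on a single-channel signal. This is a simplified sketch: it uses one kernel per layer instead of the 245, 128, and 64 kernels of the model, and it assumes no padding ("valid" convolution), which the source does not specify for this branch. As in most deep learning frameworks, the sliding product is computed without flipping the kernel.

```python
def conv1d_valid(signal, kernel, bias=0.0):
    """Slide the kernel along the signal without padding."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k)) + bias
            for i in range(len(signal) - k + 1)]


def relu(values):
    return [max(0.0, v) for v in values]


def max_pool1d(values, size=2):
    """Non-overlapping max pooling; a trailing incomplete window is dropped."""
    return [max(values[i:i + size])
            for i in range(0, len(values) - size + 1, size)]


def conv_block(signal, kernel, bias=0.0, pool=2):
    """Convolution, then ReLU, then max pooling."""
    return max_pool1d(relu(conv1d_valid(signal, kernel, bias)), pool)


# Trace a 245-channel sample through window widths 10, 6, and 3 with pooling size 2.
x = [float(i % 7) for i in range(245)]
for width in (10, 6, 3):
    x = conv_block(x, [1.0 / width] * width)
final_length = len(x)
```

Under the no-padding assumption, the sequence length shrinks from 245 to 118, 56, and finally 27 after the three convolution and pooling stages; with zero padding the lengths would be larger.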
The CNN retained the spatial information of the data set, separated it into multiple local subdatasets, obtained its corresponding feature information by refining the weights of the subdatasets, and forwarded them one by one. Finally, the spatial structure was restored and integrated according to the local features of the subdataset for evaluation. The CNN could thus be used to capture the details of satellite data in a single region.
LSTM is a recurrent neural network model that can learn long-term dependencies and is used to process and predict important events with very long intervals and delays in the associated time series. Therefore, we used LSTM to analyze and predict satellite data in a single area in the time domain. The LSTM model was created according to the Convolutional, Long Short-Term Memory, fully connected Deep Neural Networks (CLDNN) model [46]. The general structure of the CLDNN network includes an input layer with a time-domain-related feature, followed by several CNN layers that reduce the frequency-domain variation. The CNN output is passed to several LSTM layers to reduce the time-domain variation, and the output of the last LSTM layer is fed into a fully connected DNN layer, which maps the feature space to an output layer that is easier to classify.
The CNN part of the LSTM model created in this study incorporated a one-dimensional convolution layer (time-domain convolution), 64 feature maps, and a time-frequency domain filter with a size of 9 × 1. The input shape size of this layer was 245 × 1, the pooling layer between the convolutional and LSTM layers adopted the max-pooling strategy [45], and the pooling size was 2 × 1. The inter-layer dataset was processed for batch normalization [47]-[49]. The CNN layer was followed by 2 LSTM layers, the first of which incorporated 6 cells. The input shape size of this layer was 122 × 64, with an output shape size of 122 × 6. The second LSTM layer incorporated 32 cells. The input shape size of this layer was 122 × 6, with an output shape size of 32 × 1. The output of the LSTM was connected to two fully connected layers incorporating 32 neurons and 1 neuron, respectively.
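The tensor shapes quoted for this branch can be checked with a short shape and parameter trace. The sketch assumes zero-padded ("same") convolution, which is inferred from the stated 122 × 64 input to the first LSTM layer (an unpadded 9 × 1 filter would leave 237 steps and 118 after pooling). It also assumes the standard LSTM cell without peephole connections and omits batch-normalization parameters; none of these details are confirmed by the source.

```python
def conv1d_params(kernel, in_channels, filters):
    """Weights plus one bias per filter."""
    return kernel * in_channels * filters + filters


def lstm_params(input_dim, units):
    """Four gates, each with input weights, recurrent weights, and a bias."""
    return 4 * (units * (input_dim + units) + units)


def dense_params(n_in, n_out):
    return n_in * n_out + n_out


def trace_lstm_branch(n_channels=245, filters=64, kernel=9, pool=2,
                      lstm1_units=6, lstm2_units=32, dense_units=32):
    """Return (layer name, output shape, trainable parameters) per layer."""
    steps = n_channels // pool  # 'same' convolution keeps 245 steps; pooling halves them
    return [
        ("input", (n_channels, 1), 0),
        ("conv1d", (n_channels, filters), conv1d_params(kernel, 1, filters)),
        ("max_pool", (steps, filters), 0),
        ("lstm_1", (steps, lstm1_units), lstm_params(filters, lstm1_units)),
        ("lstm_2", (lstm2_units,), lstm_params(lstm1_units, lstm2_units)),
        ("dense_1", (dense_units,), dense_params(lstm2_units, dense_units)),
        ("dense_2", (1,), dense_params(dense_units, 1)),
    ]


trace = trace_lstm_branch()
for name, shape, params in trace:
    print(f"{name:9s} shape={shape} params={params}")
```

The first LSTM layer returns its full output sequence (122 × 6) so that the second layer can consume it, while the second layer returns only its final 32-element state.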
The output tensors from the DNN, CNN, and LSTM models were concatenated into a single tensor, which was then passed to the output layer of the mixed model through a fully connected layer incorporating a single neuron. The three models were fused into a mixed model to process complex satellite observations and perform the corresponding surface temperature inversion. The combined model used the mean square error (MSE) as the loss function, defined as follows:
$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2$$
where n is the number of data points in a training batch, y_i is the target value of the i-th data point, and ŷ_i is the retrieval value given by the deep mixture model. The adaptive moment estimation (Adam) algorithm [50] was adopted as the gradient descent algorithm for the backpropagation stage of the deep mixture model. Adam is straightforward to implement and requires little memory. It is robust and well-suited to a wide range of non-convex optimization problems in the field of machine learning [51].
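The fusion step, the MSE loss, and a single Adam update can be written out as a minimal sketch. It assumes that each branch contributes one scalar to the concatenated tensor and uses the default Adam hyperparameters from the original Adam paper (learning rate 0.001, β1 = 0.9, β2 = 0.999, ε = 1e-8); the source does not report the hyperparameters actually used, and the function names are illustrative.

```python
import math


def fuse(branch_outputs, weights, bias):
    """Single-neuron fully connected layer over the concatenated branch outputs."""
    return sum(w * o for w, o in zip(weights, branch_outputs)) + bias


def mse(targets, predictions):
    """Mean square error over one batch."""
    n = len(targets)
    return sum((y - p) ** 2 for y, p in zip(targets, predictions)) / n


def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; t is the 1-based step index."""
    m = beta1 * m + (1.0 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1.0 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1.0 - beta1 ** t)                # bias corrections
    v_hat = v / (1.0 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# Hypothetical branch outputs (K) for one pixel, fused with equal weights.
lst_estimate = fuse([300.0, 301.0, 299.0], [1.0 / 3.0] * 3, bias=0.0)
loss = mse([300.0, 290.0], [301.0, 288.0])
param, m, v = adam_step(param=0.0, grad=1.0, m=0.0, v=0.0, t=1)
```

In the first step the bias-corrected moments cancel the gradient magnitude, so the parameter moves by approximately one learning rate regardless of how large the gradient is.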

IV. RESULTS
To evaluate the accuracy and practicability of the LST retrieval, the IASI brightness temperature and AVHRR/MetOp Daily LST products were used to build the training and test datasets. For each research area, 90% of the dataset was adopted as training data, and the remaining 10% was utilized as the test dataset for the deep mixture model. RMSE and bias were adopted to validate the retrieval accuracy of the deep mixture model. Additionally, a histogram of the residuals between the values retrieved using the deep mixture model and the AVHRR/MetOp Daily LST products is shown in Fig. 4. The residuals between the retrieved LST and the AVHRR/MetOp Daily LST products for the two selected areas were predominantly located in the range between -5 K and 5 K (Fig. 4). When validated using the test dataset, the RMSE indicated that the LST in Algeria and South Africa could be retrieved with an associated error of less than 2 K and 2.5 K, respectively, using the deep mixture model (Fig. 4). The LST RMSE for South Africa was larger than that for northern Africa. One possible reason was that the area selected in South Africa covered more land surface types than the area selected in northern Africa. Additionally, the surface of northern Africa is more uniform. In the following sections, the deep mixture model is used to provide LST estimates for data acquired at other times.
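The evaluation protocol described above (random 90/10 split, RMSE, bias, and a residual histogram between -5 K and 5 K) can be sketched as follows. The bias sign convention (retrieved minus reference), the 1 K bin width, and the fixed random seed are assumptions for illustration.

```python
import math
import random


def split_train_test(samples, test_fraction=0.1, seed=0):
    """Shuffle and split the samples into (train, test) lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def rmse(retrieved, reference):
    n = len(retrieved)
    return math.sqrt(sum((r - ref) ** 2 for r, ref in zip(retrieved, reference)) / n)


def bias(retrieved, reference):
    """Mean residual, retrieved minus reference (assumed sign convention)."""
    return sum(r - ref for r, ref in zip(retrieved, reference)) / len(retrieved)


def residual_histogram(retrieved, reference, lo=-5.0, hi=5.0, width=1.0):
    """Count residuals per bin in [lo, hi); values outside the range are skipped."""
    counts = [0] * int((hi - lo) / width)
    for r, ref in zip(retrieved, reference):
        d = r - ref
        if lo <= d < hi:
            counts[int((d - lo) // width)] += 1
    return counts


train, test = split_train_test(range(100))
# Hypothetical retrieved and reference LST values in K.
retrieved = [300.5, 299.5, 304.9, 306.0]
reference = [300.0, 300.0, 300.0, 300.0]
hist = residual_histogram(retrieved, reference)
```

Reporting bias alongside RMSE separates a systematic offset from random scatter: a retrieval that is uniformly 2 K too warm has an RMSE of 2 K and a bias of 2 K, whereas symmetric scatter gives a bias near zero.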

V. APPLICATION
The deep mixture model was applied to IASI observations acquired at other times. IASI Tb data generated over four seasons in the two selected areas were adopted to validate the LST retrieval accuracy. For the research area in northern Africa, daytime and nighttime data were The two selected areas were mostly subject to clear skies on these days. Figs. 5 and 6 show the LST inversion values generated using the deep mixture model for the daytime and nighttime data of the two selected areas. Compared with the AVHRR/MetOp LST product, the LST could be retrieved with an RMSE of less than 3 K for March in northern Africa (Fig. 5a, 5b, 5c, and 5d) and less than 2.5 K for December in southern Africa (Fig. 6a, 6b, 6c, and 6d). For northern Africa, the nighttime LST retrievals had an RMSE of less than 2.0 K compared with the AVHRR/MetOp LST product and were better than those retrieved during the day. One possible reason for this result is that the surface temperature was more uniform at nighttime. However, for southern Africa, the LST retrieval RMSE for the nighttime data was slightly higher than that for the daytime data. This was mainly because the number of clear-sky pixels was small for the daytime data. It can be concluded that the deep learning model could provide relatively accurate LST estimation.

VI. DISCUSSION AND CONCLUSION
It is generally difficult to obtain accurate atmospheric profiles synchronously with TIR measurements. Thus, the lack of accurate atmospheric correction affects LST retrieval accuracy. The retrieval of land surface temperature without the need for any prior atmospheric information would be ideal. This study introduced a deep mixture model to provide LST estimation for IASI hyperspectral data. The relationship between the IASI observations (brightness temperature) and LST was learned using deep learning. The constructed deep mixture model combined the advantages of DNN and CNN for extracting training data features and the LSTM model for sequence prediction.
For each research area, 90% of the dataset was adopted as training data, and the remaining 10% was utilized as the test dataset for the deep mixture model. The results showed that the LST in Algeria and South Africa could be retrieved with an RMSE of less than 2.5 K. This model was also applied to IASI observations acquired at other times. For northern Africa, the nighttime LST retrievals had an RMSE of less than 2.0 K compared with the AVHRR/MetOp LST product, and were better than those retrieved during the day. Therefore, our constructed deep mixture model could be used to determine LST with good retrieval accuracy for time-series hyperspectral TIR data.
The accuracy of the validation result for 2019 was mostly lower than that of the training dataset. This was mainly because the mean and variance of the training dataset (2016-2018, three years) differed slightly from those of the 2019 validation data, which affected the inversion results of the deep learning model. Collecting training data over a longer time span may improve the validation accuracy. This can form part of future studies, although it will be limited by the data collection and processing time.
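A simple way to quantify such a difference between the training and validation periods is to compare their first two moments. The sketch below is a generic diagnostic, not the authors' procedure; the function names and the sample values are hypothetical.

```python
import math
import statistics


def standardized_mean_shift(train_values, new_values):
    """Difference of the means, expressed in training standard deviations."""
    mu = statistics.mean(train_values)
    sigma = math.sqrt(statistics.pvariance(train_values))
    return (statistics.mean(new_values) - mu) / sigma


def variance_ratio(train_values, new_values):
    """Population variance of the new data relative to the training data."""
    return statistics.pvariance(new_values) / statistics.pvariance(train_values)


# Hypothetical brightness temperatures (K) for one channel.
train_tb = [290.0, 300.0, 310.0]
valid_tb = [305.0, 315.0]
shift = standardized_mean_shift(train_tb, valid_tb)
ratio = variance_ratio(train_tb, valid_tb)
```

A mean shift close to zero and a variance ratio close to one indicate that the new data resemble the training period; larger departures suggest that the network is being applied outside the conditions it was trained on.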
The proposed method does not require any atmospheric information, but it does require a large amount of satellite data to train the network. Therefore, the associated data collection and processing take a lot of time. LST retrieval for large-scale areas will place significant demands on data volume and computing performance, which represents a problem. Furthermore, because the training target for our deep mixture model was the AVHRR/MetOp LST product, the retrieval accuracy for the 2019 data was only validated against this product. Future work could examine whether other satellite data products or ground measurement data can be used for the evaluation in order to further assess the performance of our deep mixture model.
He has published more than 100 articles in international refereed journals. His research interests include thermal infrared radiometry, parameterization of land surface processes at large scale, as well as in the assimilation of satellite data to land surface models.
JÉLILA LABED received the Ph.D. degree in physics of terrestrial environment in the thermal infrared domain from the University of Strasbourg, in 1990. In 1991, she passed a national competitive examination. Since 1991, she has been working as an Associate Professor with the University of Strasbourg. She served as the Director of Studies with the School of Physics (Telecom Physique Strasbourg) in the university from 2005 to 2010. Since 2010, she has been teaching engineering project management, in addition to experimental physics. She has organized and participated in many field campaigns in different countries. She has published more than 40 articles in international refereed journals. Her research interests include radiometry in the thermal infrared domain applied to terrestrial surfaces, land surface processes modeling at large scale, particularly estimation of land surface temperature and evapotranspiration from satellite data, and urban climatology.
FRANCOISE NERRY received the Ph.D. degree in physics of terrestrial environment in the thermal infrared domain from the University of Strasbourg, in 1988. She received a two-year fellowship from the National Research Council, Jet Propulsion Laboratory, Pasadena, USA, where she was trained in spectro-radiometry techniques. She joined CNRS, as a Researcher, in 1990. Since 2004, she has been the Leader of the TRIO (remote sensing, radiometry and optical imagery) Team, ICube Laboratory. The main domains of interest of the TRIO team are remote sensing, urban climatology, and modeling and imagery of polarization. In 2004, she obtained the diploma authorizing her to supervise Ph.D. students, and she has already supervised 14 Ph.D. students. She has organized or participated in more than ten field campaigns in different countries. She has published more than 100 scientific articles in international peer-reviewed journals. Her research interests include radiometry and spectro-radiometry in the thermal infrared domain applied to natural surfaces, analysis of hyperspectral data from ground interferometers or from IASI, methods of analysis of remotely sensed thermal infrared data, and surface fluxes parameterization in an urban environment.