Band Reconstruction Using a Modified UNet for Sentinel-2 Images

Multispectral (MS) remote sensing images are of great interest for various applications, yet, quite often, an MS product exhibits one or more noisy bands, stripe lines, or even missing bands, which decreases confidence in the information it contains. To meet this challenge, this article proposes a UNet-based neural network architecture to reconstruct a spectral band. The worst case scenario is considered, that of a missing band, with the reconstruction performed from the available bands. Besides the comparison with state-of-the-art methods, both qualitative and quantitative analyses are carried out using several metrics: root-mean-square error, structural similarity index, signal-to-reconstruction error, peak signal-to-noise ratio, and spectral angle mapper. The experiments focus on Sentinel-2 open data within the Copernicus program. Various patterns of urban areas, agricultural regions, and regions from the North Pole or Kyiv, Ukraine are included in our dataset to demonstrate the efficiency of band reconstruction regardless of land-cover diversity.


I. INTRODUCTION
Remote sensing used in the scope of earth observation (EO) is among the most essential technologies for learning about and understanding the earth's surface. The systems used to acquire information are complex and sensitive, and exploiting them effectively presumes flawless operation. However, for various reasons, such as extreme atmospheric conditions or physical degradation of some sensor components, important information may be missing in the acquired images. As time passes after a satellite is launched and carries out its mission, the risk of sensor degradation grows. These degradations can take the form of line noise, partial loss of information, or even the total lack of a spectral band. Fig. 1 shows such an example, which emphasizes the need for band recovery due to a sensor-generated artifact in the acquired band. The recovered band does not contain any corrupt information because the complementary spectral information is clean and untainted.
Addressing the challenge of a missing or degraded spectral band, this article presents a method that exploits the spectral information available in the other bands to predict the missing one. We consider the worst case scenario, that is, the lack of any information about the missing band, so the reconstruction process uses only the available complementary spectral bands.
The suggested method implements a convolutional neural network (CNN) architecture named U-Net, modified to fulfill a specific need: learn spectral and spatial information in order to better reconstruct the band. Designing a framework able to reconstruct any band of a Sentinel-2 (S2) product leads to a solution individually applied for each band. The following sections detail the proposed concept, the implementation aspects, the consideration of the sensor's spectral characteristics, and the experimental results. In the end, we provide the assessment of the results, draw conclusions, and highlight further perspectives.

II. RELATED WORK
Remote sensing instruments for recording information about the earth's surface are the main sources of data for observing and understanding the planet on which humans live. However, it occasionally happens that these sensors suffer certain physical degradations, which cause incomplete data acquisition. The missing data can take the form of noise present on certain bands, stripe lines, or the total absence of a spectral band. The incompleteness of MS products affects numerous applications that rely on them. Hence, this challenge has been widely investigated in an attempt to reconstruct the missing information.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Over time, different methods have been submitted to approach this subject, which can be differentiated according to the used basic principle: the information needed to perform the reconstruction. Thus, depending on the source of this information, four types of methods can be distinguished: spatial-based, spectral-based, temporal-based (multitemporal), and mixed (hybrid) methods [1].

A. Spatial-Based Methods
Spatial-based methods, also known as inpainting methods, are among the most classical and primitive because they use the remaining information to generate the missing information. The basic idea behind them is represented by the fact that both the missing data and the remaining data contain the same geometric and statistical structures [2]. Spatial correlation is the main hypothesis on which these methods are based.
Many of these solutions have been studied in the field of computer vision, but they can also be applied to multispectral remote sensing images. Over time, they have evolved from classical methods, based on algorithms and sequential transformations, to the use of neural networks. Zhang et al. [3] propose an alternative interpolation method for the local linear histogram-matching technique, namely the kriging geostatistical technique able to fill data gaps. The same interpolation has been used in [4] to retrieve aerosol optical depth over the pixels that could not be considered by the custom dedicated algorithm. The approach presented in [5] is based on the joint interpolation of image gray levels and gradient/isophote directions, smoothly extending in an automatic fashion the isophote lines into the holes of missing data. Maalouf et al. [6] introduce a Bandelet-transform-based inpainting technique to remove cloud-contaminated portions of a remotely sensed image and fill the missing data. Chan and Shen [7] introduce a nonlinear partial differential equation inpainting model based upon curvature-driven diffusions for nontextured images. Previous works use a maximum-a-posteriori-based algorithm for both destriping and inpainting problems [8] or multichannel nonlocal total variation for textured images and reconstruction of large-scale areas [9].
However, spatial-based methods can encounter problems such as unconnected edges, blur effect, or texture inconsistency.

B. Spectral-Based Methods
The methods of recovering a band based on the use of spectral information are dependent on the data available on the other spectral bands. Due to the characteristics of the sensors, in the multispectral and hyperspectral images, there is a lot of redundant information that can be used for useful purposes. However, there is a prerequisite: the bands needed for reconstruction must exist and be complete.
So far, few spectral-based methods have been proposed to recover the missing information or even an entire band. The first ones addressed the problem of Aqua MODIS images that had band 6 with stripe noise and assumed the use of polynomial regression to retrieve this band from band 7 measurements [10], combining histogram matching to correct the detector-to-detector striping of the functional detectors with local least-squares fitting that restores the missing data based on a cubic polynomial derived from the relationship between bands 6 and 7 [11] or within-class local fitting that incorporates scene types and spectral band characteristics [12]. These methods included only two bands as sources of information or reconstruction for Aqua MODIS, while Gladkova et al. [13] described a quantitative image restoration algorithm that handles a small number of functioning detectors to train a restoration function that is based on a multivariate regression using the information in a spatial-spectral window around each restored pixel. For the same task, Li et al. [14] implemented a robust multilinear regression based on the spectral relations between working detectors in band 6 and all the other spectra, showing better results.
Later, methods based on neural networks revealed improved results. Rout [15] studied the ability of supervised and adversarial learning to address the task of missing band reconstruction with the sole supervision of existing spectral and spatial prior distribution. Their band recovery method modified the superresolution solution developed by Rout et al. [16], based on deep CNNs that encompass two major learning mechanisms: global and local residual learning.

C. Temporal-Based Methods
Temporal-based methods imply the use of additional information obtained from images acquired at a short time interval over the same geographical area.
The most well-known approaches include temporal replacement [17], [18], [19], [20], [21], the use of temporal filters [22], [23], [24], or the temporal learning model [25], [26]. Zeng et al. [17] used multitemporal regression analysis and a regularization method to recover missing pixels for Landsat ETM imagery. Furthermore, based on the concept of utilizing temporal correlation of multitemporal images, in [18], a patch-based information reconstruction algorithm spatiotemporally segments a sequence of images into clusters containing several spatially connected components called patches and then clones information from cloud-free and high-similarity patches to their corresponding cloud-contaminated patches. In [19], missing measurements are reconstructed through an unsupervised contextual prediction process that reproduces the local spectrotemporal relationships between the considered image and an opportunely selected subset of the remaining temporal images, while Zhang et al. [20] handle missing data by creating appropriate covariates and then fitting a functional concurrent linear model on the resulting data. A straightforward method based on the Savitzky-Golay filter to smooth out noise caused by cloud contamination and atmospheric variability in NDVI time series is presented in [22], while a changing-weight filter approach for reconstructing a high-quality NDVI time series is presented in [24]. The reconstruction of areas obstructed by clouds based on the compressive sensing theory introduced in [25] enables finding sparse signal representations in underdetermined linear equation systems, while, for the same task, Li et al. [26] advance two multitemporal dictionary learning algorithms, expanding the K-SVD and Bayesian algorithms, to make better use of the temporal correlations.
The biggest drawback of these methods lies in the ground changes that occur between the acquisition of the degraded image and that of the closest accurate data available for reconstruction. These changes may be due to new construction or natural hazards, such as flood or fire, or even to problems that occurred during image acquisition (observation conditions and atmospheric conditions).

D. Hybrid Methods
Since the three types of methods presented above have both advantages and disadvantages when used separately, there is also the possibility of combining them in order to obtain better results.
Hybrid methods for information recovery assume the use of additional information from different domains. Therefore, there can be several possibilities of combinations, i.e., spatiotemporal methods [17], [27] or spatiospectral methods [28], based on the idea of making the prediction process learn from information available in the cloud-free neighborhood of contaminated areas for the contextual reconstruction of cloud-contaminated areas in multitemporal multispectral images. There are also methods, such as in [29], advancing a unified spatial-temporal-spectral deep CNN for reconstruction, that use information from all three domains. Solutions developed for pan sharpening (i.e., predicting pixels signatures at higher resolution) are also combining the spatial and spectral information. In [30], a four-layer CNN is proposed using a loss function without requiring a reference.
Later, methods based on convolutional networks [31], proposing a new approach to denoising, inpainting, and superresolution of hyperspectral image data based on intrinsic properties of a CNN without any training, or generative networks [32], implementing a modified unsupervised CNN context generate model, were explored.
Although using the combined advantages of two or more methods would result in obtaining results with greater accuracy, it must be taken into account that, in any combination that includes the temporal domain, the existence of very recent preceding images is determined by too many uncontrollable factors.
In the process of reconstructing a band, the super-resolution effect can also be encountered if the recovery development involves all the spectral bands complementary to the band to be reconstructed, regardless of their spatial resolution. For example, Rout et al. [16] use multisensor bands as input information to reconstruct a band in the SWIR domain at a resolution that the target sensor does not have.
Brodu [33] presents Superres, a super-resolution method for S2 products based on exploiting both the local consistency between neighborhood pixels and the geometric consistency of subpixel constituents across multispectral bands in order to bring all the bands from 20 and 60 m/pixel down to 10 m/pixel. Starting from the highest resolution bands, band-dependent information is separated from information that is common to all bands' geometry of scene elements. This model is then applied to unmix low-resolution bands, preserving their reflectance, while propagating band-independent information to preserve the subpixel details.
Lanaras et al. [34] introduce DSen2 and VDSen2, state-of-the-art CNNs to perform end-to-end upsampling; these are two configurations of the ResNet architecture, i.e., Deep Sentinel-2 and Very Deep Sentinel-2, trained with low-resolution data. Thus, one has access to a virtually infinite amount of training data by downsampling real S2 products. They use globally sampled data over a wide range of geographical locations to obtain a network that generalizes across different climate zones and land-cover types and can super-resolve arbitrary S2 products without the need for retraining. Before [34], there were no significant results in applying deep learning for super-resolution. A comparatively shallow three-layer CNN architecture, originally designed for single-image (blind) super-resolution, was introduced in [54] and used to train pan-sharpening networks for Ikonos, GeoEye-1, and WorldView-2. Similarly, PanNet, a network introduced in [55], based on the high-performance ResNet architecture, was also applied on WorldView-2, WorldView-3, and Ikonos. Due to their performance and relevance as state-of-the-art approaches for our endeavor, the abovementioned methods Superres, DSen2, and VDSen2 will serve as baseline methods for comparison and validation.
Methods for the synthetic generation of a band also include those that seek to obtain multispectral images from RGB ones. Rangnekar et al. [35] train a conditional adversarial network to learn an inverse mapping from a trichromatic space to 31 spectral bands within 400-700 nm, using an aerial hyperspectral dataset. Similarly, Rodríguez-Suárez [36] focuses on conditional generative adversarial networks (CGANs) to achieve the reconstruction of multispectral images from RGB images. Different regression network models (convolutional neural networks, U-Net, and ResNet) have been adapted and integrated as generators in the CGAN and compared in terms of performance for multispectral reconstruction. A very comprehensive work that reviews the methods of multispectral image enhancement from the point of view of super-resolution, noise reduction, inpainting, or restoration is presented by Tsagkatakis et al. [37].

III. PROPOSED CONCEPT
With the primary objective to fully exploit the spectral information in the interest of reconstructing a missing band of a multispectral image, this article proposes a method to extract that information from the concurrent spectral bands of the same product. This concept is based on the following premises.
1) Multispectral images represent a product describing a certain area from the earth's surface containing bands acquired at different wavelengths. Consequently, there is a spatial and spectral correlation between bands.
2) Deep neural networks have proven their ability to learn from the unique representations of various target attributes.
3) The worst case scenario involves the total absence of a spectral band, but the existence of complementary spectral information in the same multispectral image is sufficient, and no other supplementary data are required.
Starting from these premises, this article proposes a generalized method to reconstruct any missing or corrupted band from a multispectral image. The only condition is the integrity of the complementary spectral information. Fig. 2 illustrates the general overview of the method. The CNN, UNetBRec, receives as input all the spectral bands except the one to be reconstructed and returns as output a single band with the same width and height. During the training process, the network performs a comparison between ground truth and the generated band to adjust its parameters and obtain a better result. The method is generalized for each band, the single difference between the trained models being the bands received as input and the one used to compare the output. In the case of an L2A S2 product, there is a set of 12 trained networks; each of them is used to recover a specific band.
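The input/output arrangement described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the helper name `split_bands` and the toy 4 × 4 × 12 patch are hypothetical, and the bands are assumed to be stacked along the last axis.

```python
import numpy as np

def split_bands(patch, target_idx):
    """Return (inputs, target) for the model that reconstructs band `target_idx`.

    `patch` is assumed to be an H x W x B array with the bands on the last axis.
    """
    if not 0 <= target_idx < patch.shape[-1]:
        raise IndexError("target_idx is outside the available bands")
    inputs = np.delete(patch, target_idx, axis=-1)      # H x W x (B - 1)
    target = patch[..., target_idx:target_idx + 1]      # H x W x 1
    return inputs, target

# Toy patch standing in for a 304 x 304 x 12 Sentinel-2 patch.
patch = np.arange(4 * 4 * 12, dtype=np.float32).reshape(4, 4, 12)

# One (inputs, target) pair per band, i.e., one pair per trained network.
pairs = [split_bands(patch, b) for b in range(patch.shape[-1])]
```

Each of the 12 pairs would feed a separate network, so the only thing that changes from one model to the next is which band is held out.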
The proposed concept is designed to prevent the need for additional information, other than the one available in the multispectral product for which one of the bands is to be recovered. Consequently, our solution is able to produce the missing band in a short time by preserving both the spectral and spatial properties of the product.
The benefit of band reconstruction arises from the subsequent use of the complete multispectral product, as required by current applications of EO data.

IV. BAND RECONSTRUCTION FOR MULTISPECTRAL IMAGES
The extensive adoption of deep-learning-based methods in numerous fields demonstrated that problems impossible to solve with classical methods became approachable and even solvable using neural networks. One such example is the following situation: due to the lack of correlation of spectral bands' distribution, it was considered unrealistic to reconstruct one band using the others. However, once neural-network-based solutions were used, the opposite was demonstrated, the results obtained being more than satisfactory and, in some cases, even more convincing than the ones obtained using classical methods.

A. Proposed Deep Learning Architecture
U-Net [38] is a state-of-the-art CNN built upon the "fully convolutional network" introduced by Long et al. [39]. The main characteristic of this network is the U-shape architecture (see Fig. 3) containing a contracting path and an expanding path. The contracting operation is obtained through pooling operators, while the expanding is achieved through upsampling operators. The two branches, down and up, are interconnected through concatenation operations in order to pass spatial and spectral information. Therefore, the symmetry between the two parts of the network is almost perfect.
While the original U-Net architecture was implemented for the segmentation of neuronal structures in electron microscopic stacks, in this article, we introduce a modified U-Net architecture called UNetBRec, presented in Fig. 4, to address the missing band reconstruction challenge. UNetBRec has the same U-shape structure, but its dimensions have changed: the number of layers decreased from 23 to 14, meaning that the number of downsampling levels decreased from four to two, to reduce the number of network parameters. The input patch dimensions changed from a 572 × 572 single band to 304 × 304 × 11 bands. The main adjustment is that the sequential convolutions have been set to 1 × 1 so that the output is a single band with the same width and height as the input bands. The typical architecture of a CNN is followed in the down branch. The sequence of two 1 × 1 unpadded convolutions, each followed by a rectified linear unit (ReLU), and a 2 × 2 average pooling is repeated along the contracting path, while the expanding path follows a sequence of an upsampling operation, a 1 × 1 convolution, a concatenation with the corresponding feature map from the contracting path, and two 1 × 1 convolutions, each followed by the ReLU. The upsampling operation is, in fact, a deconvolution, which may be seen as a transposed convolution. The last layer is a 1 × 1 convolutional layer used to obtain the desired dimension of the resulting reconstructed band, which is equal to the input dimension. The strategy of this architecture is to successfully learn local and global features relevant for band reconstruction.
In Fig. 4, each box represents a multichannel feature map, with the number of channels on top and the dimension of feature map on the bottom. The arrows define different operations, explained by the legend. The multiplication of the feature channels has been chosen as a result of multiple experiments.
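To make the layer sequence easier to follow, the sketch below traces the feature-map sizes through the described layout in plain Python. It is a bookkeeping aid, not the authors' implementation: the filter counts are the ones listed in the implementation details (88, 704, 1408, 704, 88, 11, 1), their assignment to the two contracting levels, a bottom stage, and the two expanding levels is an assumption, and the function and stage names are hypothetical. The parameters of the upsampling operator are left out because its kernel size is not stated.

```python
def trace_unetbrec(size=304, in_bands=11):
    """Return a list of (stage, spatial size, channels) for the described layout."""
    trace = [("input", size, in_bands)]
    channels = in_bands

    def conv1x1(name, filters):
        nonlocal channels
        channels = filters                       # a 1x1 convolution keeps the spatial size
        trace.append((name, size, channels))

    def pool(name):
        nonlocal size
        size //= 2                               # 2x2 average pooling halves width and height
        trace.append((name, size, channels))

    def up_block(name, filters, skip_channels):
        nonlocal size, channels
        size *= 2                                # upsampling doubles width and height
        trace.append((name + "_upsample", size, channels))
        conv1x1(name + "_conv_a", filters)
        channels += skip_channels                # concatenation with the skip connection
        trace.append((name + "_concat", size, channels))
        conv1x1(name + "_conv_b", filters)
        conv1x1(name + "_conv_c", filters)

    conv1x1("down1_conv_a", 88)
    conv1x1("down1_conv_b", 88)
    skip1 = channels
    pool("down1_pool")
    conv1x1("down2_conv_a", 704)
    conv1x1("down2_conv_b", 704)
    skip2 = channels
    pool("down2_pool")
    conv1x1("bottom_conv_a", 1408)
    conv1x1("bottom_conv_b", 1408)
    up_block("up1", 704, skip2)
    up_block("up2", 88, skip1)
    conv1x1("reduce_conv", 11)
    conv1x1("output_conv", 1)
    return trace

trace = trace_unetbrec()
for stage, spatial, channels in trace:
    print(f"{stage:15s} {spatial:4d} x {spatial:<4d} x {channels}")
```

Under these assumptions the trace contains 14 convolutions, the spatial size goes 304, 152, 76 and back, and the output is a single 304 × 304 band.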

B. Physics-Aware Multispectral Image Band Reconstruction
Several satellite sensors record images with multiple spectral bands at different spatial resolutions, the main advantages being:
• simultaneity of spectral band recording;
• illumination similarity between bands;
• atmospheric conditions similarity between bands;
• very precise coregistration;
• acquisition distance similarity between bands.
The reasons behind recording at varying spatial resolution may be transmission bandwidth restrictions, band designation, or storage.
S2 is one of the products of Copernicus Sentinels mission, which uses satellites that record multispectral images with bands at different spatial resolutions. Bands of S2 product have 10-, 20-, and 60-m resolution. The previously mentioned advantages apply in this case, thus making it possible for the proposed solution to rebuild a spectral band. However, it must be taken into account that the exclusive use of information from the concurrent spectral bands can generate, in addition to band reconstruction, an improvement in terms of resolution. Thus, for 60-m-resolution bands, a resolution improvement may occur, while for 10-m-resolution bands, the quality would be preserved due to the presence of multiple bands with the same resolution.
Another important characteristic of a band is its spectral signature. In the process of band reconstruction, the preservation of the signature demonstrates the effectiveness of the applied method.
The following subsections illustrate and emphasize the improvements which the proposed method brings to the results. Both spectral signature and resolution are analyzed.
1) Reconstruction of 60-m-Spatial-Resolution Bands: S2 has three bands at 60-m resolution, but as we use the Level 2 products, band 10 is not included because it does not contain surface information. The remaining two bands of 60-m resolution are bands 1 and 9. Although the spectral signatures are not identical and do not follow exactly the same pattern, the effect could be explained by the super-resolution itself. Many similar pixels that belong to a poorly defined area in the initial band may take contrasting values in the reconstructed band, so that the contours of the objects on the earth's surface are clearly delineated.
The examples for band 9 prediction (see Fig. 6) also demonstrate that although the main reason was not the super-resolution, its effect is present and boosts the improvement of the reconstructed band. The presence of more details in the reconstructed scene is also illustrated in the graphs of the spectral signatures, which highlight both the preservation of the patterns and a slight modification due to the increase in contrasts between the neighboring pixels.
2) Reconstruction of 20-m-Spatial-Resolution Bands: S2 products contain six bands with 20-m resolution, namely bands 5, 6, 7, 8A, 11, and 12. Because this is the resolution at which most of the bands are acquired and it differs only slightly from the highest resolution, 10 m, the reconstruction is more accurate. The minor difference between the initial band and the reconstruction is visible in terms of brightness, both in the spectral signature pattern and in the band visualization.
Although not noticeable from a visual point of view, there are small differences in the amplitude of pixel values; when displayed, the images are calibrated so that the contrasts can be observed, and small differences remain hidden. Instead, the spectral signature graphs highlight even the smallest difference between the value of a pixel of the initial band and that of the corresponding pixel in the reconstructed one. Fig. 12(e) and (f) illustrates the greater difference between the initial and recovered bands registered in the spectral signature graphs, although the pattern is still preserved. In both examples, the initial band pixels have higher values. Taking into account that these two bands have the longest wavelengths, being from the SWIR part of the EM spectrum, it may be deduced that in some cases, although the band is reconstructed successfully, the amplitude of the pixels may not be as high as in the initial band.
As regards resolution, it was neither improved nor worsened, so that the original quality of each band was preserved. Also, the spectral signature graphs highlight the maintenance of the pattern, registering small amplitude differences between the original and the reconstructed bands.
The quality of the 10-m-resolution band reconstruction is very similar to that of the 20-m-resolution bands, so the observations regarding the amplitude difference between the pixels still hold. Neither visual inspection nor the spectral signature graphs reveal differences at the level of details and pixelation, as the patterns are preserved. Spectral signatures only highlight the differences in amplitude, as can be seen in Figs. 13(e) and 16(e).
Regarding the difference compared to the bands with a resolution of 60 m, it can be mentioned that, for those bands, the resolution improvement can be seen as a benefit for the subsequent analysis.
Super-resolution was taken into account only considering the physical characteristics that define a multispectral product, but it was not an objective in itself. The fact that such a result was obtained can only encourage further research on this aspect.

V. EXPERIMENTAL RESULTS AND EVALUATION
This section details the implementation aspects: the dataset selection for training and testing, the stages of experimental fulfillment, and metrics used to evaluate the proposed method. Finally, experimental results are presented and analyzed.

A. Train and Test Datasets
Deep learning algorithms usually assume the existence or the need to create a dataset for training and testing operations. However, the method proposed in this article does not involve such a need, the use of the products obtained by the sensors of the S2 mission being sufficient. As the method takes into account the worst case scenario, that assumes the nonexistence of the band to be reconstructed, the only condition is that the complementary spectral bands should be available. Also, in the training stage, it is important to have the initial band so that the network can learn how to reconstruct it.
The S2 mission involves two satellites, 2A and 2B, placed on the same orbit, which fly with a phase difference of 180° [40]. The images resulting from the sensing activity of the two identical MSI sensors may be available at different levels of processing. This work has used Level 2A processed products acquired by both sensors for the experimental results. The locations of the images have been randomly picked, the main objective being variety in climate zone and distribution across the globe. Various patterns of urban areas and agricultural regions (see Table I) are included in our dataset to prove the efficiency of band reconstruction regardless of land-cover diversity.
An S2 Level 2A product name is defined by a naming convention and has the following form [41]: MMM_MSIXXX_YYYYMMDDHHMMSS_Nxxyy_ROOO_Txxxxx_ProductDiscriminator.SAFE, where:
1) MMM represents the mission ID, namely S2A or S2B;
2) MSIXXX stands for the level of processing (MSIL1C for Level 1C, MSIL2A for Level 2A);
3) YYYYMMDDHHMMSS represents the sensing date and time (e.g., 20220323T195051);
4) Nxxyy stands for the PDGS processing baseline number (e.g., N0301);
5) ROOO represents the relative orbit number, which may take values between R001 and R143;
6) Txxxxx is used for the tile number;
7) the product discriminator is also a date field and is used to distinguish between end user products sensed at the same date;
8) SAFE is the product format.
Table I shows the specific products used for training and testing in the proposed method. The first six columns define the principal fields of the naming convention, which help to uniquely identify the products, and the last two columns position the products in the earth's geographical space by specifying the city and country. S2 data can be downloaded for free from the Copernicus Open Access Hub [42].
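The naming convention lends itself to a small parser. The following sketch uses only the Python standard library and assumes that both date fields follow the YYYYMMDDTHHMMSS pattern of the example in the text. The function name and the sample product name are hypothetical: only the sensing time and the processing baseline in the sample are taken from the examples above, while the orbit, tile, and discriminator values are made up for illustration.

```python
import re
from datetime import datetime

# One named group per field of the naming convention.
PRODUCT_RE = re.compile(
    r"^(?P<mission>S2[AB])_"
    r"MSI(?P<level>L1C|L2A)_"
    r"(?P<sensing>\d{8}T\d{6})_"
    r"N(?P<baseline>\d{4})_"
    r"R(?P<orbit>\d{3})_"
    r"T(?P<tile>[0-9A-Z]{5})_"
    r"(?P<discriminator>\d{8}T\d{6})"
    r"\.SAFE$"
)

def parse_product_name(name):
    """Split a Sentinel-2 product name into its fields, or raise ValueError."""
    match = PRODUCT_RE.match(name)
    if match is None:
        raise ValueError(f"not a recognized S2 product name: {name!r}")
    fields = match.groupdict()
    fields["sensing"] = datetime.strptime(fields["sensing"], "%Y%m%dT%H%M%S")
    fields["orbit"] = int(fields["orbit"])
    if not 1 <= fields["orbit"] <= 143:          # relative orbits run from R001 to R143
        raise ValueError("relative orbit number out of range")
    return fields

# Illustrative name only; orbit, tile, and discriminator are invented values.
example = "S2A_MSIL2A_20220323T195051_N0301_R128_T09UXQ_20220323T231503.SAFE"
info = parse_product_name(example)
print(info["mission"], info["level"], info["sensing"].date(), info["tile"])
```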
Each S2 image has 12 spectral bands, being processed at level 2A. The 12 bands initially have different spatial resolution; therefore, in order to have a uniform resolution, we performed an upsampling operation using the nearest neighbor method to bring all the bands to a 10-m GSD. The size of each resulting image is 10 980 × 10 980 × 12.
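As a rough illustration of this resampling step, nearest neighbor upsampling by an integer factor amounts to repeating each pixel along both axes. The NumPy sketch below assumes integer ratios between the native resolution and 10 m (factor 2 for 20-m bands, factor 6 for 60-m bands); the helper names and the toy array sizes are hypothetical.

```python
import numpy as np

def upsample_nearest(band, factor):
    """Repeat every pixel `factor` times along both image axes."""
    return np.repeat(np.repeat(band, factor, axis=0), factor, axis=1)

def stack_at_10m(bands):
    """bands: iterable of (2-D array, native GSD in metres); returns H x W x B."""
    resampled = [upsample_nearest(band, gsd // 10) for band, gsd in bands]
    return np.stack(resampled, axis=-1)

# Toy bands covering the same footprint at 10, 20, and 60 m.
b10 = np.arange(144, dtype=np.float32).reshape(12, 12)
b20 = np.arange(36, dtype=np.float32).reshape(6, 6)
b60 = np.arange(4, dtype=np.float32).reshape(2, 2)

cube = stack_at_10m([(b10, 10), (b20, 20), (b60, 60)])
print(cube.shape)  # (12, 12, 3)
```

With real tiles the same factors map 5490 × 5490 (20 m) and 1830 × 1830 (60 m) bands onto the 10 980 × 10 980 grid.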
To create the training dataset, a subset of 10 944 × 10 944 × 12 was taken from each of the two images shown in Table I as intended for this purpose. Next, each subset was divided into patches of 304 × 304 × 12, resulting in 1296 patches from each image (36 × 36 patches, since 10 944 = 36 × 304). Finally, the sets of patches obtained from the two images were concatenated, thus creating a training dataset of 2592 patches with a size of 304 × 304 × 12.
For testing, each of the images went through the same process as for training (upsampling, subset selection, and patching), except for the final concatenation. In that way, each image could be passed through the testing process sequentially.
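The subset selection and patching can be sketched with a crop followed by a reshape. The text does not say where the 10 944 × 10 944 subset is located inside the 10 980 × 10 980 image, so the top-left crop used here is an assumption, and the function name is hypothetical. A small array stands in for a full tile to keep the example light.

```python
import numpy as np

def extract_patches(image, patch=304):
    """Crop to a multiple of `patch` (top-left, an assumption) and tile the image."""
    height, width, bands = image.shape
    rows, cols = height // patch, width // patch
    cropped = image[:rows * patch, :cols * patch, :]
    tiles = cropped.reshape(rows, patch, cols, patch, bands)
    tiles = tiles.transpose(0, 2, 1, 3, 4)       # (row, col, y, x, band)
    return tiles.reshape(rows * cols, patch, patch, bands)

# Toy image: 10 x 10 pixels, 3 bands, cut into 4 x 4 patches.
image = np.arange(10 * 10 * 3).reshape(10, 10, 3)
patches = extract_patches(image, patch=4)

# Patch count for a full tile with the sizes given in the text.
per_image = (10980 // 304) ** 2
print(patches.shape, per_image, 2 * per_image)
```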

B. Implementation Details
The proposed method implementation was achieved using Python 3.6.13 and TensorFlow 2.3.1 for GPU. Training step was performed on a distributed system containing an Intel(R) Xeon(R) E5-2620v4@2.10 GHz CPU and eight PCIe-connected Tesla K80 GPUs, with 12 GB of RAM each.
As the number of bands of an S2 image is 12, we trained 12 models, one for the reconstruction of each band. The networks were trained with different batch sizes and numbers of epochs. Table II displays the corresponding batch size and number of epochs for each model. These values were chosen experimentally, after many trials, by selecting the ones with better accuracy and lower loss. The duration of training for one model was about 45 min.
The other parameters were set identically for all the trained models. The weights were initialized using he_normal [43], and stochastic gradient descent, through the Adam method [44], was chosen for optimization. The learning rate was set to 1e-3. The loss function was the mean square error (MSE), computed between the real band and the reconstructed one.
The filters for the convolutional layers were set as follows: the first two layers had 88 filters, the next two had 704, the following two had 1408, the next three had 704, another three had 88, one had 11, and the last one had one filter.
For numerical stability, the pixel reflectance values were scaled to the interval [0, 255] according to a formula in which p represents the radiance value of a pixel from a band and x is the input image. The scaling operation was performed imagewise, before the image was transformed into a set of 304 × 304 × 12 patches.
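The exact scaling expression is not reproduced here, so the sketch below should be read as one plausible choice rather than the authors' formula: it assumes min-max scaling, computed once over the whole image, that maps the smallest pixel value to 0 and the largest to 255. The function name is hypothetical.

```python
import numpy as np

def scale_to_255(image):
    """Min-max scale a whole image to [0, 255] (assumed form of the scaling)."""
    x = np.asarray(image, dtype=np.float64)
    lowest, highest = x.min(), x.max()
    if highest == lowest:                        # constant image: avoid division by zero
        return np.zeros_like(x)
    return (x - lowest) / (highest - lowest) * 255.0

# Toy 2 x 2 image with two bands of arbitrary values.
image = np.array([[[100, 300], [500, 900]],
                  [[200, 400], [600, 800]]])
scaled = scale_to_255(image)
```

Because the statistics are taken over the image before patching, every patch cut from the same image shares the same scale.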

C. Evaluation Metrics
In order to quantitatively evaluate the results obtained by the proposed method, different state-of-the-art indexes that measure the accuracy of spatial and spectral profiles' preservation were used. As a consequence, the reconstruction accuracy can be determined and studied. Most of these metrics have demonstrated their use in computer science research and were endorsed also in the remote sensing domain.
1) Root-Mean-Square Error (RMSE): RMSE is a very commonly used metric for measuring differences between true values and the ones obtained by an estimator or a model. The formula that defines RMSE is

$$\mathrm{RMSE}(x,\hat{x}) = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i-\hat{x}_i\right)^2} \tag{2}$$

where $\hat{x}$ is the reconstructed band, $x$ is the initial band, and $n$ is the number of pixels in a band.
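The RMSE definition translates directly into code. The sketch below works on flat Python sequences to stay dependency-free; the pixel values are hypothetical.

```python
import math


def rmse(x, x_hat):
    """Root-mean-square error between a band and its reconstruction (flat sequences)."""
    if len(x) != len(x_hat) or not x:
        raise ValueError("bands must be non-empty and of equal length")
    squared_errors = ((a - b) ** 2 for a, b in zip(x, x_hat))
    return math.sqrt(sum(squared_errors) / len(x))


# Hypothetical pixel values, already scaled to [0, 255].
original = [10, 20, 30, 40]
reconstructed = [12, 18, 33, 36]
error = rmse(original, reconstructed)  # sqrt((4 + 4 + 9 + 16) / 4)
```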
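2) Structural Similarity Index Measure (SSIM): SSIM measures the similarity between two images, $x$ and $\hat{x}$, by taking into consideration three comparison measurements: luminance ($l$), contrast ($c$), and structure ($s$), being generally defined as [45]

$$\mathrm{SSIM}(x,\hat{x}) = l(x,\hat{x}) \cdot c(x,\hat{x}) \cdot s(x,\hat{x}). \tag{3}$$

Luminance ($l$) comparison is defined by the formula

$$l(x,\hat{x}) = \frac{2\mu_x\mu_{\hat{x}} + c_1}{\mu_x^2 + \mu_{\hat{x}}^2 + c_1} \tag{4}$$

where $\mu_x$ is the average of pixel values in $x$, $\mu_{\hat{x}}$ is the average of pixel values in $\hat{x}$, and $c_1$ represents a constant defined by $(K_1 L)^2$. $K_1$ is a constant much smaller than 1 (0.01) and $L$ is the dynamic range, which usually equals $2^{Nr} - 1$, with $Nr$ being the number of bits per pixel. Contrast ($c$) comparison is defined by the following formula:

$$c(x,\hat{x}) = \frac{2\sigma_x\sigma_{\hat{x}} + c_2}{\sigma_x^2 + \sigma_{\hat{x}}^2 + c_2} \tag{5}$$

where $\sigma_x^2$ and $\sigma_{\hat{x}}^2$ are the variances of $x$ and $\hat{x}$, respectively. $c_2$ represents a constant defined by $(K_2 L)^2$. $K_2$ is a constant equal to 0.03. The following formula stands for the structure ($s$) comparison:

$$s(x,\hat{x}) = \frac{\sigma_{x\hat{x}} + c_3}{\sigma_x\sigma_{\hat{x}} + c_3} \tag{6}$$

where $\sigma_{x\hat{x}}$ is the covariance of $x$ and $\hat{x}$, and $c_3 = c_2/2$. As a result, the final formula that defines SSIM is

$$\mathrm{SSIM}(x,\hat{x}) = \frac{\left(2\mu_x\mu_{\hat{x}} + c_1\right)\left(2\sigma_{x\hat{x}} + c_2\right)}{\left(\mu_x^2 + \mu_{\hat{x}}^2 + c_1\right)\left(\sigma_x^2 + \sigma_{\hat{x}}^2 + c_2\right)}. \tag{7}$$

SSIM has a predecessor, the "universal quality index" (UQI), defined in [46] and [47]. UQI is the special case of SSIM where $c_1 = c_2 = 0$, but it returns unstable results when the sum of the squared averages or the sum of the variances is very close to 0.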
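The closed-form SSIM expression can be sketched as follows. This is a simplified illustration that computes the statistics once over the whole band; practical implementations usually evaluate SSIM in local windows and average the results, and may use the unbiased (n − 1) variance estimate. The function name, the flat-sequence input, and the sample values are assumptions.

```python
def ssim_global(x, x_hat, bits=8, k1=0.01, k2=0.03):
    """SSIM computed once from whole-image statistics (flat sequences)."""
    n = len(x)
    dynamic_range = 2 ** bits - 1          # L = 2^Nr - 1
    c1 = (k1 * dynamic_range) ** 2
    c2 = (k2 * dynamic_range) ** 2

    mu_x = sum(x) / n
    mu_y = sum(x_hat) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in x_hat) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, x_hat)) / n

    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


# Hypothetical 8-bit pixel values.
band = [52, 55, 61, 59, 79, 61, 76, 61]
band_hat = [50, 58, 60, 63, 75, 66, 70, 64]
similarity = ssim_global(band, band_hat)
```

With $c_1$ and $c_2$ strictly positive, the denominator cannot vanish, which is the stability advantage over UQI mentioned above.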
3) Signal-to-Reconstruction Error (SRE): SRE [34] is a useful metric for images that contain clouds, fog, or other phenomena that produce high reflectance values. These high values also generate large absolute reflectance errors. Because this metric measures the error relative to the mean image intensity, it is well suited to evaluating the differences between the reconstructed band and the original one, as it compensates for the effect of the high reflectance values.
The formula that defines the SRE computation between a band $x$ and its reconstruction $\hat{x}$ is

$$\mathrm{SRE}(x,\hat{x}) = 10\log_{10}\frac{\mu_x^2}{\lVert\hat{x}-x\rVert^2/n} \tag{8}$$

where $\mu_x$ is the average of pixel values in $x$ and $n$ is the number of pixels in a band. Resulting values of SRE are expressed in decibels (dB).
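The following sketch illustrates why SRE compensates for high reflectance values: scaling a scene and its reconstruction by the same factor multiplies the RMSE by that factor but leaves the SRE unchanged. The function names and the pixel values are hypothetical.

```python
import math


def rmse(x, x_hat):
    """Root-mean-square error between two flat sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x))


def sre(x, x_hat):
    """Signal-to-reconstruction error in dB: squared mean intensity over the MSE."""
    n = len(x)
    mean_x = sum(x) / n
    mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / n
    if mse == 0:
        return math.inf  # perfect reconstruction
    return 10 * math.log10(mean_x ** 2 / mse)


# Hypothetical dark scene and a 10x brighter copy with the same relative error.
dark, dark_hat = [100, 100, 100, 100], [110, 90, 110, 90]
bright = [10 * v for v in dark]
bright_hat = [10 * v for v in dark_hat]
```

Here `rmse` grows from 10 to 100 between the two scenes, while `sre` stays at 20 dB for both.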

4) Peak-Signal-to-Noise Ratio (PSNR): PSNR is a logarithmic quantity expressed in dB. It is a very commonly used metric that provides a global measurement of image quality. In comparison with SRE, it is not well suited to computing errors between images with different brightness because the peak intensity remains constant [34].
PSNR is defined by the following formula [48]:

$$\mathrm{PSNR}(x,\hat{x}) = 10\log_{10}\frac{\mathrm{MAX}^2}{\mathrm{MSE}(x,\hat{x})} \tag{9}$$

where $\mathrm{MAX}$ is the peak pixel intensity and $\mathrm{MSE}(x,\hat{x})$ is defined as

$$\mathrm{MSE}(x,\hat{x}) = \frac{1}{n}\sum_{i=1}^{n}\left(x_i-\hat{x}_i\right)^2 \tag{10}$$

standing for mean squared error, a measure often used as the error function in neural network models.
Optimal PSNR values differ depending on the data type used to represent the pair of images between which the metric is computed [49], [50], [51], [52].
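The dependence of PSNR on the data type can be seen in a short sketch: the same absolute error yields a higher PSNR when the peak value is that of 16-bit data rather than 8-bit data. The function names, the `peak` parameter, and the pixel values are assumptions made for illustration.

```python
import math


def mse(x, x_hat):
    """Mean squared error between two flat sequences."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def psnr(x, x_hat, peak):
    """Peak signal-to-noise ratio in dB for a given peak pixel intensity."""
    err = mse(x, x_hat)
    if err == 0:
        return math.inf  # identical images
    return 10 * math.log10(peak ** 2 / err)


# Hypothetical band and reconstruction with an MSE of 100.
band, band_hat = [100, 100, 100, 100], [110, 90, 110, 90]
psnr_8bit = psnr(band, band_hat, peak=255)      # peak of 8-bit data
psnr_16bit = psnr(band, band_hat, peak=65535)   # same error, 16-bit peak
```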

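5) Spectral Angle Mapper (SAM): The SAM [53] represents the angular deviation between the initial and reconstructed spectral signatures. SAM values are given in degrees. This metric is very useful in the case of spectral images because it ignores brightness, measuring how precisely the spectral distribution of a pixel is preserved in the reconstructed band, by comparison with the initial band.
The formula that defines the pixelwise computation of SAM is

$$\mathrm{SAM}(x,\hat{x}) = \arccos\left(\frac{\sum_{b=1}^{nb} x_b\,\hat{x}_b}{\sqrt{\sum_{b=1}^{nb} x_b^2}\,\sqrt{\sum_{b=1}^{nb} \hat{x}_b^2}}\right) \tag{11}$$

where $nb$ is the number of bands, and $x_b$ and $\hat{x}_b$ are the values of the pixel in band $b$ of the initial and reconstructed images, respectively.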
As the proposed method compares images with only one band, the SAM is computed for each pixel, and the final value represents an average over the whole image.
The metric implementation used in this article was the one proposed by Müller [56]. The code, implementation details, and instructions for usage are available on GitHub [57].
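For illustration only, the pixelwise spectral angle can be sketched as below. This is not the implementation from [56] and [57]; the function name and the sample signatures are hypothetical.

```python
import math


def spectral_angle(signature, signature_hat):
    """Angle in degrees between two spectral signatures of one pixel."""
    dot = sum(a * b for a, b in zip(signature, signature_hat))
    norm = math.sqrt(sum(a * a for a in signature)) * math.sqrt(
        sum(b * b for b in signature_hat)
    )
    if norm == 0:
        raise ValueError("spectral angle is undefined for an all-zero signature")
    # Clamp to [-1, 1] to guard against rounding slightly outside the arccos domain.
    cosine = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cosine))


# A signature and a twice-as-bright copy point in the same direction.
angle_scaled = spectral_angle([1, 2, 3], [2, 4, 6])
# Orthogonal signatures are 90 degrees apart.
angle_orthogonal = spectral_angle([1, 0], [0, 1])
```

The first example shows the brightness invariance noted above: scaling a signature does not change its angle.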

D. Results and Discussion
This article introduces a method that exploits the information contained in a multispectral image in order to reconstruct a missing band. The performance of UNetBRec is evaluated quantitatively and qualitatively. For the quantitative analysis, state-of-the-art image reconstruction assessment metrics are computed for UNetBRec and for other recent methods in order to critically study their achievements. The qualitative analysis is performed by visually comparing the obtained results with the ground truth.
1) Quantitative Analysis: The main evaluation metrics of our quantitative comparison are presented in Section V-C, namely RMSE, SSIM, SRE, PSNR, and SAM. UNetBRec has 12 versions, one for the reconstruction of each band. An overall comparison shows that, depending on the resolution of the band, the image on which it is tested, and the differences between the reflectance values, the evaluation metrics lie within an acceptable range in terms of performance. For example, as can be seen from Table III, in the case of band B9, the RMSE is higher than that of other bands, which means a bigger error, but the SRE is among the best values. This is mainly due to the fact that the SRE measures the error relative to the signal strength, which the RMSE does not. For the same reason, the PSNR is not as high as for other bands. The SAM, however, which measures how well the spectral signature is preserved, highlights the good accuracy of the reconstruction.
For each band, the minimum, maximum, and average values were calculated over the entire test dataset. The networks trained for bands B11 and B12 obtain poorer metric values, although their accuracy remains high. The poorer values are due to the fact that these two bands are part of the SWIR range and have wavelengths quite far from those of the bands used for their reconstruction. B12, being right on the edge of this range, records the weakest average values, but the minimum values show that, for some test data, the accuracy is among the best.
Moreover, the reconstruction of the 60-m bands also yields weaker metric values, which can be explained as follows. During reconstruction, these bands benefit from a resolution improvement, so the comparison with the initial bands accounts for the differences in the metric evaluations. As baseline methods, the ones proposed by Lanaras et al. [34], i.e., DSen2 and VDSen2, and by Brodu [33], i.e., Superres, are used. Although their purpose is different, namely obtaining super-resolution, the main reasons for the comparison are the use of the same type of data, S2 images, and a certain similarity in approach. The main difference is that, in the case of UNetBRec, the band to be reconstructed is absent from the network input, while the baseline methods use all the bands to obtain a version with improved resolution. In short, UNetBRec obtains the desired band using the complementary ones, while the competing methods use all the bands to retrieve super-resolved ones.

TABLE III: QUANTITATIVE EVALUATION OF UNETBREC USING RMSE, SSIM, PSNR, SRE, AND SAM METRICS
Average results over all the test images and all the bands are displayed in Table IV. B2, B3, B4, and B8 are not included in this averaged evaluation, as the other methods use the 10-m-resolution bands as the basis for super-resolving the others. The state-of-the-art Superres yields rather poor results, while DSen2 and VDSen2 perform similarly, with VDSen2 being slightly better in most of the error metrics. UNetBRec reduces the RMSE by 37% and increases the SRE by more than 10 dB, thus demonstrating superiority. The SAM error metric is the only exception: UNetBRec performs 4% worse than VDSen2 and 1.2% worse than DSen2, but 22.5% better than Superres. The difference in SAM values is caused by the resolution improvements and the comparison with the initial band. Taking this into account, UNetBRec performs very satisfactorily for its intended purpose, approaching and even surpassing state-of-the-art methods that use the information of all the bands. In a bandwise analysis, UNetBRec proves its superiority, as shown in Table V. Bands B2, B3, and B4 are not included in this comparison because, in the other methods, they remain unchanged and are used as references to obtain the super-resolution of the complementary bands.

TABLE IV: AVERAGE COMPUTED METRICS COMPARED BETWEEN STATE-OF-THE-ART METHODS AND UNETBREC

TABLE V: QUANTITATIVE EVALUATION BETWEEN UNETBREC AND STATE-OF-THE-ART METHODS SUPERRES, DSEN2, AND VDSEN2 OVER RMSE AND SRE METRICS, BANDWISE
The performance of all the methods varies depending on the band, with the differences staying within the range of what can be called a satisfactory performance. Band B5 registers promising error values across all the methods. This could be explained by the small wavelength difference between it and bands B2, B3, and B4, which, having the best resolution, 10 m, contain a lot of information in both the spectral and spatial domains.
Across all the bands, the error metric results emphasize the superiority of UNetBRec over state-of-the-art methods.
2) Qualitative Analysis: The qualitative analysis of the results obtained with UNetBRec involves comparing the reconstructed band with the original one and visualizing the difference computed pixelwise between them.
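A minimal sketch of such a difference visualization is given below. It assumes the absolute pixelwise difference is displayed (the text does not say whether the difference is signed), and the function name and sample values are hypothetical.

```python
def difference_image(x, x_hat):
    """Absolute pixelwise difference between two equally sized 2-D bands."""
    return [
        [abs(a - b) for a, b in zip(row, row_hat)]
        for row, row_hat in zip(x, x_hat)
    ]


# Hypothetical 2 x 3 band and its reconstruction.
truth = [[10, 200, 30], [40, 50, 255]]
recon = [[12, 190, 30], [40, 55, 250]]
diff = difference_image(truth, recon)
```

Rendered as a grayscale image, small differences appear dark and large differences appear bright.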
The following examples highlight two important aspects: the accuracy of the method on scenes from areas of the globe for which training was not carried out, such as the North Pole, and the accuracy on scenes containing atmospheric phenomena, such as thin clouds. The two examples are representative because they show niche cases with a large amount of high reflectance values, which increase the range of the obtained pixel values. Thus, these examples represent a strong demonstration of the performance of UNetBRec in recovering a band. Fig. 17 illustrates a scene from the North Pole, which is analyzed on each band by comparing the ground truth, the reconstruction, and the difference between them. The observations made in the quantitative analysis on the calculated errors remain valid, as follows.
1) The 60-m-resolution bands show differences along the contours of the objects, with the difference image recording higher values (marked in white) in those areas.
2) The SWIR bands also show a greater difference due to their isolation in an area of the spectrum where there are not many bands to support a reconstruction with sufficiently high accuracy.
3) The visible spectrum bands benefit from a reconstruction with high accuracy, so that the difference image contains many small values (marked in black).
The area most often marked as different is water. It generally registers very low reflectance values, which in the present case are in very strong contrast with the values of the considerable area of snow that surrounds it. Such errors along high-contrast edges are usually generated by blurred edges or contrast intensification. Fig. 18 illustrates an example of reconstruction that is not influenced by high contrast between the extremes of the reflectance range. The reconstruction of every band is carried out with high accuracy, which can be seen from the visualization of the difference. The method exhibits only slight traces of large errors along high-contrast edges.
The bands from the NIR and SWIR part of the EM spectrum, as they are acquired at longer wavelengths, succeed in capturing information about the earth's surface even through thin clouds, the information content being enriched from one band to another. The reconstruction error shown in the difference image exhibits low values, highlighting the ability of the proposed method to recover the missing band from the complementary ones. The difference in content and the high reconstruction accuracy are the main factors that demonstrate the power and efficacy of the proposed method.

VI. CONCLUSION
This article provided an efficient and rapid neural-network-based method to recover a missing spectral band of an S2 product. The data needed to perform the reconstruction were obtained from the complementary spectral bands. The starting point of this method was a CNN, namely U-Net. As U-Net was initially designed for segmentation, modifications consisting of reducing the number of convolutional levels and changing the dimensions of the parameters used were essential for its adaptation to our purpose. A separate model was trained to reconstruct each band of a multispectral S2 image. The adapted network, UNetBRec, demonstrated both quantitatively and qualitatively its improved effectiveness over baseline methods. Moreover, this method proved the efficiency of band reconstruction regardless of land-cover diversity, the scenes of our datasets being randomly distributed across the world.
The main advantages of this method are as follows.
1) No labeled datasets are required.
2) It does not require additional information from other sensors or from the same sensor at a short time interval.
3) It is an unsupervised method.
4) The corrupt band is not used, based on the assumption that it does not exist.
We stated that it is an unsupervised method that does not require any labeled datasets because the reconstructed band is learned from the other, existing, noise-free bands.
The following disadvantages may be considered: the method requires the bands complementary to the one being reconstructed to exist, and it was not trained and tested on multiple sensors.
Considering the balance between pros and cons of the above method, future research will focus on the generalization for multiple sensors and the possibility to derive a super-resolution method.