Introduction
Earth observation (EO) represents the most efficient approach to detect and monitor Earth surface dynamics and changes [1], [2], [3]. Nowadays, the available open-access remote sensing datasets represent a massive source of information that has not yet been exploited to its full potential and that requires new methodologies, supported by technological advances [4]. Among these, long time-series analysis (TSA) is a promising way to better understand Earth’s surface change processes, making it possible to study not only drastic changes but also long-term or more subtle phenomena, which are crucial in climate change studies. In this context, multispectral missions such as Landsat and Sentinel, which together have captured images of our planet since the early 1970s, represent a precious data source that deserves deeper examination.
The Landsat program, started officially in 1967 from a partnership between the National Aeronautics and Space Administration (NASA) and the U.S. Geological Survey (USGS), is the longest running and the first medium-resolution EO mission. Its programmatic development focuses on continuity, providing high-quality data without interruption through the continuous turnover and updating of satellites [5], [6]. One major turning point for the program and for the EO industry came in 2008, when the Landsat archive policy switched to open access, delivering data at different processing and complexity levels [7]. The same programmatic path has been followed in the design and maintenance of the open-access Sentinel-2 mission, managed by the European Space Agency (ESA) in the framework of the European EO program “Copernicus.” Its main improvements with respect to the Landsat program are in spatial and temporal resolution [8].
In this context, the development of new technologies in the era of big EO data is crucial to enable the full exploitation of the potential of these data [9]. This demand has been met by the development of cloud-computing technologies, such as Google Earth Engine (GEE), which was used in this study [10]. GEE is a platform for geospatial analysis consisting of a multipetabyte analysis-ready data (ARD) catalog colocated with a high-performance, intrinsically parallel computation service powered by Google [11].
Thanks to this large amount of free data accessible through cloud computing platforms, traditional remote sensing applications, such as change detection, have successfully moved from standard approaches involving image pairs to TSA of remotely sensed data [12], [13]. Pixel-based TSA, which monitors pixel trajectories over time, is nowadays a well-established methodological trend [14]. An overview of the potential, methodologies, and challenges of remote sensing time series can be found in [15].
Moreover, this approach to the analysis of multispectral data is rapidly becoming a standard practice in vegetation-related studies, including monitoring of both forest and agricultural environments [16], [17]. The reviews by Bégué et al. [18] and Gómez et al. [19] cover a variety of examples where this methodology proved to be effective in identifying seasonal phenological variation, allowing, for example, a better classification of crop types. In this context, information on the biophysical characteristics of vegetation collected by multispectral cameras can be successfully synthesized by means of vegetation indices (VIs), which are obtained through the combination of spectral bands [20], [21]. Four of the most widely used VIs are considered in this study: normalized difference vegetation index (NDVI), soil adjusted vegetation index (SAVI), enhanced vegetation index (EVI), and normalized difference moisture index (NDMI). In recent years, these VIs have been successfully implemented in a large variety of applications for their simplicity and effectiveness [22], [23], [24], [25], [26].
However, multispectral derived datasets have some known limits that need to be overcome to improve TSA applications. For example, a recent study conducted by USGS [5], investigating user needs for future Landsat missions, drew attention to the desired cloud-free observation frequency: the survey showed that, in 71% of cases, subject matter experts consider a weekly cloud-free observation frequency to be a breakthrough requirement. Having cloud-free data on a weekly basis would lead to a significant improvement in data effectiveness for their applications. One possible answer to this need is the harmonization of several sensors, which would increase the number of acquisitions. Furthermore, since the main limitation of passive optical sensors is their dependence on weather conditions, harmonization would increase the probability of collecting cloud-free images. Looking at the present, as demonstrated by the analysis of Li and Chen [27], the harmonization of Sentinel-2 MSI and Landsat OLI/OLI+ (carried on Landsat-8 and Landsat-9) would greatly increase the revisit frequency, down to a global average revisit interval of 2.3 days, giving satellite remote sensing a new perspective for land surface monitoring.
In order to achieve these purposes, some authors proposed statistical calibration parameters to adjust spectral reflectance values across similar instruments and build a synthetic multisatellite constellation. Mandanici and Bitelli [28] analyzed the correlation between the corresponding bands of Sentinel-2A and Landsat-8 on selected but limited sites, also evaluating the effects of spatial heterogeneity. Chastain et al. [29] proposed a cross-sensor comparison of Sentinel-2A and 2B MSI, Landsat-8 OLI, and Landsat-7 ETM+ top of atmosphere (TOA) spectral bands, providing the regression coefficients that allow the integration of MSI data with ETM+ and OLI over the conterminous United States (CONUS). Denaro and Lin [30] proposed the use of nonlinear models for the cross-sensor normalization of Landsat-7 and Landsat-8 imagery. More recently, Xie et al. [31] estimated cross-sensor linear transformations between Landsat-8 and Sentinel-2 based on only 76 image pairs spread all over the world. Cao et al. [32] proposed a similar analysis including Landsat-7 and deriving the transformations from a larger dataset but limited to the Chinese territory.
Other authors explicitly addressed the problem of vegetation index comparison. Li et al. [33] conducted a cross comparison of four VIs derived from six pairs of Landsat-7 and Landsat-8 images over Myanmar. Roy et al. [34] compared Landsat-7 ETM+ and Landsat-8 OLI and computed transformation coefficients for the integration of their spectral bands and the NDVI obtained from them. Chen et al. [35] proposed transformations to harmonize the NDVI computed from Landsat-4-5 multispectral scanner and thematic mapper (TM), elaborating on simulated data derived from Hyperion hyperspectral images. Mancino et al. [36] presented a specific case study in Italy comparing six VIs between Landsat-7 and Landsat-8 and finding statistically significant differences among four different land-cover classes.
In recent years, NASA and USGS developed the harmonized Landsat-8 and Sentinel-2 (HLS) surface reflectance (SR) dataset, currently at version 2.0, which provides harmonized multispectral bands with global coverage [37]. However, at the time of writing, there are no “ARD” for vegetation index TSA including the full Landsat constellation and Sentinel data. Such datasets would further expand and facilitate the use of EO, even by technicians who are not experts in remote sensing.
Therefore, this study aims to propose a set of cross-sensor transformation coefficients, which are valid on a continental scale and easy to implement for end-users, in order to create harmonized vegetation index time series. Starting from the approach implemented by Chastain et al. [29] and Roy et al. [34], this work introduces four major novelties. First, from a methodological point of view, the sampling strategy was improved to provide an estimation of the repeatability of the computed coefficients. Second, the analysis was performed using the recently released Landsat Collection-2 dataset, replacing the Collection-1 used in previous studies. Moreover, the cross comparison was performed on vegetation indexes and not on the single bands. Third, the paired observation samples were gathered for the first time from the entire European continent. Finally, this study encompasses data from Landsat-5 (L5), Landsat-7 (L7), Landsat-8 (L8), and Sentinel-2 (S2) altogether, allowing the extraction of VI time series starting from 1984.
In the following sections, we provide a summary of L5, L7, L8, and S2 characteristics and their products (Section II) and a description of the methodology, including preprocessing, data gathering, sampling, and statistical analysis (Section III); finally, the results and the transformation coefficients are presented (Section IV).
Materials
A. Landsat
The Landsat program has been collecting multispectral images of the Earth surface since 1972, and it represents the longest-running remote sensing mission in the world. It is a joint project between NASA, responsible for the satellite construction and launch, and USGS, which manages the archive and the distribution of data. The mission was designed to achieve efficient multispectral monitoring of land surface, with an average pixel resolution of 30 m and a revisit cycle of 16 days for every point on Earth. Every new Landsat satellite has been designed to form a two-satellite constellation with the previous one still operating, resulting in an eight-day revisit coverage [6].
This study included datasets acquired by three different Landsat instruments: L5 TM, L7 enhanced TM Plus (ETM+), and L8 operational land imager (OLI).
After its launch in March 1984, L5 was operated by USGS until January 2013. For more than 29 years, it acquired over 2.5 million images of the Earth, largely exceeding its original three-year designed life. Its TM sensor captured the Earth surface spectral reflectance in six bands at 30-m spatial resolution and 120-m spatial resolution for the thermal band [38].
L7 was launched in April 1999 and carried the ETM+ instrument. The ETM+ represents an improvement with respect to the TM sensors, with the addition of the panchromatic 15-m resolution band. From June 2003, when the scan line corrector (SLC) failed, L7 images were acquired and delivered with gaps, producing a loss of information of up to 22% [39]. The decommissioning of L7 began in mid-2021, leaving its orbit to the new Landsat-9.
L8 was launched in February 2013 equipped with the OLI and the thermal infrared sensor (TIRS). OLI measures the visible, NIR, and SWIR part of the electromagnetic spectrum, while TIRS operates in the thermal region. Following the spectral improvements achieved with ETM+ instrument, OLI was designed with a panchromatic 15-m band and eight 30-m spectral bands. In OLI, the new ultrablue band (Band 1) and the Band 9 (1.36–
In order to improve Landsat data consistency and interoperability across sensors, the archive was reprocessed twice. After the first global Landsat-1 to Landsat-8 reprocessing into Collection-1 data [40], the second major reprocessing of the archive, performed in 2021, led to the release of Collection-2 data, which replaced Collection-1 starting from January 2022. The major improvement of Collection-2 concerns geometric accuracy: for a better exploitation of archive interoperability, the L8 Ground Control Points were rebaselined to the ESA S2 Global Reference Image. In addition, the digital elevation model sources were updated, and the accessibility from commercial cloud-based environments was improved [41].
Since the release of the Collection-1 data in 2016, the Landsat products are structured in a three-level hierarchical quality inventory to grant consistency in Landsat data processing and traceability of data quality records [42]. The highest data quality products with high geolocation accuracy (root mean square error (RMSE)
In addition, three different product levels are delivered: the global Level-1 data [top-of-atmosphere (TOA)], Level-2 data (SR), and the U.S. ARD. Level-2 data products are obtained by applying atmospheric correction to Level-1 products (standard Landsat products) with a solar zenith angle lower than 76° [43]. In particular, the following algorithms are used: the land SR code (LaSRC) algorithm (version 1.5.0) [44] is applied to L8 OLI scenes, while L5 TM and L7 ETM+ SR products are generated using the Landsat ecosystem disturbance adaptive processing system (LEDAPS) algorithm (version 3.4.0) [45].
In this study, the SR Collection-2 Tier-1 datasets were used for all the Landsat instruments considered.
B. Sentinel-2
The S2 mission by ESA consists of twin polar-orbiting satellites phased at 180° to each other: Sentinel-2A and Sentinel-2B were launched in 2015 and 2017, respectively. Their orbit is inclined at 98.62°, and they acquire images over land and coastal areas with a 290-km swath width, covering the Earth surface between the latitudes 56° South and 83° North. The mean local solar time at the descending node is 10:30 A.M. [46].
The two satellites are equipped with the multispectral instrument (MSI), a sensor that measures the Earth’s reflected radiance in 13 spectral bands, from VNIR to SWIR, providing imagery at different spatial resolution, ranging between 10 and 60 m, as summarized in Table I.
Users can freely download two different product levels. Level-1C products provide TOA reflectance data. The atmospheric correction of Level-1C scenes, by means of the Sen2Cor processor, produces the bottom-of-atmosphere (BOA) Level-2A products, i.e., the SR image [47], [48]. For the study, the SR Level-2A dataset was used.
Fig. 1 shows the operational timeline of the satellites considered in the study and gives a comparison of main mission characteristics.
C. Vegetation Indices
Four VIs were considered in this study: NDVI, EVI, SAVI, and NDMI. These VIs are obtained from the following bands: green, red, and red-edge bands (highly correlated with chlorophyll and other pigments contents); NIR band (sensitive to leaf structure), and the SWIR band (sensitive to water content) [20], [21]. NDVI, computed from the NIR and red bands, is the most widespread index sensitive to chlorophyll [49], [50]. Being one of the most stable indexes, NDVI allows comparisons of seasonal and interannual changes in vegetation growth. Some improvements to NDVI were implemented to reduce the influence of environmental effects on index variations. SAVI, proposed by Huete [51], minimizes the influence of background soil brightness on NDVI. On the other hand, the EVI is used to reduce atmospheric effects and the saturation that can occur at high biomass [52]. Finally, NDMI consists of the normalized difference between NIR and SWIR, and it helps in vegetation water content assessment, useful, for example, when dealing with irrigation systems [53], [54]. In this study, these VIs were calculated according to (1)–(4), using the Landsat Collection-2 and Sentinel Level-2A SR datasets; the bands used are those highlighted in bold in Table I, which are the most similar across sensors. The coefficients in the formulas for EVI and SAVI are those suggested by USGS for the computation of on-demand vegetation indexes [44], [45], [55]
\begin{align*}
\text {NDVI} &= \dfrac {\text {NIR} - \text {Red}}{\text {NIR} + \text {Red}} \tag{1}\\
\text {EVI} &= 2.5 \cdot \dfrac {\text {NIR} - \text {Red}}{\text {NIR} + 6 \cdot \text {Red} - 7.5 \cdot \text {Blue} + 1} \tag{2}\\
\text {SAVI} &= 1.5 \cdot \dfrac {\text {NIR} - \text {Red}}{\text {NIR} + \text {Red} + 0.5} \tag{3}\\
\text {NDMI} &= \dfrac {\text {NIR} - \text {SWIR}}{\text {NIR} + \text {SWIR}}. \tag{4}
\end{align*}
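As an illustration, (1)–(4) can be written as a few plain Python functions. This is only a minimal sketch: the function names and the sample reflectance values are hypothetical, and the inputs are assumed to be surface reflectance already expressed as unitless values in the 0–1 range, since the additive constants in (2) and (3) are meaningful only on that scale.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index, Eq. (1)."""
    return (nir - red) / (nir + red)


def evi(nir, red, blue):
    """Enhanced vegetation index with the coefficients of Eq. (2)."""
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)


def savi(nir, red, soil_factor=0.5):
    """Soil adjusted vegetation index; soil_factor = 0.5 reproduces Eq. (3)."""
    return (1.0 + soil_factor) * (nir - red) / (nir + red + soil_factor)


def ndmi(nir, swir):
    """Normalized difference moisture index, Eq. (4)."""
    return (nir - swir) / (nir + swir)


# Hypothetical surface reflectance of one vegetated pixel (unitless, 0-1).
pixel = {"blue": 0.04, "red": 0.05, "nir": 0.45, "swir": 0.20}

indices = {
    "NDVI": ndvi(pixel["nir"], pixel["red"]),
    "EVI": evi(pixel["nir"], pixel["red"], pixel["blue"]),
    "SAVI": savi(pixel["nir"], pixel["red"]),
    "NDMI": ndmi(pixel["nir"], pixel["swir"]),
}
for name, value in indices.items():
    print(f"{name}: {value:.4f}")
```

The `soil_factor` argument is exposed only to make the origin of the constants 1.5 and 0.5 in (3) explicit; the study itself uses the fixed value 0.5.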
The EVI and SAVI coefficients were selected in this study to ensure, as much as possible, a generalized analysis able to accommodate most land-cover types. For example, in (3), the
Methodology
The vegetation indexes derived from the spectral bands of the TM, ETM+, OLI, and MSI sensors were compared. The sensors were compared in pairs, by randomly sampling the index values from overlapping images acquired with a maximum delay of one day, as detailed in the following sections. Consequently, to ensure statistically robust data samples, the comparison was possible only between sensors that had been operating simultaneously for at least two years (at the time of writing). For this reason, the newer Landsat-9 sensor is not included in this study.
A. Study Area
The study area covers the entire European continent (Fig. 2), which was tiled into 100 subregions for computational reasons. This area comprises a wide spectrum of land-cover types and ecosystems, encompassing cultivated land (25%), natural vegetation (up to 65%), water bodies and wetland (6%), and urban area (2%) according to the Copernicus Global Land Cover 2018 [57]; thus, it provides an exhaustive and varied set of data.
B. Data Gathering
The data sampling was completely performed in GEE accessing the Landsat (Collection-2) and Sentinel-2 SR datasets available in its catalog.
First of all, the image collections were filtered based on time, space, and cloud cover. Specifically, at least two common years of acquisition were selected between the two missions; the search area was limited to Europe, and the maximum image cloud-cover percentage was set to 1% for Landsat data and to 0.1% for S2 data. This difference in percentages was derived empirically after several tests, and it is intended to compensate for the shorter revisit time of S2, which results in higher data availability, and for the lower effectiveness of the S2 cloud masking algorithm [58]. In order to obtain balanced datasets concerning images acquired in different seasons and, thus, in different vegetative states, the dataset was split into two different time spans: one from October to March (autumn-wintertime) and one from April to September (spring-summertime). Because of the shortage of cloud-free images during wintertime, and to balance the datasets between autumn-wintertime images and spring-summertime images, the search time span for the autumn-wintertime images was doubled. In addition, the search time span was carefully adjusted to ensure a similar population of valid pixels for each sensor pair, considering also the operational period of each sensor and some specific peculiarities, such as the SLC issues of L7. Table II provides specific information about the metadata filters applied to produce the dataset used for the analysis. In particular, time span, covering the summer and winter periods separately, and cloud-cover percentage are the filters applied to a single satellite collection. The total record (Tot records) is the number of images satisfying those filters for each satellite.
The Joined collection is the number of pairs of images satisfying the cloud and time-span filters that were acquired over the same area on nearly the same date (±1 day) by the two satellites considered in the cross-sensor analysis (the maps of the footprint intersections of these acquisitions are provided in the Supplementary Materials).
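The seasonal split and the ±1 day joining rule described above can be sketched in plain Python. This is a simplified illustration and not the GEE implementation used in the study: the function names and acquisition dates are hypothetical, whole calendar dates stand in for acquisition times, and the spatial-overlap condition is deliberately left out.

```python
from datetime import date


def season(day):
    """Seasonal subset of an acquisition: April-September or October-March."""
    return "spring-summer" if 4 <= day.month <= 9 else "autumn-winter"


def pair_acquisitions(dates_a, dates_b, max_days=1):
    """All (a, b) pairs of acquisitions taken at most max_days apart."""
    return [
        (a, b)
        for a in dates_a
        for b in dates_b
        if abs((a - b).days) <= max_days
    ]


# Hypothetical acquisition dates over the same area for two sensors.
l8_dates = [date(2019, 7, 1), date(2019, 7, 17), date(2019, 12, 24)]
s2_dates = [date(2019, 7, 2), date(2019, 7, 10), date(2019, 12, 24), date(2019, 12, 26)]

pairs = pair_acquisitions(l8_dates, s2_dates)
for a, b in pairs:
    print(a, b, season(a))
```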
C. Pixel Masking
At this point, the selected images were pixelwise masked on the basis of the pixel quality assessment (QA) bitmask band. For Landsat products, this band is generated by the C function of mask (CFMask) algorithm. The CFMask derives from the function of mask (FMask), which is able to label the scene pixels as cloud, cloud shadow, cirrus, snow/ice, or water, and provides a bit-mapped output [59], [60]. This product was used to remove high- and medium-confidence clouds, dilated clouds, cloud shadows, snow/ice, and water pixels, in order to include in the analysis only clear land pixels. Only for L8 products was it also possible to mask pixels flagged as high-confidence cirrus.
The same process was performed for the S2 images by means of the scene classification map (SCL) QA band, which, like the one for the Landsat products, labels the pixels on the basis of a classification process and, thus, allows the user to easily perform pixelwise masking. High- and medium-probability clouds, cloud shadows, cirrus, water, and snow/ice pixels were removed, consistently with the masking process applied to the Landsat products [61].
Furthermore, saturated and out-of-range pixels were masked using the radiometric saturation QA bands and valid value range. This means that all the saturated pixels and the pixels with a value of the vegetation index outside the range of interest, which is [0, 1] for NDVI, EVI and SAVI, and [−1, 1] for NDMI, were discarded.
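The masking logic of this section can be illustrated with a short Python sketch. The bit positions of the Landsat QA band and the SCL class codes below are assumptions based on the public product documentation and should be verified against the current USGS and ESA guides before use; the function names are hypothetical, and the radiometric saturation check is omitted for brevity.

```python
# Assumed bit positions in the Landsat Collection-2 pixel QA band.
DILATED_CLOUD, CIRRUS, CLOUD, CLOUD_SHADOW, SNOW, WATER = 1, 2, 3, 4, 5, 7
CLOUD_CONFIDENCE_SHIFT = 8  # two-bit field: 0 none, 1 low, 2 medium, 3 high

# Assumed Sentinel-2 SCL codes to reject: cloud shadow, water,
# medium/high probability cloud, thin cirrus, snow/ice.
SCL_REJECTED = {3, 6, 8, 9, 10, 11}

# Range of interest of each vegetation index, as stated in the text.
VI_RANGE = {
    "NDVI": (0.0, 1.0),
    "EVI": (0.0, 1.0),
    "SAVI": (0.0, 1.0),
    "NDMI": (-1.0, 1.0),
}


def landsat_clear_land(qa, mask_cirrus=False):
    """True if a Landsat QA value describes a clear land pixel.

    mask_cirrus should be enabled only for L8, the only Landsat sensor
    for which the cirrus flag is available.
    """
    bits = [DILATED_CLOUD, CLOUD, CLOUD_SHADOW, SNOW, WATER]
    if mask_cirrus:
        bits.append(CIRRUS)
    if any((qa >> bit) & 1 for bit in bits):
        return False
    cloud_confidence = (qa >> CLOUD_CONFIDENCE_SHIFT) & 0b11
    return cloud_confidence < 2  # reject medium and high confidence


def sentinel2_clear_land(scl):
    """True if an SCL class code is kept for the analysis."""
    return scl not in SCL_REJECTED


def vi_in_range(name, value):
    """True if a vegetation index value lies inside its range of interest."""
    low, high = VI_RANGE[name]
    return low <= value <= high


# Hypothetical QA value: clear flag (bit 6) set, low cloud confidence.
clear_qa = (1 << 6) | (1 << CLOUD_CONFIDENCE_SHIFT)
print(landsat_clear_land(clear_qa), landsat_clear_land(clear_qa | (1 << CLOUD)))
```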
D. Image Coupling, Coregistration, and Reprojection
The images, or portions of images, of two different sensors, filtered and masked as described above, that spatially overlap and were acquired within 24 h of each other were paired, finely co-registered, reprojected to ensure that they shared the same coordinate reference system, and resampled at the coarsest resolution (30 m). This was done to avoid differences in the VI values due to land-cover changes, bad spatial overlap, or differences in pixel size.
Despite the maximum time difference of 24 h between the two images of each pair, which should ensure that no land-cover changes occurred, there are still some pixels showing very large reflectance differences. Therefore, the paired images were further masked following the methodology proposed by Roy et al. [34], which is based on the pixelwise difference in the blue band values. However, because the images used here were already corrected to SR, pixels with a difference greater than 50% of the average were discarded.
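A possible reading of this filter is sketched below in Python. The exact form of the test is an assumption: the absolute blue-band difference is compared with 50% of the mean of the two blue values of the pair, and pairs with a non-positive mean are rejected. The function name and the reflectance values are hypothetical.

```python
def passes_blue_check(blue_a, blue_b, max_relative_diff=0.5):
    """True if two blue-band SR values of a pixel pair are consistent.

    The difference is expressed relative to the mean of the two values
    (assumed interpretation of "50% of the average").
    """
    mean = 0.5 * (blue_a + blue_b)
    if mean <= 0.0:
        return False  # assumption: non-physical reflectance is discarded
    return abs(blue_a - blue_b) / mean <= max_relative_diff


# Hypothetical blue-band surface reflectance pairs (sensor A, sensor B).
print(passes_blue_check(0.04, 0.05))  # similar values, kept
print(passes_blue_check(0.03, 0.09))  # large discrepancy, discarded
```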
E. Sampling
Because of computational limits, cross-sensor comparisons were not performed on the entire pixel population but on statistical samples randomly extracted from the population of valid pixels (after masking) belonging to the paired images. For each pair of sensors, samples were independently selected on a purely random basis. A statistical analysis was performed to assess the optimal sample size, looking for a trade-off between computational complexity and statistical significance.
Different sample sizes (in the range between 1000 and 500 000 pixels) were tested by repeating the extraction 100 times and evaluating the variance of the cross-sensors parameters (described in Section III-F).
Fig. 3 shows the analysis performed on the L7 and L8 NDVI pair, here presented as an example. As can be seen from these plots, the variance decreases asymptotically as the sample size increases, and a higher number of pixels would only increase the computational burden without an appreciable benefit. The optimal sample size was, therefore, set to 300 000 pixels, which corresponds, for the NDVI index, to a standard deviation of the linear regression coefficients lower than 0.0004 (i.e., intercept equal to 0.0002 and slope equal to 0.0003).
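The repeatability test behind this choice can be sketched as follows. The example is illustrative only: it draws repeated random samples of increasing size from a synthetic population (not the study data), estimates only the RMA slope, computed here as the ratio of standard deviations, which holds for positively correlated data, and reports how its spread shrinks with the sample size. The population size, sample sizes, and number of repetitions are reduced, hypothetical values.

```python
import random
import statistics


def rma_slope(pairs):
    """RMA slope for positively correlated data: ratio of standard deviations."""
    xs, ys = zip(*pairs)
    return statistics.stdev(ys) / statistics.stdev(xs)


def coefficient_spread(population, sample_size, repeats, rng):
    """Standard deviation of the slope over repeated random extractions."""
    slopes = [
        rma_slope(rng.sample(population, sample_size)) for _ in range(repeats)
    ]
    return statistics.stdev(slopes)


rng = random.Random(42)

# Synthetic paired VI values standing in for the valid-pixel population.
population = []
for _ in range(20_000):
    x = rng.uniform(0.0, 1.0)
    population.append((x, 0.02 + 0.95 * x + rng.gauss(0.0, 0.05)))

spread = {
    n: coefficient_spread(population, n, repeats=30, rng=rng)
    for n in (100, 1_000, 5_000)
}
for n, value in spread.items():
    print(f"sample size {n:>5}: slope std = {value:.5f}")
```

Plotting such a spread against the sample size gives the kind of curve on which the trade-off between stability and computational cost can be judged.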
Fig. 3. Variation of RMA coefficients for L7 and L8 NDVI image pairs. (a) RMA intercept, slope, and
The comparison between L5 and L7 is a special case because of the exceptionally long period in which both satellites were simultaneously operational. This circumstance offers the opportunity to further analyze the linear relationships, in particular to investigate possible fluctuations of the estimated coefficients over time. For this purpose, further samples of the same size were extracted, one for each year in the period 1999–2011.
F. Cross-Sensor Analysis
For each pair of sensors, two ordinary least squares (OLS) regressions and a reduced major axis (RMA) regression were computed. The OLS regression yields a transformation function from one sensor to the other: the slope and intercept parameters change depending on which variable (i.e., which sensor) is defined as dependent or independent; thus, the OLS regression was performed twice, swapping the dependent and independent variables, in order to provide transformation functions from one sensor to the other and vice versa [34]. In contrast, the RMA regression is performed only once, since the relationship with the variables interchanged can be obtained with a simple algebraic operation [62]. Moreover, the RMA assumes that both the dependent and independent variables are subject to errors, which is appropriate given the possible residual errors in the data, such as atmospheric correction and sensor calibration errors [29], [34].
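The difference between the two regression types can be made concrete with a small Python sketch. It uses the textbook definitions (OLS slope as covariance over variance of the independent variable, RMA slope as the signed ratio of standard deviations) and hypothetical paired values; the function names are illustrative, not those of the software used in the study.

```python
import math


def _moments(x, y):
    """Means and centered sums of squares and cross-products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((v - mx) ** 2 for v in x)
    syy = sum((v - my) ** 2 for v in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return mx, my, sxx, syy, sxy


def ols_fit(x, y):
    """Ordinary least squares of y on x: returns (intercept, slope)."""
    mx, my, sxx, _, sxy = _moments(x, y)
    slope = sxy / sxx
    return my - slope * mx, slope


def rma_fit(x, y):
    """Reduced major axis regression: returns (intercept, slope)."""
    mx, my, sxx, syy, sxy = _moments(x, y)
    slope = math.copysign(math.sqrt(syy / sxx), sxy)
    return my - slope * mx, slope


def invert(intercept, slope):
    """Algebraic inverse of y = intercept + slope * x."""
    return -intercept / slope, 1.0 / slope


# Hypothetical paired VI values from two sensors.
sensor_a = [0.10, 0.30, 0.50, 0.70, 0.90]
sensor_b = [0.14, 0.30, 0.52, 0.68, 0.91]

print("OLS b on a:", ols_fit(sensor_a, sensor_b))
print("OLS a on b:", ols_fit(sensor_b, sensor_a))
print("RMA b on a:", rma_fit(sensor_a, sensor_b))
print("RMA inverted:", invert(*rma_fit(sensor_a, sensor_b)))
```

Inverting the RMA line reproduces the RMA fitted with the variables swapped, whereas the two OLS fits are not inverses of each other unless the correlation is perfect; this is why OLS has to be run in both directions.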
The goodness of the fit of the regressions was evaluated with the coefficient of determination (
In order to provide an overall measure of similarity between the datasets, three different difference metrics were derived as follows:\begin{align*} \text {MD} &= \sum _{i}^{n} \dfrac {v_{i}^{A}-v_{i}^{B}}{n} \tag{5}\\ \text {RMSD} &= \sqrt {\dfrac {\sum _{i}^{n} \left ({v_{i}^{A}-v_{i}^{B} }\right)^{2} }{n}} \tag{6}\\ \text {MRD} &= \dfrac {\sum _{i}^{n}\dfrac {v_{i}^{A}-v_{i}^{B}}{0.5 \left ({v_{i}^{A}+v_{i}^{B} }\right) }}{n} 100 \tag{7}\end{align*}
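A minimal Python sketch of (5)–(7) follows, assuming that $v_{i}^{A}$ and $v_{i}^{B}$ are the paired VI values of the two sensors being compared and that $n$ is the number of pairs. The function name and the sample values are hypothetical.

```python
import math


def difference_metrics(values_a, values_b):
    """Return (MD, RMSD, MRD) for paired values, following Eqs. (5)-(7)."""
    n = len(values_a)
    diffs = [a - b for a, b in zip(values_a, values_b)]
    md = sum(diffs) / n
    rmsd = math.sqrt(sum(d * d for d in diffs) / n)
    # MRD is a percentage: each difference is divided by the pair mean.
    mrd = 100.0 * sum(
        (a - b) / (0.5 * (a + b)) for a, b in zip(values_a, values_b)
    ) / n
    return md, rmsd, mrd


# Hypothetical paired VI values.
md, rmsd, mrd = difference_metrics([0.2, 0.4, 0.6], [0.1, 0.4, 0.7])
print(f"MD = {md:.4f}, RMSD = {rmsd:.4f}, MRD = {mrd:.2f}%")
```

The example also shows why MD alone is not sufficient: positive and negative differences cancel in the mean, while RMSD and MRD still reveal the disagreement.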
G. Validation
The effectiveness of the suggested linear transformations was checked by evaluating the differences in the VI values between every pair of sensors before and after the harmonization. These differences were computed on fully independent samples, obtained as follows. Exploiting the Landsat Worldwide Reference System, 30% of the nominal scene centers over Europe were randomly selected. The areas belonging to these Landsat standard full scenes (tiles defined by their paths and rows) were excluded from the extraction of the samples used for the cross-sensor analyses described in Section III-F (Fig. 4). This 30% of the tiles was used for the extraction of the independent samples for validation. For each pair of sensors, one validation sample was created, consisting of at least two million random points. These are the points used to statistically analyze the differences in the VI values. The balance between the validation and training tiles, in terms of land-cover classes, was verified (average difference around 1% and in any case lower than 6%).
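The scene-based hold-out can be sketched in a few lines of Python. The grid of path/row identifiers below is hypothetical and does not correspond to the actual scenes over Europe; the function name and the fixed random seed are likewise illustrative choices.

```python
import random


def split_scenes(scenes, validation_fraction=0.3, seed=0):
    """Randomly reserve a fraction of the scenes for validation.

    Returns two disjoint sets: (training scenes, validation scenes).
    """
    rng = random.Random(seed)
    ordered = sorted(scenes)
    n_validation = round(len(ordered) * validation_fraction)
    validation = set(rng.sample(ordered, n_validation))
    training = set(ordered) - validation
    return training, validation


# Hypothetical (path, row) identifiers of full scenes in the study area.
scenes = [(path, row) for path in range(180, 200) for row in range(20, 35)]

training, validation = split_scenes(scenes)
print(len(training), "training scenes,", len(validation), "validation scenes")
```

Splitting by whole scenes rather than by individual pixels keeps the validation points spatially separated from the points used to fit the regressions.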
Fig. 4. Landsat standard full scene footprints. The ones randomly selected as validation areas are highlighted in yellow.
Results
The results obtained from the comparison of the vegetation indexes derived from different sensors are reported in Table III and Figs. 5–11, and they are organized by pair of sensors: OLI and MSI, ETM+ and MSI, ETM+ and OLI, and TM and ETM+. As explained above, for every sensor pair, the results are obtained through 100 independent extractions of 300 000 paired observations. These analyses highlighted both differences and similarities between these products and allowed the derivation of transformation coefficients to be used for a harmonized integration of the different datasets.
Fig. 5. Scatterplots of the VIs for S2 MSI (vertical axis) against L8 OLI (horizontal axis). (a) NDVI. (b) EVI. (c) SAVI. (d) NDMI. The plot colors illustrate the probability density of VIs values with logarithmic scale. The solid lines show the three regression fits.
Fig. 6. Residuals distribution of MSI and OLI applying the RMA coefficients (OLI independent variable) for (a) NDVI, (b) EVI, (c) SAVI, and (d) NDMI. Red dashed lines represent the mean values.
Fig. 7. Scatterplots for all of the vegetation indexes for S2 MSI (vertical axis) against L7 ETM+ (horizontal axis). (a) NDVI. (b) EVI. (c) SAVI. (d) NDMI. The plot colors illustrate the probability density of VIs values with logarithmic scale. The solid lines show the three regression fits.
Fig. 8. Residuals distribution of MSI and ETM+ applying the RMA coefficients (ETM+ independent variable) for (a) NDVI, (b) EVI, (c) SAVI, and (d) NDMI. Red dashed lines represent the mean values.
Fig. 9. Scatterplots for all of the vegetation indexes for L7 ETM+ (vertical axis) against L8 OLI (horizontal axis). (a) NDVI. (b) EVI. (c) SAVI. (d) NDMI. The plot colors illustrate the probability density of VIs values with logarithmic scale. The solid lines show the three regression fits.
Fig. 10. Residuals distribution of ETM+ and OLI applying the RMA coefficients (OLI independent variable) for (a) NDVI, (b) EVI, (c) SAVI, and (d) NDMI. Red dashed lines represent the mean values.
Fig. 11. Scatterplots for all of the vegetation indexes for L7 ETM+ (vertical axis) against L5 TM (horizontal axis). (a) NDVI. (b) EVI. (c) SAVI. (d) NDMI. The plot colors illustrate the probability density of VIs values with logarithmic scale. The solid lines show the three regression fits.
A. OLI and MSI
In Table III(a) and Fig. 5, the results of the analysis involving the OLI and MSI sensors are presented. Considering the paired observations collected between 2016 and 2020, the lowest MD between corresponding indices was found in the NDVI, equal to −0.0004, while the highest in NDMI, equal to 0.0248. The RMSD values are quite similar for all the VIs, ranging from 0.0455 (SAVI) to 0.0586 (NDMI). All the regression models are highly significant, all showing
To evaluate the effectiveness of the transformations, the RMA coefficients were applied to the sampled OLI pixels belonging to the validation set, to compute the harmonized hOLI values, which are expected to agree better with MSI. The hOLI values were obtained using OLI as independent variable and applying the coefficients in Table III(a). The differences between the hOLI VI values and the original values from the paired MSI were then computed (hDiff), and their histograms are shown in Fig. 6. The harmonization decreased the residuals for every index: NDMI shows the largest improvement, with a decrease of the MD of 0.0247 after harmonization, followed by the EVI with a decrease of 0.0109. Overall, the means of the harmonized residuals are very low, between −0.0008 (NDMI) and 0.0002 (SAVI).
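The validation step can be illustrated with a short Python sketch that applies a linear transformation to one sensor and compares the mean difference before and after. The intercept and slope used here are hypothetical placeholders and are not the coefficients of Table III; the paired values are synthetic and were built to follow that same hypothetical line exactly, so the harmonized residuals vanish by construction.

```python
def harmonize(values, intercept, slope):
    """Apply the linear transformation h = intercept + slope * v."""
    return [intercept + slope * v for v in values]


def mean_difference(values_a, values_b):
    """Mean of the pairwise differences a - b."""
    return sum(a - b for a, b in zip(values_a, values_b)) / len(values_a)


# Synthetic paired validation values for a source and a reference sensor.
oli = [0.20, 0.40, 0.60, 0.80]
msi = [0.24, 0.43, 0.62, 0.81]

# Hypothetical RMA coefficients (source sensor as independent variable).
intercept, slope = 0.05, 0.95

md_before = mean_difference(oli, msi)
md_after = mean_difference(harmonize(oli, intercept, slope), msi)  # mean hDiff
print(f"MD before = {md_before:.4f}, MD after = {md_after:.4f}")
```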
B. ETM+ and MSI
The comparison between the ETM+ and MSI acquisitions is shown in Table III(b) and Fig. 7. The MD of the sampled values ranges from −0.0286 (EVI) to 0.0059 (NDMI). The RMSD ranges from 0.0470 (SAVI) to 0.0600 (NDVI). The MRD values are all quite low, ranging from 0.2335 (NDMI) to −7.6555 (EVI). The
Also, in this case, the RMA transformation coefficients were applied to the samples belonging to the validation set, using ETM+ observations as independent variable, to produce the harmonized hETM+ VIs. The hETM+ and corresponding MSI residuals (hDiff) distribution and mean values are presented in Fig. 8. Also here, the harmonization decreased the residuals for every index: EVI has the highest improvement with a decrease of the MD of 0.0331. Overall, the means of the harmonized residuals are low, between 0.0033 (NDMI) and 0.0069 (NDVI).
C. ETM+ and OLI
The MDs of the sampled ETM+ and OLI paired observations range from a minimum of 0.0147, for the EVI, to a maximum of 0.0347, for the NDVI. The RMSD values range from 0.0416 (SAVI) to 0.0657 (NDVI), while the MRD is lower than 9 for all the vegetation indexes [Table III(c)]. The
Over all the validation samples, the harmonized hOLI was computed by means of the RMA using OLI observations as independent variable. The residuals distributions and mean values of the paired hOLI and ETM+ difference (hDiff) are presented in Fig. 10. Again, the harmonization decreased the residuals for every index: in this case, NDVI has the highest improvement with a decrease of the MD of 0.0342. Overall, the means of the harmonized residuals are very low, between −0.0007 (NDMI) and −0.0002 (EVI).
D. TM and ETM+
The results of the comparison between TM and ETM+ are summarized in Table III(d) and graphically displayed in Fig. 11. The MD ranges from 0.0003 (EVI) to −0.0189 (NDVI), and the RMSD from 0.0604 (NDVI) to 0.0388 (SAVI). The MRD is lower than 4, in absolute value, for all the indexes. All the regression models show a high significance (
Again, for all the validation samples, the harmonized hTM VIs were computed by the RMA using TM derived indexes as independent variables. The residuals distribution and mean values of the difference between hTM and ETM+ are presented in Fig. 12. Similar to the previous cases, the harmonization decreased the residuals for every index: NDVI has the highest improvement with a decrease of the MD of 0.0195. Overall, the means of the harmonized residuals are very low, between 0.0001 (NDMI) and 0.0013 (NDVI).
Residuals distribution of ETM+ and hTM after applying the RMA coefficients (TM as independent variable) for (a) NDVI, (b) EVI, (c) SAVI, and (d) NDMI. Red dashed lines represent the mean value.
E. Time
Because these two sensors have a longer period of contemporaneous acquisitions, an additional analysis was performed to investigate the stability over time of the computed transformation parameters. The intercepts and slopes of the RMA transformations were recomputed for all the VIs using 12 samples, each composed of 300 000 points and extracted in a different year between 1999 and 2011 (as stated in Section III-E). Fig. 13 shows, for every examined VI, the resulting 12 sets of intercept and slope parameters and how their values change over the years, compared with the suggested ones presented in Table III(d). An appreciable fluctuation can be noted for both intercepts and slopes from year to year, although no apparent trend is detectable.
RMA (a), (c), (e), (g) intercept and (b), (d), (f), (h) slope coefficients computed for each VI on yearly samples of L5 TM and L7 ETM+ paired observations. The red lines represent the coefficients resulting from the RMA computed within the entire period [from Table III(d)].
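The year-by-year stability check can be illustrated with a short sketch that refits the RMA on each yearly subset and compares the result with the fit over the whole period. The helper names (`rma_fit`, `yearly_rma`), the year labels, and the synthetic TM and ETM+ values are hypothetical and serve only to show the mechanics, not to reproduce the study.

```python
import numpy as np


def rma_fit(x, y):
    """Reduced major axis fit of y = intercept + slope * x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r = np.corrcoef(x, y)[0, 1]
    slope = np.sign(r) * np.std(y, ddof=1) / np.std(x, ddof=1)
    return slope, y.mean() - slope * x.mean()


def yearly_rma(years, x, y):
    """Return {year: (slope, intercept)} with one RMA fit per year."""
    years = np.asarray(years)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    fits = {}
    for year in np.unique(years):
        mask = years == year
        fits[int(year)] = rma_fit(x[mask], y[mask])
    return fits


# Synthetic paired samples tagged with an acquisition year (NOT study data).
rng = np.random.default_rng(1)
years = np.repeat(np.arange(2000, 2004), 2000)
tm = rng.uniform(0.1, 0.9, years.size)
etm = 0.01 + 0.98 * tm + rng.normal(0.0, 0.03, years.size)

overall_slope, overall_intercept = rma_fit(tm, etm)
per_year = yearly_rma(years, tm, etm)

# Spread of the yearly slopes around the whole-period slope.
slope_spread = float(np.std([s for s, _ in per_year.values()], ddof=1))
for year, (s, i) in sorted(per_year.items()):
    print(year, round(s, 4), round(i, 4))
print("overall", round(overall_slope, 4), round(overall_intercept, 4))
```

Plotting the yearly slopes and intercepts against the whole-period values, as in Fig. 13, is then a matter of iterating over `per_year`.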
Discussion
In general, the presented results confirmed the data continuity expected from Landsat Collection-2. Furthermore, the calculated statistics validated its interoperability with the S2 mission. Indeed, the linear regression functions computed for each sensor pair showed a good agreement between the values of the indices, with
Two considerations can be made from the graphs in Figs. 5, 7, 9, and 11. First, the NDVI observations are the most dispersed with respect to their linear regression lines, especially when L7 ETM+ is considered; in the comparison between ETM+ and OLI in particular, some points show appreciably higher NDVI values from one sensor than from the other. Second, EVI and SAVI sample densities are higher for lower values of the indexes, while NDVI is more homogeneously distributed along the entire range and also shows a high concentration at very high values of the index.
The explanation of the observed behaviors can likely be found in a well-known limitation of the NDVI, i.e., saturation, which occurs especially in areas of high biomass [52], [65]. This would explain the higher point density for high values of NDVI compared with those of the other indexes [for example, compare Fig. 11(a) with 11(b) and (c)]. SAVI and EVI are instead able to reduce noise and saturation; this is confirmed by the values reported in Table III, in which the NDVI difference metrics are higher than those of the other indices for almost every sensor pair. The signal-to-noise ratio of SAVI, for example, is reported to be up to five times higher than that of NDVI, depending on the green cover percentage [66]. The peculiar dispersion observed in the NDVI scatter plot between ETM+ and OLI suggests an influence of the spectral response function on the saturation phenomenon. Indeed, some outliers seem to be due to the saturation of the OLI (NDVI greater than 0.75 for OLI and NDVI between 0.3 and 0.6 for ETM+), while others show the opposite pattern, although with lower saturation levels [Fig. 9(a)]. More details about the spectral response function comparison between L7 and L8 are provided by Roy et al. [34]. In addition, as previously highlighted by Irons et al. [67], the NIR band of OLI was designed to be narrower to exclude the ETM+ water vapor absorption feature at
Another aspect that can contribute to the different distributions of VI values is the propagation of calibration errors. According to the study by Miura et al. [68], which assessed the impact of reflectance calibration uncertainties on VIs derived from the MODIS Terra mission, the propagation of such uncertainties is index-dependent, and different patterns and magnitudes were found for NDVI, EVI, and SAVI.
However, it is important to remark that the majority of the pairwise values are concentrated around the regression lines, and the models fit the data well, as demonstrated by the high
In general, the NDMI performs similarly to the other indices, even though it involves the SWIR bands, which are the ones with the lowest overlap in the spectral response function [29].
A direct comparison of the coefficients of the linear transformations proposed here with analogous values published in previous works is problematic, because in many cases the estimation is based on limited samples [28], [33], on different regression algorithms [32], or on a limited timespan [36]. On the one hand, the numerical values of the coefficients may vary depending on the choice of the area of interest (AOI), its land cover, and the considered timespan; on the other hand, the observed RMSEs are comparable. The coefficients proposed here are averaged over 100 random extractions of a very large number of pixels, and the obtained standard deviations are relatively small, which supports the robustness of the solution. For example, a standard deviation of 0.0003 for the slope coefficient of the NDVI in Table III(d) corresponds to a variation of 0.00015 in the corrected index when the original value is 0.5. As a further test, it was verified that, when repeating the whole process on the area of intersection among all four cross-sensor analyses (see the Supplementary Materials), the difference in the results is negligible. Thus, the proposed coefficients can be used and considered valid all over Europe, at regional as well as continental scale.
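The numerical example above follows directly from the linear form of the transformation. As a brief worked step, with the symbols a (intercept), b (slope), x (original index value), and σ_b (standard deviation of the slope) introduced here only for illustration, and with the intercept held fixed:

```latex
\hat{y} = a + b\,x
\quad\Longrightarrow\quad
\Delta\hat{y} \approx x\,\sigma_b = 0.5 \times 0.0003 = 0.00015
```

Since vegetation index values are bounded in magnitude, the effect of the slope uncertainty on the corrected index stays of the same order as σ_b itself.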
Larger fluctuations emerge instead from the analysis performed year by year when comparing ETM+ with TM. Indeed, their magnitude is one order of magnitude higher than the standard deviations observed in the 100 extractions over the full period considered as a single population. From the charts of Fig. 13, it does not seem possible to identify a common trend or a clear periodic oscillation. One possible explanation for these fluctuations may lie in the orbit drift of L5. As pointed out by Roy et al. [24], the L5 orbit was not maintained consistently over the years, and a temporal pattern of increasing and then decreasing overpass times was observed, as the orbit was adjusted by periodic station-keeping maneuvers. This resulted in changes in the illumination geometry at the moment of acquisition at a given location. In any case, further investigations are necessary to clarify this specific issue. All things considered, from the point of view of an end user, it is recommended to use the average harmonization coefficients given in Table III.
As a final remark, using all the considered sensors together appears beneficial in two main scenarios: first, the creation of a very long TS from 1984 to the present (for example, for climate change related studies) and, second, the generation of a dramatically denser TS from now on (for near real-time monitoring applications). In both cases, when assembling a TS that includes all these sensors, the estimated linear transformations can be used to produce a harmonized dataset. Due to its operational timespan, the authors recommend using L7 as the common reference and harmonizing all the other sensors with it.
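Assembling such a harmonized TS with L7 as the reference can be sketched as follows. This is an illustration under stated assumptions: the function names, the sensor keys, and the coefficient values are hypothetical placeholders and are not taken from Table III. The sketch also assumes that, where a transformation was fitted with ETM+ as the independent variable, it can be inverted to map the other sensor onto L7; this holds for RMA, whose fitted line is the same whichever variable is treated as independent.

```python
import numpy as np


def harmonize_to_l7(values, sensor, coeffs):
    """Apply the linear transformation mapping `sensor` VI values onto L7.

    `coeffs` maps a sensor key to (slope, intercept); L7 itself maps to (1, 0).
    """
    slope, intercept = coeffs[sensor]
    return intercept + slope * np.asarray(values, dtype=float)


def invert_rma(slope, intercept):
    """Given y = intercept + slope * x, return (slope, intercept) of x as a function of y."""
    return 1.0 / slope, -intercept / slope


# Made-up coefficients for illustration only (NOT the values of Table III).
demo_coeffs = {
    "L7_ETM+": (1.0, 0.0),
    "L8_OLI": (0.98, 0.01),
}

# Hypothetical observations of one pixel: (date, sensor, VI value).
records = [
    ("2015-07-01", "L8_OLI", 0.62),
    ("2015-06-23", "L7_ETM+", 0.60),
]

# Harmonize every observation to the L7 reference, then sort by date.
series = sorted(
    (date, float(harmonize_to_l7(value, sensor, demo_coeffs)))
    for date, sensor, value in records
)

# Round trip: inverting a transformation recovers the original value.
inv_slope, inv_intercept = invert_rma(0.98, 0.01)
x_back = inv_intercept + inv_slope * (0.01 + 0.98 * 0.5)
print(series, x_back)
```

Keeping the coefficients in a lookup table keyed by sensor makes it straightforward to add a further sensor later without changing the assembly logic.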
Conclusion
The goal of the study was to extract cross-sensor transformation coefficients for popular vegetation indexes computed from Landsat-5 TM, Landsat-7 ETM+, Landsat-8 OLI, and Sentinel-2 MSI. The aim was to enable long time series and to increase the frequency of data through the harmonization of different sensors. For each sensor pair, RMA and OLS linear regressions were computed on 300 000 pixels, randomly sampled from pairs of nearly simultaneous acquisitions by different sensors, and the computations were repeated 100 times to check the repeatability. For the first time, a cross-comparison analysis on vegetation indexes (NDVI, SAVI, EVI, and NDMI) derived from Landsat Collection-2 and Sentinel SR (L2A) products acquired all over the European continent was performed. Furthermore, the study included data from L5, L7, L8, and S2 altogether, allowing the extraction of time series starting from 1984. This approach greatly increases the acquisition frequency, combining the 16-day L8 repeat cycle with the five-day cycle of S2 and thus raising the chance of collecting the cloud-free images that enable effective vegetation monitoring. The computed coefficients make it possible to create a consistent time series from 1984 to the present, combining L5, L7, L8, and S2 in a temporal sequence. The tests on validation datasets proved that the application of these coefficients reduces the average difference in VI values between sensors by at least an order of magnitude.
Future studies will consider the integration of the recently launched Landsat-9, which can further increase data frequency. In addition, despite the intense effort by the calibration and validation teams of both the Landsat and Sentinel programs, only a few studies on sensor harmonization that include in situ measurements for data QA can be found in the literature. Collecting spectral signature measurements on the ground would help to assess data quality and to compare the different product levels offered by these missions.
ACKNOWLEDGMENT
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
NOTE
Open Access funding provided by ‘Alma Mater Studiorum - Università di Bologna’ within the CRUI CARE Agreement