A New Approach for Surface Urban Heat Island Monitoring Based on Machine Learning Algorithm and Spatiotemporal Fusion Model

Land surface temperature (LST) is an important indicator for assessing the surface urban heat island (SUHI) effect. This paper presents a novel approach to derive LST estimates by integrating machine learning algorithm and spatiotemporal fusion model at high spatial and temporal resolution. The spatial resolutions of Landsat TM and Landsat 8 LST data were first downscaled using random forest (RF) algorithm from 120 m and 100 m, respectively, to 30 m. The resultant LST data were fused with MODerate-resolution Imaging Spectroradiometer (MODIS) LST data, by means of the Flexible Spatiotemporal Data Fusion method (FSDAF), in order to generate high spatiotemporal resolution summer daytime LST data covering the center of Chengdu city in China. The proposed new method was used to estimate the spatiotemporal variations of the summer daytime SUHI from 2009 to 2018 over Chengdu city. Results show that: (1) RF performs way better than the classical downscaling algorithm—thermal sharpening algorithm (TsHARP) for LST, and produces higher accuracy for different land covers; (2) the fused high spatiotemporal resolution summer daytime LST values were evaluated with in situ LST obtained from Chengdu Meteorological Office and the final validation results indicated that the proposed method, in generating LST dataset, can provide more details of urban thermal environment and produce higher accuracy than the traditional FSDAF; (3) significantly increasing trends of summer daytime SUHI intensity (SUHII) in the study area were observed. SUHII increased from 2.78 °C in 2009 to 4.04 °C in 2018. The highest and lowest summer daytime LST estimates were recorded over impervious surface area (ISA) and waters, respectively.


I. INTRODUCTION
The urban heat island (UHI) refers to the urban centers experiencing higher temperatures than their surrounding rural areas due to human activities, and is one of the most well-known negative impacts of rapid urbanization in local environment, climate, human health and energy consumption [1], [2]. Therefore, better monitoring and understanding the variation of UHI intensity (UHII) at various temporal (inter-annual, seasonal, diurnal) and spatial (from local to global) scales The associate editor coordinating the review of this manuscript and approving it for publication was Vicente Alarcon-Aquino .
is of critical importance for global environmental and urban climate research and impact studies, which has strong implications for designing effective measure to mitigate the UHI [3], [4]. Satellite remote sensing provides an unhindered tool for studying UHI, especially the Surface Urban Heat Island (SUHI), which represents the spatial temporal structures of land surface temperature (LST) differences between urban and suburban areas [5]. So far, various studies have been conducted using satellite derived LST to study SUHI for hundreds of cities around the world [6]. For instance, Peng et al. [7] analyzed the SUHI intensity (SUHII) of 419 largest cities around the world during 2003-2008 using MODerate-resolution Imaging Spectroradiometer (MODIS) MYD11A2 LST data. Similarly, Clinton and Gong [8] used MODIS (MOD/MYD11A2) LST data to investigate the SUHI in every urban area between latitude 71 and −55 in 2010.
Currently, LST data available for use in the study of SUHI are derived from Thermal Infra-Red (TIR) remote sensing such as Landsat TM/ETM+/8 and MODIS. However, the low spatial temporal resolution of the currently available satellite LST products largely inhibit their potential application in SUHI studies [9]. For example, Landsat TM/ETM+/8 TIR data provide LST data at spatial resolution between 60 m and 120 m. Moreover, their temporal resolution, which is around half to one whole month, together with frequent cloud contamination may prohibit its application for monitoring SUHI [10]. By contrast, MODIS provide high LST temporal resolution of one day or half a day, but with low spatial resolution (1000 m), and this may limit detailed spatial analysis of SUHI [11]. In other words, due to the tradeoffs among these spatial and temporal resolutions [12], there is currently no single satellite system that can directly acquire LST data of the whole earth at high spatiotemporal resolutions. It is, thus, necessary to develop a new method that can integrate multi-sensors to generate LST data at high spatial and temporal resolution for monitoring SUHI dynamics.
In recent years, several spatiotemporal image fusion models were developed in order to produce high spatial and temporal surface reflectance images [13]- [16]. Although these fusion models were originally designed to fuse shortwave surface reflectance rather than LST fusion [17]- [20], they have also offered a new ground for effectively predicting fine LST time series. Liu and Weng [21] adopted the STARFM model to simulate ASTER-like LST images using MODIS for west Nile over Los Angeles, and LST prediction residual was less than 1 • C. However, caution should still apply while directly using STARFM for LST prediction in urban environments since LST is usually affected by the surrounding features [9]. In order to generate diurnal Landsat-like LSTs, Wu et al. [22] used high spatial temporal LST data from Landsat, MODIS, and GEOS/SEVIRI and proposed the spatiotemporal integrated temperature fusion model (STITFM), which successfully predicted LST with an accuracy of 2.5 K compared with in situ data. Also, Zhu et al. [23] put forward a Flexible Spatiotemporal Data Fusion method (FSDAF), which can minimize the chance for error to predict dense time series LST data and has shown some advantages beyond other fusion methods [24].
At present, many of the existing methods were developed to fuse LST data from Landsat LST and MODIS LST in order to predict daily or 8-day LST at the Landsat TIR spatial resolution. Although the United States Geological Survey (USGS) distributes resampled data from Landsat TIR channel, it is important to recall that the actual spatial resolution of these datasets is still about 100 m (Landsat TM 120 m, Landsat 8 100 m). Therefore, directly applying the spatiotemporal fusion model in generating high temporal LST data with data from Landsat TIR channel remains far from meeting the needs of SUHI monitoring. For example, Sobrino et al. [25] found the spatial resolution of LST greater than 50 m can assess SUHI effect at district level, otherwise cannot distinguish the urban thermal environment between the different urban areas. Also, Stathopoulou and Cartalis [11] demonstrated that LST data are easily affected by regional landscape characteristics, and that high spatial resolution LST data could improve the SUHII estimation accuracy. Therefore, an improved LST data fusion approach should be developed to derive a sequence of LST datasets at 30 m spatial resolution for SUHI quantitative monitoring.
Chengdu city in Southwestern China enjoys a subtropical monsoon humid climate with many cloudy and rainy days throughout the whole year. Despite the well-established fact that poor weather conditions in this area result in very low probability of obtaining one cloud-free Landsat LST image throughout the year, little to no attention has been paid to addressing the problem. In this paper, in order to monitor the spatiotemporal variations of the summer daytime SUHII in Chengdu during 2009-2018, random forest (RF) was firstly adopted to downscale the Landsat TM and Landsat 8 LST data from 120 m and 100 m spatial resolution, respectively, to 30 m. The FSDAF was then adopted to generate high spatiotemporal resolution LST images from downscaled Landsat TM, Landsat 8 and MODIS LST data, in order to analyze the spatiotemporal variations of summer daytime SUHI in Chengdu.

A. STUDY AREA
Chengdu city (Figure 1a), the capital of Sichuan province, is in Southwestern China, between 102 • 54 E and 104 • 53 E longitudes, and 30 • 05 N and 31 • 26 N latitudes. The urban area lies on the western slope of the Longmen Mountain, it is low to the southeast and high to the northwest. With a typical subtropical monsoon humid climate, FIGURE 1. Study area: (a) The location of Chengdu city in Sichuan province, China, and (b) Geolocation of the study area and the distribution of weather stations in Chengdu city, with Landsat OLI true color image. VOLUME 8, 2020 the average annual precipitation is 918.2 mm, and annual average number of sunny days does not exceed 25 days. Rainfall is mostly abundant in the period from June to August, and the annual average number of 340 days experience cloudy and rainy weather. Although much rain is found in summer (June -August), it also important to note that this season remains the hottest during the year, with a seasonal mean temperature of 28 • C [26]. High spatial resolution remote sensing data are thus vulnerable to poor atmospheric conditions over the area, largely limiting their potential for summer daytime SUHI monitoring.
Chengdu city is one of the most densely populated and economically vibrant cities in the world, the population of the city exceeded 16.33 million in 2018. In recent years, Chengdu has experienced rapid urbanization and industrialization growth rate, while large areas of cropland and grasslands being converted into urban and built up areas. The center of Chengdu city, covering an area of approximately 3200 km 2 was selected for the present study ( Figure 1b).

B. DATA AND PREPROCESSING 1) IN SITU LST
In situ LST data were collected from Chengdu Meteorological Office covering the summer (from June to August) of 2009 and 2018. Data were acquired from 7 weather stations distributed across the study area ( Figure 1b).

2) LST RETREIEVAL FROM LANDSAT DATA
Landsat satellites' products constitute an important dataset for LST retrieval because of the relatively high resolution of Landsat. Landsat TM collects TIR channel data at 120 m spatial resolution while Landsat 8 has two TIR bands at 100 m spatial resolution. Cloud free Landsat TM and Landsat 8 (path/row: 129/39) covering the study area were download from the U.S. Geological Survey (http://earthexplorer. usgs.gov/). However, Landsat has a 16-day revisit cycle, which limit the use of Landsat data in SUHI studies. Data were geographically corrected and rectified to the Universal Transverse Mercator system (UTM/WGS84). Radiometric correction was applied to convert the digital number (DN) for Landsat TM and Landsat 8 into surface reflectance values using ENVI 5.3 software. Fast line-of-sight atmospheric analysis spectral hypercubes (FLAASH) model was adopted to remove the atmospheric influence [27]. We retrieved LST data from Landsat TM TIR band 6 and Landsat 8 TIRS band 10 using the mono-window algorithm, because it is a highly effective method for retrieving LST [28]. For details of the mono-window algorithm, please refer to Qin et al. (2001) [29].

3) PREPROCESSING OF MOD11A1 AND MOD11A2
MODIS daily LST product (MOD11A1) and 8 Day average LST product (MOD11A2) at the local time, around 10:30 am, covering the summer of 2009 and 2018 were obtained from NASA's Earth Observing System Data and Information System (https://earthdata.nasa. gov). These are 1 km spatial resolution LST products covering the study area (h26v05). MOD11A1 and MOD11A2 LST data were retrieved by means of the generalized split window algorithm [30]. Many research results show that the accuracy of the MODIS LST data is high in most large cities around the world [31]- [33]. We used MODIS Reprojection Tool to re-project data to the same coordinate system as Landsat. Then, we reduced the potential geometric errors by co-registering MOD11A1 and MOD11A2 to Landsat LST data. Finally, we used the quality control band in the MOD11A1 and MOD11A2 to determine cloud contaminated pixels in order to exclude them from further analysis.

C. IMPLEMENTATION FOR LST PREDICTION AND MONITORING SUMMER DAYTIME SUHI
In this study, the implementation consists of two parts that are, testing the proposed approach and monitoring summer daytime SUHI ( Figure 2). In the first part, three dates were chosen, 2 April 2018, 18 April 2018, and 5 June 2018. Both the downscaled Landsat 8 LST image using the RF model and MOD11A1 images captured on 2 April 2018 at 03:32 and 03:00 UTC, respectively, were used in this study as the start time (t 1 ) input base data for the proposed method (Table 1), while the other MOD11A1 data captured at 03:00 UTC on 18 April 2018 and 5 June 2018 were used as the predicted time (t m ) input base data for generating the high spatial resolution LST predictions at t m . In the following parts, we call LST derived from the proposed method or FSDAF as ''predicted LST'', and LST directly from MODIS and Landsat as ''observed LST''. To test the performance of our proposed method, the observed Landsat 8 LST images at t m were used as the referenced data for validation. In order to validate the accuracy of the predicted LST data, the coefficient of determination (R 2 ), the absolute average difference (AAD), and the root mean square error (RMSE) were computed between the predicted LST images and the observed LST images.
If the performance of the proposed method in the first part is better, we can conduct the next part. We selected the pairs of downscaled Landsat TM LST image acquired on 24 March 2009, downscaled Landsat 8 LST image acquired on 2 April 2018, and the MOD11A1 data in the same period as the input base data at t 1 ( Table 2). Afterwards, MOD11A2 at t m were used to fusion the predicted LST data at 30 m spatial resolution and temporal resolution of 8 days at t m in 2009 and 2018. Finally, the predicted summer daytime LSTs were averaged and used to assess the spatiotemporal variations of the summer daytime SUHI in the study area.

1) DOWNSCALING LST USING A RANDOM FOREST MODEL
RF is an integrated extension of decision tree. The basic concept of RF is to construct a set of uncorrelated classification and decision regression trees. RF is capable to unite a big number of binary decision trees constructed using bootstrap samples from the training dataset and randomly select a subset of independent variables and dependent variables at each node [34]. The results of RF training turn out to be the voting output for all decision trees. Recently, LST downscaling with machine learning algorithms, such as RF [35], support vector machine (SVM) [36], artificial neural network (ANN) [37], genetic algorithm techniques (GA) [38] have been proposed and gained more recognition with high computing precision and fast operation. RF is an effective machine learning method with the advantage of high accuracies in fitting the nonlinear statistical relationship between LST and other variables. Compared to common statistics-based downscaling LST methods, RF is more efficient in complex regions, especially in urban areas. Bartkowiak et al. [34] evaluated three LST downscaling methods in Alpine vegetated areas, RF was found capable of modelling non-linear relationships between LST and variables in a very robust way. It has been proven that RF can improve LST downscaling performance in arid regions (especially in deserts) [39]. Li et al. [40] compared 3 machine learning algorithms, and indicated that RF yielded accurate results both in urban and rural areas.
Choosing the predictor variables in RF downscaling approach should reflect the spatial variation of LSTs over different areas. Therefore, in this paper, four predictor variable groups were selected according to the correlations between LST and different biophysical variables. Table 3 provides a brief description of the predictor variables employed in this study. From Table 3, biophysical indices were calculated from the reflectance bands of Landsat TM and Landsat 8 with the spatial resolution of 30 m. The land classification maps were obtained using SVM based on cloud-free Landsat image in summer. A DEM was acquired from the NASA's Shuttle Radar Topography Mission (SRTM), at approximately 90 m resolution. DEM derivatives such as aspect, slope and hill shade were established using the spatial analysis tool in ArcGIS.
A detailed procedure for LST downscaling is presented in Figure 3.
In the first part (testing the proposed approach), LST retrieval from Landsat 8 with 100 m resolution and the predictor variables were registered to the UTM projection over the study area. In this study, the Landsat 8 LST was aggregated to 900 m as the original low resoluton LST (LST OLR ), the predictor variables including the biophysical indices, terrain factors, and reflectance band were aggregated to 900 m as well. On this low spatial resolution level, RF model the statistical relationship between low resolution LST LR at 900 m and the predictor variables, it can obtain as follows: where the subscript LR means variable at low resolution, L indicates the aggregated the high spatial resolution predictors to 900 m, ρ represents the reflectance band, b is the biophysical indices, the t is the terrain factor. The residual LST ( ) was the difference between the LST LR and LST OLR , as shown in Equation (2): Therefore, the training model was applied to the predictor variables with high resolution (100 m) given the scale invariance, which assume a unique statistical relationship between LST and other variables exists within a sensor scene at multiple spatial resolutions [35]. The downscaled LST with 100 m resolution (LST H ) can be expressed as follows: where the subscript H indicates the high resolution (100 m) variables.
In part 2 (monitoring summer daytime SUHI), LST estimates were derived from Landsat TM and Landsat 8 at their  original low resolution (120 m and 100 m, respectively) LST (LST OLR ). The biophysical indices, terrain factors, and reflectance band were also resampled to 120 m and 100 m, respectively. On this low spatial resolution level, RF model calculated the statistical relationship between low resolution LST LR and the predictor variable using Equation (1). Using Equation (2) and Equation (3), the training model was applied to the predictor variables with high resolution (30 m). RF was applied to downscale Landsat TM LST and Landsat 8 LST from 120m and 100 m, respectively, to 30 m. Figure 4 presents a detailed procedure of the proposed approach. In this study, we used FSDAF to fusion high spatiotemporal resolution LST data in order to monitor summer daytime SUHI covering the study area by combining the spatiotemporal characteristics of MOD11A1, MOD11A2 and downscaled RF LST image. FADAF was originally developed by Zhu et al. [23] to fuse high spatial and temporal data from one high resolution image and two low resolution images. Zhang et al. [24] fused MODIS and Landsat image by the FSDAF method to generate dense time series LST data. FSDAF integrated the advantages of STARFM, spectral unmixing analysis and spatial interpolation to minimize the chance for error, while capturing the LST change with time and increase the availability of high spatial and temporal LST data. The principle of FSDAF is that data from the different sensors are comparable and consistent.

2) IMPLEMENTATION OF THE PROPOSED APPROACH
In this study, FSDAF mainly includes the following steps [23]: (1) classifying downscaled LST using RF at t 1; (2) detecting the changes in each class of MODIS LST from t 1 to t m ; (3) predicting the high spatial resolution LST at t m according to predicted temporal changes and calculating the residuals of MODIS LST; (4) interpolating the MODIS LST at t m by the thin plate spline (TPS) function to predict the high spatial resolution LST; (5) distributing the calculated residual in step (3) to the predicted high spatial resolution of LST; (6) generating final high resolution predicted LST at t m according to the assigned weight by neighborhood information.
The calculation process can be expressed as follows: in which LST t m x ij , y ij is the predicted high spatial resolution LST at time t m , x ij , y ij is the coordinate index of the j th high spatial resolution pixel within the i th low spatial resolution pixel. LST t 1 x ij , y ij is the downscaled RF LST data at time t 1 , W k is the weight of the k th similar pixel.
LST(x k , y k ) represents the LST change from t 1 to t m of similar pixels. In Equation (5), LST x ij , y ij is the prediction of the total change of the target pixel x ij , y ij between t 1 and t m . ε high x ij , y ij is the residual which is allocated to the j th high spatial resolution pixel in the i th low spatial resolution pixel.
LST(c) is the change in LST over a certain class in the high spatial resolution data from t 1 to t m . For more detailed steps of the FSDAF model and the calculation of the weight, kindly refer to previous studies [23]. TPS function can be used to compute the residual by the following Equation: LST SP t m x ij , y ij = f TPS−b x ij , y ij (10) in which L(x i , y i ) is the residual values between high spatial resolution observed LST and predicted LST, m is the number of the sub-pixels in low spatial resolution pixel. L(x i , y i ) represents the change in LST of low spatial images from time t 1 to time t m , W (x ij , y ij ) is the normalized weight of LW (x ij , y ij ). LST TP t m is the LST at time t m predicted based on temporal change. LW (x ij , y ij ) presents the weights of residual distribution, E h0 (x ij , y ij ) is the temporal prediction error, HI (x ij , y ij ) is the homogeneous coefficient, E he (x ij , y ij ) represents the equal error within a low spatial pixel. LST SP t m x ij , y ij is the LST at time t m of each pixel in high spatial resolution image based on TPS after optimizing the parameter. f TPS−b x ij , y ij is the TPS functions.

3) SUHI INTENSITY ANALYSIS
SUHII indicator was used to assess summer daytime SUHII in the study area from 2009 to 2018. The urban and rural area difference in the average summer daytime LST was adopted to identify summer daytime SUHII variations by Equation 11:

A. LAND COVER CLASSIFICATION WITH LANDSAT TM AND LANDSAT 8
In order to analyze the spatiotemporal variations of the summer daytime SUHII in Chengdu, the first step was defining urban and rural areas in 2009 and 2018. Cloud-free Landsat TM and Landsat 8 data of 2009 and 2018 were acquired in summer. The land cover maps of Chengdu in 2009 and 2018 were classified using SVM [43], [44]. The urban land cover classes have been identified as follows: (i) vegetation; (ii) water; (iii) bare soil; (iv) impervious surface area (ISA). A Google Earth image acquired in the year 2018 at 1m spatial resolution was used as the reference data to evaluate the accuracy of classification results. A total of 1500 sample points from the Google Earth image were used as validation points to assess the accuracy of the classified results. Table 4 illustrates the producer's and user's accuracies. Urban area is defined as a high intensity and densely occupied by ISA. Rural area is defined as the buffer zone that includes a rural area (as 150% of the urban area) around the urban area ( Figure 5).

B. TESTING THE PROPOSED APPROACH
In order to test the performance of RF in downscaling LST data, thermal sharping algorithm (TsHARP) as the most common statistics-based LST downscaling algorithm [45], [46] was also used to downscale LST. Firstly, Landsat 8 LST    (Figure 6a) was aggregated to 900 m as the low resolution LST (Figure 6b). The LST downscaling results of RF and TsHARP from 900 m to 100 m are shown in Figure 6c and Figure 6d, respectively. Table 5 shows the variable importance (VI) scores of RF in the study area. The contributions of DEM, FV, Red band are higher than other variable because Longquan Mountain is to the east of the study area, the spatial distribution of summer daytime LSTs is controlled by DEM, vegetation fraction and the red band that is sensitive to vegetation changes. Besides, IBI shows high importance proves LSTs is also affected by impervious surface cover in the study area.
As shown in Figure 6a, four kinds of subareas notably subarea 1 in the ISA, subarea 2 in the bare soil area, subarea 3 in the water area, and subarea 4 in the vegetation area were used to further test the LST downscaling performance for different land cover types. By computing the RMSE and the mean error (ME) between the 100 m downscaled LST and 100 m observed Landsat 8 LST, the statistical results on the performance of RF in LST downscaling and TsHARP model for different land cover types are shown in Table 6.  Figure 7 shows the spatial distributions of LST error over ISA, bare soil, water body, and vegetation by using the RF and TsHARP.
From Table 6, we can see that RF produces higher LST downscaling accuracy than TsHARP. Over the study area, the RMSE of RF is 2.15 K, which indicates better performance compared to TsHARP. The ME of the downscaled LST for the RF is -0.07 K, about 0.26 K lower than TsHARP. In Figure 7, the downscaling performances of RF is better than TsHARP in different land cover types of the study area. From Figure 7b, 7d, 7f, and 7h, obvious underestimations and over estimations of the different land cover types when using TsHARP can be observed. The RMSEs of four land cover types range from 3.43 to 6.93 K by using TsHARP, while the MEs are also not close to 0. This result shows that TsHARP, that is only based on NDVI, has unsatisfactory results in downscaling LST over various and complex land cover types. This is because NDVI has great sensitivity to differences in land cover categories. The RMSEs of RF ranged from 1.44 to 2.03 K, while the MEs were close to 0, which proves RF based on the relationship between LST and multitype variables can produce the satisfactory LST downscaling results.
Therefore, the downscaled RF LST image could be used as the start time LST base data of FSDAF for generating the high spatial resolution LST predictions at predict time. The next step is testing the proposed method in the first part (testing the proposed approach) of this study. Figure 6c and 8a were the downscaled RF LST at 100 m spatial resolution and MOD11A1 on 2 April 2018. Figure 8b and 8e are the observed MOD11A1 data on 18 April 2018 and 5 June 2018 for predicting the LST data at 100 m spatial resolution on the same date (Figure 8c and 8f). The observed Landsat 8 LST (Figure 8d and 8g), recorded on 18 April 2018 and 5 June 2018 can be used to evaluate the predicted LST results at the similar time. Figure 9 shows scatter plots of correlation between the observed Landsat 8 LST at 03:32 UTC and predicted LSTs at  Figure 9, it can be observed that the R 2 between the predicted LST and the observed Landsat LST are 0.9025 and 0.9415, and AAD values are 0.031 K and 0.027 K, respectively. RMSE values are 0.089 K and 0.082 K on the same dates. These accuracy assessment results show that the proposed method has a better performance to generate the high spatiotemporal resolution LST data.

C. MONITORING SUMMER DAYTIME SUHI
The accuracy assessment results in the first part (testing the proposed approach) proves that the performance of the  proposed method is better, which paves way to the next step that consist of predicting 8 day summer day summer daytime LST data at 30 m spatial resolution.
Firstly, Landsat TM LST at 120 m spatial resolution observed on 24 March 2009 ( Figure 10a) and Landsat 8 LST observed on 2 April 2018 were resampled to 90 m spatial resolution (Figure 10d) and downscaled by RF to 30 m (Figure 10b and 10e), and further compared with the TsHARP as shown in Figure 10c and 10f.
Since there is no real satellite derived LST data at 30 m spatial resolution, we validated the downscaled LST data using in situ LST data. Figure 11a and 11b shows the scatterplots of correlations between the downscaled RF LST and downscaled TsHARP LST versus in situ LST retrieved on the 24 th March 2009. Figure 11c and 11d shows scatterplots of downscaled RF LST and downscaled TsHARP LST versus in situ LST acquired on 2 nd April 2018. From Figure 11, we can see the R 2 of the downscaled LSTs by RF exceed 90%, thus an accuracy improvement of approximately 10%, compared to TsHARP method. Therefore, high spatiotemporal resolution summer daytime LSTs covering the study area in 2009 and 2018 were This permitted to fusion predicted LST data at 30 m spatial resolution and LST data at 8 days temporal resolution.
In addition, the traditional FSDAF model was also used to evaluate the performance of the predicted LST results based on the proposed method. Figure 10a and Figure 12a were the observed Landsat TM LST at 120 m spatial resolution and MOD11A1 on 24 March 2009, Figure 6a and Figure 8b were the observed Landsat 8 LST at 100 m spatial resolution and MOD11A1 on 2 April 2018 as the input base LST at t 1 to FSDAF. Then, the second input was MOD11A2 at t m to fusion 120 m predicted summer daytime LST in 2009, and 100 m predicted summer daytime LST in 2018. It is noted that, although the Landsat TIR channel (Landsat TM 120 m, Landsat 8 100 m) is first resampled to 30 m by USGS before distribution to the users, the actual spatial resolution of observed LST retrieved form Landsat TIR channel and the predicted LST results based on FSDAF remains approximately 100 m.
Comparing the predicted LST form the proposed method and the traditional FSDAF at the same period, one notes that the general spatial distribution of the predicted LST results from the proposed method is consistent with FSDAF results.
Hence, the new approach could provide much better spatial resolution and provide important details of urban thermal environment such as continuous river, urban inner road and ring road. The R 2 between the predicted LST and in situ LST ( Table 7) also show that the proposed method can produce high spatiotemporal resolution LST data with higher accuracy than traditional FSDAF.  Figure 13c is the summer mean daytime LST at 120 m spatial resolution using FSDAF in 2009, Figure 13f is the summer mean daytime LST at 100 m spatial resolution using FSDAF in 2018.
As shown in Figure 13, high summer daytime LSTs were mainly concentrated in the urban center and gradually extended to the urban surroundings from 2009 to 2018. The reason for this might be related to the anticipated rapid urbanization driven by population increase and fast economic growth in Chengdu. The local government deployed large industrial enterprises to the suburban of the city.
In order to further study the impact of urbanization on the urban thermal environment, we investigated the changes of summer mean LST data derived from the new method, the traditional FSDAF method and MOD11A2, respectively, over four dominant land cover types, such as ISA, vegetation, bare soil, and water body during 2009 to 2018. The results are shown in Table 8. Firstly, compared to the traditional FSDAF, the summer mean daytime LST of each land cover types using the new method is closely relate to the results derived from MOD11A2 suggesting significant consistency. Besides, similar trends were identified in all three LST product sets. Highest summer daytime LSTs were observed over ISA, followed by bare soil, vegetation and water body. ISA exhibited highest summer daytime LST because it is more readily dissipated as sensible heat and less energy is stored effectively, which means low thermal inertia resulting in an analogous thermal VOLUME 8, 2020  response behavior like bare soil. Vegetation had a lower summer daytime LST since vegetation's thermal capacities easily release heat stored through canopy transpiration. The water had the lowest summer daytime LST because water areas possess specifically large heat conservation capabilities. Therefore, in Chengdu, changing spatial patterns' control will be more useful in handling SUHI. The strategy of embedding high density vegetation, such as planting more shade trees and other vegetation species has been proved as an effective way to mitigate SUHI. In addition, no further urbanization should be envisaged in areas where the urban thermal environment is seriously deteriorating. Instead, actions to improve the local thermal environment need to be undertaken such as using highly reflective, porous and light-colored paving materials and building. Improving energy efficiency would also halt the expansion of SUHI. Figure 14 shows the summer daytime SUHII over the period from 2009 to 2018. Significantly increasing trends of summer daytime SUHII in the study area were observed using all four LST data. Using the new method, as proposed in this paper, it has been found that SUHII increased from 2.78 • C in 2009 to 4.04 • C in 2018 in Chengdu city. Compared to the traditional FSDAF and MOD11A2, these results are more closely related to the SUHII acquired from in situ LST data,  thus signaling that the proposed method is susceptible to provide reliable SUHI monitoring results.

IV. DISCUSSION
Considering the limitations of the existing spatiotemporal fusion models, this paper has presented a spatiotemporal fusion method by integrating RF and FSDAF to generate high spatiotemporal resolution summer daytime LST data. The results show that the proposed method can generate high resolution LST images with high accuracy (the RMSE and AAD value are both less than 0.089 k). Compared to the traditional FSDAF, the proposed method can increase the spatial resolution of the predicted LST image from 100 m and 120 m to 30 m. In addition, results of this method for monitoring summer daytime SUHII are more correlated with the SUHII derived from in situ LST data.
However, the proposed method also has shortcomings that cannot be neglected. One issue relates to the fact that predicted LST data can be largely influenced by the quality of the input LST data at start time and predicted time. In this study, due to limitations of cloudy and rainy weather, we only selected Landsat and MODIS LST data for only 2 years as the input base data to fusion high spatiotemporal resolution LST data. It could be helpful in the future, to find an approach that tackles this problem by integrating some other high spatial resolution remote sensing data, in order to detect the diurnal and seasonal variations of daytime SUHIs. Secondly, since there is no actual satellite-derived LST data at 30 m resolution to evaluate the accuracy of predicted LST at 30 m spatial resolution yet, we evaluated the accuracy by in situ LST obtained from 7 weather stations, but the number of weather stations is relatively limited. The reliability of the proposed method may be influenced by the big differences in spatial resolutions between 30 m LST data and 1000 m MODIS LST data for LST data fusion. Future studies may utilize more in situ LSTs and in situ sensors mounted on fixed meteorological stations or traverses of vehicles to evaluate the predicted results. Thirdly, future studies may use the spatiotemporal fusion model to generate high spatiotemporal urban surface biophysical variables covering longer and continued time period to allow for genuine quantitative analysis of the relationship between SUHI and the urban surface biophysical indicators.

V. CONCLUSION
In this study, the proposed method was successfully used to generate high spatial resolution (30 m) and temporal resolution (8 days) LST images from available Landsat TM, Landsat 8 and MODIS LST data in the summer of 2009 and 2018. Then, we generated the predicted summer daytime LST over the study period, analyzing the spatiotemporal change of summer daytime SUHII in Chengdu. Several conclusions were drawn from this research: (1) the performance of the proposed method could predict LST with relatively strong accuracies (R 2 , AAD, RMSE values were in the range of 0.9025-0.9415; 0.027-0.037; 0.082-0.089, respectively).
(2) Compared to the traditional FSDAF, the predicted summer daytime LST images derived from the proposed method can increase the spatial resolution of the predicted LST image from about 100 m to 30 m and provide more details of urban thermal environment. The predicted high spatiotemporal resolution summer daytime LST can be used for quantitatively monitoring the variations of SUHI. (3) The significantly increasing trends of SUHII in the study area were observed by using the proposed method, rising from 2.78 • C in 2009 to 4.04 • C in 2018. As discussed above, the limitations of the proposed method should be addressed, and the quantitative analysis of the relationship between SUHII and high spatiotemporal urban surface biophysical variables should be thoroughly studied in future works.