Spatio-Temporal Analysis of Urban Heatwaves Using Tukey g-and-h Random Field Models

Real-time heatwave risk management with fine-grained spatial resolution is important for analysis of urban heat island (UHI) effects and local heatwaves. This study analyzed the spatio-temporal behavior of ground temperatures and developed methods for modeling them. The developed models consider two higher-order stochastic spatial properties (skewness and kurtosis), which are key to understanding and describing local temperature fluctuations and UHI effects. Application of the developed models to the greater Tokyo metropolitan area demonstrated the feasibility of statistically incorporating a variety of real datasets. Remotely sensed imagery and data from a variety of ground-based monitoring sites were used to build models linking urban covariates to air temperature. Air temperature models were used to capture high-resolution spatial emulator outputs for modeling ground surface temperatures. The main processes studied were the Tukey g-and-h processes for capturing spatial and temporal aspects of heat processes in urban environments. The main finding is that consideration of not only the mean temperature but also the variance, skewness, and kurtosis parameters can reveal hidden heatwave structures.

Since the first papers on urban heat islands (UHIs) were published (see [3]), the need for quantifying and modeling spatially fine granular temperature processes in urban environments has grown in prominence. It is becoming increasingly possible to improve both the resolution of spatial-temporal temperature models as well as to distinguish between air temperature models and ground surface temperature models. Using a newly created framework, we developed methods for combining different data sources to model both air and surface temperatures in local urban environments in a consistent manner.
We see two main directions for the use of such air and ground temperature models. The first is related to climate initiatives and smart city design. The second is related to The associate editor coordinating the review of this manuscript and approving it for publication was Gerardo Di Martino . policies on the reduction of risk and the mitigation of population mortality effects arising from urban heatwaves (see for instance [2]). We briefly provide context for these two application domains below.
From the perspective of climate initiatives and smart city design, global initiatives are being established to study local temperature processes through such channels as ''green finance'' by international organizations such as the Climate Bond Initiative. These organizations are actively developing guidance and frameworks for the quantification of pollution reduction in smart city environments. In particular, they are interested in the inter-relationships between CO2 emissions in urban environments and local temperature profiles, especially how such temperature profiles are related to emissions in urban environments (see discussions in [5], [7], [8]). This interest is driven by a certification and regulation perspective in which funding for low-carbon buildings and efficient transportation systems is increasingly linked to verifiable VOLUME 9, 2021 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ emissions targets. In such applications, there is a strong demand for accurate modeling of the spatial-temporal dynamics of both ground and air temperatures. Local urban temperature modeling is important because it can also inform policy for emission reduction standards. The components of such models are increasingly being used as benchmark references to set standards to be achieved by green infrastructure projects for transportation and lowcarbon building initiatives in smart city environments. They are even increasingly being used to justify access to the proceeds from, for instance, green bond debt instruments. The eligibility criteria for such instruments are increasingly reliant on scientific certification of the reductions to be achieved. Quantification of such reductions can be obtained from statistical modeling of local temperature profiles using models like the ones presented here. Different attributes of a model can be applied to different financial products such as green bonds and climate insurance. For instance, model outputs can be used to assess the trigger for green bond cancellation. For climate insurance, the outputs from the local models can be summarized and used to construct indexes for use in determining the size of insurance payouts for claims based on climate loss events. The amounts of financing for green project funding through debt instruments, such as green bonds and their associated securitization markets, are projected to reach 1-2 trillion USD globally in the next couple of years [1]. This places an even stronger spotlight on the methods used to quantify spatial temperature processes and thereby better understand UHI effects for use in assessing the suitability of environmental projects funded by such debt capital (please see discussions in [6]).
In the context of developing local temperature models for health-related policies, the outputs of local UHI and heatwave models for both air and ground temperatures can help inform policy for mitigation and severity/exposure reduction for at-risk populations. Such models can be used to develop strategies for reducing high mortality rates due to extreme temperature distress, especially in elderly urban populations. The effects that extreme temperature events can have on populations are exemplified by the large number of deaths from heat-related distress during the great European heatwave of 2003 (estimated to be >52,000) and the Russian heatwave of 2010 (estimated to be around 15,000). Numerous other heatwave events in India, Pakistan, the Middle East, and Australia have also resulted in extremely high mortality levels. Strengthening resilience against and improving adaptability to climate-related hazards, such as these severe heatwave events, from a political perspective was thus identified as one of the 17 Social Development Goals adopted by the United Nations in 2015. This has made heatwave monitoring and management an important issue in smart city design and policy agendas for aging urban populations around the world.
In this study, we targeted the Tokyo metropolitan area, which is the largest metropolitan area in terms of population (35 million people) and which contains one of the largest urban-dwelling elderly populations on the planet.
We analyzed the spatio-temporal behavior of the ground temperatures, focusing on not only mean and variance, which have been considered in many spatial and temporal temperature studies, such as [9], [10], but also skewness and kurtosis, which are key parameters describing extreme heat. We also developed methods for modeling the behavior.

II. STATISTICAL MODELING CONTEXT
We created a statistical framework for producing emulator output in the form of a spatially resolved gridded dataset of ground temperatures. This emulator output is constructed by combining temperature observations in space and time from a combination of satellite remote sensing imagery and a variety of ground-based monitoring site data. The resulting emulator output data is modeled using Tukey g-and-h processes for capturing the spatial and temporal aspects of heat processes in urban environments. These processes are the natural extension of Gaussian process (GP) regression models and are used to incorporate transformations and/or warping properties that explicitly enable skewness and kurtosis to be parameterized. Spatial modeling is achieved by applying the Tukey g-and-h random fields model to the combined dataset (see [11]- [13] and [14]).
The dataset is very large due to the use of fine-scale spatial and temporal resolutions, resulting in a significant computational challenge for the model estimation component of the application. The amount of computation is reduced by applying spectral covariance rank reduction when the model is applied to the emulator modeled ground temperature dataset.
Our analysis of the data for Tokyo suggests that the mean, variance, and spatial skewness are important in characterizing ground temperatures and subsequent UHI and local heatwave propensity. It also revealed that the model parameters varied considerably among urban, suburban, and mountain areas and that one can cluster such areas to form adaptive local climate policy responses to UHI effects in different parts of the cityscape.

A. SPATIAL-TEMPORAL TEMPERATURE MODELING FOR URBAN ENVIRONMENTS
Spatial and temporal data related to temperature, noise, and pollution are increasingly being collected as part of smart city environmental monitoring initiatives (see [15], [16]). This has been facilitated by the development of remote sensor technology and global positioning systems (GPS), along with other remote sensing modes (see [17]).
Our focus here is on the modeling of such spatio-temporal data. Models for such data are commonly developed using GP techniques. However, the number of observation points or design points n for fine-resolution models applicable to the study of local urban environments can be very high dimensional for both time resolution and spatial resolution. In such cases, the classical estimation of a spatio-temporal GP, or kriging, requires the inversion of an nT × nT covariance matrix, where n is the number of sample sites and T is the number of observation times. The computational cost for estimation and prediction is of the order O((nT ) 3 ), so computation is feasible only when n and T are reasonably small. However, when modeling a local climate region, n is typically very large. Since the time resolution is often on the scale of minutes, T can be very large when ground-based sensor data are combined with daily satellite data over many months or years.
The widespread use of such GP and kriging models is largely due to the relative simplicity they provide for model formulation and the statistical characteristics they reveal related to the sufficiency of the analysis of second-order process information on the spatial-temporal mean and covariance structure of the underlying process. We argue that, in the context of local resolution models, one may wish to develop models to account for higher-order stochastic structures that may vary in space and time. In the local climate modeling setting we studied, we had large samples in space and time that we believe could potentially have rich information on not only second-order process characteristics of mean and variance. Increased sample resolution can potentially enable the measurement and therefore modeling of higher-order aspects of such spatial-temporal processes, such as spatial skewness and kurtosis describing extreme tail behavior. Such higherorder information should be included in spatial-temporal models to ensure that the resulting estimations are accurate, especially when seeking to explain extreme events such as extreme heat and UHIs.
Modeling spatial and temporal extremes has been attracting attention in geostatistics owing to the increasing availability of data with improved sampling resolutions. The max-stable process [34], which is a natural infinitedimensional extension of the univariate generalized extreme value distribution, has been extended in a spatial setting, e.g., [29]- [31]. The proposed spatial max-stable processes have frequently been applied to model extreme climate events, e.g., [32], [33]. Unfortunately, as the likelihoods associated with the spatial max-stable processes have no closed-form expressions ( [31]), they are not easily developed, especially for spatial-temporal contexts with highresolution sampling, such as when n or T is large. Moreover, there has been little to no work on the temporal time series characterization of such processes.
Furthermore, max-stable processes are based on the assumption that the index of regular variation characterizing the extreme spatial behavior is homogeneous in space and time, an assumption we did not wish to make with the fine-resolution process modeling undertaken in this work. Furthermore, such approaches are more applicable to the characterization of only extreme behaviors as opposed to our objective, which is to extend the modeling from second-order characteristics to higher-order stochastic structures while also allowing for the regular variation properties of models to capture potential heavy-tailed extremes should they appear locally in some regions of space over time.
Therefore, we aimed to develop new classes of processes, building on the recent studies of [13], [14], and [58]. In those studies, different models of the Tukey g-and-h (TGH) random field (TGH-RF) were proposed. Such models are typically developed as transforms or warpings of a GP via TGH transformation. This family of transformations enables explicit modeling and estimation of higher-order features such as coskewness and co-kurtosis of an underlying process (please see explicit derivations of such population process characteristics in [58]). Regarding temporal components, such models were recently extended by Yan and Genton [35] to capture non-Gaussian autoregressive processes.
Warped GP models in the TGH family are now more suitable for modeling in such big data contexts as evaluating likelihood, first outlined in [56] and [57]. These models have been resolved in efficient computational frameworks, such as the numerically efficient approach reported in [36] for maximizing an approximate likelihood, at least for small-n sample settings. We have developed methods that reduce the numerical cost incurred in the inversion of the n × n covariance matrix, which normally makes computation intractable when n is large. In other words, we have developed a TGH-RF modeling approach that is fast and applicable to large spatial datasets.
We can use such models and observation data extracted from remotely sensed images for temperature data analysis, which produces a very large sample size in space. The data used to train a TGH spatial process model typically is of the order of gigabytes per minute. To facilitate the use of such data in local temperature modeling contexts, we developed two approaches, a low-rank TGH-RF modeling approach and a sparse TGH-RF modeling approach. We compared them with the original TGH-RF approaches introduced above in terms of computational time and estimation accuracy. The developed approaches are natural extensions of the ideas of the laGP framework ( [22]) extended to the TGH process context.
Another contribution of this study is the development of a high-resolution dataset of emulated ground temperatures in the Tokyo metropolitan area; the developed TGH-RF models were applied to this dataset for heatwave analysis. There are at least two monitoring systems available to provide temperature data: station monitoring and remote sensing. The former monitors temperature, humidity, and other climate variables VOLUME 9, 2021 by the minute and by the hour at spatially distributed monitoring stations. The latter monitors ground temperatures with ultra-high spatial resolution for a particular observation window per day; however, values are often missing in certain spatial regions due to cloudy/rainy conditions (see Section III). We combine these data for temperature emulation. For discussions on combining multiple spatial datasets, see [37]. In short, we developed low-rank and sparse TGH-RF modeling approaches and applied them to the emulated temperature dataset. This study is organized as follows. In Section III, we describe the emulator we developed for generating spatially fine ground temperature maps daily by combining surface temperature data from satellite remote sensing with ground-based air temperature data observed at monitoring stations in the study region, the greater Tokyo metropolitan area. Section IV describes the development and application of our approximated fast TGH-RF models for analyzing temperature distribution properties, including spatial co-skewness and spatial co-kurtosis, and compares this novel solution to classical approaches. Section V further explores temperature distribution properties. Finally, Section VI summarizes the key points and mentions potential future work.

III. COMBINING SATELLITE REMOTE SENSING DATA WITH GROUND MONITORING STATION DATA
In this section, we first outline the properties of the different data sources we used. Next, we explain the first stage of our analysis in which the framework we created was used to combine satellite remote sensing ground temperature data with ground-based weather monitoring air temperature data as well as spatial covariate information and thereby produce emulator model output data.

A. AIR AND GROUND TEMPERATURE DATA PROCESSING AND EMULATOR MODELING: OVERVIEW
The ground temperature was our primary focus as it reflects the absorption and radiation of heat from roads, buildings, and other urban materials, which determine the intensities of heatwaves and UHIs. To emulate ground temperatures accurately, we combined the air temperature data, which had low spatial resolution and high accuracy, with remotely sensed ground temperature data, which had higher spatial resolution and lower accuracy.
Air temperature datasets for the target area were obtained from two sources. The Japan Meteorological Agency (JMA) provided hourly temperatures monitored at 78 stations (as of 2013) since 1976. NTT DOCOMO Inc. provided air temperatures observed every minute at 206 stations since 2013. The locations of the 78 + 206 monitoring stations are plotted in Fig. 1.
A ground temperature dataset was obtained from remotely sensed images captured by the Moderate Resolution Imaging Spectroradiometers (MODIS) onboard the Terra and Aqua satellites (http://modis.gsfc.nasa.gov/). They generate worldwide ground temperature maps with a spatial resolution of 1 km at 01:30, 10:30, 13:30, and 22:30 every day. There are 31,235 observation points in the target area. We crosschecked the data against the other two data sources to ensure accuracy. Example plots of ground temperatures on six days in August 2013 are shown in Fig. 2. As you can see, some values are missing due to cloudy/rainy conditions. The missing values in August had a daily mean ratio of 0.70 while the ratio for September was 0.65. Nevertheless, we were able to utilize 9,284 and 10,813 MODIS data values in August and September, respectively.
As described above, both the air and ground temperature datasets were incomplete. The next section explains how we combined the data from these different sources with local spatial covariate information, which is detailed in Table 1, to develop a regression-based emulator model. This model was used to estimate the missing values, enabling us to obtain a complete ground temperature model output profile in space and time.
Use of the additional location-specific explanatory variables in Table 1 was found to result in statistically significant improvements in the emulator model used to identify discontinuous changes in air/ground temperatures along the border separating urban and non-urban areas. Fig 3 summarizes the stages in the emulator regression model used to produce an emulator spatial-temporal map for ground temperatures. The input data are (A1) the air temperature data (JMA and NTT DOCOMO) and (A2) the ground temperature data (MODIS). We first modeled the air temperatures using (B1) a regression Spatial Best Linear Unbiased Estimator (S-BLUE) ( [38]) and the explanatory variables in Table 1, as described in Section III-B2). This produced (C1) the emulated air temperature data. The emulated air temperatures and variables in Table 1 were used as explanatory variables in (B2) a rank reduced S-BLUE approach (see Section III-C) to emulate the (C2) ground temperature data in 1 km grids.

B. REGRESSION S-BLUE FOR AIR TEMPERATURE EMULATION (B1)
Section III-B1 explains the model, and Section III-B2 explains the regression S-BLUE approach.

1) SPATIAL REGRESSION MODEL
We assumed the following regression structure to model the observed air temperatures: Table 1, z is a vector of regression coefficients, and represents the noise process. The (n×n) covariance matrix of the noise process, denoted C, was parametrized by a distance-decay exponential kernel capturing spatial dependence: where d(s i , s j ) is the Euclidean distance between sites s i and s j measured in units of meters [m], r is a range parameter, and τ 2 and σ 2 are variance parameters for spatial and nonspatial variations, respectively. The kernel can be replaced with Gaussian, spherical, or other positive definite kernels (see [39]). Our model describes discontinuous changes in temperature, e.g., at borders separating urban and non-urban areas, using the regression term and describes continuous changes using a spatially dependent process. The same structure is assumed to be behind the unobserved temperature at location s * ∈ D ⊂ 2 .
where X(s * )(K × 1) are the explanatory variables observed at site s * , and c(s * ) (n × 1) is a vector in which the i-th element equals c[d(s * , s i )].

2) LINEAR APPROXIMATION USING S-BLUE
A non-linear spatial smoothing function behind air temperatures was linearly approximated using a spatial best linear unbiased estimator, S-BLUE, as studied in [38]. S-BLUE, denoted here asf (s * ), is a linear approximation of a nonlinear spatial smoothing function used to estimate missing observation Y (s * ). It does this by minimizing the meansquared error E (y(s * ) −f (s * )) 2 under the constraints of linearity and unbiasedness. Solving the minimization problem on the basis of (1 and 3) produces the regression S-BLUE estimator:f whereẑ = (X TĈ −1 X) −1 X TĈ −1 Y . We used this regression S-BLUE to emulate air temperatures in 1 km grids. The variance parameters {τ 2 , σ 2 , r} were estimated using the robust weighted least squares method [40]. We used the gstat R package (https://cran.r-project.org /web/packages/gstat/) for the estimation.

C. RANK REDUCED REGRESSION S-BLUE FOR GROUND TEMPERATURE EMULATION (B2)
The variables in Table 1 and (C1) are the emulated air temperatures, which are used as explanatory variables for (B2) the ground temperature emulation. Unfortunately, the original S-BLUE, which involves matrix inversionĈ −1 , is computationally infeasible because of the large sample size of the MODIS data. To reduce the computational cost, we approx-imatedĈ byÊ Lˆ LÊ T L , whereˆ L is an L × L diagonal matrix consisting of the L largest eigenvalues approximated by the Nyström extension ( [49]), andÊ L is an n × L matrix consisting of the corresponding L approximated eigenvectors. Thus, we approximateĈ using the L principal eigenpairs. Given the eigen-decomposition, the S-BLUE estimator yieldŝ The complexity for the rank reduced regression S-BLUE is O(n), making it feasible for the MODIS data.

1) AIR TEMPERATURES (C1)
We used the regression S-BLUE approach explained in Section III-B2 to emulate air temperatures in 1 km grids at 13:00 JST each day for 61 days. The mean R-squared value of the model for each day was 0.818, which verifies the accuracy of our emulation model. We illustrate a randomly selected, but representative, example of the fiting results below. Fig. 4 shows example boxplots of the estimated t-values of the regression coefficients for the air temperature model. Note that the variance inflation factor, which exceeds 10 if there is severe multicollinearity, verifies that the explanatory variables are not collinear with each other. It can be seen that the air temperatures were significantly lower in cropland areas and significantly higher in populated areas. These results are plausible considering the cooling effects of natural environments and the trapping of heat due to UHI effects in populated areas. Furthermore, we see that elevation is negatively significant. The positive sign for latitude might reflect the cooling effect of the ocean in the south combined with the thermal storage effect of the inland areas in the north.
Emulated air temperatures at 11:00, 12:00, and 13:00 on August 17 are plotted in Fig. 5 (top). A land cover map is shown in (6) for reference. The plots show a rapid increase in air temperatures, especially near and in the urban areas.

2) GROUND TEMPERATURES (C2)
Ground temperatures in 1 km grids at 13:00 JST each day for 61 days were emulated using the rank reduced S-BLUE approach (see Section III-C). In accordance with the results of preliminary analysis, L was set to 200. For days with less than 5,000 observations, we used data for that day plus the data for one day before and one day after to stabilize the estimation. To linearize the air/ground temperature relationship, we applied the Box-Cox transformation to air temperatures, using a multiplier estimated a priori by maximizing the likelihood of the linear model between the two temperatures. The mean R-squared value across the days was 0.7303, suggesting VOLUME 9, 2021  reasonable emulation accuracy. Box plots of the estimated t-values of the regression coefficients are shown in Fig. 7. As expected, the air temperatures increased the ground temperatures. The populated and urban areas had a positively significant effect on the ground temperature. These results demonstrate the existence of UHIs in these areas. Furthermore, latitude had a positively significant effect. We attribute this to the existence of a basin in the target area that typically has higher temperatures in the north.
Emulated ground temperatures at 11:00, 12:00, and 13:00 on August 17 are also plotted in Fig. 5 (bottom). Their spatial patterns differ from those of the air temperatures; ground temperatures are clearly a better indicator of heat in urban areas (see Fig. 6). This is because ground temperature reflects thermal storage and radiation due to urban building materials, which are the principal sources of UHIs (e.g., [43]). This is also the reason ground temperatures are typically higher than air temperatures. Although air temperatures are commonly used for heatwave risk assessment (e.g., [41], [42]), our results suggest that ground temperature is a better indicator of UHI effects. Although the results are only for one of the 61 target days, examination of the data for the other days showed that ground temperatures in urban areas were high on most of the target days. The temperature increases after 11:00 were relatively small in the suburban areas, such as those indicated by the ellipses in Fig. 5 (bottom), because the temperature had typically peaked by 11:00.
Emulated ground temperatures at 13:30 on August 12-17 are plotted in Fig. 8. Comparison with the original observations (Fig. 2) shows that the emulated temperatures were reasonably high in the urban areas and low in the mountain areas, even on August 17 when missing observations were dominant.
The emulated ground temperature data are available at (https://figshare.com/s/5a7e2d792d5968078cc7). The MODIS dataset is challenging to use due to missing ground temperature records resulting from cloud occlusions. A potential enhancement to our ground temperature emulation approach is to further extend our emulator model to also take into account solar radiation, shaded area ratio, and other additional physical explanatory variables. However, determining a physically consistent, reliable, and statistically robust method to adjust for the effect from cloud occlusion is a challenging task. Although this approach is not directly addressed in our current emulator model, the decrease in ground temperatures in cloudy areas is indirectly addressed by including air temperature as an explanatory variable in the ground temperature model.

IV. GEOSTATISTICAL ANALYSIS OF EMULATED GROUND TEMPERATURES A. OUTLINE
The ground temperatures are represented byỸ t (s i ), where i ∈ {1, . . . , n} and t ∈ {1, . . . , 61} indicate the grids and days, respectively. A preliminary analysis of the distribution of the ground temperatures demonstrated the existence of skewness and kurtosis in the distribution and that their strengths greatly depended on the location.
We thus considered analyzing the process behind ground temperatures on the basis of skewness and kurtosis using the TGH-RF model. Unfortunately, this model is computationally intensive and thus not suitable for such analysis. Accordingly, as described in Section IV-D, we extended the model to a low-rank model for estimating skewness and kurtosis, which are assumed constant over space, and, as described in Section IV-D2, we extended it to a sparse model for estimating local skewness and kurtosis. We used these extended models to model ground temperatures in each day.

B. TUKEY G-AND-H DISTRIBUTION
The Tukey G-and-H (TGH) distribution is formulated as where Z ∼ N (0, 1), and {a, b, g, h} represent the location, scale, skewness, and kurtosis of the temperature process.
As g and h approach 0, then a and b approach the mean and standard deviation ofỸ . The TGH distribution is particularly useful for analyzing skewness and kurtosis. The next section introduces the TGH-RF model combining the TGH distribution with GP that we used to consider spatial dependence.

C. TUKEY G-AND-H RANDOM FIELDS (TGH-RF) MODELS 1) ORIGINAL MODEL
The original TGH-RF model [46] is an extension of the GP model incorporating skewness and kurtosis parameters. The TGH-RF model is formulated as where Z t (s i ) is a Gaussian random variable satisfying The TGH-RF model is an extension of the GP model and is used to estimate skewness (g) and kurtosis (h). The log-likelihood of the model at time t is given as where Z t and Z (s i ) include θ 1 ∈ {a, b, g, h}, whereas C includes θ 2 . All three are parameters describing spatial dependence. Xu and Genton proposed the following parameter estimation procedure [36]: (i) Set the initial values forθ 1 andθ 2 .
(ii) Iterate the following steps until parameter estimates converge.
(ii-1) Numerically maximize L(θ 1 ,θ 2 ) with respect to θ 1 (ii-2) Numerically maximize L(θ 1 , θ 2 ) with respect to θ 2 Unfortunately, this algorithm is not suitable for our large dataset due to the iterative calculation. To cope with this problem, the subsequent two subsections extend the TGH-RF model by introducing low-rank and sparse approximations.

D. LOW-RANK TGH-RF MODELING 1) MODEL AND ESTIMATION
To accelerate the computation, we approximated Z t with E L L , where L ∼ N (0, p m L ), E L is an n × L matrix of the first L eigenvector of a prespecified spatial correlation matrix, R, L is an L × L diagonal matrix in which the elements are the first L eigenvalues of the matrix, and p =  Corr[E L L ] = E L L E T L , which is a low-rank approximation of C. The cost for the eigen-decomposition, which has computational complexity of O(n 3 ), can be reduced by using the Nyström extension ( [49]), a sparse greedy approximation ( [50]), or another eigen-approximation technique (see [51]).
The i, j-th elements of the C matrix are given by the exponential kernel exp(−d i,j /r), where d i,j is the Euclidean distance between sites i and j, and r is a given range parameter. In accordance with the method in [47], r is given a priori by the maximum distance of the minimum spanning tree covering the sample sites. Although the fixed r is somewhat restrictive, the scale parameter m assumes the role of the range parameter ( [48]). More importantly, the fixed r drastically accelerates the computation, as we will demonstrate below.
The log-likelihood (10) can be rewritten using the eigenpairs: Z t (C −1 + hI)Z t and |C|, which have complexity O(n 3 ), are replaced with Z T t E(p −m L + hI)E T L Z t and L l=1 λ m l , which have complexities O(Ln) and O(L), respectively. The computationally efficient log-likelihood specification is not available without fixing the r parameter (because the eigenpair changes depend on the r value). Equation (11) can be maximized exactly like Eq. (10), in which θ 2 = m.
Monte Carlo simulation of the low-rank approach revealed that it accurately approximated the a, b, g, h parameters relative to the original TGH-RF model in a computationally efficient manner as long as the number of basis functions was not too small. Details of the validation study are in Supporting Material B.

2) APPLICATION TO EMULATOR MODEL GROUND TEMPERATURES
The low-rank approach was used to estimate the moment parameters (location, scale, skew, kurtosis) for 7 clusters for the 61 target days. As shown in Fig. A2, the clusters were numbered in accordance with their distance from the center. The estimated parameter values are plotted in Fig. 9. The top panels show them for the three urban clusters, i.e., the three clusters with geometric centers closest to Tokyo Station, and the bottom ones show them for the other four, non-urban, clusters (see Fig. 14). In each panel, the solid lines represent parameter estimates, and the dashed lines represent their 95% confidential intervals.
For the non-urban clusters, location parameter a gradually decreased over the period, whereas for the urban clusters, it remained large across the period. This may be due to the fact that urban materials store heat. Heatwaves are thus estimated to last longer in urban areas than in non-urban areas. The gradual increase in scale parameter b for the urban clusters indicates that heat is more variable at the end of the summer than at the beginning.
Skew parameter g was positively significant for many clusters. The temperatures are estimated to have a right-skewed tail. In other words, the temperature can become extremely high, resulting in hazardous heat in rare cases. This suggests the importance of considering skewness, which is ignored in standard second-order spatial models (e.g., the GP model), in heat risk estimation. By contrast, kurtosis parameter h had large standard errors and thus was statistically insignificant in most cases. This indicates that kurtosis is less important in heatwave modeling. The estimated probability densities for each day for each cluster are plotted in Fig. 10. The distributions for clusters 1, 3, 4, and 5, which are near the center of the Tokyo area, are right-skewed while those for clusters 6 and 7, which are in mountain areas, are nearly Gaussian. This means that if we assume a Gaussian distribution for temperature, the right skew is ignored, and risk is underestimated.

E. SPARSE TGH-RF MODELING 1) OUTLINE
The low-rank approach has two limitations. First, it tends to underestimate small-scale spatial variations (see [52]). Second, it is based on the assumption that the a, b, g, h values are the same across space although they might actually vary across geographical space. To overcome these limitations, we developed a sparse TGH-RF approach by extending the approached used by Gramacy and Apley [22].
Their approach, which is called local approximate GP (laGP), estimates a local GP for each prediction location independently, using n(s * ) local subsamples, which we denote D(s i ). Although the n(s i ) nearest-neighbors (NN) method works well in many situations, the NN-based design is not optimal because accurate GP estimation requires a minimum amount of subsample spread ( [54]). Gramacy and Apley thus proposed a greedy search algorithm to find D(s * ) [22], minimizing the Bayesian mean-squared predictive error. We reduced the computational burden of the TGH-RF model by applying their approach.
Regarding b 2 (s 0 ), Xu and Genton showed that the variance in the TGH process is identical to the variance in GP (see Lemma 3 in [14]). Therefore, for a local TGH process, the conditional variance is given exactly like laGP (see Eq. 5 in [22]): where The following recursive equation is based on the one in [22]: where G j (s j+1 ) = g j (s j+1 )g j (s j+1 ) T . Thus, b 2 (s 0 )|y(s j+1 ) is updated sequentially by substituting (19) and (20) into (17).

3) LOCAL SAMPLING DESIGN
Since the variance updating equations for the local TGH process are identical to those for the laGP, the expression used for the laGP variance can be used here: Therefore, the following local design algorithm, which was proposed in Algorithm 1 of [53], can be used here. 1) Let s k(−j) denote the k nearest neighbors to s 0 among the locations not currently in sub-design s j . Set δ j+1 as the maximum variance reduction from N jk (s 0 ). That is, 2) Set where λ min is the minimum eigenvalue of C(s j , s j ). Let Then, 3) Set j = j+1 and repeat 2 and 3 until either the reduction in variance R u (s 0 ) falls below a prespecified threshold or the local design budget is met.

4) LOCAL TGH PROCESS ESTIMATION PROCEDURE
The following procedure ( [22]) is applicable if the laGP parameter estimation part is replaced with the laGH estimation part. 1) Choose a reasonable starting global lengthscale parameter θ 0 for all sites s 0 . 2) Calculate local design s j for each s 0 independently by sequential application of the procedure above. 3) Independently calculate the MLE lengthscaleθ(s 0 )|s j , thereby explicitly obtaining a global nonstationary predictive surface. 4) Set θ(s 0 ) =θ(s 0 ) possibly after spatial smoothing over all s 0 locations. 5) Repeat steps 2-4 as desired. Then independently output each prediction s 0 based on s j and possibly smoothed θ(s 0 ). Unlike the low-rank approach, the sparse approach estimates local variations of skewness and kurtosis. Furthermore, the independent computation for each site enables parallelizing local model estimations.
We performed a Monte Carlo simulation experiment to examine the parameter estimation accuracy of the local TGH modeling approach. The results suggest that it properly estimates a positive g value if the true process has a positive g. The same is true for negative g values. However, the estimates have a relatively large variation because only a small number of samples are used for the estimation. The local averaging, which we assume in the fourth step of the local TGH-RF estimation procedure, is valuable for reducing the variation so that credible estimates can be obtained.
Regarding kurtosis parameter h, the estimate properly takes values near zero when the true process has a zero h value while the estimate tends to take larger values when the true process has a positive h value. However, the estimates are smaller than the true value. Estimation of the h parameter using small local samples would thus be useful. See Supporting Material B for further details.

5) APPLICATION TO ENUMERATED GROUND TEMPERATURES
The sparse TGH-RE approach was used to estimate the {a, b, g, h} parameters at each site. To analyze the typical spatial patterns of these parameters and also to stabilize the estimates, the sparse TGH approach was fitted to 5-day intervals from July 1, and the resulting parameter estimates were averaged over the target days. For fast computation, the sizes of the local samples, which were optimized using the procedure explained in the previous section, were constrained to be equal to or less than 200. Fig. 11 illustrates the estimated parameters averaged across 11 days (except for July 19, when the mean maximum temperature was the highest). The estimatedâ(s 0 ) parameters  indicate that the temperatures in the urban areas were higher than those in the mountain areas. The estimatedb(s 0 ) values were larger in the mountain areas, where the up-and downdrafts over the mountains make the weather more variable. These results show that this approach captures such local variability in the temperature. In contrast,ĝ(s 0 ) andĥ(s 0 ) were nearly zero across the region. These values were almost zero across the 11 days, meaning that skewness and kurtosis are not so influential, at least on typical summer days. Fig. 12 plots the estimated parameters on July 19, when the mean maximum temperatures were the highest across the target days. Interestingly, their spatial patterns are quite different from the average patterns shown in Fig. 11. Theâ(s 0 ) parameter shows a similar spatial pattern in the mountain areas but much higher values near the center. This reflects the stronger presence of UHI effects near the center urban area.
The large values ofb(s 0 ) in the Tokyo Bay area suggest that sea breezes, humidity, and other sea-related factors make the temperature more variable on hot days. Theĝ(s 0 ) parameter had large positive values for the center of Tokyo, indicated by the ellipse. This suggests that extreme heat means not only an increase in temperature but also changes in the temperature distribution shape, which determines risk (e.g., exceedance probability). Theĝ(s 0 ) values tended to take positive values in the mountain areas as well, meaning that skewness is especially important on hot days.
Theĥ(s 0 ) values tended to be high in the basin area indicated by the white ellipse due to the flow of hot air into the basin. The large values indicating a fat tail might be caused by dramatic changes in temperature depending on the existence of a hot air flow. A fat tail (or a largeĥ(s 0 ) value) appeared only for the intensively hot day, an important finding for evaluating probabilistic risk appropriately.

V. EXPLORATORY TEMPORAL ANALYSIS OF EMULATED GROUND TEMPERATURES
The previous section described the analysis of the spatial variation in the skewness and kurtosis in the ground temperatures. This section further describes the distribution properties, focusing on the temporal aspects while considering the spatial heterogeneity of the distributions.

A. OUTLINE
Given that skewness and kurtosis were spatially inhomogeneous, we divided the study area into seven sub-regions 1 The clustering was performed as follows: (i) The TGH distribution was fitted to {Ỹ 1 (s i ), . . . , Y 61 (s i )}, where 1-61 represents the index of the 61 days, to estimate the moment parameters {a(s i ), b(s i ), g(s i ), h(s i )} for each grid s i . We used the l-moment matching method of Peters et al. [45]. They demonstrated that their method estimates the moment parameters more robustly and accurately than classical moment matching, maximum likelihood, and quantile matching methods. (ii) k-means clustering was applied to {a(s i ), b(s i ), g(s i ), h(s i )}, and the 30,572 grids were grouped into seven sub-regions. We used a standard k-means method that defines the k-mean centers on the basis of the arithmetic means of the subsamples. The next subsection explains the l-moment matching method. The subsequent subsection explains the estimation results for  the l-moment parameters. The next subsequent subsection explains the results of spatial clustering based on the estimated moment parameters.

B. L-MOMENT MATCHING
l-moments [44] are defined by linear combinations of order statistics. The m-th l-moment for a sample of ordered observations An l-moment exists if and only if the distribution has a mean; thus, l-moments can characterize a wider range of distributions than classical moments ( [45]). The first four lmoments are expressed as where F y (u) is the distribution function for the random variable y generated at quantile level u. Because l 3 and l 4 depend on scale l 2 , it was suggested in [44] that population l-skewness τ 3 = l 3 /l 2 and population l-kurtosis l 4 /l 2 be used to measure skewness and kurtosis. Unbiased estimators for the l-moments, which are also called sample l-moments, are given bŷ where q m equals An l-moment matching method was proposed in [45]   (ii) Estimate a and b using their unbiased estimators, resulting inb For further details of these estimations, see [45].

1) ESTIMATION RESULTS
The l-moment matching was applied to the emulated ground temperatures in each grid at 13:00 on the 61 target days  Fig. 13, these parameters changed across regions. Bothâ(s i ) andb(s i ) were higher in the highly urbanized areas, i.e., at the center and toward the west. Urbanization is believed to not only increase the mean temperatures but also to expand the temperature variations, which can cause extreme heat. Unlike the ground temperatures plotted in Fig. 13,â(s i ) was highest to the west of the center. This is reasonable because this area includes an inland area noted for experiencing high heat values. Note that the estimatedĝ(s i ) had positive values, i.e., right-skewedness, in most areas. This indicates that temperature distributions have positive skewness, which can introduce extreme temperature. Theĥ(s i ) values were very small in many areas, which is significant as it indicates the probability of temperatures being higher than normally expected.

2) CLUSTERING RESULTS
As described in Section V-A, to analyze the temperature behavior in each region, we divided the target area into seven clusters (see Supporting Material A for a study on the effects of the number of clusters considered). We applied k-means clustering to the estimated location, scale, skew, and kurtosis related Tukey g-and-h parameters, {â(s i ),b(s i ),ĝ(s i ),ĥ(s i )}.
The results are plotted in the left panel of Fig. 14. The four parameters were standardized before clustering to eliminate scale dependency. For comparison, the k-means method considering only mean temperatureâ(s i ) was also applied. The results (right panel of Fig. 14) simply indicate the tendency of the central area to be the hottest and for the temperature to drop as the distance from the center increases. By contrast, when {â(s i ),b(s i ),ĝ(s i ),ĥ(s i )} were considered, the central area and the north were in the same cluster. The result is intuitively reasonable: these two areas are noted hazardous heat areas due to hot air being carried from central Tokyo by southerly winds. This means that consideration of not only the mean but also the variance, skewness, and kurtosis can reveal the hidden heatwave structure. The goodness of two clustering results were compared using where m(s i ) ∈ {ã(s i ),b(s i ),g(s i ),h(s i )}, the four parameters equal {â(s i ),b(s i ),ĝ(s i ),ĥ(s i )} after standardization, C denotes a cluster, and |C| is the number of grids in the cluster. Distance D is large if the seven clusters are well separated: it was 1.05 for (a) and 0.86 for (b). Consideration of not onlyâ(s i ) but also {b(s i ),ĝ(s i ),ĥ(s i )} was found to be necessary to cluster ground temperatures accurately while consideration of skewness and kurtosis was found to be necessary to determine heatwave risk (see Section I). Fig. 15 displays boxplots of the parameters for each cluster. It shows that the seven clusters are distinctive; i.e., there is little overlap of the value ranges for the mean, variance, and skewness parameters. These three parameters tended to have higher values near the center and lower values in the peripheral areas, indicating that the temperature distribution property greatly depends on the degree of urbanization.

VI. CONCLUSION
This study analyzed the spatial and temporal skew and kurtosis behavior of ground temperatures, which were emulated by combining ground and remotely sensed observations. An l-moment matching approach was used for temporal analysis, and low-rank and sparse TGH-RF models were used for spatial analysis. The results demonstrated the importance of considering skewness and kurtosis in heatwave modeling. Specifically, strong skewness values were estimated at the center of Tokyo, which are known to experience a considerable amount of heat. The low-rank and sparse TGH-RF models were found to be useful for revealing tail structures in large spatial data.
Several issues remain to be addressed. First, we need to extend our low-rank and sparse TGH models to dynamic spatio-temporal models. It is also important to quantify the heatwave ''risk'' that emerges when hazards (e.g., high temperature), exposure (e.g., many people are outside), and vulnerability (e.g., many of them are elderly) are all set in the analysis. Fortunately, data relating these factors are increasingly available. For example, MODIS ground temperatures are useful for evaluating heatwave hazards (i.e., higher temperatures correspond to greater probabilities of heatwave hazards). Moreover, mobile GPS data can be used to determine how many people are exposed to a hazard. National census data including the elderly ratio and household income are valuable for estimating the ratio of people who are vulnerable to heat. District-level heatwave risk estimation using a wide variety of micro spatial and temporal information would be an important step towards data-driven heat risk management, which is increasingly important as global warming advances.

VII. SUPPORTING MATERIAL A. CLUSTER ANALYSIS WITH DIFFERENT NUMBER OF CLUSTERS
Cluster analysis of the parameters {a(s i ), b(s i ), g(s i ), h(s i )} as in Section V-B2 demonstrated that n C = 7 is reasonable, where n C is the number of clusters. Fig. 13 plots the clustering results when n C was optimized by minimizing the Bayesian information criterion (BIC). In this case, n C = 49 was found to be optimal. However, the resulting clusters were extremely difficult to interpret. To obtain an interpretable clustering result, we restricted the number of clusters to at most 10. Fig. 14 plots the Akaike information criterion (AIC) and BIC of the clusters when n C , was 3, 4, . . . 10. Both decreased as n C increased. However, the decrease became relatively slow after around n C = 7. Fig. 15 displays the clustering results when n C was 3, 7, and 10. When n C = 3, the center and suburban areas merged into one cluster. However, the central area is likely to have different heat behavior. Thus, the results for n C = 3 are too simple. The results for n C = 7 have a relatively clear spatial cluster pattern whereas those for n C = 10 have a more mosaic-like pattern.   On the basis of these results, we used 7 clusters as the spatial patterns are clearer than those for 10 clusters (and for 49 clusters with minimum BIC). The boxplots for parameters {a(s i ), b(s i ), g(s i ), h(s i )} have patterns similar to those for the 10 clusters, for which BIC is the minimum for n C ∈ {3, . . . 10}.

B. SIMULATION EXPERIMENT USING LOW-RANK TGH-RF MODEL
We compared our developed low-rank TGH-RF model to the original TGH-RF model of [14] in terms of computational time and estimation accuracy of a, b, g, h.
Samples for July 1 for the cluster including the central area were used for the comparison. In the low-rank approach, the number of eigenpairs in our approximation was increased from 200 to 1200 by 200. The calculations were performed on a Windows 10 64-bit system with 48 GB of memory and coded using R (version 3.4.2).
As shown in Table 2, the computational time required for the low-rank TGH-RF model estimation was much less than that for the original model, especially when L was small. Interestingly, the low-rank approach took only 51 s even when L = 1200 while the original full rank model, which implies L = n = 1500, took 3,494 s. This demonstrates     the computational efficiency of our approach, i.e., estimating scale parameter m instead of range parameter r. Fig. 23 plots the estimated parameters. Those estimated using the low-rank approach were reasonably similar to those for the original approach when the number of eigenpairs was equal to or greater than 600.
In summary, the low-rank approach accurately estimated the a, b, g, h parameters in a computationally efficient manner.

C. SIMULATION EXPERIMENT USING SPARSE TGH-RF MODEL
We examined the estimation accuracy of the sparse TGH-RF approach by simulation. TGH-RF was generated 100 times on 39 by 39 grids, and the estimation accuracy of the g and h parameters at the center (20,20) of the grid space was evaluated by fitting the sparse TGH-RF model. The center is denoted by s 0 . Following the process described in Section IV-E5, we set the sample size used to estimate the VOLUME 9, 2021    model for s 0 to or less than 200. An exponential kernel exp(−d(s 0 , s j )/r(s 0 )) was used to model spatial dependence, where r(s 0 ) is an unknown parameter. The true values for a(s 0 ), b(s 0 ), and r(s 0 ) were 0, 1, and 1, respectively. The true values for the g(s 0 ) and h(s 0 ) parameters were specified as g(s 0 ) ∈ {−0.5, 0.0, 0.5} and h(h 0 ) ∈ {0.0, 0.25, 0.5}; for each case, the sparse TGH-RF model was fitted 100 times.
Boxplots of the estimated g(s 0 ) parameters are displayed at the top of Fig. 24. Our approach successfully detected a positive g(s 0 ) value when the true value was positive. The same is true for negative g(s 0 ). Although the error tended to increase as h(s 0 ) was increased, the estimates remained unbiased. Our approach was found to detect skewness accurately. Boxplots of the estimated h(s 0 ) parameters are displayed at the bottom of Fig. 24. The estimates are nearly unbiased in all cases, and the variances of the estimates are similar across cases. Our approach is thus accurate for kurtosis as well.