Machine Learning-Based Estimation of High-Resolution Snow Depth in Alaska Using Passive Microwave Remote Sensing Data

Snow depth (SD) knowledge is significant in many applications related to hydrology, climate, and disaster management. Many SD models are developed using multifrequency spaceborne passive microwave (PMW) brightness temperature (Tb) observations because of their sensitivity to SD. The sensitivity of Tb to SD is affected by snow metamorphism, which constrains the utility of several empirical and conceptual models for estimating SD. For the first time, extremely randomized trees (ERT), a machine learning algorithm, which is less susceptible to data noise, is used in this study for estimating SD at high resolution (1 km × 1 km) for Alaska. Different ERT SD models (i.e., Alaska wide model, zonal model) are developed using Advanced Microwave Scanning Radiometer-2 data and auxiliary datasets for various Alaska regions during 2012–2021. These models are evaluated using three different cross-validations (i.e., sample, spatial, and temporal). Further, ERT models' predictive power assessment is performed using independent spatial, temporal datasets. The results indicate that: 1) inclusion of auxiliary parameters improves the accuracy of ERT SD estimates; 2) there is no substantial difference between the zonal and Alaska wide ERT model estimates; 3) when SD $ > $30 cm, the ERT models have outperformed the AMSR-2 product, the GlobSnow product, and the Chang model; 4) the mean absolute error in SD estimates increases with a decrease in latitude, an increase in elevation, and from early winter to late winter across Alaska. Overall, this study shows that the ERT SD model has good potential for improving moderate to deep SD estimates.

Space-borne passive microwave remote sensing (PMW) has been extensively used for over four decades  for monitoring various spatiotemporal characteristics of the snowpack including snow depth at different scales all over the globe [16], [17], [18], [19]. PMW radiation penetrates the snowpack, and offers cloud free snow data under all weather conditions with high temporal resolution [20]. PMW emission from a snowpack primarily contains two components, i.e., emission from snowpack volume and underlying ground. The combined emission from these two components is recorded as passive microwave brightness temperature by the sensors onboard the various platforms [21]. The emission from the ground is subjected to both volume scattering, and absorption depending on the state (i.e., dry or wet), microstructural characteristics of the snowpack [21], [22]. In dry snow, where grain size is comparable to PMW wavelengths, internal volume primarily takes place. Snowpacks having a higher depth constitute higher number of snow grains, thereby attenuating more PMW radiation. Larger attenuation of PMW radiation occurs at higher frequencies as compared to lower frequencies [23], [24], [25]. In the case of wet snow, for a given snow depth with increase in wetness, the brighntess temperature increases. Whereas with an increase in grainsize, the PMW brightness temperature decreases [26]. Therefore, brightness temperature observations collected by passive microwave satellites at different frequencies are sensitive to snowpack characteristics such as snow depth, density, and grain size [19], [27], [28], [29], [30], [31], [32], [33], etc. Depending on snowpack conditions, the snow depth penetration of PMW waves varies from 10 to 100 times its wavelength [21], [29]. As a result, PMW brightness temperature data offers valuable insights into understanding snowpack depth. Snow depth is positively related to PMW scattering and negatively related to brightness temperature [34]. Thereby, PMW brightness and temperature observations are extensively used in many studies for estimating snow depth over different parts of the world [16], [18], [35], [36], [37], [38]. Further, PMW observations are also used for the development of hemisphere scale operational snow products, i.e., AMSR-2 snow depth [39], GlobSnow snow water equivalent (SWE) [40], etc. AMSR-2 product is developed using a dynamic approach consisting of polarization ratio to account for snowpack microstructure variation [41]. Whereas, the GlobSnow product [40] is developed through the assimilation of synoptic snow depth observations into the Helsinki University of Technology (HUT) snow emission model [42]. The forward brightness temperature simulations from the HUT model, and PMW observations from various instruments onboard the NIMBUS-7 and Defense Meteorological Satellite Program (DMSP) satellites are used in the Bayesian difference minimization scheme to retrieve snow depth. However, both products have problems with overestimation of shallow snow depth, and underestimation of deep snow depth as demonstrated by several studies [38], [43], [44].
Many algorithms for snow depth estimation were developed using an empirical relationship between snow depth and brightness temperature difference of different frequencies and polarizations [16], [18], [41], [45], [46], [47]. However, the relationship between snow depth and brightness temperatures is not linear and is affected by various other snowpack properties, topographic, and environmental variables [14], [19], [23], [32], [48], [49]. The nonlinear and conceptual models developed by different researchers [32], [42], [50], [51], [52] to account for the limitations of linear models also have drawbacks in estimating snow cover thickness due to saturation of microwave response with increasing snow depth, and snowpack conditions [53]. Snow depth estimation using PMW brightness temperatures is an ill-posed problem that requires knowledge of various variables related to snowpack microstructure [22], [52], topography [54], [55], and landcover characteristics [29], [41], [47]. Data on all these variables are scarcely available; hence, many linear [16], [18], nonlinear [32], [56], and conceptual models [42], [52] were developed by generalizing the unknown parameters. Therefore, these models can result in biased estimates if the study region does not relate well with the generalized parameters. In this regard, the potential of data-driven techniques comprising various machine learning models such as support vector machines, random forests, etc., have been explored in several studies [33], [44], [57], [58], [59], [60], and these techniques have resulted in improved snow depth estimates. For example, Wang et al. [58] have developed a deep learning model for estimating snow depth over Alaska. Their method has shown good accuracy in snow depth estimations, with a mean absolute error (observed in cross-validation results) as low as 10 cm. However, their method [58] is not evaluated independently in different regions. Further, no inference was made on the model performance on spatially and temporally independent data locations. Though machine learning models have shown potential in estimating snow depth, these models also have limitations such as poor explainability, complex architecture, high computation time, extensive data requirement for training, etc. For example, machine learning models such as random forest and support vector machines in their default settings are susceptible to noise and outliers in the data resulting in high bias and variance in the machine learning model estimates [61].
Extremely randomized trees (ERT) [62] is a machine learning algorithm that partially addresses the long training time and high bias-variance problems associated with conventional tree-based models in the case of large noisy datasets. It has been successfully implemented in many studies, such as estimating soil moisture, PM 2.5 [56], [63], [64], etc. However, the potential of the ERT algorithm in estimating snow depth has not been investigated so far. PMW TB observations for a given snow depth can vary depending on the snowpack conditions (i.e., snow grain size, density, wetness, etc.). Thus, the PMW TB data used for developing snow depth models generally contains significant noise. This is more prominent in the case of thicker snowpacks, as the TB saturates with an increase in snow depth. The heterogeneity in TB resulting from contributions from mixed land canopy, diverse snowpack microstructures involving larger grain size, and density can lead to noise in the data, thereby affecting the training and testing of snow depth estimation models. Therefore, the present article presents ERT-based approach for snow depth estimation in the Alaska region. While many studies based on machine learning techniques are carried out to estimate and understand snow depth using PMW [33], [44], [58], studies addressing the variation associated with climatological zones are not present. Alaska is a large and important region in the Arctic region with a diversity of environments, from coastal to interior and from temperate to arctic. In this regard, this article also investigates if there is any advantage to implementing zonal models (explicitly developed for different regions in Alaska) over the Alaska wide model (designed for the whole of Alaska). The performance of a machine learning model can be affected by model drift resulting from conceptual and data shifts that may occur temporally [65]. This is very important in earth science problems [66], where many variables such as temperature, precipitation, etc., tend to have cyclicity and spatiotemporal variations. Hence, spatial and temporal validation of the models is essential. However, previous studies [58], [67] conducted over Alaska for snow depth estimation using machine learning have not conducted such an evaluation while developing the snow depth model. Localized spatiotemporal variation in attributes affecting the PMW response may impact the machine learning model (Alaska wide snow depth model), as the model generalization related bias can impact the predictions. Hence, this article also presents an overview of how MAE in model estimates of snow depth vary with different auxiliary parameters.
Accordingly, the current study has the following four objectives.
1) Development of new snow depth models using ERT algorithm for Alaska, the study region. 2) Evaluating the performance of models on spatially and temporally independent observations. 3) Assessment of the zonal, Alaska wide ERT snow depth models, AMSR-2, GlobSnow, and Chang snow depth products in different regions within the study area. 4) Analysis of MAE of ERT snow depth model estimates in the study area with respect to the auxiliary variables. The rest of this article is organized as follows. Section II describes the study area's geographical, topographical, and climate attributes. Snow season climatology is also briefed in Section II. This emphasizes why the assessment of zonal regional models is carried out in this study. Section III describes different datasets used in the study. Following that, Sections IV and V present the methodology and discuss the results, respectively. Finally, Section VI concludes this article.

II. STUDY AREA
Alaska is present between the latitudes of 51 • N to 71 • N and longitudes of 130 • W to 180 • W, as shown in Fig. 1. Geographically, it is located in the higher latitude region of the northern hemisphere; hence the cold climate is present for most of the months of the year. Alaska has a long coastline of 53 100 km as a boundary shared with three oceans, i.e., the North Pacific, the Bearing Sea, and the Arctic Ocean, in the north, south, and west directions, while the east side boundary is shared with Canada. Brooks and Alaska mountain ranges located in the north and south-central areas, respectively (see Fig. 2), are the prominent topographical features of Alaska state. The southeast of the Alaskan range is characterized by steep terrain.
Alaska's geography and coastal configuration strongly define the climate of various regions in Alaska. Bieniek et al. [68] have identified 13 different climatologically consistent regions in Alaska. In the current study, for developing machine learning models these 13 regions are grouped into six zones based on the previous climatological regions [69], adjacency, and land surface temperature (LST) variation. There exists a substantial variation of LST across various Alaska zones. Different climatic zones of Alaska, mean LST variation (2012-2020), and average snow depth observed over each zone during 2012-2021 are shown in Fig. 1. The northern region has lower temperatures and less precipitation. In contrast, southern and western coastal parts experience relatively higher temperatures and higher precipitations. Northern Alaska has a polar climate, Central Alaska has a continental climate, and Southern Alaska has a maritime climate.
Snow cover and snowfall are prominent features of the climate in Alaska. The snow season constitutes all months of the year except May through September. However, strong interannual variation exists in the snow pattern across various zones of Alaska. Mountainous and northern locations have stable snowpack that does not melt out until late May or June. In contrast, snowmelt events are common in interior regions from late April to early May and march to April in low-altitude regions of southern Alaska. Further, even during the winter, some of the Aleutian Islands and coastal areas of southeast Alaska do not have consistent snow cover. This diverse climate and topographical conditions prevailing in Alaska pose significant challenges in estimating snow depth using remote sensing.

III. DATASETS
Auxiliary features such as vegetation and elevation can impact the accuracy of snow depth estimations from snow depth models [54], [70], [71]. Hence, along with passive microwave data, various other auxiliary datasets are used in this study for constructing the ERT snow depth model. The details of different datasets are given in the following Sections III-A to III-D.

A. In Situ Snow Depth Observations
The Global historical climate network (GHCN) compiles the daily summaries of in situ weather observations such as air temperature, precipitation, snow depth, and others from a network of more than 10 000 worldwide stations. The daily snow depth datasets contain information about the station, such as station id, latitude, longitude, elevation, quality flags, and snow depth. Daily snow depth observations for the snow season, i.e., from September of each year to May of next year during 2012-2021 from 167 stations over the Alaska region are collected from the GHCN archive of climate data online portal (https://www.ncei.noaa.gov/cdo-web/). Quality control of observations is carried out before storing the data in the GHCN archive. A total of 135 348 snow depth observations collected during the study period are used for developing and evaluating different snow depth models. These observations are separated into six groups based on the zones divided (see Fig. 1).

B. AMSR-2 PMW Brightness Temperature Data
Advanced Scanning Microwave Radiometer -2 (AMSR-2) is a PMW sensor onboard the JAXA Global Change Observation Mission 1st-Water (GCOM-W1) SHIZUKU satellite launched in May 2012. AMSR-2 sensor records microwave emissions from the earth in 14 different channels operating at seven different frequencies (6.9, 7.3, 10.65, 18.7, 23.8, 36.5, and 89.0 GHz). It has global spatial coverage with different spatial resolutions at different frequencies and a temporal resolution of 1 d. The selected specifications of AMSR-2 are given in Table I. More details about the GCOM-W mission and AMSR instrument are available in the JAXA portal (https://suzaku.eorc.jaxa.jp/ GCOM_W).
AMSR-2 level-3 gridded brightness temperature data of the descending pass at 10 km resolution in 14 channels (7 frequencies, with H and V polarizations) during 2012-2021 for the snow season (September to May) is used in the current study. The AMSR-2 snow depth product is also collected to compare with the ERT snow depth product and in situ snow depth observations.

C. Elevation Data
Topographic information such as elevation, slope, and aspect affect the characteristics of snowpack [54], [70], [71]. Further, in a given region, high altitude mountain ranges typically have higher snow depth compared to less altitude regions. Hence, elevation and its derivatives are used in different studies for improving the accuracy of snow depth estimates [32], [33], [58]. GTOPO30 is a global elevation model dataset that provides elevation at a spatial resolution of 30 arc seconds (approximately 1 km) and is used in this study along with other datasets for estimating snow depth.
The topographical elevation variation over different regions is shown in Fig. 2. The highly elevated regions in Alaska are typically underlain by the presence of mountains, which act as natural boundaries between different climate zones and define the regional climate in the respective regions. Apart from mountain ranges, almost 50% of the area is relatively flat and present within 400 m elevation.

D. Vegetative Cover Fraction
Forest cover fraction intercepts PMW emission from the snowpack beneath the ground and contributes its own emission, resulting in inaccurate model estimates of snow depth [17]. Hence, many studies have attempted to account for the effect of forest canopies while estimating snow depth using PMW remote sensing [17], [19], [72], [73].
In the current study, continuous vegetation product from MODIS, i.e., MOD44B -Version-6 [74] is used to account for the vegetation cover in the study area. MOD44B dataset provides information on the tree and nontree vegetation cover fraction at 250 m. In the Alaska region, the north slope region is devoid of vegetation, while continental Alaska comprising the south, central, and north interior regions has extensive forest cover, i.e., as much as 50% in some locations, as shown in Fig. 3.

E. AMSR-2 SD Product
AMSR-2 SD product is developed using brightness temperature observations from ASMR-2 sensor onboard the GCOM-W1 satellite. The SD retrieval algorithm initially checks for the snow depth condition, i.e., shallow or snow. The polarization factor that is used to control the effect of snow grain size is optimized throughout the product development [41]. Following that, the snow depth is retrieved based on the landcover condition using brightness temperatures (at frequencies of 10, 18, and 36 GHz), polarization factor [75]. In this study, AMSR-2 SD product during 2018-2021 is used to compare with ERT snow depth estimates.

F. GlobSnow Product
Global Snow Monitoring for Climate Research (GlobSnow) is a satellite-based daily SWE product for the northern hemisphere [40], [76]. GlobSnow SWE products were developed using PMW data from the Scanning Multichannel Microwave Radiometer (SMMR, 1980(SMMR, -1987, Special Sensor Microwave Imager (SSM/I), and the Special Sensor Microwave Imager/Sounder (SSMI/S) (1988-till). The SWE estimation in GlobSnow consists of three essential steps: 1) estimation of snow grain size for in situ locations using an inversion scheme with HUT model; 2) generation of background snow depth and snow grain size using Kriging; and 3) SWE estimation using a Bayesian assimilation scheme. The near real time GlobSnow SWE product (during 2018-2021) is used in this study for comparing with ERT snow depth model estimates. The Glob-Snow SWE is converted to SD by adopting a density value of 240 kg/m3.

IV. METHODOLOGY
The present study implements the ERT algorithm for developing different snow depth models using the PMW brightness temperature data and auxiliary data consisting of geographical and topographical attributes. A large-scale ERT snow depth model (i.e., Alaska wide model) for the entire Alaska is developed utilizing the data from the whole of Alaska. Similarly, a zonal ERT model is developed separately for each of the six different climatic zones (see Fig. 1) of Alaska to understand if there is any advantage in using these zonal models. The Alaska wide model is implemented with two different configurations, i.e., with and without auxiliary attribute data, to emphasize the importance of topographical, vegetation, and location parameters for model development. Details about the ERT algorithm and a description of the methodology are given in Sections IV-A and IV-B, respectively.

A. Extremely Randomized Trees
An extremely randomized tree [62] is an ensemble of decision trees and is a variation of the random forest tree technique. ERT combines multiple fragile learners to generate a strong predictor using the ensemble process. In high noise datasets, traditional tree-based algorithms such as random forests have drawbacks such as high bias and variance caused by the optimal cut-point selection [62] at each node. ERT attempts to address the high variance in the predictions from a noisy dataset by randomizing both the cut-point and attribute selection at each node. Unlike bagging models, where boot-strapped data are used for learning, entire samples are used for learning in each decision tree of the ERT. Thus, several decision trees generated from the entire training sample are ensembled to ascertain the final result of the model. The random split-point selection and usage of the entire training data for tree development helps to reduce the bias, while ensembling aids to prevent the model from overfitting and making it insensitive to noise in the data. Another advantage is that ERT offers a considerable reduction in training time, i.e., as much as 50%-60% compared to other tree models such as random forest and tree bagging depending on the training samples and attribute space. A simple figure illustrating the extreme randomized tree model is shown in Fig. 4.
By default, the entire attribute space is used while developing the ERT model to get highly randomized trees in an ensemble, producing less variance. While the minimum number of leaves for split depends on the nature of the dataset, in general, a higher number for split leads to smaller trees with higher bias and variance. The optimal number for this shall be set based on the noise present in the dataset. The number of trees is another parameter, where error monotonically decreases with an increase in the number of trees. However, a large number of trees increases computation time. Hence, these parameters need to be configured carefully as each setting produces a result with varying training time and accuracy.

B. Model Development
Detailed information involving various processes such as pre-processing, model development, etc., are given in following subsections from 1 to 3. Various steps implemented for the development of the model are shown in Fig. 5.
1) Data Preprocessing: All datasets are processed to match the resolution and coordinate system. Elevation, slope, and aspect data are of 1 km resolution, and vegetation fraction data is of 250 m resolution, whereas AMSR-2 brightness temperature data is at 10 km resolution. Accordingly, in this study all datasets are resampled to 1 km × 1 km resolution. The direct resampling of AMSR-2 TB will not change the TB values within the area occupied by the original 10 km pixel. Consequently, for the entire 10 km × 10 km grid of AMSR-2, TB values would remain the same. However, the distribution of snow depth at a given location can be greatly affected by topographical parameters. The data for these parameters (i.e., elevation, slope, aspect, and vegetation fraction) exhibit considerable spatial variation at 1 km resolution. Therefore, the snow depth model developed by incorporating the auxiliary parameters will show variation at 1 km. After resolution matching, information from these datasets for selected GHCN stations is retrieved (based on latitude and longitude) for further processing comprising data standardization and transformation. Standard techniques in exploratory data analysis and feature engineering of machine learning such as data cleaning, logarithm transformation, one-hot encoding, etc., are implemented on relevant features during the data processing. The in situ snow depth observations are screened for missing values and error values in the initial processing. The microstructure characteristics of snow can vary with snow accumulation, stabilization, and melt processes, which are temporal activities [77]. Hence, in this study, the time information is used along with other variables for training the ERT model. The snow depth distribution in different zones of Alaska (see Fig. 7) is analyzed to understand and prepare the snow depth data for training the machine learning model. The results of the snow depth analysis are given in Section V-A.
2) Screening of Snow Pixels: The identification of snowcovered pixels in the study region is conducted using Grody's decision tree [78]. Different filters are applied to the brightness temperature data of AMSR-2 for identifying the pixels representing precipitation, wet snow, frozen desert, etc. These nonsnow pixels are removed from the data to contain only snow pixels.

3) Training the ERT Model:
The standardized and preprocessed data from GHCN stations is partitioned into different groups (i.e., group-A, group-B) as given in Table II for the development and testing of the model. Approximately 10% of stations from each zone are randomly selected to form group-B stations, and the remaining stations are considered as group-A. Data for the snow season, i.e., from 1st September of each year to 31st May of next year during 2012-2021 (9 years), is considered for developing and evaluating different models. The ERT model is trained using the screened snow pixel data from Grody's tree to fit the relationship between snow depth and multiple selected independent variables, i.e., brightness temperature data, topographical data, location, and time. Different datasets, i.e., in situ snow depth, PMW brightness temperature data, and topographical information, are processed and divided into six groups (i.e., one group for each zone of Alaska) to develop different snow depth models. The data grouped for each independent zone is used in training the respective zonal ERT snow depth model. Further, Alaska wide single snow depth model is developed by training the model using data from all zones.
Most of the early studies used 18 and 36 GHz brightness temperatures to estimate snow depth [16], [18], [29]. However, few studies have also reported that based on snowpack conditions, brightness temperature of 10 and 89 GHz correlates with snow depth [79]. Henceforth, the correlation of brightness temperature from a particular frequency with snow depth is highly variable and largely dependent on the snow depth conditions, which vary from place to place, and sampled subsets. Further, in data-driven models, the physical/conceptual relation between attribute space is not essential for the development of the models. Therefore, unlike previous studies, where only brightness temperature data of selected frequencies are used, in this study, all seven frequencies in both polarizations (H and V) are supplied to the ERT algorithm to construct the snow depth model. However, before developing the model, the parameters which have high mutual correlation (>0.9) are identified. Following that, different SD models are developed using all single parameters from each correlated group. The feature resulting in the lowest mean absolute error is selected from each group in addition to the other variables for the model development. Thus, feature optimization is carried out during the development of all SD models. The relative feature importance score (based on gini-impurity reduction) calculated for different parameters is depicted in Fig. 6. These results indicate that each of the single parameter used in the model has a weak connection with SD for deriving the model. This can lead to a lot of uncertainty in estimating SD from the developed SD model. However, the optimal model configuration, together with feature combination, can address this problem to a certain extent. Further, the selected machine learning approach, i.e., ERT is reliable in these situations [62], where generalization of the models is highly necessary. Therefore, for the development of the model, snow depth is considered as a function of brightness temperature at different frequencies, location, and topographic information such as altitude, slope, aspect, vegetation cover fraction, etc. Along with these parameters, time is also considered an important parameter as it can have an impact on the characterization of snowpack microstructure where Tb i = Brightness temperature of frequency i (where i varies as 6, 7, 10, 18, 23, 36, and 89 GHz), La = latitude, Lo = longitude, H = Elevation, S = Slope, A= Aspect, F = forest cover fraction, and T = month.
Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply. In each zone to develop the ERT snow depth model, the training dataset is considered as a set of n-number of observation samples, i.e., S i = S 1 , S 2 , ..., S n , where each observation sample S i = f 1 , f 2 , ..., f p is a p-dimensional vector, with f 1 , f 2 , ..., f p representing the value of p no of characteristic features such as brightness temperature, elevation, slope, vegetation, etc. Though ERT is less sensitive to outliers and multicollinearity, removing multicollinearity can improve the training time [62]. Hence, before training the ERT model, highly correlated variables (above 0.9) are identified and filtered in each model's dataset to remove multicollinearity. The ERT model has three important user-defined parameters, i.e., 1) number of trees (M); 2) number of attributes selected at each node (K); 3) minimum number of samples required for splitting the node (n min ). Adding more trees does not necessarily mean a significant improvement to the ERT model, as the gain in the predictive performance of the algorithm would be small with the addition of more trees [80]. Further, the model/errors cost also stabilizes after a certain number of trees. Hence, hyperparameter tuning is used to select M and n min values. For training the model, as suggested by [62], all features and the entire training data sample are selected at each node for creating M number of independent decision trees. In the current study, M and n min values varied from (150-200) and (4-8), respectively, across different regions for different ERT models. The mean of the results computed from all trained decision trees is considered as the snow depth estimate for the given sample.
The evaluation of the model is carried out using three different types of cross-validations. Then predictive power of the model is assessed using spatial and temporal independent datasets. The detailed information about ERT model(s) evaluation and assessment of predictive power is described in Section IV.

4) Evaluation Approaches of the Model(s):
The performance of the developed ERT models is evaluated using the K-fold cross-validation (K-CV) technique [81], [82], where the data is divided into K-groups, and samples in each group are tested using the model trained from the remaining groups. Thus, the model is evaluated for K-times using a different test data fold each time, thereby testing all samples. The model evaluation metrics obtained during testing of each fold are averaged to calculate the final assessment of the model performance. The CV approach enables checking for problems such as overfitting and bias-variance variability. However, in the normal K-fold CV approach, validation is carried out using randomly grouped equal-sized samples (sample-based K-CV), which are often not spatially and temporally independent. Hence, sample-based CV does not indicate the generalized performance of the model. Therefore, in the present study, along with random samplebased CV two more approaches, i.e., temporal and spatial K-CV are used to evaluate whether the model is generalized spatially and temporally. These additional grouped K-CV approaches have been used for the evaluation of machine learning models in various other studies [44], [83], and operates similar to the sample K-CV approach. In these CV approaches, instead of random samples, year information from the date (time) and station location id (space) were grouped into K-folds for temporal and spatial K-CV, respectively. For sample-based K-CV, the K value is set to 10. In contrast, in space and temporal CV, the K value is set as 3 to facilitate a sufficient amount of data for training and testing the model. Data from group-A stations during 2012-2013 to 2017-18, is used for implementing all three types of cross-validation (see Table II). Standard regression metrics, i.e., mean absolute error (MAE), root-meansquare error (rmse), correlation coefficient (R), and coefficient of determination (R 2 ), are used for the evaluation of the model performance. The grouped K-CV provides an average estimate of the model performance in different spatial locations and time periods. However, the drawback of K-CV evaluation is that it underestimates the variance error [81] of predictions. Hence, in this study, in addition to three types of cross-validation, the temporal and spatial predictive power of models is estimated using two different holdout datasets. For assessing predictive power on temporally independent observations, data from group-A stations (2018-2019 to 2020-2021) is used for testing; whereas for spatiotemporal predictive power group-B station observations (2018-2019 to 2020-2021) are used as testing data (see Table II).

A. Spatiotemporal Variation of Snow Depth
In order to develop the machine learning model(s), the characteristics of the observed snow depth in different zones are analyzed and reported herein. The descriptive statistics and variation of observed snow depth in different zones are given in Table III and Fig. 7. The distribution of observed snow depth is not uniform across the different zones (see Fig. 7). Among different zones, zone-6 (Z6) has the highest mean snow depth (i.e., 53.92 cm). The lowest mean snow depth (i.e, 24.90 cm) is observed in zone-2 (Z2). Except in Z6, in the remaining zones, almost 95% of the observed snow depth values are below 90 cm, and 99% of observations are within 120 cm. However, few stations in Z3 and Z6 have recorded snow depth observations exceeding 200 cm.
Other than Z6, in the remaining zones mean snow depth is close to 30 cm. All zones indicate a positive skewness in the data, with varying interquantile ranges, which indicates the heterogeneity in the distribution of observed snow depth values across different zones. Zone-5 (Z5) and Z6 are geographically located near the south coast region and experience higher temperatures, and precipitation (rain, snowfall). Thus, the snow depth in Z5 and Z6 shows high variation with several of the observations having significantly higher snow depth values than the mean zonal snow depth. Thus, the distribution tail (last five percentile) in Z5 and Z6 indicates a wide snow depth range with a difference of almost 48% between the 95th percentile and the maximum value. This can lead to high uncertainty in ERT predictions in Z5 and Z6.

B. Effect of Auxiliary Parameters, Model Selection
Two different models, i.e., one with auxiliary parameters and another without auxiliary parameters, are developed to evaluate the overall impact of additional parameters i.e., topographical, landcover, and location attributes on the accuracy of snow depth estimates. Data from the group-A stations of the entire Alaska region from 2012-2013 to 2017-2018 was used for developing ERT snow depth models. The sample-based cross-validation results indicate that the ERT model with auxiliary parameters has given better estimates of snow depth than the ERT model without auxiliary parameters with an improvement in R (R 2 ) from 0.67 (0.35) to 0.86 (0.74), and a decrease in MAE (rmse) values from 17.9 (29.43) cm to 11.6 (18.73) cm, respectively. Hence, auxiliary parameters are included in the development of both the Alaska wide and zonal models. Along with the ERT SD model, random forest and support vector machine algorithm-based SD models are also developed using the PMW data and auxiliary parameters. The results emphasize that, all three machine learning models have shown similar performance with no substantial contrast in the results. The values of R (R 2 ) are 0.86 (0.71) and 0.83 (0.68) for random forest SD model, SVM SD model, respectively. Whereas, the MAE (rmse) values  9. Evaluation and comparison of the predictive power of models in different regions using temporally independent data (a) and (b) and spatially independent data (c) and (d). ERT i (i vary from 1 to 6), ERT all represents alaska wide, zonal ERT model. of the random forest and SVM SD models are observed to be 12.36 (20.84) cm, and 13.85 (24.49) cm, respectively. Evidently from these results, the ERT SD model has shown moderately better performance with a lesser mean square error among the three models. The cross-validation results observed in this study are also very similar to those results observed by Wang et al. [58], where different deep learning models are implemented to estimate snow depth. Therefore, the ERT approach is considered for carrying out further development, and analysis of different snow depth models. A comparison of different types of crossvalidation results for Alaska wide and zonal ERT models with auxiliary parameters is discussed in Section V-C. The detailed comparison of random forest and support vector machine model results is further given in Annexure-1.

C. Sample, Spatial, and Temporal Cross-Validation Results of ERT Models
In addition to the regular sample-based CV approach, spatial and temporal CV assessments are implemented in this study to assess the model's transferability and performance in varying space and time domains. The results of the tenfold sample, threefold spatial, and threefold temporal CV in their respective zones are given in Fig. 8 as well as in Table IV. In the figure, ERT all , ERT 1 , ERT 2 , ERT 3 , ERT 4 , ERT 5 , and ERT 6 denote the models developed for respective zones (i.e., the entire Alaska, and Z1 to Z6, respectively). The model parameters are configured by implementing hyperparameter tuning for all ERT snow depth models so that they are trained without any overfit and generalized within their model regions. A comparison of the three types of CV results from ERT models developed in each zone reveals that, with the exception of ERT 2 , there is no substantial difference between the rmse, and R 2 and other evaluation metrics for sample, spatial, and temporal validations. There is a contrast between three different CV evaluation results in the ERT 2 model. This could be due to the Z2, which has a relatively lesser number of stations with samples having varying distribution characteristics in different spatial and temporal domains.
Overall comparison of zonal models indicates, R 2 is relatively consistent in all zones (ranging between 0.7-0.8), whereas rmse error is considerably varying (ranging between ∼10 and 29 cm) from one zone to another. This indicates ERT snow depth models have shown consistent explainability for snow depth estimations, with varying magnitudes of errors across different regions. The variation in rmse is positively related to the mean snow depth (see Figs. 7 and 8) in various zones. The highest rmse (∼29 cm) and MAE (∼20 cm) are observed in Z6, which is present in the southeast and southwest coastal regions that has the highest mean snow depth among six zones, i.e., ∼54 cm of Alaska. Similarly, the lesser rmse (∼6 cm, ∼8 cm) and MAE (∼4 cm, ∼6 cm) are observed in Z2 and Z4, respectively, where mean snow depth is lower when compared with other zones. The Alaska wide model (ERT all ) shows MAE of ∼13 cm and rmse of ∼20 cm in all three cross-validations. The CV results of the ERT all model are comparable to the previous study carried out by Wang et al. [58], over the same study region (Alaska) using different types of neural networks. The results from Wang et al. [58] indicate an MAE ∼(10-14 cm) and rmse ∼(18-22 cm) for different neural network-based snow depth estimates. Except ERT 6 , all other zonal ERT models (i.e., ERT 1 to ERT 5 ) have lesser rmse, MAE, and higher R, R 2 values in their respective zones when compared with those of the ERT all model across the entire Alaska. These results demonstrate the need for a comparative assessment of the predictive power of Alaska wide and zonal models in different Alaska regions. This comparative assessment enables to understand if there is any merit in using zonal ERT models. Further, the cross-validation results are often not the true representation of the predictiveness of a machine learning model, as the mean square error can be underrepresented [81]. Hence, the predictive power of the ERT snow depth models in different zones is evaluated as given in Section V-D.

D. Evaluation and Comparison of the Predictive Power of the ERT Models
The characteristics of the snow depth data in each zone are different compared to the snow depth data combined from all zones (see Table V). Further, each Alaska zone has different climatological and topographical conditions. This climate variation can lead to snowpack having the same brightness temperature for different snow depths. These multiple factors together can impact the feature scaling, feature selection, and training phases of a machine learning model. In this regard, the predictive power of the ERT models is examined using spatial, temporal independent datasets (see Table II). Further, snow depth estimates from models are analyzed in snow depth bins divided with an interval of 30 cm. The results from comparison of predictive power of the models, and snow depth binning are given in the following sections. Further, an analysis of error in models snow depth is carried out with respect to different auxiliary parameters as given in Section III.

1) Comparison of Predictive Power of Different Models:
The trained zonal and Alaska wide ERT models are used to estimate snow depth for temporally (from group-A stations) and spatially (from group-B stations) independent observations. These estimates from the Alaska wide and zonal ERT models, AMSR-2 and GlobSnow snow depth products, and the Chang model in each zone are compared with in situ snow depth observations of spatially and temporally independent datasets. The results of the comparison are given in Fig. 9. The rmse (MAE) metrics of different models calculated across the entire Alaska region for temporally independent data are as follows: ERT all 30.30 (19.78)  However, the ERT all model performance is not uniform across different zones. The analysis of the ERT all model snow depth estimates in individual zones depicts varying magnitudes of rmse, MAE, and correlation metrics compared to the average performance over all zones. This variation in performance can be attributed to the change in snow depth observations range, and diversity of auxiliary parameter characteristics in different zones. Within a given zone, both Alaska wide and zonal ERT snow depth models have exhibited similar performance with comparable rmse, MAE, R 2 , and R metrics in the temporal (see Fig. 9(a) and (b) ], and spatiotemporal [see Fig. 9(c) and (d) ] predictive power evaluations. Overall results indicate both ERT models have performed better compared to the AMSR-2, GlobSnow products, and Chang model. The detailed metrics with regard to predictive power evaluation for temporally independent observations, and spatially independent observations is given in Table VI. The results from spatial, temporal predictive power assessments of ERT snow depth models show an increased rmse and MAE, decreased R 2 , and similar R values when compared with CV results (see Section V-C). Thereby, it indicates a decrement in explainability of the model outside the training data in spatial, and temporal domains. This increment in rmse and MAE across different zones is excepted as cross-validation underestimates the variance error in regression [81]. Despite the decrement in performance compared to CV results, in all zones, ERT snow depth models have shown better estimates with a lesser rmse, and MAE, and higher R 2 , and R values compared to the AMSR-2 product, GlobSnow product, and Chang model. A figure representing the spatial distribution of the ERT SD, AMSR-2 SD, and GlobSnow SD product is shown in Fig. 10.
The similarity of data characteristics between training and testing data can aid in getting a better model performance. But similarity alone does not guarantee that the trained model will perform better on test data. The characteristics of training and testing data (i.e., temporally partitioned data) are given in Table V. The metrics such as mean, standard deviation, skewness, etc., when compared indicates a significant contrast between the training and testing datasets over different zones. Other than topographical parameters, landcover, the snowpack characteristics have an impact on the PMW brightness temperature. Therefore, considering limited parameters without snow characteristics for snow depth model development can lead to varied performance of model for similar data in different regions. Hence, the ERT model(s) prediction performance is expected to deviate from the CV results, and also from one region to other.
2) Binned Analysis: To comprehend the error associated with change in snow depth, ERT and other model estimates are partioned into different bins by considering regular snow depth intervals (i.e., of 30 cm) of GHCN in situ observations. Snow depth is analyzed in 5 slabs, i.e., less than 30 cm, 30-60 cm, 60-90 cm, 90-120 cm, and above 120 cm. The MAE of different models estimated snow depth values in each defined interval are given in  better performance with the moderate improvement compared to AMSR-2, GlobSnow products, and Chang models. Once the snow depth exceeds 60 cm, the quality of the AMSR-2 product, GlobSnow product, and the Chang model strongly deviated showing a larger MAE when compared with actual observations. AMSR-2 product, Chang models rely primarily on the brightness temperature difference between 18 and 36 GHz frequencies, which saturates after a threshold of snow depth. Further, the correlation between the snow depth and the brightness temperature difference is not strong and varies based on multiple factors such as snowpack characteristics, topography, landcover, etc. Hence, the snow depth estimates from the Chang model, AMSR-2 product, and GlobSnow product deviate considerably from observed snow depth. In the case of the GlobSnow product, the in situ snow depth observations are used in the HUT model for optimizing grain size. Hence, in heterogeneous climate regions, and in places where synoptic snow observations are not available, the GlobSnow snow depth estimates are susceptible to large errors. ERT models have shown substantial improvement in accuracy compared to other products even when snow depth exceeds 60 cm. ERT models have produced improved snow depth estimates compared to other products, resulting from the inherent schema of the ERT algorithm; where in situ observations are grouped into different clusters based on the variation of randomly selected independent features at different nodes during the development of the snow depth model. The overall results from binning analysis demonstrate that ERT models are Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply. 3

) MAE Analysis of Model Estimates With Respect to Auxiliary Parameters:
The change in auxiliary parameters can have an impact on snow depth characteristics as well as PMW brightness temperature observations. Therefore, it is important to understand how the variation of auxiliary parameters within the study region is affecting the model snow depth estimates. Hence, the change in MAE of snow depth estimates from four different models/products (ERT all , AMSR-2, GlobSnow, and the Chang model) with respect to different auxiliary parameters is analyzed (see Table VIII). The results denote a systematic variation of MAE in snow depth estimates with latitude, elevation, vegetation, and season. Both ERT models have similar performance as evident from the results of CV and predictive power evaluation. Therefore, ERT all model estimates (i.e., from temporal predictive power assessment) for the entire Alaska along with other models are considered for this evaluation.
From the north slope region to southern Alaska, with a decrease in latitude, the MAE of ERT all model increased from 9.52 to 37.12 cm. A higher mean snow depth is present in the upper latitude region (68 • N-72 • N) than the midlatitude region (60 • N-68 • N). However, the ERT all model has shown a lesser MAE in upper latitude region than the midlatitude region. This can be attributed to a lack of vegetation cover, and stable snow conditions that are prevailing in the upper latitude region of Alaska. Though, ERT all has less MAE (i.e.,9.52 cm), AMSR-2, GlobSnow, and Chang model showed large errors, i.e., 39.08, 61.62, and 47.37 cm, respectively, in the upper latitude region. In other snow depth products, the MAE varied linearly with the mean snow depth across different latitude ranges. Higher elevation (>900 m) regions have higher snow depth (i.e., 84.96 cm) and MAE (i.e., 37.07 cm), than lesser altitude regions (<300 m) which have relatively less average snow depth (i.e., 48.72 cm) and MAE (i.e., 18.27 cm).
There is no substantial variation in the trend of MAE observed with respect to slope, aspect, and longitude. However, areas with little to gentle slope (< 10 • ) have a relatively lesser MAE compared to those having a moderate to steep slope (10 • -30 • ), i.e., 18.73 versus 23.47 cm. The season plays a vital role in snowpack microstructure characteristics resulting in changes in the brightness temperature response of the snowpack. In prewinter (September, October, November) and midwinter (December, January, February) seasons, where the snowpack is relatively shallow and more stable than late-winter (March, April, May), lesser MAE of 14.15 and 16.85 cm are observed, respectively. Whereas in late winter, the large snowpack grains formed due to recrystallization from melt-freeze activities of snow have resulted in a higher MAE of 31.57 cm. Further, the snow depth has also consistently increased since early winter to late winter, resulting in a higher MAE as the snow season progresses. In the current study, high vegetation areas have less mean snow depth and a smaller MAE in the model estimates. Whereas, low vegetation areas (< 20%) are covered with snow of higher depth and results in higher MAE. Additionally, at most of the locations near the south coast where low vegetation is present, higher errors are observed in all models due to multitude of reasons, such as proximity to water body, mixed pixels causing problems in the resampling of brightness temperature, complex snowpack conditions due to warmer temperatures, etc. Therefore, though low vegetation is present, apparently higher errors are expected in these locations. The ERT model, AMSR-2, GlobSnow products, and Chang model have almost similar trends of MAE when analyzed with respect to different auxiliary parameters. However, higher values of MAE are observed in the AMSR-2, GlobSnow products and Chang model compared to the ERT model for each auxiliary parameter.

VI. CONCLUSION
Many of the earlier snow depth models were developed by mainly using PMW brightness temperature difference of 18, 36 GHz, and snow depth. However, the accuracy of these models' estimates is constrained by the challenges posed by the spatiotemporally dynamic snowpack microstructure, and auxiliary parameters. Usage of additional frequencies and auxiliary data with machine learning schemes for snow depth estimation have partially addressed this problem. However, strong spatiotemporal heterogeneity in snowpack microstructure can contribute to noise in the PMW brightness observations, which can affect the performance of machine learning models.
To address these problems, in this study, an ERT-based machine learning scheme is used for the first time in the estimation of snow depth in Alaska. Alaska has diverse climate and topographical conditions, hence divided into six zones that have relatively consistent climatic conditions. PMW brightness temperature data from AMSR-2, auxiliary parameters, i.e., location, elevation, vegetation, etc., and in situ snow depth are used as inputs for training the different ERT snow depth models. Alaska wide and zonal ERT snow depth models are developed and compared to understand if there is any significant advantage of using independent zonal models for snow depth estimation in various Alaska regions. The performance of the models is assessed using three different CV approaches. Further, spatially and temporally independent datasets are used for the assessment of the predictive power of the ERT models.
From the results of CV and predictive power evaluation, ERT model as well as all other SD products have shown larger errors in Z6 as compared to other zones. The mean snow depth is highest in Z6. Other than that, Z6 being located on the south coast of Alaska, experiences more warm temperatures (see in Fig. 1), receives higher snow fall. The climate in Z6 can potentially lead to more heterogenous snowpack conditions as compared to other zones. Further, many stations are very close the to coast line, the mixed pixel effect results in the inclusion of nonsnow response in the brightness temperature recorded by AMSR-2. These effects cannot be accounted by direct resampling of brightness temperature. Thus, these reasons could have possibly impacted snow depth retrievals in Z6, which are to be investigated further in the future research for the betterment of the model.
Overall, results indicate that the ERT snow depth models have provided improved estimates compared to the operational AMSR-2, GlobSnow snow depth products, and the Chang model. The inclusion of auxiliary parameters in the ERT model has resulted in improved snow depth estimates compared to the ERT model without auxiliary parameters. The R 2 (R) values improved from 0.35(0.67) to 0.74 (0.86), and MAE (rmse) values decreased from 17.9 (29.43) cm to 11.6 (18.73) cm by including the auxiliary parameters in the model. The CV and predictive power assessment results reveal that there is strong heterogeneity in the performance of the ERT model across different zones. However, both Alaska wide and zonal ERT models have provided similar results with very close rmse, R 2 , MAE, and R values. Further, the comparison with other machine learning approaches, i.e., random forest and support vector machine also indicates that the ERT model has moderately better performance over them in most cases. Therefore, it would be preferable to use the Alaska wide ERT model, as it will be of more interest to the scientific community. Spatial variability in the weather conditions and topographical features alter the snowpack condition, leading to varying snowpack thickness and heterogeneous PMW response from snowpack between different regions. Therefore, the accuracy of snow depth estimations is not uniform across various zones of Alaska. The Alaska wide ERT model evaluation (i.e., from temporal predictive power assessment) indicates that the Southern Alaska region (Z6) has the highest MAE (rmse), i.e., 32.95 (45.64) cm, while the north slope region (Z1) has a smaller MAE (rmse) of 9.52 (12.25) cm. The snow depth estimates from ERT model(s) and other products indicate that the MAE of the modeled snow depth estimates increase monotonically with an increase in in situ snow depth. The ERT models have shown considerable improvement in snow depth estimates compared to the AMSR-2 product, GlobSnow product, and the Chang model, especially when the snow depth is above 30 cm in different Alaska zones. Further, the analysis of MAE in snow depth estimates exhibits a regional pattern, i.e., high latitude regions have lesser MAE compared to low latitude regions. Also, the MAE increases from early winter to late winter, and from low altitude to high altitude regions.
Coming to limitations of this study, varying atmospheric and climatic conditions, snow grain size, snow stratigraphy, forest canopy can induce errors in the snow depth retrievals [21], [84]. Though this study considered forest cover fraction as one of the input parameter, the variability in the forest cover, its type, density, height, etc., are not considered here. These parameters can impact the upwelling PMW radiation thereby impacting SD retrievals. Additionally, the impact of mixed pixels (i.e., water, forest, and other nonsnow objects) is not considered while resampling brightness temperature. This can have an impact on snow depth retrievals, particularly near coastlines. Further, atmospheric conditions have an effect on brightness temperature is spectrally variable. Using multifrequency brightness temperatures is suggested in some studies [10], [60] to partially address this problem. Discerning the presence of snow under precipitating clouds is challenging. Considering this, along with using brightness temperatures of multiple frequencies, Grody's decision tree [78] is used to remove precipitating clouds. Further, wet snow pixels are also removed using the Grody's decision tree. Therefore, SD estimates are not available at these locations. Further, the SD model is developed using point-based SD measurements within a PMW grid. These point SD measurements often are not a good representative of spatially varying SD within the coarse PMW grid. In machine learning models, such as ERT, the nonavailability of proportionate representative samples at a particular snow depth interval can impact the accuracy of model estimates, due to poor learning. Further, the transferability of data-driven techniques to regions other than the model region is questionable due to the naive representation of the relationship between multiple model parameters. Hence, though global snow depth models can be derived based on the machine learning schemes, the entrust of regional models could be higher for snow depth estimation. Additionally, developing a machine learning model at a global scale is a challenging problem due to the dynamic nature of the snowpack microstructure associated with spatial and temporal changes. The snow stratigraphy also affects the brightness temperature observed in different PMW channels [21], [85]. Thus, the in situ observation samples devoid of snow microstructure, and stratigraphy used for training the ERT model are not fully representative of the snowpack, as the same signature can be given by another snowpack with a different microstructure. Incorporating snowpack microstructure data into models can enhance the accuracy of snow depth estimates. Thus, in the future, a combination of different frequencies and physics information from snow physics models can be employed to improve the model snow depth estimates with passive microwave observations.