GlacierNet: A Deep-Learning Approach for Debris-Covered Glacier Mapping

Rising global temperatures over the past decades are directly affecting glacier dynamics. To understand glacier fluctuations and document regional glacier-state trends, glacier-boundary detection is necessary. Debris-covered glacier (DCG) mapping, however, is notoriously difficult using conventional geospatial technology methods. Therefore, in this research on automated DCG mapping, we evaluate the utility of a convolutional neural network (CNN), a deep-learning feed-forward neural network. The CNN inputs include Landsat satellite images, an Advanced Land Observation Satellite (ALOS) digital elevation model (DEM), and DEM-derived land-surface parameters. Our CNN-based deep-learning approach, named GlacierNet, was designed by appropriately choosing the type, number, and size of layers and filters, and the encoder depth, based on the properties of the input data, the CNN segmentation process, and empirical results. GlacierNet was then trained using input data and corresponding glacier boundaries from the Global Land Ice Measurements from Space (GLIMS) database, and tested on glaciers in the Karakoram and Nepal Himalaya. Our results provide proof-of-concept that GlacierNet reasonably identifies the boundaries of DCGs with a relatively high degree of accuracy, and that morphometric parameters improve boundary detection.


I. INTRODUCTION
The impact of climate change on glaciers has become evident in recent decades [1], [2]. However, glacier response to radiative and precipitation forcing is not always apparent, as there can be much variability in glacier subsystem responses [3]. To understand glacier fluctuations and to extract regional trends in glacier state and rates, glacier mapping is required using conventional methods of field mapping or digital mapping based upon geospatial data (i.e., imagery, DEMs). Despite progress in semi-automated mapping approaches, accurately mapping debris-covered glaciers (DCGs) and accounting for numerous uncertainties remains a research challenge. Either manual on-screen digitization using multi-temporal satellite data or detailed post-processing following semi-automated mapping is considered to be best practice [4]. (The associate editor coordinating the review of this manuscript and approving it for publication was Hongjun Su.)
DCGs generally exhibit heterogeneous surface properties, ranging from clean ice or snow-covered ice in the accumulation zone to progressively more widespread and increasingly thicker debris cover from the equilibrium line altitude to the glacier terminus. These glaciers may be surrounded by lateral moraines, end moraines, periglacial deposits, and sediment avalanches with reflectance characteristics similar to supraglacial debris. The topographic variation caused by debris fluxes and ablation dynamics is a major hindrance to mapping DCGs, especially in the visible and near/shortwave-infrared bands, where mineralogical mixtures of supraglacial debris do not produce enough spectral variability to differentiate glaciers from the surrounding terrain using conventional or manual techniques, and where the debris can also obscure ice-dynamical features such as crevasses. However, there are subtle differences in debris spectral properties that may be caused by topography at multiple scales, mineralogy, plants, lichens, and weathering-alteration veneers governed by ice-flow and gravitational sediment fluxes. Furthermore, off-glacier debris generated by local mass movements may differ in mineralogical and/or rock-type composition from debris transported down the glacier from up-valley rock sources. These differences, combined with pixel-scale reflectance differences in the bidirectional reflectance distribution function (BRDF) caused by the aforementioned factors, can produce subtle-to-significant variations in reflectance and topography that can be exploited by artificial intelligence (AI) techniques. Neural computing can recognize nonlinear patterns related to spatial concepts of variation, shape, and topology, thereby providing the basis for potentially delineating complex DCG boundary zones.
Numerous DCG mapping approaches combining data from different satellite sensors and digital elevation models (DEMs) have been developed over the years [5]-[7]. Two primary semi-automated approaches have been used: pixel-based and object-based analysis, which can potentially exploit information characterized in spectral data and indices, and geomorphometric parameters [8], [9]. Although the object-based approach has been recommended [8], the challenge of numerous approaches and techniques is that they depend on empirical thresholding of spectral or topographic data [7]-[11], which is problematic given issues of anisotropic reflectance and morphological variation. Furthermore, thresholding of morphometric parameters is governed by the uncertainty in computing inherently scale-dependent parameters. The variation in spectral and morphometric properties is also highly spatio-temporally dependent and cannot be efficiently addressed using thresholding. Therefore, it is important to note that true object-oriented analysis requires proper object segmentation, spatial analysis, and spatial-topological analysis to characterize and group objects on the basis of unique object and topological properties [12]. Simplistic use of object-parameter thresholding should not be expected to address the spatial complexity associated with detecting morphologically complex boundary zones.
Similarly, statistical pattern-recognition approaches have been used to classify DCGs, as investigators evaluate which parameters are best for producing statistical separability [13]-[15]. This approach is also highly empirical in nature, as feature-space dimensionality and feature-space properties govern classification results. It does not effectively account for the evolution of glacier geometry caused by climate-glacier dynamics and is therefore usually not universally applicable. Furthermore, the challenges of more conventional mapping approaches are reflected in the use of brute-force algorithms, parameter selection, and large inter-operator and inter-algorithm biases [16]. Almost any method, and even different operators using the same method and the same images, ends up requiring substantial analyst intervention, which not only introduces bias but is also time consuming and, hence, expensive.
Consequently, advanced AI techniques of machine learning offer significant advantages over simplistic and traditional pattern-recognition algorithms related to generalization, non-linear patterns, and the characterization of image elements and various spatial concepts. They also support high-dimensional analysis, recognizing important patterns and automatically deciding upon the best information and magnitude constraints.

II. DEEP-LEARNING BACKGROUND AND GLACIERNET APPROACH
Recently, machine learning has been used effectively for glacier mapping [17], [18], but it differs from the deep-learning approach presented here. Deep-learning approaches represent a subset of the family of machine-learning approaches and algorithms. Deep learning is based upon artificial neural networks, such as deep neural networks (DNNs), recurrent neural networks (RNNs), and feed-forward convolutional neural networks (CNNs). In this study, a CNN is applied and evaluated for automated DCG mapping in the ablation zone (from the snowline to the terminus). The initial conceptual framework for CNNs and their connectivity pattern between neurons came from the organizational structure of an animal's visual cortex [19]-[21]. One of the first CNNs, named LeNet-5, was created by Yann LeCun in the 1990s to classify handwritten digits [22]. However, at that time, the application of CNNs was limited by insufficient computing power. The abilities of CNNs went mostly unnoticed until the publication of AlexNet [23], which uses graphics processing units (GPUs) to increase the speed of computation and acquire critical information from complex images. Since then, numerous neural networks have been developed, including: 1) the VGG16 network, GoogleNet, and MicrosoftNet for image recognition [24]-[26]; 2) the region-based convolutional network (RCNN), Fast RCNN, and Faster RCNN for object detection [27]-[29]; and 3) the Multi-Domain Network (MDNet) for tracking multiple objects in video [30].
Deep learning has also been used to map snow and glaciers [31], [32]. In [31], researchers used a finely-tuned AlexNet to extract features from Sentinel-2 images that were dimensionality-reduced using principal component analysis (PCA). The extracted data (i.e., glacier and background features) were then fed into a random-forest classifier. Additionally, the VV- and VH-polarization C-band SAR imagery of the Sentinel-1A satellite and DEM-derived parameters, including slope, slope azimuth, and surface curvature, were used to produce classification results with another random-forest classifier. The final classification was based on the collective classification results.
Recent advancements in CNN capabilities include image segmentation, which refers to identifying parts of an image and determining which object each part belongs to. In typical CNN classification models, such as AlexNet, the convolutional process extracts features from pixel clusters and the fully connected layers classify the extracted features. Here, each input image has one predicted label. In contrast, segmentation is a pixel-based classification in which the segmentation network uses convolutional layers to achieve both feature extraction and classification. The difference between these approaches is that the fully-connected layer requires massive parametrization to label each pixel compared to the locally-connected convolutional layer, as used in our study [33]. In this study, the use of our CNN segmentation model, compared to previously used deep-learning approaches [31], enables us to utilize satellite and topographic data within the CNN architecture. Therefore, we do not rely on separate random-forest or other classification techniques, which may be suitable for limited geographic areas and smaller datasets (e.g., one glacier). In general, typical machine-learning methods are better than deep learning at dealing with smaller amounts of data [34]. However, our choice of a CNN segmentation model was based upon evaluating and testing the CNN architecture for mapping multiple glaciers from two geomorphologically diverse areas.
Two widely used CNN segmentation structures are U-Net and SegNet [33], [35]. The U-Net structure was created for biomedical image segmentation, and the SegNet structure was designed for semantic pixel-wise segmentation. More recently, both structures have been used for satellite-image segmentation [36], [37]. SegNet has an advantage over U-Net because its structure requires low memory and inference time, which reduces the amount of data transferred [33]. However, these highly complex models are designed for applications with large amounts of data and may not work well with the limited training samples available in our study area. Furthermore, a highly complex network not only requires a large amount of processing time and a complex architecture with many neurons; it can also cause overfitting [38], which leads to characterization of glacierized pixels as non-glacier. Although our study area is large, the required amount of training data is still lower than for other deep-learning networks [33]. Therefore, in this study, we develop a deep-learning approach named GlacierNet and design its main neural-network structure using SegNet as a reference model. We also use the transfer-learning method to decrease training time [39] and improve testing performance. Our ultimate objective is to evaluate the utility of advanced CNNs for accurately mapping complex glacier boundaries, and to develop proof-of-concept for global glacier mapping.

III. STUDY AREA
We evaluated glaciers from two distinct High-Mountain Asia (HMA) regions: the Nepal Himalaya and the central Karakoram in Pakistan (Figure 1). Glaciers in these regions exhibit unique surface and geometric properties. Although both areas are dynamically active, the glaciers in Nepal, like most of those in the world, have retreated rapidly in the last few decades and formed thousands of supraglacial and proglacial lakes [40]-[42]. In the Karakoram, the glaciers are extremely variable in terms of their size, state (surge-type, advancing, retreating), geometry, and fluctuation rates, as investigators have recognized unique conditions such as the Karakoram anomaly and the DCG anomaly [43]-[46]. Unusual in the world, the Karakoram anomaly involves an overall slightly positive mass balance in recent decades [3]. The relief in the Karakoram is also extreme, and many glaciers have thick and widespread debris cover, more so than the majority of Nepal Himalaya glaciers. Glaciers in the Nepal region receive significant input from the Indian Summer Monsoon, while Karakoram glaciers receive westerly-driven precipitation in the winter and spring and some (but less) monsoon precipitation in the summer [47]. Thus, these two geographic regions represent extremely different glacier conditions and properties for evaluating our CNN approach to glacier mapping.

IV. DATA USED
For the training and testing of GlacierNet, we used 17 data layers as input: all eleven Landsat 8 bands (downloaded from www.glovis.usgs.gov), the 30 m ALOS DEM [48], and five layers containing geomorphometric parameters derived from the DEM. The ALOS DEM used in this study has a root-mean-square error (RMSE) of 3.5-6.75 meters [49], [50]. None of the 17 layers is indispensable; they all have complementary advantages. We analyzed layer cross-correlation and observed that the network can generate results with fewer input layers, but with lower accuracy (Table 1; the evaluation indexes used in the table are discussed in the methodology). Prior to using them as inputs, we resampled all input layers to a 15-m spatial resolution using the nearest-neighbor method to match the panchromatic band. We also normalized all data values to the same order of magnitude based upon minimum and maximum values.
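These preprocessing steps (nearest-neighbor resampling and min-max normalization) can be sketched in a few lines. The array shapes and function names below are illustrative assumptions, not the paper's actual code:

```python
import numpy as np

def normalize_layers(stack):
    """Min-max normalize each input layer to [0, 1] independently.

    `stack` is a hypothetical (layers, rows, cols) array holding the 17
    co-registered inputs (Landsat bands, DEM, DEM derivatives).
    Assumes each layer has non-constant values.
    """
    stack = stack.astype(np.float64)
    mins = stack.min(axis=(1, 2), keepdims=True)
    maxs = stack.max(axis=(1, 2), keepdims=True)
    return (stack - mins) / (maxs - mins)

def resample_nearest(layer, factor=2):
    """Nearest-neighbor upsampling by an integer factor (e.g., 30 m -> 15 m)."""
    return np.repeat(np.repeat(layer, factor, axis=0), factor, axis=1)
```

In practice a GIS or remote-sensing library would handle the georeferenced resampling; the sketch only conveys the two operations described above.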
Cloud cover always poses limitations in optical remote-sensing image analysis, especially in high-altitude glacierized basins; therefore, we only selected images with minimal cloud cover. We used one Landsat 8 image acquired on 2015-09-30 (LC81400412015273LGN01) for the Nepal Himalaya and two mosaicked images acquired on 2016-09-24 (LC81480352016268LGN02) and 2016-10-01 (LC81490352016275LGN01) for the Karakoram.
For the expected output, glacier-boundary shapefiles from the Global Land Ice Measurements from Space (GLIMS) database (www.glims.org) were used after modification. Since our focus here is only to map DCGs, GLIMS outlines were modified to (i) remove the snow-covered accumulation zone, because it lacks features distinguishable from the surrounding snow-covered terrain for training and testing, and (ii) adjust for any changes that may have occurred due to terminus fluctuations since the outlines were mapped. We did not modify the lateral boundary outlines of a glacier or anywhere else, even though in some cases the glacier outlines appear erroneous or the glacier has significantly downwasted. Due to recent glacier changes or mapping errors, the lateral edges may no longer match the boundary portrayed in the satellite data, thus introducing errors into our training data. However, our proposed GlacierNet can accept a certain amount of error because it uses a region-based computing process. According to [51], in deep learning, more training data with low-quality labels are better than less training data with high-quality labels. Therefore, the data preparation does not require a strict quality check for every glacier boundary used for training, especially for larger regions that permit more training data to be used.

V. METHODOLOGY
Here, we first discuss our CNN structure and its components and then data settings, post-processing and evaluation.

A. GLACIERNET CNN STRUCTURE
The GlacierNet CNN is designed to account for spatial characteristics such as variation, texture, anisotropy, and patterns.

1) ENCODING AND DECODING PROCESS
Similar to SegNet, the GlacierNet CNN structure includes two main processes: encoding (i.e., feature extraction) and decoding (i.e., feature classification) (Figure 2). The structures of the encoding and decoding processes are symmetrical, and each encoder in the encoding process has one corresponding decoder in the decoding process (Figure 2). The primary purpose of the encoding process is feature extraction from the data (i.e., DCG and background); it is composed of encoder stages connected in ascending order of serial number. After the encoding, the decoding classifies the objects according to the extracted features. Similar to the encoding process, several decoder stages are included in the decoding process, but they are connected in descending order of serial number. Besides these in-order connections, another type of connection is established between the encoders and the corresponding decoders to transport the necessary decoding information.
Inside each encoder stage, there are several convolutional layers, batch normalization layers, Rectified Linear Unit (ReLU) layers, and one 2 × 2 max pooling layer. The decoder stage has similar components, but instead of the max pooling layer at the end, it has an unpooling layer at the beginning of the stage (Figure 2).

2) CONVOLUTIONAL, NORMALIZATION AND RELU LAYERS
The convolutional, batch normalization, and ReLU layers form a fixed combination in the GlacierNet CNN structure. In this combination, the 17 input data layers first enter the convolutional layer, which connects the input elements inside the window region to compute the local gradient for generating feature maps. Its parameters include h filters of size m × n × d (Figure 3) and an h-element bias vector. In each filter, d is equal to the number of layers of input data, and m × n is the convolutional window size. The output of the convolutional layer has h layers of data, since each filter combines multiple layers of input into one output layer. Generally, for an M × M input, each data layer is reduced to ((M − m)/stride + 1) × ((M − n)/stride + 1) after the convolution process; in this structure, however, all convolutional window strides are set to 1, and the input and output data maintain a consistent resolution through zero padding. The feature maps are then fed into the batch normalization layer, which normalizes the data to increase learning speed and reduce sensitivity to network initialization. Finally, the ReLU, the most commonly used activation function for neural networks, max(0, input), adds non-linearity to the normalized data [23]. Compared with other activation functions, such as the sigmoid function, the ReLU function avoids the problem of vanishing gradients in deep neural networks. However, it can cause the problem of an overly large gradient, since the ReLU function does not limit the output value range. That is why the batch normalization layer is so important here.
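A minimal numpy sketch of one such convolution + batch-normalization + ReLU combination (stride 1, zero padding, so the output resolution matches the input) may clarify the description. The shapes and the per-map normalization statistics are simplifying assumptions; a real implementation would use a deep-learning framework:

```python
import numpy as np

def conv_bn_relu(x, filters, bias, eps=1e-5):
    """One conv + batch-norm + ReLU combination (stride 1, zero padding).

    x:       (d, M, M) input with d layers
    filters: (h, m, n, d) bank of h filters of size m x n x d
    Zero padding keeps the output at the input resolution, as in GlacierNet.
    """
    h, m, n, d = filters.shape
    _, M, _ = x.shape
    pm, pn = m // 2, n // 2
    xp = np.pad(x, ((0, 0), (pm, pm), (pn, pn)))   # zero padding
    out = np.zeros((h, M, M))
    for k in range(h):                             # each filter -> one output layer
        fk = filters[k].transpose(2, 0, 1)         # (d, m, n)
        for i in range(M):
            for j in range(M):
                out[k, i, j] = np.sum(xp[:, i:i + m, j:j + n] * fk) + bias[k]
    # batch normalization (statistics per output map, for illustration)
    mu = out.mean(axis=(1, 2), keepdims=True)
    var = out.var(axis=(1, 2), keepdims=True)
    out = (out - mu) / np.sqrt(var + eps)
    return np.maximum(0.0, out)                    # ReLU: max(0, input)
```

The nested loops are written for clarity, not speed; the point is that the output keeps the M × M resolution and has h layers, one per filter.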
In deep learning, a deeper network sometimes suffers from vanishing gradients in backpropagation or from overfitting. Based on this consideration and on proven network structures (e.g., SegNet, U-Net), we set the experimental range for the number of convolutional layers in each stage to 1-4, and found that 3 layers per stage perform best (Table 2).

3) POOLING AND UNPOOLING LAYERS
The max pooling layer (Figure 4) inside the encoder chooses the maximum value within the pooling window as output, for translation invariance [33] and for extracting outstanding features. The unpooling layer inside the decoder is similar to the backpropagation process of max pooling (Figure 4). It restores the size reduced by max pooling, and each input value goes back to the original location of its corresponding maximum value. Thus, the original size and the location indices constitute the decoding information.
The pooling and unpooling layers can be treated as separators between different encoders and decoders (Figure 2). Because of the symmetrical network structure used in this study, the numbers of pooling layers, unpooling layers, encoders, and decoders are the same. As with the encoders and decoders, each pooling layer in the encoder stages has one corresponding unpooling layer in the decoder stages, and the consistent number of pooling and unpooling layers allows the same resolution for both the input and output. The number of stages is therefore decided by the optimal number of pooling operations. One major use of the pooling layer is to extract the outstanding data features. Insufficient pooling leads to extraction of trivial (less prominent or insignificant) features, while too much pooling loses too much information. Though no theory can directly tell us the balance point, we ran 5 groups of experiments in which the number of pooling operations ranged between 2 and 6 (Table 2). Since each pooling layer has a size of 2 × 2 and a stride of 2, each feature map has a size of (input size)/2^n after the encoder stages, where n is the number of pooling layers. The input data size is 256 × 256, so 2-6 pooling operations means the feature map size ranges from 64 × 64 down to 4 × 4.
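The pooling/unpooling pair and the "decoding information" (the stored argmax locations) can be illustrated as follows; this is a simplified single-channel sketch, not GlacierNet's implementation:

```python
import numpy as np

def max_pool(x, size=2):
    """2x2 max pooling with stride 2; also returns the argmax locations
    (the decoding information passed from encoder to decoder)."""
    M = x.shape[0]
    out = np.zeros((M // size, M // size))
    idx = np.zeros((M // size, M // size, 2), dtype=int)
    for i in range(0, M, size):
        for j in range(0, M, size):
            win = x[i:i + size, j:j + size]
            r, c = np.unravel_index(np.argmax(win), win.shape)
            out[i // size, j // size] = win[r, c]
            idx[i // size, j // size] = (i + r, j + c)
    return out, idx

def unpool(x, idx, M):
    """Unpooling: each value returns to the location of its original maximum;
    all other positions stay zero."""
    out = np.zeros((M, M))
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            r, c = idx[i, j]
            out[r, c] = x[i, j]
    return out
```

Applying `max_pool` n times to a 256 × 256 input yields a 256/2^n feature map, matching the 64 × 64 to 4 × 4 range discussed above.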

4) FILTERS
Available CNN structures, such as AlexNet and VGG16, distinguish 3-D depth perception from 2-D photographs by increasing the number of filters layer-by-layer [52]. However, the nadir-view Landsat satellite images used in this study do not require the network to understand the concepts of depth, relief, or high-level 3-D objects [52]. Thus, the number of filters is kept consistent for every layer. Moreover, our aim here is to use minimal training data to achieve high accuracy in DCG mapping using high-dimensional GlacierNet classification. For that reason, we reduced the number of filters in each convolutional layer during the design experiment, as explained through the two-dimensional classification example (Figure 5). This decreased the required similarity level but introduced some misclassified background, which was removed by subsequent processing (Figure 2; discussed below). Therefore, it is critical to use an appropriate number of filters that recognizes the maximum number of correct pixels with only minimal misclassification that can then be easily removed. Based on these strategies, along with our experiments with multiple filter configurations, and to maintain consistency in each convolutional layer, the number of filters in each layer is set to 32 (Table 2), except in the last convolutional layer. Furthermore, DCGs contain debris, ice, water, and snow, which are spectrally similar to nearby materials, forcing the CNN to consider more neighboring pixels or objects to decide whether a cluster belongs to a DCG. To aid this process, we decided to use a 5 × 5 filter, which considers more neighboring pixels than a 3 × 3 filter (Table 2).

5) NETWORK INITIAL OUTPUT
Since there are the same number of pooling and unpooling layers, and the last convolutional layer has only 2 filters because the network classifies two classes, the output of the decoding process is a 256 × 256 × 2 matrix. This matrix is converted into a same-size matrix of categorical probability scores for every pixel through a softmax layer, and the final class decision is made.
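As a sketch, the softmax conversion and final per-pixel class decision for an (H, W, 2) decoder output look like this (shapes reduced for illustration):

```python
import numpy as np

def classify(logits):
    """Convert the decoder's (H, W, 2) output into per-pixel class
    probabilities (softmax over the class axis) and a binary
    glacier/background decision (argmax)."""
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))  # numerically stable
    probs = e / e.sum(axis=-1, keepdims=True)
    return probs, probs.argmax(axis=-1)
```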

B. AUTOMATED POST-PROCESSING
The small binary image outputs of the GlacierNet CNN are combined into a large image that marks DCG and background pixels in the study region. Then, to improve the DCG mapping, size thresholding, the normalized difference water index (NDWI), and hole filling are applied to the binary images as automated post-processing techniques. Size thresholding is used to remove misclassified background pixels, which usually comprise small regions, such as dirty late-season snow fields; these regions contained only dozens to hundreds of pixels and were removed by connected-region size thresholding. Some proglacial lakes contain icebergs that have a signature similar to ice cliffs on DCGs; if the training network lacks enough water pixels (proglacial lakes), it classifies them as glacier. The Nepal Himalayan region has several such lakes, and our training groups (Table 2) do not always contain enough water pixels. Therefore, we use the NDWI, which increases the intensity difference between water and non-water pixels, to remove the misclassified water pixels during the post-processing step. The NDWI of Landsat 8 is defined as NDWI = (Band 3 − Band 5) / (Band 3 + Band 5), i.e., the normalized difference of the green and near-infrared bands. After removing the misclassified background, the holes within boundaries were filled using size and slope thresholds. We observed only a few dozen such holes, containing a very small number of pixels and generally occurring away from the training region.
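The post-processing chain can be sketched as below. The connected-component labeling is a simple pure-Python flood fill, and the size and NDWI thresholds are illustrative values only (the paper's thresholds are not stated here); hole filling is omitted for brevity:

```python
import numpy as np

def label_regions(mask):
    """4-connected component labeling via iterative flood fill."""
    labels = np.zeros(mask.shape, dtype=int)
    current = 0
    for i, j in zip(*np.nonzero(mask)):
        if labels[i, j]:
            continue
        current += 1
        stack = [(i, j)]
        while stack:
            r, c = stack.pop()
            if (0 <= r < mask.shape[0] and 0 <= c < mask.shape[1]
                    and mask[r, c] and not labels[r, c]):
                labels[r, c] = current
                stack += [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return labels, current

def post_process(mask, green, nir, min_size=100, ndwi_thresh=0.2):
    """Remove small misclassified regions and NDWI-flagged water pixels.

    min_size and ndwi_thresh are hypothetical illustrative values.
    """
    ndwi = (green - nir) / (green + nir + 1e-9)   # NDWI from green/NIR bands
    mask = mask & (ndwi < ndwi_thresh)            # drop water (iceberg) pixels
    labels, n = label_regions(mask)
    for k in range(1, n + 1):                     # connected-region size threshold
        if (labels == k).sum() < min_size:
            mask[labels == k] = False
    return mask
```

In production, library routines (e.g., image-processing labeling and hole-filling functions) would replace the hand-written flood fill.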

C. DATA SUBSET AND SETTINGS
The original satellite image swath is large; to make it compatible with the network input requirements, we used a 256 × 256 sliding window to subset it to fit the study area (Figure 2). One sub-image contains only part of a glacier. The window size is therefore not unique and is chosen by considering the extent of the object and the network properties. Furthermore, to connect every sub-image and create different samples for data-augmentation purposes, the shifting stride is set to 16 or 32 so that sub-images have overlapping areas.
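The sliding-window subsetting can be sketched as follows (the stride and image size in the usage below are chosen for brevity, not the values used for training):

```python
import numpy as np

def subset_windows(image, size=256, stride=32):
    """Slide a size x size window over a (layers, H, W) stack with an
    overlapping stride, yielding sub-images for training/augmentation."""
    _, H, W = image.shape
    tiles = []
    for i in range(0, H - size + 1, stride):
        for j in range(0, W - size + 1, stride):
            tiles.append(image[:, i:i + size, j:j + size])
    return tiles
```

A stride smaller than the window size guarantees overlap between adjacent tiles, which is what allows the same glacier pixels to appear in multiple training samples.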

D. GEOMORPHOMETRIC PARAMETERS
In addition to altitude (DEM), geomorphometric parameters were utilized to improve boundary-estimation performance. These include slope angle, slope-azimuth divergence index (SADI), profile curvature, tangential curvature, and unsphericity curvature. First and second derivatives of altitude were approximated from a quadratic surface [53] in a 3 × 3 moving window and used to compute slope angle, slope azimuth, and curvature parameters by the methods of [54], [55]. SADI was computed over a 3 × 3 window as:

SADI = (1/η) Σ_{i=1}^{η} |ϕ_i − ϕ_{p,i}|

where η is the number of neighbors, ϕ_i is the slope azimuth of grid cell i, and ϕ_{p,i} is the slope azimuth from the focus to i. These morphometric parameters represent the unique morphology of glacier surfaces and therefore provide information that is highly orthogonal (uncorrelated) to that derived from spectral data. Specifically, glacier surfaces exhibit unique morphometry when characterized from different referencing perspectives (i.e., curvature metrics). Profile curvature, tangential curvature, and unsphericity each capture second-derivative altitude variation such that vertical, horizontal, and omnidirectional variations in slope are accounted for, respectively. Furthermore, the margins of glaciers tend to be convergent (low SADI); the ablation zone tends to have a complex slope-azimuth structure (high spatial variability in SADI at low altitudes); and the upper glacier tends to have well-organized patterns of moraines more or less parallel to the glacier margins (alternating linear paths of high and low SADI). The spatial frequency of SADI variation tends to be much higher on the glacier surface than on mountain slopes. SADI is particularly useful because a CNN will be sensitive to such unique spatial patterns. Furthermore, different glaciers may exhibit striking differences in convexity or surface pitting (as from supraglacial ponds), surface ridging (from ogives or ablating medial moraines), or crevasses, depending on whether a glacier is aggressively advancing or stagnating. Each glacier may have somewhat of a "signature" micro-topography based on its unique dynamics, mass balance, and recent history. The deep-learning automated approach of characterizing SADI and all the other geomorphometric parameters, and then integrating them with reflectance-based parameters, is analogous to how a human expert might synthesize information and use such patterns for mapping, but without subjectivity, a slowly shifting individual bias, or inter-operator bias. The characterization of these morphometric patterns explains why our training-dataset percentages are quite high, as presented below.
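Under the assumption that SADI is the mean absolute angular difference between each neighbor's slope azimuth and the azimuth from the focus cell to that neighbor (a plausible reading of the definition above, not necessarily the authors' exact formulation or azimuth convention), a numpy sketch is:

```python
import numpy as np

def sadi(dem, cellsize=1.0):
    """Slope-azimuth divergence index over a 3x3 window (illustrative).

    Assumed formulation: SADI = (1/eta) * sum_i |phi_i - phi_{p,i}|,
    with angular differences wrapped to [0, pi]. Border cells are NaN.
    """
    dzdy, dzdx = np.gradient(dem, cellsize)          # axis 0 = rows, axis 1 = cols
    azimuth = np.arctan2(-dzdx, -dzdy)               # downslope direction (one convention)
    H, W = dem.shape
    out = np.full((H, W), np.nan)
    offsets = [(di, dj) for di in (-1, 0, 1) for dj in (-1, 0, 1)
               if (di, dj) != (0, 0)]
    for i in range(1, H - 1):
        for j in range(1, W - 1):
            diffs = []
            for di, dj in offsets:
                phi_i = azimuth[i + di, j + dj]      # neighbor slope azimuth
                phi_p = np.arctan2(dj, -di)          # azimuth focus -> neighbor
                d = abs(phi_i - phi_p) % (2 * np.pi)
                diffs.append(min(d, 2 * np.pi - d))  # wrap to [0, pi]
            out[i, j] = np.mean(diffs)               # average over eta = 8 neighbors
    return out
```

Low values indicate neighbors whose slope azimuths align with the outward directions (divergent/convergent structure depends on the chosen sign convention), which is the spatial pattern the CNN can exploit.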

E. TRANSFER-LEARNING METHOD
The training process for our deep-learning neural network is time consuming and generally requires enormous computational resources. Although substantial investment can be made to improve the hardware, hardware always lags the rapid growth of large-scale data. One option to avoid such a bottleneck is to reduce the number of training epochs using a transfer-learning approach, which utilizes a network pre-trained on a known location elsewhere and applies it to an unknown region to avoid starting the process from scratch (Figure 6). In this study, we trained our network from the ground up in the Karakoram region and then used it as the initial training weights for the glaciers in the Nepal region, rather than randomly initializing the weights.
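The benefit of warm-starting can be illustrated with a toy one-parameter model: initializing near a previously learned optimum converges in far fewer epochs than a distant random-style initialization. This is only an analogy to the transfer-learning setup described above, not the GlacierNet training code:

```python
def train(w, lr=0.1, tol=1e-3, max_epochs=1000):
    """Gradient descent on a toy loss (w - 3)^2; returns epochs to converge."""
    for epoch in range(max_epochs):
        grad = 2 * (w - 3.0)
        if abs(grad) < tol:
            return epoch
        w -= lr * grad
    return max_epochs

# "Random" initialization far from the optimum vs. a warm start
# taken from a hypothetical pre-trained weight near the optimum.
epochs_scratch = train(w=-50.0)
epochs_transfer = train(w=2.5)
```

The warm start needs fewer epochs because the remaining distance to the optimum is small, which mirrors why transfer learning reduces training epochs between the Karakoram and Nepal regions.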

F. EVALUATION METHOD
The GlacierNet-derived outlines and the modified GLIMS polygons were found to differ by a few pixels in the area where the ablation zone transitions into the accumulation zone (approximated as the late-season snowline). Therefore, we set a slightly underestimated snowline (SUS) for every DCG mapping. If the GlacierNet output surpasses the SUS and maps DCG beyond it, the modified GLIMS polygon snowlines are adjusted to overlap with the SUS. This adjustment is limited to the snowline area and is done only to calculate the evaluation indexes, namely the intersection over union (IOU) [40], recall, precision, specificity, F-measure, and accuracy [31], which are used for evaluation and defined as:

IOU = TP / (TP + FP + FN)
Recall = TP / (TP + FN)
Precision = TP / (TP + FP)
Specificity = TN / (TN + FP)
F-measure = 2 × Precision × Recall / (Precision + Recall)
Accuracy = (TP + TN) / (TP + TN + FP + FN)

where TP, FP, TN, and FN are the numbers of true-positive, false-positive, true-negative, and false-negative pixels, respectively.
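These indexes can be computed from binary prediction and reference masks as in the following sketch (the standard definitions in terms of true/false positive/negative pixel counts are assumed):

```python
import numpy as np

def evaluate(pred, truth):
    """Pixel-wise evaluation indexes from binary prediction/reference masks."""
    tp = np.sum(pred & truth)
    fp = np.sum(pred & ~truth)
    fn = np.sum(~pred & truth)
    tn = np.sum(~pred & ~truth)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "IOU": tp / (tp + fp + fn),
        "recall": recall,
        "precision": precision,
        "specificity": tn / (tn + fp),
        "F-measure": 2 * precision * recall / (precision + recall),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }
```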

VI. RESULTS

A. GLACIERNET CNN STRUCTURE EXPERIMENT IN KARAKORAM
The systematic GlacierNet CNN structure experiment was designed based on the characteristics of CNNs and on empirical results showing the probability of accurately identifying DCGs during each test. We ran 13 different training sessions using the same 10% of the study area, where each session varied the CNN structure, such as the number of filters, stages, and convolutional layers, and the size of the filters. The results show that the GlacierNet CNN structure in group 4, containing 4 encoder and decoder stages, a 5 × 5 filter size, 3 convolutional layers in each stage, and 32 filters in each convolutional layer, yields the highest IOU (Table 2).

B. GLACIERNET DATA PERCENTAGE EXPERIMENT
In this experiment, we used three scenarios of 10%, 15%, and 20% of the geographic area as training regions to observe how sensitive the overall output is to the training area (Figure 7). For each of these scenarios, we also selected three slightly different geomorphological areas (Table 3) within the image to assess how surface characteristics might influence the selection of the training region. We fixed the upper limit of the training-data percentage at 20% to effectively use the available resources and not exhaust the supercomputer allocation [57]. In our observations, based on the evaluation indexes, 15% or 20% results in relatively high performance (Table 3).
GlacierNet performed exceedingly well in detecting the debris-covered glacier boundary, snowline, and terminus with relatively high accuracy (Figures 8 and 9). Terminus identification is one of the challenging aspects of mapping DCGs; however, in our case study, GlacierNet was able to accurately estimate terminus locations and tongue margins for Karakoram and Nepal Himalayan glaciers, despite widely varying glacier-tongue and terminal-moraine characteristics. For example, the termini of the Bualtar, Barpu, and Virjerab glaciers in the Karakoram were similar to the multiple glacier outlines for the area available in the GLIMS repository. The terminus positions of Karakoram glaciers in this region are known to be relatively stable [58]; hence, these similarities were expected. The Nepal Himalayan glaciers, on the other hand, have undergone significant retreat and evolution of moraine-dammed glacial lakes at the terminus [42], complicating the comparison with the original GLIMS outlines. As discussed, however, GLIMS outlines are digitized either manually or using semi-automated techniques, and it is not clear whether we are comparing against the right reference. Therefore, we also compared our boundaries with the SADI index, which uniquely identifies convergent glacier boundaries and divergent moraine ridges; the GlacierNet output matches these patterns quite well (Figure 9).

C. TRANSFER-LEARNING EXPERIMENT
We used a network trained on 20% of the Karakoram region as the starting point for training in the Nepal Himalaya region. Transfer-learning significantly increases the training speed by reducing the number of training epochs. The initial training accuracy is higher than 90%, compared with ∼30-40% for randomly initialized networks. The IOU also increased slightly, as demonstrated by group 10-B, where it rose from 0.6952 to 0.7220. For example, the GlacierNet that learned 10% of the area failed to estimate the areas closer to the terminus of the Shalong glacier (Figure 10 a). After taking advantage of transfer-learning, GlacierNet was able to correct these errors, although the boundary is still imperfect because of the limited training data (Figure 10 b).
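The warm-start mechanics can be sketched in a few lines: instead of drawing initial weights at random, training in the new region begins from the Karakoram-trained weights. The layer names and shapes below are hypothetical stand-ins, not GlacierNet's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def random_init(shapes):
    """Random initialization: training in the new region starts from scratch."""
    return {name: rng.normal(0.0, 0.05, size=shape)
            for name, shape in shapes.items()}

def warm_start(pretrained):
    """Transfer learning: copy previously trained weights as the start point."""
    return {name: w.copy() for name, w in pretrained.items()}

# Hypothetical layer shapes for a small encoder-decoder CNN
shapes = {"enc_conv1": (16, 10, 5, 5), "dec_conv1": (2, 16, 1, 1)}
karakoram_weights = random_init(shapes)     # stands in for a trained network
nepal_init = warm_start(karakoram_weights)  # Nepal training resumes from here
```

Because the warm-started network begins near an already useful decision boundary, far fewer epochs are needed to adapt it to the new region.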

A. DATA PERCENTAGE EXPERIMENT ANALYSIS
From a technical perspective, one of the challenges in deep-learning is to understand how much data should be used for training to produce scientifically valid results. It is also necessary to understand what spatial information and patterns the CNN uses to produce accurate boundary locations. Based upon our work, a large degree of uncertainty is associated with answering these questions, as we focused on producing reasonable results rather than on evaluating the scale-dependent information characterized by the CNN; that would require a completely different study.
From a scientific perspective, submitting raw data versus specific information related to process-form relationships will make a significant difference in the amount of information required to accurately map glacier boundaries. More data and information may not necessarily produce significantly better results. This represents the inherent difference between black-box empirical pattern-recognition approaches and structural deterministic characterization of properties and process-form relationships. Nevertheless, it is evident that training using more data over larger areas yields better results, but there are no apparent guidelines about what process-morphometry dynamics are being accounted for, and how such dynamics vary over different areas given variations in climate and tectonic forcing.
Large-scale mapping studies over the Himalaya or globally would require extensive training data and computational resources (e.g., supercomputers), and would be very time consuming. Nevertheless, a combination of a scientific structured approach to characterization of geomorphological systems coupled with CNN could potentially reduce the computational complexity.
Our initial work demonstrates that it is possible to produce reasonable results by using up to 20% of the total area for training. Although using 20% of the data for training is higher than most machine-learning approaches, it is typical for deep-learning algorithms [34]. Glaciers are complex and diverse, primarily due to topography, supraglacial debris and feature variations across the ablation zone, mineralogical differences within and across regions, varying relief, sediment shape and size, and so forth. Compared to typical machine learning, deep-learning establishes more rigorous and comprehensive classification boundaries; therefore, it requires more data. Furthermore, we also tested the implications of using only 10% and 15% of the data, and the performance of GlacierNet improved as the amount of training data increased (Table 3).
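The IOU values reported in Table 3 compare a predicted glacier mask against a reference (e.g., GLIMS-derived) mask. A minimal sketch of the metric, shown here on toy masks rather than our study data:

```python
import numpy as np

def intersection_over_union(pred, truth):
    """IoU between a predicted glacier mask and a reference mask.

    Both inputs are boolean 2-D arrays where True marks glacier pixels.
    """
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    union = np.logical_or(pred, truth).sum()
    if union == 0:
        return 1.0  # both masks empty: perfect agreement by convention
    return np.logical_and(pred, truth).sum() / union

# Toy 4x4 masks: intersection = 3 pixels, union = 4 pixels -> IoU = 0.75
pred = np.array([[1, 1, 0, 0],
                 [1, 1, 0, 0],
                 [0, 0, 0, 0],
                 [0, 0, 0, 0]], dtype=bool)
truth = np.array([[1, 1, 0, 0],
                  [0, 1, 0, 0],
                  [0, 0, 0, 0],
                  [0, 0, 0, 0]], dtype=bool)
iou = intersection_over_union(pred, truth)  # 0.75
```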
Results appear to be better in the Karakoram region than in the Nepal region given a similar percentage of training data, primarily because Karakoram glaciers exhibit greater topographic prominence due to extreme relief and climate forcing, with less variability in supraglacial debris and surface features than Nepal glaciers. In the Nepal region, glacier surface characteristics include more supraglacial and proglacial lakes [59], [60], ice cliffs, supraglacial streams, low ice velocity [61], and extremely heterogeneous debris thicknesses [62], due to pronounced glacial thinning [63], [64]. Given the dominance of radiative forcing in Nepal, glacier geometries and topographic prominence on the landscape are less defined, with greater supraglacial debris and feature variability, such that training and defining an accurate decision boundary is more difficult. Furthermore, the accumulation of snowfall in the winter and spring months in the Karakoram, rather than during the summer monsoon as in the Nepal Himalaya, has led to diminished sensitivity to warming, which regulates ablation dynamics and supraglacial topographic evolution and therefore reduces mass-loss rates [65]. Consequently, GlacierNet can be trained to characterize prominent spectral and topographic variation related to boundary zones. In addition, our Karakoram study area (31,127.145 km²) is ∼8 times larger than the Nepal Himalaya study area (3,734.559 km²). Since we forced our model to use no more than 20% of the data for training, it probably had less training data available to characterize distinct glacial surface features in the Nepal Himalaya. This suggests that for smaller study regions, more training data (>20%) may be needed for accurate mapping. A critical point is to utilize training areas covering a variety of samples rather than focusing only on the size of the training area.
For example, a comparison between two different training regions, each containing 10% of the total area but with different altitude and aspect distributions, produces different IOUs (Figure 11, Table 3).

B. TRANSFER-LEARNING NETWORK
Many similarities exist between our two test regions; however, climate forcing mechanisms, surface processes, and lithologies vary enough that direct application of a network trained in one area to the other produces misclassification in the range of 10%. This could be significantly reduced if newly developed land-surface parameters can better characterize the multi-scale morphological variation on glacier surfaces, as similar surface processes and ablation dynamics occur on all DCGs, although their spatial distribution and dominance can vary. Here, we applied a transfer-learning network trained on the Karakoram data to the Nepal data. The advantage of applying a trained network is that it has already established the Karakoram classification boundary, which the Nepal Himalayan data samples can then tune to map the DCGs in the Nepal Himalaya (Figure 6), thus reducing the training epochs. Furthermore, since a network that has learned similar examples is more ''knowledgeable'' and its classification boundary is more comprehensive, the testing performance of the transfer-learning network (TLN) is better than that of a network with randomly initialized weights (Figure 6). Transfer-learning is a powerful method, which could be formalized by characterizing composition, surface-process, and morphology relationships to enable much more diagnostic mapping capabilities and more universal applicability with respect to geographic regions.

C. LIMITATIONS
1) LAKE
The Nepal Himalaya region exhibits relatively high retreat rates, thinning, and the formation of supraglacial and proglacial lakes. Our algorithm was trained to detect supraglacial lakes and include them within the glacier outline, but large proglacial lakes pose a challenge for the algorithm. This is especially evident because the DEM (and its derivative products) was acquired at a time when the glacier terminus extended farther downvalley and was not yet occupied by the proglacial lakes present in the current Landsat images being analyzed (Figure 12). Such data disagreements, although common, confuse the network's decision-making capabilities, thus resulting in error. Potential solutions include adding more water samples to the training data and forcing the network to trust the satellite image data when disagreement occurs. However, as discussed before, adding more training data is not always the best option. Therefore, to reduce the network's confusion and errant or ambiguous output, we generated the NDWI and mapped all water pixels, which was then used as a separate input to remove water pixels that were outside the glacier boundary but still mapped as DCG. We compared the network outputs from the randomly initialized network and the TLN without any NDWI processing and observed that the TLN could reduce the data-disagreement effect because it has a deeper representation of what a DCG looks like (Figure 12).
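The NDWI-based cleanup can be sketched as follows. NDWI is computed from the green and near-infrared bands; the 0.3 water threshold and the separate `glacier_extent` mask are illustrative assumptions, not values taken from our processing chain.

```python
import numpy as np

def ndwi(green, nir, eps=1e-9):
    """Normalized Difference Water Index: (green - NIR) / (green + NIR)."""
    green = np.asarray(green, dtype=float)
    nir = np.asarray(nir, dtype=float)
    return (green - nir) / (green + nir + eps)

def remove_external_water(dcg_mask, green, nir, glacier_extent, threshold=0.3):
    """Drop water pixels mapped as DCG that lie outside the glacier extent.

    Supraglacial lakes (water inside glacier_extent) stay within the outline;
    proglacial lakes and rivers outside it are removed from the DCG mask.
    """
    water = ndwi(green, nir) > threshold  # threshold is an illustrative value
    external_water = water & ~np.asarray(glacier_extent, dtype=bool)
    return np.asarray(dcg_mask, dtype=bool) & ~external_water

# Two pixels: the first is bright in green (water), the second in NIR (land)
green = np.array([[0.5, 0.1]])
nir = np.array([[0.1, 0.5]])
dcg = np.array([[True, True]])            # both pixels classified as DCG
extent = np.array([[False, False]])       # neither inside the glacier extent
cleaned = remove_external_water(dcg, green, nir, extent)
```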

2) ICE-CORED MORAINE
GlacierNet classifies a pixel according to its learned data; however, it occasionally misclassifies areas that exhibit a high degree of similarity. For example, three common misclassifications include the ice-cored moraine in front of proglacial lakes (Figure 13 a), terminus overestimation (Figure 13 b), and proglacial rivers. Although unique, some distinctive characteristics, such as thermokarst lakes and possible ice cliffs on the ice-cored moraine, are similar to those of a heavily debris-covered glacier terminus. Since the algorithm is trained to pick out glacier features and unique patterns across the ablation zone, it identifies ice-cored lake-containment areas as glacier. Manual editing of end moraines could be a viable option, as the lakes help highlight the end moraines. Automated approaches could also be applied: for example, if a small patch of supposed glacier occurs downvalley from a lake and is unattached to areas up the side slopes and valleys, then it is an end moraine. We note that end moraines are not large and, unlike the glacier terminus, have a freeboard, so we used a size-thresholding process to eliminate end moraines. The proglacial-river misclassification mainly occurs in basins where the DCGs have several supraglacial lakes and streams in the training data sets, which leads to misclassification of a few proglacial-river pixels as DCG.
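The size-thresholding step can be sketched with a connected-component pass: detached patches smaller than a pixel-count threshold (the value below is illustrative, not the one used in our study) are dropped while the main glacier body is kept.

```python
import numpy as np
from scipy import ndimage

def remove_small_patches(mask, min_pixels=50):
    """Remove connected DCG patches smaller than min_pixels.

    Detached end-moraine patches downvalley of a lake are small relative to
    the glacier tongue, so a size threshold eliminates them while the main
    glacier body survives.
    """
    mask = np.asarray(mask, dtype=bool)
    labels, n = ndimage.label(mask)  # 4-connected components by default
    if n == 0:
        return mask.copy()
    sizes = ndimage.sum(mask, labels, index=np.arange(1, n + 1))
    keep_ids = np.flatnonzero(sizes >= min_pixels) + 1  # labels start at 1
    return np.isin(labels, keep_ids)

# 9-pixel glacier body plus a 1-pixel detached patch (a stand-in end moraine)
mask = np.zeros((6, 6), dtype=bool)
mask[:3, :3] = True
mask[5, 5] = True
cleaned = remove_small_patches(mask, min_pixels=5)
```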

3) SNOWLINE
The accurate identification of the snowline is important in calculating the total ablation zone of the glacier. However, we observed occasional network confusion/error in picking mixed pixels (ice vs. debris vs. dry snow vs. wet snow vs. firn), especially around the snowline area. Since our focus in this study is to understand the effectiveness of deep-learning in mapping DCGs, we chose to use the SUS. Although we have not evaluated GlacierNet's sensitivity to image resolution, the confusion with mixed pixels occurred only at the snowline, which can sometimes be very subjective in 30 m spatial-resolution data. We believe that high-resolution images will undoubtedly help reduce confusion with mixed pixels.

4) CLOUD AND SHADOW
Clouds and their shadows (Figure 14 a), as well as relief-cast shadows, are issues that have historically affected optical remote-sensing-based glacier mapping. Although they are unavoidable, areas exhibiting relief-cast shadows can still be useful because, even though illumination is relatively lower, the intensity gradient still exists to support network classification (Figure 14 b). We have not observed direct evidence of any shadow-influenced errors in our testing. This could be a major advantage of our work compared to other automated glacier-mapping approaches.

5) DEM
Occasionally, we observed a few pixels within the DCG exhibiting errors larger than the nominal RMSE, causing misclassification. In addition, the ALOS DEM has some voids, but our use of a region-based computing process reduces their effect on the network. However, to make the result more reliable, we took advantage of a larger filter size, such as 5 × 5.
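A generic void-filling sketch with a 5 × 5 window illustrates the idea: each void cell is replaced by the median of the valid cells around it. The void value and the procedure itself are illustrative assumptions; the text above only states that a larger (5 × 5) filter size reduced the effect of DEM voids and outlier pixels.

```python
import numpy as np
from scipy import ndimage

def fill_dem_voids(dem, void_value=-9999.0, size=5):
    """Replace DEM voids with the median of valid cells in a size x size window.

    void_value is a common no-data convention, assumed here for illustration.
    """
    dem = np.asarray(dem, dtype=float)
    voids = dem == void_value
    work = np.where(voids, np.nan, dem)
    # nanmedian per window ignores any void cells that fall inside the window
    med = ndimage.generic_filter(work, np.nanmedian, size=size, mode="nearest")
    filled = dem.copy()
    filled[voids] = med[voids]
    return filled

# A flat 100 m test surface with a single void at its center
dem = np.full((5, 5), 100.0)
dem[2, 2] = -9999.0
filled = fill_dem_voids(dem)
```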

VIII. CONCLUSION
In this research, we proposed GlacierNet, which utilizes a deep-learning approach to automatically delineate DCGs. This approach is unique compared to more conventional methods, in which researchers must spend an inordinate amount of time and effort to manipulate data and utilize a variety of empirical analyses, pattern-recognition algorithms, and post-classification procedures to produce reasonable results. Our deep-learning approach evaluates and converges on feature identifications and weighs the advantages of each input data layer automatically. We tested our method on two study areas in the central Karakoram and Nepal Himalaya regions. GlacierNet delineates accurate DCG boundaries after being trained using data from 10 to 20% of the Karakoram region and 15 to 20% of the smaller Nepal area. We implemented a transfer-learning method in the Nepal Himalaya area, which significantly reduced the training time and improved the IOU. Future work will require applying the deep-learning methodology to characterize snow-covered and exposed glacier-ice facies based on spatial constraints, flow geometries, and surface morphological and compositional variations. A scientifically structured approach toward characterizing basin geomorphological systems may significantly reduce the amount of training time needed to produce glacier outlines superior to human interpretation. This will require us to modify the architecture of a CNN to be able to diagnostically characterize complex variations in glacier dynamics over geographic regions with varying climatic and tectonic forcing factors and ablation dynamics. Our initial results serve as a proof-of-concept for improved mapping of DCGs and improvement of existing glacial inventories using a deep-learning approach. More research on characterizing the multi-scale morphometric variations in topography related to process dynamics should significantly improve mapping results.

ACKNOWLEDGMENT
The author Zhiyuan Xie thanks funding support from the College of Arts and Science, School of Engineering, and Integrative Science and Engineering Center, University of Dayton. The author Umesh K. Haritashya thanks the Ohio Supercomputer Center (www.osc.edu) for extending allocation hours through project PNS0435 to run their algorithm. The authors thank the National Aeronautics and Space Administration (NASA) and the United States Geological Survey (USGS) for providing Landsat 8 images, the Japan Aerospace Exploration Agency (JAXA) for the ALOS DEM, and Global Land Ice Measurements from Space (GLIMS) for providing glacier boundaries free of charge. They also thank C. Scott Watson for downloading the ALOS DEM for this work, two anonymous reviewers for suggesting improvements to this manuscript, and two other anonymous reviewers of the previous version of this manuscript, which was submitted elsewhere. Their comments motivated the authors to revise the manuscript extensively and helped in its rapid acceptance in this journal.

UMESH K. HARITASHYA is currently the Endowed Mann Chair of natural sciences and an Associate Professor with the Department of Geology, University of Dayton. He and his group have received research grants from agencies such as NASA, USAID, UNDP, and the National Geographic Society. He has published more than 40 articles, book chapters, and an edited book. His research interests include cryospheric science, glacier hydrological modeling, remote sensing, and artificial intelligence.