Towards Solving Ambiguous SAR Textures in Convolutional Neural Networks for Automatic Sea Ice Concentration Charting

Abstract: Automatically producing Arctic sea ice charts from Sentinel-1 synthetic aperture radar (SAR) images is challenging for convolutional neural networks (CNNs) due to ambiguous backscattering signatures. The receptive field, i.e., the number of input-image pixels that the CNN model views to generate an output pixel, is important for detecting large features or physical objects such as sea ice and classifying them correctly. In addition, a noise phenomenon is present in the Sentinel-1 ESA Instrument Processing Facility (IPF) v2.9 SAR data, particularly in subswath transitions, visible as long vertical lines and grained particles resembling small sea ice floes. To overcome these two challenges, we first suggest adjusting the receptive field of the popular U-Net CNN architecture used for semantic segmentation. This is achieved by symmetrically adding blocks of convolutional, pooling, and upsampling layers in the encoder and decoder of the U-Net, which increases the number of levels. This yields large improvements in performance and in the homogeneity of predictions. Second, training models on SAR data noise-corrected with an enhanced technique yields a significant increase in model performance and enables better predictions in uncertain regions. An eight-level U-Net trained on the alternative noise-corrected SAR data is shown to correctly predict many ambiguous SAR signatures and increases the performance by 8.44 percentage points compared with the regular U-Net trained on the ordinary ESA IPF v2.9 noise-corrected SAR data. This is the first installment in a series of articles on AI applied to sea ice (in short, AI4SeaIce).

or shipping through the Northern sea routes connecting the North Pacific to the Atlantic Ocean [2]. With a growth in geopolitical attention in the Arctic [3], Arctic nations are pressured to affirm their sovereignty, requiring the deployment of military patrol vessels. Traversing the Arctic waters safely and efficiently necessitates up-to-date charts of the constantly moving and changing sea ice conditions highlighting the contemporary sea ice extent, local concentration, and auxiliary descriptions of the ice conditions.
For several decades, sea ice charts have been manually produced by visually inspecting and analyzing satellite imagery [4]. Synthetic aperture radar (SAR) images are often used for this task due to the high resolution and the capability of acquiring images independently of clouds and sun illumination. Observations from other space-borne sensors are used by the ice analysts when available and advantageous, including optical and passive microwave radiometer (PMR) observations. In optical imagery, the difference between bright white sea ice and dark blue open water is easily distinguishable, but the dependence on sun illumination and cloud-free conditions reduces the utility for operational sea ice charting. The microwave signatures of sea ice and open water in PMR observations are generally easily distinguishable, but the coarse resolution (typically tens of kilometers) limits its use for detailed sea ice charting and nautical navigation purposes. However, the backscattering signatures of SAR are difficult to interpret and require trained ice analysts to describe the sea ice conditions. In addition, there are ambiguities in the electromagnetic signature between open water and sea ice arising when strong winds occur and for specific ice conditions, such as compact or landfast ice. The Sentinel-1 extra-wide (EW) mode SAR covers an area of about 400 km × 400 km or on the order of 10 000 × 10 000 pixels in the native level-1 medium resolution of 40-m pixel spacing [5]. Due to the image size, manual inspection is labor-intensive and time-consuming, limiting the time the ice analysts can spend on each image, meaning that they have to prioritize maritime operational regions near the sea ice edge [4]. Moreover, the accuracy of an ice chart diminishes with time due to the dynamic nature of the sea ice, and therefore, ice analysts have limited time to perform the analysis. Therefore, automation of sea ice chart production has the potential to increase the use of captured satellite imagery and to provide faster and more frequent chart deliveries, a higher level of detail, and a broader geographical coverage while increasing consistency.
With the development of Earth observation (EO) programs and artificial intelligence (AI), automating sea ice concentration (SIC) mapping using deep learning and convolutional neural networks (CNNs) was initially published by Lei Wang in 2016 [6], exemplifying the potential of CNNs for sea ice charting. The authors used semantic segmentation, classifying individual pixels, to map sea ice in the Beaufort Sea, using dual-polarization SAR images from RADARSAT-2 and a fully connected classification layer for pixel-wise labeling. The model was further developed in 2017, highlighting the advantage of fully convolutional networks [7]. More recent advancements were made in 2020 by the Automatic Sea Ice Products (ASIP) project funded by the Innovation Fund Denmark [continued as part of the AI4Arctic European Space Agency (ESA) initiative] [8], which sought to overcome the challenges of high wind speed and compact sea ice SAR ambiguities by fusing Sentinel-1 SAR and AMSR2 PMR in an atrous pyramid convolutional network [9]. The most recent publication in the field [10] applies a U-Net CNN architecture [11] to downscaled Sentinel-1 SAR data and carries out experiments with both categorical and regression loss functions, and a combination of them. In another branch of sea ice charting, classifying the type of sea ice, instead of concentration, has been carried out in [12].
This article investigates creating sea ice charts automatically, based solely on Sentinel-1 dual-polarization SAR images, to produce high-quality ice charts retaining high resolution and level of detail while simplifying the operational data pipeline. In addition, relying only on Sentinel-1 satellites can increase the number of ice charts produced, because the scenes without (spatio-temporally corresponding) PMR do not need to be discarded. Unlike the standard applications of computer vision, i.e., to regular camera images, the large scale of SAR images (up to 10 000 × 10 000 pixels) creates significant obstacles, as computer resources capable of training CNN models directly on these images are not commonly available. Meanwhile, semantic segmentation network architectures such as U-Net [11] were developed for images of 572 × 572 pixels, much smaller than SAR images. The receptive field of the CNN model is a measure of how many input pixels contribute to the final prediction of a pixel in the output layer. The regular U-Net has a receptive field of 188 × 188 pixels. In contrast, sea ice objects and features may extend for thousands of pixels, which the model is oblivious to. A notable conclusion and suggestion given in [8] is to increase the receptive field of the model, allowing it to predict the value of an output pixel based on the information from a wider area in the input image. In [13], the authors attempted to improve the results from [8] using different spatial windows to train a model, indicating a positive impact of increasing the effective spatial receptive field. This is evident when we consider that the ice analysts are capable of inspecting entire SAR images, contrary to the models, which may give ice analysts an advantage, as shown in Fig. 3. Identifying important features further away from the area of ambiguity can help differentiate between sea ice and open water. Furthermore, the persistent speckle noise and thermal noise-induced subswath transitions in the Sentinel-1 SAR scanning technique TOPSAR, visible as long-grained vertical lines, limit the model performance [14].
Therefore, this article investigates the effects of applying an alternative SAR noise correction scheme [14], developed by the Nansen Environmental and Remote Sensing Center (NERSC), and increasing the number of layers, and the size of the associated receptive field of the U-Net model architecture.
The article is organized as follows. Section II describes the utilized dataset with examples, preprocessing, and data distribution. Section III presents the classification approach, data pipeline, and evaluation metrics. Section IV outlines the model architectures and how the receptive field is increased. This is followed by Section V, highlighting the results of the experiments, accompanied by a discussion. The article is concluded with a summary of the main findings and results in Section VI.

II. AI4ARCTIC/ASIP SEA ICE DATASET-VERSION 2
The experiments are realized using the ESA AI Ready Earth Observation (AIREO) sea ice dataset, AI4Arctic/ASIP v2 (ASID-v2) [4]. It was compiled by the Technical University of Denmark (DTU), Danish Meteorological Institute (DMI), and Nansen Environmental and Remote Sensing Center (NERSC), and released on October 2, 2020. It comprises 461 scenes, each containing a Sentinel-1 dual-polarized HH and HV SAR image, auxiliary image parameters, a corresponding ice chart manually drawn by sea ice experts from the SAR image, and PMR measurements from the AMSR2 instrument on board the JAXA GCOM-W satellite. Scenes are distributed across the Greenland coast, as illustrated in Fig. 1, from March 14, 2018, to May 25, 2019. All data are co-located and georeferenced, and the size of the dataset is 315 GB. In this study, we utilize only SAR images as input data and sea ice charts as reference.

A. Sentinel-1 SAR
The Copernicus Sentinel-1 A and B SAR satellites operate in the C-band with 5.405-GHz frequency (5.5-cm wavelength) [5]. The data products used are exclusively medium-resolution level 1 ground range detected (GRDM), recorded in EW operational mode [93 m × 87 m resolution (range × azimuth), with a pixel spacing of 40 m], and data are available in the original GRDM geometry with no geographical projection. The backscatter values are calibrated and converted from dB in the range [−30, +10] to a linear scale within [−1, 1]; however, some outliers may still be present. Due to negative backscatter values, some scaled values are as low as −4.5 [4]. EW SAR images are created by combining five subswaths in the azimuth direction, which exhibit slight radiometric variations. The Sentinel-1 TOPSAR technique, which electronically steers the antenna beam, causes the weighting of the radar echoes to vary, creating a scalloping effect, where the center of the burst is brighter than the edges [15]. Moreover, the scalloping extent is dependent on the antenna steering angle of each burst, which makes the effect subswath-dependent and consistent within the same subswath. The initial near-range subswath with the lowest incidence angle is particularly affected.
The ESA Instrument Processing Facility (IPF) v2.9 (November 28, 2019) has been widely deployed and compensates for the scalloping effect by applying the inverse of the scalloping gain function [15]. More recently, an extended thermal noise correction method has been proposed by the NERSC. The authors in [14] and [16] suggest that the noise is composed of an additive thermal [17] and a multiplicative textural noise component. IPF v2.9 offsets the scalloping component well. The multiplicative part is introduced during scaling by the SAR processor, and the authors suggest offsetting it by subwindow-wise adaptive rescaling of the additively denoised pixel values. Rescaling is based on the optimal coefficient of the noise-induced standard deviation and the estimation of the noise contribution to the local standard deviation [14].

B. Sea Ice Charts
Each ice chart is produced based on a Sentinel-1 image and is a snapshot of the ice conditions at the acquisition time. Ice analysts interpret the SAR image and draw polygons in a commercial GIS software production system based on fairly homogeneous regions of sea ice conditions. The conditions are described by multiple parameters and follow the World Meteorological Organization (WMO) code for sea ice, Sea Ice GeoReferenced Information and Data (SIGRID3). The primary descriptive factor is SIC, a metric from 0% to 100% indicating the ratio of sea ice to open water, where 0% is ice-free open water and 100% is fully covered sea ice. The ice concentration mapping is created through a creative process of individual interpretation steered by common guidelines with no associated uncertainty. However, studies have suggested that the concentrations assigned by ice analysts vary by 20% on average, with discrepancies of up to 60% [18]. Intermediate SICs (10%-90%) are particularly difficult to assess. The regions near the edge of the sea ice cover, called the marginal ice zone, receive more attention because they are the most important areas for maritime operations. In comparison, inner ice areas with low maritime activity receive less attention. Despite these uncertainties, we treat each pixel as equally valid.
The ASID-v2 ice charts are delivered as polygons with an individual ice code and an associated lookup table containing a multitude of parameters for the specific polygon, including total SIC, partial concentration, stage of development, and form of dominant ice types. In our analysis, the SIC is divided into 14 different classes. Eleven of these describe the concentration from 0% to 100% (inclusive) in discrete increments of 10%. Two describe less than 10% sea ice and bergy water, both of which have been converted to the representative open-water class (0%) in the following experiments. One class describes landfast ice, which is converted into the 100% sea ice class. The equivalent ice chart produced from the SAR images in Fig. 2 is illustrated in Fig. 3(a) with a similar mask applied. The region mislabeled in Fig. 3(b) is of relatively high homogeneity in the SAR image, which potentially causes the mislabeling.
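The class consolidation described above can be illustrated with a minimal Python sketch. The string labels for the three non-percentage categories are hypothetical placeholders, not the SIGRID3 codes used in the dataset, and class 11 for masked pixels follows the convention stated in the caption of Fig. 4.

```python
# Hypothetical labels for the chart categories that are not plain percentages.
OPEN_WATER_LIKE = {"less_than_10_percent", "bergy_water"}
LANDFAST = "landfast_ice"
MASK_CLASS = 11  # masked pixels (land or no data), as in the caption of Fig. 4


def sic_to_class(sic):
    """Map one of the 14 chart categories to a training class (0-10) or the mask."""
    if sic is None:
        return MASK_CLASS
    if sic in OPEN_WATER_LIKE:
        return 0            # merged into the open-water class (0%)
    if sic == LANDFAST:
        return 10           # merged into the 100% sea ice class
    if not isinstance(sic, int) or sic % 10 != 0 or not 0 <= sic <= 100:
        raise ValueError(f"unexpected SIC category: {sic!r}")
    return sic // 10        # 0%, 10%, ..., 100% -> classes 0, 1, ..., 10
```

With this mapping, the 14 chart categories collapse to the 11 concentration classes used for training, and masked pixels are kept apart as class 11.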

D. Data Distribution
The training and testing scenes are selected among the 461 scenes. Six scenes containing errors in the ice charts depicting open water as 100% sea ice are removed, and scenes without sea ice are discarded to balance the class distribution. For training, 306 scenes are selected. In collaboration with DMI, 23 scenes, deemed difficult by professional ice analysts, have been selected for testing. There is a split of roughly 9:1 between the train and test sets, respectively. The geographical distribution of scenes is illustrated in Fig. 1. The main obstacle when choosing the train and test set distributions is that the scenes are large (up to roughly 5000 × 5000 pixels after preprocessing and downsampling) while only 461 such scenes are available. The number of pixels is very high, but the diversity in geographical and seasonal samples is limited. Hence, there is an abundance of data, but it may not be representative of other periods or regions. Randomly cropping patches creates multiple smaller samples from each scene. However, sampling both the train and test data from the same scene is avoided to minimize biases, as each scene is spatially correlated. Therefore, selecting the test scenes is a balancing act, ensuring similar regional, seasonal, and class representation to the training data while retaining a reasonable train and test split. Due to these constraints, instead of the usual training, validation, and test split, the latter two are combined to provide a broader test set.

III. IMPLEMENTATION AND DATA PIPELINE
The estimation of SIC can be formulated as a regression problem (predicting absolute sea ice percentage) or a discrete classification problem. In this study, SIC estimation is formulated as a classification problem with the (weighted) categorical cross-entropy as the loss function, which represents the dissimilarity between the real distribution of labels and the output distribution predicted by the model. The categorical cross-entropy loss evaluates each pixel vector individually and then averages over all the pixels, essentially giving each of them an equal weight. To push the model to pay more attention to underrepresented classes, each class is weighted based on the median frequency weighting scheme [19]. It is defined as

$$w_{class} = \frac{median\_freq}{class\_freq} \quad (1)$$

where $w_{class}$ is the class weight, $median\_freq$ is the median frequency of all the classes, and $class\_freq$ is the class frequency, i.e., the ratio of the number of pixels in the class, $N_{class}$, to the total pixel count, $N_{pixels}$:

$$class\_freq = \frac{N_{class}}{N_{pixels}}$$

For each pixel, the weighted cross-entropy loss is defined as

$$loss_i = -w_{class} \log(p_{class})$$

where $p$ is the vector of predicted class probabilities, for each class, and $class$ is the index of the true class. The final weighted loss is calculated by summing the pixel losses across all the $N_{pixels}$ pixels in the batch and discounting it by the sum of class weights

$$loss = \frac{\sum_{i=1}^{N_{pixels}} loss_i}{\sum_{i=1}^{N_{pixels}} w_{class_i}}$$

Batches are created by randomly cropping patches of 512 × 512 or 768 × 768 pixels directly from the training scenes, with a batch size of 64 or 32, respectively. Patches with no valid pixels are discarded. As the number of valid pixels varies among scenes, a probability, $p(s_i)$, of sampling from the $i$th scene is defined as

$$p(s_i) = \frac{n(s_i)}{n(s^*)}$$

where $n(s)$ is the number of pixels in a given scene, $s$, and $s^*$ denotes the scene with the fewest pixels, i.e.,

$$s^* = \arg\min_{s} n(s)$$

Thus, the scene with the fewest "valid" pixels is sampled only once, and a scene with ten times as many pixels will be sampled ten times more often. The model is trained on 500 batches between each testing step, which constitutes one epoch. After a batch is compiled, data augmentation using the open-source Python module imgaug [20] is applied. Each batch is assigned a random set of augmentations, drawn independently for every patch and applied identically to the SAR and ice chart data. The possible augmentations are the elements of the dihedral group: 0°, 90°, 180°, or 270° rotations, and horizontal, vertical, and two diagonal flips, i.e., eight in total. Additionally, there is a 50% chance of applying between 1 and 4 of the following affine transformations with a random bounded magnitude: [−44.99, 44.99] degrees of rotation, ±30% scaling, ±30% translation, and ±10 degrees of shearing. The purpose of the applied data augmentation is to provide additional variation in the samples, limiting the number of identical patches that the model sees. This can help minimize model overfitting.
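The patch sampling and the dihedral part of the augmentation can be sketched as follows. This is a minimal NumPy illustration, not the imgaug-based pipeline used in the article: the function names, the (channels, height, width) array layout, and the normalization of the relative scene weights into probabilities are assumptions, and neither the affine transformations nor the rejection of patches without valid pixels is included.

```python
import numpy as np


def scene_probabilities(valid_pixel_counts):
    """Sampling probabilities proportional to the number of valid pixels per scene."""
    n = np.asarray(valid_pixel_counts, dtype=float)
    relative = n / n.min()            # the smallest scene gets relative weight 1
    return relative / relative.sum()  # normalized so the weights sum to 1


def dihedral(patch, k):
    """Apply one of the eight dihedral transforms (k = 0..7) to the last two axes."""
    out = np.rot90(patch, k % 4, axes=(-2, -1))  # 0, 90, 180, or 270 degrees
    if k >= 4:
        out = np.flip(out, axis=-1)              # rotation followed by a flip
    return out


def sample_patch(sar, chart, size, rng):
    """Crop a random size x size patch and apply the same transform to both inputs.

    sar has shape (channels, H, W) and chart has shape (H, W).
    """
    height, width = chart.shape
    top = rng.integers(0, height - size + 1)
    left = rng.integers(0, width - size + 1)
    k = rng.integers(0, 8)
    sar_patch = sar[:, top:top + size, left:left + size]
    chart_patch = chart[top:top + size, left:left + size]
    return dihedral(sar_patch, k), dihedral(chart_patch, k)
```

A scene index can then be drawn with `rng.choice(len(counts), p=scene_probabilities(counts))`, where `rng = np.random.default_rng()`, so that a scene with ten times as many valid pixels is drawn ten times as often.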
After each training epoch, the testing step occurs. Testing is carried out without augmentation on full scenes, i.e., no patching or stitching is utilized. This is in line with the recommendations for large images given in [21]. Performance is assessed based on the statistical $R^2$ coefficient, also known as the coefficient of determination, which is a measure of similarity between two sets of data. It is defined as

$$R^2 = 1 - \frac{\sum_{i} \left(y_i^{true} - y_i^{pred}\right)^2}{\sum_{i} \left(y_i^{true} - \bar{y}^{true}\right)^2}$$

where $y_i^{true}$ is the true class of the $i$th pixel, $\bar{y}^{true}$ is the mean true pixel value, and $y_i^{pred}$ is the predicted class of the $i$th pixel. The $R^2$ metric has two main advantages over accuracy. First, as SIC is by nature a continuous value from 0% to 100%, predicting 70% or 90% sea ice when the correct value is 80% is far better than predicting 0%, and this is rewarded by the $R^2$ coefficient. Second, accuracy does not reflect the data imbalance well. The models are trained for about 80-90 epochs (approximately 22-24 h training duration) using the Adam optimizer with a fixed learning rate of $10^{-4}$ and default hyperparameters. The testing step takes approximately 2 min, and each scene approximately 6 s. After the final training iteration, the model version that achieved the highest $R^2$-score on the testing step is selected. We also use this score to compare models. The $R^2$ test performance is calculated based on all the testing pixels (excluding masked pixels, class 11). Two Nvidia Tesla V100 SXM2 32-GB graphics cards have been used for computation. The training environment and models have been created in Python 3.8.2 using the open-source PyTorch 1.8 library.
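The evaluation metric can be illustrated with a short NumPy sketch. The function name and the flattening of the inputs are assumptions for illustration; the exclusion of class 11 follows the text.

```python
import numpy as np

MASK_CLASS = 11  # masked pixels are excluded from the score


def r2_score(y_true, y_pred, mask_class=MASK_CLASS):
    """Coefficient of determination over all unmasked pixels."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    valid = y_true != mask_class
    y_true, y_pred = y_true[valid], y_pred[valid]
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot
```

Because the residuals are squared, predicting class 7 for a pixel whose true class is 8 costs far less than predicting class 0, which is the behavior motivating the choice of $R^2$ over accuracy. The score is also unchanged if classes are expressed as percentages instead of class indices, since both sums scale by the same factor.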

IV. CNN MODELS
U-Net has a near-symmetric encoder-decoder structure in which the contracting path captures rich low-level representations while the expanding path enables precise localization. Skip connections are used between corresponding pairs of encoder and decoder blocks to propagate information from the contracting path to the expanding path. This facilitates the recovery of high-frequency spatial information and improves the boundary accuracy. In the U-Net architecture, a block constitutes a sequence of two 3 × 3 convolutional layers, each followed by a batch normalization (BN) procedure and the rectified linear unit (ReLU) activation function. In the contracting path, 2 × 2 max-pooling operations are used for feature map downsampling. Similarly, in the expanding path, every block is preceded by a bilinear upsampling operation. Each symmetric encoder and decoder block with a skip connection between them is defined here as a level. A schematic overview of a regular four-level U-Net architecture is shown in Fig. 5. The original U-Net in [11] uses an identical number of filters in the convolutional layers across the same level and doubles them for each level, i.e., 64, 128, 256, and 512. In this work, we limit the number of filters to 16 in the initial and final levels, and 32 in the remaining levels, both to simplify the model and to minimize the risk of overfitting to ambiguous SAR textures. The U-Net implementation is available at [22].
The receptive field size of sequential convolutional layers can be calculated using the equation [23]

$$r_0 = \sum_{l=1}^{L} \left( (k_l - 1) \prod_{i=1}^{l-1} st_i \right) + 1$$

where $r_0$ is the equivalent number of pixels in the input image, $L$ is the total number of layers, $l$ is the current layer index, $k_l$ is the kernel size of that layer, $i$ is the index of a previous layer, and $st_i$ is its stride. Fig. 6 illustrates the receptive field per layer of a two-level encoder and decoder in the U-Net.
Starting from the layer $L$ in the encoder, and going left, it is clear that every 3 × 3 convolutional layer increases the receptive field by 2 pixels, while the 2 × 2 pooling layer with stride 2 doubles it. The receptive field for the two-level U-Net encoder is 32 pixels. Additional levels raise this to 68, 140, 284, 572, 1148, and 2300. The receptive field of the decoder with respect to the final layer pixels can similarly be identified by starting from the model output, as illustrated in Fig. 6. In contrast to pooling layers, upsampling halves the receptive field. With two convolutional layers between each upsampling operation, the total receptive field can at most be composed of four encoder pixels. The blue striped pixels in the U-Net illustrated in Fig. 6 represent the increase in receptive field per additional decoder pixel. This makes the effective receptive field of the U-Net equal to the encoder receptive field plus $3 \times 2^{N_{levels}}$ for models with at least two levels. The receptive field of the U-Net models is thus 44, 92, 188, 380, 764, 1532, and 3068 for 2-8 levels, respectively. Therefore, adding more levels in the U-Net is an effective way of both increasing the number of layers in the architecture, enabling the network to model more complicated functions, and expanding the amount of available information for predicting individual pixels through a larger receptive field.
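The receptive field values quoted above can be checked with a short Python sketch. The layer layout is an assumption chosen to match the description: each level contributes two 3 × 3 convolutions and one 2 × 2 pooling layer with stride 2, followed by a final block of two convolutions at the bottom of the encoder. The function names are illustrative.

```python
def receptive_field(layers):
    """Receptive field of a layer chain given as (kernel, stride) pairs,
    ordered from the input towards the output."""
    r, jump = 1, 1
    for kernel, stride in layers:
        r += (kernel - 1) * jump  # (k_l - 1) times the product of earlier strides
        jump *= stride
    return r


def encoder_layers(levels):
    """Assumed encoder layout: (block + pooling) per level, then a bottom block."""
    block = [(3, 1), (3, 1)]  # two 3x3 convolutions, stride 1
    pool = [(2, 2)]           # 2x2 max-pooling, stride 2
    return (block + pool) * levels + block


def unet_receptive_field(levels):
    """Encoder receptive field plus the decoder contribution of 3 * 2**levels."""
    return receptive_field(encoder_layers(levels)) + 3 * 2 ** levels


for n in range(2, 9):
    print(n, receptive_field(encoder_layers(n)), unet_receptive_field(n))
```

Under these assumptions the loop reproduces the encoder values 32 to 2300 and the U-Net values 44 to 3068 for two to eight levels, including the 188 pixels of the regular four-level U-Net.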

V. RESULTS AND DISCUSSION
In Table I, ten architectures, the associated model hyperparameters, and the highest achieved performance based on the test scenes are presented. The table is ordered with respect to the receptive field size and noise correction. The patch size was increased in models 9 and 10 to better accommodate the increased receptive field during training. These models were only trained on NERSC noise correction, as it had proven to be superior in models 1-8. Due to hardware constraints (GPU memory), the associated batch size was lowered to 32.
The best performing model is number 10, with eight levels and a receptive field of 3068 pixels, trained on NERSC noise correction with a patch size of 768 × 768, with an overall $R^2$-score of 86.34%. The resulting inference on the scene in Figs. 2 and 3(a) is illustrated in Fig. 7. Predictions are more homogeneous, and the upper left corner is correctly classified as 100% sea ice compared with the result from the standard U-Net with ESA noise correction in Fig. 3(b), demonstrating the improvement made by both the extended U-Net architecture and NERSC noise correction. The $R^2$-score for the scene is 47.12%. Discrepancies are noticeable in the fjords in the upper half and right side of the image. In addition, intermediate SIC (10%-90%) may be improved in the image center. Generally, fjords are difficult to classify as few pixels are present from shore to shore due to SAR resolution. Moreover, in the absence of rough open-ocean dynamics, the sea ice may form smooth surfaces and reduce SAR backscatter, making it appear darker due to specular SAR reflections. It is also possible that ice analysts know from previous years that these fjords would usually be covered with sea ice at the image acquisition time, which can help them decide in ambiguous scenarios.
Furthermore, the results indicate that increasing the number of levels in U-Net generally improves the model performance, with greater improvements for those with 3-5 levels. Applying NERSC noise correction is also shown to provide clear improvements, particularly for models with smaller receptive fields. As Sentinel-1 TOPSAR noise is spatial, it may occupy a substantial portion of small receptive fields. The noise is associated with small-grained pixels that may cause open water to become ambiguous with small sea ice floes. A larger receptive field may make the model more robust to spatial noise.
The difference between models trained with the two noise corrections and the impact of increasing the number of levels in the U-Net are illustrated in Fig. 8. The first row, Fig. 8(a)-(d), shows models trained on ESA noise-corrected SAR data, and the second row, (e)-(h), shows those trained on NERSC noise-corrected data. The columns show predictions from U-Nets with 3-6 levels for each noise correction. The associated $R^2$-scores are presented underneath the images. The NERSC-trained U-Nets are more robust against the ambiguous region in the upper left corner. For both noise corrections, increasing the number of U-Net levels provides more homogeneous predictions without incorrect open-water gaps. The performance in fjords is also decent. The associated $R^2$-scores highlight the importance of NERSC noise correction for this particular scene, with the three-level U-Net scoring better than all the models trained on ESA noise-corrected data.
While models with more U-Net levels produce predictions more similar to the manually drawn ice charts, models with fewer levels are capable of creating more detailed charts but have inferior overall performance. There may be a trade-off between the level of detail and homogeneity: a larger receptive field increases the homogeneity of SIC predictions, but it also appears to reduce the level of detail in predictions. Creating a model capable of producing homogeneous predictions and fine details in areas with an abundance of maritime activity, such as the marginal ice zone, could be ideal. Ultimately, the decision should be in agreement with the national ice services and chart users.
Four additional scenes from different regions and seasons are illustrated in Figs. 9 and 10 (inferred by U-Net number 10 with eight levels). Fig. 9  islands and coast do not interfere with the model's capability of identifying the sea ice. However, the drifting sea ice in the upper right quadrant of the scene is not predicted in as much detail as in the reference ice chart. This may be a downside of applying a U-Net architecture with a very large receptive field.
It is difficult to compare the presented models' metric performance with other recent publications, i.e., [8], [10], as different training and test sets are utilized. However, a visual qualitative inspection of the produced ice predictions can be carried out. In [8], predictions are able to differentiate open water from sea ice but suffer from the Sentinel-1 SAR speckle and subswath transition textures, which may be a result of utilizing the previous version of Sentinel-1 SAR noise correction and applying a model architecture with lower depth and smaller receptive field. Figures in [10] indicate strong correlations between hand-crafted ice charts and predictions but suffer similar issues related to ambiguous SAR textures in regions of rough ocean surfaces caused by strong winds, Sentinel-1 subswath transitions, and compacted sea ice. The results presented here appear to be more robust to these obstacles but are still challenged by ambiguous SAR signatures from landfast ice, limited context in the smaller fjords, and less-than-optimal performance at intermediate SICs.

VI. CONCLUSION
This study addresses the impact of ambiguous SAR backscatter signatures on sea ice predictions in regions of fully covered sea ice for CNN models utilizing only Sentinel-1 SAR data. Two problems are investigated: the impact of the receptive field of the applied U-Net and the selection of the thermal noise correction scheme. Experiments with the standardized U-Net architecture clearly show that increasing the number of levels produces more homogeneous predictions with a stronger resemblance to ice charts created by trained human ice analysts. The results also indicate that NERSC noise correction is superior to ESA IPF v2.9 for predicting SIC and enables models to predict more reliably in regions of full sea ice cover with little variation in the SAR textures. Overall, the results indicate that increasing the receptive field of the model and applying a superior SAR noise correction is a significant step toward automatic production of high-resolution sea ice charts with standalone Sentinel-1 SAR in seconds rather than hours.

VII. FUTURE WORK
Several improvements to the presented SAR-only trained models could still be investigated. Increasing the decoder receptive field could be achieved by adding additional convolutional layers in each decoder block, providing a receptive field greater than 4 in the final encoder layer. Naturally, further expansion of the number of levels of the U-Net could also be of interest, but it requires changing the choice of testing scenes or refraining from downsampling the SAR pixel spacing from 40 to 80 m. However, this would reduce the effective spatial receptive field of the models.
The landfast sea ice prediction problem could potentially be addressed using larger training patch sizes and additional scenes from, for example, Scoresbysund, where this phenomenon is well known. Adding a class for landfast ice could also encourage the model to address it specifically. Providing additional auxiliary data such as distance to land and season/month of image acquisition could enable the model to better understand regional and seasonal variations.
Finally, this work has only investigated increasing the number of U-Net levels on SAR-only models. Similar investigations could be carried out using an SAR and PMR data fusion model, which could lead to direct comparisons of these types of models.

Fig. 1. Geographical location of training SAR scenes is shown as a red semi-transparent silhouette. The black frame highlights the ice chart location of Figs. 2 and 3(a). Image reproduced from the GEBCO world map 2014 (www.gebco.net).

Fig. 2. Sentinel-1 SAR image from Baffin Bay, Western Greenland, acquired on December 12, 2018. The location is highlighted as a black square in Fig. 1. Brightness of the images is adjusted by clipping to the individual 5th and 95th distribution percentiles and displayed in dB. (a) and (b): HH and HV SAR channels, respectively, noise-corrected with ESA IPF v2.9 and containing a land and ice chart mask. (c) and (d): HH and HV SAR channels, respectively, noise-corrected with the NERSC denoising scheme.
The applied noise correction uses ESA IPF v2.9 for the azimuth component and NERSC noise correction for the range additive and multiplicative components. Negative SAR backscatter values after noise correction are replaced based on neighboring pixels in a 10 × 10 moving window. Fig. 2(a)-(d) illustrates a scene from Baffin Bay, Western Greenland, captured on May 21, 2018. Parts (a) and (b) are noise-corrected using the ESA IPF technique, contain a land mask from DMI, and are limited by the extent of the ice chart. Parts (c) and (d) are modified with the NERSC noise correction scheme, with the DMI land mask absent. The images are grayscaled using the individual images' 5th and 95th percentiles. The difference between the two noise correction schemes is most visible along the vertical subswath transitions in the HV channels. White pixels in all the images are not-a-number (NaN) values, representing areas with no data and masked land.
(b), a typical ambiguous example is highlighted: areas in the upper left corner of the image identified as 100% SIC are mislabeled as open water. It is produced by an SAR-trained U-Net architecture (model #3 in Table

Fig. 3. Example of sea ice concentration maps derived from the SAR image in Fig. 2. (a) Manual interpretation by an ice analyst. (b) Semantic segmentation of the SAR image by the U-Net model [11] (#3 in Table I).
(a), the seasonal distributions for the train and test scenes are shown. An overview of the train and test data class distributions is presented in Fig. 4(b). Class 0 (open water), class 10 (100% sea ice), and class 11 (masked pixels) are most represented. The intermediate sea ice classes, classes 1-9, are less represented, though relatively equally so. The test scenes are well distributed among the regions and seasons, with few in the South and fewer during spring. The test class distribution reflects the imbalance of the dataset, with slightly elevated quantities of intermediate SICs compared with the training data.

Fig. 4. (a) Seasonal distribution of train and test scenes. (b) Percentage of pixels belonging to each class with respect to the total pixels in the train and test split. Classes 0-10 refer to 0%-100% sea ice, and class 11 is masked pixels.
$N_{\mathrm{class}}$ represents the number of pixels per class and $N_{\mathrm{pixels}}$ the total pixel count. Masked pixels are not included in the calculations and are assigned a class weight of zero. The class weights are calculated based on the training data distribution: $w_{0,1,\ldots,11}$ = [0.039, 1.413, 0.907, 0.925, 1.089, 1.401, 1.233, 1.154, 0.702, 0.369, 0.099, 0].
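To illustrate how a zero weight removes masked pixels from training, the sketch below applies the listed class weights in a weighted cross-entropy loss. Several parts are assumptions for illustration only: that the loss is cross-entropy, that it is normalized by the sum of the applied weights, and the names `CLASS_WEIGHTS` and `weighted_cross_entropy`. The weight values themselves are the ones quoted in the text.

```python
import numpy as np

# Class weights quoted in the text: classes 0-10 are 0%-100% sea ice,
# class 11 is masked pixels (weight zero).
CLASS_WEIGHTS = np.array([0.039, 1.413, 0.907, 0.925, 1.089, 1.401,
                          1.233, 1.154, 0.702, 0.369, 0.099, 0.0])


def weighted_cross_entropy(probs, labels, weights=CLASS_WEIGHTS, eps=1e-12):
    """Weighted cross-entropy over flattened pixels (illustrative only).

    probs:  (n_pixels, n_classes) predicted class probabilities
    labels: (n_pixels,) integer reference classes
    Assumption: the loss is normalized by the sum of the applied weights,
    so pixels with weight zero contribute nothing.
    """
    pixel_weights = weights[labels]
    true_class_probs = probs[np.arange(labels.size), labels]
    losses = -pixel_weights * np.log(true_class_probs + eps)
    total_weight = pixel_weights.sum()
    if total_weight == 0:
        return 0.0  # every pixel is masked
    return float(losses.sum() / total_weight)
```

With this normalization, adding or removing masked pixels leaves the loss unchanged, while errors on the rarer intermediate concentrations count more than errors on open water or fully ice-covered pixels.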

Fig. 6. Receptive field of a two-level encoder and decoder in the U-Net. Each square represents a pixel at different layers. White pixels denote convolutional layers with associated yellow triangles of 3 × 3 kernel views. Blue striped pixels illustrate the added receptive field per extra pixel in the final layer L. Red pixels are pooling layers with accompanying red triangles for 2 × 2 pooling views. Green pixels are upsampling layers with the corresponding green trapezoids as the upsampling views.
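The growth of the receptive field with the number of levels can be sketched with the standard recurrence, in which each layer adds (kernel − 1) × jump pixels and each pooling step multiplies the jump by its stride. The function below is a simplified illustration rather than the authors' calculation. It assumes two 3 × 3 convolutions per block, 2 × 2 pooling with stride 2, and an upsampling step that halves the jump without enlarging the field. The name `unet_receptive_field` and the way levels are counted are hypothetical, so the values need not match the RF column of Table I.

```python
def unet_receptive_field(levels, convs_per_block=2, kernel=3, pool=2):
    """Receptive field (in input pixels, one dimension) along the deepest
    path of a U-Net-like network. Illustrative assumptions: stride-1
    convolutions, pooling with stride equal to its size, and upsampling
    that only halves the jump between neighboring pixels."""
    rf, jump = 1, 1
    # Encoder: conv blocks, with pooling between consecutive levels.
    for level in range(levels):
        for _ in range(convs_per_block):
            rf += (kernel - 1) * jump
        if level < levels - 1:
            rf += (pool - 1) * jump
            jump *= pool
    # Decoder: one upsampling step and one conv block per level climbed.
    for _ in range(levels - 1):
        jump //= pool
        for _ in range(convs_per_block):
            rf += (kernel - 1) * jump
    return rf


if __name__ == "__main__":
    for n_levels in range(2, 9):
        print(n_levels, unet_receptive_field(n_levels))
```

Under these assumptions the receptive field roughly doubles with each added level, because the convolutions at the deepest level operate on pixels that are spaced progressively further apart in the input image.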
Fig. 9(a)-(d) is from Scoresbysund in Eastern Greenland, acquired on February 8, 2019. The model achieves an R²-score of 87.32% on this scene. Discrepancies are apparent at the end of the fjord and at its mouth; otherwise, there is a strong correlation. The model shows great robustness to the high backscatter values in the SAR near-range (right) portion of the image, which create ambiguous SAR textures between open water and sea ice. Scoresbysund is notorious for landfast sea ice, which is present in this scene and is partly mislabeled by the model. Despite the good overall performance, landfast ice predictions are still troublesome for the model. Fig. 9(e)-(h) is from the Davis Strait in South Western Greenland, acquired on April 23, 2019. The model achieves an R²-score of 93.01% on this scene. The scene contains many sea ice floes with smooth surfaces resulting in low backscatter values, which appear as dark holes within the closed packed sea ice cover. The model predictions correlate strongly with the reference ice chart: the model identifies a high SIC, though not the exact concentration. There is also a clear boundary between the sea ice edge and open water. Sea ice in the narrow fjords is detected and matches the concentration in the reference ice chart relatively well.
Fig. 10(a)-(d) illustrates a scene acquired on August 22, 2018, from the Fram Strait, North Eastern Greenland. The model achieves an R²-score of 84.89% on this scene. The scene is characterized by a long sea ice tongue and varying ice concentrations. There is a strong resemblance between the predictions and the reference chart. The locations of the sea ice are accurate, but the quality of the concentration prediction is mixed. The detailed features in the upper portion of the image are lost in a homogeneous polygon. In the lower right corner, the model has predicted a larger area to contain ice. The region contains low backscatter values in both the HH and HV SAR images, and thus it is very difficult to distinguish whether this is open water or newly formed sea ice such as nilas. The final scene, in Fig. 10(e)-(h), is also from the Fram Strait and was acquired on September 3, 2018. The model achieves an R²-score of 88.31% on this scene. The scene contains sea ice close to the meandering coast with a multitude of islands and ice concentrations. There is a strong correlation between the reference ice chart and the model predictions. The concentrations appear accurate, and the

TABLE I. SUMMARY OF TRAINED MODELS AND PERFORMANCE. THE COLUMN NAMES REFER TO THE FOLLOWING: REFERENCE NUMBER, TRAINING PATCH SIZE, TRAINING BATCH SIZE, *RECEPTIVE FIELD (RF), NUMBER OF LEVELS IN U-NET ARCHITECTURE, NUMBER OF CONVOLUTIONAL FILTERS IN EACH LEVEL, SAR NOISE CORRECTION, AND FINALLY THE TEST R² SCORE