A Scalable High-Performance Unsupervised System for Producing Large-Scale HR Land Cover Maps: The Italian Country Case Study

This article presents an operational system for the automatic production of high-resolution (HR) large-scale land cover (LC) maps in a fast, efficient, and unsupervised manner. This is based on a scalable and parallelizable tile-based approach, which does not require the collection of new training data. The method leverages the complementary information provided by the existing LC maps and recent acquisitions of HR Earth observation (EO) images to identify map units that have the highest probability of being correctly associated with their labels, and exploit the obtained “weak” training set to produce an updated HR LC map by classifying the recently acquired EO data. Both steps, performed at tile level, can be implemented on a high-performance computing (HPC) environment, which simultaneously process all required tiles (independently of each other) for the entire study area. The method was tested considering the publicly available 2018 Corine LC map having a minimum mapping unit of 25 ha and the Sentinel-2 images to generate an HR LC map of Italy. The obtained map has a spatial resolution of 10 m and presents the nine major LC types (i.e., “artificial land,” “bareland,” “grassland,” “cropland,” “broadleaves,” “conifers,” “snow,” “water,” and “shrubland”). Validation was performed using the 2018 European Land Use and Coverage Area Frame Survey database made up of in situ data. The overall accuracy achieved for the Northern, Southern, and Central part of Italy and the Italian Islands is 91.29%, 91.63%, 92.21%, and 91.06%, respectively.

monitor land cover change (LCC) [1]. For this reason, over the past decades, several LC and LCC products have been produced at the national [2], continental [3], and global level [4] by various international or national initiatives. The first National Land Cover Database (NLCD) for the United States was published in 1992 [5] through the Multiresolution Land Characteristics consortium. A classification map of 21 LC classes was generated consistently across 48 states of the United States at a spatial resolution of 30 m. The classification map was generated using the 1990 Landsat Thematic Mapper (TM) and ancillary data (e.g., topography, census, soil properties, and other LC types, as well as wetland maps). In 1999, Landsat 5 and Landsat 7 images were used to expand and update the 1992 NLCD, thereby generating subsequent NLCD data products. In particular, the 2006 NLCD was the first mapping project to generate the Unites States nationwide LCC detection map with 30-m cell sizes with a minimum mapping of 1 acre [6]. In [7], Robinson et al. present a method to classify very-high-resolution (VHR) aerial images that combines HR labels available at local level and low-resolution (LR) labels extracted from the 2011 continental NLCD map. An ad hoc deep learning model is defined to perform the fusion of the considered multiresolution dataset.
At the European level, a coherent system of long-term LC maps has been generated by the European Environment Agency, namely the Corine land cover (CLC) map [8]. The classification scheme consists of 44 classes, mixed LC, and land use (LU) classes, with minimum mapping units of 25 ha for LC and 5 ha for LCC detected every 6 years [9]. According to the predefined classification scheme, the LC map is generated and updated nationwide by visual interpretation of optical/nearinfrared satellite images [10]. Because of the reliability of the LC map product, the Corine land cover (CLC) map is considered as a baseline for European LC mapping and subsequent applications [11]. As an example, in [12], Sentinel-1 images acquired in 2014 were resampled to 100-m resolution and used to reproduce the CLC mapping. To increase the temporal resolution of the CLC map, in [13], Baudoux et al. propose a method to translate an annually updated french national LC product into the CLC map. The method was able to achieve an accuracy of 81% for the entire French Country. In [14], Stoian  Focusing on global scale, various LC and LCC products [15], [16], [17], [18] have been produced over the years. Although different EO data and different classification schemes were considered, all of these maps aimed at monitoring the extent and distribution of major LC classes globally. In [19], Hua et al. analyze the spatial consistency (at different elevations and climatic zones) of five LC maps produced between 2000 and 2013 at the global and continent scales. The results show that the overall consistency of the five thematic products ranges from 49.2% to 67.63%, with the highest consistency recorded in Europe and the lowest in Oceania. As expected, the consistency is relatively low since these LC maps were produced considering different data sources (i.e., both ground reference data and EO data), classification schemes, and different classification approaches. Similar results are presented in [20], where Tchuente et al. compared four global maps focusing on the African continent. To compare the different LC maps, the four thematic legends were converted into more aggregated categories after a projection in the same spatial resolution. The results confirm that the agreement between the thematic maps ranges between 56% and 69%. In particular, although all the LC maps show reasonable agreement in terms of surface types and spatial distribution patterns, when focusing on heterogeneous landscapes they differ strongly from each other.
It is worth mentioning that all the aforementioned global products were generated at a coarse spatial resolution, i.e., from 300 m to 1 km. With the recent availability of Sentinel-1 and Sentinel-2 data acquired in the framework of the Copernicus Programme, much effort has been devoted to produce HR thematic products [21]. These data offer the ability to perform continuous monitoring of the Earth's surface at high temporal and spatial resolutions. With dozens of observations throughout the year, Sentinel-1 and Sentinel-2 sensors are a powerful source of information for the regular generation of HR LC classification maps. For instance, in [22] [24], a freely accessible LC map produced at 10-m resolution at global scale using both Sentinel-1 and Sentinel-2 data. The product depicts 11 LC categories (i.e., tree cover, shrubland, grassland, cropland, built-up, bare/sparse vegetation, snow and ice, permanent water bodies, herbaceous wetland, mangroves, and moss and lichen) and has an overall accuracy (OA) of 74.4% [24]. In [25], Brown et al. presented a novel approach to continuous generation of HR LC maps at global level at 10-m spatial resolution by exploiting deep learning and Sentinel-2 images. To train the fully convolutional neural network, a huge effort was devoted to the collection of the reference data. About 4000 Sentinel-2 images were manually annotated by experienced photointerpreters. All annotators were asked to label at least 70% of a tile in a maximum amount of time of 60 min per tile.
Although the availability of Sentinel-1 and Sentinel-2 data has significantly increased the possibility of frequently acquiring EO data, producing LC maps is expensive and time-consuming, especially when performed at country, continental, or global level [13]. In particular, the most challenging part is the collection of constantly updated annotated samples, which is extremely demanding on a large-scale through ground surveys or photointerpretation [26]. In addition, when dealing with large-scale LC mapping, distributed computing architectures are required to generate the output quickly and efficiently [22]. To solve these issues, this article presents a system architecture for automatically producing HR LC maps that does not require the collection of ground reference data (i.e., it is completely unsupervised), has been defined to work at the Sentinel-2 tile level to be scalable and parallelizable, and has been implemented in a distributed computing architecture to enable the production of large-scale LC maps in a short time. Specifically, the presented system architecture is based on the method proposed in [27], which was tested on a small study area of 1549 km 2 . The method automatically extracts a "weak" training set by combining information provided by an existing LC map, and a Sentinel-2 time series acquired during the target year to be classified. The obtained training set is then used to classify the considered Sentinel-2 image time series. The main advantage of this approach consists in its ability to utilize the complementary information provided by the considered data sources. While the thematic map has the advantage of providing LC information for the entire study area, the Sentinel-2 data can be used to verify the consistency of these classes with those actually present in the scene. Therefore, combining the two data sources allows us to extract training samples over the entire study area, and select "weak" labeled units having the highest probability of still being valid. Exploiting these two properties, in this article, we present the parallelized version of [27], defining a tilewise system architecture, i.e., the training set is extracted from the map for each Sentinel-2 tile and the classification is performed tile based. This condition allows us to implement the method on a distributed high-performance computing (HPC) platform, which can simultaneously process all required tiles (independently of each other) for the entire study area. The method was scaled to a national level, i.e., a study area of 301338 km 2 , to generate an HR updated map for the Italian country. It is worth noting that due to its high climatic variability, Italy presents different landscape and environmental conditions, allowing us to assess the robustness of the approach at a large scale.

II. STUDY AREA AND DATA PREPARATION
To generate an HR LC map of Italy, we considered the CLC map and the Sentinel-2 images. The map was generated for 2018 due to the availability of the 2018 LUCAS Copernicus in situ surveys, which allows objective validation of the obtained map. The considered study area is the entire Italian peninsula characterized by a spatial extent of ∼ 301.338 km 2 . The whole country is covered by the 60 Sentinel-2 granules reported in Fig. 1. The Sentinel-2 tiles are reported in the UTM projection and WGS 84 geographic coordinate system. Italy falls within the following two longitude fuses (i.e., longitude-east to west): 1) 30 tiles in UTM WGS84 32 N; 2) 30 tiles in UTM WGS84 33 N; and the following two latitudinal strips (i.e., latitude-south to north): 1) 15 tiles S (from 32°to 40°latitude north); 2) 45 tiles T (from 40°to 48°latitude north).
In the following, details on the considered data and their preparation for the processing are reported.

A. Considered Thematic Map: CLC Map
The considered thematic product is the 2018 CLC map, which has a spatial resolution of 100 m, with a minimum mapping unit of 25 ha. The classification scheme is based on a hierarchical structure made up of three levels and 44 classes. The first level is composed of five items, which correspond to the major LC types, namely "artificial areas," "agricultural land," "forests and seminatural areas," "wetlands," and "water surfaces." The   TABLE I  TRANSLATION OF THE 2018 CLC LEGEND INTO THE PROPOSED TARGET  LEGEND FOR THE CONSIDERED STUDY AREA second level is made up of 15 items, i.e., "urban fabric," "industrial, commercial, and transport units," "mine, dump, and construction sites," "artificial non-agricultural vegetated areas," "arable land," "permanent crops," "pastures," "heterogeneous agricultural areas," "forests," "shrub and/or herbaceous vegetation associations," "open spaces with little or no vegetation," "inland wetlands," "coastal wetlands," "inland waters," and "marine waters." However, the highest level of detail is provided in the third level, which presents the whole CLC nomenclature [28]. The production of this map is based on the visual interpretation of optical/near-infrared satellite images for locating, delineating, and identifying LC units. In particular, the production of the 2018 CLC map is performed by integrating the data of LCC between the years 2012-2018-as primary product-with the revised LC map of year 2012 (revised CLC2012)-as side product [29]. The LCC detected every 6 years have a minimum mapping unit of 5 ha, thus leading to the presence of many missed LCC. Moreover, due to the coarse spatial resolution of the map, compared to the spatial resolution of the Sentinel-2 images, many Sentinel-2 pixels are associated to the wrong LC. For all these reasons, the aim of the proposed system is to generate an HR map, which represents the LC classes present in 2018.
First, The CLC map is rescaled at the highest spatial resolution of Sentinel-2 data, i.e., 10 m. Then, its legend is converted into a set of LC classes, which can be discriminated according to the spatial and spectral properties of Sentinel-2. For this reason, we excluded the LU classes (e.g., "sport and leisure facilities,") that cannot be discriminated using the spectral information of Sentinel-2 only, as well as spatially mixed classes (e.g., "mixed forest") that are not expected to be present at the 10-m spatial resolution. Table I reports the CLC classes present in the considered  TABLE II  SENTINEL-2 IMAGE TIMES SERIES USED TO PRODUCE THE HR LC MAP study area. First, the CLC classes are converted into a detailed legend aimed to generate the training set. This condition allows us to accurately model the LC present in the final target legend. For instance, to accurately model the "Cropland" target class, it is necessary to include samples belonging to all its natural classes, i.e., "non-irrigated crops," "irrigated crops," "annual crops," and "rice fields." Neglecting the samples of any of these classes would affect the quality of the final classification result, which is converted into the desired target legend. This conversion is performed according to the land cover classification system (LCCS) [30], which is the common standard used for translating a comparing legends. The final target legend represents a set of widespread LC classes: "artificial land," "grassland," "cropland," "bareland," "broadleaves," "conifers," "shrubland," "water," and "snow."

B. EO Data: Sentinel-2 Data
The HR LC map is produced by using a times series made up of four Sentinel-2 images acquired between April and September. For each tile, the less cloudy images within this period were automatically selected leveraging on the cloud cover information estimated by the Sen2cor tool provided by the European Space Agency (ESA) [31]. These time series of images allow us to better discriminate the LC types present in the target legend with respect of the classification of a single Sentinel-2 acquisition. For instance, classes such as "Crop" and "Grass" may have similar spectral signatures for some Sentinel-2 acquisitions (e.g., April or May), but show different temporal profiles due to the crop's unique phenological properties. According to the level of detail of the desired target legend, the user can decide to consider denser time series of images (e.g., to capture rapid rate of changes) or longer time series of images (e.g., to distinguish different crop types). Table II reports the time series of Sentinel-2 images used for generating the HR LC map of the Northern, Central, and Southern part of Italy and Italian Islands (Sardinia and Sicily). For each Sentinel-2 tile, the considered images are reported. We considered only the spectral bands acquired at the spatial resolution of 10 m (i.e., Band 2, Band 3, Band 4, and Band 8) and 20 m (i.e., Band 5, Band 6, Band 7, Band 8a, Band 11, and Band 12), which are interpolated at 10 m considering the nearest neighboring technique. Each pixel is characterized by a multitemporal spectral signature of 40 features, i.e., ten spectral channels per four Sentinel-2 images.

III. SCALABLE HR LC MAP PRODUCTION SYSTEM
The objective of this study is to present a scalable highperformance approach that can be used to generate HR LC maps in a fast, efficient, and unsupervised way. The peculiarity of the considered system architecture is its tile-based implementation. This condition allows us to exploit the advantage of parallel computing offered by high-performance computing (HPC) systems. By increasing the number of tasks up to the required number of tiles, the time required to produce the nationwide LC map can be the same as that performed at the single tile level. The presented system architecture is based on the method proposed in [27], which was tested on a small study area of 1549 km 2 . Fig. 2 shows the flowchart of the proposed scalable HR map production system based on the following four main steps: tile-based data preparation, tile-based training set extraction, tile-based LC map production, and postprocessing.

A. Tile-Based Data Preparation
Let M ∈ R m×n be the considered thematic map used to extract the "weak" training set, having size m × n and characterized by a set of LC classes Ω c = {ω u } u . Let TS = (X 1 , X 2 , . . . , X q ) be the times series of recent EO data made up of q images, where X j ∈ R m×n×b is a multispectral image having m × n pixels and b spectral channels, with j = [1, . . . , q]. To efficiently produce the HR map, M is divided into k tiles, i.e., M i ∈ R l×p associated with the corresponding times series TS i , where i = [1, . . . , k] and l < m and p < n. In order to generate a harmonized final thematic product, tiles must be produced to guarantee a degree of overlap between adjacent tiles.

B. Tile-Based Training Set Extraction
Due to the coarse spatial resolution of the considered LC map, compared to the spatial resolution of the Sentinel-2 images, many Sentinel-2 pixels are associated with the wrong LC. To solve this problem, this step aims to identify the map labeled units having high probability of being correctly associated with their labels at the spatial resolution of the considered HR EO data [27]. In addition, both labeled units and EO data cover the entire study area, therefore, the first step of the method can extract a training set at tile level. To identity valid map labeled units, in [27], Paris et al. perform a data-driven clustering analysis per map-polygon. Indeed many LC maps are aggregated at the polygon level, where most of the pixels in the polygon are expected to be correctly associated with their LC classes. On the one hand, this is an effective strategy to reduce classification noise at pixel level and to facilitate the LC update (i.e., the CLC map is updated at polygon level for changes having a minimum mapping unit of 5 ha). On the other hand, this spatial aggregation leads to the presence of many pixels that are not correctly associated with their labels. In addition, the map may have classification errors. Therefore, it is necessary to select map units having the highest probability of being correctly associated with their labels to generate a reliable training set. In the considered implementation of the method, the clustering analysis is performed at class level, thus increasing the ability of the method to handle possible large LCC and reduce the computational burden of this step. When working at class level, the clustering step should be repeated for each class in the scene (e.g., for the experiments considered 9 times as we have nine classes). In contrast, when performed at the polygon level, it should be repeated thousands of times, depending on the number of polygons in the tile. Furthermore, while at the polygon level, the most abundant cluster may not be the one correctly associated with the polygon label due to changes, at the class level, it is reasonable to assume that most pixels are correctly associated with their labels. Let us focus the attention on the ith tile. For each LC class ω u present in M i , the samples associated with . . , C M i ω u,tu } according to their spectral similarity. To this end, we employ an unsupervised k-means clustering algorithm because of its computational efficiency [32].
The feature space used to perform the clustering analysis is made up of n f robust spectral indices strictly connected to the physical meaning of the considered LC classes (see Table III). For simplicity, these features are extracted from the least cloudy image of TS i , i.e., X q i (see Table II). Let x r ∈ R 1×n f be the rth pixel of ω u associated with the n f spectral features extracted from X q i . Starting from an initial random set of centroids, i.e., {m 1 , m 2 , . . . , m t u }, the k-means clustering algorithm associates each x r to the nearest centroid computed according to the Euclidean distance metric. In order to achieve a global minimum, the position of the centroids {m 1 , m 2 , . . . , m t u } is progressively adjusted by minimizing where m w is the centroid of the wth cluster C M i ω u,w , with w ∈ [1, . . . , t u ]. To increase the probability of selecting the most reliable samples, we analyze the sample distribution of each cluster in order to keep only the samples closer to its centroid. In particular, for each C M i ω u,w , the samples having distances from m w higher than the 75th percentile of all the cluster samples distances are discarded. Moreover, for most of the classes, the k-means is applied to detect t u cluster, in order to keep t u − 1 clusters, i.e., remove the spurious samples that are not correctly associated with ω u (i.e., possible LCC, misclassified pixels, polygon spatial aggregation). Note that this strategy allows us to handle multimode classes, i.e., classes including pixels having different spectral properties. A qualitative example of the map spatial aggregation problem is reported in Fig. 3, where pixels belonging to the LC class "artificial land" are divided into three clusters. While the first two clusters are correctly associated with their label (i.e., red and white roof are correctly associated with the label "artificial land"), the minor cluster represents the spurious pixels associated with the urban green areas. Once the clustering is completed and the noisy map labeled units are discarded, a stratified random sampling strategy is applied to generate h "weak" training sets having LC prior probabilities proportionate to what reported in M i , i.e., {TrSet 1 i , TrSet 2 i , . . . , TrSet h i }. Taking advantage from the availability of the many samples present in the map, the considered training sets are generated according to a bootstrap strategy without replacement. This condition allows us to generate a set of h "weak" training sets, independent form the statistical view point.

C. Tile-Based LC Map Production
The second step of the method performs the tile-based LC map production. For each M i , we generate the HR LC map by classifying the corresponding TS i of images using the h "weak" training sets computed in the previous step, i.e., {TrSet 1 i , TrSet 2 i , . . . , TrSet h i }. First, a feature selection step is carried out to select only the most relevant n g features of TS i , in order to reduce the feature space dimensions, and thus, the computation time. Note that this step also allows the increase of the classification accuracy, since it reduces noise and redundant information. The feature selection is performed considering the simple and fair Sequential Forward Floating Selection method, based on the Jeffrey-Matusita distance as separability criterion [33], [34]. This step is done once, in order to define the same set of features for all the tiles.
The classification model is based on support vector machines (SVM) classifiers with the "Gaussian radial basis function" [35] because of their generalization ability and capability to handle noisy training sets [36]. Indeed, since the training set extraction phase is completely unsupervised, it is reasonable to assume the presence of noisy labeled units. For this reason, the trained SVM classifiers are considered as "weak" learners. To mitigate the possible presence of noisy training samples, the final classification result is obtained by adopting a bagging technique to define the ensemble of SVMs, which are trained with the independent training sets {TrSet 1 i , TrSet 2 i , . . . , TrSet h i }. Let us define the decision functions of the h trained SVM models as x v ∈ R 1×n g be the vth pixel representing the set of n g selected features extracted from the TS i by the SFFS method. The final label associated with x v is determined through the majority voting rule as follows: where #{φ s i (x v ) = ω u } is the number of SVM models whose classification result is ω u for x v . Because of the statistically uncorrelated training sets, the classification errors are expected to be uncorrelated as well. Therefore, the aggregation results is expected to be more reliable that the one obtained using only one classifier [37].

D. Postprocessing
Once all the tiles are generated, a postprocessing step is performed to merge them into a whole consistent product. In particular, this step aims to harmonize the overlapping areas among neighboring tiles. To this end, a majority rule approach is considered, by taking into account all the classifications results computed by the SVM classifiers. Let us focus again on the generic vth pixel x v ∈ R 1×n g , assuming that it belongs to both the ith and the (i + 1)th tile. To define its classification value, we considered both to compute the following aggregation: At the end of this step, the harmonized HR LC map is available at the tile level.

IV. DESIGN OF THE EXPERIMENTS AND VALIDATION PROCEDURE
This section presents the implementation details used to design the experiments. Moreover, it presents the procedure employed to validate the generated HR LC map. To this end, the 2018 LUCAS database in situ survey was used, thus providing an objective quantitative validation of the results obtained for the whole study area.

A. Design of the Experiments
In the considered implementation of the method, we adopted the Sentinel-2 tiling grid, thus dividing the CLC map into 60 tiles having size l × p equal to 10980 × 10980 pixels. For all the considered LC classes, the number of clusters to identify is equal to 4 in order to keep only 3 of them except for "nonirrigated arable land," "permanently irrigated land," and "annual crops." For these classes, the samples belonging to all the four clusters were kept because of the low percentage of noisy pixels   in the CLC map for these classes, and the fact that they include several natural classes (i.e., different crops). Please note that these numbers were set according to the a priori knowledge of the properties of both the target legend and the CLC map for the considered study area.
Regarding the feature selection step, the Jeffrey-Matusita average distance reached saturation when the number of features was 25 out of 40, thus indicating that adding the remaining 15 features does not change the separability between the LC classes. The Sentinel-2 bands selected are reported in Fig. 4. This result demonstrates the importance of using a time series of images. Indeed, for all the Sentinel-2 acquisitions, we have at least 4 spectral bands selected. As expected the spectral bands acquired in the shortwave infrared portion of the electromagnetic spectrum were selected for all the images of the time series. In contrast, the Band 8a was not selected for any image of the time series.
For each tile, we extracted three training sets according to the bootstrap strategy without replacement. To retrieve the best model parameters for each SVM model, we considered a threefold cross-validation strategy. The standard grid search approach was applied to select C and γ parameters in a range of [100, 325, 550, 775, 1000] and [0.0001, 0.5, 1, 1.5, 2], respectively. Please note that, the tile-based approach allows us to customize the classification result at local scale. Indeed, it allows the detection of the best model parameters for each local training set representative of the considered tile.

B. Validation Procedure
To quantitatively evaluate the accuracy of the generated LC maps, we took advantage of the publication of the 2018 LUCAS database. The LUCAS survey is coordinated by the Statistical Office of the European Commission (Eurostat). It aims to collect harmonized data LC/LU, agroenvironmental, and soil data by field observation of geographically referenced points. The system is based on a set of independent rules that allow correlation with any classification scheme regardless of the scale and the source of the map. The LC labels present in the LUCAS database are defined according to the LCCS [38]. In particular, for each sample, there are both the LC and the LU indications. Hence, the LU information strongly improves the flexibility of the LC codes, since by combining different LC and LU classes, it is possible to retrieve specific information. These labels were used to match the surveyed samples into the set of classes present in the target legend (see Table I). We could translate 26 423 samples out of the 29 359 collected in Italy.
To spatially match the LUCAS points and the Sentinel-2 data, we considered a window of 3×3 Sentinel-2 pixels. The window is centered in the theoretical latitude and longitude coordinates of the LUCAS point associated with the surveyed point. If at least one of the pixels in the grid is associated with the LUCAS label, the point is considered correctly classified. Please note that the 3×3 window allows us to: address possible GPS errors and handle those classes for which the LUCAS observation is relative to an observation area defined by a 20-m radius around the point. The spatial distribution of the reference data is represented in Fig. 5, while Table IV shows the number of samples divided per class. The LUCAS samples are represented with the corresponding colors of the considered target legend. The samples are equally distributed throughout the country and allow for accurate validation of the obtained HR map.

V. EXPERIMENTAL RESULTS
In this section, we first present the quantitative results obtained according to the validation carried out using the LUCAS database. Then, qualitative results of the obtained map are reported and discussed. Finally, the computational effort required to generate the HR LC map for the whole country using an HPC is presented.

A. HR LC Map: Quantitative Results
To quantitatively evaluate the results obtained, we considered the standard classification metrics, i.e., the overall accuracy (OA%), the producer accuracy (PA%), the user accuracy (UA%), and the F-score (F1%). To perform this analysis, the Sentinel-2 tiles were split into islands, Northern, Southern, and Central part of Italy as presented in Table II. Table V shows the results  achieved the PA% and UA%, while Table VI shows the comparison in terms of F1% and OA% obtained for the whole Italian country, the north, the central, the south, and the islands, respectively. Due to the capability of the proposed system of extracting the training set at tile level, it adaptively handles the different landscape and environmental conditions of the whole Italian country by obtaining similar results regardless of the latitude. Indeed, the method is completely data driven. Moreover, the results obtained confirm that the proposed system architecture allows for the extraction of an informative and representative training set in an unsupervised but reliable way.
As expected, the minimum F1% is achieved on the "snow" (66.67%), since this peculiar LC class is not permanent for the whole year. The highest F1% are achieved by both the "artificial land" and "cropland" having on the whole country 94.39% and 95.42%, respectively. As demonstrated by the UA% and PA%, the method was able to accurately model the complex "cropland" class, by properly capturing the main natural classes aggregated under this label. In contrast, several samples are wrongly assigned to the "shrubland" class according to its UA% (i.e., 69.58%). This is also due to the fact that the size of the shrubs is smaller than the spatial resolution of Sentinel-2, thus leading to mixed spectral signatures associated to this class. For this reason, the "shrubland" class is confused with the "bareland" one, thus leading to a UA% of 61.30%. It is worth noting that the problem of confusing "bareland," "shrubland," and "grassland" classes at the 10-m spatial resolution of Sentinel-2 is well known when dealing with large-scale LC mapping [23], [39]. However, the F1% averaged over the nine LC classes is 82.80%, 82.43%, 84.88%, 84.64%, and 82.37% for the whole country, the north, the central, the south, and the Italian Islands, respectively. It is worth mentioning that such results demonstrate the possibility of generating HR updated maps in an unsupervised way. Fig. 6 shows the CLC map after the legend conversion and the HR LC map generated at a 10-m spatial resolution. The obtained map shows much sharper geometric details than the initial thematic product. Analyzing the obtained map in detail, the qualitative analysis confirms the quantitative results. Figs. [7][8][9][10][11] show some examples of the results obtained by comparing the: generated HR LC map, true color composition of one Sentinel-2 image, and the CLC map after the legend conversion.

B. HR LC Map: Qualitative Results
Let us first focus on Fig. 7, which shows the obtained HR map of the 32TPS Sentinel-2 tile. The obtained map accurately captures the details present in the scene by identifying the cultivated and urban areas located in the lower left part of the image. This is due to the fact that the considered map is generated at 10-m spatial resolution, while the original CLC map is provided with a minimum mapping unit of 25 ha. Similar results can be seen in Figs. 8-10 for the Sentinel-2 tiles 32TQM, 33TUL, 32TPN, respectively. Also in these cases, the level of detail provided to represent both the urban and the cultivated areas is much better than that present in the original map, where strong aggregations are visible. Finally, Fig. 11 represents the HR result obtained for the Sentinel-2 tiles 32TNS. The geometrical details of the urban area, the lakes, and the forest area present in tile 32TNS are consistent with what is visible in the Sentinel-2 image.
These qualitative examples, randomly chosen from different locations of Italy, demonstrate the effectiveness of the method in accurately extracting a reliable training set and its ability to generate HR LC maps in an automatic and unsupervised manner.

C. HR LC Map: Computational Time
The production of the HR LC map was carried out on an HPC cluster composed by 63 nodes for a total of 4052 cores and 36 TB of RAM interconnected using 10-Gbs Ethernet having InfiniBand network at 40 GBits. The cluster is run by the Linux CentOS 7 operating system and the job scheduling is managed by the portable batch systems (PBS) Professional software. Python 3.7.5 was adopted as programming language. The main libraries used by the developed workflow are numpy, gdal, scipy, scikit-image for processing geo-referenced images and scikitlearn for performing the classification the SVM classification. The whole processing chain took in average 200 min per tile, which correspond to one job executed on one node having 27 cores. Due to the availability of the 63 nodes, all the 60 Italian tiles were processed in parallel. Hence, the computational time was 200 min for the whole Italian country. As reference, the theoretical execution on an 8 core machine with 32 GB of RAM would require about 800 min for computing just one tile. By considering the resources needed by each single tile, the total number of resources used to compute the Italy coverage simultaneously were 120 cores, 2.4TiB RAM, 0.8TiB of disk.

VI. DISCUSSION AND CONCLUSION
This article has presented an operational system for producing large-scale HR LC maps. The proposed method is scalable and parallelizable in order to perform the map production in a fast, efficient, and unsupervised manner. To this end, the system architecture is defined to work at tile level, by exploiting the complementary information provided by an existing LC map and the recent EO data to extract a local training set for each tile leveraging the LC information provided by the 2018 CLC map, and generate the HR LC map tile based. To validate the proposed system, we generated 10-m spatial resolution LC map of the whole Italian country (which is covered by 60 Sentinel-2 tiles having a total extent of 301 338 km 2 ). The results were validated using the in situ surveys of LUCAS database, which provided ground reference samples collected all over the country.
The quantitative and qualitative results obtained demonstrate the effectiveness of the proposed approach, which was implemented on an HPC with 63 nodes that enables the production of the HR map in 200 min, which is the time required to produce a tile map. From the results obtained for the whole country, the islands, the north, the south, and the central part of Italy, it turned out that the method is able to extract a representative training set for each tile, thus leading to a reliable production of the map regardless of the LC types present in the scene. In particular, the possibility of extracting a local training set for each Sentinel-2 tile both enables the parallel approach and an accurate classification of the heterogeneous landscape of the Italian country. The adopted data-parallel strategy distributes the production of the LC map over multiple nodes, thus mitigating the computational load of the map production at the country level. Moreover, each Sentinel-2 tile (having size 10980 × 10980) was split into 25 subtiles (having size 2196 × 2196), which are processed in parallel by the cores of the node to further speed up the process. Note that the scalability of the workflow in handling more Sentinel-2 tiles depends on the number of nodes available in the HPC considered. According to the experiments conducted, by increasing the number of nodes to the number of tiles, the relationship between the speedup and the level of parallelism is linear. To work at the European level (covered by almost 1300 Sentinel-2 tiles) with the considered HPC, the production of the map would take approximately three days.
As future developments, we aim to test the proposed system architecture to work at continental scale. To this end, we plan to exploit the possibility of generating monthly or seasonal composites to harmonize the Sentinel-2 acquisitions of different tiles. Moreover, we aim to explore the possibility of using a convolution deep learning model to improve the classification results obtained. Finally, we aim to test the method with more complex classification schemes.