Early-Season Crop Mapping on an Agricultural Area in Italy Using X-Band Dual-Polarization SAR Satellite Data and Convolutional Neural Networks

Early-season crop mapping provides decision-makers with timely information on crop types and conditions that is crucial for agricultural management. Current satellite-based mapping solutions mainly rely on optical imagery, albeit limited by weather conditions. Very few exploit long time series of polarized synthetic aperture radar (SAR) imagery. To address this gap, we assessed the performance of COSMO-SkyMed X-band dual-polarized (HH, VV) data in a test area in Ponte a Elsa (central Italy) in January–September 2020 and 2021. A deep learning convolutional neural network (CNN) classifier arranged with two different architectures (1-D and 3-D) was trained and used to recognize ten classes. Validation was undertaken with in situ measurements from regular field campaigns carried out during satellite overpasses over more than 100 plots each year. The 3-D classifier structure and the combination of HH+VV backscatter provide the best classification accuracy, especially during the first months of each year, i.e., 80% already in April 2020 and in May 2021. Overall accuracy above 90% is always attained from June using the 3-D classifier with HH, VV, and HH+VV backscatter. These experiments showcase the value of the developed SAR-based early-season crop mapping approach. The influence of vegetation phenology, structure, density, biomass, and turgor on the CNN classifier using X-band data requires further investigation, along with the relatively low producer accuracy obtained for vineyard and uncultivated fields.


I. INTRODUCTION
MODERN agriculture will have to combine the needs of productivity with those of environmental, economic, and social sustainability [1] in a climatic context made uncertain by the effects of climate change [2]. Information that can help in implementing advanced and integrated monitoring and forecasting systems to promptly identify the risks and the impacts of calamities and crop practices on agricultural environments is essential. Satellite Earth observation data have proved well suited to these tasks for three main reasons.
1) From the spatial point of view, they can cover wide areas with different spatial resolutions [3]. 2) From the temporal point of view, they can be frequent, they can benefit from historical series for long-term analysis [4], and they can be timely thanks to the continuous acquisition of Copernicus constellations [5]. 3) From an economic point of view, they are becoming more affordable thanks to the provision of free satellite data and software for their processing and display [6], [7]. Agricultural ecosystems are characterized by strong variations within relatively short time intervals. Depending on the observation period, the agricultural scenario can look totally different because of differences in biomass, phenology, and turgor, which are driven by cultivar and agricultural practices, as well as by weather conditions. These dynamics are challenging for crop classification, and knowledge of vegetation status can deliver crucial information that can be used to improve the classifier's performance [8].
To account for these changes in agricultural vegetation and soil status, a multitemporal approach based on the study of time series of remotely sensed indices has proved successful [9], [10]. Time series of satellite images offer the opportunity to retrieve the dynamic properties of target surfaces by investigating their spectral properties combined with temporal information on their changes [11].
Crop mapping represents important information in the context of programs for the monitoring of rural areas on a regional and global scale [12], [13]. Early season mapping (ESM) allows for a refinement of crop mapping and plays a prominent role in facilitating the following: 1) the prediction of water consumption due to irrigation in water balance monitoring [14], [15]; 2) the control of pesticides [16]; 3) the control of food production and waste [17]; and 4) the control of many prescriptions contained in the plans of the Community Economic Policy. ESM is also the basis for developing algorithms and systems for monitoring the growth of vegetation and for the estimation of biomass and yield [18]. The main aim of ESM is therefore to provide public and private decision-makers with timely information on crop types and conditions that is crucial for agricultural management in the context of programs for rural areas and resource management on a regional and global scale [19], [20]. ESM is based on the recognition of incomplete series of temporal trends of optical or microwave indices; in more detail, ESM is carried out using only remotely sensed data acquired in the first months of the agricultural year. The quality of classification and the time horizon in which crop maps can be provided depend on the type and temporal resolution of satellite data, the classifier, and the operator's ability.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
In the literature, many studies have shown the effectiveness of satellite optical data for crop mapping with moderate [21], [22], high [23], [24], and very high resolution products [25], [26], and with their fusion [27], [28], using the bands or arranging them in optical indices [29], [30], with object- or pixel-based approaches [31], [32]. However, the usability of optical satellite data is strongly hampered by illumination and weather conditions, since optical sensors can acquire images only in daytime and under cloud-free skies. On the contrary, microwave sensors are generally not affected by weather conditions, clouds, and sun illumination, and with the advent and the evolution of synthetic aperture radar (SAR) sensors, the geometric resolution of the imagery is becoming ever closer to the optical one.
Several studies describe crop mapping results carried out using L- [33], [34], C- [35], [36], and X-band [37], [38] data, or the integration of multifrequency sensors [33], [39]. Polarimetric SAR data and the related polarimetric-SAR technology have proved to be breakthroughs for classification since they can benefit from up to four polarizations and a set of different decomposition algorithms that can derive new features from the whole scattering matrix to be used by the classifiers [40], [41].
Machine learning techniques have been applied in crop mapping with satellite data, for example, using support vector machines [46], [47], random forest [48], [49], or artificial neural networks [12], [50]. Deep learning (DL) is a subset of machine learning based on artificial neural networks with an unbounded number of layers of bounded size, permitted to be heterogeneous and to deviate widely from biologically informed connectionist models [51]. The performance demonstrated by DL approaches in many areas of image processing has generated considerable interest in the extension of DL techniques to the entire universe of remote sensing, including feature extraction [52], change detection [53], and data fusion [54].
DL techniques were used for classification exploiting optical imagery with subdecimeter resolution [55], [56], very high spatial resolution [57], [58], hyperspectral [59], [60], and multispectral satellite data [61], [62]. SAR image classification using DL was described in [63], with polarimetric data in [64] and [65] and in integration with optical data in [66]. Ensemble and DL techniques have been proved to outperform other machine learning techniques such as support vector machines [67], [68] since they take advantage of the redundancy in the number of classifiers to decrease the variance of the estimation error.
Crop mapping is another important task that can benefit from the application of DL techniques with optical [69], [70] and the fusion/integration of optical and SAR data [66], [71], [72]. Few manuscripts describe the results from the use of DL techniques with SAR imagery for land and crop classification [73], [74], despite the good classification accuracy and efficiency [75]. In [76], Hirose used DL to conduct several pioneering works on land use classification with SAR.
A convolutional neural network (CNN) is a class of DL able to learn high-level context features through a large number of neurons arranged in multiple architectures. In particular, 3-D convolution can take into account the radiometric, spatial, and temporal components of a multitemporal stack of satellite scenes in a more delicate and rigorous manner than a direct concatenation of reflectance or backscatter images [77]. The main flaw in the use of CNN is the high number of samples required during training.
The present study evaluates the performance of CNN applied to in-season early crop classification of an agricultural area in the center of Tuscany, Italy, during 2020-2021. The test area is interesting for crop classification due to the small dimension of the fields and the similarity in life cycles of some crops that can be easily confused by the classifiers. X-band satellite data, purposely tasked and collected by means of the Italian Space Agency (ASI)'s COSMO-SkyMed constellation in StripMap PingPong mode, were used. The novelty of this work lies in 1) the use of X-band SAR backscatter alone for early crop mapping, in particular exploiting an unprecedentedly long and continuous time series of dual-polarization data; and 2) the analysis of the marginal gain obtained by postponing the production date of the classified map, which technically consists of an increase in the number of available satellite SAR scenes used for classification. Several tests were done to identify and assess the best CNN architecture, the different performance of the classifiers depending on the time of map delivery, and the difference in accuracy attained using single-polarization versus dual-copolarized data.

A. Test Site and Ground Data
The selected test site extends 270 ha and is located south of Ponte a Elsa (43°41′20.37″N, 10°53′42.38″E), a small town in the Tuscany region, Italy, divided between the municipalities of Empoli (Florence) and San Miniato (Pisa), as shown in Fig. 1.
The area is mainly suited to viticulture and olive growing, and annual herbaceous crops like maize, sunflower, sorghum, wheat, and legumes are a secondary source of income for the small local farms, which generally do not exceed a few tens of hectares. The selected agricultural plots belong to five different farms and are located on the plain along the Elsa riverbed, surrounded by low and mild hills. Selected plots are always bigger than 1 ha with an average of approximately 3 ha. They have an irregular shape and are delimited by moats or shrubs. A criterion for the selection of those fields was the homogeneity in terms of species, soil type, and cultivation.
The plots were surveyed during several measurement campaigns carried out in spring 2020 and 2021 using the SMASH Field Data Collection mobile application [78], which allows the collection of georeferenced pictures with notes attached that can easily be exported in any vector format.
Herbaceous crops in the Ponte a Elsa test site are always annual and can be grouped in winter (wheat, rapeseed, and fava bean) and summer crops (sorghum, corn, and sunflower) that follow the annual field's rotation. Vineyard, olive tree, and pasture can be considered multiyear crops.
Winter species are generally seeded in autumn before the year of harvesting, i.e., wheat harvested in mid-June 2020 was seeded approximately in October 2019. Summer species are seeded the same year of harvesting, i.e., the sunflower harvested at the end of August 2021 was seeded approximately in April 2021. Pasture class consists mainly of forage crops, like alfalfa, clover, cat grass, oat, fennel, etc. These plots are grazed many times in a year. Finally, the uncultivated class is represented by natural vegetation that invades the fields during the year of rest from agricultural cultivation.

B. Satellite Synthetic Aperture Radar (SAR) Data
X-band SAR data from the COSMO-SkyMed (CSK) constellation [79] were collected for this research by ASI during a tailored monitoring campaign designed in 2019 and carried out in the framework of the project "ALGORITMI" [80].
StripMap PingPong (CSK-PP) images were acquired in right-looking mode along ascending orbits (at ∼05:00 A.M. UTC), using alternating polarization HH and VV. The beam mode PP_12 was used, with an off-nadir angle ranging between 37.9° (near range) and 39.7° (far range) and an incidence angle between 42.6° and 44.4°, respectively. The following two periods of acquisition were selected for the analysis: 1) from 1st January 2020 till 31st August 2020 and 2) from 1st January 2021 till 31st August 2021. Dates of imagery are provided in Table I. CSK-PP scenes were accessed as single-look complex slant range products (Level 1A), then multilooked to obtain square pixel maps. The Ponte a Elsa test site lies on a flat plain, and thus, its backscatter is not influenced by orography. Nevertheless, terrain correction was applied using the formulas described in [81] and [82] and the 10-m resolution digital elevation model provided by Regione Toscana [83]. As a result, backscatter maps with a pixel dimension of 10 m were produced in the UTM 32N (EPSG 32632) projection and finally despeckled by means of a Kuan filter (window size = 3×3 pixels, equivalent number of looks = 1).
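The despeckling step can be illustrated with a minimal pure-Python sketch of a Kuan filter using the window size and equivalent number of looks quoted above. It is not the processing chain used in the study: it assumes intensity (not amplitude) data, replicates edge pixels at the image border, and clamps the weight to [0, 1], none of which is stated in the text. The function name `kuan_filter` is hypothetical.

```python
def kuan_filter(image, window=3, enl=1.0):
    """Despeckle a 2-D intensity image (list of rows) with a Kuan filter."""
    rows, cols = len(image), len(image[0])
    half = window // 2
    cu2 = 1.0 / enl  # squared speckle variation coefficient (intensity data assumed)
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Collect the window, replicating edge pixels (an assumption).
            vals = [
                image[min(max(r + dr, 0), rows - 1)][min(max(c + dc, 0), cols - 1)]
                for dr in range(-half, half + 1)
                for dc in range(-half, half + 1)
            ]
            mean = sum(vals) / len(vals)
            var = sum((v - mean) ** 2 for v in vals) / len(vals)
            if mean == 0.0 or var == 0.0:
                out[r][c] = mean  # homogeneous window: return the local mean
                continue
            ci2 = var / (mean * mean)  # squared local variation coefficient
            weight = (1.0 - cu2 / ci2) / (1.0 + cu2)
            weight = min(max(weight, 0.0), 1.0)  # clamp (assumed safeguard)
            out[r][c] = mean + weight * (image[r][c] - mean)
    return out


# A bright outlier on a flat background is pulled toward the local mean.
spike = [[1.0] * 5 for _ in range(5)]
spike[2][2] = 10.0
filtered = kuan_filter(spike, window=3, enl=1.0)
print(filtered[2][2])
```

In homogeneous areas the filter returns the local mean, while in heterogeneous windows it keeps more of the original pixel value, which is the behavior that makes it suitable before a pixel- or patch-based classifier.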
CSK-PP products exploit an incoherent dual-polarization mode since the phase link between two polarimetric acquisitions is not preserved [84]. While this prevents the possibility of obtaining useful polarimetric features like alpha and entropy from a dual-polarization decomposition, as those derived in [43] and [85] using TerraSAR-X, this dataset provides sigma naught backscatter maps in both VV and HH polarization (as those shown in Fig. 2), which was the key parameter used in the research.

C. Classification Algorithm
Since the classifier is trained to recognize crops on the basis (especially) of temporal trends of backscatter and species from winter classes tend to have similar yearly backscatter temporal trends (the same for species from summer classes), this represents the first difficulty for classification. In this research, the differences in temporal trends inside crops from the winter macroclass (or summer macroclass) are led mainly by the amount of biomass per square meter, plant water content, and by the different structures of the plants [86]. The inclusion of uncultivated (fallow) and pasture classes acts as another burden for classification due to its high heterogeneity in biomass, floristic composition, and dates of grazing or harvesting.
Furthermore, the entire 2020 and 2021 ground truth datasets were strongly unbalanced in terms of the number of pixels per class for both years as shown in Table II, although the variability of crops from one year to another depends on farmers' practice and decisions that cannot always be predictable.
On the other hand, efforts were made to ensure a robust field data sample covering each crop class through as many regular field visits as were allowed during a period of pandemic emergency and related health protection measures.
The following eight time steps were selected for the research: February, March, April, May, June, July, August, and September. For example, the classification carried out in August consisted of producing a crop map by feeding a classifier with a time series of SAR images spanning from January to July.
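The selection of scenes for each time step can be sketched as follows. This is only an illustration of the rule implied by the August example (images from January up to the month before the time step); the acquisition dates are hypothetical placeholders, since the real ones are listed in Table I, and `scenes_for_step` is a made-up helper name.

```python
from datetime import date


def scenes_for_step(acquisition_dates, year, step_month):
    """Return the acquisitions usable for the map produced in `step_month`.

    Only scenes acquired from January up to the month before the
    time step are kept, mirroring the August example in the text.
    """
    return [d for d in sorted(acquisition_dates)
            if d.year == year and d.month < step_month]


# Hypothetical acquisition dates (the real ones are given in Table I).
dates_2020 = [date(2020, 1, 10), date(2020, 2, 11), date(2020, 3, 14),
              date(2020, 5, 1), date(2020, 7, 20), date(2020, 8, 5)]

for month in range(2, 10):  # eight time steps: February to September
    print(month, len(scenes_for_step(dates_2020, 2020, month)))
```

Each later time step therefore sees a superset of the scenes available to the earlier ones, which is what allows the marginal gain of postponing the map delivery to be measured.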
This article focuses on ensemble classifiers based on CNNs to perform crop classification over multitemporal CSK-PP imagery. An ensemble classifier is composed of three independently trained CNN classifiers. For a given input, each CNN classifier produces a similarity score for each possible output class; the ensemble classifier output corresponds to the class that has the highest cumulative similarity score. Each CNN classifier operates on an input feature vector and provides a corresponding class as output. The input feature vectors are defined starting from the multitemporal image stack depicted in Fig. 3, where the acquisitions are preliminarily coregistered and sorted by polarization (fast variable) and acquisition date (slow variable).
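The ensemble decision rule and the stack ordering described above can be sketched in a few lines. The scores below are invented for illustration, and the helper names `ensemble_predict` and `stack_index` are hypothetical; only the rule itself (sum the per-class scores of the three members, pick the maximum) and the ordering (polarization fast, date slow) come from the text.

```python
def ensemble_predict(member_scores):
    """Pick the class with the highest cumulative similarity score.

    member_scores: one list of per-class scores per CNN member.
    Returns (winning class index, cumulative scores).
    """
    n_classes = len(member_scores[0])
    cumulative = [sum(scores[k] for scores in member_scores)
                  for k in range(n_classes)]
    best = max(range(n_classes), key=cumulative.__getitem__)
    return best, cumulative


def stack_index(date_idx, pol_idx, n_pols=2):
    """Band position in the stack: polarization is the fast variable,
    acquisition date the slow one."""
    return date_idx * n_pols + pol_idx


# Invented scores for three members and three classes.
scores = [[0.50, 0.40, 0.10],
          [0.20, 0.70, 0.10],
          [0.45, 0.35, 0.20]]
label, totals = ensemble_predict(scores)
print(label, totals)
```

In this invented example two of the three members individually prefer class 0, yet the cumulative score selects class 1, which shows how score summation differs from a simple majority vote.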
This experiment compares two CNN architectures that work on different kinds of input feature vectors. In the case of 1-D, this behavior is implicitly given by the size of the input feature vector; for the 3-D, this is accomplished by considering a 3-by-3 patch centered on each image pixel and assigning to it the classification output. The patch dimension, both for 1-D and 3-D classifiers, is a tradeoff between limiting the misclassification along the parcels' borders and grasping the contextual features.
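For the 3-D case, the extraction of a 3-by-3 patch centered on a pixel from a coregistered multitemporal stack can be sketched as below. The border handling (returning `None` when the patch would leave the image) is an assumption, as the text does not say how border pixels are treated, and `extract_patch` is a hypothetical helper.

```python
def extract_patch(stack, row, col, size=3):
    """Return a [band][size][size] patch centred on (row, col).

    stack: list of 2-D bands (lists of rows), already coregistered.
    Returns None when the patch would cross the image border (assumption).
    """
    half = size // 2
    n_rows, n_cols = len(stack[0]), len(stack[0][0])
    if not (half <= row < n_rows - half and half <= col < n_cols - half):
        return None
    return [[band[r][col - half:col + half + 1]
             for r in range(row - half, row + half + 1)]
            for band in stack]


# Toy stack: two 4x4 bands whose values encode band, row, and column.
toy_stack = [[[100 * b + 10 * r + c for c in range(4)] for r in range(4)]
             for b in range(2)]
patch = extract_patch(toy_stack, 1, 1)
print(patch[0])  # 3x3 neighbourhood of pixel (1, 1) in the first band
```

The class predicted for the patch is then assigned to its central pixel, so a small patch limits the mixing of neighbouring parcels along field borders.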
The parameter values adopted in the previous layers are listed in Table III. The convolutional layer is responsible for linearly filtering the input to extract the features for the subsequent layers.
The batch normalization layer is introduced to stabilize the training process, making it less dependent on the values of each minibatch in the learning phase. The rectified linear unit (ReLU) layer is a common choice in the supervised classification of images [66] due to its efficient implementation and plausibility with the underlying signal model. The role of the max pooling layer is to reduce the number of outputs at each stage by preserving the local maxima of its input.
Independent of the architecture, the CNN classifiers of an ensemble share the following multilayer structure: 1) input layer; 2) four convolutional blocks; 3) first fully connected layer; 4) second fully connected layer; 5) softmax layer. Each convolutional block is composed of the following: 1) convolutional layer; 2) batch normalization layer; 3) ReLU activation function; 4) max pooling layer; 5) dropout layer. The dropout layer aims at reducing overfitting by ignoring some of its inputs, chosen at random. The fully connected layers synthesize the output of the convolutional blocks by interconnecting all their inputs. Finally, the softmax layer provides the similarity score for each of the possible output classes. It should be remarked that the structure of the CNN classifiers in an ensemble classifier is the same, the only difference being the output size of the first fully connected layer (see the last column of Table III); this setting has also been successfully adopted in [66] and [87].
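To make the role of each layer in a convolutional block concrete, the following pure-Python sketch chains the five operations on a single 1-D sequence. It is a didactic toy, not the network of the study: the kernel, pooling size, and dropout rate are placeholders (the actual values are in Table III), there is a single channel, the normalization is computed over one sequence instead of a minibatch, and the learned scale and shift of batch normalization are omitted.

```python
import random


def conv1d(x, kernel, bias=0.0):
    """Valid 1-D linear filtering (cross-correlation) of sequence x."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k)) + bias
            for i in range(len(x) - k + 1)]


def batch_norm(x, eps=1e-5):
    """Zero-mean, unit-variance rescaling (learned scale/shift omitted)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / (var + eps) ** 0.5 for v in x]


def relu(x):
    """Rectified linear unit: negative values are set to zero."""
    return [max(0.0, v) for v in x]


def max_pool(x, size=2):
    """Keep the local maximum of each non-overlapping window."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, size)]


def dropout(x, rate, rng, training=True):
    """Randomly zero inputs during training (inverted-dropout scaling)."""
    if not training or rate == 0.0:
        return list(x)
    return [0.0 if rng.random() < rate else v / (1.0 - rate) for v in x]


def conv_block(x, kernel, rate=0.2, rng=None, training=False):
    """Convolution -> batch norm -> ReLU -> max pooling -> dropout."""
    rng = rng or random.Random(0)
    features = max_pool(relu(batch_norm(conv1d(x, kernel))))
    return dropout(features, rate, rng, training)


# Placeholder backscatter-like sequence and an arbitrary difference kernel.
sequence = [0.1, 0.4, 0.3, 0.9, 0.7, 0.2, 0.5, 0.6]
print(conv_block(sequence, [1.0, 0.0, -1.0]))
```

Stacking four such blocks and two fully connected layers followed by a softmax would reproduce the layer ordering listed above; in practice a deep learning framework would be used instead of hand-written loops.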
The training of the CNN classifiers is performed by means of the adaptive moment estimation algorithm [88], which is a common choice for this kind of problem. The minibatch size and the number of epochs are set to 128 and 100, respectively, with shuffling at the end of each epoch to reduce overfitting. The initial learning rate is set to 10⁻³ with a dropping factor of 0.1 every 20 epochs; the cross entropy with L_2 regularization is used as a loss function to avoid divergent behavior. A stratified threefold partitioning of the dataset was used in all experiments. In turn, two partitions are first oversampled by means of the Synthetic Minority Oversampling Technique plus Tomek Links (SMOTE+TL) [88] to deal with the dataset imbalance and then used to train an ensemble classifier from scratch; the remaining partition is used for testing. The presented results and related statistics are computed over the union of the testing partitions.
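The learning-rate schedule and the stratified threefold rotation can be sketched as follows. This is a simplified illustration under stated assumptions: epochs are counted from zero, samples are dealt round-robin within each class (one of several ways to stratify), the class labels are invented, and the oversampling and training steps are only indicated by a comment. The function names are hypothetical.

```python
import random
from collections import defaultdict


def learning_rate(epoch, initial=1e-3, drop=0.1, period=20):
    """Step decay: multiply the rate by `drop` every `period` epochs
    (epoch counted from 0, an assumption)."""
    return initial * drop ** (epoch // period)


def stratified_folds(labels, n_folds=3, seed=0):
    """Split sample indices into folds that preserve class proportions."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % n_folds].append(idx)  # deal samples round-robin
    return folds


# Invented, deliberately unbalanced labels.
labels = ["wheat"] * 9 + ["corn"] * 6 + ["vineyard"] * 3
folds = stratified_folds(labels)
for test_id, test_idx in enumerate(folds):
    train_idx = [i for f, fold in enumerate(folds) if f != test_id for i in fold]
    # Oversampling (SMOTE+TL in the text) would be applied to train_idx only,
    # then an ensemble trained from scratch and evaluated on test_idx.
    print(test_id, len(train_idx), len(test_idx))
```

Because every sample falls in exactly one testing partition, statistics computed over the union of the three testing partitions cover the whole dataset once, while oversampling never touches the samples used for testing.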

III. RESULTS
The results of early crop mapping using CSK-PP data and CNN for 2020 and 2021 are shown in Tables IV and V. Two metrics are used to describe the quality of classification in this article, namely overall accuracy (OA) and producer accuracy (PA). The former is defined as the ratio of the number of correctly classified pixels to the total number of pixels; the latter is defined as the ratio of the number of pixels correctly assigned to a specific class to the total number of pixels of that class.
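The two metrics can be computed from a confusion matrix as in the short sketch below. The matrix is hypothetical and only serves to show the arithmetic; rows are taken to be the reference (ground truth) classes and columns the predicted classes.

```python
def overall_accuracy(confusion):
    """confusion[i][j]: pixels of reference class i assigned to class j."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


def producer_accuracy(confusion):
    """Per-class ratio of correctly classified pixels to reference pixels."""
    return [row[i] / sum(row) if sum(row) else float("nan")
            for i, row in enumerate(confusion)]


# Hypothetical 3-class confusion matrix (rows: reference, columns: predicted).
cm = [[50, 5, 5],
      [10, 80, 10],
      [0, 10, 30]]

print(overall_accuracy(cm))   # 160 correct pixels out of 200
print(producer_accuracy(cm))  # one value per reference class
```

A high OA can coexist with a low PA for a poorly represented class, which is why both metrics are reported.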

A. Achieved Classification Performance in Terms of OA
In 2020, two big increases are found in OA of crop classification in March (32.5%) and in April (almost 25%) when using 1-D with the combination of HH and VV (hereafter called 1D-HH+VV), when two and four SAR images, respectively, were available. Using a 3-D classifier, the highest increase in OA happens in March, with a difference above 33% using HH or VV (these combinations will be called, respectively, 3D-HH and 3D-VV hereafter) and above 31% using HH+VV (hereafter called 3D-HH+VV).
In 2021, a big increase was found in OA, above 30% in June using backscatter from only HH or VV and 1-D (these combinations will be called, respectively, 1D-HH and 1D-VV hereafter). For 1D-HH+VV this increase is approximately 27% and occurs earlier, i.e., in May, when ten and six images are respectively available. Using a 3-D classifier, the boost in classification performance is immediately higher over time with respect to 1-D until June. With 3D-HH+VV the highest increase of OA over time happens in March and is above 30%, achieved using only two SAR images. With 3D-HH and 3D-VV, the highest increase of OA occurs later, i.e., in June.
In 2020, VV was found to be slightly more informative than HH for crop classification from March, with an average gap of approximately 2% for 1-D. HH+VV always provides better classification performance than backscatter from a single polarization, with both 1-D and 3-D. The advantage of HH+VV increases until April with 1-D, when a difference above 30% in OA is noticeable with respect to using backscatter from one polarization only. In the case of 3-D, the highest difference in OA attained using 3D-HH+VV is already noticeable in February, with a better performance of approximately 20% with respect to classification carried out using 3D-VV or 3D-HH. 3D-HH+VV provides the best classification results for both years, as shown in Tables IV and VI. Using 3-D, the difference in OA between classifications carried out using only HH or VV can be negligible, apart from April and May. In 2021, HH+VV again allows the best OA to be achieved in crop classification, and the advantage of using backscatter from both polarizations increases until May using 1D-HH+VV. In this case, a difference of 35% in OA is noticeable with respect to using 1D-HH and above 25% with respect to 1D-VV.
In the case of 3-D, a big difference in classification between using backscatter from both polarizations or a single one is noticeable already in March and April, with a better performance of approximately 20% of OA attained with respect to classification carried out using 1D-VV or 1D-HH. The difference in OA using HH+VV versus HH or VV drops dramatically after May using both 1-D and 3-D.  The OA in classification attained in 2020 is generally higher than the one in 2021, with the biggest difference reached in March or April. This is found although ten classes were required to be recognized in 2020 and eight in 2021. This assertion is valid generally until June when nine images were available in the 2020 dataset and ten in 2021. In this month, crop classification in 2020 and 2021 begins to attain the same OA with all polarizations, both for 1-D and 3-D, respectively.

B. Achieved Classification Performance in Terms of PA
The analysis of PA attained in the classification of crop types can help in the explanation of OA for each time frame, backscatter polarization, and CNN configuration. We first notice that all the tested classifiers exhibit a consistent behavior since the PAs in all classes improve and get closer as the deadline of the observation is postponed. Hence, increment in the OA is due to an improvement in all classes and not to improvements in the most represented classes only.
In 2020, sunflower, wheat, and vineyard are the worst recognized classes at the beginning of the season. Uncultivated is a class that is easily confused, especially at mid-season, along with vineyard at the end. The most recognizable classes are generally corn and sorghum, followed by wheat and rapeseed at the end of the season. PA of all the classes exceeds 90% late in the season, except for vineyard and uncultivated with 1D-HH and 1D-VV. In 2020, PA generally exceeds 90% already in June using 1D-HH+VV and 3-D with all the combinations of polarizations. The class that earliest exceeds or gets close to 90% of PA (already from March) is corn. Using 1D-HH and 1D-VV, the highest increment in PA is generally attained in June, whereas using 1D-HH+VV and 3-D, this increase generally occurs earlier, i.e., in March, especially for corn, sorghum, pasture, vineyard, and olive tree. The increase in PA gained moving from HH or VV to HH+VV with 3-D is more limited. Moving from 1-D to 3-D, a clear improvement in the PA of all the classes is noticeable, especially for wheat.
In 2021, the worst recognized class at the beginning of the season is uncultivated, and its recognition remains problematic for the whole year. The same is found for vineyard. Corn and wheat are also easily confused by the classifier at the beginning of the season. Over time, corn becomes one of the best identified classes along with olive tree and wheat. Again, sorghum proved to be a crop that is easily identified by the CNN-based classifiers. With respect to 2020, corn is better identified at the end of the season. Using 1D-HH+VV and 3-D, PA of all the classes exceeds 90% already in June, and in August or September 2021 using 1D-VV. The class that earliest exceeds or gets close to 90% of PA is sorghum, especially with 1D-VV (already in July) and 3D-HH+VV (already in April).
Using 1D-HH and 1D-VV, the highest increment in PA is generally attained in June, whereas using 1D-HH+VV, this increase generally occurs earlier, i.e., in May, and in March or April using 3-D. Moving from single-polarization backscatter to HH+VV with 1-D, the uncultivated class benefited most, whereas using 3D-HH+VV, the threshold of 70% PA is reached about a month earlier. Moving from 1-D to 3-D, a clear improvement in the PA of all the classes is noticeable, especially from March to April.

IV. DISCUSSION
The aim of the project was to demonstrate that X-band dual-polarization SAR data can be effectively used to produce an early map of crops in a rural area of central Italy. To fulfill this purpose, we opted from the outset for the use of time series of SAR satellite data collected at a high temporal resolution, to avoid the issue of gaps due to cloud cover that often affects optical data [92]. In short, priority was given to the temporal regularity of SAR satellite acquisitions over the spectral richness given by optical sensors.

A. Performance and Limitations of the Proposed Method
Taking advantage of dual-polarization (HH and VV) X-band SAR backscatter imagery, such as that from CSK-PP, and of a CNN-based classifier arranged in two different architectures (1-D and 3-D), several tests were carried out on the possible combinations of satellite data acquired in Spring/Summer 2020 and 2021 with classifier configurations for early-season crop classification.
The use of backscatter from both polarizations always provided the best classification OA for all eight monthly time steps (from February to September). When only one polarization is available, as for other CSK imaging modes such as StripMap HIMAGE (albeit with a much better spatial resolution), and the choice is between the co-polarized channels, VV is preferable. A possible explanation for the better classification performance of backscatter in VV polarization is its higher sensitivity to the vertical elements of plants, such as stems or trunks, which often constitute a large percentage of the entire vegetation biomass, as explained in [86] and [93], whereas backscatter in HH polarization is more influenced by soil moisture and roughness [94], which could represent sources of uncertainty for the classifier. The same results have been described in the literature for X-band [95] and C-band [96]; L-band backscatter allows slightly better crop classification results in HH polarization [96], but L-band scattering mechanisms on crops are not comparable with those in X-band [97]. Regarding the classifier architecture, 3-D almost always showed better performance than 1-D, except for February 2021, although the OA of crop classification in this month is very low in any case. The 3-D-based classifiers achieved better performance because the convolution is performed on patches that also carry spatial information. Only the combination 3D-HH+VV attained a crop classification accuracy above 80% as early as April 2020 and May 2021, and an OA above 90% as early as May 2020 and June 2021. In any case, 90% OA is always attained from June when using a 3-D classifier, with both single-polarization backscatter and HH+VV (and with 1D-HH+VV).
Based on this two-year-long experiment, the beginning of May, which is equivalent to a total amount of six/seven available images with an average of one/two images per month, emerged as the most likely deadline to achieve an encouraging result of early crop mapping on an agricultural test site with eight/ten classes by means of dual-polarization X-band SAR satellite data and CNN technology. In May, the winter species such as fava bean, rapeseed, and wheat are in full vegetation, although they did not reach the peak of growth, whereas the fields that will host the summer species are generally prepared for seeding. The classifier is therefore capable of carrying out a satisfactory classification already when winter species are in the stem elongation phase and the summer species are not sprouted yet and, indeed, these plots are still bare and smooth.
The difference in OA between the same months from two consecutive years of experiment (e.g., May 2020 and May 2021) decreases dramatically after April for 1-D and after March for 3-D for all the configurations of polarization and architecture. Classification OA in May 2020 and May 2021 is very close, which is another reason to consider May as the ideal deadline for early classification. Thanks to the adoption of the oversampling algorithm, the negative effect of an unbalanced dataset was strongly reduced, since it anticipates the recognition of the less represented classes.
The main parameters that play a relevant role in class recognition are the characteristics of the vegetation itself, like phenology, structure, density, biomass, and turgor, but their influence on OA falls outside the scope of this research. In any case, the use of a 3-D classifier and backscatter from both polarizations tends to mitigate the differences in PA among the classes and between the years, since it exploits the spatial information coming from the 3 × 3 kernels of convolution and the spectral information coming from both polarizations. These are further elements suggesting 3D-HH+VV as the best performing approach.
The reasons for the weak PA of vineyards in 2020 and 2021 need to be further investigated. A possible explanation may be found in the influence of soil parameters, such as moisture and roughness, that overtake the effect of vegetation in this class.
Indeed in a vineyard, the canopy cover is always incomplete and a large portion of the soil underneath the plants can be targeted by the SAR sensor. Instead, the reason for the weak PA of uncultivated in 2020 and 2021 is due to its high intraclass heterogeneity.

B. Comparison With OA Achieved by Existing Methods
As already stated, an OA close to 99% in crop classification was attained in July using a stack of about ten scenes and 3D-HH+VV in our research. This result ranks among the best obtained in works in which only SAR data were used for the classification of agricultural areas. For instance, in [35], a maximum OA of 93% is obtained in the recognition of nine classes using dynamic conditional random fields and a long series of 45 images from Sentinel-1.
In [36], Useya and Chen achieved an OA of 99% using 30 Sentinel-1 scenes and a random forest-based classifier. In [38], Sonobe et al. used a satellite dataset very similar to ours, composed of 16 X-band dual-polarization HH and VV scenes, to recognize six classes; the highest OA achieved was 95% using a support vector machine. In [37], by using ten X-band dual-polarization HH and VV scenes and aiming at recognizing eight classes, Sonobe attained an OA of 92.1% using a multiple kernel learning-based classifier. The results we obtained using only SAR data also bear comparison with studies using optical sensors [69] or the integration/fusion of optical and SAR data [74].
Many published papers referring to "early mapping" of agricultural areas actually confuse this concept with classification carried out using few images, whereas proper early crop mapping is accomplished using scenes concentrated at the beginning of the crop season. In [89], Kingma and Ba carried out an early mapping exercise of an agricultural area with seven classes using eight scenes from Sentinel-1, attaining OA = 92.9% with an artificial neural network-based classifier. In our experiment, OA = 94.9% and OA = 98.5% were achieved using seven and nine CSK scenes, respectively, in 2020 with ten classes. Similar results were achieved through the integration of optical and SAR data, as in [90], where OA = 93.7% was reached with the fusion of eight Sentinel-1 and two Landsat-8 images.
Moving to the comparison of our results with those from other studies on crop classification using CNN, to the best of our knowledge, the literature offers few manuscripts describing the use of SAR satellite data. Although it is very difficult to compare works that map very different test areas, it emerges that our results are in line with, or outstrip, those of external studies when (almost) the same number of classes is considered.
For example, in [73], Castro et al. aimed at recognizing 11 classes with 27 Sentinel-1 scenes, obtaining a maximum OA of 71.2%; they did not use the entire satellite dataset, however, but trained and evaluated a CNN-based classification approach using the stacked features and the reference for the last image in the sequence, respectively. In [71], Adrian et al. aimed at recognizing 13 classes and attained OA = 58.6% and OA = 81.2% using CNN-2D and CNN-3D, respectively, with a Sentinel-1 dataset.
The results obtained in our work with CNN using only SAR data are also satisfactory when compared with those achieved through optical and SAR fusion. For example, in [66], Kussul et al. aimed at recognizing 11 classes using 15 Sentinel-1 and 4 Landsat-8 scenes, obtaining an OA of 93.5% with CNN-1D and 94.6% with CNN-2D. In [72], one of the experiments involved the joint use of Sentinel-1 and Sentinel-2 data and CNN, attaining OA = 87.7% for the recognition of 7 classes and OA = 87.5% for the recognition of 12 classes. Again, in [71], Adrian et al. attained OA = 94.3% with the fusion of Sentinel-1 and Sentinel-2 imagery in the recognition of 13 classes.

C. Choice of the Metric
We conclude this section by discussing the choice of the metrics. OA is by far the most used metric in the literature for classifier comparison. Other global scores, such as the area under the curve, F1-score, Matthews correlation coefficient, and Cohen's kappa (K) [98], are less frequently used in multiclass classification problems. They were computed on the proposed experimental results and showed rankings very similar to those of OA. Since OA is strongly dependent on dataset balancing [99], we opted to support it with the best and worst PA to provide the reader with a range of performance while preserving a concise statistical description.
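For reference, the metrics discussed above can be derived from a confusion matrix as in the following minimal sketch. The function name `accuracy_summary` and the toy matrix are hypothetical, and the layout (rows as reference classes, columns as predicted classes) is an assumption; PA is taken as the fraction of reference samples of a class that are correctly classified.

```python
import numpy as np

def accuracy_summary(confusion):
    """Return OA, worst PA, best PA, and Cohen's kappa from a confusion matrix.

    Assumes rows are reference (in situ) classes and columns are predicted
    classes, and that every class has at least one reference sample.
    """
    confusion = np.asarray(confusion, dtype=float)
    total = confusion.sum()
    oa = np.trace(confusion) / total                  # overall accuracy
    pa = np.diag(confusion) / confusion.sum(axis=1)   # producer accuracy per class
    # Agreement expected by chance, from the row and column marginals.
    expected = (confusion.sum(axis=0) * confusion.sum(axis=1)).sum() / total**2
    kappa = (oa - expected) / (1.0 - expected)
    return oa, pa.min(), pa.max(), kappa

# Toy two-class example (illustrative numbers, not results from this study).
oa, pa_worst, pa_best, kappa = accuracy_summary([[50, 0], [10, 40]])
print(oa, pa_worst, pa_best, kappa)
```

Reporting the worst and best PA next to OA exposes classes that a single global score can hide, which is especially relevant when the dataset is unbalanced.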

V. CONCLUSION
An agricultural test site in the countryside of central Tuscany, Italy, was selected, and two years (2020 and 2021) of in situ measurement campaigns were carried out on a regular basis in winter, spring, and summer to gather information on the species cultivated on more than one hundred plots each year. Dual-polarization HH and VV X-band satellite SAR data from the COSMO-SkyMed constellation were acquired from January to September. Backscatter and in situ data were used for the training and validation of a CNN-based crop classifier arranged with two different architectures (1-D and 3-D). Ten crop classes in 2020 and eight in 2021, together with eight monthly time frames (from February to September), were selected to test how crop classification improves over time as more images become available.
Results showed that the 3D-CNN structure, along with the combination of backscatter at both polarizations, provides the best OA, especially during the first months of the year, i.e., OA is close to 90% in April 2020 and above 80% in May 2021. Nevertheless, the beginning of May, which corresponds to a total of six or seven available images, with an average of one or two images per month, emerged as the most likely deadline for encouraging results of early crop mapping. If only single-polarization X-band data are available, VV is preferable to HH.
Further efforts are needed to explain the influence of parameters such as phenology, structure, density, biomass, and turgor on classification using X-band data and a CNN classifier, along with the poor PA of the vineyard and uncultivated classes.

ACKNOWLEDGMENT
The authors are grateful to the farms "Bandinelli Rino, Enzo e Claudio Società Agricola," "Fattoria Bini, Empoli," and the cultural association "Associazione Archeologica Volontariato Medio Valdarno" for having hosted and actively supported fieldwork activities carried out in their fields.

In 1999, he attended the Course of Specialization and Advanced Training in the University of Florence. Since 2001, he has been a researcher with the Institute of Applied Physics (IFAC) "Nello Carrara" of Italian National Research Council (CNR), Florence, Italy. He has been involved in international research projects funded by the European and Italian Space Agency on remote sensing topics and has authored or coauthored 10 papers in peer-reviewed journals and books and a total of nearly 50 international publications. His research interests include the processing of data acquired by satellite platforms and remote sensing data applications.

Trade Assessment by Remote Sensing Investigation) and SIASGE (Definition of products at X and X+L bands for SIASGE support). He is a coinvestigator in a regional project (Hydrocontroller) for the monitoring of hydrologic risk, and in an international project for sustainable water management for the economic growth and sustainability of the Mediterranean region (OPTIMED-WATER) in the frame of FP7 of the European Union. He is also involved in the ASI project METEMW that aims to develop innovative algorithms for the retrieval of hydrological parameters. He is the author or coauthor of 100 papers published in international peer-reviewed journals and conference proceedings.
Emanuele Santi (Senior Member, IEEE) received the M.S. degree in electronic engineering from the University of Florence, Florence, Italy, in 1997, and the Ph.D. degree in Earth's remote-sensing techniques from the University of Basilicata, Potenza, Italy, in 2005.
Since 1998, he has been a Researcher with the Microwave Remote Sensing Group, Institute of Applied Physics, National Research Council, Florence. He was and is currently involved in many national and international projects funded by the Italian Space Agency (ASI), European Community, European Space Agency, and Japanese Aerospace Exploration Agency, acting as a Team Leader, a WP Leader, and a coinvestigator. He authored or coauthored 168 articles, published in ISI journals, books, and conference proceedings (source Scopus). His research interests include the development and validation of models and inversion algorithms based on machine learning for estimating the geophysical parameters of soil, sea, snow, and vegetation from microwave emission and scattering.
Dr. Santi is a member of the "Centro di Telerilevamento a Microonde" (Microwave Remote Sensing Center), the Vice-Chair of the IEEE GRS Chapter CNI-29, and Conference Chair of the SPIE Europe Remote Sensing conference RS-106. In 2020, he was the Chair of the 16th Symposium on Microwave Radiometry MicroRad. In 2018, he was the recipient of the IEEE GRSS J-STARS Prize Paper Award for the best paper published in the J-STARS journal in 2017. He is interested in the integrations of open source GIS, field data collection, and remote sensing techniques, in arid and semiarid environments, to produce the land cover and land cover change maps mainly focused on rangelands and natural vegetation.

Giuliano Ramat
Simone Pilia (Senior Member, IEEE) is currently working toward the Ph.D. degree in monitoring physiological crop stress status by using remotely sensed data with the University of Basilicata, Potenza, Italy.
Currently, his studies focus in particular on the retrieval of natural parameters, specifically for snow, soil, and vegetation, by microwave emission and scattering. Since 2018, he has been with National Research Council (CNR). His research interests include microwave applications such as antenna design and microwave remote sensing. He won the Giampietro Puppi Award for doctoral thesis in physics and astrophysics in 2009. In the past, he was active in quantum mechanics, statistical mechanics, and teaching. Currently, he is with the Microwave Remote Sensing Group, Institute of Applied Physics "Nello Carrara" (IFAC), National Research Council (CNR), Florence. His research interests are in modelling of snowpack, soil, and vegetation and application to retrieval of physical parameters by active and passive microwave remote sensing. He is a referee for some international journals.