Classification of Different Irrigation Systems at Field Scale Using Time-Series of Remote Sensing Data

Maps of irrigation systems are of critical value for a better understanding of the human impact on the water cycle, while they also present a very useful tool at the administrative level to monitor changes and optimize irrigation practices. This study proposes a novel approach for classifying different irrigation systems at field level by using remotely sensed data at subfield scale as inputs of different supervised machine learning (ML) models for time-series classification. The ML models were trained using ground-truth data from more than 300 fields collected during a field campaign in 2020 across an intensely cultivated region in Catalunya, Spain. Two hydrological variables retrieved from satellite data, actual evapotranspiration (<inline-formula><tex-math notation="LaTeX">$ET_{a}$</tex-math></inline-formula>) and soil moisture (<inline-formula><tex-math notation="LaTeX">$SM$</tex-math></inline-formula>), showed the best results when used for classification, especially when combined together, retrieving a final accuracy of <inline-formula><tex-math notation="LaTeX">$90.1 \pm 2.7\%$</tex-math></inline-formula>. All the three ML models employed for the classification showed that they were able to distinguish different irrigation systems, regardless of the different crops present in each field. For all the different tests, the best performances were reached by ResNET, the only deep neural network model among the three tested. The resulting method enables the creation of maps of irrigation systems at field level and for large areas, delivering detailed information on the status and evolution of irrigation practices.

under their baseline scenario, total consumption of water will increase by 23% from 1995 to 2025. Monitoring total water consumption is particularly important in semiarid areas where freshwater resources are limited such as the Mediterranean region, where irrigation currently accounts on average for 69% [4], [5], and it goes up to over 84% for south and east-Mediterranean countries [6]. These percentages are projected to increase between 4% and 18% by the end of the century [7], [8], but important uncertainty factors such as population growth and food demand may raise this estimate to between 22% and 74% [9].
Monitoring, understanding, and improving the efficiency of irrigation practices is a fundamental step toward controlling and mitigating water demands. Precise knowledge of irrigation practices is needed to better constrain and reduce uncertainty in hydrological models that predict future trends for water use and account for the anthropological impact on the water cycle. At the local scale, it is important to have an overview of irrigation practices at fine spatial resolution for administrative and irrigation management purposes, such as monitoring irrigation water usage and optimizing the efficiency of irrigation. Field irrigation efficiency mainly depends on the irrigation system and the level of field modernization. Generally, irrigation systems used in semiarid areas include surface systems such as flood and furrow and pressurized systems such as sprinkler and drip [10]. Efficiencies vary between 90% for drip, 75% for sprinkler, and 60% for flood irrigation [11]: clearly, pressurized systems are more efficient. The low application efficiencies obtained by surface systems are mainly due to water losses associated with deep infiltration, soil evaporation, and flooding in some parts of the soil because of a poor flow design at the entrance of the furrow [12], [13].
Remote sensing is a unique and valuable tool, capable of addressing the lack of large-scale precise information over irrigation practices, and overcoming the limitations of analyses based on in situ observations, which are often prone to inconsistencies and gaps in the information collected. Current results in the field of remote sensing for irrigation practices featured the creation of global or regional scale maps of irrigated areas [14], [15], [16], [17], [18], [19], irrigation timings [20], [21], [22], and quantification of irrigation amounts at variable resolutions [23], [24], [25], [26], [27], [28], [29], [30]. In particular, for studies oriented on the mapping of irrigated areas, remote sensing data are often coupled with machine learning (ML) models, proving to be successful with both supervised [14], [15] and unsupervised approaches [16].
Among ML models for classification, one of the most popular algorithms for classification is random forest [31], widely used in the field of earth observation [32] both for land cover mapping [33], [34], [35] and irrigation mapping [14], [15], [18]. Recently, an adaptation of the Random Forest model for time-series classification was proposed by Deng et al. [36] and it is particularly useful for classification problems involving temporal changes. This model is often the baseline to test for new more performing ML models [37] such as Rocket [38], which has proven to be competitive with the state-of-the-art algorithms for classification of multivariate time-series [39], but with a much faster computational speed. Deep learning models have also been recently adapted to time-series classification [40]. ResNET [38] is a deep neural network model that presents a more complex architecture than traditional models, which allows it to adapt to complex tasks at the expense of computational power. Fawaz et al. [40] proved how ResNET outperforms traditional models for classification tasks in a comparison study performed using a large variety of multidisciplinary datasets. Moreover, in the field of irrigation mapping, Bazzi et al. [22] confirmed how deep learning models outperform traditional ML techniques.
The study of irrigation practices employs one or a combination of remotely sensed datasets from different sensors, from optical/thermal to active/passive microwave, at different spatial resolutions, but generally, two main hydrological variables are recognized to be key for the monitoring of irrigation practices: actual evapotranspiration (ET a ) and soil moisture (SM ) [41]. ET a has been widely assessed during the last decades through surface energy balance (SEB) models [42], [43], [44]. To be applied at field scale, SEB models require accurate land surface temperature (LST) data with sufficient spatial resolution. Novel methods have been recently developed to derive ET a and potential evapotranspiration (ET p ) at 20 m resolution using data from the European Commission's Copernicus program [45]. In particular, the method employed for this study consists in applying the two-source energy balance (TSEB) model [46] with data from Sentinel-2 and Sentinel-3 in combination with meteorological data forcing from the Copernicus climate data store (CDS) [47], [48]. This approach relies on downscaling Sentinel-3 thermal bands to Sentinel-2 spatial resolution using a data mining sharpener (DMS) approach [49].
Similarly, SM data from remote sensing has also been obtained at 20 m resolution through disaggregation techniques. Passive L-band microwave sensors are usually preferred as low resolution SM input since they are recognized to have the highest sensitivity to SM and lower signal-to-noise ratio with respect to active radar or optical sensors [50], [51], at the expense of spatial resolution, which is in the order of tens of kilometers. A common solution to overcome this limitation in terms of spatial resolution is the downscaling of the SM products with optical/thermal data, which provides land surface parameters at higher spatial resolution than their radar counterparts [52]. Disaggregation based on physical and theoretical scale change (DisPATCh) [53], [54] has been applied numerous times to disaggregate SM from passive microwave sensors to higher spatial resolution, through the use of optical/thermal products from MODIS and Sentinel-3 at 1 km [54] or Landsat at 100 m [55], [56]. A disaggregation at 20 m has been recently proposed [57] with the use of SMAP daily low resolution SM gridded at 9 km, Sentinel-2 normalized difference vegetation index (NDV I) at 20 m, and the enhanced Sentinel-3 LST products disaggregated at 20 m.
As noted by Massari et al. [41] there is still an open question on obtaining maps of irrigation systems from satellite data. So far, studies have been limited to mapping irrigated from nonirrigated areas or singular irrigation systems. Numerous studies have been performed to distinguish irrigated from rain-fed areas. At field level, Gao et al. [14] proposed to directly use a Sentinel-1 backscatter product to train two ML models (random forest and support vector machine) and detect differences in the satellite signal between irrigated and nonirrigated fields, reaching an overall classification accuracy of 81%. Similarly, Bazzi et al. [15] trained different ML models (random forest and a convolutional neural network) using Sentinel-1 backscatter signal and Sentinel-2 NDVI time-series, and reaching an overall accuracy of 94%. Passive microwave sensors also showed promising results in the detection of irrigation signals [58], [59] but their coarse resolution does not allow for detection at fieldlevel. Dari et al. [16] produced irrigation maps at 1 km resolution from SMOS and SMAP products disaggregated at 1 km, using an unsupervised clustering ML model and suggesting the need for high spatial resolution product to resolve the high spatial variability of irrigated areas.
Regarding the detection of unique irrigation systems, different studies have used deep learning to recognize the rounded shape of center pivot systems [60], [61], [62]. Moreover, a deep learning approach was recently employed by Liang et al. [63] to map contour-levee (flood) irrigation from aerial pictures. Despite these techniques being very effective in identifying a unique irrigation system, a general approach for creating maps of irrigation systems is still missing. To this date, there is no study (to the best of our knowledge) on creating maps of irrigation systems (i.e., sprinkler, drip, flood) at field level. These maps can have a wide application in the scientific community since they could replace the simplistic assumption of irrigation scenarios used in many land surface models (LSM), e.g., Noah [58], [64], [65], [66], or ORCHIDEE [67] models. Additionally, maps of irrigation systems could provide a useful tool for local policies, given that a complete and continuous overview of irrigation systems is lacking in many areas. These maps could give unprecedented insights to monitor the evolution of irrigation practices and promote and supervise the shift toward more sustainable and efficient irrigation methods.
In this context, we propose a novel methodology to produce maps at the field scale distinguishing between the three main irrigation systems, drip, sprinkler and flood, and also not irrigated fields. The hypothesis of this study is that differences between irrigation systems should be detectable by analyzing temporal patterns of actual evapotranspiration (ET a ) and soil moisture (SM ) at the field or subfield scale, through the use of a supervised ML model. It is expected that time series of remote sensing data reveal distinctive temporal patterns among different irrigation systems, given the large variation in the amount and timing of water applied for the different systems. The proposed methodology will be applied to a semiarid area of the Ebro basin (Spain) that is characterized by high variability of crop types and irrigation systems. The methodology will be then validated against both in situ data and independent administrative data, retrieved from statistical estimations at the district level and large surveys among farmers.
This article is organized as follows. Section II presents the study area, the field campaign, the administrative dataset, and remote sensing data used. The first part of Section III presents how time-series of remote sensing data are prepared, the additional crop classification of the ground truth dataset, the ML methods used, the final postprocessing applied after the ML classification and the metrics used for evaluation of the performances. A second part of Section III then introduces the different experiments performed in this study. Section IV presents the results in classifying irrigation systems using different remote sensing variables and ML methods. Finally, Section V concludes this article.

A. Study Area
The selected study area (41.28-42.02 N, 0.27-1.3 E) is located in the northeast of the Iberian Peninsula, in the province of Lleida (Spain). The climate in the region is typically Mediterranean with an average annual precipitation and reference evapotranspiration (ET 0 ) of 350 and 1100 mm, respectively. Irrigation in the area usually starts in mid-March and lasts until the end of November. The area is densely irrigated, with a variety of different techniques depending on the degree of modernization and the water allocation for the respective irrigation district: from traditional irrigation systems based on flooding techniques to more recent and efficient techniques that use sprinkler or drip irrigation.
The study area is divided into eight irrigation districts, covering a total surface of around 3000 km 2 . An overview of the study area is provided in Fig. 1. Irrigation practices vary depending on the seasonal water allocation, the different crop types, and the modernization level of each irrigation district. As an example, the "Canals d'Urgell" district is one of the oldest districts in the area, and irrigation is mainly performed through flooding. Farmers have full water availability throughout the growing season, but irrigation is performed in turns every 15-20 days. On the other hand, Algerri-Balaguer is a modernized pressurized irrigation district with a water allocation of 6000 m 3 /ha. Crops are mainly irrigated by sprinkler or drip irrigation. The Garrigues Sud district, located in the southern part of the study area, has a seasonal water allocation of around 1300 m 3 /ha, affecting, therefore, the type of crops grown (mostly olives and almonds) and irrigation practices (sustained deficit irrigation). The region has various types of cultivated crops, that can be grouped into: winter cereals (accounting for around 34% of the total area according to administrative databases), maize (accounting for around 7%, but not considering the percentage of maize growing as a second crop after winter cereals), fruit and nut trees (14%), vineyards (1%), and olives (9%).

B. Field Campaign
A field campaign was performed inside the study area during 2020, in order to collect a dataset of ground truth samples of fields with different irrigation systems and crop types. More than 300 fields were classified using four different labels: sprinkler, flood, drip, and nonirrigated. Fig. 1 shows how the samples were randomly distributed across all the irrigation districts considered, in order to have a representative dataset for the area. Table I summarizes the different irrigation systems of all the fields collected. It is possible to notice how approximately the same number of samples was collected for each irrigation system, ensuring a balanced dataset. In addition, each field was initially classified also by crop type by visual inspection and grouped in different classes: winter cereals, maize, alfalfa, olive, vineyards, or fruit and nut trees. Collecting these two variables provided an overview of the relationship between irrigation systems and crop types, which was needed to ensure the collection of a complete dataset representative of the different typologies of fields in the area.

C. SIGPAC-DUN Administrative Dataset
An administrative database, SIGPAC-DUN (Sistema d'informació geográfica de parcelles agricoles), was used in order to verify and expand the information about crop types collected during the field campaign, and to extract the exact shape contour of each field. SIGPAC-DUN is provided by the Catalan Ministry of Climate Action, Food, and Rural Agenda and it contains a large variety of spatial and alphanumeric information about agricultural practices at parcel level, with yearly updates. Most of the information contained in SIGPAC-DUN is directly submitted by the farmers through an annual agrarian declaration of cultivated crops and it gathers multiple details over the usage of the fields such as crop type. Parcel's shapes are also contained in SIGPAC-DUN, created from cadastral maps and image interpretation [68]. The dataset also contains information on presence of irrigation, indicated as a percentage of the field subject to irrigation (from 0 to 100), exclusively based on administrative data that indicates if irrigation is installed in the field, not its actual usage.
Table I summarizes all the samples collected during the field campaign divided by irrigation systems, as detected during the campaign, and crop types, as indicated in SIGPAC-DUN.
Despite being very detailed, only the latest version of the dataset, corresponding to the year 2021, contains information about two important aspects of the fields: which secondary crop type (if present) was cultivated during the year and what irrigation system (if present) was installed in the field. For this reason, when using SIGPAC-DUN it is not possible to catalog the fields with double crops when a second crop type is present during the year. When comparing the systems of irrigation declared in the latest SIGPAC-DUN database against the ground-truth dataset collected during the field campaign, a discrepancy of around 10% was found between the two datasets (33 out of 332 collected fields for 2020, as shown in Fig. 2). This discrepancy seems to suggest that SIGPAC-DUN reflects an outdated catalog of irrigation systems, since most of the misclassification between SIGPAC-DUN and the ground-truth dataset are between traditional flooding systems or not irrigated fields for SIGPAC-DUN and modern irrigation systems for the ground-truth dataset. This suggests that a process of modernization of the irrigation systems is taking place in the area, but it is not registered. As a matter of fact, Fig. 2 clearly shows that the highest discrepancy is found for the ten fields misclassified as flood by SIGPAC-DUN, which are in reality sprinkler systems.

D. Remote Sensing Data
Various remote sensing products were evaluated as potential inputs for the classification task, derived from different satellite data. ET a and SM at the subfield level (20 m) were the main hydrological variables considered, but additional variables were also considered in order to evaluate their feasibility in the classification task. These inputs can be broadly grouped into two categories, called Level 4 (L4) and Level 2 (L2) variables. A general overview of these variables is presented in Table II. L4 variables were estimated by the combination of multiple satellite data into different models, in order to obtain a set of more detailed hydrological information with a unified spatial resolution of 20 m, while L2 variables represent data directly retrieved from the satellites at their original processing level. L4 variables are: actual evapotranspiration (ET a ), DisPATCh soil moisture (SM ), crop water stress coefficient (K s ), and Sentinel-2 leaf area index (LAI). K s was considered given the proven strong link between this stress index and root-zone water depletion [72], which could provide new additional information about the field water content at a different depth level than surface SM . LAI was instead selected to test if a variable only related to vegetative growth could perform well in the task of classifying irrigation systems.
ET a estimates were obtained with the Priestley-Taylor twosource energy balance (TSEB-PT) model using Copernicusbased inputs [42], [46], [73] and following the methodology described by Guzinski et al. [47] which produced and validated a 20 m ET a product derived by applying TSEB-PT to remotely sensed data. The main input data required to run the TSEB-PT were retrieved from Sentinel-2 shortwave observations, Sentinel-3 LST , and ERA5 meteorological reanalysis data. Sentinel-2 images were used to retrieve the biophysical parameters of the vegetation through the Biophysical Processor [71]. These biophysical parameters were used to derive inputs needed in the TSEB-PT such as leaf area index (LAI), leaf optical properties, and transmittance. High-resolution thermal data at 20 m was retrieved by applying a data mining sharpening algorithm [49] to Sentinel-3 SLSTR LST images at 1 km, using shortwave multispectral data from Sentinel-2 [47], [48] as a higher resolution proxy. Meteorological parameters were retrieved from the ERA5 meteorological reanalysis, which delivers an hourly product gridded at 0.1 • . The required meteorological parameters are: air and dew point temperature at 2 m, wind speed at 100 m, surface pressure, and total column water vapor. Finally, ancillary data such as vegetation height, and leaf inclination angle were set based on a land cover map obtained from the Copernicus Global Land Service and a lookup table. Instantaneous energy fluxes at the satellite overpass were upscaled to daily water fluxes, expressed in units of mm/day, by multiplying the instantaneous ratio of latent heat flux over solar irradiance by the average daily solar irradiance [74].
Crop stress coefficient (K s ) was retrieved from the ratio of ET a and potential evapotranspiration (ET pot ). In order to be consistent with the TSEB-PT, the two-layer Shuttleworth-Wallace (SW ) model [75] was used to estimate the latter. The theoretical base of the SW model was provided by the Penman-Monteith energy combination equation, which has two parts: one for the soil surface and another for the plant surface. ET pot was computed with the SW model by setting a minimum stomatal resistance value of 100 sm −1 .
SM was created from the disaggregation of low-resolution original data employing the DisPATCh algorithm. DisPATCh uses a semiempirical soil evaporative efficiency model and a linearized relationship between soil evaporative efficiency (SEE) and low resolution SM to perform the disaggregation [54]. Differently from the classical version of the algorithm, a modification of DisPATCh for areas under high vegetation cover was added to the classic DisPATCh algorithm, as proposed by Ojha et al. [76]. For this study, SM was retrieved from the disaggregation of the original SMAP enhanced L2 SM product (L2_SM _P _E) gridded at 9 km. The Sentinel missions provided the high resolution optical and thermal data: NDV I maps at 20 m were extracted from the combination of bands 4 and 8 A of the MSI instrument from Sentinel-2, while thermal maps were retrieved from the Sentinel-3 mission and sharpened at 20 m using Sentinel-2 reflectances bands. A digital elevation map (DEM ) at 30 m from the Shuttle Radar Topography Mission was also used in order to account for topographic effects during the disaggregation process.
The considered L2 variables were: SMAP SM , Sentinel-3 LST , and Sentinel-2 NDV I. SMAP SM is the enhanced L2 passive SSM product (L2_SM _P _E) from the SMAP mission, gridded at 9 km [77]. LST is the L2 product from the SLSTR instrument on-board the Sentinel-3 satellite, which delivers daily 1 Km data [78]. NDV I was produced by combining band 4 (visible) and band 8 A (near-infrared) from the Level 2 product of the Sentinel-2 satellite, with a 20 m resolution and a temporal resolution of around five days.

A. Time Series Data Preparation
Annual time-series were extracted for each pixel of each field of the ground truth database. Three different years, 2018, 2019, and 2020 were used to create three annual time-series per pixel. While data on irrigation systems were only collected during the field campaign of 2020, no changes in irrigation systems were assumed for the two previous years: an assumption that was confirmed for the majority of the fields by inquiry with farmers and/or professionals working in the area. Considering multiple years is beneficial to: 1) substantially increase the ground truth dataset; and 2) allow the models to learn and generalize from a larger dataset, more diverse in terms of meteorological and crop growing conditions. This increased variability in the dataset allows the ML models to be more robust to changes.
After the extraction of each time-series, a strategy was selected in order to fill the gaps whenever the data was unavailable for a particular day or pixel. For the case of ET a time-series, gaps were filled following the methodology proposed by Jofre-Cekalović et al. [79]: when not available, ET a was retrieved by multiplying reference evapotranspiration (ET 0 ) with the crop coefficient (K cs ). K cs was obtained as the ratio between ET a and ET 0 for those days with available data, while temporally interpolated for the missing dates. For the case of the DisPATCh SM time-series, the filling was performed using the original SM values from SMAP SM . For the rest of the variables used in this study, a simple linear interpolation was implemented as a gap-filling methodology.
As an additional preprocessing step, every time-series from each pixel and each variable are scaled through z-normalization, a standard technique that can speed up ML model convergence and improve performances [37], [80]. From this dataset at the pixel level, a field level dataset was created by calculating the median of all pixels contained in each field. The dataset at field level was used for the experiment with simpler ML models, while for deep neural network models the large dataset at pixel level was needed in order to tune all the parameters and avoid overfitting. Finally, these datasets were split into two equal parts, 50% for training and 50% for testing of the classical ML models. keeping an equal distribution of irrigation systems and crop types in the two groups in order to avoid imbalances towards a particular irrigation or crop type in the training or testing of the classification. Moreover, time-series from the same field were used consistently for only one task, training or testing, in an attempt to avoid undesired correlation between the two datasets. Finally, for each ML model, ten different runs were performed in order to extract more reliable performance metrics. During the different runs, train and test datasets were shuffled each time in a random fashion, but keeping the same constraints on the distribution of crop types, irrigation systems, and using same-field time-series for training or testing only.

B. Classification of Crop Types
Regarding specific information about crop types, using the SIGPAC-DUN dataset only partially completed the missing information about the years previous to 2020. SIGPAC-DUN does not contain information on the presence of secondary crops. For this reason, an additional analysis was performed on the fields with annual crops. This analysis consisted of a simple crop classification algorithm applied to the LAI time-series to detect the number of peaks occurring during the growing season and check for the presence of multiple crops along the same year. More specifically, LAI time series from Sentinel-2 at 20 m resolution were collected for each ground truth field and for each considered year. The median value was extracted among the pixels inside every field, for every available date. The resulting time series were first processed in order to remove outliers with the Hampel filter algorithm (using a threshold value of 3 and a window length equals to 3) and then linearly interpolated to daily intervals. Each time-series was classified based on the number of peaks present during the year using a simple peak detection algorithm which distinguished among winter crops (a single peak in the winter/spring period), summer crops (a single peak in the summer period), double crops (double peaks), or alfalfa, grown and harvested multiple times during the spring-summer period (multiple peaks in the summer period). Fig. 3 shows the LAI time-series for the different classes of annual crops that are grouped based on the number and position of peaks. Table III summarizes the number of fields collected in the campaign and used in this study as a dataset to train and test the ML models. These fields are divided per crop type and irrigation system, for each one of the three years considered. It is possible to notice how fields with annual crops vary in number throughout the three years since they showed changes in crop type from one year to another. Additionally, the total number of fields varies across the years varies since for a small number of fields the crop type could not be identified clearly, hence they were not considered for the specific year.

C. ML Models
Time series classification (TSC) is a specific machine learning class of algorithms that classifies data taking into account their ordered structure. This class of algorithms is chosen in this study since timing is a key element to distinguish different systems of irrigation [41]. Three different ML models were used, slightly adapted from the model presented in [40] and [81]. Table IV summarizes the reason of the selection of these particular models.
Time series forest [36] comes from the family of (decision) tree ensemble classifiers: it is a random forest [31] adapted to  III  SUMMARY OF THE TOTAL NUMBER OF FIELDS COLLECTED DURING THE FIELD CAMPAIGN PERFORMED IN 2020   TABLE IV  OVERVIEW OF THE MODELS USED IN THIS STUDY detect temporal features. Random forest is a set of classification trees, where each tree is trained on a random but independent portion of training data, using bagging or bootstrapping to select these training subsets [82]. Random forest algorithm are widely used [14], [15], [18], [32], [33], [34], [35]: the reason for their success lies in the low computational power required when compared to similar ML techniques, its stability (by design) against over-fitting, and its robustness against mislabelled training data. Additionally, this algorithm has a notable advantage in terms of interpretability of its prediction: each prediction has a confidence level that is retrieved from the percentage of trees that voted the same class. Time-series forest is a variation of random forest where each tree is split using a combination of distance and entropy gain: an approach that captures well temporal characteristics of the inputs. The model also allows for an insightful inspection of the classification process thanks to the possibility of producing temporal importance curves. These curves underline the parts of the time-series that contain the most useful information and reveal which is the most important statistical feature among the ones extracted. Given the wide diffusion of this ML method and its interpretability, time series forest was chosen as the benchmark model to run the classification experiment presented in this study.
Random convolutional kernel transform (ROCKET) is another algorithm designed for time-series classification. It was proposed by Dempster et al. [38] and proven to be competitive with the state-of-the-art algorithms for classification of multivariate time-series [39], but with a much faster computational speed. It is a kernel-approach classification inspired by convolutional neural networks. After producing a large number of kernels, two main features are extracted (maximum value and portion of positive values) that are then used to train a linear classifier (a ridge regression algorithm, as proposed in the original paper). An innovative aspect of Ro is the existence of a single hyper-parameter, corresponding to the number of kernels (set to 10 000 by default), which avoids computationally intensive hyperparameters's tuning, a task required by many other classifiers.  V  ACCURACIES OF THE CLASSIFICATION OF THE TIME-SERIES FOREST APPLIED TO DIFFERENT INPUT VARIABLES, WITH ANNUAL LENGTH OR  CROPPED TO THEIR VEGETATIVE PERIODS ResNET (Residual networks) is a deep neural network model which was designed initially for computer vision task and adapted to time-series classification [40], [83], given their success in terms of performances and wide diffusion for similar classification tasks. ResNET was shown to consistently outperform traditional ML models when applied to a large variety of different datasets [40]. ResNET main architecture is used as in Wang et al. [83]. This model was adapted in this study to be run with multiple variables (multivariate model), through the use of a late fusion of parallel networks [22], [84], [85], [86], [87].
The more complex architecture of this deep learning model requires the use of a large training dataset in order to prevent the rapid over-fitting of the model. For this reason, pixel-level time-series were used. Time-series contained in this dataset could be redundant, since adjacent pixels are expected to show similar values, given that spatial variability at 20 m resolution is limited. Nevertheless, the small variations that they present could still potentially improve the model accuracy, similarly to the improvement produced by most of the data-augmentation techniques, often used in deep learning, where slight changes are introduced in the dataset to create new training data [40], [88]. Training and testing datasets are still selected following the criteria of the other ML models, keeping all time-series from the same fields either for training or testing, thus avoiding data-leaking effects. Final performance metrics are still evaluated at field-level (after a spatial aggregation of the irrigation systems).

D. Postprocessing
After the model's training, annual maps of irrigation systems were produced and two main post-processing steps are implemented in order to correct for possible misclassification. Correcting the classification output with a statistical or knowledgebased approach is commonly used for multitemporal geo-spatial classifications [89], [90], [91]. The first postprocessing steps involved spatial aggregation at field-level for the irrigation systems maps produced at pixel-level. The SIGPAC-DUN field shapes were used as a mask and spatial aggregation was performed in order to select only one irrigation system for each field, based on the most recurring irrigation system predicted among all the pixels contained in the field. This approach is also realistic since only one system of irrigation is expected to be found for each field. The second postprocessing step involved a temporal analysis to detect and filter unlikely single-year changes of irrigation for a specific field. As a general rule, the presence of a single-year anomaly in the irrigation systems was always corrected, except for cases where the anomaly was found in the first or last year analyzed and the change could be explained by a modernization of the irrigation system (from not irrigated to irrigated or from traditional irrigation as flood to more modern irrigation systems such as drip and sprinkler). This knowledge-based temporal correction assumes that irrigation practices are not interrupted from one year to another and that modern irrigation systems are never replaced by traditional irrigation, given the significant infrastructure cost and no real production benefits of such change.

E. Evaluation Metrics
In order to compare the results from different ML models and variables, standard evaluation metrics are calculated by comparing the predicted versus actual irrigation system of the fields present in the test dataset. The accuracy is computed as the number of correct predictions over the total number of samples. The precision, also called user accuracy, is instead calculated for each irrigation system category as the number of correct predictions over the number of samples predicted with the same category. The average precision is expressed as the average of all the precisions calculated for all the categories. The recall, also called producer accuracy, is instead calculated for each irrigation system category as the number of correct predictions over the number of samples with the same category in the ground truth. The average recall is expressed as the average of the recalls calculated for all the categories.

F. Experimental Design
Tree different experiments were designed in order to explore different approaches in classifying irrigation systems.
1) Influence of Crop Types: The first experiment aimed at verifying if the model was able to correctly identify differences exclusively related to irrigation practices, or if it was merely classifying irrigation systems based on the crop type present in each field. There is a proven relationship between crop types and irrigation systems, where in most of the cases few prevalent systems of irrigation are present for each particular crop type, as shown in Table III. Different models of time series forest were trained on each crop type to predict the irrigation systems separately. Their aggregated accuracy was then compared with a general Time Series Forest model trained without discriminating by crop types in order to detect which approach was more favourable. The experiment was performed to assess the accuracy of SM , ET a , and both the variables together, SM +ET a .
2) Importance of Crop Vegetative Period: A second experiment was also designed using the same variables and the time-series forest model. This second experiment was used to check whether time-series classification models required to be manually cropped in advance or whether the model was able to independently select the period of more intensive irrigation. Time-series forest was run with only a part of the time-series, which was cropped to isolate only the vegetative period of crops, in which there was a greater intensity of irrigation. Cropping implied selecting the spring and summer period (from the 15th May) for all the crops except for winter cereals, where the winter season (until the 15th July) was used. The need for cropping time-series was evaluated through a comparison of the overall accuracies from the classification of irrigation systems using cropped time-series with the accuracies retrieved using entire time-series (e.g., ET a versus ET a,cropped , SM versus SM cropped , etc.). Another approach used to evaluate the need for cropping time-series was to visualize temporal importance curves in order to verify if the time series forest model trained with the complete time-series was able or not to independently select the most important part of the year for the classification of irrigation systems. These curves do not only show the most important period of the year for the classification task, but they also provide information on which of the extracted features is most useful (among the ones selected in this study: mean, standard deviation, and slope).

3) Model and Variable Selection:
After verifying the capability of the model to classify irrigation systems, a final experiment was designed to investigate which variables are most suited for the classification of irrigation systems and which model performs better for this classification task. The classification of irrigation systems was performed with both L2 and L4 variables, training the different models with each variable separately and with a combination of them. All the different ML models proposed in this study were used for the comparison: the two classic ML learning models (time-series forest and rocket) applied to both L4 and L2 variables and trained at field level (one time-series per field), and ResNET, the DNN model, applied only to L4 variables at pixel level, since the low spatial resolution of the L2 variables did not provide a large enough dataset for training and testing of this model. Each model and each variable were trained and evaluated ten times, changing each time the train and test datasets' distributions and the models' random initial weights. Median and standard deviation of the overall accuracies were used as a comparison metric.

A. Influence of Crop Types
Table V summarizes the accuracy retrieved from the simulations. The experiment was performed to assess the accuracy of SM , ET a , and both variables together, SM +ET a . Results from these initial experiments show that a general model (last column of Table V) has comparable accuracy with respect to the aggregated accuracy (second to the last column) of multiple models trained separately for each crop type (with a small difference of Δ = 2.16%). Using a general model, trained on every crop type has a noticeable advantage of not requiring a crop type map, which makes the approach more versatile since it can be adapted to areas where crop types are not known or where there are different crop types from the ones analyzed in this study.

B. Importance of Crop Vegetative Period
Table V also shows two additional results: first, it is possible to notice how combining ET a +SM leads to consistently higher classification accuracies than when using these two variables separately. All crop types show better accuracy when both variables are used for classification. The only exception to these results seems to be the classification of irrigation types for alfalfa, where accuracies remains low even when using ET a + SM , when comparing the two different irrigation systems present, flood and sprinkler. A possible explanation could be the multiple rapid vegetative growth cycles that characterize this crop. The rapidity may result in very similar irrigation practices between flood and sprinkler, since a shorter period of time available for irrigation reduces the variability of the two irrigation systems both in terms of water amount and intervals of time between consecutive irrigations. A second result is that cropping does not lead to improvements in terms of accuracy. Removing parts of the time-series actually degrades the final results, confirming that ML models for time-series are able to independently select and exploit the most valuable part for the time-series with no need for this preprocessing step. As an additional proof of the capability of the ML model to independently select the most interesting part of the time-series, temporal importance curves from the time series forest were calculated. Fig. 4 shows the importance curves for the irrigation classification task for the different crop types and the corresponding variables (ET a and SM ) used to generate them.
Temporal importance curves show that the model only focuses on a specific part of the year that corresponds to the irrigation and growing periods for each crop. It is noticeable that the feature importance curves for ET a and SM do not always select the same period and the same feature, suggesting that both variables provide complementary information for this classification task.
Among the three feature importance curves for the ET a time-series, the "mean" curve has a higher value for most of the crop types. This is also visually evident when looking at the time-series ET a under different irrigation systems: ET a shows a clearer difference in magnitude for fields irrigated with different irrigation systems, which has a direct relationship with the amount of water supplied to the field. Flood-irrigated fields tend to have the highest values of ET a , especially during the warmest period, when water is almost exclusively supplied by irrigation. This behavior was expected since flood irrigation is a traditional system that notably employs the largest quantities of water. Flood-irrigated fields show higher ET a for all the annual crops, from Fig. 4(a) to (d), especially for winter cereals, where flood-irrigated fields are compared to nonirrigated fields. For fields cultivated with double crops, alfalfa, and maize, flood is compared with sprinkler-irrigated fields and it is notable how flood-irrigated fields are still evapotranspirating more, even if the difference between these two systems is less evident. The second row corresponds to orchards, from Fig. 4(e) to (g): it is noticeable that when comparing nonirrigated and drip-irrigated fields (for the case of olive and vineyard fields), the model is still selecting the "mean" as the most important among the feature importance curves since drip-irrigated fields shows higher values than nonirrigated fields. Only in the case of fruit and nut-bearing trees "slope" is most important. The "slope" curve produced from the ET a time-series is calculated as the first derivative of the time-series, and can be interpreted as the degree of change of the time-series during the season: Fig. 4(e) shows a clear different timing in irrigating with drip and flood, so there is a clear difference in temporal changes of ET a , while nonirrigated fields have almost constant values during the summer season, thus, not showing temporal changes.
Regarding SM time-series, it is most evident how fields irrigated differently have a different range of variation of SM , especially during the warmest periods. For this reason, the "std" feature importance curve is selected by the model as the most informative: For winter cereals, it is evident that the "std" curve detects differences between the different irrigation systems during the crop growing period. Similarly, the "std" feature importance curve is also showing the highest peaks during the summer period for the case of alfalfa, double crops, vineyard, and olive fields. The only exception is present for the importance curves created from the SM time-series of the fruit and nut-bearing trees. In this case, the "slope" curve presents the highest peak, which is during the month of May/June: this is also visible in the SM curves, where during May the fields irrigated by a drip system show marked differences with respect to flood-irrigated and nonirrigated fields.

C. Model and Variable Selection
After verifying the capability of the model to classify irrigation systems, we proceeded to investigate which variables are most suited for the classification of irrigation systems and which model performs better for this classification task. Fig. 5 summarizes the accuracies retrieved for the three different models trained using different sets of the input variables described. All models have higher accuracy when using L4 variables, with the highest accuracy in terms of average being reached when all the L4 variables are used together, followed by the combination of ET a and SM only. Despite performing best, the accuracy reached combining all the L4 variables is very close to the accuracy reached using ET a and SM together, but it shows a larger standard deviation, caused by the addition of K s and LAI which probably do not positively contribute to the classification. For this reason, we considered ET a and SM to be the best combination of variables. These two variables show good accuracy even when used separately, so it could be possible to use these single variables in the classification process, only losing a small amount of accuracy. Nevertheless, ET a and SM together reach higher accuracies because their information complement each other: while SM is capable of distinguishing better large wet surfaces caused by flood irrigation, ET a can detect higher plant evaporation from drip with respect to nonirrigated fields. Combining the two variables always brings a general improvement of the prediction, as also demonstrated in Table VII.
Regarding the different models used, the ResNET model consistently shows higher results. ResNET did not only out-perform the other models in terms of accuracy, but it showed consistently higher results for all the different metrics used to compare the different models, as presented in Table VI for experiments performed with ET a and SM as input variables.
The final overall accuracy of ResNET in classifying different irrigation systems is comparable (and in some cases higher) to accuracies presented in literature for studies involving irrigation mapping at field level, such as Gao et al. [14] and Bazzi et al. [15]. This result shows that with the presented approach it is possible to keep a high overall accuracy even when adding complexity to the problem of irrigation classification. The only drawback of ResNET is that it requires a significantly higher computational cost for the model training than the other two traditional models. In case computational cost is an issue, a good tradeoff between accuracy and computational time is offered by the rocket model, which is less accurate than ResNET for this specific classification task but it is around one order of magnitude faster in training and suggested to be used as a default model for multivariate classification tasks given its remarkable results for large scale studies [39]. Another advantage for rocket is that its training time is linearly scalable with the size of the training set [38], which could be of great value in case this classification approach is applied to large irrigated areas, where the number of training samples inevitably grows.
Finally, Fig. 6 shows the confusion matrix for ResNET with ET a and SM as inputs. The matrix is calculated on the final results, after the postprocessing step, which included aggregating the model prediction at the field level and performing temporal postprocessing for the three different years. Precision and recall values are also presented and indicate how all the values of the metrics are very close to each other: an additional indicator of the robustness of the classification, which is not imbalanced towards any particular irrigation system. The lowest metric is represented by the precision for the drip irrigation system, which appears to be the label that is most misclassified by the ML model: as a matter of fact, in a few cases, the model appears to classify drip irrigation as flood or nonirrigated. Drip irrigation is sometimes confused with nonirrigated fields due to the low soil wet surface around the emitter, which minimizes losses through evaporation and runoff. Additionally, there is also a more marked misclassification in those irrigation districts with limited water allocation, where sustained deficit irrigation strategies are usually adopted, such as some areas of Segarra-Garrigues or Garrigues Sud. We have also realized that some recently planted fields of grapevines, almonds, and pistachios trees were classified as nonirrigated, but instead were drip-irrigated. This probably occurred due to its still low canopy vigor, evapotranspiration, and soil moisture values throughout the growing season. On the other hand, the confusion between drip and flood irrigation could be explained either due to a decrease in SM and ET a on dates between irrigation events or due to a higher ET a caused by crop cover between rows. In both cases, time-series between flood and drip-irrigated fields may look similar. In order to improve classification for these particular cases, the selection of more fields with these characteristics in further studies will help to obtain a more robust classification.

D. Comparison With SIGPAC-DUN
In order to perform a more comprehensive analysis of the quality of the classification of irrigation systems derived based on ET a and SM , a comparison of the percentage of different systems of irrigation at the irrigation district level was performed. As previously mentioned, the latest SIGPAC-DUN dataset [68] includes the first map of irrigation systems at the field level, allowing for a direct comparison between the irrigation maps produced by the ResNET model and this administrative classification. Fig. 7 visually compares the distribution of irrigation systems as classified by SIGPAC-DUN and by the ResNET model for the selected study area. The borders of the different irrigation districts are also shown as black continuous lines and three specific areas are selected for a visual comparison. It is possible to notice how generally the ResNET model produces a map with more modern irrigation systems than SIGPAC, where more fields are nonirrigated or flood-irrigated. Only in the last of the three comparisons of Fig. 7(c) (bottom-left) there is a large area (in the Segarra-Garrigues district) depicted as flood-irrigated by SIGPAC-DUN but predicted as nonirrigated by ResNET (verified to be correct by visual inspection). Fig. 8 shows a direct comparison between the different systems of irrigation as estimated by SIGPAC-DUN and predicted by the ResNET model used for this study. It is noticeable how the ResNET model consistently predicts a lower percentage of traditional irrigation by flooding, and in almost all cases (but for Canals d'Urgell and Algerri-Balaguer) a decrease in the percentage of nonirrigated fields. This discrepancy was expected, since SIGPAC-DUN showed some inaccuracies already from the comparison with the ground truth dataset. In particular, SIGPAC-DUN showed a tendency to misclassify modern irrigated fields as not irrigated or irrigated with the traditional system of flood. This suggested that the database probably reflects a picture that is not up-to-date, and it is also visible in this general comparison of the entire study area with the map produced by the ResNET model. A secondary cause for the discrepancy is also the limitation in the classification of the ResNET model, which reaches a final accuracy of around 90% when compared to the ground truth database.
Additionally, the maps of irrigation systems from SIGPAC-DUN and ResNET were compared with approximate values retrieved from literature for the three largest irrigation districts: Table VII shows the values extracted for this comparison. The literature data were collected in 1999 for Canal de Pinyana and Canals d'Urgell [92] and for 2018 for Canal d'Aragó i Catalunya [23] and they are presented here as the percentage of  irrigation systems over cultivated area. This comparison among sources belonging to different years allows to establish again the general trend for the study area in modernizing irrigation systems. It is possible to notice how old estimates from literature present larger percentages of traditional irrigation systems when compared against SIGPAC-DUN, which in turn contains larger traditional irrigation percentages than ResNET from 2020, considered the most updated source. As a matter of fact, even though the map for SIGPAC-DUN is delivered for 2021, the information contained over irrigation systems are a collection of administrative surveys from various previous years.

V. CONCLUSION
A key missing information for irrigation management and for hydrological studies over irrigated regions is the precise knowledge, at field level, of the different irrigation systems installed in each field, and the trends and changes of these systems over different years. This study has provided for the first time a method to classify irrigation systems (flood, sprinkler, drip) and not irrigated fields, using remotely sensed time-series at subfield scale resolution. Key hydrological variables were used as inputs for the classification of irrigation systems through the use of different ML models.
Two main hydrological variables, ET a and SM at 20 m led to the best performance in the classification of irrigation systems when combined together, regardless of the ML model used for the classification. This result is indicative of the usefulness of these datasets in providing complementary information: SM directly detects large surface soil wetting, thus easily detecting large uses of water, as in flood and sprinkler irrigation. ET a is instead able to detect fields that are irrigated with drips, which does not create dramatic changes in the soil water content but keeps the plant at high ET a levels (close to potential evapotranspiration) with respect to nonirrigated fields.
An initial experiment was run in order to verify if the difference in crop types was interfering with the prediction of irrigation systems. Results showed that crop type does not interfere with the task of classifying irrigation systems, and similar performances are reached when using a general model for all the crops or multiple models specialized by crop types. These proved the feasibility of classifying systems of irrigation using only hydrological variables from remote sensing.
Among the three ML models tested, ResNET showed the best performance for all the metrics used for this classification task. ResNET is the only deep neural network model proposed for this study, and its architecture was shown to be more suitable for detecting more complex variations from the analyzed data. This characteristic is an advantage since irrigation practices strongly vary, both in timing and amount among different fields, even for fields that grow similar crops and employ the same irrigation system.
Finally, we compared a map of irrigation systems derived from the ResNET model using ET a and SM of the year 2020 against the irrigation systems map provided by the SIGPAC-DUN administrative catalog. Results at the district level showed a general agreement in the percentages of irrigation systems, even though the ResNET map appears to classify more fields with more efficient irrigation systems (drip and sprinkler irrigation). These differences are present since the ResNET map delivers a more updated depiction of irrigation systems compared to the map from SIGPAC-DUN, and captures the ongoing conversion from flood to more efficient irrigation systems.
This work represents the first study dedicated to the automatic detection at the field level of irrigation systems from satellite remotely sensed data. It shows how this is achievable with good accuracy when applied to semiarid areas. Over semiarid areas, the low cloud cover allows for high availability of thermal and optical satellite data, and irrigated and nonirrigated areas are easily distinguishable due to the marked difference in soil water availability from the surrounding dryland areas.
This study only focused on semiarid regions since they are areas where water availability is a topic of increasing concern, and where optimization in the use and distribution of water for agriculture can bring the most noticeable improvements. One research avenue could be to apply the proposed methodology to more temperate areas. This application is expected to be challenging for two main reasons: first, in those areas crop phenology is more similar between irrigated and nonirrigated crops [93], second, there will be larger gaps in the time-series due to more frequent cloud cover, that will reduce the amount of information usable by the model.
Other Mediterranean regions are expected to be well suited for applying this methodology, where transfer learning techniques could be explored. This approach will require a small amount of local training data, which will only be used to tailor the weights of the last layer of the pretrained ResNET model in order to improve accuracies for the specific region of interest. Unsupervised learning could also be explored as a solution that avoids ground truth data collections, but which is often less performing than supervised methods.