Early-Season Mapping of Winter Wheat and Garlic in Huaihe Basin Using Sentinel-1/2 and Landsat-7/8 Imagery

Early crop mapping is essential for predicting crop yield, assessing agricultural disasters, and responding to food price fluctuations. Winter wheat is a major food contributor in China. Existing early-season maps of winter wheat depend strongly on the shape of the time series curve, which limits their applicability at large scales. In addition, the effect of garlic on winter wheat mapping is often ignored. In this study, we determined how early winter crops (winter wheat and garlic) could be identified by examining time series of different lengths, and generated an annual 30-m winter wheat and garlic map of the Huaihe basin using the random forest classifier and Sentinel-1/2 and Landsat-7/8 time-series imagery. The results showed that garlic could be identified at the end of November using four composite images with an overall accuracy (OA) of 0.88, followed by winter wheat at the end of January using eight composite images with an OA of 0.91. The proposed framework can also be applied to other regions and crops to generate early-season distribution maps.

global, national, and regional scales [1]. Furthermore, the linear growth of the world's population, the "ceiling effect" of grain yields [2], and the limited expansion of cropland [3] will widen the food demand gap and seriously challenge the Sustainable Development Goal of Zero Hunger. Given these challenges, various cross-institutional platforms have advocated the production and dissemination of relevant, timely, and actionable knowledge of agricultural conditions and production prospects at national, regional, and global scales to improve food security [4]. Therefore, it is necessary to develop a timely and accurate crop-mapping tool that provides governments, policymakers, and traders with critical information for crop management. In addition, crop maps can serve as input for various environmental models, improving the accuracy of simulations of agricultural responses to environmental factors [5].
Winter crops (winter wheat, winter garlic, and winter rapeseed) are primary contributors to world food production and have a significant impact on global food security [6], [7]. They are widely grown in China, especially in northern China. Moreover, planting winter crops makes full use of cropland in winter and increases the intensity of farmland use [8], which also increases the capacity of cropland for windbreak and sand fixation. The winter crops in the Huaihe basin are mainly winter wheat and garlic, whereas rapeseed is only sparsely distributed [9], [10], [11]. Therefore, only winter wheat and garlic were considered in this study when distinguishing the planting areas of winter crops. Satellite remote sensing supports long time series of multifrequency and multiscale Earth observations and has been effective for crop monitoring across various crops and geographical scales [7], [12], [13], [14], [15]. In this context, winter crop maps at various geographic scales have been created. At the national scale, the Cropland Data Layer [16], the Sen2Agri automated system [4], and the Annual Crop Inventory of Agriculture and Agri-Food Canada [17] are updated annually. At the regional scale, maps of the North China Plain [18], [19], [20], Huanghuai Plain [21], Shandong province [22], Henan province [8], [23], Jiangsu province [24], and Bekaa plain [25] have also been produced. However, these maps are derived from observations throughout the whole growing season, which means that they become available only after harvest. This is too late for the scientific management of winter crops. Early winter crop recognition refers to obtaining planting information in the early or middle part of the growing season. Obtaining information on winter crop planting earlier not only provides data support for scientific management but also provides an assessment basis for insurance in areas with extreme weather events and natural disasters [26].
Open archives of massive satellite imagery (e.g., Landsat and Sentinel) and the rapid evolution of cloud platforms [e.g., PIE-Engine and Google Earth Engine (GEE)] allow us to acquire and process the latest images almost instantaneously [27], [28]. Consequently, efforts have been made to generate early winter crop products based on various satellite datasets [29]. Skakun et al. [30] used a Gaussian mixture model to distinguish winter crops from spring and summer crops based on normalized difference vegetation index (NDVI) and growing degree days information obtained from moderate resolution imaging spectroradiometer (MODIS) images. Based on MODIS enhanced vegetation index (EVI) imagery, Potgieter et al. [31] used an unsupervised k-means classification algorithm to map winter crops. However, the coarse spatial resolution of MODIS cannot resolve parcel-level fields in many agricultural landscapes [4], especially in Asia and Africa (farms in these regions are typically smaller than 0.04 ha) [32]. After Sentinel-2 data became available, Tian et al. [33] identified winter wheat and garlic from two Sentinel-2 images by analyzing the differences in NDVI between land cover types. However, good-quality images for a particular date are difficult to guarantee in other years or regions, so the applicability of this approach is limited. Based on prior phenological knowledge of winter wheat, an early-season map of winter wheat was generated using multiple phenological metrics and a rule-based algorithm [34]. However, the thresholds for the phenology metrics are based on the phenology calendar of winter wheat, which can vary among regions with different climatic conditions.
As the demand for higher spatio-temporal resolution grows, some studies have attempted to construct high spatio-temporal resolution datasets from multisource data for crop classification. For example, a study produced a 30-m early-season winter wheat map using the time-weighted dynamic time warping method and found that winter wheat could be identified as early as the end of March based on Landsat-7/8 and Sentinel-2 data [9]. Similarly, Hao et al. [35] identified early-season winter wheat from Landsat-7/8 and Sentinel-2 data by using the antibody network algorithm to calculate the Euclidean distance of each pixel from the standard seasonal variation curve. These approaches are mainly based on incremental time windows and generate early-season maps by comparing the NDVI seasonal variation of each pixel with the standard seasonal variation curve of the target crop. Therefore, the shape of the time series curve strongly affects the classification results, and these approaches are of limited use over large areas because crop characteristics differ considerably across space. Recently, a study developed early-season maps of soybean, corn, and rice in Northeast China using random forest algorithms based on Sentinel-1/2 data, and the results showed satisfactory accuracy [36]. Therefore, it is promising to map early-season crops in other regions using a combination of different satellite datasets and a random forest classifier.
Considering the limitations of existing maps and the actual demands of governments, insurance companies, and farmers for earlier and up-to-date information on winter crop fields, we explored the potential of early-season mapping of winter crops (winter wheat and garlic) and applied it to the Huaihe basin at a spatial resolution of 30 m. The resultant map permits in-season monitoring of winter crop growth conditions and provides data support for yield assessment, insurance, and grain trade.

A. Study Area
The Huaihe basin in eastern China covers parts of Henan, Anhui, Jiangsu, and Shandong [see Fig. 1(a)]. The basin covers an area of 270 000 km², of which 70.41% is cropland [see Fig. 1(b)]. The study area is primarily flat plain, which provides excellent growing conditions for crops. Crops can be divided into winter crops and summer crops; the winter crops are mainly winter wheat and garlic, which are usually sown in October and harvested in June of the following year.

B. Datasets
1) Landsat and Sentinel Data:
Because the growth cycle spans two calendar years, we collected the surface reflectance data (Level-2A) of Landsat-7/8 and Sentinel-2 from 1 October, 2020, to 1 July, 2021, on the GEE platform, as Level-1C data are sensitive to changes in atmospheric composition over time [37]. Landsat-7 and Landsat-8 carry the Enhanced Thematic Mapper Plus (ETM+) and Operational Land Imager (OLI) sensors, respectively, with a spatial resolution of 30 m and a revisit period of 16 days. Sentinel-2 MultiSpectral Instrument data have a spatial resolution of 10 m and a revisit period of five days. The quality of Landsat data was assessed using the CFmask algorithm, which can effectively mask clouds, cloud shadows, snow, etc. [38]. Bad-quality observations in Sentinel-2 data were identified by the quality assessment band (QA60) and masked. Differences in wavelength and spectral reflectance between Landsat and Sentinel-2 imagery require the harmonization of images from different sensors to obtain consistent and comparable results [39], [40]. We used ordinary least squares regression to match Landsat-7 and Sentinel-2 bands to Landsat-8 standards and resampled the Sentinel-2 data to a spatial resolution of 30 m using bicubic resampling to match the Landsat data.
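The ordinary least squares harmonization step can be illustrated with a minimal sketch that fits a per-band linear transfer function from paired reflectance samples and applies it. The paired values, the coefficients used to build the synthetic data, and the helper names `fit_ols` and `harmonize` are hypothetical; the processing in this study ran on GEE, and the actual regression coefficients are not reported in this section.

```python
from statistics import mean


def fit_ols(x, y):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    mean_x, mean_y = mean(x), mean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def harmonize(values, slope, intercept):
    """Map reflectance from the source sensor onto the reference sensor scale."""
    return [slope * v + intercept for v in values]


# Hypothetical paired NIR reflectance at coincident pixels (not data from the study).
s2_nir = [0.10, 0.20, 0.30, 0.40, 0.50]        # Sentinel-2 (source)
l8_nir = [0.98 * v + 0.004 for v in s2_nir]    # Landsat-8 (reference), synthetic

slope, intercept = fit_ols(s2_nir, l8_nir)
s2_nir_as_l8 = harmonize(s2_nir, slope, intercept)
```

In practice one such fit would be made for each band and each source sensor (Landsat-7 and Sentinel-2), using a large sample of near-coincident clear observations.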
Sentinel-1 consists of two satellites, Sentinel-1A and Sentinel-1B, carrying C-band synthetic aperture radar (SAR) sensors with a spatial resolution of 10 m and a temporal resolution of 12 days. We used the VH band of Sentinel-1 SAR Level-1 ground range detected (GRD) products in the interferometric wide swath instrument mode from 1 October, 2020, to 1 July, 2021, on the GEE platform. As shown in Fig. 2, the number of total and good-quality observations per pixel varies spatially. Approximately 71.87% and 54.66% of pixels had >80 total and good-quality observations, respectively, between 1 October, 2020, and 1 July, 2021, with more observations in regions where satellite orbits overlap [see Fig. 2
3) Ground Data Samples: The ground data samples are divided into training and validation samples [see Figs. 3 and 4]. The training samples came mainly from fieldwork during 2020-2021. During the fieldwork, we collected geo-referenced field photos of different crops, including winter wheat, garlic, corn, peanut, etc. In addition, we collected multispectral images of winter wheat and garlic acquired by an unmanned aerial system. The unmanned aerial vehicle (UAV) imagery helped us visually interpret the Google high-resolution imagery to expand the reference samples for this study. For example, if a small region is identified as a winter wheat field in the UAV imagery, it can be marked as a winter wheat field at the corresponding location in the Google image, and parcels that match the color and texture characteristics of that marked field in the same Google image are also considered winter wheat fields. The validation samples were mainly derived from the visual interpretation of very high-resolution Google Earth images. Specifically, we used a stratified random sampling method to generate sample points in the study area based on the farmland layer and labeled the sample points by visually interpreting Google Earth images. The number of points in each stratum is proportional to the area of that stratum [41]. For example, for the winter wheat map, we randomly sampled 1500 points in proportion to the areas of the winter wheat and nonwinter wheat strata; 457 winter wheat and 764 nonwinter wheat points (1221 in total) remained after removing points close to the training samples. Given the concentrated distribution of garlic in the study area, we selected three major garlic distribution areas according to previous literature and images to verify the garlic distribution results (see Fig. 4). It is worth noting that, to ensure independent validation, samples used for training were removed from the validation set.
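The area-proportional stratified sampling described above can be sketched as follows. The stratum areas, the candidate pixel identifiers, the random seed, and the function names `allocate_by_area` and `draw_points` are assumptions for illustration only; they are not values or tools from the study.

```python
import random


def allocate_by_area(total_points, stratum_areas):
    """Split total_points across strata in proportion to area (largest-remainder rounding)."""
    total_area = sum(stratum_areas.values())
    exact = {name: total_points * area / total_area
             for name, area in stratum_areas.items()}
    counts = {name: int(value) for name, value in exact.items()}
    remaining = total_points - sum(counts.values())
    # Hand the leftover points to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda name: exact[name] - counts[name],
                          reverse=True)
    for name in by_remainder[:remaining]:
        counts[name] += 1
    return counts


def draw_points(candidates, counts, seed=42):
    """Randomly draw the allocated number of candidate pixels from each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(candidates[name], counts[name]) for name in counts}


# Hypothetical stratum areas in km^2 (not values from the study).
areas = {"winter_wheat": 40_000.0, "non_winter_wheat": 60_000.0}
counts = allocate_by_area(1500, areas)

# Hypothetical candidate pixel identifiers for each stratum.
candidates = {name: list(range(5000)) for name in areas}
samples = draw_points(candidates, counts)
```

Points that fall close to training samples would then be discarded, which is why the retained validation set is smaller than the number of points drawn.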

C. Methods
The workflow to generate the early-season winter crops map is shown in Fig. 5 and includes the following steps.
1) Multiple vegetation indices and polarization band time series were generated by cloud removal, compositing, gap filling, and smoothing based on Landsat and Sentinel imagery.
2) The length of the indices time series was gradually increased by adding the number of images, which were used [42]. However, the importance of indices for winter wheat and garlic classification remains unclear. Therefore, we tested the classification performance of commonly used indices using the variable importance method of the random forest algorithm.
Based on the importance ranking of the variables in Fig. 6, we selected the better-performing variables, namely four vegetation indices and the VH polarization band. The NDVI [43] is sensitive to vegetation greenness, which is closely related to crop growth conditions and thus allows different growth stages of crops to be recorded [44]. The EVI optimizes the vegetation signal and has higher sensitivity at higher biomass [45]. We used the normalized green-red difference index (NGRDI) to differentiate garlic during the growing season, as garlic leaves turn yellow (a combination of green and red) during bud differentiation [46]. The growth cycle of winter crops usually lasts from October to June, so their soil moisture content differs significantly from that of crops with other growth cycles (e.g., spring crops, whose growth cycle spans mainly from February to August). Therefore, we chose the land surface water index (LSWI) to distinguish winter crops from other crops [47]. In addition, we used the VH polarization band to classify crops by their structural characteristics. These indices can be calculated as

$$\mathrm{NDVI} = \frac{\rho_{\mathrm{NIR}} - \rho_{\mathrm{Red}}}{\rho_{\mathrm{NIR}} + \rho_{\mathrm{Red}}}$$

$$\mathrm{EVI} = 2.5 \times \frac{\rho_{\mathrm{NIR}} - \rho_{\mathrm{Red}}}{\rho_{\mathrm{NIR}} + 6\rho_{\mathrm{Red}} - 7.5\rho_{\mathrm{Blue}} + 1}$$

$$\mathrm{NGRDI} = \frac{\rho_{\mathrm{Green}} - \rho_{\mathrm{Red}}}{\rho_{\mathrm{Green}} + \rho_{\mathrm{Red}}}$$

$$\mathrm{LSWI} = \frac{\rho_{\mathrm{NIR}} - \rho_{\mathrm{SWIR}}}{\rho_{\mathrm{NIR}} + \rho_{\mathrm{SWIR}}}$$

respectively, where $\rho_{\mathrm{NIR}}$, $\rho_{\mathrm{Red}}$, $\rho_{\mathrm{Green}}$, $\rho_{\mathrm{Blue}}$, and $\rho_{\mathrm{SWIR}}$ are the surface reflectance values of the near-infrared, red, green, blue, and shortwave infrared bands.

Regular composite images can reduce the effect of clouds and the temporal inhomogeneity of the observations [48]. In this study, we composited images at 15-day intervals, whereby the maximum of all observations within each 15-day interval was taken as the observation (see Fig. 7). When no high-quality observation was available within a time interval, the adjacent high-quality observations were used to linearly interpolate the data gap [49]. Furthermore, the Savitzky-Golay filter with a moving window of 9 and order of 2 was used to remove the noise residuals [41], [50], [51]. It is worth noting that LSWI is sensitive to moisture conditions and therefore smoothing is not necessary [41], [51].

Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.
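The index calculation and the time series preprocessing (15-day maximum compositing, linear gap filling, and Savitzky-Golay smoothing with a window of 9 and order of 2) can be sketched for a single pixel as below. This is an offline illustration under stated assumptions, not the GEE implementation used in the study: the EVI coefficients are the commonly used standard values, the mirrored edge handling in `savgol_9_2` is an assumption, and all function names and sample values are hypothetical.

```python
def ndvi(nir, red):
    return (nir - red) / (nir + red)


def evi(nir, red, blue):
    # Standard EVI coefficients (assumed; the section does not list them).
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)


def ngrdi(green, red):
    return (green - red) / (green + red)


def lswi(nir, swir):
    return (nir - swir) / (nir + swir)


def max_composite(observations, start_day, end_day, step=15):
    """observations: (day, value) pairs of good-quality data.
    Returns the maximum per window, or None where a window has no observation."""
    composites = []
    for window_start in range(start_day, end_day, step):
        values = [v for d, v in observations
                  if window_start <= d < window_start + step]
        composites.append(max(values) if values else None)
    return composites


def fill_gaps(series):
    """Linearly interpolate None gaps; gaps at the edges copy the nearest valid value."""
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no valid observation")
    filled = list(series)
    for i, value in enumerate(series):
        if value is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = series[right]
        elif right is None:
            filled[i] = series[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = series[left] + weight * (series[right] - series[left])
    return filled


# Convolution weights of the 9-point, second-order Savitzky-Golay smoother (sum = 231).
SG_WEIGHTS = (-21, 14, 39, 54, 59, 54, 39, 14, -21)


def savgol_9_2(series):
    """Smooth a gap-free series; edges are handled by mirroring (an assumption)."""
    n = len(series)
    if n < 5:  # too short to mirror a 9-point window
        return list(series)
    smoothed = []
    for i in range(n):
        total = 0.0
        for j, weight in enumerate(SG_WEIGHTS):
            k = i + j - 4
            if k < 0:
                k = -k
            elif k >= n:
                k = 2 * (n - 1) - k
            total += weight * series[k]
        smoothed.append(total / 231.0)
    return smoothed


# Hypothetical NDVI observations for one pixel: (day since 1 October, value).
obs = [(0, 0.2), (3, 0.5), (31, 0.4)]
composited = max_composite(obs, start_day=0, end_day=45)
gap_filled = fill_gaps(composited)
```

Following the text, the smoothing step would be applied to the NDVI, EVI, NGRDI, and VH series but skipped for LSWI.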
2) Setting Input Variables: This study assumes that, within a certain scope, the accuracy of the resultant maps improves as the number of images used increases. On this basis, we controlled the number of images used by changing the end date of the input time series. The start date was fixed at the beginning of October, when the winter crops were sown. The end date was advanced in fixed 15-day intervals from 1 October to 1 July of the next year (see Fig. 8). The earliest identifiable date of each winter crop is defined as the date at which the Matthews correlation coefficient (MCC), which is considered more stable than the Kappa coefficient and F1 score, reaches a local maximum [52]. The MCC is given as follows:

$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$

where $TP$, $TN$, $FP$, and $FN$ are the numbers of true positives, true negatives, false positives, and false negatives, respectively.

3) Training and Validation of Classifiers:
The random forest is an algorithm that aggregates multiple decision trees [53]. Owing to its robustness and efficiency, it has been used to identify corn [54], soybean [55], and rice [56] with high accuracy. Compared with traditional classifiers, the random forest can reduce generalization errors by using only a subset of features for each split [36]. Specifically, two parameters were set in the GEE: 1) The minimum size of the terminal nodes was fixed at 10 to prevent overfitting [57]. 2) The number of trees was fixed at 100, given the tradeoff between accuracy and cost [57], because the computational cost increases linearly with the number of trees while the accuracy rises. All other parameters used the GEE default values. Notably, winter wheat and garlic were each identified using an individual binary classifier, that is, a winter wheat versus nonwinter wheat classifier and a garlic versus nongarlic classifier.
For the training and validation of the classifier, we used the ground data samples described in Section II-B-3.The accuracy of the resultant map was assessed by the confusion matrix, and the MCC, OA, user accuracy (UA), and producer accuracy (PA) were calculated.
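The incremental end dates and the accuracy metrics can be sketched together as follows. This is a minimal illustration assuming a 2x2 confusion matrix for each binary classifier; the confusion matrix counts and the MCC series are hypothetical rather than results from this study, and treating the "local maximum" as the first one along the time axis is an interpretation rather than a rule stated in the text.

```python
from datetime import date, timedelta
from math import sqrt


def window_end_dates(start, last, step_days=15):
    """End dates of the incremental time series: start + 15 d, start + 30 d, ..., up to last."""
    ends = []
    end = start + timedelta(days=step_days)
    while end <= last:
        ends.append(end)
        end += timedelta(days=step_days)
    return ends


def binary_metrics(tp, fp, fn, tn):
    """OA, UA, PA, and MCC of the target class from a 2x2 confusion matrix."""
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else 0.0

    mcc_denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "OA": ratio(tp + tn, tp + fp + fn + tn),
        "UA": ratio(tp, tp + fp),  # user accuracy: correct share of mapped target pixels
        "PA": ratio(tp, tp + fn),  # producer accuracy: mapped share of reference target pixels
        "MCC": ratio(tp * tn - fp * fn, mcc_denominator),
    }


def first_local_maximum(scores):
    """Index of the first score above its predecessor and not below its successor."""
    for i in range(1, len(scores) - 1):
        if scores[i] > scores[i - 1] and scores[i] >= scores[i + 1]:
            return i
    # No interior local maximum: fall back to the overall maximum.
    return max(range(len(scores)), key=scores.__getitem__)


ends = window_end_dates(date(2020, 10, 1), date(2021, 7, 1))

# Hypothetical confusion matrix counts and MCC series (not results from the study).
metrics = binary_metrics(tp=90, fp=10, fn=10, tn=90)
mcc_by_window = [0.40, 0.55, 0.70, 0.78, 0.76, 0.77]
earliest_date = ends[first_local_maximum(mcc_by_window)]
```

With a start date of 1 October, 2020, the fourth and eighth end dates fall on 30 November, 2020, and 29 January, 2021, which matches the four- and eight-composite time series reported for garlic and winter wheat.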

A. Accuracy Assessment of Resultant Maps
To explore the potential for early identification of winter wheat and garlic and to determine how early distribution maps could be produced before harvest, we assessed classification performance using time series datasets of different lengths. As the number of images gradually increased, the classification accuracy improved and reached a maximum at a certain time. Specifically, eight images composited over eight periods (10.01-10.16, 10.16-10.31, 10.31-11.15, 11.15-11.30, 11.30-12.15, 12.15-12.30, 12.30-01.14, and 01.14-01.29) can detect winter wheat with an MCC of 0.80 and an OA of 0.91 [see Fig. 9

B. Early-Season Winter Crops Map
A comparison of the accuracies obtained with different numbers of images in Section III-A shows that garlic can be identified as early as the end of November, followed by winter wheat at the end of January (as shown by the red mark in Fig. 9). Therefore, we used images from 1 October, 2020, to 1 December, 2020, to map the distribution of garlic in 2021 and images from 1 October, 2020, to 1 February, 2021, to map the winter wheat distribution in 2021 (see Fig. 10). As shown in Fig. 8, winter wheat was mainly distributed in the northwestern and central parts of the study area, namely eastern Henan, northern Anhui, and southeastern Shandong [see Fig. 10(a)]. Garlic was mainly

A. Integration of Multisource Images
The coarse-resolution images, such as MODIS, have numerous mixed pixels owing to the widespread existence of cropland smaller than 0.2 ha in China, where the smallholder system is predominant [58], [59].With the availability of 30 m Landsat images and 10 m Sentinel images, considerable effort has been devoted to integrating them to map cropland [4], [48], [60].In this study, we integrated Landsat-7, Landsat-8, and Sentinel-2 images to obtain a multisource dataset that is a near-daily single-sensor-like time series [61], thereby allowing crops to be mapped at earlier dates and with higher accuracy [36].

B. Early-Season Winter Crops Map Over Large Spatial Domains
Winter crops are vital grain contributors in China, so information on their planting is relevant for food security. Given the limited timeliness of existing maps and the scarcity of early-season maps covering multiple winter crops, this study explored the potential for early-season mapping of winter wheat and garlic and determined how early distribution maps with satisfactory accuracy could be obtained. Instead of generating multiple crop type maps from a comprehensive classifier as in previous work, we identified winter wheat and garlic separately, that is, using an individual binary classifier for each crop (winter wheat versus nonwinter wheat, and garlic versus nongarlic), and the final winter wheat and garlic map was the overlay of the two binary maps. Garlic could be identified at the germination stage at the end of November, when its NGRDI and LSWI were higher than those of winter wheat and its VH was much higher than that of winter wheat owing to sparse cultivation [see Figs. 7 and 10(d2)]. Winter wheat could be recognized in the subsequent tillering period at the end of January, when its NDVI and NGRDI were higher than those of other crops while its VH was lower (see Fig. 7). This is much earlier than in a previous study, which concluded that winter wheat could be mapped three months before harvest [9].
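The overlay of the two binary maps can be illustrated with a small sketch on nested lists. The class codes, the function name `overlay_binary_maps`, and in particular the `prefer` rule for pixels labeled positive by both classifiers are assumptions; the text does not state how such conflicts were resolved.

```python
BACKGROUND, WINTER_WHEAT, GARLIC = 0, 1, 2


def overlay_binary_maps(wheat_mask, garlic_mask, prefer=GARLIC):
    """Merge two binary rasters (rows of 0/1) into a single class raster.

    prefer is the class assigned where both masks are positive (an assumed rule).
    """
    if len(wheat_mask) != len(garlic_mask):
        raise ValueError("masks must have the same number of rows")
    merged = []
    for wheat_row, garlic_row in zip(wheat_mask, garlic_mask):
        if len(wheat_row) != len(garlic_row):
            raise ValueError("mask rows must have the same length")
        row = []
        for is_wheat, is_garlic in zip(wheat_row, garlic_row):
            if is_wheat and is_garlic:
                row.append(prefer)
            elif is_wheat:
                row.append(WINTER_WHEAT)
            elif is_garlic:
                row.append(GARLIC)
            else:
                row.append(BACKGROUND)
        merged.append(row)
    return merged


# Hypothetical 2x2 binary maps from the two classifiers.
wheat = [[1, 0], [1, 0]]
garlic = [[0, 1], [1, 0]]
crop_map = overlay_binary_maps(wheat, garlic)
```

An alternative to a fixed preference would be to keep the class whose classifier reports the higher probability, if per-pixel probabilities are available.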
The accuracy of the crop distribution map increases as the time series is extended, since, within a certain period, the more features the classifier captures, the better it performs [4], [44]. Yet both winter wheat and garlic exhibited a progressive decrease in nearly all accuracy metrics (MCC, OA, and UA) after reaching the optimal classification accuracy, particularly garlic (see Fig. 9). This can be attributed to the fact that classifier performance tends to drop significantly when the number of features is large and the proportion of genuinely informative features is small [62]. Furthermore, attention needs to be paid to the problem of overfitting the majority class due to sample imbalance [1], although the numbers of training samples per class in this study are relatively balanced.

C. Uncertainty
Several potential factors may affect the accuracy of the resulting map. First, image quality is the critical factor for fine mapping, as cloudy or rainy weather may cause the absence of high-quality observations and the appearance of abnormal observations. Despite our attempts to reduce the impact using image compositing, interpolation, and smoothing (see Fig. 11), there is still some gap between the real values and the processed values. Higher spatial and temporal resolution satellite archives on the GEE platform may provide an approach to address this data quality issue. Second, we used the land cover product of 2020 to delineate the extent of cropland, which may increase the error of the winter crop map in 2021. Third, compared with the pixel-based crop identification in this study, approaches that make full use of crop spatial information and avoid pixel loss (e.g., object-based approaches) are expected to further improve map accuracy [63]. Moreover, newer machine learning algorithms, such as the extreme gradient boosting algorithm (XGBoost), can reduce the degree of model overfitting while increasing the computational speed [64], which could be considered in future studies.

V. CONCLUSION
As population and consumption growth continues, information on crop planting is particularly critical to regional and global food security. More importantly, early-season crop distribution information provides key data to support stakeholders. To the best of the authors' knowledge, existing methods for early-season mapping of winter wheat mostly rely on time series shapes, which limits their application over large regions. In addition, the effect of garlic is not considered. Here, we generated a map of winter wheat and garlic distribution in 2021 using ground data samples, a random forest classifier, and multisource datasets. In particular, we explored the potential for early identification of winter wheat and garlic and determined how early a distribution map could be generated before harvest. The early-season mapping method proposed in this study overcomes the limitations of previous methods by generating large-scale maps as long as ground reference samples are obtained. Based on this approach, garlic could be identified at the end of November with an OA of 0.88, followed by winter wheat at the end of January with an OA of 0.91. This information supports the forecasting of food production and prices and offers a scientific basis for governments and policymakers to identify, in a timely manner, regions that may suffer from severe food crises.

Fig. 1. (a) Location and (b) land cover of the study area.

Fig. 2. Numbers of (a) total and (b) good-quality observations per pixel during the study period.

Fig. 3. (a) Distribution of training samples and validation samples for winter wheat and nonwinter wheat. (b) and (c) Zoom-in views of regions where the training samples are mainly distributed.

Fig. 4. (a) Distribution of training samples and validation samples for garlic and nongarlic. (b)-(d) Major garlic distribution areas.