How Well Do EO-Based Food Security Warning Systems for Food Security Agree? Comparison of NDVI-Based Vegetation Anomaly Maps in West Africa

The GEOGLAM crop monitor for early warning is based on the integration of the crop condition assessments produced by regional systems. Discrepancies between these assessments can occur and are generally attributed to the interpretation of the vegetation and climate data. The premise of this article is that other sources of discrepancy related to the data themselves must also be considered. We conducted a comparative experiment on the vegetation growth anomalies routinely produced by four operational crop monitoring systems in West Africa [FEWS NET, GIEWS, ASAP, VAM] for the 2010–2020 period. We collected a set of normalized difference vegetation index (NDVI)-based indicators (% mean, % median, and Z-score) and proposed original methods to analyze and compare the spatio-temporal variations of these indices using Hovmöller representation, statistics, and spatial analysis. To facilitate systems comparison, a classification scheme based on the percentile rank values of anomaly indicators was applied to produce 3-class alarm maps (negative, absence, and positive anomalies). Results show that, on an annual basis, the per-pixel similarity is relatively low between the four systems [24.5%–34.1%], and that VAM and ASAP are the most similar (70%). The discrepancies between products stem mainly from different preprocessing methods, especially the choice of the reference period used to calculate the anomaly. The negative alarm agreement classes show no eco-climatic zoning influence, but negative alarm hot-spots were locally observed. The negative alarm agreement maps can be a useful tool for early warning as they synthesize the information provided by the different systems, with a confidence level.


I. INTRODUCTION
With 15% of the total population affected by undernourishment, food insecurity issues remain prevalent in West Africa [1]. High population growth, household food and livelihoods based primarily on low agricultural production (due to low use of external inputs and rainfed conditions), and high rainfall variability are among the principal drivers of food insecurity. In addition to these factors, the security and health risks experienced by the region have been exacerbated in recent times by the COVID-19 pandemic [2], making the agricultural production systems particularly fragile and fluctuating. Thus, the conjunctural aspects of agricultural production are combined with the structural aspects of the inherent vulnerability of the populations. Since the major droughts of the early 1970s, several global early warning systems (EWSs) for food security have been developed in the region to enable decision-makers to anticipate crises and to assist in planning emergency measures by targeting populations and/or areas at risk [3]. Since 2016, the Group on Earth Observations Global Agricultural Monitoring initiative (GEOGLAM) [4] has published monthly GEOGLAM crop monitor for early warning (CM4EW) bulletins that reflect an expert consensus among the main EWSs on crop growing status and conditions for the main crops in countries considered most at risk of food insecurity. To reach a consensus, the international organizations in charge of the various EWSs meet monthly to share their analyses of crop conditions, based primarily on EO data and agro-meteorological model outputs, in addition to national reports, field data, and their own expertise, and discuss assessment discrepancies to ultimately reach a final conclusion on crop conditions [5]. The final consensus assessment is based on the CM4EW classification system comparing current crop conditions to the 5-year average. Classifications include exceptional, favorable, watch, poor, and failure.
While there is often agreement in crop conditions assessments between organizations, discrepancies between organizations can occur when there is conflicting information from differing sources [5] (see Fig. 1); in areas where little reliable information is available from the field, priority is given to information that comes from converging remote sensing-based sources.
In these EWSs, satellite information is mainly used to derive vegetation index anomalies from low spatial resolution image time series to serve as proxies of crop health and status. The normalized difference vegetation index (NDVI) is the primary vegetation index for monitoring crop conditions. To this end, the NDVI value of the current compositing period (generally 8-day or 10-day) is compared to the average NDVI value of the same period calculated over the previous years, or to what is assumed to be a normal situation, to provide an NDVI anomaly that can be used to track crop growing conditions throughout the season. These NDVI anomalies are used to draw conclusions on the vegetation status and potential impacts on agricultural yields and production.
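As an illustration of the comparison described above, the following is a minimal sketch of the three anomaly formulations considered later in this article (% of mean, % of median, and z-score). The function name `ndvi_anomalies`, the array layout, and the use of the sample standard deviation are assumptions made for the example; the operational systems each apply their own preprocessing and reference period.

```python
import numpy as np

def ndvi_anomalies(current, history):
    """Return (% of mean, % of median, z-score) for one compositing period.

    current : 2-D array (rows, cols), NDVI of the period being assessed
    history : 3-D array (years, rows, cols), NDVI of the same period in the
              reference years (the reference length is system-specific)
    """
    mean = history.mean(axis=0)
    median = np.median(history, axis=0)
    std = history.std(axis=0, ddof=1)  # sample standard deviation (assumption)
    # Pixels with a zero mean, median, or spread yield inf/NaN rather than an error.
    with np.errstate(divide="ignore", invalid="ignore"):
        pct_mean = 100.0 * current / mean
        pct_median = 100.0 * current / median
        z_score = (current - mean) / std
    return pct_mean, pct_median, z_score

# Toy example: one pixel, three reference years.
history = np.array([0.4, 0.5, 0.6]).reshape(3, 1, 1)
current = np.array([[0.6]])
pct_mean, pct_median, z_score = ndvi_anomalies(current, history)
```

In this toy case the current NDVI is about 120% of both the reference mean and median, and one standard deviation above the mean.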
In their review of the current operational global and regional agricultural monitoring systems, Fritz et al. [6] identified different gaps in data and methods. Because knowing which product to use in an environment where an increasing number of products are available remains a challenge, they recommend a better understanding of the differences between input datasets (precipitation and vegetation indices), in particular where these datasets have discrepancies, and the development of tools for automated comparison. The study presented in this article is in line with this recommendation and proposes, as a preliminary analysis, a comparative experiment on the vegetation growth anomalies produced by the crop monitors of the main EWSs in West Africa for the 2010-2020 period. To this end: 1) we collect a set of NDVI-based vegetation growth anomaly indicators (one per EWS) and develop a spatio-temporal approach to compare the extreme values; 2) we analyze and compare the spatio-temporal variations of these indices through space (with and without a cropland mask) and time (with and without a crop calendar mask), using statistics and spatial analysis tools. The rest of this article is organized as follows. In the following section, we present the background of our study through a short review of the crop monitoring systems in West Africa. In Section III, we present the study area, the datasets used, and the outlines of the methodology adopted. In Section IV, we present the statistical comparison between the systems and the systems agreement maps, which are then discussed in Section V. Finally, Section VI concludes this article.

A. Agricultural Monitoring Systems in West Africa
In a recent study, Nakalembe et al. [7] reviewed the application-ready satellite-based agricultural monitoring systems covering West Africa. Four of these systems are partners of the GEOGLAM CM4EW [5] and are included in this study: the famine early warning systems network (FEWS NET) developed by USAID, the global information and early warning system (GIEWS) of FAO, the Seasonal Monitor of the World Food Program (VAM), and the European anomaly hot spots of agricultural production (ASAP) system of the joint research center (JRC). However, because of their importance for West Africa, it is worth mentioning three other crop monitoring systems in the region: global agricultural monitoring (GLAM) of NASA, the University of Maryland, and the USDA Foreign Agriculture Service [8]; the AGRHYMET system [9] of the permanent interstate committee for drought control in the Sahel (CILSS), which relies mainly on precipitation data and agrometeorological modeling; and CROPWATCH [10] of the RADI (Chinese Institute of Remote Sensing and Digital Earth), which reports crop (maize, rice, wheat, and soybean) conditions for Nigeria. These systems are not included in the study because they either provide incomplete regional coverage, or the remote sensing data they use are shared with the previously mentioned systems.

B. Crop Conditions/Anomaly Indicators
The EWSs crop monitors use different data sources, but they all use optical data (NDVI-based) that provide information on crop development and vigor. The NDVI time series are used to calculate vegetation growth anomaly indicators that are then classified to produce vegetation anomaly maps that are published in regular bulletins and geoportals. These vegetation anomalies, in conjunction with other data sources (meteorological data, crop model simulations, field information, national/regional information), are used to provide a basis for the convergence of evidence of agricultural conditions that comprise the consensus-based assessments under the GEOGLAM CM4EW [5], [11]. In some systems, additional data such as conflicts, market prices, and implementation of policies are used together with the agricultural conditions to alert national and international decision-makers on developing food security concerns and impending food crises. Table I summarizes the characteristics of the main vegetation growth anomaly products used in the crop monitors of the main EWSs of West Africa. In this article, only one NDVI-based anomaly indicator per system is used for analysis and comparison.

Table I. Details of the EO-derived vegetation anomaly products used in the crop monitors of the EWSs in West Africa; only the NDVI-based anomaly indicators in bold are used in this study.

Fig. 2 shows the flowchart of the data processing and analysis. Our work starts with the collection of the datasets of NDVI anomaly indicators of four EWSs, for the West Africa region over the 2010-2020 period. Then, to make values comparable in space and time, the anomaly values of the different products are harmonized over the whole area and period through percentile calculation to produce anomaly maps (nine classes) and alarm maps (three classes). These maps are then analyzed and compared at different spatial (national and regional) and temporal scales (annual and 11-year period), using statistics and spatial analysis tools.

A. Study Area
The study area lies between 4.4°N and 18°N (the North Sahel limit) and between 19°W and 24.5°E, including 17 West African countries (see Fig. 3). West Africa's climate is controlled by the north-south movement of the Intertropical Convergence Zone (ITCZ). As a result, West Africa's precipitation regime is characterized by latitudinal belts of decreasing rainfall and wet season length. In the Guinean region, precipitation is abundant year-round with a bimodal pattern. As latitude increases, the amount of precipitation decreases, as well as the duration of the monomodal wet season. However, this latitudinal pattern is somewhat modified by altitude, with higher mountain elevations (e.g., the Guinean Highlands and the Jos Plateau in central Nigeria) receiving more precipitation. The annual precipitation variability also becomes more significant with latitude, with a coefficient of variation ranging from around 0.3 in the Guinean region to over 1.4 in the Sahel [21]. As crops are mainly rainfed throughout West Africa, the farming systems broadly follow the rainfall latitudinal gradient, with a system dominated by agro-pastoral millet and sorghum crops in the semiarid Sahel, by cereal-root crop mixed in the Soudanian part, by root crops in the Soudano-Guinean part, and by humid low-land tree crop in the Guinean part. According to Dixon [22], ten farming systems among the 16 possible over the continent are present in West Africa, illustrating the high diversity of the agro-environments in the study area.

Fig. 3. Climate [23] and farming system [24] zone maps in West Africa.
Over the last decades, the study area experienced many environmental changes. In the Sahelo-Soudanian zones, since the drought period of the 1980s, an alternation of dry and wet years in the mid-1990s, followed by a rain resumption, is observed (see Fig. 4); these variations seem to be linked to the surface temperature of the North Atlantic Ocean [25]. All over the study area, land-use changes are also observed [26], [27], in particular in the Guinean part characterized by an increasing deforestation rate since the 2010s (see Fig. 4).

1) Anomaly Indicator Datasets:
Among the anomaly indicator datasets available for each system (see Table I), we chose only one anomaly indicator per system to simplify the analysis. Similarly, we restricted the study period to 2010-2020, which we felt was a good compromise between the amount of data to be processed and the inclusion of varied climatic conditions. The indicators were collected for the entire study area at their original spatial and temporal resolution (see Table I).
1) The anomaly indicators for the ASAP (NDVI z-score from the consolidated archive) and VAM (% of mean NDVI) systems were provided, respectively, by JRC and WFP. 2) FEWS NET anomaly indicators (% of median NDVI) were downloaded directly from the FEWS NET website. 3) GIEWS anomaly classes (9 classes built on % of mean NDVI) were downloaded from the GIEWS website. The anomaly values are not available; only the classes can be downloaded.

2) Other Crop Datasets:
To focus the analysis on agricultural production, we constrained the data analysis in space and time, by using a cropland mask and a crop growing season mask, respectively. Among the readily accessible global, continental, and West African cropland masks [7], we used the Global Land Cover-SHARE [28] released in 2014 and composed of land cover datasets produced with satellite data acquired between 2008 and 2012. GLC-SHARE provides a set of 11 major thematic land cover classes among which the cropland cover class was used in this study. The spatial resolution of this dataset is 1 km, and the pixel value indicates the percentage of cropland within the area. We use a threshold of 10% to transform the dataset to a boolean cropland map (i.e., cropland = 1 if pixel > 10%, cropland = 0 else). The crop growing season was calculated from the phenological indices provided by ASAP [17], which used an approach based on thresholds on the green-up and decay phases [29]. The start and end of a season was estimated through the historical average of the smoothed NDVI over the period 2002-2016 [18]. Two growing seasons were considered and maps were provided at a 1 km spatial resolution.
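The 10% cropland threshold described above can be sketched as follows. The function names `cropland_mask` and `keep_cropland` and the toy values are assumptions for illustration only; the sketch takes the cropland percentage layer as an already loaded NumPy array.

```python
import numpy as np

def cropland_mask(cropland_percent, threshold=10.0):
    """Boolean cropland map: 1 where the cropland share exceeds the threshold, else 0."""
    return (np.asarray(cropland_percent) > threshold).astype(np.uint8)

def keep_cropland(anomaly, mask):
    """Set anomaly values outside cropland to NaN so later statistics ignore them."""
    return np.where(mask == 1, anomaly, np.nan)

# Toy example: cropland share (%) of four 1 km pixels and their anomaly values.
share = np.array([[0.0, 10.0, 10.5, 80.0]])
mask = cropland_mask(share)
masked = keep_cropland(np.array([[95.0, 102.0, 88.0, 110.0]]), mask)
```

Note that a pixel with exactly 10% cropland is excluded, since the rule stated in the text is a strict inequality.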

C. Dataset Processing
Data processing consists of the following: 1) anomaly data spatial aggregation at a common scale; 2) harmonization of the anomaly indicators; 3) classification of the harmonized anomalies. Anomaly products are at different spatial resolutions (see Table I). Therefore, all maps used in this study were resampled to a 1 km spatial resolution according to the ASAP product grid. To preserve the original, unaltered anomaly values, the nearest neighbor resampling method was applied. Then, to ease the comparison and analysis of the different NDVI anomaly indicators used in the four crop monitoring systems (% mean, % median, and Z-score), a classification scheme was applied in two steps.
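To illustrate why nearest neighbor resampling preserves the original values, here is a minimal NumPy sketch for the simplest case, in which the coarse and fine grids are aligned and their resolutions differ by an integer factor (e.g., 5 km to 1 km). Both conditions are assumptions of the example, and `upsample_nearest` is a hypothetical helper, not the processing chain used in the study.

```python
import numpy as np

def upsample_nearest(coarse, factor):
    """Nearest neighbor resampling to a grid `factor` times finer (aligned grids assumed)."""
    return np.repeat(np.repeat(coarse, factor, axis=0), factor, axis=1)

# Toy example: a 2 x 2 coarse grid resampled to 4 x 4.
coarse = np.array([[1, 2], [3, 4]])
fine = upsample_nearest(coarse, 2)
```

Every output value is a copy of an input value, so no new anomaly values are created by interpolation. For real rasters with different grids and projections, `rasterio.warp.reproject` with `Resampling.nearest` is one way to obtain the same behavior.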
1) First, to harmonize the indicators, for each system the percentile rank values of anomaly indicators were computed from the entire dataset (all pixels of the study area, and all dates over the 2010-2020 period). 2) Then, 9-class (for qualitative comparison) and 3-class (for quantitative analysis) maps were produced. a) The 9 classes (referred to hereafter as the Anomaly classes) correspond to seven 10-percentile classes between the 15th and 85th percentiles, plus two extreme classes (corresponding to the 15th percentile or less, and to the 85th percentile or more); this scheme is close to the GIEWS nomenclature. b) The 3 classes (referred to hereafter as the Alarm classes) correspond to the two extreme percentile classes plus one median class (15th-85th percentiles); the lower extreme, upper extreme, and median classes are labeled "negative alarms," "positive alarms," and "no alarm," respectively. All the processing was performed using the GDAL and Rasterio libraries with Python 3.9.
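The two-step scheme above can be sketched as follows. This is a minimal illustration, not the code used in the study: the function names, the integer class codes, the `NODATA` value, and the handling of values falling exactly on a class boundary are assumptions.

```python
import numpy as np

EDGES = np.array([15, 25, 35, 45, 55, 65, 75, 85], dtype=float)
NODATA = -128  # hypothetical code for missing pixels

def percentile_rank(values):
    """Percentile rank (0-100) of each element within the whole array (all pixels, all dates)."""
    flat = values.ravel()
    valid = ~np.isnan(flat)
    ordered = np.sort(flat[valid])
    ranks = np.full(flat.shape, np.nan)
    # Share of valid values that are lower than or equal to each value.
    ranks[valid] = 100.0 * np.searchsorted(ordered, flat[valid], side="right") / ordered.size
    return ranks.reshape(values.shape)

def anomaly_classes(pr):
    """Nine classes coded 0 (<= 15th percentile) to 8 (>= 85th percentile)."""
    cls = np.digitize(pr, EDGES, right=True)
    cls[pr >= 85] = 8
    cls[np.isnan(pr)] = NODATA
    return cls

def alarm_classes(pr):
    """Three classes: -1 negative alarm, 0 no alarm, +1 positive alarm."""
    alarm = np.zeros(pr.shape, dtype=np.int16)
    alarm[pr <= 15] = -1
    alarm[pr >= 85] = 1
    alarm[np.isnan(pr)] = NODATA
    return alarm

# Toy dataset: 100 distinct anomaly values, so the percentile rank equals the value.
values = np.arange(1, 101, dtype=float).reshape(10, 10)
pr = percentile_rank(values)
nine = anomaly_classes(pr)
three = alarm_classes(pr)
```

Because the ranks are computed per system over its full archive, indicators with different units (% of mean, % of median, z-score) end up on the same 0-100 scale before classification.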

D. Dataset Comparison and Analysis
1) Spatio-Temporal Representation of the Anomaly Classes:
Because of the marked latitudinal gradient of the vegetation in the West African region, a latitude-time Hovmöller diagram representation [30] was adopted to plot the 9 classes of NDVI anomalies. Furthermore, in order to be more consistent with agricultural production, we also produced spatially and temporally constrained Hovmöller diagrams using the cropland mask and the crop growing season mask.
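A latitude-time Hovmöller matrix can be built by collapsing the longitude dimension of each class map. The text does not state how pixels along a latitude row are summarized, so the sketch below assumes the most frequent class per row; `hovmoller_modal_class` and the `nodata` code are hypothetical names.

```python
import numpy as np

def hovmoller_modal_class(class_stack, n_classes=9, nodata=-1):
    """Collapse a (time, lat, lon) stack of class codes into a (lat, time) matrix.

    Each cell holds the most frequent class along the latitude row for that date;
    rows without any valid pixel (e.g., fully masked) receive `nodata`.
    """
    counts = np.stack([(class_stack == k).sum(axis=2) for k in range(n_classes)])
    modal = counts.argmax(axis=0)           # shape (time, lat)
    modal[counts.sum(axis=0) == 0] = nodata
    return modal.T                          # shape (lat, time)

# Toy stack: 2 dates, 2 latitude rows, 3 longitudes; -1 marks masked pixels.
stack = np.array([[[0, 0, 8], [4, 4, 4]],
                  [[-1, -1, -1], [8, 8, 0]]])
diagram = hovmoller_modal_class(stack)
```

Applying the cropland and growing season masks beforehand (by setting excluded pixels to the `nodata` code) yields the constrained version of the diagram.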
2) Statistical Comparison of the Alarm Maps: For the spatial and temporal comparison of the anomaly products, a similarity metric is used. It corresponds to the proportion of pixels assigned to the same alarm class (i.e., for negative, positive, and absence of alarm) between systems. Similarity metrics were computed for different time steps (year, and 11-year period), and different system sets (pairwise, and 4 × 4). To synchronize the 8-day VAM product with the 10-day products (see Table I), we used the closest date (leading to a maximal shift of 2 days between VAM and the other products). In addition, Spearman's rank correlation was calculated for pairwise systems comparison. Finally, the yearly percentage of the negative and positive alarms was computed, and the 2010-2020 trends of the different products were compared.
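The similarity metric and the closest-date matching described above can be sketched as follows. The function names, the class codes (-1, 0, +1), and the `nodata` value are assumptions for illustration.

```python
import numpy as np

def similarity(alarm_maps, nodata=-128):
    """Percentage of valid pixels assigned to the same alarm class by all systems.

    Returns the overall percentage and its breakdown per class
    (-1 negative alarm, 0 no alarm, +1 positive alarm).
    """
    stack = np.stack(alarm_maps)
    valid = (stack != nodata).all(axis=0)
    same = (stack == stack[0]).all(axis=0) & valid
    n_valid = valid.sum()
    total = 100.0 * same.sum() / n_valid
    per_class = {c: 100.0 * (same & (stack[0] == c)).sum() / n_valid for c in (-1, 0, 1)}
    return total, per_class

def closest_dates(dates_a, dates_b):
    """For each date in dates_a, index of the nearest date in dates_b."""
    a = np.asarray(dates_a, dtype="datetime64[D]")
    b = np.asarray(dates_b, dtype="datetime64[D]")
    return np.abs(a[:, None] - b[None, :]).argmin(axis=1)

# Toy example: two systems, four pixels.
system_a = np.array([[-1, 0, 0, 1]])
system_b = np.array([[-1, 0, 1, 1]])
total, per_class = similarity([system_a, system_b])

# Matching 10-day composites to 8-day composites by nearest date.
matches = closest_dates(["2010-01-01", "2010-01-11"],
                        ["2010-01-01", "2010-01-09", "2010-01-17"])
```

The same `similarity` function covers both the pairwise case (two maps) and the four-system case (four maps), since it only counts pixels on which all supplied maps agree.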
3) Production of Alarm Agreement Maps: Complementing the statistical comparison of the products, we produced agreement maps of alarm classes between the four systems at the regional and national scales. This was conducted in the following three steps. a) We computed annual and 2010-2020 aggregated alarm maps for each system, by first calculating the occurrences of the positive and negative alarm classes over each square kilometer of cropland and over the considered period, and then applying a top 15% classification scheme to the number of occurrences. b) To reduce and filter out the errors related to georeferencing and rescaling the products from different spatial resolutions, a 3 × 3 majority filter was applied to the aggregated maps of the four systems. c) Finally, the filtered maps were merged to produce the agreement maps. The latter were prepared according to a classification scheme [6], [31] in which three levels of alarm class agreement are distinguished: low agreement (pixels where two of the four systems are in agreement; the other two systems may also agree with each other, but no distinction was made); high agreement (pixels where three of the four systems are in agreement); and full agreement (pixels where all four systems are in agreement). This process, summarized in Fig. 5, allows us to enhance the spatial patterns of the level of agreement.
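The three steps can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the top 15% rule is implemented with a quantile cutoff on the occurrence counts, positive alarms override negative ones when a pixel qualifies for both, and ties in the majority filter go to the first label. None of these details are specified in the text.

```python
import numpy as np

LABELS = (-1, 0, 1)  # negative alarm, no alarm, positive alarm

def aggregate_alarm(alarm_stack, top_fraction=0.15):
    """Step a): aggregate a (time, rows, cols) alarm stack into one alarm map."""
    out = np.zeros(alarm_stack.shape[1:], dtype=np.int8)
    for label in (-1, 1):
        occurrences = (alarm_stack == label).sum(axis=0)
        cutoff = np.quantile(occurrences, 1.0 - top_fraction)
        # Flag pixels in the top fraction of occurrence counts (never pixels with none).
        out[(occurrences >= cutoff) & (occurrences > 0)] = label
    return out

def majority_filter_3x3(classes):
    """Step b): replace each pixel by the most frequent label in its 3 x 3 window."""
    padded = np.pad(classes, 1, mode="edge")
    ny, nx = classes.shape
    counts = np.stack([
        sum((padded[dy:dy + ny, dx:dx + nx] == lab).astype(np.int16)
            for dy in range(3) for dx in range(3))
        for lab in LABELS
    ])
    return np.asarray(LABELS)[counts.argmax(axis=0)]

def agreement_level(maps):
    """Step c): number of systems sharing the most common class (2 low, 3 high, 4 full)."""
    stack = np.stack(maps)
    votes = np.stack([(stack == lab).sum(axis=0) for lab in LABELS])
    return votes.max(axis=0)

# Toy example: four systems, three pixels.
maps = [np.array([[-1, 0, 1]]), np.array([[-1, 0, 0]]),
        np.array([[-1, 0, -1]]), np.array([[-1, 1, -1]])]
levels = agreement_level(maps)
```

With four systems and three classes, at least two systems always share a class, which is why the lowest possible level is 2 (low agreement).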

A. General View
The spatial and seasonal variations of the harmonized crop growth anomalies are represented by the time-latitude Hovmöller diagrams in Fig. 6(a) (all data considered) and [...]

Table II. Annual similarity indices (expressed in percentage) of the 3-class alarm maps of the four systems; the similarity indices are computed for the whole study area and all compositing dates ("all pixels" [...]

Table II shows that, on an annual basis, the per-pixel similarity is relatively low, between 19.6% and 34%. This agreement increases when a cropland mask is used (5.7% average gain), and when both a cropland mask and a crop calendar mask are used (3.1% average gain).
A deeper analysis of the four systems similarity (see Table III) indicates that the similarity percentage is mainly due to the "no alarm" class (between 19.60% and 32.33%), while only 0.28%-3.60% of the pixels are similar in terms of negative alarms, and 0.67%-5.02% are similar in terms of positive alarms. These low alarm similarity values must be compared with the mean proportions of positive and negative alarms, which are each around 15% by construction.
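One way to put these percentages in perspective is to compute the agreement that four statistically independent systems would reach by chance with the same class proportions (15%, 70%, and 15%). Independence is an assumption introduced here purely as a reference point; it is not a result of the study.

```python
# Class proportions fixed by the percentile thresholds (15th and 85th).
proportions = {"negative": 0.15, "no alarm": 0.70, "positive": 0.15}
n_systems = 4

# Probability (in %) that all systems fall in the same class if they were independent.
chance = {name: 100.0 * p ** n_systems for name, p in proportions.items()}
chance_total = sum(chance.values())
```

Under this baseline, a four-system agreement on an alarm class would be expected on about 0.05% of the pixels, and on the "no alarm" class on about 24% of the pixels.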
In the following sections, only the cropland pixels and the growing season dates will be considered for analysis and comparison.
2) Pairwise System Comparison: The pairwise comparison conducted over the 2010-2020 period (see Table IV) indicates that the most similar systems in terms of alarm classes are the VAM and ASAP with around 70% similarity, followed by the ASAP-FEWS NET pair (around 61% similarity). The most divergent systems are FEWS NET and GIEWS (around 52% similarity). The conclusions are identical when using a Spearman rank correlation test (see Table V).
3) Temporal Comparison: Fig. 7 indicates the mean annual percentage of negative [see Fig. 7(a)] and positive [see Fig. 7(b)] alarms over the study area. As expected, the values are around 15% (owing to the methodology used to define the alarm classes), but the annual variability appears to be rather high, between 6.4% and 28.1% for the negative alarms, and between 5.1% and 33.3% for the positive alarms. Overall, VAM and ASAP show similar interannual patterns with close values of negative and positive alarm percentages and high interannual variability. FEWS NET and GIEWS, on the other hand, follow similar temporal patterns characterized by low interannual variations. Considering the trends, the Pearson statistical test indicates a significant increase of the negative alarms for FEWS NET (p-value = 0.015), and a significant increase of the positive alarms for ASAP (p-value = 0.038). All other trends are not significant at a 95% confidence level. However, we observe for all systems an increase of the positive alarms for the last 3 years (2018, 2019, 2020) that was already visible on the Hovmöller diagrams (see Fig. 6).

C. Agreement Maps
1) 3-Alarm Classes:
Agreement maps between the alarm products derived from the four systems were produced for each year (see example of the year 2010 in Fig. 8(a); maps for the other years are provided in Appendix A), and for the whole 2010-2020 period [see Fig. 8(b)]. For the whole period, we observe about 28% of low agreements, and 72% of high and full agreements.
We then calculated the distribution of the anomaly classes agreement per country (see Fig. 9). Surprisingly, we observed no clear geographic pattern. High and full agreements are found in both some Sahelian (Senegal and Mali; >75%) and Guinean (Guinea and Gambia; >85%) countries, and, conversely, low and no agreement are found in other Sahelian (Mauritania; 39.7%) and Guinean (Sierra Leone; 46.7%) countries.
2) Negative Alarm Classes: We then focus on the negative alarm class, as it corresponds to areas that would require special attention for potential negative cropping outcomes and food security concerns. Fig. 10(a) shows the example of the 2010 negative anomaly agreement map between the four systems (maps for the other years are provided in Appendix B). The negative class represents a small percentage of the area (by construction, around 15% of the area), but with clear spatial patterns. When integrated over the 2010-2020 period [see Fig. 10(b)], the spatial patterns of the negative alarms appear more clearly, with a score of 60% of high and full agreements between the systems (and 40% of low agreement).
When the 2010-2020 negative alarm agreement classes are calculated per country (see Fig. 11), we observe that Nigeria is the most consistent country, with 67.3% of full and high agreement, while Liberia and Sierra Leone are the least consistent countries with 100% and 96% of low agreement, respectively. It is also worth noticing that Guinea-Bissau shows no negative anomalies.

A. Comparison of the Anomaly Products
1) Unexpected Discrepancies Between the Systems:
Because of the use of the same EO data (except for GIEWS), one would expect large similarities between the NDVI anomaly maps. The different results obtained show that it is not the case, both in space and time. The Hovmöller representation and the statistical analysis are the two tools used to characterize and quantify the discrepancies between systems.
Thanks to the north-south eco-climatic gradient in the region, the Hovmöller diagram has proven to be an interesting tool to illustrate the spatio-temporal variability of the NDVI anomalies in the region. It indicates a contrasted spatio-temporal distribution of the anomaly classes according to the systems, with extreme anomaly classes largely present for FEWS NET (negative classes in the Guinean region) and VAM (both positive and negative classes in the Soudanian region). However, after filtering the pixels and dates that are outside the cropland and the growing season, respectively, the Hovmöller diagrams of the different systems seem more comparable. This effect is confirmed by the statistical comparison of the 3-alarm classes that show a similarity increase between the four products (from 27% to 30% on annual average).
The statistical comparison of pairwise systems also confirms the interpretation of the Hovmöller diagrams, with about 70% of similarity between the alarm classes of ASAP and VAM (Spearman correlation of 0.25), and only 52% of similarity between the alarm classes of GIEWS and FEWS NET (Spearman correlation close to 0).
2) Potential Sources of Discrepancies: Despite the fact that NDVI time-series data forms the core of all systems, we showed that the NDVI anomalies' spatial and temporal patterns show strong discrepancies that need to be understood. The anomaly products processing chains of the different systems can give elements to explain the differences. First, the NDVI data were calculated from reflectance data acquired by different EO satellites, MODIS for three systems, and NOAA-AVHRR and METOP data for GIEWS. This could partly explain the low similarities and correlations obtained between the products of GIEWS and those of other systems. Second, the algorithms used to smooth the NDVI time series differ between the crop monitoring systems. ASAP and VAM use the same algorithm, which could explain why they are quite similar, while the other systems use different preprocessing methods (see [32] for details about the eMODIS products used in FEWS NET).
It is easy to understand the importance of such algorithms, as cloud cover is still an important issue in the region. This point is particularly important for the analysis conducted during the crop growing season (the use of a crop calendar mask decreases the EWSs similarity measure calculated for the cropland; Table II), and in the coastal area of the Guinean countries, where large discrepancies are observed.
Third, to compare the systems, the anomaly indicators were spatially aggregated at a common resolution, which can contribute to noise in the information. However, the similarity of the ASAP (1 km) and VAM (5 km) products indicates that the native spatial resolution of the products is not a major source of discrepancy. Finally, the vegetation anomaly indicator could play an important part in the system discrepancies. What matters is not so much the formula used to calculate the indicator as the period chosen for the reference. Thanks to the harmonization carried out, the formula of the indicator, although different (z-score for ASAP and percentage deviation to the mean for VAM), should not carry much weight, whereas the reference period can have an important impact on the value of the anomaly.
The reference period varies from 12 years for VAM to 30 years for GIEWS. Yet, considering the high demographic growth and the rainfall variations in West Africa, important changes in land use and vegetation conditions have occurred over the last decades [26], [33], in particular an increase of vegetation cover in the Northern part of the area, due to increasing annual rainfall [34], [35], and a decrease of vegetation cover in the Southern part, mainly due to deforestation [27]. These land processes occur at different time periods and places, introducing variations in the NDVI reference used to calculate the anomalies, and consequently in the anomaly values.

B. Agreement Maps, a Decision Support Product?
In addition to the quantification of the similarity between products for the extreme anomaly classes, this study aimed to provide alarm maps at regional and country scales that are qualified by the agreement score (full, high, and low). We produced agreement maps for the three alarm classes combined, and for the negative alarm class alone, which is, a priori, the most important class for food security and early warning. The maps were postprocessed with a 3 × 3 majority filter. This smoothing strategy allows for a more general overview of spatial trends and eases visual interpretation. It also filters out the noise due to the resampling of the products with varying resolutions and errors in georeferencing.
Regarding the alarm maps (3 classes), 28% of low agreement was observed, in contrast to 72% of high and full agreement, for the whole period considered (2010-2020) and for cropland pixels. As previously mentioned, these agreements are higher than the scores of the pixel-per-pixel similarity. This is explained by the classification scheme used to aggregate the 10-day period alarm maps. Indeed, classifying the pixels with the top 15% occurrence of negative and positive anomalies as, respectively, negative and positive alarm classes increases the similarity between the products for a given period. We observed on the maps that low agreements are mainly located in the coastal area of the Guinean countries (Sierra Leone, Ivory Coast, Ghana, Benin, Togo, and Nigeria). The high presence of clouds in this area may bring some differences in the vegetation alarm maps computed from the different products, with different temporal filtering.
For the negative alarm maps, the agreement between systems is lower, with about 40% of low agreement and 60% of cumulated high and full agreement, for the whole period and area. This result was expected as the no-alarm class represents a large percentage of the total similarity between systems. The negative alarm agreement results show no ecoclimatic zoning influence at the country level, but we observed a spatial pattern of negative alarms with hot-spots in the Tillabery (South West Niger), East and Center-East (Burkina Faso), and Alibori (North Benin) regions, and two large areas in Central and North-East Nigeria, for the 2010-2020 period. The annual and decadal negative alarm agreement maps can be a useful tool for early warning, by helping to prioritize the emergency measures. These maps synthesize the information provided by the different crop monitoring systems and provide information on the confidence level associated with the negative anomaly through the agreement class. However, we should keep in mind that negative alarms are not always synonymous with a decrease in agricultural production. Land cover changes inside a pixel, such as deforestation in the southern part of the study area, can result in a decrease of NDVI without necessarily being linked to a decline of crop conditions in the area. Conversely, pixels classified as positive alarms can correspond to a decrease of agricultural production, in particular in the Sahel region, where the abandonment of cultivated land results in an increase of NDVI because natural vegetation has higher NDVI than crops in this region [36]. Land cover or land use changes can thus induce NDVI anomalies that are not linked to crop condition anomalies. Data on land cover dynamics, such as the forest cover change [37] or the cropland change [38] maps, should therefore be included in the assessment of crop conditions.

C. Study Limitations
Despite important methodological and thematic contributions, we are aware that the study has certain limitations related to both the datasets and the comparison method used.
In terms of datasets, the first limitation is that only one type of NDVI-based anomaly products was considered, while other crop conditions indicators exist (see Table I). NDVI shortcomings are well known (atmospheric noise, saturation for high levels of biomass, etc.), but NDVI remains a robust index [39], well correlated to the active vegetation amount, and available on all (optical) satellite platforms from the beginning of EO. The second limitation is that the NDVI-based anomaly products used in the study are different in nature (anomaly classes for GIEWS, and anomaly values for the other systems).
In terms of data processing, the spatial and temporal resampling of the initial products could have introduced some bias. Likewise, the alarm classes are defined using an arbitrary threshold of 15% of the extreme percentiles, and the results could have been different with another threshold value. Another method limitation is the use of a single cropland map for the whole period of analysis (11 years); it is well known that over the last four decades, West Africa has experienced large land use changes fueled by high demographic growth, with an increase of cropland replacing and fragmenting savannas, woodlands, and forests [27]. In the same way, the use of a single growing season calendar for the whole study period is problematic because West Africa is well known for the high variability of its rainfall pattern and, therefore, of crop phenology. Furthermore, this calendar is derived from land surface phenology metrics, and even when considering only cropland pixels (which are in reality mixed cropland-natural vegetation pixels), it is improper to consider it as a crop calendar.
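The sensitivity of the alarm classes to the percentile threshold can be illustrated with a short sketch. It assumes that anomaly values are ranked over the set of pixels being classified and that a pixel is a negative (positive) alarm when its percentile rank falls within the lowest (highest) `threshold` percent. The function names and the tie-handling rule are assumptions made for this example, not the exact procedure of the study.

```python
def percentile_ranks(values):
    """Percentile rank (0-100) of each value within the list.

    Uses the share of values strictly below, plus half of the ties.
    """
    n = len(values)
    ranks = []
    for v in values:
        below = sum(1 for x in values if x < v)
        equal = sum(1 for x in values if x == v)
        ranks.append(100.0 * (below + 0.5 * equal) / n)
    return ranks


def classify_alarms(values, threshold=15.0):
    """Return -1 (negative alarm), 0 (no alarm), or 1 (positive alarm)."""
    classes = []
    for rank in percentile_ranks(values):
        if rank <= threshold:
            classes.append(-1)
        elif rank >= 100.0 - threshold:
            classes.append(1)
        else:
            classes.append(0)
    return classes


# Twenty invented anomaly values; only their ordering matters here.
anomalies = list(range(20))
for thr in (10.0, 15.0, 20.0):
    alarms = classify_alarms(anomalies, threshold=thr)
    print(thr, alarms.count(-1), alarms.count(1))
```

Changing `threshold` directly changes the share of pixels flagged as alarms, which is why a different value could have led to different agreement statistics.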
Finally, the main limitation of the study remains that we only compare the NDVI anomaly products and do not rank them. Ranking the products was not part of the study, but should be a priority for future studies. Research perspectives on this topic are presented in Section VI.

D. Future of EO-Based Crop Monitors
While NDVI anomalies are the focus of this article, the EWSs use other types of remote sensing variables to assess the crop and rangeland conditions, such as thermal infrared data to compute the temperature conditions index, brightness temperature to derive soil moisture, or precipitation data to compute the water requirement satisfaction index (GWSI). Likewise, high spatial resolution satellite time series (Sentinel 1 and 2, and Landsat) are already used in some EWSs to focus on particular areas. For example, ASAP 2 offers the possibility to access and analyze these high resolution data at the field level, but the capacity offered by such data is not yet fully and systematically exploited. The arrival of new data in the field of Earth observation and the enormous progress in data processing pave the way for a new generation of EWS. In the short term, for national and regional early warning products based on vegetation anomalies, improvements are expected to come mainly from the methods rather than from new Earth observation missions, because the new EO systems are too recent to offer sufficiently long reference periods (5 years of data available so far for the Sentinel-2 constellation, for example). Because of NDVI limitations (atmospheric noise, saturation for high levels of biomass, etc.), other spectral indices could be tested, such as EVI, which is thought to have a greater sensitivity to high density canopies and a lower sensitivity to the atmosphere, but whose benefits compared to the NDVI are questioned in the literature (e.g., [40], [39]). Other spectral indices including a short-wave infrared band (e.g., NDWI, NDMI), for estimating vegetation water content, or a green band (e.g., GCVI, GNDVI), for estimating the vegetation nutrient concentration, could also be tested in West Africa, where water and nutrients are limiting factors of crop productivity.
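For reference, the spectral indices mentioned above can be written as simple functions of surface reflectance. The sketch below uses the commonly published formulations (with the standard MODIS coefficients for EVI) and assumes reflectance values scaled between 0 and 1. The function and argument names are illustrative, and the exact bands used depend on the sensor. The SWIR-based NDWI has the same normalized-difference form as NDMI, with a different SWIR band.

```python
def normalized_difference(a, b):
    """Generic normalized difference (a - b) / (a + b)."""
    total = a + b
    return (a - b) / total if total != 0 else float("nan")


def ndvi(nir, red):
    """Normalized difference vegetation index."""
    return normalized_difference(nir, red)


def gndvi(nir, green):
    """Green NDVI: same form as NDVI with the green band."""
    return normalized_difference(nir, green)


def ndmi(nir, swir):
    """Normalized difference moisture index (NIR and SWIR bands)."""
    return normalized_difference(nir, swir)


def gcvi(nir, green):
    """Green chlorophyll vegetation index."""
    return nir / green - 1.0


def evi(nir, red, blue):
    """Enhanced vegetation index with the usual coefficients."""
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)


# Invented reflectances for a vegetated pixel.
bands = {"blue": 0.05, "green": 0.10, "red": 0.10, "nir": 0.50, "swir": 0.25}
print("NDVI", ndvi(bands["nir"], bands["red"]))
print("EVI", evi(bands["nir"], bands["red"], bands["blue"]))
print("NDMI", ndmi(bands["nir"], bands["swir"]))
print("GNDVI", gndvi(bands["nir"], bands["green"]))
print("GCVI", gcvi(bands["nir"], bands["green"]))
```

Any of these indices could replace NDVI in an anomaly computation, provided a sufficiently long reference period is available for the sensor.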
Another short-term improvement of the crop monitors could be the use of agroecological zones (e.g., GAEZ; [41]) to calculate and classify the anomaly percentiles. Zone-specific alarm class maps could be more meaningful to assess potential impacts on agricultural yields.
In the mid term, significant advances are expected from the following.
a) Ancillary products, such as more accurate cropland and crop group masks using Sentinel data, and more accurate detection of the growing season (start and end of season).
b) Data processing, in particular improved image time series preprocessing (NDVI smoothing or gap-filling), real-time processing, and cloud computing.
c) Improved access to data and products (cf. new opportunities offered by initiatives such as the Africa Regional Data Cube/Digital Earth Africa).
d) Improved decision support products, more readable by decision makers (cf. GEOGLAM).
Coarse-resolution crop monitoring systems are essential for decision-makers, and we are not convinced that vegetation anomaly maps produced at a higher spatial resolution would be better for national or regional crop monitoring as, to cite [20], it is very difficult for a government to take actions at a granular level lower than the district or municipality, such as a farm or commune. The future of EO crop monitoring will certainly include data from the next generation of hyperspectral and thermal satellites with higher spatial resolution and revisit time. Finally, all these improvements will benefit from advancements in computer science that will help to combine heterogeneous data, such as EO data, crop model simulations, numeric media data, and crowd-sourced data, in order to support regional and national EWSs.

VI. CONCLUSION
GEOGLAM, the GLAM system of systems, ensures the coordination and information sharing of the regional and national systems [42]. The coordination under the GEOGLAM CM4EW requires comparing and integrating crop condition assessments produced by systems that integrate vegetation and climate anomalies, ground observations, and other information sources, and synthesizing these efforts into a consensus-based assessment that represents the agreement of the EWSs involved. Discrepancies between these assessments can occur for many reasons [5]. One source is certainly the interpretation of these data, which varies with the data source and the expert sensitivity. Our study reveals that, upstream of the crop conditions mapping, the different NDVI anomaly maps produced and used by the different systems are another likely source of discrepancy.
The main contributions of this study are methodological and thematic. In terms of method, to the best of the authors' knowledge, it is the first study to compare the vegetation anomalies component of the CM4EW systems. We developed an original approach to visualize and compare the vegetation anomalies both in time and space. Thematically, this study reviewed the crop monitoring systems implemented for West Africa, in particular the NDVI anomaly products used, and identified potential reasons that could lead to discrepancies in crop condition assessments. These reasons are multiple, but seem to come mainly from the different preprocessing methods used, especially the NDVI smoothing algorithms and the reference period used to calculate the anomaly. NDVI smoothing is particularly sensitive in the Guinean part of West Africa, where the satellite image time series are noisy due to a dense cloud cover, resulting in a high discrepancy between the systems. However, the issue of low confidence in the alarm class is somewhat mitigated by a lower food security risk in this region (the area is less prone to yield reduction caused by a lack of water) compared to the other regions of West Africa. We also showed that the choice of the reference period was particularly important in West Africa, where the environmental and land use changes are strong. We therefore recommend careful study of the reference period selected, and in regions where there have been large changes in land cover or climatic conditions, we recommend using a shorter reference period. Another important output of this study is the production of synthetic decadal and annual alarm agreement maps. These agreement maps can be a useful tool for early warning by helping to prioritize the emergency measures in hot-spot areas displaying negative alarms and a high level of reliability (expressed as the number of concordant systems).
Like any study, this work has certain limitations, but they are not penalizing considering the objective of comparing anomaly products in time and space. However, it is obvious that the next steps will be to better understand these differences and to rank the systems for different applications or different geographic regions. For the first step, we can use the same reference period for the long-term statistics, resample raw data at the same resolution, and compute different anomaly indicators from the same NDVI dataset. For the second step, we will need external data to evaluate the temporal and spatial consistency of the different systems, and thus make recommendations. Ideally, these data would be crop yield statistics, provided they are collected following a sound methodology and not "harmonized" in the aggregation process. Alternatively, we could use the outputs of agrometeorological or crop models (such as the SARRA-O model, for the West African region [43]), or the results of automatic language processing methods applied to media data such as online newspapers [44].
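The first step can be sketched as follows: the three anomaly indicators are computed from one NDVI series with a chosen reference period, so that only the indicator definition or the reference period varies. The definitions used here (% mean and % median as the ratio of the current value to the reference statistic, Z-score with the sample standard deviation) are common formulations taken as assumptions. The function names and NDVI values are hypothetical, and each system may implement these indicators differently.

```python
from statistics import mean, median, stdev


def reference_values(series, start_year, end_year):
    """Select the NDVI values of the reference period (inclusive)."""
    return [v for year, v in sorted(series.items())
            if start_year <= year <= end_year]


def anomaly_indicators(current, reference):
    """Compute % mean, % median, and Z-score against a reference sample."""
    ref_mean = mean(reference)
    ref_median = median(reference)
    ref_std = stdev(reference)  # sample standard deviation (assumption)
    return {
        "pct_mean": 100.0 * current / ref_mean,
        "pct_median": 100.0 * current / ref_median,
        "z_score": (current - ref_mean) / ref_std,
    }


# Invented NDVI values for one pixel and one dekad, by year.
ndvi_series = {2015: 0.40, 2016: 0.50, 2017: 0.60, 2018: 0.50, 2019: 0.50}
current_ndvi = 0.45

long_ref = anomaly_indicators(
    current_ndvi, reference_values(ndvi_series, 2015, 2019))
short_ref = anomaly_indicators(
    current_ndvi, reference_values(ndvi_series, 2017, 2019))
print(long_ref)
print(short_ref)
```

Running the same pixel against a long and a short reference period gives different indicator values, which illustrates why the reference period needs to be harmonized before the systems are compared.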
To conclude, this work is a contribution to the upstream process of the food security data processing chain. Recognizing the importance of reducing sources of uncertainty for monitoring food insecure areas, this work shines a light on important sources of discrepancies between systems that should be considered for effective agricultural lands monitoring.

He is currently a Computer Vision Engineer, after three years of research on field crop phenotyping with the INRAE research institute, Paris, France. He joined CIRAD, Montpellier, France, in 2021 after two years as a postdoc with the University of Tokyo, Tokyo, Japan, the University of Queensland, Brisbane, QLD, Australia, and Arvalis (France). He has experience in the development of algorithms for image segmentation, object detection, and 3-D point cloud analysis. His research now focuses on the use of machine learning/deep learning and remote sensing for the estimation of agricultural variables and crop practices, in particular for food security.
Louise Lemettais received the double master's degree in geography of risk from the University of Grenoble, Grenoble, France, in 2020, and in science in geomatics in environment and planning from the University of Toulouse, Toulouse, France, in 2021.
She is currently a Geomatics Engineer. She has several research experiences with CIRAD, a French scientific organization specialized in development-oriented agricultural research for the tropics and subtropics, Montpellier, France. She is currently a Research Engineer with IRD, Marseille, France, a French scientific organization that addresses international development issues with its partners in the South. Her research work focuses on the development of satellite-based methods to assess vegetation dynamics in relation to climate change and agriculture.
Louise Leroux is currently a Researcher with the AÏDA unit of CIRAD, Montpellier, France, which works on agroecology and sustainable intensification of annual crop production, in terms of quantity and also quality where it is relevant, in a particularly constrained tropical environment. She is a Geographer with a strong background in remote sensing applied to agricultural monitoring. She focuses her research on the use of remote sensing technologies combined with statistical or biophysical modeling to improve the description of cropping systems (where are the crop areas and what kind of crops, what are the agricultural practices, what are the interactions with the surrounding landscape, etc.) and to improve the assessment of the agronomical and environmental performances of smallholder cropping systems. Over the last years, she has worked mainly in West Africa, with a focus on agroforestry systems. She was previously seconded to the Centre de Suivi Ecologique, Dakar, Senegal, and she is currently with the IITA Nairobi, Nairobi, Kenya, where she conducts her research mainly in Ethiopia and Rwanda.