## I. INTRODUCTION

LAND cover is one of the most critical environmental variables, with numerous direct and indirect effects on environmental properties and processes that strongly impact human health and well-being. Forests, for example, dominate the terrestrial carbon cycle and form the world's largest bank of species diversity, making information on forest cover critical to two of the greatest societal concerns of the day: climate change and the conservation of biodiversity. Changes in forest cover are therefore central to scientific studies of carbon cycling and species conservation. There is consequently considerable need for accurate and timely information on land cover classes such as forest, and the only practical means to derive this information over large areas and on a frequently repeatable basis is via satellite remote sensing.

Land cover mapping is one of the most common applications of satellite remote sensing. Considerable research has addressed a wide variety of issues connected with this application, from sensor design through image pre-processing to map production and evaluation. The latter issue is of particular importance, especially if remote sensing is to provide land cover information to support major scientific and policy applications such as the United Nations collaborative initiative on Reducing Emissions from Deforestation and forest Degradation (REDD). Accuracy assessment is typically undertaken on a non-site-specific or, more commonly, a site-specific basis and has evolved greatly over the last four decades [1]. It is now widely accepted that every map should be accompanied by a rigorously derived accuracy statement, otherwise it is no more than an untested hypothesized representation [2], perhaps little more than a pretty picture [3].

Although guidelines exist that specify good practices for accuracy assessment in remote sensing [2], the methods are often not followed [4], [5]. Indeed, because of the many challenges encountered in a mapping programme [6], [7], [8], [9], [10], the best practices are sometimes impractical to implement. The validation of a map, in which its accuracy is assessed, is a distinctly non-trivial task [6], [7]. A key concern is that it is often extremely difficult to acquire a suitable ground reference data set upon which to base an accuracy assessment. Concerns about ground reference data abound, especially in relation to the quantity and quality of the reference data used to evaluate the map [11], [12], [13], [14]. Thus, while the main recommended approach to accuracy assessment [2] requires the collection of a potentially large sample of high-quality ground reference data at sites selected in accordance with a carefully designed probability sampling design, the implementation of such an approach is often extremely difficult. A range of research agendas have been defined to meet the challenges faced in accuracy assessment, with activity directed at issues ranging from the provision of data sets that may aid an accuracy assessment [15] to the evaluation of methods for accuracy assessment without reference data [14], [16]. A further alternative, explored here, is to exploit the potential of reference data provided by volunteers, often amateurs or neogeographers collecting information as part of collaborative projects sometimes hosted on the internet.

The recent rise in citizen science and, in particular, citizen sensing [17], [18] offers an attractive, if complicated, alternative to conventional ground reference data collection to support map validation activities. The potential of volunteered data in accuracy assessment has been explored [19], [20] but many issues remain to be fully addressed. There are many concerns with volunteered data sets, notably their variable, and typically unknown, quality together with a set of ethical and legal concerns associated with their collection and use [21], [22]. Here, the aim is to further illustrate the potential of volunteered data in a way that explicitly recognizes its typically unknown and imperfect nature. Specifically, this article provides a brief summary of research into the validation of the forest cover representation provided by the European Space Agency's Globcover map using two sources of volunteered data. Section II outlines the data used in this research. Section III provides a summary of the methods used and especially latent class modeling. Section IV provides the results of the analyses and a discussion before the main conclusions are drawn in Section V.

## II. DATA

Attention focused on forests in West Africa and their representation in the Globcover map (Fig. 1; the map is available from http://dup.esrin.esa.int/globcover). The Globcover map was selected as it provides a contemporary and relatively fine spatial resolution (300 m) representation of land cover. The map was produced using data from the MERIS sensor acquired over the period December 2004 to June 2006. The map has also been validated and so there is a guide to its accuracy. The latter is critical in that it provides targets for the research to achieve.

Fig. 1. Extract of the Globcover map for the study area. The classes have been adjusted for illustrative purposes.

Although the Globcover map has a relatively high thematic resolution, the focus here was on a single class in order to reduce uncertainties, especially those associated with class definitions. Here, the focus was upon forests in West Africa only. To reduce uncertainties about different forest types, the forest classes that occurred in the region were aggregated into a single class. There are many definitions of what constitutes a forest [23], [24]; here, a site was considered to be forested if it had at least 15% canopy cover. The forest class was formed by aggregating a suite of mosaic, closed and open forest classes contained in the original Globcover representation of the region.

A rigorous validation of the Globcover map informed by published documentation on good practices has been undertaken [25]. This validation involved 16 experts, 3 focusing on Africa, who labeled 3167 points selected from the map. The Globcover map was estimated to have an overall accuracy of $\sim$67.1% rising to $\sim$79.2% if expressed on a class area-weighted basis and using only the ground reference data with a high confidence in the labeling [25]. The classes aggregated to form the forest class of interest in this work may also be aggregated in the confusion matrix used in the estimation of Globcover map accuracy. Following this aggregation of classes, the producer's accuracy for forest and non-forest classes was estimated as 84.04 and 74.89% respectively. These values provided benchmarks for use in the evaluations of the estimates derived using the volunteered data.

Inspection of the confusion matrix used in the formal validation of the Globcover map [25] indicated that the omission and commission errors for the forest class were asymmetric, resulting in the extent of forest being overestimated by a factor of $\sim$1.42. A variety of methods to correct for misclassification bias are available [5], [26], [27]. Here, the ratio of the commission to omission errors was used to rescale downwards the estimated extent in order to adjust for the asymmetry in the observed classification errors. It was, however, apparent that some of the inter-class confusion evident in the matrix derived over the globe involved classes that did not occur in the region of study. Additionally, it is well known that map accuracy can vary spatially [28], with large regional variation observed in global maps [29], suggesting that the use of more local information is often desired. An evaluation of the Globcover map for the continent of Africa suggests that omission and commission errors are still imbalanced across the region [30], but less so than indicated by the formal validation of the entire map; this evaluation suggests that the extent of forest may be overestimated by a factor of $\sim$1.10. Critically, therefore, the extent of forest depicted in Globcover appears to be exaggerated by a factor of between $\sim$1.10 and $\sim$1.42, and these values may be used to rescale or revise downward the estimated extent of forest.

The key focus of this article is on the potential of using volunteered data as a means to validate the map and derive useful information without conventional ground reference data. With the formal validation providing authoritative estimates of map accuracy and forest extent, the volunteered data were used to provide a guide to the accuracy of the forest representation provided by Globcover. This analysis used the data acquired by volunteers at 99 locations across West Africa.

Two sources of volunteered data were used to generate the ground reference data set. First, freely available photographic data on ground conditions at the point of intersection of lines of latitude and longitude provided by the Degrees of Confluence project (www.confluence.org), a web-based collaborative project, were used as a source of spatially extensive information on ground surface cover (Fig. 2). An aim within the project is for each confluence point to be visited by a volunteer and a set of photographs of the site obtained; typically the data set provided for each location contained four photographs viewing north, south, east and west from the confluence point. Additional data are often provided, including evidence of the photographs having been acquired at the correct location as well as text that could also reveal useful information about the site. Here, attention focused solely on the set of photographs acquired at or near the point of confluence (Fig. 3). The photographs for all 99 successfully visited and documented confluence points available at the time of the research (winter 2011) for the Ivory Coast, Ghana, Togo, Benin and Nigeria were used (Fig. 2). While the set of field visits from which the ground photographs were acquired were undertaken over a long period, the potential for forest seasonality effects and the time gap of up to seven years between map production and field visits are concerns. However, given the focus on low latitude forests and with the annual rate of forest area change being <0.5% for Western and Central Africa in the period 2000–2010 [31], the potential for major error is reduced. A second set of data was derived from volunteers at Nottingham University who labelled each confluence point as forest or non-forest based on the set of photographs available for it. Each site was labelled as forest (1) or non-forest (0) independently by a total of four people (A-D).
Thus for 99 systematically distributed points across West Africa, volunteered data on forest cover were available to compare against the labeling depicted in the Globcover map.

Fig. 2. Map of West Africa showing the 1° lines of latitude and longitude.
Fig. 3. Examples of photographs acquired at or near confluence points; the size/shape of some photographs have been adjusted for presentational reasons. The photographs were acquired from the Degrees of Confluence Project, contributed by N. Bieger, A. Kovacs, R. Mautz, H. Resch and D. Wood, and reproduced with their permission.

Aside from the use of land cover class labels generated by volunteers rather than experts, the fundamental nature of the sample used conforms to recommended practice in accuracy assessment. The systematic sample design, for example, is an equal probability design that is easy to implement and one of the recommended best-practice methods for use in relation to mapping large areas [2], [32]. The size of the sample was also anticipated to be suitable for credible accuracy assessment purposes. The required sample size can be calculated from sampling theory and is not a function of the size of the data set but of the desired precision in estimation and selected level of statistical confidence [33]. While the latter are a function of the specific objectives of a study, the sample size used, 99, is large enough to allow estimation with a margin of error <6.0%, assuming that the popular 85% target accuracy can be used as a prior estimate of the accuracy and working at the 90% level of confidence.
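The margin-of-error claim can be checked with the standard normal-approximation formula for a proportion. The short Python sketch below (illustrative only, not part of the original study) uses $n=99$, a prior accuracy estimate of 0.85 and $z=1.645$ for the 90% level of confidence.

```python
import math

def margin_of_error(n, p, z):
    """Half-width of a normal-approximation confidence interval
    for a proportion estimated from a sample of size n."""
    return z * math.sqrt(p * (1.0 - p) / n)

# n = 99 sample points, prior accuracy estimate p = 0.85,
# z = 1.645 for the 90% confidence level
print(f"margin of error: {margin_of_error(99, 0.85, 1.645):.2%}")  # just under 6%
```
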

## III. METHODS

For each of the 99 points of confluence, a set of four labels indicating forest presence (1) or absence (0) was available from the volunteers (A-D). These labels supplemented the set extracted from the Globcover map (E). The labels derived by each volunteer were compared against those from each other volunteer and against those in the map to provide a guide to the degree of agreement between the volunteer-derived reference data and the depiction in the Globcover map. The level of agreement between pairs of labels was expressed as the percentage of cases agreeing in label and also by the kappa coefficient of agreement [34]. The latter was calculated from $$\hat{\kappa}=\frac{p_{o}-p_{c}}{1-p_{c}}\tag{1}$$ where $p_{o}$ is the observed proportion of agreement and $p_{c}$ the proportion of agreement expected by chance. Although the kappa coefficient is unsuitable as a measure of accuracy in remote sensing [35], [36], it is used here primarily as an index of the level of inter-rater agreement and not of accuracy.
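For concreteness, (1) can be computed directly from two binary label sequences. The Python sketch below is illustrative only; it derives $p_c$ from each rater's marginal label proportions.

```python
def kappa(labels1, labels2):
    """Kappa coefficient of agreement, as in (1), for two 0/1 label lists."""
    n = len(labels1)
    # observed proportion of agreement
    p_o = sum(a == b for a, b in zip(labels1, labels2)) / n
    # chance agreement from each rater's marginal proportion of 'forest' labels
    p1, p2 = sum(labels1) / n, sum(labels2) / n
    p_c = p1 * p2 + (1 - p1) * (1 - p2)
    return (p_o - p_c) / (1 - p_c)

# identical labellings give kappa = 1; agreement at the chance level gives 0
print(kappa([1, 0, 1, 0], [1, 0, 1, 0]))  # 1.0
```
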

The key focus of this paper is on using the volunteered data to validate the Globcover map, including the estimation of forest extent (a non-site specific measure of map accuracy). Here, it is recognised explicitly that all five representations are imperfect; some of the volunteers, for example, had minimal relevant experience and hence error might be expected. It was shown in [16] that by casting the problem of estimation from imperfect data in terms of a latent class analysis a set of imperfect variables may be used to derive information on the accuracy of land cover maps. Here, the set of four labels derived from all volunteers were combined with those depicted in the map to yield information on forest cover and the accuracy of its representation. A key focus was on the estimation of forest extent, expressed as a percentage of the region covered by forest, and the accuracy of the map, expressed here by the producer's accuracy of the forest and non-forest classes.

The quality of imperfect data may be estimated by a latent class analysis [37], [38]. In a latent class analysis it is assumed that each of the input variables is an imperfect indicator of the unobserved, and so latent, variable of interest, but that the observed associations among them can be explained by the latent variable [37]. Here, the latent variable is the forest class, F, while the observed or manifest data of the analysis are the sets of class labels derived from the four volunteers and the Globcover map. With the labels from sources A, B, C, D and E represented by $a, b, c, d, e = 0$ or 1, a latent class model for the probability of obtaining a pattern of class labels over the sources may be written as $$\pi_{abcdef}=\pi_{f}^{F}\pi_{abcdef}^{ABCDE\vert F}\tag{2}$$ where, based on the assumption of conditional independence, $$\pi_{abcdef}^{ABCDE\vert F}=\pi_{af}^{A\vert F}\pi_{bf}^{B\vert F}\pi_{cf}^{C\vert F}\pi_{df}^{D\vert F}\pi_{ef}^{E\vert F}\tag{3}$$ is the conditional probability that the pattern of class labels is $(a,b,c,d,e)$ given that the case has forest class status $f$ (1 or 0), and $\pi_{f}^{F}$ is the probability that a case has forest class status $f$ [37], [38]. Assuming that the model fits the data and reflects the class information as planned, the two sets of parameters of the latent class model equate, therefore, to key measures of classification accuracy used in remote sensing [16]. Attention here focuses on $\pi_{1}^{F}$ as an estimate of forest extent and on $\pi_{11}^{E\vert F}$ and $\pi_{00}^{E\vert F}$, which represent respectively the producer's accuracy of the forest and non-forest classes in the Globcover map. The latter measures provide a detailed description of the accuracy of the binary classification and are the typical measures of interest in mapping studies, as well as benefiting from a theoretical independence of class abundances [12].
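The study fitted such models with the LEM software; purely as an illustration, a two-class latent class model of the form (2)-(3) can also be fitted with a short expectation-maximisation (EM) routine. The sketch below assumes binary labels from five sources and anchors latent class 1 to "forest" through its initialisation; it is a minimal sketch, not the original implementation.

```python
def fit_lcm(patterns, n_sources=5, n_iter=200):
    """EM fit of a two-class latent class model under conditional
    independence, as in (2)-(3). `patterns` is a list of 0/1 tuples, one
    label per source; returns (pi, cond) where pi estimates the latent
    forest prevalence and cond[s][f] = P(source s labels forest | class f)."""
    pi = 0.5
    # anchor latent class 1 to 'forest' by initialising it as the class
    # that the sources tend to label 1
    cond = [[0.3, 0.7] for _ in range(n_sources)]
    for _ in range(n_iter):
        # E-step: posterior probability that each case belongs to class 1
        post = []
        for x in patterns:
            like = [1.0, 1.0]
            for f in (0, 1):
                for s, lab in enumerate(x):
                    p = cond[s][f]
                    like[f] *= p if lab else 1.0 - p
            post.append(pi * like[1] / (pi * like[1] + (1.0 - pi) * like[0]))
        # M-step: re-estimate prevalence and conditional probabilities
        pi = sum(post) / len(post)
        for s in range(n_sources):
            for f in (0, 1):
                w = post if f else [1.0 - q for q in post]
                cond[s][f] = sum(wi for wi, x in zip(w, patterns) if x[s]) / sum(w)
    return pi, cond
```

With well-separated sources the routine recovers both the prevalence $\pi_{1}^{F}$ and the per-source conditional probabilities that correspond to producer's accuracies.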
The fit of a latent class model to the observed data is commonly evaluated with the likelihood ratio chi-squared statistic, $L^{2}$; with a model typically viewed as fitting the data if the value of $L^{2}$ is sufficiently small to be attributable to the effect of chance [39].

The basic model defined by (2) and (3) may provide a poor fit to the data if the assumption of conditional independence that underlies it is not satisfied. However, the model in (2) can be adapted to allow for conditional dependence between all or some of the manifest variables. This is useful since it might be expected that the volunteers would all tend to correctly label the cases that were clearly non-forest (e.g. location 5°N 3°W in Fig. 3) and struggle most with the labeling of points that had some tree cover, probably close to the 15% tree cover threshold value used in the definition of forest. A test for conditional dependence was undertaken using a modified version of the log-odds ratio check method using the CONDEP programme (acquired from http://www.john-uebersax.com/stat/condep.html). With this method, the log-odds ratios for the observed and expected data are compared and a $z$ score calculated. The calculated value of $z$ indicates the extent of conditional dependence. Large values of $z$ suggest that conditional dependence occurs, and the values may be interpreted against standard tabulated critical values for specified levels of statistical significance if desired.
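As an illustrative sketch only (the CONDEP programme implements the full procedure), the check can be approximated by comparing the observed log-odds ratio between two raters with the one implied by fitted latent class parameters. The function below assumes a two-class model with a supplied prevalence and per-rater conditional probabilities; a large $|z|$ flags a pairwise association beyond what conditional independence explains.

```python
import math

def condep_z(labels1, labels2, pi, p1, p2):
    """z score comparing the observed log-odds ratio between two raters
    with the one implied by a two-class latent class model (prevalence
    `pi`; p1[f], p2[f] = P(rater says 1 | latent class f)). Simplified
    sketch of the log-odds ratio check, not the exact CONDEP procedure."""
    n = len(labels1)
    obs = [[0.5] * 2 for _ in range(2)]        # 0.5 continuity correction
    for x, y in zip(labels1, labels2):
        obs[x][y] += 1
    # expected cell counts under conditional independence
    exp = [[0.0] * 2 for _ in range(2)]
    for x in (0, 1):
        for y in (0, 1):
            for f, w in ((0, 1.0 - pi), (1, pi)):
                px = p1[f] if x else 1.0 - p1[f]
                py = p2[f] if y else 1.0 - p2[f]
                exp[x][y] += n * w * px * py
    lor = lambda t: math.log((t[1][1] * t[0][0]) / (t[1][0] * t[0][1]))
    se = math.sqrt(sum(1.0 / obs[x][y] for x in (0, 1) for y in (0, 1)))
    return (lor(obs) - lor(exp)) / se
```
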

## IV. RESULTS AND DISCUSSION

Of the 99 locations, 50 were labelled as belonging to forest in the Globcover map. This indicates 50.50% of the region is covered with the class defined here as forest. However, the validations of the map reported in the literature [25], [30] indicate that this is likely to be an overestimate; this may in part be related to the breadth of the class definition used. Re-scaling the estimate of forest extent for the observed overestimation factor derived with the global confusion matrix in [25] suggests that the extent of forest cover is actually 35.36% while using the rescaling factor derived from the accuracy assessment for Africa reported by [30] suggests that the extent of forest is 45.82%.
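The rescaling described above is a simple division by the overestimation factor. Using the rounded factors of ~1.42 and ~1.10 quoted earlier reproduces the adjusted extents approximately; the small differences from the quoted 35.36% and 45.82% arise because the paper worked with unrounded factors.

```python
mapped_extent = 50 / 99          # 50 of the 99 points mapped as forest (~50.50%)

# approximate overestimation factors quoted in the text
factor_global = 1.42             # from the global confusion matrix [25]
factor_africa = 1.10             # from the Africa-wide assessment [30]

print(f"{mapped_extent / factor_global:.2%}")  # ~35.6%, cf. 35.36%
print(f"{mapped_extent / factor_africa:.2%}")  # ~45.9%, cf. 45.82%
```
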

There was only a relatively low degree of agreement between the volunteers in terms of their labeling of the locations from the photographs (Table I). In total, the volunteers agreed unanimously on the labeling of 48 locations (19 forest and 29 non-forest), with disagreement evident for the remaining 51 locations (Fig. 4). Pairwise comparison of the sets of labels generated for the 99 locations showed that the four volunteers varied greatly in terms of labelling. The degree of agreement between pairs of labellers varied between 62.66% and 79.79% or, in terms of the kappa coefficient of agreement, from 0.282 to 0.553 (Table I). These relatively low levels of agreement lead to an initial assessment that highlights some concerns for the use of volunteered data in accuracy assessment. Critically, while interpretation is limited by the absence of a gold-standard reference, the low levels of agreement may suggest that the individual volunteers differ greatly in their perception of forest and so might be taken to suggest that such data have little useful role in map validation, especially as there is no obvious means to select between the volunteers.

TABLE I KAPPA COEFFICIENTS INDICATING DEGREE OF INTER-RATER AGREEMENT
Fig. 4. Summary of the degree of agreement between the volunteers for each point of confluence; gaps occur for sites not visited. Open circle—all agree non-forest; green circle—all agree forest; blue, black and red locations that 1, 2 and 3 volunteers respectively labeled as forest.

The limitations of the volunteered data are also evident when compared against the Globcover map. Again, only low levels of agreement between volunteered and mapped labels were observed, with agreement varying from 58.58% to 77.77%, or from 0.170 to 0.556 in terms of the kappa coefficient of agreement (Table I). These results suggest only relatively low degrees of agreement exist in the labelling provided by the different sources of data on forest cover. The results may suggest that none of the individual volunteers was able to provide data that could be confidently used in accuracy assessment. There are, of course, many potential sources of uncertainty and error in the analysis, including problems with the photographic reference data and their labelling, the time period between the mapping and photography, as well as error in the Globcover map itself. Nonetheless, the low levels of agreement depicted in Table I do not initially suggest a positive role for any of the volunteers in a validation programme.

The estimated extent of forest over the test site derived from the volunteered data ranged from 32.33% to 57.57%. The wide range in the estimates is also apparent when viewed relative to the anticipated actual range of 35.36%–45.82%. These results further highlight some concerns with the use of volunteered data and suggest little confidence in using directly the volunteered data from any one source in map validation.

A variety of approaches might be used to derive an enhanced estimate of forest extent and map accuracy. One simple approach is to average the estimates from the four volunteer labellers; this yielded an estimate of 42.67% forest cover. Similarly, it would be possible to combine the individual classifications in a basic ensemble method. For example, allocating each point to the most frequently assigned label, with a random class allocation for tied cases, yielded an estimate of 40.40% forest cover. Both of these estimates lie within the 35.36%–45.82% range anticipated and show that the set of imperfect labels derived by the multiple volunteers can be used to derive a credible estimate of forest extent. Here, however, attention is focused especially on the potential of the latent class modelling approach.
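The ensemble approach described above amounts to a majority vote with random tie-breaking; it can be sketched as follows (illustrative code, not the original implementation).

```python
import random

def majority_vote(labels, rng=None):
    """Fuse one point's 0/1 labels from several volunteers: take the most
    frequent label, breaking ties at random."""
    rng = rng or random.Random()
    ones = sum(labels)
    zeros = len(labels) - ones
    if ones == zeros:
        return rng.randint(0, 1)
    return 1 if ones > zeros else 0

def ensemble_extent(points, rng=None):
    """Estimate forest extent as the fraction of points whose fused label is forest."""
    fused = [majority_vote(p, rng) for p in points]
    return sum(fused) / len(fused)
```
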

Using the set of four class labels for each point derived by the volunteers (A-D) and the map label (E), a latent class model based on (2) and (3) provided a good fit to the observed data; $L^{2}=22.73$ $(p=0.20)$. However, as the volunteers might be expected to find some of the labelling tasks equally challenging, with some locations obviously forest (or non-forest) while others with some evidence of tree cover are more problematic (see Fig. 3), it may be expected that the conditional independence assumption does not hold. The levels of inter-rater agreement observed (Table I) also suggest that some degree of dependence may exist between some of the sources. A test of the assumption of conditional independence indicated that for some of the variables a degree of conditional dependence existed (Table II). Based on Table II, the latent class model was adjusted such that $$\pi_{abcdef}^{ABCDE\vert F}=\pi_{abdf}^{ABD\vert F}\pi_{cf}^{C\vert F}\pi_{ef}^{E\vert F}\tag{4}$$ In (4) the three variables indicated as being conditionally dependent (A, B and D) are those with the largest values of the $z$ score in Table II. The model defined by (2) with (4) was applied to the data. This model appeared to fit the data slightly more closely than the original model based on the assumption of conditional independence; $L^{2}=14.20$ $(p=0.28)$. The parameters of this model were used to derive estimates of the accuracy of the Globcover map from the non-site and site specific perspectives that are widely used in the validation of maps derived by remote sensing.

TABLE II RESULTS OF TEST OF CONDITIONAL INDEPENDENCE ASSUMPTION

From a non-site specific perspective, attention is focused on the estimate of forest extent and so on the magnitude of $\pi_{f}^{F}$ in the fitted model. The model based on (2) and (4) gave an estimate of 44.44% forest cover. Rigorous evaluation of this estimate is difficult in the absence of true gold-standard reference data, but it is very close to the value derived from the Globcover map, especially after allowance for the known misclassification errors in the map. Indeed, the model-based estimate lay between the two values derived, after rescaling for misclassification errors, from conventional accuracy assessments: 35.36% and 45.82%. The latent class model, therefore, appears able to provide a credible estimate of the extent of forest cover in the region.

From a site specific perspective, the conditional probabilities of the latent class model provided estimates of the producer's accuracy for the forest and non-forest classes. The estimates for the forest and non-forest class were 81.22% and 74.55% respectively. Both are very close to the estimates derived from the confusion matrix produced in its formal validation [25] which were 84.04% and 74.89% for forest and non-forest respectively.

The results indicate that imperfect data arising from volunteered sources have the potential to provide useful information on map properties and accuracy. Specifically, the extent of forest (a non-site specific measure of accuracy) was estimated to within 1.38% and 9.08% of the rescaled estimates derived from the validation analyses provided by [30] and [25] respectively. The producer's accuracies of the forest and non-forest classes were also estimated to within 2.82% and 0.34% of the values derived from the formal validation. It should also be noted that the volunteered data have other advantages. For example, the systematic sample design used ensures that data are derived over a widespread area and could be used to aid map production (e.g. by provision of prior information on class abundance), and the nature of the design is suitable for many map accuracy and comparison objectives [32]. The latent class model parameters could also be used to indicate the quality of the labels derived from the different volunteers.

There are, however, many issues to explore with the use of latent class analysis in remote sensing applications. Further work should seek to explore issues connected with the number of volunteers and their level of expertise, as well as concerns linked to incomplete and uncertain labeling, especially for multi-class classifications. These are important issues in relation to the work reported here, as only a small number of volunteers was used and the data contain uncertainties (e.g. due to the time gap between map production and field visits, and common problems such as those associated with the definition of the forest class [40]). The potential to steer volunteer activity, forming active citizen sensors, should also be explored. It should also be stressed that while credible estimates have been derived, the method should be used with caution and is not yet viewed as an alternative to standard good practice methods. Many of these issues are the topic of a recently launched European Union funded COST Action on 'Mapping and the Citizen Sensor' (TD1202; readers interested in participating or following its activities are encouraged to see www.cost.eu/domains_actions/ict/Actions/TD1202 for further details).

## V. CONCLUSIONS

Volunteers have considerable potential to contribute constructively to land cover mapping programmes. This potential has grown rapidly over recent years, fostered by advances in geoinformation technologies, and now offers great opportunities to enhance studies of the Earth's environment if carefully exploited. Although the quality of volunteered data is often a concern, it was shown that realistic estimates of forest cover and of map accuracy could be derived from easy to acquire and inexpensive, if not free, volunteered information.

This paper has confirmed the potential value of internet-based collaborative activities such as the Degrees of Confluence project for the provision of useful, spatially extensive data to aid map evaluation. Moreover, it has shown that additional data from multiple interpreters can provide class labels that can be used for validation purposes. Although these multiple classifications may be of unknown and uncertain quality, they may be used together, via a latent class analysis of the sets of imperfect data, to derive credible estimates of map accuracy. Although there were only relatively low levels of inter-rater agreement and each volunteer's labels showed poor agreement with the Globcover map, the estimate of forest cover derived from the latent class analysis was close to that depicted in the map, and the accuracy estimates were close to those derived from authoritative methods.

### ACKNOWLEDGMENT

We gratefully acknowledge the help and support provided by the volunteers that contributed to this work, which extends that reported in a paper presented at IGARSS 2012 in Munich, and the referees for their comments on the article. In particular, we are grateful to the volunteers: contributors to the Degrees of Confluence project and students at Nottingham University. Permission to use the photographs reproduced in Fig. 3 was gratefully received from N. Bieger, A. Kovacs, R. Mautz, H. Resch and D. Wood. Thanks are also due to the creators and distributors of the software used: LEM for the latent class analyses and CONDEP for the test of conditional dependence. We are also grateful to ESA and the ESA GlobCover Project, led by MEDIAS-France, for GlobCover products. The sample locations are defined in the paper and the data sets used are available from cited sources. The preparation of this article benefited in part from support via an EPSRC grant (reference EP/J0020230/1) and the development and launch of COST Action TD1202 (Mapping and the citizen sensor), which focuses on aspects of volunteered geographic information for mapping applications.

## Footnotes

The authors are with the School of Geography, University of Nottingham, Nottingham, NG7 2RD, U.K. (e-mail: giles.foody@nottingham.ac.uk; doreen.boyd@nottingham.ac.uk).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
