A Risk-Based Approach for Enhancing the Fitness of Use of VGI

Volunteered Geographic Information (VGI) offers an alternative or a supplement to authoritative mechanisms of geospatial data acquisition. It allows people without professional geospatial skills or knowledge to participate in geospatial data collection. VGI has been boosted by recent advances in geospatial technology and applications, and VGI applications have shown great potential in various areas such as disaster management and public health. However, VGI suffers from a lack of quality assurance, because VGI contributors may lack knowledge of the geospatial domain or may lack credibility. Moreover, VGI data may have different levels of detail and precision, and may have been collected for different purposes. VGI data that is appropriate for a specific application may be less appropriate for another. End-users may use VGI data without being aware of its appropriateness to their requirements. This creates a risk that end-users use VGI data inappropriately in their applications, or are uncertain about using it, and this risk may undermine the VGI project in which they are involved. This paper proposes an approach that aims to enhance VGI quality assurance by measuring the spatio-semantic similarity between user requirements and the provided VGI, and to evaluate the fitness-for-use of VGI. The proposed approach is based on an algorithm that helps VGI end-users deal with the risks related to the quality of VGI data. The approach helps end-users make appropriate decisions about VGI data (e.g., whether or not to consider VGI data, or to use it with care).


I. INTRODUCTION
In the last decade, with the rapid development of geographic information systems (GIS) and remote sensing (RS) and the use of the Internet for GIS applications, there has been a significant increase in the availability and openness of geospatial data. This availability has further expanded with the emergence of volunteered geographic information (VGI). VGI refers to the use of the web to access, assemble, and disseminate geospatial data provided voluntarily by individuals [7], [15], [20]. VGI aims to allow people without professional geospatial skills or knowledge to participate in collecting geospatial data (e.g., OpenStreetMap). VGI has been widely adopted in many application areas such as disaster management, reporting the spread of avian influenza, and traffic management [2].
The associate editor coordinating the review of this manuscript and approving it for publication was Amjad Mehmood.
However, the quality of VGI data may be insufficient for a particular use. In fact, VGI contributors may provide geospatial data without necessarily having knowledge of the geospatial domain [4]. In addition, contributors may lack credibility (e.g., by engaging in vandalism or sending false information). Moreover, VGI data may be produced by many contributors who have used different technologies and tools. As such, VGI data may have different levels of detail and precision, and may have been collected for different purposes [43].
Consequently, VGI data may be characterized by uncertainty and insufficient appropriateness for a particular use [27], [39], [42]. Such uncertainty may cause a risk of misusing data and may undermine the VGI project in which end-users are involved [2].
To effectively respond to the uncertainty related to geospatial data in the context of VGI, end-users should be aware of the quality of VGI data with regard to their application (i.e., intended use). Thus, evaluating and enhancing the fitness-for-use of VGI has become increasingly important. The aim of this paper is to take a step forward in making users aware of how appropriate VGI data are for their intended use, and to enhance the fitness-for-use of VGI. We propose an approach to help end-users deal with the risk of misusing VGI data. To this end, we first propose a spatio-semantic similarity measure between the user requirements and the provided VGI. Then, we evaluate a set of indicators to assess the fitness-for-use of VGI data. This evaluation helps end-users make appropriate decisions about geospatial data in the context of VGI. For example, if the quality of VGI data is very low, it may be better not to use it.
VOLUME 10, 2022. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
In the next section, we review works that proposed approaches and algorithms to evaluate and enhance the quality of VGI data. In Section III, we discuss the risk of misusing geospatial data in the context of VGI, and we present our proposed approach to enhance the fitness-for-use of VGI data. Our approach consists of a spatio-semantic similarity measure between user requirements and the provided VGI, an assessment of a set of indicators of VGI quality, and an algorithm that aims to help end-users make appropriate decisions with regard to VGI quality. In Section IV, we present a prototype that we developed to implement our approach. In Section V, we present experiments and results. Section VI presents a comparison with related state-of-the-art methods. Finally, we conclude the study and present future work.

II. RELATED WORKS
The quality of VGI data can be described by guidelines and indicators [5], [43]. Quality guidelines have been identified mainly by standardization organizations (e.g., the International Organization for Standardization (ISO)). Some researchers have based their work on such guidelines to reach a certain level of compatibility between VGI data and reality. This is done mainly by comparing the VGI data to authoritative data as a reference. Authoritative data normally follow standardization guidelines, and may be available from commercial or governmental sources [11], [16], [24], [29].
Regarding quality indicators, suggested methods aim to describe aspects of VGI data quality to enhance its reliability. The main standards and standardization bodies (e.g., ISO 19157, ICA, FGDC, and CEN) have defined sets of quality indicators. The indicators they have in common are accuracy (thematic, positional, and temporal), consistency, and completeness [6], [22]. Completeness describes the extent to which the elements of the reference dataset are present in the geospatial dataset; incompleteness can be attributed to a lack of data. Consistency is the degree of adherence to the rules of data structure, attribution, and relationships [28]. Accuracy is the closeness of the collected data to the truth [31].
Some researchers have based their work on the above-mentioned indicators to evaluate the quality of data produced by the VGI process [17], [43]. To evaluate completeness, researchers compared VGI with authoritative data. These studies are based on comparing lengths in a grid map [18], [24], [46], comparing the total area [19], [30], or comparing VGI data with the ground truth [33]. Consistency is measured by comparing dataset objects with other objects of the same theme (intra-theme consistency) or with objects of other themes (inter-theme consistency) [5], [19]. Ali and Schmid [4] focused on detecting inconsistencies in VGI related to incorrect data classifications. Other studies have focused on assessing topological relationships in VGI, based either on mathematical techniques to determine topological consistency [12] or on the similarity between spatial objects in multi-representation [26].
With regard to accuracy, some studies used trained volunteers to evaluate and enhance the closeness of VGI data to the ground truth [36]. Other studies have assessed the accuracy of VGI with regard to authoritative data. Al-Bakri and Fairbairn [3] assessed spatial accuracy by comparing VGI and authoritative data, and found that the errors were quite high in the context of VGI. The authors attributed these errors to the fact that data contributors usually use low-precision instruments such as personal GPS devices to collect data.
In addition to the most common indicators (i.e., accuracy, consistency, and completeness), there are more abstract indicators to assess the quality of VGI, such as trustworthiness, credibility, and vagueness [43].
Trustworthiness can be expressed in terms of how long the contributor has been registered with the VGI project [44]. With regard to VGI credibility, some studies have proposed crowd-intelligence-based approaches to evaluate and correct errors. Other works have proposed the intervention of gatekeepers (i.e., the social approach) [21], [25]. Regarding vagueness evaluation in the context of VGI, De Longueville et al. [13] proposed an approach that consists of automatically capturing the scale at which VGI data is produced. VGI produced at lower scales is classified as vague.
Previous works have mainly compared VGI data with authoritative data as a reference. However, authoritative data has limitations, including availability and accuracy. Authoritative data may be unavailable or missing, in which case it is not possible to assess the quality of VGI in this way. Even when authoritative data is available, access to it is often restricted or relatively costly [5], [27].
In addition, while the aforementioned studies assessed some indicators to evaluate and enhance VGI quality, they did not focus on the context of a particular intended use. In fact, VGI data originally collected for a given purpose could be used for another purpose. Although it could be considered appropriate for the original purpose, it may be considered less appropriate for another purpose.
Also, existing studies have proposed solutions in a nonsystematic manner. That is, they have not been based on predefined or ordered phases. As a result, end-users have to put extensive time and effort into identifying the risks related to quality of VGI data. Even with such intensive efforts, they may fail to identify the severity of the risks of misusing VGI data. Consequently, existing approaches for evaluating VGI data quality remain vulnerable to the risk of data misuse. Accordingly, and inspired by the methods of risk management in the field of project management [37], we defined an approach to deal with the risks of data misuse in VGI. The approach is based on evaluating a set of indicators of VGI quality and a risk management method that aims to facilitate decision-making regarding the risks of VGI misuse.
Our proposed approach to risk management consists of evaluating the risks related to VGI quality, and reacting to the risks by making appropriate decisions related to the use of VGI data.

III. ENHANCING THE FITNESS OF USE OF VGI
A. RISK RELATED TO VGI QUALITY
End-users typically do not have a proper idea of VGI data quality. Consequently, they may make incorrect assumptions regarding the use of VGI data. Such assumptions have the potential to expose end-users to the risk of data misuse. This risk is characterized by the probability that end-users use VGI data inappropriately, or are uncertain about using it, in their applications (or when integrating VGI data with their existing data), and by the damage that such inappropriate use or uncertainty can cause.
We illustrate the risk of data misuse with the following example: a VGI contributor provides a georeferenced image taken with a mobile phone (see Figure 1). The contributor sent the image with no detailed information about the geospatial features shown in it (i.e., a lack of metadata). Without more complete, clear, or detailed information, the quality of the VGI may be insufficient and may lead users to false assumptions, for example about what the black polygon within the road represents: a pothole or a bulge? Such uncertainty may cause a risk of inappropriate data usage. Our method for identifying and responding to the risks of VGI misuse is based on evaluating fitness-for-use, that is, quality information about a dataset's suitability for a particular application or its conformance to a set of requirements.
Whereas good fitness-for-use of VGI indicates a lower risk of data misuse, insufficient fitness-for-use indicates a higher risk. Consequently, evaluating the fitness-for-use of VGI makes it possible to identify and evaluate the risk of data misuse.
Moreover, evaluating the fitness-for-use of VGI facilitates the response to the risks of data misuse. Indeed, based on the fitness-for-use evaluation, end-users can be advised:
a. Not to use a dataset that has low fitness-for-use.
b. To use a dataset that has good fitness-for-use.
c. To carefully consider a dataset that has satisfactory fitness-for-use.
In our approach, we propose a set of indicators to evaluate the fitness-for-use of VGI data with regard to user requirements. To this end, we first propose a spatio-semantic similarity measure between user requirements and the provided VGI data.

B. SIMILARITY MEASURE BETWEEN USER REQUIREMENTS AND PROVIDED VGI
The idea is to identify the similarity between the provided VGI information and the explicit user requirements. The development of a similarity measure is therefore a fundamental step in VGI quality assessment. Two aspects of similarity are considered: (1) semantic similarity and (2) spatial similarity. In the following sections, we propose and detail a semantic similarity measure as well as a spatial similarity measure between user requirements and the provided VGI description. The spatial similarity measure integrates a topological distance, a metric distance, and an orientation distance between user requirements and the provided VGI description.

1) SEMANTIC SIMILARITY MEASURE BETWEEN USER REQUIREMENTS AND PROVIDED VGI
The semantic similarity measure between user requirements and the provided VGI is detailed in the following subsections. We first formalize some basic concepts.

a: USER REQUIREMENTS REFERENCES
We suppose that the user submits a query describing the required VGI. Let Q be the set of data elements representing the user requirements regarding VGI.
We define the set of references corresponding to the user requirements as $R_Q = \{R_1^q, R_2^q, \ldots, R_n^q\}$, where each $R_i^q$ ($1 \le i \le n$; $i$ and $n$ are positive integers) is a keyword (concept) introduced in the user requirements.

b: PROVIDED VGI REFERENCES
We assume that the user can present a description of the provided VGI. This description is decomposed into a set of keywords describing the provided VGI. Let M be the provided VGI description. We define the set of references corresponding to the provided VGI description as $R_M = \{R_1^m, R_2^m, \ldots, R_m^m\}$, where each $R_j^m$ ($1 \le j \le m$) is a keyword (concept) introduced in the provided VGI description.

c: SEMANTIC SIMILARITY MEASURE BETWEEN USER REQUIREMENTS AND PROVIDED VGI
The similarity measures proposed in literature are based on the knowledge representation model presented in ontologies and semantic networks [41]. The concepts in our proposal are represented by references. To compute the semantic distance between references and concepts, we refer to an edge-counting method based on the Rada distance [40] using an application ontology representing the different concepts of the provided VGI. The Rada distance computes the minimum number of edges that separate concepts in the ontology. We opted for the Rada distance for semantic similarity evaluation because it is simple, accurate, and efficient [40], [41].
Let $A = (S_{ij})$, $1 \le i \le n$, $1 \le j \le m$, denote the matrix of the semantic distances between the references $R_i^q$ of the user requirements and the references $R_j^m$ of the VGI description, where $S_{ij}$ is the distance between the reference $R_i^q$ of the user requirements (Q) and the reference $R_j^m$ of the provided VGI description (M), computed with the Rada distance on the knowledge representation model offered by the application ontology.
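Because the Rada distance is a shortest-path edge count, the matrix A can be sketched with a breadth-first search over the ontology graph. The following is a minimal Python illustration, assuming a small hypothetical application ontology (the concepts `urban_feature`, `road_obstacle`, and so on are illustrative, not taken from the paper) whose is-a links are treated as undirected edges.

```python
from collections import deque

# Hypothetical is-a links of a tiny application ontology (illustrative only).
EDGES = [
    ("urban_feature", "road"),
    ("urban_feature", "building"),
    ("urban_feature", "road_obstacle"),
    ("road_obstacle", "pothole"),
    ("road_obstacle", "bulge"),
]


def build_graph(edges):
    """Undirected adjacency sets: edge counting ignores link direction."""
    graph = {}
    for parent, child in edges:
        graph.setdefault(parent, set()).add(child)
        graph.setdefault(child, set()).add(parent)
    return graph


def rada_distance(graph, c1, c2):
    """Minimum number of edges separating two concepts (breadth-first search)."""
    if c1 == c2:
        return 0
    seen = {c1}
    queue = deque([(c1, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == c2:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return float("inf")  # the concepts are not connected in the ontology


def semantic_distance_matrix(graph, req_refs, vgi_refs):
    """A = (S_ij): Rada distance between reference R_i^q and reference R_j^m."""
    return [[rada_distance(graph, q, m) for m in vgi_refs] for q in req_refs]


graph = build_graph(EDGES)
A = semantic_distance_matrix(graph, ["pothole", "road"], ["bulge", "road", "building"])
print(A)  # [[2, 3, 3], [3, 0, 2]]
```

Breadth-first search suffices here because every edge counts as one step; weighted ontology links would call for Dijkstra's algorithm instead.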
The semantic distance between the user requirements (Q) and the VGI description (M), denoted $D_{sem}(Q, M)$, is obtained as follows: The semantic similarity measure is derived from the semantic distance as follows:

2) SPATIAL SIMILARITY MEASURE BETWEEN USER REQUIREMENTS AND PROVIDED VGI
In the literature, spatial similarity assessment is mainly based on the topology, direction, and distance between spatial elements [8], [32]. Thus, to compute the similarity between user requirements and the provided VGI, we define a topological distance, metric distance, and directional distance between VGI user requirements and the provided VGI description.
In the following subsections, we propose and formalize three different distances between user requirements and the provided VGI (an orientation distance, a topological distance, and a metric distance), as well as a spatial distance that integrates the three.

a: ORIENTATION DISTANCE BETWEEN USER REQUIREMENTS AND PROVIDED VGI
To assess the orientation distance between user requirements and the provided VGI, we propose computing the distance between the orientations of the spatial objects presented in each of them. To this end, we use the graph of spatial directions and the transformation costs defined in the TDD model [1], [32]. According to this model, nine orientations are defined (north, northwest, west, southwest, south, southeast, east, northeast, and equality), and the cost of moving from one direction to a neighboring direction is equal to 2. Figure 2 shows the conceptual neighborhood and the cost of transformation from one direction to another, as defined in the TDD model. Let $B = (b_{ij})$, $1 \le i \le n$, $1 \le j \le m$, denote the matrix used to measure the orientation distance between the spatial objects in the user requirements and the spatial objects in the provided VGI, where $b_{ij}$ is the direction distance between object $i$ in Q and object $j$ in M: $b_{ij} = MCT(ob_i^1, ob_j^2)$ is the minimum cost of transforming the orientation of object $i$ in Q into the orientation of object $j$ in M using a conceptual neighborhood network.
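MCT is a shortest-path cost in the conceptual neighborhood of directions. Figure 2 is not available in this text, so the sketch below assumes a hypothetical graph: the eight compass directions form a ring, "equality" (`EQ`) is adjacent to every compass direction, and every neighboring step costs 2 as stated above. Dijkstra's algorithm then yields each $b_{ij}$.

```python
import heapq

COMPASS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]
STEP_COST = 2  # cost of moving to a neighbouring direction (TDD model)


def build_direction_graph():
    """Assumed neighbourhood: a ring of eight compass directions plus an
    'EQ' (equality) node adjacent to every compass direction."""
    graph = {d: {} for d in COMPASS + ["EQ"]}
    for k, d in enumerate(COMPASS):
        nxt = COMPASS[(k + 1) % len(COMPASS)]
        graph[d][nxt] = graph[nxt][d] = STEP_COST
        graph[d]["EQ"] = graph["EQ"][d] = STEP_COST
    return graph


def min_cost_transformation(graph, src, dst):
    """MCT: cheapest path between two directions (Dijkstra's algorithm)."""
    best = {src: 0}
    heap = [(0, src)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == dst:
            return cost
        if cost > best[node]:
            continue  # stale heap entry
        for nxt, w in graph[node].items():
            if cost + w < best.get(nxt, float("inf")):
                best[nxt] = cost + w
                heapq.heappush(heap, (cost + w, nxt))
    return float("inf")


def orientation_matrix(graph, dirs_q, dirs_m):
    """B = (b_ij): MCT from the orientation of object i in Q to that of object j in M."""
    return [[min_cost_transformation(graph, a, b) for b in dirs_m] for a in dirs_q]


g = build_direction_graph()
print(orientation_matrix(g, ["N", "E"], ["NE", "S"]))  # [[2, 4], [2, 4]]
```

Under this assumed graph, opposite directions (e.g., N and S) cost 4 via the equality node rather than 8 around the ring; a different Figure 2 layout would change these values.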
The distance in terms of direction between the user requirements (Q) and the provided VGI (M), denoted $D_{dir}(Q, M)$, is computed as follows:

b: TOPOLOGICAL DISTANCE BETWEEN USER REQUIREMENTS AND PROVIDED VGI
Computing the topological distance between user requirements and provided VGI description includes evaluating the topological distance between pairs of spatial objects in each of them.
To evaluate the topological distance between spatial objects, we proposed in [1] to use the conceptual neighborhood graph proposed by Li and Fonseca [32] in the TDD model for spatial similarity assessment. The graph is shown in Figure 3. In the TDD model, the authors propose to decompose the conceptual neighborhood graph into three groups of topological relationships. The transformation cost between two neighboring relationships in the same group (intra-group) is equal to 2, whereas the cost between two neighboring relationships belonging to different groups is equal to 3, except for the transition between the two nodes meet and overlap, whose transformation cost is equal to 1. Let $T = (t_{ij})$, $1 \le i \le n$, $1 \le j \le m$, denote the matrix used to measure the topological distance between the spatial objects of the user requirements (Q) and the spatial objects of the provided VGI description (M), where $t_{ij}$ is the topological distance between object $i$ in Q and object $j$ in M using the conceptual neighborhood graph.
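The cost rules above (meet-overlap = 1, intra-group = 2, inter-group = 3) can be turned into a weighted graph and searched with Dijkstra's algorithm to obtain each $t_{ij}$. Figure 3 is not available in this text, so both the neighborhood edges (in the style of Egenhofer's conceptual neighborhood of eight relations) and the three groups below are hypothetical placeholders; only the cost rules come from the text.

```python
import heapq

# Assumed neighbourhood edges between eight topological relations.
NEIGHBOURS = [
    ("disjoint", "meet"), ("meet", "overlap"),
    ("overlap", "coveredBy"), ("overlap", "covers"),
    ("coveredBy", "inside"), ("covers", "contains"),
    ("coveredBy", "equal"), ("covers", "equal"),
]
# Hypothetical grouping; the actual three groups come from Figure 3.
GROUP = {
    "disjoint": 1, "meet": 1,
    "overlap": 2,
    "coveredBy": 3, "inside": 3, "covers": 3, "contains": 3, "equal": 3,
}


def edge_cost(a, b):
    """Costs stated in the text: meet-overlap = 1, intra-group = 2, inter-group = 3."""
    if {a, b} == {"meet", "overlap"}:
        return 1
    return 2 if GROUP[a] == GROUP[b] else 3


def build_graph():
    graph = {rel: {} for rel in GROUP}
    for a, b in NEIGHBOURS:
        graph[a][b] = graph[b][a] = edge_cost(a, b)
    return graph


def topo_distance(graph, src, dst):
    """t_ij: cheapest transformation between two relations (Dijkstra)."""
    best = {src: 0}
    heap = [(0, src)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == dst:
            return cost
        if cost > best[node]:
            continue  # stale heap entry
        for nxt, w in graph[node].items():
            if cost + w < best.get(nxt, float("inf")):
                best[nxt] = cost + w
                heapq.heappush(heap, (cost + w, nxt))
    return float("inf")


g = build_graph()
print(topo_distance(g, "disjoint", "overlap"))  # 3
print(topo_distance(g, "inside", "contains"))   # 8
```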
The topological distance between Q and M, denoted $D_{topo}(Q, M)$, is computed as follows:

c: METRIC DISTANCE BETWEEN USER REQUIREMENTS AND PROVIDED VGI
To compute the metric distance between user requirements and the VGI description, we propose to compute the metric distance between the spatial objects presented in each of them.
To this end, we use the traditional model [1], [32], composed of four possible distance situations (equal, near, medium, and far). In this model, the cost of a transition from one situation to another is equal to one. Let $MT = (mt_{ij})$, $1 \le i \le n$, $1 \le j \le m$, denote the matrix used to measure the metric distance between the spatial objects of the user requirements (Q) and the spatial objects of the provided VGI description (M), where $mt_{ij}$ is the metric distance between object $i$ in Q and object $j$ in M using the metric distance model.
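Each $mt_{ij}$ can be computed directly from the positions of the two situations in the model. The sketch below assumes that the four situations form an ordered chain (equal, near, medium, far) with a unit cost per transition between neighbors, so the distance is the difference in position; if any transition costs one regardless of order, the function would instead return 0 or 1.

```python
SITUATIONS = ["equal", "near", "medium", "far"]  # the four situations, in order


def metric_distance(s1, s2):
    """Assumes a chain equal-near-medium-far with a cost of 1 per transition."""
    return abs(SITUATIONS.index(s1) - SITUATIONS.index(s2))


def metric_matrix(sits_q, sits_m):
    """Matrix of mt_ij: metric distance between object i in Q and object j in M."""
    return [[metric_distance(a, b) for b in sits_m] for a in sits_q]


print(metric_matrix(["near", "far"], ["equal", "near"]))  # [[1, 0], [3, 2]]
```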
The metric distance between Q and M, denoted $D_{met}(Q, M)$, is computed as follows:

d: SPATIAL DISTANCE BETWEEN USER REQUIREMENTS AND PROVIDED VGI
The spatial distance between spatial objects integrates topological, orientation, and metric distances [1], [8], [32]. Thus, the spatial distance between the user requirements and the provided VGI description is defined as follows. Definition: Given user requirements (Q) and the provided VGI description (M), the spatial distance between Q and M is computed as follows:

e: SPATIO-SEMANTIC SIMILARITY MEASURE BETWEEN USER REQUIREMENTS AND PROVIDED VGI DESCRIPTION
The principal objective of Section B is to propose a similarity measure between user requirements and provided VGI data in order to evaluate the fitness-for-use of VGI. The proposed similarity measure integrates the semantic similarity as well as the spatial similarity between user requirements and the provided VGI description. Spatial similarity considers topological similarity, metric similarity, and orientation similarity between the spatial objects in the user requirements and the provided VGI.
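The aggregation equations for $D_{sem}$, $D_{dir}$, $D_{topo}$, $D_{met}$, the spatial distance, and the spatio-semantic distance are not given in this text. Purely as a hypothetical illustration of how the pieces could fit together, the sketch below reduces each distance matrix to a scalar by a best-match average (each requirement element is matched with its closest provided element), averages the three spatial distances with equal weights, mixes the spatial and semantic distances with an assumed weight `alpha`, and converts a distance into a similarity with 1 / (1 + D). None of these choices is claimed to be the authors' formula.

```python
def best_match_mean(matrix):
    """Hypothetical aggregation: mean over rows of the smallest distance in each row."""
    return sum(min(row) for row in matrix) / len(matrix)


def spatial_distance(d_topo, d_dir, d_met, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Hypothetical weighted combination of the three spatial distances."""
    w_topo, w_dir, w_met = weights
    return w_topo * d_topo + w_dir * d_dir + w_met * d_met


def spatio_semantic_distance(d_sem, d_spatial, alpha=0.5):
    """Hypothetical mix: alpha weights the semantic part, 1 - alpha the spatial part."""
    return alpha * d_sem + (1 - alpha) * d_spatial


def similarity(distance):
    """Maps a non-negative distance to (0, 1]; identical descriptions give 1."""
    return 1 / (1 + distance)


d_sem = best_match_mean([[2, 3, 3], [3, 0, 2]])  # 1.0
d_spa = spatial_distance(d_topo=3, d_dir=2, d_met=1)
print(similarity(spatio_semantic_distance(d_sem, d_spa)))
```

In practice, the individual distances live on different scales (edge counts, transformation costs), so some normalization would likely be needed before combining them.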
The spatio-semantic distance between user requirements and provided VGI data reflects the degree of spatial similarity and semantic relatedness between them [1]. Therefore, the spatio-semantic distance between user requirements and VGI data is derived from the spatial and semantic distances as follows. Definition: Given user requirements (Q) and a VGI description (M), the spatio-semantic distance between Q and M is computed as follows:

C. INDICATORS FOR VGI QUALITY ASSESSMENT
We propose a restricted set of indicators and a quantitative approach for evaluating the fitness-for-use of VGI data. These indicators are VGI completeness and credibility. They are not intended to be complete or precise, but rather to indicate whether potential risks may occur. A small number of quality indicators provides synthetic key information about data fitness-for-use [14]: it gives end-users a global picture, after which they may or may not dig into the details [14]. Typical decision-making processes use a small number of indicators [14].
Completeness is a widely used data quality indicator [29]. Measuring the completeness of VGI data is key for VGI to fit a particular usage. Credibility is an important issue in VGI as contributors, whether willfully or not, may provide false or misleading data. This may undermine the project in which VGI data is used and may cause a further lack of trust from end-users about the quality and usability of VGI [35].
For each indicator, the quality is assessed within the interval [0, 1]. A value of 1 indicates perfect fitness-for-use, whereas a value of 0 indicates very poor fitness-for-use.

1) VGI COMPLETENESS
This indicator reflects the number of VGI data elements provided with regard to the required elements. An element, whether data or metadata, may be a feature, an attribute, or a relationship between elements. We represent a data element in the form {element nature, element type, element value}, where element nature indicates whether the element is related to data or metadata, and element type indicates whether it is a feature, an attribute, or a relationship. The element value is the representation of the element. For example, the value of the feature pothole is a collection of pixels within the image. An example of a metadata element related to the data element pothole is the device used to take a picture of the pothole (e.g., a mobile phone).
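The {element nature, element type, element value} triple maps naturally onto a small record type. A minimal Python sketch follows; the class and field names are illustrative (not from the FitVGI code), the pixel coordinates are made up, and modelling the capture device as an attribute-type metadata element is an assumption the text does not specify.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any


class ElementNature(Enum):
    DATA = "data"
    METADATA = "metadata"


class ElementType(Enum):
    FEATURE = "feature"
    ATTRIBUTE = "attribute"
    RELATIONSHIP = "relationship"


@dataclass
class DataElement:
    """{element nature, element type, element value} as described in the text."""
    nature: ElementNature
    element_type: ElementType
    value: Any  # e.g. pixel coordinates of a feature, or a metadata string


# The pothole feature: its value is a (made-up) collection of pixel positions.
pothole = DataElement(ElementNature.DATA, ElementType.FEATURE, [(120, 340), (121, 340)])
# Metadata linked to the pothole: the capture device (modelled here as an attribute).
device = DataElement(ElementNature.METADATA, ElementType.ATTRIBUTE, "mobile phone")
```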
We distinguish thematic, spatial, and temporal completeness. Completeness Cd is calculated as follows: where Nme (Nse, Nte) is the number of thematic (spatial, temporal) data elements provided by a contributor, Nmr (Nsr, Ntr) is the number of thematic (spatial, temporal) data elements required for a particular use, and wm (ws, wt) is a predefined weight for thematic (spatial, temporal) data elements. This weight indicates the importance of each type of element for a particular use. If the number of available data elements is equal to or greater than the required number of elements, the VGI data is complete, and the corresponding value is set to 1. Otherwise, the ratio of existing elements to required elements gives the degree of VGI completeness. The numbers of required elements (Nmr, Nsr, and Ntr) as well as the weights (wm, ws, and wt) can be predefined by VGI process analysts together with end-users. In the example presented in Section III.A (Figure 1), besides the available data elements (geospatial features), the users of the emergency application need another element specifying the type of the feature shown in the image (e.g., is it a bulge, or a poorly repaired hole in the road?). Thus, if we suppose that the weight of spatial completeness is 1, then Cd = 3/4.
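Since the completeness equation itself is not given in this text, the sketch below is one reading consistent with the description, stated as an assumption: each element type contributes its weight times the ratio of provided to required elements, capped at 1, and the weights are assumed to sum to 1. With a spatial weight of 1 and three of four required elements, it gives 3/4, matching the example. Treating a type with no required elements as complete is also an assumption.

```python
def capped_ratio(provided, required):
    """Degree of completeness for one element type, capped at 1."""
    if required == 0:
        return 1.0  # assumption: nothing required means nothing is missing
    return min(1.0, provided / required)


def completeness(provided, required, weights):
    """Cd as a weighted sum over 'thematic', 'spatial', and 'temporal' elements.

    provided, required: counts per type (Nme/Nse/Nte and Nmr/Nsr/Ntr).
    weights: wm, ws, wt, assumed here to sum to 1.
    """
    return sum(w * capped_ratio(provided.get(k, 0), required.get(k, 0))
               for k, w in weights.items())


weights = {"thematic": 0.0, "spatial": 1.0, "temporal": 0.0}
print(completeness({"spatial": 3}, {"spatial": 4}, weights))  # 0.75
```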

2) VGI CREDIBILITY
This indicator describes the degree of confidence that we have in the provided data with regard to a particular usage. Negative feedback on the VGI data (a report or a modification) lessens its credibility: the greater the number of reports and modifications, the lower the VGI credibility. We evaluate VGI credibility using the following function:
$$T_d = 1 - \frac{N_{report} + N_{modify}}{N_f}$$
where $N_{report}$ ($N_{modify}$) is the number of reports (modifications) related to the VGI data, and $N_f$ is the total feedback related to the same VGI data.
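Reading the credibility function as one minus the share of negative feedback (reports plus modifications) in the total feedback, which is consistent with the worked value Td = 1 - (2/3) = 1/3 in Section IV, a minimal Python sketch can use exact fractions. Returning 1 when there is no feedback, and rejecting counts where negative feedback exceeds the total, are assumptions not stated in the text.

```python
from fractions import Fraction


def credibility(n_report, n_modify, n_feedback):
    """Td = 1 - (N_report + N_modify) / N_f, using exact fractions."""
    if n_feedback == 0:
        return Fraction(1)  # assumption: no feedback, so no negative evidence
    if n_report + n_modify > n_feedback:
        raise ValueError("reports and modifications cannot exceed total feedback")
    return 1 - Fraction(n_report + n_modify, n_feedback)


print(credibility(n_report=0, n_modify=2, n_feedback=3))  # 1/3
```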
The overall quality Qvgi is calculated as follows: where Qi is the ith quality indicator of the VGI data, Wi is the predefined weight of the ith indicator, Nq is the total number of indicators, and Wiz is a binary value equal to zero if the weight Wi is null; in this case, the indicator is subtracted from the total number of VGI indicators.
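The overall-quality equation is likewise not given in this text. One hypothetical reading of the definitions (weighted indicators, with zero-weight indicators removed from the count through Wiz) is a weighted sum divided by the number of indicators whose weight is non-zero. The sketch below implements that reading only as an assumption.

```python
def overall_quality(qualities, weights):
    """Hypothetical Qvgi: sum(Wi * Qi) / sum(Wiz), where Wiz = 0 if Wi == 0, else 1."""
    if len(qualities) != len(weights):
        raise ValueError("one weight per indicator is required")
    active = sum(1 for w in weights if w != 0)  # Nq minus the zero-weight indicators
    if active == 0:
        raise ValueError("at least one indicator needs a non-zero weight")
    return sum(w * q for w, q in zip(weights, qualities)) / active


# Completeness 2/3 and credibility 1/3 with equal unit weights.
print(overall_quality([2 / 3, 1 / 3], [1, 1]))
```

With unit weights this reduces to the plain mean of the indicators; setting a weight to 0 drops that indicator from both the numerator and the count.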
In the next section, we propose an algorithm to help users deal with the risk of misusing VGI in a systematic manner.

D. ALGORITHM TO MANAGE THE RISK RELATED TO VGI QUALITY
This section proposes an algorithm to help end-users make appropriate decisions about the risk related to data usability in the context of VGI. The algorithm is based on the well-established options for responding to risks (i.e., reducing, absorbing, or transferring risks). Figure 5 illustrates the proposed algorithm. Based on the previously identified indicators of the fitness-for-use of VGI data, we start by identifying and assessing the risks of data misuse (e.g., incomplete VGI data or a lack of credibility). Based on this risk assessment, end-users are advised to respond to the risks of data misuse in one of several ways:
- Suspending the VGI process if it presents a high risk of harmful consequences. In this case, end-users are invited not to use the VGI data.
- Carefully using VGI data if it has satisfactory fitness-for-use.
- Encouraging the use of VGI if VGI data has good fitness-for-use.
The proposed indicators play a key role in the proposed algorithm. They make it possible to identify the risks of data misinterpretation and to draw conclusions about them. Specifically, these indicators have three principal aims. First, to help end-users identify risks related to VGI data quality: in the proposed algorithm, good VGI data quality indicates a lower risk related to data usability, whereas poor quality indicates a higher risk.
Second, to help end-users make appropriate decisions (e.g., whether or not to consider VGI data).
To help end-users intuitively understand the degree of risk related to VGI data, we provide a set of warnings based on the standard danger symbols proposed by ANSI [47]. These symbols are enriched with well-known mapping markers to assist end-users of VGI data. The symbols show the degree of risk of VGI data misuse and thereby prompt appropriate responses to that risk. For example, if the warning is 'Danger', it would be better to suspend the use of the VGI data, since using it may lead to considerable risk. Table 1 shows an example of how warnings can be predefined with respect to the quantitative values of the fitness-for-use of VGI data.
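The decision step of the algorithm (suspend, use carefully, or encourage use) combined with the warning levels can be sketched as a simple threshold lookup. The thresholds below are placeholders rather than the values of Table 1; they reuse the quality bands of Section V.B. DANGER (red) and WARNING (orange) follow the ANSI Z535 signal-word colors, while the green "OK" level is an assumption.

```python
def risk_warning(q, high=0.75, low=0.5):
    """Map a fitness-for-use value in [0, 1] to a hypothetical warning and advice.

    Thresholds are placeholders, not the values of Table 1. DANGER/red and
    WARNING/orange follow ANSI Z535 colours; the green 'OK' level is assumed.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("fitness-for-use must lie in [0, 1]")
    if q > high:
        return "OK", "green", "use the VGI data"
    if q > low:
        return "WARNING", "orange", "use the VGI data carefully"
    return "DANGER", "red", "suspend the use of the VGI data"


for value in (0.9, 0.6, 0.3):
    print(value, risk_warning(value))
```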
In the following section, we present a prototype called FitVGI, which we developed to instantiate our approach. Based on this prototype, we continue with the example introduced in Section III.A (see Figure 1) to show how the proposed approach can be used to address the risks related to VGI data usage. We then apply our FitVGI prototype to a case study of a road maintenance project.

IV. THE FitVGI PROTOTYPE
To implement our approach for managing the risk related to VGI data quality, we developed a prototype called FitVGI. The FitVGI prototype was implemented in a Python environment, using packages such as GeoPandas, Plotly, NumPy, SciPy, and Scikit-learn.
The prototype implements three main functionalities: data collection from contributors, risk-based evaluation of VGI fitness-for-use, and visualization of fitness-for-use indicators. To better illustrate our approach, we continue with the example application presented in Figure 1. A contributor takes a georeferenced image with a mobile phone and sends it to the FitVGI reception interface. Through this interface, the contributor can add textual metadata (e.g., adding a title or uploading a text file), as shown in Figure 6. The VGI data has two features: a road and a pothole (or a bulge on the road). In addition to these two elements, end-users need to know whether there are buildings close to the pothole. Consequently, two of the three required elements are available, and Cd = 2/3.
In addition, this data was sent recently, and two other contributors have proposed modifications (adding more information about this data, i.e., metadata). Moreover, this data was sent three times through the prototype. Thus, with two modifications and no reports out of three feedback items, Td = 1 - (2/3) = 1/3.
Based on Table 1, the orange symbol is shown to end-users, who should therefore be careful if they decide to use the VGI data (see Figure 7).

V. EXPERIMENTS AND RESULTS
To show the utility of our approach, we applied the FitVGI prototype to a case study of a road maintenance project in the city of Tunis (Tunisia). This ongoing project, managed by the Urban Planning Agency, aims to keep the road structure as normal and practicable as possible. The project uses VGI data to locate and manage potholes and other road obstacles.

A. METHODOLOGY FOR APPROACH EVALUATION
We used a human evaluation technique to evaluate our approach. Human experts in road maintenance are likely to be well placed to evaluate its outcome: they are qualified to judge whether the assessment of VGI data quality provided by our system is useful for giving good recommendations to end-users according to their needs and preferences.
Our evaluation technique consists of computing the correlation between (1) the quality assessment provided by our FitVGI prototype and (2) the quality assessment provided by road maintenance experts. To calculate this correlation, we adopted the widely used Spearman rank correlation method [45]. This method measures the strength and direction of a monotonic relationship between two ranked variables. Many works have used Spearman's rank correlation coefficient to explore and evaluate quality indicators [10], [23].
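Spearman's coefficient is Pearson's correlation computed on ranks, with tied values given their average rank. A dependency-free Python sketch is shown below; with SciPy (listed among the prototype's packages), `scipy.stats.spearmanr` should give the same value. The score lists are made-up illustrations, not the study's data.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences of length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


fitvgi = [0.92, 0.81, 0.66, 0.58, 0.31]   # illustrative prototype scores
experts = [0.85, 0.90, 0.60, 0.55, 0.40]  # illustrative expert scores
print(round(spearman(fitvgi, experts), 3))  # 0.9
```

Only the ordering of the scores matters: swapping the top two expert scores is what pulls rho below 1 in this toy example.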

B. DATA PROCESSING
We used a set of georeferenced images taken by mobile phones and received by the FitVGI prototype. The images, provided by VGI contributors, show potholes, damaged roads, and other road obstacles located in different areas of Tunis. Figure 8 shows some of the received images.
Using our FitVGI prototype, we evaluated the quality of each of these images. The images used in the evaluation process were selected following the methodology of Miller and Charles [34] for building the sample in a correlation evaluation. The evaluation process uses 30 images: 10 images with a high level of quality (0.75 < Q ≤ 1), 10 images with a medium level of quality (0.5 < Q ≤ 0.75), and 10 images with a lower level of quality (Q < 0.5), where quality is assessed by our FitVGI prototype.
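The 10/10/10 selection amounts to stratified sampling over the three FitVGI quality bands. The sketch below assumes random selection within each band (the text does not say how images were chosen within a band) and places a score of exactly 0.5 in the low band, since the stated bands leave that value unassigned. The scores are synthetic.

```python
import random


def quality_band(q):
    """Bands used for sampling; a score of exactly 0.5 is put in 'low' (assumption)."""
    if q > 0.75:
        return "high"
    if q > 0.5:
        return "medium"
    return "low"


def stratified_sample(scores, per_band=10, seed=0):
    """Pick up to per_band image ids from each quality band.

    scores: mapping image id -> FitVGI quality. Random selection within a
    band is an assumption; the text does not say how images were chosen.
    """
    bands = {"high": [], "medium": [], "low": []}
    for image_id, q in sorted(scores.items()):
        bands[quality_band(q)].append(image_id)
    rng = random.Random(seed)
    return {band: rng.sample(ids, min(per_band, len(ids))) for band, ids in bands.items()}


scores = {f"img{k:02d}": k / 40 for k in range(40)}  # synthetic scores 0.0 .. 0.975
sample = stratified_sample(scores)
print({band: len(ids) for band, ids in sample.items()})  # {'high': 9, 'medium': 10, 'low': 10}
```

A band with fewer than ten candidate images (here, "high") simply contributes all of them, which signals that more images would be needed to balance the sample.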

C. EVALUATION PROCESS AND RESULT ANALYSIS
We asked 15 road maintenance experts to visit the roads where the VGI images were taken and then to assign each image a quality value, between 0 and 1, that reflects its appropriateness to the requirements of the road maintenance project (i.e., how appropriate the image is for use in the road maintenance project).
After collecting the quality values, we calculated Spearman's coefficient to measure the correlation between the quality values produced by the FitVGI prototype and those provided by the experts. We obtained a Spearman's correlation coefficient of 0.773 (see Figure 9).
Figure 9. Correlation between the quality assessment of FitVGI and the quality assessment provided by road maintenance experts.
A correlation coefficient of 0.773 indicates a strong positive correlation between the quality assessment provided by our FitVGI prototype and that provided by road maintenance experts. This result supports the ability of our approach to produce useful quality evaluations of VGI data. In addition, based on this evaluation, our approach provides useful recommendations about using, or not using, VGI data.
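As a rough indication of how far 0.773 lies from zero, the standard t-approximation for Spearman's coefficient can be applied. This assumes, which the text does not state explicitly, that the coefficient was computed over the n = 30 sampled images:

```latex
t = \rho_s \sqrt{\frac{n-2}{1-\rho_s^{2}}}
  = 0.773 \sqrt{\frac{28}{1 - 0.773^{2}}}
  = 0.773 \sqrt{\frac{28}{0.402}}
  \approx 0.773 \times 8.34
  \approx 6.45
```

With n - 2 = 28 degrees of freedom, this is well above the two-sided critical value at the 0.001 level (about 3.67), so under that assumption the correlation is very unlikely to have arisen by chance.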

VI. COMPARISON WITH RELATED STATE-OF-THE-ART METHODS
Previous works that proposed approaches and algorithms to enhance the quality of VGI may be categorized into three main groups: works proposing quality guidelines, works evaluating quality of VGI based on reference data, and works assessing VGI quality indicators.
The first group of works proposes guidelines to enhance VGI quality (e.g., [28]). However, guidelines are often general and do not necessarily target a specific usage of the data. In contrast, our approach assesses the quality of VGI data for a particular usage (i.e., fitness-for-use).
Approaches in the second group rely on reference data (basically authoritative data), which may lead to limitations related to the availability and quality of that data; even when reference data is available, access to it is often restricted or expensive. In addition, these approaches do not take into account the intended use of VGI data: even if VGI data is consistent with the reference data, it may be of poor quality for a particular usage. In contrast, our approach does not rely on any authoritative data; it refers to the particular requirements of each end-user.
The third group includes algorithms that evaluate VGI quality indicators (e.g., [4], [9], [12], [27], [38], and [44]). These works focus on the internal quality of VGI data rather than on the context of a particular usage, which creates a risk of inappropriate usage of VGI data: data that is appropriate for one usage may be less appropriate for another.
In contrast, our approach takes into account the intended usage of VGI data and evaluates its quality for each particular usage. In addition, based on risk management, our approach provides a recommendation for each usage of VGI data: end-users can be advised to use VGI data of good quality, not to use VGI data of poor quality, or to use VGI data of only moderate quality with caution. Table 2 summarizes the comparison of our work with the three groups of state-of-the-art works.
Table 2. A comparison of our approach with state-of-the-art works.
VOLUME 10, 2022
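The three-way recommendation can be illustrated with a small decision function. This section does not give the thresholds that FitVGI's risk-based algorithm uses, so the Python sketch below hypothetically reuses the high/medium/low quality bands from the image selection in the evaluation (0.75 and 0.5); the names `recommend` and `Recommendation` are likewise illustrative.

```python
from enum import Enum

# Hypothetical thresholds, borrowed from the evaluation's sampling bands;
# the actual thresholds of FitVGI's risk-based algorithm are not given here.
USE_THRESHOLD = 0.75
CAUTION_THRESHOLD = 0.5


class Recommendation(Enum):
    USE = "use the VGI data"
    USE_WITH_CAUTION = "use the VGI data with caution"
    DO_NOT_USE = "do not use the VGI data"


def recommend(fitness: float) -> Recommendation:
    """Turn a fitness-for-use score in [0, 1] into a three-way recommendation."""
    if not 0.0 <= fitness <= 1.0:
        raise ValueError("fitness-for-use score must lie in [0, 1]")
    if fitness > USE_THRESHOLD:
        return Recommendation.USE
    if fitness > CAUTION_THRESHOLD:
        return Recommendation.USE_WITH_CAUTION
    return Recommendation.DO_NOT_USE
```

Keeping the thresholds as named constants makes it easy to tune them per usage, which matters here because the same VGI item may deserve different recommendations for different end-user requirements.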

VII. CONCLUSION
Uncertainty related to VGI data can expose end-users to the risk of misusing such data. This risk arises because VGI data may be of poor quality with respect to end-user requirements (i.e., fitness-for-use). In this work, we proposed a method to evaluate the fitness-for-use of VGI data based on a set of indicators and a spatio-semantic similarity measure between user requirements and the provided VGI. We also proposed a risk-based algorithm that consists of a systematic process to make end-users aware of potential risks and help them make appropriate decisions about using or not using VGI data.
We developed a prototype called FitVGI to implement our proposed approach to enhance the fitness-for-use of VGI data. The prototype presents the quality indicators to the end-users in an intuitive way to help them make appropriate decisions about the risks related to VGI data.
Using VGI data sent by contributors and a human evaluation approach, we demonstrated how our proposed algorithm, together with the evaluation of the proposed fitness-for-use indicators, can help end-users make appropriate decisions about using or not using VGI data, or about being careful when using it. Our implementation can be seen as a recommendation system about the usefulness of the provided VGI data with regard to particular requirements.
We note that the defined indicators are not intended to be exhaustive or to completely eliminate the risk related to VGI data, but rather to make users aware of such a risk.
Further work is required to enhance the approach and the FitVGI prototype by allowing interaction not only with end-users but also with VGI contributors, so that contributors can rectify or withdraw data that does not fit a particular usage.