Road Extraction Based on Level Set Approach From Very High-Resolution Images With Volunteered Geographic Information

The road network is of great importance to the modern traffic system, and it changes frequently because of rapid development. Simultaneously, a growing number of satellites provide a considerable number of remote sensing images that can be used for road extraction. However, most existing methods cannot obtain accurate results from high-resolution remote sensing images because of complex backgrounds such as shadows and occlusions. Consequently, we propose a method that utilizes volunteered geographic information data to extract roads from satellite images. First, the volunteered geographic information data, OpenStreetMap vector tiles, are collected from the Mapbox platform to guide the downloading of the corresponding road satellite images. Then a level set method supported by the road vectors is used to extract the road areas from the satellite images. After that, a superpixel method is applied to improve the road areas by fusing the superpixels with the previous results. Experiments on accuracy and efficiency are performed on satellite images of the Tianjin Port district from Mapbox, Google, and Yahoo. Experimental results demonstrate that our approach is more accurate and efficient than two other state-of-the-art methods.


I. INTRODUCTION
The road network is a crucial component of urban transportation systems, supporting applications such as automobile navigation and urban planning [1], [2]. As satellite technology develops, increasingly many high-resolution remote sensing images become available, which promotes the constant emergence of new methods. Among them, road extraction from very high-resolution (VHR) satellite images has become a critical research topic, playing an essential role [3] in navigation, cartography, land management, etc. However, because VHR images contain much more detailed information, different objects may have similar spectra and the same objects may have different spectra, which makes it challenging to extract roads from the images [4]. Besides, shadows from buildings, vehicles, and vegetation make the task even more complicated. Although many efforts have been devoted to extracting roads from VHR images, it is still difficult to find a practical and efficient way to extract roads in complex environments [5].
Road extraction methods can be classified from two perspectives. Based on the extraction target, they can be divided into road area extraction and road centerline extraction methods [6]. Road area extraction methods aim to obtain the area covered by roads and belong to the category of image segmentation [7]-[11]. Road centerline extraction methods intend to draw one-pixel-wide lines along the center of the road area, usually computed by skeletonization methods [12]-[16]. According to the degree of automation, road extraction methods can be classified into semi-automatic and automatic methods [17]. Semi-automatic methods need user interactions when extracting roads, which may cost more time and effort [17]-[24]. Conversely, automatic methods do not require user interactions in road extraction, which is undoubtedly faster and more efficient [25]-[33].
Silva et al. [34] utilized the Radon transform to detect seed segments. According to the radiometric characteristics and directions of the road segments, they searched for new segments to generate the centerline. Zhang et al. [35] introduced a semi-automatic approach that employed template matching and Euclidean distance transformation. The directions, widths, and starting points of roads were obtained by a spoke wheel algorithm from the seed points, and then a reference template was generated to trace along the road and compute the target template. After that, the Euclidean distance transformation was applied to refine the centerline. Similarly, in [36], spoke wheel and region growing algorithms were combined to extract rectangular road segments based on manually selected seed points. Then a histogram was used to refine the rectangles. Sujatha and Selvathi [28] proposed a method based on adaptive global thresholding and connected component analysis to segment the road regions. Then, mathematical morphology (MM) was exploited to remove useless road segments and generate the centerlines. Bae et al. [37] computed the second derivative map of the roads and then refined it with orientation and shape information to obtain the centerlines. Gao et al. [38] proposed an approach to extract road centerlines based on edge constraints, in which a road probability map was obtained by integrating the Mahalanobis distance transform and edge energy. The centerline was connected by the fast marching method in the end. These approaches all depend on intensity, shape, and texture features to define and recognize road areas. However, because of the high complexity of VHR images, it is challenging to determine the road area by these characteristics alone. Consequently, the accuracy of road recognition is not as desired.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
To make further use of the image information, many machine learning methods have been introduced. For instance, Miao et al. [21] adopted a geodesic method to compute the road centerline, with Mahalanobis distance and kernel density methods used to refine the centerlines. Li et al. [8] proposed a fast level set evolution that replaces the traditional regularization term with a Gaussian kernel term to extract the road area. Li et al. [39] presented an unsupervised method to extract the road area: they fused superpixel segmentation with intensity, color, and texture features to produce superpixels, and finally classified the roads with a Gaussian mixture model. Miao et al. [40] exploited a level set method to compute the road area based on the Hue-Saturation-Value color space of the image, and the centerline was extracted through a geodesic method. Zhou et al. [15] proposed an automatic process to extract road networks: the road area was first segmented by statistical region merging (SRM); subsequently, the fast marching and branch back-tracking methods were utilized to obtain the centerline; finally, the centerline was refined by extracting the road width. Mohammadzadeh et al. [12] applied particle swarm optimization (PSO) to a fuzzy cost function to segment the road, and then MM was used to compute the centerline. Makhlouf and Daamouche [10] proposed an automatic method that utilized PSO to obtain the morphological features of the road, with an SVM employed to extract the road area at the end. These methods made road extraction more objective and accurate. Nevertheless, due to the poor homogeneity of road areas, the extraction results are still not very satisfactory.
With the development of deep learning, especially convolutional neural networks (CNNs), a growing number of novel methods [41]-[48] have been introduced to extract road networks. CNNs are widely used in image recognition and image segmentation because they can learn object features automatically and achieve high precision in image processing tasks. Cheng et al. [49] presented a cascaded end-to-end CNN to extract the road area and centerline simultaneously; it includes two networks, one for road area extraction and the other for centerline extraction. Zhang et al. [50] presented a network that integrates deep residual learning and U-Net to extract roads from aerial images, in which the residual units help the network reduce complexity and raise efficiency. Li et al. [51] proposed a Y-shaped CNN to segment roads, which contains three branches for road semantic feature extraction, detail feature extraction, and feature fusion. Although these deep learning algorithms can achieve high accuracy, they need large amounts of training data, which are costly and demanding to obtain. Moreover, discontinuous areas often appear in their results owing to complex texture and background, which affects the completeness of the road extraction.
Therefore, to improve robustness, some methods utilize extra data to aid road extraction. Maillard and Cavayas [52] used an existing medium-scale topographic map to guide the search for road pixels in remote sensing images based on Freeman's code [53]. Their method could find both existing roads and new roads in the image to update the map. Nonetheless, it can hardly be used on high-resolution images due to the lack of large-scale topographic maps in certain areas. Mena and Malpica [26] presented an automatic method to extract road centerlines that uses a texture-based segmentation method to obtain the road area, with the original GPS data serving as training data; the centerlines are then extracted by MM methods. Sameen and Pradhan [54] presented a method utilizing high-resolution LiDAR and aerial orthophoto data to extract highways. First, height and color rasters were obtained from the LiDAR data and orthophotos, respectively. Then the roads were acquired by thresholding the rasters. At last, the road network was extracted by MM methods. Benefiting from the introduction of auxiliary information, the road area can be obtained accurately. However, most external data corresponding to the images are not easy to access, because they come from either laborious work or expensive technology, and many areas of the world are not covered yet. In this case, volunteered geographic information (VGI) [55] can undertake this work.
VGI is spatial data collected from public volunteers through crowdsourcing, free of charge. OpenStreetMap (OSM) is a typical representative. It is an open-source map service with global coverage, and its data are downloadable in vector form. Hence, it is appropriate for assisting road extraction from remote sensing images. This paper presents an automatic method based on the level set that combines images and VGI data to extract roads. Before the road extraction, the road vector tiles are retrieved from Mapbox, and the satellite images are then collected as directed by the vector tiles. First, the VHR images and the corresponding road vector tiles from OSM form the input dataset, and a level set method is utilized to extract the road area. After that, the result is refined by MM methods. Next, the road areas are improved by integrating the results of simple linear iterative clustering (SLIC) with the previous road areas. At last, MM methods are applied to refine the results. This method is robust to vehicles and shadows, and it can extract roads accurately without manual intervention. The contributions of the method are as follows: (1) A dataset for any place covered by OSM can be obtained directly from Mapbox, and the features in OSM can be selected as the research objects. (2) A level set method with OSM data is introduced; the OSM data, as prior knowledge, help the level set method avoid the selection of initial regions. (3) MM methods are used to refine the binary segmentation results, removing small holes and spurs in the road areas. (4) The SLIC method is adopted to constrain the segmentation areas, since it can effectively separate homogeneous regions from the whole image.
The remainder of this paper is organized as follows. Section 2 describes our proposed approach. Experimental results are given in Section 3 and discussed in Section 4. Finally, the conclusion is presented in Section 5.

II. METHODOLOGY
This study intends to extract road areas from VHR images automatically. The proposed approach is displayed in Figure 1. The framework of our road extraction method is as follows: 1) collection of data from Mapbox, including satellite images and vector tiles; 2) extraction of the road area by our level set method; 3) refinement of the segmentation results by MM methods; 4) superpixel segmentation through the SLIC method; 5) fusion of the previous two segmentation results.

A. DATASET
In this study, an open-source program called label-maker [56] is adopted to create our dataset for road extraction. It can download image tiles and the matching map vector tiles from Mapbox after the geographical coordinates are set. In the program, we can also specify features from OpenStreetMap, and it then downloads the images and map tiles that contain these features. In this study, we choose an area of the Binhai New Area around the port of Tianjin as the research area, as shown in Figure 2, and the research target is the 'highway' category. To obtain images with high enough resolution to observe the roads, we set the zoom to level 18 in OSM. The ground resolution is about 0.48 meters per pixel in this region, and the tile size is 512 × 512 after combining four neighboring tiles. After the data are collected, most of the road lines from OSM lie within the road areas. The image and vector tiles are shown in Figure 3. Furthermore, we also use satellite images from Google and Yahoo at the same locations as supplementary datasets for the experiments.
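As a sanity check on the quoted ground resolution, the Web Mercator resolution at a given zoom level and latitude can be computed directly. This is a sketch under two assumptions not stated in the text: that the tiles follow the standard 256-pixel Web Mercator pyramid, and that the Tianjin Port area lies near latitude 39° N (the paper names only the district).

```python
import math

# Ground resolution (metres/pixel) of the standard Web Mercator pyramid.
# 156543.03392 = 2 * pi * 6378137 / 256, the equatorial resolution at
# zoom 0 for 256-px tiles; it halves with each zoom level and shrinks
# with cos(latitude).
def ground_resolution(lat_deg: float, zoom: int) -> float:
    return 156543.03392 * math.cos(math.radians(lat_deg)) / (2 ** zoom)

res = ground_resolution(39.0, 18)  # assumed latitude for the Tianjin Port area
```

At zoom 18 this gives roughly 0.46 m/pixel, consistent with the approximately 0.48 m/pixel stated above.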

B. LEVEL SET EVOLUTION WITH OSM DATA
The level set method belongs to the category of active contour models. Active contour models are divided into parametric and geometric active contour models. In the parametric active contour model, the target edge is represented by a continuous curve, which can be expressed through an energy function. Therefore, contour extraction is transformed into solving for the minimum of the energy function. The most representative one is the Snake model proposed by Kass et al. [57]. This model was successfully applied to biological image segmentation early on. Still, it has the disadvantages that the segmentation result is greatly affected by the setting of the initial contour and that it is difficult to handle changes in curve topology.
The geometric active contour model evolves based on geometric measurements of the curve rather than its parameterization, so it handles topological structure better. The most representative is the level set method, proposed jointly by the American mathematicians Osher and Sethian [58], which has been successfully applied in fluid mechanics, computer graphics, materials science, etc. Caselles et al. [59] and Malladi et al. [60] applied this model to image processing and computer vision. In the level set method, a curve is implicitly expressed as the zero level set of a function z = f(x, y), i.e., the intersection of the three-dimensional surface z = f(x, y) and the XY plane. More generally, any n-dimensional surface can be expressed as the intersection of an (n+1)-dimensional surface and an n-dimensional hyperplane. The method represents evolving surfaces or curves implicitly through higher-dimensional functions and transforms the evolution equation into a partial differential equation of the higher-dimensional level set function (LSF). Therefore, the level set method turns the evolution of the geometric active contour model into solving a partial differential equation. Compared with the parametric active contour model, the geometric active contour model has many advantages, such as adaptability to topological changes, insensitivity to the initial position, and stability of the numerical solution. Of course, the conventional level set method also has certain shortcomings. In theory, the level set method assumes that the LSF is Lipschitz continuous. However, in actual image segmentation the numerical values have to be discretized, so certain errors may arise. As iterations proceed, the LSF may develop irregularities, which cause numerical errors and eventually destroy the stability of the level set evolution. To solve this problem, a numerical remedy, reinitialization, is adopted.
Reinitialization resets the LSF to a signed distance function every several iterations. The issue with this approach is that it may move the originally correct zero level set to a wrong position, resulting in significant errors. Furthermore, there is no solid theoretical guidance for reinitialization, so it is difficult to choose how to reinitialize the LSF and the best time to do so.
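As a concrete example (not from the paper) of the implicit representation and of the signed distance functions that reinitialization tries to restore, consider a circle of radius r centered at the origin:

```latex
\varphi(x, y) = \sqrt{x^2 + y^2} - r
```

Its zero level set is exactly the circle; \varphi is negative inside, positive outside, and satisfies |\nabla\varphi| = 1 everywhere away from the origin, which is the signed-distance property that keeps the numerical evolution stable.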
In this study, a robust segmentation method, distance regularized level set evolution (DRLSE) [61], is applied for road segmentation. This method can perform the level set evolution while maintaining the shape of the curve without reinitialization, thanks to its distance regularization term, which makes the results more stable and robust. The DRLSE is described as follows. For an image I defined on the domain \Omega, let (x, y) be the pixel coordinates and \varphi(x, y, t) the LSF; the contour of the road buffer is represented by the zero level set of \varphi. The energy function can be defined as

\mathcal{E}(\varphi) = \mu \int_\Omega p(|\nabla\varphi|)\,d\mathbf{x} + \lambda \int_\Omega g\,\delta_\varepsilon(\varphi)\,|\nabla\varphi|\,d\mathbf{x} + \alpha \int_\Omega g\,H_\varepsilon(-\varphi)\,d\mathbf{x} \qquad (1)

where \mu, \lambda, and \alpha are the coefficients of the distance regularization term, the weighted length term, and the weighted area term, respectively, p is the distance regularization potential in the form of a double-well function, and g is the edge indicator function:

p(s) = \begin{cases} \frac{1}{(2\pi)^2}\bigl(1 - \cos(2\pi s)\bigr), & s \le 1 \\ \frac{1}{2}(s - 1)^2, & s \ge 1 \end{cases}

g = \frac{1}{1 + |\nabla(G_\sigma * I)|^2}

where G_\sigma is a Gaussian kernel with standard deviation \sigma. Before the level set evolution, the image I is convolved with G_\sigma for smoothing.
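The edge indicator g can be sketched directly from its definition. The synthetic strip image below is only for illustration and is not part of the paper's data:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# Edge indicator g = 1 / (1 + |grad(G_sigma * I)|^2): close to 1 in
# homogeneous regions and close to 0 on strong edges.
def edge_indicator(image: np.ndarray, sigma: float = 1.5) -> np.ndarray:
    smoothed = gaussian_filter(image.astype(float), sigma)
    gy, gx = np.gradient(smoothed)
    return 1.0 / (1.0 + gx**2 + gy**2)

# Synthetic image: dark background with a bright vertical "road" strip.
img = np.zeros((64, 64))
img[:, 28:36] = 200.0
g = edge_indicator(img)
# g stays near 1 far from the strip and drops sharply at its borders.
```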
The Dirac delta function \delta_\varepsilon and the Heaviside function H_\varepsilon are defined as

\delta_\varepsilon(x) = \begin{cases} \frac{1}{2\varepsilon}\left(1 + \cos\frac{\pi x}{\varepsilon}\right), & |x| \le \varepsilon \\ 0, & |x| > \varepsilon \end{cases}
\qquad
H_\varepsilon(x) = \begin{cases} \frac{1}{2}\left(1 + \frac{x}{\varepsilon} + \frac{1}{\pi}\sin\frac{\pi x}{\varepsilon}\right), & |x| \le \varepsilon \\ 1, & x > \varepsilon \\ 0, & x < -\varepsilon \end{cases}

The energy is minimized by solving its gradient flow:

\frac{\partial \varphi}{\partial t} = \mu\,\operatorname{div}\!\bigl(d_p(|\nabla\varphi|)\,\nabla\varphi\bigr) + \lambda\,\delta_\varepsilon(\varphi)\,\operatorname{div}\!\left(g\,\frac{\nabla\varphi}{|\nabla\varphi|}\right) + \alpha\,g\,\delta_\varepsilon(\varphi)

where d_p is defined as d_p(s) = p'(s)/s. In this method, the road vector from OSM is buffered to form the initial LSF \varphi. Thus, the initial area does not need to be set manually, and, depending on the accuracy of the OSM data, it lies within the road area in most instances. The initial area and the road segmentation results are shown in Figure 4a, b.
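A minimal NumPy sketch of this evolution, under several simplifying assumptions (a synthetic image, a hand-made stand-in for the OSM buffer, and a short explicit-update run; this is not the authors' implementation), ties the pieces together: an initial signed-distance LSF from the buffered road, the double-well d_p, and the three-term update with the coefficients used later in the paper (µ = 0.01, λ = 6, α = −3, time step 2).

```python
import numpy as np
from scipy.ndimage import distance_transform_edt, gaussian_filter

def delta_eps(phi, eps=1.5):
    """Smoothed Dirac delta, nonzero only in the band |phi| <= eps."""
    d = np.zeros_like(phi)
    band = np.abs(phi) <= eps
    d[band] = (1.0 / (2 * eps)) * (1 + np.cos(np.pi * phi[band] / eps))
    return d

def div(fx, fy):
    """Discrete divergence of the vector field (fx, fy)."""
    return np.gradient(fx, axis=1) + np.gradient(fy, axis=0)

def drlse_step(phi, g, mu=0.01, lam=6.0, alpha=-3.0, dt=2.0):
    """One explicit iteration of the DRLSE gradient flow sketched above."""
    gy, gx = np.gradient(phi)
    mag = np.sqrt(gx**2 + gy**2) + 1e-10
    # d_p(s) = p'(s)/s for the double-well potential p.
    dp = np.where(mag <= 1.0,
                  np.sin(2 * np.pi * mag) / (2 * np.pi * mag),
                  (mag - 1.0) / mag)
    reg = div(dp * gx, dp * gy)                  # distance regularization term
    nx, ny = gx / mag, gy / mag                  # unit normal of the level sets
    gdy, gdx = np.gradient(g)
    edge = delta_eps(phi) * (gdx * nx + gdy * ny + g * div(nx, ny))  # length term
    area = g * delta_eps(phi)                    # weighted area (balloon) term
    return phi + dt * (mu * reg + lam * edge + alpha * area)

# Toy scene: a bright road strip, and a narrower "OSM buffer" inside it.
img = np.zeros((64, 64)); img[:, 24:40] = 200.0
road_buffer = np.zeros((64, 64), dtype=bool); road_buffer[:, 29:35] = True
# Initial LSF: signed distance to the buffer (negative inside), so no
# manual seeding is required; the OSM vector plays this role in the paper.
phi = distance_transform_edt(~road_buffer) - distance_transform_edt(road_buffer)
smoothed = gaussian_filter(img, 1.5)
gy, gx = np.gradient(smoothed)
g = 1.0 / (1.0 + gx**2 + gy**2)                  # edge indicator
for _ in range(20):                              # short run for illustration
    phi = drlse_step(phi, g)
mask = phi < 0    # zero level set: the buffer has expanded toward the edges
```

With α < 0 the area term inflates the buffer inside the homogeneous strip, and the evolution slows where g collapses at the strip borders.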
After the road area is extracted by the DRLSE method, the result is not very coherent: there are defects and holes in the road area. We use MM methods to solve this problem. Mathematical morphology (MM) is a technique for the analysis and processing of geometrical structures, based on set theory, lattice theory, topology, and random functions, and it is widely applied to digital images. The basic morphological operators are erosion, dilation, opening, and closing. In this case, the closing operation can mitigate the defects and holes. The red road area in Figure 4c shows the results.
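The closing operation mentioned above can be sketched with SciPy. The 9 × 9 mask and its one-pixel hole are synthetic stand-ins for a segmented road patch:

```python
import numpy as np
from scipy.ndimage import binary_closing

# Morphological closing (dilation followed by erosion) fills small holes
# in a binary road mask while leaving its overall extent unchanged.
mask = np.zeros((9, 9), dtype=bool)
mask[2:7, 2:7] = True       # a road patch ...
mask[4, 4] = False          # ... with a one-pixel hole left by segmentation
closed = binary_closing(mask, structure=np.ones((3, 3), dtype=bool))
# After closing, the hole at (4, 4) is filled and the patch border is intact.
```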

C. SUPERPIXEL
The boundary of the road area extracted in the last section still does not adhere to the actual road edges, as shown in Figure 4c. To address this issue, we utilize a superpixel method. Superpixel methods produce many compact, neighboring pixel blocks in the image, in which the pixels share similar texture, color, or other low-level features. These pixel blocks are superpixels, which represent the common features of the pixels inside them; the grids in Figure 4c show the results. To a certain degree, superpixels reduce the volume of the image. We fuse the superpixels computed in this section with the road area from the last section, and consider any superpixel that intersects the previous results as road area. The silver-gray mask in Figure 4c displays the results.
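The fusion rule described above (keep every superpixel that intersects the level-set result) can be sketched in a few lines of NumPy. The 2 × 2 toy label map stands in for a real SLIC segmentation:

```python
import numpy as np

# Any superpixel that intersects the road mask is kept whole as road.
def fuse(labels: np.ndarray, road_mask: np.ndarray) -> np.ndarray:
    road_labels = np.unique(labels[road_mask])   # superpixels touched by the mask
    return np.isin(labels, road_labels)

# Toy label map: four square superpixels numbered 0..3.
labels = np.zeros((8, 8), dtype=int)
labels[:4, 4:] = 1; labels[4:, :4] = 2; labels[4:, 4:] = 3
mask = np.zeros((8, 8), dtype=bool)
mask[1:3, 1:3] = True          # level-set result touches superpixel 0 only
fused = fuse(labels, mask)     # the whole top-left superpixel becomes road
```

Snapping the result to superpixel boundaries is what pulls the road mask onto the true road edges.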
Among superpixel methods, simple linear iterative clustering (SLIC) is very useful and efficient [62], [63]. The SLIC method segments the image into many subregions, which are used to adjust the road area results. SLIC works in the CIELAB color space, and it requires only one parameter K, the number of superpixels. The method clusters the image into K neighboring, even clusters, and each pixel is assigned to the cluster with the nearest clustering center. Assuming the image has N pixels in total, the grid interval of the superpixels is approximately S = \sqrt{N/K}. The search range of SLIC is 2S × 2S, which accelerates the convergence of the algorithm dramatically. The distance measurements are as follows:

d_c = \sqrt{(l_j - l_i)^2 + (a_j - a_i)^2 + (b_j - b_i)^2}

d_s = \sqrt{(x_j - x_i)^2 + (y_j - y_i)^2}

where j denotes the index of a cluster center and i the index of a neighboring pixel; d_c is the color distance in CIELAB, and d_s is the spatial distance in the XY plane. The final distance metric D is defined as

D = \sqrt{\left(\frac{d_c}{N_c}\right)^2 + \left(\frac{d_s}{N_s}\right)^2}

where N_s is the maximum spatial distance within a class, N_s = S = \sqrt{N/K}, and N_c is the maximum color distance. With the distance measure D, the cluster centers are determined by the K-means method.
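The distance metric above can be illustrated with a small self-contained function. The CIELAB values, pixel coordinates, and the fixed N_c = 10 are illustrative assumptions (in practice N_c is often replaced by a constant compactness parameter):

```python
import math

# SLIC distance between a pixel i and a cluster centre j, combining the
# CIELAB colour distance d_c and the spatial distance d_s, each
# normalised by its maximum (N_c and N_s = S respectively).
def slic_distance(lab_i, xy_i, lab_j, xy_j, N_s, N_c=10.0):
    d_c = math.dist(lab_i, lab_j)   # colour distance in CIELAB
    d_s = math.dist(xy_i, xy_j)     # spatial distance in the image plane
    return math.sqrt((d_c / N_c) ** 2 + (d_s / N_s) ** 2)

# Grid interval for a 512 x 512 tile with K = 1000 superpixels,
# the values used later in the experiments.
S = math.sqrt(512 * 512 / 1000)     # about 16.2 pixels
D = slic_distance((50, 10, 10), (100, 100), (52, 11, 9), (104, 103), N_s=S)
```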
After fusing the road area with the SLIC superpixels, we apply the morphological closing operation and median filtering to improve the edges of the superpixels and obtain the final road area. Figure 4d displays the results.

D. EVALUATION METRICS
In this study, precision, recall, F1, and intersection over union (IoU) are introduced to assess the performance of our method. They are defined as

\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}

F1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}, \qquad \mathrm{IoU} = \frac{TP}{TP + FP + FN}

Here, true positives (TP) are the road pixels present in both the extraction result and the reference image; false positives (FP) are pixels marked as road in the extraction result but not in the reference image; false negatives (FN) are pixels marked as road in the reference image but missed in the extraction result. In our study, the reference road areas are drawn manually.
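The four metrics can be computed from binary masks as follows. The two 4 × 4 masks are toy inputs chosen so the pixel counts are easy to verify by hand:

```python
import numpy as np

# Pixel-wise precision, recall, F1, and IoU between an extracted road
# mask and a manually drawn reference mask.
def evaluate(pred: np.ndarray, ref: np.ndarray):
    tp = np.sum(pred & ref)
    fp = np.sum(pred & ~ref)
    fn = np.sum(~pred & ref)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    iou = tp / (tp + fp + fn)
    return precision, recall, f1, iou

pred = np.zeros((4, 4), dtype=bool); pred[:, :2] = True   # 8 predicted pixels
ref = np.zeros((4, 4), dtype=bool);  ref[:, 1:3] = True   # 8 reference pixels
p, r, f1, iou = evaluate(pred, ref)   # tp = 4, fp = 4, fn = 4
```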

III. RESULTS
To evaluate our method for road extraction, 100 images containing roads are selected randomly from the Mapbox images of the research area. In the meantime, we also collect images at the same locations from Google and Yahoo as two additional datasets for further verification. From each dataset, we choose four images at the same locations for the display of results. We test our method's performance in road segmentation and compare it with two other state-of-the-art methods [15], [40], hereafter referred to as ''Zhou 2019'' and ''Miao 2019'', respectively. In our proposed method, the initial road line is buffered with a width of 4 pixels to form the initial area, and the time step t = 2, µ = 0.01, λ = 6, α = −3, and K = 1000 in all three experiments. In Zhou 2019, the parameter Q = 700 in the three experiments. The parameters of Miao 2019 differ across datasets and are listed in each experiment.

A. EXPERIMENT ON MAPBOX IMAGES
In this experiment, the data are from Mapbox Satellite, and α is 2 for Miao 2019. The ground truth and the results of the compared methods are shown in Figure 5. It can be seen that the roads in the pictures all have low contrast with the background. Simultaneously, there are shadows on or near the road areas. These factors make it demanding to extract the roads. Zhou 2019 covered many areas not belonging to roads, especially in the first and third images, although it extracted most of the road regions in each image. In the second and fourth images, a large portion of the overpass is not detected, and some non-road areas are mistaken for roads. This may be because the method considers only texture without edges, so many disconnected areas are extracted. Compared with Zhou 2019, Miao 2019 achieves relatively higher accuracy in road extraction, but many road segments are omitted in the first three images, especially the first one. The reason may be occlusion by buildings and vehicles and the indistinct features between the road and the background. In contrast, our method can extract nearly all the roads, although part of the road is missed in the first image and a little false extraction appears in the second and fourth images. Among the three methods, because of its thorough consideration of spectral and spatial features, our method extracts most of the roads and properly retains their connectivity and topology.

B. EXPERIMENT ON GOOGLE IMAGES
In this experiment, the data are from Google Maps, and α = 1 for Miao 2019. The original images, ground truth, and results of the compared methods on the Google images are shown in Figure 6. The roads in the images are still difficult to distinguish from the background, although fewer shadows appear than in the Mapbox images. Still, there are many vehicles on the roads. It is easy to see that the season in the third image differs from that in Mapbox, because plants appear in this scene. Therefore, it poses a different challenge from the first experiment. Comparing the three methods, Zhou 2019 covered nearly all the road areas in the first and last images, but many non-road areas were mistaken for roads. In the second and third images, most of the roads are not detected, and many non-road areas are mistaken for roads. Compared with Zhou 2019, Miao 2019 performed well in the first and last images, despite a little missing coverage in the first image. In the second image, because of occlusion by vehicles and the insignificant features of the road area, some areas are not covered. In the third image, the vegetation in the center of the road affects continuity. As a comparison, our method can extract nearly all the roads in the last two images, although some roads are missed in the first two images due to the low differentiation in color and texture. Among the three methods, ours and Miao 2019 extract most of the roads, and ours keeps the topology acceptable.

C. EXPERIMENT ON YAHOO IMAGES
In this experiment, the data are collected from Yahoo, and α = 1 for Miao 2019. The original images, ground truth, and results of the compared methods on the Yahoo images are shown in Figure 7. Like the previous images, the roads have low contrast with the background, but few shadows appear near the road areas. However, the features of the roads are somewhat indistinct, so it is complicated to extract roads from these images. Zhou 2019 extracted almost all the road areas except some small segments, but many non-road parts of the images are classified as road, and the topology is poor in all four images. By contrast, Miao 2019 achieves relatively higher accuracy in road extraction, but many road segments are omitted except in the second image; many roads are missed, and the topology is even worse. In contrast, our method extracted nearly all the roads, although a small portion of the road is not covered. Among the three methods, ours extracts most roads accurately and maintains connectivity and topology much better. Across the three experiments, compared with the other two state-of-the-art methods, our method achieves the best visual results in accuracy, recall, and robustness.
Overall, Zhou 2019 mainly depends on the spectral features of the images and considers few edge features; thus, its FP is very high and it cannot recognize the roads completely. Miao 2019 involves edge information and texture features, so its results are fairly good. By contrast, our method fully considers texture and spatial information and obtains the best results, and it is even robust to occlusion.

D. QUANTITATIVE ANALYSIS
The quantitative results are listed in Table 1. Our method obtains the highest scores on all four metrics on the three datasets. In general, the performance of our method is much better than that of the other methods.

IV. DISCUSSION
In this section, we discuss the sensitivity of the parameters of our approach and compare the time cost of our method with that of the other two methods. They are described in the following subsections.

A. PARAMETER SENSITIVITY ANALYSIS
The parameters µ, α, and λ in Equation 6 are vital in our method, as they considerably determine the road segmentation results. The initial values of µ, α, and λ are 0.1, −3, and 5, respectively. We tested the sensitivity of the parameters by controlling variables: µ, α, and λ each start from 0 and range to 0.14, −10, and 10, respectively. The changes in the metric results as µ, α, and λ vary are shown in Figures 8a, b, and c separately. In Figure 8a, the precision, recall, F1, and IoU remained stable at 0.8, 0.9, 0.85, and 0.73 as µ increased, until at last they dropped to 0 dramatically. According to the condition µΔt < 1/4 described in [61], µ should be limited to a small value. Thereby, µ = 0.01 is a fitting value at which F1 and IoU both peak. In Figure 8b, the IoU and F1 rose to 0.74 and 0.85, then came down and stayed flat at 0.72 and 0.84 as α decreased. The precision came down from 0.82 and remained steady at 0.77. Compared with the precision, the recall grew slowly from 0.87 to 0.90 as α decreased. When α is −3, F1 reaches its summit. Figure 8c illustrates the changes in the metrics as λ rises. The precision, F1, and IoU follow nearly the same trend: they stayed at 0.77, 0.84, and 0.72, then increased to 0.8, 0.85, and 0.74, and remained there to the end. In contrast, the recall stayed steady at around 0.92, despite a little change in the middle. The chart reveals that when λ is 6, F1 and IoU reach their highest points.

B. COMPUTATION COST ANALYSIS
In this section, we compare the time cost of our method with that of the other two methods. All methods were run on a computer with a 3.5-GHz Xeon CPU and 16 GB of memory on the PyCharm platform. We tested our method on the three datasets and compared it with Zhou 2019 and Miao 2019. Each experiment was repeated five times, and the average running time is taken as the computational cost. As shown in Table 5, the computing cost of our method is much less than that of Miao 2019, although it is slightly higher than that of Zhou 2019. Furthermore, our method is much better than Zhou 2019 in accuracy. Therefore, in terms of time spent, our method is still an efficient and effective approach to extracting road areas.

V. CONCLUSION
A new road extraction method utilizing VGI based on the level set is proposed in this study. The remote sensing images are first collected from satellite maps guided by the VGI from OpenStreetMap. Next, a level set method with the VGI as prior knowledge is employed to extract the road area, which avoids setting the initial area manually. After the initial roads are extracted, mathematical morphology and superpixel methods are utilized to refine the road area.
The experiments are performed on three datasets from different satellite maps for verification. Compared with two state-of-the-art methods on the three datasets, our method obtains the highest scores on all four metrics. Our method can extract most of the road area, whether for straight intersections, curved intersections, or even overpasses. It keeps the topology of the road quite well, although occasionally small road areas may be missed because of occlusions and shadows. Certainly, there are still some shortcomings in our method. The parameters cannot be adapted to all scenarios. Besides, it is difficult to segment the road accurately in very complex scenes, such as those with heavy occlusions. In general, though our method is not perfect, it can annotate the road area automatically and save labor. Therefore, it is an effective way to extract road areas from complex very high-resolution remote sensing images.
In the future, we should improve the method to adjust its parameters automatically for various images, which will increase the road segmentation performance. Also, more information from inside or outside the image should be introduced to enhance the features of the road, which can improve the results dramatically.

XING WANG received the B.S. degree from Wuhan University, Wuhan, China, in 2008, and the Ph.D. degree in photogrammetry and remote sensing from the School of Remote Sensing and Information Engineering, Wuhan University, in 2015.
He is currently a Lecturer with the School of Marine Science and Technology, Tianjin University. His research interests include remote sensing image processing, object detection, and image retrieval.

JINGSHENG ZHAI is currently a Professor with the School of Marine Science and Technology, Tianjin University. He is also the Deputy Director of the State Key Laboratory of Geo-information Engineering. His major research interests include hydrographic databases, electronic charts, marine geographic information engineering, coastal photogrammetry, spatial graphic algebra, and image morphological transformation.