
Road Detection From Remote Sensing Images by Generative Adversarial Networks

Open Access



Abstract:

High-precision road detection from very-high-resolution remote sensing imagery is important in a wide variety of applications. However, most existing approaches cannot automatically extract roads with a smooth appearance and accurate boundaries. To address this problem, we propose a novel end-to-end generative adversarial network. In particular, we construct a convolutional network, trained adversarially, that discriminates between segmentation maps coming from the ground truth and those generated by the segmentation model. The proposed method improves the segmentation result by finding and correcting the differences between the ground truth and the output of the segmentation model. Extensive experiments demonstrate that the proposed method greatly outperforms state-of-the-art methods in the quality of the segmentation map.
Topic: Advanced Data Analytics for Large-scale Complex Data Environments
Published in: IEEE Access ( Volume: 6)
Page(s): 25486 - 25494
Date of Publication: 13 November 2017
Electronic ISSN: 2169-3536



SECTION I.

Introduction

Roads are the backbone of urban infrastructure, connecting different functional areas and playing a very important role in human civilization. Extracting road information from remote sensing images is a prerequisite for many urban applications, such as traffic management, urban planning, road monitoring, and geographic information system updating [1]–[4]. However, manually labeling roads requires much labor and time and, despite its high accuracy, creates a bottleneck for real-time response in urban services. To overcome this problem, automatic road extraction strategies have been proposed with the aid of various machine learning methods. Compared with manual extraction, automatic extraction is more efficient for real-time updating of transportation databases, and it is more economical because it saves much of the cost of manual labeling.

However, fully automatic road extraction is still a challenging problem and fails to provide satisfactory accuracy in practical applications in urban areas. The difficulty lies in the large variations in the spatial appearance and physical materials of roads. These variations appear in different aspects, such as spectral reflectance, shape, and contrast. Various studies [5]–[7] have attempted to overcome these difficulties in recent years.

Most road detection approaches [8], [9] are classification-based. Classification-based methods usually use the geometric, photometric, and texture features of a road [10]. Unfortunately, their classification accuracy is far from stable and satisfactory on large datasets [11]. As inputs to a classification model, both spectral and spatial characteristics are used to differentiate classes. However, spectral information is of limited help in extracting the road class. The spectral signature of a concrete road may easily be confused with that of other classes, such as man-made concrete buildings [12], [13]. On the other hand, a road may be composed of different materials, such as cement, asphalt, and soil, which exhibit quite different spectral characteristics. Thus the accuracy of these models depends heavily on spatial features. Although the spatial characteristics of roads are very significant, roads usually lie in complex backgrounds, and their geometric shape may be disturbed by building shadows and by cars and trees on the road, resulting in gaps and discontinuities in the detected road.

Although many researchers have proposed strategies to describe spatial features, it is still hard to capture the large variations of the road class with a fixed template or criterion. In recent years, deep learning (DL) has shown superior performance by learning high-level features, with great success in image processing [14]–[16]. Reference [17] proposed impressive and promising auto-encoders for unsupervised feature learning on large-scale datasets, which for the first time considered a denoising and stacked convolution strategy for high-dimensional image datasets. Current state-of-the-art segmentation methods [18] rely on convolutional neural networks (CNNs), following early work using CNNs for this task by Grangier et al. [19], Zhang et al. [20], and Farabet et al. [21]. However, a CNN predicts the label of each pixel independently and ignores spatial contiguity, so various post-processing approaches have been explored. Conditional random fields (CRFs) are one of the most effective approaches to enforce spatial contiguity in the output label maps. Shelhamer et al. [22] first proposed the fully convolutional network (FCN) framework, which adapts DL classification networks into fully convolutional networks and transfers the learned representations to pixel-level segmentation. Building on this work, Chen et al. [23] proposed a recurrent neural network formulation that fully integrates CRF modeling with the CNN, making it possible to train the whole deep network end-to-end. However, FCN-based methods produce coarse segmentation maps owing to the fixed-size receptive field predefined in the network. Noh et al. [24] first proposed a deconvolution network, which is composed of deconvolution, unpooling, and rectified linear unit (ReLU) layers. The purpose of a deconvolution network is to recover the resolution of a feature map to that of the input image.
Deconvolution networks have been shown to recover detailed structures and to handle objects at multiple scales naturally. Badrinarayanan et al. [25] proposed an encoder-decoder architecture called Segnet, with a hierarchy of decoders, one corresponding to each encoder. Each decoder layer receives the max-pooling indices from the corresponding encoder layer. Ronneberger et al. [26] proposed U-net for semantic segmentation. The architecture consists of a contracting path that captures context and a symmetric expanding path that propagates this context to higher-resolution layers and enables precise localization. U-net is also an encoder-decoder architecture, with skip connections between mirrored layers in the encoder and decoder stacks. Cheng et al. [27] proposed a cascaded end-to-end convolutional neural network, also based on an encoder-decoder network, that simultaneously extracts consistent road areas and smooth road centerlines.

In recent years, deep convolutional networks have achieved important breakthroughs in pixel-wise classification of natural images. In the remote sensing field, however, public land-use datasets are few and small, which is not sufficient to train an excellent deep convolutional network. Moreover, compared with natural images, roads vary more and their backgrounds are more cluttered. In this paper, we introduce a generative adversarial network (GAN) approach [28] to solve the road extraction problem in remote sensing images.

A GAN includes two models that are trained together. One is a generative model G that captures the data distribution; the other is a discriminative model D that estimates the probability that a sample comes from the training data rather than from G. Through iterative adversarial training between G and D, G becomes capable of modeling a distribution very close to that of the real training data. To our knowledge, there are few studies of GANs for semantic segmentation, especially for road extraction from remote sensing images. In a GAN for segmentation, the generative model is a deconvolution-network-based model that produces a label for each pixel, and the adversarial term encourages produced labels that are hard to distinguish from ground-truth ones. Luc et al. [29] used the GAN framework for semantic segmentation, and their results showed that GANs can enforce long-range spatial label contiguity and, as a result, generate more accurate and smoother results than non-adversarial training. However, their generative model still uses low-level features to generate the segmentation map, so the segmentation boundaries are not very clear. In this paper, we not only want a smooth segmentation map that accounts for higher-order spatial consistency, but also want the segmentation map to contain more detail to describe boundary information. Thus we use Segnet as our generative model to generate a high-resolution segmentation map.

In summary, the main contributions of the proposed approach are highlighted as follows:

  1. We introduce the generative adversarial network into pixel-wise remote sensing image classification for the first time.

  2. The Segnet model is used as the generative model, which ensures that the segmentation has the same resolution as the input image. Meanwhile, adversarial training enforces long-range spatial label contiguity, achieving more consistent road detection results than the compared methods under complex backgrounds and occlusions by cars and trees.

The remainder of this paper is organized as follows. Related road extraction work is reviewed in Section II. In Section III, we briefly review GANs and present the proposed GAN model for road extraction. Section IV describes our dataset in detail. Experimental evaluations and detailed comparisons between our method and state-of-the-art methods are provided in Section V. Finally, the conclusion and discussion are presented in Section VI.

SECTION II.

Related Work

Most existing road extraction methods rely on binary classification models, owing to the strong performance of existing classifiers such as the support vector machine (SVM) [30] and the artificial neural network (ANN) [31]. To extract the spatial features of roads, edge detectors such as directional adaptive filters [32] and magnitude and orientation features [6] can highlight potential road points. Song and Civco [33] proposed smoothness and compactness criteria to describe the spatial characteristics of each small segment. Tupin et al. [34] generate potential road segments with linear detectors in local areas, and the final result is obtained by using Markov random fields (MRFs) to refine these initial segments. Yager and Sowmya [35] used SVM classifiers to decide whether edges belong to the road class; however, this approach depends heavily on the accuracy of the edge detectors. Huang and Zhang [36] proposed a road detection system based on multiscale structural features and SVMs, in which an object-based approach is used to extract multiscale information and hybrid spectral-structural features are analyzed with SVM classifiers. Das et al. [6] introduced a multistage framework to extract roads from high-resolution multispectral satellite images using probabilistic SVMs and salient features.

Another typical family of methods is based on mathematical morphology. For example, Zhang et al. [37] apply a series of mathematical morphological operations. The image is first segmented into small regions, which separates the road class from the surrounding background and removes noise on the road, such as cars, trees, and building shadows. Based on the elongated shape of roads, a criterion is proposed that evaluates the major axis of the minimal ellipse enclosing each segment. Zhu et al. [38] propose a line-segment matching method based on binary and grayscale mathematical morphology, which gives better results when the road surface is discontinuous because of bad weather or occlusion by building shadows. Ma et al. [39] proposed a multi-scale retinex (MSR) algorithm to enhance image contrast so that road points stand out more clearly from background points; Canny edge detection and the Hough line transform were then used for automatic road extraction. Subsequently, linear and curved road segments are refined by the Hough line transform and extracted based on several thresholds on road size and shape, using a number of morphological operators such as thinning (skeletonization), junction detection, and endpoint detection.

The snake model can delineate an object in a possibly noisy image [40]. It elastically deforms a template generated from a control shape to match potential local features by means of energy minimization. Because the initial road points must be selected first, snake models are semi-automatic object extraction methods. Anil and Natarajan [41] first introduced the snake model into road extraction, and many new snake models have since been applied to road extraction. Gruen and Li [42] proposed LSB-snakes, a semi-automatic road extraction method in which least-squares B-splines are used to construct the spline curves of the road; the road centerline is then obtained by image matching, GIS data support, and other operations.

Although the above methods have achieved some success in road detection, they still have shortcomings. Most of them perform poorly in heterogeneous areas, for example where trees and cars occlude the main road. To alleviate these shortcomings, a novel method based on generative adversarial networks (GANs) is proposed for road detection. The proposed method achieves homogeneous road results even in heterogeneous areas or under occlusion by trees and cars.

SECTION III.

Methodology

In this section, we first describe the principle of generative adversarial networks (GANs). Then we introduce the application of GANs to semantic segmentation. Finally, we describe the inference process.

A. Generative Adversarial Networks (GANs)

Generative adversarial networks (GANs) have drawn much attention in the last two years; in this section, we briefly introduce their principle.

In the GAN framework, the generative model is used to model the data probability distribution, and the discriminative model is used to decide whether a sample comes from the generative model or from the ground truth map. The generative and discriminative models are trained adversarially against each other to obtain the final result (Figure 1).

FIGURE 1.

Training a GAN to generate the label map for RGB images. The discriminative model D learns to determine whether the input label map is real or not. The generator generates the most “authentic” sample to confuse the discriminative model D.

The purpose of the generative model is to learn the data probability distribution $p_{g}$ over the dataset $\boldsymbol{x}$. First, we define a prior noise variable $z \sim p_{z}(z)$; the mapping from $z$ to an output image is represented by $G(z;\theta_{g})$, where $G$ is a differentiable function with parameters $\theta_{g}$ that produces the output image $y$. We also define the discriminative model $D(\boldsymbol{x};\theta_{d})$, which represents the probability that $\boldsymbol{x}$ came from the true data rather than being simulated by $G$; $\theta_{d}$ denotes the parameters of the discriminative model. The two models play the following min-max game:

\begin{equation}
\min_{G}\max_{D} L_{cGAN}(G,D) = E_{y\sim p_{data}(y)}\left[\log D(y)\right] + E_{x\sim p_{data}(x),\, z\sim p_{z}(z)}\left[\log\left(1-D(G(x,z))\right)\right] \tag{1}
\end{equation}

The first term trains $D$ to maximize the probability of correctly predicting that a real output sample $y$ is real. The second term trains $G$ to minimize $\log(1-D(G(x,z)))$, i.e., to produce samples that $D$ accepts as real.
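As a minimal numerical sketch of objective (1), assuming the expectations are estimated by averaging over a finite set of discriminator outputs, the snippet below evaluates the value that $D$ tries to maximize and $G$ tries to minimize. The function name `gan_value` is hypothetical. When $D$ cannot separate real from generated samples and outputs $1/2$ everywhere, the value equals $-\log 4$, which is the value of the game at its global optimum in the standard GAN analysis.

```python
import math

def gan_value(d_real, d_fake):
    """Sample estimate of the objective in (1).

    d_real: discriminator outputs D(y) on real label maps, each in (0, 1).
    d_fake: discriminator outputs D(G(x, z)) on generated maps, each in (0, 1).
    D is trained to increase this value; G is trained to decrease it.
    """
    real_term = sum(math.log(d) for d in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - d) for d in d_fake) / len(d_fake)
    return real_term + fake_term

# A discriminator that cannot tell real from generated maps outputs 0.5.
confused = gan_value([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
# A discriminator that separates the two sets well scores higher.
confident = gan_value([0.9, 0.95], [0.1, 0.05])
print(confused, -math.log(4))
```

A generator that improves pushes `d_fake` upward, which lowers the second term and hence the value.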

A standard per-pixel loss treats output samples as conditionally independent of each other; in an image, however, neighboring samples tend to share the same label, so samples in a local region should be considered dependent. The conditional GAN (cGAN) is used to learn a structured loss over local regions, that is, to learn a mapping from the observed image $x$ and random noise $z \sim p_{z}(z)$ to the output $y$, represented by $G(x,z;\theta_{g})$.

The conditional GAN, combined with the data term $L_{L2}(G)$ and a weight $\lambda$ on the adversarial term, can be represented as follows:

\begin{equation}
G^{*} = \arg\min_{G}\max_{D}\left[L_{L2}(G) + \lambda L_{cGAN}(G,D)\right] \tag{2}
\end{equation}

B. Generative Adversarial Networks (GANs) for Semantic Segmentation

In this section, we adapt the GAN to semantic segmentation. Let $H$ and $W$ be the height and width of the image. The input RGB image $x$ has size $H\times W\times 3$, and the output of the road segmentation model, $y$, has size $H\times W\times C$, where $C$ is the number of classes.

The first term of the objective encourages the segmentation model to assign the right label to each pixel of the image. This loss is the same as in standard semantic segmentation models; see, e.g., [2], [15], [16], [21]. In this work, we use a convolutional encoder-decoder architecture as the segmentation model. The second term is used to discriminate the output $s(x)$ of the segmentation network from the true label, penalizing any mismatch in higher-order label statistics between the predicted and true labels.

The workflow of the proposed model is shown in Figure 2. In the generator, an encoder-decoder network produces a segmentation map with the same resolution as the input image.

FIGURE 2.

Proposed model contains a generator and a discriminator. The generator produces the segmentation map. Meanwhile, the discriminator takes the segmentation map or ground truth map as the input, combined with the RGB image, to decide the probability that the label map comes from a true map or a generated map.

For the segmentation problem we consider, we want to preserve the underlying structure of the input image, so the encoder-decoder segmentation model is used as the generator. In particular, a hierarchy of decoders follows the encoder layers, with each decoder corresponding to one encoder. The decoder layers reuse the max-pooling indices computed in the corresponding encoder layers: the max-pooling indices from layer $i$ are used in layer $n-i$, where $n$ is the total number of layers.
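The index-passing mechanism described above can be sketched in pure Python for a single-channel feature map, assuming 2 × 2 windows with stride 2 (the window size is an assumption, not stated here). The helpers `max_pool_with_indices` and `max_unpool` are hypothetical names; in Segnet, the sparse upsampled map would then be densified by trainable convolutions, which are omitted.

```python
def max_pool_with_indices(fmap, k=2):
    """k x k max pooling with stride k that also records argmax positions.

    fmap: list of rows (H x W) with H and W divisible by k.
    Returns (pooled, indices); indices[i][j] is the (row, col) in fmap
    where the maximum of window (i, j) was found.
    """
    h, w = len(fmap), len(fmap[0])
    pooled, indices = [], []
    for i in range(0, h, k):
        prow, irow = [], []
        for j in range(0, w, k):
            window = [(fmap[r][c], (r, c))
                      for r in range(i, i + k) for c in range(j, j + k)]
            value, pos = max(window)  # ties go to the larger (row, col)
            prow.append(value)
            irow.append(pos)
        pooled.append(prow)
        indices.append(irow)
    return pooled, indices


def max_unpool(pooled, indices, out_h, out_w):
    """Decoder-side unpooling: put each value back at its recorded argmax,
    leaving zeros everywhere else (a sparse, higher-resolution map)."""
    out = [[0.0] * out_w for _ in range(out_h)]
    for prow, irow in zip(pooled, indices):
        for value, (r, c) in zip(prow, irow):
            out[r][c] = value
    return out


fmap = [[1, 5, 2, 0],
        [3, 4, 8, 1],
        [0, 2, 6, 7],
        [9, 1, 3, 4]]
pooled, idx = max_pool_with_indices(fmap)  # encoder layer i
restored = max_unpool(pooled, idx, 4, 4)   # mirrored decoder layer
```

Passing indices instead of whole feature maps keeps the boundary positions of strong responses, which is why this design helps preserve object edges at full resolution.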

In the following text, we introduce the adversarial network in detail.

Given a dataset of $N$ training images $x_{n}$ and corresponding label maps $y_{n}$, in our architecture the discriminator $D$ is a binary classifier trained with the binary cross-entropy loss. We use $a(x,y)\in[0,1]$ to represent the scalar probability, predicted by the discriminator, that $y$ is the true label map of $x$. The overall objective is

\begin{equation}
G^{*} = \arg\min_{G}\max_{D}\sum_{n=1}^{N}\Big\{ l_{mce}\big(s(x_{n}),y_{n}\big) - \lambda\big[l_{bce}\big(a(x_{n},y_{n}),1\big) + l_{bce}\big(a(x_{n},s(x_{n})),0\big)\big]\Big\} \tag{3}
\end{equation}

where $s(x_{n})$ is the segmentation map produced by the generator and $\lambda$ weights the adversarial term.

In the objective function above, $l_{mce}(y^{*},y) = -\sum_{i=1}^{H\times W}\sum_{c=1}^{C} y_{ic}\ln y^{*}_{ic}$ is the multi-class cross-entropy loss between the predicted label map $s(x_{n})$ and the ground truth $y_{n}$; it drives the segmentation model to produce outputs close to the ground truth. In the second term, $l_{bce}(z^{*},z) = -[z\ln z^{*} + (1-z)\ln(1-z^{*})]$ denotes the binary cross-entropy loss, which is used to train the discriminator to make correct decisions. We minimize the objective with respect to the parameters of the segmentation model while maximizing it with respect to the parameters of the adversarial model.
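The two losses can be written directly from their definitions. The sketch below, with hypothetical helper names and a small constant `EPS` (an assumption) to avoid `log(0)`, computes $l_{mce}$, $l_{bce}$, and one summand of objective (3) for a single image whose per-pixel class probabilities are given as plain lists.

```python
import math

EPS = 1e-12  # assumed guard against log(0); not part of the definitions

def l_mce(pred, target):
    """Multi-class cross-entropy: -sum_i sum_c y_ic * ln(y*_ic).

    pred:   predicted class probabilities, one list of C values per pixel.
    target: one-hot ground-truth labels in the same layout.
    """
    return -sum(t * math.log(max(p, EPS))
                for p_row, t_row in zip(pred, target)
                for p, t in zip(p_row, t_row))

def l_bce(z_pred, z):
    """Binary cross-entropy: -[z ln z* + (1 - z) ln(1 - z*)]."""
    return -(z * math.log(max(z_pred, EPS))
             + (1 - z) * math.log(max(1 - z_pred, EPS)))

def objective(pred, target, a_real, a_fake, lam):
    """One summand of (3): a_real = a(x_n, y_n), a_fake = a(x_n, s(x_n))."""
    return l_mce(pred, target) - lam * (l_bce(a_real, 1) + l_bce(a_fake, 0))

# Two pixels, C = 2 classes (background, road).
target = [[0, 1], [1, 0]]
pred = [[0.2, 0.8], [0.9, 0.1]]
print(l_mce(pred, target), objective(pred, target, 0.5, 0.5, lam=0.1))
```

The discriminator's loss (4) is the bracketed sum `l_bce(a_real, 1) + l_bce(a_fake, 0)` on its own.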

Alternating optimization is used to optimize the objective function. Since only the second term involves the adversarial model, we first optimize the discriminator while keeping the generator fixed: one gradient-descent step is taken on $D$, followed by one step on $G$. To speed up computation, mini-batch stochastic gradient descent (SGD) is used.
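The alternating schedule can be outlined as a loop over shuffled mini-batches. In this sketch, `d_step` and `g_step` are hypothetical placeholders for one SGD update on the discriminator objective (4) and on the generator objective (5); the batch size and shuffling scheme are assumptions.

```python
import random

def train_alternating(num_samples, batch_size, epochs, d_step, g_step, seed=0):
    """For every mini-batch: one D update (G frozen), then one G update
    (D frozen). d_step and g_step receive the batch's sample indices."""
    rng = random.Random(seed)
    order = list(range(num_samples))
    for _ in range(epochs):
        rng.shuffle(order)
        for start in range(0, num_samples, batch_size):
            batch = order[start:start + batch_size]
            d_step(batch)  # update theta_d with theta_g fixed
            g_step(batch)  # update theta_g with theta_d fixed

# Record the order of updates with stand-in step functions.
log = []
train_alternating(
    num_samples=5, batch_size=2, epochs=1,
    d_step=lambda b: log.append(("D", len(b))),
    g_step=lambda b: log.append(("G", len(b))),
)
print(log)
```

With 5 samples and a batch size of 2, the last mini-batch holds a single sample, and D and G still alternate one step each.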

When optimizing the discriminator, we solve

\begin{equation}
\min_{\theta_{d}}\sum_{n=1}^{N}\Big[l_{bce}\big(a(x_{n},y_{n}),1\big) + l_{bce}\big(a(x_{n},s(x_{n})),0\big)\Big] \tag{4}
\end{equation}

that is, we minimize the binary classification loss above. This term is optimized by training a CNN whose input consists of a label map and the corresponding RGB image. There are two possible choices for the label map: the true label $y_{n}$ (target 1) or the predicted label $s(x_{n})$ (target 0).

The detailed architecture is shown in Figure 3. Because the two inputs carry different low-level representations, two branches process the label map and the corresponding RGB image, respectively. To balance the effect of the two signals, both are convolved to 64 channels. The combined signals are then passed through a stack of convolutional and max-pooling layers. At the end of the adversarial network, a sigmoid activation outputs the probability that the input label map is the ground truth.
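The shape bookkeeping below sketches the two-branch discriminator for a 200 × 200 training patch. Only the two 64-channel branches and the final sigmoid come from the description; the fusion by channel concatenation, the number of later stages, and their channel widths are assumptions made for illustration, and `discriminator_shapes` is a hypothetical helper.

```python
def discriminator_shapes(height, width, stage_channels=(128, 256, 512)):
    """Trace (channels, height, width) through a two-branch discriminator.

    Hypothetical configuration: each branch is one 'same'-padded convolution
    to 64 channels, the branches are fused by channel concatenation, and each
    later stage is a 'same'-padded convolution plus 2 x 2 max pooling.
    """
    shapes = {
        "rgb_branch": (64, height, width),    # RGB image (3 ch) -> 64 ch
        "label_branch": (64, height, width),  # label map (C ch) -> 64 ch
    }
    channels, h, w = 64 + 64, height, width   # concatenate along channels
    shapes["fused"] = (channels, h, w)
    for k, out_channels in enumerate(stage_channels, start=1):
        h, w = h // 2, w // 2                  # 2 x 2 max pooling halves H, W
        shapes[f"stage{k}"] = (out_channels, h, w)
    # Final reduction to one score, then a sigmoid (details unspecified).
    shapes["output"] = (1,)
    return shapes

for name, shape in discriminator_shapes(200, 200).items():
    print(name, shape)
```

Concatenation is only one way to combine the branches; element-wise addition, for instance, would keep 64 channels after fusion.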

FIGURE 3.

Adversarial architecture for road extraction. The discriminative model has two branches: the left one processes the RGB image and the right one processes the segmentation map. To fuse the two signals, both inputs are mapped to 64 channels. The combined representations are then passed into another stack of convolutional and max-pooling layers.

After optimizing the adversarial term, we fix its parameters and optimize the generative network with the following objective:

\begin{equation}
\min_{\theta_{g}}\sum_{n=1}^{N} l_{mce}\big(s(x_{n}),y_{n}\big) - \lambda\, l_{bce}\big(a(x_{n},s(x_{n})),0\big) \tag{5}
\end{equation}

In this objective function, we minimize the multi-class cross-entropy loss to find the result closest to the ground-truth label. Following [17], proposed by Goodfellow, the term $-l_{bce}(a(x_{n},s(x_{n})),0)$ is replaced in practice by $l_{bce}(a(x_{n},s(x_{n})),1)$. The practical meaning is that the objective maximizes the probability that the adversarial network predicts $s(x_{n})$ to be the ground-truth map rather than a synthetic label map. The two formulations have the same critical points, and the modification turns a subtracted loss term into an added one, which is more convenient for gradient-descent optimization because it yields stronger gradients when the discriminator confidently rejects the generated map. Experiments in previous papers have also confirmed that this modification speeds up optimization.
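To see why the replacement helps, assume (as in the architecture described above) that the discriminator ends with a sigmoid, $a = \sigma(t)$, where $t$ is its logit for a generated map. The sketch below compares the derivatives, with respect to $t$, of the original term $-l_{bce}(a,0) = \ln(1-a)$ and its replacement $l_{bce}(a,1) = -\ln a$. Both are negative, so both push the generator toward higher $t$, but only the replacement keeps a large gradient when $t$ is very negative. The function names are hypothetical.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def grad_saturating(t):
    """d/dt of ln(1 - sigmoid(t)), the original term; equals -sigmoid(t)."""
    return -sigmoid(t)

def grad_non_saturating(t):
    """d/dt of -ln(sigmoid(t)), the replacement; equals -(1 - sigmoid(t))."""
    return -(1.0 - sigmoid(t))

# t = -5: the discriminator confidently rejects the generated map.
for t in (-5.0, 0.0, 5.0):
    print(t, round(grad_saturating(t), 4), round(grad_non_saturating(t), 4))
```

At $t=-5$ the original term's gradient is about $-0.0067$ while the replacement's is about $-0.9933$; at $t=0$ both equal $-0.5$. The generator's parameters receive these values through the chain rule.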

At inference time, we run the generator network in exactly the same manner as during training. This differs from the usual protocol in that we apply dropout at test time, and we apply batch normalization [18] using the statistics of the test batch rather than aggregated statistics from training. When the batch size is set to 1, this approach has been termed “instance normalization” and has been shown to be effective for image generation tasks [38].
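The relation between test-batch statistics and instance normalization can be checked with a small pure-Python sketch. Learned scale and shift parameters are omitted, and the data layout `batch[n][c]` (a flattened $H\times W$ map per sample and channel) is an assumption; the function names are hypothetical.

```python
import math

def batch_norm(batch, eps=1e-5):
    """Normalize with the statistics of the given batch (no running averages).

    Mean and variance are taken per channel over all samples and positions.
    """
    n_samples, n_channels = len(batch), len(batch[0])
    out = [[None] * n_channels for _ in range(n_samples)]
    for c in range(n_channels):
        values = [v for n in range(n_samples) for v in batch[n][c]]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        for n in range(n_samples):
            out[n][c] = [(v - mean) / math.sqrt(var + eps) for v in batch[n][c]]
    return out

def instance_norm(batch, eps=1e-5):
    """Normalize each (sample, channel) map with its own mean and variance."""
    out = []
    for sample in batch:
        normed = []
        for fmap in sample:
            mean = sum(fmap) / len(fmap)
            var = sum((v - mean) ** 2 for v in fmap) / len(fmap)
            normed.append([(v - mean) / math.sqrt(var + eps) for v in fmap])
        out.append(normed)
    return out

single = [[[1.0, 2.0, 3.0, 4.0], [0.0, 10.0, 0.0, 10.0]]]  # N=1, C=2, H*W=4
pair = [[[1.0, 2.0, 3.0, 4.0]], [[10.0, 20.0, 30.0, 40.0]]]  # N=2, C=1
print(batch_norm(single)[0][0], instance_norm(single)[0][0])
print(batch_norm(pair)[0][0], instance_norm(pair)[0][0])
```

With one sample per batch the two functions coincide; with two samples, batch statistics mix both images and the results differ.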

SECTION IV.

Data Set Descriptions

In this section, we describe the experimental dataset in detail. Because few public road datasets are available for evaluating the proposed road extraction method, we cropped 550 images containing roads from Google Earth and manually labeled the roads as the ground truth. The dataset covers different kinds of areas, such as urban, suburban, and rural areas, and different types of roads, such as highways, rural roads, and streets. The size of the input RGB images is $360\times 480$ pixels. Figure 4 shows examples from the road dataset, including streets in complex residential areas (first), rural roads in suburban areas (second), small paths in agricultural areas (third), highways in urban areas (fourth), and roads by the seaside (fifth). Road widths range from 2 to 40 pixels. Some small roads are very obscure against complex backgrounds, and some large roads are affected by occlusions from cars and trees. These factors make road extraction a very challenging problem.

FIGURE 4.

The examples in our road dataset. The first column shows the original images. The second column shows the segmentation reference maps.

SECTION V.

Experiments and Evaluation

In this section, we first introduce the experimental settings of the proposed method. Then we introduce the compared methods and the evaluation metrics. Finally, we present the experimental results, including quantitative comparisons and visual road extraction results.

A. Experiment Setting

Our road dataset consists of 550 images. We randomly split the data into a training subset of 320 images, a validation subset of 100 images, and a test subset of 130 images. These subsets are still small enough to cause over-fitting, so data augmentation is used to expand them. First, we randomly cut out 15 patches of size $200\times 200$ from each original image. Then we rotated each patch by 90°, 180°, and 270° and flipped the patches horizontally and vertically.

After this series of transformations, each patch is expanded to 8 patches, which involve two kinds of flips (horizontal and vertical) and 4 rotations (0°, 90°, 180°, and 270°). As a result, each original image in the dataset is expanded to 120 patches (15 patches × 8 transformations). Additionally, dropout [43] with a rate of 0.5 was added to the deeper convolutional layers to counter over-fitting. Optimization was run for more than 100 epochs over the dataset, until no further performance improvement was observed.
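The cropping and the 8-fold expansion can be sketched as follows, with images represented as lists of rows and hypothetical helper names. The sketch assumes that the 8 variants are the 4 rotations (0°, 90°, 180°, 270°), each with and without a horizontal flip; a vertical flip equals a horizontal flip combined with a 180° rotation, so it yields no additional distinct patch.

```python
import random

def rot90(patch):
    """Rotate a 2-D patch (list of rows) by 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*patch)][::-1]

def hflip(patch):
    """Mirror a patch left-to-right."""
    return [row[::-1] for row in patch]

def dihedral_variants(patch):
    """The 8 variants: each of the 4 rotations, with and without a flip."""
    variants, current = [], patch
    for _ in range(4):
        variants.append(current)
        variants.append(hflip(current))
        current = rot90(current)
    return variants

def random_crops(image, size, count, rng):
    """Cut `count` random size x size patches out of a 2-D image."""
    h, w = len(image), len(image[0])
    crops = []
    for _ in range(count):
        top = rng.randint(0, h - size)   # randint is inclusive
        left = rng.randint(0, w - size)
        crops.append([row[left:left + size] for row in image[top:top + size]])
    return crops

def augment(image, size=200, count=15, seed=0):
    """15 random crops x 8 variants = 120 patches per image."""
    rng = random.Random(seed)
    return [v for crop in random_crops(image, size, count, rng)
            for v in dihedral_variants(crop)]

# Small stand-in image (36 x 48) with 20 x 20 crops to keep the demo fast.
image = [[r * 48 + c for c in range(48)] for r in range(36)]
patches = augment(image, size=20, count=15)
print(len(patches))
```

For a full $360\times 480$ image, `augment(image)` would use the stated $200\times 200$ crops.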

B. Comparing Algorithms

To verify its performance, the proposed GAN method for segmentation is compared with other state-of-the-art methods. Since the proposed method performs pixel-wise segmentation, all compared methods are classification-based. We selected a traditional road extraction method based on spatial features designed from prior knowledge: Huang and Zhang [36] proposed a method based on extracting the structural features of roads, which is a superpixel-based method.

We also compare semantic segmentation methods based on CNNs. First, we compare the state-of-the-art fully convolutional network (FCN) [22], which was the first attempt to solve pixel-wise classification with a fully convolutional network. Second, we compare a deep convolutional encoder-decoder architecture called Segnet [25], which uses a hierarchy of decoders to improve the resolution of the segmentation map. Segnet is also the generative model of our GAN, so the comparison with Segnet shows the difference between adversarial and non-adversarial training.

C. Evaluation Metrics

Three evaluation metrics are used for object extraction. The first is completeness (COM), which measures the proportion of ground-truth road pixels that are correctly detected. The second is correctness (COR), which measures the proportion of pixels labeled as road in the segmentation map that match the ground truth. The third is the quality (Q) criterion, which combines COM and COR:

\begin{align*}
COM &= \frac{TP}{TP+FN}\\
COR &= \frac{TP}{TP+FP}\\
Q &= \frac{TP}{TP+FN+FP}
\end{align*}

where TP denotes true positives, FP false positives, and FN false negatives.
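The three metrics follow directly from pixel counts. The sketch below, with a hypothetical `road_metrics` helper and binary masks stored as lists of rows, returns 0.0 when a denominator is zero (an assumption, since the text does not cover empty masks).

```python
def road_metrics(pred, ref):
    """Completeness, correctness and quality for binary road masks.

    pred, ref: equally sized 2-D lists with 1 = road, 0 = background.
    """
    tp = fp = fn = 0
    for p_row, r_row in zip(pred, ref):
        for p, r in zip(p_row, r_row):
            if p and r:
                tp += 1      # road predicted and present
            elif p:
                fp += 1      # road predicted, background in reference
            elif r:
                fn += 1      # road in reference, missed by prediction
    com = tp / (tp + fn) if tp + fn else 0.0
    cor = tp / (tp + fp) if tp + fp else 0.0
    q = tp / (tp + fn + fp) if tp + fn + fp else 0.0
    return com, cor, q

ref = [[1, 1, 0, 0],
       [1, 1, 0, 0]]
pred = [[1, 1, 1, 0],
        [1, 0, 0, 0]]
print(road_metrics(pred, ref))  # TP = 3, FP = 1, FN = 1
```

Because $1/COM + 1/COR - 1 = (TP+FN+FP)/TP$, Q can also be computed as $1/(1/COM + 1/COR - 1)$ when TP > 0, which shows how it combines COM and COR. In the example, $Q = 1/(4/3 + 4/3 - 1) = 0.6$.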

D. Comparison of Road Detection

To evaluate the effectiveness of the proposed GAN method for road detection, Figure 5 shows qualitative comparisons with the state-of-the-art methods, and Table 1 gives the corresponding quantitative results.

FIGURE 5.

Visual comparisons of road area extraction results with different algorithms. The green region represents true positives (TP), the red region represents false alarms (FP), and the blue region represents false negatives (FN).

According to the results, all extraction methods alleviate the influence of occlusions to a certain degree, because they take spatial information into account. However, the results of Huang's method show more FPs, represented by the red areas. FCN obtains smoother results, but it makes many mistakes on road boundaries (FPs and FNs are high along the boundary). This is because FCN produces only a low-resolution segmentation map, so edge information cannot be preserved. Segnet introduces deconvolution layers to upsample the segmentation map to the resolution of the input image, and it shows higher extraction accuracy on the boundary than FCN. GANs add adversarial training on top of Segnet and, as a result, produce segmentation maps with fewer FPs and FNs than Segnet.

Table 1 presents the corresponding quantitative results of all compared methods. In Table 1, the first three columns give the segmentation accuracy on three selected images, and the last column gives the average performance over all images in the test set. As shown in Table 1, Segnet+GANs achieves the highest accuracy among all methods, and Segnet is the second best. Specifically, the average Q of Segnet+GANs is about 1.5% higher than that of Segnet, which demonstrates that adversarial training improves the results.

TABLE 1 Quantitative Results of Different Methods on Road Detection, with the Best Values in Bold. The Last Column Is the Average Performance over All Images in the Test Set

In Figure 6, we show the evolution of road detection accuracy on the training and validation sets, using either standard or adversarial training. The adversarial strategy results in less over-fitting, i.e., it has a regularization effect, which improves accuracy on the validation data.

FIGURE 6.

Road detection accuracy across training epochs on the road dataset, for training data (left) and validation data (right), with and without adversarial training.

SECTION VI.

Conclusion

In this paper, a novel end-to-end convolutional neural network based on generative adversarial training is proposed. Adversarial training is used to further improve the performance of the segmentation model. In the proposed work, the adversarial training acts as a variational loss in the segmentation objective, which is equivalent to adding a regularization term that accounts for higher-order consistency. For the segmentation model, we use Segnet to generate a pixel-wise classification map. The experimental results on the road dataset verify the superiority of the proposed method: the segmentation maps it produces show the best quantitative and visual performance.
