SAN: Selective Alignment Network for Cross-Domain Pedestrian Detection


Abstract:

Cross-domain pedestrian detection, which has been attracting much attention, assumes that the training and test images are drawn from different data distributions. Existing methods focus on aligning the descriptions of whole candidate instances between the source and target domains. Since there are significant visual differences among candidate instances, aligning whole candidate instances between the two domains cannot overcome these inter-instance differences. Rather than aligning all candidate instances as a whole, we argue that aligning each type of instance separately is more reasonable. We therefore propose a novel Selective Alignment Network for cross-domain pedestrian detection, which consists of three components: a Base Detector, an Image-Level Adaptation Network, and an Instance-Level Adaptation Network. The Image-Level Adaptation Network and Instance-Level Adaptation Network can be regarded as global-level and local-level alignments, respectively. Similar to Faster R-CNN, the Base Detector, which is composed of a Feature module, an RPN module, and a Detection module, is used to learn a robust pedestrian detector from the annotated source data. Given the image description extracted by the Feature module, the Image-Level Adaptation Network aligns the image description with an adversarial domain classifier. Given the candidate proposals generated by the RPN module, the Instance-Level Adaptation Network first clusters the source candidate proposals into several groups according to their visual features, generating a pseudo label for each candidate proposal. With these pseudo labels, we align the source and target domains by iteratively maximizing and minimizing the discrepancy between the predictions of two classifiers. Extensive evaluations on several benchmarks demonstrate the effectiveness of the proposed approach for cross-domain pedestrian detection.
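The abstract states that the Instance-Level Adaptation Network clusters source candidate proposals by their visual features and uses the resulting cluster indices as pseudo labels. The abstract does not specify the clustering algorithm; the sketch below assumes standard k-means (Lloyd's algorithm) over proposal feature vectors purely for illustration. The function name `kmeans` and the toy feature vectors are hypothetical, not from the paper.

```python
import random

def kmeans(features, k, iters=20, seed=0):
    """Cluster proposal feature vectors into k groups (Lloyd's algorithm).

    Returns one pseudo label (cluster index) per proposal, standing in for
    the grouping step of the Instance-Level Adaptation Network. The real
    method would operate on RoI features from the RPN, not toy 2-D points.
    """
    rng = random.Random(seed)
    centers = rng.sample(features, k)  # initialize centers from the data
    labels = [0] * len(features)
    for _ in range(iters):
        # Assignment step: each proposal goes to its nearest center
        # (squared Euclidean distance).
        for i, f in enumerate(features):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(f, centers[c])),
            )
        # Update step: move each center to the mean of its members
        # (keep the old center if a cluster becomes empty).
        for c in range(k):
            members = [features[i] for i in range(len(features)) if labels[i] == c]
            if members:
                centers[c] = [sum(vals) / len(members) for vals in zip(*members)]
    return labels

if __name__ == "__main__":
    # Two well-separated toy "proposal" groups; each gets its own pseudo label.
    proposals = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
    print(kmeans(proposals, k=2))
```

Once each source proposal carries a cluster index, those indices serve as classification targets for the two classifiers whose prediction discrepancy is alternately maximized and minimized during alignment.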
Published in: IEEE Transactions on Image Processing ( Volume: 30)
Page(s): 2155 - 2167
Date of Publication: 20 January 2021

PubMed ID: 33471752

I. Introduction

Pedestrian detection, a particular branch of general object detection that aims to predict a set of bounding boxes enclosing the pedestrians in a given image, has been attracting increasing interest in both academia and industry. Driven by the surge of convolutional neural networks (CNNs) [1], many CNN-based pedestrian detection approaches have been proposed to boost performance [2]–[12]. However, these methods all assume that the training and test images follow the same distribution, which limits their generalization. As shown in Figure 1, a detector trained on the Caltech dataset [13] performs poorly on the CityPersons dataset [14].

Fig. 1. Samples from the Caltech (a) and CityPersons (b) datasets. Because of the obvious visual difference between the two datasets, directly applying the detector trained on Caltech produces many false detections on CityPersons; red and blue bounding boxes represent the positive and negative results, respectively.

