Abstract:
Current people detectors operate either by scanning an image in a sliding-window fashion or by classifying a discrete set of proposals. We propose a model that is based on decoding an image into a set of people detections. Our system takes an image as input and directly outputs a set of distinct detection hypotheses. Because we generate predictions jointly, common post-processing steps such as non-maximum suppression are unnecessary. We use a recurrent LSTM layer for sequence generation and train our model end-to-end with a new loss function that operates on sets of detections. We demonstrate the effectiveness of our approach on the challenging task of detecting people in crowded scenes.
Date of Conference: 27-30 June 2016
Date Added to IEEE Xplore: 12 December 2016
Electronic ISSN: 1063-6919
Stanford University, USA