Abstract:
Learning a discriminative spatial-temporal feature representation and distance metric is crucial for video-based person re-identification. Most current approaches directly use the extracted feature vectors to compute similarity, but a single feature vector is not sufficient to overcome the noise caused by background clutter and large variations in pose and viewpoint. To this end, we incorporate learning spatial-temporal feature representations and similarity measurement into a unified framework for video-based person re-identification. We propose a similarity measurement layer, which measures the implicit similarity of two video sequences in different regions; this strategy makes the network more robust to noise. Meanwhile, to alleviate the imbalance between the numbers of positive and negative samples, we propose a matching sampling loss to help train the similarity measurement layer. We conduct extensive comparative experiments on three challenging datasets: iLIDS-VID, PRID-2011, and MARS. The experimental results demonstrate that the proposed approach achieves superior performance compared with state-of-the-art methods for video-based person re-identification.
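The region-wise similarity idea described in the abstract can be illustrated with a minimal sketch: split each feature vector into contiguous regions, score each region pair separately, and aggregate the scores. This is an illustrative assumption, not the authors' actual implementation; the function names, number of regions, and use of cosine similarity are all hypothetical choices for exposition.

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def region_similarity(feat_a, feat_b, num_regions=4):
    """Hypothetical region-wise similarity: split each flat feature
    vector into `num_regions` contiguous chunks, score each chunk
    pair separately, then average. Comparing per region (rather
    than one global vector) is what makes such a measure less
    sensitive to localized noise such as background clutter."""
    size = len(feat_a) // num_regions
    scores = [
        cosine(feat_a[i * size:(i + 1) * size],
               feat_b[i * size:(i + 1) * size])
        for i in range(num_regions)
    ]
    return sum(scores) / len(scores)
```

For example, two identical feature vectors score 1.0, while a heavily corrupted region only lowers one of the per-region scores rather than dominating a single global comparison.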
Date of Conference: 09-12 December 2018
Date Added to IEEE Xplore: 25 April 2019
Print on Demand(PoD) ISSN: 1018-8770