
A Richly Annotated Pedestrian Dataset for Person Retrieval in Real Surveillance Scenarios


Abstract:

Retrieving specific persons with various types of queries, e.g., a set of attributes or a portrait photo, has great application potential in large-scale intelligent surveillance systems. In this paper, we propose a richly annotated pedestrian (RAP) dataset, which serves as a unified benchmark for both attribute-based and image-based person retrieval in real surveillance scenarios. Previous datasets typically fall short in three respects: limited data scale and annotation types, heterogeneous data sources, and controlled scenarios. In contrast, RAP is a large-scale dataset containing 84,928 images with 72 types of attributes, additional tags for viewpoint, occlusion, and body parts, and 2,589 person identities. It is collected in a real, uncontrolled scene, and its pedestrian samples exhibit complex visual variations due to changes in viewpoint, pedestrian posture, and clothing appearance. Towards a high-quality person retrieval benchmark, a number of state-of-the-art algorithms for pedestrian attribute recognition and person re-identification (ReID) are evaluated quantitatively on three tasks, i.e., attribute recognition, attribute-based person retrieval, and image-based person retrieval, and a new instance-based metric is proposed to measure the dependency among the predictions of multiple attributes. Finally, some open problems, e.g., joint feature learning for attribute recognition and ReID, and cross-day person ReID, are explored to highlight the challenges and future directions in person retrieval.
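The abstract refers to an instance-based metric for multi-attribute prediction. As background, the sketch below computes the standard example-based (instance-based) accuracy, precision, recall, and F1 for multi-label prediction, where each image's predicted attribute set is compared against its ground-truth set as a whole. This is a minimal illustration of the instance-based evaluation family, not the paper's exact proposed metric; the function name and epsilon handling are assumptions.

```python
import numpy as np

def instance_based_metrics(y_true, y_pred):
    """Example-based (per-image) metrics for multi-attribute prediction.

    y_true, y_pred: binary arrays of shape (num_images, num_attributes).
    A minimal sketch of instance-based evaluation; the paper's proposed
    metric for attribute-prediction dependency may differ in detail.
    """
    y_true = y_true.astype(bool)
    y_pred = y_pred.astype(bool)
    inter = (y_true & y_pred).sum(axis=1).astype(float)
    union = (y_true | y_pred).sum(axis=1).astype(float)
    n_true = y_true.sum(axis=1).astype(float)
    n_pred = y_pred.sum(axis=1).astype(float)
    eps = 1e-12  # guards against images with no positive labels/predictions
    acc = np.mean(inter / np.maximum(union, eps))
    prec = np.mean(inter / np.maximum(n_pred, eps))
    rec = np.mean(inter / np.maximum(n_true, eps))
    f1 = 2 * prec * rec / max(prec + rec, eps)
    return acc, prec, rec, f1
```

Unlike label-based (per-attribute) averages, these quantities are computed per image first, so they reward predictions that are jointly consistent across all attributes of the same person.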
Published in: IEEE Transactions on Image Processing (Volume: 28, Issue: 4, April 2019)
Page(s): 1575 - 1590
Date of Publication: 26 October 2018

PubMed ID: 30371372

I. Introduction

Person retrieval with query conditions of specific visual attributes or portrait images is very useful for hunting criminal or terrorist suspects in large-scale surveillance scenarios. For example, describable person attributes played a critical role in the retrieval of the two suspects in the Boston Marathon bombing [1]. To complete this task, extracting a "good" feature representation of the target person is a crucial and challenging problem due to low image quality, large variations in camera viewpoint, large pose variations, and occlusions in real unconstrained scenes. Although deep neural network based methods that learn visual features from large-scale training samples have achieved a series of breakthroughs in various vision tasks, the large-scale benchmark datasets they rely on, e.g., ImageNet and COCO, are collected from the Internet and carry intrinsic biases of cyberspace, e.g., selection bias, capture bias, and negative set bias [2]. Thus, for person retrieval in the physical world, it is necessary to construct a large-scale and richly annotated pedestrian dataset for feature learning and algorithm evaluation. In this paper, considering the two types of query modalities in person retrieval, i.e., image-based queries and attribute-based queries, as shown in Fig. 1, we collect a large-scale, richly annotated pedestrian (RAP) dataset as a unified benchmark for person retrieval in real visual surveillance scenarios.

Fig. 1. The general framework for person retrieval based on different types of queries. The green and purple lines represent attribute-based and image-based queries, respectively.
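To make the two query paths in Fig. 1 concrete, the sketch below ranks a gallery for a single query under either modality: an appearance embedding for image-based ReID, or a binary attribute vector for attribute-based retrieval. This is a hypothetical sketch of one plausible scoring scheme, not the paper's implementation; all names (rank_gallery, gallery_feats, gallery_attrs) and the agreement-based attribute score are assumptions.

```python
import numpy as np

def rank_gallery(query, mode, gallery_feats, gallery_attrs):
    """Rank gallery images for one query under the two modalities of Fig. 1.

    gallery_feats: (N, D) L2-normalized appearance embeddings.
    gallery_attrs: (N, A) predicted attribute probabilities in [0, 1].
    query: a D-dim embedding if mode == 'image' (purple path), or an
           A-dim binary attribute vector if mode == 'attribute' (green path).
    Returns gallery indices ordered from best to worst match.
    """
    if mode == 'image':
        # Image-based ReID: cosine similarity between normalized embeddings.
        scores = gallery_feats @ query
    elif mode == 'attribute':
        # Attribute-based retrieval: mean agreement between the queried
        # attribute values and the gallery's predicted probabilities,
        # rewarding matches on both present and absent attributes.
        scores = (gallery_attrs * query
                  + (1.0 - gallery_attrs) * (1.0 - query)).mean(axis=1)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return np.argsort(-scores)
```

In this framing, both modalities reduce to scoring and sorting the same gallery, which is what makes a single dataset with both identity labels and attribute annotations a unified benchmark.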

