Abstract:
The task of crowd counting is to estimate the number of people in images taken from unconstrained surveillance scenes. It is, in general, a challenging problem due to scale variations and perspective distortions in the input. Previous methods attempt to enhance representational ability by using multi-scale features of the scene images. However, most of these methods directly add or fuse the features, treating features of different sizes as equally important. In this paper, we propose a novel architecture called the adaptive context learning network (ACLNet) to incorporate feature context at multiple levels. In this architecture, the original image features are enhanced by a multi-level feature generating module, and the multi-level features are then up-sampled to the same size and re-weighted before fusion. ACLNet adaptively incorporates the context information that exists in sub-regions of various scales, and thus enhances the representational ability of the multi-level features. We perform experiments on the public ShanghaiTech (Part A and Part B), UCF_CC_50, and NWPU-crowd datasets. Our proposed ACLNet achieves state-of-the-art results compared with existing methods.
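The abstract describes the fusion step only at a high level: multi-level features are up-sampled to a common size and re-weighted, rather than added with equal weight. The following is a minimal, hypothetical sketch of that idea on single-channel 2-D maps. The helper names (upsample_nearest, adaptive_fuse), the use of nearest-neighbour up-sampling, and the choice of global-average-pooled scores passed through a softmax to obtain per-level weights are all assumptions made for illustration. This is not ACLNet's actual re-weighting module, which the abstract does not specify.

```python
import math
from typing import List

Map = List[List[float]]  # a single-channel 2-D feature map


def upsample_nearest(fmap: Map, out_h: int, out_w: int) -> Map:
    """Nearest-neighbour up-sampling of a 2-D map to (out_h, out_w)."""
    in_h, in_w = len(fmap), len(fmap[0])
    return [
        [fmap[i * in_h // out_h][j * in_w // out_w] for j in range(out_w)]
        for i in range(out_h)
    ]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def global_avg(fmap: Map) -> float:
    """Global average pooling: mean value of the whole map."""
    return sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))


def adaptive_fuse(levels: List[Map], out_h: int, out_w: int) -> Map:
    """Hypothetical adaptive fusion (assumption, not the paper's module):
    1. up-sample every level to the same size,
    2. derive one input-dependent weight per level (softmax of pooled means),
    3. return the weighted sum instead of a plain, equally weighted sum.
    """
    up = [upsample_nearest(f, out_h, out_w) for f in levels]
    weights = softmax([global_avg(f) for f in up])
    return [
        [sum(w * f[i][j] for w, f in zip(weights, up)) for j in range(out_w)]
        for i in range(out_h)
    ]


# Usage: fuse a fine 4x4 level with a coarse 2x2 level.
fine = [[1.0] * 4 for _ in range(4)]
coarse = [[0.0, 2.0], [2.0, 0.0]]
fused = adaptive_fuse([fine, coarse], 4, 4)
```

Because the weights are computed from each level's pooled response, a level with stronger activations receives a larger share of the fused map; in the usage example both levels have the same mean, so each gets a weight of 0.5. In a trained network, such weights would typically come from learned layers rather than a fixed pooling rule.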
Date of Conference: 11-14 October 2020
Date Added to IEEE Xplore: 14 December 2020