Abstract:
In this paper, we propose an efficient human pose estimation network, DANet, that learns deeply aggregated representations. Most existing models extract multi-scale information mainly from features with different spatial sizes, and powerful multi-scale representations usually rely on a cascaded pyramid framework. This framework substantially boosts performance but also makes networks very deep and complex. Instead, we exploit multi-scale information from layers with different receptive-field sizes and then make full use of this information through an improved fusion method. Specifically, we propose an orthogonal attention block (OAB) and a second-order fusion unit (SFU). The OAB learns multi-scale information from different layers and enhances it by encouraging the learned features to be diverse. The SFU adaptively selects and fuses the diverse multi-scale information and suppresses redundant information. With the OAB and SFU, our networks achieve comparable or even better accuracy at much lower model complexity. For example, DANet-72 achieves an AP of 71.0 on COCO val2017 with only 1.0 GFLOPs, and it runs at 58 persons per second (PPS) on a CPU platform.
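The abstract does not specify how the OAB encourages diversity or how the SFU computes its selection, so the following is only a hypothetical, stdlib-only Python sketch of the two general ideas it names, not DANet's implementation. It assumes each branch (a layer with a different receptive field) has already been summarized as a plain vector. Redundancy is measured as the squared pairwise cosine similarity, an orthogonality-style penalty that is zero when branches are mutually orthogonal. Branches are then fused with softmax weights derived from those pairwise (second-order) statistics, so that redundant branches are down-weighted. All function names are illustrative.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Vector, b: Vector, eps: float = 1e-12) -> float:
    # eps guards against division by zero for all-zero descriptors
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)) + eps)


def orthogonality_penalty(branches: List[Vector]) -> float:
    """Hypothetical diversity penalty: sum of squared pairwise cosine similarities.

    Zero when every pair of branch descriptors is orthogonal (maximally diverse);
    grows as branches carry parallel, i.e. redundant, information.
    """
    total = 0.0
    for i in range(len(branches)):
        for j in range(i + 1, len(branches)):
            total += cosine(branches[i], branches[j]) ** 2
    return total


def fusion_weights(branches: List[Vector], temperature: float = 1.0) -> List[float]:
    """Hypothetical redundancy-aware gate built from pairwise (second-order) statistics.

    Each branch is scored by minus its summed squared cosine similarity to the
    other branches, then the scores are normalized with a softmax.
    """
    n = len(branches)
    scores = []
    for i in range(n):
        redundancy = sum(cosine(branches[i], branches[j]) ** 2 for j in range(n) if j != i)
        scores.append(-redundancy / temperature)
    peak = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(branches: List[Vector], temperature: float = 1.0) -> Vector:
    """Weighted sum of branch descriptors using the redundancy-aware weights."""
    weights = fusion_weights(branches, temperature)
    dim = len(branches[0])
    return [sum(w * b[k] for w, b in zip(weights, branches)) for k in range(dim)]


if __name__ == "__main__":
    a, b, c = [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]  # a and b are duplicates
    print(orthogonality_penalty([a, b, c]))
    print(fusion_weights([a, b, c]))
    print(fuse([a, b, c]))
```

With two duplicate branches and one distinct branch, the distinct branch receives the largest weight, which mirrors the stated goal of suppressing redundant information. In a trained network the penalty would typically be added to the loss and the gate would use learnable layers; this parameter-free version only illustrates the arithmetic.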
Date of Conference: 05-09 July 2021
Date Added to IEEE Xplore: 09 June 2021